Trudy SPIIRAN, 2013, Issue 24, Pages 332–348
(Mi trspy571)
Software for Creation of Sintactico-Statistical Russian Language Model Based on the Text Corpus
I. S. Kipyatkova
St. Petersburg Institute for Informatics and Automation of RAS
Abstract:
Building a language model is one of the stages in training a continuous speech recognition system. This paper describes software developed for creating a syntactic-statistical Russian language model from a text corpus. The main stages of the algorithm are preliminary processing of the text material, creation of a statistical n-gram language model, and extension of that model with n-grams obtained by syntactic analysis. Syntactic analysis increases the number of distinct bigrams extracted during text processing and improves the quality of the language model by identifying grammatically connected word pairs. Results of testing language models created with the software module are presented.
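The three stages named in the abstract can be illustrated with a minimal sketch. It is not the software described in the paper: the function names, the regular-expression tokenizer, and the hand-written list of syntactic pairs are all assumptions made for illustration, and a real system would take the grammatically connected pairs from a syntactic parser.

```python
import re
from collections import Counter


def preprocess(text):
    """Stage 1 (simplified): split into sentences, lowercase, keep word tokens."""
    sentences = re.split(r"[.!?]+", text)
    return [re.findall(r"\w+", s.lower()) for s in sentences if s.strip()]


def count_bigrams(sentences):
    """Stage 2: count adjacent word pairs, with sentence boundary markers."""
    counts = Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


def extend_with_syntactic_pairs(counts, pairs):
    """Stage 3: add word pairs linked by syntax, even if they are not adjacent."""
    extended = Counter(counts)
    extended.update(pairs)
    return extended


corpus = "The cat that I saw sleeps. The dog sleeps."
sentences = preprocess(corpus)
bigrams = count_bigrams(sentences)

# Hypothetical parser output: subject and predicate separated by a clause.
syntactic_pairs = [("cat", "sleeps")]
model = extend_with_syntactic_pairs(bigrams, syntactic_pairs)
```

Here the pair ("cat", "sleeps") never occurs as adjacent words in the corpus, so it is missing from the plain bigram counts and enters the model only through the syntactic extension. This is the sense in which syntactic analysis increases the number of distinct bigrams.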
Keywords:
automatic speech recognition, statistical language model, syntactical analysis.
Received: 01.02.2013
Citation:
I. S. Kipyatkova, “Software for Creation of Sintactico-Statistical Russian Language Model Based on the Text Corpus”, Tr. SPIIRAN, 24 (2013), 332–348
Linking options:
https://www.mathnet.ru/eng/trspy571
https://www.mathnet.ru/eng/trspy/v24/p332