Trudy SPIIRAN, 2010, Issue 12, Pages 35–49
(Mi trspy63)
This article is cited in 1 scientific paper.
Development and Research of a Statistical Russian Language Model
I. S. Kipyatkova, A. A. Karpov
St. Petersburg Institute for Informatics and Automation of RAS
Abstract:
The paper describes the creation of a statistical Russian language model for continuous speech recognition systems. The characteristics of the collected corpus, which consists of texts from the news websites of several online newspapers, are given, and a statistical analysis of this corpus is carried out. Unigram, bigram, and trigram Russian language models were built from the collected text corpus. To assess the quality of these models, their entropy and perplexity were computed. The paper also surveys existing approaches to building statistical language models.
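The abstract names entropy and perplexity as the quality measures but gives no formulas. As a rough illustration only, the sketch below computes both for a toy bigram model. The add-one smoothing, the whitespace tokenization, and all function names are assumptions made for this example and are not taken from the paper, whose actual estimation procedure may differ.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count bigram contexts and bigrams over a list of tokens."""
    contexts = Counter(tokens[:-1])             # words that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams


def bigram_prob(prev, word, contexts, bigrams, vocab_size):
    """P(word | prev) with add-one smoothing (an assumption of this sketch)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)


def entropy_and_perplexity(tokens, contexts, bigrams, vocab_size):
    """Per-word cross-entropy in bits and the matching perplexity 2**H."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens to score bigrams")
    log_sum = sum(
        math.log2(bigram_prob(prev, word, contexts, bigrams, vocab_size))
        for prev, word in pairs
    )
    entropy = -log_sum / len(pairs)
    return entropy, 2 ** entropy


# Toy usage: train on a tiny "corpus" and score a held-out sequence.
train_tokens = "a b a b".split()
contexts, bigrams = train_bigram(train_tokens)
vocab_size = len(set(train_tokens))
entropy, perplexity = entropy_and_perplexity(
    ["a", "b"], contexts, bigrams, vocab_size
)
```

Perplexity is simply two raised to the per-word entropy, so lower values of either measure mean the model predicts the test text better.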
Keywords:
statistical text processing, language model.
Received: 16.11.2010 Accepted: 06.12.2010
Citation:
I. S. Kipyatkova, A. A. Karpov, “Development and Research of a Statistical Russian Language Model”, Tr. SPIIRAN, 12 (2010), 35–49
Linking options:
https://www.mathnet.ru/eng/trspy63
https://www.mathnet.ru/eng/trspy/v12/p35