Trudy SPIIRAN, 2011, Issue 19, Pages 146–158
(Mi trspy440)
N-gram smoothing based on modeling of expectation of n-gram occurrence
A. P. Zykov
Abstract:
It is shown that the expected frequency of occurrence of an n-gram depends on the size of the training set and on the size of the dictionary formed from that set. A method is proposed for smoothing an n-gram language model using the probabilities of lower-order n-grams. The approach is based on modeling the expectation function of the n-gram occurrence probability. Instead of discounting the maximum-likelihood n-gram probability, we suggest enlarging the size of the training set by the expected number of unseen n-grams. To model the number of unseen n-grams, the expectation function of n-gram frequency of occurrence is extrapolated to zero frequency. The expectation function itself is modeled by statistical analysis of word occurrences in texts.
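The central idea, enlarging the denominator by the expected number of unseen n-grams instead of discounting observed counts, can be illustrated for bigrams with a minimal Python sketch. It assumes that U(h), the expected number of unseen bigrams following a history h, is already available as the hypothetical input `expected_unseen`; the paper estimates this quantity by extrapolating the expectation function of n-gram frequency to zero frequency, and that step is not reproduced here. The reserved mass is handed to a lower-order (unigram) distribution by interpolation, P(w | h) = (c(h, w) + U(h) * P_lower(w)) / (c(h) + U(h)). The function name `make_smoothed_bigram_model` and the choice of interpolation are illustrative assumptions, not the author's implementation.

```python
from collections import Counter


def make_smoothed_bigram_model(bigram_counts, lower_order_probs, expected_unseen):
    """Return prob(h, w) for a bigram model smoothed by enlarging c(h) with U(h).

    bigram_counts:     Counter mapping (h, w) -> observed count c(h, w)
    lower_order_probs: dict w -> P_lower(w); assumed to sum to 1
    expected_unseen:   dict h -> U(h), a HYPOTHETICAL precomputed estimate of
                       the expected number of unseen bigrams after history h
    """
    # c(h): total number of observed bigrams that start with history h.
    history_totals = Counter()
    for (h, _w), c in bigram_counts.items():
        history_totals[h] += c

    def prob(h, w):
        c_h = history_totals.get(h, 0)
        u_h = expected_unseen.get(h, 0.0)
        p_lower = lower_order_probs.get(w, 0.0)
        denom = c_h + u_h  # training-set size enlarged by expected unseen bigrams
        if denom == 0:
            # History never seen and no unseen estimate: use the lower order only.
            return p_lower
        # Observed counts are NOT discounted; the extra U(h) in the denominator
        # frees mass U(h) / (c(h) + U(h)), spread according to P_lower(w).
        return (bigram_counts.get((h, w), 0) + u_h * p_lower) / denom

    return prob


if __name__ == "__main__":
    counts = Counter({("the", "cat"): 3, ("the", "dog"): 1})
    unigram = {"the": 0.4, "cat": 0.3, "dog": 0.2, "fish": 0.1}
    p = make_smoothed_bigram_model(counts, unigram, {"the": 1.0})
    for word in unigram:
        print(word, p("the", word))
```

With c(the, cat) = 3, c(the, dog) = 1 and a hypothetical U(the) = 1, the sketch gives P(cat | the) = (3 + 0.3) / 5 = 0.66 and assigns (0 + 0.1) / 5 = 0.02 to the unseen "fish", while the probabilities for history "the" still sum to 1. A backoff variant would instead distribute the reserved mass only over words not observed after h, renormalizing the lower-order distribution accordingly.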
Keywords:
language model, smoothing techniques.
Received: 05.07.2011 Accepted: 29.11.2011
Citation:
A. P. Zykov, “N-gram smoothing based on modeling of expectation of n-gram occurrence”, Tr. SPIIRAN, 19 (2011), 146–158
Linking options:
https://www.mathnet.ru/eng/trspy440
https://www.mathnet.ru/eng/trspy/v19/p146