Trudy SPIIRAN
Trudy SPIIRAN, 2011, Issue 19, Pages 146–158 (Mi trspy440)  

N-gram smoothing based on modeling of expectation of n-gram occurrence

A. P. Zykov
Abstract: It is shown that the expected frequency of occurrence of an n-gram depends on the size of the training set and on the size of the dictionary formed from that set. A method is proposed for smoothing an n-gram language model with respect to the probabilities of lower-order n-grams. The approach is based on modeling the expectation function of the n-gram occurrence probability. Instead of discounting the maximum-likelihood n-gram probabilities, we suggest enlarging the size of the training set by the expected number of unseen n-grams. To estimate the number of unseen n-grams, the expectation function of n-gram frequency of occurrence is extrapolated to zero frequency. The expectation function itself is modeled by statistical analysis of word occurrences in texts.
Keywords: language model, smoothing techniques.
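The abstract does not give the exact form of the expectation function, so the following Python sketch is only an illustration of the general idea under stated assumptions, not the author's method. It assumes (hypothetically) that the expected number of unseen bigrams can be obtained by fitting log N_r = a + b·r to the frequency-of-frequency counts N_r and evaluating the fit at r = 0. That estimate is added to the training-set size instead of discounting seen counts, and the reserved mass is spread over unseen bigrams in proportion to products of unigram (lower-order) probabilities. The names `extrapolate_unseen` and `smoothed_bigram_probs` are hypothetical.

```python
import math
from collections import Counter


def extrapolate_unseen(freq_of_freq: dict[int, int]) -> float:
    """ASSUMPTION: a stand-in for the paper's expectation function.
    Fits log N_r = a + b*r by least squares over the observed
    frequencies r and returns exp(a), the fit evaluated at r = 0."""
    rs = sorted(freq_of_freq)
    if not rs:
        return 0.0
    if len(rs) == 1:  # one point gives no slope; fall back to N_r itself
        return float(freq_of_freq[rs[0]])
    xs = [float(r) for r in rs]
    ys = [math.log(freq_of_freq[r]) for r in rs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return math.exp(a)


def smoothed_bigram_probs(tokens: list[str]) -> dict[tuple[str, str], float]:
    """Joint bigram probabilities: seen counts are divided by an enlarged
    training-set size (seen + expected unseen) rather than discounted."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens")
    vocab = sorted(set(tokens))
    unigrams = Counter(tokens)
    p_uni = {w: unigrams[w] / len(tokens) for w in vocab}  # lower-order model
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_seen = sum(bigrams.values())
    unseen = [(h, w) for h in vocab for w in vocab if (h, w) not in bigrams]
    # Expected number of unseen bigrams (hypothetical extrapolation above).
    n_unseen = extrapolate_unseen(dict(Counter(bigrams.values()))) if unseen else 0.0
    size = n_seen + n_unseen  # enlarged training-set size
    probs = {bg: c / size for bg, c in bigrams.items()}
    # Reserved mass n_unseen / size is shared by the unseen bigrams
    # in proportion to p(h) * p(w); every p_uni value is positive here.
    norm = sum(p_uni[h] * p_uni[w] for h, w in unseen)
    for h, w in unseen:
        probs[(h, w)] = (n_unseen / size) * p_uni[h] * p_uni[w] / norm
    return probs


if __name__ == "__main__":
    toks = "the cat sat on the mat the cat ran".split()
    model = smoothed_bigram_probs(toks)
    print(round(sum(model.values()), 6), len(model))
```

Because the seen counts keep their relative order and the unseen mass is exactly n_unseen / (n_seen + n_unseen), the distribution sums to one without an explicit discount step; the quality of the result depends entirely on how the unseen count is estimated, which is the part the paper models and this sketch only approximates.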
Received: 05.07.2011
Accepted: 29.11.2011
Document Type: Article
UDC: 519.766.4
Language: Russian
Citation: A. P. Zykov, “N-gram smoothing based on modeling of expectation of n-gram occurrence”, Tr. SPIIRAN, 19 (2011), 146–158
Citation in AMSBIB format:
\Bibitem{Zyk11}
\by A.~P.~Zykov
\paper N-gram smoothing based on modeling of expectation of n-gram occurrence
\jour Tr. SPIIRAN
\yr 2011
\vol 19
\pages 146--158
\mathnet{http://mi.mathnet.ru/trspy440}
Linking options:
  • https://www.mathnet.ru/eng/trspy440
  • https://www.mathnet.ru/eng/trspy/v19/p146