Sibirskie Èlektronnye Matematicheskie Izvestiya [Siberian Electronic Mathematical Reports]

Sibirskie Èlektronnye Matematicheskie Izvestiya [Siberian Electronic Mathematical Reports], 2020, Volume 17, Pages 1959–1974
DOI: https://doi.org/10.33048/semi.2020.17.132
(Mi semr1326)
 

This article is cited in 1 scientific paper (total in 1 paper)

Probability theory and mathematical statistics

A statistical test for correspondence of texts to the Zipf—Mandelbrot law

A. Chakrabarty (a), M. G. Chebunin (b, a), A. P. Kovalevskii (c, a), I. M. Pupyshev (c, a), N. S. Zakrevskaya (c), Q. Zhou (d)

a Novosibirsk State University, 1, Pirogova str., Novosibirsk, 630090, Russia
b Sobolev Institute of Mathematics, 4, Koptyuga ave., Novosibirsk, 630090, Russia
c Novosibirsk State Technical University, 20, K. Marksa ave., Novosibirsk, 630073, Russia
d School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
Abstract: We analyse the correspondence of texts to a simple probabilistic model. The model assumes that words are selected independently from an infinite dictionary and that the probability distribution of words follows the Zipf–Mandelbrot law. We count the number of different words in the text sequentially, obtaining the process of the numbers of different words. We then estimate the parameters of the Zipf–Mandelbrot law from the same sequence and construct an estimate of the expected number of different words in the text. Next we subtract the corresponding values of this estimate from the sequence and normalize along the coordinate axes, obtaining a random process on the segment from $0$ to $1$. We prove that this process (the empirical text bridge) converges weakly in the uniform metric on $C(0, 1)$ to a centered Gaussian process with a.s. continuous paths. We develop and implement an algorithm for calculating the probability distribution of the integral of the square of this process. We present several examples of applying the algorithm to analyse the homogeneity of texts in English, French, Russian, and Chinese.
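The first steps described in the abstract can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the power-law centering R_n (k/n)^theta, the estimator theta = log2(R_n / R_{n/2}), and the normalization by sqrt(R_n) are assumptions chosen for simplicity, since the abstract does not state the paper's exact estimator or normalization. The sketch computes only the integral-of-square statistic, not its limiting distribution.

```python
import math


def distinct_word_counts(words):
    """Return R_1..R_n, where R_k is the number of different words
    among the first k words of the text."""
    seen = set()
    counts = []
    for word in words:
        seen.add(word)
        counts.append(len(seen))
    return counts


def empirical_bridge(counts):
    """Center and normalize the counts (assumed form, see lead-in).

    Assumption: the expected count is approximated by R_n * (k/n)**theta,
    with theta estimated as log2(R_n / R_{n/2}).
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two words")
    r_n = counts[-1]
    r_half = counts[n // 2 - 1]  # R at (roughly) the midpoint of the text
    theta = math.log2(r_n / r_half)
    bridge = [
        (counts[k - 1] - (k / n) ** theta * r_n) / math.sqrt(r_n)
        for k in range(1, n + 1)
    ]
    return theta, bridge


def integral_of_square(bridge):
    """Riemann-sum approximation of the integral of Z(t)^2 over [0, 1]."""
    return sum(z * z for z in bridge) / len(bridge)


if __name__ == "__main__":
    text = "a b a c b d a e".split()
    counts = distinct_word_counts(text)
    theta, bridge = empirical_bridge(counts)
    print(counts, theta, integral_of_square(bridge))
```

By construction the centered process vanishes at t = 1, which is why it is natural to call it a bridge. A large value of the statistic would suggest that the text deviates from the assumed model, but judging significance requires the limiting distribution derived in the paper.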
Keywords: Zipf's law, weak convergence, Gaussian process.
Funding agency: Russian Foundation for Basic Research. Grant number: 19-51-53010.
The reported study was funded by RFBR and NSFC according to the research project No. 19-51-53010.
Received September 28, 2020, published November 27, 2020
Document Type: Article
UDC: 519.233
MSC: 62F03
Language: English
Citation: A. Chakrabarty, M. G. Chebunin, A. P. Kovalevskii, I. M. Pupyshev, N. S. Zakrevskaya, Q. Zhou, “A statistical test for correspondence of texts to the Zipf—Mandelbrot law”, Sib. Èlektron. Mat. Izv., 17 (2020), 1959–1974
Citation in format AMSBIB
\Bibitem{ChaCheKov20}
\by A.~Chakrabarty, M.~G.~Chebunin, A.~P.~Kovalevskii, I.~M.~Pupyshev, N.~S.~Zakrevskaya, Q.~Zhou
\paper A statistical test for correspondence of texts to the Zipf---Mandelbrot law
\jour Sib. \`Elektron. Mat. Izv.
\yr 2020
\vol 17
\pages 1959--1974
\mathnet{http://mi.mathnet.ru/semr1326}
\crossref{https://doi.org/10.33048/semi.2020.17.132}
\isi{https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=Publons&SrcAuth=Publons_CEL&DestLinkType=FullRecord&DestApp=WOS_CPL&KeyUT=000593965200001}
Linking options:
  • https://www.mathnet.ru/eng/semr1326
  • https://www.mathnet.ru/eng/semr/v17/p1959