Computer Research and Modeling
Computer Research and Modeling, 2017, Volume 9, Issue 5, Pages 837–850
DOI: https://doi.org/10.20537/2076-7633-2017-9-5-837-850
(Mi crm103)
 

This article is cited in 5 scientific papers (total in 5 papers)

MODELS OF ECONOMIC AND SOCIAL SYSTEMS

A novel method of stylometry based on the statistic of numerals

A. V. Zenkov (a, b)

a Ural Federal University, Mira st. 19, Ekaterinburg, 620002, Russia
b The Ural State University of Economics, 8th of March st. 62, Ekaterinburg, 620144, Russia
Abstract: A new method for the statistical analysis of texts is proposed. We consider the frequency distribution of the first significant digits of numerals in English-language texts, taking into account both cardinal and ordinal numerals, whether written in figures or in words. To isolate the author's own use of numerals, we first removed from each text all idiomatic expressions and set phrases that merely happen to contain numerals, as well as itemizations, page numbers, and the like. Benford's law is found to hold approximately for the first-significant-digit frequencies in composite literary texts by different authors; a marked predominance of the digit $1$ is observed. In coherent texts by a single author, characteristic deviations from Benford's law arise. These deviations are statistically stable and significant authorial peculiarities that, under certain conditions, make it possible to address questions of authorship and to distinguish between texts by different authors. The text should be large enough (at least about $200$ kB). Toward the end of the digit sequence $\{1, 2, \dots, 9\}$, the frequency distribution is subject to strong fluctuations and is therefore unrepresentative for our purpose. A theoretical explanation of the observed empirical regularity is not attempted here; this does not, however, preclude applying the proposed methodology to text attribution. The approach and the conclusions are supported by computer analysis of works by W. M. Thackeray, M. Twain, R. L. Stevenson, J. Joyce, the Brontë sisters, and J. Austen. Using the proposed technique, we examined the authorship of a text previously ascribed to L. F. Baum (the result agrees with that obtained by other means). We find that “To Kill a Mockingbird” is attributable to Harper Lee, whereas the primary draft, “Go Set a Watchman”, seems to have been written in collaboration with Truman Capote.
All results are confirmed by the parametric Pearson's chi-squared test as well as by the non-parametric Mann–Whitney $\mathrm{U}$ and Kruskal–Wallis tests.
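
Illustrative sketch (not taken from the article): Benford's law assigns the first significant digit $d \in \{1, \dots, 9\}$ the probability $\log_{10}(1 + 1/d)$. The Python fragment below shows, under simplifying assumptions, how first-digit counts might be collected from a text and compared with these probabilities by Pearson's chi-squared statistic. It handles only numerals written in figures; the numerals written in words and the removal of idioms, itemizations, and page numbers described in the abstract are not implemented. All names (`BENFORD`, `NUMERAL_RE`, `first_digit_counts`, `chi_squared_vs_benford`) are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter

# Benford's law: P(d) = log10(1 + 1/d) for d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Assumption: a numeral is a run of digits, optionally with thousands
# commas and a decimal part, e.g. "1847", "3,000", "0.25".
NUMERAL_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def first_significant_digit(numeral):
    """Return the first non-zero digit of a numeral string, or None."""
    for ch in numeral:
        if ch.isdigit() and ch != "0":
            return int(ch)
    return None


def first_digit_counts(text):
    """Count first significant digits of all figure-style numerals in text."""
    counts = Counter()
    for match in NUMERAL_RE.finditer(text):
        digit = first_significant_digit(match.group())
        if digit is not None:
            counts[digit] += 1
    return counts


def chi_squared_vs_benford(counts):
    """Pearson's chi-squared statistic of observed counts against Benford."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no numerals found")
    return sum(
        (counts.get(d, 0) - total * p) ** 2 / (total * p)
        for d, p in BENFORD.items()
    )


sample = "In 1847 he paid 3,000 pounds for 12 horses and 0.25 acres; 100 men came."
counts = first_digit_counts(sample)
statistic = chi_squared_vs_benford(counts)
```

With nine digit classes the statistic has 8 degrees of freedom. A sample this small is only a demonstration of the mechanics; the abstract states that a text of at least about 200 kB is needed for meaningful results.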
Keywords: text attribution, first significant digit of numerals.
Received: 01.07.2017
Accepted: 14.08.2017
Document Type: Article
UDC: 51-78, 519.234.3, 519.257, 81-139
Language: Russian
Citation: A. V. Zenkov, “A novel method of stylometry based on the statistic of numerals”, Computer Research and Modeling, 9:5 (2017), 837–850
Citation in format AMSBIB
\Bibitem{Zen17}
\by A.~V.~Zenkov
\paper A novel method of stylometry based on the statistic of numerals
\jour Computer Research and Modeling
\yr 2017
\vol 9
\issue 5
\pages 837--850
\mathnet{http://mi.mathnet.ru/crm103}
\crossref{https://doi.org/10.20537/2076-7633-2017-9-5-837-850}
Linking options:
  • https://www.mathnet.ru/eng/crm103
  • https://www.mathnet.ru/eng/crm/v9/i5/p837