Problemy Peredachi Informatsii, 2001, Volume 37, Issue 2, Pages 96–109
(Mi ppi520)
This article is cited in 94 scientific papers.
Source Coding
Using Literal and Grammatical Statistics for Authorship Attribution
O. V. Kukushkina, A. A. Polikarpov, D. V. Khmelev
Abstract:
Markov chains are used as a formal mathematical model for sequences of text elements, and this model is applied to authorship attribution. A text is treated either as a sequence of letters or as a sequence of the grammatical classes of its words. It turns out that the frequencies of letter pairs and of pairs of grammatical classes in a Russian text are rather stable characteristics of an author and, apparently, could be used to attribute texts of disputed authorship. Results are compared for several modifications of the method using both letters and grammatical classes. The experiments involve 385 texts by 82 writers. The Appendix describes research by D. V. Khmelev in which data compression algorithms are applied to authorship attribution.
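The letter-pair idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the add-one smoothing (`alpha`), and the `alphabet_size` parameter are assumptions introduced here for illustration. Each author's known text yields a table of letter-pair transition counts, and a disputed text is assigned to the author whose first-order Markov model gives it the lowest negative log-likelihood.

```python
import math
from collections import Counter, defaultdict


def transition_counts(text):
    """Count letter-pair transitions (a -> b), ignoring non-letters."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = defaultdict(Counter)
    for a, b in zip(letters, letters[1:]):
        counts[a][b] += 1
    return counts


def neg_log_likelihood(counts, model, alphabet_size=26, alpha=1.0):
    """Negative log-likelihood of the observed transitions under `model`.

    Assumption: additive smoothing with `alpha` keeps unseen pairs from
    producing log(0); the original paper may treat them differently.
    """
    total = 0.0
    for a, row in counts.items():
        model_row = model.get(a, Counter())
        denom = sum(model_row.values()) + alpha * alphabet_size
        for b, n in row.items():
            total -= n * math.log((model_row[b] + alpha) / denom)
    return total


def attribute(disputed_text, corpus, alphabet_size=26):
    """Return the author in `corpus` (author -> known text) whose
    letter-pair model best explains `disputed_text`."""
    observed = transition_counts(disputed_text)
    models = {author: transition_counts(text) for author, text in corpus.items()}
    return min(
        models,
        key=lambda author: neg_log_likelihood(observed, models[author], alphabet_size),
    )
```

For Russian texts, `alphabet_size` would be set to the size of the Cyrillic alphabet in use (for example 33). The same functions work on sequences of grammatical classes if the letter list is replaced by a list of class tags.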
Received: 08.08.2000 Revised: 11.01.2001
Citation:
O. V. Kukushkina, A. A. Polikarpov, D. V. Khmelev, “Using Literal and Grammatical Statistics for Authorship Attribution”, Probl. Peredachi Inf., 37:2 (2001), 96–109; Problems Inform. Transmission, 37:2 (2001), 172–184
Linking options:
https://www.mathnet.ru/eng/ppi520 https://www.mathnet.ru/eng/ppi/v37/i2/p96