M. Yu. Voronina, A. A. Kislitsyn, Yu. N. Orlov, “Algorithm of the correction of bigram method for the problem of the text author identification”, Mat. Model., 34:9 (2022), 3–20; Math. Models Comput. Simul., 15:2 (2023), 245

Matematicheskoe modelirovanie

Matematicheskoe modelirovanie, 2022, Volume 34, Number 9, Pages 3–20
DOI: https://doi.org/10.20948/mm-2022-09-01 (Mi mm4401)

This article is cited in 2 scientific papers (total in 2 papers)

Algorithm of the correction of bigram method for the problem of the text author identification

M. Yu. Voronina, A. A. Kislitsyn, Yu. N. Orlov

Keldysh Institute of Applied Mathematics of RAS

Abstract: The paper proposes a model for identifying the authors of literary texts based on the proximity of an individual text to an author's standard. The standard is the empirical frequency distribution of two-letter combinations (bigrams), built from all reliably attributed works of the author. Proximity is measured in the $L_1$ norm. An unknown text is attributed to the author whose standard is closest to it. Identification relies on a library of authors, each represented by enough works to define the corresponding bigram standard. Testing on the authors of the library showed the method to be very accurate: in a corpus of 1783 texts by 100 authors, the recognition error of the best method was 0.12. Importantly, once the erroneously recognized texts were excluded, a library of 88 authors and 1450 texts remained, every one of which was identified correctly. The problem under study is estimating the probability that the standard of the tested text's author is absent from the library. To address it, the paper analyzes how the probability of erroneous identification depends on the length of the text. For a subgroup of texts that are identified without error, the empirical probability of correctly recognizing a text fragment decreases as the fragment gets shorter, but still exceeds 0.5 as long as the text is split into no more than 10 parts. With smaller fragments, some of them are identified incorrectly. If the correct standard is excluded from consideration, the second-closest standard is assigned in its place, but this assignment is unstable: the attribution of fragments becomes ambiguous as soon as the text is cut into 4 fragments. Thus, the stability of author identification across text fragments can be proposed as a criterion for the correctness of the method.
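The nearest-standard rule and the fragment check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the letters-only preprocessing (bigrams are allowed to span word boundaries), the equal-length fragmentation, and the toy two-author library are all assumptions made for illustration.

```python
from collections import Counter


def bigram_counts(text):
    """Count adjacent letter pairs; non-letters are dropped (an assumption)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return Counter(a + b for a, b in zip(letters, letters[1:]))


def normalize(counts):
    """Turn raw counts into an empirical frequency distribution."""
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()} if total else {}


def build_standard(texts):
    """Author standard: bigram distribution pooled over all known texts."""
    pooled = Counter()
    for t in texts:
        pooled.update(bigram_counts(t))
    return normalize(pooled)


def l1_distance(p, q):
    """L1 distance between two distributions given as dicts."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def identify(text, standards):
    """Attribute the text to the author with the closest standard."""
    dist = normalize(bigram_counts(text))
    return min(standards, key=lambda author: l1_distance(dist, standards[author]))


def fragment_votes(text, standards, parts):
    """Cut the text into `parts` equal-length fragments and identify each one."""
    size = len(text) // parts
    fragments = [text[i * size:(i + 1) * size] for i in range(parts)]
    return Counter(identify(f, standards) for f in fragments)


# Hypothetical toy library; real standards would need far longer texts.
library = {
    "A": ["aaaa bbbb aaaa", "abab abab"],
    "B": ["xyxy xyxy", "yyyy xxxx"],
}
standards = {author: build_standard(texts) for author, texts in library.items()}

print(identify("abba abba", standards))
print(fragment_votes("xyxyxyxyyyxx", standards, 2))
```

In this sketch, a stable identification corresponds to `fragment_votes` returning a single author for every fragment. Removing an author from `standards` and repeating the vote imitates the case where the true standard is absent from the library.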

Keywords: text, author, bigram distribution, fragment identification, correction of error probability.

Received: 07.04.2022
Revised: 23.05.2022
Accepted: 27.06.2022

English version:
Mathematical Models and Computer Simulations, 2023, Volume 15, Issue 2, Pages 245–254
DOI: https://doi.org/10.1134/S2070048223020175

Document Type: Article

Language: Russian

Citation: M. Yu. Voronina, A. A. Kislitsyn, Yu. N. Orlov, “Algorithm of the correction of bigram method for the problem of the text author identification”, Mat. Model., 34:9 (2022), 3–20; Math. Models Comput. Simul., 15:2 (2023), 245–254

Citation in format AMSBIB

\Bibitem{VorKisOrl22}
\by M.~Yu.~Voronina, A.~A.~Kislitsyn, Yu.~N.~Orlov
\paper Algorithm of the correction of bigram method for the problem of the text author identification
\jour Mat. Model.
\yr 2022
\vol 34
\issue 9
\pages 3--20
\mathnet{http://mi.mathnet.ru/mm4401}
\crossref{https://doi.org/10.20948/mm-2022-09-01}
\mathscinet{http://mathscinet.ams.org/mathscinet-getitem?mr=4515346}
\transl
\jour Math. Models Comput. Simul.
\yr 2023
\vol 15
\issue 2
\pages 245--254
\crossref{https://doi.org/10.1134/S2070048223020175}

Linking options:

https://www.mathnet.ru/eng/mm4401

https://www.mathnet.ru/eng/mm/v34/i9/p3

This publication is cited in the following 2 articles:

M. Yu. Kislitsyna, Yu. N. Orlov, “Struktura oshibok raspoznavaniya avtora teksta metodom trigramm”, Preprinty IPM im. M. V. Keldysha, 2024, 060, 24 pp.
A. A. Kislitsyn, M. Yu. Kislitsyna, “Raspoznavanie vyborochnykh raspredelenii sredi sistemy etalonov: metod blizhaishego soseda”, Preprinty IPM im. M. V. Keldysha, 2023, 029, 21 pp.
