Yu. A. Kotov, “Comparative analysis of four methods for identifying letters of texts”, Informatsionnye Tekhnologii i Vychslitel'nye Sistemy, 2019, no. 3, 41

Informatsionnye Tekhnologii i Vychslitel'nye Sistemy, 2019, Issue 3, Pages 41–56
DOI: https://doi.org/10.14357/20718632190304 (Mi itvs352)

PATTERN RECOGNITION

Comparative analysis of four methods for identifying letters of texts

Yu. A. Kotov

Novosibirsk State Technical University, Novosibirsk, Russia

Abstract: The article compares four known frequency-based methods for identifying the letters of a text. Such methods are needed in applied cryptanalysis, steganography, and the general text-analysis problems known in computer science as text mining. To obtain a complete and uniform characterization of the methods, an evaluation procedure is proposed that measures three identification errors and combines them into an integral characteristic called the goodness of the method. Using this procedure, one unigram method and three bigram methods for identifying letters of texts were compared experimentally and analyzed qualitatively. The comparison was made on representative samples of fragments of Russian texts. The qualitative and quantitative features of the methods, the limits of their effective use, and their dependence on the type and volume of the processed text are determined.
It is also shown that, for frequency methods applied to Russian-language texts, an important threshold of text volume is approximately 4,000 characters. This volume is sufficient to identify the alphabet characters of a Russian-language text by frequency with minimal error and, in some cases, to obtain an exact solution. With this or a larger volume of text, frequency methods for identifying alphabet characters, together with the proposed estimates of their errors, can be used to quantify certain stylistic features of the text.
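As a rough illustration of what unigram frequency identification means under a one-to-one substitution, the following Python sketch matches cipher symbols to plain letters by frequency rank and reports the share of misidentified symbols. It is a generic textbook-style sketch, not the procedure evaluated in the article: the function names, the toy three-letter alphabets, and the single error measure are all assumptions made for illustration, and the abstract does not define the three errors or the goodness characteristic used by the author.

```python
from collections import Counter


def rank_by_frequency(text, alphabet):
    """Return the alphabet's symbols ordered from most to least frequent in text."""
    counts = Counter(ch for ch in text if ch in alphabet)
    # Descending count; ties are broken by alphabet order to keep the result deterministic.
    return sorted(alphabet, key=lambda ch: (-counts[ch], alphabet.index(ch)))


def identify_unigram(ciphertext, reference_text, cipher_alphabet, plain_alphabet):
    """Map each cipher symbol to the plain letter that has the same frequency rank."""
    cipher_rank = rank_by_frequency(ciphertext, cipher_alphabet)
    plain_rank = rank_by_frequency(reference_text, plain_alphabet)
    return dict(zip(cipher_rank, plain_rank))


def misidentified_share(guess, true_key):
    """Hypothetical error measure: share of cipher symbols mapped to the wrong letter."""
    wrong = sum(1 for c, p in true_key.items() if guess.get(c) != p)
    return wrong / len(true_key)


# Toy example (assumed data): plain alphabet "abc" enciphered as a->x, b->y, c->z.
true_key = {"x": "a", "y": "b", "z": "c"}   # cipher symbol -> plain letter
reference_text = "aaabbc"                   # stands in for a reference corpus
ciphertext = "xyxzxy"                       # encipherment of "abacab"

guess = identify_unigram(ciphertext, reference_text, "xyz", "abc")
print(guess, misidentified_share(guess, true_key))
# prints: {'x': 'a', 'y': 'b', 'z': 'c'} 0.0
```

In this toy case the letter frequencies of the ciphertext follow the reference frequencies, so the identification is exact. When the frequencies of a short fragment deviate from the reference, rank matching misassigns symbols, which is consistent with the abstract's remark that the achievable error depends on the volume of the text.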

Keywords: text, alphabet character, unigram, bigram, identification, one-to-one substitution, cipher, text analysis.

Document Type: Article

Language: Russian

Citation: Yu. A. Kotov, “Comparative analysis of four methods for identifying letters of texts”, Informatsionnye Tekhnologii i Vychslitel'nye Sistemy, 2019, no. 3, 41–56

Citation in format AMSBIB

\Bibitem{Kot19}
\by Yu.~A.~Kotov
\paper Comparative analysis of four methods for identifying letters of texts
\jour Informatsionnye Tekhnologii i Vychslitel'nye Sistemy
\yr 2019
\issue 3
\pages 41--56
\mathnet{http://mi.mathnet.ru/itvs352}
\crossref{https://doi.org/10.14357/20718632190304}

Linking options:

https://www.mathnet.ru/eng/itvs352

https://www.mathnet.ru/eng/itvs/y2019/i3/p41
