Informatsionnye Tekhnologii i Vychslitel'nye Sistemy
Informatsionnye Tekhnologii i Vychslitel'nye Sistemy, 2019, Issue 3, Pages 41–56
DOI: https://doi.org/10.14357/20718632190304
(Mi itvs352)
 

PATTERN RECOGNITION

Comparative analysis of four methods for identifying letters of texts

Yu. A. Kotov

Novosibirsk State Technical University, Novosibirsk, Russia
Abstract: The article compares four known frequency methods for identifying the letters of a text. Such methods are needed for applied problems in cryptanalysis, steganography, and general text analysis, known in computer science as text mining. To obtain a complete and uniform characterization of the methods, an evaluation procedure is proposed that measures three identification errors and combines them into an integral characteristic called the goodness of the method. Using this procedure, one unigram method and three bigram methods of letter identification were compared experimentally and analyzed qualitatively on representative samples of fragments of Russian texts. The qualitative and quantitative features of the methods, the limits of their effective use, and their dependence on the type and volume of the processed text are determined.
It is also shown that an important text-volume threshold for frequency methods applied to Russian-language texts is approximately 4,000 characters. This volume is sufficient to identify the alphabet characters of a Russian-language text by frequency with minimal error and, in some cases, to obtain an exact solution. It is further shown that, for texts of this size or larger, frequency methods of alphabet character identification and the proposed error estimates can be used to quantify certain stylistic features of the text.
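The abstract does not spell out the four methods, so the following is only a minimal sketch of the general idea behind a unigram frequency method for a one-to-one substitution cipher: letters of the enciphered text are ranked by frequency and matched to letters of a reference text with the same rank. The function names and the single error measure (share of wrongly identified letters) are assumptions made for illustration; they are not the author's procedure and not one of the three errors defined in the article.

```python
from collections import Counter


def rank_by_frequency(text, alphabet):
    """Return the alphabet's letters ordered from most to least frequent in text."""
    counts = Counter(ch for ch in text if ch in alphabet)
    # Ties are broken by alphabet order so the result is deterministic.
    return sorted(alphabet, key=lambda ch: (-counts[ch], alphabet.index(ch)))


def identify_unigram(ciphertext, reference_text, alphabet):
    """Map each cipher letter to the reference letter of the same frequency rank."""
    cipher_rank = rank_by_frequency(ciphertext, alphabet)
    reference_rank = rank_by_frequency(reference_text, alphabet)
    return dict(zip(cipher_rank, reference_rank))


def identification_error(mapping, true_key):
    """Hypothetical error measure: share of letters identified incorrectly.

    true_key maps each cipher letter to its actual plaintext letter.
    """
    wrong = sum(1 for c, p in mapping.items() if true_key[c] != p)
    return wrong / len(mapping)


if __name__ == "__main__":
    # Toy three-letter alphabet; real use would pass the Russian alphabet
    # and a large reference corpus.
    alphabet = "abc"
    true_key = {"b": "a", "c": "b", "a": "c"}  # cipher letter -> plain letter
    mapping = identify_unigram("bbbcca", "aaabbc", alphabet)
    print(mapping, identification_error(mapping, true_key))
```

On short texts the frequency ranks of rare letters are unstable, which is consistent with the abstract's observation that the error depends on text volume.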
Keywords: text, alphabet character, unigram, bigram, identification, one-to-one substitution, cipher, text analysis.
Document Type: Article
Language: Russian
Citation: Yu. A. Kotov, “Comparative analysis of four methods for identifying letters of texts”, Informatsionnye Tekhnologii i Vychslitel'nye Sistemy, 2019, no. 3, 41–56
Citation in format AMSBIB
\Bibitem{Kot19}
\by Yu.~A.~Kotov
\paper Comparative analysis of four methods for identifying letters of texts
\jour Informatsionnye Tekhnologii i Vychslitel'nye Sistemy
\yr 2019
\issue 3
\pages 41--56
\mathnet{http://mi.mathnet.ru/itvs352}
\crossref{https://doi.org/10.14357/20718632190304}
Linking options:
  • https://www.mathnet.ru/eng/itvs352
  • https://www.mathnet.ru/eng/itvs/y2019/i3/p41