PATTERN RECOGNITION
Comparative analysis of four methods for identifying letters of texts
Yu. A. Kotov
Novosibirsk State Technical University, Novosibirsk, Russia
Abstract:
The article compares four known frequency-based methods for identifying the letters of a text, which are needed in applied problems of cryptanalysis, steganography, and general text analysis (known in computer science as text mining). To obtain a complete and unified characterization of the methods, an evaluation procedure is proposed that measures three identification errors and combines them into an integral characteristic called the goodness of the method. Using this procedure, one unigram method and three bigram methods of letter identification were compared experimentally and analyzed qualitatively. The comparison was made on representative samples of fragments of Russian texts. The qualitative and quantitative features of the methods, the limits of their effective use, and their dependence on the type and volume of the processed text are determined.
It is also shown that, for frequency methods applied to Russian-language texts, an important threshold of text volume is approximately 4,000 characters. This volume is sufficient to identify the alphabet characters of a Russian-language text by frequency with minimal error and, in some cases, to obtain an exact solution. It is further shown that, for texts of this size or larger, frequency methods of alphabet character identification and the proposed estimates of their errors can be used to quantify certain stylistic features of the text.
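As an illustration of the general idea behind unigram frequency identification for a one-to-one substitution, the following minimal Python sketch matches cipher symbols to letters by frequency rank. It is not the author's method: the abstract does not define the four compared methods, the three identification errors, or the goodness characteristic, so the function names (`rank_by_frequency`, `identify_unigram`, `symbol_error_rate`) and the single error measure used here are assumptions made for illustration only.

```python
from collections import Counter


def rank_by_frequency(text, alphabet):
    """Return the alphabet's symbols ordered from most to least frequent in text."""
    counts = Counter(ch for ch in text if ch in alphabet)
    # Ties are broken by alphabet order so that the ranking is deterministic.
    return sorted(alphabet, key=lambda ch: (-counts[ch], alphabet.index(ch)))


def identify_unigram(cipher_text, reference_text, cipher_alphabet, plain_alphabet):
    """Guess a cipher-symbol -> letter mapping by pairing equal frequency ranks."""
    cipher_ranked = rank_by_frequency(cipher_text, cipher_alphabet)
    plain_ranked = rank_by_frequency(reference_text, plain_alphabet)
    return dict(zip(cipher_ranked, plain_ranked))


def symbol_error_rate(guess, true_key):
    """Share of cipher symbols whose guessed letter differs from the true one.

    This is an illustrative measure, not one of the paper's three errors.
    """
    wrong = sum(1 for symbol, letter in true_key.items() if guess.get(symbol) != letter)
    return wrong / len(true_key)
```

On a toy three-letter alphabet, calling `identify_unigram("xyxzxy", "aaabbc", "xyz", "abc")` pairs the most frequent cipher symbol with the most frequent reference letter, and so on down the ranking. With short texts the ranks are unstable and the error rate rises, which is consistent with the abstract's point that text volume matters for frequency methods.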
Keywords:
text, alphabet character, unigram, bigram, identification, one-to-one substitution, cipher, text analysis.
Citation:
Yu. A. Kotov, “Comparative analysis of four methods for identifying letters of texts”, Informatsionnye Tekhnologii i Vychislitel'nye Sistemy, 2019, no. 3, 41–56.
Linking options:
https://www.mathnet.ru/eng/itvs352
https://www.mathnet.ru/eng/itvs/y2019/i3/p41