Abstract:
The article discusses how frequency ordering and approximation methods apply to the problem of identifying text characters. It defines the conditions under which Jacobsen's method achieves its least identification error. A method is proposed for approximating the one- and two-dimensional distributions of character bigram frequencies of a text and of a language. Experimental data on the errors of Jacobsen's method and of the proposed approximation method are provided for Russian-language texts.
The error of the proposed method is lower than that of Jacobsen's method. The method can be used to identify text characters in any language for which a reference distribution of alphabetic character bigram frequencies is available.
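To make the idea of frequency ordering concrete, here is a minimal Python sketch of rank-based character identification, with a helper that computes the relative bigram frequencies such methods compare against a reference. It is an illustration under stated assumptions only: it is neither the approximation method proposed in the article nor a full implementation of Jacobsen's method, and the function names (`rank_by_frequency`, `identify_by_ordering`, `bigram_frequencies`) are hypothetical.

```python
from collections import Counter


def rank_by_frequency(text):
    """Return the distinct characters of text, most frequent first.

    Ties are broken by character order so the result is deterministic.
    """
    counts = Counter(text)
    return sorted(counts, key=lambda ch: (-counts[ch], ch))


def identify_by_ordering(unknown_text, reference_text):
    """Map each symbol of unknown_text to the reference character
    that holds the same frequency rank (assumed illustration only)."""
    unknown_ranked = rank_by_frequency(unknown_text)
    reference_ranked = rank_by_frequency(reference_text)
    # zip stops at the shorter list, so surplus symbols stay unmapped.
    return dict(zip(unknown_ranked, reference_ranked))


def bigram_frequencies(text):
    """Return relative frequencies of adjacent character pairs."""
    counts = Counter(zip(text, text[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


if __name__ == "__main__":
    mapping = identify_by_ordering("xxxyyz", "aaabbc")
    print(mapping)
    print(bigram_frequencies("abab"))
```

Rank matching of this kind is sensitive to characters with close frequencies, which is one reason bigram distributions are brought in as additional evidence.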
Citation:
Yu. A. Kotov, “Approximation of distributions of text characters bigrams frequencies for alphabetic characters identification”, Tr. SPIIRAN, 50 (2017), 190–208
Linking options:
https://www.mathnet.ru/eng/trspy932
https://www.mathnet.ru/eng/trspy/v50/p190
This publication is cited in the following article:
Yuri A. Kotov, Olga V. Sanina, 2018 XIV International Scientific-Technical Conference on Actual Problems of Electronics Instrument Engineering (APEIE), 2018, 175