Statistical text language recognition with the use of $n$-gram frequency
Yu. N. Orlov, S. A. Shilin
Abstract:
Statistical properties of European-language texts are investigated using a recognition procedure for $n$-gram distribution patterns. A numerical algorithm is constructed for analyzing the Hurst exponent of the letter-distance distributions of a text fragment. The accuracy of binary recognition is estimated at 0.99.
Keywords:
text language recognition, $n$-gram frequency.
Citation:
Yu. N. Orlov, S. A. Shilin, “Statistical text language recognition with the use of $n$-gram frequency”, Keldysh Institute preprints, 2017, 032, 21 pp.
Linking options:
https://www.mathnet.ru/eng/ipmp2248 https://www.mathnet.ru/eng/ipmp/y2017/p32