This article is cited in 1 scientific paper (total in 1 paper)
The construction and analysis of the Russian language models for a cryptographic algorithm research
A. G. Malashina, A. B. Los
National Research University «Higher School of Economics» (Moscow)
Abstract:
The article provides a statistical analysis of the properties of lexical and $n$-gram models of the Russian language based on a news text corpus. A specialized corpus of recent political news articles was created, reflecting a narrow domain of language use. Token and $n$-gram dictionaries are compiled, and their coverage and entropy values are computed. The original text corpus is lemmatized, and the dictionary sizes are extrapolated.
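The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes character-level $n$-grams, takes entropy to be the Shannon entropy of the empirical $n$-gram distribution, and takes coverage to be the share of all occurrences accounted for by the `k` most frequent dictionary entries. The function names `ngram_counts`, `ngram_entropy`, and `coverage` are hypothetical, and the input text is assumed to be non-empty and at least `n` characters long.

```python
import math
from collections import Counter


def ngram_counts(text, n):
    """Count overlapping character n-grams (assumption: character level)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_entropy(counts):
    """Shannon entropy, in bits per n-gram, of the empirical distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def coverage(counts, k):
    """Share of all n-gram occurrences covered by the k most frequent n-grams."""
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total


if __name__ == "__main__":
    # Toy input; a real study would read a large news corpus instead.
    sample = "новости политики и новости экономики"
    bigrams = ngram_counts(sample, 2)
    print(len(bigrams), ngram_entropy(bigrams), coverage(bigrams, 10))
```

On a real corpus the same functions could be applied for several values of `n` to compare how dictionary size, coverage, and entropy change with $n$-gram length.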
Keywords:
$n$-gram dictionaries, $n$-gram entropy, meaningful texts.
Received: 30.09.2020 Accepted: 22.06.2022
Citation:
A. G. Malashina, A. B. Los, “The construction and analysis of the Russian language models for a cryptographic algorithm research”, Chebyshevskii Sb., 23:2 (2022), 151–160
Linking options:
https://www.mathnet.ru/eng/cheb1182
https://www.mathnet.ru/eng/cheb/v23/i2/p151