The distribution of ordinal frequencies of consonants as an invariant of a language group
M. Yu. Kislitsyna, Yu. N. Orlov
Abstract:
Frequency statistics for consonant letters are collected for the main modern languages of the Indo-European family. The distributions of frequencies sorted in descending order are studied, based on literary texts about 1 million characters long. It is shown that an invariant can be introduced for the language groups (Germanic, Romance, Slavic and Baltic), defined as the distance between the elements of a group in the L1 norm. The threshold distance at which languages are grouped into fully connected subgraphs is 0.14. It is also shown that the structures of the near-neighbor and far-neighbor graphs correspond to a model of dependent random variables.
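The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The vowel set, the normalization over consonants only, the zero-padding of shorter distributions, and the use of "less than or equal" at the 0.14 threshold are all assumptions made for illustration; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumption: a Latin-alphabet vowel set; each language needs its own.
VOWELS = set("aeiouy")


def ordered_consonant_freqs(text, vowels=VOWELS):
    """Relative consonant frequencies sorted in descending order.

    Letter identity is discarded: only the rank-ordered values remain.
    Assumption: frequencies are normalized over consonants only.
    """
    letters = [ch for ch in text.lower() if ch.isalpha() and ch not in vowels]
    counts = Counter(letters)
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)


def l1_distance(p, q):
    """L1 distance between two ordered distributions.

    Assumption: the shorter distribution is padded with zeros.
    """
    n = max(len(p), len(q))
    p = list(p) + [0.0] * (n - len(p))
    q = list(q) + [0.0] * (n - len(q))
    return sum(abs(a - b) for a, b in zip(p, q))


def is_fully_connected(names, dists, threshold=0.14):
    """True if every pair in `names` lies within `threshold`.

    `dists` maps frozenset({a, b}) to the L1 distance between a and b.
    Assumption: the comparison is inclusive (<=).
    """
    return all(
        dists[frozenset(pair)] <= threshold
        for pair in combinations(names, 2)
    )
```

Because only the sorted frequencies are compared, two texts that use different consonants with the same frequency profile are at distance zero, which is what makes the ordered distribution usable across alphabets.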
Keywords:
machine classification, text preprocessing, ordered frequency distribution, nearest neighbor graph.
Citation:
M. Yu. Kislitsyna, Yu. N. Orlov, “The distribution of ordinal frequencies of consonants as an invariant of a language group”, Keldysh Institute preprints, 2024, 016, 18 pp.
Linking options:
https://www.mathnet.ru/eng/ipmp3226 https://www.mathnet.ru/eng/ipmp/y2024/p16