This article is cited in 1 scientific paper (total in 1 paper)
Two-factor patterns construction in problems of texts classification
M. Yu. Voronina, A. A. Kislitsyn, Yu. N. Orlov
Abstract:
Two-factor patterns of empirical bigram frequency distributions are constructed for machine classification of texts by author and by subject. Text attributes are recognized by the nearest neighbor method with respect to reference distributions, with the proximity between distributions measured in the L1 norm. The 'author-topic' pair of an unknown text is taken to be that of its nearest neighbor pattern. The problems of recognizing the author regardless of the topic of the text, and the topic regardless of the author, are analyzed. The possibilities of coarsening and refining the classification features are also investigated.
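The classification rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes letter bigrams counted over the lowercased text with non-letters dropped (the abstract does not say how bigrams are formed), and the names `bigram_distribution`, `l1_distance`, and `classify` are hypothetical.

```python
from collections import Counter


def bigram_distribution(text):
    """Empirical distribution (relative frequencies) of letter bigrams.

    Assumption: bigrams are pairs of consecutive letters after lowercasing
    and removing every non-letter character.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(zip(letters, letters[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bigram: n / total for bigram, n in counts.items()}


def l1_distance(p, q):
    """L1 norm of the difference between two bigram distributions."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def classify(text, references):
    """Return the (author, topic) label of the nearest reference pattern.

    `references` maps (author, topic) pairs to reference distributions.
    """
    dist = bigram_distribution(text)
    return min(references, key=lambda label: l1_distance(dist, references[label]))


# Toy usage with made-up reference texts (illustrative only).
references = {
    ("author_a", "topic_x"): bigram_distribution("aaaa aaaa aaab"),
    ("author_b", "topic_y"): bigram_distribution("bbbb bbbb bbba"),
}
label = classify("aaab aaaa", references)
```

In this sketch each reference pattern is keyed by an (author, topic) pair, so a single nearest-neighbor search yields both attributes at once.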
Keywords:
machine classification, text, bigram distribution, spectral portrait, clustering.
Citation:
M. Yu. Voronina, A. A. Kislitsyn, Yu. N. Orlov, “Two-factor patterns construction in problems of texts classification”, Keldysh Institute preprints, 2022, 043, 24 pp.
Linking options:
https://www.mathnet.ru/eng/ipmp3069 https://www.mathnet.ru/eng/ipmp/y2022/p43