Computer science
Research of features of Dostoevsky's publicistic style by using $n$-grams based on the materials of the “Time” and “Epoch” magazines
R. V. Abramov, K. A. Kulakov, A. A. Lebedev, N. D. Moskin, A. A. Rogov
Petrozavodsk State University, 33, pr. Lenina, Petrozavodsk, 185910, Russian Federation
Abstract:
The paper is devoted to the study of the publicistic style of F. M. Dostoevsky on the basis of publications in the journals “Time” and “Epoch” (1861–1865). For this purpose, text fragments of 500, 700 and 1000 words were selected (including texts by other authors: M. M. Dostoevsky, N. N. Strakhov, A. A. Golovachev, etc.), and the occurrences of bigrams and trigrams (encoded sequences of parts of speech) in them were counted. Decision trees were built from these counts, and the accuracy of text recognition was analyzed. For classification at the first level of the tree (fragment size 1000), the accuracy was on average 87% [...] resulting decision trees.
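The counting step described in the abstract can be illustrated with a minimal sketch. It assumes each text is already available as a sequence of part-of-speech tags; the tag names, the function names, and the choice to drop an incomplete trailing fragment are all hypothetical and are not taken from the paper or from the information system it mentions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def split_into_fragments(tags: Sequence[str], size: int) -> List[Sequence[str]]:
    """Cut a tag sequence into consecutive fragments of exactly `size` tags.

    Assumption: a trailing fragment shorter than `size` is dropped.
    """
    return [tags[i:i + size] for i in range(0, len(tags) - size + 1, size)]


def ngram_counts(tags: Sequence[str], n: int) -> Counter:
    """Count every run of n adjacent part-of-speech tags in one fragment."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def ngram_frequencies(tags: Sequence[str], n: int) -> Dict[Tuple[str, ...], float]:
    """Relative frequency of each n-gram, usable as a feature vector."""
    counts = ngram_counts(tags, n)
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


# Hypothetical tag sequence standing in for a tagged text.
sample = ["NOUN", "VERB", "NOUN", "VERB", "ADJ", "NOUN"]
fragments = split_into_fragments(sample, 3)
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)
```

In a setup like the one described, the fragment size would be 500, 700 or 1000 words, and the per-fragment bigram and trigram frequencies would serve as input features for a decision-tree learner.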
Keywords:
publicistic style, text attribution, decision tree, $n$-gram, F. M. Dostoevsky, information system “Statistical methods for analyzing literary texts”, tree matching.
Received: December 25, 2020 Accepted: October 13, 2021
Citation:
R. V. Abramov, K. A. Kulakov, A. A. Lebedev, N. D. Moskin, A. A. Rogov, “Research of features of Dostoevsky's publicistic style by using $n$-grams based on the materials of the “Time” and “Epoch” magazines”, Vestnik S.-Petersburg Univ. Ser. 10. Prikl. Mat. Inform. Prots. Upr., 17:4 (2021), 389–396
Linking options:
https://www.mathnet.ru/eng/vspui505 https://www.mathnet.ru/eng/vspui/v17/i4/p389