Graph $n$-grams in the text attribution problem
N. D. Moskin, A. A. Rogov, A. A. Lebedev
Petrozavodsk State University, 33 Lenina Prosp., Petrozavodsk 185910, Russian Federation
Abstract:
The paper presents results of research on modeling the structure of texts with a generalized context-dependent graph-theoretic model. The study focuses mainly on literary and folklore texts for which the attribution problem arises; many such texts are found, for example, among the works of the famous Russian writer F. M. Dostoevsky. The authors show how to build hybrid models based on dependency trees, graph models of the syntactic structure of links between simple sentences in a multicomponent complex sentence, and “strong links” graphs of word combinations of different grammatical classes. Such models make it possible to construct new informative features that are potentially applicable to text attribution. One example is the frequency of occurrence of graph $n$-grams, which generalize ordinary $n$-grams, syntactic $n$-grams, and other similar constructions used in stylistic studies. The article also discusses a format for storing texts, their generalized graph models, and graph $n$-grams.
Keywords:
artificial intelligence, text attribution, graph, metagraph, hybrid graph, folklore text, literary text, graph $n$-gram.
Received: 01.07.2023
Citation:
N. D. Moskin, A. A. Rogov, A. A. Lebedev, “Graph $n$-grams in the text attribution problem”, Sistemy i Sredstva Inform., 33:4 (2023), 115–125
Linking options:
https://www.mathnet.ru/eng/ssi916 https://www.mathnet.ru/eng/ssi/v33/i4/p115