This article is cited in 2 scientific papers (total in 2 papers)
On the main types of relatedness between text documents
M. M. Charnine, N. V. Somin
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
Abstract:
This paper considers the relatedness of natural language texts based on textual features (fragments). Two types of relatedness are identified: explicit relatedness, when texts are linked by bibliographic references, and implicit relatedness, when texts are linked through common text fragments. The advantages and applications of implicit relatedness are discussed. It is shown that using implicit relatedness significantly extends the scope of text processing techniques that rely on the relatedness of texts. Measures of explicit and implicit relatedness are proposed. An experiment was conducted on a set of texts from the subject area of “computer graphics”. The experiment showed that the two types of relatedness are correlated. The authors determined the text processing parameters at which the correlation is maximal, reaching about 55%. A plan for further development of the proposed text comparison method and refinement of the results is outlined.
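The abstract does not give the formulas for the proposed measures. Purely as an illustration, the Python sketch below assumes that explicit relatedness is the Jaccard overlap of two documents' reference lists, that implicit relatedness is the Jaccard overlap of their word n-grams (standing in for "common text fragments"), and that the two measures are compared with the Pearson correlation over all document pairs. The function names and the n-gram length `n` are hypothetical; this is not the authors' method.

```python
from itertools import combinations
from math import sqrt


def fragments(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Hypothetical 'text fragments': the set of lowercase word n-grams."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of common elements; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def implicit_relatedness(t1: str, t2: str, n: int = 3) -> float:
    # Assumed measure: overlap of shared word n-grams.
    return jaccard(fragments(t1, n), fragments(t2, n))


def explicit_relatedness(refs1: set[str], refs2: set[str]) -> float:
    # Assumed measure: overlap of bibliographic reference lists.
    return jaccard(refs1, refs2)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def relatedness_correlation(docs: list[tuple[str, set[str]]], n: int = 3) -> float:
    """docs: (text, references) pairs; correlate both measures over all document pairs."""
    imp, exp = [], []
    for (t1, r1), (t2, r2) in combinations(docs, 2):
        imp.append(implicit_relatedness(t1, t2, n))
        exp.append(explicit_relatedness(r1, r2))
    return pearson(imp, exp)


if __name__ == "__main__":
    toy = [
        ("computer graphics renders images with shading", {"R1", "R2"}),
        ("computer graphics renders images quickly", {"R1", "R2"}),
        ("databases store records in tables", {"R3"}),
    ]
    print(relatedness_correlation(toy))
```

Varying `n` here loosely mirrors the idea of searching for the text processing parameters that maximize the correlation; the actual parameters studied in the paper are not stated in the abstract.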
Keywords:
relatedness between texts; explicit relatedness; implicit relatedness; measure of relatedness; collection of texts; correlation.
Received: 29.10.2016
Citation:
M. M. Charnine, N. V. Somin, “On the main types of relatedness between text documents”, Sistemy i Sredstva Inform., 27:1 (2017), 100–107
Linking options:
https://www.mathnet.ru/eng/ssi505 https://www.mathnet.ru/eng/ssi/v27/i1/p100