Using topic models for pairwise comparison of collections of scientific papers
F. V. Krasnov (a), A. V. Dimentov (b), M. E. Shvartsman (b, c)
(a) NAUMEN R&D, 49A Tatishcheva Str., Ekaterinburg 620028, Russian Federation
(b) National Electronic Information Consortium, 5 Letnikovskaya Str., Moscow 115114, Russian Federation
(c) Russian State Library, 3/5 Vozdvigenka Str., Moscow 119019, Russian Federation
Abstract:
The authors propose a new technique for pairwise comparison of collections of scientific articles by means of a topic model. The methodology is called Comparative Topic Analysis (CTA). CTA yields not only a quantitative assessment of the similarity of two collections but also the structural differences between them. The authors developed a transparent visualization of the distance between text collections. The study compares existing approaches to topic modeling with respect to the task of comparing collections of scientific papers; both probabilistic and generative topic models are considered. The requirements that text collections must satisfy for CTA to be applied correctly were analyzed. The CTA methodology proved highly effective at identifying structural differences between related collections. The authors also developed an integral metric, the “Content Uniqueness Ratio”, which allows text collections to be compared with one another. In the computational experiment, the topic model with additive regularization (ARTM) proved to be the most informative.
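The abstract does not give the formulas behind CTA or the Content Uniqueness Ratio, so the following is only a minimal sketch of the general idea of comparing two collections through a shared topic model, under assumptions of our own: both collections are described by the same set of topics; each document already has a topic distribution p(t|d) (for example, from an ARTM or LDA model); a collection's profile is the unweighted mean of its documents' distributions; and the Hellinger distance stands in for the similarity measure. All function names are hypothetical, and this is not the authors' CTA method or their metric.

```python
import math
from typing import Dict, List, Sequence, Tuple


def collection_profile(theta: Sequence[Sequence[float]]) -> List[float]:
    """Average per-document topic distributions p(t|d) into one
    collection-level topic distribution (assumption: equal document weights)."""
    if not theta:
        raise ValueError("collection must contain at least one document")
    n_topics = len(theta[0])
    profile = [0.0] * n_topics
    for doc in theta:
        if len(doc) != n_topics:
            raise ValueError("all documents must share the same topic set")
        for t, p in enumerate(doc):
            profile[t] += p
    return [v / len(theta) for v in profile]


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions; lies in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * s)


def topic_differences(p: Sequence[float], q: Sequence[float],
                      labels: Sequence[str]) -> Dict[str, float]:
    """Per-topic share difference p - q; a positive value means the topic
    is more prominent in the first collection."""
    return {label: a - b for label, a, b in zip(labels, p, q)}


def largest_shifts(diffs: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """The k topics whose shares differ most between the two collections."""
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: two collections, two documents each, three shared topics.
    p = collection_profile([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]])
    q = collection_profile([[0.1, 0.3, 0.6], [0.3, 0.3, 0.4]])
    print("distance:", round(hellinger(p, q), 3))
    print("largest shifts:", largest_shifts(topic_differences(p, q, ["A", "B", "C"]), 2))
```

The single distance gives a quantitative similarity score, while the ranked per-topic differences hint at the kind of structural comparison the abstract describes. Hellinger is used here only because it is symmetric and bounded; any other divergence between distributions could be substituted.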
Keywords:
comparative topic analysis, comparative text model, deep text analysis, topic model metrics.
Received: 27.06.2019
Citation:
F. V. Krasnov, A. V. Dimentov, M. E. Shvartsman, “Using topic models for pairwise comparison of collections of scientific papers”, Inform. Primen., 14:3 (2020), 129–135
Linking options:
https://www.mathnet.ru/eng/ia689 https://www.mathnet.ru/eng/ia/v14/i3/p129