Intelligent systems. Theory and applications, 2019, Volume 23, Issue 4, Pages 7–23
This article is cited in 1 scientific paper.
Part 1. General problems of the intellectual systems theory
Building a complete set of topics of probabilistic topic models
A. V. Sukhareva, K. V. Vorontsov
Abstract:
Interpretability, scalability, and linear growth of complexity with data size have made topic modeling one of the most popular tools for statistical text analysis. It has, however, a number of disadvantages caused by the dependence of the solution on initialization. Building a topic model is known to reduce to an ill-posed problem of non-negative matrix factorization, whose solution set is in general infinite. Each training run finds only a local extremum, so repeated training on the same collection can keep revealing new topics. In practice, it is often necessary to identify all the topics of the corpus. To solve this problem, the article proposes and investigates a new algorithm for finding the complete set of topics based on the construction of a convex hull. It is shown experimentally that a basis can be constructed from a finite number of models. The likelihood of the basis is higher than that of a single model with a similar number of topics. A comparison of the bases of LDA (latent Dirichlet allocation) and ARTM (additive regularization for topic modeling) models suggests that the topics of the two sets coincide with high accuracy.
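The instability described in the abstract can be illustrated with a toy experiment. The sketch below is not the authors' convex-hull algorithm: it trains several non-negative matrix factorizations of the same matrix from different random initializations (each run reaching its own local optimum), pools all the resulting topic vectors, and then keeps a deduplicated "basis" by greedy cosine-similarity filtering. All function names, the multiplicative-update NMF, and the similarity threshold are illustrative assumptions.

```python
import numpy as np

def train_nmf(V, k, seed, iters=200):
    # Illustrative multiplicative-update NMF: V ~ W @ H.
    # Different seeds converge to different local optima.
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    # Normalize each topic (row of H) to a distribution over terms.
    return H / H.sum(axis=1, keepdims=True)

def pool_topics(V, k, n_runs):
    # Repeated training finds different topics; pool them all.
    return np.vstack([train_nmf(V, k, seed) for seed in range(n_runs)])

def select_basis(topics, threshold=0.95):
    # Greedy deduplication (a stand-in for the paper's convex-hull step):
    # keep a topic only if it is not too close to one already selected.
    basis = []
    for t in topics:
        if all(t @ b / (np.linalg.norm(t) * np.linalg.norm(b)) < threshold
               for b in basis):
            basis.append(t)
    return np.array(basis)

rng = np.random.default_rng(0)
V = rng.random((50, 30))                 # toy term-document-style matrix
pooled = pool_topics(V, k=5, n_runs=4)   # 4 restarts x 5 topics = 20 rows
basis = select_basis(pooled)
print(pooled.shape, basis.shape)
```

The basis is typically smaller than the pooled set, since restarts rediscover near-duplicate topics alongside genuinely new ones.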
Keywords:
probabilistic topic modeling, stability of topic models, complete set of topics of topic models, latent Dirichlet allocation, LDA, additive regularization, ARTM, BigARTM.
Citation:
A. V. Sukhareva, K. V. Vorontsov, “Building a complete set of topics of probabilistic topic models”, Intelligent systems. Theory and applications, 23:4 (2019), 7–23
Linking options:
https://www.mathnet.ru/eng/ista246
https://www.mathnet.ru/eng/ista/v23/i4/p7