Abstract:
The problem of probabilistic topic modeling is as follows. Given a collection
of text documents, find the conditional distribution over topics for each
document and the conditional distribution over words (or terms) for each topic.
Log-likelihood maximization is used to solve this problem. In general, the problem
has infinitely many solutions and is ill-posed in the sense of Hadamard.
In the framework of Additive Regularization of Topic Models (ARTM), a weighted
sum of regularization criteria is added to the main log-likelihood criterion.
The resulting optimization problem is solved numerically by a variant of the
iterative EM algorithm, written in general form for an arbitrary smooth
regularizer as well as for a linear combination of smooth regularizers.
This paper studies the convergence of the iterative EM
process. Sufficient conditions are obtained for the convergence to a stationary
point of the regularized log-likelihood. The constraints imposed on the regularizer
are not too restrictive. We give their interpretations from the point of view
of the practical implementation of the algorithm. A modification of the algorithm
is proposed that improves the convergence without additional time and memory costs.
Experiments on a news text collection show that the modification both
accelerates convergence and improves the attained value of the optimized criterion.
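For orientation, the regularized EM iteration described above can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation and not the modification proposed in the paper. It assumes the commonly cited ARTM M-step, in which the expected counts are shifted by phi * dR/dphi (respectively theta * dR/dtheta), negative values are clipped to zero, and the columns are renormalized. The function names (`normalize_columns`, `log_likelihood`, `artm_em`) and the callback parameters `phi_term` and `theta_term` are hypothetical and introduced only for illustration.

```python
import numpy as np


def normalize_columns(x):
    """Clip negative entries to zero, then scale each column to sum to 1."""
    x = np.maximum(x, 0.0)
    s = x.sum(axis=0, keepdims=True)
    # Columns whose sum is zero are left as all zeros.
    return np.divide(x, s, out=np.zeros_like(x), where=s > 0)


def log_likelihood(n_dw, phi, theta):
    """Unregularized log-likelihood: sum over d, w of n_dw * ln p(w|d)."""
    p_wd = phi @ theta  # shape (W, D)
    mask = n_dw.T > 0
    return float(np.sum(n_dw.T[mask] * np.log(p_wd[mask])))


def artm_em(n_dw, num_topics, phi_term=None, theta_term=None,
            num_iters=50, seed=0):
    """Regularized EM sketch.

    n_dw       : (D, W) array of word counts per document.
    phi_term   : hypothetical callback returning phi * dR/dphi, shape (W, T).
    theta_term : hypothetical callback returning theta * dR/dtheta, shape (T, D).
    Returns phi with shape (W, T) and theta with shape (T, D).
    """
    rng = np.random.default_rng(seed)
    num_docs, num_words = n_dw.shape
    phi = normalize_columns(rng.random((num_words, num_topics)))
    theta = normalize_columns(rng.random((num_topics, num_docs)))
    for _ in range(num_iters):
        # E-step (implicit): p(t|d,w) = phi_wt * theta_td / p(w|d).
        p_wd = phi @ theta
        ratio = np.divide(n_dw.T, p_wd, out=np.zeros_like(p_wd), where=p_wd > 0)
        # Expected counts n_wt and n_td accumulated from p(t|d,w).
        n_wt = phi * (ratio @ theta.T)
        n_td = theta * (phi.T @ ratio)
        # M-step: add the regularizer terms, clip, renormalize.
        r_phi = phi_term(phi) if phi_term is not None else 0.0
        r_theta = theta_term(theta) if theta_term is not None else 0.0
        phi = normalize_columns(n_wt + r_phi)
        theta = normalize_columns(n_td + r_theta)
    return phi, theta
```

With both callbacks left as `None`, the loop reduces to the plain EM iteration for PLSA. Passing a constant positive term, for example `phi_term=lambda phi: np.full_like(phi, 0.1)`, corresponds to a smoothing regularizer of the form R = 0.1 * sum of ln phi_wt.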
Keywords:
natural language processing, probabilistic topic modeling, probabilistic latent semantic analysis (PLSA), latent Dirichlet allocation (LDA), additive regularization of topic models (ARTM), EM-algorithm, sufficient conditions for convergence.
The work was performed within the project “Text mining tools for big data” according to the program of the Competence Center of the National Technological Initiative “Center for Big Data Storage and Processing” supported by the Ministry of Science and Higher Education of the Russian Federation under the agreement between Moscow State University and the NTI Fund of August 15, 2019, no. 7/1251/2019. This work was also partially supported by the Russian Foundation for Basic Research (project no. 20-07-00936).
Citation:
I. A. Irkhin, K. V. Vorontsov, “Convergence of the Algorithm of Additive Regularization of Topic Models”, Trudy Inst. Mat. i Mekh. UrO RAN, 26, no. 3, 2020, 56–68; Proc. Steklov Inst. Math. (Suppl.), 315, suppl. 1 (2021), S128–S139