Proceedings of the Institute for System Programming of the RAS
Proceedings of the Institute for System Programming of the RAS, 2020, Volume 32, Issue 1, Pages 137–152
DOI: https://doi.org/10.15514/ISPRAS-2020-32(1)-8
(Mi tisp490)
 

This article is cited in 2 scientific papers (total in 2 papers)

Effective implementations of topic modeling algorithms

M. A. Apishev

Lomonosov Moscow State University
Abstract: Topic modeling is an area of natural language processing that has been actively developed over the last 15 years. A probabilistic topic model extracts a set of hidden topics from a collection of text documents: it defines each topic as a probability distribution over words and describes each document as a probability distribution over topics. The rapidly growing volume of text data motivates the community to keep adapting topic modeling algorithms to multiprocessor systems. In this paper, we give an overview of effective EM-like algorithms for learning latent Dirichlet allocation (LDA) and additively regularized topic models (ARTM). First, we review 11 techniques for efficient topic modeling based on synchronous and asynchronous parallel computing, distributed data storage, streaming, batch processing, RAM optimization, and fault tolerance improvements. Second, we review 14 effective implementations of topic modeling algorithms proposed in the literature over the past 10 years, which use different combinations of these techniques. Their comparison shows that no perfect universal solution exists. All the improvements described apply to all kinds of topic modeling algorithms: PLSA, LDA, MAP, VB, GS, and ARTM.
Keywords: parallel algorithms, distributed data storage, stream data processing, fault tolerance, topic modeling, EM algorithm, latent Dirichlet allocation, additive regularization of topic models.
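For readers unfamiliar with the EM-like scheme the abstract refers to, the following is a minimal single-threaded sketch of plain PLSA, the simplest algorithm in the family listed above. It is not taken from the paper and does not implement any of the parallel or distributed techniques it reviews; the function name `plsa_em`, its parameters, and the dictionary-based data layout are assumptions chosen for illustration only.

```python
import random


def plsa_em(docs, num_topics, num_iters=50, seed=0):
    """Fit PLSA by EM (illustrative sketch, hypothetical helper).

    docs: list of dicts mapping word -> count, one dict per document.
    Returns (phi, theta), where phi[t][w] approximates p(w | t)
    and theta[d][t] approximates p(t | d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})

    def normalize(values):
        # Turn non-negative weights into a probability distribution;
        # fall back to uniform if all weights are zero.
        total = sum(values)
        if total <= 0:
            return [1.0 / len(values)] * len(values)
        return [v / total for v in values]

    # Random initialization of topic-word and document-topic distributions.
    phi = [
        dict(zip(vocab, normalize([rng.random() + 1e-3 for _ in vocab])))
        for _ in range(num_topics)
    ]
    theta = [
        normalize([rng.random() + 1e-3 for _ in range(num_topics)])
        for _ in docs
    ]

    for _ in range(num_iters):
        # Expected counts accumulated during the E-step.
        n_wt = [dict.fromkeys(vocab, 0.0) for _ in range(num_topics)]
        n_td = [[0.0] * num_topics for _ in docs]

        for d, doc in enumerate(docs):
            for w, n_dw in doc.items():
                # E-step: p(t | d, w) is proportional to p(w | t) * p(t | d).
                post = normalize(
                    [phi[t][w] * theta[d][t] for t in range(num_topics)]
                )
                for t, p in enumerate(post):
                    n_wt[t][w] += n_dw * p
                    n_td[d][t] += n_dw * p

        # M-step: re-estimate both distributions from the expected counts.
        phi = [
            dict(zip(vocab, normalize([n_wt[t][w] for w in vocab])))
            for t in range(num_topics)
        ]
        theta = [normalize(row) for row in n_td]

    return phi, theta


if __name__ == "__main__":
    toy_docs = [
        {"cat": 3, "dog": 2},
        {"stock": 4, "market": 1},
        {"cat": 1, "market": 2},
    ]
    phi, theta = plsa_em(toy_docs, num_topics=2, num_iters=20)
    for t, dist in enumerate(phi):
        top = sorted(dist, key=dist.get, reverse=True)[:2]
        print(f"topic {t}: {top}")
```

Each row of `phi` and `theta` is a probability distribution, matching the description of topics and documents in the abstract. The inner loop over documents is the part that a batch-parallel implementation would typically distribute across workers.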
Funding: This work was supported by the Russian Foundation for Basic Research, grant 20-07-00936.
Document Type: Article
Language: Russian
Citation: M. A. Apishev, “Effective implementations of topic modeling algorithms”, Proceedings of ISP RAS, 32:1 (2020), 137–152
Citation in AMSBIB format:
\Bibitem{Api20}
\by M.~A.~Apishev
\paper Effective implementations of topic modeling algorithms
\jour Proceedings of ISP RAS
\yr 2020
\vol 32
\issue 1
\pages 137--152
\mathnet{http://mi.mathnet.ru/tisp490}
\crossref{https://doi.org/10.15514/ISPRAS-2020-32(1)-8}
Linking options:
  • https://www.mathnet.ru/eng/tisp490
  • https://www.mathnet.ru/eng/tisp/v32/i1/p137