Effective implementations of topic modeling algorithms
M. A. Apishev Lomonosov Moscow State University
Abstract:
Topic modeling is an area of natural language processing that has been actively developed over the last 15 years. A probabilistic topic model extracts a set of hidden topics from a collection of text documents. It defines each topic by a probability distribution over words and describes each document by a probability distribution over topics. The exploding volume of text data motivates the community to continually adapt topic modeling algorithms to multiprocessor systems. In this paper, we provide an overview of effective EM-like algorithms for learning latent Dirichlet allocation (LDA) and additively regularized topic models (ARTM). First, we review 11 techniques for efficient topic modeling based on synchronous and asynchronous parallel computing, distributed data storage, streaming, batch processing, RAM optimization, and fault-tolerance improvements. Second, we review 14 effective implementations of topic modeling algorithms proposed in the literature over the past 10 years, which use different combinations of the techniques above. Their comparison shows the lack of a perfect universal solution. All of the improvements described are applicable to all kinds of topic modeling algorithms: PLSA, LDA, MAP, VB, GS, and ARTM.
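To illustrate the kind of EM-like update the surveyed algorithms parallelize, here is a minimal single-machine sketch of one EM iteration for PLSA (the simplest model in the family above). All names (`plsa_em_step`, `ndw`, `phi`, `theta`) are illustrative assumptions, not an API from the paper; LDA, ARTM, and the distributed variants modify the M-step counts but keep this overall E-step/M-step structure.

```python
import numpy as np

def plsa_em_step(ndw, phi, theta):
    """One EM iteration of PLSA (illustrative sketch, not the paper's code).

    ndw:   (D, W) document-word count matrix n_dw
    phi:   (W, T) word-in-topic distributions p(w|t), columns sum to 1
    theta: (T, D) topic-in-document distributions p(t|d), columns sum to 1
    """
    D, W = ndw.shape
    T = phi.shape[1]
    nwt = np.zeros((W, T))  # expected word-topic counts
    ntd = np.zeros((T, D))  # expected topic-document counts
    for d in range(D):
        # E-step: posterior p(t|d,w) proportional to phi[w,t] * theta[t,d]
        p = phi * theta[:, d]                   # (W, T)
        z = p.sum(axis=1, keepdims=True)
        z[z == 0] = 1.0                         # guard against empty rows
        p /= z
        # M-step accumulation: distribute counts n_dw over topics
        contrib = ndw[d][:, None] * p           # (W, T)
        nwt += contrib
        ntd[:, d] = contrib.sum(axis=0)
    # M-step: renormalize counts into distributions
    phi_new = nwt / nwt.sum(axis=0, keepdims=True)
    theta_new = ntd / ntd.sum(axis=0, keepdims=True)
    return phi_new, theta_new
```

The per-document loop is what the reviewed techniques exploit: documents (or batches of documents) can be processed by independent workers that accumulate `nwt` locally and merge it synchronously or asynchronously.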
Keywords:
parallel algorithms, distributed data storage, stream data processing, fault tolerance, topic modeling, EM algorithm, latent Dirichlet allocation, additive regularization of topic models.
Citation:
M. A. Apishev, “Effective implementations of topic modeling algorithms”, Proceedings of ISP RAS, 32:1 (2020), 137–152
Linking options:
https://www.mathnet.ru/eng/tisp490
https://www.mathnet.ru/eng/tisp/v32/i1/p137