Abstract:
We propose a generalized probabilistic topic model of text corpora that can incorporate the heuristics of Bayesian regularization, sampling, frequent parameter updates, and robustness in any combination. The well-known models PLSA, LDA, CVB0, SWB, and many others can be considered special cases of the proposed broad family of models. We propose the robust PLSA model and show that it is sparser and performs better than regularized models such as LDA.
Citation:
K. V. Vorontsov, A. A. Potapenko, “Regularization, robustness and sparsity of probabilistic topic models”, Computer Research and Modeling, 4:4 (2012), 693–706
\Bibitem{VorPot12}
\by K.~V.~Vorontsov, A.~A.~Potapenko
\paper Regularization, robustness and sparsity of probabilistic topic models
\jour Computer Research and Modeling
\yr 2012
\vol 4
\issue 4
\pages 693--706
\mathnet{http://mi.mathnet.ru/crm522}
\crossref{https://doi.org/10.20537/2076-7633-2012-4-4-693-706}
Linking options:
https://www.mathnet.ru/eng/crm522
https://www.mathnet.ru/eng/crm/v4/i4/p693
This publication is cited in the following 14 articles:
Ravil I. Mukhamediev, Marina Yelis, Kirill Yakunin, Yelena Popova, Yan Kuchin, Adilkhan Symagulov, Nadiya Yunicheva, Elena Zaitseva, Vitaly Levashenko, Elena Muhamedijeva, Viktors Gopejenko, Rustam Mussabayev, “Exploring the health care system's representation in the media through hierarchical topic modeling”, Cogent Engineering, 11:1 (2024)
Antonina Pinchuk, Svetlana Karepova, Dmitry Tikhomirov, “Text Mining technologies in sociological analysis (using the example of studying students' ideas about the mission of a modern university)”, Sociologicheskaja nauka i social'naja praktika, 12:1 (2024), 62
M. M. Gayanova, E. Yu. Sazonova, O. N. Smetanina, A. K. Sulejmanov, “Selection of Tools for Preprocessing and Thematic Modeling of Scientific Articles from the Data Lake”, Pattern Recognit. Image Anal., 33:3 (2023), 313
Sergei Dosko, Vladimir Utencov, Aleksey Spasenov, Igor Lukashin, Kirill Kucherov, Lecture Notes on Data Engineering and Communications Technologies, 119, Advances in Artificial Systems for Power Engineering II, 2022, 170
Wei Jiek Chong, Hui Na Chua, May Fen Gan, 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), 2022, 1
Kirill Yakunin, Maksat Kalimoldayev, Ravil I. Mukhamediev, Rustam Mussabayev, Vladimir Barakhnin, Yan Kuchin, Sanzhar Murzakhmetov, Timur Buldybayev, Ulzhan Ospanova, Marina Yelis, Akylbek Zhumabayev, Viktors Gopejenko, Zhazirakhanym Meirambekkyzy, Alibek Abdurazakov, “KazNewsDataset: Single Country Overall Digital Mass Media Publication Corpus”, Data, 6:3 (2021), 31
Kirill Yakunin, Ravil Mukhamediev, Yan Kuchin, Rustam Musabayev, Timur Buldybayev, Sanzhar Murzakhmetov, “Classification of negative publication in mass media using topic modeling”, J. Phys.: Conf. Ser., 1727:1 (2021), 012019
Kirill Yakunin, Ravil I. Mukhamediev, Elena Zaitseva, Vitaly Levashenko, Marina Yelis, Adilkhan Symagulov, Yan Kuchin, Elena Muhamedijeva, Margulan Aubakirov, Viktors Gopejenko, “Mass Media as a Mirror of the COVID-19 Pandemic”, Computation, 9:12 (2021), 140
Kirill Yakunin, Ravil I. Mukhamediev, Marina Yelis, Adilkhan Symagulov, Yan Kuchin, Elena Muhamedijeva, Jan Rabcan, Aubakirov Margulan, 2021 International Conference on Information and Digital Technologies (IDT), 2021, 260
Ravil I. Mukhamediev, Kirill Yakunin, Rustam Mussabayev, Timur Buldybayev, Yan Kuchin, Sanzhar Murzakhmetov, Marina Yelis, “Classification of Negative Information on Socially Significant Topics in Mass Media”, Symmetry, 12:12 (2020), 1945
V B Barakhnin, R I Mukhamedyev, R R Mussabaev, O Yu Kozhemyakina, A Issayeva, Ya I Kuchin, S B Murzakhmetov, K O Yakunin, “Methods to identify the destructive information”, J. Phys.: Conf. Ser., 1405:1 (2019), 012004
E. V. Tutubalina, “Sovmestnaya veroyatnostnaya tematicheskaya model dlya identifikatsii problemnykh vyskazyvanii, svyazannykh narusheniem funktsionalnosti produktov”, Trudy ISP RAN, 27:4 (2015), 111–128
Maria Saburova, Archil Maysuradze, Communications in Computer and Information Science, 518, Knowledge Engineering and Semantic Web, 2015, 168