This article is cited in 1 scientific paper (total in 1 paper)
Natural language processing
Text documents classification based on probabilistic topic model
S. N. Karpovich (a), A. V. Smirnov (b), N. N. Teslya (b)
(a) JSC "Olympus", Moscow, Russia
(b) St. Petersburg Institute for Informatics and Automation of RAS, St. Petersburg, Russia
Abstract:
The paper proposes an approach to classifying text documents with a probabilistic topic model when the training set consists of instances of only one class. The approach makes it possible to select positive instances, that is, documents similar to the given class, from document collections and text document streams. Models that are trained on instances of a single class and applied to text classification are reviewed, and their key features are identified. The classification model Positive Example Based Learning-TM is presented, and a software prototype that classifies text documents with this model is developed. The developed model demonstrates high classification accuracy, exceeding that of the alternative approaches. The proposed model and the existing models were evaluated on the SCTM-ru text corpus. The experiments confirm that Positive Example Based Learning-TM is superior in classification accuracy when the training set is small.
Keywords:
classification, binary classification, topic model, natural language processing.
Citation:
S. N. Karpovich, A. V. Smirnov, N. N. Teslya, “Text documents classification based on probabilistic topic model”, Artificial Intelligence and Decision Making, 2018, no. 3, 69–77; Scientific and Technical Information Processing, 46:5 (2019), 314–320
Linking options:
https://www.mathnet.ru/eng/iipr217
https://www.mathnet.ru/eng/iipr/y2018/i3/p69