Problemy Peredachi Informatsii, 2017, Volume 53, Issue 3, Pages 100–111
(Mi ppi2248)
This article is cited in 7 scientific papers (total in 7 papers)
Source Coding
Information-theoretic method for classification of texts
B. Ya. Ryabko (a, b), A. E. Gus'kov (c, a), I. V. Selivanova (b, c)
a Institute of Computational Technologies, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
b Novosibirsk State University, Novosibirsk, Russia
c Russian National Public Library for Science and Technology, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
Abstract:
We consider a method for automatic (i.e., unmanned) text classification based on methods of universal source coding (or “data compression”). We show that under certain restrictions the proposed method is consistent, i.e., the classification error tends to zero with increasing text lengths. As an example of practical use of the method we consider the classification problem for scientific texts (research papers, books, etc.). The proposed method is experimentally shown to be highly efficient.
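The general idea of compression-based classification can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes zlib as a stand-in for a universal code, and the names compressed_size, classify, and corpora are hypothetical. A text is assigned to the class whose sample corpus lets the compressor encode it with the fewest additional bytes.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def classify(text: str, corpora: dict) -> str:
    """Return the label of the corpus that makes `text` cheapest to encode.

    `corpora` maps a class label to a sample text of that class.
    zlib is an assumption made for illustration only; it sees at most
    32 KiB of preceding context, so long corpora are only partly used.
    """
    sample = text.encode("utf-8")

    def extra_cost(label: str) -> int:
        base = corpora[label].encode("utf-8")
        # Extra bytes needed to encode the text after the class corpus.
        return compressed_size(base + sample) - compressed_size(base)

    return min(corpora, key=extra_cost)
```

For instance, classify(new_abstract, {"physics": physics_sample, "biology": biology_sample}) returns the label whose sample shares the most repeated material with new_abstract; the variable names here are placeholders.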
Received: 21.10.2015 Revised: 13.05.2017
Citation:
B. Ya. Ryabko, A. E. Gus'kov, I. V. Selivanova, “Information-theoretic method for classification of texts”, Probl. Peredachi Inf., 53:3 (2017), 100–111; Problems Inform. Transmission, 53:3 (2017), 294–304
Linking options:
https://www.mathnet.ru/eng/ppi2248
https://www.mathnet.ru/eng/ppi/v53/i3/p100