This article is cited in 8 scientific papers (total in 8 papers)
MODELS OF ECONOMIC AND SOCIAL SYSTEMS
Comparative analysis of statistical methods of scientific publications classification in medicine
G. V. Danilov^a, V. V. Zhukov^b, A. S. Kulikov^a, E. S. Makashova^a, N. A. Mitin^c, Yu. N. Orlov^bc
a Burdenko Neurosurgical Center,
16 4th Tverskaya-Yamskaya st., Moscow, 125047, Russia
b Peoples' Friendship University of Russia,
6 Miklukho-Maklaya st., Moscow, 117198, Russia
c Keldysh Institute of Applied Mathematics Russian Academy of Sciences,
4 Miusskaya square, Moscow, 125047, Russia
Abstract:
This paper compares several methods for the machine classification of scientific texts by thematic section, using publications in specialized medical journals published by Springer as an example. The corpus was studied in five sections: pharmacology/toxicology, cardiology, immunology, neurology, and oncology. We considered both classification methods based on the analysis of abstracts and keywords and methods based on processing the full texts. Bayesian classification, the support vector machine, and reference letter combinations were applied. The most accurate method is shown to be the one based on building a library of reference letter-trigram distributions that correspond to texts on a given subject. For this corpus, the Bayesian method gives an error of about 20 %, the support vector machine gives an error of about 10 %, and the proximity of a text's letter-trigram distribution to the reference distribution for a topic gives an error of about 5 %, which makes it possible to rank these methods for use in artificial-intelligence classification of texts by specialty. Notably, the support vector machine achieves the same accuracy on abstracts as on full texts, which matters for reducing the number of operations on large text corpora.
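The trigram approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the L1 distance between distributions, the pooling of counts per topic, and the tiny training sentences are all assumptions made for illustration, since the abstract does not specify the proximity measure or the preprocessing.

```python
from collections import Counter


def trigram_counts(text):
    """Count letter trigrams after dropping everything except letters."""
    letters = "".join(ch for ch in text.lower() if ch.isalpha())
    return Counter(letters[i:i + 3] for i in range(len(letters) - 2))


def normalize(counts):
    """Turn raw counts into a relative-frequency distribution."""
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()} if total else {}


def build_references(labelled_texts):
    """Build one reference trigram distribution per topic.

    `labelled_texts` is an iterable of (topic, text) pairs; counts are
    summed over all texts of a topic before normalizing.
    """
    pooled = {}
    for topic, text in labelled_texts:
        pooled.setdefault(topic, Counter()).update(trigram_counts(text))
    return {topic: normalize(counts) for topic, counts in pooled.items()}


def l1_distance(p, q):
    """L1 distance between two distributions (an assumed proximity measure)."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def classify(text, references):
    """Return the topic whose reference distribution is closest to `text`."""
    p = normalize(trigram_counts(text))
    return min(references, key=lambda topic: l1_distance(p, references[topic]))


# Hypothetical toy corpus, far smaller than any realistic training set.
training = [
    ("cardiology", "heart failure and cardiac arrhythmia in patients with heart disease"),
    ("oncology", "tumor growth and metastasis in cancer patients with tumor"),
]
references = build_references(training)
print(classify("cardiac heart failure", references))
```

With a realistic corpus the reference distributions would be built from many full texts per section, and the choice of distance could be compared empirically.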
Keywords:
machine learning, medical text classification, statistical analysis.
Received: 25.03.2020 Revised: 16.04.2020 Accepted: 06.05.2020
Citation:
G. V. Danilov, V. V. Zhukov, A. S. Kulikov, E. S. Makashova, N. A. Mitin, Yu. N. Orlov, “Comparative analysis of statistical methods of scientific publications classification in medicine”, Computer Research and Modeling, 12:4 (2020), 921–933
Linking options:
https://www.mathnet.ru/eng/crm825 https://www.mathnet.ru/eng/crm/v12/i4/p921