Information technologies in control
System of thematically-oriented texts automatic processing with dictionary of terms in the form of regular expressions
V. S. Sukhoverov
V.A. Trapeznikov Institute of Control Sciences of RAS, Moscow
Abstract:
An automatic text processing system is developed that determines the subject of a text from the terminology it uses, matched against a dictionary of terms. The use of regular expressions in the domain-specific dictionaries of natural-language text analysis programs is proposed and justified. The relationship between regular expressions and finite automata, established through regular sets, is described. A quantitative measure of the thematic focus of a text is proposed: the document profile, calculated from the results of the term search. The system is implemented as a software package with a dictionary for one selected subject area, control theory and its applications. It was tested on the archive of the journal «Automation and Remote Control», and thematic-focus profiles were obtained for articles from various sections of the journal. Directions for further development of the system are outlined.
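The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the author's software package: the dictionary entries, the subject names, and the function `document_profile` are hypothetical, and the profile is assumed here to be the share of term matches per subject, since the abstract does not give the exact formula.

```python
import re

# Hypothetical domain dictionary: subject -> terms written as regular
# expressions, so one entry covers spelling and word-form variants.
TERM_DICTIONARY = {
    "control theory": [
        r"\bfeedback\s+control(?:ler|lers)?\b",
        r"\b(?:optimal|adaptive|robust)\s+control\b",
        r"\bstabili[sz]ation\b",
    ],
    "automata": [
        r"\bfinite(?:[-\s]state)?\s+(?:automaton|automata|machines?)\b",
        r"\bregular\s+expressions?\b",
    ],
}

# Compile once; matching is case-insensitive.
COMPILED = {
    subject: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    for subject, patterns in TERM_DICTIONARY.items()
}


def document_profile(text):
    """Return the share of term matches per subject (assumed profile formula)."""
    counts = {
        subject: sum(1 for pattern in patterns for _ in pattern.finditer(text))
        for subject, patterns in COMPILED.items()
    }
    total = sum(counts.values())
    if total == 0:
        # No dictionary term found: the profile is all zeros.
        return {subject: 0.0 for subject in counts}
    return {subject: count / total for subject, count in counts.items()}


sample = (
    "Adaptive control and robust control use feedback control. "
    "A finite automaton recognizes a regular expression."
)
profile = document_profile(sample)
```

For the sample text, three matches fall under "control theory" and two under "automata", so the sketch assigns shares of 0.6 and 0.4. Normalizing by the total number of matches is only one possible choice; raw counts or counts per thousand words would serve equally well for illustration.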
Keywords:
term, domain dictionary, regular expression, finite state machine, document profile, software package.
Received: 27.09.2018 Revised: 22.10.2018 Accepted: 12.12.2018
Citation:
V. S. Sukhoverov, “System of thematically-oriented texts automatic processing with dictionary of terms in the form of regular expressions”, Probl. Upr., 2019, no. 2, 41–46
Linking options:
https://www.mathnet.ru/eng/pu1129
https://www.mathnet.ru/eng/pu/v2/p41