A. A. Musaev, D. A. Grigoriev, “Extracting knowledge from text messages: overview and state-of-the-art”, Computer Research and Modeling, 13:6 (2021), 1291


Computer Research and Modeling, 2021, Volume 13, Issue 6, Pages 1291–1315
DOI: https://doi.org/10.20537/2076-7633-2021-13-6-1291-1315 (Mi crm949)

This article is cited in 4 scientific papers (total in 4 papers)

MODELS OF ECONOMIC AND SOCIAL SYSTEMS

Extracting knowledge from text messages: overview and state-of-the-art

A. A. Musaev^a, D. A. Grigoriev^b

^a St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS), 39 Linia 14-th, VO, St. Petersburg, 199178, Russia
^b Saint-Petersburg State University (SPBU), 7/9 Universitetskaya Emb., St Petersburg 199034, Russia

Abstract: In general, solving the information explosion problem can be delegated to systems for automatic processing of digital data. These systems are intended for recognizing, sorting, meaningfully processing, and presenting data in formats that humans can read and interpret. Building intelligent knowledge extraction systems that handle unstructured data would be a natural solution in this area. However, the evident progress on these tasks for structured data contrasts with the limited success in processing unstructured data and, in particular, documents. This research area is currently under active development. The present paper is a systematic survey of both Russian and international publications dedicated to the leading trend in automatic text data processing: Text Mining (TM). We cover the main tasks and notions of TM, as well as its place in the current AI landscape. Furthermore, we analyze the complications that arise in natural language processing (NLP), where texts are weakly structured and often carry ambiguous linguistic information. We describe the stages of text data preparation, cleaning, and feature selection which, alongside the data obtained via morphological, syntactic, and semantic analysis, constitute the input for the TM process. This process can be represented as a mapping from a set of text documents to "knowledge". Examples of such mappings are methods of Information Retrieval (IR), text summarization, sentiment analysis, document classification and clustering, etc. Using the case of stock trading, we demonstrate the formalization of the problem of making a trade decision based on a set of analytical recommendations. The common point of all TM tasks and techniques is the selection of word forms and their derivatives used to recognize content in natural-language symbol sequences.
Considering IR as an example, we examine classic types of search, such as searching for word forms, phrases, patterns, and concepts. Additionally, we consider the augmentation of patterns with syntactic and semantic information. Next, we provide a general description of the NLP instruments: morphological, syntactic, semantic, and pragmatic analysis. Finally, we conclude the paper with a comparative analysis of modern TM tools, which can help in selecting a suitable TM platform based on the user's needs and skills.
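
The classic search types named in the abstract (word forms, phrases, patterns) can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the three sample documents, and the ASCII-only tokenizer are assumptions made for illustration, and real text would need morphological normalization (for example, lemmatization) before word-form search.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into ASCII word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: token -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_word(index, word):
    """Word-form search: ids of documents that contain the exact token."""
    return sorted(index.get(word.lower(), set()))


def search_phrase(docs, phrase):
    """Phrase search: ids of documents where the phrase tokens occur consecutively."""
    target = tokenize(phrase)
    if not target:
        return []
    hits = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for i in range(len(tokens) - len(target) + 1):
            if tokens[i:i + len(target)] == target:
                hits.append(doc_id)
                break
    return sorted(hits)


def search_pattern(docs, pattern):
    """Pattern search: ids of documents whose raw text matches a regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(doc_id for doc_id, text in docs.items() if rx.search(text))


# Hypothetical sample documents, loosely themed on analytical recommendations.
DOCS = {
    "d1": "Analysts recommend buying shares of the bank.",
    "d2": "The bank reported losses; analysts recommend selling.",
    "d3": "Weather was mild in St. Petersburg.",
}
INDEX = build_index(DOCS)

print(search_word(INDEX, "bank"))
print(search_phrase(DOCS, "recommend selling"))
print(search_pattern(DOCS, r"recommend\s+(buying|selling)"))
```

The inverted index serves exact word-form lookup, while phrase and pattern search scan the documents directly. This keeps the sketch short at the cost of efficiency on large collections.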

Keywords: text mining, information extraction, natural language processing, machine learning, semantic annotations.

Funding agency	Grant number
Russian Foundation for Basic Research	19-08-00989, 20-08-01046
Ministry of Science and Higher Education of the Russian Federation	0073-2019-0004
Saint Petersburg State University	60419633
The work is partially supported by the Russian Foundation for Basic Research (grants 19-08-00989, 20-08-01046), by state research 0073-2019-0004 (A. A. Musaev), and by the SPBU grant (project No. 60419633), as well as within the framework of the CEBA Center Research Program at SPBU (D. A. Grigoriev).

Received: 20.04.2021
Revised: 24.10.2021
Accepted: 26.10.2021

Document Type: Article

UDC: 519.254

Language: Russian

Citation: A. A. Musaev, D. A. Grigoriev, “Extracting knowledge from text messages: overview and state-of-the-art”, Computer Research and Modeling, 13:6 (2021), 1291–1315

Citation in format AMSBIB

\Bibitem{MusGri21}
\by A.~A.~Musaev, D.~A.~Grigoriev
\paper Extracting knowledge from text messages: overview and state-of-the-art
\jour Computer Research and Modeling
\yr 2021
\vol 13
\issue 6
\pages 1291--1315
\mathnet{http://mi.mathnet.ru/crm949}
\crossref{https://doi.org/10.20537/2076-7633-2021-13-6-1291-1315}

Linking options:

https://www.mathnet.ru/eng/crm949

https://www.mathnet.ru/eng/crm/v13/i6/p1291
