Computer Research and Modeling
Computer Research and Modeling, 2021, Volume 13, Issue 6, Pages 1291–1315
DOI: https://doi.org/10.20537/2076-7633-2021-13-6-1291-1315
(Mi crm949)
 

This article is cited in 4 scientific papers (total in 4 papers)

MODELS OF ECONOMIC AND SOCIAL SYSTEMS

Extracting knowledge from text messages: overview and state-of-the-art

A. A. Musaev (a), D. A. Grigoriev (b)

(a) St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS), 39 Linia 14-th, VO, St. Petersburg, 199178, Russia
(b) Saint-Petersburg State University (SPBU), 7/9 Universitetskaya Emb., St Petersburg 199034, Russia
Abstract: In general, solving the information explosion problem can be delegated to systems for automatic processing of digital data. These systems are intended for recognizing, sorting, meaningfully processing and presenting data in formats readable and interpretable by humans. The creation of intelligent knowledge extraction systems that handle unstructured data would be a natural solution in this area. At the same time, the evident progress in these tasks for structured data contrasts with the limited success of unstructured data processing, and, in particular, document processing. Currently, this research area is undergoing active development and investigation. The present paper is a systematic survey of both Russian and international publications devoted to the leading trend in automatic text data processing: Text Mining (TM). We cover the main tasks and notions of TM, as well as its place in the current AI landscape. Furthermore, we analyze the complications that arise when processing texts written in natural language (NL), which are weakly structured and often carry ambiguous linguistic information. We describe the stages of text data preparation, cleaning, and feature selection which, alongside the data obtained via morphological, syntactic, and semantic analysis, constitute the input for the TM process. This process can be represented as a mapping from a set of text documents to "knowledge". Using the case of stock trading, we demonstrate the formalization of the problem of making a trade decision based on a set of analytical recommendations. Examples of such mappings include methods of Information Retrieval (IR), text summarization, sentiment analysis, and document classification and clustering. The common point of all tasks and techniques of TM is the selection of word forms and their derivatives used to recognize content in NL symbol sequences.
Considering IR as an example, we examine classic types of search, such as searching for word forms, phrases, patterns and concepts. Additionally, we consider the augmentation of patterns with syntactic and semantic information. Next, we provide a general description of all NLP instruments: morphological, syntactic, semantic and pragmatic analysis. We conclude the paper with a comparative analysis of modern TM tools, which can help in selecting a suitable TM platform based on the user's needs and skills.
Keywords: text mining, information extraction, natural language processing, machine learning, semantic annotations.
Funding agency: Grant number
Russian Foundation for Basic Research: 19-08-00989, 20-08-01046
Ministry of Science and Higher Education of the Russian Federation: 0073-2019-0004
Saint Petersburg State University: 60419633
The work is partially supported by the Russian Foundation for Basic Research (grants 19-08-00989, 20-08-01046), state research 0073-2019-0004 (A. A. Musaev) and by the SPBU grant (project No. 60419633) as well as within the framework of the CEBA Center Research Program at SPBU (D. A. Grigoriev).
Received: 20.04.2021
Revised: 24.10.2021
Accepted: 26.10.2021
Document Type: Article
UDC: 519.254
Language: Russian
Citation: A. A. Musaev, D. A. Grigoriev, “Extracting knowledge from text messages: overview and state-of-the-art”, Computer Research and Modeling, 13:6 (2021), 1291–1315
Citation in format AMSBIB
\Bibitem{MusGri21}
\by A.~A.~Musaev, D.~A.~Grigoriev
\paper Extracting knowledge from text messages: overview and state-of-the-art
\jour Computer Research and Modeling
\yr 2021
\vol 13
\issue 6
\pages 1291--1315
\mathnet{http://mi.mathnet.ru/crm949}
\crossref{https://doi.org/10.20537/2076-7633-2021-13-6-1291-1315}
Linking options:
  • https://www.mathnet.ru/eng/crm949
  • https://www.mathnet.ru/eng/crm/v13/i6/p1291