Proceedings of the Institute for System Programming of the RAS
Proceedings of the Institute for System Programming of the RAS, 2018, Volume 30, Issue 6, Pages 221–236
DOI: https://doi.org/10.15514/ISPRAS-2018-30(6)-12
(Mi tisp385)
 

This article is cited in 2 scientific papers

Automatic search for fragments containing biographical information in a natural language text

A. V. Glazkova

University of Tyumen
Abstract: Searching and classifying text documents are key tasks of information retrieval with many practical applications. Methods for text search and classification are used in search engines, electronic libraries and catalogs, systems for collecting and processing information, online education, and many other areas. These methods have many specific applications, but each such practical task is, as a rule, hard to formalize and narrowly focused on its subject area. It therefore requires individual study and its own approach to a solution. This paper addresses the task of automatically finding and typing text fragments that contain biographical information. The key step in solving this task is a multi-class classification of text fragments according to the presence and type of biographical information they contain. After reviewing related work, the author concludes that neural network methods are promising and widely used for such problems. Based on this conclusion, the paper compares several neural network architectures and basic text representation methods (Bag-of-Words, TF-IDF, Word2Vec) on a previously collected and annotated corpus of biographical texts. The article describes the steps taken to prepare a training set of text fragments, as well as the text representation and classification methods chosen for the task. The results of the multi-class classification of text fragments are also presented. Examples of automatic search for fragments containing biographical information are shown for texts that were not used to train the models.
Keywords: text classification, natural language processing, word embedding, neural networks, biographical text.
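
The two count-based representations named in the abstract, Bag-of-Words and TF-IDF, can be illustrated with a minimal sketch. This is not the author's implementation: the English example fragments, the tokenizer, the function names, and the particular TF-IDF weighting (relative term frequency times the logarithm of inverse document frequency) are all assumptions made for illustration. Word2Vec and the neural network classifiers compared in the paper are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical toy fragments; the paper's corpus is not reproduced here.
FRAGMENTS = [
    "He was born in Tyumen in 1950.",
    "She graduated from the university.",
    "The weather was cold in winter.",
]


def tokenize(text):
    """Lowercase the text and keep runs of Latin letters (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(fragments):
    """Return the sorted list of distinct tokens across all fragments."""
    return sorted({tok for frag in fragments for tok in tokenize(frag)})


def bag_of_words(fragment, vocabulary):
    """Represent a fragment as raw term counts over the vocabulary."""
    counts = Counter(tokenize(fragment))
    return [counts[term] for term in vocabulary]


def tf_idf(fragments, vocabulary):
    """Represent each fragment as TF-IDF weights (one common variant, assumed here)."""
    n = len(fragments)
    tokenized = [tokenize(f) for f in fragments]
    # Document frequency: number of fragments containing each term.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocabulary}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty fragments
        vectors.append(
            [(counts[t] / total) * math.log(n / df[t]) for t in vocabulary]
        )
    return vectors


vocabulary = build_vocabulary(FRAGMENTS)
bow_vectors = [bag_of_words(f, vocabulary) for f in FRAGMENTS]
tfidf_vectors = tf_idf(FRAGMENTS, vocabulary)

if __name__ == "__main__":
    for frag, vec in zip(FRAGMENTS, bow_vectors):
        print(frag, vec)
```

Either kind of vector could then be passed to a multi-class classifier. In this sketch a term that occurs in every fragment receives a TF-IDF weight of zero, which is the usual motivation for preferring TF-IDF over raw counts.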
Funding agency: Russian Foundation for Basic Research
Grant number: 18-37-00272
Document Type: Article
Language: Russian
Citation: A. V. Glazkova, “Automatic search for fragments containing biographical information in a natural language text”, Proceedings of ISP RAS, 30:6 (2018), 221–236
Citation in format AMSBIB
\Bibitem{Gla18}
\by A.~V.~Glazkova
\paper Automatic search for fragments containing biographical information in a natural language text
\jour Proceedings of ISP RAS
\yr 2018
\vol 30
\issue 6
\pages 221--236
\mathnet{http://mi.mathnet.ru/tisp385}
\crossref{https://doi.org/10.15514/ISPRAS-2018-30(6)-12}
\elib{https://elibrary.ru/item.asp?id=36825273}
Linking options:
  • https://www.mathnet.ru/eng/tisp385
  • https://www.mathnet.ru/eng/tisp/v30/i6/p221