Computer Optics
Computer Optics, 2016, Volume 40, Issue 4, Pages 572–582
DOI: https://doi.org/10.18287/2412-6179-2016-40-4-572-582
(Mi co252)
 

This article is cited in 12 scientific papers (total in 12 papers)

DATA ANALYSIS

Extraction the knowledge and relevant linguistic means with efficiency estimation for formation of subject-oriented text sets

D. V. Mikhaylov, A. P. Kozlov, G. M. Emelyanov

Yaroslav-the-Wise Novgorod State University, Velikii Novgorod, Russia
Abstract: This paper addresses two interrelated problems: extracting knowledge units from a set of subject-oriented texts (a corpus), and selecting texts for the corpus by analyzing their relevance to an initial phrase. The main practical goal is to find the most rational way to express a knowledge fragment in a given natural language so that it can then be reflected in the thesaurus and ontology of a subject area. These problems matter when constructing systems for processing, analyzing, evaluating and understanding information. The relevance of a text to the initial phrase, in terms of the described fragment of actual knowledge (including the forms of its expression in a given natural language), is defined by the total numerical estimate of the coupling strength of those words of the initial phrase that occur jointly in phrases of the text under analysis. The paper considers known variants of such estimation procedures and their application to the search for distinct components, namely words and word combinations, that reflect the initial phrase in the texts selected for the topical corpus. Compared with searching for such components in a syntactically annotated text corpus, the proposed text selection method yields, on average, a 15-fold reduction in the output of phrases that are irrelevant to the initial one in terms of either the described knowledge fragment or the forms of its expression in a given natural language.
Keywords: pattern recognition, intelligent data analysis, information theory, open-form test assignment, natural-language expression of expert knowledge, contextual annotation, document ranking in information retrieval.
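The relevance estimate described in the abstract (a total of pairwise coupling strengths for words of the initial phrase that co-occur in phrases of a text) can be illustrated with a minimal Python sketch. The abstract does not say which coupling measures the paper uses, so the Dice coefficient here is only an assumed stand-in, and the function names (tokenize, split_phrases, dice, relevance) are hypothetical. The sketch also skips lemmatization and stop-word filtering, which a real pipeline would likely need.

```python
import re
from itertools import combinations


def tokenize(phrase):
    """Lowercase a phrase and split it into word tokens."""
    return re.findall(r"\w+", phrase.lower())


def split_phrases(text):
    """Split a text into phrases (here: sentences) and tokenize each one."""
    return [tokenize(s) for s in re.split(r"[.!?]+", text) if s.strip()]


def dice(word_a, word_b, phrases):
    """Dice coefficient of two words over a list of phrases (sets of tokens).

    ASSUMPTION: Dice is used only as one well-known association measure;
    it is not claimed to be the measure used in the paper.
    """
    n_a = sum(word_a in p for p in phrases)
    n_b = sum(word_b in p for p in phrases)
    n_ab = sum(word_a in p and word_b in p for p in phrases)
    return 2 * n_ab / (n_a + n_b) if n_a + n_b else 0.0


def relevance(initial_phrase, text):
    """Sum the coupling strengths over all word pairs of the initial phrase."""
    words = sorted(set(tokenize(initial_phrase)))
    phrases = [set(p) for p in split_phrases(text)]
    return sum(dice(a, b, phrases) for a, b in combinations(words, 2))


if __name__ == "__main__":
    initial = "model overfitting data"
    candidates = [
        "The model shows overfitting on training data. Overfitting hurts the model.",
        "The weather is nice today.",
    ]
    # Rank candidate texts by their relevance to the initial phrase.
    for text in sorted(candidates, key=lambda t: relevance(initial, t), reverse=True):
        print(round(relevance(initial, text), 3), text)
```

Under these assumptions, texts in which words of the initial phrase repeatedly share a phrase score higher and would be preferred for the corpus, while texts sharing none of the words score zero.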
Funding agency Grant number
Ministry of Education and Science of the Russian Federation
Russian Foundation for Basic Research 16-01-00004_а
This work was supported by the Ministry of Education and Science of the Russian Federation (basic part of the state assignment) and by the Russian Foundation for Basic Research (RFBR grant No. 16-01-00004).
Received: 14.04.2016
Accepted: 01.07.2016
Document Type: Article
Language: Russian
Citation: D. V. Mikhaylov, A. P. Kozlov, G. M. Emelyanov, “Extraction the knowledge and relevant linguistic means with efficiency estimation for formation of subject-oriented text sets”, Computer Optics, 40:4 (2016), 572–582
Citation in format AMSBIB
\Bibitem{MikKozEme16}
\by D.~V.~Mikhaylov, A.~P.~Kozlov, G.~M.~Emelyanov
\paper Extraction the knowledge and relevant linguistic means with efficiency estimation for formation of subject-oriented text sets
\jour Computer Optics
\yr 2016
\vol 40
\issue 4
\pages 572--582
\mathnet{http://mi.mathnet.ru/co252}
\crossref{https://doi.org/10.18287/2412-6179-2016-40-4-572-582}
Linking options:
  • https://www.mathnet.ru/eng/co252
  • https://www.mathnet.ru/eng/co/v40/i4/p572