Computer Optics
Computer Optics, 2017, Volume 41, Issue 3, Pages 461–471
DOI: https://doi.org/10.18287/2412-6179-2017-41-3-461-471
(Mi co406)
 

This article is cited in 8 scientific papers (total in 8 papers)

NUMERICAL METHODS AND DATA ANALYSIS

An approach based on analysis of n-grams on links of words to extract the knowledge and relevant linguistic means on subject-oriented text sets

D. V. Mikhaylov, A. P. Kozlov, G. M. Emelyanov

Yaroslav-the-Wise Novgorod State University, Velikii Novgorod, Russia
Abstract: This paper addresses two interrelated problems: extracting knowledge units from a set of subject-oriented texts (a corpus), and assessing how completely the initial phrases reflect the actual knowledge revealed. The main practical goal is to find the most rational way to express a knowledge fragment in a given natural language, so that it can then be reflected in the thesaurus and ontology of a subject area. These problems matter when constructing systems for processing, analyzing, estimating and understanding information. The relevance of a text to the initial phrase, in terms of the described fragment of actual knowledge (including the forms of its expression in a given natural language), is measured by estimating the coupling strength of words from the initial phrase that occur jointly in phrases of the analyzed text, and by classifying these words according to their TF-IDF values relative to the text corpus. The paper extends word links from traditional bigrams to three or more elements in order to reveal the constituents of an image of the initial phrase as combinations of related words. Variants of link revelation with and without a database of known syntactic relations are considered. To describe more completely the fragment of expert knowledge revealed in the corpus texts, sets of initial phrases that are mutually equivalent or complementary in sense and relate to the same image are taken into consideration. Compared with searching for components of the analyzed image in a syntactically marked text corpus, the proposed text selection method reduces, on average by a factor of 17, the output of phrases that are irrelevant to the initial ones in terms of either the knowledge fragment described or the forms of its expression in a given natural language.
Keywords: pattern recognition, intelligent data analysis, information theory, open-form test assignment, natural-language expression of expert knowledge, contextual annotation, document ranking in information retrieval.
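The abstract names two measurable ingredients: TF-IDF values of the initial phrase's words relative to the corpus, and joint occurrence of those words in phrases of a text. The Python sketch below illustrates both in their textbook form. It is not the authors' method: the abstract gives no formulas, so the TF-IDF variant, the use of a plain sentence-level co-occurrence count as a stand-in for "coupling strength", the toy corpus, and the function names `tokenize`, `tf_idf` and `pair_cooccurrence` are all assumptions made for illustration.

```python
import math
import re
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(word, doc_tokens, corpus_tokens):
    """Textbook TF-IDF: relative term frequency times log(N / document frequency)."""
    tf = doc_tokens.count(word) / len(doc_tokens) if doc_tokens else 0.0
    df = sum(1 for d in corpus_tokens if word in d)
    idf = math.log(len(corpus_tokens) / df) if df else 0.0
    return tf * idf


def pair_cooccurrence(phrase_words, doc_text):
    """For each pair of phrase words, count the sentences of doc_text containing both.

    This raw count is only a hypothetical proxy for coupling strength.
    """
    sentences = [set(tokenize(s)) for s in re.split(r"[.!?]+", doc_text)]
    counts = {}
    for a, b in combinations(sorted(set(phrase_words)), 2):
        counts[(a, b)] = sum(1 for s in sentences if a in s and b in s)
    return counts


# Toy corpus and initial phrase, invented for this example.
corpus = [
    "The thesaurus stores terms. The ontology links terms of a subject area.",
    "An ontology of a subject area is built from a thesaurus.",
    "Image filtering reduces noise.",
]
phrase_words = tokenize("ontology thesaurus subject")
corpus_tokens = [tokenize(doc) for doc in corpus]

for i, doc in enumerate(corpus):
    scores = {w: tf_idf(w, corpus_tokens[i], corpus_tokens) for w in phrase_words}
    links = pair_cooccurrence(phrase_words, doc)
    print(i, scores, links)
```

In this sketch a text in which the phrase words share sentences yields nonzero pair counts, while an unrelated text yields zeros for every pair. Extending `combinations(..., 2)` to triples or longer tuples would mimic, very roughly, the move from bigrams to links of three or more words that the abstract describes.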
Funding agency | Grant number
Ministry of Education and Science of the Russian Federation |
Russian Foundation for Basic Research | 16-01-00004 а
The work was partially funded by the Ministry of Education and Science of the Russian Federation (the basic part of the state task) and by the Russian Foundation for Basic Research, grant No. 16-01-00004.
Received: 10.04.2017
Accepted: 01.06.2017
Document Type: Article
Language: Russian
Citation: D. V. Mikhaylov, A. P. Kozlov, G. M. Emelyanov, “An approach based on analysis of n-grams on links of words to extract the knowledge and relevant linguistic means on subject-oriented text sets”, Computer Optics, 41:3 (2017), 461–471
Citation in AMSBIB format
\Bibitem{MikKozEme17}
\by D.~V.~Mikhaylov, A.~P.~Kozlov, G.~M.~Emelyanov
\paper An approach based on analysis of n-grams on links of words to extract the knowledge and relevant linguistic means on subject-oriented text sets
\jour Computer Optics
\yr 2017
\vol 41
\issue 3
\pages 461--471
\mathnet{http://mi.mathnet.ru/co406}
\crossref{https://doi.org/10.18287/2412-6179-2017-41-3-461-471}
Linking options:
  • https://www.mathnet.ru/eng/co406
  • https://www.mathnet.ru/eng/co/v41/i3/p461