This article is cited in 12 scientific papers.
DATA ANALYSIS
Extraction the knowledge and relevant linguistic means with efficiency estimation for formation of subject-oriented text sets
D. V. Mikhaylov, A. P. Kozlov, G. M. Emelyanov
Yaroslav-the-Wise Novgorod State University, Velikii Novgorod, Russia
Abstract:
In this paper we consider two interrelated problems: extracting knowledge units from a set of subject-oriented texts (a so-called corpus), and selecting texts for the corpus by analyzing their relevance to an initial phrase. The main practical goal is to find the most rational way to express a knowledge fragment in a given natural language, so that it can subsequently be reflected in the thesaurus and ontology of a subject area. These problems are important when building systems for processing, analyzing, estimating and understanding information. Here, the relevance of a text to the initial phrase, in terms of the described fragment of actual knowledge (including the forms of its expression in a given natural language), is defined by the total numerical estimate of the coupling strength of the words of the initial phrase that occur jointly in phrases of the text under analysis. The paper considers known variants of such estimation procedures and applies them to the search for distinct components that reflect the initial phrase in the texts selected for the topical text corpus. These components correspond to words and word combinations. Compared with searching for such components in a syntactically annotated text corpus, the text-selection method proposed in this paper reduces, on average by a factor of 15, the number of output phrases that are irrelevant to the initial phrase in terms of either the described knowledge fragment or its forms of expression in a given natural language.
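The relevance estimate described in the abstract (summing the coupling strength of initial-phrase words that occur together in a phrase of the text) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes a "phrase" is a sentence, that words are compared as lowercase surface forms (no lemmatization or syntactic analysis), and that the coupling strength of a word pair is the Dice coefficient, one classic association measure standing in for whichever variants the paper actually evaluates. The function names `split_phrases`, `dice`, `relevance` and `rank_texts` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def split_phrases(text: str) -> list[set[str]]:
    """Split a text into phrases (assumption: sentences) as sets of lowercase word forms."""
    chunks = re.split(r"[.!?;]+", text)
    return [set(re.findall(r"\w+", c.lower())) for c in chunks if c.strip()]


def dice(pair_freq: int, freq_a: int, freq_b: int) -> float:
    """Dice coefficient 2*f(a,b) / (f(a) + f(b)); assumed coupling measure, in [0, 1]."""
    return 2.0 * pair_freq / (freq_a + freq_b)


def relevance(initial_phrase: str, text: str) -> float:
    """Total coupling strength of initial-phrase words that co-occur within phrases of `text`."""
    query = set(re.findall(r"\w+", initial_phrase.lower()))
    phrases = split_phrases(text)
    # Number of phrases containing each query word.
    word_freq = Counter(w for p in phrases for w in p & query)
    # Number of phrases in which each pair of query words occurs jointly.
    pair_freq = Counter()
    for p in phrases:
        for a, b in combinations(sorted(p & query), 2):
            pair_freq[(a, b)] += 1
    return sum(dice(f, word_freq[a], word_freq[b]) for (a, b), f in pair_freq.items())


def rank_texts(initial_phrase: str, texts: list[str]) -> list[str]:
    """Order candidate texts by decreasing relevance to the initial phrase."""
    return sorted(texts, key=lambda t: relevance(initial_phrase, t), reverse=True)


if __name__ == "__main__":
    query = "knowledge extraction from texts"
    a = "Knowledge extraction from texts is hard. Texts vary."
    b = "Knowledge is power. Extraction of oil. Texts vary."
    print(relevance(query, a), relevance(query, b))
    print(rank_texts(query, [b, a])[0] == a)
```

Swapping `dice` for another association measure (for example, pointwise mutual information) changes only the per-pair score; the summation over jointly occurring word pairs stays the same.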
Keywords:
pattern recognition, intelligent data analysis, information theory, open-form test assignment, natural-language expression of expert knowledge, contextual annotation, document ranking in information retrieval.
Received: 14.04.2016 Accepted: 01.07.2016
Citation:
D. V. Mikhaylov, A. P. Kozlov, G. M. Emelyanov, “Extraction the knowledge and relevant linguistic means with efficiency estimation for formation of subject-oriented text sets”, Computer Optics, 40:4 (2016), 572–582
Linking options:
https://www.mathnet.ru/eng/co252 https://www.mathnet.ru/eng/co/v40/i4/p572