Modelirovanie i Analiz Informatsionnykh Sistem
Modelirovanie i Analiz Informatsionnykh Sistem, 2016, Volume 23, Number 6, Pages 826–840
DOI: https://doi.org/10.18255/1818-1015-2016-6-826-840
(Mi mais543)
 


Methodological aspects of semantic relationship extraction for automatic thesaurus generation

N. S. Lagutina, K. V. Lagutina, E. I. Mamedov, I. V. Paramonov

P.G. Demidov Yaroslavl State University, 14 Sovetskaya str., Yaroslavl 150000, Russia
Abstract: The paper analyzes methods for automatic generation of a specialized thesaurus. The main generation algorithm consists of three stages: selection and preprocessing of a text corpus, recognition of thesaurus terms, and extraction of relations among terms. Our work focuses on methods for semantic relation extraction. We developed a test bench that allows testing of well-known algorithms for extraction of synonyms and hypernyms. These algorithms are based on different relation extraction techniques: lexico-syntactic patterns, morpho-syntactic rules, measurement of term information quantity, the general-purpose thesaurus WordNet, and Levenshtein distance. To analyze the resulting thesaurus, we proposed a complex assessment that includes the following metrics: precision of extracted terms, precision and recall of hierarchical and synonym relations, and characteristics of the thesaurus graph (the number of extracted terms and semantic relationships of different types, the number of connected components, and the number of vertices in the largest component). The proposed set of metrics makes it possible to evaluate the quality of the thesaurus as a whole, reveal some drawbacks of standard relation extraction methods, and create more efficient hybrid methods that generate thesauri with better characteristics than those generated by the separate methods. To illustrate this, one such hybrid method is considered in the paper. It combines the best standard algorithms for hypernym and synonym extraction and generates a specialized medical thesaurus. The hybrid method keeps the thesaurus quality at the same level while finding more relations between terms than the well-known algorithms.
Keywords: thesaurus, semantic relations, hybrid method, complex assessment, test bench.
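The assessment metrics listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' test bench: the function names `precision_recall` and `graph_characteristics`, the representation of relations as pairs of terms, and the sample medical terms are all assumptions made for illustration only.

```python
from collections import defaultdict


def precision_recall(extracted, reference):
    """Precision and recall of extracted items against a reference set.

    Items can be terms or relation pairs; they only need to be hashable.
    """
    extracted, reference = set(extracted), set(reference)
    correct = len(extracted & reference)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


def graph_characteristics(terms, relations):
    """Counts for the thesaurus graph, treating relations as undirected edges."""
    adjacency = defaultdict(set)
    vertices = set(terms)
    for a, b in relations:
        adjacency[a].add(b)
        adjacency[b].add(a)
        vertices.update((a, b))

    seen = set()
    component_sizes = []
    for start in vertices:
        if start in seen:
            continue
        # Depth-first traversal of one connected component.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        component_sizes.append(size)

    return {
        "terms": len(vertices),
        "relations": len(relations),
        "components": len(component_sizes),
        "largest_component": max(component_sizes, default=0),
    }


# Hypothetical sample data: (hyponym, hypernym) pairs.
extracted = {("aspirin", "drug"), ("flu", "drug")}
reference = {("aspirin", "drug"), ("flu", "disease")}
precision, recall = precision_recall(extracted, reference)

stats = graph_characteristics(
    ["aspirin", "drug", "flu", "disease", "fever"],
    [("aspirin", "drug"), ("flu", "disease")],
)
```

Computing the graph characteristics alongside precision and recall shows how fragmented the thesaurus is: a method can score well on precision yet leave many terms isolated in their own components.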
Funding agency: Ministry of Education and Science of the Russian Federation
Grant number: MK-5456.2016.9
This work was supported by a grant of the President of the Russian Federation for state support of young Russian scientists (project MK-5456.2016.9).
Received: 19.10.2016
Document Type: Article
UDC: 004.912
Language: Russian
Citation: N. S. Lagutina, K. V. Lagutina, E. I. Mamedov, I. V. Paramonov, “Methodological aspects of semantic relationship extraction for automatic thesaurus generation”, Model. Anal. Inform. Sist., 23:6 (2016), 826–840
Citation in format AMSBIB
\Bibitem{LagLagMam16}
\by N.~S.~Lagutina, K.~V.~Lagutina, E.~I.~Mamedov, I.~V.~Paramonov
\paper Methodological aspects of semantic relationship extraction for automatic thesaurus generation
\jour Model. Anal. Inform. Sist.
\yr 2016
\vol 23
\issue 6
\pages 826--840
\mathnet{http://mi.mathnet.ru/mais543}
\crossref{https://doi.org/10.18255/1818-1015-2016-6-826-840}
\mathscinet{http://mathscinet.ams.org/mathscinet-getitem?mr=3596164}
\elib{https://elibrary.ru/item.asp?id=27517426}
Linking options:
  • https://www.mathnet.ru/eng/mais543
  • https://www.mathnet.ru/eng/mais/v23/i6/p826