Proceedings of the Institute for System Programming of the RAS

Proceedings of the Institute for System Programming of the RAS, 2022, Volume 34, Issue 4, Pages 187–200
DOI: https://doi.org/10.15514/ISPRAS-2022-34(4)-13
(Mi tisp713)
 

Methods and techniques to automatic entity linking in Russian

A. A. Mezentseva (a, b), E. P. Bruches (a, b), T. V. Batura (a)

(a) A.P. Ershov Institute of Informatics Systems, Siberian Branch of the Russian Academy of Sciences
(b) Novosibirsk State University
Abstract: There is growing interest in solving NLP tasks with the help of external knowledge stores, for example in information retrieval, question answering, and dialogue systems. It is therefore important to establish relations between entities in the processed text and a knowledge base. This article is devoted to entity linking with Wikidata as the external knowledge base. We treat scientific terms in Russian as entities. A traditional entity linking system has three stages: entity recognition, candidate generation (from the knowledge base), and candidate ranking. Our system takes as input raw text in which the terms are already marked. To generate candidates, we use string matching between terms in the input text and entities from Wikidata. Candidate ranking is the most complicated stage because it requires semantic information. We conducted several experiments on the candidate ranking stage with different models, including an approach based on cosine similarity, classical machine learning algorithms, and neural networks. We also extended the RUSERRC dataset with manually annotated data for model training. The results showed that the approach based on cosine similarity outperforms the others and does not require manually annotated data. The dataset and system are open-sourced and available to other researchers.
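The last two stages described in the abstract (string-match candidate generation and cosine-similarity ranking) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which text representation is used for the similarity, so plain bag-of-words vectors stand in here. The knowledge base `TOY_KB`, its identifiers, and all function names are hypothetical, and English toy data is used only for readability.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base: id -> (label, description).
# The ids are made up for illustration and are NOT real Wikidata ids.
TOY_KB = {
    "E1": ("kernel", "core component of an operating system that manages hardware resources"),
    "E2": ("kernel", "function used in machine learning to compute similarity between data points"),
    "E3": ("neural network", "machine learning model built from layers of connected units"),
}


def bag_of_words(text):
    """Lowercase the text and count its word tokens (assumed stand-in for an embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def generate_candidates(term, kb):
    """Candidate generation: exact, case-insensitive string match on entity labels."""
    term_norm = term.strip().lower()
    return [eid for eid, (label, _) in kb.items() if label.lower() == term_norm]


def rank_candidates(context, candidates, kb):
    """Candidate ranking: sort candidates by similarity of their description to the context."""
    ctx = bag_of_words(context)
    scored = [(eid, cosine_similarity(ctx, bag_of_words(kb[eid][1]))) for eid in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def link_term(term, context, kb):
    """Return the id of the best-ranked candidate, or None if nothing matches the term."""
    candidates = generate_candidates(term, kb)
    if not candidates:
        return None
    return rank_candidates(context, candidates, kb)[0][0]
```

Calling `link_term("kernel", "the operating system kernel manages hardware", TOY_KB)` picks between the two entries labelled "kernel" by comparing each description with the surrounding text. A ranking of this kind needs no annotated training data, which matches the advantage the abstract attributes to the cosine-similarity approach.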
Keywords: entity linking, knowledge base, scientific terms
Document Type: Article
Language: Russian
Citation: A. A. Mezentseva, E. P. Bruches, T. V. Batura, “Methods and techniques to automatic entity linking in Russian”, Proceedings of ISP RAS, 34:4 (2022), 187–200
Citation in AMSBIB format
\Bibitem{MezBruBat22}
\by A.~A.~Mezentseva, E.~P.~Bruches, T.~V.~Batura
\paper Methods and techniques to automatic entity linking in Russian
\jour Proceedings of ISP RAS
\yr 2022
\vol 34
\issue 4
\pages 187--200
\mathnet{http://mi.mathnet.ru/tisp713}
\crossref{https://doi.org/10.15514/ISPRAS-2022-34(4)-13}
Linking options:
  • https://www.mathnet.ru/eng/tisp713
  • https://www.mathnet.ru/eng/tisp/v34/i4/p187