Proceedings of the Institute for System Programming of the RAS
Proceedings of the Institute for System Programming of the RAS, 2019, Volume 31, Issue 5, Pages 127–136
DOI: https://doi.org/10.15514/ISPRAS-2019-31(5)-9
(Mi tisp458)
 

This article is cited in 3 scientific papers.

Cross-lingual similar document retrieval methods

D. V. Zubarev, I. V. Sochenkov

Federal Research Center «Computer Science and Control» of the Russian Academy of Sciences
Abstract: In this paper, we compare different methods for cross-lingual similar document retrieval, focusing on the Russian-English language pair. We compare well-known methods such as Cross-Lingual Explicit Semantic Analysis (CL-ESA) with methods based on cross-lingual embeddings. We use approximate nearest neighbor (ANN) search to retrieve documents based entirely on distances between learned document embeddings. We also employ a more traditional approach based on an inverted index, with an extra step that maps the top keywords from one language to the other using cross-lingual word embeddings. All approaches are evaluated on aligned Russian-English Wikipedia articles. The experiments show that the inverted-index approach outperforms the other methods in terms of recall and MAP.
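The inverted-index approach summarized in the abstract can be illustrated with a minimal sketch. It assumes that Russian and English word vectors already share one cross-lingual space and that the top keywords of the query document have already been extracted. The toy vectors, the function names (map_keywords, build_inverted_index, retrieve) and the simple count-based scoring are hypothetical simplifications, not the authors' implementation. Each Russian keyword is mapped to its nearest English word by cosine similarity, and English documents are then ranked by how many mapped terms they contain.

```python
import math
from collections import defaultdict

# Hypothetical toy cross-lingual embeddings: Russian and English words are
# assumed to live in one shared vector space (values invented for illustration).
RU_EMB = {
    "кошка": [0.9, 0.1, 0.0],
    "собака": [0.1, 0.9, 0.0],
    "дом": [0.0, 0.1, 0.9],
}
EN_EMB = {
    "cat": [0.88, 0.12, 0.01],
    "dog": [0.12, 0.86, 0.02],
    "house": [0.01, 0.08, 0.95],
    "car": [0.5, 0.5, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def map_keywords(keywords, src_emb, tgt_emb, k=1):
    """Map each source keyword to its k nearest target-language words (brute force)."""
    mapped = []
    for word in keywords:
        if word not in src_emb:
            continue  # out-of-vocabulary keywords are skipped
        ranked = sorted(tgt_emb, key=lambda t: cosine(src_emb[word], tgt_emb[t]), reverse=True)
        mapped.extend(ranked[:k])
    return mapped


def build_inverted_index(docs):
    """Build a term -> set-of-document-ids index from whitespace-tokenized texts."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve(query_terms, index):
    """Rank documents by how many mapped query terms they contain (ties by id)."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "en1": "the cat sat in the house",
    "en2": "a dog ran past the car",
    "en3": "the dog and the cat",
}
index = build_inverted_index(docs)
terms = map_keywords(["кошка", "дом"], RU_EMB, EN_EMB)
print(terms)                   # ['cat', 'house']
print(retrieve(terms, index))  # ['en1', 'en3']
```

The linear scan in map_keywords is adequate for a toy vocabulary; with full vocabularies, or for the embedding-only variant mentioned in the abstract, an approximate nearest neighbor index would typically replace it.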
Keywords: cross-lingual document retrieval, cross-lingual plagiarism detection, cross-lingual word embeddings.
Funding:
Russian Foundation for Basic Research, grant No. 18-37-20017
Foundation of Project Support of the National Technology Initiative, grant No. 13/1251/2018
This study was funded by RFBR under research project No. 18-37-20017. The reported research is also partially funded by the project “Text mining tools for big data” as part of the program supporting Technical Leadership Centers of the National Technological Initiative “Center for Big Data Storage and Processing” at Moscow State University (Agreement with the Fund supporting NTI projects No. 13/1251/2018 of 11.12.2018).
Document Type: Article
Language: English
Citation: D. V. Zubarev, I. V. Sochenkov, “Cross-lingual similar document retrieval methods”, Proceedings of ISP RAS, 31:5 (2019), 127–136
Citation in AMSBIB format:
\Bibitem{ZubSoc19}
\by D.~V.~Zubarev, I.~V.~Sochenkov
\paper Cross-lingual similar document retrieval methods
\jour Proceedings of ISP RAS
\yr 2019
\vol 31
\issue 5
\pages 127--136
\mathnet{http://mi.mathnet.ru/tisp458}
\crossref{https://doi.org/10.15514/ISPRAS-2019-31(5)-9}
Linking options:
  • https://www.mathnet.ru/eng/tisp458
  • https://www.mathnet.ru/eng/tisp/v31/i5/p127