D. V. Zubarev, I. V. Sochenkov, “Cross-lingual similar document retrieval methods”, Proceedings of ISP RAS, 31:5 (2019), 127

Proceedings of the Institute for System Programming of the RAS

Proceedings of the Institute for System Programming of the RAS, 2019, Volume 31, Issue 5, Pages 127–136
DOI: https://doi.org/10.15514/ISPRAS-2019-31(5)-9 (Mi tisp458)

This article is cited in 3 scientific papers.

Cross-lingual similar document retrieval methods

D. V. Zubarev, I. V. Sochenkov

Federal Research Center «Computer Science and Control» of Russian Academy of Sciences

Abstract: In this paper, we compare different methods for cross-lingual similar document retrieval, focusing on the Russian-English language pair. We compare well-known methods such as Cross-Lingual Explicit Semantic Analysis (CL-ESA) with methods based on cross-lingual embeddings. We use approximate nearest neighbor (ANN) search to retrieve documents based entirely on distances between learned document embeddings. We also employ a more traditional approach based on an inverted index, with an extra step that maps the top keywords of a document from one language to the other using cross-lingual word embeddings. All approaches are evaluated on aligned Russian-English Wikipedia articles. The experiments show that the inverted-index approach outperforms the other methods in terms of recall and MAP.
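The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind the inverted-index approach, not the authors' method. It assumes a toy shared embedding space in which Russian and English words are directly comparable, maps each Russian keyword to its nearest English word by cosine similarity, and ranks English documents by a simple idf-weighted keyword overlap. The vectors, documents, helper names (`map_keywords`, `build_index`, `search`) and the scoring formula are all hypothetical illustrations.

```python
import math
from collections import defaultdict

# Hypothetical toy cross-lingual embeddings: Russian and English words share
# one 3-dimensional space (real embeddings are learned and much larger).
RU_EMB = {
    "кошка": [0.9, 0.1, 0.0],
    "собака": [0.1, 0.9, 0.0],
    "поезд": [0.0, 0.1, 0.9],
}
EN_EMB = {
    "cat": [0.88, 0.12, 0.02],
    "dog": [0.12, 0.85, 0.05],
    "train": [0.01, 0.08, 0.95],
    "car": [0.05, 0.3, 0.7],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def map_keywords(keywords, src_emb, tgt_emb, k=1):
    """Map each source keyword to its k nearest target-language words."""
    mapped = []
    for word in keywords:
        if word not in src_emb:
            continue  # out-of-vocabulary keywords are dropped
        ranked = sorted(tgt_emb, key=lambda t: cosine(src_emb[word], tgt_emb[t]),
                        reverse=True)
        mapped.extend(ranked[:k])
    return mapped


def build_index(docs):
    """Inverted index: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(query_terms, index, n_docs, top_n=3):
    """Rank documents by the summed idf of the query terms they contain."""
    scores = defaultdict(float)
    for term in set(query_terms):
        postings = index.get(term, set())
        if not postings:
            continue
        idf = math.log(n_docs / len(postings)) + 1.0
        for doc_id in postings:
            scores[doc_id] += idf
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Usage: retrieve English documents for the top keywords of a Russian document.
EN_DOCS = {
    "en1": "the cat chased the dog",
    "en2": "the train left the station",
    "en3": "a dog on a train",
}
ru_keywords = ["кошка", "собака"]
query = map_keywords(ru_keywords, RU_EMB, EN_EMB)
index = build_index(EN_DOCS)
results = search(query, index, len(EN_DOCS))
print(query)  # ['cat', 'dog']
print(results)
```

Keeping retrieval on an inverted index means only documents sharing at least one mapped keyword are scored, which is the main practical difference from ranking every document by embedding distance with ANN search.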

Keywords: cross-lingual document retrieval, cross-lingual plagiarism detection, cross-lingual word embeddings.

Funding agency	Grant number
Russian Foundation for Basic Research	18-37-20017
Foundation of Project Support of the National Technology Initiative	13/1251/2018
This study was funded by RFBR under research project No. 18-37-20017. The reported research was also partially funded by the project “Text mining tools for big data” as part of the program supporting Technical Leadership Centers of the National Technological Initiative “Center for Big Data Storage and Processing” at Moscow State University (Agreement with the Fund supporting NTI projects No. 13/1251/2018 of 11.12.2018).

Document Type: Article

Language: English

Citation: D. V. Zubarev, I. V. Sochenkov, “Cross-lingual similar document retrieval methods”, Proceedings of ISP RAS, 31:5 (2019), 127–136

Citation in format AMSBIB

\Bibitem{ZubSoc19}

\by D.~V.~Zubarev, I.~V.~Sochenkov

\paper Cross-lingual similar document retrieval methods

\jour Proceedings of ISP RAS

\yr 2019

\vol 31

\issue 5

\pages 127--136

\mathnet{http://mi.mathnet.ru/tisp458}

\crossref{https://doi.org/10.15514/ISPRAS-2019-31(5)-9}

Linking options:

https://www.mathnet.ru/eng/tisp458

https://www.mathnet.ru/eng/tisp/v31/i5/p127
