Analysis of textual and graphical information
Methods for cross-lingual retrieval of similar documents in legal domain based on machine learning
V. V. Zhebel (a), D. A. Devyatkin (b), D. V. Zubarev (b), I. V. Sochenkov (b, c, d)
a Limited liability company "Technologies for Systems Analysis", Moscow, Russia
b Federal Research Center "Computer Science and Control" of Russian Academy of Sciences, Moscow, Russia
c Innopolis University, Kazan, Russia
d Ivannikov Institute for System Programming of the RAS, Moscow, Russia
Abstract:
The need to study international experience in order to improve legislation calls for information retrieval systems that perform well in the multilingual legal domain. One possible solution is the retrieval of thematically similar documents. An important task here is cross-lingual transfer, which lets the user submit a document in one language and obtain search results in another. The paper describes different approaches to this problem, from classical mediator-based methods to modern methods of distributional semantics. The UN Digital Library is used as the test collection. The combination of the extended translation model and the BM25 ranking function demonstrates the best results.
Keywords:
cross-lingual document retrieval, distributional semantics, information retrieval in the legal domain.
Citation:
V. V. Zhebel, D. A. Devyatkin, D. V. Zubarev, I. V. Sochenkov, “Methods for cross-lingual retrieval of similar documents in legal domain based on machine learning”, Artificial Intelligence and Decision Making, 2022, no. 2, 27–35; Scientific and Technical Information Processing, 50:5 (2023), 494–499
Linking options:
https://www.mathnet.ru/eng/iipr62
https://www.mathnet.ru/eng/iipr/y2022/i2/p27