This article is cited in 4 scientific papers.
ADVANCED STUDIES IN ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
ruSciBERT: a transformer language model for obtaining semantic embeddings of scientific texts in Russian
N. A. Gerasimenko, A. S. Chernyavsky, M. A. Nikiforova (Sberbank, Moscow)
Abstract:
Due to the rapid growth in the number of scientific publications and reports, processing and analyzing them has become complicated and labor-intensive. Transformer language models pretrained on large text collections can provide high-quality solutions for a variety of tasks in textual data analysis. For scientific texts in English, there are language models such as SciBERT [1] and its modification SPECTER [2], but they do not support Russian, because Russian texts are scarce in their training sets. Moreover, the SciDocs benchmark, which is used to evaluate the performance of language models on scientific texts, supports only English. The proposed ruSciBERT model makes it possible to solve a wide variety of tasks related to the analysis of scientific texts in Russian. It is supplemented with the ruSciDocs benchmark for evaluating the performance of language models on these tasks.
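The abstract concerns obtaining semantic embeddings of texts from a transformer encoder. A minimal sketch of one common recipe for turning per-token vectors into a single text embedding, masked mean pooling; the pooling choice, function name, and toy data are illustrative assumptions, not details stated in the paper:

```python
# Sketch: masked mean pooling over token vectors (an assumption, not the
# paper's documented method). A BERT-style encoder emits one vector per
# token; averaging them while skipping padding yields a text embedding.

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention_mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:  # skip padding positions
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total]

# Toy example: three 2-dimensional token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

In practice the token vectors would come from the pretrained model's last hidden state, and the resulting embeddings can be compared with cosine similarity for retrieval or clustering of scientific documents.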
Keywords:
language model, semantic representations, SciBERT, SciDocs.
Citation:
N. A. Gerasimenko, A. S. Chernyavsky, M. A. Nikiforova, “ruSciBERT: a transformer language model for obtaining semantic embeddings of scientific texts in Russian”, Dokl. RAN. Math. Inf. Proc. Upr., 508 (2022), 104–105; Dokl. Math., 508:suppl. 1 (2022), S95–S96
Linking options:
https://www.mathnet.ru/eng/danma345 https://www.mathnet.ru/eng/danma/v508/p104