This article is cited in 4 scientific papers.
ADVANCED STUDIES IN ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
ruSciBERT: a transformer language model for obtaining semantic embeddings of scientific texts in Russian
N. A. Gerasimenko, A. S. Chernyavsky, M. A. Nikiforova (Sberbank, Moscow)
Abstract:
Due to the rapid growth in the number of scientific publications and reports, processing and analyzing them has become complicated and labor-intensive. Transformer language models pretrained on large text collections can provide high-quality solutions for a variety of tasks in textual data analysis. For scientific texts in English, there are language models such as SciBERT [1] and its modification SPECTER [2], but they do not support Russian, because Russian texts are scarce in their training sets. Moreover, the SciDocs benchmark, which is used to evaluate the performance of language models on scientific texts, supports only English. The proposed ruSciBERT model makes it possible to solve a wide variety of tasks related to the analysis of scientific texts in Russian. It is supplemented with the ruSciDocs benchmark for evaluating the performance of language models on these tasks.
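The abstract concerns obtaining semantic embeddings of texts from a transformer encoder. A minimal sketch of one common recipe for turning per-token vectors into a single text embedding, masked mean pooling; the pooling choice, function name, and toy data are illustrative assumptions, not details stated in the paper:

```python
# Sketch: masked mean pooling over token vectors (an assumption, not the
# paper's documented method). A BERT-style encoder emits one vector per
# token; averaging them while skipping padding yields a text embedding.

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention_mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:  # skip padding positions
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total]

# Toy example: three 2-dimensional token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

In practice the token vectors would come from the pretrained model's last hidden state, and the resulting embeddings can be compared with cosine similarity for retrieval or clustering of scientific documents.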
Keywords:
language model, semantic representations, SciBERT, SciDocs.
Citation:
N. A. Gerasimenko, A. S. Chernyavsky, M. A. Nikiforova, “ruSciBERT: a transformer language model for obtaining semantic embeddings of scientific texts in Russian”, Dokl. RAN. Math. Inf. Proc. Upr., 508 (2022), 104–105; Dokl. Math., 508:suppl. 1 (2022), S95–S96
Linking options:
https://www.mathnet.ru/eng/danma345 https://www.mathnet.ru/eng/danma/v508/p104