SPECIAL ISSUE: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES
Accessible Russian large language models: open-sourced models and instructive datasets for commercial applications
D. Kosenko (a,b), Yu. Kuratov (a,b,c), D. Zharikova (a,b)
a Moscow Institute of Physics and Technology (National Research University), Moscow, Russia
b DeepPavlov, Moscow, Russia
c Artificial Intelligence Research Institute, Moscow, Russia
Abstract:
This paper presents an approach to developing and fine-tuning large language models for Russian that are capable of following instructions across domains. As base models, XGLM-4.5B, LLaMA-1 7B, LLaMA-1 13B, LLaMA-2 7B, LLaMA-2 13B, and ruGPT-3.5 13B were used. This work compares two main fine-tuning techniques: fine-tuning all model parameters and fine-tuning with LoRA layers. To create a fine-tuning dataset, several open English-language data sources were used, including Databricks Dolly 15k, the OpenAssistant Conversations Dataset (OASST1), and chip2-instruct-alpha-v6a-1, which were then translated into Russian using the WMT21 En-X model. This work shows that the quality of the training instructions significantly affects performance on automatic benchmarks such as MT-Bench and MMLU. At the same time, models trained on the commercially licensed dataset collected in this work achieve results comparable to models fine-tuned on the Saiga dataset, which carries a restrictive license. The fine-tuned language models and the collected Russian-language dataset are released as open source under licenses suitable for commercial use.
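The abstract outlines a two-stage pipeline: translating open English instruction data into Russian with the WMT21 En-X model, then fine-tuning a base model, either fully or with LoRA adapters. The sketch below illustrates both stages with public Hugging Face checkpoints (facebook/wmt21-dense-24-wide-en-x, meta-llama/Llama-2-7b-hf) and the peft library; the LoRA hyperparameters (rank, alpha, target modules) are illustrative assumptions, not values reported by the paper.

```python
# Minimal sketch of the pipeline described in the abstract.
# Stage 1: En -> Ru translation of an instruction sample.
# Stage 2: attaching LoRA adapters for parameter-efficient fine-tuning.
# All hyperparameters below are assumed for illustration only.
from transformers import AutoModelForSeq2SeqLM, AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# -- Stage 1: translate an English instruction into Russian -------------------
mt_name = "facebook/wmt21-dense-24-wide-en-x"
mt_tokenizer = AutoTokenizer.from_pretrained(mt_name)
mt_model = AutoModelForSeq2SeqLM.from_pretrained(mt_name)

inputs = mt_tokenizer("Explain what a large language model is.", return_tensors="pt")
generated = mt_model.generate(
    **inputs,
    forced_bos_token_id=mt_tokenizer.get_lang_id("ru"),  # force Russian output
)
print(mt_tokenizer.batch_decode(generated, skip_special_tokens=True))

# -- Stage 2: wrap a base causal LM with LoRA adapters -------------------------
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(
    r=8,                                   # adapter rank (assumed)
    lora_alpha=16,                         # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
```

With this setup, only the low-rank adapter matrices are updated during training, which is what makes LoRA fine-tuning far cheaper in memory than updating all model parameters.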
Keywords:
large language models, language models, language models in Russian.
Citation:
D. Kosenko, Yu. Kuratov, D. Zharikova, “Accessible Russian large language models: open-sourced models and instructive datasets for commercial applications”, Dokl. RAN. Math. Inf. Proc. Upr., 514:2 (2023), 262–269; Dokl. Math., 108:suppl. 2 (2023), S393–S398
Linking options:
https://www.mathnet.ru/eng/danma471
https://www.mathnet.ru/eng/danma/v514/i2/p262