Avtomatika i Telemekhanika
Avtomatika i Telemekhanika, 2022, Issue 10, Pages 67–79
DOI: https://doi.org/10.31857/S0005231022100075
(Mi at16052)
 

Topical issue

Gradient methods for optimizing metaparameters in the knowledge distillation problem

M. Gorpinich^a, O. Yu. Bakhteev^b, V. V. Strijov^b

a Moscow Institute of Physics and Technology, Dolgoprudnyi, Moscow oblast, 141701 Russia
b Dorodnicyn Computing Centre, Russian Academy of Sciences, Moscow, 119333 Russia
Abstract: The paper investigates the knowledge distillation problem for deep learning models. Knowledge distillation is a metaparameter optimization problem in which information is transferred from a model with a more complex structure, called the teacher model, to a model with a simpler structure, called the student model. The paper proposes a generalization of the distillation problem to the case where the metaparameters, i.e., the parameters of the distillation optimization problem, are themselves optimized by gradient methods. The loss function of this problem is the sum of a classification term and the cross-entropy between the responses of the student and teacher models. Assigning optimal metaparameters to the distillation loss function is computationally expensive, so the properties of the optimization problem are investigated in order to predict the metaparameter update trajectory: the trajectory of the gradient optimization of the metaparameters is analyzed, and their values are predicted using linear functions. The proposed approach is illustrated in computational experiments on the CIFAR-10 and Fashion-MNIST datasets as well as on synthetic data.
Keywords: machine learning, knowledge distillation, metaparameter optimization, gradient optimization, metaparameter assignment.
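A minimal sketch of a distillation loss of the form described in the abstract (a classification term plus a cross-entropy term between student and teacher responses), assuming a PyTorch-style setup. The weight `lam` and temperature `T` stand in for the metaparameters; their names and the fixed default values are illustrative only, since in the paper such metaparameters are assigned by gradient optimization rather than set by hand, and this is not the authors' exact formulation.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, lam=0.5, T=2.0):
    # Classification term: standard cross-entropy with the true labels.
    ce_term = F.cross_entropy(student_logits, labels)
    # Distillation term: cross-entropy between the temperature-softened
    # teacher and student output distributions.
    teacher_probs = F.softmax(teacher_logits / T, dim=-1)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    distill_term = -(teacher_probs * student_log_probs).sum(dim=-1).mean()
    # Weighted sum of the two terms; lam and T play the role of metaparameters.
    return (1.0 - lam) * ce_term + lam * distill_term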
Funding agency
This work was supported by K.V. Rudakov’s Academic Scholarship and by the Russian Foundation for Basic Research, project no. 20-07-00990.
Presented by a member of the Editorial Board: A. A. Lazarev

Received: 17.02.2022
Revised: 23.06.2022
Accepted: 29.06.2022
English version:
Automation and Remote Control, 2022, Volume 83, Issue 10, Pages 1544–1554
DOI: https://doi.org/10.1134/S00051179220100071
Document Type: Article
Language: Russian
Citation: M. Gorpinich, O. Yu. Bakhteev, V. V. Strijov, “Gradient methods for optimizing metaparameters in the knowledge distillation problem”, Avtomat. i Telemekh., 2022, no. 10, 67–79; Autom. Remote Control, 83:10 (2022), 1544–1554
Citation in format AMSBIB
\Bibitem{GorBakStr22}
\by M.~Gorpinich, O.~Yu.~Bakhteev, V.~V.~Strijov
\paper Gradient methods for optimizing metaparameters in the knowledge distillation problem
\jour Avtomat. i Telemekh.
\yr 2022
\issue 10
\pages 67--79
\mathnet{http://mi.mathnet.ru/at16052}
\crossref{https://doi.org/10.31857/S0005231022100075}
\mathscinet{http://mathscinet.ams.org/mathscinet-getitem?mr=4529662}
\edn{https://elibrary.ru/AKGKQX}
\transl
\jour Autom. Remote Control
\yr 2022
\vol 83
\issue 10
\pages 1544--1554
\crossref{https://doi.org/10.1134/S00051179220100071}
Linking options:
  • https://www.mathnet.ru/eng/at16052
  • https://www.mathnet.ru/eng/at/y2022/i10/p67