Bayesian distillation of deep learning models
A. V. Grabovoy (a), V. V. Strijov (b)
(a) Moscow Institute of Physics and Technology, Dolgoprudnyi, Moscow oblast, 141701 Russia
(b) Dorodnicyn Computing Centre, Russian Academy of Sciences, Moscow, 119333 Russia
Abstract:
We study the problem of reducing the complexity of approximating models and consider methods based on the distillation of deep learning models. The concepts of trainer and student are introduced, and the student model is assumed to have fewer parameters than the trainer model. A Bayesian approach to student model selection is suggested: a method is proposed for assigning the prior distribution of the student parameters based on the posterior distribution of the trainer model parameters. Since the trainer and student parameter spaces do not coincide, we propose a mechanism that reduces the trainer parameter space to the student parameter space by changing the trainer model structure. A theoretical analysis of the proposed reduction mechanism is carried out. A computational experiment is performed on synthetic and real data, with the FashionMNIST sample used as the real data.
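As a minimal illustration of the prior-assignment step described above, the sketch below assumes the trainer posterior is approximated by a diagonal Gaussian over the weights of a small linear model, and models the reduction as plain index selection over those weights; the linear model, the regularization weight, and all variable names are illustrative assumptions, not the paper's actual construction.

import torch

# Trainer posterior, approximated (assumption) by a diagonal Gaussian
# over the weights of a 4-input linear model.
mu_trainer = torch.tensor([1.0, -0.5, 2.0, 0.3])
sigma_trainer = torch.full((4,), 0.1)

# Reduction of the parameter space (assumption: modeled here as index
# selection): the student keeps 2 of the 4 trainer weights, so its
# prior is the marginal of the trainer posterior on those coordinates.
keep = torch.tensor([0, 2])
mu_prior = mu_trainer[keep]
sigma_prior = sigma_trainer[keep]

# Student: a 2-parameter linear model fitted by MAP, i.e. a squared
# data term plus a Gaussian log-prior centred on the reduced posterior.
torch.manual_seed(0)
X = torch.randn(64, 2)
y = X @ mu_prior + 0.05 * torch.randn(64)
w = torch.zeros(2, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    data_term = ((X @ w - y) ** 2).mean()
    prior_term = (((w - mu_prior) / sigma_prior) ** 2).mean()
    loss = data_term + 1e-3 * prior_term  # 1e-3 is an arbitrary prior weight
    loss.backward()
    opt.step()
print(w.detach())  # close to mu_prior when the data agree with the prior

In the paper itself, the reduction is obtained by changing the trainer model structure; the index selection above only stands in for that mechanism.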
Keywords:
model selection, Bayesian inference, model distillation, local transformation, probability space transformation.
Citation:
A. V. Grabovoy, V. V. Strijov, “Bayesian distillation of deep learning models”, Avtomat. i Telemekh., 2021, no. 11, 16–29; Autom. Remote Control, 82:11 (2021), 1846–1856
Linking options:
https://www.mathnet.ru/eng/at15826
https://www.mathnet.ru/eng/at/y2021/i11/p16