Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia

Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia, 2022, Volume 508, Pages 50–69
DOI: https://doi.org/10.31857/S2686954322070189
(Mi danma337)
 

This article is cited in 1 scientific paper

ADVANCED STUDIES IN ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

Loss function dynamics and landscape for deep neural networks trained with quadratic loss

M. S. Nakhodnov (a), M. S. Kodryan (b), E. M. Lobacheva (b), D. S. Vetrov (a,b)

a Artificial Intelligence Research Institute, Moscow, Russia
b HSE University, Moscow, Russia
Abstract: Knowledge of the loss landscape geometry makes it possible to explain the behavior of neural networks, the dynamics of their training, and the relationship between the resulting solutions and hyperparameters such as the regularization method, the network architecture, or the learning rate schedule. In this paper, the training dynamics and the loss surface of the standard cross-entropy loss function and the currently popular mean squared error (MSE) loss function are studied for scale-invariant networks with normalization. Symmetries are eliminated via the transition to optimization on a sphere. As a result, depending on the learning step on the sphere, three training phases with fundamentally different properties are revealed: the convergence phase, the phase of chaotic equilibrium, and the phase of destabilized training. These phases are observed for both loss functions, but in the case of the MSE loss, larger networks and longer training are required to reach the convergence phase.
Keywords: scale invariance, batch normalization, training of neural networks, optimization, MSE loss function.
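The sphere-projection idea described in the abstract can be illustrated with a minimal sketch (not the authors' code; all names and the toy model are illustrative): normalizing the weights makes a quadratic loss invariant to rescaling, and gradient descent can then be restricted to the unit sphere by projecting the gradient onto the tangent space and retracting after each step.

```python
import numpy as np

# Illustrative sketch of scale-invariant training on the sphere.
# The loss depends only on w / ||w||, so loss(c*w) == loss(w) for c > 0,
# mirroring the scale invariance induced by normalization layers.

def mse_loss(w, x, y):
    w_hat = w / np.linalg.norm(w)        # normalization => scale invariance
    return np.mean((x @ w_hat - y) ** 2)  # quadratic (MSE) loss

def grad(w, x, y):
    # Analytic gradient of mse_loss w.r.t. w (chain rule through w/||w||).
    n = np.linalg.norm(w)
    w_hat = w / n
    r = x @ w_hat - y                     # residuals
    g_hat = 2.0 * x.T @ r / len(y)        # d loss / d w_hat
    return (g_hat - (g_hat @ w_hat) * w_hat) / n

def sphere_step(w, g, lr):
    # Project the gradient onto the tangent space of the unit sphere,
    # take a step, then retract back onto the sphere.
    g_tan = g - (g @ w) * w
    w_new = w - lr * g_tan
    return w_new / np.linalg.norm(w_new)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
w_star = rng.normal(size=8)
y = x @ (w_star / np.linalg.norm(w_star))

w = rng.normal(size=8)
w /= np.linalg.norm(w)

# Scale invariance: rescaling the weights does not change the loss.
loss_scaled_diff = abs(mse_loss(w, x, y) - mse_loss(5.0 * w, x, y))

loss_init = mse_loss(w, x, y)
for _ in range(400):
    w = sphere_step(w, grad(w, x, y), lr=0.1)
loss_final = mse_loss(w, x, y)
```

With the symmetry removed in this way, the effective learning rate on the sphere becomes the single quantity governing the dynamics, which is what makes the three training phases in the paper well defined.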
Presented: A. A. Shananin
Received: 28.10.2022
Revised: 28.10.2022
Accepted: 01.11.2022
English version:
Doklady Mathematics, 2022, Volume 106, Issue suppl. 1, Pages S43–S62
DOI: https://doi.org/10.1134/S1064562422060187
Document Type: Article
UDC: 004.8
Language: Russian
Citation: M. S. Nakhodnov, M. S. Kodryan, E. M. Lobacheva, D. S. Vetrov, “Loss function dynamics and landscape for deep neural networks trained with quadratic loss”, Dokl. RAN. Math. Inf. Proc. Upr., 508 (2022), 50–69; Dokl. Math., 106:suppl. 1 (2022), S43–S62
Citation in format AMSBIB
\Bibitem{NakKodLob22}
\by M.~S.~Nakhodnov, M.~S.~Kodryan, E.~M.~Lobacheva, D.~S.~Vetrov
\paper Loss function dynamics and landscape for deep neural networks trained with quadratic loss
\jour Dokl. RAN. Math. Inf. Proc. Upr.
\yr 2022
\vol 508
\pages 50--69
\mathnet{http://mi.mathnet.ru/danma337}
\crossref{https://doi.org/10.31857/S2686954322070189}
\elib{https://elibrary.ru/item.asp?id=49991310}
\transl
\jour Dokl. Math.
\yr 2022
\vol 106
\issue suppl. 1
\pages S43--S62
\crossref{https://doi.org/10.1134/S1064562422060187}
Linking options:
  • https://www.mathnet.ru/eng/danma337
  • https://www.mathnet.ru/eng/danma/v508/p50
© Steklov Mathematical Institute RAS, 2024