Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia

Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia, 2022, Volume 508, Pages 50–69
DOI: https://doi.org/10.31857/S2686954322070189
(Mi danma337)
 

This article is cited in 1 scientific paper

ADVANCED STUDIES IN ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

Loss function dynamics and landscape for deep neural networks trained with quadratic loss

M. S. Nakhodnov (a), M. S. Kodryan (b), E. M. Lobacheva (b), D. S. Vetrov (a,b)

a Artificial Intelligence Research Institute, Moscow, Russia
b HSE University, Moscow, Russia
Abstract: Knowledge of the loss landscape geometry makes it possible to explain the behavior of neural networks, the dynamics of their training, and the relationship between the resulting solutions and hyperparameters such as the regularization method, the network architecture, or the learning rate schedule. In this paper, the training dynamics and the loss surface of the standard cross-entropy loss function and the currently popular mean squared error (MSE) loss function are studied for scale-invariant networks with normalization. Symmetries are eliminated via the transition to optimization on a sphere. As a result, depending on the learning step on the sphere, three training phases with fundamentally different properties are revealed: the convergence phase, the phase of chaotic equilibrium, and the phase of destabilized training. These phases are observed for both loss functions, but in the case of the MSE loss, larger networks and longer training are required to reach the convergence phase.
Keywords: scale invariance, batch normalization, training of neural networks, optimization, MSE loss function.
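The sphere-projection idea described in the abstract can be illustrated with a minimal sketch (not the authors' code; all names and the toy model are illustrative): normalizing the weights makes a quadratic loss invariant to rescaling, and gradient descent can then be restricted to the unit sphere by projecting the gradient onto the tangent space and retracting after each step.

```python
import numpy as np

# Illustrative sketch of scale-invariant training on the sphere.
# The loss depends only on w / ||w||, so loss(c*w) == loss(w) for c > 0,
# mirroring the scale invariance induced by normalization layers.

def mse_loss(w, x, y):
    w_hat = w / np.linalg.norm(w)        # normalization => scale invariance
    return np.mean((x @ w_hat - y) ** 2)  # quadratic (MSE) loss

def grad(w, x, y):
    # Analytic gradient of mse_loss w.r.t. w (chain rule through w/||w||).
    n = np.linalg.norm(w)
    w_hat = w / n
    r = x @ w_hat - y                     # residuals
    g_hat = 2.0 * x.T @ r / len(y)        # d loss / d w_hat
    return (g_hat - (g_hat @ w_hat) * w_hat) / n

def sphere_step(w, g, lr):
    # Project the gradient onto the tangent space of the unit sphere,
    # take a step, then retract back onto the sphere.
    g_tan = g - (g @ w) * w
    w_new = w - lr * g_tan
    return w_new / np.linalg.norm(w_new)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
w_star = rng.normal(size=8)
y = x @ (w_star / np.linalg.norm(w_star))

w = rng.normal(size=8)
w /= np.linalg.norm(w)

# Scale invariance: rescaling the weights does not change the loss.
loss_scaled_diff = abs(mse_loss(w, x, y) - mse_loss(5.0 * w, x, y))

loss_init = mse_loss(w, x, y)
for _ in range(400):
    w = sphere_step(w, grad(w, x, y), lr=0.1)
loss_final = mse_loss(w, x, y)
```

With the symmetry removed in this way, the effective learning rate on the sphere becomes the single quantity governing the dynamics, which is what makes the three training phases in the paper well defined.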
Presented: A. A. Shananin
Received: 28.10.2022
Revised: 28.10.2022
Accepted: 01.11.2022
English version:
Doklady Mathematics, 2022, Volume 106, Issue suppl. 1, Pages S43–S62
DOI: https://doi.org/10.1134/S1064562422060187
Document Type: Article
UDC: 004.8
Language: Russian
Citation: M. S. Nakhodnov, M. S. Kodryan, E. M. Lobacheva, D. S. Vetrov, “Loss function dynamics and landscape for deep neural networks trained with quadratic loss”, Dokl. RAN. Math. Inf. Proc. Upr., 508 (2022), 50–69; Dokl. Math., 106:suppl. 1 (2022), S43–S62
Citation in format AMSBIB
\Bibitem{NakKodLob22}
\by M.~S.~Nakhodnov, M.~S.~Kodryan, E.~M.~Lobacheva, D.~S.~Vetrov
\paper Loss function dynamics and landscape for deep neural networks trained with quadratic loss
\jour Dokl. RAN. Math. Inf. Proc. Upr.
\yr 2022
\vol 508
\pages 50--69
\mathnet{http://mi.mathnet.ru/danma337}
\crossref{https://doi.org/10.31857/S2686954322070189}
\elib{https://elibrary.ru/item.asp?id=49991310}
\transl
\jour Dokl. Math.
\yr 2022
\vol 106
\issue suppl. 1
\pages S43--S62
\crossref{https://doi.org/10.1134/S1064562422060187}
Linking options:
  • https://www.mathnet.ru/eng/danma337
  • https://www.mathnet.ru/eng/danma/v508/p50
© Steklov Mathematical Institute RAS, 2024