Abstract:
The success of deep learning is due, to a great extent, to the remarkable effectiveness of gradient-based
optimization methods applied to large neural networks. In this talk I will discuss some general mathematical
principles allowing for efficient optimization in over-parameterized non-linear systems, a setting that
includes deep neural networks. I will argue that the optimization problems corresponding to these systems
are not convex, even locally, but instead satisfy the Polyak-Lojasiewicz (PL) condition on most of the parameter
space, allowing for efficient optimization by gradient descent or SGD. I will connect the PL condition of these
systems to the condition number associated with the tangent kernel and show how a non-linear
theory for these systems parallels classical analyses of over-parameterized linear equations.
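To make the tangent-kernel connection concrete, here is a standard computation (a sketch in my own notation, not a statement of the talk's specific results): for a model map F : R^m -> R^n from parameters to outputs on n training points, with square loss and targets y,
\[
L(w) = \tfrac{1}{2}\,\|F(w) - y\|^{2}, \qquad
\nabla L(w) = DF(w)^{\top}\bigl(F(w) - y\bigr),
\]
\[
\|\nabla L(w)\|^{2}
= \bigl(F(w)-y\bigr)^{\top} K(w)\,\bigl(F(w)-y\bigr)
\;\ge\; 2\,\lambda_{\min}\bigl(K(w)\bigr)\, L(w),
\qquad K(w) := DF(w)\,DF(w)^{\top},
\]
where K(w) is the tangent kernel. Hence L satisfies the PL condition \( \tfrac{1}{2}\|\nabla L(w)\|^{2} \ge \mu\, L(w) \) with \( \mu = \lambda_{\min}(K(w)) \); in the over-parameterized regime m >> n the n-by-n kernel K(w) can be strictly positive definite, and a bounded condition number of K keeps \( \mu \) comparable to its largest eigenvalue. For a \( \beta \)-smooth loss, gradient descent with step size \( \eta \le 1/\beta \) then converges linearly: \( L(w_t) \le (1 - \eta\mu)^{t} L(w_0) \).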
In a related but conceptually separate development, I will discuss a new perspective on the remarkable, recently
discovered phenomenon of transition to linearity (constancy of the NTK) in certain classes of large neural networks. I will show how
this transition to linearity arises from the scaling of the Hessian with the size of the network.
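A schematic version of the Hessian-scaling argument (my own sketch of the mechanism, under the assumption that each output coordinate F_i of the model is twice differentiable): expanding around an initialization w_0,
\[
F_i(w) = F_i(w_0) + \nabla F_i(w_0)^{\top}(w - w_0)
+ \tfrac{1}{2}\,(w-w_0)^{\top} H_{F_i}(\xi_i)\,(w-w_0),
\]
so the deviation of the model from its linearization satisfies
\[
\bigl| F_i(w) - F_i(w_0) - \nabla F_i(w_0)^{\top}(w-w_0) \bigr|
\;\le\; \tfrac{1}{2}\, R^{2} \sup_{\|v - w_0\| \le R} \|H_{F_i}(v)\|
\qquad \text{for } \|w - w_0\| \le R .
\]
If the spectral norm of the Hessian of the model (not of the loss) vanishes as the width m grows, e.g. \( \|H_{F_i}\| = O(1/\sqrt{m}) \), while the tangent kernel at w_0 remains of order one, the model is effectively linear, and the NTK effectively constant, on any fixed ball; this is the transition to linearity.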
Combining these ideas yields a clean and general argument for demonstrating the PL condition and convergence
for a large class of wide neural networks. Finally, I will comment on systems that are "almost" over-parameterized, which appear to be common in practice.
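Read together, the two sketches above suggest (in my schematic paraphrase, not a verbatim claim from the talk) why wide networks satisfy the PL condition along the whole optimization path: near-linearity keeps the tangent kernel close to its value at initialization, so if K(w_0) is positive definite there is a radius R with
\[
\lambda_{\min}\bigl(K(w)\bigr) \;\ge\; \tfrac{1}{2}\,\lambda_{\min}\bigl(K(w_0)\bigr) \;>\; 0
\qquad \text{for all } \|w - w_0\| \le R,
\]
and gradient descent started at w_0 converges linearly to a global minimizer while remaining inside this ball, provided R is large enough relative to the initial loss.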
Joint work with Chaoyue Liu and Libin Zhu.