Abstract:
One of the most popular topics at the intersection of data analysis and optimization in recent years is the training of deep neural networks. Mathematically, this problem reduces to a stochastic optimization problem, which, in turn, via the Monte Carlo method, reduces to minimizing the sum of a large number of functions. It is important to note that a similar pattern arises in almost all problems coming from data analysis: nearly all data analysis (machine learning) tasks reduce to optimization problems, or more precisely to stochastic optimization; in mathematical statistics, with a known probability law (but unknown parameters), and in machine learning, with an unknown probability law. One of the most popular ways to solve such optimization problems, and the variants obtained from them via the Monte Carlo method, is the stochastic gradient descent method and its variations. These methods have been known since the 1950s; however, their true significance has come to be appreciated only in the last twenty years, in connection with the applications noted above. In this report, we plan to give a brief overview of the development of this direction in recent years (adaptive stepsize selection, batch size choice, federated learning, etc.).
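For concreteness, the reduction and the basic method mentioned above can be sketched as follows; the notation ($F$, $f$, $\xi$, $\gamma_k$, $N$) is ours and is not taken from the report. The stochastic optimization problem and its Monte Carlo (empirical) approximation read
\[
\min_{x \in \mathbb{R}^n} F(x) := \mathbb{E}_{\xi}\bigl[f(x,\xi)\bigr]
\;\approx\;
\min_{x \in \mathbb{R}^n} \frac{1}{N}\sum_{i=1}^{N} f(x,\xi_i),
\]
and the basic stochastic gradient descent iteration is
\[
x^{k+1} = x^k - \gamma_k \nabla_x f(x^k,\xi_k),
\]
where $\gamma_k > 0$ is the stepsize and $\xi_k$ is a sample drawn at iteration $k$ (or a mini-batch average of such samples, in the batched variants discussed in the report).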