Abstract:
Distributed optimization plays an important role in modern large-scale
machine learning and data processing systems by optimizing the utilization
of computational resources. One of the classical and popular approaches
is Local Stochastic Gradient Descent (Local SGD), characterized by multiple
local update steps on each device before averaging (a minimal sketch is given
after the abstract), which makes it particularly useful in distributed
environments for reducing communication bottlenecks and improving scalability.
A typical feature of this method is its dependence on the communication
frequency. However, in the case of a quadratic objective function with
data distributed homogeneously across all devices, this dependence vanishes.
As a natural consequence, subsequent studies adopt the assumption of
a Lipschitz Hessian, since it quantifies, to a certain extent, how close
the optimized function is to a quadratic one. In this paper, in order to
extend the completeness of the Local SGD theory and unlock its potential,
we abandon the Lipschitz Hessian assumption and introduce the new concept
of approximate quadraticity (a tentative formalization is sketched after
the abstract). This assumption offers a new perspective on problems with
near-quadratic properties. In addition, existing theoretical analyses of
Local SGD often assume bounded variance of the stochastic gradients.
We, in turn, consider an unbounded noise condition, which allows us
to broaden the class of problems under study.
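The following is a minimal, purely illustrative sketch of the Local SGD scheme described above, not the authors' implementation: each of M devices runs H local stochastic gradient steps between communication rounds, and each communication averages the iterates across devices. The homogeneous quadratic objective, the additive Gaussian noise model, and all parameter names (d, M, T, H, lr, sigma) are assumptions made for this example.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative homogeneous quadratic objective f(x) = 0.5 x^T A x - b^T x,
# shared by all M devices (hypothetical setup for this sketch).
d, M = 5, 4
A = np.diag(np.linspace(1.0, 10.0, d))   # positive definite Hessian
b = rng.standard_normal(d)
x_opt = np.linalg.solve(A, b)            # exact minimizer, for reference

def stochastic_grad(x, sigma=0.1):
    """Gradient of the quadratic plus additive Gaussian noise (assumed model)."""
    return A @ x - b + sigma * rng.standard_normal(d)

def local_sgd(T=50, H=10, lr=0.05):
    """T communication rounds, H local SGD steps per device per round."""
    x = np.zeros(d)                       # common starting point
    for _ in range(T):
        iterates = np.tile(x, (M, 1))     # broadcast current iterate to devices
        for m in range(M):
            for _ in range(H):            # H independent local steps on device m
                iterates[m] -= lr * stochastic_grad(iterates[m])
        x = iterates.mean(axis=0)         # communication: average the iterates
    return x

x_final = local_sgd()
print("distance to optimum:", np.linalg.norm(x_final - x_opt))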
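For orientation only, one plausible way to formalize the two assumptions mentioned in the abstract is sketched below; the symbols \delta, \sigma^2 and \Delta are placeholders, and the paper's exact definitions may differ. Approximate quadraticity can be read as the Hessian varying within a ball of radius \delta (so \delta = 0 recovers an exact quadratic), and the unbounded noise condition as a growth-type relaxation of bounded variance (\Delta = 0 recovers it):

% Approximate quadraticity (tentative reading, not the paper's verbatim definition):
\[
  \bigl\| \nabla^2 f(x) - \nabla^2 f(y) \bigr\| \le \delta
  \qquad \text{for all } x, y \in \mathbb{R}^d.
\]
% Unbounded noise condition (tentative reading):
\[
  \mathbb{E}_{\xi}\, \bigl\| \nabla f(x;\xi) - \nabla f(x) \bigr\|^2
  \le \sigma^2 + \Delta\, \| \nabla f(x) \|^2 .
\]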
Bibliography: 36 titles.
The work of A. V. Gasnikov was supported by a grant for research centers in the field of artificial
intelligence, provided by the Analytical Center for the Government of the Russian Federation in
accordance with the subsidy agreement (agreement identifier 000000D730321P5Q0002) and the
agreement with the Ivannikov Institute for System Programming of the Russian Academy of
Sciences, dated November 2, 2021, no. 70-2021-00142.
Received: 16.08.2024
Publication type:
Article
UDC:
519.853.62
Publication language: English
Citation:
A. E. Sadchikov, S. A. Chezhegov, A. N. Beznosikov, A. V. Gasnikov, “Local SGD for near-quadratic problems:
Improving convergence under unconstrained noise conditions”, УМН, 79:6(480) (2024), 83–116