Abstract:
We consider the problem of training a machine learning model on a dataset that is distributed across many devices. This is the case, for example, in Federated Learning, where a central server orchestrates the training for all connected devices. In a fully decentralized learning environment, by contrast, the devices may be connected via an arbitrary network that may change over time.
In the first part of the talk, we present a unified convergence analysis covering a variety of decentralized stochastic gradient descent (SGD) methods. We derive universal convergence rates for smooth (convex and non-convex) problems. The rates interpolate between the heterogeneous (non-identically distributed data) and homogeneous (i.i.d. data) regimes and show that differences between the workers' local data distributions significantly affect the convergence of these methods.
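For intuition, here is a minimal sketch in Python/NumPy of the kind of method this analysis covers: one round of decentralized SGD in which every worker takes local SGD steps on its own data and then gossip-averages its parameters with its neighbours through a mixing matrix. The function `decentralized_sgd_round`, its arguments, and the ring example are illustrative assumptions, not the paper's notation or experimental setup.

```python
# A minimal sketch (hypothetical, not the paper's exact framework) of one
# round of decentralized SGD with local updates and gossip averaging.
import numpy as np

def decentralized_sgd_round(models, local_grads, W, lr=0.1, local_steps=1):
    """models: (n_workers, dim) array of parameters, one row per worker.
    local_grads: callable(worker_id, params) -> stochastic gradient on that
        worker's own (possibly non-iid) data.
    W: (n_workers, n_workers) doubly stochastic mixing matrix; W[i, j] > 0
        only if workers i and j are connected in the current topology."""
    n, _ = models.shape
    # Local phase: each worker runs SGD steps on its own data.
    for _ in range(local_steps):
        for i in range(n):
            models[i] -= lr * local_grads(i, models[i])
    # Gossip phase: average parameters along the edges of the network.
    return W @ models

# Example: 4 workers on a ring with uniform (doubly stochastic) weights.
n = 4
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3

rng = np.random.default_rng(0)
models = rng.normal(size=(n, 5))
targets = rng.normal(size=(n, 5))       # heterogeneous data: each worker has its own optimum
grads = lambda i, x: x - targets[i]     # gradient of 0.5 * ||x - targets[i]||^2
models = decentralized_sgd_round(models, grads, W)
```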
In the second part of the talk, we present methods that are not affected by such data dissimilarity. In particular, we focus on a novel mechanism for information propagation in decentralized learning. We propose a relay scheme that uses spanning trees to distribute information exactly uniformly across all workers, with finite delays that depend on the distance between nodes. We prove that RelaySGD, which builds on this mechanism, is independent of data heterogeneity and scales to many workers, enabling highly accurate decentralized deep learning on heterogeneous data.
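Below is a minimal sketch of the relay idea on a spanning tree, assuming scalar contributions; the helper `relay_round` and the chain example are hypothetical and only illustrate the propagation pattern (each contribution reaches every node exactly once, with delay equal to the hop distance), not the full RelaySGD algorithm.

```python
# A minimal sketch of relay-style message passing on a spanning tree (in the
# spirit of the RelaySum mechanism; RelaySGD applies it to model updates).
# Each node forwards to every neighbour its own contribution plus the messages
# received from all *other* neighbours, so each contribution reaches every
# node exactly once, delayed by the hop distance between the two nodes.

def relay_round(tree, values, prev_msgs):
    """tree: dict node -> list of neighbours (edges must form a tree).
    values: dict node -> contribution of that node in this round.
    prev_msgs: dict (sender, receiver) -> message sent in the previous round.
    Returns the messages sent in this round."""
    msgs = {}
    for i, neighbours in tree.items():
        for j in neighbours:
            # Relay everything received from the other subtrees, plus own value.
            msgs[(i, j)] = values[i] + sum(
                prev_msgs.get((k, i), 0.0) for k in neighbours if k != j
            )
    return msgs

# Chain 0 - 1 - 2 - 3: node 0's contribution reaches node 3 after exactly
# 3 rounds (its hop distance), and is counted exactly once.
tree = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
values = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}    # only node 0 contributes
msgs = {}
for t in range(1, 4):
    msgs = relay_round(tree, values, msgs)
    values = {i: 0.0 for i in tree}          # contribute only in the first round
    print(t, msgs.get((2, 3), 0.0))          # prints 0.0, 0.0, then 1.0 at t = 3
```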
This talk is based on the following joint works:
- A. Koloskova, N. Loizou, S. Boreiri, M. Jaggi, and S. U. Stich. A Unified Theory of Decentralized SGD with Changing Topology and Local Updates. ICML 2020. https://arxiv.org/abs/2003.10422
- T. Vogels, L. He, A. Koloskova, T. Lin, S. P. Karimireddy, S. U. Stich, and M. Jaggi. RelaySum for Decentralized Deep Learning on Heterogeneous Data. NeurIPS 2021. https://arxiv.org/abs/2110.04175