Abstract:
Today, most state-of-the-art neural networks contain billions of parameters and require significant computational resources to train on increasingly large datasets. The most popular solution to this problem is to split the workload across multiple servers in a cluster, using various distributed training techniques to increase efficiency. In this talk, we will overview the core methods for distributed DL that attempt to tackle its key issues, such as constraints on GPU memory and network bandwidth or low device utilization. In particular, we will cover both data-parallel and model-parallel algorithms, emphasizing the solutions most widely used in practice today. Finally, we will discuss some recent methods that make it possible to train larger models on the same number of devices or to train large models on unreliable and poorly connected computers.