Parallel software tools and technologies
Shared memory based MPI Reduce and Bcast algorithms
A. A. Romanyuta (a), M. G. Kurnosov (b, c)
(a) Siberian State University of Telecommunications and Informatics, Novosibirsk
(b) Siberian Academy of Telecommunications and Informatics
(c) Rzhanov Institute of Semiconductor Physics, Siberian Branch of Russian Academy of Sciences, Novosibirsk
Abstract:
Algorithms are proposed for implementing the collective operations MPI_Bcast, MPI_Reduce, and MPI_Allreduce using the shared memory of multiprocessor servers. The algorithms create a shared memory segment containing a system of queues through which message blocks are transmitted. The software implementation is built on the Open MPI library as a separate coll/sharm component. Unlike existing algorithms, access to the queue system is synchronized with spinlocks and is designed to reduce the number of barrier synchronizations and atomic operations. In experiments on an x86-64 server, compared with the coll/tuned component of Open MPI, the largest reduction in execution time was 6.5 times (85% less) for MPI_Bcast and 3.3 times (70% less) for MPI_Reduce. Recommendations are given on which algorithm to use for different message sizes.
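The abstract describes the mechanism only at a high level, so the following is a minimal sketch under stated assumptions, not the coll/sharm implementation. It shows one way a broadcast can move message blocks through a queue in a shared segment: the root writes blocks into a ring of slots, and readers copy them out. Synchronization uses spin-waiting on per-slot counters instead of barriers. Everything named here is hypothetical and does not come from the paper: the slot count NSLOTS, the block size BLOCK_SIZE, the per-slot `seq`/`readers_left` fields, and all function names. POSIX threads stand in for MPI processes. A real component would map the segment into every process (for example with `mmap` or `shm_open`) and plug into Open MPI's component interfaces. `sched_yield()` inside the spin loops only keeps the demo responsive when threads outnumber cores.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameters: the abstract gives no slot count or block size. */
#define NSLOTS     4   /* slots in the ring queue            */
#define BLOCK_SIZE 64  /* bytes carried by one message block */

/* One queue slot in the shared segment. */
typedef struct {
    atomic_size_t seq;          /* block index + 1 of the data now in the slot */
    atomic_int    readers_left; /* readers that still have to copy this block  */
    size_t        len;          /* valid bytes in data                         */
    char          data[BLOCK_SIZE];
} slot_t;

/* The "segment": a single-writer, multi-reader ring of slots. */
typedef struct {
    int    nreaders;            /* number of non-root ranks (read only by the root) */
    slot_t slots[NSLOTS];
} shm_queue_t;

static void shm_queue_init(shm_queue_t *q, int nreaders) {
    q->nreaders = nreaders;
    for (int i = 0; i < NSLOTS; i++) {
        atomic_init(&q->slots[i].seq, 0);
        atomic_init(&q->slots[i].readers_left, 0);
        q->slots[i].len = 0;
    }
}

/* Root: cut buf into blocks and push them through the ring, with no barriers. */
static void bcast_root(shm_queue_t *q, const char *buf, size_t n) {
    for (size_t b = 0, off = 0; off < n; b++, off += BLOCK_SIZE) {
        slot_t *s = &q->slots[b % NSLOTS];
        /* Spin until every reader has copied the previous block held in this slot. */
        while (atomic_load_explicit(&s->readers_left, memory_order_acquire) != 0)
            sched_yield();
        s->len = (n - off < BLOCK_SIZE) ? n - off : BLOCK_SIZE;
        memcpy(s->data, buf + off, s->len);
        atomic_store_explicit(&s->readers_left, q->nreaders, memory_order_relaxed);
        /* Release: data, len and readers_left become visible before seq does. */
        atomic_store_explicit(&s->seq, b + 1, memory_order_release);
    }
}

/* Non-root rank: copy the blocks out of the ring in the same order. */
static void bcast_reader(shm_queue_t *q, char *buf, size_t n) {
    for (size_t b = 0, off = 0; off < n; b++, off += BLOCK_SIZE) {
        slot_t *s = &q->slots[b % NSLOTS];
        /* Spin until the root has published block b in this slot. */
        while (atomic_load_explicit(&s->seq, memory_order_acquire) != b + 1)
            sched_yield();
        memcpy(buf + off, s->data, s->len);
        /* One atomic decrement per block tells the root the slot may be reused. */
        atomic_fetch_sub_explicit(&s->readers_left, 1, memory_order_release);
    }
}

/* Demo harness: threads stand in for MPI ranks sharing one segment. */
typedef struct { shm_queue_t *q; char *out; size_t n; } reader_arg_t;

static void *reader_thread(void *p) {
    reader_arg_t *a = p;
    bcast_reader(a->q, a->out, a->n);
    return NULL;
}

/* Returns 1 if every reader received an exact copy of the root's message. */
static int run_bcast_demo(int nreaders, size_t n) {
    shm_queue_t q;
    char *msg = malloc(n + 1), *out = malloc((size_t)nreaders * n + 1);
    pthread_t *tid = malloc(sizeof *tid * ((size_t)nreaders + 1));
    reader_arg_t *args = malloc(sizeof *args * ((size_t)nreaders + 1));
    int ok = msg && out && tid && args;
    int started = 0;
    if (ok) {
        for (size_t i = 0; i < n; i++) msg[i] = (char)('a' + i % 26);
        shm_queue_init(&q, nreaders);
        for (; started < nreaders; started++) {
            args[started] = (reader_arg_t){ &q, out + (size_t)started * n, n };
            if (pthread_create(&tid[started], NULL, reader_thread, &args[started]) != 0)
                break;
        }
        q.nreaders = started;       /* count only readers that actually started */
        bcast_root(&q, msg, n);     /* the calling thread acts as the root */
        for (int r = 0; r < started; r++) pthread_join(tid[r], NULL);
        for (int r = 0; r < started && ok; r++)
            ok = memcmp(out + (size_t)r * n, msg, n) == 0;
        ok = ok && started == nreaders;
    }
    free(msg); free(out); free(tid); free(args);
    return ok;
}
```

For example, `run_bcast_demo(3, 1000)` pushes a 1000-byte message through four 64-byte slots to three reader threads. It returns 1 if all copies match. The ring lets the root run up to NSLOTS blocks ahead of the slowest reader, so copying is pipelined. For each block, the root performs two atomic stores and each reader performs one atomic decrement, with no barrier between blocks. This is one plausible way to cut barriers and atomics; the paper's actual protocol may differ.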
Keywords:
Bcast; Reduce; Allreduce; collective operations; MPI; computer systems.
Received: 24.07.2023
Citation:
A. A. Romanyuta, M. G. Kurnosov, “Shared memory based MPI Reduce and Bcast algorithms”, Num. Meth. Prog., 24:4 (2023), 339–351
Linking options:
https://www.mathnet.ru/eng/vmp1093 https://www.mathnet.ru/eng/vmp/v24/i4/p339