This article is cited in 4 scientific papers
The DiamondCandy algorithm for maximum performance vectorized cross-stencil computation
A. Yu. Perepelkina, V. D. Levchenko
Abstract:
We present an advance in the search for a 4D time-space decomposition that leads to an efficient vectorized cross-stencil implementation. The new algorithm, called DiamondCandy, is built from the dependency and influence conoids of the scheme stencil. It has high locality in terms of operational intensity, supports SIMD parallelism, and is easy to implement. The implementation details illustrate how both instruction-level and data-level parallelism are used on many-core CPUs. Test runs show that it performs an order of magnitude better than the traditional approach and that the performance does not decline as the data size increases.
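For orientation, the "traditional approach" mentioned in the abstract can be sketched as a stepwise traversal: the whole grid is updated once per time step with a cross-shaped (7-point) stencil. The sketch below is not DiamondCandy and is not taken from the preprint; it assumes a leapfrog discretization of the 3D wave equation, zero boundary values, and the hypothetical names `make_grid`, `step`, `run`, and `c2` (standing for (c·dt/dx)²).

```python
# Baseline stepwise cross-stencil update (illustrative; NOT DiamondCandy).
# Assumptions: leapfrog scheme for the 3D wave equation, cubic n*n*n grid,
# boundary cells held at zero, c2 = (c*dt/dx)**2.

def make_grid(n, value=0.0):
    """Return an n*n*n nested list filled with `value`."""
    return [[[value for _ in range(n)] for _ in range(n)] for _ in range(n)]


def step(prev, curr, c2):
    """One time step: next = 2*curr - prev + c2 * laplacian(curr)."""
    n = len(curr)
    nxt = make_grid(n)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                # Cross stencil: the cell and its six axis neighbours.
                lap = (curr[i - 1][j][k] + curr[i + 1][j][k]
                       + curr[i][j - 1][k] + curr[i][j + 1][k]
                       + curr[i][j][k - 1] + curr[i][j][k + 1]
                       - 6.0 * curr[i][j][k])
                nxt[i][j][k] = 2.0 * curr[i][j][k] - prev[i][j][k] + c2 * lap
    return nxt


def run(n, steps, c2=0.25):
    """Start from a unit pulse at the grid centre and advance `steps` steps."""
    prev = make_grid(n)
    curr = make_grid(n)
    m = n // 2
    prev[m][m][m] = 1.0
    curr[m][m][m] = 1.0
    for _ in range(steps):
        # Every step sweeps the entire grid before the next step begins.
        prev, curr = curr, step(prev, curr, c2)
    return curr
```

In this traversal each time step reads and writes the full grid, so for grids larger than cache the number of arithmetic operations per byte moved stays low. Time-space decompositions such as the one the abstract describes aim to raise that ratio by updating a region over several time steps while it stays in fast memory.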
Keywords:
Stencil, LRnLA, Wave Equation, time skewing, many-core.
Citation:
A. Yu. Perepelkina, V. D. Levchenko, “The DiamondCandy algorithm for maximum performance vectorized cross-stencil computation”, Keldysh Institute preprints, 2018, 225, 23 pp.
Linking options:
https://www.mathnet.ru/eng/ipmp2583 https://www.mathnet.ru/eng/ipmp/y2018/p225