Abstract:
Processing large amounts of data often consists of several steps, e.g. pre- and post-processing stages, which are executed sequentially, with data written to disk after each step. However, when the pre-processing stage differs from task to task, it is more efficient to construct a pipeline which streams data from one stage to the next. In the more general case, some processing stages can be factored into several parallel subordinate stages, forming a distributed pipeline in which each stage can have multiple inputs and multiple outputs. Such a processing pattern emerges in the problem of classifying wave energy spectra based on analytic approximations, which can extract different wave systems and their parameters (e.g. wave system type, mean wave direction) from a spectrum. The distributed pipeline approach achieves good performance compared to conventional “sequential-stage” processing.
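The streaming idea described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: it uses threads and in-memory queues from the Python standard library in place of a distributed runtime, and the names `run_stage`, `run_pipeline`, `preprocess` and `classify` are hypothetical. The two stage functions are placeholders (normalising a spectrum and picking its peak bin) and do not represent the analytic approximations used in the paper.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream


def run_stage(func, inbox, outbox, workers=1):
    """Start `workers` threads that apply `func` to items taken from
    `inbox` and put the results into `outbox`. Returns the threads."""
    remaining = [workers]
    lock = threading.Lock()

    def loop():
        while True:
            item = inbox.get()
            if item is _DONE:
                inbox.put(_DONE)  # let sibling workers see the sentinel too
                break
            outbox.put(func(item))
        with lock:
            remaining[0] -= 1
            last = remaining[0] == 0
        if last:
            outbox.put(_DONE)  # the last worker closes the downstream stream

    threads = [threading.Thread(target=loop) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads


def run_pipeline(items, stages):
    """Stream `items` through `stages`, a list of (func, workers) pairs.
    Intermediate results stay in memory instead of being written to disk."""
    first = queue.Queue()
    inbox = first
    threads = []
    for func, workers in stages:
        outbox = queue.Queue()
        threads += run_stage(func, inbox, outbox, workers)
        inbox = outbox
    for item in items:
        first.put(item)
    first.put(_DONE)
    results = []
    while True:
        item = inbox.get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


def preprocess(task):
    """Placeholder pre-processing: normalise a spectrum to unit sum."""
    task_id, spectrum = task
    total = sum(spectrum)
    return task_id, [value / total for value in spectrum]


def classify(task):
    """Placeholder classification: report the index of the peak bin."""
    task_id, spectrum = task
    peak = max(range(len(spectrum)), key=lambda i: spectrum[i])
    return task_id, peak
```

Each stage here runs several parallel workers, so results may arrive out of order; the task identifier carried with every item lets the caller sort them afterwards, e.g. `sorted(run_pipeline(spectra, [(preprocess, 2), (classify, 2)]))`.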
Keywords:
distributed system, big data, data processing, parallel computing.
The research was carried out using computational resources of the Resource Center Computational Center of Saint Petersburg State University (T-EDGE96 HPC-0011828-001) and was partially supported by the Russian Foundation for Basic Research (project No. 13-07-00747) and Saint Petersburg State University (projects No. 9.38.674.2013 and 0.37.155.2014).
Received: 01.10.2014
Document Type:
Article
UDC:
004.04
Language: English
Citation:
I. G. Gankevich, A. B. Degtyarev, “Efficient processing and classification of wave energy spectrum data with a distributed pipeline”, Computer Research and Modeling, 7:3 (2015), 517–520
\Bibitem{GanDeg15}
\by I.~G.~Gankevich, A.~B.~Degtyarev
\paper Efficient processing and classification of wave energy spectrum data with a distributed pipeline
\jour Computer Research and Modeling
\yr 2015
\vol 7
\issue 3
\pages 517--520
\mathnet{http://mi.mathnet.ru/crm212}
\crossref{https://doi.org/10.20537/2076-7633-2015-7-3-517-520}
Linking options:
https://www.mathnet.ru/eng/crm212
https://www.mathnet.ru/eng/crm/v7/i3/p517