Numerical methods and programming
Numerical methods and programming, 2019, Volume 20, Issue 3, Pages 182–191
DOI: https://doi.org/10.26089/NumMet.v20r317
(Mi vmp958)
 

A comprehensive analysis of performance quality of large supercomputer complexes

Vad. V. Voevodin

Lomonosov Moscow State University, Research Computing Center
Abstract: The low performance of supercomputer complexes is largely due to the fact that their administrators cannot always detect and eliminate the root causes of reduced efficiency in a timely manner. The main concern is not equipment failure (such cases can usually be detected by monitoring systems) but an implicit performance decrease in certain supercomputer components that otherwise seem to keep working correctly. This situation arises because there are currently no sufficiently flexible and convenient software tools for prompt and comprehensive analysis of all the performance quality characteristics of computer systems. Existing solutions either analyze only a small part of these characteristics or are non-universal, built to satisfy a narrow set of needs of the administrators of a particular system. This paper describes a systematic approach to this problem that allows a comprehensive analysis of various aspects of supercomputer functioning, primarily those related to the execution of supercomputer applications. A software tool developed on the basis of this approach will collect, within a single model, the most important data on the properties and quality of jobs running on the supercomputer: their execution performance, size and duration, the presence of specific or abnormal behavior scenarios, the usage of application packages and libraries, etc. Flexible aggregation capabilities will let the required level of detail be specified: individual users, projects, application packages, subject areas, supercomputer partitions, time ranges, etc. This will make it possible to create hundreds or thousands of different views for analyzing the state of the supercomputer, from which administrators can choose the ones most suitable for them.
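The idea of a single job model with selectable aggregation levels can be illustrated with a minimal Python sketch. It is not the tool described in the paper: the `JobRecord` fields, the `aggregate` helper, and the sample data are all hypothetical and chosen only to show how one set of per-job records can yield different views depending on the grouping attribute.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """One finished job; every field name here is a hypothetical choice."""
    user: str
    project: str
    package: str
    partition: str
    core_hours: float
    avg_cpu_load: float  # mean CPU utilisation over the job, in 0..1


def aggregate(jobs, key):
    """Group jobs by one attribute and summarise each group."""
    groups = defaultdict(list)
    for job in jobs:
        groups[getattr(job, key)].append(job)

    summary = {}
    for value, members in groups.items():
        total_hours = sum(j.core_hours for j in members)
        # Weight the load by core-hours so that large jobs count for more.
        weighted_load = sum(j.avg_cpu_load * j.core_hours for j in members)
        summary[value] = {
            "jobs": len(members),
            "core_hours": total_hours,
            "avg_cpu_load": weighted_load / total_hours if total_hours else 0.0,
        }
    return summary


# Made-up sample data, for illustration only.
JOBS = [
    JobRecord("alice", "cfd", "OpenFOAM", "compute", 100.0, 0.80),
    JobRecord("alice", "cfd", "OpenFOAM", "gpu", 300.0, 0.40),
    JobRecord("bob", "md", "GROMACS", "compute", 100.0, 0.20),
]

# The same records produce different views depending on the grouping key.
by_user = aggregate(JOBS, "user")
by_partition = aggregate(JOBS, "partition")
```

Switching the `key` argument (for example to `"project"` or `"package"`) changes the level of detail without touching the underlying records, which is the property the abstract attributes to flexible aggregation.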
Keywords: supercomputer, parallel computing, supercomputer applications, performance, efficiency analysis, monitoring data.
Received: 25.04.2019
UDC: 519.68
Language: Russian
Citation: Vad. V. Voevodin, “A comprehensive analysis of performance quality of large supercomputer complexes”, Num. Meth. Prog., 20:3 (2019), 182–191
Citation in format AMSBIB
\Bibitem{Voe19}
\by Vad.~V.~Voevodin
\paper A comprehensive analysis of performance quality of large supercomputer complexes
\jour Num. Meth. Prog.
\yr 2019
\vol 20
\issue 3
\pages 182--191
\mathnet{http://mi.mathnet.ru/vmp958}
\crossref{https://doi.org/10.26089/NumMet.v20r317}
\elib{https://elibrary.ru/item.asp?id=39540771}
Linking options:
  • https://www.mathnet.ru/eng/vmp958
  • https://www.mathnet.ru/eng/vmp/v20/i3/p182