Proceedings of the Institute for System Programming of the RAS
Proceedings of the Institute for System Programming of the RAS, 2019, Volume 31, Issue 4, Pages 29–38
DOI: https://doi.org/10.15514/ISPRAS-2019-31(4)-2
(Mi tisp437)
 

Automated error detection and analysis in hyperconverged systems

D. V. Silakov

Virtuozzo
Abstract: The paper addresses early error detection and analysis in hyperconverged systems. One way to build such a system is to install on each physical server a separate operating system (OS) instance that provides virtualization tools together with tools for administering and using distributed data storage. Errors can occur both within a single OS instance and at the level of the entire cluster. For example, incorrect control commands issued by one infrastructure node can cause a software failure on another node. In addition, errors in cluster subsystems can trigger abnormal behavior inside virtual machines. The architectural complexity of hyperconverged systems makes the errors that occur in them difficult to analyze. To simplify this analysis and make it more effective, the detection of problems and the collection of the data needed to investigate and fix them must be automated. The paper describes existing approaches to automated error detection and proposes improvements that adapt them to systems that rely heavily on distributed storage and virtualization. The improvements include collecting logs from the whole cluster immediately after an error occurs, additional analysis of guest operating system behavior inside virtual machines, and the use of a knowledge base for automated crash recovery and duplicate detection. Finally, the paper describes a real-life error-handling workflow for Virtuozzo products, from error detection to fix deployment.
Keywords: error detection, virtualization, data storage.
Document Type: Article
Language: Russian
Citation: D. V. Silakov, “Automated error detection and analysis in hyperconverged systems”, Proceedings of ISP RAS, 31:4 (2019), 29–38
Citation in format AMSBIB
\Bibitem{Sil19}
\by D.~V.~Silakov
\paper Automated error detection and analysis in hyperconverged systems
\jour Proceedings of ISP RAS
\yr 2019
\vol 31
\issue 4
\pages 29--38
\mathnet{http://mi.mathnet.ru/tisp437}
\crossref{https://doi.org/10.15514/ISPRAS-2019-31(4)-2}
Linking options:
  • https://www.mathnet.ru/eng/tisp437
  • https://www.mathnet.ru/eng/tisp/v31/i4/p29