D. V. Silakov, “Automated error detection and analysis in hyperconverged systems”, Proceedings of ISP RAS, 31:4 (2019), 29


Proceedings of the Institute for System Programming of the RAS


Proceedings of the Institute for System Programming of the RAS, 2019, Volume 31, Issue 4, Pages 29–38
DOI: https://doi.org/10.15514/ISPRAS-2019-31(4)-2 (Mi tisp437)

Automated error detection and analysis in hyperconverged systems

D. V. Silakov

Virtuozzo


Abstract: The paper addresses early error detection and analysis in hyperconverged systems. One way to build such a system is to install on each physical server a separate operating system (OS) instance that provides virtualization tools together with tools for administering and using distributed data storage. Errors can occur both within a single OS instance and at the level of the entire cluster. For example, incorrect commands from the control components on one infrastructure node can cause a software failure on another node. In addition, errors in cluster subsystems can trigger abnormal situations inside virtual machines. The architectural complexity of hyperconverged systems makes the errors that occur in them difficult to analyze. To simplify this analysis and make it more effective, the detection of problems and the collection of the data needed to study and fix them must be automated. The paper describes existing approaches to automated error detection and suggests improvements that adapt them to systems that make heavy use of distributed storage and virtualization technologies. The improvements include collecting logs from the whole cluster immediately after an error occurs, additional analysis of guest operating system behaviour inside virtual machines, and the use of a knowledge base for automated crash recovery and duplicate detection. Finally, a real-life error handling scenario in Virtuozzo products is described, from error detection to fix deployment.
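The abstract mentions a knowledge base used for duplicate detection and automated crash recovery but does not say how it is implemented. The following is only a minimal sketch of one common way such a lookup could work, under the assumption that crashes are matched by a normalized stack trace signature. The names `crash_signature` and `KnowledgeBase`, the trace format, the issue identifier, and the recovery hint are all hypothetical illustrations, not taken from the paper or from Virtuozzo products.

```python
import hashlib
import re


def crash_signature(stack_trace: str, depth: int = 3) -> str:
    """Return a short hash of the top `depth` frames of a stack trace.

    Hex addresses and offsets are stripped first, because they usually
    differ between otherwise identical crashes (an assumption of this sketch).
    """
    frames = []
    for line in stack_trace.splitlines():
        # Remove fragments such as "+0x1a2/0x4f0" or a bare "0xffff8800".
        cleaned = re.sub(r"\+?0x[0-9a-fA-F]+(/0x[0-9a-fA-F]+)?", "", line).strip()
        if cleaned:
            frames.append(cleaned)
    top = "|".join(frames[:depth])
    return hashlib.sha256(top.encode("utf-8")).hexdigest()[:16]


class KnowledgeBase:
    """Hypothetical in-memory store mapping crash signatures to known issues."""

    def __init__(self):
        self._known = {}

    def register(self, stack_trace, issue_id, recovery_hint=None):
        # Record a known issue and an optional recovery action for it.
        self._known[crash_signature(stack_trace)] = (issue_id, recovery_hint)

    def lookup(self, stack_trace):
        # Return (issue_id, recovery_hint) for a duplicate, or None if new.
        return self._known.get(crash_signature(stack_trace))
```

With this sketch, a new report whose top frames match a registered crash is treated as a duplicate regardless of the addresses in the trace, and the stored hint could drive an automated recovery step. A real system would likely need a more tolerant matching scheme than exact hashing.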

Keywords: error detection, virtualization, data storage.

Document Type: Article

Language: Russian

Citation: D. V. Silakov, “Automated error detection and analysis in hyperconverged systems”, Proceedings of ISP RAS, 31:4 (2019), 29–38

Citation in format AMSBIB

\Bibitem{Sil19}
\by D.~V.~Silakov
\paper Automated error detection and analysis in hyperconverged systems
\jour Proceedings of ISP RAS
\yr 2019
\vol 31
\issue 4
\pages 29--38
\mathnet{http://mi.mathnet.ru/tisp437}
\crossref{https://doi.org/10.15514/ISPRAS-2019-31(4)-2}

Linking options:

https://www.mathnet.ru/eng/tisp437

https://www.mathnet.ru/eng/tisp/v31/i4/p29

