M. N. Goryunov, A. G. Matskevich, D. A. Rybolovlev, “Synthesis of a machine learning model for detecting computer attacks based on the CICIDS2017 dataset”, Proceedings of ISP RAS, 32:5 (2020), 81–94

Proceedings of the Institute for System Programming of the RAS

Proceedings of the Institute for System Programming of the RAS, 2020, Volume 32, Issue 5, Pages 81–94
DOI: https://doi.org/10.15514/ISPRAS-2020-32(5)-6 (Mi tisp545)

This article is cited in 20 scientific papers (total in 20 papers)

Synthesis of a machine learning model for detecting computer attacks based on the CICIDS2017 dataset

M. N. Goryunov, A. G. Matskevich, D. A. Rybolovlev

The Academy of Federal Security Guard Service of the Russian Federation

Abstract: The paper addresses the construction and practical implementation of a computer attack detection model based on machine learning methods. Among the available public datasets, CICIDS2017 was chosen as one of the most relevant. Data preprocessing and sampling procedures were developed in detail for this dataset. To reduce computation time, only one class of computer attacks (brute force, XSS, SQL injection) was retained in the training set. The construction of the feature space is described step by step; it reduced the dimensionality significantly, from 85 features to the 10 most important ones. The quality of the ten most common machine learning models was assessed on the resulting preprocessed dataset. Among the models (algorithms) that showed the best results (k-nearest neighbors, decision tree, random forest, AdaBoost, logistic regression), the choice of the random forest model was justified on the basis of its minimal execution time. A quasi-optimal selection of hyperparameters was carried out, which improved the quality of the model compared with previously published research results. The synthesized attack detection model was tested on real network traffic. The model proved valid only when trained on data collected in a specific network, since the important features depend on the physical structure of the network and the settings of the equipment used. The authors conclude that machine learning methods can be used to detect computer attacks, provided these limitations are taken into account.

Keywords: information security, intrusion detection system, machine learning, decision tree, random forest, network traffic, computer attack.
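
The feature-reduction and hyperparameter-selection steps summarized in the abstract could be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' actual procedure: the data are a synthetic stand-in for the preprocessed CICIDS2017 sample, ranking by impurity-based importance is an assumed way of picking the 10 features, and the parameter grid and F1 scoring are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# ASSUMPTION: synthetic stand-in for the preprocessed dataset
# (85 features, binary label: benign vs. attack).
X, y = make_classification(
    n_samples=600, n_features=85, n_informative=10, n_redundant=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: rank features with a preliminary forest and keep the 10 most important.
ranker = RandomForestClassifier(n_estimators=50, random_state=0)
ranker.fit(X_train, y_train)
top10 = np.argsort(ranker.feature_importances_)[::-1][:10]

# Step 2: small grid search over forest hyperparameters on the reduced features.
param_grid = {"n_estimators": [30, 60], "max_depth": [5, 10]}  # hypothetical grid
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, scoring="f1", cv=3
)
search.fit(X_train[:, top10], y_train)

# Step 3: evaluate the tuned model on held-out data.
test_f1 = f1_score(y_test, search.predict(X_test[:, top10]))
print("selected feature indices:", sorted(top10.tolist()))
print("best params:", search.best_params_)
print("held-out F1:", round(test_f1, 3))
```

As the abstract cautions, a model tuned this way would be expected to hold up only on traffic from the network whose data it was trained on.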

Document Type: Article

Language: Russian

Citation: M. N. Goryunov, A. G. Matskevich, D. A. Rybolovlev, “Synthesis of a machine learning model for detecting computer attacks based on the CICIDS2017 dataset”, Proceedings of ISP RAS, 32:5 (2020), 81–94

Citation in AMSBIB format

\Bibitem{GorMatRyb20}
\by M.~N.~Goryunov, A.~G.~Matskevich, D.~A.~Rybolovlev
\paper Synthesis of a machine learning model for detecting computer attacks based on the CICIDS2017 dataset
\jour Proceedings of ISP RAS
\yr 2020
\vol 32
\issue 5
\pages 81--94
\mathnet{http://mi.mathnet.ru/tisp545}
\crossref{https://doi.org/10.15514/ISPRAS-2020-32(5)-6}

Linking options:

https://www.mathnet.ru/eng/tisp545

https://www.mathnet.ru/eng/tisp/v32/i5/p81
