This article is cited in 1 scientific paper (total in 1 paper)
A method to evaluate program similarity using machine learning methods
P. D. Borisov (a), Yu. V. Kosolapov (b)
(a) State Scientific Organization: Research Institute "Spetsvuzavtomatika", Rostov-on-Don
(b) Southern Federal University
Abstract:
The problem of constructing an algorithm for comparing two executable files is considered. The algorithm builds a vector of similarity features for a given pair of programs; machine learning methods then use this vector to decide whether the programs are similar or dissimilar. Similarity features are built using algorithms of two types: universal and specialized. Universal algorithms do not take the format of the input data into account (values of fuzzy hash functions and of compression ratios). Specialized algorithms work with executable files and analyze machine code using disassemblers. A total of 15 features were built: 9 of the first type and 6 of the second. Seven different binary classifiers were trained and tested on a constructed training set of similar and dissimilar program pairs, which was built from coreutils programs. In the experiments, models based on random forest and k-nearest neighbors showed high accuracy. It was also found that the combined use of features of both types can improve classification accuracy.
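To make the idea of a format-agnostic ("universal") feature concrete, the sketch below computes a normalized compression distance (NCD) between two byte strings with Python's standard `zlib` module. This is an illustrative assumption, not the authors' method: the abstract does not say which compressors or which compression-ratio formulas were used, and the function names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed form of data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Assumption for illustration only: NCD stands in for the paper's
    unspecified compression-ratio features. Values near 0 suggest the
    inputs share most of their content; values near 1 suggest they
    share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    # zlib output is never empty, so max(cx, cy) is always positive.
    return (cxy - min(cx, cy)) / max(cx, cy)
```

For two executables, one could read each file as bytes and use `ncd(a, b)` as one entry of the pair's feature vector, next to fuzzy-hash scores and disassembly-based features. Note that zlib's 32 KiB window limits how well this works on large files, so a compressor with a larger window may be a better fit in practice.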
Keywords:
obfuscation, program similarity, machine learning
Citation:
P. D. Borisov, Yu. V. Kosolapov, “A method to evaluate program similarity using machine learning methods”, Proceedings of ISP RAS, 34:5 (2022), 63–76
Linking options:
https://www.mathnet.ru/eng/tisp721 https://www.mathnet.ru/eng/tisp/v34/i5/p63