|
Mathematical Foundations of Informatics and Programming
About algorithms for searching computer information
A. V. Zharkova, A. G. Musugalieva Saratov State University
Abstract:
Due to the large growth in the volume of data, there are many problems with information processing. If the search for information is a key task of the system, for example, the search for malware by antivirus programs, you need to know the principles of its organization. In this paper, we study the algorithms for searching a substring in a string: the naive search algorithm, the Knuth — Morris — Pratt algorithm, the Boyer — Moore algorithm, the Rabin — Karp algorithm, as well as wildcards applicable to them (substitution characters, ‘‘matching’’ with any character or their sequence). As a result, a C# program has been developed and implemented to search for files by various parameters (file name, extension, size and content) using the above algorithms and methods. The program allows you to scan a given directory to search for malware. Computational experiments were carried out, including changing the maximum number of characters of the sample and text, the corresponding conclusions were drawn. The overall best file search time (it is enough to find the first occurrence) turned out to be using the Boyer — Moore algorithm, the worst — using the Rabin — Karp algorithm. To search for files for small given data and parameters, you can use naive search, for medium and large data and parameters for small samples it is better to use the Knuth — Morris — Pratt algorithm, for large ones — Boyer — Moore algorithm.
Keywords:
Boyer — Moore algorithm, cybersecurity, file search, Knuth — Morris — Pratt algorithm, Rabin — Karp algorithm, scanning, substring search in a string.
Citation:
A. V. Zharkova, A. G. Musugalieva, “About algorithms for searching computer information”, Prikl. Diskr. Mat. Suppl., 2023, no. 16, 126–129
Linking options:
https://www.mathnet.ru/eng/pdma625 https://www.mathnet.ru/eng/pdma/y2023/i16/p126
|
Statistics & downloads: |
Abstract page: | 53 | Full-text PDF : | 20 | References: | 26 |
|