Abstract:
Not only developers require tools that work with binary code: it is impossible to achieve sufficient security in contemporary software without inspecting its properties at this level. A key component of binary code analysis toolset is the instruction decoder. Different instruction set architectures give rise to decoders that are differently structured, the decoding results are incompatible, and maintenance is hindered because of the ubiquitous practice of implementing decoders as cascades of conditional operators. Further binary code analysis (control and data flows, symbolic interpretation, etc.) cannot easily be ported from one target architecture to another because of limitations and peculiarities of decoder implementations. In this paper, we propose an approach to decoding machine instructions that is based on external specifications. The main distinction is an original way of representing the decoder instruction universally, i.e. in a way that does not differ from one architecture to another. The decoding process is handled by an abstract stack machine we have developed. Despite the inevitable overhead stemming from the approach's universality, an implementation prototype displays only 1.5-2.5 times slowdown compared to traditional decoders; the measurements include time required to parse the specification and build the required data structures. The proposed approach to organizing decoding would allow, in the long run, to establish a unified stack of binary code analysis tools that would be applicable to different instruction set architectures. The paper further discusses questions of translating the decoded instructions into a machine-neutral internal representation for analyzing their operational semantics and carrying out abstract interpretation. We give examples of practically useful interpretations: the concrete interpretation and a “directing” interpretation that allows to tie the idea of abstract interpretation with the problem of deeper automatic analysis of individual paths in a program.
This work was supported by RFBR grant no. 18-07-01256.
Document Type:
Article
Language: Russian
Citation:
M. A. Solovev, M. G. Bakulin, S. S. Makarov, D. V. Manushin, V. A. Padaryan, “Decoding of machine instructions for abstract interpretation of binary code”, Proceedings of ISP RAS, 31:6 (2019), 65–88