Disassembly of Executable Code Revisited

Benjamin Schwarz, Saumya Debray, Gregory Andrews
Department of Computer Science
University of Arizona
Tucson, AZ 85721, U.S.A.

Abstract

Machine code disassembly routines form a fundamental component of software systems that statically analyze or modify executable programs. The task of disassembly is complicated by indirect jumps and by the presence of non-executable data (jump tables, alignment bytes, etc.) in the instruction stream. Existing disassembly algorithms cannot always cope with executable files containing such features, and they fail silently: they produce incorrect disassemblies without any indication that the results are incorrect. This can be a serious problem, since it can compromise the correctness of a binary rewriting tool. In this paper we examine two commonly used disassembly algorithms and illustrate their shortcomings. We propose a hybrid approach that performs better than these algorithms in the sense that it can detect situations where the disassembly may be incorrect and limit the extent of such disassembly errors. Experimental results indicate that the algorithm is quite effective: the amount of code flagged as incurring disassembly errors is usually quite small.
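
To make the silent-failure problem concrete, the following minimal sketch decodes a five-byte program in which a single data byte sits in the instruction stream. Everything in it is an assumption made for illustration: the toy instruction set, the opcode values, and the helper names (decode, linear_sweep, follow_control_flow) are hypothetical and do not come from any real architecture or tool. The cross-check at the end is only meant to suggest how disagreement between two decoding strategies can localize a possible error; it is not the hybrid algorithm proposed in this paper.

```python
# Toy instruction set (hypothetical): opcode -> (mnemonic, total length in bytes).
OPCODES = {
    0x01: ("NOP", 1),
    0x10: ("LOAD", 2),  # LOAD imm8
    0x20: ("JMP", 2),   # JMP abs8, unconditional direct jump
    0xFF: ("RET", 1),
}

def decode(code, offset):
    """Decode one instruction at offset; return (mnemonic, operand, length).

    Assumes the opcode is known and the instruction is not truncated.
    """
    name, length = OPCODES[code[offset]]
    operand = code[offset + 1] if length == 2 else None
    return name, operand, length

def linear_sweep(code):
    """Decode bytes back to back from offset 0, ignoring control flow."""
    starts, offset = [], 0
    while offset < len(code):
        _, _, length = decode(code, offset)
        starts.append(offset)
        offset += length
    return starts

def follow_control_flow(code, entry=0):
    """Decode only bytes reachable from entry via direct jumps and fallthrough."""
    seen, worklist = set(), [entry]
    while worklist:
        offset = worklist.pop()
        while offset < len(code) and offset not in seen:
            name, operand, length = decode(code, offset)
            seen.add(offset)
            if name == "JMP":
                worklist.append(operand)  # continue at the jump target
                break
            if name == "RET":
                break
            offset += length
    return sorted(seen)

# Layout: 0: JMP 3 | 2: data byte 0x10 | 3: NOP | 4: RET
CODE = bytes([0x20, 0x03, 0x10, 0x01, 0xFF])

sweep = linear_sweep(CODE)              # treats the data byte as "LOAD 1"
reachable = follow_control_flow(CODE)   # skips the data byte via the jump
suspect = sorted(set(sweep) ^ set(reachable))  # offsets where the two disagree
```

Here the back-to-back decoder reads the data byte at offset 2 as the start of a two-byte LOAD, swallows the real NOP at offset 3, and raises no error. Only the comparison with the control-flow-based result reveals that offsets 2 and 3 are in doubt. Note that this toy program contains no indirect jumps, which is exactly the case where following control flow would lose its advantage.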