Reverse Engineering Self-Modifying Code: Unpacker Extraction
Department of Computer Science
University of Arizona
Tucson, AZ 85721, U.S.A.
An important application of binary-level reverse engineering is reconstructing the internal logic of computer malware. Most malware code is distributed in encrypted (or "packed") form; at runtime, an unpacker routine transforms it into the original executable form of the code, which is then executed. Most existing work on the analysis of such programs focuses on detecting unpacking and extracting the unpacked code. However, this sheds no light on the functionality of the different portions of the code so obtained, and in particular does not distinguish between code that performs unpacking and code that does not; identifying such functionality can be helpful for reverse engineering the code. This paper describes a technique for identifying and extracting the unpacker code in a self-modifying program. Our algorithm uses offline analysis of a dynamic instruction trace both to identify the point(s) where unpacking occurs and to identify and extract the corresponding unpacker code.
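
To make the trace-based idea concrete, the following Python sketch shows one simple way an offline pass over an instruction trace could flag unpacking points and collect candidate unpacker instructions. It is an illustration under stated assumptions, not the algorithm developed in this paper: the trace format (`TraceEntry`) and the function `find_unpacking` are hypothetical, and an "unpacking point" is taken here to be the first execution of an instruction whose bytes were written earlier in the same trace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    """One executed instruction in a hypothetical trace format (assumption)."""
    pc: int                          # address of the executed instruction
    size: int = 1                    # instruction length in bytes
    writes: frozenset = frozenset()  # memory addresses written by it


def find_unpacking(trace):
    """Return (unpack_points, writer_pcs) for a list of TraceEntry.

    unpack_points: trace indices at which an executed instruction occupies
                   at least one byte written earlier in the trace.
    writer_pcs:    addresses of the instructions that wrote those bytes,
                   a rough first approximation of the unpacker code.
    """
    last_writer = {}    # memory address -> pc of its most recent writer
    unpack_points = []
    writer_pcs = set()
    for index, entry in enumerate(trace):
        code_bytes = range(entry.pc, entry.pc + entry.size)
        modified = [a for a in code_bytes if a in last_writer]
        if modified:
            unpack_points.append(index)
            for addr in modified:
                # Record the writer, then forget the byte so that later
                # re-executions of the same code are not reported again.
                writer_pcs.add(last_writer.pop(addr))
        for addr in entry.writes:
            last_writer[addr] = entry.pc
    return unpack_points, writer_pcs


# Toy trace: two stores fill in two bytes at 0x2000, which then execute.
trace = [
    TraceEntry(pc=0x1000, size=2, writes=frozenset({0x2000})),
    TraceEntry(pc=0x1002, size=2, writes=frozenset({0x2001})),
    TraceEntry(pc=0x1004, size=2),   # transfer of control, no writes
    TraceEntry(pc=0x2000, size=2),   # executes the freshly written bytes
]
points, writers = find_unpacking(trace)
```

On this toy trace the sketch reports the fourth entry as the unpacking point and the two storing instructions as candidate unpacker code. Collecting only the writers is a deliberate simplification: it omits the instructions that compute the written values and addresses or control the unpacking loop, which a fuller extraction would also need to recover.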