On the Infeasibility of Modeling Polymorphic Shellcode∗ Yingbo Song Michael E. Locasto Angelos Stavrou Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science Columbia University Columbia University Columbia University
[email protected] [email protected] [email protected] Angelos D. Keromytis Salvatore J. Stolfo Dept. of Computer Science Dept. of Computer Science Columbia University Columbia University
[email protected] [email protected] ABSTRACT General Terms Polymorphic malcode remains a troubling threat. The ability for Experimentation, Measurement, Security malcode to automatically transform into semantically equivalent variants frustrates attempts to rapidly construct a single, simple, easily verifiable representation. We present a quantitative analy- Keywords sis of the strengths and limitations of shellcode polymorphism and polymorphism, shellcode, signature generation, statistical models consider its impact on current intrusion detection practice. We focus on the nature of shellcode decoding routines. The em- pirical evidence we gather helps show that modeling the class of self–modifying code is likely intractable by known methods, in- 1. INTRODUCTION cluding both statistical constructs and string signatures. In addi- Code injection attacks have traditionally received a great deal of tion, we develop and present measures that provide insight into the attention from both security researchers and the blackhat commu- capabilities, strengths, and weaknesses of polymorphic engines. In nity [1, 14], and researchers have proposed a variety of defenses, order to explore countermeasures to future polymorphic threats, we from artificial diversity of the address space [5] or instruction set show how to improve polymorphic techniques and create a proof- [20, 4] to compiler-added integrity checking of the stack [10, 15] of-concept engine expressing these improvements.