Submission Version Emulization: a Framework for Static Analysis of Dynamic Code via Emulation Abbas Naderi-Afooshteh, Anh Nguyen-Tuong, Jason D. Hiser, Jack W. Davidson Department of Computer Science, University of Virginia
[email protected],
[email protected],
[email protected],
[email protected] Abstract Categories and Subject Descriptors F.3.2 [Logics and Static analysis of dynamic scripting code is a very chal- Meanings of Programs]: Semantics of Programming Languages— lenging problem due to the extensive use of dynamic features Program Analysis; D.3.4 [Programming Languages]: Processors— such as run-time code generation, dynamic aliasing, dynamic Interpreters; D.3.1 [Programming Languages]: Formal Defi- weak typing, and implicit object creation, to name a few. nitions and Theory With increasing popularity of dynamic languages, prior General Terms Design, Security research has focused on theoretical, or practical but ad- hoc solutions to handle many of these dynamic features. Keywords Dynamic Analysis, Static Analysis, Emulation Unfortunately, a comprehensive static analysis method that scales to popular real-world applications is lacking. 1. Introduction This work tackles the problem from a new perspective. Our Static analysis of dynamic applications, such as those writ- approach enables us to have an accurate, scalable method for ten in PHP, Javascript, Python or Bash Script, is an open analysis of dynamic applications. research problem. Several challenging problems plague anal- Our method, emulization, is based on a novel dynamic ysis of dynamic code. Constructing a sound or complete analysis powered by emulation, combined with static analysis. control flow graph (CFG), reasoning about dynamic evalua- The combination provides a framework for context and flow tions (i.e., calls to eval), type inference, error handling and sensitive dynamic and static analysis of dynamic applications.