Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation
Total Page:16
File Type:pdf, Size:1020Kb
Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation Chi-Keung Luk Robert Cohn Robert Muth Harish Patil Artur Klauser Geoff Lowney Steven Wallace Vijay Janapa Reddi Kim Hazelwood Intel Corporation ¡ University of Colorado ¢¤£¦¥¨§ © £ ¦£ "! #%$&'( £)&(¦*+©-,.+/01$©-!2 ©-,2¦3 45£) 67©2, £¦!2 "0 Abstract 1. Introduction Robust and powerful software instrumentation tools are essential As software complexity increases, instrumentation—a technique for program analysis tasks such as profiling, performance evalu- for inserting extra code into an application to observe its behavior— ation, and bug detection. To meet this need, we have developed is becoming more important. Instrumentation can be performed at a new instrumentation system called Pin. Our goals are to pro- various stages: in the source code, at compile time, post link time, vide easy-to-use, portable, transparent, and efficient instrumenta- or at run time. Pin is a software system that performs run-time tion. Instrumentation tools (called Pintools) are written in C/C++ binary instrumentation of Linux applications. using Pin’s rich API. Pin follows the model of ATOM, allowing the The goal of Pin is to provide an instrumentation platform for tool writer to analyze an application at the instruction level with- building a wide variety of program analysis tools for multiple archi- out the need for detailed knowledge of the underlying instruction tectures. As a result, the design emphasizes ease-of-use, portabil- set. The API is designed to be architecture independent whenever ity, transparency, efficiency, and robustness. This paper describes possible, making Pintools source compatible across different archi- the design of Pin and shows how it provides these features. tectures. However, a Pintool can access architecture-specific details Pin’s instrumentation is easy to use. Its user model is similar when necessary. Instrumentation with Pin is mostly transparent as to the popular ATOM [30] API, which allows a tool to insert calls the application and Pintool observe the application’s original, unin- to instrumentation at arbitrary locations in the executable. Users strumented behavior. Pin uses dynamic compilation to instrument do not need to manually inline instructions or save and restore executables while they are running. For efficiency, Pin uses sev- state. Pin provides a rich API that abstracts away the underlying eral techniques, including inlining, register re-allocation, liveness instruction set idiosyncrasies, making it possible to write portable analysis, and instruction scheduling to optimize instrumentation. instrumentation tools. The Pin distribution includes many sample This fully automated approach delivers significantly better instru- architecture-independent Pintools including profilers, cache simu- mentation performance than similar tools. For example, Pin is 3.3x lators, trace analyzers, and memory bug checkers. The API also faster than Valgrind and 2x faster than DynamoRIO for basic-block allows access to architecture-specific information. counting. To illustrate Pin’s versatility, we describe two Pintools Pin provides efficient instrumentation by using a just-in-time in daily use to analyze production software. Pin is publicly avail- (JIT) compiler to insert and optimize code. In addition to some able for Linux platforms on four architectures: IA32 (32-bit x86), standard techniques for dynamic instrumentation systems includ- 8 EM64T (64-bit x86), Itanium R , and ARM. In the ten months since ing code caching and trace linking, Pin implements register re- Pin 2 was released in July 2004, there have been over 3000 down- allocation, inlining, liveness analysis, and instruction scheduling to loads from its website. optimize jitted code. This fully automated approach distinguishes Pin from most other instrumentation tools which require the user’s Categories and Subject Descriptors D.2.5 [Software Engineer- assistance to boost performance. For example, Valgrind [22] re- ing]: Testing and Debugging-code inspections and walk-throughs, lies on the tool writer to insert special operations in their in- debugging aids, tracing; D.3.4 [Programming Languages]: Processors- termediate representation in order to perform inlining; similarly compilers, incremental compilers DynamoRIO [6] requires the tool writer to manually inline and save/restore application registers. General Terms Languages, Performance, Experimentation Another feature that makes Pin efficient is process attaching Keywords Instrumentation, program analysis tools, dynamic com- and detaching. Like a debugger, Pin can attach to a process, in- pilation strument it, collect profiles, and eventually detach. The application only incurs instrumentation overhead during the period that Pin is attached. The ability to attach and detach is a necessity for the in- strumentation of large, long-running applications. Pin’s JIT-based instrumentation defers code discovery until run time, allowing Pin to be more robust than systems that use static Permission to make digital or hard copies of all or part of this work for personal or instrumentation or code patching. Pin can seamlessly handle mixed classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation code and data, variable-length instructions, statically unknown in- on the first page. To copy otherwise, to republish, to post on servers or to redistribute direct jump targets, dynamically loaded libraries, and dynamically to lists, requires prior specific permission and/or a fee. generated code. PLDI'05 June 12–15,2005,Chicago,Illinois,USA. Pin preserves the original application behavior by providing in- 9 Copyright c 2005 ACM 1-59593-080-9/05/0006 :;:;: $5.00. strumentation transparency. The application observes the same ad- dresses (both instruction and data) and same values (both register )*,+-/.102¥3 465¢7 and memory) as it would in an uninstrumented execution. Trans- 88:9 parency makes the information collected by instrumentation more 2 ;=<¥0>3@? 5,?A62BDC¥2 ;,0¥5>2¥5 4EA62F relevant and is also necessary for correctness. For example, some GH*=I>J 5 4EA62FEK 5,?L¥2 ;,0¥5MNGH*=IO.P;=QSRTGH*=I/.D3EFF2URWV*,XY¥Z[/\ ;,]¥5^1_ applications unintentionally access data beyond the top of stack, so `EQ¥2 ;=<¥0`¨Ma02¥3 465¢R=b=cEQedfL>cEQ>cF gh<bRi;=QSRj3EFF2URk\ ;,]¥5^7 Pin and the instrumentation do not modify the application stack. l 8 R Pin’s first generation, Pin 0, supports Itanium . The recently- 88Dm 1 3¥nnE5EF>` A62>5Eo¥5E2Bp;=<\h02Eq4h0 ;hAh< released second generation, Pin 2, extends the support to four GH*=Ir*=<\h02Eq4h0 ;hAh<sMt*,X¥uO;=<\¨RWGH*=Iv.6o ^w_ 88 architectures: IA32 (32-bit x86) [14], EM64T (64-bit x86) [15], ;=<\h02Eqh? 56<¥0\1C¥2 ;,0¥5 \1q\ ;=<¥xP3yQ¥2¥5EF ;6463E0¥5EFr463¥nnsR 8 8 R R 88 Itanium [13], and ARM [16]. Pin 2 for Itanium is still under ;z{5UzW0E| 5P463¥nn:| 36QQ 56<\/;,``D0E| 5P\h0 A62¥5};6\ 88 development. 3 4h0Eq 3¥nn6BP5E~¥5 4tq¥0¥5EF Pin has been gaining popularity both inside and outside of Intel, ;,`OMt*,X¥u¥*6\tK 5,?A62BEL¥2 ;,0¥5Mt;=<\ ^^ 9 m 2¥5EF ;6463E0¥5EF 3¥n¥nM with more than 3000 downloads since Pin 2 was first released *,X¥u¥*=<\65E20 9 9 H*,XY 6-E)HhJ¥-URkE)V¥X YEJsMNJ 5 4EA62FEK 5,?L¥2 ;,0¥5^R in July 2004. This paper presents an in-depth description of Pin, ;=<\¨Ri* 9 YEJSRy*,EJ¥ hK¥-EKHhJLJ*,Y- 6-UR and is organized as follows. We first give an overview of Pin’s *,EJ¥ *,X¥uEY *,EJ¥ hK¥-EKHhJLJ*,Y- Eu*tE-URD*,EJ¥ 6-XEI^7 instrumentation capability in Section 2. We follow by discussing l design and implementation issues in Section 3. We then evaluate in Section 4 the performance of Pin’s instrumentation and compare it ;=<¥0y? 3;=<sMt;=<¥0P3E2x4¨Rk4t| 3E2v.E3E2xoS ^y_ 9 against other tools. In Section 5, we discuss two sample Pintools *,X *=<;,0¨M3E2x4¨Rk3E2xo ^7 used in practice. Finally, we relate Pin to other work in Section 6 02¥3 465D:` AhQ 56<sMhb,3E02¥3 465UzAhq¥0bRDbCbE^7 and conclude in Section 7. *,X¥u¥6FF *=<\h02Eqh? 56<¥0E)q<4h0 ;hA6<sMt*,<\h02q4h0;hAh<SRy^7 9 9 88 *,X EuE0¥3E20 2 A6x2¥3,?SMt^7 X¥5Eo¥5E2>2¥5E0Eq¥2E<\ 2¥5E0Eq¥2E</¢7 2. Instrumentation with Pin l The Pin API makes it possible to observe all the architectural state of a process, such as the contents of registers, memory, and Figure 1. A Pintool for tracing memory writes. control flow. It uses a model similar to ATOM [30], where the user adds procedures (as known as analysis routines in ATOM’s notion) ¤¨§ © £ © # £ © ¦ ¡ ¡ $ $ to the application process, and writes instrumentation routines to ¦¥" ensures that is © determine where to place calls to analysis routines. The arguments invoked only if the memory instruction is predicated . to analysis routines can be architectural state or constants. Pin Note that the same source code works on all architectures. The also provides a limited ability to alter the program behavior by user does not need to know about the bundling of instructions on allowing an analysis routine to overwrite application registers and Itanium, the various addressing modes on each architecture, the application memory. different forms of predication supported by Itanium and ARM, x86 Instrumentation is performed by a just-in-time (JIT) compiler. string instructions that can write a variable-size memory area, or ¨§ The input to this compiler is not bytecode, however, but a native ex- x86 instructions like that can implicitly write memory. ecutable. Pin intercepts the execution of the first instruction of the Pin provides a comprehensive API for inspection and instru- executable and generates (“compiles”) new code for the straight- mentation. In this particular example, instrumentation is done one line code sequence starting at this instruction. It then transfers con- instruction at a time. It is also possible to inspect whole traces, trol to the generated sequence.