Dynamic Binary Translation & Instrumentation
Total Page:16
File Type:pdf, Size:1020Kb
Dynamic Binary Translation & Instrumentation Pin Building Customized Program Analysis Tools with Dynamic Instrumentation CK Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Kim Hazelwood Intel Vijay Janapa Reddi University of Colorado http://rogue.colorado.edu/Pin PLDI’05 2 Instrumentation • Insert extra code into programs to collect information about execution • Program analysis: • Code coverage, call-graph generation, memory-leak detection • Architectural study: • Processor simulation, fault injection • Existing binary-level instrumentation systems: • Static: • ATOM, EEL, Etch, Morph • Dynamic: • Dyninst, Vulcan, DTrace, Valgrind, Strata, DynamoRIO C Pin is a new dynamic binary instrumentation system PLDI’05 3 A Pintool for Tracing Memory Writes #include <iostream> #include "pin.H" executed immediately before a FILE* trace; write is executed • Same source code works on the 4 architectures VOID RecordMemWrite(VOID* ip, VOID* addr, UINT32 size) { fprintf(trace,=> “%p: Pin Wtakes %p %dcare\n”, of ip, different addr, size); addressing modes } • No need to manually save/restore application state VOID Instruction(INS ins, VOID *v) { if (INS_IsMemoryWrite(ins))=> Pin does it for you automatically and efficiently INS_InsertCall(ins, IPOINT_BEFORE, AFUNPTR(RecordMemWrite), IARG_INST_PTR, IARG_MEMORYWRITE_EA, IARG_MEMORYWRITE_SIZE, IARG_END); } int main(int argc, char * argv[]) { executed when an instruction is PIN_Init(argc, argv); dynamically compiled trace = fopen(“atrace.out”, “w”); INS_AddInstrumentFunction(Instruction, 0); PIN_StartProgram(); return 0; PLDI’05 4 } Dynamic Instrumentation Original code Code cache Exits point back to Pin 1 1’ 2 3 2’ 5 4 7’ 6 7 Pin Pin fetches trace starting block 1 and start instrumentation PLDI’05 5 Dynamic Instrumentation Original code Code cache 1 1’ 2 3 2’ 5 4 7’ 6 7 Pin Pin transfers control into code cache (block 1) PLDI’05 6 Dynamic Instrumentation Original code Code cache trace linking 1 1’ 3’ 2 3 2’ 5’ 5 4 7’ 6’ 6 7 Pin Pin fetches and instrument a new trace PLDI’05 7 Pin’s Software Architecture Address space Pintool ❑ 3 programs (Pin, Pintool, App) in same Pin address space: ➢ User-level only Instrumentation APIs ❑ Instrumentation APIs: Virtual Machine (VM) ➢ Through which Pintool communicates with Pin JIT Compiler Code ❑ JIT compiler: Cache ➢ Dynamically compile and instrument Application Emulation Unit ❑ Emulation unit: ➢ Handle insts that can’t be directly executed (e.g., syscalls) Operating System Hardware ❑ Code cache: ➢ Store compiled code PLDI’05 => Coordinated by VM 8 Pin Internal Details • Loading of Pin, Pintool, & Application • An Improved Trace Linking Technique • Register Re-allocation • Instrumentation Optimizations • Multithreading Support PLDI’05 9 Register Re-allocation • Instrumented code needs extra registers. E.g.: • Virtual registers available to the tool • A virtual stack pointer pointing to the instrumentation stack • Many more … • Approaches to get extra registers: 1. Ad-hoc (e.g., DynamoRIO, Strata, DynInst) – Whenever you need a register, spill one and fill it afterward 2. Re-allocate all registers during compilation a. Local allocation (e.g., Valgrind) • Allocate registers independently within each trace b. Global allocation (Pin) • Allocate registers across traces (can be inter-procedural) PLDI’05 10 Valgrind’s Register Re-allocation Original Code Trace 1 mov 1, %eax mov 1, %eax mov 2, %ebx mov 2, %esi Virtual Physical cmp %ecx, %edx re-allocate cmp %ecx, %edx %eax %eax jz t mov %eax, SPILLeax %ebx %esi mov %esi, SPILLebx %ecx %ecx jz t’ t: add 1, %eax %edx %edx sub 2, %ebx Trace 2 t’: mov SPILLeax, %eax Virtual Physical mov SPILLebx ,%edi %eax %eax add 1, %eax %ebx %edi C Simple but inefficient sub 2, %edi %ecx %ecx %edx %edx • All modified registers are spilled at a trace’s end • Refill registers at a trace’s beginning PLDI’05 11 Pin’s Register Re-allocation Scenario (1): Compiling a new trace at a trace exit Original Code Trace 1 mov 1, %eax mov 1, %eax mov 2, %ebx mov 2, %esi Compile Trace 2 using the cmp %ecx, %edx re-allocate cmp %ecx, %edx jz t jz t’ binding at Trace 1’s exit: Virtual Physical t: add 1, %eax %eax %eax %ebx %esi sub 2, %ebx Trace 2 %ecx %ecx %edx %edx t’: add 1, %eax sub 2, %esi PLDI’05 C No spilling/filling needed across traces 12 Pin’s Register Re-allocation Scenario (2): Targeting an already generated trace at a trace exit Original Code Trace 1 (being compiled) mov 1, %eax mov 1, %eax mov 2, %ebx mov 2, %esi re-allocate Virtual Physical cmp %ecx, %edx cmp %ecx, %edx %eax %eax mov %esi, SPILL jz t ebx %ebx %esi mov SPILLebx, %edi %ecx %ecx t: add 1, %eax jz t’ %edx %edx sub 2, %ebx Trace 2 (in code cache) t’: add 1, %eax Virtual Physical sub 2, %edi %eax %eax %ebx %edi %ecx %ecx C Minimal spilling/filling code %edx %edx PLDI’05 13 Instrumentation Optimizations 1. Inline instrumentation code into the application 2. Avoid saving/restoring eflags with liveness analysis 3. Schedule inlined instrumentation code PLDI’05 14 OriginalExample: code Instruction Counting cmov %esi, %edi cmp %edi, (%esp) BBL_InsertCall(bbl, IPOINT_BEFORE, docount(), jle <target1> IARG_UINT32, BBL_NumIns(bbl), add %ecx, %edx IARG_END) cmp %edx, 0 je <target2> C 33 extra instructions executed altogether Instrument without applying any optimization Trace bridge() mov %esp,SPILLappsp pushf mov SPILL ,%esp pinsp push %edx call <bridge> push %ecx cmov %esi, %edi docount() mov SPILLappsp,%esp push %eax cmp %edi, (%esp) movl 0x3, %eax add %eax,icount jle <target1’> call docount ret pop %eax mov %esp,SPILLappsp pop %ecx mov SPILLpinsp,%esp call <bridge> pop %edx add %ecx, %edx popf cmp %edx, 0 PLDI’05 ret 15 je <target2’> Example: Instruction Counting Original code cmov %esi, %edi cmp %edi, (%esp) jle <target1> Inlining add %ecx, %edx cmp %edx, 0 Trace je <target2> mov %esp,SPILLappsp mov SPILLpinsp,%esp pushf add 0x3, icount popf cmov %esi, %edi mov SPILLappsp,%esp C 11 extra instructions executed cmp %edi, (%esp) jle <target1’> mov %esp,SPILLappsp mov SPILLpinsp,%esp pushf add 0x3, icount popf add %ecx, %edx PLDI’05 cmp %edx, 0 16 je <target2’> Example: Instruction Counting Original code cmov %esi, %edi cmp %edi, (%esp) jle <target1> Inlining + eflags liveness analysis add %ecx, %edx cmp %edx, 0 Trace je <target2> mov %esp,SPILLappsp mov SPILLpinsp,%esp pushf add 0x3, icount popf cmov %esi, %edi mov SPILLappsp,%esp C 7 extra instructions executed cmp %edi, (%esp) jle <target1’> add 0x3, icount add %ecx, %edx cmp %edx, 0 je <target2’> PLDI’05 17 Example: Instruction Counting Original code cmov %esi, %edi cmp %edi, (%esp) jle <target1> Inlining + eflags liveness analysis + scheduling add %ecx, %edx cmp %edx, 0 Trace je <target2> cmov %esi, %edi add 0x3, icount cmp %edi, (%esp) C 2 extra instructions executed jle <target1’> add 0x3, icount add %ecx, %edx cmp %edx, 0 je <target2’> PLDI’05 18 Pin Instrumentation Performance Runtime overhead of basic-block counting with Pin on IA32 Without optimization Inlining Inlining + eflags liveness analysis Inlining + eflags liveness analysis + scheduling 11 10.4 10 9 7.8 8 7 6 5 3.9 3.5 4 2.8 2.5 3 1.5 2 1.4 1 Average Slowdown Average 0 SPECINT SPECFP (SPEC2K using reference data sets) PLDI’05 19 Comparison among Dynamic Instrumentation Tools Runtime overhead of basic-block counting with three different tools Valgrind DynamoRIO Pin 9 8.3 8 7 6 5.1 5 4 2.5 3 2 Average Slowdown Average 1 0 SPECINT • Valgrind is a popular instrumentation tool on Linux • Call-based instrumentation, no inlining • DynamoRIO is the performance leader in binary dynamic optimization • Manually inline, no eflags liveness analysis and scheduling CPLDI’05Pin automatically provides efficient instrumentation20 Pin Applications • Sample tools in the Pin distribution: • Cache simulators, branch predictors, address tracer, syscall tracer, edge profiler, stride profiler • Some tools developed and used inside Intel: • Opcodemix (analyze code generated by compilers) • PinPoints (find representative regions in programs to simulate) • A tool for detecting memory bugs • Some companies are writing their own Pintools: • A major database vendor, a major search engine provider • Some universities using Pin in teaching and research: • U. of Colorado, MIT, Harvard, Princeton, U of Minnesota, Northeastern, Tufts, University of Rochester, … PLDI’05 21 Conclusions • Pin • A dynamic instrumentation system for building your own program analysis tools • Easy to use, robust, transparent, efficient • Tool source compatible on IA32, EM64T, Itanium, ARM • Works on large applications • database, search engine, web browsers, … • Available on Linux; Windows version coming soon • Downloadable from http://rogue.colorado.edu/Pin • User manual, many example tools, tutorials • 3300 downloads since 2004 July PLDI’05 22 Valgrind A Framework for Heavyweight Dynamic Binary Instrumentation Nicholas Nethercote — National ICT Australia Julian Seward — OpenWorks LLP 23 FAQ #1 • How do you pronounce “Valgrind”? • “Val-grinned”, not “Val-grined” • Don’t feel bad: almost everyone gets it wrong at first 24 DBA tools • Program analysis tools are useful • Bug detectors • Profilers • Visualizers • Dynamic binary analysis (DBA) tools • Analyse a program’s machine code at run-time • Augment original code with analysis code 25 Building DBA tools • Dynamic binary instrumentation (DBI) • Add analysis code to the original machine code at run- time • No preparation, 100% coverage • DBI frameworks • Pin, DynamoRIO, Valgrind, etc. Tool Framework + = Tool plug-in 26 Prior work