Dynamic Binary Translation & Instrumentation
Total Page:16
File Type:pdf, Size:1020Kb
1/24/2017 Dynamic Binary Translation & Instrumentation Pin Building Customized Program Analysis Tools with Dynamic Instrumentation CK Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Kim Hazelwood Intel Vijay Janapa Reddi University of Colorado http://rogue.colorado.edu/Pin PLDI’05 2 1 1/24/2017 Instrumentation • Insert extra code into programs to collect information about execution • Program analysis: • Code coverage, call‐graph generation, memory‐leak detection • Architectural study: • Processor simulation, fault injection • Existing binary‐level instrumentation systems: • Static: • ATOM, EEL, Etch, Morph • Dynamic: • Dyninst, Vulcan, DTrace, Valgrind, Strata, DynamoRIO Pin is a new dynamic binary instrumentation system PLDI’05 3 A Pintool for Tracing Memory Writes #include <iostream> #include "pin.H" executed immediately before a FILE* trace; write is executed • Same source code works on the 4 architectures VOID RecordMemWrite(VOID* ip, VOID* addr, UINT32 size) { fprintf(trace, “%p: W %p %d\n”, ip, addr, size);=> Pin takes care of different addressing modes } • No need to manually save/restore application state VOID Instruction(INS ins, VOID *v) { if (INS_IsMemoryWrite(ins))=> Pin does it for you automatically and efficiently INS_InsertCall(ins, IPOINT_BEFORE, AFUNPTR(RecordMemWrite), IARG_INST_PTR, IARG_MEMORYWRITE_EA, IARG_MEMORYWRITE_SIZE, IARG_END); } int main(int argc, char * argv[]) { executed when an instruction is PIN_Init(argc, argv); dynamically compiled trace = fopen(“atrace.out”, “w”); INS_AddInstrumentFunction(Instruction, 0); PIN_StartProgram(); return 0; PLDI’05 4 } 2 1/24/2017 Dynamic Instrumentation Original code Code cache Exits point back to Pin 1 1’ 2 3 2’ 5 4 7’ 6 7 Pin Pin fetches trace starting block 1 and start instrumentation PLDI’05 5 Dynamic Instrumentation Original code Code cache 1 1’ 2 3 2’ 5 4 7’ 6 7 Pin Pin transfers control into code cache (block 1) PLDI’05 6 3 1/24/2017 Dynamic Instrumentation Original code Code cache trace linking 1 1’ 3’ 2 3 2’ 5’ 5 4 7’ 6’ 6 7 Pin Pin fetches and instrument a new trace PLDI’05 7 Pin’s Software Architecture Address space Pintool 3 programs (Pin, Pintool, App) in same Pin address space: User‐level only Instrumentation APIs Instrumentation APIs: Virtual Machine (VM) Through which Pintool communicates with Pin JIT Compiler Code JIT compiler: Cache Dynamically compile and instrument Application Emulation Unit Emulation unit: Handle insts that can’t be directly executed (e.g., syscalls) Operating System Hardware Code cache: Store compiled code PLDI’05 => Coordinated by VM 8 4 1/24/2017 Pin Internal Details • Loading of Pin, Pintool, & Application • An Improved Trace Linking Technique • Register Re‐allocation • Instrumentation Optimizations • Multithreading Support PLDI’05 9 Register Re‐allocation • Instrumented code needs extra registers. E.g.: • Virtual registers available to the tool • A virtual stack pointer pointing to the instrumentation stack • Many more … • Approaches to get extra registers: 1. Ad‐hoc (e.g., DynamoRIO, Strata, DynInst) – Whenever you need a register, spill one and fill it afterward 2. Re‐allocate all registers during compilation a. Local allocation (e.g., Valgrind) • Allocate registers independently within each trace b. Global allocation (Pin) • Allocate registers across traces (can be inter‐procedural) PLDI’05 10 5 1/24/2017 Valgrind’s Register Re‐allocation Original Code Trace 1 mov 1, %eax mov 1, %eax mov 2, %ebx mov 2, %esi Virtual Physical cmp %ecx, %edx re‐allocate cmp %ecx, %edx %eax %eax jz t mov %eax, SPILLeax %ebx %esi mov %esi, SPILLebx %ecx %ecx jz t’ t: add 1, %eax %edx %edx sub 2, %ebx Trace 2 t’: mov SPILLeax, %eax Virtual Physical mov SPILLebx ,%edi %eax %eax add 1, %eax %ebx %edi Simple but inefficient sub 2, %edi %ecx %ecx %edx %edx • All modified registers are spilled at a trace’s end • Refill registers at a trace’s beginning PLDI’05 11 Pin’s Register Re‐allocation Scenario (1): Compiling a new trace at a trace exit Original Code Trace 1 mov 1, %eax mov 1, %eax mov 2, %ebx mov 2, %esi Compile Trace 2 using the cmp %ecx, %edx re‐allocate cmp %ecx, %edx jz t jz t’ binding at Trace 1’s exit: Virtual Physical t: add 1, %eax %eax %eax %ebx %esi sub 2, %ebx Trace 2 %ecx %ecx %edx %edx t’: add 1, %eax sub 2, %esi PLDI’05 No spilling/filling needed across traces 12 6 1/24/2017 Pin’s Register Re‐allocation Scenario (2): Targeting an already generated trace at a trace exit Original Code Trace 1 (being compiled) mov 1, %eax mov 1, %eax mov 2, %ebx mov 2, %esi re‐allocate Virtual Physical cmp %ecx, %edx cmp %ecx, %edx %eax %eax mov %esi, SPILL jz t ebx %ebx %esi mov SPILLebx, %edi %ecx %ecx t: add 1, %eax jz t’ %edx %edx sub 2, %ebx Trace 2 (in code cache) t’: add 1, %eax Virtual Physical sub 2, %edi %eax %eax %ebx %edi %ecx %ecx Minimal spilling/filling code %edx %edx PLDI’05 13 Instrumentation Optimizations 1. Inline instrumentation code into the application 2. Avoid saving/restoring eflags with liveness analysis 3. Schedule inlined instrumentation code PLDI’05 14 7 1/24/2017 OriginalExample: Instruction Counting code cmov %esi, %edi cmp %edi, (%esp) BBL_InsertCall(bbl, IPOINT_BEFORE, docount(), jle <target1> IARG_UINT32, BBL_NumIns(bbl), add %ecx, %edx IARG_END) cmp %edx, 0 je <target2> 33 extra instructions executed altogether Instrument without applying any optimization Trace bridge() mov %esp,SPILLappsp pushf mov SPILL ,%esp pinsp push %edx call <bridge> push %ecx cmov %esi, %edi docount() mov SPILLappsp,%esp push %eax cmp %edi, (%esp) movl 0x3, %eax add %eax,icount jle <target1’> call docount ret pop %eax mov %esp,SPILLappsp pop %ecx mov SPILLpinsp,%esp call <bridge> pop %edx add %ecx, %edx popf cmp %edx, 0 PLDI’05 ret 15 je <target2’> Example: Instruction Counting Original code cmov %esi, %edi cmp %edi, (%esp) jle <target1> Inlining add %ecx, %edx cmp %edx, 0 Trace je <target2> mov %esp,SPILLappsp mov SPILLpinsp,%esp pushf add 0x3, icount popf cmov %esi, %edi 11 extra instructions executed mov SPILLappsp,%esp cmp %edi, (%esp) jle <target1’> mov %esp,SPILLappsp mov SPILLpinsp,%esp pushf add 0x3, icount popf add %ecx, %edx PLDI’05 cmp %edx, 0 16 je <target2’> 8 1/24/2017 Example: Instruction Counting Original code cmov %esi, %edi cmp %edi, (%esp) jle <target1> Inlining + eflags liveness analysis add %ecx, %edx cmp %edx, 0 Trace je <target2> mov %esp,SPILLappsp mov SPILLpinsp,%esp pushf add 0x3, icount popf cmov %esi, %edi 7 extra instructions executed mov SPILLappsp,%esp cmp %edi, (%esp) jle <target1’> add 0x3, icount add %ecx, %edx cmp %edx, 0 je <target2’> PLDI’05 17 Example: Instruction Counting Original code cmov %esi, %edi cmp %edi, (%esp) jle <target1> Inlining + eflags liveness analysis + scheduling add %ecx, %edx cmp %edx, 0 Trace je <target2> cmov %esi, %edi add 0x3, icount cmp %edi, (%esp) 2 extra instructions executed jle <target1’> add 0x3, icount add %ecx, %edx cmp %edx, 0 je <target2’> PLDI’05 18 9 1/24/2017 Pin Instrumentation Performance Runtime overhead of basic‐block counting with Pin on IA32 Without optimization Inlining Inlining + eflags liveness analysis Inlining + eflags liveness analysis + scheduling 11 10.4 10 9 7.8 8 7 6 5 3.9 3.5 4 2.8 2.5 3 1.5 2 1.4 1 Average Average Slowdown 0 SPECINT SPECFP (SPEC2K using reference data sets) PLDI’05 19 Comparison among Dynamic Instrumentation Tools Runtime overhead of basic‐block counting with three different tools Valgrind DynamoRIO Pin 9 8.3 8 7 6 5.1 5 4 3 2.5 2 Average Slowdown Average 1 0 SPECINT • Valgrind is a popular instrumentation tool on Linux • Call‐based instrumentation, no inlining • DynamoRIO is the performance leader in binary dynamic optimization • Manually inline, no eflags liveness analysis and scheduling PLDI’05Pin automatically provides efficient instrumentation20 10 1/24/2017 Pin Applications • Sample tools in the Pin distribution: • Cache simulators, branch predictors, address tracer, syscall tracer, edge profiler, stride profiler • Some tools developed and used inside Intel: • Opcodemix (analyze code generated by compilers) • PinPoints (find representative regions in programs to simulate) • A tool for detecting memory bugs • Some companies are writing their own Pintools: • A major database vendor, a major search engine provider • Some universities using Pin in teaching and research: • U. of Colorado, MIT, Harvard, Princeton, U of Minnesota, Northeastern, Tufts, University of Rochester, … PLDI’05 21 Conclusions • Pin • A dynamic instrumentation system for building your own program analysis tools • Easy to use, robust, transparent, efficient • Tool source compatible on IA32, EM64T, Itanium, ARM • Works on large applications • database, search engine, web browsers, … • Available on Linux; Windows version coming soon • Downloadable from http://rogue.colorado.edu/Pin • User manual, many example tools, tutorials • 3300 downloads since 2004 July PLDI’05 22 11 1/24/2017 Valgrind A Framework for Heavyweight Dynamic Binary Instrumentation Nicholas Nethercote — National ICT Australia Julian Seward — OpenWorks LLP 23 FAQ #1 • How do you pronounce “Valgrind”? • “Val‐grinned”, not “Val‐grined” • Don’t feel bad: almost everyone gets it wrong at first 24 12 1/24/2017 DBA tools • Program analysis tools are useful • Bug detectors • Profilers • Visualizers • Dynamic binary analysis (DBA) tools • Analyse a program’s machine code at run‐time • Augment original code with analysis code 25 Building DBA tools • Dynamic binary instrumentation (DBI) • Add analysis code to the original machine code at run‐ time • No preparation, 100% coverage • DBI frameworks • Pin, DynamoRIO, Valgrind, etc. Tool Framework +=Tool plug‐in 26 13 1/24/2017 Prior work Well-studied Not well-studied Framework Instrumentation performance capabilities Simple tools Complex tools • Potential of DBI has not been fully exploited • Tools get less attention than frameworks • Complex tools are more interesting than simple tools 27 Shadow value tools 28 14 1/24/2017 Shadow value tools (I) • Shadow every value with another value that describes it • Tool stores and propagates shadow values in parallel Tool(s) Shadow values help find..