Advanced C Programming Profiling
Sebastian Hack, Christoph Weidenbach
25.11.2008
Saarland University, Computer Science

Today
- Profiling
  - Invasive Profiling
  - Non-Invasive Profiling
- Tools
  - gprof
  - gcov
  - valgrind
  - oprofile
- Conclusion

What is a Profiler?
A profiler analyses the runtime behavior of the program:
- Which parts (functions, statements, ...) of a program take how long?
- How often are functions called?
- Which functions call which?
  - Construct the dynamic call graph
- Memory consumption
- Memory accesses
- Memory leaks
- Cache performance

Invasive Profiling
- Modify the program (code instrumentation)
- Insert calls to functions that record data
- Advantages:
  - Very precise
  - Theoretically at the instruction level
  - Precise call graph
- Disadvantages:
  - Potentially very high overhead
  - Depends on the instrumentation code that is inserted
  - Cannot profile already running systems (long-running servers)
  - Can only profile the application (not the complete system)

Non-Invasive Profiling
- Statistical sampling of the program
- Use a fixed time interval or hardware performance counters (a CPU feature) to trigger sampling events
- Record the instruction pointer at each sampling event
- Advantages:
  - Small overhead
  - Hardware assisted
  - Can profile the whole system (even the kernel!)
- Disadvantages:
  - Not precise => "only" statistical data
  - Call graph possibly not complete => some functions are never sampled

Profiles
- Flat Profile: how much time does the program spend in which function?
- Call Graph: which function calls which function, and how often?
- Annotated Sources: each source line is annotated with its number of executions

gprof
- Mixture of invasive and statistical profiling
Invasive Part:
- gcc inserts calls to a function mcount into the prologue of each function
- Compile with -g and -pg
- mcount can figure out its caller => we can construct the call graph
- mcount counts the number of invocations for each function
- The call to mcount is the only instrumentation => almost as efficient as a normal build
- After the program has run, there is a file called gmon.out containing the profiling data
- Evaluate the contents of gmon.out with gprof name-of-program

gprof
Statistical Part:
- The kernel samples the instruction pointer (IP) on each timer interrupt (100/s)
- Increments a counter in a histogram of address ranges => cannot track the exact location where the timer interrupt happened
- Provides a frequency distribution over code locations
- Beware of the low sample rate
- Short-running programs will mostly not provide meaningful data
- Accumulation of several profile runs is possible:

    $ ./test_program
    $ mv gmon.out gmon.sum
    $ ./test_program
    $ gprof -s ./test_program gmon.out gmon.sum

gcov
- Analyses coverage of program code
- Which line was executed how often
- Helps to find code that
  - can profit from optimizations
  - is not covered by test cases
- Use GCC flags
  - -fprofile-arcs: collect info about jumps
  - -ftest-coverage: collect info about code coverage
- Attention: multiple code lines might be merged into one instruction

    100:   12:if (a != b)
    100:   13:  c= 1;
    100:   14:else
    100:   15:  c= 0;

valgrind
- JIT-compiler / translator:
  - Construct intermediate representation from x86 assembly code
  - Add instrumentation code
  - Compile back to x86
  - Done while the program is loaded
- Is not only a profiler!
- No compiler flags / recompilation needed (though -g -fno-inline is advisable to analyse the output)
- Program runtime can degrade drastically due to instrumentation code and recompilation
- Can escape to a debugger on certain events => very handy when debugging memory leaks
- Disadvantages:
  - The program might run an order of magnitude slower
  - The program might consume an order of magnitude more memory

valgrind Tools
memcheck:
- Redirects calls to malloc and the like
- Keeps track of all allocated memory
- Instruments references to warn about "bad" memory accesses
  - uninitialized
  - already freed
- Detects memory leaks
- Warns about jumps taken upon uninitialized values
cachegrind:
- Instruments memory accesses
- Simulates (!) an L1 and L2 cache in software
- Gives precise data about cache misses
callgrind:
- Records the call graph
Hint: use kcachegrind for visualization

oprofile
- Non-invasive
- Kernel module and user-space daemon
- Does not modify the program at all
- -g for debug symbols is recommended
- Sampling uses performance counters
  - ... or the timer interrupt if performance counters are not available
- Profiles the whole system (also the kernel!)
- Can distill data for each binary separately
- For Windows, use Intel VTune ($$$)

oprofile Performance Counter
- Set of hardware registers for a plethora of events
- Differ from one processor model to another
- Very detailed events are trackable. Examples:
  - L2 cache misses
  - Retired instructions
  - Outstanding bus requests
  - ... and many more
- Basic modus operandi:
  - The kernel module tells the CPU to fire an exception after a certain number of events of a certain type have occurred
  - The CPU traps into the kernel
  - The instruction pointer is recorded in a buffer (no histograms)

oprofile Howto
- Use opcontrol to control the daemon/module
- opcontrol --init loads the module and the daemon
- opcontrol -s starts sampling
- opcontrol -t stops sampling
- opcontrol --dump flushes the event log
- opcontrol --list-events shows the available performance counters
- opreport -l prog-name gives a breakdown of samples per function in prog-name

Conclusion
- Many different profiling methods exist
- gprof
  - is obsolete
  - use it only to get a quick impression and for the call graph
  - sampling might be too imprecise
- valgrind
  - easy to use
  - no recompile
  - precise
  - good visualization (kcachegrind)
  - but large increase in runtime
- oprofile
  - much more precise than gprof
  - can profile exotic machine events if you are going for the last cycles
  - not as precise as valgrind
  - needs root rights on the machine
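
Example: invasive profiling by hand
The idea from the Invasive Profiling slide (insert calls to functions that record data) can be sketched in a few lines of C. This is only an illustration under stated assumptions: the helpers prof_enter, prof_exit and prof_report, the table prof_table and the sample function sum_to are hypothetical names made up for this sketch and do not belong to gprof or any other tool. Recursive calls are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical hand-written instrumentation; not part of any real profiler. */
enum { PROF_MAX_FUNCS = 16 };

struct prof_entry {
    const char   *name;     /* function name, recorded on entry */
    unsigned long calls;    /* number of invocations */
    clock_t       total;    /* accumulated CPU time over all invocations */
    clock_t       started;  /* time stamp of the currently active invocation */
};

struct prof_entry prof_table[PROF_MAX_FUNCS];

/* Inserted at the start of an instrumented function. */
void prof_enter(int id, const char *name)
{
    assert(id >= 0 && id < PROF_MAX_FUNCS);
    prof_table[id].name = name;
    prof_table[id].calls++;
    prof_table[id].started = clock();
}

/* Inserted before every return of an instrumented function.
   Assumption: no recursion, otherwise 'started' would be overwritten. */
void prof_exit(int id)
{
    assert(id >= 0 && id < PROF_MAX_FUNCS);
    prof_table[id].total += clock() - prof_table[id].started;
}

/* Example function, instrumented manually with id 0. */
unsigned long sum_to(unsigned long n)
{
    prof_enter(0, "sum_to");
    unsigned long s = 0;
    for (unsigned long i = 1; i <= n; i++)
        s += i;
    prof_exit(0);
    return s;
}

/* Prints a small flat profile: one line per function that was entered. */
void prof_report(FILE *out)
{
    for (int i = 0; i < PROF_MAX_FUNCS; i++) {
        if (prof_table[i].name != NULL) {
            fprintf(out, "%-12s calls=%lu cpu=%.6fs\n",
                    prof_table[i].name, prof_table[i].calls,
                    (double)prof_table[i].total / CLOCKS_PER_SEC);
        }
    }
}
```

Calling sum_to a few times from main and then prof_report(stdout) prints one line per instrumented function. The two extra calls per invocation are exactly the overhead the slide warns about: for very short functions they can dominate the measured time.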
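
Example: a histogram of address ranges
The statistical part of gprof is described above as incrementing a counter in a histogram of address ranges. The following is a simplified sketch of that bookkeeping, not gprof's actual implementation: the type ip_histogram, the functions hist_init and hist_sample, and the fixed bucket count are assumptions chosen for illustration. It shows why the exact sampled location is lost: all addresses that fall into the same range share one counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample histogram; names and sizes are illustrative only. */
enum { HIST_BUCKETS = 8 };

struct ip_histogram {
    uintptr_t     low;                  /* first code address covered */
    uintptr_t     bucket;               /* bytes of code per bucket */
    unsigned long count[HIST_BUCKETS];  /* samples per address range */
    unsigned long dropped;              /* samples outside the covered range */
};

void hist_init(struct ip_histogram *h, uintptr_t low, uintptr_t bucket)
{
    assert(bucket > 0);
    h->low = low;
    h->bucket = bucket;
    h->dropped = 0;
    for (size_t i = 0; i < HIST_BUCKETS; i++)
        h->count[i] = 0;
}

/* Called once per sampling event with the sampled instruction pointer.
   Only the bucket index survives; the exact address is discarded. */
void hist_sample(struct ip_histogram *h, uintptr_t ip)
{
    if (ip < h->low || (ip - h->low) / h->bucket >= HIST_BUCKETS) {
        h->dropped++;
        return;
    }
    h->count[(ip - h->low) / h->bucket]++;
}
```

With a bucket size of 16 bytes, samples at 0x1000 and 0x100f end up in the same counter, so the resulting profile is a frequency distribution over code ranges rather than over individual instructions.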