Advanced C Programming Profiling
Sebastian Hack [email protected] Christoph Weidenbach [email protected]
25.11.2008
saarland university
computer science
1 Today
Profiling Invasive Profiling Non-Invasive Profiling
Tools gprof gcov valgrind oprofile
Conclusion
2 What is a Profiler?
Analyse the runtime behavior of the program
I Which parts (functions, statements, . . . ) of a program take how long?
I How often are functions called?
I Which functions call which
I Construct the dynamic call graph
I Memory consumption
I Memory accesses
I memory leaks
I Cache performance
3 Invasive Profiling
I Modify the program (code instrumentation)
I Insert calls to functions that record data
I Advantages:
I Very precise
I Theoretically at the instruction level
I Precise call graph
I Disadvantages:
I Potentially very high overhead
I Depends on the instrumentation code that is inserted
I Cannot profile already running systems (long running servers)
I Can only profile application (not complete system)
4 Non-Invasive Profiling
I Statistic sampling of the program
I Use a fixed time interval or Hardware performance counters (CPU feature) to trigger sampling events
I Record instruction pointer at each sampling event
I Advantages:
I Small overhead
I Hardware assisted
I Can profile the whole system (even the kernel!)
I Disadvantages:
I not precise “only” statistical data
I Call Graph possibly not complete some functions are never sampled
5 Profiles
I Flat Profile How much time does the program spend in which function?
I Call Graph Which function calls which function how often?
I Annotated Sources Annotate each source line with number of executions
6 gprof
I Mixture of invasive and statistical profiling
Invasive Part
I gcc inserts calls to a function mcount into prologue of each function
I Compile with -g and -pg
I mcount can figure out its caller we can construct the call graph
I mcount counts the number of invocations for each function
I Call to mcount is the only instrumentation almost as efficient as normal build
I After program is run, there is a file called gmon.out containing profiling data
I Evaluate contents of gmon.out with gprof name-of-program
7 gprof
Statistical Part
I Kernel samples instruction pointer (IP) on each timer interrupt (100/s)
I Increments a counter in a histogram of address ranges cannot track the exact location where timer interrupt happened
I Provides a frequency distribution over code locations
I Beware of low samplerate
I Short running programs will mostly not provide meaningful data
I Accumulation of several profile runs is possible: $ ./test_program $ mv gmon.out gmon.sum $ ./test_program $ gprof -s ./test_program gmon.out gmon.sum
8 gcov
I Analyses coverage of program code
I Which line was executed how often
I Helps for finding code that
I can profit from optimizations
I that is not covered by test cases
I Use GCC flags
I -fprofile-arcs: collect info about jumps
I -ftest-coverage: collect info about code coverage
I Attention: Multiple code lines might be merged to one instruction 100: 12:if (a != b) 100: 13: c= 1; 100: 14:else 100: 15: c= 0;
9 valgrind
I JIT-compiler / translator:
I Construct intermediate representation from x86 assembly code
I Add instrumentation code
I Compile back to x86
I Done while program is loaded
I Is not only a profiler!
I No compiler flags / recompilation needed (though -g -fno-inline advisable to analyse output)
I Program runtime can degrade drastically due to instrumentation code and recompilation
I can escape to debugger on certain events very handy when debugging memory leaks
I Disadvantage:
I program might run an order of magnitude slower
I program might consume an order of magnitude more memory
10 valgrind Tools memcheck
I Redirects calls to malloc and the like I Keeps track of all allocated memory I Instruments references to warn about “bad” memory accesses
I uninitialized
I already freed
I Detects memory leaks I Warns about jumps taken upon uninitialized values cachegrind
I Instruments memory accesses I Simulates (!) a L1 and L2 cache in software I Gives precise data about cache misses callgrind
I Records the call graph Hint Use kcachegrind for visualization
11 oprofile
I Non-invasive
I Kernel module and user-space daemon
I Does not modify the program at all
I -g for debug symbols recommendable
I Sampling uses performance counters
I . . . or timer interrupt of perf. counters not available
I Profiles the whole system (also the kernel!)
I Can distill data for each binary separately
I For Windows, use Intel vTune ($$$)
12 oprofile Performance Counter
I Set of hardware registers for a plethora of events
I Differ from processor model to another
I Very detailed events trackable. Examples:
I L2 cache misses
I Retired instructions
I Outstanding bus requests
I . . . and many more
I Basic modus operandi:
I Kernel module tells the CPU to fire an exception after a certain number of events of a certain type have occurred
I CPU traps into kernel
I instruction pointer is recorded in a buffer (no histograms)
13 oprofile Howto
I Use opcontrol to control the daemon/module
I opcontrol --init to load module and daemon
I opcontrol -s to start sampling
I opcontrol -t to stop sampling
I opcontrol --dump flushes the event log
I opcontrol --list-events shows available performance counters
I opreport -l prog-name gives breakdown of samples per function in prog-name
14 Conclusion
I Many different profiling methods exist
I gprof
I is obsolete
I use only to get a quick impression
I and for the call graph
I sampling might be too imprecise
I valgrind
I easy to use
I no recompile
I precise
I good visualization (kcachegrind)
I but large increase in runtime
I oprofile
I much more precise than gprof
I can profile exotic machine events if you are going for the last cycles
I not as precise as valgrind
I need root rights on the machine
15