Advanced C Programming Profiling

Sebastian Hack [email protected] Christoph Weidenbach [email protected]

25.11.2008

saarland university

computer science

1 Today

Profiling Invasive Profiling Non-Invasive Profiling

Tools gcov valgrind oprofile

Conclusion

2 What is a Profiler?

Analyse the runtime behavior of the program

I Which parts (functions, statements, . . . ) of a program take how long?

I How often are functions called?

I Which functions call which

I Construct the dynamic call graph

I Memory consumption

I Memory accesses

I memory leaks

I Cache performance

3 Invasive Profiling

I Modify the program (code instrumentation)

I Insert calls to functions that record data

I Advantages:

I Very precise

I Theoretically at the instruction level

I Precise call graph

I Disadvantages:

I Potentially very high overhead

I Depends on the instrumentation code that is inserted

I Cannot profile already running systems (long running servers)

I Can only profile application (not complete system)

4 Non-Invasive Profiling

I Statistic sampling of the program

I Use a fixed time interval or Hardware performance counters (CPU feature) to trigger sampling events

I Record instruction pointer at each sampling event

I Advantages:

I Small overhead

I Hardware assisted

I Can profile the whole system (even the kernel!)

I Disadvantages:

I not precise “only” statistical data

I Call Graph possibly not complete some functions are never sampled

5 Profiles

I Flat Profile How much time does the program spend in which function?

I Call Graph Which function calls which function how often?

I Annotated Sources Annotate each source line with number of executions

6 gprof

I Mixture of invasive and statistical profiling

Invasive Part

I gcc inserts calls to a function mcount into prologue of each function

I Compile with -g and -pg

I mcount can figure out its caller we can construct the call graph

I mcount counts the number of invocations for each function

I Call to mcount is the only instrumentation almost as efficient as normal build

I After program is run, there is a file called gmon.out containing profiling data

I Evaluate contents of gmon.out with gprof name-of-program

7 gprof

Statistical Part

I Kernel samples instruction pointer (IP) on each timer (100/s)

I Increments a counter in a histogram of address ranges cannot track the exact location where timer interrupt happened

I Provides a frequency distribution over code locations

I Beware of low samplerate

I Short running programs will mostly not provide meaningful data

I Accumulation of several profile runs is possible: $ ./test_program $ mv gmon.out gmon.sum $ ./test_program $ gprof -s ./test_program gmon.out gmon.sum

8 gcov

I Analyses coverage of program code

I Which line was executed how often

I Helps for finding code that

I can profit from optimizations

I that is not covered by test cases

I Use GCC flags

I -fprofile-arcs: collect info about jumps

I -ftest-coverage: collect info about code coverage

I Attention: Multiple code lines might be merged to one instruction 100: 12:if (a != b) 100: 13: c= 1; 100: 14:else 100: 15: c= 0;

9 valgrind

I JIT-compiler / translator:

I Construct intermediate representation from assembly code

I Add instrumentation code

I Compile back to x86

I Done while program is loaded

I Is not only a profiler!

I No compiler flags / recompilation needed (though -g -fno-inline advisable to analyse output)

I Program runtime can degrade drastically due to instrumentation code and recompilation

I can escape to debugger on certain events very handy when debugging memory leaks

I Disadvantage:

I program might run an order of magnitude slower

I program might consume an order of magnitude more memory

10 valgrind Tools memcheck

I Redirects calls to malloc and the like I Keeps track of all allocated memory I Instruments references to warn about “bad” memory accesses

I uninitialized

I already freed

I Detects memory leaks I Warns about jumps taken upon uninitialized values cachegrind

I Instruments memory accesses I Simulates (!) a L1 and L2 cache in software I Gives precise data about cache misses callgrind

I Records the call graph Hint Use kcachegrind for visualization

11 oprofile

I Non-invasive

I Kernel module and user-space daemon

I Does not modify the program at all

I -g for debug symbols recommendable

I Sampling uses performance counters

I . . . or timer interrupt of . counters not available

I Profiles the whole system (also the kernel!)

I Can distill data for each binary separately

I For Windows, use Intel vTune ($$$)

12 oprofile Performance Counter

I Set of hardware registers for a plethora of events

I Differ from processor model to another

I Very detailed events trackable. Examples:

I L2 cache misses

I Retired instructions

I Outstanding bus requests

I . . . and many more

I Basic modus operandi:

I Kernel module tells the CPU to fire an exception after a certain number of events of a certain type have occurred

I CPU traps into kernel

I instruction pointer is recorded in a buffer (no histograms)

13 oprofile Howto

I Use opcontrol to control the daemon/module

I opcontrol --init to load module and daemon

I opcontrol -s to start sampling

I opcontrol -t to stop sampling

I opcontrol --dump flushes the event log

I opcontrol --list-events shows available performance counters

I opreport -l prog-name gives breakdown of samples per function in prog-name

14 Conclusion

I Many different profiling methods exist

I gprof

I is obsolete

I use only to get a quick impression

I and for the call graph

I sampling might be too imprecise

I valgrind

I easy to use

I no recompile

I precise

I good visualization (kcachegrind)

I but large increase in runtime

I oprofile

I much more precise than gprof

I can profile exotic machine events if you are going for the last cycles

I not as precise as valgrind

I need root rights on the machine

15