Systems Programming 7. Profiling

vrije Universiteit

Guillaume Pierre

Fall 2006

http://www.cs.vu.nl/~gpierre/courses/sysprog/


Table of contents

• 1. Introduction
• 2. Profiling with gprof
• 3. Profiling with oprofile

1. Introduction


Introduction

◮ It is often hard to find out why a program is slow
  ⊲ A very small piece of code can ruin the performance of the whole program (if it is suboptimal and called often enough)

◮ There are tools to help you identify performance bottlenecks
  ⊲ gprof
  ⊲ oprofile
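
As an illustration of how a tiny piece of code can dominate the running time, here is a minimal sketch in C. The function names count_spaces_slow and count_spaces_fast are made up for this example and do not come from the course material. The slow version calls strlen() in the loop condition, so unless the compiler manages to hoist the call, the whole string is rescanned on every iteration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: strlen() is evaluated on every iteration,
 * which can make the loop quadratic in the length of s. */
size_t count_spaces_slow(const char *s) {
    size_t count = 0;
    for (size_t i = 0; i < strlen(s); i++) {
        if (s[i] == ' ')
            count++;
    }
    return count;
}

/* Same result, but the length is computed only once. */
size_t count_spaces_fast(const char *s) {
    size_t count = 0;
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        if (s[i] == ' ')
            count++;
    }
    return count;
}
```

Both functions return the same value. On long strings a profiler would be expected to attribute most of the time of the slow variant to strlen, though the exact figures depend on the compiler and the optimization level.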

2. Profiling with gprof


Profiling with gprof

◮ Even if your program does not contain any bugs, it may not be perfect
  ⊲ You want to write efficient code

◮ If your program is too slow, how do you know where the slowdown comes from?

Profiling with gprof [2/2]

◮ gprof is a standard tool to help you identify performance bottlenecks
  ⊲ gprof can tell you (roughly) how much time is spent in each function of your program
  ⊲ You should pay attention to the functions where most time is spent...
  ⊲ ...and ask yourself whether this is normal
  ⊲ Check the documentation at: http://www.gnu.org/software/binutils/manual/gprof-2.9.1/


Using gprof

◮ To use gprof, compile your programs with the flag -pg
  ⊲ This compiles your program together with the profiling support
  ⊲ Profiled programs can run several times slower than normal

    $ gcc -pg -g -Wall -o main main.c
    $

◮ Then simply execute your program normally

    $ ./main
    $

◮ Once your program has returned, you will see a new file called gmon.out
  ⊲ It contains the profiling information

◮ Call gprof main to get a profile report
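
According to the gprof manual, gmon.out is only written when the program exits normally, that is, by returning from main or by calling exit(); a program killed by an unhandled signal leaves no profile behind. For a long-running program that is usually stopped with Ctrl-C, one possible approach is to catch SIGINT and let the main loop finish cleanly. The sketch below works under that assumption; the names install_stop_handler and run_until_stopped are hypothetical, and the loop body is only a placeholder for real work.

```c
#include <signal.h>

/* Set by the signal handler, polled by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

static void request_stop(int signo) {
    (void)signo;
    stop_requested = 1;
}

/* Installs the SIGINT handler; returns 0 on success, -1 on failure. */
int install_stop_handler(void) {
    return (signal(SIGINT, request_stop) == SIG_ERR) ? -1 : 0;
}

/* Hypothetical main loop: runs until max_rounds is reached or Ctrl-C
 * is pressed, then returns so that main() can return normally. */
long run_until_stopped(long max_rounds) {
    long rounds = 0;
    while (!stop_requested && rounds < max_rounds)
        rounds++; /* placeholder for the real work of one round */
    return rounds;
}
```

Call install_stop_handler() at the start of main, run the loop, and return from main as usual so that the profiling data can be written.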

Example [1/3]

    #include <stdio.h>

    long add(long x, long y) {
        /* Very inefficient implementation */
        int foo = (x > y) ? x : y;  /* max(x, y); never 0 for the calls made by fib() */
        long x1 = x / foo;
        long x2 = x % foo;
        long y1 = y / foo;
        long y2 = y % foo;
        return x1 * foo + x2 + y1 * foo + y2;
    }

    long fib(int i) {
        if (i < 2)
            return i;
        return add(fib(i - 1), fib(i - 2));
    }

    int main() {
        long l = fib(36);
        printf("fib(36)=%ld\n", l);
        return 0;
    }


Example [2/3]

    $ gcc -g -pg -o fib fib.c
    $ ./fib
    $ gprof fib
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self               self     total
     time   seconds   seconds     calls   s/call   s/call  name
     83.96     40.81    40.81  24157816     0.00     0.00  add
     16.16     48.67     7.86         1     7.86    48.67  fib
      0.85     49.08     0.41                              frame_dummy
    (...)
    $

◮ 84% of the execution time is spent in function add!!??
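
The s/call columns show 0.00 for add only because a single call is far shorter than the two decimal places displayed. Dividing the self seconds by the number of calls recovers the per-call cost. The helper below is a hypothetical illustration of that arithmetic and is not part of gprof.

```c
/* Average self time of one call, in seconds.
 * Illustrative helper, not a gprof API. Returns 0 if calls is not positive. */
double seconds_per_call(double self_seconds, long calls) {
    if (calls <= 0)
        return 0.0;
    return self_seconds / (double)calls;
}
```

With the figures from the report above, seconds_per_call(40.81, 24157816) comes to roughly 1.7 microseconds per call of add.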

Example [3/3]

◮ Change function add to simply return x+y;
◮ Compile, run fib again, then gprof

    $ gprof fib
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self               self     total
     time   seconds   seconds     calls   s/call   s/call  name
     78.85      8.07     8.07         1     8.07     9.90  fib
     17.78      9.90     1.82  24157816     0.00     0.00  add
      4.55     10.36     0.47                              frame_dummy
    (...)
    $

◮ That looks better...
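
The change is safe because the original add is just a roundabout way of computing x+y: for any non-zero foo, (x/foo)*foo + x%foo equals x, and likewise for y. A sketch of both variants side by side follows; the names add_slow and add_simple are chosen for this illustration only.

```c
/* Original version from Example [1/3]; assumes max(x, y) is not 0. */
long add_slow(long x, long y) {
    int foo = (x > y) ? x : y;
    long x1 = x / foo;
    long x2 = x % foo;
    long y1 = y / foo;
    long y2 = y % foo;
    return x1 * foo + x2 + y1 * foo + y2;
}

/* Simplified version: same result without divisions. */
long add_simple(long x, long y) {
    return x + y;
}

/* fib() using the simplified addition. */
long fib(int i) {
    if (i < 2)
        return i;
    return add_simple(fib(i - 1), fib(i - 2));
}
```

Comparing the two reports, the cumulative time drops from about 49 seconds to about 10 seconds, while the number of calls to add stays the same.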


Important Remark

◮ No matter what you do, the execution-time percentages will always add up to 100%!
  ⊲ And one of your functions has to be the top time consumer...

◮ This does not mean that you must optimize the first function of the list no matter what!

◮ Use your brains:
  ⊲ Is it normal that function X takes this amount of time?
  ⊲ Optimize it only if the answer to this question is no!
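
One way to put numbers on this question is to estimate how much you could gain. If a function accounts for a fraction p of the run time, then even making it infinitely fast cannot speed up the program by more than 1/(1-p). This relation is commonly known as Amdahl's law; it is not part of the original slides, and the helper name below is hypothetical.

```c
/* Overall speedup predicted by Amdahl's law when a function that takes
 * `fraction` (0..1) of the run time becomes `factor` times faster.
 * `factor` must be greater than 0. Illustrative helper only. */
double overall_speedup(double fraction, double factor) {
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

For instance, a function at 5% of the run time yields at most about a 1.05x speedup however much it is optimized, whereas for add at 83.96% in the first report a 10x faster add would give roughly 4.1x overall.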

3. Profiling with oprofile


Profiling with oprofile

◮ gprof is not perfect:
  ⊲ If you profile a multithreaded/multiprocess program, gprof will profile the parent thread/process only
  ⊲ It will have difficulty profiling shared libraries
◮ Alternative: oprofile
  ⊲ http://oprofile.sourceforge.net/
  ⊲ It is included in most modern distributions
  ⊲ It works only on Linux
  ⊲ You must have root access
◮ oprofile will insert an extra module in your Linux kernel
◮ Then it profiles every piece of code which executes on your computer
  ⊲ (multithreaded) programs, libraries, kernel, etc.

Using oprofile

◮ To use oprofile:
  1. Log in as root
  2. Tell oprofile where your kernel is
  3. Start oprofile
  4. Run your program(s)
  5. Stop oprofile
  6. Generate reports
  7. Tell oprofile to clean the profile information
     ⊲ Otherwise it will be kept and merged with future runs


Using oprofile

    $ gcc -g -o fib fib.c
    $ opcontrol --reset
    $ opcontrol --no-vmlinux
    $ opcontrol --start
    Using default event: CPU_CLK_UNHALTED:100000:0:1:1
    Using 2.6+ OProfile kernel interface.
    Using log file /var/lib/oprofile/oprofiled.log
    Daemon started.
    Profiler running.
    $ ./fib
    fib(36)=14930352
    $ opcontrol --shutdown
    Stopping profiling.
    Killing daemon.
    $

Using oprofile

    $ opreport
    CPU: AMD64 processors, speed 2002.58 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
    CPU_CLK_UNHALT...|
      samples|      %|
    ------------------
        84067 91.7211 fib
         6052  6.6030 no-vmlinux
          330  0.3600 xemacs-21.4.15
          324  0.3535 Xorg
          323  0.3524 libc-2.3.3.so
          140  0.1527 oprofiled
          115  0.1255 bash
           71  0.0775 libX11.so.6.2
    (...)
    $

◮ During profiling, 92% of time was spent in program fib


Using oprofile

◮ 92.6% of the execution time of fib was spent in function add:

    $ opreport -l /home/gpierre/work/courses/sysprog/5.debug/fib
    CPU: AMD64 processors, speed 2002.58 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
    samples  %        symbol name
    77855    92.6107  add
    6211      7.3882  fib
    1         0.0012  _init
    $

  ⊲ Note: you must specify the absolute path of the program
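
The percentages in this report are simply each symbol's share of the samples collected for the binary, and each sample stands for `count` events (here 100000 unhalted CPU cycles). The following sketch redoes that arithmetic. The helper names are hypothetical, and the conversion to seconds assumes that the estimated clock speed printed by opreport is accurate.

```c
/* Share of the samples attributed to one symbol, in percent.
 * Illustrative helper, not an oprofile API. */
double sample_percent(long samples, long total_samples) {
    if (total_samples <= 0)
        return 0.0;
    return 100.0 * (double)samples / (double)total_samples;
}

/* Approximate CPU time represented by `samples` samples of a
 * cycle-counting event, taken once every `count` cycles on a CPU
 * running at `mhz` MHz (assumed constant). */
double samples_to_seconds(long samples, long count, double mhz) {
    return (double)samples * (double)count / (mhz * 1e6);
}
```

With the numbers above, sample_percent(77855, 77855 + 6211 + 1) gives about 92.61, matching the report, and samples_to_seconds(77855, 100000, 2002.58) is roughly 3.9 seconds.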

Using oprofile

◮ It can also profile dynamic libraries:

    $ opreport -l /lib64/tls/libc.so.6
    CPU: AMD64 processors, speed 2002.58 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
    samples  %        symbol name
    49       15.1703  __gconv_transform_utf8_internal
    39       12.0743  mbrtowc
    26        8.0495  _int_malloc
    17        5.2632  strlen
    14        4.3344  _dl_addr
    13        4.0248  memcpy
    11        3.4056  free
    11        3.4056  malloc
    (...)
    $

◮ oprofile can do many more things for you!
  ⊲ Check the documentation


One final word...

Bottlenecks occur in surprising places, so don’t try to second guess and put in a speed hack until you’ve proven that’s where the bottleneck is. - Rob Pike

More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason – including blind stupidity. - W. A. Wulf