Ftrace Profiling

Presenter: Steven Rostedt

What do you want to profile?

● Application
  – Cache misses
  – Memory locality
  – Page faults
  – Finding bad algorithms
    ● O(n^2)
  – CPU cycles
  – I/O usage

What do you want to profile?

● Kernel
  – Cache misses
  – Memory locality
  – Page faults
  – Finding bad algorithms
    ● O(n^2)
  – CPU cycles
  – I/O usage
  – Different than application
  – Locking (disabling, etc.)

● Latency, fairness, RT, deadline, etc.

Profiling tools

● oprofile

● gdb

● trace-cmd

Perf

● stat

● top

● record

● report

● trace

Perf Stat

● Great for comparing versions of tools

● Perhaps not the best for kernel analysis

● perf stat -e cycles --repeat 100 --

● Gives average cycles with standard deviation

Perf Record / Report

● Profile the system

● Works for both kernel and userspace

● Shows where in code time is spent

● Can break down to assembly

● Beware, can be off by one
  – Due to latching instructions
  – Irq enabling

Perf Record / Report

[root@bxtest ~]# perf record -g /work/c/hackbench 10
Time: 0.365
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.592 MB perf.data (14360 samples) ]

Perf Record / Report

● Very powerful

● Traces both user space and kernel

● I love the UI

● But!
  – Perf starts to show its overhead with heavy tracing
  – Function tracing is still weak
  – I program in ftrace ;-)

Ftrace / trace-cmd

● trace-cmd – A front end interface to ftrace.

● Requires root privilege to start tracing

● Traces the kernel (not user space)

● Kernel buffer is optimized for tracing

● Does not do periodic profiling

trace-cmd 2.5

● New profiling feature

● Allows you to connect events

● Timings: Total, average, max, min

● Shows where events occur and frequency

trace-cmd profile

● By default enables:
  – Function graph tracer
    ● Depth of 1
  – All irq events
  – All raw events
  – Schedule events:
    ● sched_wakeup
      – Sets stack trace
    ● sched_switch
      – Sets stack trace
    ● sched_process_exec
  – Page fault event

trace-cmd profile Event Hooks

● sched_wakeup → sched_switch

● sched_switch → sched_wakeup
  – Sleeping

● sched_switch → sched_switch
  – Preempted

● softirq_raise → softirq_entry

● softirq_entry → softirq_exit

● irq_handler_entry → irq_handler_exit

● sys_enter → sys_exit

trace-cmd profile

# trace-cmd profile --stderr hackbench 10 2> out
# cat out

trace-cmd profile

Kernel buffer statistics: Note: "entries" are the entries left in the kernel ring buffer and are not recorded in the trace data. They should all be zero.

CPU: 0
entries: 0
overrun: 1271072
commit overrun: 0
bytes: 644
oldest event ts: 13166.516390
now ts: 13166.927822
dropped events: 0
events: 41271

[...]
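The statistics above can be checked mechanically. The following is a minimal sketch, assuming the statistics were saved to a file as in the `2> out` example; `check_overruns` is a hypothetical helper name, not part of trace-cmd. It prints every CPU whose "overrun" counter is non-zero, which in the ftrace ring buffer means events were overwritten before they could be read.

```shell
#!/bin/sh
# check_overruns: hypothetical helper (not a trace-cmd command).
# Reads "Kernel buffer statistics" text on stdin and reports each CPU
# whose "overrun:" value is greater than zero. "commit overrun:" is a
# separate counter and is deliberately skipped.
check_overruns() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "CPU:") {
                    cpu = $(i + 1)
                } else if ($i == "overrun:" &&
                           (i == 1 || $(i - 1) != "commit") &&
                           $(i + 1) + 0 > 0) {
                    printf "CPU %s: overrun %s\n", cpu, $(i + 1)
                }
            }
        }
    '
}
```

With the file from the earlier command, `check_overruns < out` would list the affected CPUs; no output means no overruns were reported.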

FYI - coming in 2.6

● New output

CPU 0: 74560 bytes lost
CPU 1: 59253 bytes lost
CPU 2: 427979 bytes lost
CPU 3: 99514 bytes lost
CPU 4: 47925 bytes lost
CPU 5: 42843 bytes lost
CPU 6: 70676 bytes lost
CPU 7: 676606 bytes lost

trace-cmd profile

task: hackbench-2658
  Event: sched_switch:R (4) Total: 824611 Avg: 206152 Max: 637060(ts:13164.420597) Min:38201(ts:13166.515937)
   |
   + ftrace_raw_event_sched_switch (0xffffffff810ad610)
       100% (4) time:824611 max:637060(ts:13164.419988) min:38201(ts:13166.515926) avg:206152
        __schedule (0xffffffff81770450)
        preempt_schedule (0xffffffff81770d2e)
        ___preempt_schedule (0xffffffff813b80ce)
        |
        + cpu_stop_queue_work (0xffffffff81133e26)
        |   77% (1) time:637060 max:637060(ts:13164.419988) min:637060(ts:13164.419988) avg:637060
        |    stop_one_cpu (0xffffffff81134130)
        |    sched_exec (0xffffffff810b0adb)
        |    do_execveat_common.isra.32 (0xffffffff8121e436)
        |    do_execve (0xffffffff8121eb2c)
        |    SyS_execve (0xffffffff8121ee0e)
        |    return_to_handler (0xffffffff81779468)
        |    stub_execve (0xffffffff81777669)

trace-cmd profile

  Event: sched_switch:S (44) Total: 71376940 Avg: 1622203 Max: 21473090(ts:13166.461946) Min:25558(ts:13166.514547)
   |
   + ftrace_raw_event_sched_switch (0xffffffff810ad610)
       100% (44) time:71376940 max:21473090(ts:13166.440494) min:25558(ts:13166.514544) avg:1622203
        __schedule (0xffffffff81770450)
        schedule (0xffffffff81770d79)
        do_wait (0xffffffff810832ac)
        SyS_wait4 (0xffffffff81084723)
        return_to_handler (0xffffffff81779468)
        tracesys_phase2 (0xffffffff81777289)

trace-cmd profile

  Event: sched_switch:D (1) Total: 2421103 Avg: 2421103 Max: 2421103(ts:453.664398) Min:2421103(ts:453.664398)
   |
   + ftrace_raw_event_sched_switch (0xffffffff8109fbb0)
       100% (1) time:2421103 max:2421103(ts:453.662003) min:2421103(ts:453.662003) avg:2421103
        __schedule (0xffffffff816b7bb9)
        schedule (0xffffffff816b8139)
        schedule_timeout (0xffffffff816bab35)
        io_schedule_timeout (0xffffffff816b8551)
        wait_for_completion_io (0xffffffff816b8cf1)
        blk_execute_rq (0xffffffff81315d39)
        scsi_execute (0xffffffff81454b67)
        scsi_execute_req_flags (0xffffffff814564ec)
        sr_check_events (0xffffffff81466ee9)
        cdrom_check_events (0xffffffff81492f3c)
        sr_block_check_events (0xffffffff81467381)
        disk_check_events (0xffffffff8131f55b)
        disk_events_workfn (0xffffffff8131f666)
        process_one_work (0xffffffff8109149b)
        return_to_handler (0xffffffff816be158)
        worker_thread (0xffffffff81091c3b)
        kthread (0xffffffff810976b9)
        ret_from_fork (0xffffffff816bc02c)

trace-cmd profile

  Event: sched_wakeup:0xa62 (53) Total: 3752286 Avg: 70797 Max: 641910(ts:13166.503267) Min:17215(ts:13166.514565)
   |
   + ftrace_raw_event_sched_wakeup_template (0xffffffff810abbf0)
       100% (53) time:3752286 max:641910(ts:13166.502654) min:17215(ts:13166.514563) avg:70797
        ttwu_do_wakeup (0xffffffff810af0f2)
        ttwu_do_activate.constprop.120 (0xffffffff810af2e6)
        try_to_wake_up (0xffffffff810b4a9b)
        default_wake_function (0xffffffff810b4d72)
        |
        + child_wait_callback (0xffffffff81081822)
        |   95% (52) time:3582236 max:641910(ts:13166.502654) min:17215(ts:13166.514563) avg:68889
        |    __wake_up_common (0xffffffff810c8cb8)
        |    __wake_up_sync_key (0xffffffff810c8f74)
        |    __wake_up_parent (0xffffffff810844c6)
        |    do_notify_parent (0xffffffff81091552)
        |    do_exit (0xffffffff810841a5)
        |    do_group_exit (0xffffffff81084411)
        |    SyS_exit_group (0xffffffff81084497)
        |    return_to_handler (0xffffffff81779468)
        |    tracesys_phase2 (0xffffffff81777289)

trace-cmd profile

  Event: func: do_notify_resume() (1) Total: 3823 Avg: 3823 Max: 3823(ts:13164.442512) Min:3823(ts:13164.442512)
  Event: func: __do_page_fault() (83) Total: 877552 Avg: 10572 Max: 101563(ts:13164.442393) Min:1943(ts:13164.442602)
  Event: func: __fsnotify_parent() (1) Total: 655 Avg: 655 Max: 655(ts:13164.419758) Min:655(ts:13164.419758)
  Event: func: preempt_count_add() (166) Total: 77873 Avg: 469 Max: 12562(ts:13164.443360) Min:207(ts:13164.442078)
  Event: func: rcu_lockdep_current_cpu_online() (166) Total: 52312 Avg: 315 Max: 502(ts:13164.443588) Min:262(ts:13164.443169)
  Event: func: syscall_trace_enter_phase1() (159) Total: 44922 Avg: 282 Max: 558(ts:13166.440430) Min:149(ts:13166.506518)
  Event: func: SyS_close() (14) Total: 21622 Avg: 1544 Max: 2598(ts:13164.419772) Min:1286(ts:13164.444957)
  Event: func: mutex_unlock() (2) Total: 128686 Avg: 64343 Max: 127977(ts:13164.419757) Min:709(ts:13164.419763)
  Event: func: SyS_munmap() (1) Total: 37085 Avg: 37085 Max: 37085(ts:13164.442506) Min:37085(ts:13164.442506)
  Event: func: SyS_newfstat() (3) Total: 5724 Avg: 1908 Max: 2498(ts:13166.515650) Min:1511(ts:13164.441994)
  Event: func: SyS_socketpair() (12) Total: 360633 Avg: 30052 Max: 45273(ts:13164.444105) Min:22783(ts:13164.442701)
  Event: func: syscall_trace_leave() (159) Total: 229978 Avg: 1446 Max: 19880(ts:13166.504255) Min:909(ts:13166.506518)
  Event: func: SyS_wait4() (86) Total: 77870280 Avg: 905468 Max: 21639490(ts:13166.462072) Min:11999(ts:13166.503385)

trace-cmd profile

  Event: sys_enter:33 (1) Total: 6761 Avg: 6761 Max: 6761(ts:13164.419784) Min:6761(ts:13164.419784)
  Event: sys_enter:10 (3) Total: 23531 Avg: 7843 Max: 8817(ts:13164.442015) Min:6820(ts:13164.442466)
  Event: sys_enter:158 (1) Total: 2914 Avg: 2914 Max: 2914(ts:13164.442167) Min:2914(ts:13164.442167)
  Event: sys_enter:61 (86) Total: 78099197 Avg: 908130 Max: 21641964(ts:13166.462073) Min:14079(ts:13166.503419)
  Event: sys_enter:1 (1) Total: 280576 Avg: 280576 Max: 280576(ts:13166.515967) Min:280576(ts:13166.515967)
  Event: sys_enter:3 (14) Total: 72451 Avg: 5175 Max: 21820(ts:13164.444315) Min:3343(ts:13164.444958)
  Event: sys_enter:5 (3) Total: 12056 Avg: 4018 Max: 4700(ts:13166.515651) Min:3624(ts:13164.441995)
  Event: sys_enter:12 (1) Total: 2576 Avg: 2576 Max: 2576(ts:13164.441794) Min:2576(ts:13164.441794)
  Event: sys_enter:2 (2) Total: 37828 Avg: 18914 Max: 22945(ts:13164.441980) Min:14883(ts:13164.441881)
  Event: sys_enter:56 (10) Total: 1261733 Avg: 126173 Max: 137177(ts:13164.444484) Min:110758(ts:13164.443131)
  Event: sys_enter:53 (12) Total: 399096 Avg: 33258 Max: 59347(ts:13164.444106) Min:25055(ts:13164.442702)
  Event: sys_enter:11 (1) Total: 39130 Avg: 39130 Max: 39130(ts:13164.442507) Min:39130(ts:13164.442507)
  Event: sys_enter:0 (1) Total: 6293 Avg: 6293 Max: 6293(ts:13164.441988) Min:6293(ts:13164.441988)

trace-cmd profile

  Event: page_fault_user:0x398dbb4510 (1)
  Event: page_fault_user:0x398d816230 (1)
  Event: page_fault_user:0x398dbb0b40 (1)
  Event: page_fault_user:0x398dbb39b4 (83)
  Event: page_fault_user:0x7fffe30918d8 (83)
  Event: page_fault_user:0x6014e0 (1)
  Event: page_fault_user:0x398dbb39b0 (1)
  Event: page_fault_user:0x398dbb39b8 (1)
  Event: page_fault_user:0x7f2030945010 (1)
  Event: page_fault_user:0x398d001590 (1)
  Event: page_fault_user:0x398d222218 (1)
  Event: page_fault_user:0x398d220ce8 (1)
  Event: page_fault_user:0x7f2030927000 (1)
  Event: page_fault_user:0x398d889f10 (1)

trace-cmd profile

  Event: softirq_raise:TIMER (6) Total: 48317 Avg: 8052 Max: 9714(ts:104520.523158) Min:6733(ts:104520.526473)
  Event: softirq_raise:SCHED (1) Total: 4618 Avg: 4618 Max: 4618(ts:104520.519709) Min:4618(ts:104520.519709)
  Event: softirq_raise:RCU (2) Total: 16166 Avg: 8083 Max: 8698(ts:104520.516301) Min:7468(ts:104520.533259)
  Event: softirq_entry:RCU (2) Total: 8407 Avg: 4203 Max: 6083(ts:104520.533265) Min:2324(ts:104520.516303)
  Event: softirq_entry:TIMER (6) Total: 94501 Avg: 15750 Max: 89430(ts:104520.523247) Min:719(ts:104520.529866)
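With this many "Event:" summary lines, it can help to rank them by total time. The following is a minimal sketch, assuming the profile output has one "Event:" line per event as trace-cmd prints it; `rank_events` is a hypothetical helper name, not part of trace-cmd. It prints total, hit count, recomputed average (total divided by count, truncated) and event name, largest total first. Lines without a "Total:" field, such as the page_fault_user lines, are skipped.

```shell
#!/bin/sh
# rank_events: hypothetical helper (not a trace-cmd command).
# Reads profile output on stdin and prints, for every "Event:" line
# that carries timings:  <total> <count> <total/count> <event name>
# sorted by total, largest first.
rank_events() {
    awk '
        $1 == "Event:" {
            # Locate the "Total:" field; its position varies because
            # names such as "func: SyS_wait4()" span two fields.
            for (i = 2; i < NF; i++)
                if ($i == "Total:")
                    break
            if (i >= NF || i < 4)
                next                    # no timing on this line
            count = $(i - 1)
            gsub(/[()]/, "", count)     # "(86)" -> "86"
            if (count + 0 == 0)
                next
            name = $2
            for (j = 3; j <= i - 2; j++)
                name = name " " $j
            printf "%s %s %d %s\n", $(i + 1), count, int($(i + 1) / count), name
        }
    ' | sort -rn
}
```

Run against the saved output (for example `rank_events < out`), the third column should agree with the "Avg:" value on the same event line, which is a quick sanity check that a line was parsed correctly.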

trace-cmd record/report --profile

● Live processing can take time and add overhead to the system

● If the test is not too long, recording the data to file may be better

● Can view the raw format too

trace-cmd record/report --profile

# trace-cmd record --profile hackbench 10
# trace-cmd report --profile

Customization

● --profile (and trace-cmd profile) by default may add too much overhead

● You may want just function tracing

● You may want to add your own events

● You may want to trace the times between any two events

● Limiting events may make recording to disk more appropriate

No function graph tracing

● Adds a lot of overhead (even with depth set)

● -p function
  – use function tracing instead
  – Will not have function times though

● -l filter
  – Filters the functions

● -n not-filter
  – Do not trace specific functions

● -p nop
  – No function tracing at all

trace-cmd record/report --profile

# trace-cmd record --profile -l 'SyS*' hackbench 10
# trace-cmd report --profile

trace-cmd report --profile

  Event: func: sys_write() (1) Total: 70865 Avg: 70865 Max: 70865(ts:118168.165502) Min:70865(ts:118168.165502)
  Event: func: sys_access() (1) Total: 11636 Avg: 11636 Max: 11636(ts:118167.132885) Min:11636(ts:118167.132885)
  Event: func: SyS_clone() (400) Total: 31177395 Avg: 77943 Max: 240249(ts:118167.169010) Min:50093(ts:118167.170384)
  Event: func: SyS_read() (401) Total: 5238133 Avg: 13062 Max: 4392579(ts:118167.185445) Min:1079(ts:118167.185801)
  Event: func: SyS_brk() (1) Total: 1938 Avg: 1938 Max: 1938(ts:118167.132811) Min:1938(ts:118167.132811)
  Event: func: SyS_open() (2) Total: 27796 Avg: 13898 Max: 14450(ts:118167.132989) Min:13346(ts:118167.132916)
  Event: func: SyS_close() (402) Total: 557733 Avg: 1387 Max: 35680(ts:118167.148878) Min:447(ts:118167.179922)
  Event: func: sys_mprotect() (3) Total: 39966 Avg: 13322 Max: 17212(ts:118167.133045) Min:9180(ts:118167.133506)
  Event: func: sys_wait4() (396) Total: 157683754 Avg: 398191 Max: 40406227(ts:118168.095636) Min:4271(ts:118168.165072)
  Event: func: sys_socketpair() (202) Total: 2267612 Avg: 11225 Max: 42956(ts:118167.155665) Min:5833(ts:118167.173595)
  Event: func: SyS_munmap() (1) Total: 38888 Avg: 38888 Max: 38888(ts:118167.133549) Min:38888(ts:118167.133549)
  Event: func: SyS_newfstat() (3) Total: 10475 Avg: 3491 Max: 5265(ts:118168.165374) Min:1700(ts:118167.133008)
  Event: func: sys_execve() (1) Total: 635426 Avg: 635426 Max: 635426(ts:118167.132742) Min:635426(ts:118167.132742)

-G for global irqs (2.6+)

Global CPU[0] Events
  Event: softirq_raise:BLOCK (1) Total: 1508 Avg: 1508 Max: 1508(ts:29285.352048) Min:1508(ts:29285.352048)
  Event: softirq_raise:SCHED (78) Total: 562821 Avg: 7215 Max: 106988(ts:29285.423405) Min:297(ts:29285.493329)
  Event: softirq_raise:NET_RX (1) Total: 1233 Avg: 1233 Max: 1233(ts:29285.196645) Min:1233(ts:29285.196645)
  Event: softirq_raise:TIMER (500) Total: 4662144 Avg: 9324 Max: 69352(ts:29285.238341) Min:0(ts:29284.626903)
  Event: softirq_raise:RCU (98) Total: 1055197 Avg: 10767 Max: 147725(ts:29285.423424) Min:442(ts:29285.272222)
  Event: softirq_entry:RCU (98) Total: 247710 Avg: 2527 Max: 14940(ts:29285.327271) Min:487(ts:29285.414294)
  Event: softirq_entry:BLOCK (1) Total: 40274 Avg: 40274 Max: 40274(ts:29285.352088) Min:40274(ts:29285.352088)
  Event: softirq_entry:SCHED (78) Total: 281724 Avg: 3611 Max: 75285(ts:29284.573972) Min:424(ts:29285.499318)
  Event: softirq_entry:TIMER (500) Total: 2110896 Avg: 4221 Max: 105637(ts:29285.423404) Min:0(ts:29284.626903)
  Event: softirq_entry:NET_RX (1) Total: 31098 Avg: 31098 Max: 31098(ts:29285.196677) Min:31098(ts:29285.196677)
  Event: irq_handler_entry:0x1a (1) Total: 4853 Avg: 4853 Max: 4853(ts:29285.196645) Min:4853(ts:29285.196645)
  Event: irq_handler_entry:0x18 (1) Total: 6988 Avg: 6988 Max: 6988(ts:29285.352047) Min:6988(ts:29285.352047)

Matching your own events

● If you want timings between two events

● Define start event, end event, and the value to match between the two

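● -H <start-event>,<smatch>/<end-event>,<ematch>
  – hook start event to end event
  – start field smatch == end field ematch

● -H <start-event>,<smatch>,<pid>/<end-event>,<ematch>
  – Hooks are matched by same tasks
  – If hook needs a proxy
  – Like sched_wakeup

Matching your own events

● -H <start-event>,<smatch>,<pid>/<end-event>,<ematch>,<flags>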

● FLAGS
  – 's'
    ● stack trace
  – 'p'
    ● Pinned task – only reset on CPU dropped events
  – 'g'
    ● global – do not map to tasks – coming in 2.6

Matching your own events

# trace-cmd record \
    -H hrtimer_expire_entry,hrtimer/hrtimer_expire_exit,hrtimer,g \
    hackbench 10
# trace-cmd report --profile

Global Events
  Event: hrtimer_expire_entry:0xffff88011ea8d820 (111) Total: 218295 Avg: 1966 Max: 4528(ts:62612.053368) Min:897(ts:62612.121115)
  Event: hrtimer_expire_entry:0xffff88011ea0d820 (107) Total: 192927 Avg: 1803 Max: 4203(ts:62612.042083) Min:887(ts:62612.093103)
  Event: hrtimer_expire_entry:0xffff88011ebcd820 (77) Total: 146292 Avg: 1899 Max: 4076(ts:62612.039083) Min:412(ts:62612.107111)
  Event: hrtimer_expire_entry:0xffff88011ea4d820 (106) Total: 211691 Avg: 1997 Max: 6597(ts:62612.039084) Min:1093(ts:62612.118114)
  Event: hrtimer_expire_entry:0xffff88011eb8d820 (84) Total: 159953 Avg: 1904 Max: 4527(ts:62612.041086) Min:378(ts:62612.119116)
  Event: hrtimer_expire_entry:0xffff88011eb4d820 (107) Total: 207759 Avg: 1941 Max: 5483(ts:62612.044086) Min:976(ts:62612.117113)
  Event: hrtimer_expire_entry:0xffff88011eacd820 (116) Total: 244719 Avg: 2109 Max: 6130(ts:62612.039084) Min:1372(ts:62612.058086)
  Event: hrtimer_expire_entry:0xffff88011eb0d820 (102) Total: 181536 Avg: 1779 Max: 3979(ts:62612.039082) Min:1033(ts:62612.133120)
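To make the idea of a hook concrete, here is a minimal sketch of start/end matching done by hand. It is not how trace-cmd implements -H. It assumes a made-up input format of "<timestamp> <event> <key>" per line, and `pair_events` is a hypothetical helper name. A start event is remembered under its key; when an end event with the same key arrives, the time difference is added to that key's statistics, giving the same kind of count, Total, Avg, Max and Min shown in the report.

```shell
#!/bin/sh
# pair_events START_EVENT END_EVENT
# Hypothetical helper illustrating start/end event matching; this is
# an assumption-laden sketch, not trace-cmd's implementation.
# stdin: one "<timestamp> <event> <key>" line per event (made-up format).
pair_events() {
    awk -v start_ev="$1" -v end_ev="$2" '
        { key = $3 "" }                 # force the key to be a string
        $2 == start_ev { began[key] = $1; next }
        $2 == end_ev && (key in began) {
            d = $1 - began[key]         # time from start to end event
            delete began[key]
            total[key] += d
            n[key]++
            if (n[key] == 1 || d > max[key]) max[key] = d
            if (n[key] == 1 || d < min[key]) min[key] = d
        }
        END {
            for (k in n)
                printf "%s (%d) Total: %d Avg: %d Max: %d Min: %d\n",
                       k, n[k], total[k], int(total[k] / n[k]), max[k], min[k]
        }
    ' | sort
}
```

For example, `pair_events hrtimer_expire_entry hrtimer_expire_exit < events.txt` would summarize each key separately, where `events.txt` is a hypothetical file in the three-column format above. End events with no pending start are ignored.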

Matching your own events

# trace-cmd record \
    -H hrtimer_expire_entry,hrtimer/hrtimer_expire_exit,hrtimer,gp \
    hackbench 10
# trace-cmd report --profile

Global CPU[0] Events
  Event: hrtimer_expire_entry:0xffff88011ea0d820 (107) Total: 205385 Avg: 1919 Max: 6771(ts:62232.214849) Min:775(ts:62232.329894)
Global CPU[1] Events
  Event: hrtimer_expire_entry:0xffff88011ea4d820 (109) Total: 214429 Avg: 1967 Max: 5332(ts:62232.214848) Min:323(ts:62232.274870)
Global CPU[2] Events
  Event: hrtimer_expire_entry:0xffff88011ea8d820 (105) Total: 219277 Avg: 2088 Max: 5050(ts:62232.216850) Min:933(ts:62232.329894)
Global CPU[3] Events
  Event: hrtimer_expire_entry:0xffff88011eacd820 (119) Total: 259717 Avg: 2182 Max: 6949(ts:62232.214849) Min:1288(ts:62232.232850)
Global CPU[4] Events
  Event: hrtimer_expire_entry:0xffff88011eb0d820 (109) Total: 212016 Avg: 1945 Max: 5071(ts:62232.214848) Min:423(ts:62232.257864)
Global CPU[5] Events
  Event: hrtimer_expire_entry:0xffff88011eb4d820 (94) Total: 184613 Avg: 1963 Max: 4661(ts:62232.215846) Min:397(ts:62232.271869)
Global CPU[6] Events
  Event: hrtimer_expire_entry:0xffff88011eb8d820 (115) Total: 229882 Avg: 1998 Max: 4573(ts:62232.219853) Min:714(ts:62232.331895)
Global CPU[7] Events
  Event: hrtimer_expire_entry:0xffff88011ebcd820 (89) Total: 189363 Avg: 2127 Max: 4708(ts:62232.214848) Min:970(ts:62232.235851)

How does this help?

task: trace-cmd-1126
[...]
  Event: sched_switch:D (40) Total: 90828734 Avg: 2270718 Max: 28170222(ts:453.848016) Min:19093(ts:453.928895)
   |
   + ftrace_raw_event_sched_switch (0xffffffff8109fbb0)
       100% (40) time:90828734 max:28170222(ts:453.819867)
        __schedule (0xffffffff816b7bb9)
        schedule (0xffffffff816b8139)
        schedule_preempt_disabled (0xffffffff816b8405)
        __mutex_lock_slowpath (0xffffffff816b9de5)
        mutex_lock (0xffffffff816b9e6b)
        |
        + tracing_buffers_splice_read (0xffffffff8113561b)
        |   74% (20) time:66847386 max:28170222(ts:453.819867)
        |    do_splice_to (0xffffffff811fb00f)
        |    SyS_splice (0xffffffff811fd4bf)
        |    return_to_handler (0xffffffff816be158)
        |    tracesys_phase2 (0xffffffff816bc2b0)
        |
        + tracing_buffers_splice_read (0xffffffff8113535b)
        |   26% (19) time:23957571 max:10499923(ts:453.784203)
        |    do_splice_to (0xffffffff811fb00f)
        |    SyS_splice (0xffffffff811fd4bf)
        |    return_to_handler (0xffffffff816be158)
        |    tracesys_phase2 (0xffffffff816bc2b0)
        |
        + return_to_handler (0xffffffff816be158)
            0% (1) time:23777 max:23777(ts:453.109491)
             tracing_buffers_splice_read (0xffffffff8113561b)
             do_splice_to (0xffffffff811fb00f)
             SyS_splice (0xffffffff811fd4bf)

Questions?

Yeah right! Like we have time

Questions?

We do?

Questions?

LET'S DEMO!