Linux 4.X Tracing Tools Using BPF Superpowers
Brendan Gregg, Netflix
December 4–9, 2016 | Boston, MA
www.usenix.org/lisa16 #lisa16

Demo time

GIVE ME 15 MINUTES AND I'LL CHANGE YOUR VIEW OF LINUX TRACING
inspired by Greg Law's: Give me fifteen minutes and I'll change your view of GDB

Demo

LISA 2014: perf-tools (ftrace)
LISA 2016: bcc tools (BPF)

Wielding Superpowers
WHAT DYNAMIC TRACING CAN DO

Previously
• Metrics were vendor chosen, closed source, and incomplete
• The art of inference & making do

# ps alx
 F S UID   PID  PPID CPU PRI NICE  ADDR  SZ WCHAN TTY   TIME CMD
 3 S   0     0     0   0   0   20  2253   2  4412 ?   186:14 swapper
 1 S   0     1     0   0  30   20  2423   8 46520 ?     0:00 /etc/init
 1 S   0    16     1   0  30   20  2273  11 46554 co    0:00 -sh
[…]

Crystal Ball Observability

Dynamic Tracing

Linux Event Sources

Event Tracing Efficiency
Eg, tracing TCP retransmits

Old way: packet capture
  tcpdump:  1. read  2. dump
  Analyzer: 1. read  2. process  3. print
  (diagram labels: Kernel, send, receive, buffer, file system, disks)

New way: dynamic tracing
  Tracer:   1. configure  2. read
  (diagram label: tcp_retransmit_skb())

New CLI Tools

# biolatency
Tracing block device I/O... Hit Ctrl-C to end.
^C
     usecs           : count     distribution
       4 -> 7        : 0        |                                      |
       8 -> 15       : 0        |                                      |
      16 -> 31       : 0        |                                      |
      32 -> 63       : 0        |                                      |
      64 -> 127      : 1        |                                      |
     128 -> 255      : 12       |********                              |
     256 -> 511      : 15       |**********                            |
     512 -> 1023     : 43       |*******************************       |
    1024 -> 2047     : 52       |**************************************|
    2048 -> 4095     : 47       |**********************************    |
    4096 -> 8191     : 52       |**************************************|
    8192 -> 16383    : 36       |**************************            |
   16384 -> 32767    : 15       |**********                            |
   32768 -> 65535    : 2        |*                                     |
   65536 -> 131071   : 2        |*                                     |

New Visualizations and GUIs

Netflix Intended Usage
Self-service UI: Flame Graphs, Tracing Reports, …
should be open sourced; you may also build/buy your own

Conquer Performance
Measure anything

Introducing BPF
BPF TRACING

A Linux Tracing Timeline
• 1990s: Static tracers, prototype dynamic tracers
• 2000: LTT + DProbes (dynamic tracing; not integrated)
• 2004: kprobes (2.6.9)
• 2005: DTrace (not Linux), SystemTap (out-of-tree)
• 2008: ftrace (2.6.27)
• 2009: perf (2.6.31)
• 2009: tracepoints (2.6.32)
• 2010-2016: ftrace & perf_events enhancements
• 2014-2016: BPF patches
also: LTTng, ktap, sysdig, ...
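
The biolatency output shown earlier groups I/O latencies into power-of-two ranges (4 -> 7, 8 -> 15, and so on). As a rough illustration of that bucketing only, here is a minimal pure-Python sketch. It is not how bcc implements it: the real tool counts events in kernel context using BPF maps, and every function name below (log2_bucket, bucket_range, log2_hist, format_hist) is hypothetical, chosen for this example.

```python
from collections import Counter


def log2_bucket(value):
    """Return the index of the power-of-two bucket that holds value (>= 0)."""
    # bit_length() maps 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, and so on.
    return int(value).bit_length()


def bucket_range(index):
    """Return the inclusive (low, high) range covered by a bucket index."""
    if index == 0:
        return (0, 0)
    return (1 << (index - 1), (1 << index) - 1)


def log2_hist(values):
    """Count how many values fall into each power-of-two bucket."""
    return Counter(log2_bucket(v) for v in values)


def format_hist(hist, unit="usecs", width=38):
    """Render the histogram as text rows with a proportional star bar."""
    rows = ["%s : count distribution" % unit]
    if not hist:
        return rows
    peak = max(hist.values())
    # Print every bucket between the smallest and largest seen, including
    # empty ones, so gaps in the distribution stay visible.
    for index in range(min(hist), max(hist) + 1):
        low, high = bucket_range(index)
        count = hist.get(index, 0)
        stars = "*" * (count * width // peak)
        rows.append("%8d -> %-8d : %-6d |%-*s|" % (low, high, count, width, stars))
    return rows


if __name__ == "__main__":
    # Hypothetical latencies in microseconds, for illustration only.
    latencies_us = [100, 130, 200, 300, 600, 700, 1500, 5000]
    for row in format_hist(log2_hist(latencies_us)):
        print(row)
```

Keeping only one counter per bucket is what makes this kind of summary cheap: the tracer stores a handful of integers rather than every event, and the user-level program reads them once when printing.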

Ye Olde BPF
Berkeley Packet Filter

# tcpdump host 127.0.0.1 and port 22 -d
(000) ldh      [12]
(001) jeq      #0x800           jt 2    jf 18
(002) ld       [26]
(003) jeq      #0x7f000001      jt 6    jf 4
(004) ld       [30]
(005) jeq      #0x7f000001      jt 6    jf 18
(006) ldb      [23]
(007) jeq      #0x84            jt 10   jf 8
(008) jeq      #0x6             jt 10   jf 9
(009) jeq      #0x11            jt 10   jf 18
(010) ldh      [20]
(011) jset     #0x1fff          jt 18   jf 12
(012) ldxb     4*([14]&0xf)
(013) ldh      [x + 14]
(014) jeq      #0x16            jt 17   jf 15
(015) ldh      [x + 16]
(016) jeq      #0x16            jt 17   jf 18
(017) ret      #65535
(018) ret      #0

BPF Enhancements by Linux Version
• 3.18: bpf syscall
• 3.19: sockets
• 4.1: kprobes
• 4.4: bpf_perf_event_output   (eg, Ubuntu: 16.04)
• 4.6: stack traces
• 4.7: tracepoints             (eg, Ubuntu: 16.10)
• 4.9: profiling

Enhanced BPF is in Linux

BPF
• aka eBPF == enhanced Berkeley Packet Filter
  – Lead developer: Alexei Starovoitov (Facebook)
• Many uses
  – Virtual networking
  – Security
  – Programmatic tracing
• Different front-ends
  – C, perf, bcc, ply, …
(image: BPF mascot)

BPF for Tracing
User Program:
  1. generate BPF bytecode
  2. load
  3. perf_output (per-event data)
  3. async read (statistics, via maps)
Kernel:
  verifier, BPF
  events: kprobes, uprobes, tracepoints

Raw BPF
samples/bpf/sock_example.c
87 lines truncated

C/BPF
samples/bpf/tracex1_kern.c
58 lines truncated

bcc
• BPF Compiler Collection
  – https://github.com/iovisor/bcc
  – Lead developer: Brenden Blanco (PlumGRID)
• Includes tracing tools
• Front-ends
  – Python
  – Lua
  – C helper libraries
Tracing layers (diagram labels): bcc tool, bcc tool, …; bcc; front-ends: Python, lua; user; kernel; Kernel: BPF, Events

bcc/BPF
bcc examples/tracing/bitehist.py
entire program

ply/BPF
https://github.com/wkz/ply/blob/master/README.md
entire program

The Tracing Landscape, Dec 2016 (my opinion)
[Figure: tools plotted by Ease of use (from "brutal" to "less brutal") against Scope & Capability, with Stage of Development (from "alpha" to "mature") also indicated. Tools shown: dtrace4L., ply/BPF, ktap, sysdig, perf, stap, ftrace, bcc/BPF, C/BPF, Raw BPF.]

State of BPF, Dec 2016
1. Dynamic tracing, kernel-level (BPF support for kprobes)
2. Dynamic tracing, user-level (BPF support for uprobes)
3. Static tracing, kernel-level (BPF support for tracepoints)
4. Timed sampling events (BPF with perf_event_open)
5. PMC events (BPF with perf_event_open)
6. Filtering (via BPF programs)
7. Debug output (bpf_trace_printk())
8. Per-event output (bpf_perf_event_output())
9. Basic variables (global & per-thread variables, via BPF maps)
10. Associative arrays (via BPF maps)
11. Frequency counting (via BPF maps)
12. Histograms (power-of-2, linear, and custom, via BPF maps)
13. Timestamps and time deltas (bpf_ktime_get_ns() and BPF)
14. Stack traces, kernel (BPF stackmap)
15. Stack traces, user (BPF stackmap)
16. Overwrite ring buffers
17. String factory (stringmap)
18. Optional: bounded loops, < and <=, …

State of bcc, Dec 2016
1. Static tracing, user-level (USDT probes via uprobes)
2. Static tracing, dynamic USDT (needs library support)
3. Debug output (Python with BPF.trace_pipe() and BPF.trace_fields())
4. Per-event output (BPF_PERF_OUTPUT macro and BPF.open_perf_buffer())
5. Interval output (BPF.get_table() and table.clear())
6. Histogram printing (table.print_log2_hist())
7. C struct navigation, kernel-level (maps to bpf_probe_read())
8. Symbol resolution, kernel-level (ksym(), ksymaddr())
9. Symbol resolution, user-level (usymaddr())
10. BPF tracepoint support (via TRACEPOINT_PROBE)
11. BPF stack trace support (incl. walk method for stack frames)
12. Examples (under /examples)
13. Many tools (/tools)
14. Tutorials (/docs/tutorial*.md)
15. Reference guide (/docs/reference_guide.md)
16. Open issues: (https://github.com/iovisor/bcc/issues)

Legend for both lists: done / not yet

For end-users
HOW TO USE BCC/BPF

Installation
https://github.com/iovisor/bcc/blob/master/INSTALL.md
• eg, Ubuntu Xenial:

# echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" | \
    sudo tee /etc/apt/sources.list.d/iovisor.list
# sudo apt-get update
# sudo apt-get install bcc-tools

  – puts tools in /usr/share/bcc/tools, and tools/old for older kernels
  – 16.04 is good, 16.10 better: more tools work
  – bcc should also arrive as an official Ubuntu snap

Pre-bcc Performance Checklist
1. uptime
2. dmesg | tail
3. vmstat 1
4. mpstat -P ALL 1
5. pidstat 1
6. iostat -xz 1
7. free -m
8. sar -n DEV 1
9. sar -n TCP,ETCP 1
10. top
http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html

bcc General Performance Checklist
1. execsnoop
2. opensnoop
3. ext4slower (…)
4. biolatency
5. biosnoop
6. cachestat
7. tcpconnect
8. tcpaccept
9. tcpretrans
10. gethostlatency
11. runqlat
12. profile

1. execsnoop

# execsnoop
PCOMM            PID    RET ARGS
bash             15887    0 /usr/bin/man ls
preconv          15894    0 /usr/bin/preconv -e UTF-8
man              15896    0 /usr/bin/tbl
man              15897    0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
man              15898    0 /usr/bin/pager -s
nroff            15900    0 /usr/bin/locale charmap
nroff            15901    0 /usr/bin/groff -mtty-char -Tutf8 -mandoc -rLL=169n -rLT=169n
groff            15902    0 /usr/bin/troff -mtty-char -mandoc -rLL=169n -rLT=169n -Tutf8
groff            15903    0 /usr/bin/grotty
[…]

2. opensnoop

# opensnoop
PID    COMM               FD ERR PATH
27159  catalina.sh         3   0 /apps/tomcat8/bin/setclasspath.sh
4057   redis-server        5   0 /proc/4057/stat
2360   redis-server        5   0 /proc/2360/stat
30668  sshd                4   0 /proc/sys/kernel/ngroups_max
30668  sshd                4   0 /etc/group
30668  sshd                4   0 /root/.ssh/authorized_keys
30668  sshd                4   0 /root/.ssh/authorized_keys
30668  sshd               -1   2 /var/run/nologin
30668  sshd               -1   2 /etc/nologin
30668  sshd                4   0 /etc/login.defs
30668  sshd                4   0 /etc/passwd
30668  sshd                4   0 /etc/shadow
30668  sshd                4   0 /etc/localtime
4510   snmp-pass           4   0 /proc/cpuinfo
[…]

3. ext4slower

# ext4slower 1
Tracing ext4 operations slower than 1 ms
TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
06:49:17 bash           3616   R 128     0           7.75 cksum
06:49:17 cksum          3616   R 39552   0           1.34 [
06:49:17 cksum          3616   R 96      0           5.36 2to3-2.7
06:49:17 cksum          3616   R 96      0          14.94 2to3-3.4
06:49:17 cksum          3616   R 10320   0           6.82 411toppm
06:49:17 cksum          3616   R 65536   0           4.01 a2p
06:49:17 cksum          3616   R 55400   0           8.77 ab
06:49:17 cksum          3616   R 36792   0          16.34 aclocal-1.14
06:49:17 cksum          3616   R 15008   0          19.31 acpi_listen
06:49:17 cksum          3616   R 6123    0          17.23 add-apt-repository
06:49:17 cksum          3616   R 6280    0          18.40 addpart
06:49:17 cksum          3616   R 27696   0           2.16 addr2line
06:49:17 cksum          3616   R 58080   0          10.11 ag
[…]

also: btrfsslower, xfsslower, zfsslower

4. biolatency

# biolatency -mT 1
Tracing block device I/O... Hit Ctrl-C to end.

06:20:16
     msecs           : count     distribution
       0 -> 1        : 36       |**************************************|
       2 -> 3        : 1        |*                                     |
       4 -> 7        : 3        |***                                   |
       8 -> 15       : 17       |*****************                     |
      16 -> 31       : 33       |**********************************    |
      32 -> 63       : 7        |*******                               |
      64 -> 127      : 6        |******                                |
[…]

5. biosnoop

# biosnoop
TIME(s)        COMM           PID    DISK    T  SECTOR    BYTES   LAT(ms)
0.000004001    supervise      1950   xvda1   W  13092560  4096       0.74
0.000178002    supervise      1950   xvda1   W  13092432  4096       0.61
0.001469001    supervise      1956   xvda1   W  13092440  4096       1.24
0.001588002    supervise      1956   xvda1   W  13115128  4096       1.09
1.022346001    supervise      1950   xvda1   W  13115272  4096       0.98
1.022568002    supervise      1950   xvda1   W  13188496  4096       0.93
1.023534000    supervise      1956   xvda1   W  13188520  4096       0.79
1.023585003    supervise      1956   xvda1   W  13189512  4096       0.60
2.003920000    xfsaild/md0    456    xvdc    W  62901512  8192       0.23
2.003931001    xfsaild/md0    456    xvdb    W  62901513  512        0.25
2.004034001    xfsaild/md0    456    xvdb    W  62901520  8192       0.35
2.004042000    xfsaild/md0    456    xvdb    W  63542016  4096       0.36
2.004204001    kworker/0:3    26040  xvdb    W  41950344  65536      0.34
2.044352002    supervise      1950   xvda1   W  13192672  4096       0.65
[…]

6.