Introduction to low-level profiling and tracing

EuroPython 2019 / Basel 2019-07-11 Christian Heimes Principal Software Engineer [email protected] / [email protected] @ChristianHeimes Our systems block every 5 minutes. We are loosing money. Fix it!

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Is it me, or the GIL?

Christoph Heer EuroPython 2019

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 requests.get('https://www.python.org')

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 2 ½ use case for tracing tools Who am I?

● from Hamburg/Germany ● Python and developer ● Python core contributor since 2008 ● maintainer of ssl and hashlib module ● Python security team

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Professional life

● Principal Software Engineer at Red Hat ● Security Engineering ● FreeIPA Identity Management ● Dogtag PKI

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Agenda & Goals

Agenda

● introduction ● (strace, ) ● kernel & hardware tracing tools ● Summary ● ~ 5 minutes Q&A

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Special thanks

● Brendan Gregg (Netflix) ● Victor Stinner (CPython, Red Hat) https://vstinner.readthedocs.io/benchmark.html ● Dmitry Levin (strace) ”Modern Strace”, DevConf.CZ 2019 ● Michal Sekletár (Red Hat) ”Tracing Tools for Systems Engineers”, DevConf.CZ 2019

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Introduction Terminology

Debugging – The of identifying and removing bugs. ● active, expensive, intrusive, slow-down ● deploy new version ● attach debugger

Tracing – observing and monitoring behavior ● passive, non-intrusive, and fast*

Profiling – gathering and analyzing metrics ● byproduct of tracing with counting and timing

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Methodology

Application level tracing ● debug builds ● performance logs (MySQL Slow Query Log, PHP xdebug) ● trace hooks (Python: sys.settrace()) tracing ● LD_PRELOAD, ptrace Kernel space tracing ● , eBPF, , LTTng, ... Hardware tracing ● hardware performance counter, PMU, ...

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Prerequisites

● installation of extra packages ● permissions ● root or special group ● CAP_SYS_PTRACE, CAP_SYS_ADMIN, CAP_SYS_MODULE ● recent Kernel (BCC, eBPF) ● debug symbols dnf debuginfo-install package, apt install package-dbg ● compiler/linker flags (-g, -fno-omit-framepointer, BIND_LAZY) ● disable secure boot (stap) Kernel Live Patching or dynamic modules

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 “Lies, damned lies, and statistics”

”Wer misst, misst Mist.”

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 benchmarks / statistics

● average, median, standard deviation, percentile ● observational error ● systematic error ● random error ● systemic bias, cognitive bias (human factor) ● misleading presentation (Vatican City has 2.27 popes / km²) ● sampling profiler: quantization error, Nyquist–Shannon theorem

“Producing Wrong Data Without Doing Anything Obviously Wrong!” (Mytkowicz, Diwan, Hauswirth, Sweeney)

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Computers are noisy

● CPU power states (P-states, C-states, TurboBoost, thermal throttling) ● caches (L1 CPU, file system, JIT warm-up, DNS/HTTP) ● hardware IRQs, Address Space Layout Randomization (ASLR)

https://vstinner.readthedocs.io/benchmark.html

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 env vars, path length, hostname, username

BenchmarkBenchmark withoutwithout PythonPython virtualvirtual environmentenvironment

%% timetime secondsseconds usecs/callusecs/call callscalls errorserrors syscallsyscall ------28.7828.78 0.0004380.000438 00 14401440 readread 27.3327.33 0.0004160.000416 11 440440 2525 statstat 9.729.72 0.0001480.000148 11 144144 mmapmmap ...... 0.790.79 0.0000120.000012 11 1111 munmapmunmap

BenchmarkBenchmark insideinside aa virtualvirtual environmentenvironment

%% timetime secondsseconds usecs/callusecs/call callscalls errorserrors syscallsyscall ------57.1257.12 0.0990230.099023 22 6147161471 munmapmunmap 41.8741.87 0.0725800.072580 11 6161861618 mmapmmap 0.230.23 0.0003950.000395 11 465465 2727 statstat

● https://mail.python.org/pipermail/python-dev/2019-February/156522.html ● https://homes.cs.washington.edu/~bornholt/post/performance-evaluation.html

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Let's profile

importimport timetime

startstart == time.time()time.time()

withwith ('/etc/os-release')open('/etc/os-release') asas f:f: lineslines == f.readlines()f.readlines()

print(time.time()print(time.time() -- start)start)

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Examples

● shell ● cat ● Python ● open + read file ● HTTPS request with requests

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 ptrace ptrace

● process trace syscall ● introduced in Unix Version 6 (~1985) ● user-space tracing ● used by debuggers and code analysis tools ● gdb ● strace ● ltrace ● code coverage ● ...

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 ltrace – A call tracer

$$ ltraceltrace -e-e ''SSL_CTX*@*SSL_CTX*@*'' \\ python3python3 -c-c ''importimport requests;requests; requests.get("https://www.python.org")requests.get("https://www.python.org")'' _ssl.cpython-37m-x86_64-linux-gnu.so->_ssl.cpython-37m-x86_64--gnu.so->SSL_CTX_newSSL_CTX_new(0x7fa4a1a77120,(0x7fa4a1a77120, 0,0, 0,0, 0)0) == 0x55636123a1000x55636123a100 _ssl.cpython-37m-x86_64-linux-gnu.so->_ssl.cpython-37m-x86_64-linux-gnu.so->SSL_CTX_get_verify_callbackSSL_CTX_get_verify_callback(0x55636123a100,(0x55636123a100, -2,-2, 0x7fa4a1acd5e0,0x7fa4a1acd5e0, 0x7fa4a1acd5f8)0x7fa4a1acd5f8) == 00 _ssl.cpython-37m-x86_64-linux-gnu.so->_ssl.cpython-37m-x86_64-linux-gnu.so->SSL_CTX_set_verifySSL_CTX_set_verify(0x55636123a100,(0x55636123a100, 0,0, 0,0, 0x7fa4a1acd5f8)0x7fa4a1acd5f8) == 00 _ssl.cpython-37m-x86_64-linux-gnu.so->_ssl.cpython-37m-x86_64-linux-gnu.so->SSL_CTX_set_optionsSSL_CTX_set_options(0x55636123a100,(0x55636123a100, 0x82420054,0x82420054, 0,0, 0x7fa4a1acd5f8)0x7fa4a1acd5f8) == 0x825200540x82520054 _ssl.cpython-37m-x86_64-linux-gnu.so->_ssl.cpython-37m-x86_64-linux-gnu.so->SSL_CTX_ctrlSSL_CTX_ctrl(0x55636123a100,(0x55636123a100, 33,33, 16,16, 0)0) == 2020 _ssl.cpython-37m-x86_64-linux-gnu.so->_ssl.cpython-37m-x86_64-linux-gnu.so->SSL_CTX_set_session_id_contextSSL_CTX_set_session_id_context(0x55636123a100,(0x55636123a100, 0x7fa4af500494,0x7fa4af500494, 7,7, 0)0) == 11 ...... _ssl.cpython-37m-x86_64-linux-gnu.so->_ssl.cpython-37m-x86_64-linux-gnu.so->SSL_CTX_freeSSL_CTX_free(0x55636123a100,(0x55636123a100, 0x7fa4af4b1e10,0x7fa4af4b1e10, -3,-3, 0x7fa4a12734c8)0x7fa4a12734c8) == 00

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 ltrace – count memory allocations

>>>>>> importimport os,os, time,time, requestsrequests >>>>>> os.getpid()os.getpid() 65046504 ## attachattach ltraceltrace >>>>>> startstart == time.time();time.time(); rr == requests.get("https://www.python.org");requests.get("https://www.python.org"); print(time.time()print(time.time() -- start)start) ## withoutwithout ltrace:ltrace: 0.5660.566 secsec ## withwith ltrace:ltrace: 3.3963.396 secsec

$$ ltraceltrace -e-e ''malloc+free+realloc@*malloc+free+realloc@*'' -c-c -p-p 65046504 %% timetime secondsseconds usecs/callusecs/call callscalls functionfunction ------53.4953.49 1.1048041.104804 3737 2979629796 freefree 45.7845.78 0.9455650.945565 3737 2552325523 mallocmalloc 0.730.73 0.0150690.015069 4848 309309 reallocrealloc ------100.00100.00 2.0654382.065438 5562855628 totaltotal

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 strace – trace system calls and signals

● Paul Kranenburg in 1991 ● Dmitry Levin (current maintainer)

Ring 3

Ring 2 Least privileged

Ring 1

Ring 0

syscall Kernel

Most privileged Device drivers

Device drivers

Applications

Hertzsprung at English Wikipedia [CC BY-SA 3.0]

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 strace “open” syscall

$$ stracestrace -e-e openopen catcat /etc/os-release/etc/os-release >/dev/null>/dev/null ++++++ exitedexited withwith 00 ++++++ $$ manman 22 openopen ...... TheThe open()open() systemsystem callcall opensopens thethe filefile specifiedspecified byby pathname.pathname.

$$ stracestrace -c-c catcat /etc/os-release/etc/os-release >/dev/null>/dev/null %% timetime secondsseconds usecs/callusecs/call callscalls errorserrors syscallsyscall ------31,4031,40 0,0005910,000591 590590 11 execveexecve 13,5113,51 0,0002540,000254 2828 99 mmapmmap 8,678,67 0,0001630,000163 4040 44 mprotectmprotect 7,207,20 0,0001350,000135 2222 66 readread 6,956,95 0,0001310,000131 3232 44 openatopenat 6,366,36 0,0001200,000120 1919 66 closeclose 6,216,21 0,0001170,000117 2929 44 brkbrk ...... ------100.00100.00 0,0018820,001882 4949 22 totaltotal

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 syscall ABI / API

● open ● x86_64 glibc <= 2.25: open ● x86_64 glibc >= 2.26: openat ● riscv: openat $ strace -e ’/^open(at)?$’

● stat ● fstat, fstat64, fstatat64, lstat, lstat64, newfstatat, oldfstat, oldlstat, oldstat, stat, stat64, statx $ strace -e %%stat %%stat = %stat + %lstat + %fstat + statx

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 strace “openat” syscall

$$ stracestrace -e-e openatopenat catcat /etc/os-release/etc/os-release >/dev/null>/dev/null openat(AT_FDCWD,openat(AT_FDCWD, "/etc/ld.so.cache","/etc/ld.so.cache", O_RDONLY|O_CLOEXEC)O_RDONLY|O_CLOEXEC) == 33 openat(AT_FDCWD,openat(AT_FDCWD, "/lib64/libc.so.6","/lib64/libc.so.6", O_RDONLY|O_CLOEXEC)O_RDONLY|O_CLOEXEC) == 33 openat(AT_FDCWD,openat(AT_FDCWD, "/usr/lib/locale/locale-archive","/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC)O_RDONLY|O_CLOEXEC) == 33 openat(AT_FDCWD,openat(AT_FDCWD, "/etc/os-release","/etc/os-release", O_RDONLY)O_RDONLY) == 33 ++++++ exitedexited withwith 00 ++++++

$$ stracestrace -P-P /etc/os-release/etc/os-release catcat /etc/os-release/etc/os-release >/dev/null>/dev/null strace:strace: RequestedRequested pathpath '/etc/os-release''/etc/os-release' resolvedresolved intointo '/usr/lib/os.release.d/os-release-fedora''/usr/lib/os.release.d/os-release-fedora' openat(AT_FDCWD,openat(AT_FDCWD, "/etc/os-release","/etc/os-release", O_RDONLY)O_RDONLY) == 33 fstat(fstat(33,, {st_mode=S_IFREG|0644,{st_mode=S_IFREG|0644, st_size=693,st_size=693, ...})...}) == 00 fadvise64(fadvise64(33,, 0,0, 0,0, POSIX_FADV_SEQUENTIAL)POSIX_FADV_SEQUENTIAL) == 00 read(read(33,, "NAME=Fedora\nVERSION=\"29"NAME=Fedora\nVERSION=\"29 (Twenty(Twenty "...,"..., 131072)131072) == 693693 read(read(33,, "","", 131072)131072) == 00 close(close(33)) == 00 ++++++ exitedexited withwith 00 ++++++

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 strace options

● filter class examples ● %file: syscall that take a file name as argument ● %desc: file descriptor related syscalls ● %net: network related syscalls ● arguments ● -P: path filter ● -y: print path associated with file descriptor ● -yy: print protocol information (e.g. ip:port) ● -t/-tt/-ttt: time stamp ● -T: time spent ● -k: caller stack trace

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 strace: requests.get(...)

$$ stracestrace -e-e %file%file -p-p 65046504 ...... stat("/etc/resolv.conf",stat("/etc/resolv.conf", {st_mode=S_IFREG|0644,{st_mode=S_IFREG|0644, st_size=300,st_size=300, ...})...}) == 00 openat(AT_FDCWD,openat(AT_FDCWD, "/etc/hosts","/etc/hosts", O_RDONLY|O_CLOEXEC)O_RDONLY|O_CLOEXEC) == 33 openat(AT_FDCWD,openat(AT_FDCWD, "/etc/crypto-policies/back-ends/openssl.config","/etc/crypto-policies/back-ends/openssl.config", O_RDONLY)O_RDONLY) == 44 openat(AT_FDCWD,openat(AT_FDCWD, "/etc/ssl/cacert.pem","/etc/ssl/cacert.pem", O_RDONLY)O_RDONLY) == -1-1 ENOENTENOENT (No(No suchsuch filefile oror directory)directory)

$$ stracestrace -e-e %net%net -p-p 65046504 socket(socket(AF_INETAF_INET,, SOCK_DGRAMSOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK,|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP)IPPROTO_IP) == 33 connect(3,connect(3, {sa_family=AF_INET,{sa_family=AF_INET, sin_port=htons(sin_port=htons(5353),), sin_addr=inet_addr("10.38.5.26")},sin_addr=inet_addr("10.38.5.26")}, 16)16) == 00 sendto(3,sendto(3, "\323\300\1\0\0\1\0\0\0\0\0\0"\323\300\1\0\0\1\0\0\0\0\0\0\3www\6python\3org\3www\6python\3org\0\0\1\0\1",\0\0\1\0\1", 32,32, MSG_NOSIGNAL,MSG_NOSIGNAL, NULL,NULL, 0)0) == 3232 recvfrom(3,recvfrom(3, "\323\300\201\200\0\1\0\2\0\0\0\0\3www\6python\3org\0\0\1\0\1"...,"\323\300\201\200\0\1\0\2\0\0\0\0\3www\6python\3org\0\0\1\0\1"..., 2048,2048, 0,0, {sa_family=AF_INET,{sa_family=AF_INET, sin_port=htons(53),sin_port=htons(53), sin_addr=inet_addr("sin_addr=inet_addr("10.38.5.2610.38.5.26")},")}, [28->16])[28->16]) == 9393

socket(socket(AF_INETAF_INET,, SOCK_STREAMSOCK_STREAM|SOCK_CLOEXEC,|SOCK_CLOEXEC, IPPROTO_TCP)IPPROTO_TCP) == 33 setsockopt(3,setsockopt(3, SOL_TCP,SOL_TCP, TCP_NODELAY,TCP_NODELAY, [1],[1], 4)4) == 00 connect(3,connect(3, {sa_family=AF_INET,{sa_family=AF_INET, sin_port=htons(sin_port=htons(443443),), sin_addr=inet_addr("151.101.36.223")},sin_addr=inet_addr("151.101.36.223")}, 16)16) == 00 getsockopt(3,getsockopt(3, SOL_SOCKET,SOL_SOCKET, SO_TYPE,SO_TYPE, [1],[1], [4])[4]) == 00 getpeername(3,getpeername(3, {sa_family=AF_INET,{sa_family=AF_INET, sin_port=htons(443),sin_port=htons(443), sin_addr=inet_addr("151.101.36.223")},sin_addr=inet_addr("151.101.36.223")}, [16])[16]) == 00

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 syscall tampering

$$ stracestrace -e-e inject=socket:error=EMFILEinject=socket:error=EMFILE -p-p 65046504 >>>>>> requests.get("https://www.python.org/en")requests.get("https://www.python.org/en") TracebackTraceback (most(most recentrecent callcall last):last): ...... socket.gaierror:socket.gaierror: [Errno[Errno -2]-2] NameName oror serviceservice notnot knownknown $$ stracestrace -e-e inject=socket:error=EMFILE:when=2+inject=socket:error=EMFILE:when=2+ -p-p 65046504 TracebackTraceback (most(most recentrecent callcall last):last): ...... FileFile "/usr/lib64/python3.7/socket.py","/usr/lib64/python3.7/socket.py", lineline 151,151, inin __init____init__ _socket.socket.__init__(self,_socket.socket.__init__(self, family,family, type,type, proto,proto, fileno)fileno) OSError:OSError: [Errno[Errno 24]24] TooToo manymany openopen filesfiles

$$ dddd if=/dev/zeroif=/dev/zero of=/dev/nullof=/dev/null bs=1Mbs=1M count=10count=10 1048576010485760 bytesbytes (10(10 MB,MB, 1010 MiB)MiB) copied,copied, 0,003282250,00328225 s,s, 3,23,2 GB/sGB/s $$ stracestrace -e-e inject=write:delay_exit=100000inject=write:delay_exit=100000 -o-o /dev/null/dev/null \\ dddd if=/dev/zeroif=/dev/zero of=/dev/nullof=/dev/null bs=1Mbs=1M count=10count=10 1048576010485760 bytesbytes (10(10 MB,MB, 1010 MiB)MiB) copied,copied, 1,106991,10699 s,s, 9,59,5 MB/sMB/s

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 syscall tampering

$$ stracestrace -P-P /tmp/foo/tmp/foo -e-e inject=unlinkat:retval=0inject=unlinkat:retval=0 \\ rmrm /tmp/foo/tmp/foo newfstatat(AT_FDCWD,newfstatat(AT_FDCWD, "/tmp/foo","/tmp/foo", {st_mode=S_IFREG|0664,{st_mode=S_IFREG|0664, st_size=0,st_size=0, ...},...}, AT_SYMLINK_NOFOLLOW)AT_SYMLINK_NOFOLLOW) == 00 newfstatat(AT_FDCWD,newfstatat(AT_FDCWD, "/tmp/foo","/tmp/foo", {st_mode=S_IFREG|0664,{st_mode=S_IFREG|0664, st_size=0,st_size=0, ...},...}, AT_SYMLINK_NOFOLLOW)AT_SYMLINK_NOFOLLOW) == 00 faccessat(AT_FDCWD,faccessat(AT_FDCWD, "/tmp/foo","/tmp/foo", W_OK)W_OK) == 00 unlinkat(AT_FDCWD,unlinkat(AT_FDCWD, "/tmp/foo","/tmp/foo", 0)0) == 00 (INJECTED)(INJECTED) ++++++ exitedexited withwith 00 ++++++ $$ lsls /tmp/foo/tmp/foo /tmp/foo/tmp/foo

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Verdict

✔ easy to use ✔ powerful tool for quick hacks ✔ no extra privileges ✗ slow ✗ limit view (user-space only) ✗ ltrace incompatible with optimizations

$$ ltraceltrace lsls >/dev/null>/dev/null ++++++ exitedexited (status(status 0)0) ++++++ $$ scanelfscanelf -a-a /bin//bin/ls TYPETYPE PAXPAX PERMPERM ENDIANENDIAN STK/REL/PTLSTK/REL/PTL TEXTRELTEXTREL RPATHRPATH BINDBIND FILEFILE ET_DYNET_DYN PeMRxSPeMRxS 07550755 LELE RW-RW- R--R-- RW-RW- -- -- NOWNOW /bin/ls/bin/ls

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Low-level tracing What is the Kernel doing?

What is my hardware doing?

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Low-level tracing

● Trace inside the Kernel ● file system ● hardware drivers ● Profile hardware ● CPU cache ● memory / MMU ● efficient user-space tracing ● single process ● system-wide ● learn how Kernel space and hardware works

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Data sources

● kprobes ● uprobes ● events ● perf_event (HPC, PMU, faults, TLB, ...) ● clock ● Kernel TRACE_EVENT, DEFINE_EVENT ● user defined ● probe / USDT (Userland Statically Defined Tracing) ● -ust

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 kprobes / uprobes

Kernel probes ● (almost) all internal Kernel functions ● /proc/kallsyms

User-space probes ● (almost) all functions in binaries ● applications ● shared libraries

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 perf_event & metrics

$$ perfperf listlist branch-instructionsbranch-instructions OROR branchesbranches [Hardware[Hardware event]event] branch-missesbranch-misses [Hardware[Hardware event]event] bus-cyclesbus-cycles [Hardware[Hardware event]event] ...... alignment-faultsalignment-faults [Software[Software event]event] bpf-outputbpf-output [Software[Software event]event] context-switchescontext-switches OROR cscs [Software[Software event]event] cpu-clockcpu-clock [Software[Software event]event] cpu-migrationscpu-migrations OROR migrationsmigrations [Software[Software event]event] ...... L1-dcache-load-missesL1-dcache-load-misses [Hardware[Hardware cachecache event]event] L1-dcache-loadsL1-dcache-loads [Hardware[Hardware cachecache event]event] L1-dcache-storesL1-dcache-stores [Hardware[Hardware cachecache event]event] L1-icache-load-missesL1-icache-load-misses [Hardware[Hardware cachecache event]event] ...... power/energy-cores/power/energy-cores/ [Kernel[Kernel PMUPMU event]event] power/energy-gpu/power/energy-gpu/ [Kernel[Kernel PMUPMU event]event] power/energy-pkg/power/energy-pkg/ [Kernel[Kernel PMUPMU event]event] power/energy-psys/power/energy-psys/ [Kernel[Kernel PMUPMU event]event] ...... floatingfloating point:point: fp_arith_inst_retired.128b_packed_doublefp_arith_inst_retired.128b_packed_double [Number[Number ofof SSE/AVXSSE/AVX computationalcomputational 128-bit128-bit packedpacked doubledouble precisionprecision floating-pointfloating-point instructionsinstructions ...]...] ...... Summary:Summary: CLKSCLKS [Per-thread[Per-thread actualactual clocksclocks whenwhen thethe logicallogical processorprocessor isis active.active. ThisThis isis calledcalled 'Clockticks''Clockticks' inin VTune]VTune] CPICPI [Cycles[Cycles PerPer InstructionInstruction (threaded)](threaded)]

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Kernel trace events

## findfind /sys/kernel/debug/tracing/events/sys/kernel/debug/tracing/events -name-name formatformat || wcwc -l-l 18971897

## trace-cmdtrace-cmd recordrecord -e-e 'cfg80211:cfg80211_inform_bss_frame''cfg80211:cfg80211_inform_bss_frame' -e-e 'cfg80211:cfg80211_get_bss''cfg80211:cfg80211_get_bss' ## trace-cmdtrace-cmd reportreport ...... irq/140-iwlwifi-1834irq/140-iwlwifi-1834 [002][002] 91697.995389:91697.995389: cfg80211_inform_bss_framecfg80211_inform_bss_frame:: phy0,phy0, band:band: 1,1, freq:freq: 5180(scan_width:5180(scan_width: 0)0) :signal: -6600,-6600, tsb:132269768490547,tsb:132269768490547, detect_tsf:0,detect_tsf:0, tsf_bssid:tsf_bssid: 00:00:00:00:00:0000:00:00:00:00:00 kworker/u16:3-23176kworker/u16:3-23176 [004][004] 91697.995468:91697.995468: cfg80211_inform_bss_framecfg80211_inform_bss_frame:: phy0,phy0, band:band: 1,1, freq:freq: 5180(scan_width:5180(scan_width: 0)0) signal:signal: -6600,-6600, tsb:132269768490547,tsb:132269768490547, detect_tsf:0,detect_tsf:0, tsf_bssid:tsf_bssid: 00:00:00:00:00:0000:00:00:00:00:00 irq/140-iwlwifi-1834irq/140-iwlwifi-1834 [002][002] 91698.002485:91698.002485: cfg80211_inform_bss_framecfg80211_inform_bss_frame:: phy0,phy0, band:band: 1,1, freq:freq: 5180(scan_width:5180(scan_width: 0)0) signal:signal: -6700,-6700, tsb:132269775585563,tsb:132269775585563, detect_tsf:0,detect_tsf:0, tsf_bssid:tsf_bssid: 00:00:00:00:00:0000:00:00:00:00:00 wpa_supplicant-2320wpa_supplicant-2320 [000][000] 91698.198168:91698.198168: cfg80211_get_bsscfg80211_get_bss:: phy0,phy0, band:band: 1,1, freq:freq: 5180,5180, 08:96:d7:XX:XX:XX,08:96:d7:XX:XX:XX, buf:buf: 0x73,0x73, bss_type:bss_type: 0,0, privacy:privacy: 22 wpa_supplicant-2320wpa_supplicant-2320 [000][000] 91698.216052:91698.216052: cfg80211_get_bsscfg80211_get_bss:: phy0,phy0, band:band: 1,1, freq:freq: 5180,5180, 08:96:d7:XX:XX:XX,08:96:d7:XX:XX:XX, buf:buf: 0x73,0x73, bss_type:bss_type: 0,0, privacy:privacy: 22

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Advanced Tools Advanced tools

● ftrace (tracefs) ● ● BCC / eBPF tools ● SystemTap ● more tools ● LTTng ● dtrace ● VTune ● SysDig ● ...

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 DTrace

● Sun Microsystems (2005) for Solaris ● DTrace markers ● operating systems ● Linux (2017) ● Windows (2018) ● macOS ● FreeBSD ● NetBSD

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 ftrace – Function Tracer

● by Steven Rostedt (2008) ● tracefs /sys/kernel/debug/tracing ● ring buffer / pipes ● GCC profiling, live patching, trampolines ● foundation for Live Kernel Patching ● frontends ● busybox / shell (cat, echo) ● trace-cmd ● KernelShark ● perf

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 trace-cmd: NFS open (simplified)

## trace-cmdtrace-cmd recordrecord -p-p function_graphfunction_graph -l-l 'nfs_*''nfs_*' -c-c python3python3 python_open.pypython_open.py ## trace-cmdtrace-cmd reportreport ...... <...>-56879<...>-56879 [006][006] 8398345.057032:8398345.057032: funcgraph_entry:funcgraph_entry: || nfs_permission()nfs_permission() {{ <...>-56879<...>-56879 [006][006] 8398345.057032:8398345.057032: funcgraph_entry:funcgraph_entry: 0.0650.065 usus || nfs_do_access();nfs_do_access(); <...>-56879<...>-56879 [006][006] 8398345.057033:8398345.057033: funcgraph_entry:funcgraph_entry: || nfs_do_access()nfs_do_access() {{ <...>-56879<...>-56879 [006][006] 8398345.057034:8398345.057034: funcgraph_entry:funcgraph_entry: 0.9120.912 usus || nfs_alloc_fattr();nfs_alloc_fattr(); <...>-56879<...>-56879 [006][006] 8398345.057947:8398345.057947: funcgraph_entry:funcgraph_entry: || nfs_refresh_inode()nfs_refresh_inode() {{ <...>-56879<...>-56879 [006][006] 8398345.057950:8398345.057950: funcgraph_entry:funcgraph_entry: || nfs_refresh_inode.part.27()nfs_refresh_inode.part.27() {{ <...>-56879<...>-56879 [006][006] 8398345.057951:8398345.057951: funcgraph_entry:funcgraph_entry: || nfs_refresh_inode_locked()nfs_refresh_inode_locked() {{ <...>-56879<...>-56879 [006][006] 8398345.057952:8398345.057952: funcgraph_entry:funcgraph_entry: || nfs_update_inode()nfs_update_inode() {{ <...>-56879<...>-56879 [006][006] 8398345.057952:8398345.057952: funcgraph_entry:funcgraph_entry: 0.1900.190 usus || nfs_file_has_writers();nfs_file_has_writers(); <...>-56879<...>-56879 [006][006] 8398345.057954:8398345.057954: funcgraph_entry:funcgraph_entry: 0.2350.235 usus || nfs_set_cache_invalid();nfs_set_cache_invalid(); <...>-56879<...>-56879 [006][006] 8398345.057955:8398345.057955: funcgraph_exit:funcgraph_exit: 3.0893.089 usus || }} <...>-56879<...>-56879 [006][006] 8398345.057955:8398345.057955: funcgraph_exit:funcgraph_exit: 4.1064.106 usus || }} <...>-56879<...>-56879 [006][006] 8398345.057955:8398345.057955: funcgraph_exit:funcgraph_exit: 4.8304.830 usus || }} <...>-56879<...>-56879 [006][006] 8398345.057964:8398345.057964: funcgraph_exit:funcgraph_exit: ++ 14.69114.691 usus || }} <...>-56879<...>-56879 [006][006] 8398345.057964:8398345.057964: funcgraph_entry:funcgraph_entry: 0.0530.053 usus || nfs_access_set_mask();nfs_access_set_mask(); <...>-56879<...>-56879 [006][006] 8398345.057966:8398345.057966: funcgraph_entry:funcgraph_entry: 2.6212.621 usus || nfs_access_add_cache();nfs_access_add_cache(); <...>-56879<...>-56879 [006][006] 8398345.057970:8398345.057970: funcgraph_exit:funcgraph_exit: !! 936.572936.572 usus || }} <...>-56879<...>-56879 [006][006] 8398345.057970:8398345.057970: funcgraph_exit:funcgraph_exit: !! 937.907937.907 usus || }}

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 trace-cmd: NFS open

## trace-cmdtrace-cmd recordrecord -p-p functionfunction --func-stack--func-stack -l-l nfs_permissionnfs_permission -c-c python3python3 python_open.pypython_open.py ## trace-cmdtrace-cmd reportreport <...>-56912<...>-56912 [006][006] 8398623.394239:8398623.394239: function:function: nfs_permissionnfs_permission <...>-56912<...>-56912 [006][006] 8398623.394605:8398623.394605: kernel_stack:kernel_stack: trace> =>=> nfs_permissionnfs_permission (ffffffffc075d415)(ffffffffc075d415) =>=> inode_permissioninode_permission (ffffffffba2bb34e)(ffffffffba2bb34e) =>=> link_path_walk.part.49link_path_walk.part.49 (ffffffffba2bf172)(ffffffffba2bf172) =>=> path_lookupat.isra.53path_lookupat.isra.53 (ffffffffba2bfbc3)(ffffffffba2bfbc3) =>=> filename_lookup.part.67filename_lookup.part.67 (ffffffffba2c18c0)(ffffffffba2c18c0) =>=> vfs_statxvfs_statx (ffffffffba2b5683)(ffffffffba2b5683) =>=> __do_sys_newstat__do_sys_newstat (ffffffffba2b5c29)(ffffffffba2b5c29) =>=> do_syscall_64do_syscall_64 (ffffffffba00418b)(ffffffffba00418b) =>=> entry_SYSCALL_64_after_hwframeentry_SYSCALL_64_after_hwframe (ffffffffbaa00088)(ffffffffbaa00088) ...... <...>-56912<...>-56912 [006][006] 8398623.396069:8398623.396069: function:function: nfs_permissionnfs_permission <...>-56912<...>-56912 [006][006] 8398623.396093:8398623.396093: kernel_stack:kernel_stack: trace> =>=> nfs_permissionnfs_permission (ffffffffc075d415)(ffffffffc075d415) =>=> inode_permissioninode_permission (ffffffffba2bb34e)(ffffffffba2bb34e) =>=> link_path_walk.part.49link_path_walk.part.49 (ffffffffba2bf172)(ffffffffba2bf172) =>=> path_openatpath_openat (ffffffffba2bfdef)(ffffffffba2bfdef) =>=> do_filp_opendo_filp_open (ffffffffba2c26e3)(ffffffffba2c26e3) =>=> do_sys_opendo_sys_open (ffffffffba2ada36)(ffffffffba2ada36) =>=> do_syscall_64do_syscall_64 (ffffffffba00418b)(ffffffffba00418b) =>=> entry_SYSCALL_64_after_hwframeentry_SYSCALL_64_after_hwframe (ffffffffbaa00088)(ffffffffbaa00088)

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 perf – Performance analysis tools for Linux

(2009) ● perf_event_open() syscall ● perf counters (HPC, SPC) ● kprobes, uprobes, trace points ● LBR (Last Branch Records) sampling on recent Intel CPUs ● low-overhead sampling with callgraph ● wide range analysis from CPU instructions to Python, Java, NodeJS… ● perf command ● privileged and unprivileged ● ncurse user interface

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 perf stat

## perfperf statstat -d-d makemake -j-j 208.784,16208.784,16 msecmsec task-clocktask-clock ## 2,5992,599 CPUsCPUs utilizedutilized 156.385156.385 context-switchescontext-switches ## 749,028749,028 M/secM/sec 7.9647.964 cpu-migrationscpu-migrations ## 38,14538,145 M/secM/sec 3.774.2603.774.260 page-faultspage-faults ## 18077,34318077,343 M/secM/sec 611.151.444.573611.151.444.573 cyclescycles ## 2927194,8262927194,826 GHzGHz (62,51%)(62,51%) 620.124.617.748620.124.617.748 instructionsinstructions ## 1,011,01 insninsn perper cyclecycle (75,02%)(75,02%) 134.224.550.386134.224.550.386 branchesbranches ## 642887148,373642887148,373 M/secM/sec (75,02%)(75,02%) 4.022.428.0404.022.428.040 branch-missesbranch-misses ## 3,00%3,00% ofof allall branchesbranches (75,00%)(75,00%) 170.553.275.660170.553.275.660 L1-dcache-loadsL1-dcache-loads ## 816888629,684816888629,684 M/secM/sec (75,00%)(75,00%) 10.332.035.24910.332.035.249 L1-dcache-load-missesL1-dcache-load-misses ## 6,06%6,06% ofof allall L1-dcacheL1-dcache hitshits (74,99%)(74,99%) 2.288.155.9282.288.155.928 LLC-loadsLLC-loads ## 10959440,99210959440,992 M/secM/sec (49,98%)(49,98%) 477.323.105477.323.105 LLC-load-missesLLC-load-misses ## 20,86%20,86% ofof allall LL-cacheLL-cache hitshits (50,01%)(50,01%)

80,33550365280,335503652 secondsseconds timetime elapsedelapsed 185,163504000185,163504000 secondsseconds useruser 18,41685100018,416851000 secondsseconds syssys

## perfperf statstat -M-M Turbo_UtilizationTurbo_Utilization makemake -j-j 599.287.742.470599.287.742.470 cpu_clk_unhalted.threadcpu_clk_unhalted.thread ## 1,71,7 Turbo_UtilizationTurbo_Utilization 354.672.677.325354.672.677.325 cpu_clk_unhalted.ref_tsccpu_clk_unhalted.ref_tsc

IntelIntel i5-8265Ui5-8265U 1.60GHz1.60GHz // 3.90GHz3.90GHz

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 perf probe

$$ sudosudo perfperf probeprobe -x-x /lib64/libc.so.6/lib64/libc.so.6 --add--add mallocmalloc --add--add reallocrealloc --add--add freefree AddedAdded newnew events:events: probe_libc:mallocprobe_libc:malloc (on(on mallocmalloc inin /usr/lib64/libc-2.28.so)/usr/lib64/libc-2.28.so) probe_libc:malloc_1probe_libc:malloc_1 (on(on mallocmalloc inin /usr/lib64/libc-2.28.so)/usr/lib64/libc-2.28.so) probe_libc:reallocprobe_libc:realloc (on(on reallocrealloc inin /usr/lib64/libc-2.28.so)/usr/lib64/libc-2.28.so) probe_libc:realloc_1probe_libc:realloc_1 (on(on reallocrealloc inin /usr/lib64/libc-2.28.so)/usr/lib64/libc-2.28.so) probe_libc:freeprobe_libc:free (on(on freefree inin /usr/lib64/libc-2.28.so)/usr/lib64/libc-2.28.so) probe_libc:free_1probe_libc:free_1 (on(on freefree inin /usr/lib64/libc-2.28.so)/usr/lib64/libc-2.28.so)

$$ sudosudo perfperf statstat -e-e 'probe_libc:malloc*''probe_libc:malloc*' -e-e 'probe_libc:free*''probe_libc:free*' -e-e 'probe_libc:realloc*''probe_libc:realloc*' -p-p 1815418154 25.54025.540 probe_libc:malloc_1probe_libc:malloc_1 00 probe_libc:mallocprobe_libc:malloc 00 probe_libc:freeprobe_libc:free 29.94829.948 probe_libc:free_1probe_libc:free_1 00 probe_libc:reallocprobe_libc:realloc 309309 probe_libc:realloc_1probe_libc:realloc_1

plain 0.566 sec ltrace 3.396 sec perf 0.705 sec

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 perf record / report

$$ perfperf recordrecord -Fmax-Fmax --call-graph--call-graph lbrlbr \\ python3python3 -c-c "import"import requests;requests; requests.get('https://www.python.org')"requests.get('https://www.python.org')"

$$ perfperf reportreport $$ perfperf annotateannotate

$$ perfperf scriptscript || stackcollapse-perf.plstackcollapse-perf.pl >> out.perf-folded-fullout.perf-folded-full $$ flamegraph.plflamegraph.pl out.perf-folded-fullout.perf-folded-full >> perf-calls-full.svgperf-calls-full.svg

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Reset Zoom Flame Graph Search

[l.. _.. [libpy.. _.. [libpytho.. _.. [libpython.. _.. [libpython.. _.. [libpython3.. _.. [libpython3.. _.. _PyMethodDe.. _.. _.. _PyCFunctio.. _.. _.. _PyEval_EvalFr.. _.. P.. _PyEval_EvalCo.. _.. _.. _.. _.. P.. _Py.. _PyFunction_FastC.. _.. _.. _.. _.. [l.. _Py.. _PyEval_EvalFrameD.. _Py.. [.. _P.. _.. _.. _P.. _P.. _PyFunction_FastCallKeyw.. _Py.. _.. [.. P.. [l.. _P.. _.. _.. [lib.. _P.. _.. _P.. _PyEval_EvalFrameDefault _Py.. _Py_.. [lib.. [libpytho.. _Py.. _.. _P.. [lib.. _P.. _.. _P.. _PyFunction_FastCallKeywords _Py.. _Py_.. PE.. [libpyth.. [libpython3.7m... _Py.. [.. _P.. _PyO.. _PyFun.. _P.. _PyEval_EvalFrameDefault _PyF.. [libp.. X50.. [libpyth.. _PyGC_CollectNo.. _Py.. _.. _.. [l.. [libp.. _PyEval_.. _PyFunction_FastCallKeywords _PyE.. [libp.. dl.. by_.. PyGC_Col.. PyImport_Cleanup _Py.. P.. _P.. _P.. _PyMet.. _PyEval_E.. _PyEval_EvalFrameDefault _PyF.. _Py_U.. _d.. X50.. Py_FinalizeEx _Py.. _PyEval.. Py.. _PyCFu.. _PyFunction_FastCallKeywords _PyEva.. _.. __lib.. _d.. [_s.. [libpython3.7m.so.1.0] _PyEval.. _PyEval_EvalFrameDefault _PyFun.. _PyF.. _start python3

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 perf record / report (2)

$$ python3python3 >>>>>> importimport os,os, requestsrequests >>>>>> os.getpid()os.getpid() 1941419414 >>>>>> requests.get('https://www.python.org')requests.get('https://www.python.org') [200]>

$$ perfperf recordrecord -Fmax-Fmax --call-graph--call-graph lbrlbr -p-p 1941419414 ^C[^C[ perfperf record:record: WokenWoken upup 11 timestimes toto writewrite datadata ]] $$ perfperf scriptscript || stackcollapse-perf.plstackcollapse-perf.pl >> out.perf-folded-partialout.perf-folded-partial $$ flamegraph.plflamegraph.pl out.perf-folded-partialout.perf-folded-partial >> perf-calls-partial.svgperf-calls-partial.svg

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Reset Zoom Flame Graph Search

39%

as.. asn.. asn.. asn.. EC.. asn1.. AS.. a.. eck.. asn1.. AS.. as.. x.. _.. as.. ecke.. asn1.. ASN.. asn1_.. _.. asn.. x509_.. ASN1.. x509_name_canon asn1_i.. [.. asn.. pubke.. x509_name_ex_d2i asn1_t.. P.. asn1_item_embed_d2i asn1_i.. _P.. asn1_template_noexp_d2i m.. _.. ASN1_i.. m.. [li.. _.. asn1_template_ex_d2i m.. [.. X509_O.. _.. _PyEva.. _.. asn1_item_embed_d2i ms.. _.. OPENSS.. [.. _PyFun.. _.. asn1_template_noexp_d2i ms.. _.. X509_S.. [.. _PyEval_.. [.. a.. asn1_template_ex_d2i __.. _.. SSL_CT.. P.. _PyEval_.. P.. A.. select@.. asn1_item_embed_d2i in.. _P.. [_ssl... _.. _PyFunctio.. _P.. A.. [readline... ASN1_item_ex_d2i B.. EVP_D.. X5.. _Py.. [libpy.. _.. P.. _PyEval_Eva.. [li.. P.. PyOS_Readl.. ASN1_item_d2i PEM_read_bio_ex x5.. _Py.. [libpy.. _.. _P.. [l.. _PyEval_Eva.. _PyEva.. X.. PyTokenize.. PEM_X509_INFO_read_bio X5.. _Py.. [libpyt.. _Py.. _P.. _P.. _PyFunction_FastCall.. b.. PyParser_P.. o.. X509_load_cert_crl_file _Py.. _PyEval_Eval.. _P.. _P.. _PyEval_EvalFrameDefault X.. PyParser_A.. state.. by_file_ctrl _PyFunction_FastCa.. _P.. [l.. _PyEval_EvalCodeWithName _Py.. [_s.. [libpython.. SSL_d.. X509_STORE_load_locations _PyEval_EvalFrameDe.. Py.. Py.. _PyFunction_FastCallKeywords _.. _Py.. PyRun_Inte.. [_ssl.cpython-37m-x86_64-linux-gnu.so] _PyEval_EvalCodeWit.. _PyEval_EvalFrameDefault _P.. _Py.. python3

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Advanced Analysis in Kernel space BCC – BPF Compiler Collection

A toolkit for creating efficient kernel tracing and manipulation programs. ● Run tracing code in Kernel space ● extended BPF (Berkeley Packet Filters) ● eBPF JIT 2014 ● Usable Kernel 4.12 - 4.15, 2017 ● C with Python and LUA frontends ● rich set of existing tools

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 BCC: ext4slow, tcpconnect

Slow ext4fs operations

## ./ext4slower./ext4slower 11 TracingTracing ext4ext4 operationsoperations slowerslower thanthan 11 msms TIMETIME COMMCOMM PIDPID TT BYTESBYTES OFF_KBOFF_KB LAT(ms)LAT(ms) FILENAMEFILENAME 18:44:1818:44:18 bashbash 45734573 RR 128128 00 6.456.45 balooctlbalooctl 18:44:1918:44:19 IndexedDBIndexedDB #198#198 45714571 SS 00 00 8.908.90 583651055lp.sqlite-wal583651055lp.sqlite-wal 18:44:2118:44:21 IndexedDBIndexedDB #198#198 45714571 SS 00 00 10.0210.02 583651055lp.sqlite-wal583651055lp.sqlite-wal 18:44:2118:44:21 IndexedDBIndexedDB #198#198 45714571 SS 00 00 3.903.90 583651055lp.sqlite583651055lp.sqlite 18:44:4318:44:43 mozStoragemozStorage #5#5 45714571 WW 2424 288288 6.106.10 cookies.sqlite-walcookies.sqlite-wal

TCP connections for user id 1000

## ./tcpconnect./tcpconnect -u-u 10001000 PIDPID COMMCOMM IPIP SADDRSADDR DADDRDADDR DPORTDPORT 48744874 Chrome_IOThrChrome_IOThr 44 192.168.7.168192.168.7.168 107.170.8.46107.170.8.46 8080 48744874 Chrome_IOThrChrome_IOThr 44 192.168.7.168192.168.7.168 107.170.8.46107.170.8.46 443443 48744874 Chrome_IOThrChrome_IOThr 44 192.168.7.168192.168.7.168 107.170.8.46107.170.8.46 443443 48744874 Chrome_IOThrChrome_IOThr 44 192.168.7.168192.168.7.168 107.170.8.46107.170.8.46 443443 48744874 Chrome_IOThrChrome_IOThr 44 192.168.7.168192.168.7.168 107.170.8.46107.170.8.46 443443 48744874 Chrome_IOThrChrome_IOThr 44 192.168.7.168192.168.7.168 107.170.8.46107.170.8.46 443443

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 BCC: sslsniff

>>>>>> requests.get("https://ep2019.europython.eu/",requests.get("https://ep2019.europython.eu/", headers={'Accept-Encoding':headers={'Accept-Encoding': 'identity'})'identity'})

## ./sslsniff./sslsniff -p18888-p18888 FUNCFUNC TIME(s)TIME(s) COMMCOMM PIDPID LENLEN WRITE/SENDWRITE/SEND 0.0000000000.000000000 python3python3 1888818888 146146 ------DATADATA ------GETGET // HTTP/1.1HTTP/1.1 Host:Host: ep2019.europython.euep2019.europython.eu User-Agent:User-Agent: python-requests/2.20.0python-requests/2.20.0 Accept-Encoding:Accept-Encoding: identityidentity Accept:Accept: */**/* Connection:Connection: keep-alivekeep-alive ------ENDEND DATADATA ------

READ/RECVREAD/RECV 0.1590230410.159023041 python3python3 1888818888 81928192 ------DATADATA ------HTTP/1.1HTTP/1.1 200200 OKOK Server:Server: nginxnginx Date:Date: Sun,Sun, 0707 JulJul 20192019 19:18:4619:18:46 GMTGMT Content-Type:Content-Type: text/html;text/html; charset=utf-8charset=utf-8 Content-Length:Content-Length: 3361733617 Connection:Connection: keep-alivekeep-alive X-Frame-Options:X-Frame-Options: SAMEORIGINSAMEORIGIN ETag:ETag: "c35dc4fdb282b799d8d77703a7553424""c35dc4fdb282b799d8d77703a7553424" Vary:Vary: Accept-Language,Accept-Language, CookieCookie Content-Language:Content-Language: enen Set-Cookie:Set-Cookie: django_language=en;django_language=en; expires=Mon,expires=Mon, 06-Jul-202006-Jul-2020 19:18:4619:18:46 GMT;GMT; Max-Age=31536000;Max-Age=31536000; Path=/Path=/

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 BPFtrace https://github.com/iovisor/bpftrace

$$ sudosudo bpftracebpftrace -e-e \\ 'tracepoint:syscalls:'tracepoint:syscalls:sys_enter_openatsys_enter_openat {{ printf("%s(%d)printf("%s(%d) %s\n",%s\n", comm,comm, pid,pid, str(args->filename));str(args->filename)); }'}' irqbalance(1329)irqbalance(1329) /proc/interrupts/proc/interrupts

>>>>>> requests.get("https://ep2019.europython.eu/")requests.get("https://ep2019.europython.eu/") $$ sudosudo bpftracebpftrace -e-e 'uprobe:/lib64/libc.so.6:'uprobe:/lib64/libc.so.6:mallocmalloc // commcomm ==== ""python3python3"" // {{ @bytes@bytes == hist(arg0)hist(arg0);; }'}' @bytes:@bytes: [1][1] 171171 || || [2,[2, 4)4) 661661 |@@@|@@@ || [4,[4, 8)8) 356356 |@|@ || [8,[8, 16)16) 16991699 |@@@@@@@@|@@@@@@@@ || [16,[16, 32)32) 79297929 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ || [32,[32, 64)64) 1051310513 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@||@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [64,[64, 128)128) 15741574 |@@@@@@@|@@@@@@@ || [128,[128, 256)256) 609609 |@@@|@@@ || [256,[256, 512)512) 10061006 |@@@@|@@@@ || [512,[512, 1K)1K) 608608 |@@@|@@@ || [1K,[1K, 2K)2K) 362362 |@|@ || [2K,[2K, 4K)4K) 5757 || || [4K,[4K, 8K)8K) 1010 || || [8K,[8K, 16K)16K) 1212 || || [16K,[16K, 32K)32K) 1313 || || [32K,[32K, 64K)64K) 22 || || [64K,[64K, 128K)128K) 11 || ||

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 SystemTap

● Initial release 2005 ● stap ● Scripting Language for dynamic instrumentation ● Kernel module ● DynInst ● eBPF (extended ) programs ● additional features: ● USDT / DTrace probes (Userland Statically Defined Tracing) ● Cross-VM instrumentation ● Prometheus

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 USDT (dtrace / systemtap)

$$ stapstap -l-l 'process("/usr/lib64/libpython3.7m*").mark("*")''process("/usr/lib64/libpython3.7m*").mark("*")' process("/usr/lib64/libpython3.7m.so.1.0").mark("function__entry")process("/usr/lib64/libpython3.7m.so.1.0").mark("function__entry") process("/usr/lib64/libpython3.7m.so.1.0").mark("function__return")process("/usr/lib64/libpython3.7m.so.1.0").mark("function__return") process("/usr/lib64/libpython3.7m.so.1.0").mark("gc__done")process("/usr/lib64/libpython3.7m.so.1.0").mark("gc__done") process("/usr/lib64/libpython3.7m.so.1.0").mark("gc__start")process("/usr/lib64/libpython3.7m.so.1.0").mark("gc__start") process("/usr/lib64/libpython3.7m.so.1.0").mark("import__find__load__done")process("/usr/lib64/libpython3.7m.so.1.0").mark("import__find__load__done") process("/usr/lib64/libpython3.7m.so.1.0").mark("import__find__load__start")process("/usr/lib64/libpython3.7m.so.1.0").mark("import__find__load__start") process("/usr/lib64/libpython3.7m.so.1.0").mark("line")process("/usr/lib64/libpython3.7m.so.1.0").mark("line")

$$ stapstap -l-l 'process.provider("php").mark("*")''process.provider("php").mark("*")' -c-c /usr/bin/php/usr/bin/php process("/usr/bin/php").provider("php").mark("compile__file__entry")process("/usr/bin/php").provider("php").mark("compile__file__entry") process("/usr/bin/php").provider("php").mark("compile__file__return")process("/usr/bin/php").provider("php").mark("compile__file__return") process("/usr/bin/php").provider("php").mark("error")process("/usr/bin/php").provider("php").mark("error") process("/usr/bin/php").provider("php").mark("exception__caught")process("/usr/bin/php").provider("php").mark("exception__caught") process("/usr/bin/php").provider("php").mark("exception__thrown")process("/usr/bin/php").provider("php").mark("exception__thrown") process("/usr/bin/php").provider("php").mark("execute__entry")process("/usr/bin/php").provider("php").mark("execute__entry") process("/usr/bin/php").provider("php").mark("execute__return")process("/usr/bin/php").provider("php").mark("execute__return") process("/usr/bin/php").provider("php").mark("function__entry")process("/usr/bin/php").provider("php").mark("function__entry") process("/usr/bin/php").provider("php").mark("function__return")process("/usr/bin/php").provider("php").mark("function__return") process("/usr/bin/php").provider("php").mark("request__shutdown")process("/usr/bin/php").provider("php").mark("request__shutdown") process("/usr/bin/php").provider("php").mark("request__startup")process("/usr/bin/php").provider("php").mark("request__startup")

$$ stapstap -l-l 'process("/usr/lib/jvm/java*/jre/lib/amd64/server/libjvm.so").mark("*")''process("/usr/lib/jvm/java*/jre/lib/amd64/server/libjvm.so").mark("*")' || wcwc -l-l 521521

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Tracing with stap

$$ sudosudo stapstap python-import.stppython-import.stp -c-c ""python3python3 -c-c passpass"" PassPass 1:1: parsedparsed useruser scriptscript andand 496496 librarylibrary scriptsscripts ...... PassPass 2:2: analyzedanalyzed script:script: 22 probes,probes, 44 functions,functions, 44 embeds,embeds, 22 globalsglobals ...... PassPass 3:3: translatedtranslated toto CC intointo "/tmp/stapz4MP9b/stap_d6d27027af93310fa02967b0e5910963_5289_src.c""/tmp/stapz4MP9b/stap_d6d27027af93310fa02967b0e5910963_5289_src.c" ...... PassPass 4:4: compiledcompiled CC intointo "stap_d6d27027af93310fa02967b0e5910963_5289.ko""stap_d6d27027af93310fa02967b0e5910963_5289.ko" ...... PassPass 5:5: startingstarting run.run. ERROR:ERROR: Couldn'tCouldn't insertinsert modulemodule '/tmp/stapz4MP9b/stap_d6d27027af93310fa02967b0e5910963_5289.ko':'/tmp/stapz4MP9b/stap_d6d27027af93310fa02967b0e5910963_5289.ko': OperationOperation notnot permittedpermitted $$ dmesgdmesg [335808.816759][335808.816759] Lockdown:Lockdown: staprun:staprun: LoadingLoading ofof unsignedunsigned modulemodule isis restrictedrestricted;; seesee manman kernel_lockdown.7kernel_lockdown.7

$$ sudosudo systemctlsystemctl rebootreboot --firmware-setup--firmware-setup

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Trace Python imports

globalglobal depthsdepths == 0;0; globalglobal timingtiming

probeprobe process("python3").library("libpython3.7m.so.1.0").mark("process("python3").library("libpython3.7m.so.1.0").mark("import__find__load__startimport__find__load__start")") {{ modnamemodname == user_string($arg1);user_string($arg1); nownow == local_clock_ns()local_clock_ns() timing[tid(),timing[tid(), depths]depths] == nownow printf("%*s*printf("%*s* ImportingImporting '%s''%s' ...\n",...\n", 2*depths,2*depths, "","", modname);modname); depths++;depths++; }}

probeprobe process("python3").library("libpython3.7m.so.1.0").mark("process("python3").library("libpython3.7m.so.1.0").mark("import__find__load__doneimport__find__load__done")") {{ modnamemodname == user_string($arg1);user_string($arg1); foundfound == $arg2;$arg2; depths--;depths--; nownow == local_clock_ns()local_clock_ns() durdur == nownow -- timing[tid(),timing[tid(), depths]depths] ifif (found)(found) printf("%*sprintf("%*s++ ImportedImported '%s''%s' inin %ldus\n",%ldus\n", 2*depths,2*depths, "","", modname,modname, dur/1000);dur/1000); elseelse printf("%*sprintf("%*s-- FailedFailed '%s''%s' inin %ldus\n",%ldus\n", 2*depths,2*depths, "","", modname,modname, dur/1000);dur/1000); }}

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Trace Python imports

$$ sudosudo stapstap python-import.stppython-import.stp -c-c ""python3python3 -c-c passpass"" ** ImportingImporting 'zipimport''zipimport' ...... ++ ImportedImported 'zipimport''zipimport' inin 125us125us ...... ** ImportingImporting 'encodings''encodings' ...... ** ImportingImporting 'codecs''codecs' ...... ** ImportingImporting '_codecs''_codecs' ...... ++ ImportedImported '_codecs''_codecs' inin 187us187us ++ ImportedImported 'codecs''codecs' inin 1273us1273us ** ImportingImporting 'encodings.aliases''encodings.aliases' ...... ++ ImportedImported 'encodings.aliases''encodings.aliases' inin 836us836us ++ ImportedImported 'encodings''encodings' inin 3263us3263us ...... ** ImportingImporting 'site''site' ...... ** ImportingImporting 'sitecustomize''sitecustomize' ...... -- FailedFailed 'sitecustomize''sitecustomize' inin 149us149us ** ImportingImporting 'usercustomize''usercustomize' ...... -- FailedFailed 'usercustomize''usercustomize' inin 144us144us ++ ImportedImported 'site''site' inin 28946us28946us

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Verdict

✔ extremely detailed information ✔ fast, efficient ✔ apps, system-wide, hardware events ✔ wide range from pre-build tools to custom code ✗ extremely detailed information ✗ learning curve ✗ turn your 64 core server into a Commodore C64

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Summary In my humble opinion

● strace, bpftrace: Swiss Army Knife for simple tasks ● bcc: pre-build tools ● perf: benchmarking and hot-spot profiling ● SystemTap: dynamic languages (Python) ● ftrace: tracing on old Kernels ● future: eBFP

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Resources

● Brendan Gregg http://www.brendangregg.com/ ● eBPF ● IOVisor Project https://www.iovisor.org/ ● Sergey Klyaus: “Dynamic Tracing with DTrace & SystemTap” https://myaut.github.io/dtrace-stap-book/ ● SystemTap beginners guide https://sourceware.org/systemtap/SystemTap_Beginners_Guide/ ● Eben Freeman, PyBay16: Python Tracing Superpowers https://speakerdeck.com/emfree/python-tracing-superpowers-with-systems-tools ● Slides: https://speakerdeck.com/tiran/

Profiling and Tracing, EuroPython 2019, @ChristianHeimes, CC BY-SA 4.0 Questions?

@ChristianHeimes [email protected] [email protected] https://speakerdeck.com/tiran/ THANK YOU

plus.google.com/+RedHat facebook.com/redhatinc

linkedin.com/company/red-hat twitter.com/RedHat

youtube.com/user/RedHatVideos