Performance Profiling Tool

Minsoo Ryu

Real-Time Computing and Communications Lab., Hanyang University


Real-Time Computing and Communications Lab., Hanyang University http://rtcc.hanyang.ac.kr

Outline

■ Example source
■ Profiling
  ▪ Perf
  ▪ Gprof
  ▪ Oprofile
■ Tracing
  ▪ Strace
  ▪ Ltrace
  ▪ Ftrace

Example Source

■ UDP program
  ▪ Server
    • ./UDP_server [port]
  ▪ Client1 / Client2
    • ./UDP_client [ip address] [port] [user name]

[Figure: Procedure of UDP socket programming. Server and client columns, a data send/receive exchange between them, then exit.]
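The procedure in the figure can be sketched in a few lines. The example below is a minimal illustration in Python, not the lecture's UDP_server/UDP_client sources (which are not reproduced here). It runs both ends in one process over the loopback interface. The function name `udp_round_trip`, the echo behavior, and the use of an OS-chosen port are assumptions made for the example.

```python
import socket


def udp_round_trip(message: bytes) -> bytes:
    """Send one datagram from a client socket to a server socket and echo it back."""
    # Server side: create a UDP socket and bind it to a local port.
    # Port 0 asks the OS for any free port (an assumption for this sketch;
    # the lecture's server takes the port as a command-line argument).
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    server.settimeout(2.0)
    server_addr = server.getsockname()

    # Client side: create a UDP socket. UDP needs no connection setup.
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)
    try:
        # Data send/receive: client -> server, then server -> client.
        client.sendto(message, server_addr)
        data, client_addr = server.recvfrom(1024)
        server.sendto(data, client_addr)
        reply, _ = client.recvfrom(1024)
        return reply
    finally:
        # Exit: both sides close their sockets.
        client.close()
        server.close()


if __name__ == "__main__":
    print(udp_round_trip(b"hello"))
```

Because UDP is connectionless, there is no listen/accept step: the server learns the client's address from `recvfrom` and replies to it with `sendto`.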

Outline

■ Example source
■ Profiling
  ▪ Perf
  ▪ Gprof
  ▪ Oprofile
■ Tracing
  ▪ Strace
  ▪ Ltrace
  ▪ Ftrace

Profiling

Perf

■ What is perf
  ▪ Performance counters for Linux
  ▪ Profilers such as perf collect data through a variety of techniques
    • Hardware interrupts, code instrumentation, instruction set simulation, operating system hooks, performance counters
  ▪ Works with PMU (performance monitoring unit) information, with help from the CPU
    • This is why the user-level program is included in the kernel source tree
    • Perf is closely tied to the kernel ABI

Perf

■ Perf install
  ▪ sudo apt-get install linux-tools-common
  ▪ sudo apt-get install linux-tools-3.19.0-25-generic
    • linux-cloud-tools-3.19.0-25-generic
    • The version in the package name should match the running kernel (see uname -r)

Perf

■ Usage of perf
  ▪ perf [option] <command>

  Command    Description
  list       List all symbolic event types
  stat       Run a command and gather performance counter statistics
  top        Generate and display a performance counter profile in real time
  record     Run a command and record its profile into perf.data
  report     Read perf.data and display the profile
  annotate   Read perf.data and display annotated code
  diff       Read two perf.data files and display the differential profile

Perf

■ Usage of perf
  ▪ A modifier can be appended to a hardware event to limit its scope
    • u: count events at user level
    • k: count events at kernel level
    • h: count events in the hypervisor
    • H: count events in the host machine
    • G: count events in the guest machine
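As an illustration of the modifier syntax, the sketch below appends a modifier to an event name (for example `cycles:u`) and builds a `perf stat -e` command line. It is written in Python only to keep the example self-contained. The helper names and the sample client arguments are assumptions, and the command is printed, not executed.

```python
# Modifiers listed on the slide, mapped to a short description.
MODIFIERS = {
    "u": "user level",
    "k": "kernel level",
    "h": "hypervisor",
    "H": "host machine",
    "G": "guest machine",
}


def with_modifier(event: str, modifier: str) -> str:
    """Return the event with a scope modifier appended, e.g. 'cycles:u'."""
    if modifier not in MODIFIERS:
        raise ValueError(f"unknown modifier: {modifier!r}")
    return f"{event}:{modifier}"


def perf_stat_argv(events, modifier, command):
    """Build the argument list for 'perf stat -e <events> -- <command>'."""
    spec = ",".join(with_modifier(e, modifier) for e in events)
    return ["perf", "stat", "-e", spec, "--"] + list(command)


if __name__ == "__main__":
    # Hypothetical client invocation: address, port and user name are made up.
    argv = perf_stat_argv(
        ["cycles", "instructions"], "u",
        ["./UDP_client", "127.0.0.1", "9000", "alice"],
    )
    print(" ".join(argv))
```

Counting the same events once with `u` and once with `k` is a simple way to see how the work splits between the program and the kernel.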

Perf Stat

■ Perf stat
  ▪ Task-clock
    • 5260.179993 ms of task clock (CPU time used), 0.134 CPUs utilized
  ▪ Context-switches
    • 1,592 context switches (0.303 K/sec)
  ▪ Page-faults
    • 98 page faults (0.019 K/sec)
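The rates in parentheses follow from the counts and the task-clock value. The quick check below uses the figures quoted on the slide. It assumes that task-clock is in milliseconds and that the context-switch count is 1,592, which are the readings under which both rates come out as shown.

```python
def rate_k_per_sec(count: int, task_clock_ms: float) -> float:
    """Events per second, in thousands, over the CPU time the task consumed."""
    seconds = task_clock_ms / 1000.0
    return count / seconds / 1000.0


TASK_CLOCK_MS = 5260.179993  # task-clock value from the slide
CONTEXT_SWITCHES = 1592      # context-switch count from the slide
PAGE_FAULTS = 98             # page-fault count from the slide

if __name__ == "__main__":
    cs = rate_k_per_sec(CONTEXT_SWITCHES, TASK_CLOCK_MS)
    pf = rate_k_per_sec(PAGE_FAULTS, TASK_CLOCK_MS)
    print(f"context-switches: {cs:.3f} K/sec")
    print(f"page-faults:      {pf:.3f} K/sec")
```

Both results round to the values on the slide (0.303 and 0.019 K/sec). The 0.134 CPUs utilized figure is the task clock divided by the elapsed wall-clock time, so the run lasted roughly 39 seconds.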

Perf Top

■ Perf top
  ▪ Analyzes the system at run time
    • Similar to the top command in Linux
  ▪ Provides real-time monitoring of the system

# perf top

Perf Record

■ Perf record
  ▪ Records events
  ▪ Recorded data is saved as perf.data by default
  ▪ Use cases
    • Record a specific command in detail
    • Analyze a suspicious process in detail
    • Determine the cause of a process's poor performance

# perf record [option] [executable file]

# ls perf.data

Perf Report

■ Perf report
  ▪ Displays the data recorded in the perf.data file

# perf report

Perf Annotate & Diff

■ Perf annotate
  ▪ Reads perf.data and displays annotated, disassembled code
  ▪ Use cases
    • Identify the time-consuming parts of the code

# perf annotate

■ Perf diff
  ▪ Reads two perf.data files and displays the differential profile
  ▪ Use cases
    • See the differences between a new perf.data and an older one

# perf diff

Assignment 1

■ Analyze the client1 sample application using perf
  ▪ Using perf stat
  ▪ Using perf record / report
  ▪ Using perf top

■ Submit the recorded perf.data file and a screenshot
■ Submit screenshots of the stat and top output

Gprof

■ What is gprof
  ▪ Gprof is used together with gcc (it ships in the binutils package)
  ▪ Shows which functions carry a heavy load

■ Gprof package install
  ▪ sudo apt-get install binutils
    • Binutils dependencies
      ➢ Bash, coreutils, diffutils, gcc, gettext, glibc, grep, make, perl, sed, texinfo

Gprof

■ How gprof works
  ▪ Gprof provides statistical information
    • Records the time spent in each function from entry to exit
      ➢ Every function is recorded together with the number of times it is called during program execution
      ➢ -pg: compiler option that inserts a call to the profiling function (mcount) into each function

Gprof

■ Internal operating concept
  ▪ Timer
    • Samples the PC (program counter) at a fixed interval, typically every 10 ms
    • Each sample is triggered by a SIGPROF signal
      ➢ setitimer is called before the main function
    • The signal handler increments the counter for the sampled PC

  ▪ Enter/exit hooking
    • The mcount function is called on every function call
    • Creates the call graph
      ➢ Uses the PC values before and after a function call
      ➢ Records the exact number of function calls
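The timer half of this mechanism can be demonstrated without gprof itself. The sketch below is a conceptual illustration in Python, not gprof's implementation: it arms a profiling timer with `setitimer`, and the SIGPROF handler counts which function was running when each sample arrived. The 10 ms interval, the function names `fast` and `slow`, and the workload are assumptions for the example.

```python
import signal
import time
from collections import Counter

samples = Counter()


def on_sigprof(signum, frame):
    # Plays the role of the histogram update: record where the program
    # was executing when the profiling timer fired.
    if frame is not None:
        samples[frame.f_code.co_name] += 1


def fast():
    return sum(range(1_000))


def slow():
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def workload(cpu_seconds: float) -> None:
    # Keep calling both functions until the requested CPU time is used up.
    deadline = time.process_time() + cpu_seconds
    while time.process_time() < deadline:
        fast()
        slow()


def profile(cpu_seconds: float = 0.5, interval: float = 0.01) -> Counter:
    samples.clear()
    old_handler = signal.signal(signal.SIGPROF, on_sigprof)
    # ITIMER_PROF counts CPU time used by the process and raises SIGPROF
    # each time the interval expires.
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        workload(cpu_seconds)
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)   # disarm the timer first
        signal.signal(signal.SIGPROF, old_handler)   # then restore the handler
    return Counter(samples)


if __name__ == "__main__":
    for name, count in profile().most_common():
        print(f"{name:10s} {count} samples")
```

Most samples land in `slow`, which is how a sampling profiler estimates where time goes. Call counts cannot be obtained this way, which is why the entry hook (mcount) is needed as well.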

Gprof

■ Gprof profile categories
  ▪ Flat profile
    • Shows the CPU time used by each function and the number of times it was called
    • Summarizes the overall profiling information
    • Helps decide which functions to modify to improve performance
      ➢ Shows the execution time and call count of each function in the program
  ▪ Call graph
    • Suggests which function calls could be eliminated or replaced with more efficient alternatives
    • Shows the relationships between functions and can reveal hidden bugs
    • Helps optimize specific code paths once they have been checked
    • Shows the details
      ➢ Call frequency, time spent in subroutines, and so on

Gprof

■ How to run gprof

# gcc -o a.out a.c -pg
    • -pg: adds the profiling function call to each function

# ./a.out

# ls gmon.out

# gprof a.out gmon.out

  ▪ Additional options
    • -l: line-by-line profiling of the source code
    • -l -A -x: print the source code annotated with line-by-line times
    • -F: print the call graph for a particular function

Gprof

■ Gprof flat profile
  ▪ The Slow function accounts for most of the time
  ▪ F function
    • Called 1 time
    • 3.26 milliseconds per call on average
  ▪ G function
    • Called 1 time
    • 13.02 milliseconds per call on average

Gprof

■ Gprof contents
  ▪ % time
    • The percentage of the total running time of the program used by this function
  ▪ Cumulative seconds
    • A running sum of the number of seconds accounted for by this function and those listed above it
  ▪ Self seconds
    • The number of seconds accounted for by this function alone
      ➢ This is the major sort for this listing
  ▪ Calls
    • The number of times this function was invoked
      ➢ If this function is profiled, else blank

Gprof

■ Gprof contents
  ▪ Self ms/call
    • The average number of milliseconds spent in this function per call
      ➢ If this function is profiled, else blank
  ▪ Total ms/call
    • The average number of milliseconds spent in this function and its children per call
      ➢ If this function is profiled, else blank
  ▪ Name
    • The name of the function
      ➢ This is the minor sort for this listing
      ➢ The index shows the location of the function in the gprof listing
      ➢ If the index is in parentheses, it shows where it would appear in the gprof listing if it were to be printed
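The relationships between these columns are simple arithmetic. The sketch below derives them from per-function measurements. It is an illustration in Python, not gprof code, and the function names and timings in the usage example are hypothetical values chosen to keep the numbers easy to follow.

```python
def flat_profile(self_seconds, calls, total_seconds):
    """Derive flat-profile columns from per-function measurements.

    self_seconds:  {name: seconds spent in the function itself}
    calls:         {name: number of invocations}; a missing entry models
                   a function that was not profiled (blank columns)
    total_seconds: {name: seconds spent in the function and its children}
    """
    program_total = sum(self_seconds.values())
    rows = []
    cumulative = 0.0
    # Major sort: self seconds, descending. Minor sort: name.
    for name in sorted(self_seconds, key=lambda n: (-self_seconds[n], n)):
        self_s = self_seconds[name]
        n_calls = calls.get(name)
        cumulative += self_s  # this function plus those listed above it
        rows.append({
            "pct_time": 100.0 * self_s / program_total,
            "cumulative_s": cumulative,
            "self_s": self_s,
            "calls": n_calls,
            "self_ms_per_call": 1000.0 * self_s / n_calls if n_calls else None,
            "total_ms_per_call": 1000.0 * total_seconds[name] / n_calls if n_calls else None,
            "name": name,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical measurements, not taken from a real gprof run.
    for row in flat_profile(
        self_seconds={"work": 0.50, "driver": 0.25, "helper": 0.25},
        calls={"work": 2, "driver": 1, "helper": 5},
        total_seconds={"work": 0.50, "driver": 1.00, "helper": 0.25},
    ):
        print(row)
```

In this made-up data, `driver` spends only 0.25 s in its own code but 1.00 s including its children, so its total ms/call is four times its self ms/call.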

Gprof

■ Gprof call graph profile

Gprof

■ Meaning of each field in a child row
  ▪ For a child function called by the current function
    • Self: the time spent in the child itself that is attributed to the current function
    • Children: the time spent in the child's own children that is attributed to the current function
    • Called: the number of times the current function called this child / the total number of times the child was called
      ➢ Recursive calls by the child are not included in the total
    • Name: the name of the child function

Gprof

■ Meaning of each field in a parent row
  ▪ For a parent function that calls the current function
    • Self: the time spent in the current function itself that is attributed to this parent
    • Children: the time spent in the current function's children that is attributed to this parent
    • Called: the number of times this parent called the current function / the total number of times the current function was called
      ➢ Recursive calls to the current function are not included in the total
    • Name: the name of the parent function
      ➢ The index number is displayed next to the parent function name
      ➢ If the parent belongs to a cycle, the cycle number is displayed between the name and the index number
      ➢ If the parents cannot be determined, <spontaneous> is displayed in the name field and the remaining fields are blank

Gprof

■ Other gprof features
  ▪ Annotated source listing
    • Provides line-by-line profiling information
    • Useful for deciding which part of the code to optimize
    • Line-by-line profiling combined with the flat profile shows which code paths run most often
      ➢ Check loops and branches to find the loop or branch that executes most often
    • Useful for carefully tuning specific code for optimal performance

Assignment 2

■ Analyze the client2 sample application using gprof
  ▪ Use the flat profile
  ▪ Briefly explain the analysis of client2

■ Submit a screenshot and explain the information it shows

Oprofile

■ What is oprofile
  ▪ Linux system-wide profiler
    • Runs in the background without the user noticing
    • Profile data can be obtained at any time

  ▪ Oprofile features
    • Profiles the CPU usage of a running process
    • Identifies which functions of the running process use a lot of CPU resources
    • If some functions use a lot of resources, the program can be redesigned to reduce calls to them
    • Of little use on systems with low CPU utilization
      ➢ Low CPU utilization systems: I/O-bound servers and so on
    • Typical users: database developers

Oprofile

  ▪ Oprofile features
    • Low-overhead, system-wide performance monitoring tool
      ➢ Provides information about the executable programs in the system
    • Memory usage, number of L2 cache requests, number of transfers
    • Collects a performance-related data sample whenever a counter raises an interrupt, or on a timer
      ➢ Sample data is periodically written to disk
      ➢ Reports on system-level and application-level performance are generated from the recorded data

Oprofile

■ Oprofile install
  ▪ Method 1
    • sudo apt-get install libiberty-dev
    • sudo apt-get install oprofile
  ▪ Method 2
    • wget http://prdownloads.sourceforge.net/oprofile/oprofile-1.1.0.tar.gz
    • tar xvfz oprofile-1.1.0.tar.gz
    • cd oprofile-1.1.0
    • sudo ./configure --with-kernel-support
    • make install

Oprofile

■ Usage of oprofile

  Command        Description
  opcontrol      Configures which data is collected
  op_help        Shows the events available for the system's processor, with a brief description of each
  op_merge       Merges multiple sample files collected from the same program into one
  op_to_source   Creates annotated source for a program, if the application was compiled with debugging symbols
  oprofiled      Runs as a daemon and periodically writes the sample data to disk
  oprofpp        Retrieves the profile data
  op_import      Converts a sample file from a foreign binary format to the system's native format

Oprofile

■ Oprofile setting
  ▪ Configure oprofile before running it
    • Settings for monitoring the kernel
  ▪ Configure oprofile with opcontrol
    • /root/.oprofile/daemonrc: where the settings are stored after the command runs
  ▪ Oprofile initialization
    • opcontrol --init
    • opcontrol --deinit
    • modprobe oprofile timer=1
    • opcontrol --start
      ➢ opcontrol --stop

Oprofile

■ Oprofile setting
  ▪ Oprofile start and stop
    • opcontrol --start: log in as root and run the command
      ➢ Starts monitoring the system with oprofile
      ➢ Uses the settings stored in /root/.oprofile/daemonrc
      ➢ Starts oprofiled, the OProfile daemon
      ➢ The daemon periodically writes the sample data to /var/lib/oprofile/samples/
      ➢ The log is stored in /var/lib/oprofile/oprofiled.log
      ➢ If oprofile is restarted with a different configuration, the sample files of the previous session are automatically saved in /var/lib/oprofile/samples/session-N
        • N is the number of the previously saved session plus 1
    • opcontrol --shutdown
      ➢ Stops profiling and shuts down the daemon

Outline

■ Example source
■ Profiling
  ▪ Perf
  ▪ Gprof
  ▪ Oprofile
■ Tracing
  ▪ Strace
  ▪ Ltrace
  ▪ Ftrace

Tracing

Strace & Ltrace

Strace

■ Tracing system calls and signals
  ▪ Debugging, instructional, and diagnostic utility
  ▪ Used to monitor interactions between processes and the kernel
    • Includes system calls, signal deliveries, and changes of process state
  ▪ Useful for debugging and for analyzing how an application is built
    • Traces kernel system calls

■ Strace install
  ▪ sudo apt-get install strace

Strace

■ Strace options

  Option          Description
  -c              Count time, calls, and errors for each system call and report a summary on program exit
  -f              Trace child processes as they are created by currently traced processes
  -r              Print a relative timestamp upon entry to each system call
  -t              Prefix each line of the trace with the time of day
  -T              Show the time spent in system calls
  -e trace=file   Trace all system calls which take a file name as an argument
  -p pid          Attach to the process with the process ID pid and begin tracing
  -s strsize      Specify the maximum string size to print (the default is 32)
  -S sortby       Sort the output of the histogram printed by the -c option by the specified criterion

Strace

■ Strace client result

Ltrace

■ Tracing shared library calls
  ▪ Useful for debugging and for analyzing how an application is built
    • Traces calls into user-mode libraries

■ Ltrace install
  ▪ sudo apt-get install ltrace

Ltrace

■ Ltrace options

  Option        Description
  -u username   Run the command with the user id, group id and supplementary groups of username
  -t            Show a timestamp on each line
  -T            Show the time spent inside each call
  -p PID        Attach to the running process with the given PID and trace it
  -o filename   Write the trace output to the file
  -e expr       Select the calls to trace by library function name and library file name (+/- to include or exclude)

Ltrace

■ Ltrace client result

Assignment 3

■ Monitor the UDP_server with strace and ltrace
  ▪ System calls
  ▪ Library calls
    ➢ Submit a screenshot of the strace result
    ➢ Explain the server operation from the system call trace
    ➢ Capture a screenshot of the ltrace result
    ➢ Explain the server operation from the library call trace

■ Submit screenshots of tracing client1 and client2 with ltrace
  ▪ Explain the difference between client1 and client2

Ftrace

■ What is ftrace
  ▪ Debugging component for tracing the internal operations of the kernel
    • Traces all functions within the kernel during any period of time
  ▪ Analyzes which events are generated in the kernel
    • While the application is running
  ▪ Analyzes the flow of kernel function calls at the kernel level, one function at a time
  ▪ Can filter the analysis down to only the desired kernel functions
    • Minimizes the debugging overhead

Ftrace

■ How ftrace works
  ▪ Traces kernel functions by building the kernel with the -pg option of the GCC compiler
    • -pg: inserts a call to the mcount routine at the entry of each kernel function

  ▪ Invokes kstop_machine
    • Ensures that the code can be modified without another CPU executing the same code at the same time
    • CONFIG_STOP_MACHINE=y, ./include/linux/stop_machine.h
      ➢ The machine operates like a single core

Ftrace

■ Plugins
  ▪ View the plug-in setting
    • "#> cat current_tracer"
  ▪ Select a plug-in by writing its name to the current_tracer file
■ Events
  ▪ View the event settings
    • "#> cat set_event"
  ▪ Enable or disable events by using the enable file in the events directory

Ftrace

■ Ftrace tracer plugins
  ▪ function: traces all kernel function calls during any period of time
  ▪ function_graph: shows function call relationships in graph form
  ▪ wakeup, wakeup_dl, wakeup_rt: analyze wakeup latency
  ▪ mmiotrace: analyzes memory-mapped I/O
  ▪ irqsoff: analyzes interrupt latency
  ▪ nop: traces nothing

Ftrace

■ Function tracer example
  ▪ By default, ftrace traces all functions
    • 1. cat set_ftrace_filter
      ➢ #### all functions enabled ####: all functions are traced
    • 2. echo CommonTraceWorkHandler > set_ftrace_filter
      ➢ Only the CommonTraceWorkHandler function is traced
    • 3. cat current_tracer
      ➢ nop: no tracer is set
    • 4. echo function > current_tracer
      ➢ Enables the function tracer
    • 5. cat trace | head -15
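The steps above are plain file reads and writes inside the tracing directory, so they can be scripted. The sketch below shows the same sequence in Python. The helper names are assumptions, and the default directory `/sys/kernel/debug/tracing` assumes debugfs is mounted there (on some systems the same files appear under `/sys/kernel/tracing`). It needs root privileges to do anything useful.

```python
import os

# Assumed location of the ftrace control files (debugfs mounted).
TRACING_DIR = "/sys/kernel/debug/tracing"


def write_control(name: str, value: str, base: str = TRACING_DIR) -> None:
    """Equivalent of 'echo value > name' inside the tracing directory."""
    with open(os.path.join(base, name), "w") as f:
        f.write(value + "\n")


def read_head(name: str, lines: int, base: str = TRACING_DIR) -> list:
    """Equivalent of 'cat name | head -N'."""
    out = []
    with open(os.path.join(base, name)) as f:
        for line in f:
            if len(out) >= lines:
                break
            out.append(line.rstrip("\n"))
    return out


def trace_one_function(func: str, lines: int = 15, base: str = TRACING_DIR) -> list:
    write_control("set_ftrace_filter", func, base)      # step 2: limit tracing to one function
    write_control("current_tracer", "function", base)   # step 4: enable the function tracer
    return read_head("trace", lines, base)              # step 5: first lines of the trace


if __name__ == "__main__":
    try:
        for line in trace_one_function("CommonTraceWorkHandler"):
            print(line)
    except OSError as exc:
        # Typical causes: not root, debugfs not mounted, or unknown function name.
        print(f"could not configure ftrace: {exc}")
```

Writing `nop` to `current_tracer` afterwards switches tracing back off.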

Ftrace

■ Ftrace plug-in tracer example
  ▪ Tracers for conditions that affect system response time
    • wakeup, wakeup_rt, irqsoff, preemptoff, preemptirqsoff

  ▪ irqsoff: reports the time during which IRQs are disabled
    • If IRQs stay disabled for a long time, interrupt response slows down
      ➢ This affects the responsiveness seen by application users
    • 1. echo irqsoff > current_tracer
    • 2. cat trace | head -20

  ▪ preemptoff: reports the time during which preemption is disabled
    • 1. echo preemptoff > current_tracer
    • 2. cat trace | head -20

Ftrace

■ Ftrace event tracer example
  ▪ Analyzes events from kernel subsystems
    • Block, ext4, scheduler, IRQ, workqueue
  ▪ Only kernel functions that have a tracepoint hooked in advance can be traced
    • The function is traced if its tracepoint was compiled in
  ▪ Events are implemented differently from plugins
  ▪ Check the SCSI bus trace events when an external hard disk is connected
    • 1. echo 1 > events/scsi/enable
    • 2. cat set_event
    • 3. cat trace | head -20

Ftrace

■ Ftrace stack tracer example
  ▪ Verifies the function call structure in the current kernel
    • 1. echo 1 > /proc/sys/kernel/stack_tracer_enabled
      ➢ Turns the stack tracer on
    • 2. cat stack_trace

Ftrace

■ Ftrace function_graph tracer example
  ▪ Checks kernel function entry and exit by using ftrace
  ▪ Difference between the function_graph tracer and the function tracer
    • function_graph records both the entry and the exit of each function
    • Shows the function's execution time
    • Shows the depth of the function call
    • 1. echo function_graph > current_tracer
    • 2. cat trace | head -15
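Recording both entry and exit is what makes execution time and call depth available. The sketch below illustrates the idea on a hand-made event list. It is a conceptual example in Python, not ftrace's implementation or output format, and the function names and timestamps are hypothetical.

```python
def graph_from_events(events):
    """Turn (timestamp_us, kind, function) events into call records.

    kind is "entry" or "exit". Returns a list of (depth, function,
    duration_us) tuples in the order the functions were entered.
    """
    stack = []   # indices of calls that have been entered but not yet exited
    calls = []   # each item: [depth, function, entry_time, duration]
    for ts, kind, func in events:
        if kind == "entry":
            # Depth is the number of calls still open when this one starts.
            calls.append([len(stack), func, ts, None])
            stack.append(len(calls) - 1)
        elif kind == "exit":
            idx = stack.pop()
            if calls[idx][1] != func:
                raise ValueError(f"unbalanced exit for {func!r}")
            # Execution time is exit timestamp minus entry timestamp.
            calls[idx][3] = ts - calls[idx][2]
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return [(depth, func, duration) for depth, func, _, duration in calls]


if __name__ == "__main__":
    # Hypothetical events: outer() calls inner() once.
    events = [
        (0, "entry", "outer"),
        (2, "entry", "inner"),
        (5, "exit", "inner"),
        (9, "exit", "outer"),
    ]
    for depth, func, duration in graph_from_events(events):
        print("  " * depth + f"{func}: {duration} us")
```

A tracer that records only entries, like the function tracer, can list which functions ran but cannot compute the durations.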

Ftrace

■ Trace-cmd
  ▪ Console-based user interface for the kernel's ftrace
  ▪ Tracing with only echo commands is inconvenient, and trace-cmd is easier to use
  ▪ Trace-cmd removes the need to memorize many commands

Ftrace

■ Trace-cmd
  ▪ Trace-cmd install
    • sudo apt-get install trace-cmd

  ▪ Without trace-cmd
    #> mount -t debugfs nodev /sys/kernel/debug
    #> cd /sys/kernel/debug/tracing
    #> echo function > ./current_tracer
    #> echo 1 > tracing_on
    #> ls /system/
    #> echo 0 > tracing_on

  ▪ With trace-cmd
    #> /sdcard/trace-cmd record -p function ls /system

Ftrace

■ Trace-cmd list
  ▪ trace-cmd list -o
    • Options available for trace-cmd record -O
  ▪ trace-cmd list -p
    • Available plugins
  ▪ trace-cmd list -e
    • Available events

Assignment 4

■ Submit a screenshot of tracing OS scheduling latency over a period of time
  ▪ Hint: sched_wakeup

■ Submit a screenshot showing the intervals during which interrupts are disabled
  ▪ Hint: irqsoff -d
