Linux Performance Profiling Tool
Minsoo Ryu
Real-Time Computing and Communications Lab. Hanyang University
Real-Time Computing and Communications Lab., Hanyang University http://rtcc.hanyang.ac.kr

Outline
- Example source
- Profiling
  • Perf
  • Gprof
  • Oprofile
- Tracing
  • Strace
  • Ltrace
  • Ftrace

Example Source
UDP program
- Server
  • ./UDP_server [port]
- Client1 / client2
  • ./UDP_client [ip address] [port] [user name]
[Figure: Procedure of UDP Socket Programming. Server and client exchange data (send/receive) until exit.]

Profiling

Perf
What is perf
- Performance counters for Linux
- The perf profiler collects data through a variety of techniques
  • Hardware interrupts, code instrumentation, instruction set simulation, operating system hooks, performance counters
- Operates with the help of PMU (performance monitoring unit) information from the CPU
- Why a user-level program is included in the kernel source
  • Perf is closely tied to the kernel ABI

Perf install
- sudo apt-get install linux-tools-common
- sudo apt-get install linux-tools-3.19.0-25-generic
  • linux-cloud-tools-3.19.0-25-generic

Perf usage
- A hardware event can take a modifier that limits its scope
  • u: count events at user level
  • k: count events at kernel level
  • h: count events in the hypervisor
  • H: count events on the host machine
  • G: count events in the guest machine
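
A minimal sketch of how the modifiers might be used: the same hardware event is counted once at user level and once at kernel level. The workload (`sleep 1`) is only a stand-in assumed for illustration; in the lab it would be replaced with the UDP client command line.

```shell
#!/bin/sh
# Stand-in workload (an assumption for this sketch); replace it with the
# program to measure, e.g. ./UDP_client [ip address] [port] [user name].
WORKLOAD="sleep 1"

# The same hardware event with two different modifiers:
#   cycles:u -> cycles counted at user level only
#   cycles:k -> cycles counted at kernel level only
EVENTS="cycles:u,cycles:k"

if command -v perf >/dev/null 2>&1; then
    # perf stat prints its counter summary when the workload exits.
    # $WORKLOAD is left unquoted on purpose so it splits into command + args.
    perf stat -e "$EVENTS" -- $WORKLOAD ||
        echo "perf stat failed (the kernel may restrict counters for this user)" >&2
else
    echo "perf is not installed; see the Perf install slide" >&2
fi
```

On virtual machines the cycle counters may be reported as not supported, in which case a software event such as task-clock can be tried instead.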

Perf stat
- Task-clock
  • Task-clock is 5260.179993 msec; average CPU utilization is 0.134 CPUs
- Context-switches
  • 1,592 context switches (0.303 K/sec)
- Page-faults
  • 98 page faults (0.019 K/sec)

Perf top
- Analyzes the system at run time
  • Similar to the top command in Linux
- Provides real-time monitoring of the system
# perf top

Perf record
- Records events
- Recorded data is saved as perf.data by default
- Use cases
  • Record a specific command in detail
  • Analyze a suspicious process in detail
  • Determine the cause of a process's poor performance
# perf record [option] [execute file]
# ls perf.data

Perf report
- Displays the data recorded in the perf.data file
# perf report
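
The record and report steps can be chained as in the sketch below. The workload and the output location are assumptions made for illustration; `-g`, `-o`, `-i` and `--stdio` are standard perf options.

```shell
#!/bin/sh
# Stand-in workload (assumption); replace with the program under test.
WORKLOAD="sleep 1"
# Output file; perf.data is also the name perf uses by default.
DATA_FILE="${TMPDIR:-/tmp}/perf.data"

if command -v perf >/dev/null 2>&1; then
    # -g also records call chains; -o names the output file.
    if perf record -g -o "$DATA_FILE" -- $WORKLOAD; then
        # --stdio prints a plain-text report instead of the interactive viewer.
        perf report -i "$DATA_FILE" --stdio | head -n 30
    else
        echo "perf record failed (check permissions for perf events)" >&2
    fi
else
    echo "perf is not installed; see the Perf install slide" >&2
fi
```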

Perf annotate
- Reads perf.data and displays annotated disassembled code
- Use cases
  • Identify the time-consuming parts of the source code
# perf annotate

Perf diff
- Reads two perf.data files and displays the difference between the two profiles
- Use cases
  • Compare an updated perf.data against an older one
# perf diff

Assignment 1
Analyze the client1 sample application using perf
- Using perf stat
- Using perf record / report
- Using perf top
Submit the recorded perf.data file and a screenshot
Submit screenshots of perf stat and perf top

Gprof
What is gprof
- Gprof is used with gcc and is included in the binutils package
- Shows which functions carry most of the load

Gprof package install
- sudo apt-get install binutils
  • Binutils dependencies: Bash, coreutils, diffutils, gcc, gettext, glibc, grep, make, perl, sed, texinfo

How gprof works
- Gprof provides statistical information
  • Records the time each function takes from entry to exit
    - The number of calls to every function is recorded during program execution
    - -pg: compiler option that inserts the timing function (mcount)

Internal operating concept
- Timer
  • Samples the program counter (PC) every 10 ms
  • A SIGPROF signal is raised at each interval
    - setitimer is called before the main function
  • The signal handler increments the counter for the sampled PC
- Entry/exit hooking
  • The mcount function is called on each function call
  • Builds the call graph
    - Uses the PC values from before and after the function call
    - Records the exact function calls

Gprof profile categories
- Flat profile
  • Shows the CPU time used by each function and the number of times it was called
  • Summarizes the overall profiling information
  • Helps decide which functions to modify to improve performance
    - Shows the execution time and call count of each function in the program
- Call graph
  • Suggests function calls that could be removed or replaced with more efficient alternatives
  • Shows relationships between functions and can expose hidden bugs
  • Helps optimize a particular code path after inspecting it
  • Shows details
    - Call frequency, time spent in subroutines, and so on

How to run gprof
# gcc -o a.out a.c -pg
  • -pg: adds the timing function (mcount) to each function
# ./a.out
# ls gmon.out
# gprof a.out gmon.out
- Additional options
  • -l: line-by-line profiling
  • -l -A -x: print the source code annotated with line-by-line times
  • -F: print the call graph for a particular function
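
The whole cycle (compile with -pg, run, read gmon.out) can be tried end to end with the sketch below. The demo program and its functions `burn`, `f` and `g` are hypothetical and are not the lab's a.c; they only exist so that two functions with different costs show up in the flat profile.

```shell
#!/bin/sh
# Work in a scratch directory so gmon.out does not clutter the current one.
WORKDIR=$(mktemp -d)
cd "$WORKDIR" || exit 1

# Hypothetical demo program: g does roughly four times the work of f.
cat > demo.c <<'EOF'
#include <stdio.h>

static double burn(long n) {              /* busy loop that consumes CPU time */
    double s = 0.0;
    for (long i = 1; i <= n; i++) s += 1.0 / (double)i;
    return s;
}
double f(void) { return burn(20000000L); }
double g(void) { return burn(80000000L); }

int main(void) {
    printf("%f\n", f() + g());
    return 0;
}
EOF

if command -v gcc >/dev/null 2>&1 && command -v gprof >/dev/null 2>&1; then
    gcc -pg -o demo demo.c                  # -pg inserts the mcount hooks
    ./demo                                  # gmon.out is written on normal exit
    gprof -b demo gmon.out | head -n 20     # -b omits the field explanations
else
    echo "gcc and/or gprof is not installed" >&2
fi
```

Because gprof samples at a coarse interval, very short runs may report zero time for every function; the loop counts here are chosen only to make the run long enough to be sampled.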

Gprof flat profile
- The slow function uses most of the running time
- F function
  • Called 1 time
  • Averages 3.26 milliseconds per call
- G function
  • Called 1 time
  • Averages 13.02 milliseconds per call

Gprof contents
- % time
  • The percentage of the total running time of the program used by this function
- Cumulative seconds
  • A running sum of the number of seconds accounted for by this function and those listed above it
- Self seconds
  • The number of seconds accounted for by this function alone
    - This is the major sort for this listing
- Calls
  • The number of times this function was invoked, if this function is profiled, else blank

Gprof contents (continued)
- Self ms/call
  • The average number of milliseconds spent in this function per call, if this function is profiled, else blank
- Total ms/call
  • The average number of milliseconds spent in this function and its children per call, if this function is profiled, else blank
- Name
  • The name of the function
    - This is the minor sort for this listing
  • The index shows the location of the function in the gprof listing
    - If the index is in parentheses, it shows where it would appear in the gprof listing if it were to be printed

Gprof call graph profile

Meaning of each field in a child row
- When the current function calls the child function
  • Self: the total time spent in the child function
  • Children: the total time spent in the children of the child function
  • Called: the number of times the current function called the child / the total number of times the child was called
    - Recursive calls are not included in the total
  • Name: the name of the child function

Meaning of each field in a parent row
- When the parent function calls the current function
  • Self: the time spent in the current function that is attributed to this parent
  • Children: the time spent in the children of the current function that is attributed to this parent
  • Called: the number of times this parent called the current function / the total number of times the current function was called
    - Recursive calls are not included in the total
  • Name: the name of the parent function
    - The index number is displayed next to the parent function name
    - If the parent belongs to a cycle: the cycle number is displayed between the name and the index number
    - If the parent does not belong to a cycle:

Other gprof features
- Detailed source listing
  • Provides line-by-line profiling information
  • Profiling information helps decide which part to optimize
  • Line-by-line profiling together with the flat profile shows which code paths run most often
    - Check loops and branches to see which loop or branch executes most
  • Useful for carefully tuning code for optimal performance

Assignment 2
Analyze the client2 sample application using gprof
- Using the flat profile
- Briefly explain the analyzed client2
Submit a screenshot and explain the information it shows

Oprofile
What is oprofile
- Linux system-wide profiler
  • Runs in the background without the user noticing
  • Profile data can be obtained at any time
- Oprofile features
  • Profiles the CPU usage of a running process
  • Identifies which functions of the running process use a lot of CPU resources
  • If some functions use a lot of resources, the program can be redesigned to reduce calls to them
  • Of little use on systems with low CPU utilization
    - Low CPU utilization systems: I/O-bound servers and so on
  • Typical users: database developers

- Oprofile features
  • A low-overhead, system-wide performance monitoring tool
    - Provides information about the executable programs on the system
  • Memory usage, the number of L2 cache requests, the number of interrupts sent
  • Collects performance-related samples each time a counter raises an interrupt, or by using a timer
    - Sample data is periodically written to disk
    - Reports on system-level and application-level performance are generated from the recorded data

Oprofile install
- Method 1
  • sudo apt-get install libiberty-dev
  • sudo apt-get install oprofile
- Method 2
  • wget http://prdownloads.sourceforge.net/oprofile/oprofile-1.1.0.tar.gz
  • tar xvfz oprofile-1.1.0.tar.gz
  • cd oprofile-1.1.0
  • sudo ./configure --with-kernel-support
  • make install

Oprofile usage
Command        Description
opcontrol      Configures which data is collected
op_help        Shows the events available for the system's processor, with a brief description of each
op_merge       Merges multiple samples collected from the same program into one
op_to_source   Creates annotated source for a program, if the application was compiled with debugging symbols
oprofiled      Runs as a daemon and periodically writes sample data to disk
oprofpp        Retrieves profile data
op_import      Converts sample data from a foreign binary format to the system's native format

Oprofile setting
- Configure oprofile before running it
  • Settings for monitoring the kernel
- Configure oprofile with opcontrol
  • /root/.oprofile/daemonrc: where the settings are stored after the command runs
- Oprofile initialization
  • opcontrol --init
  • opcontrol --deinit
  • modprobe oprofile timer=1
  • opcontrol --start
  • opcontrol --stop

Oprofile setting
- Oprofile start and stop
  • opcontrol --start: run as root to start monitoring the system with oprofile
    - Uses the settings stored in /root/.oprofile/daemonrc
    - Starts oprofiled, the OProfile daemon
    - The daemon periodically writes the sample data to /var/lib/oprofile/samples/
    - The log is stored in /var/lib/oprofile/oprofiled.log
    - If oprofile is restarted with a different configuration, the sample files of the previous session are automatically saved to /var/lib/oprofile/samples/session-N
      ('N' is the number of the previously saved session plus 1)
  • opcontrol --shutdown: stops the profiler

Tracing

Strace & Ltrace

Strace
Tracing system calls and signals
- A debugging, instructional, and diagnostic user-space utility
- Used to monitor interactions between processes and the kernel
  • Including system calls, signal deliveries, and changes of process state
- Useful for debugging and for analyzing how an application is built
  • Traces kernel system calls

Strace install
- sudo apt-get install strace

Strace
- Options
Option          Description
-c              Count time, calls, and errors for each system call and report a summary on program exit
-f              Trace child processes as they are created by currently traced processes
-r              Print a relative timestamp upon entry to each system call
-t              Prefix each line of the trace with the time of day
-T              Show the time spent in system calls
-e trace=file   Trace all system calls which take a file name as an argument
-p pid          Attach to the process with the process ID pid and begin tracing
-s strsize      Specify the maximum string size to print (the default is 32)
-S sortby       Sort the output of the histogram printed by the -c option by the specified criterion
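
A short sketch combining several of the options above. The traced command (`ls /`) and the summary file location are assumptions chosen for illustration; in the lab the target would be the UDP server or client.

```shell
#!/bin/sh
# Stand-in command to trace (assumption); replace with ./UDP_client ... etc.
WORKLOAD="ls /"
# Where the -c summary is written (assumed location).
SUMMARY_FILE="${TMPDIR:-/tmp}/strace-summary.txt"

if command -v strace >/dev/null 2>&1; then
    # -c: per-system-call counts, times and errors
    # -f: also follow child processes
    # -o: write strace's own output to a file instead of stderr
    strace -c -f -o "$SUMMARY_FILE" $WORKLOAD >/dev/null &&
        head -n 15 "$SUMMARY_FILE"

    # -T: show the time spent in each call
    # -e trace=file: only system calls that take a file name
    strace -T -e trace=file $WORKLOAD >/dev/null
else
    echo "strace is not installed; see the Strace install slide" >&2
fi
```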

Strace client result

Ltrace
Tracing shared library calls
- Useful for debugging and for analyzing how an application is built
  • Traces calls into user-mode libraries

Ltrace install
- sudo apt-get install ltrace

Ltrace
- Options
Option        Description
-u username   Run command with the user id, group id and supplementary groups of username
-t            Show the time stamp
-T            Show the time spent inside each call
-p PID        Trace the running process with the given process ID
-o filename   Write the trace output to the file
-e            Select which calls to trace: [library function name][library file name](+/-)
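
A minimal sketch using the timing and output options from the table. The traced command and the output path are assumptions for illustration only.

```shell
#!/bin/sh
# Stand-in command to trace (assumption); replace with the UDP client or server.
WORKLOAD="ls /"
# File that receives the trace (assumed location).
TRACE_FILE="${TMPDIR:-/tmp}/ltrace-output.txt"

if command -v ltrace >/dev/null 2>&1; then
    # -t: time stamp on each line
    # -T: time spent inside each library call
    # -o: write the trace to a file instead of stderr
    ltrace -t -T -o "$TRACE_FILE" $WORKLOAD >/dev/null &&
        head -n 15 "$TRACE_FILE"
else
    echo "ltrace is not installed; see the Ltrace install slide" >&2
fi
```

On some distributions the trace of a system binary can come out empty because of how the binary was linked; a program built locally for the lab is usually a better target.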

Ltrace client result

Assignment 3
Monitor the UDP_server with strace and ltrace
- System calls
- Library calls
Submit a screenshot of the strace result
Explain the server's operation based on the traced system calls
Capture a screenshot of the ltrace result
Explain the server's operation based on the traced library calls
Submit screenshots of tracing client1 and client2 with ltrace
- Explain the difference between client1 and client2

Ftrace
What is ftrace
- A debugging facility for tracing the internal operations of the kernel
  • Traces all functions within the kernel over any period of time
- Analyzes which events are generated in the kernel
  • While the application is running
- Analyzes the flow of kernel execution, with the individual kernel function as the unit
- Can filter so that only the desired kernel functions are analyzed
  • Minimizes the debugging overhead

How ftrace works
- Traces kernel functions by building the kernel with the GCC -pg option
  • -pg: calls the mcount routine at the entry of each kernel function
- Invokes kstop_machine
  • Ensures that code can be modified without another CPU executing the same code
  • CONFIG_STOP_MACHINE=y, ./include/linux/stop_machine.h
    - The machine operates like a single core

Plugins
- View the plugin setting
  • "#> cat current_tracer"
- Enable/disable by writing to the current_tracer file
Events
- View the event settings
  • "#> cat set_event"
- Enable/disable by using the enable file in the events directory

Ftrace trace plugins
- function: traces all kernel function calls over any period of time
- function_graph: shows function call relationships in graph form
- wakeup, wakeup_dl, wakeup_rt: analyze wakeup latency
- mmiotrace: analyzes memory-mapped I/O
- irqsoff: analyzes interrupt latency
- nop: traces nothing

Function tracer example
- By default, ftrace traces all functions
  • 1. cat set_ftrace_filter
    - "#### all functions enabled ####": all functions are traced
  • 2. echo CommonTraceWorkHandler > set_ftrace_filter
    - Trace only the CommonTraceWorkHandler function
  • 3. cat current_tracer
    - nop: no tracer is set
  • 4. echo function > current_tracer
    - Select the function tracer
  • 5. cat trace | head -15
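
The five steps can be collected into one script, sketched below. The tracing directory paths are the usual tracefs and debugfs mount points, and `vfs_read` is an assumed function name used only because it exists on most kernels; the slide's own example traces CommonTraceWorkHandler.

```shell
#!/bin/sh
# Tracing directory: tracefs on newer kernels, under debugfs on older ones.
TRACING=/sys/kernel/tracing
[ -d "$TRACING" ] || TRACING=/sys/kernel/debug/tracing

# Kernel function to trace (assumption for this sketch).
FUNC="vfs_read"

if [ -w "$TRACING/current_tracer" ]; then
    cat "$TRACING/set_ftrace_filter" | head -n 1     # step 1: show current filter
    echo "$FUNC"  > "$TRACING/set_ftrace_filter"     # step 2: trace only $FUNC
    cat "$TRACING/current_tracer"                    # step 3: show current tracer
    echo function > "$TRACING/current_tracer"        # step 4: select function tracer
    sleep 1                                          # let some events accumulate
    head -n 15 "$TRACING/trace"                      # step 5: first lines of the trace
    echo nop > "$TRACING/current_tracer"             # clean up: no tracer
    echo     > "$TRACING/set_ftrace_filter"          # clean up: all functions again
else
    echo "tracing directory is not writable; run as root with tracefs or debugfs mounted" >&2
fi
```

The last two lines restore the default state so that later experiments start from "all functions enabled" and the nop tracer.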

Ftrace plug-in tracer example
- Tracers for things that affect system response time
  • wakeup, wakeup_rt, irqsoff, preemptoff, preemptirqsoff
- irqsoff: reports the time during which IRQs are disabled
  • If IRQs stay disabled for a long time, interrupt response slows down
    - This affects the responsiveness seen by application users
  • 1. echo irqsoff > current_tracer
  • 2. cat trace | head -20
- preemptoff: reports the time during which preemption is disabled
  • 1. echo preemptoff > current_tracer
  • 2. cat trace | head -20

Ftrace event tracer example
- Analyzes events from kernel subsystems
  • Block, ext4, scheduler, IRQ, workqueue
- Only kernel functions that were hooked in advance with a tracepoint can be traced
  • The function is traced if the tracepoint was compiled in
- Events are implemented differently from plugins
- Check the scsi bus trace events when an external hard disk is connected
  • 1. echo 1 > events/scsi/enable
  • 2. cat set_event
  • 3. cat trace | head -20

Ftrace stack tracer example
- Verifies the function call structure in the current kernel
  • 1. echo 1 > /proc/sys/kernel/stack_tracer_enabled
    - Turns the stack tracer on
  • 2. cat stack_trace

Ftrace function_graph tracer example
- Checks kernel function entry and exit using ftrace
- Difference between the function_graph tracer and the function tracer
  • function_graph records both function entry and exit
  • Shows each function's execution time
  • Shows the depth of function calls
  • 1. echo function_graph > current_tracer
  • 2. cat trace | head -15

Trace-cmd
- A console-based user interface for the kernel's ftrace
- Tracing with only echo commands is inconvenient; trace-cmd is easier to use
- Solves the usability problem: there is no need to memorize many commands

Trace-cmd
- Trace-cmd install
  • sudo apt-get install trace-cmd
- Without trace-cmd
  • #> mount -t debugfs nodev /sys/kernel/debug
    #> cd /sys/kernel/debug/tracing
    #> echo function > ./current_tracer
    #> echo 1 > tracing_on
    #> ls /system/
    #> echo 0 > tracing_on
- With trace-cmd
  • #> /sdcard/trace-cmd record -p function ls /system
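
A sketch of the trace-cmd variant on a desktop Linux system, followed by reading the recorded file back. The traced command (`ls /`) and the output path are assumptions; the slide's own example runs on a device where trace-cmd lives under /sdcard.

```shell
#!/bin/sh
# Stand-in command to trace (assumption).
WORKLOAD="ls /"
# Recorded trace file (assumed location; trace-cmd defaults to ./trace.dat).
TRACE_DAT="${TMPDIR:-/tmp}/trace.dat"

if command -v trace-cmd >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    # record: enable the function plugin (-p) while the command runs,
    #         writing the result to the file given with -o.
    # report: decode the recorded file (-i) into readable text.
    trace-cmd record -p function -o "$TRACE_DAT" $WORKLOAD >/dev/null &&
        trace-cmd report -i "$TRACE_DAT" | head -n 20
else
    echo "trace-cmd must be installed and run as root" >&2
fi
```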

Trace-cmd list
- trace-cmd list -o
  • Options available for trace-cmd record -O
- trace-cmd list -p
  • Available plugins
- trace-cmd list -e
  • Available events

Assignment 4
Submit a screenshot of tracing OS scheduling latency over any period of time
- Hint: sched_wakeup
Submit a screenshot showing the intervals during which interrupts are disabled
- Hint: irqsoff -d