Universität Hamburg Department Informatik Scientific Computing, WR

Performance analysis using Great Performance Tools and Trace Toolkit next generation

Seminar Paper
Seminar: Performance analysis in Linux

Heye Vöcking Matr.Nr. 6139373 [email protected]

Tutor: Julian Kunkel

April 3, 2012


Contents

1 Introduction
  1.1 Overview
  1.2 Motivation
  1.3 Related Tools
    1.3.1 Tools discussed in this work

2 Related Work

3 History

4 Great Performance Tools in depth
  4.1 Field of application
    4.1.1 Supported operating systems
    4.1.2 Supported programming languages
  4.2 Install / Setup
  4.3 How it works and how to use it
    4.3.1 CPU Profiler
    4.3.2 TCMalloc
    4.3.3 Heap Leak Checker
    4.3.4 Heap Profiler
    4.3.5 Profiler Insides
  4.4 Post-Processing with pprof
    4.4.1 Analyzing CPU Profiles
    4.4.2 Analyzing Memory Leaks
    4.4.3 Analyzing Heap Profiles

5 Linux Trace Toolkit next generation in depth
  5.1 Field of application
    5.1.1 Supported operating systems
    5.1.2 Supported architectures
    5.1.3 Supported programming languages
  5.2 Install / Setup
  5.3 How it works
    5.3.1 Tracing
    5.3.2 Tracepoints
    5.3.3 Probes, Trace Sessions, and Channels
    5.3.4 Storage and Trace Modes
  5.4 Post-Processing and Analysis
    5.4.1 Babeltrace
    5.4.2 LTT Viewer
    5.4.3 IDE integration
  5.5 Performance Impact

6 Areas of Application

7 Summary

8 Conclusion

9 Outlook


Abstract

Today's systems are getting more and more complex, especially through the trend towards software running on multi-core setups. In order to find bugs and improve the performance of the software we develop, we need good tools for tracing a program's execution efficiently. By monitoring, recording, and analyzing a program's behavior we can diagnose problems, tune for more performance, and better understand how our software interacts with libraries and the operating system. In this work we present the Great Performance Tools and the Linux Trace Toolkit next generation.

1 Introduction

When we develop software we often encounter bugs or code that performs poorly. To debug and tune our code we can observe the program's execution. There are several more or less efficient ways to accomplish this. Tracing is one of them, which can be performed using LTTng; another way is profiling, which is possible with gperftools.

1.1 Overview

At first we will mention some tools available for tracing as well as discuss and compare the Great Performance Tools (section 4) and the Linux Trace Toolkit next generation (section 5) in detail. We will show what features they offer, describe how to install and integrate them, show how to use them, and give an insight into how they work. The areas of application are discussed in section 6.

1.2 Motivation

There is more to developing software than just making it work. Even though the processing power of computers is still rising, a developer should write efficient code and use the advantage of multi-core systems. Efficiency is a crucial subject because it saves time and energy during runtime. Another important part is software safety. To avoid deadlocks, segmentation faults, or memory leaks, the programmer needs a good understanding of the software he writes and the libraries he uses. Here we want to show which tools can help developers write efficient code.

1.3 Related Tools

gprof (1982) gprof extended the Unix tool "prof", released in 1979, by providing a complete call graph analysis. [gpr12]

EDGE (1999) The Mentor Embedded EDGE Developer Suite is an IDE built upon the Eclipse framework and provides tools that give an insight into the kernel, a debugger, and some other features. [Men99]

QNX Momentics Tool Suite (2001) This tool suite can be used to get a view of real-time interactions and memory profiles, as well as to port software from single- to multi-core systems. [QNX01]


Valgrind (2002) Originally designed to be a free memory debugging tool for Linux on x86, Valgrind is now a generic framework for creating dynamic analysis tools. [Val12]

1.3.1 Tools discussed in this work

In this work we are going to discuss and compare the Great Performance Tools (GPerfTools) and the Linux Trace Toolkit next generation (LTTng). Both projects were started in 2005, are available for free on the Internet, and by now both are community driven.

2 Related Work

Peter Smith manages to achieve a 30% speed gain by fixing poorly performing code that he located by profiling his 3D application with GPerfTools and kCachegrind, as described in his work [Smi11].

In his article [Tou11] Toupin describes how to use tracing to diagnose or monitor single and multi-core systems.

The author and maintainer of LTTng, Mathieu Desnoyers, has written his PhD thesis [Des09] on software tracing with LTTng.

Romik Guha Anjoy and Soumya Kanti Chakraborty evaluated in their Master's thesis [RGA10] the efficiency of LTTng for employing it within the telecommunication company Ericsson.

3 History

Performance analysis is no recent problem. Tools for analyzing performance have been around since the early 1970s. They were usually based on timer interrupts to detect "hot spots" in the program code. The first profiler-driven analysis tool was prof in 1979. In 1994 instrumentation by inserting profiling code into programs at compile time came up with the ATOM platform. Instrumentation of the Linux kernel using a method called "Kprobes" has been implemented since July 2002 (Kernel 2.5.26) and is still present today (Kernel 3.3). In early 2008 "Kernel Markers" were developed to replace Kprobes, with no success, because the so-called "Tracepoints" found their way into the kernel not even a year later and have replaced the Kernel Markers completely since December 2009. Tracepoints are still popular and are used for example in SystemTap and LTTng, which is discussed in section 5. [Mö12]

4 Great Performance Tools in depth

The Google Performance Tools project was launched in March 2005 [SR08]. In 2011 Craig Silverstein, better known in the gperftools project as "csilvers", who was the main developer, decided to step down.

As a part of that, Google decided to make the project completely community run; therefore, with the beginning of version number 2 (released in February 2012), the tools changed their name from "google-perftools" to "gperftools", where the "g" now stands for "great". The gperftools are now maintained by David Chappelle, who was the only other active developer in the past. [CS12]

4.1 Field of application

4.1.1 Supported operating systems

The gperftools can be used in Linux, Mac, as well as in Windows environments.

4.1.2 Supported programming languages

Supported programming languages are C and C++. But the gperftools can be used with any language that can call C code.

4.2 Install / Setup

Setup in a Linux environment is fairly straightforward. You should be able to find the corresponding packages using your package manager. In Arch Linux, for example, you can use a tool called yaourt to install packages from the AUR (Arch User Repository). The command "yaourt -S google-perftools" will download, compile, and install the gperftools as well as all required packages. Additionally you can install perl5, dot, and gv if you want to visualize the collected data with pprof.

4.3 How it works and how to use it

The gperftools are a set of open-source tools for profiling, leak checking, and memory allocation, distributed under the BSD license. They offer:

1. cpu_profiler, a profiler for monitoring the performance of functions

2. tcmalloc, a thread-caching memory allocator as a fast replacement for malloc

3. heap_checker, a heap checker for leak detection

4. heap_profiler, a profiler for monitoring heap allocations

We will discuss these below.

4.3.1 CPU Profiler

The CPU profiler collects information during runtime, namely the number of function calls with caller and callee methods and classes. There are multiple ways to use the CPU profiler:

1. Through linking -lprofiler into your executable.

2. Through $LD_PRELOAD by using % env LD_PRELOAD="/usr/lib/libprofiler.so" /path/to/binary. But doing so is not recommended.

Note that CPU profiling has to be turned on manually before execution. This is done by setting the environment variable $CPUPROFILE to the file you want to save the collected information to. Surround the code you want to profile with ProfilerStart("profile name") and ProfilerStop(). Several more advanced functions are available, like ProfilerFlush() and ProfilerStartWithOptions().
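To illustrate this API, the following minimal sketch profiles a single function. The file name cpu.prof and the function do_work() are made up for this example, and the header path assumes gperftools 2.0 (older releases installed it as google/profiler.h):

#include <gperftools/profiler.h>   // ProfilerStart(), ProfilerStop()

// Hypothetical workload whose runtime we want to sample.
void do_work() {
    volatile double sum = 0.0;
    for (int i = 0; i < 100000000; ++i)
        sum += i * 0.5;
}

int main() {
    ProfilerStart("cpu.prof");   // write the collected samples to cpu.prof
    do_work();
    ProfilerStop();              // flush and close the profile
    return 0;
}

The program is linked with -lprofiler; the resulting cpu.prof file can then be analyzed with pprof as described in section 4.4.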


Using the environment variables $CPUPROFILE_FREQUENCY and $CPUPROFILE_REALTIME you can control the behavior of the CPU profiler during runtime. See section 4.4 for the analysis of the generated profiles. [Ghe08]

4.3.2 TCMalloc

tcmalloc acts as a replacement for the standard malloc provided in the C library. It is designed to provide efficient memory management for multi-threaded applications and reduces memory overhead, especially for small objects. The heap checker as well as the heap profiler depend on tcmalloc. For brevity's sake we are not looking further into how tcmalloc works on the inside, because we are focusing more on performance analysis. But the interested reader might want to take a look at [SG07].

There are multiple ways to use tcmalloc:

1. Through linking -ltcmalloc into your executable. (The heap leak checker depends on tcmalloc, so in order to use it you have to link -ltcmalloc into your binary.)

2. Through $LD_PRELOAD by using % env LD_PRELOAD="/usr/lib/libtcmalloc.so" /path/to/binary. But doing so is not recommended.

4.3.3 Heap Leak Checker

As the name says, the heap leak checker helps finding memory leaks during execution. It records every allocation and checks on exit whether all allocated memory has been freed. If not all memory has been freed, it reports a leak.

In order to run a program with the heap leak checker you have to use tcmalloc (see section 4.3.2) and you have to define the $HEAPCHECK environment variable with the mode you want to use, e.g. HEAPCHECK=normal.

The following modes of leak checking are available:

1. minimal ignores all allocations before main().

2. normal reports all objects allocated during runtime that are still "alive" at the end of execution.

3. strict is basically like normal but also reports memory leaked in global destructors.

4. draconian tracks every bit of memory. If not all allocated memory has been freed, it reports a leak.

5. as-is offers a very flexible mode, where you can choose all options precisely by setting environment variables.

6. local activates leak checking not for the whole program, but explicitly. In order to do so, you create a HeapLeakChecker object where you want to start leak checking, and end it with a call to NoLeaks().


In order to turn leak checking on, you still have to start the program with HEAPCHECK=local.
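The following minimal sketch shows how such an explicit check might look; the scope name "demo" and the function maybe_leaky_code() are invented for this example, and the header path again assumes gperftools 2.0:

#include <cassert>
#include <gperftools/heap-checker.h>   // HeapLeakChecker

// Hypothetical code whose allocations we want to check.
void maybe_leaky_code() {
    int* data = new int[100];
    delete[] data;   // removing this line would be reported as a leak
}

int main() {
    HeapLeakChecker checker("demo");   // leak checking starts here
    maybe_leaky_code();
    // NoLeaks() compares the current heap state with the snapshot
    // taken in the constructor and reports a leak on mismatch.
    assert(checker.NoLeaks());
    return 0;
}

The binary still has to be linked against tcmalloc and started with HEAPCHECK=local for the check to be active.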

You can also explicitly turn off leak checking for certain objects or code pieces. To do this, bracket the code you want to ignore with the following heap-checking construct:

{
    HeapLeakChecker::Disabler disabler;
    ...
}

This will cause the heap leak checker to ignore all objects allocated in the bracketed region, as well as any objects reachable from these objects, including allocations made in routines called from the bracketed region.

It is also possible to turn off leak checking for single objects and everything reachable by following pointers inside these objects. To do this, simply pass the pointer of the object you want to ignore to IgnoreObject(). By passing the pointer to UnIgnoreObject() the effect can be undone.

Via environment variables the leak checker can be tuned when run in as-is mode. For a complete overview of all available options have a look at [Lif07].

The leak report is printed on the command line, but it can also be visualized by pprof, see section 4.4 for details. [Lif07]

4.3.4 Heap Profiler

The heap profiler can be used to find out more about the memory management of a certain program. This is useful for analyzing the program heap, finding memory leaks, and making out places that allocate a lot of memory. The profiler keeps track of all allocations done by malloc, calloc, realloc, ... and new.

To use the heap profiler, tcmalloc must be used. Take a look at section 4.3.2 to see how to include tcmalloc in your application. You also have to set the environment variable $HEAPPROFILE to the path where you want to save your profile to.

In your code you can call HeapProfilerStart("prefix") to start heap profiling. The profiled data will be written into files called prefix.XXXX.heap, where XXXX is a whole number starting at 0000 and increasing with every periodic dump. With HeapProfilerDump() or GetHeapProfile() you can examine the profile. Whenever the profiler is running, IsHeapProfilerRunning() will return true. To stop profiling call HeapProfilerStop().

It is possible to manually check for leaks using the profiler, but it is easier to use the heap leak checker discussed in section 4.3.3 for this task.

During execution the heap profiler will write periodic outputs to the specified path. These can be analyzed using the pprof tool, see section 4.4 for details. [Ghe12]
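A minimal sketch of this API might look as follows; the prefix "myheap" and the function build_cache() are made up, and the header path assumes gperftools 2.0:

#include <cstdlib>
#include <gperftools/heap-profiler.h>   // HeapProfilerStart(), HeapProfilerDump(), ...

// Hypothetical function that allocates a noticeable amount of memory.
char* build_cache() {
    return static_cast<char*>(std::malloc(1 << 20));   // 1 MiB
}

int main() {
    HeapProfilerStart("myheap");              // dumps go to myheap.0000.heap, ...
    char* cache = build_cache();
    HeapProfilerDump("after build_cache");    // force a dump, tagged with a reason
    std::free(cache);
    HeapProfilerStop();
    return 0;
}

As with the heap checker, the binary has to be linked against -ltcmalloc; every .heap file written this way can afterwards be fed to pprof.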


4.3.5 Profiler Insides

The profiler collects information about the program stack. In order to do so, it grabs information provided in the program stack during execution. In the following source code we want to show the basic idea behind this.

main() {
    foo();
}

foo() {
    bar();
}

bar() {
    // Code inserted for profiling the
    // program stack
    int size = 10;
    void* result[size];
    int sizes[size];
    int depth = GetStackFrames(
        result, sizes, size, 1);
}

The frames of bar() and of the GetStackFrames() call itself are skipped, so in this case the depth returned by GetStackFrames() will be 2. The array result will contain two elements

result[0] = foo
result[1] = main

The two elements are instruction pointers that point at the program stack where foo and main are located. The sizes array will also contain two elements

sizes[0] = 16
sizes[1] = 16

which are the corresponding frame sizes of the functions.

4.4 Post-Processing with pprof

The program called "pprof" was written by Google Inc. as a Perl script. It is under the BSD license and its source is located inside the gperftools repository.

It can be used to analyze the collected data. It offers a number of different output formats: raw text, postscript via gv, dot, pdf, gif, source-code listings, and disassembly. Note that you might need to install dot, gv, and ps2pdf. To reduce cluttering some edges might be dropped, but this as well as the granularity of the output can be controlled at the command line.

4.4.1 Analyzing CPU Profiles

After a program has been run with the CPU profiler enabled, the output file can be analyzed using pprof. When calling pprof with the --gv option, a call graph annotated with timing information will be displayed. In figure 1 you can see an example of approximating the value of a gaussian distribution using the direct approximation (original_exp), a gradient, the exponential function, and a lookup table.
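Assuming the CPU profile was saved to a file called gauss.prof produced by a binary ./gauss (both names are made up for this example), a call graph like the one in figure 1 can be generated with a command line like

% pprof --gv ./gauss gauss.prof

and a plain text listing with

% pprof --text ./gauss gauss.prof

The executable is passed first so that pprof can resolve the sampled addresses to symbol names.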


Figure 1: Output produced by pprof

The sizes of the nodes reflect their time consumption. Each node contains the function name (first line), the time spent in the function itself (second line), and the total time, which is composed of the cumulated time of the functions called plus the time displayed in the second line. The time is provided in units where one unit is equivalent to one sample. For example: 88 units at 100 samples per second equate to 0.88 seconds spent in the corresponding function. The edges are drawn from the caller to the callee, which is located at the arrowhead of the connecting edge. The weight of an edge reflects the time spent in the function that is being called. Some edges and nodes might be dropped to reduce clutter, so the sum might not always add up to 100 percent. [Ghe08]

4.4.2 Analyzing Memory Leaks

To help track down memory leaks, pprof can be used to display a call graph of the reported leak. [Lif07]

4.4.3 Analyzing Heap Profiles

The pprof tool can also be used to analyze the profiles generated by the heap profiler. Calling pprof with the --gv option will open a gv window displaying a weighted directed graph showing what portions of the code allocated how much memory.

To find gradual memory leaks you can use pprof and compare two profiles. The memory usage in the first profile will be subtracted from the second profile. [Ghe12]

5 Linux Trace Toolkit next generation in depth

LTTng was written by Mathieu Desnoyers after he started to maintain its predecessor, the "Linux Trace Toolkit", in 2005. It offers both user space as well as kernel space tracing. It claims to trace with a very low impact on the system. It is able to output the recorded data directly to disk or to the network. An overview of the whole LTTng architecture is given in figure 2.

5.1 Field of application

5.1.1 Supported operating systems

LTTng is only available for Linux.

5.1.2 Supported architectures

LTTng has at the moment full support for x86-32, x86-64, PowerPC 32/64, ARMv7 OMAP3, MIPS, sh, sparc64, and s390, with a timestamp precision of typically ∼1 ns (cycle counter). Other Linux architectures are supported as well, but LTTng might only have a limited timestamp precision on these. LTTng also supports arbitrary architecture endianness. [LTT12]

5.1.3 Supported programming languages

Supported programming languages are C and C++. But LTTng can be used with any language that can call C code; for example, it can be used with Java or Erlang through the LTTng VM adaptor.


Figure 2: LTTng 2.0 architecture [LTT12]

5.2 Install / Setup

The installation of LTTng 2.0 is fairly simple, given you are using Ubuntu. You can simply install it from the PPA (Personal Package Archive) and reboot. That's it.

5.3 How it works

LTTng consists of three parts: the kernel part, which controls kernel tracing, the user space command-line application called lttctl, and the user space daemon called lttd, which waits for the data to write it to disk. Moreover, LTTng is modular; it consists of 5 modules:

1. ltt-heartbeat, which generates periodic events for a monotonically increasing timebase.

2. ltt-facilities, a collection of event types.

3. ltt-statedump, which generates events to describe the kernel's state at start time.

4. ltt-core, which handles a number of LTTng control events and therefore controls the previously mentioned ltt-heartbeat, ltt-facilities, and ltt-statedump.

5. ltt-base, a built-in kernel object that contains symbols and data structures.

[DD06]

All these modules work together to observe and persist the data while tracing.


5.3.1 Tracing

Tracing is an efficient way to monitor the execution of a program and record the observed data. We observe in particular operating system kernel events [YD00] such as:

• System Calls

• Interrupt Requests

• Scheduling Activities

• Network Activities

Events can have an arbitrary number of arguments, therefore the event sizes might differ. All events are ordered accurately, even across multiple cores or CPUs, except if the hardware makes it impossible.

The recorded data saves the event as well as its attributes and the timestamp. There are several ways to do this.

• Static tracepoints are located at source code level and are therefore hard-wired into a compiled binary file of a program or into the kernel. They have to be activated prior to execution.

• Dynamic breakpoints can be inserted without recompiling via a system call, trap, or dynamic jump, and are added in the Linux kernel via kprobes.

A breakpoint interrupt causes big overhead, so LTTng uses tracepoints to record raw data which is analyzed in a post-processing step, to keep the overhead during execution low. [Tou11] [RGA10]

5.3.2 Tracepoints

A tracepoint is a small piece of code basically similar to this:

if (tracepoint_1_activated)
    (*tracepoint_1_probe)(arg1, arg2);

Tracepoints are already available in the Linux kernel and in many Linux applications. But in order to use them they have to be activated prior to execution and connected to a probe. A probe is the function called by an activated tracepoint during execution. See figure 3 for a diagram of the tracing architecture. With custom probes, tracing can be bound to a condition. In order to keep the overhead of tracepoints to a minimum, they are implemented very carefully, so the cost of an inactive tracepoint is practically zero. In the kernel they use code modification, and in user space the evaluation of booleans is fairly fast. When active, a tracepoint's overhead is comparable to a C function call. [Tou11] [LTT12]
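To make this mechanism concrete, the following self-contained sketch mimics it in plain C++: a boolean guard, a probe function pointer, and one instrumented function. All names here are invented for illustration; the real implementations in the kernel and in lttng-ust are far more careful (static jumps, RCU) to keep the inactive cost near zero.

#include <cstdio>

// State of one tracepoint: an activation flag and the attached probe.
static bool tracepoint_1_activated = false;
static void (*tracepoint_1_probe)(int, const char*) = nullptr;

// An example probe that simply records the event by printing it.
static void my_probe(int id, const char* msg) {
    std::printf("event %d: %s\n", id, msg);
}

static void traced_function(int id) {
    // The tracepoint itself: almost free while inactive.
    if (tracepoint_1_activated)
        (*tracepoint_1_probe)(id, "traced_function reached");
}

int main() {
    traced_function(1);               // inactive: nothing is recorded

    tracepoint_1_probe = my_probe;    // connect a probe ...
    tracepoint_1_activated = true;    // ... and activate the tracepoint

    traced_function(2);               // active: the probe records the event
    return 0;
}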


Figure 3: Trace Architecture Diagram [Des09]

Probes are called by the instrumented portions of a program or the kernel. They collect the data and hand it over to lttd, which takes care of the data persistence. After the execution is over, the collected data is prepared for analysis and displayed in an appropriate viewer.
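From the user's point of view this chain is driven by the lttng command-line client in LTTng 2.0. A kernel-tracing session might look roughly like the following sketch; the session name demo is arbitrary and option spellings may differ slightly between versions:

% lttng create demo
% lttng enable-event --kernel --all
% lttng start
  (run the workload to be observed)
% lttng stop
% lttng view
% lttng destroy

lttng view hands the recorded CTF trace to a viewer, by default Babeltrace (section 5.4.1).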

Figure 4: Tracer Components Overview [Des09]

5.3.3 Probes, Trace Sessions, and Channels

A probe can be connected to more than one tracepoint. When an active tracepoint is reached, it calls the probe it is connected to. The probe then reads the trace session and writes the events into a channel. The probes are not implemented through traps or system calls and are therefore very efficient.

A trace session can be attached to several channels and keeps the trace configuration, which consists of the event filters to be applied and a boolean value which states whether the session is active or not.

A channel is used as an abstract I/O device. Data written into a channel is at first stored in a buffer and, when needed, exported by lttd to the disk or to the network.

An overview of the different trace components can be seen in figure 4. The notation used for the edge connecting Trace session and Channels indicates that a trace session can contain multiple channels.

5.3.4 Storage and Trace Modes

In order to minimize the performance impact on the system and ensure an effective use of memory bandwidth, LTTng is designed to take a zero-copy approach. That means that no data is moved between memory locations during tracing phases. All recorded data is sent into channels, which store the data in buffers and, depending on the trace mode, store the data independently from tracing. The data in the channels can be written to disk or streamed over the network. Furthermore, it is highly optimized to keep the stored data as compact as possible.

The following tracing modes are available:


write-to-disk the recorded data is written whenever the circular buffer is full. Therefore no trace data is lost.

flight recorder the recorded data is not written when the circular buffer is full, but whenever the recording does not impact performance or after the trace has ended. The flight recorder mode activation is per-channel. Some data might be overwritten in this mode.

Still under development are streaming to a remote disk and live monitoring, remote or local. Since I/O operations are costly they are not done by the probes, but by threads specialized for I/O operations, which can persist the data while tracing is in progress or after the trace session has ended. [RGA10]

5.4 Post-Processing and Analysis

The trace dump does not necessarily have to be viewed in the same environment where the data has been recorded. The trace output itself is a binary file which is self-describing and therefore portable.

5.4.1 Babeltrace

Babeltrace offers libraries for parsing traces to human-readable text logs and vice versa. It is distributed by EfficiOS, which is run by Mathieu Desnoyers, the author and maintainer of LTTng. Babeltrace works with logs in CTF (Common Trace Format) and is command-line based, so it offers no GUI. For a visual representation of a trace, the LTT Viewer can be used.

5.4.2 LTT Viewer

The LTT Viewer, or LTTV for short, can be used to view traces recorded by either the kernel space tracer or the user space tracer. It is written in C using glib and GTK and is independent from LTTng. Developers can easily extend LTTV by writing plugins. The GUI of LTTV offers control flow charts and resource viewers. In figure 5 you can see a trace recorded by LTTng of a program which performs a one second sleep. At the position of the cursor we observe a syscall and page faults by different processes. In the "Traceset" view we can select statistics of the whole system and of our program, for example traps, interrupts, and syscalls.

As of now, it is not possible to view traces generated with LTTng 2.0 because they are saved in CTF, which is not yet supported by LTTV.

5.4.3 IDE integration

Through the TMF (Tracing and Monitoring Framework, also known as the "LTTng plug-in") traces can be viewed and analyzed directly in Eclipse. TMF is part of the Linux Tools project and can be installed using the built-in software manager in Eclipse. It requires the LTTng trace reading library to be installed. It offers the following functions:

Control Flow visualizes the state transitions of the processes.

Resources visualizes the state transitions of system resources.


Figure 5: An example trace recorded with LTTng analyzed with LTTV

Statistics provides statistics on occurrences of events.

5.5 Performance Impact

The performance impact of LTTng is reasonably small. On a busy system the CPU time added by tracing goes from 1.54% under medium load to 2.28% under high load. Only under very high load is the impact noticeable, at 9.46%. An overview of the impact under different scenarios can be seen in table 1. The load column shows the usage of the CPU by all processes, including the probes and lttd. Adding up the time in the probes and lttd columns gives the whole overhead caused by LTTng. The data rate is the amount of data output and processed by lttd (the LTTng daemon in user space) every second, and Events/s lists the number of events recorded every second. [DD06]

6 Areas of Application

Google is using their gperftools themselves for profiling, leak checking, as well as memory allocation. LTTng is used by a variety of companies for performance monitoring and debugging. These include IBM, for solving issues in distributed file systems, Autodesk, for real-time issues in application development, and Siemens, for internal debugging and performance monitoring. Furthermore, LTTng is included in the packages of many Linux distributions, to only name a few: Ubuntu, MontaVista, Wind River, STLinux, and SUSE. [RGA10] [LTT12]

7 Summary

We have mentioned several different tools for performance analysis and discussed the Linux Trace Toolkit next generation and the Great Performance Tools in detail. We explained how to use them and gave an insight into some basic functionality. The use of the gperftools is very straightforward and the collected information can be analyzed using pprof by outputting it to many different formats. There are several tools for the analysis of data collected by LTTng.


                                         CPU time (%)            Data rate
Load size   Test series                load    probes   lttd     (MiB/s)    Events/s
Small       mozilla (browsing)         1.15    0.053    0.27      0.19        5,476
Medium      find                      15.38    1.150    0.39      2.28      120,282
High        find + gcc                63.79    1.720    0.56      3.24      179,255
Very high   find + gcc + ping flood   98.60    8.500    0.96     16.17      884,545

Table 1: LTTng benchmarks under different loads [DD06]

8 Conclusion

The need for more speed drives the trend towards systems that are getting more and more complex and yet should perform faster and more efficiently. At the same time the complexity of the software running on those systems is also increasing. In order to improve our comprehension of how the different parts of such systems work together, we need tools that help us. In this work we have discussed two of the many tools offered for profiling and tracing. We explained how LTTng and the gperftools work and further showed how to use them and how to analyze their output.

9 Outlook

With the demand for more performance, profiling and tracing will remain crucial tasks for speed improvement. When we look at the history of tracing we can see that, due to the rising demand, tracing is getting more and more comfortable. Also the performance impact of tracing is getting smaller and smaller. But it is not only speed that matters, but also the safety and security of software. It is hard to imagine debugging software with interacting threads by hand, and since the drift towards parallel programs is growing, we will depend on good tracing and profiling tools in the future.


References

[CS12] Craig Silverstein, David C.: Great Performance Tools. (2012). http://code.google.com/p/gperftools/

[DD06] Desnoyers, Mathieu ; Dagenais, Michel R.: The LTTng tracer: A low impact performance and behavior monitor for GNU/Linux. In: Linux Symposium (2006)

[Des09] Desnoyers, Mathieu: Low-Impact Operating System Tracing, Université de Montréal, PhD thesis, December 2009

[Ghe08] Ghemawat, Sanjay: Gperftools CPU Profiler. (2008), May. http://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html

[Ghe12] Ghemawat, Sanjay: Gperftools Heap Profiler. (2012), February. http://gperftools.googlecode.com/svn/trunk/doc/heapprofile.html

[gpr12] gprof: GNU prof. (2012). http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html

[Lif07] Lifantsev, Maxim: Gperftools Heap Leak Checker. (2007), July. http://gperftools.googlecode.com/svn/trunk/doc/heap_checker.html

[LTT12] LTTng: http://www.lttng.org. (2012)

[Men99] Mentor: An In-Depth and Comprehensive look at The Mentor Embedded Linux Development Platform, 1999

[Mö12] Möller, Hajo: Instrumentierung des Kernels. (2012)

[QNX01] QNX: QNX. http://www.qnx.com/products/tools/. Version: 2001

[RGA10] Romik Guha Anjoy ; Soumya Kanti Chakraborty: Efficiency of LTTng as a Kernel and Userspace Tracer on Multicore Environment, School of Innovation, Design and Engineering, Mälardalen University, Västerås, Sweden, Master's thesis, June 2010

[SG07] Sanjay Ghemawat, Paul M.: TCMalloc: Thread-Caching Malloc. (2007), February. http://gperftools.googlecode.com/svn/trunk/doc/tcmalloc.html

[Smi11] Smith, Peter: Using the CPU to Improve Performance in 3D Applications. 2011. – Research report

[SR08] Sean Rose, Riley A.: An Evaluation of Performance Analysis Tools, University of Ottawa, thesis, April 2008

[Tou11] Toupin, Dominique: Using Tracing to Diagnose or Monitor Systems. In: IEEE Software 28 (2011), issue 1, pp. 87–91

[Val12] Valgrind: (2012). http://www.valgrind.org

[YD00] Yaghmour, Karim ; Dagenais, Michel R.: The Linux Trace Toolkit. (2000)
