The LTTng tracer: A low impact performance and behavior monitor for GNU/Linux

Mathieu Desnoyers, École Polytechnique de Montréal, [email protected]
Michel R. Dagenais, École Polytechnique de Montréal, [email protected]

Abstract

Efficient tracing of system-wide execution, allowing integrated analysis of both kernel space and user space, is something difficult to achieve. The following article will present a new tracer core, Linux Trace Toolkit Next Generation (LTTng), that has taken over the previous version known as LTT. It has the same goals of low system disturbance and architecture independence while being fully reentrant, scalable, precise, extensible, modular and easy to use. For instance, LTTng allows tracepoints in NMI code, multiple simultaneous traces and a flight recorder mode. LTTng reuses and enhances the existing LTT instrumentation and RelayFS.

This paper will focus on the approaches taken by LTTng to fulfill these goals. It will present the modular architecture of the project. It will then explain how NMI reentrancy requires atomic operations for writing and RCU lists for tracing behavior control. It will show how these techniques are inherently scalable to multiprocessor systems. Then, time precision limitations in the kernel will be discussed, followed by an explanation of the motivation for LTTng's own monotonic timestamps.

In addition, the template based code generator for the architecture agnostic trace format will be presented. The approach taken to allow nested types, variable fields and dynamic alignment of data in the trace buffers will be revealed. It will show the mechanisms deployed to facilitate use and extension of this tool by adding custom instrumentation and analysis involving kernel, libraries and user space programs.

It will also introduce LTTng's trace analyzer and graphical viewer counterpart: Linux Trace Toolkit Viewer (LTTV). The latter implements extensible analysis of the trace information through collaborating text and graphical plugins.1 It can simultaneously display multiple multi-GByte traces of multi-processor systems.

1 Project website: http://ltt.polymtl.ca.

1 Tracing goals

With the increasing complexity of newer computer systems, the overall performance of applications often depends on a combination of several factors including I/O subsystems, device drivers, interrupts, lock contention among multiple CPUs, and memory management. A low impact, high performance tracing system may therefore be the only tool capable of collecting the information produced by instrumenting the whole system, while not changing significantly the studied system behavior and performance.

Besides offering a flexible and easy to use interface to users, an efficient tracer must satisfy the requirements of the most demanding application. For instance, the widely used printk and printf statements are relatively easy to use and are correct for simple applications, but do not offer the needed performance for instrumenting interrupts in high performance multiprocessor computer systems and cannot necessarily be used in some code paths such as non maskable interrupt (NMI) handlers.

An important aspect of tracing, particularly in the real-time and high performance computing fields, is the precision of event timestamps. Real-time is often used in embedded systems which are based on a number of different architectures (e.g. ARM, MIPS, PPC) optimized for various applications. The challenge is therefore to obtain a tracer with precise timestamps, across multiple architectures, running from several MHz to several GHz, some being multi-processors.

The number of ad hoc tracing systems devised for specific needs (several Linux device drivers contain a small tracer), and the experience with earlier versions of LTT, show the need for a flexible and extensible system. This is the case both in terms of easily adding new instrumentation points and in terms of adding plugins for the analysis and display of the resulting trace data.

2 Existing solutions

Several different approaches have been taken by performance monitoring tools. They usually adhere to one of the following two paradigms.

The first class of monitor, post-processing, aims to minimize CPU usage during the execution of the monitored system by collecting data for later off-line analysis. As the goal is to have minimum impact on performance, static instrumentation is habitually used in this approach. Static instrumentation consists in modifying the program source code to add logging statements that will compile with the program. Such systems include LTT [7], a Linux kernel tracer; K42 [5], a research operating system from IBM; and IrixView and Tornado, which are commercial proprietary products.

The second class of monitor aims at calculating well defined information (e.g. I/O requests per second, system calls per second per PID) on the monitored CPU itself: this is what is generally called a pre-processing approach. It is the case of SystemTAP [3], Kerninst [4], Sun's Dtrace [1], and IBM's Performance and Environment Monitoring (PEM) [6]. All except PEM use a dynamic instrumentation approach. Dynamic instrumentation is performed by changing assembly instructions for breakpoints in the program binary objects loaded in memory, like the gdb debugger does. It is suitable to their goal because it generally has a negligible footprint compared to the pre-processing they do.

Since our goal is to support high performance and real-time embedded systems, the dynamic probe approach is too intrusive, as it implies using a costly breakpoint interrupt. Furthermore, even if the pre-processing of information can sometimes be faster than logging raw data, it does not allow the same flexibility as post-processing analysis. Indeed, almost every aspect of a system can be studied once a trace of the complete flow of the system behavior is obtained. However, pre-processed data can be logged into a tracer, as PEM does with K42, for later combined analysis, and the two approaches are therefore not incompatible.

3 Previous Works

LTTng reuses research that has been previously done in the tracing field in order to build new features and address currently unsolved questions more thoroughly.

The previous Linux Trace Toolkit (LTT) [7] project offers an operating system instrumentation that has been quite stable through the 2.6 Linux kernels. It also has the advantage of being cross-platform, but with types limited to fixed sizes (e.g. fixed 8-, 16-, 32-, or 64-bit integers compared to host size byte, short, integer, and long). It also suffers from the monolithic implementation of both the LTT tracer and its viewer, which have proven to be difficult to extend. Another limitation is the use of the kernel NTP corrected time for timestamps, which is not monotonic. LTTng is based on LTT but is a new generation: layered, easily extensible with new event types and viewer plugins, with a more precise time base, and it will eventually support the combined analysis of several computers in a cluster [2].

RelayFS [8] has been developed as a standard high-speed data relay between the kernel and user space. It has been integrated in the 2.6.14 Linux kernels. It offers hooks for kernel clients to send information in large buffers and interacts with a user space daemon through file operations on a memory mapped file.

IBM, in the past years, has developed K42 [5], an open source research kernel which aims at full scalability. It has been designed from the ground up with tracing being a necessity, not an option. It offers a very elegant lockless tracing mechanism based on the atomic compare-and-exchange operation.

The Performance and Environment Monitoring (PEM) [6] project shares a few similarities with LTTng and LTTV, since some work has been done in collaboration with members of their team. The XML file format for describing events came from these discussions, aiming at standardizing event description and trace formats.

4 The LTTng approach

The following subsections describe the five main components of the LTTng architecture. The first one explains the control of the different entities in LTTng. It is followed by a description of the data flow in the different modules of the application. The automated static instrumentation will thereafter be introduced. Event type registration, the mechanism that links the extensible instrumentation to the dynamic collection of traces, will then be presented.

4.1 Control

There are three main parts in LTTng: a user space command-line application, lttctl; a user space daemon, lttd, that waits for trace data and writes it to disk; and a kernel part that controls kernel tracing. Figure 1 shows the control paths in LTTng. lttctl is the command line application used to control tracing. It starts a lttd and controls kernel tracing behavior through a library-module bridge which uses a netlink socket.

The core module of LTTng is ltt-core. This module is responsible for a number of LTT control events. It controls the helper modules ltt-heartbeat, ltt-facilities, and ltt-statedump. Module ltt-heartbeat generates periodic events in order to detect and account for cycle counter overflows, thus allowing a single monotonically increasing time base even if shorter 32-bit (instead of 64-bit) cycle counts are stored in each event. Ltt-facilities lists the facilities (collections of event types) currently loaded at trace start time. Module ltt-statedump generates events to describe the kernel state at trace start time (processes, files. . . ). A built-in kernel object, ltt-base, contains the symbols and data structures required by built-in instrumentation. This includes principally the tracing control structures.

Figure 1: LTTng control architecture

4.2 Data flow

Figure 2 shows the data flow in LTTng. All data is written through ltt-base into RelayFS circular buffers. When subbuffers are full, they are delivered to the lttd disk writer daemon.

Figure 2: LTTng data flow

Lttd is a standalone multithreaded daemon which waits on RelayFS channels (files) for data by using the poll file operation. When it is awakened, it locks the channels for reading by using a relay buffer get ioctl. At that point, it has exclusive access to the subbuffer it has reserved and can safely write it to disk. It should then issue a relay buffer put ioctl to release it so it can be reused.

A side-path, libltt-usertrace-fast, running completely in user space, has been developed for high throughput user space applications which need high performance tracing. It is explained in detail in Section 4.5.4.

Both lttd and the libltt-usertrace-fast companion currently support disk output, but should eventually be extended to other media like network communication.
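The consumer protocol just described (wait with poll, take exclusive ownership of a full subbuffer, write it out, release it) can be sketched as follows. This is only an illustrative sketch: the ioctl request names RELAY_GET_SUBBUF and RELAY_PUT_SUBBUF, the channel path and the use of read() are invented placeholders, not the actual RelayFS/LTTng interface.

    /* Illustrative lttd-style consumer loop (hypothetical ioctl names). */
    #include <fcntl.h>
    #include <poll.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    #define RELAY_GET_SUBBUF 0x1000  /* placeholder request codes */
    #define RELAY_PUT_SUBBUF 0x1001

    static int consume(const char *channel_path, int out_fd, size_t subbuf_size)
    {
            char buf[4096];
            int fd = open(channel_path, O_RDONLY);
            struct pollfd pfd = { .fd = fd, .events = POLLIN };

            if (fd < 0)
                    return -1;
            for (;;) {
                    if (poll(&pfd, 1, -1) <= 0)
                            break;
                    /* Gain exclusive access to one full subbuffer. */
                    if (ioctl(fd, RELAY_GET_SUBBUF, 0) < 0)
                            continue;
                    /* Copy the subbuffer to disk (simplified: fixed-size reads;
                       the real daemon works on a memory mapped file). */
                    for (size_t done = 0; done < subbuf_size; done += sizeof(buf)) {
                            ssize_t n = read(fd, buf, sizeof(buf));
                            if (n <= 0)
                                    break;
                            write(out_fd, buf, n);
                    }
                    /* Release the subbuffer so the tracer can reuse it. */
                    ioctl(fd, RELAY_PUT_SUBBUF, 0);
            }
            close(fd);
            return 0;
    }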

4.3 Instrumentation

LTTng instrumentation, as presented in Figure 3, consists in an XML event description that is used both for automatically generating tracing headers and as data metainformation in the trace files. These tracing headers implement the functions that must be called at instrumentation sites to log information in traces.

Figure 3: LTTng instrumentation

Most common types are supported in the XML description: fixed size integers, host size integers (int, long, pointer, size_t), floating point numbers, enumerations, and strings. All of these can be either host or network byte ordered. It also supports nested arrays, sequences, structures, and unions.

The tracing functions, generated in the tracing headers, serialize the C types given as arguments into the LTT trace format. This format supports both packed and aligned data types.

A record generated by a probe hit is called an event. Event types are grouped in facilities. A facility is a dynamically loadable object, either a kernel module for kernel instrumentation or a user space library for user space instrumentation. An object that calls instrumentation should be linked with its associated facility object.
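As an illustration of the kind of XML event description discussed above, a hypothetical facility describing a single event could look like the following. The element and attribute names are assumptions made for this example and do not reproduce the exact genevent schema.

    <facility name="fs">
      <event name="file_open">
        <description>A file is opened by a process</description>
        <field name="fd" type="int"/>            <!-- host size integer -->
        <field name="mode" type="uint32"/>       <!-- fixed size integer -->
        <field name="filename" type="string"/>   <!-- variable length -->
      </event>
    </facility>

From such a description, the code generator produces the C tracing function that instrumented code calls, for instance something of the form trace_fs_file_open(int fd, uint32_t mode, const char *filename); the generated name and signature shown here are likewise hypothetical.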

Figure 4: LTTng event type registration

4.4 Event type registration

Event type registration is centralized in the ltt-facilities kernel object, as shown in Figure 4. It controls the rights to register specific types of information in traces. For instance, it does not allow a user space process using the ltt-usertrace API to register facilities with names conflicting with kernel facilities.

The ltt-heartbeat built-in object and the ltt-statedump also have their own instrumentation to log events. Therefore, they also register to ltt-facilities, just like standard kernel instrumentation.

Registered facility names, checksums and type sizes are locally stored in ltt-facilities so they can be dumped in a special low traffic channel at trace start. Dynamic registration of new facilities, while tracing is active, is also supported.

Facilities contain information concerning the type sizes in the compilation environment of the associated instrumentation. For instance, a facility for a 32-bit process would differ from the same facility compiled with a 64-bit process in its long and pointer sizes.

Figure 5: LTTng tracing

4.5 Tracing

There are many similarities between Figure 4 and Figure 5. Indeed, each traced piece of information must have its metainformation registered into ltt-facilities. The difference is that Figure 4 shows how the metainformation is registered while Figure 5 shows the actual tracing. The tracing path has the biggest impact on system behavior because it is called for every event.

Each event recorded uses ltt-base, the container of the active traces, to get the pointers to the RelayFS buffers. One exception is the libltt-usertrace-fast, which will be explained in Subsection 4.5.4.

The algorithms used in these tracing sites which make them reentrant, scalable, and precise will now be explained.

4.5.1 Reentrancy

This section presents the lockless reentrancy mechanism used at LTTng instrumentation sites. Its primary goal is to provide correct tracing throughout the kernel, including non maskable interrupt (NMI) handlers which cannot be disabled like normal interrupts. The second goal is to have the minimum impact on performance by both having fast code and not disrupting normal system behavior by taking intrusive locks or disabling interrupts.

To describe the reentrancy mechanism used by the LTTng instrumentation site (see Figure 6), we define the call site, which is the original code from the instrumented program where the tracing function is called. We also define the instrumentation site, which is the tracing function itself.

The instrumentation site found in the kernel and user space instrumentation has very well defined inputs and outputs. Its main input is the call site parameters. The call site must insure that the data given as parameters to the instrumentation site is properly protected with its associated locks. Very often, such data is already locked by the call site, so there is often no need to add supplementary locking.

The other input that the instrumentation site takes is the global trace control information. It is contained in a RCU list of active traces in the ltt-base object. Note that the instrumentation site uses the trace control information both as an input and an output: this is both how tracing behavior is controlled and where the variables that control writing to the RelayFS buffers are stored.

Figure 6: LTTng instrumentation site

The main output of the instrumentation site is a serialized memory write of both an event header and the instrumentation site parameters to the per-CPU RelayFS buffers. The location in these buffers is protected from concurrent access by using a lockless memory write scheme inspired from the one found in K42 [5]:

First, the amount of memory space necessary for the memory write is computed. When the data size is known statically, this step is quite fast. If, however, variable length data (string or sequence) must be recorded, a first size calculation pass is performed. Alignment of the data is taken care of in this step. To speed up data alignment, the start address of the variable size data is always aligned on the architecture size: it makes it possible to do a compile time alignment computation for all fixed size types.

Then, a memory region in the buffers is reserved atomically with a compare-and-exchange loop. The algorithm retries the reservation if a concurrent reserve occurs. The timestamp for the event is taken inside the compare-and-exchange loop so that it is monotonically incrementing with buffer offsets. This is done to simplify data parsing in the post-processing tool.

A reservation can fail on the following conditions. In normal tracing mode, a buffer full condition causes the reservation to fail. On the other hand, in flight recorder mode, we overwrite non-read buffers, so it will never fail. When the reservation fails, the event lost counter is incremented and the instrumentation site will return without doing a commit.

The next step is to copy the data from the instrumentation site arguments to the reserved RelayFS memory region. This step must preserve the same data alignment that has been calculated earlier.

Finally, a commit operation is done to release the reserved memory segment. No information is kept on a per memory region basis. We only keep a count of the number of reserved and committed bytes per subbuffer. A subbuffer is considered to be in a consistent state (non-corrupted and readable) when both counts are equal.

It is possible that a process dies between the slot reservation and the commit because of a kernel OOPS. In that case, the lttd daemon will be incapable of reading the subbuffer affected by this condition because of unequal reserve and commit counts. This situation is resolved when the reservation algorithm wraps to the faulty subbuffer: if the reservation falls in a new buffer that has unequal reserve and commit counts, the reader (lttd) is pushed to the next subbuffer, the subbuffers lost counter is incremented, and the subbuffer is overwritten. To insure that this condition will not be reached by normal out of order commit of events (caused by nested execution contexts), the buffer must be big enough to contain data recorded by the maximum number of out of order events, which is limited by the longest sequence of events logged from nestable contexts (softirq, interrupts, and NMIs).

The subbuffer delivery is triggered by a flag from the call site on subbuffer switch. It is periodically checked by a timer routine to take the appropriate actions. This ensures atomicity and a correct lockless behavior when called from NMI handlers.

Compared to printk, which calls the scheduler, disables interrupts, and takes spinlocks, LTTng offers a more robust reentrancy that makes it callable from the scheduler code and from NMI handlers.
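To make the reserve/commit scheme more concrete, here is a minimal, self-contained sketch of the algorithm described above, written as ordinary user space C with GCC atomic builtins. The structure and function names are invented for this example; the real LTTng kernel code differs (per-CPU buffers, subbuffer accounting, flight recorder mode, alignment handling), but the core idea is the same: reserve space with a compare-and-exchange loop, copy the payload, then atomically add to a commit count.

    #include <stdint.h>
    #include <string.h>

    struct chan {                      /* one (per-CPU) channel, simplified */
            char     buf[1 << 16];
            uint32_t write_offset;     /* bytes reserved so far */
            uint32_t commit_count;     /* bytes committed so far */
    };

    static inline uint64_t read_tsc(void)
    {
    #if defined(__x86_64__) || defined(__i386__)
            uint32_t lo, hi;
            __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
            return ((uint64_t)hi << 32) | lo;
    #else
            return 0;                  /* placeholder on other architectures */
    #endif
    }

    /* Reserve 'len' bytes; returns the offset, or -1 when the buffer is full. */
    static long reserve(struct chan *c, uint32_t len, uint64_t *tsc)
    {
            uint32_t old, end;

            do {
                    old = c->write_offset;
                    if (old + len > sizeof(c->buf))
                            return -1;          /* buffer full: event is lost */
                    *tsc = read_tsc();          /* timestamp inside the loop, so it
                                                   increases with buffer offsets */
                    end = old + len;
            } while (!__sync_bool_compare_and_swap(&c->write_offset, old, end));
            return old;
    }

    static void trace_event(struct chan *c, const void *payload, uint32_t len)
    {
            uint64_t tsc;
            long off = reserve(c, sizeof(tsc) + len, &tsc);

            if (off < 0)
                    return;                     /* increment an events-lost counter here */
            memcpy(c->buf + off, &tsc, sizeof(tsc));          /* event header */
            memcpy(c->buf + off + sizeof(tsc), payload, len); /* event payload */
            __sync_fetch_and_add(&c->commit_count, sizeof(tsc) + len);
            /* the buffer is consistent when commit_count == write_offset */
    }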
4.5.2 Scalability

Scalability of the tracing code for SMP machines is insured by the use of per-CPU data and by the lockless tracing mechanism. The inputs of the instrumentation site are scalable: the data given as parameters is usually either on the caller's stack or already properly locked. The global trace information is organized in a RCU list which does not require any lock from the reader side.

Per-CPU buffers eliminate the false sharing of cachelines between multiple CPUs on the memory write side. The fact that the input-output trace control structures are per-CPU also eliminates false sharing.

To identify more precisely the performance cost of this algorithm, let's compare two approaches: taking a per-CPU spinlock or using an atomic compare-and-exchange operation. The most frequent path implies either taking and releasing a spinlock along with disabling interrupts, or doing a compare-and-exchange, an atomic increment of the reserve count, and an atomic increment of the commit count.

On a 3GHz Pentium 4, a compare-and-exchange without LOCK prefix costs 29 cycles. With a LOCK prefix, it raises to 112 cycles. An atomic increment costs respectively 7 and 93 cycles without and with a LOCK prefix. Using a spinlock with interrupts disabled costs 214 cycles.

As LTTng uses per-CPU buffers, it does not need to take a lock on memory to protect from concurrent access by other CPUs when performing these operations. Only the non-locked versions of compare-and-exchange and atomic increment are then necessary. If we consider only the time spent in atomic operations, using a compare-and-exchange and two atomic increments takes 43 cycles, compared to 214 cycles for a spinlock.

Therefore, using atomic operations is five times faster than an equivalent spinlock on this architecture, while having the additional benefit of being reentrant for NMI code and not disturbing the system behavior, as it does not disable interrupts for the duration of the tracing code.
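The false-sharing point above can be illustrated with a small layout sketch; the structure below is a simplified stand-in for the per-CPU control structures, not LTTng's actual definitions. Keeping each CPU's write and commit counters in their own cacheline-aligned structure, indexed by CPU number, means that concurrent probes on different CPUs never write to the same cacheline.

    #include <stdint.h>

    #define NR_CPUS        8           /* illustrative fixed value */
    #define CACHELINE_SIZE 64

    /* One tracing channel per CPU; alignment avoids false sharing between CPUs. */
    struct percpu_chan {
            uint32_t write_offset;
            uint32_t commit_count;
            char    *subbuf;
    } __attribute__((aligned(CACHELINE_SIZE)));

    static struct percpu_chan channels[NR_CPUS];

    /* A probe only ever touches the structure of the CPU it runs on,
       e.g. channels[smp_processor_id()] in kernel code. */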
4.5.3 Time (im)precision in the Linux kernel

Time precision in the Linux kernel is a research subject on its own. However, looking at the Linux kernel x86 timekeeping code is very enlightening on the nanosecond timestamp accuracy provided by the kernel. Effectively, it is based on a CPU cycle to nanosecond scaling factor computed at boot time based on the timer interrupt. The code that generates and uses this scaling factor takes for granted that the value should only be precise enough to keep track of scheduling periods. Therefore, the focus is to provide a fast computation of the time with shifting techniques more than to provide a very accurate timestamp. Furthermore, doing integer arithmetic necessarily implies a loss of precision.

This causes problems when a tool like LTTng strongly depends on the monotonicity and precision of the time value associated with timestamps.

To overcome the inherent kernel time precision limitations, LTTng directly reads the CPU timestamp counters. It uses the cpu_khz kernel variable, which contains the most precise calibrated CPU frequency available. This value will be used by the post-processing tool, LTTV, to convert cycles to nanoseconds in a precise manner with double precision numbers.

Due to the importance of the CPU timestamp counters in LTTng instrumentation, a workaround has been developed to support architectures that only have a 32-bit timestamp counter available. It uses the ltt-heartbeat module periodic timer to keep a full 64-bit timestamp counter on architectures where it is missing, by detecting the 32-bit overflows in an atomic fashion; both the previous and the current TSC values are kept, swapped by a pointer change upon overflow. The read side must additionally check for overflows.

It is important to restate that the time base used by LTTng is based neither on the kernel do_gettimeofday, which is NTP corrected and thus non monotonic, nor on the kernel monotonic time, which suffers from integer arithmetic imprecision. LTTng uses the CPU timestamp counter and its most accurate calibration.
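The 32-bit workaround can be sketched as follows. This is an illustrative user space rendition with invented names, not the ltt-heartbeat implementation: a periodic heartbeat samples the 32-bit counter at least once per wrap period, so that both the heartbeat and the read side can detect a wrap by comparing the current low 32 bits with the last sampled value.

    #include <stdint.h>

    static volatile uint64_t last_sample;   /* updated by the periodic heartbeat;
                                               the real code avoids a torn 64-bit
                                               read by swapping pointers between
                                               two snapshots instead */

    /* Called from the heartbeat timer, at least once per 32-bit wrap period. */
    static void heartbeat_update(uint32_t tsc32)
    {
            uint64_t old = last_sample;
            uint64_t high = old & 0xffffffff00000000ULL;

            if (tsc32 < (uint32_t)old)          /* low word wrapped since last time */
                    high += 0x100000000ULL;
            last_sample = high | tsc32;
    }

    /* Read side: extend the current 32-bit counter value to 64 bits. */
    static uint64_t read_tsc64(uint32_t tsc32)
    {
            uint64_t sample = last_sample;
            uint64_t high = sample & 0xffffffff00000000ULL;

            if (tsc32 < (uint32_t)sample)       /* wrap happened after the last heartbeat */
                    high += 0x100000000ULL;
            return high | tsc32;
    }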
4.5.4 User space tracing

User space tracing has been achieved in many ways in the past. The original LTT [7] did use write operations in a device to send events to the kernel. It did not, however, give the same performance as in-kernel events, as it needs a round-trip to the kernel and many copies of the information.

K42 [5] solves this by sharing per-CPU memory buffers between the kernel and user space processes. Although this is very performant, it does not insure secure tracing, as a given process can corrupt the traces that belong to other processes or to the kernel. Moreover, sharing memory regions between the kernel and user space might be acceptable for a research kernel, but for a production kernel, it implies a weaker traceability of process-kernel communications and might bring limitations on architectures with mixed 32- and 64-bit processes.

LTTng provides user space tracing through two different schemes to suit two distinct categories of instrumentation needs.

The first category is characterized by a very low event throughput. It can be the case of an event that happens rarely to show a specific error condition, or periodically at an interval typically greater than or equal to the scheduler period. The "slow tracing path" is targeted at this category.

The second category, which is addressed by the "fast tracing path," is much more demanding. It is particularly I/O intensive and must be close to the performance of a direct memory write. This is the case when instrumenting manually the critical path in a program, or automatically every function entry/exit by gcc.

Both mechanisms share the same facility registration interface with the kernel, which passes through a system call, as shown in Figure 4. Validation is done by limiting these user space facilities to their own namespace so they cannot imitate kernel events.

The slow path uses a costly system call at each event call site. Its advantage is that it does not require linking the instrumented program against any library and does not have any thread startup performance impact like the fast path explained below. Every event logged through the system call is copied in the kernel tracing buffers. Before doing so, the system call verifies that the facility ID corresponds to a valid user space facility.

The fast path, the libltt-usertrace-fast library (see Figure 2), consists in a per thread companion process which writes the buffers directly to disk. Communication between the companion process and the library is done through the use of circular buffers in an anonymous shared memory map. Writing the buffers to disk is done by a separate companion process to insure that buffered data is never lost when the traced program terminates. The other goal is to account the time spent writing to disk to a different process than the one being traced. The files are written in the filesystem, arbitrarily in /tmp/ltt-usertrace, following this naming convention: process-tid-pid-timestamp, which makes each file unique for the trace. When tracing is over, the /tmp/ltt-usertrace directory must be manually moved into the kernel trace. The trace and usertrace do not have to coincide: although it is better to have the usertrace time span included in the kernel trace interval to benefit from the scheduler information for the running processes, it is not mandatory and partial information will remain available.

Both the slow and the fast path reuse the lockless tracing algorithm found in the LTTng kernel tracer. In the fast path, it ensures reentrancy with signal handlers without the cost of disabling signals at each instrumentation site.
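The automatic function entry/exit instrumentation mentioned above relies on gcc's -finstrument-functions option, which makes the compiler emit calls to two well-known hooks around every function. A tracer library only has to provide these two functions; the bodies shown here are placeholders for a call into the user space tracer, not LTTng's actual implementation.

    /* Compile the target program with:  gcc -finstrument-functions ... */

    static __thread unsigned long depth;   /* per-thread call depth, for illustration */

    void __cyg_profile_func_enter(void *this_fn, void *call_site)
                    __attribute__((no_instrument_function));
    void __cyg_profile_func_exit(void *this_fn, void *call_site)
                    __attribute__((no_instrument_function));

    void __cyg_profile_func_enter(void *this_fn, void *call_site)
    {
            depth++;
            /* A tracer would log an "entry" event here, with this_fn and
               call_site as payload, through the fast user space path. */
            (void)this_fn; (void)call_site;
    }

    void __cyg_profile_func_exit(void *this_fn, void *call_site)
    {
            /* A tracer would log an "exit" event here. */
            (void)this_fn; (void)call_site;
            depth--;
    }

The no_instrument_function attribute prevents the hooks from being instrumented themselves, which would otherwise cause unbounded recursion.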

5 Graphical viewer: LTTV

LTTng is independent of the viewer: the trace format is well documented and a trace-reading library is provided. Nonetheless, the associated viewer, LTTV, will be briefly introduced. It implements optimised algorithms for random access of several multi-GB traces, describing the behavior of one or several uniprocessor or multi-processor systems. Many plugin views can be loaded dynamically into LTTV for the display and analysis of the data. Developers can thus easily extend the tool by creating their own instrumentation with the flexible XML description and connecting their own plugin to that information. It is layered in a modular architecture.

On top of the LGPL low-level trace file reading library, LTTV recreates its own representation of the evolving kernel state through time and keeps statistical information in a generic hierarchical container. By combining the kernel state, the statistics, and the trace events, the viewers and analysis plugins can extend the information shown to the user. Plugins are kept focused (analysis, text, or graphical display, control. . . ) to increase modularity and reuse. The plugin loader supports dependency control.

LTTV also offers a rich and performant event filter, which allows specifying, with a logical expression, the events a user is interested to see. It can be reused by the plugins to limit their scope to a subset of the information.

For performance reasons, LTTV is written in C. It uses the GTK graphical library and glib. It is distributed under the GPLv2 license.

6 Results

This section presents the results of several measurements. We first present the time overhead on the system running microbenchmarks of the instrumentation site. Then, taking these results as a starting point, the interrupt and scheduler impact will be discussed. Macrobenchmarks of the system under different loads will then be shown, detailing the time used for tracing.

The size of the instrumentation object code will be discussed along with possible size optimisations. Finally, time precision calibration is performed with a NMI timer.

6.1 Test environment

The test environment consists of a 3GHz, uniprocessor Pentium 4, with hyperthreading disabled, running LTTng 0.5.41. The results are presented in cycles; the exact calibration of the CPU clock is 3,000.607 MHz.

6.2 Microbenchmarks

Table 1 presents probe site microbenchmarks. Kernel probe tests are done in a kernel module with interrupts disabled. User space tests are influenced by interrupts and the scheduler. Both consist in 20,000 hits of a probe that writes 4 bytes plus the event header (20 bytes). Each hit is surrounded by two timestamp counter reads.

When compiling out the LTTng tracing, calibration of the tests shows that the time spent in the two TSC reads varies between 97 and 105 cycles, with an average of 100.0 cycles. We therefore removed this time from the raw probe time results.

                                                 Time spent in probe (cycles)
Probe site    Test series                        min       average     max
Kernel        Tracing dynamically disabled       0         0.000       338
Kernel        Tracing active (1 trace)           278       288.500     6,997
User space    ltt-usertrace-fast library         225       297.021     88,913
User space    Tracing through system call        1,013     1,042.200   329,062

Table 1: LTTng microbenchmarks for a 4-byte event probe hit 20,000 times

As we can see, the best case for kernel tracing is a little slower than the ltt-usertrace-fast library: this is due to supplementary operations that must be done in the kernel (preemption disabling, for instance) that are not needed in user space. The maximum and average values of the time spent in user space probes do not mean much because they are sensitive to scheduling and interrupts.

The key result in Table 1 is the average 288.5 cycles (96.15ns) spent in a probe.

LTTng probe sites do not increase latency because they do not disable interrupts. However, the interrupt entry/exit instrumentation itself does increase interrupt response time, which therefore increases low priority interrupt latency by twice the probe time, which is 577.0 cycles (192.29ns).

The scheduler response time is also affected by LTTng instrumentation because it must disable preemption around the RCU list used for control. Furthermore, the scheduler instrumentation itself adds a task switch delay equal to the probe time, for a total scheduler delay of twice the probe time: 577.0 cycles (192.29ns). In addition, a small implementation detail (the use of preempt_enable_no_resched()), to insure scheduler instrumentation reentrancy, has a downside: it can possibly make the scheduler miss a timer interrupt. This could be solved for real-time applications by using the no-resched flavour of preemption enabling only in the scheduler, wakeup, and NMI nested probe sites.

6.3 Macrobenchmarks

6.3.1 Kernel tracing

Table 2 details the time spent both in the instrumentation site and in lttd for different loads. Time spent in instrumentation is computed from the average probe time (288.5 cycles) multiplied by the number of probe hits. Time spent in lttd is the CPU time of the lttd process as given in the LTTV analysis. The load is computed by subtracting the time spent in system call mode in process 0 (the idle process) from the wall time.

                                               CPU time (%)             Data rate
Load size    Test series                   load     probes    lttd      (MiB/s)     Events/s
Small        mozilla (browsing)             1.15     0.053     0.27       0.19         5,476
Medium       find                          15.38     1.150     0.39       2.28       120,282
High         find + gcc                    63.79     1.720     0.56       3.24       179,255
Very high    find + gcc + ping flood       98.60     8.500     0.96      16.17       884,545

Table 2: LTTng macrobenchmarks for different loads

It is quite understandable that the probes triggered by the ping flood take that much CPU time, as it instruments a code path that is called very often: the system call entry. The total CPU time used by tracing on a busy system (medium and high scenarios) goes from 1.54 to 2.28%.

6.3.2 User space tracing

Table 3 compares the ltt-usertrace-fast user space tracer with gprof on a specific task: the instrumentation of each function entry and exit of a gcc compilation execution. The userspace tracing of LTTng is only a constant factor of 2 slower than a gprof instrumented binary, which is not bad considering the amount of additional data generated. The factor of 2 is for the ideal case where the daemon writes to a /dev/null output. In practice, the I/O device can further limit the throughput. For instance, writing the trace to a SATA disk, LTTng is 4.13 times slower than gprof.

The test consists in running an instrumented version of gcc, itself compiled with the option -finstrument-functions, to compile a 6.9KiB C file into a 15KiB object, with level 2 optimisation.

gcc instrumentation      Time (s)    Data rate (MiB/s)
not instrumented           0.446
gprof                      0.774
LTTng (null output)        1.553     153.25
LTTng (disk output)        3.197      74.44

Table 3: gcc function entry/exit tracing

As Table 3 shows, gprof instrumented gcc takes 1.73 times the normal execution time. The fast userspace instrumentation of LTTng is 3.22 times slower than normal. Gprof only extracts a sampling of function time by using a periodical timer and keeps per function counters. LTTng extracts the complete function call trace of a program, which generates an output of 238MiB in 1.553 seconds (153.25 MiB/s). The execution time is I/O-bound: it slows down to 3.197s when writing the trace on a SATA disk through the operating system buffers (74.44 MiB/s).

6.4 Instrumentation object size

Another important aspect of instrumentation is the size of the binary instructions added to the programs. This wastes precious L1 cache space and grows the overall object code size, which is more problematic in embedded systems. Table 4 shows the size of stripped objects that only contain instrumentation.

Instrumentation                                      object code size (bytes)
log 4-byte integer                                   2,288
log variable length string                           2,384
log a structure of int, string,
  sequence of 8-byte integers                        2,432

Table 4: Instrumentation object size

Independently of the amount of data to trace, the object code size only varies in our tests by a maximum of 3.3% from the average size. Adding 2.37kB per event might be too much for embedded applications, but a tradeoff can be made between inlining of tracing sites (and reference locality) and doing function calls, which would permit instrumentation code reuse.

A complete L1 cache hit profiling should be done to fully see the cache impact of the instrumentation and help tweak the inlining level. Such profiling is planned.

6.5 Time precision

Time precision measurement of a timestamp counter based clock source can only be done relative to another clock source. The following test traces the NMI watchdog timer, using it as a comparison clock source. It has the advantage of not being disturbed by CPU load, as these interruptions cannot be deactivated. It is, however, limited by the precision of the timer crystal. The hardware used for these tests is an Intel D915-GAG motherboard. Its timer is driven by a TXC HC-49S crystal with a ±30 PPM precision. Table 5 and Figure 7 show the intervals of the logged NMI timer events. Their precision is discussed.

Figure 7: Traced NMI timer events interval (interval in ms versus event number)

Statistic                value (ns)
min                       3,994,844
average                   3,999,339
max                       4,004,468
standard deviation               52
max deviation                 5,075

Table 5: Traced NMI timer events interval

This table indicates a standard deviation of 52ns and a maximum deviation of 5,075ns from the average. If we take the maximum deviation as a worst case, we can assume that we have a ±5.075µs error between the programmable interrupt timer (PIT) and the trace time base (derived from the CPU TSC). Part of it is due to CPU cache misses, higher priority NMIs, kernel minor page faults, and the PIT itself. A 52ns standard deviation each 4ms means a 13µs error each second, for a 13 PPM frequency precision, which is within the expected limits.

7 Conclusion

As demonstrated in the previous section, LTTng is a low disturbance tracer that uses about 2% of CPU time on a heavy workload. It is entirely based on atomic operations to insure reentrancy. This enables it to trace a wide range of code sites, from user space programs and libraries to kernel code, in every execution context, including NMI handlers.

Its time measurement precision gives a 13 PPM frequency error when reading the programmable interrupt timer (PIT) in NMI mode, which is coherent with the 30 PPM crystal precision.

LTTng proves to be a performant and precise tracer. It offers an architecture independent instrumentation code generator, from templates, to reduce the instrumentation effort. It provides efficient and convenient mechanisms for kernel and user space tracing.

A plugin based analysis tool, LTTV, helps to further reduce the effort for analysis and visualisation of complex operating system behavior. Some work is currently being done on time synchronisation between cluster nodes, to extend LTTV to cluster wide analysis.

You are encouraged to use this tool and create new instrumentations, either in user space or in the kernel. LTTng and LTTV are distributed under the GPLv2 license.2

2 Project website: http://ltt.polymtl.ca

References

[1] Bryan M. Cantrill, Michael W. Shapiro, and Adam H. Leventhal. Dynamic instrumentation of production systems. In USENIX '04, 2004.

[2] Michel Dagenais, Richard Moore, Robert Wisniewski, Karim Yaghmour, and Thomas Zanussi. Efficient and accurate tracing of events in Linux clusters. In Proceedings of the Conference on High Performance Computing Systems (HPCS), 2003.

[3] Vara Prasad, William Cohen, Frank Ch. Eigler, Martin Hunt, Jim Keniston, and Brad Chen. Locating system problems using dynamic instrumentation. In OLS (Ottawa Linux Symposium) 2005, 2005.

[4] Ariel Tamches and Barton P. Miller. Fine-grained dynamic instrumentation of commodity operating system kernels. In 3rd Symposium on Operating Systems Design and Implementation, February 1999.

[5] Robert W. Wisniewski and Bryan Rosenburg. Efficient, unified, and scalable performance monitoring for multiprocessor operating systems. In Supercomputing, 2003 ACM/IEEE Conference, 2003.

[6] Robert W. Wisniewski, Peter F. Sweeney, Kartik Sudeep, Matthias Hauswirth, Evelyn Duesterwald, Calin Cascaval, and Reza Azimi. PEM: performance and environment monitoring for whole-system characterization and optimization. In PAC2 (Conference on Power/Performance interaction with Architecture, Circuits, and Compilers), 2004.

[7] Karim Yaghmour and Michel R. Dagenais. The Linux Trace Toolkit. May 2000.

[8] Tom Zanussi, Karim Yaghmour, Robert Wisniewski, Richard Moore, and Michel Dagenais. relayfs: An efficient unified approach for transmitting data from kernel to user space. In OLS (Ottawa Linux Symposium) 2003, pages 519–531, 2003.
