Combined Tracing of the Kernel and Applications with LTTng

Pierre-Marc Fournier, École Polytechnique de Montréal ([email protected])
Mathieu Desnoyers, École Polytechnique de Montréal ([email protected])
Michel R. Dagenais, École Polytechnique de Montréal ([email protected])

Abstract

Increasingly complex systems are being developed and put in production. Developers therefore face increasingly complex bugs. Kernel tracing provides an effective way of understanding system behavior and debugging many types of problems in the kernel and in userspace applications. In some cases, tracing events that occur in application code can further help by providing access to application activity unknown to the kernel.

LTTng now provides a way of tracing the kernel and the applications of a system simultaneously. The kernel instrumentation and event collection facilities were ported to userspace. This paper describes the architecture of the new LTTng Userspace Tracer and how it can be used in combination with the kernel tracer. Results of some early performance tests are also presented.

1 Introduction

Technologies such as multi-core, clusters, and embedded systems are used to build increasingly complex systems, which results in developers facing increasingly complex bugs. These bugs may, for example, occur only in production, disappear when probed, occur rarely, or have a slowdown of the system as the only symptom. These characteristics make traditional debugging tools ineffective against them. New debugging tools are therefore required.

The impact of these tools on system performance must be as small as possible, so they can run on systems in production, whose hardware is chosen to match the production load (and not debugging tools), or on which adding debugging tools may render certain bugs unreproducible.

Kernel tracing is one of these tools. It may be used to understand a great variety of bugs. Quite often, the kernel is aware of all the important activities of an application, because they involve system calls or traps. In certain cases, however, kernel tracing is not sufficient. For example, the execution of applications that process a large number of requests or that have a large number of threads may be more difficult to follow from a kernel perspective. For this reason, applications need to be traceable too. It is moreover highly desirable that userspace trace events be correlatable with kernel events during the analysis phase.

LTTng [3], while providing a highly efficient kernel tracer, lacks a userspace tracer of equal performance. In this paper, the LTTng Userspace Tracer, a work in progress to fill this gap, is described. In the next sections, its architecture is presented. Afterwards, performance tests are discussed, followed by proposals for future work.

2 Related Work

The classic strace tool provides a primitive form of userspace tracing. It reports the system calls and signals of a process. Unfortunately, its usage induces a significant performance penalty. It is moreover limited to tracing system calls and signals, both of which are nowadays obtainable at a much lower cost through kernel tracing.

DTrace has statically defined tracing (SDT) that can be used inside userspace applications [6]. This implementation uses special support inside the runtime linker. Upon activation of an instrumentation point, NOP instructions placed by the linker are replaced by an instruction that provokes a trap. Probes are limited to 4 arguments.

SystemTap has an implementation of SDT [2] that seems to be very similar to that of DTrace.

LTTng has offered several different userspace tracing technologies over the years. The first is called "system call assisted" userspace tracing. It declares two new system calls. The first is used to register an event type; it returns an event ID. The second is used to record an event; it requires an event ID and a payload to be passed as arguments.

The second, called "companion process" userspace tracing, requires no kernel support. Processes write their events in buffers in their own address space. Each thread had a "companion" process, created by the tracing library, that shares the buffers (through a shared memory map). The companion consumes the buffers using a lockless algorithm.

After some refactoring of the LTTng core, these two approaches were dropped. Eventually, a quick replacement was devised, which consists of a simple system call taking a string as argument. Calling it produces an event whose argument is the string. The event always has the same name; an indication of the application generating the event needs to be prepended to the string.

Eventually, the feature was moved from a system call to a file in DebugFS (/debug/ltt/write_event). Writing a string to this file generates an event called userspace_event whose argument is the string. ftrace [7], another kernel tracer, has a similar feature using a file called trace_marker.

3 Architecture

The LTTng Userspace Tracer (UST) is a port of the LTTng static kernel tracer to userspace. This section describes the architecture of the UST, insisting on the particularities of the userspace implementation. Figure 1 shows an overview of the UST architecture.

Here are some of the important design goals of the UST that influenced its architecture:

• It should be completely independent from the kernel tracer. Kernel and userspace traces should be correlated at analysis time.

• It should be completely reentrant, supporting multi-threaded applications and tracing of events in signal handlers.

• There should be no system call in the fast path.

• The trace data should never be copied.

• It should be possible to trace code in shared libraries as well as in the executable.

• The instrumentation points should support an unlimited number of arguments (the only constraint is that an event must fit in a sub-buffer).

• No special support from the linker or compiler should be required.

• The trace format should be compact.

3.1 Tracing Library

Programs that must be traced are linked with the tracing library (libust). They must also be linked with the Userspace Read-Copy-Update library (liburcu) [4], which is used for lockless manipulation of data structures. They must finally be linked with libkcompat [5], a library that provides a userspace version of some APIs available in the kernel (atomic operations, linked list manipulation, kref-style mechanism, and others).

3.2 Time

There are no dependencies between the kernel and userspace tracers of LTTng. However, in order to do a combined analysis of a kernel trace and of userspace traces, the timestamps of all traces must be coherent (i.e. they must come from the same time source).

An appropriate tracing time source must have a high resolution in addition to being coherent across cores. The cost of accessing this time source must be low in kernel space, but also in userspace (making a system call is too costly). Work is needed in the Linux kernel to make such a time source, with all these characteristics, available on all architectures.

The UST code currently works only on x86 (32 and 64 bits). Until a suitable time source is provided by the kernel, the TSC is read directly with the rdtsc instruction. This is the same time source used by the kernel tracer. It is quick to read and synchronized across cores in most variants of the architecture.

[Figure 1 shows the path of the trace data (zero copy): the traced application, linked with the tracing library, writes events into trace buffers in a shared memory segment; the consumer daemon (ustd), created only when needed, maps those buffers and writes the trace to disk or to the network; the ust trace control program exchanges tracing commands and consumer synchronization messages over sockets with the traced application's communication thread and with ustd.]

Figure 1: Overview of the LTTng Userspace Tracer architecture.

3.3 Instrumentation Points

Instrumentation points are ports of the two kernel instrumentation technologies LTTng uses: markers and tracepoints. Their usage is the same as in the kernel.

Inserting a marker is as simple as adding a single line of code at the point where the event must be recorded. Figure 2 shows an example of a marker. Markers include a format string that resembles printf format strings; it gives the name of each argument, the format of the event in the trace, and the type of the variable passed to trace_mark.

Tracepoints are designed to be more elegant and to provide type checking. An example is shown in Figure 3. They do not include a format string in the call, but necessitate some declarations, typically in separate C and header files. This makes them more suitable for permanent instrumentation, while markers are best suited for quickly adding instrumentation while debugging.

Markers and tracepoints list information about themselves in a special section of the executable or dynamic object. Each library and each executable that contains instrumentation must therefore register its markers/tracepoints section via a call to the tracing library. This is done by invoking, in one of the source files of every executable and every dynamic object, a macro that adds a constructor doing this automatically.

Each of the instrumentation points may be enabled or disabled by the user, even while the trace is active. Each time the control flow passes over an instrumentation point, a global variable is tested to verify whether it is enabled.

3.4 Buffering Mechanism

The buffering mechanism is a port of the lockless LTTng algorithm. Its design reuses many ideas from K42 and from the Linux kernel Relay [8] system.

Events are written into a circular, per-process buffer, which is divided into sub-buffers. By default, when a sub-buffer is full, it is consumed by a consumer daemon. In another operating mode, called flight recorder, the circular buffers are constantly overwritten until they are flushed, either by the user or by a program. This is useful to wait until an infrequent bug occurs in the application.

Each event is associated with a channel, and each process has a distinct buffer for each channel. Having several channels makes it possible to choose the sub-buffer size per channel. It also allows keeping some events for a longer period of time, by putting them in a low-rate buffer, when operating in flight recorder mode.

The buffers are allocated inside System V shared memory segments so that the consumer daemon can map them into its address space.

Writes to the buffers are done using a lockless algorithm whose correctness was formally verified. It is therefore reentrant as well as thread- and signal-safe.

3.5 Trace Control

There needs to be a way to command an application to start or stop tracing, to enable, disable or list instrumentation points, and to control trace parameters such as sub-buffer size and count.

trace_mark(main, myevent, "firstarg %d secondarg %s", v, st);

Figure 2: Example of a marker. The first argument is the channel and the second is the event name. The third is the format string of the event arguments, and the remaining ones are the event arguments themselves. In the format string, each argument name is followed by its format.

trace_main_myevent(v, st);

Figure 3: Example of a tracepoint.

A helper application called ust is used for this purpose. It communicates with the traceable application through a Unix socket. Communications are handled by a special thread in the traced process.

In order for the UST to be as minimally invasive as possible, this thread is not launched automatically when the application starts. Instead, the tracing library constructor registers a signal handler for a particular signal. When that signal is received, a listener thread is started. This thread creates a named socket in a predefined directory; the name of the socket is the process ID.

For now, the SIGIO signal is used. Although this signal is occasionally used for other purposes, the siginfo_t structure makes it possible to determine whether the signal was sent by a process or by the kernel.

3.6 Data Collection

A single process collects trace data for all processes being traced on the system. This process is called ustd. It opens a named socket, called ustd and located in the same directory as the applications' sockets. Through it, ustd can be commanded to collect the trace data of a certain buffer of a given PID.

Upon receiving this command, ustd creates a new thread that connects to the socket of the traced process, first sending it the SIGIO signal if the socket is not yet available. It then requests the shared memory segment IDs for the buffers and maps them.

Still using the socket of the traced application, this consumer thread sends a command requesting access to the next sub-buffer. When the next unconsumed sub-buffer is full, a reply is sent, and the consumer thread writes its data to the trace file, reading from the shared memory segment. Because the memory area passed to write() is in the shared memory segment, no copying in RAM occurs.

3.7 Early and Late Tracing

A few complicating factors must be taken into account when tracing very early or very late in the program lifespan.

3.7.1 Tracing from program start

Sometimes, it is important to trace the program from its beginning. One can try to start the program, and then enable tracing. But chances are that by the time the SIGIO signal is sent and received, and the command to start tracing is sent through the socket, some events will have been lost. In some cases, the program may have already ended.

Therefore, the UST has a special mechanism for tracing from the beginning of the program execution. To trace a program from its beginning, the user can run the program with two environment variables defined. These variables are parsed by the tracing library constructor. Defining both of them guarantees that by the time the program enters its main() function, tracing will have started.

UST_TRACE=1 Automatically activate tracing on program start.

UST_AUTOPROBE=1 Automatically enable all instrumentation points.

3.7.2 Tracing until the end of the program

Things are also slightly more complicated when tracing near the end of a program. The program can crash and be unable to notify ustd that its last sub-buffer should be consumed. Worse, it may end before ustd is able to map its buffers. In the former case, the end of the trace will be lost. In the latter, the full trace is lost, since the kernel deallocates shared memory segments when their last user disconnects from them. The following describes how the UST deals with these issues.

When a program crashes, its socket connections are closed by the kernel. ustd can detect this and run a crash recovery procedure on the buffer. The recovery procedure identifies which sub-buffers contain data that is not yet consumed, and how much data can be recovered in each one of them. This data is appended to the trace file. The procedure guarantees that all the events, up to the last one recovered, are valid and that none was skipped (provided no events were lost in the buffer due to overflow). It is possible to determine what data in each sub-buffer is valid because some counters used in the atomic algorithm are mapped, along with the buffer, in the shared memory segment.

When the program lifetime is too short for ustd to have time to map its memory, a different problem is encountered. Although the UST does not yet support this case, it is planned to use a destructor to handle it. If the destructor of the tracing library detects that a trace is being recorded and that its buffers have not yet been mapped, it will extend the life of the process slightly to give ustd time to map them.

4 Trace Analysis

LTTV [1], the LTTng Viewer, is a graphical trace viewer for LTTng traces. LTTV provides a number of graphical, statistical and text-based views for traces. Furthermore, it has the ability to display concurrently the events of several traces that were recorded simultaneously. This is useful for viewing traces recorded in virtual machines at the same time as a trace of the host system.

This feature can also be used to display a kernel trace at the same time as userspace traces. In the event list, the events of all the traces are then interleaved. This makes it possible to get a better grasp of problems that involve both the userspace and the kernel side. The usage of a precise and common time source ensures that events in the list are correctly ordered even if they are produced on different cores or on different sides of the kernel/userspace border.

5 Performance

This section presents some early performance measurements for the UST, as well as a comparison with the performance of DTrace SDT for an equivalent tracing task.

The tests were run cache-hot on a dual quad-core Xeon 2GHz with 8GB of RAM. DTrace was run under OpenSolaris. The test consisted in running 60 times the command find /usr -regex '.*a'. This regular expression was chosen arbitrarily to provoke malloc/free activity.

The calls to malloc and free made by find were instrumented. This was done by intercepting the calls to them using a shared library loaded with LD_PRELOAD. The intercepting functions contained the actual instrumentation points and called the real version of the function. The malloc/free interception was active for all tests, even when not tracing. The malloc/free arguments and return values were recorded by the instrumentation.

Event counts vary between DTrace and UST tests because the /usr directory contained more files on the Linux system (for UST tests) than on the OpenSolaris system (used for DTrace tests).

The DTrace performance (Table 1) was first measured with tracing disabled. Then, it was measured with tracing enabled, with two different scripts. One (printing probe) printed the function name (malloc or free), its arguments and its return value; the output was redirected to a file. The other (simple probe) only counted the number of events; its aim was to verify how much time is due to the actual printing operation. The cost per event was obtained by taking the time in excess of the time with tracing disabled and dividing it by the number of events.

The UST performance (Table 2) was measured first with the instrumentation not compiled in, and then compiled in. The difference between these two measures was not significant. In fact, in these tests, the execution time diminished when compiling in the instrumentation. With probes connected but tracing not active, the execution time was slightly higher. In this mode of operation, a function call is made upon hitting an instrumentation point, but the function returns almost immediately, after finding out that tracing is disabled. Finally, with tracing

Test                       Exec. time   Nb. of events   Cost / event
Not tracing                53.29 s      –               –
Tracing, simple probe      251.81 s     44,085,780      4.5 µs
Tracing, printing probe    274.51 s     44,085,780      5.0 µs

Table 1: DTrace results.

Test                                            Exec. time   Nb. of events   Cost / event
Not tracing, instrumentation not compiled in    92.61 s      –               –
Not tracing, instrumentation compiled in        92.18 s      145,168,560     ≈ 0
Not tracing, probes connected                   99.25 s      145,168,560     46 ns
Tracing                                         193.94 s     145,168,560     698 ns

Table 2: LTTng Userspace Tracer results.

enabled, a cost per event of 698 ns was obtained. The cost per event was calculated by taking the time in excess of the time with the instrumentation not compiled in, and dividing it by the number of events.

The LTTng UST had a cost per event more than 7 times lower than DTrace. This difference is explained by the method used by each tracer to record events: while DTrace executes a trap at each event, the UST writes the event into a buffer in the program's memory, saving a round trip to the kernel.

The UST has a low per-event cost, while having no apparent impact while disabled. This makes it particularly useful in production systems, and in other systems where affecting performance as little as possible is critical. Its compact trace format further limits its impact by limiting disk and network usage.

As the UST becomes more mature, it is likely that new optimizations will result in an even lower cost per event. The Future Work section mentions a few possibilities to this effect.

6 Future Work

The current per-process buffers were a simple first step for a port. However, this approach has an important limitation: it induces cacheline bouncing on multi-threaded applications. Using per-thread buffers would fix this problem.

In the kernel, the most optimized variant of the markers uses immediate values, a technique that modifies an instruction at the instrumentation point site when enabling or disabling markers. This code modification consists in changing the immediate value of a load immediate instruction. This instruction is immediately followed by a test of the register into which the value was loaded; depending on the result of the test, the event is recorded or not. Although this approach is faster than the current test of a global variable, it is much more architecture-dependent.

UST_AUTOPROBE should allow the specification of a list or pattern of markers. Its current limitation of activating all markers at once may cause a higher performance penalty than necessary on programs where markers that are encountered extremely often are compiled in but not needed for the specific problem being debugged.

Complex programs that necessitate userspace tracing are often written in high-level languages. Therefore, the UST should be made available to these languages. For example, a Java API using the JNI to interface with the C API would be straightforward to implement.

Work is currently in progress to enhance the daemon so it can send traces over a network. This is particularly useful on special-purpose systems with little or no disk space available.

References

[1] LTTV. http://lttng.org.

[2] SystemTap static probes. https://fedoraproject.org/wiki/Features/SystemtapStaticProbes.

[3] Mathieu Desnoyers and Michel R. Dagenais. The LTTng tracer: A low impact performance and

behavior monitor for GNU/Linux. In Linux Symposium, Ottawa, Ontario, Canada, June 2006.

[4] Mathieu Desnoyers and Paul E. McKenney. Userspace Read-Copy-Update Library. http://ltt.polymtl.ca/cgi-bin/gitweb.cgi?p=userspace-rcu.git.

[5] Pierre-Marc Fournier and Jan Blunck. libkcompat. http://git.dorsal.polymtl.ca/?p=libkcompat.git.

[6] Frank Hofmann. The DTrace backend on Solaris for x86/x64. http://opensolaris.org/os/project/czosug/events_archive/czosug2_dtrace_x86.pdf.

[7] Steve Rostedt. ftrace. http://lwn.net/Articles/290277/.

[8] Karim Yaghmour, R. Wisniewski, R. Moore, and M. Dagenais. relayfs: An efficient unified approach for transmitting data from kernel to user space. In Linux Symposium, Ottawa, Ontario, Canada, 2003.

Proceedings of the Linux Symposium

July 13th–17th, 2009
Montreal, Quebec, Canada

Conference Organizers

Andrew J. Hutton, Steamballoon, Inc., Linux Symposium, Thin Lines Mountaineering

Programme Committee

Andrew J. Hutton, Steamballoon, Inc., Linux Symposium, Thin Lines Mountaineering
James Bottomley, Novell
Bdale Garbee, HP
Dave Jones, Red Hat
Dirk Hohndel, Intel
Gerrit Huizenga, IBM
Alasdair Kergon, Red Hat
Matthew Wilson, rPath

Proceedings Committee

Robyn Bergeron
Chris Dukes, workfrog.com
Jonas Fonseca
John ‘Warthog9’ Hawley

With thanks to John W. Lockhart, Red Hat

Authors retain copyright to all submitted papers, but have granted unlimited redistribution rights to all as a condition of submission.