trace-cmd: A front-end for Ftrace [LWN.net]

October 20, 2010
This article was contributed by Steven Rostedt
https://lwn.net/Articles/410200/

Previous LWN articles have explained the basic way to use Ftrace directly through the debugfs filesystem (part 1 and part 2). While the debugfs interface is rather simple, it can also be awkward to work with. It is especially convenient, though, for embedded platforms where it may be difficult to build and install special user tools on the device. On the desktop, it may be more convenient to have a command-line tool that works with Ftrace instead of echoing various commands into strange files and reading the result from another file. This tool does exist, and it is called trace-cmd.

trace-cmd is a user-space front-end command-line tool for Ftrace. You can download it from the git repository at git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git. Some distributions ship it as a package, and some that currently do not will soon. There are full man pages included, which are installed with a make install_doc. This article will not go over the information that is already in the man pages, but instead will explain a little about how trace-cmd works and how to use it.

How it works

A simple use case of trace-cmd is to record a trace and then report it.

    # trace-cmd record -e ext4 ls
    [...]
    # trace-cmd report
    version = 6
    CPU 1 is empty
    cpus=2
    trace-cmd-7374 [000] 1062.484227: ext4_request_inode: \
        dev 253:2 dir 40801 mode 33188
    trace-cmd-7374 [000] 1062.484309: ext4_allocate_inode: \
        dev 253:2 ino 10454 dir 40801 mode 33188

The above example enables the ext4 tracepoints for Ftrace, runs the ls command and records the Ftrace data into a file named trace.dat. The report command reads the trace.dat file and outputs the tracing data to standard output. Some metadata is also shown before the trace output is displayed: the version of the file, any empty CPU buffers, and the number of CPUs that were recorded.

By default, the record and report options write and read to the trace.dat file. You can use the -o or -i options to pick a different file to write to or read from respectively, but this article will use the default name when referencing the data file created by trace-cmd.

When recording a trace, trace-cmd will fork off a process for each CPU on the system. Each of these processes will open the file in debugfs that represents the CPU the process is dedicated to record from. The process recording CPU0 will open /sys/kernel/debug/tracing/per_cpu/cpu0/trace_pipe_raw, the process recording CPU1 will open a similar file in the cpu1 directory, and so on. The trace_pipe_raw file is a mapping directly to the Ftrace internal buffer for each CPU. Each of these CPU processes will read these files using splice to record into a temporary file during the trace. At the end of the record, the main process will concatenate the temporary files into a single trace.dat file.

There's no need to manually mount the debugfs filesystem before using the tool as trace-cmd will look to see if and where it is mounted. If debugfs is not mounted, it will automatically mount it at /sys/kernel/debug.
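
The lookup described above can be approximated by hand. What follows is only a minimal sketch, not trace-cmd's actual code: the helper names find_debugfs and list_raw_pipes are made up for illustration, and it assumes the mount table is readable at /proc/mounts in the usual "device mountpoint fstype ..." layout.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; these are not part of trace-cmd.

# Print the first debugfs mount point found in a mount table.
# $1 is the table to scan; it defaults to /proc/mounts.
# Returns non-zero when no debugfs entry is present.
find_debugfs() {
    awk '$3 == "debugfs" { print $2; found = 1; exit }
         END { exit !found }' "${1:-/proc/mounts}"
}

# Print the per-CPU raw pipe files under a debugfs mount point ($1).
# These are the files the per-CPU recorder processes read from.
list_raw_pipes() {
    for f in "$1"/tracing/per_cpu/cpu*/trace_pipe_raw; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Example use (reading the pipes themselves normally requires root):
#   dbg=$(find_debugfs) || mount -t debugfs nodev /sys/kernel/debug
#   list_raw_pipes "${dbg:-/sys/kernel/debug}"
```

Passing the mount table as an argument is only there so the helper can be tried against a saved copy of /proc/mounts without touching the running system.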

Recording a trace

As noted above, trace-cmd forks off a process for each CPU dedicated to record from that CPU but, in order to prevent scheduling interference, the threads are not pinned to a CPU. Pinning the threads to the CPU being traced may result in better cache usage, so a future version of trace-cmd may add an option to do that. The Ftrace ring buffers are allocated one per CPU, and each thread will read from a particular CPU's ring buffer. It is important to mention this because these threads can show up in the trace.

A common request is to have trace-cmd ignore events that are caused by trace-cmd itself. But it is not wise to ignore these events because they show where the tracer may have made an impact on what it is tracing. These events can be filtered out after the trace, but they are good to keep around in the trace.dat file in case some delay was caused by the trace itself, and the events may show that.

As trace-cmd is a front end to Ftrace, the arguments of record reflect some of the features of Ftrace. The -e option enables an event. The argument following the -e can be an event name, event subsystem name, or the special name all. The all name will make trace-cmd enable all events that the system supports. If a subsystem name is specified, then all events under that subsystem will be enabled during the trace. For example, specifying sched will enable all the events within the sched subsystem.

To enable a single event, the event name can be used by itself, or the subsystem:event format can be used. If the subsystem name is left off, then all events with the given name will be enabled. Currently this would not be an issue because, as of this writing, all events have unique names. If more than one event or subsystem is to be traced, then multiple -e options may be specified.

Ftrace also has special plugin tracers that do not simply trace specific events. These tracers include the function, function graph, and latency tracers.
Through the debugfs tracing directory, these plugins are enabled by echoing the type of tracer into the current_tracer file. With trace-cmd record, they are enabled with the -p option. Using the tracer plugin name as the argument for -p enables that plugin. You can still specify one or more events with a plugin, but you may only specify a single plugin, or no plugin at all.

When the record is finished, trace-cmd examines the kernel buffers and outputs some statistics, which may be a little confusing. Here's an example:

    Kernel buffer statistics:
      Note: "entries" are the entries left in the kernel ring buffer and are not
            recorded in the trace data. They should all be zero.

    CPU: 0
    entries: 0
    overrun: 0
    commit overrun: 0

    CPU: 1
    [...]

As the output explains, the entries field is not the number of entries that were traced, but the number of entries left in the kernel buffer. If entries were dropped because trace-cmd could not read the buffer faster than it was being written to, and the writer overflowed the buffer, then either the overrun or commit overrun values would be something other than zero. The overrun value is the number of entries that were dropped due to the buffer filling up, and the writer deleting the older entries.

The commit overrun is much less likely to occur. Writing to the buffer is a three-step process. First the writer reserves space in the ring buffer, then it writes to it, then it commits the change. Writing to the buffer does not disable interrupts. If a write is preempted by an interrupt, and the interrupt performs a large amount of tracing where it fills the buffer up to the point of the space that was reserved for the write it preempted, then it must drop events because it cannot touch the reserved space until it is committed. These dropped events are the commit overrun. This is highly unlikely to happen unless you have a small buffer.
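
Because a nonzero overrun or commit overrun means events were lost, it can be useful to check the statistics mechanically. The following is a minimal sketch under stated assumptions: check_overruns is a hypothetical helper, not a trace-cmd feature, and it assumes the statistics text has been captured (for example, saved to a file) with one field per line, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper for illustration; this is not part of trace-cmd.
# Reads "Kernel buffer statistics" text on stdin and prints one line for
# each CPU whose overrun or commit overrun counter is nonzero.
# Returns 1 if any events were dropped, 0 otherwise.
check_overruns() {
    awk '
        # Remember which CPU the following counters belong to.
        $1 == "CPU:" { cpu = $2 }

        # "overrun: N" counts old entries overwritten by the writer.
        $1 == "overrun:" && $2 > 0 {
            printf "CPU %s: overrun %s\n", cpu, $2
            bad = 1
        }

        # "commit overrun: N" counts events dropped behind a reserved,
        # not yet committed, write.
        $1 == "commit" && $2 == "overrun:" && $3 > 0 {
            printf "CPU %s: commit overrun %s\n", cpu, $3
            bad = 1
        }

        END { exit bad + 0 }
    '
}

# Example use, assuming the statistics were saved to a file named stats.txt:
#   check_overruns < stats.txt || echo "events were dropped"
```

If this reports drops, the discussion above suggests the usual cause is a reader that could not keep up with the writer or a buffer that is too small.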

Filtering while recording

As explained in "Secrets of the Ftrace function tracer", Ftrace allows you to filter which functions will be traced by the function tracer. Also, you can graph a single function, or a select set of functions, with the function graph tracer. These features are not lost when using trace-cmd.

    # trace-cmd record -p function -l 'sched_*' -n 'sched_slice'

(Note that the above does not specify a command to execute, so trace-cmd will record until Ctrl-C is hit.)

The -l option is the same as echoing its argument into set_ftrace_filter, and the -n option is the same as echoing its argument into set_ftrace_notrace. You can have more than one -l or -n option on the command line. trace-cmd will simply write all the arguments into the appropriate file. Note that those options are only useful with the function and function_graph plugins. The -g option (not shown) will pass its argument into the set_graph_function file.

Here is a nice trick to see how long interrupts take in the kernel:

    # trace-cmd record -p function_graph -l do_IRQ -e irq_handler_entry sleep 10
    # trace-cmd report
    version = 6
    cpus=2
    Xorg-4262 [001] 212767.154882: funcgraph_entry:               |  do_IRQ() {
    Xorg-4262 [001] 212767.154887: irq_handler_entry:    irq=21 name=sata_nv
    Xorg-4262 [001] 212767.154952: funcgraph_exit:     + 71.706 us |  }
    Xorg-4262 [001] 212767.156948: funcgraph_entry:               |  do_IRQ() {
    Xorg-4262 [001] 212767.156952: irq_handler_entry:    irq=22 name=ehci_hcd:usb1
    Xorg-4262 [001] 212767.156955: irq_handler_entry:    irq=22 name=NVidia CK804
    Xorg-4262 [001] 212767.156985: funcgraph_exit:     + 37.795 us |  }

The events can also be filtered.
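
For comparison with the -l and -n options described in this section, here is a rough sketch of doing the same function filtering by hand through the debugfs files. The helper name apply_filter is hypothetical, and the sketch assumes the usual Ftrace file behavior in which a truncating write resets a filter file and an appending write adds to it; it illustrates the mapping and is not what trace-cmd literally executes.

```shell
#!/bin/sh
# Hypothetical helper for illustration; this is not part of trace-cmd.
# Usage: apply_filter FILE PATTERN...
# Clears the filter file, then writes each pattern into it in turn,
# mirroring how several -l (or -n) options end up in one file.
apply_filter() {
    filter_file=$1
    shift
    : > "$filter_file"                       # truncating write: reset the list
    for pat in "$@"; do
        printf '%s\n' "$pat" >> "$filter_file"   # appending write: add a pattern
    done
}

# Example use (requires root and a kernel with the function tracer):
#   t=/sys/kernel/debug/tracing
#   apply_filter "$t/set_ftrace_filter"  'sched_*'      # like -l 'sched_*'
#   apply_filter "$t/set_ftrace_notrace" 'sched_slice'  # like -n 'sched_slice'
#   echo function > "$t/current_tracer"                 # like -p function
```

The patterns are quoted so that the shell passes globs such as sched_* through unchanged instead of expanding them against the current directory.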
