NanoLog: A Nanosecond Scale Logging System
Stephen Yang, Seo Jin Park, and John Ousterhout, Stanford University
https://www.usenix.org/conference/atc18/presentation/yang-stephen

This paper is included in the Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC ’18). July 11–13, 2018 • Boston, MA, USA ISBN 978-1-939133-02-1


Abstract

NanoLog is a nanosecond scale logging system that is 1-2 orders of magnitude faster than existing logging systems such as Log4j2, spdlog, Boost log or Event Tracing for Windows. The system achieves a throughput up to 80 million log messages per second for simple messages and has a typical log invocation overhead of 8 nanoseconds in microbenchmarks and 18 nanoseconds in applications, despite exposing a traditional printf-like API. NanoLog achieves this low latency and high throughput by shifting work out of the runtime hot path and into the compilation and post-execution phases of the application. More specifically, it slims down user log messages at compile-time by extracting static log components, outputs the log in a compacted, binary format at runtime, and utilizes an offline process to re-inflate the compacted logs. Additionally, log analytic applications can directly consume the compacted log and see a performance improvement of over 8x due to I/O savings. Overall, the lower cost of NanoLog allows developers to log more often, log in more detail, and use logging in low-latency production settings where traditional logging mechanisms are too expensive.

1 Introduction

Logging plays an important role in production software systems, and it is particularly important for large-scale distributed systems running in datacenters. Log messages record interesting events during the execution of a system, which serve several purposes. After a crash, logs are often the best available tool for debugging the root cause. In addition, logs can be analyzed to provide visibility into a system's behavior, including its load and performance, the effectiveness of its policies, and rates of recoverable errors. Logs also provide a valuable source of data about user behavior and preferences, which can be mined for business purposes. The more events that are recorded in a log, the more valuable it becomes.

Unfortunately, logging today is expensive. Just formatting a simple log message takes on the order of one microsecond in typical logging systems. Additionally, each log message typically occupies 50-100 bytes, so available I/O bandwidth also limits the rate at which log messages can be recorded. As a result, developers are often forced to make painful choices about which events to log; this impacts their ability to debug problems and understand system behavior.

Slow logging is such a problem today that software development organizations find themselves removing valuable log messages to maintain performance. According to our contacts at Google [7] and VMware [28], a considerable amount of time is spent in code reviews discussing whether to keep log messages or remove them for performance. Additionally, this process culls a lot of useful debugging information, resulting in many more person hours spent later debugging. Logging itself is expensive, but lacking proper logging is very expensive.

The problem is exacerbated by the current trend towards low-latency applications and micro-services. Systems such as Redis [34], FaRM [4], MICA [18] and RAMCloud [29] can process requests in as little as 1-2 microseconds; with today's logging systems, these systems cannot log events at the granularity of individual requests. This mismatch makes it difficult or impossible for companies to deploy low-latency services. One industry partner informed us that their company will not deploy low latency systems until there are logging systems fast enough to be used with them [7].

NanoLog is a new, open-source [47] logging system that is 1-2 orders of magnitude faster than existing systems such as Log4j2 [43], spdlog [38], glog [11], Boost Log [2], or Event Tracing for Windows [31]. NanoLog retains the convenient printf [33]-like API of existing logging systems, but it offers a throughput of around 80 million messages per second for simple log messages, with a caller latency of only 8 nanoseconds in microbenchmarks. For reference, Log4j2 only achieves a throughput of 1.5 million messages per second with latencies in the hundreds of nanoseconds for the same microbenchmark.

NanoLog achieves this performance by shifting work out of the runtime hot path and into the compilation and post-execution phases of the application:

• It rewrites logging statements at compile time to remove static information and defers expensive message formatting until the post-execution phase. This dramatically reduces the computation and I/O bandwidth requirements at runtime.

• It compiles specialized code for each log message to handle its dynamic arguments efficiently. This avoids runtime parsing of log messages and encoding argument types.

• It uses a lightweight compaction scheme and outputs the log out-of-order to save I/O and processing at runtime.

• It uses a postprocessor to combine compacted log data with extracted static information to generate human-readable logs. In addition, aggregation and analytics can be performed directly on the compacted log, which improves throughput by over 8x.

NANO_LOG(NOTICE, "Creating table '%s' with id %d", name, tableId);

2017/3/18 21:35:16.554575617 TableManager.cc:1031 NOTICE[4]: Creating table 'orders' with id 11

Figure 1: A typical logging statement (top) and the resulting output in the log file (bottom). "NOTICE" is a log severity level and "[4]" is a thread identifier.

2 Background and Motivation

Logging systems allow developers to generate a human-readable trace of an application during its execution. Most logging systems provide facilities similar to those in Figure 1. The developer annotates system code with logging statements. Each logging statement uses a printf-like interface [33] to specify a static string indicating what just happened and also some runtime data associated with the event. The logging system then adds supplemental information such as the time when the event occurred, the source code file and line number of the logging statement, a severity level, and the identifier of the logging thread.

The simplest implementation of logging is to output each log message synchronously, inline with the execution of the application. This approach has relatively low performance, for two reasons. First, formatting a log message typically takes 0.5-1 µs (1000-2000 cycles). In a low latency server, this could represent a significant fraction of the total service time for a request. Second, the I/O is expensive. Log messages are typically 50-100 bytes long, so a flash drive with 250 Mbytes/sec bandwidth can only absorb a few million messages per second. In addition, the application will occasionally have to make kernel calls to flush the log buffer, which will introduce additional delays.

The most common solution to these problems is to move the expensive operations to a separate thread. For example, I/O can be performed in a background thread: the main application thread writes log messages to a buffer in memory, and the background thread makes the kernel calls to write the buffer to a file. This allows I/O to happen in parallel with application execution. Some systems, such as TimeTrace in PerfUtils [32], also offload the formatting to the background thread by packaging all the arguments into an executable lambda, which is evaluated by the background thread to format the message.

Unfortunately, moving operations to a background thread has limited benefit because the operations must still be carried out while the application is running. If log messages are generated at a rate faster than the background thread can process them (either because of I/O or CPU limitations), then either the application must eventually block, or it must discard log messages. Neither of these options is attractive. Blocking is particularly unappealing for low-latency systems because it can result in long tail latencies or even, in some situations, the appearance that a server has crashed.

In general, developers must ensure that an application doesn't generate log messages faster than they can be processed. One approach is to filter log messages according to their severity level; the threshold might be higher in a production environment than when testing. Another possible approach is to sample log messages at random, but this may cause key messages (such as those identifying a crash) to be lost. The final (but not uncommon) recourse is a social process whereby developers determine which log messages are most important and remove the less critical ones to improve performance. Unfortunately, all of these approaches compromise visibility to get around the limitations of the logging system.

The design of NanoLog grew out of two observations about logging. The first observation is that fully-formatted human-readable messages don't necessarily need to be produced inside the application. Instead, the application could log the raw components of each message and the human-readable messages could be generated later, if/when a human needs them. Many logs are never read by humans, in which case the message formatting step could be skipped. When logs are read, only a small fraction of the messages are typically examined, such as those around the time of a crash, so only a small fraction of logs needs to be formatted. And finally, many logs are processed by analytics engines. In this case, it is much faster to process the raw data than a human-readable version of the log.

The second observation is that log messages are fairly redundant and most of their content is static. For example, in the log message in Figure 1, the only dynamic parts of the message are the time, the thread identifier, and the values of the name and tableId variables. All of the other information is known at compile-time and is repeated in every invocation of that logging statement. It should be possible to catalog all the static information at compile-time and output it just once for the postprocessor. The postprocessor can reincorporate the static information when it formats the human-readable messages. This approach dramatically reduces the amount of information that the application must log, thereby allowing the application to log messages at a much higher rate.

The remainder of this paper describes how NanoLog capitalizes on these observations to improve logging performance by 1-2 orders of magnitude.
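The first observation can be made concrete with a small stand-alone sketch (hypothetical types and names, not NanoLog's actual API): instead of formatting on the hot path, the application copies only the raw dynamic arguments into a buffer, and a later, off-line step reunites them with the static format string.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sketch: defer formatting by logging only dynamic data.
struct RawEntry {
    const char* fmt;          // static format string (known at compile time)
    std::vector<char> args;   // raw binary copies of the dynamic arguments
};

// Hot path: copy the raw bytes of each argument; no formatting occurs.
RawEntry recordRaw(const char* fmt, const char* name, int tableId) {
    RawEntry e{fmt, {}};
    size_t len = strlen(name) + 1;                 // keep the terminator
    e.args.insert(e.args.end(), name, name + len);
    const char* p = reinterpret_cast<const char*>(&tableId);
    e.args.insert(e.args.end(), p, p + sizeof(tableId));
    return e;
}

// Cold path (post-execution): reunite static and dynamic parts.
std::string inflate(const RawEntry& e) {
    const char* name = e.args.data();
    int tableId;
    memcpy(&tableId, e.args.data() + strlen(name) + 1, sizeof(tableId));
    char out[128];
    snprintf(out, sizeof(out), e.fmt, name, tableId);
    return out;
}
```

For the statement in Figure 1, recordRaw() stores only "orders" plus a terminator and 4 bytes of integer (11 bytes total) rather than the roughly 100-byte formatted line, and inflate() runs only if a human ever reads the log.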

3 Overview

NanoLog's low latency comes from performing work at compile-time to extract static components from log messages and deferring formatting to an off-line process. As a result, the NanoLog system decomposes into three components, as shown in Figure 2:

Preprocessor/Combiner: extracts and catalogs static components from log messages at compile-time, replaces original logging statements with optimized code, generates a unique compaction function for each log message, and generates a function to output the dictionary of static information.

Runtime Library: provides the infrastructure to buffer log messages from multiple logging threads and outputs the log in a compact, binary format using the generated compaction and dictionary functions.

Decompressor: recombines the compact, binary log file with the static information in the dictionary to either inflate the logs to a human-readable format or run analytics over the log contents.

Figure 2: Overview of the NanoLog system. At compile time, the user sources are passed through the NanoLog preprocessor, which injects optimized logging code into the application and generates a metadata file for each source file. The modified user code is then compiled to produce C++ object files. The metadata files are aggregated by the NanoLog combiner to build a portion of the NanoLog Library. The NanoLog library is then compiled and linked with the user object files to create an application executable and a decompressor application. At runtime, the user application threads interact with the NanoLog staging buffers and background compaction thread to produce a compact log. At post execution, the compact log is passed into the decompressor to generate a final, human-readable log file.

Users of NanoLog interact with the system in the following fashion. First, they embed NANO_LOG() function calls in their C++ applications where they'd like log messages. The function has a signature similar to printf [17, 33] and supports all the features of printf with the exception of the "%n" specifier, which requires dynamic computation. Next, users integrate into their GNUmakefiles [40] a macro provided by NanoLog that serves as a drop-in replacement for a compiler invocation, such as g++. This macro will invoke the NanoLog preprocessor and combiner on the user's behalf and generate two executables: the user application linked against the NanoLog library, and a decompressor executable to inflate/run analytics over the compact log files. As the application runs, a compacted log is generated. Finally, the NanoLog decompressor can be invoked to read the compacted log and produce a human-readable log.

4 Detailed Design

We implemented the NanoLog system for C++ applications and this section describes the design in detail.

4.1 Preprocessor

The NanoLog preprocessor interposes in the compilation process of the user application (Figure 2). It processes the user source files and generates a metadata file and a modified source file for each user source file. The modified source files are then compiled into object files. Before the final link step for the application, the NanoLog combiner reads all the metadata files and generates an additional C++ source file that is compiled into the NanoLog Runtime Library. This library is then linked into the modified user application.

In order to improve the performance of logging, the NanoLog preprocessor analyzes the NANO_LOG() statements in the source code and replaces them with faster code. The replacement provides three benefits. First, it reduces I/O bandwidth by logging only information that cannot be known until runtime. Second, NanoLog logs information in a compacted form. Third, the replacement code executes much more quickly than the original code. For example, it need not combine the dynamic data with the format string, or convert binary values to strings; the data is logged in a binary format.

The preprocessor also extracts type and order information from the format string (e.g., a "%d %f" format string indicates that the log function should encode an integer followed by a float). This allows the preprocessor to generate more efficient code that accepts and processes exactly the arguments provided to the log message. Type safety is ensured by leveraging the GNU format attribute compiler extension [10].

The NanoLog preprocessor generates two functions for each NANO_LOG() statement. The first function, record(), is invoked in lieu of the original NANO_LOG() statement. It records the dynamic information associated with the log message into an in-memory buffer. The second function, compact(), is invoked by the NanoLog background compaction thread to compact the recorded data for more efficient I/O.

Figure 3 shows slightly simplified versions of the functions generated for the NANO_LOG() statement in Figure 1:

// Unique identifier for this log statement;
// the actual value is computed by the combiner.
extern const int _logId_TableManager_cc_line1031;

inline void record(buffer, name, tableId) {
    buffer.push(_logId_TableManager_cc_line1031);
    buffer.pushTime();
    buffer.pushString(name);
    buffer.push(tableId);
}

inline void compact(buffer, char *out) {
    pack(buffer, out);       // logId
    packTime(buffer, out);   // time
    packString(buffer, out); // string name
    pack(buffer, out);       // tableId
}

Figure 3: Sample code generated by the NanoLog preprocessor and combiner for the log message in Figure 1. The record() function stores the dynamic log data to a buffer and compact() compacts the buffer's contents to an output character array.

The record() function performs the absolute minimum amount of work needed to save the log statement's dynamic data in a buffer. The invocation time is read using Intel's RDTSC instruction, which utilizes the CPU's fine grain Time Stamp Counter [30]. The only static information it records is a unique identifier for the NANO_LOG() statement, which is used by the NanoLog runtime background thread to invoke the appropriate compact() function and by the decompressor to retrieve the statement's static information. The types of name and tableId were determined at compile-time by the preprocessor by analyzing the "%s" and "%d" specifiers in the format string, so record() can invoke type-specific methods to record them.

The purpose of the compact() function is to reduce the number of bytes occupied by the logged data, in order to save I/O bandwidth. The preprocessor has already determined the type of each item of data, so compact() simply invokes a type-specific compaction method for each value. Section 4.2.2 discusses the kinds of compaction that NanoLog performs and the trade-off between compute time and compaction efficiency.

In addition to the record() and compact() functions, the preprocessor creates a dictionary entry containing all of the static information for the log statement. This includes the file name and line number of the NANO_LOG() statement, the severity level and format string for the log message, the types of all the dynamic values that will be logged, and the name of a variable that will hold the unique identifier for this statement.

After generating this information, the preprocessor replaces the original NANO_LOG() invocation in the user source with an invocation to the record() function. It also stores the compact() function and the dictionary information in a metadata file specific to the original source file.

The NanoLog combiner executes after the preprocessor has processed all the user files (Figure 2); it reads all of the metadata files created by the preprocessor and generates additional code that will become part of the NanoLog runtime library. First, the combiner assigns unique identifier values for log statements. It generates code that defines and initializes one variable to hold the identifier for each log statement (the name of the variable was specified by the preprocessor in the metadata file). Deferring identifier assignment to the combiner allows for a tight and contiguous packing of values while allowing multiple instances of the preprocessor to process client sources in parallel without synchronization. Second, the combiner places all of the compact() functions from the metadata files into a function array for the NanoLog runtime to use. Third, the combiner collects the dictionary information for all of the log statements and generates code that will run during application startup and write the dictionary into the log.

4.2 NanoLog Runtime

The NanoLog runtime is a statically linked library that runs as part of the user application and decouples the low-latency application threads executing the record() function from high latency operations like disk I/O. It achieves this by offering low-latency staging buffers to store the results of record() and a background compaction thread to compress the buffers' contents and issue disk I/O.

4.2.1 Low Latency Staging Buffers

Staging buffers store the result of record(), which is executed by the application logging threads, and make the data available to the background compaction thread. Staging buffers have a crucial impact on performance as they are the primary interface between the logging and background threads. Thus, they must be as low latency as possible and avoid thread interactions, which can result in lock contention and cache coherency overheads.

Locking is avoided in the staging buffers by allocating a separate staging buffer per logging thread and implementing the buffers as single producer, single consumer circular queues [24]. The allocation scheme allows multiple logging threads to store data into the staging buffers without synchronization between them, and the implementation allows the logging thread and background thread to operate in parallel without locking the entire structure. This design also provides a throughput benefit as the source and drain operations on a buffer can be overlapped in time.

However, even with a lockless design, the threads' accesses to shared data can still cause cache coherency delays in the CPU. More concretely, the circular queue implementation has to maintain a head position for where the log data starts and a tail position for where the data ends. The background thread modifies the head position to consume data and the logging thread modifies the tail position to add data. However, for either thread to query how much space it can use, it needs to access both variables, resulting in expensive cache coherency traffic.

NanoLog reduces cache coherency traffic in the staging buffers by performing multiple inserts or removes for each cache miss. For example, after the logging thread reads the head pointer, which probably results in a cache coherency miss since it's modified by the background thread, it saves a copy in a local variable and uses the copy until all available space has been consumed. Only then does it read the head pointer again. The compaction thread caches the tail pointer in a similar fashion, so it can process all available log messages before incurring another cache miss on the tail pointer. This mechanism is safe because there is only a single reader and a single writer for each staging buffer.

Finally, the logging and background threads store their private variables on separate cachelines to avoid false sharing [1].

4.2.2 High Throughput Background Thread

To prevent the buffers from running out of space and blocking, the background thread must consume the log messages placed in the staging buffer as fast as possible. It achieves this by deferring expensive log processing to the post-execution decompressor application and compacting the log messages to save I/O.

The NanoLog background thread defers log formatting and chronology sorting to the post-execution application to reduce log processing latency. For comparison, consider a traditional logging system; it outputs the log messages in a human-readable format and in chronological order. The runtime formatting incurs computation costs and bloats the log message. And maintaining chronology means the background thread must either serialize all logging or sort the log messages from concurrent logging threads at runtime. Both of these operations are expensive, so the background thread performs neither of these tasks. The NanoLog background thread simply iterates through the staging buffers in round-robin fashion and for each buffer, processes the buffer's entire contents and outputs the results to disk. The processing is also non-quiescent, meaning a logging thread can record new log messages while the background thread is processing its staging buffer's contents.

Additionally, the background thread needs to perform some sort of compression on the log messages to reduce I/O latency. However, compression only makes sense if it reduces the overall end-to-end latency. In our measurements, we found that while existing compression schemes like the LZ77 algorithm [49] used by gzip [9] were very effective at reducing file size, their computation times were too high; it was often faster to output the raw log messages than to perform any sort of compression. Thus, we developed our own lightweight compression mechanism for use in the compact() function.

NanoLog attempts to compact the integer types by finding the fewest number of bytes needed to represent that integer. The assumptions here are that integers are the most commonly logged type, and most integers are fairly small and do not use all the bits specified by their type. For example, a 4 byte integer of value 200 can be represented with 1 byte, so we encode it as such. To keep track of the number of bytes used for the integer, we add a nibble (4 bits) of metadata. Three bits of the nibble inform the algorithm how many bytes are used to encode the integer and the last bit is used to indicate a negation. The negation is useful when small negative numbers are encoded. For example, a −1 can be represented in 1 byte without ambiguity if the negation bit is set. A limitation of this scheme is that an extra half byte (nibble) is wasted in cases where the integer cannot be compacted.

Applying these techniques, the background thread produces a log file that resembles Figure 4.

Figure 4: Layout of a compacted log file produced by the NanoLog runtime. The file always starts with a Header (rdtscTime:64, unixTime:64, conversionFactor:64) followed by a Dictionary (lineNumber:32, filename:n, formatString:m, ... for each log statement). The rest of the file consists of Buffer Extents (threadId:32, length:31, completeRound:1), each of which contains log messages (logIdNibble:4, timeDiffNibble:4, logId:8-32*, timeDiff:8-64*, NonStringParamNibbles:n, NonStringParameters:m*, StringParameters:o). The digits after each colon indicate how many bits are required to represent the field. An asterisk (*) represents integer values that have been compacted and thus have a variable byte length. The fields following timeDiff are variable length (and sometimes omitted) depending on the log message's arguments.
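The variable-byte integer scheme described above can be sketched as follows. This is a simplified stand-alone version, not NanoLog's exact encoder: here the nibble's low three bits hold the byte count minus one and the fourth bit holds the negation flag.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified sketch of nibble-based integer compaction.
// Returns the metadata nibble (low 3 bits = byte count - 1,
// bit 3 = negation flag) and appends the encoded bytes of the
// magnitude, little-endian, to 'out'.
uint8_t compactInt(int64_t value, std::vector<uint8_t>& out) {
    uint8_t negated = 0;
    uint64_t mag = static_cast<uint64_t>(value);
    // Small negative numbers compact better when stored as |value|.
    if (value < 0 && value > -(1LL << 56)) {
        negated = 1;
        mag = static_cast<uint64_t>(-value);
    }
    // Find the fewest bytes needed to represent the magnitude.
    uint8_t nbytes = 1;
    while (nbytes < 8 && (mag >> (8 * nbytes)) != 0)
        ++nbytes;
    for (uint8_t i = 0; i < nbytes; ++i)
        out.push_back(static_cast<uint8_t>(mag >> (8 * i)));
    return static_cast<uint8_t>((negated << 3) | (nbytes - 1));
}
```

With this encoding, the 4-byte value 200 becomes one payload byte plus a nibble of 0, and −1 becomes one payload byte plus a nibble with the negation bit set, matching the examples in the text.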

The first component is a header which maps the machine's Intel Time Stamp Counter [30] (TSC) value to a wall time and records the conversion factor between the two. This allows the log messages to contain raw invocation TSC values and avoids wall time conversion at runtime. The header also includes the dictionary containing static information for the log messages. Following this structure are buffer extents, which represent contiguous staging buffers that have been output at runtime and contain within them the log messages. Each buffer extent records the runtime thread id and the size of the extent (with the log messages). This allows log messages to omit the thread id and inherit it from the extent, saving bytes.

The log messages themselves are variable sized due to compaction and the number of parameters needed for the message. However, all log messages will contain at least a compacted log identifier and a compacted log invocation time relative to the last log message. This means that a simple log message with no parameters can be as small as 3 bytes (2 nibbles and 1 byte each for the log identifier and time difference). If the log message contains additional parameters, they will be encoded after the time difference in the order of all nibbles, followed by all non-string parameters (compacted and uncompacted), followed by all string parameters. The ordering of the nibbles and non-string parameters is determined by the preprocessor's generated code, but the nibbles are placed together to consolidate them. The strings are also null terminated so that we do not need to explicitly store a length for each.

4.3 Decompressor/Aggregator

The final component of the NanoLog system is the decompressor/aggregator, which takes as input the compacted log file generated by the runtime and either outputs a human-readable log file or runs aggregations over the compacted log messages. The decompressor reads the dictionary information from the log header, then it processes each of the log messages in turn. For each message, it uses the log id embedded in the file to find the corresponding dictionary entry. It then decompresses the log data as indicated in the dictionary entry and combines that data with static information from the dictionary to generate a human-readable log message. If the decompressor is being used for aggregation, it skips the message formatting step and passes the decompressed log data, along with the dictionary information, to an aggregation function.

One challenge the NanoLog decompressor has to deal with is outputting the log messages in chronological order. Recall from earlier that the NanoLog runtime outputs the log messages in staging buffer chunks called buffer extents. Each logging thread uses its own staging buffer, so log messages are ordered chronologically within an extent, but the extents for different threads can overlap in time. The decompressor must collate log entries from different extents in order to output a properly ordered log. The round-robin approach used by the compaction thread means that extents in the log are roughly ordered by time. Thus, the decompressor can process the log file sequentially. To perform the merge correctly, it must buffer two sequential extents for each logging thread at a time.

Aside from the reordering, one of the most interesting aspects of this component is the promise it holds for faster analytics. Most analytics engines have to gather the human-readable logs, parse the log messages into a binary format, and then compute on the data. Almost all the time is spent reading and parsing the log. The NanoLog aggregator speeds this up in two ways. First, the intermediate log file is extremely compact compared to its human-readable format (typically by over an order of magnitude), which saves on bandwidth to read the logs. Second, the intermediate log file already stores the dynamic portions of the log in a binary format. This means that the analytics engine does not need to perform expensive string parsing. These two features mean that the aggregator component will run faster than a traditional analytics engine operating on human-readable logs.

4.4 Alternate Implementation: C++17 NanoLog

While the techniques shown in the previous section are generalizable to any programming language that exposes its source, some languages such as C++17 offer strong compile-time computation features that can be leveraged to build NanoLog without an additional preprocessor. In this section, we briefly present such an implementation for C++17. The full source for this implementation is available in our GitHub repository [47], so we will only highlight the key features here.

The primary tasks that the NanoLog preprocessor performs are (a) generating optimized functions to record and compact arguments based on types, (b) assigning unique log identifiers to each NANO_LOG() invocation site and (c) generating a dictionary of static log information for the postprocessor.

For the first task, we can leverage inlined variadic function templates in C++ to build optimized functions to record and compact arguments based on their types. C++11 introduced functionality to build generic functions that would specialize on the types of the arguments passed in. One variation, called "variadic templates", allows one to build functions that can accept an unbounded number of arguments and process them recursively based on type. Using these features, we can express meta record() and compact() functions which accept any number of arguments, and the C++ compiler will automatically select the correct function to invoke for each argument based on type.

One problem with this mechanism is that an argument of type "char*" can correspond to either a "%s" specifier (string) or a "%p" specifier (pointer), which are handled differently. To address this issue, we leverage constant expression functions in C++17 to analyze the static format string at compile-time and build a constant expression structure that can be checked in record() to selectively save a pointer or string. This mechanism makes it unnecessary for NanoLog to perform the expensive format string parsing at runtime and reduces the runtime cost to a single if-check.

The second task is assignment of unique identifiers. C++17 NanoLog must discover all the NANO_LOG() invocation sites dynamically and associate a unique identifier with each. To do this, we leverage scoped static variables in C++; NANO_LOG() is defined as a macro that expands to a new scope with a static identifier variable initialized to indicate that no identifier has been assigned yet. This variable is passed by reference to the record() function, which checks its value and assigns a unique identifier during the first call. Future calls for this invocation site pay only for an if-check to confirm that the identifier has been assigned. The scoping of the identifier keeps it private to the invocation site and the static keyword ensures that the value persists across all invocations for the lifetime of the application.

The third task is to generate the dictionary required by the postprocessor and write it to the log. The dictionary cannot be included in the log header, since the NanoLog runtime has no knowledge of a log statement until it executes for the first time. Thus, C++17 NanoLog outputs dictionary information to the log in an incremental fashion. Whenever the runtime assigns a new unique identifier, it also collects the dictionary information for that log statement.

5 Evaluation

Table 1: The server configuration used for benchmarking.

  CPU         Xeon X3470 (4x2.93 GHz cores)
  RAM         24 GB DDR3 at 800 MHz
  Flash       2x Samsung 850 PRO (250GB) SSDs
  OS          Debian 8.3 with Linux kernel 3.16.7
  OS for ETW  Windows 10 Pro 1709, Build 16299.192

Table 2: Average number of static characters (Static Chars) and dynamic variables in formatted log statements for five open source systems.

  System      Static Chars  Integers  Floats  Strings  Others  Logs
  Memcached   56.04         0.49      0.00    0.23     0.04    378
  httpd       49.38         0.29      0.01    0.75     0.03    3711
  linux       35.52         0.98      0.00    0.57     0.10    135119
  Spark       43.32         n/a       n/a     n/a      n/a     2717
  RAMCloud    46.65         1.08      0.07    0.47     0.02    1167

These numbers were obtained by applying a set of heuristics to identify log statements in the source files and analyzing the embedded format strings; the numbers do not necessarily reflect runtime usage and may not include every log invocation. The "Logs" column counts the total number of log messages found. The dynamic counts are omitted for Spark since its logging system does not use format specifiers, and thus argument types could not be easily extracted. The static characters column omits format specifiers and variable references (i.e. $variables in Spark), and represents the number of characters that would be trivially saved by using NanoLog.

We evaluated the NanoLog system to answer the following questions:

• How do NanoLog's log throughput and latency compare to other modern logging systems?
• What is the throughput of the decompressor?
• How efficient is it to query the compacted log file?
• How does NanoLog perform in a real system?
• What are NanoLog's throughput bottlenecks?
• How does NanoLog's compaction scheme compare to other compression algorithms?

All experiments were conducted on quad-core machines with SATA SSDs that had a measured throughput of about 250 MB/s for large writes (Table 1).

5.1 System Comparison

To compare the performance of NanoLog with other systems, we ran microbenchmarks with six log messages (shown in Table 3) selected from an open-source datacenter storage system [29].
This information is passed to the compaction thread and output in the header of the next Buffer Ex- 5.1.1 Test Setup tent that contains the first instance of this log message. We chose to benchmark NanoLog against Log4j2 [43], This scheme ensures that the decompressor encounters spdlog [38], glog [11], Boost log [2], and Event Tracing the dictionary information for a log statement before it for Windows (ETW) [31]. We chose Log4j2 for its pop- encounters any data records for that log statement. ularity in industry; we configured it for low latency and The benefit of this C++17 implementation is that it is high throughput by using asynchronous loggers and ap- easier to deploy (users no longer have to integrate the penders and including the LMAX Disruptor library [20]. NanoLog preprocessor into their build chain), but the We chose spdlog because it was the first result in an downsides are that it is language specific and performs Internet search for “Fast C++ Logger”; we configured slightly more work at runtime. spdlog with a buffer size of 8192 entries (or 832KB). We chose glog because it is used by Google and configured 5 Evaluation it to buffer up to 30 seconds of logs. We chose Boost We implemented the NanoLog system for C++ appli- logging because of the popularity of Boost libraries in cations. The NanoLog preprocessor and combiner com- the C++ community; we configured Boost to use asyn- prise of 1319 lines of Python code and the NanoLog run- chronous sinks. We chose ETW because of its simi- time library consists of 3657 lines of C++ code. larity to NanoLog; when used with Windows Software

ID              Example Output
staticString    Starting backup replica garbage collector thread
stringConcat    Opened session with coordinator at basic+udp:host=192.168.1.140,port=12246
singleInteger   Backup storage speeds (min): 181 MB/s read
twoIntegers     buffer has consumed 1032024 bytes of extra storage, current allocation: 1016544 bytes
singleDouble    Using tombstone ratio balancer with ratio = 0.4
complexFormat   Initialized InfUdDriver buffers: 50000 receive buffers (97 MB), 50 transmit buffers (0 MB), took 26.2 ms

Table 3: Log messages used to generate Figure 5 and Table 4. The underlines indicate dynamic data generated at runtime. staticString is a completely static log message, stringConcat contains a large dynamic string, and the other messages are a combination of integer and floating point types. Additionally, the logging systems were configured to output each message with the context "YY-MM-DD HH:MM:SS.ns Benchmark.cc:20 DEBUG[0]:" prepended to it.
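As a concrete illustration of the C++17 techniques described in Section 4.4, the sketch below combines a variadic record() with a scoped static per-site identifier. This is a simplified, hypothetical rendering, not the actual NanoLog source: NANO_LOG_SKETCH, getLogId, and stagingBuffer are invented names, and compaction, string handling (the %s vs. %p issue), and the real staging-buffer machinery are omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a per-thread staging buffer (the real system uses
// lock-free byte buffers, one per logging thread).
static std::vector<uint8_t> stagingBuffer;
static std::atomic<int> nextLogId{0};

// Base case: no arguments left to record.
static void record(std::vector<uint8_t>&) {}

// Recursive case: the compiler selects an instantiation per argument type
// and unrolls the recursion at compile time; each argument is copied into
// the buffer in raw binary form, with no formatting.
template <typename T, typename... Rest>
static void record(std::vector<uint8_t>& buf, T head, Rest... rest) {
    const uint8_t* p = reinterpret_cast<const uint8_t*>(&head);
    buf.insert(buf.end(), p, p + sizeof(T));
    record(buf, rest...);
}

// Assigns a unique identifier on the first call for a given invocation
// site; later calls only pay for the if-check.
static int getLogId(int& siteId) {
    if (siteId < 0)
        siteId = nextLogId.fetch_add(1);
    return siteId;
}

// The macro expands to a new scope holding a static per-site identifier,
// keeping the id private to the invocation site.
#define NANO_LOG_SKETCH(...)                    \
    do {                                        \
        static int siteId = -1;                 \
        int id = getLogId(siteId);              \
        record(stagingBuffer, id, __VA_ARGS__); \
    } while (0)
```

Invoking the macro repeatedly from the same source location reuses one identifier: the static survives across calls, so only the first call touches the atomic counter.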

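The aggregation path of Section 4.3 can likewise be sketched: an aggregator consumes binary records directly, filtering by an integer log id instead of parsing formatted strings. The Record/Stats layout below is invented for illustration and is far simpler than NanoLog's actual compacted format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical flattened record: a log identifier plus one integer argument.
struct Record { uint32_t logId; int64_t arg; };
struct Stats  { int64_t min; int64_t max; double mean; size_t count; };

// Computes min/mean/max over the integer argument of all records whose
// log id matches; no string formatting or regex matching is required.
static Stats aggregate(const std::vector<Record>& records, uint32_t targetId) {
    Stats s{INT64_MAX, INT64_MIN, 0.0, 0};
    double sum = 0;
    for (const Record& r : records) {
        if (r.logId != targetId) continue;  // cheap integer compare
        s.min = std::min(s.min, r.arg);
        s.max = std::max(s.max, r.arg);
        sum += static_cast<double>(r.arg);
        ++s.count;
    }
    if (s.count != 0)
        s.mean = sum / static_cast<double>(s.count);
    return s;
}
```

Filtering on a 4-byte id is the binary analogue of matching the "Hello World #%d" pattern in Section 5.3's benchmark, without the per-message parsing cost.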
[Figure 5: grouped bar chart of maximum throughput (millions of logs/second) for NanoLog, Log4j2, glog, spdlog, Boost, and ETW on each of the six messages; numeric bar labels omitted.]

Figure 5: Shows the maximum throughput attained by various logging systems when logging a single message repeatedly. Log4j2, Boost, spdlog, and Google glog logged the message 1 million times; ETW and NanoLog logged the message 8 and 100 million times respectively to generate a log file of comparable size. The number of logging threads varied between 1-16 and the maximum throughput achieved is reported. All systems except Log4j2 include the time to flush the messages to disk in their throughput calculations (Log4j2 did not provide an API to flush the log without shutting down the logging service). The message labels on the x-axis are explained in Table 3.

Trace PreProcessor [23], the log statements are rewritten to record only variable binary data at runtime. We configured ETW with the default buffer size of 64 KB; increasing it to 1 MB did not improve its steady-state performance.

We configured each system to output similar metadata information with each log message; they prepend a date/time, code location, log level, and thread id to each log message as shown in Figure 1. However, there are implementation differences in each system. In the time field, NanoLog and spdlog computed the fractional seconds with 9 digits of precision (nanoseconds) vs. 6 for Boost/glog and 3 for Log4j2 and ETW. In addition, Log4j2's code location information (ex. "Benchmark.cc:20") was manually encoded due to inefficiencies in its code location mechanism [45]. The other systems use the GNU C++ preprocessor macros "__LINE__" and "__FILE__" to encode the code location information.

To ensure the log messages we chose were representative of real-world usage, we statically analyzed log statements from five open source systems [22, 42, 19, 44, 29]. Table 2 shows that log messages have around 45 characters of static content on average and that integers are the most common dynamic type. Strings are the second most common type, but upon closer inspection, most strings used could benefit from NanoLog's static extraction methods. They contain pretty-printed error messages, enumerations, object variables, and other static/formatted types. This static information could in theory also be extracted by NanoLog and replaced with an identifier. However, we leave this additional extraction of static content to future work.

5.1.2 Throughput

Figure 5 shows the maximum throughput achieved by NanoLog, spdlog [38], Log4j2 [43], Boost [2], Google glog [11], and ETW [31]. NanoLog is faster than the other systems by 1.8x-133x. The largest performance gap between NanoLog and the other systems occurs with staticString and the smallest occurs with stringConcat.

NanoLog performs best when there is little dynamic information in the log message. This is reflected by staticString, a static message, in the throughput benchmark. Here, NanoLog only needs to output about 3-4 bytes per log message due to its compaction and static extraction techniques. The other systems require over an order of magnitude more bytes to represent the messages (41-90 bytes). Even ETW, which uses a preprocessor to strip messages, requires at least 41 bytes in the static string

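The 3-4-byte messages above rest on byte-level integer compaction. A minimal, hypothetical sketch of the idea (the real system additionally packs two 4-bit length counts into each "nibble" byte, which is not shown here):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Writes only the significant low-order bytes of a value and returns how
// many bytes were used; a real implementation would store that count in a
// 4-bit nibble. A small time delta therefore costs a single byte.
static size_t compactUint(uint64_t value, uint8_t* out) {
    size_t bytes = 0;
    do {
        out[bytes++] = static_cast<uint8_t>(value & 0xff);
        value >>= 8;
    } while (value != 0);
    return bytes;  // 1-8 bytes
}
```

With this scheme a no-argument message needs roughly one byte for the log identifier, one for the time difference, and one shared nibble byte for the two length counts, matching the 3-byte figure quoted in Section 4.2.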
ID              NanoLog        spdlog                 Log4j2               glog                    Boost                   ETW
(50/90/99/99.9th percentile)
staticString    8/9/29/33      230/236/323/473        192/311/470/1868     1201/1229/3451/5231     1619/2338/3138/4413     180/187/242/726
stringConcat    8/9/29/33      436/494/1579/1641      230/1711/3110/6171   1235/1272/3469/5728     1833/2621/3387/5547     208/218/282/2954
singleInteger   8/9/29/35      353/358/408/824        223/321/458/1869     1250/1268/3543/5458     1963/2775/3396/7040     189/195/237/720
twoIntegers     7/8/29/44      674/698/807/1335       160/297/550/1992     1369/1420/3554/5737     2255/3167/3932/7775     200/207/237/761
singleDouble    8/9/29/34      607/637/685/1548       157/252/358/1494     2077/2135/4329/6995     2830/3479/3885/7176     187/193/248/720
complexFormat   8/8/28/33      1234/1261/1425/3360    146/233/346/1500     2570/2722/5167/8589     4175/4621/5189/9637     242/252/304/1070

Table 4: Unloaded tail latencies of NanoLog and other popular logging frameworks, measured by logging 100,000 log messages from Table 3 with a 600 nanosecond delay between log invocations to ensure that I/O is not a bottleneck. Each datum represents the 50th/90th/99th/99.9th percentile latency measured in nanoseconds.

[Figure 6: decompressor throughput (M logs/sec) vs. number of runtime logging threads (1-1024), with "Regular" and "Unsorted" series; peak throughput is about 0.5M logs/sec.]

Figure 6: Impact on NanoLog's decompressor performance as the number of runtime logging threads increases. We decompressed a log file containing 2^24 (about 16M) log messages in the format of "2017-04-06 02:03:25.000472519 Benchmark.cc:65 NOTICE[0]: Simple log message with 0 parameters". The compacted log file was 49MB and the resulting decompressed log output was 1.5GB. In the "Unsorted" measurements, the decompressor did not collate the log entries from different threads into a single chronological order.

case. NanoLog excels with static messages, reaching a throughput of 80 million log messages per second.

NanoLog performs the worst when there is a large amount of dynamic information. This is reflected in stringConcat, which logs a large 39-byte dynamic string. NanoLog performs no compaction on string arguments and thus must log the entire string. This results in an output of 41-42 bytes per log message and drops throughput to about 4.9 million log messages per second.

Overall, NanoLog is faster than all other logging systems tested. This is primarily due to NanoLog consistently outputting fewer bytes per message and secondarily because NanoLog defers the formatting and sorting of log messages.

5.1.3 Latency

NanoLog lowers the logging thread's invocation latency by deferring the formatting of log messages. This effect can be seen in Table 4. NanoLog's invocation latency is 18-500x lower than that of other systems. In fact, NanoLog's 50/90/99th percentile latencies are all within tens of nanoseconds while the median latencies for the other systems start at hundreds of nanoseconds.

All of the other systems except ETW require the logging thread to either fully or partially materialize the human-readable log message before transferring control to the background thread, resulting in higher invocation latencies. NanoLog, on the other hand, performs no formatting and simply pushes all arguments to the staging buffer. This means less computation and fewer bytes copied, resulting in a lower invocation latency.

Although ETW employs techniques similar to NanoLog's, its latencies are much higher than those of NanoLog. We are unsure why ETW is slower than NanoLog, but one hint is that even with the preprocessor, ETW log messages are larger than NanoLog's (41 vs. 4 bytes for staticString). ETW emits extra log information such as process ids and does not use the efficient compaction mechanism of NanoLog to reduce its output. Overall, NanoLog's unloaded invocation latency is extremely low.

5.2 Decompression

Since the NanoLog runtime outputs the log in a binary format, it is also important to understand the performance implications of transforming it back into a human-readable log format.

The decompressor currently uses a simple single-threaded implementation, which can decompress at a peak of about 0.5M log messages/sec (Figure 6). Traditional systems such as Log4j2 can achieve a higher throughput of over 2M log messages/second at runtime since they utilize all their logging threads for formatting. NanoLog's decompressor could be modified to use multiple threads to achieve higher throughput.

The throughput of the decompressor can drop if there were many runtime logging threads in the application. The reason is that the log is divided into different extents for each logging thread, and the decompressor must collate the log messages from multiple extents into a single chronological order. Figure 6 shows that the decompressor can handle up to about 32 logging threads with no impact on its throughput, but throughput drops with more than 32 logging threads. This is because the decompressor uses a simple collation algorithm that compares the times for the next message from each active buffer extent (one per logging thread) in order to pick the next message to print; thus the cost per message increases linearly with the number of logging threads. Performance could be improved by using a min-heap for collation.

Collation is only needed if order matters during decompression. For some applications, such as analytics,

                      No Logs         NanoLog         spdlog          RAMCloud
Throughput   Read     994 (100%)      809 (81%)       122 (12%)       67 (7%)
(kop/s)      Write    140 (100%)      137 (98%)       59 (42%)        32 (23%)
Read Latency  50%     5.19 (1.00x)    5.33 (1.03x)    8.21 (1.58x)    15.55 (3.00x)
(µs)          90%     5.56 (1.00x)    5.53 (0.99x)    8.71 (1.57x)    16.66 (3.00x)
              99%     6.15 (1.00x)    6.15 (1.00x)    9.60 (1.56x)    17.82 (2.90x)
Write Latency 50%     15.85 (1.00x)   16.33 (1.03x)   24.88 (1.57x)   45.53 (2.87x)
(µs)          90%     16.50 (1.00x)   17.08 (1.04x)   26.42 (1.60x)   47.50 (2.88x)
              99%     22.87 (1.00x)   23.74 (1.04x)   33.05 (1.45x)   59.17 (2.59x)

Table 5: Shows the impact on RAMCloud [29] performance when more intensive instrumentation is enabled. The instrumentation adds about 11-33 log statements per read/write request with 1-3 integer log arguments each. "No Logs" represents the baseline with no logging enabled. "RAMCloud" uses the internal logger while "NanoLog" and "spdlog" supplant the internal logger with their own. The percentages next to Read/Write Latency represent percentiles and all results were measured with RAMCloud's internal benchmarks, with 16 clients used in the throughput measurements. Throughput benchmarks were run for 10 seconds and latency benchmarks measured 2M operations. Each configuration was run 15 times and the best case is presented.

[Figure 7: bar chart of aggregation execution times (seconds) for NanoLog, C++, Awk, Python, and Simple Read at 100%/50%/10%/1% match rates; NanoLog takes about 4-5 seconds vs. 35+ for the others.]

Figure 7: Execution time for a min/mean/max aggregation using various systems over 100 million log messages with a percentage of the log messages matching the target aggregation pattern "Hello World # %d" and the rest "UnrelatedLog #%d". The NanoLog system operated on a compacted file (~747MB) and the remaining systems operated on the full, uncompressed log (~7.6GB). The C++ application searched for the "Hello World #" prefix and utilized atoi() on the next word to parse the integer. The Awk and Python applications used a simple regular expression matching the prefix: ".*Hello World # (\d+)". "Simple Read" reads the entire log file and discards the contents. The file system cache was flushed before each run.

the order in which log messages are processed is unimportant. In these cases, collation can be skipped; Figure 6 shows that decompression throughput in this case is unaffected by the number of logging threads.

5.3 Aggregation Performance

NanoLog's compact, binary log output promises more efficient log aggregation/analytics than its full, uncompressed counterpart. To demonstrate this, we implemented a simple min/mean/max aggregation in four systems: NanoLog, C++, Awk, and Python. Conceptually, they all perform the same task; they search for the target log message "Hello World #%d" and perform a min/mean/max aggregation over the "%d" integer argument. The difference is that the latter three systems operate on the full, uncompressed version of the log while the NanoLog aggregator operates directly on the output from the NanoLog runtime.

Figure 7 shows the execution time for this aggregation over 100M log messages. NanoLog is nearly an order of magnitude faster than the other systems, taking on average 4.4 seconds to aggregate the compact log file vs. 35+ seconds for the other systems. The primary reason for NanoLog's low execution time is disk bandwidth. The compact log file only amounted to about 747MB vs. 7.6GB for the uncompressed log file. In other words, the aggregation was disk bandwidth limited and NanoLog used the least amount of disk IO. We verified this assumption with a simple C++ application that performs no aggregation and simply reads the file ("Simple Read" in the figure); its execution time lines up with the "C++" aggregator at around 36 seconds.

We also varied how often the target log message "Hello World #%d" occurred in the log file to see if it affects aggregation time. The compiled systems (NanoLog and C++) have a near-constant cost for aggregating the log file while the interpreted systems (Awk and Python) have processing costs correlated to how often the target message occurred. More specifically, the more frequent the target message, the longer the execution time for Awk and Python. We suspect this is because the regular expression engines used by Awk and Python can quickly disqualify non-matching strings, but perform more expensive parsing when a match occurs. However, we did not investigate further. Overall, the compactness of the NanoLog binary log file allows for fast aggregation.

5.4 Integration Benchmark

We integrated NanoLog and spdlog into a well-instrumented open-source key-value store, RAMCloud [29], and evaluated the logging systems' impact on performance using existing RAMCloud benchmarks. In keeping with the goal of increasing visibility, we enabled verbose logging and changed existing performance sampling statements in RAMCloud (normally compiled out) to always-on log statements. This added an additional 11-33 log statements per read/write request in the system. With this heavily instrumented system, we could answer the following questions: (1) how much of an improvement does NanoLog provide over other state-of-the-art systems in this scenario, (2) how does NanoLog perform in a real system compared to microbenchmarks

and (3) how much does NanoLog slow down compilation and increase binary size?

Table 5 shows that, with NanoLog, the additional instrumentation introduces only a small performance penalty. Median read/write latencies increased only by about 3-4% relative to an uninstrumented system and write throughput decreased by 2%. Read throughput sees a larger degradation (about 19%); we believe this is because read throughput is bottlenecked by RAMCloud's dispatch thread [29], which performs most of the logging. In contrast, the other logging systems incur such a high performance penalty that this level of instrumentation would probably be impractical in production: latencies increase by 1.6-3x, write throughput drops by more than half, and read throughput is reduced to roughly a tenth of the uninstrumented system (8-14x). These results show that NanoLog supports a higher level of instrumentation than other logging systems.

Using this benchmark, we can also estimate NanoLog's invocation latency when integrated in a low-latency system. For RAMCloud's read operation, the critical path emits 8 log messages out of the 11 enabled. On average, each log message increased latency by (5.33-5.19)/8 = 17.5ns. For RAMCloud's write operation, the critical path emits 27 log messages, suggesting an average latency cost of 17.7ns. These numbers are higher than the median latency of 7-8ns reported by the microbenchmarks, but they are still reasonably fast.

Lastly, we compared the compilation time and binary size of RAMCloud with and without NanoLog. Without NanoLog, building RAMCloud takes 6 minutes and results in a binary with a size of 123 MB. With NanoLog, the build time increased by 25 seconds (+7%), and the size of the binary increased to 130 MB (+6%). The dictionary of static log information amounted to 229KB for 922 log statements (~248B/message). The log message count differs from Table 2 because RAMCloud compiles out log messages depending on build parameters.

5.5 Throughput Bottlenecks

NanoLog's performance is limited by I/O bandwidth in two ways. First, the I/O bandwidth itself is a bottleneck. Second, the compaction that NanoLog performs in order to reduce the I/O cost can make NanoLog compute bound as I/O speeds improve. Figure 8 explores the limits of the system by removing these bottlenecks.

[Figure 8: throughput (Mops) vs. number of runtime logging threads (1-8) for the Full System, No Output, No Compact, and No Output + No Compact configurations.]

Figure 8: Runtime log message throughput achieved by the NanoLog system as the number of logging threads is increased. For each point, 2^27 (about 134M) static messages were logged. The Full System is the NanoLog system as described in this paper, No Output pipes the log output to /dev/null, No Compact omits compaction in the NanoLog compression thread and directly outputs the staging buffers' contents, and No Output + No Compact is a combination of the last two.

Compaction plays a large role in improving NanoLog's throughput, even for our relatively fast flash devices (250MB/s). The "Full System" as described in the paper achieves a throughput of nearly 77 million operations per second while the "No Compact" system only achieves about 13 million operations per second. This is due to the 5x difference in I/O size; the full system outputs 3-4 bytes per message while the no-compaction system outputs about 16 bytes per message.

If we remove the I/O bottleneck altogether by redirecting the log file to /dev/null, NanoLog "No Output" achieves an even higher peak throughput of 138 million logs per second. At this point, compaction becomes the bottleneck of the system. Removing both compaction and I/O allows the "No Output + No Compact" system to push upwards of 259 million operations per second.

Since the "Full System" throughput was achieved with a 250MB/s disk and "No Output" has roughly twice the throughput, one might assume that compaction would become the bottleneck with I/O devices twice as fast as ours (500MB/s). However, that would be incorrect. To maintain the 138 million logs per second without compaction, one would need an I/O device capable of 2.21GB/s (138e6 logs/sec x 16B).

Lastly, we suspect we were unable to measure the maximum processing potential of the NanoLog compaction thread in "No Output + No Compact." Our machines only had 4 physical cores with 2 hyperthreads each; beyond 4-5 logging threads, the logging threads start competing with the background thread for physical CPU resources, lowering throughput.

5.6 Compression Efficiency

NanoLog's compression mechanism is not very sophisticated in comparison to alternatives such as gzip [9] and Google snappy [37]. However, in this section we show that for logging applications, NanoLog's approach provides a better overall balance between compression efficiency and execution time.

Figure 9 compares NanoLog, gzip, and snappy using 93 test cases with varying argument types and lengths chosen to cover a range of log messages and show the

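The I/O requirement cited in Section 5.5 is simple arithmetic; a one-line sanity check, assuming ~16 bytes per uncompacted message as stated above:

```cpp
// Back-of-the-envelope check: sustaining msgsPerSec uncompacted messages
// of bytesPerMsg each requires this many GB/s of I/O bandwidth.
static double requiredGBps(double msgsPerSec, double bytesPerMsg) {
    return msgsPerSec * bytesPerMsg / 1e9;
}
```

For 138e6 logs/sec at 16 bytes each, this gives about 2.21 GB/s, roughly 9x the 250MB/s devices used in the evaluation.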
[Figure 9: number of best-performing test cases (y-axis) vs. disk bandwidth in MB/s on a log scale from 1 to 100000 (x-axis), for NanoLog, gzip (levels 1, 6, 9), snappy, and memcpy.]

Figure 9: Shows the number of test cases (out of 93) for which a compression algorithm attained the highest throughput. Here, throughput is defined as the minimum of an algorithm's compression throughput and I/O throughput (determined by output size and bandwidth). The numbers after the "gzip" labels indicate compression level and "memcpy" represents "no compression". The input test cases were 64MB chunks of binary NanoLog logs with arguments varied in 4 dimensions: argument type (int/long/double/string), number of arguments, entropy, and value range. Strings had [10, 15, 20, 30, 45, 60, 100] characters and an entropy of "random", "zipfian" (θ=0.99), or "Top1000" (sentences generated using the top 1000 words from [26]). The numeric types had [1, 2, 3, 4, 6, 10] arguments, an entropy of "random" or "sequential," and value ranges of "up to 2 bytes" and "at least half the container".

best and worst of each algorithm. For each test case and compression algorithm combination, we measured the total logging throughput at a given I/O bandwidth. Here, the throughput is determined by the lower of the compression throughput and the I/O throughput (i.e. the time to output the compressed data). Since the background thread overlaps the two operations, the slower operation is ultimately the bottleneck. We then counted the number of test cases where an algorithm produced the highest throughput of all algorithms at a given I/O bandwidth and graphed the results in Figure 9.

From Figure 9 we see that aggressive compression only makes sense in low-bandwidth situations; gzip,9 produces the best compression, but it uses so much CPU time that it only makes sense for very low bandwidth I/O devices. As I/O bandwidth increases, gzip's CPU time quickly becomes the bottleneck for throughput, and compression algorithms that don't compress as much but operate more quickly become more attractive.

NanoLog provides the highest logging throughput for most test cases in the bandwidth range for modern disks and flash drives (30-2200 MB/s). The cases where NanoLog is not the best are those involving strings and doubles, which NanoLog does not compact; snappy is better for these cases. Surprisingly, NanoLog is sometimes better than memcpy even for devices with extremely high I/O throughput. We suspect this is due to out-of-order execution [16], which can occasionally overlap NanoLog's compression with load/stores of the arguments; this makes NanoLog's compaction effectively free. Overall, NanoLog's compaction scheme is the most efficient given the capability of current I/O devices.

6 Related Work

Many frameworks and libraries have been created to increase visibility in software systems.

The system most similar to NanoLog is Event Tracing for Windows (ETW) [31] with the Windows Software Trace PreProcessor (WPP) [23], which was designed for logging inside the Windows kernel. This system was unbeknownst to us when we designed NanoLog, but WPP appears to use compilation techniques similar to NanoLog's. Both use a preprocessor to rewrite log statements to record only binary data at runtime and both utilize a postprocessor to interpret logs. However, ETW with WPP does not appear to be as performant as NanoLog; in fact, it is on par with traditional logging systems, with median latencies at 180ns and a throughput of 5.3Mop/s for static strings. Additionally, its postprocessor can only process messages at a rate of 10k/second while NanoLog performs at a rate of 500k/second.

There are five main differences between ETW with WPP and NanoLog: (1) ETW is a non-guaranteed logger (meaning it can drop log messages) whereas NanoLog is guaranteed. (2) ETW logs to kernel buffers and uses a separate kernel process to persist them vs. NanoLog's in-application solution. (3) The ETW postprocessor interprets a separate trace message format file to parse the logs whereas NanoLog uses dictionary information embedded in the log. (4) WPP appears to be targeted at Windows Driver Development (only available in the WDK), whereas NanoLog is targeted at applications. Finally, (5) NanoLog is an open-source library [47] with public techniques that can be ported to other platforms and languages while ETW is proprietary and locked to Windows only. There may be other differences (such as the use of compression) that we cannot ascertain from the documentation since ETW is closed source.

There are also general purpose, application-level loggers such as Log4j2 [43], spdlog [38], glog [11], and Boost log [2]. Like NanoLog, these systems enable applications to specify arbitrarily formatted log statements in code and provide the mechanism to persist the statements to disk. However, these systems are slower than NanoLog; they materialize the human-readable log at runtime instead of deferring it to post-execution (resulting in a larger log) and do not employ static analysis to generate low-latency, log-specific code.

There are also implementations that attempt to provide ultra low-latency logging by restricting the data types or the number of arguments that can be logged [24, 32].

Restricting the data types or the number of log arguments reduces the amount of compute that must occur at runtime, lowering latency. However, NanoLog is able to reach the same level of performance without sacrificing flexibility by employing code generation.

Moving beyond a single machine, there are also distributed tracing tools such as Google Dapper [35], Twitter Zipkin [48], X-Trace [8], and Apache's HTrace [41]. These systems handle the additional complexity of tracking requests as they propagate across software boundaries, such as between machines or processes. In essence, these systems track causality by attaching unique request identifiers to log messages. However, these systems do not accelerate the actual runtime logging mechanism.

Once the logs are created, there are systems and machine learning services that aggregate them to provide analytics and insights [39, 3, 25, 27]. However, for compatibility, these systems typically aggregate full, human-readable logs to perform analytics. The NanoLog aggregator may be able to improve their performance by operating directly on compacted, binary logs, which saves I/O and processing time.

There are also systems that employ dynamic instrumentation [15] to gain visibility into applications at runtime, such as DTrace [14], Pivot Tracing [21], Fay [6], and Enhanced Berkeley Packet Filters [13]. These systems eschew the practice of embedding static log statements at compile-time and allow for dynamic modification of the code. They allow for post-compilation insertion of instrumentation and faster iterative debugging, but the downside is that instrumentation must already be in place to enable post-mortem debugging.

Lastly, it's worth mentioning that the techniques used by NanoLog and ETW are extremely similar to low-latency RPC/serialization libraries such as Thrift [36], gRPC [12], and Google Protocol Buffers [46]. These systems use a static message specification to name symbolic variables and types (not unlike NanoLog's printf format string) and generate application code to encode/decode the data into succinct, I/O-optimized formats (similar to how NanoLog generates the record and compact functions). In summary, the goals and techniques used by NanoLog and RPC systems are similar in flavor, but are applied to different mediums (disk vs. network).

7 Limitations

One limitation of NanoLog is that it currently can only operate on static printf-like format strings. This means that dynamic format strings, C++ streams, and toString() methods would not benefit from NanoLog. While we don't have a performant solution for dynamic format strings, we believe that a stronger preprocessor/compiler extension may be able to extract patterns from C++ streams by looking at types and/or provide a snprintf-like function for toString() methods to generate an intermediate representation for NanoLog.

Additionally, while NanoLog is implemented in C++, we believe it can be extended to any language that exposes source code, since preprocessing and code replacement can be performed in almost any language. The only true limitation is that we would be unable to optimize any logs that are dynamically generated and evaluated (such as with JavaScript's eval() [5]).

NanoLog's preprocessor-based approach also creates some deployment issues, since it requires the preprocessor to be integrated into the development tool chain. C++17 NanoLog eliminates this issue using compile-time computation facilities, but not all languages can support this approach.

Lastly, NanoLog currently assumes that logs are stored in a local filesystem. However, it could easily be modified to store logs remotely (either to remotely replicated files or to a remote database). In this case, the throughput of NanoLog will be limited by the throughput of the network and/or remote storage mechanism. Most structured storage systems, such as databases or even main-memory stores, are slow enough that they would severely limit NanoLog performance.

8 Conclusion

NanoLog outperforms traditional logging systems by 1-2 orders of magnitude, both in terms of throughput and latency. It achieves this high performance by statically generating and injecting optimized, log-specific logic into the user application and deferring traditional runtime work, such as formatting and sorting of log messages, to an off-line process. This results in an optimized runtime that only needs to output a compact, binary log file, saving I/O and compute. Furthermore, this log file can be directly consumed by aggregation and log analytics applications, resulting in over an order of magnitude performance improvement due to I/O savings.

With traditional logging systems, developers often have to choose between application visibility and application performance. With the lower overhead of NanoLog, we hope developers will be able to log more often and log in more detail, making the next generation of applications more understandable.

9 Acknowledgements

We would like to thank our shepherd, Patrick Stuedi, and our anonymous reviewers for helping us improve this paper. Thanks to Collin Lee and Henry Qin for providing feedback on the design of NanoLog and Bob Felderman for refusing to use our low latency systems until we had developed better instrumentation. Lastly, thank you to the Industrial Affiliates of the Stanford Platform Lab for supporting this work. Seo Jin Park was additionally supported by Samsung Scholarship.

References

[1] Bolosky, W. J., and Scott, M. L. False Sharing and Its Effect on Shared Memory Performance. In USENIX Experiences with Distributed and Multiprocessor Systems - Volume 4 (Berkeley, CA, USA, 1993), Sedms '93, USENIX Association, pp. 3–3.

[2] boost C++ libraries. http://www.boost.org.

[3] Datadog. https://www.datadoghq.com.

[4] Dragojević, A., Narayanan, D., Castro, M., and Hodson, O. FaRM: Fast Remote Memory. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14) (Seattle, WA, Apr. 2014), USENIX Association, pp. 401–414.

[5] ECMA and European Computer Manufacturers Association and others. ECMAScript language specification, 2011.

[6] Erlingsson, Ú., Peinado, M., Peter, S., Budiu, M., and Mainar-Ruiz, G. Fay: extensible distributed tracing from kernels to clusters. ACM Transactions on Computer Systems (TOCS) 30, 4 (2012), 13.

[7] Felderman, B. Personal communication, June 2015. Google.

[8] Fonseca, R., Porter, G., Katz, R. H., Shenker, S., and Stoica, I. X-Trace: A pervasive network tracing framework. In Proceedings of the 4th USENIX Conference on Networked Systems Design & Implementation (2007), USENIX Association, pp. 20–20.

[9] Gailly, J.-L., and Adler, M. gzip. http://www.gzip.org.

[10] GNU Community. Using the GNU Compiler Collection: Declaring Attributes of Functions. https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Function-Attributes.html, 2002.

[11] Google. glog: Google Logging Module. https://github.com/google/glog.

[12] Google. gRPC: A high performance, open-source universal RPC framework. http://www.grpc.io.

[13] Gregg, B. Linux BPF superpowers. http://www.brendangregg.com/blog/2016-03-05/linux-bpf-superpowers.html, 2016.

[14] Gregg, B., and Mauro, J. DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD. Prentice Hall Professional, 2011.

[15] Hollingsworth, J. K., Miller, B. P., and Cargille, J. Dynamic program instrumentation for scalable performance tools. In Scalable High-Performance Computing Conference, 1994, Proceedings of the (1994), IEEE, pp. 841–850.

[16] Intel. Intel 64 and IA-32 architectures optimization reference manual. Intel Corporation, June 2016.

[17] ISO/IEC JTC1/SC22/WG14. ISO/IEC 9899:2011, Information Technology - Programming Languages - C, 2011.

[18] Lim, H., Han, D., Andersen, D. G., and Kaminsky, M. MICA: A holistic approach to fast in-memory key-value storage. USENIX.

[19] The Linux Kernel Organization. https://www.kernel.org/nonprofit.html, May 2018.

[20] LMAX Disruptor: High Performance Inter-Thread Messaging Library. http://lmax-exchange.github.io/disruptor/.

[21] Mace, J., Roelke, R., and Fonseca, R. Pivot tracing: Dynamic causal monitoring for distributed systems. In Proceedings of the 25th Symposium on Operating Systems Principles (New York, NY, USA, 2015), SOSP '15, ACM, pp. 378–393.

[22] memcached: a Distributed Memory Object Caching System. http://www.memcached.org/, Jan. 2011.

[23] Microsoft. WPP software tracing. https://docs.microsoft.com/en-us/windows-hardware/drivers/devtest/wpp-software-tracing, 2007.

[24] Mortoray, E. Wait-free queueing and ultra-low latency logging. https://mortoray.com/2014/05/29/wait-free-queueing-and-ultra-low-latency-logging/, 2014.

[25] Nagaraj, K., Killian, C., and Neville, J. Structured Comparative Analysis of Systems Logs to Diagnose Performance Problems. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (2012), USENIX Association, pp. 26–26.

[26] Norvig, P. Natural Language Corpus Data: Beautiful Data, 2011 (accessed January 3, 2018).

[27] Oliner, A., Ganapathi, A., and Xu, W. Advances and challenges in log analysis. Communications of the ACM 55, 2 (2012), 55–61.

[28] Ott, D. Personal communication, June 2015. VMware.

[29] Ousterhout, J., Gopalan, A., Gupta, A., Kejriwal, A., Lee, C., Montazeri, B., Ongaro, D., Park, S. J., Qin, H., Rosenblum, M., et al. The RAMCloud Storage System. ACM Transactions on Computer Systems (TOCS) 33, 3 (2015), 7.

[30] Paoloni, G. How to benchmark code execution times on Intel IA-32 and IA-64 instruction set architectures. Intel Corporation, September 123 (2010).

[31] Park, I., and Buch, R. Improve Debugging And Performance Tuning With ETW. MSDN Magazine, April 2007.

[32] Performance Utilities. https://github.com/PlatformLab/PerfUtils.

[33] printf - C++ Reference. http://www.cplusplus.com/reference/cstdio/printf/.

[34] Redis. http://redis.io.

[35] Sigelman, B. H., Barroso, L. A., Burrows, M., Stephenson, P., Plakal, M., Beaver, D., Jaspan, S., and Shanbhag, C. Dapper, a large-scale distributed systems tracing infrastructure. Tech. rep., Google, 2010.

[36] Slee, M., Agarwal, A., and Kwiatkowski, M. Thrift: Scalable cross-language services implementation. Facebook White Paper 5, 8 (2007).

[37] Snappy, a fast compressor/decompressor. https://github.com/google/snappy.

[38] spdlog: A Super fast C++ logging library. https://github.com/gabime/spdlog.

[39] Splunk. https://www.splunk.com.

[40] Stallman, R. M., McGrath, R., and Smith, P. GNU Make: A Program for Directed Compilation. Free Software Foundation, 2002.

[41] The Apache Software Foundation. Apache HTrace: A tracing framework for use with distributed systems. http://htrace.incubator.apache.org.

[42] The Apache Software Foundation. Apache HTTP Server Project. http://httpd.apache.org.

[43] The Apache Software Foundation. Apache Log4j 2. https://logging.apache.org/log4j/log4j-2.3/manual/async.html.

[44] The Apache Software Foundation. Apache Spark. https://spark.apache.org.

[45] The Apache Software Foundation. Log4j2 Location Information. https://logging.apache.org/log4j/2.x/manual/layouts.html#LocationInformation.

[46] Varda, K. Protocol buffers: Google's data interchange format. Google Open Source Blog, available at least as early as July 2008.

[47] Yang, S. NanoLog: an extremely performant nanosecond scale logging system for C++ that exposes a simple printf-like API. https://github.com/PlatformLab/NanoLog.

[48] Twitter Zipkin. http://zipkin.io.

[49] Ziv, J., and Lempel, A. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23, 3 (1977), 337–343.