Packet-Level Analytics in Software without Compromises

Oliver Michel (University of Colorado Boulder), John Sonchack (University of Pennsylvania), Eric Keller (University of Colorado Boulder), Jonathan M. Smith (University of Pennsylvania)

Abstract

Traditionally, network monitoring and analytics systems rely on aggregation (e.g., flow records) or sampling to cope with high packet rates. This has the downside that, in doing so, we lose data granularity and accuracy, and, in general, limit the possible network analytics we can perform. Recent proposals leveraging software-defined networking or programmable hardware provide more fine-grained, per-packet monitoring but are still based on the fundamental principle of data reduction in the network, before analytics. In this paper, we provide a first step towards a cloud-scale, packet-level monitoring and analytics system based on stream processing entirely in software. Software provides virtually unlimited programmability and makes modern (e.g., machine-learning) network analytics applications possible. We identify unique features of network analytics applications which enable the specialization of stream processing systems. As a result, an evaluation with our preliminary implementation shows that we can scale up to several million packets per second per core and, together with load balancing and further optimizations, the vision of cloud-scale per-packet network analytics is possible.

1 Introduction

Network monitoring and analytics are crucial for the reliable and secure operation of cloud-scale networking environments. A monitoring system captures information about traffic. An analytics system processes that information to detect equipment failures, incorrect behavior (e.g., packet drops or routing loops), and attacks, as well as to identify and locate performance issues within the infrastructure [34, 23, 21, 37, 15, 20]. In order to reduce the load on the analytics systems, compromises have been made in the monitoring systems. Aggregation (e.g., pre-calculating information about entire flows and aggregating it into a flow record) and sampling (only looking at a subset of the flows or packets) have been the de-facto standards for decades, as the software systems simply could not process the information fast enough.

These approaches, of course, reduce information: aggregation reduces the load of the analytics system at the cost of granularity, as per-packet data is reduced to groups of packets in the form of sums or counts [3, 16]. Sampling and filtering reduce the number of packets or flows to be analyzed. Reducing information reduces load, but it also increases the chance of missing critical information and restricts the set of possible applications [30, 28].

Recent advances in software-defined networking (SDN) and more programmable hardware have provided opportunities for more fine-grained monitoring, towards packet-level network analytics. Packet-level analytics systems provide the benefit of complete insight into the network and open up opportunities for applications that require per-packet data in the network [37]. But compromises are still being made.

For example, [34, 23, 21] leverage processing primitives in hardware for custom aggregation (as opposed to fixed aggregation in legacy hardware). Although custom, the aggregation is still limited to a set of pre-defined functions that are implementable in programmable forwarding engines (PFEs) [5, 10], and thus imposes restrictions on possible applications (e.g., [9, 19, 24, 31]). These new programmable data planes have also been used for more granular and iteratively refined filtering in hardware [37, 15]. This, however, is only effective at reducing load if the application is only interested in a small subset of the overall traffic (e.g., only DNS packets).

In summary, monitoring systems are historically designed to reduce the load of the software analytics platform based on the assumption that it cannot handle the load. In this paper we challenge that assumption. We ask: what are the limits of software analytics systems – is it possible to perform packet-level analytics on cloud-scale infrastructures in software?

We start with the assumption that hardware can send per-packet records [29] to the software-based analytics system. The challenge is then processing those records, as (i) modern networks run at traffic rates of several terabits of data per second, which can equate to hundreds of millions of packets per second [27], and (ii) modern analytics systems, specifically stream processing systems (an ideal programming model for network analytics), such as Apache Flink [6] or Kafka [7], simply are not capable of handling these rates. Even more recent work, such as StreamBox, a stream processor designed for efficiency [22], can only support around 30M records/s with a 56 core server for a basic example application unrelated to network processing. In modern networks, where even 10 Gb/s links have packet arrival rates of 500K packets per second [4], real time packet analytics with such general purpose systems would require racks full of servers.

Our insight is that stream processing for network analytics has fundamentally different characteristics than traditional uses of stream processing. In this paper we highlight some of these differences and show how they can impact the achievable processing rates (Section 3). As a step towards validating our vision, we outline an architecture (Section 4) and built a prototype with some preliminary optimizations (Section 5). With this, our evaluations indicate an improvement of more than two orders of magnitude (163×) over state-of-the-art solutions [15, 37, 22] (Section 6). Based on data from Facebook [27], we demonstrate that our prototype can analyze every packet leaving an entire cluster in a Facebook data center on a single commodity server (representing only 0.5% of the cluster's hardware). As a result, we believe that powerful packet-level analytics in software at cloud scale (without compromise) is indeed within reach.

2 Stream Processing Overview

Stream processing is a paradigm for parallel data processing applications. Stream processors are composed of sequential data processing functions that are connected by FIFO queues transferring the streams of data. The benefit of stream processing is that it compartmentalizes state and logic. Processing functions only share data via streams. This allows each processing element to run in parallel, on a separate processing core or even a separate server. In this model, each processor (or kernel) usually transforms data (or tuples) from one or multiple input streams into one or multiple output streams by performing some sort of stateful or stateless computation on the tuples and including the results in the output tuple. Recently, with the increasing focus on real time and big data analysis, there have been many new streaming platforms that focus on efficient scaling and ease of use [7, 6, 35].
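To make the kernel-and-queue model above concrete, the following minimal sketch wires two kernels together with a FIFO stream. The Stream class and the kernel lambdas are illustrative names of ours, and the mutex-based queue is chosen only for clarity, not performance.

```cpp
// Minimal sketch of the kernel/stream model: two kernels (threads) connected
// by a FIFO queue. Illustrative only; not the paper's prototype.
#include <condition_variable>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

template <typename T>
class Stream {                      // FIFO queue connecting two kernels
public:
    void push(T v) {
        std::lock_guard<std::mutex> g(m_);
        q_.push(v);
        cv_.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return !q_.empty(); });
        T v = q_.front();
        q_.pop();
        return v;
    }
private:
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable cv_;
};

int main() {
    Stream<uint64_t> s;
    // Source kernel: emits a stream of tuples (here, plain integers).
    std::thread source([&s] {
        for (uint64_t i = 1; i <= 1000; ++i) s.push(i);
        s.push(0);                  // 0 acts as an end-of-stream marker
    });
    // Sink kernel: consumes tuples and keeps its state private.
    std::thread sink([&s] {
        uint64_t sum = 0, v;
        while ((v = s.pop()) != 0) sum += v;
        std::cout << "sum = " << sum << "\n";
    });
    source.join();
    sink.join();
}
```

Each kernel keeps its state private and communicates only through the stream, which is what allows kernels to be placed on separate cores or even separate servers.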


Figure 1: System Architecture

3 Stream Processing for Packet Records

Stream processing is widely used in many domains because it is a flexible and scalable way to process unbounded streams of records in real-time. These properties also make stream processing a natural fit for realizing the goal of a software platform capable of flexible analytics with visibility into every packet.

As explained in the introduction, using general purpose stream processors for packet analytics, however, is not efficient enough to be practical. The root cause of these issues is the difference between packet analytics workloads and those typical to stream processing. The differences present challenges, but also introduce new opportunities for optimization. By taking advantage of them, we can specialize stream processing systems for packet analytics to build systems that have not only the flexibility and benefits inherent to the processing model, but also the efficiency to make software visibility into every packet practical at large scales.

In this section, we explore some of the distinct characteristics of packet analytics workloads and how we can optimize stream processors for them.

3.1 Packet Analytics Workloads

We illustrate the differences between a packet analytics workload and that of a typical stream processing task with a case study based on Twitter's well-documented use of stream processing [32, 33].

High Record Rates One of the most striking differences between packet analytics workloads and typical stream processing workloads are higher record rates. For example, Twitter reports that their stream processing systems handle up to 46 M events per second. For comparison, the aggregate rate of packets leaving one small component of their platform is over 320 M per second. This network (the "Twitter Cache") only represents 3% of the total infrastructure.

Small Records Although record rates are higher for packet analytics, the sizes of individual records are smaller, which makes the overall bit-rate of the processing manageable. Network analytics applications are predominately interested in statistics derived from packet headers and processing metadata, which are only a small portion of each packet. A 40 B packet record, for example, can contain the headers required for most packet analytics tasks. In contrast, records in typical stream processing workloads are much larger.

Event Rate Reduction Packet analytics applications often aggregate data significantly before applying heavyweight data mining or visualization algorithms, e.g., by TCP connection. This is not true for general stream processing workloads, where the back-end algorithm may operate on features derived from each record.

Simple, Well Formed Records Packet analytics records are also simple and well formed. Each record has the same size and contains the same fields. Within the fields, the values are also of fixed size and have simple types, e.g., counters or flags. Records are much more complex for general stream processing systems: they represent complex objects, e.g., web pages, and are encoded in serialization formats such as JSON and ProtoBuf.

Common Preprocessing As a result of the standardized record formats, packet analytics tasks often rely on the same types of preprocessing logic, e.g., grouping packets by source IP is the first step in many different applications. In typical stream processing workloads, the variability of the input types means that each application is more likely to require different, custom types of preprocessing.

Network Attached Input Data for packet analytics comes from one source: the network. Be it a router, switch, or middlebox that exports them, the records will ultimately arrive to the software via a NIC. In general stream processing workloads, the input source can be anything: a database, a sensor, or another stream processor.

Partitionability There are common ways to partition packet records, e.g., for load balancing, that are relevant to many different applications. Further, since the fields of a packet are well defined, the partitioning is straightforward to implement. In general stream processing workloads, partitioning is application specific and can require parsing fields from complex objects.

3.2 Bottlenecks in Stream Processors

We identified five important components of stream processing systems where there is significant potential for optimization to account for the workload characteristics described above.

Data Input In general purpose stream processing systems, data can be read from many sources such as an HTTP API, a message queue system (such as RabbitMQ or Kafka), or from a specialized file system like HDFS. These frameworks can add overhead at many levels, including due to context switches and copy operations. Since packet analytics tasks all have the same source, the network, a stream processing system designed for packet analytics can use kernel bypass and related technologies, such as DPDK [2], PF_RING [25], or netmap [26], to reduce overhead by mapping the packet records directly to buffers in the stream processing system. Benchmarks have shown that DPDK, for example, can process up to 48 million packets per second in user space [17]. Depending on the exact application and network interface cards, these numbers can be even higher.
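As a rough illustration of this input path, the sketch below shows a polling receive loop in the style of DPDK's burst API. All EAL, port, and memory-pool setup is omitted, and handoff_to_parser is a hypothetical hook into the first processing kernel rather than part of the paper's system.

```cpp
// Sketch of a kernel-bypass input loop using DPDK's burst receive API.
// Initialization (EAL, ports, mempools) is omitted for brevity.
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static const uint16_t BURST_SIZE = 32;

void handoff_to_parser(const uint8_t *data, uint16_t len);  // hypothetical hook

void rx_loop(uint16_t port_id, uint16_t queue_id) {
    struct rte_mbuf *bufs[BURST_SIZE];
    for (;;) {
        // Poll the NIC queue and receive up to BURST_SIZE packets directly
        // into user-space buffers, without involving the kernel network stack.
        const uint16_t n = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < n; i++) {
            const uint8_t *data = rte_pktmbuf_mtod(bufs[i], const uint8_t *);
            handoff_to_parser(data, rte_pktmbuf_data_len(bufs[i]));
            rte_pktmbuf_free(bufs[i]);
        }
    }
}
```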
Zero-Copy Message Passing Through our initial experiments we have identified that for most applications the performance of a single kernel is I/O-bound. Specifically, frequent read, write, and copy operations into the queues connecting kernels introduce significant performance penalties. This is true especially in early stages of the processing graph, where the data rate is still high as no aggregation has yet occurred. Since packet records are small and well formed, a stream processing framework for packet analytics can eliminate this overhead by pre-allocating buffers and simply passing pointers between kernels, to significantly improve performance.

(Off)load Balancing Load balancing is critical in stream processing, but sometimes the load balancing kernel itself can be a bottleneck. A stream processor designed for packet analytics can avoid this overhead by pushing the load balancing down to the NIC, e.g., by using multiple receiver queues [26]. Instead of reading from a load balancer, the worker nodes could each read from independent buffers, populated by the NIC driver. This optimization takes advantage of the inherent partitionability of packet records, which maps well to the capabilities of the underlying hardware, and the fact that packet records all have a common source, the NIC itself.

Concurrent Queues Kernels in stream processing systems communicate using queues. The queue implementation itself can have significant impact on the overall application performance. Stream processors use generic queue implementations that allow for multiple concurrent readers and writers [8]. Additionally, many queue implementations use mutual exclusion and locks to ensure thread-safety. Through simple experiments, we identified locks to be one of the main performance bottlenecks of queue implementations. We will elaborate more on these experiments in Section 6. The main benefit of multiple readers and writers is flexibility to implement more custom load balancing schemes. This is less important for packet records because they are highly partitionable and we can largely offload this task to the NIC. Even without NIC support, load balancers can be implemented in preprocessing kernels, which would allow general reuse across many applications.
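One concrete direction this observation suggests is a single-producer/single-consumer ring buffer built only from C++11 atomics, in the spirit of the lock-free array queue evaluated later; the sketch below is ours and not the prototype's actual implementation.

```cpp
// Minimal single-producer/single-consumer lock-free ring buffer using C++11
// atomics. A sketch only; the prototype's queue may differ.
#include <atomic>
#include <cstddef>

template <typename T, size_t N>          // N must be a power of two
class SpscRing {
public:
    bool push(const T &v) {              // called only by the producer thread
        const size_t head = head_.load(std::memory_order_relaxed);
        const size_t next = (head + 1) & (N - 1);
        if (next == tail_.load(std::memory_order_acquire))
            return false;                // queue is full
        buf_[head] = v;
        head_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T &out) {                   // called only by the consumer thread
        const size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;                // queue is empty
        out = buf_[tail];
        tail_.store((tail + 1) & (N - 1), std::memory_order_release);
        return true;
    }
private:
    T buf_[N];
    std::atomic<size_t> head_{0};        // next slot to write
    std::atomic<size_t> tail_{0};        // next slot to read
};
```

Because each queue connects exactly one upstream kernel to one downstream kernel, the single-producer/single-consumer restriction costs little, and passing pointers to pre-allocated packet buffers through such a queue also provides the zero-copy hand-off described above.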

Figure 2: Detailed Stream Processor Architecture (parallel pipelines / scaling groups of packet receiver/parser and processing stage kernels over shared packet buffers)

Figure 3: Performance of different concurrent queues (lock-based array, lock-free linked list, cameron314, and lock-free array; throughput in M records/s)

Hash Tables A common type of preprocessing in network analytics applications is aggregation, which requires a hash table. Since packet records are well formed and have fixed width values, there are many optimizations that we can apply to the data structures used internally by the stream processor, for example, encoding keys consisting of IP 5-tuple values as 128 bit integers to minimize the cost of key comparisons during lookup [1, 18]. The work on data plane implementations contains many other optimizations to draw from [36, 13].

4 A High-Throughput Stream Processing Architecture

To make high-performance packet-level network analytics feasible, we envision an architecture based on the paradigm of stream processing together with components optimized for the network domain. Our overall system is divided into three main parts, as depicted in Figure 1: a component that reads packet headers from a network interface card, the stream processor that does the analysis, and an alerting, visualization, or storage backend.

Packet Interface The packet interface is the part of the system that receives packet records from the respective network devices. Ideally, the records already come in a parsed format that only contains those header fields that are of interest for most applications. Otherwise, switches can generally mirror and truncate packets from a port. These packets can be relatively small (e.g., 54 bytes, which would include the Ethernet, IP and TCP/UDP headers), which is sufficient for most analytics applications since they do not perform deep-packet inspection. Packets in this case would have to be parsed, which however is computationally inexpensive due to the fixed memory layout of network packets. As explained before, kernel bypass technologies can be used to read packets into user space memory at high rates.
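For illustration, parsing such a truncated mirror copy reduces to reading fields at fixed offsets; the sketch below extracts a compact record from a 54-byte Ethernet/IPv4/TCP header, assuming no VLAN tags or IP options (the struct and field names are ours).

```cpp
// Sketch: parse a truncated (Ethernet + IPv4 + TCP) mirror copy into a small,
// fixed-size packet record. Assumes no VLAN tags or IP options.
#include <arpa/inet.h>   // ntohs
#include <cstddef>
#include <cstdint>
#include <cstring>

struct PacketRecord {    // compact record with the fields most analytics need
    uint32_t src_ip, dst_ip;   // kept in network byte order, used only as keys
    uint16_t src_port, dst_port;
    uint8_t  proto;
    uint16_t ip_len;           // total IP length, for byte counters
};

bool parse_headers(const uint8_t *pkt, size_t len, PacketRecord &r) {
    if (len < 54) return false;                 // Ethernet(14) + IPv4(20) + TCP(20)
    const uint8_t *ip = pkt + 14;               // skip the Ethernet header
    if ((ip[0] >> 4) != 4) return false;        // IPv4 only in this sketch
    std::memcpy(&r.src_ip, ip + 12, 4);
    std::memcpy(&r.dst_ip, ip + 16, 4);
    r.proto = ip[9];
    uint16_t tmp;
    std::memcpy(&tmp, ip + 2, 2);  r.ip_len   = ntohs(tmp);
    const uint8_t *l4 = ip + 20;                // fixed 20 B IPv4 header assumed
    std::memcpy(&tmp, l4, 2);      r.src_port = ntohs(tmp);
    std::memcpy(&tmp, l4 + 2, 2);  r.dst_port = ntohs(tmp);
    return true;
}
```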
Stream Processor The stream processor is the main and most critical component of our system. It needs to be able to properly load-balance traffic across cores while maintaining state in order to then perform the required computations on packets or intermediate results. In our model, all packets are stored in a large buffer and, for the initial computation steps (where packets have not yet been transformed into tuples of a different type), pointers to packets are passed through the processing graph instead of copying data in or out of the FIFO queues connecting the processing elements, as shown in Figure 2. Eventually, as soon as the original complete packet headers are not needed anymore, a packet can be marked as processed and will then be removed from the buffer to make space for new packets. Intermediate results that are passed between processing kernels are generally smaller in size than the full packet header (e.g., a counter for each 5-tuple) and are actually copied between processors. In order to scale to entire large-scale networks, multiple such pipelines can run in parallel by making use of the partitionability of packet records (cf. Section 3.1).

Analytics Results Finally, results, which generally are produced at significantly lower data rates and/or are also smaller in size, can be used to generate alerts or visualizations to easily monitor the status of the network. Alternatively, the results can simply be sent to storage for later, more detailed offline analysis or auditing.

5 Prototype Implementation

As an initial proof of concept to illustrate that packet record processing in software at high data rates is feasible, we implemented a prototype of the stream processor part of the architecture described in Section 4 in C++11, using only the standard library. The processing kernels are implemented through instances of std::thread with a reference to a runtime manager that maintains and provides access to the respective queues connecting a kernel to its neighbors in the processing graph. The application logic a kernel implements is specified by an anonymous function that is called when a kernel begins execution.

As pointed out before, a suitable concurrent queue implementation is key to achieving high throughput. We tried different queue implementations using a simple experiment connecting two kernels and passing as many 4-byte integers through the queue as possible. These experiments were performed on a dual core 3.1 GHz Intel Core i7 CPU. The results are shown in Figure 3. We started out with a very simple array and lock-based queue implementation, then improved this design to a lock-free and linked-list based implementation. We then found an open source high performing queue implementation [12] (cameron314). Finally, we implemented a custom lock-free concurrent queue using an array (i.e., a ring buffer). For this implementation, which we are using in our prototype system, we use C++11 memory ordering primitives and atomic variables to achieve thread-safety. The results of this experiment show that a fast and carefully designed concurrent queue implementation is a key factor for achieving high performance in stream processors.
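A minimal sketch of this kernel abstraction is shown below: each kernel is a std::thread that runs an anonymous function and holds a reference to a runtime object. The Runtime and Kernel classes here are illustrative stand-ins, not the prototype's actual code.

```cpp
// Sketch of kernels as std::thread instances running anonymous functions,
// wired together by a runtime manager. Illustrative names only.
#include <atomic>
#include <cstdint>
#include <functional>
#include <iostream>
#include <thread>

struct Runtime {
    // The runtime manager would own all inter-kernel queues; for brevity this
    // sketch exposes a single shared counter in place of real record queues.
    std::atomic<uint64_t> processed{0};
};

class Kernel {
public:
    Kernel(Runtime &rt, std::function<void(Runtime &)> body)
        : thread_([&rt, body] { body(rt); }) {}  // application logic as a lambda
    void join() { thread_.join(); }
private:
    std::thread thread_;
};

int main() {
    Runtime rt;
    Kernel worker(rt, [](Runtime &r) {           // a trivial worker kernel
        for (int i = 0; i < 1000000; i++) r.processed++;
    });
    worker.join();
    std::cout << rt.processed.load() << " records processed\n";
}
```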
6 Evaluation

To evaluate the performance of our prototype, we implemented a simple (but somewhat typical) network monitoring application. The application computes the number of packets and bytes per source IP address on a packet stream. The first stage reads packets from a buffer, the second does the computation and maintains the state, and the last captures the results and performs benchmarking. The middle stage performs a hash table lookup for every single packet and can be parallelized by a replication factor r. The first kernel statically load balances across the replicas based on a hash of the source IP address.

Using a data set of 100 million packet records from a 10 Gb/s Internet backbone trace [4], we benchmarked the sustained throughput of the system on a 6 core 2.4 GHz Intel Xeon v3 CPU with 64 GB of RAM. Figure 4 shows the throughput as the number of replicas varies. Per socket, our system scales linearly to four replicas, excluding the source and sink kernels (i.e., 6 threads – the number of cores). The application sustained around 2.9 M packets/s per core, enough to monitor around 58 Gb/s of aggregate link capacity with a single core, or 230 Gb/s of the evaluated workload with the entire server. For perspective, a 56 core server running StreamBox [22] sustained around 1 M packets/s (around 20 K packets/s per core) for a similar application that groups packets by IP pairs and computes average latency. For comparison, we also implemented a second application that just passes packet records through the same pipeline without any computations. This application sustains around 4.8 M packets/s with a single core. Previous work [22] has shown that multiple such independent pipelines can scale linearly with CPU core count.

Figure 4: Throughput of example applications (throughput in M packets/s vs. replication factor r, for the packets-per-source and passthrough applications)
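A stripped-down version of the aggregation stage of this application could look like the following sketch; the record layout, class names, and the use of std::unordered_map are our assumptions and differ from the prototype's tuned data structures.

```cpp
// Sketch of the per-source-IP aggregation stage of the evaluation application:
// count packets and bytes per source IP, with r replicas selected by a static
// hash of the source address. Illustrative only.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Record {            // small, fixed-size packet record (cf. Section 3.1)
    uint32_t src_ip;
    uint16_t ip_len;
};

struct Counters {
    uint64_t packets = 0;
    uint64_t bytes = 0;
};

class AggregationStage {
public:
    explicit AggregationStage(unsigned replicas) : tables_(replicas) {}

    // The first kernel would call this to pick a replica for each record.
    unsigned replica_for(const Record &rec) const {
        return rec.src_ip % tables_.size();   // static, hash-based load balancing
    }

    // Each replica updates only its own table, so no locking is needed.
    void update(unsigned replica, const Record &rec) {
        Counters &c = tables_[replica][rec.src_ip];
        c.packets += 1;
        c.bytes += rec.ip_len;
    }

private:
    std::vector<std::unordered_map<uint32_t, Counters>> tables_;
};
```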
To put these numbers into the perspective of a cloud data center, we looked at traffic statistics reported by Facebook in [27, 14]. Facebook's data centers are organized in clusters, where each cluster contains either 4 or 8 racks with up to 44 servers each. In the web server cluster, the average server generates around 2 Gb/s of traffic, of which 500 Mb/s leaves the cluster. Given the reported median packet size of 120 bytes, this corresponds to 520K packets/s per server leaving the cluster. The aggregate egress traffic rate for a cluster is then 91M packets/s for 4 racks (176 servers) or 182M packets/s for 8 racks (352 servers). When processing around 2.9 M packets per second per core, this would require 32 cores for 4 racks or 64 cores for 8 racks, as compared to 5096 or 10192 cores for StreamBox [22] (a 163× improvement). As a result, all egress traffic of such a cluster can be analyzed in software on a single server with between 32 and 64 cores. Typical high-performance commodity servers in 2U or 3U form factor (e.g., [11]) have 4 sockets with 28 cores each. A single server in the context of such a cluster represents a maximum of 0.5% of the entire cluster hardware, which is more than reasonable to get insight into every single packet leaving the cluster.

7 Conclusion and Future Work

In this paper, we motivated why high performance network analytics at cloud data center or backbone WAN scales does not necessarily require sampling or aggregation. Instead, high-performance packet-level analytics can be realized with domain-specific optimizations and finely tuned software platforms. With this we get the benefits that software analytics systems have of simplified programmability through general-purpose languages, as well as deployability on commodity hardware. Through preliminary experimentation we demonstrate that software analytics systems can indeed scale to several million packets per second per core with relatively simple optimizations and adaptations for the network domain. To further study our hypothesis that optimized and domain-specific software systems can process packet records at data center scale, we are in the process of building a complete end-to-end prototype. We plan to explore the rich set of possible optimizations including kernel-bypass technologies for packet input and optimized hash tables, along with load balancing strategies to distribute the analytics processing graphs across machines. This paper takes an important first step towards answering these questions and enabling powerful software analytics at the packet level in cloud-scale networks.

Acknowledgements

This research was supported in part by the National Science Foundation under grants 1700527 (SDI-CSCS) and 1652698 (CAREER), ONR grant N00014-15-1-2006, and DARPA contract HR001117C0047.

References

[1] 128bit hash comparison with SSE. https://stackoverflow.com/questions/4534203/128bit-hash-comparison-with-sse.

[2] Data Plane Development Kit. https://dpdk.org.

[3] RFC 3917: Requirements for IP Flow Information Export (IPFIX). https://tools.ietf.org/html/rfc3917.

[4] Trace statistics for CAIDA passive OC48 and OC192 traces – 2015-02-19. https://www.caida.org/data/passive/trace_stats/.

[5] P4: Programming Protocol-independent Packet Processors. SIGCOMM Comput. Commun. Rev. 44, 3 (Jul 2014), 87–95.

[6] APACHE SOFTWARE FOUNDATION. Flink. https://flink.apache.org.

[7] APACHE SOFTWARE FOUNDATION. Kafka. http://kafka.apache.org.

[8] BEARD, J. C., LI, P., AND CHAMBERLAIN, R. D. RaftLib: A C++ template library for high performance stream parallel processing. International Journal of High Performance Computing Applications (2016).

[9] BHUYAN, M. H., BHATTACHARYYA, D. K., AND KALITA, J. K. Network anomaly detection: methods, systems and tools. IEEE Communications Surveys & Tutorials 16, 1 (2014), 303–336.

[10] BOSSHART, P., GIBB, G., KIM, H.-S., VARGHESE, G., MCKEOWN, N., IZZARD, M., MUJICA, F., AND HOROWITZ, M. Forwarding Metamorphosis: Fast Programmable Match-action Processing in Hardware for SDN. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM (2013), SIGCOMM '13, ACM, pp. 99–110.

[11] DELL. PowerEdge R940 server. http://i.dell.com/sites/doccontent/shared-content/data-sheets/en/Documents/poweredge-r940-spec-sheet.pdf.

[12] DESROCHERS, C. Concurrent queue. https://github.com/cameron314/concurrentqueue.

[13] DOBRESCU, M., EGI, N., ARGYRAKI, K., CHUN, B.-G., FALL, K., IANNACCONE, G., KNIES, A., MANESH, M., AND RATNASAMY, S. RouteBricks: Exploiting parallelism to scale software routers. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (New York, NY, USA, 2009), SOSP '09, ACM, pp. 15–28.

[14] FARRINGTON, N., AND ANDREYEV, A. Facebook's data center network architecture. In 2013 Optical Interconnects Conference (May 2013), pp. 49–50.

[15] GUPTA, A., BIRKNER, R., CANINI, M., FEAMSTER, N., MAC-STOKER, C., AND WILLINGER, W. Network Monitoring as a Streaming Analytics Problem. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks (New York, NY, USA, 2016), HotNets '16, ACM, pp. 106–112.

[16] HOFSTEDE, R., ČELEDA, P., TRAMMELL, B., DRAGO, I., SADRE, R., SPEROTTO, A., AND PRAS, A. Flow monitoring explained: From packet capture to data analysis with NetFlow and IPFIX. IEEE Communications Surveys & Tutorials 16, 4 (2014), 2037–2064.

[17] INTEL. DPDK Performance Report. https://fast.dpdk.org/doc/perf/DPDK_16_11_Intel_NIC_performance_report.pdf.

[18] INTEL. Intel streaming extensions. https://software.intel.com/en-us/node/524253.

[19] KIM, H., CLAFFY, K. C., FOMENKOV, M., BARMAN, D., FALOUTSOS, M., AND LEE, K. Internet traffic classification demystified: myths, caveats, and the best practices. In Proceedings of the 2008 ACM CoNEXT Conference (2008), ACM, p. 11.

[20] LI, Y., MIAO, R., KIM, C., AND YU, M. FlowRadar: A Better NetFlow for Data Centers. In Proceedings of the 13th USENIX Conference on Networked Systems Design and Implementation (Santa Clara, CA, 2016), USENIX Association, pp. 311–324.

[21] LIU, Z., MANOUSIS, A., VORSANGER, G., SEKAR, V., AND BRAVERMAN, V. One sketch to rule them all: Rethinking network flow monitoring with UnivMon. In Proceedings of the 2016 ACM SIGCOMM Conference (New York, NY, USA, 2016), SIGCOMM '16, ACM, pp. 101–114.

[22] MIAO, H., PARK, H., JEON, M., PEKHIMENKO, G., MCKINLEY, K. S., AND LIN, F. X. StreamBox: Modern Stream Processing on a Multicore Machine. In 2017 USENIX Annual Technical Conference (USENIX ATC 17) (Santa Clara, CA, 2017), USENIX Association, pp. 617–629.

[23] NARAYANA, S., SIVARAMAN, A., NATHAN, V., GOYAL, P., ARUN, V., ALIZADEH, M., JEYAKUMAR, V., AND KIM, C. Language-Directed Hardware Design for Network Performance Monitoring. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (New York, NY, USA, 2017), SIGCOMM '17, ACM, pp. 85–98.

[24] NGUYEN, T. T., AND ARMITAGE, G. A survey of techniques for internet traffic classification using machine learning. IEEE Communications Surveys & Tutorials 10, 4 (2008), 56–76.

[25] NTOP. PF_RING: High-speed packet capture, filtering and analysis. https://www.ntop.org/products/packet-capture/pf_ring/.

[26] RIZZO, L. netmap: A Novel Framework for Fast Packet I/O. In 2012 USENIX Annual Technical Conference (USENIX ATC 12) (Boston, MA, 2012), USENIX Association, pp. 101–112.

[27] ROY, A., ZENG, H., BAGGA, J., PORTER, G., AND SNOEREN, A. C. Inside the social network's (datacenter) network. In ACM SIGCOMM Computer Communication Review (2015), vol. 45, ACM, pp. 123–137.

[28] SFLOW. sFlow sampling rates. http://blog.sflow.com/2009/06/sampling-rates.html.

[29] SONCHACK, J., AVIV, A. J., KELLER, E., AND SMITH, J. M. TurboFlow: Information rich flow record generation on commodity switches. In Proceedings of EuroSys 2018 (2018).

[30] SUH, J., KWON, T. T., DIXON, C., FELTER, W., AND CARTER, J. OpenSample: A low-latency, sampling-based measurement platform for commodity SDN. In Distributed Computing Systems (ICDCS), 2014 IEEE 34th International Conference on (2014), IEEE, pp. 228–237.

[31] SUN, S., HUANG, Z., ZHONG, H., DAI, D., LIU, H., AND LI, J. Efficient monitoring of skyline queries over distributed data streams. Knowledge and Information Systems 25, 3 (Dec 2010), 575–606.

[32] TWITTER. The infrastructure behind Twitter: Scale. https://blog.twitter.com/engineering/en_us/topics/infrastructure/2017/the-infrastructure-behind-twitter-scale.html.

[33] TWITTER. Observability at Twitter: Technical overview. https://blog.twitter.com/engineering/en_us/a/2016/observability-at-twitter-technical-overview-part-i.html.

[34] YU, M., JOSE, L., AND MIAO, R. Software Defined Traffic Measurement with OpenSketch. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation (Berkeley, CA, USA, 2013), NSDI '13, USENIX Association, pp. 29–42.

[35] ZAHARIA, M., XIN, R. S., WENDELL, P., DAS, T., ARMBRUST, M., DAVE, A., MENG, X., ROSEN, J., VENKATARAMAN, S., FRANKLIN, M. J., ET AL. Apache Spark: A unified engine for big data processing. Communications of the ACM 59, 11 (2016), 56–65.

[36] ZHOU, D., FAN, B., LIM, H., KAMINSKY, M., AND ANDERSEN, D. G. Scalable, high performance Ethernet forwarding with CuckooSwitch. In Proceedings of the Ninth ACM Conference on Emerging Networking Experiments and Technologies (New York, NY, USA, 2013), CoNEXT '13, ACM, pp. 97–108.

[37] ZHU, Y., KANG, N., CAO, J., GREENBERG, A., LU, G., MAHAJAN, R., MALTZ, D., YUAN, L., ZHANG, M., ZHAO, B. Y., AND ZHENG, H. Packet-Level Telemetry in Large Datacenter Networks. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (New York, NY, USA, 2015), SIGCOMM '15, ACM, pp. 479–491.