Packet-Level Network Telemetry and Analytics

by Oliver Michel
B.Sc., University of Vienna, 2013
M.S., University of Colorado Boulder, 2015

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Computer Science
2019

This thesis entitled: Packet-Level Network Telemetry and Analytics, written by Oliver Michel, has been approved for the Department of Computer Science.

Professor Eric Keller
Professor Dirk Grunwald
Professor Sangtae Ha
Professor Eric Rozner
Professor Eric Wustrow

The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.

Michel, Oliver (Ph.D., Computer Science)
Packet-Level Network Telemetry and Analytics
Thesis directed by Professor Eric Keller

Continuous monitoring is an essential part of the operation of computer networks. High-fidelity monitoring data can be used to detect security issues, misconfigurations, equipment failure, or to perform traffic engineering. With networks growing in complexity and traffic volume, and facing more complex attacks, the need for continuous and precise monitoring is greater than ever before. Existing SNMP or NetFlow based approaches are not suited for these new challenges as they compromise on flexibility, fidelity, and performance. These compromises are a result of the assumption that analytics software cannot scale to high traffic rates. In this work, we look holistically at the requirements and challenges in network monitoring and present an architecture consisting of integrated telemetry, analytics, and record persistence components. By finding the right balance between responsibilities of hardware and software, we demonstrate that flexible and high-fidelity network analytics at high rates is indeed possible.

Our system includes a packet-level, analytics-aware telemetry component in the data plane that runs at line rates of several terabits per second and tightly integrates with a flexible software network analytics platform. Operators can interact with this system through a time series database interface that also provides record persistence. We implement a full prototype of our system called Jetstream which can process approximately 80 million packets per 16-core commodity server for a wide variety of monitoring applications and scales linearly with server count.

Contents

Chapter

1 Introduction

2 Network Monitoring and Analytics
  2.1 Background
    2.1.1 Network Management
    2.1.2 Sensors
    2.1.3 Telemetry-based Network Monitoring Architecture
  2.2 Applications
    2.2.1 Performance Monitoring and Debugging
    2.2.2 Traffic Engineering
    2.2.3 Traffic Classification
    2.2.4 Intrusion Detection
  2.3 Emerging Requirements and Challenges
    2.3.1 Traffic Volume
    2.3.2 Packet-Level Data
  2.4 Enabling Technologies and Opportunities
    2.4.1 Programmable Data Planes
    2.4.2 Parallel Streaming Analytics

3 Packet-Level Network Telemetry
  3.1 Introduction
  3.2 Background
    3.2.1 Design Goals
    3.2.2 Prior Telemetry Systems
  3.3 PFE Accelerated Telemetry
  3.4 Grouped Packet Vectors
  3.5 Generating GPVs
    3.5.1 PFE Architecture
    3.5.2 Design
    3.5.3 Implementation
    3.5.4 Configuration
  3.6 Analytics-aware Network Telemetry
  3.7 Processing GPVs
    3.7.1 The *Flow Agent
    3.7.2 *Flow Monitoring Applications
    3.7.3 Interactive Measurement Framework
  3.8 Evaluation
    3.8.1 The *Flow Cache
    3.8.2 *Flow Agent and Applications
    3.8.3 Comparison with Marple
    3.8.4 Analytics Plane Interface
  3.9 Conclusion

4 Scalable Network Streaming Analytics
  4.1 Introduction
  4.2 Motivation
    4.2.1 Compromising on Flexibility
    4.2.2 Software Network Analytics Strawman
  4.3 Introducing Jetstream
    4.3.1 Analytics-aware Network Telemetry
    4.3.2 Highly-parallel Streaming Analytics
    4.3.3 User Analysis and Monitoring with On-Demand Aggregation
  4.4 High-Performance Stream Processing for Network Records
    4.4.1 Packet Analytics Workloads
    4.4.2 Optimization Opportunities
  4.5 User Analysis and Monitoring with On-Demand Aggregation
  4.6 Programmability and Applications
    4.6.1 Input/Output and Record Format
    4.6.2 Programming Model
    4.6.3 Custom Processors
    4.6.4 Standard Applications
  4.7 Evaluation
    4.7.1 Stream Processing Optimizations
    4.7.2 Scalability
  4.8 Deployment Analysis
  4.9 Related Work
  4.10 Conclusion

5 Persistent Interactive Queries for Network Security Analytics
  5.1 Introduction
  5.2 Background
    5.2.1 Database Models
    5.2.2 Time-Series Databases
    5.2.3 Database Requirements for Network Record Persistence
  5.3 Querying Network Records
    5.3.1 Network Queries
    5.3.2 Retrospective Queries and Debugging
  5.4 Inserting Network Records
  5.5 Storing Network Records
    5.5.1 Grouped Packet Vectors
    5.5.2 Storage and Record Retention
  5.6 Conclusion

6 Future Work and Conclusion
  6.1 Future Work
    6.1.1 *Flow
    6.1.2 Jetstream
    6.1.3 Persistent Interactive Queries
    6.1.4 Orchestration
  6.2 Conclusion

Bibliography

Tables

Table
  2.1 Facebook 24 hour datacenter trace [144, 69] statistics.
  2.2 Statistics for CAIDA 2015 Passive Internet Core Traces [44]
  3.1 Practical requirements for PFE supported network queries.
  3.2 Record count and sizes for a 1 hour 10 Gbit/s core Internet router trace [44]. DFRs contain IP 5 tuples, packet sizes, timestamps, TCP flags, ToS flag, and TTLs
  3.3 Resource requirements for *Flow on the Tofino, configured with 16384 cache slots, 16384 16-byte short buffers, and 4096 96-byte wide buffers.
  3.4 Average throughput, in GPVs per second, for *Flow agent and applications.
  3.5 Banzai pipeline usage for the *Flow cache and compiled Marple queries.
  4.1 Properties of the different evaluated concurrent queue implementations.
  5.1 Grouped Packet Vector Format
  5.2 Packet Record Format

Figures

Figure
  1.1 Overview of a Telemetry-Based Network Monitoring and Analytics System
  1.2 Toccoa Network Telemetry & Analytics Components
  2.1 Network Management and Operation Lifecycle
