Tetration Analytics - Network Analytics & Machine Learning Enhancing Data Center Security and Operations
Michael Herbert, Principal Engineer, INSBU
BRKACI-2040

Okay, What Does Tetration Mean?
• Tetration (or hyper-4) is the next hyperoperation after exponentiation, and is defined as iterated exponentiation
• It’s bigger than a Google [sic] (Googol)
• And yes, the developers are a bunch of mathematical geeks
BRKDCN-2040 © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Public

What if you could actually look at every process and every data packet header that has ever traversed the network without sampling?

Cisco Tetration Analytics: Pervasive Sensor Framework
• Provides correlation of data sources across entire application infrastructure
• Enables identification of point events and provides insight into overall systems behavior
• Monitors end-to-end lifecycle of application connectivity

Cisco Tetration Analytics: Policy Discovery and Observation
[Diagram: application workspaces across public cloud and private cloud; Cisco Tetration Analytics™; application segmentation policy]

Profile and Context Driven Application Segmentation
1. Real-time Asset Tagging
2. Policy Workflows
3. Policy Enforcement (Role Based and Hierarchical)
• Cisco Tetration Application Insights (ADM) + Tag and Label-Based Add-on Policy (For Example, Mail Filters)
• Cisco Tetration Sensors
• Customer Defined
• No Need to Tie Policy to IP Address and Port
• Cisco Tetration Platform Performs the Translation
• Compliance Monitoring
• Enforcement
• Public Cloud, Bare Metal, Virtual, Cisco ACI™*, Traditional Network*
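The segmentation slide says policy does not need to be tied to IP address and port because the platform performs the translation. The following is a minimal sketch of what such a translation could look like, not Tetration's actual implementation. The names `LabelPolicy`, `translate`, the `inventory` mapping, and the example addresses and port are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelPolicy:
    """Hypothetical label-based rule: consumers may reach providers on a port."""
    consumer_label: str
    provider_label: str
    protocol: str
    port: int


def translate(policies, inventory):
    """Expand label-based policies into concrete (src, dst, protocol, port) rules.

    `inventory` maps a label to the IP addresses currently tagged with it.
    """
    rules = set()
    for policy in policies:
        for src in inventory.get(policy.consumer_label, ()):
            for dst in inventory.get(policy.provider_label, ()):
                rules.add((src, dst, policy.protocol, policy.port))
    return sorted(rules)


# Illustrative inventory (assumed data, not from the deck).
inventory = {
    "web": ["10.0.1.10", "10.0.1.11"],
    "mail-filter": ["10.0.2.20"],
}
policies = [LabelPolicy("web", "mail-filter", "tcp", 25)]

rules = translate(policies, inventory)
for rule in rules:
    print(rule)

# Tagging a new workload changes the concrete rules, not the policy itself.
inventory["web"].append("10.0.1.12")
rules_after_retag = translate(policies, inventory)
```

The point of the design is that the policy stays stable while the tagged inventory changes; only the derived IP and port rules are regenerated.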
Tetration Analytics: Open Access
Northbound interfaces of the Cisco Tetration Analytics Platform, for northbound applications and consumers:
• REST API (programmatic interface): Tetration flow search, sensor management
• Push Notification (Kafka broker, message publish): out-of-box events, user defined events
• Tetration Apps: access to data lake, write your own application

Tetration Analytics Platform Architecture - Sensors

Tetration Analytics Architecture Overview
• Data Collection: Software Sensor and Enforcement; Embedded Network Sensors (Telemetry Only); Third Party Sources (Configuration Data)
• Analytics Engine: Cisco Tetration Analytics Cluster
• Open Access: Web GUI, REST API, Event Notification, Tetration Apps
• Self Managed Cluster
• No Hadoop / Data Science Background Needed
• Easy Integration via Open Interfaces
• One Touch Deployment
• No External Storage Needed
• Open Data Lake (via Tetration Apps)

Traditional Monitoring Is Showing Its Age
Not suited for Modern Network and Security Operations
[Diagram: where data is created (SNMP, Syslog, CLI) to where data is useful (SNMP Server, Syslog Collector, Scripts), followed by Storage & Analysis; non real time]
• Strong burden on back-end
• Normalize different encodings, transports, data models, timestamps

Data Granularity Needs to Improve
• One Minute SNMP Polling [Chart: Telemetry – 10 Second Push compared with SNMP – 1 Minute Polling]
• 10 Second SW Process Push [Chart: Telemetry – 10 Second Push]
• Sub Second HW/SW Push
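To make the granularity argument concrete, the sketch below averages the same synthetic utilization trace over one-minute and ten-second windows. The trace (10% baseline with a five-second burst at 95%) is invented for illustration, and `window_averages` is a hypothetical helper, not part of any Cisco tooling.

```python
def window_averages(samples, window):
    """Average per-second samples over consecutive windows of `window` seconds."""
    averages = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Synthetic per-second link utilization in percent (assumed data):
# 60 seconds at 10%, with a 5-second burst at 95% starting at t=20.
samples = [10.0] * 60
for t in range(20, 25):
    samples[t] = 95.0

one_minute = window_averages(samples, 60)   # what 1-minute polling would report
ten_second = window_averages(samples, 10)   # what a 10-second push would report

print("1-minute view :", [round(v, 1) for v in one_minute])
print("10-second view:", [round(v, 1) for v in ten_second])
print("per-second max:", max(samples))
```

In this trace the one-minute view reports a single value of about 17%, the ten-second view shows one window at 52.5%, and only per-second data shows the 95% peak.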
Data Granularity Needs to Improve
Type of Problems Customers are Looking to Address
• Workload Placement
• Service Level Monitoring
• ADM
• Security and Policy Enforcement
• Microburst Detection
• Traffic Engineering
• Capacity Planning
• Troubleshooting & Remediation (Self Driving)
Resolution = Frequency of Data Collection: On-Change, <= 1 sec, ~10s sec, ~minutes-hours

Processing on the Source Device is Expensive
e.g. Consider Flow Collection Efficiency
512K Flow Table
• Collect and Keep all Flow Data in the Local Hardware or Software Flow Table
• Size of the Table depends on the Data Rates and Connectivity Density
• BW is Growing Faster than Memory (Cost of Flow Entry per Gbps is not flat)
Sampled Flow Data
• Sampling Flows Reduces the Cost of the Telemetry but Reduces Accuracy
Flow Cache with streaming export
• Maintain a small ‘cache’ and export the cache at a high data rate
• Shift the cost of aggregation to backend resources
• Aggregate ‘Flow Table’ can be much larger

The Richer the Data Sources the Better
More Data == Better Interpolation
[Figure: lamp sensor, plug sensor, heater]

The Richer the Data Sources the Better
You don’t always know what you need in advance
• On-Box Filtering Loses Data
• Can’t Change Your Mind About What’s Important Later
• Can’t Scale Out Embedded Processing
• Compression (Lossless) is Good
• Massive Amounts of Data Motivate the Shift in Collection
• Bulk Collection is Efficient
• Bulk Processing/Export Not So Much
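The "flow cache with streaming export" option from the flow collection slide can be sketched as follows: the device keeps only a small cache, exports entries as they are evicted, and the backend re-aggregates them into a larger flow table. This is an illustrative model only. `SmallFlowCache`, the least-recently-used eviction policy, the capacity of 2, and the packet data are assumptions, not details from the deck.

```python
from collections import Counter, OrderedDict


class SmallFlowCache:
    """Hypothetical on-device cache that exports an entry when it is evicted."""

    def __init__(self, capacity, export):
        self.capacity = capacity
        self.export = export            # callback(flow_key, byte_count)
        self.entries = OrderedDict()    # flow_key -> bytes seen since last export

    def observe(self, flow_key, nbytes):
        if flow_key in self.entries:
            self.entries[flow_key] += nbytes
            self.entries.move_to_end(flow_key)
            return
        if len(self.entries) >= self.capacity:
            # Assumed policy: evict and export the least recently used entry.
            old_key, old_bytes = self.entries.popitem(last=False)
            self.export(old_key, old_bytes)
        self.entries[flow_key] = nbytes

    def flush(self):
        """Export everything still held in the cache."""
        while self.entries:
            key, nbytes = self.entries.popitem(last=False)
            self.export(key, nbytes)


# Backend side: aggregation happens here, where memory is cheaper.
backend_table = Counter()
export_records = []


def backend_export(flow_key, nbytes):
    export_records.append((flow_key, nbytes))
    backend_table[flow_key] += nbytes


cache = SmallFlowCache(capacity=2, export=backend_export)
packets = [("A", 100), ("B", 200), ("C", 50), ("A", 25), ("B", 10), ("C", 5)]
for key, nbytes in packets:
    cache.observe(key, nbytes)
cache.flush()

print(dict(backend_table))
print(len(export_records), "export records for", len(backend_table), "flows")
```

The cache never holds more than two entries, yet the backend totals are exact; the cost is more export records than flows, which is the aggregation work shifted to the backend.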
Streaming Telemetry is a game changer
Monitoring becomes a big data problem
[Diagram: where data is created to where data is useful, real time]
Removing limitations and complexity
• Streaming paradigm
• Dense Sensor Framework
• Increased Data Granularity
• Update on every event
• Multiple Data Sources
Big Data and Machine Learning Problem
• Volume – Scale of Data
• Velocity – Analysis of Streaming Data
• Variety – Different Forms of Data

Pervasive Sensors
Software Sensors (Available Now)
• Linux VM
• Windows Server VM
• Bare Metal (Linux and Windows Server)
• Universal* (Basic Sensor for other OS)
*Note: No per-packet Telemetry, Not an enforcement point
New! Enforcement Point (Software agents)
Network Sensors (Next Generation 9K switches)
• Nexus 9200-X
• Nexus 9300-EX/FX
Third Party Sources (3rd party Data Sources)
• Asset Tagging
• Load Balancers
• IP Address Management
• CMDB
• …
Low CPU Overhead (SLA enforced); Low Network Overhead (SLA enforced); Highly Secure (Code Signed, Authenticated); Every Flow (No sampling), NO PAYLOAD

Software Sensor
[Diagram: Tetration Sensor and Application above libpcap, Network Stack, Driver, NIC]
• Runs in the Host OS, not the Hypervisor
• Access to accurate state of the application and all connectivity
• Not in the data path
• Sits in User Space
• Designed by Kernel Developers
• Secure
• Code Signed
• SLA Enforcement
• CPU and BW throttling
Software Sensor Enforcement
[Diagram: process with High Privilege Collection, Low Privilege Monitoring, High Privilege Enforcement, and Low Privilege Cluster Link]
• When leveraging the enforcement capability, an additional component is downloaded by the Cluster to the existing sensors
• Monitoring and Enforcement are distinct functions with distinct threads (the enforcement code does not exist on the server until explicitly pushed)
• Agent will implement privilege separation
• SSL libraries would run in low privilege space
• /proc parsing in high privilege space
• Enforcement in high privilege space

PKI within the Cluster/Sensor
• Tetration Cluster runs an internal PKI
• Root CA is per cluster, inserted at image creation
• Not accessible outside the cluster
• Cannot connect to an external PKI
• Certificate based authentication is performed for the Control Channel
• CN of the certificate is the IP address
• Certificates are rotated every 60 days
• Sensors are code signed
• Signature Authority is Cisco’s code signing certificate
• Code Signature is validated at process start

How Does a Sensor Communicate with the Cluster the First Time?
• Sensor to Rails (web server): Register with web server via SSL
• Rails: Assign UUID
• Sensor to Config Server: Download config
• Sensor to Collector: Send metadata to collectors
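The first-contact sequence above (register, receive a UUID, download config, send metadata to collectors) can be modelled with a small in-memory simulation. This is only a sketch of the ordering. The classes `WebServer`, `ConfigServer`, `Collector`, the function `first_contact`, the config fields, and the example address are all hypothetical, and transport security (SSL, certificates) is deliberately not modelled.

```python
import uuid


class WebServer:
    """Hypothetical stand-in for the web server labelled "Rails" on the slide."""

    def __init__(self):
        self.sensors = {}

    def register(self, host_ip):
        sensor_id = str(uuid.uuid4())   # the "Assign UUID" step
        self.sensors[sensor_id] = host_ip
        return sensor_id


class ConfigServer:
    """Hypothetical config server that tells a sensor which collectors to use."""

    def __init__(self, collector_names):
        self.collector_names = list(collector_names)

    def download_config(self, sensor_id):
        return {"sensor_id": sensor_id, "collectors": list(self.collector_names)}


class Collector:
    """Hypothetical collector that simply stores the metadata it receives."""

    def __init__(self):
        self.metadata = []

    def receive(self, record):
        self.metadata.append(record)


def first_contact(host_ip, web, config_server, collectors):
    """Run the four steps in order and return the sensor id plus a step log."""
    steps = []
    sensor_id = web.register(host_ip)
    steps += ["register", "assign_uuid"]
    config = config_server.download_config(sensor_id)
    steps.append("download_config")
    for name in config["collectors"]:
        collectors[name].receive({"sensor_id": sensor_id, "host_ip": host_ip})
    steps.append("send_metadata")
    return sensor_id, steps


web = WebServer()
collector = Collector()
config_server = ConfigServer(["collector-1"])
sensor_id, steps = first_contact(
    "192.0.2.10", web, config_server, {"collector-1": collector}
)
print(steps)
```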
Components & Communication
Software Sensor Agent Communication
[Diagram: Software Sensor/Agent (LINUX/Windows/…) connected to the Tetration Cluster; Unix Socket; Control Channel TCP-SSL 443; Sensor Data TCP-SSL 5640]
• When used, policies pushed from the cluster are pairwise signed with TS (Replay protected) between Cluster and sensor agent
• If rules are changed on the end host, the Enforcer restates the rules and sends a Notification to the Controller

Tetration Sensor Overhead (e.g. 2263 sensors)
• CPU utilization on Host Sensor, based on current deployments, averages < 1%
• Flow collection has zero impact on switch hardware sensor CPU
• Network Overhead is ~1% of observed traffic load

Tetration Host Sensor Has Three Rate Limiting Modes
Top Adjusted Disabled • Uses no more CPU % than • Takes the provided limit and • Use in hosts where the given limit