Tetration Analytics - Network Analytics & Machine Learning Enhancing Data Center Security and Operations
Mike Herbert, Principal Engineer, INSBU
BRKDCN-2040
Session Abstract
Huge amounts of data traverse network infrastructure on a daily basis. With innovative big data analytics capabilities, it is possible to use rich network metrics to provide unprecedented insight into IT infrastructure. By leveraging pervasive low-overhead sensors in both hardware and software, a complete view of application and network behavior can be attained in real time. In the modern data center, some of the key operational and security challenges are understanding application dependencies accurately, generating a consistent whitelist policy model, and ensuring network policy compliance. This session describes how Tetration Analytics collects hundreds of data points and applies an unsupervised machine learning approach and advanced analytics to address these challenges in a scalable fashion.
BRKDCN-2040 © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Public
If this is not what you were hoping for, here are some other Tetration sessions
• Tetration Analytics, the secret ingredient for every Data Center
  • Session ID: PSODCN-1800
• Cisco Tetration: Data Center Analytics Deployment and Use Cases
  • Session ID: BRKACI-2060
• Tetration APIs
  • Session ID: DEVNET-2423
• Tetration Analytics - Industry's Powerful Analytics Platform
  • Session ID: LABACI-3020
• Inside Cisco IT: ACI & Tetration Analytics
  • Session ID: BRKCOC-2006
Okay, what does Tetration mean?
• Tetration (or hyper-4) is the next hyperoperation after exponentiation, and is defined as iterated exponentiation
• It’s bigger than a googol
• And yes the developers are a bunch of mathematical geeks
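As a minimal sketch of the definition above (the function name `tetration` is illustrative and not taken from the slides), iterated exponentiation can be written as a loop that builds the power tower from the top down:

```python
def tetration(base: int, height: int) -> int:
    """Return a power tower made of `height` copies of `base`."""
    if height < 0:
        raise ValueError("height must be non-negative")
    result = 1  # the empty power tower is conventionally 1
    for _ in range(height):
        # Each pass adds one more level underneath the existing tower.
        result = base ** result
    return result


print(tetration(2, 3))  # 2 ** (2 ** 2)
print(tetration(2, 4))  # 2 ** (2 ** (2 ** 2))
```

Even `tetration(10, 3)`, which is 10 raised to the power of ten billion, is far larger than a googol (10^100), so it is best left unevaluated.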
Tetration Analytics Platform
Introduction
We Are at the Cusp of a Major Shift
[Figure: adoption curve over 2000, 2010, 2015 and the next 5+ years, moving from Traditional Data Centre through Cloud Data Centre to Hybrid Clouds, with a "We are here" marker. Labels on the curve: Consolidation, Virtualisation, Automation, Efficiency, IT as a Service (IaaS | PaaS | SaaS | XaaS), Flexible Consumption Models. Outcomes along the time axis: Efficiency, Simplicity | Speed, Digital Experiences.]
What if you could actually look at every process and every data packet header that has ever traversed the network without sampling?
Cisco Tetration Analytics: Pervasive Sensor Framework
Provides correlation of data sources across entire application infrastructure
Enables identification of point events and provides insight into overall systems behavior
Monitors end-to-end lifecycle of application connectivity
Cisco Tetration Analytics: Policy Discovery and Observation
[Figure: application workspaces spanning Public Cloud and Private Cloud, with Cisco Tetration Analytics™ producing the Application Segmentation Policy]
Profile and Context Driven Application Segmentation
1. Real-time Asset Tagging
  • Cisco Tetration Sensors + Customer Defined
2. Policy Workflows (Role Based and Hierarchical)
  • Cisco Tetration Application Insights (ADM) + Tag and Label-Based Add-on Policy (For Example, Mail Filters)
3. Policy Enforcement
  • No Need to Tie Policy to IP Address and Port
  • Cisco Tetration Platform Performs the Translation
Compliance Monitoring | Enforcement
Public Cloud | Bare Metal | Virtual | Cisco ACI™* | Traditional Network*
Tetration Analytics: Open Access
The Cisco Tetration Analytics Platform exposes three northbound access paths:
• REST API (programmatic interface for northbound applications)
  • Tetration flow search
  • Sensor management
• Push Notification (message publish through a Kafka broker to northbound consumers)
  • Out-of-box events
  • User defined events
• Tetration Apps (northbound consumers)
  • Access to data lake
  • Write your own application
Tetration Analytics Platform
Architecture - Sensors
Tetration Analytics Architecture Overview
• Data Collection
  • Software Sensor and Enforcement
  • Embedded Network Sensors (Telemetry Only)
  • Third Party Sources (Configuration Data)
• Analytics Engine
  • Cisco Tetration Analytics Cluster
• Open Access
  • Web GUI
  • REST API
  • Event Notification
  • Tetration Apps
• Self Managed Cluster
• No Hadoop / Data Science Background Needed
• Easy Integration via Open Interfaces
• One Touch Deployment
• No External Storage Needed
• Open Data Lake (via Tetration Apps)
Traditional Monitoring Is Showing Its Age: Not suited for Modern Network and Security Operations
[Figure: "Where Data Is Created" (SNMP, Syslog, CLI scripts) feeding "Where Data Is Useful" (SNMP server, Syslog collector, Storage & Analysis); marked as non real time]
• Strong burden on back-end
• Normalize different encodings, transports, data models, timestamps
Data Granularity Needs to Improve: One Minute SNMP Polling
[Figure: SNMP, 1 minute polling, compared with telemetry, 10 second push]
Data Granularity Needs to Improve: 10 Second SW Process Push
[Figure: telemetry, 10 second push]
Data Granularity Needs to Improve: Sub Second HW/SW Push
Data Granularity Needs to Improve: Type of Problems Customers are Looking to Address
Workload Placement
Service Level Monitoring
ADM
Security and Policy Enforcement
Microburst Detection Traffic Engineering
Capacity Planning
Troubleshooting & Remediation (Self Driving)
On-Change <= 1 sec ~10s sec ~minutes-hours
Resolution = Frequency of Data Collection
Processing on the Source Device is Expensive: e.g. Consider Flow Collection Efficiency
512K Flow Table
• Collect and Keep all Flow Data in the Local Hardware or Software Flow Table
• Size of the Table depends on the Data Rates and Connectivity Density
• BW is Growing Faster than Memory (Cost of Flow Entry per Gbps is not flat)
Sampled Flow Data
• Sampling Flows Reduces the Cost of the Telemetry but Reduces Accuracy
Flow Cache with streaming export
• Maintain a small ‘cache’ and export the cache at a high data rate
• Shift the cost of aggregation to backend resources
• Aggregate ‘Flow Table’ can be much larger
The Richer the Data Sources the Better: More Data == Better Interpolation
[Figure: Lamp Sensor, Plug Sensor, Heater]
The Richer the Data Sources the Better: You don’t always know what you need in advance
• On-Box Filtering Loses Data
• Can’t Change Your Mind About What’s Important Later
• Can’t Scale Out Embedded Processing
• Compression (Lossless) is Good
• Massive Amounts of Data Motivate the Shift in Collection
• Bulk Collection is Efficient
• Bulk Processing/Export Not So Much
Streaming Telemetry is a game changer: Monitoring becomes a big data problem
[Figure: "Where Data Is Created" to "Where Data Is Useful", in real time]
Removing limitations and complexity
• Streaming paradigm
• Dense Sensor Framework
• Increased Data Granularity
• Update on every event
• Multiple Data Sources
Big Data and Machine Learning Problem
• Volume: Scale of Data
• Velocity: Analysis of Streaming Data
• Variety: Different Forms of Data
Pervasive Sensors
Software Sensors (Available Now)
• Linux VM
• Windows Server VM
• Bare Metal (Linux and Windows Server)
• Universal* (Basic Sensor for other OS)
• New! Enforcement Point (Software agents)
*Note: No per-packet Telemetry, Not an enforcement point
Network Sensors (Next Generation 9K switches)
• Nexus 9200-X
• Nexus 9300-EX
Third Party Sources (3rd party Data Sources)
• Asset Tagging
• Load Balancers
• IP Address Management
• CMDB
• …
Common properties
• Low CPU Overhead (SLA enforced)
• Low Network Overhead (SLA enforced)
• Highly Secure (Code Signed, Authenticated)
• Every Flow (No sampling), NO PAYLOAD
Tetration Sensors Locations
• Hardware Sensor (9732C-EX LC, 92160YC-X, 93180YC-EX)
  • Packet and Flow Events
  • Buffer and Switch State
• Software Sensor
  • Processes & Socket
  • Packet and Flow Events
[Figure: switches and three hypervisor hosts, all reporting to the Tetration Cluster]
Hardware Sensor: EX and FX series Nexus 9000
• Embedded Module (Flow Cache)
  • Nexus 92160YC-X
  • Nexus 93180YC-EX & 9732C-EX Line Cards
• Extracts Meta-Data from the forwarding pipeline
  • No latency impact, no performance impact
[Figure: Flow Cache alongside the pipeline blocks PRX, LUA, LUB, LUC]
Hardware Sensor: Direct Export of the Hardware State
• Monitor SW State for processes such as BGP, EthPM, STP (polled, timer driven, on demand, …)
• CPU sources the SW Telemetry Data (everything not in the HW export)
• Configure Required Telemetry (Process State, Flow Cache, Events, SSX)
• Configure Desired Triggers (Events, Flows, …)
• ASIC Directly Transmits HW Telemetry Data (Timer and Event Triggers)
Hardware Sensor: EX and FX series Nexus 9000
• Support in NX-OS Mode
  • Cisco NX-OS Release 7.0(3)I5(2) adds filtering support
  • https://techzone.cisco.com/t5/Tetration-Analytics/Installation-and-configuration-of-Hardware-sensor-on-standalone/ta-p/1010838
• Support in ACI Mode
  • Cisco ACI NX-OS release
  • https://techzone.cisco.com/t5/Tetration-Analytics/Tetration-Deep-Dive-Network-Connectivity-Hardware-Sensors/ta-p/975945
Software Sensor
• Runs in the Host OS, not the Hypervisor
  • Access to accurate state of the application and all connectivity
• Not in the data path
  • Sits in User Space
  • Designed by Kernel Developers
• Secure
  • Code Signed
• SLA Enforcement
  • CPU and BW throttling
[Figure: host stack with Application and Tetration Sensor on top, then libpcap, Network Stack, Driver, NIC]
Software Sensor: Enforcement
• When leveraging the enforcement capability, an additional component is downloaded by the Cluster to the existing sensors
• Monitoring and Enforcement are distinct functions with distinct threads (the enforcement code does not exist in the server until explicitly pushed)
• Agent will implement privilege separation
  • SSL libraries would run in low privilege space
  • /proc parsing in high privilege space
  • Enforcement in high privilege space
[Figure: sensor process split into High Privilege (Collection, Enforcement) and Low Privilege (Monitoring, Cluster Link)]
PKI within the Cluster/Sensor
• Tetration Cluster runs an internal PKI
  • Root CA is per cluster, inserted at Image creation
  • Not accessible outside the cluster
  • Cannot connect to an external PKI
• Certificate based authentication is performed for the Control Channel
  • CN of the certificate is the IP address
  • Certificates are rotated every 60 days
• Sensors are code signed
  • Signature Authority is Cisco’s code signing certificate
  • Code Signature is validated at process start
How Do Sensors Communicate with the Cluster the First Time?
• Sensor registers with the web server (Rails) via SSL
• Web server assigns a UUID
• Sensor downloads its config from the Config Server
• Sensor sends meta data to the Collectors
Components & Communication: Hardware Sensor
• Control Channel: NX-OS Agent to Tetration Cluster, TCP/443
• Agent Communication: Guest Shell, Unix Socket
• Sensor Data: ASIC to Tetration Cluster, UDP/5640
[Figure: Cisco Nexus 9000 connected to the Tetration Cluster]
Components & Communication: Software Sensor
• Agent Communication: Unix Socket
• Control Channel: TCP-SSL 443
• Sensor Data: TCP-SSL 5640
[Figure: Software Sensor/Agent on Linux/Windows/… connected to the Tetration Cluster]
• When used, policies pushed from the cluster are pairwise signed with TS (Replay protected) between Cluster and sensor agent
• If rules are changed on the end host, the Enforcer restates the rules and sends a Notification to the Controller
Universal Agent
• Supporting annoying operating system…
  • AIX
  • zOS
  • …
• Process and connection tracked with a lower granularity
Tetration Host Sensor Has Three Rate Limiting Modes
Top
• Uses no more CPU % than the given limit on any single core
• For example, 3% limit on a 10 core system = 3% out of total 1,000% available
• This is a fairly restrictive mode and would be suggested only when necessary
Adjusted
• Takes the provided limit and multiplies it by the amount of cores available to the system
• For example, 3% limit on a 10 core system = 30% out of total 1,000% limit
• This is the default profile (set to 3%), and it’s recommended to use this profile unless necessary
Disabled
• Use in hosts where telemetry MUST be collected
• No CPU % limit, will take as much as necessary to capture each and every packet
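The arithmetic behind the three modes can be sketched as follows. This is only an illustration of the slide's worked example; the function name `cpu_budget_pct` and the mode strings are assumptions, not the sensor's actual configuration interface.

```python
from typing import Optional


def cpu_budget_pct(mode: str, limit_pct: float, cores: int) -> Optional[float]:
    """Return the sensor's CPU budget in percent-of-one-core units.

    A 10 core host has 1,000% available in these units.
    None means that no limit is applied.
    """
    if mode == "top":
        # The limit is applied as-is, regardless of the core count.
        return limit_pct
    if mode == "adjusted":
        # The limit is multiplied by the number of cores.
        return limit_pct * cores
    if mode == "disabled":
        return None
    raise ValueError(f"unknown rate limiting mode: {mode!r}")


for mode in ("top", "adjusted", "disabled"):
    print(mode, cpu_budget_pct(mode, limit_pct=3, cores=10))
```

With a 3% limit on a 10 core host this gives 3 for Top, 30 for Adjusted, and no limit for Disabled, matching the figures on the slide.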
Tetration Sensor Overhead (e.g. 2263 sensors)
• CPU utilization on Host Sensor based on current deployments averages < 1%
• Flow collection has zero impact on switch hardware sensor CPU
• Network Overhead is ~1% of observed traffic load
Software Sensor Support (Q2CY17)
Full Sensor
• RHEL (64 bit): 5.x, 6.x, 7.x
• CentOS (64 bit): 5.x, 6.x, 7.x
• Oracle Linux (64 bit): 6.x, 7.x
• SUSE: 11.2, 11.3, 11.4, 12.1, 12.2
• Ubuntu: 12.04, 14.04, 14.10
• Windows Server 2008 R1/R2 | Essentials | Standard | Enterprise | DataCenter
• Windows Server 2012 R1/R2 | Essentials | Standard | Enterprise | DataCenter
Universal Sensor
• Mainframes: AIX-ppc 5.3, 6.1, 7.1, 7.2 (trial)
• Solaris (x86_64)
• RHL 4.x, 5.x (32 bit -386/amd)
• CentOS: 4.x, 5.x (32 bit)
• Windows XP, 2003 (32 bit)
• Windows Server 2008 (32 bit)
Tetration Analytics Platform
Architecture - Sensor Data
Telemetry Means Different Things to Different People: Device State
• Device State Telemetry
  • What is happening in the Switch/Router
  • What is happening between Devices
  • What is happening in the Network
Know the Network and infer the health of the application based on the state of the devices
[Figure: Device and Network view, a chain of devices each with Network, Data Link and Physical layers]
Telemetry Means Different Things to Different People: Application State
Know the application and infer the health of the infrastructure based on the state of application connectivity
[Figure: two hosts, each with Application, Process and Sockets layers]
• Application State Telemetry
  • What is happening in the Operating System
  • What is happening in the Process (JVM)
  • What is happening in the Server I/O path
Tetration's View of Telemetry: Application Processes, Sockets and Context
[Figure: Consumer Process (Chrome, socket > 1023) connecting to Provider/Service Process (NGINX, socket = 443)]
• Application developers implement business logic as code that runs as processes and threads
• TCP/IP, which forms a foundation of the Internet, was designed to allow these application processes to interact via sockets
• Application logic can be viewed on one level as the interaction between a group of processes and their associated sockets
• Understanding the inter-process communication and mapping that directly to the infrastructure provides a direct correlation between the application and the infrastructure
Tetration's View of Telemetry: Application Processes, Sockets and Context
[Figure: Consumer Process (Chrome, socket > 1023) connecting to Provider/Service Process (NGINX, socket = 80)]
Consumer process:
    import socket
    # create an INET, STREAMing socket
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # now connect to the web server on port 80 - the normal http port
    s.connect(("www.mcmillan-inc.com", 80))
Provider/Service process:
    import socket
    # create an INET, STREAMing socket
    serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # bind the socket to a public host, and a well-known port
    serversocket.bind((socket.gethostname(), 80))
    # become a server socket
    serversocket.listen(5)
What do we mean by Application Visibility: Internet Stack
[Figure: two end hosts, each with Application, Process, Sockets, Transport, Network, Data Link and Physical layers, connected through network devices with Network, Data Link and Physical layers]
What Does Tetration Sensor Collect: Socket Connectivity, the data flows
[Figure: the same stack diagram]
What does the Sensor Collect: Context
• Device Information: Buffer/ACL Drops, etc.
• Process Information: Which process is it, who started it, etc.
[Figure: the same stack diagram]
Sensor Data: Process Information
• Host Sensor collects information about the consumer and provider processes
  • /proc
  • Runtime system information (e.g. system memory, devices mounted, hardware configuration, etc.)
Additional Context: External Data Sources
• External Data Repositories: CMDB, DNS, whois, etc. (Talos, …, future)
• Annotation and Operations Data
[Figure: the host and network stack diagram, with APIC, feeding the Tetration Pervasive Sensors and the Analytics Engine]
What does the Sensor Collect: Socket Level Flow Information + Context Information
• Understanding of what happens TO ‘and’ INSIDE a flow
• Distributions (packet sizes, TCP windows…)
• Burstiness
• Anomaly detection
• Latency (application and network)
• Events
• VXLAN information
[Figure: Per Packet Variations (packet lengths from 66 to 9000) and Accumulated Flow Information (Volume…)]
Full vs. Sampled: What happens when you sample?
[Figure: Full Packet Stream carrying Flow A, Flow B, Flow C and Flow D, with TCP packets marked SYN, SYNACK, ACK, FIN]
Full vs. Sampled: Reasons and Use Cases for Both
Sampled
• Sampling has its use cases, in SP environments for example
  • High Volume, no behavioral analysis
• Sampling provides a good statistical model
  • For Trends
  • For Traffic Visibility
  • For Volume Indication
• Entropy is very important
Full
• Depending on the number of flows and type of flows
  • Mice flows can go completely unseen
  • Connection Oriented flows may not be tracked properly (missed flags)
• Accuracy of the flow increases with the packet count
• Type of sampling and quality of entropy
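To make the point about unseen mice flows and missed flags concrete, here is a small sketch. The traffic is entirely hypothetical (one large "elephant" flow interleaved with four three-packet TCP flows), and deterministic 1-in-N sampling is just one possible sampling scheme chosen for illustration.

```python
def build_stream():
    """One large flow interleaved with four short TCP flows (hypothetical traffic)."""
    stream = []
    for i in range(4):
        stream.extend([("elephant", "DATA")] * 30)
        stream.extend((f"mouse-{i}", flag) for flag in ("SYN", "ACK", "FIN"))
    return stream


def sample_one_in_n(packets, n):
    """Deterministic 1-in-n packet sampling: keep every n-th packet."""
    return packets[::n]


def flows_seen(packets):
    """Set of flow names that appear in the packet list."""
    return {flow for flow, _flag in packets}


def flags_seen(packets, flow):
    """Set of TCP flags observed for one flow."""
    return {flag for name, flag in packets if name == flow}


stream = build_stream()
sampled = sample_one_in_n(stream, 50)
print("flows in full stream:   ", sorted(flows_seen(stream)))
print("flows in sampled stream:", sorted(flows_seen(sampled)))
print("mouse-0 flags, full:    ", sorted(flags_seen(stream, "mouse-0")))
print("mouse-0 flags, sampled: ", sorted(flags_seen(sampled, "mouse-0")))
```

In this toy stream the sampled view still shows the large flow, so volume trends survive, but none of the short flows or their SYN/FIN flags are observed.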
Tetration Examines every packet
Full Packet Stream
• Variability ’within’ the flow
• Variability ‘between’ the flows
• Changes ‘within’ the flow
Collects the Meta-Data not the Packet
Meta-Data, Including Overlay VXLAN/GRE/IPinIP Encapsulated Header
[Figure: three packet layouts]
• Ethernet Header | IP Header | UDP Header | VXLAN Header | Ethernet Header | IP Header | TCP Header | Payload
• Ethernet Header | IP Header | TCP Header | Payload
• Ethernet Header | IP Header | UDP Header | Payload
Privacy Risk
Sensor Data: Flow Data, Forwarding
• COS
• Overlay Type (Native, 802.1q / 802.1p, VXLAN, iVXLAN, NVGRE, NSH, other)
• Source TEP or Port ID
• Destination TEP
• Disposition (RPF or Port Security failure, Policy drop, redirect or span)
• Port type (spine to leaf or leaf to host)
Sensor Data: Accumulated Flow Information
• Bytes, Packet Count
• IP options present
• IP length error
• DF bit set
• Fragment seen
• Last TTL
• Accumulated TCP flags
• Last ACK / SEQ
• Sampled Packet length
• Sampled Packet ID
Sensor Data: Histogram Bins
• Flow Cache has the notion of “bins” to build histograms
  • TCP options length (8 bits)
  • Payload length (12 bits)
  • Receive window (6 bits)
• This means more visibility on the activity of flow
• Bin sizes are configurable
• Bins don’t need to be of equal size (but need to be contiguous)
• Last bin will capture the configured size and above
[Figure: Histogram of the flow. Two export periods of eight bins each; bins #1 to #4 labelled 82 bits, 82 bits, 0 bits, 165 bits and bins #5 to #8 labelled 82 bits, 82 bits, 130 bits, 165 bits, each row followed by an Export]
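The binning rules above (configurable, contiguous, not necessarily equal-width bins, with an open-ended last bin) can be sketched in a few lines. The bin edges and sample payload lengths below are hypothetical and are not the hardware's actual configuration.

```python
from bisect import bisect_right


def bin_index(value, lower_edges):
    """Index of the bin holding `value`.

    `lower_edges` is the sorted list of lower bounds of contiguous bins;
    the last bin is open-ended and captures its lower bound and above.
    """
    return bisect_right(lower_edges, value) - 1


def histogram(values, lower_edges):
    """Count how many values fall into each bin."""
    counts = [0] * len(lower_edges)
    for value in values:
        index = bin_index(value, lower_edges)
        if index < 0:
            raise ValueError(f"{value} is below the first bin")
        counts[index] += 1
    return counts


# Eight unequal, contiguous payload-length bins (hypothetical edges, in bytes).
edges = [0, 64, 128, 256, 512, 1024, 1460, 4096]
payload_lengths = [0, 66, 66, 100, 1460, 9000]
print(histogram(payload_lengths, edges))
```

Keeping only the per-bin counts is what lets a fixed-size flow record describe the distribution of a flow's packets without storing every packet.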
Sensor Data: Burst
• Measure the “burstiness” of a flow
  • Current Burst
  • Max Burst
  • Burst Index
  • Flowlets
• Bursts are measured in 32k interval
• Each export period is divided by 128
• Flowlets are activity after a silence period (configurable)
[Figure: one export period on a timeline marked 0, 1, 2, 3, 30, 80, 128, showing Flowlet #1, a Silence, then Flowlet #2, with these snapshots]
  Current Burst   128   256    32   1024   0
  Max Burst       128   256   256   1024   1024
  Burst Index       0     3     3     80   80
Max Burst occurred at 62.5ms with a value of 1024 and 2 flowlets
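The slide's summary line can be reproduced with a short sketch. Several values here are assumptions made for illustration: the interval positions are read off the slide's timeline, the silence threshold of 40 intervals is invented, and the 100 ms export period is inferred from the fact that interval 80 of 128 corresponds to 62.5 ms.

```python
def burst_stats(interval_bytes, silence_intervals):
    """Return (max_burst, burst_index, flowlets) for one export period.

    interval_bytes: bytes seen in each sub-interval of the export period.
    silence_intervals: number of idle sub-intervals that ends a flowlet.
    """
    max_burst = 0
    burst_index = 0
    flowlets = 0
    idle = silence_intervals  # treat the start as silent so the first activity opens a flowlet
    for index, current in enumerate(interval_bytes):
        if current > max_burst:
            max_burst, burst_index = current, index
        if current > 0:
            if idle >= silence_intervals:
                flowlets += 1
            idle = 0
        else:
            idle += 1
    return max_burst, burst_index, flowlets


# Activity placed at the positions shown on the slide's timeline.
slots = [0] * 128
slots[0], slots[3], slots[30], slots[80] = 128, 256, 32, 1024

max_burst, burst_index, flowlets = burst_stats(slots, silence_intervals=40)
export_period_ms = 100  # assumption, see the note above
offset_ms = burst_index / len(slots) * export_period_ms
print(max_burst, burst_index, flowlets, offset_ms)
```

Under these assumptions the sketch reports a max burst of 1024 at index 80 (62.5 ms into the period) and 2 flowlets, the same summary as the slide.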
Sensor Data: Anomaly List
• TTL changed
• TCP flags are zero’d
• IP reserved flags are not 0
• TCP SYN with data
• DF bit has changed
• TCP FIN with no ACK
• TCP RST with no ACK
• Ping of death
• TCP SYN, FIN, RST and ACK zero’d
• Fragment is too small to contain L4 header (TCP, UDP and SCTP)
• URG set but no URG pointer
• TCP SYN and FIN are set
• URG pointer with no URG flag
• TCP SYN and RST are set
• TCP seq outside the expected range
• TCP FIN, PSH and URG are set
• TCP seq is less than expected (rexmit)
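Several of the listed checks depend only on the TCP flags byte of a single packet. The sketch below shows how such checks could be expressed; it is not the sensor's implementation, and the function name and its parameters are hypothetical. The flag bit values are the standard TCP header flag positions.

```python
from typing import List

# Standard TCP header flag bits.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def tcp_flag_anomalies(flags: int, payload_len: int = 0, urg_pointer: int = 0) -> List[str]:
    """Return the flag-based anomalies from the list that this packet triggers."""

    def has(mask: int) -> bool:
        return (flags & mask) == mask

    found = []
    if (flags & 0x3F) == 0:
        found.append("TCP flags are zero'd")
    if (flags & (SYN | FIN | RST | ACK)) == 0:
        found.append("TCP SYN, FIN, RST and ACK zero'd")
    if has(SYN) and payload_len > 0:
        found.append("TCP SYN with data")
    if has(FIN) and not has(ACK):
        found.append("TCP FIN with no ACK")
    if has(RST) and not has(ACK):
        found.append("TCP RST with no ACK")
    if has(SYN | FIN):
        found.append("TCP SYN and FIN are set")
    if has(SYN | RST):
        found.append("TCP SYN and RST are set")
    if has(FIN | PSH | URG):
        found.append("TCP FIN, PSH and URG are set")
    if has(URG) and urg_pointer == 0:
        found.append("URG set but no URG pointer")
    if urg_pointer != 0 and not has(URG):
        found.append("URG pointer with no URG flag")
    return found


print(tcp_flag_anomalies(ACK))
print(tcp_flag_anomalies(SYN | FIN))
print(tcp_flag_anomalies(FIN | PSH | URG, urg_pointer=1))
```

The remaining items in the list (TTL or DF changes, sequence number checks) need per-flow state across packets and are outside this single-packet sketch.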
Sensor Data: TCP Timing Data
• Application Latency: How long did it take for the inbound data to be ACK’d
• SRTT Latency (Process to Process ACK at the TCP level)
• Port to Port Latency (Requires HW support)
[Figure: the host and network stack diagram, with each latency measurement drawn across it]
Sensor Timing Data: Network Performance Monitoring Example
• The host calculates round trip time as 8 milliseconds
• The port-to-port network latency is 252 microseconds
• The app took 15 seconds to return the ACK
Pervasive Visibility: Flow Search and Forensics
Different Problems will need Different Data Sources
• Application (Application, Process, Sockets, Transport): Health, Performance, Monitoring, Security, Discovery, Troubleshooting
• Network (Network, Data Link, Physical): Health, Performance, Monitoring, Capacity
Tetration Analytics Platform
Architecture - Cluster
Tetration Analytics Architecture Overview
• Data Collection (Telemetry)
  • Host Sensors (VM)
  • Network Sensors (Cisco Nexus® 92160YC-X, Cisco Nexus 93180YC-EX)
  • 3rd-Party Metadata Sources (Configuration Data)
• Analytics Engine
  • Cisco Tetration Analytics™ Platform
• Visualization and Reporting
  • Web GUI
  • REST API
  • Push Events
The Analytics Cluster: Components
• Hadoop Based Platform
  • Self managed
  • One touch deployment
• Tiered System
  • Heavy Compute for Machine Learning
  • Caching for light speed queries
• Extensibility (future)
  • Messaging Bus
  • API Access
[Figure: tiers labelled Front End, Compute (Data Cleaning and Analytics), Caching (Search), Long Term Storage (Data Lake)]
The Analytics Cluster: Appliance
• The Analytics Cluster operates as an appliance
  • Avoids the need for in-house Big Data and Analytics expertise
  • Supported by Cisco TAC
• Self Monitoring
  • The cluster leverages a sensor architecture to track its state and provides event based notifications for
• Software upgrades and full install are all automated
Cluster Monitoring and Maintenance
Collector Monitoring and Maintenance
Sensor Monitoring and Maintenance: Sensor Throttled
Hardware Sensor Monitoring
Tetration 1.0 Analytics Cluster Configurations
• 4 x 3-Phase PDU: 22.5 KW Peak Power
• 4 x 1-Phase PDU: 11.5 KW Peak Power
Tetration Analytics 2.0: Deployment Options
On-Premise Options
Cisco Tetration Analytics (Large Form Factor)
• Suitable for deployments more than 1000 workloads
• Built in redundancy
• Scales up to 10,000 workloads
Includes:
• 36 x UCS C-220 servers
• 3 x Nexus 9300 switches
Cisco Tetration-M (Small Form Factor)
• Suitable for deployments under 1000 workloads
Includes:
• 6 x UCS C-220 servers
• 2 x Nexus 9300 switches
Public Cloud
Cisco Tetration Cloud
• Software deployed in AWS
• Suitable for deployments under 1000 workloads
• AWS instance owned by customer
Analytics Engine: The Platform
• Hadoop Based Platform
  • Self managed
  • One touch deployment
• Tiered System
  • Heavy Compute for Machine Learning
  • Caching for light speed queries
• Extensibility (future)
  • Messaging Bus
  • API Access
[Figure: tiers labelled Front End, Compute (Data Cleaning and Analytics), Caching (Search), Long Term Storage (Data Lake)]
Front End: GUI, RESTful API
• Servers hosting front end processes
  • GUI and Operational Interfaces
  • RESTful API
Data Processing Pipeline
• Data Ingest and Processing
• Multiple Pipelines for different processing activities
• Scaled to Millions of events per second
Caching Layer: Natural Language Search
Caching Layer: Search
• Caching Layer provides a large in-memory and flash based data store for real time searches, e.g. 16 weeks of policy delta data accessible for real time search
Data Lake: HDFS Storage
• Long Term Storage for collected observations, for pipeline processing tasks, etc.
• Usage is based on
  • Time Based Retention
  • Space Based Retention
  • Greedy Retention
• Max possible Retention period will depend on cluster size and observation rate
14.10 K hours of available capacity at the current collection rates (587 days)
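The relationship between cluster size, observation rate and retention is simple division, sketched below. The helper names and the capacity and ingest figures are hypothetical; only the 14.10 K hours figure comes from the slide.

```python
def retention_hours(free_capacity: float, ingest_per_hour: float) -> float:
    """Hours of retention left at a constant collection rate (same units for both)."""
    return free_capacity / ingest_per_hour


def hours_to_days(hours: float) -> float:
    return hours / 24


# Hypothetical figures: 28,200 units of free space filling at 2 units per hour.
hours = retention_hours(28_200, 2)
print(hours, hours_to_days(hours))
```

14.10 K hours divided by 24 is 587.5, consistent with the 587 days shown on the slide; halving the cluster's free space or doubling the observation rate would halve that figure.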
Standard Data Analytics Pipeline: Tetration Data Analysis
Various Pipelines (e.g. ADM) process the data to derive appropriate insights
Pipeline stages: Data Aggregation > Data Prep & Cleansing > Automated Data Discovery & Evaluation > Statistical Analysis & Prediction > Reporting, Visualization Tools or Alerts
• Data Aggregation: Sensor Collectors
• Data Prep & Cleansing: De-duplication, unification of uni-directional flows into bi-directional, annotate flows with context information, etc.
• Reporting, Visualization Tools or Alerts: GUI, REST API, Kafka, Policy Export, …
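The de-duplication and unification step can be illustrated with a small sketch. The record layout, field names and the "keep the first report per direction" rule are assumptions made for illustration, not the platform's actual data model.

```python
def unify_flows(records):
    """Merge uni-directional flow reports into bi-directional flows.

    Returns {flow_key: {(src_endpoint, dst_endpoint): bytes}} where the
    flow_key is the same for both directions of a conversation.
    """
    flows = {}
    for rec in records:
        src = (rec["src"], rec["sport"])
        dst = (rec["dst"], rec["dport"])
        # Sorting the endpoints makes the key independent of direction.
        key = (rec["proto"],) + tuple(sorted((src, dst)))
        directions = flows.setdefault(key, {})
        # De-duplicate: keep one report per direction even if several sensors saw it.
        directions.setdefault((src, dst), rec["bytes"])
    return flows


records = [
    {"sensor": "host-a", "proto": "tcp", "src": "10.0.0.1", "sport": 40000,
     "dst": "10.0.0.2", "dport": 443, "bytes": 1200},
    {"sensor": "host-b", "proto": "tcp", "src": "10.0.0.1", "sport": 40000,
     "dst": "10.0.0.2", "dport": 443, "bytes": 1200},
    {"sensor": "host-b", "proto": "tcp", "src": "10.0.0.2", "sport": 443,
     "dst": "10.0.0.1", "dport": 40000, "bytes": 5300},
]
flows = unify_flows(records)
for key, directions in flows.items():
    print(key, directions)
```

Three reports from two sensors collapse into a single bi-directional flow with one entry per direction.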
Data Collection: Sensor to Collector
Data Prep and Annotation
• De-duplication, unification of uni-directional flows into bi-directional, annotate flows with context information, etc.
[Figure: the host and network stack diagram reporting to two Collectors]
Annotation
• Think Gmail Labels
• User Defined information
  • User Uploaded
  • Keyed by VRF, IP
  • JSON Open Fields
• Derived Information
  • IP
  • VRF
  • …
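A sketch of what label-style annotation keyed by (VRF, IP) might look like follows. The tag names, the flow record fields and the `annotate` helper are hypothetical and only illustrate the lookup.

```python
# User-uploaded annotations, keyed by (VRF, IP), with open-ended fields.
annotations = {
    ("prod", "10.0.0.5"): {"app": "web", "owner": "team-a"},
    ("prod", "10.0.0.9"): {"app": "db", "owner": "team-b"},
}


def annotate(flow, annotations):
    """Return a copy of the flow with the tags of both endpoints attached."""
    enriched = dict(flow)
    for side in ("src", "dst"):
        enriched[f"{side}_tags"] = annotations.get((flow["vrf"], flow[side]), {})
    return enriched


flow = {"vrf": "prod", "src": "10.0.0.5", "dst": "10.0.0.9", "dport": 3306}
print(annotate(flow, annotations))
```

Because the tags are attached to the flow rather than baked into it, the same flow can later be searched by label (for example `app`) instead of by IP address.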
Annotation & Inventory
• Discovered Inventory
• Uploaded Inventory and Meta Data (32 Arbitrary Tags)
• Inventory Tracked in Real Time, along with historical trends
Analyzing the Data
• Endpoints are iteratively compared with each other to find which “profiles” are most similar
  • Sensor Data: Ports provided and consumed, Addresses sent and received from, Properties of network flows, Running processes, Process originating flow, Hostname
  • External Context: Load balancers / DNS / route tags
  • Human approved clusters from current or other workspaces and base cluster definition
• This is an example of where we use machine learning
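One simple way to compare endpoint "profiles" is set similarity. The sketch below uses the Jaccard index purely as an illustration; the slides do not say which similarity measure Tetration uses, and the profile contents are invented.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def most_similar(name, profiles):
    """Name of the other endpoint whose profile is closest to `name`."""
    target = profiles[name]
    others = (other for other in profiles if other != name)
    return max(others, key=lambda other: jaccard(target, profiles[other]))


# Hypothetical profiles built from ports provided/consumed and running processes.
profiles = {
    "web-1": {"provides:443", "consumes:3306", "proc:nginx"},
    "web-2": {"provides:443", "consumes:3306", "consumes:53", "proc:nginx"},
    "db-1": {"provides:3306", "proc:mysqld"},
}
print(jaccard(profiles["web-1"], profiles["web-2"]))
print(most_similar("web-1", profiles))
```

Here the two web servers share three of four features, so they would be candidates for the same cluster, while the database server shares none with either.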
Machine Learning
Cognitive Computing - Finding and remembering all the relationships between data, querying the matrix of relationships (Watson)
Machine Learning - Remember what has happened before, then look at new incoming data in that context to find patterns; build up a body of knowledge and use it to make decisions about the new data. Can machines remember, and apply what they remember to new data?
Deep Learning - Not trying to maintain data and relationships over time, but analyzing the data through better representations and creating models that learn these representations from large-scale unlabeled data. Succession analysis
Machine Learning
“Field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959)
The programmer's construction of algorithms that can learn from and make predictions on data (as opposed to static programming instructions).
7:00 am = 65 degrees
8:00 am = 75 degrees
9:00 am = 85 degrees
How warm will it be at 8:30 am tomorrow? 80 degrees
Supervised learning: Linear regression, Logistic regression, SVMs
Unsupervised learning: K-means, PCA, Anomaly detection
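The temperature question is a small supervised-learning problem: learn a line from labelled readings, then predict an unseen time. A minimal sketch using ordinary least squares on the three readings as listed (hours written as decimals; the function name is illustrative):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

hours = [7.0, 8.0, 9.0]      # 7:00, 8:00, 9:00 am
temps = [65.0, 75.0, 85.0]   # degrees
slope, intercept = fit_line(hours, temps)
prediction = slope * 8.5 + intercept  # 8:30 am
print(prediction)  # 80.0
```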
ADM Clustering: Machine Learning Example
K-means Algorithm: Finding the Clusters
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \dots, \mu_K$
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}
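The loop above can be sketched in a few lines of plain Python. This is a generic K-means for illustration, not the ADM implementation: the fixed iteration count, the seeded initialization from data points and the empty-cluster handling are assumptions of this sketch.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Randomly initialize the centroids by picking k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centers = kmeans(points, k=2)
```

With two well-separated groups like these, the first three points end up in one cluster and the last three in the other, whichever points are drawn as initial centroids.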
Silhouetting: Validation of the Cluster
• The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation)
• A high silhouette value gives more confidence that the clustering is representative of the data
https://en.wikipedia.org/wiki/Silhouette_(clustering)
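For reference, the silhouette value of a point is s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. A small sketch of that definition follows; it assumes Euclidean distance and at least two clusters, and it scores singleton clusters as 0 by convention.

```python
import math

def silhouette_values(points, labels):
    """Return the silhouette value s = (b - a) / max(a, b) for each point."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster (cohesion).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of another cluster (separation).
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        values.append((b - a) / max(a, b))
    return values

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
good = silhouette_values(points, [0, 0, 0, 1, 1, 1])
bad = silhouette_values(points, [0, 1, 0, 1, 0, 1])
```

Values close to 1 indicate tight, well-separated clusters; a deliberately scrambled labelling such as `bad` scores much lower on average.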
Results of the Clustering: Machine Learning
Tuning Cluster Granularity: Tuning the Algorithms
Analyzing the Data: Fitting the Curve
• Every data set (e.g. flow) is examined to find the best function that describes its behaviour
• Comparison within and between ‘flows’ can be used to find ‘outlier’ or anomaly conditions
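To make the idea of comparison within a flow concrete, here is a deliberately simple stand-in: the "function" describing the flow is just its typical level (the median), and samples far from it are flagged using a modified z-score based on the median absolute deviation. The technique, the 3.5 threshold and the sample values are assumptions for illustration; the slides do not say which models Tetration fits.

```python
import statistics

def find_outliers(values, threshold=3.5):
    """Return the indices whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a robust measure of the typical spread.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread at all, nothing can be ranked as unusual
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]

# Hypothetical bytes-per-interval samples for one flow.
samples = [100, 102, 98, 101, 99, 100, 950]
outliers = find_outliers(samples)
print(outliers)  # [6]
```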
Outliers: What does not look like it ‘fits’
• Outlier dimension is highlighted with purple circle
• Switch on Outlier view to highlight uncommon flows
Tetration Analytics Architecture Overview
• Data Collection
  • Software Sensor and Enforcement
  • Embedded Network Sensors (Telemetry Only)
  • Third Party Sources (Configuration Data)
• Analytics Engine
  • Cisco Tetration Analytics Cluster
• Open Access
  • Web GUI
  • REST API
  • Event Notification
  • Tetration Apps
• Self Managed Cluster
• No Hadoop / Data Science Background Needed
• Easy Integration via Open interfaces
• One Touch Deployment
• No External Storage Needed
• Open Data Lake (via Tetration Apps)
Accessing the Data and Analytical Results: API, Workspace Applications and Messaging BUS
[Figure: Northbound application and Northbound consumers reach the platform through the Programmatic Interface, Message Publish (Kafka Broker) and Tetration Apps]
Tetration API
• Shipped as a limited ‘trial feature’ with the 103.8 release
• Supported with the 2.0 release (FCS April 2017)
• Is a RESTful API that uses HMAC time-bound authentication tokens generated from a private and public key pair
• SDKs available in Python (2.7+) and JavaScript (ES6+)
• The API supports managing sensors and switches, plus flow searching
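To show what a time-bound HMAC token means in general, the sketch below signs a request together with a timestamp and rejects it once the timestamp is too old. The canonical string, the 300-second window and the function names are assumptions for illustration only; this is not Tetration's actual signing scheme, which the SDKs handle for you.

```python
import base64
import hashlib
import hmac

def sign_request(secret, method, path, body, timestamp):
    """Return a base64 HMAC-SHA256 signature over a canonical request string."""
    body_hash = hashlib.sha256(body.encode()).hexdigest()
    # Hypothetical canonical form: method, path, body hash and timestamp.
    message = "\n".join([method.upper(), path, body_hash, str(timestamp)])
    digest = hmac.new(secret.encode(), message.encode(), hashlib.sha256).digest()
    return base64.b64encode(digest).decode()

def verify_request(secret, method, path, body, timestamp, signature, now, max_age=300):
    """Accept the request only if it is fresh and the signature matches."""
    if abs(now - timestamp) > max_age:
        return False  # the token is outside its time window
    expected = sign_request(secret, method, path, body, timestamp)
    return hmac.compare_digest(expected, signature)

secret = "example-api-secret"  # placeholder, not a real credential
token = sign_request(secret, "POST", "/flowsearch", '{"limit": 1}', 1485961200)
```

Because the timestamp is part of the signed message, a captured token cannot be replayed after the window closes, and any change to the body invalidates the signature.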
Example: Flow Search Filters
• Method: POST
• Endpoint: /flowsearch
• Description: The entire Tetration Analytics flow database can be queried, boasting sub-second response times.
• Search the flow database for the first record that matches these parameters:
  • Between 02/01/2017 3PM and 02/01/2017 4PM
  • Default tenant and VRF
  • Destination port 80 (HTTP)
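A sketch of how the three parameters could be expressed as a JSON request body. The field names (`t0`, `t1`, `scopeName`, `limit`, `filter`, `dst_port`) are assumptions that should be checked against the API reference for your release, and the date is read here as 1 February 2017 in UTC, which the slide does not specify. Only the payload is built; no request is sent.

```python
import json
from datetime import datetime, timezone

def build_flow_search(start, end, dst_port, scope="Default", limit=1):
    """Build a flow-search body; all field names are assumed, not confirmed."""
    return {
        "t0": start.isoformat(),   # start of the time window
        "t1": end.isoformat(),     # end of the time window
        "scopeName": scope,        # default tenant and VRF
        "limit": limit,            # only the first matching record
        "filter": {
            "type": "and",
            "filters": [
                {"type": "eq", "field": "dst_port", "value": dst_port},
            ],
        },
    }

query = build_flow_search(
    datetime(2017, 2, 1, 15, tzinfo=timezone.utc),
    datetime(2017, 2, 1, 16, tzinfo=timezone.utc),
    dst_port=80,
)
body = json.dumps(query)  # this string would be the body of POST /flowsearch
```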
Tetration an Open Platform: User Apps
• Tetration Engineering, Partners and Customers can write apps on Tetration
• Users can write their own business logic and extend Tetration
• Programming languages supported
  • Scala
  • Python
  • SQL
  • R (coming)
• Types of jobs
  • One-time applications and reports, scheduled now or at any time in the future
  • Recurring applications and reports
• Trigger alerts on Tetration or user-defined events
User Apps: Tetration an Open Platform
• Data Source
  • Tetration Data with Multi Tenancy (Row and Column filtering, Tetration Read Libs)
  • User Uploaded Data: any schema or format
  • Application generated Data
  • Data Retention and quota monitoring
  • Arbitrary Stream ingestion (coming soon)
• Alert and Event Posting
  • Kafka Message bus integrated
• Security model
  • Tenant Isolation
  • Tetration jobs isolated from user
  • Active firewalls. All user jobs are launched inside a private container, ad hoc VM. Jailed from Tetration mainstream
  • Kafka Kerberos Auth
Tetration Apps
• Explore data from your browser, develop your models
• Based on Jupyter Notebooks
• Provides an easy way to develop apps
Granular RBAC: Application Access Controls
[Figure: each User is assigned Roles; each Role grants Permitted Actions on the Workloads within a Scope]
Actions: R, Modify, ADM, Enforce, etc.
Hands on DEVNET Lab
For hands-on exposure to the API, please feel free to visit DEVNET
DEVNET-2423
Policy Discovery, Compliance and Enforcement
Application Dependency and Cluster Grouping
[Figure: Cisco Tetration Analytics™ Platform (unsupervised machine learning, behavior analysis) receiving telemetry from bare-metal (BM) and VM workloads:
• Cisco Nexus® 9000 Series: bare-metal, VM, & switch telemetry; network-only sensors, host-only sensors, or both (preferred)
• Brownfield: bare-metal & VM telemetry
• On-premises and cloud workloads (AWS): bare metal and VM telemetry (AMI …)]
Application Conversation View
• Application clusters conversation views
• Conversation details including process bindings
Whitelist Policy Recommendation: Application Discovery
Whitelist Policy Recommendation (Available in JSON, XML, and YAML)
{
  "src_name": "App",
  "dst_name": "Web",
  "whitelist": [
    {"port": [0, 0], "proto": 1, "action": "ALLOW"},
    {"port": [80, 80], "proto": 6, "action": "ALLOW"},
    {"port": [443, 443], "proto": 6, "action": "ALLOW"}
  ]
}
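A short sketch of how an exported policy like this could be checked against observed traffic. It reads `port` as an inclusive range and `proto` as the IP protocol number (1 = ICMP, 6 = TCP), and it assumes that anything not listed is denied, which is the usual meaning of a whitelist; the helper name is illustrative.

```python
import json

POLICY = json.loads("""
{
  "src_name": "App",
  "dst_name": "Web",
  "whitelist": [
    {"port": [0, 0], "proto": 1, "action": "ALLOW"},
    {"port": [80, 80], "proto": 6, "action": "ALLOW"},
    {"port": [443, 443], "proto": 6, "action": "ALLOW"}
  ]
}
""")

def is_permitted(policy, proto, port):
    """Return True if the (protocol, port) pair matches an ALLOW entry."""
    for rule in policy["whitelist"]:
        low, high = rule["port"]  # inclusive port range
        if rule["proto"] == proto and low <= port <= high:
            return rule["action"] == "ALLOW"
    return False  # assumed whitelist semantics: unlisted traffic is denied

https_ok = is_permitted(POLICY, proto=6, port=443)  # TCP/443 from App to Web
ssh_ok = is_permitted(POLICY, proto=6, port=22)     # TCP/22 is not listed
```

Running every observed App-to-Web flow through such a check is one way to separate permitted traffic from traffic that was seen on the network but is out of policy.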
Policy Compliance: Verification & Simulation
What was seen on the network that was out of Policy
Permitted Traffic Seen on the network
Policy Enforcement: The Goal Is to Describe Intent
I want to…
• Block non-production apps talking to production apps
• Allow HR apps to use the employee database
• Block all HTTP connections that are not destined to web servers
• Allow and notify me when a new app requests DNS server access
• Block and notify me when a new app requests AD server access
Security
Intent is rendered as security rules in native host firewalls. Same level of security, any infrastructure.
[Figure: each End Point carries its own Allows and Denies per Process and Application, whether it runs on bare metal (network infrastructure), virtual (hypervisor, virtual network) or cloud infrastructure]
• Any Infrastructure
• Any Networking
• Same Security Model
• Rich Context
Mobility
Intent stays with the endpoint, no matter the infrastructure it resides on
Tetration calculates all necessary rule changes and automatically applies them.
[Figure: an end point (EP) moving across infrastructure (7K, 5K, 2K, Cloud); labels: VLANs, Interfaces, Subnets, Security Groups, ACLs, Security Rules]
How Does It Work?
Tetration automatically converts your intent into blacklist and whitelist rules
Intent: Block non-production apps talking to production apps
Rules:  DENY SOURCE 10.0.0.0/8 DEST 128.0.0.0/8
Intent: Allow HR apps to use the employee database
Rules:  ALLOW SOURCE 128.0.10.0/16 DEST 128.0.11.0/16
Intent: Block all HTTP connections that are not destined to web servers
Rules:  ALLOW SOURCE * DEST 128.0.100.0/16 PORT = 80
        DENY SOURCE * DEST * PORT = 80
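The HTTP intent needs two rules, and their order matters: the specific ALLOW must be evaluated before the catch-all DENY. The sketch below evaluates the four rules from the slide in listed order, first match wins. First-match ordering and the `None` result for unmatched traffic are assumptions of this example, and `strict=False` is used because the slide's /16 prefixes have host bits set.

```python
import ipaddress

# (action, source prefix, destination prefix, port or None for any port)
RULES = [
    ("DENY", "10.0.0.0/8", "128.0.0.0/8", None),
    ("ALLOW", "128.0.10.0/16", "128.0.11.0/16", None),
    ("ALLOW", "*", "128.0.100.0/16", 80),
    ("DENY", "*", "*", 80),
]

def in_prefix(addr, prefix):
    """True if addr falls inside prefix; '*' matches any address."""
    if prefix == "*":
        return True
    network = ipaddress.ip_network(prefix, strict=False)
    return ipaddress.ip_address(addr) in network

def decide(src, dst, port):
    """Return the action of the first matching rule, or None if none match."""
    for action, src_prefix, dst_prefix, rule_port in RULES:
        if (in_prefix(src, src_prefix) and in_prefix(dst, dst_prefix)
                and (rule_port is None or rule_port == port)):
            return action
    return None

to_web_server = decide("192.168.1.1", "128.0.100.7", 80)  # "ALLOW"
to_other_host = decide("192.168.1.1", "172.16.0.5", 80)   # "DENY"
```

Swapping the last two rules would deny all HTTP, including traffic to the web servers, which is why the rendered rule order has to follow the intent.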
Enforcement
• Enforcement happens at the endpoint level
  • iptables on Linux
  • Advanced Firewall on Windows
• It can be enabled / disabled at the endpoint level (from Tetration)
  • Monitoring or Monitoring + Enforcement
  • Cannot be reverted without removing the agent
• Enforcement runs as a separate process for compliance reasons
  • Proving the agent does not run
• A complement to, not a replacement for, infrastructure enforcement
Application Centric, Okay but how do I get there?
Enhanced Security Services; Application and Infrastructure Optimization
• Tenant and Application Security Requirements and Enforcement
• Application Dependency Mapping
• Automated Whitelist Policy Generation
• Network Forensics
• Policy Compliance and Auditability
• Policy Simulation and Impact Assessment
Rich Telemetry Data from Hardware (Nexus 9000) and Software Sensors enables discovery and security monitoring
Data Center Vision: Inter-dependent feedback loops
1. Deployment and Provisioning
  • ACI, YANG (Intent Based Automation)
  • Cisco CloudCenter (Common Application Consumption across Hybrid IT)
  • Assurance (Formal Methodologies)
2. Operations and Management
  • Tetration Analytics (Machine Learning Based Forensics and Security)
[Figure: feedback loops between the two; labels: Infrastructure, Automation, Security, Guarantees, Deployment, Compliance, Consistency, ADM, Security, Operations]
Summary
• Pervasive flow telemetry that supports infrastructure for multiple data centers at scale
• Ready-to-use solution to address critical data center operational use cases
• Self-monitoring and eliminate the need for in-house big data expertise
• Open platform and northbound APIs enable transparent integration
• Accelerated adoption and comprehensive Solution support with Services
Complete Your Online Session Evaluation
• Please complete your Online Session Evaluations after each session
• Complete 4 Session Evaluations & the Overall Conference Evaluation (available from Thursday) to receive your Cisco Live T-shirt
• All surveys can be completed via the Cisco Live Mobile App or the Communication Stations
Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLive.com/Online
Continue Your Education
• Demos in the Cisco campus
• Walk-in Self-Paced Labs
• Lunch & Learn
• Meet the Engineer 1:1 meetings
• Related sessions
Q & A
Thank You