Tetration Analytics - Network Analytics & Machine Learning Enhancing Data Center Security and Operations

Mike Herbert, Principal Engineer, INSBU

BRKDCN-2040

Session Abstract

Huge amounts of data traverse network infrastructure on a daily basis. With innovative big data analytics capabilities, it is possible to use rich network metrics to provide unprecedented insight into IT infrastructure. By leveraging pervasive low-overhead sensors in both hardware and software, a complete view of application and network behavior can be attained in real time. In the modern data center, some of the key operational and security challenges are understanding application dependencies accurately, generating a consistent whitelist policy model, and ensuring network policy compliance. This session will describe how Tetration Analytics collects hundreds of data points and uses an unsupervised machine learning approach and advanced analytics to address these challenges in a scalable fashion.

BRKDCN-2040 © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Public

If this is not what you were hoping for, here are some other sessions

• Tetration Analytics, the secret ingredient for every Data Center
  • Session ID: PSODCN-1800

• Cisco Tetration: Data Center Analytics Deployment and Use Cases
  • Session ID: BRKACI-2060

• Tetration APIs
  • Session ID: DEVNET-2423

• Tetration Analytics - Industry's Powerful Analytics Platform
  • Session ID: LABACI-3020

• Inside Cisco IT: ACI & Tetration Analytics
  • Session ID: BRKCOC-2006

Okay what does Tetration Mean?

• Tetration (or hyper-4) is the next hyperoperation after exponentiation, and is defined as iterated exponentiation

• It’s bigger than a Google [sic] ()

• And yes the developers are a bunch of mathematical geeks
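As a small illustration of "iterated exponentiation", the sketch below builds a power tower from the top down. The function name `tetration` is only illustrative. For scale, 10 tetrated to height 3 is 10^(10^10), which is already far larger than a googol (10^100), so the example sticks to small inputs.

```python
def tetration(base: int, height: int) -> int:
    """Return `base` tetrated to `height`: a power tower of `height` copies of `base`."""
    if height < 0:
        raise ValueError("height must be non-negative")
    result = 1  # an empty power tower is conventionally 1
    for _ in range(height):
        # Build the tower from the top down: new tower = base ** old tower
        result = base ** result
    return result


print(tetration(2, 3))  # 2 ** (2 ** 2) = 16
print(tetration(2, 4))  # 2 ** 16 = 65536
print(tetration(3, 2))  # 3 ** 3 = 27
```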

Tetration Analytics Platform

Introduction

We Are at the Cusp of a Major Shift

[Figure: adoption curve on a timeline of 2000, 2010, 2015 and "The Next 5+ Years", with a "We are here" marker. Labels: TRADITIONAL DATA CENTRE, CLOUD DATA CENTRE, HYBRID CLOUDS; CONSOLIDATION, VIRTUALISATION, AUTOMATION, IT as a Service (IaaS | PaaS | SaaS | XaaS), Flexible Consumption Models; EFFICIENCY, SIMPLICITY | SPEED, DIGITAL EXPERIENCES.]

What if you could actually look at every process and every data packet header that has ever traversed the network without sampling?

Cisco Tetration Analytics
Pervasive Sensor Framework

Provides correlation of data sources across entire application infrastructure

Enables identification of point events and provides insight into overall systems behavior

Monitors end-to-end lifecycle of application connectivity

Cisco Tetration Analytics
Policy Discovery and Observation

APPLICATION WORKSPACES

Public Cloud

Private Cloud

Cisco Tetration Analytics™ Application Segmentation Policy

Profile and Context Driven Application Segmentation

1. Real-time Asset Tagging: Cisco Tetration Sensors, Customer Defined
2. Policy Workflows (Role Based and Hierarchical): Cisco Tetration Application Insights (ADM) + Tag and Label-Based Add-on Policy (For Example, Mail Filters)
3. Policy Enforcement: No Need to Tie Policy to IP Address and Port; Cisco Tetration Platform Performs the Translation

Compliance Monitoring Enforcement

Public Cloud Bare Metal Virtual Cisco ACITM* Traditional Network*

Tetration Analytics: Open Access

NORTHBOUND NORTHBOUND NORTHBOUND APPLICATION CONSUMERS CONSUMERS

Kafka Broker

Programmatic Message Tetration Interface Publish Apps

Cisco Tetration Analytics Platform

REST API
• Tetration flow search
• Sensor management

Push Notification
• Out-of-box events
• User defined events

Tetration Apps
• Access to data lake
• Write your own application

Tetration Analytics Platform

Architecture - Sensors Tetration Analytics Architecture Overview

Data Collection
• Software Sensor and Enforcement
• Embedded Network Sensors (Telemetry Only)
• Third Party Sources (Configuration Data)

Analytics Engine
• Cisco Tetration Analytics Cluster

Open Access
• Web GUI
• REST API
• Event Notification
• Tetration Apps

• Self Managed Cluster
• No Hadoop / Data Science Background Needed
• Easy Integration via Open interfaces
• One Touch Deployment
• No External Storage Needed
• Open Data Lake (via Tetration Apps)

Traditional Monitoring Is Showing Its Age
Not suited for Modern Network and Security Operations

Where Data Is Created: SNMP, Syslog, CLI

Where Data Is Useful: SNMP Server, Syslog Collector, Scripts (Storage & Analysis)

• Non Real time
• Strong burden on back-end
• Normalize different encodings, transports, data models, timestamps

Data Granularity Needs to Improve
One Minute SNMP Polling

Telemetry – 10 Second Push SNMP – 1 Minute Polling

Data Granularity Needs to Improve
10 Second SW Process Push

Telemetry – 10 Second Push

Data Granularity Needs to Improve
Sub Second HW/SW Push

Data Granularity Needs to Improve
Type of Problems Customers are Looking to Address

Workload Placement

Service Level Monitoring

ADM

Security and Policy Enforcement

Microburst Detection Traffic Engineering

Capacity Planning

Troubleshooting & Remediation (Self Driving)

On-Change <= 1 sec ~10s sec ~minutes-hours

Resolution = Frequency of Data Collection

Processing on the Source Device is Expensive
e.g. Consider Flow Collection Efficiency

512K Flow Table
• Collect and Keep all Flow Data in the Local Hardware or Software Flow Table
• Size of the Table depends on the Data Rates and Connectivity Density
• BW is Growing Faster than Memory (Cost of Flow Entry per Gbps is not flat)

Sampled Flow Data
• Sampling Flows Reduces Cost of the Telemetry but Reduces Accuracy

Flow Cache with streaming export
• Maintain a small ‘cache’ and export the cache at a high data rate
• Shift the cost of aggregation to backend resources
• Aggregate ‘Flow Table’ can be much larger

The Richer the Data Sources the Better
More Data == Better Interpolation

Lamp Sensor Plug Sensor

Heater

The Richer the Data Sources the Better
You don’t always know what you need in advance

• On-Box Filtering Loses Data
• Can’t Change Your Mind About What’s Important Later
• Can’t Scale Out Embedded Processing
• Compression (Lossless) is Good
• Massive Amounts of Data Motivate the Shift in Collection
• Bulk Collection is Efficient
• Bulk Processing/Export Not So Much

Streaming Telemetry is a game changer
Monitoring becomes a big data problem

Where Data Is Created / Where Data Is Useful

Removing limitations and complexity
• Streaming paradigm
• Dense Sensor Framework
• Increased Data Granularity
• Update on every event
• Multiple Data Sources

Real time
• Volume – Scale of Data
• Velocity – Analysis of Streaming Data
• Variety – Different Forms of Data

Big Data and Machine Learning Problem

Pervasive Sensors

Software Sensors (Available Now)
• Linux VM
• Windows Server VM
• Bare Metal (Linux and Windows Server)
• Universal* (Basic Sensor for other OS)

Network Sensors (Next Generation 9K switches)
• Nexus 9200-X
• Nexus 9300-EX

Third Party Sources (3rd party Data Sources)
• Asset Tagging
• Load Balancers
• IP Address Management
• CMDB
• …

*Note: No per-packet Telemetry, Not an enforcement point

• New! Enforcement Point (Software agents)
• Low CPU Overhead (SLA enforced)
• Highly Secure (Code Signed, Authenticated)
• Low Network Overhead (SLA enforced)
• Every Flow (No sampling), NO PAYLOAD

Tetration Sensors Locations

Hardware Sensor
• Packet and Flow Events
• Buffer and Switch State

Software Sensor
• Processes & Socket
• Packet and Flow Events

[Figure: hardware sensors on the 9732C-EX LC, 92160YC-X and 93180YC-EX; software sensors on hosts behind three hypervisors; all reporting to the Tetration Cluster]

Hardware Sensor
EX and FX series Nexus 9000

• Embedded Module (Flow Cache)
  • Nexus 92160YC-X
  • Nexus 93180YC-EX & 9732C-EX Line Cards

• Extracts Meta-Data from the forwarding pipeline
  • No latency impact, no performance impact

Flow Cache

PRX LUA LUB LUC

Hardware Sensor
Direct Export of the Hardware State

• Monitor SW State (polled, timer driven, on demand, …), e.g. BGP, EthPM, STP
• CPU sources the SW Telemetry Data (everything not in the HW export)
• Configure Required Telemetry (Process State, Flow Cache, Events, SSX)
• Configure Desired Triggers (Events, Flows, …)
• ASIC Directly Transmits HW Telemetry Data (Timer and Event Triggers)

Hardware Sensor
EX and FX series Nexus 9000

• Support in NX-OS Mode
  • Cisco NX-OS Release 7.0(3)I5(2) adds filtering support
  • https://techzone.cisco.com/t5/Tetration-Analytics/Installation-and-configuration-of-Hardware-sensor-on-standalone/ta-p/1010838

• Support in ACI Mode
  • Cisco ACI NX-OS release
  • https://techzone.cisco.com/t5/Tetration-Analytics/Tetration-Deep-Dive-Network-Connectivity-Hardware-Sensors/ta-p/975945

Software Sensor

• Runs in the Host OS, not the Hypervisor
  • Access to accurate state of the application and all connectivity
• Not in the data path
  • Sits in User Space
  • Designed by Kernel Developers
• Secure
  • Code Signed
• SLA Enforcement
  • CPU and BW throttling

[Figure: Tetration Sensor Application, libpcap, Network Stack, Driver, NIC]

Software Sensor
Enforcement

• When leveraging the enforcement capability an additional component is downloaded by the Cluster to the existing sensors
• Monitoring and Enforcement are distinct functions with distinct threads (the enforcement code does not exist in the server until explicitly pushed)
• Agent will implement privilege separation
  • SSL libraries would run in low privilege space
  • /proc parsing in high privilege space
  • Enforcement in high privilege space

[Figure: sensor processes grouped by privilege level; labels: High Privilege Collection, Low Privilege Monitoring, High Privilege Enforcement, Low Privilege Cluster Link]

PKI within the Cluster/Sensor

• Tetration Cluster runs an internal PKI
  • Root CA is per cluster, inserted at Image creation
  • Not accessible outside the cluster
  • Cannot connect to an external PKI

• Certificate based authentication is performed for the Control Channel
  • CN of the certificate is the IP address
  • Certificates are rotated every 60 days

• Sensors are code signed
  • Signature Authority is Cisco’s code signing certificate
  • Code Signature is validated at process start

How Does the Sensor Communicate with the Cluster the First Time?

[Figure: the Sensor registers with the web server (Rails) via SSL and is assigned a UUID, downloads its config from the Config Server, and sends meta data to the Collector]

Components & Communication
Hardware Sensor

Control Channel TCP/443 NXOS Agent

Agent Communication Guest Shell Unix Socket Tetration Cluster

ASIC Sensor Data UDP/5640

Cisco Nexus 9000

Components & Communication
Software Sensor

Agent Communication Unix Socket Control Channel TCP-SSL 443

Tetration Cluster Software Sensor/Agent Sensor Data TCP-SSL 5640

• When used, policies pushed from the cluster are pairwise signed with TS (replay protected) between Cluster and sensor agent (Linux/Windows/…)
• If rules are changed on the end host, the Enforcer restates the rules and sends a Notification to the Controller

Universal Agent

• Supporting annoying operating system…
  • AIX
  • zOS
  • …
• Process and connection tracked with a lower granularity

Tetration Host Sensor Has Three Rate Limiting Modes

Top
• Uses no more CPU % than the given limit on any single core
• For example, 3% limit on a 10 core system = 3% out of total 1,000% available
• This is a fairly restrictive mode and would be suggested only when necessary

Adjusted
• Takes the provided limit and multiplies it by the amount of cores available to the system
• For example, 3% limit on a 10 core system = 30% out of total 1,000% limit
• This is the default profile (set to 3%) – and it’s recommended to use this profile unless necessary

Disabled
• Use in hosts where telemetry MUST be collected
• No CPU % limit, will take as much as necessary to capture each and every packet
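The arithmetic behind the three modes can be sketched as below. This is only an illustration of the slide's examples, not sensor code; the function name `cpu_budget_percent` and the mode strings are assumptions, and percentages follow the slide's convention that one core equals 100%.

```python
def cpu_budget_percent(mode: str, limit_percent: float, cores: int):
    """Return the CPU budget in percent (one core = 100%), or None when unlimited.

    Mode names are illustrative labels for the slide's Top / Adjusted / Disabled.
    """
    if mode == "top":
        # The limit applies as-is, regardless of how many cores exist.
        return limit_percent
    if mode == "adjusted":
        # The limit is multiplied by the number of cores available.
        return limit_percent * cores
    if mode == "disabled":
        # No CPU limit at all.
        return None
    raise ValueError(f"unknown mode: {mode}")


for mode in ("top", "adjusted", "disabled"):
    print(mode, cpu_budget_percent(mode, 3, 10), "of", 10 * 100)
```

With a 3% limit on a 10 core host this gives 3 for Top and 30 for Adjusted, out of the 1,000% total, matching the examples on the slide.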

Tetration Sensor Overhead (e.g. 2263 sensors)

• CPU utilization on Host Sensor based on current deployments averages < 1%
• Flow collection has zero impact on switch hardware sensor CPU
• Network Overhead is ~1% of observed traffic load

Software Sensor Support (Q2CY17)

Full Sensor
• RHEL (64 bit) – 5.x, 6.x, 7.x
• CentOS (64 bit) – 5.x, 6.x, 7.x
• Oracle Linux (64 bit) – 6.x, 7.x
• SUSE – 11.2, 11.3, 11.4, 12.1, 12.2
• Ubuntu – 12.04, 14.04, 14.10
• Windows Server 2008 R1/R2 | Essentials | Standard | Enterprise | DataCenter
• Windows Server 2012 R1/R2 | Essentials | Standard | Enterprise | DataCenter

Universal Sensor
• Mainframes: AIX-ppc 5.3, 6.1, 7.1, 7.2 (trial)
• Solaris (x86_64)
• RHL 4.x, 5.x (32 bit -386/amd)
• CentOS - 4.x, 5.x (32 bit)
• Windows XP, 2003 (32 bit)
• Windows Server 2008 (32 bit)

Tetration Analytics Platform

Architecture - Sensor Data Telemetry Means Different Things to Different People – Device State

Know the Network and infer the health of the application based on the state of the devices

• Device State Telemetry
  • What is happening in the Switch/Router
  • What is happening between Devices
  • What is happening in the Network

[Figure: Device and Network; four device stacks, each with Network, Data Link and Physical layers]

Telemetry Means Different Things to Different People – Application State

Know the application and infer the health of the infrastructure based on the state of application connectivity

• Application State Telemetry
  • What is happening in the Operating System
  • What is happening in the Process (JVM)
  • What is happening in the Server I/O path

[Figure: two host stacks, each with Application, Process and Sockets layers]

View of Telemetry
Application Processes, Sockets and Context

Socket > 1023 Socket = 443

Chrome NGINX

Consumer Process Provider/Service Process

• Application developers implement business logic as code that runs as processes and threads
• TCP/IP, which forms a foundation of the Internet, was designed to allow these application processes to interact via sockets
• Application logic can be viewed on one level as the interaction between a group of processes and their associated sockets
• Understanding the inter-process communication and mapping that directly to the infrastructure provides a direct correlation between the application and the infrastructure

Tetration's View of Telemetry
Application Processes, Sockets and Context

Socket > 1023 Socket = 80

Chrome NGINX

Consumer Process Provider/Service Process

Consumer Process (client):

    import socket

    # create an INET, STREAMing socket
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # now connect to the web server on port 80
    # - the normal http port
    s.connect(("www.mcmillan-inc.com", 80))

Provider/Service Process (server):

    import socket

    # create an INET, STREAMing socket
    serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # bind the socket to a public host,
    # and a well-known port
    serversocket.bind((socket.gethostname(), 80))
    # become a server socket
    serversocket.listen(5)

What do we mean by Application Visibility
Internet Stack

[Figure: Internet stack layers: Application, Process, Sockets, Transport, Network, Data Link, Physical]

What Does Tetration Sensor Collect
Socket Connectivity, the data flows

[Figure: Internet stack layers: Application, Process, Sockets, Transport, Network, Data Link, Physical]

What does the Sensor Collect
Context

• Device Information: Buffer/ACL Drops, etc.
• Process Information: Which process is it, who started it, etc.

[Figure: Internet stack layers: Application, Process, Sockets, Transport, Network, Data Link, Physical]

Sensor Data
Process Information

• Host Sensor collects information about the consumer and provider processes
  • /proc
  • runtime system information (e.g. system memory, devices mounted, hardware configuration, etc).

Additional Context
External Data Sources

[Figure: the Internet stack layers (Application, Process, Sockets, Transport, Network, Data Link, Physical) with the labels Tetration Pervasive Sensors, Analytics Engine, Annotation and Operations Data, APIC, "CMDB, DNS, whois, etc." and External Data Repositories (Talos, …, future)]

What does the Sensor Collect
Socket Level Flow Information + Context Information

• Understanding of what happens TO ‘and’ INSIDE a flow
• Distributions (packet sizes, TCP windows…)
• Burstiness
• Anomaly detection
• Latency (application and network)
• Events
• VXLAN information

[Figure: Per Packet Variations (packet lengths 66 and 9000) versus Accumulated Flow Information (Volume…)]

Full vs. Sampled
What happens when you sample?

Full Packet Stream

Flow A

Flow B

Flow C

SYN SYNACK ACK FIN Flow D

Full vs. Sampled
Reasons and Use Cases for Both

Sampled
• Sampling has its use cases, in SP environments for example
  • High Volume, no behavioral analysis
• Sampling provides a good statistical model
  • For Trends
  • For Traffic Visibility
  • For Volume Indication
• Entropy is very important

Full
• Depending on the of flows and type of flows
  • Mice flows can go completely unseen
  • Connection Oriented flows may not be tracked properly (missed flags)
• Accuracy of the flow increases with the packet count
• Type of sampling and quality of entropy

Tetration Examines every packet

Full Packet Stream

• Variability ’within’ the flow

• Variability ‘between’ the flows

• Changes ‘within’ the flow

Collects the Meta-Data not the Packet

Meta-Data – Including Overlay VXLAN/GRE/IPinIP Encapsulated Header

Ethernet Header | IP Header | UDP Header | VXLAN Header | Ethernet Header | IP Header | TCP Header | Payload

Ethernet Header | IP Header | TCP Header | Payload

Ethernet Header | IP Header | UDP Header | Payload

Payload: Privacy Risk

Sensor Data
Flow Data – Forwarding

• COS

• Overlay Type (Native, 802.1q / 802.1p, VXLAN, iVXLAN, NVGRE, NSH, other)

• Source TEP or Port ID

• Destination TEP

• Disposition (RPF or Port Security failure, Policy drop, redirect or span)

• Port type (spine to leaf or leaf to host)

Sensor Data
Accumulated Flow Information

• Bytes, Packet Count
• IP options present
• IP length error
• DF bit set
• Fragment seen
• Last TTL
• Accumulated TCP flags
• Last ACK / SEQ
• Sampled Packet length
• Sampled Packet ID

Sensor Data
Histogram Bins

• Flow Cache has the notion of “bins” to build histograms
  • TCP options length (8 bits)
  • Payload length (12 bits)
  • Receive window (6 bits)
• This means more visibility on the activity of flow
• Bin sizes are configurable
• Bins don’t need to be of equal size (but need to be contiguous)
• Last bin will capture the configured size and above (>=)

[Figure: Histogram of the flow. Eight bins (#1 to #8) shown across two exports, labelled 82, 82, 0 and 165 bits for bins #1 to #4 and 82, 82, 130 and 165 bits for bins #5 to #8]
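The binning rules above (configurable, contiguous bins of unequal size, with the last bin catching everything at or above its lower edge) can be sketched as follows. This is a software illustration only, not how the ASIC flow cache is implemented; the bin edges and payload lengths are hypothetical.

```python
from bisect import bisect_right


def build_histogram(values, lower_edges):
    """Count values into contiguous bins given ascending lower edges.

    Bin i covers [lower_edges[i], lower_edges[i + 1]); the last bin
    covers everything at or above lower_edges[-1].
    """
    counts = [0] * len(lower_edges)
    for v in values:
        idx = bisect_right(lower_edges, v) - 1
        if idx < 0:
            raise ValueError(f"{v} is below the first bin edge")
        counts[idx] += 1
    return counts


# Eight hypothetical, unequal bins for payload length
edges = [0, 64, 128, 512, 1024, 1500, 4000, 9000]
payload_lengths = [66, 66, 120, 1460, 1460, 9000, 9216, 40]

print(build_histogram(payload_lengths, edges))  # [1, 3, 0, 0, 2, 0, 0, 2]
```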

Sensor Data
Burst

• Measure the “burstiness” of a flow
  • Current Burst
  • Max Burst
  • Burst Index
  • Flowlets
• Bursts are measured in 32k interval
• Each export period is divided by 128
• Flowlets are activity after a silence period (configurable)

[Figure: one export period on a 0 to 128 interval axis (ticks at 0, 1, 2, 3, 30, 80, 128), showing Flowlet #1, Silence, Flowlet #2. Successive readings: Current 128 / Max 128 / Burst Index 0; Current 256 / Max 256 / Burst Index 3; Current 32 / Max 256 / Burst Index 3; Current 1024 / Max 1024 / Burst Index 80; Current 0 / Max 1024 / Burst Index 80]

Max Burst occurred at 62.5ms with a value of 1024 and 2 flowlets
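A rough sketch of how the burst fields could be derived from 128 per-interval counters is shown below. It is an illustration under stated assumptions, not the hardware logic: the interval positions of the sample values, the silence threshold of 40 intervals, and the 100 ms export period are all hypothetical, chosen so that index 80 of 128 lands at the 62.5 ms quoted on the slide.

```python
def burst_stats(interval_values, silence_intervals, export_period_ms):
    """Summarise per-interval activity counters for one export period.

    Returns (max_burst, burst_index, flowlets, max_burst_offset_ms).
    """
    max_burst = 0
    burst_index = 0
    flowlets = 0
    quiet = silence_intervals  # start "silent" so the first activity opens a flowlet
    for i, current in enumerate(interval_values):
        if current > max_burst:
            max_burst, burst_index = current, i
        if current > 0:
            if quiet >= silence_intervals:
                flowlets += 1  # activity after a silence period starts a new flowlet
            quiet = 0
        else:
            quiet += 1
    offset_ms = burst_index * export_period_ms / len(interval_values)
    return max_burst, burst_index, flowlets, offset_ms


# One export period divided into 128 intervals (values placed hypothetically)
intervals = [0] * 128
intervals[0], intervals[3], intervals[30], intervals[80] = 128, 256, 32, 1024

print(burst_stats(intervals, silence_intervals=40, export_period_ms=100))
# (1024, 80, 2, 62.5)
```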

Sensor Data
Anomaly List

• TTL changed
• TCP flags are zero’d
• IP reserved flags are not 0
• TCP SYN with data
• DF bit has changed
• TCP FIN with no ACK
• TCP RST with no ACK
• Ping of death
• TCP SYN, FIN, RST and ACK zero’d
• Fragment is too small to contain L4 header (TCP, UDP and SCTP)
• URG set but no URG pointer
• TCP SYN and FIN are set
• URG pointer with no URG flag
• TCP SYN and RST are set
• TCP seq outside the expected range
• TCP FIN, PSH and URG are set
• TCP seq is less than expected (rexmit)

Sensor Data
TCP Timing Data

• Application Latency: How long did it take for the inbound data to be ACK’d
• SRTT Latency (Process to Process at the TCP level)
• Port to Port Latency: (Requires HW support)

[Figure: Internet stack layers: Application, Process, Sockets, Transport, Network, Data Link, Physical]

Sensor Timing Data
Network Performance Monitoring Example

• The host calculates round trip time as 8 milliseconds
• The port-to-port network latency is 252 microseconds
• The app took 15 seconds to return the ACK

Pervasive Visibility

Flow Search and Forensics

Different Problems will need Different Data Sources

• Application (Application, Process, Sockets, Transport): Health, Performance, Monitoring, Security, Discovery, Troubleshooting
• Network (Network, Data Link, Physical): Health, Performance, Monitoring, Capacity

Tetration Analytics Platform

Architecture - Cluster Tetration Analytics Architecture Overview

Data Collection
• Host Sensors (VM)
• Network Sensors (Cisco Nexus® 92160YC-X, Cisco Nexus 93180YC-EX)
• 3rd-Party Metadata Sources (Configuration Data)

Analytics Engine
• Cisco Tetration Analytics™ Platform (receives Telemetry)

Visualization and Reporting
• Tetration Web GUI
• REST API
• Push Events

The Analytics Cluster

Components
• Hadoop Based Platform
  • Self managed
  • One touch deployment
• Tiered System
  • Heavy Compute for Machine Learning
  • Caching for light speed queries
• Extensibility (future)
  • Messaging Bus
  • API Access

[Figure: cluster tiers: Front End; Compute (Data Cleaning and Analytics); Caching (Search); Long Term Storage (Data Lake)]

The Analytics Cluster
Appliance

• The Analytics Cluster operates as an appliance
  • Avoids the need for in house Big Data, Analytics expertise
  • Supported by Cisco TAC

• Self Monitoring
  • The cluster leverages a sensor architecture to track its state and provides event based notifications for

• Software upgrades and full install are all automated

Cluster Monitoring and Maintenance

Collector Monitoring and Maintenance

Sensor Monitoring and Maintenance
Sensor Throttled

Hardware Sensor Monitoring

Tetration 1.0
Analytics Cluster Configurations

• 4 x 3-Phase PDU, 22.5 KW Peak Power
• 4 x 1-Phase PDU, 11.5 KW Peak Power

Tetration Analytics 2.0: Deployment Options

On-Premise Options

Cisco Tetration Analytics (Large Form Factor)
• Suitable for deployments more than 1000 workloads
• Built in redundancy
• Scales up to 10,000 workloads
Includes:
• 36 x UCS C-220 servers
• 3 x Nexus 9300 switches

Cisco Tetration-M (Small Form Factor)
• Suitable for deployments under 1000 workloads
Includes:
• 6 x UCS C-220 servers
• 2 x Nexus 9300 switches

Public Cloud

Cisco Tetration Cloud
• Software deployed in AWS
• Suitable for deployments under 1000 workloads
• AWS instance owned by customer

Analytics Engine

The Platform
• Hadoop Based Platform
  • Self managed
  • One touch deployment
• Tiered System
  • Heavy Compute for Machine Learning
  • Caching for light speed queries
• Extensibility (future)
  • Messaging Bus
  • API Access

[Figure: cluster tiers: Front End; Compute (Data Cleaning and Analytics); Caching (Search); Long Term Storage (Data Lake)]

Front End
GUI, RESTful API

• Servers hosting front end processes
  • GUI and Operational Interfaces
  • RESTful API

Data Processing Pipeline

• Data Ingest and Processing

• Multiple Pipelines for different processing activities

• Scaled to Millions of events per second

Caching Layer
Natural Language Search

Caching Layer
Search

• Caching Layer provides a large in memory and flash based data store for real time searches e.g. 16 weeks of policy delta data accessible for real time search

Data Lake
HDFS Storage

• Long Term Storage for collected observations, for pipeline processing tasks, etc

• Usage is based on
  • Time Based Retention
  • Space Based Retention
  • Greedy Retention

• Max possible Retention period will depend on cluster size and observation rate

14.10 K hours of available capacity at the current collection rates (587 days)

Standard Data Analytics Pipeline
Tetration Data Analysis

Various Pipelines (e.g. ADM) process the data to derive appropriate insights

Data Aggregation -> Data Prep & cleansing -> Automated Data Discovery & Evaluation -> Statistical Analysis & Prediction -> Reporting, Visualization Tools or Alerts

• Data Aggregation: Sensor Collectors
• Data Prep & cleansing: De-duplication, unification of uni-directional flows into bi-directional, annotate flows with context information, etc.
• Reporting, Visualization Tools or Alerts: GUI, REST API, Kafka, Policy Export, …

Data Collection
Sensor to Collector

Data Aggregation -> Data Prep & cleansing -> Automated Data Discovery & Evaluation -> Statistical Analysis & Prediction -> Reporting, Visualization Tools or Alerts

Data Prep and Annotation

• De-duplication, unification of uni-directional flows into bi-directional, annotate flows with context information, etc.

[Figure: Internet stack layers (Application, Process, Sockets, Transport, Network, Data Link, Physical) reporting to two Collectors]

Data Aggregation -> Data Prep & cleansing -> Automated Data Discovery & Evaluation -> Statistical Analysis & Prediction -> Reporting, Visualization Tools or Alerts
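One common way to unify uni-directional flow records into bi-directional flows is to key both directions by the same order-independent endpoint pair. The sketch below assumes that approach, and treats byte-for-byte identical records as duplicates, which is a simplification; the record layout and field names are hypothetical and this is not the Tetration pipeline code.

```python
from collections import defaultdict


def canonical_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Order the two endpoints so both directions of a flow share one key."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    low, high = (a, b) if a <= b else (b, a)
    return (low, high, proto)


def unify(records):
    """records: iterable of (src_ip, src_port, dst_ip, dst_port, proto, nbytes)."""
    flows = defaultdict(lambda: {"a_to_b_bytes": 0, "b_to_a_bytes": 0})
    seen = set()
    for rec in records:
        if rec in seen:  # de-duplication: drop an identical observation
            continue
        seen.add(rec)
        src_ip, src_port, dst_ip, dst_port, proto, nbytes = rec
        key = canonical_key(src_ip, src_port, dst_ip, dst_port, proto)
        direction = "a_to_b_bytes" if (src_ip, src_port) == key[0] else "b_to_a_bytes"
        flows[key][direction] += nbytes
    return dict(flows)


records = [
    ("10.0.0.5", 51000, "10.0.0.9", 443, "tcp", 1200),  # client to server
    ("10.0.0.9", 443, "10.0.0.5", 51000, "tcp", 5400),  # server to client
    ("10.0.0.5", 51000, "10.0.0.9", 443, "tcp", 1200),  # same record seen twice
]
for key, counters in unify(records).items():
    print(key, counters)
```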

Annotation

• Think Gmail Labels

• User Defined information
  • User Uploaded
  • Keyed by VRF, IP
  • JSON Open Fields

• Derived Information
  • IP
  • VRF
  • …
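The "Gmail labels" idea of user-uploaded annotations keyed by VRF and IP can be sketched as a simple lookup table. The JSON layout, the field names (`vrf`, `ip`, `src_ip`, `dst_ip`) and the function names below are assumptions made for illustration, not the product's upload format.

```python
import json


def load_annotations(json_text):
    """Build a lookup keyed by (VRF, IP) from user-uploaded JSON records."""
    table = {}
    for rec in json.loads(json_text):
        rec = dict(rec)
        key = (rec.pop("vrf"), rec.pop("ip"))
        table[key] = rec  # the remaining open fields act as labels
    return table


def annotate(flow, table):
    """Return a copy of the flow with labels attached to both endpoints."""
    labelled = dict(flow)
    labelled["src_labels"] = table.get((flow["vrf"], flow["src_ip"]), {})
    labelled["dst_labels"] = table.get((flow["vrf"], flow["dst_ip"]), {})
    return labelled


uploaded = """[
  {"vrf": "prod", "ip": "10.0.0.5", "app": "web", "owner": "team-a"},
  {"vrf": "prod", "ip": "10.0.0.9", "app": "db"}
]"""
table = load_annotations(uploaded)
flow = {"vrf": "prod", "src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "dst_port": 3306}
print(annotate(flow, table))
```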

Annotation & Inventory

• Discovered Inventory

• Uploaded Inventory and Meta Data (32 Arbitrary Tags)

• Inventory Tracked in Real Time, along with historical trends

Analyzing the Data

• Endpoints are iteratively compared with each other to find which “profiles” are most similar
  • Sensor Data: Ports provided and consumed, Addresses sent and received from, Properties of network flows, Running processes, Process originating flow, Hostname
  • External Context: Load balancers / DNS / route tags
  • Human approved clusters from current or other workspaces and base cluster definition
• This is an example of where we use machine learning
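To make "comparing endpoint profiles" concrete, the sketch below scores similarity with the Jaccard index over sets of provided and consumed ports. Jaccard is an assumption chosen for illustration; the slides do not say which similarity measure ADM uses, and the endpoint names and profiles are made up.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar(name, profiles):
    """Return (other_endpoint, score) for the profile closest to `name`."""
    scores = {
        other: jaccard(profiles[name], features)
        for other, features in profiles.items()
        if other != name
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical endpoint profiles built from sensor data
profiles = {
    "web-1": {"provides:443", "consumes:3306"},
    "web-2": {"provides:443", "consumes:3306", "consumes:53"},
    "db-1": {"provides:3306"},
}

print(most_similar("web-1", profiles))  # web-2 shares 2 of 3 features
```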


Machine Learning

Cognitive Computing - Finding and remembering all the relationships between data, querying the matrix of relationships (Watson)

Machine Learning - Remember what has happened before, then look at new data coming in within that context to find patterns; build up a body of knowledge and use it to make a decision about the new data. Can machines remember, and apply what they remember to new data?

Deep Learning - Not trying to maintain data and relationships over time, but analyzing the data through better representations and creating models that learn these representations from large-scale unlabeled data. Succession analysis

Machine Learning

"Field of study that gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959)

The construction of algorithms that can learn from and make predictions on data (as opposed to following static programming instructions).

7:00 am = 65 degrees 8:00 am = 75 degrees 77.5 degrees 9:00 am = 85 degrees

How warm will it be at 8:30 am tomorrow?

Supervised learning: Linear regression, Logistic regression, SVMs
Unsupervised learning: K-means, PCA, Anomaly detection
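As a small worked example of the supervised case, the sketch below fits a least-squares line to the three temperature readings and evaluates it at 8:30 am. The function name `fit_line` is just for this example. A straight line through these three points gives 80 degrees; the slide does not show how its 77.5 figure was obtained, so that value is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


hours = [7.0, 8.0, 9.0]     # 7:00, 8:00, 9:00 am
temps = [65.0, 75.0, 85.0]  # observed degrees

slope, intercept = fit_line(hours, temps)
prediction = slope * 8.5 + intercept  # 8:30 am
print(f"{prediction:.1f} degrees")
```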

ADM Clustering Machine Learning Example

K-means Algorithm Finding the Clusters

Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \dots, \mu_K$
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}
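The loop above translates almost line for line into code. The following is a plain-Python sketch of standard K-means on 2-D points, for illustration only; the stopping rule (stop when assignments no longer change) and the fixed random seed are choices made for this example, not details taken from the slides.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Standard K-means: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization from the data
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clusters are stable
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, c in zip(points, assignment) if c == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = kmeans(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the initialization.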

Silhouetting Validation of the Cluster

• The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation)
• A high average silhouette value gives more confidence that the clustering is representative of the data

https://en.wikipedia.org/wiki/Silhouette_(clustering)
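For a point i, the standard definition is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other members of its own cluster and b(i) is the smallest mean distance to the members of any other cluster. A minimal sketch of that formula follows; it uses Euclidean distance and assigns 0 to points in single-member clusters, as in the referenced definition.

```python
import math


def silhouette_values(points, labels):
    """Silhouette value per point: (b - a) / max(a, b), in the range -1 to 1."""
    values = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, grouped by cluster label.
        by_label = {}
        for j, q in enumerate(points):
            if i != j:
                by_label.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_label.pop(labels[i], [])
        if not own or not by_label:
            values.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)                                # cohesion
        b = min(sum(d) / len(d) for d in by_label.values())    # separation
        values.append((b - a) / max(a, b))
    return values


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette_values(points, [0, 0, 0, 1, 1, 1])
bad = silhouette_values(points, [0, 1, 0, 1, 0, 1])
mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
```

Comparing `mean_good` with `mean_bad` shows the idea: the labelling that matches the two visible groups scores close to 1, while an arbitrary labelling scores much lower.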

Results of the Clustering Machine Learning

Tuning Cluster Granularity Tuning the Algorithms


Analyzing the Data Fitting the Curve

• Every data set (e.g. flow) is examined to find the best function that describes its behaviour
• Comparison within and between ‘flows’ can be used to find ‘outlier’ or anomaly conditions
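One simple way to picture "fit a function, then look for what does not fit" is to fit a straight line to a flow's byte counts over time and flag samples whose residual is far larger than the typical residual. This is a sketch under assumptions: the slides do not state which functions or thresholds the platform uses, and the factor of 3 times the median absolute residual is an arbitrary choice for the example.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def find_outliers(ys, factor=3.0):
    """Indexes of samples whose distance from the fitted line is unusually large."""
    xs = list(range(len(ys)))
    slope, intercept = fit_line(xs, ys)
    residuals = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    typical = statistics.median(residuals)
    return [i for i, r in enumerate(residuals) if r > factor * typical]


# Bytes per interval for one hypothetical flow; the fifth sample is a burst.
bytes_per_interval = [100, 102, 104, 106, 500, 110, 112]
outliers = find_outliers(bytes_per_interval)
```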


Outliers What does not look like it ‘fits’

Switch on Outlier view to highlight uncommon flows

Outlier dimension is highlighted with purple circle

Tetration Analytics Architecture Overview

Data Collection: Software Sensor and Enforcement; Embedded Network Sensors (Telemetry Only); Third Party Sources (Configuration Data)

Analytics Engine: Cisco Tetration Analytics Cluster

Open Access: Web GUI; REST API; Event Notification; Tetration Apps

• Self Managed Cluster
• No Hadoop / Data Science Background Needed
• Easy Integration via Open interfaces
• One Touch Deployment
• No External Storage Needed
• Open Data Lake (via Tetration Apps)

Accessing the Data and Analytical Results API, Workspace Applications and Messaging BUS

[Figure: northbound applications and northbound consumers access the platform through a Programmatic Interface, Tetration Apps, and Message Publish to a Kafka Broker]


Tetration API

• Shipped as a limited ‘trial feature’ with the 103.8 release
• Supported with the 2.0 release (FCS April 2017)
• Is a RESTful API that uses HMAC time-bound authentication tokens generated from a private and public key pair
• SDKs available in Python (2.7+) and JavaScript (ES6+)
• Supports managing sensors and switches, plus flow searching
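To illustrate the general idea of HMAC time-bound request signing, here is a generic sketch using only the Python standard library. It is not the Tetration signing scheme: the string-to-sign layout, the function names and the 300 second validity window are all assumptions for the example, and in practice the SDK takes care of authentication.

```python
import base64
import hashlib
import hmac


def sign_request(api_secret, method, path, body, timestamp):
    """HMAC-SHA256 signature over the method, path, body checksum and timestamp."""
    checksum = hashlib.sha256(body).hexdigest()
    # Hypothetical string-to-sign layout, for illustration only.
    string_to_sign = "\n".join([method.upper(), path, checksum, str(timestamp)])
    digest = hmac.new(api_secret.encode(), string_to_sign.encode(), hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def verify_request(api_secret, method, path, body, timestamp, signature, now, max_age=300):
    """Server side: reject stale timestamps, then compare signatures in constant time."""
    if abs(now - timestamp) > max_age:
        return False  # the token is time-bound
    expected = sign_request(api_secret, method, path, body, timestamp)
    return hmac.compare_digest(expected, signature)


issued_at = 1485961200
signature = sign_request("example-secret", "POST", "/flowsearch", b"{}", issued_at)
```

Because the timestamp is part of the signed string, a captured request cannot be replayed once the validity window has passed, and any change to the body invalidates the signature.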

Example: Flow Search Filters

• Method: POST
• Endpoint: /flowsearch
• Description: The entire Tetration Analytics flow database can be queried, boasting sub-second response times.

• Search the flow database for the first record that matches these parameters:

Between 02/01/2017 3PM and 02/01/2017 4PM

Default tenant and VRF

Destination port 80 (HTTP)
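A request body for this search might be assembled as in the sketch below. The field names (`t0`, `t1`, `scopeName`, `limit`, `filter`, `dst_port`) are assumptions for illustration and should be checked against the API documentation; the date is read as 1 February 2017 in UTC, which the slide does not specify.

```python
import json
from datetime import datetime, timezone


def build_flow_search(start, end, dst_port, scope="Default", limit=1):
    """Build a JSON-serializable body for a POST to /flowsearch (field names assumed)."""
    return {
        "t0": start.isoformat(),   # start of the time window
        "t1": end.isoformat(),     # end of the time window
        "scopeName": scope,        # default tenant and VRF
        "limit": limit,            # only the first matching record
        "filter": {
            "type": "and",
            "filters": [
                {"type": "eq", "field": "dst_port", "value": dst_port},
            ],
        },
    }


payload = build_flow_search(
    datetime(2017, 2, 1, 15, 0, tzinfo=timezone.utc),
    datetime(2017, 2, 1, 16, 0, tzinfo=timezone.utc),
    dst_port=80,
)
body = json.dumps(payload, indent=2)
print(body)
```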

Tetration an Open Platform User Apps

• Tetration Engineering, Partners and Customers can write apps on Tetration
• Users can write their own business logic and extend Tetration

• Programming Languages supported
  • Scala
  • Python
  • SQL
  • R (coming)

• Type of jobs
  • One Time Applications and Reports – scheduled now or any time in future
  • Recurring Applications and Reports

• Trigger Alerts on Tetration or user defined events

User Apps: Tetration an Open Platform

• Data Source
  • Tetration Data with Multi Tenancy (Row and Column filtering – Tetration Read Libs)
  • User Uploaded Data – Any schema or format
  • Application generated Data
  • Data Retention and quota monitoring
  • Arbitrary Stream ingestion (coming soon)

• Alert and Event Posting
  • Kafka Message bus integrated

• Security model
  • Tenant Isolation
  • Tetration jobs isolated from user
  • Active firewalls. All user jobs launched inside a private container, Adhoc VM. Jailed from Tetration mainstream
  • Kafka Kerberos Auth

Tetration Apps

• Explore data from your browser, develop your models

• Based on Jupyter Notebooks

• Provides an easy way to develop apps

Granular RBAC Application Access Controls

[Figure: User → Roles → Scope → Workloads → Permitted Actions]

Permitted Actions: R, Modify, ADM, Enforce, etc.

Hands on DEVNET Lab

For hands on exposure to the API please feel free to visit DEVNET

DEVNET-2423

Policy Discovery, Compliance and Enforcement

Application Dependency and Cluster Grouping

[Figure: the Cisco Tetration Analytics™ Platform receives telemetry from three environments and applies unsupervised machine learning and behavior analysis]

• Cisco Nexus® 9000 Series: bare-metal, VM, & switch telemetry. Network-only sensors, host-only sensors, or both (preferred)
• Brownfield: bare-metal & VM telemetry
• On-premises and cloud workloads (AWS): bare metal and VM telemetry (AMI …)

Application Conversation View

Application clusters conversation views

Conversation details including process bindings

Whitelist Policy Recommendation Application Discovery

Whitelist Policy Recommendation (Available in JSON, XML, and YAML)

{
  "src_name": "App",
  "dst_name": "Web",
  "whitelist": [
    {"port": [0, 0], "proto": 1, "action": "ALLOW"},
    {"port": [80, 80], "proto": 6, "action": "ALLOW"},
    {"port": [443, 443], "proto": 6, "action": "ALLOW"}
  ]
}
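A short sketch of how the exported JSON above could be read and turned into human-readable rules. The `proto` values are IP protocol numbers (1 is ICMP, 6 is TCP) and `port` is a low/high range; the helper name `describe` and the output format are choices made for this example.

```python
import json

PROTO_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}  # IANA protocol numbers

policy_json = """
{ "src_name": "App", "dst_name": "Web", "whitelist": [
  {"port": [0, 0], "proto": 1, "action": "ALLOW"},
  {"port": [80, 80], "proto": 6, "action": "ALLOW"},
  {"port": [443, 443], "proto": 6, "action": "ALLOW"} ] }
"""


def describe(policy):
    """Render each whitelist entry as one readable line."""
    lines = []
    for rule in policy["whitelist"]:
        low, high = rule["port"]
        proto = PROTO_NAMES.get(rule["proto"], str(rule["proto"]))
        if rule["proto"] == 1:
            ports = ""  # ICMP has no port numbers
        elif low == high:
            ports = f"/{low}"
        else:
            ports = f"/{low}-{high}"
        lines.append(
            f'{rule["action"]} {policy["src_name"]} -> {policy["dst_name"]} {proto}{ports}'
        )
    return lines


rules = describe(json.loads(policy_json))
for line in rules:
    print(line)
```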

Policy Compliance Verification & Simulation

What was seen on the network that was out of Policy

Permitted Traffic Seen on the network
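The compliance idea, splitting observed traffic into what the whitelist permits and what falls outside it, can be sketched as follows. The flow record fields (`proto`, `port`) are hypothetical, and the whitelist entries use the same shape as the JSON policy recommendation shown on the earlier slide.

```python
WHITELIST = [
    {"port": [0, 0], "proto": 1, "action": "ALLOW"},
    {"port": [80, 80], "proto": 6, "action": "ALLOW"},
    {"port": [443, 443], "proto": 6, "action": "ALLOW"},
]


def is_permitted(flow, whitelist):
    """True if some ALLOW entry matches the flow's protocol and destination port."""
    for rule in whitelist:
        low, high = rule["port"]
        if (
            rule["action"] == "ALLOW"
            and rule["proto"] == flow["proto"]
            and low <= flow["port"] <= high
        ):
            return True
    return False


def check_compliance(flows, whitelist):
    """Split observed flows into permitted traffic and out-of-policy traffic."""
    permitted = [f for f in flows if is_permitted(f, whitelist)]
    out_of_policy = [f for f in flows if not is_permitted(f, whitelist)]
    return permitted, out_of_policy


observed = [
    {"proto": 6, "port": 443},  # HTTPS
    {"proto": 6, "port": 22},   # SSH, not in the whitelist
    {"proto": 1, "port": 0},    # ICMP
]
permitted, out_of_policy = check_compliance(observed, WHITELIST)
```

Running the same check against a proposed whitelist rather than the live one is one way to think about the simulation half of this slide.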

Policy Enforcement The Goal Is to Describe Intent

I want to…

• Block non-production apps talking to production apps
• Allow HR apps to use the employee database
• Block all HTTP connections that are not destined to web servers
• Allow and notify me when a new app requests DNS server access
• Block and notify me when a new app requests AD server access

Security

Intent is rendered as security rules in native host firewalls. Same level of security, any infrastructure.

[Figure: an End Point running an Application Process, with Denies and Allows applied at the end point, on top of the Infrastructure]

[Figure: the same end point model (Process, Application, Denies, Allows) shown on Virtual (Hypervisor, Virtual Network), Bare metal (Network Infrastructure) and Cloud (Cloud Infrastructure)]

• Any Infrastructure
• Any Networking
• Same Security Model
• Rich Context

Mobility

Intent stays with the endpoint, no matter the infrastructure it resides on

Tetration calculates all necessary rule changes and automatically applies them

[Figure: an endpoint (EP) moving between infrastructures (7K, 5K, 2K, Cloud), with labels VLANs, Interfaces, Subnets, Security Groups, ACLs, Security Rules]

How Does It Work?

Tetration automatically converts your intent into blacklist and whitelist rules

Intent: Block non-production apps talking to production apps
Rules: DENY SOURCE 10.0.0.0/8 DEST 128.0.0.0/8

Intent: Allow HR apps to use the employee database
Rules: ALLOW SOURCE 128.0.10.0/16 DEST 128.0.11.0/16

Intent: Block all HTTP connections that are not destined to web servers
Rules: ALLOW SOURCE * DEST 128.0.100.0/16 PORT = 80
       DENY SOURCE * DEST * PORT = 80
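The last intent shows why rule order matters: the specific ALLOW has to be checked before the general DENY. Below is a minimal first-match evaluator for those two rules. First-match ordering is an assumption of this sketch, not something the slide states, and the prefixes are kept as written on the slide, so they are parsed with `strict=False` because they have host bits set.

```python
import ipaddress

# (action, source, destination, port) in the order shown on the slide.
RULES = [
    ("ALLOW", "*", "128.0.100.0/16", 80),
    ("DENY", "*", "*", 80),
]


def matches(pattern, address):
    """True if the address falls inside the pattern ('*' matches anything)."""
    if pattern == "*":
        return True
    # strict=False accepts prefixes written with host bits set, as on the slide.
    return ipaddress.ip_address(address) in ipaddress.ip_network(pattern, strict=False)


def decide(src, dst, port, rules=RULES, default=None):
    """Return the action of the first rule that matches, or the default if none does."""
    for action, rule_src, rule_dst, rule_port in rules:
        if matches(rule_src, src) and matches(rule_dst, dst) and rule_port == port:
            return action
    return default


to_web_server = decide("10.1.1.1", "128.0.100.7", 80)
to_other_host = decide("10.1.1.1", "10.2.2.2", 80)
not_http = decide("10.1.1.1", "10.2.2.2", 22)
```

HTTP to a web server address is allowed, HTTP to anything else is denied, and non-HTTP traffic is left to whatever default applies.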

Enforcement

• Enforcement happens at the endpoint level
  • iptables on Linux
  • Advanced Firewall on Windows

• It can be enabled / disabled at the endpoint level (from Tetration)
  • Monitoring or Monitoring + Enforcement
  • Cannot be reverted without removing the agent

• Enforcement runs as a separate process for compliance reasons
  • Proving the agent does not run

• A complement to, not a replacement for, infrastructure enforcement
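To give a feel for what "rendered as rules in the native host firewall" means on Linux, the sketch below turns a whitelist entry into an iptables command line. It only builds strings and does not execute anything; the rule dictionary shape, the choice of the INPUT chain and the mapping of non-ALLOW actions to DROP are assumptions for the example, not what the enforcement agent actually programs.

```python
PROTO_NAMES = {1: "icmp", 6: "tcp", 17: "udp"}  # IANA protocol numbers


def to_iptables(rule, chain="INPUT"):
    """Build an iptables command string for one rule (illustrative only)."""
    proto = PROTO_NAMES[rule["proto"]]
    parts = ["iptables", "-A", chain, "-s", rule["src"], "-p", proto]
    if proto in ("tcp", "udp"):
        low, high = rule["port"]
        # iptables writes a port range as low:high.
        parts += ["--dport", str(low) if low == high else f"{low}:{high}"]
    parts += ["-j", "ACCEPT" if rule["action"] == "ALLOW" else "DROP"]
    return " ".join(parts)


web_rule = {"src": "10.0.0.0/8", "proto": 6, "port": [80, 80], "action": "ALLOW"}
command = to_iptables(web_rule)
print(command)
```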

Application Centric, Okay but how do I get there? Enhanced Security Services Application and Infrastructure Optimization

Tenant and Application Security Requirements and Enforcement

Application Dependency Mapping

Automated Whitelist Policy Generation

Network Forensics

Policy Simulation and Impact Assessment

Policy Compliance and Auditability

Rich Telemetry Data from Hardware (Nexus 9000) and Software Sensors enables discovery and security monitoring

Data Center Vision Inter-dependent feedback loops

1. Deployment and Provisioning: ACI, YANG (Intent Based Automation); Cisco CloudCenter (Common Application Consumption across Hybrid IT)

2. Operations and Management: Tetration Analytics (Machine Learning Based Operations and Security); Assurance (Formal Methodologies)

[Figure labels: Infrastructure Automation, Security, Guarantees, Deployment, Compliance, Consistency, ADM, Security Forensics]

Summary

• Pervasive flow telemetry that supports infrastructure for multiple data centers at scale
• Ready-to-use solution to address critical data center operational use cases
• Self-monitoring and eliminate the need for in-house big data expertise
• Open platform and northbound APIs enable transparent integration
• Accelerated adoption and comprehensive Solution support with Services

Complete Your Online Session Evaluation

• Please complete your Online Session Evaluations after each session
• Complete 4 Session Evaluations & the Overall Conference Evaluation (available from Thursday) to receive your Cisco Live T-shirt
• All surveys can be completed via the Cisco Live Mobile App or the Communication Stations

Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLive.com/Online

Continue Your Education

• Demos in the Cisco campus

• Walk-in Self-Paced Labs

• Lunch & Learn

• Meet the Engineer 1:1 meetings

• Related sessions

Q & A

Thank You