Tetration Analytics - Network Analytics & Machine Learning Enhancing Data Center Security and Operations

Michael Herbert, Principal Engineer, INSBU
BRKACI-2040

Okay, what does Tetration mean?

• Tetration (or hyper-4) is the next hyperoperation after exponentiation, and is defined as iterated exponentiation

• It’s bigger than a Google [sic]

• And yes the developers are a bunch of mathematical geeks
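To make "iterated exponentiation" concrete, here is a minimal sketch. The function name `tetration` and the sample values are illustrative choices, not something taken from the slides.

```python
def tetration(base: int, height: int) -> int:
    """Return a power tower made of `height` copies of `base`, evaluated top down."""
    result = 1
    for _ in range(height):
        # Each pass adds one more level to the tower: base ** (previous tower).
        result = base ** result
    return result


print(tetration(2, 3))  # 2 ** (2 ** 2)
print(tetration(2, 4))  # 2 ** (2 ** (2 ** 2))
# A tower of five 2s is already far larger than a googol (10 ** 100).
print(tetration(2, 5) > 10 ** 100)
```

Heights much beyond this are impractical to evaluate: the numbers grow faster than any fixed exponential.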

BRKDCN-2040 © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Public

What if you could actually look at every process and every data packet header that has ever traversed the network without sampling?

Cisco Tetration Analytics: Pervasive Sensor Framework

Provides correlation of data sources across entire application infrastructure

Enables identification of point events and provides insight into overall systems behavior

Monitors end-to-end lifecycle of application connectivity

Cisco Tetration Analytics: Policy Discovery and Observation

Application Workspaces

Public Cloud

Private Cloud

Cisco Tetration Analytics™ Application Segmentation Policy

Profile and Context Driven Application Segmentation

1. Real-time Asset Tagging
• Cisco Tetration Sensors + Customer Defined
2. Policy Workflows
• Cisco Tetration Application Insights (ADM) + Tag and Label-Based Add-on Policy (For Example, Mail Filters)
3. Policy Enforcement (Role Based and Hierarchical)
• No Need to Tie Policy to IP Address and Port
• Cisco Tetration Platform Performs the Translation

Compliance Monitoring Enforcement

Public Cloud, Bare Metal, Virtual, Cisco ACI™*, Traditional Network*

Tetration Analytics: Open Access

Northbound consumers and applications reach the Cisco Tetration Analytics Platform through three interfaces:

• REST API (programmatic interface)
  - Tetration flow search
  - Sensor management
• Push Notification (Kafka Broker, message publish)
  - Out-of-box events
  - User defined events
• Tetration Apps
  - Access to data lake
  - Write your own application

Tetration Analytics Platform Architecture - Sensors

Tetration Analytics Architecture Overview

• Data Collection: Software Sensor and Enforcement; Embedded Network Sensors (Telemetry Only); Third Party Sources (Configuration Data)
• Analytics Engine: Cisco Tetration Analytics Cluster
• Open Access: Web GUI; REST API; Event Notification; Tetration Apps

• Self Managed Cluster
• No Hadoop / Data Science Background Needed
• Easy Integration via Open interfaces
• One Touch Deployment
• No External Storage Needed
• Open Data Lake (via Tetration Apps)

Traditional Monitoring Is Showing Its Age: Not Suited for Modern Network and Security Operations

Where Data Is Created → Where Data Is Useful
• SNMP → SNMP Server
• Syslog → Syslog Collector
• CLI → Scripts

• Non real time
• Strong burden on back-end (Storage & Analysis)
• Normalize different encodings, transports, data models, timestamps

Data Granularity Needs to Improve: One Minute SNMP Polling

Telemetry – 10 Second Push SNMP – 1 Minute Polling

Data Granularity Needs to Improve: 10 Second SW Process Push

Telemetry – 10 Second Push

Data Granularity Needs to Improve: Sub Second HW/SW Push

Data Granularity Needs to Improve: Type of Problems Customers are Looking to Address

Workload Placement

Service Level Monitoring

ADM

Security and Policy Enforcement

Microburst Detection Traffic Engineering

Capacity Planning

Troubleshooting & Remediation (Self Driving)

On-Change <= 1 sec ~10s sec ~minutes-hours

Resolution = Frequency of Data Collection

Processing on the Source Device is Expensive, e.g. Consider Flow Collection Efficiency

512K Flow Table
• Collect and Keep all Flow Data in the Local Hardware or Software Flow Table
• Size of the Table depends on the Data Rates and Connectivity Density
• BW is Growing Faster than Memory (Cost of Flow Entry per Gbps is not flat)

Sampled Flow Data
• Sampling Flows Reduces Cost of the Telemetry but Reduces Accuracy

Flow Cache with streaming export
• Maintain a small ‘cache’ and export the cache at a high data rate
• Shift the cost of aggregation to backend resources
• Aggregate ‘Flow Table’ can be much larger

The Richer the Data Sources the Better: More Data == Better Interpolation

Lamp Sensor Plug Sensor

Heater

The Richer the Data Sources the Better: You don’t always know what you need in advance

• On-Box Filtering Loses Data
• Can’t Change Your Mind About What’s Important Later
• Can’t Scale Out Embedded Processing
• Compression (Lossless) is Good
• Massive Amounts of Data Motivate the Shift in Collection
• Bulk Collection is Efficient
• Bulk Processing/Export Not So Much

Streaming Telemetry is a game changer: Monitoring becomes a big data problem

Where Data Is Created → Where Data Is Useful

Removing limitations and complexity

• Streaming paradigm
• Dense Sensor Framework
• Increased Data Granularity
• Update on every event
• Multiple Data Sources

Real time
• Volume – Scale of Data
• Velocity – Analysis of Streaming Data
• Variety – Different Forms of Data

Big Data and Machine Learning Problem

Pervasive Sensors

Software Sensors (Available Now)
• Linux VM
• Windows Server VM
• Bare Metal (Linux and Windows Server)
• Universal* (Basic Sensor for other OS)

Network Sensors (Next Generation 9K switches)
• Nexus 9200-X
• Nexus 9300-EX/FX

Third Party Sources (3rd party Data Sources)
• Asset Tagging
• Load Balancers
• IP Address Management
• CMDB
• …

*Note: No per-packet Telemetry, Not an enforcement point

• New! Enforcement Point (Software agents)
• Low CPU Overhead (SLA enforced)
• Highly Secure (Code Signed, Authenticated)
• Low Network Overhead (SLA enforced)
• Every Flow (No sampling), NO PAYLOAD

Software Sensor

• Runs in the Host OS, not the Hypervisor
• Access to accurate state of the application and all connectivity
• Not in the data path
• Sits in User Space
• Designed by Kernel Developers
• Secure
  - Code Signed
• SLA Enforcement
  - CPU and BW throttling

[Diagram: Application and Tetration Sensor (libpcap) above the Network Stack, Driver, and NIC]

Software Sensor Enforcement

• When leveraging the enforcement capability, an additional component is downloaded from the Cluster to the existing sensors
• Monitoring and Enforcement are distinct functions with distinct threads (the enforcement code does not exist on the server until explicitly pushed)
• Agent will implement privilege separation
  - SSL libraries would run in low privilege space
  - /proc parsing in high privilege space
  - Enforcement in high privilege space

[Diagram: Monitoring and Enforcement processes, each split into a high-privilege part (Collection, Enforcement) and a low-privilege Cluster Link]

PKI within the Cluster/Sensor

• Tetration Cluster runs an internal PKI
  - Root CA is per cluster, inserted at Image creation
  - Not accessible outside the cluster
  - Cannot connect to an external PKI

• Certificate based authentication is performed for the Control Channel
  - CN of the certificate is the IP address
  - Certificates are rotated every 60 days

• Sensors are code signed
  - Signature Authority is Cisco’s code signing certificate
  - Code Signature is validated at process start

How Do Sensors Communicate with the Cluster the First Time?

1. Sensor registers with the web server (Rails) via SSL and is assigned a UUID
2. Sensor downloads its config from the Config Server
3. Sensor sends metadata to the Collectors

Components & Communication: Software Sensor

Software Sensor/Agent (Linux/Windows/…) to Tetration Cluster:
• Agent Communication: Unix Socket
• Control Channel: TCP-SSL 443
• Sensor Data: TCP-SSL 5640

• When used, policies pushed from the cluster are pairwise signed with TS (Replay protected) between Cluster and sensor agent
• If rules are changed on the end host, the Enforcer restates the rules and sends a Notification to the Controller

Tetration Sensor Overhead (e.g. 2263 sensors)

• CPU utilization on Host Sensor based on current deployments averages < 1%
• Flow collection has zero impact on switch hardware sensor CPU
• Network Overhead is ~1% of observed traffic load

Tetration Host Sensor Has Three Rate Limiting Modes

Top
• Uses no more CPU % than the given limit on any single core
• For example, 3% limit on a 10 core system = 3% out of total 1,000% available
• This is a fairly restrictive mode and would be suggested only when necessary

Adjusted
• Takes the provided limit and multiplies it by the number of cores available to the system
• For example, 3% limit on a 10 core system = 30% out of total 1,000% limit
• This is the default profile (set to 3%), and it’s recommended to use this profile unless necessary

Disabled
• Use in hosts where telemetry MUST be collected
• No CPU % limit, will take as much as necessary to capture each and every packet
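The arithmetic behind the three modes can be sketched as follows. The function name `cpu_budget_percent` and the lowercase mode strings are hypothetical labels for illustration; they are not sensor configuration keys.

```python
from typing import Optional


def cpu_budget_percent(limit_percent: float, cores: int, mode: str) -> Optional[float]:
    """Return the CPU budget in percent-of-one-core units, or None when unlimited.

    A 10 core system has 1,000% available in these units.
    """
    if mode == "top":
        # The limit applies as-is, regardless of how many cores exist.
        return limit_percent
    if mode == "adjusted":
        # The limit is multiplied by the number of cores.
        return limit_percent * cores
    if mode == "disabled":
        # No CPU limit at all.
        return None
    raise ValueError(f"unknown mode: {mode}")


for mode in ("top", "adjusted", "disabled"):
    print(mode, cpu_budget_percent(3, 10, mode))
```

With the slide's example (3% limit, 10 cores) this gives 3 for Top, 30 for Adjusted, and no limit for Disabled.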

Software Sensor Support (Q2CY17)

Full Sensor
• RHEL (64 bit) – 5.x, 6.x, 7.x
• CentOS (64 bit) – 5.x, 6.x, 7.x
• Oracle Linux (64 bit) – 6.x, 7.x
• SUSE – 11.2, 11.3, 11.4, 12.1, 12.2
• Ubuntu – 12.04, 14.04, 14.10
• Windows Server 2008 R1/R2 | Essentials | Standard | Enterprise | DataCenter
• Windows Server 2012 R1/R2 | Essentials | Standard | Enterprise | DataCenter

Universal Sensor
• Mainframes: AIX-ppc 5.3, 6.1, 7.1, 7.2 (trial)
• Solaris (x86_64)
• RHEL 4.x, 5.x (32 bit -386/amd)
• CentOS - 4.x, 5.x (32 bit)
• Windows XP, 2003 (32 bit)
• Windows Server 2008 (32 bit)

Hardware Sensor: Direct Export of the Hardware State

• Monitor SW State such as BGP, EthPM, STP (polled, timer driven, on demand, …)
• CPU sources the SW Telemetry Data (everything not in the HW export)
• Configure Required Telemetry (Process State, Flow Cache, Events, SSX)
• Configure Desired Triggers (Events, Flows, …)
• ASIC Directly Transmits HW Telemetry Data (Timer and Event Triggers)

Hardware Sensor: EX and FX series Nexus 9000

• Embedded Module (Flow Cache)
  - Nexus 92xxx-EX
  - Nexus 93xxx-EX
  - Nexus 93xxx-FX

• Extracts Meta-Data from the forwarding pipeline
• No latency impact, no performance impact

Flow Cache

PRX LUA LUB LUC

Components & Communication: Hardware Sensor

Cisco Nexus 9000 (NXOS, Agent in Guest Shell, ASIC) to Tetration Cluster:
• Control Channel: TCP/443
• Agent Communication: Unix Socket
• Sensor Data: UDP/5640

Download TaAgent

• The cluster specific TaAgent rpm file is available for download from the UI.

Upload the RPM to APIC

Enabling Leaf Switches for Analytics

• Tetration is supported only on EX (and newer) leaf switches
• EX switches can run in one of two modes
  - Analytics Mode (Tetration)
  - NetFlow Mode
• The default mode is Analytics.
• A Node Control Policy should still be created to enable Analytics Priority for consistency
• Node Control Policies are configured under Leaf Switch Policy Group
• An Analytics policy needs to be created to specify the TA Cluster IP.

[Screenshot labels: Fabric Policies, Switch Policies, Policies, Fabric Node Controls, Node Control Policy, Analytics Priority, Policy Groups, Leaf Switch Policy Group, Profiles, Leaf Switch Profile, Leaf Selectors]

Analytics policy

Create Tenants and VRFs in Tetration

The IDs need to be unique in TA. There is no relationship with the ACI VRF ID.

Create Tenants and VRFs in Tetration

• Data should be available in the flow search after a few minutes

• Don’t forget to change to your scope

Collection Rules

• Ability to select which subnets need to be analyzed by the switch

• Denied packets will be neither exported to the cluster nor analyzed

• Collection rules are configured per scope/VRF (See limitation slide)

• If the switch version does not support collection rules, all flows are inspected, and ta_agent.log will show “Switch does not support filtering.”
• Versions that support collection rules:
  - Standalone: 7.0(3)I5(2) or later
  - ACI: 12.2.2* or later

• Collection rules apply to both Hardware and Software sensors
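As a rough illustration of subnet-based collection rules, the sketch below evaluates an ordered rule list with first-match-wins semantics. The rule format, the `should_collect` name, the example subnets, and the matching semantics are all assumptions made for this example; the slides do not describe how Tetration evaluates its rules internally.

```python
import ipaddress

# Hypothetical ordered rule list: (subnet, action). First matching rule wins.
RULES = [
    (ipaddress.ip_network("10.1.0.0/16"), "include"),
    (ipaddress.ip_network("0.0.0.0/0"), "exclude"),
]


def should_collect(src_ip: str, dst_ip: str, rules=RULES) -> bool:
    """Return True if a flow between src_ip and dst_ip should be analyzed."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for subnet, action in rules:
        # A rule matches when either endpoint falls inside its subnet.
        if src in subnet or dst in subnet:
            return action == "include"
    # No rule matched (or no rules configured): inspect everything.
    return True


print(should_collect("10.1.2.3", "192.168.0.5"))
print(should_collect("172.16.0.1", "192.168.0.5"))
```

The fall-through default mirrors the slide's statement that all flows are inspected when filtering is not in effect.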

Configuring Rules

Verification

• By now the switches should appear in Tetration

Hardware Sensor: EX and FX series Nexus 9000

• Nexus 9000 HW sensor supported in Tetration 2.0 release (Q2CY17)

• Support in NX-OS Mode
  - Cisco NX-OS Release 7.0(3)I5(2) adds filtering support
  - https://techzone.cisco.com/t5/Tetration-Analytics/Installation-and-configuration-of-Hardware-sensor-on-standalone/ta-p/1010838

• Support in ACI Mode
  - Cisco ACI release 2.3 adds filtering support
  - Cisco ACI release 2.4 adds additional statistics
  - http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/kb/b_KB_Configuring_Tetration.html
  - https://techzone.cisco.com/t5/Tetration-Analytics/Tetration-Deep-Dive-Network-Connectivity-Hardware-Sensors/ta-p/975945

Universal Sensors

• Supporting any legacy operating system…
  - AIX
  - Solaris
  - Windows Server 2003

• Process and connection tracked with a lower granularity

• Enables accurate application dependency mapping and policy generation

• No per-packet telemetry, not an enforcement point

Tetration Telemetry: ERSPAN Option (Target Q3CY17)

Expanded Telemetry Collection Option
• Augment telemetry from other parts of the network
• Useful when software sensor or hardware sensor is not feasible
• Dedicated VMs on each host with 10 software sensors each
• Each sensor binds to separate vNIC
• ERSPAN terminates on the VM vNIC
• Each sensor terminates one ERSPAN session
• Sensor generates telemetry based on the data plane traffic

[Diagram: Production Network Layer-3 Switch sends ERSPAN over an L3 Connection; Tetration Telemetry flows to Cisco Tetration Analytics™]

Tetration Analytics Platform Architecture - Sensor Data

Telemetry Means Different Things to Different People – Device State

• Device State Telemetry
  - What is happening in the Switch/Router
  - What is happening between Devices
  - What is happening in the Network

Know the Network and infer the health of the application based on the state of the devices

[Diagram: Device and Network view across four devices, each with Network, Data Link, and Physical layers]

Telemetry Means Different Things to Different People – Application State

Know the application and infer the health of the infrastructure based on the state of application connectivity

• Application State Telemetry
  - What is happening in the Operating System
  - What is happening in the Process (JVM)
  - What is happening in the Server I/O path

[Diagram: two hosts, each with Application, Process, and Sockets layers]

Tetration's View of Telemetry: Application Processes, Sockets and Context

Consumer Process (Chrome, socket > 1023) → Provider/Service Process (NGINX, socket = 443)

• Application developers implement business logic as code that runs as processes and threads
• TCP/IP, which forms a foundation of the Internet, was designed to allow these application processes to interact via sockets
• Application logic can be viewed on one level as the interaction between a group of processes and their associated sockets
• Understanding the inter-process communication and mapping that directly to the infrastructure provides a direct correlation between the application and the infrastructure

Tetration's View of Telemetry: Application Processes, Sockets and Context

Consumer Process (Chrome, socket > 1023) → Provider/Service Process (NGINX, socket = 80)

Consumer (client):

    import socket

    # create an INET, STREAMing socket
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # now connect to the web server on port 80 - the normal http port
    s.connect(("www.mcmillan-inc.com", 80))

Provider (server):

    import socket

    # create an INET, STREAMing socket
    serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # bind the socket to a public host, and a well-known port
    serversocket.bind((socket.gethostname(), 80))
    # become a server socket
    serversocket.listen(5)

What do we mean by Application Visibility: Internet Stack

[Diagram: two hosts with Application, Process, Sockets, Transport, Network, Data Link, and Physical layers, connected through two network devices with Network, Data Link, and Physical layers]

What Does the Tetration Sensor Collect: Socket Connectivity, the data flows

[Diagram: two hosts with Application, Process, Sockets, Transport, Network, Data Link, and Physical layers, connected through two network devices with Network, Data Link, and Physical layers]

What does the Sensor Collect

Context
• Device Information: Buffer/ACL Drops, etc.
• Process Information: Which process is it, who started it, etc.

[Diagram: two hosts with Application, Process, Sockets, Transport, Network, Data Link, and Physical layers, connected through two network devices with Network, Data Link, and Physical layers]

Sensor Data: Process Information

• Host Sensor collects information about the consumer and provider processes
• /proc
  - runtime system information (e.g. system memory, devices mounted, hardware configuration, etc).
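As an illustration of the kind of socket state /proc exposes on Linux, the sketch below decodes one entry in the `/proc/net/tcp` format. The slides do not say which /proc files the sensor reads, so treat this as an example of the data source, not a description of the sensor. `SAMPLE_LINE` is a made-up entry, the helper names are hypothetical, and the address decoding assumes a little-endian host.

```python
import socket

# Made-up entry in /proc/net/tcp format: a listener on 127.0.0.1:80.
SAMPLE_LINE = (
    "   0: 0100007F:0050 00000000:0000 0A 00000000:00000000 "
    "00:00000000 00000000     0        0 12345 1 0000000000000000 100 0 0 10 0"
)

# Only the two states used in this example; other codes are passed through.
TCP_STATES = {"01": "ESTABLISHED", "0A": "LISTEN"}


def decode_endpoint(hex_endpoint: str):
    """Turn 'HEXIP:HEXPORT' into ('a.b.c.d', port)."""
    hex_ip, hex_port = hex_endpoint.split(":")
    # The IPv4 address is stored byte-reversed on little-endian hosts.
    ip = socket.inet_ntoa(bytes.fromhex(hex_ip)[::-1])
    return ip, int(hex_port, 16)


def parse_proc_net_tcp_line(line: str) -> dict:
    fields = line.split()
    return {
        "local": decode_endpoint(fields[1]),
        "remote": decode_endpoint(fields[2]),
        "state": TCP_STATES.get(fields[3], fields[3]),
        "inode": int(fields[9]),  # socket inode, usable to find the owning process
    }


print(parse_proc_net_tcp_line(SAMPLE_LINE))
```

On a live Linux host, the socket inode can be matched against the links under `/proc/<pid>/fd` to tie a connection to the process that owns it.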

Additional Context: External Data Sources

• External Data Repositories: CMDB, DNS, whois, etc. (Talos, …, future)
• Annotation and Operations Data: APIC
• Pervasive Sensors

[Diagram: these sources feed the Tetration Analytics Engine alongside the host and network layer stack]

What does the Sensor Collect: Socket Level Flow Information + Context Information

• Understanding of what happens TO ‘and’ INSIDE a flow
• Distributions (packet sizes, TCP windows…)
• Burstiness
• Anomaly detection
• Latency (application and network)
• Events
• VXLAN information

[Figure: Per Packet Variations (e.g. packet lengths 66 and 9000) and Accumulated Flow Information (Volume…)]

Full vs. Sampled: What happens when you sample?

Full Packet Stream

Flow A

Flow B

Flow C

SYN SYNACK ACK FIN Flow D

Full vs. Sampled: Reasons and Use Cases for Both

Sampled
• Sampling has its use cases, in SP environments for example
• High Volume, no behavioral analysis
• Sampling provides a good statistical model
  - For Trends
  - For Traffic Visibility
  - For Volume Indication
• Entropy is very important

Full
• Depending on the [number] of flows and type of flows
• Mice flows can go completely unseen
• Connection Oriented flows may not be tracked properly (missed flags)
• Accuracy of the flow increases with the packet count
• Type of sampling and quality of entropy
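A small simulation shows how sampling can miss mice flows and their TCP flags entirely. The packet stream, the flow names, and the deterministic 1-in-20 sampler are invented for illustration; real samplers are often randomized.

```python
from collections import defaultdict


def build_stream():
    """One long 'elephant' flow with two three-packet 'mice' flows mixed in."""
    stream = []
    for i in range(100):
        stream.append(("elephant", "ACK"))
        if i == 10:
            stream += [("mouse-1", "SYN"), ("mouse-1", "ACK"), ("mouse-1", "FIN")]
        if i == 50:
            stream += [("mouse-2", "SYN"), ("mouse-2", "ACK"), ("mouse-2", "FIN")]
    return stream


def summarize(packets):
    """Accumulate packet count and the set of TCP flags seen per flow."""
    flows = defaultdict(lambda: {"packets": 0, "flags": set()})
    for flow_id, flag in packets:
        flows[flow_id]["packets"] += 1
        flows[flow_id]["flags"].add(flag)
    return dict(flows)


def sample_one_in_n(packets, n):
    """Keep every n-th packet, starting with the first."""
    return packets[::n]


stream = build_stream()
full = summarize(stream)
sampled = summarize(sample_one_in_n(stream, 20))

print("full view:   ", sorted(full))
print("sampled view:", sorted(sampled))
```

In this stream the sampled view contains only the elephant flow: neither mouse flow, nor any SYN or FIN, is ever observed.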

Tetration Examines Every Packet

Full Packet Stream

• Variability ‘within’ the flow

• Variability ‘between’ the flows

• Changes ‘within’ the flow

Collects the Meta-Data not the Packet

Meta-Data – Including Overlay VXLAN/GRE/IPinIP Encapsulated Header

• Ethernet Header | IP Header | UDP Header | VXLAN Header | Ethernet Header | IP Header | TCP Header | Payload
• Ethernet Header | IP Header | TCP Header | Payload
• Ethernet Header | IP Header | UDP Header | Payload

Payload: Privacy Risk & Highly Likely to be Encrypted

Sensor Data: Flow Data – Forwarding

• COS

• Overlay Type (Native, 802.1q / 802.1p, VXLAN, iVXLAN, NVGRE, NSH, other)

• Source TEP or Port ID

• Destination TEP

• Disposition (RPF or Port Security failure, Policy drop, redirect or span)

• Port type (spine to leaf or leaf to host)

Sensor Data: Accumulated Flow Information

• Bytes, Packet Count
• IP options present
• IP length error
• DF bit set
• Fragment seen
• Last TTL
• Accumulated TCP flags
• Last ACK / SEQ
• Sampled Packet length
• Sampled Packet ID

Sensor Data: Burst

• Measure the “burstiness” of a flow
  - Current Burst
  - Max Burst
  - Burst Index
  - Flowlets
• Bursts are measured in 32k interval
• Each export period is divided by 128
• Flowlets are activity after a silence period (configurable)

Timeline (interval ticks 0, 1, 2, 3, 30, 80, 128; Flowlet #1, Silence, Flowlet #2):

Current | Max | Burst Index
128 | 128 | 0
256 | 256 | 3
32 | 256 | 3
1024 | 1024 | 80
0 | 1024 | 80

Max Burst occurred at 62.5ms with a value of 1024 and 2 flowlets
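The running Max Burst and Burst Index values in the timeline above can be reproduced with a short sketch. Two things here are assumptions: the mapping of the four active readings to intervals 0, 3, 30, and 80 (read off the tick labels), and the 100 ms export period, which is inferred from the slide's 62.5 ms figure (interval 80 of 128). Flowlet counting is left out because the silence threshold is not given.

```python
INTERVALS_PER_EXPORT = 128   # from the slide: each export period is divided by 128
EXPORT_PERIOD_MS = 100.0     # assumption: inferred from 80 / 128 * period = 62.5 ms


def running_burst_states(activity: dict):
    """Yield (current, max_burst, burst_index) for each active interval, in order."""
    max_burst, burst_index = 0, 0
    for index in sorted(activity):
        current = activity[index]
        if current > max_burst:
            # A new maximum: remember its size and the interval where it occurred.
            max_burst, burst_index = current, index
        yield current, max_burst, burst_index


def burst_offset_ms(burst_index: int) -> float:
    """Convert an interval index into a time offset within the export period."""
    return burst_index * EXPORT_PERIOD_MS / INTERVALS_PER_EXPORT


# Interval index -> burst size, as read from the timeline (assumed mapping).
activity = {0: 128, 3: 256, 30: 32, 80: 1024}
states = list(running_burst_states(activity))
for state in states:
    print(state)

_, final_max, final_index = states[-1]
print(final_max, burst_offset_ms(final_index))
```

The last line corresponds to the slide's summary: a max burst of 1024 at 62.5 ms.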

Sensor Data: Anomaly List

• TTL changed
• TCP flags are zero’d
• IP reserved flags are not 0
• TCP SYN with data
• DF bit has changed
• TCP FIN with no ACK
• TCP RST with no ACK
• Ping of death
• TCP SYN, FIN, RST and ACK zero’d
• Fragment is too small to contain L4 header (TCP, UDP and SCTP)
• URG set but no URG pointer
• TCP SYN and FIN are set
• URG pointer with no URG flag
• TCP SYN and RST are set
• TCP seq outside the expected range
• TCP FIN, PSH and URG are set
• TCP seq is less than expected (rexmit)
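Several of the anomalies above depend only on a single packet's TCP header and can be expressed as simple bit tests. The sketch below covers that subset using the standard TCP flag bit values; the function name and the anomaly labels are illustrative, and this is not the sensor's implementation.

```python
# Standard TCP header flag bits.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def tcp_flag_anomalies(flags: int, urgent_pointer: int = 0, payload_len: int = 0) -> list:
    """Return labels for the header-only anomalies found in one TCP segment."""
    found = []
    if flags == 0:
        found.append("flags_zeroed")
    if (flags & SYN) and payload_len > 0:
        found.append("syn_with_data")
    if (flags & FIN) and not (flags & ACK):
        found.append("fin_without_ack")
    if (flags & RST) and not (flags & ACK):
        found.append("rst_without_ack")
    if (flags & URG) and urgent_pointer == 0:
        found.append("urg_flag_without_pointer")
    if not (flags & URG) and urgent_pointer != 0:
        found.append("urg_pointer_without_flag")
    if (flags & SYN) and (flags & FIN):
        found.append("syn_and_fin")
    if (flags & SYN) and (flags & RST):
        found.append("syn_and_rst")
    if (flags & FIN) and (flags & PSH) and (flags & URG):
        found.append("fin_psh_urg")
    return found


print(tcp_flag_anomalies(ACK))
print(tcp_flag_anomalies(SYN | FIN))
print(tcp_flag_anomalies(FIN | PSH | URG))
```

Checks such as "TTL changed" or "TCP seq outside the expected range" need per-flow state and are not covered here.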

Sensor Data: TCP Timing Data

• Application Latency: How long did it take for the inbound data to be ACK’d
• SRTT Latency (Process to Process at the TCP level)
• Port to Port Latency (Requires HW support)

[Diagram: two hosts with Application, Process, Sockets, Transport, Network, Data Link, and Physical layers, connected through two network devices with Network, Data Link, and Physical layers]

Sensor Timing Data: Network Performance Monitoring Example

• The host calculates round trip time as 8 milliseconds
• The port-to-port network latency is 252 microseconds
• The app took 15 seconds to return the ACK

AppDynamics or Tetration?

They actually make sense together: different problems will need different data sources

• Application (Application, Process, Sockets, Transport layers): Health, Performance, Monitoring, Security, Discovery, Troubleshooting
• Network (Network, Data Link, Physical layers): Health, Performance, Monitoring, Capacity

How do you follow the customer journey?

[Diagram: business transactions (Login, Search, Process payment, Withdraw funds) traversing networks, private clouds, and ESB/MQ tiers]

How do you know what services are running in your data center?

Two best in class solutions, one full picture of your apps

End User

eCommerce Application

Database Code Platform Load Balancer

Infrastructure (Compute, Network, Storage, Security)

Data Center Data Center Public Cloud Private Cloud

Tetration Analytics Platform Architecture - Cluster

Tetration Analytics Architecture Overview

• Data Collection: Host Sensors (VM, Tetration Telemetry); Network Sensors (Cisco Nexus® 92160YC-X, Cisco Nexus 93180YC-EX); 3rd-Party Metadata Sources (Configuration Data)
• Analytics Engine: Cisco Tetration Analytics™ Platform
• Visualization and Reporting: Web GUI; REST API; Push Events

The Analytics Cluster

Components
• Hadoop Based Platform
  - Self managed
  - One touch deployment
• Tiered System
  - Heavy Compute for Machine Learning
  - Caching for light speed queries
• Extensibility (future)
  - Messaging Bus
  - API Access

[Diagram tiers: Front End; Compute (Data Cleaning and Analytics); Caching (Search); Long Term Storage (Data Lake)]

The Analytics Cluster: Appliance

• The Analytics Cluster operates as an appliance
  - Avoids the need for in house Big Data, Analytics expertise
  - Supported by Cisco TAC

• Self Monitoring
  - The cluster leverages a sensor architecture to track its state and provides event based notifications for

• Software upgrades and full install are all automated

Cluster Monitoring and Maintenance

Collector Monitoring and Maintenance

Sensor Monitoring and Maintenance: Sensor Throttled

Hardware Sensor Monitoring

Tetration 1.0 Analytics Cluster Configurations

• 4 x 3-Phase PDU, 22.5 KW Peak Power
• 4 x 1-Phase PDU, 11.5 KW Peak Power

Tetration Analytics 2.0: Deployment Options

On-Premise Options

Cisco Tetration Analytics (Large Form Factor)
• Suitable for deployments more than 1000 workloads
• Built in redundancy
• Scales up to 10,000 workloads
Includes:
• 36 x UCS C-220 servers
• 3 x Nexus 9300 switches

Cisco Tetration-M (Small Form Factor)
• Suitable for deployments under 1000 workloads
Includes:
• 6 x UCS C-220 servers
• 2 x Nexus 9300 switches

Public Cloud

Cisco Tetration Cloud
• Software deployed in AWS
• Suitable for deployments under 1000 workloads
• AWS instance owned by customer

Analytics Engine

The Platform
• Hadoop Based Platform
  - Self managed
  - One touch deployment
• Tiered System
  - Heavy Compute for Machine Learning
  - Caching for light speed queries
• Extensibility (future)
  - Messaging Bus
  - API Access

[Diagram tiers: Front End; Compute (Data Cleaning and Analytics); Caching (Search); Long Term Storage (Data Lake)]

Front End: GUI, RESTful API

• Servers hosting front end processes
  - GUI and Operational Interfaces
  - RESTful API

Data Processing Pipeline

• Data Ingest and Processing

• Multiple Pipelines for different processing activities

• Scaled to Millions of events per second

Caching Layer: Natural Language Search

Caching Layer: Search

• Caching Layer provides a large in memory and flash based data store for real time searches e.g. 16 weeks of policy delta data accessible for real time search

Data Lake: HDFS Storage

• Long Term Storage for collected observations, for pipeline processing tasks, etc

• Usage is based on
  - Time Based Retention
  - Space Based Retention
  - Greedy Retention

• Max possible Retention period will depend on cluster size and observation rate

14.10 K hours of available capacity at the current collection rates (587 days)
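The conversion behind the figure above, plus a simple way such a capacity estimate could be derived, can be sketched as follows. `hours_of_capacity` is a hypothetical model (free space divided by ingest rate) with made-up inputs; the slides do not say how the cluster computes its estimate.

```python
def retention_days(capacity_hours: float) -> float:
    """Convert hours of remaining capacity into days."""
    return capacity_hours / 24


def hours_of_capacity(free_bytes: float, ingest_bytes_per_hour: float) -> float:
    """Hypothetical model: how long free space lasts at a constant collection rate."""
    return free_bytes / ingest_bytes_per_hour


# 14.10 K hours from the slide.
print(retention_days(14_100))
# Made-up inputs, only to show the shape of the estimate.
print(hours_of_capacity(free_bytes=1_000, ingest_bytes_per_hour=10))
```

14,100 hours is 587.5 days, which the slide shows as 587 days.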

Standard Data Analytics Pipeline: Tetration Data Analysis

Various Pipelines (e.g. ADM) process the data to derive appropriate insights

Data Aggregation → Data Prep & Cleansing → Automated Data Discovery & Evaluation → Statistical Analysis & Prediction → Reporting, Visualization Tools or Alerts

• Data Aggregation: Sensor Collectors
• Data Prep & Cleansing: De-duplication, unification of uni-directional flows into bi-directional, annotate flows with context information, etc.
• Reporting, Visualization Tools or Alerts: GUI, REST API, Kafka, Policy Export, …

Data Collection: Sensor to Collector

Data Aggregation → Data Prep & Cleansing → Automated Data Discovery & Evaluation → Statistical Analysis & Prediction → Reporting, Visualization Tools or Alerts

Data Prep and Annotation

• De-duplication, unification of uni-directional flows into bi-directional, annotate flows with context information, etc.

[Diagram: host and network layer stacks reporting to Collectors]

Data Aggregation → Data Prep & Cleansing → Automated Data Discovery & Evaluation → Statistical Analysis & Prediction → Reporting, Visualization Tools or Alerts
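De-duplication and the unification of uni-directional flows can be sketched with a direction-independent key. The record fields (`export_id`, `bytes`, and so on) and the rule that the first record seen defines the consumer side are simplifying assumptions for illustration, not the pipeline's actual logic.

```python
def unify(records):
    """Drop exact duplicate records and merge both directions into one flow."""
    seen = set()
    flows = {}
    for record in records:
        fingerprint = tuple(sorted(record.items()))
        if fingerprint in seen:
            continue  # exact duplicate of a record already processed
        seen.add(fingerprint)

        src = (record["src_ip"], record["src_port"])
        dst = (record["dst_ip"], record["dst_port"])
        # Sorting the endpoints gives A->B and B->A the same key.
        key = (record["proto"],) + tuple(sorted([src, dst]))
        flow = flows.setdefault(
            key, {"consumer": src, "provider": dst, "bytes_fwd": 0, "bytes_rev": 0}
        )
        if src == flow["consumer"]:
            flow["bytes_fwd"] += record["bytes"]
        else:
            flow["bytes_rev"] += record["bytes"]
    return list(flows.values())


request = {"export_id": 1, "proto": "tcp", "src_ip": "10.0.0.1", "src_port": 40000,
           "dst_ip": "10.0.0.2", "dst_port": 80, "bytes": 100}
response = {"export_id": 1, "proto": "tcp", "src_ip": "10.0.0.2", "src_port": 80,
            "dst_ip": "10.0.0.1", "dst_port": 40000, "bytes": 500}

unified = unify([request, response, dict(request)])
print(unified)
```

A production pipeline would more likely decide consumer and provider from the TCP handshake or from port heuristics than from arrival order.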

BRKDCN-2040 © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Public 91 104 Annotation

• Think Gmail Labels

• User Defined information
  • User Uploaded
  • Keyed by VRF, IP
  • JSON Open Fields

• Derived Information
  • IP
  • VRF
  • …
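A minimal sketch of label-style annotation keyed by VRF and IP is shown below. The flow dictionary layout, the label names and the `annotate` helper are hypothetical; only the keying by (VRF, IP) and the free-form fields come from the slide.

```python
from typing import Dict, Tuple

# User-uploaded annotations keyed by (VRF, IP). The fields inside each
# annotation are free-form, mirroring the "JSON open fields" idea.
Annotations = Dict[Tuple[str, str], Dict[str, str]]


def annotate(flow: dict, annotations: Annotations) -> dict:
    """Return a copy of the flow with labels attached to both endpoints."""
    enriched = dict(flow)
    for side in ("src", "dst"):
        key = (flow["vrf"], flow[f"{side}_ip"])
        # Endpoints without an uploaded annotation get an empty label set.
        enriched[f"{side}_labels"] = annotations.get(key, {})
    return enriched


uploaded: Annotations = {
    ("Default", "10.0.0.1"): {"app": "hr", "env": "prod"},
}
flow = {"vrf": "Default", "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"}
print(annotate(flow, uploaded))
```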

Annotation & Inventory

• Discovered Inventory

• Uploaded Inventory and Meta Data (32 Arbitrary Tags)

• Inventory Tracked in Real Time, along with historical trends

Analyzing the Data

• Endpoints are iteratively compared with each other to find which “profiles” are most similar
  • Sensor Data: Ports provided and consumed, Addresses sent to and received from, Properties of network flows, Running processes, Process originating flow, Hostname
  • External Context: Load balancers / DNS / route tags
  • Human approved clusters from current or other workspaces and base cluster definition
• This is an example of where we use machine learning
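One simple way to compare endpoint "profiles" pairwise is the Jaccard similarity of their feature sets. This is a swapped-in illustration, not the measure Tetration uses (the slides do not specify one), and the feature strings are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar_pairs(
    profiles: Dict[str, Set[str]]
) -> List[Tuple[float, str, str]]:
    """Score every pair of endpoints, most similar first."""
    scored = [
        (jaccard(profiles[x], profiles[y]), x, y)
        for x, y in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, key=lambda item: -item[0])


profiles = {
    "web1": {"provides:80", "consumes:3306", "proc:nginx"},
    "web2": {"provides:80", "consumes:3306", "proc:nginx"},
    "db1": {"provides:3306", "proc:mysqld"},
}
print(most_similar_pairs(profiles)[0])
```

Endpoints with identical behaviour score 1.0 and are natural candidates for the same cluster.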

Machine Learning

Cognitive Computing - Finding and remembering all the relationships between data, and querying the matrix of relationships (Watson)

Machine Learning - Remember what has happened before, then look at new data in that context to find patterns, build up a body of knowledge, and use that knowledge to make decisions about the new data. Can machines remember, and apply what they remember to new data?

Deep Learning - Not trying to maintain data and relationships over time, but analyzing the data through better representations and creating models that learn these representations from large-scale unlabeled data. Succession analysis

Machine Learning

A “field of study that gives computers the ability to learn without being explicitly programmed” - Arthur Samuel (1959)

The construction of algorithms that can learn from and make predictions on data (as opposed to following static programming instructions).

7:00 am = 65 degrees 8:00 am = 75 degrees 77.5 degrees 9:00 am = 85 degrees

How warm will it be at 8:30 am tomorrow?
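The temperature question can be answered with ordinary least-squares linear regression, sketched below using only the three readings listed. With just these three points the fitted line predicts 80 degrees at 8:30 am; the 77.5 degrees value shown on the slide presumably reflects data that is not reproduced in the text.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


hours = [7.0, 8.0, 9.0]      # time of day, in hours
temps = [65.0, 75.0, 85.0]   # observed temperature, in degrees

slope, intercept = fit_line(hours, temps)
prediction = slope * 8.5 + intercept
print(prediction)
```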

Supervised learning: Linear regression, Logistic regression, SVMs
Unsupervised learning: K-means, PCA, Anomaly detection

ADM Clustering Machine Learning Example

K-means Algorithm Finding the Clusters

Randomly initialize $K$ cluster centroids $\mu_1, \dots, \mu_K$
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}
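The loop above can be written as a short, dependency-free Python sketch. It follows the two steps on the slide (assign each point to its closest centroid, then move each centroid to the mean of its points); the stopping rule, the seed parameter and the handling of empty clusters are assumptions added to make it runnable.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(
    points: Sequence[Point], k: int, iterations: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    rng = random.Random(seed)
    # Randomly initialize the centroids by picking k of the input points.
    centroids: List[Point] = rng.sample(list(points), k)
    assignments: List[int] = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, c in zip(points, new_assignments) if c == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
    return centroids, assignments


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (100.0, 0.0), (101.0, 0.0), (102.0, 0.0)]
centroids, labels = k_means(points, k=2)
print(sorted(centroids), labels)
```

K-means can settle in a local optimum, so in practice it is usually run from several random initializations and the best result kept.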

Silhouetting Validation of the Cluster

• The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation)
• Values range from -1 to +1; a high average silhouette gives more confidence that the clustering is representative of the data

https://en.wikipedia.org/wiki/Silhouette_(clustering)
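The silhouette value of a point is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the lowest mean distance to the members of any other cluster. A minimal sketch with Euclidean distance follows; the helper name and sample points are illustrative only.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def silhouette_values(
    points: Sequence[Point], labels: Sequence[Hashable]
) -> List[float]:
    clusters: Dict[Hashable, List[Point]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    values = []
    for point, label in zip(points, labels):
        own = clusters[label]
        # By convention a singleton cluster (or a single cluster) scores 0.
        if len(own) == 1 or len(clusters) < 2:
            values.append(0.0)
            continue
        # Cohesion: mean distance to the other members of the same cluster
        # (the point's distance to itself contributes 0 to the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # Separation: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom else 0.0)
    return values


sample = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_values(sample, [0, 0, 1, 1]))
```

Values close to +1 indicate tight, well separated clusters; negative values suggest a point sits closer to another cluster than to its own.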

Results of the Clustering Machine Learning

Tuning Cluster Granularity Tuning the Algorithms

Analyzing the Data Fitting the Curve

• Every data set (e.g. flow) is examined to find the best function that describes its behaviour
• Comparison within and between ‘flows’ can be used to find ‘outlier’ or anomaly conditions
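As an illustration of "fitting the curve", the sketch below fits a straight line to one series and flags points whose residual is unusually large. The linear model and the two-standard-deviation threshold are assumptions chosen for brevity; the slides do not say which functions or thresholds Tetration uses.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_outliers(
    xs: Sequence[float], ys: Sequence[float], threshold: float = 2.0
) -> List[int]:
    """Indexes of points lying far from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = statistics.pstdev(residuals)
    if spread == 0:
        return []  # every point lies exactly on the line
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# Hypothetical byte counts per interval, with one burst at index 5.
intervals = [float(i) for i in range(10)]
byte_counts = [10.0 * i for i in range(10)]
byte_counts[5] = 500.0
print(residual_outliers(intervals, byte_counts))
```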

Outliers What does not look like it ‘fits’

• Outlier dimension is highlighted with purple circle
• Switch on Outlier view to highlight uncommon flows

Tetration Analytics Architecture Overview

• Data Collection
  • Software Sensor and Enforcement
  • Embedded Network Sensors (Telemetry Only)
  • Third Party Sources (Configuration Data)
• Analytics Engine
  • Cisco Tetration Analytics Cluster
• Open Access
  • Web GUI
  • REST API
  • Event Notification
  • Tetration Apps

• Self Managed Cluster
• No Hadoop / Data Science Background Needed
• Easy Integration via Open interfaces
• One Touch Deployment
• No External Storage Needed
• Open Data Lake (via Tetration Apps)

Accessing the Data and Analytical Results GUI, API, Workspace Applications and Messaging BUS

[Figure: four access paths out of the pipeline: Tetration GUI, Programmatic Interface, Message Publish (through a Kafka Broker) and Tetration Apps, serving a northbound application and northbound consumers]

Tetration API

• Shipped as a limited ‘trial feature’ with the 103.8 release
• Supported with the 2.0 release (FCS April 2017)
• Is a RESTful API that uses HMAC time-bound authentication tokens generated from a private and public key pair
• SDKs available in Python (2.7+) and JavaScript (ES6+)
• Supports managing sensors and switches, plus flow searching
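To make "HMAC time-bound authentication" concrete, here is a generic sketch built on Python's standard `hmac` module. The string-to-sign layout, the SHA-256 choice, the five-minute window and both function names are assumptions for illustration; the actual scheme is defined by the Tetration API documentation and SDKs, which should be used in practice.

```python
import base64
import hashlib
import hmac


def sign_request(
    api_secret: str, method: str, path: str, body: str, timestamp: int
) -> str:
    """Compute an HMAC-SHA256 signature over the request and its timestamp."""
    body_digest = base64.b64encode(
        hashlib.sha256(body.encode()).digest()
    ).decode()
    # Hypothetical string-to-sign: method, path, body digest, timestamp.
    message = "\n".join([method.upper(), path, body_digest, str(timestamp)])
    mac = hmac.new(api_secret.encode(), message.encode(), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode()


def verify_request(
    api_secret: str, method: str, path: str, body: str,
    timestamp: int, signature: str, now: int, max_skew_seconds: int = 300,
) -> bool:
    """Accept the request only if it is recent and the signature matches."""
    if abs(now - timestamp) > max_skew_seconds:
        return False  # token is outside its validity window
    expected = sign_request(api_secret, method, path, body, timestamp)
    return hmac.compare_digest(expected, signature)


signature = sign_request("secret", "POST", "/flowsearch", "{}", 1000)
print(verify_request("secret", "POST", "/flowsearch", "{}", 1000, signature, now=1010))
```

Including the timestamp in the signed message is what makes the token time-bound: a captured request cannot be replayed once the window has passed.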

Example: Flow Search Filters

• Method: POST
• Endpoint: /flowsearch
• Description: The entire Tetration Analytics flow database can be queried, with sub-second response times.

• Search the flow database for the first record that matches these parameters:

Between 02/01/2017 3PM and 02/01/2017 4PM

Default tenant and VRF

Destination port 80 (HTTP)
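A request body for the search above might be assembled as in the sketch below. Only the POST method and the /flowsearch endpoint come from the slide; every JSON field name (`t0`, `t1`, `filter`, `limit`) is hypothetical, the date is read as 1 February 2017, and UTC is assumed. Tenant and VRF are left out on the assumption that the defaults then apply.

```python
import json
from datetime import datetime, timezone


def build_flow_search(
    start: datetime, end: datetime, dst_port: int, limit: int = 1
) -> str:
    """Build a JSON body for a flow search (hypothetical field names)."""
    if end <= start:
        raise ValueError("end must be after start")
    query = {
        "t0": start.isoformat(),
        "t1": end.isoformat(),
        "filter": {"type": "eq", "field": "dst_port", "value": dst_port},
        "limit": limit,  # "first record that matches"
    }
    return json.dumps(query, indent=2)


body = build_flow_search(
    datetime(2017, 2, 1, 15, 0, tzinfo=timezone.utc),
    datetime(2017, 2, 1, 16, 0, tzinfo=timezone.utc),
    dst_port=80,
)
print(body)
```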

Tetration an Open Platform User Apps

• Tetration Engineering, Partners and Customers can write apps on Tetration
• Users can write their own business logic and extend Tetration

• Programming Languages supported
  • Scala
  • Python
  • SQL
  • R (coming)

• Type of jobs
  • One Time Applications and Reports – scheduled now or any time in future
  • Recurring Applications and Reports

• Trigger Alerts on Tetration or user defined events

User Apps: Tetration an Open Platform

• Data Source
  • Tetration Data with Multi Tenancy (Row and Column filtering – Tetration Read Libs)
  • User Uploaded Data – Any schema or format
  • Application generated Data
  • Data Retention and quota monitoring
  • Arbitrary Stream ingestion (coming soon)

• Alert and Event Posting
  • Kafka Message bus integrated

• Security model
  • Tenant Isolation
  • Tetration jobs isolated from user
  • Active firewalls. All user jobs launched inside a private container, Adhoc VM. Jailed from Tetration mainstream
  • Kafka Kerberos Auth

Tetration Apps

• Explore data from your browser, develop your models

• Based on Jupyter Notebooks

• Provides an easy way to develop apps

Granular RBAC Application Access Controls

[Figure: each User is assigned Roles; each Role carries a Scope with Permitted Workloads and Permitted Actions]

R, Modify, ADM, Enforce, etc.

Tetration Analytics – Ecosystem

Cisco Tetration Analytics ecosystem:

• Load balancers
• CMDB/DNS/IPAM
• Policy Enforcement
• Workload Optimization
• SIEM Systems
• Incident Management
• Visibility

Policy Discovery, Compliance and Enforcement

Where do you start after you are collecting all the data Application Dependency and Cluster Grouping

[Figure: telemetry sources feeding the Cisco Tetration Analytics™ Platform (unsupervised machine learning, behavior analysis)]

• Bare-metal, VM, & switch telemetry: Cisco Nexus® 9000 Series; network-only sensors, host-only sensors, or both (preferred)
• Bare-metal & VM telemetry: Brownfield
• Bare metal and VM telemetry (AMI …): On-premises and cloud workloads (AWS)

Additional Inputs for ADM Runs

• Load Balancer Configurations
• DNS Configurations
• IP Address Management Database
• Existing CMDB Information

What do you get to start with Whitelist Policy Recommendation

Application Discovery

Whitelist Policy Recommendation (Available in JSON, XML, and YAML)

{
  "src_name": "App",
  "dst_name": "Web",
  "whitelist": [
    {"port": [0, 0], "proto": 1, "action": "ALLOW"},
    {"port": [80, 80], "proto": 6, "action": "ALLOW"},
    {"port": [443, 443], "proto": 6, "action": "ALLOW"}
  ]
}
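A short sketch of consuming the exported JSON policy is shown below. It reads the sample exactly as printed on the slide and renders one line per rule; the `describe_policy` helper and its output format are illustrative, and the protocol names follow the IANA protocol numbers (1 = ICMP, 6 = TCP, 17 = UDP).

```python
import json
from typing import List

PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}  # IANA protocol numbers

SAMPLE_POLICY = """
{ "src_name": "App", "dst_name": "Web", "whitelist": [
  {"port": [0, 0], "proto": 1, "action": "ALLOW"},
  {"port": [80, 80], "proto": 6, "action": "ALLOW"},
  {"port": [443, 443], "proto": 6, "action": "ALLOW"} ] }
"""


def describe_policy(policy_json: str) -> List[str]:
    """Render each whitelist entry as a human-readable rule."""
    policy = json.loads(policy_json)
    lines = []
    for rule in policy["whitelist"]:
        low, high = rule["port"]
        proto = PROTOCOL_NAMES.get(rule["proto"], str(rule["proto"]))
        if rule["proto"] == 1:
            ports = ""  # ICMP has no port numbers
        elif low == high:
            ports = f"/{low}"
        else:
            ports = f"/{low}-{high}"
        lines.append(
            f'{rule["action"]} {policy["src_name"]} -> '
            f'{policy["dst_name"]} {proto}{ports}'
        )
    return lines


for line in describe_policy(SAMPLE_POLICY):
    print(line)
```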

Additionally you get Real Time Observation Policy Compliance Verification & Simulation

What was seen on the network that was out of Policy

Permitted Traffic Seen on the network

Next Comes Actual Policy Enforcement Implement the Application Policy

• Tetration provides two mechanisms to control policy
  • Policy export to Infrastructure Enforcement Devices (ACI, Firewalls, …)
  • Direct management of Host Access Control Mechanisms
• Currently direct Enforcement happens at the Operating System Level
  • IPtables on Linux (IPSets with latest 2.x release)
  • Advanced Firewall on Windows
• It can be enabled / disabled at the host level (from Tetration Cluster)
  • Monitoring or Monitoring + Enforcement
• A complement to, not a replacement for, infrastructure enforcement

[Figure: host stack (Application, Process, Sockets, OS Kernel Space, vSwitch, Hypervisor, Network) showing Direct Management of Host OS firewall filtering, and Policy Export to APIC, Firewall, …]

Policy Enforcement should be easier The Goal Is to Describe Intent

I want to…

• Block non-production apps talking to production apps

• Allow HR apps to use the employee database

• Block all HTTP connections that are not destined to web servers

• Allow and notify me when a new app requests DNS server access

• Block and notify me when a new app requests AD server access

Enforcement should be Location Independent Policy follows the workload

Intent stays with the endpoint, no matter the infrastructure it resides on

• Data Center: EP, Ports, EPGs, Contracts
• Cloud (AWS, Azure, …): EP, Interfaces, Security Groups, Security Rules

Tetration calculates all necessary rule changes and automatically applies them

How Does It Work?

Tetration automatically converts your intent into blacklist and whitelist rules and configures the native OS security mechanisms

Intent: Block non-production apps talking to production apps
Rules: DENY SOURCE 10.0.0.0/8 DEST 128.0.0.0/8

Intent: Allow HR apps to use the employee database
Rules: ALLOW SOURCE 128.0.10.0/16 DEST 128.0.11.0/16

Intent: Block all HTTP connections that are not destined to web servers
Rules: ALLOW SOURCE * DEST 128.0.100.0/16 PORT = 80
       DENY SOURCE * DEST * PORT = 80
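The third intent relies on rule order: the specific ALLOW must be evaluated before the general DENY. The sketch below shows first-match evaluation with Python's `ipaddress` module. The `Rule` type, the `evaluate` helper and the default action are hypothetical, and the web server subnet is written as a /24 here (an assumption, since the prefixes printed on the slide would overlap one another).

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    action: str                 # "ALLOW" or "DENY"
    src: str                    # CIDR block or "*"
    dst: str                    # CIDR block or "*"
    port: Optional[int] = None  # None matches any port


def _matches(cidr: str, address: str) -> bool:
    if cidr == "*":
        return True
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


def evaluate(
    rules: List[Rule], src: str, dst: str, port: int, default: str = "DENY"
) -> str:
    """Return the action of the first rule that matches the connection."""
    for rule in rules:
        if (
            _matches(rule.src, src)
            and _matches(rule.dst, dst)
            and rule.port in (None, port)
        ):
            return rule.action
    return default  # assumed default when nothing matches


# "Block all HTTP connections that are not destined to web servers"
HTTP_RULES = [
    Rule("ALLOW", "*", "128.0.100.0/24", 80),
    Rule("DENY", "*", "*", 80),
]

print(evaluate(HTTP_RULES, "10.1.1.1", "128.0.100.5", 80))
print(evaluate(HTTP_RULES, "10.1.1.1", "128.0.50.5", 80))
```

Swapping the two rules would deny all HTTP traffic, including traffic to the web servers, which is why generated rule sets need a well-defined ordering.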

Application Centric, Okay but how do I get there? Enhanced Security Services Application and Infrastructure Optimization

Tenant and Application Security Requirements and Enforcement

Application Dependency Mapping

Automated Whitelist Policy Generation

Network Forensics

Policy Compliance and Auditability

Policy Simulation and Impact Assessment

Rich Telemetry Data from Hardware (Nexus 9000) and Software Sensors enables discovery and security monitoring

Data Center Vision Inter-dependent feedback loops

1. Deployment and Provisioning
2. Operations and Management

• ACI, YANG (Intent Based Automation)
• Cisco CloudCenter (Common Application Consumption across Hybrid IT)
• Assurance (Formal Methodologies)
• Tetration Analytics (Machine Learning Based Operations and Security)

[Figure: feedback loops labelled Infrastructure Automation, Security, Guarantees, Deployment Consistency, Compliance, ADM, Forensics]

Summary

• Pervasive flow telemetry that supports infrastructure for multiple data centers at scale
• Ready-to-use solution to address critical data center operational use cases
• Self-monitoring and eliminate the need for in-house big data expertise
• Open platform and northbound APIs enable transparent integration
• Accelerated adoption and comprehensive Solution support with Services

Thank you