Inside Cisco IT: Using Machine Learning Technologies to Drive Digital Transformation
Plamen Nedeltchev, Ph.D. Cisco IT Distinguished Engineer
BRKCOC-2017 Agenda
• Enterprise IT and Digital IT
• KDD and ML/DL/AI
• ML/DL Cisco IT use cases
• What is on the Radar?
• Conclusion

The next industrial revolution is digital
BRKCOC-2017 © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Public

Next Generation Cisco Secure Intelligent Platform for Digital Business
• Reinvent networking
• Deploy security everywhere
• Enable a multi-cloud world
• Unlock the power of data
• Enrich the employee/customer experience
Deliver Continuous Customer Value

IoT - IT and OT Integration
Remote Access Local Access WAN Data Center Cloud
Industrial Router Industrial Switch Industrial Access Point Industrial Video Industrial Sensor

Data Growth Projection
Cisco Structured ~ Unstructured Data Growth estimate by 2020
• Per IDC, data will grow about 9x between 2015 and 2020
• About 80% of data growth will be unstructured

Data Type    Current Size (PB)    Growth Rate (YOY)
Biz Data     8                    24%
IOE Data     <1                   50%
Big Data     4                    50%

Storage Classification (Petabytes)
Database Prod Structured            4
Database Non Prod Structured        8
Database Big Data/Unstructured      4
Database Logs/Code Trees            6
Exchange                            6
Others (WPR, Video, Apps)           20
Total GIS Used Storage              48

[Chart: structured and unstructured data in petabytes by year, 2010-2020, actuals and projection]

Big Data is the new normal – the triple “V”
The “N” dimension – the N data is the Big Data
• Corp. traffic (60% / 40%): 80% of traffic will be inside the data center; 7% DC to DC; 13% DC to users. Store and analyze; replicate, parse.
• 90% unstructured: widely distributed, short shelf life; too big to move.
• On the edge: streaming data at massive scale; analyze before you store.

The changing data patterns and directions

Sharing economy
• BW – Not fully utilized. $Trend down
• IPv6 – Pervasive
• HDD – Terabytes, compute power. $Trend down
• Excess of space, power, cooling and access. $Trend up

Global Mobile Traffic (57% CAGR 2014-2019), exabytes per month
2014: 2.5; 2015: 4.2; 2016: 6.8; 2017: 10.7; 2018: 16.1; 2019: 24.3

Data Center: reached 190 EB/year by 2008 and will reach 25 EB/year by 2020

Cloud Mobile: cloud will host 90% of all the mobile traffic
Mobile Video: will be 70% of all the traffic
Prosumer: a new growth opportunity. Users are not only consuming services but offering services and products. 60 sec. to entrepreneurship. 14 million prosumers in the US, 2008 – 2010 only.
Smart Home

Workplaces are becoming digital
[Figure: evolution of the workplace from 2005 through 2015 to the future, across three stages: EFFICIENCY, PRODUCTIVITY, AGILITY. Labels include Converged Networks, IP Telephony and Telecommuting, Unified Communications, Pervasive Video, Omnichannel Contact Center, Pervasive Team Collaboration, Activity-based Working, Multi-cloud fusion.]

IT Infrastructure as you know it
[Diagram labels: Clients; Applications; Services; Video | Collaboration; Endpoint/User Services; Network Services; Systems; Security; Management; Policy; AI, Analytics; Virtualization; APIs]
Remote Access, Local Access, WAN, Data Center, Cloud
DATA

DATA
[Diagram: two infrastructure stacks resting on DATA. Labels: Clients; Applications; Services; Video | Collaboration; Endpoint/User Services; Network Services; Systems; Security; Management; Policy; AI, Analytics; Mobility; Virtualization; APIs]
Remote Access, Local Access, WAN, Data Center, Cloud

Transition to Digital IT Infrastructure
AI, NETWORK, DATA CENTER, COLLABORATION
[Diagram labels: Clients; Application Services; Endpoint/User Services; Network Services; Systems; Security; Management; Policy; AI, Analytics; Virtualization; APIs]
Remote Access, Local Access, WAN, Data Center, Cloud
Industrial Router, Industrial Switch, Industrial Access Point, Industrial Video, Industrial Sensor

Digital IT Infrastructure
AI, NETWORKING, DATA CENTER, COLLABORATION
[Diagram labels: Client; Application Services; Endpoint/User Services; Network Services; Systems; Security; Management; Policy; Mobility; Virtualization; APIs]
Remote Access, Local Access, WAN, Data Center, Cloud
Industrial Router, Industrial Switch, Industrial Access Point, Industrial Video, Industrial Sensor

The Network. Intuitive.
Constantly learning, adapting and protecting.
• LEARNING: DNA Center (Policy, Automation, Analytics); visibility into traffic and threat patterns
• CONTEXT (informed by context): Who, What, When, Where, How
• INTENT (powered by intent): translate business intent to network policy; automate the management and provisioning of millions of devices instantly
• Intent-based Network Infrastructure
• SECURITY

GIS AI built on the top of Data Architecture 2.0
KDD+AI Toolset: KAFKA+NIFI, KYLO, R+SPARK, KAFKA+DRILL, Jupyter, TheBench, SDLC, AI Pulse, TensorFlow, GIS API, AI Studio

KDD and AI
• Deep Learning – LSTM, CNN, RNN
• Collaborative Filtering – Alternating least squares
• Clustering – K-Means
• Dimensionality reduction – SVD, PCA, NLP
• Classification and Regression – Linear methods (SVM, Logistic regression, Linear regression); Decision trees / Boosted decision trees; Naïve Bayes
• Optimization – Stochastic gradient descent; Limited-memory BFGS (L-BFGS); Newton’s method
[Diagram: GIS AI data pipeline with four stages (Ingest, Prepare, Analyze Data, Visualize) over a Data Lake, fed by Data Acquisition with publish/subscribe through Kafka]
Custom GIS AI applications: IaaS Cost optimization; DC Capacity Planning
CORE GIS AI applications: AI Intent manager; Infra Incident Classification and prediction; HW Failure Prediction; DC Energy Management; Infrastructure ETA; KAIROS
Common Core services: Data Indexing service; Data Quality service; Data workflow service; Data Access service; Batch Services; Stream Services; Feed scheduling service; Data Lineage service; Feed health service; Alerts service; Feed SLA service; Metadata indexing service; Data Dictionary; Search; Continuous Data Processing; Continuous Analytics; Iterative Processing; Data Virtualization
Data acquisition sources: UCS, NLYTE, VMTURBO, OCI, EMAN, vCenter, Compliance, TRENDS, Host AM, ITSM, ESP Mgmt.

CPU utilization based incident prediction
1. Incident Prediction
Previous incident data points (CPU usage) are compared with predicted CPU usage by time-stamp vector analysis. Fast forward: rescaled for incident prediction.
Predicting future state:
• Predicted CPU usage (scenario 1): no or little match found with previous incident data points
• Predicted CPU usage (scenario 2): more matches found with previous incident data points
2. Anomaly Detection – comparing predicted to actual behavior
Actual state:
• Actual CPU usage (scenario 1): actual state of system matches predicted state
• Actual CPU usage (scenario 2): actual state of system has deviated from predicted state
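
The comparison of predicted and actual CPU usage described above can be sketched as follows. This is a minimal illustration, not the method used in the deck: the function names, the 10-point tolerance, the 30% deviation fraction, and the sample values are all assumptions made for the example.

```python
def deviation_flags(predicted, actual, tolerance=10.0):
    """Flag each sample whose actual CPU usage differs from the
    prediction by more than `tolerance` percentage points (assumed value)."""
    return [abs(a - p) > tolerance for p, a in zip(predicted, actual)]


def is_anomalous(predicted, actual, tolerance=10.0, min_fraction=0.3):
    """Report an anomaly when at least `min_fraction` of samples deviate."""
    flags = deviation_flags(predicted, actual, tolerance)
    return sum(flags) / len(flags) >= min_fraction


# Hypothetical CPU usage series, in percent.
predicted = [40, 42, 45, 43, 41, 40]
actual_scenario_1 = [41, 43, 44, 45, 40, 39]   # tracks the prediction
actual_scenario_2 = [41, 60, 75, 80, 78, 42]   # deviates from the prediction

scenario_1_anomalous = is_anomalous(predicted, actual_scenario_1)
scenario_2_anomalous = is_anomalous(predicted, actual_scenario_2)
```

Scenario 1 corresponds to the case where the actual state matches the predicted state, scenario 2 to the case where it has deviated.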
Legend: Healthy state; Predicted healthy behavior; Not healthy; Predicted unhealthy state

Authentication failures on ISE
ML PROCESS
• Event —> Incident log data collection
• Data preparation/ cleaning
• Analyse types of classifiers based on data
• Observe predictions and test for accuracy
• Improve accuracy until it falls within the accepted range
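
The process steps above can be sketched end to end on a toy data set. Everything here is hypothetical: the events, the single feature (failed authentication attempts in a window), and the two candidate classifiers (a majority-class baseline and a single-threshold rule) are assumptions chosen to keep the example short, not the classifiers used on ISE data.

```python
from collections import Counter

# Hypothetical cleaned events: (failed_auth_attempts_in_window, became_incident)
EVENTS = [(1, 0), (2, 0), (9, 1), (3, 0), (12, 1), (1, 0), (8, 1), (2, 0),
          (15, 1), (4, 0), (11, 1), (2, 0), (7, 1), (3, 0), (10, 1), (1, 0)]


def split(events, train_fraction=0.75):
    """Hold out the tail of the data for testing."""
    cut = int(len(events) * train_fraction)
    return events[:cut], events[cut:]


def train_majority(train):
    """Baseline classifier: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def train_threshold(train):
    """Rule classifier: pick the attempt count that best separates the labels."""
    candidates = sorted({x for x, _ in train})
    best = max(candidates,
               key=lambda t: sum((x >= t) == bool(y) for x, y in train))
    return lambda x: int(x >= best)


def accuracy(model, data):
    """Fraction of held-out events the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


train, test = split(EVENTS)
models = {"majority": train_majority(train), "threshold": train_threshold(train)}
scores = {name: accuracy(model, test) for name, model in models.items()}
best_name = max(scores, key=scores.get)
```

Comparing `scores` across candidate classifiers is the "observe predictions and test for accuracy" step; the loop of changing features or models and re-scoring is the improvement step.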

New Skill Set in IT
Cisco Data Preparation
• Roles: Business Analysts, Data Engineers, Data Scientists
• Steps: Add data, Explore, Clean and change, Combine, Shape, Enrich, Publish, Share & Govern
• Sources and targets: Files, XML Docs, Databases, Hadoop, NoSQL, SaaS Apps, Desktops, DBaaS, Data Virtualization, Data Set

Software Development and ML/DL/AI
• Computer Programming (example: Square Root Finder): Data + Program → Computer → Output
• Machine Learning (example: Curve Fitting by Linear Regression): Data + Output → Computer → Program
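
The contrast on this slide can be made concrete with the two examples it names. The sketch below is illustrative only: the square root finder uses Newton's method (an assumed choice, since the slide does not name the algorithm), and the curve fit is ordinary least-squares linear regression on made-up points.

```python
def square_root(x, tolerance=1e-12):
    """Computer programming: the rule is written by hand (Newton's method)."""
    guess = x if x > 1 else 1.0
    while abs(guess * guess - x) > tolerance * max(x, 1.0):
        guess = (guess + x / guess) / 2
    return guess


def fit_line(xs, ys):
    """Machine learning: the rule (slope, intercept) is derived from
    data and desired outputs by least-squares linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data generated from y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
root_of_two = square_root(2.0)
```

In the first function the program is the input and the answer is the output; in the second, the data and answers are the input and the fitted line is the "program" that comes out.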
SW Development
• Write code
• Fix bugs
• Test code
• Release code

Data Science
• Gather data
• Prep data
• Choose a model
• Train the model (80:20 or 70:30 train/test split)
• Evaluate
• Hyperparameter tuning
• Predict

Types of Analytics
[Chart: value versus difficulty]
• Descriptive Analytics: What happened?
• Diagnostic Analytics: Why did it happen?
• Predictive Analytics: What will happen?
• Prescriptive Analytics: How can we make it happen?

How do you prepare?
New Skill Set

Techniques
• A/B testing
• Crowdsourcing
• Data fusion and integration
• Genetic algorithms: in the field of artificial intelligence, a genetic algorithm (GA) is a search heuristic that mimics the process of natural selection
• Machine learning
• Natural language processing
• Signal processing
• Simulation
• Time series analysis
• Visualization

Languages and Solutions
• PIG, Go, R, Python
• MapReduce
• Column-oriented databases
• Schema-less databases (NoSQL databases)
• Hadoop
• Hive

Machine Learning – Deep Learning

KDD: Knowledge Discovery and Data Mining
“Prediction is very difficult, especially if it is about the future.” – Niels Bohr

Evolution of terms
[Diagram: related fields around KDD: Statistics, Pattern Recognition, Databases, Data Mining, Machine Learning, Deep Learning, AI]

Tesla Autopilot predicts crash
https://www.youtube.com/watch?v=Bs4LwCjA12o

SOFIA – the first citizen
https://www.youtube.com/watch?v=S5t6K9iwcdw

Machine Learning is the new normal
Machine learning is a subfield of computer science: the “field of study that gives computers the ability to learn without being explicitly programmed” – Arthur Samuel, 1959.

01 Supervised Learning: the computer is presented with example inputs and their desired outputs to learn a general rule
02 Unsupervised Learning: no labels are given to the learning algorithm, leaving it on its own to find structure in its input
03 Reinforcement Learning: a computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle)
04 Deep Learning: models the way the human brain processes light and sound into vision and hearing

A Neuronet is a function approximator
Threshold T (bias): unless it is exceeded, nothing happens
$Z = F(X_i, W_j, T_k)$
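
A single threshold unit of the kind this slide describes can be sketched in a few lines. The function name and the AND/OR weight choices are illustrative assumptions; the rule itself (output 1 only when the weighted sum exceeds the threshold T) follows the slide.

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 if the weighted sum of inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0


# Illustrative parameters: with weights (1, 1), a threshold of 1.5 behaves
# like logical AND and a threshold of 0.5 behaves like logical OR.
and_outputs = [threshold_unit((a, b), (1, 1), 1.5) for a in (0, 1) for b in (0, 1)]
or_outputs = [threshold_unit((a, b), (1, 1), 0.5) for a in (0, 1) for b in (0, 1)]
```

Changing only the weights and threshold changes which function the unit computes, which is the sense in which a network of such units approximates functions.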
Inputs X1, X2, …, Xn are each multiplied by a weight W1, W2, …, Wn and summed; the sum is compared with the threshold T (bias), and the output Z is 1 or 0:
$$Z = \begin{cases} 1, & \sum_{i=1}^{n} W_i X_i > T \\ 0, & \text{otherwise} \end{cases}$$

Data Mining Algorithms
• Classification
  – Frequency Table: ZeroR, OneR, Naïve Bayesian
  – Covariance Matrix: Linear Discriminant Analysis, Logistic Regression
  – Similarity Functions: K Nearest Neighbors
  – Others: Artificial Neural Network, Support Vector Machine
• Regression
  – Frequency Table: Decision Tree
  – Covariance Matrix: Multiple Linear Regression
  – Similarity Functions: K Nearest Neighbor
  – Others: Artificial Neural Networks, Support Vector Machine
• Clustering
  – Hierarchical: Agglomerative, Divisive
  – Partitive: K-Means, Self-Organizing Map
• Association Rules

How do we know we are doing a good job?
Training of Models

$$\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{True Positives} + \text{True Negatives} + \text{False Positives} + \text{False Negatives}}$$

$$\text{Positive Predictive Value (PPV)} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$
How many true positives I came up with out of all the things I labeled positive.

$$\text{Sensitivity} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$
Percentage correctly found.

$$\text{Specificity} = \frac{\text{True Negatives}}{\text{True Negatives} + \text{False Positives}}$$
Percentage correctly rejected.
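
The four measures on this slide can be computed directly from confusion-matrix counts. The sketch below is a plain restatement of those formulas; the function name and the sample counts are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the slide's four measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "ppv": tp / (tp + fp),            # positive predictive value
        "sensitivity": tp / (tp + fn),    # percentage correctly found
        "specificity": tn / (tn + fp),    # percentage correctly rejected
    }


# Hypothetical counts for 1,000 evaluated events.
metrics = classification_metrics(tp=90, tn=880, fp=20, fn=10)
```

With these counts accuracy is high mostly because negatives dominate, while PPV is noticeably lower, which is why the measures are reported together rather than accuracy alone.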

Intellectual Asset Protection (iCAM)
[Diagram: iCAM pipeline in three stages: Context Enrichment, Data Mining, Business Decision. Event sources: Private Cloud, Public Cloud, Data Centre, Lab, End Points, Applications & Data, 3rd Party Cloud Providers. Context: User Identity, Device Identity, App Identity, Data Identity, Business Context. Analysis: Behavior Analysis, Abnormal Behavior, Business Rule Verification, Ad Hoc Query, User Activity Reconstruction. Outcomes: Alert, Notification, Cisco Data at Risk, Security vs. Productivity, High Level Business Language. Security experts and business people update the rules.]

SVM Demo
Ensemble of ML algorithms:
• Support Vector Machine (SVM) algorithm for pattern recognition (Vladimir Vapnik) and decision tree: both algorithms are used to reduce the false positive rate.
• Naive Bayes for the decision surface that defines which documents a file belongs to (Thomas Bayes, 18th century).
• Poisson distribution is used for anomaly detection.
• Collaborative filtering: who is who

Monitoring Quality Index
• 78,240+ events per second (average)
• 453,000+ events per second (peak)
• 1,888 incidents captured
• 2.9 minutes median time to detect
• 1.0 hours median time to remediate
• 89.7% auto corrective action
• 64% incidents with zero human touch
• 12.8% incidents involved manual support
• 0.64% repeating incidents
• 1.1% false positive rate (SVM + decision tree)
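
The slide states only that a Poisson distribution is used for anomaly detection. One common way to apply it, shown here as an assumption rather than as the iCAM implementation, is to model the number of events per interval as Poisson with a known baseline rate and flag counts whose upper-tail probability is very small. The function names, the baseline rate, and the significance level are hypothetical.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the CDF at k - 1."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


def is_anomalous_count(observed, expected_rate, alpha=0.001):
    """Flag a count that would be very unlikely under the baseline rate."""
    return poisson_tail(observed, expected_rate) < alpha


# Hypothetical baseline: a user normally downloads about 5 documents per hour.
normal_hour = is_anomalous_count(6, 5.0)
suspicious_hour = is_anomalous_count(20, 5.0)
```

A lower `alpha` trades missed anomalies for fewer false positives, the quantity the slide reports at 1.1%.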

Cisco Log Intelligent Platform
[Architecture diagram: Device/App Logs; Load Balancer; three CLIP Servers (APIs, Messages); CLIP Analyzer with nodes CA APP1-Node1, CA APP1-Node2, CA APP2-Node1, CA APP2-Node2, CA APP3-Node1, CA APP3-Node2; CLIP Meta Store; CLIP DB 1; CLIP DB 2; CLIP Web Console]

Metrics and Machine Learning on CLIP
• Support Vector Machines (SVMs)
• Logistic regression
• Linear regression
• k-Means clustering

Bugs Classification: 80% accuracy; 500 ms median time to classify a bug; 2.8T volume processed; 1.1 hours median time to process
Data Usage Prediction: 84% accuracy; 30 days usage sampling duration; week, day, hour data sampling intervals; 3 sec median time to predict
Anomaly Detection: 90% accuracy; 1000 nodes monitored; 1 min temporal resolution; hourly window of detection

Machine Learning and Tetration
Cisco Tetration Analytics Architecture
• Data Collection: Host Sensors (VM); Network Sensors (Cisco Nexus® 92160YC-X, Cisco Nexus 93180YC-EX); 3rd-Party Metadata Sources (configuration data); Telemetry
• Analytics Engine: Cisco Tetration Analytics™ Platform
• Visualization and Reporting: Tetration Web GUI; REST API; Push Events

k-means Demo
Unsupervised machine learning
The k-means algorithm is an iterative method for clustering a set of N points (vectors) into k groups or clusters of points. Lloyd-Forgy 1957
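
The iteration described above alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sketch below is a minimal pure-Python version; seeding the centroids from the first k points and the sample 2-D data are assumptions made to keep the example deterministic, not how Tetration initializes or feeds its clustering.

```python
def kmeans(points, k, iterations=100):
    """Cluster points into k groups with Lloyd's iteration."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: naive seeding
    assignment = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment


# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, assignment = kmeans(points, k=2)
```

No labels are supplied at any point, which is what makes this an unsupervised method.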

Visual Query with Flow Exploration
• Replay flow details like a DVR
• Information mapped across 25 different dimensions
• Thick lines indicate common flows
• Faint links indicate uncommon flows

Outliers
• Switch on Outlier view to highlight uncommon flows
• Outlier dimension is highlighted with purple circle

Machine Learning-Based Malware Defense
Tracking with Lightbeam
See who’s tracking you online
Lightbeam is a Firefox add-on that uses interactive visualizations to show you the first and third party sites you interact with on the Web. As you browse, Lightbeam reveals the full depth of the Web today, including parts that are not transparent to the average user.

Results
Random Forest Classification (Leo Breiman, Berkeley 2005)
DEMO views: Benign Domains, Malicious Domains, Compromised Devices

Domain                 Probability
pdxxwkfttogrib.in      0.985
jpqrhoctgihell.tw      0.985
jtmvtchedyscmn.me      0.985
krpbtonwsrhcig.su      0.985
xoeluhhsnlosqo.me      0.98
dkblkeftpeodxk.me      0.98
iqivnmecsnyvbu.me      0.98
rndruppbakyokv.com     0.98
gdbvlvedrjunwn.me      0.975
dsbyhplmesbqgh.me      0.975
njcdcqdwcsrhoc.me      0.975
mvugkafkrelpsa.tw      0.975
veqalsexqhkrrg.su      0.975
jjhsmiubxxqvbl.me      0.975
dbqhfffdjdvrmn.me      0.975
gsiyrhxqljweuh.me      0.975
nbbnwnesmxkbmv.me      0.975

Machine Learning Pipeline
[Diagram: raw pDNS data is scored over network time by four classifiers: Domain Name Classifier (malicious domains, DGA), DNS Resolver classifier (malicious resolvers), Tunnel classifier, and Device Behavior classifier (behavior anomalies). The diagram also shows a Compromised Device (Security Event) output.]

Classification Performance

Overall model performance (Random Forest)
Metric        Performance
Accuracy      98.738%
Precision     99.288%
Recall        98.181%
AUC           99.801%

Performance per malware family
Malware Family    Accuracy
Conficker         86.309%
Cryptolocker      98.348%
Pushdo            95.515%
Ramdo             99.823%
Tinba             96.715%
Zeus              100.0%

Trained on Alexa top 350k and 350k DGAs from 6 malware families
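
The deck does not say which features the domain name classifier uses. As a hedged illustration of why algorithmically generated names are separable from ordinary ones, the sketch below computes three simple lexical features that are often tried for this task: label length, character entropy, and vowel ratio. The feature set and function names are assumptions; `pdxxwkfttogrib.in` comes from the results table and `cisco.com` from the deck's own links.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_features(domain):
    """Lexical features of the leftmost label (assumed feature set)."""
    label = domain.split(".")[0].lower()
    vowels = sum(ch in "aeiou" for ch in label)
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "vowel_ratio": vowels / len(label),
    }


dga_like = domain_features("pdxxwkfttogrib.in")
ordinary = domain_features("cisco.com")
```

Feature dictionaries like these would be the input rows for a classifier such as a random forest; on these two examples the generated name is longer, has higher entropy, and has a lower vowel ratio.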

Tailgating Discovery
Deep Learning - Video Analytics
[Video stills: Tailgating Detection @ Cisco]

Mood and Engagement - IoVision
Engagement Metrics - Speaker’s View
[Figure: engagement levels shown to the speaker; displayed values include 100%, 95%, 90%, and 20%]

What is on the Radar?
Demo
KAIROS – WiFi / Wired Assurance

Engage With Us
blogs.cisco.com/ciscoit
facebook.com/ciscoit
cisco.com/go/ciscoit
twitter.com/ciscoit
youtube.com/cisco

Thank you