Cybersecurity Data Science (CSDS) PART 1: FRAME
Total Page:16
File Type:pdf, Size:1020Kb
Cybersecurity Data Science (CSDS) Practitioner Methods and Best Practices PART 1: FRAME Scott Allen Mongeau Cybersecurity Data Scientist SAS Institute [email protected] Copyright © SAS Institute Inc. All rights reserved. Cybersecurity Data Science (CSDS) TOPIC 1. FRAME 2. DATA 3. DISCOVER 4. DETECT 5. DEPLOY 2 Introductions, Expectations, Agenda Copyright © SAS Institute Inc. All rights reserved. Scott Allen Mongeau Cybersecurity Data Scientist Cybersecurity Data Science (CSDS) MA GD MA MBA PhD (ABD) +31 (0)68 370 3097 [email protected] Scott Allen Mongeau • SAS (2015 – present) SARK7 • F&SI Cyber – Global (EU/Netherlands based) • Fraud & Financial Crime Analytics (London) • SARK7 (2008 – present) • Data analytics consulting (Leiden) • Deloitte (2013 – 15) • Fraud, Financial Crime, Cyber (Amsterdam) • Genentech Inc. (2000 – 2008) • Data analytics (San Francisco, CA) Scott Mongeau • Defining key CSDS challenges Cybersecurity • Outline best practices Data Science: Perspectives on an • Guidance to managers and Emerging Profession practitioners in implementation of CSDS programs 5 Key Learning Objectives 6 Copyright © SAS Institute Inc. All rights reserved. Cybersecurity Services and Solutions: Overview of Major Areas 7 NIST Cybersecurity Framework 8 9 Cybersecurity Data Science as a Process Data Engineering Advanced Analytics Triage / Validate Remediate Diagnostics & Establishing Predictive Anomaly Behavioral ? patterns baselines modelling detection insights DATA DataENGINEER Manager Scientist Cyber INVESTIGATOR Data Scientist CASE MGMT Infosec Response Investigator 10 Cybersecurity Analytics Maturity Curve Data-aware Anomaly Detection Predictive Detection Risk Optimization Investigations • Big data management • Flags, rules, and alerts __________________ • Multivariate statistics, inferenceCHASING & unsupervisedPHANTOM machinePATTERNS learning • Groups extracted as baselines 11 1. FRAME C S D S 2. DATA P r o c e s s DEPLOY 4. DETECT 3. DISCOVER 12 Cybersecurity Data Science (CSDS) Lifecycle Monitor Frame Data Manager InfoSec Response DECIDE Deploy Explore DETECTION DISCOVERY Validate Engineer Cyber Data Scientist Model Investigator Copyright © SAS Institute Inc. All rights reserved. Security Operations Center (SOC) 14 Emerging SOC Operational Drivers Big & fast streaming data Limitations of traditional Integrated situational Need to build and Automation of manual needs to be stitched into signature and rules- awareness of network, validate efficacious investigation and ‘smart data’ based approaches, device, and user behavior machine learning remediation processes requiring probabilistic while reducing false alerts models and risk-focused models Copyright © SAS Institute Inc. All rights reserved. 1. FRAME Cybersecurity, Data Science, Logs Copyright © SAS Institute Inc. All rights reserved. Cybersecurity Data Science (CSDS) TOPIC 1. FRAME 2. DATA 3. DISCOVER 4. DETECT 5. DEPLOY 17 Learning Objectives 18 Copyright © SAS Institute Inc. All rights reserved. Objectives of Context Setting Establishing a foundation • Data challenges (opportunities) in cybersecurity • Data science foundations • Demonstration / exercises • Insights into the ‘data deluge’ (network analytics) • Log file analytics – from unstructured to structured 19 CSDS Process Unified Orchestration Copyright © SAS Institute Inc. All rights reserved. Cybersecurity Context Copyright © SAS Institute Inc. All rights reserved. Evolving Threats Internal Threats State Actors Automated Fraud-Cyber Attacks Hybrids Social Ransomware Engineering & Cryptojacking Copyright © SAS Institute Inc. All rights reserved. Hidden threats want to remain hidden (in the data) 23 24 24 Source: Recorded Future via Fortune Magazine ‘A Hacker’s Tool Kit’ http://fortune.com/2017/10/25/cybercrime-spyware-marketplace/ 25 FUD Fear, Uncertainty, Doubt Increasing FREQUENCY, sophistication, and speed of attacks Example of hacking tool ‘ecosystem’: Kali Linux • 51,000 attackers were blocked within days of site launch • More than 1 million VPN-based attacks were blocked 26 Ambiguity effect Distinction bias Information bias Reactive devaluation Anchoring or focalism Dread aversion Insensitivity to sample size Recency illusion Anthropocentric thinking Dunning–Kruger effect Interoceptive bias Regressive bias Anthropomorphism or personification Duration neglect Irrational escalation Restraint bias Attentional bias Empathy gap Law of the instrument Rhyme as reason effect Risk compensation / Peltzman Automation bias Endowment effect Less-is-better effect effect Availability heuristic Exaggerated expectation Look-elsewhere effect Selection bias Availability cascade Experimenter's or expectation bias Loss aversion Selective perception Backfire effect Focusing effect Mere exposure effect Semmelweis reflex Bandwagon effect Forer effect or Barnum effect Money illusion Social comparison bias Base rate fallacy or Base rate neglect Form function attribution bias Moral credential effect Social desirability bias LIST OF Belief bias Framing effect Negativity bias or Negativity effect Status quo bias Frequency illusion or Baader– Ben Franklin effect Meinhof effect Neglect of probability Stereotyping Berkson's paradox Functional fixedness Normalcy bias Subadditivity effect COGNITIVE Bias blind spot Gambler's fallacy Not invented here Subjective validation Choice-supportive bias Groupthink Observer-expectancy effect Surrogation Clustering illusion Hard–easy effect Omission bias Survivorship bias Confirmation bias Hindsight bias Optimism bias Time-saving bias BIASES Congruence bias Hostile attribution bias Ostrich effect Third-person effect Conjunction fallacy Hot-hand fallacy Outcome bias Parkinson's law of triviality Conservatism (belief revision) Hyperbolic discounting Overconfidence effect Unit bias Continued influence effect Identifiable victim effect Pareidolia Weber–Fechner law https://en.wikipedia.org/wiki Contrast effect IKEA effect Pessimism bias Well travelled road effect Courtesy bias Illicit transference Planning fallacy Zero-risk bias /List_of_cognitive_biases Curse of knowledge Illusion of control Post-purchase rationalization Declinism Illusion of validity Present bias Decoy effect Illusory correlation Pro-innovation bias Default effect Illusory27 truth effect Projection bias Denomination effect Impact bias Pseudocertainty effect Disposition effect Implicit association Reactance Lack of statistical and analytics approaches to establish alert validity High volumes of Too many alerts combined disconnected and with slow and resource disparate data sources constrained investigations Proliferation of point Disconnected log analytics solutions sources of variable impedes holistic risk- quality based approach Difficulty Actioning ! Indicators Company Confidential – For Internal Use Only Copyright © SAS Institute Inc. All rights reserved. ‘Drowning’ in Data Lakes 29 Network Traffic Tracking network traffic on a single device 30 Copyright © SAS Institute Inc. All rights reserved. Exponential Growth in Network Traffic https://www.neustadt.fr/essays/against-a-user-hostile-web/ Using cookies and other tracking techniques, many “normal” web sites aggressively access and store our: • Device details • Buying behavior • IP address • Personal finances • Present geolocation • Religious beliefs • Changing physical locations • Political affiliations • Browser history • Online purchases • Health concerns and problems • Profile information • Camera / photos / microphone 31 Marketing Analytics Ecosystem (>4000 companies) 32 Demonstration / Exercise Network Traffic from Web Browsing 1. Open ‘Wireshark’ 2. Click ‘Ethernet’ 3. Observe results 33 Demonstration / Exercise Network Traffic from Simple Web Browsing 4. Open ‘Internet Explorer’ (next to Wireshark) 5. Go to a popular news website 6. Monitor Wireshark – what do you see? 7. Sort on destination 7. View ‘red’ & ‘black’ events See codes View => Coloring Rules Note ‘Destination’ IP 8. Go to http://ip-lookup.net Where and what are these IPs? 34 Wireshark Coloring Rules (default) 35 36 Data Science Foundations Copyright © SAS Institute Inc. All rights reserved. Level of difficulty in reducing false alerts* * Survey of 621 global IT security practitioners https://www.sas.com/en_us/whitepapers/ponemon-how-security- analytics-improves-cybersecurity-defenses-108679.html 38 Statistics Cognitive Pattern science Identification Data Science Data AI Mining Machine Learning Data Engineering As adapted from: Hall, P., et al. An Overview of Machine Learning with SAS Enterprise Miner. SAS Institute Inc. 39 39 DATA ANALYTICS What to do? What are PRESCRIPTIVE trends? PREDICTIVE What happened? DESCRIPTIVE SOPHISTICATION VALUE 40 DATA ANALYTICS Operations Management Econometrics PRESCRIPTIVE Forecasting Machine Learning PREDICTIVE Business Intelligence (BI) DESCRIPTIVE SOPHISTICATION VALUE 41 DATA ANALYTICS How can we optimize remediation from focused alerts? PRESCRIPTIVE What is the ‘normal’ behaviour of particular users or devices? PREDICTIVE What are trends and patterns in network and device usage? DESCRIPTIVE SOPHISTICATION VALUE 42 43 Calvin.Andrus (2012) Depicts a mash-up of disciplines from which Data Science is derived http://en.wikipedia.org/wiki/File:DataScienceDisciplines.png Engineering Physics / ‘Hard Computer Sciences’ Science Software Economics / Engineering Finance Social Sciences 44 Calvin.Andrus (2012) Depicts a mash-up of disciplines from which Data Science is derived http://en.wikipedia.org/wiki/File:DataScienceDisciplines.png 45 Scientific test… 46 VALIDATE WITH DIAGNOSTICS