Turn on the Lights with Big Data Security Analytics

TECSEC 3900

Chris Embree, James Sirota, Nadhem AlFardan

Agenda

• Introduction
• Big Data Concepts
• The OpenSOC Architecture – Part 1
• 15 min Break – 11:00 to 11:15
• The OpenSOC Architecture – Part 2
• The User Interface
• Building your OpenSOC
• Summary/Closing/Open Discussion

TECSEC 3900 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public

Introduction to the Technical Seminar

Introduction to the Technical Seminar – Abstract

• This is a half-day advanced technical seminar that covers security analytics using a state-of-the-art big data platform based on customized Hadoop 2.0 and other Apache projects.
• The seminar introduces a number of big data concepts and shows how big data components such as Flume, Kafka, Storm, Elastic Search, Hive and others are integrated to build the OpenSOC project.
• It describes how advanced security analytics capabilities are bolted onto this platform to detect advanced threats.

Introduction to the Technical Seminar – Abstract (continued)

• It describes in detail how the platform stream-processes large amounts of data in real time, coming from various sources including packet capture, syslog and many others.
• Concepts related to data enrichment, machine learning algorithms and graph analysis for anomaly detection will be described and demonstrated in this technical seminar.

The Team

• Chris Embree – Sr. Hadoop Cluster Administrator (@chris_embree)
• James Sirota – Data Scientist (@JamesSirota)
• Nadhem AlFardan – Solutions Architect (nalfarda)

Security Challenges

The Security Model

Control, Enforce, Harden
Detect, Block, Defend
Scope, Contain, Remediate

Network Endpoint Mobile Virtual Cloud

Point-in-Time Continuous

Designing for the Evolving Threat Landscape

1st Generation
• Device monitoring
• Log collection with limited retention
• Limited device coverage
• Slow reactions to incidents

2nd Generation
• Events correlation
• Network and system log collection
• Case management

3rd Generation
• Feeds from reputation services
• Vulnerability management
• Incident handling capabilities

4th Generation
• Big Data sophisticated security analytics
• Feeds from intelligence services
• Cloud processing
• Sophisticated NetFlow analysis
• Early alarming
• Forensics capabilities

[Figure: the four generations plotted against increasing attack sophistication]

Introduction to BIG Data

Why a Different Approach?

Everything we do leaves traces; the promise is to be able to analyze this data!

Logs, Alerts, Events, DNS Transactions, Network Flows, Packet Capture, Transactions, Content (emails, web pages, etc.), Social Media, Hash Values, Sensor Logs

A SOC Framework

[Figure: SOC framework diagram. Functional areas: Metrics & Reporting; Threat Intelligence (Commercial & Collaborative); Mitigate & Respond; Incident Response Team (Knowledge Base, Playbook, IR Handbook); Detect; Investigate; Management & Analysis Tools. Data feeding the tools: Logs, Scans, Config, Events, Packet Flow. Data sources by layer are listed below.]

Inspection: IDS | IPS | Network Traffic Analysis | NetFlow | Email Gateway | Web Gateway | HIDS

Logging: Syslog | TACACS | 802.1x | Antivirus | Endpoint Security | DNS | DHCP | NAT | VPN

Discovery: Vuln Scans | Port Scans | Router Configs | ARP Tables | CAM Tables | 802.1x

Telemetry: Address, Host & Employee Mgt | Partner DB | Host Mgt | HIPS | AV | Asset DB | Endpoint Sec Mgt | Config DB

Why a Different Approach – For Security?

• Current solutions, typically delivered by standard SIEM tools, do not scale very well.

VOLUME (scale of data): A large amount of data is generated, petabytes and higher. This is too large for traditional databases and requires a distributed system for storage and access.

VELOCITY (speed of data capture and analysis): Data is generated at a very high rate. Imagine a 10 Gbps link that is 50% utilized on average.

VARIETY (data types and sources): Support for structured and unstructured data of different types, from different sources and using various protocols.
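To put the 10 Gbps example in numbers, here is a small back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and that every bit on the link is captured; the function name is illustrative only.

```python
def daily_volume_bytes(link_gbps: float, utilization: float) -> float:
    """Bytes carried in 24 hours by a link at the given average utilization."""
    bits_per_second = link_gbps * 1e9 * utilization
    bytes_per_second = bits_per_second / 8
    return bytes_per_second * 86_400  # seconds in a day


per_day = daily_volume_bytes(10, 0.5)
print(f"{per_day / 1e12:.1f} TB per day")          # 54.0 TB per day
print(f"{1e15 / per_day:.1f} days to reach 1 PB")  # 18.5 days to reach 1 PB
```

Under these assumptions a single half-utilized link fills a petabyte in under three weeks, which is why volume and velocity are treated together.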

The Security Analytics Market – Reality*

• “defining “security analytics” at this point simply involves looking up the words in the dictionary. • There is no “security analytics market” or dedicated and purchasable “security analytics tools”; security analytics is a concept that an organization can practice, but can’t buy. Many different tools — from network intrusion prevention system (NIPS) to DLP and SIEM — use various algorithms to analyze data, thus performing analytics. Thus, if security-relevant data is subjected to analytic algorithms, security analytics is being practiced.”

• ‘The ability to analyze lot of security data over long periods of time, find threats and create models’

*http://blogs.gartner.com/anton-chuvakin/2014/07/08/why-no-security-analytics-market/

Security Analytics – A Process

Pick a specific problem to solve

Collect data (logs, events, packets, network flows, configuration, threat intelligence, etc.)

Analyze the data using an algorithm (correlation rules, machine learning, clustering, etc.)

Interact, query, present, visualize and take actions

Roles: Data Scientist, Security Analyst

Big Data Security Analytics

• Big data security analytics deals with collections of data sets so large and complex that they become difficult (or impossible) to process using on-hand database management tools or traditional security data processing applications.

SCALE: The ability to collect, process, and store terabytes to petabytes of data for an assortment of security analytics activities.

FLEXIBILITY: Provide users with the ability to interact with, query, and visualize this volume of data in an assortment of ways.

PERFORMANCE: Built with an appropriate compute architecture to process data with analytic algorithms and complex queries and then deliver results in an acceptable timeframe.

BIG Data Security Analytics – A Process

Pick a specific problem to solve

Collect BIG data (logs, events, packets, network flows, configuration, threat intelligence, etc.)

Analyze the data using an algorithm (correlation rules, machine learning, clustering, etc.)

Interact, query, present, visualize and take actions

Roles: Security Analyst, Security Data Scientist, BIG DATA platform Admin

BIG Data Security Analytics – Stream versus Batch Processing

Batch processing: concerned with data from any point in the past. That is not necessarily last week or last month; it could equally be data from 10 seconds ago.

Various data sources → Store data to an appropriate warehouse on a platform like Hadoop → Query, Analyze, Present

BIG Data Security Analytics – Stream versus Batch Processing (continued)

Stream processing: data is collected, ingested, transformed, managed and/or analyzed in real time.

Various data streams → Analyze data as it flows, then:
1. Store original data and output of analysis to an appropriate warehouse on a platform like Hadoop
2. Present the result of analysis
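The difference between the two modes can be sketched in a few lines. This is a toy illustration only, not OpenSOC code: the event shape, the function names, and the threshold rule are assumptions chosen for the example.

```python
from collections import Counter, deque
from typing import Deque, Iterable, Iterator, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, source IP)


def batch_top_talkers(stored: List[Event], since: float) -> Counter:
    """Batch: query events that are already stored, from any point in the past."""
    return Counter(ip for ts, ip in stored if ts >= since)


def stream_alerts(events: Iterable[Event], window_s: float,
                  threshold: int) -> Iterator[Tuple[float, str, int]]:
    """Stream: inspect each event as it flows, keeping only a sliding window."""
    window: Deque[Event] = deque()
    counts: Counter = Counter()
    for ts, ip in events:
        window.append((ts, ip))
        counts[ip] += 1
        # Evict events that have fallen out of the time window.
        while window and window[0][0] <= ts - window_s:
            _, old_ip = window.popleft()
            counts[old_ip] -= 1
        if counts[ip] >= threshold:
            yield ts, ip, counts[ip]


events = [(0.0, "10.0.0.1"), (1.0, "10.0.0.2"), (2.0, "10.0.0.1"),
          (3.0, "10.0.0.1"), (20.0, "10.0.0.1")]
print(batch_top_talkers(events, since=0.0).most_common(1))
print(list(stream_alerts(events, window_s=10.0, threshold=3)))
```

The batch function needs the whole data set at hand; the stream function only ever holds the last `window_s` seconds, which is what makes real-time alerting feasible at high event rates.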

The Lambda (λ) Architecture

The Lambda Architecture consists of three layers:
• The batch layer
• The speed layer
• The serving layer

Data is processed simultaneously by both the batch layer, and the speed layer.

The architecture provides a useful model for combining multiple big data technologies.
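A minimal sketch of how the three layers cooperate, using per-source event counts as the example query. The class and method names are hypothetical, and the batch run is treated as instantaneous, which a real deployment would not be.

```python
from collections import Counter
from typing import List


class LambdaCounts:
    """Toy Lambda Architecture that counts events per source IP."""

    def __init__(self) -> None:
        self.master: List[str] = []           # append-only master data set
        self.batch_view: Counter = Counter()  # batch layer output, recomputed periodically
        self.speed_view: Counter = Counter()  # speed layer: events since the last batch run

    def ingest(self, src_ip: str) -> None:
        # Every event is handed to both the batch layer and the speed layer.
        self.master.append(src_ip)
        self.speed_view[src_ip] += 1

    def run_batch(self) -> None:
        # Recompute the batch view from all data, then reset the speed view.
        self.batch_view = Counter(self.master)
        self.speed_view.clear()

    def query(self, src_ip: str) -> int:
        # Serving layer: merge the batch view with the real-time view.
        return self.batch_view[src_ip] + self.speed_view[src_ip]


counts = LambdaCounts()
for ip in ["10.0.0.1", "10.0.0.1", "10.0.0.2"]:
    counts.ingest(ip)
counts.run_batch()
counts.ingest("10.0.0.1")
print(counts.query("10.0.0.1"))
```

The query result is always complete: older events come from the batch view, and events that arrived after the last batch run come from the speed view.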

OpenSOC: Intersection of Big Data and Security Analytics

[Figure: OpenSOC at the overlap of a Big Data Platform (Hadoop) and security analytics. Big data side: Scalable Compute, Multi Petabyte Storage, Interactive Query, Real-Time Search, Unstructured Data, Scalable Stream Processing, Data Access Control. Security analytics side: Real-Time Alerts, Anomaly Detection, Data Correlation, Rules and Reports, Predictive Modeling, UI and Applications.]

OpenSOC – A Framework

• A mechanism to capture, store, and transform any type of security telemetry at extremely high rates
• Real-time processing and application of enrichments such as threat intelligence, geo-location, and DNS information to telemetry being collected
• Efficient information storage: logs and telemetry, the ability to extract and reconstruct full packets, and long-term storage
• An interface that gives a security investigator a centralized view of data and alerts passed through the system

OpenSOC – The Framework

[Figure: the OpenSOC architecture diagram (Source Systems, Data Collection, Messaging System, Real Time Processing, Storage, Access) overlaid with the platform capabilities: Telemetry Capture, Telemetry Normalization, Telemetry Parsing, PCAP Analysis, Real-Time Indexing and Searching, Anomaly Detection and Machine Learning, Storage of Big Data, Batch Processing, Visualization, Services, API. The platform is labeled Extensible, Integrated, Automated, OPEN.]

OpenSOC – The Architecture

Source Systems: Passive Tap (Traffic Replicator); Telemetry Sources (Syslog, HTTP, File System, Other)
Data Collection: PCAP; DPI; Flume (Agent A, Agent B, ... Agent N)
Messaging System: Kafka (PCAP Topic, DPI Topic, A Topic, B Topic, ... N Topic)
Real Time Processing: Storm (PCAP Topology, DPI Topology, A Topology, B Topology, ... N Topology)
Storage: Hive (Raw Data, ORC); Elastic Search (Index); HBase (PCAP Table)
Access: Analytic Tools (R / Python, Power Pivot, Tableau); Web Services (Search, PCAP Reconstruction)

Technologies Behind OpenSOC

• Telemetry Capture Layer: Apache Flume
• Data Bus: Apache Kafka
• Stream Processor: Apache Storm
• Real-Time Index and Search: Elastic Search
• Long-Term Data Store: Apache Hive
• Long-Term Packet Store: Apache HBase
• Visualization Platform: Kibana

OpenSOC – Big Data and Security Analytics

[Figure: the OpenSOC architecture diagram (Source Systems, Data Collection, Messaging System, Real Time Processing, Storage, Access), annotated with the process stages COLLECT, ANALYZE, Store, Present, and Communicate.]

Managed Threat Defense (MTD)

[Figure: Managed Threat Defense service diagram connecting the CUSTOMER to the CISCO SOC. Labels shown: Telemetry, Full Packet Capture, Metadata Extraction, Sourcefire, Dedicated Customer Segment, Native Customer Data, Community Intelligence, Intel, Analysis, Advanced Analytics, SOC 24/7, Ticketing, Portal.]

Demo

A Very Brief Introduction to Hadoop

Hadoop is Always Changing, 2005 – 2015 (25+ Internet years)

Originally:
• Batch processing
• Single points of failure
• Java required
• Security? What’s that?

Hadoop is Always Changing, 2005 – 2015 (25+ Internet years)

3-4 years ago:
• Began stream processing
• Added some HA
• Added new languages/access methods
• Crude, per-component security via Kerberos

Hadoop is Always Changing, 2005 – 2015 (25+ Internet years)

Today:
• Mature batch and stream processing
• Mostly HA (depending on vendor)
• Many access methods
• Mature security
• Enterprise ready

Hadoop is similar to Linux

Linux is really only the Kernel

Distributions bundle Linux with other packages to create a useful platform:
• RedHat, CentOS, Scientific Linux, Fedora
• Debian, Ubuntu, Mint
• SuSE, OpenSuSE
• Gentoo

[Figure: the Linux kernel surrounded by GNU utils, networking (TCP/IP, etc.), X Windows, and file systems]

Core Hadoop - HDFS

NameNode – Contains the location of all copies of all data.

Virtual Drive - HDFS

Core Hadoop - YARN

YARN – Manages resource allocation to direct processes efficiently.

[Figure: an Application sits on top of YARN, which sits on top of the HDFS virtual drive. YARN asks "Where is the data?", the answer is DataNode 2 and 4, and two App instances are shown.]
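The idea on these two slides (look up where the data lives, then run the work there) can be sketched as a toy model. This is not the Hadoop API: the block table, node names, and `place_tasks` function are hypothetical stand-ins for NameNode metadata and YARN scheduling.

```python
from typing import Dict, List

# Toy NameNode metadata: block id -> DataNodes holding a replica (made-up names).
BLOCK_LOCATIONS: Dict[str, List[str]] = {
    "blk_1": ["datanode2", "datanode4"],
    "blk_2": ["datanode1", "datanode3"],
}


def place_tasks(blocks: List[str], free_slots: Dict[str, int]) -> Dict[str, str]:
    """Assign one task per block, preferring a node that already stores the block."""
    slots = dict(free_slots)  # copy so the caller's view is not modified
    placement: Dict[str, str] = {}
    for block in blocks:
        local = [n for n in BLOCK_LOCATIONS.get(block, []) if slots.get(n, 0) > 0]
        # Fall back to any node with capacity when no data-local slot is free.
        candidates = local or [n for n, free in slots.items() if free > 0]
        if not candidates:
            raise RuntimeError(f"no free slot for {block}")
        node = candidates[0]
        slots[node] -= 1
        placement[block] = node
    return placement


print(place_tasks(["blk_1"], {"datanode1": 1, "datanode2": 1, "datanode4": 1}))
```

Moving the computation to the data, rather than the data to the computation, is what keeps network traffic manageable at petabyte scale.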

Hadoop is similar to Linux

Core is HDFS plus Map/Reduce (and utilities)

Hadoop vendors bundle additional projects (speakers' opinion only):
• – Ambari, Tez/Stinger, Slider
• Cloudera – Cloudera Manager, Impala
• MapR – Mostly known for HDFS re-write, non-free
• BigTop – Apache project for Roll-Your-Own Distributions

[Figure: HDFS/YARN at the core, surrounded by HBase, Hive, Storm, and Kafka]

Core Hadoop

Hadoop – Plus the jungle of projects

Hadoop – What really matters

• The tool that solves your problem.

Tools We Use in OpenSOC

• Telemetry Capture Layer: Apache Flume
• Data Bus: Apache Kafka
• Stream Processor: Apache Storm
• Real-Time Index and Search: Elastic Search
• Long-Term Data Store: Apache Hive
• Long-Term Packet Store: Apache HBase
• Visualization Platform: Kibana

OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram, repeated: Source Systems → Data Collection (PCAP, DPI, Flume) → Messaging System (Kafka) → Real Time Processing (Storm) → Storage (Hive, Elastic Search, HBase) → Access (Analytic Tools, Web Services)]

Tools We Use in OpenSOC

• Kafka – a distributed input queue

OpenSOC – Big Data and Security Analytics

[Figure: the OpenSOC architecture diagram, repeated, annotated with the stages COLLECT, ANALYZE, Store, and Present]

Tools We Use in OpenSOC (continued)

• Flume – listens for “syslog”-style data and directs it into Kafka (or HDFS)

[Figure: Syslog, Snort, and other inputs each pass through a Flume source and a Flume sink into Kafka]

OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram, repeated, annotated with the stages COLLECT, ANALYZE, Store, and Present]

Tools We Use in OpenSOC (continued)

• Storm – a stream processor

OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram, repeated, annotated with the stages COLLECT, ANALYZE, and Present]

Tools We Use in OpenSOC (continued)

• HBase – a distributed NoSQL database on top of HDFS (recent writes are buffered in memory)

OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram, repeated, annotated with the stages COLLECT, ANALYZE, Present, and Communicate]

Tools We Use in OpenSOC (continued)

• Hive – SQL access to HDFS data (Java, JDBC, ODBC)

OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram, repeated, annotated with the stages COLLECT, ANALYZE, Present, and Communicate]

Tools We Use in OpenSOC (continued)

• Elastic Search – for indexing

[Figure: PCAP index fields: Source IP, Source Port, Dest IP, Dest Port, TS_Micro, PCAP_ID]

OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram, repeated, annotated with the stages COLLECT, ANALYZE, Present, and Communicate]

Capture Options

NapaTech
• High speed/proprietary
• Expensive

JNetPCAP
• Open-source Java
• http://jnetpcap.com/

OpenSOC PyCapa
• Newly released
• Lower speed
• https://github.com/OpenSOC/pycapa

OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram, repeated twice: once annotated with the stages COLLECT, ANALYZE, Store, Present, and Communicate, and once without annotations]

OpenSOC: The BIG Vision

Introduction

Problem

• Duration of attacks
• Variety of attacks
• Disparity of tools
• Lack of automation

The BIG Vision

[Figure: a cycle of Collect → Enrich → Correlate → Store → Analyze → Act → Automate]

The BIG Disruption

• Horizontally-scalable processing and compute
• Open non-proprietary software
• Ability to run on commodity hardware
• Interoperability of tools
• Large developer community

OpenSOC is a…

• Open framework for security analytics
• Hadoop Application
  – Telemetry ingest
  – Enrichment
  – Alerts
  – Visualizations
  – Analytics
• Geared towards the big data use case
• Focus on real-time
• Developed for MTD by Cisco

OpenSOC wants to be…

• A de facto standard
• A community effort
• An open project
• Free

The BIG Idea

Use Cases and Capabilities

Capabilities

• Telemetry Transformation/Normalization
• In-line Enrichment
• In-line Alerts
• Real-time indexing and search
• Big Data Store
• UI and Data Visualizations

Right Use Case

Environment:
• Events: ~500K/second
• Storage: ~petabytes
• Data: semi-structured, unstructured
• Team: highly skilled
• Hardware: commodity, large footprint
• Process: information-driven

Plus requirements:
• Highly custom platform
• SIEM capabilities
• PCAP capture
• Information retention
• Integration with existing tools
• Managed service

Alternatives

Capability columns: (1) Telemetry Capture, (2) Telemetry Normalization, (3) Telemetry Parsing, (4) PCAP Analytics, (5) Rules/Alerts, (6) Real-Time Indexing/Search, (7) Anomaly Detection, ML capability, (8) Reports, SQL Engine.

Product | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Free/Paid
OpenSOC | YES | YES | YES | YES | YES | YES | YES | YES | FREE (Beta)
ElasticSearch, Logstash, Kibana (ELK) | YES | YES | YES | NO | NO | YES | NO | NO | FREE
Flume + Morphlines | YES | YES | YES | NO | NO | NO | NO | NO | FREE
Splunk | YES | YES | YES | NO | YES | YES | NO | NO | PAID
ArcSIGHT | YES | YES | YES | NO | YES | YES | NO | NO | PAID
AOL Moloch | YES | YES | YES | YES | NO | NO | NO | NO | FREE
VaporWare | YES | YES | YES | YES | YES | YES | YES | YES | Doesn’t Exist

Background

• Sept 2013: First prototype
• Dec 2013: Hortonworks joins the project
• March 2014: Platform development finished
• April 2014: First beta test at customer site
• May 2014: CR work off
• Sept 2014: General availability

State of OpenSOC

• In version 0.5
• Supported by Cisco
• Community of developers
• Variant fielded for MTD Service

OpenSOC Community

• Our GitHub: www.getopensoc.com
• Community: MITRE, Accenture, Lockheed Martin, Infosys, independent researchers, etc.
• Actively recruiting contributors and committers
• Large feature backlog
• Large documentation effort
• Large platform automation effort

[Figure: community pyramid with the levels PMC, Evangelists, Committers, Contributors]

OpenSOC Distributions

• Open – Community owned: all features of OpenSOC developed by the open source community
• Curated – Cisco owned: closed MTD platform + curated features borrowed from open source distribution

Distribution Alignment

[Figure: overlapping Curated Distribution and Open Distribution. Curated only: components proprietary to MTD. Overlap: Common Framework. Open only: components not selected for MTD.]

Top-Level Architecture

Flow Charts and Diagrams

Top-Level Architecture

[Figure: top-level data flow.
Inputs: Raw Network Stream, Network Metadata Stream, Netflow, Syslog, Raw Application Logs, Other Streaming Telemetry.
Processing: Parse + Format → Enrich → Alert, fed by Threat Intelligence Feeds and Enrichment Data.
Stores: Elastic Search (Real-Time Index), HBase (Raw Packet Store), Hive (Long-Term Store).
Applications + Analyst Tools: Log Mining and Analytics, Network Packet Mining and PCAP Reconstruction, Big Data Exploration, Predictive Modeling.]

Analytics Pipeline

[Figure: Flume → Kafka → Storm, with three Storm-based stages writing to the big data stores.
OpenSOC-Streaming: RAW → Transform → Enrich → Alert (Rules-Based).
OpenSOC-Aggregation: Filter → Aggregators → Windowed Rollup.
OpenSOC-ML: Router → Model 1, Model 2, ... Model n → Scorer.
Big Data Stores: Elastic Search (Real-Time Index and Search), Hive (Long-Term Data Store), HBase (Enriched Store), Hive (Alerts Store).
Consumers: SOC Alert Consumers (Alerts UI, Web Services), External Alert Consumers (via Secure Gateway), Remedy Ticketing System.]

Storm Topology

[Figure: Kafka Spout or Test File Spout → Parser Bolt → Enrichment Bolt(a) ... Enrichment Bolt(n) → Alerts Bolt → Telemetry Index Bolt. Side outputs: Alerts Index Bolt, Error Index Bolt, Hive/HDFS Bolt, and a Kafka Bolt that writes to a Kafka topic for ML.]

Enrichments

The message as it passes through each stage:

After the Parser Bolt:
    {
      "msg_key1": "msg value1",
      "msg_key2": "msg value2",
      "msg_key3": "msg value3",
      "src_ip": "10.20.30.40",
      "dest_ip": "20.30.40.50",
      "domain": "mydomain.com"
    }

After GEO Enrich:
    {
      "msg_key1": "msg value1",
      "msg_key2": "msg value2",
      "msg_key3": "msg value3",
      "src_ip": "10.20.30.40",
      "dest_ip": "20.30.40.50",
      "enrichments": {"geo": {…}}
    }

After Who Is Enrich:
    {
      "msg_key1": "msg value1",
      "msg_key2": "msg value2",
      "msg_key3": "msg value3",
      "src_ip": "10.20.30.40",
      "dest_ip": "20.30.40.50",
      "enrichments": {"geo": {…}, "whois": {…}}
    }

After CIF Enrich:
    {
      "msg_key1": "msg value1",
      "msg_key2": "msg value2",
      "msg_key3": "msg value3",
      "src_ip": "10.20.30.40",
      "dest_ip": "20.30.40.50",
      "enrichments": {"geo": {…}, "whois": {…}, "cif": "Yes"}
    }

RAW Message → Parser Bolt → GEO Enrich → Who Is Enrich → CIF Enrich → Enriched Message

Each enrichment stage has its own cache in front of its data store:
• GEO Enrich: MySQL (Geo Lite Data)
• Who Is Enrich: HBase (Who Is Data)
• CIF Enrich: HBase (CIF Data)
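The enrichment chain can be sketched as a sequence of cached lookups that each add one key under "enrichments". This is an illustrative sketch only, not the OpenSOC bolts: the in-memory dictionaries stand in for the MySQL and HBase stores, their contents are made-up sample data, and the choice to look up `dest_ip` for geo and CIF is an assumption.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]

# Made-up sample data standing in for the Geo Lite, Who Is, and CIF stores.
GEO_DATA = {"20.30.40.50": {"country": "US"}}
WHOIS_DATA = {"mydomain.com": {"registrar": "Example Registrar"}}
CIF_DATA = {"20.30.40.50"}


def cached(lookup: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap a lookup so repeated keys are served from an in-process cache."""
    cache: Dict[str, Any] = {}

    def wrapper(key: str) -> Any:
        if key not in cache:
            cache[key] = lookup(key)
        return cache[key]

    return wrapper


geo_lookup = cached(lambda ip: GEO_DATA.get(ip, {}))
whois_lookup = cached(lambda domain: WHOIS_DATA.get(domain, {}))
cif_lookup = cached(lambda ip: "Yes" if ip in CIF_DATA else "No")


def add_enrichment(msg: Message, name: str, value: Any) -> Message:
    """Return a copy of msg with one more entry under 'enrichments'."""
    enrichments = dict(msg.get("enrichments", {}))
    enrichments[name] = value
    return {**msg, "enrichments": enrichments}


def enrich(msg: Message) -> Message:
    msg = add_enrichment(msg, "geo", geo_lookup(msg["dest_ip"]))
    msg = add_enrichment(msg, "whois", whois_lookup(msg.get("domain", "")))
    msg = add_enrichment(msg, "cif", cif_lookup(msg["dest_ip"]))
    return msg


parsed = {"src_ip": "10.20.30.40", "dest_ip": "20.30.40.50", "domain": "mydomain.com"}
enriched = enrich(parsed)
print(enriched["enrichments"])
```

The cache matters at high event rates: the same IPs and domains recur constantly, so most lookups never reach the backing store.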

Bolts

[Figure: a Kafka Spout emits messages to several parallel Parser bolts, each followed by an Enrich Geo bolt and an Enrich Whois bolt, and each writing to the Kafka topic "Enriched". The message stream uses shuffle grouping; the control stream uses ALL grouping. The bolts load a Parser plugin, a Geo plug-in backed by a SQL store, and a Whois plug-in backed by a K-V store.]

Software Packages

Structure, Breakdown, and Dependencies

Top-Level Projects

• OpenSOC-UI: data visualization, dashboards, ad hoc analytics, reports

• OpenSOC-Streaming: parsing, indexing, enrichment, storage

• OpenSOC-Aggregation: (not yet available) stream summaries, sketches, descriptive statistics

• OpenSOC-ML: (not yet available) anomaly detection, statistical modeling, model scoring framework

OpenSOC-UI

OpenSOC-Streaming

[Figure: module tree of OpenSOC-Streaming]
• OpenSOC-Topologies: Topology Runners (PCAP, DPI, Lancope, etc.)
• OpenSOC-Message Parsers: Parser Bolt; Abstract Parser Adapter (Sourcefire Parser Adapter, PCAP Parser Adapter, DPI Parser Adapter, etc.)
• OpenSOC-Enrichment Adapters: Enrichment Bolt; Abstract Enrichment Adapter (GEO, CIF, Whois, etc.)
• OpenSOC-Alerts: Alerts Bolt; Abstract Alerts Adapter (White/Blacklist Adapter, CIF Adapter, etc.)
• OpenSOC-Indexing: Indexing Bolt; Abstract Indexing Adapter (Elastic Search Adapter, Solr Adapter, etc.)
• OpenSOC-Common

OpenSOC-Streaming Modules

• OpenSOC-Topologies: sample Storm topologies that exercise OpenSOC modules

• OpenSOC-MessageParsers: extensible parsing and normalization framework for telemetry consumption

• OpenSOC-Enrichment: extensible framework for in-line enrichments

OpenSOC-Streaming Modules (continued)

• OpenSOC-Indexing: extensible framework for in-line indexing

• OpenSOC-Alerts: extensible framework for in-line alerts

• OpenSOC-Common: various common helper functions shared by OpenSOC modules


• OpenSOC-DataServices: extensible framework for integrating external systems with OpenSOC

• OpenSOC-PCAP_Service: REST service to build PCAP files on demand from data captured in HBase

Helper Scripts

• OpenSOC-DataLoads: scripts for building up enrichment databases from externally-available flat files

• OpenSOC-FlumeAgents: configurations of Flume agents for ingesting lower-volume telemetry into Kafka queues

• OpenSOC-Capture: (released soon) scripts for capturing network traffic and pushing it into Kafka queues

OpenSOC Software

Technology Concepts

Apache Kafka

• 1 topic per telemetry type
• Retention of 15 minutes
• Flume plugin (low-volume)
• Kafka producer (high-volume)
• Kafka spout (consumer)
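The topic-per-telemetry convention and the 15-minute retention can be sketched in Python as follows. The telemetry list is borrowed from the topology names used elsewhere in this deck, and the "<telemetry>_raw" naming scheme is purely hypothetical; `retention.ms` is Kafka's per-topic retention setting, expressed in milliseconds.

```python
from datetime import timedelta

# Telemetry names as used elsewhere in this deck; the list itself is illustrative.
TELEMETRY_TYPES = ["pcap", "bro", "sourcefire", "ise", "lancope"]

# Retention stated on the slide: 15 minutes.
RETENTION = timedelta(minutes=15)


def topic_plan(telemetry_types, retention=RETENTION):
    """Return one topic definition per telemetry type.

    The "<telemetry>_raw" naming scheme is hypothetical; "retention.ms"
    is Kafka's per-topic retention setting in milliseconds.
    """
    retention_ms = int(retention.total_seconds() * 1000)
    return {
        f"{name}_raw": {"retention.ms": retention_ms}
        for name in telemetry_types
    }


plan = topic_plan(TELEMETRY_TYPES)
for topic, config in sorted(plan.items()):
    print(topic, config)
```

A short retention like this works because Kafka is only a buffer in front of Storm here; long-term storage happens downstream.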

Apache Storm

• Topologies
• Spouts
• Bolts
• Groupings

Apache Flume

• Sources
• Sinks
• Plugins
• Interceptors

Apache HBase

• Key Construction
• Bulk loading
• Region pre-splitting
• Compaction/Deletion

• HDFS
• MapReduce (MR)
• Hive/Hive Server/Metastore
• Apache Zookeeper

What is Extensible

• Adding new telemetries
• Adding new enrichments
• Adding rules processors
• Extending the UI widgets

Application Deployment

Your 4 Steps to Set Up the Platform

Step 1 – Setting Up Telemetry Capture
• NapaCapa
• DPICapture
• Flume Agents
• Kafka Topics

Step 2 – Setup Enrichments
• GeoIP
• CIF
• Whois
• Alerts

Step 3 – Configure Storm Topologies
• Environment
• Topology-specific configuration
• Bolt-specific configuration
• Spout-specific configuration

Step 4 – Configure Aging of Data
• Index rotation for Elastic Search
• HBase custom compaction policy
• HDFS/Hive (optional)

Step 1 – Check for prerequisites

• Flume agents
• Kafka topics
• Storm Nimbus
• MySQL
• Hive
• HBase
• Elastic Search Head
• Graphite

Step 2 – Start Topologies

• PCAP
• Bro
• Sourcefire
• ISE
• Lancope
• (CUSTOM)

Step 3 – Start Telemetry

• NapaCapa
• Bro Capture
• Flume-Sourcefire Agent
• Flume-ISE Agent
• Flume-Lancope Agent

Step 4 – Post Deployment Steps

• Storm Nimbus
  – Monitor for errors
  – Check for rolling counters
  – Grep Storm logs for errors
• Elastic Search Head Plugin
  – Monitor cluster status
  – Monitor for indexing problems
• Graphite
  – Check metrics
• HBase
  – Count keys (make sure ~equal to Elastic Search)
• Hive
  – Count rows (make sure ~equal to Elastic Search)
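The "~equal" checks for HBase and Hive can be sketched as a simple tolerance comparison. This is a minimal sketch: the counts are assumed to be collected by the operator beforehand (for example from the Elastic Search Head plugin, an HBase key count and a Hive row count), and the 1% tolerance is an arbitrary illustrative threshold, not a value from the deck.

```python
def counts_roughly_equal(reference, other, tolerance=0.01):
    """True if `other` is within `tolerance` (a fraction) of `reference`."""
    if reference == 0:
        return other == 0
    return abs(reference - other) / reference <= tolerance


def check_store_counts(es_docs, hbase_keys, hive_rows, tolerance=0.01):
    """Compare HBase and Hive counts against the Elastic Search document count."""
    return {
        "hbase": counts_roughly_equal(es_docs, hbase_keys, tolerance),
        "hive": counts_roughly_equal(es_docs, hive_rows, tolerance),
    }


# Made-up counts: HBase is close to Elastic Search, Hive is lagging.
result = check_store_counts(es_docs=119899, hbase_keys=119850, hive_rows=110000)
print(result)
```

A store that fails the check is a good first place to look for a stalled bolt or a failed write path.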

Problems? Common Places to Look (Logical order)

• Storm logs
• Storm error index
• Graphite
• Kafka logs
• HBase logs
• HDFS logs
• Elastic Search logs
• Flume logs

Making Your Own

Software Architecture

• Built around JSON messaging
• Uniform topologies
• Testing spouts
• Bolts with plug-ins
• Stackable enrichments
• Topology Runners

Conventions

• Configurations
  – Environment
  – Topology-based
  – Cluster-based
  – Customer-based
• Message format
  – Standard OpenSOC JSON (SOJ)
• Plug-in

Step 1 – Start a New Configuration

• Configuration file list
  – Features_enabled: turns bolts on and off
  – Topology.conf: configuration for bolts and adapters

Step 2 – Write a parser (required)

• Plugin for parser bolt
  – Input: raw data stream
  – Output: OpenSOC formatted JSON
• Plugins provided by Cisco
  – PCAP
  – Bro
  – Sourcefire
  – Lancope
  – ISE
• Example:
  – message {"some_field": "some_value"}
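The parser contract (raw data in, OpenSOC formatted JSON out) can be illustrated with a small Python sketch. The real plug-ins run inside a Storm parser bolt; this stand-alone function only shows the shape of the transformation. The comma-separated input layout is a made-up example, and the `{"message": {...}}` envelope is an assumption based on the example above. The `ip_*` field names are the ones that appear in the sample Elasticsearch queries later in this deck.

```python
import json


def parse_raw_line(raw_line):
    """Turn one raw CSV telemetry line into an OpenSOC-style JSON string.

    The input layout (timestamp,src_ip,src_port,dst_ip,dst_port,protocol)
    is a made-up example format, not a real sensor format.
    """
    fields = raw_line.strip().split(",")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    timestamp, src_ip, src_port, dst_ip, dst_port, protocol = fields
    message = {
        "timestamp": int(timestamp),
        "ip_src_addr": src_ip,
        "ip_src_port": src_port,
        "ip_dst_addr": dst_ip,
        "ip_dst_port": dst_port,
        "protocol": protocol,
    }
    # Assumed envelope: everything the parser extracts goes under "message".
    return json.dumps({"message": message}, sort_keys=True)


parsed = parse_raw_line("1421330043539,89.248.162.242,58432,192.168.0.200,53,udp")
print(parsed)
```

Normalizing every telemetry to the same field names is what lets the enrichment, alerting and indexing bolts stay telemetry-agnostic.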

Step 3 – Write enricher (optional)

• Plugin for enrichment bolt
  – Input: OpenSOC JSON message
  – Output: enrichment field attached to OpenSOC JSON message
• Plugins provided by Cisco
  – CIF enrichment: threat intelligence
  – Whois: where whois data is present
  – Geo: geoip for IP tags
• Example
  – {message{"key": "value"}, enrichment{"geo":{"key":"xx", etc}}}
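A minimal sketch of an enricher, assuming the message envelope is `{"message": {...}, "enrichment": {...}}` as the example above suggests. The lookup table stands in for a real GeoIP database and its entries are invented; the function name and table layout are hypothetical and not part of OpenSOC.

```python
import json

# Stand-in for a real GeoIP database; the entries are invented for illustration.
GEO_TABLE = {
    "192.168.0.200": {"country": "XX", "city": "internal"},
}


def enrich_with_geo(opensoc_json, geo_table=GEO_TABLE):
    """Attach enrichment.geo entries for the IP fields found in the message."""
    doc = json.loads(opensoc_json)
    message = doc.get("message", {})
    geo = {}
    for field in ("ip_src_addr", "ip_dst_addr"):
        ip = message.get(field)
        if ip in geo_table:
            geo[field] = geo_table[ip]
    # Enrichments are stackable: keep whatever earlier enrichers attached.
    doc.setdefault("enrichment", {})["geo"] = geo
    return json.dumps(doc, sort_keys=True)


sample = json.dumps({
    "message": {"ip_src_addr": "89.248.162.242", "ip_dst_addr": "192.168.0.200"},
    "enrichment": {"whois": {"registrar": "example"}},
})
enriched = json.loads(enrich_with_geo(sample))
print(enriched["enrichment"])
```

Because each enricher only adds its own key under `enrichment`, several of them can be chained in one topology without overwriting each other.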

Step 4 – Write alerter (optional)

• Plugin for alerts engine
  – Input: raw/enriched OpenSOC message
  – Output: 2 streams
    • Message stream: original message
    • Alerts stream: alerts message
• Plugins provided by Cisco
  – AllAlerts: each telemetry message treated as an alert
  – WhitelistAlerter: checks source and destination IP against a white list and alerts if they do not apply to a customer asset
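The two-stream behaviour can be sketched as a function that returns a message stream and an alerts stream. This is one possible reading of the WhitelistAlerter description (alert when neither endpoint belongs to a customer asset range), not the actual plug-in logic. The network range, function names and alert format are all hypothetical.

```python
import ipaddress

# Hypothetical customer asset ranges.
CUSTOMER_NETWORKS = [ipaddress.ip_network("192.168.0.0/24")]


def is_customer_asset(ip, networks=CUSTOMER_NETWORKS):
    """True if the address falls inside one of the customer networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


def whitelist_alerter(message, networks=CUSTOMER_NETWORKS):
    """Return (message_stream, alerts_stream) for one parsed message.

    The message is always passed through; an alert is emitted only when
    neither endpoint is a customer asset.
    """
    endpoints = (message["ip_src_addr"], message["ip_dst_addr"])
    alerts = []
    if not any(is_customer_asset(ip, networks) for ip in endpoints):
        alerts.append({"alert": "no customer asset involved", "message": message})
    return [message], alerts


known = {"ip_src_addr": "89.248.162.242", "ip_dst_addr": "192.168.0.200"}
unknown = {"ip_src_addr": "89.248.162.242", "ip_dst_addr": "8.8.8.8"}
print(whitelist_alerter(known))
print(whitelist_alerter(unknown))
```

Keeping the original message on its own stream means indexing and storage continue regardless of whether an alert fired.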

Step 5 – Indexer

• Plugin for indexer
  – Input: raw/enriched OpenSOC message
  – Output: call to external indexing system
• Adapters provided by Cisco
  – Elastic Search: capability is natively built in
  – Solr: capability is TBD

Step 6 – HDFS Bolt

• Natively provided by Hortonworks with HDP 2.6+ distribution.

Hands-on Demo Scenario:

• I want to add a new telemetry to OpenSOC, turn on enrichments, and use my UI console as a SIEM to display and analyze the information.

The User Interface

OpenSOC – Big Data and Security Analytics

[Architecture diagram, left to right]
• Source Systems: passive tap and traffic replicator; telemetry sources (Syslog, HTTP, File System, Other)
• Data Collection: PCAP, DPI, Flume (Agent A, Agent B, ... Agent N)
• Messaging System: Kafka (PCAP topic, DPI topic, A topic, B topic, ... N topic)
• Real Time Processing: Storm (PCAP topology, DPI topology, A topology, B topology, ... N topology)
• Storage: Hive (raw data, ORC), Elastic Search (index), HBase (PCAP table)
• Access: analytic tools (R / Python, Power Pivot, Tableau), web services (search, PCAP reconstruction)

Introduction: OpenSOC UI

• The UI is a modified version of Kibana 3, served by a Node.js backend.
• Remember that Elasticsearch is API driven. For example: curl -XPUT/-XPOST/-XGET http://www.my-es-host.com:9200/
• How can you query, filter, present and visualize data?
• Kibana can be thought of as a visualization and exploration application.
• Kibana is a browser-based analytics and search dashboard for Elasticsearch. How about accessing HBase?
• Kibana is based on HTML and JavaScript. Latest release is 3.0.
• Kibana is not a web server, so it requires a backend web server such as Node.js, nginx, etc.

Securing the UI and the Rest of Services

• Access to the various APIs should go through the web server acting as a “reverse proxy”.
• The reverse proxy provides authentication using LDAP/LDAPS.

OpenSOC UI Design

[Diagram: the web browser loads the Kibana files from Node.js, which authenticates users against LDAP(S). Node.js passes Elasticsearch REST API calls for alerts, events and PCAP metadata to the Elastic Search index, and calls a new PCAP service, developed on the HBase Java API, that reads the HBase PCAP table. The API call returns a PCAP file, which Node.js can process with Tshark.]

[Diagram, second view: web browsers and your own tools/scripts reach the Elasticsearch JSON-based REST API (alerts, events and PCAP metadata) and the PCAP service (HBase PCAP table) through Node.js; Tableau reaches Hive through an ODBC/JDBC driver.]

UI Configuration

• OpenSOC UI configuration file (.opensoc-ui) parameters include:
  – Elasticsearch with OpenSOC data.
  – PCAP service for access to raw PCAPs.
  – LDAP for authentication.
• Now you can install the UI module using npm! Refer to the README on GitHub: npm install -g opensoc-ui
• You can also download the development VirtualBox.

Kibana API call to Elasticsearch

curl -XPOST "http://www.my-es-host:9200/_search" -d'
{
  "query": {
    "match_all": {}
  }
}'

[Diagram: the web browser reaches the Elastic Search index through Node.js, using Elasticsearch REST API calls for alerts, events and the PCAP index; the PCAP table is stored in HBase.]
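For scripted access, the same `match_all` search can be prepared from Python with only the standard library. This is a sketch: the host name is the placeholder used on the slide, and the request is built but deliberately not sent, so it runs without a cluster. The helper name `build_search_request` is hypothetical.

```python
import json
import urllib.request

ES_URL = "http://www.my-es-host:9200/_search"  # placeholder host from the slide


def build_search_request(query, url=ES_URL):
    """Build (but do not send) a POST request carrying a JSON query body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request({"match_all": {}})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a reachable cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

In the OpenSOC UI the browser does not talk to Elasticsearch directly; the equivalent call goes through the Node.js reverse proxy.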


Sample API call to Elasticsearch pcap_all index


Sample ES indexes from the OpenSOC development VirtualBox on GitHub

curl 'localhost:9200/_cat/indices?v'
health index            pri rep docs.count docs.deleted store.size pri.store.size
yellow alert              5   1      14000            0     22.1mb         22.1mb
yellow kibana-int         5   1          1            0     12.3kb         12.3kb
yellow pcap_all           5   1     119899            0     34.7mb         34.7mb
yellow lancope_index      5   1       1000            0      2.4mb          2.4mb
yellow fireeye_index      5   1       1000            0      2.5mb          2.5mb
yellow qosmos_index       5   1       3000            0     10.1mb         10.1mb
yellow sourcefire_index   5   1       3000            0     10.8mb         10.8mb
yellow bro_index          5   1       6000            0     14.4mb         14.4mb


Sample API call to Elasticsearch alert index

Retrieving PCAP content – Step 1: Retrieve the Stream

The fields can be filled in automatically using the event action facility, or manually during an investigation.

• A stream is a set of packets that share the same src IP, dst IP, src port, dst port and protocol.
• The pcap_id points to a stream.
• The pcap_id plus a time stamp maps to a packet.
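The sample requests in this deck use the pcap_id 59f8a2f2-c0a800c8-17-58432-53 for a stream between 89.248.162.242:58432 and 192.168.0.200:53. The first two parts are the hex forms of those two IP addresses. The sketch below reproduces that layout in Python. The layout is inferred from those sample values only, and reading the third part (17) as the IP protocol number (UDP) is an assumption; the function names are hypothetical.

```python
import ipaddress


def ip_to_hex(ip):
    """Dotted-quad IPv4 address -> 8 lowercase hex digits."""
    return format(int(ipaddress.IPv4Address(ip)), "08x")


def make_pcap_id(src_ip, dst_ip, protocol, src_port, dst_port):
    """Build a stream key in the layout seen in this deck's sample requests."""
    parts = [ip_to_hex(src_ip), ip_to_hex(dst_ip), protocol, src_port, dst_port]
    return "-".join(str(part) for part in parts)


def parse_pcap_id(pcap_id):
    """Split a stream key back into its five-tuple (inverse of make_pcap_id)."""
    src_hex, dst_hex, protocol, src_port, dst_port = pcap_id.split("-")
    return {
        "ip_src_addr": str(ipaddress.IPv4Address(int(src_hex, 16))),
        "ip_dst_addr": str(ipaddress.IPv4Address(int(dst_hex, 16))),
        "protocol": int(protocol),
        "ip_src_port": int(src_port),
        "ip_dst_port": int(dst_port),
    }


pcap_id = make_pcap_id("89.248.162.242", "192.168.0.200", 17, 58432, 53)
print(pcap_id)
print(parse_pcap_id(pcap_id))
```

A key built from the five-tuple identifies the stream; adding a time stamp narrows it to a single packet, as stated above.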

• Populating the search field and requesting stream information: this is an API call that Kibana makes to Elasticsearch for the “pcap_all” index. The following is the POST request data that the browser sends to the Kibana instance (https://www.my-kibana-host.com/__es/pcap_all/pcap_doc/_search):

{"facets":{"packets":{"terms":{"field":"pcap_id","size":100},"facet_filter":{"bool":{"must":[{"term":{"ip_src_addr":"89.248.162.242"}},{"term":{"ip_dst_addr":"192.168.0.200"}},{"term":{"ip_src_port":"58432"}},{"term":{"ip_dst_port":"53"}},{"range":{"ts_micro":{"gte":1421330043539000,"lte":1421330163539000}}}]}}}}}

• This will retrieve stream information.

[Diagram: Kibana makes an API call to ES; it sends a GET request to Node.js, which proxies it to the Elastic Search index.]
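The request body above can be generated rather than hand-written. The sketch below rebuilds the same facet query from its parameters. The function name is hypothetical, while the field names and structure are copied from the sample request. Note that facets belong to the Elasticsearch 1.x API used in this deployment and were removed in later major versions.

```python
import json


def stream_facet_query(src_ip, dst_ip, src_port, dst_port, start_us, end_us, size=100):
    """Facet query that groups matching packets by pcap_id (time range in microseconds)."""
    must = [
        {"term": {"ip_src_addr": src_ip}},
        {"term": {"ip_dst_addr": dst_ip}},
        {"term": {"ip_src_port": str(src_port)}},
        {"term": {"ip_dst_port": str(dst_port)}},
        {"range": {"ts_micro": {"gte": start_us, "lte": end_us}}},
    ]
    return {
        "facets": {
            "packets": {
                "terms": {"field": "pcap_id", "size": size},
                "facet_filter": {"bool": {"must": must}},
            }
        }
    }


query = stream_facet_query(
    "89.248.162.242", "192.168.0.200", 58432, 53,
    start_us=1421330043539000, end_us=1421330163539000,
)
print(json.dumps(query, separators=(",", ":")))
```

The time window in the sample spans 120 seconds, which keeps the facet small even on a busy link.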

Retrieving PCAP content – Step 2: List Stream Packets

List the packets that belong to the selected connection/stream.

• Now you have the stream information and you want to list the packets that make up the stream (without their content).
• This is a call to Elasticsearch that contains a pcap_id and time stamp range values:

{"filter":{"bool":{"must":[{"range":{"ts_micro":{"gte":1421330043539000,"lte":1421330163539000}}},{"or":{"filters":[{"term":{"pcap_id":"59f8a2f2-c0a800c8-17-58432-53"}}]}}]}},"sort":["ts_micro"],"from":0,"size":500}

[Diagram: Kibana makes an API call to ES and includes the pcap_id value for the stream; it sends a GET request to Node.js, which proxies it to the Elastic Search index.]

Retrieving PCAP content – Step 3: List Stream Packets

• Now you have the list of packets and you want to retrieve the content of one packet.
• The content of the packet is in HBase.
• The browser uses the EventSource API to make an HTTP GET request with content type "text/event-stream" to Node.js. The request includes a key:
  https://www.my-kibana-host.com/pcap/getPcapsByKeys?keys=59f8a2f2-c0a800c8-17-58432-53-0-0&includeReverseTraffic=false&startTime=1421330103543&endTime=1421330103544
• Node.js makes an API call to the PCAP service to retrieve the content of the packet, which returns a PCAP file.
• The PCAP file goes through Tshark on the Node.js server before it is submitted to the browser.
• The browser renders the results in a Wireshark-like interface.

[Diagram: the browser sends a GET request to Node.js that includes the pcap_id value; Node.js calls the PCAP service, which makes a REST API call to HBase to retrieve the PCAP content from the PCAP table.]

Retrieving PCAP content – Step 4: Retrieve PCAP file for the stream

• The browser sends a request to Node.js and includes the pcap_id that is used to retrieve all packets in the stream:
  https://www.my-kibana-host.com/pcap/getPcapsByKeys?keys=59f8a2f2-c0a800c8-17-58432-53&includeReverseTraffic=false&startTime=1421330043539000&endTime=1421330104539060&raw=true
• raw=true instructs Node.js to proxy the data as is, i.e. to forward the PCAP file sent by the PCAP service.
• This does not go through Tshark.

[Diagram: the browser sends a GET request to Node.js that includes the pcap_id value; a REST API call to HBase retrieves the PCAP content from the PCAP table.]

Accessing PCAP Content: Retrieval Process

1. The browser makes an EventSource API call (HTTP content type "text/event-stream") to Node.js, containing the pcap_id value.
2. Node.js makes a PCAP API call to the PCAP service with the pcap_id contained in the browser request; the service reads the HBase PCAP table.
3. The PCAP file is sent back to Node.js.
4. If raw=false, Node.js processes the PCAP file using Tshark and streams the PCAP data to the browser over the HTTP EventSource API.
5. The browser holds the entire parsed PCAP data in memory.

Accessing PCAP Data

Closer look at the request sent from the browser to Node.js for retrieving the content of a packet: web requests to /pcap/getPcapsByKeys?xxxx

The pcap_id is added to the request in the keys field.
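A small helper can assemble the getPcapsByKeys request from its parts. The parameter names (keys, includeReverseTraffic, startTime, endTime, raw) and the host are taken from the sample URLs in this deck; the helper itself and its argument names are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.my-kibana-host.com/pcap/getPcapsByKeys"  # host from the slides


def pcap_request_url(keys, start_time, end_time, include_reverse=False,
                     raw=False, base_url=BASE_URL):
    """Build the URL the browser sends to Node.js to fetch PCAP content."""
    params = [
        ("keys", keys),
        ("includeReverseTraffic", "true" if include_reverse else "false"),
        ("startTime", start_time),
        ("endTime", end_time),
    ]
    if raw:
        # raw=true asks Node.js to forward the PCAP file without running Tshark.
        params.append(("raw", "true"))
    return base_url + "?" + urlencode(params)


url = pcap_request_url(
    "59f8a2f2-c0a800c8-17-58432-53",
    start_time=1421330043539000,
    end_time=1421330104539060,
    raw=True,
)
print(url)
```

Leaving `raw` unset gives the parsed, Tshark-processed view; setting it returns the PCAP file itself for use in other tools.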

UI Demo

Building Your Own

Kittens!
- Pets are given names like Fluffy
- They are unique and lovingly raised and cared for
- If they get sick, you take them to the vet and nurse them back to health
- You hope they’ll live forever

Cattle
- Cattle are given numbers like dn01.
- They are almost identical to other cattle.
- When they get sick, you take “normal” measures to cure them. Cost/benefit model.
- Cattle have minimum life expectancy.
- To serve their purpose, they must be “herded.”

Cluster Nodes are Cattle!

What are the implications?

• Most of the “Enterprise Class” Server rules no longer apply!
• Automation is King: it is easier and strongly encouraged.
• Cheaper is usually better
• No 3-5 Year refresh cycle

To-Do in 3 Years
1. Buy new servers to replace the old ones
2. Re-install software
3. Transfer Data

So what do I need to run OpenSOC?

• A cluster running core Hadoop + Hive, HBase, Kafka, Storm and Elastic Search
• 3 physical servers for NN, Zookeeper, YARN and “Master” nodes
• Data nodes: as many as you can
  – Depending on data retention requirements and ingestion rates

OpenSOC at Cisco (aka MTD)

Hardware footprint (40u):
- 14 Hadoop Data Nodes (UCS C240 M3)
- 3 Cluster Control Nodes (UCS C220 M3)
- 2 ESX Hypervisor Hosts (UCS C220 M3)
- 1 PCAP Processor (UCS C220 M3 + Napatech NIC)
- 2 SourceFire Threat alert processors
- 1 Anue Network Traffic splitter
- 1 Router
- 1 48 Port 10GE Switch

Software Stack:
- HDP 2.2
- Kafka 0.8.1
- Elastic Search 1.3.0
- MySQL 5.5 (Hive Meta & GeoData)


CTRL01 CTRL02 CTRL03

• Zookeeper • Zookeeper • Zookeeper • NameNode1 • NameNode2 • YARN / History Server • ES Master • ES Master • ES Master • Nimbus Server/UI • Hbase Master • Hive Meta Standby • Hbase Master • Flume Agents

Data Nodes (10-14)
• YARN / HDFS
• HBase
• Storm Client
• Kafka*
• 2x ES* (If Shared)

Elastic Search Nodes (8)
• 3 ES Instances
• Each w/ dedicated disks

*Dedicated disks

Installation Suggestions

• Kickstart / Autoyast / etc. – Automated builds as much as possible

Ansible Generated PXE Config (MAC Addr file)

Ansible Generated Kickstart w/ Authorized_keys

PXE Boot

Now Ansible Ready

• Hadoop Distro or Roll Your Own

Distribution:
• Fast Setup
• Vendor Support
• Integration Testing
• Can cost more than nodes!
• Potential Lock-in

Roll your own:
• Longer Startup
• Experience required
• Lower Cost
• More control


Installation/Configuration Suggestions

• Kickstart / Autoyast / etc.
  – Automated builds as much as possible
• Hadoop Distro or Roll Your Own
• Plus Configuration Management
  – We’re working on Ansible scripts that we expect to release
• Enable NTP
• Use /etc/hosts vs. DNS
• Use easy names:
  – Ctrl01, DN01, ES01
  – Add IP addresses for services: nn1, nn2, yarn, etc.
• “Steal” Hadoop config files from POC installs and adjust.
  – Building from scratch is painful.
• Add parallelism in Storm topologies
  – Tune both ways: independently and as a group
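The "easy names" advice lends itself to generated /etc/hosts files. The sketch below produces entries for numbered nodes, with optional service aliases on the same line. The IP addresses are invented, and placing the nn1, nn2 and yarn aliases on ctrl01 to ctrl03 is an illustrative assumption.

```python
import ipaddress


def hosts_entries(prefix, count, first_ip, aliases=None):
    """Return /etc/hosts lines such as '10.0.0.11 dn01' for consecutive IPs."""
    aliases = aliases or {}
    start = ipaddress.ip_address(first_ip)
    lines = []
    for index in range(1, count + 1):
        name = f"{prefix}{index:02d}"
        names = [name] + aliases.get(name, [])
        lines.append(f"{start + (index - 1)} {' '.join(names)}")
    return lines


# Hypothetical addressing plan: three control nodes, then the data nodes.
control = hosts_entries(
    "ctrl", 3, "10.0.0.1",
    aliases={"ctrl01": ["nn1"], "ctrl02": ["nn2"], "ctrl03": ["yarn"]},
)
data = hosts_entries("dn", 14, "10.0.0.11")
print("\n".join(control + data))
```

Generating the file from one small plan keeps every node's copy identical, which fits the cattle-not-pets approach and is easy to push out with configuration management.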


• Dedicate disks to Kafka
• Spout workers should equal Kafka topic partitions x replicas
  – 12 partitions with 2 replicas would have 24 workers per spout
• Evaluate request.required.acks for Kafka producers (-1, 0, 1)
  – https://kafka.apache.org/08/configuration.html (Section 3.3, near the bottom)
• Dedicate a cluster to Elastic Search
  – 3 instances on dedicated servers outperform shared nodes w/ equal instances

• Dive in and Learn!
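The sizing rule and the acks setting from the list above can be written down as a short sketch. The function simply encodes the deck's rule of thumb (partitions x replicas); it is not a Storm or Kafka API. The descriptions of the three request.required.acks values summarize the Kafka 0.8 producer configuration.

```python
def spout_workers(partitions, replicas):
    """Rule of thumb from the slide: workers = partitions x replicas."""
    if partitions < 1 or replicas < 1:
        raise ValueError("partitions and replicas must be positive")
    return partitions * replicas


# Meaning of request.required.acks in the Kafka 0.8 producer configuration.
REQUIRED_ACKS = {
    0: "do not wait for any acknowledgement (lowest latency, weakest durability)",
    1: "wait until the leader has written the message",
    -1: "wait until all in-sync replicas have the message",
}

workers = spout_workers(partitions=12, replicas=2)
print(workers)
for value, meaning in sorted(REQUIRED_ACKS.items()):
    print(value, meaning)
```

Choosing between the acks values is a trade-off between ingest throughput and the risk of losing telemetry when a broker fails.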

Closing

Call to Action

• Visit the World of Solutions for
  – Cisco Campus
  – Walk-in Labs
  – Technical Solution Clinics
• Meet the Engineer
• Lunch time Table Topics
• DevNet zone related labs and sessions
• Recommended Reading: for reading material and further resources for this session, please visit www.pearson-books.com/CLMilan2015
