White Paper | Information Security | Machine Learning | October 2020

IT@Intel: Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka

Our Apache Kafka data pipeline based on Confluent Platform ingests tens of terabytes per day, providing in-stream processing for faster security threat detection and response.

Intel IT Authors
Ryan Clark, Information Security Engineer
Jen Edmondson, Product Owner
Dennis Kwong, Information Security Engineer
Jac Noel, Security Solutions Architect
Elaine Rainbolt, Industry Engagement Manager
Paul Salessi, Information Security Engineer

Executive Summary
Advanced cyber threats continue to increase in frequency and sophistication, threatening computing environments and impacting businesses' ability to grow. More than ever, large enterprises must invest in effective information security, using technologies that improve detection and response times. At Intel, we are transforming from our legacy cybersecurity systems to a modern, scalable Cyber Intelligence Platform (CIP) based on Kafka and Splunk. In our 2019 paper, Transforming Intel's Security Posture with Innovations in Data Intelligence, we discussed the data lake, monitoring, and security capabilities of Splunk. This paper describes the essential role Apache Kafka plays in our CIP and its key benefits: economies of scale, the ability to operate on data in-stream, reduced technical debt and downstream costs, and contextually rich data.

Intel IT Contributors
Victor Colvard, Information Security Engineer
Juan Fernandez, Technical Solutions Specialist
Frank Ober, SSD Principal Engineer

Additional benefits include global scale and reach, always-on operation, a modern architecture with a thriving community, and Apache Kafka leadership through Confluent expertise.

Table of Contents
Data Silos Impede Response
Defining a Modern, Scalable Architecture
Securing Apache Kafka
Improving Availability with Multi-Region Clusters
Sizing Apache Kafka Infrastructure
Next Steps
Conclusion

Apache Kafka is the foundation of our CIP architecture. We achieve economies of scale as we acquire data once and consume it many times. Simplified connection of data sources helps reduce our technical debt, while filtering data helps reduce costs to downstream systems.

Intel vice president and Chief Information Security Officer Brent Conran explains, "Kafka helps us produce contextually rich data for both IT and our business units. Kafka also enables us to deploy more advanced techniques in-stream, such as machine-learning models that analyze data and produce new insights. This helps us reduce mean time to detect and respond; it also helps decrease the need for human touch. Kafka technology, combined with Confluent's enterprise features and high-performance Intel architecture, support our mission to make it safe for Intel to go fast."

Data Silos Impede Response
At Intel, our Information Security organization keeps the enterprise secure while managing legal compliance worldwide. It requires a large number of people, applications, databases, and analytics capabilities. In the past, these were not always well integrated, which led to silos in multiple locations focused on specific information. Intel IT is not alone. When left unaddressed, these disparate silos can lead to poor or no data integration, dead ends, inconsistencies, inaccurate interpretations, and significant technical debt. In these scenarios, organizations might spend more time maintaining outdated legacy systems than preparing for emerging threats. Siloed and disparate systems cannot keep pace with the rapidly changing threat landscape. They hinder efficient prevention, detection, and response to threats and vulnerabilities. We needed the ability to connect and integrate on a single platform that enables us to use our data to its greatest advantage.

"Kafka also enables us to deploy more advanced techniques in-stream, such as machine-learning models that analyze data and produce new insights."
–Brent Conran, Vice President and CISO, Intel

Selecting an Apache Kafka Supplier
Before choosing Confluent as our technology partner, we evaluated several publish and subscribe (Pub/Sub) data pipeline technologies for our CIP solution. We needed the capability to ingest massive amounts of data, produce feeds, and perform in-stream data transformations (also known as stream processing) with high throughput and low latency. Essentially, we needed a circulatory system that could filter, refine, and improve data before it is consumed. We saw that the ability to acquire data once and consume it many times would deliver economies of scale to Information Security, IT operations, and Intel's business units. We decided Apache Kafka was the best technology for these desired capabilities.

Information Security's mantra is to buy before build whenever possible, and we knew we needed a Kafka supplier to help with the typical pain points and risks that accompany pure open source software. Upon scanning the market, it quickly became obvious that Confluent was the leader in supplying value-add platform capabilities, in addition to enterprise software support for Kafka. Other options claimed to support Kafka; however, because they were also focused on a broader set of software, such as Hadoop, Spark, and Apache NiFi, their expertise with Kafka was diluted. Conversely, Confluent was laser-focused on Kafka software and its surrounding community, making them our partner of choice.

Defining a Modern, Scalable Architecture
Our Information Security organization works in concert to operate on data, construct detection logic, and create dashboards to quickly identify, triage, and mitigate threats. Effectively communicating event data in a timely manner is critical. Our solution requires a resilient, highly available, scalable platform capable of ingesting multiple TBs of data per day from many sources, in near real time. Our Cyber Intelligence Platform (CIP) goes beyond traditional security information and event management capabilities to continually increase data effectiveness across the organization. That includes vulnerability management, security compliance and enforcement, threat hunting, incident response, risk management, and other activities. Through modern, industry-leading technologies, our new CIP increases the value of our data by making it more usable and accessible to everyone in the organization (see Figure 1).

[Figure 1 diagram: an Enterprise Security Message Bus (topics, publish/subscribe, transform, enrich, filter, join) connects Information Security data sources — vulnerability scanning, threat intelligence, network enforcement, deceptions, data protection, intrusion detection and prevention, firewalls, endpoint detection and response, data loss prevention, and other sources — through connectors to an API/data virtualization layer, a common security data lake, and clients for incident response, security event management, user and event behavior analytics, vulnerability management, compliance dashboards, advanced analytics, an analytics workbench, threat intelligence, and workflow automation.]

Figure 1. Our new CIP requires modern technologies that quickly and easily consume new data sources, employ machine learning, and extend capabilities.

The combination of open source Apache Kafka with Confluent as our technology partner enabled Information Security to quickly realize our objectives and vision for the Enterprise Security Message Bus in our CIP solution architecture (see Figure 2).

Benefits of Apache Kafka
Apache Kafka and Confluent Platform offer several benefits:
• Economies of scale. The concept of acquiring data once and consuming it many times is extremely powerful in describing the benefits of Kafka. When data is produced to Kafka once, the options for consuming that data become endless. Gone are the days of many point-to-point and custom connectors to integrate disparate technologies (see Figure 3). It is now more common to seamlessly share data across multiple people, applications, and systems. It also makes it easier to monitor, build, and maintain data pipelines, all in one place.
• Operate on data in-stream. Rapid threat detection and response is a primary goal for the Information Security team, whose successes are often measured by indicators like mean-time-to-detect and mean-time-to-respond. Intel is no different. Now, with the ability to operate on data in-stream, we can identify threats in near real time.
• Reduce downstream costs. The ability to filter data in-stream can help reduce costs. For example, if the first operation you plan to perform is filtering duplicate data, it may be better to do it in-stream, thus reducing compute and ingestion costs in downstream applications like Splunk.
• Reduce technical debt. We eliminated hundreds of legacy and custom point-to-point connectors for our security capabilities and analytic solutions. This has significantly reduced the typical care and maintenance costs related to supporting these disparate technologies (see Figure 3).
• Generate contextually rich data. Transforming data in-stream enables us to deliver contextually rich, clean, and high-value data to our Information Security teams, business units, and functional organizations. For example, by performing streaming joins and enrichments, we can integrate multiple, disparate security data sources in near real time. This transformed data enhances downstream applications' abilities to make faster and/or more accurate decisions.

Confluent's Enterprise Capabilities
Confluent provides enterprise capabilities beyond open source Apache Kafka that make it more powerful, manageable, maintainable, and easy to use. Some of these key capabilities are:
• Confluent Control Center for monitoring our Kafka clusters
• Replicator and Multi-Region Clusters for multiple data center design and fault tolerance
• Auto Data Balancer for cluster rebalancing and easy scale out, when needed
• Enterprise-grade security, including Role-Based Access Control and audit logs
• 100+ pre-built connectors to common sources and sinks
• Technical Account Manager for feature prioritization and guidance
• Architecture and design partnership via a Confluent Resident Architect and "health check" engagements
• Ongoing training available in real time (for example, webinars) and on demand
• Availability of 24x7x365 support
• Partnership for collaborating on new capabilities, such as security features
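The in-stream duplicate filter mentioned above can be sketched as a small stateful predicate. In production this logic would run as a Kafka Streams or ksqlDB application backed by a state store; the plain-Python sketch below (with hypothetical event strings) only illustrates the idea of dropping repeats before they reach a costly downstream sink like Splunk:

```python
import hashlib

def make_dedup_filter(window_size=100_000):
    """Return a predicate that drops recently seen events.

    A bounded in-memory set stands in for a Kafka Streams state
    store (illustrative only, not a production implementation).
    """
    seen = set()
    order = []

    def is_new(event: str) -> bool:
        key = hashlib.sha256(event.encode()).hexdigest()
        if key in seen:
            return False               # duplicate: filter out in-stream
        seen.add(key)
        order.append(key)
        if len(order) > window_size:   # evict oldest key to bound memory
            seen.discard(order.pop(0))
        return True

    return is_new

dedupe = make_dedup_filter()
events = ["login ok user=a", "login ok user=a", "login fail user=b"]
unique = [e for e in events if dedupe(e)]
# unique == ["login ok user=a", "login fail user=b"]
```

Only the deduplicated stream is then produced to the downstream topic, so consumers pay ingestion and storage costs for each distinct event once.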

[Figure 2 diagram: Confluent Platform provides the message bus and stream processing, running on high-performance compute and storage, with connections to partner and IT operations systems.]

Figure 2. Confluent Platform supports our vision to generate contextualized data once and consume it many times, creating economies of scale across many Intel® organizations.

• Global scale and reach. Kafka's distributed bus technology connects data centers across the globe, in the cloud, in multiple regions and countries, and all the way to the edge. Message bus and queuing technologies need to integrate with each other, and you can easily integrate Kafka with traditional message queuing (MQ) technologies or more modern technologies like Azure Event Hubs, AWS Kinesis, GCP Pub/Sub, Confluent Cloud, and more.
• Always on. Kafka's solution architecture enables Confluent Platform to be an always-on service, eliminating the days of managed downtime. Notifications to all producers and consumers of an impending reboot or upgrade are no longer necessary. The ability to do live rolling upgrades and restarts allows you to perform these actions seamlessly. In addition, producers and consumers can go on and offline (planned or unplanned downtime) without affecting each other or Kafka. Kafka continues to record, persist, and replicate data even when elements are offline. When elements come back online, they can independently return to where they left off.
• Modern architecture with a thriving community. The Apache Kafka community is very active and continuously delivers new and innovative features, APIs, abstraction layers, and ways to connect to other systems. Kafka features and use cases are constantly and rapidly emerging thanks to the attentive community, of which Confluent is an active leader.
• Apache Kafka leadership through Confluent expertise. The relationship between Intel and Confluent has enabled us to more rapidly meet our goals and deliver on the promise and vision of our reference architecture. Confluent is an industry leader in supplying solutions and support for Kafka. They have proven to be laser-focused on Kafka and the community that surrounds it.

Data Transformation Using Apache Kafka
Historically, data is connected using point-to-point integration, which can add significant complexity and become difficult to maintain over time. Sustaining point-to-point integration also increases overall technical debt. Adding data sources is time-consuming, monitoring and governance are difficult, and orchestration is often unachievable. However, with the Kafka message bus, data is acquired once and consumed by multiple applications (see Figure 3).

The message bus creates a data abstraction layer for upstream producers and downstream consumers, and it requires less work to maintain. Customers no longer need to develop their own point-to-point integrations; they simply subscribe to the topics they need. Data sources and applications can be added or removed without disrupting the entire system.

Pub/Sub also enables publishers or "producers" to categorize data into classes for consumption by subscribers or "consumers." Through categorization, data topics are defined; in the Information Security environment, examples are context topics, detect topics, and prevent topics. With Kafka Streams, we can slice, filter, enrich, aggregate, and normalize data in-stream. This allows publishers and subscribers to work independently when editing and using information, ultimately increasing the overall value of the data. We can transform data by:
• Enriching. Enriching an IP address by associating a host name.
• Joining. Combining an event containing a user account with other data containing worker-specific information, such as a business unit. Or, if there is a known threat with specific matches (such as a nefarious URL), we can branch the data into a new topic associated with traffic detail about that URL.
• Slicing. Selecting data based on details, such as date, site, or device.
• Filtering. Removing extraneous or redundant details, making the data easier for consumers to use and lowering ingestion and data storage costs.
• Parsing. Evaluating a text string to test conformity to a structural pattern, such as validating the correct format for an email address.

Kafka Streams is an API/framework on top of Kafka that can be distributed across servers in its own cluster. Those servers include consumers that pull data from topics to perform the stream-processing tasks. The results use Kafka producers to write back to new topics or even downstream to other systems like databases and APIs.

Stream processing also allows us to modernize many of our data pipelines by replacing some traditional daily batch Extract-Transform-Load (ETL) jobs with close to real-time enrichment. We no longer need to define a schema for every data source, which gives us a more powerful tool for data engineering.
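As a concrete, simplified illustration of these transformations, the sketch below applies enrich, join, filter, and parse steps to a stream of dicts in plain Python. In our environment the equivalent logic runs as Kafka Streams topologies; all lookup tables and field names here are hypothetical:

```python
import re

HOSTS = {"10.0.0.5": "web-01"}                  # hypothetical IP→host table
WORKERS = {"jdoe": {"business_unit": "Sales"}}  # hypothetical worker data

def enrich(event: dict) -> dict:
    """Enrich an IP address by associating a host name."""
    event["host"] = HOSTS.get(event.get("ip"), "unknown")
    return event

def join(event: dict) -> dict:
    """Combine an event's user account with worker-specific information."""
    event.update(WORKERS.get(event.get("user"), {}))
    return event

def keep(event: dict) -> bool:
    """Filter: drop extraneous heartbeat records."""
    return event.get("type") != "heartbeat"

def parse_ok(event: dict) -> bool:
    """Parse: test that the email field conforms to a simple pattern."""
    return bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", event.get("email", "")))

stream = [
    {"type": "login", "ip": "10.0.0.5", "user": "jdoe", "email": "jdoe@example.com"},
    {"type": "heartbeat"},
]
out = [enrich(join(e)) for e in stream if keep(e)]
# out[0] now carries host="web-01" and business_unit="Sales";
# the heartbeat record has been filtered out
```

Each function corresponds to a single operator in a streams topology, which is what lets publishers and subscribers evolve independently: a new consumer simply subscribes to the enriched topic.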

[Figure 3 diagram: Before — without a message bus, applications connect through point-to-point integration. After — a publish/subscribe message bus provides transformation, orchestration, abstraction, resilience, scalability, and availability between producing and consuming applications.]

Figure 3. The message bus eliminates point-to-point integration, connecting applications across the enterprise to highly refined and relevant data.

Leveraging the power of Kafka Streams, we can do things like enrich a stream of vulnerability data by joining it with asset ownership data, or filter the vulnerabilities by business unit based on IP subnets. This process of filtering and joining the data "in stream" results in new topics populated with contextually enriched data for a variety of systems, applications, and users to consume (see Figure 4).

Managing Our Kafka Cluster
Confluent Control Center (C3) monitors and administers Confluent Platform. It consumes topics to derive metrics, aggregates and performs calculations, and displays the results through a web user interface. C3 leverages Kafka Streams in the backend to provide insights into the health of our Kafka platform, tracking key metrics like consumer lag, throughput, and latency. With the ability to alert when metrics exceed certain thresholds, our mean-time-to-respond is reduced when handling anomalous events in the environment.

Securing Apache Kafka
Confluent Platform uses role-based and centralized access control and supports transport layer security (TLS) mutual authentication between brokers and clients, with access control lists for topic-level security (see Figure 5). For example, a producer client may be assigned a single, specific topic to produce to, but not to consume.

[Figure 4 diagram: producers — a vulnerability scanning engine (asset configuration, CVEs, CVSS), IP address management (ownership, business units), and asset management (asset inventory, ownership) — publish to Kafka topics; Kafka Streams applications filter vulnerabilities by business unit IP address range and join asset ownership with vulnerable assets; consumers of the resulting vulnerability topics include the data lake, business unit partners, IT partners, the SIEM, and SOAR enforcement.]

Figure 4. Our Confluent Platform contains hundreds of topics. In this example, we transform data in-stream and share with Information Security teams, IT operations, and business unit consumers.
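The Figure 4 pipeline can be approximated in a few lines: slice vulnerability events into business units by IP subnet, then join them with asset ownership. The subnets, owners, and field names below are hypothetical stand-ins for our IP-range and ownership topics, and the real implementation runs as Kafka Streams applications rather than plain functions:

```python
import ipaddress

# Hypothetical reference data (in production: IP range and ownership topics)
BU_SUBNETS = {"Manufacturing": ipaddress.ip_network("10.1.0.0/16")}
OWNERS = {"10.1.2.3": "factory-team"}

def business_unit(ip: str):
    """Slice a vulnerability event into a business-unit topic by IP subnet."""
    addr = ipaddress.ip_address(ip)
    for bu, net in BU_SUBNETS.items():
        if addr in net:
            return bu
    return None

def enrich_with_owner(event: dict) -> dict:
    """Join a vulnerability with asset ownership, keyed by IP address."""
    return {**event,
            "owner": OWNERS.get(event["ip"], "unassigned"),
            "business_unit": business_unit(event["ip"])}

vuln = {"cve": "CVE-2020-0001", "ip": "10.1.2.3"}
enriched = enrich_with_owner(vuln)
# enriched carries business_unit="Manufacturing" and owner="factory-team"
```

The enriched records land in a "vulnerabilities with owners" topic, so the SIEM, SOAR, and business-unit consumers each read ready-to-use data instead of repeating the join themselves.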

[Figure 5 diagram: Confluent Control Center, Schema Registry, connectors, producers, consumers, and stream processors connect to the broker cluster over TLS; an admin user authenticates via SASL; the three-node ZooKeeper ensemble uses SASL Digest-MD5; authorization across ZooKeeper and the brokers is enforced through access control lists.]

Figure 5. Confluent Platform uses role-based and centralized access control with topic-level security.

Improving Availability with Multi-Region Clusters
Identifying ways to improve data availability is one of the most important jobs of our Kafka architects and engineers. Rather than the traditional two-cluster approach to ensuring availability, we implemented Confluent's Multi-Region Clusters capability.

In the traditional model, data consumers use offsets to keep track of where in the stream they were located. In our previous architecture, the two Kafka cluster sites were isolated; while we replicated data across sites and maintained topic names, each cluster had its own offsets. For example, when a consumer pulled data from data center 1 Topic A and stopped at offset 100, attempts to fail over to data center 2 Topic A at offset 100 were not always correct, as offset ordering was not maintained.

Now, our Confluent Platform cluster spans two regions (or data centers), which eliminates the manual offset work and allows consumers to immediately return to where they left off. Because both sites are included in a single cluster, topics and offsets remain intact using asynchronous replication. Data center 2 may experience a slight latency, but it will correctly maintain offsets and make fail-over much easier (see Figure 6).
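The offset problem described above is easy to demonstrate with a toy model (plain Python lists standing in for partition logs, not real Kafka APIs): two independently replicated clusters can assign different offsets to the same records, so "resume at offset 100" points at different events after a fail-over.

```python
# Toy model: two isolated clusters replicating the same topic each
# assign their own offsets, so an offset is only meaningful within
# the cluster that assigned it.
dc1_log = ["evt-a", "evt-b", "evt-c"]   # DC1 offsets 0, 1, 2
dc2_log = ["evt-b", "evt-a", "evt-c"]   # replica ingested in a different order

stopped_at = 1                          # consumer last committed offset 1 on DC1
# After failing over to DC2, the same offset points at a different record:
mismatch = dc1_log[stopped_at] != dc2_log[stopped_at]   # True: "evt-b" vs "evt-a"

# With a stretched multi-region cluster there is a single log, so the
# committed offset identifies the same record in either region:
single_cluster_log = dc1_log
resume_record = single_cluster_log[stopped_at]          # "evt-b" everywhere
```

This is why a single cluster spanning both data centers, rather than two mirrored clusters, lets consumers resume exactly where they left off.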

[Figure 6 diagram: producers, streaming apps, and consumers attach to a single cluster spanning Data Center 1 (leader brokers, in-sync-replica ZooKeeper nodes) and Data Center 2 (observer brokers, mirrored asynchronously), with Data Center 3 hosting the ZooKeeper ensemble tiebreaker.]

Figure 6. We use Confluent Platform to implement multi-region clusters for a more resilient architecture and significantly less manual work in fail-over events.

Information Security's data needs are similar to those of other business and IT organizations. Our data volume continues to increase, and the velocity of data handling keeps accelerating. Our mission-critical applications need data shared between public and private services, database archives, SaaS applications, and thousands of device feeds. Their interdependencies make solutions difficult to maintain and create delays in business insights and automated response times. We also wanted 24x7x365 access to data in real time, without degrading its quality. This requires replicating and storing data in multiple locations. We also wanted to share the data from many sources and allow multiple applications to use it.

With Kafka, we can now easily add and remove producers, consumers, and topics. We're seeing system flexibility and cost reduction with optimized data stores. Like many other Kafka users, we're also slicing, filtering, enriching, aggregating, and normalizing data in-stream. Confluent's Control Center (C3) and Multi-Region Clusters help us manage our Kafka clusters for high availability and better performance. Our engineers and analysts can write rules-based logic to automate the mundane, such as filtering out false positives. Our data scientists and threat hunters are using tools like Kafka Streams to hunt potential threats in real time, and our architecture team is adding new capabilities, like KSQL, to strengthen our position in the fight against cybercrime.

Intel and Confluent, Better Together
The benefits of Kafka and Confluent Platform to our CIP raised the visibility of Apache Kafka within Intel. The solution architects and engineers in Intel's Data Center Platforms Group are now working with Confluent to deliver higher performance, lower cost solutions. For example, Intel® Optane™ SSDs provide low latency, high throughput, and 6.5X higher endurance than Intel® 3D NAND SSDs.1,2 Another example is C3. Our testing revealed that 2nd Gen Intel® Xeon® Gold 6258R processors with 28 cores (56 virtual cores) provide the necessary performance for this demanding application.

Additional findings are available in the paper, "Enabling Real-Time Processing of Massive Data Streams." The paper is designed to help architects and engineers:
• Identify the infrastructure (compute, memory, network, and storage) required for today's Confluent Platform implementations.
• Size Confluent Platform for scale-up or scale-out, for business objectives that require increased endurance, low-latency, and high-throughput workloads.

Pairing Confluent Platform's enterprise features and benefits with Intel® products and technologies provides organizations with a powerful solution that can generate contextually rich analytics from hundreds of data sources, across tens (even hundreds) of terabytes of data per day.

Sizing Apache Kafka Infrastructure
Intel's Information Security organization ingests approximately 18 TB of data per day—equivalent to 18 billion events—on the message bus. To handle this workload, we implemented a highly available multi-region Kafka cluster that includes 16 broker servers, 8 connect worker servers, 12 stream processor servers, 5 ZooKeeper servers, 4 schema registry servers, and 2 C3 servers. The cluster runs on a 10 GbE network across three sites (or data centers), with the third site hosting a ZooKeeper server as the ensemble tiebreaker. Apache Kafka is designed to take advantage of parallel processing and scale out with multiple servers and CPU cores.

The first Kafka brokers, connect workers, and stream processor servers we deployed were two-socket servers with 14-core Intel® Xeon® Gold 6132 processors. To expand compute and storage capacity, we added brokers and connect workers on two-socket servers built with 18-core Intel® Xeon® Gold 6140 processors. We also added stream processor servers on two-socket servers built with 24-core Intel® Xeon® Platinum 8160 processors and equipped them with 384 GB of memory and Intel® 3D NAND SSDs. To support the growth of additional Kafka Streams applications, we upgraded memory on the stream processor servers to 1 TB each. In our experience, stream processor servers are memory-intensive because they run Kafka Streams applications to filter, enrich, and transform data. As we continue to expand our Kafka cluster, the new broker, connect worker, and stream processor servers are standardized as two-socket servers built with 28-core 2nd Gen Intel® Xeon® Gold 6258R processors (56 physical cores per server) with 384 GB of memory.

Intel® Optane™ SSDs vs. 3D NAND: What is the Difference?
Intel® Optane™ SSDs are fundamentally different from 3D NAND. Data in NAND is grouped into blocks. As drive capacities grow, these blocks also grow, causing write delays and slowing application performance. NAND also manages writes with complex wear-leveling and garbage-collection processes inside the SSD, creating additional overhead and increasing CPU wait time for data retrieval. Additionally, NAND cells can be written only a few thousand times before they no longer perform.
Intel® Optane™ memory media is a new technology built on a unique byte-addressable architecture similar to DRAM, yet non-volatile like NAND. Intel Optane memory media excels when accessing slow devices, networked storage, and in some caching or tiering models. In addition, the write speeds of Intel Optane memory media are similar to its read speeds—well below 10 microseconds for a 4 kB storage block. This read/write balance makes it a great choice for demanding write-based applications, such as Apache Kafka's write log or Splunk's hot bucket.

Next Steps
As our CIP matures, we are planning new ways to take advantage of Confluent capabilities. Looking ahead, we are addressing the following:
• Identify new use cases. We are actively identifying and including new security use cases that can benefit from in-stream processing.
• Increase I/O performance. We plan to add Intel Optane SSDs as storage on our brokers. These SSDs, along with the upcoming tiered storage feature in Confluent Platform, will help significantly speed disk I/O and enhance message durability. Because clients consume and process data from Kafka in near real time, we plan to store up to 24 hours of data on Intel Optane SSDs and the rest on Intel 3D NAND SSDs. We also plan to put Intel Optane SSDs on the C3 server to help boost performance and reduce start-up time of the C3 application.
• Increase network speed. Network speed is one of the most important factors in Kafka performance. Network bandwidth can become a bottleneck, impacting event throughput and latency. We are currently running our Kafka cluster on a 10 GbE network and plan to upgrade it to 40 GbE, 50 GbE, or 100 GbE to support the increasing workload.
• Schema support and Confluent's ksqlDB. ksqlDB is a purpose-built event-streaming database for stream-processing applications. Without schema, it can be difficult to work with ksqlDB, but recent JSON schema support helps us move closer to a complete streaming application with just a few ksqlDB statements.
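The ingest figures above imply the sustained rates the cluster and network must handle. A back-of-the-envelope check (plain Python, assuming uniform load and decimal units) shows why network bandwidth is a focus:

```python
TB_PER_DAY = 18
EVENTS_PER_DAY = 18_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

avg_mb_per_s = TB_PER_DAY * 1_000_000 / SECONDS_PER_DAY     # ≈ 208 MB/s
avg_events_per_s = EVENTS_PER_DAY / SECONDS_PER_DAY         # ≈ 208,333 events/s
avg_event_bytes = TB_PER_DAY * 1e12 / EVENTS_PER_DAY        # 1,000 bytes/event
```

Roughly 208 MB/s is only the average producer-side rate; traffic peaks, replication between brokers, and consumer fan-out multiply the actual bytes on the wire, which is why a 10 GbE fabric (about 1.25 GB/s per link) can become the bottleneck and an upgrade path to 40 GbE and beyond matters.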

• Machine learning. The increasing need to detect malicious behavior in real time necessitates more than the ad hoc application of machine-learning techniques. We are working to develop in-stream machine-learning applications that can automatically normalize, join, filter, and enrich machine data (for example, syslog data) in-stream. For example, ksqlDB and Kafka Streams can be leveraged for feature engineering and for providing facts, features, and labels to model-generating algorithms or inference applications. Such model training and inference might also be implemented as a Kafka Streams application or a ksqlDB user-defined function.

Conclusion
The frequency and sophistication of cyber threats are constantly growing. At Intel, we invested in a modern, scalable CIP using Kafka and Splunk to help address this. We selected Confluent as our Kafka partner to help with the typical pain points and risks that accompany pure open source software deployments.

We immediately started to realize the benefits of Kafka by achieving economies of scale. We can now acquire data once and consume it many times. Other benefits, like the reduction of technical debt by eliminating legacy point-to-point and custom connectors, soon followed. And through data filtering, we have reduced the cost of ingest and storage in a variety of downstream systems.

The most transformational capability of Kafka for cybersecurity is in-stream processing. The ability to operate on data as it is produced helps security responders improve detection techniques and response times. It also enables us to develop and deploy more advanced techniques, such as machine-learning models that perform in-stream processing, which can identify threats in near real time. These advances are helping us achieve our mission to "make it safe for Intel to go fast." Using industry-leading technologies that provide a modern, scalable architecture enables us to continue to transform well into the future.

"Kafka technology, combined with Confluent's enterprise features and high-performance Intel architecture, support our mission to make it safe for Intel to go fast."
–Brent Conran, Vice President and CISO, Intel

Related Content
If you liked this paper, you may also be interested in these related stories:
• Transforming Intel's Security Posture with Innovations in Data Intelligence
• Advanced Persistent Threats: Hunting the One Percent
• Enabling Real-Time Processing of Massive Data Streams
For more information on Intel IT best practices, visit intel.com/IT.

IT@Intel
We connect IT professionals with their IT peers inside Intel. Our IT department solves some of today's most demanding and complex technology issues, and we want to share these lessons directly with our fellow IT professionals in an open peer-to-peer forum.
Our goal is simple: improve efficiency throughout the organization and enhance the business value of IT investments.
Follow us and join the conversation:
• Twitter
• #IntelIT
• LinkedIn
• IT Peer Network
Visit us today at intel.com/IT or contact your local Intel representative if you would like to learn more.

1 Test Config: Test by Intel as of 6/30/2020. 1-node, 2x Intel® Xeon® Platinum 8280 processor, 28 cores, HT on, Turbo on, Total Memory 384 GB (12 slots/32 GB/2933 MHz), BIOS: SE5C6 20.86B.02.01.0010.010620200716 (ucode: 0x400002c ), CentOS 8, 4.18.0-147.8.1.el8_1.x86_64, gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), NIC X722 10G, Confluent Control Center 5.5, Tunings: Open JDK 11, Broker (java) Memory: 128 GB, Topic config: (60 Partitions, 3x replication, 2 min_insyn replicas, ack=1 [default],0,-1), Open files:16384, Max mmap:225000; Config-1: 4x 4 TB Intel® SSD DC P4510 (3D-NAND); Config-2: 4x 375 GB Intel® Optane™ SSD DC P4800X (3DXP). 2 For performance comparisons, see intel.com/content/www/us/en/architecture-and-technology/optane-technology/faster-access-to-more-data-article-brief.html. For endurance information, see intel.com/content/www/us/en/architecture-and-technology/optane-technology/delivering-new-levels-of-endurance-article-brief.html. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit intel.com/benchmarks. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure. Your costs and results may vary. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. 
Check with your system manufacturer or retailer or learn more at intel.com. The information provided in this paper is intended to be general in nature and is not specific guidance. Recommendations (including potential cost savings) are based upon Intel’s experience and are estimates only. Intel does not guarantee or warrant others will obtain similar results. Information in this document is provided in connection with Intel products and services. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel’s terms and conditions of sale for such products, Intel assumes no liability whatsoever and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products and services including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation in the U.S. and/or other countries. Other names and brands may be claimed as the property of others. © Intel Corporation 1020/WWES/KC/PDF 343266-001US