Building the Enterprise Data Lake with Cloudera & Cisco Prepared by : Marilyn Tan, Country Manager Singapore Xue Daming, Senior Systems Engineer

© Cloudera, Inc. All rights reserved. 1 Digital Transformation with Data

© Cloudera, Inc. All rights reserved. 2 DATA is Transforming the World!

DIGITAL DEMOCRACY & CONNECTED WORLD MORE DATA SMART APPs SECURITY

INTERNET OF THINGS DATA AS INDUSTRY 4.0 4th PRODUCTION FACTOR THE NEW UX RISE OF OPEN SOURCE

- Smart Things / Devices - New Analytics - Connected Experience - Digital Sharing Economy: - New Use Cases - Machine Learning & AI - Ubiquitous computing: Open Data & Algorithms - - New Architectures - New data sources Everywhere & on every Enterprise ready Open device (Voice, VR, AR, Source (e.g. Apache) - By 2020: 20.8b devices - Data Virtualization mobile, Wearables) - Digital (distributed) Trust - IoT: $1.7trillion in 2020 - Data Science (esp. Blockchain)

© Cloudera, Inc. All rights reserved. 3 The 9 year Cloudera journey …

2008 2011 2012 2013 2014 2016 CLOUDERA FOUNDED BY CLOUDERA REACHES 100 CLOUDERA ENTERPRISE 4 CLOUDERA EXPANDS CLOUDERA FOCUSSES ON NAVIGATOR OPTIMIZER MIKE OLSON, PRODUCTION CUSTOMERS THE STANDARD FOR HADOOP BEYOND MR AND HBASE, SECURITY, AND GENERAL AVAILABILITY, AMR AWADALLAH & IN THE ENTERPRISE INTRODUCING IMPALA, GOVERNANCE WITH IMROVED CLOUD COVERAGE JEFF HAMMERBACHER, SOLR AND SPARK NAVIGATOR 2 AND CLOUD WITH AWS, AZURE AND GCP CHRISTOPHE BISCIGLIA WITH DIRECTOR JOINED BY (2009)

CLOUDERA Altus ENTERPRISE Clouds ∀ CDH / CM ENTERPRISE CDSW 4 DATA HUB

2009 2011 2012 2014 2015 2017… CDH: FIRST COMMERICAL CLOUDERA UNIVERSITY CLOUDERA CONNECT CLOUDERA INTRODUCES CLOUDERA INCLUDES CLOUDERA ACQUIRED FAST EXPANDS TO 140 COUNTRIES REACHES 300 PARTNERS THE ENTERPRISE DATA HUB KAFKA, KUDU AND FORWARD LABS, ANNOUNCED PaaS DISTRIBUTION & SUPPORT IMPLEMENTS ACROSS SI, HARDEWARE, AND CLOUDERA RECORD SERVICE WITHIN ALTUS, DATA SCIENCE WORKBENCH, CLOUDERA MANAGER FOLLOW THE SUN MODEL AND SOFTWARE PARTNERS ENTERPRISE 5 CLOUDERA ENTERPRISE SHARED DATA EXPERIENCE (SDX) AND MORE TO COME!

© Cloudera, Inc. All rights reserved. 4 What Happen Next: A Decade of Hadoop

18 Projects

and beyond © Cloudera, Inc. All rights reserved. 5 Gartner Analytics Ascendancy Model What will make it happened?

Prescriptive What will Analytics happened?

Why did it Predictive Analytics happened? Optimization

Diagnostic What Analytics

Value happened?

Descriptive Analytics

Hindsight Insight Foresight

Information

Difficulty © Cloudera, Inc. All rights reserved. 6 Cloudera & Cisco Enterprise Data Lake Innovation

© Cloudera, Inc. All rights reserved. 7 Cisco UCS Integrated Infrastructure with Cloudera for IoT

Data Analytics Real-Time

Real-Time Data Store UCS C220/C240 Data Inject (CoAP/MQTT.XMPP) Data Processing

Fog Kafka DATA C800/UCS Mini/ Cisco UCS C240 Aggregator UCS C240 Cisco UCS C240 Batch

ISR 8x9 with 4G LTE and Dual 802.11n a/g/n (WiFi) Radios Speed Layer Batch Layer Serving Layer Managed by Cisco FogDirector Big Data Store UCS C240/C3160 Cisco UCS at all layers, fully validated architectures with all major players

© Cloudera, Inc. All rights reserved. 8 Fabric Centric Design

High Performance UCS Manager 40 GB/s Ethernet; 320 GB/s per Chassis

Unified Fabric Management

Single Cable for Network, Storage, and Ethernet Management Traffic Storage Easy to Scale Single Point of Management: Add Cables for Bandwidth vs. Fabric Type

© Cloudera, Inc. All rights reserved. 9 Management Simplicity

Big Data: Management Consistency Hundreds of Servers Thousands of management points Simplified Scalability Easily Scale your infrastructure from few servers to thousands of servers with a fully Integrated Infrastructure UCS Service Profile Cisco ACI Application Profile Centralized Management Service Profiles for Servers • Manage all servers centrally Application Profiles for Network • Manage all network centrally

© Cloudera, Inc. All rights reserved. 10 The enterprise platform for machine learning

PATTERN RECOGNITION DRIVE CUSTOMER INSIGHTS Market segmentation Customer 360 Next best offer Churn analysis & prevention

PREDICTION DETECTION 500+ CUSTOMERS RUN CONNECT PRODUCTS & SERVICES (IoT) PROTECT BUSINESS ON Cybersecurity Predictive maintenance Fraud Genomics & personalized medicine Anti-money laundering Predicting and preventing disease Risk modeling & assessment Natural language SPAM detection

© Cloudera, Inc. All rights reserved. 11 Machine learning requires a complete stack.

Business Integration

Prepare Analyze Data Deploy

• Load external data • Diagnose/treat data issues • Publish to BI/Viz • Process structured data • Design experiments • Deploy to batch scoring • Process unstructured data • Partition data • Deploy to real-time scoring • Process streaming data • Engineer features • Deploy to scoring API • Cleanse data • Train and validate models • Manage models • Vectorize data • Evaluate and assess models • Monitor model performance

● Batch Processing ● Analytic Languages ● BI/Viz ● Stream Processing ● ML Libraries ● Interactive SQL ● Interactive SQL ● Batch Processing ● Search Tools ● Stream Processing ● Text/Image Processing ● Operational DB

Administration, Governance and Security

© Cloudera, Inc. All rights reserved. 12 A complete, integrated enterprise platform

Cloudera Enterprise Data Hub

Cloudera Distribution for Hadoop

© Cloudera, Inc. All rights reserved. 13 Cloudera Data Science Workbench

• Supports data science end-to-end • Full access to data • Secure self-service provisioning • Containerized environments • Supports Python, R, and Scala • Automates: Workflow Version control Collaboration Sharing

© Cloudera, Inc. All rights reserved. 14 CDSW Benefits Data Scientists IT • Web browser, no desktop footprint • Support self-service data science • Use R, Python, or Scala • Full platform security • Install any library or framework • Kerberos authentication • Isolated project environments • Run on-premises or in the cloud • Direct access to data in secure clusters • Share insights with team • Reproducible, collaborative research • Automate and monitor data pipelines • Built-in job scheduling

© Cloudera, Inc. All rights reserved. 15 Deep learning in Cloudera with Apache Spark

Spark Packages DL4J BigDL • Two packages: • Open source DL library • Deep learning framework • CaffeOnSpark • Developed by Skymind • Developed by • TensorFlowOnSpark • Built on JVMs • Supports CPUs only • Developed by Yahoo • Supports CPUs and GPUs • Leverages Intel MKL • Python and Scala APIs • Java, Scala, Python APIs • Scala, Python APIs • All DL architectures • Training and inference • Imports models from: • Integrated pipeline • Imports models from: • TensorFlow • Run on existing clusters • TensorFlow • Caffe • Training and inference • Caffe • Torch • Torch • Runs on existing clusters • Theano • Runs on existing clusters

© Cloudera, Inc. All rights reserved. 16 New! Accelerated deep learning on-demand with GPUs

“Our data scientists want GPUs, but we can’t find a way to deliver multi-tenancy. Multi-tenant GPU support on-premises or cloud If they go to the cloud on their own, it’s expensive and we lose governance.” Data Science CDH CDH Workbench

● Extend existing CDSW benefits to GPU- optimized deep learning tools ● Schedule & share GPU resources CPU GPU CPU CPU ● Train on GPUs, deploy on CPUs distributed single-node training ● Works on-premises or cloud training, scoring

© Cloudera, Inc. All rights reserved. 17 Enterprise Data Lake Architecture

© Cloudera, Inc. All rights reserved. 18 Canonical Ingestion & Spark Streaming Analytics with Cisco Big Data Analytics Platform

• Integrate with Apache Spark Streaming for real-time analysis of data • Write back to Kafka for further processing or to send to an application layer

© Cloudera, Inc. All rights reserved. 19 Proposed Architecture for Enterprise Data Platform

AI Platform Advanced Analytics Data Visualization

Meteorological Data Sources Data Data Data War Warehouse Sensors Data eho use Historian/SCADA

Unstructured BW HANA files BPC Social any SAP NW

Video/Image Data WarehouseEnterprise Geospatial Data Warehouse

© Cloudera, Inc. All rights reserved. 20 Big Data Blueprints: Cisco Validated Designs

Designs Big Data What you get

Cisco® Validated Designs with Cloudera Industry-leading partnerships

Tested and validated reference architectures to meet performance, capacity, and scale

Joint engineering lab

Extensive options for data management (Hadoop, MPP, and NoSQL) to meet your business needs Solution bundles optimized for cost of ownership and ease of ordering

Solution designed, tested, and documented to facilitate faster, more reliable, and more predictable customer deployments.

© Cloudera, Inc. All rights reserved. 21 Our Customers’ Success Stories

© Cloudera, Inc. All rights reserved. 22 CASE STUDY TRANSPORTATION » PREDICTIVE MAINTENANCE » IMPROVED SERVICE DATA-DRIVEN PRODUCTS » DATA DRIVEN PRODUCTS

Using Predictive Maintenance to Improve Performance and Reduce Fleet Downtime • OnCommand Connection is collecting telematics and geolocation data across the fleet • Reduced maintenance costs to $.03 per mile from $.12-$.15 per mile • Centralizing data from 13 systems with varying frequency and semantic definitions • Real-time visibility of 250,000+ trucks in order to improve uptime and vehicle performance

© Cloudera, Inc. All rights reserved. 23 CASE STUDY TRAVEL & TRANSPORTATION » SMART BUILDINGS » PREDICTIVE MAINTENANCE DATA-DRIVEN » ADVANCED ANALYTICS PRODUCTSPROCESS

Smart Buildings - Preventative Maintenance Using Sensors & IoT to Improve Passenger Safety and Airport Efficiency Challenge: • Improve traveler satisfaction and safety, by reducing downtime for critical operational machinery Solution: • Cloudera on Azure to capture, secure, and correlate sensor (IoT) data collected from escalators, elevators, and baggage carousels • Provide necessary fixes to prevent unplanned downtime

© Cloudera, Inc. All rights reserved. 24 CASE STUDY 2016 Data Impact Award Winner State of Kentucky Department of Transportation Smart Cities Enabling the State of Kentucky manage snow and ice events in real time

Challenge: • Needed more efficient approach to inclement weather road management

Solution: • Real-time weather response system that incorporates real-time data from Waze, HERE, ESRI’s GeoEvent processor, and Automatic Vehicle Locations (providing sensor data from salt trucks). • KYTC aggregates 15-20 million records every day and process more than a million records per second.

© Cloudera, Inc. All rights reserved. 25 Data Protection & Governance

0 – Data in 1 – Behavior and 2 – Expanded Data 3 – EDH: Secure Isolation Transaction Fusion Surface Area Data Vault

Fully Compliance Ready Audit-Ready & Protected

Audit Ready For: PCI PII Full encryption, key Data Security & Governance management, transparency, Lineage Visibility and enforcement for all data- Basic Security Controls Metadata Discovery at-rest and data-in-motion Authorization Data in application silos Encryption & Key Authentication Management Limited Insights Comprehensive Auditing Summarized Security Compliance & Risk Mitigation

© Cloudera, Inc. All rights reserved. 26 Why Cloudera

The Platform for Next-Generation Analytics Cloudera Enterprise delivers the capabilities required by the largest enterprises, spanning analytics, security, governance, and management. We make Hadoop fast, easy, and secure. The Experience to Help You Succeed No one knows Hadoop like Cloudera. As the first Hadoop company, Cloudera is the world’s leading contributor to and provider of enterprise Hadoop, with experience you can rely on to help you succeed. Open Innovation Our unique hybrid open source strategy enables us to lead the enterprise expansion of the Hadoop ecosystem, driving innovative new capabilities and open standards in the community.

© Cloudera, Inc. All rights reserved. 27 Thank you [email protected] | +65 9822 2338 [email protected] | +65 9368 2316

© Cloudera, Inc. All rights reserved. 28