Cloudera-Enterprise Data Lake Presentation
Total Page:16
File Type:pdf, Size:1020Kb
Building the Enterprise Data Lake with Cloudera & Cisco Prepared by : Marilyn Tan, Country Manager Singapore Xue Daming, Senior Systems Engineer © Cloudera, Inc. All rights reserved. 1 Digital Transformation with Data © Cloudera, Inc. All rights reserved. 2 DATA is Transforming the World! DIGITAL DEMOCRACY & CONNECTED WORLD MORE DATA SMART APPs SECURITY INTERNET OF THINGS DATA AS INDUSTRY 4.0 4th PRODUCTION FACTOR THE NEW UX RISE OF OPEN SOURCE - Smart Things / Devices - New Analytics - Connected Experience - Digital Sharing Economy: - New Use Cases - Machine Learning & AI - Ubiquitous computing: Open Data & Algorithms - - New Architectures - New data sources Everywhere & on every Enterprise ready Open device (Voice, VR, AR, Source (e.g. Apache) - By 2020: 20.8b devices - Data Virtualization mobile, Wearables) - Digital (distributed) Trust - IoT: $1.7trillion in 2020 - Data Science (esp. Blockchain) © Cloudera, Inc. All rights reserved. 3 The 9 year Cloudera journey … 2008 2011 2012 2013 2014 2016 CLOUDERA FOUNDED BY CLOUDERA REACHES 100 CLOUDERA ENTERPRISE 4 CLOUDERA EXPANDS CLOUDERA FOCUSSES ON NAVIGATOR OPTIMIZER MIKE OLSON, PRODUCTION CUSTOMERS THE STANDARD FOR HADOOP BEYOND MR AND HBASE, SECURITY, AND GENERAL AVAILABILITY, AMR AWADALLAH & IN THE ENTERPRISE INTRODUCING IMPALA, GOVERNANCE WITH IMROVED CLOUD COVERAGE JEFF HAMMERBACHER, SOLR AND SPARK NAVIGATOR 2 AND CLOUD WITH AWS, AZURE AND GCP CHRISTOPHE BISCIGLIA WITH DIRECTOR JOINED BY DOUG CUTTING (2009) CLOUDERA Altus ENTERPRISE Clouds ∀ CDH / CM ENTERPRISE CDSW 4 DATA HUB 2009 2011 2012 2014 2015 2017… CDH: FIRST COMMERICAL CLOUDERA UNIVERSITY CLOUDERA CONNECT CLOUDERA INTRODUCES CLOUDERA INCLUDES CLOUDERA ACQUIRED FAST APACHE HADOOP EXPANDS TO 140 COUNTRIES REACHES 300 PARTNERS THE ENTERPRISE DATA HUB KAFKA, KUDU AND FORWARD LABS, ANNOUNCED PaaS DISTRIBUTION & SUPPORT IMPLEMENTS ACROSS SI, HARDEWARE, AND CLOUDERA RECORD SERVICE WITHIN ALTUS, DATA SCIENCE WORKBENCH, CLOUDERA MANAGER FOLLOW THE SUN MODEL AND SOFTWARE PARTNERS ENTERPRISE 5 CLOUDERA ENTERPRISE SHARED DATA EXPERIENCE (SDX) AND MORE TO COME! © Cloudera, Inc. All rights reserved. 4 What Happen Next: A Decade of Hadoop 18 Projects and beyond © Cloudera, Inc. All rights reserved. 5 Gartner Analytics Ascendancy Model What will make it happened? Prescriptive What will Analytics happened? Why did it Predictive Analytics happened? Optimization Diagnostic What Analytics Value happened? Descriptive Analytics Hindsight Insight Foresight Information Difficulty © Cloudera, Inc. All rights reserved. 6 Cloudera & Cisco Enterprise Data Lake Innovation © Cloudera, Inc. All rights reserved. 7 Cisco UCS Integrated Infrastructure with Cloudera for IoT Data Analytics Real-Time Real-Time Data Store UCS C220/C240 Data Inject (CoAP/MQTT.XMPP) Data Processing Fog Kafka DATA C800/UCS Mini/ Cisco UCS C240 Aggregator UCS C240 Cisco UCS C240 Batch ISR 8x9 with 4G LTE and Dual 802.11n a/g/n (WiFi) Radios Speed Layer Batch Layer Serving Layer Managed by Cisco FogDirector Big Data Store UCS C240/C3160 Cisco UCS at all layers, fully validated architectures with all major players © Cloudera, Inc. All rights reserved. 8 Fabric Centric Design High Performance UCS Manager 40 GB/s Ethernet; 320 GB/s per Chassis Unified Fabric Management Single Cable for Network, Storage, and Ethernet Management Traffic Storage Easy to Scale Single Point of Management: Add Cables for Bandwidth vs. Fabric Type © Cloudera, Inc. All rights reserved. 9 Management Simplicity Big Data: Management Consistency Hundreds of Servers Thousands of management points Simplified Scalability Easily Scale your infrastructure from few servers to thousands of servers with a fully Integrated Infrastructure UCS Service Profile Cisco ACI Application Profile Centralized Management Service Profiles for Servers • Manage all servers centrally Application Profiles for Network • Manage all network centrally © Cloudera, Inc. All rights reserved. 10 The enterprise platform for machine learning PATTERN RECOGNITION DRIVE CUSTOMER INSIGHTS Market segmentation Customer 360 Next best offer Churn analysis & prevention PREDICTION DETECTION 500+ CUSTOMERS RUN CONNECT PRODUCTS & SERVICES (IoT) PROTECT BUSINESS ON Cybersecurity Predictive maintenance Fraud Genomics & personalized medicine Anti-money laundering Predicting and preventing disease Risk modeling & assessment Natural language SPAM detection © Cloudera, Inc. All rights reserved. 11 Machine learning requires a complete stack. Business Integration Prepare Analyze Data Deploy • Load external data • Diagnose/treat data issues • Publish to BI/Viz • Process structured data • Design experiments • Deploy to batch scoring • Process unstructured data • Partition data • Deploy to real-time scoring • Process streaming data • Engineer features • Deploy to scoring API • Cleanse data • Train and validate models • Manage models • Vectorize data • Evaluate and assess models • Monitor model performance ● Batch Processing ● Analytic Languages ● BI/Viz ● Stream Processing ● ML Libraries ● Interactive SQL ● Interactive SQL ● Batch Processing ● Search Tools ● Stream Processing ● Text/Image Processing ● Operational DB Administration, Governance and Security © Cloudera, Inc. All rights reserved. 12 A complete, integrated enterprise platform Cloudera Enterprise Data Hub Cloudera Distribution for Hadoop © Cloudera, Inc. All rights reserved. 13 Cloudera Data Science Workbench • Supports data science end-to-end • Full access to data • Secure self-service provisioning • Containerized environments • Supports Python, R, and Scala • Automates: Workflow Version control Collaboration Sharing © Cloudera, Inc. All rights reserved. 14 CDSW Benefits Data Scientists IT • Web browser, no desktop footprint • Support self-service data science • Use R, Python, or Scala • Full platform security • Install any library or framework • Kerberos authentication • Isolated project environments • Run on-premises or in the cloud • Direct access to data in secure clusters • Share insights with team • Reproducible, collaborative research • Automate and monitor data pipelines • Built-in job scheduling © Cloudera, Inc. All rights reserved. 15 Deep learning in Cloudera with Apache Spark Spark Packages DL4J BigDL • Two packages: • Open source DL library • Deep learning framework • CaffeOnSpark • Developed by Skymind • Developed by Intel • TensorFlowOnSpark • Built on JVMs • Supports CPUs only • Developed by Yahoo • Supports CPUs and GPUs • Leverages Intel MKL • Python and Scala APIs • Java, Scala, Python APIs • Scala, Python APIs • All DL architectures • Training and inference • Imports models from: • Integrated pipeline • Imports models from: • TensorFlow • Run on existing clusters • TensorFlow • Caffe • Training and inference • Caffe • Torch • Torch • Runs on existing clusters • Theano • Runs on existing clusters © Cloudera, Inc. All rights reserved. 16 New! Accelerated deep learning on-demand with GPUs “Our data scientists want GPUs, but we can’t find a way to deliver multi-tenancy. Multi-tenant GPU support on-premises or cloud If they go to the cloud on their own, it’s expensive and we lose governance.” Data Science CDH CDH Workbench ● Extend existing CDSW benefits to GPU- optimized deep learning tools ● Schedule & share GPU resources CPU GPU CPU CPU ● Train on GPUs, deploy on CPUs distributed single-node training ● Works on-premises or cloud training, scoring © Cloudera, Inc. All rights reserved. 17 Enterprise Data Lake Architecture © Cloudera, Inc. All rights reserved. 18 Canonical Ingestion & Spark Streaming Analytics with Cisco Big Data Analytics Platform • Integrate with Apache Spark Streaming for real-time analysis of data • Write back to Kafka for further processing or to send to an application layer © Cloudera, Inc. All rights reserved. 19 Proposed Architecture for Enterprise Data Platform AI Platform Advanced Analytics Data Visualization Meteorological Data Sources Data Data Data War Warehouse Sensors Data eho use Historian/SCADA Unstructured BW HANA files BPC Social any SAP NW Video/Image Data WarehouseEnterprise Geospatial Data Warehouse © Cloudera, Inc. All rights reserved. 20 Big Data Blueprints: Cisco Validated Designs Designs Big Data What you get Cisco® Validated Designs with Cloudera Industry-leading partnerships Tested and validated reference architectures to meet performance, capacity, and scale Joint engineering lab Extensive options for data management (Hadoop, MPP, and NoSQL) to meet your business needs Solution bundles optimized for cost of ownership and ease of ordering Solution designed, tested, and documented to facilitate faster, more reliable, and more predictable customer deployments. © Cloudera, Inc. All rights reserved. 21 Our Customers’ Success Stories © Cloudera, Inc. All rights reserved. 22 CASE STUDY TRANSPORTATION » PREDICTIVE MAINTENANCE » IMPROVED SERVICE DATA-DRIVEN PRODUCTS » DATA DRIVEN PRODUCTS Using Predictive Maintenance to Improve Performance and Reduce Fleet Downtime • OnCommand Connection is collecting telematics and geolocation data across the fleet • Reduced maintenance costs to $.03 per mile from $.12-$.15 per mile • Centralizing data from 13 systems with varying frequency and semantic definitions • Real-time visibility of 250,000+ trucks in order to improve uptime and vehicle performance © Cloudera, Inc. All rights reserved. 23 CASE STUDY TRAVEL & TRANSPORTATION » SMART BUILDINGS » PREDICTIVE MAINTENANCE DATA-DRIVEN » ADVANCED ANALYTICS PRODUCTSPROCESS Smart Buildings - Preventative Maintenance Using Sensors & IoT to Improve Passenger Safety