What’s Coming in Fall 2020 Cloudera with IBM
—
Nagapriya (Priya) Tiruthani
Offering Management – Big Data
ntiruth@us.ibm.com

As data becomes more ACCESSIBLE it provides more VALUE
Data Driven Insight Driven Digital Transformation Outcomes Culture Change Prediction New Business Models Breaking Silos Optimization Disruptive Technology Discover “what” Automation Real-Time Decisions Understand “why” Collaboration
Capabilities Self Service Models AI Reports Visualization Multi-Cloud Business Intelligence Applications
Cost Reduction Competitive Market Drivers Modernization Leader Value from Data
IBM Data & AI / © 2020 IBM Corporation

There is no AI without an IA “Information Architecture”

“No amount of AI algorithmic sophistication will overcome a lack of data [architecture].
Data collection & preparation is the most time-consuming and difficult part of AI.”
Source: 2018 MIT Sloan, “Reshaping Business with AI”

Open source will continue to drive innovation and speed up the journey to AI

• 3+: average number of Open Source Databases used by enterprises today
• 87% of AI Developers depend on Open Source technology
• 85% of enterprises are engaged in Open Source projects
• >50% of revenue will be from servers that run open source AI software
• 3 of the top 5 most popular databases are Open Source, including PostgreSQL
Source: https://dzone.com/articles/2019-open-source-database-report-top-databases-pub

IBM’s Commitment To Open Source

• A long open history: In 1999, IBM supported Linux by investing $1 billion in its development, making it less risky for traditional enterprise users.
• Serve on the board: IBMers serve on many open source boards including Linux, Eclipse, Apache, CNCF, Node.js, Hyperledger, and more.
• Built on open source: Many of IBM’s offerings leverage open source, including cloud, big data and analytics, blockchain, IoT, machine learning, and AI.
IBM and Cloudera Relationship and Significant Milestones
Together selling more than $100M annually in software, support and services

2017
• IBM + Hortonworks Strategic Partnership Announced
• HDP and HDF Certified for IBM Power, Spectrum Scale, Db2 Big SQL and Watson Studio

2018
• IBM wins Hortonworks overall “2018 Partner of the Year”
• IBM Announces intent to buy Red Hat for $34B and become the world’s largest hybrid cloud provider

2019
• Cloudera + Hortonworks Merger Completed with hybrid cloud vision
• IBM and Cloudera expand partnership to include resell and support of entire Cloudera portfolio (1)
• IBM releases NEW versions of Db2 Big SQL supporting Cloudera’s CDH 5 & 6
• IBM Wins Cloudera overall “2019 Partner of the Year”

2020
• Cloudera announces Red Hat OpenShift as preferred container solution for CDP
• IBM releases NEW version of Db2 Big SQL supporting Cloudera’s CDP Private Cloud
• IBM announces workshops to help customers plan for CDP Private Cloud
• IBM Power and IBM Storage come to market with CDP Private Cloud Base

(1) Cloudera portfolio
✓ All legacy Cloudera offerings
✓ All legacy Hortonworks offerings
✓ Cloudera Data Platform (CDP)
✓ All Services & Training offerings (DSE, PSE, Operational Services)
Changing Customer Needs

Any Tier | All Data
• Multiple public
• Hybrid
• Private
• Data center
• Edge

Data Lifecycle
• Streaming
• Data engineering
• Data warehousing
• Machine learning & AI

Secure & Governed
• Data & metadata
• Fine grained security
• Lineage and provenance
• Data & workload migration

Open + Standards
• 100% Open source
• Open data formats
• Open storage & compute
• Open APIs
Data Landscape is Evolving
The new realities of managing data and workloads across clouds

Decade 1: Hadoop on-prem and on the cloud
Decade 2: Hadoop powered data clouds

USE CASES
• Decade 1: Need to efficiently store & process data; batch process “big data”
• Decade 2: Need to integrate the entire lifecycle; industrialize data-driven decision making

TECHNOLOGY INFRASTRUCTURE
• Decade 1: Co-locate compute and storage to use commodity hardware and avoid costly network transfers
• Decade 2: High performance analytics with remote disaggregated storage with memory and SSD caching

USER EXPERIENCE
• Decade 1: Deploy software in months and quarters
• Decade 2: Spin up services in minutes

PRIVACY, SECURITY & GOVERNANCE
• Decade 1: Network perimeter & physical access controls are the norm; simplicity over robust mechanisms
• Decade 2: Security at the workload, data & metadata layer; solutions for new regulations (GDPR)
Complete Enterprise Data Lifecycle
Manage and secure the data lifecycle in any cloud or datacenter

01 Collect: Streaming & Data Flow
02 Curate: Data Engineering
03 Report: Data Warehouse
04 Serve: Operational Database
05 Predict: Machine Learning & AI

Security | Governance | Lineage | Management | Automation
Poll: Which platform are you using today?

• Cloudera Enterprise Data Hub
• Hortonworks Data Platform
• Cloudera Data Platform
Introducing Cloudera Data Platform…
Industry’s First Enterprise Data Cloud
Cloudera Data Platform Private Cloud with IBM
Cloud Cloud Data Cloud Cloud Data Data Center Data Machine Flow Hub Software Warehouse Learning
DataFlow HDP Cloudera Data CDF Enterprise Enterprise Science HDF Plus Data Hub Workbench
Today’s Products
Introducing Cloudera Data Platform

• Control cloud costs with auto scale, suspend and resume
• Optimize workloads based on analytics and machine learning
• View data lineage across any cloud and transient clusters
• Use a single pane of glass across hybrid and multi-clouds
• Scale to petabytes of data and 1,000s of diverse users

[Diagram: platform layers]
Data Anywhere: Data center & Private Cloud | Public Multi-Cloud | Hybrid Cloud | Control Plane
Governed Everywhere: Catalog | Schema | Migration | Security | Governance
Edge to AI Analytics: Data Flow & Streaming | Data Engineering | Data Warehouse | Operational Database | Machine Learning
Open Distribution: Cloudera Runtime
Management Console: Identity | Orchestration | Management | Operations
One Platform – Two Form Factors

CDP Public Cloud (platform-as-a-service)
• Infrastructure: AWS, Azure, GCP
• Virtual Private Clusters (Data Hub)
• Self-Serve Experiences (DW, ML, DE, …)

CDP Private Cloud (Base + Plus) (installable software)
• Infrastructure: Private Cloud, CDP Datacenter
• Self-Serve Experiences (DW, ML, DE, …)
• Physical Clusters (Data Center)

Common to both form factors: Control Plane, Cloudera Runtime
CDP Public / Private Cloud Architecture

Management Console - A single pane of glass to manage one or more environments and the services that run within each environment. It includes Data Catalog, Workload Manager and Replication Manager.

Environment - A logical encapsulation of a customer network and the services that run within that network (like an Azure virtual network)

Cluster – A distributed computing service that runs on VMs (Data Hub) or K8s (the experiences) and has access to the shared data lake. The diagram shows Data Hub clusters, CDW clusters and CML clusters.

SDX – The data access control layer that sits on top of the backend object store and provides coherent data security and governance for all the applications running within the environment
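The containment relationships described on this slide (a Management Console manages environments, an environment holds clusters, and clusters run on VMs or K8s) can be sketched as a small data model. This is only an illustration: the class names, fields and the `add_cluster` helper are hypothetical and do not correspond to any Cloudera API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Cluster:
    name: str
    kind: str     # "Data Hub", "CDW" or "CML" (labels taken from the slide)
    runs_on: str  # "VMs" for Data Hub, "K8s" for the experiences


@dataclass
class Environment:
    """Hypothetical model of one environment and the clusters inside it."""
    name: str
    clusters: List[Cluster] = field(default_factory=list)

    def add_cluster(self, name: str, kind: str) -> Cluster:
        # Per the slide: Data Hub runs on VMs, the experiences run on K8s.
        runs_on = "VMs" if kind == "Data Hub" else "K8s"
        cluster = Cluster(name, kind, runs_on)
        self.clusters.append(cluster)
        return cluster


@dataclass
class ManagementConsole:
    """Hypothetical single entry point over one or more environments."""
    environments: List[Environment] = field(default_factory=list)

    def all_clusters(self) -> List[Tuple[str, str, str]]:
        # Flatten to (environment, cluster, substrate) rows.
        return [
            (env.name, cluster.name, cluster.runs_on)
            for env in self.environments
            for cluster in env.clusters
        ]


if __name__ == "__main__":
    env = Environment("prod")
    env.add_cluster("etl", "Data Hub")
    env.add_cluster("bi", "CDW")
    console = ManagementConsole([env])
    for row in console.all_clusters():
        print(row)
```

The sketch leaves SDX out on purpose: the slide describes it as a shared layer per environment, and modeling its policies would require details the deck does not give.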
Poll: How soon are you planning to migrate to Cloudera Data Platform?

• 6 months
• 12 months
• 18 months
• 24 months or later

Cloudera Data Platform Private Cloud (BASE & PLUS)
CDP Private Cloud PLUS expands upon the value of CDP Private Cloud BASE by providing:
• New set of Experiences
• Leverages Red Hat OpenShift
• Provides customers greater flexibility as they can run on any private/public cloud of choice

• Leverages BareMetal
• Allows customers to leverage their existing investments and architecture
• Allows customers to build toward future state of compute and storage

[Diagram: PLUS = Experiences (DataFlow, Machine Learning, Data Warehouse, Data Engineering); SDX = Schema, Security, Governance; BASE = HDFS / Ozone on BareMetal]
Cloudera Data Platform / CDP Private Cloud BASE

The most comprehensive Data Analytics Platform

EDH (Cloudera Enterprise Data Hub) + HDP (Hortonworks Data Platform, powered by Apache Hadoop) + New Features = CDP Private Cloud BASE (recently renamed from CDP Data Center)
CDP Private Cloud Plus
CDP Private Cloud PLUS expands upon the value of CDP Private Cloud BASE by providing:
• 10x faster deployments of analytics and machine learning services with a petabyte-scale hybrid data architecture that can burst to public clouds
• 100% tenant isolation in meeting the SLAs of your mission-critical workloads eliminating the noisy neighbor problem
• 50% reduced data center costs by drastically improving efficiency and utilization of your compute infrastructure and eliminating data replication
New Features for everyone…
CDP Private Cloud BASE: First Step to Private Cloud PLUS and MAX
New features for CDH 6 customers New features for HDP 3 customers
• Virtual private clusters • Dynamic row filtering & column masking • Automated wire encryption setup Cloudera Manager Ranger 2.0 • Attribute-based access control • Fine-grained RBAC for administrators • SparkSQL fine-grained access control • Streamlined maintenance workflows
• Advanced data discovery • Advanced data lineage Atlas 2.0 Atlas 2.0 • Improved performance and scalability • Faceted search
• Relevance-based text search over • Hive-on-Tez for better ETL performance Solr 7 Hive 3 unstructured data (text, pdf, jpg, …) • ACID transactions • Better fit for Data Mart migration use Impala Ozone (Preview) • 10x scalability of HDFS cases (interactive, BI style queries)
Knox* Gateway-based SSO Hue Built-in SQL Editor
Low-latency DataMart for real-time and Better performance for fast changing / Druid* Kudu aggregate data updateable data Better at-rest Key Trustee Server, NavEncrypt* Spark on Docker* Simplified dependency management Encryption
* In future release

CDP Data Center: First Step to Private Cloud (Includes SDX and many other important capabilities)

CDP Private Cloud Base 7.1 Components
Customers on HDP 2.6.x/3.x and CDH 5.x/6.x: End-of-Support Dates

Current End-of-Support (EoS) Dates
The following table specifies the planned End of Support Schedule for Cloudera products. All future dates are provided for planning purposes only and are subject to change, with the expectation that dates may move later but will not move earlier. In each case, the projected EoS Date is considered to be the last day of the month specified in the table below. Check the website for current dates: https://www.cloudera.com/legal/policies/support-lifecycle-policy.html
Release     End of Full Support Date
HDP 2.3     July 2018
HDP 2.4     March 2019
HDP 2.5     August 2019
HDP 2.6     December 2020
HDP 3.0     July 2021
HDP 3.1     December 2021
CDH 5.14    December 2020
CDH 5.15    December 2020
CDH 5.16    December 2020
CDH 6.0     August 2021
CDH 6.1     December 2021
CDH 6.2     March 2022
CDH 6.3     March 2022
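Because the projected EoS date is defined as the last day of the listed month, a planning script has to expand each "Month Year" entry into a concrete date before computing how much support time remains. A minimal sketch of that arithmetic follows; the function names and the `EOS_TABLE` sample are illustrative assumptions, and the dates themselves should always be re-checked against the Cloudera support lifecycle page.

```python
import calendar
from datetime import date

# Month name -> month number, e.g. "December" -> 12.
# calendar.month_name[0] is an empty string, so it is filtered out.
MONTHS = {name: number for number, name in enumerate(calendar.month_name) if name}


def eos_date(month_year: str) -> date:
    """Return the last day of the month named in a string like 'December 2020'."""
    month_name, year_text = month_year.split()
    year, month = int(year_text), MONTHS[month_name]
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


def days_until_eos(month_year: str, today: date) -> int:
    """Days of full support left; negative once the EoS date has passed."""
    return (eos_date(month_year) - today).days


# A few rows copied from the table above (illustrative subset).
EOS_TABLE = {
    "HDP 2.6": "December 2020",
    "HDP 3.1": "December 2021",
    "CDH 5.16": "December 2020",
    "CDH 6.3": "March 2022",
}

if __name__ == "__main__":
    today = date(2020, 10, 1)  # assumed planning date, chosen for the example
    for release, month_year in EOS_TABLE.items():
        print(release, eos_date(month_year), days_until_eos(month_year, today))
```

Sorting releases by `days_until_eos` gives a simple ordering of which clusters to plan for first.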
Three Paths to CDP

1. Migrate to Public Cloud
Copy data and metadata to a public cloud; implement new, or migrate existing workloads on CDP Public Cloud.
Small initial investment

2. Migrate to CDP PvC BASE
Build a new CDP Private Cloud BASE cluster on-premises; copy data and metadata from existing classic cluster; migrate existing workloads.
Higher initial investment

3. Upgrade to CDP PvC BASE
Upgrade from classic cluster to CDP Private Cloud BASE in-place on the same hardware and infrastructure.
Single cutover, lower capital investment
Upgrading to CDP – Private Cloud

CDP Private Cloud Base provides the stateful elements for a new wave of containerized applications:
• Storage
• Table Schema
• Authentication & Authorization
• Governance

CDP - Private Cloud PLUS (self-serve analytic experiences on a container cloud, batteries included or customer provided):
✔ Isolation from noisy neighbors
✔ Self-serve analytic experiences
✔ Decoupled from storage
✔ Decoupled upgrade cycles
✔ Elastic compute

Options shown:
• Create new apps using CDP - Private Cloud (faster time to value)
• Create new apps using CDP - Private Cloud as sidecar to CDH / HDP clusters (faster time to value)
• Upgrade existing clusters & applications in-place (protect existing investment)

Upgrade path (existing apps, existing data and existing hardware are kept at each stage):
• CDH 5 / HDP 2 → Cluster Upgrade (✔ Latest upstream features) → CDH 6 / HDP 3
• CDH 6 / HDP 3 → Cluster Upgrade (✔ Best of CDH and HDP features) → CDP - Private Cloud BASE (DistroX on bare metal)
• Direct Upgrade

[Other diagram labels: Altus, DataPlane, DistroX Clusters, Analytic Experiences, SDX]
Upgrading an Existing Cluster: Option A

Step 1: Upgrade an existing cluster to CDP PvC Base, thus creating an SDX environment based on existing data

Step 2: Install CDP Private Cloud and use the Experiences to build new applications

Step 3: Use Workload Manager to intelligently migrate key workloads from the CDP PvC Base cluster to the CDP Private Cloud Experiences

[Diagram: CDP Private Cloud (Management Console; CDW, CML, Data Hub) on top of the upgrade path CDH 5 / HDP 2 → CDH 6 / HDP 3 → CDP PvC Base (SDX environment), with existing apps, existing data and existing hardware kept at each stage]
Migrating from an Existing Cluster: Option B

Step 1: Install CDP Data Center on new hardware and use Replication Manager to replicate data, metadata, and policies from an existing cluster to create the SDX environment

Step 2: Install CDP Private Cloud and use the Experiences to build new applications

Step 3: Use Workload Manager to intelligently migrate key workloads from the CDH / HDP cluster to the CDP Private Cloud Experiences

[Diagram: CDP Private Cloud (Management Console; CDW, CML, Data Hub) on top of CDP PvC Base (SDX environment; no bare metal apps, new data, new hardware), fed by Intelligent Replication (data, metadata, policies) from the CDH / HDP cluster (existing apps, existing data, existing hardware)]
Complete Data Lifecycle

• Collect: Streaming & Data Flow

Data at rest:
• Curate: Data Engineering
• Report: Data Warehouse
• Serve: Operational Database
• Predict: Machine Learning & AI

Complete and Connected Data Lifecycle
Data in motion:
• Components: Flow Management, Stream Messaging, Streaming Analytics
• Stages: Collect, Buffer, Distribute, Analyze, Act

Links between data in motion and data at rest: Batch curation, Enrichment, Operational Insights, Scoring

Data at rest:
• Curate: Data Engineering
• Report: Data Warehouse
• Serve: Operational Database
• Predict: Machine Learning & AI

Poll: Are you exploring real time use cases in your data platform?
Cloudera Data Platform DataFlow

Cloudera Data Platform Streaming Edition

Advanced messaging and stream processing powered by Apache Kafka + Friends.

This new product combines 3 offerings into 1 package built on CDP:

[Diagram: offerings 1, 2 and 3 combined into CDP Streaming Edition]
CDP Streaming Offerings
NEW CDP Streams NEW CDP-Streaming Component OLD CSP Messaging Base OLD CSP & CSM Edition CDP PvC Base Cloudera Manager x x x x x Zookeeper x x x x x Knox x x x x x Ranger X (Sentry) x X (Sentry) x x Atlas x x x x x Kafka x x x x x Schema Registry x x x x x Kafka Streams x x x x x SMM x x x SRM x x x Cruise Control (NEW) x x Kafka Connect (NEW) x x HDFS / HBase / Solr x x YARN x x Flink x Balance of CDP PvC Base (30+ x components)
2020-21 Roadmap

2020
• CDP Private Cloud Base
  • v7.0.3
  • v7.1.3
• CDP Private Cloud Plus (DW and ML experiences)
• CDP Streaming Edition, includes
  • Stream Processing
  • Streams Management
  • Streaming Analytics
• CDP Private Cloud Base support on POWER/Spectrum Scale
• Db2 Big SQL support on CDP PvC Base

2021
• Introducing new experiences in CDP Private Cloud Plus
  • DataFlow
  • Data Engineering
• CDP Private Cloud Control Plane
  • Replication Manager
  • Workload Manager
  • Data Catalog
• Enhanced Data Warehouse and Machine Learning capabilities
• CDP Private Cloud Base support on POWER
• Db2 Big SQL support on CDP PvC Base

* Roadmap plans may change
Db2 Big SQL: Supercharge Big Data Workloads on CDP
FAST, INTEGRATED and SECURE DATA ACCESS LAYER for data platforms
INDUSTRY LEADING ADVANCED ELASTIC SQL SQL ENGINE FOR ANALYTICS AND ML ENGINE FOR BIG DATA MADE SIMPLE WORKLOADS
• SQL compatibility with many SQL dialects, enables reuse of skills and applications
• BI and data science tools can access data stored in Hadoop or object stores
• Scale compute nodes based on workloads to efficiently use resources
Supports all open source file formats like ORC, Parquet, Avro, etc.
Infuse the power of Db2 in CDP using Db2 Big SQL

SQL Compatibility
• Understands different SQL dialects
• Reuse skills and applications with less/no changes

Federation
• Connect to remote data sources
• Query pushdown
• Spark connectors for more data sources & ML models

Performance
• Execute all 99 TPC-DS queries
• Scales linearly with increased concurrency

Enterprise & Security
• Automatic memory management
• Role/column based data security

Db2 Big SQL is the only SQL engine on the open source data platform that …

SQL Compatibility
• SQL compatible with:
• Applications work as-is without any changes

Federation
• Federates to more than 10 data sources: RDBMS, NoSQL and/or Object Stores
• Integrates bi-directionally with Spark, like no other
• Operationalizes ML models

Performance
• Exhibits high performance even when data scales up to 100TB with complex SQLs
• Handles many concurrent users without relinquishing performance

Enterprise & Security
• Secures data using SQL with roles
• Integrates with Ranger for centralized management
To Summarize - Get more for Less with Db2 Big SQL

• Accelerate time to market while modernizing your warehouse
• Empower SQL users to operationalize ML models
• Augment disparate data for deep analytics and AI
• Enable BI analytics with high performance and enterprise security

Above all, bring stability to your applications even when the platform goes through updates…

Now available… Introducing Db2 Big SQL support on CDP v7.1
Make Db2 Big SQL the point of entry to Big Data irrespective of which platform has the data
World Class Customer Support

Since the beginning of the partnership:
• More than 200 customers
• Financial, retail, travel, automotive, energy, communications
• More than 2000 Cases
• 98+% managed within the SLA

IBM Support is your competitive advantage:
• The ability for your business to easily and quickly access high-quality support is a critical advantage that will help you keep your business ahead of the competition.
• To meet the growing needs of your business in today’s competitive landscape, IBM embarked on a transformation journey to reimagine the way we deliver support to you. One of the results of this revolution is the Cognitive Support Platform (CSP).
• Infused with IBM’s enterprise AI technology, Watson, our Cognitive Support Platform helps you resolve issues quickly by providing you with an omnichannel support experience that is driven by insights, fueled by knowledge, and powered by Cognition.
• One Click, One Call
• One Case Owner

[Diagram labels: LEVEL 1, LEVEL 2]
Professional Services

Our Professional Services will help you unlock the value of your data throughout your data-driven journey.

• Optimize at every stage of your data journey
• Shorten your time to production and value
• Realize the full value of your data

We have just the right package to ensure your success:
• SmartStart - Get started with the Cloudera Platform in your data center
• SmartMigrate - Migrate legacy Cloudera CDH or HDP workloads to CDP in your data center
• SmartUpgrade - Upgrade existing Cloudera CDH or HDP deployment to CDP with minimum disruption
• SmartOffload - Offload your legacy data warehouse to the Cloudera platform
• SmartHealth - Comprehensive platform health check for optimal performance overall

Our goal is to ensure your infrastructure outperforms standards at every stage of your organization's journey to becoming truly data-driven.

The Power of ONE
Greater Outcomes
IBM and Cloudera have a prescriptive approach
1. No vendor lock-in: Lay the Foundation with Red Hat’s OpenShift Container Platform
2. New Revenue: Explore new business models with machine learning at scale
3. Reduce risk: establish and enforce the enterprise data governance and security policies for data
4. Reduce costs: RHOS delivers up to 38% lower infrastructure and development costs per application
5. Improve productivity: automate and govern the data and AI lifecycle while ensuring compliance

• Mitigate the risk of compliance fines - up to 4% of gross sales by each personal information data breach incident.
• GDPR will become a reality in 2018.
Events and Resources
IBM and Cloudera Partnership ibm.com/analytics/partners/cloudera
IBM Open Source Offerings Community
Stay tuned and learn about upcoming events by joining ibm.biz/hdmoscomm
Thank you

Priya Tiruthani, Offering Manager – Cloudera products and Db2 Big SQL
Lynn Chou, Offering Manager – Cloudera products
Fred Koopmans, VP, Product Management – Cloudera
Venky Sellappa, Partner Solutions – Cloudera
Dave Fowler, Partner Solutions – Cloudera
Use Cases

Modernize Enterprise Data
Architect the information architecture of the enterprise to have a secured and governed platform to help drive:
• Drive growth
• Connect business
• Secure the processes

Grow your business
Increase your revenue, improve your customer satisfaction, and power new business models by focusing on use cases such as:
• Marketing automation
• Personalized marketing
• Customer experience
• Churn prevention
• Customer retention

Protect your business
Protect your business by tackling challenging use cases such as:
• Regulatory compliance
• Risk modeling & analysis
• Financial crime prevention
• Fraud detection
• Cybersecurity

Connect your business
Connect Operational Technology (OT) with IT to achieve operational excellence around use cases such as the following:
• Predictive maintenance
• Connected vehicles
• Smart cities
• Healthcare analytics
• Industrial IoT