What's Coming in Fall 2020 Cloudera With

What's Coming in Fall 2020: Cloudera with IBM
Nagapriya (Priya) Tiruthani, Offering Management, Big Data

As data becomes more ACCESSIBLE it provides more VALUE
[Figure: "Value from Data" diagram. Labels: Data Driven, Insight Driven, Digital Transformation; Outcomes, Culture Change, Capabilities, Drivers; Prediction, New Business Models, Breaking Silos, Optimization, Disruptive Technology, Discover "what", Automation, Real-Time Decisions, Understand "why", Collaboration, Self Service, Models, AI, Reports, Visualization, Multi-Cloud, Business Intelligence, Applications, Cost Reduction, Competitive Market, Modernization, Leader.]

IBM Data & AI / © 2020 IBM Corporation

There is no AI without an IA ("Information Architecture")
"No amount of AI algorithmic sophistication will overcome a lack of data [architecture]."
"Data collection & preparation is the most time consuming and difficult part of AI."
Source: 2018 MIT Sloan, "Reshaping business with AI"

Open source will continue to drive innovation and speed up the journey to AI
• 3+: average number of open source databases used by enterprises today
• 87% of AI developers depend on open source technology
• 85% of enterprises are engaged in open source projects
• >50% of revenue will be from servers that run open source AI software
• 3 of the top 5 most popular databases are open source, including PostgreSQL
Source: https://dzone.com/articles/2019-open-source-database-report-top-databases-pub

IBM's Commitment To Open Source
• A long open history: In 1999, IBM supported Linux by investing $1 billion in its development, making it less risky for traditional enterprise users.
• Serve on the board: IBMers serve on many open source boards including Linux, Eclipse, Apache, CNCF, Node.js, Hyperledger, and more.
• Built on open source: Many of IBM's offerings leverage open source, including cloud, big data and analytics, blockchain, IoT, machine learning, and AI.

IBM and Cloudera Relationship and Significant Milestones
Together selling more than $100M annually in software, support and services

2017
• IBM + Hortonworks Strategic Partnership Announced
• HDP and HDF Certified for IBM Power, Spectrum Scale, Db2 Big SQL and Watson Studio

2018
• IBM wins Hortonworks overall "2018 Partner of the Year"
• IBM announces intent to buy Red Hat for $34B and become the world's largest hybrid cloud provider

2019
• Cloudera + Hortonworks Merger Completed with hybrid cloud vision
• IBM and Cloudera expand partnership to include resell and support of entire Cloudera portfolio (see note 1)
• IBM releases NEW versions of Db2 Big SQL supporting Cloudera's CDH 5 & 6
• IBM wins Cloudera overall "2019 Partner of the Year"

2020
• Cloudera announces Red Hat OpenShift as preferred container solution for CDP
• IBM releases NEW version of Db2 Big SQL supporting Cloudera's CDP Private Cloud
• IBM announces workshops to help customers plan for CDP Private Cloud
• IBM Power and IBM Storage come to market with CDP Private Cloud Base

Note 1, Cloudera portfolio:
✓ All legacy Cloudera offerings
✓ All legacy Hortonworks offerings
✓ Cloudera Data Platform (CDP)
✓ All Services & Training offerings (DSE, PSE, Operational Services)

Changing Customer Needs
• Any Tier | All Data: Multiple public, Hybrid, Private, Data center, Edge
• Data Lifecycle: Streaming, Data engineering, Data warehousing, Machine learning & AI
• Secure & Governed: Data & metadata, Fine grained security, Lineage and provenance, Data & workload migration
• Open + Standards: 100% Open source, Open data formats, Open storage & compute, Open APIs

Data Landscape is Evolving
The new realities of managing data and workloads across clouds

Decade 1: Hadoop on-prem and on the cloud
Decade 2: Hadoop powered data clouds

USE CASES
• Decade 1: Need to efficiently store & process data. Batch process "big data".
• Decade 2: Need to integrate the entire lifecycle. Industrialize data-driven decision making.

TECHNOLOGY INFRASTRUCTURE
• Decade 1: Co-locate compute and storage to use commodity hardware and avoid costly network transfers.
• Decade 2: High performance analytics with remote disaggregated storage with memory and SSD caching.

USER EXPERIENCE
• Decade 1: Deploy software in months and quarters.
• Decade 2: Spin up services in minutes.

PRIVACY, SECURITY & GOVERNANCE
• Decade 1: Network perimeter & physical access controls are the norm. Simplicity over robust mechanisms.
• Decade 2: Security at the workload, data & metadata layer. Solutions for new regulations (GDPR).

Complete Enterprise Data Lifecycle
Manage and secure the data lifecycle in any cloud or datacenter
• 01 Collect: Streaming & Data Flow
• 02 Curate: Data Engineering
• 03 Report: Data Warehouse
• 04 Serve: Operational Database
• 05 Predict: Machine Learning & AI
Security | Governance | Lineage | Management | Automation

Poll: Which platform are you using today?
• Cloudera Enterprise Data Hub
• Hortonworks Data Platform
• Cloudera Data Platform

Introducing Cloudera Data Platform
Industry's First Enterprise Data Cloud
Cloudera Data Platform Private Cloud with IBM
• Cloudera Data Platform: Cloud Data Flow, Cloud Data Hub, Data Center Software, Cloud Data Warehouse, Cloud Machine Learning
• Today's Products: DataFlow (CDF, HDF), HDP Enterprise Plus, Enterprise Data Hub, Cloudera Data Science Workbench

Introducing Cloudera Data Platform
• Data Anywhere: Data center & Private Cloud, Public Multi-Cloud, Hybrid Cloud, Control Plane
• Governed Everywhere: Catalog | Schema | Migration | Security | Governance
• Edge to AI Analytics: Data Flow & Streaming, Data Engineering, Data Warehouse, Operational Database, Machine Learning
• Open Distribution: Cloudera Runtime
• Management Console: Identity | Orchestration | Management | Operations

Benefits:
• Control cloud costs with auto scale, suspend and resume
• Optimize workloads based on analytics and machine learning
• View data lineage across any cloud and transient clusters
• Use a single pane of glass across hybrid and multi-clouds
• Scale to petabytes of data and 1,000s of diverse users

One Platform – Two Form Factors
• CDP Public Cloud (platform-as-a-service): runs on AWS, Azure and GCP. Virtual Private Clusters (Data Hub) and Self-Serve Experiences (DW, ML, DE, …).
• CDP Private Cloud (Base + Plus) (installable software): CDP Datacenter and Private Cloud. Physical Clusters (Data Center) and Self-Serve Experiences (DW, ML, DE, …).
• Shared by both: Control Plane and Cloudera Runtime.

CDP Public / Private Cloud Architecture
• Management Console: A single pane of glass to manage one or more environments and the services that run within each environment. Includes Data Catalog, Workload Manager and Replication Manager.
• Environment: A logical encapsulation of a customer network and the services that run within that network (like an Azure virtual network).
• Cluster: A distributed computing service that runs on VMs (Data Hub) or K8s (the experiences) and has access to the shared data lake. The diagram shows Data Hub clusters, DW (CDW) clusters and ML (CML) clusters.
• SDX: The data access control layer that sits on top of the backend object store and provides coherent data security and governance for all the applications running within the environment.

Poll: How soon are you planning to migrate to Cloudera Data Platform?
• 6 months
• 12 months
• 18 months
• 24 months or later

Cloudera Data Platform Private Cloud (BASE & PLUS)

PLUS (Experiences: DataFlow, Machine Learning, Data Warehouse, Data Engineering)
CDP Private Cloud PLUS expands upon the value of CDP Private Cloud BASE by providing:
• New set of Experiences
• Leverages Red Hat OpenShift
• Provides customers greater flexibility as they can run on any private/public cloud of choice

SDX: Schema, Security, Governance

BASE (HDFS / Ozone on BareMetal)
• Leverages BareMetal
• Allows customers to leverage their existing investments and architecture
• Allows customers to build toward future state of compute and storage

Cloudera Data Platform / CDP Private Cloud BASE
The most comprehensive Data Analytics Platform
EDH (Cloudera Enterprise Data Hub) + HDP (Hortonworks Data Platform, powered by Apache Hadoop) + New Features = CDP Private Cloud BASE (recently renamed from CDP Data Center)

CDP Private Cloud Plus
CDP Private Cloud PLUS expands upon the value of CDP Private Cloud BASE by providing:
• 10x faster deployments of analytics and machine learning services with a petabyte-scale hybrid data architecture that can burst to public clouds
• 100% tenant isolation in meeting the SLAs of your mission-critical workloads, eliminating the noisy neighbor problem
• 50% reduced data center costs by drastically improving efficiency and utilization of your compute infrastructure and eliminating data replication

New Features for everyone… CDP Private Cloud BASE
First Step to Private Cloud PLUS and MAX
New features for CDH 6 customers and for HDP 3 customers, by component:

Cloudera Manager
• Virtual private clusters
• Automated wire encryption setup
• Fine-grained RBAC for administrators
• Streamlined maintenance workflows

Ranger 2.0
• Dynamic row filtering & column masking
• Attribute-based access control
• SparkSQL fine-grained access control

Atlas 2.0
• Advanced data discovery
• Advanced data lineage
• Improved performance and scalability
• Faceted search
• Relevance-based text search over
