You're Now a Data Company
The bottom line in 2020 and beyond

Reality: Tool oriented, little accretive value, unnecessarily complex & costly

Our Perspective: Four ecosystems have emerged within Data & Analytics

Data Persistence Ecosystem (maturity, from comprehensive to basic)
• Streaming / Events
• Data Lake
• Analytic Data Warehouse
• Data Mart
• Database / OLTP

Data Fabric Ecosystem (maturity, from comprehensive to basic)
• Virtualization
• Lineage
• Catalogs
• Master Data
• Quality & Governance
• ETL

Data Science Ecosystem (maturity, from comprehensive to basic)
• Explainability
• Model Deployment
• Optimization
• Operational Planning
• Notebooks / R
• Reporting

Open Source Ecosystem
• Mongo
• Postgres
• Jupyter, Zeppelin, RStudio
• Spark
• Kubernetes

We THINK they belong in a single Data & AI platform

It's called an Enterprise Insights Platform (EIP)

Data Persistence Ecosystem + Data Fabric Ecosystem + Data Science Ecosystem + Open Source Ecosystem

Great News!

Existing IBM customers can gain access to our EIP via a simple upgrade program and a small increase to their existing IBM subscription.

Technological Modernization Program

Entry points: keep your existing IBM tool
• Db2
• Informix
• MDM
• DataStage
• Cognos
• Planning
• SPSS
• CPLEX
• WEX

Run old school (VMs) or new school (Containers)

1:1 VPC Match
Example: 100 VPCs of Db2 = 100 VPCs of Db2 Cartridge + 100 VPCs of Cloud Pak for Data

Gain access to all ecosystems via IBM's EIP (Cloud Pak for Data)

Unlocked Ecosystems and Capabilities

Data Science
• Data Prep
• Low & No Code Modeling
• Open Source Notebooks & Studios
• Explainability & Bias Detection
• Model Deployment and REST API Generators
• Open Source Package Management
• Business Intelligence

Data Fabric
• Caching
• Lineage
• Data Quality
• Business Discovery
• Optimization
• Glossary
• Computational Mesh
• Data Prep
• Governance

Data Persistence
• Enterprise Data Warehouse
• Data Lake
• Operational Data Store
• No SQL
• Data Marts
• Transactional

So what can you do now?
Analyze with Cloud Pak for Data
Scale Data Science

What you will accomplish
Enable data science throughout your enterprise via a single integrated platform capable of developing algorithms and models using open source, low-code, or automation to build, train, and deploy insights at scale.

Single Ecosystem Tools Needed to Match IBM's Ecosystem
• Data Science: Data Prep, Low & No Code Modeling, Open Source Notebooks & Studios, Explainability & Bias Detection, Open Source Package Management, Model Deployment and REST API Generators, Business Intelligence
• Data Virtualization (not deployed in this example)
• Data Catalog (not deployed in this example)
• Data Warehouse (not deployed in this example)
• Cloud Pak for Data Platform: Administration, Scalability, Kubernetes

Business Outcome
• 50%: Reduction in time to model deployment
• $3-5M: Revenue potential from deployed models
• 1: Unified Analytics Platform

Example Configurations
                    Small                  Medium                  Large
Concurrent Users    5 users                10 users                20 users
Deployed Models     5 models               10 models               20 models
Spark Executors     2 executors (20 GB)    5 executors (100 GB)    10 executors (200 GB)
VPCs                96 VPCs                160 VPCs                224 VPCs

Complimentary IBM Cartridges
Watson Studio Premium | Watson ML Accelerator | Cognos & Planning Analytics

• Definition: Data Science & ML Platform. IBM Position: Challenger (2020)
• Definition: Predictive Analytics & ML Platform. IBM Position: Leader (2020)

Organize with Cloud Pak for Data
Build a Data Fabric

What you will accomplish
Reduce data movement, ETL jobs, and the time it takes to access enterprise data at scale by providing your organization with a data fabric that can abstract multiple data sources into a single SQL query or catalog asset, ultimately driving speed to new insights and outcomes.

Single Ecosystem Tools Needed to Match IBM's Ecosystem
• Data Science (not deployed in this example)
• 3rd Party Connections
• Data Virtualization and Data Catalog: Caching, Lineage, Data Quality, Business Discovery, Optimization, Glossary, Computational Mesh, Data Prep, Governance
• Data Warehouse (not deployed in this example)
• Cloud Pak for Data Platform: Administration, Scalability, Kubernetes

Business Outcome
• $7.3M: Software Cost Savings
• 65%: Reduction in ETL Jobs
• 430%: Faster than Data Federation

Example Configurations
                              Small        Medium       Large
Virtual queries per minute (one of the following)
  Simple (<1 min)             2000         4000         8000
  or Moderate (1-10 min)      25           50           100
  or Complex (>10 min)        5            10           20
Catalog Activities
  Data Refinery Jobs          3            10           20
  Profiling Flows             3            10           20
  Data Stewards               3            5            25
Number of Assets              Unlimited    Unlimited    Unlimited
VPC Totals                    48 VPCs      96 VPCs      160 VPCs

• Definition: Data Integration. IBM Position: Leader (2020)
• Definition: Data Fabric. IBM Position: Leader (2020)

Complimentary IBM Cartridges
Db2 | Information Server | DataStage | MDM | Cognos & Planning Analytics

Collect with Cloud Pak for Data
Deploy Fit for Purpose Data Stores

What you will accomplish
Deploy fit-for-purpose repositories that focus on performance and business application speed, rather than a single commoditized storage layer that trades performance for general-purpose use.

Single Ecosystem Tools Needed to Match IBM's Ecosystem
• Data Science (not deployed in this example)
• Data Virtualization (not deployed in this example)
• Data Catalog (not deployed in this example)
• Warehouses: Enterprise Data Warehouse, Operational Data Store, Data Marts
• Specialized: Data Lake, No SQL, Transactional
• 3rd Party Connections
• Cloud Pak for Data Platform: Administration, Scalability, Kubernetes

Business Outcome
• Up to 10X: Faster than competitive repositories
• 3X: Faster time to deployment (Load to Query)
• 1/3: The total cost of ownership

Warehouse Configuration Examples
                     Small    Medium    Large
Capacity  Netezza    18 TB    48 TB     96 TB
          Db2        20 TB    40 TB     80 TB
VPCs      Netezza    66       114       210
          Db2        48       96        160

Complimentary IBM Cartridges
Information Server | DataStage | Db2 | Cognos & Planning Analytics

• Definition: Data Management Solutions for Analytics. IBM Position: Leader (2019)
• Definition: Cloud Data Warehouse. IBM Position: Leader (2020)

Do it all with Cloud Pak for Data
Solution Pattern: The Modern Reference Architecture

What you will accomplish
Significant simplification of analytics architecture and a remarkable reduction in time to value for data engineers, stewards, scientists, and analysts. Run anywhere, mature over time, and leverage existing investments to complement our modern reference architecture.

Single Ecosystem Tools Needed to Match Solution Pattern
• Data Science: Data Prep, Low & No Code Modeling, Open Source Notebooks & Studios, Explainability & Bias Detection, Model Deployment and REST API Generators, Open Source Package Management, Business Intelligence
• 3rd Party Connections
• Data Virtualization and Data Catalog: Caching, Lineage, Data Quality, Business Discovery, Optimization, Glossary, Computational Mesh, Data Prep, Governance
• Warehouses: Enterprise Data Warehouse, Operational Data Store, Data Marts
• Specialized: Data Lake, No SQL, Transactional
• Cloud Pak for Data Platform: Administration, Scalability, Kubernetes

Business Outcome
• 50%: Reduction in time to model deployment
• 65%: Reduction in ETL Jobs
• 1: Unified Analytics Platform

Example Configurations
                                  Small                Medium               Large
Data Science
  Users                           5                    10                   20
  Models                          5                    10                   20
  Spark Executors                 2 (20 GB)            5 (100 GB)           10 (200 GB)
Data Virtualization & Catalog
  Virtual Queries                 2000                 4000                 8000
  Data Stewards                   3                    5                    25
  Data Assets                     Unlimited            Unlimited            Unlimited
Data Warehouse
  Netezza                         18 TB                48 TB                96 TB
  Db2                             20 TB                40 TB                80 TB
VPCs
  With Netezza                    210                  370                  594
  With Db2                        192                  352                  544

Complimentary IBM Cartridges
DataStage | Watson Studio Premium | Cognos | Planning Analytics

• Definition: Enterprise Insights Platform. IBM Position: Leader (2020)

Success
Do a lot more for a lot less by upgrading your existing IBM software to Cloud Pak for Data today.
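
A note on the sizing arithmetic: the VPC totals quoted for the combined "Do it all" configurations match the sum of the VPC counts given for the three individual patterns (data science, data fabric, and warehouse). The deck does not state this rule explicitly, so the short Python sketch below treats it as an assumption and simply checks it against the published figures. The dictionary and function names are illustrative only and are not part of any IBM tool.

```python
# VPC counts per size, copied from the example configurations in this deck.
SIZES = ("Small", "Medium", "Large")

DATA_SCIENCE = {"Small": 96, "Medium": 160, "Large": 224}   # "Analyze" pattern
DATA_FABRIC = {"Small": 48, "Medium": 96, "Large": 160}     # "Organize" pattern
WAREHOUSE = {                                               # "Collect" pattern
    "Netezza": {"Small": 66, "Medium": 114, "Large": 210},
    "Db2": {"Small": 48, "Medium": 96, "Large": 160},
}


def combined_vpcs(size: str, engine: str) -> int:
    """Assumed rule: combined VPCs are the sum of the three patterns."""
    return DATA_SCIENCE[size] + DATA_FABRIC[size] + WAREHOUSE[engine][size]


# Totals per warehouse engine, ordered Small, Medium, Large.
totals = {
    engine: [combined_vpcs(size, engine) for size in SIZES]
    for engine in WAREHOUSE
}

for engine, values in totals.items():
    row = ", ".join(f"{size}: {v} VPCs" for size, v in zip(SIZES, values))
    print(f"With {engine}: {row}")
```

Under that assumption the sketch reproduces the deck's combined figures: 210, 370, and 594 VPCs with Netezza, and 192, 352, and 544 VPCs with Db2.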