Why Cloudera: The Platform for Production Success
© Cloudera, Inc. All rights reserved.

Why Cloudera?
We deliver long-term production success with enterprise Hadoop.

• Open Source Innovation: No one knows Hadoop better than Cloudera. Cloudera leads development of enterprise Hadoop and offers the best support, training, and services.
• Enterprise Security: Meet compliance requirements and reduce risk exposure from storing sensitive data.
• Data Governance: Enable compliance and maximize analyst productivity.
• Powerful Enterprise Tools: Cloudera extends open source Hadoop with capabilities required by the largest enterprises.
• Complete Management: Deliver optimum system utilization and meet SLA commitments, on-premises or in the cloud, with minimum effort.
• Ecosystem: Cloudera partners with industry leaders to ensure Hadoop works with the platforms, tools, and integrators our customers rely on.

Our Platform

Cloudera is Built for Production Success
A modern data platform plus what the enterprise requires.

Hadoop delivers:
• One place for unlimited data
• Unified, multi-framework data access

Cloudera delivers:
• Enterprise Security
• Data Governance
• Complete Management
• And more…

[Platform diagram. Workloads: Process, Discover, Model, Serve. Layers: Security and Administration; Unlimited Storage. Deployment Flexibility: On-Premises, Appliances, Engineered Systems, Public Cloud, Private Cloud, Hybrid Cloud.]

Industrial Multi-Workload Performance
Multiple big data opportunities in one optimized, high-performance, multi-tenant platform.

Batch, Interactive, and Real-Time. Leading performance and usability in one platform.
• End-to-end analytic workflows
• Access more data
• Work with data in new ways
• Enable new users

[Platform diagram. Workloads: Process, Discover, Model, Serve.]
• Ingest: Sqoop, Flume
• Transform: MapReduce, Hive, Pig, Spark
• Analytic Database: Impala
• Search: Solr
• Machine Learning: SAS, R, Spark, Mahout
• NoSQL Database: HBase
• Streaming: Spark Streaming
• Security and Administration: YARN, Cloudera Manager, Cloudera Navigator
• Unlimited Storage: HDFS, HBase

Latest SQL Performance
Single User vs 10 User Response Time & Impala Times Faster (lower = better)

[Bar chart. Y axis: Time (in seconds), 0 to 350. Values as labeled on the chart:]

Engine         Single user (s)   10 users (s)   Impala times faster (single user / 10 users)
Impala         5                 11
Spark SQL      25                120            5.0x / 10.6x
Presto         37                202            7.4x / 18.3x
Hive-on-Tez    77                302            15.4x / 27.4x

Independent validation by IBM Research SQL-on-Hadoop VLDB paper: “Impala’s database architecture provides significant performance gains”

Hadoop Security is Different

Hadoop Benefit: A single platform for all the data
Security Side Effect: Combining data and audiences that used to be securely siloed

Hadoop Benefit: A rich, flexible ecosystem of tools & utilities
Security Side Effect: Security method proliferation can increase costs or introduce coverage gaps

Hadoop Benefit: Ingest data of any type
Security Side Effect: Sensitive fields added without review

Hadoop Benefit: Active Archive provides lower cost storage than legacy systems
Security Side Effect: Lose the built-in compliance controls that legacy systems provided

The Only Comprehensively Secure Hadoop Platform
Meet compliance requirements and reduce risk exposure from storing sensitive data.

Cloudera is the leader in Hadoop security.

1. Perimeter: Standards-based Authentication
2. Access: Unified Role-based Authorization
3. Visibility: Auditing & Governance
4. Data: Encryption & Key Management

Unique Capabilities:
• Comprehensive and Unified
• Secure at the core
• No Performance Impact
• Jointly engineered with Intel
• Compliance-Ready
• Only distribution to pass PCI audit

The Only Hadoop Data Governance Solution
Enable compliance and maximize analyst productivity.

Cloudera Navigator
Minimize risk and maintain compliance with the only native end-to-end data governance solution for Apache Hadoop.

Unique Capabilities:
• Auditing
• Lineage
• Metadata Tagging and Discovery
• Lifecycle Management

MasterCard
Cloudera: The first PCI-Certified Hadoop Platform

Challenge: All applications, databases, or file systems that have the potential to handle personal account-related data must undergo full PCI certification.

Solution: MasterCard’s Cloudera environment fully conforms to the PCI-DSS V 2.0 standards so it can host PCI datasets and potentially integrate with other internal systems.

“Data privacy and protection is a top priority for MasterCard. As we maximize the most advanced technologies from partners and vendors, they must meet the rigorous security standards we’ve set. With Cloudera’s commitment to the same security standards, we now have additional options in how we manage our data center.”
Gary VonderHaar, Chief Technology Officer, Architecture, MasterCard
Security and Governance

Cloudera: Unified, Compliance-Ready, Transparent
Hortonworks: Fragmented, Incomplete, Complex

Perimeter (Protecting access to the cluster)
• Cloudera ● Kerberos with Cloudera Manager: Automated, industry-standard authentication integrated with existing systems
• Hortonworks ◐ Kerberos: Manual configuration and integration

Access (Securing access to data)
• Cloudera ● Apache Sentry: Working within the community to deliver centralized, granular RBAC across frameworks
• Hortonworks ◐ Hive ATZ-NG, Ranger: RBAC configuration silos, GUI “Band-Aid”

Visibility (Reporting on data access and lineage)
• Cloudera ● Cloudera Navigator: Transparent end-to-end data and metadata visibility
• Hortonworks ◔ Apache Falcon, Knox, Ranger: Manual and limited auditing through a single workflow framework, and multiple tools

Data (Protecting data at rest or in transmission)
• Cloudera ● Cloudera Navigator: Transparent, comprehensive, high-performance, compliance-ready encryption and key management
• Hortonworks ○ N/A

The Only Complete Hadoop Management Suite
Deliver optimum system utilization and meet SLA commitments.

Cloudera Manager
Focus on the solution, not the cluster, with the only complete, zero-downtime administration tool for Apache Hadoop.

Unique Capabilities:
• Unified configuration, management and monitoring across all services
• Online installation and upgrades
• Direct connection to Cloudera Support
• 3rd Party Extensibility

Cloudera Manager vs. Ambari

Cloudera: Unified, Directed, Streamlined
Ambari: Federated, Chaotic, Disjointed

Manage (Deploying and configuring services)
• Cloudera ● Parcels and Workflows: Holistic, service-oriented components enable streamlined, comprehensive, and straightforward operations
• Ambari ◐ YUM and Shell Commands: Manual configuration and time-consuming, error-prone integration

Monitor (System health and QoS and SLA notification)
• Cloudera ● Integrated Charting and SNMP Alerts: Catalog of chart metrics and visualization with easy-to-build, easy-to-share dashboards and common alerts
• Ambari ◐ Nagios, Ganglia: Manual configuration, limited native visualization, and manual integration of separate, disparate systems and services

Diagnose (Root cause discovery, analysis, and solution)
• Cloudera ● Time Control and Log Collection: Centralized log aggregation of all services with integrated faceted search and visual timeframe controls
• Ambari ◔ SSH/SCP to /var/log: Manual log collection via CLI tools from diverse locations with limited, service-specific search and no historical views

Integrate (Extending security policies, adding 3rd party services)
• Cloudera ● Enterprise Kerberos Integration: Automated, industry-standard authentication with integration to existing enterprise systems
• Ambari ◐ Kerberos: Assisted CLI configuration, manual deployment, and limited integration

The Only Portable Cloud Experience for Hadoop
Maximize flexibility in Hadoop deployment architectures.

Cloudera Director
The first portable, self-service solution for deploying and managing enterprise-grade Hadoop in the Cloud.

Unique Capabilities:
• Dynamic cluster lifecycle management
• Cloud blueprints
• Multi-cluster health visibility
• Usage reporting for billing models

Our Approach
Focusing on Open Standards, not just Open Source

Open Standards are just as important as Open Source.

Why does it matter?
• Diverse engineering is more sustainable.
• Broad support ensures vendor portability.
• Project utility depends on ecosystem compatibility, which depends on standards.

Cloudera leads in defining the de facto open standards adopted by the market.

Vendor Support

Component (Founder)          Cloudera  Pivotal  MapR  Amazon  IBM  Hortonworks
Impala (Cloudera)            ✔         ✖        ✔     ✔       ✖    ✖
Spark (UC Berkeley)          ✔         ✔        ✔     ✔       ✔    ✔
Hue (Cloudera)               ✔         ✔        ✔     ✔       ✖    ✔
Sentry (Cloudera)            ✔         ✔        ✔     ✖       ✔    ✖
Flume (Cloudera)             ✔         ✔        ✔     ✖       ✔    ✔
Parquet (Cloudera/Twitter)   ✔         ✔        ✔     ✔       ✔    ✖
Sqoop (Cloudera)             ✔         ✔        ✔     ✔       ✔    ✔
Falcon (Hortonworks)         ✖         ✖        ✖     ✖       ✖    ✔
Knox (Hortonworks)           ✖         ✖        ✖     ✖       ✔    ✔
Tez (Hortonworks)            ✖         ✖        ✔     ✖       ✖    ✔
Ranger (Hortonworks)         ✖         ✖        ✖     ✖       ✖    ✔
ORCfile (Hortonworks)        ✖         ✖        ✖     ✖       ✖    ✔

Sustainable Innovation
A Hybrid Open Source Model combining the power of open source with the enterprise capabilities customers need.

Open Platform: 100% Open Source & Open Standards
• Deep open source commitment
• 2/3 of engineering on open source
• 19 Hadoop ecosystem projects founded
• 90 ASF committer seats, 67 PMC seats
• Enterprise-ready extensions
• Security, governance, and system management
• Comprehensive partner