Why Cloudera
The Platform for Production Success

© Cloudera, Inc. All rights reserved.

Why Cloudera?

We deliver long-term production success with enterprise Hadoop.

• Open Source Innovation: No one knows Hadoop better than Cloudera. Cloudera leads development of enterprise Hadoop and offers the best support, training, and services.
• Powerful Enterprise Tools: Cloudera extends open source Hadoop with capabilities required by the largest enterprises.
• Ecosystem: Cloudera partners with industry leaders to ensure Hadoop works with the platforms, tools, and integrators our customers rely on.
• Enterprise Security: Meet compliance requirements and reduce risk exposure from storing sensitive data.
• Data Governance: Enable compliance and maximize analyst productivity.
• Complete Management: Deliver optimum system utilization and meet SLA commitments, on-premises or in the cloud, with minimum effort.

Our Platform

Cloudera is Built for Production Success
A modern data platform plus what the enterprise requires.

Hadoop delivers:
• One place for unlimited data
• Unified, multi-framework data access

Cloudera delivers:
• Enterprise Security
• Data Governance
• Complete Management
• And more…

Platform layers: Process, Discover, Model, Serve; Security and Administration; Unlimited Storage.

Deployment Flexibility: On-Premises, Appliances, Engineered Systems, Public Cloud, Private Cloud, Hybrid Cloud.

Industrial Multi-Workload Performance
Multiple big data opportunities in one optimized, high-performance, multi-tenant platform.

Batch, Interactive, and Real-Time. Leading performance and usability in one platform.
• End-to-end analytic workflows
• Access more data
• Work with data in new ways
• Enable new users

Platform components:
• Process: Ingest (Sqoop, Flume); Transform (MapReduce, Hive, Pig, Spark)
• Discover: Analytic Database (Impala); Search (Solr)
• Model: Machine Learning (SAS, R, Spark, Mahout)
• Serve: NoSQL Database (HBase); Streaming (Spark Streaming)
• Security and Administration: YARN, Cloudera Manager, Cloudera Navigator
• Unlimited Storage: HDFS, HBase

Latest SQL Performance

[Figure: Single User vs 10 User Response Time & Impala Times Faster. Y-axis: time in seconds (lower bars = better). Engines: Impala, Spark SQL, Presto, Hive-on-Tez. Bar labels, single user: 5, 25, 37, 77. Bar labels, 10 users: 11, 120, 202, 302. Times-faster annotations: 5.0x, 7.4x, 15.4x (single user); 10.6x, 18.3x, 27.4x (10 users).]

Independent validation by IBM Research SQL-on-Hadoop VLDB paper: “Impala’s database architecture provides significant performance gains”
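The "times faster" annotations on this slide appear to be each engine's response time divided by Impala's. The short Python sketch below reproduces that arithmetic from the bar labels. It assumes the smallest value in each group (5 and 11 seconds) is the Impala baseline, and the function name `times_faster` is hypothetical. Because the bar labels are rounded, the 10-user results come out slightly different from the annotations printed on the chart.

```python
# Response times in seconds, read from the chart's bar labels (rounded values).
# Assumption: the smallest value in each group is the Impala baseline.
single_user = [5, 25, 37, 77]
ten_users = [11, 120, 202, 302]


def times_faster(times):
    """Divide every non-baseline time by the baseline (smallest) time."""
    baseline = min(times)
    return [round(t / baseline, 1) for t in sorted(times) if t != baseline]


single_ratios = times_faster(single_user)
ten_user_ratios = times_faster(ten_users)

print(single_ratios)    # ratios for the single-user bars
print(ten_user_ratios)  # ratios for the 10-user bars
```

The single-user ratios match the slide's 5.0x, 7.4x, and 15.4x exactly. The 10-user ratios land within a few tenths of the printed 10.6x, 18.3x, and 27.4x, which suggests the slide used unrounded measurements.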

Hadoop Security is Different

Hadoop Benefit | Security Side Effect
A single platform for all the data | Combining data and audiences that used to be securely siloed
A rich, flexible ecosystem of tools & utilities | Security method proliferation can increase costs/introduce coverage gaps
Ingest data of any type | Sensitive fields added without review
Active Archive provides lower cost storage than legacy systems | Lose the built-in compliance controls that legacy systems provided

The Only Comprehensively Secure Hadoop Platform
Meet compliance requirements and reduce risk exposure from storing sensitive data.

Cloudera is the leader in Hadoop security.

1. Perimeter: Standards-based Authentication
2. Access: Unified Role-based Authorization
3. Visibility: Auditing & Governance
4. Data: Encryption & Key Management

Unique Capabilities:
• Comprehensive and Unified
• Secure at the core
• No Performance Impact
• Jointly engineered with
• Compliance-Ready
• Only distribution to pass PCI audit

The Only Hadoop Data Governance Solution
Enable compliance and maximize analyst productivity.

Cloudera Navigator
Minimize risk and maintain compliance with the only native end-to-end data governance solution for Hadoop.

Unique Capabilities:
• Auditing
• Lineage
• Metadata Tagging and Discovery
• Lifecycle Management

MasterCard
Cloudera: The first PCI-Certified Hadoop Platform

Challenge: All applications, databases, or file systems that have the potential to handle personal account-related data must undergo full PCI certification.

Solution: MasterCard’s Cloudera environment fully conforms to the PCI-DSS V 2.0 standards so it can host PCI datasets and potentially integrate with other internal systems.

“Data privacy and protection is a top priority for MasterCard. As we maximize the most advanced technologies from partners and vendors, they must meet the rigorous security standards we’ve set. With Cloudera’s commitment to the same security standards, we now have additional options in how we manage our data center.”
Gary VonderHaar, Chief Technology Officer, Architecture, MasterCard

Security and Governance

Area | Cloudera: Unified, Compliance-Ready, Transparent | Fragmented, Incomplete, Complex
Perimeter (Protecting access to the cluster) | ● Kerberos with Cloudera Manager: Automated, industry-standard authentication integrated with existing systems | ◐ Kerberos: Manual configuration and integration
Access (Securing access to data) | ● Apache Sentry: Working within the community to deliver centralized, granular RBAC across frameworks | ◐ Hive ATZ-NG, Ranger: RBAC configuration silos, GUI “Band-Aid”
Visibility (Reporting on data access and lineage) | ● Cloudera Navigator: Transparent end-to-end data and metadata visibility | ◔ Apache Falcon, Knox, Ranger: Manual and limited auditing through a single workflow framework, and multiple tools
Data (Protecting data at rest or in transmission) | ● Cloudera Navigator: Transparent, comprehensive, high-performance, compliance-ready encryption and key management | ○ N/A

The Only Complete Hadoop Management Suite
Deliver optimum system utilization and meet SLA commitments.

Cloudera Manager
Focus on the solution, not the cluster, with the only complete, zero-downtime administration tool for Apache Hadoop.

Unique Capabilities:
• Unified configuration, management and monitoring across all services
• Online installation and upgrades
• Direct connection to Cloudera Support
• 3rd Party Extensibility

Cloudera Manager vs. Ambari

Area | Cloudera: Unified, Directed, Streamlined | Ambari: Federated, Chaotic, Disjointed
Manage (Deploying and configuring services) | ● Parcels and Workflows: Holistic, service-oriented components enable streamlined, comprehensive, and straightforward operations | ◐ YUM and Shell Commands: Manual configuration and time-consuming, error-prone integration
Monitor (System health and QoS and SLA notification) | ● Integrated Charting and SNMP Alerts: Catalog of chart metrics and visualization with easy-to-build, easy-to-share dashboards and common alerts | ◐ Nagios, Ganglia: Manual configuration, limited native visualization, and manual integration of separate, disparate systems and services
Diagnose (Root cause discovery, analysis, and solution) | ● Time Control and Log Collection: Centralized log aggregation of all services with integrated faceted search and visual timeframe controls | ◔ SSH/SCP to /var/log: Manual log collection via CLI tools from diverse locations with limited, service-specific search and no historical views
Integrate (Extending security policies, adding 3rd party services) | ● Enterprise Kerberos Integration: Automated, industry-standard authentication with integration to existing enterprise systems | ◐ Kerberos: Assisted CLI configuration, manual deployment, and limited integration

The Only Portable Cloud Experience for Hadoop
Maximize flexibility in Hadoop deployment architectures.

Cloudera Director
The first portable, self-service solution for deploying and managing enterprise-grade Hadoop in the Cloud.

Unique Capabilities:
• Dynamic cluster lifecycle management
• Cloud blueprints
• Multi-cluster health visibility
• Usage reporting for billing models

Our Approach

Focusing on Open Standards, not just Open Source

Open Standards are just as important as Open Source.

Why does it matter?
• Diverse engineering is more sustainable.
• Broad support ensures vendor portability.
• Project utility depends on ecosystem compatibility, which depends on standards.

Cloudera leads in defining the de facto open standards adopted by the market.

Vendor Support
Component (Founder) | Cloudera | Pivotal | MapR | Amazon | IBM | Hortonworks
Impala (Cloudera) | ✔ | ✖ | ✔ | ✔ | ✖ | ✖
Spark (UC Berkeley) | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
Hue (Cloudera) | ✔ | ✔ | ✔ | ✔ | ✖ | ✔
Sentry (Cloudera) | ✔ | ✔ | ✔ | ✖ | ✔ | ✖
Flume (Cloudera) | ✔ | ✔ | ✔ | ✖ | ✔ | ✔
Parquet (Cloudera/Twitter) | ✔ | ✔ | ✔ | ✔ | ✔ | ✖
Sqoop (Cloudera) | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
Falcon (Hortonworks) | ✖ | ✖ | ✖ | ✖ | ✖ | ✔
Knox (Hortonworks) | ✖ | ✖ | ✖ | ✖ | ✔ | ✔
Tez (Hortonworks) | ✖ | ✖ | ✔ | ✖ | ✖ | ✔
Ranger (Hortonworks) | ✖ | ✖ | ✖ | ✖ | ✖ | ✔
ORCfile (Hortonworks) | ✖ | ✖ | ✖ | ✖ | ✖ | ✔

Sustainable Innovation

A Hybrid Open Source Model combining the power of open source with the enterprise capabilities customers need.

Open Platform: 100% Open Source & Open Standards

• Deep open source commitment
  • 2/3 of engineering on open source
  • 19 Hadoop ecosystem projects founded
  • 90 ASF committer seats, 67 PMC seats
• Enterprise-ready extensions
  • Security, governance, and system management
• Comprehensive partner integrations
  • 160+ certified solutions

Supporting the Entire Ecosystem, not just the Core

90 Committer* Seats deliver the fastest issue resolution and enable us to drive the Apache roadmap for our customers.

Cloudera and Intel committers resolve over 50% of all JIRA tickets among all Hadoop vendors.

[Figure: share of resolved JIRA tickets by vendor. Other vendors shown: Hortonworks, IBM, MapR, Microsoft, Pivotal, WANdisco. Labeled value: 54%.]

Projects Included: Accumulo, Avro, Bigtop, Crunch, Flume, Hadoop Core, HBase, Hive, Kafka, Mahout, Oozie, Pig, Solr, Spark, Sqoop, Tez, Whirr, Zookeeper

Source: Apache JIRA, January 2012 – March 2015

* “Committer” = A developer who has earned community privileges to commit patches

Leading Innovation in the Hadoop Ecosystem

Cloudera | Hortonworks
Founded | Founded
First Training Offered: Cloudera U (Over 20,000 Trained) | Hortonworks U (Less than 1,000 Trained)
CDH 1 Released | HDP 1.0 Released
Cloudera Manager 1.0 | Ambari 1.0 (Missing many enterprise features)
HUE Ships in CDH3 | HUE Ships in HDP 2.0
Impala Launches | Stinger “Final Phase” (Still 5-9x slower)
Navigator Launches | Falcon (Missing many enterprise features)
Search Launches | LucidWorks (Reseller Only)
Sentry Ships CDH 4.3 | XA Secure / Ranger (Limited scope)
Spark for CDH 4.4 | ???
Key Management | N/A
Data Encryption | N/A
Cloud Deployment | N/A

Timeline: 2008 to 2014

Best-In-Class Support

• 8.9: Overall satisfaction makes Cloudera the industry benchmark for support

• 95%: Customers agree they benefit from Cloudera technical support outreach

• #1: Ability to solve technical issues is the top reason to recommend Cloudera for Hadoop

Industry-Leading Training and University Programs

• 60%: Big Data professionals from 60% of the Fortune 100 have attended live Cloudera training
• 40,000: Cloudera has trained over 40,000 people on Hadoop since 2009

Source: Fortune, “Fortune 500” and “Global 500,” May 2012.

The Most Complete Partner Ecosystem

More than 1,400 partners ensure compatibility with existing investments, lower skill barriers, and help maximize value from your data.

Partner categories around the Enterprise Data Hub: Applications, Operational Tools, System Integration, Data Systems, Infrastructure.


Thank You!
Matt Brandwein
@mattbrandwein
