Apache™ Hadoop® in the Datacenter and Cloud

Digital Transformation Fueled by Big Data Analytics and IoT: The Shift to the Connected Data Architecture

• Cloud and Data Center
• Powered by Open Source

[Figure: Connected Data Architecture delivering ACTIONABLE INTELLIGENCE from Data in Motion and Data at Rest. Timeline labels, from System‐centric to User‐centric: IDMS, Relational Database, Mainframe, Client / Server, Web and SaaS, Modern Applications.]

Transformational Use‐Cases
• Predictive Retail
• Factory Automation
• Connected Cars
• Predictive Analytics
• Artificial Intelligence

© Hortonworks Inc. 2011 – 2016. All Rights Reserved

Hadoop in the Data Center

Create and Manage Central Data Lakes

Support all Types of Data

Provide Flexible Processing and Access Methods

Reduce Architecture Costs by 80% or More

Drive Transformational New Use Cases

Hadoop in the Cloud

Fast On‐Ramp for New Users

Elastic Compute and Storage Capabilities

Zero‐configuration access engine capabilities (HDInsight)

Eliminate Hardware purchases

Facilitate Certain Modern Data Applications through Cloud Connectivity

Transformational Applications Require Connected Data

[Figure: Edge Data flows into both the CLOUD and the DATA CENTER, each holding Data in Motion and Data at Rest. Workload labels: Edge Analytics, Machine Learning, Stream Analytics, Deep Historical Analysis.]

Our Focus: Enable Modern Applications on Connected Data Platforms

• Continuous Insights: Deliver insights from ALL data, origin to rest
• Enterprise Ready: Management, Security, Governance
• Any Delivery Model: Data Center, Cloud, Hybrid
• Open Innovation: Architecture, Community, Ecosystem

A Look at Hadoop in the Data Center

Actionable Intelligence from Connected Data Platforms

• Capturing perishable insights from data in motion
• Ensuring rich, historical insights on data at rest
• Necessary for modern data applications

[Figure: Modern Data Applications draw ACTIONABLE INTELLIGENCE from DATA IN MOTION (Hortonworks DataFlow) and DATA AT REST (Hortonworks Data Platform).]

Hortonworks Data Platform for Data at Rest
Powered by Open Enterprise Hadoop

Open

Central

Interoperable

Ready

Hortonworks Data Platform 2.5 Highlights

Dynamic Security: Apache Atlas + Ranger Integration

Enterprise Spark at Scale: Apache Zeppelin Notebook for Spark

Real‐Time Applications: Storm and HBase/Phoenix

Streamlined Operations:

Interactive Query in Seconds: Hive with LLAP (Technical Preview)

Apache Atlas + Ranger ‐ More Powerful Together

Introducing Tag Based Security
Apache Atlas and Ranger Integration

Basic Tag policy – Access and entitlements can be based on attributes. As an example: Personally Identifiable Information (PII) is a tag that can be leveraged to protect sensitive personal data.

Geo‐based policy – Access policy based on location. As an example: A user might be able to access data in North America, but may be restricted from access in EMEA due to privacy compliance.

Time‐based policy – Access policy based on time windows. As an example: A user might be able to access data only between 8AM – 5PM (common in SOX regulations).

Prohibitions – Restrictions on combining two data sets which might be in compliance originally, but not when combined together. As an example: SSNs and names.

Key Benefits:
• New scalable metadata based security paradigm
• Dynamic, real‐time policy
• Automatic updates to changes in metadata
• Centralized and simple to manage policy
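The four policy types above can be illustrated with a small, self-contained Python sketch. This is not the Apache Ranger or Apache Atlas API; every name here (`AccessRequest`, `TagPolicy`, `is_allowed`, the region codes) is hypothetical and exists only to show how tag, location, time window, and prohibition checks might combine into one decision.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class AccessRequest:
    """A hypothetical request for data carrying one or more metadata tags."""
    user: str
    region: str             # where the request originates, e.g. "NA" or "EMEA"
    at: time                # local time of the request
    tags: FrozenSet[str]    # tags on the requested data, e.g. {"PII"}


@dataclass
class TagPolicy:
    """Hypothetical policy attached to a single tag."""
    allowed_users: Set[str] = field(default_factory=set)
    allowed_regions: Set[str] = field(default_factory=set)  # empty = any region
    window: Optional[Tuple[time, time]] = None               # None = any time


def is_allowed(request: AccessRequest,
               policies: Dict[str, TagPolicy],
               prohibitions: Iterable[FrozenSet[str]]) -> bool:
    # Prohibition: deny if the request combines tags that must not be joined.
    for combo in prohibitions:
        if combo <= request.tags:
            return False
    for tag in request.tags:
        policy = policies.get(tag)
        if policy is None:
            continue  # in this sketch, tags without a policy add no restriction
        # Basic tag policy: only entitled users may read data with this tag.
        if request.user not in policy.allowed_users:
            return False
        # Geo-based policy: restrict by the region of the request.
        if policy.allowed_regions and request.region not in policy.allowed_regions:
            return False
        # Time-based policy: restrict to a daily time window.
        if policy.window is not None:
            start, end = policy.window
            if not (start <= request.at <= end):
                return False
    return True
```

Because the decision depends only on tags, a dataset that is newly tagged PII would be covered by the existing PII policy without anyone writing a new per-table rule, which is the point of the metadata-based approach described on the slide.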

Apache Atlas Powers Cross‐Component Data Lineage

As a part of HDP 2.5, users can track lineage across the following components using Atlas:

• Apache Sqoop – Import from and export to relational databases, and an additional package that leverages Sqoop
• Hive ‐ Dataset lineage with entity versioning (including schema changes)
• Kafka / Storm ‐ IoT event‐level processing, such as syslogs or sensor data
• Falcon ‐ Data lifecycle at Feed and Process entity level for replication and repeating workflows. Tracks periodicity, throttling, eviction. ATLAS‐69 FALCON‐1570

Key Benefits:
• Enterprises need open solutions, not single app vendor
• More native connectors than any other vendor
• Hardened metadata infrastructure
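To make the idea of cross-component lineage concrete, here is a minimal Python sketch that treats lineage as a directed graph and walks it upstream. It does not use the Atlas API; the `upstream` function, the `LINEAGE` mapping, and the dataset names are all hypothetical and chosen only for illustration.

```python
from collections import deque


def upstream(lineage, dataset):
    """Return every dataset that `dataset` was derived from, directly or indirectly.

    `lineage` maps a dataset name to the list of inputs it was produced from.
    """
    seen = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical lineage records: a relational table is imported into Hive
# (for example by a Sqoop job), cleaned, and then aggregated into a report.
LINEAGE = {
    "hive.sales_report": ["hive.sales_clean"],
    "hive.sales_clean": ["hive.sales_raw"],
    "hive.sales_raw": ["rdbms.orders"],
}

sources = upstream(LINEAGE, "hive.sales_report")
```

A traversal like this answers the typical governance question "which source systems feed this report?" once each component has reported its inputs and outputs to a shared metadata store.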

Expanded Native Connector: Dataset Lineage

[Figure: dataset lineage connectors. Labels: Apache Kafka, Teradata Connector, RDBMS, Sqoop, Custom Activity Reporter, Metadata Repository.]

Apache Atlas Enables Business Catalog for Ease of Use

• Organize data assets along business terms
– Authoritative: Hierarchical business Taxonomy Creation
– Agile modeling: Model Conceptual, Logical, Physical assets
– Definition and assignment of tags like PII (Personally Identifiable Information)
• Comprehensive features for compliance
– Multiple user profiles including Data Steward and Business Analysts
– Object auditing to track “who did it”
– Metadata Versioning to track “what did they do”
• Faster Insight:
– Data Quality tab for profiling and sampling
– User Comments

Key Benefits:
• Easy way to create business Taxonomy
• Useful for multiple user types including Data Steward and Business Analysts
• Comprehensive features for compliance

Business Catalog
Model and explore metadata via the new Business Catalog in Apache Atlas

Data Steward

Streamlining Operations, Three Phase Plan

Focused strategic investments in our core products give customers more tooling to quickly understand the cluster’s health, how business users are using it, and where to focus efforts when issues arise.

⬢ Capabilities
– Phase 1: Advanced Performance & Health Metrics Dashboards – with Ambari 2.2.2
– Phase 2: Consolidated Cluster Activity Reporting – NEW! with SmartSense 1.3.0
– Phase 3: Centralized & Contextual Log Search – Tech Preview with Ambari 2.4.0

⬢ Core Technologies
– Apache Ambari
– Ambari Metrics System
– Apache Solr
– Hortonworks SmartSense
– Grafana

[Figure: Ambari with Grafana, Log Search, Ambari Metrics System, SmartSense, Solr, and dedicated UIs.]

Streamlined Operations
Phase 1: Advanced Metrics Visualization & Dashboarding

Goal: Quickly understand cluster health metrics and key performance indicators

⬢ Capabilities
– Centralized Dashboarding focusing on component Health & Performance
– Ad‐Hoc Graph Creation

⬢ Pre‐Built Dashboards
– HDFS
– YARN
– HBase

⬢ Core Technologies
– Ambari Metrics System
– Grafana

Ambari now includes pre‐built dashboards for visualizing cluster health

Streamlined Operations
Phase 2: Consolidated Cluster Activity Reporting

Goal: Quickly visualize and report on how business users and tenants are using the cluster: top 10 queues, users, and most time‐consuming jobs

⬢ Capabilities
– Top K Activity Reporting
– Chargeback

⬢ Services Covered
– YARN
– MapReduce
– Hive/Tez
– Spark
– HDFS

⬢ Core Technologies
– Hortonworks SmartSense
– Apache Zeppelin

Activity Explorer: Cluster Utilization Reporting
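As a rough illustration of what "Top K" reporting and chargeback compute, the following Python sketch aggregates job runtimes by user or queue. It is not SmartSense or Zeppelin code; the record fields (`user`, `queue`, `runtime_s`) and both function names are assumptions made for this example.

```python
from collections import Counter


def top_k(jobs, key, k=10):
    """Return the k largest consumers of runtime, grouped by `key` (e.g. "user")."""
    totals = Counter()
    for job in jobs:
        totals[job[key]] += job["runtime_s"]
    return totals.most_common(k)


def chargeback(jobs, key, total_cost):
    """Split `total_cost` across groups in proportion to their share of runtime."""
    totals = Counter()
    for job in jobs:
        totals[job[key]] += job["runtime_s"]
    grand_total = sum(totals.values())
    return {name: total_cost * used / grand_total for name, used in totals.items()}


# Hypothetical job records, as a cluster activity log might expose them.
JOBS = [
    {"user": "alice", "queue": "etl", "runtime_s": 300},
    {"user": "bob", "queue": "adhoc", "runtime_s": 300},
    {"user": "alice", "queue": "etl", "runtime_s": 200},
    {"user": "carol", "queue": "adhoc", "runtime_s": 200},
]

heaviest_users = top_k(JOBS, "user", k=2)
cost_by_queue = chargeback(JOBS, "queue", total_cost=100.0)
```

Runtime is used here as the cost driver for simplicity; a real chargeback model would more likely weight memory, CPU, and storage consumption.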

Preview: Streamlined Operations Investments
Phase 3: Centralized & Contextual Log Search

Goal: When issues arise, be able to quickly find issues across all HDP components

⬢ Capabilities
– Rapid Search of all HDP component logs
– Search across time ranges, log levels, and for keywords

⬢ Core Technologies:
– Apache Ambari
– Apache Solr
– Apache Ambari Log Search
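The three search dimensions listed above (time range, log level, keyword) can be sketched in plain Python. This is only an in-memory illustration of the filtering logic, not the Ambari Log Search or Solr API; the entry fields, the `LEVELS` ordering, and `search_logs` are assumptions for this example.

```python
from datetime import datetime

# Assumed severity ordering; higher numbers are more severe.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40, "FATAL": 50}


def search_logs(entries, start, end, min_level="INFO", keyword=None):
    """Return entries within [start, end], at or above min_level, matching keyword."""
    threshold = LEVELS[min_level]
    hits = []
    for entry in entries:
        if not (start <= entry["ts"] <= end):
            continue  # outside the requested time range
        if LEVELS[entry["level"]] < threshold:
            continue  # below the requested severity
        if keyword and keyword.lower() not in entry["message"].lower():
            continue  # keyword not present (case-insensitive match)
        hits.append(entry)
    return hits


# Hypothetical log entries collected from several components.
ENTRIES = [
    {"ts": datetime(2016, 9, 1, 10, 0), "component": "hdfs", "level": "INFO",
     "message": "Block report processed"},
    {"ts": datetime(2016, 9, 1, 10, 5), "component": "yarn", "level": "ERROR",
     "message": "Container killed: out of memory"},
    {"ts": datetime(2016, 9, 1, 12, 0), "component": "hive", "level": "ERROR",
     "message": "Query failed: out of memory"},
]

errors = search_logs(
    ENTRIES,
    start=datetime(2016, 9, 1, 9, 0),
    end=datetime(2016, 9, 1, 11, 0),
    min_level="WARN",
    keyword="memory",
)
```

A search index such as Solr performs the same kind of filtering, but over an inverted index so that it stays fast across the logs of every host in the cluster.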

Tune the log collection system with Guided Smart Configurations

View a comprehensive inventory of operational logs for each host

Hive 2 with LLAP
Enable Interactive Query In Seconds

Developer Productivity: Interactive query in seconds

Ease of Use and Adoption: 100% compatible with Hive SQL

Enterprise Readiness: Linear scaling at terabyte data volumes

Streamlined Operations: LLAP integration with Ambari with automated dashboards

Hive 2 with LLAP: Preliminary Numbers

Hive2.0 and LLAP: TPC‐DS at 10 TB Scale, 18 Nodes

[Chart: query time per TPC‐DS query, Hive2.0‐Tez vs. LLAP, y‐axis from 0 to 80, for queries q3, q7, q12, q13, q19, q21, q26, q27, q42, q43, q45, q52, q55, q60, q73, q84, q89, q91, q98. Min query time: Query 55: 2.38s.]

A Look at Hadoop in the Cloud

Traditional Hadoop Clusters

Why Cloud?

• IT & Business Agility
• No Upfront HW Costs
• Ephemeral & Long‐Running
• Unlimited Elastic Scale

How Do We Approach The Cloud Market?

HYBRID SEGMENT
Today’s enterprise customers
• Seamless Connected Data Architecture across Cloud and Data Center. Always‐on enterprise use cases are common.
• Azure HDInsight, HDP, and HDF are our Premier offerings. Customer journey to future state architecture, cloud operation & consumption model.

CLOUD ONRAMP
New users via digital engagement or existing customers exploring cloud options
• Elasticity, Automation, Pay as you Go, One‐Click Start. Ephemeral use cases are a common starting point.
• Azure HDInsight is our Premier offering. Focused offerings for AWS that enable us to engage and position our Premier offerings.

Cloud‐first approach to product design, development, testing & delivery

Outlook: Cloud and the Big Data Market

• Public cloud adoption (AWS, Azure, Google) will continue to accelerate

• Many customers will go Cloud First to simplify/speed adoption

• Customers deploying in public cloud expect a pay‐as‐you‐go (PAYG) pricing model
– Hourly pricing is default; “reserved” optimizes annual spend; “spot” optimizes hourly spend

• Customers are interested in running workloads in the cloud in addition to on‐premises clusters

• Customers are familiar with native cloud tooling

• This heightens the importance of product packaging and user experience tuned to Cloud

Cloud IaaS and Hadoop as a Service

• Running Hadoop on Cloud IaaS
• Using Hadoop as a Cloud Service

Public Cloud Service Providers

Microsoft Azure HDInsight
Powered by Hortonworks Data Platform

• Seamless Access to the Public Cloud for Spark, Hive, HBase, and other mission‐critical workloads
• Unmatched Economics combining HDInsight’s elasticity in the cloud with HDP’s cost efficiencies at scale
• Enterprise Readiness with robust security, governance and operations in the cloud, powered by Hortonworks Data Platform

Connected Data Architecture with Azure HDInsight

Ideal Use Cases (HDInsight Cluster Types)
• Data Prep, Query, and Analysis (Hadoop, Hive, Pig)
• Iterative In‐Memory Analysis (Spark)
• Advanced Statistics, Modeling, Machine Learning (R Server on Spark)
• NoSQL Data Storage (HBase)
• Real‐time Event Processing (Storm)

[Figure: CLOUD: Azure HDInsight (Cloud Data Processing). HDF (Data Flow Management). DATA CENTER: HDP (Enterprise Data Lake).]

Runs in more datacenters than anyone else

• North Central US: Illinois
• Central US: Iowa
• West US: California
• East US: Virginia
• East US 2: Virginia
• South Central US: Texas
• Brazil South: Sao Paulo State
• West Europe: Netherlands
• North Europe: Ireland
• China North *: Beijing
• China South *: Shanghai
• Japan East: Tokyo, Saitama
• Japan West: Osaka
• India Central: Pune
• East Asia: Hong Kong
• SE Asia: Singapore
• Australia East: New South Wales
• Australia South East: Victoria

• Azure doubling compute and storage every 6 months

Microsoft Azure HDInsight and Apache Projects in the Cloud

• Standard Hadoop Projects: Hive, YARN, HDFS, MapReduce, Pig, Tez, Sqoop, Oozie, ZooKeeper, Mahout, Phoenix
• Comprehensive List of Emerging Projects: Spark, Storm, HBase, and R
• Ability to Add Projects: Add various projects to the cloud

[Figure: YARN DATA OPERATING SYSTEM with GOVERNANCE, STORAGE, OPERATIONS, and SECURITY. Workload labels: Batch, Machine Learning, Interactive, Streaming, Search.]

Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016

“Elasticity, Automation, And Pay‐As‐You‐Go Compel Enterprise Adoption Of Hadoop In The Cloud”

Connected Data Architecture with HDC for AWS

Ideal Use Cases
• Data Science and Exploration (Spark, Zeppelin)
• ETL and Data Preparation (Hive, Spark)
• Analytics and Reporting (Hive2 w/LLAP, Zeppelin)

[Figure: CLOUD: HDC for AWS (Cloud Data Processing). HDF (Data Flow Management). DATA CENTER: HDP (Enterprise Data Lake).]

TECH PREVIEW

Hortonworks Data Cloud for AWS

Cluster Types

TECH PREVIEW

Prescriptive On‐Demand Ephemeral Workloads

** Planned list of available Cluster Types

TECH PREVIEW

Why Hortonworks Cloud Solutions?

Choice of Cloud

Rich Set of Capabilities and Security

Zero‐configuration access engine capabilities (HDInsight)

S3 Integrations on AWS (Tech Preview)

Award Winning Hadoop Expertise

Connected Data Platforms Integrate Cloud and Data Center Deployments

[Figure: Edge Data flows into both the CLOUD and the DATA CENTER, each holding Data in Motion and Data at Rest. Workload labels: Edge Analytics, Machine Learning, Stream Analytics, Deep Historical Analysis.]

Thank You
