We put the SQL back in NoSQL*
@ApachePhoenix
http://phoenix.apache.org/
James Taylor (@JamesPlusPlus)

* Anybody else notice that all the NoSQL stores have SQL?

About James
• Architect at Salesforce.com
  – Part of the Big Data group
  – Lead of the Apache Phoenix project

What is Apache Phoenix?
• A relational database layer for Apache HBase

What is Apache HBase?
• High performance horizontally scalable byte store
• Suitable as store of record for mission critical data

What is Apache Phoenix?
• A relational database layer for Apache HBase
  – Query engine
    • Transforms SQL into native HBase API calls
    • Pushes work to cluster for parallel execution
  – Metadata repository
    • Typed access to data stored in HBase tables
    • Supports multi-tenancy modeled as SQL views
  – A JDBC driver
• A top level Apache Software Foundation project
  – Originally developed at Salesforce
  – Now a top-level project at the ASF
  – A growing community with momentum

Where Does Phoenix Fit In?

[Diagram: the Hadoop ecosystem stack. Top layer: Phoenix (JDBC client), Pig (data manipulation), Hive (structured query), GraphX (graph analysis framework), MLLib (data mining). Middle layer: Sqoop (RDB data collector), Flume (log data collector), Phoenix (query execution engine), Spark (iterative in-memory computation), YARN (cluster resource manager) / MapReduce, HBase (distributed database), Zookeeper (coordination). Bottom layer: HDFS 2.0 (Hadoop Distributed File System), Hadoop Common JNI, the Java Virtual Machine.]

Why is Phoenix fast?
• Pushes down computation to region servers
  – Start/stop key range(s)
  – Time range min/max
  – Predicates
  – Aggregation
  – Sort
  – Limit
  – TopN
• Parallelizes query from client
  – Intra-region through statistics collection
• Supports secondary indexes
  – Global & co-located

Why is Phoenix fast?

[Diagram: the client turns a query into scan ranges and issues parallel scans, split at intra-region guideposts, over regions R1 … Rn of a partitioned table hosted on multiple region servers. Work done on each server:]
• Filter
• Aggregate
• Sort
• Limit
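
The parallel-scan idea above can be sketched in a few lines of Python. This is a toy model, not Phoenix code: the table is an in-memory sorted list, and `split_by_guideposts`, `server_side_scan`, and `parallel_aggregate` are hypothetical names invented for this illustration. It shows a key range being cut at guideposts, each chunk being filtered and partially aggregated "on the server", and the client merging the partial results.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Toy "table": rows sorted by row key, as in HBase. Each row is (key, amount).
ROWS = [(k, k % 7) for k in range(100)]
KEYS = [k for k, _ in ROWS]


def split_by_guideposts(start, stop, guideposts):
    """Cut the key range [start, stop) into chunks at every guidepost inside it."""
    cuts = [start] + [g for g in sorted(guideposts) if start < g < stop] + [stop]
    return list(zip(cuts[:-1], cuts[1:]))


def server_side_scan(chunk, predicate):
    """Stand-in for pushed-down work: range scan, filter, partial aggregate."""
    lo, hi = chunk
    i, j = bisect_left(KEYS, lo), bisect_left(KEYS, hi)
    matched = [v for _, v in ROWS[i:j] if predicate(v)]
    return len(matched), sum(matched)


def parallel_aggregate(start, stop, guideposts, predicate):
    """Client side: run one scan per chunk in parallel, then merge the partials."""
    chunks = split_by_guideposts(start, stop, guideposts)
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(lambda c: server_side_scan(c, predicate), chunks))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return count, total


if __name__ == "__main__":
    # COUNT(*) and SUM(amount) over keys 10..89 where amount >= 3
    print(parallel_aggregate(10, 90, [25, 50, 75], lambda v: v >= 3))
```

Only the small `(count, sum)` pairs travel back to the client, which is the point of pushing the filter and aggregate down to where the data lives.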

Skip Scans
• Push key range sets through filter
• Use SEEK_NEXT_HINT to skip data
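
A minimal sketch of the skip-scan idea, assuming a sorted in-memory key list in place of an HBase region. The function name `skip_scan` and the half-open `(start, stop)` range format are assumptions made for this illustration. The `bisect_left` jump plays the role of the seek hint named in the slide (in HBase the corresponding filter return code is `SEEK_NEXT_USING_HINT`).

```python
from bisect import bisect_left


def skip_scan(sorted_keys, ranges):
    """Return (matching keys, number of rows the filter examined).

    `ranges` is a list of half-open (start, stop) key ranges.
    Overlapping ranges are not handled in this sketch.
    """
    ranges = sorted(ranges)
    matches, examined = [], 0
    pos, r = 0, 0
    while pos < len(sorted_keys):
        key = sorted_keys[pos]
        examined += 1
        # Drop ranges that end at or before the current key.
        while r < len(ranges) and key >= ranges[r][1]:
            r += 1
        if r == len(ranges):
            break  # no ranges left, so the scan can stop early
        start, _ = ranges[r]
        if key >= start:
            matches.append(key)  # INCLUDE: key falls inside the current range
            pos += 1
        else:
            # SKIP: jump straight to the first key >= start (the "seek hint").
            pos = bisect_left(sorted_keys, start, pos)
    return matches, examined


if __name__ == "__main__":
    keys = list(range(1000))
    found, examined = skip_scan(keys, [(10, 13), (500, 502), (900, 901)])
    print(found, examined)
```

With three narrow ranges over 1,000 keys, the filter looks at only a handful of rows, because every gap between ranges is crossed with a single seek rather than a row-by-row walk.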

[Diagram: within region R1, the scan alternates Skip / Include across the key ranges]

Filters
• Pushes down WHERE clause to server
• Filters entire HFile based on time-range
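
The time-range filtering above can be sketched as follows. This is a toy model under stated assumptions: the `HFile` dataclass, `overlaps`, and `scan` are hypothetical names, each file simply records the minimum and maximum timestamp it contains, and the query range is treated as inclusive on both ends for simplicity.

```python
from dataclasses import dataclass


@dataclass
class HFile:
    name: str
    min_ts: int
    max_ts: int
    cells: list  # list of (timestamp, value) pairs


def overlaps(f, lo, hi):
    """True when the file's [min_ts, max_ts] intersects the query range [lo, hi]."""
    return f.min_ts <= hi and f.max_ts >= lo


def scan(files, lo, hi):
    """Return (names of files actually opened, values with lo <= timestamp <= hi)."""
    opened, out = [], []
    for f in files:
        if not overlaps(f, lo, hi):
            continue  # the whole file is skipped without reading any cell
        opened.append(f.name)
        out.extend(v for ts, v in f.cells if lo <= ts <= hi)
    return opened, out


# Four files covering t1-10, t11-20, t21-30, t31-40.
FILES = [
    HFile(f"t{a}-{b}", a, b, [(t, f"v{t}") for t in range(a, b + 1)])
    for a, b in [(1, 10), (11, 20), (21, 30), (31, 40)]
]

if __name__ == "__main__":
    opened, values = scan(FILES, 12, 25)
    print(opened, len(values))
```

A query for timestamps 12 through 25 opens only the two middle files; the first and last are rejected from their metadata alone.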

[Diagram: a scan over region R1, which holds four HFiles covering time ranges t1 - 10, t11 - 20, t21 - 30, and t31 - 40]

What’s Next?

[Diagram: roadmap marker, "You are here"]

Introducing Apache Calcite
• Query parser, compiler, and planner framework
  – SQL-92 compliant
• Pluggable cost-based optimizer framework
  – Sane way to model push down through rules
• Interop with other Calcite adaptors
  – Already used by Drill, Hive, Kylin, Samza
  – Supports any JDBC source (RDBMS - remember them :-))
  – One cost-model to rule them all

How does Phoenix plug in?
JDBC Client

→ Calcite Parser & Validator (SQL + Phoenix specific grammar)
→ Calcite Query Optimizer (built-in rules + Phoenix specific rules)
→ Phoenix Query Plan Generator
→ Phoenix Runtime

What’s Next?
Query ?

[Diagram: four Drillbits, each paired with Calcite, above the data sources Phoenix/HBase, HDFS, RDBMS, and Samza/Streams]

Thank you!
Questions?

* who uses Phoenix