We put the SQL back in NoSQL* @ApachePhoenix h p://phoenix.apache.org/ James Taylor (@JamesPlusPlus)
* Anybody else no ce that all the NoSQL stores have SQL? V5 About James • Architect at Salesforce.com – Part of the Big Data group – Lead of the Apache Phoenix project What is Apache Phoenix? • A rela onal database layer for Apache HBase What is ? • High performance horizontally scalable byte store • Suitable as store of record for mission cri cal data What is Apache Phoenix? What is Apache Phoenix? • A rela onal database layer for Apache HBase – Query engine • Transforms SQL into na ve HBase API calls • Pushes work to cluster for parallel execu on – Metadata repository • Typed access to data stored in HBase tables • Support mul -tenancy modeled as SQL views – A JDBC driver • A top level Apache So ware Founda on project – Originally developed at Salesforce – Now a top-level project at the ASF – A growing community with momentum Where Does Phoenix Fit In?
Phoenix Pig Hive GraphX Graph analysis MLLib JDBC client Data Manipula on Structured Query Data mining framework
Sqoop Phoenix YARN
RDB Data Collector Query execu on engine Spark Cluster Resource Itera ve In-Memory Manager / HBase Computa on Distributed Database MapReduce Coordina on Flume HDFS 2.0 Zookeeper
Log Data Collector Hadoop Distributed File System Hadoop The Java Virtual Machine Common JNI Why is Phoenix fast? • Pushes down computa on to region servers – Start/stop key range(s) – Time range min/max – Predicates – Aggrega on – Sort – Limit – TopN • Parallelizes query from client – Intra-region through sta s cs collec on • Supports secondary indexes – Global & co-located Why is Phoenix fast?
Client Scan ranges Parallel scans
Intra-region guideposts
RServer 1 R2 … Server Server Par oned Table Server Server Server … Rn
• Filter • Aggregate • Sort • Limit
Skip Scans • Push key range sets through filter • Use SEEK_NEXT_HINT to skip data
Skip R1
Include Skip Include Skip Include Filters • Pushes down WHERE clause to server • Filters en re HFile based on me-range
R1 t1 - 10 HFile t11 - 20 HFile t21 - 30 HFile t31 - 40 HFile Scan What’s Next?
You are here Introducing Apache Calcite • Query parser, compiler, and planner framework – SQL-92 compliant • Pluggable cost-based op mizer framework – Sane way to model push down through rules • Interop with other Calcite adaptors – Already used by Drill, Hive, Kylin, Samza – Supports any JDBC source (RDBMS - remember them J) – One cost-model to rule them all How does Phoenix plug in? JDBC Client
SQL + Phoenix Calcite Parser & Validator specific grammar Built-in rules + Calcite Query Op mizer Phoenix specific rules Phoenix Query Plan Generator Phoenix Run me What’s Next? Query ?
Drillbit Drillbit Drillbit Drillbit Calcite Calcite Calcite Calcite
Phoenix/ Samza/ HDFS RDBMS HBase Streams Thank you! Ques ons?
* who uses Phoenix