Apache HBase, the Scaling Machine
Jean-Daniel Cryans
Software Engineer at Cloudera
@jdcryans

Tuesday, June 18, 13

Agenda

• Introduction to Apache HBase
• HBase at StumbleUpon
• Overview of other use cases


About le Moi

• At Cloudera since October 2012.
• At StumbleUpon for 3 years before that.
• Committer and PMC member for Apache HBase since 2008.
• Living in San Francisco.
• From Québec, Canada.


What is Apache HBase

Apache HBase is an open source, distributed, scalable, consistent, low latency, random access, non-relational database built on Apache Hadoop.


Inspiration: Google Bigtable (2006)

• Goal: Low latency, consistent, random read/write access to massive amounts of structured data.
• It was the data store for Google’s crawler web table, Gmail, Analytics, Earth, Blogger, …


HBase is in Production

• Inbox
• Storage
• Web
• Search
• Analytics
• Monitoring


HBase is Open Source

• Apache 2.0 License
• A Community project with committers and contributors from diverse organizations:
  • Cloudera, Huawei, eBay, Intel, Twitter, …
• Code license means anyone can modify and use the code.


So why use HBase?


Old School Scaling

• Find a scaling problem.
• Beef up the machine.
• Repeat until you cannot find a big enough machine or run out of funding.


“Get Rid of Everything” Scaling

• Remove text search queries (LIKE)
• Remove joins
  • Joins due to normalization require expensive seeks
• Remove foreign keys and encode your relations
  • Avoid constraint checks
• Put all parts of a query in a single table.

• Use read slaves to scale reads.
• Shard to scale writes.


We “optimized the DB” by discarding some fundamental SQL/relational features.


HBase is Horizontally Scalable

• Adding more servers linearly increases performance and capacity
  • Storage capacity
  • Input/output operations
• Store and access data on 1 to 1000s of commodity servers
• Largest cluster: >1000 nodes, >1PB
• Most clusters: 10-40 nodes, 100GB-4TB


HBase is Consistent

• Brewer’s CAP theorem
  • Consistency:
    • DB-style ACID guarantees on rows
  • Availability:
    • Favor recovering from faults over returning stale data
  • Partition Tolerance:
    • If a node goes down, the system continues.


HBase Dependencies

• Apache Hadoop HDFS for data durability and reliability (Write-Ahead Log).
• Apache ZooKeeper for distributed coordination.
• Apache Hadoop MapReduce: built-in support for running MapReduce jobs.

[Diagram labels: App, MR, ZK, HDFS.]


HBase on a Cluster


Tables and Regions


Load Distribution

[Diagram: two RegionServers hosting three regions; one region is later split into Region A and Region B.]

• This region is getting too big and affects the balancing (more about writing in a moment).
• Let’s split the region in order to split the load.
• Now that we have smaller pieces, it’s easier to move the load around.
• No data was actually moved during this process, only its responsibility!


The region is the unit of load distribution in HBase.
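
The split-then-move sequence can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not how HBase implements it: the names `RegionSplitSketch`, `Region`, `split`, and `move` are hypothetical, row keys are plain strings, and the split key is picked by the caller. A region is modeled as a half-open key range with one responsible server, so a split or a move only rewrites that metadata.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of region splitting. Every name here is hypothetical; none is an HBase class.
class RegionSplitSketch {

    // A region covers the half-open row-key range [startKey, endKey) and has one responsible server.
    static final class Region {
        final String startKey;
        final String endKey;
        final String server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String rowKey) {
            return rowKey.compareTo(startKey) >= 0 && rowKey.compareTo(endKey) < 0;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ") on " + server;
        }
    }

    // Splits a region at splitKey into two daughters that stay on the parent's server.
    static List<Region> split(Region parent, String splitKey) {
        if (!parent.contains(splitKey) || splitKey.equals(parent.startKey)) {
            throw new IllegalArgumentException("split key must fall strictly inside the region");
        }
        return Arrays.asList(
            new Region(parent.startKey, splitKey, parent.server),
            new Region(splitKey, parent.endKey, parent.server));
    }

    // Moving a region changes only which server is responsible for the key range.
    static Region move(Region region, String newServer) {
        return new Region(region.startKey, region.endKey, newServer);
    }

    public static void main(String[] args) {
        Region big = new Region("a", "z", "rs1");
        List<Region> daughters = split(big, "m");
        Region regionA = daughters.get(0);
        Region regionB = move(daughters.get(1), "rs2");
        System.out.println("Region A: " + regionA);
        System.out.println("Region B: " + regionB);
    }
}
```

Neither `split` nor `move` touches any stored rows in this model, which mirrors the point that only responsibility changes hands.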


So HBase can scale, but what about Hadoop?


HDFS Data Allocation

[Diagram: a client, a name node, and a pipeline of three data nodes.]

• The client asks the name node: “Locations?”
• The name node answers: “Here you go.”
• Data is sent along the pipeline.
• ACKs are sent back as soon as the data is in memory in the last node.
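
The write path described on this slide can be sketched as a small in-memory simulation. This is an illustration under stated assumptions, not HDFS code: `WritePipelineSketch`, `DataNode`, `allocate`, and `write` are hypothetical names, the "name node" simply hands back the first few nodes, and an ACK is modeled as a boolean returned back up the call chain once the last node holds the packet in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a replication pipeline. Hypothetical names; these are not HDFS classes.
class WritePipelineSketch {

    static final class DataNode {
        final String name;
        final List<String> memory = new ArrayList<>(); // packets currently held in memory

        DataNode(String name) {
            this.name = name;
        }

        // Buffers the packet, forwards it downstream, and returns true (the ACK) once
        // every node from here to the end of the pipeline holds the packet in memory.
        boolean receive(String packet, List<DataNode> downstream) {
            memory.add(packet);
            if (downstream.isEmpty()) {
                return true; // last node: the ACK starts travelling back from here
            }
            DataNode next = downstream.get(0);
            return next.receive(packet, downstream.subList(1, downstream.size()));
        }
    }

    // Stand-in for the name node answering the client's "Locations?" request.
    static List<DataNode> allocate(List<DataNode> cluster, int replicas) {
        return new ArrayList<>(cluster.subList(0, replicas));
    }

    // The client sends the packet to the first node only; the pipeline does the rest.
    static boolean write(String packet, List<DataNode> pipeline) {
        DataNode first = pipeline.get(0);
        return first.receive(packet, pipeline.subList(1, pipeline.size()));
    }

    public static void main(String[] args) {
        List<DataNode> cluster = Arrays.asList(
            new DataNode("dn1"), new DataNode("dn2"), new DataNode("dn3"), new DataNode("dn4"));
        List<DataNode> pipeline = allocate(cluster, 3);
        boolean acked = write("packet-1", pipeline);
        System.out.println("acked: " + acked);
    }
}
```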


Putting it Together

[Diagram: a machine, a region server, three data nodes, and the name node.]


Data locality is extremely important for Hadoop and HBase.


Scaling is just a matter of adding new nodes to the cluster.


Sorted Map Datastore

Row key   info:height   info:state   roles:hadoop                            roles:hbase
cutting   ‘9ft’         ‘CA’         ‘Founder’
tlipcon   ‘5ft7’        ‘CA’         ‘PMC’ @ts=2011, ‘Committer’ @ts=2010    ‘Committer’

• The row key is an implicit PRIMARY KEY.
• Column format is family:qualifier.
• Data is all byte[] in HBase.
• A single cell might have different values at different timestamps.
• Different rows may have different sets of columns (table is sparse).
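
One way to picture this data model is as nested sorted maps. The following is a minimal sketch, not HBase code: `SortedMapSketch` and its methods are hypothetical names, strings stand in for the byte[] keys and values that HBase stores, and cells shown without a timestamp on the slide are given a made-up timestamp of 1.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;

// Toy model of a sparse, versioned, sorted table. Hypothetical names; not HBase classes.
class SortedMapSketch {

    // row key -> "family:qualifier" -> timestamp (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<String, TreeMap<Long, String>>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Collections.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    private TreeMap<Long, String> versions(String rowKey, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(column);
    }

    // Returns the newest version of a cell, or null if the row has no such column.
    String getLatest(String rowKey, String column) {
        TreeMap<Long, String> versions = versions(rowKey, column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Returns the newest version written at or before the given timestamp, or null.
    String getAsOf(String rowKey, String column, long timestamp) {
        TreeMap<Long, String> versions = versions(rowKey, column);
        if (versions == null) {
            return null;
        }
        // With a reversed comparator, ceilingEntry finds the largest timestamp <= the argument.
        Map.Entry<Long, String> entry = versions.ceilingEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }

    // Row keys come back in sorted order.
    NavigableSet<String> rowKeys() {
        return rows.navigableKeySet();
    }

    // Loads the rows from the slide; the timestamp 1 is an assumption made for this sketch.
    static SortedMapSketch example() {
        SortedMapSketch table = new SortedMapSketch();
        table.put("tlipcon", "info:height", 1L, "5ft7");
        table.put("tlipcon", "info:state", 1L, "CA");
        table.put("tlipcon", "roles:hadoop", 2010L, "Committer");
        table.put("tlipcon", "roles:hadoop", 2011L, "PMC");
        table.put("tlipcon", "roles:hbase", 1L, "Committer");
        table.put("cutting", "info:height", 1L, "9ft");
        table.put("cutting", "info:state", 1L, "CA");
        table.put("cutting", "roles:hadoop", 1L, "Founder");
        return table;
    }

    public static void main(String[] args) {
        SortedMapSketch table = example();
        System.out.println(table.rowKeys());
        System.out.println(table.getLatest("tlipcon", "roles:hadoop"));
        System.out.println(table.getAsOf("tlipcon", "roles:hadoop", 2010L));
        System.out.println(table.getLatest("cutting", "roles:hbase"));
    }
}
```

A missing column costs nothing in this model: the row for `cutting` simply has no `roles:hbase` entry, which is what "sparse" means here.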


Anatomy of a row

• Each row has a primary key
  • Lexicographically sorted byte[]
  • Timestamp associated for keeping multiple versions of data (MVCC for consistency)
• Row is made up of columns.
• Each (row, column) referred to as a Cell
  • Contents of a cell are all byte[]’s.
  • Apps must “know” types and handle them.
• Rows are strongly consistent.
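
Because row keys are compared as raw bytes, the application's choice of encoding decides the sort order. The sketch below illustrates this with a hand-written unsigned byte comparison; `RowKeyOrderSketch`, `compare`, `utf8`, and `paddedKey` are hypothetical names for this example, and zero-padding is just one common way to make numeric keys sort numerically.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Illustrates lexicographic ordering of byte[] keys. Hypothetical names; not HBase classes.
class RowKeyOrderSketch {

    // Compares two keys byte by byte, treating each byte as unsigned.
    // If one key is a prefix of the other, the shorter key sorts first.
    static int compare(byte[] left, byte[] right) {
        int common = Math.min(left.length, right.length);
        for (int i = 0; i < common; i++) {
            int a = left[i] & 0xff;
            int b = right[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return left.length - right.length;
    }

    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Encodes a non-negative id as fixed-width, zero-padded decimal text.
    static byte[] paddedKey(long id, int width) {
        return utf8(String.format(Locale.ROOT, "%0" + width + "d", id));
    }

    public static void main(String[] args) {
        // As text, "user10" sorts before "user9" because '1' is smaller than '9'.
        System.out.println(compare(utf8("user10"), utf8("user9")) < 0);
        // With zero-padding, 9 ("0009") sorts before 10 ("0010") as expected.
        System.out.println(compare(paddedKey(9, 4), paddedKey(10, 4)) < 0);
    }
}
```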


Access HBase data via an API

• Data operations
  • Get
  • Put
  • Delete
  • Scan
  • Compare-and-swap
• DDL operations
  • Create
  • Alter
  • Enable/Disable
• Access via HBase shell, Java API, REST proxy


Java API

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    // HBase provides utilities for easy conversions.
    byte[] row = Bytes.toBytes("jdcryans");
    byte[] fam = Bytes.toBytes("roles");
    byte[] qual = Bytes.toBytes("hbase");
    byte[] putVal = Bytes.toBytes("PMC");

    // This reads the configuration files.
    Configuration config = HBaseConfiguration.create();
    // This creates a connection to the cluster, no master needed.
    HTable table = new HTable(config, "employees");
    Put p = new Put(row);
    p.add(fam, qual, putVal);
    // By default all operations are persisted.
    table.put(p);

    // From the moment the call to put() came back,
    // the data became visible to all the readers.
    Get g = new Get(row);
    Result r = table.get(g);
    byte[] jd = r.getValue(fam, qual);


StumbleUpon is...


The Product

• Pushing this button takes you to your next recommendation.
• You can tell the recommendation engine if you liked the page or not.
• You can also browse specific interests.


Business Model

Users are shown sponsored pages that are relevant to their interests.


If HBase goes down, the site goes down


Architecture

[Diagram: Load Balancer, Apache HTTP, Apache Thrift, Apache HBase; labels: PHP, ~40 nodes.]


A Few Use Cases

• A/B testing framework
• Realtime counters for dashboards
• Queueing
• Page sharing with comments
• User lists of stumbles
• Thumbnails serving
• Badges serving


Analytics

[Diagram: HBase Prod, Apache Logs, and MySQL connected to an MR Cluster; one label reads ">100 nodes".]

• Pig, Hive, Cascading, and pure MapReduce
• HBase Prod and the MR Cluster: HBase Replication, sub-second lag
• Apache Logs and the MR Cluster: cron’d copy
• MySQL and the MR Cluster: cron’d dump and load


Monitoring With HBase

• OpenTSDB


Facebook Messages

• Facebook needed a real email solution.
• Originally the data was stored in MySQL.
• They have the biggest HBase deployment.
• All the emails, SMS, and chats are stored in HBase.
• Users are sharded across pods of machines; each pod is configured the same way.


Opower

• Perfect use case for HBase.
• Follows a time series pattern.
• Live traffic served with short scans.
• New data constantly being fed.
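
The combination of a time series pattern and short scans can be illustrated with a sorted map whose keys embed an identifier and a timestamp. This is a sketch under loud assumptions: the slides do not describe Opower's schema, so the key layout `meterId#timestamp`, the class `TimeSeriesScanSketch`, and all method names are hypothetical, and a `TreeMap` stands in for a sorted table.

```java
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy time-series store on a sorted map. Hypothetical names and key layout; not HBase code.
class TimeSeriesScanSketch {

    private final TreeMap<String, Double> store = new TreeMap<>();

    // Zero-padding keeps timestamps in numeric order when keys are compared as text.
    static String rowKey(String meterId, long epochSeconds) {
        return String.format(Locale.ROOT, "%s#%010d", meterId, epochSeconds);
    }

    void put(String meterId, long epochSeconds, double reading) {
        store.put(rowKey(meterId, epochSeconds), reading);
    }

    // Short scan: all readings for one meter with fromSeconds <= time < toSeconds.
    // Because keys are sorted, only the matching contiguous range is visited.
    NavigableMap<String, Double> scan(String meterId, long fromSeconds, long toSeconds) {
        return store.subMap(rowKey(meterId, fromSeconds), true, rowKey(meterId, toSeconds), false);
    }

    public static void main(String[] args) {
        TimeSeriesScanSketch series = new TimeSeriesScanSketch();
        series.put("m1", 100L, 1.0);
        series.put("m1", 200L, 2.0);
        series.put("m1", 300L, 3.0);
        series.put("m10", 150L, 9.0);
        System.out.println(series.scan("m1", 100L, 300L));
    }
}
```

New readings always land next to the previous ones for the same meter, so serving recent data for one meter reads a small contiguous slice of the key space.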


What now?

• Download HBase: http://www.hbase.org
• Read HBase: The Definitive Guide by Lars George
• Watch the videos from last week’s conference (available within a few weeks): http://www.hbasecon.com/
• Have a chat on #hbase hosted by irc.freenode.net

