Apache HBase, the Scaling Machine
Jean-Daniel Cryans, Software Engineer at Cloudera (@jdcryans)
Tuesday, June 18, 2013

Agenda

• Introduction to Apache HBase
• HBase at StumbleUpon
• Overview of other use cases
About le Moi

• At Cloudera since October 2012.
• At StumbleUpon for 3 years before that.
• Committer and PMC member for Apache HBase since 2008.
• Living in San Francisco.
• From Québec, Canada.
What is Apache HBase?

Apache HBase is an open-source, distributed, scalable, consistent, low-latency, random-access, non-relational database built on Apache Hadoop.
Inspiration: Google BigTable (2006)

• Goal: low-latency, consistent, random read/write access to massive amounts of structured data.
• It was the data store for Google's crawler web table, Gmail, Analytics, Earth, Blogger, …
HBase is in Production

• Inbox
• Storage
• Web
• Search
• Analytics
• Monitoring
HBase is Open Source

• Apache 2.0 License.
• A community project with committers and contributors from diverse organizations: Facebook, Cloudera, Salesforce.com, Huawei, eBay, Hortonworks, Intel, Twitter, …
• The code license means anyone can modify and use the code.
So why use HBase?
Old School Scaling

• Find a scaling problem.
• Beef up the machine.
• Repeat until you cannot find a big enough machine or run out of funding.
"Get Rid of Everything" Scaling

• Remove text search queries (LIKE).
• Remove joins (joins due to normalization require expensive seeks).
• Remove foreign keys and encode your relations.
• Avoid constraint checks.
• Put all parts of a query in a single table.
• Use read slaves to scale reads.
• Shard to scale writes.
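The "put all parts of a query in a single table" tactic can be sketched in plain Java (a toy illustration with hypothetical data, not a real schema): instead of joining a users table to a roles table at read time, the roles live inside the user's own row, so one point lookup answers the whole query with no join seeks.

```java
import java.util.List;
import java.util.Map;

// Toy illustration of denormalization; hypothetical data, not a real schema.
public class DenormalizedRow {
    // One wide "row" per user: the roles that a normalized design would keep
    // in a separate roles table are embedded directly in the user's row.
    static final Map<String, List<String>> USERS = Map.of(
        "jdcryans", List.of("hbase:PMC"),
        "tlipcon", List.of("hadoop:Committer", "hbase:Committer"));

    // A single point lookup replaces the join: no extra seeks.
    static List<String> rolesOf(String user) {
        return USERS.get(user);
    }

    public static void main(String[] args) {
        System.out.println(rolesOf("jdcryans"));
    }
}
```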
We "optimized the DB" by discarding some fundamental SQL/relational database features.
HBase is Horizontally Scalable

• Adding more servers linearly increases performance and capacity:
  • Storage capacity
  • Input/output operations
• Store and access data on 1 to 1000s of commodity servers.
• Largest cluster: >1000 nodes, >1 PB.
• Most clusters: 10-40 nodes, 100 GB-4 TB.
HBase is Consistent

• Brewer's CAP theorem:
  • Consistency: DB-style ACID guarantees on rows.
  • Availability: favor recovering from faults over returning stale data.
  • Partition tolerance: if a node goes down, the system continues.
HBase Dependencies

• Apache Hadoop HDFS for data durability and reliability (Write-Ahead Log).
• Apache ZooKeeper for distributed coordination.
• Apache Hadoop MapReduce: built-in support for running MapReduce jobs.
HBase on a Cluster
Tables and Regions
Load Distribution

(Diagram: regions spread across two RegionServers.)

• A region that is getting too big affects the balancing (more about writing in a moment).
• Split the region in order to split the load.
• Now that we have smaller pieces (Region A and Region B), it's easier to move the load around.
• No data was actually moved during this process, only its responsibility!
The region is the unit of load distribution in HBase.
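The split step above can be sketched as a toy model of key-range splitting (illustrative names, not HBase's actual split code): a region is responsible for a key range, and splitting it at a midpoint key produces two daughter regions whose load can then be balanced independently.

```java
// Toy model of region key ranges; illustrative names, not HBase internals.
public class RegionSplitDemo {
    // A region is responsible for the key range [startKey, endKey).
    record Region(String startKey, String endKey) {}

    // Split a region at the given key into two daughters. The data itself
    // stays where it is; only the responsibility for the ranges changes.
    static Region[] split(Region r, String splitKey) {
        if (splitKey.compareTo(r.startKey()) <= 0 || splitKey.compareTo(r.endKey()) >= 0) {
            throw new IllegalArgumentException("split key must fall inside the region");
        }
        return new Region[] {
            new Region(r.startKey(), splitKey), // Region A
            new Region(splitKey, r.endKey())    // Region B
        };
    }

    public static void main(String[] args) {
        Region hot = new Region("a", "z");
        Region[] daughters = split(hot, "m");
        System.out.println(daughters[0] + " / " + daughters[1]);
    }
}
```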
So HBase can scale, but what about Hadoop?
HDFS Data Allocation

(Diagram: a client writing through a pipeline of data nodes, coordinated by the name node.)

• The client asks the name node for block locations ("Locations?").
• The name node replies with a set of data nodes ("Here you go").
• Data is sent along the pipeline of data nodes.
• ACKs are sent back as soon as the data is in memory in the last node.
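The pipeline idea above can be sketched in plain Java (a toy model with hypothetical names, not HDFS code): each node stores the data, forwards it to the next node, and ACKs only once the downstream node has ACKed, so the ACK travels back up the pipeline.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an HDFS-style write pipeline; hypothetical names, not Hadoop APIs.
class DataNode {
    final String name;
    final DataNode next; // next node in the pipeline, or null for the last one
    final List<byte[]> blocks = new ArrayList<>();

    DataNode(String name, DataNode next) {
        this.name = name;
        this.next = next;
    }

    // Store the data, forward it downstream, and ACK only after the
    // downstream node has ACKed (the ACK propagates back up the pipeline).
    boolean write(byte[] data) {
        blocks.add(data);
        return next == null || next.write(data);
    }
}

public class PipelineDemo {
    public static void main(String[] args) {
        // Client -> dn1 -> dn2 -> dn3, mirroring the three data nodes in the slide.
        DataNode dn3 = new DataNode("dn3", null);
        DataNode dn2 = new DataNode("dn2", dn3);
        DataNode dn1 = new DataNode("dn1", dn2);

        boolean acked = dn1.write("block-0".getBytes());
        System.out.println("acked=" + acked);
    }
}
```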
Putting it Together

(Diagram: each machine runs a region server alongside a data node; the name node coordinates.)
Data locality is extremely important for Hadoop and HBase.
Scaling is just a matter of adding new nodes to the cluster.
Sorted Map Datastore

• Implicit PRIMARY KEY: the row key.
• Column format is family:qualifier.
• Data is all byte[] in HBase.

Row key  | info:height | info:state | roles:hadoop | roles:hbase
cutting  | '9ft'       | 'CA'       | 'Founder'    | 'PMC' @ts=2011
tlipcon  | '5ft7'      | 'CA'       | 'Committer'  | 'Committer' @ts=2010

• A single cell might have different values at different timestamps.
• Different rows may have different sets of columns (the table is sparse).
Anatomy of a Row

• Each row has a primary key (the row key): a lexicographically sorted byte[].
• Values have an associated timestamp for keeping multiple versions of data (MVCC for consistency).
• A row is made up of columns; each (row, column) pair is referred to as a cell.
• Contents of a cell are all byte[]s; apps must "know" the types and handle them.
• Rows are strongly consistent.
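The sorted-map layout above can be sketched in plain Java (a toy model, not HBase internals): cells are kept sorted by row key, then column, then descending timestamp, so the newest version of a cell is found first.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Toy model of HBase's sorted cell layout; names are illustrative, not HBase classes.
public class SortedCellMap {
    // Key: (row, family:qualifier, timestamp).
    record CellKey(String row, String column, long ts) {}

    // Sort by row, then column, then timestamp descending (newest first).
    static final Comparator<CellKey> ORDER =
        Comparator.comparing(CellKey::row)
                  .thenComparing(CellKey::column)
                  .thenComparing(Comparator.comparingLong(CellKey::ts).reversed());

    final TreeMap<CellKey, String> cells = new TreeMap<>(ORDER);

    void put(String row, String column, long ts, String value) {
        cells.put(new CellKey(row, column, ts), value);
    }

    // Newest version of a cell: the first entry at or after (row, column, max ts).
    String get(String row, String column) {
        Map.Entry<CellKey, String> e =
            cells.ceilingEntry(new CellKey(row, column, Long.MAX_VALUE));
        if (e == null) return null;
        CellKey k = e.getKey();
        return (k.row().equals(row) && k.column().equals(column)) ? e.getValue() : null;
    }
}
```

Writing a new value at a later timestamp does not overwrite the old one; the newer version simply sorts first, which mirrors how versions work in the table on the previous slide.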
Access HBase Data via an API

• Data operations: Get, Put, Delete, Scan, Compare-and-swap.
• DDL operations: Create, Alter, Enable/Disable.
• Access via the HBase shell, the Java API, or the REST proxy.
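Compare-and-swap applies a write only if a cell's current value matches an expected value. A toy single-row sketch of those semantics (plain Java, not the HBase client API; HBase performs the check-then-write atomically under a row lock):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy illustration of compare-and-swap semantics on a single row; not HBase code.
public class CheckAndPutDemo {
    final Map<String, String> row = new HashMap<>(); // column -> value

    // Write newValue only if the column currently holds expectedValue
    // (null expectedValue means "only if the cell does not exist yet").
    synchronized boolean checkAndPut(String column, String expectedValue, String newValue) {
        if (!Objects.equals(row.get(column), expectedValue)) {
            return false; // the cell changed underneath us; the caller must retry
        }
        row.put(column, newValue);
        return true;
    }
}
```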
Java API

HBase provides utilities for easy conversions:

    byte[] row = Bytes.toBytes("jdcryans");
    byte[] fam = Bytes.toBytes("roles");
    byte[] qual = Bytes.toBytes("hbase");
    byte[] putVal = Bytes.toBytes("PMC");

    // This reads the configuration files.
    Configuration config = HBaseConfiguration.create();
    // This creates a connection to the cluster; no master needed.
    HTable table = new HTable(config, "employees");

    Put p = new Put(row);
    p.add(fam, qual, putVal);
    // By default all operations are persisted.
    table.put(p);

    // From the moment the call to put() came back, the data
    // became visible to all readers.
    Get g = new Get(row);
    Result r = table.get(g);
    byte[] jd = r.getValue(fam, qual);
StumbleUpon is...
The Product

(Screenshots of the product.)

• Pushing this button takes you to your next recommendation.
• You can tell the recommendation engine if you liked the page or not.
• You can also browse specific interests.
Business Model

Users are shown sponsored pages that are relevant to their interests.
If HBase goes down, the site goes down.
Architecture

Load Balancer → Apache HTTP (PHP) → Apache Thrift → Apache HBase (~40 nodes)
A Few Use Cases

• A/B testing framework
• Realtime counters for dashboards
• Queueing
• Page sharing with comments
• User lists of stumbles
• Thumbnails serving
• Badges serving
Analytics

(Diagram: data flows into a >100-node Apache MapReduce cluster, processed with Pig, Hive, Cascading, MySQL, and pure MapReduce.)

• HBase Prod → MR cluster via HBase replication (sub-second lag).
• Apache logs → MR cluster via a cron'd copy.
• MySQL → MR cluster via a cron'd dump and load.
Monitoring With HBase

• OpenTSDB
Facebook Messages

• Facebook needed a real email solution.
• Originally the data was stored in MySQL.
• They have the biggest HBase deployment.
• All the emails, SMS, and chats are stored in HBase.
• Users are sharded by pods of machines; each pod is configured the same way.
Opower

(Screenshots of Opower's energy reports.)

• Perfect use case for HBase.
• Follows a time-series pattern.
• Live traffic served with short scans.
• New data constantly being fed.
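A common HBase row-key pattern for this kind of time-series workload (a general sketch, not Opower's actual schema) is entity id plus reversed timestamp, so a short scan starting at the entity's prefix returns the newest readings first:

```java
import java.util.TreeMap;

// Toy sketch of time-series row-key design; illustrative names, not a real schema.
public class TimeSeriesKeys {
    // Row key: "<meterId>#<Long.MAX_VALUE - timestamp>", zero-padded so that
    // lexicographic order puts the newest reading for a meter first.
    static String rowKey(String meterId, long timestampMillis) {
        return String.format("%s#%019d", meterId, Long.MAX_VALUE - timestampMillis);
    }

    public static void main(String[] args) {
        // HBase keeps rows sorted by key; a TreeMap stands in for that here.
        TreeMap<String, Double> table = new TreeMap<>();
        table.put(rowKey("meter42", 1_000L), 3.1);
        table.put(rowKey("meter42", 2_000L), 3.5);

        // A "short scan" over the meter42 prefix: the first hit is the newest reading.
        String firstKey = table.ceilingKey("meter42#");
        System.out.println("newest reading: " + table.get(firstKey));
    }
}
```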
What now?

• Download HBase: http://www.hbase.org
• Read HBase: The Definitive Guide by Lars George.
• Watch the videos from last week's conference (available within a few weeks): http://www.hbasecon.com/
• Have a chat on #hbase, hosted by irc.freenode.net.