Apache HBase, the Scaling Machine
Jean-Daniel Cryans, Software Engineer at Cloudera
@jdcryans

Agenda

• Introduction to Apache HBase
• HBase at StumbleUpon
• Overview of other use cases

About le Moi

• At Cloudera since October 2012.
• At StumbleUpon for 3 years before that.
• Committer and PMC member for Apache HBase since 2008.
• Living in San Francisco.
• From Québec, Canada.


What is Apache HBase

Apache HBase is an open-source, distributed, scalable, consistent, low-latency, random-access, non-relational database built on Apache Hadoop.

Inspiration: Google Bigtable (2006)

• Goal: low-latency, consistent, random read/write access to massive amounts of structured data.
• It was the data store for Google’s crawler web table, Gmail, Analytics, Earth, Blogger, …

HBase is in Production

• Inbox
• Storage
• Web
• Search
• Analytics
• Monitoring


HBase is Open Source

• Apache 2.0 License.
• A community project with committers and contributors from diverse organizations: Cloudera, Salesforce.com, Huawei, eBay, Intel, Twitter, …
• The code license means anyone can modify and use the code.

So why use HBase?


Old School Scaling

• Find a scaling problem.
• Beef up the machine.
• Repeat until you cannot find a big enough machine or run out of funding.

“Get Rid of Everything” Scaling

• Remove text search queries (LIKE).
• Remove joins.
  • Joins due to normalization require expensive seeks.
• Remove foreign keys and encode your relations.
• Avoid constraint checks.
• Put all parts of a query in a single table.
• Use read slaves to scale reads.
• Shard to scale writes.

We “optimized the DB” by discarding some fundamental SQL/relational features.

HBase is Horizontally Scalable

• Adding more servers linearly increases performance and capacity:
  • Storage capacity
  • Input/output operations
• Store and access data on 1 to 1000’s of servers.
• Largest cluster: >1000 nodes, >1PB.
• Most clusters: 10-40 commodity servers, 100GB-4TB.

HBase is Consistent

• Brewer’s CAP theorem:
  • Consistency: DB-style ACID guarantees on rows.
  • Availability: favor recovering from faults over returning stale data.
  • Partition tolerance: if a node goes down, the system continues.

HBase Dependencies

• Apache Hadoop HDFS for data durability and reliability (Write-Ahead Log).
• Apache ZooKeeper for distributed coordination.
• Built-in support for running Apache Hadoop MapReduce jobs (a sketch follows).
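What the built-in MapReduce support looks like from the client side, as a minimal hedged sketch: TableMapReduceUtil wires a Scan into a mapper so the job reads the table directly. This targets the 0.94-era client API used later in this deck; the table name "employees" and the row-counting logic are assumptions for illustration.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class EmployeeRowCounter {

  // Each map() call receives one row: its key and the requested cells.
  static class CountMapper extends TableMapper<Text, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
        throws IOException, InterruptedException {
      context.getCounter("example", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration config = HBaseConfiguration.create(); // reads hbase-site.xml
    Job job = new Job(config, "count-employees");
    job.setJarByClass(EmployeeRowCounter.class);

    Scan scan = new Scan(); // full-table scan; one input split per region
    TableMapReduceUtil.initTableMapperJob(
        "employees", scan, CountMapper.class, Text.class, LongWritable.class, job); // hypothetical table

    job.setNumReduceTasks(0);                         // map-only
    job.setOutputFormatClass(NullOutputFormat.class); // the counter is the output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because the input splits line up with regions, the mappers run next to the RegionServers that host the data, which is the data-locality point made later in this deck.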

HBase on a Cluster

Tables and Regions

Load Distribution

[Diagram: two RegionServers hosting three regions between them]

Load Distribution

One region is getting too big and affects the balancing (more about writing in a moment).

Load Distribution

Let’s split the region in order to split the load.

Load Distribution

The region becomes Region A and Region B. Now that we have smaller pieces, it’s easier to move the load around.

Load Distribution

After balancing, the regions are spread across the RegionServers. No data was actually moved during this process, only the responsibility for serving it!

The region is the unit of load distribution in HBase.
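HBase splits regions automatically once they grow past a configured size (hbase.hregion.max.filesize), but a split can also be requested by hand through the admin API. A hedged sketch against the 0.94-era client; the table name is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ManualSplit {
  public static void main(String[] args) throws Exception {
    Configuration config = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(config);
    // Asks each region of the table to split at its middle key.
    // Only responsibility moves; the region's files stay where
    // they are in HDFS until compactions rewrite them.
    admin.split("employees"); // hypothetical table name
    admin.close();
  }
}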

So HBase can scale, but what about Hadoop?

HDFS Data Allocation

The client asks the Name node where to write (“Locations?”).

HDFS Data Allocation

The Name node answers with a list of Data nodes (“Here you go”).

HDFS Data Allocation

The data is sent along the pipeline of Data nodes.

HDFS Data Allocation

ACKs are sent back as soon as the data is in memory in the last node.
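The same write path seen from client code, as a minimal hedged sketch using the plain Hadoop FileSystem API (the path is hypothetical). hflush(), or sync() on older Hadoop 1.x clients, returns once the whole pipeline has acknowledged the bytes in memory, which is the guarantee HBase’s Write-Ahead Log builds on:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PipelineWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // create() asks the Name node for a pipeline of Data nodes.
    FSDataOutputStream out = fs.create(new Path("/tmp/wal-example")); // hypothetical path
    out.write("one edit".getBytes("UTF-8"));

    // Blocks until every Data node in the pipeline has the bytes in
    // memory (not necessarily on disk): the acknowledgement described above.
    out.hflush();

    out.close();
    fs.close();
  }
}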

Putting it Together

[Diagram: a Region server co-located on the same machine as a Data node, alongside the other Data nodes and the Name node]

Data locality is extremely important for Hadoop and HBase.

Scaling is just a matter of adding new nodes to the cluster.

Sorted Map Datastore

• Implicit PRIMARY KEY: the row key.
• Column names have the format family:qualifier.
• Data is all byte[] in HBase.

Row key | info:height | info:state | roles:hadoop | roles:hbase
cutting | ‘9ft’       | ‘CA’       | ‘Founder’    | ‘PMC’ @ts=2011, ‘Committer’ @ts=2010
tlipcon | ‘5ft7’      | ‘CA’       | ‘Committer’  | ‘Committer’

• A single cell might have different values at different timestamps.
• Different rows may have different sets of columns (the table is sparse).

Anatomy of a row

• Each row has a primary key (the row key), a lexicographically sorted byte[].
• A timestamp is associated with each value, for keeping multiple versions of data (MVCC for consistency); see the sketch after this list.
• A row is made up of columns; each (row, column) pair is referred to as a Cell.
• The contents of a cell are all byte[]’s: apps must “know” the types and handle them.
• Rows are strongly consistent.
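To make the versioning concrete, a hedged sketch against the 0.94-era client API (the "employees" table and the values are assumptions): two puts to the same cell with explicit timestamps create two versions, and a Get can read them back newest first.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionsExample {
  public static void main(String[] args) throws Exception {
    Configuration config = HBaseConfiguration.create();
    HTable table = new HTable(config, "employees"); // hypothetical table

    byte[] row = Bytes.toBytes("cutting");
    byte[] fam = Bytes.toBytes("roles");
    byte[] qual = Bytes.toBytes("hbase");

    // Two puts to the same cell with explicit timestamps -> two versions.
    // Assumes the family keeps at least 2 versions (the 0.94 default is 3).
    Put p1 = new Put(row);
    p1.add(fam, qual, 2010L, Bytes.toBytes("Committer"));
    Put p2 = new Put(row);
    p2.add(fam, qual, 2011L, Bytes.toBytes("PMC"));
    table.put(p1);
    table.put(p2);

    // Ask for up to 3 versions; results come back newest first.
    Get g = new Get(row);
    g.setMaxVersions(3);
    Result r = table.get(g);
    for (KeyValue kv : r.getColumn(fam, qual)) {
      System.out.println(kv.getTimestamp() + " -> " + Bytes.toString(kv.getValue()));
    }
    table.close();
  }
}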

Access HBase data via an API

• Data operations: Get, Put, Delete, Scan, and Compare-and-swap (see the checkAndPut sketch after the Java API example).
• DDL operations: Create, Alter, Enable/Disable.
• Access via the HBase shell, the Java API, or the REST proxy.

Java API

HBase provides utilities for easy conversions:

byte[] row = Bytes.toBytes("jdcryans");
byte[] fam = Bytes.toBytes("roles");
byte[] qual = Bytes.toBytes("hbase");
byte[] putVal = Bytes.toBytes("PMC");

// This reads the configuration files.
Configuration config = HBaseConfiguration.create();
// This creates a connection to the cluster; no master needed.
HTable table = new HTable(config, "employees");

Put p = new Put(row);
p.add(fam, qual, putVal);
// By default all operations are persisted.
table.put(p);

// From the moment the call to put() came back, the data
// became visible to all readers.
Get g = new Get(row);
Result r = table.get(g);
byte[] jd = r.getValue(fam, qual);
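Compare-and-swap surfaces in the Java client as checkAndPut: the RegionServer applies the Put only if the cell’s current value matches an expected value, atomically under the row lock. A hedged sketch reusing the hypothetical "employees" table:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CasExample {
  public static void main(String[] args) throws Exception {
    Configuration config = HBaseConfiguration.create();
    HTable table = new HTable(config, "employees"); // hypothetical table

    byte[] row = Bytes.toBytes("jdcryans");
    byte[] fam = Bytes.toBytes("roles");
    byte[] qual = Bytes.toBytes("hbase");

    Put promote = new Put(row);
    promote.add(fam, qual, Bytes.toBytes("PMC"));

    // Atomically: apply the Put only if the cell currently holds "Committer".
    // The check and the write happen together on the RegionServer.
    boolean applied = table.checkAndPut(
        row, fam, qual, Bytes.toBytes("Committer"), promote);
    System.out.println(applied ? "promoted" : "value changed under us");
    table.close();
  }
}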


StumbleUpon is...

The Product

• Pushing the Stumble button takes you to your next recommendation.
• You can tell the recommendation engine if you liked the page or not.
• You can also browse specific interests.

Business Model

Users are shown sponsored pages that are relevant to their interests.

If HBase goes down, the site goes down.

Architecture

Load Balancer -> Apache HTTP (PHP) -> Apache Thrift -> Apache HBase (~40 nodes)

A Few Use Cases

• A/B testing framework
• Realtime counters for dashboards (see the counter sketch after this list)
• Queueing
• Page sharing with comments
• User lists of stumbles
• Thumbnails serving
• Badges serving
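For the realtime counters, HBase provides atomic increments server-side, so a dashboard never does a read-modify-write round trip from the client. A hedged sketch against the 0.94-era API; the "metrics" table and the key scheme are assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterExample {
  public static void main(String[] args) throws Exception {
    Configuration config = HBaseConfiguration.create();
    HTable table = new HTable(config, "metrics"); // hypothetical table

    // Atomically add 1 to an 8-byte counter cell on the RegionServer;
    // safe under concurrent writers, returns the new value.
    long pageViews = table.incrementColumnValue(
        Bytes.toBytes("url:example.com"),   // row key (hypothetical scheme)
        Bytes.toBytes("stats"),             // column family
        Bytes.toBytes("views"),             // qualifier
        1L);                                // delta
    System.out.println("views so far: " + pageViews);
    table.close();
  }
}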

Analytics

• The production HBase cluster, the Apache logs, and MySQL all feed an analytics MapReduce cluster (>100 nodes) running Pig, Hive, Cascading, and pure MapReduce.
• The HBase data flows in over HBase replication, with sub-second lag.
• The Apache logs are copied in by a cron job.
• The MySQL data is dumped and loaded by a cron job.

Monitoring With HBase

• OpenTSDB


Facebook Messages

• Facebook needed a real email solution.
• Originally the data was stored in MySQL.
• They have the biggest HBase deployment.
• All the emails, SMS, and chats are stored in HBase.
• Users are sharded across pods of machines; each pod is configured the same way.

Opower

• A perfect use case for HBase.
• Follows a time series pattern.
• Live traffic is served with short scans (see the sketch after this list).
• New data is constantly being fed in.
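What a short scan over a time series looks like, as a hedged sketch: the "readings" table and its meterId-plus-date row key scheme are assumptions for illustration, not Opower’s actual schema. Because rows are sorted, bounding the scan with a start and stop row touches only one meter’s window:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ShortScan {
  public static void main(String[] args) throws Exception {
    Configuration config = HBaseConfiguration.create();
    HTable table = new HTable(config, "readings"); // hypothetical table

    // Rows sort lexicographically, so "meter42|2013-06-01" up to
    // "meter42|2013-06-18" covers one meter's window and nothing else.
    Scan scan = new Scan(
        Bytes.toBytes("meter42|2013-06-01"),   // start row, inclusive
        Bytes.toBytes("meter42|2013-06-18"));  // stop row, exclusive
    scan.setCaching(100); // fetch rows in batches to cut round trips

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow()));
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}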

What now?

• Download HBase: http://www.hbase.org
• Read HBase: The Definitive Guide by Lars George.
• Watch the videos from last week’s conference (available within a few weeks): http://www.hbasecon.com/
• Have a chat on #hbase, hosted by irc.freenode.net.
