CockroachDB
Scalable, survivable, strongly consistent, SQL

presented by Ben Darnell / CTO

About Me

• Co-founder of Cockroach Labs
• Previously at , Dropbox, Square

Agenda

● Motivation
● High-level architecture
● Some CockroachDB Features
● Q & A
● Interruptions are encouraged!

@cockroachdb

Motivation

Limitations of Existing

Relational: Hard to scale horizontally

● Scalability: manual sharding results in high operational complexity and application rewrites
● Replication: wasted resources (stand-by servers) or lost consistency (asynchronous replication)

OR

NoSQL: Scalability with strings attached

● Limited transactions: developer burden due to complex data modeling
● Limited indexes: lost flexibility with querying and analytics
● Eventual consistency: correctness issues and higher risk of data corruption

CockroachDB: The Best of Both Worlds

• Single binary/symmetric nodes
• Applications see one logical DB, including cross-datacenter, global
• Self-healing/self-balancing
• Scale out is as simple as adding nodes
• SQL

High-Level Architecture

Abstraction Stack

SQL

Transactional KV

Distribution

Replication

Storage

Transactional KV

• Monolithic sorted key-value map
• Automatically replicated and distributed
• Consistent
• Self-healing

Transactional KV: ACID

• Atomicity. All operations or no operations.
• Consistency. No violating constraints.
• Isolation. Exclusive access.
• Durability. Committed data survives crashes.

SQL: Structured Data Model

● Tables
● Rows
● Columns
● Indexes

Inventory

ID  Name    Quantity
1   Glove   1
2   Ball    4
3   Shirt   2
4   Shorts  12
5   Bat     0
6   Shoes   4

Name_Idx

Name
Ball
Bat
Glove
Shirt
Shoes
Shorts

SQL

CREATE TABLE inventory (
    id INTEGER PRIMARY KEY,
    name VARCHAR,
    quantity INTEGER,
    INDEX name_index (name)
);

SQL: Key anatomy

INSERT INTO inventory VALUES (1, 'Apple', 12);
INSERT INTO inventory VALUES (2, 'Orange', 15);

id  name    quantity
1   Apple   12
2   Orange  15

=

Key                              Value
/inventory/primary/1/name        Apple
/inventory/primary/1/quantity    12
/inventory/primary/2/name        Orange
/inventory/primary/2/quantity    15
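The row-to-key mapping above can be sketched in a few lines of Python. This is only an illustration using the readable string keys from the slide. The function name `encode_row` is hypothetical, and a real system would need an order-preserving key encoding so that, for example, id 10 sorts after id 2.

```python
def encode_row(table, pk, columns):
    """Turn one SQL row into (key, value) pairs, one per non-key column.

    Keys follow the readable pattern shown on the slide:
    /<table>/primary/<primary key>/<column>. This is an illustrative
    string form, not an actual on-disk encoding.
    """
    return sorted(
        (f"/{table}/primary/{pk}/{col}", val) for col, val in columns.items()
    )


# The two rows inserted on the slide.
rows = [
    (1, {"name": "Apple", "quantity": 12}),
    (2, {"name": "Orange", "quantity": 15}),
]

kv = [pair for pk, cols in rows for pair in encode_row("inventory", pk, cols)]

for key, value in kv:
    print(key, value)
```

Because every column of a row shares the same key prefix, the columns of one row sit next to each other in the sorted key-value map.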

Distribution: Sharding
The data is split into ~64MB ranges. Each holds a contiguous range of the key space.

Ø-lem       lem-pea   pea-∞
apricot     lemon     peach
banana      lime      pear
blueberry   mango     pineapple
cherry      melon     raspberry
grape       orange    strawberry
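A minimal sketch of how a key could be mapped to one of these ranges, using a binary search over the split points. The names `SPLIT_KEYS`, `RANGE_NAMES`, and `range_for` are hypothetical. Treating each range as including its start key and excluding its end key is an assumption the slide does not state.

```python
import bisect

# Split points between the three ranges on the slide.
SPLIT_KEYS = ["lem", "pea"]
RANGE_NAMES = ["Ø-lem", "lem-pea", "pea-∞"]


def range_for(key):
    """Return the name of the range whose key span contains `key`.

    Assumption: a range includes its start key and excludes its end key,
    so the key "lem" itself belongs to "lem-pea".
    """
    return RANGE_NAMES[bisect.bisect_right(SPLIT_KEYS, key)]


for fruit in ["grape", "lemon", "orange", "peach"]:
    print(fruit, "->", range_for(fruit))
```

Since each range is contiguous, the lookup only needs the sorted list of split points, not the keys themselves.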

Distribution: Index
An index maps from key to range ID

shard index: Ø-lem  lem-pea  pea-∞

Ø-lem       lem-pea   pea-∞
apricot     lemon     peach
banana      lime      pear
blueberry   mango     pineapple
cherry      melon     raspberry
grape       orange    strawberry

Distribution: Split
Split when a range is too large (or too hot, or…)

shard index: Ø-lem  lem-pea  pea-str  str-∞

Ø-lem       lem-pea   pea-str     str-∞
apricot     lemon     peach       strawberry
banana      lime      pear        tamarillo
blueberry   mango     pineapple   tamarind
cherry      melon     raspberry
grape       orange

Replication: Survivability

● Each range is replicated to three or more nodes
● Consensus via Raft
● "Leaseholder" optimization to allow reads to be served without consensus
● Multi-Version Concurrency Control
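The link between replica count and survivability comes down to majority arithmetic: a Raft group makes progress as long as a majority of its replicas is reachable. A small sketch, with hypothetical helper names:

```python
def quorum(replicas):
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """How many replicas can be lost while a majority still remains."""
    return replicas - quorum(replicas)


for n in (3, 5, 7):
    print(f"{n} replicas: quorum {quorum(n)}, survives {tolerated_failures(n)} failure(s)")
```

With three replicas a range survives the loss of one node; five replicas are needed to survive two simultaneous failures.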

Data Distribution: Placement

Each range is replicated to three or more nodes.

[Diagram: Node 1, Node 2, and Node 3 each hold a replica of Range 1, Range 2, and Range 3.]

Data Distribution: Rebalancing

Adding a new (empty) node.
[Diagram: Nodes 1 to 3 hold the replicas of Ranges 1 to 3; Node 4 is empty.]

A new replica is allocated, data is copied.
[Diagram: Node 4 receives a copy of Range 3.]

The new replica is made live, replacing another.
[Diagram: Node 4 holds Range 3.]

The old (inactive) replica is deleted.
[Diagram: one of the original nodes no longer holds Range 3; Node 4 holds Range 3.]

Process continues until nodes are balanced.
[Diagram: Node 4 holds Range 2 and Range 3.]
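The steps above can be mimicked with a toy simulation. This is a deliberately simplified sketch, not CockroachDB's allocator: it balances only on replica count, and the `rebalance` function and the placement dictionary are hypothetical.

```python
def rebalance(placement):
    """Move one replica at a time from the fullest node to the emptiest
    node until replica counts differ by at most one.

    `placement` maps node ID -> set of range IDs held by that node.
    """
    nodes = {node: set(ranges) for node, ranges in placement.items()}
    while True:
        fullest = max(nodes, key=lambda n: len(nodes[n]))
        emptiest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[fullest]) - len(nodes[emptiest]) <= 1:
            return nodes
        # Pick a range the target does not hold yet, so a node never
        # ends up with two replicas of the same range.
        moved = sorted(nodes[fullest] - nodes[emptiest])[0]
        nodes[emptiest].add(moved)    # new replica allocated and made live
        nodes[fullest].remove(moved)  # old replica deleted


# Three full nodes plus a new, empty Node 4, as on the slides.
before = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {1, 2, 3}, 4: set()}
after = rebalance(before)
print(after)
```

Adding the new replica before removing the old one mirrors the order on the slides: the range never drops below its replication factor during a move.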

Data Distribution: Recovery

Losing a node causes recovery of its replicas.
[Diagram: Node 2 is marked as lost (X); Node 4 holds Range 2 and Range 3.]

A new replica gets created on an existing node.
[Diagram: Node 4 now also holds Range 1.]

Once at full replication, the old replicas are forgotten.
[Diagram: Node 2 is gone; Nodes 1, 3, and 4 remain, and Node 4 holds Range 1, Range 2, and Range 3.]
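Recovery can be sketched the same way: when a node is lost, each range it held is under-replicated, and a new replica is created on a surviving node that does not hold that range yet. The `recover` function, the replication factor of 3, and the starting placement are illustrative assumptions, not CockroachDB's actual algorithm.

```python
def recover(placement, dead_node, replication_factor=3):
    """Re-create the replicas that lived on `dead_node` on surviving nodes.

    `placement` maps node ID -> set of range IDs. For each range that lost
    a replica, the least-loaded surviving node without that range gets a copy.
    """
    nodes = {n: set(r) for n, r in placement.items() if n != dead_node}
    for rng in sorted(placement[dead_node]):
        holders = sum(1 for held in nodes.values() if rng in held)
        while holders < replication_factor:
            candidates = [n for n in nodes if rng not in nodes[n]]
            if not candidates:
                break  # too few surviving nodes to reach full replication
            target = min(candidates, key=lambda n: len(nodes[n]))
            nodes[target].add(rng)
            holders += 1
    return nodes


# A balanced four-node cluster (hypothetical layout), then Node 2 is lost.
cluster = {1: {1, 2}, 2: {1, 3}, 3: {1, 2, 3}, 4: {2, 3}}
recovered = recover(cluster, dead_node=2)
print(recovered)
```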

Some CockroachDB Features

Geographic Zone Configurations

● Control where your data is
● Nodes are tagged with attributes and hierarchical localities
● Rules target these
● Zero downtime data migrations
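How rules can target node localities is easy to picture with a toy model. Everything here is hypothetical: the node names, the `region` and `zone` tiers, and the `eligible_nodes` helper are made up for illustration and do not reflect CockroachDB's configuration syntax.

```python
# Each node is tagged with a hierarchical locality: an ordered list of
# (tier, value) pairs from the broadest tier to the narrowest.
NODES = {
    "node1": [("region", "us-east"), ("zone", "a")],
    "node2": [("region", "us-east"), ("zone", "b")],
    "node3": [("region", "eu-west"), ("zone", "a")],
}


def eligible_nodes(nodes, constraints):
    """Return the nodes whose locality satisfies every constraint.

    A constraint is a required (tier, value) pair, for example
    ("region", "us-east").
    """
    return sorted(
        name
        for name, locality in nodes.items()
        if all(constraint in locality for constraint in constraints)
    )


print(eligible_nodes(NODES, [("region", "us-east")]))
print(eligible_nodes(NODES, [("region", "us-east"), ("zone", "b")]))
```

Changing a rule changes the set of eligible nodes, and the rebalancing process described earlier can then move replicas to match.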

Geo-Partitioning

■ Domicile data according to customer
  ○ Meet regulatory constraints
  ○ Low-latency reads / writes
■ One logical database
  ○ Simplified app development

Distributed SQL

SELECT l_shipmode, AVG(l_extendedprice) FROM lineitem GROUP BY l_shipmode;
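One common way to distribute a query like this is two-stage aggregation: each node computes a partial sum and count per group over its local rows, and a final stage merges the partials. The sketch below illustrates that general technique only. The slide does not show CockroachDB's actual plan, and the sample rows are made up.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Stage 1, run on each node: per-group [sum, count] over local rows."""
    partial = defaultdict(lambda: [0.0, 0])
    for shipmode, price in rows:
        partial[shipmode][0] += price
        partial[shipmode][1] += 1
    return partial


def merge_and_finish(partials):
    """Stage 2: merge the per-node partials and compute AVG per group."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for shipmode, (price_sum, count) in partial.items():
            total[shipmode][0] += price_sum
            total[shipmode][1] += count
    return {mode: s / c for mode, (s, c) in total.items()}


# Hypothetical (l_shipmode, l_extendedprice) rows stored on two nodes.
node_a = [("AIR", 100.0), ("SHIP", 50.0)]
node_b = [("AIR", 300.0)]

result = merge_and_finish([partial_aggregate(node_a), partial_aggregate(node_b)])
print(result)
```

Averages cannot be merged directly, which is why each node ships a sum and a count rather than its local average.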

Online Schema Changes

• Based on Google's F1 Paper
• State machine, possibly with backfill
• Zero downtime
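The state machine can be sketched for the case of adding an index, using the intermediate states described in the F1 schema-change paper. The Python names below are hypothetical, and how these states map onto CockroachDB's internals is not shown on the slide.

```python
from enum import Enum


class State(Enum):
    ABSENT = "absent"
    DELETE_ONLY = "delete-only"
    WRITE_ONLY = "write-only"
    PUBLIC = "public"


# Operations that are applied to the new index in each state.
ALLOWED = {
    State.ABSENT: frozenset(),
    State.DELETE_ONLY: frozenset({"delete"}),
    State.WRITE_ONLY: frozenset({"delete", "insert"}),
    State.PUBLIC: frozenset({"delete", "insert", "read"}),
}

# Adding an index walks through the states one step at a time.
ADD_INDEX_PATH = [State.ABSENT, State.DELETE_ONLY, State.WRITE_ONLY, State.PUBLIC]


def next_state(state):
    """Return the state that follows `state` when adding an index."""
    position = ADD_INDEX_PATH.index(state)
    if position + 1 == len(ADD_INDEX_PATH):
        raise ValueError("index is already public")
    return ADD_INDEX_PATH[position + 1]


def needs_backfill(state):
    """Existing rows are backfilled before a write-only index goes public."""
    return state is State.WRITE_ONLY


state = State.ABSENT
while state is not State.PUBLIC:
    if needs_backfill(state):
        print("backfill existing rows")
    state = next_state(state)
    print("->", state.value)
```

Moving one state at a time means that nodes running adjacent schema versions never corrupt the index, which is what makes the change possible without downtime.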

Questions?
[email protected]
github.com/cockroachdb
www.cockroachlabs.com

Other Topics

• (New in 2.1) Query optimizer
• Testing with Jepsen
• Graphical admin UI
• Distributed import

Backup/Restore

• Distributed
• Consistent to a point in time
• Incremental
