Google Megastore & Spanner The Problem

I have a great app idea, but I don’t want to learn much about databases. Can’t it just be easy and scalable and reliable? Because that’s impossible!

Consistency Availability

Scalability

RDBMS NoSQL

Highly Scalable Rapid Development

Google Low Latency Megastore

Consistent Highly Available , Google’s distributed KV store

“contents:” “anchor:cnnsicom” “anchor:my.look.ca”

t3 t4 “CNN” t9 “CNN.com” t8 t5 “...” Datacenters

Entity Groups Megastore 2, 14, 37.... 2, 14, 37.... 2, 14, 37....

BigTable 58,90,102.. 58,90,102.. 58,90,102..

GFS 700, 706.. 700, 706.. 700, 706.. Datacenters

Entity Groups

user1 user1 user1 @.com @gmail.com @gmail.com

user2 user2 user2 @gmail.com @gmail.com @gmail.com

user3 user3 user3 @gmail.com @gmail.com @gmail.com Paxos Basics - Read and Write Read, Phase 1 - Reading replica polls other replicas Read, Phase 2 - Available replicas respond with their values Write, Prepare - Writing replica asks other replicas to conduct vote 0

0 0

0 0

0

0 0 Write, Promise -- Available replicas promise to ignore lower proposals 0

0 0

0 0

0

0 0 Write, Accept - Writing replica proposes its value

0, 3

0, 3 0, 3

0, 3 0, 3

0, 3

0, 3 0, 3 Write, Commit - Available replicas accept the value Paxos Basics -- Write Conflicts Prepare - 2 writing replicas want to make proposals

0

0 0

1 1

1

1 1 Promise

0

0 0

1 1

1

1 1 More prepares - Lower numbered proposals get rejected / replaced 1

1 1

0

1 1

0 0 0 1

1 1 More Promises

1 1 1

1 1

1

1 1 Accept - Only 1 replica gets the majority of promises required 1, 3

1, 3 1, 3

1, 3 1, 3

1, 3

1, 3 1, 3 Commit How could Paxos be made more efficient? Local Reads via a “coordinator” Local replica up-to-date Local replica out-of-date Local replica out-of-date - normal Paxos read Local replica out-of-date - update replica and coordinator Faster Writes: Accept & Prepare in 1 communication

0, 3, b

0, 3, b 0, 3, b

0, 3, b 0, 3, b

0, 3, b

0, 3, b b 0, 3, b Writing replica requests to be proposal 0

8

b Immediate Accept & Prepare for next round

0, 8, c 0, 8, c 0, 8, c

0, 8, c 0, 8, c

0, 8, c 0, 8, c b

0, 8, c Megastore Performance Analysis Performance Analysis

Our experimental setup is that we’ve used Megastore for 100+ applications for several years and it works!

What are some drawbacks to this solution? Highly Scalable Rapid Development

Google Low Latency Megastore

Consistent Highly Available Highly Scalable Rapid Development

Google Spanner Low Latency

Consistent Highly Available 2016-02-10, 4:35:30 2016-02-10, 4:35:31 What time is it?

2016-02-10, 4:35:32

2016-02-10, 4:35:33

TT.now() → TTinterval:

Lower bound Absolute time Upper bound Improved Paxos: Serialization with Locking

timestamp _0 timestamp_0 timestamp _0

timestamp _0 timestamp _0

timestamp _0

timestamp _0

timestamp _0 Improved Paxos: Long Leader Leases

ts _0

ts_0 ts_0 ts_1

ts_1 ts_1

ts_0

ts_1

ts _0 ts _0 ts _0

ts_1 ts_1

ts_1 Spanner Performance Analysis Setup

Next steps? Supporting more complex SQL queries with an underlying key-value structure

ID, name, age, address key, value ... … … … … … … … … ... … … ...... The Future

SQL NoSQL

NewSQL