from the inside

Justin Sheehy Basho Technologies Riak is a scalable, highly-available, distributed open-source key/value store.

basho Riak is a scalable, highly-available, distributed open-source key/value store....

...but that’s not what I’m here to tell you about.

basho Riak is a system built using Erlang/OTP for robustness, flexibility, and simplicity.

basho Riak is a system built using Erlang/OTP for robustness, flexibility, and simplicity....

...so today, I’ll talk about how that helped us.

basho non-topics

Riak Core’s Implementation Details -- essential distributed systems infrastructure Riak Features -- anti-entropy: hinted-handoff and read-repair -- map/reduce -- storage subsystems: bitcask, innostore...

basho client application

protobufs http

riak_client

dynamo model FSMs

The Riak key/value stack: riak core

vnode master

k/v vnode

storage engine

basho It’s not only a “stack”, of course:

kv_sup

vnode_sup vnode master protobufs http

k/v vnode k/v vnode k/v vnode

storage engine storage engine storage engine

(this represents a part of the k/v section of the supervisor tree) basho client application

protobufs http

riak_client

dynamo model FSMs

The Riak key/value stack: riak core

vnode master

k/v vnode

storage engine

basho client application client application client application

protobufs http protobufs http protobufs http

riak_client riak_client riak_client

dynamo model FSMs dynamo model FSMs dynamo model FSMs

the nodes are connected with riak core using gossip, consistent hashing, etc

vnode master vnode master vnode master

k/v vnode k/v vnode k/v vnode

storage engine storage engine storage engine

basho let’s start at the top client application

protobufs http more than one way to access riak_client key/value data dynamo model FSMs

riak core

interchangeable: vnode master use them both k/v vnode

storage engine

basho get/put is representation transfer client application HTTP is a great transfer protocol protobufs http

riak_client

•ubiquity dynamo model FSMs •flexibility riak core •interoperability vnode master

This is why we wrote webmachine! k/v vnode

storage engine

basho throughput matters! client application protobufs are fast, simple, compact protobufs http

riak_client •{packet,4} dynamo model FSMs

•{active,once} riak core

vnode master •socket owner handles both TCP packets and k/v vnode internal response messages storage engine

basho both the protobuf and HTTP entry client application points use the same interface protobufs http

erlang native interface as general API riak_client

dynamo model FSMs •parameterized module over •coordinator node riak core •client instance id vnode master

k/v vnode •all access into Riak defined here storage engine

basho client application

protobufs http simple interface, complex semantics riak_client direct inspiration: Amazon’s Dynamo dynamo model FSMs

riak core •gen_fsm helps a lot here vnode master •interactions with N other nodes •multiple phases of interaction k/v vnode • version vector resolution storage engine

basho digression: testing tricky FSMs is tricky ( dynamo model FSMs )

QuickCheck to the rescue!

•property-based / model-based tests

•bugs may only appear with unexpected combinations of events

•shrinking these combinations helps find minimal failure cases basho client application

protobufs http simple interface, complex semantics riak_client direct inspiration: Amazon’s Dynamo dynamo model FSMs

riak core •gen_fsm helps a lot here vnode master •interactions with N other nodes •multiple phases of interaction k/v vnode • version vector resolution storage engine

basho client application

protobufs http

FSMs run anywhere, use everything riak_client

dynamo model FSMs

Riak Core: fundamental distribution riak core

vnode master

arbitrary number of storage nodes k/v vnode

each contributing to the whole storage engine

basho client application

protobufs http

gen_server as a multiplexer riak_client

and well-known entry point dynamo model FSMs

riak core

one host, many virtual nodes vnode master

k/v vnode

instantiate vnodes as needed storage engine

basho client application disposable, per-partition actor protobufs http for access to local data riak_client

enable parallelism & fault-tolerance dynamo model FSMs the Erlang way (per-process) riak core

vnode master

node-local k/v storage abstraction k/v vnode

storage engine

basho client application

like an Erlang “behaviour” protobufs http separating development of disk riak_client

or other storage from distribution dynamo model FSMs

riak core steps forward w/o breaking code vnode master (innostore, bitcask, ...) k/v vnode

all storage systems look the same storage engine basho client application

protobufs http

riak_client

dynamo model FSMs

it’s just a key/value store riak core

vnode master

k/v vnode

from the bottom storage engine basho from the top client application

protobufs http

riak_client

dynamo model FSMs

it’s just a key/value store riak core

vnode master

k/v vnode

storage engine

basho client application

protobufs http

riak_client

dynamo model FSMs

it’s a distributed system at heart riak core

vnode master

k/v vnode

storage engine

basho client application

protobufs http

riak_client

dynamo model FSMs

carefully managed complexity... riak core

vnode master

k/v vnode

storage engine

basho client application

protobufs http

riak_client

dynamo model FSMs

allows simplicity at the edges riak core

vnode master

k/v vnode

storage engine

basho http://www.basho.com follow twitter.com/basho/team [email protected] #riak on Freenode

basho