from the inside
Justin Sheehy Basho Technologies Riak is a scalable, highly-available, distributed open-source key/value store.
basho Riak is a scalable, highly-available, distributed open-source key/value store....
...but that’s not what I’m here to tell you about.
basho Riak is a system built using Erlang/OTP for robustness, flexibility, and simplicity.
basho Riak is a system built using Erlang/OTP for robustness, flexibility, and simplicity....
...so today, I’ll talk about how that helped us.
basho non-topics
Riak Core’s Implementation Details -- essential distributed systems infrastructure Riak Features -- anti-entropy: hinted-handoff and read-repair -- javascript map/reduce -- storage subsystems: bitcask, innostore...
basho client application
protobufs http
riak_client
dynamo model FSMs
The Riak key/value stack: riak core
vnode master
k/v vnode
storage engine
basho It’s not only a “stack”, of course:
kv_sup
vnode_sup vnode master protobufs http
k/v vnode k/v vnode k/v vnode
storage engine storage engine storage engine
(this represents a part of the k/v section of the supervisor tree) basho client application
protobufs http
riak_client
dynamo model FSMs
The Riak key/value stack: riak core
vnode master
k/v vnode
storage engine
basho client application client application client application
protobufs http protobufs http protobufs http
riak_client riak_client riak_client
dynamo model FSMs dynamo model FSMs dynamo model FSMs
the nodes are connected with riak core using gossip, consistent hashing, etc
vnode master vnode master vnode master
k/v vnode k/v vnode k/v vnode
storage engine storage engine storage engine
basho let’s start at the top client application
protobufs http more than one way to access riak_client key/value data dynamo model FSMs
riak core
interchangeable: vnode master use them both k/v vnode
storage engine
basho get/put is representation transfer client application HTTP is a great transfer protocol protobufs http
riak_client
•ubiquity dynamo model FSMs •flexibility riak core •interoperability vnode master
This is why we wrote webmachine! k/v vnode
storage engine
basho throughput matters! client application protobufs are fast, simple, compact protobufs http
riak_client •{packet,4} dynamo model FSMs
•{active,once} riak core
vnode master •socket owner handles both TCP packets and k/v vnode internal response messages storage engine
basho both the protobuf and HTTP entry client application points use the same interface protobufs http
erlang native interface as general API riak_client
dynamo model FSMs •parameterized module over •coordinator node riak core •client instance id vnode master
k/v vnode •all access into Riak defined here storage engine
basho client application
protobufs http simple interface, complex semantics riak_client direct inspiration: Amazon’s Dynamo dynamo model FSMs
riak core •gen_fsm helps a lot here vnode master •interactions with N other nodes •multiple phases of interaction k/v vnode • version vector resolution storage engine
basho digression: testing tricky FSMs is tricky ( dynamo model FSMs )
QuickCheck to the rescue!
•property-based / model-based tests
•bugs may only appear with unexpected combinations of events
•shrinking these combinations helps find minimal failure cases basho client application
protobufs http simple interface, complex semantics riak_client direct inspiration: Amazon’s Dynamo dynamo model FSMs
riak core •gen_fsm helps a lot here vnode master •interactions with N other nodes •multiple phases of interaction k/v vnode • version vector resolution storage engine
basho client application
protobufs http
FSMs run anywhere, use everything riak_client
dynamo model FSMs
Riak Core: fundamental distribution riak core
vnode master
arbitrary number of storage nodes k/v vnode
each contributing to the whole storage engine
basho client application
protobufs http
gen_server as a multiplexer riak_client
and well-known entry point dynamo model FSMs
riak core
one host, many virtual nodes vnode master
k/v vnode
instantiate vnodes as needed storage engine
basho client application disposable, per-partition actor protobufs http for access to local data riak_client
enable parallelism & fault-tolerance dynamo model FSMs the Erlang way (per-process) riak core
vnode master
node-local k/v storage abstraction k/v vnode
storage engine
basho client application
like an Erlang “behaviour” protobufs http separating development of disk riak_client
or other storage from distribution dynamo model FSMs
riak core steps forward w/o breaking code vnode master (innostore, bitcask, ...) k/v vnode
all storage systems look the same storage engine basho client application
protobufs http
riak_client
dynamo model FSMs
it’s just a key/value store riak core
vnode master
k/v vnode
from the bottom storage engine basho from the top client application
protobufs http
riak_client
dynamo model FSMs
it’s just a key/value store riak core
vnode master
k/v vnode
storage engine
basho client application
protobufs http
riak_client
dynamo model FSMs
it’s a distributed system at heart riak core
vnode master
k/v vnode
storage engine
basho client application
protobufs http
riak_client
dynamo model FSMs
carefully managed complexity... riak core
vnode master
k/v vnode
storage engine
basho client application
protobufs http
riak_client
dynamo model FSMs
allows simplicity at the edges riak core
vnode master
k/v vnode
storage engine
basho http://www.basho.com follow twitter.com/basho/team [email protected] #riak on Freenode
basho