Riak from the Inside

Justin Sheehy
Basho Technologies

Riak is a scalable, highly-available, distributed open-source key/value store...
...but that’s not what I’m here to tell you about.

Riak is a system built using Erlang/OTP for robustness, flexibility, and simplicity...
...so today, I’ll talk about how that helped us.

non-topics
Riak Core’s Implementation Details
-- essential distributed systems infrastructure
Riak Features
-- anti-entropy: hinted-handoff and read-repair
-- javascript map/reduce
-- storage subsystems: bitcask, innostore...

The Riak key/value stack:
client application
protobufs / http
riak_client
dynamo model FSMs
riak core
vnode master
k/v vnode
storage engine

It’s not only a “stack”, of course:
kv_sup
vnode_sup, vnode master, protobufs, http
k/v vnode, k/v vnode, k/v vnode
storage engine, storage engine, storage engine
(this represents a part of the k/v section of the supervisor tree)

[figure: three copies of the stack, side by side, one per node]
the nodes are connected with riak core using gossip, consistent hashing, etc

let’s start at the top

protobufs / http:
more than one way to access key/value data
interchangeable: use them both

http:
get/put is representation transfer
HTTP is a great transfer protocol
• ubiquity
• flexibility
• interoperability
This is why we wrote webmachine!

protobufs:
throughput matters!
protobufs are fast, simple, compact
• {packet,4}
• {active,once}
• socket owner handles both TCP packets and internal response messages

riak_client:
both the protobuf and HTTP entry points use the same interface
erlang native interface as general API
• parameterized module over
  • coordinator node
  • client instance id
• all access into Riak defined here

dynamo model FSMs:
simple interface, complex semantics
direct inspiration: Amazon’s Dynamo
• gen_fsm helps a lot here
• interactions with N other nodes
• multiple phases of interaction
• version vector resolution

digression: testing tricky FSMs is tricky
QuickCheck to the rescue!
• property-based / model-based tests
• bugs may only appear with unexpected combinations of events
• shrinking these combinations helps find minimal failure cases

riak core:
FSMs run anywhere, use everything
Riak Core: fundamental distribution
arbitrary number of storage nodes, each contributing to the whole

vnode master:
gen_server as a multiplexer and well-known entry point
one host, many virtual nodes
instantiate vnodes as needed

k/v vnode:
disposable, per-partition actor for access to local data
enable parallelism & fault-tolerance the Erlang way (per-process)
node-local k/v storage abstraction

storage engine:
like an Erlang “behaviour”
separating development of disk or other storage from distribution
steps forward w/o breaking code (innostore, bitcask, ...)
all storage systems look the same

from the bottom, it’s just a key/value store
from the top, it’s just a key/value store
it’s a distributed system at heart
carefully managed complexity...
allows simplicity at the edges

http://www.basho.com
follow twitter.com/basho/team
[email protected]
#riak on Freenode
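
A note on the {packet,4} bullet from the protobufs slide: in Erlang, this socket option puts a 4-byte big-endian length header in front of every message sent, then strips it on receive so the socket owner always gets whole messages. The sketch below shows the same framing in Python, only to make the wire format concrete. It is not Riak's code, and the function names frame and unframe are made up for this illustration.

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix payload with a 4-byte big-endian unsigned length header."""
    return struct.pack(">I", len(payload)) + payload

def unframe(buffer: bytes):
    """Split a byte stream into (complete_messages, leftover_bytes).

    The leftover holds a partial header or partial body that must wait
    for more data from the socket. (Hypothetical helper, not a Riak API.)
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from(">I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # body not fully received yet
        start = offset + 4
        messages.append(buffer[start:start + length])
        offset = start + length
    return messages, buffer[offset:]
```

A receiver would append each chunk read from the socket to the leftover bytes and call unframe again. With {packet,4}, the Erlang runtime does this reassembly for the process that owns the socket.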
