Building a large Tarantool cluster with 100+ nodes

Yaroslav Dynnikov

Tarantool, Mail.Ru Group

10 October 2019

Slides: rosik.github.io/2019-bigdatadays

Tarantool = Database + Application server (Lua)

Database: transactions, WAL
Application server: business logic, HTTP

Core team: 20 C developers, product development
Solution team: 35 Lua developers, commercial projects

Common goal: make development fast and reliable

In-memory NoSQL

Not only in-memory: the vinyl disk engine
Supports SQL (since v2)

But: we need horizontal scaling.

Vshard - horizontal scaling in Tarantool

Vshard assigns data to virtual buckets
Buckets are distributed across servers

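
How a record lands in a bucket: the sharding key is hashed into a fixed range of bucket ids. A minimal illustrative sketch - the key format and the bucket_count value here are assumptions, not vshard defaults:

    local digest = require('digest')

    local bucket_count = 30000  -- fixed for the lifetime of the cluster

    -- Hash a sharding key into the range [1, bucket_count]
    local function key_to_bucket(key)
        return digest.crc32(tostring(key)) % bucket_count + 1
    end

    print(key_to_bucket('customer:42'))  -- same bucket for this key every time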

Vshard configuration

Lua tables:

    sharding_cfg = {
        ['cbf06940-0790-498b-948d-042b62cf3d29'] = {
            replicas = { ... },
        },
        ['ac522f65-aa94-4134-9f64-51ee384f1a54'] = {
            replicas = { ... },
        },
    }

    vshard.router.cfg(...)
    vshard.storage.cfg(...)
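
Once both sides are configured, requests travel through the router by bucket id. A rough usage sketch - the storage-side function customer_add is hypothetical and would have to be defined on the storages:

    local vshard = require('vshard')

    -- Compute the bucket for a sharding key (illustrative key)
    local bucket_id = vshard.router.bucket_id('customer:42')

    -- Route the write to whichever replicaset currently owns the bucket
    local ok, err = vshard.router.callrw(bucket_id, 'customer_add', {
        {customer_id = 42, bucket_id = bucket_id, name = 'Eve'},
    })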

Vshard automation. Options

Deployment scripts
Docker Compose
ZooKeeper

Our own orchestrator

Orchestrator requirements

Doesn't start/stop instances (systemd can do that)
Applies configuration only
Every cluster node can manage the others
Built-in monitoring

Clusterwide configuration

Must be the same everywhere
Applied with a two-phase commit

    topology:
      servers:  # two servers
        s1:
          replicaset_uuid: A
          uri: localhost:3301
        s2:
          replicaset_uuid: A
          uri: localhost:3302
      replicasets:  # one replicaset
        A:
          roles: ...
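
The two-phase commit can be pictured roughly as below. This is a minimal sketch, not the actual implementation: prepare_config, commit_config and abort_config are hypothetical remote functions exposed by every instance.

    local netbox = require('net.box')

    local function remote_call(uri, fn, args)
        local conn = netbox.connect(uri)
        return pcall(conn.call, conn, fn, args)
    end

    local function apply_2pc(uris, new_config)
        local prepared = {}

        -- Phase 1: every instance validates and stages the new config
        for _, uri in ipairs(uris) do
            local ok, err = remote_call(uri, 'prepare_config', {new_config})
            if not ok then
                -- Abort the instances that already prepared
                for _, prepared_uri in ipairs(prepared) do
                    remote_call(prepared_uri, 'abort_config', {})
                end
                return nil, err
            end
            table.insert(prepared, uri)
        end

        -- Phase 2: everyone validated, so commit everywhere
        for _, uri in ipairs(uris) do
            remote_call(uri, 'commit_config', {})
        end
        return true
    end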

Which came first?

Database: tcp_listen()
Orchestrator: apply_2pc()

Membership implementation

SWIM protocol - a member of the gossip protocol family
Dissemination speed: O(log N)
Network load: O(N)

Diagram: PING and ACK messages carry an extra payload.
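
The tarantool/membership module implements this protocol. A minimal sketch of joining the gossip pool - the addresses here are illustrative:

    local membership = require('membership')

    -- Start gossiping on this instance's advertised address
    membership.init('127.0.0.1', 33001)

    -- Probe one known peer; the two gossip pools then merge
    membership.probe_uri('127.0.0.1:33002')

    -- Eventually every instance knows the status of every other one
    for uri, member in pairs(membership.members()) do
        print(uri, member.status)  -- 'alive', 'suspect', 'dead', ...
    end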

Bootstrapping a new instance

1. New process starts
2. New process joins membership
3. Cluster checks the new process is alive
4. Cluster applies configuration
5. New process polls it from membership (sketched below)
6. New process bootstraps
7. Repeat N times (N = 100)
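
Step 5 might look roughly like the loop below: the new instance waits until some member's gossip payload carries a config for it. The payload key 'config' is an assumption for illustration, not the actual wire format.

    local membership = require('membership')
    local fiber = require('fiber')

    local function await_config()
        while true do
            for _, member in pairs(membership.members()) do
                -- Assumed convention: config is disseminated via the payload
                local config = member.payload and member.payload.config
                if config ~= nil then
                    return config  -- proceed to bootstrap with it
                end
            end
            fiber.sleep(0.1)
        end
    end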

Benefits so far

1. Orchestration works
2. Monitoring works
3. We can assign any role to an instance

Role management

A role exports lifecycle callbacks:

function init()
function validate_config()
function apply_config()
function stop()
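
In Cartridge a role is a plain Lua module exporting these callbacks. A minimal sketch of a custom role - the name and bodies are illustrative:

    -- myrole.lua
    local log = require('log')

    local function init(opts)
        -- Runs when the role starts; opts.is_master marks the replicaset leader
        if opts.is_master then
            log.info('myrole: starting as master')
        end
    end

    local function stop()
        -- Runs when the role is disabled on this instance
    end

    local function validate_config(conf_new, conf_old)
        -- Phase 1 of the 2pc: reject a bad config before anyone applies it
        return true
    end

    local function apply_config(conf, opts)
        -- Phase 2 of the 2pc: config is already validated cluster-wide
        return true
    end

    return {
        role_name = 'myrole',
        init = init,
        stop = stop,
        validate_config = validate_config,
        apply_config = apply_config,
    }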

Refactoring the bootstrap process

Assembling large clusters with 100+ instances is slow:
N two-phase commits are slow
N config polls are slow

Solution

Bootstrap all instances with a single 2pc
Re-implement the binary protocol and reuse the port

Links

tarantool.io
github.com/tarantool/tarantool
Telegram: @tarantool, @tarantool_news
Cartridge framework: github.com/tarantool/cartridge
Cartridge CLI: github.com/tarantool/cartridge-cli
Posts on Habr: habr.com/users/rosik/
This presentation: rosik.github.io/2019-bigdatadays

Questions?