Using Rust to Build a Distributed Transactional Key-Value Database LiuTang | [email protected] About me

● Chief Architect at PingCAP ● TiDB and TiKV ● Open source projects ○ LedisDB ○ go- ○ go-mysql-elasticsearch ○ rust-prometheus ○ ... Agenda

● Introduction ● Hierarchy ○ Storage ○ Raft ○ Transaction ○ RPC Framework ○ Monitor ○ Test ● Combine them all When we want to build a distributed transactional key-value database... Consistency

Performance Scalability

ACID Stability

HA Others… A High Building, A Low Foundation Language

Let’s start from scratch!!!

RocksDB

Immutable Memory Memory Table Table WAL

Flush Memory Disk Compaction SST Info Log Level 0

SST SST …... SST Manifest Level 1 Current SST SST …... SST Level 2 https://github.com/pingcap/rust-rocksdb

Raft

Client

State State State Machine Raft Machine Raft Machine Raft a = 1 Module a = 1 Module a = 1 Module b = 2 b = 2 b = 2

Log a = 1 b = 2 Log a = 1 b = 2 Log a = 1 b = 2 Multi-Raft

Key Space

A - B Raft Group Region 1 Region 1 Region 1

Raft Group Region 2 Region 2 Region 2 B - C

Raft Group Region 3 Region 3 Region 3 C - D Multi-Raft - Scalability

A B C D

Region 1 Region 1 Region 1

Region 2 Region 2 Region 2 Multi-Raft - Scalability

A B C D

Region 1 Region 1 Region 1

Region 2 Region 2 Region 2 Region 2

Raft ConfChange - AddNode Multi-Raft - Scalability

A B C D

Region 1 Region 1 Region 1

Region 2 Region 2 Region 2

Raft ConfChange - RemoveNode https://github.com/pingcap/raft-rs Transaction Transaction How to keep consistency crossing multi-Raft Groups? let mut txn = store.begin() let value1 = txn.get(region1_key) let value2 = txn.get(region2_key) // do something with value txn.set(region1_key, new_value1) txn.set(region2_key, new_value2) txn.commit() // or txn.rollback() Transaction

1. Inspired by Google Percolator 2. Optimized Two Phase Commit (2 PC) 3. Multiversion Concurrency Control (MVCC) 4. Snapshot Isolation 5. Optimistic Transaction gRPC

● Mode ○ Unary ○ Client streaming ○ Server streaming ○ Duplex streaming

● Using Futures to wrap the asynchronous C gRPC API

let f = unary(service, method, request); let resp = f.wait(); https://github.com/pingcap/grpc-rs

Prometheus

● Type ○ Counter ○ Gauge ○ Histogram lazy_static! { static ref HTTP_COUNTER: Counter = register_counter!( "http_request_total", "Total number of HTTP request." ).unwrap(); }

HTTP_COUNTER.inc(); https://github.com/pingcap/rust-prometheus Testing Testing - Failure Injection

// Ingest a failure fn function_foo() { fail_point!("foo"); }

// Run and Trigger the failure FAILPOINTS=foo=panic cargo run https://github.com/pingcap/fail-rs Architecture Architecture

Prometheus Client

gRPC

Txn API Txn API Txn API

MVCC MVCC MVCC

Raft Raft Raft

RocksDB RocksDB RocksDB https://github.com/pingcap/tikv Beyond TiKV A Distributed Relational Database TiDB

Applications

MySQL Drivers(e.g. JDBC)

MySQL Protocol

TiDB

RPC

TiKV A Distributed Analytical Database TiSpark

Spark Driver

Job

Spark Cluster Worker Worker Worker

RPC

TiKV Hybrid Transactional/Analytical Processing Database PD PD

TSO/Data location Data location

PD

PD Cluster

Meta data Spark Driver TiDB TiKV TiKV Job TiDB API API Worker TiDB TiKV TiKV Worker TiDB TiKV TiKV Worker TiDB ...... Spark Cluster

TiDB Cluster TiKV Cluster (Storage) TiSpark Thank you! https://github.com/pingcap/tidb https://github.com/pingcap/tikv

We are hiring… @China @Silicon Valley @Home