TiDB as an HTAP Database
主讲人:PingCap CEO 刘奇 About me
• CEO and co-founder of PingCAP • Open source hacker • Infrastructure engineer • Founder of TiDB/TiKV/Codis • Infrastructure software engineer • Wandoulabs/JD Why a new database? Brief History
RDBMS NoSQL NewSQL
● Standalone RDBMS 1970s 2010 2015 Present ● NoSQL ● Middleware & Proxy Redis Google Spanner MySQL HBase Google F1 ● NewSQL PostgreSQL Cassandra TiDB Oracle MongoDB DB2 ...... NewSQL Database
● Horizontal Scalability ● ACID Transaction ● High Availability ● SQL at Scale OLTP & OLAP
OLTP OLAP
ETL 8am 2pm 6pm 2am
Database
Where is my data? ERP Is the data out-of-date? Data Warehouse CRM Why two separate systems
● Huge data size ● Complex query logic ● Latency VS Throughput ● Point query VS Full range scan ● Transaction & lsolation level OLAP + OLTP = HTAP
Hybrid Transactional / Analytical Processing ● ACID Transcation ● Real-time analysis ● SQL HTAP How do we build the new database What is TiDB
● Scalability as the first class feature ● SQL is necessary ● Compatible with MySQL, in most cases ● OLTP + OLAP = HTAP (Hybrid Transactional/Analytical Processing) ● 24/7 availability, even in case of datacenter outages ● Open source, of course Architecture
Stateless SQL Layer
Metadata / Timestamp request TiDB ... TiDB ... TiDB
Placement Driver (PD)
Raft Raft Raft TiKV ... TiKV TiKV TiKV Control flow: Balance / Failover
Distributed Storage Layer TiKV - Overview
• Region: a set of continuous key-value pairs • Data is organized/stored/replicated by Regions • Highly layered
TiKV Key Space RPC (gRPC)
Node A Transaction 256MB
MVCC [ start_key, Raft end_key) RocksDB (-∞, +∞) Raft Raft Sorted Map
Node B Node C Raft TiKV - Multi-Raft
Multiple raft groups in the cluster, one group for each region.
Client
RPC RPC RPC RPC
Store 1 Store 2 Store 3 Store 4
Region 1 Region 1 Region 2 Region 1
Region 2 Region 3 Region 2 Region 5
Region 5 Raft Region 5 Region 4 Region 3 Group Region 4 Region 4 Region 3
TiKV node 1 TiKV node 2 TiKV node 3 TiKV node 4 TiKV - Horizontal Scale
Node B
Region 1^ Region 1 Region 2 Region 3 Region 1* Node D
Region 2 Region 2 Region 3 Region 3 Node C Node A Add Replica Three steps to move a leader replica ● Transfer Leader ● Add Replica Node E ● Remove Replica PD - Overview
• Meta data management • Load balance management
Route Info TiKV Client PD
Node/Region Management Info Command
TiKV Cluster TiKV TiKV TiKV … ... TiKV PD - TiKV Cluster Managment
PD Node 1 HeartBeat Region A Cluster Info
Region B Scheduling Command Movement Admin
Scheduling Config Node 2 Stratege
Region C TiDB - Overview
• The stateless SQL layer
Optimized SQL AST Logical Plan Logical Plan
Statistics Cost Model
Selected TiDB SQL Layer Physical Plan
TiKV TiKV TiKV TiDB - Distributed SQL
SELECT COUNT(c1) FROM t WHERE c1 > 10 AND c2 = ‘shanghai’;
Physical Plan on TiDB Physical Plan on TiKV (index scan)
Partial Aggregate Final Aggregate COUNT(c1) SUM(COUNT(c1)) Row COUNT(c1) Filter c2 = “shanghai” DistSQL Scan Row
Read Row Data by RowID COUNT(c1) COUNT(c1) COUNT(c1) RowID TiKV Read Index TiKV idx1: (10, +∞) TiKV TiDB - Cost Based Optimizer
• Predicate Pushdown • Column Pruning • Eager Aggregate • Convert Subquery to Join • Statistics framework • CBO Framework – Index Selection – Join Operator Selection • Hash join • Index lookup join • Sort-merge join – Stream Operators VS Hash Operators OLTP + OLAP
TiDB TiKV Scheduler OLTP Jobs Query Job TiDB TiKV High Queue Priority TiDB TiKV Worker
. Worker OLAP . Query TiDB Jobs . . Worker Low TiDB Priority TiKV Worker Pool HTAP Ecosystem: Beyond TiDB and SQL Spark on TiDB
PD PD
TSO/Data location Data location
PD
PD Cluster
Meta data Spark Driver TiDB Job Application TiKV TiKV TiDB DistSQL API DistSQL API
Worker TiDB TiKV TiKV Syncer Worker TiDB TiKV TiKV Worker TiDB ...... Spark Cluster
TiDB Cluster TiKV Cluster (Storage) TiSpark Spark on TiDB
• Spark ecosystem • TiKV Connector is better than JDBC connector • Index support • Complex Calculation Pushdown • CBO – Pick up right Access Path – Join Reorder • Priority & Isolation Level Future plan
• Code Generation • MPP Engine • Mixed storage engine (Columnar / Row-based) • Heterogeneous computing (CPU/GPU/FPGA) THANKS