Tidb As an HTAP Database
Total Page:16
File Type:pdf, Size:1020Kb
TiDB as an HTAP Database 主讲人:PingCap CEO 刘奇 About me • CEO and co-founder of PingCAP • Open source hacker • Infrastructure engineer • Founder of TiDB/TiKV/Codis • Infrastructure software engineer • Wandoulabs/JD Why a new database? Brief History RDBMS NoSQL NewSQL ● Standalone RDBMS 1970s 2010 2015 Present ● NoSQL ● Middleware & Proxy Redis Google Spanner MySQL HBase Google F1 ● NewSQL PostgreSQL Cassandra TiDB Oracle MongoDB DB2 ... ... NewSQL Database ● Horizontal Scalability ● ACID Transaction ● High Availability ● SQL at Scale OLTP & OLAP OLTP OLAP ETL 8am 2pm 6pm 2am Database Where is my data? ERP Is the data out-of-date? Data Warehouse CRM Why two separate systems ● Huge data size ● Complex query logic ● Latency VS Throughput ● Point query VS Full range scan ● Transaction & lsolation level OLAP + OLTP = HTAP Hybrid Transactional / Analytical Processing ● ACID Transcation ● Real-time analysis ● SQL HTAP How do we build the new database What is TiDB ● Scalability as the first class feature ● SQL is necessary ● Compatible with MySQL, in most cases ● OLTP + OLAP = HTAP (Hybrid Transactional/Analytical Processing) ● 24/7 availability, even in case of datacenter outages ● Open source, of course Architecture Stateless SQL Layer Metadata / Timestamp request TiDB ... TiDB ... TiDB Placement Driver (PD) Raft Raft Raft TiKV ... TiKV TiKV TiKV Control flow: Balance / Failover Distributed Storage Layer TiKV - Overview • Region: a set of continuous key-value pairs • Data is organized/stored/replicated by Regions • Highly layered TiKV Key Space RPC (gRPC) Node A Transaction 256MB MVCC [ start_key, Raft end_key) RocksDB (-∞, +∞) Raft Raft Sorted Map Node B Node C Raft TiKV - Multi-Raft Multiple raft groups in the cluster, one group for each region. Client RPC RPC RPC RPC Store 1 Store 2 Store 3 Store 4 Region 1 Region 1 Region 2 Region 1 Region 2 Region 3 Region 2 Region 5 Region 5 Raft Region 5 Region 4 Region 3 Group Region 4 Region 4 Region 3 TiKV node 1 TiKV node 2 TiKV node 3 TiKV node 4 TiKV - Horizontal Scale Node B Region 1^ Region 1 Region 2 Region 3 Region 1* Node D Region 2 Region 2 Region 3 Region 3 Node C Node A Add Replica Three steps to move a leader replica ● Transfer Leader ● Add Replica Node E ● Remove Replica PD - Overview • Meta data management • Load balance management Route Info TiKV Client PD Node/Region Management Info Command TiKV Cluster TiKV TiKV TiKV … ... TiKV PD - TiKV Cluster Managment PD Node 1 HeartBeat Region A Cluster Info Region B Scheduling Command Movement Admin Scheduling Config Node 2 Stratege Region C TiDB - Overview • The stateless SQL layer Optimized SQL AST Logical Plan Logical Plan Statistics Cost Model Selected TiDB SQL Layer Physical Plan TiKV TiKV TiKV TiDB - Distributed SQL SELECT COUNT(c1) FROM t WHERE c1 > 10 AND c2 = ‘shanghai’; Physical Plan on TiDB Physical Plan on TiKV (index scan) Partial Aggregate Final Aggregate COUNT(c1) SUM(COUNT(c1)) Row COUNT(c1) Filter c2 = “shanghai” DistSQL Scan Row Read Row Data by RowID COUNT(c1) COUNT(c1) COUNT(c1) RowID TiKV Read Index TiKV idx1: (10, +∞) TiKV TiDB - Cost Based Optimizer • Predicate Pushdown • Column Pruning • Eager Aggregate • Convert Subquery to Join • Statistics framework • CBO Framework – Index Selection – Join Operator Selection • Hash join • Index lookup join • Sort-merge join – Stream Operators VS Hash Operators OLTP + OLAP TiDB TiKV Scheduler OLTP Jobs Query Job TiDB TiKV High Queue Priority TiDB TiKV Worker . Worker OLAP . Query TiDB Jobs . Worker Low TiDB Priority TiKV Worker Pool HTAP Ecosystem: Beyond TiDB and SQL Spark on TiDB PD PD TSO/Data location Data location PD PD Cluster Meta data Spark Driver TiDB Job Application TiKV TiKV TiDB DistSQL API DistSQL API Worker TiDB TiKV TiKV Syncer Worker TiDB TiKV TiKV Worker TiDB ... ... ... Spark Cluster TiDB Cluster TiKV Cluster (Storage) TiSpark Spark on TiDB • Spark ecosystem • TiKV Connector is better than JDBC connector • Index support • Complex Calculation Pushdown • CBO – Pick up right Access Path – Join Reorder • Priority & Isolation Level Future plan • Code Generation • MPP Engine • Mixed storage engine (Columnar / Row-based) • Heterogeneous computing (CPU/GPU/FPGA) THANKS.