MySQL Cluster
Monty Taylor MySQL Inc
University of São Paulo February 2007
Copyright 2006 MySQL AB 1 WWhoho aamm II??
• MMononttyy TayloTaylorr • SSeneniioror ConsConsuultltanantt • WorWorkikingng fforor MMyySSQQLL IncInc.. • LLiivivingng iinn SSeeatatttle,le, WasWashhiingtngtonon • ThTheaeattrree DDiirrececttoror,, LLiighghttiingng DDesesiignergner aandnd PPhothotogrographapherer
Copyright 2006 MySQL AB 2 Quick Introduction to MySQL
Copyright 2006 MySQL AB 3 Quick Introduction to MySQL
SQL Relational Database Management System
Copyright 2006 MySQL AB 4 Quick Introduction to MySQL
Aiming for full standards compliance (SQL 2003)
Copyright 2006 MySQL AB 5 Quick Introduction to MySQL
Reputation for: ✔Speed ✔Reliability ✔Ease of Use
Copyright 2006 MySQL AB 6 Quick Introduction to MySQL
Cluster appeared in 4.1
Copyright 2006 MySQL AB 7 Quick Introduction to MySQL
5.0 is GA Recommended release for Production Many speed improvements for Cluster
Copyright 2006 MySQL AB 8 Quick Introduction to MySQL
5.1 is Beta Many features for Cluster (also speed improvements)
Copyright 2006 MySQL AB 9 What to Expect Today
Copyright 2006 MySQL AB 10 What to expect today
• Learn about the architecture of MySQL Cluster • Current and future features • What problems MySQL Cluster is currently good at solving – and those that it isn't • Some discussion of internals
Copyright 2006 MySQL AB 11 Storage Engines
Copyright 2006 MySQL AB 12 Storage Engines
Unique feature of MySQL
Copyright 2006 MySQL AB 13 No one best way to store tables
Copyright 2006 MySQL AB 14 Choice of Storage Engines
Copyright 2006 MySQL AB 15 Different engine per table (if you want)
Copyright 2006 MySQL AB 16 Just like a Virtual File System Layer
Application Application Application Application
open() read() write() sync()
Kernel
VFS ext3 reiserfs XFS vfat
Copyright 2006 MySQL AB 17 Just like a Virtual File System Layer
Application Application Application Application
INSERT UPDATE DELETE CREATE
MySQL Server
Storage Engine API MyISAM InnoDB NDB Falcon
Cluster
Copyright 2006 MySQL AB 18 What is MySQL Cluster?
Copyright 2006 MySQL AB 19 What is MySQL Cluster?
A High Availability
Copyright 2006 MySQL AB 20 What is MySQL Cluster?
A High Availability High Performance
Copyright 2006 MySQL AB 21 What is MySQL Cluster?
A High Availability High Performance In Memory (4.1 and 5.0)
Copyright 2006 MySQL AB 22 What is MySQL Cluster?
A High Availability High Performance In Memory (4.1 and 5.0) Shared Nothing
Copyright 2006 MySQL AB 23 What is MySQL Cluster?
A High Availability High Performance In Memory (4.1 and 5.0) Shared Nothing Clustered
Copyright 2006 MySQL AB 24 What is MySQL Cluster?
A High Availability High Performance In Memory (4.1 and 5.0) Shared Nothing Clustered Storage Engine
Copyright 2006 MySQL AB 25 What is MySQL Cluster?
A High Availability High Performance In Memory (4.1 and 5.0) Shared Nothing Clustered Storage Engine
Copyright 2006 MySQL AB 26 What is MySQL Cluster?
High Availability
Copyright 2006 MySQL AB 27 High Availability
Designed for Five Nines (99.999%) Uptime
Copyright 2006 MySQL AB 28 High Availability
Sub-Second Failover
Copyright 2006 MySQL AB 29 High Availability
Sub-Second Failover
mysqld mysqld
Transactions
Data Nodes
Copyright 2006 MySQL AB 30 High Availability
Sub-Second Failover
mysqld mysqld
Transactions
Data Nodes
Copyright 2006 MySQL AB 31 High Availability
Sub-Second Failover
mysqld mysqld
Transactions
Data Nodes
Copyright 2006 MySQL AB 32 Failure Detection: Heartbeats, lost connections
Node number is an internal Nodes organized number signifying the age Data in a logical circle of the node. Node 1
Data Data Node 4 Node 2
All nodes must Data Heartbeat messages have the same view Node 3 sent to next node in circle of which nodes are alive
Copyright 2006 MySQL AB 33 Heartbeat Handling of MySQL Servers
MySQL Servers (mysqld)
Node Y1 Node Y2 Node Y3 Node Y4 (mysqld) (mysqld) (mysqld) (mysqld)
API_REGREQ API_REGCONF(HB_TIME)
Node X (ndbd)
Copyright 2006 MySQL AB 34 Time to Handle Node Failure
Detection Reorganization Handling transactions Time Time that lost coordinator
Max 4 * Heartbeat A matter of time Timer (default 1.5 microseconds Seconds) sending two rounds Almost instanteous of broadcast m essages if node failure discovered by OS.
Copyright 2006 MySQL AB 35 High Availability
“Hot” (Online) Backup
Copyright 2006 MySQL AB 36 High Availability
Hot (Online) Backup
mysqld mysqld
Transactions
Data Nodes
Copyright 2006 MySQL AB 37 High Availability
Hot (Online) Backup
mysqld mysqld
Transactions
Backup Backup
Data Nodes
Copyright 2006 MySQL AB 38 High Availability
Configurable Amount of Redundancy
Copyright 2006 MySQL AB 39 High Availability
Configurable Amount of Redundancy
NoOfReplicas
Copyright 2006 MySQL AB 40 High Availability
NoOfReplicas=1
D a t a
Copyright 2006 MySQL AB 41 High Availability
NoOfReplicas=2
Da ta Da ta
Copyright 2006 MySQL AB 42 High Availability
NoOfReplicas=3
Da Da ta Da ta ta
Copyright 2006 MySQL AB 43 High Availability
NoOfReplicas=4
Data Data Data Data
Copyright 2006 MySQL AB 44 What is MySQL Cluster?
A High Availability High Performance In Memory (4.1 and 5.0) Shared Nothing Clustered Storage Engine
Copyright 2006 MySQL AB 45 High Performance
High Performance
Copyright 2006 MySQL AB 46 High Performance
Not BEGIN to COMMIT
Copyright 2006 MySQL AB 47 High Performance
Not BEGIN to COMMIT Through Parallelism
Copyright 2006 MySQL AB 48 High Performance
Parallelism
mysqld mysqld
Transactions
Data Nodes
Copyright 2006 MySQL AB 49 What is MySQL Cluster?
A High Availability High Performance In Memory (4.1 and 5.0) Shared Nothing Clustered Storage Engine
Copyright 2006 MySQL AB 50 What is MySQL Cluster?
In Memory (4.1 and 5.0)
Copyright 2006 MySQL AB 51 In Memory (4.1 and 5.0)
Data and Indexes kept in main memory
Size of Database×No Of Replicas×1.1 Per Node= No Of Data Nodes
Copyright 2006 MySQL AB 52 In Memory (4.1 and 5.0)
Check point to disk
Copyright 2006 MySQL AB 53 In Memory (4.1 and 5.0)
Check point to disk
Frequent, Configurable
Copyright 2006 MySQL AB 54 In Memory (4.1 and 5.0)
Check point to disk
Not complete data loss after power outage
Copyright 2006 MySQL AB 55 Flushing the log to disk
REDO log
T1 commits T2 commits GCP T3 commits Time
A global checkpoint GCP flushes REDO log to disk.
Transactions T1 and T2 become disk persistent.
(Also called group commit.)
Copyright 2006 MySQL AB 56 Log and Local Checkpoints
The REDO log is written to disk.
LCP T1 commits T2 commits T3 commits Time
A local checkpoint LCP Since MySQL Cluster replicate all data, the REDO log saves an image of the and LCPs are only used to recover from whole system database on disk. failures. This enables cutting the tail of the REDO log. Transactions T1, T2 and T3 are main memory persistent.
Copyright 2006 MySQL AB 57 Checkpointing takes time
UNDO log REDO log
T1 commits T2 commits T3 commits Time LCP LCP start stop
UNDO log makes LCP image consistent with database at LCP start time.
Copyright 2006 MySQL AB 58 Restoring a Fragment at System Restart
1) Load Data Pages from local checkpoint into memory
Start Execute UNDO log End of UNDO log UNDO log
End Global Fuzzy point of Checkpoint start of Local Log Record Checkpoint
Execute REDO log
Copyright 2006 MySQL AB 59 In Memory (4.1 and 5.0)
Data on Disk in 5.1
Copyright 2006 MySQL AB 60 In Memory (4.1 and 5.0)
Data on Disk in 5.1 Indexed fields and Indexes in main memory
Copyright 2006 MySQL AB 61 What is MySQL Cluster?
A High Availability High Performance In Memory (4.1 and 5.0) Shared Nothing Clustered Storage Engine
Copyright 2006 MySQL AB 62 What is MySQL Cluster?
Shared Nothing
Copyright 2006 MySQL AB 63 Shared Nothing
Commodity PCs
Copyright 2006 MySQL AB 64 Shared Nothing
Commodity Interconnects
Copyright 2006 MySQL AB 65 Shared Nothing
Commodity Interconnects Ethernet
Copyright 2006 MySQL AB 66 Shared Nothing
Commodity Interconnects Ethernet SCI
Copyright 2006 MySQL AB 67 Shared Nothing
No Expensive Shared Disk
Copyright 2006 MySQL AB 68 Shared Nothing
No Expensive Shared Disk (so no single point of failure there)
Copyright 2006 MySQL AB 69 What is MySQL Cluster?
A High Availability High Performance In Memory (4.1 and 5.0) Shared Nothing Clustered Storage Engine
Copyright 2006 MySQL AB 70 What is MySQL Cluster?
Clustered
Copyright 2006 MySQL AB 71 What is MySQL Cluster?
Clustered
Copyright 2006 MySQL AB 72 What is MySQL Cluster?
A High Availability High Performance In Memory (4.1 and 5.0) Shared Nothing Clustered Storage Engine
Copyright 2006 MySQL AB 73 What is MySQL Cluster?
Storage Engine
Copyright 2006 MySQL AB 74 What is MySQL Cluster?
ENGINE=NDBCLUST ER
Copyright 2006 MySQL AB 75 What is MySQL Cluster?
A High Availability High Performance In Memory (4.1 and 5.0) Shared Nothing Clustered Storage Engine
Copyright 2006 MySQL AB 76 Node Types
Copyright 2006 MySQL AB 77 Node Types
Data Nodes (ndbd)
Copyright 2006 MySQL AB 78 Data Nodes
Copyright 2006 MySQL AB 79 Data Nodes
Data nodes (running ndbd)
Copyright 2006 MySQL AB 80 Data Nodes
This cloud means a cluster
Data nodes (running ndbd)
Copyright 2006 MySQL AB 81 Data Nodes grouped
Copyright 2006 MySQL AB 82 Data Nodes grouped into nodegroups
Nodegroup 0 Nodegroup 1
Copyright 2006 MySQL AB 83 NoOfReplicas is number of nodes in node group
NoOfReplicas=2
Nodegroup 0 Nodegroup 1
Copyright 2006 MySQL AB 84 NoOfReplicas is number of nodes in node group
DDAA TTAA NoOfReplicas=2
DDAA TTAA
Nodegroup 0 Nodegroup 1
Copyright 2006 MySQL AB 85 Data Nodes
Nodegroup 0 Nodegroup 1
Copyright 2006 MySQL AB 86 Data Nodes
pk
Nodegroup 0 Nodegroup 1
Copyright 2006 MySQL AB 87 HASH(pk) pk
Copyright 2006 MySQL AB 88 HASH(pk) pk
Copyright 2006 MySQL AB 89 HASH(pk) pk
Copyright 2006 MySQL AB 90 Perception Reality
HASH(pk) pk
Copyright 2006 MySQL AB 91 pk
Copyright 2006 MySQL AB 92 MySQL Servers talk to the Data Nodes
mysqld
Nodegroup 0 Nodegroup 1
Copyright 2006 MySQL AB 93 One used as a Transaction Coordinator
mysqld
One storage node as Transaction Coordinator (TC)
Nodegroup 0 Nodegroup 1
Copyright 2006 MySQL AB 94 One used as a Transaction Coordinator
mysqld
One storage node as Transaction Coordinator (TC)
Nodegroup 0 Nodegroup 1
Copyright 2006 MySQL AB 95 Variant of 2-phase commit protocol
Example showing transaction TC with two operations using three replicas Prepare
Committed/ Primary Completed Primary
Prepare Commit/ Complete Backup Backup
Prepare Commit/ Complete Backup Backup Prepared Commit/ Complete 1. Prepare F1 2. Commit F1 1. Prepare F2 2. Commit F2 3. Complete F1 3. Complete F2
Copyright 2006 MySQL AB 96 Node Types
Data Nodes (ndbd) Up to 48 in one cluster
Copyright 2006 MySQL AB 97 Node Types
Management Server (ndb_mgmd)
Copyright 2006 MySQL AB 98 Management Server config.ini
[ndbd default] NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster mysqld [ndbd] mysqld HostName= 192.168.0.40 ndb_mgmd
[ndbd] HostName= 192.168.0.41
[ndb_mgmd] DataDir= /usr/local/mysql/cluster HostName= 192.168.0.42
[mysqld] [mysqld] ndbd ndbd [mysqld]
Copyright 2006 MySQL AB 99 Management Server
config.ini
[ndbd default]
NoOfReplicas= 2
DataMemory= 400M
IndexMemory= 32M DataDir= /usr/local/mysql/cluster mysqld [ndbd]
HostName= 192.168.0.40 mysqld ndb_mgmd [ndbd]
HostName= 192.168.0.41
[ndb_mgmd]
DataDir= /usr/local/mysql/cluster
HostName= 192.168.0.42
[mysqld]
[mysqld]
[mysqld]
ndbd ndbd
Copyright 2006 MySQL AB 100 Management Server
config.ini mysqld mysqld ndb_mgmd
ndbd ndbd
Copyright 2006 MySQL AB 101 Management Server
config.ini mysqld mysqld ndb_mgmd
ndbd ndbd
Copyright 2006 MySQL AB 102 Management Server
config.ini mysqld mysqld ndb_mgmd
ndbd ndbd
Copyright 2006 MySQL AB 103 Node Types
Management Server (ndb_mgmd) also involved in arbitration, starting backups, issuing commands to nodes (start, stop, restart)
Copyright 2006 MySQL AB 104 Node Types
SQL Nodes (mysqld) sometimes called API nodes
Copyright 2006 MySQL AB 105 SQL Nodes
mysqld mysqld mysqld
Copyright 2006 MySQL AB 106 SQL and API Nodes
mysqld mysqld mysqld
NDB API NDB API
Copyright 2006 MySQL AB 107 DELETE INSERT UPDATE mysqld mysqld mysqld
NDB API NDB API
Copyright 2006 MySQL AB 108 DELETE INSERT UPDATE mysqld mysqld mysqld
NDB API NDB API
update() update()
Copyright 2006 MySQL AB 109 Node Types
SQL Nodes (mysqld) Accessed like any other MySQL Server
Copyright 2006 MySQL AB 110 Node Types
API Nodes Talk NDB API directly to the Data Nodes
Copyright 2006 MySQL AB 111 The full stack
Copyright 2006 MySQL AB 112 Physical Requirements
Copyright 2006 MySQL AB 113 Physical Requirements
A node is a process, not a computer
Copyright 2006 MySQL AB 114 Physical Requirements
At least three physical machines for High Availability
Copyright 2006 MySQL AB 115 Three machines minimum for HA
A B
Copyright 2006 MySQL AB 116 Three machines minimum for HA
A B
Can no longer see A
Copyright 2006 MySQL AB 117 Three machines minimum for HA
A B
Can no longer see A Did A Die?
Copyright 2006 MySQL AB 118 Three machines minimum for HA
A B
Or did the network Can no longer see A link between A and B die?
Copyright 2006 MySQL AB 119 Three machines minimum for HA
A B
Can no longer see B Or did the network Can no longer see A link between A and B die?
Copyright 2006 MySQL AB 120 Three machines minimum for HA Who is in charge now?
A B
Can no longer see B Or did the network Can no longer see A link between A and B die?
Copyright 2006 MySQL AB 121 Three machines minimum for HA Split Brain = Badness
A B
Can no longer see B Or did the network Can no longer see A link between A and B die?
Copyright 2006 MySQL AB 122 Three machines minimum for HA Split Brain = Badness
We detect possible split brain scenarios.
Nodes will shut down instead
Copyright 2006 MySQL AB 123 Three machines minimum for HA C
A B
Management server on 3rd machine
Copyright 2006 MySQL AB 124 Three machines minimum for HA C
A B
Management server on 3rd machine Is TThhee DDeecciiddeerr
Copyright 2006 MySQL AB 125 Three machines minimum for HA C
A B
Management server on 3rd machine Is TThhee DDeecciiddeerr
Copyright 2006 MySQL AB 126 Three machines minimum for HA C
A B
Management server on 3rd machine Is TThhee DDeecciiddeerr
Copyright 2006 MySQL AB 127 Physical Requirements
Management Server: Not CPU intensive
Copyright 2006 MySQL AB 128 Physical Requirements • Data Nodes –Lots of Memory –Disk I/O Can be calculated –(generally) not CPU bound –ndbd is single threaded •multiple nodes for SMP
Copyright 2006 MySQL AB 129 Physical Requirements
SQL Node: Many API/SQL nodes are needed to load Storage Nodes
Copyright 2006 MySQL AB 130 Physical Requirements
SQL Node: MySQL is multi-threaded – takes advantage of SMP transparently
Copyright 2006 MySQL AB 131 A Configuration
[ndbd default] NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster
[ndbd] HostName= 192.168.0.40
[ndbd] HostName= 192.168.0.41
[ndb_mgmd] DataDir= /usr/local/mysql/cluster HostName= 192.168.0.42
[mysqld] [mysqld] [mysqld]
Copyright 2006 MySQL AB 132 A Configuration
[ndbd default] NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster
[ndbd] HostName= 192.168.0.40 ndb_mgmd
[ndbd] HostName= 192.168.0.41
[ndb_mgmd] DataDir= /usr/local/mysql/cluster HostName= 192.168.0.42
[mysqld] [mysqld] [mysqld]
Copyright 2006 MySQL AB 133 A Configuration
[ndbd default] NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster
[ndbd] HostName= 192.168.0.40 ndb_mgmd
[ndbd] HostName= 192.168.0.41
[ndb_mgmd] DataDir= /usr/local/mysql/cluster HostName= 192.168.0.42
[mysqld] [mysqld] ndbd ndbd [mysqld]
Copyright 2006 MySQL AB 134 A Configuration
[ndbd default] NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster mysqld [ndbd] mysqld HostName= 192.168.0.40 ndb_mgmd
[ndbd] HostName= 192.168.0.41
[ndb_mgmd] DataDir= /usr/local/mysql/cluster HostName= 192.168.0.42
[mysqld] [mysqld] ndbd ndbd [mysqld]
Copyright 2006 MySQL AB 135 A Configuration
[ndbd default] NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster mysqld [ndbd] mysqld HostName= 192.168.0.40 ndb_mgmd
[ndbd] HostName= 192.168.0.41
[ndb_mgmd] DataDir= /usr/local/mysql/cluster HostName= 192.168.0.42
[mysqld] [mysqld] ndbd ndbd [mysqld]
Copyright 2006 MySQL AB 136 A Configuration
[ndbd default] NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster mysqld [ndbd] mysqld HostName= 192.168.0.40 ndb_mgmd
[ndbd] HostName= 192.168.0.41
[ndb_mgmd] DataDir= /usr/local/mysql/cluster HostName= 192.168.0.42
[mysqld] [mysqld] ndbd ndbd [mysqld]
Copyright 2006 MySQL AB 137 A Configuration
[ndbd default] applications NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster mysqld [ndbd] mysqld HostName= 192.168.0.40 ndb_mgmd
[ndbd] HostName= 192.168.0.41
[ndb_mgmd] DataDir= /usr/local/mysql/cluster HostName= 192.168.0.42
[mysqld] [mysqld] ndbd ndbd [mysqld]
Copyright 2006 MySQL AB 138 Using the Cluster
• MySQL Cluster is a storage engine – tables can be stored in the cluster simply by ENGINE=NDBCLUSTER • No Foreign Key constraints • Supports 5.0 features: – Views – Stored Procedures – Triggers – standard permissions – these need to be set up on each SQL node • as the mysql.* tables aren't stored as NDB tables
Copyright 2006 MySQL AB 139 Failure Scenarios
• MySQL Server Node – Can be restarted and reconnected to the cluster. – Applications can connect to other MySQL Server nodes • Data Node – All other storage nodes are informed about failure – Data is replicated, so there is another storage node to service transaction requests – Currently, transactions using a failed node are aborted, and have to be restarted by the application • Management Server Node – Continued operation not dependent on management server – may be restarted, or have redundant nodes
Copyright 2006 MySQL AB 140 Schema Considerations
• All tables have a primary key – if no primary key is explicitly set, NDBCLUSTER creates a hidden field • Three types of indexes – Primary Hash Index – Unique Hash Index – Ordered Tree Index • Creating a UNIQUE index from SQL will also create an ordered index unless USING HASH is specified – requires knowing how you application will query your DB • Primary key is primary hash index and ordered index – unless USING HASH is specified
Copyright 2006 MySQL AB 141 Optimizations
• An improvement in 5.0 is the engine_condition_pushdown option – SELECT A from T where unindexed_field = 42; – The condition is evaluated in the data nodes, not in the SQL node – saving a full table scan • which can be rather expensive on large tables – Common to see 5-10x speed improvement – details in EXPLAIN output • In 5.0, we integrate with the query cache • Batched lookups – SELECT * from t1 where primary_key IN (1,2,3,4,5,6,7,8,9,10); – all 10 key lookups are sent in one batch rather than one at a time – 2-3 times performance gain
Copyright 2006 MySQL AB 142 Condition Pushdown
• Conditions on non-key/index attributes are pushed down to data nodes whenever possible. select * from t2,t3 where t2.attr2 = t3.attr2 and t2.attr1 < 1 and t3.attr1 < 5; Join query is executed as two nested scans on table t2 and t3.The scan on t2 will push the condition t2.attr1 < 1 and the scan on t3 will push the condition t3.attr1 < 5 • Can be a arbitrary nesting of AND/OR/NOT and support the following operators=,=!,<,<=,>,>=,IS NULL, IS NOT NULL, LIKE, and NOT LIKE. • Is enabled|disabled by set engine_condition_pushdown=on|off;
Copyright 2006 MySQL AB 143 New in 5.1
Copyright 2006 MySQL AB 144 Variable Sized Rows
Copyright 2006 MySQL AB 145 4.1 and 5.0
1 2 hello int int VARCHAR
Copyright 2006 MySQL AB 146 4.1 and 5.0
1 2 hello and how are you today? int int VARCHAR
Copyright 2006 MySQL AB 147 4.1 and 5.0
1 2 hello and how are you today? very well. int int VARCHAR
Copyright 2006 MySQL AB 148 4.1 and 5.0
1 2 hello int int VARCHAR
Copyright 2006 MySQL AB 149 Wasted Space
1 2 hello int int VARCHAR
Wasted Space
Copyright 2006 MySQL AB 150 5.1
1 2 hello int int VARCHAR
Copyright 2006 MySQL AB 151 The Saving
Copyright 2006 MySQL AB 152 The Saving
Copyright 2006 MySQL AB 153 Variable Sized Rows
Copyright 2006 MySQL AB 154 More rows per gigabyte
Copyright 2006 MySQL AB 155 Larger datasets in Cluster
Copyright 2006 MySQL AB 156 Copyright 2006 MySQL AB 157 On line Add/Drop Index
Copyright 2006 MySQL AB 158 ADD INDEX (4.1, 5.0)
t1
Index Index
Rows
Copyright 2006 MySQL AB 159 ADD INDEX (4.1, 5.0)
t1 temp table Index Index Index Index Index
Rows Rows
Copyright 2006 MySQL AB 160 ADD INDEX (4.1, 5.0)
t1 temp table Index Index Index Index Index
Rows Rows
Copyright 2006 MySQL AB 161 ADD INDEX (4.1, 5.0)
t1 temp table Index Index Index Index Index
Rows Rows
Copyright 2006 MySQL AB 162 ADD INDEX (4.1, 5.0)
t1 temp table Index Index Index Index Index
Rows Rows
Copyright 2006 MySQL AB 163 ADD INDEX (4.1, 5.0)
t1 temp table Index Index Index Index Index
Rows Rows
Copyright 2006 MySQL AB 164 ADD INDEX (4.1, 5.0)
t1 temp table Index Index Index Index Index
Rows Rows
Copyright 2006 MySQL AB 165 ADD INDEX (4.1, 5.0)
t1 temp table Index Index Index Index Index
Rows Rows
Copyright 2006 MySQL AB 166 ADD INDEX (4.1, 5.0)
temp table Index Index Index
Rows
Copyright 2006 MySQL AB 167 ADD INDEX (4.1, 5.0)
t1 Index Index Index
Rows
Copyright 2006 MySQL AB 168 ADD INDEX in 5.1
t1
Index Index
Rows
Copyright 2006 MySQL AB 169 ADD INDEX in 5.1
t1 Index Index Index
Rows
Copyright 2006 MySQL AB 170 ADD INDEX in 5.1
t1 Index Index Index
Rows
Copyright 2006 MySQL AB 171 ADD INDEX in 5.1
t1 Index Index Index
Rows
Copyright 2006 MySQL AB 172 ADD INDEX in 5.1
t1 Index Index Index
Rows
Copyright 2006 MySQL AB 173 ADD INDEX in 5.1
t1 Index Index Index
Rows
Copyright 2006 MySQL AB 174 ADD INDEX in 5.1
t1 Index Index Index
Rows
Copyright 2006 MySQL AB 175 ADD INDEX in 5.1
t1 Index Index Index
Rows
Copyright 2006 MySQL AB 176 DROP INDEX in 5.1
t1 Index Index Index
Rows
Copyright 2006 MySQL AB 177 DROP INDEX in 5.1
t1
Index Index
Rows
Copyright 2006 MySQL AB 178 How much faster?
Copyright 2006 MySQL AB 179 Online Add Index
• Before (copy the table): mysql> create index b on t1(b); Query OK, 1356 rows affected (2.20 sec) Records: 1356 Duplicates: 0 Warnings: 0
mysql> drop index b on t1; Query OK, 1356 rows affected (2.03 sec) Records: 1356 Duplicates: 0 Warnings: 0
Copyright 2006 MySQL AB 180 Online Add Index
• Before (copy the table): mysql> create index b on t1(b); Query OK, 1356 rows affected (2.20 sec) Records: 1356 Duplicates: 0 Warnings: 0
mysql> drop index b on t1; Query OK, 1356 rows affected (2.03 sec) Records: 1356 Duplicates: 0 Warnings: 0 • Now (just add/drop an index): mysql> create index b on t1(b); Query OK, 0 rows affected (0.58 sec) Records: 0 Duplicates: 0 Warnings: 0
mysql> drop index b on t1; Query OK, 0 rows affected (0.46 sec) Records: 0 Duplicates: 0 Warnings: 0
Copyright 2006 MySQL AB 181 On line Add/Drop Index
Copyright 2006 MySQL AB 182 Faster Add/Drop Index
Copyright 2006 MySQL AB 183 Less memory requirements
Copyright 2006 MySQL AB 184 A more adaptive Cluster
Copyright 2006 MySQL AB 185 User Defined Partitioning
Copyright 2006 MySQL AB 186 Since the dawn of time...
Copyright 2006 MySQL AB 187 Copyright 2006 MySQL AB 188 pk
Copyright 2006 MySQL AB 189 pk
Copyright 2006 MySQL AB 190 pk
Nodegroup 0 Nodegroup 1
Copyright 2006 MySQL AB 191 HASH(pk) pk
Copyright 2006 MySQL AB 192 HASH(pk) pk
Copyright 2006 MySQL AB 193 HASH(pk) pk
Copyright 2006 MySQL AB 194 Perception Reality
HASH(pk) pk
Copyright 2006 MySQL AB 195 pk
Copyright 2006 MySQL AB 196 pk
Two Partitions
Copyright 2006 MySQL AB 197 How the default looks (in 5.1 SHOW CREATE TABLE)
CREATE TABLE account(number int unsigned, location varchar, amount int) PRIMARY KEY (number) [PARTITION BY KEY ()] [(PARTITION P0 NODEGROUP 0, PARTITION P1 NODEGROUP 1, …)] ENGINE=NDBCLUSTER;
Copyright 2006 MySQL AB 198 How the default looks (in 5.1 SHOW CREATE TABLE)
CREATE TABLE account(number int unsigned, location varchar, amount int) PRIMARY KEY (number) [PARTITION BY KEY ()] [(PARTITION P0 NODEGROUP 0, PARTITION P1 NODEGROUP 1, …)] ENGINE=NDBCLUSTER;
Copyright 2006 MySQL AB 199 How the default looks (in 5.1 SHOW CREATE TABLE)
CREATE TABLE account(number int unsigned, location varchar, amount int) PRIMARY KEY (number) [PARTITION BY KEY ()] [(PARTITION P0 NODEGROUP 0, PARTITION P1 NODEGROUP 1, …)] ENGINE=NDBCLUSTER;
Copyright 2006 MySQL AB 200 How the default looks (in 5.1 SHOW CREATE TABLE)
CREATE TABLE account(number int unsigned, location varchar, amount int) PRIMARY KEY (number) [PARTITION BY KEY ()] [(PARTITION P0 NODEGROUP 0, PARTITION P1 NODEGROUP 1, …)] ENGINE=NDBCLUSTER;
Copyright 2006 MySQL AB 201 Now, In 5.1
Copyright 2006 MySQL AB 202 User Defined Partitioning
Copyright 2006 MySQL AB 203 By Key
Copyright 2006 MySQL AB 204 Partition by Key
CREATE TABLE account(number int unsigned, location varchar, amount int) PRIMARY KEY (number) [PARTITION BY KEY ()] [(PARTITION P0 NODEGROUP 0, PARTITION P1 NODEGROUP 1, …)] ENGINE=NDBCLUSTER;
Copyright 2006 MySQL AB 205 By Range
Copyright 2006 MySQL AB 206 Partition by Range
CREATE TABLE account(number int unsigned, location varchar, amount int) PRIMARY KEY (number) PARTITION BY RANGE (amount) PARTITIONS 2 (PARTITION P0 VALUES LESS THAN 10000 NODEGROUP 0, PARTITION P1 VALUES LESS THAN 100000 NODEGROUP 1) ENGINE=NDBCLUSTER;
Copyright 2006 MySQL AB 207 or by List
Copyright 2006 MySQL AB 208 Partition by List
CREATE TABLE account(number int unsigned, location varchar, amount int) PRIMARY KEY (number) PARTITION BY LIST (ORD(location)) PARTITIONS 2 (PARTITION P0 VALUES IN (66, 76) -- Berlin, London NODEGROUP 0, PARTITION P1 VALUES IN (78, 84) -- New York, Tokyo NODEGROUP 1) ENGINE=NDBCLUSTER;
Copyright 2006 MySQL AB 209 User Defined Partitioning
Copyright 2006 MySQL AB 210 Gives the User control
Copyright 2006 MySQL AB 211 Gives the User control over physical distribution of data
Copyright 2006 MySQL AB 212 MySQL Cluster Replication
Copyright 2006 MySQL AB 213 Not Internal Mirroring between nodes
Copyright 2006 MySQL AB 214 Replication from one Cluster to Another Cluster
Copyright 2006 MySQL AB 215 Why?
Copyright 2006 MySQL AB 216 the usual reasons
Copyright 2006 MySQL AB 217 Why replicate between Clusters?
• Geographical Redundancy
Copyright 2006 MySQL AB 218 Why replicate between Clusters?
• Geographical Redundancy • Split the processing load
Copyright 2006 MySQL AB 219 Why replicate between Clusters?
• Geographical Redundancy • Split the processing load –e.g. for monthly reports
Copyright 2006 MySQL AB 220 Quick Overview of MySQL Replication
Copyright 2006 MySQL AB 221 MySQL Replication
Copyright 2006 MySQL AB 222 MySQL Replication
INSERT ...
Copyright 2006 MySQL AB 223 MySQL Replication
INSERT ...
INSERT...
Copyright 2006 MySQL AB 224 MySQL Replication
INSERT ...
INSERT...
Copyright 2006 MySQL AB 225 MySQL Replication
INSERT ...
INSERT...
INSERT...
Copyright 2006 MySQL AB 226 MySQL Replication
INSERT ...
INSERT ...
INSERT...
INSERT...
Copyright 2006 MySQL AB 227 MySQL Replication
INSERT ...
INSERT ...
INSERT...
INSERT...
INSERT...
Copyright 2006 MySQL AB 228 MySQL Replication
INSERT ... UPDATE ... INSERT ...
INSERT... UPDATE... INSERT...
INSERT...
Copyright 2006 MySQL AB 229 MySQL Replication
INSERT ... UPDATE ... DELETE ... INSERT ...
INSERT... UPDATE... DELETE... INSERT... UPDATE... INSERT...
Copyright 2006 MySQL AB 230 MySQL Replication
Master
Copyright 2006 MySQL AB 231 MySQL Replication
Master
Copyright 2006 MySQL AB 232 MySQL Replication
Slave
Master Slave
Slave
Copyright 2006 MySQL AB 233 MySQL Replication
Slave
Master Slave
Slave
Copyright 2006 MySQL AB 234 MySQL Replication
Slave
Master Slave
Slave
Copyright 2006 MySQL AB 235 MySQL Replication
Slave
Master Slave
Slave
Copyright 2006 MySQL AB 236 MySQL Replication
Slave
Master Slave
Slave
Copyright 2006 MySQL AB 237 MySQL Replication
Slave
Slow, Unreliable Network Master Slave
Slave
Copyright 2006 MySQL AB 238 MySQL Replication
Slave
Slow, Unreliable Network Master Slave
Slave
Copyright 2006 MySQL AB 239 MySQL Replication
Slave
Slow, Unreliable Network Master Slave
Slave
Copyright 2006 MySQL AB 240 MySQL Replication
Slave
Slow, Unreliable Network Master Slave
Slave
Copyright 2006 MySQL AB 241 Back at Cluster
Copyright 2006 MySQL AB 242 mysqld mysqld mysqld
Copyright 2006 MySQL AB 243 mysqld mysqld mysqld
NDB API NDB API
Copyright 2006 MySQL AB 244 mysqld mysqld mysqld UPDATE UPDATE UPDATE
NDB API NDB API
Copyright 2006 MySQL AB 245 mysqld mysqld mysqld
NDB API NDB API
update() update()
Copyright 2006 MySQL AB 246 mysqld mysqld mysqld
NDB API NDB API
update() update()
Copyright 2006 MySQL AB 247 INSERT UPDATE DELETE
mysqld mysqld mysqld
NDB API NDB API
update() update()
Copyright 2006 MySQL AB 248 INSERT UPDATE DELETE
mysqld mysqld mysqld
HOW? HOW?
NDB API NDB API
update() update()
Copyright 2006 MySQL AB 249 INSERT UPDATE DELETE
mysqld mysqld mysqld
Copyright 2006 MySQL AB 250 INSERT INSERT INSERT UPDATE UPDATE UPDATE DELETE DELETE DELETE
SLAVE
Copyright 2006 MySQL AB 251 INSERT INSERT INSERT UPDATE UPDATE UPDATE DELETE DELETE DELETE ORDER?
SLAVE
Copyright 2006 MySQL AB 252 Serialization is in the storage nodes
Copyright 2006 MySQL AB 253 INSERT UPDATE DELETE
mysqld mysqld mysqld
NDB API NDB API
update() update()
Copyright 2006 MySQL AB 254 INSERT UPDATE DELETE
mysqld mysqld mysqld
NDB API NDB API
update() update()
Copyright 2006 MySQL AB 255 INSERT UPDATE DELETE
mysqld mysqld mysqld
NDB API NDB API
update() update()
Copyright 2006 MySQL AB 256 NDB Injector Thread
• A thread inside the MySQL Server • Subscribes to events in NDB – The event of “Row was committed” • Injects the rows into the binlog • Producing a single, canonical binlog of your cluster – not just one mysql server – it contains EVERYTHING done on ALL ndbapi programs (including mysqld) connected to the cluster
Copyright 2006 MySQL AB 257 INSERT UPDATE DELETE
mysqld mysqld mysqld
NDB API NDB API
update() update()
Copyright 2006 MySQL AB 258 A Closer Look...
Copyright 2006 MySQL AB 259 MySQL Replication between Clusters
Application Application
MySQL Server Master
NdbCluster Handler
NDB Kernel (Data nodes) ndbd ndbd
ndbd ndbd
Copyright 2006 MySQL AB 260 MySQL Replication between Clusters
Application Application
MySQL Server Master
NdbCluster Handler
NDB Kernel (Data nodes) ndbd ndbd
ndbd ndbd
Copyright 2006 MySQL AB 261 MySQL Replication between Clusters
Application Application
MySQL Server Master
NdbCluster
Handler Binlog
NDB Kernel (Data nodes) ndbd ndbd
ndbd ndbd
Copyright 2006 MySQL AB 262 MySQL Replication between Clusters
Application Application
MySQL Server MySQL Server Master Slave
NdbCluster
Handler Binlog
NDB Kernel (Data nodes) ndbd ndbd
ndbd ndbd
Copyright 2006 MySQL AB 263 MySQL Replication between Clusters
Application Application
MySQL Server MySQL Server Master Slave
NdbCluster NdbCluster
Handler Binlog Handler
NDB Kernel NDB Kernel (Data nodes) (Data nodes) ndbd ndbd ndbd ndbd
ndbd ndbd ndbd ndbd
Copyright 2006 MySQL AB 264 MySQL Replication between Clusters
Application Application
MySQL Server MySQL Server Master Slave
NdbCluster $handler Handler Binlog
NDB Kernel (Data nodes) ndbd ndbd
ndbd ndbd
Copyright 2006 MySQL AB 265 MySQL Replication between Clusters
Application Application
MySQL Server MySQL Server Master Slave
NdbCluster NdbCluster
Handler Binlog Handler
NDB Kernel NDB Kernel (Data nodes) (Data nodes) ndbd ndbd ndbd ndbd
ndbd ndbd ndbd ndbd
Copyright 2006 MySQL AB 266 MySQL Replication between Clusters
Application Application
MySQL Server MySQL Server Replication Master Slave
NdbCluster NdbCluster
Handler Binlog Handler
NDB Kernel NDB Kernel (Data nodes) (Data nodes) ndbd ndbd ndbd ndbd
ndbd ndbd ndbd ndbd
Copyright 2006 MySQL AB 267 MySQL Replication between Clusters
Application Application
MySQL Server I / O MySQL Server Replication Master thread Slave
NdbCluster NdbCluster Relay Handler Binlog Handler Binlog
NDB Kernel NDB Kernel (Data nodes) (Data nodes) ndbd ndbd ndbd ndbd
ndbd ndbd ndbd ndbd
Copyright 2006 MySQL AB 268 MySQL Replication between Clusters
Application Application
MySQL Server I / O MySQL Server Replication Master thread Apply Slave thread
NdbCluster NdbCluster Relay Handler Binlog Handler Binlog
NDB Kernel NDB Kernel (Data nodes) (Data nodes) ndbd ndbd ndbd ndbd
ndbd ndbd ndbd ndbd
Copyright 2006 MySQL AB 269 MySQL Replication between Clusters
Application Application
MySQL Server I / O MySQL Server Replication Master thread Apply Slave thread
NdbCluster NdbCluster Relay Handler Binlog Handler Binlog Binlog
NDB Kernel NDB Kernel (Data nodes) (Data nodes) ndbd ndbd ndbd ndbd
ndbd ndbd ndbd ndbd
Copyright 2006 MySQL AB 270 MySQL Replication between Clusters
I / O MySQL Server thread Apply Slave thread
NdbCluster Relay Handler Binlog Binlog
NDB Kernel (Data nodes) ndbd ndbd
ndbd ndbd
Copyright 2006 MySQL AB 271 MySQL Replication between Clusters
Application Application
I / O MySQL Server thread Apply Slave thread
NdbCluster Relay Handler Binlog Binlog
NDB Kernel (Data nodes) ndbd ndbd
ndbd ndbd
Copyright 2006 MySQL AB 272 MySQL Replication between Clusters
Application Application Application Application
MySQL Server I / O MySQL Server Replication Master thread Apply Slave thread
NdbCluster NdbCluster Relay Handler Binlog Handler Binlog Binlog
NDB Kernel NDB Kernel (Data nodes) (Data nodes) ndbd ndbd ndbd ndbd
ndbd ndbd ndbd ndbd
Copyright 2006 MySQL AB 273 One thing...
Who spotted the single point of failure?
Copyright 2006 MySQL AB 274 Redundant Replication Channels
Replication Apply MySQL Server Master I/O MySQL Server thread thread Slave NdbCluster NdbCluster Handler Relay Handler Binlog Binlog Binlog
NDB Kernel NDB Kernel (Data nodes) (Data nodes) ndbd ndbd ndbd ndbd
ndbd ndbd ndbd ndbd
Copyright 2006 MySQL AB 275 Redundant Replication Channels
Replication Apply MySQL Server Master I/O MySQL Server thread thread Slave NdbCluster NdbCluster Handler Relay Handler Binlog Binlog Binlog
NDB Kernel NDB Kernel (Data nodes) (Data nodes) ndbd ndbd ndbd ndbd
ndbd ndbd ndbd ndbd
NdbCluster Binlog Handler
MySQL Server Master
Copyright 2006 MySQL AB 276 Redundant Replication Channels
Replication Apply MySQL Server Master I/O MySQL Server thread thread Slave NdbCluster NdbCluster Handler Relay Handler Binlog Binlog Binlog
NDB Kernel NDB Kernel (Data nodes) (Data nodes) ndbd ndbd ndbd ndbd
ndbd ndbd ndbd ndbd
NdbCluster Binlog Relay NdbCluster Binlog Handler Binlog Handler MySQL Server Master Replication MySQL Server I/O Apply thread thread Slave
Copyright 2006 MySQL AB 277 Redundant Replication Channels
Replication Apply MySQL Server Master I/O MySQL Server thread thread Slave NdbCluster NdbCluster Handler Relay Handler Binlog Binlog Binlog
NDB Kernel NDB Kernel (Data nodes) (Data nodes) ndbd ndbd ndbd ndbd
ndbd ndbd ndbd ndbd
NdbCluster Binlog Relay NdbCluster Binlog Handler Binlog Handler MySQL Server Master Replication MySQL Server I/O Apply thread thread Slave
Copyright 2006 MySQL AB 278 Redundant Replication Channels
Replication Apply MySQL Server Master I/O MySQL Server thread thread Slave NdbCluster NdbCluster Handler Relay Handler Binlog Binlog Binlog
NDB Kernel NDB Kernel (Data nodes) (Data nodes) ndbd ndbd ndbd ndbd
ndbd ndbd ndbd ndbd
NdbCluster Binlog Relay NdbCluster Binlog Handler Binlog Handler MySQL Server Master Replication MySQL Server I/O Apply thread thread Slave
Copyright 2006 MySQL AB 279 Redundant Replication Channels
Replication Apply MySQL Server Master I/O MySQL Server thread thread Slave NdbCluster NdbCluster Handler Relay Handler Binlog Binlog Binlog
NDB Kernel NDB Kernel (Data nodes) (Data nodes) ndbd ndbd ndbd ndbd
ndbd ndbd ndbd ndbd
NdbCluster Binlog Relay NdbCluster Binlog Handler Binlog Handler MySQL Server Master Replication MySQL Server I/O Apply thread thread Slave
Copyright 2006 MySQL AB 280 So,
THAT'S COOL!
Copyright 2006 MySQL AB 281 Not just Cool,
Copyright 2006 MySQL AB 282 It's SHINY.
Copyright 2006 MySQL AB 283 How do I make fail over happen?
Copyright 2006 MySQL AB 284 But first...
Copyright 2006 MySQL AB 285 Epoch
• A point of synchronization in the cluster • Everybody agrees on what transactions are disk persistent • In case of system crash, this is where we'll recover to.
Copyright 2006 MySQL AB 286 Okay, but fail over?
Copyright 2006 MySQL AB 287 Currently manual,
Copyright 2006 MySQL AB 288 Currently manual, but only four simple steps
Copyright 2006 MySQL AB 289 STEP 1
• Find out where the Slave is up to • In the binary log produced by the injector – each Global Check Point (epoch) is a transaction – Where we are up to is recorded on the slave • The cluster database tracks these things – here, the apply_status table. • Which has two columns: server_id and epoch (both integers) • and is ENGINE=ndb – so is available everywhere! • So, we – mysqlS`> SELECT @latest:=MAX(epoch) from cluster.apply_status; – possibly with a WHERE clause for server_id
Copyright 2006 MySQL AB 290 STEP 2
• Find the binlog position for this epoch • The cluster.binlog_index table will help us here – It maps binlog position to GCI – and tells us the number of INSERT, UPDATES, DELETES and SCHEMAOPS per GCI – Is MyISAM and is per-master • So, we run the query (@latest from slave) • mysqlM`> SELECT @file:=SUBSTRING_INDEX(File, '/', -1), @pos:=Position FROM cluster.binlog_index WHERE epoch > @latest ORDER BY epoch ASC LIMIT 1;
Copyright 2006 MySQL AB 291 STEP 3
• Synchronize the second channel – i.e. change the master • Run the query (using @file and @pos from last query) – mysqlS`> CHANGE MASTER TO MASTER_LOG_FILE='@file', MASTER_LOG_POS=@pos;
Copyright 2006 MySQL AB 292 STEP 4
• mysqlS`> START SLAVE; • No, really, that's it.
Copyright 2006 MySQL AB 293 Limitations
• Fail over of replication channels is manual – can be scripted • Since all updates are through one injector thread, there is a limit – this limit is much less than what you can pump through a good cluster – We are working to overcome this
Copyright 2006 MySQL AB 294 You can now...
• Have a 99.999% uptime cluster – with great performance – and a cool name
Copyright 2006 MySQL AB 295 You can now...
• Have a 99.999% uptime cluster – with great performance – and a cool name • Have replication between it and another Cluster (or single server) – Load balancing – Geographical redundancy
Copyright 2006 MySQL AB 296 You can now...
• Have a 99.999% uptime cluster – with great performance – and a cool name • Have replication between it and another Cluster (or single server) – Load balancing – Geographical redundancy • Redundant replication channels between these setups • Redundancy up the wazoo!
Copyright 2006 MySQL AB 297 Copyright 2006 MySQL AB 298 Disk Data
Copyright 2006 MySQL AB 299 Options for Adding Disk Data to NDB
1. Put Existing disk engine into NDB kernel
Copyright 2006 MySQL AB 300 Options for Adding Disk Data to NDB
1. Put Existing disk engine into NDB kernel 2. Careful re-engineering of NDB kernel to add support for disk based attributes
Copyright 2006 MySQL AB 301 Options for Adding Disk Data to NDB
1. Put Existing disk engine into NDB kernel 2. Careful re-engineering of NDB kernel to add support for disk based attributes
Copyright 2006 MySQL AB 302 Two Phase Implementation
1. Data on disk (5.1) 2. Indexes on disk (future)
Copyright 2006 MySQL AB 303 A few concepts...
Copyright 2006 MySQL AB 304 Where we store things
Table Space
Copyright 2006 MySQL AB 305 Where we store things
Table Space
Data file
Copyright 2006 MySQL AB 306 Where we store things
Table Space
Data file Data file
Copyright 2006 MySQL AB 307 Where we store things
Table Space
Data file Data file
Data file
Copyright 2006 MySQL AB 308 Where we store things
Table Space
Table Space Data file Data file
Data file Data file
Data file
Copyright 2006 MySQL AB 309 Where we store things
Table Space
Table Space Data file Data file
Data file Data file
Data file Log file group
Undo file
Copyright 2006 MySQL AB 310 Where we store things
Table Space
Table Space Data file Data file
Data file Data file
Data file Log file group
Undo file Undo file
Copyright 2006 MySQL AB 311 Where we store things
Table Space
Table Space Data file Data file
Data file Data file
Data file Log file group
Undo file Undo file
Copyright 2006 MySQL AB 312 Files are per node
Node 2 Node 1
df1 df1
Copyright 2006 MySQL AB 313 Let's look at the SQL
Copyright 2006 MySQL AB 314 CREATE LOGFILE GROUP
• CREATE LOGFILE GROUP lg_1 ADD UNDOFILE 'undo1' INITIAL_SIZE 16M UNDO_BUFFER_SIZE 2M ENGINE=NDB; • We can add another undo file • ALTER LOGFILE GROUP lg_1 ADD UNDOFILE 'undo2' INITIAL_SIZE 12M ENGINE=NDB; • We currently don't auto-extend files – For more space, add files
Copyright 2006 MySQL AB 315 CREATE TABLESPACE
• CREATE TABLESPACE ts1 ADD DATAFILE 'datafile1' USE LOGFILE GROUP lg1 INITIAL_SIZE 32M ENGINE=NDB; • We can add another datafile too • ALTER TABLESPACE ts1 ADD DATAFILE 'datafile2' INITIAL_SIZE 48M ENGINE=NDB; • We currently don't auto-extend – So just add another file
Copyright 2006 MySQL AB 316 CREATE TABLE
• CREATE TABLE t1 ( pk1 INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL) TABLESPACE ts1 STORAGE DISK ENGINE=NDB; • b and c will be stored on disk • pk1 in memory (as it's indexed)
Copyright 2006 MySQL AB 317 ALTER TABLE (add index)
• If you add an index, fields moved to MM
Copyright 2006 MySQL AB 318 Checking free space
• INFORMATION_SCHEMA.FILES – Ongoing debate on if this is going to stay in INFORMATION_SCHEMA or move to performance schema. – The actual schema of the table is pretty decided though – Currently NDB only • Pressure your storage engine developers to add code for theirs! • Does require some knowledge about how space is allocated to tables
Copyright 2006 MySQL AB 319 How we allocate disk space
Copyright 2006 MySQL AB 320 Datafile
Copyright 2006 MySQL AB 321 Datafile
Extents
Copyright 2006 MySQL AB 322 Datafile t1
Copyright 2006 MySQL AB 323 Datafile t1
Copyright 2006 MySQL AB 324 Datafile t1
Copyright 2006 MySQL AB 325 Datafile t1
t2
Copyright 2006 MySQL AB 326 Datafile t1
t2
Copyright 2006 MySQL AB 327 Datafile t1
t2
Copyright 2006 MySQL AB 328 Datafile t1
t2
Copyright 2006 MySQL AB 329 Datafile t1
t2
Copyright 2006 MySQL AB 330 Datafile t1
t2
1 Free Extent
Copyright 2006 MySQL AB 331 I_S.FILES for Data files
• Datafile – what ts it belongs to • Extent Size (bytes) • Number of extents in file • Number of free extents • So, Free extents multiplied by extent size = free bytes that can be allocated to tables
Copyright 2006 MySQL AB 332 A useful VIEW
CREATE VIEW isf AS SELECT FILE_NAME, (TOTAL_EXTENTS * EXTENT_SIZE) AS 'Total', (FREE_EXTENTS * EXTENT_SIZE) AS 'Free', ( ((FREE_EXTENTS * EXTENT_SIZE)*100) / (TOTAL_EXTENTS * EXTENT_SIZE)) AS '% Free' FROM INFORMATION_SCHEMA.FILES WHERE ENGINE="ndbcluster" and FILE_TYPE = 'DATAFILE';
Copyright 2006 MySQL AB 333 A useful VIEW
mysql> select * from isf; Empty set (0.01 sec)
Copyright 2006 MySQL AB 334 A useful VIEW
mysql> select * from isf; Empty set (0.01 sec)
mysql> CREATE TABLESPACE ts1 ADD DATAFILE 'datafile.dat' USE LOGFILE GROUP lg1 INITIAL_SIZE 12M ENGINE NDB; Query OK, 0 rows affected (2.55 sec)
Copyright 2006 MySQL AB 335 A useful VIEW
mysql> select * from isf; +------+------+------+------+ | FILE_NAME | Total Extent Size | Total Free In Bytes | % Free Space | +------+------+------+------+ | datafile.dat | 12582912 | 12582912 | 100.0000 | | datafile.dat | 12582912 | 12582912 | 100.0000 | +------+------+------+------+ 2 rows in set (0.00 sec)
mysql>
Copyright 2006 MySQL AB 336 A useful VIEW mysql> select * from isf; +------+------+------+------+ | FILE_NAME | Total Extent Size | Total Free In Bytes | % Free Space | +------+------+------+------+ | datafile.dat | 12582912 | 12582912 | 100.0000 | | datafile.dat | 12582912 | 12582912 | 100.0000 | +------+------+------+------+ 2 rows in set (0.00 sec)
mysql> CREATE TABLE t1 (pk1 int not null primary key, b int not null, c int not null) tablespace ts1 storage disk engine ndb; Query OK, 0 rows affected (0.68 sec)
mysql> insert into t1 () values (); Query OK, 1 row affected, 3 warnings (0.00 sec)
Copyright 2006 MySQL AB 337 A useful VIEW
mysql> select * from isf; +------+------+------+------+ | FILE_NAME | Total Extent Size | Total Free In Bytes | % Free Space | +------+------+------+------+ | datafile.dat | 12582912 | 11534336 | 91.6667 | | datafile.dat | 12582912 | 11534336 | 91.6667 | +------+------+------+------+ 2 rows in set (0.01 sec)
mysql>
Copyright 2006 MySQL AB 338 I_S.FILES for UNDO files
• Free log space • If running out, maybe need to add more
Copyright 2006 MySQL AB 339 The future of I_S.FILES
• Perhaps reporting of (approximate or exact) free space inside extents allocated to tables – This means there'll be the generic row for the data file as well as rows for each table – Potentially very long I_S.FILES table • Other storage engines – Developers need to add fill_files_table to their handlerton – Potentially some code cleanups to make it easier to extend the table
Copyright 2006 MySQL AB 340 Some internal details....
Copyright 2006 MySQL AB 341 Buffer Manager
• Main memory database has no buffer manager – no need, everything is in memory – simplifies things a lot :) • Good Buffer Manager = Good Performance • There will be tuning and enhancements here in future releases
Copyright 2006 MySQL AB 342 NO-STEAL
• We use no-steal algorithm – no uncommitted data written to disk – Saves some disk writes – Limited transaction size (configurable – same as always)
Copyright 2006 MySQL AB 343 Optimized NR
• Traditional NR – copy everything over the wire – PRO: easy to implement correctly – PRO: not too bad for a few gigs of data – CON: very very bad for disk data • think 2TB going over the wire... ouch! • Details on optimized node recovery for NDB – in s1108-ronstrom.pdf – recovery from checkpoint, so don't have to copy everything
Copyright 2006 MySQL AB 344 Disk Data
Copyright 2006 MySQL AB 345 More data in Cluster
Copyright 2006 MySQL AB 346 While keeping Main Memory performance when needed
Copyright 2006 MySQL AB 347 Questions?
Copyright 2006 MySQL AB 348 In 4.1 and 5.0...
Copyright 2006 MySQL AB 349 Storage Nodes
Copyright 2006 MySQL AB 350 CREATE TABLE t1...
Storage Nodes
Copyright 2006 MySQL AB 351 t1 t1
t1 t1
Storage Nodes
Copyright 2006 MySQL AB 352 CREATE DATABASE foo
Storage Nodes
Copyright 2006 MySQL AB 353 CREATE DATABASE foo
CREATE DATABASE foo
Storage Nodes
Copyright 2006 MySQL AB 354 CREATE DATABASE foo CREATE DATABASE foo
CREATE DATABASE foo
Storage Nodes
Copyright 2006 MySQL AB 355 CREATE DATABASE foo CREATE DATABASE foo CREATE DATABASE foo
CREATE DATABASE foo
Storage Nodes
Copyright 2006 MySQL AB 356 But now...
Copyright 2006 MySQL AB 357 in 5.1
Copyright 2006 MySQL AB 358 CREATE DATABASE foo
Storage Nodes
Copyright 2006 MySQL AB 359 foo foo
foo foo
Storage Nodes
Copyright 2006 MySQL AB 360 in 4.1 and 5.0
Copyright 2006 MySQL AB 361 Table Auto Discovery
Copyright 2006 MySQL AB 362 in 5.1 Database and Table auto discovery
Copyright 2006 MySQL AB 363 Disk Data
in 5.1 Database and Table auto discovery
Copyright 2006 MySQL AB 364 Disk Data Integration with Replication in 5.1 Database and Table auto discovery
Copyright 2006 MySQL AB 365 Disk Data Integration with Replication in 5.1 Database and Table auto discovery
User Defined Partitioning
Copyright 2006 MySQL AB 366 Disk Data Integration with Replication Variable Sized Attributes in 5.1 Database and Table auto discovery
User Defined Partitioning
Copyright 2006 MySQL AB 367 Disk Data Integration with Replication Variable Sized Attributes in 5.1 Database and Table auto discovery On line Add/Drop Index User Defined Partitioning
Copyright 2006 MySQL AB 368 Disk Data Better error reporting Integration with Replication Variable Sized Attributes in 5.1 Database and Table auto discovery On line Add/Drop Index User Defined Partitioning
Copyright 2006 MySQL AB 369 Disk Data Better error reporting Integration with Replication Variable Sized Attributes in 5.1 Database and Table auto discovery On line Add/Drop Index User Defined Partitioning More internal QA
Copyright 2006 MySQL AB 370 Disk Data Better error reporting Integration with Replication Variable Sized Attributes in 5.1 Database and Table auto discovery Improvements in Cluster managementOn line Add/Drop Index User Defined Partitioning More internal QA
Copyright 2006 MySQL AB 371 Disk Data Better error reporting
Improving Documentation Integration with Replication Variable Sized Attributes in 5.1 Database and Table auto discovery Improvements in Cluster managementOn line Add/Drop Index User Defined Partitioning More internal QA
Copyright 2006 MySQL AB 372 Disk Data Better error reporting
Improving Documentation Integration with Replication Variable Sized Attributes Improving configuration in 5.1 Database and Table auto discovery Improvements in Cluster managementOn line Add/Drop Index User Defined Partitioning More internal QA
Copyright 2006 MySQL AB 373 Find out More!
• MySQL Online Documentation – http://dev.mysql.com/doc/refman/5.0/en/index.html • Cluster Forum – http://forums.mysql.com/list.php?25 • Cluster Mailing List – http://lists.mysql.com/cluster • Contact Sales – [email protected] • Contact Me – Monty Taylor – [email protected]
Copyright 2006 MySQL AB 374 Quick intro to MySQL
• SQL RDBMS – Aiming for full standards compliance • Reputation for speed, reliability and ease of use • MySQL 5.0 is GA, 5.1 is Beta – many Cluster improvements over 4.1 • 5.0 Supports (among other things) – SQL 2003 Stored procedures – Triggers – Updateable Views – Cursors – Precision math – data dictionary (INFORMATION_SCHEMA database) – distributed transactions (XA)
Copyright 2006 MySQL AB 375 Cluster across the versions...
• Cluster in 4.1 – Focused on High Availability • Cluster in 5.0 – Make it faster! • Cluster in 5.1 – Added new features • Online upgrade between minor versions (e.g. 5.0.18 to 5.0.19)
Copyright 2006 MySQL AB 376 Storage Engines
• Unique feature of MySQL – Theory is: no one way of storing a table is ideal for all situations – So we let you choose! – on a per table basis – it's a VFS for your database tables! • Several to choose from • Can create your own • Well known existing ones: – MyISAM, InnoDB, HEAP, CSV, BDB, Archive, federated • Cluster is a storage engine for MySQL – ENGINE=NDB or ENGINE=ndbcluster
Copyright 2006 MySQL AB 377 What is MySQL Cluster?
• An in-memory clustered storage engine for MySQL Size of Database×No Of Replicas×1.1 – Data and Indexes kept in main mePemr Noordye= No Of Data Nodes – Check-pointed to disk – Disk Data 5.1 • Highly Available – Designed for five 9s availability (99.999% uptime) – sub-second fail over – Hot (Online) Backup (no locks taken during backup) – Amount of redundancy is configurable: NoOfReplicas • Shared-nothing architecture – A cluster of commodity hardware – Choice for interconnects (TCP and SCI are the main ones) – No need for expensive SANs
Copyright 2006 MySQL AB 378 Management Server config.ini
[ndbd default] NoOfReplicas= 2 outside world DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster mysqld [ndbd] mysqld HostName= 192.168.0.40 ndb_mgmd
[ndbd] HostName= 192.168.0.41
[ndb_mgmd] DataDir= /usr/local/mysql/cluster HostName= 192.168.0.42
[mysqld] [mysqld] ndbd ndbd [mysqld]
Copyright 2006 MySQL AB 379 Nodes
• Storage nodes (ndbd) – Grouped into nodegroups – Each nodegroup holds a partition of the database – Number of nodes in a nodegroup configurable (NoOfReplicas) – Up to 48 in one cluster – One acts as a transaction coordinator • Management Node – Distributes configuration to other nodes • MySQL nodes – Sometimes referred to as API nodes – MySQL servers – Access just like you would any other mysql server
Copyright 2006 MySQL AB 380 Physical Requirements
• A node is a process, not a computer • Need at least three machines in a cluster – This avoids the split brain problem • Management server doesn't need a powerful machine • Data nodes – generally have a lot of memory – Disk IO requirements can be calculated from configuration parameters – not CPU bound – ndbd is single threaded • SQL nodes need the most CPU – have more of these than data nodes – mysqld is multithreaded
Copyright 2006 MySQL AB 381 CREATE TABLE
• CREATE TABLE fbhighscores ( level int, piclevel int, time float, name varchar(50) ) engine=ndb; • CREATE TABLE fblevelset ( name varchar(20), `num` int auto_increment, `level` text, PRIMARY KEY (name,num) ) ENGINE=ndbcluster;
Copyright 2006 MySQL AB 382 Data Distribution on 4 Nodes
Pn AccN Va $ r o l $ F1
Horizontal fragmentation F2 of Table 1 (4 fragments) F3 Fragments distributed 2 copies of data on nodes F4 Fx – primary replica Fx – secondary replica Table 1
Physical configuration Logical configuration
Node group 1 Node group 2 Node 1 Node 2 F1 F1 F2 F2
F3 F3 F4 F4 Node 3 Node 4 Node 1 Node 2 Node 3 Node 4
Copyright 2006 MySQL AB 383