Scaling-Databases

CIT 668: System Architecture
Distributed Databases

Topics
1. MySQL
2. Concurrency
3. Transactions and ACID
4. Database scaling
5. Replication
6. Partitioning
7. Brewer’s CAP Theorem
8. ACID vs. BASE
9. Taxonomy of NoSQL databases

MySQL

MySQL Architecture

MySQL Storage Engines
InnoDB
  – Default storage engine as of MySQL 5.5.
  – Supports transactions, hot backups, etc.
  – Row-level locking.
  – Fast crash recovery.
MyISAM
  – Default storage engine from MySQL 3.23 until 5.5.
  – Does NOT support transactions.
  – Must halt writes before doing a backup.
  – Table-level locking.
  – Higher performance for read-heavy applications.

MySQL History
Year  Version  Description
1995  -        First commercial version from MySQL AB corporation in Sweden.
2001  3.23     First widely used open source version, supporting full-text indexing and replication. Dual licensing: commercial and GPL.
2003  4.0      Better SQL syntax support, including UNION. SSL and InnoDB options.
2005  4.1      Better SQL syntax support, including subqueries. UTF-8 support.
2006  5.0      Even more SQL: views, triggers, stored procedures.
2008  5.1      First release after Sun's purchase of MySQL. Added partitioning.
2010  5.5      First release after Oracle's purchase of Sun. InnoDB is the default.
2013  5.6      InnoDB full-text search support and speed improvements.

MySQL Forks
MariaDB: community-developed, GPL-only fork started by MySQL co-founder Monty Widenius in reaction to Oracle's purchase of Sun. Used by Wikipedia, Google, etc.
  – Backwards compatible with MySQL (versions 5.1-5.5).
  – Version 10: multi-master, NoSQL storage engines.
Drizzle: fork based on the MySQL 6.0 code base, designed to be smaller and faster by removing features.
WebScaleSQL: fork of MySQL 5.6 designed for larger-scale databases, started by a consortium of Facebook, LinkedIn, Google, and Twitter.

Concurrency

Race Conditions
A race condition is a bug in which the result of a process depends on the sequence or timing of other events.

Mutual Exclusion
To synchronize access to shared objects, we can use mutual exclusion.
Code that uses mutual exclusion to synchronize its execution is called a critical section, which is a section of code such that:
1. Only one thread at a time can execute in the critical section.
2. All other threads must wait to enter the section.
3. When a thread leaves the critical section, another thread can enter.

Critical section requirements
Mutual exclusion
  – At most one thread is in the critical section.
Progress
  – If a thread is outside the critical section, it cannot prevent another thread from entering the critical section.
Bounded waiting
  – If a thread is waiting on the critical section, it will eventually enter the critical section.
Performance
  – The cost of entering and leaving the critical section is small with respect to the work done within it.

Locking
A lock is an object that provides two operations:
  acquire(): a thread calls this before entering the critical section
  release(): a thread calls this after leaving the critical section
A thread holds the lock between acquire() and release().

Lock Granularity
Table Locks
  – If a thread is reading, other readers can use the table.
  – If a thread is writing, only it can access the table.
  – Low overhead, small number of locks.
  – Low concurrency.
Row Locks
  – High overhead, large number of locks.
  – High concurrency.
  – Supported by the InnoDB storage engine in MySQL.

Deadlocks
A deadlock is a situation where two or more actions are each waiting for another to finish, so none of them ever completes.

Transactions

The classic problem
Code to withdraw funds from a bank account:

    withdraw(account, amount) {
        balance = get_balance(account);
        balance -= amount;
        put_balance(account, balance);
        return amount;
    }

What happens if you set up automatic bill pay and two withdrawals are made simultaneously?
Create a separate thread for each withdrawal, each running the same code.

    Thread 1                                Thread 2
    withdraw(account, amount) {             withdraw(account, amount) {
        balance = get_balance(account);         balance = get_balance(account);
        balance -= amount;                      balance -= amount;
        put_balance(account, balance);          put_balance(account, balance);
        return amount;                          return amount;
    }                                       }

Interleaved schedules
Execution of the two threads can be interleaved under preemptive scheduling.
Execution sequence as seen by the CPU:

    Thread 1:  balance = get_balance(account);
    Thread 1:  balance -= amount;
               (context switch)
    Thread 2:  balance = get_balance(account);
    Thread 2:  balance -= amount;
    Thread 2:  put_balance(account, balance);
               (context switch)
    Thread 1:  put_balance(account, balance);

What’s the account balance after this sequence?

Transactions
A transaction is a set of actions that are executed atomically: either all actions are completed or none are.
  – If an error occurs during a transaction, the database rolls back the actions, so the state of the database is left as it was before the transaction.
SQL transaction example:

    START TRANSACTION;
    UPDATE account SET balance = balance - amount WHERE id=1;
    UPDATE account SET balance = balance + amount WHERE id=2;
    COMMIT;

ACID Properties
Atomicity: All data modifications within a transaction must happen completely or not at all. No partial transactions can be recorded.
Consistency: A transaction moves the database from one valid state to another. All changes to an instance of data must be reflected in all instances of that data.
Isolation: The elements of a transaction are visible only to the user performing the transaction until it is completed.
Durability: When a system failure occurs, the data in the DB must be accurate up to the last transaction committed before the failure.
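To make the classic withdrawal problem from this section concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the course's code: the `Account` class and the functions `interleaved_withdrawals`, `withdraw_locked`, and `run_locked` are hypothetical names, and the starting balance of 100 and withdrawal amount of 10 are assumed values. The first function replays the interleaved schedule by hand so the lost update is reproducible; the second wraps the read-modify-write in a lock, making it a critical section.

```python
import threading


class Account:
    """Hypothetical in-memory account; stands in for get_balance/put_balance."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def interleaved_withdrawals(account, amount):
    """Replay the interleaved schedule step by step, without real threads.

    Both "threads" read the balance before either one writes it back,
    so one of the two withdrawals is lost.
    """
    t1_balance = account.balance   # thread 1: get_balance
    t1_balance -= amount           # thread 1: balance -= amount
    # (context switch)
    t2_balance = account.balance   # thread 2: get_balance (stale value)
    t2_balance -= amount           # thread 2: balance -= amount
    account.balance = t2_balance   # thread 2: put_balance
    # (context switch)
    account.balance = t1_balance   # thread 1: put_balance overwrites thread 2
    return account.balance


def withdraw_locked(account, amount):
    """Withdraw inside a critical section guarded by the account's lock."""
    with account.lock:             # acquire() on entry, release() on exit
        balance = account.balance
        balance -= amount
        account.balance = balance
    return amount


def run_locked(account, amount, n_threads=2):
    """Run n_threads concurrent locked withdrawals and return the final balance."""
    threads = [
        threading.Thread(target=withdraw_locked, args=(account, amount))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance


if __name__ == "__main__":
    print(interleaved_withdrawals(Account(100), 10))
    print(run_locked(Account(100), 10))
```

With the assumed values, the hand-interleaved schedule leaves a balance of 90 even though 20 was withdrawn, while the locked version leaves 80. A database transaction with row locking plays the same role as the lock here.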

Database scaling

Database Scaling Techniques
[Figure: four scaling configurations]
Base case: a 1 TPS system (one 1 TPS server, 100 users).
Scaleup: to a 2 TPS centralized system (one 2 TPS server, 200 users).
Partitioning: two 1 TPS systems (each 1 TPS server has 100 users; 0 tps exchanged between the systems).
Replication: two 2 TPS systems (each 2 TPS server has 100 users; 1 tps sent in each direction between the systems).

Distributed System Types
Shared Memory
  • All CPUs share memory/disk
  • Scalability limited by memory contention (vertical scaling only)
Shared Disk
  • CPUs share storage, not RAM
  • Scalability limited by disk contention (vertical scaling only)
Shared Nothing
  • Each CPU has its own RAM and disks
  • Very high (horizontal) scalability since there is no contention for shared resources

Database replication

Purposes of Replication
Data distribution
  – Maintains a copy of the DB at another geographic site to lower latency at that site or for disaster recovery (DR).
Load balancing
  – Allows the application to access data on multiple servers.
Backup and recovery
  – Backups of the replicated DB can be performed without impacting performance of the original production DB.
High availability
  – Application can fail over to the replicated DB.

Replication techniques
Eager (synchronous)
  – All replicas are updated as part of the original transaction.
  – Data is always consistent, but ensuring data consistency can lead to long waits or deadlocks.
Lazy (asynchronous)
  – Original transaction completes on one node, then updates are propagated to other nodes as separate transactions.
  – Can result in conflicts when transactions modify the same object on different nodes before replicas are updated.
  – Must have a reconciliation protocol to resolve conflicts.

Master/Slave Replication
Slave DBs only accept read operations from the application.
  – Flickr.com DBs logged 13 SELECTs for each write.
Master DB does all writes
  – No read operations.
  – Copies write operations to slave DBs.
  – Slave data will be slightly behind the master.
  – Single point of failure!

Master/Slave Replication
Scales reads, not writes.
  – Writes are faster, since the master only does writes, but writes do not scale with the addition of more slaves.
  – Good for read-centric applications.
Master is a single point of failure.
  – On master failure, manually promote one slave to master, then
  – re-parent the remaining slaves to the new master.

Dual Master Replication
High reliability
  – Two identical copies of the DB.
Easy maintenance
  – Set only one DB to be active.
  – Update the inactive server.
  – Synchronize.
  – Flip to the other DB as the active one, then update the first.
One big problem
  – How to handle conflicting changes?

Complex Replication Topologies
[Figures]
  – Dual Master with Read Replicas
  – Ring Multimaster Topology with Read Replicas at each site
  – Pyramid Replication (reduces replication load on master)

MySpace Case Study
  • 3000 web servers
  • 800 cache servers
  • 440 database servers hosting >1000 databases
  • Each DB server has four 2-core CPUs and 64 GB RAM
https://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000004532

Database Partitioning

Partitioning
Partitioning is the division of a logical database into independent components, called shards or partitions.
Horizontal partitioning divides the database by rows, with groups of rows stored on different nodes. Horizontal partitioning is highly scalable. A horizontal partition is called a shard.
Vertical partitioning divides the database by columns, with sets of columns stored on different nodes. Vertical partitioning is limited by the number of columns that are accessed independently.

Partitioning Criteria
Range partitioning selects a partition by determining whether the partition key is within a certain range.
List partitioning assigns each partition a list of values.
Hash partitioning uses a hash function to determine partition membership.

[Figure: Wikipedia Shard Architecture]
[Figure: LiveJournal Sharding Architecture, 2007]

Sharding Advantages
Faster Queries
  – Since each shard has a fraction of the whole DB, queries are faster than they would be on the whole DB.
Higher Write Bandwidth
  – Can have a master/slave configuration for each shard, so each shard has its own dedicated master to do writes.
High Scalability
  – Can continue to divide the DB into more shards to scale out indefinitely.

Sharding Disadvantages
Rebalancing
  – To scale, you need to split shards, which can require substantial manual effort and downtime.
  – Google and Flickr’s shards auto-rebalance,
