Transaction Processing: Concurrency Control ACID Transaction in SQL Concurrency Control Good Versus Bad Schedules Serial Schedul

ACID • Atomicity Transaction Processing: – Transactions are either done or not done Concurrency Control – They are never left partially executed • Consistency – Transactions should leave the database in a consistent state CPS 216 • Isolation – Transactions must behave as if they were executed in isolation Advanced Database Systems • Durability – Effects of completed transactions are resilient against failures 2 Transaction in SQL Concurrency control • (Implicit beginning of transaction) • Goal: ensure the “I” (isolation) in ACID SELECT …; T1: T2: UPDATE …; read(A); read(A); …… write(A); write(A); ROLLBACK | COMMIT; read(B); read(C); write(B); write(C); • ROLLBACK (a.k.a. transaction abort) commit; commit; – Will undo the the partial effects of the transaction – May be initiated by the DBMS A B C • For example, when some statement in the transaction violates a database constraint 3 4 Good versus bad schedules Serial schedule Good! Bad! Good! (But why?) • Execute transactions in order, with no T1 T2 T1 T2 T1 T2 interleaving of operations – T1.r(A), T1.w(A), T1.r(B), T1.w(B), T2.r(A), T2.w(A), r(A) r(A) r(A) T .r(C), T .w(C) w(A) Read 400 r(A) w(A) 2 2 Read 400 – T .r(A), T .w(A), T .r(C), T .w(C), T .r(A), T .w(A), r(B) Write w(A) r(A) 2 2 2 2 1 1 400 – 100 T1.r(B), T1.w(B) w(B) w(A)Write w(A) r(A) r(B) 400 – 50 r(B) – Isolation achieved by definition! w(A) r(C) r(C) • Problem: no concurrency at all r(C) w(B) w(B) • Question: how to reorder schedule to allow more w(C) w(C) w(C) 5 concurrency 6 1 Conflicting operations Precedence graph • Two operations on the same data item conflict if • A node for each transaction at least one of the operations is a write • A directed edge from Ti to Tj if an operation of Ti –r(X) and w(X) conflict precedes and conflicts with an operation of Tj in –w(X) and r(X) conflict the schedule –w(X) and w(X) conflict T1 T2 T1 T2 T1 T1 –r(X) and r(X) do not r(A) r(A) w(A) r(A) T T –r/w(X) and r/w(Y) do not r(A) 2 w(A) 2 • Order of conflicting operations matters w(A) w(A) r(B) Good: r(B) Bad: r(C) r(C) –If T1.r(A) precedes T2.w(A), then conceptually, T1 no cycle cycle w(B) w(B) should precede T2 7 w(C) w(C) 8 Conflict-serializable schedule Locking • A schedule is conflict-serializable iff its •Rules precedence graph has no cycles – If a transaction wants to read an object, it must first • A conflict-serializable schedule is equivalent to request a shared lock (S mode) on that object some serial schedule (and therefore is “good”) – If a transaction wants to modify an object, it must first request an exclusive lock (X mode) on that object – In that serial schedule, transactions are executed in the topological order of the precedence graph – Allow one exclusive lock, or multiple shared locks – You can get to that serial schedule by repeatedly Mode of the lock requested Mode of lock(s) swapping adjacent, non-conflicting operations from SX currently held SYesNo Grant the lock? different transactions by other transactions XNoNo Compatibility matrix 9 10 Basic locking is not enough Two-phase locking (2PL) Add 1 to both A and B T1 T2 Multiply both A and B by 2 (preserve A=B) (preserves A=B) lock-X(A) • All lock requests precede all unlock requests Read 100 r(A) – Phase 1: obtain locks, phase 2: release locks Write 100+1 w(A) T T T T unlock(A) 1 2 1 2 lock-X(A) lock-X(A) 2PL guarantees a T r(A) conflict-serializable r(A) Possible schedule r(A) Read 101 1 w(A) schedule w(A) under locking w(A) Write 101*2 lock-X(B) r(A) unlock(A) unlock(A) T2 lock-X(A) w(A) But still not lock-X(B) r(A) r(B) Read 100 conflict-serializable! r(B) w(A) w(B) w(B) Write 100*2 lock-X(B) r(B) unlock(B) r(B) w(B) lock-X(B) w(B) Read 200 r(B) A <> B! r(B) Cannot obtain the lock on B until T unlocks Write 200+1 w(B) 11 w(B) 1 12 unlock(B) unlock(B) 2 Problem of 2PL Strict 2PL T T 1 2 • T2 has read uncommitted data r(A) written by T1 w(A) • Only release locks at commit/abort time r(A) • If T1 aborts, then T2 must – A writer will block all other readers until the writer w(A) abort as well r(B) commits or aborts w(B) • Cascading aborts possible if r(B) w(B) other transactions have read Abort! • Used in most commercial DBMS (except Oracle) data written by T2 • What’s worse, what if T2 commits before T1? – Not recoverable if the system crashes right after T2 commits 13 14 Deadlocks Dealing with deadlocks • Impose an order for locking objects T1 T2 lock-X(A) – Must know in advance which objects a transaction will access r(A) Deadlock = w(A) lock-X(B) cycle in the wait-for graph • Timeout r(B) – If a transaction has been blocked for too long, just abort w(B) T lock-X(B)lock-X(A) 1 • Prevention Deadlock! T1 is waiting for T2 T2 is waiting for T1 – Idea: abort more often, so blocking is less likely r(B) r(A) T w(B) w(A) 2 – Wait/die versus wound/wait • Detection using wait-for graph – Idea: deadlock is rare, so only deal it when it becomes an issue – How often do we detect deadlocks? 15 – Which transactions do we abort in case of deadlock? 16 Implementation of locking SQL transaction isolation levels • Do not rely on transactions themselves to • SERIALIZABLE (default) lock/unlock explicitly • Weaker isolations levels • DBMS inserts lock/unlock requests automatically – READ UNCOMMITTED Transactions – READ COMMITTED Streams of operations – REPEATABLE READ Insert lock/unlock requests Operations with • Why weaker levels? lock/unlock requests – Increase performance by eliminating overhead and Lock table Scheduler allowing higher degree of concurrency Lock info for each object, Serialized schedule with no including locks currently held lock/unlock operations and the request queue 17 18 3 READ UNCOMMITTED READ COMMITTED • Dirty reads possible (dirty = uncommitted) • No dirty reads, but non-repeatable reads possible • Example: wrong average • Example: different averages T1: T2: T1: T2: SELECT AVG(balance) UPDATE Account UPDATE Account FROM Account; SET balance = balance – 200 SET balance = balance – 200 WHERE number = 142857; SELECT AVG(balance) WHERE number = 142857; FROM Account; COMMIT; ROLLBACK; SELECT AVG(balance) COMMIT; FROM Account; COMMIT; • Possible cause • Possible cause – Non-strict locking protocol, or no read lock 19 – Locking is not two-phase 20 REPEATABLE READ Summary of SQL isolation levels • Reads repeatable, but may see phantoms Isolation level / anomaly Dirty reads Non-repeatable reads Phantoms • Example: different average (still!) READ UNCOMMITTED Yes Yes Yes READ COMMITTED No Yes Yes T1: T2: REPEATABLE READ No No Yes SELECT AVG(balance) SERIALIZABLE No No No INSERT INTO Account FROM Account; VALUES(428571, 1000); COMMIT; • Criticized for definition in terms of anomalies SELECT AVG(balance) FROM Account; – Berenson, Bernstein, Gray, et al. “A critique of ANSI COMMIT; SQL isolation levels,” SIGMOD 1995 • Possible cause – Insertion did not acquire any lock (what to acquire?) 21 22 Concurrency control without locking Optimistic concurrency control • Locking is pessimistic • Optimistic (validation-based) – Use blocking to avoid conflicts – Overhead of locking even if contention is low • Timestamp-based • Optimistic concurrency control – Assume that most transactions do not conflict – Let them execute as much as possible • Multi-version (Oracle) – If it turns out that they conflict, abort and restart 23 24 4 Sketch of protocol Pessimistic versus optimistic • Read phase: transaction executes, reads from the • Overhead of locking versus overhead of database, and writes to a private space validation and copying private space • Validate phase: DBMS checks for conflicts with • Blocking versus aborts and restarts other transactions; if conflict is possible, abort • Agrawal, Carey, and Livny. “Concurrency control and restart performance modeling: alternatives and implications,” – Requires maintaining a list of objects read and written TODS 12(4), 1987 by each transaction – Locking has better throughput for environments with • Write phase: copy changes in the private space to medium-to-high contention the database – Optimistic concurrency control is better when resource utilization is low enough 25 26 Timestamp-based Multi-version concurrency control • Associate each database object with a read • Maintain versions for each database object timestamp and a write timestamp – Each write creates a new version • Assign a timestamp to each transaction – Each read is directed to an appropriate version • Timestamp order is commit order – Conflicts are detected in a similar manner as • When transaction reads/writes an object, check timestamp concurrency control the object’s timestamp for conflict with a younger • In addition to the problems inherited from transaction; if so, abort and restart timestamp concurrency control • Problems – Pro: Reads are never blocked – Even reads require writes (of object timestamps) – Con: Multiple versions need to be maintained – Ensuring recoverability is hard (plenty of dirty reads) • Oracle uses some variant of this scheme 27 28 Summary Next time • Covered – Conflict-serializability – 2PL, strict 2PL Recovery – Deadlocks SQL triggers and programming interface – Overview of other concurrency-control methods • Not covered – View-serializability – Hierarchical locking – Predicate locking and tree locking 29 30 5.

Load more