2 Announcement
Project milestone #2 due today (April 14) Transaction Processing: Homework #4 due in 9 days (April 23) Concurrency Control Recitation session this Friday (April 18) Homework #4 Q&A Project demo period in two weeks (April 28-May 3) CPS 216 Sign-up sheet will be available next Monday Advanced Database Systems Final exam in 17 days (May 1, 2-5pm)
3 4 Review Concurrency control
ACID Goal: ensure the “I” (isolation) in ACID Atomicity: TX’s are either completely done or not done at all Consistency: TX’s should leave the database in a consistent state T1: T2: Isolation: TX’s must behave as if they are executed in isolation read(A); read(A); Durability: Effects of committed TX’s are resilient against failures write(A); write(A); read(B); read(C); SQL transactions write(B); write(C); -- Begins implicitly commit; commit; SELECT …; UPDATE …; ROLLBACK | COMMIT; A B C
5 6 Good versus bad schedules Serial schedule
Good! Bad! Good? Execute transactions in order, with no interleaving of operations T1 T2 T1 T2 T1 T2 T1.r(A), T1.w(A), T1.r(B), T1.w(B), T2.r(A), T2.w(A), r(A) r(A) r(A) T2.r(C), T2.w(C) w(A) Read 400 r(A) w(A) T2.r(A), T2.w(A), T2.r(C), T2.w(C), T1.r(A), T1.w(A), r(B) w(A) Read 400 r(A) Write T1.r(B), T1.w(B) w(B) 400 – 100 w(A) w(A) Write )Isolation achieved by definition! r(A) r(B) 400 – 50 r(B) w(A) r(C) r(C) Problem: no concurrency at all r(C) w(B) w(B) Question: how to reorder operations to allow more w(C) w(C) w(C) concurrency
1 7 8 Conflicting operations Precedence graph
Two operations on the same data item conflict if at A node for each transaction least one of the operations is a write A directed edge from Ti to Tj if an operation of Ti r(X) and w(X) conflict precedes and conflicts with an operation of Tj in the w(X) and r(X) conflict schedule
w(X) and w(X) conflict T T T T 1 2 T 1 2 T r(X) and r(X) do not 1 1 r(A) r(A) w(A) r(A) r/w(X) and r/w(Y) do not T T r(A) 2 w(A) 2 Order of conflicting operations matters w(A) w(A) r(B) Good: r(B) Bad: E.g., if T .r(A) precedes T .w(A), then conceptually, T 1 2 1 r(C) no cycle r(C) cycle should precede T2 w(B) w(B) w(C) w(C)
9 10 Conflict-serializable schedule Locking
A schedule is conflict-serializable iff its precedence Rules graph has no cycles If a transaction wants to read an object, it must first A conflict-serializable schedule is equivalent to some request a shared lock (S mode) on that object serial schedule (and therefore is “good”) If a transaction wants to modify an object, it must first request an exclusive lock (X mode) on that object In that serial schedule, transactions are executed in the topological order of the precedence graph Allow one exclusive lock, or multiple shared locks You can get to that serial schedule by repeatedly Mode of the lock requested swapping adjacent, non-conflicting operations from SX different transactions Mode of lock(s) SYesNo Grant the lock? currently held XNoNo by other transactions Compatibility matrix
11 12 Basic locking is not enough Two-phase locking (2PL)
Add 1 to both A and B T1 T2 Multiply both A and B by 2 (preserve A=B) (preserves A=B) All lock requests precede all unlock requests lock-X(A) Read 100 r(A) Phase 1: obtain locks, phase 2: release locks Write 100+1 w(A) T1 T2 T1 T2 unlock(A) 2PL guarantees a lock-X(A) lock-X(A) T r(A) conflict-serializable r(A) r(A) Read 101 1 Possible schedule w(A) schedule w(A) under locking w(A) Write 101*2 lock-X(B) r(A) unlock(A) unlock(A) w(A) T2 lock-X(A) But still not lock-X(B) r(A) r(B) conflict-serializable! r(B) Read 100 w(A) w(B) w(B) Write 100*2 lock-X(B) r(B) unlock(B) r(B) w(B) lock-X(B) w(B) Read 200 r(B) A ≠ B! r(B) Cannot obtain the lock on B until T unlocks Write 200+1 w(B) w(B) 1 unlock(B) unlock(B)
2 13 14 Problem of 2PL Strict 2PL
T1 T2 T2 has read uncommitted Only release locks at commit/abort time r(A) data written by T A writer will block all other readers until the writer w(A) 1 r(A) commits or aborts If T1 aborts, then T2 must w(A) r(B) abort as well w(B) r(B) Cascading aborts possible if ) Used in most commercial DBMS (except Oracle) w(B) other transactions have read … Abort! data written by T2
Even worse, what if T2 commits before T1? Schedule is not recoverable if the system crashes right
after T2 commits
15 16 Deadlocks Dealing with deadlocks
Impose an order for locking objects Must know in advance which objects a transaction will access Timeout T T 1 2 Deadlock: cycle in the wait-for graph If a transaction has been blocked for too long, just abort lock-X(A) r(A) Prevention T1 w(A) lock-X(B) Idea: abort more often, so blocking is less likely r(B) T1 is waiting for T2 T2 is waiting for T1 w(B) Suppose T is waiting for T’ T lock-X(B) lock-X(A) 2 • Wait/die scheme: Abort T if it has a lower priority; otherwise T waits Deadlock! r(B)r(A) • Wound/wait scheme: Abort T’ if it has a lower priority; otherwise T waits w(B)w(A) Detection using wait-for graph Idea: deadlock is rare, so only deal it when it becomes an issue When do we detect deadlocks? Which transactions do we abort in case of deadlock?
17 18 Implementation of locking Multiple-granularity locks
Do not rely on transactions themselves to Hard to decide what granularity to lock Database lock/unlock explicitly Trade-off between overhead and concurrency Tables
DBMS inserts lock/unlock requests automatically Granularities form a hierarchy Pages Transactions Allow transactions to lock at different Streams of operations Rows granularity, using intention locks Insert lock/unlock requests Operations with S, X: lock the entire subtree in S, X mode, respectively lock/unlock requests IS: intend to lock some descendent in S mode Lock table Scheduler IX: intend to lock some descendent in X mode Lock info for each object, Serialized schedule with no including locks currently held lock/unlock operations SIX (= S + IX): lock the entire subtree in S mode; and the request queue intend to lock descendent in X mode
3 19 20 Multiple-granularity locking protocol Examples Mode of the lock requested S X IS IX SIX T1 reads all of R Mode of lock(s) SYes Yes T gets an S lock on R X 1 currently held Grant the lock? IS Yes Yes Yes Yes T scans R and updates a few rows by other transactions IX Yes Yes 2 SIX Yes T2 gets an SIX lock on R, and then occasionally get an X Compatibility matrix lock for some rows Lock: before locking an item, T must acquire intention locks on all ancestors of the item T3 uses an index to read only part of R To get S or IS, must hold IS or IX on parent T3 gets an IS lock on R, and then repeatedly gets an S • What if T holds S or SIX on parent? lock on rows it needs to access To get X or IX or SIX, must hold IX or SIX on parent Unlock: release locks bottom-up 2PL must also be observed
21 22 Phantom problem revisited Solutions
Reads are repeatable, but may see phantoms Index locking Example: different average Use the index on Student(age)
-- T1: -- T2: T2 locks the index block(s) with entries for age = 10 SELECT AVG(GPA) • If there are no entries for age = 10, T2 must lock the index FROM Student WHERE age = 10; block where such entries would be, if they existed! INSERT INTO Student VALUES(789, ‘Nelson’, 10, 1.0); Predicate locking COMMIT; “Lock” the predicate (age = 10) SELECT AVG(GPA) FROM Student WHERE age = 10; Reason with predicates to detect conflicts COMMIT; Expensive to implement ) How do you lock something that does not exist yet?
23 24 Concurrency control without locking Optimistic concurrency control
Locking is pessimistic Optimistic (validation-based) Use blocking to avoid conflicts Overhead of locking even if contention is low
Timestamp-based Optimistic concurrency control Assume that most transactions do not conflict Let them execute as much as possible Multi-version (Oracle, PostgreSQL) If it turns out that they conflict, abort and restart
4 25 26 Sketch of protocol Pessimistic versus optimistic
Read phase: transaction executes, reads from the Overhead of locking versus overhead of validation database, and writes to a private space and copying private space Validate phase: DBMS checks for conflicts with Blocking versus aborts and restarts other transactions; if conflict is possible, abort and “Concurrency control performance modeling: alternatives restart and implications,” by Agrawal et al. TODS 1987 (in red Requires maintaining a list of objects read and written by book) each transaction Locking has better throughput for environments with medium-to-high contention Write phase: copy changes in the private space to the database Optimistic concurrency control is better when resource utilization is low enough
27 28 Timestamp-based Multi-version concurrency control
Assign a timestamp to each transaction Maintain versions for each database object Timestamp order is commit order Each write creates a new version Associate each database object with a read Each read is directed to an appropriate version timestamp and a write timestamp Conflicts are detected in a similar manner as timestamp concurrency control When transaction reads/writes an object, check the object’s timestamp for conflict with a younger In addition to the problems inherited from transaction; if so, abort and restart timestamp concurrency control Pro: Reads are never blocked Problems Even reads require writes (of object timestamps) Con: Multiple versions need to be maintained Ensuring recoverability is hard (plenty of dirty reads) ) Oracle and PostgreSQL use variants of this scheme
29 Summary
Covered Conflict-serializability 2PL, strict 2PL Deadlocks Multiple-granularity locking Predicate locking and tree locking Overview of other concurrency-control methods Not covered View-serializability Concurrency control for search trees (not the same as multiple-granularity locking and tree locking)
5