<<

Review CS 186 Fall 2002, Lecture 24 R &G - Chapter 19 • DBMSs support ACID Transaction semantics. • Concurrency control and Crash Recovery are key compontents here. • For Isolation property, a serial execution of Smile, it is the key that transactions is safe but slow fits the of – Try to find schedules equivalent to serial execution everybody's heart. • One solution for “conflict serializable” schedules is Two Phase Locking (2PL) Anthony J. D'Angelo, The College Blue Book

Conflict Serializable Schedules Conflict Equivalence - Intuition • We need a formal notion of equivalence that can be • If you can transform an interleaved schedule by implemented efficiently… swapping consecutive non-conflicting operations •Two operations conflict if they are by different of different transactions into a serial schedule, transactions, they are on the same object, and at then the original schedule is conflict serializable. least one of them is a write. • Example: • Two schedules are conflict equivalent iff: They involve the same actions of the same transactions, and R(A) W(A) R(B) W(B) every pair of conflicting actions is ordered the same way R(A) W(A) R(B) W(B) • Schedule S is conflict serializable if S is conflict equivalent to some serial schedule. R(A) W(A) R(B)R(B)W(B)R(B)W(B)W(B) – Note, some “serializable” schedules are NOT conflict serializable. R(A)R(A)W(A)R(A)W(A)W(A) R(B) W(B) – This is the price we pay for efficiency.

Conflict Equivalence (Continued) Dependency Graph • Here’s another example: • Dependency graph: One node per Xact; edge R(A) W(A) from Ti to Tj if an operation of Ti conflicts R(A) W(A) with an operation of Tj and Ti’s operation appears earlier in the schedule than the conflicting operation of Tj. • Serializable or not???? •Theorem: Schedule is conflict serializable if and only if its dependency graph is acyclic NOT!

1 Example View Serializability – an Aside • Alternative (weaker) notion of serializability. • A schedule that is not conflict serializable: • Schedules S1 and S2 are view equivalent if: – If Ti reads initial value of A in S1, then Ti also reads initial value of A in S2 T1:T1: R(A), R(A), W(A),W(A), R(B)R(B),R(B),, W(B)W(B) – If Ti reads value of A written by Tj in S1, then Ti also T2:T2: R(A), R(A), W(A),W(A), R(B),R(B), W(B)W(B) reads value of A written by Tj in S2 A – If Ti writes final value of A in S1, then Ti also writes T1 T2 Dependency graph final value of A in S2 • Basically, allows all conflict serializable B schedules + “blind writes” • The cycle in the graph reveals the problem. T1: R(A) W(A) T1: R(A),W(A) The output of T1 depends on T2, and vice- T2: W(A) view T2: W(A) versa. T3: W(A) T3: W(A)

Notes on Serializability Definitions Two-Phase Locking (2PL) S X • View Serializability allows (slightly) more Lock S √ – schedules than Conflict Serializability does. Compatibility Matrix X – – – Problem is that it is difficult to implement • Locking Protocol efficiently. – Each Xact must obtain a S (shared) lock on object • Neither definition allows all schedules that before reading, and an X (exclusive) lock on you would consider “serializable”. object before writing. – A transaction can not request additional locks – This is because they don’t understand the once it releases any locks. meanings of the operations or the data. – Thus, there is a “growing phase” followed by a • In practice, Conflict Serializability is what gets “shrinking phase”. used, because it can be done efficiently. • 2PL on its own is sufficient to guarantee – In order to allow more concurrency, some special conflict serializability, but, it is subject to cases do get implemented, such as for travel reservations, etc. Cascading Aborts.

Strict 2PL Strict 2PL (continued)

• Problem: Cascading Aborts All locks held by a transaction are released only when the transaction completes • Example: rollback of T1 requires rollback of T2! • Strict 2PL allows only schedules whose precedence graph is acyclic, but it is actually T1: R(A), W(A), R(B), W(B), Abort stronger than needed for that purpose. T2: R(A), W(A) • In effect, “shrinking phase” is delayed until a) Transaction has committed (commit log record • To avoid Cascading aborts, use Strict 2PL on disk), or • Strict Two-phase Locking (Strict 2PL) b) Decision has been made to abort the xact (then Protocol: locks can be released after rollback). – Same as 2PL, except: – All locks held by a transaction are released only when the transaction completes

2 Non-2PL, A= 1000, B=2000, Output =? 2PL, A= 1000, B=2000, Output =?

Lock_X(A) Lock_X(A) Read(A) Lock_S(A) Read(A) Lock_S(A) A: = A-50 A: = A-50 Write(A) Write(A)

Unlock(A) Lock_X(B) Read(A) Unlock(A) Unlock(A) Read(A) Lock_S(B) Lock_S(B) Lock_X(B) Read(B) Read(B) Unlock(B) B := B +50 PRINT(A+B) Write(B) Read(B) Unlock(B) Unlock(A) B := B +50 Read(B) Write(B) Unlock(B) Unlock(B) PRINT(A+B)

Strict 2PL, A= 1000, B=2000, Output =? Lock Management Lock_X(A) • Lock and unlock requests are handled by the Lock Read(A) Lock_S(A) Manager. A: = A-50 • LM contains an entry for each currently held lock. Write(A) • Lock table entry: Lock_X(B) – Ptr. to list of transactions currently holding the lock Read(B) – Type of lock held (shared or exclusive) B := B +50 – Pointer to queue of lock requests Write(B) Unlock(A) • When lock request arrives see if anyone else holding Unlock(B) a conflicting lock. Read(A) – If not, create an entry and grant the lock. Lock_S(B) – Else, put the requestor on the wait queue Read(B) • Locking and unlocking have to be atomic operations PRINT(A+B) • Lock upgrade: transaction that holds a shared lock Unlock(A) can be upgraded to hold an exclusive lock Unlock(B) – Can cause deadlock problems

Example: Output = ?

Lock_X(A) Lock_S(B) Deadlocks Read(B) Lock_S(A) • Deadlock: Cycle of transactions waiting for Read(A) locks to be released by each other. A: = A-50 Write(A) • Two ways of dealing with deadlocks: Lock_X(B) – Deadlock prevention – Deadlock detection

• Many systems just punt and use Timeouts – What are the dangers with this approach?

3 Deadlock Prevention Deadlock Detection

• Assign priorities based on timestamps. • Create a waits-for graph: Assume Ti wants a lock that Tj holds. Two – Nodes are transactions policies are possible: – There is an edge from Ti to Tj if Ti is waiting for Tj – Wait-Die: If Ti has higher priority, Ti waits for Tj; to release a lock otherwise Ti aborts • Periodically check for cycles in the waits-for – Wound-wait: If Ti has higher priority, Tj aborts; graph otherwise Ti waits • If a transaction re-starts, make sure it gets its original timestamp –Why?

Deadlock Detection (Continued) Multiple-Granularity Locks

Example: • Hard to decide what granularity to lock (tuples vs. pages vs. tables). T1: S(A), S(D), S(B) T2: X(B) X(C) • Shouldn’t have to make same decision for all T3: S(D), S(C), X(A) transactions! T4: X(B) • Data “containers” are nested: T1 T2 Tables contains Pages

T4 T3 Tuples

Solution: New Lock Modes, Protocol Database • Allow Xacts to lock at each level, but with a Database Multiple Granularity Lock Protocol Tables special protocol using new “intention” locks: Tables Pages • Still need S and X locks, but before locking an • Each Xact starts from the root of the Pages item, Xact must have proper intension locks hierarchy. Tuples on all its ancestors in the granularity Tuples • To get S or IS lock on a node, must hold IS or hierarchy. IS IX SIX SX IX on parent node. – What if Xact holds SIX on parent? S on parent?  IS – Intent to get S lock(s) at IS √√√√ - • To get X or IX or SIX on a node, must hold IX or finer granularity. √ √ IX - - - SIX on parent node.  IX – Intent to get X lock(s) SIX √ - - - - • Must release locks in bottom-up order. at finer granularity. √ S - - √ -  SIX mode: Like S & IX at Protocol is correct in that it is equivalent to directly setting X - the same time. Why useful? - - - - locks at the leaf levels of the hierarchy.

4 Dynamic – The “Phantom” Examples – 2 level hierarchy Tables Problem • T1 scans R, and updates a few tuples: Tuples • If we relax the assumption that the DB is a fixed – T1 gets an SIX lock on R, then get X lock on tuples that are updated. collection of objects, even Strict 2PL (on individual items) will not assure serializability: • T2 uses an index to read only part of R: – T2 gets an IS lock on R, and repeatedly • Consider T1 – “Find oldest sailor” gets an S lock on tuples of R. – T1 locks all records, and finds oldest sailor (say, age = 71). • T3 reads all of R: SIX IS IX SX – Next, T2 inserts a new sailor; age = 96 and commits. – T3 gets an S lock on R. IS √ √√ √ – T1 (within the same transaction) checks for the oldest – OR, T3 could behave like T2; can IX √ √ sailor again and finds sailor aged 96!! use lock escalation to decide which. SIX √ • The sailor with age 96 is a “phantom tuple” from – Lock escalation dynamically asks for S √ √ courser-grained locks when too many T1’s point of view --- first it’s not there then it is. X low level locks acquired • No serial execution where T1’s result could happen!

The “Phantom” Problem – example 2 The Problem

• Consider T3 – “Find oldest sailor for each rating” • T1 and T3 implicitly assumed that they had locked the set of all sailor records satisfying – T3 locks all pages containing sailor records with rating = 1, and finds oldest sailor (say, age = 71). a predicate. – Next, T4 inserts a new sailor; rating = 1, age = 96. – Assumption only holds if no sailor records are added while they are executing! – T4 also deletes oldest sailor with rating = 2 (and, say, age = 80), and commits. – Need some mechanism to enforce this assumption. (Index locking and predicate – T3 now locks all pages containing sailor records with locking.) rating = 2, and finds oldest (say, age = 63). • T3 saw only part of T4’s effects! • Examples show that conflict serializability on reads and writes of individual items • No serial execution where T3’s result could happen! guarantees serializability only if the set of objects is fixed!

Predicate Locking Data Index Index Locking r=1 • If there is a dense index on the rating field • Grant lock on all records that satisfy some using Alternative (2), T3 should lock the logical predicate, e.g. age > 2*salary. index page containing the data entries with • Index locking is a special case of predicate rating = 1. locking for which an index supports efficient – If there are no records with rating = 1, T3 must implementation of the predicate lock. lock the index page where such a data entry – What is the predicate in the sailor example? would be, if it existed! • In general, predicate locking has a lot of • If there is no suitable index, T3 must lock all locking overhead. pages, and lock the file/table to prevent new pages from being added, to ensure that no records with rating = 1 are added or deleted.

5 Optimistic CC (Kung-Robinson) Transaction Support in SQL-92 Locking is a conservative approach in which conflicts are prevented. • SERIALIZABLE – No phantoms, all reads Disadvantages: repeatable, no “dirty” (uncommited) reads. • Lock management overhead. • REPEATABLE READS – phantoms may •Deadlock detection/resolution. happen. • Lock contention for heavily used objects. • READ COMMITTED – phantoms and • Locking is “pessimistic” because it unrepeatable reads may happen assumes that conflicts will happen. • READ UNCOMMITTED – all of them may happen. • If conflicts are rare, we might get better performance by not locking, and instead checking for conflicts at commit.

Kung-Robinson Model Validation

• Xacts have three phases: • Test conditions that are sufficient to ensure –READ: Xacts read from the database, but that no conflict occurred. make changes to private copies of objects. • Each Xact is assigned a numeric id. – VALIDATE: Check for conflicts. –Just use a timestamp. –WRITE:Make local copies of changes • Xact ids assigned at end of READ phase, just public. before validation begins. old • ReadSet(Ti): Set of objects read by Xact Ti. modified ROOT • WriteSet(Ti): Set of objects modified by Ti. objects new

Test 1 Test 2 • For all i and j such that Ti < Tj, check that: • For all i and j such that Ti < Tj, check that Ti – Ti completes before Tj begins its Write phase completes before Tj begins. + – WriteSet(Ti) ReadSet(Tj) is empty. Ti Tj Ti R V W R V W Tj R V W R V W

Does Tj read dirty data? Does Ti overwrite Tj’s writes?

6 Applying Tests 1 & 2: Serial Validation Test 3 • To validate Xact T: • For all i and j such that Ti < Tj, check that: valid = true; – Ti completes Read phase before Tj does + // S = set of Xacts that committed after Begin(T) – WriteSet(Ti) ReadSet(Tj) is empty + // (above defn implements Test 1) //The following is done in critical section – WriteSet(Ti) WriteSet(Tj) is empty. < foreach Ts in S do { Ti if ReadSet(T) intersects WriteSet(Ts) R V W start then valid = false; of } Tj critical R V W section if valid then { install updates; // Write phase Commit T } > Does Tj read dirty data? Does Ti overwrite Tj’s writes? else Restart T end of critical section

Comments on Serial Validation Overheads in Optimistic CC • Applies Test 2, with T playing the role of Tj • Must record read/write activity in ReadSet and and each Xact in Ts (in turn) being Ti. WriteSet per Xact. • Assignment of Xact id, validation, and the – Must create and destroy these sets as needed. Write phase are inside a critical section! • Must check for conflicts during validation, and – Nothing else goes on concurrently. must make validated writes ``global’’. – So, no need to check for Test 3 --- can’t happen. – Critical section can reduce concurrency. – If Write phase is long, major drawback. – Scheme for making writes global can reduce clustering of objects. • Optimization for Read-only Xacts: • Optimistic CC restarts Xacts that fail validation. – Don’t need critical section (because there is no – Work done so far is wasted; requires clean-up. Write phase).

Other Techniques Summary • Correctness criterion for isolation is “serializability”. • Timestamp CC: Give each object a read- – In practice, we use “conflict serializability”, which is timestamp (RTS) and a write-timestamp (WTS), somewhat more restrictive but easy to enforce. give each Xact a timestamp (TS) when it begins: • Two Phase Locking, and Strict 2PL: Locks directly – If action ai of Xact Ti conflicts with action aj implement the notions of conflict. of Xact Tj, and TS(Ti) < TS(Tj), then ai must – The lock manager keeps track of the locks issued. Deadlocks can either be prevented or detected. occur before aj. Otherwise, restart violating • Must be careful if objects can be added to or Xact. removed from the database (“phantom problem”). • Multiversion CC: Let writers make a “new” • Index locking common, affects performance copy while readers use an appropriate “old” significantly. copy. – Needed when accessing records via index. – Advantage is that readers don’t need to get locks – Needed for locking logical sets of records (index – Oracle uses a simple form of this. locking/predicate locking).

7 Summary (Contd.) Summary (Contd.)

• Multiple granularity locking reduces the overhead • Optimistic CC aims to minimize CC overheads in involved in setting locks for nested collections of an ``optimistic’’ environment where reads are objects (e.g., a file of pages); common and writes are rare. – should not be confused with tree index locking! • Optimistic CC has its own overheads however; • Tree-structured indexes: most real systems use locking. – Straightforward use of 2PL very inefficient. • There are many other approaches to CC that we – Idea is to use 2PL on data to ensure serializability and use other protocols on tree to ensure structural integrity. don’t cover here. These include: – Bayer-Schkolnick illustrates potential for improvement. – timestamp-based approaches – multiple-version approaches – semantic approaches

8