Database Management Systems Introduction Transaction ACID
Database Management Systems: Transaction, Concurrency and Recovery
Adapted from lecture notes by Goldberg @ Berkeley

Introduction
What is Concurrent Process (CP)?
• Multiple users access databases and use computer systems simultaneously.
• Example: airline reservation system.
  – An airline reservation system is used by hundreds of travel agents and reservation clerks concurrently.
Why Concurrent Process?
• Better transaction throughput and response time
• Better utilization of resources

Transaction
• What is a transaction?
  – A sequence of many actions which are considered to be one atomic unit of work.
• Basic operations ("actions") a transaction can include:
  – Reads, writes
  – Special actions: commit, abort

ACID Properties of Transactions
• Atomicity: A transaction is either performed in its entirety or not performed at all. This is the DBMS's responsibility.
• Consistency: A transaction must take the database from one consistent state to another if it is executed in isolation. It is the user's responsibility to ensure consistency.
• Isolation: A transaction should appear as though it is being executed in isolation from other transactions.
• Durability: Changes applied to the database by a committed transaction must persist, even if the system fails before all changes are reflected on disk.

Concurrent Transactions
[Figure: interleaved processing on a single CPU versus parallel processing on CPU1 and CPU2, showing transactions A and B over time t1 to t2.]

Schedules
• What is a schedule?
  – A schedule S of n transactions T1, T2, …, Tn is an ordering of the operations of the transactions subject to the constraint that, for each transaction Ti that participates in S, the operations of Ti in S must appear in the same order in which they occur in Ti.
  – Example: Sa: r1(A), r2(A), w1(A), w2(A), a1, c2;

  T1            T2
  Read(A)
                Read(A)
  Write(A)
                Write(A)
  Abort T1
                Commit T2

Oops, something's wrong
• Problems can occur when concurrent transactions execute in an uncontrolled manner.
• Examples: reserving a seat for a flight, and another example with two concurrent updates to a value A.
• Reserving a seat: with concurrent access to data in a DBMS, two users may try to book the same seat simultaneously.

  time
   |   Agent 1 finds seat 35G empty
   |   Agent 2 finds seat 35G empty
   |   Agent 1 sets seat 35G occupied
   v   Agent 2 sets seat 35G occupied

• Example of one problem:
  – A originally equals 100. After executing T1 (add 10 to A) and T2 (subtract 8 from A), A is supposed to be 100+10-8=102.

  T1 (add 10 to A)    T2 (subtract 8 from A)    Value of A on the disk
  Read(A)                                       100
  A=A+10                                        100
                      Read(A)                   100
                      A=A-8                     100
  Write(A)                                      110
                      Write(A)                  92

  – The value left on disk is 92, not 102: T2 overwrites T1's update.

What Can Go Wrong?
• Concurrent processing may end up violating the Isolation property of transactions if not carefully scheduled
• A transaction may be aborted before it commits
  - undo the uncommitted transactions
  - undo transactions that see the uncommitted changes before the crash

Conflict operations
• Two operations in a schedule are said to conflict if they satisfy all three of the following conditions:
  (1) They belong to different transactions
  (2) They access the same item A
  (3) At least one of the operations is a write(A)
• Example in Sa: r1(A), r2(A), w1(A), w2(A), a1, c2;
  – r1(A), w2(A) conflict, and so do r2(A), w1(A)
  – r1(A), w1(A) do not conflict because they belong to the same transaction
  – r1(A), r2(A) do not conflict because they are both read operations

Serializability of schedules
• Serial
  – A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule. (No interleaving occurs in a serial schedule.)
• Serializable
  – A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions.
• Schedules are conflict equivalent if:
  – they have the same sets of actions, and
  – each pair of conflicting actions is ordered in the same way
• Conflict Serializable
  – A schedule is said to be conflict serializable if it is conflict equivalent to a serial schedule

Characterizing Schedules
1. Avoid cascading abort (ACA)
• Aborting T1 requires aborting T2!
  – Cascading abort
• An ACA (avoids cascading abort) schedule:
  – A transaction (Xact) only reads data from committed Xacts.

  No                          Yes
  T1          T2              T1          T2
  Read(A)                     Read(A)
  Write(A)                    Write(A)
              Read(A)         Commit
              Write(A)                    Read(A)
  Abort                                   Write(A)

2. Recoverable
• Aborting T1 requires aborting T2!
  – But T2 has already committed!
• A recoverable schedule is one in which this cannot happen.
  – i.e. a Xact commits only after all the Xacts it "depends on" (i.e. it reads from) commit.
  – ACA implies recoverable (but not vice versa!).

  No                          Yes
  T1          T2              T1          T2
  Read(A)                     Read(A)
  Write(A)                    Write(A)
              Read(A)                     Read(A)
              Write(A)                    Write(A)
              Commit          Commit
  Abort                                   Commit

3. Strict schedule

  T1          T2
  Read(A)
  Write(A)
  Commit
              Read(A)
              Write(A)
              Commit

Venn Diagram for Schedules
[Figure: Venn diagram. Within All Schedules, View Serializable contains Conflict Serializable; Recoverable contains ACA, which contains Strict; Serial schedules lie inside all of these sets.]

Example
T1:W(X), T2:R(Y), T1:R(Y), T2:R(X), C2, C1
• serializable: Yes, equivalent to T1,T2
• conflict-serializable: Yes, conflict-equivalent to T1,T2
• recoverable: No. Yes, if C1 and C2 are switched.
• ACA: No. Yes, if T1 commits before T2 reads X.
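The conflict rules and the definition of conflict serializability above can be checked mechanically. The sketch below uses a precedence graph, a standard test that these slides do not show: it draws an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj, and treats the schedule as conflict serializable when the graph has no cycle. The function names and the string format for schedules are assumptions made for illustration, and commit/abort actions are left out because they do not take part in conflicts.

```python
from itertools import combinations


def parse(schedule):
    """Turn 'r1(A) w2(A)' into [(op, txn, item), ...] (illustrative format)."""
    ops = []
    for token in schedule.replace(",", " ").split():
        op, rest = token[0], token[1:]
        txn, item = rest.split("(")
        ops.append((op, int(txn), item.rstrip(")")))
    return ops


def conflicts(a, b):
    """The three conditions: different transactions, same item, at least one write."""
    return a[1] != b[1] and a[2] == b[2] and "w" in (a[0], b[0])


def precedence_edges(ops):
    """Edge (Ti, Tj) for every conflicting pair where Ti's operation comes first."""
    # combinations() keeps schedule order, so a always precedes b here.
    return {(a[1], b[1]) for a, b in combinations(ops, 2) if conflicts(a, b)}


def has_cycle(edges):
    """Depth-first search for a cycle in the directed precedence graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    visiting, done = set(), set()

    def dfs(u):
        visiting.add(u)
        for v in graph.get(u, ()):
            if v in visiting or (v not in done and dfs(v)):
                return True
        visiting.discard(u)
        done.add(u)
        return False

    return any(dfs(u) for u in list(graph) if u not in done)


def conflict_serializable(schedule):
    return not has_cycle(precedence_edges(parse(schedule)))


# Sa from the slides (reads and writes only).
sa = "r1(A) r2(A) w1(A) w2(A)"
# The example schedule T1:W(X), T2:R(Y), T1:R(Y), T2:R(X).
example = "w1(X) r2(Y) r1(Y) r2(X)"

print(precedence_edges(parse(sa)), conflict_serializable(sa))
print(precedence_edges(parse(example)), conflict_serializable(example))
```

For Sa the graph contains both T1 → T2 and T2 → T1, so no serial order can respect every conflict. For the example schedule the only edge is T1 → T2, which agrees with the slide's answer that it is conflict-equivalent to T1,T2.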
Sample Transaction (informal)
• Example: Move $40 from checking to savings account
• To user, appears as one activity
• To database:
  – Read balance of checking account: read(X)
  – Read balance of savings account: read(Y)
  – Subtract $40 from X
  – Add $40 to Y
  – Write new value of X back to disk
  – Write new value of Y back to disk

Sample Transaction (Formal)
       T1
  t0   read_item(X);
       read_item(Y);
       X:=X-40;
       Y:=Y+40;
       write_item(X);
  tk   write_item(Y);

Focus on concurrency control
• Real DBMS does not test for serializability
  – Very inefficient since transactions are continuously arriving
  – Would require a lot of undoing
• Solution: concurrency protocols
• If followed by every transaction, and enforced by the transaction processing system, they guarantee serializability of schedules

Concurrency Control Through Locks
• Lock: variable associated with each data item
  – Describes status of item wrt operations that can be performed on it
• Binary locks: Locked/unlocked
• Multiple-mode locks: Read/write
• Three operations
  – read_lock(X)
  – write_lock(X)
  – unlock(X)
• Each data item can be in one of three lock states (read-locked, write-locked, unlocked)

Two Transactions
  T1                  T2
  read_lock(Y);       read_lock(X);
  read_item(Y);       read_item(X);
  unlock(Y);          unlock(X);
  write_lock(X);      write_lock(Y);
  read_item(X);       read_item(Y);
  X:=X+Y;             Y:=X+Y;
  write_item(X);      write_item(Y);
  unlock(X);          unlock(Y);

Let's assume serial schedule S1: T1;T2

Locks Alone Don't Do the Trick!
Let's run T1 and T2 in interleaved fashion

Schedule S
  T1                  T2
  read_lock(Y);
  read_item(Y);
  unlock(Y);                              ← unlocked too early!
                      read_lock(X);
                      read_item(X);
                      unlock(X);
                      write_lock(Y);
                      read_item(Y);
                      Y:=X+Y;
                      write_item(Y);
                      unlock(Y);
  write_lock(X);
  read_item(X);
  X:=X+Y;
  write_item(X);
  unlock(X);
→ Non-serializable!
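Replaying schedule S on concrete values shows why it is not serializable. The sketch below is a minimal simulation, assuming the slides' initial values X=20 and Y=30. Lock and unlock steps are omitted because S never violates a lock and only the order of reads and writes affects the outcome. The step encoding and the `run` helper are made up for illustration.

```python
def run(schedule, db):
    """Execute (txn, action, item) steps in order and return the final database."""
    db = dict(db)                  # work on a copy of the on-disk values
    local = {"T1": {}, "T2": {}}   # each transaction's private copies of items
    for txn, action, item in schedule:
        if action == "read":       # read_item: copy the disk value
            local[txn][item] = db[item]
        elif action == "add":      # item := X + Y, computed on private copies
            local[txn][item] = local[txn]["X"] + local[txn]["Y"]
        elif action == "write":    # write_item: copy the private value to disk
            db[item] = local[txn][item]
    return db


# T1 computes X := X + Y, T2 computes Y := X + Y.
T1 = [("T1", "read", "Y"), ("T1", "read", "X"), ("T1", "add", "X"), ("T1", "write", "X")]
T2 = [("T2", "read", "X"), ("T2", "read", "Y"), ("T2", "add", "Y"), ("T2", "write", "Y")]

# Schedule S: T1 reads Y, then all of T2 runs, then the rest of T1.
S = [T1[0]] + T2 + T1[1:]

initial = {"X": 20, "Y": 30}
serial_t1_t2 = run(T1 + T2, initial)
serial_t2_t1 = run(T2 + T1, initial)
interleaved = run(S, initial)

print("T1;T2:", serial_t1_t2)
print("T2;T1:", serial_t2_t1)
print("S:    ", interleaved)
```

The interleaved result differs from both serial results, because T1 uses the value of Y it read before T2 changed it.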
Results for initial values X=20, Y=30
• Serial schedule S1 (T1;T2): X=50, Y=80
• Interleaved schedule S: X=50, Y=50

Two-Phase Locking (2PL)
• Def.: A transaction is said to follow the two-phase-locking protocol if all locking operations precede the first unlock operation

Example
  T1'                 T2'
  read_lock(Y);       read_lock(X);
  read_item(Y);       read_item(X);
  write_lock(X);      write_lock(Y);
  unlock(Y);          unlock(X);
  read_item(X);       read_item(Y);
  X:=X+Y;             Y:=X+Y;
  write_item(X);      write_item(Y);
  unlock(X);          unlock(Y);

• Both T1' and T2' follow the 2PL protocol
• Any schedule including T1' and T2' is guaranteed to be serializable
• Limits the amount of concurrency

Variations to the Basic Protocol
• Previous technique known as basic 2PL
• Conservative (static) 2PL: Lock all items needed BEFORE execution begins by predeclaring its read and write set
  – If any of the items in the read or write set is already locked (by other transactions), the transaction waits (does not acquire any locks)

Deadlock in 2PL
T1 writes to A, B; T2 reads from A, B

  T1                  T2
  write_lock(A)
                      read_lock(B)
  write_lock(B)                           ← T1 waits for T2 to unlock B
                      read_lock(A)        ← T2 waits for T1 to unlock A
  (the remaining operations and unlocks of both transactions are never reached)

• Deadlock
  – T1 waits for T2 to unlock B
  – T2 waits for T1 to unlock A
  – Neither can proceed!
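The deadlock above can be reproduced with a toy lock table. The sketch below is an illustration under assumptions, not how any particular DBMS manages locks: it grants a request when it is compatible with locks held by other transactions (only read with read), otherwise it blocks the requester and records who it waits for. The `replay` helper and its data layout are hypothetical, and the final check covers only the two-transaction case from the slide (mutual waiting), not cycles among three or more transactions.

```python
def replay(requests):
    """Process (txn, mode, item) lock requests in order.

    Returns the set of (waiter, holder) pairs that arise.
    """
    holders = {}       # item -> {txn: mode} for locks currently granted
    blocked = set()    # transactions that are waiting and issue nothing further
    waits_for = set()
    for txn, mode, item in requests:
        if txn in blocked:
            continue
        others = {t: m for t, m in holders.get(item, {}).items() if t != txn}
        # Two locks on one item are compatible only if both are read locks.
        compatible = all(m == "read" and mode == "read" for m in others.values())
        if compatible:
            holders.setdefault(item, {})[txn] = mode
        else:
            blocked.add(txn)
            waits_for.update((txn, holder) for holder in others)
    return waits_for


# Lock requests in the order shown on the slide.
requests = [
    ("T1", "write", "A"),
    ("T2", "read", "B"),
    ("T1", "write", "B"),   # B is read-locked by T2, so T1 must wait
    ("T2", "read", "A"),    # A is write-locked by T1, so T2 must wait
]

edges = replay(requests)
deadlock = ("T1", "T2") in edges and ("T2", "T1") in edges
print(sorted(edges), "deadlock:", deadlock)
```

If T2 requested only read locks on items that T1 also only reads, every request would be granted and no waiting would arise.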
Variations to the Basic Protocol
• Conservative 2PL: deadlock free but not very realistic
• Strict 2PL: Transaction does not release its write locks until AFTER it aborts/commits
  – Not deadlock free but guarantees recoverable schedules (strict schedule: a transaction can neither read nor write X until the last transaction that wrote X has committed/aborted)
  – Most popular variation of 2PL

The Phantom Problem
• The concurrency control problem for insertion and deletion in database
• Example: A local bank

  Accounts
  acc_num   branch      amount
  99        Easton      $100
  120       Allentown   $500
  190       Easton      $200

  Assets
  branch      assets
  Easton      $300
  Allentown   $500

Two Transactions
• T1 wants to verify that the accounts at the Easton branch add up to the total assets of the Easton branch
• T2 wants to add a new account (150, 'Easton', $50) to the accounts table
• Write schedules for both transactions following the 2-phase locking protocol

Schedule following 2PL
  T1                             T2
  read_lock(Accounts[99]);       write_lock(Accounts[150]);
  read_lock(Accounts[190]);      write_item(Accounts[150, 'Easton', $50]);
  read_item(Accounts[99]);       write_lock(Assets[Easton]);
  read_item(Accounts[190]);      write_item(Assets[Easton, $350]);
  read_lock(Assets[Easton]);     unlock(Accounts[150]);
  read_item(Assets[Easton]);     unlock(Assets[Easton]);
  unlock(Accounts[99]);
  unlock(Accounts[190]);
  unlock(Assets[Easton]);

When Will the Phantom Problem Occur?
• When T1 and T2 are interleaved, the Phantom Problem may occur
  – See the schedule above
• Does it mean 2PL is not suitable for insertion and deletion in database?
  – No.

Index Locking
• Suppose access to a table is controlled by a B-tree
• Should we use 2PL on B-tree?
  – 2PL says that a transaction must acquire all the locks before it can release any
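The rule just restated (acquire all locks before releasing any) is easy to check for a single transaction. The sketch below is a minimal checker, assuming operations are written as strings in the slides' notation. The function name `follows_2pl` is made up for illustration.

```python
def follows_2pl(ops):
    """Return True if no lock is requested after the first unlock."""
    unlocked = False
    for op in ops:
        name = op.split("(")[0].strip()
        if name == "unlock":
            unlocked = True            # the shrinking phase has begun
        elif name in ("read_lock", "write_lock") and unlocked:
            return False               # a lock request in the shrinking phase
    return True


# T1 from the "Locks Alone Don't Do the Trick!" example: Y is unlocked too early.
t1_early_unlock = [
    "read_lock(Y)", "read_item(Y)", "unlock(Y)",
    "write_lock(X)", "read_item(X)", "write_item(X)", "unlock(X)",
]

# T1 from the phantom example: all three locks precede the first unlock.
t1_phantom = [
    "read_lock(Accounts[99])", "read_lock(Accounts[190])",
    "read_item(Accounts[99])", "read_item(Accounts[190])",
    "read_lock(Assets[Easton])", "read_item(Assets[Easton])",
    "unlock(Accounts[99])", "unlock(Accounts[190])", "unlock(Assets[Easton])",
]

print(follows_2pl(t1_early_unlock))
print(follows_2pl(t1_phantom))
```

Passing this check says nothing about phantoms: the phantom example's T1 follows 2PL, yet it never locks the account that T2 inserts.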