
Chapter 2: Concurrency Control Basics

Outline
• Introduction/problems
• Definitions (transaction, history, conflict, equivalence, ...)
• Locking

Klemens Böhm, Distributed Data Management: Concurrency Control Basics

Atomicity, Isolation
• Transactional guarantees: in particular, atomicity and isolation.
• Atomicity
  - Example, "bank scenario":

      Number | Person  | Balance
             | Klemens | 5000
             | Gunter  | 200

  - Money transfer: two elementary operations.
    - debit(Klemens, 500),
    - credit(Gunter, 500).
• Isolation can be explained with this example, too.
• Transactions.

Synchronisation, Distributed (1)
• Essential feature: many users can access the same data concurrently, be it for reading or for writing.
• Consistency must be guaranteed. This is the task of the synchronization component.
• Multi-user mode shall be hidden from users as far as possible: concurrent processing of requests shall be transparent ('illusion' of being the only user).

Synchronisation, Distributed (2)
• Serial execution of application programs
  - achieves that illusion without any synchronization effort,
  - guarantees consistency at the end of each program,
  - but causes extremely long delays and insufficient utilization of resources. (Processor is idle during communication and I/O.)

Synchronization in General
Uncontrolled non-serial execution leads to other problems, notably inconsistency:
• lost updates,
• inconsistent analysis ("non-repeatable read"),
• dirty reads, i.e., reads of uncommitted updates,
• phantoms.

Lost Update
• Program T1 transfers EUR 300 from Account A to Account B. Program T2 credits 3 % interest to Account A.
• Interest credited in Step 5 by T2 is lost, because the value is overwritten in Step 6 by T1.

  Step  T1               T2
  1     Read(A, a1)
  2     a1 := a1-300
  3                      Read(A, a2)
  4                      a2 := a2*1.03
  5                      Write(A, a2)
  6     Write(A, a1)
  7     Read(B, b1)
  8     b1 := b1+300
  9     Write(B, b1)

Dirty Read
• Program T2 credits interest based on a value that is not part of a consistent state.
• Namely, T1 is aborted later on.

  Step  T1               T2
  1     Read(A, a1)
  2     a1 := a1-300
  3     Write(A, a1)
  4                      Read(A, a2)
  5                      a2 := a2*1.03
  6                      Write(A, a2)
  7                      commit
  8     Read(B, b1)
  9     …
  10    abort            (The abort is not necessarily invoked by the user.)
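The lost-update interleaving can be replayed in a few lines. The following is a minimal sketch, not part of the original slides: the starting balances (A = 1000, B = 0) and the helper name `run` are assumptions made for illustration. Step numbers follow the Lost Update table.

```python
def run(order, balance_a=1000.0, balance_b=0.0):
    """Replay the numbered steps of the Lost Update table in the given order.

    The starting balances are hypothetical; the slides do not state them.
    """
    db = {"A": balance_a, "B": balance_b}   # shared database state
    local = {}                              # program variables a1, a2, b1
    steps = {
        1: lambda: local.update(a1=db["A"]),             # T1: Read(A, a1)
        2: lambda: local.update(a1=local["a1"] - 300),   # T1: a1 := a1-300
        3: lambda: local.update(a2=db["A"]),             # T2: Read(A, a2)
        4: lambda: local.update(a2=local["a2"] * 1.03),  # T2: a2 := a2*1.03
        5: lambda: db.update(A=local["a2"]),             # T2: Write(A, a2)
        6: lambda: db.update(A=local["a1"]),             # T1: Write(A, a1)
        7: lambda: local.update(b1=db["B"]),             # T1: Read(B, b1)
        8: lambda: local.update(b1=local["b1"] + 300),   # T1: b1 := b1+300
        9: lambda: db.update(B=local["b1"]),             # T1: Write(B, b1)
    }
    for number in order:
        steps[number]()
    return db


interleaved = run(range(1, 10))                # order of the table
serial = run([1, 2, 6, 7, 8, 9, 3, 4, 5])      # T1 completely before T2
print(interleaved["A"], serial["A"])
```

In the interleaved run, A ends at 700 and the interest is gone. In the serial run T1;T2, A ends at about 721, because T2 reads the value that T1 has already written.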

Non-Repeatable Reads
• Program reads a data object more than once and sees the modification of another program.

  Step  T1               T2
  1     Read(A, a1)
  2     a1 := a1-300
  3     Write(A, a1)
  4                      Read(A, a2)
  5                      a2 := a2*1.03
  6                      Write(A, a2)
  7     Read(A, a3)
  8     …

• Explain why this is neither a lost update nor a dirty read.

Transactions
• Execution of a program that manipulates the database.
• Representation of the execution that identifies
  - the reads and writes,
  - the order of their execution,
  - whether there is a commit or an abort at the end.

Transactions
• Example:

    Procedure P begin
      Start;
      temp := Read(x);
      temp := temp + 1;
      Write(x, temp);
      Commit
    end

• Representation: r1[x] → w1[x] → c1
• Transaction is a partial order (Σ, <). (Σ will mostly be omitted in what follows.)

Conflict
• Two operations p, q conflict := p, q operate on the same data object, and p or q is a write.
• With further operations, the definition of conflict must be extended.
• Example. Compatibility matrix (y = compatible, n = conflict):

               Read  Write  Increment  Decrement
    Read       y     n      n          n
    Write      n     n      n          n
    Increment  n     n      y          y
    Decrement  n     n      y          y
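The compatibility matrix translates directly into a lookup table. A minimal sketch follows; the representation of an operation as an `(operation, data_object)` pair and the names `COMPATIBLE` and `conflict` are assumptions for illustration.

```python
# Compatibility matrix from the slide: True means the two operation
# types are compatible, False means they conflict.
COMPATIBLE = {
    "read":      {"read": True,  "write": False, "increment": False, "decrement": False},
    "write":     {"read": False, "write": False, "increment": False, "decrement": False},
    "increment": {"read": False, "write": False, "increment": True,  "decrement": True},
    "decrement": {"read": False, "write": False, "increment": True,  "decrement": True},
}


def conflict(p, q):
    """p and q are hypothetical (operation, data_object) pairs."""
    (op_p, obj_p), (op_q, obj_q) = p, q
    # Operations on different data objects never conflict.
    return obj_p == obj_q and not COMPATIBLE[op_p][op_q]


print(conflict(("read", "x"), ("write", "x")))           # conflicting
print(conflict(("increment", "x"), ("decrement", "x")))  # compatible
```

Restricted to reads and writes, the table reproduces the basic definition: two operations on the same object conflict as soon as one of them is a write.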

Transaction – Formal Definition
Transaction Ti is a partial order with ordering relation <, s.t.
1. Ti ⊆ {ri[x], wi[x] | x is a data object} ∪ {ai, ci};
2. ai ∈ Ti ⇔ ci ∉ Ti;
3. if t is ci or ai, then for each other operation p ∈ Ti holds: p < t.

Reads-from Relationship between Transactions
• Transaction Tj reads-from transaction Ti in a certain execution if
  1. Tj reads x after Ti has written x;
  2. Ti does not abort before Tj reads x; and
  3. every transaction that writes x between the write of Ti and the read of Tj aborts before Tj reads x.
• Reads-from relationships in the example: T3 from T2, T4 from T2. Nothing else.

Histories (1)
• Execution of the operations of several transactions that are interleaved with each other, i.e., concurrent.
• Formally: let T = {T1, T2, …, Tn} be a set of transactions. Complete history H over T := partial order with order relation <H.

Histories – Examples
• Two transactions given:
  [Figure: T1 consists of r1[x] w1[x] c1 and r1[y] w1[y]; T2 consists of r2[x] w2[x] c2]
• An OK complete history:
  [Figure: history over T1 and T2 with the operations r2[x] w2[x] c2, r1[x] w1[x] c1, r1[y] w1[y]]

Prefix Commit-Closedness (1)
• A characteristic of histories is prefix commit-closed if the following holds:
  characteristic holds for H ⇒ characteristic holds for C(H'), where H' is a prefix of H.
  (C(H) := committed projection, i.e., only operations from committed transactions)
• Note that we do not take just any prefix into account, only committed projections.

Prefix Commit-Closedness (2)
H = o1 ... on (linear, to keep things simple)
Prefixes: H' = o1 ... ol, H'' = o1 ... om
• α = "history contains fewer than 10 operations": if α holds for H, it also holds for H' and H'' and any other prefix. α is prefix commit-closed.
• β = "all operations are reads": if β holds for H, it also holds for H' and H'' and any other prefix. β is prefix commit-closed.
• γ = "history contains more than 10 operations": γ may not hold for H' even if it holds for H. γ is not prefix commit-closed.
• Come up with an example of your own.

Prefix Commit-Closedness (3)
• Rationale:
  - A correctness criterion for histories must have this characteristic.
  - The scheduler generates the history, but also each prefix.
  - Failure of the DBMS: the history after recovery must have the characteristic as well, i.e., it should be correct, too.

Equivalence of Histories
• More than one definition:
  - conflict equivalence,
  - view equivalence.
• Definition 'conflict equivalence': histories H, H' are (conflict) equivalent if
  1. they have the same transactions and the same operations;
  2. they establish the same order of operations with conflict. I.e., let pi, qj be conflicting operations that belong to Ti and Tj, respectively, with ai, aj ∉ H. If pi <H qj, then pi <H' qj.
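The definition of conflict equivalence can be checked mechanically for linear histories. The sketch below rests on several assumptions: histories are given as strings such as `"r1[x] w1[x] c1"`, they contain no aborts, and each transaction performs each operation at most once. The histories `h1`, `h2`, `h3` are made-up linearizations over T1 and T2 and are not taken from the slides.

```python
import re


def parse(history):
    """Split 'r1[x] c1' into (kind, transaction, object) triples; object is '' for c/a."""
    return [(k, int(t), x)
            for k, t, x in re.findall(r"([rwca])(\d+)(?:\[(\w+)\])?", history)]


def conflict_pairs(history):
    """All ordered pairs (p, q) of conflicting operations with p before q."""
    ops = [op for op in parse(history) if op[0] in "rw"]
    pairs = set()
    for i, p in enumerate(ops):
        for q in ops[i + 1:]:
            different_txn = p[1] != q[1]
            same_object = p[2] == q[2]
            one_is_write = "w" in (p[0], q[0])
            if different_txn and same_object and one_is_write:
                pairs.add((p, q))
    return pairs


def conflict_equivalent(h1, h2):
    same_operations = sorted(parse(h1)) == sorted(parse(h2))
    return same_operations and conflict_pairs(h1) == conflict_pairs(h2)


h1 = "r2[x] w2[x] c2 r1[x] w1[x] r1[y] w1[y] c1"
h2 = "r2[x] w2[x] r1[y] c2 r1[x] w1[y] w1[x] c1"   # non-conflicting steps reordered
h3 = "r1[x] r2[x] w2[x] c2 w1[x] r1[y] w1[y] c1"   # r1[x] now precedes w2[x]
print(conflict_equivalent(h1, h2), conflict_equivalent(h1, h3))
```

`h2` differs from `h1` only in the position of operations that do not conflict, so the two are conflict-equivalent without being identical. `h3` reverses the order of a conflicting pair and is therefore not equivalent to `h1`.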


Equivalence of Histories – Examples (1)
• Two transactions given:
  [Figure: T1 consists of r1[x] w1[x] c1 and r1[y] w1[y]; T2 consists of r2[x] w2[x] c2]
• An OK complete history:
  [Figure: history over T1 and T2 with the operations r2[x] w2[x] c2, r1[x] w1[x] c1, r1[y] w1[y]]
• Give an example of a conflict-equivalent history that is not identical.

Equivalence of Histories – Examples (2)
All transactions eventually commit. (Same below.)
• History 1:

  Step  T1 T2 T3
  1     Read(A)
  2     Write(A)
  3     Write(A)
  4     Write(A)

• History 2:

  Step  T1 T2 T3
  1     Read(A)
  2     Write(A)
  3     Write(A)
  4     Write(A)

• Are these histories conflict-equivalent?

View Equivalence (1)
• Auxiliary definition, final write of x in a history:
  operation wi[x] ∈ H such that ai ∉ H, and for each wj[x] ∈ H (j≠i) either wj[x] < wi[x] or aj ∈ H.

View Equivalence (2)
• View equivalence := two histories H, H' are equivalent if
  1. they have the same transactions and the same operations;
  2. they have the same reads-from relationships;
  3. they have the same final writes.
  [Figure: history over T1 and T2 with the operations r2[x] w2[x] c2, r1[x] w1[x] c1, r1[y] w1[y]]

View-Equivalence of Histories – Example
All transactions eventually commit. (Same below.)
• History 1:

  Step  T1 T2 T3
  1     Read(A)
  2     Write(A)
  3     Write(A)
  4     Write(A)

• History 2:

  Step  T1 T2 T3
  1     Read(A)
  2     Write(A)
  3     Write(A)
  4     Write(A)

• Are these histories view-equivalent?

Serializability (1)
• Committed projection of history H, abbreviation C(H) := results from H by removing all operations that are not committed.
• H is (conflict) serializable if C(H) is equivalent to a serial history HS.
• Is this history serializable? If so, what is the equivalent serial history?
  [Figure: history over T1 and T2 with the operations r2[x] w2[x] c2, r1[x] w1[x] c1, r1[y] w1[y]]

Serializability (2)
• Obviously, conflict serializability is prefix commit-closed.
• Illustration:
  [Figure: history over T1 and T2 with the operations r2[x] w2[x] c2, r1[x] w1[x] c1, r1[y] w1[y]]

Serializability (3)
• H is view serializable if for each prefix H', C(H') is view equivalent to a serial history. (C(H'): committed projection)
• Why does this definition include each prefix?
  - View serializability shall be prefix commit-closed.
  - Example:
    - H12 = w1[x] w2[x] w2[y] c2 w1[y] c1 w3[x] w3[y] c3. H12 = C(H12), and it is view equivalent to T1 T2 T3.
    - H'12 = w1[x] w2[x] w2[y] c2 w1[y] c1 is a prefix of H12. Its final writes are w2[x] and w1[y], so it is view equivalent neither to T1 T2 nor to T2 T1.
    - According to the definition, H12 is not view serializable.
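The H12 example can be verified by brute force. The following sketch makes several assumptions: histories are linear strings without aborts, view equivalence is tested by comparing reads-from relationships and final writes, and serial candidates are enumerated over all permutations (feasible only for a handful of transactions). All function names are illustrative.

```python
import re
from itertools import permutations


def parse(history):
    """Split 'w1[x] c1' into (kind, transaction, object) triples; object is '' for commits."""
    return [(k, int(t), x)
            for k, t, x in re.findall(r"([rwc])(\d+)(?:\[(\w+)\])?", history)]


def committed_projection(ops):
    committed = {t for k, t, _ in ops if k == "c"}
    return [op for op in ops if op[1] in committed]


def view(ops):
    """Reads-from triples and final writer per object of a linear history."""
    last_writer, reads_from = {}, set()
    for k, t, x in ops:
        if k == "r":
            reads_from.add((t, x, last_writer.get(x)))  # None: initial value
        elif k == "w":
            last_writer[x] = t
    return reads_from, last_writer


def equivalent_serial_orders(ops):
    """All serial orders of the transactions in ops that are view equivalent to ops."""
    txns = sorted({t for _, t, _ in ops})
    result = []
    for order in permutations(txns):
        serial = [op for t in order for op in ops if op[1] == t]
        if view(serial) == view(ops):
            result.append(order)
    return result


def view_serializable(history):
    ops = parse(history)
    # The definition quantifies over every prefix H' and looks at C(H').
    return all(equivalent_serial_orders(committed_projection(ops[:n]))
               for n in range(1, len(ops) + 1))


H12 = "w1[x] w2[x] w2[y] c2 w1[y] c1 w3[x] w3[y] c3"
print(equivalent_serial_orders(parse(H12)))  # serial orders equivalent to H12 itself
print(view_serializable(H12))                # fails because of the prefix H'12
```

The first call confirms that H12 as a whole is view equivalent to T1 T2 T3. The second call returns `False`, because the prefix ending with c1 has no view-equivalent serial history.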


Serializability (4)
• H conflict serializable ⇒ H view serializable, but not vice versa.
• Example: H13 = w1[x] w2[x] w2[y] c2 w1[y] w3[x] w3[y] c3 w1[z] c1
• Is it serializable?

Recoverability (1)
• 'Commit of a transaction': DBMS cannot abort it any more.
• Transaction may issue commit, but the DBMS decides on the point of time when it is executed.
• Commit T only when all modifications of data objects that T has read are committed.
• Counterexample: Dirty Read.

  Step  T1               T2
  1     Read(A, a1)
  2     a1 := a1-300
  3     Write(A, a1)
  4                      Read(A, a2)
  5                      a2 := a2*1.03
  6                      Write(A, a2)
  7                      commit
  8     Read(B, b1)
  9     …
  10    abort

Recoverability (2)
• Execution is recoverable := commit of T follows the commit of each transaction from which T has read.

Cascading Aborts (1)
• Transaction T issues an abort operation when it cannot terminate correctly.
• Effects of T must be undone:
  - writes of T,
  - effects of all other transactions that have seen such writes.
• Undo may lead to a cascading abort.

Cascading Aborts (2)
• Example of a cascading abort:
  - Initially, x and y both have value 1.
  - Two transactions T1 and T2.
  - Order of operations: Write1(x, 2); Read2(x); Write2(y, 3).
  - Undo of Write1(x, 2) ⇒ T2 must abort as well.
• Cascading aborts may happen even though the history is recoverable.

Cascadelessness
• Cascading aborts are undesired:
  - they require bookkeeping,
  - the number of transactions that must abort is not bounded.
• DBMS is cascadeless := each transaction only reads values of committed transactions.

Strictness
• Undo typically works with before images.
• Problem illustration, original value of x: 1
  1. w1(x, 2); w2(x, 3); abort1
  2. w1(x, 2); w2(x, 3); abort1; abort2
• strictness := write(x, val) is delayed until all transactions that have written x before have committed or aborted, + cascadelessness.

Examples
• T1 = w1[x] w1[y] w1[z] c1
• T2 = r2[u] w2[x] r2[y] w2[y] c2
• H7 = w1[x] w1[y] r2[u] w2[x] r2[y] w2[y] c2 w1[z] c1
• H8 = w1[x] w1[y] r2[u] w2[x] r2[y] w2[y] w1[z] c1 c2
• H9 = w1[x] w1[y] r2[u] w2[x] w1[z] c1 r2[y] w2[y] c2
• H10 = w1[x] w1[y] r2[u] w1[z] c1 w2[x] r2[y] w2[y] c2
• Which histories are recoverable, which ones can do without cascading aborts, which ones are strict?
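The three properties can be tested mechanically on H7 to H10. The sketch below assumes linear histories in which every transaction commits, which holds for these four examples. It would need extra bookkeeping for aborts. The function name `classify` and the string encoding are illustrative.

```python
import re


def parse(history):
    """Split 'w1[x] c1' into (kind, transaction, object) triples; object is '' for commits."""
    return [(k, int(t), x)
            for k, t, x in re.findall(r"([rwc])(\d+)(?:\[(\w+)\])?", history)]


def classify(history):
    """Return (recoverable, cascadeless, strict). Assumes every transaction commits."""
    ops = parse(history)
    commit_at = {t: i for i, (k, t, _) in enumerate(ops) if k == "c"}
    recoverable = cascadeless = strict = True
    last_writer = {}  # data object -> transaction of the most recent write
    for i, (k, t, x) in enumerate(ops):
        if k not in "rw":
            continue
        w = last_writer.get(x)
        if w is not None and w != t:
            committed_before = commit_at[w] < i
            if k == "r":  # t reads x from w
                recoverable &= commit_at[w] < commit_at[t]
                cascadeless &= committed_before
            # Strictness: neither read nor overwrite x before w has terminated.
            strict &= committed_before
        if k == "w":
            last_writer[x] = t
    return recoverable, cascadeless, strict


histories = {
    "H7":  "w1[x] w1[y] r2[u] w2[x] r2[y] w2[y] c2 w1[z] c1",
    "H8":  "w1[x] w1[y] r2[u] w2[x] r2[y] w2[y] w1[z] c1 c2",
    "H9":  "w1[x] w1[y] r2[u] w2[x] w1[z] c1 r2[y] w2[y] c2",
    "H10": "w1[x] w1[y] r2[u] w1[z] c1 w2[x] r2[y] w2[y] c2",
}
for name, history in histories.items():
    print(name, classify(history))
```

Under these assumptions, H7 is not recoverable (T2 reads y from T1 and commits first), H8 is recoverable but not cascadeless, H9 is cascadeless but not strict (w2[x] overwrites the uncommitted w1[x]), and H10 is strict.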


Serializability Graph (1)
• Delays and resets are common techniques in transaction management.
• Test if a certain schedule (= sequence of transactions and their operations) is serializable:
• generate the serializability graph or dependency graph.
• nodes = transactions,
• edge (directed) = dependency between two transactions: access to the same data object, and the operations are in conflict.

Serializability Graph (2)
• Illustration:
  [Figure: history over T1 and T2 with the operations r2[x] w2[x] c2, r1[x] w1[x] c1, r1[y] w1[y]]

Serializability Graph (3)
• Theorem: a schedule is serializable if the corresponding dependency graph is cycle-free.
• There is a partial order that can be extended to a total one: the equivalent serial schedule.

Serializability Graph – Example
• r(x)/w(x): read/write access to data object x.
• Schedule:
  [Figure: timeline with T1: w(y), r(x); T2: r(y); T3: w(x)]
• Dependency graph: T3 → T1 (via x), T1 → T2 (via y)
• acyclic → schedule serializable.
• Serialization order: T3 < T1 < T2
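The graph test can be sketched with the standard-library module `graphlib` (Python 3.9 or later). The linear schedule `w1[y] w3[x] r2[y] r1[x]` is an assumed linearization that is consistent with the dependency graph above. The exact timing in the original figure is not recoverable, and the function name is illustrative.

```python
import re
from graphlib import CycleError, TopologicalSorter


def serialization_order(schedule):
    """Build the dependency graph of a linear schedule and return one
    serialization order, or None if the graph contains a cycle."""
    ops = re.findall(r"([rw])(\d+)\[(\w+)\]", schedule)
    # TopologicalSorter expects a mapping: node -> set of predecessors.
    predecessors = {f"T{t}": set() for _, t, _ in ops}
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                predecessors[f"T{t2}"].add(f"T{t1}")  # edge T{t1} -> T{t2}
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


print(serialization_order("w1[y] w3[x] r2[y] r1[x]"))  # acyclic
print(serialization_order("r1[x] w2[x] w1[x]"))        # T1 -> T2 -> T1
```

For the first schedule the graph has the edges T3 → T1 and T1 → T2, so the only topological order is T3, T1, T2. The second schedule produces a cycle and is therefore reported as not serializable.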


Serializability Graph – Discussion
Approach is not practicable.
• Serializability of schedules can only be checked in hindsight.
• Administrative costs are too high: dependencies with terminated transactions need to be taken into account as well. E.g., the relationship between T1 and T3 only becomes clear after the commit of T3.

Locking (1)
• Lock for each data object and for each type of operation; notation: oli[x]
• DBMS acquires the lock before accessing the object.
• Transaction obtains lock, transaction releases lock.
• Locks may conflict.
  [Figure: T1 and T2 both access data object A]
• Locks conflict, just as operations do.
• Locking alone is not sufficient! (See next slide.)

'Naive Locking' is not Sufficient.
• Illustration: lost update.

  Step  T1               T2
  1     Read(A, a1)
  2     a1 := a1-300
  3                      Read(A, a2)
  4                      a2 := a2*1.03
  5                      Write(A, a2)
  6     Write(A, a1)
  7     Read(B, b1)
  8     b1 := b1+300
  9     Write(B, b1)

Locking (2)
• Growing phase, shrinking phase.
• Serializability is ensured by the two-phase locking protocol.
• Cascadelessness is ensured by the strict two-phase locking protocol: keep all locks until the end of the transaction (second phase of commit).
• Illustration:
  [Figure: history over T1 and T2 with the operations r2[x] w2[x] c2, r1[x] w1[x] c1, r1[y] w1[y]]

Locking (3)
• Simplest case: only read locks and write locks ("RX locking scheme").
• Centralized locking protocol: one designated node handles all lock requests.

Example Illustrating 2PL
• T1: r1[x] → w1[y] → c1; T2: w2[x] → w2[y] → c2
• H1 = rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1
• Here r1[x] < w2[x] and w2[y] < w1[y], so this history is not serializable. T1 acquires wl1[y] after it has released ru1[x], which violates the two-phase rule.
• With 2PL:
  H1 = rl1[x] r1[x] wl1[y] w1[y] c1 ru1[x] wu1[y] wl2[x] w2[x] wl2[y] w2[y] c2 wu2[x] wu2[y]
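Whether a history with explicit lock operations respects the two-phase rule can be checked in one pass. This is a minimal sketch: it only tests "no lock acquisition after the first release" per transaction and does not check that locks are actually held when data is accessed. The variable names are illustrative.

```python
import re


def is_two_phase(history):
    """True if no transaction acquires a lock after it has released one."""
    released = set()  # transactions that have entered their shrinking phase
    # Matches lock operations such as rl1[x], wl2[y] and unlocks ru1[x], wu2[y].
    for kind, txn in re.findall(r"\b([rw][lu])(\d+)\[\w+\]", history):
        if kind[1] == "u":
            released.add(txn)
        elif txn in released:
            return False  # growing again after shrinking has started
    return True


first = ("rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] wu2[x] "
         "wu2[y] c2 wl1[y] w1[y] wu1[y] c1")
second = ("rl1[x] r1[x] wl1[y] w1[y] c1 ru1[x] wu1[y] wl2[x] w2[x] "
          "wl2[y] w2[y] c2 wu2[x] wu2[y]")
print(is_two_phase(first), is_two_phase(second))
```

The first history fails the check because T1 requests wl1[y] after ru1[x]. The second history passes, since both transactions release their locks only after they have acquired all of them.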

Example for Deadlock with 2PL
• T1: r1[x] → w1[y] → c1; T2: w2[y] → w2[x] → c2
• Chronology:
  1. Both transactions do not have any locks initially.
  2. Scheduler receives r1[x] from TM. rl1[x] is granted, scheduler submits r1[x] to DM.
  3. Scheduler receives w2[y] from TM. wl2[y] is granted, scheduler submits w2[y] to DM.
  4. Scheduler receives w2[x] from TM. wl2[x] not possible. Delay.
  5. Scheduler receives w1[y] from TM. wl1[y] not possible. Delay.
• External reset of the deadlock is required.

Deadlock – Terminology
• Victim,
• victim selection policy,
• fair victim selection policy.
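The chronology can be replayed with a small lock table that records who waits for whom. This is a sketch under simplifying assumptions: only read and write locks exist, a blocked request is simply recorded and never retried, and the class and method names are invented for illustration. A deadlock shows up as a cycle in the waits-for relation.

```python
class LockTable:
    """Hypothetical lock table of a scheduler with read ('r') and write ('w') locks."""

    def __init__(self):
        self.held = {}       # data object -> list of (transaction, mode)
        self.waits_for = {}  # blocked transaction -> transactions it waits for

    def request(self, txn, mode, obj):
        """Grant the lock and return True, or record the wait and return False."""
        blockers = {t for t, m in self.held.get(obj, [])
                    if t != txn and "w" in (m, mode)}  # read/read is compatible
        if blockers:
            self.waits_for[txn] = blockers
            return False
        self.held.setdefault(obj, []).append((txn, mode))
        return True

    def deadlocked(self):
        """Transactions that lie on a cycle of the waits-for relation."""
        def reaches(start, current, seen):
            for nxt in self.waits_for.get(current, ()):
                if nxt == start:
                    return True
                if nxt not in seen and reaches(start, nxt, seen | {nxt}):
                    return True
            return False
        return {t for t in self.waits_for if reaches(t, t, {t})}


table = LockTable()
print(table.request("T1", "r", "x"))  # step 2: rl1[x] granted
print(table.request("T2", "w", "y"))  # step 3: wl2[y] granted
print(table.request("T2", "w", "x"))  # step 4: delayed, T2 waits for T1
print(table.request("T1", "w", "y"))  # step 5: delayed, T1 waits for T2
print(table.deadlocked())             # both transactions are on the cycle
```

After step 5 each transaction waits for a lock the other one holds. A victim selection policy would now pick one of the transactions returned by `deadlocked()` and reset it.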


Correctness of 2PL
• Lemmata. (Let H be the history generated by a 2PL scheduler.)
  1. oi[x] ∈ C(H) ⇒ oli[x], oui[x] ∈ C(H), and oli[x] < oi[x] < oui[x].
  2. pi[x] and qj[x] (i≠j) conflict ⇒ pui[x] < qlj[x] or quj[x] < pli[x].
  3. pi[x], qi[y] ∈ C(H) ⇒ pli[x] < qui[y].
• Now consider T1 → T2 → … → Tn → T1.
• Theorem: each 2PL history H is serializable.

Locking (4)
• Growing phase, shrinking phase.
• Serializability is ensured by the two-phase locking protocol.
• Cascadelessness is ensured by the strict two-phase locking protocol: keep all locks until the end of the transaction (second phase of commit).
• Illustration: Write1(x, 2); Read2(x); Write2(y, 3).
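The step from the lemmata to the theorem can be spelled out as follows. This is a sketch of the usual cycle argument. The per-edge names x_i (the data object that causes the edge from T_i to T_{i+1}) and the reuse of p and q for the conflicting operations on each edge are notational assumptions, not taken from the slides.

```latex
% Assume the serializability graph contains a cycle T_1 -> T_2 -> ... -> T_n -> T_1.
% An edge T_i -> T_{i+1} means: conflicting operations with p_i[x_i] < q_{i+1}[x_i].
\begin{align*}
p_i[x_i] < q_{i+1}[x_i]
  &\;\Rightarrow\; pu_i[x_i] < ql_{i+1}[x_i]
  && \text{(Lemmata 1 and 2)} \\
ql_{i+1}[x_i] &< pu_{i+1}[x_{i+1}]
  && \text{(Lemma 3, applied to } T_{i+1}\text{)} \\
\text{hence}\quad pu_1[x_1] &< pu_2[x_2] < \dots < pu_n[x_n] < ql_1[x_n]
  && \text{(chaining along the cycle)}
\end{align*}
% T_1 would release a lock, pu_1[x_1], before acquiring another one, ql_1[x_n].
% This contradicts Lemma 3, so the graph is acyclic and H is serializable.
```

In the first line, the alternative of Lemma 2 (qu_{i+1}[x_i] < pl_i[x_i]) is ruled out by Lemma 1, because it would place q_{i+1}[x_i] before p_i[x_i].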

Local vs. Global Transactions
[Figure: global transactions enter a coordination layer on top of several DBMS; local transactions (local TAs) go directly to their DBMS]
• Transaction is global if it refers to more than one database.

Requirements with Distribution
• System-wide serializability of all local and global transactions. Local serializability (in each node) does not suffice: serialization orders may be different. Example?
• Communication costs should be low.
• As in the centralized case: as few blockings and resets of transactions as possible. (These are the methods to deal with synchronization conflicts. They have a negative effect on throughput and response times.)
• High robustness with respect to system faults (system failure, communication problems).

Locking in the Distributed Case
• Advantage: synchronization (incl. deadlock recognition) as in a centralized DBS; the node has total knowledge.
• Disadvantages:
  - enormous communication effort required: each lock request becomes a message and its reply, plus waiting time.
  - the synchronization node is a potential bottleneck for performance and availability ("single point of failure").
  - no autonomy of nodes.
  → not really applicable for distributed databases.

Illustration – one Approach
[Figure: coordination layer on top of several DBMS]

Distributed Locking
• Data is partitioned among nodes.
• Each node synchronizes operations that access data on its partition.
• No further communication (except for replicas): distributed execution of transactions and operations is already adapted to the distribution of data.
• Release of locks with strict 2PL as part of ACP.
• Biggest problem: global deadlocks.

Example for Deadlock with 2PL
• T1: r1[x] → w1[y] → c1; T2: w2[y] → w2[x] → c2
• Chronology: same five steps as in the deadlock example above. T1 obtains rl1[x], T2 obtains wl2[y], then wl2[x] and wl1[y] are both delayed.
• External reset of the deadlock is required.

Potential Exam Questions
• Which kinds of inconsistencies in multi-user mode do you know?
• What is the rationale behind the definition of 'prefix commit-closed'?
• Explain the difference between conflict- and view-serializability.
• Which requirements should a history fulfill, in addition to serializability?
• How can we ensure serializability? How to proceed in the distributed case?
• Explain based on an example why 2PL (strict 2PL) fulfills its purpose.