
Implementing Distributed Transactions
Chapter 24

• A distributed transaction accesses resource managers distributed across a network
• When the resource managers are DBMSs, we refer to the system as a distributed database system
[Figure: an application program issuing requests to a DBMS at Site 1 and a DBMS at Site 2]

Distributed Database Systems
• Each local DBMS might export
  – stored procedures, or
  – an SQL interface
• In either case, operations at each site are grouped together as a subtransaction, and the site is referred to as a cohort of the distributed transaction
  – Each subtransaction is treated as a transaction at its site
• Coordinator module (part of the TP monitor) supports ACID properties of the distributed transaction
  – Transaction manager acts as coordinator

ACID Properties
• Each local DBMS
  – supports ACID properties locally for each subtransaction
    • Just like any other transaction that executes there
    – eliminates local deadlocks
• The additional issues are:
  – Global atomicity: all cohorts must commit or all must abort
  – Global deadlocks: there must be no deadlocks involving multiple sites
  – Global serialization: distributed transaction must be globally serializable

Global Atomicity
• All subtransactions of a distributed transaction must commit or all must abort
• An atomic commit protocol, initiated by a coordinator (e.g., the transaction manager), ensures this
  – Coordinator polls cohorts to determine if they are all willing to commit
• Protocol is supported in the xa interface between a transaction manager and a resource manager

Atomic Commit Protocol
[Figure: the application program (1) calls tx_begin at the transaction manager (coordinator), (2) accesses resources at the resource managers (cohorts), which (3) call xa_reg to register as cohorts; the application (4) calls tx_commit, and (5) the coordinator runs the atomic commit protocol with the cohorts]

Cohort Abort
• Why might a cohort abort?
  – Deferred evaluation of integrity constraints
  – Validation failure (optimistic control)
  – Deadlock
  – Crash of cohort site
  – Failure prevents communication with cohort site

Atomic Commit Protocol
• Most commonly used atomic commit protocol is the two-phase commit protocol
• Implemented as an exchange of messages between the coordinator and the cohorts
• Guarantees global atomicity of the transaction even if failures should occur while the protocol is executing

Two-Phase Commit – The Transaction Record
• During the execution of the transaction, before the two-phase commit protocol begins:
  – When the application calls tx_begin to start the transaction, the coordinator creates a transaction record for the transaction in volatile memory
  – Each time a resource manager calls xa_reg to join the transaction as a cohort, the coordinator appends the cohort's identity to the transaction record

Two-Phase Commit -- Phase 1
• When the application invokes tx_commit, the coordinator sends a prepare message to all cohorts
• prepare message (coordinator to cohort):
  – If the cohort wants to abort at any time prior to or on receipt of the message, it aborts and releases locks
  – If the cohort wants to commit, it moves all update records to mass store by forcing a prepare record to its log
    • Guarantees that the cohort will be able to commit (despite crashes) if the coordinator decides commit (since update records are durable)
    • Cohort enters the prepared state
  – Cohort sends a vote message ("ready" or "aborting"). It
    • cannot change its mind
    • retains all locks if the vote is "ready"
    • enters its uncertain period (it cannot foretell the final outcome)

Two-Phase Commit -- Phase 1
• vote message (cohort to coordinator): Cohort indicates it is "ready" to commit or is "aborting"
  – Coordinator records the vote in the transaction record
  – If any votes are "aborting", the coordinator decides abort and deletes the transaction record
  – If all are "ready", the coordinator decides commit and forces a commit record (containing the transaction record) to its log (end of Phase 1)
    • Transaction committed when the commit record is durable
    • Since all cohorts are in the prepared state, the transaction can be committed despite any failures
  – Coordinator sends a commit or abort message to all cohorts

Two-Phase Commit -- Phase 2
• commit or abort message (coordinator to cohort):
  – If commit message
    • cohort commits locally by forcing a commit record to its log
    • cohort sends a done message to the coordinator
  – If abort message, it aborts
  – In either case, locks are released and the uncertain period ends
• done message (cohort to coordinator):
  – When the coordinator receives a done message from each cohort, it writes a complete record to its log and deletes the transaction record from volatile store
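To make the message flow concrete, here is a minimal sketch of the coordinator's side of two-phase commit. The Cohort methods and the Log object are hypothetical placeholders (not the xa interface itself), and error handling and timeouts are omitted.

    # Minimal sketch of the coordinator's role in two-phase commit.
    # Cohort.send_prepare/wait_for_vote/send_decision/wait_for_done and Log
    # are hypothetical placeholders.
    class Coordinator:
        def __init__(self, log):
            self.log = log                      # stable-storage log
            self.transaction_record = []        # cohort identities (volatile)

        def register_cohort(self, cohort):      # called when a cohort does xa_reg
            self.transaction_record.append(cohort)

        def commit(self):                       # called when the application does tx_commit
            # Phase 1: poll the cohorts
            votes = [ (c, c.send_prepare() or c.wait_for_vote())
                      for c in self.transaction_record ]          # "ready" or "aborting"
            if all(v == "ready" for _, v in votes):
                # decision point: force the commit record (with the transaction
                # record) to the log; the transaction is now committed
                self.log.force("commit", list(self.transaction_record))
                decision = "commit"
            else:
                decision = "abort"              # no forced record needed (presumed abort)

            # Phase 2: propagate the decision
            for cohort in self.transaction_record:
                cohort.send_decision(decision)
            if decision == "commit":
                for cohort in self.transaction_record:
                    cohort.wait_for_done()
                self.log.write("complete")      # all done msgs received
            self.transaction_record = None      # delete volatile transaction record
            return decision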

Two-Phase Commit (commit case)
[Figure: application, coordinator, and cohort lanes separated by the xa interface. The application issues tx_commit; in Phase 1 the coordinator sends a prepare msg to the cohorts in the transaction record; each cohort forces a prepare record to its log and sends its vote msg, entering its uncertain period; the coordinator records each vote in the transaction record and, if all vote ready, forces a commit record to the coordinator log and sends a commit msg; in Phase 2 each cohort forces a commit record to its log, releases its locks (ending the uncertain period), and sends a done msg; when all done msgs are received, the coordinator writes a complete record to its log, deletes the transaction record, and returns status so the application resumes]

Two-Phase Commit (abort case)
[Figure: same lanes; the protocol proceeds as above through the vote msg, but if any vote is abort the coordinator deletes the transaction record and sends an abort msg; each cohort performs a local abort and releases its locks, and the coordinator returns status so the application resumes]

Distributing the Coordinator
• A transaction manager controls resource managers in its domain
• When a cohort in domain A invokes a resource manager, RMB, in domain B, the local transaction manager, TMA, and the remote transaction manager, TMB, are notified
  – TMB is a cohort of TMA and a coordinator of RMB
• A coordinator/cohort tree results

Coordinator/Cohort Tree
[Figure: a tree rooted in Domain A, where the application and TMA control RM1 and RM2; subtrees reach RMB in Domain B and TMC with RM3, RM4, and RM5 in Domain C; solid edges show invocations, dashed edges show protocol msgs]

Distributing the Coordinator
• The two-phase commit protocol progresses down and up the tree in each phase
  – When TMB gets a prepare msg from TMA, it sends a prepare msg to each child and waits
  – If each child votes ready, TMB sends a ready msg to TMA
    • if not, it sends an abort msg

Failures and Two-Phase Commit
• A participant recognizes two failure situations:
  – Timeout: no response to a message; execute a timeout protocol
  – Crash: on recovery, execute a restart protocol
• If a cohort cannot complete the protocol until some failure is repaired, it is said to be blocked
  – Blocking can impact performance at the cohort site since locks cannot be released

Timeout Protocol
• Cohort times out waiting for prepare message
  – Abort the subtransaction
    • Since the (distributed) transaction cannot commit unless the cohort votes to commit, atomicity is preserved
• Coordinator times out waiting for vote message
  – Abort the transaction
    • Since the coordinator controls the decision, it can force all cohorts to abort, preserving atomicity
• Coordinator times out waiting for done message
  – Requests done message from the delinquent cohort

Timeout Protocol
• Cohort (in prepared state) times out waiting for commit/abort message
  – Cohort is blocked since it does not know the coordinator's decision
    • Coordinator might have decided commit or abort
    • Cohort cannot unilaterally decide since its decision might be contrary to the coordinator's decision, violating atomicity
    • Locks cannot be released
  – Cohort requests status from the coordinator; remains blocked
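A rough sketch of the cohort's side of the two timeouts above; the cohort object and its helper methods are hypothetical, and the retry interval is an arbitrary illustration.

    # Hypothetical sketch of cohort-side timeout handling in two-phase commit.
    def on_timeout(cohort):
        if cohort.state == "active":
            # Never voted: safe to abort unilaterally, since the coordinator
            # cannot commit without this cohort's "ready" vote.
            cohort.abort_locally()
        elif cohort.state == "prepared":
            # Uncertain period: the coordinator may already have decided
            # either way, so the cohort blocks (keeping its locks) and asks.
            while True:
                decision = cohort.request_status_from_coordinator(timeout=5.0)
                if decision is not None:        # "commit" or "abort"
                    cohort.apply(decision)
                    break
                # still blocked; retry until the coordinator is reachable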

Restart Protocol - Cohort
• On restart the cohort finds in its log
  – begin_transaction record, but no prepare record:
    • Abort (the transaction cannot have committed because the cohort has not voted)
  – prepare record, but no commit record (the cohort crashed in its uncertain period):
    • Does not know if the transaction committed or aborted
    • Locks items mentioned in update records before restarting the system
    • Requests status from the coordinator and blocks until it receives an answer
  – commit record:
    • Recover the transaction to the committed state using the log

Restart Protocol - Coordinator
• On restart:
  – Search the log and restore to volatile memory the transaction record of each transaction for which there is a commit record, but no complete record
    • Commit record contains the transaction record
• On receiving a request from a cohort for transaction status:
  – If a transaction record exists in volatile memory, reply based on the information in the transaction record
  – If no transaction record exists in volatile memory, reply abort
    • Referred to as the presumed abort property

Presumed Abort Property
• If, when a cohort asks for the status of a transaction, there is no transaction record in the coordinator's volatile storage, either
  – The coordinator had aborted the transaction and deleted the transaction record
  – The coordinator had crashed and restarted and did not find the commit record in its log because
    • It was in Phase 1 of the protocol and had not yet made a decision, or
    • It had previously aborted the transaction
  – or …

Presumed Abort Property
  – The coordinator had crashed and restarted and found a complete record for the transaction in its log
  – The coordinator had committed the transaction, received done messages from all cohorts, and hence deleted the transaction record from volatile memory
• The last two possibilities cannot occur
  – In both cases, the cohort has sent a done message and hence would not request status
• Therefore, the coordinator can respond abort
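The coordinator's status reply under presumed abort reduces to a few lines; this is a sketch of the post-restart case, where committed_records stands in for the transaction records rebuilt from commit records without a matching complete record.

    # Sketch of the coordinator's reply to a cohort's status request under
    # presumed abort, after a restart.
    def reply_to_status_request(committed_records, transaction_id):
        record = committed_records.get(transaction_id)
        if record is not None:
            return "commit"     # committed but not yet complete: cohort can finish
        # No transaction record in volatile memory: the transaction was aborted,
        # never reached a decision, or has already fully completed. In every
        # case it is safe to answer abort (a completed transaction's cohorts
        # have already sent done and will not ask).
        return "abort"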

Heuristic Commit
• What does a cohort do when it is in the blocked state and the coordinator does not respond to a request for status?
  – Wait until the coordinator is restarted
  – Give up, make a unilateral decision, and attach a fancy name to the situation:
    • Always abort
    • Always commit
    • Always commit certain types of transactions and always abort others
  – Resolve the potential loss of atomicity outside the system
    • Call on the phone or send email

Variants/Optimizations
• Read-only subtransactions need not participate in the protocol as cohorts
  – As soon as such a transaction receives the prepare message, it can give up its locks and exit the protocol
• Transfer of coordination

Transfer of Coordination
• Sometimes it is not appropriate for the coordinator (in the initiator's domain) to coordinate the commit
  – Perhaps the initiator's domain is a convenience store and the bank does not trust it to perform the commit
• Ability to coordinate the commit can be transferred to another domain
  – Linear commit
  – Two-phase commit without a prepared state

Linear Commit
• Variation of two-phase commit that involves transfer of coordination
• Used in a number of commerce protocols
• Cohorts are assumed to be connected in a linear chain

Linear Commit Protocol
• When the leftmost cohort, A, is ready to commit, it goes into a prepared state and sends a vote message ("ready") to the cohort to its right, B (requesting B to act as coordinator)
• After receiving the vote message, if B is ready to commit, it also goes into a prepared state and sends a vote message ("ready") to the cohort to its right, C (requesting C to act as coordinator)
• And so on ...

Linear Commit Protocol
• When the vote message reaches the rightmost cohort, R, if R is ready to commit, it commits the entire transaction (acting as coordinator) and sends a commit message to the cohort on its left
• The commit message propagates down the chain until it reaches A
• When A receives the commit message, it sends a done message to B, and that also propagates
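A compact sketch of the chained message flow just described, assuming every cohort is willing to commit; the cohort objects and their prepare/commit/done hooks are hypothetical.

    # Hypothetical sketch of linear commit over a chain of cohorts A ... R.
    def linear_commit(chain):
        # ready votes flow left to right; each cohort enters the prepared state
        for cohort in chain[:-1]:
            cohort.enter_prepared_state()     # then sends "ready" to its right neighbor

        # the rightmost cohort decides and commits the whole transaction
        chain[-1].commit()

        # commit messages flow right to left back to A
        for cohort in reversed(chain[:-1]):
            cohort.commit()

        # done messages flow left to right again
        for cohort in chain[1:]:
            cohort.record_done()

        # message count: (n-1) ready + (n-1) commit + (n-1) done = 3(n-1)
        return 3 * (len(chain) - 1)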

Linear Commit Protocol
[Figure: a chain of cohorts A, B, ..., R; ready messages flow rightward along the chain, commit messages flow back leftward, and done messages flow rightward]

Linear Commit
• Requires fewer messages than conventional two-phase commit. For n cohorts,
  – Linear commit requires 3(n - 1) messages
  – Two-phase commit requires 4n messages
• But two-phase commit requires only 4 message times (messages are sent in parallel), while linear commit requires 3(n - 1) message times (messages are sent serially)

Two-Phase Commit Without a Prepared State
• Assume exactly one cohort, C, does not support a prepared state
• Coordinator performs Phase 1 of the two-phase commit protocol with all other cohorts
• If they all agree to commit, the coordinator requests that C commit its subtransaction (in effect, requesting C to decide the transaction's outcome)
• C responds commit/abort, and the coordinator sends a commit/abort message to all other sites

Two-Phase Commit Without a Prepared State
[Figure: the coordinator runs two-phase commit with cohorts C1, C2, and C3, and sends a commit request to C at the end of Phase 1]

Global Deadlock
• With distributed transactions:
  – A deadlock might not be detectable at any one site
    • Subtransaction T1A of T1 at site A might wait for subtransaction T2A of T2, while at site B, T2B waits for T1B
  – Since concurrent execution within a transaction is possible, a transaction might progress at some site even though deadlocked
    • T2A and T1B can continue to execute for a period of time

Global Deadlock
• Global deadlock cannot always be resolved by aborting and restarting a single subtransaction, since data might have been communicated between cohorts
  – T2A's computation might depend on data received from T2B. Restarting T2B without restarting T2A will not in general work.

Global Deadlock Detection
• Global deadlock detection is generally a simple extension of local deadlock detection
  – Check for a cycle when a cohort waits
• If a cohort of T1 is waiting for a cohort of T2, the coordinator of T1 sends a probe message to the coordinator of T2
• If a cohort of T2 is waiting for a cohort of T3, the coordinator of T2 relays the probe to the coordinator of T3
• If the probe returns to the coordinator of T1, a deadlock exists

Global Deadlock Prevention
• Global deadlock prevention - use timestamps
  – For example, an older transaction never waits for a younger one. The younger one is aborted.
• Abort a distributed transaction if the wait time of one of its cohorts exceeds some threshold
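The probe mechanism amounts to forwarding a transaction id along the waits-for edges. A minimal sketch follows; the waits_for map collapses what is really coordinator-local knowledge carried by probe messages into a single dictionary.

    # Sketch of probe-based global deadlock detection. waits_for maps a
    # transaction id to the transaction id its cohorts are waiting for
    # (absent if not waiting). In a real system each coordinator holds only
    # its own edges and the probe travels in a message.
    def probe_finds_deadlock(waits_for, initiator):
        seen = set()
        current = waits_for.get(initiator)
        while current is not None:
            if current == initiator:
                return True                 # probe returned: cycle exists
            if current in seen:
                return False                # a cycle that does not involve initiator
            seen.add(current)
            current = waits_for.get(current)
        return False

    # Example: T1 -> T2 -> T3 -> T1 is a global deadlock.
    print(probe_finds_deadlock({"T1": "T2", "T2": "T3", "T3": "T1"}, "T1"))  # True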

Global Isolation
• If subtransactions at different sites run at different isolation levels, the isolation between concurrent distributed transactions cannot easily be characterized.
• Suppose all subtransactions run at SERIALIZABLE. Are distributed transactions as a whole serializable?
  – Not necessarily
    • T1A and T2A might conflict at site A, with T1A preceding T2A
    • T1B and T2B might conflict at site B, with T2B preceding T1B

Two-Phase Locking and Two-Phase Commit
• Theorem: If all sites use a strict two-phase locking protocol and the transaction manager uses a two-phase commit protocol, transactions are globally serializable in commit order.
  – Argument: Suppose the previous situation occurred.
    • At site A
      – T2A cannot commit until T1A releases its locks (2Φ locking)
      – T1A does not release its locks until T1 commits (2Φ commit)
      – Hence (if both commit) T1 commits before T2
    • At site B
      – Similarly (if both commit) T2 commits before T1
    • => Contradiction (the transactions deadlock in this case)

When Global Atomicity Cannot Always be Guaranteed
• A site might refuse to participate
  – Concerned about blocking
  – Charges for its services
• A site might not be able to participate
  – Does not support a prepared state
• Middleware used by the client might not support two-phase commit
  – For example, ODBC

Spectrum of Commit Protocols
• Two-phase commit
• One-phase commit
  – When all subtransactions have completed, the coordinator sends a commit message to each one
  – Some might commit and some might abort
• Zero-phase commit
  – When each subtransaction has completed, it immediately commits or aborts and informs the coordinator
• Heuristic commit
• Autocommit
  – When each database operation completes, it commits

Data Replication
• Advantages
  – Improves availability: data can be accessed even though some site has failed
  – Can improve performance: a transaction can access the closest (perhaps local) replica
• Disadvantages
  – More storage
  – Increases system complexity
    • Mutual consistency of replicas must be maintained

Application Supported Replication
• Application creates replicas
  – If X1 and X2 are replicas of the same item, each transaction enforces the global constraint X1 = X2
  – Distributed DBMS is unaware that X1 and X2 are replicas
  – When accessing an item, a transaction must specify which replica it wants
    • Access by concurrent transactions to different replicas can lead to incorrect results

System Supported Replication
• Hides replication from the transaction
• Knows the location of all replicas
• Translates a transaction's request to access an item into a request to access a particular replica (or replicas)
• Maintains some form of mutual consistency:
  – Strong: all replicas always have the same value
    • In every committed version of the database
  – Weak: all replicas eventually have the same value
  – Quorum: a quorum of replicas have the same value

Replica Control
[Figure: a transaction requests access to x; the replica control module receives requests for access to local replicas, requests access to a remote replica of x or to the local replica of x, and accesses the local replica of x in the local database]

Read One/Write All Replica Control
• Satisfies a transaction's read request using the nearest replica
• Causes a transaction's write request to update all replicas
  – Synchronous case: immediately (before the transaction commits)
  – Asynchronous case: eventually
• Performance benefits result if reads occur substantially more often than writes

Synchronous-Update Read One/Write All Replica Control
• Read request locks and reads the most local replica
• Write request locks and updates all replicas
  – Maintains strong mutual consistency
• Atomic commit protocol guarantees that all sites commit and makes the new values durable
• Schedules are serializable
• Problems: Writing
  – Has poor performance
  – Is prone to deadlock
  – Requires 100% availability
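A minimal sketch of synchronous-update read one/write all; the Replica objects and lock calls are hypothetical placeholders, and the atomic commit step is reduced to a comment.

    # Sketch of synchronous-update read one / write all replica control.
    # replicas is a list of hypothetical Replica objects ordered by distance.
    def read(replicas, item):
        nearest = replicas[0]
        nearest.read_lock(item)
        return nearest.read(item)           # read one

    def write(replicas, item, value):
        for r in replicas:                  # write all: every replica must be
            r.write_lock(item)              # accessible, which hurts availability
            r.write(item, value)
        # new values become durable only when the atomic commit protocol
        # (e.g., two-phase commit across the replica sites) succeeds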

Generalizing Read One/Write All
• Problem: With read one/write all, availability is worse for writers since all replicas have to be accessible
• Goal: A replica control in which an item is available for all operations even though some replicas are inaccessible
• This implies:
  – Mutual consistency is not maintained
  – The value of an item must be reconstructed by replica control when it is accessed
  – Serializability is maintained

Quorum Consensus Replica Control
• Replica control dynamically selects and locks a read (or write) quorum of replicas when a read (or write) request is made
  – Read operation reads only replicas in the read quorum
  – Write operation writes only replicas in the write quorum
  – If p = |read quorum|, q = |write quorum|, and n = |replica set|, then the algorithm requires
    p + q > n (read/write conflict)
    q > n/2 (write/write conflict)
• Guarantees that all conflicts between operations of concurrent transactions will be detected at some site and one transaction will be forced to wait
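The two quorum-size conditions are easy to check mechanically; this small helper is an illustration, not part of the original material.

    # Check the quorum consensus size constraints for n replicas.
    def valid_quorums(n, p, q):
        read_write_overlap = p + q > n      # every read quorum meets every write quorum
        write_write_overlap = q > n / 2     # any two write quorums intersect
        return read_write_overlap and write_write_overlap

    print(valid_quorums(5, 3, 3))   # True: 3+3 > 5 and 3 > 2.5
    print(valid_quorums(5, 2, 3))   # False: a read quorum could miss the latest write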

Quorum Consensus Replica Control
[Figure: the set of all replicas of an item (n), with an overlapping read quorum (p) and write quorum (q)]
- p + q > n (read/write conflict)
- An intersection between any read and any write quorum is guaranteed

Quorum Consensus Replica Control
[Figure: the set of all replicas of an item (n), with two overlapping write quorums (q)]
- q > n/2 (write/write conflict)
- An intersection between any two write quorums is guaranteed

Mutual Consistency
• Problem: The algorithm does not maintain mutual consistency; thus reads of replicas in a read quorum might return different values
• Solution: Assign a timestamp to each transaction, T, when it commits; clocks are synchronized between sites so that timestamps correspond to commit order
  – T writes: replica control associates T's timestamp with all replicas in its write quorum
  – T reads: replica control returns the value of the replica in the read quorum with the largest timestamp. Since read and write quorums overlap, T gets the most recent write
  – Schedules are serializable

Quorum Consensus Replica Control
• Allows a tradeoff among operations in availability and cost
  – A small quorum implies the corresponding operation is more available and can be performed more efficiently, but ...
  – The smaller one quorum is, the larger the other
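A sketch of the timestamped quorum read and write just described; the replica objects (holding a value and a timestamp) are hypothetical.

    # Sketch of quorum reads/writes with commit timestamps. Each replica
    # stores a (value, timestamp) pair; timestamps reflect commit order.
    def quorum_write(write_quorum, value, commit_timestamp):
        for replica in write_quorum:
            replica.value = value
            replica.timestamp = commit_timestamp

    def quorum_read(read_quorum):
        # The read quorum overlaps every write quorum, so the replica with
        # the largest timestamp holds the most recently committed value.
        newest = max(read_quorum, key=lambda r: r.timestamp)
        return newest.value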

Failures
• The algorithm can continue to function even though some sites are inaccessible
• No special steps are required to recover a site after a failure occurs
  – The replica will have an old timestamp and hence its value will not be used
  – The replica's value will be made current the next time the site is included in a write quorum

Asynchronous-Update Read One/Write All Replica Control
• Problem: Synchronous-update is slow since all replicas (or a quorum of replicas) must be updated before the transaction commits
• Solution: With asynchronous-update only some (usually one) replica is updated as part of the transaction. Updates propagate after the transaction commits but…
  – only weak mutual consistency is maintained
  – serializability is not guaranteed

Asynchronous-Update Read One/Write All Replica Control
• Weak mutual consistency can result in non-serializable schedules

    T1:       w(xA) w(yB) commit
    T2:       r(xC) r(yB) commit
    Trep_upd: w(xC) w(xB) . . .

  – T2 reads the new value of yB (written by T1) but the old value of xC (the propagated update w(xC) has not yet been applied)
• Alternate forms of asynchronous-update replication vary the degree of synchronization between replicas; none support serializability

Primary Copy Replica Control
• One copy of each item is designated primary; the other copies are secondary
  – A transaction (locks and) reads the nearest copy
  – A transaction (locks and) writes the primary copy
  – After a transaction commits, updates it has made to primary copies are propagated to secondary copies (asynchronous)
• Writes of all transactions are serializable, reads are not

Primary Copy Replica Control

    T1:       w(xpri) w(ypri) commit
    T2:       r(xpri) r(yB) commit
    Trep_upd: w(xC) w(xB) w(yC) w(yB)

  – T2 reads the new value of x (from the primary copy) but the old value of y (from secondary copy yB, which the propagated update has not yet reached)
• The schedule is not serializable

Primary Copy Mutual Consistency
• Updates of an item are propagated by
  – A single (distributed) propagation transaction
  – Multiple propagation transactions
  – Periodic broadcast
• Weak mutual consistency is guaranteed if the sequence of updates made to the primary copy of an item (by all transactions) is applied to each secondary copy of the item (in the same order)

Example Where Asynchronous Update is OK
• An Internet grocer keeps replicated information about customers at two sites
  – Central (primary) site where customers place orders
  – Warehouse (secondary) site from which deliveries are made
• With synchronous update, order transactions are distributed and become a bottleneck
• With asynchronous update, an order transaction updates the central site immediately; the update is propagated to the warehouse site later
  – Provides faster response time to the customer
  – Warehouse site does not need the data immediately

Variations on Propagation
• A secondary site might declare a view of the primary, so that only the relevant part of the item is transmitted
  – Good for low bandwidth connections
• With a pull strategy (in contrast to a push strategy), a secondary site requests that its view be updated
  – Good for sites that are not continuously connected, e.g., laptops of business travelers

Group Replication (asynchronous)
• A transaction can (lock and) update any replica
• Problem: Does not support weak mutual consistency
[Figure: T1 sets x := 5 near Site A while T2 sets x := 7 near Site D; the two updates propagate toward each other over time, and each replica keeps the last update message it receives, so the final values diverge: xA=7, xB=7, xC=5, xD=5]

Group Replication - Conflicts
• Conflict: Updates are performed concurrently to the same item at different sites
• Problem: If a replica takes as its value the contents of the last update message, weak mutual consistency is not maintained
• Solution: Associate a unique timestamp with each update and each replica. A replica takes the timestamp of the most recent update that has been applied to it.
  – An update is discarded if its timestamp is less than the timestamp of the replica
  – Weak mutual consistency is supported
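The timestamp rule amounts to a largest-timestamp-wins check at each replica. A minimal sketch, with illustrative timestamps (the Replica structure and the values are assumptions, not from the text):

    # Sketch of timestamp-based update application at a replica (group
    # replication). An update carries the unique timestamp assigned at its
    # originating site; stale updates are discarded.
    class Replica:
        def __init__(self):
            self.value = None
            self.timestamp = 0

        def apply_update(self, value, timestamp):
            if timestamp < self.timestamp:
                return False                 # older than what we already have: discard
            self.value = value
            self.timestamp = timestamp       # replica takes the update's timestamp
            return True

    # Regardless of arrival order, every replica converges on the update with
    # the largest timestamp, so weak mutual consistency holds (note that the
    # other write is silently discarded - a lost update).
    r = Replica()
    r.apply_update(5, timestamp=101)         # x := 5, newer timestamp
    r.apply_update(7, timestamp=100)         # older update arrives late: discarded
    print(r.value)                           # 5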

Conflict Resolution
• No conflict resolution strategy yields serializable schedules
  – e.g., the timestamp algorithm allows lost updates
• Conflict resolution strategies:
  – Most recent update wins
  – Update coming from the highest priority site wins
  – User provides a conflict resolution strategy
  – Notify the user

Procedural Replication
• Problem: Communication costs of the previous propagation strategies are high if many items are updated
  – Ex: How do you propagate the quarterly posting of interest to duplicate bank records?
• Solution: Replicate the stored procedure at the replica sites. Invoke the procedure at each site to do the propagation.

Summary of Distributed Transactions
• The good news: If transactions run at SERIALIZABLE, all sites use two-phase commit for termination, and replication is synchronous update, then distributed transactions are globally atomic and serializable.
• The bad news: To improve performance
  – applications often do not use SERIALIZABLE
  – DBMSs might not participate in two-phase commit
  – replication is generally asynchronous update
• Hence, consistent transactions might yield incorrect results