Transaction Management in HDBMSs

©2002 Vera Goebel & Denise Ecklund HDBMS-TM-1

HDBS Transaction Model

[Figure: Global transactions GTi and GTj are submitted to the GTM (global transaction manager), which splits them into the global subtransactions { GSTi1, GSTj1, GSTi2, GSTj2 }. At each site a server acting as proxy for the GTM submits the subtransactions to the local DBMS (DBMS 1 ... DBMS n). Local transactions LTk, LTl, LTm and LTn are submitted directly to the local DBMSs.]

Transaction Management

• Local transactions: access data at a single site, outside of the global HDBS control.

• Global transactions: are executed under the HDBS control.

Local DBMSs have three types of autonomy:

• Design autonomy
  – Definition: No changes can be made to the local DBMS software to support the HDBMS
  – Resulting problem: Non-serializable schedule for global transactions
• Execution autonomy
  – Definition: Each local DBMS controls execution of global subtransactions and local transactions (the commit/abort decision)
  – Resulting problem: Non-atomic & non-durable global transactions
• Communication autonomy
  – Definition: Local DBMSs do not communicate with each other and they do not exchange execution control information
  – Resulting problem: Distributed deadlocks cannot be detected

Global Serializability Problem

• GTM is responsible for
  – A serializable schedule for the set of global transactions
  – Coordination of submission and execution of global subtransactions among the local DBMSs
• Serializing the global schedule?

[Figure: GT1 consists of the subtransactions GST11 and GST12; GT2 consists of GST21, GST22 and GST23. The subtransactions execute at Local DBMS-1, Local DBMS-2 and Local DBMS-3.]

If GST11 〈 GST22 at site DBMS-1, then GT1 〈 GT2, and it must also be the case that GST12 〈 GST23 at site DBMS-2.

If instead GST23 〈 GST12 at site DBMS-2, then GT2 〈 GT1 at that site: a non-serializable schedule!
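The consistency requirement above can be sketched as a small check: every site contributes the order in which it executed the subtransactions of the global transactions, and a global serial order exists only if those per-site orders can be merged without contradiction. The function name `global_serial_order` and the dictionary layout are assumptions made for this illustration; a real GTM usually cannot observe local orders this directly.

```python
from graphlib import TopologicalSorter, CycleError


def global_serial_order(site_orders):
    """Return one global order consistent with every site, or None if none exists.

    site_orders maps a site name to the list of global transactions in the
    order their subtransactions were serialized at that site (assumed input).
    """
    predecessors = {}  # transaction -> set of transactions that must precede it
    for order in site_orders.values():
        for position, later in enumerate(order):
            predecessors.setdefault(later, set()).update(order[:position])
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None  # the sites disagree: the global schedule is not serializable


# Both sites agree on GT1 before GT2.
consistent = {"DBMS-1": ["GT1", "GT2"], "DBMS-2": ["GT1", "GT2"]}
# DBMS-2 serializes GST23 before GST12, i.e. GT2 before GT1.
contradictory = {"DBMS-1": ["GT1", "GT2"], "DBMS-2": ["GT2", "GT1"]}

print(global_serial_order(consistent))
print(global_serial_order(contradictory))
```

The sketch uses the standard-library `graphlib` module, which requires Python 3.9 or later.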

Local Transactions and the Global Serializable Schedule

• Local transactions execute outside the control of the GTM
• Local transactions create indirect conflicts with global transactions
• GTM is not aware of local transactions and these indirect conflicts
• In general, the GTM cannot ensure global serializability

Example: data items a and b are stored at LDBMS-1, c and d at LDBMS-2. The GTM believes GT1 〈 GT2 at both sites.

GT1: r1(a) r1(c)
GT2: r2(b) r2(d)
LT3: w3(a) w3(b)   (local transaction at LDBMS-1)
LT4: w4(c) w4(d)   (local transaction at LDBMS-2)

LDBMS-1: r1(a) c1 w3(a) w3(b) c3 r2(b) c2
LDBMS-2: w4(c) r1(c) c1 r2(d) c2 w4(d) c4

=> LDBMS-1: GT1 〈 LT3 〈 GT2
=> LDBMS-2: GT2 〈 LT4 〈 GT1
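The indirect conflict in this example can be reproduced with a minimal sketch that derives conflict edges from the two local histories. The helper names (`parse`, `conflict_edges`, `has_cycle`) are illustrative assumptions, not part of any HDBS interface. Each local history is conflict-serializable on its own and the global transactions 1 and 2 never conflict directly, yet the union of the local conflict graphs contains a cycle.

```python
import re


def parse(history):
    """Turn 'r1(a) c1 w3(a)' into [('r', '1', 'a'), ('c', '1', None), ...]."""
    pattern = re.compile(r"([rwc])(\d+)(?:\((\w+)\))?")
    return [pattern.fullmatch(token).groups() for token in history.split()]


def conflict_edges(history):
    """Edges (Ti, Tj): Ti has an operation that conflicts with a later one of Tj."""
    ops = [op for op in parse(history) if op[0] != "c"]
    edges = set()
    for i, (kind1, txn1, item1) in enumerate(ops):
        for kind2, txn2, item2 in ops[i + 1:]:
            if txn1 != txn2 and item1 == item2 and "w" in (kind1, kind2):
                edges.add((txn1, txn2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
    state = {}

    def visit(node):
        if state.get(node) == "active":
            return True
        if state.get(node) == "done":
            return False
        state[node] = "active"
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        state[node] = "done"
        return found

    return any(visit(node) for node in list(graph))


site1 = conflict_edges("r1(a) c1 w3(a) w3(b) c3 r2(b) c2")
site2 = conflict_edges("w4(c) r1(c) c1 r2(d) c2 w4(d) c4")
# What the GTM can see: only direct conflicts between global transactions 1 and 2.
gtm_view = {edge for edge in site1 | site2 if set(edge) <= {"1", "2"}}

print(sorted(site1), sorted(site2), sorted(gtm_view))
print(has_cycle(site1), has_cycle(site2), has_cycle(site1 | site2))
```

The GTM's view is empty because every conflict between GT1 and GT2 passes through a local transaction, which is why the GTM alone cannot detect the problem.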

Controlling the Execution Order of Global Subtransactions

• Four Strategies:

1) Execute global transactions serially
  • No concurrent execution for global transactions!
  • Does not solve indirect conflicts with local transactions
  • Costs: heavy CC processing at the GTM; low query processing throughput

2) Define a specific order over the global transactions and use the mechanism of each local DBMS to enforce that order
  • Every local DB stores one “ticket” object
  • Extend every global subtransaction to access the ticket
      GT1: r1(a) w1(a)    newGT1: r1(ticketS1) r1(a) w1(a) w1(ticketS1) c1
      GT2: r2(b) w2(b)    newGT2: r2(ticketS1) r2(b) w2(b) w2(ticketS1) c2
  • Means GT1 and GT2 will be correctly serialized with respect to all global transactions and all local transactions executed by the local DBMS at S1
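The ticket idea in strategy 2 can be sketched in two parts: rewriting a global subtransaction so that it reads and writes the ticket of its site, and reading off the local serialization order from the ticket values. The function `add_ticket_ops` and the class `TicketSite` are hypothetical names for this illustration; in a real system the ticket is an ordinary data item and the local DBMS's own concurrency control orders the accesses.

```python
def add_ticket_ops(txn, ops, site):
    """Wrap a subtransaction's operations with a read and a write of the site's ticket."""
    ticket = f"ticket{site}"
    return [f"r{txn}({ticket})", *ops, f"w{txn}({ticket})", f"c{txn}"]


class TicketSite:
    """Hypothetical stand-in for a local DBMS that stores one ticket counter."""

    def __init__(self):
        self.ticket = 0

    def take_ticket(self):
        value = self.ticket       # r(ticket)
        self.ticket = value + 1   # w(ticket): creates a direct conflict between subtransactions
        return value


new_gt1 = " ".join(add_ticket_ops(1, ["r1(a)", "w1(a)"], "S1"))
new_gt2 = " ".join(add_ticket_ops(2, ["r2(b)", "w2(b)"], "S1"))

s1 = TicketSite()
tickets = {"GT1": s1.take_ticket(), "GT2": s1.take_ticket()}
order_at_s1 = sorted(tickets, key=tickets.get)

print(new_gt1)
print(new_gt2)
print(order_at_s1)
```

Because both subtransactions now write the same item, GT1 and GT2 conflict directly at S1 even though a and b are unrelated, so the local DBMS has to serialize them and the ticket values reveal the order it chose.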

Controlling the Execution Order of Global Subtransactions

3) Use local DBs deploying rigorous CC algorithms
  • If all LDBMSs use rigorous 2-phase locking (R2PL) and support a “prepare-to-commit” interface, then
    – Global transactions are serializable without a CC algorithm at the GTM
    – Local transactions cannot cause indirect conflicts
  Ex: (w4(c) r1(c) c1 r2(d) c2 w4(d) c4) is not a rigorous local schedule. In R2PL, T4 holds all locks until commit, so T1 cannot read object c until after T4 commits.

4) Relax the serializability requirement
  • Use “strong correctness” instead
  • Most indirect conflicts have no effect on correctness
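The rigorousness condition used in strategy 3 can be sketched as a check over a local history: an operation on an item is allowed only if every other transaction that earlier issued a conflicting operation on that item has already finished. The function name `is_rigorous` and the history notation (with `c` for commit and `a` for abort) are assumptions of this illustration.

```python
import re


def is_rigorous(history):
    """Return True if no operation conflicts with one of a still-active transaction."""
    pattern = re.compile(r"([rwca])(\d+)(?:\((\w+)\))?")
    accesses = {}     # item -> list of (transaction, 'r' or 'w') seen so far
    finished = set()  # transactions that have committed or aborted
    for token in history.split():
        kind, txn, item = pattern.fullmatch(token).groups()
        if kind in ("c", "a"):
            finished.add(txn)
            continue
        for other, other_kind in accesses.get(item, []):
            conflicting = "w" in (kind, other_kind)
            if other != txn and other not in finished and conflicting:
                return False
        accesses.setdefault(item, []).append((txn, kind))
    return True


# The schedule from the example: T1 reads c while T4 is still active.
print(is_rigorous("w4(c) r1(c) c1 r2(d) c2 w4(d) c4"))
# What R2PL would allow instead: T4 finishes before T1 and T2 touch c and d.
print(is_rigorous("w4(c) w4(d) c4 r1(c) c1 r2(d) c2"))
```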

Alternative Consistency Models

• Global schedule is not serializable; it is strongly correct
  – Global transactions preserve all data consistency constraints

Constraint-based strategies

• Local serializability: Some HDBS applications have no global constraints because each DBS is (and should be) independent of the others => no global concurrency control mechanism needed. So, local serializability ensures strong correctness of global executions.
  Ex application: travel reservation service for planes, trains, ferries, hotels, etc.
• Limited global constraints: Some applications need global constraints. Define 2 types of data: global data and local data. Global constraints may only span global data, and local transactions may not write to global data. Use two-level serializability (2LSR): local-SR and global-SR.
  Artificial solution: local site has no autonomy over or direct access to global data; local site must submit transactions to GTM to update global data stored at the local site => master-slave relationship.

Alternative Consistency Models

Non-constraint-based strategies

• Diverge from strong correctness and serializability

1) Epsilon Serializability
  • Allows a specified number of nonserializable conflicts
2) Sets of Compatible Transactions
  • Assume a set of known transactions
  • Pre-analyze the transactions for conflicts
  • Group non-conflicting transactions into compatible sets
  • No CC required among transactions in a compatible set
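The pre-analysis step for compatible sets can be sketched as follows, assuming each known transaction is described by its read set and write set. The greedy grouping shown here is only one simple way to form the sets and is not taken from the slides; the names `conflicts` and `compatible_sets` are illustrative.

```python
def conflicts(first, second):
    """Two transactions conflict if one writes an item the other reads or writes."""
    reads1, writes1 = first
    reads2, writes2 = second
    return bool(writes1 & (reads2 | writes2)) or bool(writes2 & reads1)


def compatible_sets(transactions):
    """Greedily place each transaction in the first group it does not conflict with."""
    groups = []
    for name, sets in transactions.items():
        for group in groups:
            if not any(conflicts(sets, transactions[other]) for other in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Assumed example: (read set, write set) per known transaction.
known = {
    "T1": ({"a"}, {"b"}),
    "T2": ({"c"}, {"d"}),
    "T3": (set(), {"a"}),
}

print(compatible_sets(known))
```

T1 and T2 touch disjoint items and end up in the same set, while T3 writes an item that T1 reads and is therefore placed in a set of its own.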

Global Atomicity and Recovery Problem

• The GTM must guarantee that a global transaction commits at all sites or aborts at all sites
• Local DBMSs wish to preserve their execution autonomy
  – May not implement or export a “prepare-to-commit” interface

[Figure: The GTM runs 2PC for GT1 with two GTM proxies, but there is no 2PC between each proxy and its LDBMS. One LDBMS aborts GST11 while the other commits GST12.]

• A local DBMS can unilaterally abort a subtransaction anytime – Results in non-atomic global transactions and incorrect global schedules – Local transactions and global subtransactions see committed partial results

Note: The first heterogeneous systems did not support update transactions!

Approaches to Achieve Atomicity and Durability

• If all LDBMSs export a “prepare-to-commit” interface, then use 2PC between the proxy and the LDBMS
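For the case where every LDBMS exports a “prepare-to-commit” interface, the two phases of 2PC can be sketched as below. The `Participant` class is a hypothetical stand-in for a proxy and its LDBMS, and the sketch deliberately leaves out logging, timeouts and failure recovery, which any real 2PC implementation needs.

```python
class Participant:
    """Hypothetical proxy/LDBMS pair that exports a prepare-to-commit interface."""

    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes
        self.state = "active"

    def prepare(self):
        # Phase 1: a "yes" vote is a promise to be able to commit later.
        self.state = "prepared" if self.vote_yes else "aborted"
        return self.vote_yes

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes; otherwise abort everywhere."""
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    decision = all(votes)
    for p in participants:                        # phase 2: broadcast the decision
        if decision:
            p.commit()
        else:
            p.abort()
    return "commit" if decision else "abort"


all_yes = [Participant("LDBMS-1"), Participant("LDBMS-2")]
one_no = [Participant("LDBMS-1"), Participant("LDBMS-2", vote_yes=False)]

print(two_phase_commit(all_yes), [p.state for p in all_yes])
print(two_phase_commit(one_no), [p.state for p in one_no])
```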

• If some LDBMSs do not export “prepare-to-commit”, then four approaches:

1) Modify each global subtransaction to “callback to the proxy” just before local commit
  • Blocks the global subtransaction until GTM completes 2PC with proxies
  • Possible only if the LDBMS supports a client callback service
  • Fails if the LDBMS uses optimistic concurrency control

[Figure: The GTM runs 2PC with the GTM proxy; there is no 2PC between the proxy and the LDBMS.]

Approaches to Achieve Atomicity and Durability

• If any global subtransaction aborts:

2) REDO failed write operations from global subtransactions
  - Performed by the proxy, which must maintain a local redo log

3) RETRY failed global subtransactions (read & write operations)
  - Performed by the proxy
  - Inappropriate semantics for many applications or transactions
  - No guarantee that the retry can ever be committed
  Ex: Banking application – withdrawing money can fail “forever”

4) UNDO committed global subtransactions by executing compensating transactions
  - Performed by the GTM
  - Can provide semantic atomicity (called a saga)
  - Inconsistent data is temporarily visible to other transactions!
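Approach 4 can be sketched as a saga runner: each global subtransaction is paired with a compensating transaction, and when a later subtransaction fails, the compensations of the already committed ones are executed in reverse order. The function `run_saga` and the banking example are assumptions for illustration only.

```python
def run_saga(steps):
    """Run (action, compensation) pairs; on failure undo the committed ones in reverse."""
    compensations = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(compensations):
                undo()
            return False
        compensations.append(compensate)
    return True


# Assumed example: a transfer whose second subtransaction is aborted by its LDBMS.
balances = {"A": 100, "B": 0}


def debit_a():
    balances["A"] -= 30   # commits locally; other transactions can now see A == 70


def undo_debit_a():
    balances["A"] += 30   # compensating transaction restores the balance


def credit_b():
    raise RuntimeError("LDBMS unilaterally aborted the subtransaction")


def undo_credit_b():
    balances["B"] -= 30


ok = run_saga([(debit_a, undo_debit_a), (credit_b, undo_credit_b)])
print(ok, balances)
```

Between the commit of `debit_a` and the execution of `undo_debit_a`, other transactions can read the debited balance, which is the temporary inconsistency noted above.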

Global Deadlock Problem

• Same problem as in distributed homogeneous DBMSs

[Figure: Global deadlock between global transactions T1 and T2 across Site X and Site Y. At Site X, subtransaction T1x holds lock Lx and T2x waits for T1x to release it. At Site Y, T2y holds lock Ly and T1y waits for T2y to release it. Each subtransaction also waits for the other subtransaction of its own global transaction to complete, which closes the cycle.]

• In homogeneous distributed DBMSs, we solved the problem by exchanging lock information to construct the global “waits-for” graph
  – In an HDBS this violates design autonomy and communication autonomy
• Therefore the GTM will be unaware of a global deadlock.
• There are no complete solutions to the global deadlock problem for autonomous multidatabase systems.
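The effect of not exchanging lock information can be sketched with waits-for edges at the level of global transactions, which is a simplifying assumption here: at Site X, T2 waits for T1, and at Site Y, T1 waits for T2. The function `deadlocked` repeatedly lets every transaction that waits for nobody finish; whatever remains can never proceed.

```python
def deadlocked(waits_for):
    """Return the transactions that can never proceed, given (waiter, holder) edges."""
    waiting = {}
    for waiter, holder in waits_for:
        waiting.setdefault(waiter, set()).add(holder)
        waiting.setdefault(holder, set())
    progress = True
    while progress:
        # Transactions that wait for nobody can run to completion.
        free = {txn for txn, blockers in waiting.items() if not blockers}
        progress = bool(free)
        for txn in free:
            del waiting[txn]
        for blockers in waiting.values():
            blockers -= free   # their locks are released
    return set(waiting)


site_x = {("T2", "T1")}   # T2x waits for T1x to release Lx
site_y = {("T1", "T2")}   # T1y waits for T2y to release Ly

print(deadlocked(site_x), deadlocked(site_y))
print(deadlocked(site_x | site_y))
```

Each local graph looks harmless, so neither LDBMS reports a deadlock. Only the combined graph shows it, and building that graph requires exactly the information the sites do not share.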

Status: Transaction Management for HDBS

• Transaction management for HDBSs is a very active research area. • Distributed transactions over the define new semantics for transaction consistency, allowing development of new solutions.

Open issues:

• What can be done if some of the local subsystems (e.g., file systems) do not support transaction management?

• Performance implications of transaction management strategy?

• Handling of different degrees of consistency?

Conclusions

An HDBS provides a uniform view of the combined data maintained by different autonomous database systems.

• available: prototypes & commercial products with a set of fixed / specific drivers (so-called gateways) for existing, widely used data management systems (conventional DBS and file systems)

• missing: systematic support for individual integration of arbitrary data management systems – Examples: geographical DBs, multimedia DBs, Internet storefronts, etc.
