Transaction Management in HDBMSs
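©2002 Vera Goebel & Denise Ecklund

HDBS Transaction Model
[Figure: the global transactions GTi and GTj are submitted to the GTM (global transaction manager), which decomposes them into the global subtransactions { GSTi1, GSTj1, GSTi2, GSTj2 }. At each site a server acting as proxy for the GTM submits the subtransactions to the local DBMS (GSTi1 and GSTj1 at DBMS 1, GSTi2 and GSTj2 at DBMS n). The local transactions LTk, LTl, LTm, and LTn are submitted directly to DBMS 1 ... DBMS n.]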

Transaction Management
• Local transactions: access data at a single site, outside of the global HDBS control.
• Global transactions: are executed under the HDBS control.

Local DBMSs have three types of autonomy:

Autonomy type: Design
  Definition: No changes can be made to the local DBMS software to support the HDBMS.
  Resulting problem: Non-serializable schedule for global transactions.
Autonomy type: Execution
  Definition: Each local DBMS controls execution of global subtransactions and local transactions (the commit/abort decision).
  Resulting problem: Non-atomic & non-durable global transactions.
Autonomy type: Communication
  Definition: Local DBMSs do not communicate with each other and they do not exchange execution control information.
  Resulting problem: Distributed deadlock cannot be detected.

Global Serializability Problem
• GTM is responsible for
  – A serializable schedule for the set of global transactions
  – Coordination of submission and execution of global subtransactions among the local DBMSs
• Serializing the global schedule?

[Figure: GT1 consists of GST11 (at Local DBMS-1) and GST12 (at Local DBMS-2); GT2 consists of GST21 (at Local DBMS-3), GST22 (at Local DBMS-1), and GST23 (at Local DBMS-2).]

If GST11 〈 GST22 at site DBMS-1, then GT1 〈 GT2, and it must also be the case that GST12 〈 GST23 at site DBMS-2.
If instead GST23 〈 GST12 at site DBMS-2, then GT2 〈 GT1: a non-serializable schedule!
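
The consistency requirement above can be sketched as a small check. This is only an illustration, not the GTM's actual algorithm: it assumes the GTM already knows the order in which global subtransactions were serialized at each site, which is exactly what local autonomy makes hard to obtain. The function name `global_order` and the input layout are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

def global_order(site_orders):
    """Return one global serial order of the transactions, or None if none exists.

    site_orders maps a site name to the list of global transactions, in the
    order their subtransactions were serialized at that site (assumed known).
    """
    sorter = TopologicalSorter()
    for order in site_orders.values():
        for txn in order:
            sorter.add(txn)
        # Each adjacent pair contributes the constraint "earlier before later".
        for earlier, later in zip(order, order[1:]):
            sorter.add(later, earlier)
    try:
        return list(sorter.static_order())
    except CycleError:
        return None

consistent = {"DBMS-1": ["GT1", "GT2"], "DBMS-2": ["GT1", "GT2"], "DBMS-3": ["GT2"]}
inconsistent = {"DBMS-1": ["GT1", "GT2"], "DBMS-2": ["GT2", "GT1"], "DBMS-3": ["GT2"]}

print(global_order(consistent))    # ['GT1', 'GT2']
print(global_order(inconsistent))  # None
```

The second input mirrors the slide: GT1 precedes GT2 at DBMS-1 but follows it at DBMS-2, so no global serial order exists.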

Local Transactions and the Global Serializable Schedule
• Local transactions execute outside the control of the GTM
• Local transactions create indirect conflicts with global transactions
• GTM is not aware of local transactions and these indirect conflicts
• In general, the GTM cannot ensure global serializability

Example (data items a and b are stored at LDBMS-1, c and d at LDBMS-2):
GT1: r1(a) r1(c)
GT2: r2(b) r2(d)
LT3: w3(a) w3(b)   (local transaction at LDBMS-1)
LT4: w4(c) w4(d)   (local transaction at LDBMS-2)

GTM believes GT1 〈 GT2 at both sites.

LDBMS-1: r1(a) c1 w3(a) w3(b) c3 r2(b) c2
LDBMS-2: w4(c) r1(c) c1 r2(d) c2 w4(d) c4

=> LDBMS-1: GT1 〈 LT3 〈 GT2
=> LDBMS-2: GT2 〈 LT4 〈 GT1
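
The indirect conflict in this example can be reproduced with a short conflict-graph sketch. It parses the two local schedules, derives the conflict edges at each site, and shows that each site is serializable on its own while the union of the edges contains a cycle. The helper names and the string format are assumptions made for illustration; T1 to T4 stand for GT1, GT2, LT3, and LT4.

```python
import re
from graphlib import TopologicalSorter, CycleError

def conflict_edges(schedule):
    """Return (Ti, Tj) pairs where an operation of Ti precedes a conflicting
    operation of Tj (same item, different transactions, at least one write)."""
    ops = re.findall(r"([rw])(\d+)\((\w+)\)", schedule)  # commit tokens are skipped
    edges = set()
    for i, (kind_a, txn_a, item_a) in enumerate(ops):
        for kind_b, txn_b, item_b in ops[i + 1:]:
            if item_a == item_b and txn_a != txn_b and "w" in (kind_a, kind_b):
                edges.add(("T" + txn_a, "T" + txn_b))
    return edges

def is_serializable(edges):
    """A schedule is conflict serializable iff its conflict graph is acyclic."""
    sorter = TopologicalSorter()
    for before, after in edges:
        sorter.add(after, before)
    try:
        sorter.prepare()
        return True
    except CycleError:
        return False

site1 = conflict_edges("r1(a) c1 w3(a) w3(b) c3 r2(b) c2")
site2 = conflict_edges("w4(c) r1(c) c1 r2(d) c2 w4(d) c4")

print(sorted(site1))                   # [('T1', 'T3'), ('T3', 'T2')]
print(sorted(site2))                   # [('T2', 'T4'), ('T4', 'T1')]
print(is_serializable(site1), is_serializable(site2))  # True True
print(is_serializable(site1 | site2))  # False
```

The GTM sees only T1 and T2, which never conflict directly, so it has no way to notice the cycle that runs through the local transactions.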

Controlling the Execution Order of Global Subtransactions
• Four strategies:
1) Execute global transactions serially
  • No concurrent execution for global transactions!
  • Does not solve indirect conflicts with local transactions
  • Costs:
    – Heavy CC processing at the GTM
    – Low query processing throughput
2) Define a specific order over the global transactions and use the concurrency control mechanism of each local DBMS to enforce that order
  • Every local DB stores one “ticket” object
  • Extend every global subtransaction to access the ticket
      GT1: r1(a) w1(a)    newGT1: r1(ticketS1) r1(a) w1(a) w1(ticketS1) c1
      GT2: r2(b) w2(b)    newGT2: r2(ticketS1) r2(b) w2(b) w2(ticketS1) c2
  • Means GT1 and GT2 will be correctly serialized with respect to all global transactions and all local transactions executed by the local DBMS at S1
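
A minimal sketch of the ticket idea follows. It is not the slides' algorithm: the `Site` class is a toy stand-in for a local DBMS, the read-ticket/write-ticket pair is assumed to run atomically under the local concurrency control, and the comparison step in `tickets_consistent` is an assumption about how a GTM could use the ticket values it gets back.

```python
class Site:
    """Toy local DBMS that stores a single ticket counter (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.ticket = 0

    def run_subtransaction(self, body):
        # r(ticket) ... body ... w(ticket): assumed atomic under the local CC.
        taken = self.ticket
        body()
        self.ticket = taken + 1
        return taken

def tickets_consistent(tickets_by_site):
    """tickets_by_site: {site: {txn: ticket}}. True if every pair of global
    transactions is ordered the same way at every site where both ran."""
    seen = {}
    for tickets in tickets_by_site.values():
        for a in tickets:
            for b in tickets:
                if a < b:  # visit each unordered pair of names once
                    a_first = tickets[a] < tickets[b]
                    if seen.setdefault((a, b), a_first) != a_first:
                        return False
    return True

s1, s2 = Site("S1"), Site("S2")
log = {"S1": {}, "S2": {}}
log["S1"]["GT1"] = s1.run_subtransaction(lambda: None)
log["S1"]["GT2"] = s1.run_subtransaction(lambda: None)
log["S2"]["GT2"] = s2.run_subtransaction(lambda: None)
log["S2"]["GT1"] = s2.run_subtransaction(lambda: None)

print(log)                      # {'S1': {'GT1': 0, 'GT2': 1}, 'S2': {'GT2': 0, 'GT1': 1}}
print(tickets_consistent(log))  # False
```

Because every global subtransaction writes the ticket, any two of them conflict directly at a site, so the local DBMS must serialize them and the ticket values expose that order.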

Controlling the Execution Order of Global Subtransactions
3) Use local DBs deploying rigorous CC algorithms
  • If all LDBMSs use rigorous 2-phase locking (R2PL) and support a “prepare-to-commit” interface, then
    – Global transactions are serializable without a CC algorithm at the GTM
    – Local transactions cannot cause indirect conflicts
  Ex: (w4(c) r1(c) c1 r2(d) c2 w4(d) c4) is not a rigorous local schedule. In R2PL, T4 holds all locks until commit, so T1 cannot read object c until after T4 commits.
4) Relax the serializability requirement
  • Use “strong correctness” instead
  • Most indirect conflicts have no effect on correctness
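
The rigorousness claim in the example for strategy 3 can be checked mechanically. The sketch below assumes the usual definition (no operation may conflict with an earlier operation of a transaction that has not yet terminated) and handles only commits, since the example schedule contains no aborts. The function name and the schedule string format are illustrative assumptions.

```python
import re

def is_rigorous(schedule):
    """True if no read or write conflicts with an earlier operation of a
    different transaction that has not yet committed."""
    tokens = re.findall(r"([rwc])(\d+)(?:\((\w+)\))?", schedule)
    active_ops = []  # (kind, txn, item) for transactions that are still running
    for kind, txn, item in tokens:
        if kind == "c":
            # Commit: the transaction releases everything it touched.
            active_ops = [op for op in active_ops if op[1] != txn]
            continue
        for prev_kind, prev_txn, prev_item in active_ops:
            if prev_item == item and prev_txn != txn and "w" in (kind, prev_kind):
                return False
        active_ops.append((kind, txn, item))
    return True

print(is_rigorous("w4(c) r1(c) c1 r2(d) c2 w4(d) c4"))  # False
print(is_rigorous("w4(c) w4(d) c4 r1(c) c1 r2(d) c2"))  # True
```

The second call shows an ordering that rigorous locking would allow: T4 commits before T1 and T2 touch c and d.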

Alternative Consistency Models
• Global schedule is not serializable; it is strongly correct
  – Global transactions preserve all data consistency constraints

Constraint-based strategies
• Local serializability: Some HDBS applications have no global constraints because each DBS is (and should be) independent of the others => no global concurrency control mechanism needed. So, local serializability ensures strong correctness of global executions.
  Ex application: travel reservation service for planes, trains, ferries, hotels, etc.
• Limited global constraints: Some applications need global constraints. Define 2 types of data: global data and local data. Global constraints may only span global data, and local transactions may not write to global data. Use two-level serializability (2LSR): local-SR and global-SR.
  Artificial solution: local site has no autonomy over or direct access to global data; local site must submit transactions to GTM to update global data stored at the local site => master-slave relationship.

Alternative Consistency Models
Non-constraint-based strategies
• Diverge from strong correctness and serializability
1) Epsilon Serializability
  • Allows a specified number of nonserializable conflicts
2) Sets of Compatible Transactions
  • Assume a set of known transactions
  • Pre-analyze the transactions for conflicts
  • Group non-conflicting transactions into compatible sets
  • No CC required among transactions in a compatible set
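
One way to picture the pre-analysis for compatible sets is a greedy grouping over declared read and write sets. The slides do not specify how the analysis is done, so the representation (a dict with "r" and "w" sets per transaction), the greedy placement, and the sample transactions are all assumptions of this sketch.

```python
def conflicts(t1, t2):
    """Two transactions conflict if one writes an item the other reads or writes."""
    return bool(t1["w"] & (t2["r"] | t2["w"]) or t2["w"] & t1["r"])

def compatible_sets(txns):
    """Greedily place each transaction in the first group it does not conflict with."""
    groups = []
    for name, access in txns.items():
        for group in groups:
            if not any(conflicts(access, txns[other]) for other in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

txns = {
    "T1": {"r": {"a"}, "w": {"a"}},
    "T2": {"r": {"b"}, "w": {"b"}},
    "T3": {"r": {"a", "b"}, "w": set()},
    "T4": {"r": {"c"}, "w": set()},
}

print(compatible_sets(txns))  # [['T1', 'T2', 'T4'], ['T3']]
```

Transactions inside one group touch disjoint data or only read shared data, so they could run together without concurrency control; T3 reads what T1 and T2 write and therefore lands in its own group.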

Global Atomicity and Recovery Problem
• The GTM must guarantee that a global transaction commits at all sites or aborts at all sites
• Local DBMSs wish to preserve their execution autonomy
  – May not implement or export a “prepare-to-commit” interface

[Figure: the GTM runs 2PC with the GTM proxy at each site for the subtransactions GST11 and GST12 of GT1, but there is no 2PC between a proxy and its LDBMS. One LDBMS aborts GST11 while the other commits GST12.]

• A local DBMS can unilaterally abort a subtransaction at any time
  – Results in non-atomic global transactions and incorrect global schedules
  – Local transactions and global subtransactions see committed partial results
Note: The first heterogeneous systems did not support update transactions!

Approaches to Achieve Atomicity and Durability
• If all LDBMSs export a “prepare-to-commit” interface, then use 2PC between the proxy and the LDBMS
• If some LDBMSs do not export “prepare-to-commit”, then there are four approaches:
1) Modify each global subtransaction to “callback to the proxy” just before local commit
  • Blocks the global subtransaction until the GTM completes 2PC with the proxies
  • Possible only if the LDBMS supports a client callback service
  • Fails if the LDBMS uses optimistic concurrency control

[Figure: the GTM runs 2PC with the GTM proxy; there is no 2PC between the proxy and the LDBMS.]
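
For the case where every LDBMS exports a prepare-to-commit interface, the 2PC exchange can be sketched as below. This is a bare-bones illustration with no logging, timeouts, or failure recovery; the `Participant` class and its methods are hypothetical stand-ins for a proxy and its LDBMS, not a real API.

```python
class Participant:
    """Stand-in for a proxy whose LDBMS exports prepare-to-commit (hypothetical)."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Vote yes only if the local subtransaction can still be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: broadcast the same decision to every participant.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return "commit" if decision else "abort"

ok = [Participant("S1"), Participant("S2")]
bad = [Participant("S1"), Participant("S2", can_commit=False)]

print(two_phase_commit(ok), [p.state for p in ok])    # commit ['committed', 'committed']
print(two_phase_commit(bad), [p.state for p in bad])  # abort ['aborted', 'aborted']
```

Approach 1 tries to emulate the `prepare` step for an LDBMS that lacks it: the callback holds the subtransaction just before its local commit until the decision arrives.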

Approaches to Achieve Atomicity and Durability
• If any global subtransaction aborts:
2) REDO failed write operations from global subtransactions
  – Performed by the proxy, which must maintain a local redo log
3) RETRY failed global subtransactions (read & write operations)
  – Performed by the proxy
  – Inappropriate semantics for many applications or transactions
  – No guarantee that the retry can ever be committed
  Ex: Banking application – withdrawing money can fail “forever”
4) UNDO committed global subtransactions by executing compensating transactions
  – Performed by the GTM
  – Can provide semantic atomicity (called a saga)
  – Inconsistent data is temporarily visible to other transactions!
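
The compensation approach (4) can be illustrated with a tiny saga runner. It is a sketch under simplifying assumptions: each step is a plain Python callable that "commits" immediately, compensations never fail, and the bank accounts and function names are invented for the example.

```python
def run_saga(steps):
    """steps: list of (action, compensation) callables, one pair per subtransaction.

    Each action commits locally right away. If a later action fails, the
    compensations of the already committed steps run in reverse order.
    """
    done = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(done):
                undo()
            return "compensated"
        done.append(compensation)
    return "committed"

accounts = {"A": 100, "B": 0}

def withdraw():
    accounts["A"] -= 30

def refund():
    accounts["A"] += 30

def deposit():
    raise RuntimeError("LDBMS at site B aborted the subtransaction")

print(run_saga([(withdraw, refund), (deposit, lambda: None)]))  # compensated
print(accounts)                                                  # {'A': 100, 'B': 0}
```

Between `withdraw` and `refund`, account A holds 70, and other transactions can read that value; this is the temporarily visible inconsistent data the slide warns about.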

Global Deadlock Problem
• Same problem as in distributed homogeneous DBMSs

[Figure: a global deadlock between the subtransactions T1x and T2x at Site X and T1y and T2y at Site Y. Labels recoverable from the figure: "holds lock Lx", "holds lock Ly", "holds lock La", "holds lock Lb", "T1x needs a", "T2y needs b", "waits for T1x to release Lx", "waits for T2y to release Ly", "waits for T1y to complete", "waits for T2x to complete".]
• In the homogeneous case, we solved the problem by exchanging lock information to construct the global “waits-for” graph
  – In an HDBS, this violates design autonomy and communication autonomy
• Therefore the GTM will be unaware of a global deadlock.
• There are no complete solutions to the global deadlock problem for autonomous multi-database systems.
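
The waits-for graph technique used in the homogeneous case can be sketched as follows. The example collapses subtransactions to their global transactions (T1, T2), which is a simplifying assumption, and the graph layout and function name are illustrative. It shows why the exchange of lock information matters: neither local graph has a cycle, only their union does.

```python
def find_cycle(waits_for):
    """Depth-first search over a waits-for graph {txn: set of txns it waits for}.
    Returns one cycle as a list of transactions, or None if the graph is acyclic."""
    visiting, finished = [], set()

    def visit(txn):
        if txn in visiting:
            return visiting[visiting.index(txn):] + [txn]
        if txn in finished:
            return None
        visiting.append(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        finished.add(txn)
        return None

    for txn in list(waits_for):
        cycle = visit(txn)
        if cycle:
            return cycle
    return None

site_x = {"T2": {"T1"}}  # at Site X, T2 waits for a lock held by T1
site_y = {"T1": {"T2"}}  # at Site Y, T1 waits for a lock held by T2

# Merging requires the sites to share lock information.
merged = {}
for graph in (site_x, site_y):
    for txn, blockers in graph.items():
        merged.setdefault(txn, set()).update(blockers)

print(find_cycle(site_x), find_cycle(site_y))  # None None
print(find_cycle(merged))                      # ['T2', 'T1', 'T2']
```

In an autonomous multi-database system the merge step is unavailable, which is why the GTM cannot see the cycle.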

Status: Transaction Management for HDBS
• Transaction management for HDBSs is a very active research area.
• Distributed transactions over the Internet define new semantics for transaction consistency, allowing development of new solutions.

Open issues:
• What can be done if some of the local subsystems (e.g., file systems) do not support transaction management?
• Performance implications of transaction management strategy?
• Handling of different degrees of consistency?

Conclusions
HDBS allows a uniform view on the combination of data maintained by different autonomous database systems.
• available: prototypes & commercial products with a set of fixed / specific drivers (so-called gateways) for existing, widely used data management systems (conventional DBS and file systems)
• missing: systematic support for individual integration of arbitrary data management systems
  – Examples: geographical DBs, multimedia DBs, Internet storefronts, etc.