Heterogeneity

Concurrency control in Heterogeneous Federations

• Global and local TA in multidatabases
• Local and global histories
• Direct and indirect conflicts
• Ticket based CC
• Coherency and CC in data sharing systems (caches)

based on Weikum / Vossen: Transactional Information Systems

Heterogeneous Distributed

Independent servers with local autonomy
- If all servers happen to be independent servers: multidatabase system (MDBS)
- Global knowledge is not available, global protocols are not an option!
- MDBS: loosely coupled distributed databases

Important:
- local transactions can interfere with global ones
- can solutions with scheduling guarantees be built from localized components?

hs / FUB dbsII-03-18DDBCC2-2

Heterogeneous federation

[Figure: global transactions enter a global TA manager, which sits on top of the local TA managers and data servers; local transactions go directly to the local servers.]

Local and global histories

Local history at site S
- All operations of local TAs at S and those operations of global TAs executed at S (subtransactions at S)

S1 = {a,b}, S2 = {c,d,e}
t1 = r(a) w(b)        local at S1
t2 = w(d) r(e)        local at S2
t3 = w(a) r(d)        global
t4 = w(b) r(c) w(e)   global

Local histories:
s1: r1(a) w3(a) c3 w1(b) c1 w4(b) c4
s2: r4(c) w2(d) r3(d) c3 r2(e) c2 w4(e) c4

Local and global history

Global history of a set of transactions
- Given local and global transactions Ti = {ti} with local histories s1, …, sn
- A global history h of Ti is a sequence of exactly the operations of all the local and global transactions, and
- the local projections of h are the local histories si

Example (t1, t2: local TAs)
s1: r1(a) w3(a) c3 w1(b) c1 w4(b) c4
s2: r4(c) w2(d) r3(d) c3 r2(e) c2 w4(e) c4
h:  r1(a) r4(c) w2(d) r3(d) w3(a) c3 r2(e) c2 c3 w1(b) c1 w4(b) w4(e) c4 c4
(c3 and c4 appear twice in h, once per site at which the global TA commits)

r1(a) w2(d) r4(c) … is NOT a global history: s2 has r4(c) before w2(d).

Global history serializable?

s1: t1 < t3, t1 < t4 ⇒ t1 t3 t4
s2: t2 < t4, t2 < t3 ⇒ t2 t3 t4

Global history is serializable: t1 < t2 < t3 < t4 ⇒ t1 t2 t3 t4 … by chance!

Local serializability does NOT imply global serializability in heterogeneous systems.

Serializability of global history

Counterexample with serializable local histories without a serializable global one

S1 = {a}, S2 = {b,c}
Global TAs: t1 = r(a) w(b), t2 = w(a) r(c)
Local TA:   t3 = r(b) w(c)

Local histories:
S1: r1(a) w2(a)
S2: r3(b) w1(b) r2(c) w3(c)

Local histories are serializable: t1 t2 at S1 and t2 t3 t1 at S2
Global history is not: t1 t2 t3 ?? (contradicts S2), t2 t3 t1 ?? (contradicts S1)

Global histories

Sad consequence:
Local serializability is not sufficient to guarantee global serializability in heterogeneous federations

Reason: indirect conflict between t1 and t2 at S2 caused by local TA t3:
r3(b) w1(b) r2(c) w3(c)

Solution in principle:
The global TA manager has to guarantee the same serialization sequence of global TAs at all sites, also in case of indirect conflicts
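The two examples can be checked mechanically. The following is a minimal Python sketch, assuming histories are written as space-separated tokens such as r1(a), w3(a), c3. The function names (parse, conflict_edges, has_cycle, globally_serializable) are illustrative and do not come from the slides. It builds one conflict graph per local history, takes the union, and looks for a cycle.

```python
from itertools import combinations

def parse(history):
    """Turn 'r1(a) w3(a) c3' into [('r', 1, 'a'), ('w', 3, 'a')]; commits are skipped."""
    ops = []
    for tok in history.split():
        if tok[0] in "rw":
            txn, item = tok[1:].rstrip(")").split("(")
            ops.append((tok[0], int(txn), item))
    return ops

def conflict_edges(history):
    """Edges (ti, tj): an operation of ti precedes a conflicting operation of tj."""
    edges = set()
    # combinations() keeps input order, so the first op of each pair comes first in the history
    for (k1, t1, x1), (k2, t2, x2) in combinations(parse(history), 2):
        if t1 != t2 and x1 == x2 and "w" in (k1, k2):
            edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as an edge set."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    state = {}  # node -> 'open' while on the DFS stack, 'done' afterwards

    def visit(n):
        if state.get(n) == "open":
            return True
        if state.get(n) == "done":
            return False
        state[n] = "open"
        found = any(visit(m) for m in succ.get(n, ()))
        state[n] = "done"
        return found

    return any(visit(n) for n in list(succ))

def globally_serializable(local_histories):
    """True if the union of all local conflict graphs is acyclic."""
    union = set().union(*(conflict_edges(h) for h in local_histories))
    return not has_cycle(union)

federation = ["r1(a) w3(a) c3 w1(b) c1 w4(b) c4",
              "r4(c) w2(d) r3(d) c3 r2(e) c2 w4(e) c4"]
counterexample = ["r1(a) w2(a)", "r3(b) w1(b) r2(c) w3(c)"]
print(globally_serializable(federation), globally_serializable(counterexample))
```

Each local graph of the counterexample is acyclic on its own; only the union contains the cycle t1 → t2 → t3 → t1. A real global TA manager cannot run this check, because it does not see the local transactions.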

Global histories

Even read-only transactions may be in conflict (!)

S1 = {a,b}, S2 = {c,d}

S1: r1(a) r3(a) r3(b) w3(a) w3(b) r2(b)
S2: r2(c) r4(c) r4(d) w4(c) w4(d) r1(d)

Global transactions t1 and t2 (both read-only!) are serialized differently at either site:
S1: t1 → t3 → t2
S2: t2 → t4 → t1

Conflicts in heterogeneous DDB

Direct and indirect conflicts

Let si be a local history, and let t, t' be TAs with operations in si.
• t and t' are in a direct conflict in si if there are two data operations p ∈ t and q ∈ t' in si that access the same data item and at least one of them is a write.
• t and t' are in an indirect conflict in si if there exists a sequence t1, ..., tr of transactions with operations in si such that t is in si in a direct conflict with t1, tj is in si in a direct conflict with tj+1 for 1 <= j <= r-1, and tr is in si in a direct conflict with t'.
• t and t' are in conflict in si if they are in a direct or an indirect conflict.
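The definition of direct and indirect conflicts translates almost literally into code. A small Python sketch follows, assuming the same token notation as the histories above. The names direct_conflicts and in_conflict are made up for illustration. As in the definition, a chain of direct conflicts is followed regardless of direction.

```python
def parse(history):
    """'r2(c) w4(c)' -> [('r', 2, 'c'), ('w', 4, 'c')]; commit tokens are ignored."""
    ops = []
    for tok in history.split():
        if tok[0] in "rw":
            txn, item = tok[1:].rstrip(")").split("(")
            ops.append((tok[0], int(txn), item))
    return ops

def direct_conflicts(history):
    """Unordered pairs {t, t'} that access the same item, at least one of them writing."""
    ops = parse(history)
    pairs = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                pairs.add(frozenset((t1, t2)))
    return pairs

def in_conflict(history, t, u):
    """Classify the conflict between t and u in this history: 'direct', 'indirect' or None."""
    pairs = direct_conflicts(history)
    if frozenset((t, u)) in pairs:
        return "direct"
    # Indirect: follow a chain t, t1, ..., tr, u in which neighbours are in direct conflict.
    reached, frontier = {t}, [t]
    while frontier:
        cur = frontier.pop()
        for p in pairs:
            if cur in p:
                (other,) = p - {cur}
                if other not in reached:
                    reached.add(other)
                    frontier.append(other)
    return "indirect" if u != t and u in reached else None

s2 = "r2(c) r4(c) r4(d) w4(c) w4(d) r1(d)"
print(in_conflict(s2, 1, 2))  # the two read-only global TAs at S2
```

For the read-only example, t1 and t2 share no data item at S2, yet the local TA t4 links them.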

Global serializability

How to achieve global serializability?

Reasonable assumption:
Servers guarantee local conflict serializability

Question: how should the global TA manager schedule global TAs in order to avoid indirect conflicts?

Global serializability

Schedule is globally conflict serializable (taking direct and indirect conflicts into account)
⇔ global conflict graph (serialization graph) does not contain a cycle

Hint: TAs in indirect conflict but not in direct conflict may commute:
Let t3 < t1 and t2 < t3. Then t1 and t2 may be exchanged as long as t3 < t1 and t2 < t3 still hold,
e.g. t2 t3 t1 is still a correct serialization

Global serializability

Example
S1: r1(a) w2(a)
S2: r3(b) w1(b) r2(c) w3(c)

In Site 1, t1 and t2 are in direct conflict.
In Site 2, they are in indirect conflict.

Note: t1 and t2 do commute at S2, but that leaves their indirect conflict unchanged.

Serialisation in Multidatabases: Questions

(1) Are there properties of local schedulers which guarantee global serializability? Which ones?
(2) If local schedulers can only guarantee local conflict serializability, e.g. by using 2PL locking or TO, how can the global scheduler notice a conflict?
(3) … what can the global scheduler do to enforce global conflict serializability?

Global serializability in heterogeneous multi-DBS

Goal: find properties of local schedulers in order to guarantee global serializability

Reasonable assumption: local schedulers guarantee conflict serializable schedules, e.g. 2PL, TO, Optimistic CC, …
… not sufficient, as we know

Stronger protocol: "rigorousness" (RG)
If there is a conflict pair opk(X) < opj(X), then transaction k commits (or aborts) before opj(X) is executed

Global serializability

… but not quite sufficient

Example
s1 = w1(a) c1 r3(a) r3(b) c3 w2(b) c2 is RG, with t1 < t2
s2 = w2(c) c2 r4(c) r4(d) c4 w1(d) c1 is RG, with t2 < t1

Problem: w1(d) c1 "too late", … or w1(a) c1 "too early"
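Rigorousness can be tested directly from its definition. Below is a minimal Python sketch under the same token notation (r, w, c for commit, a for abort). The helper names tokens and is_rigorous are assumptions for illustration. A history is rejected as soon as an operation conflicts with an earlier operation of a transaction that has not yet terminated.

```python
def tokens(history):
    """'w1(a) c1' -> [('w', 1, 'a'), ('c', 1, None)]"""
    out = []
    for tok in history.split():
        if tok[0] in "ca":  # commit or abort of a transaction
            out.append((tok[0], int(tok[1:]), None))
        else:               # read or write of a data item
            txn, item = tok[1:].rstrip(")").split("(")
            out.append((tok[0], int(txn), item))
    return out

def is_rigorous(history):
    """True if for every conflict pair opk(X) < opj(X), tk terminated before opj(X)."""
    finished = set()  # transactions that have committed or aborted so far
    seen = []         # data operations executed so far
    for kind, txn, item in tokens(history):
        if kind in "ca":
            finished.add(txn)
            continue
        for k0, t0, x0 in seen:
            conflicting = t0 != txn and x0 == item and "w" in (k0, kind)
            if conflicting and t0 not in finished:
                return False
        seen.append((kind, txn, item))
    return True

s1 = "w1(a) c1 r3(a) r3(b) c3 w2(b) c2"
s2 = "w2(c) c2 r4(c) r4(d) c4 w1(d) c1"
print(is_rigorous(s1), is_rigorous(s2))
```

Both example histories pass the check although they order t1 and t2 differently, which is exactly why rigorousness alone is "not quite sufficient".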

Global serializability

t is commit-deferred if its commit operation is sent by the GTM to the respective local sites only after all of t's local operations have been acknowledged.

Let s be a global history for s1, ..., sn. If each si is in RG and all global transactions are commit-deferred, then s is globally serializable.

Why? Because…
[Figure: conflict cycle t1 → t2 → … → tj-1 → tj == t1]
If s is not CSR, there must be a cycle connecting the (cycle-free) local graphs
⇒ t1 must have been committed at one site, but still be active at another (contradiction to commit deferral).

Global serializability

Rigorousness is not exotic:
Strict 2PL in the sense that all locks are released at commit (also known as strong strict 2PL, SS2PL) guarantees rigorousness

Consequence:
Heterogeneous transactional servers can be built with local servers using strict 2PL
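Commit deferral is a bookkeeping rule on the side of the global transaction manager. The following Python sketch shows one way it could look. The class CommitDeferredGTM and its methods are hypothetical. They only model which acknowledgements are outstanding; the actual messages to the local servers are left out.

```python
class CommitDeferredGTM:
    """Toy global TA manager that releases commits only after all acks have arrived."""

    def __init__(self):
        self.pending = {}  # txn -> set of (site, op) sent but not yet acknowledged
        self.sites = {}    # txn -> set of sites the transaction has touched

    def send(self, txn, site, op):
        """Record that a local operation of txn was submitted to a site."""
        self.pending.setdefault(txn, set()).add((site, op))
        self.sites.setdefault(txn, set()).add(site)

    def ack(self, txn, site, op):
        """Record the acknowledgement of a previously sent operation."""
        self.pending[txn].discard((site, op))

    def commit(self, txn):
        """Return the sites to which the commit may now be sent, or refuse."""
        if self.pending.get(txn):
            raise RuntimeError(f"t{txn} still has unacknowledged operations")
        return sorted(self.sites.get(txn, ()))

gtm = CommitDeferredGTM()
gtm.send(1, "S1", "w(a)")
gtm.send(1, "S2", "w(d)")
gtm.ack(1, "S1", "w(a)")
try:
    gtm.commit(1)             # refused: w(d) at S2 is not acknowledged yet
    early_commit = True
except RuntimeError:
    early_commit = False
gtm.ack(1, "S2", "w(d)")
commit_sites = gtm.commit(1)  # now the commit goes to both sites
```

With this rule the "too early" c1 at S1 from the rigorousness example cannot happen, because w1(d) at S2 would still be outstanding.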

Global serializability in heterogeneous environments

Suppose servers do not have "nice" properties (e.g. strict 2PL)

How could the local / global transaction managers check for cycles?

Serial execution of global transactions is not sufficient
S1 = {a,b}, S2 = {c,d}
S1: r1(a) c1 w3(a) w3(b) c3 r2(b) c2    ⇒ t1 t3 t2
S2: w4(c) r1(c) c1 r2(d) c2 w4(d) c4    ⇒ t2 t4 t1

If the global TM schedules t2 after t1 (serial execution of TAs) and t1 writes an (artificial) object which t2 reads (at each local site), then the serialization is either "t1 t2" or a local conflict will be detected!

Serial execution of global TAs is not realistic. Distributed solution?

Global serializability using tickets

Example:
S1 = {a}, S2 = {b,c}
S1: r1(a) w2(a)
S2: r3(b) w1(b) r2(c) w3(c) c3
@S1: t1 t2, @S2: t2 t3 t1

Take a site ticket! (I1 or I2):
S1: r1(I1) w1(I1+1) r1(a) c1 r2(I1) w2(I1+1) w2(a) c2
S2: r3(b) r1(I2) w1(I2+1) w1(b) c1 r2(I2) w2(I2+1) r2(c) c2 w3(c) c3

@S1: t1, t2
@S2: not serializable! Conflict detected!
     t3 < t1, t1 < t2 (forced!), t2 < t3
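The effect of taking a ticket can be reproduced on the example. This is a Python sketch, assuming Python 3.9 or later for graphlib and the token notation of the histories. The helpers take_tickets and locally_serializable are illustrative names. The value written to the ticket (I+1) is not modelled, since only the read/write conflict on the ticket object matters.

```python
from graphlib import TopologicalSorter, CycleError

def take_tickets(history, global_txns, ticket):
    """Insert 'r_i(ticket) w_i(ticket)' before the first operation of each global TA."""
    out, done = [], set()
    for tok in history.split():
        txn = int(tok[1:].split("(")[0])
        if txn in global_txns and txn not in done:
            out += [f"r{txn}({ticket})", f"w{txn}({ticket})"]
            done.add(txn)
        out.append(tok)
    return " ".join(out)

def locally_serializable(history):
    """True if the conflict graph of this single local history is acyclic."""
    ops = []
    for tok in history.split():
        if tok[0] in "rw":
            txn, item = tok[1:].rstrip(")").split("(")
            ops.append((tok[0], int(txn), item))
    preds = {}  # txn -> set of transactions that must be serialized before it
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                preds.setdefault(t2, set()).add(t1)
    try:
        TopologicalSorter(preds).prepare()  # raises CycleError on a cyclic graph
        return True
    except CycleError:
        return False

plain_s2 = "r3(b) w1(b) r2(c) w3(c)"
ticket_s2 = take_tickets(plain_s2, {1, 2}, "I2")
ticket_s1 = take_tickets("r1(a) w2(a)", {1, 2}, "I1")
print(ticket_s2)
print(locally_serializable(plain_s2), locally_serializable(ticket_s2))
```

Without tickets the history at S2 is locally serializable (t2 t3 t1). With tickets the forced edge t1 < t2 closes a local cycle, so a conflict-serializable local scheduler would refuse this interleaving.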

Global serializability using tickets

Ticket: e.g. counter at each site, e.g. timestamp

Basic idea: each global TA takes a ticket and thereby guarantees a particular serialization order (at this site!):
If t1 takes a ticket at S before t2, then t1 < t2 at S
Why? Taking a ticket reads and writes the same object, so any two global TAs are in direct conflict at S, in ticket order.

Global serializability:
Global TM must know the relative "ticket orders" ⇒ global ticket graph.
Global TM receives the ticket order from each local server
⇒ ticket graph acyclic ⇒ globally serializable

Global serializability using tickets

Alternative to global ticket graph?
Yes: if local servers S are conflict serializable and avoid cascading abort, the resulting global histories are conflict serializable!

Show: if there is an incompatible serialization order at Si and Sj, e.g. t1 < t2 at Si and t2 < t1 at Sj, then t2 cannot read the ticket at Si before t1 commits (and vice versa). No cascading abort!
Total ordering of commits assumed ⇒ contradiction.

No global ticket graph, serialization conflicts are detected completely locally
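The global ticket graph can be sketched in a few lines of Python (3.9 or later, for graphlib). The input format, a mapping from site to the ticket value each global TA obtained there, and the function name global_order are assumptions made for this illustration.

```python
from graphlib import TopologicalSorter, CycleError

def global_order(tickets_by_site):
    """Return a serialization order compatible with all sites, or None if the ticket graph is cyclic."""
    preds = {}  # txn -> set of transactions holding a smaller ticket at some site
    for tickets in tickets_by_site.values():
        ordered = sorted(tickets, key=tickets.get)  # transactions by ticket value at this site
        for txn in ordered:
            preds.setdefault(txn, set())
        for earlier, later in zip(ordered, ordered[1:]):
            preds[later].add(earlier)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None

compatible = {"S1": {"t1": 1, "t2": 2}, "S2": {"t1": 1, "t2": 2}}
incompatible = {"S1": {"t1": 1, "t2": 2}, "S2": {"t2": 1, "t1": 2}}
print(global_order(compatible), global_order(incompatible))
```

A global TM using this approach would abort one of the transactions on a cycle. The alternative on the slide avoids the graph and leaves detection to the local schedulers.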

Locking in data sharing systems

[Figure: Server 1, Server 2, ..., Server n, each with its own cache, connected by an interconnect; a global authority.]

Issues:
- Cache coherence

Locking in data sharing systems

Example:
- Online shop
  Product (pID, prodName, …)
  Customer (cID, cName, …)
  Order (oNo, cID, date, …)
  OItem (pID, qty, oNo, …)
- How should the DB be mapped to a data sharing architecture
  - with minimal locking overhead
  - with minimal overhead for guaranteeing cache coherence
- Static assignment of part of the data to a server and its lock manager, e.g. hash(pID) -> {i | Si is a server}
- Static data distribution is also effective for joins
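The static assignment hash(pID) -> {i | Si is a server} could look as follows in Python. The choice of CRC32 as hash function and the name lock_authority are assumptions for this sketch. Any hash that is stable across servers would do (Python's built-in hash() is not stable for strings across processes).

```python
import zlib

def lock_authority(p_id, n_servers):
    """Map a product id to the index i of the server Si that manages its locks."""
    return zlib.crc32(str(p_id).encode()) % n_servers

# Product rows and OItem rows with the same pID land on the same server,
# so a join on pID needs no remote lock requests.
product_home = lock_authority(4711, 4)
oitem_home = lock_authority(4711, 4)
```

Changing the number of servers reassigns most keys under this scheme, which is acceptable only because the assignment is meant to be static.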

Locking in data sharing systems

Naive solution: Global Lock Authority responsible for all locks

[Figure: message sequence between an application at S0 and servers S1, S2, ..., Sk, ..., Sn, one of which is the lock authority for x.
 read(ti, x): lockReq(x) to the lock authority for x, answered by lockGrant(x)
 read(tj, x)
 write(tj, x): wait until ti commits]

How to benefit from caches? Cache coherence?

Example from above: read data item (page), put it into cache, cache invalidate for every write on page

Locking in data sharing systems

Local lock authorities and call back locking
Very similar to cache coherence issue

Invariants:
- multiple caches can hold an up-to-date copy of the same page (page owners)
- a page owner may give a copy to servers which want to read, i.e. these become page owners
- once a page is updated in one of the caches, only this cache may hold a copy of the page

Lock authorities:
- Global, statically defined
- 0, …, n local read lock authorities RLAi(p) for a page p; RLAi(p) may grant read locks for p
- one write lock authority WLAi(p): may grant write locks for p if there is no RLA

Example

[Figure: Server 1, Server 2, ..., Server n with caches on an interconnect.
 1: Read p (Server 1)
 2: Grant ownership
 3: Read p
 4: Read p (Server 2)
 5: Grant ownership of p, get p from S1
 6: Read copy of p from cache]

• Locking: all copy holders are local lock authorities.

Call back locking

[Figure: a server requests write(p); the owners of p receive callback read-lock(p) and answer Ack(p); then write(p) authority is granted.]

Example from above:
Product / customer data will be copied into the caches, as long as there is no update. Efficient cache reads.

Example

[Figure: message sequence between Owner(x) and servers S1, S2, S3.
 r1(x): rlock(x) is requested from the owner, rlock authority(x) is granted
 r2(x): rlock(x) is requested from the owner, rlock authority(x) is granted
 c1
 r3(x), c3: no new request to the owner is shown
 w4(x): wlock(x) is requested from the owner
        the owner sends callback(x) to both holders of a read authority
        ok
        c2
        ok
        wlock authority(x) is granted]

"Evaluation" of callback locking

Callback lock / cache coherence protocol fits excellently to an application profile with
- many reads from TAs on different servers
- few writes on these objects
- writes on some objects by only one TA

Example from above
- Products: very few updates
- Customers: few updates, some inserts
- Orders: many updates ("shopping cart"), but updates within one TA, no concurrency

Finer lock granularity?
Cache transfer unit remains a page
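The message sequence of the callback example can be replayed with a small Python model. This is a sketch under simplifying assumptions: the class PageOwner and its methods are made up, and the bookkeeping of active local transactions, which in a real system lives at each server, is kept inside the owner object for brevity.

```python
class PageOwner:
    """Toy global lock authority for a single page x with callback locking."""

    def __init__(self):
        self.readers = set()  # servers holding a copy and a local read authority
        self.writer = None    # server holding the write authority, if any
        self.active = {}      # server -> uncommitted local transactions using x

    def rlock(self, server, txn):
        """Grant read authority unless another server holds the write authority."""
        if self.writer not in (None, server):
            return False      # calling back a writer is not modelled here
        self.readers.add(server)
        self.active.setdefault(server, set()).add(txn)
        return True

    def commit(self, server, txn):
        self.active.get(server, set()).discard(txn)

    def wlock(self, server):
        """Call back all other copy holders; return those that have not answered ok."""
        waiting = sorted(s for s in self.readers - {server} if self.active.get(s))
        if not waiting:       # every holder gave up its copy: grant write authority
            self.readers = set()
            self.writer = server
        return waiting

owner = PageOwner()
owner.rlock("S1", "t1")            # r1(x): rlock authority for S1
owner.rlock("S2", "t2")            # r2(x): rlock authority for S2
owner.commit("S1", "t1")           # c1
owner.rlock("S1", "t3")            # r3(x), served under S1's read authority
owner.commit("S1", "t3")           # c3
first_try = owner.wlock("S3")      # w4(x): S2 cannot say ok before c2
owner.commit("S2", "t2")           # c2
second_try = owner.wlock("S3")     # both holders answered ok
```

The first write request has to wait for S2 only, because all readers at S1 have already committed.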

Locking granularity

Refinements: record level locking
- For each page p: list of write-locked records W(p) = {r1, …, rk}
- request rlock(rec): grant read authority (p \ W), where p is the page of rec
- request wlock(rec): callback; if rec is not locked:
  - W := W ∪ {rec}
  - lock rec until eot
  - short term write lock on p to establish an up-to-date copy of the page

Further refinements
- Locking in an adaptive fashion
- Lock conflict on a page is rare → page lock as above; only if needed: record locks

Summary

Homogeneous federations
- No fundamental change of model(s)
- concurrency control needs global synchronization
- emphasis is on transparency

Heterogeneous federations
- the more important ones
- allow for a component-based approach built from (strong) local properties and the ticket method
- emphasis is on autonomy

Data sharing systems
- require coherency control and callback locking
- locking and coherence may be combined into (nearly) one mechanism