Lazy Database Replication with Ordering Guarantees

Khuzaima Daudjee and Kenneth Salem
School of Computer Science, University of Waterloo
Waterloo, Ontario, Canada
{kdaudjee, kmsalem}@db.uwaterloo.ca

Abstract

Lazy replication is a popular technique for improving the performance and availability of database systems. Although there are techniques which guarantee serializability in lazy replication systems, these techniques may result in undesirable transaction orderings. Since transactions may see stale data, they may be serialized in an order different from the one in which they were submitted. Strong serializability avoids such problems, but it is very costly to implement. In this paper, we propose a generalized form of strong serializability that is suitable for use with lazy replication. In addition to having many of the advantages of strong serializability, it can be implemented more efficiently. We show how generalized strong serializability can be implemented in a lazy replication system, and we present the results of a simulation study that quantifies the strengths and limitations of the approach.

1 Introduction

Replication is a popular technique for improving the performance and availability of a database system. In distributed database systems, replication can be used to bring more computational resources into play, or to move data closer to where it is needed.

A key issue in replicated systems is synchronization. Synchronization schemes can be classified as eager or lazy [7]. Eager systems propagate updates to replicas within the scope of the original updating transaction. This makes it relatively easy to guarantee transactional properties, such as serializability. However, since such transactions are distributed and relatively long-lived, the approach does not scale well. Lazy schemes, on the other hand, update replicas using separate transactions.

In this paper we address a problem related to the perceived ordering of execution of transactions in lazy replicated systems. To understand this problem, consider a customer of an electronic ticket reservation service. This customer issues two transactions(1) sequentially against the ticket reservation database. The first transaction, Treserve, books tickets for a concert and records the bookings in the customer's virtual shopping bag. The second transaction, Tstatus, reports the contents of the customer's shopping bag.

(1) Perhaps this customer issues these transactions indirectly, by interacting with a web service provided by the e-ticketer.

Since the customer issues Tstatus after Treserve has finished, it is reasonable for him to expect that Tstatus will "see" the effects of Treserve. That is, Tstatus should report that the shopping bag contains the concert tickets. This is, in fact, the behavior that many database systems will exhibit. For example, if the database system uses eager replication or no replication, and it uses two-phase locking, Tstatus will appear to follow Treserve. However, if the ticketing database is replicated and lazily synchronized, this may not be the case. The Treserve and Tstatus transactions may run against different copies of the data and, in particular, Tstatus may run against a copy that does not yet reflect the updates made by Treserve.

There are several concurrency control algorithms that guarantee serializability in lazy replicated systems [5, 4, 16, 13]. However, serializability is not enough to solve this transaction ordering problem. Serializability ensures that transactions will appear to have executed sequentially in some order. However, it does not guarantee that the serialization order will be consistent with the order in which the transactions are submitted to the system. A concurrency control algorithm that guarantees only serializability may serialize Tstatus before Treserve even if Treserve is submitted first. This happens when Tstatus is allowed to see a stale copy, as described above.

Our transaction ordering problem could be solved by using a concurrency control that guarantees strong serializability [3], rather than plain serializability. Informally, strong serializability means that sequential transactions must be serialized in the order in which they are submitted. In our example, Tstatus would have to be serialized after Treserve. However, none of the proposed concurrency controls for lazily replicated systems guarantee strong serializability, and there is good reason for this. Although strong serializability would address our transaction ordering problem, it is too strong. In a strongly serializable system, once one copy of the data has been updated, all other copies are effectively rendered useless until they, too, have been updated. Showing a stale copy to a subsequent transaction will violate the strong serializability guarantee. This effectively neutralizes many of the advantages of using lazy replication in the first place.

In this paper, we address this transaction ordering problem in lazy replicated systems. We make two contributions to the body of work in this area. First, since serializability is not strong enough to address the transaction ordering problem and strong serializability is too strong, we propose a new criterion called strong session serializability. The idea behind strong session serializability can be illustrated with the help of the diagram in Figure 1. The figure shows the transactions Treserve and Tstatus that have already been described. In addition, the figure shows a transaction Tavail, which is issued by a different customer. Tavail simply checks ticket availability for the concert.

[Figure 1. Execution Example with Two Customers. Customer #1 issues the transactions Treserve followed by Tstatus; Customer #2 independently issues the transaction Tavail during the same time period.]

With strong session serializability, we preserve the ordering of Treserve and Tstatus. However, we do not (necessarily) preserve the ordering of Tavail with respect to Treserve and Tstatus. The intuition is that the ordering of Treserve and Tstatus is important, in this case because the sequential nature of those transactions is observed by Customer 1. However, the ordering of Tavail relative to the other two transactions is not directly observed by either customer. Indeed, customers do not observe other customers' requests at all, except indirectly through their effects on the database. In our example, strong session serializability frees the system to serialize Tavail before or after Treserve, and before or after Tstatus.

In general, strong session serializability allows us to specify which transaction orderings are important and which are not. As we describe in Section 2, this is accomplished by grouping transactions into sessions. Transaction ordering is preserved between transactions in the same session, but not between transactions in different sessions.

By ignoring unimportant transaction orderings, we expect to be able to implement strong session serializability efficiently. Our second contribution is a pair of algorithms for implementing strong session serializability in lazy replicated systems. These algorithms are presented in Section 3.2. We have used simulation to compare the cost of providing strong session serializability using these algorithms with the cost of providing plain serializability, and with the cost of providing strong serializability. The results of these experiments are presented in Section 5. The experiments show that, under the right conditions, strong session serializability can be provided almost as efficiently as plain serializability, and at a much lower cost than strong serializability.

Strong session serializability provides a basis for capturing applications' transaction ordering constraints, while simultaneously providing sufficient flexibility to allow those constraints to be enforced in lazy replicated systems.

Proceedings of the 20th International Conference on Data Engineering (ICDE'04) 1063-6382/04 $20.00 © 2004 IEEE

2 Sessions and Strong Session Serializability

Breitbart, Garcia-Molina and Silberschatz [3] used the term strong serializability to mean serializability in which the ordering of non-concurrent transactions is preserved in the serialization order. Since we are concerned in this paper with replicated databases, we present here a slightly modified version of their definition that is appropriate when there is replication.

Definition 2.1 Strong 1SR: A transaction execution history H is strongly serializable (Strong 1SR) iff it is 1SR and, for every pair of committed transactions Ti and Tj in H such that Ti's commit precedes the first operation of Tj, there is some serial one-copy history equivalent to H in which Ti precedes Tj.

A transaction execution history over a replicated database is one-copy serializable (1SR) if it is equivalent to a serial history over a single copy of the database [2].

As discussed in Section 1, strong 1SR can be too strong: it requires the enforcement of transaction ordering constraints that may be unnecessary and costly. For example, strong 1SR requires Tavail to be serialized after Treserve in the example of Figure 1. In a replicated system, this means that Tavail cannot run at a site until Treserve's updates have been propagated there.

Strong session serializability captures the idea that ordering constraints may be necessary between some pairs of transactions, but not between others. Abstractly, we use sessions as a means of specifying which ordering constraints

are important and which are not. A session is simply a set of transactions. The transactions in an execution history H are partitioned into one or more sessions. Ordering constraints will be enforced among transactions in a single session, but not among transactions in different sessions.

[Table 1 residue: a comparison of the serialization orders permitted for Treserve and the other example transactions under Strong 1SR, Strong Session 1SR, and 1SR; the body of the table is not recoverable from this extraction.]
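To make the per-session ordering requirement concrete, here is a small illustrative sketch. It is not from the paper, and the function name is ours: given a one-copy serialization order and the submission order of each session, it checks that every session's internal order is respected, while deliberately leaving cross-session orderings unconstrained.

```python
# Illustrative sketch (our naming, not the paper's): check whether a
# serialization order satisfies the per-session ordering constraints of
# strong session 1SR. Every transaction in a session is assumed to
# appear in the serialization order.
def respects_sessions(serialization_order, sessions):
    """serialization_order: list of transaction names.
    sessions: iterable of lists, each giving one session's submission order."""
    position = {t: i for i, t in enumerate(serialization_order)}
    for session in sessions:
        # Each consecutive pair in a session must keep its submission order.
        for earlier, later in zip(session, session[1:]):
            if position[earlier] > position[later]:
                return False
    return True

# Customer 1's session orders Treserve before Tstatus; Tavail is its own session.
sessions = [["Treserve", "Tstatus"], ["Tavail"]]
assert respects_sessions(["Tavail", "Treserve", "Tstatus"], sessions)
assert respects_sessions(["Treserve", "Tavail", "Tstatus"], sessions)      # Tavail may float
assert not respects_sessions(["Tstatus", "Treserve", "Tavail"], sessions)  # session order broken
```

Note that strong 1SR would additionally force Tavail after Treserve whenever Tavail is submitted later; the check above intentionally imposes no such cross-session constraint.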

... with increasing read-only transaction workloads. The architecture is best suited to read-mostly workloads since the primary site is a potential update bottleneck. Many common workloads in application areas such as decision support and web commerce are read-mostly [19, 11, 20]. However, if the update load at the primary does become an issue, data partitioning can be used to distribute the update load over multiple primary sites. As discussed in Section 3.1.1, the important property of the primary site, for our purposes, is that it should be capable of generating transaction sequence numbers and propagating updates in serialization order.

[Figure 2. Lazy Master System Architecture. Clients submit update and read-only transactions to the secondary sites; update transactions are forwarded to the primary site; lazy update propagation flows from the primary site to the secondary sites.]

The basic system uses the following general approach to ensure global serializability. All update transactions are serialized by the local concurrency control at the primary site. Updates are propagated lazily to the secondary sites, and are installed there in the same order in which they were serialized at the primary. Thus, at any time, each secondary site will hold a snapshot, probably stale, of the primary database. Read-only transactions run against these snapshots. The secondary sites are not synchronized with each other, so at any given time some secondary sites may be more or less fresh than others. There are several specific issues that must be addressed in order to make this general approach work. We discuss these in the next two subsections.

3.1.1 The Primary Site

We assume that the primary site is capable of associating a sequence number with each update transaction committed there. We will use seq(T) to denote the sequence number of transaction T. We also assume that these sequence numbers are consistent with the local transaction serialization order at the primary site. That is, seq(T1) < seq(T2) if and only if T1 is serialized before T2 at the primary. One simple way to implement such sequence numbers is to use the log sequence numbers of the transactions' commit records.(2)

Updates made by committed transactions, together with the transactions' sequence numbers, are propagated lazily from the primary site to the secondary sites. Lazy propagation means that propagation occurs some time after the update transaction has been committed. We assume that the propagation mechanism propagates updates in serialization order, and that updates are not lost or reordered during propagation. If the primary database resides at a single site as shown in Figure 2, this is relatively easy to achieve using log sniffers, triggers, or other mechanisms. If, for performance reasons, the primary database is partitioned over several sites, it is necessary for these sites to generate a single, unified stream of updates (consistent with the primary serialization order) for propagation to the secondary sites. This is more difficult in the multi-site case than it is in the single-site case, but it is certainly possible. For example, Liu and colleagues have described a log-sniffing mechanism that merges updates from the logs of a partitioned database system into a single stream [10]. We will not be concerned in this paper with the exact format of the change log or the update propagation messages.

3.1.2 The Secondary Sites

At each secondary site, propagated updates are placed in a FIFO update queue. A refresh process removes propagated updates from the update queue and applies them to the local database copy. Note that there is an independent refresh process at each secondary site. For each propagated update transaction, the refresher uses a local transaction to remove the transaction's updates from the update queue and then apply them to the local secondary database copy. These local transactions are called refresh transactions. Thus, for every update transaction at the primary there is eventually one corresponding refresh transaction at each secondary site.

Refresh transactions are also used to maintain a database sequence number, seq(DB), at each secondary site. Each secondary site has its own seq(DB), which is independent of the sequence numbers at the other secondary sites. When a refresh transaction applies updates at a secondary site, it also sets that site's seq(DB) to match the transaction sequence number that was propagated with those updates. These sequence numbers are not needed by the basic system to ensure serializability. However, as described in Section 3.2, they are used by the session managers to enforce strong session serializability.

It is important that refresh transactions at each secondary site be serialized in the order in which their corresponding update transactions were serialized at the primary site. We use a simple mechanism to achieve this. First, the refresh process performs the refresh transactions sequentially in the order in which updates are retrieved from the local update queue. Second, the local concurrency controller at the secondary is required to guarantee the strong serializability property. This property ensures that sequentially executed transactions are serialized in the order in which they are executed.(3) Note that it is possible to use other techniques, e.g. ticketing or transaction conflict analysis [6, 8], to order the refresh transactions. However, since our approach depends only on the proper serialization of the refresh transactions and not on the particular mechanism used to achieve it, we will stick with the simple sequential execution mechanism in this paper.

For each read-only transaction T in session s:
    WAIT UNTIL seq(DB) >= seq(s) AT SECONDARY SITE
    START T AT SECONDARY SITE
    PERFORM T'S READS AT SECONDARY SITE
    COMMIT OR ABORT T AT SECONDARY SITE
    IF T COMMITTED THEN
        seq(s) <- seq(DB)
    ENDIF

For each update transaction T in session s:
    START T AT PRIMARY SITE
    PERFORM T'S READS/WRITES AT PRIMARY SITE
    COMMIT OR ABORT T AT PRIMARY SITE
    OBTAIN seq(T) FROM PRIMARY SITE
    IF T COMMITTED THEN
        seq(s) <- seq(T)
    ENDIF

Figure 3. The BLOCK Algorithm

Theorem 1 The basic system guarantees 1SR.
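The machinery assumed by Theorem 1 — commit-time sequence numbers at the primary, in-order propagation, and a sequential refresh process that advances seq(DB) — can be illustrated with a minimal single-secondary sketch. This is our own rendering under our own names (Primary, Secondary, outbox), not the paper's implementation; the outbox deque stands in for lazy propagation.

```python
# Illustrative sketch (our naming): the basic lazy-master pipeline.
from collections import deque

class Primary:
    """Assigns commit sequence numbers and emits updates in serialization order."""
    def __init__(self):
        self.seq = 0
        self.outbox = deque()           # stands in for the lazy propagation stream
    def commit(self, writes):
        self.seq += 1                   # e.g. the commit record's log sequence number
        self.outbox.append((self.seq, writes))
        return self.seq

class Secondary:
    """Applies propagated updates sequentially; seq(DB) tracks freshness."""
    def __init__(self):
        self.db = {}
        self.seq_db = 0                 # seq(DB) in the paper's notation
    def refresh(self, update_queue):
        while update_queue:             # one refresh transaction per update transaction
            seq_t, writes = update_queue.popleft()
            self.db.update(writes)
            self.seq_db = seq_t         # set seq(DB) to the propagated seq(T)

primary, secondary = Primary(), Secondary()
primary.commit({"tickets": 98})
primary.commit({"bag": ["concert"]})
secondary.refresh(primary.outbox)       # propagation happens some time later
assert secondary.seq_db == 2 and secondary.db["tickets"] == 98
```

Because refreshes are applied in queue (hence serialization) order, a read-only transaction at the secondary always sees some prefix of the primary's update history, which is the crux of the proof that follows.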

Proof: Let Ti represent the ith transaction in the local serialization order at the primary, and let Si represent the database state at the primary site that results from Ti's execution. Without loss of generality (since secondary sites are independent of one another), suppose there is a single secondary site, and let Ri represent the ith refresh transaction to commit at that site. Because (1) the primary site propagates updates in serialization order, (2) updates are not reordered during propagation, (3) the refresh process applies update transactions in the order received, and (4) the local concurrency control at the secondary site ensures strong serializability, Ri applies Ti's updates to the database at the secondary site. Since this is true for all i, and since refresh transactions are the only update transactions at the secondary site, the secondary database will be in state Si as a result of Ri's execution.(4) Now consider an arbitrary read-only transaction Tr that is serialized after Ri and before Ri+1 by the local concurrency control at the secondary site. Tr sees state Si. Thus, it can be serialized after Ti and before Ti+1 in an equivalent one-copy serial schedule. All other read-only transactions at the secondary can be serialized the same way. □

3.2 The Session Manager

In this section, we show how to augment the basic system so that it will guarantee strong session 1SR. Previously, we assumed that each client submits a sequence of transactions to a secondary site. We will now make the additional assumption that each client's transactions constitute a session. That is, we wish to enforce ordering constraints among the transactions submitted by a single client, but not between transactions submitted by different clients. Each transaction has associated with it, either explicitly or implicitly, a session label, so that the secondary site can tell which transactions belong to which session.

This simple session model has two properties which are significant to the algorithms that we will describe. The first property is that the transactions within a session are submitted sequentially, not concurrently. The second property is that each session is associated with a single secondary site. A relaxation of the first property would be relatively simple to handle. We have chosen to retain it because it simplifies the presentation of our algorithms. The second property, on the other hand, allows the secondary sites to operate independently of one another. Without it, enforcement of strong session 1SR would require coordination among the secondary sites.

Each secondary site has a session manager. The session managers, together with the propagation, refresh, and local concurrency control mechanisms described in Section 3.1, are responsible for enforcing strong session 1SR. Each session manager is responsible for the sessions that are associated with its secondary site.

3.2.1 The BLOCK Algorithm

We propose two algorithms that can be used by the session managers to enforce strong session 1SR. The first algorithm, BLOCK, is summarized in Figure 3. Each session manager maintains, for each of the sessions s for which it is responsible, a sequence number seq(s). A session's sequence number essentially indicates the database state "seen" by the most recently committed transaction in that session.

(3) Strong serializability is guaranteed by well-known concurrency controls such as two-phase locking.
(4) There is an implicit assumption here that all copies of the database start in the same state.
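The seq(s) bookkeeping of Figure 3 can be rendered as a small Python sketch. This is a simplified, single-threaded illustration under our own naming (BlockSessionManager and the two site stubs); blocking is modeled as a refusal to run, whereas a real session manager would wait for refresh transactions to advance seq(DB), and all concurrency control is elided.

```python
# Illustrative sketch (our naming): the BLOCK session manager of Figure 3.
class SecondaryStub:
    """Minimal stand-in for a secondary replica: a data copy plus seq(DB)."""
    def __init__(self):
        self.db, self.seq_db = {}, 0

class PrimaryStub:
    """Minimal stand-in for the primary: commits assign sequence numbers."""
    def __init__(self):
        self.seq = 0
    def commit(self, writes):
        self.seq += 1
        return self.seq

class BlockSessionManager:
    def __init__(self):
        self.seq_s = {}                          # session label -> seq(s)

    def run_read_only(self, session, secondary):
        # WAIT UNTIL seq(DB) >= seq(s): modeled here as a readiness test.
        if secondary.seq_db < self.seq_s.get(session, 0):
            return None                          # would block in practice
        result = dict(secondary.db)              # perform T's reads
        self.seq_s[session] = secondary.seq_db   # on commit: seq(s) <- seq(DB)
        return result

    def run_update(self, session, primary, writes):
        seq_t = primary.commit(writes)           # update transactions run at the primary
        self.seq_s[session] = seq_t              # on commit: seq(s) <- seq(T)
        return seq_t

# A session updates at the primary (seq(T) = 1); the secondary is still at
# seq(DB) = 0, so a subsequent read in the same session must wait.
sm, pri, sec = BlockSessionManager(), PrimaryStub(), SecondaryStub()
sm.run_update("s1", pri, {"bag": ["concert"]})
assert sm.run_read_only("s1", sec) is None       # stale secondary: blocks
sec.db, sec.seq_db = {"bag": ["concert"]}, 1     # refresh transaction arrives
assert sm.run_read_only("s1", sec) == {"bag": ["concert"]}
```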

The BLOCK algorithm works by delaying read-only transactions, if necessary, until the database state at the secondary site is at least as great as seq(s).(5) This, plus the local concurrency control at the secondary site, is sufficient to ensure that each read-only transaction will be serialized after its predecessors in its session.

(5) Recall that seq(DB), which indicates the current state of the database at the secondary site, is maintained by the refresh transactions.

Theorem 2 If each site guarantees strong serializability locally, then the BLOCK algorithm, in conjunction with the propagation and refresh mechanisms of the basic system, guarantees global strong session 1SR.

Proof: Suppose that the claim is false, which means that there exists a pair of transactions T1 and T2 in the same session s for which T1 is executed before T2 but T1 cannot be serialized before T2. There are four cases to consider:
Case 1: Suppose T1 and T2 are update transactions. T1 and T2 both execute at the primary site. Since the primary site ensures strong serializability and since T2 starts after T1 finishes, T1 is serialized before T2, a contradiction.
Case 2: Suppose T1 is a read-only transaction and T2 is an update transaction. Since T1 precedes T2 and T2 precedes its refresh transaction (T2R), T1 precedes T2R at the secondary site. Since the secondary site ensures strong serializability, T1 is locally serialized before T2R at the secondary, and thus it is globally serialized before T2, a contradiction.
Case 3: Suppose T1 is an update transaction and T2 is a read-only transaction. When T1 commits, seq(s) is set to seq(T1). The blocking condition ensures that no subsequent read-only transaction in s can run until seq(DB) is at least as large as seq(T1). Thus, T2 does not execute until sometime after T1's refresh transaction has finished. Since the secondary site ensures strong serializability, T2 is serialized after T1's refresh transaction at the secondary site, and therefore after T1 in the global serialization order, a contradiction.
Case 4: Suppose both T1 and T2 are read-only transactions. Both transactions run at the secondary site. Since strong serializability is guaranteed locally there, T2 sees the database in the same state as T1, or in a later state. Thus, it can be serialized after T1, a contradiction. □

3.2.2 The FORWARD Algorithm

The second algorithm, called FORWARD, handles update transactions the same way that the BLOCK algorithm does, but it handles read-only transactions differently. The handling of read-only transactions under the FORWARD algorithm is summarized in Figure 4.

For each read-only transaction T in session s:
    IF seq(DB) >= seq(s) AT SECONDARY SITE THEN
        START T AT SECONDARY SITE
        PERFORM T'S READS AT SECONDARY SITE
        COMMIT OR ABORT T AT SECONDARY SITE
        IF T COMMITTED THEN
            seq(s) <- seq(DB)
        ENDIF
    ELSE
        START T AT PRIMARY SITE
        PERFORM T'S READS AT PRIMARY SITE
        COMMIT OR ABORT T AT PRIMARY SITE
        OBTAIN seq(T) FROM PRIMARY SITE
        IF T COMMITTED THEN
            seq(s) <- seq(T)
        ENDIF
    ENDIF

Figure 4. Read-Only Transactions Under the FORWARD Algorithm

As was the case under the BLOCK algorithm, each session manager maintains, for each of its sessions, a sequence number seq(s). The FORWARD algorithm works by forwarding some read-only transactions to the primary site for execution. Specifically, if the database at the secondary site is too stale to execute a particular read-only transaction without violating session ordering constraints, then the read-only transaction is forwarded to the primary site. Since the primary site is always in the "latest" state, this ensures that the read-only transaction will "see" its session predecessors. However, it increases the load at the primary site, which is also responsible for all update transactions.

In Section 3.1.1, we assumed that the primary site is able to associate a sequence number with each update transaction that commits there. When the FORWARD algorithm is used, it is also necessary for the primary site to associate a sequence number with each read-only transaction that is forwarded to it. The primary site should not associate new sequence numbers with read-only transactions. Rather, each read-only transaction should be assigned the same sequence number as the most recent (in the primary's local serialization order) committed update transaction.

Theorem 3 If each site guarantees strong serializability locally, then the FORWARD algorithm, in conjunction with the propagation and refresh mechanisms of the basic system, guarantees global strong session 1SR.

Proof: Suppose that the claim is false, which means that there exists a pair of transactions T1 and T2 in the same session for which T1 is executed before T2 but T1 cannot be serialized before T2. There are four cases to consider:

Case 1: Suppose T1 and T2 both execute at the primary. This is the same as Case 1 in Theorem 2.
Case 2: Suppose T1 executes at the secondary and T2 executes at the primary. If T2 is an update transaction, this is the same as Case 2 in Theorem 2. If T2 is a read-only transaction, let Tu be the latest update transaction whose effects were seen by T1. Tu preceded T1, and thus T2, at the primary site. Since the primary ensures strong serializability, T2 must be serializable after Tu, and thus after T1.
Case 3: Suppose T1 executes at the primary and T2 executes at the secondary. When T1 commits, seq(s) is set to seq(T1). The forwarding condition ensures that no subsequent transaction in s can run at the secondary until seq(DB) is at least as large as seq(T1). If T1 is an update transaction, then the condition ensures that T2 does not run until sometime after T1's refresh transaction has finished. Since the secondary site ensures strong serializability, T2 is serialized after T1's refresh transaction at the secondary site, and therefore after T1 in the global serialization order, a contradiction. If T1 is a read-only transaction, a similar argument shows that T2 sees at least the same updates as T1, and thus can be serialized after T1.
Case 4: Suppose T1 and T2 both execute at the secondary. This is the same as Case 4 in Theorem 2. □

3.2.3 Discussion

Both the BLOCK algorithm and the FORWARD algorithm require the session manager to read the value of seq(DB). As a result, the session manager conflicts with refresh transactions running at its secondary site, and seq(DB) is a potential database hotspot. However, the session manager need not read seq(DB) within the scope of the requested transaction (T in Figures 3 and 4). Instead, it can minimize contention for seq(DB) by using a separate, short transaction to read it.

A second issue with both algorithms is maintaining the strong session 1SR guarantee in the case of a failure of a session manager. Under either algorithm, the session manager updates seq(s) outside the scope of the requested transaction T. This creates a small failure window between the commit of T and the change to seq(s). Of course, this window could be closed completely by simply updating seq(s) within the scope of T in Figures 3 and 4. However, this may hurt performance by increasing the size of T, and by turning read-only transactions into update transactions. Furthermore, updating seq(s) within T may imply reading seq(DB) within T, since the new value of seq(s) may come from seq(DB). This further increases the size of T, and may introduce conflict between T and refresh transactions. To avoid these potential performance problems, we allow the failure window to exist, and we address the problem by properly reinitializing seq(s) when the session manager recovers from a failure. Specifically, the session manager should initialize seq(s) (for each of its sessions s) to the current sequence number of the database at the primary site. The session manager can determine the primary's sequence number by executing a dummy transaction Tinit at the primary, obtaining Tinit's sequence number from the primary, and then initializing seq(s) to seq(Tinit).

4 Simulation Model

We have developed a simulation model of the lazy master replicated database system described in Section 3.1. We have used the simulator to compare the BLOCK and FORWARD global concurrency control protocols described in Section 3.2 and to determine the impact of the correctness criterion (1SR, strong session 1SR, or strong 1SR) on system performance. The model is implemented in C++ using the CSIM simulation package [12]. The model consists of one CSIM resource (server) for each site in the system.

Parameter           Description                                 Default
num_clients         number of clients                           varies
num_sec             number of secondary sites                   5
think_time          mean client think time                      7 s
session_time        mean session duration                       15 min.
update_tran_prob    probability of an update transaction        20%
conflict_prob       transaction conflict probability            20%
tran_size           mean number of operations per transaction   10
op_service_time     service time per operation                  0.02 s
update_op_prob      probability of an update operation          30%
propagation_delay   propagator think time                       10 s

Table 2. Simulation Model Parameters

Client processes simulate the execution of transactions requested by system clients. Each client process is associated with a particular secondary site, and it submits all of its transactions to that site. The client processes are uniformly distributed over the available secondary sites. In addition, there are processes that simulate update propagation and refresh transactions at the secondary sites. The model's parameters are summarized in Table 2.

There are num_clients client processes. Each client process generates a series of sessions. Session lengths are exponentially distributed with a mean length of session_time.
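Returning briefly to the FORWARD algorithm of Section 3.2.2 (Figure 4), its read-only routing decision can be sketched as follows. This is our own simplified rendering with our own names (the dicts stand in for the two sites): a fresh-enough secondary serves the read locally, while a stale one causes the read to be forwarded to the primary, where it is tagged with the sequence number of the most recently committed update transaction rather than a new number of its own.

```python
# Illustrative sketch (our naming): FORWARD's routing rule for read-only
# transactions. seq_s is the session's sequence number; the two dicts
# stand in for the secondary and primary sites.
def run_read_only_forward(seq_s, secondary, primary):
    """Returns (read result, new value of seq(s))."""
    if secondary["seq_db"] >= seq_s:
        # Fresh enough: run locally and adopt the secondary's state number.
        return dict(secondary["db"]), secondary["seq_db"]
    # Too stale: forward to the primary, which assigns the read-only
    # transaction the sequence number of the latest committed update.
    return dict(primary["db"]), primary["last_update_seq"]

primary = {"db": {"bag": ["concert"]}, "last_update_seq": 7}
stale_secondary = {"db": {"bag": []}, "seq_db": 5}
result, new_seq = run_read_only_forward(7, stale_secondary, primary)
assert result == {"bag": ["concert"]} and new_seq == 7   # forwarded to primary
result, new_seq = run_read_only_forward(4, stale_secondary, primary)
assert result == {"bag": []} and new_seq == 5            # ran at the secondary
```

The trade-off the paper describes is visible here: forwarding never delays the read, but every forwarded read adds load at the primary, which already carries all update transactions.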

Within each session, a client iteratively generates transactions, with exponentially distributed think times of mean length think_time between them. When a session is finished, the client immediately initiates a new session so that the total number of concurrent sessions is num_clients at all times.

A transaction is an update transaction with probability update_tran_prob; otherwise it is a read-only transaction. Our default transaction mix is 80% read-only, 20% update, although we also ran some experiments with a 95%/5% mix. Note that the "shopping" mix specified in the TPC-W benchmark is an 80/20 mix of read-only and update web interactions, and the "browsing" mix is a 95/5 mix.(6) The default think_time and session_time values are also taken from the TPC-W benchmark specification [19].

(6) The TPC-W benchmark specifies the read/write mix in terms of web interactions, not database transactions. There is not necessarily a one-to-one correspondence between the two.

In the transaction execution model, each transaction proceeds first to the session manager at the secondary site to which it is submitted. The session manager directs each transaction to the primary site or to the secondary site for execution. If the session manager is using the BLOCK algorithm, it may also cause the transaction to block.

After leaving the session manager, each transaction proceeds to the local concurrency control at the site to which it is assigned. We use a simple model of the local concurrency control, which works as follows. Each newly arriving transaction has a probability conflict_prob of conflicting with each transaction that is already in the local system (either running or waiting to run). If a newly arriving transaction does not conflict with any existing transactions, it begins running immediately. Otherwise, it waits for all conflicting transactions to complete before it begins running.

Once a transaction has finished with the local concurrency control, it can run. Each transaction consists of a number of operations. Each operation takes time op_service_time at the server. The number of operations in each transaction is randomly chosen in the range tran_size plus or minus 50%. For update transactions, each operation has probability update_op_prob of being an update; otherwise it is a read. All transactions running at a given site share that site's single server, which uses a round-robin queuing discipline with a time slice of 0.001 seconds.

Propagation processes running at each site are used to simulate the propagation mechanism. There is one propagation process per site. At the primary site, the propagation process goes through a series of propagation cycles, with a delay of propagation_delay time units between the end of one cycle and the beginning of the next. During each propagation cycle, the process propagates all transactions that have committed at the primary since the last propagation cycle. To do this, the process generates a single message describing the updates of all of these transactions and broadcasts the message to the secondary sites. The primary's propagation process consumes op_service_time during each propagation cycle. It does not use the local concurrency control at the primary site, since we assume that the propagator is implemented as a log sniffer.

Propagation messages are assumed to be delivered at the secondary sites after a delay of length propagation_delay. The simulation does not include an explicit resource to represent the network. We assume that the network has sufficient capacity so that network contention is not a significant contributor to the propagation delay.

At each secondary site, the local propagation process receives the broadcast propagation messages and installs the transaction update information in an update queue in the local database. The propagation process consumes op_service_time for each propagation message it handles. It does not use the local concurrency control when it inserts update information into the update queue. The queue operations conflict only with the refresh processes, which read from the other end of the queue. Thus, we assume that any contention would be insignificant.

At each secondary site there is a set of refresh processes. Each refresh process iteratively waits to obtain one transaction's update records from the local update queue, and then runs a single refresh transaction to apply those updates to the secondary database. The refresh transaction must pass through the local concurrency control at the secondary site. It has probability conflict_prob of conflicting with each read-only transaction at the site. In addition, we force a conflict between every pair of refresh transactions at a site. This, together with our local concurrency control mechanism, ensures that refresh transactions are not reordered by the local concurrency control. The refresh process consumes op_service_time at the secondary server to retrieve the transaction record from the update queue. It consumes an additional op_service_time per update operation during the execution of the refresh transaction itself.

5 Performance Analysis

We ran a series of experiments using the simulation model, with two goals in mind. The first was to determine the cost, in terms of transaction throughput and response time, of providing strong session 1SR rather than the weaker 1SR. The second was to compare the two algorithms, BLOCK and FORWARD, that guarantee strong session 1SR. To facilitate these comparisons we implemented two more algorithms for the session manager, in addition to BLOCK and FORWARD:

ALG-1SR: ALG-1SR provides only global serializability (1SR), not strong session 1SR. ALG-1SR simply routes all update transactions to the primary site, and

all read-only transactions to the secondary site. Transactions are never blocked by ALG-1SR itself, although they may, of course, be blocked by the local concurrency control at their assigned site. Under this algorithm, the system operates like the basic system described in Section 3.1.

ALG-STRONG-SITE-1SR: This is the same as the BLOCK algorithm of Figure 3 except that there is only a single session per secondary site, rather than one session per client. Thus, ALG-STRONG-SITE-1SR maintains only one session sequence number (seq(s)) at each secondary site. It enforces many more transaction ordering constraints than those required by strong session 1SR, but fewer than would be required by a true implementation of strong 1SR. In our experiments we have used it as a surrogate for a strong 1SR algorithm. ALG-STRONG-SITE-1SR should result in performance no worse than (and probably significantly better than) that of any algorithm that provides true strong 1SR.

5.1 Methodology

For each run, the simulation parameters were set to the default values shown in Table 2, except as indicated in the descriptions of the individual experiments. Each run lasted for 35 simulated minutes. We ignored the first five minutes of each run to allow the system to warm up, and measured transaction throughput, response times and other statistics over the remainder of the run. Each reported measurement is an average over five independent runs. We computed 95% confidence intervals around these means. These are shown as error bars in the graphs.

[Figure 5. Transaction Throughput vs. Number of Clients]

[Figure 6. Read-Only Transaction Response Time vs. Number of Clients]
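The four session-manager variants compared in this section can be summarized in a short sketch. This is an illustrative reconstruction from the descriptions above, not the authors' code: the names Site, Session, and route_read are invented here, and sequence numbers are assumed to be integers assigned by the primary at commit time.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    applied_seq: int = 0   # highest primary commit sequence number applied here

@dataclass
class Session:
    last_update_seq: int = 0   # seq of this session's most recent update transaction

def route_read(session, secondary, primary, policy):
    """Return (site name, must_wait) for a read-only transaction.

    Update transactions go to the primary under every policy; the policies
    differ only in how they treat a read that would see a stale secondary.
    """
    fresh = secondary.applied_seq >= session.last_update_seq
    if policy == "ALG-1SR":
        return secondary.name, False                 # run immediately, possibly stale
    if policy == "FORWARD":
        # Redirect the read to the (always fresh) primary if needed.
        return (secondary.name, False) if fresh else (primary.name, False)
    if policy == "BLOCK":
        # Hold the read at the secondary until the session's last update
        # has been applied there by a refresh transaction.
        return secondary.name, not fresh
    raise ValueError("unknown policy: " + policy)
```

In this sketch, ALG-STRONG-SITE-1SR is simply the BLOCK case with a single site-wide Session shared by all of a secondary's clients, so every read waits on the site-wide sequence number rather than its own session's.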

5.2 Performance of the Default Configuration

We ran a series of experiments in which our default configuration, with five secondary sites, was subjected to load from an increasing number of clients. The results of these experiments are shown as throughput, read-only transaction response time and update transaction response time plots in Figures 5, 6 and 7, respectively. Each curve describes the behavior of one of the four global concurrency control algorithms (ALG-1SR, BLOCK, FORWARD and ALG-STRONG-SITE-1SR) that we considered.

[Figure 7. Update Transaction Response Time vs. Number of Clients]

At low loads, the FORWARD algorithm provides better read response times than BLOCK, and approximately the same overall throughput and response times as ALG-1SR. However, as the load increases, FORWARD breaks down because it saturates the primary site. The BLOCK algorithm is more consistent. Its performance is similar to that of ALG-1SR over the load range that we tested,

and it was much better than that of ALG-STRONG-SITE-1SR. This indicates that strong session 1SR, which is guaranteed by the BLOCK algorithm, can be achieved at about the same cost as plain 1SR. Furthermore, a comparison of the performance of ALG-STRONG-SITE-1SR with that of BLOCK shows that it is important to avoid enforcing unnecessary transaction ordering constraints. The BLOCK algorithm, which only enforces transaction ordering constraints within a session, performs better than ALG-STRONG-SITE-1SR, which enforces additional constraints.

These results are sensitive to the update propagation delay, which is a key parameter in our performance model. Ideally, updates would be propagated to the secondary sites very quickly, and the secondary databases would be almost as fresh as the primary. In our model, this can be simulated by setting propagation delay close to zero. We ran experiments under that condition (not shown), and found that all four of the algorithms have almost identical performance. That is, if all of the replicas are very current, strong session 1SR is easy to achieve, and it does not matter which of the algorithms is used to achieve it. In practice, however, scheduling at the primary site, network latencies, and batching may introduce propagation latencies, and will make very low latencies difficult to achieve. A nice property of strong session 1SR is that it allows some propagation latency to be tolerated without impacting transaction throughput and response time. In particular, inter-transaction think times within a session should effectively hide propagation latency. In our default configuration, the mean transaction think time (7 sec) is comparable to the propagation latency (10 sec). This is why the BLOCK algorithm performs well in the results shown in Figures 5, 6 and 7.

[Figure 8. Transaction Throughput, 20 Clients per Secondary]

[Figure 9. Read-Only Transaction Response Time, 20 Clients per Secondary]
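The extent to which think times hide propagation latency can be illustrated with a back-of-envelope model. This model is ours, not the paper's: assume a client commits an update, thinks for an exponentially distributed time T with mean m, then issues a read that must see that update, and assume the update becomes visible at the secondary exactly d seconds after commit. The read's expected residual wait is then E[max(0, d - T)] = d - m(1 - e^(-d/m)).

```python
import math

def expected_residual_wait(d, m):
    """Expected blocking time of the next read under the model above:
    d = propagation delay, m = mean (exponential) think time."""
    return d - m * (1.0 - math.exp(-d / m))

# Default configuration: mean think time m = 7 s, propagation delay d = 10 s.
print(round(expected_residual_wait(10.0, 7.0), 2))   # about 4.68 s, versus 10 s with no think time
```

Longer think times relative to d drive the expected wait toward zero, which is the intuition behind tolerating propagation latency; the paper's simulation, of course, models the system's queueing behavior more faithfully than this single-transaction estimate.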

5.3 Scalability

A desirable feature of our system is that the number of secondary sites can be scaled with the client load. To examine the effects of doing so, we ran an experiment in which both the number of secondary sites and the number of clients were gradually increased, with the number of clients per secondary site held constant at 20. Figures 8, 9, and 10 show the results of this experiment.

[Figure 10. Update Transaction Response Time, 20 Clients per Secondary]

In these experiments, the BLOCK algorithm again performed about as well as ALG-1SR, and significantly better than ALG-STRONG-SITE-1SR. The FORWARD algorithm performed poorly, again because of the extra load it places on the primary site. Throughput for both BLOCK and ALG-1SR eventually peaks (with about 10 secondary sites) when the primary site saturates. Since the primary site is the bottleneck that eventually limits throughput, a key scalability parameter is the mix of read-only and update transactions in the workload. Figure 11 shows the result of the scalability experiment using a 95/5 read/write transaction mix in

place of the default 80/20 mix. As might be expected, much greater scalability is achieved in this case (compare Figures 8 and 11).

[Figure 11. Transaction Throughput, 20 Clients per Secondary, 95/5 read/write mix]

6 Related Work

The techniques described in this paper are global concurrency controls implemented in a federated, replicated database system. Breitbart, Garcia-Molina and Silberschatz have provided a thorough overview of concurrency control issues for federated databases [3]. That overview is not concerned with issues of replication and data freshness. However, many of the general concerns of transaction ordering that are addressed by our basic system (Section 3.1) arise in all federated systems, whether they manage replicated data or not. Our definition of strong serializability is based on the one in that paper.

Bayou [18, 17] is a system that provides data freshness to user sessions through causal ordering constraints on read and write operations. However, Bayou guarantees only eventual consistency, which means that servers converge to the same database state only in the absence of updates. For replicated data, this is the same weak notion of consistency as convergence [21]. Bayou does not necessarily guarantee that constraints within a session are preserved. It is possible for the system to report that a requested session guarantee, e.g. read-your-writes, cannot be provided.

Ladin and colleagues [9] proposed a scheme that provides data freshness guarantees to read and write operations. In their work, if all writes are forced operations, they are directed to a single "primary" site, which orders them. This site uses a 2-phase commit protocol to propagate, in order, each forced write's update to a majority of replicated sites. Subsequent operations are guaranteed to see the effects of the forced writes. In this work there is no notion of sessions and thus no provision of session guarantees.

Some recent work has considered the specific problem of concurrency control for lazy master replicated database systems. Breitbart and colleagues have developed several protocols for guaranteeing 1SR in lazy master systems [4]. These protocols, called DAG(WT), DAG(T), and Backedge, operate in a more general environment than the one considered here. Different database objects may have their primary copies located at different sites, and an acyclic (except in the case of the Backedge protocol) site graph is used to guide the propagation of updates among the sites. Like our protocols, these rely on in-order propagation and application of updates. Other related work on concurrency control protocols includes the virtual sites protocol of Breitbart and Korth, the quorum consensus protocol of Satyanarayanan and Agrawal, which uses a gossip mechanism to lazily propagate updates to sites that have missed them, and the epidemic update propagation protocol of Agrawal and colleagues [5, 16, 1]. Pacitti, Minet, and Simon [13] proposed a lazy master update propagation protocol that uses a worst-case estimation of update propagation time, allowing a global ordering of refresh transactions. None of these techniques addresses the transaction ordering problem that we have considered, and none has a notion of transaction sessions. All guarantee 1SR, but not strong session 1SR or strong 1SR.

Pacitti and Simon [14] have studied the effects of different update propagation techniques on the freshness of replicated data. Our update propagation technique, in which propagation occurs after commit, would be classified as deferred-immediate in their work. The goal of their work is to keep the replicas as fresh as possible, as efficiently as possible. They did not consider the relationship between freshness and transaction ordering and execution that we have attempted to exploit in our work.

King and colleagues considered techniques for maintaining a replicated database to be used as a backup in case the primary copy fails [8]. Theirs is a lazy master approach, like ours. Their work did not consider execution of application transactions at the backup (secondary) site, so they were not concerned with global concurrency controls. However, they faced an issue that arises in all lazy replication techniques: how to ensure that the updates are applied in the same order at the secondary site as they were at the primary. They proposed the use of transaction conflict analysis to allow non-conflicting refresh transactions to run in parallel at the secondary site. A similar approach could be used in our system. However, it does require an analysis of transactions' read sets, as well as their updates.
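The in-order application of updates that these lazy master schemes rely on can be sketched as a FIFO update queue drained by refresh transactions. The sketch below is illustrative and the names are ours: forcing every pair of refresh transactions to conflict, as in our simulation model, serializes the drain loop and preserves the primary's commit order, whereas conflict analysis in the style of King and colleagues would allow non-conflicting entries to be applied in parallel.

```python
from collections import deque

class SecondaryReplica:
    def __init__(self):
        self.update_queue = deque()   # filled FIFO by the propagation process
        self.db = {}                  # the replicated copy: key -> value
        self.applied_seq = 0

    def enqueue(self, seq, writes):
        """Propagation process: install one committed transaction's updates."""
        self.update_queue.append((seq, writes))

    def refresh_once(self):
        """Refresh transaction: apply exactly one transaction's updates.
        Run serially, this preserves the primary's commit order."""
        seq, writes = self.update_queue.popleft()
        assert seq == self.applied_seq + 1, "updates must not be reordered"
        self.db.update(writes)
        self.applied_seq = seq
```

For example, after enqueueing transactions 1 and 2 and running two refreshes, the replica's database reflects both sets of writes in commit order.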

7 Conclusion

In this paper, we proposed a new correctness criterion, strong session 1SR, for transaction scheduling. Strong session 1SR is a generalization of one-copy serializability (1SR) and strong serializability (strong 1SR), and guarantees data freshness by allowing important transaction ordering constraints to be captured and unimportant ones to be ignored.

Starting with a basic system that provides only 1SR over lazily synchronized replicated data, we showed how to modify the system so that it ensures strong session 1SR. We proposed two global concurrency control algorithms, called Block and Forward, to achieve this. One works by delaying transactions that need to see fresher data; the other works by redirecting such transactions to the fresh database copy at the primary site. We studied the performance of our algorithms for the lazily synchronized replicated system. We found that when propagation latencies are low, that is, when the secondary database copies can be kept very fresh, ensuring strong session 1SR, and even strong 1SR, costs very little in terms of transaction throughput and response time. However, for longer, more realistic propagation latencies, strong 1SR becomes very difficult to achieve while strong session 1SR can be maintained almost as efficiently as 1SR. Of the two algorithms, Block has consistent performance at all load levels. Strong session 1SR appears to be a useful and practical basis for enforcing transaction ordering constraints in scalable replicated database systems.

Acknowledgement: We are grateful to M. Tamer Özsu for his comments on an earlier draft of this paper.

References

[1] D. Agrawal, A. E. Abbadi, and R. Steinke. Epidemic algorithms in replicated databases. In Symposium on Principles of Database Systems, pages 161–172, 1997.
[2] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.
[3] Y. Breitbart, H. Garcia-Molina, and A. Silberschatz. Overview of multidatabase transaction management. VLDB Journal, 1(2):181–293, 1992.
[4] Y. Breitbart, R. Komondoor, R. Rastogi, S. Seshadri, and A. Silberschatz. Update propagation protocols for replicated databases. In Proceedings ACM SIGMOD International Conference on Management of Data, pages 97–108, June 1999.
[5] Y. Breitbart and H. F. Korth. Replication and consistency: Being lazy helps sometimes. In Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 173–184, May 1997.
[6] D. Georgakopoulos, M. Rusinkiewicz, and A. Sheth. On serializability of multidatabase transactions through forced local conflicts. In Proceedings of the Seventh International Conference on Data Engineering, pages 314–323, 1991.
[7] J. Gray, P. Helland, P. E. O'Neil, and D. Shasha. The dangers of replication and a solution. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 173–182, June 1996.
[8] R. P. King, N. Halim, H. Garcia-Molina, and C. A. Polyzois. Management of a remote backup copy for disaster recovery. ACM Transactions on Database Systems, 16(2):338–368, 1991.
[9] R. Ladin, B. Liskov, L. Shrira, and S. Ghemawat. Providing high availability using lazy replication. ACM Transactions on Computer Systems, 10(4):360–391, 1992.
[10] C. Liu, B. G. Lindsay, S. Bourbonnais, E. Hamel, T. C. Truong, and J. Stankiewitz. Capturing global transactions from multiple recovery log files in a partitioned database system. In Proceedings of 29th International Conference on Very Large Data Bases, Sept. 2003.
[11] D. A. Menasce, V. Almeida, R. H. Riedi, F. Ribeiro, R. C. Fonseca, and W. Meira Jr. In search of invariants for e-business workloads. In ACM Conference on Electronic Commerce, pages 56–65, 2000.
[12] Mesquite Software, Inc. CSIM18 Simulation Engine (C++ version) User's Guide, Jan. 2002.
[13] E. Pacitti, P. Minet, and E. Simon. Replica consistency in lazy master replicated databases. Distributed and Parallel Databases, 9(3):237–267, 2001.
[14] E. Pacitti and E. Simon. Update propagation strategies to improve freshness in lazy master replicated databases. VLDB Journal, 8(3-4):305–318, 2000.
[15] Y. Raz. The principle of commitment ordering, or guaranteeing serializability in a heterogeneous environment of multiple autonomous resource managers using atomic commitment. In Proceedings of 18th International Conference on Very Large Data Bases, pages 292–312, Aug. 1992.
[16] O. T. Satyanarayanan and D. Agrawal. Efficient execution of read-only transactions in replicated multiversion databases. IEEE Transactions on Knowledge and Data Engineering, 5(5):859–871, 1993.
[17] D. B. Terry, A. J. Demers, K. Petersen, M. Spreitzer, and M. Theimer. Flexible update propagation for weakly consistent replication. In Symposium on Operating Systems Principles, pages 288–301, 1997.
[18] D. B. Terry, A. J. Demers, K. Petersen, M. Spreitzer, M. Theimer, and B. W. Welch. Session guarantees for weakly consistent replicated data. In Conference on Parallel and Distributed Information Systems, pages 140–149, 1994.
[19] Transaction Processing Performance Council. TPC Benchmark W (Web Commerce), Feb. 2001. http://www.tpc.org/tpcw/default.asp.
[20] Transaction Processing Performance Council. TPC Benchmark H (Decision Support) Revision 2.0.0, 2002. http://www.tpc.org/tpch/default.asp.
[21] Y. Zhuge, H. Garcia-Molina, and J. L. Wiener. Consistency algorithms for multi-source warehouse view maintenance. Distributed and Parallel Databases, 6(1):7–40, 1998.