Limitations on Database Availability When Networks Partition

Brian A. Coan, Brian M. Oki, and Elliot K. Kolodner
Laboratory for Computer Science
Massachusetts Institute of Technology
Cambridge, MA 02139

Abstract: In designing fault-tolerant distributed database systems, a frequent goal is making the system highly available despite component failure. We examine software approaches to achieving high availability in the presence of partitions. In particular, we consider various replicated-data management protocols that maintain database consistency and attempt to increase database availability when networks partition. We conclude that no protocol does better than a bound we have determined. Our conclusions hold under the assumption that the pattern of data accesses by transactions obeys a uniformity assumption. There may be some particular distribution for which specialized protocols can increase availability.

This research was supported in part by the Advanced Research Projects Agency of the Department of Defense, monitored by the Office of Naval Research under contracts N00014-83-K-0125 and N00014-85-K-0168, in part by the National Science Foundation under grants DCR-8503662 and DCR-83-02091, and in part by the Office of Army Research under contract DAAG29-84-K-0058.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1986 ACM 0-89791-198-9/86/0800-0187 75¢

1 Introduction

In designing fault-tolerant distributed database systems, a frequent objective is to make the system highly available in spite of component failures. We measure availability as the fraction of transactions presented to the system that complete. One technique to increase data availability is replicating the data at various sites in the network. In this paper, we examine several replicated-data management protocols that maintain database consistency and attempt to make replicated data highly available in the presence of network partitions. (Partitions are failures that divide a system into two or more components between which communication is impossible.) The protocols we examine in this paper maintain one-copy serializability [1] [8], and are of the on-line kind, that is, those that are required to make irrevocable commit/abort decisions at the time the transaction is processed. We do not consider other classes of protocols, such as off-line protocols, which are protocols that may defer commit/abort decisions until the partitions are rejoined; protocols that abandon one-copy serializability as the correctness criterion; and protocols that use type-specific information. Davidson's optimistic protocol [2] is an example of an off-line protocol. The partition-tolerant distributed databases project at the Computer Corporation of America [9] is an example of a system that abandons one-copy serializability to achieve higher availability. Herlihy [7] deals with replication methods for abstract data types.

The main objective of replicated-data management protocols is achieving availability while maintaining data consistency. No protocol whose correctness criterion is one-copy serializability can do better than a bound we have determined, under the assumption that the pattern of data accesses by transactions obeys a certain uniformity assumption that we explain. We believe this assumption is a reasonable one if we know nothing in particular about the transaction distributions; there might be some particular distributions for which specialized protocols achieve greater availability. Furthermore, this assumption permits us to do the analysis.

In the context of a simple model we have developed, we analyze the level of availability achieved by several replicated-data management protocols proposed in the literature. The protocols we look at use different rules to increase data availability during a partition. Given the authors' informal discussion of availability achievable by these protocols, it is difficult to determine how one protocol compares against the others. We provide a uniform basis for comparison. In addition, we show that several of the protocols achieve the upper bound for availability, so the bound is tight.

Our analysis shows that there is a severe limitation on the availability that can be achieved during a partition. Because of this limitation, networks should be designed to minimize the probability that partitions will occur.

This paper is organized as follows. Section 2 begins with some assumptions and definitions underlying our model, defines the notion of availability, and provides a bound on the level of availability achievable with replicated-data management protocols that maintain one-copy serializability. In section 3 we describe some of the known replicated-data management protocols and, for each protocol, give a quantitative measure of the availability achieved by the protocol. Finally, in section 4, we summarize our results.

2 Bounds on Availability

In this section, we define availability and then prove a bound on the availability achievable.

2.1 Assumptions and Definitions

The context of our work is a distributed database system in which the data is fully replicated. This system consists of a collection of $n$ sites, numbered $1, \ldots, n$. We have chosen this special case of a distributed database system because it simplifies our analysis. Transactions can operate on data items by reading or updating. We assume no blind updates, where a blind update is one that updates a data item without first reading it. The set of data items updated by a transaction is called its write set.

A partition occurs in a system when two functioning sites are unable to communicate for a significant interval of time. A maximal set of sites that can communicate with one another is called a partition group, following Davidson [2]. We make the simplifying assumption that a partitioned network consists of only two partition groups, called a majority partition and a minority partition. This simplification does not change our conclusions because we believe that this kind of partition is rare and that more extensive partitioning is even rarer; we analyze the most common case. We define $L_j$ as the load on the system for a site $j$ based on the fraction of transactions that run there. That is,

$$L_j = \frac{\text{number of transactions initiated at site } j}{\text{number of transactions}}$$

Then set $S$ of sites is a majority if and only if $\left(\sum_{j \in S} L_j\right) > \frac{1}{2}$. We use this somewhat nonstandard definition of majority because it ensures the desirable property that during a partition more than one-half of the work submitted to the system is submitted to the majority partition.

In this paper, we are interested in availability. It is a measure of the amount of work that can be done by a system. We define availability as follows.

$$\text{Availability} = \frac{\text{number of transactions successfully completed}}{\text{number of transactions presented to the system}}$$

We do not study other aspects of performance, such as the relative expense of read/write operations or the cost of rejoining partitions.

In order to quantify our observations concerning availability, we are interested in the following parameters.

$t$ = total number of transactions presented during the partition
$u_{maj}$ = fraction of $t$ that are update transactions and are in the majority partition
$u_{min}$ = fraction of $t$ that are update transactions and are in the minority partition
$r_{maj}$ = fraction of $t$ that are read-only transactions and are in the majority partition
$r_{min}$ = fraction of $t$ that are read-only transactions and are in the minority partition

2.2 Analysis

We now show that no on-line replicated-data management protocol that maintains one-copy serializability can achieve a level of availability that is better than $u_{maj} + r_{maj} + r_{min}$.

Our proof depends on an assumption regarding system workload, which we call the uniformity assumption. One informal characterization, which is sufficient for the uniformity assumption to be satisfied, is that the transaction mix be the same at each site.

Uniformity assumption. For all $D$ where $D$ is a subset of the data items, and for all $j \in 1, \ldots, n$,

$$\frac{\text{number of } D\text{-transactions initiated at site } j}{\text{total number of } D\text{-transactions}} = L_j$$

where a $D$-transaction is a transaction with write set $D$. For example, suppose 10% of the update transactions run at site 1; then our assumption says that of those transactions that update a set of data items, 10% of them run at site 1.

We find it convenient to formulate the following correctness property, which in Theorem 1 we show is a necessary condition for maintaining one-copy serializability.

Correctness property. For all data items $d$, $d$ is not updated on both the minority and majority sides of a partition.

[Figure 1: Execution $\mathcal{E}$. Majority and minority columns, each listing R(x) and W(x).]

[...] bound on the availability achievable by a replicated-data management protocol that maintains one-copy serializability.

Lemma 2. Let $\mathcal{A}$ be any replicated-data management protocol that satisfies the correctness property and that operates in a system in which the uniformity assumption holds. Any execution $\mathcal{E}$ of protocol $\mathcal{A}$ during a partition has availability at most $u_{maj} + r_{maj} + r_{min}$.

Proof. We use a counting argument to bound the availability from above. See Figure 2 for an illustration of the sets of transactions defined below.

We define the following quantities:

[Figure 2: Sets of transactions used in proof of Lemma 2]
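As a numeric illustration of the definitions in Section 2.1 and the bound stated in Section 2.2, the following Python sketch computes site loads, tests the load-based majority condition, evaluates the bound, and checks the correctness property. It is not part of the paper: the function names, the three-site workload, and the 2/5 update fraction are all hypothetical, and splitting updates and reads in proportion to load is one reading of the uniformity assumption rather than a statement taken from the text.

```python
from fractions import Fraction


def site_loads(initiated):
    """L_j = (transactions initiated at site j) / (all transactions)."""
    total = sum(initiated.values())
    return {site: Fraction(count, total) for site, count in initiated.items()}


def is_majority(sites, loads):
    """A set S of sites is a majority iff the sum of L_j over S exceeds 1/2."""
    return sum((loads[s] for s in sites), Fraction(0)) > Fraction(1, 2)


def availability(completed, presented):
    """Fraction of presented transactions that completed successfully."""
    return Fraction(completed, presented)


def availability_bound(u_maj, r_maj, r_min):
    """Bound from Section 2.2: u_maj + r_maj + r_min."""
    return u_maj + r_maj + r_min


def satisfies_correctness(majority_write_sets, minority_write_sets):
    """True iff no data item is updated on both sides of the partition."""
    updated_maj = set().union(*majority_write_sets)
    updated_min = set().union(*minority_write_sets)
    return updated_maj.isdisjoint(updated_min)


# Hypothetical workload: transactions initiated at each of three sites.
loads = site_loads({1: 50, 2: 30, 3: 20})
majority, minority = {1, 3}, {2}
load_maj = sum((loads[s] for s in majority), Fraction(0))

# Assumption for illustration: 2/5 of all transactions are updates, and
# both updates and reads split across the partition in proportion to load.
update_fraction = Fraction(2, 5)
u_maj = load_maj * update_fraction
u_min = (1 - load_maj) * update_fraction
r_maj = load_maj * (1 - update_fraction)
r_min = (1 - load_maj) * (1 - update_fraction)

bound = availability_bound(u_maj, r_maj, r_min)
print(bound)  # prints 22/25
```

Because the four fractions sum to 1 in this example, the bound equals 1 - u_min: the only transactions counted against availability are the updates submitted on the minority side. Note also that site 1 alone carries exactly half the load here, so `is_majority({1}, loads)` is false under the strict inequality.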
