Concurrency Control in Distributed Database Systems

Concurrency Control in Distributed Database Systems PHILIP A. BERNSTEIN AND NATHAN GOODMAN Computer Corporation of America, Cambridge, Massachusetts 02139 In this paper we survey, consolidate, and present the state of the art in distributed database concurrency control. The heart of our analysts is a decomposition of the concurrency control problem into two major subproblems: read-write and write-write synchronization. We describe a series of synchromzation techniques for solving each subproblem and show how to combine these techniques into algorithms for solving the entire concurrency control problem. Such algorithms are called "concurrency control methods." We describe 48 principal methods, including all practical algorithms that have appeared m the literature plus several new ones. We concentrate on the structure and correctness of concurrency control algorithms. Issues of performance are given only secondary treatment. Keywords and Phrases: concurrency control, deadlock, dtstnbuted database management systems, locking, senahzability, synchromzation, tunestamp ordering, timestamps, two- phase commit, two-phase locking CR Categories: 4.33, 4.35 INTRODUCTION Concurrency control has been actively investigated for the past several years, and The Concurrency Control Problem the problem for nondistributed DBMSs is Concurrency control is the activity of co- well understood. A broad mathematical ordinating concurrent accesses to a data- theory has been developed to analyze the base in a multiuser database management problem, and one approach, called two- system (DBMS). Concurrency control per- phase locking, has been accepted as a mits users to access a database in a multi- standard solution. Curre.nt research on non- programmed fashion while preserving the distributed concun'ency control is focused illusion that each user is executing alone on on evolutionary improvements to two- a dedicated system. The main technical phase locking, detailed performance analy- difficulty in attaining this goal is to prevent sis and optimization, and extensions to the database updates performed by one user mathematical theory. from interfering with database retrievals Distributed concurrency control, by con- and updates performed by another. The trast, is in a state of extreme turbulence. concurrency control problem is exacerbated More than 20 concurrency control algo- in a distributed DBMS (DDBMS) because rithms have been proposed for DDBMSs, (1) users may access data stored in many and several have been, or are being, imple- different computers in a distributed system, mented. These algorithms are usually com- and (2) a concurrency control mechanism plex, hard to understand, and difficult to at one computer cannot instantaneously prove correct (indeed, many are incorrect). know about interactions at other com- Because they are described in different ter- puters. minologies and make different assumptions Permission to copy without fee all or part of this material m granted provided that the coples are not made or distributedfor directcommercial advantage, the ACM copyright notice and the titleof the publicationand its date appear, and notice is given that copying is by perrmssion of the Associationfor Computing Machinery. To copy otherwise,or to republish,requires a fee and/or specificpermission. © 1981 ACM 0010-4892/81/0600-0185 $00.75 Computing Surveys, Vol. 13, No. 2, June 1981 186 • P. A. Bernstein and N. Goodman CONTENTS cry concurrency control algorithm must in- clude a subalgorithm to solve each subproblem. The first step toward understanding a concurrency control algorithm is to isolate INTRODUCTION the subalgorithm employed for each sub- The Concurrency Control Problem problem. Examples of Concurrency Control Anomalies After studying the large number of pro- Comparison to Mutual Exclnslon Problems posed algorithms, we find that they are 1. TRANSACTION-PROCESSING MODEL 1.1 Prelmunary Defimtmns and DDBMS Archi- compositions of only a few subalgorithms. tecture In fact, the subalgorithms used by all prac- 1.2 Centrahzed Transactmn-Processmg Model tical DDBMS concurrency control algo- 1.3 Dmmbuted Transactmn-Processing Model rithms are variations of just two basic tech- 2 DECOMPOSITION OF THE CONCUR- niques: two-phase locking and timestamp RENCY CONTROL PROBLEM 2 1 Selaallzabfllty ordering; thus the state of the art is far 2.2 A Parachgm for Concurrency Control more coherent than a review of the litera- 3. SYNCHRONIZATION TECHNIQUES ture would seem to indicate. BASED ON TWO-PHASE LOCKING 3.1 Basra 2PL Implementation Examples of Concurrency Control Anomalies 3.2 Primary Copy 2PL 3.3 Voting 2PL The goal of concurrency control is to pre- 3.4 Centrahzed 2PL vent interference among users who are si- 3.5 Deadlock Detection and Prevention 4 SYNCHRONIZATION TECHNIQUES multaneously accessing a database. Let us BASED ON TIMESTAMP ORDERING illustrate the problem by presenting two 4.1 Basic T/O Implementatmn "canonical" examples of interuser interfer- 4.2 The Thomas Write Rule ence. Both are examples of an on-line 4.3 MulUversion T/O 4.4 Conservative T/O electronic funds transfer system accessed 4 5 Tnnestamp Management via remote automated teller machines 5 INTEGRATED CONCURRENCY CONTROL (ATMs). In response to customer requests, METHODS ATMs retrieve data from a database, per- 5 1 Pure 2PL Methods form computations, and store results back 5.2 Pure T/O Methods 5.3 MLxed 2PL and T/O Methods into the database. 6. CONCLUSION Anomaly 1: Lost Updates. Suppose two APPENDIX. OTHER CONCURRENCY CON- customers simultaneously try to deposit TROL METHODS AI. Certifiers money into the same account. In the ab- A2. Thomas' MaJority Consensus Algorithm sence of concurrency control, these two ac- A3. Ellis' Ring Algorithm tivities could interfere (see Figure 1). The ACKNOWLEDGMENT two ATMs handling the two customers REFERENCES could read the account balance at approxi- mately the same time, compute new bal- v ances in parallel, and then store the new about the underlying DDBMS environ- balances back into the database. The net ment, it is difficult to compare the many effect is incorrect: although two customers proposed algorithms, even in qualitative deposited money, the database only reflects terms. Naturally each author proclaims his one activity; the other deposit is lost by the or her approach as best, but there is little system. compelling evidence to support the claims. Anomaly 2: Inconsistent Retrievals. Suppose two customers simultaneously ex- To survey the state of the art, we intro- ecute the following transactions. duce a standard terminology for describing DDBMS concurrency control algorithms Customer 1: Move $1,000,000 from Acme and a standard model for the DDBMS en- Corporation's savings ac- vironment. For analysis purposes we de- count to its checking account. compose the concurrency control problem Customer 2: Print Acme Corporation's into two major subproblems, called read- total balance in savings and write and write-write synchronization. Ev- checking. Computing Surveys, Vol. 13, No 2, June 1981 Concurrency Control in Database Systems 187 Execut,on of T I Dotobose Execution of T2 READ bolonce I I, 00000 ] 0,000e Add ~I,000,000 $1,500,000 [ J $2~500,000 ] Add $21000,000 WRITE result bock to dotobose bock to dotobose Figure 1. Lostupdate anomaly. In the absence of concurrency control access to shared resources. However, con- these two transactions could interfere (see trol schemes that work for one do not nec- Figure 2). The first transaction might read essarily work for the other, as illustrated by the savings account balance, subtract the following example. Suppose processes $1,000,000, and store the result back in the P1 and P2 require access to resources R1 database. Then the second transaction and R2 at different points in their execution. might read the savings and checking ac- In an operating system, the following inter- count balances and print the total. Then leaved execution of these processes is per- the first transaction might finish the funds fectly acceptable: P1 uses R1, P2 uses R~, Pe transfer by reading the checking account uses R2, P1 uses R2. In a database, however, balance, adding $1,000,000, and finally stor- this execution is not always acceptable. As- ing the result in the database. Unlike sume, for example, that P2 transfers funds Anomaly 1, the final values placed into the by debiting one account (RI), then crediting database by this execution are correct. Still, another (R2). If P2 checks both balances, it the execution is incorrect because the bal- will see R~ after it has been debited, but see ance printed by Customer 2 is $1,000,000 R2 before it has been credited. Other differ- short. ences between concurrency control and mu- These two examples do not exhaust all tual exclusion are discussed in CHAM74. possible ways in which concurrent users can interfere. However, these examples are typical of the concurrency control problems 1. TRANSACTION-PROCESSING MODEL that arise in DBMSs. To understand how a concurrency control algorithm operates, one must understand Comparison to Mutual Exclusion Problems how the algorithm fits into an overall The problem of database concurrency con- DDBMS. In this section we present a sim- trol is similar in some respects to that of ple model of a DDBMS, emphasizing how mutual exclusion in operating systems. The the DDBMS processes user interactions. latter problem is concerned with coordinat- Later we explain how concurrency control ing access by concurrent processes to sys- algorithms operate

Load more