A Critique of ANSI SQL Isolation Levels
Total Page:16
File Type:pdf, Size:1020Kb
A Critique of ANSI SQL Isolation Levels Hal Berenson Microsoft Corp. [email protected] Phil Bernstein Microsoft Corp. [email protected] Jim Gray Microsoft Corp. [email protected] Jim Melton Sybase Corp. [email protected] Elizabeth O’Neil UMass/Boston [email protected] Patrick O'Neil UMass/Boston [email protected] June 1995 Technical Report MSR-TR-95-51 Microsoft Research Advanced Technology Division Microsoft Corporation One Microsoft Way Redmond, WA 98052 “A Critique of ANSI SQL Isolation Levels,” Proc. ACM SIGMOD 95, pp. 1-10, San Jose CA, June 1995, © ACM. -1- A Critique of ANSI SQL Isolation Levels Hal Berenson Microsoft Corp. [email protected] Phil Bernstein Microsoft Corp. [email protected] Jim Gray U.C. Berkeley [email protected] Jim Melton Sybase Corp. [email protected] Elizabeth O’Neil UMass/Boston [email protected] Patrick O'Neil UMass/Boston [email protected] Abstract: ANSI SQL-92 [MS, ANSI] defines Isolation distinction is not crucial for a general understanding. Levels in terms of phenomena: Dirty Reads, Non-Re- The ANSI isolation levels are related to the behavior of peatable Reads, and Phantoms. This paper shows that lock schedulers. Some lock schedulers allow transactions these phenomena and the ANSI SQL definitions fail to to vary the scope and duration of their lock requests, thus characterize several popular isolation levels, including the departing from pure two-phase locking. This idea was in- standard locking implementations of the levels. troduced by [GLPT], which defined Degrees of Consistency Investigating the ambiguities of the phenomena leads to in three ways: locking, data-flow graphs, and anomalies. clearer definitions; in addition new phenomena that better Defining isolation levels by phenomena (anomalies) was characterize isolation types are introduced. An important intended to allow non-lock-based implementations of the multiversion isolation type, Snapshot Isolation, is defined. SQL standard. 1. Introduction This paper shows a number of weaknesses in the anomaly approach to defining isolation levels. The three ANSI phe- nomena are ambiguous. Even their broadest interpreta- Running concurrent transactions at different isolation lev- tions do not exclude anomalous behavior. This leads to els allows application designers to trade throughput for some counter-intuitive results. In particular, lock-based correctness. Lower isolation levels increase transaction isolation levels have different characteristics than their concurrency but risk showing transactions a fuzzy or ANSI equivalents. This is disconcerting because incorrect database. Surprisingly, some transactions can commercial database systems typically use locking. execute at the highest isolation level (perfect Additionally, the ANSI phenomena do not distinguish serializability) while concurrent transactions running at a among several isolation levels popular in commercial lower isolation level can access states that are not yet systems. committed or that postdate states the transaction read earlier [GLPT]. Of course, transactions running at lower Section 2 introduces basic isolation level terminology. It isolation levels may produce invalid data. Application defines the ANSI SQL and locking isolation levels. Section designers must prevent later transactions running at higher 3 examines some drawbacks of the ANSI isolation levels isolation levels from accessing this invalid data and propa- and proposes a new phenomenon. Other popular isolation gating errors. levels are also defined. The various definitions map between ANSI SQL isolation levels and the degrees of con- The ANSI/ISO SQL-92 specifications [MS, ANSI] define four sistency defined in 1977 in [GLPT]. They also encompass isolation levels: (1) READ UNCOMMITTED, (2) READ COMMITTED, (3) REPEATABLE READ, (4) Date’s definitions of Cursor Stability and Repeatable Read SERIALIZABLE. These levels are defined with the classi- [DAT]. Discussing the isolation levels in a uniform frame- cal serializability definition, plus three prohibited action work reduces confusion. subsequences, called phenomena: Dirty Read, Non-re- peatable Read, and Phantom. The concept of a phe- Section 4 introduces a multiversion concurrency control nomenon is not explicitly defined in the ANSI specifica- mechanism, called Snapshot Isolation, that avoids the tions, but the specifications suggest that phenomena are ANSI SQL phenomena, but is not serializable. Snapshot action subsequences that may lead to anomalous (perhaps Isolation is interesting in its own right, since it provides a non-serializable) behavior. We refer to anomalies in what reduced-isolation level approach that lies between READ follows when suggesting additions to the set of ANSI COMMITTED and REPEATABLE READ. A new formalism phenomena. As shown later, there is a technical (available in the longer version of this paper [OOBBGM]) distinction between anomalies and phenomena, but this connects reduced isolation levels for multiversioned data to the classical single-version locking serializability theory. Permission to copy without fee all or part of this material is Section 5 explores some new anomalies to differentiate the granted provided that the copies are not made or distributed for isolation levels introduced in Sections 3 and 4. The ex- direct commercial advantage, the ACM copyright notice and the tended ANSI SQL phenomena proposed here lack the power title of the publication and its date appear, and notice is given that copying is by permission of the Association of Computing to characterize Snapshot isolation and Cursor Stability. Machinery. To copy otherwise, or to republish, requires a fee Section 6 presents a Summary and Conclusions. and/or specific permission. “A Critique of ANSI SQL Isolation Levels,” Proc. ACM SIGMOD 95, pp. 1-10, San Jose CA, June 1995, © ACM. -1- 2. Isolation Definitions P3 (Phantom): Transaction T1 reads a set of data items satisfying some <search condition>. Transaction T2 2.1 Serializability Concepts then creates data items that satisfy T1’s <search condi- tion> and commits. If T1 then repeats its read with the same <search condition>, it gets a set of data items Transactional and locking concepts are well documented different from the first read. in the literature [BHG, PAP, PON, GR]. The next few None of these phenomena could occur in a serial history. paragraphs review the terminology used here. Therefore by the Serializability Theorem they cannot occur in a serializable history [EGLT, BHG Theorem 3.6, GR A transaction groups a set of actions that transform the Section 7.5.8.2, PON Theorem 9.4.2]. database from one consistent state to another. A history models the interleaved execution of a set of transactions as Histories consisting of reads, writes, commits, and aborts a linear ordering of their actions, such as Reads and Writes can be written in a shorthand notation: “w1[x]” means a (i.e., inserts, updates, and deletes) of specific data items. write by transaction 1 on data item x (which is how a data Two actions in a history are said to conflict if they are item is “modified’), and “r2[x]” represents a read of x by performed by distinct transactions on the same data item transaction 2. Transaction 1 reading and writing a set of and at least one of is a Write action. Following [EGLT], records satisfying predicate P is denoted by r1[P] and this definition takes a broad interpretation of “data item”: w1[P] respectively. Transaction 1’s commit and abort it could be a table row, a page, an entire table, or a (ROLLBACK) are written “c1” and “a1”, respectively. message on a queue. Conflicting actions can also occur on a set of data items, covered by a predicate lock, as well as Phenomenon P1 might be restated as disallowing the fol- on a single data item. lowing scenario: A particular history gives rise to a dependency graph (2.1) w1[x] . r2[x] . (a1 and c2 in either order) defining the temporal data flow among transactions. The actions of committed transactions in the history are repre- The English statement of P1 is ambiguous. It does not ac- sented as graph nodes. If action op1 of transaction T1 tually insist that T1 abort; it simply states that if this hap- conflicts with and precedes action op2 of transaction T2 in pens something unfortunate might occur. Some people the history, then the pair <op1, op2> becomes an edge in reading P1 interpret it to mean: the dependency graph. Two histories are equivalent if they have the same committed transactions and the same depen- (2.2) w1[x]...r2[x]...((c1 or a1) and (c2 or a2) in any order) dency graph. A history is serializable if it is equivalent to a serial history — that is, if it has the same dependency Forbidding the (2.2) variant of P1 disallows any history graph (inter-transaction temporal data flow) as some where T1 modifies a data item x, then T2 reads the data history that executes transactions one at a time in se- item before T1 commits or aborts. It does not insist that T1 quence. aborts or that T2 commits. 2.2 ANSI SQL Isolation Levels Definition (2.2) is a much broader interpretation of P1 than (2.1), since it prohibits all four possible commit-abort ANSI SQL Isolation designers sought a definition that pairs by transactions T1 and T2, while (2.1) only prohibits would admit many different implementations, not just two of the four. Interpreting (2.2) as the meaning of P1 locking. They defined isolation with the following three prohibits an execution sequence if something anomalous phenomena: might in the future. We call (2.2) the broad interpretation of P1, and (2.1) the strict interpretation of P1. P1 (Dirty Read): Transaction T1 modifies a data item. Interpretation (2.2) specifies a phenomenon that might Another transaction T2 then reads that data item before T1 lead to an anomaly, while (2.1) specifies an actual performs a COMMIT or ROLLBACK. If T1 then performs a anomaly. Denote them as P1 and A1 respectively.