A Theoretical Study of ‘Snapshot Isolation’

Ragnar Normann Lene T. Østby Department of Informatics Department of Informatics University of Oslo University of Oslo ragnarn@ifi.uio.no [email protected]

ABSTRACT Isolation in relation to concurrency phenomena and isolation Snapshot Isolation is a popular and efficient protocol for con- levels. A later paper by Fekete & al. [5] focuses on identi- currency control. In this paper we discuss Snapshot Isolation fying cases where Snapshot Isolation will not comply with in view of the classical theory for transaction processing. In the isolation level serializable, and how to avoid these (e.g. addition to summarizing previous research we prove that at the application level). Papers discussing other aspects of the set SI of histories that may be generated by Snapshot Snapshot Isolation are, at best, rare. Isolation is incomparable to final state, view and conflict , that SI is monotone, and that schedules gen- In this paper we relate Snapshot Isolation to the well-known erated by Snapshot Isolation are strict and thus have good theory of serializability and recoverability. Most of our re- properties with respect to recoverability. sults were first presented in the master thesis [13] of the second author. Categories and Subject Descriptors H.2.4 [ Management]: Systems—Concurrency, The organization of the rest of the paper is as follows: Some Transaction processing useful background material is given in section 2 before the Snapshot Isolation protocol is presented in section 3. Sec- tions 4 and 5 summarize the previous research relating Snap- General Terms shot Isolation to concurrency phenomena and anomalies and Theory to the SQL isolation levels, Sections 6 through 8 discuss Snapshot Isolation and serializability, monotonicity and re- Keywords coverability. Some final remarks are given in section 9. Snapshot Isolation, concurrency, transaction, serializability, monotonicity, recoverability 2. NOTATION AND THEORETICAL FRAMEWORK 1. INTRODUCTION AND MOTIVATION We have chosen to use a terminology and notation very close Snapshot Isolation is a multiversion protocol for concur- to the one used by Weikum & Vossen in their book [11]. rency control that is quite popular and heavily used. To our knowledge the first commercial available implementation was in Borland’s multiversion database Definition 1. A transaction ti is a finite set of operations InterBase 4 released in 1994. Later, among others, Oracle, opi with a strict (linear) order

Permission to make digital or hard copies of all or part of this work for Definition 2. ([11, definition 3.1]) Let T = {t1, . . . , tn} be personal or classroom use is granted without fee provided that copies are a (finite) set of transactions, where each t ∈ T has the form not made or distributed for profit or commercial advantage and that copies i bear this notice and the full citation on the first page. To copy otherwise, to ti = (opi,

44 n n n n (a) op(s) ⊆ Si=1 opi ∪ Si=1{ai, ci} and Si=1 opi ⊆ (a) op(m) = h(Si=1 op(ti)) for some version func- op(s), i.e., s consists of the union of operations tion h from the given transactions plus some (terminat- (b) for all t ∈ T and all operations p, q ∈ op(t) the ing) operations, each of which may either be a ci following holds: (commit) or an ai (abort); p

In plain language, definition 2 states that a history h for T Note that this definition only takes the non-serializable his- is a partial order of all steps in the transactions in T such tories into account. It is irrelevant which serializable histo- that ries that may be disallowed.

• the internal order of each transaction’s steps is pre- 3. THE SNAPSHOT ISOLATION served in h PROTOCOL Definition 6. The Snapshot Isolation Protocol produces • each transaction in T is terminated by either a commit multiversion schedules by enforcing the following two rules: or an abort in h

• all read-write or write-write conflicts between different 1. When a transaction t reads a data item x, t reads transactions in T are ordered in h the newest version of x written by a transaction that committed before t started. When working with histories it is convenient to introduce 2. The write sets of two concurrent transactions must be two (fictious) transactions: one initializing transaction t0 disjoint. that writes the initial state of the database (and commits) before the execution of the history is started, and one final The set of histories that may be produced by the Snapshot (read-only) transaction t that reads the whole database ∞ Isolation protocol is denoted SI. after the execution of the history is finished.

As mentioned in the introduction, Snapshot Isolation is a Rule 1 states that the version function maps each read action multiversion protocol. We therefore need the following two ri(x) to the most recent committed write action wj (x) at the definitions: time ti started.

Rule 2 states that if t1 and t2 are two transactions where t1 Definition 3. ([11, definition 5.1]) Let s be a history with starts before t2, and t1 commits after t2 has started, t1 and initializing transaction t0 and final transaction t∞. A version t2 are not permitted to write the same data item. function for s is a function h, which associates whith each read step of s a previous write step on the same data item, There are several methods to enforce rule 2. One is to com- and which is the identity on write steps. pare write sets at commit. Oracle has implemented a more efficient algorithm which in Fekete et al. [6] is called ‘first updater wins’. This algorithm works as follows: Definition 4. ([11, definition 5.2]) Let T = {t1, . . . , tn} be a (finite) set of transactions. Assume two transactions t1 and t2 are concurrent, t1 writes x, and t2 also wants to write x. Then t2 cannot write x 1. A multiversion history for T is pair m = (op(m),

45 • If t2 isqueuedtowrite x and t1 commits, t2 is imme- concurrent update transactions wrote incorrect values. diately aborted. But in [6] Fekete et al. presented the above example showing that this is not the case. In the example the • If t1 commits before t2 tries to write x, t2 is aborted commit order differs from the serial order t1t2t3, and when it tries to perform the write. since t1 and t2 write different data items, Snapshot • If t1 drops its lock because it aborts, t2 is allowed to Isolation will not detect this Read-Only anomaly. write x. Theorem 1. (Berenson & al.[2], Fekete & al.[6]) Of the Finally, here is the rule for when old versions are superfluous: ten concurrency phenomena and anomalies listed above, only these three may occur in a schedule produced by Snapshot Isolation: • A version xi of a data item x may be deleted if, and only if, there is a newer version xk of x, and all active P3: Phantom transactions were started after tk was committed. A5B: Write skew A6: Read-only transaction anomaly 4. CONCURRENCY PHENOMENA AND ANOMALIES 5. SQL ISOLATION LEVELS We base our discussion mainly on Berenson & al. [2], and Isolation levels were introduced in the SQL-92 standard [1], throughout this section we make heavy use of their defini- [9, section 14.3]. This standard defines four levels of isola- tions and examples. In [2] they distinguish between con- tion which SQL-servers should support (the levels are ranked currency phenomena (denoted by P) – which may result in from the strongest to the weakest): errors – and concurrency anomalies (denoted by A) – which always result in errors. We follow their nomenclature in the Serializable. A history generated at isolation level Serial- list below: izable shall prohibit all concurrency phenomena and anomalies, in fact it shall produce the same result as some serial execution of the transactions in the history. P0 – Dirty Write Example: w1(x)w2(x)(c1 or a1) Repeatable Read. All phenomena and anomalies but phan- toms are prohibited. P1 – Dirty Read Example: w1(x)r2(x)(c1 or a1) Read Committed. All phenomena and anomalies except dirty writes and reads may occur. P2 – Fuzzy or Non-Repeatable Read Example: r1(x)w2(x)(c1 or a1) Read Uncommitted. Only dirty writes are prohibited. P3 – Phantom Example: r1(P )w2(y in P )(c1 or a1) Theorem 2. (Berenson & al.[2]) The P in the example stands for ’Predicate’ and is ReadCommitted ≪ SI ≪ Serializable while typically the result set of a SQL WHERE-clause. So if SI ≫≪ RepeatableRead t1 reads the data items that satisfy P (e.g. to evaluate some aggregate function on them) whereupon t2 writes 6. SERIALIZABILITY a new data item satisfying P before t1 commits, t1 will Chapter 3 in the book [11] of Weikum and Vossen is a good commit with an incorrect answer. source for the theory of serialization. We use their definitions A stricter interpretation of P3 will transform it into an and refer to their book for a more comprehensive treatment anomaly: of this subject. A3 – Phantom Example: r1(P )w2(insert y to P )c2r1(P )c1 Definition 7. Let s be a history of n transactions t1, . . . , tn Here t1 executes a query satisfying the predicate P with initializing transaction t0 and final transaction t∞, and twice, finding that the result has changed. let D be the set of data items read or written by some tk. For each x ∈ D we recursively define the Herbrand seman- P4 – Lost Update tics of each read step rk(x) and each write step wk(x) in s Example: r1(x)w2(x)w1(x)c1 by A5A – Read Skew

Example: r1(x)w2(x)w2(y)c2r1(y)(c1 or a1) 1. Hs(w0(x)) = f0x() where f0x() is a 0-ary function (a constant – the initial value of x) A5B – Write Skew Example: r1(x)r2(y)w1(y)w2(x)(c1 and c2 in some 2. Hs(rk(x)) = Hs(wp(x)) where tp(p 6= k) is the last order) transaction to write x prior to tk reading x

A6 – Read-Only Transaction Anomaly 3. Hs(wk(x)) = fkx(Hs(rk(y1)),...Hs(rk(ym))) where Example: r2(x)w1(y)c1r3(x)r3(y)c3w2(x)c2 rk(y1), . . . , rk(ym) are all read steps performd by tk Snapshot Isolation was originally trusted to never pre- prior to wk(x), and fkx is an uninterpreted m-ary func- sent incorrect data from read-only transactions if no tion symbol

46 The Herbrand semantics of the history s is the function H[s] Both reads are of the initial value and the two write sets from D into the universe of Herbrand formulas defined by: are disjoint, so h2 ∈ SI. To prove the theorem it remains to show that h2 ∈/ FSR, i.e. that h2 is not final state equivalent H[s](x) = H (r (x)), x ∈ D s ∞ to any of the two possible serial histories:

s1 = t1t2 = r1(x)w1(y)c1r2(y)w2(x)c2 Definition 8. Two histories s1 and s2 are final state equiv- alent if they contain the same transactions and H[s1] = H[s2]. s2 = t2t1 = r2(y)w2(x)c2r1(x)w1(y)c1 To prove this we calculate the Herbrand semantics: A history is final state serializable if it is final state equiva- lent to a serial history. H[h2](x) = Hh2 (r∞(x)) = Hh2 (w2(x)) = f2x(Hh2 (r2(y)))

= f2x(Hh2 (w0(y))) = f2x(f0y()) The set of all finite state serializable histories is denoted H[s1](x) = H 1 (r (x)) = H 1 (w2(x)) = f2 (H 1 (r2(y))) FSR. s ∞ s x s = f2x(Hs1 (w1(y))) = f2x(f1y(Hs1 (r1(x)))

= f2x(f1y(Hs1 (w0(x))) = f2x(f1y(f0x()) Definition 9. Two histories s1 and s2 are view equivalent 2 2 2 2 if they are final state equivalent and Hs1 (p) = Hs2 (p) for all H[s ](x) = Hs2 (r∞(x)) = Hs2 (w (x)) = f x(Hs2 (r (y)))

(read and write) steps p. = f2x(Hs2 (w0(y))) = f2x(f0y())

A history is view serializable if it is view equivalent to a H[h2](y) = H 2 (r∞(y)) = H 2 (w1(y)) = f1y(H 2 (r1(x))) serial history. h h h = f1y (Hh2 (w0(x))) = f1y(f0x())

The set of all view serializable histories is denoted VSR. H[s1](y) = Hs1 (r∞(y)) = Hs1 (w1(y)) = f1y(Hs1 (r1(x)))

= f1y (Hs1 (w0(x))) = f1y(f0x())

Definition 10. The conflict relation of a schedule s, conf(s), H[s2](y) = Hs2 (r∞(y)) = Hs2 (w1(y)) = f1y(Hs2 (r1(x))) is defined as the set of all pairs (p, q) where p

The set of all conflict serializable histories is denoted CSR. ‘The greater the number of transactions that run concur- rently in the system, the less the chances that a request of a transaction to read or write some data will be granted im- The following result is considered well-known (see e.g. [11, mediately. Or in other words, if a request is granted under a corollary 3.3]): given load, it would also be granted if the load were lighter (i.e., if some of the other transactions were not present).’ Theorem 3. CSR ⊂ VSR ⊂ FSR Yannakis called histories adhering to this principle mono- tone. To give a more formal description of monotonicity Theorem 4. SI is neither included in, nor includes, any Yannakakis introduced the concept subschedule which Wei- of the serializability classes FSR, VSR or CSR. kum & Vossen [11, end of section 3.7]) call a projection:

Proof. Due to theorem 3 it is sufficient to find two his- Definition 11. Let s be a schedule for a set T of transac- tories h1 and h2 such that h1 ∈ CRS \ SI and h2 ∈ SI \ FSR. tions, and let U ⊆ T . The projection of s on U, denoted ΠU (s), is what we get if we from s remove all operations For h1 we may use the example of a dirty read (P1 in section performed by all transactions in T \ U. 4):

h1 = w1(x)w2(x)c1c2 Schedulers use projections mainly to handle aborts: When- h1 is conflict equivalent to t1t2, and thus in CSR. But since ever one or more transactions in a schedule abort, the sched- t1 and t2 are two concurrent transactions that both write x, ule is replaced by the projection of the schedule on its non- h1 is not in SI. aborted transactions. In fact, the same happens when a transaction commits: The transaction is added to the his- For h2 the example of a write skew (A5B in section 4) will tory of all committed transactions and removed from the do: schedule by a projection, but projections caused by com- h2 = r1(x)r2(y)w1(y)w2(x)c1c2 mits can not create any problems.

47 Definition 12. A class E of histories (legal schedules) is When the Snapshot Isolation scheduler constructed s it sched- said to be monotone if the following holds: If s is in E then uled t to read the last version xk of x written by a transaction all projections of s are in E. tk that committed before t started.

Since tk is committed it can not be among the transactions The importance of monotonicity is that a monotone class in s that are not in p. Thus xk is still the last version of x is closed under aborts. The following example illustrates written by a transaction which committed before t started this and shows why it seems almost impossible to make a even if some of the transactions in s are aborted. scheduler for a class which is not monotone: Finally we know that s does not contain any write-write Example Let E be the class of schedules a given scheduler conflicts. Removing transactions from s cannot create any S may produce, i.e. E is the class of legal schedules, new conflicts, so p does not contain any write-write conflicts and assume that E is not monotone. Then the follow- either. ing may happen: Thus both criteria of definition 6 are proved, and we con- S makes a schedule s for a set of transactions clude that p is in SI. It follows that SI is monotone. T . A transaction t ∈ T aborts. 8. RECOVERABILITY Again, we borrow the terminology of Weikum & Vossen [11] ΠT \{t}(s) ∈/ E (construe an illegal schedule). and cite their definitions: Another problem is that an illegal schedule s may be- come legal if a new transaction arrives to be interleaved Definition 13. ([11, definition 11.5]) A schedule s is re- with s. coverable if the following holds for all transactions ti, tj ∈ trans(s), i 6= j: if ti reads from tj in s and ci ∈ op(s), then We claim that this example proves the importance of mono- cj

Both Yannakakis [12] and Weikum & Vossen [11] have tac- Definition 14. ([11, definition 11.6]) A schedule s avoids itly assumed that their database is not multiversion when cascading aborts if the following holds for all transactions discussing monotonicity. In fact, as far as we can see, no ti, tj ∈ trans(s), i 6= j: if ti reads X from tj in s, then discussions of monotonicity in multiversion exist, cj

ΠT (s) = r1(x0)r1(y0)r3(x0)r3(y2)c3w1(x1)c1 Definition 16. ([11, definition 11.8]) A schedule s is rig- orous if it is strict and additionally satisfies the following 3 2 Here t reads a version of y written by t which does not condition: for all transactions ti, tj ∈ trans(s), if rj (x)

Proof. First, assume that s ∈ SI and that ti, tj ∈ trans(s), Theorem 5. The class SI is monotone. i 6= j, have a conflict wj (x)

48 Here all reads are of the initial value and the write sets of tial as only conflicting steps are ordered. However, the def- t1 and t2 are disjoint so s1 ∈ SI by definition 6. But since t1 inition of a snapshot requires that we can compare all steps writes y after t2 reads y but before t2 commits, by definition to all commits. This works fine as long as all transactions 16 s1 ∈/ RG. Hence s1 ∈ SI \ RG which shows SI 6⊆ RG. run on the same computer using the same system clock. But in a distributed system with several autonomous nodes, it To prove the last claim, consider the history is not obvious how to define a snapshot. This question is of course closely related to the notion of global time, but s2 = r1(x)r2(y)w1(x)c1r2(x)c2 we feel that there are still some open questions related to s2 contains no write-write conflicts and all reads are of comit- ’Distributed Snapshot Isolation’. ted values, so by definition 15 s2 ∈ ST. Neither does s2 con- tain any read-write conflicts, so by definition 16 s2 ∈ RG. 10. ACKNOWLEDGEMENTS But since t2 reads a value of x written after t2 executed its We want to thank the anonymous referees for their usefull first operation, s2 ∈/ SI. Hence s2 ∈ RG \ SI, so RG 6⊆ SI. comments.

Finally, by theorem 6 we have RG \ SI ⊆ ST \ SI. Thus we 11. REFERENCES have s2 ∈ ST \ SI which prove the inclusion SI ⊆ ST to be [1] American National Standards Institute. ANSI X3. strict. 135-1992 Information Systems – Database Language – SQL, 1992. 9. CONCLUSIONS AND FUTURE WORK [2] H. Berenson, P. Bernstein, J. Gray, J. Melton, Our main results are that SI neither includes CSR nor is E. O’Neil, and P. O’Neil. A Critique of ANSI SQL included in FSR, that SI is monotone, and that all histories Isolation Levels. In Proc. of the ACM SIGMOD in SI are strict. In [2] Berenson & al. showed that SI is a International Conference on Management of Data, stronger isolation level than read committed. Thus Snap- pages 1–10, 1995. shot Isolation guarantees a reasonably strong level of isola- [3] T. Connolly and C. Begg. Database Systems. 5th tion and produces easily recoverable histories. Since SI is Edition. Addison Wesley, 2010. monotone the simplicity of the Snapshot Isolation Protocol [4] R. Elmasri and S. B. Navathe. Fundamentals of makes it easy to implement an efficient scheduler for SI. So in Database Systems. 5th Edition. Benjamin/Cummings, spite of the fact that Snapshot Isolation may generate histo- 2006. ries that are not final state serializable, we think it is a wise [5] A. Fekete, D. Liarokapis, E. O’Neil, P. O’Neil, and decision of the DBMS vendors to offer Snapshot Isolation as D. Shasha. Making Snapshot Isolation Serializable. an isolation level. ACM Transactions on Database Systems, 30(2):492–528, 2005. Note: Snapshot Isolation vs. Serializable [6] A. Fekete, E. O’Neil, and P. O’Neil. A Read-Only Among others, both Oracle and PostgreSQL use Snapshot Transaction Anomaly Under Snapshot Isolation. Isolation to implement their isolation level ’Serializable’. Ac- SIGMOD Record, 33(3):12–14, September 2004. cording to theorem 2 this is not correct. In fact, this is an [7] H. Garcia-Molina, J. D. Ullman, and J. Widom. explicit violation of the SQL-99 standard [8]. Of the other Database Systems: The Complete Book. Second major DBMSes, Microsoft SQL Server offers both Serializ- Edition. Prentice-Hall, 2009. able and Snapshot Isolation as isolation levels. [8] International Organization for Standardization. ANSI/ISO SQL 99. (ISO/IEC9075-2:1999(E), page Note: The importance of monotonicity 83), 1999. As shown in the example in section 7 monotonicity is an im- [9] J. Melton and A. R. Simon. Understanding The New portant property for schedulers. We therefore find it strange SQL: A Complete Guide. Morgan Kaufmann, 1993. that this topic is not covered in many (most?) popular text- [10] A. Silberschatz, H. F. Korth, and S. Sudarshan. books discussing schedulers (among them [3, 4, 7, 10]). An Database System Concepts. 5th Edition. McGraw-Hill, exception is the book of Weikum & Vossen [11] where they 2005. in section 3.7 prove that neither FSR nor VSR are mono- [11] G. Weikum and G. Vossen. Transactional Information tone. In our opinion this is the main reason why both final Systems: Theory, Algorithms and the Practice of state and view equivalence are unfit as serializability crite- Concurrency Control and Recovery. Morgan ria. Not being monotone is much more important than the Kaufmann, 2002. fact that their schedulers have to be based on NP-complete [12] M. Yannakakis. Serializability by Locking. Journal of algorithms. the ACM, 31(2):227–244, April 1984. [13] L. T. Østby. En teoretisk studie av ”Snapshot Future Work Isolation”. Master’s thesis, Dept. of Informatics, While the order of steps within a transaction (Definition 1) University of Oslo, 2008. (In Norwegian). is linear the order of steps in a history (Definition 2) is par-

49