Testing Database Transaction Concurrency
Yuetang Deng, Phyllis Frankl, Zhongqiang Chen
Department of Computer and Information Science, Polytechnic University, Brooklyn, NY 11201 USA
{ytdeng, phyllis, zchen}@cis.poly.edu
Abstract

Database application programs are often designed to be executed concurrently by many clients. By grouping related database queries into transactions, DBMS systems can guarantee that each transaction satisfies the well-known ACID properties: Atomicity, Consistency, Isolation, and Durability. However, if a database application is decomposed into transactions in an incorrect manner, the application may fail when executed concurrently, due to potential offline concurrency problems.

This paper presents a dataflow analysis technique for identifying schedules of transaction execution aimed at revealing concurrency faults of this nature, along with techniques for controlling the DBMS or the application to force execution to follow these transaction schedules. The techniques have been integrated into AGENDA, a tool set for testing relational database application programs. Preliminary empirical evaluation is presented.

1. Introduction

Database and transaction processing systems occupy a central position in our information-based society. It is essential that these systems function correctly and provide acceptable performance. Substantial effort has been devoted to insuring that the algorithms and data structures used by Database Management Systems (DBMSs) work efficiently and protect the integrity of the data. However, relatively little attention has been given to developing systematic techniques for assuring the correctness of the related database application programs. Given the critical role these systems play in modern society, there is clearly a need for new approaches to assessing the quality of database application programs.

To address this need, we have developed a systematic, partially-automatable approach, and a tool set based on this approach, to testing database applications [1,5,11]. Initially, we focused on problems associated with individual queries. In [19] we explore problems dealing with transactions, comprising groups of related queries. This paper focuses on concurrency problems associated with multiple clients simultaneously running an application program.

Many database applications are designed to support concurrent users. For example, in an airline reservation system, many clients can simultaneously purchase tickets. To avoid simultaneous updates of the same data item and other potential concurrency problems, database applications employ program units called transactions, consisting of a group of queries that access, and possibly update, data items. The DBMS schedules query executions so as to insure data integrity at transaction boundaries (while also trying to utilize resources efficiently).

Although the DBMS employs sophisticated mechanisms to assure that transactions satisfy the ACID properties – Atomicity, Consistency, Isolation, and Durability – an application can still have concurrency-related faults if the application programmer erroneously placed queries in separate transactions when they should have been executed as a unit. The focus of this paper is on detecting faults of this nature.

The rest of the paper is organized as follows. Section 2 introduces the background and a tool set called AGENDA for database application testing. Section 3 identifies some typical transaction concurrency problems. Section 4 presents our approach to testing for concurrency problems. Section 5 discusses problems with transaction atomicity/durability in database applications and our solution. Section 6 presents preliminary empirical evaluation. Section 7 describes related work. Section 8 concludes with our contributions and discusses future work.

2. Background

2.1. Database Transaction

A transaction is a means by which an application programmer can package together a sequence of database operations so that the database system can provide a number of guarantees, known as the ACID properties of a transaction [4]. Atomicity ensures that the set of updates contained in a transaction must succeed or fail as a unit. Consistency means that a transaction will transform the database from one consistent state to another. The isolation property guarantees that partial effects of incomplete transactions are not visible to other transactions. Durability means that the effects of a committed transaction are permanent and must not be lost because of a later failure. Transactions can therefore be used to solve problems such as the creation of inconsistent results, concurrent execution errors, and lost results due to system crashes [17].

A database application program typically consists of SQL statements embedded in a source language, such as C or Java. Transactions are delimited by the keywords BEGIN and COMMIT (or ROLLBACK). During execution, once a transaction's BEGIN is executed, the DBMS will assure that no concurrently executing application executes a statement that interferes with this transaction (e.g., by updating data that this transaction is using) until the transaction's COMMIT is executed or until the transaction is rolled back. The system can initiate a rollback when it enters a state in which the ACID properties may be violated, either due to overly optimistic scheduling decisions or to external events such as system failure. When a rollback occurs, the system is restored to the state before execution of the transaction's BEGIN. Once a transaction commits, all of the changes it has made to the database state become permanent.

Although the ACID properties are related to each other, testing each of the properties may expose problems in different aspects of the database application. To test the transactions in an application, all the ACID properties need to be addressed. In this paper, we focus on testing transaction concurrency (isolation) and atomicity/durability.

2.2. Agenda Tool Set

In our previous work [1,11], we have discussed issues arising in testing database systems, presented an approach to testing database applications, and described AGENDA, a set of tools based on this approach. In testing such applications, the states of the database before and after the application's execution play an important role, along with the user's input and the system output. A framework for testing database applications was introduced, and a complete tool set based on the framework has been prototyped; it is shown in Figure 1. The components of this system are: Agenda Parser, State Generator, Input Generator, State Validator, and Output Validator.

AGENDA takes as input the database schema of the database on which the application runs; the application source code; and "sample value files", containing some suggested values for attributes. The tester interactively selects test heuristics and provides information about expected behaviors of test cases. Using this information, AGENDA populates the database, generates inputs to the application, executes the application on those inputs, and checks some aspects of correctness of the resulting database state and the application output.

This approach is loosely based on the Category-Partition method [16]: the user supplies suggested values for attributes, partitioned into groups, which we call data groups. This data is provided in the sample value files. The tool then produces meaningful combinations of these values in order to fill database tables and provide input parameters to the application program. Data groups are used to distinguish values that are expected to result in different application behaviors. For example, in a payroll system, different categories of employees may be treated differently. Additional information about data groups, such as the probability of selecting a value from the list of values that follows, can also be provided via the sample value files.

Using these data groups, and guided by heuristics selected by the tester, AGENDA produces a collection of test templates representing abstract test cases. The tester then provides information about the expected behavior of the application on the tests represented by each test template. For example, the tester might specify that the application should increase the salaries of employees in the "faculty" group by 10% and should not change any other attributes. In order to control the explosion in the number of test templates, and to force the generation of particular kinds of templates, the tester selects heuristics. Finally, AGENDA instantiates the templates with specific test values, executes the test cases, and checks that the output and the new database state are consistent with the expected behavior indicated by the tester. The architecture of AGENDA is shown in Figure 1.

Figure 1: Architecture of the AGENDA tool set

The first component (Agenda Parser) extracts relevant information from the application's database schema, the application queries, and tester-supplied sample value files, and makes this information available to the other four components. It does this by creating an internal database called the Agenda DB to store the extracted information. The Agenda DB is used and/or modified by the remaining four components. Agenda Parser is a modified version of the parser of the open-source DBMS PostgreSQL [9].

The second component (State Generator) uses the database schema, along with information from the tester's sample value files indicating useful values (optionally partitioned into different groups of data) for attributes, and populates the database tables with data satisfying the integrity constraints. It retrieves the information about the application's tables, attributes, constraints, and sample data from the Agenda DB and generates an initial DB state for the application. We refer to the generated DB as the application DB. Heuristics are used to guide the generation of both the application DB state and the application inputs.

The third component (Input Generator) generates input data to be supplied to the application. The data are created by using information that is generated by the Agenda Parser and State Generator components, along with information derived from parsing the SQL statements in the application program and information that is useful for checking the test results.

The fourth component (State Validator) investigates how the state of the application DB changes during execution of a test. It automatically constructs a log table and triggers/rules for each table in the application schema. The State Validator uses these triggers/rules to capture the changes in the application DB automatically, and uses database integrity constraint techniques to check that the application DB state changed in the right way.

The fifth component (Output Validator) stores the tuples which satisfy the application query and constraints in the log tables. It uses integrity constraint techniques to check that the preconditions and postconditions hold for the output returned by the application.

In this paper, we consider application programs written in embedded SQL, i.e., a high-level language, such as C, with SQL statements interspersed. The SQL statements may involve host variables, which are variables declared in the host program; within the SQL statements, host variables are preceded by colons. Multiple clients may execute an application program concurrently. In this case, each client has its own host variables; the only "variable" shared by concurrently executing clients is the database itself.

3. Transaction concurrency problems

Concurrency is one of the trickiest aspects of software development. Even if an application runs correctly for all inputs in an isolated environment, it may behave incorrectly when running concurrently with other instances of the same application. Transaction mechanisms in a DBMS provide protection for the shared resource (i.e., the database). However, designers must use them carefully to assure correct behavior. A common scenario in a database application is that the application selects some data from a database, processes the data, and updates the database with new data. Consistency constraints can be violated if more than one instance tries to modify the same data element concurrently. To avoid interference from other instances, related operations should be grouped into a transaction. To improve efficiency, operations on different tables, or on different tuples of the same table, could be performed concurrently.

The entire application could be implemented as a single transaction to avoid concurrency problems, but this would deny any concurrency, since all instances would essentially be forced to execute sequentially. To achieve some degree of concurrency, an application is instead composed of a set of transactions. There is a close relation between the concurrency and the number of transactions in the application. Manipulating this relation incorrectly, in order to over-optimize efficiency, easily introduces concurrency faults into the system. This kind of concurrency problem, which occurs when data are manipulated across multiple database transactions, is termed an offline concurrency problem [14].

A partial and buggy implementation of an airline reservation system is shown in Figure 2. The attribute avail in table Flights represents the number of seats still available on a given flight. In line 5, the number of seats available on a given flight is stored in the host variable avail; it is decremented by 1 in line 6, indicating that one seat is booked for the flight. The application erroneously commits between lines 5 and 6, thus allowing another concurrent application instance to modify the database between executions of these lines.

Example 1: Ticket booking system

Figure 3: Partial implementation of the TPC-C warehouse application

If two clients, A and B, run Delivery( ) with the same c_id simultaneously, then a schedule in which their transactions interleave can leave the database in an incorrect state.
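The lost-update fault just described can be reproduced outside any DBMS with a small deterministic simulation. The flight data, function names, and interleaving below are our own illustration, not the code of Figure 2; the point is only that, because the application commits between the read of avail and the decrement, a second instance can read the same stale value in between.

```python
# Sketch of the offline concurrency (lost-update) fault: the erroneous
# commit between the SELECT and the UPDATE lets another client read the
# same stale value of `avail` before either update takes effect.
flights = {"F100": {"avail": 1}}  # one seat left

def read_avail(flight):        # transaction 1: SELECT avail, then COMMIT
    return flights[flight]["avail"]

def book_seat(flight, avail):  # transaction 2: UPDATE avail = :avail - 1
    flights[flight]["avail"] = avail - 1

# Erroneous schedule: both clients' reads precede either client's update.
a = read_avail("F100")   # client A reads avail = 1
b = read_avail("F100")   # client B also reads avail = 1
book_seat("F100", a)     # client A books the last seat
book_seat("F100", b)     # client B books it again: A's update is lost

print(flights["F100"]["avail"])  # 0: two seats sold, only one existed
```

Had the read and the decrement been in one transaction, the DBMS would have forced one client's transaction to see the other's committed result (or be rolled back), making the double booking impossible.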
First, a DFA algorithm is used to propagate data information in the TCFG. Recall that each node in the TCFG corresponds to a transaction, and the DFA information associated with each node is a set of tuples consisting of the data elements read/written and the host variables defined/used by the node (transaction). There are two kinds of tuples associated with each node in the TCFG. One is of the form (xact_id, operation, table_name, attr_name), which indicates that transaction xact_id applies the operation (READ or WRITE) to data element (table_name, attr_name). The other is of the form (xact_id, operation, p), which means that transaction xact_id accesses parameter p via the operation (DEF or USE).

The DFA information for each node can be represented by two groups of sets: (IN_DB, OUT_DB, GEN_DB, KILL_DB) and (IN_HV, OUT_HV, GEN_HV, KILL_HV). The former group represents data element flow information, while the latter represents host variable flow information. For each node (transaction) Tj, GEN_DB[Tj] and GEN_HV[Tj] contain a tuple for each data element and host variable, respectively, accessed by Tj.

After that, IN_DB, GEN_DB, IN_HV, and GEN_HV can be used to derive pairs of transactions (denoted DBPair) accessing the same data element, and pairs of transactions (denoted HVPair) accessing the same data element through host variables in the same execution path. If a data element (table_name, attr_name) is in a node's IN_DB set and GEN_DB set, then a tuple (Ti, Tj, table_name, attr_name) is added to DBPair.

Table 4: Access patterns and their problems. Transactions (Tm, Tk) in one instance and Ti in another instance access the same data element; R stands for READ and W stands for WRITE.

GEN_DBPair( Tj ):
    For each tuple (Ti, READ/WRITE, Tname, Aname) in IN_DB[Tj]
        If there exists a tuple (Tj, READ/WRITE, Tname, Aname) in GEN_DB[Tj]
            Add (Ti, Tj, Tname, Aname) to DBPair

GEN_HVPair( Tj ):
    For each tuple (Ti, DEF/USE, p) in IN_HV[Tj]
        If there exists a tuple (Tj, DEF/USE, p') in GEN_HV[Tj]
        and the def of p in Ti reaches this use of p'
            Add (Ti, Tj, p) to HVPair

Figure 5: GEN_DBPair and GEN_HVPair

After the DBPair and HVPair sets are generated, we use two heuristics to generate interesting schedules for each DBPair/HVPair.

4.3. Generation and execution of test cases for schedules

To enforce the generated schedule (or partial schedule), test cases must be generated in such a way that the transactions in the given schedule manipulate the same data elements.
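The DBPair derivation of Figure 5 can be illustrated with a much-simplified, self-contained sketch: here we pair transactions directly on shared (table, attribute) elements, omitting the IN_DB/OUT_DB propagation and reaching analysis of the actual algorithm, and the GEN_DB contents are invented for illustration.

```python
# Simplified sketch of DBPair generation: two transactions are paired
# whenever each reads or writes the same data element (table, attribute).
# Tuples follow the paper's (xact_id, operation, table, attribute) form.
GEN_DB = {
    "T1": {("T1", "WRITE", "Flights", "avail")},
    "T2": {("T2", "READ",  "Flights", "avail")},
    "T3": {("T3", "READ",  "Customers", "name")},
}

def gen_db_pairs(gen_db):
    pairs = set()
    xacts = sorted(gen_db)
    for i, ti in enumerate(xacts):
        for tj in xacts[i + 1:]:
            # Project each transaction's tuples onto (table, attribute).
            elems_i = {(t, a) for (_, _, t, a) in gen_db[ti]}
            elems_j = {(t, a) for (_, _, t, a) in gen_db[tj]}
            for tname, aname in elems_i & elems_j:
                pairs.add((ti, tj, tname, aname))
    return pairs

print(gen_db_pairs(GEN_DB))  # {('T1', 'T2', 'Flights', 'avail')}
```

Only T1 and T2 touch a common element (Flights.avail), so only that pair is a candidate for generating an interesting schedule; T3 is flow-independent and need not be interleaved.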
6. Preliminary evaluation

We have implemented the proposed methods and measured some aspects of their performance on the TPC-C benchmark. We performed all experiments on a Sun ULTRA 10 workstation with a 440 MHz CPU and 384 MB of main memory. The TPC-C application is implemented in the C programming language with embedded SQL; the instrumentation is implemented in Perl. TPC Benchmark C (TPC-C) is the standard benchmark for online transaction processing (OLTP). It is a mixture of read-only and update-intensive transactions that simulate the activities found in a complex OLTP environment [7]. The TPC-C application models a wholesale supplier managing orders and stock. The five OLTP transactions are new-order, payment, order-status, delivery, and stock-level. Experiments based on TPC-C for testing transaction consistency are reported in our related work [19].

To evaluate the time overhead of the proposed methods, we chop each original transaction into 2 transactions arbitrarily. This introduces potential concurrency faults. We run the TPC-C application in three different ways and record the corresponding CPU elapsed times:

S0: The original five transactions are executed without any instrumentation and without scheduling enforcement.
S1: The desired schedule is enforced by the DBMS transaction manager.
S2: The transactions are instrumented in the application source code according to the desired schedule.

We run S0, S1 and S2 separately 5 times and record their total elapsed times, in seconds, in the first three rows of Table 6. The last two rows of Table 6 show the overhead for situations S1 and S2, defined as OHi = (Si - S0)/S0 * 100% (i = 1 or 2). As we see from Table 6, the overhead for S1 is much smaller than that for S2, indicating that modification of the DBMS transaction manager is more efficient than instrumentation of the application source code.

Suppose that each single query is considered as a transaction, and that the data and control flows of the original program are kept unchanged. Then we can find or compute the total number of schedules consisting of 3 transactions for the following 4 different situations; the results are given in Table 7.

W1: any 3 transactions.
W2: 2 of the 3 transactions in the same application instance access the same data element or host variable.
W3: 2 of the 3 transactions in the same application instance are generated from DBPair/HVPair.
W4: 2 of the 3 transactions in the same application instance are generated from DBPair/HVPair; only those schedules are considered which match the patterns in Table 4 or Table 5 that may reveal concurrency errors.

Table 7: Schedules analysis

Situation   W1      W2    W3    W4
Schedules   39304   916   127   11

As we can see from Table 7, by taking account of the information about the data elements, the number of schedules to consider can be reduced significantly (W2/W1 = 2.33%). Similarly, if we also consider data flow information, we can further reduce the number of schedules (W3/W2 = 13.43%). Finally, our proposed method also considers the access patterns; the number of interesting schedules is then the smallest (W4/W3 = 8.66%).

7. Related work

The issue of test suite adequacy for database-driven applications is addressed in [21]. A dataflow-based family of test adequacy metrics is proposed to assess the quality of test suites, where a database interaction control flow graph is given as a unified representation.

Model checking tools [15] systematically explore the state spaces of concurrent/reactive software systems. They have been shown to be effective in verifying many types of properties, including checking for assertion violations [15, 20]. However, model checking depends on a manageable number of program states in the model that represents the specified program, so it cannot be directly applied to testing the database application implementations considered in this paper.
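Returning to the two enforcement strategies measured in Section 6: the idea behind source-level enforcement (S2) is simply to delay each transaction until the global schedule reaches its turn. The following sketch is ours and hypothetical, not AGENDA's actual Perl instrumentation; it uses a condition variable so that transactions started in any order still execute in the desired schedule.

```python
import threading

# Hypothetical sketch of application-level schedule enforcement: before a
# transaction body runs, the client blocks until the shared schedule
# pointer reaches its (client, transaction) step.
schedule = [("A", "T1"), ("B", "T1"), ("A", "T2"), ("B", "T2")]
position = 0                      # next step of the schedule to execute
cond = threading.Condition()
log = []                          # actual execution order, for checking

def run_transaction(client, xact):
    global position
    with cond:
        # Wait until (client, xact) is the next scheduled step.
        cond.wait_for(lambda: schedule[position] == (client, xact))
        log.append((client, xact))   # the transaction body would run here
        position += 1
        cond.notify_all()            # wake waiters; one of them is next

# Start the steps in a deliberately scrambled order.
threads = [threading.Thread(target=run_transaction, args=step)
           for step in [("B", "T1"), ("A", "T1"), ("B", "T2"), ("A", "T2")]]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log == schedule)  # True: execution followed the enforced schedule
```

Seeding such blocking (or sleep) primitives at chosen points is the same style of instrumentation used by the testing tools discussed next.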
The instrumentation technique is widely used in white-box testing. ConTest [8] is a tool for detecting synchronization faults in multithreaded Java programs. The program under test is instrumented with sleep(), yield(), and priority() primitives at points of shared memory accesses and synchronization events. At run time, based on random or coverage-based decisions, ConTest determines whether a seeded primitive is to be executed. A replay algorithm facilitates debugging by saving the orders of shared memory accesses and synchronization events. This kind of instrumentation technique is integrated into our tools for testing database applications.

A race condition is the major concurrency error that occurs in concurrent systems. ExitBlock [10] is a practical testing algorithm that systematically and deterministically finds program errors resulting from unintended timing dependencies. ExitBlock executes a program, or a portion of a program, on a given input multiple times, enumerating meaningful schedules in order to cover all program behaviors. In database applications, however, a race condition occurs when two or more application instances try to access the same data element in the database, and this can be controlled by the transaction manager of the DBMS. The major concern in database applications is how to group queries into transactions so that a balance between efficiency and robustness can be achieved.

Flow analysis techniques are used in many fields such as compilation, code analysis, and reverse engineering. Reachability testing [13] systematically executes all possible orderings of operations on shared variables in a program consisting of multiple processes. It parallelizes the executions of different schedules to speed up testing. Reachability testing requires explicit declarations of which variables are shared. Flow analysis techniques can be used to generate flow-sensitive schedules in database applications.

8. Conclusions and future work

To test database applications, we have proposed a framework and have designed and implemented a tool set to partially automate the testing process. In this paper, we extend our previous work to test transaction concurrency and transaction atomicity/durability at the isolation level of SERIALIZABLE. We have identified the potential offline concurrency problems in database applications. A data flow analysis technique is used to identify interesting and legal schedules which may reveal concurrency errors. Two approaches are suggested to execute a given schedule. The instrumentation method is also used to test atomicity/durability. Preliminary empirical evaluation based on the TPC-C benchmark is presented and demonstrates our approach's effectiveness and efficiency.

In database applications, we believe that the unit test involves testing transaction consistency; the integration test involves testing transaction concurrency; and the system test involves testing atomicity/durability. To expose potential problems associated with "dirty read", "non-repeatable read", and "phantom", we plan to extend our tools to control schedules at the query level. We also plan to refine our schedule generation algorithm to avoid deadlock.

9. Acknowledgments

Thanks to David Chays, Gleb Naumovich, Torsten Suel, and Alex Delis for helpful discussions.

10. References

[1] D. Chays, P. Frankl, et al., "A Framework for Testing Database Applications", ACM International Symposium on Software Testing and Analysis, Portland, Oregon, 2000.
[2] H. Berenson, et al., "A Critique of ANSI SQL Isolation Levels", SIGMOD '95, San Jose, CA, USA, 1995.
[3] J. Gray, A. Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann Publishers Inc., 1993.
[4] H. Garcia-Molina, J.D. Ullman, J. Widom, Database Systems: The Complete Book, Prentice Hall, 2000.
[5] D. Chays, Y. Deng, "Demonstration of AGENDA Tool for Testing Database Applications", International Conference on Software Engineering, Portland, Oregon, USA, 2003.
[6] J. Melton, A.R. Simon, SQL:1999 - Understanding Relational Language Components, Morgan Kaufmann Publishers Inc., 2002.
[7] TPC Benchmark C, Transaction Processing Council, http://www.tpc.org, 2002.
[8] O. Edelstein, E. Farchi, et al., "Multithreaded Java program test generation", IBM Systems Journal, Vol. 41, 2002.
[9] PostgreSQL Global Development Team, PostgreSQL, http://www.postgresql.org, 2002.
[10] D. Bruening, J. Chapin, "Systematic testing of multithreaded programs", MIT/LCS Technical Memo LCS-TM-607, April 2000.
[11] D. Chays, Y. Deng, et al., "An AGENDA for Testing Relational Database Applications", Technical Report TR-CIS-2002-04, CIS Department, Polytechnic University.
[12] W.R. Stevens, Advanced Programming in the UNIX Environment, Addison-Wesley, 1993.
[13] G.H. Hwang, K.C. Tai, and T.L. Huang, "Reachability testing: an approach to testing concurrent software", Int. Journal on Software Engineering and Knowledge Engineering, Vol. 5, No. 4, 1995, 493-510.
[14] M. Fowler, Patterns of Enterprise Application Architecture, Addison-Wesley, 2003.
[15] G. Holzmann, "The Spin Model Checker", IEEE Trans. on Software Engineering, May 1997.
[16] T.J. Ostrand and M.J. Balcer, "The category-partition method for specifying and generating functional tests", Comm. ACM, Vol. 31, No. 6, 1988.
[17] P. O'Neil, Database: Principles, Programming, and Performance, Morgan Kaufmann Publishers, 2000.
[18] A.V. Aho, R. Sethi, and J.D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1988.
[19] Y. Deng, D. Chays, and P. Frankl, "Testing database transaction consistency", Technical Report, CIS Department, Polytechnic University, 2003.
[20] H. Hong, S. Cha, I. Sokolsky, and H. Ural, "Data Flow Testing as Model Checking", 25th International Conference on Software Engineering, Portland, Oregon, 3-10 May 2003.
[21] G. Kapfhammer and M. Soffa, "A Family of Test Adequacy Criteria for Database-Driven Applications", ESEC/FSE '03, Helsinki, Finland, September 2003.