
Test-Driven Development of Relational Databases

Scott W. Ambler, IBM

Implementing test-driven database design involves database refactoring, regression testing, and continuous integration. Although technically straightforward, TDDD involves cultural challenges that slow its adoption.

In test-first development (TFD), developers formulate and implement a detailed design iteratively, one test at a time.1,2 Test-driven development (also called test-driven design) combines TFD with refactoring,3 wherein developers make small changes (refactorings) to improve code design without changing the code's semantics. When developers decide to use TDD to implement a new feature, they must first ask whether the current design is the easiest possible design to enable the feature's addition. If so, they implement the feature via TFD. If not, they first refactor the code, then use TFD. In the behavior-driven development4 approach to TDD, the naming conventions and terminology make the developer's primary goal explicit: to specify, rather than validate, the system.

Test-driven database development (TDDD) applies TDD to database development. In TDDD, developers implement critical behaviors (including business rules about data invariants, such as "The only valid colors are red, blue, and purple") and business functionality within relational databases. There's nothing special about relational databases: we specify database behaviors via database tests in the same way we implement application-code behaviors via developer tests. TDDD is not a stand-alone activity. It's best viewed as simply the database aspects of TDD. That is, you should develop your database schema in lock-step with your application code, because sometimes you'll write a developer test that specifies application code behavior, while other times it will specify database behavior.

TDDD is important for several reasons. First, all of application TDD's benefits apply to TDDD: you can take small, safe steps; refactoring lets you maintain high-quality design throughout the life cycle; regression testing lets you detect defects earlier in the life cycle; and, because TDDD gives you an executable system specification, you're motivated to keep it up to date (unlike with traditional design documentation). Moreover, with TDD, your database development efforts effectively dovetail into your overall application development efforts. The IT community has long suffered because of the cultural mismatch between application developers and data professionals.5 This mismatch results largely from differing philosophies and ways of working.

Modern methodologies, including the Rational Unified Process, Extreme Programming, and the Dynamic Systems Development Method, work in an evolutionary (iterative and incremental) manner. It therefore behooves us to find techniques that let data professionals work in an evolutionary manner with their application developer counterparts. TDDD is one such technique.

Extending TDD to database development
To extend TDD to database development, we need database equivalents of regression testing, refactoring, and continuous integration.

Database regression testing
In database regression testing,6,7 you regularly validate the database by running a comprehensive test suite, ideally whenever you change the database itself or access the database in a new way. As the threat boundaries8 (dashed lines) in figure 1 show, you must test both the database interface and the database itself.

[Figure 1. Testing a relational database. Developers must test the database both internally and at its interface, the equivalent of clear-box testing and black-box testing, respectively. Clear-box (internal) concerns include stored procedures and functions, triggers, views, constraints, existing data quality, and referential integrity and data consistency; black-box (interface) concerns include the data values being persisted and retrieved.]

Interface testing. We use database interface tests to specify how systems will access the database. From the database's viewpoint, these are effectively black-box tests. Such a test might specify a portion of a hard-coded SQL statement that you might submit to the database. It might also specify that

- an SQL statement returns an expected result,
- a view produces an expected result,
- SQL statement calculations return the expected results,
- a particular user ID can't access specific information, or
- a particular user ID can access specific information.

Interface tests are fairly straightforward to implement because good tools exist for many platforms, especially the xUnit framework. Commercial tools are also viable options. My advice: use the same tools that you're using for application regression testing.
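
To make this concrete, here is a rough sketch of the kind of statement such an interface test might submit and assert on. The Account table, its columns, and the expected values are assumptions made purely for illustration, not part of the article's example.

    -- Hypothetical black-box interface test: an xUnit-style test would submit this
    -- statement through the application's data-access path and assert on the result.
    -- For a known test customer (CustomerID = 74), the expectation might be exactly
    -- one row with NumberOfAccounts = 2 and TotalBalance = 5000.00.
    SELECT COUNT(*)     AS NumberOfAccounts,
           SUM(Balance) AS TotalBalance
    FROM   Account
    WHERE  CustomerID = 74;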

Internal testing. Internal database tests specify both behaviors that the database implements and data validity. There's nothing special about database code: if you can write tests for Java code, surely you can write tests for procedural language extensions to SQL (PL/SQL) code. According to current rhetoric, data is a corporate asset; shouldn't you therefore specify what kind of data you intend to store, along with the data's invariants? From the database's viewpoint, these are effectively clear-box tests. Such tests might specify

- some of a stored procedure's behavior,
- a validity check for a parameter being passed to a stored procedure,
- a validity check for the value that a stored procedure returns,
- a stored procedure to return an expected error code,
- a column invariant,
- a referential integrity rule, and
- a column default.

Tools for internal database testing are a bit harder to come by, but that's changing. For example, tools such as Quest's Qute validate Oracle PL/SQL code, and Microsoft's Visual Studio Team System for Database Professionals includes testing tools for SQL Server.
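
As a sketch of the last few items, a column invariant or a referential integrity rule can be expressed as a query that must return no rows when the rule holds. The Account and Customer tables and their columns are assumptions made for illustration only.

    -- Hypothetical clear-box tests expressed as queries that a test harness runs;
    -- each query's result set must be empty for the test to pass.

    -- Column invariant: account balances are never negative.
    SELECT AccountID, Balance
    FROM   Account
    WHERE  Balance < 0;

    -- Referential integrity rule: every account belongs to an existing customer.
    SELECT a.AccountID
    FROM   Account a
    LEFT JOIN Customer c ON c.CustomerID = a.CustomerID
    WHERE  c.CustomerID IS NULL;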

Testing example. The following example illustrates regression testing. Assume that we're building a banking system and have two features to implement: Retrieve Account from Database and Save Account to Database. Let's also assume that this is the first time we've run into the Account concept and that we're implementing these two features. For account retrieval, we'd run several database tests (one at a time), including shouldExistOneOrMoreAccountsForCustomer, shouldExistOneOrMoreCustomersForAccount, shouldHaveUniqueAccountID, and shouldHavePositiveAccountBalance. For saving an account into the database, we'd add two new tests: shouldBeAssignedUniqueAccountIDOnCreation and shouldBeAssignedZeroBalanceOnCreation. I'm using the BDD "should" naming conventions4 here, but the "traditional" TDD naming convention (starting tests with "test") would also work fine.

With a TDDD approach, we'd write each of these tests one at a time, then evolve the database schema to fulfill the test-specified functionality. Consider the test shouldHavePositiveAccountBalance. If this is the first time we've considered the account-balance concept, we'd add the Account.Balance column. Depending on our database design standards, implementing such a data-oriented business rule could motivate us to also add a column constraint or an update trigger that checks the value of Account.Balance. Regardless of the implementation strategy, it's critical to validate that your database properly supports the business rule.
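
To satisfy a test such as shouldHavePositiveAccountBalance, the corresponding schema change might look roughly like the following. This is a minimal sketch in generic SQL; the column type, default, and constraint name are assumptions for illustration, not the article's prescribed implementation.

    -- Evolve the schema to fulfill shouldHavePositiveAccountBalance (illustrative only).
    ALTER TABLE Account
        ADD Balance DECIMAL(12,2) DEFAULT 0 NOT NULL;

    -- Capture the business rule as a column constraint so the database itself
    -- rejects negative balances (an update trigger is an alternative strategy).
    ALTER TABLE Account
        ADD CONSTRAINT CheckPositiveAccountBalance CHECK (Balance >= 0);
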
There's no magic to identifying a database test: the advice presented in this special issue's other articles, as well as in TDD books, is pertinent to TDDD.1,2 The primary difference in TDDD is that we must recognize how important it is to validate both the functionality implemented by, and the data contained within, the database. A database is nothing special; like any other system component, it must be validated.

Database refactoring
Refactoring is a disciplined approach to restructuring code: you make small changes to improve the code's design without changing its behavioral semantics; that is, you neither remove nor add behavior.3,9 Refactoring lets you improve your code over time in a safe, evolutionary manner. Similarly, in database refactoring, you make a simple change to a database that improves its design while retaining both its behavioral and informational semantics. Databases include structural aspects (such as table and view definitions), functional aspects (such as stored procedures and triggers), and informational aspects (the database data). There are many examples of database refactoring, including dropping a view that's no longer used, moving a column to a more appropriate table, renaming a column, splitting a multiple-use column, and parameterizing a database method to consolidate several similar ones.

Conceptually, a database refactoring is more difficult than a code refactoring because you must preserve informational semantics along with behavioral semantics. Preserving informational semantics implies that if you change the values of a column's data, the change shouldn't materially affect that information's clients. For example, assume that a phone number is stored in character-based format. If you change the value from "(416) 555-1212" to "4165551212", you haven't materially changed the information's semantics, as long as all that column's clients can still work with the new format. In contrast, changing the value to "(416) 555-5555" clearly changes the informational semantics. Similarly, to preserve behavioral semantics, you must keep the same black-box functionality. The implication? You must rework any source code that works with changed aspects of your database schema to maintain the existing functionality.
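
As a sketch of a data-quality refactoring that preserves informational semantics, the phone number standardization described above could be implemented with DML alone. The Customer.PhoneNumber column is an assumption made for this example.

    -- Data-quality refactoring (illustrative): standardize phone numbers by
    -- stripping formatting characters, so "(416) 555-1212" becomes "4165551212".
    -- The information's meaning is unchanged as long as every client of the
    -- column can work with the new format.
    UPDATE Customer
    SET    PhoneNumber = REPLACE(REPLACE(REPLACE(REPLACE(PhoneNumber, '(', ''),
                                                 ')', ''),
                                         '-', ''),
                                 ' ', '');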

Process overview. Figure 2 offers an overview of the database refactoring process using a UML activity diagram. The recommended approach is to first implement the database refactoring in a separate developer's "sandbox," where you can perform the work in isolation before promoting it to shared environments, such as your team integration sandbox. Better yet, do this early work in pairs, where at least one person has agile database development skills and, more importantly, knowledge of the target database.5

[Figure 2. Database refactoring. Developers evolve the database schema in their own sandboxes before sharing the change with their teammates.]

The first step in refactoring is to verify that you need to refactor the schema and then, if so, to choose the right refactoring. For example, sometimes people suggest renaming a column but don't realize that the existing name conforms to corporate naming conventions. Or perhaps the real problem isn't the column's name, but rather that it's in the wrong table and therefore appears poorly named in that table's context. In other cases, refactoring is simply not cost effective.

A possible second step here is to deprecate the target schema's existing portion. This is an optional step; some production databases are accessed by dozens, if not hundreds, of systems running on different platforms, written in different languages, and released into production on different schedules. In this situation, you must support a transition window, which might last several years in some organizations, during which the schema's existing and refactored versions run in parallel. In simpler situations, you can update and release both the database schema and the accessing systems in parallel, so you don't need to support a transition window. The point is, you can do database refactoring in many situations, from small systems to highly complex ones.

In the next step, you write the code (or generate it using a database refactoring tool) to make the change to the database. For database refactorings that change the table structure, you'd create Data Definition Language (DDL) code to add the new schema and Data Manipulation Language (DML) code to manipulate the data. To keep the data synchronized, you need to add scaffolding code, typically a trigger, although in some instances an updatable view can work, too. If you're doing a data-quality refactoring, you might only need to write DML code to address the quality issue. Similarly, if you're doing database-method refactorings, you might only have to rework stored procedure code. In parallel with this, you must test your work, taking a test-first approach to design and running your regression test suite whenever you integrate your system (more on these topics later). As you find bugs, you'll need to fix them, iterating back and forth between testing and coding as required.

Once you've completed the refactoring, you should put your work under configuration management (CM) control and announce the refactoring both to your team and to the organization as a whole. Because databases are shared resources, database refactoring is often a challenge to organizations in that they must promote an effective means of communicating database changes among teams.

Structural refactoring example. To show how refactoring works, we'll work through a simple structural refactoring, moving a column from one table to another. Let's say our example environment has more than 100 systems accessing a mission-critical database that we can't afford to break. Figure 3a shows a portion of a bank database's original schema. The Customer table has a Balance column that belongs in the Account table. As a result of this design flaw, the company has been coding workarounds for years, making it difficult to support the business.

Figure 3b shows the transitional database schema, which we need because the situation is somewhat complex. We've selected a two-year transition window to ensure that the various development teams can refactor and test their code to work with the new schema. As the transitional image shows, we've added the new schema (the Account.Balance column) as well as scaffolding code, the SynchronizeCustomerBalance and SynchronizeAccountBalance triggers, to keep the two columns synchronized. We must assume that at any point during the transition, each of the systems accessing this database will access one but not the other column. In other words, the database must be responsible for keeping itself consistent.
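
The scaffolding might look roughly like the following sketch. Trigger syntax varies by database product, and the bodies are deliberately simplified (they handle only updates and omit recursion guards), so treat this as an illustration of the idea rather than the article's production code.

    -- Transitional scaffolding (illustrative): keep Customer.Balance and
    -- Account.Balance in step while both versions of the schema are live.

    CREATE TRIGGER SynchronizeAccountBalance
    AFTER UPDATE OF Balance ON Customer
    FOR EACH ROW
    BEGIN
        -- Push a change made through the old column into the new one.
        UPDATE Account
        SET    Balance = NEW.Balance
        WHERE  Account.CustomerID = NEW.CustomerID;
    END;

    CREATE TRIGGER SynchronizeCustomerBalance
    AFTER UPDATE OF Balance ON Account
    FOR EACH ROW
    BEGIN
        -- Push a change made through the new column back into the old one.
        UPDATE Customer
        SET    Balance = NEW.Balance
        WHERE  Customer.CustomerID = NEW.CustomerID;
    END;
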
As figure 3b shows, we've marked the scaffolding code and the original schema (Customer.Balance) with their intended drop date. We're basically following the same strategy that Sun Microsystems uses to evolve the Java Development Kit (JDK): add the new functionality and deprecate the old functionality to indicate that people should no longer use it. The difference here is that our drop date tells people how long they have to refactor their code.

[Figure 3. A simple structural refactoring. (a) The original database schema's Balance column is in the wrong table. (b) During the transition period, the database must support both the original and new schema versions; the synchronization triggers and the original Customer.Balance column are annotated with a drop date of June 14, 2009. (c) Once developers successfully update and deploy the systems that accessed the original column, we remove the original schema and transition scaffolding.]

Naturally, we're not going to automatically drop the old schema and scaffolding code on that date; we'll ensure that developers have successfully updated and deployed the accessing applications first. Figure 3c shows the resulting database schema when the process is complete.

Continuous database integration
In continuous database integration, team members regularly integrate changes to their own database instances, including structural, functional, and informational changes, as well as those related to automated system build and regression testing.10 Ideally, they'd do this at least daily. In any case, whenever someone submits a database change, everyone else working with that database should download the change from the CM system and apply it to their database instance as soon as possible. Better yet, such changes should be automated, in the same way tools such as Build Forge or CruiseControl automatically download application code changes and rebuild your system on your machine.

There are several best practices for continuous database integration:

1. Automate the build. Your database build process should detect whether it needs to apply any changes to the schema, then apply those changes and run the test suite. The database build process is part of your overall system build process. Currently, database build tools are a relatively new concept, but open source developers are working on ActiveRecordMigration in Ruby on Rails and dbdeploy.

2. Put everything under version control. Data models, data scripts, test data, and similar artifacts are all important work products, and you should manage them appropriately. My philosophy is simple: if it's worth creating, it's worth putting under version control.

3. Give developers their own database copies. Developers should first work on application code and database schema changes in their own sandboxes.11 Once they've fully tested changes and deemed them safe, they should apply the changes to the project integration environment. This reduces the chance that individual developers will break the build, because they first do their work within their own independent environments.

4. Automate database creation. There are several reasons to create a database copy: you might need to rebuild it from scratch for comprehensive testing, new people might join your team and require proper environment setup, or you might need to install the database on a workstation to demo the system for someone. Creating a new database instance is something that you're likely to do often, so you should strive to automate the task. Similarly, you should be able to drop the database easily.

5. Refactor each database individually. Individual refactoring lets you apply schema changes one at a time, which in turn lets you start from any database schema version and evolve it forward to any other.

6. Adopt a consistent identification strategy. Common identification strategies include using an incremental integer number (1, 2, 3, 4, ...), a date/time stamp, or a release number (v2.3.1.5). All three strategies are fine, but I prefer the integer strategy because it's simple. It's important that you be able to apply database changes in the proper order, particularly if you want to automate database creation and schema update.

7. Bundle database changes when needed. To implement a single feature, a developer might need to make several database schema changes. When deploying your system from your project integration environment to your preproduction testing environment, you'll likely need to bundle up several schema changes. For example, assume that the current database schema version is 1701, and you need to apply three new database refactorings (1702, 1703, and 1704). Once you've applied all three, in order, the database schema's new version would be 1704.

8. Ensure that the database knows its version number. Your database update scripts should help developers easily determine the current database version. For example, you should have an UpdateSchema script that takes as a parameter a database version, such as 1704 or 2007-06-14-23:45:00, and automatically runs the right change scripts to update to that version from the current one (see the sketch after this list).
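
One simple way to let the database report its own version is a version table to which every change script appends a row; an UpdateSchema script can then compare that table against the requested target version. The following sketch is purely illustrative, and the SchemaVersion table and the 1704 change script are assumptions made for this example.

    -- Illustrative version tracking: the update script reads this table to
    -- decide which numbered change scripts still need to run, in order.
    CREATE TABLE SchemaVersion (
        VersionNumber INTEGER   NOT NULL,
        AppliedOn     TIMESTAMP NOT NULL
    );

    -- Each numbered change script ends by recording that it has been applied.
    -- For example, hypothetical change script 1704:
    ALTER TABLE Account
        ADD LastReviewedOn DATE;      -- the actual refactoring (hypothetical)

    INSERT INTO SchemaVersion (VersionNumber, AppliedOn)
    VALUES (1704, CURRENT_TIMESTAMP);
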
Continuous system integration includes the system's automated build and regression testing, and continuous database integration is one part of that overall effort. In other words, it isn't enough to just integrate your application code; you also need to integrate your database code.

Adopting TDDD
TDDD is an integrated part of the overall development process, not a standalone activity that data professionals perform in parallel with application TDD. Although from a technical viewpoint TDDD is relatively straightforward, we must overcome several challenges to its wholesale adoption throughout the IT community.

Cultural divisions
The cultural divisions between the development and data communities are the most significant barrier to adoption.5 The development community is clearly moving toward evolutionary development, while the data community seems firmly rooted in serial development. Developers embraced TDD years ago, yet it remains a relatively new concept for data professionals. A notable exception here is in the data warehousing and business intelligence community, where thought leaders promote evolutionary work processes (although practitioners, to their peril, often fail to heed this good advice). Unfortunately, this same community seems to believe that, because they work in an evolutionary manner, they don't need to test.6 We saw this same attitude within the object development community in the early 1990s.

A dearth of tools
As I touched on earlier, our second challenge is the lack of tooling. We're starting to see interesting tools for database refactoring, testing, and continuous integration emerge in the marketplace. Because databases are shared resources, we also need tools that let us communicate schema changes among all interested teams. We also need improvements to database modeling tools; we should, for example, be able to indicate a deprecated database element's drop date (as in figure 3b). Good database modeling tools should add value to design visualization. Better yet, we might one day see modeling tools that can define database tests. Finally, we need to rework databases themselves. For example, shouldn't the database throw a warning whenever someone accesses a deprecated database element, just as Sun's JDK does with deprecated Java code?

Lack of familiarity
In my experience, agile software developers seem to take to the TDDD idea rather quickly. Presumably, this is because they've already adopted TDD, so extending it to include database development makes sense to them. Traditional developers, and traditional data professionals in particular, are often intrigued by the TDDD idea, but they struggle because it's so different from what they're used to. My hope is that we can overcome this challenge through training, education, mentoring, and a bit of patience.

Clearly, we can extend TDD into the database development realm, where implementing a more evolutionary methodology would better align the database with other modern methods and applications. All of the techniques required for TDDD adoption already exist and, more importantly, we're seeing nascent tool support for them. I expect that within the next two to three years, we'll see a surge in database development tools to support evolutionary, if not agile, database development techniques.

About the author
Scott W. Ambler is the practice leader for agile development with IBM Rational and a senior contributing editor with Dr. Dobb's Journal. His research interests include evolutionary database development and effective data management practices. He originated the Agile Modeling method, the Agile Data method, and the Enterprise Unified Process and is an active contributor to the Open Unified Process. He received his MIS in computer-supported cooperative work from the University of Toronto. He is coauthor of several books, including Refactoring Databases (Addison-Wesley, 2006). Contact him at [email protected]; www-306.ibm.com/software/rational/bios/ambler.html.

References
1. D. Astels, Test Driven Development: A Practical Guide, Prentice Hall, 2003.
2. K. Beck, Test Driven Development: By Example, Addison-Wesley, 2003.
3. M. Fowler, Refactoring: Improving the Design of Existing Code, Addison-Wesley Longman, 1999.
4. Behavior Driven Development, Mar. 2007; www.behaviour-driven.org.
5. S.W. Ambler, Agile Database Techniques: Effective Strategies for the Agile Software Developer, John Wiley & Sons, 2003.
6. S.W. Ambler, Database Regression Testing, 2006; www.agiledata.org/essays/databaseTesting.html.
7. S.A. Becker and A. Berkemeyer, "The Application of a Data-Field Software Testing Technique to Uncover Errors in Database Systems," Proc. 20th Pacific Northwest Software Quality Conf., PNSQC/Pacific Agenda, 1999, pp. 173-183.
8. F. Swiderski and W. Snyder, Threat Modeling, Microsoft Press, 2004.
9. S.W. Ambler and P. Sadalage, Refactoring Databases: Evolutionary Database Design, Addison-Wesley, 2006.
10. M. Fowler and P. Sadalage, "Evolutionary Database Design," 2003; www.martinfowler.com/articles/evodb.html.
11. S.W. Ambler, "Development Sandboxes: An Agile Best Practice," 2002-2006; www.agiledata.org/essays/sandboxes.html.

