A Comprehensive Framework for Testing Database-Centric Software Applications

Gregory M. Kapfhammer, Department of Computer Science, University of Pittsburgh

PhD Dissertation Defense April 19, 2007

Dissertation Committee

Director: Dr. Mary Lou Soffa (University of Virginia)

Dr. Panos Chrysanthis (University of Pittsburgh)
Dr. Bruce Childers (University of Pittsburgh)
Dr. Jeffrey Voas (SAIC)

With support and encouragement from countless individuals!

Motivation

The Risks Digest, Volume 22, Issue 64, 2003

Jeppesen reports airspace boundary problems

About 350 airspace boundaries contained in Jeppesen NavData are incorrect, the FAA has warned. The error occurred at Jeppesen after a software upgrade when information was pulled from a database containing 20,000 airspace boundaries worldwide for the March NavData update, which takes effect March 20.

Important Point: Practically all use of databases occurs from within application programs [Silberschatz et al., 2006, pg. 311].

Research Contributions

Comprehensive framework that tests a program’s interaction with the complex state and structure of a database

Database interaction fault model
Database-aware representations
Test adequacy
Test coverage monitoring
Regression testing
Worst-case analysis of the algorithms and empirical evaluation with six case study applications

Traditional Software Testing

[Figure: program P transforms an input into an output (e.g., "Final Result: 45"); P's bytecode executes on an environment comprising a virtual machine, graphical interface, database, file system, and operating system.]

Defects (e.g., bugs, faults, errors) can exist in program P and all aspects of P's environment.

Testing Environment Interactions

[Figure: the same program P and execution environment (virtual machine, database, operating system, file system), now highlighting the interactions between P and the environment.]

Defects can also exist in P's interaction with its environment.

Focus on Database Interactions

[Figure: a method m of program P submits select, insert, update, and delete statements to the databases D1 through De.]

Program P can view and/or modify the state of the database

Types of Applications

[Figure: classification of database-centric applications by interaction approach (embedded or interface) and program location (inside or outside the DBMS).]

Testing framework relevant to all types of applications.
Current tool support focuses on Interface-Outside applications.

Example: a Java application that submits SQL strings to an HSQLDB relational database using JDBC drivers, as in the sketch below.
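For concreteness, a minimal Interface-Outside interaction might look like this sketch; the in-memory JDBC URL, table name, and data are invented for illustration and are not drawn from the case study applications.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class InterfaceOutsideExample {
        public static void main(String[] args) throws Exception {
            // Load the HSQLDB JDBC driver (required before JDBC 4)
            Class.forName("org.hsqldb.jdbcDriver");
            // The program runs outside the DBMS and connects through JDBC
            Connection connection =
                DriverManager.getConnection("jdbc:hsqldb:mem:testdb", "sa", "");
            Statement statement = connection.createStatement();
            // All interaction happens by submitting SQL strings
            statement.executeUpdate(
                "CREATE TABLE UserInfo (id INTEGER, name VARCHAR(50))");
            statement.executeUpdate("INSERT INTO UserInfo VALUES (1, 'Alice')");
            ResultSet results =
                statement.executeQuery("SELECT name FROM UserInfo");
            while (results.next()) {
                System.out.println(results.getString("name"));
            }
            connection.close();
        }
    }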

Research Contributions

Database Interaction Fault Model
Test Adequacy Criteria
Test Coverage Monitoring
Regression Testing
    Reduction
    Prioritization

Database Interaction Faults: (1-v)

[Figure: method m of P issues insert and update statements; the actual "after" database state differs from the expected state.]

P uses update or insert to incorrectly modify items within the database.
Commission fault that violates expected database validity.
Database-aware adequacy criteria can support fault isolation.
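As a hypothetical illustration of a (1-v) fault (the UserInfo relation, its columns, and the method are invented for this sketch):

    import java.sql.SQLException;
    import java.sql.Statement;

    public class ValidityFaultExample {
        // Intended behavior: debit only the account of user 42, i.e.,
        //   UPDATE UserInfo SET balance = balance - 50 WHERE id = 42
        // Committed (1-v) fault: the WHERE clause is dropped, so the update
        // incorrectly modifies every record and violates expected validity
        static void debitAccount(Statement statement) throws SQLException {
            statement.executeUpdate("UPDATE UserInfo SET balance = balance - 50");
        }
    }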

Database Interaction Faults: (1-c)

[Figure: method m of P issues a delete statement; the actual "after" database state is missing items that the expected state retains.]

P uses delete to remove incorrect items from the database.
Commission fault that violates expected database completeness.
Database-aware adequacy criteria can support fault isolation.
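A hypothetical (1-c) counterpart (again, the relation and method are invented for this sketch):

    import java.sql.SQLException;
    import java.sql.Statement;

    public class CompletenessFaultExample {
        // Intended behavior: remove one expired session, i.e.,
        //   DELETE FROM Sessions WHERE id = 42
        // Committed (1-c) fault: the overly broad predicate also deletes
        // records that should remain, violating expected completeness
        static void removeSession(Statement statement) throws SQLException {
            statement.executeUpdate("DELETE FROM Sessions WHERE id >= 42");
        }
    }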

Data Flow-Based Test Adequacy

[Figure: inside method mi, node n3 executes "delete from R where A > 100", a define(R), and node n6 executes "select * from R", a use(R).]

The intraprocedural database interaction association ⟨n3, n6, R⟩ exists within method mi
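In code, such an association might arise as follows; relation R and attribute A are the slide's abstract names, and the method is a sketch rather than case study code.

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class InteractionAssociationExample {
        // Node n3 defines relation R and node n6 later uses it, so the
        // define-use pair <n3, n6, R> is a test requirement: an adequate
        // test must exercise a path from the delete to the select
        void interactWithR(Statement statement) throws SQLException {
            statement.executeUpdate("DELETE FROM R WHERE A > 100"); // n3: define(R)
            ResultSet results =
                statement.executeQuery("SELECT * FROM R");          // n6: use(R)
            while (results.next()) {
                // consume the rows that survived the delete
            }
        }
    }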

Research Contributions

Database Interaction Fault Model
Test Adequacy Criteria
Test Coverage Monitoring
Regression Testing
    Reduction
    Prioritization

Test Adequacy Component

[Figure: the test adequacy component accepts program P and criterion C; an ICFG constructor and a database-aware representation constructor consult the database's interaction entities to build the database interaction ICFG (DI-ICFG), and a data flow analyzer then computes the database interaction associations.]

Process: Create a database-aware representation and perform data flow analysis.
Purpose: Identify the database interaction associations (i.e., the test requirements).
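A deliberately simplified sketch of the pairing step appears below; it classifies SQL strings as defines or uses and pairs them along a straight-line method, whereas the real analyzer works over the DI-ICFG (all names here are invented):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Locale;

    public class AssociationSketch {
        // A database interaction association <def, use, entity>
        record Association(int defIndex, int useIndex, String relation) {}

        static boolean defines(String sql) {
            String s = sql.toUpperCase(Locale.ROOT);
            return s.startsWith("INSERT") || s.startsWith("UPDATE")
                || s.startsWith("DELETE");
        }

        static boolean mentions(String sql, String relation) {
            return sql.toUpperCase(Locale.ROOT)
                      .contains(relation.toUpperCase(Locale.ROOT));
        }

        // Pair every define of the relation with each later use of it
        static List<Association> associations(List<String> statements,
                                              String relation) {
            List<Association> result = new ArrayList<>();
            for (int d = 0; d < statements.size(); d++) {
                String def = statements.get(d);
                if (!defines(def) || !mentions(def, relation)) continue;
                for (int u = d + 1; u < statements.size(); u++) {
                    String use = statements.get(u);
                    if (use.toUpperCase(Locale.ROOT).startsWith("SELECT")
                            && mentions(use, relation)) {
                        result.add(new Association(d, u, relation));
                    }
                }
            }
            return result;
        }
    }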

Database-Aware Representation

[Figure: the control flow graph of lockAccount; database interaction graphs G_r1 and G_r2, containing nodes such as define(temp2), define(temp3), and use(temp4), are spliced in before the interaction point at result_lock = update_lock.executeUpdate(qu_lck).]

Database interaction graphs (DIGs) are placed before the interaction point.
Multiple DIGs can be integrated into a single CFG.
Analyze the interaction in a control-flow sensitive fashion.
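Assembled from the fragments visible in the figure, lockAccount might read roughly as follows; the names m_connect, qu_lck, update_lock, and result_lock come from the figure, while the UPDATE's SET and WHERE clauses are assumptions.

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class AccountManager {
        private Connection m_connect;  // JDBC connection, assumed initialized elsewhere

        // Sketch of the lockAccount method shown in the DIG figure
        public boolean lockAccount(String temp1) throws SQLException {
            boolean completed = false;
            // Interaction point: this SQL string defines records in UserInfo
            String qu_lck = "UPDATE UserInfo SET acct_lock = 1 "
                + "WHERE card_number = " + temp1 + ";";
            Statement update_lock = m_connect.createStatement();
            int result_lock = update_lock.executeUpdate(qu_lck);
            if (result_lock == 1) {
                completed = true;
            }
            return completed;
        }
    }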

Data Flow Time Overhead

[Figure: data flow analysis time for the TM application as database granularity increases: P = 20.50, P+D = 20.74, P+R = 20.77, P+Rc = 20.78, P+A = 20.94, P+Av = 21.06 seconds.]

2.7% increase in time overhead from P to P+Av (TM)

Research Contributions

Database Interaction Fault Model
Test Adequacy Criteria
Test Coverage Monitoring
Regression Testing
    Reduction
    Prioritization

Database-Aware Coverage Monitoring

[Figure: the coverage monitoring component instruments the program and test suite according to the adequacy criteria; executing the instrumented test suite produces coverage results that are combined with the test requirements in an adequacy calculation, yielding adequacy measurements.]

Purpose: Record how the program interacts with the database during test suite execution
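One way to record these interactions is to interpose on the JDBC calls, as in the sketch below; the CoverageTree type and its record method are invented stand-ins for the monitor's real data structure.

    import java.sql.SQLException;
    import java.sql.Statement;

    public class MonitoredStatement {
        // Invented stand-in for the monitor's coverage data structure
        interface CoverageTree { void record(String sql); }

        private final Statement delegate;
        private final CoverageTree tree;

        public MonitoredStatement(Statement delegate, CoverageTree tree) {
            this.delegate = delegate;
            this.tree = tree;
        }

        // Instrumented call site: log the defining interaction in the
        // current calling context, then run the original statement
        public int executeUpdate(String sql) throws SQLException {
            tree.record(sql);
            return delegate.executeUpdate(sql);
        }
    }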

Database-Aware Instrumentation

[Figure: classification of test coverage monitoring instrumentation by interaction location (program or test suite) and interaction type (defining, using, or defining-using).]

Efficiently monitor coverage without changing the behavior of the program under test.
Record coverage information in a database interaction calling context tree (DI-CCT).

Configuring the Coverage Monitor

Configuration of the Test Coverage Monitor

[Figure: the monitor's configuration dimensions:
  Instrumentation: static (source code or bytecode) or dynamic;
  Tree type: traditional (CCT or DCT) or database-aware (DI-CCT or DI-DCT), at the database, relation, attribute, record, or attribute value interaction level;
  Tree format: binary or XML;
  Tree storage: standard or compressed.]

Flexible and efficient approach that fully supports both traditional and database-centric applications
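Expressed in code, one plausible configuration API mirroring these dimensions (every type name here is invented for illustration):

    public class MonitorConfigurationExample {
        enum Instrumentation { STATIC_SOURCE, STATIC_BYTECODE, DYNAMIC }
        enum TreeType { CCT, DCT, DI_CCT, DI_DCT }
        enum InteractionLevel { DATABASE, RELATION, ATTRIBUTE, RECORD, ATTRIBUTE_VALUE }
        enum TreeFormat { BINARY, XML }
        enum TreeStorage { STANDARD, COMPRESSED }

        record MonitorConfiguration(Instrumentation instrumentation,
                                    TreeType treeType,
                                    InteractionLevel level,
                                    TreeFormat format,
                                    TreeStorage storage) {}

        public static void main(String[] args) {
            // Example: static bytecode instrumentation recording a DI-CCT
            // at the relation level, stored as compressed XML
            MonitorConfiguration config = new MonitorConfiguration(
                Instrumentation.STATIC_BYTECODE, TreeType.DI_CCT,
                InteractionLevel.RELATION, TreeFormat.XML,
                TreeStorage.COMPRESSED);
            System.out.println(config);
        }
    }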

Static Instrumentation: Time

[Figure: static instrumentation time per application (FF, PI, RM, ST, TM, GB) and for all applications together; the per-application times fall between roughly 4.4 and 5.6 seconds, and instrumenting all of the applications takes 8.687 seconds.]

Attach probes to all of the applications in less than nine seconds.
Static approach is less flexible than dynamic instrumentation.

Static Instrumentation: Space

[Figure: archive size of the GB application before (A) and after (As) static instrumentation under ZIP, GZIP, and PACK compression; the uninstrumented archives measure 25084, 19419, and 9034 bytes, while the instrumented archives grow to roughly 68000-76000 bytes.]

Increase in bytecode size may be large (space vs. time trade-off).

Static vs. Dynamic Instrumentation

[Figure: test coverage monitoring (TCM) time for the GB application: normal execution takes 6.939 seconds; static instrumentation takes 7.626 (CCT) and 8.026 (DCT) seconds; dynamic instrumentation takes 11.084 (CCT) and 11.435 (DCT) seconds.]

Static is faster than dynamic; the CCT is faster than the DCT.
The coverage monitor is both efficient and effective.

Size of the Instrumented Applications

Compression Technique   Before Instr (bytes)   After Instr (bytes)
None                    29275                  887609
Zip                     15623                  41351
Gzip                    10624                  35594
Pack                    5699                   34497

Average static size across all case study applications.
Compress the bytecodes with general purpose techniques.
A specialized compressor nicely reduces space overhead.

Database Interaction Levels

CCT Interaction Level   TCM Time (sec)   Percent Increase (%)
Program                 7.44             12.39
Database                7.51             13.44
Relation                7.56             14.20
Attribute               8.91             34.59
Record                  8.90             34.44
Attribute Value         10.14            53.17

Static instrumentation supports efficient monitoring.
53% increase in testing time at the finest level of interaction.

Research Contributions

Database Interaction Fault Model
Test Adequacy Criteria
Test Coverage Monitoring
Regression Testing
    Reduction
    Prioritization

Database-Aware Regression Testing

[Figure: the regression testing cycle: the original test suite, the program, and a coverage report feed reduction or prioritization, producing a modified test suite; test suite execution yields testing results, and the cycle repeats as the program or database changes.]

Version specific vs. general approach.
Use paths in the coverage tree as a test requirement.
Prioritization re-orders the test suite so that it is more effective.
Reduction identifies a smaller suite that still covers all of the requirements (see the sketch below).
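As a minimal sketch of coverage-preserving reduction (a plain greedy set cover, not the dissertation's overlap-aware algorithm; all names are invented):

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class GreedyReduction {
        // Repeatedly pick the test covering the most uncovered
        // requirements until every requirement is covered
        static List<String> reduce(Map<String, Set<String>> coverage,
                                   Set<String> requirements) {
            Set<String> uncovered = new HashSet<>(requirements);
            List<String> reduced = new ArrayList<>();
            while (!uncovered.isEmpty()) {
                String best = null;
                int bestCount = 0;
                for (Map.Entry<String, Set<String>> test : coverage.entrySet()) {
                    Set<String> gain = new HashSet<>(test.getValue());
                    gain.retainAll(uncovered);
                    if (gain.size() > bestCount) {
                        best = test.getKey();
                        bestCount = gain.size();
                    }
                }
                if (best == null) break;  // remaining requirements uncoverable
                reduced.add(best);
                uncovered.removeAll(coverage.get(best));
            }
            return reduced;
        }
    }

Because greedy set cover is only an approximation, the reduced suite is not guaranteed to be minimal; the configuration space below therefore distinguishes overlap-aware from not overlap-aware variants.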

Configuring the Regression Tester

Configuration of the Regression Tester

[Figure: the regression tester's configuration dimensions:
  Technique: reduction or prioritization, ordered greedily (overlap-aware or not, by cost, coverage, or their ratio), in reverse, or at random;
  Test cost: unit or actual;
  Requirements: traditional (data flow) or database-aware coverage tree paths (unique, dominant, super path, or containing path) over a DCT or CCT.]

Regression testing techniques can be used in the version specific model.

Research Contributions

Database Interaction Fault Model
Test Adequacy Criteria
Test Coverage Monitoring
Regression Testing
    Reduction
    Prioritization


Finding the Overlap in Coverage

[Figure: a bipartite graph connecting the test requirements R1 through R7 to the tests T1 through T12 that cover them.]

Rj → Ti means that requirement Rj is covered by test Ti

T = ⟨T2, T3, T6, T9⟩ covers all of the test requirements

Reducing the Size of the Test Suite

App (orig. size)   Rel           Attr           Rec             Attr Value      All
RM (13)            (7, .462)     (7, .462)      (10, .300)      (9, .308)       (8.25, .365)
FF (16)            (7, .563)     (7, .563)      (11, .313)      (11, .313)      (9, .438)
PI (15)            (6, .600)     (6, .600)      (8, .700)       (7, .533)       (6.75, .550)
ST (25)            (5, .800)     (5, .760)      (11, .560)      (10, .600)      (7.75, .690)
TM (27)            (14, .481)    (14, .481)     (15, .449)      (14, .481)      (14.25, .472)
GB (51)            (33, .352)    (33, .352)     (33, .352)      (32, .373)      (32.75, .358)
All (24.5)         (12, .510)    (12.17, .503)  (14.667, .401)  (13.83, .435)

Each cell gives (reduced suite size, reduction factor) at the given database interaction level.

Reduction factor for test suite size varies from .352 to .8

Reducing the Testing Time

[Figure: time reduction factors for the GB application at the record (Rc) and attribute value (Av) interaction levels for the GRO, GRC, GRV, GRR, RVR, and RAR techniques; the factors span 0.06 to 0.94.]

GRO reduces test execution time even though it removes few tests.

Preserving Requirement Coverage

[Figure: coverage preservation factors for the GB application at the record (Rc) and attribute value (Av) interaction levels for GRO, GRC, GRV, GRR, RVR, and RAR; GRO preserves full coverage (factor 1.0), while the other techniques range from 0.09 to 0.99.]

GRO guarantees coverage preservation; the other techniques do not.

Research Contributions

Database Interaction Fault Model
Test Adequacy Criteria
Test Coverage Monitoring
Regression Testing
    Reduction
    Prioritization

Cumulative Coverage of a Test Suite

[Figure: covered test requirements κ(T, t) plotted against testing time t; coverage steps upward as each test finishes, from Cover(Π(T1)) after T1 through Cover(∪_{i=1}^{n-1} Π(Ti)) after T_{n-1} to Cover(Π(T)) at time t(n), and the area under the curve is ∫_0^{t(n)} κ(T, t) dt.]

Calculate the coverage effectiveness of a prioritization as the ratio Actual / Ideal.
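One plausible formalization of that ratio, reconstructed from the figure rather than quoted from the dissertation, treats the ideal prioritization as covering every requirement essentially immediately:

    CE(T) = \frac{\int_{0}^{t(n)} \kappa(T, t)\, dt}{t(n) \cdot |\mathrm{Cover}(\Pi(T))|}

Under this reading, CE approaches 1 when a prioritization covers most requirements early in the testing run.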

Improving Coverage Effectiveness

[Figure: coverage effectiveness for the GB application at the record (Rc) and attribute value (Av) interaction levels for GPO, GPC, GPV, GPR, RVP, ORP, and RAP; the best orderings score roughly 0.84-0.94, while the worst fall to 0.22.]

GPO is the best choice; the original ordering is poor.

Conclusions

PhD Dissertation Defense, University of Pittsburgh, April 19, 2007 – p. 36 Conclusions

Practically all use of databases occurs from within application programs - must test these interactions!
Incorporate the state and structure of the database during all testing phases.
Fault model, database-aware representations, test adequacy, test coverage monitoring, regression testing.
Experimental results suggest that the techniques are efficient and effective for the chosen case study applications.
A new perspective on software testing: it is important to test a program's interaction with the execution environment.

Future Work

Avoiding database server restarts during test suite execution
Time-aware regression testing
Input simplification and fault localization during debugging
Reverse engineering and mutation testing
New environmental factors: eXtensible markup language (XML) databases, distributed hash tables, tuple spaces
Further empirical studies with additional database-centric applications and traditional programs

Further Resources

Gregory M. Kapfhammer and Mary Lou Soffa. A Family of Test Adequacy Criteria for Database-Driven Applications. In Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2003. (Distinguished Paper Award)

Gregory M. Kapfhammer. The Computer Science Handbook, chapter 105: Software Testing. CRC Press, Boca Raton, FL, Second Edition, June 2004.

Gregory M. Kapfhammer, Mary Lou Soffa, and Daniel Mossé. Testing in Resource-Constrained Execution Environments. In Proceedings of the 20th ACM/IEEE International Conference on Automated Software Engineering, Long Beach, California, November 2005.

Kristen R. Walcott, Mary Lou Soffa, Gregory M. Kapfhammer, and Robert S. Roos. Time-Aware Test Suite Prioritization. In Proceedings of the ACM SIGSOFT/SIGPLAN International Symposium on Software Testing and Analysis, Portland, Maine, June 2006.

Gregory M. Kapfhammer and Mary Lou Soffa. A Test Adequacy Framework with Database Interaction Awareness. ACM Transactions on Software Engineering and Methodology. Invited Paper Under Review.
