<<

Empirical Evaluation of the Effectiveness and Reliability of Testing Adequacy Criteria and Reference Test Systems

Mark Jason Hadley

PhD

University of York
Department of Computer Science
September 2013


Abstract

This PhD thesis reports the results of experiments conducted to investigate the effectiveness and reliability of ‘adequacy criteria’ - criteria used by testers to determine when to stop testing. The research reported here is concerned with the empirical determination of the effectiveness and reliability of both test sets that satisfy major general structural criteria and test sets crafted by experts for testing specific applications. We use automated test data generation and subset extraction techniques to generate multiple test sets satisfying widely used coverage criteria (statement, branch and MC/DC coverage). The results show that confidence in the reliability of such criteria is misplaced. We also consider the fault-finding capabilities of three test suites created by the international community to provide assurance for implementations of the Data Encryption Standard (a block cipher). We do this by means of mutation analysis. The results show that not all sets are mutation adequate, but the test suites are generally highly effective. The block cipher implementations are also seen to be highly ‘testable’ (i.e. they do not mask faults).


Contents

Abstract ...... 3
Table of Tables ...... 7
Table of Figures ...... 8
Acknowledgements ...... 9
Author's declaration ...... 10
1. Introduction ...... 11
1.1 Background and Motivation ...... 11
1.2 Research Objectives ...... 12
1.3 The Aims and Goals of the Thesis ...... 13
1.4 Overview and Structure of the Thesis ...... 14
2. Literature Survey ...... 16
2.1 Fundamental Concepts ...... 17
2.1.1 Testing – the English Usage ...... 17
2.1.2 Software Testing – a Technical Usage ...... 18
2.1.3 Faults and Failures ...... 18
2.1.4 Faults, Failures and Testing ...... 20
2.1.5 Terminology in the Literature ...... 20
2.1.6 What is Tested and Where? ...... 21
2.1.7 Phases of Dynamic Testing: Unit, Integration, System and Acceptance Testing ...... 25
2.2 Characteristics of Testing ...... 29
2.2.1 Introduction ...... 29
2.2.2 Static v Dynamic ...... 29
2.2.3 Functional v Structural ...... 30
2.2.4 Formal v Informal ...... 33
2.2.5 Manual v Automated ...... 35
2.2.6 Testing v Debugging ...... 37
2.3 Software Standards, Process Models and Metrics ...... 40
2.3.1 Introduction ...... 40
2.3.2 Software Standards ...... 40
2.3.4 Software Testing Metrics ...... 45
2.3.5 Software Projects Continue to Fail ...... 49
2.4 Model Testing, Reliability Growth Models and Fault Adequacy Techniques ...... 51


2.4.1 Introduction ...... 51
2.4.2 Model Based Testing ...... 51
2.4.3 Reliability Growth Models ...... 55
2.4.4 Fault Based Adequacy Techniques ...... 58
2.4.5 Propagation, Infection and Execution (PIE) ...... 65
2.5 What do Empirical and Case Studies tell us about Software Testing? ...... 66
2.6 When Can We Stop Testing? ...... 77
2.6.1 Introduction ...... 77
2.6.2 Can We Stop Testing Now? ...... 77
2.7 Test Set Sub-setting via Heuristic Search ...... 81
2.8 Summary ...... 85
3. The Research Test Framework and its Components ...... 86
3.1 Framework ...... 87
3.1.1 Overview of the Framework ...... 87
3.1.2 Code-Coverage ...... 91
3.1.3 Test Set Generation ...... 95
3.1.4 JUnit and the Framework ...... 96
3.1.5 Test Subset Extraction ...... 98
3.1.5.1 Genetic Algorithm ...... 98
3.1.6 Mutation ...... 103
3.1.7 Mutation Score Generation ...... 106
3.2 Testing Framework Refinement ...... 108
3.2.1 Overview of the Framework Refinement ...... 108
3.2.2 Test Generation and Execution ...... 109
3.2.3 Mutant Testing and Mutant Score Generation ...... 111
3.3 Framework Summary ...... 112
4. Testing Code Coverage Criteria ...... 114
4.1 Numerical Programs Under Test and Their Program Properties ...... 116
4.2 Mutations Injected and Results ...... 118
4.2.1 Full Test Set Results ...... 118
4.2.2 Rationale for Different MS in Numerical Recipes ...... 121
4.2.3 Test Subset Results ...... 126
4.2.4 Results from Miscellaneous and Sorting Algorithms ...... 131
4.3 Program Properties ...... 137


4.4 Conclusions ...... 137
5. Testing Encryption Algorithms ...... 140
5.1 Programs under Test, Program Properties and Test Vectors ...... 141
5.2 Mutation Results ...... 145
5.4 Conclusions ...... 151
6. Conclusions and Summary ...... 153
6.1 Conclusions ...... 153
6.2 Contribution of the Thesis ...... 155
6.3 Discussion ...... 156
6.4 Threats to Validity ...... 157
6.5 Possible Future Research ...... 158
7. Appendices ...... 160
Appendix A – ECJ Parameter File ...... 160
Appendix B – Numerical Statistical Result Tables ...... 161
Appendix C – Source Monitor Metrics ...... 163
Appendix D – DES Results Tables ...... 164
8. References ...... 168


Table of Tables

Table 1: Techniques by Phase ...... 25
Table 2: Software Integration Testing Patterns ...... 28
Table 3: Differences between Testing and Debugging ...... 39
Table 4: Testing Hours Required to gain a Failure Rate using 10X ...... 56
Table 5: Hrs Determined by Using [Butler & Finelli 91] Formula ...... 56
Table 6: Hrs Determined by Using [Littlewood & Strigini 92] Failure Growth Model ...... 57
Table 7: Mothra Mutation Operators ...... 62
Table 8: Empirical Studies Result Summary ...... 76
Table 9: The Framework Components ...... 89
Table 10: Coverage Log File Description ...... 94
Table 11: GA Components Description ...... 101
Table 12: Method Mutant Operator Types ...... 104
Table 13: Mutation Framework Components ...... 108
Table 14: PUT Numerical Recipes ...... 116
Table 15: Key Metrics for the Programs Under Test ...... 117
Table 16: Simple and Complex Expressions in the Numerical Programs Under Test ...... 118
Table 17: Full Test Set Mutants Injected, Mutants Found and MS ...... 119
Table 18: Number of Alive (Dead) Mutants by Mutation, Mutation Operator and Subtype ...... 120
Table 19: Number of Mutants Injected, Alive, Killed and Mutation Score by Mutation Operator ...... 120
Table 20: AOIU, LOI, AOIS for Random_01, Random_02 and Random_03 ...... 121
Table 21: Random_01 and Random_02 Comparison ...... 122
Table 22: Rationale for Random_01 Low Mutation Score ...... 124
Table 23: Rationale for the non-detection of the ROR Mutants in Random_03 ...... 126
Table 24: Number of Optimum Test Sets Generated and Number of Tests in each Test Set ...... 127
Table 25: Summary of Subset Results Based Upon Average and Highest MS ...... 128
Table 26: MS Differences Between the Full Test Set, Highest Average MS, Highest and Lowest Subset ...... 130
Table 27: Non-numerical Program Properties ...... 131
Table 28: Number of Mutants Alive, (Killed) by Mutation and Mutation Operator and Subtype ...... 133
Table 29: DES and Big Integer Program Properties ...... 144
Table 30: Number of Tests, Mutants Generated, Killed and Raw MS ...... 145
Table 31: Mutant Operators Sub-Types Applied to the DES PUT ...... 145
Table 32: Mutant Operators Sub-Types Applied to the Big Integer PUT ...... 145
Table 33: MS for PUT ...... 145
Table 34: Real MS taking into Account Mutation Equivalence ...... 146
Table 35: for the DES Test Vectors ...... 147
Table 36: AOIS Summary for DES Program ...... 150
Table 37: AORB Equivalent Mutants ...... 150
Table 38: Mutation Equivalence ...... 157


Table of Figures

Figure 1: V-Model Life Cycle Verification Approach ...... 23
Figure 2: A Debugging Process Model ...... 38
Figure 3: Fault Seeding ...... 58
Figure 4: ...... 60
Figure 5: Mutation Score ...... 61
Figure 6: Generic Demand and Cost Curve ...... 79
Figure 7: Testing Techniques Criteria Adequacy Taxonomies ...... 81
Figure 8: SA Structure ...... 85
Figure 9: Conceptual View of the Framework ...... 87
Figure 10: Framework Interactions ...... 90
Figure 11: Cobertura XML Report ...... 92
Figure 12: Test Case Structure ...... 97
Figure 13: Bit Representation of a Chromosome ...... 99
Figure 14: GA Structure ...... 100
Figure 15: Chromosome Makeup ...... 100
Figure 16: GA Subset Extraction ...... 102
Figure 17: Mutation Score Generator User Screen ...... 106
Figure 18: Conceptual View of the Mutation Table ...... 107
Figure 19: Components that Make up the Framework and their Outputs ...... 109
Figure 20: Test Bat File Elements ...... 110
Figure 21: Test Generation and Execution for the Refined Mutation Framework ...... 110
Figure 22: Test Bat File Elements using start command ...... 111
Figure 23: Mutation MS Generation ...... 112
Figure 24: Random_03 Relational Mutants ...... 125
Figure 25: Shell Sort Java ...... 134
Figure 26: AORB and AOIS Defects Not Detected in the Shell Sort Program ...... 136
Figure 27: DES Implementation ...... 142
Figure 28: Substitution Source Code ...... 151


Acknowledgements

I would firstly like to thank Professor John Clark for his support over the last six years. I would also like to thank Tim White for his review and discussions on this thesis. I would like to thank Karen Jones for her support.

I would also like to thank the Defence Science and Technology Laboratory (Dstl), which sponsored my PhD. The views in this thesis are those of the author and do not necessarily reflect the views of Dstl.


Author's declaration

I declare that this work is original research and was written solely by the author.

Chapter 4 of this thesis contains material that was published in [Hadley & Clark 13].


1. Introduction

1.1 Background and Motivation

In software testing we often apply an 'adequacy criterion' to determine when to stop testing. The adequacy criteria used differ depending on the type of software system and the criticality of the software. Some authors, such as [Littlewood 93], [Littlewood 2011] and [Butler & Finelli 96], have urged the use of a statistically quantifiable measure to determine when to stop testing. However, for high levels of reliability (e.g. 10⁻⁶) the number of tests required is prohibitive.
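
To give a sense of the scale involved, a standard illustrative calculation (not taken from the thesis) assumes a constant failure rate λ and failure-free statistical testing:

    P(\text{no failure in } t \text{ hours}) = e^{-\lambda t}

    e^{-10^{-6}\,t} \le \alpha \;\Rightarrow\; t \ge \frac{\ln(1/\alpha)}{10^{-6}},
    \qquad\text{so for } \alpha = 0.01,\; t \ge \ln(100)\times 10^{6} \approx 4.6\times 10^{6}\ \text{hours.}

That is over 500 years of failure-free operation for a single copy, which is why such statistical arguments are rarely practicable.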

In the civilian aerospace domain, and increasingly on Ministry of Defence (MoD) and Department of Defense (DoD) programmes, the standard DO-178B [178B]1 is applied. In this standard, coverage criteria are used to determine when to stop testing. At the software unit level, once the required coverage criteria (i.e. Statement, Branch and MC/DC) have been achieved, testing is deemed to have been completed and is stopped. However, if the satisfaction of such objectives is to be taken as an indication of the thoroughness of testing, we may legitimately ask how effective and reliable these coverage criteria are as adequacy criteria.
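
As a simple illustration of how the criteria differ (an invented Java fragment, not an example from DO-178B or from the programs studied in this thesis), consider a decision with three conditions:

    // Illustrative only: a decision with three conditions, a && (b || c).
    static int guard(boolean a, boolean b, boolean c) {
        int result = 0;
        if (a && (b || c)) {
            result = 1;
        }
        return result;
    }

    // Statement coverage: one test, e.g. (true, true, false), executes every statement.
    // Branch coverage:    two tests, e.g. (true, true, false) and (false, false, false),
    //                     exercise the decision both true and false.
    // MC/DC:              each condition must be shown to independently affect the
    //                     decision outcome; a minimal set here has n + 1 = 4 tests,
    //                     e.g. (T,T,F), (F,T,F), (T,F,F), (T,F,T).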

In some domains, a standard means of gaining confidence in the operation of a system is to run it on a standard reference test set. Cryptography is an obvious example. It is usual practice for designers or international standardisation efforts to supply reference test sets against which developers can evaluate their implementations.

Passing all the tests is a strong form of evidence regarding the correctness of the implementation. However, we know of no independent assessment of these test sets. Thus, it is reasonable to ask: is passing all the tests in these sets a good stopping criterion?

Effectiveness is the ability of test sets to find faults, while reliability is a measure of the consistency across the test sets used to achieve the testing objective. For example, how reliable are two different test sets meeting the same coverage criteria? Do they detect different failures? Two test sets of the same size may each achieve 100% MC/DC coverage but have different fault-finding capabilities. This thesis evaluates, by empirical experiments, two different types of adequacy criteria: test sets that meet general-purpose structural code coverage criteria and application-specific test sets. We evaluate three widely used structural code coverage criteria: Statement, Branch and MC/DC. For the application-specific test sets, we use three internationally used test sets for the Data Encryption Standard (DES) algorithm.

1 It is noted that a new version [178C] of this document has now been published; however, the same coverage objectives are listed in both [178B] and [178C].

We have developed an automated testing framework that enables large-scale testing and determination of the effectiveness (fault-finding ability) of test sets. To evaluate the reliability of criteria we generate a very large test set that satisfies the criteria of interest with very significant redundancy. We then extract optimal (minimal size) test sets that satisfy the criteria and compare their effectiveness.
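
The extraction step itself uses a genetic algorithm, described in Chapter 3. Purely to illustrate the idea of pulling a small coverage-equivalent subset out of a redundant pool, the following sketch uses a simpler greedy heuristic (all names are hypothetical, not part of the framework):

    import java.util.*;

    // Illustrative greedy subset extraction: repeatedly pick the test that covers
    // the most still-uncovered items (statements, branches or MC/DC obligations).
    // The thesis uses a GA for this step; this greedy sketch only shows the idea.
    final class SubsetExtractor {
        static List<String> extract(Map<String, Set<Integer>> coverage) {
            Set<Integer> uncovered = new HashSet<>();
            coverage.values().forEach(uncovered::addAll);   // everything the pool covers
            List<String> subset = new ArrayList<>();
            while (!uncovered.isEmpty()) {
                String best = null;
                int bestGain = 0;
                for (Map.Entry<String, Set<Integer>> e : coverage.entrySet()) {
                    Set<Integer> gain = new HashSet<>(e.getValue());
                    gain.retainAll(uncovered);
                    if (gain.size() > bestGain) { bestGain = gain.size(); best = e.getKey(); }
                }
                subset.add(best);
                uncovered.removeAll(coverage.get(best));
            }
            return subset;                                   // coverage-equivalent, much smaller
        }
    }

Repeating the extraction many times, with ties broken differently or with a stochastic search such as the GA actually used, yields multiple minimal sets satisfying the same criterion whose fault-finding abilities can then be compared.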

1.2 Research Objectives

The main objectives of this thesis are the following:

1. To measure the test effectiveness of the three coverage criteria (Statement, Branch and MC/DC) mandated by a widely used commercial airborne software standard for safety-critical software, DO-178B [178B], and its recent updated version DO-178C [178C].

2. To measure the reliability of those three coverage criteria by comparing the effectiveness of multiple minimal-size test sets meeting these criteria.

3. To measure the reliability of the three widely used coverage criteria used in commercial airborne safety-critical software, e.g. [178B] and [178C], with test sets having a small degree of redundancy. To add redundancy we plan to combine the different optimum coverage test sets.

4. To measure the test effectiveness of three reference test sets developed to test a DES algorithm.

In all cases we believe the research is novel. While empirical studies in the past have examined the effectiveness of testing techniques, these are typically based upon a small number of injected or known faults in small programs, with all testing being manually driven. They have also generally focused on Statement coverage or Branch coverage. None of these empirical studies has used automated test data generation to generate the test cases. A small number of papers have examined test subset extraction; however, none has used repeated automated subset extraction to determine the reliability of criteria. Automated approaches based on heuristic optimisation algorithms have generally been used to reduce the number of test cases required in the test set without impacting test effectiveness. In terms of evaluating known test reference systems, we are not aware of any formal assessment of the ability of available test sets to find flaws in the implementation. This thesis sets out to provide one.

1.3 The Aims and Goals of the Thesis

Since enumerating tests to cover the whole input space is generally infeasible, we are faced with choosing some subset of possible tests. Some subsets will clearly find more faults than others. We are faced with the task of recognising such stronger subsets, or those which have a greater chance of finding faults. Since testing is expensive, we would wish to generate small efficient subsets and stop at that point.

The goal of this PhD thesis is to examine the effectiveness and reliability of two different types of adequacy criteria. To enable us to achieve this goal we present a testing framework and conduct experiments on a range of programs with different program properties. The framework is developed incrementally and supports the following key functionality:

• Automated Testing - This is so that we can generate large test sets efficiently and reduce/remove bias from our results.

• Test Coverage - We need to capture test coverage for each test case and for each coverage objective, i.e. Statement, Branch and MC/DC.

• Mutation Injection – Mutation testing serves as our means of determining fault-finding ability (effectiveness). We need to be able to inject a variety of different faults into our programs under test and measure for each test which mutants are killed by it. (A small illustrative mutant example follows this list.)

• Subset Extraction – To evaluate the reliability of criteria we need to be able to extract multiple coverage-compliant test sets from much larger redundantly compliant test sets and compare their effectiveness.

• Program Properties Determination - A way of assessing different program properties of the programs under test so that these may be correlated with aspects of our testing. Example properties might be numbers of branches or other complexity measures.
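
To make the mutation step concrete, the following hand-written illustration (not one of the mutants generated in the experiments) shows the style of change made by two of the method-level operators referred to later, ROR (relational operator replacement) and AOIS (arithmetic operator insertion, short-cut):

    // Original code (illustrative only; not one of the programs under test).
    static int clampPositive(int x) {
        if (x > 0) return x;
        return 0;
    }

    // ROR mutant: '>' replaced by '>='  ->  if (x >= 0) return x;
    // The two versions differ only at x == 0, where both return 0, so this
    // particular mutant is equivalent and no test can kill it.

    // AOIS mutant: short-cut operator inserted  ->  if (x > 0) return x++;
    // The post-increment takes effect after the value is read, so the returned
    // value is unchanged: again an equivalent mutant. In contrast,
    //     if (x > 0) return ++x;
    // is killed by any test with x > 0, since it returns x + 1 instead of x.

A test kills a mutant when its actual output differs from that of the original program; the proportion of non-equivalent mutants killed gives the mutation score used as the effectiveness measure.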

The development of the framework enables us to use freeware and bespoke components to undertake our experiments and to analyse the data generated from the execution of thousands of tests. While there are freeware components that provide some of the functionality of the framework, none provided a total solution consistent with the aims of this thesis. For example, JUnit is a testing framework but does not provide code coverage metrics or any form of test subset extraction capability. Running thousands of tests over thousands of mutants generated thousands of test log files. While analysis tools exist, it was relatively straightforward to build bespoke C# components to extract the required information, in the correct format, very quickly.
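
For orientation, the shape of a JUnit test case is sketched below (a generic JUnit 4 example; the class name and the use of the JDK sort as a stand-in for the program under test are illustrative, not taken from the framework):

    import static org.junit.Assert.assertArrayEquals;
    import org.junit.Test;

    // Generic JUnit 4 structure: supply inputs, then compare the actual output
    // against the expected (oracle) value.
    public class ShellSortTest {
        @Test
        public void sortsASmallArray() {
            int[] input    = { 5, 1, 4, 2, 3 };
            int[] expected = { 1, 2, 3, 4, 5 };
            java.util.Arrays.sort(input);   // stands in for the sort program under test
            assertArrayEquals(expected, input);
        }
    }

JUnit reports each test's pass/fail verdict but, as noted above, says nothing about which statements, branches or conditions the test exercised; that information comes from the coverage instrumentation in the framework.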

The development of a testing framework allows empirical evaluation of the effectiveness and reliability of the adequacy criteria. There is a good deal of research into adequacy criteria and a large amount into automated testing. This thesis takes the view that the latter is a methodical enabler of the former and forms a distinctive feature of our approach.

1.4 Overview and Structure of the Thesis

This thesis documents the framework development activities, the experiments performed and their results. The chapters of this thesis are as follows:

• Literature Survey: Reviews the current state of practice in software testing. It compares pairs of contrasting testing characteristics, e.g. static with dynamic testing. Also detailed are the findings from a number of empirical studies. These studies evaluate testing techniques that are ‘state of practice’ rather than ‘state of the art’. Their findings indicate no consensus on the effectiveness of any one technique, but do show a general consensus that software testing techniques are complementary and that combining techniques increases fault-finding ability. (Chapter 2)

• The Research Test Framework and its Components: Presents the framework developed to undertake empirical experiments to examine the effectiveness and reliability of different adequacy criteria. The framework covers test set generation, code coverage, mutant generation, test subset extraction, test execution and mutation score calculation. (Chapter 3)


• Testing Code Coverage Criteria: Documents the experiments performed primarily on numerical recipes taken from [Press 92]. In addition to these numerical algorithms, a number of sorting algorithms (bubble, heap, insertion, merge, quick and shell sort) and other miscellaneous algorithms were used. (Chapter 4)

• Testing Encryption Algorithms: Presents the findings from the mutation analysis of a Java implementation of the Data Encryption Standard (DES) and a Java Big Integer implementation. For the DES Java program we applied different international standard test vectors to assess their effectiveness and reliability. (Chapter 5)

• Conclusions: Discusses the findings and contributions of this research and identifies further research work. (Chapter 6)

• References: Contains the bibliographic information for the literature cited in this thesis. (Chapter 8)


2. Literature Survey

Software testing is the most widely used method for defect detection during software development and for gaining confidence in the developed product. The majority of the techniques detailed by [Myers 79] can still be seen today as representing the ‘state of good engineering practice’ in the software testing domain. However, how effective and how reliable these techniques are remain unresolved issues.

The only current scientifically based method of quantifying the reliability of software is through ‘software life testing/software reliability growth models’, i.e. experience of operating the software in its operational environment for a pre-determined length of time. However, time and cost reasons generally make this impractical. With cloud-based computing, multi-core processors and Graphics Processing Units (GPUs), such testing is becoming possible and may in the future become more widespread; currently, however, this is not the case. Therefore, to achieve software release, it is necessary to have established assurance evidence from ‘prior beliefs’: one such belief is that the development process has detected, and removed2, as many defects as reasonably practicable.

Software is pervasive and modern society is now dependent upon it. With such dependencies comes the need for assurance of software reliability, and a need to argue, in as rigorous a manner as possible, how good the development analysis and testing processes are.

The overall aim of my thesis is to undertake a practical examination to investigate the ‘effectiveness’ and ‘reliability’ of different testing techniques. This literature review supports that aim. It explores the software testing literature and details some fundamental concepts, techniques and approaches of software testing.

This literature review is split into 7 sections:

• Section 2.1 discusses some fundamental concepts and techniques. Also discussed in this section is where the techniques are applied in relation to the development life-cycle and the different phases of dynamic testing.

2 Defects are often categorised by their severity; if a defect’s severity is low or it has no impact, the defect may not be removed, due to time and cost pressures.


• Section 2.2 details the characteristics of different testing techniques and provides a comparison of them, e.g. static v dynamic. Section 2.2 finishes with a comparison of testing and debugging.

• Section 2.3 details software standards, process models and software measurements. Software standards, process models and metrics play an important role in the managerial aspects of software development and are necessary to ensure a level of quality is built into the developed product. However, as we will see in section 2.3, projects continue to fail, even when adopting so-called ‘best practice’.

• Section 2.4 discusses model-based testing, reliability growth models and fault-based adequacy techniques, e.g. fault seeding and mutation testing. It discusses the three common forms of mutation testing: weak, firm and strong. This section closes with a discussion of Propagation, Infection and Execution (PIE), used to measure software testability.

• Section 2.5 discusses the empirical studies and case studies. It shows that the empirical studies indicate no consensus of opinion on the effectiveness of the techniques applied. The case studies also show that testing consumes up to 50% of the development cost and that application of formal techniques (e.g. ) remains rare, despite several success stories.

• Section 2.6 examines the question ‘when do we stop testing?’

• Section 2.7 discusses different types of test sub-setting via heuristic search that could be used for test subset extraction.

2.1. Fundamental Concepts

2.1.1 Testing – the English Usage

An inspection of English dictionaries reveals definitions of the word ‘test’, of which [Collins 01] is typical:


1. to try (something) out to ascertain its worth, safety or endurance.
2. to carry out an examination on (a substance, material or system) in order to discover whether a particular substance, component or feature is present.
3. to make heavy demands on: his behaviour really tests my patience.
4. to achieve a result in a test which indicates the presence or absence of something.
5. a method, practice or examination to test a person or thing.

It is interesting to note that this definition is positive in the sense that ‘testing’, or a ‘test’, has an overall ‘flavour’ of a method to establish some property so as to ascertain something’s worth.

2.1.2 Software Testing – a Technical Usage

In contrast, the definition of ‘software testing’ (hereafter ‘testing’) appears somewhat negative. It is now widely accepted that testing for the majority of software systems can show only the presence of faults in software: it cannot prove the absence of faults. (Dijkstra’s famous remark to this effect was reportedly first made at a NATO Science Committee conference in Rome, Italy, 27-31 October 1969.)

As a result, the major aim of testing is to find faults, and ‘testing’ can be defined as the examination of software in order to detect faults.

Assurance, then, can be inferred from the ability of the testing techniques to find faults. Quite simply, our assurance claim is that, if our testing is effective and reliable, then it will have found the faults that it is reasonable and practical to find.

2.1.3 Faults and Failures

There is only weak consensus in the literature on the necessary terminology and there are numerous terms for a software anomaly. For example, [Beizer 90] uses the word ‘bug’, but does not define it. ‘Bug’, ‘defect’ and ‘fault’ appear to be used interchangeably in the literature. This convention will be adopted herein.

However, there seems to be consensus that a ‘failure’ is a deviation of a system’s behaviour from that expected. [Sommerville 95] and [Testing Glossary 04] define ‘failure’ as a ‘deviation of the component or system from its expected delivery, service or result’. A key issue is that this definition refers explicitly to a system (or component) and implicitly to its observable operation.


The [IEEE Glossary 1990] defines ‘failure’ as: ‘The inability of a system or component to perform its required functions within specified performance requirements.’ Note: The fault tolerance discipline distinguishes between a human action (a mistake), its manifestation (a hardware or software fault), the result of the fault (a failure), and the amount by which the result is incorrect (the error).

This definition, too, explicitly refers to a system (or component) and, implicitly, to its observable performance. The definition also clearly distinguishes between a mistake (a human action), a fault (the result of a mistake manifest in an artefact), a failure (observable misbehaviour of the artefact) and an error (a measure of deviant behaviour). This Literature Review will use the word ‘failure’ as defined by the IEEE.

Finally, we note the definition implies a direct relationship (one of cause and effect) between a fault and a failure. However, this causal relationship may not be one-to-one. [Frankl et al 98] proposes the concept of ‘failure regions’, rather than individual faults. A failure region could contain several faults resulting in one observable failure. [Frankl et al 98] recommended changing use of ‘testing has exposed a fault’ to ‘testing has exposed a failure’.

The discussion of software fault and failure definitions generally focuses on functional issues and side-steps non-functional ones. Non-functional properties, e.g. timing, could lead to a race condition in unprotected code. The software component may therefore be functionally correct, but non-functionally incorrect. Such a fault is more than likely to be architectural, and cannot be mapped to specific lines of code or even regions of code. Also, the manifestation of this non-functional failure may only be observable under specific processor loading conditions, and these types of non-functional failures are unlikely to be discovered by functional based testing.

The definitions of software fault and failure lead to a distinction between different forms of software testing. Software reliability-based testing finds failures based upon an operational profile. In contrast, lower levels of testing, e.g. unit testing, tend to find defects. A defect does not necessarily result in a system failure (it may be ‘masked’). As a consequence, lower levels of testing could discover a number of trivial faults, which give rise to no observable software failure. Limited resources (i.e. time and money) applied to testing may be consumed by finding trivial faults. Also by finding a number of trivial faults during testing, we may form a misleadingly positive assessment of the effectiveness of the testing undertaken.


2.1.4 Faults, Failures and Testing

A fault does not necessarily result in a failure: for example, a variable mistakenly set out-of-range may be re-assigned correctly before the incorrect value manifests itself. However, it must remain a fundamental assumption that a fault has the potential to cause a failure and, therefore, must be detected and corrected.
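
A small invented fragment illustrating the point (not taken from the programs examined later):

    // Illustrative only: the first assignment is a fault (an out-of-range value),
    // but it is overwritten before the value is ever used, so no failure is observed.
    static double averagePercentage(int[] scores) {
        double percent = 150.0;              // fault: percentages cannot exceed 100
        double sum = 0.0;
        for (int s : scores) sum += s;
        percent = sum / scores.length;       // fault masked: re-assigned before use
        return percent;
    }
    // If later maintenance added an early return between the two assignments,
    // the faulty value could propagate and a failure would become observable.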

This distinction leads to a categorisation of the testing methods used to detect faults:

• those methods that find faults manifested as failures (that is, observable in dynamic operation);

• those methods that find faults by other means (that is, statically).

This Literature Review will use the terms ‘dynamic testing’ and ‘static testing’ respectively for the above two categories, and will use the term ‘software testing’ (or simply ‘testing’) to refer to both or either as the English sense requires.

2.1.5 Terminology in the Literature

We note that, in the literature:

• Dynamic testing is sometimes called software testing.
• Static testing is sometimes called software evaluation or static analysis.

For example, [Sommerville 95] defines testing as:

‘exercising the program using data like the real data processed by the program. The existence of program defects or inadequacies is inferred from the unexpected system outputs.’

Thus, Sommerville favours a strictly dynamic interpretation of testing.

On the other hand, [Storey 96] favours a wider, inclusive, view. Storey defines testing as:

‘the process used to verify or validate a system or its components.’


The software testing literature is divided between these two views. Some authors, e.g. [Harrold 00], [Demillo et al 87], [Binder 00], [Kaner 93], [Beizer 90] and [Myers 79], see testing as being strictly dynamic in the sense of Sommerville. Others, like [Adrion et al 82], [Smith and Wood 87], [Gardiner et al 99] and [Hetzel 88], see testing as the wider, inclusive activity. Returning to [Storey 96], we see: ‘static testing {is} investigating the characteristics of a system or component without operating it’. [Hetzel 88] defines testing as: ‘Testing is any activity aimed at evaluating an attribute or capability of a program or system and determining that it meets its required results.’

[DeMillo et al 87] clearly makes a distinction between software testing (‘exercising the software on representative test data under laboratory conditions to see if it meets the stated requirements’) and software evaluation (‘examines the software and the processes used during the development to see if the stated requirements and goals have been met’).

Unfortunately, some authors are not consistent in their use of the word ‘testing’. For example, [Myers 79] defines testing as: ‘the process of executing a program or system with the intent of finding errors’. Myers then proceeds to define and discuss human testing, or non-computer-based testing, including program inspections, walkthroughs and reviews.

For clarity, and where there might be confusion, this Literature Review will use the phrase ‘dynamic testing’ where the original author may have used the phrase ‘software testing’ in the exclusive sense; further, the phrase ‘static testing’ will be used instead of, and synonymously with, original authors’ use of ‘software evaluation’.

2.1.6 What is Tested and Where?

The ‘where’ and ‘what’ of testing depend on the type of software lifecycle applied and the level of rigour required for the development of the software system. Different software systems, such as safety-critical, safety-related, mission-critical, financial, office and gaming systems, will have different levels of testing rigour applied, depending on their criticality and their commercial sensitivity.

Lifecycles indicate the sequence of activities to be performed and provide a framework within which software assurance activities can be applied and constructed in a disciplined manner. [Boehm 88] defines four types of software development model: Code and Fix, Waterfall, the V-Model and


the Spiral Model. Code and Fix includes only two phases, i.e. code and fix; requirements, design, test and maintenance were undertaken after the code and fix phases. More structured process lifecycles were then developed containing sequential phases, i.e. requirements, leading to design, followed by code and test. The ‘Waterfall’ and ‘V’ models are examples of these sequential lifecycles and are the most commonly applied to software projects today.

With the V-Model, the left side of the ‘V’ indicates the development activities: requirements definition, architecture design and detailed design. The right side of the V-Model shows the dynamic testing activities. Figure 1 illustrates this type of V-Model, with through-life verification. The solid lines indicate products, i.e. from the user requirements phase a User Requirements Document (URD) is developed. The Software Requirements (SR) are verified against the user requirements to confirm that they contain the user requirements, as illustrated by the dashed lines. The verification activities on the left-hand side are static and might include reviews, inspections, walkthroughs, formal inspection, architecture modelling, etc.

Table 1 indicates the testing activities that could be performed at each life cycle phase. Dynamic test design, which includes test planning and test data generation, is undertaken while the left-side phases are in progress, but the actual dynamic testing activities occur on the right side. Good test design is one of the best ‘bug’ prevention techniques known, according to [Beizer 90], and is the first goal of testing.

Testing and bug prevention are the two primary concepts embodied in the two types of testing model defined by [Gelperin & Hetzel 88]. Figure 1 illustrates what [Gelperin & Hetzel 88] call lifecycle models, which include analysis and review activities as part of wider Validation, Verification and Testing (VVT) activities. The second type of model is one where test execution is the primary testing activity, which goes back to the code-and-fix lifecycle philosophy.


Figure 1: V-Model Life Cycle Verification Approach


Techniques/Phase

Requirements Architecture Design Detailed Design Implementation/Unit Software Integration System Validation Static Walkthroughs ■ ■ ■ ■ ■ Reviews ■ ■ ■ ■ ■ Checklist ■ ■ ■ ■ ■ Fagan Inspection ■ ■ ■ ■ ■ Active Reviews ■ ■ ■ ■ ■ Formal Proofs ■ ■ ■ ■ ■ ■ Data Flow ■ ■ ■ Symbolic Execution ■ Peer Reviews ■ ■ ■ ■ ■ Desk Checking ■ ■ ■ ■ ■ SAAM ■ Stepwise Refinement ■ ■ ■ Usage Base Reading Audits ■ ■ ■ ■ ■ ■ Dynamic Requirement ■ ■ ■ Error Handling ■ ■ ■ Intersystem/Interface ■ ■ Boundary Value Analysis ■ ■ ■ Cause and Effect Testing ■ ■ Probabilistic/Reliability testing ■ Error Guessing/Random Testing ■ ■ ■ Mutation Testing i.e. Strong, Weak ■ ■ ■ Trace, Interface, firm Resource Testing i.e. CPU time and ■ memory testing etc Volume Testing ■ Testing ■ Execution/Performance Testing ■ ■ ■ Recovery Testing ■ Operations/Environmental Testing ■ Testing ■ Equivalence Partitioning ■ ■ ■ Syntax Testing ■


Special Value Testing ■ ■ ■ Domain Based Testing (include both ■ ■ ■ input and output domains) State Transition ■ ■ ■ Decision ■ ■ ■ Structural Testing: Statement, Branch, ■ Conditional, Expression, Path. Algebraic ■ Axiomatic ■ Perturbation Testing ■ Design Based Functional Testing ■ Complexity Based Testing ■ ■ Loop Testing ■ Usage Based Testing ■ Control Flow Testing ■ ■ Data flow Testing ■ ■ Reliability Testing ■ Modelling Formal Methods ■ ■ ■ /Animation ■ ■ ■ Performance Modelling ■ State Transition Modelling ■ Petri Nets ■ Data Flow Modelling ■ Structured Diagrams ■ Environmental Modelling ■ ■

Table 1: Techniques by Phase [In part taken from Storey 88]

2.1.7 Phases of Dynamic Testing: Unit, Integration, System and Acceptance Testing

Figure 1 indicates that there are a number of testing phases, i.e. unit testing, integration testing, etc., on the right side of the V-Model. These tend to use dynamic testing approaches. The corresponding software artefacts on the left side of the V-Model, i.e. software requirements, architecture design and detailed design, tend to be tested statically.

While it is clear which software artefacts correspond with acceptance testing and system testing, i.e. the User Requirements Document (URD) and Software Requirements Document (SRD), the same cannot be said of a software ‘unit’. No consistent definition of ‘software unit’ is established in the software literature. [Def Stan 00-55 Issue 2] refers to unit testing but does not define what a unit is. [Zhu et al 97] focuses on unit test coverage and adequacy but similarly does not define what a unit is.

[IEEE Glossary 92] treats unit, module and component as interchangeable and defines them as:
1. A separately testable element specified in the design of a computer software component.
2. A logically separable part of a computer program.
3. A software component that is not subdivided into other components.

However, as indicated in [IEEE Glossary 92], as of that date no formalisation of unit, module or component had been established. The MIL-STD series of standards, i.e. [MIL STD 498], [J-Std-16 95] and [MIL STD 12207], define a unit as:

‘An element in the design of a software item, a component of that subdivision, a class, object, module, function, routine or …Software Units in the design may or may not have a one to one relationship with the code and data entities (routines, procedures, , data, files, etc) that implements them or with the computer files containing those entities.’

[Marick 95] discusses unit testing in terms of individual routines, but does not define what a routine is. [Marick 95] advocates against this form of testing on cost grounds, due to the need to generate software stubs and drivers to enable unit testing to be performed. [Marick 95] instead advocates sub-system testing, i.e. testing a collection of individual units.

[Massa et al 96] defines a module as ‘discrete and identifiable with respect to compiling, combining with other units...also a logically separable part of a program’. A unit is defined as ‘An element specified in the design of a computer software component. A unit is composed of one or more modules’. This would appear to suggest a module is an Ada function/procedure or a C3 function, while a unit could relate to an Ada package or a C include file. [Beizer 90] defines ‘A unit is the smallest testable piece of software’, in that it can be compiled or assembled, linked and loaded.

3 This could be a C, C++ or C# function or method.


[Sommerville 95] defines unit testing as: ‘Individual components are tested to ensure that they operate correctly. Each component is tested independently, without other system components.’ [Storey 96] defines module testing as the testing of small or simple functions. [Myers 79] defines module (unit) testing as the process of testing the individual subprograms or procedures in a program. [Binder 00] defines unit testing as ‘testing individual software units or groups of related units. A test unit may be a module, a few modules or a complete computer program.’4

[Beizer 90] and [Beizer 84] also suggest that other attributes of a unit are that it is created by one programmer and contains only a couple of hundred lines of code. The lack of a clear definition of a software unit is indicated by [Jorgensen and Erickson 94], who note that organisations which undertake unit testing have not defined what a software unit is.

Dependent units, i.e. functions, procedures, object classes or abstract data types, can be integrated together to form what [Beizer 90] calls components or what [Sommerville 96] calls modules. Software integration testing focuses on integrating software components. A number of different approaches to integration testing have been proposed by different authors and standards. [Beizer 84] focuses on four techniques: top-down, bottom-up, big bang and backbone. The first two are frequently described in the general software testing literature, which also indicates their applicability, advantages and disadvantages, e.g. [DeMillo et al 87], [Myers 79], [Dunn & Ullman 82] and [Binder 00].

Top-down and bottom-up integration use the structure of the program to integrate units. Top-down integration integrates units from the top of the program structure and progressively works down until all the units in the program have been integrated. Stubs are required to simulate the called (lower-level) units and pass test data back to the unit under test. Stubs can vary in their complexity, from returning single or multiple values to returning error conditions. Bottom-up testing involves testing terminal units in the structure of the program first, i.e. units that call no other units. Drivers are required to stimulate the unit under test. Units are then built up incrementally, working up the structure of the program from the terminal units. Integration continues until all the components have been integrated to make up the whole system.
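
A small generic illustration of the two supporting artefacts (invented Java types, not drawn from the literature cited above):

    // Unit being integrated top-down: it calls a lower-level unit that may not exist yet.
    interface PressureSensor { double read(); }

    class AlarmUnit {
        private final PressureSensor sensor;
        AlarmUnit(PressureSensor sensor) { this.sensor = sensor; }
        boolean alarmRaised() { return sensor.read() > 100.0; }
    }

    // STUB: stands in for the called, lower-level unit and feeds test data back.
    class StubSensor implements PressureSensor {
        private final double value;
        StubSensor(double value) { this.value = value; }
        public double read() { return value; }   // could also simulate error conditions
    }

    // DRIVER: stands in for the calling environment in bottom-up testing,
    // invoking the terminal unit directly with chosen inputs.
    class AlarmDriver {
        public static void main(String[] args) {
            AlarmUnit unit = new AlarmUnit(new StubSensor(120.0));
            System.out.println("alarm raised: " + unit.alarmRaised());   // expect: true
        }
    }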

4 [Binder 00] defines a unit test as: A test that exercises a relatively small executable. In OOP an object is the smallest executable unit, but test messages must be sent to a method…A test unit may be a class, several related classes (cluster) or a executable binary file.


[Binder 00] includes the four [Beizer 84] techniques and adds a further five to what [Binder 00] calls ‘software integration testing patterns’, as detailed in Table 2 below.

Pattern Name – Description
Big Bang – Try everything at the same time.
Bottom-Up – Integration by dependencies.
Top-Down – Integration by control hierarchy.
Backbone – Hybrid integration of subsystems.
Collaborations – Integration by cluster scenarios.
Layers – Integration by layered architecture.
Client/Server – Integration for client/server architecture.
Distributed Services – Integration for distributed architecture.
High Frequency – Build and test at frequent regular intervals.

Table 2: Software Integration Testing Patterns (Directly quoted from [Binder 00])

System testing generally focuses on validating that the system meets its functional and non-functional requirements and tests the whole interaction of the integrated components. [Beizer 84] indicates the following activities for system testing:

• System-level functional verification by the programming staff and/or QA.
• Formal acceptance test plan design, and execution thereof by the buyer or a designated surrogate.
• Load and performance testing
• Background testing
• Configuration testing
• Recovery testing

[Beizer 84] indicates that system testing includes acceptance testing, which focuses on ensuring that the system meets the initial needs of the end users in its operational use. Acceptance testing is sometimes referred to as alpha testing. It can involve enabling end users to use the system to undertake activities, or end users/procurers of the system providing the test data. Acceptance testing often occurs as part of system testing or as a separate phase.


[Myers 79] gives a slightly different definition of system testing and indicates that functional testing is the place to test the functional requirements of the system. The purpose of system testing, according to [Myers 79], is to show that the program does not meet its objectives. The objectives are not derived from external specifications but are generated from the user documentation and the program’s original objectives. [Myers 79] defines 15 categories of test case design which may be applicable to programs, though not all are applicable to every program. These include: facility, volume, stress, usability, security, performance, storage, configuration, compatibility/conversion, installability, reliability, recovery, serviceability, documentation and procedure testing.

2.2 Characteristics of Testing

2.2.1 Introduction

Different techniques test the software in different ways. This section highlights pairs of contrasting characteristics of test techniques:

 Static v Dynamic  Functional v Structural  Formal v Informal  Manual v Automated

The section closes with a comparison between testing and debugging and notes that while they are related their aims are very different.

2.2.2 Static v Dynamic

Testing techniques can be categorised as either static or dynamic. Static techniques do not execute the actual program, although some form of conceptual execution may take place, i.e. using modelling or constructs added to the program to analyse the program under review. Techniques like code inspection, review, formal proofs, and control- and data-flow analysis are all categorised as static. One of the advantages of these techniques over dynamic testing is that you do not have to wait until the source code artefacts have been developed and compiled to assess them. This enables errors in requirements, architecture and detailed design to be captured earlier in the development life cycle, which has major cost and time benefits. The other major advantage of static techniques over dynamic ones is that application of the static technique locates the fault rather than merely observing a failure. For example, code reading can locate where a variable has not been initialised before use, or where a buffer overrun could occur due to an array size not being set to the correct length.

Domain testing, requirements testing, mutation testing5, and structural testing are all forms of dynamic testing, since they require the program to be executed. This type of testing follows the traditional view of testing held by [Harrold 00], [Demillo et al 87], [Binder 00], [Kaner 93], [Beizer 90], [Sommerville 95] and [Myers 79]. Test cases are executed over the program and the results compared against the expected results. [Harrold 00] indicated three major advantages of dynamic testing:

• Testing activities are relatively easy to perform. Test case requirements can be derived from many different software artefacts, e.g. requirements, source code, module interfaces, etc. Test cases can be automated and instrumented so that information can be gained relating to the execution of the software under test to support the software validation.

• Testing can be performed in the expected environment. This can be in terms of testing the program on its target hardware and in its real-world environment. This in turn provides confidence that the software will operate as intended.

• Much of the testing can be automated and test cases can be reused as the software evolves.

2.2.3 Functional v Structural

In functional testing, test cases are generated without visibility of the internal structures, i.e. the program is treated as a ‘black box’. The outputs generated by the program are compared with the expected behaviour for those inputs as defined by the requirements/specifications. This comparison is based upon some form of oracle: the oracle indicates the expected behaviour for the inputs, and the oracle results are then compared with the actual results. Since functional testing focuses on the requirements/specification, it generally detects failures relating to incorrect implementation of requirements, i.e. domain-based defects.

5 Mutation testing is not really a testing technique, but measures the adequacy of a test set via a mutation score (MS). Mutation testing is discussed in detail in section 2.4.4.


In structural testing test cases are generated with visibility of the internal structures i.e. the software system is treated as a ‘white box’. Structural testing focuses on discovering errors that have occurred during program implementation. Therefore, structural testing is generally applied to software units or at software integration. Beyond this level the number of possible software paths makes software structural testing infeasible for the majority of programs. This contrasts with functional testing that can be applied to all the dynamic testing phases in the development lifecycle.

One of the advantages of structural testing is that it enables you to reduce your test data selection, i.e. to reduce your input space, by tailoring the test data to exercise the internal logic. Coupled to that, tests can be designed to exercise the internal logic in a more sophisticated way, to be more stringent and more cost effective, and to generate more complicated tests. With functional testing, however, the input space is much wider, and for the majority of programs it is infeasible to test every possible input exhaustively. The literature, e.g. [Myers 79] and [Beizer 96], therefore attempts to address the infeasibility of exhaustive testing. Approaches include reducing the number of test cases needed by dividing the input space into sub-sets. For example, equivalence partitioning divides the input space into invalid and valid domains. Test data is generated by selecting data from these two domains. The assumption is that any test data selected from the same domain should lead to the same results. Some of the testing literature, e.g. [Lestiennes & Gaudel 02], refers to this as the ‘uniformity hypothesis’. For example, all inputs in one domain might be expected to be handled the same way: if the code operates correctly for one such input, it might reasonably be expected to work correctly for all such inputs.

To reduce the test data selection further, boundary value analysis is commonly applied. This selects test data just under, on and just over the boundary of a domain: the assumption is that values at the boundaries are often mishandled6. [Beizer 96] indicated that domain-based errors in requirements account for 30% of all requirements errors. Equivalence partitioning and boundary value analysis are the most commonly applied functional testing techniques, as discussed in Section 3.5.
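
A small invented example of the two techniques applied together, for a routine whose valid domain is marks in the range 0 to 100:

    // Illustrative only.
    static boolean isValidMark(int mark) {
        return mark >= 0 && mark <= 100;
    }

    // Equivalence partitioning: one representative per domain,
    //   invalid-low (e.g. -17), valid (e.g. 42), invalid-high (e.g. 250).
    static final int[] PARTITION_REPRESENTATIVES = { -17, 42, 250 };

    // Boundary value analysis: just under, on and just over each boundary.
    static final int[] BOUNDARY_VALUES = { -1, 0, 1, 99, 100, 101 };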

6 To supplement these techniques functional control flow testing is generally also applied since equivalence partitioning and boundary value analysis do not examine combinations of input data values.

31

Software structural testing is not a technique but a ‘goal’ for the test cases: to achieve a required level of test coverage of the software structure. The aim of these coverage goals is to ensure that the program’s internal logic is exercised for defect detection and to gain confidence that the internal logic works as expected. One of the problems with structural testing is that it is often performed in a ‘sterile’ test harness environment, with the sole aim of the test sets being to ensure coverage of the internal logic. When the software system is fully integrated, the system’s non-functional properties may make the earlier assumptions and results about the internal logic invalid.7

Avionic and software safety standards focus on obtaining a level of code coverage from the unit test cases. For example, [178B] indicates Statement, Branch and Modified Condition/Decision Coverage (MC/DC) test coverage goals. [Def Stan 00-55] defines the following coverage goals:

 Statement  Branch  Source code variables set to min and max value and to an immediate value  Booleans executed to true and false  Each variable of an enumerated type set to each possible value  All loops executed with 0 and 1, immediate number of times and maximum, where semantically feasible  Special case values , e.g. zero values

Structural-based testing aims to ensure, as a minimum, that all the source code statements of a program are executed at least once, and can highlight ‘dead code’ (code that cannot be executed). Structural testing therefore helps to ensure that only code which should be there is there. However, it cannot highlight missing logic, i.e. missing implementation of requirements. We are therefore dependent on functional testing to ensure that functions have been implemented correctly, and on structural testing to ensure that, as a minimum, all statements are exercised and no ‘dead code’ exists.
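
A small invented fragment showing both effects (dead code that a statement coverage goal exposes, and missing logic that it cannot):

    // Illustrative only.
    static String gradeBand(int mark) {
        if (mark >= 50) return "pass";
        if (mark >= 0)  return "fail";
        if (mark > 100) return "invalid";   // dead code: any mark > 100 already
                                            // returned "pass" above, so no test can
                                            // execute this statement and a statement
                                            // coverage goal flags it
        return "fail";
    }
    // Missing logic (e.g. rejecting negative marks) produces no statement to cover,
    // so structural coverage cannot reveal it; functional tests derived from the
    // specification are needed for that.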

7 This emphasizes the need for test harness environments - system rigs, iron birds (static aircraft rig) etc. - to be ‘qualified’, in some sense, as truly representative of the final target. The value of reliability-based testing (which finds failures) can be reduced if the test harness environment is not qualified.


2.2.4 Formal v Informal

The majority of dynamic testing techniques are informal or heuristic in nature. Their credibility is based upon engineering knowledge of how effective these techniques are. However, dynamic testing has limitations:

• [Harrold 00], [Myers 79] and [Beizer 90] are among many in the software testing literature who indicate that testing cannot show the absence of faults, only their presence;

• Testing cannot show that the software has certain qualities such as X, Y or Z; and

• Dynamic testing lacks a sound theoretical foundation, as [Howden 81] noted.

To overcome these limitations, formal methods have been used to show proof of correctness using mathematical reasoning, for either the whole or parts of the program. [Howden 78] divided formal methods into three categories: provable program classes, program verification and model checking.

Testing is an inductive activity, as [Demillo 87] notes, whereas formal proof of correctness is a deductive activity. In other words, formal proofs attempt to show logically that the program is correct for all inputs; in contrast, testing demonstrates that a program with a given set of inputs will generate specific correct outputs, inferring that it will do so for all inputs. In dynamic testing, observations are made and from these observations conclusions are drawn about the program’s correctness. The effectiveness of testing is dependent on the adequacy of the test set and, in mathematical terms, may not be valid unless the test set is reliable. [Goodenough and Gerhart 75] defined two requirements for an adequate test set in terms of ‘Reliability’ and ‘Validity’. These will be discussed in Section 2.6.

Formal proofs allow the analyst to reason, over all valid inputs, that the behaviour of a program is correct. Formal proofs can be generated from numerous formal languages; the most common include OBJ, CCS, Z, CSP, B, LOTOS and VDM. These have well-defined semantics and calculi that enable formal proofs to be constructed and arguments to be generated to determine whether the program is correct with respect to its specification. [Adrion et al 82] notes that proof of correctness is the most complete static analysis technique and divides proof of correctness into formal and informal proofs. To enable formal proof of correctness, the most common approach has been to insert annotations into the source code so that formal arguments can be generated. Both SPARK and MALPAS use this type of approach. Both tools use pre-conditions, post-conditions and assertions to enable the formal correctness arguments to be generated.
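
SPARK and MALPAS annotations are written in their own notations; purely as an analogy, in the Java used elsewhere in this thesis, the pre-/post-condition style corresponds to executable assertions of the following kind (illustrative only, not SPARK or MALPAS syntax):

    // Analogy only: assertions standing in for the pre-, post- and intermediate
    // assertion conditions that proof tools discharge formally rather than at run time.
    static int integerSqrt(int n) {
        assert n >= 0 : "pre-condition: n must be non-negative";
        int r = 0;
        while ((r + 1) * (r + 1) <= n) {
            r++;
            assert r * r <= n : "assertion (loop invariant): r*r never exceeds n";
        }
        assert r * r <= n && (r + 1) * (r + 1) > n : "post-condition: r = floor(sqrt(n))";
        return r;
    }

The essential difference is that a proof tool establishes such conditions for all valid inputs, whereas executing the assertions checks them only for the inputs actually tested.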

Formal proof of correctness suffers from a number of issues. The need to express requirements and design in the mathematical form necessary to enable formal proof has limited the extent of its application, according to [Mazza et al 96]. Cost is also highlighted as a factor. [Howden 78] highlighted the difficulty of representing time in a formal definition and the time required to undertake formal proofs. [Howden 78] illustrated how a theorem prover for a railway crossing design required 1000 pages of proofs and 3 man-months of effort8. This is less of an issue today due to tool support, e.g. the SPARK Examiner. [Woodcock et al 09] conducted a survey of the use of formal methods across a number of different applications, as follows:

• The Transputer Project (microprocessor chips designed for parallel processing)
• Railway Signalling and Train Control (computerised signalling systems)
• Mondex Smart Card (low-value cash card)
• AAMP Microprocessors (microprocessors widely used by Rockwell Collins)
• Airbus and the use of SCADE
• The Maeslant Kering Storm Surge Barrier (moveable barrier protecting Rotterdam from flooding)
• The Tokeneer Secure Entry System (meeting the Common Criteria requirements of Evaluation Assurance Level 5 (EAL5))
• The Mobile FeliCa IC Chip Firmware (contactless IC card for electronic purses, travel tickets, door keys, etc.)

Some of these projects, like AAMP, suffered from high cost, i.e. 300 hours per instruction. According to [Woodcock et al 09], this was due to a steep learning curve and the need to ‘develop supporting application-oriented theories’. One of the positive outcomes of AAMP was that methods were developed to handle complex microcode. The Airbus use of SCADE, according to [Woodcock et al 09], had a number of positive outcomes, including the following (quoted directly from [Woodcock et al 09]):

8 For safety critical or mission critical systems three months may be insignificant based upon an ALARP (As Low As Reasonably Practicable) argument.


(i) A significant decrease in coding errors: for the Airbus A340 project, 70% of the code was generated automatically.

(ii) Shorter requirements changes: the SCADE tool suite manages the evolution of a system model as requirements change, and in the Airbus A340 project, requirements changes were managed more quickly than before, with improved traceability.

(iii) Major productivity improvement: Airbus reported major gains, in spite of the fact that each new Airbus project requires twice as much software as its predecessor.

[Woodcock et al 09] concluded that there had been a resurgence in the use of formal methods; the paper highlights significant interest in the area by charting the number of papers published on the topic.

2.2.5 Manual v Automated

[Harrold 00], [Ng et al 04] and [Hetzel 88] all indicated that the cost of testing can account for up to 50% of the development cost. A considerable percentage of this cost is due to the high labour intensity involved in testing and in defect isolation, correction and re-testing. In an attempt to reduce this cost burden, software test automation has been proposed by some, e.g. [Denson et al 1999]. While cost may be the main driver towards test automation, it is not the sole reason. One of the other major advantages of test automation is that it can remove the debate relating to the scope of regression testing required9, since previous test sets can be repeated quickly and cost-effectively to ensure confidence that the software system has not regressed. It also enables a sufficient quantity of test cases to be executed over the software to enable statistical testing.

Automated testing is performed by computer-based systems, whilst manual testing is performed by people. Static testing, e.g. reviews, walkthroughs and inspections, is typically a manual activity. Automated testing normally focuses on the dynamic aspects of software testing and hence on program artefacts once the source code has been developed. Automated testing, like manual testing, can be applied at different phases in the life-cycle and therefore to different software artefacts. Automated testing incorporates the same testing techniques as manual testing but discharges the test points automatically, unlike manual testing, where the test points or violations are discharged manually. Therefore, the testing process should require less time and effort.

9 You can relatively quickly re-run all the tests again without the need to make arguments relating to the level of regression testing required based upon dependency diagrams, call graphs etc.

Test scripts contain steps or statements that can either be performed by the tester or undertaken automatically by the test tool. The aim of these steps is to place the software in the desired condition/state, so that the test points contained in the test scripts can be verified. Manual and automated test scripts contain start-up and clean-up sections so that test data can be loaded and the software can be placed in the required state. In manual test scripts the expected results are derived from the oracle and are stated in the test script. The tester performing the manual test verifies that the expected results occur. In automated testing, the expected results are verified automatically by the test script, normally by 'verification' or 'check' statements in the script. If the desired condition/state occurs inside the required time frame the test script automatically logs a test point pass, otherwise it logs a test point failure.
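As an illustration of this structure, the following sketch (a hypothetical JUnit 4 test, not taken from the research framework described later in this thesis; the Account class is a stand-in for the software under test) shows the start-up, verification and clean-up sections of an automated test script: test data is loaded in a set-up method, a 'check' statement verifies the expected result against the oracle value, and the clean-up section discards the fixture.

    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    // Hypothetical automated test script; Account stands in for the software under test.
    public class AccountDepositTest {

        // Minimal stand-in for the system under test so the example is self-contained.
        static class Account {
            private int balance;
            void deposit(int amount) { balance += amount; }
            int getBalance()         { return balance; }
        }

        private Account account;

        @Before
        public void startUp() {
            // Start-up section: load test data and place the software in the required state.
            account = new Account();
            account.deposit(100);
        }

        @Test
        public void depositIncreasesBalance() {
            // Test step: drive the software into the desired condition/state.
            account.deposit(25);

            // Verification ('check') statement: the expected value (125) plays the role of the oracle.
            assertEquals(125, account.getBalance());
        }

        @After
        public void cleanUp() {
            // Clean-up section: discard the fixture so later scripts start from a known state.
            account = null;
        }
    }

When scripts of this kind are run in a batch, the framework records a pass or failure for each check automatically, which corresponds to the automated logging of test point results described above.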

The recording of test results is another distinction between manual and automated testing. In manual testing the results are usually recorded on the actual test script, with the tester manually marking each test point as passed or failed, or noting any observation which deviates from the test script. The results of each test script are then manually tabulated in a test summary document, summarizing the results of all tests performed during that phase of the development life-cycle. These results are then placed under configuration control. Formal manual testing is often witnessed by the Quality Assurance (QA) representative of the supplier, and witnessed by a representative of the purchaser. This contrasts with automated testing, where the test scripts can be executed in batches, with each test script generating its own logged output file. These files are automatically checked to ensure that all test points pass and an automated test report summary is generated.

Test scripts can be generated manually or via recording of keystrokes and mouse button presses. The IBM Rational Robot tool records keystrokes and mouse button presses to automate Graphical User Interface (GUI) testing. The script can then be manually modified to insert verification statements to ensure that specific conditions/states occur. For example, a dialogue box may contain several fields for user input. The script could record the user keystrokes and mouse clicks to test each data entry field, to ensure that each field will accept or reject the data entered. The test data is generated from the Human Machine Interface (HMI) specification.

One of the major advantages of automated testing is that once the test scripts have been generated they can be run in batches or individually, and at any time. This has a positive impact on time and cost in the long run and aids regression testing. However, automated test scripts can only be as good as the engineers developing them; the same applies to manual test scripts. The upfront costs, including tool cost, training of staff to use the tool and generating the test scripts in the tool's scripting language, have limited the level of automated testing currently seen, as indicated by [Ng et al 04]. Due to the perceived high start-up costs, automated testing is generally only seen on large-scale projects, where the life of the software system is measured in decades rather than years, and even then it is still rare.

While fully automated testing remains rare, there has been an increasing trend towards tool support for code reviews and the enforcement of coding standards. Tools like PC-lint, NUnit, Ada Analyzer, LDRA and PolySpace can review the code against pre-defined program verification models. These models check for violations against specific rule sets, e.g. the MISRA C coding standard. The tools flag each programming violation, and the violations are then discharged manually. One of the biggest issues with this type of tool-aided testing is the number of violations (false positives) generated by the tools and the time it takes to discharge them manually. The other issue is that false negatives may also occur, which generate false confidence in the software.

2.2.6 Testing v Debugging

The aims of testing and debugging are very different. The aim of testing is to show that a program contains errors, while the aim of debugging relates to the location and correction of those errors. [Telles and Hsieh 01] and [Araki et al 91] have similar definitions of debugging: debugging is 'the process of understanding the behaviour of a system to facilitate the removal of bugs.' Defects can stem from many sources, from simple programming errors to misconceptions in the design or requirement documentation. [Beizer 90] notes that debugging normally occurs after software testing. However, for static testing such as code reviews, error identification, i.e. error location, is part of the process. What is not normally part of the code review process is the error correction.


[Araki et al 91] proposes a debugging process model, as shown in Figure 2. The error report could come from testing or from operational use and forms the basis of the initial hypothesis for the cause of the defect. During further investigation of the software artefacts, hypothesis selection occurs, followed by verification of that hypothesis. [Araki et al 91] indicated four ways of verifying that hypothesis: static, dynamic, semi-dynamic and program modification.

Figure 2: A Debugging Process Model (error report → initial hypothesis set → hypothesis selection → hypothesis verification → if the bug is fixed, end; otherwise modify the hypothesis set)

The debugging process is very much a cause and effect activity, whereas testing is less concerned with the causes of the defect and more concerned with detecting failures. Once debugging has established and verified the hypothesis, and therefore established the causes of the error, this can be used in future test design (assuming that the defect was not detected by the actual testing).

Testing and debugging facilitate each other. A fine level of granularity of forward and backward traceability throughout the whole software lifecycle aids both testing and debugging. Good testing has positive effects on debugging. Good requirements, good design and good, well-structured code all have positive impacts on testing and debugging. Good testing, as [Beizer 84] notes, helps eliminate fruitless defect hypotheses. Good testing attributes include: clear, well-defined objectives; requirement traceability; a clearly defined initial test state; and a pre-defined set of test steps and expected results. Good test cases enable repeatability and predictability of tests, which can form the basis of the initial bug hypothesis. Well-defined and structured design and source code aid the investigation of defects found by testing. [Fenton & Ohlsson 00] noted that low complexity is a good predictor of the module attribute maintainability. The opposite is also true, in that negative externalities in software artefacts, e.g. poor design, spaghetti code, and lack of requirements traceability, impact testability and debugging. As [Beizer 00] noted, good testing cannot turn bad design into good design.

Formal and informal frameworks for testing have been proposed, e.g. [Gourley 83], [Myers 79], but as [Araki et al 91] and [Beizer 90] note this is not the case for debugging. Debugging tools in general focus on enabling the programmer to observe and understand the program execution, i.e. debuggers and tracers. [Beizer 90] notes nine major differences between testing and debugging, listed in Table 3.

No  Description
1   Testing starts with known conditions, uses predefined procedures and has predictable outcomes; only whether or not the program passes the test is unpredictable. Debugging starts from possibly unknown initial conditions, and the end cannot be predicted, except statistically.
2   Testing can and should be planned, designed and scheduled. The procedures for, and duration of, debugging cannot be so constrained.
3   Testing is a demonstration of error or apparent correctness. Debugging is a deductive process.
4   Testing proves a programmer's failure. Debugging is the programmer's vindication.
5   Testing, as executed, should strive to be predictable, dull, constrained, rigid and inhuman. Debugging demands intuitive leaps, conjecture, experimentation and freedom.
6   Much of the testing can be done without design knowledge. Debugging is impossible without detailed design knowledge.
7   Testing can often be done by an outsider. Debugging must be done by an insider.
8   Although there is a robust theory of testing that establishes theoretical limits to what testing can and can't do, debugging has only recently been attacked by theorists.
9   Much of test execution and design can be automated. Automated debugging is still a dream.

Table 3: Differences between Testing and Debugging

[Zeller and Hildebrandt 02] define a debugging approach called Delta Debugging, based upon simplification and isolation. In the simplification phase the failure-inducing input is simplified by examining smaller configurations of the input; the input is reduced until the smallest possible input that still produces the failure is established. Isolation instead attempts to find a passing configuration by removing a particular part of the failing test case, so that the test case no longer fails. [Zeller and Hildebrandt 02] indicated that isolation is more efficient than simplification: given a large failure-inducing input, isolating the difference will pinpoint a failure cause much faster than minimizing the test case. However, [Misherghi & Su 06] note that for large test cases it may lead to worse running times because of the time spent testing large configurations.


[Misherghi & Su 06] focuses on simplification and defines an approach called Hierarchical Delta Debugging (HDD) that is a technique exploring the input structure to minimise failure inducing inputs. The authors claim the HDD approach reduce the number of test cases required and produce a smaller number of output compared to the original delta debugging defined in [Zeller and Hildebrandt 02].

2.3. Software Standards, Process Models and Metrics

2.3.1 Introduction

This section focuses on software standards, process models and metrics and is split into three parts. Software standards have seen increased prominence over the past three decades as a means of solving the perceived 'software crisis'. However, as we will see in section 2.3.5, software projects continue to fail in terms of being delivered on time, to budget and to quality, i.e. with the appropriate delivered functionality. Standards are necessary but not sufficient, and they continue to evolve. Standards have in the past been very prescriptive, and in the majority of cases remain so. However, [Def Stan 00-56 Issue 4] has moved away from being a prescriptive standard to a more descriptive one, allowing the developer to determine what the best practices are and what evidence is required to support safety arguments and confidence in the delivered system.

From the 'standards revolution' of the 1980s, two process models started to emerge in the late 1980s: SPICE and CMM. They were primarily developed for two reasons: firstly to enable the purchasing organisation to make an assessment of the contracting agency, and secondly to encapsulate whole-lifecycle process development and improvement. We have since seen the expansion of software metrics in an attempt to quantify the level of quality in the developed software artefacts. However, we currently do not have any software measurements that relate directly to the level of overall system reliability.

2.3.2 Software Standards

Software engineering has grown from a collection of ad hoc practices to a disciplined and controlled set of rigorous activities defined by a well understood process model. Though no 'silver bullet' [Brooks 87] has been found, the constant striving to make software development a true engineering discipline has brought significant benefits. Contributions to this challenging task have come from a range of disciplines. Management has benefited from inputs from the sociological and psychological fields, and program validation and verification has benefited from 'hard' computer science. However, software projects continue to fail. They are often over-budget, delivered late, or simply fail to work as required. To deal with such matters a concerted attempt has been made to control the development process more rigorously. One aspect of such control, aimed at enforcing consistency of good practice, is the adoption of well-grounded standards for software development. This assumes that the quality of a software system is heavily influenced by the quality of the processes used to acquire, develop and maintain that system.

Software engineering textbooks, e.g. [Sommerville 95], and safety-critical systems textbooks, e.g. [Storey 96], note the important role of software standards in achieving fitness for purpose and built-in quality. [Storey 96] notes the role of standards as follows (quoted directly):

1. Helping staff to ensure that a product meets a certain level of quality.

2. Helping to establish that a product has been developed using methods of known effectiveness.

3. Promoting a uniformity of approach between different teams.

4. Providing guidance on design and development techniques.

5. Providing some legal basis in case of dispute.

Standards play an important legal role, in addition to fitness for purpose. Not only do they form the basis of the contract between the purchaser and the provider, they also provide evidence in legal disputes to show that best practice has been applied. This is especially true for safety critical systems.

Standards have evolved over the past three decades and some are now commonplace, such as ISO 9001/2000 for Quality Assurance. A number of different software standards exist, developed by different organisations. These all contain similar elements but differences exist between them. There are a great number of such standards and one might legitimately ask which one should be applied in which circumstance.

Standards can be categorised as de-jure (official) and de-facto standards. De-jure standards broadly fit into three categories:

 Those developed by international organisations e.g. International Standards Organisation (ISO), Institute of Electrical and Electronic Engineers (IEEE).


 Those developed by domain specific institutions e.g. Avionic based standards such as Federal Aviation Regulations (FAR) and Joint Aviation Regulations (JAR).

 Those developed by government agencies e.g. for use in the defence domain - UK Defence Standards, US Military Standards.

De-facto standards include Microsoft Windows and Apple human interface guidelines.

In the past, the UK MOD has generated its own set of defence standards. However, currently the UK MOD is moving towards using international standards or domain specific standards rather than using bespoke defence standards.

Software standards have evolved over time. For example [Def Stan 00-55 Issue 2] mandated specific requirements for different criticalities of software. This standard mandated the use of formal methods for all safety critical software i.e. Safety Integrity Level (SIL) four. [Def Stan 00-55 Issue 2] has been superseded by [Def Stan 00-56 Issue 4]. However, [Def Stan 00-56 Issue 4] no longer mandates specific requirements for different SILs, but places the onus on the supplier of the software to determine what activities and evidence are required, commensurate with the criticality of the software. In doing so it places the emphasis on the purchaser e.g. MOD to determine the adequacy of that evidence.

The majority of the standards mandate software development requirements by using the word 'shall' in the requirement sentence; [178B] uses the word 'should'. However, all standards can be tailored; even mandated requirements can be tailored by agreement between the purchaser and the provider of the software. Safety-critical software standards such as [Def Stan 00-55], [178B] and [EN-61508] impose different requirements for different criticalities of software, i.e. certain requirements do not apply to less critical software.

[Def Stan 00-55 Issue 2] defines four levels, referred to in this standard as Safety Integrity Levels (SILs). The criticality of the software is in numerically descending order, with SIL4 being safety critical and focusing on loss of life, i.e. a software failure would cause multiple loss of human life. SIL3 to SIL1 are lower safety integrity levels, for which a software failure would not result in multiple loss of human life: a SIL3 failure would result in a single loss of life, while SIL2 and SIL1 software failures would not result in any loss of life. [178B] uses a similar scale but with levels in ascending alphabetical order, and focuses on failure conditions that would prevent safe operation of the aircraft. A level 'A' failure would prevent continued safe flight and landing, while a level 'E' failure would have no effect on the aircraft's operational capability or pilot workload.

The safety-critical standards and the non-safety-critical software standards, i.e. [Def Stan 05-95], [Mil Std 498] and [DOD 2167A], place a high importance on reviews, in addition to the right-side activities of the V-Model, for defect detection. These reviews focus on the requirements, the architecture and the detailed design. The level of rigour applied to software testing is one of the key differences between the safety-critical and non-safety-critical standards: the safety-critical standards focus on process and software product evaluation, whilst the non-safety-critical standards place less emphasis on software product evaluation. What all these standards have in common is a continuation of V & V activities throughout the whole development lifecycle, or what [Gelperin & Hetzel 88] called the 'lifecycle' model.

All the standards reviewed indicate a preference or requirement for independence between development and testing, as the majority of the testing literature would support, e.g. [Myers 79], [Beizer 90]. However, [Gelperin & Hetzel 88] conflicts with this 'independent approach' and indicates that programmers and testers should work together so that a 'comprehensive' test set is developed. Due to the increased complexity of software systems, software programmers are required to provide guidance to the test team in generating more complete and thorough test sets. [Gelperin & Hetzel 88] indicates that this also 'buys in' the test team to accepting responsibility for the cost of failures discovered during testing.

[178B/C] focuses on requirements-based testing, since this technique has been 'found to be the most effective at revealing errors' according to [178B/C]. To undertake requirements-based testing, [178B/C] requires that test sets with normal and abnormal ranges are developed, which [178B/C] refers to as 'robustness testing'. [178B/C] defines testing requirements for the different software testing phases, i.e. low-level software unit testing, software-to-software integration and software-to-hardware integration. [Def Stan 00-55 Issue 2] has similar requirements for unit, integration and system testing. Example system testing requirements are shown below (quoted directly from [Def Stan 00-55 Issue 2]):

a) all functions in the Software Requirement and the Software Specification have been executed;
b) all numerical inputs and all outputs have been set to their minimum, maximum and an intermediate value;
c) all booleans, inputs and outputs, have been set to both true and false values;
d) all non-numerical outputs, including error messages, have been tested;
e) all non-numerical inputs have been tested;
f) all testable non-functional requirements, including timing and capacity have been tested.

[Def Stan 00-55 Issue 2] states only one requirement for integration testing and that is to demonstrate the correctness of all interfaces. [178B] provides more specific requirements for software to software integration. [Def Stan 00-55 Issue 2] also indicates validation testing, defined in that standard as demonstrating that the Safety Related Software (SRS) operates in a safe and reliable manner under all conceivable operating conditions. Part of the requirement for validation testing refers to non-functional aspects of the software, including timing, accuracy, stability and error handling.

Both [178B] and [Def Stan 00-55 Issue 2] have similar structural coverage criteria. For [178B] level ‘A’ software the following are required to be undertaken by an independent test team10:

• Statement Coverage (SC)
• Branch/Decision Coverage (BC)
• Modified Condition/Decision Coverage (MC/DC)
• Data Coupling and Control Coupling (DF and CF)
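To make the distinction between the first three criteria concrete, the following sketch (a hypothetical Java fragment, not drawn from the programs studied later in this thesis) shows test inputs that satisfy statement, branch and MC/DC coverage for a single two-condition decision.

    public class CoverageExample {

        // Hypothetical guarded computation with a single decision 'a && b'.
        static int guard(boolean a, boolean b) {
            if (a && b) {
                return 1;   // then-branch statement
            }
            return 0;       // fall-through statement
        }

        public static void main(String[] args) {
            // Statement and branch/decision coverage: the decision takes both outcomes
            // and every statement executes with just these two inputs.
            System.out.println(guard(true, true));   // decision true  -> 1
            System.out.println(guard(false, true));  // decision false -> 0

            // MC/DC additionally requires showing that each condition independently
            // affects the decision: compare (true,true) with (false,true) for 'a',
            // and (true,true) with (true,false) for 'b'.
            System.out.println(guard(true, false));  // third test completes MC/DC
        }
    }

In this fragment statement and branch coverage coincide and need only two tests, whereas MC/DC needs a third; the gap between the criteria grows for more complex Boolean expressions.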

[Def Stan 00-55] defines its coverage criteria as follows (quoted directly from [Def Stan 00-55 Issue 2]):

a) all source code statements and all source code branches;
b) all source code variables set to minimum and maximum values as well as an intermediate value;
c) all source code booleans executed with true and false values;
d) all feasible combinations of source code predicates executed;
e) all source code variables of enumerated type set to each possible value;
f) all source code loops executed 0, 1, an intermediate number and maximum times, where this is semantically feasible;
g) special cases, for example source code variables and source code expressions which can take or approach the value zero.

10 [178B] and [178C] define independence in terms of verification being performed by a person other than the developer. A tool(s) may be used.

[DOD 2167A], [Def Stan 00-95] and [Mil-Std 498] do not contain coverage criteria and focus on functionally based testing.

2.3.4 Software Testing Metrics11

Process improvement models such as CMMI and international software standards, e.g. ISO and IEEE, have increased the demand for software metrics. Software metrics are not new. [Boehm et al 78] listed numerous software measurements to cover what they defined as the attributes that make up software quality, e.g. reliability, testability, modifiability. More recent authors, [Bache & Bazzana 94], [Kaner 00], [Arora 95] and [Fenton & Pfleeger 97], have all proposed or examined software measurements in an attempt to measure specific quality attributes of software. [Bache & Bazzana 94] and [Fenton & Pfleeger 97] do not just recommend a list of metrics, but also discuss data collection and measurement theory. [IEEE Std 982.1 2005] and [IEEE Std 982.1 1988] provide detailed sets of software metrics (16 and 39 respectively) which could be applied to software in an attempt to measure software reliability through and after development. CMMI, ISO and IEEE have broadened software measurement to capture not only product measurement but also process measurement. Metrics no longer just examine the effectiveness of techniques but also their efficiency.

[Beizer 90] indicates three broad types of metrics: Linguistic; Structural; and Hybrid, as defined below [Quoted directly Beizer 90]:

11 The term metric was the traditional word used to define a measurement tool in the domain of software metrics. However, since the late 1980s 'measures' have also been introduced into the software literature. [Melton 96] indicated two reasons for this: 'Metric is a well defined type of mathematical function (a metric takes two arguments and returns the distance between them) and that a software metric is not a metric.' Secondly, some authors wanted to stress that software metrics are defined according to the principles of measurement theory. Here we do not make this distinction and use the terms metric and measurement interchangeably.


Linguistic – Measures properties of a program or specification text without interpreting what that text means or the ordering of components of the text, e.g. lines of code, number of executable statements, number of unique operators, number of unique operands, total number of operators, total number of operands, total number of keyword appearances, total number of tokens etc.

Structural – Metrics based on structural relations between objects in the program, i.e. control and data flow-graphs, number of links, number of nodes, nesting depth etc.

Hybrid – A combination of structural and linguistic properties of a program, or based on a function of both structural and linguistic properties.

For [Beizer 90] metrics are product based.

[Myers 79] divided metrics into two domains: product and process. [Myers 79] then subdivided the product metrics into internal and external metrics.

External product metrics include:

• Product non-reliability metrics, assessing the number of remaining defects.
• Functionality metrics, assessing how much useful functionality the product provides.
• Performance metrics, assessing a product's use of available resources: computation speed, space occupancy.
• Usability metrics, assessing a product's ease of learning and ease of use.
• Cost metrics, assessing the cost of purchasing and using a product.

Internal product metrics include:

• Size metrics, providing measures of how large a product is internally.
• Complexity metrics (closely related to size), assessing how complex a product is.
• Style metrics, assessing adherence to writing guidelines for product components (programs and documents).


Process metrics include:

• Cost metrics, measuring the cost of a project, or of some project activities (for example original development, maintenance, documentation).
• Effort metrics (a subcategory of cost metrics), estimating the human part of the cost and typically measured in person-days or person-months.
• Advancement metrics, estimating the degree of completion of a product under construction.
• Process non-reliability metrics, assessing the number of defects uncovered so far.
• Reuse metrics, assessing how much of a development benefited from earlier developments.

[Ng et al 04] indicates that the most popular software testing metric was simple defect counting. However, of the organisations surveyed by [Ng et al 04], just over 50% believed that software metrics had improved the overall quality of the software. This would appear to support earlier findings by [GOA 83] and [Gelperin & Hetzel 88]. [Gelperin & Hetzel 88] indicates a more positive trend, based on a survey conducted at a testing conference; however, those results are biased and may not reflect the software testing community as a whole.

What is not clear from [Gelperin & Hetzel 88] and [Ng et al 04] is how these defect counts were used. Defect counts can be used as part of other software metrics. [Kaner 00] proposes numerous measures relating to defect counts. [Myers 79] proposes defect counts as part of stopping or halting criteria for testing. [Dunn & Ullman 82] illustrated a hypothetical example that uses defect counting to improve the efficacy of the testing strategy. The testing strategy used four phases of testing: unit, integration, qualification and system testing. This example estimated the number of defects in the program captured in each phase. By using the following equation, [Dunn & Ullman 82] could determine the efficacy of testing at each phase and re-allocate testing resources to other phases of the testing strategy to improve the overall efficacy of testing.

E = Df / (Df + Dr) × 100

Df = defects found
Dr = defects remaining after the test
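For example (hypothetical figures): if unit testing finds 40 defects and a further 10 defects that were present during unit testing are only caught in later phases, the unit testing efficacy is E = 40 / (40 + 10) × 100 = 80%. A phase with a markedly lower score would be a candidate for the re-allocation of testing resources described above.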


Two commonly applied software testing measures are the defect density and fault density metrics, as shown below. They have been used on the NASA space shuttle and are commonly applied to US military software applications. These two metrics can then be used in more complex models and in reliability modelling. Defect density is a coarse-grained metric, since it does not specifically relate to failures, while fault density relates faults to specific failures.

[Fenton & Pfleeger 97] defined defect [Musa 99] defined fault density (FD) as density (DD) as: (used on the space shuttle) :

D F DD  FD  KSLOC KSLOC D = Defect F = Fault KSLOC = Thousand source lines of code which include executable and none-executable lines.
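As a worked illustration (hypothetical figures): a 50 KSLOC program in which 120 defects have been recorded has DD = 120 / 50 = 2.4 defects per KSLOC; if 30 of those defects have been tied to observed failures, FD = 30 / 50 = 0.6 faults per KSLOC.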

[Berling & Thelin 03] used defect detection to generate a ‘measure for goodness’ denoted by dg to measure where defects could have been found in the development lifecycle. Where a defect is found at the earliest possible phase dg = 1, when all defects are found in the last phase the value of dg = 0. When the defects were found in phases between the first and last phase the dg value is between 0 and 1. [Berling & Thelin 03] does not explain what dg would be applied to phases between the first and last. One way would be to factor down the phases between the first and last.

One commonly applied software metric used during design to assess fault density is cyclomatic complexity (CC). The hypothesis is that the higher the CC, the higher the fault density. [McCabe and Butler 89] proposed a CC of no more than 10. [Walsh 79] indicated a quantum jump in 'error counting' at a CC of 11 compared to 10. More recent research by [Herzner et al 05] supported the hypothesis that CC is a good measure of fault density. [Herzner et al 05] also indicated that the detection of faults deteriorates significantly as CC increases, and that the amount of effort required to detect these faults increases as CC increases.

Other researchers have shown less positive results for CC. [Fenton and Ohlsson 00] and [Lauterbach & Randall 89] have indicated that CC is not a good indicator of fault density. [Fenton and Ohlsson 00] indicated that LOC is a better indicator than CC, but that low CC was beneficial for software maintainability. [Lauterbach & Randall 89] discovered that the 'buggiest code' consisted of modules with some of the shortest LOC and lowest CC. A more fundamental problem with CC is that it tells you nothing about dataflow or dataflow problems inside the program.
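As a brief illustration of how CC can be computed for a small, hypothetical Java method (using the common 'number of decision points plus one' formulation for single-entry, single-exit code, not a figure drawn from the studies cited above):

    public class CyclomaticExample {

        // Hypothetical method with three decision points (the loop condition and two 'if's),
        // so CC = 3 + 1 = 4, comfortably below the threshold of 10 discussed above.
        static int classify(int[] values, int threshold) {
            int above = 0;
            for (int v : values) {           // decision point 1: loop continuation
                if (v > threshold) {         // decision point 2
                    above++;
                }
            }
            if (above == values.length) {    // decision point 3
                return 1;                    // every value is above the threshold
            }
            return 0;
        }

        public static void main(String[] args) {
            System.out.println(classify(new int[] {5, 7, 9}, 4));  // prints 1
            System.out.println(classify(new int[] {5, 2, 9}, 4));  // prints 0
        }
    }

Adding further decisions to such a method raises its CC, which is exactly the property the fault-density hypothesis above seeks to exploit.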

Software standards, e.g. [Def Stan 00-55 Issue 2] and [178B/C], both refer to some form of structural criteria such as statement, branch and MC/DC coverage. However, [Marvick 00] indicated that between 30% and 50% of all errors are errors of omission. This would appear to support the findings of [Howden 81b], which indicated that the most common errors were due to missing logic and that structural testing would therefore not be capable of detecting such omissions.

Metrics can only be collected practically with the aid of tool support and automation. With the expansion of metrics has come an expansion of the tools to support them; the Cantata software testing tool, for example, supports over 100 software metrics. How effective these metrics are is a different question. The real questions are: firstly, are we collecting and measuring the right software characteristics, or are we measuring things we can measure because of tool support and ease of collection? Secondly, how do we use this information to re-allocate finite resources to improve overall software quality? Thirdly, what are the software quality metrics telling us about the overall reliability of the program? For example, the [Berling & Thelin 03] 'measure of goodness' tells us nothing about the reliability of the end program or the severity of the defects found in the different phases. We may find a number of defects in the early phases and none in the later phases due to the poor effectiveness of the techniques applied, leading to a high 'measure of goodness', yet the fielded program may have a very low reliability. On the other hand, we may find a number of high-severity defects in the later phases, resulting in a low 'measure of goodness' but a more reliable program.

2.3.5 Software Projects Continue to Fail

It is common for the US and UK governments not to procure software from organisations with a CMM or CMMI level of less than 3. This is similar to the UK MOD requirement mandating that all procurements must be from organisations that are ISO9001/200012 certificated. This attempts to ensure a level of quality in the developed product.

12 There were exceptions to this rule. For example, for a one-man manufacturer and sole supplier of hand-crafted wooden propeller blades, the cost burden of ensuring ISO9001/2000 certification was prohibitive.

While research would appear to indicate that CMM/CMMI and standards play an important role in ensuring a level of quality in software systems, software projects continue to experience time and cost overruns and do not deliver the originally planned functionality. CMM spanned 1987 – 1997, when its development was abandoned to enable the development of CMMI, which was initially released in 2000. [Def Stan 00-55 Issue 2] was released in August 1997 and [178B] was released in December 1992. However, studies show the continuing failure of software projects, for example:

 [Kazman 00] stated from a US Army study that only 2% of projects are used as delivered, while 29% are paid for and not delivered and 47% are delivered but not used.

 [Bosch 00] noted that a US Air Force command and control system was delivered one year late and at double the cost.

However, these problems are not just limited to US military projects. The UK MOD has similar results; as recorded in a number of UK government documents as indicated below:

 The Euro-fighter was 42 months late and 1,505 million pounds over the total cost as agreed in 1987 [Parliamentary Memorandum Appendix 8]. This was a 37 million pound increase in real terms from the previous year (2001 from 2000), due to obsolescence.

 The Nimrod MRA 4 is late, over the original budget, and fewer aircraft are going to be delivered than originally planned (to reduce cost due to the complexity of integrating so many different components together) [UK Parliament, Defence Offet Obligations].

 The Royal Navy payment system had 12 months slippage and finally it was decided to abandon the project because of IT related issues [Appropriate Accounts 1989-89 Vol 1:Close 1- MoD]

 ALR 66 Radar Warning equipment was abandoned with a total loss of 13.4 million pounds, partly due to obsolescence. However, as [Departmental Resource Accounts] noted, the RAF identified numerous software faults that made the system inoperable.

Software standards and process models are 'necessary but not sufficient' for ensuring the development of high-quality software systems. Only by direct product assessment via software testing can the software product be truly assessed. Standards and processes do not provide what [Brooks 87] referred to as the 'silver bullet' to resolve the increasing number of software failures, but they perhaps provide a stepping stone towards consistent quality, and they aid product evaluation and learning from experience.

2.4. Model Testing, Reliability Growth Models and Fault Adequacy Techniques

2.4.1 Introduction

Model based testing (static and dynamic testing) enables testing to be performed earlier in the development lifecycle, rather than waiting for the source code to be developed. For example, the software architecture is modelled and dynamic testing is performed against that architecture model, enabling dynamic testing to start earlier in the software development cycle. Static techniques can also be applied to assess the suitability of the model with respect to different quality attributes, e.g. maintainability, reliability etc. However, section 2.4.2 focuses on the dynamic testing aspects of model based testing.

Reliability growth models attempt to quantify the reliability of software by applying statistical measures to the observation of software failures.

Fault based adequacy techniques started with fault seeding in the early 1970s, in an attempt to estimate the number of bugs remaining in the software after testing. However, the informality of seeding faults was one of the major issues with the approach, and in part led to the development of mutation testing. Mutation testing formalises fault injection and measures the adequacy of the test set in detecting the injected mutants through the generation of a mutation score. Section 2.4.2 discusses model based testing and section 2.4.3 software reliability growth models, followed by a discussion of fault seeding and fault mutation techniques; the section finishes with a discussion of the PIE (Propagation, Infection and Execution) suite of techniques that are used to measure software testability.

2.4.2 Model Based Testing

Model based testing enables testing to start earlier in the software development lifecycle, e.g. at the architecture level. This may therefore reveal faults earlier in the development lifecycle and reduce the cost of fault detection and correction.


Model based testing is one of the more recent developments in the software testing domain; the issue is not covered in older VVT surveys, e.g. [Adrion et al 82] and [Wallace & Fujii 89]. It is also interesting to note that [Dunham 89] indicated that modelling and simulation would be one of the new developments in V & V in the 1990s.

[Richardson & Wolf 96] and [Jin & Offutt 01] proposed the use of an Architecture Description Language (ADL) to model the architecture. Dynamic test sets could then be executed over these models. Since these papers focus on architecture, they focus on how components and connectors interact.

[Jin & Offutt 01] use a form of CSP and Behaviour Graphs (BG) to model the proposed behaviour of the architecture. [Richardson & Wolf 96] and [Jin & Offutt 01] both focus on the structural aspects of the architecture. [Jin & Offutt 01] focus on the following architecture properties:

 Data Flow Reachability: A data element should be able to reach its target components through the connectors without having that data element value modified.

 Control Flow Reachability: The element in the control thread is reachable.

 Connectivity: Examines to see if no next or no previous architecture element (component or connector) exists, i.e. dangling/disconnected components or connectors.

 Concurrency: Ensuring that interactions of components do not cause deadlock situations.

Five architecture based testing criteria are defined in [Jin & Offutt 01] (quoted directly):

 Individual components interface coverage: Requires that the set of paths executed by T covers all Component_Internal_Transfer_Paths and Components_Internal_Sequencing_Paths for each component.


 Individual connector interface coverage: Requires that the set of paths executed by T covers all Connector_Internal_Transfer_Paths and Connector_Internal_Sequencing_Paths for each connector.

 All direct components to components coverage: Requires that the set of paths executed by T covers N_C paths, all C_N paths and all Direct_Compoent_to_Component_Paths. (Where N = Component and C = Connector)

 All indirect components to components coverage: Requires that the set of paths executed by T covers all Indirect_Component_to_Component_Paths.

 All connected components coverage: Requires that the set of paths executed by T covers all possible All_Connected_Component_Paths for all the components in the architecture.

The [Jin & Offutt 01] paper refers to an experiment in which a small scale industrial program was injected with [Gacek & Boehm 98] based defects. Three different testing approaches were then applied in an attempt to detect the injected faults. These approaches included: the architecture-based testing method as discussed by [Jin & Offutt 01]; [Jin & Offutt 98] coupling based integration testing method; and a specification-based testing method. The conclusion is that architecture based testing performed better than the other two techniques in terms of faults detected but not in terms of faults detected per test.

[Richardson & Wolf 96] used Chemical Abstract Machine (CHAM) to formalise the architecture description. CHAM describes the architecture in terms of static components- ‘molecules’ and transformation by reactions. [Berry & Boudol 98] defines CHAM as (quoted directly from [Berry & Boudol 98]):

‘States of a machine are chemical solutions where floating molecules can interact according to reaction rules. Solutions can be stratified by encapsulation subsolutions within membranes that forces reactions to occur locally.’


[Richardson & Wolf 96] note (quoted directly from [Richardson & Wolf 96]):

‘A CHAM architecture simulation would run the test cases over the architecture, resulting in the set of solutions generated together with the causal history and timing. Causality between solutions results from the execution of transformation rules. These casual dependencies demonstrate the architecture behavior and may show dynamic problems not easily revealed via static analysis.’

[Richardson & Wolf 96] discusses a number of implementation and specification test criteria which include structural test criteria to exercise the system. The criteria include control flow and data flow coverage and fault based testing.

[Richardson & Wolf 96] focuses on specification based criteria, since specification based testing has as well ‘defined control and data flow criteria as well as fault based criteria with regards to a specification’s assertions’. [Richardson & Wolf 96] notes that architecture is composed of data, processing and connecting elements and it is these elements and their complex relationships between elements that should be ‘exercised to adequately test the architecture’.

[Richardson & Wolf 96] defines a CHAM based criterion to test specifications based upon the CHAM architecture descriptions as follows (Italics indicate quoted directly):

1. Determine ‘the set of structures to be covered’.

2. Define paths through the architecture that will cover these structures. Each path will generate a set of solutions. (Solutions - combination of elements, which represent state of the CHAM.)

3. ‘Define test data in terms of the architecture interactions with the outside world that would cause this set of solutions to be generated.’

It then recommends a set of testing criteria for the CHAM model which includes 'all data elements', 'all processing elements', 'all connected elements', 'all transformations', 'all transformation-systems' that 'require that all distinct paths or all non-repeating sequences of transformations from the initial solution to a stable solution, be tested', and 'all data dependences'. [Richardson & Wolf 96] state that different test cases could be developed to test different quality attributes, such as 'performance, load, communication or identify missing functionality.' [Richardson & Wolf 96] propose the use of an architecture test oracle for checking executed results against expected results. This could also be used to check the conformance of the architecture description to the implementation. Practical use of architecture testing requires a mapping between the architecture and the source code.

2.4.3 Reliability Growth Models

Numerous software reliability authors, [Butler & Finelli 91], [Littlewood & Strigini 92], [Musa et al 87], [Jalote & Murphy 04] and [Musa 89], have discussed how to obtain a statistical measure to quantify software reliability via the use of reliability growth models. This method starts with the assumption that a program includes faults. During the testing process, these faults manifest themselves as observable failures, and detection and correction occur13. The data is thus a sequence of successive inter-failure times that tend to increase (i.e. they are stochastically ordered), because of the growth in reliability due to fault removal. The logarithmic Poisson model proposed by [Musa 89] assumes that some faults cause more failures than others, and that on 'average the improvement in failure intensity with each fault correction declines exponentially as corrections are made'.

Reliability growth models represent a kind of 'average' reliability growth, since the frequency of perfect (good) fixes outweighs that of imperfect (bad) fixes; some reliability growth models assume perfect correction. [Brocklehurst & Littlewood 92] assessed the effectiveness of 8 software reliability growth models and indicated that no one model represents a 'panacea': reliability models need to be selected on a case-by-case basis, and only modest claims can be made about software reliability, due to the hours of testing required to support higher reliability claims. Also, with the ever-increasing trend towards incremental software builds, a purist view might maintain that any change to the software creates a new program, and therefore the evaluation of the program should start afresh even when only a localised correction has occurred.

13 [Musa 89] defines a static reliability growth model in which the software and the operational profile remain unchanged, i.e. software faults, discovered by an observable software failure are not corrected. However, this type of model is not the norm.


The major issue with reliability growth models is the time required to quantify ultra-reliability: a required failure rate of less than 10^-7 makes the use of reliability growth models currently infeasible, due to the 'unforgiving' law of diminishing returns, as [Killer & Miller 91], [Littlewood 89] and [Butler & Finelli 96] indicated. For example, [Bertolino 96] noted that to demonstrate one failure per X hours of execution requires approximately 100X hours of testing, as shown in Table 4.

Failure Rate    Hrs of Testing
10^-3           100000 (11 Years)
10^-4           1000000 (114 Years)
10^-5           10000000 (1141 Years)
10^-6           100000000 (11415 Years)
10^-7           1000000000 (114155 Years)

Table 4: Testing Hours Required to gain a Failure Rate using 100X
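As a worked instance of this heuristic: a claimed failure rate of 10^-4 per hour corresponds to X = 10,000 hours between failures, so roughly 100 × 10,000 = 1,000,000 hours of testing would be needed, matching the second row of Table 4 (around 114 years of continuous execution).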

[Butler & Finelli 96] use the following equation14 to gain similar results to Table 4 as shown in Table 5.

Failure Rate    No of Failures (n)    One 'Blob' of S/W (r)    r/n (Mean)    Hrs of Testing (Dt)    Years of Testing
10^-3           10                    1                        10            100000                 11
10^-4           10                    1                        10            1000000                114
10^-5           10                    1                        10            10000000               1141

Table 5: Hrs Determined by Using [Butler & Finelli 91] Formula

[Littlewood & Strigini 92] used a simple failure rate growth model based upon Mean Time To Failure (MTTF) and perfect operation to reduce the number of hours required to quantify software as shown in Table 6.

14 The larger the values of r and n, the smaller the statistical estimation error. This uses the replacement reliability growth model, i.e. a detected failure leads to fault identification and correction.


Failure Rate    Hrs of Testing Required
10^-3           4600
10^-4           46000
10^-5           460000

Table 6: Hrs Determined by Using [Littlewood & Strigini 92] Failure Growth Model

To qualify the Sizewell B reactor software, which originally had a failure rate requirement of 10^-4, Table 6 was used to determine the number of test cases required. Since the system was an on-demand system, 46,000 test cases had to be executed without failure, rather than hours of testing, to substantiate the required failure rate. The software outputs were compared with a developed oracle. An additional benefit of using reliability growth models for software reliability is that you can start to quantify when to stop testing, as [Musa 89] indicated.

[Musa 89] also proposed the use of a 'compression factor' that could be applied to reduce the number of hours required to qualify software. For example, if a factor of 2 were applied to the hours of testing required in Table 6, only 23,000 hours would be required to quantify a failure rate of 10^-4. This is based upon the assumption that software testing 'stresses' the software more than operational use does. How to quantify this compression factor is unknown, and not all authors would agree with this approach, e.g. Littlewood.

Significant hours of testing would still be required, even if a compression factor were applied, to quantify relatively moderate failure rates, i.e. 10^-3 to 10^-6. Thus far more testing is needed than is currently applied to the majority of software projects, whether safety critical or not. Therefore, the use of reliability growth models is rare, and the Sizewell B project is one of the few examples of their use. However, with distributed networks and multi-core processors, gaining the number of testing hours required to quantify the required reliability in the software may now become feasible, even for ultra-reliability.

The other issue with reliability growth models is that they are often applied not to the software artefact on the target hardware but to models of the expected behaviour of the intended software system. While this may enable the required quantity of testing to be performed and automated, the real question is how we qualify these models, make them truly representative of the real system, and gain the acceptance of the Independent Safety Advisor (ISA) or the licensing authority for them.

2.4.4 Fault Based Adequacy Techniques

Error seeding was proposed by [Gilb 77] and [Mills 72] to estimate the number of bugs remaining in the software after testing. Seeded bugs (artificial faults) are planted into the program just before the program is tested; the artificial faults aim to be representative of real-world faults. The testers are unaware of the seeded bugs and of their quantity. The formulae in Figure 3 can be used to estimate the number of bugs left in the system when error seeding is employed.

Number of bugs in system = (Number of seeded bugs × Number of detected bugs) / Number of detected seeded bugs

Bugs remaining in the system = Number of seeded bugs in system − Number of detected bugs − Seeded bugs not detected

Figure 3: Fault Seeding
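As a worked illustration with hypothetical figures: if 20 bugs are seeded, and testing detects 15 of the seeded bugs together with 30 indigenous (non-seeded) bugs, the first formula estimates that roughly 20 × 30 / 15 = 40 indigenous bugs were present, of which around 10 remain undetected. Such estimates are only as credible as the assumption, questioned below, that seeded bugs are as hard to find as real ones.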

Error seeding has been used mainly to assess the adequacy of a test set or to attempt to assess software reliability. [Zhu et al 97] provide a brief summary of the technique and discuss it in terms of measuring the quality of software testing, i.e. as an adequacy criterion. [Davies 93] discussed error seeding in terms of assessing software reliability and indicates a number of advantages of fault seeding over traditional methods of software reliability assessment. [Davies 93] also indicates four main disadvantages of this technique:

 Not all bugs are equal.

 There exists no equilibrium between easy to detect and hard to detect defects. Some bugs are easier to find than others. Planting easier to find bugs would result in a biased estimation of bugs remaining with the opposite being true if the seeded bugs are hard to find.

 One bug may hide another bug.


 Fixing bugs often leads to new bugs being created.

[Beizer 92] makes the same point as the first and last bullet points in a generalisation of software bugs: [Beizer 92] defines 10 bug severities, from mild15 to catastrophic and infectious. The second bullet point raises two issues. Firstly, what constitutes a simple or a hard bug?16 Secondly, [Demillo 78] defined a hypothesis named the 'coupling effect', which linked simple errors to complex errors and which was later verified experimentally by [Offutt 89]; therefore, the second bullet point may not be valid. Another issue, not indicated by [Davies 93], is the randomness of the seeding process, which also affects repeatability. [Zhu et al 97] noted similar limitations to error seeding as [Davies 93] did, and noted that, to overcome them, fault mutation was developed, which introduces faults into programs in a more systematic fashion.

Fault mutation was proposed by [Demillo et al 78] as a method of determining how well a program (P) has been tested; mutation testing is therefore a measure of test set adequacy rather than a testing technique, as [Demillo et al 87] indicated. Mutation testing mutates the original program through the use of mutation operators (MOs). An MO represents a class of real-world faults that is applied to the original program. Table 7 illustrates some possible MOs. [Voas & McGraw 98] detail numerous MOs, including language-specific MOs.

The test set is executed over the mutated program. The output (O) from the mutated program is then compared against that of the original program to see if the test set has distinguished the mutated program from the original. Figure 4 illustrates this: the original program P0 is used as the baseline. The program is executed against the existing test set, resulting in O0. The original program P0 is mutated by selecting a mutation operator MO1, which results in P1. The same test set as applied to P0 is then applied to P1, leading to O1. The outputs of P0 and P1, i.e. O0 and O1, are then compared.

15 [Beizer 92] defines mild as being: 'The symptoms of the bug offend us aesthetically; a mis-spelled output or a misaligned printout'.

16 Simple and hard bugs are not defined by [Davies 93].


Figure 4: Mutation Testing (the test set is applied to the original program P0 and to the mutants P1, P2, ..., Pn produced by mutation operators MO1, MO2, ..., MOn; the resulting outputs O1, O2, ..., On are each compared with the original output O0)

Mutants are said to be 'dead' if the test set distinguishes them from the original program (i.e. there is at least one test case which gives rise to different observable results). 'Live' mutants are those mutants that are not distinguished from the original program and require further investigation. [Demillo et al 78] defined live mutants as those mutants that give the same results as the original. This may be for two reasons, according to [Demillo et al 78] and [Woodward & Halewood 88]. Firstly, the test set might simply be inadequate to detect the mutants; further tests could be created to kill these live mutants, thereby improving the quality of the test set. Secondly, it might be that no test case could ever distinguish the mutant from the original; such mutants are declared 'equivalent'. [Voas & McGraw 98] describe this latter issue as the 'fly in the ointment' of mutation testing, since without manual intervention it usually cannot be determined whether a mutant is equivalent. If there are thousands of equivalent mutants, this impacts the effectiveness of mutation testing in assessing the adequacy of the test set. [Zhu et al 97] indicated that equivalent mutants normally account for only a small percentage; [Offutt et al 93] claims equivalent mutants account for 8.8% of mutants. [Zhu et al 97] indicate that the major expense of mutation testing is perhaps the human cost of determining equivalent mutants.
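As a small illustration (hypothetical Java code, not one of the programs analysed later in this thesis), consider an original method and two mutants produced by a relational-operator-replacement (ROR) style operator; one replacement happens to yield an equivalent mutant, while the other yields a mutant that a single test input can kill.

    public class MutationExample {

        // Original program fragment: returns the larger of two integers.
        static int max(int a, int b) {
            if (a > b) {
                return a;
            }
            return b;
        }

        // ROR mutant 1: '>' replaced by '>='. No input can distinguish this from max,
        // because when a == b both versions return the same value: an equivalent mutant.
        static int maxMutantGe(int a, int b) {
            if (a >= b) {
                return a;
            }
            return b;
        }

        // ROR mutant 2: '>' replaced by '<'. This mutant is non-equivalent.
        static int maxMutantLt(int a, int b) {
            if (a < b) {
                return a;
            }
            return b;
        }

        public static void main(String[] args) {
            // The test input (3, 7) leaves mutant 1 live but kills mutant 2:
            System.out.println(max(3, 7));           // 7 (baseline output O0)
            System.out.println(maxMutantGe(3, 7));   // 7 -> same as O0, mutant stays live (equivalent)
            System.out.println(maxMutantLt(3, 7));   // 3 -> differs from O0, mutant is dead
        }
    }

A mutation-adequate test set would need at least one such killing input for every non-equivalent mutant generated by the chosen operators.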

To measure the adequacy of a test set on P, a mutation score (MS), as shown in Figure 5, is used. It measures the ability of a test set to distinguish mutants from the original program.


MS = D / (M − E)

where D = number of dead mutants, M = total number of mutants and E = number of equivalent mutants.

Figure 5: Mutation Score

An MS of 1 indicates that the test set is fully adequate in detecting that class of non-equivalent mutants; a low MS indicates that the test set is poor at detecting those mutants and that, therefore, those classes of faults may be present in P.
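As a purely illustrative worked instance (the figures below are assumed, not results from this thesis), the score for a hypothetical run might be computed as:

MS = D / (M − E) = 167 / (200 − 14) = 167 / 186 ≈ 0.90

i.e. 200 mutants were generated, 14 were judged equivalent, and the test set killed 167 of the remaining 186 non-equivalent mutants.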

[Woodward & Halewood 88] highlighted the issue of how a dead or live mutant is defined. This has implications for which results should be used to compare the original and the mutated program. One method would be to compare the original and mutated program outputs, i.e. screen or files. The originally proposed method used a character-by-character comparison [Lipton 78]. A less stringent criterion of comparing non-blank characters was also proposed. [Woodward & Halewood 88] indicated two extreme cases to highlight the issue of live and dead mutants. If a program generated no output, all mutants would remain alive, while by inserting output statements (i.e. print statements) at appropriate places in the program, every mutant might be seen to be dead. Therefore the definition of a mutant being alive or dead is dependent on the granularity of observation, i.e. on when the comparison takes place. The granularity of observation of the mutation is one of the reasons for different forms of mutation testing.

Some of the early pioneers of mutation testing, e.g. [DeMillo 78] and [Hamlet 77], applied one mutation operator to the original program, and then compared the final output of the ‘whole’ executed program. This mutation approach is termed ‘strong mutation’. However, one of its main disadvantages is the large number of mutants generated. [Howden 82] estimated that for an n-line program the number of mutants is of order n^2. [Offutt et al 93] noted similar results: a 78-line Fortran program generated 7435 mutants. [Offutt et al 93] also indicated that the majority of the computational cost occurs when running the mutants against the test cases. [Offutt et al 93] refers to studies done by [Budd 80] which indicated the number of mutants is approximately proportional to the ‘number of data references times the size of the set of data objects.’ For a 7-line function this amounted to 44 mutants. For these reasons [Offutt et al 93] proposed selective mutation, based upon studies of simple programming errors in an early study by [Mat 91], who established a set of 22 MOs for Fortran 77 as shown in Table 7.

Mutation Operator (MO) – Description
AAR – Array reference for array reference replacement
ABS – Absolute value insertion
ACR – Array reference for constant replacement
AOR – Arithmetic operator replacement
ASR – Array reference for scalar variable replacement
CAR – Constant for array reference replacement
CNR – Comparable array name replacement
CRP – Constant replacement
CSR – Constant for scalar variable replacement
DER – DO statement end replacement
DSA – DATA statement alterations
GLR – GOTO label replacement
LCR – Logical connector replacement
ROR – Relational operator replacement
RSR – RETURN statement replacement
SAN – Statement analysis
SAR – Scalar variable for array reference replacement
SCR – Scalar for constant replacement
SDL – Statement deletion
SRC – Source content replacement
SVR – Scalar variable replacement
UOI – Unary operator insertion

Table 7: Mothra Mutation Operators (Quoted Directly from [Offutt et al 93])

[Mat 91] further proposed the omission of the Scalar variable replacement (SVR) and Array reference for scalar variable replacement (ASR) MOs due to the number of mutants generated by those MOs. [Offutt et al 93] proposed extending this by omitting the next most prevalent MO. [Offutt et al 93] named this N-selective mutation.

In an attempt to improve mutation efficiency, e.g. to reduce computational cost, [Howden 82] developed ‘weak mutation’. This is based upon mutating simple components in the program. Simple components include references to variables, arithmetic expressions and relations, and boolean expressions. Weak mutation is localised to that component. [Woodward & Halewood 88] described weak mutation as change and undo immediately before and after each single execution of a component, whereas strong mutation testing is described as change and undo before and after the execution of the whole program. [Howden 82] indicated that weak mutation MOs could be applied simultaneously, instead of requiring separate program execution runs as in strong mutation. Weak mutation requires that test sets execute the original and mutated components and distinguish between them, i.e. that the mutated and original components compute different values. However, different values may be computed at the component level while the entire program still computes the same output. Therefore a high MS under weak mutation does not ensure that the classes of errors generated from the mutation operators are avoided.
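The difference between the two observation points can be sketched as follows; this is a hypothetical Java fragment (not one of the thesis programs) in which a component is weakly killed but the fault does not propagate to the program output, so it is not strongly killed:

public class WeakVsStrongMutation {

    // Hypothetical program: the component 'a * b' is mutated by AOR to 'a + b'.
    static int program(int a, int b) {
        int c = a * b;          // value computed by the original component
        int cMutant = a + b;    // value the mutated component would compute
        // Weak mutation observation point: compare c with cMutant here.
        // For a = 3, b = 2 the values differ (6 vs 5), so the mutant is weakly killed.
        int result = (c > 0) ? 1 : 0;   // later computation can mask the difference
        return result;
        // Strong mutation observation point: only the final returned result of the
        // original and mutated programs is compared. For a = 3, b = 2 both versions
        // return 1, so the same input does not strongly kill the mutant.
    }

    public static void main(String[] args) {
        System.out.println(program(3, 2));   // prints 1 for both the original and the mutant
    }
}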

To overcome the weakness of weak mutation and to improve efficiency compared with strong mutation testing, [Woodward & Halewood 88] derived ‘firm mutation’, which has a wider scope than weak mutation but a lesser scope than strong mutation testing. In firm mutation testing, simple errors are introduced into a program and persist for one or more executions but, unlike strong mutation, not for the whole program execution. Trace mutation testing is attributed by [DeMillo 87] to [Brooks 80], whereby instead of comparing program outputs, program traces are compared. However, there would appear to be little further research in this area.

[Delamaro 01] applies mutation to integration testing, using call graphs to establish the interface calls and injecting faults into the interfaces. The paper focuses on the ‘C’ programming language and examines four ways in which C programs pass data between software units: parameters passed by value, parameters passed by reference, global variables and return values. Faults were injected based upon these four mechanisms. These interface mutation operators were then applied to a ‘Unix Sort Utility’ as a practical case study example. Statistical evidence is provided, generated from the results of injecting the mutations and executing the tests. The results, according to [Delamaro 01], indicated that Interface Mutation (IM) can be applied with a relatively small number of mutations to show test case adequacy. However, the costs of undertaking IM may prohibit its use.

Mutation testing is based around two assumptions. The first is the ‘competent programmer hypothesis’: that programmers create programs that are very nearly correct. The second is that there exists a ‘coupling hypothesis’ between simple faults and complex faults, such that tests that reveal simple faults will also reveal more complex ones. [DeMillo et al 78] defined it as:

‘Test data that distinguishes all programs differing from a correct one by only simple errors is so sensitive that it also implicitly distinguishes more complex errors.’

Therefore, you do not need to generate complex test cases to find complex bugs, since by finding simple faults you also find complex faults. [DeMillo 87] notes that these two assumptions have been studied from both theoretical and experimental viewpoints. [Offutt 89] undertook an experiment that used simple faults to generate complex faults. Simple faults were modelled by using MOs. Complex faults were modelled by applying multiple mutations to the program simultaneously, i.e. 2nd-order mutants combined two MOs, with nth-order mutants combining n MOs. The results supported the coupling effect. Later, [Offutt 92] conducted an additional experiment, this time examining the mutation coupling effect. That is:

Complex mutants are coupled to simple mutants in such a way that a test data set that detects all simple mutants in a program will detect a large percentage of the complex mutants.

Again, the [Offutt 92] results supported this hypothesis. However, [Perry 00] noted that studies have been undertaken on these two hypotheses and indicated that they do not hold, although [Perry 00] does not provide any references to support this. [Marick 90] partly supports [Perry 00]’s view relating to the coupling effect. [Marick 90] pointed out that [Offutt 89] uses simple faults to generate complex faults, which only accounts for a small percentage (8%) of real world faults. [Marick 90] found that 47% of faults related to omitted code and 22% of faults related to complex faults in terms of omitted code and changes in code. [Marick 90] noted that strong mutation testing may have low effectiveness and low cost effectiveness. This was based on [Marick 90]’s findings that only 23% of the faults examined in programs were generated by existing mutations and that only 40% of the faults generated are typically made by programmers. The overall conclusion made by [Marick 90] is that there is no evidence either for or against it (this relates to weak mutation testing).


2.4.5 Execution, Infection and Propagation (PIE)

[Voas et al 91], [Voas 92], [Hamlet & Voas 93] and [Voas & McGraw 98] proposed using weak mutation as the infection technique as part of the Execution, Infection and Propagation (PIE)17 analysis to assess software testability, i.e. the ability of a program to hide or reveal faults18. PIE analysis ‘identifies locations in a program where faults, if they exist are more likely to remain undetected during testing’. Locations are assignments, input statements, output statements or the condition part of a ‘while’ or ‘if’ statement.

PIE is made up of three phases, with each phase generating a separate probability estimate for each location. How to generate these three probability estimates is detailed in [Voas 92], [Voas et al 91] and [Voas & McGraw 98]. Execution analysis records the locations executed by each test set input, i.e. a data logging statement is inserted at each location to record that location’s execution. As [Voas et al 91] indicated, ‘each execution estimate is a function of the program and an input distribution’. Infection analysis uses a set of semantically significant mutants. A mutant from the set is selected and the code is mutated and executed numerous times. The data state of the original program is compared with the data state of the mutated program to see if the data state has been infected. The result is an infection estimate for each mutant at each location. [Voas 92] defines a program data state as a set of mappings between all variables and their values at the point of execution. Propagation analysis estimates the probability that an infected data state propagates to the program output. [Voas et al 91] indicated that this is done by repeatedly perturbing the data state that occurs after some location, amending one live variable in the data state in each execution. A live variable is a variable that could affect the output of a program. For example, a dead variable would be a variable that is defined but not referenced and therefore has no impact on the output of the program. The propagation estimate is calculated by examining how the forced changes in the data state affect the program output. [Voas & McGraw 98] indicate that high probability scores indicate a high degree of testability and low scores the opposite.

[Voas and McGraw 98] indicated that the PIE algorithms generate a large quantity of data from the different analyses and different locations. Therefore, [Voas et al 91], [Voas 92] and [Friedman 95] provided ways of collapsing this data into a precise testability metric.

17 [Voas 92] indicate that it should be called EIP but PIE is more memorable.

18 [Hamlet & Voas 93] define testability in terms of the PIE model as ‘The testability of a program P is the probability that if P contains fault(s), P will fail under test.’


The probability estimates from PIE are used to generate a sensitivity indicator for each testing location, which indicates how much testing is required to reveal faults at that location. [Voas et al 91] indicated that a location’s sensitivity can be calculated by multiplying the location’s execution estimate, minimum infection estimate and minimum propagation estimate. A sensitivity score of one would indicate that the program would fail on the first test if a fault were present. A sensitivity of 0.01 would indicate that 100 tests would be required at that location to reveal a fault, if one existed. A program sensitivity score can then be generated from the probability scores obtained for each location. [Voas 92] did note a word of caution relating to the use of PIE: if locations are touched infrequently, this impacts the infection and propagation analyses. Therefore, [Voas 92] proposed ignoring such locations and indicated that PIE is ‘only viable for frequently executed locations’.
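A hypothetical worked instance of the location sensitivity calculation described above (all three estimates are assumed values, not measurements from this thesis):

sensitivity(l) = execution estimate × minimum infection estimate × minimum propagation estimate
             = 0.5 × 0.2 × 0.1
             = 0.01

so roughly 1 / 0.01 = 100 tests would be needed to reveal a fault at location l, if one existed.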

A critique of PIE was undertaken by [Al-Khanjari et al 02], who in general supported the findings of Voas, in that programs with low (zero) or high sensitivity have correspondingly low or high testability. However, [Al-Khanjari et al 02] did note that the final results are ‘quite sensitive’ to minor variations in the parameters applied to the calculations. [Al-Khanjari et al 02] also indicated that automation is essential to enable the use of PIE due to its ‘complex sophisticated calculations’.

[Voas 03] proposes the use of ‘Interface Propagation Analysis’ (IPA) for assessing Commercial-Off-The-Shelf (COTS) applications. Since the source code is not normally available, anomalies are injected into the data feeds that connect the binary components. This requires access to the interfaces between the binary formatted components, for example DLLs, calls to the operating system (OS), calls to an external database etc. Once the anomalies have been injected, IPA observes how these anomalies may have been propagated and what further forms of anomaly may have been created. [Voas 03] noted ‘this form of fault injection is useful for impact analysis and robustness testing.’ But [Voas 03] also noted that, due to the high cost of performing software fault injection at the source code level, the majority of software fault injection applied today is at the interface level.

2.5. What do Empirical and Case Studies tell us about Software Testing?

This section focuses on empirical studies that examine the effectiveness and efficiency of software testing techniques. To complement these empirical studies and to provide a ‘real world’ view of software testing techniques, case studies by [Selby 90], [Berling and Thelin 03] and [Ng et al 04] have also been examined. Primary empirical studies over the last four decades by [Myers 78], [Basili & Selby 87], [Lauterbach & Randall 89], [Kamsties & Lott 95], [Wood et al 97], [Laitenberger 98], [Runeson & Andrews 03], [Andersson et al 03], [Jursito & Vegas 03] and [So et al 02] show no consensus on the effectiveness or the efficiency of the software testing techniques applied. The [Runeson et al 06] secondary empirical study encapsulates the results from some of these primary empirical studies and indicates the same result.

Table 8 summarises the majority of the empirical studies reviewed and shows the following:

 The techniques and programming language used.
 Lines of code (LOC) of the sample programs used to assess the testing techniques.
 Complexity of the sample program.
 Time allocated to each controlled experiment.
 Subjects used i.e. students or professionals.
 Number of defects in each sample program and results.

The techniques investigated in the majority of the primary empirical studies tend to be ‘state of practice’ rather than ‘state of the art’. The majority of the papers applied functional testing based upon equivalence partitioning and boundary value analysis, some form of structural coverage criteria, and code reading. Other black box functional testing techniques detailed in [Beizer 96] and [Binder 00], such as control flow, data flow and state based testing, were not applied. Only [Basili and Selby 87] provides specifications for the software artefacts directly in the paper. Therefore, in the majority of cases it was not possible to determine from the papers reviewed which functional testing techniques may have been more effective and efficient than equivalence partitioning and boundary value analysis. Structural coverage criteria varied from the simplest form, such as statement coverage as applied by [Basili and Selby 87], to more stringent coverage criteria as applied by [Kamsties & Lott 95], i.e. branch, multiple-condition, loop and relational-operator coverage. Only one of the empirical papers applied data flow analysis [So et al 02].

The empirical studies in part replicate the findings of the [Ng et al 04] survey of the software testing practices of 65 organisations in Australia. The survey indicated that black box testing comprising boundary value analysis and random testing was the most commonly applied technique, adopted by 29 out of 65 organisations. The survey also noted that 18 out of 65 organisations applied dataflow analysis, and 3 out of 65 applied mutation testing. However, no organisations reported the use of symbolic analysis, e.g. SPARK or MALPAS.

A case study by [Selby 90] investigated a collection of NASA systems and a manufacturing system, and indicated a bias of test sets towards particular defect distributions. [Selby 90] discovered that for the NASA systems the first 35% of testing detected 55% of all failures and the last 35% of testing detected 12% of all failures. In the manufacturing system, the first 15% of test cases detected 67% of the high severity failures and 50% of all failures. The last 33% of test cases detected the last 10% of high severity failures. [Selby 90] also concluded the following (quoted directly from [Selby 90]):

 Multiple fault detection and testing phases may result in a significant increase in reliability or none at all.

 Composite measures of system reliability did not adequately reflect reliability at the function or compound level.

 Developers were biased towards portions of systems that would be heavily tested.

 Fault proneness of reused or modified components was 74 percent less than that of newly developed components.

 Systems with more reused software had lower components development effort, but not lower component fault proneness.

[Berling & Thelin 03] discovered during their case study of V & V activities at Ericsson in Sweden that programmers often implemented the code correctly even though the requirements were incorrect, since the programmers had more domain understanding. However, [Berling & Thelin 03] discovered that this led to lower-level design issues such as timing and interface faults.

[Runeson et al 06] and [Laitenberger 98] attempted to generate direct comparisons of the controlled experiments from other empirical studies. [Laitenberger 98] noted that defect detection rates of between 30–50% were observed in the papers19 reviewed in that paper. Papers like [Kamsties & Lott 95] and [Jursito & Vegas 03] follow the same methodology and use similar programs to [Basili & Selby 87], but there are differences. [Kamsties & Lott 95] include an additional step for fault isolation, i.e. locating the fault that caused an observed failure, and record the time required to undertake this step. The C programming language is used instead of Fortran or Simpl-T. The [Kamsties & Lott 95] structural coverage criteria include branch, multiple-condition, loop and relational-operator coverage, [Jursito & Vegas 03] use branch and statement coverage, and [Basili & Selby 87] used only statement coverage. [Myers 78] did not define specific functional or structural coverage criteria. Other papers, like [Laitenberger 98], [Lauterbach & Randall 89] and [So et al 02], do not use the same type of programs as [Basili & Selby 87]: [Lauterbach & Randall 89] use post-development code from the USAF and DOD and [So et al 02] used a Launch Interceptor Program (LIP). The time allocated to the techniques also varies from paper to paper. Some papers, like [Kamsties & Lott 95] and [Jursito & Vegas 03], apply no specific time limits, while other papers, like [Runeson & Andrews 03] and [Andersson et al 03], define maximum amounts. [Jursito & Vegas 03], [Kamsties & Lott 95], [Basili & Selby 87] and [Wood et al 97] all use code reading via stepwise abstraction; [Myers 78] does not. The [Basili & Selby 87] code contains no comments, while the [Kamsties & Lott 95] code contains a small number of lines of comments.

The programs and programming languages applied in the studies vary. [Myers 78] used a PL/I formatting program based upon an Algol program, while [Basili & Selby 87], [Kamsties & Lott 95] and [Jursito & Vegas 03] all contain a similar formatting program; the [Basili & Selby 87] version was developed in Fortran, and those of [Kamsties & Lott 95], [Wood et al 97] and [Jursito & Vegas 03] were developed in C. [Lauterbach & Randall 89] did not define the programming language applied, while Pascal was used in [So et al 02]. None of the papers reviewed focuses on, or even discusses, the issue of programming language and whether different language characteristics affect the effectiveness and efficiency of the testing techniques applied. [Cullyer et al 91] discussed a number of issues relating to the selection of programming languages for safety critical projects and the need for a well defined sub-language for safety related projects (a safe subset). All the empirical papers reviewed ignore such issues.

One interesting observation by [Lauterbach & Randall 89] was that the buggiest software units were those units with low complexity and the smallest LOC counts. However, [Lauterbach & Randall 89] do not discuss the types of faults found.

19 These papers included: [Hetzel 76], [Myers 79], [Basili & Selby 87], [Kamsties 03] and [Wood et al 97].


[Jursito & Vegas 03] examine the location of the fault in the source code listing in relation to the detection of that fault, i.e. whether it lies in the upper or lower quartile. The findings indicated that the location of the fault in the source code listing makes no difference to fault finding. A number of the papers ([Basili & Selby 87], [Wood et al 97], [Kamsties & Lott 95] and [Jursito & Vegas 03]) indicated that the effectiveness of the techniques is affected by the program being tested. However, these papers only considered a small number of different programs, i.e. a maximum of 3. [Perry 00], however, defined 16 different types of software program20, and each type may exhibit similar types of defects.

[Selby 86] is one of the few papers reviewed to indicate a bug severity, when post-reviewing a manufacturing based system. [Thelin & Berling 03] define a bug severity based around, firstly, the need for a new software build and, secondly, the number of days to fix the bug. [Lauterbach & Randall 89] refer to bug severity as a topic for future investigation in their paper, but do not provide one for the defects discovered. Other papers, like [Basili & Selby 87], [Kamsties & Lott 95], [Jursito & Vegas 03], [Wood et al 97], [Myers 78], [Laitenberger 98], [Runeson & Andrews 03] and [Andersson et al 03], simply ignored the issue. [Beizer 90] defined 10 different defect severities. [IEEE 1044] defined 5 (Urgent, High, Medium, Low and None). Defect severities are important but subjective: the severity of a defect may be seen very differently from the software developer and end user perspectives. Similarly, a technique that discovers more critical failures than less critical ones may be better. Furthermore, it may be possible to work around minor defects.

The empirical studies indicated that a single defect detection method achieves a fault detection rate of between 25-50% for inspection and 30-60% for testing (functional and structural). Therefore, either the validity of the results from these empirical studies is flawed (i.e. other factors are involved in increasing fault detection effectiveness) or software is released containing numerous software faults. [Myers 78], [Selby 86] and [Wood et al 97] indicated that the techniques applied are complementary, while [Laitenberger 98] indicates this is not the case. [Myers 78] discovered that defect detection increases by between 26-84%21 when the techniques are combined.

20 Batch, Event Control, Process Control, Procedure Control, Advanced Mathematical Models, Message Processing, Diagnostic Software, Sensor and Signal Processing, Simulation, Database Management, Data Acquisition, Data Presentation, Decision and Planning aid, Pattern and Image Processing, Computer System Software, Software Development Tools.

21 This range was calculated by using the data provided in [Myers 79] paper.


[Wood et al 97] showed an increase of between 23-58%22 in fault detection when combining testing techniques, compared with the best single technique. These empirical study findings led the [Runeson et al 06] secondary study to indicate that primary and secondary defect detection plays a critical role in defect detection. However, [Selby 90] indicated that multiple levels of fault detection may or may not increase software reliability. Industrial practice in general uses multiple levels of fault detection on both sides of the V-Model during the whole software development life cycle. The majority of the empirical studies reviewed ignore this fact.

[Selby 86] indicated that the six combined techniques23 detected 17.7% more program faults on average than did the three single techniques24, which was a 35.5% improvement in fault detection. However, [Laitenberger 98] compared code reading followed by structure-based testing. The results indicated that code reading detected 38% of errors, which [Laitenberger 98] claims is consistent with other empirical studies’ findings. However, structural testing detected 9% of errors, which according to [Laitenberger 98] is well below the range indicated by other empirical studies, i.e. 34-55%. Therefore, [Laitenberger 98] concluded that code reading and structural coverage detected the same types of defects and that these two techniques are therefore not complementary.

Numerous papers in the testing literature, e.g. [Myers 78], [Beizer 84], [Binder 00], [Nakajo & Kume 91], [Lutz 92] and [Perry 87], have stressed the importance of interface faults, both at the software integration level and at the system level, i.e. software and hardware integration. [Nakajo & Kume 91] and [Lutz 92] both separated interface faults into three broad categories: ‘program’, ‘human’ and ‘process’. [Nakajo & Kume 91] noted that 56.9% of all program faults were interface faults. [Lutz 92] noted that 36% of safety critical program faults were interface faults for the Voyager space program and 19% for the Galileo space program. [Lutz 92] noted that the primary cause of safety critical interface faults was misunderstandings of the hardware interface specifications, accounting for 67% for Voyager and 48% for Galileo.

22 This range was calculated by using the data provided in [Woods 97] paper.

23 Two individuals code reading; one individual code reader and one individual functional testing (Equivalence partitioning and boundary value analysis); one individual code reading and one individual structural testing (100% statement coverage), two individual functional testing, one individual functional testing and one individual structural testing and two individual structural testing.

24 Code Reading, Functional Testing (Equivalence partitioning and boundary value analysis) and Structural Testing (100% statement coverage).


Process flaws, such as flaws in control of system complexity (i.e. the interface not adequately identified or understood), accounted for 70% of interface faults for Voyager and 85% for Galileo, of which 56% and 87% respectively were safety critical interface faults. This was down to two key factors: the interface specification was either not documented or not communicated to other teams.

Only a limited number of software-software interface faults are focused on in the empirical studies. They focus on the actual fault and not the underlying causes. The empirical studies avoid hardware-to-software integration faults. [Perry 87] defined 15 interface fault categories. The faults in [Selby & Basili 87] and [Myers 78] would only have covered 2 of the 15 categories defined by [Perry 87], i.e. data structure alteration and inadequate error processing. The empirical papers reviewed would barely cover a handful of the interface faults detailed in the testing literature, i.e. [Perry 87], [Myers 79], [Binder 00] and [Beizer 90].

[So et al 02] indicated that voting (back-to-back testing25) was the most effective in defect detection, but noted that this approach would not be applied in practice due to cost. However, [Selby 86], [Wood et al 97] and [Myers 78] all indicated that defect detection effectiveness increased when the same technique was applied by independent individuals. [Myers 78] also indicated that this is cost effective since the different individuals detected different defects. The reasons why this may be the case are unclear. However, one reason may be the experience of the subjects involved in the empirical studies. [Selby 86] indicated that experience plays a critical role in increasing fault detection effectiveness. [Basili & Selby 87] used professional and student subjects, while [Kamsties & Lott 95] used students with programming experience and [Jursito & Vegas 03] used subjects with little programming experience; [Wood et al 97] also used students. [Basili & Selby 87], [Lauterbach & Randall 89] and [Myers 78] are the only papers reviewed which use professional programmers and not only students. [Lauterbach & Randall 89] noted that subjects are more important to defect detection than the techniques applied. While [Basili & Selby 87] do not make this specific point, they did highlight that different subjects gained different results, with professionals being more efficient than students. These ‘softer’ issues related to software testing, i.e. experience, may account for the variance in technique effectiveness and require further research, as [Runeson et al 06] indicated.

25 [So et al 02] subjects tested a number of different versions of the program. Back-to-back testing entails a cross-comparison of outputs obtained from functionally equivalent software components.


The empirical studies show no consensus on the effectiveness of the individual testing techniques. These techniques should be seen as state of practice rather than state of the art. However, they are still commonly applied today, as [Ng et al 04] illustrated. More formal methods such as symbolic analysis, e.g. SPARK and MALPAS, are rarely used outside the safety critical domain. The empirical studies indicated that the effectiveness of the techniques is dependent on a number of factors, including the type of program and the expertise of the tester. The empirical studies focus on functional errors and ignore other factors that would impact finding faults, for example testability, non-functional properties, the type of programming language applied, coding standards, fault masking etc., even though real world applications all include some kind of non-functional requirements, such as safety, security, reliability and performance. This is in part due to the small size of the programs used in the empirical studies’ experiments, which does not permit such issues to be examined.

The testing techniques have shown limited effectiveness when applied in isolation but have been shown to be more effective when applied in combination. However, even here there is no complete consensus, since [Laitenberger 98] illustrated that code inspection followed by structural coverage is not complementary and that code inspection somehow inhibits structural coverage or, in other words, that the two structure-based techniques find the same defects. A further problem, even with the empirical studies that combine the individual techniques, is that they do not take into account that testing is normally applied at multiple phases in the development cycle; therefore, the results of these empirical studies do not match reality.


[Myers 78] – Techniques: CR, FT, ST; Language: PL/I; LOC: 63; Complexity: –; Time: no specific time limit, but subjects recorded the time spent undertaking testing; Subjects: 59 IBM subjects with varied experience; Defects: 15; Results: CR>ST>FT; none of the techniques is very good in isolation, but they are useful in combination.

[Basili & Selby 87] – Techniques: CR (stepwise abstraction), ST (SC), FT (EQ & BVA); Language: Simpl-T & FORTRAN; LOC: 4 programs, 145-365 LOC (48-144 ELOC); Complexity: 18-57; Time: no specific time limit over the three days; Subjects: 32 professionals and 42 junior and senior students; Defects: 6-12; Results: professionals CR>FT>ST; seniors CR=FT>ST; juniors CR=FT=ST.

[Lauterbach & Randall 89] – Techniques: CR, ST (BT), FT, RT; Language: not listed; LOC: 29-680; Complexity: 6-91; Time: allocated time and actual time recorded; Subjects: 4 experienced testers; Defects: –; Results: CR best for CSU, BT best at CSC.

[Runeson & Andrews 03] – Techniques: CR, ST (BT); Language: C; LOC: two programs, 190-208; Complexity: 27 for each program, 9 procedures; Time: 2 hours for each technique; Subjects: 27 students; Defects: 9 in each program; Results: ST>CR; the techniques are complementary.

[Andersson et al 03] – Techniques: UBR (design), UBT; Language: –; LOC: two different programs; Complexity: –; Time: 3 hrs 45 mins and 2 hrs 45 mins; Subjects: 51 fourth-year masters-level students; Defects: 13 and 14 faults; Results: UBR>UBT.

[Wood et al 97] – Techniques: CR, FT, ST (BT); Language: C; LOC: 200, used on three different programs; Complexity: –; Time: 3 hrs each exercise; Subjects: 47 students who had completed 2 years of programming classes; Defects: 8-9 faults in each program; Results: ST>FT>CR; combining the techniques led to the best results.

[Laitenberger 98] – Techniques: CR, ST (BT, LC, MCC, RC); Language: C; LOC: 132; Complexity: –; Time: 2 hrs; Subjects: 20 students; Defects: –; Results: the techniques are not complementary.

[Jursito & Vegas 03] Replication 1 – Techniques: CR (stepwise abstraction), ST (SC, DC), FT (EQ & BVA); Language: C; LOC: 200, used on three different programs; Complexity: –; Time: no time limit; Subjects: 196 students with little experience; Defects: 9 in each program; Results: FT>ST>CR.

[Jursito & Vegas 03] Replication 2 – Techniques: CR (stepwise abstraction), ST (SC, DC), FT (EQ & BVA); Language: C; LOC: 200, used on three different programs; Complexity: –; Time: no time limit; Subjects: 196 students with little experience; Defects: 7 in each program; Results: FT=ST>CR.

[Kamsties & Lott 95] Replication 1 – Techniques: CR (stepwise abstraction), ST (BT, MCC, LC, RC), FT (EQ & BVA); Language: C; LOC: 235-251, used on three different programs; Complexity: –; Time: minimum of 3 hrs, no other time limit specified; Subjects: 27 students; Defects: 6-9 faults; Results: FT=ST=CR; however, FT was best for efficiency.

[Kamsties & Lott 95] Replication 2 – Techniques: CR (stepwise abstraction), ST (BT, MCC, LC, RC), FT (EQ & BVA); Language: C; LOC: 235-251, used on three different programs; Complexity: –; Time: minimum of 3 hrs, no other time limit specified; Subjects: 23 students; Defects: 6-9 faults; Results: FT=ST=CR; however, FT was best for efficiency.

Table 8: Empirical Studies Result Summary

Table 8: Key

ST – Structural Testing, FT – Functional Testing, CR – Code Reading, BT – Branch Testing, MCC – Multiple Condition Coverage, SC – Statement Coverage, EQ – Equivalence Partitioning, BVA – Boundary Value Analysis, DC – Decision Coverage, RT – Random Testing, UBR – Usage Based Reading, UBT – Usage Based Testing, LC – Loop Coverage, RC – Relational Coverage, BC – Branch Coverage.


2.6 When Can We Stop Testing?

2.6.1 Introduction

This section focuses on when to stop software testing. For airborne software that follows [178B] and [178C], software unit testing is stopped when the appropriate level of source code or object code coverage has been achieved. The effectiveness and reliability of these and other stop-testing criteria is the central focus of this thesis. Our four effectiveness and reliability research objectives are restated below:

1. To measure the test effectiveness of the three coverage criteria (Statement, Branch and MC/DC) mandated by a widely used commercial airborne software standard for safety critical software, DO-178B [178B], and its recently updated version DO-178C [178C].

2. To measure the reliability of those three coverage criteria by comparing the effectiveness of multiple minimal-size test sets meeting these criteria.

3. To measure the effectiveness and reliability of the three widely used coverage criteria used in commercial airborne safety critical software, e.g. [178B] and [178C], with test sets that have a small degree of redundancy. To add redundancy we plan to combine the different optimum coverage test sets.

4. To measure the test effectiveness of three reference test sets developed to test a DES algorithm.

This section provides an overview of research into adequacy criteria in the software testing literature, to provide context for our research objectives. The literature is primarily focussed on general purpose coverage criteria, with extraordinarily little on the use of domain-specific test suites.

2.6.2 Can We Stop Testing Now?

A number of criteria have been proposed, and [Goodenough and Gerhart 75] attempted to formalise this question by defining two requirements for a testing criterion: ‘reliability’ and ‘validity’. However, [Weyuker & Ostrand 80] showed that these two requirements were not independent, since ‘a criterion is either reliable or valid for any given software’. Therefore, the testing literature has moved away from the [Goodenough and Gerhart 75] requirements, focusing instead on the use of adequacy criteria as ‘stopping rules’ for software testing and as measurements of test quality. For example, stopping testing after 100 percent statement coverage can be seen as a stopping rule or adequacy criterion. This section examines the issues surrounding when to stop testing and the different criteria that have been proposed.

Testing should be stopped when the goals of testing have been achieved. However, these testing goals can vary in strength and generally focus on the program structures or specification properties. For example different structural coverage criteria vary in difficulty. The statement coverage criterion is the easiest to achieve but is seen in the testing literature as the weakest structural coverage criterion as [Myers 79] illustrated. Therefore branch/decision structural coverage criteria are often seen as the minimum coverage criteria for most programs.

[Perry 00] indicated three major failures of testing:

 Not setting testing goals before testing starts.
 Testing at the wrong phase in the life cycle.
 Ineffective testing techniques.

[Perry 00] attempted to illustrate the optimal level of testing by using a modified economic supply and demand curve, illustrated in Figure 6. Line cc in Figure 6 shows the cost of testing and line dd the number of defects found. The optimal extent of testing and defects discovered is at point E, the equilibrium between test cost and defects discovered via testing. At any point to the left of E there is under-testing: the defects discovered during testing outweigh the cost of testing. At any point to the right of E the cost of testing outweighs the defects discovered.

Figure 6 has considerable appeal but is an over-simplification. The shape of the defect line could be very elastic, i.e. very responsive to the testing applied, or inelastic, i.e. respond little to the testing. It is highly likely that the testability of programs varies between programs. A simple defect and cost curve does not take into account the different criticalities of software. Since more rigorous V & V activities are applied to safety critical software than to computer gaming software, the shapes of the defect and cost lines for systems of different criticality would be very different. Figure 6 also assumes that all defects are equal. This is not the case, as [Beizer 84] clearly indicated, with different defects having different consequences and impacts.


[Figure 6 plots the quantity of defects (line dd) and the cost of testing (line cc) against the extent of testing; the under-test region lies to the left of the equilibrium point E and the over-test region to the right.]

Figure 6: Generic Demand and Cost Curve (Quoted Directly from [Perry 00])

Defect detection is affected by the quality of the test cases, as [Myers 79] indicated. Poor quality test sets are encouraged when the goal of testing is to find no defects. The primary goal of testing, as [Myers 79] indicated, is to show that a program contains defects. It is normally infeasible to test every possible path or the whole input space of a software program, therefore absolute correctness cannot be achieved. Due to these limitations of testing, the objective of testing must shift away from ‘absolute proof’ to a ‘suitably convincing demonstration’, which implies a quantitative measure, which in turn implies a statistical measure of software reliability [Beizer 90]. Therefore, the goal of testing is to demonstrate that the probability of failure due to hibernating defects is low enough to accept. The acceptable level will depend on the type of system being developed; a computer game will have a much higher acceptable level than a safety critical system. Therefore, testing goals relate to relative correctness, i.e. to capturing specific classes of faults. How good the testing is depends on the strengths and weaknesses of these goals or testing adequacy criteria.

Test adequacy criteria have generated much debate in the software testing literature. [Goodenough and Gerhart 75] raised the initial question ‘what is a test criterion’ in order to define an adequate test, i.e. a test which, if successful, would imply no errors in the program under test. [Goodenough and Gerhart 75] define two requirements for a test criterion: reliability and validity. Reliability requires that a test set always generates ‘consistent’ results in revealing defects. This is not to say that the test set reveals all defects or that the results are meaningful. Validity requires that every test always generates meaningful results, regardless of the ability of that test to generate consistent results. A test set which incorporates these two requirements would successfully test a program. However, [Gardiner 99] indicated these two requirements were too stringent to enable practical adequacy criteria. [Zhu et al 97] noted that academic research questioned these two requirements. [Howden 76] indicated that there is no computable criterion that satisfies the two requirements, and hence they are not practically applicable, and [Weyuker and Ostrand 80] noted that the two requirements are not independent, since a criterion is either reliable or valid for any given program.

Due to the issues surrounding [Goodenough and Gerhart 75] reliability and validity, the software testing literature has focused on the development of adequacy criteria as stopping rules and as measurements of test quality. Stopping testing after 100 percent statement coverage can be seen as a stop-rule adequacy criterion. The level of structural coverage (i.e. statement, branch etc.) gained by the test set is a measurement criterion and moves away from declaring a test set as simply good or bad. The ‘stop testing’ rule can thus be seen as a continuous adequacy measure, and code coverage is commonly used both as the adequacy stop rule and as the measure. Therefore, the adequacy stop criterion and adequacy measures are closely related, as both [Zhu et al 97] and [Gardiner 99] note.

Testing goals can be applied to all phases in the development cycle and to different characteristics of testing, i.e. functional, structural, dynamic and static. However, testing goals in the testing literature mainly focus on the dynamic testing phases, i.e. on the right side of the V-Model: unit, integration, system and acceptance. Defining testing goal criteria early in the development life cycle limits the impact on dynamic testing when delays occur in earlier phases. [Myers 79] notes that one weak stop criterion is to stop when all tests executed pass without error. This commonly used criterion is dependent on how good the testing is. An even weaker criterion is to stop testing after a length of time or when resources are exhausted. While this type of stop criterion would appear ‘laughable’, [Ng et al 04] discovered several organisations still using it. [Ng et al 04] also discovered that other stop criteria included stopping testing after all critical or show-stopping defects have been removed. However, as [Ng et al 04] noted, the methodology used to determine this was neither formal nor methodical.

[Myers 79]’s preferred stop-testing criterion is to estimate the number of errors in the software and then capture these defects during the testing phases. However, error estimation is not exact and, as [Beizer 90] notes, all defects are not equal. A more precise formal approach based upon defect counting is fault mutation: testing would be stopped when all non-equivalent mutants are killed. Mutation testing illustrates a fault based adequacy criterion that examines the ability of a test set to find faults in a program. Other adequacy criteria include structural coverage, to cover all or specific parts of the program. Three classes of structural measures are: control flow, e.g. statement and branch; dataflow, e.g. all definitions and all uses; and program text based, e.g. compound conditions and linear code sequence and jump, as illustrated in Figure 7. The other class of adequacy criteria is error based, in that the techniques find specific types of faults, i.e. domain based testing.

An alternative to the program structure and specification criteria of Figure 7, not itself shown in the figure, is to use software reliability models to quantify when software testing should be stopped, as proposed by [Musa 89]. Reliability growth models were discussed in section 2.4.3. It should be noted that very few programmes have used this approach, Sizewell B being one of the few exceptions; however, it is the only quantifiable approach currently available to determine when to stop testing. There has been little research attempting to quantify how effective the currently applied software testing techniques are, in terms of finding defects and achieving overall software reliability.

[Figure 7 presents a taxonomy of test adequacy criteria, organised into program-based and specification-based adequacy criteria:
Program based – Fault based: error seeding; strong, weak and firm mutation adequacy.
Program based – Control flow based: path coverage, simple path coverage, elementary path coverage, length-n path coverage (n>0), level-i path coverage (i>0), the cyclomatic number criterion, branch coverage and statement coverage.
Program based – Data flow based: all-definition, all-use, all-computation-use/some-predicate-use, all-predicate-use/some-computation-use, all-predicate-use, all-computation-use and all-definition-use-path coverage, required-k-tuple criteria (k>1), ordered context coverage and context coverage.
Program based – Program text based: compound condition coverage, linear code sequence and jump coverage, and TERn criteria (n>0).
Program based – Error based: Nx1, NxN and VxV domain adequacy and functional adequacy.
Specification based – Fault based: specification mutation adequacy.
Specification based – Error based: Nx1, NxN and VxV domain adequacy and functional adequacy.
Specification based – Coverage criteria: specification syntax coverage.]

Figure 7: Testing Techniques Criteria Adequacy Taxonomies (Quoted Directly from [Gardiner et al 00])

2.7 Test Set Sub-setting via Heuristic Search

A number of papers by [Offutt et al 95], [Harrold et al 93], [Harrold & Jones 08] and [Harrold & Jones 03] have investigated test set reduction. [Offutt et al 95] developed a heuristic-based technique called ‘ping pong’ that generated new test sets by selecting tests starting from the beginning, middle or end of the test suite. The testing criterion was primarily based upon MS, with test cases being removed that did not improve the MS. A single empirical experiment based upon statement coverage was undertaken. [Offutt et al 95] focused more on reducing the test set size, and the cost savings associated with requiring fewer tests to meet the criterion, than on the reliability of the criterion. The same is true of [Harrold et al 93], which applied a dataflow criterion to a number of small (in lines of code (LOC)) procedures. This paper used a heuristic test reduction method based upon the relationships between test cases and test requirements: by establishing the relationship of test cases to test requirements, duplication can be observed and test case reduction can be performed. [Harrold & Jones 08] applied test-suite reduction based upon statement reduction (code statements) and statement vectors (the set of statements executed by one test case) and examined the impact on fault localisation. [Harrold & Jones 03] re-ordered the execution of test cases to meet the MC/DC criterion in order to reduce the number of test cases required.

Our need for test set reduction is based upon meeting known criteria. Therefore, to examine how we could obtain the optimum number of tests to achieve those coverage criteria from a large number of possible solutions, we examined the following heuristic search based techniques:

 Greedy Algorithms
 Hill Climbing based algorithms
 Simulated Annealing
 Genetic Algorithms

A greedy algorithm uses a series of steps to find the solution, top down. It selects the choice with the greatest immediate improvement. However, such immediate ‘greediness’ may not achieve the global optimum. (Sometimes a better result might be obtained by forsaking immediate gain for longer-term gain. This is typical of so-called non-linear optimisation problems.) What makes greedy algorithms so popular, according to [Cormen et al 09] and [Michalewicz & Fogel 04], is their simplicity.

[Michalewicz & Fogel 04] show how greedy algorithms do not provide good solutions to complex problems with interacting parameters such as SAT, TSP or NLP. [Cormen et al 09] is more positive about the use of greedy algorithms and provides a number of examples where greedy algorithms do produce an optimal solution, i.e. minimum-spanning-tree algorithms, Dijkstra’s algorithm for shortest paths from a single source and Chvátal’s greedy set-covering heuristic. [Cormen et al 09], unlike [Michalewicz & Fogel 04], covers the ‘greedy-choice property’: that you can assemble a globally optimal solution by making locally optimal (greedy) choices, and therefore do not need to consider the subproblems. If you can prove that this property holds, then a greedy algorithm can be used. To demonstrate this point, [Cormen et al 09] uses the 0-1 knapsack problem and the fractional knapsack problem. For the latter a greedy algorithm can be used, while for the former it cannot, since it does not produce the optimum solution to the 0-1 knapsack problem.

A greedy algorithm design has the following steps according to [Cormen et al 09] (a sketch applying the greedy choice to test subset selection follows the steps):

1. Cast the optimization problem as one in which we make a choice and are left with one sub problem to solve.

2. Prove that there is always an optimal solution to the original problem that makes the greedy choice, so that the greedy choice is always safe.

3. Demonstrate optimal substructure by showing that, having made the greedy choice what remains is a sub problem with the property that if we combine an optimal solution to the sub problem with the greedy choice we have made, we arrive at an optimal solution to the original problem.
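The sketch below applies the greedy choice to coverage-based test subset selection, in the spirit of Chvátal’s set-covering heuristic mentioned above. It is a hypothetical Java fragment with assumed data structures (a map from test case number to the set of code elements it covers), not the subset extraction code used in this thesis: at each step it picks the test that covers the most still-uncovered elements.

import java.util.*;

public class GreedyTestSelection {

    // coverage.get(t) is the set of code elements covered by test t (assumed input).
    static List<Integer> selectSubset(Map<Integer, Set<Integer>> coverage,
                                      Set<Integer> elementsToCover) {
        Set<Integer> uncovered = new HashSet<>(elementsToCover);
        List<Integer> chosen = new ArrayList<>();
        while (!uncovered.isEmpty()) {
            int bestTest = -1;
            int bestGain = 0;
            // Greedy choice: the test giving the greatest immediate gain.
            for (Map.Entry<Integer, Set<Integer>> e : coverage.entrySet()) {
                Set<Integer> gain = new HashSet<>(e.getValue());
                gain.retainAll(uncovered);
                if (gain.size() > bestGain) {
                    bestGain = gain.size();
                    bestTest = e.getKey();
                }
            }
            if (bestTest == -1) break;          // remaining elements are uncoverable
            chosen.add(bestTest);
            uncovered.removeAll(coverage.get(bestTest));
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> cov = new HashMap<>();
        cov.put(1, new HashSet<>(Arrays.asList(1, 2, 3)));
        cov.put(2, new HashSet<>(Arrays.asList(3, 4)));
        cov.put(3, new HashSet<>(Arrays.asList(4, 5)));
        System.out.println(selectSubset(cov, new HashSet<>(Arrays.asList(1, 2, 3, 4, 5))));
    }
}

Because each choice is only locally optimal, the resulting subset satisfies the coverage criterion but is not guaranteed to be of minimum size.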

Hill Climbing is a local search method, which may find the local optimum, but may not find the global optimum in the search space. Hill Climbing generates a solution that meets the constraints of the problem. It then attempts to improve the solution by incrementally changing a single element of the solution, until no further improvement can be found. In short you start at the bottom of the hill and walk up to the top of the hill. [Corne et al 99] defines the steps in a simple Hill Climbing algorithm:

1. Begin – Generate and evaluate an initial ‘current’ solution s.
2. Operate – Change s, producing s’ and evaluate s’.
3. Renew – If s’ is better than s, then overwrite with s’.
4. Iterate – Unless a termination criterion is met, return to 2.

There exist a number of variants of Hill Climbing, e.g. steepest ascent or stochastic hill climbing. However, as [Michalewicz & Fogel 04] point out, Hill Climbing algorithms have a number of weaknesses:

 They usually terminate at solutions that are only locally optimal.

 There is no information as to the amount by which the discovered local optimum deviates from the global optimum or perhaps even other local optima.

 The optimum that’s obtained depends on the initial configuration.

 In general it is not possible to provide an upper bound for the computation time.

Hill-climbing is fairly straightforward to implement. In addition, it often forms a component of more sophisticated techniques, e.g. to provide efficient fine grain optimisation at the end of a run of another optimisation technique. (Genetic algorithms for example may often produce solutions that are close to locally optimal solutions, but an aggressive hill-climb is used to climb the remaining part of the ‘hill’.)
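The four steps above can be instantiated for test subset minimisation roughly as follows. This is a hedged Java sketch with assumed data structures, not the Simple Hill Climbing implementation developed for this thesis: the current solution is a subset that satisfies the coverage criterion, a move removes one test, and a move is kept only if coverage is preserved.

import java.util.*;

public class HillClimbSubset {

    static boolean coversAll(Set<Integer> subset,
                             Map<Integer, Set<Integer>> coverage,
                             Set<Integer> required) {
        Set<Integer> covered = new HashSet<>();
        for (int t : subset) covered.addAll(coverage.get(t));
        return covered.containsAll(required);
    }

    // Begin with the full adequate test set; Operate by removing one test;
    // Renew if the smaller set still meets the criterion; Iterate until no
    // single removal is possible (a local optimum, not necessarily global).
    static Set<Integer> minimise(Map<Integer, Set<Integer>> coverage,
                                 Set<Integer> required) {
        Set<Integer> current = new HashSet<>(coverage.keySet());
        boolean improved = true;
        while (improved) {
            improved = false;
            for (int t : new ArrayList<>(current)) {
                Set<Integer> candidate = new HashSet<>(current);
                candidate.remove(t);
                if (coversAll(candidate, coverage, required)) {
                    current = candidate;   // accept the improving move
                    improved = true;
                    break;
                }
            }
        }
        return current;
    }
}

The loop terminates at a locally optimal subset: no single test can be removed without losing coverage, although a smaller adequate subset may still exist elsewhere in the search space.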

To overcome the local optimum problem with hill climbing based techniques, a number of authors have recommended Simulated Annealing (SA) [Michalewicz & Fogel 04]. An alternative to SA is to use Genetic Algorithms (GAs); GAs are discussed in detail in section 3.1.5.1. Simulated annealing is based on a loose analogy with the annealing process of molten metals.

Figure 8 illustrates the structure of the SA algorithm. A starting solution is selected at random and evaluated. The search is guided by a parameter known as the temperature (T). The initial temperature is set to some ‘high’ value, and the value of the temperature is progressively lowered as the search proceeds. At each temperature level a number of candidate ‘moves’ are considered. In a move, a new neighbouring solution to the current one is selected and evaluated. If the new solution is better than the current one, i.e. eval(Vn) − eval(Vc) > 0, the current solution is replaced with the new solution (Vc ← Vn). This is the hill-climbing component of the algorithm: improving moves are always accepted. However, even if the new solution is not as good, it may still replace the current one, with probability exp(Δ/kT) where Δ = eval(Vn) − eval(Vc), assuming we are maximising. Thus, the higher the temperature, the more likely a non-improving move is to be accepted; additionally, the worse a non-improving move is, the less likely it is to be accepted. This allows the search to move downhill temporarily in order to escape a local optimum. After a set number of such moves have been considered, the temperature is lowered and the process repeats until some stopping criterion is met (typically a maximum limit on moves considered or on the number of consecutive non-accepted moves).


[Figure 8 outlines the SA loop: select a starting solution Vc at random, evaluate candidate neighbouring solutions Vn, replace the current solution (Vc ← Vn) when a move is accepted, repeat for a maximum number of attempts at the current temperature, then reduce the temperature and continue until the stopping temperature is reached.]

Figure 8: SA Structure
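The acceptance rule at the heart of the loop can be sketched as follows (a hypothetical Java fragment assuming a maximisation objective, with the Boltzmann-style constant k folded into the temperature):

import java.util.Random;

public class AnnealingAcceptance {

    // Decide whether a neighbouring solution replaces the current one.
    // delta = eval(Vn) - eval(Vc); improving moves are always accepted,
    // worsening moves are accepted with probability exp(delta / temperature),
    // which shrinks as the temperature is lowered.
    static boolean accept(double delta, double temperature, Random rng) {
        if (delta > 0) {
            return true;
        }
        return rng.nextDouble() < Math.exp(delta / temperature);
    }
}

Early in the search, when the temperature is high, even poor moves are accepted fairly often, which is what allows SA to escape the local optima that trap a pure hill climb.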

2.8 Summary

This chapter has provided a wide overview of the software testing literature, from fundamental concepts, to the characteristics of testing, to when to stop testing. By reviewing empirical studies of the effectiveness of software testing, this literature review has shown that there is no consensus on the effectiveness of different testing techniques. However, none of these empirical studies has examined the effectiveness and reliability of statement, branch and MC/DC coverage or of reference test sets, nor have they used test automation. There is a clear research gap in examining the effectiveness and reliability of these three widely used coverage criteria.


3. The Research Test Framework and its Components

This chapter describes the framework and the associated components that were used to conduct the experiments for this PhD thesis. The overarching requirement for the framework was as follows:

 To provide automated functions for:
o Test Data Generation
o Test Set Execution
o Fault Injection and Mutation Score Generation
o Code Coverage
o Test Sub Set Extraction

The framework was generated to meet these requirements to allow rigorous determination of the effectiveness and reliability of coverage criteria. The framework covers:

 Overview of the framework (discussed in section 3.1.1)

 The automated instrumentation to capture source code coverage data (discussed in section – 3.1.2 Code Coverage).

 The automated generation of large samples of test data (discussed in section - 3.1.3 Test Set Generation).

 The use of JUnit to determine if each test has passed or failed (discussed in section 3.1.4)

 The automated generation of test sets satisfying an identified test objective, via the use of GAs. This enables us to generate a large sample of adequate test sets. Their fault-finding effectiveness can be determined by mutation methods (see below). The range of effectiveness scores across sets is an indication of the unreliability of the underlying criterion (discussed in section 3.1.5 Test Subset Extraction).

 To extract unique test subsets (with no duplicated test cases) we developed a number of heuristic algorithms, i.e. Simple Hill Climbing, Random Hill Climbing, a Greedy algorithm and Simulated Annealing, all in C#. We also developed our own bit-based Genetic Algorithm in C#, with a basic design based upon [Coley 99] and [Goldberg 89]. We then compared our results with those gained from using the ECJ GA framework, which is based in Java. ECJ supported integer-based GAs, and its parameter file enabled increased flexibility, so that the GA could be applied to a range of programs under test rather than only those targeted by our bespoke algorithms. For these reasons it was decided to use ECJ to develop our GA for generating unique test subsets (discussed in section 3.1.5.1 Genetic Algorithms).

86

 The automatic generation of mutants (discussed in section 3.1.6).

 The automatic generation of Mutation Scores (MS) for the test subsets (discussed in section 3.1.7).

3.1 Framework

3.1.1 Overview of the Framework

At a conceptual level the framework generates two tables. One is the coverage table, which records the coverage achieved by each test. The other records the mutants killed by each test. The two tables are linked via the test case numbers, as illustrated in Figure 9.

The coverage table holds the coverage data for each test case. Each column entry indicates an executable element of code, while each row entry represents a different test case. A zero indicates that the element has not been exercised by that test; a one indicates that the element has been exercised by that test.

Likewise, the mutation table stores the effectiveness of each test case. Each column entry represents a mutant, while each row entry represents a test case. Similarly to the coverage table, a zero in the table indicates that that mutant has not been killed (i.e. remains undetected), while a one in the table indicates that mutant has been killed (detected). This table is used to calculate the ‘mutation score’ (MS).

[Figure 9 shows two example tables side by side: a coverage table with rows T1..Tn and columns L1..Ln of executable elements, and a mutation table with rows T1..Tn and columns M1..Mn of mutants, each populated with 0/1 entries and linked by the test case numbers.]

Figure 9: Conceptual View of the Framework

A coverage table (file) is generated for each testing objective (statement, branch and MC/DC).

The framework is made up of a number of developed and freeware components; the developed components were built incrementally as the framework evolved. The first component developed was the C# Wrapper, which generates random test data using the random functions inside the C# libraries. JUnit is the test harness for executing the tests using the randomly generated test data; this data is read into JUnit by the Read Test Input Data component. To gain code coverage for the programs under test, a program called Code-Cover was used, which instrumented the code with line counters that were incremented every time a code statement was executed.

MuClipse was used to inject faults into the programs under test by applying a number of different mutation operators. The Mutation Score Generator executed the JUnit tests over the mutated programs, while an additional component, BitArray, generated the overall Mutation Score. ECJ was used for test subset generation, i.e. for generating unique test sets that meet the different coverage objectives. This enables us to measure the effectiveness and reliability of different test sets that meet the same coverage objective.

The components interact via the text-file outputs of other components. Figure 10 shows the components and the files generated by each of them; the framework components are detailed in Table 9. A mixture of Java and C# was used in the development of the framework. C# was used because the author of this thesis had good domain knowledge of C#; however, while the framework was being built, more freeware components (test harnesses, coverage tools, etc.) supported Java than C#, so a mixture of the two languages was used. Since C# was developed in part from Java, there are many similarities between the two languages, and on the whole only the fundamental language constructs (iteration, selection, arrays and file handling) were used. This made working in the two languages a relatively easy activity.


JUnit 3.8.1 (Java). Description: provides the framework to run automated Java tests via the use of assert statements; the JUnit component in the framework also records the test case to run, the test data and the expected result. Files generated: JUnit_Test.Class, Test_Inputs.txt.

Code-Cover (Java). Description: used to instrument the source code under test to capture coverage data. Files generated: *.clf (coverage log file).

C# Wrapper (C#). Description: randomly generates the test data for the JUnit test and calls the JUnit test case to be executed. Files generated: Test_Output.txt.

Read Test Input Data (C#). Description: reads the test data stored in the 'Test_Input.txt' file and passes it as a data string to the JUnit component. Files generated: none.

MuClipse (Java). Description: Java mutation26 plug-in that generates mutants for the software under test. Files generated: *.java (mutated Java files).

Mutation Score Generator (MSG) (C#). Description: runs the test set over the mutated source code files via the Read Test Input Data component; re-directs the JUnit test outputs to text files; generates the summary results files and the MS for each mutant operator applied. Files generated: ResultX.txt (1..*), SummaryX.txt, MS_Table_Full.txt.

Overall Coverage (C#). Description: generates the coverage files for statement, branch and MC/DC in the form of '0's and '1's by extracting the coverage data from each of the coverage log files. Files generated: Statement_Coverage.txt, Branch_Coverage.txt, MC/DC_Coverage.txt.

BitArray (C#). Description: generates the mutant results; stores the number of mutants killed by mutant type and the overall MS for that test set. Files generated: Mutation_Table.txt.

GA – ECJ (Subset Extraction) (Java). Description: Java GA Eclipse plug-in used to generate unique test subsets based upon the required source code coverage. Files generated: Test_Cases.txt.

Table 9: The Framework Components

26 Eclipse is an Integrated Development Environment (IDE) that supports a range of different languages and plug-ins.

[Figure 10 shows the framework components (the Java source code, Code-Cover, JUnit 3.x, the C# Wrapper, the C# Test Input Reader, MuClipse via Eclipse, the C# Overall Coverage, the C# MSG, GA-ECJ and the C# BitArray) and the files that flow between them (Junit_Test.Class, Test_Inputs.txt, the annotated Java file, *.clf coverage log files (1..*), Test_Output.txt, Mutated.java/Mutated.class, Statement_Coverage.txt, Branch_Coverage.txt, MCDC_Coverage.txt, ResultX.txt (1..*), SummaryX.txt, MS_Table_Full.txt, Test_Sub_Set.txt and Mutation_Table.txt).]

Figure 10: Framework Interactions


3.1.2 Code-Coverage

There are three different techniques for collecting code coverage information, namely source code instrumentation, object (byte) code instrumentation and runtime instrumentation. The majority of freeware Java coverage tools use either source code or object (byte) code instrumentation.27

The basic process for instrumentation is as follows:

1. Read the source file and parse it into a tree.

2. Annotate the parse tree with instrumentation that is subsequently used to count the number of times each statement is executed.

3. Write the annotated tree to a file. The file will contain Java source code with some additions.

4. Compile the annotated source.

5. Execute the program and collect instrumentation data.

6. Remove the annotations from the source code.
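As a minimal illustration of steps 2–4 (this is a hand-written sketch, not the actual output of any of the tools evaluated below), an instrumented statement carries a counter increment:

// Sketch of what source-code instrumentation adds: each executable statement is
// paired with a counter that is incremented every time the statement executes.
public class InstrumentedExample {
    // One counter per instrumented statement (indices assigned by the instrumenter).
    static final long[] STMT_COUNT = new long[3];

    static int abs(int x) {
        STMT_COUNT[0]++;                // counts executions of the 'if' statement
        if (x < 0) {
            STMT_COUNT[1]++;            // counts executions of 'return -x'
            return -x;
        }
        STMT_COUNT[2]++;                // counts executions of 'return x'
        return x;
    }
}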

Four Java code coverage tools – CoverLipse28, Code-Cover29, Ecl-Emma30 and Cobertura31 – were evaluated. Cobertura was executed via the Windows command line; the other three are Eclipse IDE plug-ins integrated into the Eclipse framework.

To evaluate the coverage tools a number of small Java programs32 were generated that included simple and complex conditional expressions inside Eclipse. The number of branches and modified condition decisions in the sample programs were worked out manually as a baseline. JUnit test cases were generated to exercise the sample programs. The coverage tools were then used to capture the code coverage and to detect coverage omissions. Additional JUnit test cases were generated manually to gain the required coverage.

27 http://java-source.net/open-source/code-coverage
28 http://coverlipse.sourceforge.net/
29 http://codecover.org/index.html
30 http://www.eclemma.org/
31 http://cobertura.sourceforge.net/introduction.html
32 The programs used were: vector statistical package – http://math.nist.gov/javanumerics/jama/ (the Matrix package); calculator – http://klogd.users.sourceforge.net/ (the StringParser class was used); Roman numeral year converter – https://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=6305&lngWId=2

These tools have the capability to export the results of the code coverage achieved by the JUnit Test Set in HTML, XML or CSV files. However, none of these tools generates directly the data in the required format for sub-set extraction to be achieved.

A Java instrumentation package, 'instr.instr.stmt'33, that captures statement coverage was also examined, and we re-evaluated the use of Cobertura and Code-Cover (Cobertura instruments the bytecode directly, while Code-Cover instruments the source code). Both tools were executed via the Windows command line. From the Windows Command Prompt, Cobertura instrumented the bytecode using the following command line:

cobertura-instrument.bat [--basedir dir] [--datafile file] [--destination dir] [--ignore regex] classes [...]

Cobertura generates a file (by default called cobertura.ser) that records the method names, lines and branch conditions. When the instrumented PUT is exercised, the line numbers and branch conditions are updated to indicate whether that line or condition has been exercised. A separate command is required to generate reports:

cobertura-report.bat [--datafile file] [--destination dir] [--format (html|xml)] [--encoding encoding] source code directory [...] [--basedir dir file underneath basedir ...]

For the evaluation of Cobertura we generated XML reports; one of these XML reports is shown in Figure 11.

Figure 11: Cobertura XML Report

We then generated a C# program that manipulated the XML files and generated the coverage files in the required format. Since Cobertura only captures statement and branch coverage, we examined Code-Cover.

33 http://www.glenmccl.com/instr/instr.htm

Code-Cover can be run in a number of ways, two of which are via Eclipse or via the command prompt. Code-Cover uses four arrays to capture the different coverage data, i.e. statement, branch, term and loop coverage.34 To ensure that Code-Cover captured statement, branch and MC/DC coverage, we manually assessed the code inside Eclipse by using JUnit tests on our sample programs. We deliberately generated tests that did not gain complete coverage, to ensure that this was detected by Code-Cover, and then incrementally added further tests until full coverage was gained. Once we had assured ourselves that the three coverage criteria were captured, we ran Code-Cover via the command line.

To run Code-Cover via the command line we used the following command: codecover instrument --root-directory maths_module --destination maths_module/instrumentedSrc --container maths_module/test-session-container.xml --language java --charset UTF-8

Here maths_module is the root directory containing the source code file(s) to instrument. The destination is the directory in which the instrumented source code files are stored, in this case a sub-directory called instrumentedSrc under the root directory. The container stores the test coverage results in XML files (this is not used in the framework). The language is set to Java and the charset to UTF-8.

The instrumented Java code files are then compiled. Running the compiled instrumented Java code files generates a coverage log file (*.clf). The coverage log files report the coverage achieved by the test(s) run during a single execution of the compiled program; Table 10 describes the coverage log types in the coverage log files. Since only one test is executed at any one time, a unique coverage log file was generated for each test execution; 1000 separate tests would therefore create 1000 unique coverage log files.

The Overall C# Coverage component extracts the coverage data from all of the Code-Cover log files and generates the statement, branch and MC/DC coverage files in terms of zeros and ones, i.e. Statement_Coverage.txt, Branch_Coverage.txt and MCDC_Coverage.txt. These coverage files contain the complete coverage for all the tests executed on the program under test.

34 Term coverage uses Ludewig term coverage, which is similar to MC/DC; however, it also measures part coverage and the term coverage for each individual term value set. For short-cut semantics (for example the && and || operators) Ludewig term coverage subsumes MC/DC.

Coverage log file output | Description
S1 1, S2 10, S3 1 | S indicates a statement; the number after S identifies the statement line and the number after the whitespace indicates how many times that line was executed (lines S1 and S3 once, line S2 ten times).
B2 1, B4 1 | B indicates branch coverage.
C1-1010100000 1, C8-11 8 | C indicates conditional (term) coverage. C1 shows that condition one contains ten sub-conditions that need to be exercised to ensure MC/DC coverage of that condition; a one indicates that the sub-condition has been exercised, a zero that it has not.
L2-2 1, L3-1 1 | L indicates loop coverage.

Table 10: Coverage Log File Description

The C# Coverage component parses the Code-Cover coverage log files and populates three internal arrays that store the data for the three coverage types. The arrays are initialised with zeros. As each Code-Cover coverage log file is parsed, the coverage arrays are updated to reflect the coverage recorded in that log file. On completion of parsing all the Code-Cover coverage log files for the PUT, the C# Coverage component writes the coverage gained for the three coverage types into the coverage files, i.e. Statement_Coverage.txt, Branch_Coverage.txt and MCDC_Coverage.txt.
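The extraction logic can be sketched as follows (shown in Java for consistency with the other listings; the real component is written in C#, and the method and array sizes here are illustrative only):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch of turning one coverage log file into a 0/1 statement-coverage row.
// Lines of the form "S<n> <count>" mark statements, "B<n> <count>" branches and
// "C<n>-<bits> <count>" conditions (term coverage), as described in Table 10.
public class CoverageLogRow {
    public static int[] statementRow(Path clf, int numStatements) throws IOException {
        int[] row = new int[numStatements];          // initialised to zeros
        List<String> lines = Files.readAllLines(clf);
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2 && parts[0].startsWith("S")) {
                int stmt = Integer.parseInt(parts[0].substring(1)) - 1;   // S1 -> index 0
                int hits = Integer.parseInt(parts[1]);
                if (hits > 0) row[stmt] = 1;          // exercised at least once
            }
        }
        return row;
    }
}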

In an attempt to gain dataflow coverage we considered generating our own dataflow coverage tool using intermediate language tools such as SOOT (http://www.sable.mcgill.ca/soot/) or InjectJ (http://insectj.sourceforge.net/); however, time limitations prevented such development. Since only Code-Cover provided MC/DC coverage, its log files could be easily manipulated, and the tool was stable, supported and provided consistent results on our sample test programs, we decided to use Code-Cover to capture the coverage data for our experiments.


3.1.3 Test Set Generation

JUnit provides a testing framework35 for generating and running automated test cases for testing Java source code. JUnit tests through the public interfaces of the software under test, i.e. through the return values of the software units. For example, a Java method may return the total number of gallons in a garden pond36. The return value from this method can be compared against the expected value via the assert statements defined by the Assert class inside JUnit. The expected value is generated from some form of oracle.

The Assert class is the largest class of the JUnit components37 and consists of a number of Assert types. These can be seen as verification or checkpoints in the test cases. A typical AssertEquals statement is shown below38

assertEquals(expected value, actual value, tolerance (delta))

If the actual value was not inside the expected value variance, JUnit reports a failure. JUnit 3.x versions distinguish between JUnit ‘failures’ i.e. a violation of the JUnit assertions, and JUnit ‘errors’ i.e. an unexpected Java exception. Later versions of JUnit i.e. Version 4 and above do not. The JUnit version 3.8 is used in the framework.
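As a concrete illustration of this assertion style (a sketch only; the class, the gallons method and the values are illustrative and are not part of the framework), a JUnit 3.8 test for the pond example in footnote 36 might look like:

import junit.framework.TestCase;

// Illustrative JUnit 3.8 test case: the expected value comes from an oracle
// (here the pond formula from footnote 36) and a tolerance is supplied for doubles.
public class PondTest extends TestCase {
    // Hypothetical unit under test: UK gallons held by a rectangular pond.
    static double gallons(double length, double width, double depth) {
        return length * width * depth * 6.23;
    }

    public void testGallons() {
        double expected = 2.0 * 1.5 * 0.5 * 6.23;    // oracle value
        double actual   = gallons(2.0, 1.5, 0.5);
        assertEquals(expected, actual, 0.001);        // reported as a failure if outside the tolerance
    }
}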

One of the reasons for using JUnit 3.8 is that it provides a simple Test-Suite class. A Test-Suite is the collection of tests (or the individual test) that makes up a test set; it is also used to manage and organise the tests or test suites to be run. Tests or additional test suites can be added to the Test-Suite. This class therefore enables the user to dynamically bind a specific test at run time, providing the capability of selecting tests randomly at runtime.

The JUnit Results class generates the summary of the results of running one or more tests. Test failures are reported by '.F' followed by a description of the failure; errors are reported by '.E' followed by the error description; a test that passed is reported by a single dot, i.e. '.'. The time taken for each test is also reported. At the end of the test suite a summary of the test results is generated, indicating the number of tests, failures and errors.

35 We use testing framework to refer to the JUnit Testing framework, and only framework to refer to the overall framework, to distinguish between the two frameworks.

36 The gallons in a garden pond can be determined by Length * Width * depth * 6.23. (UK gallons, multiply 7.48 for US gallons).

37 JUnit include 5 basic classes: Assert, TestCase, TestSuite, Test and TestResults. [Beck 04] provide an overview of all these components.

38 [Beck 04] provide a full listing of the JUnit Assert statements that can used in JUnit.


In the framework, a JUnit test summary is generated for every test, since only one test is included in every JUnit test suite. The test_cases.txt data file, and not the JUnit Test-Suite class component, holds the test set for the framework.

3.1.4 JUnit and the Framework

The JUnit tests were primarily39 developed in ConText40, a simple text editor that provides basic editing capability for a number of different programming languages. JUnit41 was installed and an appropriate class path added to the Windows OS. Each JUnit test set developed would include the JUnit framework package, a Java main program method and the Test-Suite and Test Case components. This enabled the test set to be compiled as a standalone Java program via Javac42, which reads class and interface definitions written in the Java programming language and compiles them into bytecode class files.

The JUnit component originally embodied oracles in each test case. This enabled parameterisation of the test cases, allowing them to accept a wide range of test inputs randomly generated by the C# Wrapper; this was how the miscellaneous programs were tested. However, we only need to detect differences between the mutated source code and the original source code. To achieve this, JUnit has been used in two ways: firstly as a data collector, collecting expected results from the source code under test, with test data still generated by the C# Wrapper component; and secondly to actually test the program under test via the use of JUnit assert statements. An internal flag inside the JUnit component determines which mode the JUnit test is in. This was how all the programs other than the miscellaneous programs were tested.

Figure 12 illustrates how JUnit is used inside the framework, components that are shaded in light grey are JUnit components. The JUnit component received a String argument (String Arg()) generated by the C# Wrapper, that contains the randomised test data and the JUnit test case to run. This String Argument is converted into the appropriate data types in the main program of the JUnit component. Once converted the data type is used to populate an array43, which in turn is used to define static variables (global variables) inside the JUnit component. The Test Mode flag value - False (No) or True (Yes) – determines whether the Expected Results are also stored in a static global variable.

39 Some initial JUnit development was undertaken in Eclipse during the Vector Statistical Package test case generation. This was mainly undertaken when investigating the use of coverage tools.

40 http://www.contexteditor.org/

41 http://www.JUnit.org/

42 Javac is the Sun command line Java compiler - http://java.sun.com/javase/downloads/index.jsp

43 May only populate static variables, rather than an array, depending on the program under test.


The main program method also contains a timer that times out, 'kills' the Java thread after a specified length of time and records the failure44; this is to detect livelock/deadlock associated with mutant operators. The main program in the JUnit component calls the Test Suite class inside the JUnit framework and adds the selected test case to the test suite. This is achieved by the Test_Sel parameter that is passed to the Test_Suite method. The Test Suite is only ever associated with one test case and therefore one test case result; however, there may exist many test cases, as shown by the multiplicity '1..*' joining Test-Suite to the Test-Case component, as illustrated in Figure 12.

[Figure 12 shows the test case structure (JUnit components shaded in light grey): the C# Wrapper passes a string argument (String Arg()) to the main program of the JUnit component; depending on the Test Mode flag (No/Yes) the test data array is populated either with or without the expected results; a timing thread is started; Test_Sel(int) adds the selected test case to the Test-Suite, which is associated with 1..* test cases; in test mode the Unit Under Test (UUT) is verified using Assert statements, otherwise the UUT is called without Assert statements and the test, test data and expected results are stored; results are written to the output stream.]

Figure 12: Test Case Structure

44 JUnit 4 and above contain a timer for testing timing requirements; since JUnit 3.8 does not, a timer was developed.


The behaviour of the test case depends on the status of the Test Mode. If the Test Mode is set to false, the test case calls the PUT and stores the return value in an appropriate data type; the test data, along with the test case number and the expected result, is stored in a Test_Input data file. If the Test Mode is set to true, the test case uses the stored expected result value in the JUnit assert statement. It should be noted that the C# Wrapper is not used when the Test Mode is set to true; instead, the C# Test Input Reader component is used to read in the Test_Input data file. Both the C# Wrapper and the C# Test Input Reader are described in 2.5.
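The dual-mode behaviour can be sketched as follows (illustrative only; the flag name, file handling and stand-in unit under test are assumptions rather than the thesis code):

import java.io.FileWriter;
import java.io.IOException;
import junit.framework.TestCase;

// Sketch of the two JUnit modes: data collection (record expected results)
// versus testing (assert against previously recorded expected results).
public class DualModeTest extends TestCase {
    static boolean TEST_MODE = false;      // false = collect data, true = verify
    static double testInput;               // populated from the string argument
    static double expectedResult;          // populated only when TEST_MODE is true

    static double unitUnderTest(double x) { return Math.sqrt(x); }   // stand-in UUT

    public void testCase1() throws IOException {
        if (TEST_MODE) {
            // Verify the UUT against the stored expected result.
            assertEquals(expectedResult, unitUnderTest(testInput), 1e-9);
        } else {
            // Record the test number, input and returned value for later use.
            double actual = unitUnderTest(testInput);
            try (FileWriter out = new FileWriter("Test_Input.txt", true)) {
                out.write("1," + testInput + "," + actual + System.lineSeparator());
            }
        }
    }
}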

This section has explained how we plan to generate automated random test cases. The next section explains how we plan to generate optimum test sets to meet the different coverage objectives via the use of GAs.

3.1.5 Test Subset Extraction

To extract unique test subsets (containing no duplicated test cases) we developed a number of heuristic algorithms in C#: Simple Hill Climbing, Random-Hill Climbing, a Greedy algorithm and Simulated Annealing. We also developed our own bit-based Genetic Algorithm in C#, with a basic design based upon [Coley 99] and [Goldberg 89]. We then compared our results with those gained from the ECJ GA framework, which is written in Java. ECJ supported integer-based GAs, and its parameter file provided the flexibility to apply the GA to a range of programs under test, rather than being bespoke to one program as our own algorithms were. For these reasons it was decided to use ECJ to develop our GA for generating unique test subsets.

3.1.5.1 Genetic Algorithms

GAs use optimisation strategies based upon natural selection, or 'Darwinian evolution'. The steps involved in a GA are shown in Figure 14. The GA holds an initial population of individuals, each of which (a chromosome) represents a potential solution. Representation of individuals can be one of the hardest activities in developing GAs; bits are often used, but other representations can use integers or doubles. The selection process selects individuals based upon their fitness, i.e. by selecting fit individuals, so that a new population is formed. Typically, however, none of these individuals is the ideal solution, so the new population is altered by mating – pairing individual chromosomes and using crossover and mutation to alter them. Crossover can be performed using a number of different strategies. In one-point fixed crossover a point is randomly chosen and all the bits after that point are swapped, as illustrated in Figure 13.


Figure 13: Bit Representation of a Chromosome

A variation on this is two-point crossover, where two points are selected and everything between the two points is swapped. The one- and two-point crossovers work on specific segments of the chromosome, so new chromosomes inherit segments from the parent chromosomes. An alternative to fixed crossover points is uniform crossover, which randomly selects the bits in the chromosome to swap; each bit that makes up the chromosome has a probability (the mixing ratio) of being swapped.

Uniform crossover allows the parent chromosomes to be mixed at the level of individual genes, whereas one- or two-point crossover mixes at segment level; the disadvantage is that it destroys the inheritance of segment blocks. With mutation, each bit that makes up the chromosome has a small probability of being flipped (inverted). Figure 14 shows that there is an evaluation function or 'fitness function' that assesses the fitness of the solution. The GA is halted when a solution is found or when some number of evaluations has been reached.
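A sketch of one-point crossover and bit-flip mutation on bit-string chromosomes (illustrative only; this is not the ECJ implementation described below):

import java.util.Random;

// Sketch of the two GA operators described above, on boolean-array chromosomes.
public class GaOperators {
    private static final Random RNG = new Random();

    /** One-point crossover: swap every bit after a randomly chosen point. */
    static void onePointCrossover(boolean[] parentA, boolean[] parentB) {
        int point = RNG.nextInt(parentA.length);
        for (int i = point; i < parentA.length; i++) {
            boolean tmp = parentA[i];
            parentA[i] = parentB[i];
            parentB[i] = tmp;
        }
    }

    /** Bit-flip mutation: each bit is inverted with a small probability. */
    static void mutate(boolean[] chromosome, double mutationRate) {
        for (int i = 0; i < chromosome.length; i++) {
            if (RNG.nextDouble() < mutationRate) {
                chromosome[i] = !chromosome[i];
            }
        }
    }
}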

Test subset extraction was achieved by developing GAs in ECJ45, a Java GA framework used here as an Eclipse plug-in. ECJ contains a parameter file that defines the properties of the Genetic Algorithm. A GA generates solutions from the solution space, i.e. the population. Solutions are represented by chromosomes; a chromosome is made up of sub-chromosomes, and we use integers to represent the sub-chromosomes, so a chromosome is made up of integer values. We define the minimum and maximum sub-chromosome value range inside the ECJ parameter file. A chromosome is assessed by a fitness function; if the chromosome does not satisfy the fitness function, a new chromosome is generated using crossover and mutation.

45 http://cs.gmu.edu/~eclab/projects/ecj/

[Figure 14 shows the GA structure as a flowchart: Initial Population → Selection → Mating → Crossover → Mutation → Evaluate → Terminate? (if no, loop back to Selection; if yes, stop).]

Figure 14: GA Structure

The GA uses the individual coverage tables generated for each coverage objective, i.e. the statement, branch and MC/DC coverage tables, to represent the set of solutions from which the GA selects the optimum number of tests needed to achieve that coverage objective. Each coverage file was turned into a Java bit array, and each test case represents a sub-chromosome, as illustrated in Figure 15.

[Figure 15 shows an example chromosome made up of selected test case numbers (10, 74, 788, 867); each sub-chromosome refers to the coverage row of that test case (e.g. TC 10 → 01000000111, TC 74 → 10100001111, TC 788 → 10011000000, TC 867 → 00000110000), and together the selected tests give the coverage achieved by the solution.]

Figure 15: Chromosome Makeup

Table 11 describes the components that make up the GA for subset extraction, and Figure 16 shows the interaction between the components.


Program Parameters – Defines and initialises a number of variables used in the GA, including the size of the bit array that holds the coverage table, the number of test cases, and the number of lines, branches or MC/DC points in the PUT.
Evaluation Function – Populates the bit array with the coverage data and evaluates the fitness of the selected solution, using the Coverage Gained function to calculate the coverage gained from the selected solution.
Coverage Gained – Calculates the coverage gained from the selected solution.
Reset – Deletes the selected tests' coverage from the coverage table by placing zeros in the coverage gained by those tests; this enables unique optimum test sets to be created.
ECJ Parameter File – Defines the parameters for the GA, e.g. the maximum number of solutions to generate, the minimum and maximum gene values, the genome size, the crossover and mutation rates, whether duplicates are accepted, etc.
Store Optimum Test Set – Saves the solution, i.e. the test case numbers that achieve the solution.

Table 11: GA Components Description

The GA included a parameter file46 that defined the GA properties. For the subset extraction, the parameter file defined the number of test cases generated, i.e. the solution space from which the GA selects test cases. The parameter file also defines the number of test cases to select, i.e. the ECJ genome size; this was selected heuristically by initially choosing a small number of tests (e.g. 5). If the GA could not find a solution, the number of test cases to select was increased by one until the optimum solution was found; if a solution was found with the initial number of test cases, this number was reduced until a solution could no longer be found.

46 A full definition of the parameter can be found at http://www.parabon.com/dev-center/origin/ecj/tutorials/tutorial1/

[Figure 16 shows the GA subset-extraction flow: the program parameters and ECJ parameter file are generated; the evaluation function reads the coverage file and computes the coverage gained by the candidate solution; if the optimum test set is found it is stored and the coverage it contributes is reset (zeroised) before the search continues; if no further solution can be found the program exits.]

Figure 16: GA Subset Extraction

ECJ provides an evaluation method that evaluates the solution and compares it to the fitness function. The fitness function is based upon the required coverage, e.g. every line of the PUT for statement coverage. The GA included a gain_coverage function that determined the coverage gained from the selected tests; if a solution could not be found, a new solution was generated using GA crossover and mutation. If no solution could be found within the total number of attempts (the ECJ generations attribute in the ECJ parameter file), the bit array was not updated. To ensure that unique test subsets were generated, the GA included a function to zeroise in the bit array the coverage achieved by the selected test cases once a solution had been found. This is shown by the dashed line in Figure 16 and ensures that the next test subset generated is unique, i.e. contains no duplicated test cases.
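The coverage-gained check at the heart of the fitness function can be sketched as a stand-alone routine (illustrative only; the real implementation is an ECJ evaluation method and differs in detail):

// Sketch of the fitness idea: a candidate subset is adequate when the OR of the
// selected tests' coverage rows covers every element required by the objective.
public class SubsetFitness {
    /**
     * @param coverageTable rows = test cases, columns = coverable elements (true = exercised)
     * @param selectedTests indices of the test cases chosen by the GA (the chromosome)
     * @return number of elements covered by the selected tests
     */
    static int coverageGained(boolean[][] coverageTable, int[] selectedTests) {
        int elements = coverageTable[0].length;
        boolean[] covered = new boolean[elements];
        for (int test : selectedTests) {
            for (int e = 0; e < elements; e++) {
                covered[e] |= coverageTable[test][e];
            }
        }
        int gained = 0;
        for (boolean c : covered) if (c) gained++;
        return gained;
    }

    /** Zeroise the coverage of the selected tests so the next subset must be unique. */
    static void reset(boolean[][] coverageTable, int[] selectedTests) {
        for (int test : selectedTests) {
            java.util.Arrays.fill(coverageTable[test], false);
        }
    }
}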


As noted above, the GA properties are defined in the ECJ parameter file47; part of the parameter file used for the subset extraction is given in Appendix A.

3.1.6 Mutation

This section examines mutation. It describes the different tools evaluated, the different mutation operators applied and how the mutants were executed.

Three different mutation tools were evaluated for mutating the source code under test:

• Jumble48 – a byte-code-level mutation tool.
• PIT49 – a byte-code-level mutation tool.
• MuClipse50 – a source-code-level mutation tool.

While the Jumble and PIT tools were easily used via the Windows command line, they were both coarse-grained compared to the MuClipse tool. The MuClipse tool is an Eclipse IDE plug-in for generating mutants, executing JUnit test set over the generated mutants and generating a MS. MuClipse enables the user to generate both Class and Traditional Mutants, but we only used traditional mutants in our experiments.

MuClipse contained 15 different mutant types, as shown in Table 12, based upon the following mutant operators:

• Arithmetic operators
• Relational operators
• Conditional operators
• Shift operators
• Logical operators
• Assignment operators

47 A full definition of the parameters can be found at http://www.parabon.com/dev-center/origin/ecj/tutorials/tutorial1/
48 http://jumble.sourceforge.net/
49 https://pitest.org/
50 http://muclipse.sourceforge.net/

Arithmetic Mutant Operators:
– AORB: Replace basic binary arithmetic operators with other binary arithmetic operators. Examples: vals.length / m => vals.length * m; b / a => b * a; 1 + r * r => 1 * (r * r); r * r => r – r.
– AORS: Replace short-cut arithmetic operators with other unary arithmetic operators. Example: i++ => +i.
– AOIS: Insert short-cut arithmetic operators. Example: i => ++i.
– AOIU: Insert basic unary arithmetic operators. Example: m => -m.
– AODS: Delete short-cut arithmetic operators. Examples: A++ => A; --A => A.
– AODU: Delete basic unary arithmetic operators. Examples: +A => A; -A => A.

Relational Mutant Operators:
– ROR: Replace relational operators with other relational operators. Examples: vals[i].length != n => vals[i].length > n; b != 0 => b <= 0; a <= 0.0D => a != 0.0D.

Conditional Mutant Operators:
– COR: Replace binary conditional operators with other binary conditional operators. Example: B.m != m || B.n != n => B.m != m && B.n != n.
– COI: Insert unary conditional operator. Example: i < m => !(i < m).
– COD: Delete unary conditional operators. Example: a++ => a.

Shift Mutant Operators:
– SOR: Replace shift operators with other shift operators. Examples: valByte >> 7 => valByte << 7; valByte >> 7 => valByte >>> 7.

Logical Mutant Operators:
– LOR: Replace binary logical operators with other binary logical operators. Examples: A & B => A | B; A<<2 => A<<<3; A^B => A|B.
– LOI: Insert unary logical operator. Example: A.length => -A.length.
– LOD: Delete unary logical operator. Example: -A.length => A.length.

Assignment Operators:
– ASRS: Replace short-cut assignment operators with other short-cut operators of the same kind. Example: s += Math.abs( A[i][j] ) => s *= Math.abs( A[i][j] ).

Table 12: Method Mutant Operator Types

MuClipse generates a separate folder for each mutant generated; the folder names are incremented sequentially for each mutant operator. For example, if the AORB mutant operator was applied, the mutated source code sub-folders would be named AORB_1, AORB_2, AORB_3, and so on. If the mutant operator did not generate syntactically valid code, neither a folder nor source code would be generated; however, the folder counter would still be incremented. The mutant operators applied can generate livelock code, e.g. infinite loops; the only observed livelock cases occurred where the AOIS mutant operator had been applied.

MuClipse generates a mutation log file covering all the mutants generated; each line represents a mutant. The log file indicates the mutant operator applied, the line number of the code change and the transformation from the original to the mutated source code. This is illustrated below:

AORB_1: 29: void_Shell_Sort(int,int): k - 1 => k * 1

AORB_1: – indicates the mutant type and number.
29: – the line number in the source code where the mutant has been injected.
void_Shell_Sort(int,int): – the function into which the mutant has been injected.
k - 1 => k * 1 – the source code transformation from the original to the mutated code, with the original source code on the left and the mutated source code on the right.

When trying to run the JUnit tests over the mutated source code, a ClassNotFoundException was generated.52 To overcome this issue and to generate the data in the required format, a bespoke Mutation Score Generator (MSG) was developed, as shown in Figure 17. The mutated source code files and Java classes were exported from Eclipse into a single flat folder structure. The MSG has the following core functionality:

• Copies the JUnit test file, and any supporting Java files that may be required to support that test, to the required mutant directories. The File Directory field shown in Figure 17 points to the directory that includes all the sub-folders containing the mutated source files.

• Copies the C# Test Input Reader test driver program(s) for executing the JUnit tests.

• Generates a 'bat' file for running each mutant type, i.e. arithmetic, logical, etc. The bat file changes the directory to the appropriate mutant location, runs the selected C# test driver program and re-directs the JUnit results to appropriate text files. For example, a bat file for the mutant operator AOIS may state the following:

51 This example uses tabs to break up the actual text logged in the mutation log file, to aid readability.
52 After numerous email exchanges with one of the authors of MuClipse, it was not possible to resolve the issue.

cd C:\...\AOIS_1
Read_data_full_all_tests.exe > C:\...\Results_AOIS1.txt
cd C:\...\AOIS_2
Read_data_full_all_tests.exe > C:\...\Results_AOIS2.txt
cd C:\...\AOIS_3
Read_data_full_all_tests.exe > C:\...\Results_AOIS3.txt
cd C:\...\AOIS_5
Read_data_full_all_tests.exe > C:\...\Results_AOIS5.txt

The above example indicates that no AOIS_4 folder existed, due to invalid syntax code, therefore no folder was generated by MuClipse. However, the folder names are incremented. Therefore, the MSG scans for the existence of each mutant folder; if no folder exists the next folder is searched and so on. If no folder exists after 20 attempts, the next mutant sub operator is selected.

• Analyses the result files and generates the MS for each mutant operator. The MSG walks through each of the mutant operator result files and generates an overall summary of test set passes and failures for each mutant operator. The summary file name uses the mutant operator sub-type followed by '_Results'; e.g. for the ASRS mutation operator the file would be called 'ASRS_Results'. An example taken from the ASRS_Results file is shown below:

Passes = 0
Failures/Errors = 2
Test Set 1 = F
Test Set 2 = F
Mutant Score = 1

Figure 17: Mutation Score Generator User Screen

3.1.7 Mutation Score Generation

The BitArray component computes the mutation score using a two-dimensional bit array to store the results, as shown for the mutation table in Figure 9.


The BitArray program populates the bit array by reading the result files generated from running the MSG. The bit array is initialised to zeros; the program then reads through all the result files and sets a one in the array to indicate a failure (mutant detected), leaving a zero where the test passed (mutant not detected). The BitArray component then reads, in turn, each row of tests listed in the test_sub_set.txt file – each row representing a test subset for that coverage type – and generates a mutation score for that test subset, along with the number of mutants of each sub-type killed.
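The per-subset calculation can be sketched as follows (shown in Java for consistency with the other listings; the actual BitArray component is written in C#):

// Sketch of computing a mutation score for one test subset from the mutation table:
// a mutant counts as killed if any test in the subset detects it (a 1 in its column).
public class SubsetMutationScore {
    static double mutationScore(boolean[][] mutationTable, int[] subset, int totalMutants) {
        boolean[] killed = new boolean[totalMutants];
        for (int test : subset) {
            for (int m = 0; m < totalMutants; m++) {
                killed[m] |= mutationTable[test][m];   // true = mutant detected by this test
            }
        }
        int killedCount = 0;
        for (boolean k : killed) if (k) killedCount++;
        return (double) killedCount / totalMutants;     // raw MS (no equivalence adjustment)
    }
}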

The BitArray component also allows the mutation table to group mutants by mutation type, as indicated in Figure 18; from this we can determine the mutation sub-type killing ability of each test subset. The results are stored in comma-separated value (CSV) data files: 'mutation_table.txt' stores the results for all the test subsets, while testX.txt (X being the unique test subset) stores the results for that test subset only. The BitArray component contains a check-sum that counts the number of mutants killed and the total number of mutants, indexed by sub-type; both of these results are exported in CSV files, and if they do not match this indicates an anomaly.

[Figure 18 shows an example mutation table in which the mutant columns M1..M10 are grouped by mutation operator type (e.g. COR for columns 0..4, COI for columns 5..7 and a logical operator group for columns 8..10), with rows T1..Tn holding 0/1 entries indicating whether each test killed each mutant.]

Figure 18: Conceptual View of the Mutation Table

The data files are then exported into Microsoft Excel for data manipulation.


3.2 Testing Framework Refinement

3.2.1 Overview of the Framework Refinement

To improve efficiency (the time to execute tests) and to make the framework language independent, the testing framework was modified. The modifications removed the need for JUnit and increased efficiency by directly comparing the results gained from the non-mutated and mutated programs.

The refined mutation framework is similar to the original mutation framework and uses some of the original framework components. The components that make up the refined mutation framework are detailed in Table 13. Some of the major differences between the two frameworks are as follows:

• Tests are automatically generated and executed via a test batch file. Each row (line) in the batch file represents a test; the test batch file is the test set to be executed on the PUT.

• It replaces the need for local verification (i.e. JUnit). A comparison component compares the outputs of the non-mutated program with those of the mutated program, and a result file is generated from this comparison for each MO sub-type applied.

• It replaces the MSG with a new MSG that analyses the result files to calculate the MS for each MO sub-type.

C# Test Data Generator (C#). Description: automatically generates and executes a test batch file; the batch file re-directs the test results to unique test result files. Files generated: batch file, test results, coverage data files.

C# Test Runner (C#). Description: runs the test batch file over the mutated programs. Files generated: test results (from the mutated programs).

C# Comparator (C#). Description: generates a pass/failure log for each mutated program. Files generated: Results.txt.

C# MSG (C#). Description: generates the MS for the PUT by analysing all the result files generated by the C# Comparator component. Files generated: Mutant Table.

Table 13: Mutation Framework Components


[Figure 19 shows the refined framework pipeline and the outputs of each stage: the Java program is instrumented by CodeCover; the C# Test Data Generator generates and runs tests over the instrumented program (producing the bat file, test results and coverage data files); the C# Data Coverage component generates the statement and branch coverage tables; MuClipse generates the mutants (mutated Java files) for the PUT; the C# Test Runner runs the test set over the mutated programs (test results); the C# Comparator compares the non-mutated and mutated program test results (a pass/failure log file for each mutated program); and the C# Mutant Score Generator generates the mutant score for the PUT (mutant table).]

Figure 19: Components that Make up the Framework and their Outputs

3.2.2 Test Case Generation and Execution

Test generation and execution for the non-mutated programs was carried out by the C# Test Data Generator component. As shown in Figure 21, this component has three functions: it generates the test data, generates the test batch file, and executes that test batch file once the required number of tests has been generated. The test batch file includes the PUT execution command along with the associated test data. Later versions of the test batch file directed tests to specific processor cores to examine test efficiency.

Test cases were generated by randomly selecting input data of the appropriate type, i.e. randomly generated integers or real numbers. Randomisation was achieved using the random functions inside the C# language libraries. Where test data in a specific string format was required, this was achieved either by combining integer- or real-based data to form the required string format or, where this was not possible, by using one or more enumeration arrays containing the foundation elements of the required format. These elements were then randomly selected and combined to generate test data in the required format.

The test batch file is automatically generated by the C# Test Data generator component. The test batch file contains the PUT execution command, associated test data and redirects the program outputs to unique output files for post analysis. Figure 20 shows the elements of a test batch file.

Java FacadeDemo 159032 3447 155289 128058 -24613 46569 232 >output1.txt
Java FacadeDemo 141974 54752 86743 194835 57817 97295 326 >output2.txt
Java FacadeDemo 166127 116155 124210 104544 -3249 133095 353 >output3.txt

(Each line comprises the PUT name, the test data, and a re-direction of the test output to a unique file.)

Figure 20: Test Bat File Elements
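The generation of such a batch file can be sketched as follows (illustrative Java rather than the actual C# Test Data Generator; the program name, argument count, value ranges and file names are assumptions based on Figure 20):

import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

// Sketch of writing a test batch file: one line per test, containing the PUT
// execution command, randomly generated arguments and a unique output redirect.
public class BatchFileSketch {
    public static void main(String[] args) throws IOException {
        Random rng = new Random();
        int numberOfTests = 1000;                       // assumed test-set size
        try (FileWriter bat = new FileWriter("run_tests.bat")) {
            for (int t = 1; t <= numberOfTests; t++) {
                StringBuilder line = new StringBuilder("java FacadeDemo");
                for (int a = 0; a < 7; a++) {           // seven integer arguments, as in Figure 20
                    line.append(' ').append(rng.nextInt(500000) - 50000);
                }
                line.append(" >output").append(t).append(".txt");
                bat.write(line.toString() + System.lineSeparator());
            }
        }
    }
}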

[Figure 21 shows the test generation and execution loop: generate test data, generate the test bat file and append to it, repeat until the required number of test cases has been generated, then run the test bat file.]

Figure 21: Test Generation and Execution for the Refined Mutation Framework

The main function of each PUT was amended to accept arguments on program execution. Once the required number of test cases had been generated and stored in the test bat file, the bat file was executed by the C# Test Data Generator component. The test bat file was in the first instance executed over the non-mutated program, which had been instrumented via Code-Cover to collect statement and branch coverage. Each test generated a coverage data file.

Later we refined the test bat file by using the Windows 7 'start' command to call the PUT. This enabled us to direct which processor core each test would be executed on via the start command's /AFFINITY option. For example:

start /AFFINITY 1 /b /wait Java FacadeDemo 204547 39668 143556 6112 15055 235569 351 >output1.txt

Figure 22: Test Bat File Elements using the start command

The start command starts a new program or command. The command line option /AFFINITY directs which processor core(s) to use; it takes a hexadecimal affinity mask. For example, /AFFINITY 0xAA (binary 10101010) specifies that the program should run using processors 1, 3, 5 and 7, but not 0, 2, 4 or 6. In Figure 22, /AFFINITY 1 relates to processor core 0, /AFFINITY 2 would relate to processor core 1, and so on. Option '/b' starts the program without creating a new window, and option '/wait' waits for the program to terminate.

3.2.3 Mutant Testing and Mutant Score Generation

The C# Test Runner runs the test batch file over the mutant programs. The mutant programs are generated via MuClipse and exported to unique file directories for each mutant generated as described in section 3.1.6.

The C# Test Runner component detects the mutant directories for the PUT and copies and runs the test bat file over the mutants. The C# Test Runner program contained a single thread; launching the test bat file for each mutated program in turn led to a large number of instances of the test bat file running simultaneously over different mutated programs (around 2500 instances), which caused the Java(TM) Platform SE binary to stop working. We therefore introduced a timing delay in the process thread before copying and running the test bat file on the next mutated program. This still allowed the test batch file to run simultaneously over several mutated programs but prevented more than 500 instances running at once.

The running of the test batch file generated a large number of separate test result files. These result files were merged into master result files. The test result files generated from running the test batch file over each mutated program were merged into one results file for each mutant program generated. A master result file was also generated for the non-mutated program. Figure 23 illustrates this.


[Figure 23 shows how the MS is generated: the non-mutated program's result files (1..*) are merged into a master results file; each mutated program's result files (1..*) are merged into a master results file for that mutant; comparing the two produces a pass/fail log file for each mutated program, from which the mutants killed by type and the MS are derived.]

Figure 23: Mutation MS Generation

The master results file is then compared with the master mutant result files. There are many master mutant result files, since one is generated for each mutant program. A result log file is generated for each comparison of the master results file with a master mutant results file. Each line in the two files is compared; any difference between the two is logged as an '.F', while a '.' indicates no difference.
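A sketch of the line-by-line comparison (shown in Java for consistency with the other listings; the actual C# Comparator differs in detail):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch of comparing the non-mutated master results file with one mutant's
// master results file: any differing line is logged as 'F', otherwise '.'.
public class ResultComparator {
    static String compare(Path masterResults, Path mutantResults) throws IOException {
        List<String> original = Files.readAllLines(masterResults);
        List<String> mutated  = Files.readAllLines(mutantResults);
        StringBuilder log = new StringBuilder();
        int lines = Math.max(original.size(), mutated.size());
        for (int i = 0; i < lines; i++) {
            String a = i < original.size() ? original.get(i) : "";
            String b = i < mutated.size()  ? mutated.get(i)  : "";
            log.append(a.equals(b) ? "." : "F");   // 'F' marks a detected difference
        }
        return log.toString();                     // any 'F' means the mutant was killed
    }
}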

The C# Mutant Score Generator program generates the MS by analysing all of the log files. It generates an MS table that includes the number of mutants injected, the number killed, the MS by mutant sub-type and the overall MS. The MS tables for each of the PUTs were exported into Microsoft Excel for data manipulation.

3.3 Framework Summary

This chapter has described a flexible testing framework exhibiting the following key functionality:

• Automated test data and test case generation.

• Automated source code coverage capture for statement, branch and MC/DC coverage for each individual test. The individual coverage files were then automatically merged to generate coverage tables for each coverage type – statement, branch and MC/DC.

• Automated generation of optimal test sets via a GA. This enables us to measure the reliability of criteria.

• Automatic generation of mutants.

• Automatic generation of the MS to measure test effectiveness, which in turn enables us to measure the reliability of criteria.

We now proceed to apply the framework to software predominantly from two different domains.


4. Testing Code Coverage Criteria

Our first three research objectives, identified in section 1.2 and reproduced below, are concerned with the effectiveness and reliability of structural code coverage criteria and with the effects of satisfying coverage criteria "optimally" (with the least number of test cases) and also with some degree of redundancy. They are:

1. To measure the test effectiveness of the three coverage criteria (statement, branch and MC/DC) mandated by a widely used commercial airborne software standard for safety-critical software, DO-178B [178B], and its recent update DO-178C [178C].

2. To measure the reliability of those three coverage criteria by comparing the effectiveness of multiple minimal-size test sets meeting these criteria.

3. To measure the reliability of the three widely used coverage criteria used for commercial airborne safety-critical software (e.g. [178B] and [178C]) with test sets having a small degree of redundancy. To add redundancy we plan to combine the different optimum coverage test sets.

This chapter presents the research carried out in support of the above objectives. The primary software subject to testing is a set of fourteen numerical 'C' recipes (numerical algorithms) taken from [Press et al 92]. These programs are converted to Java and mutated. From our primary experiments we aim to determine the following:

• having achieved 100% statement (S) coverage with the optimum number of tests, what percentage of errors has been found;

• having achieved 100% branch (B) coverage, with the optimum number of tests, what percentage of errors has been found;

• having achieved 100% MC/DC (M), with the optimum number of tests, what percentage of errors has been found.

In addition to examining single coverage objectives, we generate test subsets to meet the following combined testing objectives:


• Statement and Branch coverage (SB)

• Statement and Branch coverage and MC/DC (SBM)

Since branch coverage subsumes statement coverage and, likewise, MC/DC subsumes branch coverage, this adds redundancy to the test sets. We perform no pruning of these test sets; they are simply combined. We compare all the test subset results against the MS obtained by the full test set; this removes the issue of mutation equivalence.

In addition to conducting experiments on the numerical algorithms, we also carried out experiments on a number of sorting algorithms (bubble, heap, shell, insertion, merge and quick sort) and other miscellaneous algorithms. This was to provide a comparison between the results for the numerical algorithms and those for other computational algorithms.

The chapter covers the following:-

• Section 4.1 defines the numerical recipe programs under test (PUT) and their program properties.

• Section 4.2.1 shows the results from running all the tests over the mutated numerical recipe programs.

• Section 4.2.2 analyses the reasons for the results gained for the numerical recipes.

• Section 4.2.3 shows the results for the optimum number of tests to achieve the required coverage objectives, i.e. statement (S), branch (B) and MC/DC (M), for the numerical recipes, together with the results from combining these optimum test sets, i.e. SB and SBM.

• Section 4.2.4 shows the results from the sorting and miscellaneous algorithms.

• Section 4.3 provides a summary of findings and conclusions.


4.1 Numerical Programs Under Test and Their Program Properties

Fourteen publicly available numerical 'C' recipes, taken from [Press et al 92] and presented in Table 14, were converted to Java and used as the primary programs for our experiments (i.e. the PUTs). Converting the original C programs was relatively easy because the majority of the constructs used in the original C programs are supported by Java; the only exception was the use of pointers, which was not an issue since variable referencing enabled the same implementation. Since the algorithms were relatively small it was not time consuming to translate the code manually, and doing so gave a greater understanding of the code under test. To ensure correct translation, a sample of the programs was compared against on-line implementations using the same inputs. All the programs included sequence, selection and iteration constructs. Table 14 provides a short description of the numerical recipes used as the experimental programs.

Bessj – Returns the Bessel function Jn(x) for any real x and n >= 2.
Cosft2 – Calculates the staggered cosine transform of a set y[1..n] of real data points.
Dawson – Returns Dawson's integral F(x) = exp(-x^2) ∫_0^x exp(t^2) dt for any real x.
EI – Computes the exponential integral Ei(x) for x > 0.
Expint – Evaluates the exponential integral En(x).
Gammp – Returns the incomplete gamma function P(a,x).
Gas – Returns a normally distributed (mean 0, standard deviation 1) deviate, using Random_1 as the source of uniform deviates.
Plgndr – Computes the associated Legendre polynomial Pl^m(x).
Poisson – Returns, as a floating point number, an integer value that is a random value drawn from a Poisson distribution of mean xm, using Random_1 as the source of uniform random deviates.
Random_1 – Returns a value uniformly at random between 0.0 and 1.0 using the Park and Miller generator with a Bays-Durham shuffle.
Random_2 – Returns a value uniformly at random between 0.0 and 1.0 using the L'Ecuyer generator with a Bays-Durham shuffle.
Random_3 – Returns a value uniformly at random between 0.0 and 1.0 using the Knuth subtractive method.
RC – Computes Carlson's degenerate elliptic integral, Rc(x,y).
SVD – Given a matrix a[1..m][1..n], computes the singular value decomposition A = U.W.V^T. U replaces a on output; the diagonal matrix of singular values W is output as a vector w[1..n]; the matrix V (not the transpose V^T) is output as v[1..n][1..n].

Table 14: PUT Numerical Recipes

To examine the impacts of program properties on testing effectiveness and reliability we used Source-Monitor to capture some simple metrics relating to the programs under test. The definitions of these metrics are given in Appendix C. The PUTs have a wide range of program properties: the key metrics53 are listed in Table 15.

File Name | Lines | Statements | % Branch Statements | Method Call Statements | Methods per Class | Average Statements per Method | Maximum Complexity (metric definitions: see footnote 54)
Bessj.java | 137 | 86 | 16.3 | 15 | 3 | 26.33 | 11
Cosft2.java | 189 | 150 | 12.7 | 11 | 3 | 48.67 | 9
Dawson_fun.java | 51 | 37 | 13.5 | 10 | 2 | 13.5 | 6
EI.java | 63 | 38 | 26.3 | 4 | 1 | 32 | 11
Expint.java | 82 | 54 | 27.8 | 7 | 1 | 48 | 20
Gammp.java | 125 | 75 | 18.7 | 15 | 4 | 15.75 | 6
Gasdev_fun.java | 89 | 58 | 15.5 | 6 | 2 | 20.5 | 9
Plgndr_fun.java | 53 | 31 | 25.8 | 2 | 1 | 29 | 11
Poisson.java | 122 | 89 | 18 | 15 | 3 | 24.33 | 11
Random_01.java | 71 | 42 | 16.7 | 1 | 1 | 28 | 9
Random_02.java | 84 | 54 | 16.7 | 1 | 1 | 34 | 10
Random_03.java | 57 | 48 | 18.8 | 1 | 1 | 38 | 11
RC_fun.java | 46 | 33 | 9.1 | 9 | 1 | 19 | 10
SVD_NR.java | 296 | 245 | 21.6 | 21 | 4 | 60 | 55 (see footnote 55)

Table 15: Key Metrics for the Programs Under Test

The metrics defined above and listed in Table 15 tell us nothing about the use of complex and simple expressions in the decision structure of the source code, i.e. in IF statements56. We define a simple expression as one containing only one sub-condition in the decision expression, and a complex expression as one having more than one sub-condition; for example, If(A>B) is a simple expression and If(A>B || A>C) is a complex expression. Table 16 indicates the type of expressions in each PUT; the number in brackets indicates the number of decisions in that complex expression. All the programs that had complex expressions also included simple expressions.

53 The metrics were captured using Source-Monitor V 2.6.3.104 - http://www.campwoodsw.com/sourcemonitor.html.
54 The metrics are defined in Appendix B.
55 The complexity is so high due to the high number of decision statements (i.e. If statements) in the source code.
56 If a program contained only simple expressions and branch coverage was achieved, one could rightly argue that Multiple Condition Coverage (MCC), which subsumes MC/DC, has been achieved.

Recipe Name | Simple or complex expressions used in the program under test
Bessj | Simple expressions
Cosft2 | Includes one complex expression (4)
Dawson | Simple expressions
EI | Simple expressions
Expint | Includes one complex expression (8)
Gammp | Simple expressions
Gasdev | Includes two complex expressions (4)
Plgndr | Includes one complex expression (8)
Poisson | Simple expressions
Random_01 | Includes one complex expression (4)
Random_02 | Simple expressions
Random_03 | Includes one complex expression (4)
RC | Includes one complex expression (8)
SVD | Includes one complex expression (4)

Table 16: Simple and Complex Expressions in the Numerical Programs Under Test

4.2 Mutations Injected and Results

4.2.1 Full Test Set Results

This section shows the results from running all the randomly generated tests. The results are presented in a number of tables. All the MS presented are raw MS and do not take into account mutation equivalence.

Table 17 shows, for each numerical recipe program, the total number of tests generated, the number of mutants injected, the number of mutants killed and the mutation score (MS). We refer to the combined tests as the full test set (we will subsequently extract subsets from it). Programs Plgndr, Cosft2 and Poisson achieved an MS greater than 0.9, with Plgndr achieving the highest MS. The majority of the PUTs achieved an MS greater than 0.8. Three programs – EI, Gas and Random_01 – achieved an MS lower than 0.8, with Random_01 achieving the lowest MS of 0.65502.
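As a worked check on the figures in Table 17, the raw mutation score is simply the proportion of injected mutants that were killed; for Bessj:

MS = mutants killed / mutants injected = 848 / 1020 ≈ 0.83137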


File Name         Number of Tests  Mutants Injected  Mutants Killed  Mutation Score
Bessj.java        5000             1020              848             0.83137
Cosft2.java       1000             1900              1720            0.90526
Dawson.java       1000             350               312             0.89142
EI.java           5000             237               189             0.79746
Expint.java       1500             460               388             0.84347
Gammp.java        5000             512               415             0.81054 (note 57)
Gasdev_fun.java   500              424               318             0.75
Plgndr_fun.java   1150             289               269             0.93080
Poisson.java      500              409               370             0.90441
Random_01.java    1000             229               150             0.65502
Random_02.java    1000             290               248             0.85517
Random_03.java    1000             268               236             0.88059
RC_fun.java       1000             347               298             0.85878
SVD_NR.java       1000             2445              1986            0.812269

Table 17: Full Test Set Mutant Injected, Mutants Found and MS

Table 18 shows the results gained by the full test sets by MO and MO subtype for each PUT. The square brackets after the recipe name indicate whether that PUT contains simple or complex expressions, denoted by S – Simple Expressions and C – Complex Expressions; the number after this indicates the maximum complexity of that PUT. The number inside the parentheses indicates the number of mutants killed, while the number outside the parentheses indicates the number of mutants still alive; adding the two gives the number of mutants injected. For example, in Table 18 for the program Bessj and the mutant subtype COI, one mutant remained alive and 12 mutants were killed; therefore, 13 mutants were injected. The dash '-' in Table 18 indicates that no mutants of that subtype were injected.

57 Could not obtain complete MC/DC, due to one condition.

Relational Shift Recipe Name Conditional MO Arithmetic MO Logical MO MO MO Assignment MO MO Sub-Type COR COI COD AODU AODS AOIU AORB AORS AOIS LOI LOR LOD ROR SOR ASRS 113 Bessj [S, 11] 0 (2) 1 (12) - 1 (2) - 14 (32) 18 (418) 0 (2) (329) 2 (13) 0 (2) - 14 (21) - 9 (15) - 110 Costf [C, 9] 0 (2) 0 (18) 0 (5) - 7 (96) 50 (606) 0 (3) (746) 1 (117) - - 5 (70) 5 (9) 2 (48) Dawson [S, 6] - 0 (5) - 1 (0) - 2 (18) 0 (128) 0 (2) 32 (116) 0 (10) - - 3 (17) - 0 (16) EI [S, 11] 1 (7) - - 1 (0) - 0 (11) 3 (37) 0 (2) 31 (97) 2 (5) - - 10 (10) - 0 (20) Expint [C, 20] 2 (6) 0 (19) - 1 (2) - 1 (21) 6 (94) 0 (3) 45 (161) 3 (21) - - 14 (41) - 0 (20) Gammp [S, 6] 2 (0) 6 (9) - 1 (4) 0 (1) 1 (28) 1 (131) 0 (7) 64 (188) 2 (8) - - 20 (15) - 0 (24) GAS [C, 9] 2 (2) 0 (13) - 0 (1) - 6 (18) 17 (87) 0 (1) 66 (132) 5 (31) - - 9 (26) - 1 (7) Plgndr [C, 11] 1 (3) 0 (10) - 0 (1) - 0 (15) 2 (74) 0 (2) 12 (106) 0 (20) - - 5 (30) - 0 (8) Poisson [S, 11] 0 (8) - - 0 (3) 0 (1) 2 (22) 1 (115) 1 (4) 32 (160) 0 (15) - - 3 (22) - 0 (20) Random_1 [C, 9] 0 (2) 1 (8) - 0 (1) - 10 (9) - 0 (1) 53 (93) 7 (16) - - 4 (16) - 4 (4) Random_2 [S, 10] - 0 (9) - - - 5 (21) - 0 (1) 33 (161) 2 (27) - - 2 (13) - 0 (16) Random_3 [C, 11] 0 (2) 0 (12) - - 0 (2) 3 (10) 0 (36) 0 (9) 20 (98) 2 (24) - - 7 (23) - 0 (20) RC [C, 10] 2 (10) 0 (15) - 0 (2) - 0 (15) 4 (104) - 37 (133) - - - 6 (19) - - - 235 SVD [C, 55] 0 (2) 12 (44) 1 (5) - 25 (155) 110 (318) 1 (35) (1101) 22 (208) - - 31 (64) - 22 (54) Table 18: Number of Alive (Dead) Mutants by Mutation, Mutation Operator and Subtype

Mutation Operator Type  Mutants Injected  Mutants Alive  Mutants Killed  M.S
Conditional             250               30             220             0.880
Arithmetic              7521              1179           6342            0.843
Logical                 565               48             517             0.915
Relational              520               133            387             0.744
Shift                   14                5              9               0.643
Assignment              310               38             272             0.877

Table 19: Number of Mutants Injected, Alive, Killed and Mutation Score by Mutation Operator


Table 19 shows the overall MS for the Mutation Operators (MOs) applied to all the numerical recipe programs. The Logical MOs had the highest overall MS of 0.915. The Arithmetic, Conditional and Assignment MOs all achieved an overall MS greater than 0.8. The Relational MO was the worst performing, with an MS of 0.744. We ignore the Shift MO due to the very small number of mutants injected.

4.2.2 Rationale for Different MS in Numerical Recipes

To analyse some of the MS differences between the PUTs we examined the three random number programs, focusing specifically on the Random_01 program and on the Relational and Arithmetic MOs.

The lowest MS of the three Random-based programs was achieved by Random_01, with an MS of 0.65502. To determine why this was the case we analysed three mutation sub-types: AOIU, AOIS and LOI. These three sub-types performed poorly compared with the same sub-types in Random_02 and Random_03, as shown in Table 20. Table 20 summarises the number of mutants live, killed and injected, and the MS, for the three sub-types analysed.

MO Sub-Type  Measure                            Random_01    Random_02    Random_03
AOIU         Live Mutants                       10           5            3
             Killed Mutants                     9            21           10
             Total number of mutants injected   19           26           13
             MS                                 0.473684211  0.807692308  0.7692308
LOI          Live Mutants                       7            2            2
             Killed Mutants                     16           27           24
             Total number of mutants injected   23           29           26
             MS                                 0.695652174  0.931034483  0.9230769
AOIS         Live Mutants                       53           33           20
             Killed Mutants                     93           161          98
             Total number of mutants injected   146          194          118
             MS                                 0.636986301  0.829896907  0.8305085

Table 20: AOIU, LOI, AOIS for Random_01, Random_02 and Random_03
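For reference, the three sub-types analysed here make small syntactic changes of the following form. This is a hedged illustration only, using a simplified version of the statement at line 54 of Random_01 (k = ((long)idum)/(long)IQ); the method names are illustrative and the mutants shown are not the exact ones injected:

class MutantShapes {
    // Original shape of line 54 of Random_01 (simplified).
    static long original(long idum, long iq) { return idum / iq; }

    // AOIU (Arithmetic Operator Insertion, Unary): a unary minus is inserted.
    static long aoiu(long idum, long iq)     { return idum / -iq; }

    // LOI (Logical Operator Insertion): a bitwise complement is inserted.
    static long loi(long idum, long iq)      { return idum / ~iq; }

    // AOIS (Arithmetic Operator Insertion, Short-cut): ++/-- is inserted.
    // With a post-increment the old value is used in the expression, so the
    // change is only observable if the incremented variable is read again later.
    static long aois(long idum, long iq)     { long k = idum++ / iq; return k; }
}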

Programs Random_01 and Random_02 differ in the method of generating the random number, as explained in [Press et al 92]. Part of the code listing is shown in Table 21. The Random_01 program shuffles by using the iy variable to select one of the random array elements stored in the iv array, which contains a random set of numbers. The Random_02 program uses two shuffles and combines the idum and idum2 variables to generate the output. This additional dependency on the outputs would appear to lead to increased observability of failures.

Random_01                                       Random_02
k=((long)idum)/(long)IQ;                        k=(long)idum2/(long)IQ2;
idum = (long)IA*(idum-k*(long)IQ)-(long)IR*k;   idum2=(long)IA2*((long)idum2-k*(long)IQ2)-k*(long)IR2;
if(idum<0)                                      if(idum2<0)
{                                               {
  idum+=IM;                                       idum2 +=IM2;
}                                               }
j=(int)iy/(int)NDIV;                            j=(int)iy/(int)NDIV;
iy=iv[j];                                       iy=iv[j]-idum2;
iv[j]=(long)idum;                               iv[j]=(long)idum;

Table 21: Random_01 and Random_02 Comparison

To analyse in detail why Random_01 did not detect the mutants generated by the AOIU, LOI and AOIS operators, we executed a single JUnit test using the same test data as applied originally, before amending the test data to see if the mutant could be detected. The JUnit testing was performed in Eclipse. Table 22 lists all the mutants not killed for AOIU, LOI and AOIS, together with a rationale for why each mutant was not detected.


Mutation Operator and Mutate Reason AOIU_9:54:double_ran1():idum => -idum The idum variable is assigned to itself and post incremented inside the assignment had no side effect AOIU_10:54:double_ran1():IQ => -IQ The IQ variable is assigned to itself and post incremented inside the assignment had no side effect AOIU_11:55:double_ran1():IA => -IA The IA variable is assigned to itself and post incremented inside the assignment had no side effect AOIU_12:55:double_ran1():idum => -idum The idum variable is assigned to itself and post incremented inside the assignment had no side effect AOIU_13:57:double_ran1():IM => -IM The IM variable is assigned to itself and post incremented inside the assignment had no side effect AOIU_14:59:double_ran1():iy => -iy The iy variable is assigned to itself and post incremented inside the assignment had no side effect AOIU_15:59:double_ran1():NDIV => -NDIV The NDIV variable is assigned to itself and post incremented inside the assignment had no side effect AOIU_16:60:double_ran1():j => -j The j variable is assigned to itself and post incremented inside the assignment had no side effect AOIU_17:61:double_ran1():idum => -idum The idum variable is assigned to itself and post incremented inside the assignment had no side effect AOIU_19:63:double_ran1():RNMX => -RNMX The RNMX variable is assigned to itself and post incremented inside the assignment had no side effect AOIS_25:36:double_ran1():iy => iy++ Post Incremental have no side effect on the IF expression AOIS_26:36:double_ran1():iy => iy-- Post Incremental have no side effect on the IF expression AOIS_49:44:double_ran1():idum => idum++ The idum variable is assigned to itself and post incremented inside the assignment had no side effect AOIS_50:44:double_ran1():idum => idum-- The idum variable is assigned to itself and post incremented inside the assignment had no side effect AOIS_65:44:double_ran1():k => k++ The k variable is post incremented inside the assignment and had no side effect AOIS_66:44:double_ran1():k => k-- The k variable is post incremented inside the assignment and had no side effect AOIS_87:54:double_ran1():idum => ++idum The idum variable is post incremented inside the assignment and had no side effect AOIS_88:54:double_ran1():idum => --idum The idum variable is post incremented inside the assignment and had no side effect AOIS_89:54:double_ran1():idum => idum++ The idum variable is post incremented inside the assignment and had no side effect AOIS_90:54:double_ran1():idum => idum-- The idum variable is post incremented inside the assignment and had no side effect AOIS_99:55:double_ran1():idum => ++idum The idum variable is post incremented inside the assignment and had no side effect AOIS_100:55:double_ran1():idum => --idum The idum variable is post incremented inside the assignment and had no side effect AOIS_101:55:double_ran1():idum => idum++ The idum variable is post incremented inside the assignment and had no side effect AOIS_102:55:double_ran1():idum => idum-- The idum variable is post incremented inside the assignment and had no side effect AOIS_103:55:double_ran1():k => ++k The k variable is post incremented inside the assignment and had no side effect AOIS_104:55:double_ran1():k => --k The k variable is post incremented inside the assignment and had no side effect AOIS_105:55:double_ran1():k => k++ The k variable is post incremented inside the assignment and had no side effect AOIS_106:55:double_ran1():k => k-- The k variable is post incremented inside the assignment and had no side 
effect AOIS_115:55:double_ran1():k => ++k The k variable is post incremented inside the assignment and had no side effect AOIS_116:55:double_ran1():k => --k The k variable is post incremented inside the assignment and had no side effect AOIS_117:55:double_ran1():k => k++ The k variable is post incremented inside the assignment and had no side effect AOIS_118:55:double_ran1():k => k-- The k variable is post incremented inside the assignment and had no side effect AOIS_119:56:double_ran1():idum => ++idum The idum variable is post incremented inside the assignment and had no side effect AOIS_120:56:double_ran1():idum => --idum The idum variable is post incremented inside the assignment and had no side effect AOIS_121:56:double_ran1():idum => idum++ The idum variable is post incremented inside the assignment and had no side effect AOIS_122:56:double_ran1():idum => idum-- The idum variable is post incremented inside the assignment and had no side effect


AOIS_123:57:double_ran1():IM => ++IM The IM variable is post incremented inside the assignment and had no side effect AOIS_124:57:double_ran1():IM => --IM The IM variable is post incremented inside the assignment and had no side effect AOIS_125:57:double_ran1():IM => IM++ The IM variable is post incremented inside the assignment and had no side effect AOIS_126:57:double_ran1():IM => IM-- The IM variable is post incremented inside the assignment and had no side effect AOIS_127:59:double_ran1():iy => ++iy The iy variable is post incremented inside the assignment and had no side effect AOIS_128:59:double_ran1():iy => --iy The iy variable is post incremented inside the assignment and had no side effect AOIS_129:59:double_ran1():iy => iy++ The iy variable is post incremented inside the assignment and had no side effect AOIS_130:59:double_ran1():iy => iy-- The iy variable is post incremented inside the assignment and had no side effect AOIS_131:59:double_ran1():NDIV => ++NDIV The NDIV variable is post incremented inside the assignment and had no side effect AOIS_132:59:double_ran1():NDIV => --NDIV The NDIV variable is post incremented inside the assignment and had no side effect AOIS_133:59:double_ran1():NDIV => NDIV++ The NDIV variable is post incremented inside the assignment and had no side effect AOIS_134:59:double_ran1():NDIV => NDIV-- The NDIV variable is post incremented inside the assignment and had no side effect AOIS_137:60:double_ran1():j => j++ The j variable is post incremented inside the assignment and had no side effect AOIS_139:61:double_ran1():idum => ++idum The idum variable is post incremented inside the assignment and had no side effect AOIS_140:61:double_ran1():idum => --idum The idum variable is post incremented inside the assignment and had no side effect AOIS_141:61:double_ran1():idum => idum++ The idum variable is post incremented inside the assignment and had no side effect AOIS_142:61:double_ran1():idum => idum-- The idum variable is post incremented inside the assignment and had no side effect AOIS_149:62:double_ran1():iy => iy++ The iy variable is post incremented inside the assignment and had no side effect AOIS_150:62:double_ran1():iy => iy-- The iy variable is post incremented inside the assignment and had no side effect AOIS_155:63:double_ran1():RNMX => ++RNMX The RNMX variable is post incremented inside the assignment and had no side effect AOIS_157:63:double_ran1():RNMX => RNMX++ The RNMX variable is post incremented inside the assignment and had no side effect AOIS_159:63:double_ran1():RNMX => RNMX++ The RNMX variable is post incremented inside the assignment and had no side effect AOIS_160:63:double_ran1():RNMX => RNMX-- The RNMX variable is post incremented inside the assignment and had no side effect AOIS_161:63:double_ran1():temp => temp++ The temp variable is post incremented inside the assignment and had no side effect AOIS_162:63:double_ran1():temp => temp-- The temp variable is post incremented inside the assignment and had no side effect AOIS_163:64:double_ran1():temp => temp++ The temp variable is post incremented inside the assignment and had no side effect AOIS_164:64:double_ran1():temp => temp-- The temp variable is post incremented inside the assignment and had no side effect LOI_27:54:double_ran1():idum => ~idum Does not lead to an observable difference LOI_28:54:double_ran1():IQ => ~IQ Does not lead to an observable difference LOI_29:55:double_ran1():IA => ~IA Does not lead to an observable difference LOI_30:55:double_ran1():idum => ~idum Does 
not lead to an observable difference LOI_31:55:double_ran1():k => ~k Does not lead to an observable difference LOI_32:55:double_ran1():IQ => ~IQ Does not lead to an observable difference LOI_41:61:double_ran1():idum => ~idum Does not lead to an observable difference

Table 22: Rationale for Random_01 Low Mutation Score
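The single-test analysis described above was driven by a JUnit test of roughly the following shape (a sketch only: the constructor, the recorded value and the seed handling are illustrative assumptions, not the actual test data used; the ran1() signature is inferred from the mutant labels such as double_ran1()):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class Random01MutantCheckTest {
    // Drives the original or hand-mutated ran1 implementation and compares the
    // produced value against the output recorded from the unmutated program.
    // A surviving mutant is one for which this comparison still passes.
    @Test
    public void ran1ProducesRecordedOutput() {
        Random_01 generator = new Random_01();   // assumed constructor of the PUT
        double expected = 0.28538;               // illustrative placeholder for the recorded output
        double actual = generator.ran1();        // signature inferred from the mutant labels
        assertEquals(expected, actual, 1e-12);
    }
}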


By looking at Table 18, it can be seen that the Random_03 program had 30 ROR mutants injected, of which 23 were killed and 7 remained alive. Of the three random number generator programs this was the worst performing with respect to detecting ROR mutants. To evaluate why this was the case we reviewed all the RORs generated for the Random_03 program, which are listed below in Figure 24. The 7 live mutants are highlighted in italics. The bold text to the right is the original code statement, provided to give a level of context to the mutation.

ROR_1:32:double_ran3(): iff == 0 => iff > 0
ROR_2:32:double_ran3(): iff == 0 => iff >= 0
ROR_3:32:double_ran3(): iff == 0 => iff < 0
ROR_4:32:double_ran3(): iff == 0 => iff <= 0        // if(idum<0||iff==0)
ROR_5:32:double_ran3(): iff == 0 => iff != 0
ROR_6:38:double_ran3(): i <= 54 => i > 54
ROR_7:38:double_ran3(): i <= 54 => i >= 54
ROR_8:38:double_ran3(): i <= 54 => i < 54           // for(i=1;i<=54;i++)
ROR_9:38:double_ran3(): i <= 54 => i == 54
ROR_10:38:double_ran3(): i <= 54 => i != 54         // for(i=1;i<=54;i++)
ROR_11:47:double_ran3(): k <= 4 => k > 4
ROR_12:47:double_ran3(): k <= 4 => k >= 4
ROR_13:47:double_ran3(): k <= 4 => k < 4
ROR_14:47:double_ran3(): k <= 4 => k == 4
ROR_15:47:double_ran3(): k <= 4 => k != 4
ROR_16:48:double_ran3(): i <= 55 => i > 55
ROR_17:48:double_ran3(): i <= 55 => i >= 55
ROR_18:48:double_ran3(): i <= 55 => i < 55          // for(i=1;i<=55;i++)
ROR_19:48:double_ran3(): i <= 55 => i == 55
ROR_20:48:double_ran3(): i <= 55 => i != 55         // for(i=1;i<=55;i++)
ROR_21:59:double_ran3(): ++inext == 56 => ++inext > 56
ROR_22:59:double_ran3(): ++inext == 56 => ++inext >= 56     // if(++inext == 56){inext=1;}
ROR_23:59:double_ran3(): ++inext == 56 => ++inext < 56
ROR_24:59:double_ran3(): ++inext == 56 => ++inext <= 56
ROR_25:59:double_ran3(): ++inext == 56 => ++inext != 56
ROR_26:65:double_ran3(): ++inexpt == 56 => ++inexpt > 56
ROR_27:65:double_ran3(): ++inexpt == 56 => ++inexpt >= 56   // if(++inexpt == 56){inexpt = 1;}
ROR_28:65:double_ran3(): ++inexpt == 56 => ++inexpt < 56
ROR_29:65:double_ran3(): ++inexpt == 56 => ++inexpt <= 56
ROR_30:65:double_ran3(): ++inexpt == 56 => ++inexpt != 56

Figure 24: Random_03 Relational Mutants

Table 23 provides an explanation of the impact of the mutants.


ROR                Description
ROR_4              The variable is only ever set to 0 or 1; therefore the mutant has no impact
                   (iff <= 0 is equivalent to iff == 0).
ROR_8 & ROR_10     The original code, before mutation of the For loop condition, is:

                   for(i=1;i<=54;i++) { ii=(21*i)%55; ma[ii]=mk; mk=mj-mk; if(mk

                   The mutation shortens the loop by one iteration, either by removing the equality
                   sign from the loop condition or by replacing '<=' with '!='. This means that not
                   all the elements in the array are populated. The array is indexed via the
                   following line of code:

                   ii=(21*i)%55;

                   Removing the '=' or replacing '<=' with '!=' in the for loop means that the index
                   value 34 is never generated; therefore array index 34 is not populated. In the
                   case of ROR_8 and ROR_10 this means its contents remain 0, i.e. not populated.
ROR_18 & ROR_20    The mutants reduce the indexing by one. The mutation leads to one less array
                   element being processed at the end of the array.
ROR_22 & ROR_27    This code wraps the array indexer back to 1 once 56 is reached; since the
                   counter never exceeds 56, '>=56' has the same effect as '==56'.

Table 23: Rationale for the non-detection of the ROR Mutants in Random_03

4.2.3 Test Subset Results

From the full test sets, numerous optimal (minimal) test sets were developed for each of the Programs under Test (PUT) via GAs. The subsets were generated to satisfy each of the individual objectives, i.e. S, B and M, and then combined to make Statement and Branch coverage (SB) and Statement, Branch and MC/DC (SBM) sets, as sketched below. No pruning of these combined test sets was undertaken. By construction these test sets clearly have redundant test cases.
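The combined sets are formed by simply concatenating one subset per objective, without removing duplicates, which is why their sizes in Table 24 are the sums of the single-objective sizes. A minimal sketch (hypothetical types; test cases represented here only by identifiers) is:

import java.util.ArrayList;
import java.util.List;

public class SubsetCombiner {
    // Concatenates a statement-adequate, a branch-adequate and an MC/DC-adequate
    // subset into an SBM set. No pruning is performed, so any test case that
    // appears in more than one subset is deliberately kept as redundancy.
    static List<String> combineSBM(List<String> statementSet,
                                   List<String> branchSet,
                                   List<String> mcdcSet) {
        List<String> sbm = new ArrayList<>(statementSet);
        sbm.addAll(branchSet);
        sbm.addAll(mcdcSet);
        return sbm;
    }
}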


Table 24 shows the number of unique optimum test sets generated to achieve each coverage objective, as well as the number of tests in each test set. If we take the Gammp numerical recipe, 9 unique test sets were generated to achieve statement coverage, 10 for branch coverage and 8 for MC/DC. The 'No of Test Cases (S,B,M)' column in Table 24 shows the number of test cases in each unique test set for each coverage objective, i.e. S – Statement, B – Branch and M – MC/DC. In Gammp's case this was 4 tests in each unique test set to achieve statement coverage, and 5 each for branch and MC/DC. The combined SB test set contained 9 tests and the SBM set contained 14.

Recipe     No of Test Sets  No of Test Sets  No of Test Sets  No of Test Cases  SB  SBM
           (Statement)      (Branch)         (MC/DC)          (S,B,M)
Bess       11               10               10               3,5,5             8   13
Cosft2     10               12               10               2,2,2             4   6
Dawson     15               15               15               2,2,2             4   6
EI         10               20               17               4,4,4             8   12
Expint     10               10               4                5,5,8             10  18
Gammp      9                10               8                4,5,5             9   14
Gas        16               16               16               2,2,2             4   6
Plgndr     13               8                5                2,4,6             6   12
Poisson    13               6                20               2,4,4             6   10
Random_1   19               19               18               2,2,2             4   6
Random_2   9                11               9                2,2,2             4   6
Random_3   12               16               13               2,2,3             4   7
RC         15               14               2                3,3,5             6   11
SVD        15               11               11               2,3,3             5   8

Table 24: Number of Optimum Test Sets Generated and Number of Tests in each Test Set

To assess the results from each subset generated for S, B, MC/DC, SB and SBM for each PUT, statistical measures were applied to the subsets generated. All of these except for range were calculated using Microsoft Excel functions. The statistical measures applied were as follows (a small sketch of these calculations follows the list):


Average - Returns the average (arithmetic mean) of its arguments, which can be numbers or names, arrays or references that contain numbers.
Variance - Estimates variance based on a sample.
Standard Deviation - Estimates standard deviation based on a sample.
Mode - Returns the most frequently occurring, or repetitive, value.
Median - Returns the number in the middle of a given set of numbers.
Min - Returns the smallest number in a set of values.
Max - Returns the largest number in a set of values.
Range - Max – Min.
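The sketch below shows the same calculations applied to a set of MS values (sample variance and standard deviation, matching Excel's VAR and STDEV; mode omitted for brevity; the MS values are illustrative only, not experimental data):

import java.util.Arrays;

public class MsStatistics {
    public static void main(String[] args) {
        double[] ms = {0.55, 0.62, 0.71, 0.71, 0.80};    // illustrative MS values only

        double avg = Arrays.stream(ms).average().orElse(Double.NaN);
        // Sample variance divides by n-1, as Excel's VAR function does.
        double var = Arrays.stream(ms).map(x -> (x - avg) * (x - avg)).sum() / (ms.length - 1);
        double sd  = Math.sqrt(var);

        double[] sorted = ms.clone();
        Arrays.sort(sorted);
        double median = sorted[sorted.length / 2];       // middle value (odd-length case)
        double min = sorted[0];
        double max = sorted[sorted.length - 1];
        double range = max - min;                        // range computed directly; no Excel function used

        System.out.printf("avg=%.4f var=%.5f sd=%.5f median=%.4f min=%.4f max=%.4f range=%.4f%n",
                          avg, var, sd, median, min, max, range);
    }
}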

The results are presented in Appendix B.

Table 25 summarises the average MS achieved by the test subsets for all PUTs, first in terms of a single testing objective and then for any objective (i.e. including combinations). Table 25 also shows the highest MS achieved by subsets for a single coverage objective and then for any objective. Based upon the average MS for a single coverage objective, MC/DC achieved the highest average MS for 8 programs, Statement for 4 programs and Branch for 2 programs. SBM achieved the highest average MS for 12 PUTs, and jointly with SB for 2 PUTs.

PUT        Best Single Obj  Best Objective  Highest MS      Highest MS by
           (Avg MS)         (Avg MS)        by Single Obj   any Objective
Bess       MC/DC            SBM             MC/DC           SBM
Cosft2     S                SBM             S=B=M           SBM
Dawson     S                SBM             S=B             SB = SBM
EI         S                SBM             S=B             SBM
Expint     MC/DC            SBM             S               SBM
Gammp      MC/DC            SBM             B=M             SBM
Gas        MC/DC            SBM             MC/DC           SBM
Plgndr     MC/DC            SBM             MC/DC           SBM
Poisson    MC/DC            SBM             S               SBM
Random_1   S                SBM             S=B=M           SB = SBM
Random_2   B                SB = SBM        B               SB = SBM
Random_3   MC/DC            SBM             MC/DC           SB = SBM
RC         MC/DC            SB = SBM        MC/DC           SB = SBM
SVD        B                SBM             B               SBM

Table 25: Summary of Subsets Results Based Upon Average and Highest MS58

For the 8 programs that contained complex expressions, MC/DC gained the highest average MS for 6 programs. Where the complex expressions contained eight conditions, MC/DC gained the highest average MS for all of these programs, i.e. Expint, Plgndr and RC. Based upon the highest MS achieved

58 The equality sign in the table indicates that the testing objective achieved the same MS.

by the test subsets for each PUT, Table 25 shows that MC/DC achieved the highest MS for 5 PUTs, while statement, branch and MC/DC jointly achieved the highest MS for 2 PUTs. For 9 of the PUTs SBM achieved the highest MS, while for 5 PUTs SB achieved the same MS as SBM. For the eight programs that contained complex expressions, SBM alone achieved the highest MS for 4 programs, i.e. Expint, Plgndr, Cosft2 and Gas. For the other 4 programs that contained complex expressions, i.e. RC, Random_01, Random_02 and SVD, SB achieved the same MS as SBM. For the three programs whose complex expressions contained 8 conditions, i.e. Expint, Plgndr and RC, SBM achieved the highest MS for Expint and Plgndr.

Table 26 shows the differences between the MS achieved by the full test set (FTS) applied to each PUT and the highest (max) and lowest (min) MS achieved by the test subsets for any coverage objective. Also shown in Table 26 are the differences between the MS achieved by the full test set and the highest average (HAVG) MS generated by the test subsets. Comparing the optimum test sets with the results gained from the full test sets mitigates the issue of mutation equivalence.

For the FTS – HAVG column, all the test sets were taken from the SBM subsets for the PUT. Three programs had an MS difference greater than 0.1, i.e. more than 10%: Bessj (0.175), Expint (0.363) and Poisson (0.189). For all other PUTs the differences based upon the average are smaller; for the Random_01 program the difference is very small indeed (0.008). Thus, for three of the fourteen PUTs, even an average test set satisfying a criterion that admits a good deal of redundancy can give rise to seriously poor fault-finding ability. In the context of critical systems development this would be worrying.

Comparing the full test set MS with the highest MS achieved by the test subsets indicates that for two programs, i.e. Random_01 and Random_02, there was no difference. For Cosft2, Random_03, SVD, Plgndr and Dawson the difference between the full test set MS and the highest subset MS was also small. Expint had the highest difference of 0.136 and was the only program with a difference greater than 10%.


PUT      FTS – HAVG MS  Min MS  Max MS  FTS MS  FTS – Min  FTS – Max
Bessj    0.175          0.055   0.804   0.831   0.776      0.027
Cosft2   0.026          0.558   0.904   0.905   0.347      0.001
Dawson   0.059          0.369   0.886   0.891   0.522      0.005
EI       0.047          0.397   0.785   0.797   0.400      0.012
Expint   0.363          0.167   0.707   0.843   0.676      0.136
Gammp    0.046          0.270   0.773   0.811   0.541      0.038
Gasdev   0.052          0.665   0.710   0.750   0.085      0.040
Plgndr   0.018          0.249   0.924   0.931   0.682      0.007
Poisson  0.189          0.232   0.863   0.904   0.672      0.041
Ran1     0.008          0.524   0.655   0.655   0.131      0.000
Ran2     0.032          0.603   0.855   0.855   0.252      0.000
Ran3     0.018          0.713   0.877   0.881   0.168      0.004
RC       0.022          0.032   0.809   0.859   0.827      0.050
SVD      0.020          0.756   0.809   0.812   0.056      0.003

Table 26: MS Differences Between the Full Test Set, Highest Average MS, Highest and Lowest Subset

Considering the MS obtained by the full test set minus the lowest MS achieved by any of the subsets indicates a large potential loss of fault finding capability. All the lowest MSs were achieved by a single testing objective; in five cases it was statement coverage. For two PUTs, i.e. SVD and Gasdev, the differences are small, i.e. less than 10%. However, for all other programs this is not the case. For Bessj and RC the minimum MS achieved was below 0.1.

The main reason for the very low MS in some of the PUTs was that these programs contained code that returned a default value, i.e. zero, if the input value was inside a given range. This led to a number of test cases returning zero as the resulting output. In the case of Bessj, 4 test sets generated for statement coverage achieved an MS of less than 0.1.
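The effect can be illustrated by a guard of the following shape (hypothetical code, not taken from the Bessj source): any test whose input falls in the guarded range yields zero regardless of what happens in the code below the guard, so such tests kill very few mutants even though they contribute to coverage.

public class DefaultReturnExample {
    static double evaluate(double x) {
        // Default value returned for a given input range: for these inputs the
        // rest of the method, where most mutants live, never influences the output.
        if (Math.abs(x) < 1.0e-10) {
            return 0.0;
        }
        // ... main numerical computation; mutants here are only observable for
        // inputs that do not take the early return ...
        return x * x;   // placeholder body
    }
}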

Table 26 clearly shows that using a single coverage objective can correspond to a very significant reduction in fault-finding capability: in all cases a reduction occurred, and while in two cases it was small, in all other cases it was substantial. Thus, optimal test sets (in terms of test set size) may come at a significant price. The developers may get "lucky", in that the test suite created happens to be at the top end of the fault finding range for sets of this size, or may be "unlucky", or even very unlucky, with a seriously under-performing, albeit coverage-criterion-compliant, test set.


4.2.4 Results from Miscellaneous and Sorting Algorithms.

The sorting algorithms were developed using the algorithms in [Sedgewick 03] and [Knuth 98]. The miscellaneous algorithms were taken from three internet sites. The vector statistical package59 (the Matrix Package) is a matrix class that includes a number of matrix manipulation methods, i.e. sub-setting of a matrix and simple calculations such as multiplication, addition, subtraction, squaring a matrix, etc. The calculator60 program evaluates expressions with no precedence ordering. The Roman numeral61 program converts Roman numeral years to Gregorian calendar dates.

File Name            Lines  Statements  Percent Branch  Maximum
                                        Statements      Complexity
Shell-Sort.java      273    33          12.1            3
Quick-Sort.java      299    28          21.4            7
MergeSort.java       58     44          18.2            7
Insertion-Sort.java  300    20          10              4
Heap-Sort.java       71     40          17.5            5
Bubble_Sort.java     291    23          13              4
Calc.java            246    104         39.4            32
Matrix.java          1,050  457         18.4            6
Year.java            102    66          19.7            14

Table 27: Non-numerical Program Properties

Unlike the numerical recipes, all the miscellaneous programs were tested by embedding an oracle inside each JUnit test case. For the sorting algorithms a simple oracle was used that checked that each element in the array was greater than or equal to the preceding element; if this was not the case, a JUnit failure was flagged.
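The sorting oracle is therefore of the following shape (a sketch; the input values and the enclosing class name of the PUT are illustrative assumptions, with the Shell_Sort signature taken from Figure 25):

import static org.junit.Assert.fail;
import org.junit.Test;

public class SortOracleTest {
    @Test
    public void outputIsInAscendingOrder() {
        int[] data = {5, 3, 9, 1, 7};                 // illustrative randomly generated input
        Shell_Sort.Shell_Sort(data, data.length);     // call to the PUT (assumed class name)

        // Oracle: each element must be greater than or equal to its predecessor;
        // otherwise a JUnit failure is flagged.
        for (int i = 1; i < data.length; i++) {
            if (data[i] < data[i - 1]) {
                fail("Array not in ascending order at index " + i);
            }
        }
    }
}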

Table 28 shows the MS achieved for each miscellaneous program by mutant sub-type. Since the six sorting algorithms generated a variety of MS, we decided to analyse the sorting program with the lowest MS, i.e. Shell Sort. The unmutated Shell Sort program is shown in Figure 25. LN in Figure 25 refers to the line number.

To analyse the Shell Sort program we used the same approach as defined in section 4.2.2. We imported the Shell Sort program into Eclipse and generated a JUnit test to drive it. We then manually injected the mutants into the program and compared the output from the

59 Vector statistical package - http://math.nist.gov/javanumerics/jama/ (The Matrix Package),

60 Calculator - http://klogd.users.sourceforge.net/ (It was the StringParser class that was used).

61 Roman numeral year converter - https://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=6305&lngWId=2

original and mutated programs. This was achieved by copying the output into two separate files and using the compare function in NotePad++.

We examined the test summary file for the two mutant sub-types, i.e. AORB and AOIS, for the Shell Sort program. This indicated which tests passed and failed. For the tests that passed, we extracted from the mutation log those mutants relating to AORB and AOIS.

For all but five of the mutants injected (3 for AORB and 2 for AOIS), the mutation led to no negative side effect, or to what [Beizer 90] refers to as coincidental correctness. Therefore, there were no detectable differences between the non-mutated and mutated programs.

For the remaining five mutants, differences in output values were generated; however, in all of these cases the output was still in ascending order, even though the actual values were not the same as the original values. The injected mutants had a negative side effect on the actual values but did not lead to an incorrect ordering. Therefore, the tests still passed, since the array was sorted in ascending order, but the values no longer matched the original values.

Because of this discovery, all other mutants for the Shell Sort program were examined; however, no other discrepancies were found. Since we found this issue for the AORB and AOIS mutant sub-types, we also examined the Bubble Sort program and discovered one discrepancy for the AOIU mutant sub-type.
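One way to catch this kind of coincidental correctness (a sketch, assuming a trusted reference such as java.util.Arrays.sort is acceptable as an oracle) is to compare the PUT's output against a reference-sorted copy of the same input, so that changed values are flagged even when the ordering remains correct:

import static org.junit.Assert.assertArrayEquals;
import java.util.Arrays;
import org.junit.Test;

public class StrongerSortOracleTest {
    @Test
    public void outputMatchesReferenceSort() {
        int[] data = {5, 3, 9, 1, 7};                 // illustrative input
        int[] expected = data.clone();
        Arrays.sort(expected);                        // trusted reference result

        Shell_Sort.Shell_Sort(data, data.length);     // PUT call (assumed class name)

        // Fails if any value differs from the reference, even if the output
        // happens to be in ascending order.
        assertArrayEquals(expected, data);
    }
}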


PUT/No of Test [XXX] COR COI COD AODU AODS AOIU AORB AORS LOI LOR LOD ROR SOR ASRS AOIS Overall M.S Bubble Sort 1.000 1.000 0.813 1.000 0.909 0.933 0.813 [500] - (0,3) - - - (0,4) (3,13) (0,1) (1,10) - - (1,14) - - (6,26) 0.865 Heap Sort 1.000 1.000 1.000 1.000 1.000 0.800 0.833 [500] - (0,6) - - - (0,11) (0,24) (0,3) (0,23) - - (4,16) - - (11,55) 0.902 Insertion Sort 1.000 1.000 1.000 1.000 1.000 1.000 0.800 0.900 [500] (0,2) (0,4) - - - (0,4) (0,8) (0,2) (0,11) - - (2,8) - - (3,27) 0.928 Merge Sort 1.000 0.778 1.000 1.000 1.000 0.933 0.800 0.911 [500] (0,2) (2,7) - - - (0,8) (0,8) (0,10) (2,28) - - (1,4) - - (5,51) 0.922 Quick Sort 1.000 1.000 1.000 1.000 0.960 0.867 0.875 [500] - (0,6) - - - (0,10) (0,8) (0,4) (1,24) - - (2,13) - - (10,70) 0.912 Shell Sort 1.000 0.778 0.75 1.000 0.857 0.850 0.727 [500] - (0,4) - - - (2,7) (10,18) (0,1) (3,18) - - (3,17) - - (22,46) 0.778 Avg 1.000 0.963 - - - 0.921 0.909 1.000 0.943 - - 0.842 - - 0.835 0.8845 Matrix 1.000 0.928 1.000 0.941 0.981 1.000 0.968 0.825 1.000 0.929 [2500] (0,2) (6,77) - (0,1) - (9,144) (3,153) (0,74) (13, 387) - - (14, 66) - (0,16) (71,927) 0.946 Year 1.000 1.000 0.923 0.827 1.000 [500] - (0,13) ------(0,13) - - (5,60) - (9,43) (0,52) 0.928 Calc 0.875 1.000 1.000 0.882 1.000 0.941 0.977 0.811 0.802 [1000] (3,21) (0,48) - - (0,4) (2,15) (0,60) (1,16) (1,42) - - (33,142) - - (48,194) 0.860 Avg 0.938 0.976 - 1.000 1.000 0.912 0.990 0.971 0.981 - - 0.853 - 0.913 0.910 0.911

Table 28: Number of Mutants Alive, (Killed) by Mutation and Mutation Operator and Subtype


Function Shell_Sort
LN
20  static public void Shell_Sort(int [] shell_array, int elements)
21  {
22    int k = 1;
23    int noswap;
24    do{
25      do{
26        noswap = 0;
27        noswap = scan (shell_array, elements, k);
28      }while(noswap >0);
29      k =(k - 1) / 2;
30    }while(k >= 1);
31  }//end of function Shell_Sort

Function scan
LN
33  static public int scan (int [] sub_shell_array, int sub_noswap, int sub_k)
34  {
35    int temp = 0;
36    int return_noswap = 0;
37    int ubound = (sub_noswap-1) - sub_k;
38    for(int i=0; i<=ubound; i++){
39      if(sub_shell_array[i] > sub_shell_array [i + sub_k]){
40        temp = sub_shell_array[i];
41        sub_shell_array[i] = sub_shell_array[i + sub_k];
42        sub_shell_array[i + sub_k] = temp;
43        sub_noswap = 1;
44        return_noswap = sub_noswap;
45      }
46    }
47    return return_noswap;
48  }//end of function scan

Figure 25: Shell Sort Java Source Code


Mutation Operator and Mutate Reason AORB_1:29:void_Shell_Sort(int,int):k - 1 => k * 1 The k variable refers to the nth element in dividing the array. No negative side effect. AORB_2:29:void_Shell_Sort(int,int):k - 1 => k / 1 The k variable refers to the nth element in dividing the array. No negative side effect. AORB_3:29:void_Shell_Sort(int,int):k - 1 => k % 1 . AORB_5:29:void_Shell_Sort(int,int):(k - 1) / 2 => (k - 1) * 2 The k variable refers to the nth element in dividing the array. No side effect. AORB_6:29:void_Shell_Sort(int,int):(k - 1) / 2 => (k - 1) % 2 The k variable refers to the nth element in dividing the array. No side effect. AORB_7:29:void_Shell_Sort(int,int):(k - 1) / 2 => k - 1 + 2 The k variable refers to the nth element in dividing the array. No side effect. AORB_8:29:void_Shell_Sort(int,int):(k - 1) / 2 => k - 1 – 2 The k variable refers to the nth element in dividing the array. No side effect. AORB_21:41:int_scan(int,int,int):i + sub_k => i * sub_k The value held in the array is manipulated. The values in the array is still sorted in ascending order, however, the actual values do not match the original values. AORB_22:41:int_scan(int,int,int):i + sub_k => i / sub_k The value held in the array is manipulated. The values in the array is still sorted in ascending order, however, the actual values do not match the original values. AORB_23:41:int_scan(int,int,int):i + sub_k => i % sub_k The value held in the array is manipulated. The values in the array is still sorted in ascending order, however, the actual values do not match the original values. AOIS_4:27:void_Shell_Sort(int,int):elements => elements-- The array is still correctly sorted even when post decrement is applied. Since on the first sort the correct number of elements is sorted. No negative side effect. AOIS_7:27:void_Shell_Sort(int,int):k => k++ The k variable refers to the nth element in dividing the array. No negative side effect. AOIS_11:28:void_Shell_Sort(int,int):noswap => noswap++ The post increment operator lead to no negative side effect in calling the swap function the correct number of time. AOIS_12:28:void_Shell_Sort(int,int):noswap => noswap— The post decrement operator lead to no negative side effect in calling the swap function the correct number of time. th in AOIS_13:29:void_Shell_Sort(int,int):k => ++k The k variable refers to the n element dividing the array. No negative side effect. th in AOIS_14:29:void_Shell_Sort(int,int):k => --k The k variable refers to the n element dividing the array. No negative side effect. th in AOIS_15:29:void_Shell_Sort(int,int):k => k++ The k variable refers to the n element dividing the array. No negative side effect. th in AOIS_16:29:void_Shell_Sort(int,int):k => k-- The k variable refers to the n element dividing the array. No negative side effect. AOIS_18:30:void_Shell_Sort(int,int):k => --k The k variable refers to the nth element in dividing the array. No negative side effect. AOIS_19:30:void_Shell_Sort(int,int):k => k++ The k variable refers to the nth element in dividing the array. No negative side effect. AOIS_20:30:void_Shell_Sort(int,int):k => k-- The k variable refers to the nth element in dividing the array. No negative side effect. AOIS_23:37:int_scan(int,int,int):sub_noswap => sub_noswap++ The post increment operator lead to no negative side effect in calling the swap function the correct number of time. 
AOIS_24:37:int_scan(int,int,int):sub_noswap => sub_noswap-- The post decrement operator lead to no negative side effect in calling the swap function the correct number of time. AOIS_59:42:int_scan(int,int,int):temp => ++temp The value held in the array is manipulated. The values in the array is still sorted in ascending order, however, the actual values do not match the original values.


AOIS_60:42:int_scan(int,int,int):temp => --temp The value held in the array is manipulated. The values in the array is still sorted in ascending order, however, the actual values do not match the original values.

AOIS_61:42:int_scan(int,int,int):temp => temp++ The post increment operator lead to no negative side effect. AOIS_62:42:int_scan(int,int,int):temp => temp-- The post increment operator lead to no negative side effect. AOIS_63:44:int_scan(int,int,int):sub_noswap => ++sub_noswap The post increment operator lead to no negative side effect. AOIS_65:44:int_scan(int,int,int):sub_noswap => sub_noswap++ The post increment operator lead to no negative side effect. AOIS_66:44:int_scan(int,int,int):sub_noswap => sub_noswap-- The post increment operator lead to no negative side effect. AOIS_67:47:int_scan(int,int,int):return_noswap => return_noswap++ The post increment operator lead to no negative side effect AOIS_68:47:int_scan(int,int,int):return_noswap => return_noswap-- The post increment operator lead to no negative side effect

Figure 26: AORB and AOIS Defects Not Detected in the Shell Sort Program


4.3 Program Properties

We applied commonly used metrics to our PUTs, listed in Table 15. Our results indicate that there does not appear to be any correlation between commonly applied metrics and test effectiveness. It is noted that mutation equivalence has not been taken into account in our program properties.

Random_01 and Gasdev, which are among the programs with the lowest maximum complexity (both 9), achieved the lowest MS of the 14 numerical recipes. The Cosft2 and SVD_NR programs included the largest numbers of statements, yet achieved an MS of 0.90 and 0.8122 respectively. The four programs with the highest percentage of branch statements (greater than 20%), i.e. EI, Expint, Plgndr and SVD_NR, achieved an MS of 0.797, 0.843, 0.930 and 0.812. The three programs with the lowest percentage of branch statements, i.e. RC_fun, Cosft2 and Dawson, achieved an MS of 0.85, 0.90 and 0.89.

The sorting algorithm results show that 4 out of the 6 achieved an MS greater than 0.90. The two sorting algorithms that achieved the lowest MS, i.e. Shell-Sort and Bubble_Sort, had percentage branch statements of 12.1 and 13 and maximum complexities of 3 and 4. The Calc program has the highest percentage of branch statements (39.4) and a maximum complexity of 32, and achieved an MS of 0.860.

4.4 Conclusions

This chapter has presented the results of measuring effectiveness and reliability by conducting automated random testing on programs that have been fault injected. The programs used were primarily numerical recipes, as our main experimental programs, but other programs, i.e. sorting algorithms etc., have also been tested and the results reported in this chapter. For the numerical programs we examined effectiveness and reliability, while for the other programs we only examined effectiveness. For the numerical recipes we generated a number of optimum unique test sets that met each of the coverage criteria. We also combined these single-objective optimum test sets, which adds limited redundancy, so that effectiveness and reliability could be examined for test sets with limited redundancy.

The programs used in our experiments are relatively small; however, all of these programs contained selection, iteration and sequential program constructs, and the majority of the programs under test used array constructs. It is noted that the programs are traditional, non-object-oriented programs, i.e. they make no use of classes, and they are not real-time programs. Therefore, in some instances the results gained here may not scale to some specific kinds of program. However, it is believed that for a range of programs built from traditional constructs and the basic building blocks of programs, i.e. iteration and selection, the results gained here do scale. This is similar to the testing framework developed here: for any program that accepts an input and generates an output the framework is scalable. For systems that do not generate external outputs but contain internal state data, this state data could be logged and used as the verification check points. Therefore, it is believed that the results gained here are scalable when considering traditionally developed, non-real-time programs. Also, the framework is scalable to enable much larger programs to be tested in a similar fashion as achieved here, or with small modifications to the program under test.

We have not taken mutation equivalence into account for the full test set, i.e. we have not adjusted the MS for the full test set, since we are primarily interested in examining the effectiveness and reliability of the optimum test sets that meet the three widely used coverage criteria. Comparing the optimum test set results against the full test set results removes the issue of mutation equivalence. However, we have examined mutation equivalence in this chapter, and it would appear to account for the low MS of some programs. We have shown that for AOIS, AOIU, LOI and ROR, in the majority of cases the surviving mutants were equivalent mutants. We have shown that program structure can prevent injected mutants from causing any observable differences; while a small number of mutants did lead to localised changes, the global result was unaffected. This leads to an interesting question about what counts as detection and where to place verification points. While we are not directly interested in this issue here, since we are performing strong mutation, it is an issue relating to weak mutation and has been raised by some authors, e.g. [Woodward & Halewood 88].

Our results question the blind faith in the adequacy of three widely accepted coverage objectives. This is particularly relevant where MC/DC coverage is used as a stopping criterion, since this is the “gold standard” in many ways for civil avionics application testing. Our results show that for the programs tested coverage and criterion based testing may not be as reliable as people hope or expect. Developers may get “lucky” or “unlucky” in terms of fault finding ability of developed test sets. In particular, paring test suites down to be “optimal” with respect to an identified criterion may result in a very significant reduction in fault finding ability.

One aspect we have not addressed here is reduction of test set size whilst maintaining mutation score, i.e. what subsets maintain the full test set MS. This remains a credible objective but can clearly be addressed largely by the same search apparatus we have created. Indeed, there are many further notions of optimality that can and should be investigated. For example, if a means to evaluate the “cost” of an individual test (or indeed test set) is available, we could assess the statistical variability of mutation scores for test sets of a specified cost (budget constrained testing). Cost could be an estimate of manual effort to extract a test result, i.e. an oracle cost.

Reliability of criteria is an important topic of investigation, especially since important software safety standards mandate the use of specific coverage criteria. The standards are perceived as having


reached a view on the efficacy of testing resulting from applying those criteria. In practice, developers may be inclined to stop when they have satisfied the indicated criteria. However, unless the criterion is reliable this is no guarantee of fault finding effectiveness. Reliability of criteria needs to be established on a sound basis and is clearly under researched. This chapter contributes to our understanding of reliability of criteria.

The primary PUTs are taken from a specific domain (numerical algorithms) and undoubtedly have their own characteristics. This is also true for the other programs we assessed as part of this chapter. However, the framework described in chapter 3 has very significant potential for wider-scale experimentation on other code repositories. The work presented here is a proof of concept; its leveraging of automatic test data generation and subset extraction provides a promising exploitation avenue for other work in these areas.

Our repeated application of mutation analysis demonstrates very clearly that not all criterion-satisfying test subsets are equal. We have used mutation analysis repeatedly as a reference criterion for judging the comparative effectiveness of test sets, i.e. we have used it as a scientific tool. But mutation analysis is widely recognised as a stringent criterion in its own right, and perhaps the major implication of our work is that mutation adequacy might usefully be adopted more widely as the primary "coverage" criterion.


5. Testing Encryption Algorithms

Our final research objective, identified in section 1.2 and reproduced below, is concerned with how thoroughly high profile applications are tested by specific test suites crafted by experts. The final research objective is:

 To measure the effectiveness of three reference test sets developed to test a DES algorithm.

Reference test sets (i.e. sets of input and output pairs) for cryptographic algorithms have been developed by cryptographic standards bodies or other international groups of experts. The aim is to provide recognisable and widely adopted suites against which implementations can be tested. For some target algorithms there may be several such suites. This raises the question: why? Do such suites actually test the target applications with the rigour expected? Are they different in what they achieve? Thus, we wish to know whether the developed test sets are actually effective and whether expert crafting of such test sets is a reliable approach. The former is the primary goal (though we may infer some evidence regarding the latter). The above questions are important: a faulty implementation that passes reference tests could be the subject of misplaced confidence and the source of unintended information leakage when exploited by a knowledgeable security expert.

The research reported in this chapter carries out mutation analysis to directly support the final research objective. DES is representative of a typical symmetric key block encryption algorithm (where the sender and receiver have the same key). Such algorithms typically have internationally supported reference test sets developed for them. However, our investigation also enables us to gain some insight into whether cryptographic algorithms are easy to test, i.e. whether gaining mutation adequacy is a trivial matter requiring only a few tests. Although attaining such insight is not the identified goal of the research, it is of interest to researchers in the field. Furthermore, since it emerges as a by-product of our primary research we report it here. DES is highly iterative and its internal state data is highly interdependent. We might hypothesise that this results in an extremely fragile program, with likely propagation of the effects of inserted mutations, leading to significant levels of observability and detectability. We apply further mutation analysis to the Java BigInteger package. BigInteger forms a useful component for implementing many public key algorithms (e.g. modular exponentiation forms the basis of the RSA algorithm, where C = P^d mod n). We choose this extra target to ensure that any inferences with respect to testability are not specific to block ciphers such as DES.


As far as we are aware, there is no known formal assessment of the ability of available test sets to find flaws in the implementation. This chapter sets out to provide just that. The availability of the test sets makes it possible for others to repeat the tests conducted in this chapter and further the domain knowledge on the effectiveness of test reference systems for encryption-based systems.

For the DES Java program we applied different publicly available test vectors to assess their effectiveness. We used the refined testing framework defined in chapter 3. As in the previous chapter we capture coverage metrics for each test. We originally planned to generate test subsets; however, during the execution of the test vectors it emerged that usually a single test sufficed to achieve statement, branch and MC/DC coverage. Thus the mutation scores achieved by the set of test executions simultaneously provide an indication of the reliability of the coverage criteria.

This chapter is divided into the following sections that cover:
 Programs under test, program properties and test vectors (5.1).
 The results generated from running the test vectors (5.2).
 Conclusions (5.3).

5.1 Programs under Test, Program Properties and Test Vectors

The Java implementation of DES was taken from http://www.herongyang.com/. DES is a symmetric block cipher, operating on blocks of 64 bits of data with a key of 64 bits; 56 bits are working key bits, and the other 8 are parity bits. From the 56 working bits 16 subkeys are extracted, each of 48 bits. These subkeys are typically denoted by K0, K1, …, K15. The algorithm is essentially a concatenation of 16 identical mini-ciphers, called rounds, and the jth round uses the subkey Kj. DES is often described as an iterated block cipher. DES has a particular structure, often referred to as a Feistel cipher. Feistel ciphers have the convenient property that encryption and decryption are essentially the same algorithm, but with the subkeys of encryption applied in reverse order.

In detail DES has the following steps:
1. A block of 64 bits is permuted by an initial permutation IP.
2. The resulting 64 bits are divided into two halves of 32 bits, left and right.
3. The right half goes through a function F (the Feistel function).
4. The left half is XOR-ed with the output from the F function above.
5. Left and right are swapped (except in the last round).
6. If this is the last round, apply an inverse permutation IP-1 on both halves and that is the output; else, go to step 3.

Figure 27 illustrates the above.


Figure 27: DES Implementation (Quoted directly from [N32011])
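As a rough sketch of the round structure described in the steps above (illustrative Java only: the permutations, S-boxes and key schedule are omitted, the F function is a placeholder, and this is not the implementation under test):

public class FeistelSketch {
    // One DES-style round: the new right half is L XOR F(R, Kj) and the new
    // left half is the old right half (the swap of step 5).
    static long[] round(long left, long right, long subKey) {
        long newLeft  = right;
        long newRight = left ^ f(right, subKey);
        return new long[] {newLeft, newRight};
    }

    // Placeholder for the Feistel function F of step 3 (expansion, S-boxes and
    // permutation omitted in this sketch).
    static long f(long rightHalf, long subKey) {
        return rightHalf ^ subKey;
    }

    // 16 rounds over the two (here 64-bit for simplicity) halves. Running the
    // same structure with the subkeys in reverse order performs decryption.
    static long[] sixteenRounds(long left, long right, long[] subKeys) {
        for (int j = 0; j < 16; j++) {
            long[] halves = round(left, right, subKeys[j]);
            left = halves[0];
            right = halves[1];
        }
        // Undo the final swap, since step 5 is skipped in the last round.
        return new long[] {right, left};
    }
}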

DES can now be broken by raw computational power. However, it was for a long time the most widely used cryptographic algorithm in the world and was in many ways a remarkable intellectual achievement. It is used here since cipher source code is widely available and a number of test vectors are publicly available to test the cipher.

To test the DES implementation we applied 3 different publicly declared test vectors. These test vectors are listed below:

DES Test Vector One
Project NESSIE - New European Schemes for Signature, Integrity, and Encryption
https://www.cosic.esat.kuleuven.be/nessie/testvectors/bc/des/Des-64-64.test-vectors
Contained 772 Tests

DES Test Vector Two
NIST Modes of Operation Validation System (MOVS): Requirements and Procedures – Appendix B - Table 1
http://csrc.nist.gov/publications/nistpubs/800-17/800-17.pdf
Contained 64 Tests

DES Test Vector Three
NBS's validation for DES
http://www.skeptictank.org/files/faq/testdes.htm
Contained 34 Tests

The tests provided three arguments to the DES program, i.e. the Key, the Plaintext (or message) and the Ciphertext that should be returned by DES. The DES_01 test vector (TV) contained 8 different sub-types that changed the key and plaintext. These are defined below:

TV1 - The key is changed systematically from 8000000000000000 to 0000000000000001, with an identical plaintext of 0000000000000000. The key is changed through 8, 4, 2 then 1 for each hex digit, i.e. a single set bit is moved across the key.

TV2 - The same key of 0000000000000000, but with the plaintext being amended in the same systematic manner as the key is changed in TV1.

TV3 - Different keys and plaintext that matched each other.

TV4 – Different key and different plaintext.

TV5 - Change Key as in TV1, cipher being 0000000000000000 and different plaintext.

TV6 – The same key of 0000000000000000, but with the cipher changing as in TV1, and a different plaintext.

TV7 – Key and Cipher that matched each other in a systematic method by incrementing the key. Different plaintext.

TV8 - Different Key and plaintext.

The DES_02 test vector used an odd-parity key, i.e. 0101010101010101, with the plaintext amended in the same systematic way as for DES_01 TV2. The DES_03 test vectors are based upon different keys and plaintexts.
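The systematic pattern of TV1 (a single set bit moved across the 64-bit key, giving the leading digits 8, 4, 2, 1 in each hex position) can be reproduced with a few lines of Java; this is a sketch of the pattern only, not part of the test framework:

public class Tv1KeyPattern {
    public static void main(String[] args) {
        // Walk a single set bit from 8000000000000000 down to 0000000000000001,
        // pairing each key with the all-zero plaintext used by TV1.
        for (int bit = 63; bit >= 0; bit--) {
            long key = 1L << bit;
            System.out.printf("key: %016X  plaintext: %016X%n", key, 0L);
        }
    }
}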

Testing the DES program required a separate program that passed the required arguments to the DES program. We used the test program provided at http://www.herongyang.com/; however, we amended it to accept arguments received from our test framework. Separate C# Test Data Generator components were used for each DES test vector. The test vectors were implemented via arrays inside the C# Test Data Generator. A VBA script was used to extract the test vectors into the required format. The C# Test Data Generator components then called the DES test programs with the three arguments in the test vectors.

All the test vectors were run in encrypt mode. We then ran all the test vectors again but only on the mutants that related directly to the decrypt mode source code. Nineteen mutants were executed specifically for the decrypt mode. We then combined the results from encrypt and decrypt to determine the final MS for the mutants injected in the DES program.


We also tested a Java implementation of Big Integer. The Big Integer program was modified to accept two arguments from the C# Test Data Generator component. The two arguments were randomly generated using the C# randomisation function. The Big Integer program added, subtracted, divided and multiplied the two arguments and returned the results.

The program properties (metrics62) are listed in Table 29 below.

File Name/Metric  Lines  Statements  % Branch    Method Call  Methods    Average Statements  Maximum
                                     Statements  Statements   per Class  per Method          Complexity
Big_Integer       229    181         17.7        53           18         8.67                10
DES               270    142         11.3        41           11         11                  5

Table 29: DES and Big Integer Program Properties

We instrumented the source code for DES and Big_Integer via Code-Cover to gain code coverage metrics for the PUTs. For all the DES test vectors, 100% statement coverage was achieved when combining the encrypt and decrypt tests. For branch and MC/DC coverage the test vectors gained 100% coverage except for one condition relating to error checking code that ensured the message length was the required length.

The six mutants relating to the error are shown below:

AOIS_1:21:byte_cipher(byte,byte,java.lang.String):theMsg.length => ++theMsg.length
AOIS_2:21:byte_cipher(byte,byte,java.lang.String):theMsg.length => --theMsg.length
AOIS_3:21:byte_cipher(byte,byte,java.lang.String):theMsg.length => theMsg.length++
AOIS_4:21:byte_cipher(byte,byte,java.lang.String):theMsg.length => theMsg.length--
COI_1:21:byte_cipher(byte,byte,java.lang.String): theMsg.length < 8 => !(theMsg.length < 8)
LOI_1:21:byte_cipher(byte,byte,java.lang.String):theMsg.length < 8 => -theMsg.length<8

MuClipse never generated source code for the four AOIS-related mutants, since such code is not syntactically correct. For COI_1 and LOI_1, the injected fault meant that the error trapping code itself was exercised by the mutants: in the COI_1 case the insertion of the NOT inside the 'if' statement caused the error trapping code to execute, while in the LOI_1 case the '-' arithmetically negates the variable theMsg.length, so that theMsg.length is then less than 8 and the error trapping code is executed.
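From the mutant descriptions, the guard in question has roughly the following shape (reconstructed for illustration only, not the verbatim source), which makes it clear why the COI and LOI mutants drive execution into the error-trapping branch rather than avoiding it:

public class CipherGuardShapes {
    // Original guard (around line 21 of byte_cipher): reject short message blocks.
    static boolean originalGuard(byte[] theMsg) {
        return theMsg.length < 8;            // true => error trapping code runs
    }

    // COI_1: the condition is negated, so valid blocks of 8 or more bytes
    // now trigger the error trapping code instead.
    static boolean coiMutant(byte[] theMsg) {
        return !(theMsg.length < 8);
    }

    // LOI_1: the length is arithmetically negated, so the condition is true
    // for every non-empty message and the error trapping code always runs.
    static boolean loiMutant(byte[] theMsg) {
        return -theMsg.length < 8;
    }
}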

62 The metrics were captured using Source-Monitor V 2.6.3.104 - http://www.campwoodsw.com/sourcemonitor.html.


5.2. Mutation Results

The results generated by the use of the refined framework are shown in Table 30. We show the results for each of the three different DES test vectors applied and for the Big Integer. The table also shows the number of test cases, mutants generated, mutants killed and MS for each PUT.

DES/PUT      No. of Test Cases  Mutants Generated  Mutants Killed  Raw Mutation Score
DES_01       772                774                681             0.87984
DES_02       63                 774                657             0.84884
DES_03       34                 774                676             0.87339
Big_Integer  500                860                712             0.82791

Table 30: Number of Tests, Mutants Generated, Killed and Raw MS

Table 31 shows the different MO sub-types injected into the DES implementation and the number of mutants killed by each DES test vector. Table 32 shows the MO sub-types injected into the Big Integer implementation and the mutants killed by the Big Integer randomly generated test set. Table 33 shows the MS for each of the MO sub-types for each of the DES test vectors and for the Big Integer.

DES/PUT     COI  AOIU  AORB  AORS  LOI  LOR  ROR  SOR  AOIS  Total
M.Injected  13   64    208   13    115  16   5    12   328   774
DES_01      13   64    190   13    115  15   4    7    260   681
DES_02      12   63    184   13    113  15   4    7    238   649
DES_03      13   64    190   13    115  15   4    7    255   676

Table 31: Mutant Operators Sub-Types Applied to the DES PUT.

PUT         COR  COI  COD  AODU  AOIU  AORB  AORS  LOI  ROR  AOIS  Total
M.Injected  12   41   1    2     43    96    13    113  145  394   860
Big_Int     12   39   1    1     33    87    13    105  117  303   712

Table 32: Mutant Operators Sub-Types Applied to the Big Integer PUT.

MS        COR    COI    COD    AODU   AOIU   AORB   AORS   LOI    LOR    ROR    SOR    AOIS   Total
DES_01    -      1.000  -      -      1.000  0.913  1.000  1.000  0.938  0.800  0.583  0.793  0.880
DES_02    -      0.923  -      -      0.984  0.885  1.000  0.983  0.938  0.800  0.583  0.726  0.839
DES_03    -      1.000  -      -      1.000  0.913  1.000  1.000  0.938  0.800  0.583  0.777  0.873
Big_Int   1.000  0.951  1.000  0.500  0.767  0.906  1.000  0.929  -      0.807  -      0.769  0.828

Table 33: MS for PUT

145

MS        COR    COI    COD    AODU   AOIU   AORB   AORS   LOI    LOR    ROR    SOR    AOIS    Total
DES_01    -      1.000  -      -      1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000   1.000
DES_02    -      0.923  -      -      0.984  0.885  1.000  0.983  1.000  1.000  1.000  0.9329  0.849
DES_03    -      1.000  -      -      1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.9847  0.994

Table 34: Real MS taking into account Mutation Equivalence

During the DES experiments we discovered the issue of live-lock. With the original framework a timing thread was used inside JUnit that timed out and logged a failure. For the refined framework we terminate the processing thread that executes the test batch file containing the test vectors. The consequence of this is that we cannot determine the effectiveness of the tests that follow the test that caused the live-lock.
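A minimal sketch of this kind of termination logic is given below. It is illustrative only (it is not the framework's actual code) and assumes a Windows batch file and Java 8 or later:

    import java.util.concurrent.TimeUnit;

    // Illustrative sketch: run the batch file that drives a mutant and kill it if it live-locks.
    public class BatchRunner {
        public static boolean runWithTimeout(String batchFile, long timeoutSeconds) throws Exception {
            Process p = new ProcessBuilder("cmd", "/c", batchFile).start();
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                p.destroyForcibly();   // live-lock suspected: terminate the mutant run
                return false;          // no result files are produced for the tests that follow
            }
            return p.exitValue() == 0;
        }
    }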

To evaluate live-lock caused by the mutants for DES and Big Integer we amended the MSG component to place only a zero or a one in the mutation table, by examining the result files generated by running the tests. If no result file existed for a test then this left an empty space in the mutation table, thereby identifying tests that were not executed (a sketch of this marking rule is given after the findings below). Our findings are summarised below:

DES_01 – 20 mutants generated live-lock. The test vectors generated failures for 15 of these mutants before live-lock occurred. For the remaining 5 mutants no test failures occurred before live-lock.

DES_02 – 7 mutants generated live-lock. The test vectors generated failures for all 7 mutants before live-lock occurred.

DES_03 – 3 mutants generated live-lock. The test vectors generated failures for all 3 mutants before live-lock occurred.
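A sketch of the marking rule described above for the amended MSG component is given below. The result-file naming convention (result_<n>.txt) and the use of the word FAIL to denote a failed test are our assumptions for illustration only:

    import java.io.File;
    import java.nio.file.Files;

    // Sketch only, with assumed file naming/content conventions: reproduce the
    // 0 / 1 / blank rule for one mutant's row in the mutation table.
    public class MutationTableRow {
        public static String[] buildRow(File resultDir, int numTests) throws Exception {
            String[] row = new String[numTests];
            for (int t = 0; t < numTests; t++) {
                File f = new File(resultDir, "result_" + t + ".txt");
                if (!f.exists()) {
                    row[t] = "";                                   // test never ran (live-lock earlier in the batch)
                } else {
                    String content = new String(Files.readAllBytes(f.toPath()));
                    row[t] = content.contains("FAIL") ? "1" : "0"; // 1 = mutant killed by this test
                }
            }
            return row;
        }
    }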

Apart from the additional 5 AOIS mutants that DES_01 killed, the DES_01 and DES_03 test vectors produced the same results in terms of test effectiveness.

For Big_Integer, 357 of the mutants caused live-lock. For 19 of these mutants no test failures occurred before live-lock; 14 of these related to LOI mutants and 4 to ROR. Table 32 reflects these results and takes live-lock into account.

Due to the structure of the DES source code, it was discovered that 100% code coverage (statement, branch and MC/DC) could be obtained by each individual test. (The only exception was the error-trapping code discussed above.) Big_Integer required only two tests to be executed to gain full statement, branch and MC/DC coverage. The only exception was an if-else condition that the randomly generated test cases could not exercise to achieve full branch and MC/DC coverage; it was impossible to generate a test case to fully exercise this if-else condition.

146

Since each test case in each of the different DES test vectors gained the same coverage, we could use each individual test case's results as a measure of coverage reliability. We obtained the mutation score for each test case in the different test vectors and then applied the different dispersion-based statistics to the results. The results are shown in Table 35. Samples of the individual test case results, i.e. mutants killed for each MO sub-type, are shown in Appendix A of this chapter across all the DES test vectors.
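The dispersion measures of Table 35 can be computed as sketched below. This is our own illustrative code; the thesis results were produced in Excel, whose variance and standard deviation conventions (sample versus population) may differ slightly from the population formulas used here:

    // Minimal sketch: dispersion measures over the per-test mutation scores of one test vector.
    public class Dispersion {
        public static void summarise(double[] ms) {
            double mean = 0, min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
            for (double s : ms) { mean += s; min = Math.min(min, s); max = Math.max(max, s); }
            mean /= ms.length;
            double var = 0;
            for (double s : ms) var += (s - mean) * (s - mean);
            var /= ms.length;                       // population variance assumed
            System.out.printf("mean=%.6f var=%.6g std=%.6f min=%.6f max=%.6f range=%.6f%n",
                    mean, var, Math.sqrt(var), min, max, max - min);
        }
    }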

Since the DES_01 test vector set contained 8 different subsets based upon changing the key or message, we show the results both for all the tests and for the individual subsets of the DES_01 test vectors. At the end of this chapter we show a small extract of the Excel data sheets for some of the results that make up Table 35.

PUT     Set   No. of Tests  Mean      Var          Std       Min       Max       Range
DES_01  All   772           0.800942  0.002990752  0.054652  0.731266  0.865633  0.134367
DES_01  1     64            0.846212  0.000645024  0.025195  0.775194  0.865633  0.090439
DES_01  2     64            0.78991   1.33336E-05  0.003623  0.775194  0.795866  0.020672
DES_01  3     256           0.853291  6.93616E-05  0.008312  0.777778  0.860465  0.082687
DES_01  4     2             0.852713  0            0         0.852713  0.852713  0
DES_01  5     64            0.844387  0.000355643  0.018708  0.788114  0.855297  0.067183
DES_01  6     64            0.791081  1.3072E-05   0.003587  0.77261   0.794574  0.021964
DES_01  7     256           0.731266  5.45698E-30  2.33E-15  0.731266  0.731266  0
DES_01  8     2             0.852713  3.33847E-06  0.001292  0.851421  0.854005  0.002584
DES_02  All   63            0.829109  9.59576E-06  0.003073  0.824289  0.834625  0.010336
DES_03  All   34            0.861453  0.000298     0.017005  0.803618  0.869509  0.065891

Table 35: Statistics for the DES Test Vectors63

Table 35 shows the low level of dispersion, based upon the statistical measures applied, between the different tests in the test vectors. The overall dispersion for DES_01 is impacted by the relatively poor results for subset 7; however, the dispersion within each subset is extremely small. For DES_02 the low dispersion may be due to the fact that the same key is used for every test in DES_02. The dispersion for DES_03 is also relatively small even though the keys and plaintext applied are different in each test case.

To examine the possibility of mutation equivalence, we used Eclipse together with parts of the original and mutated DES source code. We developed our own test/debugging code to assess whether the mutants could be detected by testing or whether the failure to kill them was due to mutation equivalence. By looking at the MS achieved in Table 33 it can be seen that the SOR MO produced the worst-performing mutants for all the PUTs. (We ignore AODU since only 2 mutants were inserted.)

63 Figures updated based upon when the program exited from the test program.

147

By reviewing the SOR mutants injected we discovered that all the undetected mutants related to the replacement of the signed shift operator '>>' with the unsigned operator '>>>'. Since all the mutants operated on positive values, replacing the signed with the unsigned operator led to equivalence. To confirm this we used Eclipse and manually mutated the original code with the mutants injected by MuClipse; no observable difference in behaviour occurred. Our simple test program used both the signed and unsigned shift operators. We tested the whole range of standard integers and converted them into bytes before applying the shift operators.
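The kind of check described above can be sketched as follows. This is not the thesis's actual test program; it simply demonstrates that, for non-negative operands such as those used in the DES code, '>>' and '>>>' are indistinguishable:

    // Sketch: for non-negative operands the signed and unsigned right shifts agree.
    public class ShiftEquivalence {
        public static void main(String[] args) {
            boolean differenceFound = false;
            for (int v = 0; v <= 255; v++) {              // the non-negative values seen by the DES code
                for (int shift = 0; shift < 8; shift++) {
                    if ((v >> shift) != (v >>> shift)) {
                        differenceFound = true;
                        System.out.println("difference at value " + v + ", shift " + shift);
                    }
                }
            }
            System.out.println(differenceFound ? "mutant killable" : "no observable difference");
        }
    }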

The one LOR mutant not detected related to replacing the '|' bitwise inclusive OR operator with the '^' bitwise exclusive OR operator. We used a for loop to feed a range of different integers, from negative to positive values, to the original source code and the mutated source code and then compared the results; in all instances the results were the same. Deeper inspection of the code revealed that the mutant was in fact equivalent.

The one ROR mutant not detected related to replacing an equality comparison in an if statement with '<=', e.g. 'if (b%2==0)' with 'if (b%2<=0)'. The variable b is initialised to zero in the for loop condition; since b can therefore never be less than zero, the code is equivalent. When only positive integers were used no differences were detected; differences did appear when negative integers were applied, but such values cannot arise in the loop.
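A minimal sketch of the ROR case, assuming a loop counter that starts at zero as in the DES source, is shown below; it is illustrative only:

    // Sketch: because b starts at 0 and only increases, (b % 2 == 0) and (b % 2 <= 0)
    // select exactly the same iterations.
    public class RorEquivalence {
        public static void main(String[] args) {
            for (int b = 0; b < 16; b++) {
                boolean original = (b % 2 == 0);
                boolean mutant   = (b % 2 <= 0);   // b % 2 is never negative for b >= 0
                if (original != mutant) {
                    System.out.println("difference at b = " + b);
                }
            }
            System.out.println("done");            // no differences: the mutant is equivalent here
        }
    }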

Since AOIS mutants form the largest number of mutants injected for the DES program, the mutants' descriptions and our assessment of them are detailed in Table 36. To evaluate the AOIS mutants we used the same approach as for LOR and ROR, i.e. we used Eclipse and some simple test programs, which at times reused the same legacy source code, to determine whether the mutants could be detected.

Mutant Explanation                                                              Result
AOIS_7:26:byte_cipher(byte,byte,java.lang.String):blockSize => ++blockSize      No side effect
AOIS_9:26:byte_cipher(byte,byte,java.lang.String):blockSize => blockSize++      No side effect
AOIS_11:27:byte_cipher(byte,byte,java.lang.String):blockSize => ++blockSize     No side effect
AOIS_13:27:byte_cipher(byte,byte,java.lang.String):blockSize => blockSize++     No side effect
AOIS_15:27:byte_cipher(byte,byte,java.lang.String):blockSize => ++blockSize     No side effect
AOIS_17:27:byte_cipher(byte,byte,java.lang.String):blockSize => blockSize++     No side effect
AOIS_41:42:byte_cipher(byte,byte,java.lang.String):blockSize => ++blockSize     No side effect
AOIS_43:42:byte_cipher(byte,byte,java.lang.String):blockSize => blockSize++     No side effect
AOIS_45:42:byte_cipher(byte,byte,java.lang.String):blockSize => ++blockSize     No side effect
AOIS_47:42:byte_cipher(byte,byte,java.lang.String):blockSize => blockSize++     No side effect
AOIS_48:42:byte_cipher(byte,byte,java.lang.String):blockSize => blockSize--     No side effect

All the blockSize mutants are used in expressions.
AOIS_81:65:byte_substitution6x4(byte):valByte => ++valByte                      No side effect

148

AOIS_83:65:byte_substitution6x4(byte):valByte => valByte++                      No side effect
AOIS_85:65:byte_substitution6x4(byte):valByte => ++valByte                      No side effect
AOIS_87:65:byte_substitution6x4(byte):valByte => valByte++                      No side effect
AOIS_89:66:byte_substitution6x4(byte):valByte => ++valByte                      No side effect
AOIS_91:66:byte_substitution6x4(byte):valByte => valByte++                      No side effect
AOIS_92:66:byte_substitution6x4(byte):valByte => valByte--                      No side effect
AOIS_99:67:byte_substitution6x4(byte):r => r++                                  No side effect
AOIS_100:67:byte_substitution6x4(byte):r => r--                                 No side effect
AOIS_101:67:byte_substitution6x4(byte):c => c++                                 No side effect
AOIS_102:67:byte_substitution6x4(byte):c => c--                                 No side effect
AOIS_109:69:byte_substitution6x4(byte):hByte => hByte++                         No side effect
AOIS_110:69:byte_substitution6x4(byte):hByte => hByte--                         No side effect
AOIS_113:71:byte_substitution6x4(byte):lhByte => lhByte++                       No side effect
AOIS_114:71:byte_substitution6x4(byte):lhByte => lhByte--                       No side effect
AOIS_115:71:byte_substitution6x4(byte):hByte => hByte++                         No side effect
AOIS_116:71:byte_substitution6x4(byte):hByte => hByte--                         No side effect
AOIS_207:118:byte_rotateLeft(byte,int,int):numOfBytes => numOfBytes++           No side effect
AOIS_208:118:byte_rotateLeft(byte,int,int):numOfBytes => numOfBytes--           No side effect
AOIS_229:121:byte_rotateLeft(byte,int,int):val => val++                         No side effect
AOIS_230:121:byte_rotateLeft(byte,int,int):val => val--                         No side effect
AOIS_239:129:byte_concatenateBits(byte,int,byte,int):numOfBytes => numOfBytes++ No side effect
AOIS_240:129:byte_concatenateBits(byte,int,byte,int):numOfBytes => numOfBytes-- No side effect
AOIS_253:133:byte_concatenateBits(byte,int,byte,int):val => val++               No side effect
AOIS_254:133:byte_concatenateBits(byte,int,byte,int):val => val--               No side effect
AOIS_267:138:byte_concatenateBits(byte,int,byte,int):val => val++               No side effect
AOIS_268:138:byte_concatenateBits(byte,int,byte,int):val => val--               No side effect
AOIS_273:147:byte_selectBits(byte,int,int):numOfBytes => numOfBytes++           No side effect
AOIS_274:147:byte_selectBits(byte,int,int):numOfBytes => numOfBytes--           No side effect
AOIS_293:150:byte_selectBits(byte,int,int):val => val++                         No side effect
AOIS_294:150:byte_selectBits(byte,int,int):val => val--                         No side effect
AOIS_299:158:byte_selectBits(byte,int):numOfBytes => numOfBytes++               No side effect
AOIS_300:158:byte_selectBits(byte,int):numOfBytes => numOfBytes--               No side effect
AOIS_315:161:byte_selectBits(byte,int):val => val++                             No side effect
AOIS_316:161:byte_selectBits(byte,int):val => val--                             No side effect
AOIS_323:169:int_getBit(byte,int):pos => pos++                                  No side effect
AOIS_324:169:int_getBit(byte,int):pos => pos--                                  No side effect
AOIS_325:170:int_getBit(byte,int):posByte => posByte++                          No side effect
AOIS_326:170:int_getBit(byte,int):posByte => posByte--                          No side effect
AOIS_329:171:int_getBit(byte,int):valByte => valByte++                          No side effect
AOIS_330:171:int_getBit(byte,int):valByte => valByte--                          No side effect
AOIS_333:171:int_getBit(byte,int):posBit => posBit++                            No side effect
AOIS_334:171:int_getBit(byte,int):posBit => posBit--                            No side effect
AOIS_335:172:int_getBit(byte,int):valInt => valInt++                            No side effect
AOIS_336:172:int_getBit(byte,int):valInt => valInt--                            No side effect
AOIS_343:178:void_setBit(byte,int,int):pos => pos++                             No side effect

149

AOIS_344:178:void_setBit(byte,int,int):pos => pos--                             No side effect
AOIS_351:180:void_setBit(byte,int,int):oldByte => oldByte++                     No side effect
AOIS_352:180:void_setBit(byte,int,int):oldByte => oldByte--                     No side effect
AOIS_355:181:void_setBit(byte,int,int):val => val++                             No side effect
AOIS_356:181:void_setBit(byte,int,int):val => val--                             No side effect
AOIS_359:181:void_setBit(byte,int,int):posBit => posBit++                       No side effect
AOIS_360:181:void_setBit(byte,int,int):posBit => posBit--                       No side effect
AOIS_361:181:void_setBit(byte,int,int):oldByte => oldByte++                     No side effect
AOIS_362:181:void_setBit(byte,int,int):oldByte => oldByte--                     No side effect
AOIS_365:182:void_setBit(byte,int,int):newByte => newByte++                     No side effect
AOIS_366:182:void_setBit(byte,int,int):newByte => newByte--                     No side effect

Table 36: AOIS Summary for DES Program

The mutants in Table 36 relate to the insertion of post or pre incrementers/decrementers. Our detailed evaluation of these mutants shows that they are equivalent. The use of incrementers/decrementers on local variables that are not used again in a function has no effect; similarly, variables mutated at the end of a function whose values are never used again lead to no effect.

Mutation  Explanation
29        AORB_29:61:byte_substitution6x4(byte):in.length / 2 => in.length * 2
31        AORB_31:61:byte_substitution6x4(byte):in.length / 2 => in.length + 2
32        AORB_32:61:byte_substitution6x4(byte):in.length / 2 => in.length - 2
109       AORB_109:117:byte_rotateLeft(byte,int,int):len - 1 => len * 1
110       AORB_110:117:byte_rotateLeft(byte,int,int):len - 1 => len / 1
112       AORB_112:117:byte_rotateLeft(byte,int,int):len - 1 => len + 1
113       AORB_113:117:byte_rotateLeft(byte,int,int):(len - 1) / 8 => (len - 1) * 8
114       AORB_114:117:byte_rotateLeft(byte,int,int):(len - 1) / 8 => (len - 1) % 8
115       AORB_115:117:byte_rotateLeft(byte,int,int):(len - 1) / 8 => len - 1 + 8
116       AORB_116:117:byte_rotateLeft(byte,int,int):(len - 1) / 8 => len - 1 - 8
129       AORB_129:128:byte_concatenateBits(byte,int,byte,int):aLen + bLen => aLen * bLen
133       AORB_133:128:byte_concatenateBits(byte,int,byte,int):aLen + bLen - 1 => (aLen + bLen) * 1
134       AORB_134:128:byte_concatenateBits(byte,int,byte,int):aLen + bLen - 1 => (aLen + bLen) / 1
136       AORB_136:128:byte_concatenateBits(byte,int,byte,int):aLen + bLen - 1 => aLen + bLen + 1
137       AORB_137:128:byte_concatenateBits(byte,int,byte,int):(aLen + bLen - 1) / 8 => (aLen + bLen - 1) * 8
138       AORB_138:128:byte_concatenateBits(byte,int,byte,int):(aLen + bLen - 1) / 8 => (aLen + bLen - 1) % 8
139       AORB_139:128:byte_concatenateBits(byte,int,byte,int):(aLen + bLen - 1) / 8 => aLen + bLen - 1 + 8

Table 37: AORB Equivalent Mutants

150

Analysis of all the mutants evaluated in Table 37 indicated that they were equivalent. For example, the three mutants 29, 31 and 32 all related to assigning the size of an array to a local variable. Replacing the divide with the other three operators makes the value of this local variable larger. This is ignored later since the loop bound uses the actual array length (in.length) in the condition of the for statement with the variable b. The original source code is shown in Figure 28 below.

    private static byte[] substitution6x4(byte[] in) {
        in = splitBytes(in, 6);                 // Splitting byte[] into 6-bit blocks
        byte[] out = new byte[in.length / 2];
        int lhByte = 0;
        for (int b = 0; b < in.length; b++) {
            byte valByte = in[b];
            int r = 2 * (valByte >> 7 & 0x0001) + (valByte >> 2 & 0x0001); // 1 and 6
            int c = valByte >> 3 & 0x000F;      // Middle 4 bits
            int hByte = S[64 * b + 16 * r + c]; // 4 bits (half byte) output
            if (b % 2 == 0)
                lhByte = hByte;                 // Left half byte
            else
                out[b / 2] = (byte) (16 * lhByte + hByte);
        }
        return out;
    }

Figure 28: Substitution Source Code

We have reasoned about mutant equivalence via informal testing. We also applied static code inspection to provide confirmatory evidence. However, it is clear that the equivalence of a number of these mutants could have been determined by using static analysis, specifically data-flow analysis.
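As an illustration of how such analysis might look, the toy check below flags post-increments or post-decrements of a variable that has no later occurrence in the method, which is the pattern behind most of the equivalent AOIS mutants in Table 36. It is a naive textual sketch of our own, not a real data-flow analyser:

    import java.util.*;
    import java.util.regex.*;

    // Toy illustration only: flag ++/-- applied to a variable with no later use in the method.
    public class DeadIncrementCheck {
        public static List<String> flag(String[] methodLines) {
            List<String> suspects = new ArrayList<>();
            Pattern p = Pattern.compile("(\\w+)(\\+\\+|--)");
            for (int i = 0; i < methodLines.length; i++) {
                Matcher m = p.matcher(methodLines[i]);
                while (m.find()) {
                    String var = m.group(1);
                    boolean usedLater = false;
                    for (int j = i + 1; j < methodLines.length; j++) {
                        if (methodLines[j].matches(".*\\b" + var + "\\b.*")) { usedLater = true; break; }
                    }
                    if (!usedLater) suspects.add("line " + i + ": " + var + m.group(2) + " has no later use");
                }
            }
            return suspects;
        }
    }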

5.4 Conclusions

The key conclusions from this chapter are as follows:

1. The test vectors applied were actually very effective. In one case, i.e. for DES_01, a real MS of 1 was achieved; for DES_03 a real MS of 0.994 was achieved. However, DES_01 contained 772 tests while DES_03 contained only 34 tests.

2. For the DES Algorithm the level of mutant equivalence is higher than is typical in other software [Offutt et al. 1993]. However, the level of mutation equivalence would appear to be similar to the numerical recipes documented in chapter 4 of this PhD Thesis.

3. Equivalent mutants exhibit typical characteristics: they show patterns. We examined all un-killed mutants for the DES implementation. The equivalent mutants included the following typical patterns:

151

 Last reference to a variable via a pre or post incrementer/decrementer, or a variable set at the end of a function whose value is never returned, i.e. AOIS mutants.

 Mutations using incrementers often got masked if the variable incremented was subsequently a numerator in a division, e.g. 32/8 is equivalent to 33/8 in integer arithmetic. (Both evaluate to 4.)

We primarily used informal testing and manual code inspection to argue about mutation equivalence. Many of these equivalent mutants could have been identified by static source code analysis or manual code inspection.

4. The applied international standard test sets were not uniformly effective. Some mutants were killed by one test vector but missed by another. This shows that some implementation flaws could slip through the testing.

5. Mutation adequacy is a reasonable and effective criterion for assessing international standards test vectors.

Algorithms such as those we tested are somewhat unusual. Our work above demonstrates that for the most part cryptographic algorithms are not good at hiding implementation faults. In a sense they are very testable.

However, even though individual tests suffice to achieve the three traditional coverage measures, the results indicate that there is variation in achieved mutation scores between tests. In a rather extreme way, this shows some degree of unreliability of the three criteria.

However, the main observation is that available test sets are actually quite good! Future work should seek to apply the approach to other cryptographic algorithms to determine whether international test sets for more contemporary algorithms are as good. No doubt the developers of the public test sets were confident they had produced something useful for developers. The work in this chapter largely provides a more formal assessment that this is indeed the case.

152

6. Conclusions and Summary

This chapter sets down the conclusions and findings of this research. We recall that the identified objectives of this thesis were as follows:

1. To measure the test effectiveness of the three coverage criteria (statement, branch and MC/DC) mandated by a widely used commercial airborne software standard for safety critical software, DO-178B [178B], and its recent update DO-178C [178C].

2. To measure the reliability of those three coverage criteria by comparing the effectiveness of multiple minimal size tests sets meeting these criteria.

3. To measure the reliability of the three widely used coverage criteria used for commercial airborne safety critical software, e.g. [178B] and [178C], using test sets with a small degree of redundancy. To add redundancy we plan to combine the different optimum coverage test sets.

4. To measure the test effectiveness of three reference test sets developed to test a DES algorithm.

We now summarise the major implications of our research for the research objectives, presented in the following sections:

 Conclusions (section 6.1)
 Contribution of the Thesis (section 6.2)
 Discussion (section 6.3)
 Threats to validity (section 6.4)
 Possible future work (section 6.5)

6.1. Conclusions

From the experiments conducted we can conclude the following:

1. Our findings from the numerical recipes have shown that optimum coverage criteria (in the sense of smallest size test suites satisfying the identified coverage criteria) are neither effective nor reliable. Two separate test sets that meet the same coverage criterion cannot be guaranteed to achieve the same level of test effectiveness. (This much was known.) Our research shows that they frequently do not do so and the range of effectiveness may be worrying. Our research questions the ‘blind faith’ in three coverage criteria applied for commercial airborne software and used by many in the software domain as the ‘stop testing criterion’. Developers may get “lucky” or “unlucky” in terms of fault-finding ability of

153

developed coverage-adequate test sets. Paring test suites down to be "optimal" with respect to an identified criterion may result in a very significant reduction in fault-finding ability. The above summarises our contribution to research objectives 1 and 2.

2. It was observed that combining optimum test sets, therefore adding redundancy of tests, increases both reliability and effectiveness. This summarises our contribution to research objective 3.

3. Our findings from applying test vectors to the DES implementation show that effectiveness and reliability are significantly higher when compared with the numerical recipes. Our work demonstrates that for the most part cryptographic algorithms are not good at hiding implementation faults. In a sense they are extremely testable. This contributes to research objective 4.

4. When comparing three different test vectors for testing the DES implementation it was observed that the test vectors are not equal in their mutant killing ability. Therefore implementation flaws could be accepted in released software if passing all tests in those test vectors was the sole correctness criterion. This contributes to research objective 4.

5. Widely used program metrics, e.g. LOC, percentage of branches, complexity etc., do not impact the effectiveness of finding errors. However, program properties in terms of iteration would appear to increase the likelihood of detecting failures.

6. We have used mutation analysis repeatedly as a reference criterion for judging the comparative effectiveness of test sets, i.e. we have used it as a scientific tool. But mutation analysis is widely recognised as a stringent criterion in its own right and perhaps the major implication of our work is that mutation adequacy might usefully be adopted more widely as the primary "coverage" criterion. This conclusion is a by-product of reflection on our methodology. We note, however, that no widely used standard mandates the use of mutation testing. In addition we note also that the actual implementation of mutation is somewhat subjective: the use of a different mutation operator set might affect results, and so the reliability of mutation itself could be an issue. However, for practical purposes we assume that some sincerely motivated set is used.

7. Mutation equivalence is far greater in the programs used here compared with earlier findings from [Offutt et al 93]. This will be discussed further in Section 6.2. We primarily used

154

informal testing and manual code inspection to argue about mutation equivalence. Many of these equivalent mutants could have been identified by static source code analysis or manual code inspection. This is a reflection on the methodology of our research, an observation on the consequences of using particular program suites, and on the efficiency of the tools infrastructure used to support our research. We acknowledge that there may be implications for the use of mutation as a primary test criterion.

6.2. Contribution of the Thesis

The central focus of this thesis was to examine the effectiveness and reliability of 'adequacy criteria'. We evaluated by empirical experiments the effectiveness and reliability of two different types of adequacy criteria: test sets that meet specific structural code coverage criteria and pre-defined domain-specific test sets. To enable us to undertake our research we developed a framework that performed automated random testing, mutation injection and subset extraction, and captured code coverage for the PUT.

The primary achievements are:

 The development of a flexible framework to automatically generate test sets satisfying coverage criteria.

 The demonstration that for our extensive set of test programs, the targeted criteria were not particularly robust, i.e. test sets exhibited considerable variation in their fault-finding efficacy.

 The demonstration that extant test vectors for cryptographic algorithms are actually very thorough, although not all test vectors were fully mutation adequate.

 The first demonstration of repeated automatic generation of coverage adequate test sets to determine the reliability of test coverage criteria.

The first three contributions are directly related to our research objectives stated earlier. The final achievement is a methodological one. Automatic test case and test data generation has been a hot topic for many years, but the observation that repeated automated generation of test suites satisfying identified coverage criteria is a means of investigating the reliability of such criteria is an important one. As automatic test data generation improves, so may our understanding of the reliability of criteria.

155

6.3 Discussion

There is a lack of research evaluating the effectiveness and reliability of the adequacy criteria that are used to determine when to stop testing. For imperative languages, testing techniques have not fundamentally changed since the 1970s. Many techniques identified in [Myers 79] as the state of practice in the late 1970s/1980s are still the state of practice today. However, how adequate are these 40-year-old techniques? This PhD thesis has evaluated the effectiveness and reliability of two different types of adequacy criteria: test sets that meet specific high-profile structural code coverage criteria and pre-defined test sets in a specific domain. This research has provided reasons to question currently held views.

Currently, to certify safety critical airborne software we rely on a qualitative process based assessment as defined in [178B]. Furthermore, considerable faith is placed in the coverage criteria used, e.g. MC/DC. We have shown that such faith may well be misplaced. The empirical studies e.g. [Myers 78], [Basili & Selby 87] etc, show no consensus on the most effective coverage criteria, but they all agree that combining testing techniques is the most effective approach in defect detection. However, none of these empirical studies examine reliability of criteria.

This PhD is the only research we are aware of that assesses the reliability of structural testing criteria and the effects of meeting those criteria with minimum numbers of test cases. There exist a small number of papers that apply different subset extraction techniques to test sets to reduce test set size; however, none would appear to evaluate the reliability of test criteria. Papers looking at future software testing research, e.g. [Harrold 00] and [Bertolino 07], have indicated the need for empirical studies to examine the effectiveness of software testing techniques. However, effectiveness of techniques must be combined with reliability of techniques. We also believe that we are the first to use mutation to assess the effectiveness of internationally applied test vectors for acceptance of cryptographic algorithm implementations.

[Zhu et al 97] indicate that a major cost of mutation testing is mutation equivalence. Authors like [Offutt et al 93] indicated that mutation equivalence plays only a small role and accounts for under 9% of mutants. However, our experiments have indicated this not to be the case. Table 38 clearly indicates that the percentage of equivalent mutants is much higher.

156

Program     Mutant Type  Number of Equivalent Mutants  Percentage
DES_01      AOIS         68                            20.73
DES_01      SOR          5                             41.66
DES_01      LOR          1                             6.25
DES_01      ROR          1                             20
Random_01   AOIU         10                            52.63
Random_01   AOIS         53                            36.30
Random_01   LOI          7                             30.43
Random_01   ROR          7                             23.33
Shell_Sort  AORB         6                             25
Shell_Sort  AOIS         20                            30.30

Table 38: Mutation Equivalence

6.4 Threats to Validity

We translated numeric library software into Java largely for convenience; we were familiar with the original libraries and they had scientific application that was of general interest to us. It is possible that the means of production of code might affect the reliability of coverage-based testing. In this respect the fact that we have chosen a specific means might affect the validity of our results; direct extrapolation without further evidence would be unwise. Thus, had we chosen auto-generated Java from, say, Statecharts or similar, our results might have differed. However, the fact that we have chosen one means of code generation and revealed interesting aspects of its coverage-based testing is the real point of our work. The specific minimisation means of test set selection could also introduce a bias; however, much the same considerations apply as immediately above. Other means of minimisation should be investigated. Specific non-linear optimisation algorithms will always exhibit biases; however, it will likely be impossible to identify a priori precisely what those biases are.

Our studies are based on meaningful but quite small sets of software. It is possible that coverage based testing of larger scale or less domain specific software, software from another domain, or software developed using a different process might show more reliability. However, we have little a priori reason to rule out the possibility that such testing would be less reliable than reported here.

Our whole approach uses mutation testing as the reference standard for judging effectiveness. Mutation testing itself is generally regarded as a stringent criterion, but is not without its controversies. The relationship between mutation score and rigorous testing against a notion of practical risk would benefit from research in its own right.

157

We believe our work offers insights and highlights potential problems with aspects of current approaches to software testing and the confidence we have in its reliability. We tested what we did with the methods described in this thesis and obtained the results reported here. We do not say that the same results would be obtained if aspects of our investigation were changed (in ways including those identified above). But we can reasonably conclude that even stringent, commonly used criteria are not unquestionably reliable. Testers cannot simply "tick the box", i.e. satisfy a particular mandated criterion or criteria, and believe that this absolves them entirely from the need to justify why their testing is appropriately rigorous for their specific application.

6.5 Possible Future Research

The following may beneficially be subject to further research:

 The application of the evaluation approach to large-scale software systems. This thesis has established only proof of concept. Larger systems inevitably raise significant issues.

 Further application of mutation analysis to other crypto-algorithms and types of algorithm. Thus, for example, hash functions might pose interesting challenges. Applying the evaluation approach to high-profile modern ciphers such as the Advanced Encryption Standard (AES) would also seem appropriate.

 The implementation of the approach in a massively parallel fashion. Scientific determination of reliability of criteria requires a large amount of compute power. We now have ‘the cloud’: as far as we are aware, no-one has harnessed this resource in the name of investigating reliability of criteria.

 The work revealed that although the cryptographic algorithms under test had large numbers of equivalent mutants, these mutants conformed to particular patterns. Furthermore, these patterns should often be detectable automatically. Work on the automated detection of equivalent patterned mutants would seem beneficial. Note: we were somewhat fortunate that our primary target (reliability determination) did not require equivalent mutants to be detected. All we were interested in was the variability in the mutation scores achieved by test sets; all test sets would fail to kill equivalent mutants.

 Propagation of errors leading to observable failures is an important topic. We have seen that for the DES implementation the properties of the implementation lead to high observability of errors; however, this is not true for all programs, as we observed during some of the

158

numerical based programs. Further research is required to determine why this may be the case.

 The framework could be amended to use API calls, amending and/or adding software components to remove the use of BAT files. Using API calls would remove the need to run a number of separate software components and would support further modifications.

We believe the approach and results outlined in this thesis have considerable potential to shed light on the very old, but still highly current, topic of “How good are test sets?” We recommend this area to the research community.

159

7. Appendices

Appendix A – ECJ Parameter File

Part of the ECJ Parameter File is defined below and annotated with a brief explanation.

    // number of generations to run
    generations = 1000
    // if a solution is found, quit
    quit-on-run-complete = true
    // stores the state of the evolutionary process, to allow rolling back to that state;
    // we set this to false since our execution is short
    checkpoint = false
    prefix = ec
    checkpoint-modulo = 1
    // we only create one sub-population
    pop.subpops = 1
    // we use the default sub-population class for sub-population 0
    pop.subpop.0 = ec.Subpopulation
    // the sub-population will contain 100 individuals
    pop.subpop.0.size = 100
    // since we don't want duplicates, try 100 times to generate a unique individual
    pop.subpop.0.duplicate-retries = 100
    // we use integers to represent the genes
    pop.subpop.0.species = ec.vector.IntegerVectorSpecies
    // minimum gene value
    pop.subpop.0.species.min-gene = 0
    // maximum gene value
    pop.subpop.0.species.max-gene = 4999
    // the number of genes in each individual (the test cases selected for the solution)
    pop.subpop.0.species.genome-size = 5
    pop.subpop.0.species.crossover-type = one
    pop.subpop.0.species.crossover-prob = 1.0
    // with the "default" mutation we mutate each gene with an independent probability of 1%
    pop.subpop.0.species.mutation-prob = 0.01

This stipulates that each individual's genome will contain 5 genes, that the "default" crossover will be one-point crossover, that if we use the default crossover we will use it 100% of the time to breed individuals (as opposed to 0% direct copying), and finally that under the "default" mutation each gene has a 1% probability of being mutated, independent of the other genes.

160

Appendix B – Numerical Statistical Result Tables

Recipe Coverage Av Var Std Mode Median Min Max Range Ran1 Statement 0.62009 0.00071 0.02656 0.61572 0.61572 0.52402 0.65066 0.12664 Branch 0.61894 0.00066 0.02562 0.61572 0.61572 0.52402 0.65066 0.12664 MC/DC 0.61408 0.00099 0.03148 0.62882 0.62227 0.55895 0.65066 0.09170 SB 0.63586 0.00025 0.01579 0.65066 0.63319 0.61572 0.65502 0.03930 SBM 0.64656 0.00014 0.01198 0.65066 0.65066 0.61572 0.65502 0.03930 Ran2 Statement 0.73142 0.00277 0.05262 0.70345 0.70345 0.70345 0.82414 0.12069 Branch 0.75207 0.00627 0.07921 0.70345 0.75345 0.60345 0.84828 0.24483 MC/DC 0.73142 0.00277 0.05262 0.70345 0.70345 0.70345 0.82414 0.12069 SB 0.82261 0.00405 0.06366 0.85517 0.85517 0.71034 0.85517 0.14483 SBM 0.82261 0.00405 0.06366 0.85517 0.85517 0.71034 0.85517 0.14483 Ran3 Statement 0.78638 0.00192 0.04384 0.74627 0.78545 0.71269 0.85075 0.13806 Branch 0.78965 0.00019 0.01375 0.79104 0.79104 0.77239 0.82836 0.05597 MC/DC 0.81315 0.00124 0.03528 0.79851 0.80597 0.74627 0.86940 0.12313 SB 0.83427 0.00131 0.03623 0.87687 0.83396 0.78358 0.87687 0.09328 SBM 0.86256 0.00032 0.01799 0.87687 0.86940 0.81716 0.87687 0.05970 Bessj Statement 0.32112 0.05901 0.24291 #N/A 0.47157 0.05490 0.70490 0.65000 Branch 0.43961 0.01606 0.12675 #N/A 0.47206 0.08725 0.51961 0.43235 MC/DC 0.55692 0.01417 0.11904 0.51373 0.50196 0.46275 0.78333 0.32059 SB 0.55735 0.01378 0.11740 0.47941 0.51029 0.47941 0.78137 0.30196 SBM 0.65608 0.02213 0.14878 0.80000 0.65441 0.50196 0.80392 0.30196 Dawson Statement 0.65619 0.02674 0.16352 0.68571 0.68286 0.36857 0.87714 0.50857 Branch 0.65048 0.01580 0.12572 0.68286 0.68286 0.36857 0.87714 0.50857 MC/DC 0.64324 0.02312 0.15206 0.68571 0.68286 0.36857 0.87143 0.50286 SB 0.78800 0.01028 0.10140 0.87714 0.87714 0.65429 0.88571 0.23143 SBM 0.83162 0.00766 0.08754 0.88286 0.88000 0.68857 0.88571 0.19714 EI Statement 0.70970 0.01240 0.11135 0.73418 0.73418 0.39662 0.77637 0.37975 Branch 0.62321 0.02571 0.16033 0.73418 0.73418 0.39662 0.77637 0.37975 MC/DC 0.65624 0.02151 0.14666 0.73418 0.73418 0.39662 0.74262 0.34599 SB 0.72822 0.00765 0.08748 0.73418 0.73418 0.39662 0.77637 0.37975 SBM 0.74957 0.00037 0.01933 0.73418 0.74262 0.73418 0.78481 0.05063 Gasdev Statement 0.67025 0.00002 0.00457 0.66509 0.66981 0.66509 0.67925 0.01415 Branch 0.67025 0.00002 0.00466 0.66509 0.66981 0.66509 0.67925 0.01415 MC/DC 0.68234 0.00018 0.01339 0.69575 0.68278 0.66509 0.70283 0.03774 SB 0.67762 0.00011 0.01036 0.67925 0.67689 0.66509 0.70047 0.03538 SBM 0.69752 0.00008 0.00885 0.70047 0.69929 0.67925 0.70991 0.03066 Poidev Statement 0.40248 0.03336 0.18264 0.32274 0.33007 0.23227 0.82152 0.58924 Branch 0.51793 0.01986 0.14092 #N/A 0.59169 0.32763 0.63325 0.30562 MC/DC 0.58252 0.02930 0.17118 #N/A 0.59535 0.36186 0.77751 0.41565 SB 0.58811 0.02588 0.16088 0.59413 0.61614 0.32763 0.82641 0.49878 SBM 0.71488 0.01234 0.11107 0.66015 0.66015 0.57457 0.86308 0.28851 SVD Statement 0.77061 0.00015 0.01214 0.76442 0.76687 0.75583 0.79918 0.04335 Branch 0.77921 0.00032 0.01794 0.76196 0.77198 0.75828 0.80695 0.04867 MC/DC 0.77747 0.00022 0.01488 0.76851 0.76892 0.76196 0.80082 0.03885 SB 0.77747 0.00022 0.01488 0.76851 0.76892 0.76196 0.80082 0.03885

161

Recipe Coverage Av Var Std Mode Median Min Max Range SBM 0.79192 0.00021 0.01438 0.77546 0.79918 0.76933 0.80900 0.03967 RC Statement 0.74448 0.00155 0.03939 0.79539 0.74352 0.68588 0.79539 0.10951 Branch 0.65130 0.06977 0.26414 0.79539 0.74640 0.03170 0.79539 0.76369 MC/DC 0.81844 0.00027 0.01630 #N/A 0.81844 0.80692 0.82997 0.02305 SB 0.83718 0.00005 0.00722 0.84150 0.84006 0.81556 0.84150 0.02594 SBM 0.83718 0.00005 0.00722 0.84150 0.84006 0.81556 0.84150 0.02594 Plgndr Statement 0.61618 0.08307 0.28821 0.24913 0.81661 0.24913 0.86851 0.61938 Branch 0.82353 0.01809 0.13448 0.89965 0.87543 0.50519 0.90311 0.39792 MC/DC 0.89343 0.00031 0.01754 #N/A 0.88927 0.87543 0.92042 0.04498 SB 0.88538 0.00136 0.03692 0.91696 0.89792 0.80969 0.91696 0.10727 SBM 0.91280 0.00016 0.01257 0.92388 0.91696 0.89619 0.92388 0.02768 Expint Statement 0.31543 0.01708 0.13067 0.18261 0.30978 0.18261 0.60217 0.41957 Branch 0.29261 0.01731 0.13157 #N/A 0.23478 0.16739 0.59783 0.43043 MC/DC 0.34674 0.00085 0.02914 0.36957 0.35435 0.30870 0.36957 0.06087 SB 0.39957 0.02252 0.15005 #N/A 0.39130 0.23043 0.66522 0.43478 SBM 0.48000 0.01399 0.11829 0.45000 0.44457 0.37826 0.70652 0.32826 Cosft2 Statement 0.79947 0.01174 0.10834 0.90105 0.81579 0.66474 0.90211 0.23737 Branch 0.77697 0.01236 0.11120 0.90105 0.71474 0.66316 0.90211 0.23895 MC/DC 0.78226 0.01764 0.13283 0.90158 0.81605 0.55789 0.90211 0.34421 SB 0.79947 0.01174 0.10834 0.90105 0.81579 0.66474 0.90211 0.23737 SBM 0.87863 0.00562 0.07498 0.90158 0.90211 0.66526 0.90368 0.23842 Gammp Statement 0.59766 0.03577 0.18913 0.74023 0.72070 0.26953 0.76367 0.49414 Branch 0.70723 0.00727 0.08524 0.76367 0.74609 0.54492 0.76367 0.21875 MC/DC 0.70723 0.00727 0.08524 0.76367 0.74609 0.54492 0.76367 0.21875 SB 0.70728 0.00837 0.09146 0.76563 0.75195 0.56055 0.76953 0.20898 SBM 0.76504 0.00007 0.00824 0.76563 0.76660 0.74609 0.77344 0.02734

Table A1: Statistical summary results of the test subsets

162

Appendix C – Source Monitor Metrics

The Source Monitor metric definitions are given below:

Lines - the number of physical lines in a source file.

Statements - computational statements, which are terminated with a semicolon character, and branches, i.e. if, for and while. All attributes are counted as statements as well, though calls inside attributes are ignored. The exception controls, i.e. try, catch and finally, are also counted as statements.

Percent Branch Statements - Statements that cause a break in the sequential execution of statements are counted separately. These are the following: if, else, for, do, while, break, continue, switch, case and default. The exception block statements try, catch and finally are also counted as branch statements, as are throw statements.

Method Call Statements – all method calls are counted, in statements as well as in logical expressions.

Methods per Class - Once inside a class or interface, the "(...) {" construction identifies methods. This metric is a global average: the total method count divided by the total class count. Since most files contain only a single class, this is usually the number of methods in one class. For a Source-Monitor checkpoint, it is an average across all classes and their methods.

Average Statements per Method - The total number of statements found inside of methods found in a file or checkpoint divided by the number of methods found in the file or checkpoint.

Maximum Complexity - The complexity metric is counted approximately as defined by [McConnell 1993]. The complexity metric is calculated by the number of execution paths through a function or method. Each function or method has a complexity of one plus one for each branch statement such as if, else, for, foreach, or while. Arithmetic if statements (MyBoolean ? ValueIfTrue : ValueIfFalse) each add one count to the complexity total. A complexity count is added for each '&&' and '||' in the logic within if, for, while or similar logic statements.

Switch statements add complexity counts for each exit from a case (due to a break, goto, return, throw, continue, or similar statement), and one count is added for a default case even if one is not present. (Note: when a project's Modified Complexity option is selected, switch statements add a count of one to the complexity and the internal case statements do not contribute to the complexity metric.) Each catch or except statement in a try block (but not the try or finally statements) each add one count to the complexity as well.

163

Appendix D – DES Results Tables

Test COI AOIU AORB AORS LOI LOR ROR SOR AOIS MS Set 1 1 13 64 189 13 115 15 4 7 244 0.857881 2 13 64 189 13 114 15 4 7 250 0.864341 3 13 64 189 13 114 15 4 7 250 0.864341 4 13 64 189 13 114 15 4 7 251 0.865633 5 13 64 188 13 114 15 4 7 245 0.856589 6 13 64 187 13 114 15 4 7 245 0.855297 7 13 64 187 13 114 15 4 7 247 0.857881 8 11 61 174 13 108 15 4 7 210 0.77907 9 13 64 187 13 114 15 4 7 245 0.855297 10 13 64 187 13 114 15 4 7 251 0.863049 Set 2 100 12 62 172 13 109 15 4 7 218 0.790698 101 12 62 172 13 109 15 4 7 220 0.793282 102 12 62 172 13 109 15 4 7 220 0.793282 103 12 62 172 13 109 15 4 7 216 0.788114 104 12 62 172 13 109 15 4 7 216 0.788114 105 12 62 172 13 109 15 4 7 216 0.788114 106 12 62 172 13 109 15 4 7 220 0.793282 107 12 62 172 13 109 15 4 7 216 0.788114 108 12 62 172 13 109 15 4 7 220 0.793282 109 12 62 172 13 109 15 4 7 222 0.795866 110 12 62 172 13 109 15 4 7 222 0.795866 Set 3 200 13 63 185 13 114 15 4 7 248 0.855297 201 13 63 185 13 114 15 4 7 248 0.855297 202 13 63 185 13 114 15 4 7 250 0.857881 203 13 63 185 13 114 15 4 7 248 0.855297 204 13 63 185 13 114 15 4 7 248 0.855297 205 13 63 185 13 114 15 4 7 248 0.855297 206 13 63 185 13 114 15 4 7 250 0.857881 207 13 63 185 13 114 15 4 7 250 0.857881 208 13 63 185 13 114 15 4 7 248 0.855297 209 13 63 185 13 114 15 4 7 248 0.855297 210 13 63 185 13 114 15 4 7 250 0.857881 Set 4 384 13 61 170 13 111 15 4 7 234 0.81137 385 13 64 184 13 114 15 4 7 246 0.852713 Set 5 400 13 64 184 13 114 15 4 7 248 0.855297 401 13 64 184 13 114 15 4 7 248 0.855297 402 12 61 170 13 109 15 4 7 219 0.788114 403 13 64 184 13 114 15 4 7 241 0.846253 404 13 64 184 13 114 15 4 7 244 0.850129 405 13 64 184 13 114 15 4 7 248 0.855297 406 13 64 184 13 114 15 4 7 244 0.850129 407 13 64 184 13 114 15 4 7 246 0.852713 408 13 64 184 13 114 15 4 7 246 0.852713

164

Test COI AOIU AORB AORS LOI LOR ROR SOR AOIS MS 409 13 64 184 13 114 15 4 7 248 0.855297 410 12 61 171 13 109 15 4 7 223 0.794574 Set 6 500 12 61 171 13 109 15 4 7 223 0.794574 501 12 61 171 13 109 15 4 7 219 0.789406 502 12 61 170 13 109 15 4 7 223 0.793282 503 12 61 171 13 109 15 4 7 217 0.786822 504 12 61 171 13 109 15 4 7 219 0.789406 505 12 61 171 13 109 15 4 7 219 0.789406 506 12 61 171 13 109 15 4 7 223 0.794574 507 12 61 171 13 109 15 4 7 217 0.786822 508 12 61 171 13 109 15 4 7 223 0.794574 509 12 61 171 13 109 15 4 7 223 0.794574 510 12 61 171 13 109 15 4 7 219 0.789406 Set 7 600 8 59 166 12 102 14 4 4 197 0.731266 601 8 59 166 12 102 14 4 4 197 0.731266 602 8 59 166 12 102 14 4 4 197 0.731266 603 8 59 166 12 102 14 4 4 197 0.731266 604 8 59 166 12 102 14 4 4 197 0.731266 605 8 59 166 12 102 14 4 4 197 0.731266 606 8 59 166 12 102 14 4 4 197 0.731266 607 8 59 166 12 102 14 4 4 197 0.731266 608 8 59 166 12 102 14 4 4 197 0.731266 609 8 59 166 12 102 14 4 4 197 0.731266 610 8 59 166 12 102 14 4 4 197 0.731266 Set 8 771 13 64 184 13 114 15 4 7 245 0.851421 772 13 64 184 13 114 15 4 7 247 0.854005

Table D1: Sample of DES_01 Results

Test COI AOIU AORB AORS LOI LOR ROR SOR AOIS MS 1 12 63 191 13 113 15 4 7 227 0.833333 2 12 63 188 13 113 15 4 7 226 0.828165 3 12 63 185 13 113 15 4 7 230 0.829457 4 12 63 185 13 113 15 4 7 230 0.829457 5 12 63 185 13 113 15 4 7 228 0.826873 6 12 63 185 13 113 15 4 7 228 0.826873 7 12 63 185 13 113 15 4 7 230 0.829457 8 12 63 185 13 113 15 4 7 230 0.829457 9 12 63 185 13 113 15 4 7 228 0.826873 10 12 63 185 13 113 15 4 7 232 0.832041 11 12 63 185 13 113 15 4 7 228 0.826873 12 12 63 185 13 113 15 4 7 228 0.826873 13 12 63 185 13 113 15 4 7 230 0.829457 14 12 63 185 13 113 15 4 7 234 0.834625 15 12 63 185 13 113 15 4 7 230 0.829457 16 12 63 185 13 113 15 4 7 234 0.834625

165

Test COI AOIU AORB AORS LOI LOR ROR SOR AOIS MS 17 12 63 185 13 113 15 4 7 230 0.829457 18 12 63 185 13 113 15 4 7 226 0.824289 19 12 63 185 13 113 15 4 7 226 0.824289 20 12 63 185 13 113 15 4 7 230 0.829457 21 12 63 185 13 113 15 4 7 232 0.832041 22 12 63 185 13 113 15 4 7 232 0.832041 23 12 63 185 13 113 15 4 7 228 0.826873 24 12 63 185 13 113 15 4 7 232 0.832041 25 12 63 185 13 113 15 4 7 228 0.826873 26 12 63 185 13 113 15 4 7 228 0.826873 27 12 63 185 13 113 15 4 7 228 0.826873 28 12 63 185 13 113 15 4 7 228 0.826873 29 12 63 185 13 113 15 4 7 230 0.829457 30 12 63 185 13 113 15 4 7 230 0.829457 31 12 63 185 13 113 15 4 7 230 0.829457 32 12 63 185 13 113 15 4 7 234 0.834625 33 12 63 185 13 113 15 4 7 226 0.824289 34 12 63 185 13 113 15 4 7 230 0.829457 35 12 63 185 13 113 15 4 7 226 0.824289 36 12 63 185 13 113 15 4 7 230 0.829457 37 12 63 185 13 113 15 4 7 232 0.832041 38 12 63 185 13 113 15 4 7 232 0.832041 39 12 63 185 13 113 15 4 7 228 0.826873 40 12 63 185 13 113 15 4 7 228 0.826873 41 12 63 185 13 113 15 4 7 228 0.826873 42 12 63 185 13 113 15 4 7 232 0.832041 43 12 63 185 13 113 15 4 7 228 0.826873 44 12 63 185 13 113 15 4 7 232 0.832041 45 12 63 185 13 113 15 4 7 234 0.834625 46 12 63 185 13 113 15 4 7 234 0.834625 47 12 63 185 13 113 15 4 7 234 0.834625 48 12 63 185 13 113 15 4 7 230 0.829457 49 12 63 185 13 113 15 4 7 226 0.824289 50 12 63 185 13 113 15 4 7 226 0.824289 51 12 63 185 13 113 15 4 7 226 0.824289 52 12 63 185 13 113 15 4 7 230 0.829457 53 12 63 185 13 113 15 4 7 228 0.826873 54 12 63 185 13 113 15 4 7 232 0.832041 55 12 63 185 13 113 15 4 7 228 0.826873 56 12 63 185 13 113 15 4 7 228 0.826873 57 12 62 185 13 113 15 4 7 232 0.830749 58 12 62 185 13 113 15 4 7 228 0.825581 59 12 62 185 13 113 15 4 7 228 0.825581 60 12 62 185 13 113 15 4 7 232 0.830749 61 12 62 185 13 113 15 4 7 230 0.828165

166

Test COI AOIU AORB AORS LOI LOR ROR SOR AOIS MS 62 12 62 185 13 113 15 4 7 234 0.833333 63 12 62 185 13 113 15 4 7 234 0.833333

Table D2: All the DES_02 Test Vector results

Test COI AOIU AORB AORS LOI LOR ROR SOR AOIS MS 1 11 62 183 13 111 15 4 7 219 0.807494 2 13 62 182 13 113 15 4 7 249 0.850129 3 13 64 189 13 114 15 4 7 252 0.866925 4 13 64 188 13 114 15 4 7 253 0.866925 5 13 64 188 13 114 15 4 7 253 0.866925 6 13 64 188 13 114 15 4 7 255 0.869509 7 11 62 181 13 110 15 4 7 219 0.803618 8 13 64 188 13 114 15 4 7 253 0.866925 9 13 64 188 13 114 15 4 7 253 0.866925 10 13 64 188 13 114 15 4 7 255 0.869509 11 13 64 188 13 114 15 4 7 255 0.869509 12 13 64 188 13 114 15 4 7 253 0.866925 13 13 64 188 13 114 15 4 7 255 0.869509 14 13 64 188 13 114 15 4 7 255 0.869509 15 13 64 188 13 114 15 4 7 253 0.866925 16 13 64 188 13 114 15 4 7 255 0.869509 17 13 64 188 13 114 15 4 7 255 0.869509 18 13 64 188 13 114 15 4 7 253 0.866925 19 13 64 188 13 114 15 4 7 255 0.869509 20 13 64 188 13 114 15 4 7 255 0.869509 21 13 64 188 13 114 15 4 7 253 0.866925 22 13 64 188 13 114 15 4 7 251 0.864341 23 13 64 188 13 114 15 4 7 255 0.869509 24 13 64 188 13 114 15 4 7 255 0.869509 25 13 64 188 13 114 15 4 7 255 0.869509 26 13 64 188 13 114 15 4 7 255 0.869509 27 13 64 188 13 114 15 4 7 255 0.869509 28 13 64 188 13 114 15 4 7 255 0.869509 29 13 63 182 13 113 15 4 7 251 0.854005 30 13 64 188 13 114 15 4 7 253 0.866925 31 12 62 181 13 111 15 4 7 228 0.817829 32 13 62 181 13 113 15 4 7 249 0.848837 33 13 64 187 13 114 15 4 7 251 0.863049 34 13 64 187 13 114 15 4 7 255 0.868217

Table D3: All the DES_03 Test Vector Results

167

8. References

[Adrion et al 82] Adrion, W, et al, Validation, Verification and Testing of Computer Software, Computer Surveys, Vol 14, No 2, June 1982.

[Al-Khanjari et al 02] Al-Khanjari, Z, Woodward, M, Ramadhan, H, Critical Analysis of the PIE Testability Technique, Software Quality Journey, 10, p. 331-354, 2002.

[Andersson et al 03] Andersson, C, Thelin, T, Runeson, P, Dzamashvili, N, An Experimental Evaluation of Inspection and Testing for Detection of Design Faults, Proc IEEE/ACM International Symposium on Empirical Software Engineering, IEEE CS Press, 2003.

[Appropriation Accounts 1998-99 Vol 1: Class I - MoD] Appropriation Accounts 1998-99, Volume 1: Class I, Ministry of Defence, National Audit Office, Report of the Comptroller and Auditor General.

http://www.nao.gov.uk/publications/nao_reports/990011i.pdf

[Araki et al 91] Araki, K, Furukawa, Z, Cheng, J, A General Framework for Debugging, IEEE Software, vol. 8, no. 3, May 1991.

[Arora 95] Arora, V, Kalaichelvan, K, Goel, N, Munikoti, R, Measuring High-Level Design Complexity of Real-Time Object-Oriented Systems, Proc Annual Oregon Workshop on Software Metrics, Silver Falls, OR, June 1995.

[Bache & Bazzana 94] Bache, R, Bazzana, G, Software Metrics for Product Assessment, McGraw-Hill, 1994,

[Baker & Habli 13] Baker, R, Habli, I, An Empirical Evaluation of Mutation Testing for Improving the Test Quality of Safety Critical Software, IEEE Transactions on Software Engineering, Vol 39, No 6, June 2013.

[Basili & Selby 87] Basili, V, Selby, R, Comparing the Effectiveness of Software Testing Strategies, IEEE Trans on Software Engineering, Vol SE-13, No 12, December 1987.

[Beizer 84] Beizer, B, Software System Testing and Quality Assurance, Van Nostrand Reinhold, 1984.

[Beizer 90] Beizer, B, eds, Software Testing Techniques, Thomson Computer Press, 1990.

[Beizer 95] Beizer, B, Black Box Testing: Techniques for Functional Testing of Software and Systems, Wiley, 1995.

168

[Berling & Thelin 03] Berling, T, Thelin, T, An Industrial Case Study of the Verification and Validation Activities, IEEE Symposium on Software Metrics, Sept 2003.

[Berry & Boudol 98] Berry, G, Boudol, G, The Chemical Abstract Machine, INRIA, 26 October, 1998.

www-sop.inria.fr/meije/personnel/Gerard.berry/cham.ps.

[Bertolino 07] Bertolino, A, Software Testing Research: Achievements, Challenges, Dreams, Future of Software Engineering (FOSE 07), IEEE, 2007.

[Bertolino 96] Bertolino, A, Strigini, L, On the Use of Testability Measures for Assessment. IEEE Trans. Software Eng. 22, 1996.

[Bertolino et al 13] Bertolino, A, Daoudagh, S, Lonetti, F and Marchetti, E, XACMUT: XACML 2.0 Mutants Generator, ICST 2013.

[Binder 00] Binder, R, Testing Object-Oriented Systems: Models, Patterns and Tools, Addison-Wesley, 2000.

[Boehm 88] Boehm, B, A Spiral Model of Software Development and Enhancement, IEEE Computer, May 1988.

[Boehm et al 78] Boehm, B, et al, Characteristics of Software Quality, TRW, North-Holland Publishing Company, 1978.

[Bosch 00] Bosch, J, Design and Use of Software Architecture, Addison-Wesley, 2000.

[Brooks 87] Brooks, F, No Silver Bullet: Essence and Accidents of Software Engineering, IEEE Computer, April 1987.

[Budd 80] Budd, T. A, Mutation Analysis of Program Test Data, Ph.D. thesis, Yale Univ., New Haven, Conn., 1980.

[Butler & Finelli 96] Butler, R, Finelli, G, The infeasibility of Quantifying the reliability of life Critical Real time software, 1996.

[Coley 99] Coley, D, An introduction to Genetic Algorithms for Scientists and Engineers, World Scientific, 1999.

[Collins 01] Concise Dictionary & Thesaurus, Collins, 2001.

[Cormen et al 2009] Cormen, T, Leiserson, C, Rivest, R, Stein, C, Introduction to Algorithms, Massachusetts Institute of Technology, 2009.

[Corne et al 99] Corne, D, Dorigo, M, Glover, F, New Ideas in Optimization, McGraw-Hill, 1999.

169

[Cullyer, et al 91] Cullyer, W, Goodenough, S, Wichmann, B, The choice of computer languages for use in safety critical systems, Software Engineering Journal, March 1991.

[Davies 93] Davis, A, Software Requirements, Prentice Hall, 1993.

[Def Stan 00-55 Issue 2] Defence Standard 00-55, Requirements for Safety Related Software in Defence Equipment, Part 1: Requirements, Issue 2, 1 August 1997.

[Def Stan 00-56 Issue 4] Defence Standard 00-56, Safety Management Requirements for Defence Systems, Part 1: Requirements, Issue 4, 1 June 2007.

[Def Stan 05-95] Defence Standard 05-95, Quality System Requirements For The Design, Development, Supply and Maintenance of Software, Issue 3, 23 June 1995.

[Delamaro 01] Delamaro, M, Interface Mutation: An Approach for Integration Testing, IEEE Software Engineering, Vol 27, No 3, March 2001.

[Demillo et al 87] Demillo, et al, Software Testing and Evaluation, The Benjamin/Cummings Publishing Company, 1987.

[Demillo et al 78] Demillo, R. A, Lipton, R.J, and Sayward, F.G, Hints on test data selection: Help for the practising programmer. Computer 11, (April), 34–41, 1978.

[Deming 86] Deming, W.E., "Out of the Crisis", MIT Center for Advanced Engineering Study, Cambridge, Mass. 1986.

[Deng et al 13] Deng, L, Offutt, J, Li, N, Empirical Evaluation of the Statement Deletion Mutation Operator, ICST 2013.

[178B] Software Consideration in Airborne Systems and Equipment Certification, RTCA DO-178B, 1 December 1992.

[178C] Software Considerations in Airborne Systems and Equipment Certification, RTCA DO-178C, 13 December, 2011.

[DOD 2167A] DOD – STD – 2167A, Defense System Software Development, 29 Feb 1988.

[Dunn & Ullman 82] Dunn, R, Ullman, R, Quality Assurance for Computer Software, McGraw – Hill, 1982.

[Dustin et al 1999] Dustin, E, Rashka, J, Paul, J, Automated Software Testing: Introduction, Management and Performance, 1999.

170

[Ellims et al 06] Ellims, M, Bridges, J and Ince, D, The Economics of Unit Testing, Springer Science + Business Media, Inc. 2006.

[EN-61508] Functional Safety of Electrical/Electronic Programmable Electronic Safety-Related Systems-Part 3: Software Requirements, BS EN 61508-3:2002.

[Fenton & Ohlsson 00] Fenton, N, Ohlsson, N, Quantitative Analysis of Faults and Failures in a Complex Software System, IEEE Transactions on Software Engineering, Vol 26, No 8, August 2000.

[Fenton & Pfleeger 97] Fenton, N, Pfleeger, Software Metrics, PWS Publishing Company, 1996.

[Frankl et al 98] Frankl, et al, Evaluating Testing Methods by Delivered Reliability, 1998.

[Friedman 95] Friedman, M, Voas, J, Software Assessment, John Wiley & Sons, Inc, 1995.

[Gacek & Boehm 98] Gacek, C, Boehm, B, Composing Components: How does one detect potential architecture mismatches, Proceeding of the OMG-DARPA-MCC Workshop on Compositional Software Architectures, Los Angles, January 1998.

[GAO 83] Greater Emphasis On Testing Needed To Make Computer Software More Reliable And Less Costly, GAO/IMTEC-84-2, Government Accounting Office, Washington, 1983.

[Gardiner et al 99] Gardiner, S, et al, Testing Safety-Related Software: A Practical Handbook, Springer, 1999.

[Gelperin & Hetzel 88] Gelperin, D, Hetzel, B, The Growth of Software Testing, Communications of the ACM, Vol 31, No 6, June 1988.

[Gilb 77] Gilb, T, Software Metrics, Winthrop, 1977.

[Goldberg 89] Goldberg, D, Genetic Algorithms, Addison-Wesley, 1989.

[Gourley 83] Gourlay, J, A Mathematical Framework for the Investigation of Testing, IEEE Transactions on Software Engineering, Vol SE-9, No 6, November 1983.

[Hadley & Clark 13] Hadley, M, Clark, J, Good Days and Bad Days: Investigating Effectiveness and Reliability of “Optimum” Test Sets, Proceedings of the ICST 2013.

[Hadley 08] Hadley, M, Software Testing Literature Review, University of York, 2008.

[Hamlet & Voas 93] Hamlet, D, Voas, J, Faults on Its Sleeve: Amplifying Software Reliability Testing, ACM 1993.

[Hamlet 90] Hamlet, D, Taylor, R, Partition Testing Does Not Inspire Confidence, IEEE Transactions on Software Engineering, Vol 16, No 12, December 1990.

[Harrold 00] Harrold, M, Testing: A Roadmap, 22nd International Conference on Software Engineering, June 2000.

[Harrold et al 08] Harrold, M, Jones, J, Yu, Y, An Empirical Study of the Effects of Test-Suite Reduction on Fault Localization, ICSE 2008, Leipzig, Germany.

[Harrold et al 93] Harrold, M, Gupta, R, Soffa, M, A Methodology for Controlling the Size of a Test Suite, ACM Transactions on Software Engineering and Methodology, Vol 2, No 3, July 1993.

[Hatzel 88] Hetzel, B, The Complete Guide to Software Testing, 2nd edition, Wiley-QED, 1988.

[Helmet 77] Hamlet, R, Testing Programs with the Aid of a Compiler, IEEE Transactions on Software Engineering, Vol.SE-3, No 4, July 1977.

[Herbsleb 94] Herbsleb, J, Carleton, A, Rozum, J, Siegel, J, Zubrow, D, Benefits of CMM-Based Software Process Improvement: Executive Summary of Initial Results, Special Report CMU/SEI-94-SR-013, September 1994.

www.sei.cmu.edu/pub/documents/94.reports/pdf/sr13.94.pdf

[Herzner et al 05] Herzner, W, et al, Comparing Software Measures with Fault Counts Derived from Unit Testing of Safety Critical Software, SAFECOMP 2005, LNCS 3688, pp 81-95, 2005.

[Howden 76] Howden, W, Reliability of the Path Analysis Testing Strategy, IEEE Transactions on Software Engineering, Vol SE-2, pp 208-215, September 1976.

[Howden 78] Howden, W, Theoretical and Empirical Studies of Program Testing, IEEE Transactions on Software Engineering, Vol.SE-4, No 4, July 1978.

[Howden 81] Howden, W, Errors in data processing and the refinement of current program test methodologies, NBS, Washington D.C, July 1981.

[Howden 81b] Howden, W, Errors in data processing programs and the refinement of current program test methodologies, National Bureau of Standards, Washington DC, July 1981.

[Howden 82] Howden, W, Weak Mutation Testing and Completeness of Test Sets, IEEE 1982.

[IEEE 1044] IEEE Guide to Classification for Software Anomalies, IEEE Std 1044.1-1995.

[IEEE Std 610.12-1990] IEEE Standard Glossary of Software Engineering Terminology, ANSI/IEEE Std 610.12-1990.

[IEEE Std 829-1983] IEEE Standard for Software Test Documentation, 1983.

[IEEE Std 982.1 1988] IEEE std 982.1-1988, IEEE Standard Dictionary of Measures to Produce Reliable Software, IEEE 1989.

[IEEE Std 982.1 2005] IEEE Std 982.1 - 2005 IEEE Standard Dictionary of Measures of the Software Aspects of Dependability, IEEE 2006.

[Jalote & Murphy 04] Jalote, P, Murphy, B, Reliability Growth in Software Products, ISSRE, IEEE, 2004.

[Jin & Offutt 01] Jin, Z, Offutt, J, Deriving Tests From Software Architectures, 12th IEEE International Symposium on Software Reliability Engineering, November 2001.

[Jin & Offutt 98] Jin, Z, Offutt, J, Coupling-Based Criteria for Integration Testing, Department of Information and Software Engineering, July 1998.

[Jones & Harrold 03] Jones, J, Harrold, M, Test-Suite Reduction and Prioritization for Modified Condition/Decision Coverage, IEEE Transactions on Software Engineering, Vol 29, No 3, March 2003.

[Jorgensen & Erickson 94] Jorgensen, P, Erickson, C, Object-Oriented Integration Testing, Communications of the ACM, Vol 37, No 9, September 1994.

[Juran 88] Juran, J, Juran on Planning for Quality, The Free Press, New York, 1988.

[Jursito & Vegas 03] Juristo, N, Vegas, S, Functional Testing, Structural Testing and Code Reading: What Fault Types Do They Each Detect?, 2003.

[Kamsties & Lott 95] Kamsties, E, Lott, C, An Empirical Evaluation of Three Defect Detection Techniques, 1995.

[Kaner 00] Kaner, C, Measurement of the Extent of Testing, Pacific Northwest Software Quality Conference, October 2000.

[Kaner 93] Kaner, C, Falk, J, Nguyen, H, Testing Computer Software, 2nd edition, Van Nostrand Reinhold, 1993.

[Kazman 00] Kazman, R, Klein, M, Designing and Analysing Software Architectures Using ABASs, ICSE 2000.

[Killer & Miller 91] Keiller, P, Miller, D, On the use and the performance of software reliability growth models, Reliability Engineering and System Safety, pp 95-117, 1991.

[Laitenberger 98] Laitenberger, O, Studying the Effects of Code Inspection and Structural Testing on Software Quality, 9th IEEE Software Reliability Engineering International Symposium, Nov 1998.

[Lauterbach & Randall 89] Lauterbach, L, Randall, W, Experimental Evaluation of Six Test Techniques, COMPASS '89: Proceedings of the Fourth Annual Conference on Computer Assurance, 'Systems Integrity, Software Safety and Process Security', 19-23 June 1989, pp 36-41.

[Lipton 78] Lipton, R, Sayward, F, The status of research on program mutation, Digest for the Workshop on Software Testing and Test Documentation, Fort Lauderdale, Florida, pp. 355-372, Dec. 1978.

[Littlewood & Strigini 92] Littlewood, B, Strigini, L, Validation of Ultra-High Dependability for Software-Based Systems, ACM 1992.

[Littlewood 2011] Littlewood, B, Strigini, L, "Validation of Ultra-High Dependability…" – 20 Years On, Safety Systems, the Newsletter of the Safety-Critical Systems Club, 2011.

[Lutz 92] Lutz, R, Analysing Software Requirements Errors in Safety-Critical Embedded Systems, IEEE Software, 1992.

[Marick 00] Marick, B, Two Experiments in Software Testing, Technical Report UIUCDCS-R-90-1644, University of Illinois, 1990.

[Marick 91] Marick, B, The Weak Mutation Hypothesis, ACM, 1991.

[Marick 95] Marick, B, The Craft of Software Testing, Prentice Hall, 1995.

[Mat 91] Mathur, A, Performance, Effectiveness, and Reliability Issues in Software Testing, In Proceedings of the Fifteenth Annual International Computer Software and Applications Conference, pp 604-605, Tokyo, Japan, September 1991.

[Mazza et al 96] Mazza, Fairclough, Melton et al, Software Engineering Guides, Prentice Hall, 1996.

[McCabe & Butler 89] McCabe, T, Butler, C, Design Complexity Measurement and Testing, Communications of the ACM, Vol 32, No 12, December 1989.

[Melton 96] Melton, A, Software Measurement, International Thomson Computer Press, 1996.

[Michalewicz & Fogel 04] Michalewicz, Z, Fogel, D, How to Solve It: Modern Heuristics, Springer, 2004.

[Mills 72] Mills, H, On the statistical validation of computer programs. Technical report FSC 72-6015, IBM, 1972.

[MIL-STD 498] Software Development and Documentation, MIL STD 498, 5 December 1994.

[Misherghi & Su 06] Misherghi, G, Su, Z, HDD: Hierarchical Delta Debugging, In Proceedings of the 28th International Conference on Software Engineering (ICSE 2006), pp 142-151, Shanghai, China, ACM Press, 2006.

[MISRA C] MISRA, Guidelines for the use of the C language in vehicle based software, April 1998.

[Musa 87] Musa, J et al, Software Reliability, McGraw-Hill, 1987.

[Musa 89] Musa, J, Ackerman, A, Quantifying Software Validation: When to Stop Testing?, IEEE 1989.

[Musa 99] Musa, J, Software Reliability Engineering: More Reliable Software: Faster Development and Testing, McGraw-Hill, 1999.

[Myers 78] Myers, G, A Controlled Experiment in Program Testing and Code Walkthroughs/Inspections, Communications of the ACM, 1978.

[Myers 79] Myers, G, The Art of Software Testing, Wiley & Sons, 1979.

[Nair et al 98] Nair, V, James, D, Ehrlich, W, Zevallos, J, A Statistical Assessment of Some Software Testing Strategies and Application of Experimental Design Techniques, Statistica Sinica 8, 1998.

[Nakajo & Kume 91] Nakajo, T, Kume, H, A Case History Analysis of Software Error Cause-Effect Relationship, IEEE Transactions on Software Engineering, Vol 17, No 8, August 1991.

[Ng, et al 04] Ng, S.P, et al, A Preliminary Survey on Software Testing Practices in Australia, Proceedings of the 2004 Australian Software Engineering Conference, 2004.

[Offutt 89] Offutt, J, The Coupling Effect: Fact or Fiction?, ACM, 1989.

[Offutt 92] Offutt, J, Investigation of the Software Testing Coupling Effect, ACM Transactions on Software Engineering and Methodology, Vol 1, No 1, Jan 92, P.5-20.

[Offutt et al 93] Offutt, J, Rothermel, G, Zapf, C, An Experimental Evaluation of Selective Mutation, In Proceedings of the 15th International Conference on Software Engineering, pp 100-107, May 1993.

[Offutt et al 95] Offutt, A, Pan, J, Voas, J, Procedures for Reducing the Size of Coverage-Based Test Sets, Twelfth International Conference on Testing Computer Software, June 1995, Washington DC.

[Parliamentary Memorandum Appendix 8] Memorandum from the Ministry of Defence, Appendix 8, on Major Procurement Project Survey (March 2002).

www.publications.parliament.uk/pa/cm200102/cmselect/cmdfence/779/779ap01.htm

[Perry 00] Perry, W, Effective Methods for Software Testing, Wiley, 2000.

[Perry 87] Perry, D, Evangelist, W, An Empirical Study of Software Interface Faults, AT&T Bell Laboratories, 1987.

[Poulding et al 13] Poulding, S, Clark, J, Alexander, R, Hadley, M, The Optimisation of Stochastic Grammars to Enable Cost-Effective Probabilistic Structural Testing, GECCO 2013.

[Press et al 92] Press, W, Teukolsky, S, Vetterling, W, Flannery, B, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, 1992.

[Richardson & Wolf 96] Richardson, D, Wolf, A, Software Testing at the Architectural Level, Proceedings of the Second International Software Architecture Workshop, San Francisco, October 1996.

[Runeson & Andrew 03] Runeson, P, Andrews, A, Detection or Isolation of Defects? An Experimental Comparison of Unit Testing and Code Inspection, IEEE 2003.

[Runesson et al 06] Runeson, P, Andersson, C, Thelin, T, Andrews A, Berling, T, What Do We Know about Defect Detection Methods?, IEEE SOFTWARE, 2006.

[Selby & Basili 87] Basili, V, Selby, R, Comparing the Effectiveness of Software Testing Strategies, IEEE Transactions on Software Engineering, Vol 13, No 12, pp 1278-1296, December 1987.

[Selby 86] Selby, R, Combining software testing strategies: An empirical evaluation. In Proc. Workshop on Software Testing, pages 82-91. IEEE Computer Society Press, July 1986.

[Selby 90] Selby, R, Empirically Based Analysis of Failures in Software Systems, 1990.

[Smith & Wood 87] Smith, D, Wood, K, Engineering Quality Software, Elsevier Applied Science Publishers, 1987.

[So et al 02] So, S, et al, An Empirical Evaluation of Six Methods to Detect Faults in Software, Software Testing, Verification, and Reliability, Vol 12, No 3, 2002.

[Sommerville 96] Sommerville, I, Software Engineering, 5th edition, Addison-Wesley, 1996.

[Storey 96] Storey, N, Safety-Critical Computer Systems, Addison-Wesley, 1996.

[Telles and Hsieh 01] Telles, M, Hsieh, Y, The Science of Debugging, Coriolis Group Books, 2001.

[Testing Glossary 04] Glossary of Terms used in Software Testing, Version 1.0 (8 December 2004), produced by the 'Glossary Working Party' of the International Software Testing Qualifications Board.

[Thelin & Berling 03] Berling, T, Thelin, T, An Industrial Case Study of the Verification and Validation Activities. IEEE METRICS 2003.

[UK Parliament, Defence Offet Obligations] The UK Parliament, Defence Offset Obligations – Ref 69841.

http://www.parliament.uk/index.cfm

[Voas & McGraw 98] Voas, J, McGraw, G, Software Fault Injection, John Wiley & Sons, 1998.

[Voas 03] https://www.softwaretechnews.com/stn_view.php?stn_id=8&article_id=62

[Voas 91 et al] Voas, J, Morell, L, Miller, K, Predicting Where Faults Can Hide from Testing, IEEE Software, March, 1991.

[Voas 92] Voas, J, PIE: A Dynamic Failure-Based Technique, IEEE Transactions on Software Engineering, Vol 18, No 8, August 1992.

[Walsh 79] Walsh, T, A Reliability Study Using a Complexity Measure, In AFIPS Conference Proceedings, New York, AFIPS Press, 1979.

[Weyuker & Ostrand 80] Weyuker, E, Ostrand, T, Theories of Program Testing and the Application of Revealing Sub-domains, IEEE Transactions on Software Engineering, Vol SE-6, No 3, May 1980.

[Woods et al 97] Wood, M, Roper, M, Brooks, A, Miller, J, Comparing and Combining Software Defect Detection Techniques: A Replicated Empirical Study, 1997.

[Woodward & Halewood 88] Woodward, M, Halewood, K, From Weak to Strong, Dead or Alive? An Analysis of Some Mutation Testing Issues, In Proceedings of the Second Workshop on Software Testing, Verification, and Analysis, pp 152-158, Banff, Alberta, July 1988, IEEE Computer Society Press.

[Woodcock et al 06] Woodcock, J, Larsen, P, Bicarregui, J, Fitzgerald, J, Formal Methods: Practice and Experience, ACM Computing Surveys, Vol 41, No 4, October 2009.

[Zeller & Hildebrandt 02] Zeller, A, Hildebrandt, R, Simplifying and Isolating Failure-Inducing Input, IEEE Transactions on Software Engineering, Vol 28, No 2, pp 183-200, 2002.

[Zhu et al 97] Zhu, H, et al, Software Unit Test Coverage and Adequacy, ACM Computing Surveys, Vol.29, No.4, December 1997.
