Metamorphic Testing Techniques to Detect Defects in Applications Without Test Oracles
Christian Murphy

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY
2010

© 2010 Christian Murphy
All Rights Reserved

Abstract

Applications in the fields of scientific computing, simulation, optimization, machine learning, etc. are sometimes said to be “non-testable programs” because there is no reliable test oracle to indicate what the correct output should be for arbitrary input. In some cases, it may be impossible to know the program’s correct output a priori; in other cases, the creation of an oracle may simply be too hard. These applications typically fall into a category of software that Weyuker describes as “Programs which were written in order to determine the answer in the first place. There would be no need to write such programs, if the correct answer were known.” The absence of a test oracle clearly presents a challenge when it comes to detecting subtle errors, faults, defects or anomalies in software in these domains.

As these types of programs become more and more prevalent in various aspects of everyday life, the dependability of software in these domains takes on increasing importance. Machine learning and scientific computing software may be used for critical tasks such as helping doctors perform a medical diagnosis or enabling weather forecasters to more accurately predict the paths of hurricanes; hospitals may use simulation software to understand the impact of resource allocation on the time patients spend in the emergency room. Clearly, a software defect in any of these domains can cause great inconvenience or even physical harm if not detected in a timely manner.

Without a test oracle, it is impossible to know in general what the expected output should be for a given input, but it may be possible to predict how changes to the input should effect changes in the output, and thus identify expected relations among a set of inputs and among the set of their respective outputs. This approach, introduced by Chen et al., is known as “metamorphic testing”. In metamorphic testing, if test case input x produces an output f(x), the function’s so-called “metamorphic properties” can be used to guide the creation of a transformation function t, which is applied to the input to produce t(x); this transformation allows us to predict the expected output f(t(x)), based on the (already known) value of f(x). If the new output is as expected, it is not necessarily right, but any violation of the property indicates that one (or both) of the outputs is wrong. That is, though it may not be possible to know whether an output is correct, we can at least tell whether an output is incorrect.
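To make these mechanics concrete, here is a minimal sketch (not taken from the thesis) of two metamorphic properties expressed in Python, assuming a simple mean function as the program under test: permuting the input should leave the output unchanged, and multiplying every input element by a constant k should multiply the output by k. The function under test is deliberately trivial so the sketch runs; the form of the check is the same for the rankers, classifiers, and simulations discussed in the thesis, whose correct outputs cannot be known in advance.

```python
import random

def mean(xs):
    """Hypothetical function under test; trivially simple here, but the
    checks below never consult an oracle for the 'right' answer."""
    return sum(xs) / len(xs)

def holds_permutation_property(f, x):
    """Transformation t: randomly permute the input.
    Expected relation for mean(): f(t(x)) == f(x)."""
    t_x = random.sample(x, len(x))      # a random permutation of x
    return abs(f(x) - f(t_x)) < 1e-9    # compare f(x) against f(t(x))

def holds_scaling_property(f, x, k=3.0):
    """Transformation t: multiply every element by k.
    Expected relation for mean(): f(t(x)) == k * f(x)."""
    t_x = [k * v for v in x]
    return abs(f(t_x) - k * f(x)) < 1e-9

x = [0.5, 2.0, 3.5, 1.0]
print(holds_permutation_property(mean, x))  # True unless mean() is defective
print(holds_scaling_property(mean, x))      # True unless mean() is defective
```

As the abstract notes, a check that passes proves nothing about correctness; only a violated property is conclusive, because f(x) and f(t(x)) then cannot both be right.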
This thesis investigates three hypotheses. First, I claim that an automated approach to metamorphic testing will advance the state of the art in detecting defects in programs without test oracles, particularly in the domains of machine learning, simulation, and optimization. To demonstrate this, I describe a tool for test automation, and present the results of new empirical studies comparing the effectiveness of metamorphic testing to that of other techniques for testing applications that do not have an oracle. Second, I claim that conducting function-level metamorphic testing in the context of a running application will reveal defects not found by metamorphic testing using system-level properties alone, and introduce and evaluate a new testing technique called Metamorphic Runtime Checking. Third, I hypothesize that it is feasible to continue this type of testing in the deployment environment (i.e., after the software is released) with minimal impact on the user, and describe an approach called In Vivo Testing.

Additionally, this thesis presents guidelines for identifying metamorphic properties, explains how metamorphic testing fits into the software development process, and offers suggestions for both practitioners and researchers who need to test software without the help of a test oracle.
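As a rough illustration of the function-level idea behind Metamorphic Runtime Checking, the sketch below wraps a function so that one of its metamorphic properties is checked on a random sample of its real invocations while the application runs. The decorator, the property, and the total_score function are all hypothetical; a real implementation (such as the one described in Chapter 4) would instrument the program and isolate the follow-up invocation, e.g. in a separate process, so that its side effects never reach the user.

```python
import functools
import random

def metamorphic_check(transform_in, relation, rate=0.1):
    """Attach a metamorphic property to a function so it is checked on a
    fraction of live calls. Sketch only: the extra invocation here runs
    in-process, whereas a production system would sandbox it."""
    def decorate(f):
        @functools.wraps(f)
        def wrapper(arg):
            result = f(arg)
            if random.random() < rate:            # sample live invocations
                follow_up = f(transform_in(arg))  # compute f(t(x)) on real input
                if not relation(result, follow_up):
                    print(f"metamorphic violation in {f.__name__}: "
                          f"{result!r} vs {follow_up!r}")
            return result
        return wrapper
    return decorate

# Hypothetical property: reversing the input must not change the total.
@metamorphic_check(transform_in=lambda xs: list(reversed(xs)),
                   relation=lambda a, b: abs(a - b) < 1e-9)
def total_score(values):
    return sum(values)

total_score([1.0, 2.5, 3.25])  # normal call; the check may also run
```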
Contents

List of Figures
List of Tables

1 Introduction
   1.1 Definitions
   1.2 Problem Statement
   1.3 Requirements
   1.4 Scope
   1.5 Proposed Approach
   1.6 Hypotheses
   1.7 Outline

2 Background
   2.1 Motivation
   2.2 Metamorphic Testing
   2.3 Previous Work in Metamorphic Testing
   2.4 Devising Metamorphic Properties
       2.4.1 Mathematical Properties
       2.4.2 Considerations for General Properties
       2.4.3 Automated Detection
       2.4.4 Effect on the Development Process
   2.5 Applications in the Domain of Interest
       2.5.1 Machine learning fundamentals
       2.5.2 MartiRank
       2.5.3 Support Vector Machines
       2.5.4 C4.5
       2.5.5 PAYL
   2.6 Summary

3 Automated Metamorphic System Testing
   3.1 Motivation
       3.1.1 The Need for Automation
       3.1.2 Imprecision
       3.1.3 Non-Determinism
   3.2 Automated Testing Framework
       3.2.1 Model
       3.2.2 Assumptions
       3.2.3 Specifying Metamorphic Properties
       3.2.4 Configuration
       3.2.5 Execution of Tests
       3.2.6 Testing in the Deployment Environment
       3.2.7 Effect on Testing Time
   3.3 Heuristic Metamorphic Testing
       3.3.1 Reducing False Positives
       3.3.2 Addressing Non-Determinism
   3.4 Empirical Studies
       3.4.1 Techniques Investigated
       3.4.2 Study #1: Machine Learning Applications
       3.4.3 Study #2: Applications in Other Domains
       3.4.4 Study #3: Non-Deterministic Applications
       3.4.5 Summary
       3.4.6 Threats to Validity
   3.5 Summary

4 Metamorphic Runtime Checking
   4.1 Approach
       4.1.1 Overview
       4.1.2 Devising Metamorphic Properties
   4.2 Model
   4.3 Architecture
       4.3.1 Creating Tests
       4.3.2 Instrumentation and Test Execution
       4.3.3 Limitations
   4.4 Case Studies
       4.4.1 Experimental Setup
       4.4.2 Findings
       4.4.3 Discussion
   4.5 Empirical Studies
       4.5.1 Study #1: Machine Learning Applications
       4.5.2 Study #2: Applications in Other Domains
       4.5.3 Study #3: Non-Deterministic Applications
       4.5.4 Summary
   4.6 Effect on Testing Time
       4.6.1 Developer Effort
       4.6.2 Performance Overhead
   4.7 Summary

5 In Vivo Testing
   5.1 Approach
       5.1.1 Conditions
       5.1.2 In Vivo Tests
       5.1.3 Categories and Motivating Examples
       5.1.4 In Vivo Testing Fundamentals
   5.2 Model
   5.3 Architecture
       5.3.1 Preparation
       5.3.2 Test Execution
       5.3.3 Scheduling Execution of Tests
       5.3.4 Configuration Guidelines
       5.3.5 Distributed In Vivo Testing
   5.4 Case Studies
   5.5 Performance Evaluation
   5.6 More Efficient Test Execution
       5.6.1 Analysis
       5.6.2 Implementation
       5.6.3 Evaluation
       5.6.4 Limitations
       5.6.5 Broader Impact
   5.7 Summary

6 Related Work
   6.1 Addressing the Absence of a Test Oracle
       6.1.1 Embedded Assertion Languages
       6.1.2 Extrinsic Interface Contracts and Algebraic Specifications
       6.1.3 Pure Specification Languages
       6.1.4 Trace Checking and Log File Analysis
   6.2 Metamorphic Testing
   6.3 Self-Testing Software
   6.4 Runtime Testing
       6.4.1 Testing in the Deployment Environment
       6.4.2 Reducing Performance Overhead
   6.5 Domain-Specific Testing ...