Assessing Test Quality

David Schuler

Dissertation zur Erlangung des Grades des Doktors der Ingenieurwissenschaften der Naturwissenschaftlich-Technischen Fakultäten der Universität des Saarlandes

Saarbrücken, 2011

Day of Defense:
Dean: Prof. Holger Hermanns
Head of the Examination Board: Prof. Dr. Reinhard Wilhelm
Members of the Examination Board: Prof. Dr. Andreas Zeller, Prof. Dr. Sebastian Hack, Dr. Valentin Dallmeier

Abstract

When developing tests, one is interested in creating tests of good quality that thoroughly test the program. This work shows how to assess test quality through mutation testing with impact metrics, and through checked coverage. Although there are different aspects that contribute to a test's quality, the most important factor is its ability to reveal defects, because testing is usually carried out with the aim of detecting defects. For this purpose, a test has to provide inputs that execute the defective code under such conditions that it causes an infection. This infection has to propagate and result in a failure, which can be detected by a check of the test. In the past, the aspect of test input quality has been extensively studied, while the quality of checks has received less attention.

The traditional way of assessing the quality of a test suite's checks is mutation testing. Mutation testing seeds artificial defects (mutations) into a program, and checks whether the tests detect them. While this technique effectively assesses the quality of checks, it also has two drawbacks. First, it places a huge demand on computing resources. Second, equivalent mutants, which are mutants that are semantically equivalent to the original program, dilute the quality of the results. In this work, we address both of these issues. We present the JAVALANCHE framework that applies several optimizations to enable automated and efficient mutation testing for real-life programs. Furthermore, we address the problem of equivalent mutants by introducing impact metrics to detect non-equivalent mutants. Impact metrics compare properties of test suite runs on the original program with runs on mutated versions, and are based on abstractions over program runs such as dynamic invariants, covered statements, and return values. The intention of these metrics is that mutations that have a graver influence on the program run are more likely to be non-equivalent.

Moreover, we introduce checked coverage, an alternative approach to measure the quality of a test suite's checks. Checked coverage determines the parts of the code that were not only executed, but that actually contribute to the results checked by the test suite, by computing dynamic backward slices from all explicit checks of the test suite.

Zusammenfassung

This work shows how the quality of software tests can be assessed through mutation testing in combination with impact metrics, and through checked coverage.

Although different factors influence the quality of a test, its most important aspect is the ability to reveal defects. To this end, a test has to provide inputs that execute the defective part of the program in such a way that an infection arises, i.e. the program state becomes erroneous. This infection has to propagate so that it results in an erroneous result, which then has to be detected by a check of the test.

The traditional method for assessing the quality of a test suite's checks is mutation testing. Here, artificial defects (mutations) are inserted into a program, and it is checked whether they are detected by the tests. Although this technique can assess the quality of checks, it has two drawbacks. First, it places a large demand on computing resources. Second, equivalent mutants, which change the syntax of a program but not its semantics, dilute the quality of the results. This work presents solutions for both problems. We present JAVALANCHE, a framework that enables efficient and automated mutation testing for real-life programs. Furthermore, the problem of equivalent mutants is addressed by means of impact metrics. Impact metrics compare properties between a run on the original program and a run on a mutated version, using different abstractions over the program run. The underlying idea is that a mutation that has a large impact on the program run is less likely to be equivalent.

Moreover, we introduce checked coverage, a novel coverage metric that measures the quality of a test suite's checks. Checked coverage determines the parts of the program code that are not only executed, but whose results are also checked by the tests.

Acknowledgements

First of all, I would like to thank my adviser Andreas Zeller for the guidance and support while working on my PhD. Many thanks also go to Sebastian Hack for being the second examiner of my thesis. During my time at the Software Engineering chair, I have been fortunate to work with great colleagues, of whom many helped me with proofreading parts of this thesis. Thank you Valentin Dallmeier, Gordon Fraser, Florian Groß, Clemens Hammacher, Kim Herzig, Yana Mileva, Jeremias Rößler, and Kevin Streit. While working at this chair, I have been lucky to have great office mates. Thank you Christian Lindig, Rahul Premraj and Thomas Zimmermann. The research that I carried out would not have been possible without the work of our system administrators Kim Herzig, Sascha Just, Sebastian Hafner and Christian Holler, who did a great job at maintaining the infrastructure at our chair. For parts of this research, I collaborated with Bernhard Grün and Jaechang Nam, whom I would like to thank. Furthermore, I would like to thank Arnika Marx and Anna Theobald for proofreading. I am deeply grateful to my friends and family. Especially, I would like to thank Anna, Conny, Heiner, Bastian, Julia, Luzie, Peter and Timo for cooking together and the good time we spent together. Finally, my biggest thanks go to my parents Sigrid and Werner for always supporting me.

Contents

1 Introduction ...... 1
  1.1 Thesis Structure ...... 3
  1.2 Publications ...... 4

2 Background ...... 7
  2.1 Software Testing ...... 7
  2.2 Unit Tests ...... 9
  2.3 Coverage Metrics ...... 11
    2.3.1 Control Flow Criteria ...... 12
    2.3.2 Data Flow Criteria ...... 14
    2.3.3 Logic Coverage Criteria ...... 14
    2.3.4 Summary Coverage Criteria ...... 15
  2.4 Program Analysis ...... 16
    2.4.1 Static Program Analysis ...... 16
    2.4.2 Dynamic Program Analysis ...... 17
    2.4.3 Execution Trace ...... 18
    2.4.4 Program Analysis Techniques ...... 19
    2.4.5 Coverage Metrics ...... 19
    2.4.6 Program Slicing ...... 19
    2.4.7 Invariants ...... 22
    2.4.8 Related Work ...... 24
  2.5 Summary ...... 25

3 Mutation Testing ...... 27
  3.1 Underlying Hypotheses ...... 30
    3.1.1 Competent Programmer Hypothesis ...... 30
    3.1.2 Coupling Effect ...... 31
  3.2 Costs of Mutation Testing ...... 31
  3.3 Optimizations ...... 32
    3.3.1 Mutation Reduction Techniques ...... 32
    3.3.2 Mutant Sampling ...... 33
    3.3.3 Selective Mutation ...... 33
    3.3.4 Weak Mutation ...... 34
    3.3.5 Mutation Schemata ...... 34
    3.3.6 Coverage Data ...... 35
    3.3.7 Parallelization ...... 35
  3.4 Related Work ...... 35
  3.5 Summary ...... 36

4 The Javalanche Framework ...... 37
  4.1 Applying Javalanche ...... 39
  4.2 Subject Programs ...... 41
  4.3 Mutation Testing Results ...... 43
  4.4 Related Work ...... 44
    4.4.1 Further Uses of Javalanche ...... 45
  4.5 Summary ...... 47

5 Equivalent Mutants ...... 49
  5.1 Types of Mutations ...... 50
    5.1.1 A Regular Mutation ...... 50
    5.1.2 An Equivalent Mutation ...... 50
    5.1.3 A Not Executed Mutation ...... 51
  5.2 Manual Classification ...... 52
    5.2.1 Percentage of Equivalent Mutants ...... 52
    5.2.2 Classification Time ...... 53
    5.2.3 Mutation Operators ...... 53
    5.2.4 Types of Equivalent Mutants ...... 54
    5.2.5 Discussion ...... 56
  5.3 Related Work ...... 58
  5.4 Summary ...... 59

6 Invariant Impact of Mutations ...... 61
  6.1 Learning Invariants ...... 62
  6.2 Checking Invariants ...... 63
  6.3 Classifying Mutations ...... 64
  6.4 Evaluation ...... 65
    6.4.1 Evaluation Subjects ...... 65
    6.4.2 Manual Classification of Impact Mutants ...... 67
    6.4.3 Invariant Impact and Tests ...... 71
    6.4.4 Ranking ...... 73
    6.4.5 Invariant Impact of the Manually Classified Mutants ...... 76
    6.4.6 Discussion ...... 78
  6.5 Threats to Validity ...... 79
  6.6 Related Work ...... 80
    6.6.1 Mutation Testing ...... 80
    6.6.2 Equivalent Mutants ...... 80
    6.6.3 Invariants and Contracts ...... 81
  6.7 Summary ...... 82

7 Coverage and Data Impact of Mutations ...... 83
  7.1 Assessing Mutation Impact ...... 84
    7.1.1 Impact on Coverage ...... 84
    7.1.2 Impact on Return Values ...... 85
    7.1.3 Impact Metrics ...... 86
    7.1.4 Distance Metrics ...... 87
    7.1.5 Equivalence Thresholds ...... 88
  7.2 Evaluation ...... 89
    7.2.1 Impact of the Manually Classified Mutations ...... 89
    7.2.2 Impact and Tests ...... 93
    7.2.3 Mutations with High Impact ...... 98
  7.3 Threats to Validity ...... 100
  7.4 Related Work ...... 100
  7.5 Summary ...... 102

8 Calibrated Mutation Testing ...... 105
  8.1 Classifying Past Fixes ...... 105
    8.1.1 Mining Fix Histories ...... 106
    8.1.2 Subject Project ...... 107
    8.1.3 Fix Categorization ...... 108
  8.2 Calibrated Mutation Testing ...... 108
    8.2.1 Mutation Operators ...... 110
    8.2.2 Mutation Selection Schemes ...... 110
  8.3 Evaluation ...... 111
    8.3.1 Evaluation Setting ...... 112
    8.3.2 Evaluation Results ...... 112
  8.4 Threats to Validity ...... 115
  8.5 Related Work ...... 116
    8.5.1 Mining Software Repositories ...... 116
    8.5.2 Mutation Testing ...... 117
  8.6 Summary ...... 118

9 Comparison of Test Quality Metrics ...... 119
  9.1 Test Quality Metrics ...... 120
  9.2 Experiment Setup ...... 121
  9.3 Results ...... 122
  9.4 Threats to Validity ...... 127
  9.5 Related Work ...... 128
  9.6 Summary ...... 130

10 Checked Coverage ...... 133
  10.1 Checked Coverage ...... 135
    10.1.1 From Slices to Checked Coverage ...... 136
    10.1.2 Implementation ...... 138
  10.2 Evaluation ...... 139
    10.2.1 Evaluation Subjects ...... 139
    10.2.2 Qualitative Analysis ...... 140
    10.2.3 Disabling Oracles ...... 142
    10.2.4 Explicit and Implicit Checks ...... 145
    10.2.5 Performance ...... 147
  10.3 Limitations ...... 147
  10.4 Threats to Validity ...... 149
  10.5 Related Work ...... 150
    10.5.1 Coverage Metrics ...... 150
    10.5.2 Mutation Testing ...... 150
    10.5.3 Program Slicing ...... 151
    10.5.4 State Coverage ...... 151
  10.6 Summary ...... 152

11 Conclusions and Future Work ...... 155

Bibliography ...... 159

List of Figures

2.1 Static data and control dependencies for the max() method ...... 21
2.2 Dynamic data and control dependencies for the max() method ...... 22
2.3 A method that computes the square of an integer ...... 23

5.1 A non-equivalent mutation from the XSTREAM project ...... 50
5.2 An equivalent mutation from the XSTREAM project ...... 51
5.3 A mutation of XSTREAM project that is not executed by tests ...... 51
5.4 An equivalent mutation in unneeded code ...... 55
5.5 An equivalent mutation that does not affect the program semantics ...... 55
5.6 An equivalent mutation that alters state ...... 56
5.7 An equivalent mutation that could not be triggered ...... 57
5.8 An equivalent mutation because of the context ...... 57

6.1 The process of ranking mutations by invariant impact ...... 62
6.2 An invariant checker for a method that computes the square root ...... 64
6.3 A non-detected JAXEN mutation that violates most invariants ...... 69
6.4 A non-detected JAXEN mutation that violates the second most invariants ...... 70
6.5 The undetected mutation that violates most invariants ...... 71
6.6 Detection rates (y) for the top x% mutations with the highest impact ...... 75

7.1 Precision and Recall of the impact metrics for different thresholds ...... 92
7.2 Percentage of mutations with impact and detection ratios of mutations with and without impact for varying threshold ...... 97

8.1 The process of calibrated mutation testing ...... 106
8.2 Collected fixes for the JAXEN project ...... 107

9.1 Correlation between coverage level and defect detection for test suite sizes 1 to 15 ...... 123
9.2 Correlation between coverage level and defect detection and corresponding P-values for test suite sizes 1 to 100 ...... 124
9.3 Correlation between coverage level and defect detection and corresponding P-values for test suite sizes 1 to 100 ...... 125
9.4 Correlation between coverage level and defect detection for test suite sizes 1 to 500 ...... 126

10.1 A test without outcome checks ...... 134
10.2 Dynamic data and control dependencies as used for checked coverage ...... 135
10.3 Another test with insufficient outcome checks ...... 141
10.4 A method where the return value is not checked ...... 141
10.5 Coverage values for test suites with removed assertions ...... 142
10.6 Decrease of the coverage values relative to the coverage values of the original test suite ...... 144
10.7 A common JUNIT pattern to check for exceptions ...... 148
10.8 Statements that lead to not taking a branch ...... 149

List of Tables

4.1 JAVALANCHE mutation operators ...... 39
4.2 Description of subject programs ...... 41
4.3 Description of the subject programs’ test suites ...... 42
4.4 Mutation statistics for the subject programs ...... 42
4.5 JAVALANCHE runtime for the individual steps ...... 43

5.1 Classifying mutations manually ...... 53
5.2 Classification results per mutation operator ...... 54

6.1 Invariants used by JAVALANCHE ...... 63
6.2 Description of subject programs ...... 66
6.3 Runtime (in CPU time) for obtaining the dynamic invariants ...... 66
6.4 JAVALANCHE runtime (in CPU time) for the individual steps ...... 67
6.5 Results for H2. Invariant-violating mutants (VM) have higher detection rates than non-violating mutants (NVM) ...... 73
6.6 Results for H3. Best results are obtained by ranking VMs by the number of invariants violated ...... 76
6.7 Results for H4. Precision and recall of the invariant impact for the 140 manually classified mutants ...... 77

7.1 Effectiveness of classifying mutants by impact: precision and recall ...... 90
7.2 Effectiveness of classifying mutants by distance based impact ...... 90
7.3 Assessing whether mutants with impact on coverage are detected by tests ...... 93
7.4 Results for ranking the mutations according to their impact on coverage ...... 94
7.5 Assessing whether mutants with impact on data are detected by tests ...... 94
7.6 Results for ranking the mutations by their impact on data ...... 95
7.7 Assessing whether mutants with combined coverage and data impact are detected by tests ...... 95
7.8 Results for ranking the mutations by their impact on coverage and data ...... 96
7.9 Detection ratios for different operators ...... 98
7.10 Focusing on mutations with the highest impact: precision of the classification ...... 99

8.1 Fix pattern properties and their values ...... 108
8.2 10 most frequent fix patterns for Jaxen ...... 109
8.3 Classification of the 10 most frequent fix patterns ...... 109
8.4 Defects detected for pattern and location based schemes (for revision 1229) ...... 113
8.5 Defects detected for property based schemes (for revision 1229) ...... 114
8.6 Defects detected for pattern and location based schemes (for revision 931) ...... 115
8.7 Defects detected for property based schemes (for revision 931) ...... 116

9.1 Description of the Siemens suite ...... 121

10.1 Checked coverage, statement coverage, and mutation score ...... 140
10.2 Mutations detected by explicit checks of the test suite ...... 146
10.3 Runtime to compute the checked coverage and the mutation score ...... 147

Chapter 1

Introduction

Software fails, and as software is part of our everyday lives, almost everyone has experienced a software failure. Failures are caused by defects in programs, which are accidentally introduced by developers because of the inherent complexity of modern software. It is impossible for a human to account for all possible scenarios that can arise during the execution of a program. Thus, defect-free software is an illusion, and almost all programs contain defects that lead to more or less severe failures.

Infamous defects include the Ariane 5 explosion in 1996 [20]. A conversion error from a 64-bit floating-point value to a 16-bit signed integer caused the Ariane 5 launcher to veer off the planned flight path, which put so much force on the engines that they were in danger of breaking off, and eventually, the self-destruct mechanism was activated. Another severe defect was in the control software of the Therac-25 radio therapy machine [51]. Its electron beam has two operation modes, low and high power. A defect in the control software caused the high-power beam to be used when the low-power beam should have been used, and in these cases the high-power beam was applied without the indispensable safety mechanisms. This caused a massive overdose of radiation for many patients.

A study by the National Institute of Standards and Technology [95] from 2002 quantifies the problem of software defects in terms of costs. It is estimated that software failures cause costs of $59.5 billion annually in the U.S., and that over one third of these costs could be avoided by using a better testing infrastructure.


Software testing techniques are concerned with detecting as many defects as possible, as early as possible. To this end, software is tested at different architectural levels and at different stages of the development process, with the aim of testing the program systematically. Unit tests operate at the most basic level: they provide inputs for the program under test and compare the results to expected values. Coverage metrics impose requirements on the test inputs so that the program is tested systematically. However, they do not consider how well the results of the computations are checked. Mutation testing, which inserts artificial defects (mutations) into a program and checks whether tests detect them, assesses the quality of checks and test inputs. Besides these benefits, mutation testing also has two major drawbacks. First, it is expensive in terms of computing resources. Second, equivalent mutants, which are mutants that do not differ from the original program in their semantics, dilute the quality of its results.

This work makes the following contributions to mutation testing, detecting equivalent mutants, and assessing the quality of a test suite’s checks.

JAVALANCHE We introduce the JAVALANCHE mutation testing framework that enables automated and efficient mutation testing for JAVA programs. It is the basis for further research presented in this work and has been developed with the intent to apply mutation testing to real-life programs. JAVALANCHE applies several optimizations adapted from previous mutation testing tools, and it introduces previously unimplemented optimizations.

Equivalent mutants We study and quantify the problem of equivalent mutants for real-life JAVA programs. Equivalent mutants have a different syntax than the original program but are semantically equivalent. As they cannot be detected by a test, they are presented to the developer among regular undetected mutants, and thereby, dilute the quality of mutation testing’s results. We study their frequency among undetected mutants on real-life programs and test suites, and measure the time needed to identify them.

Impact metrics To address the problem of equivalent mutants, we introduce several impact metrics that measure the difference between a run of the test suite on the original program and a run of the test suite on a mutated version. The idea behind these approaches is that a mutant with a strong impact on the program run is more likely to be non-equivalent. In this work, we investigate several metrics based on different abstractions that characterize program runs. We show how to compute the impact of a specific mutation by comparing the abstractions over a run of the original program to a run of the mutated version.

Checked coverage We introduce checked coverage, a coverage metric that is designed to assess the quality of the inputs and checks of a test suite. Using dynamic slicing, checked coverage determines the statements actually contributing to the results checked by the test suite, in contrast to statements that are only executed. Thereby, checked coverage focuses on the explicit checks of the test suite, which verify the results of one concrete run and can make detailed assumptions. Thus, they have an important influence on the test quality.

1.1 Thesis Structure

The main contributions of this work are to apply mutation testing to real-life programs, to introduce methods that help identify non-equivalent mutants, and to present a new coverage metric that measures the quality of a test suite’s checks. The results of our research are presented in the following order:

Chapter 2 gives a short introduction into software testing and program analysis. It describes different testing levels, and focuses on testing at the unit level because the techniques presented in this work operate at this level. The second part introduces static and dynamic program analysis, and explains dynamic analysis approaches relevant for this work in more detail.

Chapters 3 to 5 introduce and define mutation testing as a technique that assesses the quality of a test suite by inserting artificial defects and checking whether the test suite detects them. Chapter 4 presents the JAVALANCHE framework that enables efficient mutation testing for JAVA. With the help of JAVALANCHE, the extent of the equivalent mutant problem for real-life JAVA programs is studied in Chapter 5.

Chapters 6 and 7 present methods to detect non-equivalent mutants via the impact of a mutation. The impact of a mutation is the difference between a run of the test suite on the original version of the program and a run on the mutated version. Chapter 6 introduces invariant impact, which is computed by inferring dynamic invariants from the original run and checking for violations in the mutated run. Chapter 7 presents impact measures based on the differences in coverage and return values between a run of the original program and a run of the mutated version.

Chapter 8 introduces calibrated mutation testing, a method which adapts mutant generation schemes to the defect history of a project. A mutant generation scheme produces a set of mutations calibrated to past defects by mining past fixes, extracting properties of them, and mapping these properties to mutations.

Chapter 9 investigates, for mutation testing and several coverage metrics, whether there is a correlation between the coverage level of a test suite and its defect detection capability, i.e. whether test suites with a higher coverage level are also more likely to detect defects.

Chapter 10 presents checked coverage, an alternative to mutation testing, which assesses the quality of a test suite’s checks. This is done by focusing on those code features that actually contribute to the results checked by oracles. For this purpose, the dynamic backward slice from the checks is computed, and only statements that are on the slice are considered as covered.

Chapter 11 concludes with a summary of our results and ideas for future work.

1.2 Publications

This dissertation builds on the following papers (in chronological order):

• David Schuler, Valentin Dallmeier, Andreas Zeller. Efficient Mutation Testing by Checking Invariant Violations. In ISSTA ’09: Proceedings of the 18th International Symposium on Software Testing and Analysis, pages 69–80, New York, NY, USA, 2009. ACM.

• Bernhard J. M. Grün, David Schuler, Andreas Zeller. The Impact of Equivalent Mutants. In Mutation ’09: Proceedings of the 3rd International Workshop on Mutation Analysis, pages 192–199, Los Alamitos, CA, USA, 2010. IEEE Computer Society.

• David Schuler, Andreas Zeller. (Un-)Covering Equivalent Mutants. In ICST ’10: Proceedings of the 3rd International Conference on Software Testing, Verification and Validation, pages 45–54, Los Alamitos, CA, USA, 2010. IEEE Computer Society.

• David Schuler, Andreas Zeller. Assessing Oracle Quality with Checked Coverage. In ICST ’11: Proceedings of the 4th International Conference on Software Testing, Verification and Validation, pages 90–99, Los Alamitos, CA, USA, 2011. IEEE Computer Society.

• Jaechang Nam, David Schuler, Andreas Zeller. Calibrated Mutation Testing. In Mutation ’11: Proceedings of the 5th International Workshop on Mutation Analysis.

Chapter 2

Background

This work presents approaches that measure the quality of software unit tests, and further extensions to these approaches that use dynamic program analysis techniques. This chapter gives a short introduction to software testing, with more detail on testing at the unit level, as most techniques presented later in this work operate at this testing level. It then focuses on program analysis techniques, especially dynamic ones, which are employed by various approaches presented later.

2.1 Software Testing

The goal of the software development process is to produce a piece of software that meets the requirements and contains no defects. In other words, the software should perform as expected under all circumstances. However, meeting this goal is in most cases impossible because not all requirements might be known beforehand and the requirements might be imprecise so that they allow room for interpretation. Furthermore, most software is so complex that one cannot make sure that it contains no defects. In order to gain confidence that the software behaves as expected, it is tested at different stages in the development process. Testing at different stages corresponds to testing at different levels of the program structure.


Acceptance testing addresses the question whether the complete software meets the users’ requirements, i.e. if it really does what the users want. It is the last testing step before shipping the product, and is usually carried out by domain experts, who are often the customers themselves. Therefore, the software is tested in the environment of its users, which can be done in the following ways: for software that was developed for a specific customer, a test system is set up at the customer’s site that may be tested with real production data. For software that has been developed for the mass market, a beta phase is started in which the software is handed out to selected customers.

System testing is concerned with the question whether the assembled system meets the specifications. The system is tested on the architectural level and the focus is put on the functional part of the software. It is assumed that the subsystems work correctly, and the complete system is tested with the aim of discovering problems that arise from discrepancies between the specification and the implementation, i.e. the specification is not implemented correctly. In contrast to acceptance testing, which is carried out by customers, system testing is carried out by the producer of the software, usually by a separate testing team.

Integration testing assesses whether the modules communicate correctly via their interfaces. The aim of this testing phase is to find interaction and compatibility problems. When integrating the modules, different integration techniques are used. First, big-bang integration follows no real strategy because all components are integrated at once. Second, structure-oriented integration combines the modules incrementally by considering the structural dependencies between them. Third, function-oriented integration combines the modules according to specified functional criteria. Integration testing is carried out by the development team, and in contrast to system testing, the code structure is also considered.

Module testing (also called unit testing) checks the implementation of single program modules in isolation. The part of a program that is considered as a module can vary. For example, methods and functions can be considered as modules, or classes and files can be considered. In practice, modules often cannot be tested in isolation because of the dependencies between them. Module testing is always concerned with concrete implementations; therefore, it is carried out by the software developer. As the techniques presented in this work are concerned with testing at the unit level, we will present unit tests in more detail.

2.2 Unit Tests

Unit tests are small programs that exercise the system under test (SUT) under specified conditions. The execution of the SUT is observed; for example, exceptions that are raised during execution are logged. After the SUT has been exercised, the results are checked with the help of test oracles, and the test is classified as passing or failing. When a test meets the expectations imposed by the oracle, it is considered to be passing, and failing otherwise. A single unit test is referred to as a test case.

Definition 1 (Test Case) A test case executes the system under test under specified conditions and with specified inputs, observes the execution, and checks the results via test oracles.

Definition 2 (Test Oracle) A test oracle determines whether the test passes or fails.

The result of a test is determined by the test oracle and can be influenced by explicit checks of the test suite, explicit checks of the program, or implicit checks of the program. Explicit checks of the test suite usually come in the form of expectations about the computed results. Values returned by the program and the state of the program are compared against expected values. Explicit checks of the program are pre- and postconditions of the methods and the assertions inside the program, e.g. many methods check their arguments and raise an exception if an illegal argument was passed. Implicit checks are carried out by the runtime system and are not explicitly stated. For example, the JAVA virtual machine implicitly checks whether an object is null before calling a method on it and if this is the case a NullPointerException is thrown. If all of the checks succeed, the test is considered as passing, otherwise, if one of the checks indicates an unexpected behavior, the test is considered as failing.

Definition 3 (Test Result) A test result r for a test t is the result, as determined by the test oracle, of running the test on a version P of the program. The result can either be pass or fail: t(P) = r, with r ∈ {PASS, FAIL}.
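As an illustration of these definitions, consider the following minimal JUNIT 4 test case. It is a hypothetical sketch: the Account class is defined inside the test only to keep the example self-contained and is not taken from any subject program. The assertEquals call is an explicit check of the test suite, the argument check in deposit() is an explicit check of the program, and an unexpected NullPointerException raised by the runtime would be an implicit check; if all checks succeed, the oracle reports the test as passing.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class AccountTest {

    // A minimal class under test, defined here only to keep the example self-contained.
    static class Account {
        private int balance = 0;
        void deposit(int amount) {
            if (amount < 0)                           // explicit check of the program
                throw new IllegalArgumentException("negative amount");
            balance += amount;
        }
        int getBalance() { return balance; }
    }

    @Test
    public void depositIncreasesBalance() {
        Account account = new Account();              // specified conditions
        account.deposit(100);                         // specified input
        assertEquals(100, account.getBalance());      // explicit check of the test suite
    }
}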

The aim of a unit test is to test atomic units, which are the smallest possible parts that can be tested in isolation. The benefit of testing small parts comes into play when a test fails. In this case, the location of a bug can easily be pinpointed because only a small part of the program was covered by the test. However, it is not always possible to test every unit of the program in isolation because of the dependencies between different units.

To test bigger parts of the system (than a single unit), test cases can be combined into test suites. The purpose of those test suites is to group tests for a specific feature or a part of the system. They can be organized in a hierarchical order, i.e. a master test suite might consist of several sub test suites.

Definition 4 (Test Suite) A test suite is a set of several test cases.

During the development process, unit tests are executed very frequently. Therefore, the execution of the tests should be, and in most cases is, automated. To this end, several testing frameworks for different programming languages have been developed. Most prominent is the JUNIT framework for JAVA.

These frameworks provide support for running tests automatically and for grouping tests into test suites. They separate setup code from testing code so that the setup code can be shared between several test cases, and they introduce methods to check results, e.g. to check for equality of values or arrays. Furthermore, the frameworks give the developer an overview of the test results, and failing tests are reported together with associated failures and error messages.
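As a sketch of these features (the class and test names are hypothetical and not taken from the thesis subjects), the following JUNIT 4 test class shares its setup code between two test cases via a method annotated with @Before:

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;
import org.junit.Before;
import org.junit.Test;
import java.util.ArrayList;
import java.util.List;

public class ListTest {

    private List<String> list;

    @Before
    public void setUp() {
        // Setup code shared by all test cases in this class:
        // each test starts with a fresh, empty list.
        list = new ArrayList<String>();
    }

    @Test
    public void newListIsEmpty() {
        assertTrue(list.isEmpty());
    }

    @Test
    public void addIncreasesSize() {
        list.add("element");
        assertEquals(1, list.size());
    }
}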

The test suite is usually maintained in the same repository as the tested system and is written in the same programming language. It is run by the developer when changes have been made to the system, in order to gain confidence that the changes do not break existing tests. The practice of running tests after a modification, to check whether it caused any defects previously experienced and detected by the tests, is called regression testing. Of course, tests cannot show the absence of defects. They can only show that the system does not behave unexpectedly on a limited number of inputs.

Continuous integration tools build the system and run the test suites automatically in regular intervals, and make the results accessible to all developers. Thereby, developers get immediate feedback about the current state of the system.

Using unit tests provides a number of benefits:

Find problems earlier Detecting a defect via unit testing saves costs because detecting a defect at later stages is much more expensive.

Facilitate change Changing the software always carries the risk of introducing a defect. Therefore, developers might refrain from changing specific parts of the system or making design changes although such changes might be necessary. Thorough unit tests can give the developer confidence that after a change, the system still behaves as expected.

Documentation Tests can provide examples of how to use the program under test. These tests can serve as documentation for developers who are not familiar with the system. Furthermore, the tests are kept up-to-date because they are executed regularly on recent versions of the program. This is often not the case for external code examples.

In order to profit from these benefits, it is important to have a good test suite. However, it is an open question what makes a good test suite. In the following sections, we present methods to assess the quality of unit tests.

2.3 Coverage Metrics

When testing a program, one is interested in thoroughly testing all parts of the program. However, testing with all possible inputs is practically impossible for nontrivial programs. For example, for a method that takes a date encoded as a string, all possible strings might be used as test inputs because tests can also use invalid inputs. In order to direct the testing process, coverage criteria can be used. Different coverage criteria impose different test requirements that the test should satisfy.¹

Definition 5 (Coverage Criterion) A coverage criterion is a rule that imposes requirements on a test suite.

Definition 6 (Test Requirement) A test requirement is a specific element or property of the system that the test must cover or satisfy.

By using coverage metrics that impose test requirements, rules can be stated on how many tests should be created and when to stop testing. This might be the case when all test requirements are satisfied or when a defined fraction of the test requirements is satisfied, i.e. a specific level of coverage is reached.

¹ These definitions of coverage criteria in form of test requirements follow the style of Ammann and Offutt [2].

Definition 7 (Coverage Level) For a test set T and a set of test requirements TR, the coverage level is the ratio of satisfied requirements relative to all requirements.
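Stated as a formula (a direct restatement of Definition 7, where a requirement counts as satisfied if at least one test in T satisfies it):

\[
\textit{coverage level}(T, \mathit{TR}) \;=\; \frac{|\{\, tr \in \mathit{TR} \mid tr \text{ is satisfied by } T \,\}|}{|\mathit{TR}|}
\]

For example, a test suite that satisfies 45 of 60 test requirements reaches a coverage level of 0.75.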

The coverage level can also be used to assess the quality of a test suite with respect to a coverage metric. This also allows comparing different test suites regarding their quality.

Sometimes, coverage criteria impose requirements that cannot be met. These infeasible requirements prevent test sets from satisfying all requirements. Detecting all infeasible requirements can be impossible, as this problem can be reduced to the halting problem.

Some coverage criteria are related to each other via subsumption. A coverage criterion is said to subsume another coverage criterion when every test set that satisfies the former also satisfies the latter.

Definition 8 (Subsumption) A coverage criterion Ca subsumes another coverage criterion Cb if and only if every test set that satisfies Ca also satisfies Cb.

Inspired by different types of defects, different coverage criteria have been introduced. The criteria are designed in such a way that tests satisfying one criterion also detect defects of a specific type. Different criteria vary in the effort needed to compute the requirements, the effort needed to write tests that satisfy them, and in their ability to require tests that reveal defects.

2.3.1 Control Flow Criteria

Criteria that can be defined using the control flow graph (CFG) are also called control flow criteria. The control flow graph is a directed graph that has a node for every statement or every basic block, i.e. a piece of code that starts with a jump target and ends with a jump and contains no jump or jump target in between. Edges between nodes indicate that there is a direct control flow transition between the nodes. In the following, we will present the most prominent control flow criteria. The intention of these criteria is that defects are revealed by simply exercising the program structures or exercising them in a specific order.

Statement coverage is the most basic coverage metric. It requires that all statements in the program are executed, i.e. a single requirement is that a specific statement gets executed by at least one test, and the set of test requirements for statement coverage consists of one such requirement for every executable statement in the program. An alternative way of defining statement coverage is to require that every node in the control flow graph is executed by at least one test. Therefore, statement coverage is sometimes also referred to as node coverage. Statement coverage can be measured with little overhead, and many tools exist for different programming languages that compute statement coverage. Thus, it is the most widely used coverage metric.

Branch coverage requires that every branch in the program is taken, i.e. every conditional statement is evaluated to true and false. A single test requirement demands that a specific branch is exercised by at least one test. The set of test requirements is made up of one requirement for every branch in the program. Alternatively, branch coverage can also be defined via the control flow graph by requiring that every edge in the control flow graph is exercised. Thus, branch coverage is sometimes also called edge coverage. Branch coverage is a stronger criterion than statement coverage because it implicitly requires that every statement is executed, and in contrast to statement coverage it also requires that branches with no statement are executed, e.g. an empty else block of an if statement. Consequently, branch coverage subsumes statement coverage, which means that every test suite that reaches full branch coverage also reaches full statement coverage.
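The following hypothetical method illustrates the difference: a single test calling abs(-5) executes every statement and thus achieves full statement coverage, but full branch coverage additionally requires a test, such as abs(3), in which the condition evaluates to false and the (empty) else branch is taken.

public class MathUtil {

    // abs(-5) alone covers all statements.
    // Branch coverage also requires a call such as abs(3), which exercises
    // the implicit empty else branch of the if statement.
    public static int abs(int x) {
        int result = x;
        if (x < 0) {
            result = -x;
        }
        return result;
    }
}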

Path coverage requires that every possible execution path in the program is exercised. A single test requirement implies that a specific path through the program is followed by at least one test, and the set of test requirements consists of one requirement for each possible path. For the control flow graph this means that every possible path through the CFG has to be taken in order to fulfill path coverage. Path coverage subsumes branch coverage and transitively statement coverage as well. However, the number of paths can become unbounded in the presence of loops. Even if we do not consider loops and restrict ourselves to acyclic paths, there are so many paths that it becomes almost infeasible to exercise them all in non-trivial programs. For example, Ball and Larus [5] reported approximately 10^9 to 10^11 acyclic paths for the subject programs in the SPEC95 suite, out of which only about 10^4 were exercised by the tests. Therefore, several variants of path coverage exist that limit the number of paths that have to be taken in order to make it feasible to fulfill the criterion. Examples include limiting the length of the paths, only considering loop-free paths, or limiting the number of loop traversals.

2.3.2 Data Flow Criteria

Data flow criteria are based on the idea that a defect is revealed when an erroneous value is used later in the execution of a program. Thus, they impose requirements between the definition and the use of variables. A definition (def) is a point in the program where a value is assigned to a variable, i.e. the value is written to memory. A use is a point in the program where a value of a variable is accessed, i.e. it is read from memory. A definition and a use of the same variable form a definition-use pair (also called def-use pair or du pair) if there exists a definition-clear path with respect to the variable between them. A definition-clear path for a variable is a path with no redefinition of the variable.
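As a small illustration (hypothetical code, not taken from the subject programs), the variable discount below has two definitions and one use. The first definition and the use form a du pair along the definition-clear path where isMember is false; if isMember is true, the second definition redefines discount, so the du pair consisting of the second definition and the use is exercised instead. All du pairs coverage therefore requires tests for both values of isMember.

public class PriceCalculator {

    public static double finalPrice(double price, boolean isMember) {
        double discount = 0.0;            // first definition of discount
        if (isMember) {
            discount = 0.1;               // second definition (redefinition)
        }
        return price * (1.0 - discount);  // use of discount
    }
}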

All definitions coverage requires that every definition of a variable is exercised and also used at least once, i.e. there is at least one test that exercises a du pair that includes the definition.

All du pairs coverage requires that every du pair is exercised by at least one test, i.e. every use for a definition should be covered by a test. Therefore, it is sometimes called all uses coverage. It subsumes all definitions coverage, due to the fact that the execution of every du pair also implies that every definition is executed.

All du paths coverage requires that every simple path for every du pair is included by the test suite. A simple path is a path that contains no loops, i.e. no node is contained twice. It is considered to be included with respect to a du pair by another path if it exercises every node of the simple path with no redefinition of the variable associated with the du pair. This definition ensures that all paths between definitions and corresponding uses are exercised, but it excludes all paths that arise from a different execution frequency of loops. A du pair can be exercised by one or more simple paths. Hence, du path coverage subsumes all du pairs coverage.

2.3.3 Logic Coverage Criteria

Logic coverage criteria are based on logical predicates of a program. A predicate is a logical expression that evaluates to true or false. It is composed of subpredicates, boolean variables and literals, function calls, and comparisons between non-boolean variables and literals. The elements of a predicate are connected via logical operators. A clause is an atomic expression that contains no logical operator. This group of coverage criteria follows the intuition that predicates partition the program’s state space. For this reason, inputs that test different assignments of logical predicates systematically test the state space, which increases the chance to detect defects.

Predicate coverage requires each predicate to be evaluated to true and false. It is important to note that this criterion is not the same as branch coverage, because it considers not only the boolean expressions used in conditional statements but also expressions used in variable assignments. Thus, it subsumes branch coverage.

Basic condition coverage requires each basic condition to be true and false. That is, every clause is evaluated to true and false. Thus, this criterion is also called clause coverage. It typically requires more tests than predicate coverage, but it does not subsume it because a predicate might not be evaluated to both boolean values although its clauses do so.

Compound condition coverage, which is also known as multiple condition coverage, addresses this problem by requiring that every possible assignment of a predicate is exercised. Therefore, it subsumes basic condition coverage and predicate coverage. This requirement, however, leads to an exponential growth of the required assignments in the number of clauses. In order to avoid this multitude of assignments, some restricted versions have been proposed.

The most prominent one is modified condition/decision coverage (MC/DC), which is also called active clause coverage. For each clause, it requires assignments under which the clause independently affects the outcome of the whole predicate. In other words, the predicate’s result should change whenever the result of the clause changes.
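As a short, hypothetical illustration for the predicate isAdmin && isLoggedIn: active clause coverage can be satisfied with three of the four possible assignments, whereas compound condition coverage requires all four.

public class AccessControl {

    // Predicate with two clauses: isAdmin and isLoggedIn.
    public static boolean mayEdit(boolean isAdmin, boolean isLoggedIn) {
        return isAdmin && isLoggedIn;
    }

    // MC/DC is satisfied by three assignments:
    //   (true,  true)  -> true    baseline
    //   (false, true)  -> false   flipping isAdmin alone changes the outcome
    //   (true,  false) -> false   flipping isLoggedIn alone changes the outcome
    // Compound condition coverage additionally requires (false, false).
}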

2.3.4 Summary Coverage Criteria

The different criteria were inspired by different types of defects, with the aim that tests fulfilling a criterion also detect a specific type of defects. However, writing tests that fulfill the criteria requires significant effort. Therefore, the effort needed to fulfill a criterion has to be weighed against the strength of the criterion. In areas where failures have graver consequences, stronger coverage criteria are used, e.g. the RTCA/DO-178B standard [82] for avionics requires modified condition/decision coverage for safety critical applications.

The criteria may also impose infeasible requirements, e.g. when a program has unreachable code it is impossible to cover these statements. These infeasible requirements can also influence the subsumption relations. The relation between infeasible requirements and subsumption is discussed in more detail by Ammann and Offutt [2], and a more detailed discussion on software testing and coverage criteria can be found in dedicated books on software testing [2, 81].

Satisfying coverage metrics, however, does not guarantee that defects are detected, and in general testing cannot show the absence of defects. For example, failures caused by missing code might not be detected by a coverage criterion. Furthermore, the coverage metrics only measure how well the inputs exercise the program. They do not assess how well the results of the program are checked by the oracles. This might result in test cases that trigger a defect but do not detect it, because the input causes a failure which is not checked for.

2.4 Program Analysis

The approaches presented later in this work combine software testing and program analysis. This section gives an introduction to program analysis and presents in more detail the techniques that are used by the approaches presented later in this work.

Program analysis techniques are concerned with investigating and gaining insight into different aspects of program behavior. Different techniques are used to help developers get a better understanding of complex programs, transform programs (compiler optimizations), and to detect bugs. The different analysis techniques fall into two categories: static and dynamic techniques, which are presented in the following sections.

2.4.1 Static Program Analysis

Static program analysis techniques reason about a program by analyzing its source code or other static representations, e.g. an intermediate representation used by a compiler. The results of a static analysis technique hold for all possible executions and no specific inputs are needed. Static program analysis emerged from compiler construction where it is needed for optimizations, but it also has further application areas such as model checking and bug detection.

An analysis technique should be safe and precise. Safe means that each possible behavior of a program is detected by an analysis, i.e. no possible behavior is missed. Precise means that no impossible behavior of a program is detected by an analysis, e.g. an imprecise analysis might infer that a variable can take values that it never takes in practice. While in most cases one is interested in a safe analysis, there is usually a trade-off between precision and scalability.

For example, compiler optimizations transform a program so that the transformed program meets a performance goal, which is mostly faster execution. For these transformations, it is important that they do not change the behavior of the program, i.e. that they are semantics preserving. Program analysis techniques are used to prove that specific transformations do not alter the behavior of the program. In this case, the analysis must be safe because the transformation should preserve the semantics for all possible executions. On the other hand, the analysis does not have to be fully precise because it is tolerable that not all possible optimizations are detected.

Many problems can be solved efficiently by static analysis techniques. For some problems, however, static analysis suffers from combinatorial explosion. This means that the number of possible states which have to be considered becomes too large, so that a solution cannot be computed efficiently.

2.4.2 Dynamic Program Analysis

Dynamic program analysis aims to overcome the problem of combinatorial explosion by observing concrete program runs. Typical application areas are bug detection and localization, profiling for speed and memory consumption, or the detection of memory leaks.

In contrast to static analysis, the results of dynamic analysis are only guaranteed to hold for the observed runs. Therefore, its results also depend on the quality of the inputs, i.e. whether they thoroughly exercise the program and trigger a specific behavior of the program. Furthermore, the results of dynamic program analysis are always precise as they correspond to observed behavior. The results, however, are not safe because for all non-trivial programs, it is impossible to exercise them exhaustively. Therefore, a possible behavior can be missed. As a consequence the results are likely to hold, or they are partial results. Likely results are results which have been obtained from a set of runs and might also hold for all runs. Partial results reflect the parts of the obtainable result that could be inferred from the given runs, while other aspects are missed.

Technically, dynamic analysis can be applied in two ways: the program can be instrumented, or it can be run in a special interpreter. The instrumentation approach inserts additional code into the program that allows observing it, and then the instrumented program is run in its original environment. The interpreter approach runs the program in a different environment that simulates the original environment but also allows observing different parts of the program. The advantages of the instrumentation approach are that it is less intrusive and generally faster than the interpretive approach. Furthermore, for the interpreter approach some effort is needed to make sure that the original environment is simulated correctly. The advantages of the interpretive approach are that more aspects of the execution can be observed, and it provides more control over the execution. For example, it can observe program internals that cannot be accessed with the instrumentation approach.
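As a sketch of the instrumentation approach (hypothetical source-level code; practical tools typically rewrite bytecode rather than source code), an instrumented version of a method contains additional calls that record observation events while leaving the computed result unchanged:

public class Instrumented {

    // Original method:
    //   static int max(int a, int b) { return a > b ? a : b; }
    //
    // Instrumented version: Tracer.event() calls are inserted so that the
    // executed branches can be observed at runtime.
    static int max(int a, int b) {
        Tracer.event("max:entry");
        int maxVal;
        if (a > b) {
            Tracer.event("max:then");
            maxVal = a;
        } else {
            Tracer.event("max:else");
            maxVal = b;
        }
        Tracer.event("max:exit");
        return maxVal;
    }

    // Minimal tracer, defined here only to keep the sketch self-contained.
    static class Tracer {
        static void event(String id) {
            System.out.println(id);   // a real tool would write to a trace file
        }
    }
}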

For the analysis part there are two options as well: the data can be analyzed at runtime, or offline after the execution. The runtime techniques analyze the data right when it is observed and can present the result during or at the end of the execution. The offline techniques are split into a tracing part and a processing part, i.e. the observed data is first streamed to disk, and the analysis then takes place in a separate step after the program run has finished.

2.4.3 Execution Trace

Most dynamic analyses capture the program behavior in the form of an execution trace. This is a sequence of observed events during a program run where a single event represents a step of the execution.

Definition 9 (Execution Trace) An execution trace T is a series of execution events T = ⟨e1, ..., en⟩, where an execution event e is a tuple of attributes.

Depending on the analysis technique, different events and attributes are of interest. The attributes of an execution event may include the time at which an event took place and information about the code and the data that was involved. The time attribute can either be a timestamp of the absolute time or the time passed since the start of the program. The attributes referring to the code might consist of the statements that were executed, and the thread that executed them. This allows tracking the control flow of an execution. The attributes referring to the data can be the results of intermediate calculations, variables and values that are read or written, values and locations that arise from interaction with the main memory or disk, or from communication with the environment, e.g. input devices or network.
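A minimal sketch of such an event as a JAVA class (the specific attributes are chosen only for illustration; an actual tracer would record whichever attributes its analysis needs, and an execution trace is then simply an ordered list of such events):

public class ExecutionEvent {

    // Time attribute: nanoseconds since the start of the program run.
    final long timestampNanos;

    // Code attributes: the executed statement and the executing thread.
    final String statementId;     // e.g. "MathUtil.java:42"
    final String threadName;

    // Data attribute: a textual representation of the value that was
    // read or written, if any.
    final String observedValue;

    ExecutionEvent(long timestampNanos, String statementId,
                   String threadName, String observedValue) {
        this.timestampNanos = timestampNanos;
        this.statementId = statementId;
        this.threadName = threadName;
        this.observedValue = observedValue;
    }
}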

The tracing of all possible events and attributes is limited by the sheer amount of data which can be traced, e.g. a few seconds of execution can produce several GB of data. Therefore, several techniques can be used to reduce the traced data. For example, only parts of the program or a subset of events and attributes are traced, a statistical sampling method can be applied, e.g. only every 100th event is traced, or an abstraction over the data is used, e.g. for every variable only the observed minimum and maximum value is stored.

2.4.4 Program Analysis Techniques

In order to address different problems, several program analysis techniques have been proposed. Depending on their application area, they are concerned with different aspects of the data and control flow of a program. In the following sections, we present program analysis techniques; mainly dynamic techniques are presented in more detail, because they are used in later parts of this work, and we conclude with a sample of related program analysis techniques.

2.4.5 Coverage Metrics

Coverage metrics can also be seen as a form of dynamic analysis that measures the thoroughness of execution by a test suite. Determining whether a set of tests fulfills a coverage criterion, or computing the coverage level, requires running and tracing the program. Depending on the type of coverage metric, different events and attributes have to be traced. For example, one way to determine statement coverage is to trace the first execution of each statement. For more complex metrics like all du pairs coverage, every read and write of a variable has to be traced.

Static analysis techniques have been successfully used to generate test inputs that satisfy coverage criteria [8, 99]. Recent approaches also combine static and dynamic analysis for this purpose [91, 96].

2.4.6 Program Slicing

Program slicing was introduced by Weiser [102, 103] as a technique that determines the set of statements that potentially influence the variables used in a given location.

Weiser claims that this technique corresponds to the mental abstractions programmers make when they debug a program.

A static backward slice is computed from a slicing criterion which consists of a statement and a subset of the variables used in the program. A slice for a given slicing criterion is computed by transitively following all data and control dependencies for the variables from the statement, where data and control dependencies are defined as follows:

Definition 10 (Data Dependency) A statement s is data dependent on a statement t if there is a variable v that is defined (written) in t and referred to (read) in s, and there is at least one execution path from t to s without a redefinition of v.

Definition 11 (Control Dependency) A statement s is control dependent on a statement t if t is a conditional statement and the execution of s depends on t.

Definition 12 (Backward Slice) A backward slice for a slicing criterion (s,V) is the transitive closure over all data and control dependencies for a set of variables V at statement s.

Korel and Laski [47] refined this concept and introduced dynamic slicing. In contrast to the static slice as proposed by Weiser, the dynamic slice only consists of the statements that actually influenced the variables used in a specific occurrence of a statement in a specific program run. This means that only those data and control dependencies are considered which actually emerged during a specific run of the program.

A dynamic slicing criterion specifies, in addition to the static slicing criterion, the input to the program and distinguishes between different occurrences of a statement. A dynamic backward slice is then computed by transitively following all dynamic dependencies for a set of variables, which are used in a specific occurrence of a statement in a program run that was obtained by using the specified input.

Definition 13 (Dynamic Data Dependency) A statement s is dynamically data dependent on a statement t for a run r, if there is a variable v that is defined (written) in t and referred to (read) in s during the program run without an intermediate redefinition of v.

Definition 14 (Dynamic Control Dependency) A statement s is dynamically control dependent on a statement t for a run r, if s is executed during the run and t is a conditional statement that controls the execution of s.

Definition 15 (Dynamic Backward Slice) A dynamic backward slice for a dynamic slicing criterion (s_o, V, I) is the transitive closure over all dynamic data and control dependencies for the variables V, in a run with input I, at the o-th occurrence of statement s.

 1. static int max(int a, int b) {
 2.     int maxVal;
 3.     countCalls++;
 4.     if (a > b) {
 5.         maxVal = a;
 6.     } else {
 7.         maxVal = b;
 8.     }
 9.     return maxVal;
10. }

11. int x = max(5, 4);


Figure 2.1: Static data and control dependencies for the max() method.

In contrast to dynamic slices, static slices take into account all possible dependencies. Therefore, static slices tend to include more statements than dynamic slices.

The code shown in Figure 2.1, for example, displays a method that computes the maximum of two integers, and a statement that calls this method and assigns the result to a variable. Solid arrows show the data dependencies, whereas dashed arrows display the control dependencies. The static backward slice from the last statement consists of statements {1,4,5,7,9,11}. The increment of countCalls (statement 3), for example, is not included in the slice, as there is neither a transitive static control nor data dependency from the variables used in the last statement to countCalls. Note that there are different definitions of which statements should be included in a slice. While we only consider those statements where an actual control or data dependency exists, there are also definitions that require the slice to be executable. Thus, some statements will never be included in a slice because no dependencies exist for them; statement 2 in the example, which just declares a variable, is such a case.

 1. static int max(int a, int b) {
 2.     int maxVal;
 3.     countCalls++;
 4.     if (a > b) {
 5.         maxVal = a;
 6.     } else {
 7.         maxVal = b;
 8.     }
 9.     return maxVal;
10. }

11. int x = max(5, 4);


Figure 2.2: Dynamic data and control dependencies for the max() method.

Figure 2.2 shows the dynamic data and control dependencies, i.e. the dependencies that actually emerge during the program run. The dynamic slice consists of the statements {1,4,5,9,11}. In contrast to the static slice, the statement in the else part of the max() method (statement 7) is excluded, since this code is not exercised by this run, and consequently, its control and data dependencies are not present in the trace.

2.4.7 Invariants

Most program analysis techniques are concerned with the control flow or the data flow. Techniques that are concerned with the actual values used in a program are rare. One abstraction that considers actual values is the concept of program invariants. Invariants characterize a program by specifying properties, mostly about variables, that always hold at specific program points.

Definition 16 (Invariants) Invariants are properties of the program that are true at specific program points.

For example, an invariant for the square() method given in Figure 2.3 is that the return value is always greater than or equal to zero.

public int square(int x) {
    return x * x;
}

Figure 2.3: A method that computes the square of an integer.

Invariants help in understanding programs, because they give details that might not be obvious from the source code itself, e.g. that a parameter only takes specific values. Furthermore, invariants can help to avoid introducing bugs if they are explicitly stated. For example, if a parameter is only allowed to take specific values, the use of wrong values can be prevented by an explicitly stated invariant. Although programmers often have invariants in mind when writing programs, they are rarely explicitly stated. Thus, in most cases they have to be inferred from the program.
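As a small illustration of stating an invariant explicitly, the following hypothetical method (not taken from any subject program) documents and enforces the allowed parameter range with a JAVA assertion; when assertions are enabled, a call with an out-of-range value fails immediately instead of silently corrupting later computations.

// Sketch: an explicitly stated invariant on the parameter value.
static int setPercentage(int value) {
    assert value >= 0 && value <= 100 : "percentage out of range: " + value;
    return value;
}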

Static invariants that hold for all possible inputs can be inferred with static analysis techniques. Although static analysis techniques can deduce useful invariants, they are also limited by the state explosion problem. Thus, dynamic invariants have been proposed; they hold for a specific set of runs and can be inferred via dynamic analysis by observing one or more program runs. Thereby, the problem of state explosion is avoided. The DAIKON tool by Ernst et al. [23] pioneered the idea of dynamic invariant detection. The dynamic invariants inferred by DAIKON represent likely invariants, i.e. some of them correspond to static invariants while others are artifacts of the input data. For example, if the code in Figure 2.3 were only executed with positive values, two dynamic invariants might be inferred: (1) that the parameter is always positive, which is an artifact of the input, and (2) that the returned value is always greater than or equal to zero, which corresponds to the static invariant.
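The following sketch illustrates, in a drastically simplified form, how a dynamic invariant such as "the return value of square() is non-negative" can be inferred by observing runs; it only illustrates the idea and does not reflect how DAIKON is actually implemented.

import java.util.ArrayList;
import java.util.List;

// Sketch: observe return values over several runs and report a candidate invariant.
class ReturnValueObserver {
    public static int square(int x) { return x * x; }

    public static void main(String[] args) {
        int[] inputs = {1, 3, 7, 12};          // only positive inputs, as in the text
        List<Integer> returned = new ArrayList<>();
        for (int x : inputs) {
            returned.add(square(x));
        }
        // The invariant is only "likely": it holds for the observed runs.
        boolean nonNegative = returned.stream().allMatch(v -> v >= 0);
        System.out.println("likely invariant 'return >= 0': " + nonNegative);
    }
}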

2.4.8 Related Work

Several static and dynamic analysis techniques have been used in different application areas. Among others, they have been successfully applied to assist in program understanding and debugging, to find and locate defects, to automatically generate test inputs, and to detect memory management problems.

Reps et al. [84] presented path spectra, a dynamic technique that characterizes a run of a program, which aims at helping in testing and debugging. A path spectrum for a program run is the distribution of loop-free paths that were exercised. The Year 2000 problem is presented as an example application for path spectra. Differences in spectra between execution with pre-2000 data and post-2000 data were used to identify computations that are date dependent.

The Delta Debugging algorithm as proposed by Zeller et al. [106] is a dynamic analysis technique that narrows down defect causes. It either creates a minimal test case for a single failing test case (ddmin), or it narrows down the difference between a passing and a failing test case (dd). In contrast to other dynamic analysis techniques, Delta Debugging also generates new executions. To this end, the Delta Debugging algorithm starts with a failing test case and systematically applies changes (deltas) to it, either by removing parts or copying parts from a passing test case, until a minimal test case is produced, i.e. removing or replacing a single entity from the test case would make the failure disappear.
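The following is a simplified sketch of the minimization idea behind ddmin, reduced to testing complements of subsets; the interface (a predicate that reports whether a candidate input still triggers the failure) is an assumption for this illustration and does not correspond to a specific Delta Debugging implementation.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch: minimize a failing input by removing chunks as long as the failure persists.
class SimplifiedDDMin {
    static <E> List<E> minimize(List<E> failingInput, Predicate<List<E>> failsOn) {
        List<E> current = new ArrayList<>(failingInput);
        int granularity = 2;
        while (current.size() >= 2) {
            granularity = Math.min(granularity, current.size());
            List<List<E>> parts = split(current, granularity);
            boolean reduced = false;
            for (int i = 0; i < parts.size() && !reduced; i++) {
                List<E> complement = new ArrayList<>();
                for (int j = 0; j < parts.size(); j++) {
                    if (j != i) complement.addAll(parts.get(j));
                }
                if (failsOn.test(complement)) {    // failure persists without parts.get(i)
                    current = complement;
                    granularity = Math.max(granularity - 1, 2);
                    reduced = true;
                }
            }
            if (!reduced) {
                if (granularity == current.size()) break;  // cannot split any further
                granularity = Math.min(granularity * 2, current.size());
            }
        }
        return current;
    }

    private static <E> List<List<E>> split(List<E> list, int n) {
        List<List<E>> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < n; i++) {
            int end = start + (list.size() - start) / (n - i);
            parts.add(new ArrayList<>(list.subList(start, end)));
            start = end;
        }
        return parts;
    }
}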

Another family of dynamic analysis techniques uses shadow values to observe the program behavior. A shadow value for every memory location and every register is introduced to keep track of additional information. This analysis is used to detect boundary values, to detect memory leaks, and for taint analysis, which tracks the flow of information from possibly untrusted sources through the program. For example, the Valgrind tool presented by Nethercote and Seward [62] uses shadow values to detect memory management problems for C and C++ programs. Valgrind is a framework for dynamic binary instrumentation, and the Memcheck tool of Valgrind can detect illegal access to memory, uses of uninitialized values, memory leaks, and bad frees of heap blocks. For this purpose, it uses shadow values to keep track of pointers to the memory and whether the memory is allocated and initialized.

Several approaches use static analysis or combined techniques to generate test inputs. The Korat framework for automated testing of JAVA programs introduced by Boyapati et al. [8] automatically generates all test inputs within given bounds. To this end, Korat translates the preconditions of a method to a JAVA predicate that either returns true or false. Further on, it systematically generates non-isomorphic inputs that exhaustively explore the bounded input space. Visser et al. [99] presented an approach that uses JAVA Path Finder to generate test inputs to achieve structural coverage by using symbolic execution and model checking. In order to generate test inputs that reach full branch coverage, the program is executed symbolically. Symbolic execution uses symbolic values instead of real data, and a path condition is used to keep track of how to reach a program point. To this end, the program is run with symbolic inputs, and during program execution, these symbolic inputs are used instead of concrete values. Thereby, symbolic formulas over the input are obtained. These formulas are used in the path condition in order to get constraints on the input that need to be satisfied to reach a specific program point. The constraints can then be fed to a constraint solver to generate concrete test inputs. Sometimes the constraints to reach a program point get too complex to be solved within reasonable time. Thus, several approaches extend symbolic execution with dynamic analysis [30, 91, 96]. In these approaches, also called concolic testing, the program is executed with real inputs, and constraints are collected that need to be satisfied to take a different branch. Solving these constraints gives a new input that is then used to repeat the process. By using this approach, less complex constraints have to be solved. Thus, even more paths can be reached than via static analysis alone.
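To make the notion of a path condition concrete, consider the following illustrative method (it is not taken from the cited approaches); the comments give the constraints over the symbolic inputs A and B that a solver would have to satisfy to drive execution along each path.

// Sketch: path conditions collected by symbolic execution of a small method.
class PathConditionExample {
    static int classify(int a, int b) {      // symbolic inputs: A, B
        if (a > b) {                         // true branch: A > B
            if (a > 10) {                    // true branch: A > 10
                return 2;                    // path condition: A > B  and  A > 10
            }
            return 1;                        // path condition: A > B  and  A <= 10
        }
        return 0;                            // path condition: A <= B
    }
}
// Solving, e.g., "A > B and A <= 10" yields concrete inputs such as a = 5, b = 3
// that reach the statement returning 1.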

2.5 Summary

In this chapter, we described the different levels at which software can be tested, and we focused on testing at the unit level. Unit tests are carried out via automated test cases, which are small programs that exercise the program and check the result via test oracles. A test can either pass or fail. A failing test indicates that the program behaved in an unexpected way. When developing tests for a program, one is interested in the quality of the tests. Coverage criteria are a method to assess and improve the quality of test suites. They are inspired by different types of defects. The idea is that tests satisfying a specific coverage criterion also detect defects of a specific type. Thus, several coverage criteria have been introduced that impose different requirements on the tests. Control flow criteria measure how well the tests exercise control flow structures of the program. Data flow criteria are concerned with the relation between definitions and uses of variables, and logic coverage criteria with the evaluation of logical predicates. These coverage metrics, however, only measure how well the tests exercise the program. They do not measure how well the results of the execution are checked.

In the second part of this chapter, program analysis techniques were presented. Program analysis is concerned with investigating and gaining insight about different aspects of the program behavior and comes in the form of static and dynamic techniques. Static program analysis reasons about the program without executing it while dynamic analysis observes concrete program runs. Some application areas of program analysis techniques include detecting defects, minimizing defect causes, detecting memory leaks, and generating test inputs. Two techniques, program slicing and dynamic invariants, were presented in more detail because further techniques presented in this work rely on them. Program slicing is a technique that determines the statements that influenced the variables used at a specific point. Dynamic invariants are properties that hold at a specific program point for a set of executions.

Chapter 3

Mutation Testing

Traditional coverage metrics that are used to assess the quality of tests at the unit level only measure how well the test inputs exercise specific structures of the code. They do not gauge the quality of the checks that are used to detect defects.

Mutation testing is a technique to assess and improve the quality of software tests in terms of coverage and checks. A program gets mutated by seeding artificial defects (mutations). Such a mutated program is called a mutant. Then, it is checked whether the test suite detects the mutations. A mutation is considered to be detected when at least one test that passes on the original (not mutated) version of a program fails on the mutant. In such cases it is also said that the mutant is killed, and an undetected mutant is called a live mutant. The defects that are introduced are provided by mutation operators. These are well-defined rules that describe how to change elements of a program and aim at mimicking typical errors that programmers make. Usually, a mutation operator can be applied to a program at multiple locations, each leading to a new mutant.
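As a simple illustration (the method is hypothetical, not taken from a subject program), the following shows an original method and a mutant produced by negating its condition; a test such as assertTrue(isAdult(18)) passes on the original version and fails on the mutant, and therefore detects (kills) this mutant.

// Original version
static boolean isAdult(int age) {
    return age >= 18;
}

// Mutant: the comparison is negated
static boolean isAdult(int age) {
    return age < 18;
}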

More formally, mutation testing can be defined as modifications to a ground string, i.e. the program that gets mutated.

Definition 17 (Ground String) A ground string S is a string that belongs to a language L, which is specified by a grammar G.


A mutation operator then becomes a function that modifies the ground string, and a mutant is the result of applying a mutation operator.¹ A mutant is detected if a test suite contains a test that has a different test result (as introduced in Definition 3) for the mutant than it has for the original version.

Definition 18 (Mutation Operator) A mutation operator is a function MOP that takes a ground string S from a language L and a location l inside the ground string as input and produces a syntactic variation of the ground string.

S ≠ MOP(S, l)

Definition 19 (Mutant) A mutant S_M is the result of a mutation operator's application to a ground string at a specified location l.

S_M = MOP(S, l)

Definition 20 (Mutant Detection) A mutant S_M is detected by a test suite T if it contains a test t that has a different result for the original than for the mutant:

∃ t ∈ T : t(S_M) ≠ t(S)

A mutation operator, however, cannot be applied to every location in the ground string. Some mutation operators, for example, only modify specific syntactic elements, and consequently, they can only be applied if this element is present. Thus, for each mutation operator, there is a fixed set of locations in a program to which it can be applied.

Definition 21 (Mutation Possibilities) For a mutation operator MOP and a ground string S that belongs to a language L, the set of mutation possibilities L_MOP is the set of locations for which the mutation operator can be applied so that the result is a string that belongs to the language.

L_MOP = { l | MOP(S, l) ∈ L }

¹ Although the term mutation refers to the actual change that is made to the program and the term mutant refers to the mutated program, they are sometimes used interchangeably in this work.

The results of the mutation testing process can be summarized in the mutation score, which provides a quantitative measure of a test suite's quality. The mutation score is defined as the number of detected mutants divided by the total number of mutants. To measure the mutation score, a set of mutation operators has to be applied exhaustively, i.e. all mutation operators are applied to all possible locations.

Definition 22 (Set of all Mutants) For a given ground string S and a set of mutation operators O = {MOP_1, ..., MOP_n}, the set of all possible mutants is the set of all valid applications of the mutation operators to the ground string.

M_all(S, O) = { MOP_i(S, l) | MOP_i ∈ O, l ∈ L_MOP_i }
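For illustration, the mutation score described above can be written in this notation as the fraction of all mutants that are detected by a test suite T:

score(T, S, O) = |{ S_M ∈ M_all(S, O) | ∃ t ∈ T : t(S_M) ≠ t(S) }| / |M_all(S, O)|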

Usually, only a single mutation is applied to a program at a time. However, we can also apply multiple mutations at once. Higher order mutants are a combination of multiple mutants. Mutants that are a combination of n (n ∈ ℕ⁺) regular mutants are called n-order mutants.

Definition 23 (Higher Order Mutants) An n-order mutant S_Mn is the result of applying n different mutations to a ground string S.

S_Mn = MOP_n(MOP_{n-1}(... MOP_1(S, l_1) ..., l_{n-1}), l_n)

In terms of test requirements, as defined in Section 2.3, mutation coverage requires each mutant to be detected.

Definition 24 (Mutation Coverage) For a ground string S, a set of mutation operators O, and the resulting set of all mutants M_all(S, O), mutation coverage requires every mutant to be detected.

With this definition of mutation coverage, the coverage level as introduced in Definition 7 corresponds to the mutation score.

Furthermore, undetected mutants can be given as qualitative feedback to the developer on how to improve a test suite, i.e. by adding tests or modifying existing tests so that previously undetected mutants get detected. This scenario is often assumed by mutation testing tools. The process of mutation testing and writing tests can be repeated until an adequacy criterion is reached, e.g. the mutation score is above a specific value.

It is important to note that mutation testing does not test the software directly; it rather tests the tests of the software and helps to improve them. The assumption is that tests that detect more mutations will also detect more potential defects. Thus, they help to improve the quality of the software. This is expressed in the competent programmer hypothesis and the coupling effect, which are the two underlying hypotheses of mutation testing and are explained in detail in the following sections.

3.1 Underlying Hypotheses

Mutation testing promises to assist in generating tests that detect real defects. The number of possible defects, however, is unlimited. Mutation testing, on the other hand, only focuses on a subset of errors: those that can be introduced via specific small changes. Therefore, two fundamental assumptions were made when mutation testing was introduced by DeMillo et al. [18]. The first assumption is called the competent programmer hypothesis which states that programmers create programs that deviate from the correct one only by small errors. The second assumption is called the coupling effect which states that complex errors are coupled to simple errors.

3.1.1 Competent Programmer Hypothesis

The competent programmer hypothesis claims that programmers have a sufficient idea of a correct program. They tend to develop programs that are close to the correct one, which means that their programs only deviate from the correct version by small changes. Mutations, which are in fact small changes, try to simulate those defects. Although it is obvious that not all defects can be fixed via small changes, a huge fraction of the defects can be. In a study on the bug fixes of seven JAVA open-source projects, Pan et al. [78] showed that 45.7% to 63.3% of the fixes can be mapped to simple patterns. The relation between defects that require more complex fixes and mutations is expressed through the coupling effect.

3.1.2 Coupling Effect

The coupling effect puts simple defects in relation to complex ones and was originally stated by DeMillo et al. [18] as follows:

Test data that distinguishes all programs differing from a correct one by only simple errors is so sensitive that it also implicitly distinguishes more complex errors.

As simple defects are simulated by mutations, the coupling effect implies that tests that detect simple mutations do also detect more complex defects.

Offutt [66, 67] later provided evidence for the coupling effect with a study on higher order mutants. He investigated whether tests that detect all simple mutants do also detect higher order mutants. The results indicated that tests that detect simple mutants also detect 99% of the second and third order mutants.

3.2 Costs of Mutation Testing

Two factors have hindered the practical application of mutation testing. The first factor is the cost needed to produce the mutants and to execute the tests for them. Depending on the number of mutants and the time needed to run the test suite, mutation testing can easily take several hours, as the test suite has to be executed for every mutant. For example, to execute 10,000 mutants for a program with a test suite that needs 10 seconds to execute, more than one day of computing time is needed. Furthermore, the number of mutations is influenced by the choice of the mutation operators: the more mutation operators are used, the more mutations are produced. This number grows linearly in the number of statements and linearly in the number of mutation operators. For example, applying 108 mutation operators for the tcas program of the Siemens suite [19], which contains 137 lines of code, produces 4,937 mutations [107]. Therefore, several optimization techniques have been developed to speed up mutation testing (see Section 3.3).
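As a quick check of this figure:

10,000 mutants × 10 s per test-suite run = 100,000 s ≈ 27.8 hours ≈ 1.2 days of sequential computation.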

The second factor is the cost introduced by equivalent mutants. Equivalent mutants differ in the syntax of a program but do not change its semantics—that is, the mutated program produces the same output as the original. These equivalent mutants cannot be detected by a test because there is no output that would allow a test to distinguish them.

Definition 25 (Equivalent Mutant) An equivalent mutant S_M^E of a ground string S is a mutant that does not change the observable behavior of the program, so that all possible tests produce the same result on the original and mutated version.

∀ t : t(S) = t(S_M^E)

Equivalent mutants impose a burden on programmers who try to interpret the results from mutation testing as they first have to decide on the equivalence of a mutant before accomplishing their intended task, which is improving the test suite. Furthermore, it was reported that programmers judge mutant equivalence correctly only in about 80% [71] of the cases. Therefore, equivalent mutations degrade the usefulness of the results of mutation testing.

3.3 Optimizations

As explained earlier, it takes much time to execute all mutations. Thus, several techniques have been proposed to speed up mutation testing. Offutt and Untch [73] divide these techniques into three different categories: do fewer, do smarter, and do faster. The do fewer approaches aim to reduce the number of mutants that have to be executed. The do smarter approaches aim to reduce the computational effort by retaining state information between runs or by avoiding specific executions. The do faster approaches aim to generate and execute the mutants as fast as possible. In the following sections, we will present several optimizations from these categories. Mutation reduction techniques such as mutant sampling and selective mutation are do fewer techniques. Weak mutation is an example for a do smarter technique. Mutant schema generation, the use of coverage data, and parallelization fall in the group of the do faster techniques.

3.3.1 Mutation Reduction Techniques

As discussed above, one cost factor of mutation testing is the huge number of mutations that can be applied. Thus, several techniques have been developed that try to reduce the number of mutants that need to be executed. The goal of these approaches is to approximate the results (e.g. the mutation score) of applying all possible mutations as precisely as possible while using as few mutants as possible. There are two major reduction techniques: mutant sampling, which randomly selects a fraction of all mutations, and selective mutation, which uses a subset of the mutation operators.

3.3.2 Mutant Sampling

Mutant sampling reduces the number of mutants by randomly choosing a subset out of all mutants. The idea was first proposed by Acree and Budd [73, 44], and later Wong and Mathur studied this technique by comparing random selections of different sizes. Their results suggest that test suites that are created to detect a random selection of 10% of the mutants also detect around 97% of all mutants. Zhang et al. [107] proposed two round random selection, a technique that first randomly selects the mutation operator and then a concrete mutant for this operator. With this technique, it is equally likely to choose a mutant produced by an operator with only a few mutants as one produced by an operator with many. Their results suggest that this selection technique is comparable to normal random selection, but rarely brings any benefits.
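A random sample of mutants is straightforward to draw; the following sketch (with assumed names, independent of any particular tool) selects a fixed fraction of all mutants:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch: mutant sampling by random selection of a fraction of all mutants.
class MutantSampler {
    static <M> List<M> sample(List<M> allMutants, double fraction, long seed) {
        List<M> shuffled = new ArrayList<>(allMutants);
        Collections.shuffle(shuffled, new Random(seed));
        int n = (int) Math.ceil(shuffled.size() * fraction);   // e.g. fraction = 0.10
        return shuffled.subList(0, n);
    }
}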

3.3.3 Selective Mutation

Selective mutation tries to reduce the cost of mutation testing by reducing the number of mutations applied to a subject, while the reduced set of mutants should lead to results similar to applying all mutants. The number of mutations is usually reduced by focusing on a few mutation operators. Offutt et al. [69] showed in an empirical study on 10 Fortran programs that 5 out of 22 mutation operators are sufficient: test suites that detect all selected mutants also detect 99.5% of all mutations. Barbosa et al. [6] did a similar study for C programs. They proposed 6 guidelines for selecting the mutation operators. In an experiment on 27 programs, they determined 10 out of 39 operators that produce a sufficient set with a precision of 99.6%. A slightly different approach was taken by Namin et al. [61]. They tried to produce a smaller set of mutants that could be used to approximate the mutation score for all mutants. Using statistical methods, they came up with a linear model that generates less than 8% of all possible mutants, but accurately predicts the effectiveness of the test suite for the full set of mutants. In a recent study, Zhang et al. [107] compared these 3 different operator-based selection techniques to random selection techniques. They showed that all techniques performed comparably well, and that random selection can even outperform the more sophisticated operator-based selection schemes.

3.3.4 Weak Mutation

Weak mutation testing, as proposed by Howden [40], assesses the effect of a mutation by examining the state after its execution. If the state is different, then the mutant is considered to be detected. Weak mutation testing thus checks whether the tests could possibly detect a mutation: a state change might result in a different result, while no change indicates that the result of the mutated run will be the same as for the original run. In contrast to regular strong mutation testing, it does not matter whether the tests actually pass or not.

Weak mutation can save execution costs because it is not necessary to continue a run once a mutation is reached and a difference in the state was detected. If no difference is detected, the execution has to be continued because the mutated statement might be executed again. Additional costs, however, are introduced by capturing the state right after the mutation is executed, which can happen many times during one execution of the test suite.

Note that weak mutation produces different results than mutation testing, and it only assesses the quality of the test inputs, i.e. whether they are chosen in such a way that the mutation is triggered and causes a state change. Furthermore, the benefits of this technique are questionable: it has not been shown that the savings from limiting the execution outweigh the cost of capturing the state changes.

3.3.5 Mutation Schemata

The traditional way of applying several mutations to a program is to mutate the source code of the program and produce a new version for each mutation. Then, each mutated version is compiled, and the tests are executed for each compiled version. In order to speed up this process, Untch et al. [98] proposed to use mutant schema generation (MSG). Out of the original program, the MSG approach generates a metamutant. For every statement that can be mutated, the metamutant contains a metamutation which can be instantiated at runtime so that it represents a specific mutation. This approach reduces the costs for compiling the program as only one version needs to be compiled instead of one version for each mutation. Furthermore, the setup cost for running the tests can be reduced, as mutations can be switched on and off at runtime, and the setup of the test environment only occurs once instead of once per mutant.
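The following sketch illustrates the effect of mutant schema generation at the source level (actual tools typically work on an intermediate representation, and the names used here are assumptions for this example): both the original expression and its mutation are compiled once, and a flag evaluated at runtime decides which variant is active.

// Sketch: a "metamutant" in which mutation 1 can be switched on at runtime.
class MaxMetamutant {
    // the active mutant id would be supplied by the mutation testing framework
    static final int ACTIVE_MUTANT = Integer.getInteger("active.mutant", -1);

    static int max(int a, int b) {
        // original condition: a > b; mutant 1: negated condition
        boolean cond = (ACTIVE_MUTANT == 1) ? (a <= b) : (a > b);
        return cond ? a : b;
    }
}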

3.3.6 Coverage Data

In order to detect a mutation, a test has to exercise the mutated statement and cause the statement to have a direct or indirect influence on the state of the program. This change to the state must propagate to the end of the execution. If one of these conditions cannot be fulfilled, the test cannot detect the mutation. Tests that do not exercise the mutation cannot detect it. Thus, they do not need to be considered. By collecting information on which tests execute which mutations, only a fraction of the whole test suite needs to be executed, which results in a reduction of the execution costs. Although this optimization can be very effective, it is rarely implemented or mentioned in the literature on the topic. CERTITUDE [35] is the only other tool we are aware of that considers coverage. It is a mutation testing tool for integrated circuit designs and checks whether a mutant is activated, i.e. it is executed by at least one test. In the actual testing phase, only the activated mutations are considered. However, the coverage data per test is not used by CERTITUDE.
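A minimal sketch of this bookkeeping is shown below (the data structure and names are illustrative, not the internals of any specific tool): coverage of mutated statements is recorded per test, and later only the recorded tests are run against the corresponding mutant.

import java.util.*;

// Sketch: remember which tests cover which mutation and run only those tests.
class MutationCoverageIndex {
    private final Map<Integer, Set<String>> coveringTests = new HashMap<>();

    void recordCoverage(int mutationId, String testName) {
        coveringTests.computeIfAbsent(mutationId, k -> new HashSet<>()).add(testName);
    }

    Set<String> testsToRunFor(int mutationId) {
        return coveringTests.getOrDefault(mutationId, Collections.emptySet());
    }
}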

3.3.7 Parallelization

The execution of different mutants is independent of each other, and mutation testing has little communication needs, as only the results for each run of the test suite on the mutated version have to be reported. Thus, mutation testing lends itself very well to parallelization. It was shown that by using multiple processors a speed-up that is almost linear in the number of cores can be reached [72].

3.4 Related Work

The idea of mutation testing was first mentioned in a term paper by Richard Lipton in 1971 [73]. However, it took until 1978 before the first major work on mutation testing was published by DeMillo et al. [18]. Their paper also introduced the coupling effect and the competent programmer hypothesis. Since then, many different mutation testing frameworks for different programming languages have been introduced. For example, PIMS [12] for FORTRAN code, MOTHRA [74] for FORTRAN 77, PROTEUM [16] for C, and µJava [55] for JAVA. The idea of mutation testing was also applied to areas other than source code. Budd and Gopal [11] applied mutation testing to specifications in predicate calculus form. They checked whether tests given as pairs of an input and an output detect mutated specification predicates. Spafford [94] introduced the idea of environment mutation which uses mutations to simulate errors that can be caused by the environment of a program, for example, errors caused by memory limitations or overflow errors caused by numeric limitations.

Further information about mutation testing frameworks and uses of mutation testing can be found in the survey papers of Offutt and Untch [73] and Jia and Harman [44].

3.5 Summary

In this chapter, we introduced and defined the concept of mutation testing as a method to measure the quality of a test suite and to aid in improving its quality. During mutation testing, artificial defects (mutations) are inserted into the program, and it is checked whether the test suite detects them, i.e. whether a test that passes on the original program fails on the mutated version. Based on the competent programmer hypothesis and the coupling effect, it is claimed that mutations are similar to real defects and that a test suite that detects mutations will also detect real defects. In contrast to traditional coverage metrics, mutation testing also measures the quality of the checks which come as explicit checks of the program and its test suite, and implicit checks of the runtime system.

Two obstacles prevented the widespread use of mutation testing. First, the execution costs, which arise from the fact that the tests have to be executed for every mutant, and that there is a huge number of mutants for a program. Thus, several optimization techniques have been proposed. In the following Chapter 4, we will present a mutation testing framework that implements many of these optimizations in order to make mutation testing feasible for larger programs. The second factor that prevented a widespread use of mutation testing is the occurrence of equivalent mutants. Equivalent mutants are mutants that cannot be detected by tests because they do not change the observable behavior of the program, although they change the program's syntax. In the following chapters, we will quantify the problem of equivalent mutants (Chapter 5) and introduce techniques to distinguish equivalent from non-equivalent mutants (Chapters 6 and 7).

Chapter 4

The Javalanche Framework

For our research, we were interested in a mutation framework that can handle projects of significant size. Therefore, we developed JAVALANCHE, a mutation testing framework for JAVA with a special focus on automation and efficiency. To this end, we applied several optimizations:

Selective mutation As described previously, the idea of selective mutation is to use a small set of mutation operators that provides a sufficiently accurate approximation of the results obtained by using all possible operators [98]. JAVALANCHE, therefore, uses a slightly modified set of the operators used by Andrews et al. [3], which were also adapted from Offutt [69]. Table 4.1 lists the default operators that JAVALANCHE uses.

Mutant schemata Traditional mutation testing tools produce a new mutated program version for every mutation possibility. For large systems, this can result in several thousand different mutated versions, which are too many to be handled effectively. Furthermore, executing each mutant in a separate process increases the runtime because the environment (JAVA virtual machine, database connections) has to be set up for every mutant. To reduce the number of generated versions, we use mutant schema generation [98]. Mutant schema generation produces a metaprogram that is derived from the program under study and contains multiple mutations. In contrast to the original approach that introduces metamutations, we insert several simple mutations at once. Each mutation is guarded by a conditional statement so that it can be switched on and off at runtime.


Coverage data Not all tests in the test suite execute every mutant. To avoid executing tests that cannot detect a given mutant, we collect coverage information for each test before executing the mutants. When executing mutants, JAVALANCHE only executes those tests that are known to cover the mutated statement.

Manipulate bytecode Traditionally, a mutation is inserted into the source code, and this mutated source code version then gets compiled. JAVALANCHE avoids the costly recompilation step by manipulating the JAVA bytecode directly (using the ASM bytecode manipulation and analysis framework [9]). This is done via the agent mechanism provided by the JAVA platform, which makes it possible to instrument all classes that are loaded by the JAVA virtual machine.

Parallel execution JAVALANCHE can execute several mutants in parallel, thus taking advantage of parallel and distributed computing. To this end, the mutations are split into several subtasks and these subtasks are distributed to several processors.

Automation JAVALANCHE is fully automated. In order to apply mutation testing to a program, only the name of a test suite, the base package name of the project, and the set of classes needed to run the test suite are required.

Handle endless loops In order to handle a huge number of mutations efficiently, a mutation testing tool must be able to detect and abort runs where a mutation causes an endless loop. To this end, JAVALANCHE executes the mutations in separate threads and gives each test a predefined period (the default is 10 seconds) to run. After the timeout is reached, the mutation is disabled, and it is checked whether this causes the program to finish regularly. If the program is still running, the Thread.interrupt() and optionally the Thread.stop() methods are called in order to abort the execution. If JAVALANCHE is not able to stop the run of the tests on a mutation, it shuts down the virtual machine and starts a new one for the next mutation. Although restarting the virtual machine introduces a runtime overhead, this is done because a thread that is stuck in an endless loop consumes computing resources and might influence other runs. If a mutation causes an endless loop, it is considered to be detected because an implicit assumption is that a run of a test case always terminates. (A sketch of such a timeout mechanism follows this list.)

Stop after first fail If one is only interested in whether a mutation is detected or not, there is no need to execute more tests once a mutation is detected by a test. Therefore, JAVALANCHE stops the execution of tests for a mutation once the first test fails. This behavior, however, is configurable because for some applications, it is also beneficial to check the result for all tests that cover a mutation.
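The sketch below shows one way to realize such a timeout with a standard ExecutorService; it illustrates the idea only and is not JAVALANCHE's actual implementation, which additionally disables the mutation and, if necessary, restarts the virtual machine.

import java.util.concurrent.*;

// Sketch: run a test with a timeout and treat a timed-out mutation as detected.
class TimeoutRunner {
    static boolean runWithTimeout(Runnable test, long timeoutSeconds) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(test);
        try {
            future.get(timeoutSeconds, TimeUnit.SECONDS);
            return true;                    // test finished within the time limit
        } catch (TimeoutException e) {
            future.cancel(true);            // interrupts the test thread
            return false;                   // endless loop: mutation counts as detected
        } catch (InterruptedException | ExecutionException e) {
            return false;                   // test failed or was interrupted
        } finally {
            executor.shutdownNow();
        }
    }
}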

Table 4.1: JAVALANCHE mutation operators.

Replace numerical constant
    Replace a numerical constant X by X + 1, X − 1, or 0.

Negate jump condition
    Replace a conditional jump by its counterpart. This is equivalent to negating a conditional statement or subexpression in the source code. (Since composite conditions compile into multiple jump instructions, this also negates individual subconditions.)

Replace arithmetic operator
    Replace an arithmetic operator by another one, e.g. + by −.

Omit method call
    Suppress a call to a method. If the method has a return value, a default value is used instead, e.g. x = Math.random() is replaced by x = 0.0.

4.1 Applying Javalanche

When JAVALANCHE is applied to a project, the mutation testing process is conducted in two steps. First, the project is scanned to extract information that JAVALANCHE later needs to execute the mutations, and then, in a second step, the actual mutation testing is carried out.

However, before applying JAVALANCHE to a project, it must be ensured that the project’s test suite is suitable for mutation testing. This means that all tests in the suite pass and that they are independent of each other, i.e. running the tests multiple times in a different order gives consistent results. To this end, JAVALANCHE provides several commands that run the test suite in different settings and check for these preconditions.

In the scanning step, JAVALANCHE executes the test suite in order to collect the following information about the project:

Mutation possibilities The most important information is the set of all mutation possibilities for a project. A mutation possibility is a statement in the program where a mutation operator can be applied. It is uniquely identified by its location, which consists of the containing class, the operator that produced the mutation, the line number of the mutated statement, the number of the mutation in this line (if the operator can be applied more than once in this line), and an optional additional parameter for the mutation operator. For example, the operator that replaces constants stores the numeric value that is inserted as an additional parameter. During the scan step, all mutation possibilities are stored in a database. (A sketch of such an identifier is shown after this list.)

Coverage data For each mutation possibility, the coverage data is collected so that JAVALANCHE knows which mutation is executed by which tests. To this end, additional statements are inserted into the bytecode that log the execution for each mutation possibility. In JAVA, there is also code that is only executed when a class is loaded by the JAVA Virtual Machine (JVM). The JVM loads a class exactly once when it is first accessed by another class. Thus, the execution of this code depends on which code was executed earlier. This, however, is not under the control of a test. Therefore, JAVALANCHE does not consider these static initializations for the coverage data. To this end, it is made sure that all classes are loaded by executing all tests once before the coverage information is collected. Including static initialization code would lead to more mutations to be covered and more mutations might be detected. However, considering the initialization code for coverage and mutant detection would involve setting up a new environment for each test case that is executed, which would be too costly when executing several thousand mutations. Furthermore, it is not regarded as good practice to rely on the order of class loading for a test.

Detect test classes JAVALANCHE applies several heuristics to detect classes that belong to the test suite. By default, these classes are not mutated. The results, however, are stored in text files and can be modified if JAVALANCHE erroneously classifies a class as part of the tests although it is actually a class of the program that should be mutated.

Impact information JAVALANCHE supports computing the impact of a mutation. The impact is computed by comparing properties between a run of the original version and a run of the mutated version of the program (see Chapter 6 and Chapter 7). To this end, information about an unmodified original run must be collected, which is also done in the scanning step.

Optional information JAVALANCHE supports collecting additional information in this step. Examples include information about inheritance relations between classes, which is used for specific analyses of the mutation testing results, and information about jump targets for experimental mutation operators.
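The following record is a hypothetical illustration of the information that identifies a mutation possibility (the field names are assumptions, not JAVALANCHE's actual database schema):

// Sketch: the pieces of information that uniquely identify a mutation possibility.
record MutationPossibility(
        String className,      // containing class
        String operator,       // mutation operator that produced it
        int lineNumber,        // line of the mutated statement
        int indexInLine,       // n-th possibility of this operator in that line
        String parameter) {    // optional operator parameter, e.g. the inserted constant
}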

In the second step, actual mutations are inserted for the previously collected mutation possibilities. As the mutations are inserted via mutant schema generation, additional checks are introduced for every mutation. The goal of mutant schema generation is to save the set-up time. The checks, though, also introduce some runtime overhead.

Table 4.2: Description of subject programs.

Project name    Description                Version           Program size (LOC)
ASPECTJ         AOP extension to JAVA      cvs: 2010-09-15   28,476
BARBECUE        Bar code creator           svn: 2007-11-26    4,837
COMMONS-LANG    Helper utilities           svn: 2010-08-30   18,452
JAXEN           XPath engine               svn: 2010-06-07   12,438
JODA-TIME       Date and time library      svn: 2010-08-25   26,582
JTOPAS          Parser tools               1.0 (SIR)          2,031
XSTREAM         XML object serialization   svn: 2010-04-17   15,266

Lines of Code (LOC) are non-comment, non-blank lines as reported by sloccount.

For ASPECTJ, we only considered the org.aspectj.ajdt package.

Therefore, a trade-off between the set-up time and the overhead for checking the mutation flags has to be found. To this end, the set of all mutation possibilities is divided into several subtasks (by default, JAVALANCHE creates tasks that contain 400 mutations). These tasks can be processed sequentially, or they can be distributed to different processors.

The results of executing the mutations are stored in a database. A result for a mutation consists of all tests that were executed for this mutation and whether they passed or failed. Additionally, the output of the tests is stored. This feature can be turned off in order to save disk space.

4.2 Subject Programs

The goal of JAVALANCHE is to carry out mutation testing on real-life programs. In order to be suitable for mutation testing with JAVALANCHE, a program has to be written in JAVA (or another language that can be compiled into JAVA bytecode), and it has to come with a JUNIT test suite. To this end, we selected seven open-source projects that fulfill these criteria. These projects are listed in Table 4.2. The subjects come from different application areas (column 2), and for each project, we took a recent version from the revision control system (column 3)—except for JTOPAS, which was taken from the software-artifact infrastructure repository (SIR) [19]. The size of each project (column 4) is measured in lines of code (LOC), which refers to non-comment, non-blank lines as reported by the sloccount tool [105], and does not include test code. The size varies from 2,031 lines for JTOPAS up to 28,476 lines of code for the org.aspectj.ajdt package of ASPECTJ.

Table 4.3: Description of the subject programs' test suites.

Project name    Test code size (LOC)   Number of tests   Test suite runtime (s)   Statement coverage (%)
ASPECTJ          6,830                   339              11                       38
BARBECUE         3,293                   153               3                       32
COMMONS-LANG    29,699                 1,787              33                       88
JAXEN            8,418                   689              11                       78
JODA-TIME       50,738                 2,734              37                       71
JTOPAS           3,185                   128               2                       83
XSTREAM         16,375                 1,113               7                       77

For ASPECTJ, we only considered the tests of the org.aspectj.ajdt package.

Table 4.4: Mutation statistics for the subject programs.

Project name    Number of mutants   Covered mutants (%)   Mutation score (%)   Mutation score (covered) (%)
ASPECTJ         19,236              48.25                 33.11                68.62
BARBECUE        17,629               9.56                  5.68                59.41
COMMONS-LANG    15,715              92.89                 80.13                86.26
JAXEN            9,279              68.73                 48.30                70.28
JODA-TIME       21,647              64.94                 54.04                83.21
JTOPAS           1,678              83.43                 66.98                80.29
XSTREAM          8,215              79.76                 71.26                89.35

Table 4.3 summarizes the test suites of the projects. We removed all tests that failed on the original version from the test suites, as well as tests whose outcome depended on the order of test execution. The size of the test suites ranges from 3,185 lines of code for JTOPAS to 50,738 for JODA-TIME, which in this case is more lines of code than the program that is tested. The total number of test cases for each project is given in column 3. For example, the test suite of JODA-TIME, which has the most lines of code, consists of 2,734 test cases. Running the test suites of the projects (column 4) takes between 2 and 37 seconds. The statement coverage, given in the last column, is the percentage of executable lines that are executed by the tests; it is lowest for BARBECUE with 32% and highest for COMMONS-LANG with 88%.

Table 4.5: JAVALANCHE runtime for the individual steps.

Project name    Scanning step   Mutation testing   Time per mutant
ASPECTJ         2m 01s          2h 29m             0.96s
BARBECUE        1m 04s          7m                 0.25s
COMMONS-LANG    2m 02s          1h 18m             0.32s
JAXEN           1m 39s          1h 39m             0.93s
JODA-TIME       2m 09s          50m                0.21s
JTOPAS          3m 21s          1h 01m             2.61s
XSTREAM         2m 03s          1h 41m             0.92s
All             14m 19s         9h 05m             0.61s

Time reported is the time used to execute the steps ("user time"), measured using time.

4.3 Mutation Testing Results

In order to check whether JAVALANCHE is capable of mutating real-life programs, we applied it to the seven subject programs.

Table 4.4 gives the mutation testing results. The second column shows the total number of mutations, ranging from 1,678 for JTOPAS to 21,647 for JODA-TIME. The coverage rate of mutations, given in the third column, indicates how many of the mutations were actually executed. For most projects, it varies between 60 and 80%. For ASPECTJ, this rate is lower since we only executed the tests for the org.aspectj.ajdt package. The low coverage rate for BARBECUE is due to the fact that most mutations appear in classes that mainly consist of the static initializations of maps for bar code values. As explained in Section 4.1, mutations only executed during class loading are not considered to be covered.

As discussed earlier, the mutation score is the percentage of detected mutations relative to all mutations; it is given in column 4. A low rate of covered mutants implies a low mutation score, because a mutation that is not executed by the test suite cannot be detected by it. Thus, we also present the mutation score for the covered mutations (column 5). ASPECTJ, for example, has a mutation score of 33%. However, when we only consider the covered mutations, it has a score of 69%.

The time needed for mutation testing the projects is given in Table 4.5. The times were measured on a server with 16 Intel Xeon 2.93GHz cores and 24 GB main memory. For the experiments, however, JAVALANCHE was limited to one processor and 2GB memory in order to achieve comparable results and not interfere with other processes.

Furthermore, all optimizations of JAVALANCHE were enabled. The scanning step (column 2) takes between 1 and 2 minutes for all projects. The actual mutation testing step that checks all covered mutations takes between 7 minutes for BARBECUE and 2 hours and 29 minutes for ASPECTJ. As the number of mutations varies between the different projects, we also calculated the average time that is needed for one mutation (last column). The time per mutation ranges from 0.21 seconds for JODA-TIME to 2.61 seconds for JTOPAS. The high number for JTOPAS compared to the other projects is due to several mutations that cause an infinite loop. With the default settings, JAVALANCHE stops every test after 10 seconds. If a project, like JTOPAS, has many mutations that cause an endless loop, this means that each of these mutations takes at least 10 seconds to check, which is far above the average execution time of a mutant.

4.4 Related Work

Since mutation testing was first proposed in the late 1970s [18], several tools for different programming languages have been built. The first tool was presented by Budd et al. [12]. Their pilot mutation system (PIMS) can mutate FORTRAN code. It implements 25 different mutation operators and was built as an interactive tool that asks the user to provide test data for not yet detected mutations.

Later the MOTHRA framework was introduced [74], which was the basis for a lot of research in mutation testing [73, 44]. The MOTHRA framework consists of a set of tools that perform the different tasks in mutation testing. It uses an interpreter approach to apply mutation testing to FORTRAN 77 programs. The advantage of this approach is that the mutations can be inserted by the interpreter. Thus, the overhead for maintaining several mutated versions is avoided. Furthermore, this approach provides access to program internals. This, for example, is used to detect endless loops by counting the number of statements that are executed while executing a mutation and comparing it to an expected number. However, executing a program in an interpreter usually takes significantly more time than running a program in its native environment.

Another widely used mutation testing framework is PROTEUM [16, 6, 17]. It mutates programs written in C, and in contrast to MOTHRA, it produces different mutated versions and runs them in their native environment. PROTEUM supports 108 mutation operators that also include interface mutation operators which aim to assess the quality of integration tests by mutating code that is related to the communication between two modules.

µJava, a mutation testing tool for JAVA, was presented by Ma et al. [55]. To optimize the execution of mutations it implements mutant schema generation and manipulates the bytecode directly. It supports a huge number of traditional mutation operators and introduces object-oriented mutation operators, which mimic errors that arise due to object-oriented features. For example, hiding variable deletion is an operator that deletes each declaration of an overriding or hiding variable. However, in contrast to JAVALANCHE, the µJava tool provides little support for automation as it requires user input to apply the mutations and run the tests.

4.4.1 Further Uses of Javalanche

Besides our uses of JAVALANCHE that will be presented later in this work, it has also been used by other researchers.

Gligoric et al. [29] extended JAVALANCHE for efficient mutation testing of multithreaded code. Multithreaded code adds a cost factor to mutation testing because different thread schedules have to be considered. Their MUTMUT tool (MUtation Testing of MUltiThreaded code) makes it possible to explore different thread schedules for a mutation. By recording the first state on an execution path that reaches a mutation, MUTMUT can prune schedules that do not reach a mutation. Hence, it saves execution time. Compared to a basic technique that only prunes tests that can never reach a mutant, MUTMUT achieves a performance improvement of up to 77%.

Fraser and Zeller [28] used JAVALANCHE for their approach to automatically generate test cases with oracles. Their µTEST tool applies a genetic algorithm to create tests that detect mutations. To this end, µTEST uses a fitness function that assesses the input quality based on whether it reaches a mutation and whether it impacts the execution of a mutation (see the following Chapters 6 and 7 for impact measures). When an input was generated that executes the mutation and causes it to have an impact on the execution, µTEST tries to generate assertions that detect the mutation. To this end, the original and mutated versions are run with the generated inputs, and the runs are analyzed for differences in primitive values and objects. If such a difference is found, an assertion that checks for this property is introduced. In a last step, a heuristic is used to find a minimal set of assertions that still detects all detectable mutations. The evaluation on two open-source libraries showed that µTEST produces test suites that detect significantly more mutations than manually written test suites.

Aaltonen et al. [1] used JAVALANCHE to assess the quality of test suites generated by students. In the course of an introductory programming class, students had to create a test suite for their own implementation. The students were rewarded when their test suites reached a high statement coverage level. However, it was observed that many of them fooled the system. They created weak test suites that only reached a high coverage level, i.e. their test suites contained weak or nonsensical checks. Thus, there was an interest in a technique that assesses the test quality more thoroughly. Experiments using mutation testing showed that it is more appropriate than coverage metrics to assess the quality of test suites generated by students. However, it is mentioned that mutation testing can also be fooled by creating meaningless code that contains a lot of easily detectable mutations.

Sharma et al. [92] used JAVALANCHE to compare the effectiveness of different test generation strategies. In their work, they compared random test generation with a test generation strategy based on shape abstraction, which had been found to perform best in terms of satisfying coverage metrics in a previous study. The random testing technique generates a fixed number of tests that consist of randomly generated method call sequences of a fixed length. The shape abstraction strategy explores all method sequences up to a fixed length. However, during the generation of method sequences, a list of abstract states is maintained which is used to prune sequences. The abstract state of an object after the execution of a sequence is compared with the already encountered abstract states. If the abstract state has not been encountered before, it is added to the list, and its producing sequence is used as a basis for further sequences. Otherwise, if the abstract state has been encountered before, the producing sequence is pruned. The output of the algorithm is the set of sequences that exercise a predicate combination not yet exercised. In an experiment, the two strategies are compared on 13 container classes, and the results showed that random testing achieves a better mutation score for 4 classes, shape abstraction for 5 classes, and that there is no significant difference for the remaining 4 classes. According to these results, it is concluded that random testing is comparable to shape abstraction while being less expensive in terms of computation resources.

4.5 Summary

We presented JAVALANCHE, a mutation testing framework for JAVA. JAVALANCHE was developed with a focus on efficiency and automation and applies several optimization techniques in order to carry out mutation testing on real-world programs. As optimization techniques, JAVALANCHE uses mutant selection to reduce the number of mutants that have to be checked, mutant schemata to save set-up costs, and coverage data to reduce the number of tests that have to be executed; it manipulates bytecode to avoid recompilation steps, and it allows for parallel execution of mutants. JAVALANCHE carries out mutation testing in two steps. First, in a scan step, information about the project is collected, e.g. which mutations can be applied and which tests cover which mutations. In a second step, the actual mutation testing is carried out, i.e. the mutations are inserted into the program and it is checked whether the tests detect them.

The results from an experiment on seven open-source projects showed that mutation testing can be applied to programs with up to 28,000 lines of source code. In total, 93,399 mutations were applied to the seven projects, out of which 58% were covered by at least one test. Checking one mutation took 0.61 seconds on average. Among all projects, the test suites detected about 46% of all mutants and 80% of the covered mutants. Because of its efficiency in carrying out mutation testing, JAVALANCHE was used by other researchers. For example, it was extended for mutation testing of multithreaded code [29], used in an approach that generates tests with oracles [28], used to assess the quality of test suites generated by students [1], and used to compare the effectiveness of different test generation strategies [92].

Equivalent Mutants

One usage scenario of mutation testing is to improve a test suite by providing tests for undetected mutants. To this end, mutations are applied to a program, and it is checked whether the test suite detects them or not. This step is carried out automatically and results in a set of undetected mutants. A programmer then tries to add new tests or modify existing ones so that previously undetected mutants are detected. There are several reasons why a test suite might fail to detect a mutation, and these reasons determine how useful the mutation is to the programmer:

1. The mutation may not change the program's semantics and cannot be detected. These equivalent mutants cannot help in improving the test suite and place an additional burden on the programmer because the equivalence of a mutation has to be assessed manually.

2. The mutated statement may not be executed. In order to find non-executed statements, standard coverage criteria can be used.

3. The mutation may not be detected because of an inadequate test suite. These are the most valuable mutations since they provide indicators to improve the test suite that other coverage metrics might not provide. If a mutation is covered but not detected, this either means that the tests do not check the results well enough, or that the input data is not chosen carefully enough to trigger the erroneous behavior.


...
for (final Iterator iter = methods.iterator(); iter.hasNext();) {
    final Method method = (Method) iter.next();
    method.setAccessible(true);
    if (Factory.class.isAssignableFrom(method.getDeclaringClass()) || ⇒ &&
            (method.getModifiers() & (Modifier.FINAL | Modifier.STATIC)) > 0) {
        iter.remove();
        continue;
...

Figure 5.1: A non-equivalent mutation from the XSTREAM project.

5.1 Types of Mutations

In the following sections, we will present examples for the different types of mutations.

5.1.1 A Regular Mutation

A mutation in the createCallbackIndexMap method of the CGLIBEnhancedConverter class is shown in Figure 5.1. This mutation changes an || operator to an && operator, which can cause the expression to evaluate to false when it should evaluate to true; in that case, the method is not removed from the underlying map. Eventually, this results in spurious entries in the XML representation of an object. An existing test case of the XSTREAM test suite triggers this behavior (testSupportProxiesUsingFactoryWithMultipleCallbacks in class CglibCompatibilityTest). However, this test fails to check the results thoroughly. By modifying this test, the mutation can be detected.

5.1.2 An Equivalent Mutation

Another mutation of the XSTREAM project is shown in Figure 5.2. It is applied to the com.thoughtworks.xstream.io.json.JsonWriter class. Here, the mutation changes an & operator to an | operator, which might cause the expression to evaluate to true when it should not. The result of this expression is connected to the newLineProposed variable via a logical conjunction. As JAVA applies short-circuit evaluation, the mutated code only gets executed when the variable evaluates to true. Further investigation shows that the newLineProposed variable is only set to true in one place of the program. This only happens if the same condition as in the mutated statement is true. Thus, the mutated condition is always true when it is evaluated, and the mutant is equivalent.

void addValue(String value, Type type) {
    if (newLineProposed
            && ((format.mode() & ⇒ | Format.COMPACT_EMPTY_ELEMENT) != 0)) {
        writeNewLine();
    }
    if (type == Type.STRING) {
        writer.write('"');
    }
    writeText(value);
    if (type == Type.STRING) {
        writer.write('"');
    }
}

Figure 5.2: An equivalent mutation from the XSTREAM project.

5.1.3 A Not Executed Mutation

public boolean aliasIsAttribute(String name) {
    return nameToType.containsKey(name) ⇒ false ;
}

Figure 5.3: A mutation of the XSTREAM project that is not executed by the tests.

The method aliasIsAttribute of ClassAliasingMapper shown in Figure 5.3 returns true if the given name is an alias for another type. What happens if we mutate this method so that it always returns false? The existing test suite does not detect this mutation because the statement is not executed. Thus, a test should be added that checks this functionality. However, to detect uncovered code, we do not need to apply full-fledged mutation testing. Simple statement coverage does this much more efficiently. For the remainder of this work, we thus assume that mutations are only applied to statements that are executed by the test suite.

5.2 Manual Classification

We saw that determining the equivalence of a mutant requires manual investigation. But how widespread is this problem in real programs? Offutt and Pan [71] reported 9.10% of equivalent mutants (relative to all mutants) for the 28-line triangle program. As we were interested in the extent of the problem on modern and larger programs, we applied mutation testing (Section 4.2) to our seven subject programs and investigated the results. For each of the seven projects, we randomly took 20 mutations from different classes that were not detected by the test suite for manual inspection. Then, we classified each mutation either

• as non-equivalent, as proven by writing a test case that detects the mutation; or

• as equivalent, when manual inspection showed that the mutation does not affect the result of the computation.

5.2.1 Percentage of Equivalent Mutants

The results for classifying the 140 mutations, which stem from 20 mutants for each of the seven subject projects, are summarized in Table 5.1. Out of all classified mutants, 77 (55%) were non-equivalent and 63 (45%) were equivalent. The project with the highest ratio of non-equivalent mutants is ASPECTJ with 75%, while COMMONS-LANG had the lowest percentage with 30%. Such differences might also indicate differences in the quality of the test suites, as better test suites have a higher rate of equivalent mutations among their undetected mutations. Notice that the ratio of 45% of equivalent mutants relates to the undetected ones. Relative to all mutants, we obtain a ratio of 7.39% of equivalent mutants.

On our sample of real-life programs, 45% of the undetected mutations were equivalent.

Table 5.1: Classifying mutations manually.

Project name     Non-equivalent mutants   Equivalent mutants   Average classification time
ASPECTJ          15 (75%)                  5 (25%)             29m
BARBECUE         14 (70%)                  6 (30%)             10m
COMMONS-LANG      6 (30%)                 14 (70%)              8m
JAXEN            10 (50%)                 10 (50%)             15m
JODA-TIME        14 (70%)                  6 (30%)             20m
JTOPAS           10 (50%)                 10 (50%)              7m
XSTREAM           8 (40%)                 12 (60%)             11m
All              77 (55%)                 63 (45%)             14m28s

5.2.2 Classification Time

The time required to classify the mutations as equivalent or non-equivalent varied widely. While some mutations could easily be classified by just looking at the mutated statement, others involved examining large parts of the program to determine a potential effect of the mutated statement. This led to a maximum classification time of 130 minutes. The average classification time for each project is given in the last column of Table 5.1 and ranges from 7 minutes for JTOPAS up to 29 minutes for ASPECTJ. The average time across all projects is 14 minutes and 28 seconds.

On average, it took us 14 minutes and 28 seconds to classify one single mutation for equivalence.

5.2.3 Mutation Operators

JAVALANCHE generates mutations by using different mutation operators as introduced in Section 4. In order to check the relation between the equivalence of a mutant and its underlying mutation operator, we grouped the 140 manually classified mutations according to their operator. The results are summarized in Table 5.2. Some operators produce far more mutants than others (column 2). For example, the operator replace numerical constant produces over half of the mutants in our sample. However, we can also check the ratios of non-equivalent (column 3) and equivalent (column 4) mutants for the operators. Here, we see that for the operators replace numerical constant and replace arithmetic operator, which manipulate data, around 57% of all produced mutants are equivalent, while for the operators negate jump condition and omit method call, which manipulate the control flow, only around 30% are equivalent.

Table 5.2: Classification results per mutation operator.

Mutation operator             Number of mutants   Non-equivalent mutants   Equivalent mutants
Replace numerical constant    78                  34 (44%)                 44 (56%)
Negate jump condition         12                  10 (83%)                  2 (17%)
Replace arithmetic operator    7                   3 (43%)                  4 (57%)
Omit method call              43                  30 (70%)                 13 (30%)

Mutation operators that change the control flow produce fewer equivalent mutants than those that change the data.

5.2.4 Types of Equivalent Mutants

A mutation is defined to be equivalent when no test can be written that distinguishes the mutated version from the original, i.e. the mutated version produces the same results as the original. When classifying mutations, we found several reasons why this can be the case. In the following, we list some of these reasons; the list does not include all possible ones.

Mutations in unneeded code Some parts of the code just duplicate default behavior or set a value that is later reset without being used in-between. When mutating these parts, the program is not affected. As an example for such a mutation, consider the code shown in Figure 5.4 from the org.jaxen.pattern.PatternParser class, which calls the method setXPathFactory() after creating a new JaxenHandler object. This call is unnecessary as the standard constructor of JaxenHandler sets the appropriate field to a DefaultXPathFactory() anyway. Thus, the mutation that suppresses this call has no effect on the program execution.

JaxenHandler handler = new JaxenHandler();
handler.setXPathFactory(new DefaultXPathFactory());
    ⇒ call to setXPathFactory(...) gets omitted

Figure 5.4: An equivalent mutation in unneeded code.

Mutations that suppress speed improvements Some code is only concerned with speed improvements, e.g. a check that suppresses the sorting of a list if it has less than two elements. Mutations to these checks can cancel the performance improvement, but the results of the computation remain the same.

String myPrefix = n.getPrefix();
if ( !nsMap.containsKey(myPrefix) ⇒ true ) {
    NamespaceNode ns = new NamespaceNode((Node) contextNode,
                                         myPrefix, myNamespace);
    nsMap.put(myPrefix, ns);
}

Figure 5.5: An equivalent mutation that does not affect the program semantics.

For example, the code shown in Figure 5.5 checks whether a string myPrefix is already contained in a map nsMap. If not, a new NamespaceNode object is created and added to nsMap under the key myPrefix. A possible mutation to this code is to change the condition to a constant such as true. This updates the map entries even if the prefix string already is in the map, which should result in very different map contents. However, it is an equivalent mutant since the contextNode and myNamespace variables have the same values for every myPrefix instance. Thus, the NamespaceNode objects are the same, too. The map update forced by the mutation does not alter the map contents; it just replaces an existing node with a new, equivalent one.

Changes that do not propagate These mutations can alter the state of the program, but the changes do not propagate to the end of the execution. This can happen either locally, inside a method, or via the private state of the class. Let us consider a mutation to the code shown in Figure 5.6, taken from the class org.jaxen.expr.NodeComparator. The value to start the depth computation from is initialized with 1 instead of 0, which causes different depth values to be returned by the method. This method is private, and only used by a method that compares different nodes. Since the depth for all nodes that are compared is increased by 1, the comparison of nodes remains correct.

private int getDepth(Object o) throws UnsupportedAxisException {
    int depth = 0 ⇒ 1 ;
    Object parent = o;
    while ((parent = navigator.getParentNode(parent)) != null) {
        depth++;
    }
    return depth;
}

Figure 5.6: An equivalent mutation that alters state.

Effects that cannot be triggered These are mutations that would cause the program to fail under specific conditions; however, meeting these conditions causes other failures upstream. Figure 5.7 displays a mutation to the org.jaxen.dom.NamespaceNode class. The mutation replaces the contents of the first if condition with false so that the body is never executed. However, supplying an XML Document that would make the non-mutated condition evaluate to true causes an exception in the XML parser.

Equivalence because of local context Some mutations change the syntax of the program, but from the local source code context it is obvious that the mutation does not change the semantics of the program. Consider the code shown in Figure 5.8. The mutation changes the comparison operator from < to !=. This mutant is equivalent because the loop still runs from 0 to 5, as the variable i is not manipulated in the body of the loop.

5.2.5 Discussion

The reported number of 45% of equivalent mutants is much higher than the 6.24% reported by Offutt and Craft [68] or the 9.10% reported by Offutt and Pan [71]. Their numbers are relative to all mutations, including the ones that are detected by the test suite. These mutations, however, are not of interest when improving the test suite as they do not indicate a weakness of the test suite. If we also take the detected mutations into account, we found 7.39% of all mutations to be equivalent, which is in line with the other reported numbers.

NamespaceNode (Node parent, Node attribute) {
    String attributeName = attribute.getNodeName();
    if ( attributeName.equals("xmlns") ⇒ false ) {
        this.name = "";
    } else if (attributeName.startsWith("xmlns:")) {
        this.name = attributeName.substring(6); // the part after "xmlns:"
    } else {
        // workaround for Crimson bug;
        // Crimson incorrectly reports the
        // prefix as the node name
        this.name = attributeName;
    }
    this.parent = parent;
    this.value = attribute.getNodeValue();
}

Figure 5.7: An equivalent mutation that could not be triggered.

for (int i = 0; i < ⇒ != 5; i++) {
    ... // Code that does not manipulate i
}

Figure 5.8: An equivalent mutation because of the context.

In practice, though, it is the percentage of equivalent mutants across the undetected mutations that matters, since these are the mutants that will be assessed by the developer. Here, 45% of equivalent mutants simply means 45% of wasted time. Even worse: while the percentage of equivalent mutants across all mutants stays fixed, the percentage of equivalent mutants across the undetected mutants increases as the test suite improves. This is due to the fact that an improved test suite detects more (non-equivalent) mutants. A perfect test suite would detect all non-equivalent mutants; hence, 100% of the undetected mutants would be equivalent. In other words, as one improves the test suite, one has more and more trouble finding non-equivalent mutants among the undetected ones, with growing effort as the test suite approaches perfection.

The percentage of equivalent mutants among the undetected mutants increases as the test suite improves.

5.3 Related Work

The problem of assessing mutation equivalence has been noted before. We are only aware of the following quantitative measures of equivalent mutants. Offutt and Pan [71] report 9.10% of equivalent mutants (relative to all mutants) for the 28-line triangle program, and in another paper, Offutt and Craft [68] report 6.24% of equivalent mutants for a set of small programs ranging from 5 to 52 lines of source code. Furthermore, it is also reported that programmers judge mutant equivalence correctly in only about 80% of the cases [71]. Hu et al. [41] did a study on object-oriented mutation operators, i.e. mutation operators that simulate errors caused by object-oriented features of a programming language, such as hiding variables. In their study on 38 JAVA classes, they report 12.3% of equivalent mutants. In a comparative study between mutation testing and the all-uses adequacy criterion, Frankl et al. [27] do not directly report on the percentage of equivalent mutants, but from the numbers given in the paper we can conclude that they classified 14.2% out of 18,125 mutants as equivalent. The reason for the higher number of equivalent mutants might be traced back to the procedure they used to classify mutants. They used a random test generator to produce test inputs and considered a mutant as equivalent if no test input was generated that detected the mutant. However, they also state that they possibly misclassified some mutants as equivalent when using this procedure. Furthermore, they comment on the problem of equivalent mutants as follows:

Although our experiments were designed to measure effectiveness, we also observed that using these criteria, particularly mutation testing, was costly. Even for these small subject programs, the human effort needed to check a large number of mutants for equivalence was almost prohibitive.

The problem of equivalent mutations was also investigated in several other papers [10, 39, 22], but they do not comment on the quantitative aspect of the problem.

5.4 Summary

A developer who uses mutation testing to improve a project’s test suite focuses on the mutants that are not yet detected by the test suite. However, there are different reasons why a mutant is undetected. The simplest reason is that the tests might not execute the mutation. For these cases, however, we do not need to apply mutation testing because statement coverage would also report unexecuted code. Among the covered mutations, there are the ones that can be detected by a test, and the equivalent mutations that cannot be detected by a test. A developer is only interested in detectable mutants because only they can help in improving the test suite. Equivalent mutants put a burden on the programmer because the equivalence of a mutant has to be determined, but this does not help to improve the test suite.

In a manual study, we investigated the extent of the problem of equivalent mutants for seven JAVA projects. To this end, we classified 140 undetected mutants from these projects and saw that equivalent mutants are a widespread problem. Classifying a mutant as equivalent or non-equivalent took about 15 minutes on average. About 45% of the undetected mutants in this sample are equivalent, and some mutation operators produce up to 57% equivalent mutants. Therefore, it would be helpful to detect equivalent mutants automatically in order to make mutation testing applicable in practice. In the next two chapters, we thus present approaches to separate non-equivalent from equivalent mutants.

Chapter 6

Invariant Impact of Mutations

In the previous chapter, we have seen that equivalent mutants cause additional effort when interpreting the results from mutation testing. In the past, a number of efforts have been made to detect equivalent mutants. Approaches based on heuristics [68], static program analysis (in particular path constraints) [70], and program slicing [39] have been proposed. However, none of these techniques has been shown to scale to large programs. Therefore, we suggest an alternative, novel way to eliminate equivalent mutants. Our approach is based on the assumption that a non-equivalent mutant must impact not only the program code but also the program behavior, just like a defect must impact program execution to produce a failure. To characterize the "normal" behavior, we use dynamic invariants: specific properties that hold under a given set of inputs. These dynamic invariants form the pre- and postconditions for every function as observed during its executions. Our hypothesis is that a mutation which violates such invariants causes different behavior and is, therefore, much more likely to also violate the program's semantics than a mutation which satisfies all learned invariants. We further propose that testers focus on those mutants that violate most invariants (i.e. have the greatest impact on program behavior) and are still not caught by the test suite. Our approach is summarized in Figure 6.1. After learning dynamic invariants from a test suite execution, JAVALANCHE instruments the program with invariant checkers (Step 1), generates mutations (Step 2), runs the test suite on each mutation (Step 3), and ranks mutations by the number of violated invariants. Finally, the tester (Step 4) improves the test suite to detect the top-ranked mutations.


[Figure 6.1 shows the process as a pipeline from the Original Program over a Program with Invariant Checkers and Mutations to a Ranking of Surviving Mutations, which is handed to the Tester.]

Figure 6.1: The process of ranking mutations by invariant impact.

6.1 Learning Invariants

To deduce invariants, JAVALANCHE relies on the tracer of ADABU [14] and the invariant detection engine of DAIKON. Invariants are learned in three steps: First, we run an instrumented version of the program and collect a trace of all parameter and field accesses. Second, we analyze the trace and generate input files for DAIKON. Finally, we feed those files into DAIKON to deduce invariants.

Tracing JAVALANCHE uses ADABU to instrument JAVA classes. ADABU injects code that writes interesting events, such as field accesses and the beginning and end of methods, into a compact trace file. On a set of sample programs, the current ADABU implementation produces ∼30 MB/s of trace data on average. DAIKON analyzes JAVA programs through CHICORY, a front-end for tracing JAVA programs. CHICORY uses all variables in the scope of a method, regardless of whether they were accessed or not. Consequently, the invariant detection engine has to deal with a lot more data, which led to out-of-memory errors for some of the larger subjects in our case study. In contrast, ADABU uses only those variables that were actually accessed by a method, which greatly reduces the amount of data to be analyzed and allows for learning invariants even from very large programs.

Generating Daikon Files In the second phase, for every method that is invoked at least once, ADABU generates program point declarations for the beginning (ENTER) and the end (EXIT) of the method. For every method invocation, ADABU generates DAIKON trace entries for the corresponding ENTER and EXIT program points.

Running Daikon The generated files are then fed into DAIKON, which supports over 85 different types of invariants.

Table 6.1: Invariants used by JAVALANCHE.

Unary invariants. Compare a one-word integral variable (any JAVA type other than long, float, or double) X against X ≠ 0, X ≠ null, X ≤ c, c ≤ X, c1 ≤ X ≤ c2, where c, c1, c2 are constants.

Binary invariants. Compare two one-word integral variables X1 and X2 against X1 = X2, X1 > X2, X1 < X2.

Strings and other objects are only checked for being null.

For performance reasons, we limited DAIKON to those types of invariants that occur most frequently in practice. In a small experiment using a total of 94 different runs of our subject programs, we detected those DAIKON invariants that occurred most frequently. Our current configuration uses 28 different types of invariants, summarized in Table 6.1, accounting for over 95% of all invariants found in our sample.

As a consequence, large programs like ASPECTJ are handled with reasonable efficiency.

6.2 Checking Invariants

JAVALANCHE learns invariants from the unmutated program and checks for violations in the mutated programs. To check for invariant violations, we use a runtime checking approach similar to the one pioneered by DIDUCE, a tool that learns invariants and detects violations at runtime [36].

For each learned invariant, we insert statements into the bytecode that check for invariant violations before and after the execution of a method. If an invariant is violated, this is reported and the run resumes. This allows for efficient and scalable checking of invariants.

Figure 6.2 shows an invariant checker for a method that computes the square root of a floating-point number. An invariant that might be discovered is that only numbers greater than or equal to zero are passed as parameters. The code displayed in bold face shows a check for this invariant. Note that the checkers are actually inserted into the bytecode, but for the sake of brevity, the corresponding code is shown in JAVA. At the beginning of the method, the parameter is checked inline for the inferred invariant, and in the case of a violation, it is reported to a central instance called InvariantObserver with an id that identifies the invariant. The InvariantObserver also keeps track of the currently activated mutation, so it can associate the violated invariants with that mutation and record all invariants violated by it.

public double sqrt(double d) {
    if (d < 0) {
        InvariantObserver.invariantViolated(513);
    }
    ...
}

Figure 6.2: An invariant checker for a method that computes the square root.
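The following JAVA sketch outlines how a central observer such as the InvariantObserver described above could collect violations per mutation. It is a simplified illustration under our own assumptions, not the actual JAVALANCHE implementation; only the method invariantViolated(int) appears in Figure 6.2, all other names are hypothetical.

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of a central violation collector.
public final class InvariantObserver {

    // Id of the mutation that is currently activated; -1 means "no mutation active".
    private static volatile long activeMutation = -1L;

    // Maps a mutation id to the ids of the invariants it violated.
    private static final Map<Long, Set<Integer>> violations = new ConcurrentHashMap<>();

    // Called by the test driver before the tests for a mutant are run (hypothetical name).
    public static void setActiveMutation(long mutationId) {
        activeMutation = mutationId;
    }

    // Called by the inlined checkers, as in Figure 6.2.
    public static void invariantViolated(int invariantId) {
        violations.computeIfAbsent(activeMutation, id -> ConcurrentHashMap.newKeySet())
                  .add(invariantId);
    }

    // Invariant impact of a mutation: the number of distinct invariants it violated.
    public static int impact(long mutationId) {
        Set<Integer> violated = violations.get(mutationId);
        return violated == null ? 0 : violated.size();
    }

    private InvariantObserver() { }
}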

6.3 Classifying Mutations

The results of applying JAVALANCHE with invariant impact detection contain the following information:

Detectability A flag indicates if the mutant was detected (“killed”) by the test suite. A mutant is considered detected if at least one test fails, runs into a timeout, or throws an exception.

Impact Each invariant represents a different property of “normal” program runs. The more properties violated, the higher the impact of the mutation on the program execution. We, therefore, use the number of invariants violated by a mutation to measure impact; the greater the impact of a mutation, the higher the ranking.

Definition 26 (Invariant Impact) For a mutant S_M of a program S and a set of tests T, the invariant impact I is the number of distinct invariants that are violated by running the tests T on S_M, where the invariants are learned by running the tests T on S.

I(S_M, S, T) = # of invariants violated by running T on S_M

According to their impact, we can split the mutants into two sets: non-violating mutants (NVM) and violating mutants (VM). Non-violating mutants are mutants that do not violate any invariant; they do not impact the program with respect to its dynamic invariants. In contrast, violating mutants are mutants that violate at least one invariant.

Definition 27 (Non-Violating Mutants (NVM)) The set of non-violating mutants M_NVM for a test set T consists of those mutants that do not violate any invariant.

M_NVM = { S_M^i | I(S_M^i, S, T) = 0 }

Definition 28 (Violating Mutants (VM)) The set of violating mutants M_VM for a test set T consists of those mutants that violate at least one invariant.

M_VM = { S_M^i | I(S_M^i, S, T) ≥ 1 }

The idea behind our approach is that violating mutants are less likely to be equivalent since they violate the typical behavior of the program as captured by the invariants. This assumption will be investigated in the following section.
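As a small illustration of Definitions 27 and 28, the following JAVA sketch partitions a set of mutants into violating and non-violating mutants based on their invariant impact. The generic mutant type and the impact function are placeholders for whatever representation an implementation uses; this is an illustrative sketch, not JAVALANCHE code.

import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

// Splits mutants into VM (impact >= 1) and NVM (impact == 0).
final class MutantSets {

    static <M> Map<Boolean, List<M>> partitionByImpact(List<M> mutants,
                                                       ToIntFunction<M> invariantImpact) {
        // Key true  -> violating mutants (VM): at least one invariant violated.
        // Key false -> non-violating mutants (NVM): no invariant violated.
        return mutants.stream()
                      .collect(Collectors.partitioningBy(
                              mutant -> invariantImpact.applyAsInt(mutant) >= 1));
    }

    private MutantSets() { }
}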

6.4 Evaluation

In order to evaluate our approach, we have conducted four different experiments. In Section 6.4.2, we manually assess a sample of mutants for equivalence. In Section 6.4.3, we compare the detection rates of mature test suites for invariant-violating mutants (VM) as well as for non-violating mutants (NVM). Section 6.4.4 investigates whether the mutants with the highest impact have the highest detection rate, which implies a low number of equivalent mutants. In Section 6.4.5, we check whether the invariant impact can be used as a predictor of the equivalence or non-equivalence of undetected mutants.

6.4.1 Evaluation Subjects

For the evaluation of the invariant impact, we used the seven open-source projects presented in the previous chapter (see Table 4.2). However, in the evaluation of this approach, we used different (older) program versions (column 2), which are given in Table 6.2. The table also lists the size of the source code that is tested (column 3) and the size of the test suite (column 4). The number of tests we used for the experiments is given in the last column. All times were measured on a 16-core 2.0 GHz AMD Opteron 870 machine with 32 GB RAM; we used up to 7 cores and 2 GB RAM per core. It is important to note that CPU time does not depend on the number of cores.

Table 6.2: Description of subject programs.

Project name    Version     Program size (LOC)   Test code size (LOC)   Number of tests
ASPECTJ         1.6.1       25,913                6,828                   321
BARBECUE        1.5b1        4,837                3,136                   137
COMMONS-LANG    2.5-S       18,782               31,940                 1,590
JAXEN           1.1.1       12,449                8,371                   680
JODA-TIME       1.5.2       25,861               47,227                 3,447
JTOPAS          1.0 (SIR)    2,031                3,185                   128
XSTREAM         1.3.1       14,480               13,505                   838

Table 6.3: Runtime (in CPU time) for obtaining the dynamic invariants.

Project name    Trace test suite   Create Daikon files   Learn invariants (Daikon)   Total time invariants
ASPECTJ         3m 17s             1h 02m                22h 37m                     23h 42m
BARBECUE        17s                1m                    16m                         17m
COMMONS-LANG    3m 03s             13m                   3h 23m                      3h 39m
JAXEN           4m 48s             2h 01m                5h 31m                      7h 37m
JODA-TIME       4m 36s             13m                   13h 38m                     13h 56m
JTOPAS          2m 44s             52m                   1h 41m                      2h 36m
XSTREAM         2m 52s             26m                   5h 03m                      5h 32m

Table 6.3 and Table 6.4 show the time required to perform the steps discussed in the previous sections. Columns 2 to 4 of Table 6.3 list the steps required to learn invariants, and the last column gives the total time that is needed to learn the invariants. The dominating step in terms of runtime is usually mining invariants with DAIKON, and for ASPECTJ, this process takes the longest: 22 hours and 37 minutes to run DAIKON, and 23 hours and 42 minutes in total to learn the invariants. Columns 2 to 4 of Table 6.4 list the steps required to instrument and check mutated versions of the program. The rightmost column gives the total time needed to evaluate each subject, which is the sum of the time required to learn the invariants and to carry out mutation testing. Our record holder is again ASPECTJ, with a one-time effort of 24 CPU-hours to learn invariants and create checkers, and another 24 CPU-hours to run the mutation test.

Table 6.4: JAVALANCHE runtime (in CPU time) for the individual steps.

Project name    Create and test checkers   Prepare mutation testing   Check mutated versions   Total time mutants   Total time invariants + mutants
ASPECTJ         9h 35m                     5m                         14h 22m                  24h 02m              47h 44m
BARBECUE        1m                         2m                         42m                      45m                  1h 02m
COMMONS-LANG    6m                         3m                         2h 26m                   2h 35m               6h 14m
JAXEN           13m                        50m                        2h 05m                   3h 08m               10h 45m
JODA-TIME       19m                        46m                        3h 03m                   4h 08m               18h 04m
JTOPAS          1m                         16m                        44m                      1h 01m               3h 37m
XSTREAM         24m                        1h 47m                     2h 44m                   4h 55m               10h 27m

6.4.2 Manual Classification of Impact Mutants

In order to check whether the invariant impact helps in detecting non-equivalent mutants, we started our experiments with a manual assessment of invariant-violating mutants and non-violating mutants.

Hypothesis

Our hypothesis was:

H1 Mutants that violate invariants are less likely to be equivalent than mutants that do not violate invariants.

Experimental Setup

As discussed in Section 5.2.2, manually checking if a mutant is equivalent requires a lot of effort. We, therefore, restricted our evaluation to one project (JAXEN) and twelve samples for each group. The non-violating mutants were chosen randomly; the twelve violating mutants were those that displayed the highest number of violations.

For each mutant, we first examined the source code around the mutant; we then tried our best to come up with a test case to trigger the mutant. In those cases where it was not possible to come up with such a test, we considered the mutant to be equivalent. Assessing these 24 mutants took us 12 person-hours (i.e. 30 minutes per mutant on average), plus the additional time that was needed to write test cases for non-equivalent mutants.

The classification time of 30 minutes per mutant in this experiment is twice as much as the average reported for JAXEN in the classification of the 140 mutants (see Table 5.1). The reasons for this are the type of the mutants and the different setup. The sample from Section 5.2 contains more equivalent mutants that are easy to classify, i.e. mutants that can be classified as equivalent by looking at the surrounding code. Mutants that violate invariants, which make up one half of the sample used in this experiment, cannot be classified by just looking at the surrounding code. Therefore, it takes more time to classify them. Furthermore, we set up a better infrastructure for the experiment on the 140 mutants, e.g. we set up all projects in an integrated development environment (IDE) that made navigating the code and writing tests more efficient.

Results

The results of this manual inspection indicate that mutants which violate invariants are more than twice as likely to be non-equivalent:

Violating mutants For 10 out of the 12 inspected mutants (10/12 = 83%), we could write a test that detects the mutant. Most of these mutations impacted the tracking of input line numbers, a feature that is not covered by the JAXEN test suite. We failed to write tests for the remaining two mutants. Therefore, we considered them to be equivalent.

public Context(ContextSupport contextSupport) {
    this.contextSupport = contextSupport;
    this.nodeSet = Collections.EMPTY_LIST;
    this.size = 0 ⇒ -1 ;
    this.position = 0;
}

Figure 6.3: A non-detected JAXEN mutation that violates most invariants.

Non-violating mutants We were able to write tests for only 4 out of the 12 mutants (4/12 = 33%), but failed to do so for 7 of the remaining mutants. We did not categorize one mutant since the behavior of the mutant depends on the system locale.

A Fisher exact test confirms the statistical significance of the difference at p = 0.036. We therefore accept our hypothesis H1, taken with a grain of salt as the size of our sample set was small.

In our sample, mutants that violate several invariants are less likely to be equivalent.

Qualitative Analysis

In order to see whether invariant-violating mutations help improve a project's test suite, we did a qualitative analysis of some of the mutations that violate most invariants.

The mutation that impacts most invariants for JAXEN is shown in Figure 6.3. It originates from line 102 of the org.jaxen.Context class. The mutation sets the initial size of the context to −1 instead of 0. This violates several preconditions of methods that use this information about the size during their computation, e.g. that the size is greater than zero or in a certain range. However, while the test suite checks the size of the underlying node set (whose size seems to be tracked by Context's size field) after the creation of a Context object, it does not check the size of the Context itself (in test ContextTest). Thus, this is either an insufficiency in the test suite, or a code smell, since the size may always be computed from the underlying node set.

private Token plus() {
    Token token = new Token(TokenTypes.PLUS, getXPath(),
                            currentPosition(), currentPosition() + 1 ⇒ 0 );
    consume();
    return token;
}

Figure 6.4: A non-detected JAXEN mutation that violates the second most invariants.

The mutation with the second largest invariant impact can be found in line 615 of class XPathLexer (Figure 6.4). It sets the end index of a plus token to the same value as the begin index, making it a token of size 0. This leads to several invariant violations that involve the tokenEnd field of the Token class. The mutation, for example, has an effect on the program whenever a string representation of this token is requested, e.g. in error messages for wrong XPATH expressions. While the test suite checks for the errors, it does not check the expressions enclosed in error messages. Such checks take place for many other JAXEN messages, though. The mutation presented above indicates another opportunity to improve the test suite.

Inspired by these examples, we also did a qualitative analysis for some of the ASPECTJ mutations. In Figure 6.5, we see the method getLazyMethodGen() of the class LazyClassGen in ASPECTJ. This method takes a method name and signature and returns a LazyMethodGen object with the same name and signature, or null. The mutation negates a condition and changes the behavior so that a LazyMethodGen is only returned when the names match but the signatures do not.

This change impacts not only the behavior of the method itself but also violates 39 of the inferred pre- and postconditions of 18 other methods that are scattered all throughout the program. Yet, this mutation is not detected by any test, which implies that the ASPECTJ test suite is not adequate with respect to such defects. Although a test triggers this behavior, it fails to check for the error that was caused. As a consequence, a test manager should extend the test suite so that this mutation could be detected, too.

public LazyMethodGen getLazyMethodGen(String name, String signature,
                                      boolean allowMissing) {
    for (Iterator i = methodGens.iterator(); i.hasNext();) {
        LazyMethodGen gen = (LazyMethodGen) i.next();
        if (gen.getName().equals(name)
                && ⇒ ! (gen.getSignature().equals(signature)))
            return gen;
    }
    if (!allowMissing)
        throw new BCException("Class " + this.getName()
                + " does not have a method " + name
                + " with signature " + signature);
    return null;
}

Figure 6.5: The undetected mutation that violates most invariants.

6.4.3 Invariant Impact and Tests

The manual effort that is required for assessing mutations is not only a problem in mutation testing itself; it is also a problem when evaluating mutation testing approaches. In particular, the precise rate of equivalent mutants can only be measured by assessing all mutations manually, which is precisely the problem we want to overcome. Therefore, for our second experiment, we wanted to have an objective classification that could be more easily automated.

An Indirect Evaluation Scheme

How do we detect that a mutation is non-equivalent? In the practice of mutation testing, this is done all the time: by having the test suite detect the mutation. Any mutation detected by the test suite, by definition, alters the program semantics. Thus, it is non-equivalent. The fact that "detected" implies "non-equivalent" leads to our key idea:

A mutant generation scheme whose mutants are more frequently detected by the test suite also produces fewer equivalent mutants.

For mutation testing, we are not interested in detected mutants, though; what we want are non-detected mutants, as these help us to improve the test suite. But if a scheme generally produces fewer equivalent mutants, this property should hold regardless of the detection by the test suite. Hence, the ratio of equivalent mutants can be expected to be lower in the set of undetected mutants as well. In our case, the mutant generation scheme favors those mutations with impact on invariants. If we can show that impact on invariants correlates with detection by the test suite, this means that impact on invariants also correlates with non-equivalence, as non-equivalence is implied by test suite detection. In formal terms, we have an implication

    detected by test ⟹ non-equivalent

and if we can show (statistically) that

    violates invariants ⟹ detected by test

then we would conclude that

    violates invariants ⟹ non-equivalent

This indirect evaluation approach relies on the assumption that test suites, as they stand, are already good detectors of changed behavior. This assumption was also the basis of previous studies [3, 61]; there is no reason to believe that it would not hold for our experiment subjects. (To actually apply JAVALANCHE, rather than evaluating it, such a mature test suite is not required; instead, it is our aim to achieve this level of maturity.)

Hypothesis

Given our limited set of mutations, the constrained range of invariants checked, and the complexity and richness of the test suites involved, it is not obvious at all that invariant violations would correlate with test outcome. The hypothesis to be checked in our experiment was thus:

H2 Mutants that violate invariants are more likely to be detected by actual tests.

Experimental Setup

Our setup for H2 is straightforward: We execute the test suite on the original program to learn dynamic invariants. Then, we identify mutations that have an invariant impact and check whether they are detected by the test suite.

Table 6.5: Results for H2. Invariant-violating mutants (VM) have higher detection rates than non-violating mutants (NVM).

Project name    Number of NVMs   Number of VMs   NVMs detected (%)   VMs detected (%)   p-value (χ2 test)
ASPECTJ          1159            13133           61.69               52.18              < 0.0001
BARBECUE          613              200           60.03               89.50              < 0.0001
COMMONS-LANG    10215              967           82.48               86.35              0.0021
JAXEN            2860             1250           44.48               97.84              < 0.0001
JODA-TIME        7523             2122           75.50               90.29              < 0.0001
JTOPAS            566              609           64.13               78.82              < 0.0001
XSTREAM          1835             1765           86.32               97.85              < 0.0001

VM = Invariant-Violating Mutant, NVM = Non-Violating Mutant.

Results

Our results are summarized in Table 6.5. Let us compare the detection rates of invariant-violating mutants (VMs) versus non-violating mutants (NVMs) (columns 4 and 5). With the exception of ASPECTJ, all projects show a higher detection rate for VMs. The difference is statistically significant according to the χ2 test.

The difference can be dramatic: In JAXEN, for instance, 98% of invariant-violating mutants are detected versus 44% of the non-violating mutants. This also means that in JAXEN, the rate of equivalent mutants across all generated invariant violators is not higher than 2%.

Mutants that violate invariants are more likely to be detected by actual tests.

6.4.4 Ranking

For our third experiment, we wanted to explore what was so special about ASPECTJ that its invariant-violating mutations were less likely to be detected than the non-violating ones. In Table 6.5, we see that ASPECTJ has far more violating mutants than all other projects combined. We wanted to rank these mutants, focusing on those with the highest impact: If a mutant violates many invariants, it should have a strong impact on the behavior of the program and is, therefore, less likely to be equivalent than mutants that violate fewer invariants.

Hypothesis

In this experiment, we not only classify mutations by whether they violate invariants or not, but actually rank them according to their invariant impact, which is the number of different invariants violated. This is our hypothesis:

H3 The more invariants a mutant violates, the more likely it is to be detected by actual tests.

Experimental Setup

Our experimental setup for H3 is the same as for H2 discussed in Section 6.4.3, with the exception that we now focus on the detection rate of the top n% of the invariant-violating mutants, where n ranges from 5 to 100.
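To make this evaluation procedure concrete, the following JAVA sketch computes the detection rate for the top n% of invariant-violating mutants, ranked by the number of violated invariants. The Mutant record and its fields are hypothetical simplifications of the data collected in the experiments; this is a sketch of the computation, not the actual evaluation code.

import java.util.Comparator;
import java.util.List;

// Detection rate (in %) among the top n% of violating mutants,
// ranked by the number of violated invariants in descending order.
final class RankingEvaluation {

    record Mutant(long id, int violatedInvariants, boolean detectedByTests) { }

    static double detectionRateOfTop(List<Mutant> violatingMutants, double percent) {
        List<Mutant> ranked = violatingMutants.stream()
                .sorted(Comparator.comparingInt(Mutant::violatedInvariants).reversed())
                .toList();
        int cutoff = Math.max(1, (int) Math.round(ranked.size() * percent / 100.0));
        long detected = ranked.subList(0, cutoff).stream()
                .filter(Mutant::detectedByTests)
                .count();
        return 100.0 * detected / cutoff;
    }

    private RankingEvaluation() { }
}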

Results

Our results can be summarized easily. For ASPECTJ, there is a clear trend: The higher the ranking of a mutation (= the more invariants it violates), the higher its likelihood to be detected by the full test suite. This is shown in Figure 6.6. The x axis shows the subset considered, ranging from 5% (the top 5% of mutations which violated most invariants) to 100% (all mutations that violated at least one invariant). The y axis shows the respective detection rate.

This trend holds for all projects (see Figure 6.6), except for BARBECUE, the project with the lowest number of violating mutants. Just as ASPECTJ, the project with the highest number of violating mutants, benefits from ranking, it is reasonable to assume that the low number of violating mutants in BARBECUE prevents a meaningful ranking. Still, even focusing on the top 5% yields a higher detection rate than average.

[Figure 6.6: seven plots, one per project (ASPECTJ, BARBECUE, COMMONS-LANG, JAXEN, JODA-TIME, JTOPAS, XSTREAM), each showing the percentage of mutants killed (y axis) over the percentage of mutants included (x axis).]

Figure 6.6: Detection rates (y) for the top x% mutations with the highest impact.

In all seven projects, higher-ranked mutations are more likely to be detected than all mutations (violating or non-violating). For all projects except BARBECUE, higher-ranked mutations are always more likely to be detected than the average across all violating mutations.

Table 6.6 shows the detection rates of the 5%, 10%, and 25% of violating mutations with the highest impact. Except for BARBECUE, all detection rates are higher than those for 100% of the violating mutations (column five of Table 6.5), thus confirming H3.

Table 6.6: Results for H3. Best results are obtained by ranking VMs by the number of invariants violated.

Project name    Top 5% VMs detected   Top 10% VMs detected   Top 25% VMs detected
ASPECTJ          91.77                 91.93                  89.49
BARBECUE         70.00                 80.00                  82.00
COMMONS-LANG     95.83                 95.83                  90.46
JAXEN           100.00                100.00                  99.68
JODA-TIME        97.17                 92.92                  95.09
JTOPAS          100.00                 93.33                  90.13
XSTREAM         100.00                100.00                 100.00

VM = Invariant-Violating Mutant, NVM = Non-Violating Mutant.

The more invariants a mutant violates, the more likely it is to be detected by actual tests.

Again, this implies that the set of undetected high-impact mutants will also contain a low rate of equivalent mutants.

6.4.5 Invariant Impact of the Manually Classified Mutants

For our fourth experiment, we were interested in whether the invariant impact can be used as a predictor for the equivalence or non-equivalence of undetected mutants. We have seen that mutations with a high impact are very likely to be non-equivalent. Another question is how many of the undetected mutants are correctly classified using the invariant impact.

Hypothesis

In order to use the invariant impact as a predictor, it has to be precise, which means that an impact on invariants should indicate non-equivalence, and it has to be sensitive, which means that non-equivalent mutations should have an impact on invariants. This leads us to the following hypothesis:

Table 6.7: Results for H4. Precision and recall of the invariant impact for the 140 manually classified mutants.

Project name    Precision (%)   Recall (%)
ASPECTJ         100              7
BARBECUE         75             43
COMMONS-LANG     50             17
JAXEN            50             10
JODA-TIME       100             21
JTOPAS          100             10
XSTREAM          40             25
Total            68             19

H4 Most non-equivalent and undetected mutants have an impact on invariants, while equivalent mutants do not have an impact.

Experiment

As discussed earlier, classifying all undetected mutants would be an enormous effort. Therefore, we limited this experiment to the 140 previously classified mutants (see Section 5.2). For the classified mutants, we computed their invariant impact. Then, we checked how many of the mutants with impact were actually non-equivalent (precision), and we computed how many of the non-equivalent mutants actually had an invariant impact (recall).
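In other words, precision and recall are used here in their standard sense, restricted to the 140 classified mutants:

precision = |non-equivalent mutants with impact| / |mutants with impact|

recall = |non-equivalent mutants with impact| / |non-equivalent mutants|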

Results

Table 6.7 shows the results for the 140 manually classified mutants. The first column gives the project name, the second column (precision) displays the percentage of mutations that are non-equivalent among all mutations with impact, and the last column (recall) gives the percentage of mutations that have impact among all non-equivalent mutations. On average, 68% of the mutations with impact are correctly classified as non-equivalent. This number ranges from 40% for XSTREAM up to 100% for ASPECTJ, JODA-TIME, and JTOPAS. However, the average recall is only about 19%, which indicates a weakness of our technique. Although it is very likely that a mutation is non-equivalent when it violates multiple invariants, and still likely when it violates at least one invariant, we cannot use the invariant impact as a predictor for non-equivalence since many non-equivalent mutants do not have an impact on invariants.

Many non-equivalent mutants do not have an impact on invariants.

6.4.6 Discussion

The results of our case study suggest that developers should not only focus on those mutations that violate invariants, but that they should actually focus on those mutations that violate most invariants.

Whenever mutation testing results in a large number of undetected mutants, it thus seems a good idea to prioritize the mutants by their impact on invariants:

1. By focusing on those mutants with the highest impact on invariants, one creates a bias towards non-equivalence. This is good, as this minimizes the number of equivalent mutants to deal with.

2. As these invariants are originally learned from test suite executions, this implies a bias towards mutations whose induced behavior differs most from the "normal" behavior as characterized by the test suite.

3. Focusing on high-impact mutants also implies focusing on those areas where a defect can create most damage across the program execution. Again, we consider such a focus a very valuable property.

Our results also indicate that it is generally useful to focus on those mutations that violate most invariants. Even in a program with few violating mutants like BARBECUE, which did not benefit much from ranking (see the discussion in Section 6.4.4), the top-ranked mutations consistently yielded better detection results than the average mutation.

Finally, our results also place an upper bound on the number of equivalent mutants. Omitting BARBECUE due to its low number of violating mutants, the detection rate for the top 5% of violating mutations (Figure 6.6) is 92 to 100%, with an average of 97%.

Thus, only 3% of these high-impact mutations were undetected, placing an upper bound on the number of equivalent (undetectable) mutants. (Note that in Section 6.4.2, only 17% of this small high-impact set were actually found to be equivalent, suggesting an even lower overall rate.) This very low rate is what makes our approach to mutation testing efficient.

On average, focusing on the top 5% of invariant-violating mutants yields less than 3% of equivalent mutants.

6.5 Threats to Validity

Like any empirical study, this study has limitations that must be considered when interpreting its results.

External validity The results of H1 should be considered as promising but not generalizable, as the sample is small. Similarly, we cannot claim that the results of H4 for the 140 manually classified mutants are generalizable. The manual effort generally stands in the way of larger studies. Regarding H2 and H3, we evaluated our approach on seven programs with different application domains and sizes; some of them were larger by several orders of magnitude than programs previously used for the evaluation of mutation testing [70, 27, 39, 3]. Generally, our results were consistent across a wide range of programs. Still, there is a wide range of factors with regard to both programs and test suites that may impact the results, and we, therefore, cannot claim that the results would be generalizable to other projects. Prospective users are advised to conduct a retrospective study like ours.

Internal validity Regarding H1 and H4, our own assessment may be subject to errors, incompetence, or bias; to counter these threats, all our assessments are publicly available1. For H2 and H3, our implementation could contain errors that affect the outcome. To control for these threats, we ensured that earlier stages (Figure 6.1) had no access to data used in later stages. Our statistical evaluation was conducted using textbook techniques implemented in widely used frameworks.

1 see http://www.st.cs.uni-saarland.de/mutation

Construct validity Regarding H1 and H4, being able to write a test can be seen as the ultimate measure of whether a mutant is non-equivalent. In H2 and H3, our assumption that the test suite measures real defects is an instance of the "competent programmer hypothesis" also underlying mutation testing [18]. This hypothesis may be wrong; however, the maturity and widespread usage of the subject programs suggest anything but incompetence. Further studies will help complete our knowledge of what makes a test suite adequate.

6.6 Related Work

6.6.1 Mutation Testing

Assessing the state to check the impact of mutations is also related to the concept of weak mutation testing, as proposed by Howden [40] and explained in Section 3.3.4. Weak mutation testing assesses the effect of a mutation by determining the state after its execution: If the state is different, then the mutation is detectable. Weak mutation testing thus checks whether the test suite could possibly detect a mutation; it does not matter whether the tests actually pass or not. Strong mutation testing, which is what we assume, assesses the test suite by determining whether it actually detects a mutation. Our approach also measures state changes. However, the knowledge of state changes, in the form of invariant violations, is used to compute the impact of a mutant instead of deciding whether the mutant is detected or not.

6.6.2 Equivalent Mutants

The issue of equivalent mutants has frustrated generations of mutation testers. In Section 5.3, we have quoted Frankl et al. [27] on the enormous amount of work needed to eliminate equivalent mutants. A number of researchers have tackled the problem of detecting equivalent mutants. Baldwin and Sayward were the first ones to suggest heuristics for detecting equivalent mutants. Their approach is based on detecting idioms from semantics-preserving compiler optimizations. Later, it was shown by Offutt and Craft [68] that this approach detects approximately 10% of all equivalent mutants.

In 1996, Offutt and Pan [70] realized that detecting equivalent mutants is an instance of the infeasible path problem, which also occurs in other testing techniques.

They presented an approach based on solving path conditions that originate from a mutant. If the constraint solver can show that all subsequent states are equivalent, the mutant is deemed equivalent. The technique was reported to detect 48% of equivalent mutants. A similar approach, based on program slicing, was presented by Hierons and Harman [39]; this approach additionally provides guidance in detecting the locations that are potentially affected by a mutant. Modern change impact analysis [86] can do this in the presence of subtyping and dynamic dispatch. The recent concept of differential symbolic execution [80] brings the promise of easily detecting the potential impact of changes.

All of these techniques are orthogonal to ours; indeed, if we can prove statically that a mutation will have no impact, we can effectively omit the runtime tests. The question is how well these static approaches scale up when it comes to detecting mutant equivalence in real programs. Offutt and Pan's [70] technique, for instance, was evaluated on eleven Fortran 77 programs which "range in size from about 11 to 30 executable statements". In contrast, the programs we have been looking at are larger by several orders of magnitude.

6.6.3 Invariants and Contracts

The idea of checking the program state at runtime is as old as programming itself. Design by contract [59] mandates specifying invariants for every public method in a program; the resulting runtime checkers effectively catch errors at the moment they originate. If the programmer does not provide invariants, one can infer them from program runs. This is the idea of dynamic invariants as realized in the DAIKON tool by Ernst et al. [23]. Most related to our work is the ECLAT tool by Pacheco and Ernst [77], which selects, from a set of test inputs, a subset that is most likely to reveal defects by assessing the impact of the inputs on dynamic invariants. McCamant and Ernst [58] use dynamic invariants to check whether invariants had changed after a code change, indicating a potential problem in the future. (Our approach, of course, explicitly looks for such invariant changes.)

Sosič and Abramson [93] suggest another technique to detect the impact of changes to programs. The idea of relative debugging is to compare the execution of two programs (in our setting, the original vs. the mutant) and automatically report any differences in variable values. While the differences do not translate into invariants, they could nonetheless serve as impact indicators.

Our concept of efficient dynamic invariant checking was inspired by the DIDUCE tool by Hangal and Lam [36], which flags invariant violations as they occur during the run. None of the approaches discussed in this section has been applied to mutation testing so far.

6.7 Summary

In this chapter, we explained how to assess the impact of mutations using dynamic invariants. To this end, we presented a scalable and efficient method to learn the most common invariants (Section 6.1) and to check them (Section 6.2). In order to learn the invariants, a run of the test suite on the original program is traced with ADABU. From the resulting trace, dynamic invariants are computed using DAIKON. During mutation testing, JAVALANCHE then inserts code that checks for invariant violations at runtime.

In the evaluation of the approach, we have seen that we can apply this technique to real-world programs with reasonable efficiency, and that mutations with impact on invariants are less likely to be equivalent. Furthermore, we have seen that mutants that violate the most invariants are even less likely to be equivalent. However, our results also showed that the invariant impact cannot be used as a predictor for the non-equivalence of a mutant because there are also many non-equivalent mutants that do not have an impact on the invariants. When improving test suites, we therefore suggest that test managers should focus on those surviving mutations that have the greatest impact on invariants.

Chapter 7

Coverage and Data Impact of Mutations

The results of the previous chapter showed that a mutation that has an impact on invariants is very likely to be non-equivalent. However, we have also seen that mutations violating dynamic invariants are rare (Section 6.4.5). This finding motivated us to develop a measure that provides a more fine-grained view on the impact. A mutant can have an impact on the program run by changing the control flow of the run or by changing the data that is processed. The control flow refers to the order in which the individual statements in a program are executed. By manipulating a control flow statement, a mutation can change the order or cause different statements to be executed. The data refers to the program state which includes all values that can be accessed at a specific point of the program run. By manipulating statements involving the computation or storage of data, a mutation can manipulate the program state. Both changes can propagate throughout a program run and lead to a different result. Furthermore, a change in the control flow can also result in changes to the data, and, vice versa, changed data can alter the control flow.

In this chapter, we present an approach that measures the impact on the control flow by checking for differences in the execution frequency of individual statements and the impact on data by comparing the return values of public methods.


7.1 Assessing Mutation Impact

Equivalent mutants are defined to have no observable impact on the program’s output. This impact of a mutation can be assessed by checking the program state at the end of a computation, just like tests do. However, we can also assess the impact of a mutation while the computation is being performed. In particular, we can measure changes in program behavior between the mutant and the original version. The idea is that if a mutant impacts internal program behavior, it is more likely to change external program behavior. Thus, it is also more likely to impact the semantics of the program. If we focus on mutations with impact, we would thus expect to find fewer equivalent mutants.

How does one measure impact? Weak mutation [40] assesses whether a mutation changes the local state of a function or a component; if it does, it is considered detectable (and, therefore, non-equivalent). In this work, we are taking a more global stance and examine how the impact of a mutation propagates all across the system. To assess the degree of impact, we consider two aspects:

• One aspect of impact is control flow: If a mutation alters the control flow of the execution, different statements are executed in a different order. This is an impact that can be detected by using standard coverage measurement techniques.

• Another aspect of the behavior concerns the data that is passed between methods during the computation: If a mutation alters the data, different values are passed between methods. This is an impact that can be detected by tracing the data that is passed between methods.

In both cases, we measure the impact as the number of changes detected all across the system; as the number of impacted methods grows, so does the likelihood that the mutation is detectable in general—and non-equivalent.

7.1.1 Impact on Coverage

In order to measure the impact of mutations on the control flow, we developed a tool that computes the code coverage of a program and integrated it into the JAVALANCHE framework. The program records the execution frequency for each statement that is executed for each test case and each mutation. Note that the data collected by our tool is very similar to statement coverage which computes whether a statement is executed or not. In addition to statement coverage, our tool also stores the execution frequency of a statement. Running the complete test suite of a program and tracing its coverage provides us with a set of lines that were covered together with frequency counts for every test case of the test suite. By comparing the coverage of a run of the original version with the coverage of the mutated version, we can determine the coverage difference.
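As an illustration of this comparison (not the actual JAVALANCHE implementation; the class name and the statement-identifier format are assumptions), a coverage difference could be computed from two frequency maps as follows:

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch: compare the per-statement execution frequencies of a
    // test-suite run on the original program with those of a run on a mutant.
    public class CoverageDiff {

        // Keys identify statements, e.g. "ClassName.methodName:line" (assumed format);
        // values are how often the statement was executed during the run.
        public static Set<String> coverageDifference(Map<String, Integer> originalRun,
                                                     Map<String, Integer> mutatedRun) {
            Set<String> differing = new HashSet<>();
            Set<String> allStatements = new HashSet<>(originalRun.keySet());
            allStatements.addAll(mutatedRun.keySet());
            for (String statement : allStatements) {
                int originalFrequency = originalRun.getOrDefault(statement, 0);
                int mutatedFrequency = mutatedRun.getOrDefault(statement, 0);
                if (originalFrequency != mutatedFrequency) {
                    differing.add(statement); // executed a different number of times
                }
            }
            return differing;
        }
    }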

7.1.2 Impact on Return Values

Mutations with impact on the control flow manifest themselves in coverage differences, but it is also possible that a mutation only has an impact on data that is not used in computations affecting the control flow. In a manual investigation of random undetected mutations, we found two categories of non-equivalent mutations that had no impact on the code coverage:

• The first category consists of mutations that changed return values which were subsequently just passed around.

• The second category consists of mutations causing state changes that only manifest in a change of the string representation of an object.

Therefore, we decided to additionally trace the return values of public methods. We chose the public methods as they represent an object’s communication to the environment. Storing all return values of a program run would require a huge amount of disk space. For example, objects can cover huge parts of the program state through references. The storage of all this data for each return value might be justifiable for one run of a test suite. As we plan to use this data for assessing each mutation, which involves several thousand executions of the test suite, we decided to abstract each return value into an integer value. For each public method that has a return value, we store these integers and count how often they occur. In this way, we end up with a set of integers for each method together with frequency counts. Similar to coverage data, we can compare the sets of traced return values of the original execution with the mutated execution and obtain the data difference.

Abstracting Return Values

To obtain an integer value for a returned JAVA object, we compute its string representation by invoking toString(). Then, we remove substrings that represent memory locations, as returned by the standard implementation of the toString() method in java.lang.Object, because these locations change between different runs of the program even though the computed data stays the same. From the resulting string, we then compute the hash code. Thereby, we obtain an integer value that characterizes the object.

For each primitive value (int, char, float, short, boolean, byte), we store its natural integer representation; for 64-bit values (long, double), we compute the exclusive or of the upper and lower 32 bits.
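A minimal sketch of this abstraction is shown below. It is not the actual JAVALANCHE code; the regular expression used to strip memory locations, the handling of null, and the selection of overloads shown are assumptions.

    // Hypothetical sketch of the return-value abstraction described above.
    public class ReturnValueAbstraction {

        // Objects: take the string representation, strip default memory locations
        // (e.g. "java.lang.Object@1a2b3c"), which differ between runs, and hash the rest.
        public static int abstractObject(Object value) {
            if (value == null) {
                return 0; // assumption: a fixed abstraction for null
            }
            String representation = value.toString().replaceAll("@[0-9a-fA-F]+", "");
            return representation.hashCode();
        }

        // Primitive values map to their natural integer representation ...
        public static int abstractValue(int value)     { return value; }
        public static int abstractValue(char value)    { return value; }
        public static int abstractValue(boolean value) { return value ? 1 : 0; }

        // ... and 64-bit values are folded by XOR-ing their upper and lower 32 bits.
        public static int abstractValue(long value) {
            return (int) ((value >>> 32) ^ value);
        }

        public static int abstractValue(double value) {
            return abstractValue(Double.doubleToLongBits(value));
        }
    }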

7.1.3 Impact Metrics

The techniques defined above produce a set of differences between a run of the test suite on the original and mutated program. Using these differences, we define impact metrics that quantify the difference between the original and mutated run:

Coverage impact —the number of methods that have at least one statement that is executed at a different frequency in the mutated run than in the normal run—while leaving out the method that contains the mutation.

Data impact —the number of methods that have at least one different return value or frequency in the mutated run than in the normal run—while leaving out the method that contains the mutation.

Combined coverage and data impact —the number of methods that either have a coverage or data impact.

Definition 29 (Coverage Impacted Methods) For a mutant SM of a program S and a set of tests T, the set of methods with coverage impact SCI is the set of methods that have at least one statement with a different execution frequency between a run of T on S and a run of T on SM, excluding the method m where the mutation was applied.

Definition 30 (Coverage Impact) The coverage impact CI of a mutant SM of a program S for a set of tests T is the cardinality of the set of methods with coverage impact SCI.

CI(SM,S,T) = |SCI|

Definition 31 (Data Impacted Methods) For a mutant SM of a program S and a set of tests T, the set of methods with data impact SDI is the set of methods that have at least one different return value between a run of T on S and a run of T on SM, excluding the method m where the mutation was applied.

Definition 32 (Data Impact) The data impact DI of a mutant SM of a program S for a set of tests T is the cardinality of the set of methods with data impact SDI.

DI(SM,S,T) = |SDI|

Definition 33 (Combined Impact) The combined impact CombI of a mutant SM of a program S for a set of tests T is the cardinality of the union of the set of methods with coverage impact SCI and the set of methods with data impact SDI.

CombI(SM,S,T) = |SCI ∪ SDI|

These metrics are motivated by the hypothesis that a mutation that has non-local impact on the program is more likely to change the observable behavior of the program. Furthermore, we would assume that mutations which are undetected despite having an impact across several methods can be considered as particularly valuable for the improvement of a test suite, as they indicate inadequate testing of multiple methods at once.
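A minimal sketch of how these definitions translate into counts is given below; representing methods by their names and taking the sets of differing methods as given are simplifying assumptions.

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical sketch of Definitions 29 to 33: count the impacted methods,
    // excluding the method that contains the mutation.
    public class ImpactMetrics {

        // CI = |S_CI|: methods with at least one differing statement frequency.
        public static int coverageImpact(Set<String> coverageDiffMethods, String mutatedMethod) {
            return withoutMutatedMethod(coverageDiffMethods, mutatedMethod).size();
        }

        // DI = |S_DI|: methods with at least one differing return value.
        public static int dataImpact(Set<String> dataDiffMethods, String mutatedMethod) {
            return withoutMutatedMethod(dataDiffMethods, mutatedMethod).size();
        }

        // CombI = |S_CI union S_DI|.
        public static int combinedImpact(Set<String> coverageDiffMethods,
                                         Set<String> dataDiffMethods, String mutatedMethod) {
            Set<String> union = new HashSet<>(coverageDiffMethods);
            union.addAll(dataDiffMethods);
            return withoutMutatedMethod(union, mutatedMethod).size();
        }

        private static Set<String> withoutMutatedMethod(Set<String> methods, String mutatedMethod) {
            Set<String> copy = new HashSet<>(methods);
            copy.remove(mutatedMethod); // the mutated method itself does not count
            return copy;
        }
    }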

7.1.4 Distance Metrics

To further emphasize non-local impact, we use distance metrics that are based on the distance between the method that contains the mutation and the method that has a coverage or data difference.

The distance between two methods is the length of the shortest path between them in the undirected call graph. The undirected call graph is a variant of the traditional call graph that contains a node VM for each method M in the program. There is an edge between two nodes VM and VN if there exists a call from method M to N or vice versa. By using this distance, we can define three distance metrics analogous to the impact metrics defined above:

• Coverage distance—For each method that has a coverage difference, we compute the shortest path to the method that contains the mutation. The coverage distance is then the length of the longest path.

• Data distance—For each method that has a data difference, we compute the shortest path to the method that contains the mutation. The data distance is then the length of the longest path.

• Combined coverage and data distance—the maximum of the data and coverage distance.

Definition 34 (Distance) For the mutant SM that was applied to method m of the program S, a set of impacted methods SI, and a distance function sp that computes the shortest path between two methods in the undirected call graph, the distance is the maximum distance between the method that contains the mutation and one of the impacted methods.

Dist(SI) = max(sp(m,x)), x ∈ SI

With this distance definition, the distance metrics are defined by using different sets of impacted methods. The set of methods with coverage impact SCI, as introduced in Definition 29, defines the coverage distance. The set of methods with data impact SDI, as introduced in Definition 31, defines the data distance. The union of both sets defines the combined distance.
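The distance of Definition 34 can be computed with a breadth-first search over the undirected call graph, as in the following sketch. The adjacency-map representation and the treatment of impacted methods that are not reachable in the call graph are assumptions.

    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Queue;
    import java.util.Set;

    // Hypothetical sketch: the distance metric is the maximum over all impacted methods
    // of the shortest path to the mutated method in the undirected call graph.
    public class DistanceMetric {

        // graph: method name -> set of methods it calls or is called by (undirected edges).
        public static int distance(Map<String, Set<String>> graph, String mutatedMethod,
                                   Set<String> impactedMethods) {
            // Breadth-first search from the mutated method yields shortest path lengths.
            Map<String, Integer> shortestPath = new HashMap<>();
            Queue<String> queue = new ArrayDeque<>();
            shortestPath.put(mutatedMethod, 0);
            queue.add(mutatedMethod);
            while (!queue.isEmpty()) {
                String current = queue.remove();
                for (String neighbor : graph.getOrDefault(current, Set.of())) {
                    if (!shortestPath.containsKey(neighbor)) {
                        shortestPath.put(neighbor, shortestPath.get(current) + 1);
                        queue.add(neighbor);
                    }
                }
            }
            int maxDistance = 0;
            for (String method : impactedMethods) {
                // Assumption: methods not connected to the mutated method contribute 0.
                maxDistance = Math.max(maxDistance, shortestPath.getOrDefault(method, 0));
            }
            return maxDistance;
        }
    }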

7.1.5 Equivalence Thresholds

Each of the metrics defined above (Sections 7.1.3 and 7.1.4) produces a natural number that describes the impact of a mutation. As we want to automatically classify mutants that are less likely to be equivalent, we introduce a threshold t. A mutant is considered to have an impact if and only if its impact metric is greater than or equal to t. This allows us to split the set of mutants into the set of mutants with impact (MI) and the set of mutants with no impact (MNI).

Definition 35 (Mutants with impact (MI)) For an impact metric Ix, a test set T, and a threshold t, the set of mutations with impact MI is the set of mutations where the impact is greater than or equal to t.

MI = {SM | Ix(SM,S,T) ≥ t}

Definition 36 (Mutants with no impact (MNI)) For an impact metric Ix, a test set T, and a threshold t, the set of mutations with no impact MNI is the set of mutations where the impact is less than t.

MNI = {SM | Ix(SM,S,T) < t}
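A straightforward way to perform this split is sketched below; representing mutants by identifiers and assuming a precomputed map of impact values are simplifications.

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch of Definitions 35 and 36: split mutants into MI and MNI
    // based on a threshold t over a chosen impact metric.
    public class ImpactClassifier {

        public static Set<String> mutantsWithImpact(Map<String, Integer> impactByMutant, int t) {
            Set<String> withImpact = new HashSet<>();
            for (Map.Entry<String, Integer> entry : impactByMutant.entrySet()) {
                if (entry.getValue() >= t) {      // impact >= t  =>  MI
                    withImpact.add(entry.getKey());
                }
            }
            return withImpact;
        }

        public static Set<String> mutantsWithNoImpact(Map<String, Integer> impactByMutant, int t) {
            Set<String> withoutImpact = new HashSet<>();
            for (Map.Entry<String, Integer> entry : impactByMutant.entrySet()) {
                if (entry.getValue() < t) {       // impact < t  =>  MNI
                    withoutImpact.add(entry.getKey());
                }
            }
            return withoutImpact;
        }
    }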

7.2 Evaluation

We evaluated our approach in three experiments. First, in Section 7.2.1, we applied our techniques for automatically classifying mutants to the 140 manually classified mutants from Section 5.2. For our second experiment, we devised an evaluation scheme based on mature test suites. This automated evaluation scheme is presented in Section 7.2.2 and compares the detection rate of mutants with impact and mutants with no impact. Finally, we were interested in whether the mutants with the highest impact are less likely to be equivalent. We, therefore, ranked the mutants according to their impact and looked at the highest ranked mutants (Section 7.2.3), both for the manually classified mutants and for those detected by the test suites.

7.2.1 Impact of the Manually Classified Mutations

In the first experiment, we wanted to evaluate our hypothesis that mutants with impact on coverage or return values are less likely to be equivalent. We, therefore, determined the coverage and data differences and computed the impact (Section 7.1.3) and the distance metrics (Section 7.1.4) for the 140 manually classified mutants, and automatically classified them by using a threshold of 1 for all metrics. Then, we compared these results to the actual results of the manual classification.

To quantify the effectiveness of the classification, we computed its precision and recall:

Table 7.1: Effectiveness of classifying mutants by impact: precision and recall.

Project name    Coverage impact   Data impact   Combined impact   Invariant impact
ASPECTJ         72 / 87           72 / 87       72 / 87           100 / 7
BARBECUE        100 / 43          100 / 29      100 / 43          75 / 43
COMMONS-LANG    0 / 0             0 / 0         0 / 0             50 / 17
JAXEN           67 / 60           78 / 70       73 / 80           50 / 10
JODA-TIME       90 / 64           89 / 57       91 / 71           100 / 21
JTOPAS          100 / 70          43 / 30       64 / 70           100 / 10
XSTREAM         50 / 25           67 / 25       60 / 38           40 / 25
Total           75 / 56           67 / 48       70 / 61           68 / 19

First value in a cell gives the precision, the second the recall.

Table 7.2: Effectiveness of classifying mutants by distance based impact.

Project name    Coverage distance   Data distance   Combined distance
ASPECTJ         77 / 67             67 / 67         67 / 67
BARBECUE        100 / 43            100 / 29        100 / 43
COMMONS-LANG    0 / 0               0 / 0           0 / 0
JAXEN           67 / 60             78 / 70         73 / 80
JODA-TIME       90 / 64             89 / 57         91 / 71
JTOPAS          100 / 60            50 / 30         67 / 60
XSTREAM         50 / 13             67 / 25         67 / 25
Total           79 / 49             68 / 44         71 / 55

First value in a cell gives the precision, the second the recall.

• The precision is the percentage of mutants that are correctly classified as non-equivalent, i.e. the mutant has an impact and is non-equivalent. A high precision implies that the results of a classification scheme contain few false positives—that is, most mutants classified as non-equivalent are indeed non-equivalent.

• The recall is the percentage of non-equivalent mutants that are correctly classified as such. A high recall means that there are few false negatives—that is, a high ratio of the non-equivalent mutants was retrieved by the classification scheme.

While it is easy to achieve a 100% recall (just classify all mutants as non-equivalent), the challenge is to achieve both a high precision and a high recall.
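For reference, the two measures can be computed from the set of mutants classified as non-equivalent (impact at or above the threshold) and the set of mutants known to be non-equivalent from the manual classification, as in this sketch with assumed set-based inputs:

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical sketch: precision and recall of an impact-based classification.
    public class PrecisionRecall {

        // Percentage of classified-as-non-equivalent mutants that really are non-equivalent.
        public static double precision(Set<String> classifiedNonEquivalent, Set<String> actuallyNonEquivalent) {
            if (classifiedNonEquivalent.isEmpty()) {
                return 0.0;
            }
            return 100.0 * truePositives(classifiedNonEquivalent, actuallyNonEquivalent)
                    / classifiedNonEquivalent.size();
        }

        // Percentage of non-equivalent mutants that the classification retrieves.
        public static double recall(Set<String> classifiedNonEquivalent, Set<String> actuallyNonEquivalent) {
            if (actuallyNonEquivalent.isEmpty()) {
                return 0.0;
            }
            return 100.0 * truePositives(classifiedNonEquivalent, actuallyNonEquivalent)
                    / actuallyNonEquivalent.size();
        }

        private static int truePositives(Set<String> classified, Set<String> actual) {
            Set<String> intersection = new HashSet<>(classified);
            intersection.retainAll(actual);
            return intersection.size();
        }
    }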

The results of the evaluation of the different metrics on the classified mutants are summarized in Table 7.1 and Table 7.2. Each entry gives first the precision of the metric, and then its recall. Table 7.1 displays the results for coverage impact, data impact, and their combined impact as defined above. It also contains the results for the impact on dynamic invariants as defined in Chapter 6. Table 7.2 gives the results for the distance metrics.

When considering the average results (last row), we can see that all techniques have a high precision, ranging from 68% for the data distance (column 3 of Table 7.2) and invariant metric (last column of Table 7.1) up to 79% for coverage distance (column 2 of Table 7.2). This means that 68% to 79% of all mutants classified as non-equivalent actually are non-equivalent. In comparison, a simple classifier that classifies all mutations as non-equivalent would have a precision of 54%. Thus, the metrics improve over the simple approach by 14 to 25 percentage points.

Mutations with impact on coverage and data have a likelihood of 68 to 79% to be non-equivalent, compared to 54% across all mutations.

When we look at the results of each project, COMMONS-LANG is a clear outlier, with a precision and recall of zero for almost all metrics. This is due to several mutations that alter the caching behavior of some methods. Although they are manually classified as equivalent because the methods still return a correct object, they have a huge impact because new objects are created at every call instead of taking them from the cache. When we look at the result of the manual classification for COMMONS-LANG (Table 5.1), we also see that it is the project with the highest number of equivalent mutants—which might indicate that most mutations that are not detected by the test suite are equivalent.

The recall values for the coverage and data metrics range from 44% for data distance to 61% for the combined impact metric. Both the combined impact and the combined distance metric have a higher recall than the two metrics they are based on. This, however, comes at the cost of lower precision. Furthermore, all coverage and data metrics also have a far better recall than the earlier invariant-based technique which has a recall of only 19%.

[Figure 7.1: Precision and Recall of the impact metrics for different thresholds. Six panels (Coverage Impact, Data Impact, Combined Impact, Coverage Distance, Data Distance, and Combined Distance), each plotting precision (continuous line) and recall (dashed line) in percent against the threshold, ranging from 0 to 100.]

Coverage and data impact have better recall values than invariant impact.

There is always a trade-off between precision and recall: increasing one of the two decreases the other. The simple classifier, for example, has a recall of 100% by definition while it only has a precision of 55%. On the other hand, we can also increase the precision of our metrics by increasing the threshold, e.g. when we use a threshold of 2 for the coverage impact, we get a precision of 81% and a recall of 44%.

All distance metrics have a lower recall than their corresponding impact metrics. A reason for this is that some mutations impact methods that are not connected via method calls. In these cases, the impact propagates through state changes.

Sensitivity Analysis

The previous results were all computed with a threshold of 1. Thus, it is not clear how the metrics perform when a higher threshold is used. In order to analyze its influence, we repeated the previous experiment with varying thresholds. Figure 7.1 shows a graph for each metric. The x-axis displays the threshold and the y-axis shows the percentage for precision and recall. A continuous line stands for the precision whereas a dashed line stands for the recall. For a higher threshold, we would expect higher precision as only the mutations with a higher impact are considered. This comes at the cost of a lower recall as fewer mutations are considered in total. By looking at the graphs, we can see that all metrics follow this trend. For the distance based metrics, however, this trend only holds up to a threshold of 15. For thresholds greater than 15, both recall and precision are 0 since our sample contains no non-equivalent mutation with a distance impact greater than 15. In contrast to the distance based metrics, the trend holds for the coverage, data, and combined impact up to a threshold of 100.

Table 7.3: Assessing whether mutants with impact on coverage are detected by tests.

Project name    Number of MIs   Number of MNIs   MIs detected   MNIs detected
ASPECTJ         5,531           1,661            76%            20%
BARBECUE        1,045           528              83%            32%
COMMONS-LANG    10,061          4,559            97%            58%
JAXEN           5,997           548              97%            26%
JODA-TIME       15,883          2,037            95%            18%
JTOPAS          1,362           150              93%            5%
XSTREAM         5,940           788              97%            39%

MI = Mutation with Impact, MNI = Mutation with No Impact.

Coverage and data metrics are more stable for higher thresholds than distance based metrics.

7.2.2 Impact and Tests

Besides our evaluation on the manually classified mutations, we also wanted a broader, objective evaluation scheme that can be automated. However, in order to automatically determine the equivalence of a mutation, we either need a test suite that detects all non-equivalent mutations, or an oracle that decides the equivalence of a mutation. Unfortunately, obtaining such a test suite or an oracle is infeasible. Thus, we decided to base our automated evaluation scheme on the existing mature test suites of the projects.

Table 7.4: Results for ranking the mutations according to their impact on coverage.

Project name    Top 5% MIs detected   Top 10% MIs detected   Top 25% MIs detected
ASPECTJ         100%                  100%                   99%
BARBECUE        100%                  97%                    99%
COMMONS-LANG    98%                   99%                    99%
JAXEN           100%                  100%                   100%
JODA-TIME       100%                  100%                   99%
JTOPAS          100%                  100%                   100%
XSTREAM         100%                  100%                   100%

Table 7.5: Assessing whether mutants with impact on data are detected by tests.

Project name    Number of MIs   Number of MNIs   MIs detected   MNIs detected
ASPECTJ         5,186           2,006            80%            19%
BARBECUE        956             617              92%            25%
COMMONS-LANG    7,861           6,759            98%            70%
JAXEN           6,005           540              95%            46%
JODA-TIME       15,173          2,747            91%            55%
JTOPAS          1,286           226              94%            31%
XSTREAM         5,543           1,185            95%            64%

The rationale for our evaluation is as follows: A mutation classification scheme helps the programmer when it detects many non-equivalent and fewer equivalent mutants. For every mutant that is detected by the test suite, we know for sure that it is non-equivalent. If we can show that a classification scheme has a high precision on the mutations that are detected by the test suite, this might also hold for the mutations that are not detected by the test suite.

Thus, we applied the impact metrics to all mutations in each project and evaluated them on the mutations detected by the test suite. The results are given in Tables 7.3 to 7.8.

For each project and impact metric, we determined the number of mutations that had an impact (MIs in column 2), and the number that had no impact (MNIs in column 3). For the MIs and MNIs, we then computed the ratio that was detected by the test suite (columns 4 and 5).

Table 7.6: Results for ranking the mutations by their impact on data.

Project name    Top 5% MIs detected   Top 10% MIs detected   Top 25% MIs detected
ASPECTJ         100%                  99%                    99%
BARBECUE        100%                  97%                    99%
COMMONS-LANG    97%                   98%                    98%
JAXEN           100%                  100%                   100%
JODA-TIME       100%                  100%                   99%
JTOPAS          100%                  100%                   100%
XSTREAM         100%                  100%                   100%

Table 7.7: Assessing whether mutants with combined coverage and data impact are detected by tests.

Project name    Number of MIs   Number of MNIs   MIs detected   MNIs detected
ASPECTJ         5,200           1,992            81%            17%
BARBECUE        1,142           431              81%            25%
COMMONS-LANG    10,467          4,153            95%            59%
JAXEN           6,063           482              95%            41%
JODA-TIME       15,841          2,079            91%            43%
JTOPAS          1,388           124              92%            6%
XSTREAM         6,059           669              94%            52%

In Section 7.1.5, we saw that we need a threshold t to decide when a mutation is considered to have an impact according to the underlying metric. As our manual classification showed that 45% of the undetected mutations are equivalent, we automatically set t so that at most 45% of the undetected mutations are classified as having no impact.

The ratio of mutations with impact ranges from 54% for COMMONS-LANG and data impact (Table 7.5) up to 93% for JAXEN and the combined impact metric (Table 7.7). The number of mutations with impact that are detected is around 90% on average (i.e. at most 10% are equivalent), while the average ratio of detected mutations with no impact ranges from 28% for coverage impact to 45% for data and combined impact. These results indicate that the impact metrics classify the mutations with a high precision, with the coverage impact metric having the highest precision.

Table 7.8: Results for ranking the mutations by their impact on coverage and data.

Project name    Top 5% MIs detected   Top 10% MIs detected   Top 25% MIs detected
ASPECTJ         100%                  100%                   99%
BARBECUE        100%                  97%                    99%
COMMONS-LANG    98%                   98%                    99%
JAXEN           100%                  100%                   100%
JODA-TIME       100%                  100%                   99%
JTOPAS          100%                  100%                   100%
XSTREAM         100%                  100%                   100%

Of the mutations that have impact on coverage or data, at most 10% are equivalent.

Sensitivity Analysis

We investigated the sensitivity of our approach in relation to the threshold by repeating the previous experiment for the coverage impact measure with varying values for the threshold. The results are shown in Figure 7.2. For each of the 7 projects, there is one graph where the x-axis gives the different threshold values and the y-axis the percentage for 3 different measurements: (1) a solid line for the percentage of mutations with impact, (2) a dashed line for the ratio between detected and undetected mutations with impact, and (3) a dotted line for the ratio between detected and undetected mutations with no impact.

The percentage of mutations with impact declines for all projects, and for BARBECUE, COMMONS-LANG, and JTOPAS there are no mutations with an impact greater than 200, while for JAXEN, JODA-TIME, and XSTREAM, there are some mutations with an impact greater than 800. The ratio of detected mutations with impact rises up to 100% for all mutations when higher thresholds are used. This is because only mutations with a big impact (mostly mutations that cause exceptions) are considered.

[Seven panels (AspectJ, Barbecue, Commons-Lang, Jaxen, Joda-Time, JTopas, XStream), each plotting three curves against the threshold (0 to 800): the percentage of mutations with impact, the ratio of mutations with impact that are detected, and the ratio of mutations with no impact that are detected.]

Figure 7.2: Percentage of mutations with impact and detection ratios of mutations with and without impact for varying threshold.

The ratio of detected mutations with no impact also rises for higher thresholds, because more mutations are considered to have no impact.

At higher thresholds 100% of the mutations with impact are detected.

Mutation Operators

As the results from the manual classification indicated that the ratio of equivalent to non-equivalent mutations differs between mutation operators (Section 5.2.3), we were interested in the detection ratios for the individual mutation operators. To this end, we grouped the results for combined coverage and data impact by mutation operator and combined the results for all projects. Table 7.9 summarizes the results. The number of mutations (column 2) that are produced for an operator ranges from 2,359 for replace arithmetic operator to 22,457 for replace numerical constant. Similar to the results concerning equivalence (Section 5.2.3), there is a difference between operators that manipulate the control flow (negate jump condition and omit method call) and operators that manipulate data (replace numerical constant and replace arithmetic operator).

Table 7.9: Detection ratios for different operators.

Mutation operator             Number of mutants   Detected mutants   Mutants with impact (MI)   MIs detected
Replace numerical constant    22,457              76.96              85.08                      83.59
Negate jump condition         9,790               90.41              91.65                      90.69
Replace arithmetic operator   2,359               80.63              86.90                      87.95
Omit method call              21,484              86.55              94.42                      88.11
Total                         56,090              83.14              89.88                      86.85

Compared to the data manipulating operators, the control flow manipulating operators have higher detection rates (87 to 90% vs. 77 to 81% in column 3), more mutations with impact (92 to 94% vs. 85 to 87% in column 4), and higher detection rates for mutations with impact (88 to 91% vs. 84 to 88% in column 5).

Mutations that manipulate the control flow have higher impact and higher detection rates than mutations that manipulate data.

7.2.3 Mutations with High Impact

In the previous experiments, we saw that mutations with impact are more likely to be non-equivalent. Besides that, we were interested in whether mutations with a high impact are more likely to be non-equivalent.

To evaluate this hypothesis, we conducted two experiments. First, we ranked the mutations that were detected (as described in Section 7.2.2) by their impact, picked the top 5, 10, and 25%, and checked how many of them were non-equivalent. In a second experiment, we ranked the mutations from the manual classification according to their impact for the different impact metrics. Then, we picked the top 15, 20, and 25% of the highest ranked mutations among all mutations that were classified as non-equivalent by the metric, and checked whether they were correctly classified.

The results for the first experiment (for mutations detected by the test suite) can be found in Tables 7.4, 7.6, and 7.8. For many projects and impact metrics, the 25% of mutations with the highest impact are all detected. If not all of them are detected, at least 98% of them are. For the impact on invariants (see Chapter 6), we observed a similar trend, but not as distinctive as for the data and coverage impact metrics.

Table 7.10: Focusing on mutations with the highest impact: precision of the classification.

Impact metric       Top 15%   Top 20%   Top 25%
Coverage impact     88%       91%       93%
Data impact         88%       91%       86%
Combined impact     90%       85%       76%
Coverage distance   86%       80%       75%
Data distance       88%       80%       85%
Combined distance   89%       75%       80%

Table 7.10 shows the results for the manual classification. For all impact metrics, 75% or more out of the top 25% are non-equivalent. Compared to the precision results in Table 7.1, picking the 25% of mutations with the highest impact attains a higher ratio of non-equivalent mutations than choosing mutations with impact in no specific order. In this setting, the coverage impact metric performs best again. When we choose the top 25% ranked mutations, 93% of them are non-equivalent.

More than 90% of the mutations with the highest coverage impact are non-equivalent.

The results for the detected mutants indicate that a high impact strongly correlates with non-equivalence, and the results for the manually classified mutations confirm this finding also for the undetected mutants.

In practice, this means that focusing on the mutations with the highest impact yields the fewest equivalent mutants. The question is whether mutations with a high impact are also the most valuable mutations—that is, whether they uncover the most errors, or at least the most important errors. Our intuition tells us that if we can make a change to a component that impacts several other components, while the test suite still does not detect it, such a change has a higher chance to be valuable than a change whose impact is hardly measurable. The relationship between impact and value of mutations remains to be assessed and quantified, though.

7.3 Threats to Validity

The interpretation of our results is subject to the following threats to validity.

External validity In our studies, we have examined 20 sample mutations from each of seven non-trivial JAVA programs with different application domains and sizes; some of them were larger by several orders of magnitude than programs previously used for evaluation of mutation testing [70, 27, 39, 3]. Generally, our results were consistent across a wide range of programs. Still, there is a wide range of factors of both programs and test suites that may impact the results, and we, therefore, cannot claim that the results would be generalizable to other projects.

Internal validity Regarding the manual classification (Section 5.2), our own assessment may be subject to errors, incompetence, or bias. At the time we conducted the assessment, we did not know how the mutations would score in terms of impact. For assessing mutations based on coverage (Sections 7.2.1 and 7.2.2), our implementation could contain errors that affect the outcome. To control for these threats, we ensured that earlier stages had no access to data used in later stages.

Construct validity Regarding the manual classification of mutations (Section 5.2), being able to write a test is the ultimate measure of whether a mutant is non-equivalent. When classifying mutations based on impact (Section 7.2.1), we directly provide the information as required by the programmer. Finally, in Section 7.2.2, our assumption that the test suite measures real defects is an instance of the “competent programmer hypothesis” also underlying mutation testing [18]. This hypothesis may be wrong; however, the maturity and widespread usage of the subject programs should suggest sufficient competence. Further studies will help complete our knowledge of what makes a test suite adequate.

7.4 Related Work

As stated in Section 6.6.2, the problem of equivalent mutants was also diagnosed and tackled by other researchers [71, 68]. Similar to invariant impact, coverage and data impact are also dynamic analysis techniques. Therefore, the static approaches can be used in combination with the dynamic approaches. If it can be statically proven that a mutation is equivalent, we do not need to compute its impact and can focus on those mutations that cannot be handled with static approaches. 7.4. RELATED WORK 101

The traditional use of statement coverage is to measure how well tests exercise the code under test and to detect areas of the code that are not covered by the tests. In contrast to our approach which also takes into account the execution frequency of a statement, traditional statement coverage just measures whether a statement is executed or not. Besides its traditional use, statement coverage is also used in different scenarios.

Jones and Harrold [45] presented the TARANTULA tool that uses coverage information for bug localization. By comparing the statement coverage of passing and failing test cases, the suspiciousness of a statement is computed. The intuition behind this approach is that statements that are primarily executed by failing test cases are more suspicious than statements primarily executed by passing tests. The statements can then be ranked according to their suspiciousness. The results of their evaluation show that the defective statement is ranked in the top 10% for 56% of the cases. Both our approach and TARANTULA follow the idea that changed coverage expresses anomalous behavior. TARANTULA uses coverage differences between test cases to find a defective statement while our approach uses coverage differences between runs of the test suite to estimate the impact of a mutation.

Elbaum et al. [21] investigated how statement coverage data changes when the source code is changed. The results suggest that even small changes can have a huge impact on the code coverage. However, the authors also state that the changes in code coverage are hard to predict. These findings support our decision to use statement coverage as an impact metric because mutations are also small changes. Furthermore, not all changes result in coverage changes, which indicates that coverage changes are sensitive to semantic changes.

Gordia [31] proposed the concept of dynamic impact analysis. To this end, a dynamic impact graph is built, which is directed and acyclic. The nodes of the graph represent different executions of the program elements. Edges connect nodes that can potentially impact each other. The edges carry a probability that indicates how likely it is that an element impacts its direct successor. By traversing the graph, it can be computed how likely it is that a program element impacts an output element. The proposed applications of this approach are to estimate the risk of changes and to aid in test case selection for mutation testing. However, this approach might also be used in a similar way as ours: for an undetected mutant, its probability of manipulating the output can be computed, which corresponds to its chance of being non-equivalent.

Test case prioritization [85] is concerned with meeting a specific performance goal faster by reordering the execution of test cases, e.g. to detect faults earlier in the testing process, or to reach specific coverage goals faster. To this end, coverage data is used to prioritize the test cases according to different strategies.

Software change impact analysis aims to predict the results of code changes, that is, which parts of the code are affected by a change. Orso et al. [75, 76] proposed an approach for software impact analysis which is also called COVERAGEIMPACT. They use their Gamma approach to collect actual field data from deployed programs to compute the potential impact of a program change. To this end, traces are taken from a sample of deployed programs. A trace, in this context, consists of all methods that were called during an execution of the program. Every trace that traverses a changed method is identified, and all methods that are covered by this trace are added to a set of covered methods. Then, a static forward slice for every method that was changed is computed, and the methods that are covered by the slice are collected in a set of slice methods. The intersection of both sets is the set of methods that are potentially impacted by this change. Despite the similar name, our approach and the approach by Orso et al. are different: their approach samples many executions on different machines to approximate the impact of potentially larger changes on deployed programs and typical usage scenarios, while our approach just aims to assess the impact of one small change (mutation) for a specified usage scenario (test cases) for one deployed version. However, using static forward slices can also help to assess the equivalence of a mutant. A similar idea was proposed by Hierons et al. [39].

7.5 Summary

In this chapter, we introduced the concepts of coverage impact and data impact of a mutation. The coverage impact is the number of methods that have lines with different execution frequencies between a run of the test suite on the mutated and the original version, and the data impact is the number of methods that have different return values between the two runs. To further emphasize non-local impact, we introduced distance metrics that measure the distance between the method containing the mutation and its furthest impacted method. The distance between two methods is the length of the shortest path between them in the undirected call graph.

In a study on seven open-source programs, it was shown that both coverage and data impact can be used to assess the equivalence of mutations with a good precision and recall, i.e. most mutations with impact are non-equivalent and most non-equivalent mutations are detected by impact metrics. The distance metrics reach a higher precision at the cost of a lower recall, i.e. a higher percentage of mutations are correctly classified as non-equivalent but fewer are detected. When we compare the different approaches, we see that the coverage impact performs slightly better than the data impact, and all metrics based on coverage and data impact perform better than the invariant metrics in terms of recall, i.e. they detect more non-equivalent mutations. Combining coverage and data metrics increases the recall, and by focusing on high impact mutants, the results can be improved further in terms of precision, e.g. if we rank the undetected mutations by impact and choose the top 25% ranked mutations, 93% of them are non-equivalent.

Chapter 8

Calibrated Mutation Testing

During mutation testing, defects are generated by using predefined mutation operators which are inspired by faults that programmers tend to make. But how representative are mutations for real faults? Andrews et al. [3] showed in their study that mutations are better representatives of real faults than faults generated by hand. However, different types of faults are made in different projects [52]. Therefore, mutation testing might be improved by learning from the defect history. As future defects are often similar to past ones, mutations that are similar to past defects can help to develop tests that also detect future bugs. This motivated us to develop calibrated mutation testing, a technique that produces mutations according to characteristics of defects that were detected in the past. To this end, we mine the repository of a project for past fixes. These fixes represent attempts to cure a defect. Thus, they provide information about past defects. By producing mutation sets that share properties with these past fixes, we calibrate mutation testing to the defect history of a project.

Figure 8.1 summarizes our approach. First, fixes are mined from different sources. Then, the mutations are calibrated to characteristics of the fixes (Step 2) and presented to the tester (Step 3).

8.1 Classifying Past Fixes

Calibrated mutation testing aims to produce mutations similar to past defects, as made during the development of a project. To this end, we need to obtain information about these defects. We do this by learning from fixes, as they represent attempts to cure a defect and thereby describe the defect. In the following sections, we describe how to collect and classify past fixes for a software project.

[Figure 8.1: The process of calibrated mutation testing. Fixes are mined from the issue tracker and the revision control system (RCS), calibrated mutations are inserted into the original program, and the results are reported to the tester.]

8.1.1 Mining Fix Histories

Defects are constantly fixed during the development of a project. These fixes, however, are documented in different ways. Thus, we mine fixes for a project from different sources:

Revision Control System (RCS) Revision Control Systems such as cvs, svn, and git provide commit messages. These messages may contain the word ‘fix’, which indicates that a defect was fixed with this revision. We use the Kenyon framework [7] for collecting such commits automatically (a sketch of the underlying heuristic follows this list). Then, we manually validate whether they are really fix revisions.

Issue tracker connected with RCS logs Several commit messages may contain the index numbers of issues from an issue tracking system. We can automatically collect the revisions that contain these numbers with Kenyon, and then check manually whether they are really fix revisions corresponding to a certain issue.

Testing We can compare the test results of two adjacent revisions to obtain fixes. If a test case fails on the former revision and passes on the latter one, the latter is a fix revision.
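As announced above, the following sketch illustrates the commit-message heuristic. It is not the Kenyon API; the issue-number pattern and the representation of the log as a revision-to-message map are assumptions, and the returned candidates still have to be validated manually.

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Pattern;

    // Hypothetical sketch: flag revisions whose commit message mentions "fix" or an
    // issue number as candidate fix revisions.
    public class FixCandidateMiner {

        // Assumption: issues are referenced in commit messages as "#123".
        private static final Pattern ISSUE_ID = Pattern.compile("#\\d+");

        public static Map<Integer, String> candidateFixRevisions(Map<Integer, String> logMessages) {
            Map<Integer, String> candidates = new LinkedHashMap<>();
            for (Map.Entry<Integer, String> entry : logMessages.entrySet()) {
                String message = entry.getValue();
                if (message.toLowerCase().contains("fix") || ISSUE_ID.matcher(message).find()) {
                    candidates.put(entry.getKey(), message); // still needs manual validation
                }
            }
            return candidates;
        }
    }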

[Figure 8.2: Collected fixes for the JAXEN project. A Venn diagram of the 134 fixes in total, showing the overlap between the 99 fixes from SVN logs, the 42 fixes from the issue tracker, and the 57 fixes from testing.]

8.1.2 Subject Project

A project has to fulfill several criteria to be suitable for calibrated mutation testing. It has to have a version history that lasts long enough, so that we can learn from past fixes. There has to be an issue tracker from which we can extract bugs that were fixed. Furthermore, it has to come with a JUNIT test suite, so that we can test several revisions automatically and apply mutation testing to it.

The JAXEN XPath engine fulfills these criteria. It has a version history of 1319 revisions that span a duration of about 7 years. It is of medium size (12,438 lines of code) and comes with a JUNIT test suite that consists of 690 test cases.

Figure 8.2 shows the fixes that we collected for JAXEN, with the techniques described in the previous section. In total, we got 134 fixes. Out of these, 99 fixes were extracted from the version archive (svn), 42 from the issue tracker, and 57 from comparing the test results of subsequent revisions. Some of the fixes extracted from different sources overlap, which is shown by the Venn diagram. For example, 21 of the fixes that were extracted from the version archive were also extracted from the issue tracker.

Table 8.1: Fix pattern properties and their values.

Properties   Values
Flow         control (C), data (D)
Fix change   add (A), remove (R), modify (M)
Syntactic    if (IF), method call (MC), operation (OP), switch (SW), exception handling (EX), loop (LP), variable declaration (DC), casting (CT)

8.1.3 Fix Categorization

The collected fixes were categorized according to the fix pattern taxonomy proposed by Pan et al. [78]. The taxonomy categorizes fixes based on syntactical factors and their source code context. For example, the ‘IF-APC’ pattern describes fixes that add a precondition check via an if statement. In total, Pan et al. propose 27 fix patterns. However, some fixes could not be categorized by the taxonomy, so we added 12 new fix patterns. For each fix pattern, we also determined several properties that describe its effects. The flow property describes the impact of the fix on control and data flow. The change property indicates if new code was added, or existing code was changed or deleted. The syntactic property describes which type of statements were involved in the fix (e.g. if statements or method calls). Table 8.1 shows all values for the different properties. Note that a fix pattern can also be associated with multiple values from each category. Furthermore, we also determined the location for each fix—that is, the package, class, and method that were affected by the fix. Thereby, we could identify areas of the program that were more vulnerable than others. By using these patterns and properties, each fix can be categorized. Although the fixes were categorized manually, each of the categorization steps could be automated.

8.2 Calibrated Mutation Testing

The idea of calibrated mutation testing is to adapt mutations to the defects that occurred during the development of the project. We obtain information about the defects by mining past fixes. Using this data, we define mutation operators that mimic defects that were fixed (Section 8.2.1), and devise mutation selection schemes based on the properties of previous fixes (Section 8.2.2).

Table 8.2: 10 most frequent fix patterns for Jaxen.

Pattern name                                         Abbreviation
Change of Assignment Expression                      AS-CE
Addition of an Else Branch                           IF-ABR
Change of if condition expression                    IF-CC
Addition of Precondition Check                       IF-APC
Method call with different actual parameter values   MC-DAP
Addition of a Method Declaration                     MD-ADD
Addition of Precondition Check with jump             IF-APCJ
Removal of an Else Branch                            IF-RBR
Modify exception message                             EX-MOD
Addition of Operations in an Operation Sequence      SQ-AMO

Table 8.3: Classification of the 10 most frequent fix patterns.

Pattern   Flow   Change   Syntactic   Frequency   Mutation operator
AS-CE     D      M        OP          18          replace assignments
IF-ABR    CD     A        IF          13          skip else
IF-CC     C      M        IF          13          negate jump in if
IF-APC    C      A        IF          9           remove check
MC-DAP    D      M        MC          9           replace method arguments
MD-ADD    D      A        DC          9           -
IF-APCJ   C      A        IF          8           skip if
IF-RBR    CD     R        IF          7           always else
EX-MOD    N      M        EX          6           -
SQ-AMO    D      A        MC          6           remove method calls

8.2.1 Mutation Operators

To generate mutations similar to actual defects, we use the fix patterns that are based on actual fixes. For a fix pattern, we derive a mutation operator that reverses the changes described by the pattern. Thereby, we introduce defects similar to fixes that are represented by this pattern.

For the JAXEN project, we examined the 10 most frequent fix patterns and checked which mutation operators reverse these patterns. Table 8.2 gives the names of the 10 most frequent fix patterns together with their abbreviations, and the results are summarized in Table 8.3. The first column gives the fix patterns ordered by their frequency (column 5). Columns 2 to 4 show the properties associated with each fix pattern. The mutation operator that corresponds to a fix pattern is given in the last column. For 8 out of these 10 patterns, we found a corresponding mutation operator. For our experiments, we used JAVALANCHE, the mutation testing framework presented in Chapter 4. One of the mutation operators that corresponds to a pattern was already implemented in JAVALANCHE, and for the seven others, we had to implement new mutation operators.

For example, the corresponding mutation operator for the ‘IF-APC’ pattern (see Section 8.1.3) removes checks. Checks are if conditions without an else part. Thereby, code that was previously guarded by the check gets executed regardless of the check result.
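The following hypothetical example illustrates the effect of this operator; the class and method names are made up, and the code is not taken from JAXEN or JAVALANCHE.

    import java.util.List;

    // Hypothetical illustration of the "remove check" operator, which reverses the
    // IF-APC fix pattern (addition of a precondition check).
    public class RemoveCheckExample {

        // Original code: a fix added a precondition check (an if without an else part).
        void process(List<String> items) {
            if (items != null) {       // precondition check guarding the call
                handle(items);
            }
        }

        // Mutated version: the check is removed, so the guarded code is executed
        // regardless of the check result, reintroducing a defect similar to the one
        // that the original fix repaired.
        void processWithCheckRemoved(List<String> items) {
            handle(items);
        }

        private void handle(List<String> items) {
            // placeholder for the guarded computation
        }
    }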

8.2.2 Mutation Selection Schemes

As we do not want to apply all possible mutations exhaustively, we propose different selection schemes that aim to represent past defects. To this end, a selection scheme takes properties of past fixes into account and selects mutations according to these properties. Therefore, we also mapped the characteristics, described in Section 8.1.3, to mutations and derived different selection schemes:

Pattern-based scheme Past fixes can be described by fix patterns. Many of these fix patterns can be related to mutation operators as described in Section 8.2.1. By using this relation, we can select a set of mutations that reflects the distribution of fix patterns among the past fixes.

Property-based schemes A fix pattern is characterized by different properties (flow, change, and syntactic properties). We calculate the distribution of these properties among all fixes. Then, we select a set of mutations that manipulate these properties so that the distribution of properties among the fixes is reflected—e.g. if more fixes are associated with a property, more mutations are selected to manipulate this property. In this way, we get three different mutation selection schemes as there are three different properties.

Location-based schemes Source code locations where defects were fixed in the past may be more vulnerable than other locations. Thus, we collect the locations of the fixes and count their occurrences. Then, we select mutations according to the distribution of fix locations. By considering three different granularity levels (package, class, and method level), we obtain three different location-based schemes.

Random scheme A scheme that randomly selects mutations from all possible mutations serves as a benchmark.

From the different selection schemes, we obtain different sets of mutations, which are calibrated to different aspects of the defect history of the project. Note that the mutation selection schemes are not deterministic. When a selection scheme is applied multiple times, different mutations may be chosen, and only the distribution of the underlying characteristic stays the same.
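One possible realization of such a distribution-based selection is sketched below; grouping candidate mutations by the category they correspond to (fix pattern, property value, or location) and assigning proportional quotas are assumptions about the schemes, not the actual implementation.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch: select mutations so that the categories (e.g. fix patterns)
    // are represented in proportion to their frequency among the past fixes. Shuffling
    // makes repeated applications yield different sets with the same distribution.
    public class CalibratedSelection {

        public static List<String> select(Map<String, Integer> fixFrequency,       // category -> number of past fixes
                                          Map<String, List<String>> candidates,    // category -> candidate mutations
                                          int totalMutations) {
            List<String> selected = new ArrayList<>();
            int totalFixes = 0;
            for (int frequency : fixFrequency.values()) {
                totalFixes += frequency;
            }
            if (totalFixes == 0) {
                return selected;
            }
            for (Map.Entry<String, Integer> entry : fixFrequency.entrySet()) {
                List<String> pool = new ArrayList<>(candidates.getOrDefault(entry.getKey(), List.of()));
                Collections.shuffle(pool);
                // Quota proportional to the category's share among the past fixes.
                int quota = Math.round((float) totalMutations * entry.getValue() / totalFixes);
                selected.addAll(pool.subList(0, Math.min(quota, pool.size())));
            }
            return selected;
        }
    }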

8.3 Evaluation

For the evaluation of our approach, we were interested in whether mutation schemes that are based on properties of previous fixes help to develop better tests than schemes that are based on a random selection of mutations. But how do we define better or good tests in this context? Although there are many different opinions on what makes a good test, for our evaluation, we consider a test to be good when it detects bugs. Therefore, in an ideal setting, we would first apply each mutation selection scheme to a project and develop tests that cover the selected mutations. Then, we would check how many bugs these tests detect. Unfortunately, the first step is very hard as it would involve tremendous human effort to write tests for all mutations, and the second step is impossible as we do not know all bugs that are in a project. Thus, we propose an alternative evaluation setting that is based on the version history of a project.

8.3.1 Evaluation Setting

For a project, we check out every revision from the revision control system. Then, we compile each revision using its build scripts. If the revision can successfully be compiled, we also run its unit tests and record the results of the individual tests. With this approach, we obtain a matrix that depicts which test passes or fails on which revision. Using this data, we consider a test to detect a bug if the test fails on a revision and passes on a later one.

For evaluating our approach, we pick a specific revision. Then, we compute all fixes, as described in Section 8.1, up to this revision. By learning from these fixes, we create mutation sets according to different selection schemes. For these mutation sets, we check by which tests each mutation is detected. Then, we prioritize the tests so that they are sorted by the number of additional mutations that they detect. In this way, we obtain a test prioritization for each scheme. To assess the quality of a prioritization, we check how many future bugs are detected by the tests in the top x percent of the prioritization. Again, the number of future bugs that a test detects is derived from the previously produced matrix. As we create multiple sets for each mutation generation scheme, the result is the average number of bugs detected by tests from the different prioritizations. For each scheme, we compare this number to the results obtained for randomly selected mutations, and test whether differences are statistically significant. We use the Mann–Whitney U test to determine whether a difference is statistically significant because we could not ensure that the underlying data is normally distributed, i.e. the hypothesis that the data is normally distributed was rejected by the Shapiro–Wilk test.
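One common way to realize such an “additional” prioritization is a greedy selection, sketched below under the assumption of a precomputed mapping from each test to the mutations of the calibrated set it detects; the actual implementation used in the evaluation may differ.

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch: order tests greedily by the number of additional
    // (not yet covered) mutations they detect.
    public class TestPrioritization {

        public static List<String> prioritize(Map<String, Set<String>> detectedMutations) {
            List<String> order = new ArrayList<>();
            List<String> remaining = new ArrayList<>(detectedMutations.keySet());
            Set<String> alreadyDetected = new HashSet<>();
            while (!remaining.isEmpty()) {
                String bestTest = null;
                int bestGain = -1;
                for (String test : remaining) {
                    Set<String> additional = new HashSet<>(detectedMutations.get(test));
                    additional.removeAll(alreadyDetected);
                    if (additional.size() > bestGain) {
                        bestGain = additional.size();
                        bestTest = test;
                    }
                }
                order.add(bestTest);
                alreadyDetected.addAll(detectedMutations.get(bestTest));
                remaining.remove(bestTest);
            }
            return order;
        }
    }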

8.3.2 Evaluation Results

We applied our approach to two revisions of JAXEN (revisions 1229 and 931). We randomly chose them among the revisions that had enough (≥ 5) tests that detect defects in future revisions. For each revision, we applied the different selection schemes and produced 100 different mutation sets for each scheme. Then, we checked the average failure detection ratios of the prioritizations that were produced for these sets.

Table 8.4 and Table 8.5 show the results for applying the different selection schemes on revision 1229 of JAXEN. The columns give the selection scheme that the prioritization is based on, and the rows give the percentage of tests considered. A value in the table depicts how many future faults are found on average by tests in the top x percent of a prioritization. The values that are better than the values for the random scheme are underlined, and statistically significant values are shown in bold face.

Table 8.4: Defects detected for pattern and location based schemes (for revision 1229).

Top x percent   Fix pattern   Location: Package   Location: Class   Location: Method   Random
10              0.39          0.75                0.53              0.68               0.58
20              0.85          1.45                0.67              0.87               1.00
30              1.39          2.61                1.30              2.54               1.95
40              2.28          3.68                2.07              3.93               3.24
50              3.02          4.47                2.80              4.78               4.14
60              3.82          5.13                3.84              5.59               4.59
70              4.64          5.63                5.84              6.58               5.32
80              5.58          6.29                6.27              6.62               6.40
90              8.95          8.63                8.66              9.31               8.85
100             10.00         10.00               10.00             10.00              10.00

For the detected defects, especially for the random prioritization, one would expect a linear distribution, i.e. for the top x percent, x/10 bugs are detected. However, all schemes perform worse. This is due to the setup of the evaluation: first, the mutations are chosen, and then the tests are prioritized according to these mutations. For the random scheme, it is the mutations used for the prioritization that are chosen randomly, not the tests. A test that detects a bug but detects few or no mutations is ranked low in all prioritization schemes. As there are such tests in this revision, the prioritization schemes perform worse than a linear distribution, which would be achieved by randomly choosing tests.

The results for the prioritization based on the fix pattern scheme (column 2 of Table 8.4) are consistently worse than the ones for the random scheme (last column of Table 8.4) for the top 10 to 80%, and 6 out of them are significantly worse. Two of the location based schemes, the package and the method based scheme (columns 3 and 5 of Table 8.4), produce mostly better results than the random scheme. For the scheme that is based on the package location (column 3 of Table 8.4), though, this is only statistically significant for the top 20 and 30% of the test cases. The scheme that is based on the method location of previous fixes (column 5 of Table 8.4) has statistically significantly better values for the top 30 to 70%. For the scheme that is based on the class location (column 4), this trend does not seem to hold, as it produces values that are worse than the ones for the random scheme.

Table 8.5: Defects detected for property based schemes (for revision 1229).

Top x     --------- Property ---------
percent   Flow      Change    Syntactic
 10       0.49      0.72      0.56
 20       0.94      1.31      1.13
 30       1.78      2.19      1.67
 40       2.79      3.11      2.62
 50       3.47      3.9       3.47
 60       4.21      4.44      4.19
 70       4.85      4.94      4.86
 80       5.65      6.06      5.76
 90       8.46      8.63      8.81
100      10.00     10.00     10.00

This effect seems counterintuitive and might be due to the nature of the tests. Tests that perform well on mutations in a package seem to be more successful in defect detection than tests that perform well on mutations in defect prone classes. More specific tests that perform well on mutations in defect prone methods are again better in defect detection. The test prioritizations generated for the property based techniques (columns 2 to 4 of Table 8.5), on the other hand, perform worse than the prioritization based on a random selection of mutations, except for the change property based scheme, which has better values for the top 10 to 30%. However, these values are not statistically significant.

In order to check whether these results hold throughout the history of the project, we also looked at an earlier revision. Table 8.6 and Table 8.7 show the corresponding results for revision 931. Note that for almost all prioritization schemes, all the defects are detected at 90%. This is due to tests that detect only very few mutations but no defect. Therefore, they are always ranked very low by the different prioritizations. The prioritization based on the fix pattern type performs better for the top 10 and 20% but worse for the rest. The location based schemes, which perform best for revision 1229, do not perform well on this revision. Only three values are better than the ones for the random scheme, but none is statistically significant. Among the property based prioritization schemes, the scheme based on the flow property of fixes performs best. The top 10 to 30% perform better than the random scheme, but again, this is not statistically significant. From these results, we can conclude that:

Table 8.6: Defects detected for pattern and location based schemes (for revision 931).

Top x     Fix        ---------- Location ----------
percent   pattern    Package    Class    Method     Random
 10       1.72       1.12       1.47     1.33       1.44
 20       2.73       2.35       2.64     2.61       2.71
 30       3.28       2.97       3.09     3.2        3.41
 40       3.47       3.35       3.25     3.69       3.82
 50       3.77       3.65       3.53     3.99       4.05
 60       4.17       4.13       4.08     4.45       4.32
 70       4.44       4.52       4.66     4.68       4.59
 80       4.80       4.83       4.84     4.84       4.86
 90       5          5          5        5          5
100       5          5          5        5          5

Random mutation selection schemes cannot be outperformed by selection schemes based on defect history.

8.4 Threats to Validity

Like any empirical study, this study is limited by threats to validity.

External validity As the evaluation was only carried out on one project, we cannot claim that the results carry over to other projects. However, the results may serve as a starting point for further investigation and discussion.

Internal validity Due to the randomness that goes into the selection schemes, they are affected by chance. To counter this threat, we produced 100 different mutation sets for each scheme, averaged the results, and compared them to the average of 100 randomly chosen sets. Furthermore, we followed rigorous statistical procedures to evaluate the results.

Construct validity We learned about past defects from fixes. Although we used three different methods to mine fixes, some fixes, and thereby some data about past defects, are missed.

Table 8.7: Defects detected for property based schemes (for revision 931).

Top x     --------- Property ---------
percent   Flow      Change    Syntactic
 10       1.76      1.35      1.41
 20       2.95      2.44      2.65
 30       3.55      3.13      3.02
 40       3.68      3.44      3.39
 50       3.92      3.66      3.76
 60       4.15      4.17      3.93
 70       4.49      4.6       4.43
 80       4.79      4.90      4.73
 90       5         5         4.98
100       5         5         5

For our evaluation, we had to make several approximations that may have influenced the results. The test cases to detect (kill) mutations were chosen from existing test cases rather than writing new ones. This created a bias towards the type of tests, as some mutations are not detected, and tests that covered large parts of the program were preferred. In addition, we only evaluate against defects that were detected by tests that existed in an earlier revision. This provided only a few defects and created a bias towards the type of defect.

8.5 Related Work

8.5.1 Mining Software Repositories

Software repositories have been mined for various purposes, among others to suggest and predict further changes [108], to predict defects [46], and to infer API usage patterns [53].

The DynaMine tool presented by Livshits and Zimmermann [53] combines software repository mining and dynamic analysis techniques. In their approach, they mine the commits for usage patterns, and then they use dynamic analysis to validate the patterns and to find violations. Similar to our approach, they also use a combination of repository mining and dynamic analysis techniques, but for a different purpose. They try to detect defects in the form of pattern violations, while we aim to improve test suites through calibrated mutation testing.

Pan et al. [78] studied the bug fix patterns in seven open-source projects and came up with a categorization that covers up to 63.3% of all fixes made in a project. Their fix classification forms the basis for one of our calibration schemes.

To our knowledge, calibrated mutation testing is the first attempt to combine mutation testing and software repository mining.

8.5.2 Mutation Testing

In one of the first publications on mutation testing, DeMillo et al. [18] introduced the coupling effect that relates mutations to errors and is stated as follows:

Test data that distinguishes all programs differing from a correct one by only sim- ple errors is so sensitive that it also implicitly distinguishes more complex errors.

Offutt [66, 67] later provided evidence for the coupling effect with a study on higher order mutants. The results of our work can also be seen as support for the coupling effect. They indicate that tests detecting randomly chosen mutations are as effective as tests detecting calibrated mutations.

Mutation selection was first proposed by Mathur [57] as a way to reduce the costs of mutation testing by omitting the mutation operators that produce the most mutations. The goal of mutation selection is to produce a smaller set of mutations so that tests that are sufficient (they detect all mutations) for the smaller set are also (almost) sufficient for all mutations.

In an empirical study on 10 Fortran programs, Offutt et al. [69] showed that 5 out of 22 mutation operators are sufficient—test suites that detect all selected mutations also detect 99.5% of all mutations.

Barbosa et al. [6] did a similar study on C programs. They proposed 6 guidelines for the selection of mutation operators. In an experiment on 27 programs, they determined 10 out of 39 operators to produce a sufficient set with a precision of 99.6%.

A slightly different approach was taken by Namin et al. [61]. They tried to produce a smaller set of mutations that can be used to approximate the mutation score for all mutations. Using statistical methods, they came up with a linear model that generates less than 8% of all possible mutations, but accurately predicts the effectiveness of the test suite for the full set of mutations. In a recent study, Zhang et al. [107] compared these three different operator-based selection techniques to random selection techniques. They showed that all techniques performed comparably well, and that random selection techniques can outperform the best operator-based selection techniques. This is similar to our results, which show no significant difference between more sophisticated calibrated selection techniques and simple random techniques in terms of fault detection ratios for the corresponding test suites. The selective approaches aim at minimizing the number of mutations and approximating the results for all mutations, while our approach selects mutations calibrated to past defects in order to approximate future defects.

8.6 Summary

One application of mutation testing is to improve a test suite by analyzing undetected mutations and developing new tests that detect them. In this work, we investigated whether calibrating mutations to the defect history improves this process, as future defects are often similar to past defects. To this end, the inserted mutations were calibrated to different aspects of the defect history expressed through fixes made in the past. We proposed different calibration schemes that are based on the pattern type, the location, and on different properties of the fix. Then, we prioritized tests in such a way that the ones detecting most mutations for a selection scheme were ranked first. For the obtained prioritizations, we checked how many future defects they would detect. Although the results on one revision showed better values for some calibrated schemes than for random selection, we found no scheme in which this trend manifested for more revisions. Our results can, therefore, be seen in line with previous research by Zhang et al. [107]. They compared random selection schemes with different operator selection schemes, and in their evaluation random selection schemes were also not outperformed by more sophisticated schemes. Furthermore, our results might also be seen as support for the coupling effect, which states that tests that are sensitive enough to detect simple errors also detect complex errors. From our results, we can conclude that tests that detect randomly chosen mutations are as effective as tests that detect calibrated mutations.

Chapter 9

Comparison of Test Quality Metrics

During software testing, a developer is interested in writing good tests. In most cases a test is considered to be good when it has the potential to reveal defects in the code. Test coverage metrics can be used to assist in writing new tests. A developer can create new tests in such a way that the coverage level is increased, i.e. previously unsatisfied test requirements get satisfied by new tests. This process might help to produce better test suites than those obtained by unsystematically adding new tests. In this chapter, we investigate whether coverage metrics help to develop better test suites, i.e. whether there is a correlation between the coverage level of a test suite and its defect detection capability. If such a correlation exists, test suites reaching a higher coverage are more likely to detect defects, and fewer defects remain undetected by the test suite. Thus, developers can gain confidence that their program contains fewer defects if it exhibits no defect under a test suite with high coverage.

Another issue when testing software is the cost-effectiveness of a test generation strategy. Developing more tests also means higher cost in terms of development effort and computing resources. For example, it might be the case that test suites satisfying criterion C1 are more likely to detect defects than tests that satisfy criterion C2. However, producing a test suite that satisfies criterion C1 requires more effort than producing a test suite that satisfies C2, e.g. more tests are needed to satisfy C1 than C2. In cases like these, the better defect detection capabilities can also be caused by the larger size of test suites satisfying C1 compared to suites satisfying C2.


Therefore, one is interested in a metric where the coverage level correlates with defect detection independently of the test suite size. Even though such a correlation might exist, there is no guarantee that test suites reaching a high coverage level will detect defects. One reason for this is that there are many tests that satisfy a test requirement or detect a defect. From the universe of all possible tests, a specific fraction of tests does both: it satisfies a requirement and also detects a specific defect. The defect detection capability of a concrete test suite depends on the type of test chosen to satisfy a criterion, i.e. whether the test satisfies the requirement and detects the defect or only satisfies the requirement. Therefore, a correlation between defect detection and coverage can only give an indication of the likelihood of a test suite detecting potential defects. In the same way, subsumption relations do not carry over to the defect detection capability of a coverage metric. For example, it cannot be concluded that a criterion Ca correlates with defect detection when it subsumes a criterion Cb that does correlate with defect detection. This is because tests that satisfy requirements of Ca might have a different fraction of defect detecting tests than tests that satisfy requirements of Cb, although some tests satisfy the requirements imposed by both criteria.

9.1 Test Quality Metrics

In this work, we were interested in whether a high coverage level correlates with defect detection, i.e. whether there is a correlation for test suites between their coverage level with regard to a metric and their defect detection capability. In particular, we were interested in how mutation testing compares to traditional coverage metrics. Therefore, we generated several test suites for a set of subject programs and computed the mutation score and the coverage level for different metrics as well as the number of defects the test suites detect. The mutation score is defined as the number of detected mutants divided by the number of all mutants (see Chapter 3). We used the following coverage metrics to compare against mutation testing:

Statement Coverage that requires every statement to be executed.

Branch Coverage that requires every branch to be executed.

Modified Condition/Decision Coverage that requires assignments to all predicates so that each clause independently affects the outcome of its predicate.

Table 9.1: Description of the Siemens suite.

Project name     Lines of code   Number of test cases   Defects
print tokens     536             4130                    7
print tokens2    387             4115                   10
schedule         425             2650                    9
schedule2        766             2710                   10
tot info         494             1052                   23
replace          554             3155                   32
tcas             136             1608                   41

Loop Coverage that requires every loop construct to be executed zero times, once, or several times.

Section 2.3.1 contains a more detailed description of these metrics. For each coverage metric, we computed the coverage level of a test suite, which is the number of requirements satisfied by the test suite divided by the total number of requirements imposed by the coverage criterion.
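Written as ratios (the notation is ours and only introduced for clarity), the two measures used in this chapter are:

% M = set of (covered) mutants, M_d the mutants detected by test suite T;
% R_C = requirements imposed by criterion C, R_C^T those satisfied by T.
\[
  \mathit{MutationScore}(T) = \frac{|M_d|}{|M|}
  \qquad
  \mathit{CoverageLevel}_C(T) = \frac{|R_C^T|}{|R_C|}
\]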

9.2 Experiment Setup

In order to carry out our experiment, we needed subject programs that contain known defects and several tests that detect these defects. The Siemens test suite [19] is a set of programs that fulfills these requirements. It consists of seven different subjects that have in total 132 faulty versions and 19,420 different tests. For our experiments, we used a version adapted for JAVA by Masri et al. [56]. A description of the subjects can be found in Table 9.1.

For each of the seven subjects, we generated several test suites with varying sizes. To this end, we proceeded as follows: we first fixed the number of test cases a test suite should contain, and then randomly picked that many test cases from the pool of all test cases. This method was applied to generate 250 test suites for each test suite size from 1 to 500 test cases. Following this procedure, we obtained 125,000 test suites per subject. For each test suite, we computed the coverage metrics and how many defects it detects. We computed the mutation score with JAVALANCHE, which was presented in Section 4. To compute the coverage level for the different metrics, we used the CODECOVER tool developed at the University of Stuttgart.

Using the test suites' coverage levels and the number of defects they detected, we tested for a correlation between them at different test suite sizes. To test for a correlation, we used Spearman's rank correlation coefficient, because the data is not normally distributed.
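The sampling and correlation step can be sketched as follows. The code is illustrative rather than our actual experiment scripts, and it implements Spearman's coefficient as the Pearson correlation of average ranks.

import java.util.*;

// Illustrative sketch: sample fixed-size test suites from a pool and correlate a
// coverage value with the number of detected defects via Spearman's rho.
public class SuiteSampling {

    public static int[] sampleSuite(int poolSize, int suiteSize, Random rnd) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < poolSize; i++) ids.add(i);
        Collections.shuffle(ids, rnd);
        return ids.subList(0, suiteSize).stream().mapToInt(Integer::intValue).toArray();
    }

    /** Spearman's rank correlation: Pearson correlation computed on average ranks. */
    public static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }

    private static double[] ranks(double[] v) {
        Integer[] idx = new Integer[v.length];
        for (int i = 0; i < v.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> v[i]));
        double[] r = new double[v.length];
        for (int i = 0; i < v.length; ) {
            int j = i;
            while (j + 1 < v.length && v[idx[j + 1]] == v[idx[i]]) j++;
            double avgRank = (i + j) / 2.0 + 1;            // average rank for tied values
            for (int k = i; k <= j; k++) r[idx[k]] = avgRank;
            i = j + 1;
        }
        return r;
    }

    private static double pearson(double[] x, double[] y) {
        double mx = Arrays.stream(x).average().orElse(0);
        double my = Arrays.stream(y).average().orElse(0);
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < x.length; i++) {
            cov += (x[i] - mx) * (y[i] - my);
            vx += (x[i] - mx) * (x[i] - mx);
            vy += (y[i] - my) * (y[i] - my);
        }
        return cov / Math.sqrt(vx * vy);
    }
}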

9.3 Results

The results of our experiments are shown in Figures 9.1, 9.2, 9.3, and 9.4. The correlation between the coverage level of a test suite and its defect detection capability for test suite sizes from 1 to 15 test cases is shown in Figure 9.1. There is a separate plot for each subject. The x-axis represents the test suite size and the y-axis the correlation coefficient. Different lines depict the development of the correlation coefficient for the different metrics. For almost all projects there is a weak correlation, mostly around 0.4, between the coverage level and the defect detection capability of a test suite. As the test suite sizes increase, the correlations decrease. These correlations are statistically significant for 5 out of 7 projects, i.e. the P-value, obtained by using a t-distribution, is below the significance level of 0.05. For schedule and schedule2 the correlation is only statistically significant for very small test suites. However, mutation testing performs better than the other metrics for these two projects. The correlation for mutation testing is statistically significant until the size of 10 for schedule, and for schedule2 it becomes significant at the size of 10. The only project for which the correlations of the coverage metrics do not decrease is print tokens2. Here the correlation for all metrics except for loop coverage increases with growing test suite size.

For 5 out of 7 subjects, mutation testing performs better than or is on par with the best coverage metric. For schedule2 the correlation is stronger for the coverage metrics at small test suite sizes, but at a test suite size of 5 the correlation is again stronger for mutation testing. For tot info the correlation is strongest for modified condition/decision coverage.

The mutation score shows the best correlation to defect detection for most projects at small test suite sizes.

[Figure 9.1 contains one panel per subject (print_tokens, print_tokens2, replace, schedule, schedule2, tcas, tot_info), each plotting the correlation between metric and detected defects (y-axis) against the test suite size (x-axis) for mutation, statement, branch, loop, and MC/DC coverage.]

Figure 9.1: Correlation between coverage level and defect detection for test suite sizes 1 to 15.

[Figure 9.2 contains, for the subjects print_tokens, print_tokens2, replace, and schedule, one panel plotting the correlation between metric and detected defects against the test suite size and one panel plotting the corresponding P-values.]

Figure 9.2: Correlation between coverage level and defect detection and corresponding P-values for test suite sizes 1 to 100.

[Figure 9.3 contains the same pair of panels (correlation and P-value against test suite size) for the subjects schedule2, tcas, and tot_info, with a shared legend for mutation, statement, branch, loop, and MC/DC coverage.]

Figure 9.3: Correlation between coverage level and defect detection and corresponding P-values for test suite sizes 1 to 100.

[Figure 9.4 contains one panel per subject plotting the correlation between metric and detected defects against test suite sizes from 1 to 500 for mutation, statement, branch, loop, and MC/DC coverage.]

Figure 9.4: Correlation between coverage level and defect detection for test suite sizes 1 to 500.

Figure 9.2 and Figure 9.3 show the same type of graphs as Figure 9.1, but for an interval from 1 to 100. In addition, for each project they also show a graph for the P-value, which determines the significance of the correlation. These graphs show the development of the P-value for the different metrics as the test suite size increases. The general trend we can observe is that the correlations decrease as the test suite size increases until there is no correlation (Spearman's rank correlation coefficient is zero). However, there are a few exceptions, and some metrics perform better than others before the correlation becomes insignificant. Mutation testing performs best for three projects (replace, schedule2, tcas) during the whole interval, i.e. it has the strongest correlation. For two other projects (print tokens2, schedule), it performs best at the beginning, and then at larger sizes another metric performs better. For example, mutation testing performs best until a test suite size of 60 for the print tokens2 subject, and then loop coverage performs better. For schedule, mutation testing performs best until a test suite size of 35, and then all criteria have a correlation of 0. When we look at the development of the P-values, we see that for most projects there is a significant correlation for all metrics at smaller test suite sizes. Most correlations, with a few exceptions, become insignificant as the test suites get bigger. For example, almost all correlations for the print tokens2 subject become insignificant for test suite sizes between 40 and 60, and only loop coverage retains a significant correlation. In cases where the correlation stays significant during the whole interval, we suspect a direct relation between a test requirement and a defect, i.e. all tests that fulfill the test requirement also detect a defect. The development of the correlations for test suite sizes from 1 to 500 test cases is shown in Figure 9.4. The trend that we observed for the interval from 1 to 100 also holds for the interval from 1 to 500. The correlation for all metrics decreases as the test suite size increases. However, there are two exceptions for the projects replace and tcas: a correlation for mutation testing remains until a test suite size of 500; again, this effect might be caused by a direct relation between a test requirement and a defect.

For certain projects the correlation between mutation testing and detected defects holds longer than for other coverage metrics.

9.4 Threats to Validity

For our experiments, we used the seven subjects from the Siemens suite. They contain hand-seeded defects, which might be different from defects made in other programs.

As the subjects are relatively small, other effects might be observed for small test suites on bigger programs. Thus, it is possible that the results do not generalize to other programs with a bigger size and different types of defects.

Moreover, our construction method for the test suites might not reflect how developers proceed. We chose tests from a pool of existing tests, while developers choose from the universe of all possible tests. This might introduce a bias regarding the type of tests, e.g. the tests might have a different fraction of defect revealing tests.

Furthermore, our implementation and the programs we use might contain defects. To control for this threat, we relied on public, well-established open-source tools wherever possible, and made our own JAVALANCHE framework publicly available as an open-source package to facilitate replication and extension of our experiments.

9.5 Related Work

Frankl and Weiss [25] compared the effectiveness of branch and data flow testing in terms of defect detection. In three experiments they compared all-uses coverage, branch coverage (all-edges coverage), and random selection of test cases. In a first experiment, they compared the defect detection rates of test suites that satisfy all-uses coverage or branch coverage, and randomly selected test suites. To this end, they built different test suites for nine small subjects, ranging from 22 to 78 lines of code, each containing several defects. For these test suites, it was checked whether they detect an error in the program and whether they satisfy one of the coverage criteria. Then, the ratio of test suites that detect at least one defect was compared for randomly selected test suites and test suites that satisfy a criterion. The results showed that all-uses adequate test sets were better than or equal to branch adequate test suites in all cases. This difference was statistically significant in 5 out of 9 cases. Both criteria were better than randomly selected test suites in all cases. For the all-uses adequate sets this was statistically significant in 6 cases, and for branch adequate test suites in 5 cases. This experiment, however, did not consider the size of the test suites. Test suites satisfying the all-uses criterion are typically larger than test suites satisfying branch coverage, and for both criteria the test suites were larger than an average randomly selected test suite. Therefore, in a second experiment the size of the test suites was fixed, so that test suites of the same size were compared. The results showed that significantly more all-uses coverage adequate test suites detect defects than randomly selected test suites in 4 out of 9 cases. Branch adequate test suites, however, did not perform significantly better than randomly selected test suites. In a last experiment, which is most similar to our approach, the dependency between the coverage level and the defect detection capability of a test suite was investigated. To this end, a regression model in terms of size and coverage level was built for each subject. The coverage level was considered to correlate with the defect detection capability if the coefficient of the term containing the highest power of the coverage level was positive and had a large magnitude. The investigation of the regression results showed that there is such a dependency in 4 out of 9 cases, for both all-uses coverage and branch coverage.

Frankl et al. [27] also compared the all-uses coverage criterion and mutation testing. For this comparison the same setup as in the previously described study was used. In their experiments the size of the test suite for each subject was fixed and the defect detection rates for different coverage levels were compared. The results indicated that mutation coverage is more effective than all-uses coverage for 7 out of 9 subjects, and all-uses coverage is more effective for 2 subjects.

In a later paper, Frankl and Iakounenko [26] compared the effectiveness of branch and data flow testing again, but on a different subject program. In this experiment they used the space program, a medium-sized C program with 5,905 lines of code that comes in 33 different versions, each containing one defect. Frankl and Iakounenko restricted their experiments to eight versions in which the defects were harder to detect, i.e. only a limited fraction of 5,000 randomly selected test cases detect the defect. The results showed that test sets reaching high all-uses coverage or high branch coverage are more likely to detect a defect than a randomly chosen test suite of the same size, for all of the eight investigated versions. However, the defect detection rates of test suites reaching a high coverage were still relatively low for most of the subjects.

In comparison to our approach, Frankl et al. [25, 27, 26] did not consider the number of defects in their experiments; they just checked whether a test suite detects any defect at all. Furthermore, they did not control for the size of the test suites in many of their experiments.

Hutchins et al. [43] carried out an experiment similar to the first experiment of Frankl and Weiss. In their work, they also compared data flow and control flow based coverage criteria in terms of their defect detection capability. As subject programs they used 7 C programs with hand-seeded defects, which also form the basis for the programs used in our study. For each program they generated a test pool and built several test suites of varying sizes. Then, they compared the defect detection capabilities of the test suites that lie in specific coverage intervals, for branch coverage and definition-use coverage. The results showed that both coverage criteria performed better than randomly selected test suites for high coverage levels, i.e. when the coverage level was above 93%. Between the two criteria there was no significant difference. In contrast to our approach, Hutchins et al. used a different approach to generate test suites. Their approach is focused on the coverage level, while we randomly generate test suites of a specific size. For their experiments, they built test suites by incrementally adding new tests from the pool, but they only added tests that increase the coverage and pruned tests that do not increase it. This process was repeated until a specific test suite size or maximum coverage was reached.

Andrews et al. [4] investigated whether mutation testing can be used to predict the effectiveness of a test suite in detecting real-world defects. Among other questions, they thus investigated whether mutation scores are a good predictor of the defect detection ratio. In an experiment, they used the space program, which was also used by Frankl and Iakounenko. For a first experiment, they compared the average ratio of detected defects with the average ratio of detected mutants for test suites generated to fulfill several coverage criteria. The results indicate that the difference between these ratios is small. Therefore, it is concluded that the mutation score can be used as a predictor for the defect detection capability of a test suite. This finding is then used as a basis to investigate further questions regarding the cost-effectiveness of test suites generated according to specific coverage criteria. Compared to our approach, Andrews et al. used the similarity of defect and mutation detection ratios of test suites as a basis for further experiments that involve other coverage metrics, while we compare different coverage metrics directly to mutation testing.

9.6 Summary

In this chapter, we investigated whether the defect detection capability of a test suite correlates with its coverage level with regard to a specific criterion. Our results showed that a weak correlation exists at small test suite sizes for almost all metrics. In most cases mutation testing performed best. At larger test suite sizes, most of these correlations become statistically insignificant, or there is no measurable correlation anymore. This effect is due to the coverage level these test suites reach. Because our subject programs are small, a high coverage for most metrics can be reached with few tests. This coverage level then rarely increases when more tests are used. Thus, we expect a slightly different effect for larger programs. There might also be the trend that the correlations decrease with increasing test suite size, but this effect might start later at bigger test suite sizes, because more tests are needed to reach a high coverage level.

For a few projects and few metrics, however, we observed correlations that lasted during the whole observed interval. In these cases, we suspect a direct relation between the defect and the test requirement, i.e. all tests that satisfy the requirement also detect the defect.

Chapter 10

Checked Coverage

The previous chapters were focused on mutation testing, a technique to assess and improve the quality of the inputs and checks of a test suite. In this chapter, we propose a new alternative coverage metric that also assesses the quality of a test suite's checks. Coverage criteria are the most widespread metrics to assess test quality. Test coverage criteria measure the percentage of code features such as statements or branches that are executed during a test. The rationale is that the higher the coverage, the higher the chances of catching a code feature that causes a failure. This rationale is easy to explain, but it relies on an important assumption: that we are actually able to detect the failure. It does not suffice to cover the error; we also need a means to detect it. When it comes to detecting arbitrary errors, it does not make a difference whether an error is detected by the test (e.g. a mismatch between actual and expected result), by the program itself (e.g. a failing assertion), or by the runtime system (e.g. a dereferenced null pointer). To validate computation results, though, we need checks in the test code and in the program code, checks henceforth summarized as oracles. A high coverage does not tell anything about oracle quality. It is perfectly possible to achieve 100% coverage and still not have any result checked by even a single oracle. The fact that the runtime system did not discover any errors indicates robustness, but does not tell anything about functional properties. As an example of such a mismatch between coverage and oracle quality, consider the test for the PatternParser class from the JAXEN XPATH library, shown in Figure 10.1.

133 134 CHAPTER 10. CHECKED COVERAGE

This test invokes the parser for a number of paths and achieves a statement coverage of 83% in the parser. However, none of the parsed results are actually checked for any property. The parser may return complete nonsense, and this test would never notice it.

public void testValidPaths()
        throws JaxenException, SAXPathException {
    for (int i = 0; i < paths.length; i++) {
        String path = paths[i];
        Pattern p = PatternParser.parse(path);
    }
}

Figure 10.1: A test without outcome checks.

To assess oracle quality, a frequently proposed approach is mutation testing. Mutation testing seeds artificial errors (mutations) into the code and assesses whether the test suite finds them. A low score of detected mutants implies low coverage (i.e. the mutant was not executed) or low oracle quality (i.e. its effects were not checked). Unfortunately, mutation testing is costly; even with recent advancements, there is still manual work involved to weed out equivalent mutants.

In this work, we introduce an alternative, cost-efficient way to assess oracle quality. Using dynamic slicing, we determine the checked coverage: statements which were not only executed, but which actually contribute to the results checked by oracles. In Figure 10.1, the checked coverage is 0% because none of the results ever flows into a runtime check (again, in contrast to the 83% traditional coverage). However, adding a simple assertion that the result is non-null already increases the checked coverage to 65%; adding further assertions on the properties of the result increases it even more.
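As an illustration, a strengthened variant of the test from Figure 10.1 could look as follows. The added assertions are ours, and the commented-out property check assumes a getText() accessor on Pattern, which may differ in the actual JAXEN API.

public void testValidPaths() throws JaxenException, SAXPathException {
    for (int i = 0; i < paths.length; i++) {
        String path = paths[i];
        Pattern p = PatternParser.parse(path);
        // Even this minimal oracle lets the parsing computation flow into a check:
        assertNotNull(p);
        // Further (illustrative) property checks would raise checked coverage even more, e.g.:
        // assertEquals(path, p.getText());
    }
}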

Using checked coverage as a metric rather than regular coverage brings significant advantages:

• Few or insufficient oracles immediately result in a low checked coverage, giving a more realistic assessment of test quality.

• Statements that are executed, but whose outcomes are never checked, would be considered uncovered. As a consequence, one would improve the oracles to actually check these outcomes.

 1. static int max(int a, int b) {
 2.     int maxVal;
 3.     countCalls++;
 4.     if (a > b) {
 5.         maxVal = a;
 6.     } else {
 7.         maxVal = b;
 8.     }
 9.     return maxVal;
10. }

11. public void testMax() {
12.     assertEquals(5, max(5, 4));
13. }

(The arrows in the original figure mark the data and control dependencies between these statements.)

Figure 10.2: Dynamic data and control dependencies as used for checked coverage.

• Rather than focusing on executing as much code as possible, developers would focus on checking as many results as possible, catching more errors in the process.

• To compute checked coverage, one only needs to run the test suite once with a constant overhead, which is far more efficient than the potentially unlimited number of executions induced by mutation testing.

10.1 Checked Coverage

Our concept of checked coverage is remarkably simple: rather than computing coverage, the extent to which code features are executed in a program, we focus on those code features that actually contribute to the results checked by oracles. For this purpose, we compute the dynamic backward slice of the test oracles, i.e. all statements that contribute to the checked result. This slice then constitutes the checked coverage.

More formally, a program slice can be defined as the set of statements that influence the variables used at a given location, which is obtained by transitively following the control and data dependencies (for more details on program slicing see Section 2.4.6). Slices can be computed statically or dynamically. A static program slice consists of all statements that potentially influence the variables at a given point, while a dynamic program slice only consists of the statements that actually influence the variables during a concrete program run. We have chosen dynamic slices for our approach because we use the slices to measure the quality of a concrete execution of the test suite.

Furthermore, we consider all occurrences of a statement during a program run for a dynamic slice. This is done by computing the union of the dynamic slices from every occurrence of a statement.

Figure 10.2 shows a test that exercises the max method, similar to the slicing examples (Figure 2.1 and Figure 2.2) in Section 2.4.6. To compute the checked coverage for this example, the dynamic backward slice from the call to assertEquals (statement 12) is built, and only the statements that are on this slice are considered to be covered.

10.1.1 From Slices to Checked Coverage

For the checked coverage, we are interested in the ratio of statements that contribute to the computation of values which are checked later by the test suite. To this end, we first identify all statements which check the computation inside the test suite. Then, we trace one run of the test suite, and compute the dynamic slice for all these statements. This gives us the set of statements that have a control or data dependency to at least one of the check statements. The checked coverage is then the percentage of the statements in the set relative to all coverable statements.

In order to define checked coverage in terms of a test requirement as introduced in Section 2.3, we define the set of statements that can be on a slice. Some statements of a program can never show up on a slice because there exists no data or control dependency from and to them. Therefore, for checked coverage only the statements that read or write variables or that can manipulate the control flow of the program are considered.

Definition 37 (Sliceable Statements) For a program P, the set of sliceable statements SL(P) consists of all statements that read or write a variable or manipulate the control flow.

With the help of this set we can define the test requirements of checked coverage.

Definition 38 (Checked Coverage) Checked coverage requires each sliceable statement s ∈ SL(P) to be on at least one backward slice from an explicit check of the test suite.
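Definition 38 can equivalently be written as a ratio; the notation is ours, with Checks(T) denoting the explicit checks of test suite T and DynSlice(c) the union of the dynamic backward slices computed from check c over the run of T:

\[
  \mathit{CheckedCoverage}(T, P) =
  \frac{\bigl|\, SL(P) \cap \bigcup_{c \in \mathit{Checks}(T)} \mathit{DynSlice}(c) \,\bigr|}{|SL(P)|}
\]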

Checked coverage subsumes statement coverage because in order to reach full checked coverage every statement has to be on a dynamic slice, and thereby, it also has to be executed at least once. Thus, every test suite that reaches full checked coverage also reaches full statement coverage.

Note that in general checked coverage does not subsume all definitions coverage. There are two reasons for this. First, statements that manipulate the control flow can also define variables. Thus, they can end up on a slice because of control dependencies although their data dependencies are not exercised. Second, uses can also happen inside the test suite. In this case a definition is on a slice although no use of it in the program is exercised. If we account for these two special cases, i.e. we do not allow definitions in control flow manipulating statements and we consider uses inside the test suite, checked coverage subsumes all definitions coverage. This is because a variable that is defined in a statement is always used when the statement appears on a slice, as the only way it can end up on a slice is via a data dependency, which implies a use of the variable. Therefore, every test suite that satisfies checked coverage also satisfies all definitions coverage, under the premise that uses in the test suite are included and no definitions are made in control flow manipulating statements.

However, the main goal of checked coverage is to ensure that computations are also checked, and not just exercised. In order to find computations that are not checked, a developer can compare the level of statement coverage for a unit to the level of checked coverage. A high statement coverage level but a low checked coverage level for a unit indicates that it is exercised by the test suite, but the results are not checked. A developer can tell precisely which computations are not checked by looking at the statements which are considered as covered by statement coverage but not by checked coverage.

Notice that checked coverage is focused on one type of checks: explicit checks of the test suite. Implicit checks of the runtime system or explicit checks in the program are not considered by checked coverage. The different checks, however, vary in the type of properties they check and how thoroughly they check. While explicit checks of the test suite verify the result for one concrete run and can make very detailed assumptions, the other checks can only examine generic properties that hold during all possible runs. Because the explicit checks of the program and the implicit checks are generic, they lend themselves well to static verification approaches, e.g. for many objects it can be proved that they never become null [24, 79]. In general, explicit checks of the test suite cannot be efficiently proved because they make more complex assumptions. We believe that it is important for a test suite to have strong explicit checks, because the computed results should be checked in detail.

10.1.2 Implementation

For the implementation of our approach, we use Hammacher’s JAVASLICER [33] as a dynamic slicer for JAVA. The slicer works in two phases: In the first phase, it traces a program and produces a trace file, and in the second one, it computes the slice from this trace file.

The tracer manipulates the JAVA bytecode of a program by inserting special tracing statements. At runtime, these inserted statements log all definitions and references of a variable, and the control flow jumps that are made. By using the JAVA agent mechanism, all classes that are loaded by the JAVA Virtual Machine (JVM) can be instrumented. There are a few exceptions, though, that are explained in Section 10.3. Since JAVA is an object-oriented language, the same variable might be bound to different objects. The tracer, however, needs to distinguish these objects. For that purpose, the slicer assigns each object a unique identifier. The logged information is directly written to a stream that compresses the trace data using the SEQUITUR [64, 63] algorithm and stores it to disk.

To compute the slices, an adapted algorithm from Wang and Roychoudhury [100, 101] is used. The data dependencies are calculated by iterating backwards through the trace. For the control dependencies, the control flow graph (CFG) for each method is built, and a mapping between a conditional statement and all statements it controls is stored. With this mapping, all the control dependent statements for a specific occurrence of a statement can be found.

Our implementation computes the checked coverage for JUNIT test suites, and works in three steps:

1. First, all checks and all coverable statements are identified. We use the heuristic that all calls to a JUNIT assert-method from the test suite are considered as checks. As coverable lines, we consider all lines that are handled by the tracer. Note that this excludes statements such as try or simple return statements, as they do not translate to data or control flow manipulating statements in the bytecode.

2. Second, all test classes are traced separately. This is a performance optimization, since we observed that it is more efficient to compute slices for several smaller trace files than to compute them for one big trace file.

3. Finally, a union of all the slicing criteria that correspond to check statements is built, since the slicer supports building a slice for a set of slicing criteria. By merging the slices from the test classes, we obtain a set of all statements that contribute to checked values. The checked coverage score is then computed by dividing the number of statements in this set by the previously computed number of coverable statements (see the sketch below).
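The following sketch illustrates this last step. It is not JAVASLICER's actual API, but shows how the per-test-class slices could be merged and turned into a score.

import java.util.*;

// Illustrative sketch of step 3: merge the per-test-class slices and divide by the
// number of coverable statements.
public class CheckedCoverageScore {

    /** Each set contains statement ids that lie on a backward slice from some assert call. */
    public static double compute(List<Set<Integer>> slicesPerTestClass, Set<Integer> coverable) {
        Set<Integer> checkedStatements = new HashSet<>();
        for (Set<Integer> slice : slicesPerTestClass) {
            checkedStatements.addAll(slice);               // union of all slices
        }
        checkedStatements.retainAll(coverable);            // restrict to coverable statements
        return (double) checkedStatements.size() / coverable.size();
    }
}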

10.2 Evaluation

In the evaluation of our approach, we were interested in whether checked coverage can help in improving existing test suites of mature programs. To this end, we computed the checked coverage for seven open-source programs that have undergone several years of development and come with a JUNIT test suite. We manually analyzed the results, and found examples where the test suites can be improved to more thoroughly check the computation results (see Section 10.2.2). We also detected some limitations of our approach, summarized in Section 10.3.

Furthermore, we were interested in how sensitive our technique is to oracle decay, that is, oracle quality artificially reduced by removing checks. In a second, automated experiment (Section 10.2.3), we removed a fraction of the assert-statements from the original test suites and computed the checked coverage for these manipulated test suites. This setting also allowed us to compare checked coverage against other techniques that measure test quality, such as statement coverage and mutation testing.

10.2.1 Evaluation Subjects

As evaluation subjects, we used seven open-source projects that were presented in Section 4.2 and in Table 4.2. Table 10.1 gives the results of computing the checked coverage (column 2), statement coverage (column 3), and the mutation score (column 4) for our subject projects. The statement coverage values are between 70% and 90% for all projects except for ASPECTJ and BARBECUE. For ASPECTJ the results are lower because we only used a part of the test suite in order to run our experiments in a feasible amount of time. Although we only computed the coverage of the module that corresponds to the test suite, test suites of other modules might also contribute to the coverage of the investigated module. For BARBECUE, we had to remove tests that address the graphical output of barcodes, as we ran our experiments on a server that has no graphics system installed, which causes these tests to fail. Consequently, these parts are not covered.

Table 10.1: Checked coverage, statement coverage, and mutation score.

Project name    Checked       Statement     Mutation
                coverage %    coverage %    score %
ASPECTJ         13            38            63
BARBECUE        19            32            66
COMMONS-LANG    62            88            86
JAXEN           55            78            68
JODA-TIME       47            71            83
JTOPAS          65            83            73
XSTREAM         40            77            87
Average         43            67            75

In all projects, checked coverage is lower than regular coverage, with an average difference of 24%. With 37%, this difference is most pronounced for XSTREAM. This is due to a library class that directly manipulates memory and is used by XSTREAM in a central part. As this takes place outside of JAVA in native code, some dependencies cannot be traced by the slicer, which leads to statements not being considered for checked coverage although they should be. The traditional definition of the mutation score is the number of detected mutations divided by the total number of mutations. In our setting, we only consider the score for covered mutations (last column). These values are also lowest for ASPECTJ and BARBECUE because of the reasons mentioned earlier.

10.2.2 Qualitative Analysis

In our first experiment, we were interested in whether checked coverage can be used to improve the oracles of a test suite. We computed checked and statement coverage for each class individually. Then, we manually investigated those classes with a difference between checked and regular coverage, as this indicates code that is executed without checking the computation results. In the introduction, we have seen such an example for a test of PatternParser, a helper class for parsing XSLT patterns, from the JAXEN project (Figure 10.1). The corresponding test class calls the parse() method with valid inputs (shown in Figure 10.1) and invalid inputs, and passes when no exception or an expected exception is thrown. The computation results of the parse() method, however, are not checked. Consequently, this leads to a checked coverage of 0%.

boolean checkCreateNumber(String val) {
    try {
        Object obj = NumberUtils.createNumber(val);
        if (obj == null) {
            return false;
        }
        return true;
    } catch (NumberFormatException e) {
        return false;
    }
}

Figure 10.3: Another test with insufficient outcome checks.

Another example of missing checks are the tests for the NumberUtils class of the COMMONS-LANG project. Some statements of the isAllZeros() method, which is indirectly called by the createNumber() method, are not checked although they are covered by the tests. The test cases exercise these statements via the checkCreateNumber() method shown in Figure 10.3. This method calls createNumber() and returns false when null is returned or an exception is thrown, and true otherwise. The result of createNumber(), however, is not adequately checked; it is only checked whether the result is not null. Adding a test that checks the result of createNumber() would include the missing statements in the checked coverage.
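Such a test could, for instance, look as follows. The expected values are illustrative and merely reflect createNumber()'s documented behavior of returning the smallest suitable numeric type.

// Illustrative additional test: check the value produced by createNumber()
// rather than only whether a value was produced at all.
public void testCreateNumberChecksResult() {
    assertEquals(Integer.valueOf(12345), NumberUtils.createNumber("12345"));
    // A value beyond the int range should be promoted to a Long:
    assertEquals(Long.valueOf(12345678901L), NumberUtils.createNumber("12345678901"));
}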

public String next() throws TokenizerException {
    nextToken();
    return current();
}

Figure 10.4: A method where the return value is not checked.

Figure 10.4 shows the next() method from the AbstractTokenizer class of the JTOPAS project. Although this method is executed several times by the test suite, its return value is never checked, and it is consequently reported as missing from checked coverage. This means that the method could return any value and the test suite would not fail. In the same way as for the previous examples, adding an assertion which checks the return value properly solves this problem. These examples illustrate how developers could use checked coverage in order to detect computations not checked by the test suite, and how to improve the test suite by introducing additional checks. Furthermore, these examples demonstrate that mature test suites, which have undergone several years of development, do not check all the computations that are covered by their inputs.
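A corresponding check could, for instance, compare the returned token images against the expected tokens. The fixture below is hypothetical: we assume a tokenizer initialized with the input "first second" elsewhere in the test class, and the expected token images are assumptions, not taken from the JTOPAS test suite.

// Illustrative check on next()'s return value.
public void testNextReturnsTokenImages() throws TokenizerException {
    assertEquals("first", tokenizer.next());
    assertEquals("second", tokenizer.next());
}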

[Figure 10.5 contains one bar chart per subject (AspectJ, Barbecue, Commons-Lang, Jaxen, Joda-Time, Jtopas, XStream); each chart shows statement coverage, checked coverage, and mutation score for test suites with 0%, 25%, 50%, 75%, and 100% of the assertions disabled.]

Figure 10.5: Coverage values for test suites with removed assertions.

Mature test suites miss checks.

10.2.3 Disabling Oracles

In our first experiment, we have seen cases where a test suite covers parts of the program, but does not check the results well enough. As discussed earlier, this can be detected by checked coverage. In our second experiment, we explored the questions:

• How sensitive is checked coverage to missing checks?

• How does checked coverage compare to other metrics that measure the quality of a test suite?

To this end, we took the original JUNIT test suites of the seven projects, and produced new versions with decayed oracles, i.e., we systematically reduced oracle quality by removing a number of calls to assert methods. In JUNIT, assert methods constitute the central means for checking computation results; therefore, removing calls to assert methods means disabling checks and thus reducing oracle quality.

To disable a call to an assert method in the source code, we completely removed the line that contains the call. In some cases, we had to remove additional statements, as the new test suites were not compilable anymore, or failed because they relied on the side effects of the removed assert methods. After the test suite could be successfully compiled and had no failing tests anymore, we computed the coverage metrics for each of the test suites.

An alternative way of removing calls to assert methods is to disable the method calls at the bytecode level. The advantage of this approach would be that just the check is removed, and no recompilation step would be necessary. However, we chose to remove complete source code lines because we believe that this more closely simulates the scenario of a developer forgetting to write a check.
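A much simplified sketch of this source-level oracle decay is shown below. The real setup additionally removed statements that depended on the deleted assert calls and recompiled the test suite afterwards, and the heuristic for recognizing assert calls is ours.

import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;

// Simplified sketch: drop a given fraction of the lines that call a JUnit assert method.
public class AssertRemover {

    public static void removeAsserts(Path testFile, double fraction, java.util.Random rnd)
            throws IOException {
        List<String> lines = Files.readAllLines(testFile);
        List<String> kept = lines.stream()
                .filter(line -> !(isAssertCall(line) && rnd.nextDouble() < fraction))
                .collect(Collectors.toList());
        Files.write(testFile, kept);
    }

    private static boolean isAssertCall(String line) {
        return line.trim().matches("(assert\\w+|fail)\\s*\\(.*");   // crude heuristic
    }
}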

The results are given in Figure 10.5. For each of our subject programs there is a plot that shows the statement coverage value, checked coverage value, and mutation score for a test suite with all assertions enabled (0% removed), and with 25, 50, 75, and 100% of the assertions removed, respectively.

For almost all projects, all metric values decrease with a decreasing number of assertions. An exception is ASPECTJ, where the statement coverage values stay constant for all test suites. This is due to the nature of the ASPECTJ test suite, which does not have any computations inside assertions. Furthermore, the checked coverage value and the mutation score are higher for the test suite with 50% of the assertions removed than for the suite with 25% removed. As we chose the disabled assertions randomly each time, some assertions that check more parts of the computation were disabled in the 25% test suite and not disabled in the 50% test suite. This also explains the higher statement coverage values for the 75% than for the 50% test suite for BARBECUE, and the difference in checked coverage between the 50% and 25% test suites for XSTREAM.

All test quality metrics decrease with oracle decay.

Figure 10.6: Decrease of the coverage values relative to the coverage values of the original test suite.

In order to compare the decrease of the different metrics, we computed the decrease of each metric relative to the value for the original test suite. Figure 10.6 shows the results for the seven subject programs. For statement coverage, the decrease values are the lowest for all projects. This comes as no surprise, as statement coverage is not designed to measure the quality of checks in a test suite. Thus, it is the metric least sensitive to missing assert statements.

Checked coverage and mutation score show a similar development for BARBECUE, COMMONS-LANG, JODA-TIME, and JTOPAS for 0 to 75% of removed checks. For the other projects, there is a greater decrease for the checked coverage than for the mutation score. On average, when 75% of the checks are removed, checked coverage decreases by 23%, whereas the mutation score only decreases by 14%.

Checked coverage is more sensitive to missing assertions than statement coverage and mutation testing.

Note that when all checks are removed, checked coverage drops to 0% for all projects. This is caused by the construction of the approach, as there are no statements left to slice from once all checks are removed from the test suite. Since we are interested in poor rather than nonexistent oracles, the results obtained for decays of 25 to 75% are much more meaningful.

10.2.4 Explicit and Implicit Checks

Since mutation testing also measures the quality of a test suite's checks, one might expect the mutation score to drop dramatically when no checks are left. However, in the previous experiment, we have seen that test suites with no assertions still detect a significant fraction of the mutants (43% on average). The reason for this is that a mutant can be detected by an explicit check of the test suite, an explicit check of the program, or an implicit check of the runtime system. For modern object-oriented and restrictive programming languages like JAVA, many mutants are detected through these implicit runtime checks.

JAVALANCHE, the mutation testing tool used in this study, uses the distinction made by JUNIT to classify a mutation as either detected by an explicit check of the test suite or by other checks. JUNIT classifies a test that does not pass either as a failure or as an error. A test that does not pass is classified as a failure when an AssertionError was thrown. This error is thrown by JUNIT assert methods when a comparison fails, and by the JAVA assert keyword when its condition evaluates to false. Hence, failures can be caused by explicit checks of the test suite or of the program. Our subject programs, however, do not use the assert keyword. Thus, all failures are caused by explicit checks of the test suite. A test that does not pass is classified as an error when any Throwable other than AssertionError is thrown. These exceptions can either be thrown explicitly inside the program or implicitly by the runtime system. Hence, errors can be caused by explicit checks of the program or implicit checks of the runtime system.
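The following sketch illustrates this classification with two invented tests; the described behavior is standard JUNIT, but the tests themselves are hypothetical.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class FailureVersusError {

    // Reported as a *failure* if the computation is wrong: assertEquals throws
    // an AssertionError, i.e. the mutant is caught by an explicit check of the
    // test suite.
    @Test
    public void detectedByExplicitCheck() {
        assertEquals(3, "abc".length());
    }

    // Reported as an *error* if a mutation makes the lookup return null: the
    // resulting NullPointerException is an implicit check of the runtime
    // system, not an explicit check of the test suite.
    @Test
    public void detectedByImplicitCheck() {
        java.util.Map<String, String> map = new java.util.HashMap<String, String>();
        map.put("key", "value");
        int length = map.get("key").length();   // NPE if a mutant drops the put()
    }
}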

In order to see how the fraction of mutants detected by explicit checks of the test suite (i.e., classified as failure) changes with decreasing oracle quality, we computed this fraction for the original test suite and for the test suite with all checks removed.

Table 10.2 gives the results for each project. The first two values are for the original test suite: first, the total number of detected mutants (column 2) is given, and then the percentage of those that are detected by explicit checks of the test suite (column 3). The remaining mutants are detected by other checks. The last two columns give the corresponding values for the test suite with all assert statements removed.

For the original test suite, 35 to 68% of the mutants are detected by explicit checks of the test suite, and consequently, 32 to 65% of the mutants are detected by other checks. Most of the mutants detected by other checks are detected through implicit checks of the runtime system, namely NullPointerExceptions caused by mutants.

Table 10.2: Mutations detected by explicit checks of the test suite.

                        Original test suite            All checks removed
Project             Mutants      Detected by        Mutants      Detected by
name                detected     explicit test      detected     explicit test
                                 suite checks                    suite checks

ASPECTJ                 5736         63%               5529         53%
BARBECUE                1036         59%                357          6%
COMMONS-LANG           13415         68%               3377         16%
JAXEN                   4154         38%               3298         12%
JODA-TIME              12164         56%               6094         15%
JTOPAS                  1295         57%                597          0%
XSTREAM                 7764         35%               5156          5%
Total                  45564         55%              24408         21%

Almost half (45%) of the mutants are not detected by explicit checks of the test suite.

The test suites with all checks removed still detect 54% of the mutants that are detected by the original test suite; on average, 21% of the detected mutants are found by explicit checks of the test suites. For test suites with all checks removed, one might expect 0% of the mutants to be detected by explicit checks. However, this 21% is caused by assertions in external libraries, which were not removed; examples include the fail() method of JUNIT and the verify() method of a mocking framework. One could argue that it does not matter how a mutation is found; after all, the important thing is that it is found at all. Keep in mind, though, that implicit checks are not under the programmer's control. A mutation score coming from implicit checks thus reflects the quality of the runtime checks rather than the quality of the test suite checks. Also, the runtime system will only catch the most glaring faults (say, null pointer dereferences or out-of-bounds accesses), but will not check any significant functional property. Therefore, having mutations fail only on implicit checks is an indicator of poor oracle quality, just like a low checked coverage.
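As a sketch of how such library checks survive our assertion removal, consider the following invented test. It contains no assert call, so our removal procedure leaves it untouched; Mockito is named here purely as an illustration of a mocking framework, as the dissertation does not name a specific one.

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import java.util.List;
import org.junit.Test;

public class LibraryCheckExample {

    @Test
    @SuppressWarnings("unchecked")
    public void elementIsStored() {
        List<String> store = mock(List.class);
        store.add("entry");        // in a real test, the code under test would make this call
        // No assert* call appears in this test, so nothing is removed by our
        // procedure; verify() still throws an AssertionError subclass if a
        // mutant prevents the call to add(), so the mutant counts as detected
        // by an explicit check of the test suite.
        verify(store).add("entry");
    }
}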

A test suite with no assertions still detects over 50% of the mutants detected by the original test suite; around 80% of these are detected by explicit checks of the program and implicit checks.

10.2.5 Performance

Table 10.3 shows the runtime for checked coverage and mutation testing. The checked coverage is computed in two steps. First, a run of the test suite is traced (Column 2); then, using the slicer, the checked coverage (Column 3) is computed from the previously produced trace file. Column 4 gives the total time needed to compute the checked coverage. For almost all projects, the slicing step takes much longer than tracing the test suite. XSTREAM, however, is an exception. Here the slicing takes less time because some of the central dependencies are not handled by the tracer (see Section 10.3). The last column gives the time needed for mutation testing the programs with JAVALANCHE.

Table 10.3: Runtime to compute the checked coverage and the mutation score.

Project             Checked coverage                      Mutation
name                Trace       Slice       Total         testing

ASPECTJ             0:08:51     0:35:26     0:43:17       20:18:38
BARBECUE            0:06:10     0:15:30     0:21:40        0:06:07
COMMONS-LANG        0:32:07     3:40:37     4:12:44        1:29:06
JAXEN               0:24:21     0:37:18     1:01:39        1:29:00
JODA-TIME           0:23:53     1:38:10     2:02:03        0:45:43
JTOPAS              0:04:04     0:05:32     0:09:36        0:41:25
XSTREAM             0:40:13     0:16:35     0:56:48        1:49:13

When we compare the total time needed to compute the checked coverage with the time needed for mutation testing, checked coverage is faster for four of our projects and mutation testing is faster for three of the projects. Keep in mind, though, that JAVALANCHE reaches its speed only through a dramatic reduction in mutation operators; full-fledged mutation testing requires a practically unlimited number of test runs.

In terms of performance, checked coverage is on par with the fastest mutation testing tools.

10.3 Limitations

In some cases, statements that contribute to the computation of results later checked by oracles are not considered for the checked coverage due to limitations of JAVASLICER or limitations of our approach.

Native code imposes one limitation on the tracer. In JAVA, it is possible to call native code written in other languages via the JAVA Native Interface (JNI). This code cannot be observed by the tracer, as the tracer only sees the bytecode of the classes loaded by the JVM. Regular programs rarely use this feature. In the JAVA standard library, however, there are many methods that use the JNI. Examples include the arraycopy() method of the java.lang.System class, or parts of the Reflection API. In these cases, the dependencies between the inputs and the outputs of the methods are lost. This limitation also caused the huge difference between normal coverage and checked coverage for the XSTREAM project. It uses the class sun.misc.Unsafe, which allows direct manipulation of memory, in a core part of its implementation. Therefore, many dependencies get lost and the checked coverage is lower than expected. Another limitation imposed by the tracer is that the String class is currently not handled. This class is used heavily in core classes of both the JVM and the tracer, which makes it difficult to instrument without running into circular dependencies. Handling this class would allow the slicer to detect dependencies that are currently missed.
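The following invented test illustrates the first limitation: the data flow from the source to the destination array is established inside the native implementation of System.arraycopy(), so it is invisible to the tracer.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class NativeCopyExample {

    @Test
    public void copyIsChecked() {
        int[] source = {1, 2, 3};
        int[] target = new int[3];
        // arraycopy() is implemented natively (JNI); the tracer only sees the
        // call itself, so the data flow from source[] into target[] is lost.
        System.arraycopy(source, 0, target, 0, 3);
        // The assertion checks target[1], but the statements that compute
        // source[] do not end up on the dynamic backward slice from this check.
        assertEquals(2, target[1]);
    }
}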

try {
    methodThatShouldThrowException();
    fail("No exception thrown");
} catch (ExpectedException e) {
    // expected this exception
}

Figure 10.7: A common JUNIT pattern to check for exceptions.

A frequently used practice to check for exceptions is to call a method under such circumstances that it should throw an exception, and to fail when no exception is thrown (Figure 10.7). When the exception is thrown, everything is fine, and nothing else is checked. In our setting, the statements that contribute to the exception are not on a slice. Thus, they do not contribute to the checked coverage. A remedy would be to introduce an assertion that checks for the exception (a sketch follows after Figure 10.8).

Another limitation of our approach is computations that lead to a branch not being taken. The code shown in Figure 10.8 contains a boolean flag inSaneState. Later, an exception is thrown when this flag has the value false. Thus, only computations that lead to the variable being set to false can be on a dynamic slice, and computations that lead to the variable being set to true will never be on a dynamic slice. Such problems are inherent to dynamic slicing, and would best be addressed by computing static dependencies to branches not taken.

private boolean inSaneState = true;
...
if (!inSaneState)
    exceptionThrowingMethod();
...

Figure 10.8: Statements that lead to not taking a branch.
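As a sketch of the remedy mentioned above for the pattern of Figure 10.7 (reusing its hypothetical names, and assuming the exception carries a message worth checking), one could assert on the caught exception:

try {
    methodThatShouldThrowException();
    fail("No exception thrown");
} catch (ExpectedException e) {
    // Asserting on the caught exception makes the check explicit; the throw
    // site and, via control and data dependencies, statements leading to it
    // can then appear on a dynamic backward slice from this check.
    assertEquals("expected message", e.getMessage());
}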

10.4 Threats to Validity

As with any empirical study, several limitations must be considered when interpreting its results.

External validity Can we generalize from the results of our study? We have investigated seven different open-source projects that differ in maturity, size, domain, and test quality. But even with this variety, it is possible that our results do not generalize to other, arbitrary projects.

Construct validity Are our measures appropriate for capturing the dependent variables? The biggest threat here is that our implementation could contain errors that might affect the outcome. To control for this threat, we relied on public, well-established open-source tools wherever possible; our own JAVALANCHE and JAVASLICER frameworks are publicly available as open-source packages to facilitate replication and extension of our experiments.

Internal validity Can we draw conclusions about the connections between independent and dependent variables? The biggest threat here is that we only use sensitivity to oracle decay as the dependent variable, rather than a more absolute "test quality" or "oracle quality". Unfortunately, there is no objective assessment of test quality to compare against. The closest would be mutation testing [3], but as our results show, even programs without oracles can still achieve a high mutation score by relying on uncontrolled implicit checks. With respect to internal validity, we are thus confident that sensitivity to oracle decay is the proper measure; a low checked coverage, therefore, correctly indicates a low oracle quality.

10.5 Related Work

10.5.1 Coverage Metrics

During structural testing, a program is tested using knowledge of its internal structures. Here, one is interested in the quality of the developed tests and in how to improve them in order to detect possible errors. To this end, different coverage metrics (see Section 2.3) have been proposed and compared against each other regarding their effectiveness in detecting specific types of errors, their relative costs, and the difficulty of satisfying them [42, 65, 104, 27]. Each coverage metric requires different items to be covered. This allows one to compute a coverage score by dividing the number of items actually covered by the number of coverable items.
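As a generic sketch, such a coverage score can be written as:

\[
\text{coverage} = \frac{|\text{items actually covered}|}{|\text{coverable items}|}
\]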

The best-known and easiest to compute is statement coverage. It simply requires each statement to be executed at least once. Because some defects can only occur under specific conditions, more complex metrics have been proposed. The popular branch coverage requires each decision to evaluate to both true and false at least once; condition coverage extends this requirement to the boolean subexpressions in control structures. A more theoretical metric is path coverage, which measures how many of the (normally infinitely many) possible paths have been followed. Similar to our approach, data flow testing criteria [83] also relate definitions and uses of variables. These techniques consider the relation between all defined variables inside the program and their uses. For example, the all-uses criterion requires that for each definition-use pair a path is exercised that covers this pair. In contrast, our approach is only targeted at uses inside the oracles. Other definition-use pairs are followed transitively from there.
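As a small invented illustration of such a definition-use pair:

// Invented example: the all-uses criterion requires, for every definition-use
// pair, a test path that covers the pair. Checked coverage instead starts only
// from uses inside oracles and follows other pairs transitively from there.
class DefUseExample {
    static int discount(int total) {
        int rate = (total > 100) ? 10 : 0;  // definition of rate
        return total * rate / 100;          // use of rate: together they form a def-use pair
    }
}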

Each of these proposed metrics just measures how well specific structures are exercised by the provided test inputs, and not how well the outputs of the program are checked. Thus, they do not assess the oracle quality of a test suite.

10.5.2 Mutation Testing

A technique that aims at checking the quality of the oracles is mutation testing (see Chapter 3). Originally proposed by Richard Lipton [73, 18], mutation testing seeds artificial defects, as defined by mutation operators, into a program and checks whether the test suite can distinguish the mutated from the original version. A mutation is supposed to be detected ("killed") if at least one test case that passed on the original program fails on the mutated version. If a mutation is not detected by the test suite, similar defects might be present in the program that have not been detected either. Thus, these undetected mutants can give an indication of how to improve the test inputs and checks. However, not every undetected mutant helps in improving the test suite, as it might also be an equivalent mutant; that is, a mutation that changes the syntax but not the semantics of a program. In our experiments, we have also seen that mutation testing measures the quality of all types of checks, explicit checks of the test suite and program, and implicit checks of the runtime system, while checked coverage is focused on explicit checks of the test suite.

10.5.3 Program Slicing

Static program slicing was originally proposed by Weiser [102, 103] as a technique that helps the programmer during debugging. Korel and Laski [47] introduced dynamic slicing, which computes slices for a concrete program run. Furthermore, different slicing variants have been proposed for program comprehension; Conditioned Slicing [54] is a mix between dynamic and static slicing: it allows some variables to have a fixed value while others can take all possible values. Amorphous Slicing [37] requires a slice only to preserve the semantics of the behavior of interest, while the syntax can be arbitrarily changed, which allows smaller slices to be produced.

Besides its intended use in debugging, program slicing has been applied to many different areas [97, 38]. Examples include minimization of generated test cases [50], automatic parallelization [34, 97], and the detection of equivalent mutants [39].

10.5.4 State Coverage

The concept closest to checked coverage is state coverage proposed by Koster and Kao [49]. It also measures the quality of checks in a test suite. To this end, all output defining statements (ODS) are considered. Output defining statements are statements that define a variable that can be checked by the test suite for a concrete run. The state coverage is defined as the number of ODS that are on a dynamic slice from a check divided by the total number of ODS. This differs from our approach, as we also consider statements that influence the computation of variables that are checked.
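In formula form, the definition given by Koster and Kao reads (a direct restatement of the sentence above):

\[
\text{state coverage} = \frac{|\{\text{ODS on a dynamic slice from a check}\}|}{|\{\text{ODS}\}|}
\]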

Furthermore, the number of ODS is dependent on the test input. For different inputs, different statements are considered as output defining. This can lead to cases where a test suite is improved by adding additional tests (with new inputs), but the state coverage drops. Checked coverage stays constant or improves in such cases.

Unfortunately, there is no broader evaluation of state coverage that we can compare against. In a first short paper [49], a proof of concept based on a static slicer and one small experiment are presented. A second short paper [48] describes an implementation based on taint analysis [13], but no experimental evaluation is provided.

10.6 Summary

In this chapter, we presented checked coverage. Checked coverage is a coverage metric that aims to measure how well the results of the program are checked by the test suite. This is in contrast to traditional metrics that only measure how well the test inputs exercise the program. Technically, checked coverage requires every statement to be on a dynamic backward slice from one of the explicit checks of the test suite.
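As a sketch, and under the assumption that the set of coverable statements is the same as for statement coverage, the resulting score can be written as:

\[
\text{checked coverage} = \frac{|\{\text{statements on a dynamic backward slice from an explicit check}\}|}{|\{\text{coverable statements}\}|}
\]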

We presented an implementation of checked coverage for JAVA that is based on the JAVASLICER by Hammacher. In a study on seven open-source programs, we have seen that even mature test suites miss checks, and that we can use checked coverage to detect them. A difference in the coverage level between checked coverage and statement coverage indicates computations that are not well checked.

In an experiment, we investigated the sensitivity of different metrics with regard to missing explicit checks. We compared checked coverage, mutation testing, and statement coverage. To this end, we removed a fraction of explicit checks from existing test suites and observed the change in the coverage levels of the different metrics. In comparison to the other metrics, checked coverage is most sensitive to missing assertions. Although the coverage level for the other metrics also decreases when explicit checks are removed, this effect is most pronounced for checked coverage.

Because mutation testing was designed to measure the quality of checks, we then investigated why mutation testing did not react as strongly to missing checks as checked coverage. The reason for this is that mutations can be detected by three types of checks. They can be detected by implicit checks of the runtime system, explicit checks in the program, and explicit checks in the test suite. In an experiment, we compared the detection rates of the different types of checks for the original test suites and test suites with all explicit checks removed. For the original test suites, the results indicated that 45% of the mutations were detected by implicit checks or explicit checks of the program. When we removed all checks, still 54% of the previously detected mutations were detected, and 79% of them by implicit checks of the runtime system or explicit checks of the program. The remaining 21% were due to external test libraries from which we did not remove explicit checks. Chapter 11

Conclusions and Future Work

The quality of checks plays an important role in a unit test's ability to detect defects. This is because a defect manifests itself during a program run through an infected state, which propagates and results in a failure. Good checks help to distinguish normal expected behavior from unexpected failures, and thereby to detect defects. Coverage metrics, which are the state of the art for assessing the quality of unit tests, only focus on the quality of test inputs. An alternative technique that measures the quality of test inputs and checks is mutation testing. However, it also has two drawbacks: it is computationally expensive, and equivalent mutants dilute the quality of its results. This work makes the following contributions to improve mutation testing, and presents a novel alternative metric to assess the quality of a test suite's checks:

• We introduced JAVALANCHE, a mutation testing framework for JAVA, that was developed with a focus on automation and efficiency. To enable efficient mutation testing, JAVALANCHE applies several optimization techniques, and in our experiments we have shown that it scales to real-life programs. JAVALANCHE is published as open-source, and has been the basis for further experiments presented in this work. Moreover, it has been successfully used by other researchers. For example, it has been extended for mutation testing of multithreaded code [29], used by an approach to automatically generate test oracles [28], used to assess the quality of test suites generated by students [1], and to compare different test generation strategies [92].


• We studied the extent of equivalent mutants on real-life programs, and the results show that equivalent mutants are a serious problem that effectively inhibits widespread usage of mutation testing. In a sample taken from seven JAVA programs, about 45% of the undetected mutants are equivalent, and classifying a mutant took on average about 15 minutes.

• We introduced impact measures as a method to separate equivalent from non-equivalent mutants. The impact of a mutation is the difference between a run of the test suite on the original program and a run on the mutated program. In this work, we presented impact metrics based on dynamic invariants, covered statements, and return values. Invariant impact measures how many invariants obtained from the original program are violated by a mutated version. Coverage and data impact measure the differences in the execution frequency of statements and in the return values of public methods between a run of the original program and a run of a mutant. The results indicate that all impact metrics are well suited to detect non-equivalent mutants, i.e., when a mutation has an impact, it is very likely that it is non-equivalent. However, in contrast to invariant impact, coverage and data impact are effective means to separate equivalent from non-equivalent mutants, i.e., when a mutation has no impact, it is more likely that it is equivalent.

• We presented checked coverage, an alternative method to mutation testing, to assess the quality of the test suite's checks. It measures which parts of the covered computation are actually checked by oracles. Technically, it requires each statement to be on at least one dynamic backward slice from an explicit check of the test suite. Our experiments show that checked coverage and mutation testing are sensitive to oracle quality, and that checked coverage is more sensitive to missing explicit checks in the test suite.

To sum up, this work advances the state of the art by introducing the first approach to detect non-equivalent mutants on larger programs, and by introducing a novel coverage metric that assesses the quality of the checks. These approaches can be further extended, and the following topics can be investigated:

Extensions to JAVALANCHE Currently, JAVALANCHE supports a limited set of mutation operators. For some scenarios, however, more mutation operators are needed. JAVALANCHE can be extended to support more mutation operators, and to apply mutations in source code. Some mutation operators can only be applied to source code. This is because there is not enough information available at the bytecode level to carry out these mutations. Examples include object-oriented

mutations that correspond to a single change at the source code level but several changes at the bytecode level. Although it is more costly to manipulate source code, because a recompilation step is needed, the advantage is that more mutation operators can be supported.

Alternative impact measures While we consider violations of dynamic invariants, and changes in coverage and return values, to be particularly useful predictors of failures, there are many ways to determine the impact of a change. One can measure impact in anything that characterizes a run, including different coverage criteria, sequences of executed methods [15], program spectra [84], or numerical ranges of data and increments [36]. Furthermore, several different impact metrics might be combined, as they measure different aspects of a program run that can be impacted by a mutation.

Include static analysis In this work, we presented dynamic analysis approaches to separate non-equivalent from equivalent mutants. We chose dynamic analysis because our main goal was to have a scalable technique that works on real-life programs. However, static analysis techniques can also detect equivalent mutants for these programs. If it can be statically proved that a mutated statement cannot be reached, that a mutation cannot change the program state, or that an infection cannot propagate, the mutation is equivalent. Our techniques would benefit from applying such analyses beforehand, because these mutations would not have to be considered for further investigation.

Extend checked coverage Checked coverage requires the tests to check each computation. The concept that computations exercising a structure have to be checked can be carried over to other coverage metrics. For example, branch coverage can be combined with the concept of checked coverage by requiring that the statement controlling the execution of a branch is on a dynamic slice both when the branch is taken and when it is not taken. Such metrics would impose more complex requirements on the inputs while ensuring that the computed results are checked. Another aspect that can be extended is the type of checks considered by checked coverage. Currently, only explicit checks of the test suite are considered. Checked coverage can be extended to also account for explicit checks of the program. This can be done by including the backward slices from assert statements in the code and from conditions that control branches leading to explicitly thrown exceptions.

Bibliography

[1] AALTONEN, K., IHANTOLA, P., AND SEPPÄLÄ, O. Mutation analysis vs. code coverage in automated assessment of students' testing skills. In SPLASH/OOPSLA '10: Companion to the 25th Conference on Object-Oriented Programming, Systems, Languages, and Applications (2010), pp. 153–160.

[2] AMMANN, P., AND OFFUTT, J. Introduction to Software Testing, 2 ed. Cambridge University Press, 2008.

[3] ANDREWS, J. H., BRIAND, L. C., AND LABICHE, Y. Is mutation an appropriate tool for testing experiments? In ICSE '05: Proceedings of the 27th International Conference on Software Engineering (2005), pp. 402–411.

[4] ANDREWS, J. H., BRIAND, L. C., LABICHE, Y., AND NAMIN, A. S. Using mutation analysis for assessing and comparing testing coverage criteria. Transactions on Software Engineering 32, 8 (2006), 608–624.

[5] BALL, T., AND LARUS, J. R. Efficient path profiling. In MICRO '96: Proceedings of the 29th International Symposium on Microarchitecture (1996), pp. 46–57.

[6] BARBOSA, E. F., MALDONADO, J. C., AND VINCENZI, A. M. R. Toward the determination of sufficient mutant operators for C. Software Testing, Verification and Reliability (STVR) 11, 2 (2001), 113–136.

[7] BEVAN, J., WHITEHEAD, JR., E. J., KIM, S., AND GODFREY, M. Facilitating software evolution research with Kenyon. In ESEC/FSE '05: Proceedings of the 10th European Software Engineering Conference held jointly with the 13th International Symposium on the Foundations of Software Engineering (2005), pp. 177–186.


[8] BOYAPATI, C., KHURSHID, S., AND MARINOV, D. Korat: Automated testing based on Java predicates. In ISSTA '02: Proceedings of the 13th International Symposium on Software Testing and Analysis (2002), pp. 123–133.

[9] BRUNETON, E., LENGLET, R., AND COUPAYE, T. ASM: A code manipulation tool to implement adaptable systems. In Proceedings of the Workshop on Adaptable and Extensible Component Systems (Systèmes à composants adaptables et extensibles) (2002).

[10] BUDD, T. A., AND ANGLUIN, D. Two notions of correctness and their relation to testing. Acta Informatica 18, 1 (1982), 31–45.

[11] BUDD, T. A., AND GOPAL, A. S. Program testing by specification mutation. Computer Languages 10, 1 (1985), 63–73.

[12] BUDD, T. A., LIPTON, R. J., DEMILLO, R., AND SAYWARD, F. The design of a prototype mutation system for program testing. In Proceedings of the 1978 International Workshop on Managing Requirements Knowledge (1978), pp. 623–627.

[13] CLAUSE, J. A., LI, W., AND ORSO, A. Dytan: A generic dynamic taint analysis framework. In ISSTA '07: Proceedings of the 16th International Symposium on Software Testing and Analysis (2007), pp. 196–206.

[14] DALLMEIER, V., LINDIG, C., WASYLKOWSKI, A., AND ZELLER, A. Mining object behavior with Adabu. In WODA 2006: Proceedings of the 2006 International Workshop on Dynamic Systems Analysis (2006), pp. 17–24.

[15] DALLMEIER, V., LINDIG, C., AND ZELLER, A. Lightweight defect localization for Java. In ECOOP '05: Proceedings of the 19th European Conference on Object-Oriented Programming (2005), pp. 528–550.

[16] DELAMARO, M. E., AND MALDONADO, J. C. Proteum: A tool for the assessment of test adequacy for C programs. In PCS '96: Proceedings of the Conference on Performability in Computing Systems (1996), pp. 79–95.

[17] DELAMARO, M. E., MALDONADO, J. C., AND MATHUR, A. P. Integration testing using interface mutations. In ISSRE '96: Proceedings of the 7th International Symposium on Software Reliability Engineering (1996), pp. 112–121.

[18] DEMILLO, R. A., LIPTON, R. J., AND SAYWARD, F. G. Hints on test data selection: Help for the practicing programmer. Computer 11, 4 (1978), 34–41.

[19] DO, H., ELBAUM, S., AND ROTHERMEL, G. Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact. Empirical Software Engineering 10, 4 (2005), 405–435.

[20] DOWSON, M. The Ariane 5 software failure. Software Engineering Notes 22, 2 (1997), 84.

[21] ELBAUM, S., GABLE, D., AND ROTHERMEL, G. The impact of software evolution on code coverage information. In ICSM '01: Proceedings of the 17th International Conference on Software Maintenance (2001), pp. 170–179.

[22] ELLIMS, M., INCE, D., AND PETRE, M. The Csaw C mutation tool: Initial results. In Mutation '07: Proceedings of the 3rd International Workshop on Mutation Analysis (2007), pp. 185–192.

[23] ERNST, M. D., COCKRELL, J., GRISWOLD, W. G., AND NOTKIN, D. Dynamically discovering likely program invariants to support program evolution. Transactions on Software Engineering 27, 2 (2001), 99–123.

[24] FLANAGAN, C., LEINO, K. R. M., LILLIBRIDGE, M., NELSON, G., SAXE, J. B., AND STATA, R. Extended static checking for Java. In PLDI '02: Proceedings of the 23rd Conference on Programming Language Design and Implementation (2002), pp. 201–212.

[25] FRANKL, P., AND WEISS, S. An experimental comparison of the effectiveness of branch testing and data flow testing. Transactions on Software Engineering 19, 8 (1993), 774–787.

[26] FRANKL, P. G., AND IAKOUNENKO, O. Further empirical studies of test effectiveness. In FSE '98: Proceedings of the 6th International Symposium on the Foundations of Software Engineering (1998), pp. 153–162.

[27] FRANKL, P. G., WEISS, S. N., AND HU, C. All-uses versus mutation testing: An experimental comparison of effectiveness. Journal of Systems and Software 38 (1997), 235–253.

[28] FRASER, G., AND ZELLER, A. Mutation-driven generation of unit tests and oracles. In ISSTA '10: Proceedings of the 19th International Symposium on Software Testing and Analysis (2010), pp. 147–158.

[29] GLIGORIC, M., JAGANNATH, V., AND MARINOV, D. MuTMuT: Efficient exploration for mutation testing of multithreaded code. In ICST '10: Proceedings of the 3rd International Conference on Software Testing, Verification and Validation (2010), pp. 55–64.

[30] GODEFROID, P., KLARLUND, N., AND SEN, K. DART: Directed automated random testing. In PLDI '05: Proceedings of the 26th Conference on Programming Language Design and Implementation (2005), pp. 213–223.

[31] GORADIA, T. Dynamic impact analysis: A cost-effective technique to enforce error-propagation. In ISSTA '93: Proceedings of the 8th International Symposium on Software Testing and Analysis (1993), pp. 171–181.

[32] GRÜN, B. J. M., SCHULER, D., AND ZELLER, A. The impact of equivalent mutants. In Mutation '09: Proceedings of the 4th International Workshop on Mutation Analysis (2009), pp. 192–199.

[33] HAMMACHER, C. Design and Implementation of an Efficient Dynamic Slicer for Java. Bachelor's thesis, Saarland University, November 2008.

[34] HAMMACHER, C., STREIT, K., HACK, S., AND ZELLER, A. Profiling Java programs for parallelism. In IWMSE '09: Proceedings of the 2nd International Workshop on Multi-Core Software Engineering (2009), pp. 49–55.

[35] HAMPTON, M., AND STEPHANE, P. Leveraging a commercial mutation analysis tool for research. In Mutation '07: Proceedings of the 3rd International Workshop on Mutation Analysis (2007), pp. 203–209.

[36] HANGAL, S., AND LAM, M. S. Tracking down software bugs using automatic anomaly detection. In ICSE '02: Proceedings of the 24th International Conference on Software Engineering (2002), pp. 291–302.

[37] HARMAN, M., BINKLEY, D., AND DANICIC, S. Amorphous program slicing. Journal of Systems and Software 68, 1 (2003), 45–64.

[38] HARMAN, M., AND HIERONS, R. M. An overview of program slicing. Software Focus 2, 3 (2001), 85–92.

[39] HIERONS, R., AND HARMAN, M. Using program slicing to assist in the detection of equivalent mutants. Software Testing, Verification and Reliability 9, 4 (1999), 233–262.

[40] HOWDEN, W. E. Weak mutation testing and completeness of test sets. Transactions on Software Engineering 8, 4 (1982), 371–379.

[41] HU, J., LI, N., AND OFFUTT, J. An analysis of OO mutation operators. In Mutation '11: Proceedings of the 6th International Workshop on Mutation Analysis (to appear) (2011).

[42] HUANG, J. C. An approach to program testing. Computing Surveys 7, 3 (1975), 113–128.

[43] HUTCHINS, M., FOSTER, H., GORADIA, T., AND OSTRAND, T. J. Experiments of the effectiveness of dataflow- and controlflow-based test adequacy criteria. In ICSE '94: Proceedings of the 16th International Conference on Software Engineering (1994), pp. 191–200.

[44] JIA, Y., AND HARMAN, M. An analysis and survey of the development of mutation testing. Transactions on Software Engineering (to appear).

[45] JONES, J. A., AND HARROLD, M. J. Empirical evaluation of the Tarantula automatic fault-localization technique. In ASE '05: Proceedings of the 20th International Conference on Automated Software Engineering (2005), pp. 273–282.

[46] KIM, S., ZIMMERMANN, T., WHITEHEAD, JR., E. J., AND ZELLER, A. Predicting faults from cached history. In ICSE '07: Proceedings of the 29th International Conference on Software Engineering (2007), pp. 489–498.

[47] KOREL, B., AND LASKI, J. Dynamic program slicing. Information Processing Letters 29, 3 (1988), 155–163.

[48] KOSTER, K. A state coverage tool for JUnit. In ICSE '08: Proceedings of the 30th International Conference on Software Engineering (2008), pp. 965–966.

[49] KOSTER, K., AND KAO, D. State coverage: A structural test adequacy criterion for behavior checking. In ESEC/FSE '07: Proceedings of the 11th European Software Engineering Conference held jointly with the 15th International Symposium on the Foundations of Software Engineering (2007), pp. 541–544.

[50] LEITNER, A., ORIOL, M., ZELLER, A., CIUPA, I., AND MEYER, B. Efficient unit test case minimization. In ASE '07: Proceedings of the 22nd International Conference on Automated Software Engineering (2007), pp. 417–420.

[51] LEVESON, N. An investigation of the Therac-25 accidents. Computer 26, 7 (1993), 18–41.

[52] LI, Z., TAN, L., WANG, X., LU, S., ZHOU, Y., AND ZHAI, C. Have things changed now? An empirical study of bug characteristics in modern open source software. In ASID '06: Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability (2006), pp. 25–33.

[53] LIVSHITS, B., AND ZIMMERMANN, T. DynaMine: Finding common error patterns by mining software revision histories. In ESEC/FSE '05: Proceedings of the 10th European Software Engineering Conference held jointly with the 13th International Symposium on the Foundations of Software Engineering (2005), pp. 296–305.

[54] LUCIA, A. D., FASOLINO, A. R., AND MUNRO, M. Understanding function behaviors through program slicing. In WPC '96: Proceedings of the 4th International Workshop on Program Comprehension (1996), pp. 9–18.

[55] MA, Y.-S., OFFUTT, J., AND KWON, Y.-R. MuJava: A mutation system for Java. In ICSE '06: Proceedings of the 28th International Conference on Software Engineering (2006), pp. 827–830.

[56] MASRI, W., ABOU-ASSI, R., EL-GHALI, M., AND AL-FATAIRI, N. An empirical study of the factors that reduce the effectiveness of coverage-based fault localization. In DEFECTS '09: Proceedings of the 2nd International Workshop on Defects in Large Software Systems (2009), pp. 1–5.

[57] MATHUR, A. P. Performance, effectiveness, and reliability issues in software testing. In COMPSAC '91: Proceedings of the 15th International Computer Software and Applications Conference (1991), pp. 604–605.

[58] MCCAMANT, S., AND ERNST, M. D. Predicting problems caused by component upgrades. In ESEC/FSE '03: Proceedings of the 9th European Software Engineering Conference held jointly with the 11th International Symposium on the Foundations of Software Engineering (2003), pp. 287–296.

[59] MEYER, B. Object-Oriented Software Construction, 2nd ed. Prentice-Hall, 1997.

[60] NAM, J., SCHULER, D., AND ZELLER, A. Calibrated mutation testing. In Mutation '11: Proceedings of the 6th International Workshop on Mutation Analysis (to appear) (2011).

[61] NAMIN, A. S., ANDREWS, J. H., AND MURDOCH, D. J. Sufficient mutation operators for measuring test effectiveness. In ICSE '08: Proceedings of the 30th International Conference on Software Engineering (2008), pp. 351–360.

[62] NETHERCOTE, N., AND SEWARD, J. Valgrind: A framework for heavyweight dynamic binary instrumentation. In PLDI '07: Proceedings of the 28th Conference on Programming Language Design and Implementation (2007), pp. 89–100.

[63] NEVILL-MANNING, C. G., AND WITTEN, I. H. Linear-time, incremental hierarchy inference for compression. In DCC '97: Proceedings of the 7th Data Compression Conference (1997), pp. 3–11.

[64] NEVILL-MANNING, C. G., WITTEN, I. H., AND MAULSBY, D. Compression by induction of hierarchical grammars. In DCC '94: Proceedings of the 4th Data Compression Conference (1994), pp. 244–253.

[65] NTAFOS, S. C. A comparison of some structural testing strategies. Transactions on Software Engineering 14, 6 (1988), 868–874.

[66] OFFUTT, A. J. The coupling effect: Fact or fiction. Software Engineering Notes 14 (1989), 131–140.

[67] OFFUTT, A. J. Investigations of the software testing coupling effect. Transactions on Software Engineering and Methodology 1, 1 (1992), 5–20.

[68] OFFUTT, A. J., AND CRAFT, W. M. Using compiler optimization techniques to detect equivalent mutants. Software Testing, Verification, and Reliability 4 (1994), 131–154.

[69] OFFUTT, A. J., LEE, A., ROTHERMEL, G., UNTCH, R. H., AND ZAPF, C. An experimental determination of sufficient mutant operators. ACM Transactions on Software Engineering and Methodology (TOSEM) 5, 2 (1996), 99–118.

[70] OFFUTT, A. J., AND PAN, J. Detecting equivalent mutants and the feasible path problem. In COMPASS '96: Proceedings of the 11th Conference on Computer Assurance (1996), pp. 224–236.

[71] OFFUTT, A. J., AND PAN, J. Automatically detecting equivalent mutants and infeasible paths. Software Testing, Verification, and Reliability 7, 3 (1997), 165–192.

[72] OFFUTT, A. J., PARGAS, R. P., FICHTER, S. V., AND KHAMBEKAR, P. K. Mutation testing of software using a MIMD computer. In ICPP '92: Proceedings of the 21st International Conference on Parallel Processing (1992), pp. 257–266.

[73] OFFUTT, A. J., AND UNTCH, R. H. Mutation 2000: Uniting the orthogonal. In Mutation '00: Proceedings of the 2nd International Workshop on Mutation Analysis (Mutation Testing for the New Century) (2001), pp. 34–44.

[74] OFFUTT, VI, A. J., AND KING, K. N. A Fortran 77 interpreter for mutation analysis. In Proceedings of the 1st Symposium on Interpreters and Interpretive Techniques (1987), pp. 177–188.

[75] ORSO, A., APIWATTANAPONG, T., AND HARROLD, M. J. Leveraging field data for impact analysis and regression testing. In ESEC/FSE '03: Proceedings of the 9th European Software Engineering Conference held jointly with the 11th International Symposium on the Foundations of Software Engineering (2003), pp. 128–137.

[76] ORSO, A., APIWATTANAPONG, T., LAW, J., ROTHERMEL, G., AND HARROLD, M. J. An empirical comparison of dynamic impact analysis algorithms. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering (2004), pp. 491–500.

[77] PACHECO, C., AND ERNST, M. D. Eclat: Automatic generation and classification of test inputs. In ECOOP '05: Proceedings of the 19th European Conference on Object-Oriented Programming (2005), pp. 504–527.

[78] PAN, K., KIM, S., AND WHITEHEAD, JR., E. J. Toward an understanding of bug fix patterns. Empirical Software Engineering 14 (2009), 286–315.

[79] PAPI, M. M., ERNST, M. D., AND SMITH, C. Practical pluggable types for Java. In ISSTA '08: Proceedings of the 17th International Symposium on Software Testing and Analysis (2008), pp. 201–212.

[80] PERSON, S., DWYER, M. B., ELBAUM, S., AND PĂSĂREANU, C. S. Differential symbolic execution. In FSE '08: Proceedings of the 16th International Symposium on the Foundations of Software Engineering (Atlanta, Georgia, 2008).

[81] PEZZÈ, M., AND YOUNG, M. Software Testing and Analysis: Process, Principles and Techniques. John Wiley & Sons, 2008.

[82] RADIO TECHNICAL COMMISSION FOR AERONAUTICS (RTCA). DO-178B: Software Considerations in Airborne Systems and Equipment Certification, 1982.

[83] RAPPS, S., AND WEYUKER, E. J. Data flow analysis techniques for test data selection. In ICSE '82: Proceedings of the 6th International Conference on Software Engineering (1982), pp. 272–278.

[84] REPS, T., BALL, T., DAS, M., AND LARUS, J. The use of program profiling for software maintenance with applications to the year 2000 problem. In ESEC/FSE '99: Proceedings of the 7th European Software Engineering Conference held jointly with the 7th International Symposium on the Foundations of Software Engineering (1999), pp. 432–449.

[85] ROTHERMEL, G., UNTCH, R. H., CHU, C., AND HARROLD, M. J. Prioritizing test cases for regression testing. Transactions on Software Engineering 27, 10 (2001), 929–948.

[86] RYDER, B. G., AND TIP, F. Change impact analysis for object-oriented programs. In PASTE '01: Proceedings of the 3rd Workshop on Program Analysis for Software Tools and Engineering (2001), pp. 46–53.

[87] SCHULER, D., DALLMEIER, V., AND ZELLER, A. Efficient mutation testing by checking invariant violations. In ISSTA '09: Proceedings of the 18th International Symposium on Software Testing and Analysis (2009), pp. 69–80.

[88] SCHULER, D., AND ZELLER, A. Javalanche: Efficient mutation testing for Java. In ESEC/FSE '09: Proceedings of the 12th European Software Engineering Conference held jointly with the 17th International Symposium on the Foundations of Software Engineering (2009), pp. 297–298.

[89] SCHULER, D., AND ZELLER, A. (Un-)Covering equivalent mutants. In ICST '10: Proceedings of the 3rd International Conference on Software Testing, Verification and Validation (2010), pp. 45–54.

[90] SCHULER, D., AND ZELLER, A. Assessing oracle quality with checked coverage. In ICST '11: Proceedings of the 4th International Conference on Software Testing, Verification and Validation (2011), pp. 90–99.

[91] SEN, K., MARINOV, D., AND AGHA, G. CUTE: A concolic unit testing engine for C. In ESEC/FSE '05: Proceedings of the 10th European Software Engineering Conference held jointly with the 13th International Symposium on the Foundations of Software Engineering (2005), pp. 263–272.

[92] SHARMA, R., GLIGORIC, M., ARCURI, A., FRASER, G., AND MARINOV, D. Testing container classes: Random or systematic? In FASE '11: Proceedings of the 14th International Conference on Fundamental Approaches to Software Engineering (2011), pp. 262–277.

[93] SOSIČ, R., AND ABRAMSON, D. A. Guard: A relative debugger. Software Practice and Experience 27, 2 (1997), 185–206.

[94] SPAFFORD, E. H. Extending mutation testing to find environmental bugs. Tech. rep., Purdue University, 1990.

[95] TASSEY, G. The economic impacts of inadequate infrastructure for software testing. Tech. rep., National Institute of Standards and Technology, 2002.

[96] TILLMANN, N., AND DE HALLEUX, J. Pex: White box test generation for .NET. In TAP '08: Proceedings of the 2nd International Conference on Tests and Proofs (2008), pp. 134–153.

[97] TIP, F. A survey of program slicing techniques. Journal of Programming Languages 3, 3 (1995), 121–189.

[98] UNTCH, R. H., OFFUTT, A. J., AND HARROLD, M. J. Mutation analysis using mutant schemata. In ISSTA '93: Proceedings of the 8th International Symposium on Software Testing and Analysis (1993), pp. 139–148.

[99] VISSER, W., PĂSĂREANU, C. S., AND KHURSHID, S. Test input generation with Java PathFinder. In ISSTA '04: Proceedings of the 14th International Symposium on Software Testing and Analysis (2004), pp. 97–107.

[100] WANG, T., AND ROYCHOUDHURY, A. Using compressed bytecode traces for slicing Java programs. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering (2004), pp. 512–521.

[101] WANG, T., AND ROYCHOUDHURY, A. Dynamic slicing on Java bytecode traces. Transactions on Programming Languages and Systems 30, 2 (2008), 10:1–10:49.

[102] WEISER, M. Program slices: Formal, psychological, and practical investigations of an automatic program abstraction method. PhD thesis, University of Michigan, 1979.

[103] WEISER, M. Program slicing. Transactions on Software Engineering 10, 4 (1984), 352–357.

[104] WEYUKER, E. J., WEISS, S. N., AND HAMLET, R. G. Comparison of program testing strategies. In ISSTA '91: Proceedings of the 7th International Symposium on Software Testing and Analysis (1991), pp. 1–10.

[105] WHEELER, D. A. SLOCCount: http://www.dwheeler.com/sloccount/, 2004.

[106] ZELLER, A., AND HILDEBRANDT, R. Simplifying and isolating failure-inducing input. Transactions on Software Engineering 28, 2 (2002), 183–200.

[107] ZHANG, L., HOU, S.-S., HU, J.-J., XIE, T., AND MEI, H. Is operator-based mutant selection superior to random mutant selection? In ICSE '10: Proceedings of the 32nd International Conference on Software Engineering (2010), pp. 435–444.

[108] ZIMMERMANN, T., WEISSGERBER, P., DIEHL, S., AND ZELLER, A. Mining version histories to guide software changes. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering (2004), pp. 563–572.