Neural Networks and Search Landscapes for Software Testing

Neural Networks and Search Landscapes for Software Testing Leonid Joffe A report submitted in fulfilment of the requirements for the degree of Doctor of Philosophy of UCL. Department of Computer Science UCL July 9, 2020 This thesis was made possible thanks to my father's persistent efforts { and despite mine. Thanks, dad. 1 Declaration I, Leonid Joffe, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the work. 2 Summary of Publications Parts of the work presented in this thesis have been published in the following papers. For each of these papers, the author of this thesis contributed to developing the concept and to writing, conducted all experimental work, and presented the papers at venues. • Leonid Joffe and David Clark. Constructing Search Spaces for SBST using Machine Learning, published at the International Symposium on Search Based Software Engineering (SSBSE 2019). This paper forms the basis of Chapter 3, and outlines the concepts upon which further chapters build. • Leonid Joffe and David Clark. Directing a Search Towards Execution Properties with a Learned Fitness Function, published at the 12th IEEE International Conference on Software Testing, Validation and Verifica- tion (ICST 2019). Chapter 4 is based on this paper. • Leonid Joffe. Machine Learning Augmented Fuzzing, published at the International Symposium on Software Reliability Engineering Work- shops (ISSREW 2018). This PhD symposium short paper is an overview of the whole thesis, as implemented and planned at the time of publi- cation. In addition, the author of this thesis co-authored the following publications that are not related to the work described in this thesis. • David R White, Leonid Joffe, Edward Bowles, Jerry Swan. Deep Pa- rameter Tuning of Concurrent Divide and Conquer Algorithms in Akka, published at the European Conference on the Applications of Evolu- tionary Computation (EvoStar 2017). Initially intended as a submis- sion to the SSBSE challenge track, this short paper is a proof of concept for automatic deep parameter tuning for a Java concurrency framework. The author of this thesis participated in implementing and conducting 3 the experiments, and writing. • Akhil Mathur, Tianlin Zhang, Sourav Bhattacharya, Petar Velickovic, Leonid Joffe, Nicholas D Lane, Fahim Kawsar, Pietro Lió. Using Deep Data Augmentation Training to Address Software and Hardware Het- erogeneities in Wearable and Smartphone Sensing Devices, published at the 17th ACM/IEEE International Conference on Information Pro- cessing in Sensor Networks (IPSN 2018). For this paper, the author of this thesis participated in the implementation of the experiments. 4 Abstract Search-Based Software Testing (SBST) methods are popular for improving the reliability of software. They do, however, suffer from challenges: poor representation, ineffective fitness functions and search landscapes, and restricted testing strategies. Neural Networks (NN) are a machine learning technique that can process complex data, they are continuous by design, and they can be used in a generative capacity. This thesis ex- plores a number of approaches that leverage these properties to tackle the challenges of SBST. The first use case is for defining fitness functions to target specific properties of interest. This is showcased by first training an NN to classify an execution trace as crashing or non-crashing. The estimate is then used to prioritise previously unseen executions that are deemed more likely to crash by the NN. This fitness function yields more efficient crash discovery than a baseline. The second proposition is to use NNs to define a search space for a diversity driven testing strategy. The space is constructed by encoding execution traces into an n-dimensions, where distance represents the degree of feature similarity. This thesis argues that this notion of similarity can drive a diversification driven testing strategy. Finally, an application of a generative model for SBST is presented. Initially, random inputs are fed to the program and execution traces are collected and encoded. Redundant executions are culled, distinct ones are kept and the loop is repeated. Over time, this mechanism discovers new program behaviours and these are added to the ever more diverse training dataset. Although this approach does not yet compete with existing tools, experiments show that the notion of similarity is meaningful, the generated program inputs are sensible and faults are found. Much of the work presented in this thesis is exploratory and is meant to serve as basis for future research. 5 Impact Statement Testing is an integral part of a software development process. Many popular approaches use feedback to guide an automatic generator of program inputs. These methods fall under a discipline known as Search-Based Software Test- ing (SBST). The effectiveness of any SBST process depends on choices of representation, fitness function and search strategy. A poor representation restricts the choice of a fitness function and the resulting search landscape. A poor search landscape in turn limits the choice of search strategies. As a result, the effectiveness of an SBST process is hampered. This thesis proposes the use of Neural Networks (NN) to address these challenges. The first part of the thesis shows how an NN can be used to process observations of program execution to construct convenient, continuous search landscapes { for targeted and diversification-driven strategies alike. For targeted strategies, observations are used to train an NN to predict a execution property of interest, and the output of the NN is then interpreted as a fitness. For diversification-driven strategies, execution traces are encoded into a \latent" space which defines an ordering over program behaviours using ap- parently non-orderable observations. Experimental results show that these search spaces are continuous and large, and that they have a strong predictive ability for various properties of interest. To the best of the author's knowl- edge, constructing search landscapes for SBST in this manner is novel. The second part of the thesis shows that an NN-generated search landscape for a property targeting strategy is effective. An NN is first trained to predict the presence of a crash given an execution trace. During testing, the output of the NN is used as a fitness of executions that have not yet crashed. Em- pirical results show that a search using this fitness consistently outperforms benchmarks. 6 The final portion of the thesis is a framework for diversity driven testing. The approach combines two NNs: one generates program inputs, and the other encodes execution traces. As the networks train, they generate new program inputs that explore the program's behaviours. Uninteresting datapoints are periodically discarded, so the training is conducted with a dataset comprised of the most interesting datapoints. Experimental results show that such a framework can be trained, that it does produce diverse program inputs, that the notion of similarity in the latent space is meaningful and that it can actually discover crashes. The work presented does not attempt to outperform state of the art testing tools. Instead, it is intended to propose and explore a number of approaches for alleviating the challenges of SBST by using NNs. The presented results reflect these aims; they show that although the tools' immediate efficacy for automated testing is limited, the approaches are viable and can be used as a basis for various directions of future work. 7 Contents 1 Introduction 15 1.1 Basic Concepts . 17 1.2 Challenge of Testing in General . 19 1.3 Search-Based Software Testing and its Challenges . 22 1.4 Overview, Contributions and Scope . 28 1.4.1 Producing Search Landscapes for SBST with Neural Networks . 29 1.4.2 Fitness Functions for Property Targeting Search . 30 1.4.3 Deconvolutional Generative Adversarial Fuzzer . 31 2 Background and Literature 33 2.1 Search-Based Software Testing . 34 2.1.1 Fuzzing and the American Fuzzy Lop . 34 2.1.2 Representations and Fitness Landscapes . 40 2.1.3 Search Strategies . 47 2.2 Neural Networks . 54 2.2.1 Neural Network Model . 54 2.2.2 Training . 56 2.2.3 Inference . 59 2.2.4 Embeddings . 60 2.2.5 Autoencoders . 63 8 2.2.6 Recurrent Neural Networks . 68 2.2.7 Convolutional Neural Networks . 69 2.2.8 Generative Neural Networks . 71 2.3 Machine Learning in Software Engineering . 75 2.3.1 Program Profiling . 76 2.3.2 GNNs in Software Engineering . 77 3 Producing Search Landscapes for SBST with Neural Net- works 80 3.1 Overview . 81 3.1.1 Property Targeting Search Landscape . 81 3.1.2 Diversity Driven Search Landscape . 83 3.1.3 Contributions and Scope . 84 3.2 Datasets and Tools . 84 3.2.1 Program Corpus . 85 3.2.2 Properties of Interest . 86 3.2.3 PIN Traces . 88 3.2.4 Ftrace Traces . 90 3.3 Experimental Setup . 91 3.3.1 Exp.1 { Search Landscape for a Property Targeting Strategy . 91 3.3.2 Exp.2 { Search Landscape for a Diversity Driven Strategy 94 3.4 Results . 94 3.4.1 Size and Continuity of Landscapes . 95 3.4.2 Representation Condition . 96 3.4.3 Meaningful Ordering of Candidate Solutions . 99 3.5 Conclusion . 102 4 Fitness Functions for Property Targeting Search 104 4.1 Overview . 105 4.2 Research Goals . 106 9 4.2.1 Representation . 106 4.2.2 Fitness Function . 107 4.2.3 Generalisability to real world Programs . 107 4.3 Experimental Setup . 107 4.3.1 Dataset . 108 4.3.2 Trace Instrumentation . 108 4.3.3 Experiments . 111 4.3.4 Regression Classifier Neural Network . 112 4.3.5 Modifications to AFL . 113 4.4 Results . 117 4.4.1 RQ1 { Correlation of C Library Call Traces and Crashes117 4.4.2 RQ2 { Fuzzing with a Targeted Fitness Function .

Load more