Machine Learning-Based Software Testing: Towards a Classification
Total Page:16
File Type:pdf, Size:1020Kb
Machine Learning-based Software Testing: Towards a Classification Framework Mahdi Noorian1, Ebrahim Bagheri1,2, and Wheichang Du1 University of New Brunswick, Fredericton, Canada1 Athabasca University, Edmonton, Canada2 [email protected], [email protected], [email protected] Abstract—Software Testing (ST) processes attempt to verify as: What types of machine learning methods can be effective and validate the capability of a software system to meet its • required attributes and functionality. As software systems become in different aspects of software testing? more complex, the need for automated software testing methods What are the strengths and weaknesses of each learning • emerges. Machine Learning (ML) techniques have shown to be method in the context of testing? quite useful for this automation process. Various works have How can we determine the proper type of learning method been presented in the junction of ML and ST areas. The lack of • general guidelines for applying appropriate learning methods for for the stages of a testing process? Where are the critical points in a software testing process software testing purposes is our major motivation in this current • paper. In this paper, we introduce a classification framework in which ML can positively contribute? which can help to systematically review research work in the The general purpose of our ongoing research is to provide ML and ST domains. The proposed framework dimensions are defined using major characteristics of existing software testing a specific set of guidelines for automating software testing and machine learning methods. Our framework can be used to processes with the aid of machine leaning techniques. To come effectively construct a concrete set of guidelines for choosing the up with such set of guidelines, it is required to systematically most appropriate learning method and applying it to a distinct analyse the current research work and find answers to some of stage of the software testing life-cycle for automation purposes. the abovementioned questions. I. INTRODUCTION In the first step, we propose a framework which can be used to classify the current research work in the conjunction of ML Software Testing (ST) is an investigation process which and ST. In addition, to support our framework, some works are attempts to validate and verify the alignment of a software reviewed. This classification framework can assist ST practi- system’s attributes and functionality with its intended goals. tioners to analyse and understand the emerging applications of Software testing is a labour intensive and costly process and the ML-ST domain. Such structured classification framework as mentioned in [2], [3], a testing process may require up to can be useful for defining and constructing a set of guidelines 50% of the development resources. Due to this fact, automated for automating software testing processes. testing approaches are desired to reduce this cost and time. Be- The reminder of this paper is as follows. In Section II , the sides, automation can significantly enhance the testing process proposed classification framework is presented. In addition, the performance. Hence, for future software systems development, dimensions of the framework are discussed in detail. Section steps need to be taken towards the development of automated III is devoted to presenting some work in the area of ML and testing strategies [4]. ST. In Section IV we discuss how this set of representative Several interesting attempts have already been made for work can be classified according to our proposed framework. automating the software testing process. Machine Learning We conclude the paper with conclusions and direction for (ML) as a sub domain of AI [12] is widely used in various future work in Section V. stages of the software development life-cycle [19], especially for automating software testing processes [5]. In [1], [17], II. THE PROPOSED FRAMEWORK FOR CLASSIFICATION evolutionary algorithms have been employed for automat- During the past decade, several works have been introduced ing test case generation. In [16], Artificial Neural Networks based on machine learning techniques where learning algo- (ANN) have been used to build a model for estimating the rithms have been applied to the various stages of software effectiveness of the generated test cases. Briand et al. in testing. In order to clarify the stand point of ML in the area [8] has proposed a method based on the C4.5 decision tree of ST, we propose a classification framework. In this section, algorithm for predicting potential bugs in a software system we take a look at the proposed framework and its dimensions. and localizing the bugs in order to reduce the debugging process time. All research works show that the employment A. The Framework’s Dimensions of machine learning techniques is a promising approach for The most important factors that can be used to distinguish automating testing processes. But, still some major questions the current research work are represented in Fig. 1. In its remain that need to be addressed and further investigated such highest layer, the framework consists of two main categories TABLE I work: acceptance testing, system testing, integration testing, TESTING GENERAL ACTIVITY DIMENTION module testing, unit testing, and regression testing, which Sub-dimension Possible value A: Test Planning Testing cost estimation [9] refer to requirement analysis, architectural design, subsystem Test case prioritization [1] design, detailed design, implementation, and maintenance in B: Test Case Management Test case design [1] the software development stages, respectively. Test case refinement [6] Test case evaluation [16] Dimension 4-Learning technique: In the machine learning Fault localization [18] area, various types of learning methods have been introduced C: Debugging Bug prioritization [18] and each of them has its own specific characteristics. We adopt Fault prediction [15] Mitchel [12] learning method classification and define learn- (Testing and ML). According to the defined categories, six ing technique dimension values as follows: Decision Trees main dimensions and some sub-dimensions are obtainable in (DT), Artificial Neural Networks (ANN), Genetic Algorithms the classification framework. The framework’s dimensions are (GA), Bayesian Learning (BL), Instance Base Learning (IBL), as follows: Clustering, and Hybrid methods, which can be combination of Dimension 1-Testing approach: According to the older test- several other learning methods. ing terminology [2], we define the testing approach dimension Dimension 5-Learning property: Each learning method has with three possible values: black-box, white-box, and gray- its own specific characteristics. This dimension is devoted box [2]. Based on the black-box approach, testing can be to evaluating learning methods properties in various aspects. performed using the external description of a software system To do so, three sub-dimensions; Training data properties, such as the software specification. In a white-box approach, Supervision, Time generalization are defined. The Training the internal properties of a software system like source code data properties evaluate the properties of a data set. The can be used for testing purposes. The grey-box approach is gathered data for learning process can be small or large in a combination of the two, which considers both internal and terms of its quantity. Data can be noisy or accurate in terms external properties at the same time. of errors that might exist in a data set. In addition, learning Dimension 2-Testing general activity: In the second dimen- can be supervised or unsupervised; therefore, the Supervision sion, the standard testing process life-cycle [13] inspired us to sub-dimension is defined. For a given target function the time define the a) test planning, b) test case management, and c) generalization sub-dimension exposes how its generalization debugging sub-dimensions. With respect to the testing process is; which can be either eager (at learning phase) or lazy (at life-cycle, we found out that these three phases are critical and classification phase). In terms of online and offline learning, that automation can effectively assist them in order to reduce also time generalization shows how a learning system updates the cost and time of the whole testing process. The possible its approximation of the target function after the initial training activities that can be automated based on ML in a, b, and c data are used. The last sub-dimension, automation degree, is are presented in Table I. responsible for addressing the learning system in terms of the In the test planning sub-dimension, testing cost estimation degree of automation. The designed system can be fully or can help test managers to predict testing process cost and time partially automatic. and provide good testing plan to manage the testing process Dimension 6-Elements learned: In each learning system for efficiently. Test case management includes several tasks such software testing automation, various types of data can be as test case prioritization, which intends to prioritize the test used to build the target function. The training data can be input space in terms of test case effectiveness; test case design, collected in different stages of the software testing process or which intends to generate high quality test cases; test case software development life-cycle. The learning elements could refinement, which intends to map the current specification of a be