Constructing Automated Test Oracle for Low Observable Software
Total Page:16
File Type:pdf, Size:1020Kb
Scientia Iranica D (2020) 27(3), 1333{1351 Sharif University of Technology Scientia Iranica Transactions D: Computer Science & Engineering and Electrical Engineering http://scientiairanica.sharif.edu Constructing automated test oracle for low observable software M. Valueian1, N. Attar, H. Haghighi, and M. Vahidi-Asl Faculty of Computer Science and Engineering, Shahid Beheshti University, G.C, Tehran, P.O. Box 1983963113, Iran. Received 27 July 2018; received in revised form 26 January 2019; accepted 10 August 2019 KEYWORDS Abstract. The application of machine learning techniques for constructing automated test oracles has been successful in recent years. However, existing machine learning based Software testing; oracles are characterized by a number of de ciencies when applied to software systems with Test oracle; low observability, such as embedded software, cyber-physical systems, multimedia software Machine learning; programs, and computer games. This paper proposes a new black box approach to construct Arti cial neural automated oracles that can be applied to software systems with low observability. The network; proposed approach employs an Arti cial Neural Network algorithm that uses input values Software observability. and corresponding pass/fail outcomes of the program under test as the training set. To evaluate the performance of the proposed approach, extensive experiments were carried out on several benchmarks. The results manifest the applicability of the proposed approach to software systems with low observability and its higher accuracy than a well-known machine learning based method. This study also assessed the e ect of di erent parameters on the accuracy of the proposed approach. © 2020 Sharif University of Technology. All rights reserved. 1. Introduction reliability, among which software testing plays a vital role. Due to the considerable size, complexity, distribu- Software testing is known as a labor-intensive and tion, pervasiveness, and criticality of recent software expensive task. The huge cost, diculty, and inaccu- systems, producing faultless software seems to be racy of manual testing have motivated researchers to an unattainable dream. Furthermore, the increasing seek for automated approaches, which aim to improve demand for software programs and competition in the accuracy and eciency of the task. As a con- software industry lead to short time to market, which sequence, there has been a burgeoning emergence of increases the likelihood of shipping faulty software. software testing methodologies, techniques, and tools These facts highlight the signi cance of quality as- to support testing automation in recent years. surance activities in improving software quality and Among di erent testing activities, test data gen- eration is one of the most e ective generations that 1. Present address: Department of Computer Engineering, aims to create an appropriate subset of input val- Sharif University of Technology, Tehran, Iran. ues to determine whether a program produces the *. Corresponding author. Tel.: +98 21 29904131 intended outputs. The input values should satisfy E-mail addresses: [email protected] (M. Valueian); n [email protected] (N. Attar); h [email protected] (H. some testing criteria such that the testers have a Haghighi); mo [email protected] (M. Vahidi-Asl) dependable estimation of the program reliability [1]. To reduce laboriousness, inaccuracy, and intolerable costs doi: 10.24200/sci.2019.51494.2219 of manual test data generation, a signi cant amount of 1334 M. Valueian et al./Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1333{1351 research has been dedicated to automating the process a partial oracle is the application of metamorphic of test data generation [2,3]. testing, i.e., building the test oracle based on known Despite great advances in di erent software test- relationships between expected behaviors of the SUT. ing domains, e.g., test data generation, one impor- Recently, Machine Learning (ML) methods have tant challenge has been overlooked by academia and been employed to build test oracles. A typical ML- industry. The question arises, \who will evaluate based test oracle involves two main steps: choosing the correctness of the outcome/behavior of a software an appropriate learning algorithm for constructing a program according to the given test input data?" The learning model and preparing an appropriate training mechanism of labeling program outcomes, as pass or dataset from SUT's historical behavior, represented as fail, subjected to speci c input values is referred to as input-output pairs; this process is illustrated in Fig- \test oracle" [4]. ure 2. According to the gure, the constructed model Providing an accurate and precise test oracle is re ects the correct behavior of the SUT assuming that the main prerequisite for achieving robust and realistic the model is of 100% precision. software testing techniques and tools. This essential Most of ML-based approaches model the rela- aspect of software testing, left as an open problem in tionships between a limited number of inputs and the many works, is facing serious challenges. Typically, corresponding output values as the training set. The in real world applications, there might be no common idea behind these approaches is that the ML-based test oracle except human assessment on pass or fail oracles generalize the limited relationships, involved outcome of a program run [5]. One primary challenge in the training set, to the whole behavior of the of manual test oracles is the diversity and complexity SUT according to its input space. In other words, of software systems and platforms, each of which has they produce expected results for the remaining inputs various types of input parameters and output data. based on what they learned in the training phase. Moreover, due to demanding business expansion and In the testing phase, the inputs of a test set are thanks to advances in communications technologies, given to both SUT and test oracle. The expected software components may be produced simultaneously results, generated by the oracle, are compared with the by numerous developers, perhaps located in far apart actual outputs of the SUT. If the results are similar or places. This makes the manual judgment on the close together (based on a prede ned threshold), the correctness of diverse and complicated components outcome is labeled passed; otherwise, the execution is inaccurate, incomplete, and expensive. considered to be failed. The choice of the ML method The mentioned challenges emphasize the need for may a ect the precision of the test oracle. automatic test oracles. A typical automatic test oracle Some of ML-based works are based on classifying contains a model or speci cation by which the outcome the software behavior by using input/output pairs [6] of a Software Under Test (SUT) could be assessed. and, sometimes, along with execution traces [6{8] Figure 1 illustrates the structure of a conventional as input features of the classi er. Neither of the automated test oracle. As shown in the gure, the mentioned methods is applicable to low observable test inputs are given to both SUT and automated software systems. The reason is that, in these systems, test oracle. Their outputs are compared with an the expected results and actual outputs are not easily appropriate comparator, which decides whether the representable. To be more precise, it is not easy to outcome of the SUT, subjected to the given test inputs, observe and understand the behavior of low observable is correct. systems in terms of their outputs, e ects on the An automated test oracle can be derived from the environment, and other hardware and software compo- formal speci cation of the SUT. Usually, there is a lack nents. Sometimes, even if the outputs are observable, of the full formal speci cation of the SUT features. their precise encoding or representation, which is a In these situations, this can use a partial test oracle prerequisite for comparison, is dicult. Therefore, in that can assess the program behavior with respect to the cases categorized below, comparing the expected a subset of test inputs [4]. One approach to producing and actual results is inaccurate or even impossible: 1. Embedded software testing, in which the software has a direct e ect on a hardware device(s) that limits the observability of the software [9]. In these Figure 2. A block diagram of a Machine Learning (ML) Figure 1. The overall view of a conventional test oracle. based test oracle. M. Valueian et al./Scientia Iranica, Transactions D: Computer Science & ... 27 (2020) 1333{1351 1335 situations, assuming that the hardware works achieve a test oracle to predict the pass/fail behavior correctly, most of the testers prefer to evaluate of the program for the given test inputs during the embedded software by observing the behavior of testing phase. the hardware; Based on the above idea, in a conference pa- 2. Multimedia software programs, e.g., image editors, per [12], a learning based approach is employed, which audio and video players, and computer games, merely requires input values and the corresponding where typically there is no scalar (and even pass/fail outcomes/behaviors as the training set. The structured) output. In these programs, the training set is used to train a binary classi er that types of outputs are commonly unstructured or serves as the program's test oracle. Later, during the semi-structured [10]. Therefore, encoding these testing phase, several input parameters for which the outputs is mainly dicult, inaccurate, or even corresponding execution outcome is unknown are given infeasible, while labeling them with pass/fail is to the classi er. The classi er labels the outcome as usually feasible; pass or fail. Figure 3 illustrates an overview of our approach using Arti cial Neural Network (ANN) as the 3. Graphical User Interface (GUI) based programs, in binary classi er. which users are faced with various graphical elds Besides solving the mentioned problems, the pre- instead of neat values.