Learning Test Traces

Eyal Hadad
Software and Information Systems Engineering
Ben Gurion University of the Negev
Be'er Sheva, Israel
[email protected]

Roni Stern
Software and Information Systems Engineering
Ben Gurion University of the Negev
Be'er Sheva, Israel
[email protected]

Abstract—Modern software projects include automated tests written to check the programs' functionality. The set of functions invoked by a test is called the trace of the test, and the action of obtaining a trace is called tracing. There are many tracing tools, since traces are useful for a variety of tasks such as test generation, fault localization, and test execution planning. A major drawback in using test traces is that obtaining them, i.e., tracing, can be costly in terms of computational resources and runtime. Prior work attempted to address this in various ways, e.g., by selectively tracing only some of the software components or by compressing the trace on-the-fly. However, all these approaches still require building the project and executing the test in order to get its (partial, possibly compressed) trace. This is still very costly in many cases. In this work, we propose a method to predict the trace of each test without executing it, based only on static properties of the test and the tested program, as well as past experience on different tests. This prediction is done by applying supervised learning to learn the relation between various static features of a test and a function and the likelihood that the former will include the latter in its trace. Then, we show how to use the predicted traces in a recent automated troubleshooting paradigm called Learn, Diagnose, and Plan (LDP), instead of the actual, costly-to-obtain, test traces. In a preliminary evaluation on real-world open-source projects, we observe that our prediction quality is reasonable. In addition, using our trace predictions in LDP yields almost the same results as using the real traces, while requiring less overhead.

Index Terms—Automated debugging, Machine learning, Software diagnosis, Automated testing
I. INTRODUCTION

The number of software projects and their size increase every day, while their time-to-market decreases. As a result, the number of bugs in software projects increases. Bugs damage the performance of software products and directly affect their customers. Therefore, software companies invest heavily in quality assurance, and quality-related costs can consume as much as 60% of the development budget [1].

To maintain software quality, modern software projects include automated tests written to check the programs' functionality. The set of functions invoked by a test is called the trace of the test. Many tools and research papers use test traces to perform a range of software engineering tasks, including test generation, bug isolation, and managing test execution. In test generation tools, traces are collected and used to compute coverage, which is the union of the sets of functions in the traces of the generated tests. Maximizing coverage is a standard objective in popular test generation frameworks, such as EvoSuite [2]. In bug isolation, traces are used by software diagnosis algorithms such as Barinel [3] to localize the root cause of observed bugs. Traces are also used to prioritize which tests to execute [4], [5].

A main drawback in using traces for all of the above tasks is that collecting traces is costly in terms of computational resources and runtime. This is because in order to obtain the trace of a test, one must build the project and execute the test while applying techniques such as byte-code manipulation to record its trace. All these activities can be very costly in real-sized projects, and the size of the resulting trace can be prohibitively large. Prior work partially addressed this by compressing the trace while it is collected [6], [7] and by choosing selectively which software components to include in a trace [8]–[11]. These approaches are very useful, but they still require executing the test.

In this work, we propose to learn to predict the trace of a test without executing it. This prediction relies only on static properties of the test and the tested program, as well as previously collected traces of other tests. The first contribution of our work is to define the trace prediction problem and model it as a binary classification problem. Then, we propose to use a supervised learning algorithm over traces of previously executed tests to solve this classification problem, and suggest easy-to-extract features to do so. This is the second contribution of this work.

One of the benefits of having a trace prediction algorithm is that it can be used instead of real traces in software engineering tasks. We show this for the task of software troubleshooting. In particular, we propose to integrate our trace prediction algorithm in a recently proposed automated troubleshooting paradigm called LDP [5]. LDP aims to identify the root cause of an observed bug, and does so by using a combination of techniques from the Artificial Intelligence (AI) literature. It uses a software diagnosis algorithm to output candidate diagnoses. If this set is too large, LDP uses a test planning (TP) algorithm to choose which tests to perform next in order to collect more information for the diagnosis algorithm. Prior work on the TP component of LDP assumed that the test planner knows the traces of the tests it is planning to execute. We propose a simple TP algorithm that can use the predicted traces instead of the actual, costly-to-obtain, test traces. This is the third contribution of this work.

Finally, we perform a small-scale evaluation of our trace prediction algorithm and of our TP algorithm in LDP on real-world open-source projects. Results show that while prediction quality can be improved, it is sufficiently accurate to be used by our TP algorithm to guide LDP to troubleshoot bugs almost as well as when using real traces.

II. BACKGROUND AND PROBLEM DEFINITION

An automated test is a method that executes a program in order to check if it is working properly. The outcome of running a test is either that the test has passed or failed, where a failed test indicates that the program is not behaving properly.¹ The trace of a test is the set of software components, e.g., classes or methods, that were activated during the execution of the test.

¹ In general, there are automated tests that have other types of outcomes, e.g., tests that measure response time.
We denote by outcome(t) and trace(t) the outcome and the trace, respectively, of a test t. Modern software programs include a large set of automated tests. For a given program of interest, we denote its set of automated tests by T, and its set of software components by COMPS. Note that for every test t ∈ T it holds that trace(t) ⊆ COMPS.

There are several techniques for obtaining the trace of a test after executing it. A common way to obtain the trace of a test t is to modify the program's source code so that it records every function invocation, and then run t. For example, in Java programs such code modification can be done at runtime using byte-code manipulation frameworks such as ASM (http://asm.objectweb.org), BCEL (http://jakarta.apache.org/bcel), and SERP (http://serp.sourceforge.net). Another way to obtain the trace of a test is to execute it with a debugging tool, and, again, record every function invocation. These tracing techniques have been used in practice in various tracing tools, such as iDNA [12] and Clover (https://www.atlassian.com/software/clover). For a survey of tracing tools, see [13].

All these tools require running the test in order to obtain its trace. However, running a test can be costly in terms of computational resources and time. Moreover, tracing techniques usually incur non-negligible overhead, e.g., for printing out the trace to a file. The goal of this research is to output the trace of an automated test t without actually running it. We call this problem the trace prediction problem, and apply supervised learning to solve it.

III. LEARNING TO PREDICT A TRACE

In this section, we provide relevant background on supervised learning and show how the trace prediction problem can be solved with supervised learning techniques.

A. Supervised Learning

Supervised learning is perhaps the most widely used branch of Machine Learning (ML). Broadly speaking, supervised learning aims to learn from example input-output pairs a function that maps from input to output [14]. Supervised learning is commonly used to solve classification tasks. A classification task is the task of mapping a label to a given instance, where the set of possible labels is discrete and finite. A binary classification task is a classification task in which there are only two possible labels.

To solve a classification task using supervised learning, one needs to accept as input a set of instance-label pairs, i.e., a set of instances and their labels. This is called a training set. Supervised learning algorithms work, in general, as follows: they extract features from every instance in the training set, and then run an optimization algorithm to search for parameters of a chosen mathematical model that maps the values of these features to the correct label.

The output of a supervised learning algorithm is this mathematical model along with optimized values for its parameters. In the context of classification, this is called a classifier. A classifier can be used to output a label for a previously unseen instance, by extracting the features of this instance, inserting their values into the learned mathematical model, and outputting the resulting label.

B. Trace Prediction as a Binary Classification Problem

Trace prediction can be viewed as a binary classification task. An instance in trace prediction is a pair (t, c) where t is a test and c is a software component. The label is true if and only if c is in the trace of t, i.e., iff c ∈ trace(t). The corresponding binary classifier is a classifier that accepts a pair (t, c) where t ∈ T and c ∈ COMPS, and outputs true if c ∈ trace(t) and false otherwise. We refer to such a classifier as a trace classifier. A trace classifier can be used to solve the trace prediction problem: for a given test t, run over all software components c ∈ COMPS, and return only the components labeled as true by the classifier.

To solve this binary classification problem with supervised learning, we need a training set of test-component pairs and their correct labels. To generate such a training set, we propose to run a subset T_train ⊂ T of all automated tests and record their traces. Generating this training set is costly. However, as we show in our experimental results, only a small fraction of all tests is needed in order to obtain a sufficiently large training set. In addition, this training step can be done only once for every major software version, as opposed to every time one needs to obtain the trace of a test.

Given a training set, we can use an off-the-shelf supervised learning algorithm to learn a trace classifier. Key factors in the successful application of such algorithms are (1) which optimization algorithm to use, (2) which classifier model to choose, and (3) which features to extract from each instance. There are many general-purpose optimization algorithms for supervised learning, such as stochastic gradient descent [15] and Adam [16]. Similarly, popular classifier models such as decision trees [17], random forests [18], support vector machines [19], and artificial neural networks [20] are commonly used in supervised learning. Designing useful features, however, is often done manually, with the aid of a domain expert.
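To make this formulation concrete, the following minimal sketch (in Python) builds a labeled training set from a subset of executed tests and fits an off-the-shelf learner. It is an illustration rather than our implementation: run_test_with_tracing and extract_features are hypothetical helpers standing in for the tracing infrastructure and for the features of Section III-C, and scikit-learn's MLPClassifier is used only as an example learner.

    # Sketch of Section III-B: turning executed tests into a labeled dataset of
    # (test, component) pairs and fitting a trace classifier.
    # run_test_with_tracing() and extract_features() are hypothetical helpers.
    from sklearn.neural_network import MLPClassifier

    def build_training_set(train_tests, comps, run_test_with_tracing, extract_features):
        """Label every (test, component) pair by whether the component
        appears in the recorded trace of the test."""
        X, y = [], []
        for t in train_tests:                      # T_train, a small subset of T
            trace = run_test_with_tracing(t)       # set of components hit by t
            for c in comps:                        # COMPS
                X.append(extract_features(t, c))   # numeric feature vector
                y.append(c in trace)               # True iff c is in trace(t)
        return X, y

    def train_trace_classifier(X, y):
        # Any off-the-shelf supervised learner works here; a small feed-forward
        # network mirrors the setup used later in Section V.
        clf = MLPClassifier(hidden_layer_sizes=(30,), max_iter=3000)
        clf.fit(X, y)
        return clf

    def predict_trace(clf, test, comps, extract_features, threshold=0.5):
        """Predicted trace of a test: all components the classifier labels true."""
        probs = clf.predict_proba([extract_features(test, c) for c in comps])[:, 1]
        return {c for c, p in zip(comps, probs) if p >= threshold}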
C. Features for Trace Prediction

We extracted two types of features for trace prediction: features based on call graph analysis and features based on syntactic similarity.

1) Features based on Call Graph Analysis: Running a test and extracting its trace is a form of dynamic code analysis. Static code analysis is an alternative common form of computer program analysis in which the program's source code is analyzed without running the actual program. Thus, information extracted in this way is especially suitable for our purposes, since we aim to predict a test's trace without running it.

Static code analysis tools output a variety of source code artifacts, such as code smells [21] and code complexity metrics. A call graph is a standard output of many static code analysis tools that is particularly useful for our purposes. It is a graph in which every node represents a function, and there is a directed edge from n to n' if the function represented by n contains a call to the function represented by n'. Note that this does not mean that every invocation of n will also invoke n', e.g., when the call to n' is in an "if" branch that was not reached.

Obviously, the call graph contains important information with respect to whether a function is in the trace of a test or not. For example, if there is a path in the call graph from a function n to a different function n', then there exists an invocation of n in which n' will be in its trace.² In practice, for a given project, we generate its static call graph and extract the following features for a given test function t ∈ T and component c ∈ COMPS:

• Path existence. A Boolean feature that indicates whether there is a path between t and c.
• Shortest path. The length of the shortest path between t and c.
• Target in-degree. The in-degree of c.
• Source out-degree. The out-degree of t.

² Strictly speaking, this may not be true, as there may be branches in a function that can never be executed, e.g., "if (1!=1) do X". However, this is not common.

2) Syntactic Similarity: A major limitation of call graphs generated by static code analysis is that they cannot detect dynamic function calls. These are function calls that occur, e.g., due to function polymorphism or dynamically loaded libraries. Thus, a component c may be in the trace of a test t even if there is no path from t to c in the call graph. More generally, the call graph by itself does not provide sufficient information to determine the trace of a test.

To partially fill this gap, we consider another type of features that is based on the syntactic similarity between the name of the test t and the name of the component c. The underlying intuition is that if a test and a class have similar names, then it is more likely that t aims to use c, and thus c will be in its trace. Based on this intuition, we extracted the following features:

• Common words. The number of words that appear in both the test's name and the component's name. Names were split into words using Camel notation, i.e., a capital letter is assumed to mark the start of a new word. This notation is standard in Java and other languages.
• Name distance. The edit distance between the test's name and the component's name, normalized to the [0,1] range by dividing the edit distance by the length of the longer name.

In our experiments, every test t is a method in a JUnit test class T and every component c is a method in some class C. We computed the above syntactic similarity features for the names of t and c, and for the names of T and C. Thus, we extracted four syntactic similarity features: class common words, method common words, class name distance, and method name distance.
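The following sketch illustrates how these features could be computed, assuming the static call graph is available as a networkx directed graph whose nodes are function identifiers; building that graph (e.g., with a Java static analysis tool) is outside the sketch, and the helper names are ours.

    # Sketch of the features of Section III-C, assuming the static call graph is
    # given as a networkx DiGraph whose nodes are function identifiers.
    import re
    import networkx as nx

    def call_graph_features(G, t, c):
        has_path = nx.has_path(G, t, c) if t in G and c in G else False
        shortest = nx.shortest_path_length(G, t, c) if has_path else -1  # -1: no path
        return {
            "path_existence": int(has_path),
            "shortest_path": shortest,
            "target_in_degree": G.in_degree(c) if c in G else 0,
            "source_out_degree": G.out_degree(t) if t in G else 0,
        }

    def camel_words(name):
        # Split a Java-style identifier into words at capital letters.
        return {w.lower() for w in re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", name)}

    def common_words(test_name, comp_name):
        return len(camel_words(test_name) & camel_words(comp_name))

    def edit_distance(a, b):
        # Classic dynamic-programming Levenshtein distance.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]

    def name_distance(test_name, comp_name):
        # Edit distance normalized to [0, 1] by the length of the longer name.
        longer = max(len(test_name), len(comp_name)) or 1
        return edit_distance(test_name, comp_name) / longer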
IV. TROUBLESHOOTING WITH PREDICTED TRACES

In this section, we present how our trace prediction algorithm can be used for automated software troubleshooting. In particular, we show how our trace prediction algorithm can be integrated in LDP [5], a newly proposed Artificial Intelligence (AI)-based paradigm for software troubleshooting. For completeness, we first provide relevant background on software troubleshooting and LDP.

A. The Learn, Diagnose, and Plan (LDP) Paradigm

Software troubleshooting is a process that starts with an undesirable behavior of a given software, and ends by isolating the software components that should be fixed to avoid this undesirable behavior. Concretely, we focus on the case where one or more automated tests fail, and aim to find the software components that should be fixed to make all tests pass.

LDP is a paradigm for software troubleshooting that contains three AI components: a learning algorithm, a diagnosis algorithm, and a planning algorithm.

1) Learning to predict bugs: This LDP component is fundamentally a software fault prediction algorithm [22]. It automatically matches between previously reported bugs and the code revisions made to fix them to create a dataset of buggy software components. This dataset is used to train a classifier that accepts a software component and outputs whether that component is expected to have a bug or not. Such learning-based software fault prediction algorithms work surprisingly well, at least for mature programs [23], [24]. In addition, most learning-based software fault prediction algorithms can output a real value that represents the confidence that a component is faulty.

2) Diagnosing software bugs: This LDP component is a software diagnosis algorithm. A software diagnosis algorithm, also known as software fault localization, accepts a set of executed tests and their outcomes, and outputs one or more diagnoses. Each diagnosis is a set of software components that may have caused the observed bug. There are many approaches for software diagnosis; for a survey see [25].

The LDP diagnosis component is based on Barinel [3], a state-of-the-art software diagnosis algorithm. Barinel follows a spectrum-based fault localization (SFL) approach. It accepts as input a set of tests that were executed, the outcomes of these tests, and their traces. Then, it runs a fast hitting-set algorithm to find minimal sets of components that "hit" the trace of every failed test. Every such set is returned as a diagnosis.

Barinel can scale to real-world software systems and has many extensions [26], [27]. However, it tends to output a large set of diagnoses, denoted Ω. While each of these diagnoses is a set of components that may have caused the bug, only one diagnosis is correct, i.e., contains the actual components that caused the bug. Thus, returning a large set of diagnoses is not useful. To mitigate this, Barinel outputs for every diagnosis ω ∈ Ω also a score, denoted p(ω), that roughly corresponds to the likelihood that the components in ω are indeed the root cause of the failed tests. In LDP, this score function is modified so that it is affected by the fault likelihood estimates from LDP's learning component [5]. Nevertheless, in many cases too many diagnoses are still returned that have similar, non-negligible, scores.
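To illustrate the hitting-set step only (this is not Barinel itself, and the computation of the scores p(ω) is omitted), a brute-force sketch that enumerates minimal candidate diagnoses from the traces of failed tests could look as follows.

    # Illustrative sketch of the hitting-set idea behind SFL-style diagnosis: a
    # diagnosis is a minimal set of components that intersects ("hits") the trace
    # of every failed test. Not Barinel itself; diagnosis scores are omitted.
    from itertools import combinations

    def minimal_hitting_sets(failed_traces, max_size=3):
        comps = sorted(set().union(*failed_traces))
        found = []
        for k in range(1, max_size + 1):
            for cand in combinations(comps, k):
                s = set(cand)
                if any(f <= s for f in found):     # skip supersets of smaller diagnoses
                    continue
                if all(s & trace for trace in failed_traces):
                    found.append(s)
        return found

    # Example: two failed tests whose traces share component "b".
    print(minimal_hitting_sets([{"a", "b"}, {"b", "c"}]))  # [{'b'}, {'a', 'c'}] (set order may vary)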

3) Planning additional tests: This LDP component is a planning algorithm. The input to this planning algorithm is (1) the set of tests that were not executed so far, T_left ⊂ T, and (2) the set of diagnoses Ω that was returned by the diagnosis algorithm (DA), along with their scores. Its output is the next test to execute.

In LDP, the selected test is executed, obtaining its trace and outcome. This information is added to the set of tests given to the DA, which outputs a potentially more refined set of diagnoses. If this set is still too large to be useful, the TP algorithm chooses an additional test to be executed. This process continues until either the correct diagnosis has been found, or all tests in T have been executed (i.e., T_left = ∅), or the time allowed for automated troubleshooting has expired.

Figure 1 illustrates the LDP workflow. In the background, the learning component (dashed rectangle 1) learns a fault prediction model by extracting information about bugs that were previously reported in the issue tracker (JIRA) and whose fixes were committed in the version control system (Git). Then, when a test fails, the diagnose component (dashed rectangle 2) runs a DA and outputs a set of diagnoses. If needed, the planning component chooses which test to execute next (dashed rectangle 3).

Fig. 1. An illustration of the LDP paradigm. Dashed rectangles are the AI components.

B. Test Planning with Predicted Traces

Several TP algorithms have been proposed for troubleshooting [4], [5], including sophisticated algorithms that compute the information gain of each test, or that aim to solve an exponentially large Markov Decision Problem (MDP). These algorithms are computationally expensive, and, importantly, require knowing the trace of every test in T_left although it has not been executed.

This is, of course, not realistic in practice. One approach to address this is to store the traces of every executed test. This has several limitations. First, it requires storing large amounts of data. Second, it requires running all tests at least once while recording their traces. This is very time consuming. Third, the trace of a test may change after modifying some component, and thus stored traces may be outdated and incorrect.

We propose a novel TP algorithm that uses a trace prediction algorithm instead of knowing the actual trace of every test. This TP algorithm requires a trace classifier, which we assume is generated using the trace prediction algorithm described in Section III. In addition, we require that the trace classifier output a confidence score that roughly approximates the likelihood that the classifier is correct. As mentioned earlier, most learning-based classifiers can output such a value. In our case, an instance is a component-test pair (c, t) and the label is true iff c ∈ trace(t). Thus, the confidence of the trace classifier approximates the likelihood that c really is in trace(t). We denote this confidence score by conf(c, t).

Our TP algorithm also relies on computing the health state [28] of every component c ∈ COMPS for the set of diagnoses and their scores returned by the DA. The health state of c, denoted H(c), is defined as the sum of the scores of every diagnosis ω ∈ Ω in which c is assumed faulty. Formally,

    H(c) = Σ_{ω ∈ Ω | c ∈ ω} p(ω)    (1)

If the score of a diagnosis is indeed its probability to be correct and Ω is the set of all diagnoses, then H(c) is the probability that c is faulty [28].
Finally, we are ready to define our TP algorithm. It computes for every test t the following utility function U(t):

    U(t) = Σ_{c ∈ COMPS} conf(c, t) · H(c)    (2)

This utility function aims to approximate the expected number of faulty components in trace(t). Our TP algorithm returns the test with the highest utility. For practical reasons, in our implementation we computed the utility of a test t by summing only the terms conf(c, t) · H(c) for the 40 components that are most likely to be in trace(t) according to our trace classifier.
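A minimal sketch of this TP algorithm, assuming the diagnoses in Ω are given as (components, score) pairs and conf(c, t) is obtained from the trace classifier's predicted probabilities, is given below.

    # Sketch of the TP algorithm of Section IV-B: health states H(c) from Eq. (1),
    # utilities U(t) from Eq. (2), and selection of the highest-utility test.
    # `diagnoses` is a list of (components, score) pairs; `conf` maps (c, t) to the
    # classifier's confidence that c is in trace(t).
    def health_states(diagnoses, comps):
        H = {c: 0.0 for c in comps}
        for components, score in diagnoses:      # diagnosis ω with score p(ω)
            for c in components:
                H[c] += score                    # Eq. (1): sum scores of diagnoses containing c
        return H

    def utility(test, comps, conf, H, top_k=40):
        # Eq. (2), truncated to the top_k components most likely to be in trace(t),
        # as done in our implementation for efficiency.
        ranked = sorted(comps, key=lambda c: conf(c, test), reverse=True)[:top_k]
        return sum(conf(c, test) * H[c] for c in ranked)

    def choose_next_test(candidate_tests, comps, conf, diagnoses):
        H = health_states(diagnoses, comps)
        return max(candidate_tests, key=lambda t: utility(t, comps, conf, H))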

Figure 2 shows how our trace prediction integrates into LDP. Whenever a test is executed and its trace is collected, the trace is added to the training set used to train our trace classifier (dashed rectangle 4). The TP algorithm (dashed rectangle 3) can then use the classifier as described above.

Fig. 2. An illustration of the LDP paradigm with our trace prediction added to the learning component. Dashed rectangles are the AI components.

V. EXPERIMENTAL RESULTS

In the first part of this section, we evaluate experimentally the performance of our trace prediction algorithm (Section III). In the second part, we evaluate experimentally the performance of our TP algorithm, which uses trace prediction, in the context of software troubleshooting (Section IV).

For our experiments, we used two popular open-source projects: Apache Commons Math and Apache Commons Lang. Both projects are Java projects and use the Git version control system. Table I lists general details about these projects, including the dates of the first and last release, the average number of functions and tests in a version, and the average length of a trace of a test.

TABLE I
GENERAL STATISTICS FOR THE PROJECTS USED IN OUR EXPERIMENTS.

    Project                       Lang    Math
    First release                 2002    2004
    Last release                  2018    2018
    Average number of functions   4,000   7,400
    Average number of tests       1,660   3,500
    Test average length           11      39

A. Trace Prediction Experiments

For each project, we chose a set of versions. For each version, we created a dataset of labeled component-test pairs by executing all available automated tests and collecting their traces. To speed up this process, we limited our tracing to 10% of the methods. Thus, for a version with 100 tests and 2,000 methods, we obtained a dataset with 20,000 instances. This dataset was split equally (50/50) into train and test sets.

1) Learning algorithms: For learning, we used a feed-forward artificial neural network and trained it using backpropagation. There are many possible neural network architectures, solvers, and activation functions. The configuration we used is a single hidden layer with 30 neurons, the Adam optimization algorithm [16], the ReLU activation function [29], and a maximum of 3,000 iterations for training. We also tried other configurations, including different optimization algorithms (Newton's method and stochastic gradient descent) and activation functions (Sigmoid), but the above configuration worked best. In addition, we tried a deep neural network architecture that includes 5 hidden layers, each with 30 neurons. We refer to the first architecture as "NN" and to the deep 5-layer architecture as "DNN".

2) Evaluation metrics: We used standard supervised learning metrics to evaluate our trace prediction algorithm. Specifically, we report the following metrics.

• True negatives (TN). Percentage of instances where the target function is not in the trace and it is correctly classified as such.
• False positives (FP). Percentage of instances where the target function is not in the trace but it is incorrectly classified as in the trace.
• False negatives (FN). Percentage of instances where the target function is in the trace but it is incorrectly classified as not in the trace.
• True positives (TP). Percentage of instances where the target function is in the trace and it is correctly classified as such.
• Area under the receiver operating characteristic curve (AUC). The receiver operating characteristic (ROC) curve plots the TP rate against the FP rate for different threshold values. A perfect classifier has an AUC value of 1 and a random classifier has an AUC of 0.5.
• Acc. Accuracy is the ratio of instances that are classified correctly. This ratio is given by (TP + TN) / (TP + TN + FP + FN).

All of these are standard machine learning metrics, commonly used to evaluate binary classifiers. For a more comprehensive discussion of these metrics, see [30].
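For concreteness, the sketch below shows one way to instantiate the NN and DNN configurations described above and to compute the reported metrics with scikit-learn. It reflects the described setup rather than our exact implementation; X_train, y_train, X_test, y_test denote the 50/50 split of labeled instances.

    # Sketch of the NN/DNN configurations and evaluation metrics of Section V-A.
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix

    nn  = MLPClassifier(hidden_layer_sizes=(30,),     activation="relu", solver="adam", max_iter=3000)
    dnn = MLPClassifier(hidden_layer_sizes=(30,) * 5, activation="relu", solver="adam", max_iter=3000)

    def evaluate(model, X_train, y_train, X_test, y_test):
        model.fit(X_train, y_train)
        scores = model.predict_proba(X_test)[:, 1]    # confidence that c is in trace(t)
        preds = scores >= 0.5
        tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
        total = tn + fp + fn + tp
        return {
            "AUC": roc_auc_score(y_test, scores),
            "Acc": accuracy_score(y_test, preds),
            "TN%": 100 * tn / total, "FP%": 100 * fp / total,
            "FN%": 100 * fn / total, "TP%": 100 * tp / total,
        }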
3) Results: Table II presents the results of our prediction experiments. Note that the result in every data cell is the average obtained by our algorithm over all datasets.

TABLE II
RESULTS OF THE PREDICTION EXPERIMENTS.

    Algorithm  Project  AUC    Acc    TN       FP     FN     TP
    NN         Lang     0.795  0.998  99.68%   0.07%  0.10%  0.15%
    DNN        Lang     0.779  0.998  99.69%   0.04%  0.12%  0.15%
    NN         Math     0.602  0.993  99.28%   0.27%  0.35%  0.10%
    DNN        Math     0.596  0.996  99.52%   0.03%  0.36%  0.09%

The accuracy obtained for both architectures (NN and DNN) is very high. For example, in the Math project the average accuracy is 0.993 and 0.996 for NN and DNN, respectively. However, these high accuracy results are misleading, because our dataset is highly imbalanced. That is, most methods are not part of a given trace, and thus the true label of most instances (i.e., most method-test pairs) is negative. For example, according to Table I there are 4,000 functions in Lang and the average number of functions called in a test is 11. Thus, less than 0.3% of all instances are expected to be positives. This means that even a naive trace classifier that says for every pair (c, t) that c is not in the trace of t would be quite accurate.
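This imbalance argument can be checked with a few lines, using the Table I averages for Lang and assuming they are representative of a single version:

    # Back-of-the-envelope check of the class imbalance in Lang (Table I):
    # ~11 of ~4,000 functions appear in an average trace.
    functions_per_version = 4000
    avg_trace_length = 11

    positive_rate = avg_trace_length / functions_per_version
    print(f"expected positive rate: {positive_rate:.2%}")              # ~0.28%, i.e., < 0.3%

    # Accuracy of the naive classifier that always answers "not in the trace":
    print(f"all-negative baseline accuracy: {1 - positive_rate:.3f}")  # ~0.997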

AUC is a common metric to evaluate binary classifiers over imbalanced datasets. Indeed, we observe that the AUC for NN and DNN is far from perfect. For example, in the Lang project the AUC of NN and DNN is 0.795 and 0.779, respectively, while in the Math project the AUC of NN and DNN is only 0.602 and 0.596, respectively. We hypothesize that it is possible to increase the AUC by incorporating known techniques for handling imbalanced datasets, such as under-sampling and over-sampling [31], as well as by devising more sophisticated features. However, as we show in the next set of experiments, our trace prediction was accurate enough to guide software troubleshooting effectively.

To gain a better understanding of the impact of the different features (detailed in Section III-C), we performed an all-but-one analysis, which works as follows. We choose a feature f and train two models: one with all features and one with all features except f. Then, we evaluate the performance of both models on the test set. The importance of f is computed as the difference between the AUC of the model that considers all features and the AUC of the model that considers all features except f.

TABLE III
FEATURE IMPORTANCE IN THE LANG PROJECT.

    Feature name           NN    DNN
    Shortest path          0.10  0.16
    Target in-degree       0.02  0.00
    Path existence         0.12  0.00
    Method common words    0.00  0.02
    Method names distance  0.04  0.00

Table III shows the importance of each feature according to this analysis, for the NN and DNN models. As can be seen, the most important features are those based on static code analysis, namely, the length of the path in the call graph from the test to the method ("Shortest path") and whether such a path exists ("Path existence"). The features based on syntactic similarity – common words and name distance – are also useful, but to a lesser degree.
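The all-but-one analysis itself is a short loop. The sketch below assumes the feature matrices are pandas DataFrames with one column per feature and that make_model() returns a fresh, untrained classifier (e.g., the NN above); it is illustrative rather than our exact code.

    # Sketch of the all-but-one feature-importance analysis: the importance of a
    # feature is the AUC drop observed when the model is trained without it.
    from sklearn.metrics import roc_auc_score

    def auc_of(model, X_train, y_train, X_test, y_test):
        model.fit(X_train, y_train)
        return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    def all_but_one_importance(make_model, X_train, y_train, X_test, y_test):
        full_auc = auc_of(make_model(), X_train, y_train, X_test, y_test)
        importance = {}
        for feature in X_train.columns:
            auc_without = auc_of(make_model(),
                                 X_train.drop(columns=[feature]), y_train,
                                 X_test.drop(columns=[feature]), y_test)
            importance[feature] = full_auc - auc_without
        return importance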

Interestingly, we do not observe a significant difference between the shallow neural network (NN) and the deep one (DNN). For example, the AUC of NN and DNN in the Lang project is 0.795 and 0.779, respectively. Therefore, for the rest of our experiments, we used the NN. Note that the design space for constructing a DNN is very large, and one may come up with a DNN architecture that would be more efficient.

B. Troubleshooting Experiment

In the next set of experiments, we evaluated how our trace prediction can be used in LDP, as described in Section IV.

1) TP algorithms: We compared our TP algorithm, referred to as Predicted, to two baselines. The first uses the real traces of the available tests. That is, it assumes that conf(c, t) is one iff c ∈ trace(t) and zero otherwise. We refer to this baseline as Oracle. The second baseline TP algorithm, referred to as Random, chooses a test randomly. Note that Oracle cannot be used in practice, since we cannot check if c is in trace(t) without executing the test. Thus, Oracle serves as an "upper bound" on the performance of our TP algorithm, while Random is expected to be worse than our TP algorithm, serving as a "lower bound".

2) Evaluation metrics: To compare these test planning methods, we ran LDP as described in Section IV on a set of previously reported bugs from the Lang and Math projects. These bugs were collected from the Defects4J repository [32] and by mining the issue tracking and version control systems of these projects. In total, we used 112 bugs for Lang and 48 bugs for Math.

For each of these reported bugs we conducted the following experiment. First, we chose a set of initial tests and verified that at least one of them is marked as a failed test. A test is marked as a failed test if its trace contains the buggy function. Then, we ran LDP using one of the evaluated TP algorithms until one of the following conditions occurred:

• A diagnosis was found whose likelihood is above S, where S is a parameter referred to as the score threshold. We set S = 0.7 in our experiments.
• More than B tests have been run, where B is a parameter referred to as the test budget. We set B = 50, 75, 100, 125, and 150 in our experiments.

If the troubleshooting process stopped because of the first condition, we say it has converged. Otherwise, we say that it has timed out. Converging is desirable because it means the troubleshooting process believes it has successfully isolated the root cause of the bug.
For each bug, we performed three experiments, one for each TP algorithm. In each experiment, we recorded the number of tests performed until the troubleshooting process either converged or timed out. We refer to this as the number of steps performed. Fewer steps is better, as it means less testing effort for troubleshooting.

For our TP algorithm, we used a trace classifier that was trained on an earlier version of the code. That is, when running an experiment on a bug reported for version X, we trained the trace prediction model on the dataset extracted for version X−1 or earlier. Also, recall that only 10% of all methods were used to train the trace classifier.
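A single troubleshooting run under a given TP algorithm can be sketched as follows, with the stopping conditions described above (score threshold S and test budget B). The helpers run_diagnoser, execute_test, and plan_next_test are hypothetical stand-ins for the diagnosis algorithm, the test runner, and the evaluated TP algorithm.

    # Sketch of one troubleshooting experiment (Section V-B): run LDP with a given
    # TP algorithm until a diagnosis exceeds the score threshold S or the test
    # budget B is exhausted.
    def troubleshoot(initial_observations, remaining_tests, run_diagnoser,
                     execute_test, plan_next_test, S=0.7, B=50):
        observations = list(initial_observations)    # (test, trace, outcome) triples
        remaining = list(remaining_tests)            # T_left
        for step in range(B + 1):
            diagnoses = run_diagnoser(observations)  # list of (components, score)
            if any(score >= S for _, score in diagnoses):
                return {"converged": True, "steps": step}
            if step == B or not remaining:
                break
            t = plan_next_test(remaining, diagnoses) # Predicted / Oracle / Random
            remaining.remove(t)
            trace, outcome = execute_test(t)
            observations.append((t, trace, outcome))
        return {"converged": False, "steps": min(step, B)}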

Fig. 3. Lang project. # of versions that converged for different test budgets.
Fig. 4. Math project. # of versions that converged for different test budgets.

3) Results: Figures 3 and 4 plot the number of bugs for which each algorithm converged, for B = 50, 75, 100, 125, and 150, for the Lang and Math projects, respectively. The results clearly show that Random fails to converge much more often than Predicted and Oracle. For example, in the Lang project with a test budget of 150, Random converges in fewer than 10 versions, while both Oracle and Predicted converged in more than 50 versions.

When comparing Predicted and Oracle, we observe very similar results in most cases, with a small advantage for Oracle. The advantage of Oracle is expected because it uses the actual test traces to choose which test to execute. However, this advantage is very small, suggesting that our trace prediction algorithm works well for guiding tests in a software troubleshooting process.

Note that the advantage of Oracle over Predicted is larger for Math than for Lang. This corresponds to the trace prediction accuracy results reported in the previous section, as the AUC for Lang is significantly higher than the AUC for Math. This shows the relation between the strength of the trace prediction and the effectiveness of the troubleshooting.

To gain a deeper insight into our results, Figures 5 and 6 present the number of steps until each algorithm converged, for the cases where it did not reach a timeout. The y-axis represents the number of steps performed, for different versions and TP algorithms. Every line represents a different TP algorithm and every data point represents a single experiment, i.e., the number of steps until convergence for a specific bug and TP algorithm. The data points for every test planning algorithm are sorted on the x-axis according to their number of steps. Of course, being lower on the y-axis is better, as it means fewer steps are needed to converge. In these figures, we did not show bugs for which all algorithms reached the budget without converging.

Fig. 5. Lang project. # of steps until reaching one of the stopping conditions.
Fig. 6. Math project. # of steps until reaching one of the stopping conditions.

Similar trends are observed in these results. Random performs significantly worse than Predicted and Oracle, in both projects. Again, Oracle performs the same as or better than Predicted in almost all cases. However, the results clearly show that the difference between them is not large.
VI. RELATED WORK

Tracing the dynamic execution of a program has many applications, and consequently, a large body of research is devoted to studying different aspects of efficient and effective tracing. To the best of our knowledge, no prior work attempted to learn how to predict the trace of a test by analyzing traces of other tests. The most similar research we have found is Daniel and Boshernitsan's [33] work on predicting the effectiveness of test generation algorithms. They propose to use supervised learning to train a classifier that predicts the coverage of a given function obtained by running a given test. The label they aimed to predict is how much the function is covered, while we aim to predict the actual trace. Thus, our trace prediction can be used to estimate coverage, but not vice versa. An interesting future work would be to use our trace prediction algorithm to estimate coverage.

One branch of research studied different ways to visualize collected traces in a meaningful way [34], [35]. A different branch aims to reduce the size of the generated traces, e.g., by compressing them on-the-fly [6], [7] or by identifying the parts of the code that are not relevant for the task at hand and thus can be removed from the trace (or not traced to begin with) [8]. Other prior work aimed to minimize the cost of tracing by choosing intelligently which components to monitor, considering this as a sensor-placement problem [11]. Liblit et al. [9], [10] proposed a statistical method to sample traces, in the context of bug isolation. Their approach is somewhat similar to our work, but in their task they are given a part of the trace. Thus, they must execute the test, while our trace prediction algorithm does not.

A. Threats to Validity and Limitations

Our approach for trace prediction requires a large set of automated tests to be available and executable in past versions of the code. Thus, our work is not suitable for projects in an early stage. Future work may study how to learn trace prediction from one project and transfer it to another. The main limitation of this research is the breadth of our experimental evaluation. Future work will perform a large-scale study over more projects and more programming languages.

VII. CONCLUSION AND FUTURE WORK

We proposed a trace prediction algorithm that learns from a small fraction of existing traces how to predict whether a given software component is in the trace of a given test. Our approach is based on modeling the trace prediction problem as a binary classification problem and applying supervised learning to solve it. To this end, we propose features based on call graph analysis and syntactic similarity, and show experimentally that they work well on two open-source projects. Then, we show how predicted traces can be used in a TP algorithm, as part of LDP, a recently proposed software troubleshooting paradigm. We evaluated LDP with our TP algorithm, showing that it converges much faster than with random test selection, and almost as fast as with an Oracle TP algorithm that knows the tests' traces a priori.

While our results are encouraging, there is much to do in future work. First, the quality of our trace prediction is far from perfect. This is because the available data is highly imbalanced, where the vast majority of component-test pairs are negative (i.e., the component is not in the trace of the test). Future work can attempt to address this using known techniques for imbalanced datasets. In addition, future work will extend our experimental evaluation to more projects, and include a user case study. Another interesting direction is to combine our trace prediction algorithm with symbolic execution methods, as well as with stronger TP algorithms.

VIII. ACKNOWLEDGEMENTS

This research was partially funded by the Israeli Science Foundation (ISF) grant #210/17 to Roni Stern.

REFERENCES

[1] A. Engel and M. Last, "Modeling software testing costs and risks using fuzzy logic paradigm," Journal of Systems and Software, vol. 80, no. 6, pp. 817–835, 2007.
[2] G. Fraser and A. Arcuri, "EvoSuite: automatic test suite generation for object-oriented software," in European Conference on Foundations of Software Engineering, 2011, pp. 416–419.
[3] R. Abreu, P. Zoeteweij, and A. J. C. van Gemund, "Spectrum-based multiple fault localization," in Automated Software Engineering (ASE). IEEE, 2009, pp. 88–99.
[4] T. Zamir, R. T. Stern, and M. Kalech, "Using model-based diagnosis to improve software testing," in AAAI Conference on Artificial Intelligence, 2014.
[5] A. Elmishali, R. Stern, and M. Kalech, "An artificial intelligence paradigm for troubleshooting software bugs," Engineering Applications of Artificial Intelligence, vol. 69, pp. 147–156, 2018.
[6] S. P. Reiss and M. Renieris, "Encoding program executions," in International Conference on Software Engineering (ICSE). IEEE, 2001, pp. 221–230.
[7] S. Taheri, S. Devale, G. Gopalakrishnan, and M. Burtscher, "ParLoT: Efficient whole-program call tracing for HPC applications," in Programming and Performance Visualization Tools. Springer, 2017, pp. 162–184.
[8] S. Tallam, C. Tian, R. Gupta, and X. Zhang, "Enabling tracing of long-running multithreaded programs via dynamic execution reduction," in International Symposium on Software Testing and Analysis, ser. ISSTA. ACM, 2007, pp. 207–218.
[9] B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan, "Bug isolation via remote program sampling," in ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), vol. 38, no. 5, 2003, pp. 141–154.
[10] B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan, "Scalable statistical bug isolation," in ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), vol. 40, no. 6, 2005, pp. 15–26.
[11] X. Zhao, K. Rodrigues, Y. Luo, M. Stumm, D. Yuan, and Y. Zhou, "Log20: Fully automated optimal placement of log printing statements under specified overhead threshold," in Symposium on Operating Systems Principles. ACM, 2017, pp. 565–581.
[12] S. Bhansali, W.-K. Chen, S. De Jong, A. Edwards, R. Murray, M. Drinić, D. Mihočka, and J. Chau, "Framework for instruction-level tracing and analysis of program executions," in International Conference on Virtual Execution Environments. ACM, 2006, pp. 154–163.
[13] K. Alemerien and K. Magel, "Examining the effectiveness of testing coverage tools: An empirical study," International Journal of Software Engineering and its Applications, vol. 8, no. 5, pp. 139–162, 2014.
[14] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Pearson, 2016.
[15] L. Bottou, "Stochastic gradient learning in neural networks," Proceedings of Neuro-Nîmes, vol. 91, no. 8, p. 12, 1991.
[16] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," in International Conference on Learning Representations (ICLR), 2015.
[17] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[18] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[19] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
[20] S. S. Haykin, Neural Networks and Learning Machines. Pearson, 2009, vol. 3.
[21] E. Van Emden and L. Moonen, "Java quality assurance by detecting code smells," in Conference on Reverse Engineering. IEEE, 2002, pp. 97–106.
[22] C. Catal, "Software fault prediction: A literature review and current trends," Expert Systems with Applications, vol. 38, no. 4, pp. 4626–4636, 2011.
[23] R. Malhotra, "A systematic review of machine learning techniques for software fault prediction," Applied Soft Computing, vol. 27, pp. 504–518, 2015.
[24] A. Elmishali, R. Stern, and M. Kalech, "DeBGUer: A tool for bug prediction and diagnosis," in Conference on Innovative Applications of Artificial Intelligence (IAAI), 2019.
[25] W. E. Wong, R. Gao, Y. Li, R. Abreu, and F. Wotawa, "A survey on software fault localization," Transactions on Software Engineering, 2016.
[26] A. Elmishali, R. Stern, and M. Kalech, "Data-augmented software diagnosis," in AAAI Conference on Artificial Intelligence, 2016, pp. 4003–4009.
[27] A. Perez and R. Abreu, "Leveraging qualitative reasoning to improve SFL," in IJCAI, 2018, pp. 1935–1941.
[28] R. Stern, M. Kalech, S. Rogov, and A. Feldman, "How many diagnoses do we need?" in AAAI Conference on Artificial Intelligence, 2015, pp. 1618–1624.
[29] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in International Conference on Machine Learning (ICML), 2010, pp. 807–814.
[30] T. Mitchell, Machine Learning. McGraw Hill, 1997.
[31] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[32] R. Just, D. Jalali, and M. D. Ernst, "Defects4J: A database of existing faults to enable controlled testing studies for Java programs," in International Symposium on Software Testing and Analysis, 2014, pp. 437–440.
[33] B. Daniel and M. Boshernitsan, "Predicting effectiveness of automatic testing tools," in IEEE/ACM International Conference on Automated Software Engineering (ASE), 2008, pp. 363–366.
[34] W. De Pauw, E. Jensen, N. Mitchell, G. Sevitsky, J. Vlissides, and J. Yang, "Visualizing the execution of Java programs," Springer, 2002, pp. 151–162.
[35] B. Cornelissen, D. Holten, A. Zaidman, L. Moonen, J. J. Van Wijk, and A. Van Deursen, "Understanding execution traces using massive sequence and circular bundle views," in International Conference on Program Comprehension (ICPC). IEEE, 2007, pp. 49–58.