Using Model-Based Diagnosis to Improve Software Testing
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence

Tom Zamir, Roni Stern, and Meir Kalech
Department of Information Systems Engineering
Ben Gurion University of the Negev, Be’er Sheva, Israel

Abstract

We propose a combination of AI techniques to improve software testing. When a test fails, a model-based diagnosis (MBD) algorithm is used to propose a set of possible explanations. We call these explanations diagnoses. Then, a planning algorithm is used to suggest further tests to identify the correct diagnosis. A tester performs these tests and reports their outcome back to the MBD algorithm, which uses this information to prune incorrect diagnoses. This iterative process continues until the correct diagnosis is returned. We call this testing paradigm Test, Diagnose and Plan (TDP). Several test planning algorithms are proposed to minimize the number of TDP iterations, and consequently the number of tests required until the correct diagnosis is found. Experimental results show the benefits of using MDP-based planning algorithms over greedy test planning on three benchmarks.

Introduction

Testing is a fundamental part of the software development process (Myers et al. 2004). A software testing phase involves finding bugs and fixing them. From the perspective of the programmer, fixing bugs usually involves two tasks. First, the root cause of the bug needs to be found, and then the faulty software components (e.g., functions or classes) are fixed. Diagnosing the root cause of a software bug is often a challenging task that involves a trial-and-error process: several possible diagnoses are suggested by the programmer, who then performs tests and probes to identify the correct diagnosis.
One of the reasons why this trial-and-error process is challenging is that it is often non-trivial to reproduce bugs found by a tester.

An ideal solution to this problem would be for the tester, when observing a bug, to perform additional test steps that help the programmer find the software component that caused the bug. However, planning these additional test steps cannot be done efficiently without being familiar with the code of the tested software. Often, testing is done by Quality Assurance (QA) professionals who are not familiar with the code that they are testing. This separation, between those who write the code and those who test it, is even regarded as a best practice, allowing unbiased testing.

In this work we propose to enhance the software testing and debugging process described above by combining Model-Based Diagnosis (MBD) and planning techniques from the Artificial Intelligence (AI) literature. MBD algorithms have been proposed in the past for the purpose of diagnosing software bugs (González-Sánchez et al. 2011; Abreu, Zoeteweij, and van Gemund 2011; Wotawa and Nica 2011; Stumptner and Wotawa 1996). Thus, when a tester encounters a bug, any of these algorithms can be used to generate a set of possible diagnoses automatically.

To identify which of these diagnoses is correct, additional tests need to be performed. We propose several algorithms for planning these additional tests. These tests may be generated automatically or selected from a manually created set of tests, considering the set of possible diagnoses and choosing tests that will differentiate between them. This process of testing, diagnosing, and planning further tests is repeated until a single diagnosis is found. Importantly, unlike previous work on test generation for software, we do not assume any model of the diagnosed software (Esser and Struss 2007) or an ability to manipulate internal variables of the software (Zeller 2002). One could consider this work a variant of algorithmic debugging (Silva 2011) that combines software MBD with planning.

The contributions of this paper are threefold. First, we present a methodology change to the software testing and debugging process that uses a combination of MBD and planning techniques. Second, we propose several test planning algorithms for identifying the correct diagnosis while minimizing the number of tests performed by the tester. Third, we evaluate these test planning algorithms on three benchmarks.

Background

We use the terms tester and developer to refer to the person who tests the software and the person who programs it, respectively. The purpose of the traditional software testing process (referred to hereafter as simply testing) is to verify that the developed system functions properly. One can view testing as part of an information-passing process between the tester and the developer, depicted in the left side of Figure 1. The tester executes a sequence of tests to test some functionality of the developed system. Such a sequence of tests is called a test suite. The tester runs all the tests in a test suite until either the test suite is done and all the tests have passed, or one of the tests fails. A failing test indicates the existence of a bug. To fix it, the tester passes information about the failed test to the developer, often in the form of a "bug report", and continues to execute other tests. The developer is then responsible for fixing the bugs found by the tester. This process is often performed using bug or issue tracking tools (e.g., HP Quality Center, Bugzilla, and IBM Rational ClearQuest).

In order to fix the bug, the developer needs to identify the faulty software component and then fix it. This process, of finding the faulty software component and fixing it, is commonly referred to as "debugging".
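As a concrete (if toy) illustration of this testing loop, the sketch below runs every test in a suite and files a "bug report" for each failure. This is a minimal sketch in Python under our own assumptions; the Test class and string-valued bug reports are hypothetical stand-ins, not artifacts of the paper or of any particular testing framework.

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Test:
        name: str
        run: Callable[[], bool]  # hypothetical: returns True iff the test passed

    def run_test_suite(suite: List[Test]) -> List[str]:
        # Run all tests; a failing test indicates a bug, so file a
        # "bug report" (here, just a string) and continue with the rest.
        bug_reports = []
        for test in suite:
            if not test.run():
                bug_reports.append(f"bug report: test '{test.name}' failed")
        return bug_reports

    # Usage: a toy suite with one passing and one failing test.
    suite = [Test("login works", lambda: True),
             Test("logout works", lambda: False)]
    print(run_test_suite(suite))  # -> ["bug report: test 'logout works' failed"]

In TDP, the interesting question is what happens after such a failure is observed; this is where diagnosis and test planning come in.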
The Test, Diagnose and Plan Paradigm

We propose a new testing paradigm, called Test, Diagnose and Plan (TDP), for improving the testing and debugging processes described above by empowering the tester with tools from the Artificial Intelligence (AI) literature. TDP is illustrated on the right side of Figure 1. When a test fails, an MBD algorithm is run to suggest a set of possible diagnoses, i.e., software components that may contain a bug that caused the test to fail. If this set of diagnoses contains a single diagnosis, it is passed to the developer. Otherwise, a planning algorithm is used to plan further tests for the tester, intended to narrow the set of diagnoses. The tester performs these tests and reports the observed output back to the MBD algorithm, which then outputs a new, potentially more refined, set of diagnoses. This process is repeated until a single diagnosis is found and passed to the developer. Other stopping conditions are also possible and are discussed later in the paper.

The great benefit of TDP over traditional testing and debugging is that in TDP the developer is given the exact set of software components that caused the bug. Moreover, having the tester perform additional tests immediately when the bug is observed, as is done in TDP, is expected to provide the developer with a reliable way to reproduce the bug.

[Figure 1: Traditional software testing vs. TDP. In the traditional process, the tester runs a test suite, discovers a bug, and files a bug report; the developer identifies where the bug is and fixes it. In the proposed process, the tester additionally runs an AI diagnosis algorithm that produces a set of possible diagnoses and plans tests to prune false diagnoses, so the developer only has to fix the bug.]

Algorithm 1: An Algorithmic View of TDP
    Input: ObsTests, the tests performed by the tester until the bug was found.
    1  Ω ← compute diagnoses from ObsTests
    2  while |Ω| > 1 do
    3      NewTest ← plan a new test
    4      NewObs ← tester performs NewTest
    5      ObsTests ← ObsTests ∪ NewObs
    6      Ω ← compute diagnoses from ObsTests
    7  end
    8  return Ω

Algorithm 1 presents an algorithmic view of TDP. First, a set of diagnoses is computed from the tests performed by the tester until the bug was found (line 1). Then, an additional test is proposed, such that at least one of the diagnoses is checked (line 3). The tester then performs this test (line 4). After the test is performed, the diagnosis algorithm is run again, now with the additional information gained from the new test (line 6). If a single diagnosis is found, it is passed to the developer to fix the faulty software component. Otherwise, this process continues by planning and executing new tests.

The key components in TDP are the diagnosis algorithm used to compute diagnoses and the planning algorithm used to plan tests. We describe next how these components can be implemented.

Model-Based Diagnosis for Software

The input to classical MBD algorithms is a tuple ⟨SD, COMPS, OBS⟩, where SD is a formal description of the diagnosed system's behavior, COMPS is the set of components in the system that may be faulty, and OBS is a set of observations. A diagnosis problem arises when SD and OBS are inconsistent with the assumption that all the components in COMPS are healthy. The output of an MBD algorithm is a set of diagnoses.

Definition 1 (Diagnosis). A set of components ∆ ⊆ COMPS is a diagnosis if, assuming that the components in ∆ are faulty, SD is consistent with OBS.

In software, the set of components COMPS can be defined at any level of granularity: a class, a function, a block, etc. Low-level granularity will result in a very focused diagnosis (e.g., pointing to the exact line of code that is faulty), but obtaining that diagnosis will require more effort. Observations (OBS) in software diagnosis are observed executions of tests. Every observed test t is labeled as "passed" or "failed", denoted by passed(t) and failed(t), respectively. This labeling is done manually by the tester or automatically in the case of automated tests (e.g., failed assertions).
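To make Definition 1 concrete, the sketch below shows one simple way to compute diagnoses from such labeled test executions under a weak fault model, in the spirit of the spectrum-based MBD work cited in the introduction. This formulation, in which each observation pairs the components a test executed with its pass/fail label and a candidate set ∆ is consistent with OBS iff every failed test executed at least one component of ∆, is our illustrative assumption, not necessarily the algorithm used in the paper; the brute-force enumeration is for clarity, not efficiency.

    from itertools import combinations

    def diagnoses(obs, comps):
        # obs: list of (executed_components, passed) pairs, one per observed test.
        # Weak fault model: a passing test does not exonerate its components,
        # but every failed test must have executed at least one faulty one.
        failed_traces = [set(trace) for trace, passed in obs if not passed]
        minimal = []
        # Enumerate candidate sets by increasing size; keep only minimal ones.
        for size in range(1, len(comps) + 1):
            for cand in map(set, combinations(sorted(comps), size)):
                if any(m <= cand for m in minimal):
                    continue  # a subset is already a diagnosis, so cand is not minimal
                if all(cand & trace for trace in failed_traces):
                    minimal.append(cand)
        return minimal

    # Toy example: components f1..f3; failed(t1) with trace {f1, f2},
    # passed(t2) with trace {f2, f3}.
    obs = [({"f1", "f2"}, False),
           ({"f2", "f3"}, True)]
    print(diagnoses(obs, {"f1", "f2", "f3"}))  # -> [{'f1'}, {'f2'}]

Here both {f1} and {f2} remain diagnoses ({f3} cannot explain the failure of t1), which is exactly the situation TDP addresses: a test planner would next propose a test that executes f1 but not f2, or vice versa, to differentiate between the two remaining diagnoses.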