

Orthogonal Array application for optimal combination of software defect detection techniques choices

LJUBOMIR LAZICa, NIKOS MASTORAKISb

aTechnical Faculty, University of Novi Pazar Vuka Karadžića bb, 36300 Novi Pazar, SERBIA [email protected] http://www.np.ac.yu

bMilitary Institutions of University Education, Hellenic Naval Academy Terma Hatzikyriakou, 18539, Piraeus, Greece [email protected]

Abstract: - In this paper, we consider a problem that arises in black box testing: generating small test suites (i.e., sets of test cases) where the combinations that have to be covered are specified by input-output parameter relationships of a software system. That is, we only consider combinations of input parameters that affect an output parameter, and we do not assume that the input parameters have the same number of values. To solve this problem, we propose pairwise testing, particularly the Orthogonal Array Testing Strategy (OATS), as a systematic, statistical way of testing pair-wise interactions. In the software testing process (STP), it provides a natural mechanism for testing systems to be deployed on a variety of hardware and software configurations. The combinatorial approach to software testing uses models to generate a minimal number of test inputs so that selected combinations of input values are covered. The most common coverage criteria are two-way or pairwise coverage of value combinations, though for higher confidence three-way or higher coverage may be required. This paper presents some examples of software-system test requirements and corresponding models for applying the combinatorial approach to those test requirements. The method bridges contributions from mathematics, design of experiments, software testing, and algorithms for application to testing. This study also presents a brief overview of the response surface methods (RSM) for computer experiments available in the literature. The Bayesian approach and orthogonal arrays constructed for computer experiments (OACE) are briefly discussed. An example of a novel OACE application to an STP optimization study is also given. In this case study, an orthogonal array for computer experiments was utilized to build a second order response surface model. Gradient-based optimization algorithms could not be utilized in this case study since the design variables were discrete valued. Using the novel OACE approach, the optimum combination of software defect detection technique choices for every software development phase that maximizes the overall Defect Detection Effectiveness of the STP was determined.

Key-Words: - Software testing, Optimization, Orthogonal array

1 Introduction
Many IT organizations struggle with how to determine the proper balance of testing in light of business demands and budgetary limitations. Testing has historically been difficult to optimize because of a number of factors. IT has always faced the constraints of balancing time, quality and cost. Combine this with the reality that it is often difficult to measure exactly how effective testing is for a given application, and you can easily see that many formal testing processes are far from optimal. Yet many IT organizations are not sure how to make them better. So what exactly is optimized testing, and how does it differ from the quality control and quality assurance practices that many IT organizations currently have in place? Simply put, optimized testing is a practical approach to improving application quality that balances quality, cost and schedules to prioritize testing and optimize limited resources. The efficiency of this practical testing approach ensures that the finite testing resources at IT management's disposal are used to their most productive levels while eliminating harmful defects and errors during the testing process. Optimized testing practices, methods and tools now exist that allow the QA team to better align testing activities with business requirements by prioritizing testing activities based on two key factors: importance to the business and risk to the

business [25,26]. Business users, development, QA and other key constituents can now collaborate to prioritize requirements and identify and prioritize project risks. And, when required, they can collaborate to resolve problems during test execution. Once priorities have been established, QA can confidently select the optimum set of tests to thoroughly test the high-priority areas of an application and adequately test all other areas [25,26].

This paper presents a novel OACE approach for a software testing process (STP) optimization study, finding the optimum combination of software defect detection technique (DDT) choices for every software development phase that maximizes the overall Defect Removal Effectiveness (DRE) of the STP. The optimum combination of defect detection technique choices was determined by applying orthogonal arrays constructed for a post mortem designed experiment with collected defect data of a real project [27]. By integrating this unique approach to requirements management with test management and automation tools, tests and test suites can be automatically created and executed, as in our Integrated and Optimized Software Testing Process (IOSTP) [25,27].

Software testing remains an important topic in software engineering. Testing efficiency and effectiveness, two goals which are not always in alignment, are both significantly improved. A recent report generated for the National Institute of Standards and Technology (NIST) found that software defects cost the U.S. economy 59.9 billion dollars annually [28]. While current technologies cannot hope to remove all errors from software, the report goes on to estimate that 22 billion dollars could be saved through earlier and more effective defect detection.

Black-box testing is a type of software testing that ensures a program meets its specification from a behavioral or functional perspective. The number of possible black-box test cases for any non-trivial software application is extremely large. The challenge in testing is to reduce the number of test cases to a subset that can be executed with available resources and can also exercise the software adequately so that the majority of software defects are exposed. One popular method for performing black-box testing is the use of combinatorial covering designs [3-11] based on techniques developed in designing experiments. These designs correspond to test suites (i.e., a set of test cases) that cover, or execute, combinations of input parameters in a systematic and effective way, and are most applicable in testing data-driven systems where the manipulation of data inputs and the relationship between input parameters is the focus of testing. As pointed out by Dunietz et al. [10], a technical challenge that remains in applying this promising technique in software testing is the construction of covering designs. There are two issues that need to be considered: the first and perhaps more important one is the size of the covering design, since it dictates the number of test cases and, consequently, the amount of resources needed to test a software system; the second is the time and space requirements of the construction itself.

A great deal of research work has been devoted to generating small test suites [2, 3, 7, 12-16, 25]. Most researchers focus on uniform coverage of input parameters with uniform ranges; i.e., they consider test suites that cover all t-wise combinations of the input parameters for some integer t, and the input parameters are assumed to have the same number of values. While such test suites apply to a large number of situations, in practice not all t-wise combinations of input parameters have equal priority in testing [3, 25], nor are all parameter domains of the same size. Testers often prioritize combinations of input parameters that influence a system's output parameters over those that do not [3, 15, 20]. Determining which set of input parameters influence a system's output parameters can be accomplished using existing analyses [3, 25] or can be discovered in the process of determining the expected result of a test case.

Covering all pairs of tested factor levels has been extensively studied. Mandl described using orthogonal arrays in testing of a compiler [16]. Tatsumi, in his paper on the Test Case Design Support System used in Fujitsu Ltd [22], talks about two standards for creating test arrays: (1) with all combinations covered exactly the same number of times (orthogonal arrays) and (2) with all combinations covered at least once. When making this crucial distinction, he references an earlier paper by Shimokawa and Satoh [19]. Over the years, pairwise testing was shown to be an efficient and effective strategy of choosing tests [4-6, 10, 13, 25]. However, as shown by Smith et al. [20] and later by Bach and Schroeder [3], pairwise testing, like any technique, needs to be used appropriately and with caution. Since the problem of finding a minimal array covering all pairwise combinations of a given set of test factors is NP-complete [14], understandably a considerable amount of research has gone into efficient creation of such arrays. Several strategies were proposed in an attempt to minimize the number of tests produced [11]. Authors of these combinatorial test case generation strategies often describe additional considerations that must be taken into account before their solutions become practical. In many cases, they propose methods of handling these in the context of their generation strategies. Tatsumi [22] mentions constraints as a way of specifying unwanted

combinations (or, more generally, dependencies among test factors). Sherwood [18] explores adapting the conventional t-wise strategy to invalid testing and the problem of preventing input masking. Cohen et al. [6] describe seeds, which allow specifying combinations that need to appear in the output, and covering combinations with mixed-strength arrays as a way of putting more emphasis on interactions of certain test factors. Kuhn et al. measured the number of defects identified at different strengths of interaction coverage in software [13] and also in systems with embedded software [23]. Several other examples include the application of experimental designs (DOE) to computer benchmarking, Object Oriented Testing, network testing, and compiler testing [24]. Indeed, several of these mentioned studies have had an impact, and tools for automatic construction of test suites have recently appeared. For instance, the Automated Efficient Test Generator tool (AETG) has been developed by Telecordia [6]; NASA has funded the development of the Test Case Generator tool (TCG) [23]; IBM has funded the Combinatorial Test Services tool (CTS) [1]; and TestCover.com has also introduced a web-based tool.

In this study, response surface methods for computer experiments are investigated and some of the approaches available in the literature are discussed. The focus is on response surface model building using orthogonal arrays designed for computer experiments (OACE). Different Defect Detection Strategy and Technique options, together with critical STP performance characteristics (e.g. DRE, cost, duration), are studied to optimize design, development, test and evaluation (DDT&E) cost using orthogonal arrays for computer experiments [25-27].

This paper is organized as follows. Section 2 presents best practices for optimized testing. Section 3 explains the Orthogonal Array Testing Strategy (OATS). A novel Orthogonal Arrays application as a Design of Experiments optimization strategy is presented in Section 4. Finally, the paper is concluded in Section 5.

2 The optimized testing approach
2.1 Best practices for optimized testing
Adopting an optimized testing approach may sound overwhelming. However, the truth is that IT organizations can adopt optimized testing practices incrementally, implementing certain aspects tactically to achieve strategic advantage. This section offers some suggested ways to adopt an optimized testing approach. Optimized testing best practices include adoption of the application quality life cycle, requirements management, risk-based testing and automation [26]. This section highlights some suggested steps for incorporating these areas of optimized testing into your existing testing environment.

2.2 Adoption of an application quality life cycle
The cost of fixing defects increases exponentially as a defect moves through development into production. By adopting an application quality life cycle, quality is built in to the application from the earliest phases of its life cycle, rather than attempting to test it in when it is too late. This requires discipline in defining and managing requirements, implementing automated and repeatable best practices, and access to the right information to make confident decisions. This best practice allows you to fix defects earlier, when there is more time to sufficiently address the problems and it is far less expensive. It lays the groundwork for continuous process improvement and higher-quality applications [25,28].

2.2.1 Testing Process Activities Flow
Once it was clear that testing was much more than "debugging" or "problem fixing", it was apparent that testing was more than just a phase near the end of the development cycle. Testing has a life cycle of its own, and there is useful and constructive testing to be done throughout the entire life cycle of development. This means that the testing process begins with the requirements phase and from there parallels the entire development process. In other words, for each phase of the development process there is an important testing activity. This necessitates the need to migrate from an immature, ad hoc way of working to having a full-fledged Testing Process. The following is the life cycle for the complete Test Development and Execution Process scheme.

Fig. 1 Test Development and Execution Process scheme

The specification defines the program's correct behavior. Incorrect behavior is a software failure: it can be improper output, abnormal termination, or unmet time or space constraints. Failures are mostly caused by faults, which are missing or incorrect code. An error is a human action that produces a failure. An

abend is an abortive termination of a program (like the "blue screen of death" in Microsoft Windows). An omission is a required capability that is not present in an implementation. A surprise is code that does not support a required capability; it often turns up where code is reused. A bug is an error or a fault. The scope of the STP is the collection of artifacts under test (AUT). Testing activities can be categorized by the scope of the AUT that belongs to the corresponding STP or SDL phase. The test artifact under test can be the software requirement (SRUT), high-level design (HLDUT), low-level design (LLDUT), the code being tested, called the implementation under test (CUT), the integration test (IUT), the system under test (SUT), or, in an object-oriented environment, the class under test (CLUT), object under test (OUT), or method under test (MUT). A test case defines the input sequence of data, the environment and state of the AUT, and the expected result. The expected result is what the AUT should generate; the actual result is what was generated by a run. An oracle produces expected results; an oracle can be an automated tool or a human resource. A test is said to pass if the expected results and actual results are equal; otherwise it is said to be a no-pass, or to fail.

Test cases can be designed for positive testing or negative testing. Positive testing checks that the software does what it should. Negative testing checks that the software does not do what it should not do. A test suite is a collection of test cases related to each other. A test run is the execution of a test suite. A test driver is a tool (it can be a unit or utility program) that applies test cases to the AUT. A stub is a partial, temporary implementation of a component. The following figure shows the systems engineering view of testing. A test strategy identifies the levels of testing, the methods, the defect detection techniques (DDT) and the tools to be used. The test strategy defines the algorithm to create test cases.

Fig. 2 The systems engineering view of testing

Test design produces test cases using a test strategy. Test effectiveness of a DDT is the ability of the test strategy to find the bugs; test efficiency is the cost of finding bugs. Strategies for test design can be functional (black-box), structural (white-box), hybrid (gray-box) and fault-based. Functional testing is based on the specification of the software, without knowing anything about the program code. It uses the specified or expected behavior. It is also called specification-based, behavioral, responsibility-based or black-box testing.

Structural testing relies on the structure of the source code to develop test cases. It uses the actual implementation to create test suites. It is also called implementation-based, white-box or clear-box testing. Hybrid testing is the blend of functional and structural testing; it is also called gray-box testing. Fault-based testing introduces faults into the code (mutation) to see if these faults are revealed by a test suite. Regression testing is retesting the software with the same test cases: after a bug is fixed, the product should be tested at least with the test case that revealed the bug.

Coverage is the percentage of elements required by a test strategy that have been exercised by a test suite. There are many coverage models. Statement coverage is the percentage of source code statements executed at least once by a test suite. Clearly, statement coverage can be used only by structural or hybrid testing. Testing should make an effort to reach 100% code coverage; this can keep the user from running untested code. All these definitions raise a lot of questions and problems, and not all of them can be dealt with in this article (see references in [25]), although the most important ones can be found below. The testing strategy defines how test design should produce the test cases, but it can tell us nothing about how much testing is enough, or how effective the testing was. Test case effectiveness depends on numerous factors, and can be evaluated after the end of testing, which is normally too late. To avoid these problems, testers should perform in-process

evaluation of the proposed and planned STP, according to established performance metrics and quality criteria [25,29], as described below.

The testers should verify test cases at the end of test design and check the conformance of the test cases to the requirements; they should also check the specification coverage. Validation comes after test execution: knowing the results, an effectiveness rate should be computed, and if it is under the threshold the test suite should be analyzed and the test process corrected.

A simple metric for effectiveness is one that is only test-suite dependent [29]: the ratio of bugs found by test cases (Ntc) to the total number of bugs (Ntot) reported during the test cycle (by test cases or by side effect):

TCE = 100 * Ntc / Ntot [%] (1)

This metric can evaluate effectiveness after a test cycle, which provides in-process feedback about the actual test suite effectiveness. A threshold value should be set for this metric; a value of about 75% is suggested, although it depends on the application. When the TCE value is above the threshold, the test cases can be considered effective according to the very useful model for dealing with defects depicted in Fig. 3. If it is below, testers should correct the test plan, focusing on side-effect bugs.

Fig. 3 Traditional Fault Injection Model

The model basically says that, in a software project, you have defects being "injected" into it (from a variety of sources) and defects being removed from it (by a variety of means). This high-level model is a good guide for our thinking and reasoning about defects and defect processes. Based on this model, the goal in software development, for delivering the fewest defects, is to minimize the number of defects that go in and maximize the number of defects that are removed.

2.2.2 Defect Removal Efficiency
A key metric for measuring and benchmarking the IOSTP [25] is the percentage of possible defects removed from the product at any point in time. Both a project and a process metric, it can measure the effectiveness of quality activities or the quality of the overall project by:

DRE = E / (E + D) (2)

where E is the number of errors found before delivery to the end user, and D is the number of errors found after delivery. The goal is to have DRE close to 100%. The same approach is applied to every test phase, denoted with i:

DREi = Ei / (Ei + Ei+1) (3)

where Ei is the number of errors found in software engineering activity i, and Ei+1 is the number of errors found in activity i+1 that were traceable to errors not discovered in activity i. The goal is to have this DREi approach 100% as well, i.e., errors are filtered out before they reach the next activity. Projects that use the same team and the same development processes can reasonably expect that the DRE from one project to the next is similar. For example, if on the previous project you removed 80% of the possible requirements defects using inspections, then you can expect to remove ~80% on the next project. Or, if your historical data show that you typically remove 90% before shipment and, for this project, you have used the same process, met the same kind of release criteria, and found 400 defects so far, then there are probably ~50 defects that you will find after release. How to combine DDTs to achieve a high DRE, say >85%, as a threshold for the required STP effectiveness is explained in Section 4, which describes how the optimum combination of software defect detection technique choices is determined by applying orthogonal arrays to a post mortem designed experiment with collected defect data of a real project [27].

2.3 Risk-based testing
An important area to focus on when optimizing your testing is identifying all of the business and technical requirements that exist for an application, and then prioritizing them based on the impact of failure on the business. QA teams should ensure they have access to the application's business and technical requirements in order to create effective test requirements. Involving business managers, test managers and QA architects will help achieve the balance of testing that is optimal [25-28]. The advantage of automated risk-based testing is that it adds a level of objectivity not available with traditional testing, where individual testers were left to determine what should be tested and when.
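To make the bookkeeping of Eqs. (1)-(3) above concrete, the following is a minimal sketch; the phase names and defect counts are illustrative assumptions, not measurements from [27]:

```python
# Minimal sketch of Eqs. (1)-(3); all numbers below are made up.

def tce(bugs_found_by_test_cases, total_bugs):
    """Eq. (1): Test Case Effectiveness over one test cycle, in percent."""
    return 100.0 * bugs_found_by_test_cases / total_bugs

def dre(errors_found, errors_escaped):
    """Eqs. (2)/(3): Defect Removal Efficiency of one activity or phase."""
    return errors_found / (errors_found + errors_escaped)

# E_i = errors found per software engineering activity i (illustrative)
phases = [("requirements", 120), ("design", 80), ("coding", 150), ("system test", 40)]
for (name, e_i), (_, e_next) in zip(phases, phases[1:]):
    print(f"DRE({name}) = {dre(e_i, e_next):.0%}")

print(f"TCE = {tce(90, 120):.0f}%  (suggested threshold: ~75%)")
```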


Thoroughly understanding and correctly prioritizing testing requirements can have the greatest impact on successful delivery of a high-quality application. By implementing an optimized testing solution, IT can ensure quality activities accurately reflect business priorities, and can make certain they are testing the right areas of an application within the constraints of the schedule. Using an optimized testing solution, the risks are calculated automatically and time estimates are rolled up per requirement, balancing quality, schedule and cost through risk-based practices. This allows testers to apply a time factor to existing risk factors, which enables users to quickly select the highest-priority test cases and understand how long it will take to test them [26].

2.4 Software Testing Optimization Model and IT benefits
With optimized testing, IT organizations are able to balance the quality of their applications with existing testing schedules and the costs associated with different testing scenarios. Optimized testing provides a sound and proven approach that allows IT to align testing activities with business value. The practices, processes and tools that encompass optimized testing offer many benefits. The increasing cost and complexity of software development is leading software organizations in the industry to search for new ways, through process methodology and tools, to improve the quality of the software they develop and deliver.

2.4.1 Manage "what-if" scenarios - A Software Testing Optimization Model
Such scenarios are invaluable for determining where testing resources should be spent at the beginning of a software development project. With an optimized testing solution, you can create what-if scenarios to help users understand the impact of changing risks, cycle attributes and requirements as priorities change. This insight proves invaluable when a testing organization is trying to determine the best way to balance quality with cost and schedule. By understanding the impact of different factors on testing, IT managers can identify the right balance.

We applied the End-to-End (E2E) Test strategy in our Integrated and Optimized Software Testing framework (IOSTP) [25-27]. End-to-End Architecture Testing is essentially a "gray box" approach to testing - a combination of the strengths of white-box and black-box testing. In determining the best source of data to support analyses, IOSTP with embedded RBOSTP considers the credibility and cost of each test scenario, i.e., concept. Resources for simulations and software test events are weighed against desired confidence levels and the limitations of both the resources and the analysis methods. The program manager works with the test engineers to use IOSTP with embedded RBOSTP [26] to develop a comprehensive evaluation strategy that uses data from the most cost-effective sources; this may be a combination of archived, simulation, and software test event data, each one contributing to addressing the issues for which it is best suited.

The central elements of IOSTP with embedded RBOSTP are: the acquisition of information that is credible; avoiding duplication throughout the life cycle; and the reuse of data, tools, and information. The system/software under test is described by objectives and parameters, i.e., factors (business requirements - BR, indexed by j) in the requirement specification matrix, where the major capabilities of the subsystems being tested are documented; these represent the independent, i.e. input, variables of the optimization model. Information is sought under a number of test conditions or scenarios. Information may be gathered through a feasible series of experiments (E): software test method, field test, simulation, or a combination, which represent a test scenario indexed by i, i.e., a sequence of test events. Objectives or parameters may vary in importance αj or severity of defect impacts. Each M&S or test option may have k models/tests, called modes, at different levels of credibility or probability to detect failure βijk, and may provide a different level of computed test event information benefit Bijkl of the experimental option for cell (i,j), mode k, and option l for each feasible experiment, depending on the nature of the method and the structure of the test. The test event benefit Bijkl of a feasible experiment can be a simple ROI, a design parameter solution, or both, etc. The cost Cijkl of each experimental option corresponding to the (i,j,k,l) combination must be estimated through standard cost analysis techniques and models. For every feasible experiment option, the tester should estimate the time duration Tijkl of experiment preparation and execution. The testers of each event, through historical experience and statistical calculations, define the Eijkl's (binary variables, 0 or 1) that identify options. The following objective function is structured to maximize benefits and investment in the most important test parameters and in the most credible options. The model maintains a budget and schedule and meets certain selection requirements and restrictions to provide feasible answers through maximization of the Benefit Index:

Benefit Index = max Σi Σj Σk Σl αj βijk Bijkl Eijkl (4)

subject to:

Σi Σj Σk Σl Cijkl Eijkl ≤ BUDGET (budget constraint);

Σi Σj Σk Σl Tijkl Eijkl ≤ TIME SCHEDULE (time-schedule constraint);

Σl Eijkl ≤ 1 for all i, j, k (at most one option selected per cell i, j, k mode);

Σk Σl Eijkl ≥ 1 for all i, j (at least one experiment option per cell i, j).
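The model above is a small 0-1 integer program. The following toy sketch solves it by brute force so the roles of the Eijkl variables and the constraints can be seen end to end; every dimension and number in it is a made-up illustration, not a value from the study:

```python
# Toy illustration of the selection model in Eq. (4): pick one experiment
# option l per cell (i, j, k) -- or none -- maximizing weighted benefit
# under budget and schedule limits. All data below is invented.
from itertools import product

I, J, K, L = 2, 2, 1, 2                  # scenarios, requirements, modes, options
alpha = {0: 1.0, 1: 2.0}                 # importance of requirement j
beta = {(i, j, 0): 0.8 for i in range(I) for j in range(J)}  # credibility
B = {idx: 1.0 + 0.5 * idx[3] for idx in product(range(I), range(J), range(K), range(L))}
C = {idx: 3.0 + idx[3] for idx in B}     # cost of option (i, j, k, l)
T = {idx: 2.0 for idx in B}              # duration of option
BUDGET, SCHEDULE = 14.0, 10.0

cells = list(product(range(I), range(J), range(K)))
best = None
# each cell independently selects an option l, or None (no experiment there)
for choice in product([None] + list(range(L)), repeat=len(cells)):
    E = {cell: l for cell, l in zip(cells, choice) if l is not None}
    # constraint: at least one experiment option per (i, j)
    if any(not any((i, j, k) in E for k in range(K)) for i in range(I) for j in range(J)):
        continue
    cost = sum(C[c + (l,)] for c, l in E.items())
    time = sum(T[c + (l,)] for c, l in E.items())
    if cost > BUDGET or time > SCHEDULE:
        continue
    benefit = sum(alpha[c[1]] * beta[c] * B[c + (l,)] for c, l in E.items())
    if best is None or benefit > best[0]:
        best = (benefit, E)

print("max Benefit Index:", best[0], "selection:", best[1])
```

Real instances would, of course, be handed to an integer-programming solver rather than enumerated.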


3 Orthogonal Array Testing Strategy (OATS) and Techniques
The Orthogonal Array Testing Strategy (OATS) is a systematic, statistical way of testing pair-wise interactions. It provides representative (uniformly distributed) coverage of all variable pair combinations. This makes the technique particularly useful for integration testing of software components (especially in OO systems, where multiple subclasses can be substituted as the server for a client). It is also quite useful for testing combinations of configurable options (such as a web page that lets the user choose the font style, background color, and page layout). Dr. Genichi Taguchi was one of the first proponents of orthogonal arrays in test design. His techniques, known as Taguchi Methods, have been a mainstay in experimental design in manufacturing fields for decades.

Orthogonal arrays are two-dimensional arrays of numbers which possess the interesting quality that by choosing any two columns in the array you receive an even distribution of all the pair-wise combinations of values in the array. The method of orthogonal arrays is an experimental design construction technique from the statistics literature. In turn, construction of such arrays depends on the theory of combinatorics. An orthogonal array is a balanced two-way classification scheme used to construct balanced experiments when it is not practical to test all possible combinations. The size and shape of the array depend on the number of parameters and values in the experiment. Orthogonal arrays are related to combinatorial designs.

Definition 1: Orthogonal array O(ρ, k, n, d)
An orthogonal array is denoted by O(ρ, k, n, d), where:
• ρ is the number of rows in the array. The k-tuple forming each row represents a single test configuration, and thus ρ represents the number of test configurations.
• k is the number of columns, representing the number of parameters.
• The entries in the array are the values 0, …, n - 1, where n = f(n0, …, nk-1). Typically, this means that each parameter would have (up to) n values.
• d is the strength of the array (see below).
An orthogonal array has strength d if in any ρ × d sub-matrix (that is, select any d columns), each of the n^d possible d-tuples (rows) appears the same number of times (>0). In other words, all d-way interaction elements occur the same number of times.
Here is some terminology for working with orthogonal arrays, followed by an example array in Fig. 4 [24,25]:
• Runs - ρ: the number of rows in the array. This directly translates to the number of test cases that will be generated by the OATS technique.
• Factors - k: the number of columns in an array. This directly translates to the maximum number of variables that can be handled by this array.
• Levels - n: the maximum number of values that can be taken on by any single factor. An orthogonal array will contain values from 0 to Levels-1.
• Strength - d: the number of columns it takes to see each of the Levels possibilities equally often.
• Orthogonal arrays are most often named following the pattern LRuns(Levels^Factors).

The example array below has the Runs as rows and the Factors as columns:

0 0 0 0
0 1 1 2
0 2 2 1
1 0 1 1
1 1 2 0
1 2 0 2
2 0 2 2
2 1 0 1
2 2 1 0

Fig. 4 An L9(3^4) orthogonal array with 9 runs, 4 factors, 3 levels, and strength of 2.
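The strength-2 property of this array is easy to check mechanically; the following sketch verifies that every pair of columns of the Fig. 4 array covers all nine level pairs:

```python
# Check the defining property of the L9(3^4) array of Fig. 4:
# any two columns together cover all 9 value pairs (strength 2).
from itertools import combinations, product

L9 = [
    [0, 0, 0, 0], [0, 1, 1, 2], [0, 2, 2, 1],
    [1, 0, 1, 1], [1, 1, 2, 0], [1, 2, 0, 2],
    [2, 0, 2, 2], [2, 1, 0, 1], [2, 2, 1, 0],
]

for c1, c2 in combinations(range(4), 2):          # choose any two factors
    seen = {(row[c1], row[c2]) for row in L9}
    assert seen == set(product(range(3), repeat=2)), (c1, c2)
print("All 6 column pairs cover all 9 level pairs -> strength 2")
```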

ISSN: 1109-2750 1325 Issue 8, Volume 7, August 2008 WSEAS TRANSACTIONS on COMPUTERS Ljubomir Lazic and Nikos Mastorakis the bugs and that give you a great deal more comfort As an example of the benefit of using the in the quality of your software. OATS technique over a test set that exhaustively Some advantage of the Orthogonal Array Testing tests every combination of all variables, consider a Strategy (OATS) is outlined below: system that has four options, each of which can have • Pairwise testing protects against pairwise bugs three values. The exhaustive test set would require while dramatically reducing the number of tests to 81 test cases (3 x 3 x 3 x 3 or the Cartesian product perform which is especially cool because pairwise of the options). The test set created by OATS (using bugs represent the majority of combinatoric bugs and the orthogonal array in Fig. 4) has only nine test such bugs are a lot more likely to happen than the cases, yet tests all of the pair-wise combinations. ones that only happen with more variables. The OATS test set is only 11% as large at the • Plus, the availability of tools means you no longer exhaustive set and will uncover most of the need to create these tests by hand. interaction bugs. It covers 100% (9 of 9) of the pair- • Pairwise testing might find some pairwise bugs wise combinations, 33% (9 of 27) of the three-way while dramatically reducing the number of tests to combinations, and 11% (9 of 81) of the four-way perform, compared to testing all combinations, but combinations. The test set could easily be not necessarily compared to testing just the augmented if there were particularly suspicious combinations that matter, which is especially cool three- and four-way combinations that should be because pairwise bugs represent the majority of tested. Interaction testing can offer significant combinatoric bugs, or might not, depending on the savings. Indeed a system with 20 factors and 5 levels actual dependencies among the variables in the each would require 520 = 95 367 431 640 625 i.e. product, and such bugs are more likely to happen almost 1014 exhaustive test configurations. Pair-wise than ones that only happen with more variables, or interaction testing for 520 can be achieved in 45 tests. less likely to happen, because user inputs are not randomly distributed. 3.2 How to use this technique • Plus, the availability of tools means you no longer The OATS technique is simple and straightforward. need to create these tests by hand, except for the work The steps are outlined below. of analyzing the product, selecting variables and 1. Decide how many independent variables will be values, actually configuring and performing the test, tested for interaction. This will map to the Factors and analyzing the results. of the array. What do all pairs do? 2. Decide the maximum number of values that each • Input is a set of equivalence classes for each independent variable will take on. This will map to variable. the Levels of the array. • Sometimes the equivalence classes are those used 3. Find a suitable orthogonal array with the smallest in Boundary Value testing (min, min+, nominal, max- number of Runs. A suitable array is one that has at , max) least as many Factors as needed from Step 1 and has • Output is a set of (partial) test cases that at least as many levels for each of those factors as approximate an orthogonal array, plus some pairing decided in Step 2. information. 4. Map the Factors and values onto the array. All pairs assumptions are: 5. 
Choose values for any "left over" Levels. • Variables have clear equivalence classes. 6. Transcribe the Runs into test cases, adding any • Variables are independent. particularly suspicious combinations that aren't • Failures are the result of the interaction of a pair of generated. variable values. OATS provides a means to select a test set that: 3.3 OART application areas in software • Guarantees testing the pair-wise combinations of testing all the selected variables. OART can be applied to Black-box testing strategy as • Creates an efficient and concise test set with many outlined below: fewer test cases than testing all combinations of all • : derive test cases for black-box testing variables. of single components, • : derive test cases for black-box • Creates a test set that has an even distribution of all testing of entire systems, pair-wise combinations. • Known that significant number of failures is caused • Exercises some of the complex combinations of all by parameter interactions that occur in typical, yet the variables. realistic situations, • Is simpler to generate and less error prone than test • Assume that tests maximizing the interactions sets created by hand. between parameters will find more faults,
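Where no tabulated array fits, a pairwise-covering suite can also be computed directly. The sketch below is a naive greedy generator in the spirit of the tools cited earlier; it is not the actual algorithm of AETG, TCG or CTS:

```python
# Naive greedy pairwise generator (illustrative sketch only).
from itertools import combinations, product

def all_pairs(levels):
    """Greedily pick full combinations until every column pair is covered."""
    cols = range(len(levels))
    uncovered = {((c1, v1), (c2, v2))
                 for c1, c2 in combinations(cols, 2)
                 for v1 in range(levels[c1]) for v2 in range(levels[c2])}
    tests = []
    while uncovered:
        # choose the candidate test covering the most still-uncovered pairs
        best = max(product(*(range(n) for n in levels)),
                   key=lambda t: sum(((c1, t[c1]), (c2, t[c2])) in uncovered
                                     for c1, c2 in combinations(cols, 2)))
        tests.append(best)
        uncovered -= {((c1, best[c1]), (c2, best[c2]))
                      for c1, c2 in combinations(cols, 2)}
    return tests

suite = all_pairs([3, 3, 3, 3])    # four factors with three levels each
print(len(suite), "tests:", suite) # around 9-12 tests, versus 81 exhaustive
```

Greedy construction does not guarantee the minimum; the orthogonal array of Fig. 4 achieves the optimum of 9 for this shape.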


3.3 OART application areas in software testing
OART can be applied to a black-box testing strategy as outlined below:
• Unit testing: derive test cases for black-box testing of single components,
• System testing: derive test cases for black-box testing of entire systems,
• It is known that a significant number of failures is caused by parameter interactions that occur in typical, yet realistic, situations,
• Assume that tests maximizing the interactions between parameters will find more faults,
• Test at least for all two-way interactions among all input parameter combinations, because exhaustive testing (i.e., executing test cases for all possible input parameter combinations) cannot be afforded,
• Assume that the risk of an interaction failure among three or more input parameters is balanced against the ability to complete testing within a reasonable budget,
• Calculate the minimal set of test parameter combinations that test each pair-wise parameter combination.
OART can be applied to Configuration Testing as outlined below:
• Testing of complex systems with multiple configurations,
• Interoperability testing,
• It is known that faulty interaction between system components is a common source of system failures,
• Re-use an existing suite of (system) test cases,
• Test at least for all two-way interactions among the various system components, because exhaustive testing (i.e., executing a suite of test cases for all possible configurations) cannot be afforded,
• Assume that the risk of an interaction failure among three or more components is balanced against the ability to complete testing within a reasonable budget,
• Calculate the minimal set of test configurations that test each pair-wise combination of components.
Which test input data shall be selected, and what is the benefit of the OATS technique?
• Intelligent test case generation is vital to cut down costs and improve the quality of testing,
• Dramatically reduced overall number of test cases compared to exhaustive testing,
• Detects all faults due to a single parameter input domain,
• Detects all faults due to the interaction of two parameter input domains,
• Detects many faults due to the interaction of multiple parameter input domains.
Case studies [23,25,30] give evidence that the approach, compared to conventional approaches, is:
• more than twice as efficient (measured in terms of detected faults per testing effort) as traditional testing,
• about 20% more effective (measured in terms of detected faults per number of test cases) than traditional testing,
• applicable:
- to generate detailed test case input data during unit testing,
- to generate high-level test cases during system testing,
- to generate test cases during configuration testing,
- seamlessly with conventional test methods.
For illustration, let us analyze this example of Configuration Testing of a communication system under test that has 4 components, each of which has 3 possible elements, as depicted in Fig. 5.

Fig. 5 Example of Configuration Testing of communication system

Those 4 components are: Calling phone (Regular phone, Mobile phone, Coin phone), Call type (Local call, Long distance call, Toll free), Access (ISDN, PBX, Loop), and Called phone (Mobile phone, Regular phone, Pager), each of which has the 3 quoted possible elements. A suite of system test cases exists and would have to be executed for all 81 = 3x3x3x3 possible configurations.
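Mapping the columns of the Fig. 4 array onto these four components yields the test configurations directly; a sketch follows (the factor and value names are transcribed from Fig. 5, and the particular column-to-component assignment is one of several equally valid choices):

```python
# Map the L9(3^4) array of Fig. 4 onto the Fig. 5 factors to obtain
# the nine pair-wise-covering test configurations.
L9 = [
    [0, 0, 0, 0], [0, 1, 1, 2], [0, 2, 2, 1],
    [1, 0, 1, 1], [1, 1, 2, 0], [1, 2, 0, 2],
    [2, 0, 2, 2], [2, 1, 0, 1], [2, 2, 1, 0],
]
factors = {
    "Calling phone": ["Regular phone", "Mobile phone", "Coin phone"],
    "Call type":     ["Local call", "Long distance call", "Toll free"],
    "Access":        ["ISDN", "PBX", "Loop"],
    "Called phone":  ["Mobile phone", "Regular phone", "Pager"],
}
names = list(factors)
for run, row in enumerate(L9, start=1):
    config = {names[col]: factors[names[col]][level] for col, level in enumerate(row)}
    print(run, config)
```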

Applying the OART algorithm according to Fig. 4, however, requires only 9 test configurations instead of 81, and these already suffice to cover all pair-wise component interactions.

4 A novel Orthogonal Arrays application as Optimization Strategy
Experimental optimization can be carried out in several ways. The most popular is the one-variable-at-a-time approach. This approach is, however, extremely inefficient in locating the true optimum when interaction effects are present. Multivariable designs of experiments have been used for many years to overcome the problems with interaction effects. There are two general groups of designs to choose from: sequential and simultaneous experiment designs. The choice depends on the purpose of the study. Are these two approaches, sequential and simultaneous design of experiments, competing alternatives, or can they be joined into a comprehensive and effective optimization and model-building strategy? Our novel Orthogonal Arrays application may have come up with the answer: in this approach, the idea is first to optimize and then to study variable effects, significance, etc. (i.e., model building). In the past, optimization usually required answers to three ordered questions:
1. What variables (in our case, which Defect Detection Techniques - DDT) are the most significant?
2. In what way (in our case, find the optimum DDT combination) do they affect the quality (in our case, the effectiveness of software test activities, measured by Defect Removal Efficiency - DRE) of the product or process?
3. What is the optimal combination of settings for these significant variables (in our case, DDTs)?

4.1 DDT evaluation or measurement choice, like a customer satisfaction index
Before applying Orthogonal Arrays as Optimization Strategy (OAOS), we carried out a DDT evaluation, like a customer satisfaction index, to answer question 1 above, i.e., which Defect Detection Techniques are the most significant. Every tester in the test team assessed the 5 most frequently used DDTs - DDT1 = Inspection (DBR), DDT2 = PBR, DDT3 = CEG+BOR+MI, DDT4 = M&S, DDT5 = Hybrid (Category Partition, Boundary value analysis, …, Path testing etc.), as briefly described in this section - according to the nine Performance and Quality criteria given in Table 1. We have chosen a unified ordinal scale for the empirical Performance and Quality criteria from 1 (worst) to 5 (best) of the tester satisfaction level (TS level). This aspect is at most indeterminate in our approach.

Table 1. DDT Performance and Quality criteria for tester’s assessment

Ck  Criterion Name  Description
1   Overall         Considering all aspects of the experience, how would you rate your overall level of satisfaction with the DDT, e.g. Combinatorial Test Services?
2   Capability      How satisfied are you that the DDT, e.g. Combinatorial Test Services, has the functions and features to perform as expected?
3   Usability       How satisfied are you with the "ease of use" of the DDT, e.g. Combinatorial Test Services?
4   Performance     How satisfied are you with the response time or speed with which the DDT, e.g. Combinatorial Test Services, executes its functions?
5   Reliability     How satisfied are you with the frequency, number, and seriousness of errors in the DDT, e.g. Combinatorial Test Services?
6   Installation    How satisfied were you with the ease of installation, initialization, and migration of the DDT, e.g. Combinatorial Test Services?
7   Maintenance     How satisfied are you with getting updates from alphaWorks, or patches from the development team, for the DDT, e.g. Combinatorial Test Services?
8   Information     How satisfied are you with the accuracy, completeness, and time it takes to find information (online help, etc.) for the DDT, e.g. Combinatorial Test Services?
9   Service         How satisfied are you with the effectiveness of the DDT support, e.g. the alphaWorks discussion forum for Combinatorial Test Services, and responses to your e-mails?

The three Static Analysis test techniques that are DDT candidates are all based on Behavior Analysis, which enables analysts to convert narrative text into a tabular representation: Defect-based reading (DBR) as DDT1, perspective-based reading (PBR) as DDT2, and Simulation and Model Testing of the design prior to implementation (M&S) as DDT4. In its own right, Behavior Analysis helps to find errors

in requirements because it demands that the analyst adopt a systematic approach to extracting functions, conditions and responses from the requirements text before synthesizing elementary test cases which can be validated individually. Even if the three test techniques are not used, Behavior Analysis is an effective method for finding errors in requirements and other software development documents (R&DD). The technique is easy to learn and implement, and can be applied to requirements documents of any size by selecting only critical sections for analysis.

Behavior Analysis is a pre-requisite for the three test methods - inspection, walkthrough and animation - and each test technique has its own area of applicability and benefits:
• test by inspection is useful where traditional inspections would be too difficult, time-consuming or expensive to implement in the user community,
• test by scenario walkthrough is useful where the fit between the proposed system and a new or changed business process is a key consideration,
• test by animation is most useful where requirements are not stable, where the system is aimed at individual users, or where a more controlled prototyping technique is required.

Errors in requirements present a most difficult challenge. Developers find it almost impossible to detect errors in requirements without the help of users. New methodologies involve users much more intimately, but these techniques are successful in specific situations where the risk of errors is low and overall project size is small. Traditional developments of larger systems still use a staged, rather than an iterative, approach. Users are intimately involved only in the earliest stages and at the very end. The risk of requirements errors is extremely high in such projects, but in most cases requirements are not adequately tested. Many of these projects fail in the end, when the cost is highest, because they could not deliver the required business benefits. Testing of requirements and other software development documents (R&DD) is potentially the most valuable testing we can do, because errors in requirements are usually the most expensive to correct later, and present the biggest threat to the project's success. Behavior Analysis, testing by inspection, testing by walkthrough and testing by animation offer some hope that requirements can be `got right first time'.

Cause-Effect Graphing (CEG+BOR+MI) [15] is used as DDT3. The Cause-Effect Graphing technique was invented by Bill Elmendorf of IBM in 1973. Instead of the test case designer trying to manually determine the right set of test cases, he/she models the problem using a cause-effect graph, and the software that supports the technique [15] calculates the right set of test cases to cover 100% of the functionality. The cause-effect graphing technique uses the same algorithms that are used in hardware logic circuit testing. Strict test case design in hardware ensures virtually defect-free hardware. The starting point for the Cause-Effect Graph, applied to software testing, is the requirements document. The requirements describe what the system is intended to do. The requirements and other software development documents can describe real-time systems, events, data-driven systems, state transition diagrams, object-oriented systems, graphical standards, etc. A specification-based testing strategy, called CEG-BOR [15], combines the use of cause-effect graphs (CEGs) as a mechanism for representing specifications and the use of the Boolean operator (BOR) strategy for generating tests for a Boolean expression. If all causes of a CEG are independent from each other, a test set for the CEG can be constructed such that all Boolean operator faults in the CEG can be detected, and the size of this test set grows linearly with the number of nodes in the CEG. Four case studies were conducted to provide empirical data on the performance of CEG-BOR [15]. Empirical results indicate that CEGs can be used to model a large class of software specifications and that CEG-BOR is very effective in detecting a broad spectrum of faults. Also, a BOR test set based on a CEG specification provides better coverage of the implementation code than test sets based on random testing, functional testing, and state-based testing. For a CEG that does not have mutually independent causes, the BOR strategy does not perform well. To remedy this problem, a new test generation strategy was presented, which combines the BOR strategy with the Meaningful Impact (MI) strategy, a recently developed test generation strategy for Boolean expressions. This new strategy, called BOR+MI [15], decomposes a Boolean expression into mutually independent components, applies the BOR or MI strategy to each component for test generation, and then applies the BOR strategy to combine the test sets for all components. The size and fault detection capability of a BOR+MI test set are very good. Both analytical and empirical results show that the BOR+MI strategy generates a smaller test set than the MI strategy and provides comparable fault detection ability to the MI strategy. An extension, called BRO+MI, detects incorrect relational operators in relational expressions and also accounts for user-defined or implicit restrictions on the causes of a CEG.
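To give a flavor of what detecting Boolean operator faults involves, the following brute-force sketch finds a smallest test set that distinguishes a made-up specification expression from its single operator-fault mutants. It illustrates the goal of the BOR strategy but is not the published BOR or MI algorithm:

```python
# Illustrative sketch: find a minimal test set that "kills" every single
# boolean-operator mutant (AND <-> OR) of a hypothetical CEG expression.
from itertools import combinations, product

def spec(a, b, c):          # made-up CEG: effect = (a and b) or c
    return (a and b) or c

mutants = [
    lambda a, b, c: (a or b) or c,    # AND -> OR fault
    lambda a, b, c: (a and b) and c,  # OR -> AND fault
]

points = list(product([False, True], repeat=3))
for size in range(1, len(points) + 1):
    found = None
    for cand in combinations(points, size):
        # a test set kills a mutant if some test tells it apart from the spec
        if all(any(spec(*t) != m(*t) for t in cand) for m in mutants):
            found = cand
            break
    if found:
        print(f"{size} test(s) detect all operator faults:")
        for t in found:
            print("  a=%5s b=%5s c=%5s -> %s" % (t + (spec(*t),)))
        break
```

For this expression two tests suffice, whereas exhaustive truth-table testing would need all eight assignments; BOR achieves this kind of reduction constructively, with test set size growing linearly in the number of CEG nodes.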


Traditional test case design techniques, including Equivalence Class Partitioning, Boundary Value Analysis and some white-box techniques (statement, branch and path covering), are combined in a Hybrid or Gray-box DDT that we denote as DDT5. These techniques rely on the test case designer to manually work out the proper combinations of test cases. Often, the test case designer does not use a formal test case design technique and relies on his/her "gut feel" to assess whether test coverage is sufficient. While these techniques do generate combinations of test cases, they often fall short of providing full functional coverage. Too often the normal flow or "go path" functionality has overlapping, redundant test cases, while exceptions and error conditions go untested.

The Full-Lifecycle IOSTP methodology is a collection of testing techniques to verify and validate broad types of software products. The IOSTP uses a wide variety of techniques (described in [25]) that are available to deploy throughout all aspects of software development. The list of techniques is not meant to be complete - instead, the goal is to make explicit the wide range of options available for STP optimization.

4.2 Application of the Borda optimal positional voting method to DDT ranking
The Borda method is used in the Risk Matrix software application [31] to rank risks from most-to-least critical on the basis of multiple evaluation criteria. We adapted this Borda method, in a similar way, to rank all Defect Detection Techniques (DDT) used through the software development life cycle from most-to-least in the performance and quality characteristics of revealing software faults (bugs, errors). This section describes in detail how the Borda method is applied, using as an illustration the sample DDT Assessment Entries Worksheet that every tester in the test team provided.

On the other hand, it is necessary to map the possible metrics values to the ordinal scale of the empirical criterion. We have chosen a unified ordinal scale for the empirical criterion from 1 (worst) to 5 (best) of the tester satisfaction level (TS level). This aspect is at most indeterminate in our approach. Hence, its tool support requires high flexibility for the adjustment or tuning of the measurement process of the customer satisfaction determination.

Borda proposed the following voting method in 1770: given N candidates, if points of N-1, N-2, . . . , and 0 are assigned to the first-ranked, second-ranked, . . . , and last-ranked candidate in each test team voter's preference order, then the winning candidate is the one with the greatest total number of points. Instead of voters, suppose that there are multiple criteria. If rik is the rank of alternative i (a particular DDT) under criterion k = 1 to 9 (the DDT criteria from Table 1), the Borda count for alternative i is

bi = Σ (N - rik), summed over k = 1, …, 9. (5)

The alternatives are then ordered according to these counts. The Borda method is an example of a positional voting method, which assigns Pj points to a voter's jth-ranked candidate, j = 1, . . . , N, and then determines the ranking of the candidates by evaluating the total number of points assigned to each of them. Voting theorists [31] have shown that the Borda method is the optimal positional voting method with respect to several standards, such as minimizing the number and kinds of voting paradoxes. In addition, if ties are not present in the criteria rankings, it is demonstrated that the Borda method is equivalent to determining the consensus rankings that minimize the sum of the squared deviations from the criteria rankings. The Borda method has been used to rank alternatives in a variety of applications, including a cost and operational effectiveness analysis (COEA) and an aircraft maintenance study. In the DDT ranking application, let N be the total number of DDTs, and let the index i denote a particular DDT. Let the DDT criterion of Overall assessment (from Table 1) be denoted by k = 1, the DDT criterion of Capability assessment by k = 2, etc. The rest of this section describes how the Borda voting method is implemented in our DDT rank assessment case.

4.2.1 Evaluate Rank of Each DDT with Respect to the Overall criterion
Let J be the total number of possible Overall assessments. As discussed above, a DDT can be assessed by a tester on the empirical scale from 1 (worst) to 5 (best) of the tester satisfaction level (TS level), and so there are J = 5 possible assessments. Let Qj be the j-th possible Overall assessment, and assume the assessments are ordered such that Qj has a higher Overall point than Qj+1; thus Q1 = 5 (TS level), Q2 = 4 (TS level), etc. Let Mj be the number of DDTs having Qj as the Overall rating. Table 2 gives the values of Mj that correspond to the sample given by testers.

Table 2 Values of Qj, Mj, and Tj for the sample given by testers

j   Qj          Mj   Tj
1   5 (best)    2    1.5
2   4           3    4
3   3           0    N/A
4   2           0    N/A
5   1 (worst)   0    N/A

Let Tj be the rank position for all DDTs that are given the j-th possible assessment. How can we evaluate this rank position? The basic approach is to evaluate the rank of a tied alternative


as the average of the associated rankings. The following is a key result: if a is the first term in an arithmetic progression, t is the final term, and n is the number of terms, then (n/2)(a + t) is the sum of the n terms. Because there are M1 DDTs that are tied for positions 1 through M1, the sum of these rank positions is (M1/2)(1 + M1). Thus, the average of this sum is T1 = (1/2)(1 + M1). Similarly, there are M2 DDTs that are tied for positions M1 + 1 through M1 + M2, so that the average of this sum is T2 = (1/2)(2M1 + 1 + M2). More generally, if Mj > 0, Tj = (1/2)(2Cj + 1 + Mj), where

Cj = Σ Mr, summed over r = 1, …, j-1, (6)

for j > 1, and C1 = 0. The values of Tj are given in Table 2 for the sample given by testers.
Let ri1 be the rank of the i-th DDT with respect to the Overall assessment. If the i-th DDT has the j-th possible assessment, then set ri1 = Tj. The values of ri1 are given in Table 3 for the sample given by testers.

Table 3 Borda Points and Count for the sample given by testers

DDT No.   C1 Criterion   C2 Criterion   ri1   ri2   Borda Count   Borda Rank
1         5              3              1.5   3.5   5             0
2         4              5              4     1     5             0
3         4              4              4     2     4             3
4         5              3              1.5   3.5   5             0
5         4              2              4     5     1             4

4.2.2 Evaluate Rank of Each DDT with Respect to the Capability criterion
Let H be the total number of possible Capability assessments. As discussed above, there are five default Capability ranges, and so H = 5. Let Ph be the highest Capability associated with the h-th possible assessment, and let these be ordered such that Ph > Ph+1. Let Nh be the number of DDTs that are assigned the h-th possible Capability assessment. Table 4 shows the values of Ph and Nh that are used for our numerical example, where the values of Nh are derived from the sample given by testers.
Let Sh be the rank position for all DDTs that are given the h-th possible Capability assessment. As before, if Nh > 0, Sh = (1/2)(2Bh + 1 + Nh), where

Bh = Σ Nr, summed over r = 1, …, h-1, (7)

for h > 1, and B1 = 0. The values of Sh are given in Table 4 for the sample given by testers.

Table 4 Values of Ph, Nh, and Sh for the sample given by testers

h   Ph   Nh   Sh
1   5    1    1
2   4    1    2
3   3    2    3.5
4   2    1    5
5   1    0    N/A

Let ri2 be the rank of the i-th DDT with respect to the Capability assessment. If the i-th DDT has the h-th possible assessment, then set ri2 = Sh. The values of ri2 are given in Table 3 for the sample given by testers.

4.2.3 Determine Borda Ranking of Each DDT
Let N be the total number of DDTs, which satisfies

N = Σ Nh, summed over h = 1, …, H. (8)

The Borda Count for DDT i (using these two criteria) is computed with the formula:

bi = (N - ri1) + (N - ri2) (9)

The final step is to rank the DDTs with respect to their Borda Count. In particular, the DDT with the highest Borda Count is the best DDT according to the testers' Performance and Quality multi-criteria assessment, the DDT with the second highest count has the next highest score, and so forth. The Borda Rank for a given DDT is the number of other DDTs that are better than that DDT. Table 3 provides both the Borda Count and the Borda Rank for the sample given by testers. Defect Detection Techniques DDT1, DDT2, and DDT4 are tied with the highest Borda Count, and so their Borda Rank is 0. DDT3 has a Borda Rank of 3, because there are three other DDTs that are better. DDT5 has a Borda Rank of 4, because there are four other DDTs that are better than DDT5. The foregoing algorithm has been implemented as part of the software application. The same procedure was carried out for the remaining 7 criteria, and the Borda Rank for the sample given by testers did not change, i.e., it was the same.

Sh = 1/2(2BhB + 1+ Nh), where most frequently used DDT in IOSTP [25]: DDT1= h−1 Inspection – DBR, DDT2= PBR, DDT3= Bh = N (7) ∑ r CEG+BOR+MI, DDT4= M&S, DDT5= Hybrid r=1
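Because the tied-rank positions (equations (6) and (7)) and the Borda Count (9) are simple arithmetic, the whole ranking can be scripted. The Python sketch below is our own illustration (not the Risk Matrix tool); it reproduces the Tj and Sh values of Tables 2 and 4 and the Borda Counts and Ranks of Table 3:

```python
# Sketch of the ranking procedure of sections 4.2.1-4.2.3, applied to the
# testers' sample data of Tables 2-4 (illustrative re-implementation).

def tied_rank_positions(group_sizes):
    """Average rank position per assessment group: if M_j > 0 then
    T_j = (1/2)(2*C_j + 1 + M_j), with C_j = number of alternatives in
    better-assessed groups (equations (6) and (7))."""
    positions, better = [], 0
    for m in group_sizes:
        positions.append(0.5 * (2 * better + 1 + m) if m > 0 else None)
        better += m
    return positions

# Overall criterion, Table 2: M_j = [2, 3, 0, 0, 0] for Q_j = 5..1.
print(tied_rank_positions([2, 3, 0, 0, 0]))  # -> [1.5, 4.0, None, None, None]
# Capability criterion, Table 4: N_h = [1, 1, 2, 1, 0] for P_h = 5..1.
print(tied_rank_positions([1, 1, 2, 1, 0]))  # -> [1.0, 2.0, 3.5, 5.0, None]

# Per-DDT ranks from Table 3 and the Borda Count of equation (9).
N = 5
ri1 = [1.5, 4, 4, 1.5, 4]  # rank w.r.t. the Overall criterion
ri2 = [3.5, 1, 2, 3.5, 5]  # rank w.r.t. the Capability criterion
counts = [(N - a) + (N - b) for a, b in zip(ri1, ri2)]
# Borda Rank = number of other DDTs with a strictly higher count.
borda_ranks = [sum(c > ci for c in counts) for ci in counts]
print(counts)       # -> [5.0, 5.0, 4.0, 5.0, 1.0]
print(borda_ranks)  # -> [0, 0, 3, 0, 4]
```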


As a conclusion, after the Borda ranking of the DDT candidates we evaluated the DDTs through the testers' satisfaction index to answer the first optimization question posed above: which Defect Detection Techniques (DDTs) are the most significant? According to the testers' assessment of the 5 most frequently used DDTs in IOSTP [25] - DDT1 = Inspection - DBR, DDT2 = PBR, DDT3 = CEG+BOR+MI, DDT4 = M&S, and DDT5 = Hybrid (Category Partition, Boundary value analysis, Path testing etc.) - three of the DDTs have the highest rank 0, i.e. DDT1 = DDT2 = DDT4 = 0, DDT3 = CEG+BOR+MI is ranked next, and DDT5 is last. Because of that, we group the three DDTs with the highest rank 0, call them Static Test Techniques - TT1, and treat all three as one factor in the optimization experiment applying Orthogonal Arrays as the optimization strategy. The next highest Borda-ranked technique, DDT3 = CEG+BOR+MI, we designate TT2, and the last ranked, DDT5, we designate TT3.
In the next section we provide the answers to the second and third optimization questions given above: in what way do DDT combinations affect the quality of the software testing process (in our case, the effectiveness of software test activities measured by Defect Removal Efficiency - DRE), and what is the optimal combination of DDTs over these significant variables (in our case TT1, TT2 and TT3)?

4.3 Response Surface Model Building Using Orthogonal Arrays
Multidisciplinary design optimization (MDO) is an important step in the conceptual design and evaluation of STP effectiveness and efficiency, since many factors have a significant impact on performance and on software development lifecycle (SDL) cost. The objective in MDO is to search the software design and STP space efficiently to determine the values of design and process variables that optimize performance characteristics subject to system constraints.
An alternative is to utilize response surface methodology (RSM) to obtain mathematical models that approximate the functional relationships between performance characteristics and design/process variables. A common approach in RSM is to utilize central composite designs (CCD) from the design of experiments literature to sample the SDL and STP space efficiently [25,27]. With this approach, design analyses (experiments) are performed at the statistically selected points specified by a CCD matrix. The resulting data are used to construct response surface approximation models using least squares regression. These response surface equations are then used for MDO and for rapid sensitivity studies.
However, like most experimental designs, CCD was conceived with physical experiments in mind, where the dominant issue is the variance of measurement of the response. In a physical experiment, there is usually some variability in the output response when the experiment is repeated with the same inputs. In contrast, the output of computer (software testing) experiments is (in almost all cases) deterministic: generally, there is no measurement error and no variability in analysis outputs. Therefore, experimental designs constructed to minimize the variability of measurements may not be the best choice for computer experiments [17].
In this study, response surface methods for computer experiments are investigated and some of the approaches available in the literature are discussed. The focus is on response surface model building using orthogonal arrays designed for computer experiments (OACE).

4.3.1 Response Surface Model Building Using Central Composite Designs
Response surface methods (RSM) can be utilized for MDO in cases where computerized design tool integration is difficult and design effort is costly. The first step in RSM is to construct polynomial approximations to the functional relationships between design or process variables and performance characteristics (e.g. DDT, DRE) [25,27]. In the next step, these parametric models are used for MDO and to determine variable sensitivities. A quadratic approximation model of the form given below is commonly used, since it can account for individual parameter effects, second-order curvature or non-linearity (square terms), and two-parameter interactions (cross terms):

ŷ = b_0 + Σ_{i=1}^{k} b_i X_i + Σ_{i≠j} b_ij X_i X_j + Σ_{i=1}^{k} b_ii X_i²    (10)

where ŷ is the approximation of the output variable, i.e. the trial response (the performance characteristic to be optimized); k is the number of factors; b_0, b_i, b_ij and b_ii are the estimated least squares regression coefficients, based on the design and analysis data obtained by sampling the design/process space (or by conducting experiments); X_i = (x_i - x_i0)/Δx_i are the coded i-th factor values; x_i are the real i-th factor values; x_i0 is the real factor value at the "NULL" point (the point at the experimental center); and Δx_i is the variation interval.
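The coefficients in (10) are obtained by ordinary least squares on a design matrix expanded with cross and square terms. The following minimal Python sketch (our illustration, using synthetic data rather than the experiment's results) shows the mechanics:

```python
# Sketch: least-squares fit of the quadratic response surface model (10).
# Synthetic illustrative data only; not the paper's experimental results.
import numpy as np
from itertools import combinations

def quadratic_design_matrix(X):
    """Columns: intercept, X_i, cross terms X_i*X_j (i < j), squares X_i^2."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    cols += [X[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(18, 3))   # 18 runs, 3 coded factors
y = 90 + 2 * X[:, 0] - X[:, 1] ** 2 + rng.normal(0, 0.1, size=18)

A = quadratic_design_matrix(X)
b, *_ = np.linalg.lstsq(A, y, rcond=None)  # estimated coefficients b_0, b_i, ...
y_hat = A @ b                              # model predictions at the 18 points
```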


In some cases, however, RSM using CCD may not result in a good representation of the response surface, as may be evidenced by poor predictions of the design analysis results. The reasons for this problem are mainly:
1) the response surface is more complex than can be represented by the second order approximation model given by equation (10),
2) there are other influential design/process variables and interactions beyond those currently under study,
3) the sample design points (experiments) specified by a CCD may not be suitable, in terms of the selection of these specific points, for experimentation with computerized design analysis tools.
The third problem is directly related to the choice of specific experimental design points. In order to address this problem and to improve response surface model building using computer experiments, a study was conducted.

4.3.2 Response Surface Model Building Methods for Computer Experiments
The Bayesian approach to experimental design appears to be a growing area of research. However, the application of Bayesian experimental design methods to real design analysis and optimization problems has been limited, partly due to the lack of user-friendly software [17]. Further development appears to be needed before they can be applied to practical design optimization problems.
The frequentist approach, surveyed by Owen [17], on the other hand, introduces randomness by taking function values that are partially determined by pseudorandom number generators. This randomness is then propagated through to randomness in the estimate. Owen lists a set of randomized orthogonal arrays for computer experiments; the Statlib computer programs (http://lib.stat.cmu.edu/designs/) that generate these orthogonal arrays are also listed.
The use of these orthogonal arrays in practice for response surface model building is similar to utilizing central composite designs, with the potential of improving model accuracy for computer experiments. In the following section, an example application to an optimum DDT combination selection and optimization study for an Integrated and Optimized Software Testing Process (IOSTP) [25] is presented.

4.3.3 Example Application: optimum DDTs combination selection and optimization study for an IOSTP
Traditionally, the objective in an MDO study has been to search the design space to determine the values of design variables (such as DDTs) that optimize a performance characteristic (such as DRE) subject to software testing process constraints. However, research shows that up to 82% of the software testing life cycle cost is committed during the early design phase [25,29]. Therefore, significant cost savings could be realized if designers and test managers were better able to evaluate their designs on a cost basis.
This study focuses on rapid multidisciplinary analysis and evaluation, on a maximum-DRE basis, of DDT combination choices for each test phase's activities, i.e. P1 - software requirements (SRUT), P2 - high level design (HLDUT), P3 - low level design (LLDUT), P4 - code under test (CUT), P5 - integration test (IUT), P6 - system under test (SUT) and finally P7 - acceptance test (recall section 2.2.1). Different Defect Detection Strategy and Technique options, together with critical STP performance characteristics (e.g. DRE, cost, duration), are studied to optimize design, development, test and evaluation (DDT&E) cost using orthogonal arrays for computer experiments [25-27]. Calculus-based optimizers could not be used in this case, since the technique options selection requires the study of design variables that have discrete values. This study has the following steps:

1. Identify the design variables to be studied and alternative levels
In this study, the maximum-DRE STP optimization problem, with the best DDT choice combination in each phase P1 to P7 as the controlled variable values, is addressed by a designed experiment plan using orthogonal arrays designed for this computer experiment (OACE). To simplify the analysis, reducing each factor to only three DDT values by applying the Borda ranking of the DDT candidates with the highest ranks, several design disciplines were decoupled from the present analysis.
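For a sense of the scale of the design space defined in step 1, a short sketch of ours: seven phases with three technique levels each give 3^7 = 2,187 possible combinations, which motivates the 18-run orthogonal array selected in step 2 below.

```python
# Illustrative sketch of step 1: design variables (test phases P1..P7),
# each with three alternative levels (1 = TT1, 2 = TT2, 3 = TT3).
from itertools import product

phases = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]
levels = [1, 2, 3]

full_factorial = list(product(levels, repeat=len(phases)))
print(len(full_factorial))  # -> 2187 = 3**7; the OA below needs only 18 runs
```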

Seven major test phases P1 to P7, accounting for the maximum DRE percentage over the whole STP fault injection and removal model (see Fig. 3), were determined for DDT candidate selection in each test phase. The factor levels were the Static Test Techniques - TT1 (the three rank-0 DDTs treated as one factor in the optimization experiment applying Orthogonal Arrays as the optimization strategy), TT2, i.e. DDT3 = CEG+BOR+MI, and TT3 - the Hybrid Detection Technique DDT5 (consisting of Category Partition, Boundary value analysis, Path testing etc.). The objective of this investigation was then to determine the best combination of Test Technique options for the seven major test phase activities, optimized for the maximum STD&STP DRE percentage under cost and time constraints, according to the IOSTP benefit index maximization in (4) [25-27].

2. Design the experiment and select an appropriate orthogonal array
Owen [17] lists a set of orthogonal arrays for computer experiments. For this study, an orthogonal array that enables the study of seven variables with three levels each was selected (http://lib.stat.cmu.edu/designs/owen.small). A full factorial design, where all possible variable/TT combinations are studied, would have required 2,187 (3^7) experiments, while in the Orthogonal Array design of experiment plan only 18 experiments are enough, as shown in Table 5. Variable interactions were assumed to be insignificant for this study.

Table 5 Seven Variable Orthogonal Array with three levels each [17]

 Exp. No. | P1 P2 P3 P4 P5 P6 P7
 1        | 1  1  3  2  3  1  1
 2        | 3  2  2  1  1  3  1
 3        | 2  3  1  3  2  2  1
 4        | 1  2  1  1  2  1  2
 5        | 3  3  3  3  3  3  2
 6        | 2  1  2  2  1  2  2
 7        | 1  3  2  1  3  2  3
 8        | 3  1  1  3  1  1  3
 9        | 2  2  3  2  2  3  3
 10       | 1  3  1  2  1  3  1
 11       | 3  1  3  1  2  2  1
 12       | 2  2  2  3  3  1  1
 13       | 1  1  2  3  2  3  2
 14       | 3  2  1  2  3  2  2
 15       | 2  3  3  1  1  1  2
 16       | 1  2  3  3  1  2  3
 17       | 3  3  2  2  2  1  3
 18       | 2  1  1  1  3  3  3

3. Conduct the orthogonal array experiments
The eighteen matrix experiments were conducted using DRE estimating relationships on post-mortem real project data, performing a "what-if" analysis: which overall IOSTP DRE percentage would be reached if we combined DDTs in different ways per test phase activity (P1 to P7), according to the Borda ranking result and the Orthogonal Array design of experiment plan of 18 experiments from Table 5, where TT1 is coded as 1, TT2 as 2 and TT3 as 3. The analysis results of the 18 experiments, the overall IOSTP DRE percentage and the corresponding TT selection per test phase, are presented in Table 6. For the 18 DDT combinations shown in Table 6, the highest DRE is 94.44 % (experiment number seven).

Table 6 The "what-if" analysis results of the OACE experiment

 Exp. No. | P1 P2 P3 P4 P5 P6 P7 | DRE (%)
 1        | 1  1  3  2  3  1  1  | 90.51
 2        | 3  2  2  1  1  3  1  | 84.79
 3        | 2  3  1  3  2  2  1  | 87.66
 4        | 1  2  1  1  2  1  2  | 91.27
 5        | 3  3  3  3  3  3  2  | 80.34
 6        | 2  1  2  2  1  2  2  | 81.66
 7        | 1  3  2  1  3  2  3  | 94.44
 8        | 3  1  1  3  1  1  3  | 83.14
 9        | 2  2  3  2  2  3  3  | 82.99
 10       | 1  3  1  2  1  3  1  | 87.89
 11       | 3  1  3  1  2  2  1  | 85.05
 12       | 2  2  2  3  3  1  1  | 89.77
 13       | 1  1  2  3  2  3  2  | 83.91
 14       | 3  2  1  2  3  2  2  | 85.19
 15       | 2  3  3  1  1  1  2  | 81.72
 16       | 1  2  3  3  1  3  3  | 84.11
 17       | 3  3  2  2  2  2  3  | 82.58
 18       | 2  1  1  1  3  1  3  | 92.94

4. Analyze the data to determine the optimum levels and verify results
The average DRE (%) for each variable P at each of the three levels (TT) is calculated and displayed in the response table given in Table 7. This response table shows the DRE (%) effects of the variables at each level. These are separate effects of each parameter and are commonly called main effects. The averages shown in the response table are calculated by averaging the results for a variable at a given level, over every experiment in which it was used. As an example, the variable P1 was at level 2 in experiments 3, 6, 9, 12, 15 and 18; the average of the corresponding DRE values is 86.12 %, which is shown in the response table (Table 7) under P1 at level 2. This procedure is repeated and the response table is completed for all variables at each level.
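The step-4 response-table entries are plain conditional averages over the Table 6 runs. A sketch of ours that reproduces, for example, the 86.12 % main effect of P1 at level 2 (other entries can differ in the last digit from Table 7 because the Table 6 DRE values are rounded):

```python
# Sketch of the step-4 response table: average DRE per variable level,
# using the DDT combinations and DRE results of Table 6.
design = [
    (1,1,3,2,3,1,1), (3,2,2,1,1,3,1), (2,3,1,3,2,2,1), (1,2,1,1,2,1,2),
    (3,3,3,3,3,3,2), (2,1,2,2,1,2,2), (1,3,2,1,3,2,3), (3,1,1,3,1,1,3),
    (2,2,3,2,2,3,3), (1,3,1,2,1,3,1), (3,1,3,1,2,2,1), (2,2,2,3,3,1,1),
    (1,1,2,3,2,3,2), (3,2,1,2,3,2,2), (2,3,3,1,1,1,2), (1,2,3,3,1,3,3),
    (3,3,2,2,2,2,3), (2,1,1,1,3,1,3),
]
dre = [90.51, 84.79, 87.66, 91.27, 80.34, 81.66, 94.44, 83.14, 82.99,
       87.89, 85.05, 89.77, 83.91, 85.19, 81.72, 84.11, 82.58, 92.94]

def main_effect(phase_idx, level):
    """Average DRE over all runs where the given phase used the given level."""
    vals = [y for row, y in zip(design, dre) if row[phase_idx] == level]
    return sum(vals) / len(vals)

print(round(main_effect(0, 2), 2))  # P1 at level 2 -> 86.12, as in Table 7
```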


Table 7 DRE (%) response table (optimum levels marked with an asterisk)

 Phase / TT | P1     | P2     | P3     | P4     | P5     | P6     | P7
 1          | 88.69* | 86.20  | 88.02* | 88.36* | 83.91  | 86.35  | 84.02
 2          | 86.12  | 86.35* | 86.19  | 85.14  | 85.58  | 86.51* | 87.61*
 3          | 83.52  | 85.77  | 84.12  | 84.82  | 88.72* | 85.50  | 86.70

The optimum level (TT) for each design variable (P) can now be selected by choosing the level with the highest DRE percentage. For example, the highest DRE percentage for variable P1 is obtained at level 1, 88.69 %, as opposed to 83.52 % at level 3 and 86.12 % at level 2. Similarly, the levels that optimize the total IOSTP defect removal effectiveness (DRE) were chosen; the optimum levels are marked with an asterisk in Table 7. As the next step, least squares regression analysis is used to fit the second order approximation model (equation (10)) to the DRE data (y_i) given in Table 6 in terms of the seven design variables (X_i). This parametric model accounts for the response surface curvature (square terms) and two-factor interactions (cross terms), i.e. the RSM:

DRE (%) = 111.71 - 2.58 (P1) + 1.22 (P2) - 1.95 (P3) - 7.61 (P4) - 0.69 (P5) + 0.94 (P6) - 13.04 (P7) - 0.36 (P2)² + 1.46 (P4)² + 0.79 (P5)² - 0.36 (P6)² + 3.15 (P7)²    (11)

Note that, in this response surface approximation model, the parameter values are restricted to 1 (TT1), 2 (TT2), or 3 (TT3).

Table 8 Maximum DRE (%) value and corresponding Test Technique choices per test phase

 P1  | P2  | P3  | P4  | P5  | P6  | P7  | DRE (%)
 TT1 | TT2 | TT1 | TT1 | TT3 | TT2 | TT2 | 94.03

At these levels, the IOSTP DRE was predicted to be 94.03 % using the second order prediction model (11). As a next step, a verification analysis was performed: the DRE of the IOSTP calculated from these test technique choices, according to the post-mortem real project data with the optimized DDT choices from Table 8, was computed to be 93.43 %. The difference, 94.03 % - 93.43 % = 0.6 %, is acceptable and validates our prediction model for DRE (%) in equation (11) for the optimal DDT combination choice given in Table 8. The optimal combination of DDT choices per phase P given in Table 8 yields an increase of about 6 % compared to the un-optimized DDT combination per test phase that we used in our real project, in which we achieved a DRE of 87.43 %.
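As a quick check, equation (11) evaluated at the Table 8 levels does return the predicted maximum; a sketch:

```python
# Sketch: evaluate the fitted response surface (11) at the optimum
# levels of Table 8 (phases P1..P7, levels coded 1=TT1, 2=TT2, 3=TT3).
lin = {1: -2.58, 2: 1.22, 3: -1.95, 4: -7.61, 5: -0.69, 6: 0.94, 7: -13.04}
sq  = {2: -0.36, 4: 1.46, 5: 0.79, 6: -0.36, 7: 3.15}

def dre_model(p):
    """p maps phase number -> chosen level (1, 2 or 3)."""
    return (111.71
            + sum(c * p[i] for i, c in lin.items())
            + sum(c * p[i] ** 2 for i, c in sq.items()))

optimum = {1: 1, 2: 2, 3: 1, 4: 1, 5: 3, 6: 2, 7: 2}   # Table 8 choices
print(round(dre_model(optimum), 2))  # -> 94.03, the predicted maximum DRE
```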

5 Conclusion
Organizations are constantly working to leverage today's best practices for testing within the context of their existing IT environments. As IT works to balance the business needs of an application against testing limitations with regard to resources and schedules, making the best use of the testing environment becomes critical. Optimized testing is a way for organizations to move their testing efforts forward to reflect changing business environments and resource constraints. Optimized testing uses the test techniques with the highest defect detection yield and, combined with the Orthogonal Array Testing Strategy (OATS), provides:
• Pairwise testing that protects against pairwise bugs while dramatically reducing the number of tests to perform, compared to testing all combinations. This matters because pairwise bugs represent the majority of combinatoric bugs and are far more likely to occur than bugs that only appear when more variables interact.
• Tool support, which means the tests no longer need to be created by hand; what remains is the work of analyzing the product, selecting variables and values, configuring and performing the tests, and analyzing the results. This improves application quality, maximizes development resources and helps deliver applications on time and within budget.
Using the tools associated with optimized testing, including DDT assessment, requirements management and automation, IT managers can make more informed decisions regarding testing, and have the data and information to back up those decisions. This article has highlighted an increase in DRE of about 6 % compared to the un-optimized DDT combination per test phase, obtained by maximizing the DRE percentage of the STP, i.e. solving the optimization problem with the best DDT choice combination in each phase P1 to P7 as the controlled variable values, determined by a designed experiment plan using orthogonal arrays designed for this computer experiment (OACE).

References
[1] http://www.pairwise.org/tools.asp


[2] P. E. Ammann and A. J. Offutt. Using formal methods to derive test frames in category-partition testing. In Ninth Annual Conference on Computer Assurance (COMPASS '94), Gaithersburg, MD, pages 69-80, 1994.
[3] J. Bach and P. Shroeder. Pairwise testing - a best practice that isn't. In Proceedings of the 22nd Pacific Northwest Software Quality Conference, pages 180-196, 2004.
[4] K. Burr and W. Young. Combinatorial test techniques: Table-based automation, test generation, and test coverage. In Proceedings of the International Conference on Software Testing, Analysis, and Review (STAR), San Diego, CA, 1998.
[5] K. Burroughs, A. Jain, and R. L. Erickson. Improved quality of protocol testing through techniques of experimental design. In Proceedings of the IEEE International Conference on Communications (Supercomm/ICC '94), May 1-5, New Orleans, Louisiana, pages 745-752, 1994.
[6] D. M. Cohen, S. R. Dalal, M. L. Fredman, and G. C. Patton. The AETG system: An approach to testing based on combinatorial design. IEEE Transactions on Software Engineering, 23(7), 1997.
[7] D. M. Cohen, S. R. Dalal, J. Parelius, and G. C. Patton. The combinatorial design approach to automatic test generation. IEEE Software, 13(5):83-87, 1996.
[8] C. J. Colbourn, M. B. Cohen, and R. C. Turban. A deterministic density algorithm for pairwise interaction coverage. In Proceedings of the IASTED International Conference on Software Engineering, 2004.
[9] S. R. Dalal, A. Jain, N. Karunanithi, J. M. Leaton, C. M. Lott, G. C. Patton, and B. M. Horowitz. Model-based testing in practice. In Proceedings of the International Conference on Software Engineering (ICSE 99), New York, pages 285-294, 1999.
[10] I. S. Dunietz, W. K. Ehrlich, B. D. Szablak, C. L. Mallows, and A. Iannino. Applying design of experiments to software testing. In Proceedings of the International Conference on Software Engineering (ICSE 97), New York, pages 205-215, 1997.
[11] M. Grindal, J. Offutt, and S. F. Andler. Combination testing strategies - a survey. GMU Technical Report, 2004.
[12] A. Hartman and L. Raskin. Problems and algorithms for covering arrays. Discrete Mathematics, 284(1-3):149-156, 2004.
[13] R. Kuhn and M. J. Reilly. An investigation of the applicability of design of experiments to software testing. In Proceedings of the 27th NASA/IEEE Software Engineering Workshop, NASA Goddard Space Flight Center, 2002.
[14] Y. Lei and K. C. Tai. In-parameter-order: a test generation strategy for pairwise testing. In Proceedings of the Third IEEE International High-Assurance Systems Engineering Symposium, pages 254-261, 1998.
[15] A. Paradkar. Specification Based Testing Using Cause-Effect Graphs. PhD dissertation, North Carolina State University, Raleigh, 1996.
[16] R. Mandl. Orthogonal latin squares: an application of experiment design to compiler testing. Communications of the ACM, 28(10):1054-1058, 1985.
[17] A. Owen. Orthogonal Array Designs for Computer Experiments. Department of Statistics, Stanford University, 1994, http://lib.stat.cmu.edu/designs/owen.small.
[18] G. Sherwood. Effective testing of factor combinations. In Proceedings of the Third International Conference on Software Testing, Analysis and Review, Washington, DC, pages 133-166, 1994.
[19] H. Shimokawa and S. Satoh. Method of setting software test cases using the experimental design approach. In Proceedings of the Fourth Symposium on Quality Control in Software Production, Federation of Japanese Science and Technology, pages 1-8, 1984.
[20] B. Smith, M. S. Feather, and N. Muscettola. Challenges and methods in testing the remote agent planner. In Proceedings of AIPS, 2000.
[21] K. C. Tai and Y. Lei. A test generation strategy for pairwise testing. IEEE Transactions on Software Engineering, 28(1), 2002.
[22] K. Tatsumi. Test case design support system. In Proceedings of the International Conference on Quality Control (ICQC), Tokyo, pages 615-620, 1987.
[23] D. R. Kuhn, D. R. Wallace, and A. M. Gallo. Software fault interactions and implications for software testing. IEEE Transactions on Software Engineering, 30(6):418-421, October 2004.
[24] A. W. Williams. Software component interactions testing: coverage measurement and generation of configurations. PhD thesis, Ottawa-Carleton Institute for Computer Science, School of Information Technology and Engineering, University of Ottawa, 2002.
[25] Lj. Lazić. The Integrated and Optimized Software Testing Process. PhD thesis, School of Electrical Engineering, Belgrade, Serbia, 2007.
[26] Lj. Lazić and N. Mastorakis. RBOSTP: Risk-based optimization of software testing process Part 2. WSEAS Transactions on Information Science and Applications, Issue 7, Volume 2, pages 902-916, July 2005.
[27] Lj. Lazić and D. Velasević. Applying simulation and design of experiments to the embedded software testing process. Software Testing, Verification and Reliability, Volume 14, Issue 4, pages 257-282, John Wiley & Sons, Ltd., 2004.
[28] T. Gregory. The Economic Impacts of Inadequate Infrastructure for Software Testing. NIST Final Report, May 2002.
[29] S. H. Kan. Metrics and Models in Software Quality Engineering, Second Edition, Addison-Wesley, 2002.
[30] R. Brownlie, J. Prowse, and M. Phadke. Robust testing of AT&T PMX/StarMAIL using OATS. AT&T Technical Journal, 71(3):41-47, May/June 1992.
[31] P. A. Engert and Z. F. Lansdowne. Risk Matrix User's Guide, Version 2.2, The MITRE Corporation, 1999.
