Empirical Evaluation of the Effectiveness and Reliability of Software Testing Adequacy Criteria and Reference Test Systems

Total Page:16

File Type:pdf, Size:1020Kb

Empirical Evaluation of the Effectiveness and Reliability of Software Testing Adequacy Criteria and Reference Test Systems Empirical Evaluation of the Effectiveness and Reliability of Software Testing Adequacy Criteria and Reference Test Systems Mark Jason Hadley PhD University of York Department of Computer Science September 2013 2 Abstract This PhD Thesis reports the results of experiments conducted to investigate the effectiveness and reliability of ‘adequacy criteria’ - criteria used by testers to determine when to stop testing. The research reported here is concerned with the empirical determination of the effectiveness and reliability of both tests sets that satisfy major general structural code coverage criteria and test sets crafted by experts for testing specific applications. We use automated test data generation and subset extraction techniques to generate multiple tests sets satisfying widely used coverage criteria (statement, branch and MC/DC coverage). The results show that confidence in the reliability of such criteria is misplaced. We also consider the fault-finding capabilities of three test suites created by the international community to serve to assure implementations of the Data Encryption Standard (a block cipher). We do this by means of mutation analysis. The results show that not all sets are mutation adequate but the test suites are generally highly effective. The block cipher implementations are also seen to be highly ‘testable’ (i.e. they do not mask faults). 3 Contents Abstract ............................................................................................................................ 3 Table of Tables ................................................................................................................ 7 Table of Figures ............................................................................................................... 8 Acknowledgements .......................................................................................................... 9 Author's declaration ....................................................................................................... 10 1. Introduction ................................................................................................................ 11 1.1 Background and Motivation ................................................................................ 11 1.2 Research Objectives ............................................................................................. 12 1.3 The Aims and Goals of the Thesis ....................................................................... 13 1.4 Overview and Structure of the Thesis .................................................................. 14 2. Literature Survey ....................................................................................................... 16 2.1. Fundamental Concepts ........................................................................................ 17 2.1.1 Testing – the English Usage ......................................................................... 17 2.1.2 Software Testing – a Technical Usage.......................................................... 18 2.1.3 Faults and Failures ........................................................................................ 18 2.1.4 Faults, Failures and Testing .......................................................................... 20 2.1.5 Terminology in the Literature ....................................................................... 20 2.1.6 What is Tested and Where? .......................................................................... 21 2.1.7 Phases of Dynamic Testing: Unit, Integration, System and Acceptance Testing.................................................................................................................... 25 2.2 Characteristics of Testing .................................................................................... 29 2.2.1 Introduction ................................................................................................... 29 2.2.2 Static v Dynamic ........................................................................................... 29 2.2.3 Functional v Structural .................................................................................. 30 2.2.4 Formal v Informal ......................................................................................... 33 2.2.5 Manual v Automated ..................................................................................... 35 2.2.6 Testing v Debugging ..................................................................................... 37 2.3. Software Standards, Process Models and Metrics .............................................. 40 2.3.1 Introduction ................................................................................................... 40 2.3.2 Software Standards ....................................................................................... 40 2.3.4 Software Testing Metrics .............................................................................. 45 2.3.5 Software Projects Continue to Fail ............................................................... 49 2.4. Model Testing, Reliability Growth Models and Fault Adequacy Techniques ... 51 4 2.4.1 Introduction ................................................................................................... 51 2.4.2 Model Based Testing .................................................................................... 51 2.4.3 Reliability Growth Models ........................................................................... 55 2.4.4 Fault Based Adequacy Techniques ............................................................... 58 2.4.5 Execution, Infection and Propagation (PIE) ..................................................... 65 2.5. What do Empirical and Case Studies tell us about Software Testing? ............... 66 2.6 When Can We Stop Testing? ............................................................................... 77 2.6.1 Introduction ................................................................................................... 77 2.6.2 Can We Stop Testing Now? .......................................................................... 77 2.7 Test Set Sub-setting via Heuristic Search ............................................................ 81 2.8 Summary .............................................................................................................. 85 3. The Research Test Framework and its Components .................................................. 86 3.1 Framework ........................................................................................................... 87 3.1.1 Overview of the Framework ............................................................................. 87 3.1.2 Code-Coverage ................................................................................................. 91 3.1.3 Test Set Generation ........................................................................................... 95 3.1.4 JUnit and the Framework .................................................................................. 96 3.1.5 Test Subset Extraction ...................................................................................... 98 3.1.5.1 Genetic Algorithms ........................................................................................ 98 3.1.6 Mutation .......................................................................................................... 103 3.1.7 Mutation Score Generation ............................................................................. 106 3.2 Testing Framework Refinement ........................................................................ 108 3.2.1 Overview of the Framework Refinement ....................................................... 108 3.2.2 Test Case Generation and Execution .............................................................. 109 3.2.3 Mutant Testing and Mutant Score Generation ................................................ 111 3.3 Framework Summary......................................................................................... 112 4. Testing Code Coverage Criteria............................................................................... 114 4.1 Numerical Programs Under Test and Their Program Properties ....................... 116 4.2 Mutations Injected and Results .......................................................................... 118 4.2.1 Full Test Set Results ....................................................................................... 118 4.2.2 Rationale for Different MS in Numerical Recipes ......................................... 121 4.2.3 Test Subset Results ......................................................................................... 126 4.2.4 Results from Miscellaneous and Sorting Algorithms. .................................... 131 4.3 Program Properties............................................................................................. 137 5 4.4 Conclusions ........................................................................................................ 137 5. Testing Encryption Algorithms................................................................................ 140 5.1 Programs under Test, Program Properties and Test Vectors ............................. 141 5.2. Mutation Results ............................................................................................... 145 5.4 Conclusions .......................................................................................................
Recommended publications
  • The Use of Summation to Aggregate Software Metrics Hinders the Performance of Defect Prediction Models
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. XX, NO. XX, XXX 2016 1 The Use of Summation to Aggregate Software Metrics Hinders the Performance of Defect Prediction Models Feng Zhang, Ahmed E. Hassan, Member, IEEE, Shane McIntosh, Member, IEEE, and Ying Zou, Member, IEEE Abstract—Defect prediction models help software organizations to anticipate where defects will appear in the future. When training a defect prediction model, historical defect data is often mined from a Version Control System (VCS, e.g., Subversion), which records software changes at the file-level. Software metrics, on the other hand, are often calculated at the class- or method-level (e.g., McCabe’s Cyclomatic Complexity). To address the disagreement in granularity, the class- and method-level software metrics are aggregated to file-level, often using summation (i.e., McCabe of a file is the sum of the McCabe of all methods within the file). A recent study shows that summation significantly inflates the correlation between lines of code (Sloc) and cyclomatic complexity (Cc) in Java projects. While there are many other aggregation schemes (e.g., central tendency, dispersion), they have remained unexplored in the scope of defect prediction. In this study, we set out to investigate how different aggregation schemes impact defect prediction models. Through an analysis of 11 aggregation schemes using data collected from 255 open source projects, we find that: (1) aggregation schemes can significantly alter correlations among metrics, as well as the correlations between metrics and
    [Show full text]
  • Balancing Dependability Quality Attributes Relationships for Increased Embedded Systems Dependability
    Master Thesis Software Engineering Thesis no: MSE-2009:17 September 2009 Balancing Dependability Quality Attributes Relationships for Increased Embedded Systems Dependability Saleh Al-Daajeh Supervisor: Professor Mikael Svahnberg School of Engineering Blekinge Institute of Technology Box 520 SE – 372 25 Ronneby Sweden This thesis is submitted to the School of Engineering at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Software Engineering. The thesis is equivalent to 2 x 20 weeks of full time studies. Contact Information: Author: Saleh Al-Daajeh E-mail: [email protected] University advisor: Prof. Miakel Svahnberg School of Engineering Internet : www.bth.se/tek Blekinge Institute of Technology Phone : +46 457 385 000 Box 520 Fax : +46 457 271 25 SE – 372 25 Ronneby Sweden II Abstract Embedded systems are used in many critical applications of our daily life. The increased complexity of embedded systems and the tightened safety regulations posed on them and the scope of the environment in which they operate are driving the need for more dependable embedded systems. Therefore, achieving a high level of dependability to embedded systems is an ultimate goal. In order to achieve this goal we are in need of understanding the interrelationships between the different dependability quality attributes and other embedded systems’ quality attributes. This research study provides indicators of the relationship between the dependability quality attributes and other quality attributes for embedded systems by identify- ing the impact of architectural tactics as the candidate solutions to construct dependable embedded systems. III Acknowledgment I would like to express my gratitude to all those who gave me the possibility to complete this thesis.
    [Show full text]
  • A Reasoning Framework for Dependability in Software Architectures Tacksoo Im Clemson University, [email protected]
    Clemson University TigerPrints All Dissertations Dissertations 12-2010 A Reasoning Framework for Dependability in Software Architectures Tacksoo Im Clemson University, [email protected] Follow this and additional works at: https://tigerprints.clemson.edu/all_dissertations Part of the Computer Sciences Commons Recommended Citation Im, Tacksoo, "A Reasoning Framework for Dependability in Software Architectures" (2010). All Dissertations. 618. https://tigerprints.clemson.edu/all_dissertations/618 This Dissertation is brought to you for free and open access by the Dissertations at TigerPrints. It has been accepted for inclusion in All Dissertations by an authorized administrator of TigerPrints. For more information, please contact [email protected]. A Reasoning Framework for Dependability in Software Architectures A Dissertation Presented to the Graduate School of Clemson University In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy Computer Science by Tacksoo Im August 2010 Accepted by: Dr. John D. McGregor, Committee Chair Dr. Harold C. Grossman Dr. Jason O. Hallstrom Dr. Pradip K. Srimani Abstract The degree to which a software system possesses specified levels of software quality at- tributes, such as performance and modifiability, often have more influence on the success and failure of those systems than the functional requirements. One method of improving the level of a software quality that a product possesses is to reason about the structure of the software architecture in terms of how well the structure supports the quality. This is accomplished by reasoning through software quality attribute scenarios while designing the software architecture of the system. As society relies more heavily on software systems, the dependability of those systems be- comes critical.
    [Show full text]
  • Testing E-Commerce Systems: a Practical Guide - Wing Lam
    Testing E-Commerce Systems: A Practical Guide - Wing Lam As e-customers (whether business or consumer), we are unlikely to have confidence in a Web site that suffers frequent downtime, hangs in the middle of a transaction, or has a poor sense of usability. Testing, therefore, has a crucial role in the overall development process. Given unlimited time and resources, you could test a system to exhaustion. However, most projects operate within fixed budgets and time scales, so project managers need a systematic and cost- effective approach to testing that maximizes test confidence. This article provides a quick and practical introduction to testing medium- to large-scale transactional e-commerce systems based on project experiences developing tailored solutions for B2C Web retailing and B2B procurement. Typical of most e-commerce systems, the application architecture includes front-end content delivery and management systems, and back-end transaction processing and legacy integration. Aimed primarily at project and test managers, this article explains how to · establish a systematic test process, and · test e-commerce systems. The Test Process You would normally expect to spend between 25 to 40 percent of total project effort on testing and validation activities. Most seasoned project managers would agree that test planning needs to be carried out early in the project lifecycle. This will ensure the time needed for test preparation (establishing a test environment, finding test personnel, writing test scripts, and so on) before any testing can actually start. The different kinds of testing (unit, system, functional, black-box, and others) are well documented in the literature.
    [Show full text]
  • Interpreting Reliability and Availability Requirements for Network-Centric Systems What MCOTEA Does
    Marine Corps Operational Test and Evaluation Activity Interpreting Reliability and Availability Requirements for Network-Centric Systems What MCOTEA Does Planning Testing Reporting Expeditionary, C4ISR & Plan Naval, and IT/Business Amphibious Systems Systems Test Evaluation Plans Evaluation Reports Assessment Plans Assessment Reports Test Plans Test Data Reports Observation Plans Observation Reports Combat Ground Service Combat Support Initial Operational Test Systems Systems Follow-on Operational Test Multi-service Test Quick Reaction Test Test Observations 2 Purpose • To engage test community in a discussion about methods in testing and evaluating RAM for software-intensive systems 3 Software-intensive systems – U.S. military one of the largest users of information technology and software in the world [1] – Dependence on these types of systems is increasing – Software failures have had disastrous consequences Therefore, software must be highly reliable and available to support mission success 4 Interpreting Requirements Excerpts from capabilities documents for software intensive systems: Availability Reliability “The system is capable of achieving “Average duration of 716 hours a threshold operational without experiencing an availability of 95% with an operational mission fault” objective of 98%” “Mission duration of 24 hours” “Operationally Available in its intended operating environment “Completion of its mission in its with at least a 0.90 probability” intended operating environment with at least a 0.90 probability” 5 Defining Reliability & Availability What do we mean reliability and availability for software intensive systems? – One consideration: unlike traditional hardware systems, a highly reliable and maintainable system will not necessarily be highly available - Highly recoverable systems can be less available - A system that restarts quickly after failures can be highly available, but not necessarily reliable - Risk in inflating availability and underestimating reliability if traditional equations are used.
    [Show full text]
  • Fundamental Concepts of Dependability
    Fundamental Concepts of Dependability Algirdas Avizˇ ienis Jean-Claude Laprie Brian Randell UCLA Computer Science Dept. LAAS-CNRS Dept. of Computing Science Univ. of California, Los Angeles Toulouse Univ. of Newcastle upon Tyne USA France U.K. UCLA CSD Report no. 010028 LAAS Report no. 01-145 Newcastle University Report no. CS-TR-739 LIMITED DISTRIBUTION NOTICE This report has been submitted for publication. It has been issued as a research report for early peer distribution. Abstract Dependability is the system property that integrates such attributes as reliability, availability, safety, security, survivability, maintainability. The aim of the presentation is to summarize the fundamental concepts of dependability. After a historical perspective, definitions of dependability are given. A structured view of dependability follows, according to a) the threats, i.e., faults, errors and failures, b) the attributes, and c) the means for dependability, that are fault prevention, fault tolerance, fault removal and fault forecasting. he protection and survival of complex information systems that are embedded in the infrastructure supporting advanced society has become a national and world-wide concern of the 1 Thighest priority . Increasingly, individuals and organizations are developing or procuring sophisticated computing systems on whose services they need to place great reliance — whether to service a set of cash dispensers, control a satellite constellation, an airplane, a nuclear plant, or a radiation therapy device, or to maintain the confidentiality of a sensitive data base. In differing circumstances, the focus will be on differing properties of such services — e.g., on the average real-time response achieved, the likelihood of producing the required results, the ability to avoid failures that could be catastrophic to the system's environment, or the degree to which deliberate intrusions can be prevented.
    [Show full text]
  • Types of Software Testing
    Types of Software Testing We would be glad to have feedback from you. Drop us a line, whether it is a comment, a question, a work proposition or just a hello. You can use either the form below or the contact details on the rightt. Contact details [email protected] +91 811 386 5000 1 Software testing is the way of assessing a software product to distinguish contrasts between given information and expected result. Additionally, to evaluate the characteristic of a product. The testing process evaluates the quality of the software. You know what testing does. No need to explain further. But, are you aware of types of testing. It’s indeed a sea. But before we get to the types, let’s have a look at the standards that needs to be maintained. Standards of Testing The entire test should meet the user prerequisites. Exhaustive testing isn’t conceivable. As we require the ideal quantity of testing in view of the risk evaluation of the application. The entire test to be directed ought to be arranged before executing it. It follows 80/20 rule which expresses that 80% of defects originates from 20% of program parts. Start testing with little parts and extend it to broad components. Software testers know about the different sorts of Software Testing. In this article, we have incorporated majorly all types of software testing which testers, developers, and QA reams more often use in their everyday testing life. Let’s understand them!!! Black box Testing The black box testing is a category of strategy that disregards the interior component of the framework and spotlights on the output created against any input and performance of the system.
    [Show full text]
  • Dependability Assessment of Software- Based Systems: State of the Art
    Dependability Assessment of Software- based Systems: State of the Art Bev Littlewood Centre for Software Reliability, City University, London [email protected] You can pick up a copy of my presentation here, if you have a lap-top ICSE2005, St Louis, May 2005 - slide 1 Do you remember 10-9 and all that? • Twenty years ago: much controversy about need for 10-9 probability of failure per hour for flight control software – could you achieve it? could you measure it? – have things changed since then? ICSE2005, St Louis, May 2005 - slide 2 Issues I want to address in this talk • Why is dependability assessment still an important problem? (why haven’t we cracked it by now?) • What is the present position? (what can we do now?) • Where do we go from here? ICSE2005, St Louis, May 2005 - slide 3 Why do we need to assess reliability? Because all software needs to be sufficiently reliable • This is obvious for some applications - e.g. safety-critical ones where failures can result in loss of life • But it’s also true for more ‘ordinary’ applications – e.g. commercial applications such as banking - the new Basel II accords impose risk assessment obligations on banks, and these include IT risks – e.g. what is the cost of failures, world-wide, in MS products such as Office? • Gloomy personal view: where it’s obvious we should do it (e.g. safety) it’s (sometimes) too difficult; where we can do it, we don’t… ICSE2005, St Louis, May 2005 - slide 4 What reliability levels are required? • Most quantitative requirements are from safety-critical systems.
    [Show full text]
  • QA and Testing Have Become More of a Concurrent Process
    Special Feature: SW Testing “QA and testing have become more of a concurrent process rather than an end of lifecycle process” Manish Tandon, VP & Head– Independent Validation and Testing Solutions, Infosys Interview by: Anil Chopra, PCQuest n an exclusive interview with the PCQuest Editor, Manish talks become a software tester? about the global trends in software testing, skillsets required Software testing has evolved into a very special function and Ito enter the field, how technology has evolved in this area to anybody wanting to enter the domain requires two special provide higher ROI, and much more. An IIT and IIM graduate, skillsets. One is deep functional or domain skills to understand Manish is responsible for formulating and executing the business the functionality. Plus, you need specialized technical skills strategy for Independent Validation and Testing Solutions practice as well because you want to do automation, middleware, and at Infosys. He mentors the unit specifically in meeting business performance testing. It’s a unique combination, and we focus goals and targets. In addition, he manages critical relationships very strongly on both these dimensions. A good software tester with client executives, industry analysts, deal consultants, and should always have an eye for detail. So people who have a slightly anchors the training and development of key personnel. Provided critical eye and are willing to dig deeper always make much better here are his expert comments on the subject. testers than people who want to fly at 30k feet. Q> What are some of the global trends in the software Q> You mentioned that software testing is now required testing business? How do you see the market moving? for front-end applications.
    [Show full text]
  • Software Maintainability a Brief Overview & Application to the LFEV 2015
    Software Maintainability A Brief Overview & Application to the LFEV 2015 Adam I. Cornwell Lafayette College Department of Electrical & Computer Engineering Easton, Pennsylvania [email protected] Abstract — This paper gives a brief introduction on For “lexical level” approaches which base what software maintainability is, the various methods complexity on program code, the following which have been devised to quantify and measure software maintainability, its relevance to the ECE 492 measurands are typical: individual and average Senior Design course, and some practical lines of code; number of commented lines, implementation guidelines. executable statements, blank lines, tokens, and data declarations; source code readability, or the Keywords — Software Maintainability; Halstead ratio of LOC to commented LOC; Halstead Metrics; McCabe’s Cyclomatic complexity; radon Metrics, including Halstead Length, Volume, and Effort; McCabe’s Cyclomatic Complexity; the I. INTRODUCTION control structure nesting level; and the number of Software Maintainability is an important knots, or the number of times the control flow concept in the upkeep of long-term software crosses. The final measurand is not as useful with systems. According to the IEEE, software object oriented programming but can still be of maintainability is defined as “the ease with which some use. a software system or component can be modified to correct faults, improve performance or other “Psychological complexity” approaches measure attributes, or adapt to a changed environment difficulty and are based on understandability and [1].” Software Maintainability can be measured the user [3]. The nature of the file and the using various devised methods, although none difficulty experienced by those working with the have been conclusively shown to work in a large file are what contribute to these kinds of variety of software systems [6].
    [Show full text]
  • A Metrics-Based Software Maintenance Effort Model
    A Metrics-Based Software Maintenance Effort Model Jane Huffman Hayes Sandip C. Patel Liming Zhao Computer Science Department Computer Science Department Computer Science Department Lab for Advanced Networking University of Louisville University of Kentucky University of Kentucky [email protected] [email protected] [email protected] (corresponding author) Abstract planning a new software development project. Albrecht introduced the notion of function points (FP) to estimate We derive a model for estimating adaptive software effort [1]. Mukhopadhyay [19] proposes early software maintenance effort in person hours, the Adaptive cost estimation based on requirements alone. Software Maintenance Effort Model (AMEffMo). A number of Life Cycle Management (SLIM) [23] is based on the metrics such as lines of code changed and number of Norden/Rayleigh function and is suitable for large operators changed were found to be strongly correlated projects. Shepperd et al. [27] argued that algorithmic cost to maintenance effort. The regression models performed models such as COCOMO and those based on function well in predicting adaptive maintenance effort as well as points suggested an approach based on using analogous provide useful information for managers and maintainers. projects to estimate the effort for a new project. In addition to the traditional off-the-self models such as 1. Introduction COCOMO, machine-learning methods have surfaced recently. In [17], Mair et al. compared machine-learning Software maintenance typically accounts for at least 50 methods in building software effort prediction systems. percent of the total lifetime cost of a software system [16]. There has also been some work toward applying fuzzy Schach et al.
    [Show full text]
  • Software Reliability and Dependability: a Roadmap Bev Littlewood & Lorenzo Strigini
    Software Reliability and Dependability: a Roadmap Bev Littlewood & Lorenzo Strigini Key Research Pointers Shifting the focus from software reliability to user-centred measures of dependability in complete software-based systems. Influencing design practice to facilitate dependability assessment. Propagating awareness of dependability issues and the use of existing, useful methods. Injecting some rigour in the use of process-related evidence for dependability assessment. Better understanding issues of diversity and variation as drivers of dependability. The Authors Bev Littlewood is founder-Director of the Centre for Software Reliability, and Professor of Software Engineering at City University, London. Prof Littlewood has worked for many years on problems associated with the modelling and evaluation of the dependability of software-based systems; he has published many papers in international journals and conference proceedings and has edited several books. Much of this work has been carried out in collaborative projects, including the successful EC-funded projects SHIP, PDCS, PDCS2, DeVa. He has been employed as a consultant to industrial companies in France, Germany, Italy, the USA and the UK. He is a member of the UK Nuclear Safety Advisory Committee, of IFIPWorking Group 10.4 on Dependable Computing and Fault Tolerance, and of the BCS Safety-Critical Systems Task Force. He is on the editorial boards of several international scientific journals. 175 Lorenzo Strigini is Professor of Systems Engineering in the Centre for Software Reliability at City University, London, which he joined in 1995. In 1985-1995 he was a researcher with the Institute for Information Processing of the National Research Council of Italy (IEI-CNR), Pisa, Italy, and spent several periods as a research visitor with the Computer Science Department at the University of California, Los Angeles, and the Bell Communication Research laboratories in Morristown, New Jersey.
    [Show full text]