
System, Acceptance, and Regression Testing

Learning Objectives

• Distinguish system and acceptance testing
  – How and why they differ from each other and from unit and integration testing
• Understand basic approaches for quantitative assessment (reliability, performance, ...)
• Understand interplay of validation and verification for usability and accessibility
  – How to continuously monitor usability from early design to delivery
• Understand basic approaches to regression testing
  – Preventing accidental changes


System, Acceptance, and Regression Testing

               | System                  | Acceptance               | Regression
  Test for ... | Correctness, completion | Usefulness, satisfaction | Accidental changes
  Test by ...  | Development test group  | Test group with users    | Development test group
               | Verification            | Validation               | Verification

System Testing

• Key characteristics:
  – Comprehensive (the whole system, the whole spec)
  – Based on specification of observable behavior
    • Verification against a specification, not validation, and not opinions
  – Independent of design and implementation
    • Independence: Avoid repeating design errors in system test design

Independent V&V

• One strategy for maximizing independence: System (and acceptance) test performed by a different organization
  – Organizationally isolated from developers (no pressure to say "ok")
  – Sometimes outsourced to another company or agency
    • Especially for critical systems
    • Outsourcing for independent judgment, not to save money
    • May be additional system test, not replacing internal V&V
  – Not all outsourced testing is IV&V
    • Not independent if controlled by development organization

Independence without changing staff

• If the development organization controls system testing ...
  – Perfect independence may be unattainable, but we can reduce undue influence
• Develop system test cases early
  – As part of requirements specification, before major design decisions have been made
    • Agile "test first" and conventional "V model" are both examples of designing system test cases before designing the implementation
    • An opportunity for "design for test": Structure the system for critical system testing early in the project


Incremental System Testing

• System tests are often used to measure progress
  – System test suite covers all features and scenarios of use
  – As the project progresses, the system passes more and more system tests
• Assumes a "threaded" incremental build plan: Features exposed at top level as they are developed

Global Properties

• Some system properties are inherently global
  – Performance, latency, reliability, ...
  – Early and incremental testing is still necessary, but provides only estimates
• A major focus of system testing
  – The only opportunity to verify global properties against actual system specifications
  – Especially to find unanticipated effects, e.g., an unexpected performance bottleneck

Context-Dependent Properties

• Beyond system-global: Some properties depend on the system context and use
  – Example: Performance properties depend on environment and configuration
  – Example: Privacy depends both on the system and on how it is used
    • A medical records system must protect against unauthorized use, and authorization must be provided only as needed
  – Example: Security depends on threat profiles
    • And threats change!
• Testing is just one part of the approach

Establishing an Operational Envelope

• When a property (e.g., performance or real-time response) is parameterized by use ...
  – requests per second, size of database, ...
• Extensive testing is required
  – varying parameters within the envelope, near the bounds, and beyond (see the sketch below)
• Goal: A well-understood model of how the property varies with the parameter
  – How sensitive is the property to the parameter?
  – Where is the "edge of the envelope"?
  – What can we expect when the envelope is exceeded?
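A minimal sketch of such an envelope-probing sweep: the measure_latency driver and the 200 ms bound are illustrative assumptions, and the driver is stubbed with a synthetic load model so the sketch runs standalone.

    import random

    LATENCY_BOUND_MS = 200.0

    def measure_latency(requests_per_second):
        """Hypothetical driver: load the system at the given rate and
        return observed 95th-percentile latency (ms). Stubbed with a
        synthetic model so the sketch runs standalone."""
        return (20.0 + 0.1 * requests_per_second) * (1.0 + 0.1 * random.random())

    def find_envelope_edge(rates):
        """Return the highest sampled rate still within the latency bound."""
        edge = None
        for rps in rates:                  # within, near, and beyond the bound
            latency = measure_latency(rps)
            print(f"{rps:>5} req/s -> {latency:6.1f} ms")
            if latency <= LATENCY_BOUND_MS:
                edge = rps
        return edge

    # Sample well inside, near, and beyond the nominal maximum load.
    print("edge of envelope:", find_envelope_edge([10, 100, 500, 1000, 1500, 2000]))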


Stress Testing

• Often requires extensive simulation of the execution environment
  – With systematic variation: What happens when we push the parameters? What if the number of users or requests is 10 times more, or 1000 times more?
• Often requires more resources (human and machine) than typical test cases
  – Separate from regular feature tests
  – Run less often, with more manual control
  – Diagnose deviations from expectation
    • Which may include difficult debugging of latent faults!

Estimating Dependability

• Measuring quality, not searching for faults
  – Fundamentally different goal than systematic testing
• Quantitative dependability goals are statistical
  – Reliability
  – Availability
  – Mean time to failure
  – ...
• Requires valid statistical samples from the operational profile (see the sketch below)
  – Fundamentally different from systematic testing
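A minimal sketch of estimating a failure rate by sampling operations according to an operational profile; the profile weights and the stubbed oracle's per-operation failure odds are illustrative assumptions, not measured data.

    import random

    # Operational profile: relative frequency of each operation in use.
    PROFILE = {"browse": 0.70, "search": 0.25, "checkout": 0.05}

    def execute(operation):
        """Run one sampled operation against the system; True = success.
        Stubbed here so the sketch runs standalone."""
        failure_odds = {"browse": 0.001, "search": 0.002, "checkout": 0.01}
        return random.random() >= failure_odds[operation]

    N = 10_000
    operations = random.choices(list(PROFILE), weights=list(PROFILE.values()), k=N)
    failures = sum(not execute(op) for op in operations)
    print(f"estimated failure rate per operation: {failures / N:.4f}")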

Statistical Sampling

• We need a valid operational profile (model)
  – Sometimes from an older version of the system
  – Sometimes from the operational environment (e.g., for an embedded controller)
  – Sensitivity testing reveals which parameters are most important, and which can be rough guesses
• And a clear, precise definition of what is being measured
  – Failure rate? Per session, per hour, per operation?
• And many, many random samples
  – Especially for high reliability measures

Is Statistical Testing Worthwhile?

• Necessary for ...
  – Critical systems (safety critical, infrastructure, ...)
• But difficult or impossible when ...
  – Operational profile is unavailable or just a guess
    • Often for new functionality involving human interaction
    • But we may factor critical functions from overall use to obtain a good model of only the critical properties
  – Reliability is very high
    • Required sample size (number of test cases) might require years of test execution
    • Ultra-reliability can seldom be demonstrated by testing
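Why ultra-reliability defeats testing: a short calculation of how many failure-free runs are needed to demonstrate a failure-probability bound at a given confidence, using standard zero-failure reasoning; the target values are illustrative.

    import math

    # Zero-failure demonstration: to claim failure probability per run p
    # with confidence C, we need n failure-free runs where
    # (1 - p)^n <= 1 - C, i.e. n >= ln(1 - C) / ln(1 - p).
    def runs_needed(p, confidence):
        return math.ceil(math.log(1 - confidence) / math.log(1 - p))

    for p in (1e-3, 1e-6, 1e-9):
        print(f"p <= {p:g}: {runs_needed(p, 0.99):,} failure-free runs")
    # p <= 1e-9 needs ~4.6 billion runs; even at 100 runs per second that
    # is about a year and a half of continuous, failure-free execution.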


Process-based Measures

• Less rigorous than statistical testing
  – Based on similarity with prior projects
• System testing process
  – Expected history of bugs found and resolved
• Alpha, beta testing
  – Alpha testing: Real users, controlled environment
  – Beta testing: Real users, real (uncontrolled) environment
  – May statistically sample users rather than uses
  – Expected history of bug reports (see the sketch after the Usability slide)

Usability

• A usable product
  – is quickly learned
  – allows users to work efficiently
  – is pleasant to use
• Objective criteria
  – Time and number of operations to perform a task
  – Frequency of user error
    • blame user errors on the product!
• Plus overall, subjective satisfaction
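A minimal sketch of the "expected history of bug reports" measure from the Process-based Measures slide above: compare this release's weekly arrivals against a similar prior project. All counts are illustrative, not real data.

    prior   = [40, 32, 25, 18, 12, 8, 5]   # bugs/week, previous release
    current = [45, 38, 34, 30, 28]         # bugs/week so far, this release

    for week, (expected, observed) in enumerate(zip(prior, current), start=1):
        flag = "  <-- above expectation" if observed > 1.2 * expected else ""
        print(f"week {week}: expected ~{expected}, observed {observed}{flag}")
    # Arrivals that fail to decay like the prior project's suggest the
    # product is not converging toward release quality.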

Verifying Usability

• Usability rests ultimately on testing with real users: validation, not verification
  – Preferably in the usability lab, by usability experts
  – Guidelines applicable across a product line or domain
• But we can factor usability testing for process visibility: validation and verification throughout the project
  – Validation establishes criteria to be verified by testing, analysis, and inspection

Factoring

Validation (usability lab):
• Usability testing establishes usability check-lists
• Early usability testing evaluates "cardboard prototype" or mock-up
  – Produces interface design

Verification (developers, testers):
• Inspection applies usability check-lists to specification and design
• Behavior objectively verified (e.g., tested) against interface design


Varieties of Usability Test

• Exploratory testing
  – Investigate mental model of users
  – Performed early to guide interface design
• Comparison testing
  – Evaluate options (specific interface design choices)
  – Observe (and measure) interactions with alternative interaction patterns
• Usability validation testing
  – Assess overall usability (quantitative and qualitative)
  – Includes measurement: error rate, time to complete

Typical Usability Test Protocol

• Select representative sample of user groups
  – Typically 3-5 users from each of 1-4 groups
  – Questionnaires verify group membership
• Ask users to perform a representative sequence of tasks
• Observe without interference (no helping!)
  – The hardest thing for developers is to not help. Professional usability testers use one-way mirrors.
• Measure (clicks, eye movement, time, ...) and follow up with questionnaire
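A minimal sketch of the measurement step, assuming a hypothetical event log of (timestamp_seconds, event_kind) pairs captured during one session.

    session = [
        (0.0, "task_start"), (4.2, "click"), (9.8, "error"),
        (12.1, "click"), (17.5, "task_end"),
    ]

    # Derive the slide's objective criteria from the raw event log.
    clicks = sum(1 for _, kind in session if kind == "click")
    errors = sum(1 for _, kind in session if kind == "error")
    start = next(t for t, kind in session if kind == "task_start")
    end = next(t for t, kind in session if kind == "task_end")
    print(f"time on task: {end - start:.1f}s, clicks: {clicks}, errors: {errors}")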

Accessibility Testing

• Check usability by people with disabilities
  – Blind and low vision, deaf, color-blind, ...
• Use accessibility guidelines
  – Direct usability testing with all relevant groups is usually impractical; checking compliance to guidelines is practical and often reveals problems
• Example: W3C Web Content Accessibility Guidelines
  – Parts can be checked automatically (see the sketch after the next slide)
  – but a manual check is still required
    • e.g., is the "alt" tag of the image meaningful?

Regression

• Yesterday it worked, today it doesn't
  – I was fixing X, and accidentally broke Y
  – That bug was fixed, but now it's back
• Tests must be re-run after any change
  – Adding new features
  – Changing, adapting software to new conditions
  – Fixing other bugs
• Regression testing can be a major cost of software maintenance
  – Sometimes much more than making the change
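A minimal sketch of the automatable part of the WCAG alt-text check noted on the Accessibility Testing slide above: it flags <img> elements whose alt attribute is missing or an obvious placeholder. Whether surviving alt text is meaningful remains a manual judgment.

    from html.parser import HTMLParser

    class AltTextChecker(HTMLParser):
        """Flag <img> tags with missing or placeholder alt text."""
        def __init__(self):
            super().__init__()
            self.findings = []

        def handle_starttag(self, tag, attrs):
            if tag != "img":
                return
            alt = dict(attrs).get("alt")
            if alt is None or alt.strip().lower() in {"", "image", "img", "photo"}:
                self.findings.append(f"suspect alt text in <img {dict(attrs)}>")

    checker = AltTextChecker()
    checker.feed('<p><img src="logo.png"><img src="x.png" alt="image"></p>')
    print("\n".join(checker.findings))   # reports both images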


Basic Problems of Regression Testing

• Maintaining the test suite
  – If I change feature X, how many test cases must be revised because they use feature X?
  – Which test cases should be removed or replaced? Which test cases should be added?
• Cost of re-testing
  – Often proportional to product size, not change size
  – Big problem if testing requires manual effort
    • Possible problem even for automated testing, when the test suite and its execution time grow beyond a few hours

Test Maintenance

• Some maintenance is inevitable
  – If feature X has changed, test cases for feature X will require updating
• Some maintenance should be avoided
  – Example: Trivial changes to user interface or file format should not invalidate large numbers of test cases
• Test suites should be modular!
  – Avoid unnecessary dependence
  – Generating concrete test cases from test case specifications can help (see the sketch below)
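A minimal sketch of that generation step: abstract specifications stay stable while a small generator maps them onto whatever concrete input format the current version expects. All names and the payload format are illustrative.

    SPECS = [
        {"name": "empty_cart_checkout",  "items": [],      "expect": "reject"},
        {"name": "single_item_checkout", "items": ["A-1"], "expect": "accept"},
    ]

    def generate_concrete(spec):
        """Translate an abstract spec into today's concrete request format.
        If the format changes, only this function is rewritten; the
        specifications (and their maintenance cost) are untouched."""
        payload = {"cart": [{"sku": sku, "qty": 1} for sku in spec["items"]]}
        return spec["name"], payload, spec["expect"]

    for name, payload, expected in map(generate_concrete, SPECS):
        print(name, payload, "->", expected)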

Selecting and Prioritizing Regression Test Cases

• Should we re-run the whole regression test suite? If so, in what order?
  – Maybe you don't care. If you can re-run everything automatically over lunch break, do it.
  – Sometimes you do care ...
• Selection matters when
  – Test cases are expensive to execute
    • Because they require special equipment, or long run times, or cannot be fully automated
• Prioritization matters when
  – A very large test suite cannot be executed every day

Obsolete and Redundant Test Cases

• Obsolete: A test case that is no longer valid
  – Tests features that have been modified, substituted, or removed
  – Should be removed from the test suite
• Redundant: A test case that does not differ significantly from others
  – Unlikely to find a fault missed by similar test cases
  – Has some cost in re-execution
  – Has some (maybe more) cost in human effort to maintain
  – May or may not be removed, depending on costs (see the sketch below)
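One way to surface redundancy candidates is to compare coverage signatures; a minimal sketch with illustrative coverage data follows.

    # Flag as redundancy candidates the test cases whose coverage
    # signature (set of executed elements) duplicates an earlier case's.
    coverage = {
        "t1": frozenset({"f1", "f2"}),
        "t2": frozenset({"f1", "f2"}),   # same signature as t1
        "t3": frozenset({"f3"}),
    }

    seen = {}
    for test, signature in coverage.items():
        if signature in seen:
            print(f"{test} duplicates {seen[signature]}: redundancy candidate")
        else:
            seen[signature] = test
    # Identical coverage is only a heuristic; t2 may still use different
    # input values, so removal remains a cost/benefit judgment.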


Code-based Regression Test Selection

• Observation: A test case can't find a fault in code it doesn't execute
  – In a large system, many parts of the code are untouched by many test cases
• So: Only execute test cases that execute changed or new code

[Diagram: a test case is selected when the code it executes overlaps the new or changed code]

Control-flow and Data-flow Regression Test Selection

• Same basic idea as code-based selection
  – Re-run test cases only if they include changed elements
  – Elements may be modified control flow nodes and edges, or definition-use (DU) pairs in data flow
• To automate selection:
  – Tools record elements touched by each test case
    • Stored in a database of regression test cases
  – Tools note changes in the program
  – Check the test-case database for overlap (see the sketch below)
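A minimal sketch of that overlap check: a coverage tool records which elements each test case executes, and after a change only overlapping test cases are selected. The data is illustrative, and "elements" could equally be CFG nodes, edges, or DU pairs.

    executed_by = {
        "t1": {"parse.c:10", "parse.c:22"},
        "t2": {"eval.c:5"},
        "t3": {"parse.c:22", "eval.c:9"},
    }
    changed = {"parse.c:22"}   # taken from a diff of the new version

    # Select exactly the tests whose recorded elements touch the change.
    selected = [t for t, elements in executed_by.items() if elements & changed]
    print("re-run:", selected)   # ['t1', 't3']; t2 cannot reveal this change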

Specification-based Regression Test Selection

• Like code-based and structural regression test case selection
  – Pick test cases that test new and changed functionality
• Difference: No guarantee of independence
  – A test case that isn't "for" changed or added feature X might find a bug in feature X anyway
• Typical approach: Specification-based prioritization
  – Execute all test cases, but start with those related to changed and added features

Prioritized Rotating Selection

• Basic idea:
  – Execute all test cases, eventually
  – Execute some sooner than others
• Possible priority schemes:
  – Round robin: Priority to least-recently-run test cases
  – Track record: Priority to test cases that have detected faults before
    • They probably execute code with a high fault density
  – Structural: Priority for executing elements that have not been recently executed
    • Can be coarse-grained: Features, methods, files, ... (see the sketch below)
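A minimal sketch of one possible rotating priority, combining the track-record scheme (fault-detecting tests first) with round robin (least recently run first) as a tie-breaker; the records are illustrative.

    from datetime import date

    tests = [
        {"name": "t1", "faults_found": 0, "last_run": date(2024, 5, 1)},
        {"name": "t2", "faults_found": 3, "last_run": date(2024, 5, 3)},
        {"name": "t3", "faults_found": 0, "last_run": date(2024, 4, 20)},
    ]

    # Highest fault record first; ties broken by oldest last run.
    queue = sorted(tests, key=lambda t: (-t["faults_found"],
                                         t["last_run"].toordinal()))
    print([t["name"] for t in queue])   # ['t2', 't3', 't1']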


Summary

• System testing is verification
  – Is the system consistent with its specification?
  – Especially for global properties (performance, reliability)
• Acceptance testing is validation
  – Includes user testing and checks for usability
• Usability and accessibility require both
  – Usability testing establishes objective criteria to verify throughout development
• Regression testing is repeated after each change
  – After initial delivery, as software evolves
