TESTING STRATEGIES
M. Weintraub and F. Tip
Thanks go to Andreas Zeller for allowing incorporation of his materials.

TESTING
Testing: a procedure intended to establish the quality, performance, or reliability of something, esp. before it is taken into widespread use.
SOFTWARE TESTING
Software Testing is the process of exercising a program with the specific intent of finding errors prior to delivery to the end user.
TECHNIQUES FOR EVALUATING SOFTWARE
Testing (dynamic verification)
Program Analysis (static or dynamic)
Inspections (static verification)
Proofs (static verification)

THE CURSE OF TESTING
Dijkstra’s Curse: Testing can show the presence but not the absence of errors (optimistic inaccuracy).

(Diagram: a handful of test runs sampled from the infinitely many possible runs)

STATIC ANALYSIS
Zeller’s Corollary: Static analysis can confirm the absence but not the presence of errors (pessimistic inaccuracy).

We cannot tell whether this condition ever holds (halting problem), so static checking for matching lock/unlock is necessarily inaccurate:

    if ( .... ) {
        ...
        lock(S);
    }
    ...
    if ( ... ) {
        ...
        unlock(S);
    }
COMBINING METHODS
(Diagram: proofs cover abstractions of properties, test runs cover individual executions, and parts of the infinitely many possible runs remain unverified)

WHY IS SOFTWARE VERIFICATION HARD?
1. Many different quality requirements
2. Evolving (and deteriorating) structure
3. Inherent non-linearity
4. Uneven distribution of faults
If an elevator can safely carry a load of 1000 kg, it can also safely carry any smaller load
In contrast, if a procedure correctly sorts a set of 256 elements, it may fail on a set of 255 or 53 elements, as well as on 257 or 1023.
TRADE-OFFS
We can be inaccurate (optimistic or pessimistic)
or we can simplify properties…
but you cannot have it all!
RECALL THE WATERFALL MODEL (1968)
Communication: project initiation, requirements gathering
Planning: estimating, scheduling, tracking
Modeling: analysis, design
Construction: code, test
Deployment: delivery, support, feedback

TESTING FALLS TOWARD THE END…
WE TEND TO ASSUME OUR CODE IS DONE WHEN
We built it!
BUT, SHALL WE DEPLOY IT?
Is it really ready for release?
SO LET’S FOCUS ON THE CONSTRUCTION STEP
Waterfall Model (1968): Construction (code, test), followed by Deployment (delivery, support, feedback)

V & V: VALIDATION & VERIFICATION
Validation: Ensuring that software has been built according to customer requirements. Are we building the right product?
Verification: Ensuring that software correctly implements a specific function. Are we building the product right?
VALIDATION AND VERIFICATION
(Diagram: validation relates the system to the actual requirements; verification relates the system to the SW specs)

Validation: involves usability testing, user feedback, & product trials
Verification: includes testing, code inspections, static analysis, proofs
VALIDATION
if a user presses a request button at floor i, an available elevator must arrive at floor i soon
not verifiable, but may be validated
VERIFICATION
if a user presses a request button at floor i, an available elevator must arrive at floor i within 30 seconds
verifiable
CORE QUESTIONS
1. When does V&V start? When is it done?
2. Which techniques should be applied?
3. How do we know a product is ready?
4. How can we control the quality of successive releases?
5. How can we improve development?
WHEN DOES V&V START? WHEN IS IT DONE?

WATERFALL SEPARATES CODING AND TESTING INTO TWO ACTIVITIES
Code → Test

FIRST CODE, THEN TEST
1. Developers of the software should do no testing at all
2. Software should be “tossed over a wall” to strangers who will test it mercilessly
3. Testers should get involved with the project only when testing is about to begin
V MODEL
(V-model diagram relating development phases to test levels, e.g., module test)

From: Pezze + Young, “Software Testing and Analysis”

UNIT TESTS
Aims to uncover errors at module boundaries
Typically written by the programmer
Should be completely automatic (→ regression testing)
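For example, a complete unit test for a small module might look like this in JUnit 5. The Account class is a hypothetical example nested inside the test so the sketch is self-contained; once such tests run without manual steps, they double as regression tests.

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

class AccountTest {

    // Hypothetical module under test, nested here so the sketch compiles on its own.
    static class Account {
        private int balance;
        Account(int initialBalance) { this.balance = initialBalance; }
        void withdraw(int amount) {
            if (amount > balance) throw new IllegalArgumentException("insufficient funds");
            balance -= amount;
        }
        int balance() { return balance; }
    }

    @Test
    void withdrawReducesBalance() {
        Account account = new Account(100);
        account.withdraw(40);
        assertEquals(60, account.balance());
    }

    @Test
    void withdrawingMoreThanTheBalanceIsRejected() {
        Account account = new Account(100);
        assertThrows(IllegalArgumentException.class, () -> account.withdraw(200));
    }
}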
TESTING COMPONENTS: STUBS AND DRIVERS
A driver exercises a module’s functions
A stub simulates not-yet-ready modules
Frequently realized as mock objects

(Diagram: a driver on top calls the module under test, which calls stubs in place of its dependencies)
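A minimal sketch of both roles, assuming a hypothetical OrderProcessor that depends on a PaymentGateway module which is not yet ready: the JUnit test method acts as the driver, and the hand-written stub stands in for the missing module (a mock-object library such as Mockito could play the same role).

import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

class OrderProcessorTest {

    // Interface of the not-yet-ready module.
    interface PaymentGateway {
        boolean charge(String customerId, int amountCents);
    }

    // Module under test; it depends on the missing PaymentGateway.
    static class OrderProcessor {
        private final PaymentGateway gateway;
        OrderProcessor(PaymentGateway gateway) { this.gateway = gateway; }
        boolean placeOrder(String customerId, int amountCents) {
            return amountCents > 0 && gateway.charge(customerId, amountCents);
        }
    }

    // Stub: simulates the missing module with canned behavior.
    static class AlwaysApprovingGateway implements PaymentGateway {
        @Override
        public boolean charge(String customerId, int amountCents) {
            return true; // pretend every charge succeeds
        }
    }

    // Driver: exercises the module's functions through its public interface.
    @Test
    void orderIsPlacedWhenPaymentSucceeds() {
        OrderProcessor processor = new OrderProcessor(new AlwaysApprovingGateway());
        assertTrue(processor.placeOrder("customer-42", 1999));
    }
}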
PUTTING THE PIECES TOGETHER: INTEGRATION TESTS
General idea: Constructing software while conducting tests
Options:
1. Big Bang
2. Incremental Construction
BIG BANG APPROACH
All components are combined in advance
The entire program is tested as a whole
Chaos results!
For every failure, the entire program must be taken into account.
TOP-DOWN INTEGRATION
Top module is tested with stubs (and then used as a driver)
Stubs are replaced one at a time (“depth first”)
As new modules are integrated, tests are re-run
Allows for early demonstration of capability

(Diagram: top module A with stubs B, C, D below it)
BOTTOM-UP INTEGRATION
Bottom modules are implemented first and combined into clusters
Drivers are replaced one at a time
Removes the need for complex stubs

(Diagram: drivers on top of clusters containing modules C, D, E, F)
SANDWICH INTEGRATION
Combines bottom-up and top-down integration
Top modules are tested with stubs, bottom modules with drivers
Combines the best of the two approaches

(Diagram: top module A over stubs, drivers over the bottom clusters C, D, E, F)
ONE DIFFERENCE FROM UNIT TESTING: EMERGENT BEHAVIOR
Some behaviors are only clear when components are put together.
This has to be tested too, although it can be very hard to plan in advance!
(It’s almost always discovered after the fact…)
Causes test suites/cases to be refactored.
NEXT CONCERN: WHO OWNS RESPONSIBILITY FOR TESTING THE SOFTWARE?
Developer: understands the system, but may test gently if driven by delivery and may miss things because of familiarity.
Independent Tester: must learn about the system and faces a steep learning curve, but likely won’t be gentle if driven by quality.
WEINBERG’S LAW
A developer is unsuited to test his or her code.
From Gerald Weinberg, “The Psychology of Computer Programming”

Theory: As humans want to be honest with themselves, developers are blindfolded with respect to their own mistakes.
Cynical Theory: People are not entirely honest and can take shortcuts.
So, how might you change the conditions to avoid these pitfalls?
MAKE EVERYONE A TESTER!
✓ Experienced Outsiders and Clients: good for finding gaps missed by developers, especially domain-specific items
✓ Inexperienced Users: good for illuminating other, perhaps unintended uses/errors
✓ Mother Nature: crashes tend to happen during an important client/customer demo…
SPECIAL KINDS OF SYSTEM TESTING
Recovery Testing: Forces the software to fail in a variety of ways and verifies that recovery is properly performed
Security Testing: Verifies that protection mechanisms built into a system will, in fact, protect it from unauthorized access/disclosure/modification/destruction
Stress Testing: Executes a system in a manner that demands resources in abnormal quantity, frequency, or volume
Performance Testing: Tests the run-time performance of software within the context of an integrated system
PERFORMANCE TESTING
Measures a system’s capacity to process a specific load over a specific time-span, usually:
▪ number of concurrent users
▪ specific number of concurrent transactions
▪ other options: throughput, server response time, server request round-trip time, …
Involves defining and running operational profiles that reflect expected use
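As a rough illustration of such a profile (a fixed number of concurrent users, each issuing a fixed number of requests), here is a small Java sketch that measures throughput and worst-case response time. The handleRequest method is a placeholder assumption; a real performance test would drive the deployed system, typically with a dedicated load-testing tool.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SimpleLoadTest {

    // Placeholder for the operation under test, e.g., an HTTP call into the system.
    static void handleRequest() throws InterruptedException {
        Thread.sleep(20); // simulate 20 ms of server-side work
    }

    public static void main(String[] args) throws Exception {
        int concurrentUsers = 50;   // operational profile: concurrent users
        int requestsPerUser = 20;   // transactions issued by each user

        ExecutorService pool = Executors.newFixedThreadPool(concurrentUsers);
        List<Future<Long>> results = new ArrayList<>();

        long start = System.nanoTime();
        for (int u = 0; u < concurrentUsers; u++) {
            results.add(pool.submit((Callable<Long>) () -> {
                long worst = 0;
                for (int r = 0; r < requestsPerUser; r++) {
                    long t0 = System.nanoTime();
                    handleRequest();
                    worst = Math.max(worst, System.nanoTime() - t0);
                }
                return worst; // worst response time seen by this simulated user
            }));
        }

        long worstOverall = 0;
        for (Future<Long> f : results) {
            worstOverall = Math.max(worstOverall, f.get());
        }
        pool.shutdown();

        double elapsedSec = (System.nanoTime() - start) / 1e9;
        long totalRequests = (long) concurrentUsers * requestsPerUser;
        System.out.printf("Throughput: %.1f requests/s%n", totalRequests / elapsedSec);
        System.out.printf("Worst response time: %.1f ms%n", worstOverall / 1e6);
    }
}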
TYPES OF PERFORMANCE TESTING
Load: Aims to assess compliance with non-functional requirements
Stress: Identifies system capacity limits
Spike: Testing involving rapid swings in load
Endurance (aka Soak): Continuous operation at a given load
SECURITY TESTING
Confidentiality: Information protection from unauthorized access or disclosure
Integrity: Information protection from unauthorized modification or destruction
Availability: System protection from unauthorized disruption
ACCEPTANCE TESTING
Acceptance testing checks whether contractual requirements are met
Typically incremental: alpha test at the production (developer’s) site, beta test at the user’s site
Work is over when acceptance testing is done
WHICH TECHNIQUES SHOULD BE APPLIED?
TECHNIQUES FOR EVALUATING SOFTWARE

Testing (dynamic verification)
Program Analysis (static or dynamic)
Inspections (static verification)
Proofs (static verification)
HOW DO WE KNOW WHEN A PRODUCT IS READY?
Let the customer test it :-)
We’re out of time…
Relative to a theoretically sound and experimentally validated statistical model, we have done sufficient testing to say with 95% confidence that the probability of 1,000 CPU hours of failure-free operation is ≥ 0.995.
HOW CAN WE CONTROL THE QUALITY OF SUCCESSIVE RELEASES?
REGRESSION TESTS
Set up automated tests using, e.g., JUnit
Ideally, run regression tests after each change
If running the tests takes too long:
▪ Prioritize and run a subset (see the sketch below)
▪ Apply regression test selection to determine which tests are impacted by a set of changes
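One common way to support a prioritized subset is to tag tests and filter on the tags when running the suite, as in this JUnit 5 sketch; the class, method, and tag names are illustrative, and the nested Pricing class only keeps the example self-contained.

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

class PricingRegressionTest {

    @Test
    @Tag("smoke")    // fast, high-priority subset run after every change
    void basePriceIsUnchanged() {
        assertEquals(1000, Pricing.basePriceCents("standard"));
    }

    @Test
    @Tag("nightly")  // slower or lower-priority checks run less frequently
    void discountRulesStillApply() {
        assertEquals(900, Pricing.priceWithDiscountCents("standard", 10));
    }

    // Hypothetical module under regression test.
    static class Pricing {
        static int basePriceCents(String plan) { return "standard".equals(plan) ? 1000 : 0; }
        static int priceWithDiscountCents(String plan, int percent) {
            return basePriceCents(plan) * (100 - percent) / 100;
        }
    }
}

The build tool can then select the subset, for example via Maven Surefire’s -Dgroups=smoke or Gradle’s includeTags("smoke").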
COLLECTING DATA
REMEMBER PARETO’S LAW
Approximately 80% of defects come from 20% of modules
HOW CAN WE IMPROVE DEVELOPMENT?
STRATEGIC ISSUES
▪ Specify requirements in a quantifiable manner
▪ State testing objectives explicitly
▪ Understand the users of the software and develop a profile for each user category
▪ Develop a testing plan that emphasizes “rapid cycle testing”
▪ Build “robust” software that is designed to test itself
▪ Use effective formal technical reviews as a filter prior to testing
▪ Conduct formal technical reviews to assess the test strategy and test cases themselves
▪ Develop a continuous improvement approach for the testing process
TEST-DRIVEN DEVELOPMENT
writing tests before you implement functionality involves extra effort, but…
… it forces you to think about the problem you are trying to solve more concretely and formulate a solution more quickly
…and you will regain the time spent on unit tests by catching problems early and reduce time spent later on debugging
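A minimal sketch of that rhythm, using a hypothetical leap-year utility: the test is written first (and fails), then just enough code is added to make it pass before refactoring.

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

class LeapYearTest {

    // Step 1: write the test first; it fails until isLeapYear is implemented.
    @Test
    void centuriesAreLeapYearsOnlyWhenDivisibleBy400() {
        assertTrue(LeapYear.isLeapYear(2000));
        assertFalse(LeapYear.isLeapYear(1900));
        assertTrue(LeapYear.isLeapYear(2024));
        assertFalse(LeapYear.isLeapYear(2023));
    }

    // Step 2: add just enough implementation to make the test pass, then refactor.
    static class LeapYear {
        static boolean isLeapYear(int year) {
            return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
        }
    }
}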
TESTING: RECOMMENDATIONS
1. Write tests that cover a partition of the input space, and that cover specific features (see the sketch after this list)
2. Achieve good code coverage
3. Create an automated, fast-running test suite, and use it all the time
4. Have tests that cover your system at different levels of functionality
5. Set up your tests so that, when a failure occurs, it pinpoints the issue without requiring much further debugging
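For recommendation 1, a parameterized JUnit 5 test makes the input partitions explicit. The classify function below is a hypothetical example: its input space splits into negative, zero, and positive integers, and the test covers each partition plus the boundary values.

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

class SignClassifierTest {

    // Hypothetical function under test.
    static String classify(int n) {
        if (n < 0) return "negative";
        if (n == 0) return "zero";
        return "positive";
    }

    // One representative from each partition of the input space, plus boundary values.
    @ParameterizedTest
    @CsvSource({
            "-2147483648,negative", // boundary: Integer.MIN_VALUE
            "-1,negative",
            "0,zero",
            "1,positive",
            "2147483647,positive"   // boundary: Integer.MAX_VALUE
    })
    void classifiesEachPartition(int input, String expected) {
        assertEquals(expected, classify(input));
    }
}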