6.005 Elements of Software Construction Fall 2008

MIT OpenCourseWare http://ocw.mit.edu 6.005 Elements of Software Construction Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 10/6/2008 Today’s Topics why testing is important and hard choosing inputs ¾input space partitioning ¾boundary testing how to know if you’ve done enough Testing and Coverage ¾coverage testing pragmatics Rob Miller ¾stubs, drivers, oracles Fall 2008 ¾test-first development ¾regression testing © Robert Miller 2008 © Robert Miller 2008 Real Programmers Don’t Test!(?) top 5 reasons not to test 5) I want to get this done fast – testing is going to slow me down. 4) I started programming when I was 2. Don’t insult me by testing my perfect code! 3) testing is for incompetent programmers who cannot hack. 2) we’re not Harvard students – our code actually works! 1) “Most of the functions in Graph.java, as implemented, are one or two line functions that rely solely upon functions in HashMap or HashSet. I am assuming that these functions work perfectly, and thus there is really no need to test them.” – an excerpt from a 6.170 student’s e-mail WHY TESTING MATTERS © Robert Miller 2008 © Robert Miller 2008 1 10/6/2008 Who Says Software is Buggy? Another Prominent Software Bug Ariane 5 self-destructed 37 seconds after launch Mars Polar Lander crashed Photographs of the Ariane 5 rocket Diagrams of the Mars Polar Lander removed due to copyright restrictions. removed due to copyright restrictions. reason: a control software bug that went undetected ¾conversion from 64-bit floating point to 16-bit signed integer caused an ¾sensor signal falsely indicated that the craft had touched down when it was exception still 130 feet above the surface. • because the value was larger than 32767 (max 16-bit signed integer) ¾the descent engines shut down prematurely... and it was never heard from ¾but the exception handler had been disabled for efficiency reasons again ¾software crashed ... rocket crashed ... total cost over $1 billion the error was traced to a single bad line of code ¾Prof. Nancy Leveson: these problems "are well known as difficult parts of the software-engineering process”... and yet we still can’t get them right © Robert Miller 2008 © Robert Miller 2008 The Challenge Testing Strategies That Don’t Work we want to exhaustive testing is infeasible ¾know when product is stable enough to launch ¾space is generally too big to cover exhaustively ¾deliver product with known failure rate (preferably low) ¾imagine exhaustively testing a 32-bit floating-point multiply operation, a*b ¾offer warranty? • there are 2^64 test cases! but statistical testing doesn’t work for software ¾it’s very hard to measure or ensure quality in software ¾other engineering disciplines can test small random samples (e.g. 1% of ¾residual defect rate after shipping: hard drives manufactured) and infer defect rate for whole lot • 1 - 10 defects/kloc (typical) ¾many tricks to speed up time (e.g. opening a refrigerator 1000 times in 24 • 0.1 - 1 defects/kloc (high quality: Java libraries?) hours instead of 10 years) • 0. 01 - 0. 1 defects/kloc (very best: Praxis, NASA) ¾gives known failure rates (e.g. mean lifetime of a hard drive) ¾example: 1Mloc with 1 defect/kloc means you missed 1000 bugs! ¾but assumes continuity or uniformity across the space of defects, which is true for physical artifacts ¾this is not true for software • overflow bugs (like Ariane 5) happen abruptly • Pentium division bug affected approximately 1 in 9 billion divisions © Robert Miller 2008 © Robert Miller 2008 2 10/6/2008 Two Problems Aims of Testing often confused, but very different what are we trying to do? (a) problem of finding bugs in defective code ¾find bugs as cheaply and quickly as possible (b) problem of showing absence of bugs in good code reality vs. ideal approaches ¾idea lly, choose one test case that exposes a bug and run iit ¾testing: good for (a), occasionally (b) ¾in practice, have to run many test cases that “fail” (because they don’t ¾reasoning: good for (a), also (b) expose any bugs) theory and practice in practice, conflicting desiderata ¾for both, you need grasp of basic theory ¾increase chance of finding bug ¾good engineering judgment essential too ¾decrease cost of test suite (cost to generate, cost to run) © Robert Miller 2008 © Robert Miller 2008 Practical Strategies Basic Notions design testing strategy carefully what’s being tested? ¾know what it’s good for (finding egregious bugs) and not good for ¾unit testing: individual module (method, class, interface) (security) ¾subsystem testing: entire subsystems ¾complement with other methods: code review, reasoning, static analysis ¾integration, system, acceptance testing: whole system ¾exploit automation (e.g. JUnit) to increase coverage and frequency of how are inputs chosen? testing ¾random: surprisingly effective (in defects found per test case), but not ¾do it early and often much use when most inputs are invalid (e.g. URLs) ¾systematic: partitioning large input space into a few representatives ¾arbitrary: not a good idea, and not the same as random! how are outputs checked? ¾automatic checking is preferable, but sometimes hard (how to check the display of a graphical user interface?) © Robert Miller 2008 © Robert Miller 2008 3 10/6/2008 Basic Notions how good is the test suite? ¾coverage: how much of the specification or code is exercised by tests? when is testing done? ¾test-didriven deve lopment: tests are written first, bfbefore the code ¾regression testing: a new test is added for every discovered bug, and tests are run after every change to the code essential characteristics of tests ¾modularity: no dependence of test driver on internals of unit being tested ¾automation: must be able to run (and check results) without manual effort CHOOSING TESTS © Robert Miller 2008 © Robert Miller 2008 Example: Thermostat How Do We Test the Thermostat? specification arbitrary testing is not convincing ¾user sets the desired temperature Td ¾“just try it and see if it works“ won’t fly ¾thermostat measures the ambient temperature Ta exhaustive testing is not feasible ¾want heating if desired temp is higher than ambient temp ¾would require millions of runs to test all possible (TdT(Td,Ta ) pairs ¾want cooling if desired temp is lower than ambient temp key problem: choosing a test suite systematically ¾small enough to run quickly if Td > Ta, turn on heating ¾large enough to validate the program convincingly if Td < Ta, turn on air-conditioning if Td = Ta, turn everything off © Robert Miller 2008 © Robert Miller 2008 4 10/6/2008 Key Idea: Partition the Input Space More Examples input space is very large, but program is small java.math.BigInteger.multiply(BigInteger val) ¾so behavior must be the “same” for whole sets of inputs ¾has two arguments, this and val, drawn from BigInteger ideal test suite ¾partition BigInteger into: approach 1: ¾ident ify sets of inputs wiihth the same bbhehavi or • BigNeg, SmallNeg, -1, 0, 1, SmallPos, BigPos partition itinputs ¾try one input from each set ¾pick a value from each class separately, • -265, -9 -1, 0, 1, 9, 265 then form all ¾test the 7 × 7 = 49 combinations combinations static int java.Math.max(int a, int b) ¾partition into: approach 2: if Td > Ta, turn on heating • a < b, a = b, a > b partition the whlhole if Td < Ta, turn on air-conditioning ¾pick value from each class input space (useful if Td = Ta, turn everything off • (1, 2), (1, 1), (2, 1) when relationship between inputs matters) © Robert Miller 2008 © Robert Miller 2008 More Examples Boundary Testing Set.intersect(Set that) ¾include classes at boundaries of the input space • zero, min/max values, empty set, empty string, null ¾partition Set into: use both approaches • ∅, singleton, many ¾why? because bugs often occur at boundaries ¾partition whole input space into: • off-by-one errors • this = that, this ⊆ that, this ⊇ that, this ∩ that ≠ ∅, this ∩ that = ∅ • forget to handle empty container • overflow errors in arithmetic ¾pppick values that cover both partitions • {},{} {},{2} {},{2,3,4} • {5},{} {5},{2} {4},{2,3,4} • {2,3},{} {2,3},{2} {1,2},{2,3} © Robert Miller 2008 © Robert Miller 2008 5 10/6/2008 Exercise recall our quiz grammar ¾partition the input space of quizzes ¾devise a set of test quizzes Option ::= Value? Text Value ::= [ digit+ ] Text ::= char* Rule ::= Range Message Range ::= digit+ - digit+ : Message ::= char* COVERAGE ¾what important class of inputs are we leaving out? © Robert Miller 2008 © Robert Miller 2008 Coverage State Diagram for Thermostat how good are my tests? ¾measure extent to which tests ‘cover’ the spec or code if Td > Ta, turn on heating spec coverage for state machines if Td < Ta, turn on air-conditioning if Td = Ta, turn everything off state machine being tested all actions ¾a test case is a trace of (Td,Ta) pairs ¾all actions: (Td<Ta), (Td=Ta), (Td>Ta) all states • e.g., using actual temperatures: (67, 70), (67, 67), (70, 67) kinds of ¾all states: the same trace would cover all states coverage all transitions ¾all transitions: (Td<Ta), (Td=Ta), (Td > Ta), (Td=Ta) • e.g. (67, 70), (67, 67), (70, 67), (70, 70) all paths ¾all-actions, all-states ≤ all-transitions ≤ all-paths © Robert Miller 2008 © Robert Miller 2008 6 10/6/2008 Code Coverage How Far Should You Go? view control flow graph as state machine for spec coverage ¾and then apply state machine coverage notions ¾ all-actions: essential example ¾ all-states, all-transitions: if possible if (x < 10) x++; ¾ all-paths: generally infeasible, even if finite for code coverage ¾ all-statements, all-branches: if possible ¾ all-paths: infeasible industry practice ¾ all-statements is common goal, rarely achieved (due to unreachable code) ¾safe ty critica l in dus try has more arduous criter ia (eg, “MCDC”, modifie d decision/condition coverage) © Robert Miller 2008 © Robert Miller 2008 A Typical Statement Coverage Tool Black Box vs.

6.005 Elements of Software Construction Fall 2008

Testing in the Software Development Life Cycle and Its Importance

Definition of Software Construction Artefacts for Software Construction

A Parallel Program Execution Model Supporting Modular Software Construction

Object-Oriented Software Construction Oriented Software Construction Software Construction I Software Construction II

6.005 Elements of Software Construction Fall 2008

Integration Testing of Object-Oriented Software

Effective Methods for Software Testing

A Survey of System Development Process Models

Decision Framework for Selecting a Suitable Software Development Process

Continual and Long-Term Improvement in Agile Software Development

Software Construction

Lean Thinking in Software Engineering : a Systematic Review