6.005 Elements of Software Construction Fall 2008
Total Page:16
File Type:pdf, Size:1020Kb
MIT OpenCourseWare http://ocw.mit.edu 6.005 Elements of Software Construction Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 10/6/2008 Today’s Topics why testing is important and hard choosing inputs ¾input space partitioning ¾boundary testing how to know if you’ve done enough Testing and Coverage ¾coverage testing pragmatics Rob Miller ¾stubs, drivers, oracles Fall 2008 ¾test-first development ¾regression testing © Robert Miller 2008 © Robert Miller 2008 Real Programmers Don’t Test!(?) top 5 reasons not to test 5) I want to get this done fast – testing is going to slow me down. 4) I started programming when I was 2. Don’t insult me by testing my perfect code! 3) testing is for incompetent programmers who cannot hack. 2) we’re not Harvard students – our code actually works! 1) “Most of the functions in Graph.java, as implemented, are one or two line functions that rely solely upon functions in HashMap or HashSet. I am assuming that these functions work perfectly, and thus there is really no need to test them.” – an excerpt from a 6.170 student’s e-mail WHY TESTING MATTERS © Robert Miller 2008 © Robert Miller 2008 1 10/6/2008 Who Says Software is Buggy? Another Prominent Software Bug Ariane 5 self-destructed 37 seconds after launch Mars Polar Lander crashed Photographs of the Ariane 5 rocket Diagrams of the Mars Polar Lander removed due to copyright restrictions. removed due to copyright restrictions. reason: a control software bug that went undetected ¾conversion from 64-bit floating point to 16-bit signed integer caused an ¾sensor signal falsely indicated that the craft had touched down when it was exception still 130 feet above the surface. • because the value was larger than 32767 (max 16-bit signed integer) ¾the descent engines shut down prematurely... and it was never heard from ¾but the exception handler had been disabled for efficiency reasons again ¾software crashed ... rocket crashed ... total cost over $1 billion the error was traced to a single bad line of code ¾Prof. Nancy Leveson: these problems "are well known as difficult parts of the software-engineering process”... and yet we still can’t get them right © Robert Miller 2008 © Robert Miller 2008 The Challenge Testing Strategies That Don’t Work we want to exhaustive testing is infeasible ¾know when product is stable enough to launch ¾space is generally too big to cover exhaustively ¾deliver product with known failure rate (preferably low) ¾imagine exhaustively testing a 32-bit floating-point multiply operation, a*b ¾offer warranty? • there are 2^64 test cases! but statistical testing doesn’t work for software ¾it’s very hard to measure or ensure quality in software ¾other engineering disciplines can test small random samples (e.g. 1% of ¾residual defect rate after shipping: hard drives manufactured) and infer defect rate for whole lot • 1 - 10 defects/kloc (typical) ¾many tricks to speed up time (e.g. opening a refrigerator 1000 times in 24 • 0.1 - 1 defects/kloc (high quality: Java libraries?) hours instead of 10 years) • 0. 01 - 0. 1 defects/kloc (very best: Praxis, NASA) ¾gives known failure rates (e.g. mean lifetime of a hard drive) ¾example: 1Mloc with 1 defect/kloc means you missed 1000 bugs! ¾but assumes continuity or uniformity across the space of defects, which is true for physical artifacts ¾this is not true for software • overflow bugs (like Ariane 5) happen abruptly • Pentium division bug affected approximately 1 in 9 billion divisions © Robert Miller 2008 © Robert Miller 2008 2 10/6/2008 Two Problems Aims of Testing often confused, but very different what are we trying to do? (a) problem of finding bugs in defective code ¾find bugs as cheaply and quickly as possible (b) problem of showing absence of bugs in good code reality vs. ideal approaches ¾idea lly, choose one test case that exposes a bug and run iit ¾testing: good for (a), occasionally (b) ¾in practice, have to run many test cases that “fail” (because they don’t ¾reasoning: good for (a), also (b) expose any bugs) theory and practice in practice, conflicting desiderata ¾for both, you need grasp of basic theory ¾increase chance of finding bug ¾good engineering judgment essential too ¾decrease cost of test suite (cost to generate, cost to run) © Robert Miller 2008 © Robert Miller 2008 Practical Strategies Basic Notions design testing strategy carefully what’s being tested? ¾know what it’s good for (finding egregious bugs) and not good for ¾unit testing: individual module (method, class, interface) (security) ¾subsystem testing: entire subsystems ¾complement with other methods: code review, reasoning, static analysis ¾integration, system, acceptance testing: whole system ¾exploit automation (e.g. JUnit) to increase coverage and frequency of how are inputs chosen? testing ¾random: surprisingly effective (in defects found per test case), but not ¾do it early and often much use when most inputs are invalid (e.g. URLs) ¾systematic: partitioning large input space into a few representatives ¾arbitrary: not a good idea, and not the same as random! how are outputs checked? ¾automatic checking is preferable, but sometimes hard (how to check the display of a graphical user interface?) © Robert Miller 2008 © Robert Miller 2008 3 10/6/2008 Basic Notions how good is the test suite? ¾coverage: how much of the specification or code is exercised by tests? when is testing done? ¾test-didriven deve lopment: tests are written first, bfbefore the code ¾regression testing: a new test is added for every discovered bug, and tests are run after every change to the code essential characteristics of tests ¾modularity: no dependence of test driver on internals of unit being tested ¾automation: must be able to run (and check results) without manual effort CHOOSING TESTS © Robert Miller 2008 © Robert Miller 2008 Example: Thermostat How Do We Test the Thermostat? specification arbitrary testing is not convincing ¾user sets the desired temperature Td ¾“just try it and see if it works“ won’t fly ¾thermostat measures the ambient temperature Ta exhaustive testing is not feasible ¾want heating if desired temp is higher than ambient temp ¾would require millions of runs to test all possible (TdT(Td,Ta ) pairs ¾want cooling if desired temp is lower than ambient temp key problem: choosing a test suite systematically ¾small enough to run quickly if Td > Ta, turn on heating ¾large enough to validate the program convincingly if Td < Ta, turn on air-conditioning if Td = Ta, turn everything off © Robert Miller 2008 © Robert Miller 2008 4 10/6/2008 Key Idea: Partition the Input Space More Examples input space is very large, but program is small java.math.BigInteger.multiply(BigInteger val) ¾so behavior must be the “same” for whole sets of inputs ¾has two arguments, this and val, drawn from BigInteger ideal test suite ¾partition BigInteger into: approach 1: ¾ident ify sets of inputs wiihth the same bbhehavi or • BigNeg, SmallNeg, -1, 0, 1, SmallPos, BigPos partition itinputs ¾try one input from each set ¾pick a value from each class separately, • -265, -9 -1, 0, 1, 9, 265 then form all ¾test the 7 × 7 = 49 combinations combinations static int java.Math.max(int a, int b) ¾partition into: approach 2: if Td > Ta, turn on heating • a < b, a = b, a > b partition the whlhole if Td < Ta, turn on air-conditioning ¾pick value from each class input space (useful if Td = Ta, turn everything off • (1, 2), (1, 1), (2, 1) when relationship between inputs matters) © Robert Miller 2008 © Robert Miller 2008 More Examples Boundary Testing Set.intersect(Set that) ¾include classes at boundaries of the input space • zero, min/max values, empty set, empty string, null ¾partition Set into: use both approaches • ∅, singleton, many ¾why? because bugs often occur at boundaries ¾partition whole input space into: • off-by-one errors • this = that, this ⊆ that, this ⊇ that, this ∩ that ≠ ∅, this ∩ that = ∅ • forget to handle empty container • overflow errors in arithmetic ¾pppick values that cover both partitions • {},{} {},{2} {},{2,3,4} • {5},{} {5},{2} {4},{2,3,4} • {2,3},{} {2,3},{2} {1,2},{2,3} © Robert Miller 2008 © Robert Miller 2008 5 10/6/2008 Exercise recall our quiz grammar ¾partition the input space of quizzes ¾devise a set of test quizzes Option ::= Value? Text Value ::= [ digit+ ] Text ::= char* Rule ::= Range Message Range ::= digit+ - digit+ : Message ::= char* COVERAGE ¾what important class of inputs are we leaving out? © Robert Miller 2008 © Robert Miller 2008 Coverage State Diagram for Thermostat how good are my tests? ¾measure extent to which tests ‘cover’ the spec or code if Td > Ta, turn on heating spec coverage for state machines if Td < Ta, turn on air-conditioning if Td = Ta, turn everything off state machine being tested all actions ¾a test case is a trace of (Td,Ta) pairs ¾all actions: (Td<Ta), (Td=Ta), (Td>Ta) all states • e.g., using actual temperatures: (67, 70), (67, 67), (70, 67) kinds of ¾all states: the same trace would cover all states coverage all transitions ¾all transitions: (Td<Ta), (Td=Ta), (Td > Ta), (Td=Ta) • e.g. (67, 70), (67, 67), (70, 67), (70, 70) all paths ¾all-actions, all-states ≤ all-transitions ≤ all-paths © Robert Miller 2008 © Robert Miller 2008 6 10/6/2008 Code Coverage How Far Should You Go? view control flow graph as state machine for spec coverage ¾and then apply state machine coverage notions ¾ all-actions: essential example ¾ all-states, all-transitions: if possible if (x < 10) x++; ¾ all-paths: generally infeasible, even if finite for code coverage ¾ all-statements, all-branches: if possible ¾ all-paths: infeasible industry practice ¾ all-statements is common goal, rarely achieved (due to unreachable code) ¾safe ty critica l in dus try has more arduous criter ia (eg, “MCDC”, modifie d decision/condition coverage) © Robert Miller 2008 © Robert Miller 2008 A Typical Statement Coverage Tool Black Box vs.