Software Testing Overview: Part I
Dr. Andrea Arcuri
Simula Research Laboratory, Oslo, Norway
[email protected]
Based on slides provided by Prof. Lionel Briand
Lionel Briand 2009
Software has become prevalent in all aspects of our lives.

Qualities of Software Products
• Correctness
• Reliability
• Robustness
• Performance
• User Friendliness
• Verifiability
• Maintainability
• Repairability
• Evolvability
• Reusability
• Portability
• Understandability
• Interoperability
Pervasive Problems

• Software is commonly delivered late, way over budget, and of unsatisfactory quality
• Software validation and verification are rarely systematic and are usually not based on sound, well-defined techniques
• Software development processes are commonly unstable and uncontrolled
• Software quality is poorly measured, monitored, and controlled
• Software failure examples: http://www.cs.bc.edu/~gtan/bug/softwarebug.html

Examples of Software Failures

• Communications: loss or corruption of communication media, non-delivery of data
• Space Applications: lost lives, launch delays, e.g., the European Ariane 5 launcher, 1996:
  – From the official disaster report: "Due to a malfunction in the control software, the rocket veered off its flight path 37 seconds after launch."
• Defense and Warfare: misidentification of friend or foe
• Transportation: deaths, delays, sudden acceleration, inability to brake
• Electric Power: deaths, injuries, power outages, long-term health hazards (radiation)
Examples of Software Failures (cont.)

• Money Management: fraud, violation of privacy, shutdown of stock exchanges and banks, negative interest rates
• Control of Elections: wrong results (intentional or unintentional)
• Control of Jails: technology-aided escape attempts and successes, failures in software-controlled locks
• Law Enforcement: false arrests and imprisonments

Ariane 5 – ESA

On June 4, 1996, the flight of the Ariane 5 launcher ended in failure. Only about 40 seconds after initiation of the flight sequence, at an altitude of about 3,700 m, the launcher veered off its flight path, broke up, and exploded.

Ariane 5 – Root Cause

• Source: ARIANE 5 Flight 501 Failure, Report by the Inquiry Board
• A program segment for converting a floating-point number to a signed 16-bit integer was executed with an input data value outside the range representable by a signed 16-bit integer.
• This run-time error (out of range, overflow), which arose in both the active and the backup computers at about the same time, was detected, and both computers shut themselves down.
• This resulted in the total loss of attitude control. The Ariane 5 turned uncontrollably and aerodynamic forces broke the vehicle apart. This breakup was detected by an on-board monitor, which ignited the explosive charges to destroy the vehicle in the air.
• Ironically, the result of this format conversion was no longer needed after lift-off.

Ariane 5 – Lessons Learned

• Adequate exception handling and redundancy strategies (real function of a backup system, degraded modes?)
• Clear, complete, documented specifications (e.g., preconditions, post-conditions)
• But perhaps more importantly: usage-based testing (based on operational profiles), in this case actual Ariane 5 trajectories
• Note this was not a complex computing problem, but a deficiency of the software engineering practices in place
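The failing conversion can be illustrated with a minimal sketch in Python (the actual Ariane code was Ada; the function names and the saturating variant below are invented for illustration, not taken from the report):

```python
# Illustrative sketch, not the actual Ariane code: converting a floating-point
# value to a signed 16-bit integer fails for values outside [-32768, 32767].
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1

def to_int16(value: float) -> int:
    """Unprotected conversion: raises on overflow, much as the unhandled
    operand error shut down both Ariane computers."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} is not representable in 16 bits")
    return result

def to_int16_saturating(value: float) -> int:
    """One defensive alternative: clamp to the representable range
    instead of shutting the computer down."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

In flight, the converted value grew beyond the 16-bit range; with no handler for the overflow, both the active and backup computers stopped, which is the scenario the exception-handling lesson above addresses.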
F-18 Crash

• An F-18 crashed because of a missing exception condition: an if...then... block without the else clause, for a case that was thought could not possibly arise.
• In simulation, an F-16 program bug caused the virtual plane to flip over whenever it crossed the equator, as a result of a missing minus sign to indicate south latitude.

Fatal Therac-25 Radiation

• In 1986, a man in Texas received between 16,500 and 25,000 rads in less than 10 seconds, over an area of about 1 cm.
• He lost his left arm, and died of complications 5 months later.
Power Shutdown in 2003

• 508 generating units and 256 power plants shut down
• Affected 10 million people in Ontario, Canada
• Affected 40 million people in 8 US states
• Financial losses of $6 billion USD
• The alarm system in the energy management system failed due to a software error, and operators were not informed of the power overload in the system

Consequences of Poor Quality

• The Standish Group surveyed 350 companies, over 8,000 projects, in 1994
• 31% of projects were cancelled before completion; only 9–16% were delivered within cost and budget
• US study (1995): $81 billion US spent per year on failing software development projects
• NIST study (2002): bugs cost $59.5 billion a year; earlier detection could save $22 billion
Quality Assurance

• Uncover faults in the documents where they are introduced, in a systematic way, in order to avoid ripple effects. Systematic, structured reviews of software documents are referred to as inspections.
• Derive, in a systematic way, effective test cases to uncover faults
• Automate testing and inspection activities, to the maximum extent possible
• Monitor and control quality, e.g., reliability, maintainability, safety, across all project phases and activities
• All this implies the quality measurement of SW products and processes

Dealing with SW Faults

Fault handling techniques form a taxonomy:
• Fault Avoidance: design methodology, configuration management, verification
• Fault Detection: inspections, testing, debugging
  – Testing: component testing, integration testing, system testing
  – Debugging: correctness debugging, performance debugging
• Fault Tolerance: atomic transactions, modular redundancy
Testing Definition

• SW Testing: techniques to execute programs with the intent of finding as many defects as possible and/or gaining sufficient confidence in the software
• "Program testing can show the presence of bugs, never their absence" (Dijkstra)

Basic Testing Definitions

• Error: people commit errors
• Fault: a fault is the result of an error in the software documentation, code, etc.
• Failure: a failure occurs when a fault executes in the software system under test
• Incident: the consequences of failures; a failure occurrence may or may not be apparent to the user
• Many people use the terms error, fault, and failure interchangeably; this should be avoided
• The fundamental chain of SW dependability threats:
Error -(causation)-> Fault -(propagation)-> Failure -(results in)-> Incident -> ...
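A hypothetical two-line program makes the chain concrete (the function and its specification are invented for illustration): a programmer's error leaves a fault in the code, and the fault propagates to a visible failure only when executed with an input that exposes it.

```python
def is_adult(age: int) -> bool:
    # Fault: the programmer's error produced `>` where the (hypothetical)
    # specification requires `>=` ("18 and older are adults").
    return age > 18

# The faulty code executes on every call, but most inputs hide the fault:
print(is_adult(30))   # True, as specified -- no failure observed
# A boundary input propagates the fault into a visible failure:
print(is_adult(18))   # False, but the specification requires True
```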
Why is SW testing important?
• According to some estimates: ~50% of development costs
• A study by (the American) NIST in 2002:
  – The annual national cost of inadequate testing is as much as $59 billion US!
  – The report is titled: "The Economic Impacts of Inadequate Infrastructure for Software Testing"

Testing: Definitions & Objectives
Test Stubs and Drivers

• Test Stub: a partial implementation of a component on which a unit under test depends.
• Test Driver: a partial implementation of a component that depends on a unit under test.
• Test stubs and drivers enable components to be isolated from the rest of the system for testing.

Summary of Definitions

• A test suite contains one or more (1..n) test cases and exercises components.
• A test case finds failures; a failure is caused by a fault; a fault is caused by an error.
• A correction repairs faults; the component under test is revised by corrections.
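Both roles can be sketched in Python (all names below are invented for illustration): the stub stands in for a component the unit depends on, while the driver calls the unit and checks its result.

```python
class PricingServiceStub:
    """Test stub: partial implementation of a component the unit under test
    depends on, returning canned values instead of querying a real service."""
    def price_of(self, item: str) -> float:
        return {"apple": 1.0, "bread": 2.5}.get(item, 0.0)

def order_total(items, pricing_service) -> float:
    """The unit under test: depends on a pricing component."""
    return sum(pricing_service.price_of(i) for i in items)

def test_driver():
    """Test driver: partial implementation of a component that depends on
    the unit under test, calling it and checking the result."""
    total = order_total(["apple", "bread"], PricingServiceStub())
    assert total == 3.5
    return total
```

Replacing the real pricing service with the stub is what lets `order_total` be tested in isolation from the rest of the system.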
Motivations

• No matter how rigorous we are, software is going to be faulty
• Testing represents a substantial percentage of software development costs and time to market
• It is impossible to test under all operating conditions: based on incomplete testing, we must gain confidence that the system has the desired behavior
• Testing large systems is complex: it requires strategy and technology, and is often done inefficiently in practice

The Testing Dilemma

• Available testing resources are limited: time, people, money, expertise
• Yet all the software system functionality is to be tested: potentially thousands of items, among which the faulty functionality hides
Testing Process Overview

• From a SW representation (e.g., models, requirements), derive test cases and estimate the expected results (the test oracle)
• Execute the test cases on the SW code and collect the test results
• Compare each test result against the oracle: [Test Result == Oracle] or [Test Result != Oracle]

Qualities of Testing

• Effective at uncovering faults
• Helps locate faults for debugging
• Repeatable, so that a precise understanding of the fault can be gained
• Automated, so as to lower the cost and timescale
• Systematic, so as to be predictable in terms of its effect on dependability
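The derive/execute/compare loop above can be sketched as follows (a minimal illustration; the program under test, the test cases, and the oracle values are all invented):

```python
def program_under_test(x: int) -> int:
    """The SW code being tested."""
    return abs(x)

# Test cases paired with expected results derived from the
# specification -- the oracle:
test_cases = [(5, 5), (-3, 3), (0, 0)]

def run_tests(program, cases):
    """Execute each test case and compare the result against the oracle."""
    verdicts = []
    for test_input, expected in cases:
        actual = program(test_input)          # execute test case
        verdicts.append(actual == expected)   # [Test Result == Oracle] ?
    return verdicts
```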
Subtleties of Software Dependability

• Dependability: correctness, reliability, safety, robustness
• A program is correct if it obeys its specification.
• Reliability is a way of statistically approximating correctness.
• Safety implies that the software must always display a safe behavior, under any condition.
• A system is robust if it acts reasonably in severe, unusual, or illegal conditions.

Continuity Property

• Problem: test a bridge's ability to sustain a certain weight
• Continuity property: if a bridge can sustain a weight equal to W1, then it will sustain any weight W2 <= W1
• Essentially, the continuity property means that small differences in operating conditions do not result in dramatically different behavior
• BUT the same property cannot be relied upon when testing software. Why?
• In software, small differences in operating conditions can result in dramatically different behavior (e.g., at value boundaries)
• Thus, the continuity property is not applicable to software

Subtleties of Software Dependability II

• Correct but not safe or robust: the specification is inadequate
• Reliable but not correct: failures rarely happen
• Safe but not correct: annoying failures may happen
• Reliable and robust but not safe: catastrophic failures are possible

Software Dependability Ex: Traffic Light Controller

• Correctness, Reliability: the system should let traffic pass according to the correct pattern and central scheduling on a continuous basis
• Robustness: the system should provide degraded functionality in the presence of abnormalities
• Safety: it should never signal conflicting greens
• An example degraded function: the line to the central controller is cut off, and a default pattern is then used by the local controller
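A tiny invented example shows why the bridge analogy breaks down for software: two inputs differing by 0.005% land on opposite sides of a value boundary and produce sharply different outputs.

```python
def shipping_cost(weight_kg: float) -> float:
    # Hypothetical pricing rule: flat rate up to 20 kg,
    # then a per-kg surcharge applies.
    if weight_kg <= 20.0:
        return 5.0
    return 5.0 + 0.5 * weight_kg

# 20.0 and 20.001 are almost identical operating conditions...
print(shipping_cost(20.0))    # 5.0
print(shipping_cost(20.001))  # about 15.0 -- the behavior jumps at the boundary
```

This is exactly why boundary values are privileged test inputs: the continuity property that justifies testing a bridge at a single load does not hold here.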
Dependability Needs Vary
• Safety-critical applications:
  – flight control systems have strict safety requirements
  – telecommunication systems have strict robustness requirements
• Mass-market products: dependability is less important than time to market
• Needs can vary within the same class of products:
  – reliability and robustness are key issues for multi-user operating systems (e.g., UNIX), but less important for single-user operating systems (e.g., Windows or MacOS)

Fundamental Principles
Exhaustive Testing

• Exhaustive testing, i.e., testing a software system using all the possible inputs, is most of the time impossible.
• Examples:
  – A program that computes the factorial function (n! = n(n-1)(n-2)...1): exhaustive testing would mean running the program with 0, 1, 2, ..., 100, ... as input!
  – A compiler (e.g., javac): exhaustive testing would mean running the (Java) compiler with every possible (Java) program (i.e., source code)

Input Equivalence Classes

• A general principle to reduce the number of inputs
• Testing criteria group input elements into (equivalence) classes
• One input is selected in each class (notion of test coverage)
• Picture the input domain partitioned into classes, with one test case (tc1, ..., tc6) drawn from each
Test Coverage

• A software representation (model) with associated criteria: test cases must cover all the … in the model
• Representation of the specification: black-box testing
• Representation of the implementation: white-box testing
• Testing criteria group input domain elements into (equivalence) classes (control flow paths here)
• Complete coverage attempts to run test cases from each class

Complete Coverage: White-Box

  if x > y then
    Max := x;
  else
    Max := x;  -- fault! should be Max := y
  end if;

• The test data set {x=3, y=2; x=2, y=3} can detect the fault.
• The set {x=3, y=2; x=4, y=3; x=5, y=1} is larger (more "coverage"?) but cannot detect it: all three cases take the x > y branch, so the faulty else branch is never executed.
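Transcribing the faulty Max fragment to Python (an illustrative translation of the slide's pseudocode) makes the claim checkable: the first test set reveals the fault, the larger one does not.

```python
def faulty_max(x: int, y: int) -> int:
    if x > y:
        return x
    else:
        return x   # fault! should be `return y`

def detects_fault(suite) -> bool:
    """True iff some test case in the suite reveals the fault
    (actual output differs from the true maximum)."""
    return any(faulty_max(x, y) != max(x, y) for x, y in suite)

print(detects_fault([(3, 2), (2, 3)]))          # True: (2, 3) returns 2, not 3
print(detects_fault([(3, 2), (4, 3), (5, 1)]))  # False: x > y in every case
```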
Control Flow Coverage (CFG) - Example

Greatest common divisor (GCD) program:

  read(x);
  read(y);
  while x /= y loop
    if x > y then
      x := x - y;
    else
      y := y - x;
    end if;
  end loop;
  gcd := x;

Control Flow Coverage (CFG) - Definitions

• A CFG is a directed graph
• Nodes are blocks of sequential statements
• Edges are transfers of control
• Edges may be labeled with a predicate representing the condition of control transfer
• There are several conventions for flow graph models, with subtle differences (e.g., hierarchical CFGs, concurrent CFGs)
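The GCD example above, transcribed to Python for illustration, with the CFG elements from the definitions noted in comments:

```python
def gcd(x: int, y: int) -> int:
    # Assumes positive integers, as in the slide's example.
    while x != y:      # decision node; edges labeled x /= y and x = y
        if x > y:      # decision node; edges labeled x > y and x <= y
            x = x - y  # block on the x > y edge
        else:
            y = y - x  # block on the x <= y edge
    return x           # final block: gcd := x
```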
Basics of CFG: Blocks

• Typical block structures in a CFG: If-Then-Else, While loop, Switch

Testing Coverage of Control Flow

• As a testing strategy, we may want to ensure that testing exercises control flow:
  – Statement/Node Coverage
  – Edge/Branch Coverage
  – Condition Coverage
  – Path Coverage
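These criteria form a hierarchy; an invented example shows that a test set can achieve full statement coverage while still missing a branch edge:

```python
def apply_discount(price: float, is_member: bool) -> float:
    if is_member:
        price = price * 0.9   # the only conditional statement
    return price

# One test executes every statement (statement/node coverage met)...
print(apply_discount(100.0, True))    # 90.0
# ...but never takes the False edge of the branch, so edge/branch
# coverage requires a second test:
print(apply_discount(100.0, False))   # 100.0
```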
Complete Coverage: Black-Box

• Specification of Compute Factorial Number: if the input value n is < 0, then an appropriate error message must be printed. If 0 <= n < 20, then the exact value of n! must be printed. If 20 <= n < 200, then an approximate value of n! must be printed in floating-point format, e.g., using some approximate method of numerical calculus; the admissible error is 0.1% of the exact value. Finally, if n >= 200, the input can be rejected by printing an appropriate error message.
• Because of the expected variations in behavior, it is quite natural to divide the input domain into the classes {n < 0}, {0 <= n < 20}, {20 <= n < 200}, {n >= 200}. We can use one or more test cases from each class in each test set. Correct results from one such test set support the assertion that the program will behave correctly for any other class value, but there is no guarantee!

Black vs. White Box Testing

• A system's implementation may not cover its whole specification, and may also do things the specification never asked for:
  – Missing functionality cannot be revealed by white-box techniques
  – Unexpected functionality cannot be revealed by black-box techniques
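The specification above can be sketched in Python (the function name, message strings, and the stand-in "approximation" are illustrative, not from the slide), together with one representative input per equivalence class:

```python
import math

def compute_factorial(n: int):
    if n < 0:
        return "error: n must be non-negative"
    if n < 20:
        return math.factorial(n)          # exact value
    if n < 200:
        return float(math.factorial(n))   # stand-in for an approximate method
    return "error: n too large"

# One representative test input per equivalence class:
representatives = {"n < 0": -1, "0 <= n < 20": 5,
                   "20 <= n < 200": 50, "n >= 200": 500}
for label, n in representatives.items():
    print(label, "->", compute_factorial(n))
```

Four test cases suffice to touch every specified behavior class, but, as the slide notes, correct results on these four give no guarantee for the other values in each class.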
White-box vs. Black-box Testing

Black box:
  + Checks conformance with specifications
  + Scales up (different techniques at different granularity levels)
  – Depends on the specification notation and degree of detail
  – We do not know how much of the system is being tested
  – What if the software performed some unspecified, undesirable task?

White box:
  + Allows you to be confident about the code coverage of testing
  + Is based on control or data flow code analysis
  – Does not scale up (mostly applicable at unit and integration testing levels)
  – Unlike black-box techniques, it cannot reveal missing functionalities (parts of the specification that are not implemented)

Software Testing Overview: Part II

Many Causes of Failures
• The specification may be wrong or have a missing requirement
• The specification may contain a requirement that is impossible to implement given the prescribed software and hardware
• The system design may contain a fault
• The program code may be wrong

Practical Aspects
Test Organization

• There are many different potential causes of failure; for large systems, testing therefore involves several stages:
  – Module, component, or unit testing
  – Integration testing
  – Function testing
  – Performance testing
  – Acceptance testing
  – Installation testing

The stages form a pipeline (Pfleeger, 1998):
• Unit test: component code is tested against the design descriptions, yielding tested components
• Integration test: tested components are combined into integrated modules
• Function test: the integrated modules are tested against the system functional requirements, yielding a functioning system
• Performance test: the functioning system is tested against the other software requirements, yielding verified, validated software
• Acceptance test: the software is tested against customer requirements, yielding an accepted system
• Installation test: the accepted system is tested in the user environment; the result is the SYSTEM IN USE!
Unit Testing

• (Usually) performed by each developer.
• Scope: ensure that each module (i.e., class, subprogram) has been implemented correctly.
• Often based on white-box testing.
• A unit is the smallest testable part of an application.
  – In procedural programming, a unit may be an individual subprogram, function, procedure, etc.
  – In object-oriented programming, the smallest unit is a method, which may belong to a base/super class, abstract class, or derived/child class.
• Performed in a relatively small time-frame.

Integration/Interface Testing

• Performed by a small team.
• Scope: ensure that the interfaces between components (which individual developers could not test) have been implemented correctly, e.g., consistency of parameters and file formats.
• Test cases have to be planned, documented, and reviewed.
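A minimal developer-written unit test in Python's `unittest` style (the triangle-classification unit and its tests are invented for illustration):

```python
import unittest

def classify_triangle(a: int, b: int, c: int) -> str:
    """Hypothetical unit under test: a single function tested in isolation."""
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

class TestClassifyTriangle(unittest.TestCase):
    """One test method per behavior of the unit."""
    def test_equilateral(self):
        self.assertEqual(classify_triangle(2, 2, 2), "equilateral")
    def test_isosceles(self):
        self.assertEqual(classify_triangle(2, 2, 3), "isosceles")
    def test_scalene(self):
        self.assertEqual(classify_triangle(2, 3, 4), "scalene")
```

A test runner (e.g., `python -m unittest`) discovers and executes the `TestCase` methods; each exercises one branch of the unit, matching the white-box flavor noted above.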
Integration Testing Failures

Integration of well-tested components may lead to failures due to:
• Bad use of the interfaces (bad interface specifications / implementation)
• Wrong hypotheses about the behavior/state of related modules (bad functional specification / implementation), e.g., a wrong assumption about a return value
• Use of poor drivers/stubs: a module may behave correctly with (simple) drivers/stubs, but result in failures when integrated with the actual (complex) modules

System Testing

• Performed by a separate group within the organization (most of the time).
• Scope: pretend we are the end-users of the product.
• Focus is on functionality, but many other types of non-functional tests may also be performed (e.g., recovery, performance).
• A black-box form of testing, but code coverage can be monitored.
• Test case specification is driven by the system's use cases.

System vs. Acceptance Testing

• System testing:
  – The software is compared with the requirements specifications (verification)
  – Usually performed by the developers, who know the system
• Acceptance testing:
  – The software is compared with the end-user requirements (validation)
  – Usually performed by the customer (buyer), who knows the environment where the system is to be used
  – Sometimes distinguished into alpha- and beta-testing for general-purpose products

Differences among Testing Activities (Pezze and Young, 1998)

• Unit testing: from module specifications; visibility of code; complex scaffolding; targets the behavior of single modules
• Integration testing: from interface specifications; visibility of integration structure details; some scaffolding; targets interactions among modules
• System testing: from requirements specs; no visibility of code; no drivers/stubs; targets system functionalities
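The second cause of integration failures above (a wrong hypothesis about a related module's return value) can be sketched with invented modules: the caller passes its unit tests against a simple stub, but fails when integrated with the real component.

```python
def find_user_stub(name):
    """Simple test stub: always 'finds' the user."""
    return {"name": name}

def find_user_real(name):
    """The actual module: returns None when the user is absent."""
    known = {"alice": {"name": "alice"}}
    return known.get(name)

def greeting(find_user, name):
    user = find_user(name)
    # Wrong hypothesis: the caller assumes find_user never returns None.
    return "Hello, " + user["name"]

# Unit testing against the (too simple) stub succeeds:
print(greeting(find_user_stub, "bob"))   # Hello, bob
# Integration with the real module exposes the fault:
# greeting(find_user_real, "bob") raises a TypeError on user["name"].
```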
Testing through the Lifecycle

• Much of the life-cycle development artifacts provide a rich source of test data
• Identifying test requirements and test cases early helps shorten the development time
• It may help reveal faults
• It may also help identify, early, low-testability specifications or designs

Life Cycle Mapping: V Model

• Each development phase maps to a testing stage, and preparation for that test begins during the phase itself:
  – Analysis <-> system testing
  – Design <-> testing (other name: integration testing)
  – Implementation <-> testing (other name: unit testing)

Testing Activities BEFORE Coding

• Testing is a time-consuming activity
• Devising a test strategy and identifying the test requirements represent a substantial part of it
• Planning is essential
• Testing activities come under huge pressure, as testing is run towards the end of the project
• In order to shorten time-to-market and ensure a certain level of quality, a lot of QA-related activities (including testing) must take place early in the development life cycle

Testing Takes Creativity

• Testing is often viewed as dirty work (though less and less).
• To develop an effective test, one must have:
  – A detailed understanding of the system
  – Knowledge of the testing techniques
  – The skill to apply these techniques in an effective and efficient manner
• Testing is done best by independent testers:
  – Programmers often stick to the data set that makes the program work
  – A program often does not work when tried by somebody else.