Software Testing Overview: Part I
Dr. Andrea Arcuri
Simula Research Laboratory, Oslo, Norway
[email protected]

Based on slides provided by Prof. Lionel Briand


Software has become prevalent in all aspects of our lives.

Qualities of Software Products

• Correctness
• Reliability
• Robustness
• Performance
• User Friendliness
• Verifiability
• Repairability
• Evolvability
• Reusability
• Portability
• Understandability
• Interoperability

Pervasive Problems

• Software is commonly delivered late, way over budget, and of unsatisfactory quality
• Software validation and verification are rarely systematic and are usually not based on sound, well-defined techniques
• Software development processes are commonly unstable and uncontrolled
• Software quality is poorly measured, monitored, and controlled
• Software failure examples: http://www.cs.bc.edu/~gtan/bug/softwarebug.html

Examples of Software Failures

• Communications: Loss or corruption of communication media, non-delivery of data.
• Space Applications: Lost lives, launch delays, e.g., the European Ariane 5 launcher, 1996:
  – From the official disaster report: "Due to a malfunction in the control software, the rocket veered off its flight path 37 seconds after launch."
• Defense and Warfare: Misidentification of friend or foe.
• Transportation: Deaths, delays, sudden acceleration, inability to brake.
• Electric Power: Deaths, injuries, power outages, long-term health hazards (radiation).

Examples of Software Failures (cont.)

• Money Management: Fraud, violation of privacy, shutdown of stock exchanges and banks, negative interest rates.
• Control of Elections: Wrong results (intentional or non-intentional).
• Control of Jails: Technology-aided escape attempts and successes, failures in software-controlled locks.
• Law Enforcement: False arrests and imprisonments.

Ariane 5 – ESA

On June 4, 1996, the flight of the Ariane 5 launcher ended in a failure. Only about 40 seconds after initiation of the flight sequence, at an altitude of about 3,700 m, the launcher veered off its flight path, broke up and exploded.

Ariane 5 – Root Cause

• Source: ARIANE 5 Flight 501 Failure, Report by the Inquiry Board

A program segment for converting a floating point number to a signed 16-bit integer was executed with an input data value outside the range representable by a signed 16-bit integer. This run-time error (out of range, overflow), which arose in both the active and the backup computers at about the same time, was detected and both computers shut themselves down.

This resulted in the total loss of attitude control. The Ariane 5 turned uncontrollably and aerodynamic forces broke the vehicle apart. The breakup was detected by an on-board monitor which ignited the explosive charges to destroy the vehicle in the air. Ironically, the result of this format conversion was no longer needed after lift-off.

Ariane 5 – Lessons Learned

• Adequate exception handling and redundancy strategies (real function of a backup system, degraded modes?)
• Clear, complete, documented specifications (e.g., preconditions, post-conditions)
• But perhaps more importantly: usage-based testing (based on operational profiles), in this case actual Ariane 5 trajectories
• Note this was not a complex problem, but a deficiency of the practices in place
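To make the root cause concrete, here is a minimal, hypothetical Java sketch of the failure mode (the actual flight software was Ada, and all names here are invented): a floating point value is converted to a signed 16-bit integer with no check that it fits.

    // Hypothetical illustration of the Ariane 5 failure mode (not the real
    // code, which was Ada): an unchecked float-to-16-bit conversion.
    public class ConversionOverflow {
        static short toSigned16(double value) {
            // No range check: in Java, values outside [-32768, 32767]
            // silently wrap around; in the Ada flight code the conversion
            // raised an exception that shut the computers down instead.
            return (short) value;
        }

        public static void main(String[] args) {
            System.out.println(toSigned16(1000.0));   // 1000: fine
            System.out.println(toSigned16(100000.0)); // -31072: garbage
        }
    }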

F-18 Crash

• An F-18 crashed because of a missing exception condition: an if ... then ... block without the else clause, for a case that was thought could not possibly arise.
• In simulation, an F-16 program bug caused the virtual plane to flip over whenever it crossed the equator, as a result of a missing minus sign to indicate south latitude.

Fatal Therac-25 Radiation

• In 1986, a man in Texas received between 16,500 and 25,000 rads in less than 10 seconds, over an area of about 1 cm.
• He lost his left arm, and died of complications 5 months later.
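The defect pattern described for the F-18 can be sketched in a few lines; the example below is invented purely for illustration (an if whose else branch was omitted because the case was assumed impossible).

    // Invented example of the "missing else" defect pattern.
    public class MissingElse {
        static int hemisphereSign(double latitude) {
            if (latitude >= 0) {
                return +1;  // northern hemisphere
            }
            // Missing "else return -1;" for southern latitudes: the
            // "impossible" case falls through to a silently wrong default.
            return 0;
        }

        public static void main(String[] args) {
            System.out.println(hemisphereSign(59.9));   // 1
            System.out.println(hemisphereSign(-33.8));  // 0, not -1: latent fault
        }
    }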

Power Shutdown in 2003

• 508 generating units and 256 power plants shut down
• Affected 10 million people in Ontario, Canada
• Affected 40 million people in 8 US states
• Financial losses of $6 billion USD

• The alarm system in the energy management system failed due to a software error, and operators were not informed of the power overload in the system

Consequences of Poor Quality

• The Standish Group surveyed 350 companies, over 8,000 projects, in 1994: 31% were cancelled before completion, and only 9-16% were delivered within cost and budget
• US study (1995): 81 billion US$ spent per year on failing software development projects
• NIST study (2002): bugs cost $59.5 billion a year; earlier detection could save $22 billion

Quality Assurance

• Uncover faults in the documents where they are introduced, in a systematic way, in order to avoid ripple effects. Systematic, structured reviews of software documents are referred to as inspections.
• Derive, in a systematic way, effective test cases to uncover faults
• Automate testing and inspection activities, to the maximum extent possible
• Monitor and control quality, e.g., reliability, maintainability, safety, across all project phases and activities
• All this implies the quality measurement of SW products and processes

Dealing with SW Faults

Fault handling techniques form a taxonomy:
• Fault Avoidance: design methodology, configuration management, verification
• Fault Detection: inspections, testing (component, integration, and system testing), debugging (correctness and performance debugging)
• Fault Tolerance: atomic transactions, modular redundancy

Testing Definition

• SW Testing: Techniques to execute programs with the intent of finding as many defects as possible and/or gaining sufficient confidence in the software system under test
• "Program testing can show the presence of bugs, never their absence" (Dijkstra)

Basic Testing Definitions

• Error: People commit errors
• Fault: A fault is the result of an error in the software documentation, code, etc.
• Failure: A failure occurs when a fault executes in the software system under test
• Incident: The consequences of a failure; a failure occurrence may or may not be apparent to the user
• Many people use the terms error, fault, and failure interchangeably; this should be avoided
• The fundamental chain of SW dependability threats:

Error --causation--> Fault --propagation--> Failure --results in--> Incident --> ...


Why is SW testing important?

• According to some estimates, testing accounts for ~50% of development costs
• A study by the (American) NIST in 2002:
  – The annual national cost of inadequate testing is as much as $59 billion US!
  – The report is titled: "The Economic Impacts of Inadequate Infrastructure for Software Testing"

Testing: Definitions & Objectives

Test Stubs and Drivers

• Test Stub: Partial implementation of a component on which a unit under test depends.
  [Diagram: Component Under Test --depends on--> Test Stub]
• Test Driver: Partial implementation of a component that depends on a unit under test.
  [Diagram: Test Driver --depends on--> Component Under Test]
• Test stubs and drivers enable components to be isolated from the rest of the system for testing.

Summary of Definitions

[Diagram: a test suite consists of one or more (1..n) test cases; a test case exercises components and finds failures; a failure is caused by a fault, which is in turn caused by an error; a correction repairs a fault, and a component is revised by a correction.]
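A minimal Java sketch of both roles (all names are invented): the stub is a partial implementation of a dependency of the unit under test, and the driver stands in for the unit's real callers.

    // Unit under test: a ConfigReader that depends on a Storage component.
    interface Storage { String read(String key); }

    class ConfigReader {
        private final Storage storage;
        ConfigReader(Storage storage) { this.storage = storage; }
        int timeoutSeconds() {
            String raw = storage.read("timeout");
            return raw == null ? 30 : Integer.parseInt(raw);  // 30 is the default
        }
    }

    // Test stub: partial implementation of the component the unit depends on.
    class StorageStub implements Storage {
        public String read(String key) { return "timeout".equals(key) ? "10" : null; }
    }

    // Test driver: exercises the unit in place of its real callers.
    // Run with: java -ea ConfigReaderDriver
    public class ConfigReaderDriver {
        public static void main(String[] args) {
            assert new ConfigReader(new StorageStub()).timeoutSeconds() == 10
                    : "stubbed value not used";
            assert new ConfigReader(key -> null).timeoutSeconds() == 30
                    : "default not applied";
            System.out.println("ConfigReader tests passed");
        }
    }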

Motivations

• No matter how rigorous we are, software is going to be faulty
• Testing represents a substantial percentage of software development costs and time to market
• It is impossible to test under all operating conditions: based on incomplete testing, we must gain confidence that the system has the desired behavior
• Testing large systems is complex: it requires strategy and technology, and is often done inefficiently in practice

The Testing Dilemma

Available testing resources are limited (time, people, expertise, money), while the functionality of the software system to be tested is potentially huge (thousands of items to test), only part of which is faulty.

Testing Process Overview

[Flow: derive test cases from a SW representation (e.g., models, requirements) and estimate the expected results (the test oracle); execute the test cases on the SW code and get the test results; compare the test results with the oracle, yielding either Test Result == Oracle or Test Result != Oracle.]

Qualities of Testing

• Effective at uncovering faults
• Helps locate faults for debugging
• Repeatable, so that a precise understanding of the fault can be gained
• Automated, so as to lower the cost and timescale
• Systematic, so as to be predictable in terms of its effect on dependability

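A minimal sketch of this execute-and-compare loop, with an invented function under test and a hand-written table of expected results playing the role of the oracle:

    import java.util.Map;

    public class OracleLoop {
        static int abs(int x) { return x < 0 ? -x : x; }  // SW code under test

        public static void main(String[] args) {
            // Test cases derived from the SW representation, paired with
            // expected results estimated from the specification (the oracle).
            Map<Integer, Integer> oracle = Map.of(0, 0, 5, 5, -7, 7);
            oracle.forEach((input, expected) -> {
                int actual = abs(input);               // execute the test case
                System.out.println(actual == expected  // compare with the oracle
                        ? "PASS input=" + input
                        : "FAIL input=" + input + " expected=" + expected
                          + " got=" + actual);
            });
        }
    }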

Subtleties of Software Dependability

• Dependability: correctness, reliability, safety, robustness
• A program is correct if it obeys its specification.
• Reliability is a way of statistically approximating correctness.
• Safety implies that the software must always display a safe behavior, under any condition.
• A system is robust if it acts reasonably in severe, unusual or illegal conditions.

Continuity Property

• Problem: test a bridge's ability to sustain a certain weight
• Continuity property: if a bridge can sustain a weight equal to W1, then it will sustain any weight W2 <= W1
• Essentially, the continuity property means that small differences in operating conditions should not result in dramatically different behavior
• BUT the same property cannot be relied on when testing software. Why?
• In software, small differences in operating conditions can result in dramatically different behavior (e.g., at value boundaries)
• Thus, the continuity property is not applicable to software

Subtleties of Software Dependability II

• Correct but not safe or robust: the specification is inadequate
• Reliable but not correct: failures rarely happen
• Safe but not correct: annoying failures may happen
• Reliable and robust but not safe: catastrophic failures are possible

Software Dependability Example: Traffic Light Controller

• Correctness, reliability: the system should let traffic pass according to the correct pattern and central scheduling on a continuous basis
• Robustness: the system should provide degraded functionality in the presence of abnormalities
• Safety: it should never signal conflicting greens

An example degraded function: the line to central control is cut off, and a default pattern is then used by the local controller.

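Returning to the continuity property: a small invented Java example of why it does not hold for software. Two inputs that differ by a single unit sit on either side of a boundary and behave very differently.

    public class Discontinuity {
        // Invented pricing rule: orders of 100 units or more get a bulk rate.
        static double price(int units) {
            return units >= 100 ? units * 0.80 : units * 1.00;
        }

        public static void main(String[] args) {
            System.out.println(price(99));   // 99.0
            System.out.println(price(100));  // 80.0: one more unit, lower total
            // A bridge that holds 99 tonnes almost certainly holds 98;
            // a program that is correct at 99 tells us nothing about 100.
        }
    }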

Dependability Needs Vary

• Safety-critical applications:
  – flight control systems have strict safety requirements
  – telecommunication systems have strict robustness requirements
• Mass-market products:
  – dependability is less important than time to market
• Needs can vary within the same class of products:
  – reliability and robustness are key issues for multi-user operating systems (e.g., UNIX), and less important for single-user operating systems (e.g., Windows or MacOS)

Fundamental Principles

Exhaustive Testing

• Exhaustive testing, i.e., testing a software system using all the possible inputs, is most of the time impossible.
• Examples:
  – A program that computes the factorial function (n! = n*(n-1)*(n-2)*...*1). Exhaustive testing = running the program with 0, 1, 2, ..., 100, ... as input!
  – A compiler (e.g., javac). Exhaustive testing = running the (Java) compiler with any possible (Java) program (i.e., source code)

Input Equivalence Classes

• General principle to reduce the number of inputs
• Testing criteria group input elements into (equivalence) classes
• One input is selected in each class (notion of test coverage)

[Figure: the input domain partitioned into equivalence classes, with one test case (tc1 ... tc6) drawn from each class]

Test Coverage

• A software representation (model) has associated criteria: test cases must cover all the ... in the model
• Testing criteria group input domain elements into (equivalence) classes (control flow paths here)
• Representation of the specification -> Black-Box Testing
• Representation of the implementation -> White-Box Testing
• Complete coverage attempts to run test cases from each class

Complete Coverage: White-Box

    if x > y then
      Max := x;
    else
      Max := x;  // fault! (should be Max := y)
    end if;

Test data:
• {x=3, y=2; x=2, y=3} can detect the fault: more "coverage"
• {x=3, y=2; x=4, y=3; x=5, y=1} is larger but cannot detect it
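The same faulty program translated into a runnable Java sketch, so the two test suites above can be compared directly:

    public class MaxFault {
        static int max(int x, int y) {
            if (x > y) return x;
            else       return x;  // fault: should be y
        }

        static void run(String name, int[][] cases) {
            for (int[] c : cases)
                if (max(c[0], c[1]) != Math.max(c[0], c[1]))
                    System.out.println(name + " detects the fault at x=" + c[0] + ", y=" + c[1]);
        }

        public static void main(String[] args) {
            run("Suite 1", new int[][]{{3, 2}, {2, 3}});          // covers both branches
            run("Suite 2", new int[][]{{3, 2}, {4, 3}, {5, 1}});  // larger, but always x > y
        }
    }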

35 36  Lionel Briand 2009  Lionel Briand 2009 Control Flow Coverage (CFG) - Example Control Flow Coverage (CFG) - Definitions Greatest common divisor (GCD) program • Directed graph read(x); • Nodes are blocks of sequential read(y); x  y x  y while x  y loop statements x = y if x>y then x = y •Edges are transfers of control x<= y x>yx > y x := x – y; x<=y x > y • Edges may be labeled with else pppgredicate representing the y := y – x; condition of control transfer end if; • There are several conventions for end loop; gcd := x; flow graph models with subtle differences (e.g., hierarchical CFGs, concurrent CFG s ) 37 38  Lionel Briand 2009  Lionel Briand 2009

Basics of CFG: Blocks

[Figure: CFG block patterns for the If-Then-Else, While loop, and Switch constructs]

Testing Coverage of Control Flow

• As a testing strategy, we may want to ensure that testing exercises control flow:
  – Statement/Node Coverage
  – Edge/Branch Coverage
  – Condition Coverage
  – Path Coverage
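The GCD example as a runnable Java sketch, with notes on what the coverage criteria above ask of the test data:

    public class Gcd {
        static int gcd(int x, int y) {
            while (x != y) {          // branch coverage needs both outcomes
                if (x > y) x = x - y; // ...and both outcomes of this predicate
                else       y = y - x;
            }
            return x;
        }

        public static void main(String[] args) {
            // gcd(12, 8) alone already exercises every statement and branch
            // (loop entered, both if outcomes, loop exited); path coverage,
            // by contrast, grows with the number of loop iterations and is
            // effectively unbounded here.
            System.out.println(gcd(12, 8)); // 4
            System.out.println(gcd(7, 7));  // 7: the loop body never runs
        }
    }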

Complete Coverage: Black-Box

• Specification of Compute Factorial Number: If the input value n is < 0, then an appropriate error message must be printed. If 0 <= n < 20, then the exact value of n! must be printed. If 20 <= n < 200, then an approximate value of n! must be printed in floating point format, e.g., using some approximate method of numerical calculus; the admissible error is 0.1% of the exact value. Finally, if n >= 200, the input can be rejected by printing an appropriate error message.
• Because of expected variations in behavior, it is quite natural to divide the input domain into the classes {n<0}, {0<=n<20}, {20<=n<200}, {n>=200}. We can use one or more test cases from each class in each test set. Correct results from one such test set support the assertion that the program will behave correctly for any other value from the same classes, but there is no guarantee!

Black vs. White Box Testing

[Figure: the system lies at the intersection of its specification and its implementation]
• Missing functionality cannot be revealed by white-box techniques
• Unexpected functionality cannot be revealed by black-box techniques
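A sketch of a black-box test set with one representative input per equivalence class; the implementation below is invented purely so the example runs.

    public class FactorialClasses {
        // Invented stand-in for the specified program: returns an error
        // message, an exact value, or a floating point approximation.
        static String compute(int n) {
            if (n < 0) return "error: negative input";
            if (n < 20) {
                long f = 1;
                for (int i = 2; i <= n; i++) f *= i;
                return Long.toString(f);    // exact value
            }
            if (n < 200) {
                double f = 1;
                for (int i = 2; i <= n; i++) f *= i;
                return Double.toString(f);  // naive approximation; overflows to
                // Infinity above n of about 170, a defect that test cases near
                // the class boundaries would expose
            }
            return "error: input too large";
        }

        public static void main(String[] args) {
            // One representative per class: {n<0}, {0<=n<20}, {20<=n<200}, {n>=200}
            for (int n : new int[]{-1, 5, 100, 300})
                System.out.println("n=" + n + " -> " + compute(n));
        }
    }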

White-box vs. Black-box Testing

• Black box:
  + Checks conformance with specifications
  + It scales up (applicable at different granularity levels)
  – It depends on the specification notation and degree of detail
  – We do not know how much of the system is being tested
  – What if the software performs some unspecified, undesirable task?
• White box:
  + It allows you to be confident about the coverage achieved by testing
  + It is based on control or data flow code analysis
  – It does not scale up (mostly applicable at unit and integration testing levels)
  – Unlike black-box techniques, it cannot reveal missing functionalities (parts of the specification that are not implemented)

Software Testing Overview: Part II

Practical Aspects

Many Causes of Failures

• The specification may be wrong or have a missing requirement
• The specification may contain a requirement that is impossible to implement given the prescribed software and hardware
• The system design may contain a fault
• The program code may be wrong


Test Organization

• Many different potential causes of failure and large systems -> testing involves several stages:
  – Module, component, or unit testing
  – Integration testing
  – Function test
  – Performance test
  – Acceptance test
  – Installation test

[Figure (Pfleeger, 1998): the testing stages. Component code goes through unit test, guided by unit test descriptions; tested components go through integration test, guided by the design specifications; the integrated modules go through function test against the system functional requirements; the functioning system goes through performance test against the other software requirements; the verified, validated software goes through acceptance test against the customer requirements specification; the accepted system goes through installation test in the user environment, yielding the SYSTEM IN USE!]

Unit Testing

• (Usually) performed by each developer
• Scope: ensure that each module (i.e., class, subprogram) has been implemented correctly
• Often based on white-box testing
• A unit is the smallest testable part of an application
• In procedural programming, a unit may be an individual subprogram, function, procedure, etc.
• In object-oriented programming, the smallest unit is a method, which may belong to a base/super class, abstract class, or derived/child class
• Performed in a relatively small time-frame

Integration/Interface Testing

• Performed by a small team
• Scope: ensure that the interfaces between components (which individual developers could not test) have been implemented correctly, e.g., consistency of parameters, file formats
• Test cases have to be planned, documented, and reviewed
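A minimal developer-level unit test, assuming JUnit 5 is on the classpath; the Account class is invented for illustration.

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.*;

    // Invented unit under test: a single small class.
    class Account {
        private int balance;
        void deposit(int amount) {
            if (amount <= 0) throw new IllegalArgumentException("non-positive deposit");
            balance += amount;
        }
        int balance() { return balance; }
    }

    class AccountTest {
        @Test
        void depositIncreasesBalance() {
            Account a = new Account();
            a.deposit(40);
            assertEquals(40, a.balance());
        }

        @Test
        void rejectsNonPositiveAmounts() {
            assertThrows(IllegalArgumentException.class, () -> new Account().deposit(0));
        }
    }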

Integration Testing Failures

Integration of well-tested components may lead to failure due to:
• Bad use of the interfaces (bad interface specifications / implementation)
• Wrong hypotheses on the behavior/state of related modules (bad functional specification / implementation), e.g., a wrong assumption about a return value
• Use of poor drivers/stubs: a module may behave correctly with (simple) drivers/stubs, but result in failures when integrated with the actual (complex) modules

System Testing

• Performed by a separate group within the organization (most of the time)
• Scope: pretend we are the end-users of the product
• Focus is on functionality, but many other types of non-functional tests may also be performed (e.g., recovery, performance)
• A black-box form of testing, but code coverage can be monitored
• Test case specification is driven by the system's use cases

System vs. Acceptance Testing

• System testing:
  – The software is compared with the requirements specifications (verification)
  – Usually performed by the developers, who know the system
• Acceptance testing:
  – The software is compared with the end-user requirements (validation)
  – Usually performed by the customer (buyer), who knows the environment where the system is to be used
  – Sometimes distinguished into alpha- and beta-testing for general-purpose products

Differences among Testing Activities (Pezzè and Young, 1998)

• Unit testing: derived from module specifications; visibility of code details; complex scaffolding; targets the behavior of single modules
• Integration testing: derived from interface specifications; visibility of integration structure; some scaffolding; targets the interactions among modules
• System testing: derived from requirements specs; no visibility of code; no drivers/stubs; targets system functionalities

Testing through the Lifecycle

• Many of the life-cycle development artifacts provide a rich source of test data
• Identifying test requirements and test cases early helps shorten the development time
• They may help reveal faults
• They may also help identify, early on, specifications or designs with low testability

Life Cycle Mapping: V Model

[Figure: the V model maps life-cycle phases to testing activities. Analysis, design, and implementation each include preparation for the corresponding test level; testing against the design is also called integration testing, and testing against the implementation is also called unit testing.]

Testing Activities BEFORE Coding

• Testing is a time-consuming activity
• Devising a test strategy and identifying the test requirements represent a substantial part of it
• Planning is essential
• Testing activities come under huge pressure, as testing is run towards the end of the project
• In order to shorten time-to-market and ensure a certain level of quality, a lot of QA-related activities (including testing) must take place early in the development life cycle

Testing Takes Creativity

• Testing is often viewed as dirty work (though less and less).
• To develop an effective test, one must have:
  – a detailed understanding of the system
  – knowledge of the testing techniques
  – skill to apply these techniques in an effective and efficient manner
• Testing is done best by independent testers
• Programmers often stick to the data set that makes the program work
• A program often does not work when tried by somebody else.