Embedded Software Testing
18-649 Distributed Embedded Systems
Philip Koopman, September 30, 2015
© Copyright 2000-2015, Philip Koopman

Overview
Testing is an attempt to find bugs
• The reasons for finding bugs vary
• Finding all bugs is impossible
Various types of testing for various situations
• Exploratory testing – guided by experience
• White box testing – guided by software structure
• Black box testing – guided by functional specifications
Various types of testing throughout the development cycle
• Unit test
• Subsystem test
• System integration test
• Regression test
• Acceptance test
• Beta test

Definition Of Testing
Testing is performing all of the following:
• Providing software with inputs (a “workload”)
• Executing a piece of software
• Monitoring software state and/or outputs for expected properties, such as:
  – Conformance to requirements
  – Preservation of invariants (e.g., never applies brakes and throttle together)
  – Match to expected output values
  – Lack of “surprises” such as system crashes or unspecified behaviors
The general idea is to attempt to find “bugs” by executing a program
The following are potentially useful techniques, but are not testing:
• Model checking
• Static analysis (lint; compiler error checking)
• Design reviews of code
• Traceability analysis

Testing Terminology
Workload:
• “Inputs applied to the software under test”
• Each test is performed with a specific workload
Behavior:
• “Observed outputs of the software under test”
• Sometimes special outputs are added to improve observability of the software (e.g., “test points” added to see internal states)
Oracle:
• “A model that perfectly predicts correct behavior”
• Matching behavior against oracle output tells whether a test passed or failed
• Oracles come in many forms:
  – A human observer
  – A different version of the same program
  – A hand-created script of predicted values based on the workload
  – A list of invariants that should hold true over the behavior
What’s A Bug?
Simplistic answer:
• A “bug” is a software defect = incorrect software
• A software defect is an instance in which the software violates the specification
More realistic answer – a “bug” can be one or more of the following:
• Failure to provide required behavior
• Providing an incorrect behavior
• Providing an undocumented behavior, or behavior that is not required
• Failure to conform to a design constraint (e.g., timing, safety invariant)
• An omission or defect in the requirements/specification
• An instance in which the software performs as designed, but it’s the “wrong” outcome
• Any “reasonable” complaint from a customer
• … other variations on this theme …
The goal of most testing is to attempt to find bugs in this expanded sense

Types Of Testing – Outline For Following Slides
Testing styles:
• Smoke testing
• Exploratory testing
• Black box testing
• White box testing
Testing situations:
• Unit test
• Subsystem test
• System integration test
• Regression test
• Acceptance test
• Beta test
Most testing styles have some role to play in each testing situation

Smoke Testing
A quick test to see if the software is operational
• The idea comes from the hardware realm – turn the power on and see if smoke pours out
• Generally simple and easy to administer
• Makes no attempt or claim of completeness
• Smoke test for a car – turn on the ignition and check:
  – Engine idles without stalling
  – Can put into forward gear and move 5 feet, then brake to a stop
  – Wheels turn left and right while stopped
Good for catching catastrophic errors
• Especially after a new build or major change
• Exercises any built-in internal diagnosis mechanisms
But, not usually a thorough test
• More a check that many software components are “alive”

Exploratory Testing
A person exercises the system, looking for unexpected results
• Might or might not use documented system behavior as a guide
• Looks especially for “strange” behaviors that are neither specifically required nor prohibited by the requirements
Advantages:
• An experienced, thoughtful tester can find many defects this way
• Often, the defects found are ones that would have been missed by more rigid testing methods
Disadvantages:
• Usually no documented measurement of coverage
• Can leave big holes in coverage due to tester bias/blind spots
• An inexperienced, non-thoughtful tester probably won’t find the important bugs

Black Box Testing
Tests designed with knowledge of behavior
• But without knowledge of the implementation
• Often called “functional” testing
The idea is to test what the software does, but not how the function is implemented
• Example: cruise control black box test
  – Test operation at various speeds
  – Test operation at various underspeed/overspeed amounts
  – BUT, no knowledge of whether a lookup table or a control equation is used
Advantages:
• Tests the final behavior of the software
• Can be written independent of the software design
  – Less likely to overlook the same problems as the design
• Can be used to test different implementations with minimal changes
Disadvantages:
• Doesn’t necessarily know the boundary cases
  – For example, won’t know to exercise every lookup table entry
• Can be difficult to cover all portions of the software implementation

Examples of Black Box Testing
Assume you want to test a floating point square root function: sqrt(x)
• sqrt(0) = 0 (boundary condition)
• sqrt(1) = 1 (behavior changes between <1 and >1)
• sqrt(9) = 3 (test some number greater than 1)
• sqrt(.25) = .5 (test some number less than 1)
• sqrt(-1) => error (test an out-of-range input)
• sqrt(FLT_MAX) => … (test maximum numeric range)
• sqrt(FLT_EPSILON) => … (test smallest positive number)
• sqrt(NaN) => NaN (test Not-a-Number input values)
• Pick random positive numbers and confirm that sqrt(x) * sqrt(x) = x
  – Note that the oracle here is an invariant, not a specific result
• Other types of possible results to monitor:
  – Monitor numerical accuracy/stability for floating point math
  – Check to see if the software crashes on some inputs

White Box Testing
Tests designed with knowledge of the software design
• Often called “structural” testing
The idea is to exercise the software, knowing how it is designed
• Example: cruise control white box test
  – Test operation at every point in the control loop lookup table
  – Tests that exercise both paths of every conditional branch statement
Advantages:
• Usually helps in getting good coverage (tests are specifically designed for coverage)
• Good for ensuring boundary cases and special cases get tested
Disadvantages:
• 100% coverage tests might not be good at assessing functionality for “surprise” behaviors and other testing goals
• Tests based on the design might miss bigger-picture system problems
• Tests need to be changed if the implementation/algorithm changes
• Hard to test code that isn’t there (missing functionality) with white box testing

Examples of White Box Testing
Assume you want to test a floating point square root function: sqrt(x)
• Uses a lookup table spaced at every 0.1 between zero and 10
• Uses an iterative algorithm at and above the value of 10
Tests:
• Test sqrt(x) for negative numbers (expect an error)
• Test sqrt(x) for every value of x in the middle of a lookup table interval: 0.05, 0.15, … 9.95
• Test sqrt(x) exactly at every lookup table entry: 0, 0.1, 0.2, … 10.0
• Test sqrt(x) at 10.0 + FLT_EPSILON
• Test sqrt(x) for some numbers that exercise the interpolation algorithm
Main differences from black box testing:
• Tests exploit knowledge of software design & coding details
• Usually strives for 100% coverage of known properties – e.g., lookup table entries
• Digs deepest at algorithmic discontinuities & branches – e.g., lookup table boundaries and center values

Testing Coverage
“Coverage” is a notion of how completely testing has been done
• Usually a percentage (e.g., “97% branch coverage”)
White box testing coverage (this is the usual use of the word “coverage”):
• Percent of conditional branches where both sides of the branch have been tested
• Percent of lookup table entries used in computations
• Doesn’t test missing code (e.g., missing exception handlers)
Black box testing coverage:
• Percent of requirements tested
• Percent of documented exceptions exercised by tests
• But, must relate to externally visible behavior or environment, not code
• Doesn’t test extra behavior that isn’t supposed to be there
Important note: 100% coverage is not “100% tested”
• Each coverage aspect is narrow; good coverage is necessary, but not sufficient, to achieve good testing

System-Level Testing Can’t Be Complete
• System level testing is useful and important
  – Can find unexpected component interactions
• But, it is impracticable to test everything at the system level
  – Too many possible operating conditions, timing sequences, and system states
  – Too many possible faults, which might be intermittent
    • Combinations of component failures + memory corruption patterns
    • Multiple software defects activated by a sequence of operations
[Figure: too many possible tests – operational scenarios, failure types, timing and sequencing]

Even Unit Testing Is Typically Incomplete
How many test cases does it take to completely test the following?
  myfunction(int a, int b, int c)
Assume 32-bit integers:
• 2^32 * 2^32 * 2^32 = 2^96 = 7.92 * 10^28 tests => at 1 billion tests/second
• Testing all combinations takes 2,510,588,971,096 years
• Takes longer for complete tests if there are any state variables inside the function
• Takes longer if there are environmental dependencies to test as well
This is a fundamental problem with testing – you can’t test everything
• Therefore, we need some notion of how much is “enough” testing
Even if you could test “everything,”