Embedded Software Testing
18-649 Distributed Embedded Systems
Philip Koopman, September 30, 2015
© Copyright 2000-2015, Philip Koopman

Overview
Testing is an attempt to find bugs
• The reasons for finding bugs vary
• Finding all bugs is impossible
Various types of testing for various situations
• Exploratory testing – guided by experience
• White box testing – guided by software structure
• Black box testing – guided by functional specifications
Various types of testing throughout the development cycle
• Unit test
• Subsystem test
• System integration test
• Regression test
• Acceptance test
• Beta test

Definition Of Testing
Testing is performing all of the following:
• Providing software with inputs (a “workload”)
• Executing a piece of software
• Monitoring software state and/or outputs for expected properties, such as:
  – Conformance to requirements
  – Preservation of invariants (e.g., never applies brakes and throttle together)
  – Match to expected output values
  – Lack of “surprises” such as system crashes or unspecified behaviors
The general idea is to attempt to find “bugs” by executing a program
The following are potentially useful techniques, but are not testing:
• Model checking
• Static analysis (lint; compiler error checking)
• Design reviews of code
• Traceability analysis

Testing Terminology
Workload:
• “Inputs applied to the software under test”
• Each test is performed with a specific workload
Behavior:
• “Observed outputs of the software under test”
• Sometimes special outputs are added to improve observability of the software (e.g., “test points” added to see internal states)
Oracle:
• “A model that perfectly predicts correct behavior”
• Matching behavior against oracle output tells whether a test passed or failed
• Oracles come in many forms:
  – A human observer
  – A different version of the same program
  – A hand-created script of predicted values based on the workload
  – A list of invariants that should hold true over the behavior
What’s A Bug?
Simplistic answer:
• A “bug” is a software defect = incorrect software
• A software defect is an instance in which the software violates the specification
More realistic answer – a “bug” can be one or more of the following:
• Failure to provide required behavior
• Providing an incorrect behavior
• Providing an undocumented behavior, or behavior that is not required
• Failure to conform to a design constraint (e.g., timing, safety invariant)
• An omission or defect in the requirements/specification
• An instance in which the software performs as designed, but it’s the “wrong” outcome
• Any “reasonable” complaint from a customer
• … other variations on this theme …
The goal of most testing is to attempt to find bugs in this expanded sense

Types Of Testing – Outline For Following Slides
Testing styles:
• Smoke testing
• Exploratory testing
• Black box testing
• White box testing
Testing situations:
• Unit test
• Subsystem test
• System integration test
• Regression test
• Acceptance test
• Beta test
Most testing styles have some role to play in each testing situation

Smoke Testing
A quick test to see if the software is operational
• The idea comes from the hardware realm – turn the power on and see if smoke pours out
• Generally simple and easy to administer
• Makes no attempt or claim of completeness
• Smoke test for a car – turn on the ignition and check:
  – Engine idles without stalling
  – Can put into forward gear and move 5 feet, then brake to a stop
  – Wheels turn left and right while stopped
Good for catching catastrophic errors
• Especially after a new build or major change
• Exercises any built-in internal diagnosis mechanisms
But, not usually a thorough test
• More a check that many software components are “alive”

Exploratory Testing
A person exercises the system, looking for unexpected results
• Might or might not use documented system behavior as a guide
• Looks especially for “strange” behaviors that are neither specifically required nor prohibited by the requirements
Advantages:
• An experienced, thoughtful tester can find many defects this way
• Often, the defects found are ones that would have been missed by more rigid testing methods
Disadvantages:
• Usually no documented measurement of coverage
• Can leave big holes in coverage due to tester bias/blind spots
• An inexperienced, non-thoughtful tester probably won’t find the important bugs

Black Box Testing
Tests designed with knowledge of behavior
• But without knowledge of the implementation
• Often called “functional” testing
The idea is to test what the software does, but not how the function is implemented
• Example: cruise control black box test
  – Test operation at various speeds
  – Test operation at various underspeed/overspeed amounts
  – BUT, no knowledge of whether a lookup table or a control equation is used
Advantages:
• Tests the final behavior of the software
• Can be written independent of the software design
  – Less likely to overlook the same problems as the design
• Can be used to test different implementations with minimal changes
Disadvantages:
• Doesn’t necessarily know the boundary cases
  – For example, won’t know to exercise every lookup table entry
• Can be difficult to cover all portions of the software implementation

Examples of Black Box Testing
Assume you want to test a floating point square root function: sqrt(x)
• sqrt(0) = 0 (boundary condition)
• sqrt(1) = 1 (behavior changes between <1 and >1)
• sqrt(9) = 3 (test some number greater than 1)
• sqrt(.25) = .5 (test some number less than 1)
• sqrt(-1) => error (test an out-of-range input)
• sqrt(FLT_MAX) => … (test maximum numeric range)
• sqrt(FLT_EPSILON) => … (test smallest positive number)
• sqrt(NaN) => NaN (test Not-a-Number input values)
• Pick random positive numbers and confirm that sqrt(x) * sqrt(x) = x
  – Note that the oracle here is an invariant, not a specific result
• Other types of possible results to monitor:
  – Monitor numerical accuracy/stability for floating point math
  – Check to see if the software crashes on some inputs

White Box Testing
Tests designed with knowledge of the software design
• Often called “structural” testing
The idea is to exercise the software, knowing how it is designed
• Example: cruise control white box test
  – Test operation at every point in the control loop lookup table
  – Tests that exercise both paths of every conditional branch statement
Advantages:
• Usually helps in getting good coverage (tests are specifically designed for coverage)
• Good for ensuring boundary cases and special cases get tested
Disadvantages:
• 100% coverage tests might not be good at assessing functionality for “surprise” behaviors and other testing goals
• Tests based on the design might miss bigger-picture system problems
• Tests need to be changed if the implementation/algorithm changes
• Hard to test code that isn’t there (missing functionality) with white box testing

Examples of White Box Testing
Assume you want to test a floating point square root function: sqrt(x)
• Uses a lookup table spaced at every 0.1 between zero and 10
• Uses an iterative algorithm at and above the value of 10
Tests:
• Test sqrt(x) for negative numbers (expect an error)
• Test sqrt(x) for every value of x in the middle of a lookup table interval: 0.05, 0.15, … 9.95
• Test sqrt(x) exactly at every lookup table entry: 0, 0.1, 0.2, … 10.0
• Test sqrt(x) at 10.0 + FLT_EPSILON
• Test sqrt(x) for some numbers that exercise the interpolation algorithm
Main differences from black box testing:
• Tests exploit knowledge of software design & coding details
• Usually strives for 100% coverage of known properties – e.g., lookup table entries
• Digs deepest at algorithmic discontinuities & branches – e.g., lookup table boundaries and center values

Testing Coverage
“Coverage” is a notion of how completely testing has been done
• Usually a percentage (e.g., “97% branch coverage”)
White box testing coverage (this is the usual use of the word “coverage”):
• Percent of conditional branches where both sides of the branch have been tested
• Percent of lookup table entries used in computations
• Doesn’t test missing code (e.g., missing exception handlers)
Black box testing coverage:
• Percent of requirements tested
• Percent of documented exceptions exercised by tests
• But, must relate to externally visible behavior or environment, not code
• Doesn’t test extra behavior that isn’t supposed to be there
Important note: 100% coverage is not “100% tested”
• Each coverage aspect is narrow; good coverage is necessary, but not sufficient, to achieve good testing

System-Level Testing Can’t Be Complete
• System level testing is useful and important
  – Can find unexpected component interactions
• But, it is impracticable to test everything at the system level
  – Too many possible operating conditions, timing sequences, and system states
  – Too many possible faults, which might be intermittent
    • Combinations of component failures + memory corruption patterns
    • Multiple software defects activated by a sequence of operations
[Figure: too many possible tests – operational scenarios, failure types, timing and sequencing]

Even Unit Testing Is Typically Incomplete
How many test cases does it take to completely test the following?
  myfunction(int a, int b, int c)
Assume 32-bit integers:
• 2^32 * 2^32 * 2^32 = 2^96 = 7.92 * 10^28 tests => at 1 billion tests/second
• Testing all combinations takes 2,510,588,971,096 years
• Takes longer for complete tests if there are any state variables inside the function
• Takes longer if there are environmental dependencies to test as well
This is a fundamental problem with testing – you can’t test everything
• Therefore, we need some notion of how much is “enough” testing
Even if you could test “everything,”