QUALITY ASSURANCE

Michael Weintraub, Fall 2015

Unit Objective
• Understand what quality assurance means
• Understand QA models and processes

Definitions According to NASA

• Software Assurance: The planned and systematic set of activities that ensures that software life cycle processes and products conform to requirements, standards, and procedures.
• Software Quality: The discipline of software quality is a planned and systematic set of activities to ensure quality is built into the software. It consists of software quality assurance, software quality control, and software quality engineering. As an attribute, software quality is (1) the degree to which a system, component, or process meets specified requirements, and (2) the degree to which a system, component, or process meets customer or user needs or expectations [IEEE 610.12, IEEE Standard Glossary of Software Engineering Terminology].
• Software Quality Assurance: The function of software quality that assures that the standards, processes, and procedures are appropriate for the project and are correctly implemented.
• Software Quality Control: The function of software quality that checks that the project follows its standards, processes, and procedures, and that the project produces the required internal and external (deliverable) products.
• Software Quality Engineering: The function of software quality that assures that quality is built into the software by performing analyses, trade studies, and investigations on the requirements, design, code, and verification processes and results to assure that reliability, maintainability, and other quality factors are met.
• Software Reliability: The discipline of software assurance that (1) defines the requirements for software-controlled system fault/failure detection, isolation, and recovery; (2) reviews the software development processes and products for software error prevention and/or controlled change to reduced-functionality states; and (3) defines the process for measuring and analyzing defects and defines/derives the reliability and maintainability factors.
• Verification: Confirmation by examination and provision of objective evidence that specified requirements have been fulfilled [ISO/IEC 12207, Software life cycle processes]. In other words, verification ensures that "you built it right".
• Validation: Confirmation by examination and provision of objective evidence that the particular requirements for a specific intended use are fulfilled [ISO/IEC 12207, Software life cycle processes]. In other words, validation ensures that "you built the right thing".

From: http://www.hq.nasa.gov/office/codeq/software/umbrella_defs.htm

Software Quality Assurance

Technology Objective: Designing a quality system and writing quality software

√ The tech team aims to deliver a correctly behaving system to the client

Software Quality Assurance is about assessing if the system meets expectations

Доверяй, но проверяй (Russian Proverb - Doveryay, no proveryay)

Trust, but verify

Validation Versus Verification

Validation: Are we building the right product or service?
Verification: Are we building the product or service right?

Both involve testing – done at every stage – but "testing can only show the presence of errors, not their absence" (Dijkstra).

Validation

Typically a client-leaning activity

After all, they are the ones who asked for the system

• Product Trials
• User Experience Evaluation

Verification

Optimist: It's about showing correctness/goodness
Pessimist: It's about identifying defects

[Diagram: the system under test, with good and bad inputs and the question of whether each produces good or bad outputs]

Quality versus Reliability

Quality Assurance: Assessing whether a software component or system produces the expected/correct/accepted behavior or output relationship between a given set of inputs. Assessing features of the software.

Reliability: Probability of failure-free software operation for a specified duration in a particular environment. Cool phrases: "Five 9's" OR "No down-time".
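As a point of reference, "Five 9's" means 99.999% availability, which allows only about 0.001% of a year of downtime: roughly 5.3 minutes per year (0.00001 × 365.25 days × 24 hours × 60 minutes ≈ 5.26 minutes).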

The First "Computer Bug". Moth found trapped between points at Relay # 70, Panel F, of the Mark II Aiken Relay Calculator while it was being tested at Harvard University, 9 September 1947.

The operators affixed the moth to the computer log with the entry: "First actual case of bug being found". They put out the word that they had "debugged" the machine, thus introducing the term "debugging a computer program".

In 1988, the log, with the moth still taped by the entry, was in the Naval Surface Warfare Center Computer Museum at Dahlgren, Virginia. The log is now housed at the Smithsonian Institution's National Museum of American History, which has corrected the date from 1945 to 1947. Courtesy of the Naval Surface Warfare Center, Dahlgren, VA, 1988. NHHC Photograph Collection, NH 96566-KN (Color).

From https://www.facebook.com/navalhistory/photos/a.77106563343.78834.76845133343/10153057920928344/

Testing is Computationally Hard

The space is huge and it is generally infeasible to test anything completely.

Assessing quality is an exercise in establishing confidence in a system, or minimizing risks.

Other factors include
• Quality of the Process
• Quality of the Team
• Quality of the Environment

[Diagram: layered stack – App1 on OS1 on VM on Host OS on Hardware; each layer introduces risk]

Lots to Consider
• Component behavior
• Interactions between components
• System and sub-system behavior
• Interactions between sub-systems
• Negative path
• Behavior under load
• Behavior over time
• Usability

Two Approaches

Static Evaluations: Making judgments without executing the code
Dynamic Evaluations: Involves executing the code and judging performance

Static Technique – Reviews

A fundamental QA technique

Peer(s) reviews artifact for correctness and clarity

Often a formal process

Value: finding issues at design/definition time rather than waiting for the results of the step to complete

Applies to requirements, test plans, architecture, and implementation & design

Highly effective, but does not replace the need for dynamic techniques

One Extreme: Jury/Peer Reviews

Before anything is accepted, someone other than the creator must review it and approve it

• Single reviewer model – Usually a “certified” / senior person

• Panel model – Highly structured reviews

– Can take significant preparation

• Usually done at the design or development stage

• May introduce delay between when code is written and when it gets reviewed

Reviews

Models exist for either the reviewer or the author to lead the discussion

Author usually provides participants materials to study in advance

Requires positive and open attitudes and preparation

Value
• Second opinion on clarity, effectiveness, and efficiency
• Learning from others
• Avoids "board blindness" on seeing flaws
• Peer pressure to be neat and tie up loose ends

Review Meeting roles: Moderator, Scribe, Author, and a Review Panel of Peers, Experts, and Client(s)

Paired Programming

Lightweight Peer Reviews

One person drives while the other watches/reviews

Derived from Extreme Programming; a current favorite in agile
 Continuous review
 Shared problem solving
 Better communications
 Learning from peer
 Social!
 Peer pressure

When compared to solo dev models, MAY cause higher initial cost per module created (time and resource), BUT higher quality and lower overall cost.

See as an example http://collaboration.csc.ncsu.edu/laurie/Papers/XPSardinia.PDF

What do reviews look for?

Clarity: Can the reader easily and directly understand what the artifact is doing?

Correctness: Analysis of the algorithm used

Common Code Faults
1. Data: initialization, value ranges, and type mismatches
2. Control: are all the branches really necessary (are the conditions properly and efficiently organized)? Do loops terminate?
3. Input: are all parameters or collected values used?
4. Output: is every output assigned a value?
5. Interface faults: parameter numbers, types, and order; structures and shared memory
6. Storage management: memory allocation, garbage collection, inefficient memory access
7. Exception handling: what can go wrong, what error conditions are defined, and how are they handled?
(Several of these faults are illustrated in the sketch below.)

List adapted from W. Arms: http://www.cs.cornell.edu/Courses/cs5150/2015fa/slides/H2-testing.pdf
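The small C++ sketch below (illustrative only; the function and names are invented, not from the slides) deliberately packs in several of the faults from the list:

    #include <cstring>

    // Hypothetical helper: averages the first 'count' readings in 'values'.
    // Deliberately flawed to show the kinds of faults a review looks for.
    double averageReadings(const double* values, int count) {
        double sum;                       // fault 1: 'sum' is never initialized
        for (int i = 0; i <= count; i++)  // fault 2: loop bound is off by one, reads past the array
            sum += values[i];
        char* label = new char[16];       // fault 6: allocated, never used again or freed (leak)
        std::strcpy(label, "avg");
        return sum / count;               // fault 7: count == 0 is never handled (divide by zero)
    }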

Examples

You are asked to sort an array. There are many algorithms to sort an array. [You aren't going to use a library function, so you have to write this.]

Many choices exist. Suppose you are deciding between bubble sort, quicksort, and merge sort. All will work (sort an array), but which will be the better code?

Bubble sort is very easy to write: two loops. Slow on average, O(n²) – how big will n be? Sorts in place, so O(n) total memory.

Quicksort is complicated to write. O(n log n) on average, O(n²) worst case. Sorts in place, so O(n) total memory (plus the recursion stack). Very effective on in-memory data. Most implementations are very fast.

Mergesort is moderate to write. O(n log n) worst case. Memory required is a function of the data structure. Very effective on data that requires external access.

Expressively Logical…

Two logically equivalent versions of the same routine:

    #include <cmath>   // for std::pow

    // Version 1: single exit point, result carried in a flag variable
    bool SquareRoot(double dValue, double &dSquareRoot)
    {
        bool bRetValue = false;
        if (dValue < 0) {
            dSquareRoot = 0.0;            // no real square root for negative input
            bRetValue = false;
        }
        else {
            dSquareRoot = std::pow(dValue, 0.5);
            bRetValue = true;
        }
        return bRetValue;
    }

    // Version 2: early return, no flag variable
    bool SquareRoot(double dValue, double &dSquareRoot)
    {
        dSquareRoot = 0.0;
        if (dValue < 0)
            return false;
        dSquareRoot = std::pow(dValue, 0.5);
        return true;
    }

Static Program Analyzers

Evaluate code modules automatically looking for errors or odd things

 Loops or programs with multiple exits (more common) or entries (less common)
 Undeclared, uninitialized, or unused variables
 Unused functions/procedures, parameter mismatches
 Unassigned pointers
 Memory leaks
 Show paths through code/system
 Show how outputs depend on inputs

Rules of Defensive Programming (taken from Bill Arms)

Based on Murphy's Law: Anything that can go wrong, will

1. Write SIMPLE code
2. If code is difficult to read, RE-WRITE IT
3. Test implicit assumptions
   – Check all parameters passed in from other modules
4. Eliminate all compiler warnings from code
5. It never hurts to check system states after modification
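A minimal C++ sketch of rules 3 and 5 (the function and its sanity limits are invented for illustration, not taken from the slides):

    #include <cassert>
    #include <stdexcept>

    // Hypothetical module entry point: callers pass a buffer and a count.
    double meanTemperature(const double* readings, int count) {
        // Rule 3: test implicit assumptions about parameters from other modules.
        if (readings == nullptr || count <= 0)
            throw std::invalid_argument("meanTemperature: null buffer or non-positive count");

        double sum = 0.0;
        for (int i = 0; i < count; i++)
            sum += readings[i];

        double mean = sum / count;
        // Rule 5: check the state after modification; here, a simple sanity bound.
        assert(mean >= -273.15 && "mean below absolute zero: inputs are suspect");
        return mean;
    }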

Dynamic Evaluations

Quick Terminology
• Mistake – A human action that results in an incorrect result
• Fault / Defect – Incorrect step, process, or data within the software
• Failure – Inability of the software to perform within performance criteria
• Error – The difference between the observed and the expected value or behavior

Objective

Write test cases and organize them into suites that cause failure and illuminate faults.

Ideally you will fail in striving for this objective, but you will be surprised how successful you may be.

Who is a Tester?

Developers – Good for exposing known risk areas
Experienced Outsiders and Clients – Good for finding gaps missed by developers
Inexperienced Users – Good for finding other errors
Mother Nature – Always finds the hidden flaw

Approaches

1. Top Down – System flows are tested; units are stubbed
   Especially useful in UIs/UX, workflows, and very large systems
2. Bottom Up – Each unit is tested on its own

3. Stress – Test at or past design limits

Testing Flow (Dynamic Evaluation)

[Diagram: testing flow; stages include Unit, Integration, Functional Test, Performance Test, System Test, Soak (Operational Readiness), Acceptance Test, Installation, and Client Operational use]

Two Forms of Testing

Black Box and White Box

Black Box Testing

• No access to the internal workings of the system under test (SUT)
• Testing against specifications
  – The tester knows what the SUT's I/O or behavior should be
• The tester observes the results or behavior

With software, this tests the interface:
→ What is input to the system?
→ What can you do from the outside to change the system?
→ What is output from the system?

Can a Component Developer Do Black Box Testing?

White Box Testing

• Have access to the internal workings of the system under test (SUT)
• Testing against specifications, with access to algorithms, data structures, and messaging
• The tester observes the results or behavior

• Testing evaluates logical paths through code
  – Conditionals
  – Loops
  – Branches
• Impossible to exercise all paths completely, so you make compromises
  – Focus on only important paths
  – Focus on only important data structures
• Keeping components small is a big help here

Ground Floor – Unit Tests

Tests focus on an individual component:
1. Interfaces
2. Messages
3. Shared memory
4. Internal functions

Emphasizes adherence to the specifications

Code bases often include the code and the unit tests as a coherent piece

Usually done by developers building the component

Unit tests decouple the developer from the code: individual code ownership is not required if unit tests protect the code

Unit tests enable refactoring: after each small change, the unit tests can verify that a change in structure did not introduce a change in functionality
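As a minimal sketch (my example, not from the slides), unit tests for the SquareRoot routine shown earlier might look like this, using plain asserts rather than any particular test framework:

    #include <cassert>
    #include <cmath>

    bool SquareRoot(double dValue, double &dSquareRoot);   // component under test (either version above)

    void testSquareRootOfFour() {
        double root = 0.0;
        assert(SquareRoot(4.0, root) && "expected success for a non-negative input");
        assert(std::fabs(root - 2.0) < 1e-9 && "square root of 4 should be 2");
    }

    void testSquareRootRejectsNegative() {
        double root = 0.0;
        assert(!SquareRoot(-1.0, root) && "negative input must be reported as a failure");
    }

    int main() {
        testSquareRootOfFour();
        testSquareRootRejectsNegative();
        return 0;   // reaching here means every assert held
    }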

What Makes for a Good Test

Test Perspective
• Either addresses a partition of inputs or tests for common developer errors
  – Should target finding specific problems
  – Should optimize the cost of defining and running the test against the likelihood of finding a fault/failure

Tester Perspective
• Know why the test exists
• Automated
• Runs fast – to encourage frequent use
• Small in scope – test one thing at a time
• When a failure occurs, it should pinpoint the issue and not require much debugging
  – Failure messages help make the issue clear
  – Should not have to refer to the test to understand the issue

Organizing Testing

Test Plan

Describes test activities:
1. Scope
2. Approach
3. Resources
4. Schedule

Identifies
• What is to be tested
• The tasks required to do the testing
• Who will do each task
• The test environment
• The test design techniques
• Entry and exit criteria to be used
• Risk identification and contingency planning

Test Suite

A set of test cases and scripts to measure answers

Often the post condition of one test is used as the precondition for the next one
OR
Tests may be executed in any order

Adapted from http://sqa.stackexchange.com/questions/9119/test-suite-vs-test-plan

Defect Severity

An assessment of a defect's impact. Can be a major source of contention between dev and test.

Critical – Show stopper. The functionality cannot be delivered unless that defect is cleared. It does not have a workaround.
Major – Major flaw in functionality, but it still can be released. There is a workaround, but it is not obvious and is difficult.
Minor – Affects minor functionality or non-critical data. There is an easy workaround.
Trivial – Does not affect functionality or data. It does not even need a workaround. It does not impact productivity or efficiency. It is merely an inconvenience.

Test Exit Report – Input to Go/No Go Decision

1. Document Purpose – Short description of the objective
2. Application Overview – Overview of the SUT
3. Testing Scope – Describes the functions/modules in and out of scope for testing. Also identifies what was omitted.
4. Metrics – Results of testing, including summaries
   – Number of test cases planned vs. executed
   – Number of test cases passed/failed
   – Number of defects identified and their status & severity
   – Distribution of defects
5. Types of testing performed – Description of tests run
6. Test Environment and Tools – Description of the environment. Helpful for recreating issues and understanding context.
7. Recommendations – Workaround options
8. Exit Criteria – Statement whether the SUT passes or not – Go/no-go recommendation
9. Conclusion / Sign Off

Testing Hint #1 – Mess With Inputs

• If a single value, try
  – Negative values
  – Alternate types
  – Very small or very large inputs (overflow buffers if you can)
  – Null values

• If input is a sequence, try
  – Using a single-valued sequence
  – Repeated values
  – Varying the length of sequences and the order of the data
  – Forcing situations where the first, last, and middle values are used

• Try to force each and every error message

• Try to force computational overflows or underflows
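A short sketch of what "messing with inputs" can look like in practice (the parseAge function and its contract are invented for this illustration):

    #include <cassert>
    #include <string>

    // Hypothetical component under test: returns the parsed age, or -1 for anything it cannot accept.
    int parseAge(const std::string& text) {
        if (text.empty() || text.size() > 3)
            return -1;                              // reject empty or implausibly long input
        int value = 0;
        for (char c : text) {
            if (c < '0' || c > '9')
                return -1;                          // reject signs, spaces, and other non-digits
            value = value * 10 + (c - '0');
        }
        return value;
    }

    int main() {
        assert(parseAge("42") == 42);               // ordinary value
        assert(parseAge("-5") == -1);               // negative value
        assert(parseAge("") == -1);                 // empty input
        assert(parseAge("forty-two") == -1);        // alternate type: non-numeric text
        assert(parseAge("999999999999999") == -1);  // very large value: must not overflow
        assert(parseAge(std::string(1000000, '9')) == -1);  // huge input, probing for buffer issues
        return 0;
    }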

Testing Hint #2 – Force Every Path

Each logical path (each execution path through the code) must be exercised at least once.

• If…then…else = two paths
• Switch…case() = one path per case, plus one path if there is no catch-all case

• Repeat…Until ≥ two paths
• While…Do ≥ two paths

• Object member functions = one path per signature
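A small worked example of counting paths this way (illustrative code, not from the slides):

    // Hypothetical routine: clamp a value, then scale it 'rounds' times.
    double clampAndScale(double value, int rounds) {
        if (value < 0.0)            // if…else if…else: 3 branch outcomes
            value = 0.0;
        else if (value > 100.0)
            value = 100.0;
        while (rounds-- > 0)        // while…do: at least 2 outcomes (zero iterations, one or more)
            value *= 1.1;
        return value;
    }
    // Path budget: 3 branch outcomes (negative / too large / in range)
    // x 2 loop outcomes = at least 6 paths that tests should exercise.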

Testing Hint #3 – Mess With Interfaces

• Remember, interfaces may involve:
  1. References to data or functions
  2. Shared memory
  3. Messages
• Data may be passed by reference or by value
• Methods only have data interfaces

Internals:
1. Functions
2. Data

• Set interface parameters to extremely low and high values
• Set pointer values to NULL
• Mis-type the parameters or violate value boundaries
  – e.g., set an input as negative where the signature expects ≥ 0

• Call the component so it will fail and check the failure reactions

• Pass too few or too many parameters

• Bombard the interface with messages

• With shared memory, vary accessor instantiation and access activities

Testing Hint #4 – Be Diabolical

Try to break the system: use data with extreme values to crash it.

Life Lessons

• If unit testing is not thorough, all subsequent testing will likely be a waste of time.
• You should always take the time to do a good job with unit testing
  – Even when the project is falling behind
• Unit tests will be most needed when you have the least amount of time
  – Unit tests should be created before they are needed, not when you need them

• The end of a project is almost always compressed
  – Developers often defer testing-related tasks until as late as possible

System Test

Integrating components and sub-systems to create the system; checks component compatibility, interactions, correct passing of information, and timing.

Like Unit Test, activities focus on following uses and data:
1. Typical
2. Boundaries
3. Outliers
4. Failures

Unlike Unit Test:
• Components may come from many, independent parties
• Bespoke development may meet Off-The-Shelf or reused components
• Testing becomes a group activity
• Testing may move to an independent team altogether

WARNING

Will be a complete and utter waste if components are not thoroughly tested

Unlike Components, Systems Have Emergent Behavior

Some behavior is only clear when you put components together

This has to be tested too,

although it can be very hard to plan in advance!

Integrating Multiple Parties May Introduce Conflict

System Integration Implications

• Components may come from multiple, possibly independent, parties
  – Are COTS components trusted?
• Bespoke development may meet Off-The-Shelf or reused components
• Testing becomes a group activity
• Testing may move to an independent team altogether

• Who controls integration readiness?
  – What does lab entry mean?
• How to assign credit for test results, and then who is responsible for repairs?
  – How to maintain momentum when everyone isn't at the table?
  – When partner priorities are not shared?
  – What about open source?

Testing Focus

Emphasizes component compatibility, interactions, correctly passing information, and timing

Integration aims to find misunderstandings one component introduces when it interacts with other components

Use Cases are a useful testing model

• Forces components to interact
• Sequence diagrams form a strong basis for designing these tests
  – Articulates the inputs required and the expected behaviors and outputs

Iterative Development Leads to Iterative Testing

Two senses:
1. Create tests incrementally

2. Run tests iteratively
   a. On check-in and branch merge, test all affected modules
   b. On check-in, test all modules
   c. Per a schedule, test all modules – e.g., daily

Each change, especially after a bug fix, should mean adding at least one new test.
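For example (an illustrative sketch; the defect and names are invented), a fix for a reported bug would arrive together with a new test that reproduces it:

    #include <cassert>
    #include <string>

    // Component that had the (hypothetical) defect: trim() misbehaved on all-whitespace input.
    std::string trim(const std::string& s) {
        const auto first = s.find_first_not_of(" \t\n");
        if (first == std::string::npos)
            return "";                              // all-whitespace: the case the fix addresses
        const auto last = s.find_last_not_of(" \t\n");
        return s.substr(first, last - first + 1);
    }

    // Regression test checked in together with the fix.
    int main() {
        assert(trim("   ") == "");                  // the reported failure, now covered forever
        assert(trim("") == "");                     // nearby boundary case
        assert(trim("  hello  ") == "hello");       // normal behavior still intact
        return 0;
    }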

It is always best to test after each change as completely as you can, and completely before a release.

Your testing is good enough until a problem shows that it is not good enough.

It is hard to know when you should feel enough confidence to release the system

Confidence comes, in part, from the subset of possible tests selected

Picking the Subset

• Selection based on company policy
• Every statement must be executed at least once
• Every path must be exercised
• Crafted by specific end-user use cases (scenario testing)
• Selection based on testing team experience

[Chart: defects found as a function of Software Quality (low to high) and Test Quality (low to high); one combination yields many defects found, the others few]

Measuring Quality: Defect Density

Using the past to estimate the future

Judges code stability by comparing past number of bugs per code measure (lines of code, number of modules,…) to present measured levels

bugDensity_release(i) = (bugs_pre-release(i) + bugs_post-release(i)) / codeMeasure

[Chart: Defect Density (y-axis) by Release (x-axis); high density indicates poor software quality, a middle band indicates expected quality, and very low density indicates poor test coverage/quality]

If the density for the next release's additional code is within the ranges of prior releases, it is a candidate for release – unless test or development practices have improved.
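A quick worked example with made-up numbers: if a release shipped 20 KLOC and accumulated 30 pre-release plus 10 post-release bugs, its density is (30 + 10) / 20 = 2.0 bugs per KLOC; a next release whose new code lands near that figure would, by this measure, be a release candidate.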

Measuring Quality: Defect Seeding

Using a known quantity as inference to the unknown

Judges code stability by intentionally inserting bugs into a program and then measuring how many get found as an estimator for the actual number of bugs

bugs_release(i) = (seededBugs_planted(i) / seededBugs_found(i)) * bugs_found(i)
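For instance (made-up numbers): if 20 bugs are seeded and testing finds 15 of them plus 45 real bugs, the estimate is (20 / 15) * 45 = 60 real bugs in total, suggesting roughly 15 remain undiscovered.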

Challenges

1. Seeding is not easy. Placing the right kinds of bugs in enough of the code is hard.
   – Bad seeding, being too easy or too hard to find, creates a false sense of confidence in your reviews and testing
     • Too easy: doesn't mean that most or all of the real bugs were found
     • Too hard: danger of looking past the "Goodenov" line or for things that aren't there

2. Seeded code must be cleansed of any missed seeds before release. Post clean-up, the code must be tested to ensure nothing got accidentally broken.

Measuring Quality: Capture-Recapture

Applies an estimating technique used in predicting wildlife populations (Humphrey, Introduction to the Team Software Process, Addison-Wesley, 2000)

Uses data collected by two or more independent collectors, gathered via reviews or tests

Example: Estimating Turtle Population

You tag 5 turtles and release them. You later catch 10 turtles; two have tags.

Total # of turtles / 5 turtles ≈ 10 turtles / 2 turtles

Total # of turtles = (10 turtles * 5 turtles) / 2 turtles = 25 turtles

Each collector finds some defects out of the total number of defects. Some of the defects found will overlap.

Method

1. Count the number of defects found by each collector (A, B)
2. Count the number of intersecting defects found by each collector (C)
3. Calculate defects found = (A + B) - C

4. Estimate total defects = (A * B) / C
5. Estimate remaining defects = (A * B) / C - [(A + B) - C]

If there are multiple collectors, assign A to the highest collected number and set B to the rest of the collected defects. When multiple engineers find the same defect, count it just once.
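A worked example with made-up numbers: reviewer A finds 25 defects, reviewer B finds 20, and 10 of those are the same defect (C = 10). Defects found = (25 + 20) - 10 = 35; estimated total defects = (25 * 20) / 10 = 50; estimated remaining defects = 50 - 35 = 15.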

Performance Testing

Measures the system's capacity to process load.

Involves creating and executing an operational profile that reflects the expected pattern of uses.

Performance – Aims to assess compliance with non-functional requirements
Stress – Identify defects that emerge only under load
Endurance – Measures reliability and availability

Ideally the system should degrade gracefully rather than collapse under load.

Under load, other issues like protocol overhead or timing take center stage.
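As a closing illustration (my sketch, not from the slides), a bare-bones load driver might step up the request rate and record latency, which is where graceful degradation, or its absence, shows up; handleRequest is a stand-in for the real system call:

    #include <chrono>
    #include <cstdio>
    #include <thread>

    // Stand-in for the operation under test; replace with a real call into the system.
    void handleRequest() { std::this_thread::sleep_for(std::chrono::milliseconds(2)); }

    int main() {
        using clock = std::chrono::steady_clock;
        for (int requestsPerBurst = 10; requestsPerBurst <= 1000; requestsPerBurst *= 10) {
            auto start = clock::now();
            for (int i = 0; i < requestsPerBurst; i++)
                handleRequest();                     // sequential here; real drivers add concurrency
            auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(clock::now() - start);
            std::printf("%4d requests -> %lld ms total, %.2f ms avg\n",
                        requestsPerBurst, static_cast<long long>(elapsed.count()),
                        static_cast<double>(elapsed.count()) / requestsPerBurst);
        }
        return 0;
    }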