
Debugging: Past, Present, and Future

Alessandro (Alex) Orso, School of Computer Science, College of Computing, Georgia Institute of Technology, http://www.cc.gatech.edu/~orso/

Partially supported by: NSF, Google, IBM, and MSR

Automated Debugging

Past

Present

Future

Past: A (not so) Short History of Debugging

The Birth of Debugging

First reference to software errors: your guess?

The Birth of Debugging

1843 • Software errors mentioned in Ada Byron's notes on Charles Babbage's Analytical Engine
1940 • Several uses of the term "bug" to indicate defects in computers and software
1947 • First actual bug and actual debugging (Admiral Grace Hopper's associates working on the Mark II computer at Harvard University)

Symbolic Debugging

1962 • UNIVAC 1100's FLIT (Fault Location by Interpretive Testing)
1986 • Richard Stallman's GDB
1996 • DDD
• ...

Program Slicing

• Intuition: developers "slice" backwards when debugging
1981 • Weiser's breakthrough paper
1988 • Korel and Laski's dynamic slicing
1993 • Agrawal
2008 • Ko's Whyline

Static Slicing Example

mid() {
  int x,y,z,m;
1:  read("Enter 3 numbers:",x,y,z);
2:  m = z;
3:  if (y<z)
4:    if (x<y)
5:      m = y;
6:    else if (x<z)
7:      m = y;
8:  else
9:    if (x>y)
10:     m = y;
11:   else if (x>z)
12:     m = x;
13: print("Middle number is:", m);
}
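The backward-slicing idea can be sketched in a few lines. This is a hedged sketch, not Weiser's full algorithm: the dependence edges below are hand-derived for the mid() listing above (a real slicer would compute data and control dependences from the code), and statement numbers follow the listing.

```python
# Backward static slicing sketch over a hand-built dependence graph.
# deps[s] = statements that s is data- or control-dependent on.
deps = {
    2: {1},             # m = z uses z, read at statement 1
    3: {1},             # if (y<z) uses y and z
    4: {1, 3},          # data dep on the read, control dep on stmt 3
    5: {1, 4},
    6: {1, 4},
    7: {1, 6},
    9: {1, 3},
    10: {1, 9},
    11: {1, 9},
    12: {1, 11},
    13: {2, 5, 7, 10, 12},  # print(m): all reaching definitions of m
}

def backward_slice(criterion):
    """Transitively collect every statement the criterion depends on."""
    slice_, worklist = set(), [criterion]
    while worklist:
        s = worklist.pop()
        if s not in slice_:
            slice_.add(s)
            worklist.extend(deps.get(s, ()))
    return sorted(slice_)

print(backward_slice(13))   # nearly the entire program
```

Slicing on m at statement 13 returns every statement except the bare else on line 8, which is exactly the weakness of static slices that dynamic slicing addresses.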


Dynamic Slicing Example

mid() {
  int x,y,z,m;
1:  read("Enter 3 numbers:",x,y,z);
2:  m = z;
3:  if (y<z)
4:    if (x<y)
5:      m = y;
6:    else if (x<z)
7:      m = y; // bug
8:  else
9:    if (x>y)
10:     m = y;
11:   else if (x>z)
12:     m = x;
13: print("Middle number is:", m);
}

Statements executed per test case (x,y,z):
3,3,5 (P): 1, 2, 3, 4, 6, 7, 13
1,2,3 (P): 1, 2, 3, 4, 5, 13
3,2,1 (P): 1, 2, 3, 8, 9, 10, 13
5,5,5 (P): 1, 2, 3, 8, 9, 11, 13
5,3,4 (P): 1, 2, 3, 4, 6, 13
2,1,3 (F): 1, 2, 3, 4, 6, 7, 13
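For contrast, a hedged sketch of the dynamic counterpart: only the dependences actually exercised by the failing run (2,1,3) are kept. The trace and edges below are hand-derived from the listing above; a real dynamic slicer would record them at run time.

```python
# Dynamic slicing sketch (Korel and Laski style): dependences from ONE
# concrete execution, here the failing input (2,1,3) on mid().
trace = [1, 2, 3, 4, 6, 7, 13]   # statements executed, in order

# dyn_deps[s] = statements s dynamically depended on in this run
dyn_deps = {
    2: {1},
    3: {1},
    4: {1, 3},
    6: {1, 4},
    7: {1, 6},
    13: {7},   # the printed m was last defined at statement 7
}

def dynamic_slice(criterion):
    slice_, worklist = set(), [criterion]
    while worklist:
        s = worklist.pop()
        if s not in slice_:
            slice_.add(s)
            worklist.extend(dyn_deps.get(s, ()))
    return sorted(slice_)

print(dynamic_slice(13))   # strictly smaller than the static slice
```

Note that statement 2 (m = z) drops out: in this run the printed m was last defined at the buggy statement 7, so the dynamic slice focuses the developer on far fewer statements than the static one.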


Ko's Whyline (2008)

• Lets developers ask questions directly about program output, e.g., "why is this stroke black?" and "why didn't this color panel change?"

Delta Debugging

• Intuition: it's all about differences!
• Isolates failure causes automatically
1999 • Zeller's "Yesterday, My Program Worked. Today, It Does Not. Why?"
• Applied in several contexts
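The core of delta debugging is the ddmin algorithm: split the failing input into chunks and repeatedly test complements, keeping any reduced input that still fails. A minimal sketch follows; the oracle `failing` is a made-up stand-in (the "program" fails whenever changes 3 and 7 are both present), whereas a real deployment would rerun the program on each candidate.

```python
# ddmin sketch: shrink a failing list of "deltas" to a 1-minimal one.

def failing(changes):
    # Hypothetical oracle: failure needs both change 3 and change 7.
    return 3 in changes and 7 in changes

def ddmin(items, test):
    """Return a minimal sublist of `items` on which `test` still fails."""
    assert test(items), "ddmin needs a failing input to start from"
    n = 2                                    # current granularity
    while len(items) >= 2:
        length = len(items) // n
        reduced = False
        for i in range(n):
            lo = i * length
            hi = len(items) if i == n - 1 else lo + length
            complement = items[:lo] + items[hi:]
            if test(complement):             # failure persists: keep it
                items, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n == len(items):              # finest granularity reached
                break
            n = min(n * 2, len(items))       # try smaller chunks
    return items

print(ddmin(list(range(1, 9)), failing))     # → [3, 7]
```

Starting from eight changes, the sketch isolates the two that interact to cause the failure; the same loop works on program inputs, code changes, or states once `test` is swapped for the real oracle.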

[Figure: delta debugging repeatedly narrows the difference between yesterday's working version (✔) and today's failing version (✘) until the failure cause is isolated. Applied to programs, inputs, states, ...]

Statistical Debugging

• Intuition: debugging techniques can leverage multiple executions
2001 • Tarantula
• Liblit's CBI
• Many others!

Tarantula

                                     Test cases (x,y,z)
mid() {                          3,3,5 1,2,3 3,2,1 5,5,5 5,3,4 2,1,3   susp.
  int x,y,z,m;
1:  read("Enter 3 numbers:",x,y,z);  •    •    •    •    •    •        0.5
2:  m = z;                           •    •    •    •    •    •        0.5
3:  if (y<z)                         •    •    •    •    •    •        0.5
4:    if (x<y)                       •    •              •    •        0.63
5:      m = y;                            •                            0.0
6:    else if (x<z)                  •                   •    •        0.71
7:      m = y; // bug                •                        •        0.83
8:  else                                       •    •                  0.0
9:    if (x>y)                                 •    •                  0.0
10:     m = y;                                 •                       0.0
11:   else if (x>z)                                 •                  0.0
12:     m = x;                                                         0.0
13: print("Middle number is:", m);   •    •    •    •    •    •        0.5
}
Pass/Fail                            P    P    P    P    P    F
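The suspiciousness column can be reproduced mechanically. Below is a small sketch of Tarantula's metric, with Ochiai (a later metric from the same family, mentioned further on in this talk) included for comparison; the coverage sets are transcribed from the matrix above, with the six tests numbered 0 through 5.

```python
# Spectra-based suspiciousness for the mid() example.
# Tests 0..5 = (3,3,5), (1,2,3), (3,2,1), (5,5,5), (5,3,4), (2,1,3);
# only test 5 fails.
import math

coverage = {                       # statement -> tests that execute it
    1: {0, 1, 2, 3, 4, 5}, 2: {0, 1, 2, 3, 4, 5}, 3: {0, 1, 2, 3, 4, 5},
    4: {0, 1, 4, 5},       5: {1},                6: {0, 4, 5},
    7: {0, 5},             8: {2, 3},             9: {2, 3},
    10: {2},               11: {3},               12: set(),
    13: {0, 1, 2, 3, 4, 5},
}
failing_tests = {5}
passing_tests = {0, 1, 2, 3, 4}

def tarantula(stmt):
    # susp(s) = (%failed covering s) / (%failed + %passed covering s)
    f = len(coverage[stmt] & failing_tests) / len(failing_tests)
    p = len(coverage[stmt] & passing_tests) / len(passing_tests)
    return 0.0 if f + p == 0 else f / (f + p)

def ochiai(stmt):
    # Ochiai uses raw counts: ef / sqrt(total_failed * (ef + ep))
    ef = len(coverage[stmt] & failing_tests)
    ep = len(coverage[stmt] & passing_tests)
    return 0.0 if ef == 0 else ef / math.sqrt(len(failing_tests) * (ef + ep))

for s in sorted(coverage, key=tarantula, reverse=True):
    print(s, round(tarantula(s), 2))
# Both metrics rank statement 7, the faulty one, highest.
```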

Statistical Debugging

• Intuition: debugging techniques can leverage multiple executions
2001 • Tarantula
2003 • Liblit's CBI
2006 • Ochiai
2010 • Causal-inference-based techniques
• IR-based techniques
• Many others!

IR-Based Techniques

Bug ID: 90018
Summary: Native tooltips left around on CTabFolder.
Description: Hover over the PartStack CTabFolder inside eclipse until some native tooltip is displayed. For example, the maximize button. When the tooltip appears, change perspectives using the keybinding. The CTabFolder gets hidden, but its tooltip is permanently displayed and never goes away. Even if that CTabFolder is disposed (I'm assuming) when the perspective is closed.

Source code file: CTabFolder.java

public class CTabFolder extends Composite {
    // tooltip
    int [] toolTipEvents = new int[] {SWT.MouseExit, SWT.MouseHover,
        SWT.MouseMove, SWT.MouseDown, SWT.DragDetect};
    Listener toolTipListener;
    ...
    /* Returns true if the CTabFolder only displys the selected tab
     * and false if the CTabFolder displays multiple tabs. */
    ...
    void onMouseHover(Event event) {
        showToolTip(event.x, event.y);
    }
    void onDispose() {
        inDispose = true;
        hideToolTip();
        ...
    }
}
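A hedged sketch of the IR-based idea: treat the bug report as a query and rank source files by TF-IDF cosine similarity. The two-"file" corpus below is invented for illustration; real techniques add identifier splitting, stemming, and many other refinements.

```python
# IR-based fault localization sketch: TF-IDF + cosine similarity
# between a bug report and (toy) source files.
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf(doc, docs):
    tf = Counter(tokens(doc))
    n = len(docs)
    return {t: c * math.log(n / sum(1 for d in docs if t in tokens(d)))
            for t, c in tf.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = lambda vec: math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return dot / (norm(u) * norm(v))

bug_report = "native tooltips left around on CTabFolder tooltip displayed"
files = {  # hypothetical file contents, reduced to identifier words
    "CTabFolder.java": "tooltip listener show tool tip hide tool tip ctabfolder",
    "PartStack.java":  "perspective part stack layout activate",
}
corpus = list(files.values()) + [bug_report]
query = tfidf(bug_report, corpus)
ranking = sorted(files, key=lambda f: cosine(tfidf(files[f], corpus), query),
                 reverse=True)
print(ranking)   # CTabFolder.java ranks first
```

The file sharing vocabulary with the report (tooltip, CTabFolder) rises to the top, which is the whole bet behind IR-based techniques: bug reports and faulty code tend to share words.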

Additional Techniques

• Contracts (e.g., Meyer et al.)
• Counterexample-based (e.g., Groce et al., Ball et al.)
• Tainting-based (e.g., Leek et al.)
• Debugging of field failures (e.g., Jin et al.)
• Predicate switching (e.g., Zhang et al.)
• Fault localization for multiple faults (e.g., Steimann et al.)
• Debugging of concurrency failures (e.g., Park et al.)
• Automated data structure repair (e.g., Rinard et al.)
• Finding patches with genetic programming
• Domain-specific fixes (tests, web pages, comments, concurrency)
• Identifying workarounds/recovery strategies (e.g., Gorla et al.)
• Formula-based debugging (e.g., Jose et al., Ermis et al.)
• ...

Not meant to be comprehensive!

Present: Can We Debug at the Push of a Button?

Automated Debugging (rank based)

1) 2) 3) 4) ... "Here is a list of places to check out."

Automated Debugging Conceptual Model

"OK, I will check out your suggestions one by one."

1) ✔ 2) ✔ 3) ✔ 4) ... "Found the bug!"

Performance of Automated Debugging Techniques

[Plot: for spectra-based techniques on the Siemens and Space subjects, % of faulty versions (y-axis) vs. % of the program to be examined to find the fault (x-axis).]

Mission Accomplished?

Assumption #1: Locating a bug in 10% of the code is a great result

100 LOC ➡ 10 LOC

10,000 LOC ➡ 1,000 LOC

100,000 LOC ➡ 10,000 LOC

Assumption #2: Programmers exhibit perfect bug understanding

Do you see a bug? Assumption #3: Programmers inspect a list linearly and exhaustively

Good for comparison, but is it realistic?

Does the conceptual model make sense? Have we really evaluated it?

Are we headed in the right direction?

"Are Automated Debugging Techniques Actually Helping Programmers?" C. Parnin and A. Orso, ISSTA 2011

"Evaluating the Usefulness of IR-Based Fault Localization Techniques" Q. Wang, C. Parnin, and A. Orso, ISSTA 2015

What do we know about automated debugging?

Studies on tools | Human studies

Let's see... over 50 years of research on automated debugging:

1962: Symbolic Debugging (UNIVAC FLIT)
1981: Weiser, Program Slicing
1999: Delta Debugging
2001: Statistical Debugging

What do we know about automated debugging?

Human studies

Studies on tools

Human studies: Weiser, Kusumoto, Sherwood, Ko, DeLine

Are these Techniques and Tools Actually Helping Programmers?

RQ1: Do programmers who use automated debugging tools locate bugs faster than programmers who do not use such tools?
RQ2: Is the effectiveness of debugging with automated tools affected by the faulty statement's rank?
RQ3: Do developers navigate a list of statements ranked by suspiciousness in the order provided?
RQ4: Does perfect bug understanding exist?

User studies: spectra-based fault localization and IR-based fault localization

Experimental Protocol: Setup

Participants: 34 developers (MS students) with different levels of expertise (low, medium, high)

Tools:
• Rank-based tool (Eclipse plug-in, logging)
• Eclipse

Experimental Protocol: Setup

Software subjects: • Tetris (~2.5KLOC) • NanoXML (~4.5KLOC)

Tetris bug (easier)

NanoXML bug (harder)

Experimental Protocol: Setup

Tasks: • Fault in Tetris • Fault in NanoXML • 30 minutes per task • Questionnaire at the end

Experimental Protocol: Studies and Groups

Part 1: groups A and B.
Part 2: groups C and D, with the faulty statement's rank artificially lowered (7 ➡ 35 for Tetris, 16 ➡ 83 for NanoXML).

Study Results

Tetris: A vs. B not significantly different overall, but significantly different when stratifying participants for high performers.
NanoXML: A vs. B not significantly different.
C vs. D (lowered ranks): not significantly different for either subject.

Analysis of results and questionnaires...

Findings

RQ1: Do programmers who use automated debugging tools locate bugs faster than programmers who do not use such tools?
➡ Yes (with caveats): experts are faster when using the tool
RQ2: Is the effectiveness of debugging with automated tools affected by the faulty statement's rank?
➡ No: changes in rank have no significant effects
RQ3: Do developers navigate a list of statements ranked by suspiciousness in the order provided?
➡ No: programmers do not visit each statement in the list, they search
RQ4: Does perfect bug understanding exist?
➡ No: perfect bug understanding is generally not a realistic assumption

Future: Where Shall We Go Next?

Feedback-based Debugging (Humans in the Loop)

Intuition: we should amplify, rather than replace, human skills

✗ Assumption: Programmers exhibit perfect bug understanding
✗ Assumption: Programmers inspect a list linearly and exhaustively

Swift

[Diagram: passing and failing method-level executions feed spectra-based fault localization (SFL), which produces a ranked list; the developer's conjecture of the fault cause feeds back into the process.]

High-level query: for a selected method invocation (entry state and parameters, return value and exit state), the developer answers ✓ (correct) | ✗ (incorrect) | ? (don't know).

[Example: three query iterations, each answered ✓, ✗, or ?]

Swift

[Diagram: the answers become virtual test cases that are fed back into SFL, together with the method-level executions and the conjecture of the fault cause.]

Preliminary empirical results:
• 20 faults from 3 open-source projects
• Average ranking: 66.3
• Average # of queries: 4.3

Formula-based Debugging (AKA Failure Explanation)

• Intuition: executions can be expressed as formulas that we can reason about
• Bug Assist (Cause Clue Clauses)
• Error invariants

Building the formula: conjoin the input, the semantics of the program (one conjunct per statement), and the violated assertion:

Input ⋀ A ⋀ B ⋀ C ⋀ Assertion

Because the run fails, this formula is unsatisfiable. A MAX-SAT solver then finds a maximal satisfiable subset, e.g.:

Input ⋀ A ⋀ B ⋀ Assertion (the MAX-SAT set)

The complement (CoMSS), here the conjunct C, is an explanation of the failure.
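The MAX-SAT step can be illustrated with a brute-force sketch (a real tool such as Bug Assist uses a partial MAX-SAT solver over an SMT encoding). The toy trace formula below is invented: hard constraints encode the input and the violated assertion, soft constraints A, B, C encode three statements, and each complement of a maximal satisfiable subset (CoMSS) is a candidate explanation.

```python
# Brute-force CoMSS computation for a toy, unsatisfiable trace formula.
from itertools import combinations, product

hard = [lambda e: e["x"] == 2,               # Input: x = 2
        lambda e: e["out"] > e["x"]]         # Assertion: out > x
soft = {"A": lambda e: e["y"] == e["x"] + 1, # stmt A: y = x + 1
        "B": lambda e: e["z"] == e["y"] * 0, # stmt B: z = y * 0  (buggy)
        "C": lambda e: e["out"] == e["z"]}   # stmt C: out = z

def satisfiable(constraints):
    # Tiny finite domain instead of a real SAT/SMT solver.
    for vals in product(range(-1, 6), repeat=4):
        env = dict(zip(("x", "y", "z", "out"), vals))
        if all(c(env) for c in constraints):
            return True
    return False

def all_comss():
    assert not satisfiable(hard + list(soft.values()))  # the run fails
    for k in range(len(soft), 0, -1):        # largest soft subsets first
        hits = [set(soft) - set(names)
                for names in combinations(sorted(soft), k)
                if satisfiable(hard + [soft[n] for n in names])]
        if hits:
            return hits
    return []

print(all_comss())   # → [{'C'}, {'B'}]
```

Both B (the actual bug) and C come back as candidate explanations, while A is exonerated; this mirrors how formula-based tools report a small set of suspicious statements rather than a single answer.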

Formula-based approaches include Bug Assist (Cause Clue Clauses), error invariants, and Angelic Debugging.

In Summary

• We came a long way since the early days of debugging

...

• But there is still a long way to go...

Where Shall We Go Next?

• Hybrid, semi-automated fault localization techniques
• Failure understanding and explanation
• Debugging of field failures (with limited information)
• (Semi-)automated repair and workarounds

Takeaway messages (true also for other areas):
• Again: don't pursue full automation at all costs!
• Be careful (and honest) with your assumptions!
• User studies, user studies, user studies!

With much appreciated input/contributions from

• Andy Ko • Abhik Roychoudhury • Wei Jin • Wes Weimer • Jim Jones • Tao Xie • Wes Masri • Andreas Zeller • Chris Parnin • Xiangyu Zhang