Automated Debugging: Are We Close?
Total Page:16
File Type:pdf, Size:1020Kb
COMPUTING PRACTICES Automated Debugging: Are We Close? Despite increased automation in software engineering, debugging hasn’t changed much. A new algorithm promises to relieve programmers of the hit-or-miss approach to isolating a failure’s external root cause. Andreas or the past 50 years, software engineers have determine if the failure is present or if the outcome is Zeller enjoyed tremendous productivity increases unresolved and feeds that information to the Delta Saarland as more and more tasks have become auto- Debugging code. University mated. Unfortunately, debugging—the Programmers can either hardcode the algorithm, process of identifying and correcting a fail- available from the Delta Debugging Web site Fure’s root cause—seems to be the exception, remain- (http://www.st.cs.uni-sb.de/dd/) or download and ing as labor-intensive and painful as it was five decades adapt the core of Wynot, a prototype debugger writ- ago. An engineer or programmer still has to notice ten in Python that runs on a Unix or Linux system. something, wonder why it happens, set up hypothe- Both the algorithm and Wynot (short for “worked yes- ses, and then attempt to confirm or refute them. terday, not today”) fit with any imperative language, In theory, there is no reason to continue this legacy. and Wynot is platform independent. Debugging can be just as disciplined, systematic, and Work on a “debugging server” is in progress. The quantifiable as any other area of software engineer- idea is to have a programmer submit the program to ing—which means that we should eventually be able be examined along with invocation details to a Web to automate at least part of it. So far, research has con- site, choose from a menu how to identify a failure, and centrated on program analysis because of its roots in click on “Submit.” Wynot will then automatically iso- compiler construction. But analysis requires complete late the failure-inducing circumstances and e-mail the knowledge about the program being examined and results to the programmer. Plans are for the server to does not scale well to large programs. be operational by spring 2002. Eventually, users might Testing is another way to gather knowledge about be able to run their own debugging server—for exam- a program because it helps weed out the circumstances ple, within a company intranet or as a module in an that aren’t relevant to a particular failure. If testing automated test system. reveals that only three of 25 user actions are relevant, for example, you can focus your search for the fail- HOW DELTA DEBUGGING WORKS ure’s root cause on the program parts associated with As Figure 1 shows, external failures typically come these three actions. If you can automate the search from program input, user interaction, and program process, so much the better. changes. Within each of these categories are myriad This is the premise of Delta Debugging, an algo- circumstances, any one of which could be the root rithm that uses the results of automated testing to sys- cause of the failure. tematically narrow the set of failure-inducing Delta Debugging requires a test to prove that each circumstances.1 Programmers supply a test function circumstance is really failure inducing. An automated for each bug and hardcode it into any imperative lan- testing environment typically provides such a test, guage. The test function checks a set of changes to but the number of test runs can be quite large. The 26 Computer 0018-9162/01/$17.00 © 2001 IEEE trade-off is that, unlike conventional debugging, Delta Debugging always produces a set of relevant 12 3 ✘ n failure-inducing circumstances, which offer signifi- HTML ✔ cant insights into the nature and cause of the failure. … Also, if you know something about the structure of the failure-inducing circumstance, fewer tests are required. Circumstances How time-consuming it is to incorporate Delta Debugging in a testing environment depends on the circumstances. Program input is typically easy to access, modify, and evaluate. Adapting Delta Debugging to run a program on a different input should take only a matter of hours. Examining user- interaction history requires external tools that record and play back user interaction. Program changes are Program moderately easy to access and simple to modify, but altering configurations can be difficult, depending on the tools you use. ✔ (pass) My colleagues and I used the Wynot prototype to y ✘ (fail) debug Mozilla, Netscape’s open-source Web browser Output ? (unresolved) project (http://www.mozilla.org/), and the GNU debugger. Specifically, we used Wynot to Ok the following operations cause Figure 1. How circum- • simplify HTML input that causes the Mozilla mozilla to crash consistently on my stances affect a pro- browser to fail—after 57 test runs, only one of machine gram’s behavior. A 896 HTML lines remained; failure can stem from • simplify Mozilla failure-inducing user interac- → Start mozilla a range of circum- tion—after 82 test runs, we saw that only 3 of 95 → Go to bugzilla.mozilla.org stances, including user actions were relevant for the failure; and → Select search for bug program input, such • identify failure-inducing code changes—after 97 → Print to file setting the bottom as an HTML page that tests, we narrowed a set of 178,000 changed lines and right margins to .50 (I use makes a browser fail, to one changed line in the GNU debugger that the file /var/tmp/netscape.ps) user interaction that caused a failure. → Once it’s done printing do the makes a program exact same thing again on the crash, or changes a SIMPLIFYING PROGRAM INPUT same file (/var/tmp/netscape.ps) programmer makes Mozilla is a work in progress with a wide audience → This causes the browser to crash after the program (Netscape 6 uses a Mozilla variant), and Mozilla engi- with a segfault fails a regression neers receive several dozens of bug reports a day. Their test. first step in processing a bug report is to simplify it— To simplify bug reports, BugAThon volunteers were eliminate all details that are irrelevant to the failure. supposed to load the Bugzilla Web page (http:// In part, a simplified bug report makes debugging eas- bugzilla.mozilla.org/) into their text editors and then ier because it replaces other reports with irrelevant follow the BugAThon instructions for simplifying details. Mozilla test cases: In July 1999, Bugzilla, the Mozilla bug database, listed more than 370 open, unsimplified bug reports, Start removing HTML markup, CSS rules, and lines and the queue was growing. Seeing that Mozilla engi- of JavaScript from the page. Start by removing the neers “faced imminent doom,” Eric Krock, the parts of the page that seem unrelated to the bug. Netscape product manager, sent out the Mozilla Every few minutes, check the page to make sure it BugAThon, a call for volunteers to help simplify bug still reproduces the bug. [. ] When you’ve cut away reports.2 Each volunteer was to turn a bug report into as much HTML, CSS, and JavaScript as you can, and a set of minimal test cases, in which every input would cutting away any more causes the bug to disappear, be significant in reproducing the failure. For every five you’re done.2 simplifications, a volunteer would get an invitation to the Mozilla launch party; 20 simplifications would earn This volunteer most likely carried out the process the volunteer a T-shirt signed by grateful engineers. manually, but there is no need to—especially in an The following was Bugzilla entry #24735: environment designed for automation. If you have an November 2001 27 1 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✘ 14 <SELECTNAME=”priority” MULTIPLE SIZE=7> ✔ 2 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 15 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 3 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 16 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✘ 4 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 17 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✘ 5 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✘ 18 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✘ 6 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✘ 19 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 7 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 20 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 8 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 21 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 9 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 22 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 10 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✘ 23 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 11 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 24 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 12 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 25 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 13 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✔ 26 <SELECT NAME=”priority” MULTIPLE SIZE=7> ✘ Figure 2. Simplifying automated test that tells whether or not the failure is for a total of about 21 minutes. Figure 2 shows how a bug in Mozilla’s still present, you can easily automate simplification. the algorithm worked. HTML input. The Setting up an automated test that checks whether Wynot prototype printing a specific HTML page works is not very dif- SIMPLIFYING USER INTERACTIONS starts cutting large ficult. All you need is a record-and-replay facility for Having simplified the HTML input, we were ready chunks of HTML input user interaction. to look at other details in the Mozilla bug report. Was (many characters at a One approach to automating simplification is to “setting the bottom and right margins to .50” really time) and gradually write a program that removes single characters from necessary, for example? In a separate test from the narrows the cut to just the input.