EXPOSITOR: Scriptable Time-Travel Debugging with First-Class Traces
Technical Report CS-TR-5021

Khoo Yit Phang, Jeffrey S. Foster, and Michael Hicks
Computer Science Department, University of Maryland, College Park, MD 20742, USA
{khooyp,jfoster,mwh}@cs.umd.edu

Abstract—We present EXPOSITOR, a new debugging environment that combines scripting and time-travel debugging to allow programmers to automate complex debugging tasks. The fundamental abstraction provided by EXPOSITOR is the execution trace, which is a time-indexed sequence of program state snapshots or projections thereof. Programmers can manipulate traces as if they were simple lists with operations such as map and filter. Under the hood, EXPOSITOR efficiently implements traces as lazy, sparse interval trees whose contents are materialized on demand. EXPOSITOR also provides a novel data structure, the edit hash array mapped trie, which is a lazy implementation of sets, maps, multisets, and multimaps that enables programmers to maximize the efficiency of their debugging scripts. In our micro-benchmarks, EXPOSITOR scripts are faster than the equivalent non-lazy scripts for common debugging scenarios. We have also used EXPOSITOR to debug a stack overflow and to unravel a subtle data race in Firefox. We believe that EXPOSITOR represents an important step forward in improving the technology for diagnosing complex, hard-to-understand bugs.

I. INTRODUCTION

“...we talk a lot about finding bugs, but really, [Firefox’s] bottleneck is not finding bugs but fixing [them]...”
—Robert O’Callahan [1]

“[In debugging,] understanding how the failure came to be...requires by far the most time and other resources”
—Andreas Zeller [2]

Debugging program failures is an inescapable task for software programmers. Understanding a failure involves repeated application of the scientific method: the programmer makes some observations; proposes a hypothesis as to the cause of the failure; uses this hypothesis to make predictions about the program’s behavior; tests those predictions using experiments; and finally either declares victory or repeats the process with a new or refined hypothesis.

There are certain kinds of bugs that can truly test the mettle of programmers. Large systems often have complex, subtle, hard-to-understand mandelbugs¹ whose untangling can require hours or even days of tedious, hard-to-reuse, seemingly sisyphean effort. Debugging mandelbugs often requires testing many hypotheses, with lots of backtracking and retrials when those hypotheses fail. Standard debuggers make it hard to efficiently reuse the manual effort that goes into hypothesis testing; in particular, it can be hard to juggle the breakpoints, single stepping, and state inspection available in standard debuggers to find the point at which the fault actually happened.

¹“Mandelbug (from the Mandelbrot set): A bug whose underlying causes are so complex and obscure as to make its behavior appear chaotic or even nondeterministic.” From the New Hacker’s Dictionary (3d ed.), E. S. Raymond, editor, 1996.

Scriptable debugging is a powerful technique for hypothesis testing in which programmers write scripts to perform complex debugging tasks. For example, suppose we observe a bug involving a cleverly implemented set data structure. We can try to debug the problem by writing a script that maintains a shadow data structure that implements the set more simply (e.g., as a list). We run the buggy program, and the script tracks the program’s calls to insert and remove, stopping execution when the contents of the shadow data structure fail to match those of the buggy one, helping pinpoint the underlying fault.

While we could have employed the same debugging strategy by altering the program itself (e.g., by inserting print statements and assertions), doing so would require recompilation—and that can take considerable time for large programs (e.g., Firefox), thus greatly slowing the rate of hypothesis testing. Modifying a program can also change its behavior—we have all experienced the frustration of inserting a debugging print statement only to make the problem disappear! Scripts also have the benefit that they can invoke libraries not used by the program itself. And, general-purpose scripts may be reused.

A. Background: Prior Scriptable Debuggers

There has been considerable prior work on scriptable debugging. GDB’s Python interface makes GDB’s interactive commands—stepping, setting breakpoints, etc.—available in a general-purpose programming language. However, this interface employs a callback-oriented programming style which, as pointed out by Marceau et al. [3], reduces composability and reusability as well as complicates checking temporal properties. Marceau et al. propose treating the program as an event generator—each function call, memory reference, etc. can be thought of as an event—and scripts are written in the style of functional reactive programming (FRP) [4]. While FRP-style debugging solves the problems of callback-based programming, it has a key limitation: time always marches forward, so we cannot ask questions about prior states. For example, if while debugging a program we find a doubly freed address, we cannot jump backward in time to find the corresponding malloc. Instead we would need to rerun the program from scratch to find that call, which may be problematic if there is any nondeterminism, e.g., if the addresses returned by malloc differ from run to run. Alternatively, we could prospectively gather the addresses returned by malloc as the program runs, but then we would need to record all such calls up to the erroneous free.

Time-travel debuggers, like UndoDB [5], and systems for capturing entire program executions, like Amber [6], allow a single nondeterministic execution to be examined at multiple points in time. Unfortunately, scriptable time-travel debuggers typically use callback-style programming, with all its problems. (Sec. VII discusses prior work in detail.)

B. EXPOSITOR: Scriptable, Time-Travel Debugging

In this paper, we present EXPOSITOR, a new scriptable debugging system inspired by FRP-style scripting but with the advantages of time-travel debugging. EXPOSITOR scripts treat a program’s execution trace as a (potentially infinite) immutable list of time-annotated program state snapshots or projections thereof. Scripts can create or combine traces using common list operations: traces can be filtered, mapped, sliced, folded, and merged to create lightweight projections of the entire program execution. As such, EXPOSITOR is particularly well suited for checking temporal properties of an execution, and for writing new scripts that analyze traces computed by prior scripts. Furthermore, since EXPOSITOR extends GDB’s Python environment and uses the UndoDB [5] time-travel backend for GDB, users can seamlessly switch between running scripts and interacting directly with an execution via GDB. (Sec. II overviews EXPOSITOR’s scripting interface.)

The key idea for making EXPOSITOR efficient is to employ laziness in its implementation of traces—invoking the time-travel debugger is expensive, and laziness helps minimize the number of calls to it. EXPOSITOR represents traces as sparse, time-indexed interval trees and fills in their contents on demand. For example, suppose we use EXPOSITOR’s breakpoints combinator to create a trace tr containing just the program execution’s malloc calls. If we ask for the first element of tr before time 42 (perhaps because there is a suspicious program output then), EXPOSITOR will direct the time-travel debugger to time 42 and run it backward until hitting the call, capturing the resulting state in the trace data structure. The remainder of the trace, after time 42 and before the malloc call, is not computed. (Sec. III discusses the implementation of traces.)

In addition to traces, EXPOSITOR scripts typically employ various internal data structures to record information, e.g., the set s of arguments to malloc calls. These data structures must also be lazy so as not to compromise trace laziness—if we eagerly computed the set s just mentioned to answer a membership query at time t, we would have to run the time-travel debugger from the start up until t, considering all malloc calls, even if only the most recent call is sufficient to satisfy the query. Thus, EXPOSITOR provides script writers with a novel data structure: the edit hash array mapped trie (EditHAMT), which provides lazy construction and queries for sets, maps, multisets, and multimaps. As far as we are aware, the EditHAMT is the first data structure to provide these capabilities. (Sec. IV describes the EditHAMT.)

We have used EXPOSITOR to write a number of simple scripts, as well as to debug two more significant problems. Sec. II describes how we used EXPOSITOR to find an exploitable buffer overflow. Sec. VI explains how we used EXPOSITOR to track down a deep, subtle bug in Firefox that was never directly fixed, though it was papered over with a subsequent bug fix (the fix resolved the symptom, but did not remove the underlying fault). In the process, we developed several reusable analyses, including a simple race detector. (Sec. VI presents our full case study.)

In summary, we believe that EXPOSITOR represents an important step forward in improving the technology for diagnosing complex, hard-to-understand bugs.

II. THE DESIGN OF EXPOSITOR

We designed EXPOSITOR to provide programmers with a high-level, declarative API to write analyses over the program execution, as opposed to the low-level, imperative, callback-based API commonly found in other scriptable debuggers. In particular, the design of EXPOSITOR is based on two key principles. First, the EXPOSITOR API is purely functional—all objects are immutable, and methods manipulate objects by returning new objects. The purely functional API facilitates composition, by reducing the risk of scripts interfering with each other via shared mutable objects, as well as reuse, since immutable objects can easily be memoized or cached upon construction. It also enables EXPOSITOR to employ lazy programming techniques to improve efficiency.

Second, the trace abstraction provided by EXPOSITOR is based around familiar list-processing APIs found in many languages, such as the built-in list-manipulating functions in Python, the Array methods in JavaScript, the List module in OCaml, and the Data.List module in Haskell. These APIs are also declarative—programmers manipulate lists using combinators such as filter, map, and merge that operate over entire lists, instead of manipulating individual list elements. These list combinators allow EXPOSITOR to compute individual list elements on demand in any order, minimizing the number of calls to the time-travel debugger. Furthermore, they shield programmers from the low-level details of controlling the program execution and handling callbacks.

A. API Overview

Fig. 1 lists the key classes and methods of EXPOSITOR’s scripting interface, which is provided as a library inside UndoDB/GDB’s Python environment.

a) The execution class and the the_execution object: The execution of the program being debugged is represented by the execution class, of which there is a singleton instance named the_execution. This class provides several methods for querying the execution.

The get_at(t)² method returns a snapshot object representing the program state at time t in the execution (we will describe snapshots in more detail later). Several methods create immutable, sparse projections of the execution, or traces, consisting of program state snapshots at points of interest: the breakpoints(fn) and syscalls(fn) methods return traces of snapshots at functions and system calls named fn, respectively; the watchpoints(x, rw) method returns a trace of snapshots when the memory location x is read or written; and the all_calls and all_returns methods return traces of snapshots at all function entries and exits, respectively.

²We use the convention of naming time variables as t, and trace variables as tr.

    class execution:
        # get snapshots
        get_at(t): snapshot at time t

        # derive traces
        breakpoints(fn): snapshot trace of breakpoints at func fn
        syscalls(fn): snapshot trace of breakpoints at syscall fn
        watchpoints(x, rw):
            snapshot trace of read/write watchpoints at var x
        all_calls(): snapshot trace of all function entries
        all_returns(): snapshot trace of all function exits

        # interactive control
        cont(): manually continue the execution
        get_time(): latest time of the execution

    class trace:
        # count/get items
        __len__(): called by “len(trace)”
        __iter__(): called by “for item in trace”
        get_at(t): item at exactly time t
        get_after(t): next item after time t
        get_before(t): previous item before time t

        # create a new trace by filtering/mapping a trace
        filter(p): subtrace of items for which p returns true
        map(f): new trace with f applied to all items
        slice(t0, t1): subtrace from time t0 to time t1

        # create a new trace by merging two traces
        merge(f, tr): see Fig. 2a
        trailing_merge(f, tr): see Fig. 2b
        rev_trailing_merge(f, tr): see Fig. 2c

        # create a new trace by computing over prefixes/suffixes
        scan(f, acc): see Fig. 2d
        rev_scan(f, acc): see Fig. 2e
        tscan(f, acc): see Fig. 3a
        rev_tscan(f, acc): see Fig. 3b

    class item:
        time: item’s execution time
        value: item’s contents

    class snapshot:
        read_var(x):
            gdb_value of variable x in current stack frame
        read_retaddrs():
            gdb_values of return addresses on the stack
        backtrace(): print the stack backtrace
        ... and other methods to access program state ...

    class gdb_value:
        __getitem__(x):
            called by “gdb_value[x]” to access field/index x
        deref(): dereference gdb_value (if it is a pointer)
        addrof(): address of gdb_value (if it is an l-value)
        ... and other methods to query properties of gdb_value ...

Fig. 1. EXPOSITOR’s Python-based scripting API. The get_X and __len__ methods of execution and trace are eager, and the remaining methods of those classes return lazy values. Lazy values include trace, snapshot, and gdb_value objects whose contents are computed only on demand and cached.

For debugging interactive programs, the execution class provides two useful methods: cont resumes the execution of the program from when it was last stopped (e.g., immediately after EXPOSITOR is started, or when the program is interrupted by pressing ˆC), and get_time gets the latest time of the execution. If a program requires user input to trigger a bug, we often find it helpful to first interrupt the program and call get_time to get a reference time, before resuming the execution using cont and providing the input trigger.

b) The trace class and the item class: As mentioned above, the trace class represents sparse projections of the execution at points of interest. These traces contain snapshots or other values, indexed by the relevant time in the execution. Initially, traces are created using the execution methods described above, and traces may be further derived from other traces.

The first five trace methods query items in traces. The tr.__len__() method is called by the Python built-in function len(tr), and returns the total number of items in tr. The tr.__iter__() method is called by Python’s for x in tr loop, and returns a sequential Python iterator over all items in tr. The get_at(t) method returns the item at time t in the trace, or None if there is no item at that time. Since traces are often very sparse, it can be difficult to find items using get_at, so the trace class also provides two methods, get_before(t) and get_after(t), that return the first item found before or after time t, respectively, or None if no item can be found. The get_at, get_before, and get_after methods return values that are wrapped in the item class, which associates values with a particular point in the execution.

The remaining methods create new traces from existing traces. The tr.filter(p) method creates a new trace consisting only of items from tr that match predicate p. The tr.map(f) method creates a new trace of items computed by calling function f on each item from tr, and is useful for extracting particular values of interest from snapshots. The tr.slice(t0, t1) method creates a new trace that includes only items from tr between times t0 and t1.

The trace class also provides several more complex methods to derive new traces. Three methods create new traces by merging traces. First, tr0.merge(f, tr1) creates a new trace containing the items from both tr0 and tr1, calling function f to combine any items from tr0 and tr1 that occur at the same time (Fig. 2a).
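The query and combinator semantics described above can be illustrated with a small eager model written in plain Python. This sketch is ours, not EXPOSITOR code: real traces are lazy and backed by UndoDB, and the sample items below are invented.

```python
class Item:
    """A time-stamped value, mirroring EXPOSITOR's item class."""
    def __init__(self, time, value):
        self.time, self.value = time, value

class Trace:
    """Eager stand-in for a trace: a time-sorted list of Items."""
    def __init__(self, items):
        self.items = sorted(items, key=lambda it: it.time)
    def __len__(self):
        return len(self.items)
    def get_at(self, t):
        return next((it for it in self.items if it.time == t), None)
    def get_before(self, t):
        prior = [it for it in self.items if it.time < t]
        return prior[-1] if prior else None
    def get_after(self, t):
        later = [it for it in self.items if it.time > t]
        return later[0] if later else None
    def filter(self, p):
        return Trace([it for it in self.items if p(it.value)])
    def map(self, f):
        return Trace([Item(it.time, f(it.value)) for it in self.items])
    def slice(self, t0, t1):
        return Trace([it for it in self.items if t0 <= it.time <= t1])

# A made-up trace of calls to foo(x), indexed by execution time.
foo = Trace([Item(1.0, {"x": 0}), Item(2.5, {"x": 1}), Item(4.0, {"x": 0})])
foo_0 = foo.filter(lambda snap: snap["x"] == 0)
print(len(foo_0))                # 2
print(foo.get_before(2.0).time)  # 1.0
```

In the real API these methods drive UndoDB only as far as needed to answer each query; the eager lists here exist purely to pin down what each combinator returns.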

None can be passed for f if tr0 and tr1 contain items that can never coincide, e.g., if tr0 contains calls to foo and tr1 contains calls to bar, since f will never be called in this case. Next, tr0.trailing_merge(f, tr1) creates a new trace by calling f to merge each item from tr0 with the immediately preceding item from tr1, or None if there is no preceding item (Fig. 2b). Lastly, rev_trailing_merge is similar to trailing_merge except that it merges with future items rather than past items (Fig. 2c).

The remaining four methods create new traces by computing over prefixes or suffixes of an input trace. The scan method performs a fold- or reduce-like operation for every prefix of an input trace (Fig. 2d). It is called as tr.scan(f, acc), where f is a binary function that takes an accumulator and an item as arguments, and acc is the initial accumulator. It returns a new trace containing the same number of items at the same times as in the input trace tr, where the nth output item out_n is recursively computed as:

    out_n = in_n ⊕f out_{n−1}   if n > 0
    out_n = in_n ⊕f acc         if n = 0

where ⊕f is f written infix. The rev_scan method is similar, but derives a trace based on future items rather than past items (Fig. 2e). rev_scan computes the output item out_n as follows:

    out_n = in_n ⊕f out_{n+1}   if 0 ≤ n < length − 1
    out_n = in_n ⊕f acc         if n = length − 1

Lastly, tscan and rev_tscan are variants of scan and rev_scan, respectively, that take an associative binary function but no accumulator, and can sometimes be more efficient. These two methods are described in Sec. III-B.

c) The snapshot class and the gdb_value class: The snapshot class represents a program state at a particular point in time and provides methods for accessing that state, e.g., read_var(x) returns the value of a variable named x in the current stack frame, read_retaddrs returns the list of return addresses on the stack, backtrace prints the stack backtrace, and so on.

The gdb_value class represents values in the program being debugged at a particular point in time, and provides methods for querying those values. For example, v.__getitem__(x) is called by the Python indexing operator v[x] to access struct fields or array elements, deref dereferences pointer values, addrof returns the address of an l-value, and so forth. These gdb_value objects are automatically coerced to the appropriate Python types when they are compared against built-in Python values such as ints, for example; it is also sometimes useful to manually coerce gdb_value objects to specific Python types, e.g., to treat a pointer value as a Python int.

Both the snapshot and gdb_value classes are thin wrappers around GDB’s Python API and UndoDB. When a method of a snapshot or gdb_value object is called, it first directs UndoDB to jump to the point in time in the program execution that is associated with the object. Then, it calls the corresponding GDB API, wrapping the return value in snapshot and gdb_value as necessary. In general, the semantics of snapshot and gdb_value follows GDB, e.g., EXPOSITOR’s notion of stack frames is based on GDB’s notion of stack frames. We also provide several methods, such as read_retaddrs, that return the result of several GDB API calls in a more convenient form. Additionally, all of GDB’s Python API is available and may be used from within EXPOSITOR.

Given a debugging hypothesis, we use the EXPOSITOR interface to apply the following recipe. First, we call methods on the_execution to derive one or more traces that contain events relevant to the hypothesis; such events could be function calls, breakpoints, system calls, etc. Next, we combine these traces as appropriate, applying trace methods such as filter, map, merge, and scan to derive traces of properties predicted by our hypothesis. Finally, we query the traces using methods such as get_before and get_after to find evidence of properties that would confirm or refute our hypothesis.

B. Warm-up Example: Examining foo Calls in EXPOSITOR

To begin with a simple example, let us consider the task of counting the number of calls to a function foo, to test a hypothesis that an algorithm is running for an incorrect number of iterations, for example. Counting foo calls takes just two lines of code in EXPOSITOR. We first use the breakpoints method of the_execution to create the foo trace:

    foo = the_execution.breakpoints("foo")

This gives us a trace containing all calls to foo. We can then count the calls to foo using the Python len function, and print the result:

    print len(foo)

Later, we may want to count only calls to foo(x) where x == 0, perhaps because we suspect that only these calls are buggy. We can achieve this using the filter method on the foo trace created above:

    foo_0 = foo.filter(lambda snap: snap.read_var("x") == 0)

Here, we call filter with a predicate function that takes a snapshot object (at calls to foo), reads the variable named x, and returns whether x == 0. The resulting trace contains only calls to foo(0), which we assign to foo_0. We can then count foo_0 as before:

    print len(foo_0)

After more investigation, we may decide to examine calls to both foo(0) and foo(1), e.g., to understand the interaction between them. We can create a new trace of foo(1) calls, and merge it with foo(0):

    foo_1 = foo.filter(lambda snap: snap.read_var("x") == 1)
    foo_01 = foo_0.merge(None, foo_1)

We define foo_1 just like foo_0 but with a different predicate. Then, we use the merge method to merge foo_0 and foo_1 into a single trace foo_01 containing calls to both foo(0) and foo(1), passing None for the merging function as foo(0) and foo(1) can never coincide.
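To make the merge semantics concrete, here is a small eager model of merge and trailing_merge over (time, value) pairs. It is our own illustration rather than EXPOSITOR code, and the traces below are invented.

```python
def merge(f, tr0, tr1):
    """Union of two time-sorted traces; f combines items that coincide
    in time (f may be None when the traces can never coincide)."""
    d0, d1 = dict(tr0), dict(tr1)
    out = []
    for t in sorted(set(d0) | set(d1)):
        if t in d0 and t in d1:
            out.append((t, f(d0[t], d1[t])))
        else:
            out.append((t, d0[t] if t in d0 else d1[t]))
    return out

def trailing_merge(f, tr0, tr1):
    """Merge each tr0 item with the immediately preceding tr1 item
    (or None if there is none), mirroring tr0.trailing_merge(f, tr1)."""
    out = []
    for t, v in tr0:
        prior = [w for u, w in tr1 if u < t]
        out.append((t, f(v, prior[-1] if prior else None)))
    return out

foo_0 = [(1.0, "foo(0)"), (4.0, "foo(0)")]
foo_1 = [(2.5, "foo(1)")]
print(merge(None, foo_0, foo_1))
# [(1.0, 'foo(0)'), (2.5, 'foo(1)'), (4.0, 'foo(0)')]
```

Passing None for f is safe here for the same reason as in the foo_01 example: the two traces never contain items at the same time, so the combiner is never invoked.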

Fig. 2. Illustration of complex trace operations: (a) tr0.merge(f, tr1); (b) tr0.trailing_merge(f, tr1); (c) tr0.rev_trailing_merge(f, tr1); (d) tr.scan(f, acc); (e) tr.rev_scan(f, acc).
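The scan and rev_scan recurrences pictured in Fig. 2d and 2e can be checked against a direct list implementation. This is our own eager simplification that ignores item times; the allocation sizes are hypothetical.

```python
def scan(f, acc, items):
    """out[n] = f(in[n], out[n-1]), with acc seeding the first element,
    as in tr.scan(f, acc)."""
    out = []
    for x in items:
        acc = f(x, acc)
        out.append(acc)
    return out

def rev_scan(f, acc, items):
    """The same recurrence run from the end of the list toward the
    front, as in tr.rev_scan(f, acc)."""
    return list(reversed(scan(f, acc, list(reversed(items)))))

# Hypothetical per-call allocation sizes; scan yields running totals
# over prefixes, rev_scan over suffixes.
sizes = [8, 16, 32]
print(scan(lambda x, a: x + a, 0, sizes))      # [8, 24, 56]
print(rev_scan(lambda x, a: x + a, 0, sizes))  # [56, 48, 32]
```

Note that each output keeps one entry per input item, matching the paper’s requirement that scan produce a trace with the same number of items at the same times as its input.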

Note that we are able to reuse the foo and foo_0 traces in a straightforward manner; under the hood, EXPOSITOR will also have cached the computation of foo and foo_0 from earlier and will reuse them here.

Finally, we may want to take a closer look at the very first call to either foo(0) or foo(1), which we can do using the get_after method:

    first_foo_01 = foo_01.get_after(0)

We call get_after on foo_01 to find the first item after time 0, i.e., the beginning of the execution, that contains the snapshot of a foo(0) or foo(1) call.

In this example, observe how we began with a simple debugging task that counts all calls to foo in a trace to answer our initial hypothesis, then gradually created more traces or combined existing ones and queried them as our hypothesis evolved. EXPOSITOR is particularly suited to such an incremental, interactive style of debugging.

1) Comparison to GDB’s Python API: In contrast to EXPOSITOR, it takes 16 lines of code to count foo calls using GDB’s standard Python API, as shown below:

    1   count = 0; more = True
    2   foo = gdb.Breakpoint("foo")
    3   def stop_handler(evt):
    4       if isinstance(evt, gdb.BreakpointEvent) \
    5               and foo in evt.breakpoints:
    6           global count; count += 1
    7   def exit_handler(evt):
    8       global more; more = False
    9   gdb.events.stop.connect(stop_handler)
    10  gdb.events.exited.connect(exit_handler)
    11  gdb.execute("start")
    12  while more:
    13      gdb.execute("continue")
    14  gdb.events.exited.disconnect(exit_handler)
    15  gdb.events.stop.disconnect(stop_handler)
    16  foo.delete()

On line 1, we first initialize two variables, count and more, that are used to track the number of calls to foo and whether the execution has ended, respectively. Then, on line 2, we create a breakpoint at the call to foo. Next, we create a callback function named stop_handler on lines 3–6 to handle breakpoint events. In this function, we first check on lines 4–5 whether the breakpoint triggered is the one that we have set, and if so, we increment count on line 6. We also create a callback function named exit_handler on lines 7–8 to handle exit events, which are fired when the program execution ends. This function simply resets the more flag when called. After that, we register stop_handler and exit_handler with GDB on lines 9–10, and start the program execution on line 11. GDB will run the program until it hits a breakpoint or the end of the execution is reached, calling stop_handler in the former case, or exit_handler in the latter case. Then, we enter a loop on lines 12–13 that causes GDB to continue running the program until more is False, i.e., the program has exited. Once that happens, we deregister the event handlers from GDB and delete the breakpoint on lines 14–16, cleaning up after ourselves to ensure that the callbacks and breakpoint will not be unintentionally triggered by other scripts.

It also takes more work to refine this GDB script to answer other questions about foo, compared to EXPOSITOR traces. For example, to count calls to foo(0), we would have to modify the GDB script to add the x == 0 predicate and rerun it, instead of simply calling filter on the foo trace. As another example, if we were given two different scripts, one that counts foo(0) and another that counts foo(1), it would be difficult to combine those scripts, as they each contain their own driver loops and shared variables; it would be easier to just modify one of those scripts than to attempt to reuse both. In contrast, it took us one line to use the merge method to combine the foo_0 and foo_1 traces. Finally, note that EXPOSITOR caches and reuses trace computations automatically, whereas we would need some foresight to add caching to the GDB script in a way that could be reused by other scripts.

C. Example: Reverse Engineering a Stack-Smashing Attack

We now illustrate the use of EXPOSITOR with a more sophisticated example: reverse engineering a stack-smashing attack, in which malware overflows a stack buffer in the target program to overwrite a return address on the stack, thereby gaining control of the program counter [7].

We develop a reusable script that can detect when the stack has been smashed in any program, which will help pinpoint the attack vector. Our script maintains a shadow stack of return addresses and uses it to check that only the top of the stack is modified between function calls or returns; any violation of this property indicates that the stack has been smashed.

We begin by using the all_calls and all_returns methods on the_execution to create traces of just the snapshots at function calls and returns, respectively:

    calls = the_execution.all_calls()
    rets = the_execution.all_returns()

Next, we use merge to combine these into a single trace, passing None for the merging function as function calls and returns can never coincide. We will use this new trace to compare consecutive calls or returns:

    calls_rets = calls.merge(None, rets)

Now, we map over calls_rets to apply the read_retaddrs method, returning the list of return addresses on the call stack. This creates a trace of shadow stacks at every call and return:

    shadow_stacks = calls_rets.map(
        lambda s: map(int, s.read_retaddrs()))

We also use map to coerce the return addresses to Python ints. Then we need to check that, between function calls and returns, the actual call stack matches the shadow stack except for the topmost frame (one return address may be added or removed). We use the following function:

    def find_corrupted(ss, opt_shadow):
        if opt_shadow.force() is not None:
            for x, y in zip(ss.read_retaddrs(), opt_shadow.force()):
                if int(x) != y:
                    return x  # l-value of return address on stack
        return None

Here, find_corrupted takes as arguments a snapshot ss and its immediately preceding shadow stack opt_shadow; the opt prefix indicates that there may not be a prior shadow stack (if ss is at the first function call), and we need to call the force method on opt_shadow to retrieve its value (we will explain the significance of this in Sec. III). If there is a prior shadow stack, we compare every return address in ss against the shadow stack and return the first location that differs, or None if there are no corrupted addresses. (The zip function creates a list of pairs of the respective elements of the two input lists, up to the length of the shorter list.)

Finally, we generate a trace of corrupted memory locations using the trailing_merge method, calling find_corrupted to merge each function call and return from calls_rets with the immediately preceding shadow stack in shadow_stacks. We filter None out of the result:

    corrupted_addrs = calls_rets \
        .trailing_merge(find_corrupted, shadow_stacks) \
        .filter(lambda x: x is not None)

The resulting trace contains exactly the locations of corrupted return addresses at the point they are first evident in the trace.

D. Mini Case Study: Running EXPOSITOR on tinyhttpd

We used the script just developed on a version of tinyhttpd [8] that we had previously modified to include a buffer overflow bug. We created this version of tinyhttpd as an exercise for a security class in which students develop exploits of the vulnerability.

As malware, we deployed an exploit that uses a return-to-libc attack [9] against tinyhttpd. The attack causes tinyhttpd to print “Now I pwn your computer” to the terminal and then resume normal operation. Finding buffer overflows using standard techniques can be challenging, since there can be a delay from the exploit overflowing the buffer to the payload taking effect, during which the exploited call stack may be erased by normal program execution. The payload may also erase evidence of itself from the stack before producing a symptom.

To use EXPOSITOR, we call the expositor launcher with tinyhttpd as its argument, which will start a GDB session with EXPOSITOR’s library loaded, and then enter the Python interactive prompt from GDB:³

    % expositor tinyhttpd
    (expositor) python-interactive

³GDB contains an existing python command that is not interactive; python-interactive is a new command that we have submitted to GDB, and is available as of GDB 7.6.

Then, we start running tinyhttpd:

    >>> the_execution.cont()  # start running
    httpd running on port 47055

When tinyhttpd launches, it prints out the port number on which it accepts client connections. On a different terminal, we run the exploit with this port number:

    % ./exploit.py 47055
    Trying port 47055
    pwning...

At this point, tinyhttpd prints the exploit message, so we interrupt the debugger and use EXPOSITOR to find the stack corruption, starting from the time when we interrupted it:

    Now I pwn your computer
    ^C
    Program received signal SIGINT, Interrupt
    >>> corrupted_addrs = stack_corruption()
    ...     # function containing Sec. II-C code
    >>> time = the_execution.get_time()
    >>> last_corrupt = corrupted_addrs.get_before(time)

Items in a trace are indexed by time, so the get_before method call above tells EXPOSITOR to start computing corrupted_addrs from the interrupted time backward and find the first function call or return at which the stack corruption is detected. We can print the results:

    >>> print time
    56686.8
    >>> print last_corrupt
    Item(56449.2, address)

This shows that the interrupt occurred at time 56686.8, and the corrupted stack was first detected at a function call or return at time 56449.2. We can then find and print the snapshot that corrupted the return address with:

    >>> bad_writes = the_execution \
    ...     .watchpoints(last_corrupt.value, rw=WRITE)
    >>> last_bad_write = bad_writes.get_before(last_corrupt.time)
    >>> print last_bad_write
    Item(56436.0, snapshot)

We find that the first write that corrupted the return address occurred at time 56436.0. We can then inspect the snapshot via last_bad_write.value. In this case, the backtrace of this very first snapshot identifies the exact line of code in tinyhttpd that causes the stack corruption—a socket recv with an out-of-bounds pointer.

6 only inspected from time 56686.8 to time 56436.0. Moreover, backward until hitting such a call. Let us suppose the call is at had last corrupt not explained the bug, we would then call time 50, and the next instruction after that call is at time 50.1. corrupted addrs.get before(last corrupt.time) to find the prior Then EXPOSITOR will expand the root node shown above to corruption event, inspecting only as much of the execution as the following tree: needed to track down the bug. 0 ∞ This mini case study also demonstrates that, for some debugging tasks, it can be much faster to 0 ∞

search backward in time. It takes only 1 second for 0 50.1 50.1 ∞ corrupted addrs.get before(time) to return; whereas if we had 0 50 50 50.1 50.1 100 100 ∞ instead searched forward from the beginning (e.g., simulating foo a debugger without time-travel): Here the trace has been subdivided into four intervals: The -187 first corrupted = corrupted addrs.get after(0) intervals [0, 50) and [100, ∞) are lazy nodes with no further it takes 4 seconds for the answer to be computed. Using information, as EXPOSITOR did not look at those portions of EXPOSITOR, users can write scripts that search forward or the execution. The interval [50, 50.1) contains the discovered backward in time, as optimal for the task. call, and the interval [50.1, 100) is fully resolved and contains no calls to foo. Notice that if we ask the same query again, III.LAZY TRACESIN EXPOSITOR EXPOSITOR can traverse the interval tree above to respond As just discussed, EXPOSITOR allows users to treat traces as without needing to query UndoDB. if they were lists of snapshots. However, for many applications Likewise, calling get at(t) or get after(t) either returns im- it would be impractical to eagerly record and analyze full mediately (if the result has already been computed) or causes program snapshots at every program point. Instead, EXPOS- UndoDB to jump to time t (and, for get after(t), to then execute ITOR uses the underlying time-travel debugger, UndoDB, to forward). These methods may return None, e.g., if a call to construct snapshots on demand and to discard them when they foo did not occur before/after/at time t. are no longer used (since it is expensive to keep too many As our micro-benchmarks in Sec. V will show, if we request snapshots in memory at once). 
Thus the major challenge is about 10-40% of the items in a trace, computing traces lazily to minimize the demand for snapshots, which EXPOSITOR takes less time than computing eagerly, depending on the accomplishes by constructing and manipulating traces lazily. query pattern as well as the kind of computations done. This More precisely, all of the trace generators and combinators, makes lazy traces ideal for debugging tasks where we expect including execution.all calls, trace.map, trace.merge, etc., re- programmers to begin with some clues about the location of turn immediately without invoking UndoDB. It is only when the bug. For example, we start looking for stack corruption final values are demanded, with execution.get at, trace.get at, from the end of the execution in Sec. II-D, line -187, because trace.get after, or trace.get before, that EXPOSITOR queries stack corruptions typically occur near the end of the execution. the actual program execution, and it does so only as much as A. Lazy Trace Operations is needed to acquire the result. For example, the construction of corrupted addrs in Sec. II-D, line -190 induces no time We implement filter and map lazily on top of the interval tree travel on the underlying program—it is not until the call to data structure. For a call tr1 = tr0.map(f), we initially construct corrupted addrs.get before(time) in Sec. II-D, line -187 that an empty interval tree, and when values are demanded in EXPOSITOR uses the debugger to acquire the final result. tr1 (by get X calls), EXPOSITOR conceptually calls tr0.get X, To achieve this design, EXPOSITOR uses a lazy, interval- applies f to the result, and caches the result for future use. tree-like data structure to implement traces. 
More precisely, Calls to tr0.filter(p) are handled similarly, constructing a lazy a trace is a binary tree whose nodes are annotated with the tree that, when demanded, repeatedly gets values from tr0 until (closed) lower-bound and (open) upper-bound of the time p is satisfied. Note that for efficiency, EXPOSITOR’s does not intervals they span, and leaf nodes either contain a value or actually call get X on the root node of tr0; instead, it directly are empty. The initial tree for a trace contains no elements traverses the subtree of tr0 corresponding to the uninitialized (only its definition), and EXPOSITOR materializes tree nodes subtree of the derived trace. as needed. The implementation of tr0.merge(f, tr1) also calls get X on As a concrete example, the following trace constructs the tr1 as required. For a call tr.slice(t0, t1) EXPOSITOR creates an tree shown on the right, with a single lazy root node spanning interval tree that delegates get X calls to tr, asking for items the interval [0, ∞), which we draw as a dotted box and arrow. from time t0 to time t1, and returns None for items that fall outside that interval. 0 ∞ -196 foo = the execution.breakpoints(”foo”) For the last four operations, [rev ]trailing merge and [rev ]scan,EXPOSITOR employs additional laziness in the Now suppose we call foo.get before(100).EXPOSITOR sees helper function argument f. To illustrate, consider a call to that the query is looking for the last call to foo before time tr.scan(f, acc). Here, EXPOSITOR passes the accumulator to f 100, so it will ask UndoDB to jump to time 100 and then run wrapped in an instance of class lazy, defined as follows:

-234 class lazy:
-233     force(): return the actual value
-232     is_forced(): return whether force has been called

The force method, when first called, will compute the actual value and cache it; the cached value is returned in subsequent calls. Thus, f can force the accumulator as needed, and if it is not forced, it will not be computed.

To see the benefit, consider the following example, which uses scan to derive a new trace in which each item is a count of the number of consecutive calls to foo with nonzero arguments, resetting the count when foo is called with zero:

-238 foo = execution.breakpoints("foo") # void foo(int x)
-237 def count_nonzero_foo(lazy_acc, snapshot):
-236     if snapshot.read_var("x") != 0:
-235         return lazy_acc.force() + 1
-234     else:
-233         return 0
-232 nonzero_foo = foo.scan(count_nonzero_foo, 0)

Notice that if lazy_acc were not lazy, EXPOSITOR would have to compute its value before calling count_nonzero_foo. By the definition of scan (Fig. 2d), this means that it must recursively call count_nonzero_foo to compute all prior output items before computing the current item, even if it is unnecessary to do so, e.g., if we had called nonzero_foo.get_before(t), and the call to foo just before time t had argument x=0. Thus, a lazy accumulator avoids this unnecessary work. EXPOSITOR uses a lazy accumulator in rev_scan for the same reason.

Likewise, observe that in tr0.trailing_merge(f, tr1), for a particular item in tr0 the function f may not need to look in tr1 to determine its result; thus, EXPOSITOR wraps the tr1 argument to f in an instance of class lazy. The implementation of rev_trailing_merge similarly passes lazy items from tr1 to f. Note that there is no such laziness in the regular merge operation. The reason is that in tr0.merge(f, tr1), the items from tr0 and tr1 that are combined with f occur at the same time. Thus, making f's arguments lazy would not reduce demands on the underlying time-travel debugger.

B. Tree Scan

Finally, EXPOSITOR provides another list combinator, tree-scan, which is a lazier variant of scan that is sometimes more efficient. The tscan method computes an output for every prefix of an input trace by applying an associative binary function in a tree-like fashion (Fig. 3a). It is invoked with tr.tscan(f), where f must be an associative function that is lazy and optional in its left argument and lazy in its right argument. The tscan method generates an output trace of the same length as the input trace, where the nth output out_n is defined as:

    out_n = in_0 f in_1 f ··· f in_n

Notice that there is no accumulator, and EXPOSITOR can apply f in any order, since it is associative. When a value at time t is demanded from the output trace, EXPOSITOR first demands the item in_n at that time in the input trace (if no such item exists, then there is no item at that time in the output trace). Then EXPOSITOR walks down the interval tree structure of the input trace, calling f (only if demanded) on each internal tree node's children to compute out_n. Since the interval tree for the input trace is computed lazily, f may sometimes be called with None as a left argument, for the case when f forces an interval that turns out to contain no values; thus for correctness, we also require that f treats None as a left identity. (The right argument corresponds to in_n and so will never be None.)

[Fig. 3. Illustration of tree-scan operations: (a) tr.tscan(f); (b) tr.rev_tscan(f).]

Because both arguments of f are lazy, EXPOSITOR avoids computing either argument unnecessarily. The is_forced method of the lazy class is particularly useful for tscan, as it allows us to determine if either argument has been forced, and if so, evaluate the forced argument first. For example, we can check if a trace contains a true value as follows:

-275 def has_true(lazyleft, lazyright):
-274     return lazyleft.is_forced() and lazyleft.force() \
-273         or lazyright.is_forced() and lazyright.force() \
-272         or lazyleft.force() or lazyright.force()
-271 has_true_trace = some_trace.tscan(has_true)
-270 last_has_true = has_true_trace.get_before("inf")

The best case for this example occurs if either lazyleft or lazyright have been forced by a prior query, in which case either the first clause (line -274) or second clause (line -273) will be true and the unforced argument need not be computed due to short-circuiting.

EXPOSITOR's rev_tscan derives a new trace based on future items instead of past items (Fig. 3b), computing output item out_n as:

    out_n = in_n f in_{n+1} f ··· f in_{length-1}

Here, the right argument to f is optional, rather than the left.

IV. THE EDIT HASH ARRAY MAPPED TRIE

Many of the EXPOSITOR scripts we have written use sets or maps to record information about the program execution. For example, in Sec. I, we suggested the use of a shadow set to debug the implementation of a custom set data structure. Unfortunately, a typical eager implementation of sets or maps could demand all items in the traces, defeating the intention of EXPOSITOR's lazy trace data structure. To demonstrate this issue, consider the following code, which uses Python's standard (non-lazy) set class to collect all arguments in calls to a function foo:

-276 foos = the_execution.breakpoints("foo") # void foo(int arg)
-275 def collect_foo_args(lazy_acc, snap):
-274     return lazy_acc.force().union( \

-273         set([ int(snap.read_var("arg")) ]))
-272 foo_args = foos.scan(collect_foo_args, set())

Notice that we must force lazy_acc to call the union method, which will create a deep copy of the updated set (lines -274–-273). Unfortunately, forcing lazy_acc causes the immediately preceding set to be computed by recursively calling collect_foo_args. As a result, we must compute all preceding sets in the trace even if a particular query could be answered without doing so.

To address these problems, we developed the edit hash array mapped trie (EditHAMT), a new set, map, multiset, and multimap data structure that supports lazy construction and queries. The EditHAMT complements the trace data structure; as we will explain, and as our micro-benchmark in Sec. V-C will show, the EditHAMT can be used in traces without compromising trace laziness, unlike eager sets or maps.

A. EditHAMT API

From the user's perspective, the EditHAMT is an immutable data structure that maintains the entire history of edit operations for each EditHAMT. Fig. 4 shows the EditHAMT API.

-276 class edithamt:
-275     # lookup methods
-274     contains(k):
-273         Return if key k exists
-272     find(k):
-271         Return the latest value for k or None if not found
-270     find_multi(k):
-269         Return an iterator of all values bound to k
-268
-267     # static factory methods to create new EditHAMTs
-266     empty():
-265         Create an empty EditHAMT
-264     add(lazy_eh, k):
-263         Add binding of k to itself to lazy_eh
-262     addkeyvalue(lazy_eh, k, v):
-261         Add binding of k to v to lazy_eh
-260     remove(lazy_eh, k):
-259         Remove all bindings of k from lazy_eh
-258     removeone(lazy_eh, k):
-257         Remove the latest binding of k to any value from lazy_eh
-256     removekeyvalue(lazy_eh, k, v):
-255         Remove the latest binding of k to v from lazy_eh
-254     concat(lazy_eh1, lazy_eh2):
-253         Concatenate lazy_eh2 edit history to lazy_eh1

Fig. 4. The EditHAMT API.

The edithamt class includes contains(k) to determine if key k exists, and find(k) to look up the latest value mapped to key k. It also includes the find_multi(k) method to look up all values mapped to key k, returned as a Python iterator that incrementally looks up each mapped value. EditHAMT operations are implemented as static factory methods that create new EditHAMTs. Calling edithamt.empty() creates a new, empty EditHAMT. Calling edithamt.add(lazy_eh, k) creates a new EditHAMT by adding to lazy_eh, the prior EditHAMT, a binding from key k to itself (treating the EditHAMT as a set or multiset). Similarly, edithamt.addkeyvalue(lazy_eh, k, v) creates a new EditHAMT by adding to lazy_eh a binding from key k to value v (treating the EditHAMT as a map or multimap). Conversely, calling edithamt.remove(lazy_eh, k) creates a new EditHAMT by removing all bindings of key k from lazy_eh. Lastly, calling edithamt.removeone(lazy_eh, k) or edithamt.removekeyvalue(lazy_eh, k, v) creates new EditHAMTs by removing from lazy_eh the most recent binding of key k to any value or to a specific value v. The lazy_eh argument to these static factory methods is lazy so that we need not force it until a call to contains, find, or find_multi demands a result. For convenience, the lazy_eh argument can also be None, which is treated as an empty EditHAMT.

The last static factory method, edithamt.concat(lazy_eh1, lazy_eh2), concatenates the edit histories of its arguments. For example:

-284 eh_rem = edithamt.remove(None, "x")
-283 eh_add = edithamt.addkeyvalue(None, "x", 42)
-282 eh = edithamt.concat(eh_add, eh_rem)

Here eh is the empty EditHAMT, since it contains the additions in eh_add followed by the removals in eh_rem. A common EXPOSITOR script pattern is to map a trace to a sequence of EditHAMT additions and removals, and then use edithamt.concat with scan or tscan to concatenate those edits.

B. Example: EditHAMT to Track Reads and Writes to a Variable

As an example of using the EditHAMT, we present one piece of the race detector used in our Firefox case study (Sec. VI). The detector compares each memory access against prior accesses to the same location from any thread. Since UndoDB serializes thread schedules, each read need only be compared against the immediately preceding write, and each write against the immediately preceding write as well as reads between the two writes.

We use the EditHAMT as a multimap in the following function to track the access history of a given variable v:

-289 def access_events(v):
-288     reads = the_execution.watchpoints(v, rw=READ) \
-287         .map(lambda s: edithamt.addkeyvalue( \
-286             None, v, ("read", s.get_thread_id())))
-285     writes = the_execution.watchpoints(v, rw=WRITE) \
-284         .map(lambda s: edithamt.addkeyvalue( \
-283             edithamt.remove(None, v), \
-282             v, ("write", s.get_thread_id())))
-281     return reads.merge(None, writes)

In access_events, we create the trace reads by finding all reads to v using the watchpoints method (line -288), and then mapping each snapshot to a singleton EditHAMT that binds v to a tuple of "read" and the running thread ID (lines -287–-286). Similarly, we create the trace writes for writes to v (line -285), but instead map each write snapshot to an EditHAMT that first removes all prior bindings for v (line -283), then binds v to a tuple of "write" and the thread ID (lines -284–-282). Finally, we merge reads and writes, and return the result (line -281).

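To make the semantics of these edit operations concrete, here is a minimal, eager stand-in for the EditHAMT written in plain Python. It models the edit history of Fig. 4 as a simple linked list of edits; the class and helper names below are ours for illustration, not part of EXPOSITOR, and the real implementation is the lazy trie described in Sec. IV-C.

```python
class Edit:
    """One node in an edit history; op is 'add' or 'remove'."""
    def __init__(self, prior, op, key, value=None):
        self.prior, self.op, self.key, self.value = prior, op, key, value

def addkeyvalue(prior, k, v):
    # analogous to edithamt.addkeyvalue(lazy_eh, k, v)
    return Edit(prior, "add", k, v)

def remove(prior, k):
    # analogous to edithamt.remove(lazy_eh, k)
    return Edit(prior, "remove", k)

def find(eh, k):
    """Latest value bound to k, or None (mirrors edithamt.find)."""
    while eh is not None:
        if eh.key == k:
            return eh.value if eh.op == "add" else None
        eh = eh.prior
    return None

def find_multi(eh, k):
    """Yield all values bound to k, newest first, stopping at a remove
    (mirrors edithamt.find_multi)."""
    while eh is not None:
        if eh.key == k:
            if eh.op == "remove":
                return
            yield eh.value
        eh = eh.prior

# The concat example from the text: an addition of "x" followed by a
# removal of "x" yields an effectively empty map.
eh = remove(addkeyvalue(None, "x", 42), "x")
```

With this model, find(eh, "x") returns None, matching the eh example in the text, and a write that removes prior bindings before adding a new one shadows the earlier reads, as in access_events.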
We are not done yet, since the EditHAMTs in the trace returned by access_events contain only edit operations corresponding to individual accesses to v. We can get an EditHAMT trace that records all accesses to v from the beginning of the execution by using scan with edithamt.concat to concatenate the individual EditHAMTs. For example, we can record the access history of var1 as follows:

-300 var1_history = access_events("var1").scan(edithamt.concat)

We can also track multiple variables by calling access_events on each variable, merging the traces, then concatenating the merged trace, e.g., to track var1 and var2:

-303 access_history = \
-302     access_events("var1").merge(access_events("var2")) \
-301     .scan(edithamt.concat)

Since trace methods are lazy, this code completes immediately; the EditHAMT operations will only be applied, and the underlying traces forced, when we request a particular access, e.g., at the end of the execution (time "inf"):

-302 last = access_history.get_before("inf")

To see laziness in action, consider applying the above analysis to the execution depicted in Fig. 5, which shows two threads at the top and the corresponding EditHAMT operations at times t0 through t4 at the bottom.

[Fig. 5. Example execution with two threads accessing var1 (gray) and var2, and the corresponding EditHAMT operations returned by access_events at times t0–t4.]

Suppose we print the latest access to var1 at time t4 using the find method:

-310 >>> print last.find("var1")
-309 ("read", 2)

Because "var1" was just added at time t4, answering this query will only force the EditHAMT and query the time-travel debugger at time t4, and not before.

As another example, suppose we want to find all accesses to var1 from the last access backward using find_multi:

-312 >>> for mem_access in last.find_multi("var1"):
-311 ...     print mem_access
-310 ("read", 2)
-309 ("write", 1)

Here, since all "var1" bindings added prior to time t2 were removed at time t2, the results are computed without forcing any EditHAMTs or querying the debugger before time t2.

C. Implementation

The EditHAMT is inspired by the hash array mapped trie (HAMT) [11]. Like the HAMT, the EditHAMT is a hybrid data structure combining the fast lookup of a hash table and the memory efficiency of a trie. The HAMT is a hash-based data structure built in a manner analogous to a hash table. Whereas a hash table uses a bucket array to map keys to values, the HAMT uses an array mapped trie (AMT), a trie that maps fixed-width integer keys to values, for the same purpose. When a hash collision occurs, the HAMT resolves the collision by replacing the colliding entry with a nested HAMT, rehashing the colliding keys, and inserting those keys in the nested HAMT using the new hash values.

We developed the EditHAMT by making two changes to the traditional HAMT. First, we replaced the AMT with the LazyAMT, which supports lazy, rather than eager, updates. Second, we resolve hash collisions, as well as support remove and multiset/multimap operations, using EditLists, which are lazy linked-lists of nodes tallying edit operations on the EditHAMT; the tails are lazily retrieved from the prior EditHAMT.

1) LazyAMT: Lazy Array Mapped Tries: The first piece of the EditHAMT is the LazyAMT, which is a lazy, immutable variant of the AMT that maps fixed-width integer keys of size k bits to values. We implement the LazyAMT using lazy sparse arrays of size 2^w as internal nodes, where w is the bit-width of the array index such that w < k, and store key-value bindings as leaf nodes. We divide the key into w-bit words, where each w-bit word is used to index an internal node during a lookup; the key can be padded as necessary if k is not a multiple of w.

Lazy sparse arrays combine the properties of lazy values and sparse arrays: each element of a lazy sparse array is computed and cached when first indexed, akin to forcing a lazy value, and null elements are stored compactly, like sparse arrays. We implement lazy sparse arrays using two bitmaps to track which elements are initialized and non-null,[4] respectively, and an array to store initialized, non-null elements.

[Footnote 4: Tracking non-null elements is efficient because many modern processors provide a POPCNT instruction, which counts the number of 1 bits in a word and can be used to compute the index of a non-null element in the storage array.]

To build the EditHAMT, we need to support two operations on the LazyAMT: adding a key-value binding to a LazyAMT, and merging two LazyAMTs into a single LazyAMT. We will explain how the LazyAMT works by example, using an internal node index bit-width of w = 2 bits and a key size of k = 6 bits.

a) Adding a key-value binding: When we add a binding such as 14 : b to a LazyAMT, we first create a new lazy sparse array representing the root node:

[Diagram: a new root node of four uninitialized elements, with a lazy reference to the prior LazyAMT and the pending binding 14 : b set aside.]

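The lazy sparse arrays that make up these internal nodes can be sketched in plain Python as follows. This is a toy version under our own names, not the paper's implementation: it uses the two-bitmap representation described above, with Python's bin(...).count("1") standing in for the POPCNT instruction.

```python
class LazySparseArray:
    """Sparse array of 2**w slots. An element is computed by thunk(i)
    on first access and cached; null (None) results are recorded in a
    bitmap rather than stored."""
    def __init__(self, w, thunk):
        self.size = 1 << w
        self.thunk = thunk      # computes element i on demand
        self.init_map = 0       # bitmap: which slots have been initialized
        self.nonnull_map = 0    # bitmap: which slots are non-null
        self.storage = []       # packed non-null elements, in slot order

    def _rank(self, i):
        # storage index = number of non-null slots below i (the POPCNT trick)
        return bin(self.nonnull_map & ((1 << i) - 1)).count("1")

    def __getitem__(self, i):
        bit = 1 << i
        if not (self.init_map & bit):       # first access: force the slot
            v = self.thunk(i)
            self.init_map |= bit
            if v is not None:
                self.storage.insert(self._rank(i), v)
                self.nonnull_map |= bit
        if not (self.nonnull_map & bit):
            return None
        return self.storage[self._rank(i)]

# Example: a w=2 node whose slot 2 holds a leaf and whose other slots
# are null; we record which slots actually get computed.
forced = []
def thunk(i):
    forced.append(i)
    return ("leaf", "14:b") if i == 2 else None

node = LazySparseArray(2, thunk)
```

Indexing node[2] forces only slot 2; the remaining slots stay uncomputed until demanded, which is what lets whole subtries of the LazyAMT remain unmaterialized.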
Initially, all elements of the root node are uninitialized, which we depict as four narrow dotted boxes; we will use the convention of numbering the boxes from left to right, i.e., in binary, the leftmost box is element 00, and the rightmost box is element 11. In addition, we also maintain a lazy reference to the prior LazyAMT; we do not yet need to know what the prior LazyAMT contains, which we indicate with a dotted arrow. In fact, the lazy reference allows us to further defer the construction of (the root node of) the prior LazyAMT, i.e., the prior LazyAMT may not exist yet when we add the binding 14 : b; we indicate this with a dotted trapezoid. For example, in EXPOSITOR, we may query UndoDB to determine what binding should be added only when the lazy reference to the prior LazyAMT is first forced. We also need to store the binding 14 : b, to be added when the LazyAMT is sufficiently initialized.

The actual construction of the LazyAMT occurs only when we look up a binding. The lookup is a standard trie lookup; however, since internal nodes are lazy sparse arrays, the elements of those arrays will be initialized as necessary when we access those elements during the lookup. We initialize an element in one of three ways, depending on whether that element is along the lookup path of the binding we previously set aside to be added, and whether we have reached the end of the lookup path. If the element is along the lookup path of the binding and we have not reached the end of the lookup path, we create the next internal node and initialize the element to that node. If the element is along the lookup path of the binding and we have reached the end of the lookup path, we initialize the element to a leaf node containing that binding. Otherwise, if the element is not along the lookup path of the binding, we initialize it to point to the same subtrie as the corresponding element (at the same partial lookup path) in the prior LazyAMT, or to be null if the corresponding element does not exist. Note that in the last case, prior LazyAMTs will be recursively initialized as necessary.

For example, suppose that we look up the binding for key 14. First, we split up the key into w-bit words, here, 00 11 10 in binary; this is the lookup path for key 14. Then, we use the first word, 00, to index the root node. We need to initialize the element at 00 as this is the first time we have accessed it. Since this particular LazyAMT was created by adding 14 : b and the 00 element of the root node is along the lookup path for key 14, we initialize that element to a new uninitialized internal node below the root node:

[Diagram: the root node with element 00 initialized, pointing to a new uninitialized internal node; the binding 14 : b still pending.]

Here, we depict the initialized element of the root node as a square unbroken box, and a non-lazy reference to the just created internal node as an unbroken arrow.

We continue the lookup by using the next word, 11, to index the just created internal node. Since 11 is again along the lookup path, we initialize the corresponding element to another internal node. We repeat the process again with the last word, 10, but now that we have exhausted all bits in the lookup key, we initialize the element for 10 to point to a leaf node containing the binding 14 : b that we previously set aside. This results in a partially initialized LazyAMT:

[Diagram: a chain of internal nodes from the root along the path 00, 11, 10, ending in a leaf node containing 14 : b; all other elements remain uninitialized.]

We finish the lookup for key 14 and return b.

The example so far illustrates how the lookup process drives the initialization process in an interleaved manner. Also, since we are looking up a key that was just inserted into the LazyAMT, we did not need to refer to the prior LazyAMT at all. These properties allow LazyAMTs to be constructed lazily, by initializing only as much as necessary to answer lookup queries.

We continue the example by considering the case of looking up a key that was not added by the most recent LazyAMT. Suppose that the immediately prior LazyAMT, when computed, will add binding 5 : a:

[Diagram: the partially initialized LazyAMT for 14 : b, whose lazy reference points to a prior LazyAMT that will add 5 : a.]

If we then look up key 5, or 00 01 01 in binary, from the rightmost LazyAMT, we would first index 00 of the root node. We have already initialized this element from looking up key 14 before, so we simply walk to the next internal node and index 01. The element at 01 is uninitialized, but it is not along the lookup path of key 14, the key added to the rightmost LazyAMT. To continue the lookup of key 5, we retrieve the prior LazyAMT by forcing our lazy reference to it, causing it to be partially constructed:

[Diagram: the prior LazyAMT partially constructed, with its pending binding 5 : a.]

Then, we initialize the element at 01 to point to the subtrie under the element at the partial key 00 01 in the prior LazyAMT, partially initializing the middle LazyAMT along the way:

[Diagram: the rightmost LazyAMT's element 01 now shares the subtrie of the prior LazyAMT that leads toward 5 : a.]

We finish the lookup for key 5 as before, initializing the LazyAMTs a bit more:

[Diagram: the rightmost and middle LazyAMTs sharing the subtrie that ends in the leaf node 5 : a, alongside the leaf node 14 : b.]

Note that we have simultaneously initialized parts of the rightmost LazyAMT and the middle LazyAMT because they share a common subtrie; subsequent lookups of key 5 on either LazyAMT will become faster as a result.

b) Merging two LazyAMTs: The merge operation takes two LazyAMTs as input, which we call left and right, as well as a function, mergefn(lazy-opt-leftval, rightval), that is called to merge values for the same key in both LazyAMTs. As the names of the arguments suggest, mergefn is called in an unusual, asymmetric manner in that it is called for all keys in the right LazyAMT, but not necessarily for all keys in the left LazyAMT. For each key in the right LazyAMT, mergefn is called with a lazy, optional value, representing the value for that key in the left LazyAMT, as the lazy-opt-leftval argument, and the value for the same key in the right LazyAMT as the rightval argument. This approach maximizes laziness in that we can compute a lazy value as soon as we determine a key exists in the right LazyAMT, without immediately looking up the value for the same key in the left LazyAMT. For example, mergefn can be used to create a lazy linked-list by returning lazy-opt-leftval and rightval in a tuple.

When we merge two LazyAMTs, we first create a new root node of a new LazyAMT with two lazy references to the input LazyAMTs:

[Diagram: a new root node with lazy references to the left and right input LazyAMTs.]

As before, the actual construction of the merged LazyAMT occurs when elements of internal nodes are initialized during lookups. We initialize an element in one of three ways, depending on whether the corresponding element (at the same partial lookup path) in the right LazyAMT points to an internal node, points to a leaf node, or is null. If the corresponding element points to an internal node, we create the next internal node and initialize the element to that node. If the corresponding element points to a leaf node, then we call mergefn, giving as lazy-opt-leftval a new lazy value that, when forced, looks up the same key in the left LazyAMT (returning null if the key does not exist), and as rightval the value at the leaf node. Otherwise, if the corresponding element is null, we initialize the element to point to the same subtrie as the corresponding element in the left LazyAMT, or to be null if the corresponding element does not exist. Note that both the left and right LazyAMTs will be recursively initialized as necessary.

For example, suppose the right LazyAMT contains a single binding 14 : b:

[Diagram: the merged LazyAMT's root with lazy references to a left LazyAMT and a right LazyAMT containing the single binding 14 : b.]

If we look up key 14, or 00 11 10 in binary, we would first index element 00 of the root node. Since this is the first time we have accessed this element, we initialize it by looking at the corresponding element in the right LazyAMT. The corresponding element points to an internal node, so we initialize the element to a new internal node:

[Diagram: the merged LazyAMT with element 00 initialized to a new internal node, mirroring the right LazyAMT.]

We repeat this process until we exhaust all bits in the lookup key and reach the corresponding leaf node. Because a binding exists for key 14 in the right LazyAMT, we initialize a new leaf node that binds key 14 to a merged value, by first creating a new lazy value that looks up the key 14 from the left LazyAMT, and then calling mergefn on that lazy value as well as the value b from the right LazyAMT:

[Diagram: the merged LazyAMT with a leaf node binding 14 : mergefn(lazy lookup of 14 in left, b).]

Note that we do not have to force the left LazyAMT to be computed immediately; the decision to force the left

LazyAMT is deferred to mergefn. We finish the lookup by returning the newly merged value.

As another example, suppose the left LazyAMT contains a single binding 5 : a:

[Diagram: the merged LazyAMT with a left LazyAMT containing 5 : a, a right LazyAMT containing 14 : b, and the merged leaf 14 : mergefn( , b).]

If we look up key 5, or 00 01 01 in binary, we would first index element 00 of the root node, which is already initialized above. However, when we index element 01 of the next internal node, we would find that the corresponding element in the right LazyAMT is null. In this case, we look up the subtrie under the corresponding element in the left LazyAMT, and initialize element 01 to it:

[Diagram: element 01 of the merged LazyAMT initialized to share the subtrie leading to 5 : a in the left LazyAMT.]

Note that we do not call mergefn in this case. We finish the lookup by walking into the left LazyAMT to the leaf node for key 5, and returning a.

Finally, we represent the empty LazyAMT as a special instance that simply returns null for all key and partial-key lookups, allowing us to avoid building a trie structure with no bindings.

LazyAMT lookups are amortized O(1) time; the LazyAMT has a fixed depth of k/w, so each lookup takes constant time to traverse the internal nodes to a leaf node. Similarly, adding a binding to a LazyAMT or merging two LazyAMTs takes only amortized O(1) time and memory to create k/w internal nodes of size 2^w each. The amortization of lookups, adding bindings, and merging LazyAMTs is due to laziness: most of the cost of adding a binding or merging two LazyAMTs is deferred to subsequent lookups. Adding the same key repeatedly would take an additional amortized O(1) memory each time (i.e., prior bindings are never removed, only shadowed). However, this is actually an advantage in EXPOSITOR, as we are usually interested in how bindings to the same key change over time.

2) EditList: Lazy Linked-List of Edit Operations: The second piece of the EditHAMT is the EditList, which is a lazy, immutable set/map/multiset/multimap implemented as a lazy linked-list of edit operations.

When an operation such as add or remove is applied to an EditList, we simply append a new node to the EditList, labeling it with the given operation and arguments. For example, if we add a binding k0 : a to an EditList (treating the EditList as a map or multimap), we create a new EditList node labeled addkeyvalue(k0, a) with a lazy reference to the head of the prior EditList (i.e., the prior node):

[Diagram: a node labeled addkeyvalue(k0, a) with a dotted (lazy) arrow to the prior EditList.]

We depict the EditList node as a box pointing to the left. Since all operations, including removals, are implemented by appending nodes, we do not need to know what the prior EditList contains, which we depict as a dotted arrow. And, as with the LazyAMT, we keep a lazy reference to the prior EditList, which allows us to further delay any computation necessary to determine its contents, i.e., the prior EditList may not exist yet when we perform an addition or removal operation on it, which we depict as a dotted rounded box. Only when we first force the lazy reference to the prior EditList will we need to determine what kind of edit operation the prior EditList contains, or if it is null, e.g., by making queries to UndoDB in EXPOSITOR.

We support several different kinds of edit operations on EditLists, with the corresponding node labels:

• add(k): add element k, treating the EditList as a set or multiset;
• addkeyvalue(k, v): add a binding from key k to value v, treating the EditList as a map or multimap;
• remove(k): remove all elements/bindings for key k;
• removeone(k): remove the latest element/binding for key k, treating the EditList as a multiset or multimap;
• removekeyvalue(k, v): remove the latest binding from key k to value v, treating the EditList as a multimap;
• concat(lazy_el): concatenate the EditList lazy_el as a lazy reference at this point.

These are the building blocks for the corresponding EditHAMT operations listed in Fig. 4.

We implement EditList lookup in two different ways: one for set membership queries and map lookups, and another for multiset and multimap lookups.

a) Set Membership Queries and Map Lookups: We implement set membership queries and map lookups by traversing the EditList from head to tail (right to left), looking for the first node that contains an edit operation for the given key. For example, if we look up the value for key k0 in the above EditList, we would start by looking at the head node. Since the head node adds a binding from key k0, we can finish the lookup and return value a without having to look at the prior EditList. As another example, suppose that the immediately prior node, when we force it to be computed, e.g., by making calls to UndoDB in EXPOSITOR, removes key k1:

[Diagram: prior EditList <- remove(k1) <- addkeyvalue(k0, a)]

If we look up the value for key k1, we would first look at the head node and skip it because it does not involve key k1. Then, we would force the lazy reference to the prior node, causing it to be initialized if necessary, and look at it:

[Figure: prior EditList ⇠ remove(k1) ← addkeyvalue(k0, a), with the tail now forced.]

Here, we indicate that the tail has been forced with an unbroken arrow, and that the prior node has been initialized by replacing the dotted rounded box with a pointed box. Since the prior node removes key k1, we know that bindings no longer exist for key k1, so we can finish the lookup and return null. This example shows that, once we have found a node that involves the lookup key, we no longer need to traverse the rest of the EditList. Also, because the reference to the prior EditList is lazy, the lookup process drives the construction of prior EditLists in an interleaved manner, just as in the LazyAMT.

If we find a concat node during traversal, we handle that node by recursively looking up the concatenated EditList for the given key. If we do not find any relevant nodes in the concatenated EditList, we then resume the lookup in the original EditList.

b) Multiset and Multimap Lookups: We implement multiset and multimap lookups lazily by returning a Python iterator that allows all values for a given key to be incrementally retrieved, typically via a for x in values loop. To illustrate how we implement multimap lookups, suppose we have the following EditList:

[Figure: None ← addkeyvalue(k0, c) ← addkeyvalue(k0, b) ← removeone(k0) ← addkeyvalue(k0, a).]

Reading backward from the tail, this EditList binds key k0 to value c, binds value b and removes it, then binds value a; i.e., it represents a map that contains bindings from k0 to values a and c, but not b. A multimap lookup on this EditList for key k0 will return an iterator of all values bound to that key. The iterator is associated with the key k0 and initially points to the beginning of the input EditList, which we depict as a large down-arrow labeled k0. The actual lookup does not begin until we retrieve a value from the iterator, which initiates a traversal of the input EditList to find a binding for key k0. In this case, the head of the input EditList itself contains a binding to value a, so we return the value a and update the iterator to point to the tail of the input EditList (the head of the prior EditList), which we depict by moving the k0 down-arrow.

If we retrieve a value from the iterator again, the lookup continues from the head of the prior EditList. The prior node removes the next binding for key k0; we temporarily note this as a pending removal. The node after that adds a binding for key k0, but since we have a pending removal, we skip over that node. Next, we reach a node that binds key k0 to value c, so we return c and update the iterator as before.

If we retrieve a value from the iterator once more, the lookup will reach the end of the input EditList, which we depict as None, so we terminate the iterator. At this point, all nodes in the input EditList will have been initialized.

We implement multiset lookups identically, by treating add(k) as if key k were bound to itself. Note in particular that, whereas a standard multiset lookup gives us the total number of values for a given key, a lazy multiset lookup allows us to ask whether there are at least n values for a given key. As we illustrated above, both multiset and multimap lookups are lazy in that the returned iterator initializes only as many nodes of the input EditList as are retrieved.

EditList set membership and map/multiset/multimap lookups are not particularly fast: they take O(n) time, where n is the number of edit operations in the EditList (i.e., its length), to traverse the EditList to find a relevant binding. However, applying an edit operation takes only O(1) time and O(1) memory to create and append a single EditList node. Adding an element or binding to the same key repeatedly takes an additional O(1) memory each time, but as we explained for the LazyAMT, this is actually an advantage for EXPOSITOR, as we are usually interested in how bindings change over time.
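The EditList lookups described above can be sketched in plain Python. This is an illustrative model rather than EXPOSITOR's implementation: the lazy tail is simulated with a zero-argument thunk, and only the edit operations used in the examples are included; all names here are ours.

```python
class EditList:
    """One node of a lazy linked list of edit operations.

    `prior` is a thunk (zero-argument function) returning the previous
    node or None, so the tail need not exist yet when a node is
    appended -- mirroring the lazy reference in the text."""
    def __init__(self, op, key, value, prior):
        self.op, self.key, self.value = op, key, value
        self._prior = prior            # thunk; forced on demand
        self._forced = None            # memoized result of forcing

    def prior(self):
        if self._forced is None:
            self._forced = ('done', self._prior())
        return self._forced[1]

def addkeyvalue(prior, k, v):
    return EditList('addkeyvalue', k, v, prior)

def remove(prior, k):
    return EditList('remove', k, None, prior)

def removeone(prior, k):
    return EditList('removeone', k, None, prior)

def lookup(el, k):
    """Map lookup: the first node involving k decides the result."""
    while el is not None:
        if el.key == k:
            if el.op == 'addkeyvalue':
                return el.value
            if el.op == 'remove':
                return None            # all bindings for k removed
        el = el.prior()                # force the tail only when needed
    return None

def values(el, k):
    """Lazy multimap lookup: yield bindings for k newest-first,
    treating each removeone as a pending removal that skips the
    next older addkeyvalue for the same key."""
    pending = 0
    while el is not None:
        if el.key == k:
            if el.op == 'removeone':
                pending += 1
            elif el.op == 'addkeyvalue':
                if pending:
                    pending -= 1       # shadowed by a later removeone
                else:
                    yield el.value
            elif el.op == 'remove':
                return
        el = el.prior()
```

Running the example EditList from the text, None ← addkeyvalue(k0, c) ← addkeyvalue(k0, b) ← removeone(k0) ← addkeyvalue(k0, a), a map lookup of k0 returns a, and the multimap iterator yields a and then c, skipping the shadowed b.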

3) EditList + Hash + LazyAMT = EditHAMT: As we described above, the LazyAMT provides fast amortized O(1) set/map lookups, but supports only fixed-width integer keys and only addition and merging operations; it does not support removal operations or multiset/multimap lookups. Conversely, the EditList supports arbitrary keys and removal operations, as well as multiset/multimap lookups, but its lookups are a slow O(n), where n is the number of edit operations in the EditList. Both the LazyAMT and the EditList support lazy construction, i.e., we can perform operations such as addition or removal without knowing what the prior LazyAMT or EditList contains. Finally, each LazyAMT operation takes only an additional amortized O(1) time and memory over the prior LazyAMT, and likewise O(1) time and memory for the EditList.

We combine these two data structures to create the EditHAMT, a lazy data structure that is more capable than the LazyAMT and faster than the EditList. The key idea is to build multiple EditLists, each containing only edits for keys with the same hash h, which we denote as editlist(h), and to use the LazyAMT to map hash h to editlist(h). We can then look up a key k using the following steps:

1) compute the hash h of key k;
2) look up editlist(h) from the LazyAMT of the EditHAMT;
3) look up key k from editlist(h).

The lookup process will cause the underlying LazyAMT or editlist(h) to be initialized as necessary.

We construct EditHAMTs in one of two ways: we use the LazyAMT addition operation to perform addition or removal operations on an EditHAMT, and the LazyAMT merge operation to concatenate two EditHAMTs.

a) Performing Addition or Removal Operations on an EditHAMT: We take the following steps to perform an addition or removal operation for a given key k on an EditHAMT:

1) compute the hash h of key k;
2) lazily look up the LazyAMT of the prior EditHAMT as lazyamt′;
3) lazily look up editlist′(h) from lazyamt′, and append the given operation to it to create editlist(h);
4) add a new binding from hash h to editlist(h) to lazyamt′ to create the updated EditHAMT;

where by lazily looking up we mean creating a lazy reference that, when forced, looks up lazyamt′ or editlist′(h); this allows the construction of the prior EditHAMT to also be lazy. Because both the LazyAMT and the EditList are lazy, the EditHAMT will be mostly uninitialized at first; (parts of) it will be initialized as necessary during lookups.

For example, suppose we add a binding from key k0 to value b to an EditHAMT, and key k0 has hash 14. We would create a new LazyAMT that maps hash 14 to a new editlist(14) that appends addkeyvalue(k0, b) to the prior EditHAMT:

[Figure: prior EditHAMT ⇠ editlist(14) = addkeyvalue(k0, b).]

At first, most of the EditHAMT—the LazyAMT as well as the tail of editlist(14)—is uninitialized; we indicate the uninitialized editlist(14) tail as a dotted arrow pointing to the prior EditHAMT. When we look up key k0, we would look up hash 14 in the LazyAMT to find the editlist(14) that we just added, then look up key k0 in editlist(14). The head of editlist(14) adds key k0, so we can return value b without looking at the tail. This example shows that we can apply an edit operation without knowledge of the prior EditHAMT, and, since the key k0 was just added, we can look up key k0 without having to consult the prior EditHAMT. Both of these properties are inherited from the underlying LazyAMT and EditList.

We continue the example by considering the case of looking up a different key that was involved in an earlier edit operation. Suppose that the immediately prior EditHAMT removes key k1, and key k1 has hash 5. When we look up key k1, the EditHAMT will be initialized as follows:

[Figure: prior EditHAMT ⇠ editlist(5) = remove(k1); editlist(14) = addkeyvalue(k0, b).]

Looking up editlist(5) from the rightmost LazyAMT causes parts of the middle LazyAMT to be initialized and shared with the rightmost LazyAMT, and, since the head of editlist(5) removes key k1, we can return null without looking at its tail.

The two examples so far do not require traversing the tail of editlist(h). We would need to do so if there were a hash collision or if we performed a multiset or multimap lookup. For example, suppose that an earlier EditHAMT added a binding from key k0 (with hash 14) to value a, and we perform a multimap lookup of key k0 to find the second value. We have to initialize the tail of editlist(14) by looking up editlist′(14) from the prior EditHAMT. The immediately prior EditHAMT did not involve key k0 or hash 14; instead, we partially initialize the earlier EditHAMT containing editlist′(14) and share it. Then, we retrieve editlist′(14) and initialize it as the tail of editlist(14):

[Figure: editlist(14) = addkeyvalue(k0, b), with its tail forced to editlist′(14) = addkeyvalue(k0, a) from the earlier EditHAMT.]
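The construction and lookup steps above can be sketched in Python. This is a deliberately simplified model, not EXPOSITOR's implementation: an ordinary (copied) dict stands in for the LazyAMT, so only the per-hash edit lists are lazy; a tiny 4-bit hash is used for illustration; and the class and method names are ours.

```python
class Node:
    """One node of a per-hash edit list, with a lazy tail."""
    def __init__(self, op, key, value, tail):
        self.op, self.key, self.value = op, key, value
        self.tail = tail            # thunk returning the prior Node or None

HASH_BITS = 4                       # tiny hash space, for illustration only

def h(key):
    return hash(key) & (2**HASH_BITS - 1)

class EditHAMT:
    def __init__(self, amt):
        self.amt = amt              # hash -> head Node of editlist(h)

    @staticmethod
    def addkeyvalue(prior, key, value):
        """Steps 1-4 from the text: hash the key, lazily fetch the prior
        editlist'(h), append the new edit, and rebind h to the result."""
        hv = h(key)
        tail = lambda: prior.amt.get(hv) if prior is not None else None
        amt = dict(prior.amt) if prior is not None else {}
        amt[hv] = Node('addkeyvalue', key, value, tail)
        return EditHAMT(amt)

    @staticmethod
    def remove(prior, key):
        hv = h(key)
        tail = lambda: prior.amt.get(hv) if prior is not None else None
        amt = dict(prior.amt) if prior is not None else {}
        amt[hv] = Node('remove', key, None, tail)
        return EditHAMT(amt)

    def find(self, key):
        """Three-step lookup: hash key, fetch editlist(h), scan it.
        Scanning checks node.key, so hash collisions are handled."""
        node = self.amt.get(h(key))
        while node is not None:
            if node.key == key:
                if node.op == 'addkeyvalue':
                    return node.value
                if node.op == 'remove':
                    return None
            node = node.tail()      # forces the prior edit list lazily
        return None
```

Each constructor returns a new EditHAMT and leaves the prior one untouched, so every intermediate version remains queryable, mirroring how a trace can retain every intermediate EditHAMT.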

b) Concatenating Two EditHAMTs: To concatenate two EditHAMTs, we call the LazyAMT merge operation with:

• the older EditHAMT as the left LazyAMT;
• the newer EditHAMT as the right LazyAMT;
• a mergefn function that creates a new editlist(h) by appending an EditList concat node containing editlist′(h) from the right EditHAMT to the lazy, optional value containing editlist″(h) from the left EditHAMT;

where the older and newer EditHAMTs are lazy eh1 and lazy eh2, respectively, in Fig. 4.

For example, suppose we concatenate two EditHAMTs, where the right EditHAMT contains a single binding k0 : b and key k0 has hash 14. We would create a new EditHAMT using the LazyAMT merge operation as described above:

[Figure: left EditHAMT ⇠ merged EditHAMT, with editlist′(14) = addkeyvalue(k0, b) from the right EditHAMT.]

If we perform a map lookup on the concatenated EditHAMT to find the value for key k0, we would look up the concatenated EditHAMT for hash 14 to retrieve editlist(14), and perform a map lookup on it. This will cause the concatenated EditHAMT to be initialized as follows:

[Figure: editlist(14) = concat(editlist′(14)) with a lazy tail into the left EditHAMT; editlist′(14) = addkeyvalue(k0, b).]

The head of editlist(14) is a concat node created by the mergefn described above. When we look up key k0 in editlist(14), we recursively look up editlist′(14) from the right EditHAMT for the same key. Since the head of editlist′(14) adds key k0, we finish the lookup by returning the mapped value b. We do not have to look at the tail of editlist(14), and can avoid forcing the left EditHAMT for now—a property inherited from the LazyAMT merge operation.

To consider an example that requires us to force the left EditHAMT, suppose that the left EditHAMT contains a single binding k1 : a, where key k1 also has hash 14. If we look up key k1 in the concatenated EditHAMT, it will be initialized in the following manner:

[Figure: editlist(14) = concat(editlist′(14)) with its tail forced to editlist″(14) = addkeyvalue(k1, a) from the left EditHAMT.]

We would look up key k1 in editlist(14), and recursively look up editlist′(14) for the same key. Because editlist′(14) does not contain key k1, we would then resume the lookup in editlist(14), forcing its tail to retrieve editlist″(14) from the left EditHAMT, and finally return the mapped value a.

The combination of the LazyAMT and the EditList enables the EditHAMT to support all operations that the EditList supports, reduces the EditList's lookup cost to amortized O(1) if we assume no hash collisions, and takes only an additional amortized O(1) time and memory for each edit operation. However, multiset and multimap lookups take amortized O(n) time, where n is the number of removeone and removekeyvalue operations.

D. Comparison with Other Data Structures

Compared to Python sets, which are implemented as hash tables, it is more memory efficient to make an updated copy of the EditHAMT, since only a constant number of nodes in the

underlying LazyAMT are created, than it is to make a copy of the bucket array in the hash table underlying Python sets, which can be much larger. This makes it viable to store every intermediate EditHAMT as it is created in a trace, as each EditHAMT requires only an additional amortized O(1) memory over the prior EditHAMT. In our current implementation, a trace of EditHAMTs is cheaper than a trace of Python sets (which requires deep copying) if, on average, each EditHAMT or set in the trace has more than eight elements.

It is also common to implement sets or maps using self-balancing trees such as red-black trees or AVL trees. However, we observe that it is not possible to make these tree data structures as lazy as the EditHAMT. In these data structures, a rebalancing operation is usually performed during or after every addition or removal operation to ensure that no path in the tree exceeds a certain bound. However, the root node may be swapped with another node in the process of rebalancing (in fact, every node can potentially be swapped due to rebalancing). This means that the root node of a self-balancing tree is determined by the entire history of addition and removal operations. Thus, we would be forced to compute the entire history of addition and removal operations when we traverse the root node to look up a key, defeating laziness.

We observe a similar issue with hash tables. Hash tables are based on a bucket array that is used to map hash values to keys. Typically, the bucket array is dynamically resized to accommodate the number of keys in the hash table and to reduce hash collisions. However, the number of keys in the hash table is determined by the entire history of addition and removal operations. As a result, we would be forced to compute the entire history of addition and removal operations before we could use the bucket array to map a hash value to a key, again defeating laziness.

Furthermore, the EditHAMT suffers much less from hash collisions than hash tables. The LazyAMT in the EditHAMT is a sparse integer map, unlike the bucket array in hash tables, and thus can be made much larger while using little memory, which reduces the likelihood of hash collisions.

V. MICRO-BENCHMARKS

We ran two micro-benchmarks to evaluate the efficiency of EXPOSITOR. In the first micro-benchmark, we evaluated the advantage of laziness by comparing a script written in EXPOSITOR against several equivalent scripts written using non-lazy methods. In the second micro-benchmark, we compared the performance of scripts using the EditHAMT against equivalent scripts that use non-lazy data structures.

A. Test Program

For both micro-benchmarks, we use the test program in Fig. 6 as the subject of our EXPOSITOR scripts. This program consists of two do-nothing functions, foo on line 5 and bar on line 6, and the main function on lines 7–21 that calls foo and bar in several nested loops.

 1  #define LOOP_I 16
 2  #define LOOP_J 16
 3  #define LOOP_K 16
 4  #define LOOP_L 8
 5  void foo(int x, int y) {}
 6  void bar(int z) {}
 7  int main(void) {
 8    int i, j, k, l;
 9    for (i = 0; i < LOOP_I; i++) {
10      for (j = 0; j < LOOP_J; j++) {
11        for (k = 0; k < LOOP_K; k++) {
12          bar(i * LOOP_K + k);
13        }
14        foo(i, i * LOOP_J + j);
15        for (l = 0; l < LOOP_L; l++) {
16          bar(i * LOOP_L + l);
17        }
18      }
19    }
20    return 0;
21  }

Fig. 6. Micro-benchmark test program.

B. Evaluating the Advantage of Trace Laziness

In our first micro-benchmark, we measure the advantage of trace laziness using the following procedure. We first start EXPOSITOR on the test program in Fig. 6 and run the following script:

foo_trace = the_execution.breakpoints("foo")
trace1 = foo_trace.filter(
    lambda snap: int(snap.read_arg("x")) % 2 == 0)

This script creates a trace named foo_trace of calls to foo, and a trace named trace1 that keeps only calls to foo where the argument x is even. We then measure the time it takes to call get_after to find the first item in trace1 after time 0. We repeat this measurement to find the second item, the third item, and so forth, until there are no more items left in trace1. Next, we create another trace named trace2:

trace2 = foo_trace.filter(
    lambda snap: int(snap.read_arg("x")) % 2 == 1)

This trace is similar to trace1, but keeps only calls to foo where x is odd. We then again measure the time it takes to call get_after to find each item in trace2. Finally, we restart EXPOSITOR and repeat the entire procedure, but use get_before to find all items in trace1 and trace2 starting from the end of the execution, instead of using get_after.

We compare the above script against several equivalent scripts that are not lazy, written either in a variant of EXPOSITOR with trace laziness disabled, or using the standard GDB Python API with or without UndoDB. We disable trace laziness in EXPOSITOR by immediately performing the equivalent of get_after(t) on lazy nodes as they are created, where t is the beginning of the time interval on those nodes, and replacing the lazy nodes with the computed contents.

The results are shown in Fig. 7a for the procedure using get_after, and in Fig. 7b for the procedure using get_before.
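The behavior being measured can be illustrated with a toy model of a lazy trace, sketched below under the assumption that a trace is a time-ordered stream of (time, snapshot) pairs; real EXPOSITOR traces are lazy interval trees over a time-travel execution, so this model mirrors only the forcing behavior, not the implementation. The `forced` list records which events have been materialized.

```python
# Toy model of trace laziness (our names, not EXPOSITOR's API):
# creating a filtered "trace" does no work; get_after forces only
# the prefix of the underlying event stream needed to answer the query.

forced = []                        # records which events were materialized

def events():
    """Pretend execution: event at each time t, with argument x == t."""
    for t in range(1, 257):
        forced.append(t)           # materializing an event has a cost
        yield (t, {'x': t})

class LazyTrace:
    def __init__(self, it):
        self.it = it               # underlying lazy stream
        self.cache = []            # memoized prefix, reused across queries

    def filter(self, pred):
        # Builds a new lazy trace; no events are forced here.
        return LazyTrace(p for p in self.it if pred(p[1]))

    def get_after(self, t):
        # First consult the memoized prefix, then force further events.
        for p in self.cache:
            if p[0] > t:
                return p
        for p in self.it:
            self.cache.append(p)
            if p[0] > t:
                return p
        return None

trace1 = LazyTrace(events()).filter(lambda snap: snap['x'] % 2 == 0)
assert forced == []                # "action 0" (creating trace1) is free
first = trace1.get_after(0)
assert first == (2, {'x': 2})
assert forced == [1, 2]            # only the prefix up to the match ran
```

Each later get_after call pays only for the additional events it forces, which is the incremental cost visible in the lazy_trace plot.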

[Fig. 7 shows two plots, (a) get_after and (b) get_before, each plotting cumulative time in seconds against action # for the startup, lazy_trace, strict_trace, gdb_python, and undodb_python configurations.]

Fig. 7. The time it takes to get all items in two traces using lazy or non-lazy scripts.

The x-axes are labeled “action #” and indicate particular actions taken during the benchmarking procedure:

• action 0 corresponds to creating trace1;
• actions 1–128 correspond to calling get_after or get_before repeatedly to get all items in trace1;
• action 129 (the vertical gray line) corresponds to creating trace2;
• actions 130–257 correspond to calling get_after or get_before repeatedly to get all items in trace2.

For scripts using the standard GDB Python API, action 0 and action 129 correspond instead to creating the equivalent of trace1 or trace2, i.e., creating a standard Python list containing the times, rather than snapshots, of the relevant calls to foo. The y-axes indicate cumulative time in seconds, i.e., the total time it takes to perform all actions up to a particular action.

The startup plot simply marks the time it takes to start up EXPOSITOR (i.e., GDB and UndoDB) from the command line, before running any scripts. The lazy_trace plot corresponds to the EXPOSITOR script above. The strict_trace plot corresponds to the same script, but run in a variant of EXPOSITOR with trace laziness disabled. The gdb_python plot corresponds to a script written using the standard GDB Python API without the time-travel features of UndoDB, restarting the execution at action 129 (when the equivalent of trace2 is created), and without caching intermediate computation. Note that because gdb_python does not use time travel, the gdb_python scripts in Fig. 7a and Fig. 7b are the same, i.e., they both run the execution forward only, and gdb_python records times instead of snapshots; plotting gdb_python in Fig. 7b allows us to compare forward execution against backward execution. Lastly, the undodb_python plot uses a nearly identical script to gdb_python, but uses UndoDB to rewind the execution at action 129 instead of restarting it, and runs the execution backward in Fig. 7b.

From Fig. 7a, we can see that lazy_trace takes zero time to perform action 0, whereas all the other implementations take some non-zero amount of time. In particular, strict_trace is slower than all other implementations, which suggests that the trace data structure has high overhead when laziness is disabled. Also, undodb_python is slightly faster than gdb_python at action 129, since it is faster to rewind an execution than to restart it.

As we expect from laziness, each call to get_after in lazy_trace takes a small additional amount of time, whereas the non-lazy implementations take no additional time (since all relevant calls to foo have already been found at action 0). When we have found about 40% of the items in trace1, the cumulative time of lazy_trace reaches that of gdb_python. This tells us that, as long as we do not make queries to more than 40% of an execution, it takes less time to construct and query a lazy trace than to use the other, non-lazy implementations. This is actually the common scenario in debugging, where we expect programmers to begin with some clues about when the bug occurred. For example, a stack corruption typically occurs near the end of the execution, so we would likely only have to examine the last few function calls in the execution.

Furthermore, we observe that the slope of lazy_trace is shallower at actions 130–257. This is because trace2 reuses foo_trace, which was fully computed and cached during actions 1–128; thus, EXPOSITOR does not have to perform as much work to compute trace2. strict_trace also benefits from caching, since it uses a (non-lazy) variant of the trace data structure. In contrast, gdb_python and undodb_python do not reuse any computation, so action 129 takes the same amount of time as action 0.

Fig. 7b shows the results of the benchmark procedure using get_before. We can see that it is much slower to use get_before than get_after—all scripts but gdb_python take longer than in Fig. 7a (gdb_python actually runs forward, as we noted above). This is because these scripts have to run the execution in two passes: first to get to the end of the execution, then to execute the get_before calls back to the beginning of the execution. Unlike the other scripts, the gdb_python script only has to run forward once and not backward, and so is much faster. Still, lazy_trace can be faster than gdb_python if queries are made to fewer than about 10% of an execution.

We note that the results of this micro-benchmark actually suggest a lower bound on the advantage of trace laziness. This micro-benchmark is based on a very simple EXPOSITOR script that filters calls to foo using a simple predicate. Therefore, the time used by each script is dominated by the time it takes to set a breakpoint at foo and to run the execution, forward or backward, until the breakpoint. To a lesser extent, the trace data structure in lazy_trace adds overhead relative to gdb_python; the filter predicate in lazy_trace and the equivalent predicate in gdb_python take very little time in contrast. We expect more complex EXPOSITOR scripts to spend more time in programmer-provided helper functions, such as filter predicates or scan operators, which will mask the overhead of the trace data structure.

C. Evaluating the Advantage of the EditHAMT

In our second micro-benchmark, we evaluate the advantages of the EditHAMT data structure using the following procedure. We first create a trace of EditHAMTs:

bar_maps = the_execution.breakpoints("bar") \
    .map(lambda snap: edithamt.addkeyvalue(
        None, int(snap.read_arg("z")), snap)) \
    .scan(edithamt.concat)

For each call to bar(z), we create a new EditHAMT that adds a binding from the argument z to the snapshot of that call. In other words, an EditHAMT in bar_maps at time t contains bindings from arguments z to the corresponding bar(z) calls preceding and including the bar(z) call at time t.

We then create another trace that looks up values from bar_maps:

bar_of_foos = the_execution.breakpoints("foo") \
    .trailing_merge(
        lambda snap, bar_map:
            bar_map.force().find(int(snap.read_arg("y"))),
        bar_maps)

For each call to foo(x, y), we use the trailing_merge method to look up the immediately prior EditHAMT in bar_maps, and then look up that EditHAMT for the most recent bar(z) call where y = z.
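The map/scan/trailing_merge pipeline above can be modeled with plain Python lists, replacing traces with time-ordered lists of pairs and EditHAMTs with ordinary dicts. The call times below are invented for illustration, and none of the laziness or structural sharing of the real data structures is modeled—only the data flow.

```python
from itertools import accumulate

# Non-lazy model of the pipeline: traces become lists of (time, event)
# pairs, and each EditHAMT becomes a dict. Times are hypothetical.

bar_calls = [(1, 10), (2, 11), (4, 12)]   # (time, z) for each bar(z) call
foo_calls = [(3, 11), (5, 12)]            # (time, y) for each foo(x, y) call

# map: each bar(z) call becomes a one-binding map {z: time-of-call}
singletons = [(t, {z: t}) for (t, z) in bar_calls]

# scan: concatenate maps, so the map at time t holds all bar calls <= t
def concat(acc, nxt):
    t, m = nxt
    merged = dict(acc[1])                 # copy; the real EditHAMT shares
    merged.update(m)                      # later bindings shadow earlier
    return (t, merged)

bar_maps = list(accumulate(singletons, concat))

# trailing_merge: for each foo(x, y), consult the latest map strictly
# before the foo call (the "immediately prior" map) and look up y in it
def trailing_merge(foos, maps):
    out = []
    for (t, y) in foos:
        prior = [m for (mt, m) in maps if mt < t]
        latest = prior[-1] if prior else {}
        out.append((t, latest.get(y)))
    return out

bar_of_foos = trailing_merge(foo_calls, bar_maps)
# foo at time 3 matches the bar(11) call at time 2;
# foo at time 5 matches the bar(12) call at time 4.
```

In the real script, the eager dict copy in concat is exactly what the EditHAMT avoids: edithamt.concat appends a lazily forced reference instead of merging eagerly.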
Next, we measure the time it takes to call get_after to look up the first item from bar_of_foos, which includes the time it takes to compute the EditHAMTs in bar_maps as necessary, as well as to compute parts of the bar_maps and bar_of_foos traces. We also measure the additional memory used by Python after the call to get_after,⁵ which includes the memory required to cache the intermediate computation of the EditHAMTs in bar_maps as well as that of the bar_of_foos and bar_maps traces, but does not include the memory usage of other parts of GDB or of UndoDB. We repeat these measurements to find the second item, the third item, and so forth, until there are no more items in bar_of_foos. Finally, we restart EXPOSITOR and repeat the entire procedure using get_before to find all items in bar_of_foos starting from the end of the execution, instead of using get_after.

For this micro-benchmark, we set the key size of the EditHAMT to k = 35 bits and its internal node index bit-width to w = 5 bits, which gives it a maximum depth of 7. These parameters are suitable for 32-bit hash values that are padded to 35 bits.

We compare the above script against several equivalent scripts that use other data structures, as well as a script that uses the EditHAMT in a variant of EXPOSITOR with trace laziness disabled as described in Sec. V-B, and a script that uses the standard GDB Python API, without the time-travel features of UndoDB, with a list of Python dicts (hash tables) as the equivalent of bar_maps.

The results are shown in Fig. 8: Fig. 8a and Fig. 8b show the time measurements using get_after and get_before, respectively, while Fig. 8c and Fig. 8d show the memory measurements using get_after and get_before, respectively. The x-axes are labeled “action #” and indicate particular actions taken during the benchmarking procedure:

• action 0 corresponds to creating bar_maps and bar_of_foos;
• actions 1–256 correspond to calling get_after or get_before repeatedly to get all items in bar_of_foos.

For the script using the standard GDB Python API, action 0 corresponds instead to creating a list of Python dicts mapping the arguments z of bar to the times, rather than snapshots, of calls to bar. The y-axes indicate cumulative time in seconds or cumulative memory usage in bytes, i.e., the total time or memory it takes to perform all actions up to a particular action.

The startup plot simply marks the time or memory it takes to start up EXPOSITOR (i.e., GDB and UndoDB) from the command line, before running any scripts. The lazy_trace_edithamt plot corresponds to the EXPOSITOR script above, which creates EditHAMTs, which are lazy, in bar_maps. The lazy_trace_rbtree plot corresponds to a similar script, but creates maps based on immutable red-black trees, which are not lazy, instead of EditHAMTs in bar_maps. Likewise, the lazy_trace_python_dict plot creates Python dicts, which are also not lazy, in bar_maps. The strict_trace_edithamt plot corresponds to a script that uses the EditHAMT in a variant of EXPOSITOR with trace laziness disabled, as described in Sec. V-B. Lastly, the gdb_python plot corresponds to a script written using only the standard GDB Python API, without the time-travel features of UndoDB. Note that because gdb_python does not use time travel, the gdb_python scripts in Figs. 8a–8d

⁵We use Python's sys.getsizeof and gc.get_objects functions to measure Python's memory usage.

are all the same, i.e., they all run the execution forward only, and gdb_python records times instead of snapshots; plotting gdb_python in Fig. 8b and Fig. 8d allows us to compare forward execution against backward execution.

[Fig. 8 shows four plots: (a) get_after (time), (b) get_before (time), (c) get_after (memory), and (d) get_before (memory), each plotting cumulative time in seconds or cumulative memory usage in bytes against action # for the startup, lazy_trace_edithamt, lazy_trace_rbtree, lazy_trace_python_dict, strict_trace_edithamt, and gdb_python configurations.]

Fig. 8. The time and memory it takes to get all items in a trace computed by looking up items from an EditHAMT or other data structures.

From Fig. 8a, we can see that the scripts that use EXPOSITOR, lazy_trace_X, take zero time to perform action 0, whereas strict_trace_edithamt and gdb_python, which are not lazy, take some amount of time to do so—the former more than the latter. This is the result we expect based on the results in Sec. V-B. Also, for lazy_trace_X, each get_after call takes a small additional amount of time. In particular, lazy_trace_edithamt takes less additional time than lazy_trace_rbtree and lazy_trace_python_dict. This is due to EditHAMT laziness: lazy_trace_edithamt only has to compute as many (parts) of the EditHAMTs in bar_maps as are needed to answer the lookup in bar_of_foos. In contrast, red-black trees and Python dicts are not lazy, so lazy_trace_rbtree and lazy_trace_python_dict have to compute all prior red-black trees or Python dicts, respectively, in bar_maps before answering even the first lookup in bar_of_foos. Also, since there are many more calls to bar than to foo in the test program (Sec. V-A), and the bar call that matches a foo call occurs within a few calls to bar, lazy_trace_edithamt has to examine far fewer bar calls than lazy_trace_rbtree or lazy_trace_python_dict. Furthermore, as long as we make queries to fewer than about 30% of the items in bar_of_foos,

it is faster to use lazy_trace_edithamt than it is to use gdb_python. We also note that it is far easier to compose or reuse lazy_trace_edithamt than gdb_python. For example, if we later decide to compare the argument x of foo to the matching bar call, we can easily create a new trace that maps foo calls to their x argument and merge it with bar_of_foos in lazy_trace_edithamt, while we would need to modify and rerun gdb_python to collect the x arguments.

The results are quite different for lazy_trace_rbtree and lazy_trace_python_dict in Fig. 8b—action 0 still takes zero time, but action 1, the very first call to get_before, is very slow, contributing most of the cumulative time by the end of the micro-benchmark. This is because that very first call to get_before retrieves the very last item in bar_of_foos near the end of the execution, which looks up one of the last red-black trees or Python dicts in bar_maps to find the matching bar call. As we explained above, since red-black trees and Python dicts are not lazy, lazy_trace_rbtree and lazy_trace_python_dict have to compute all prior red-black trees or Python dicts, respectively, and examine almost all bar calls when performing action 1. These cases highlight the importance of using lazy data structures in EXPOSITOR scripts—non-lazy data structures can completely defeat the advantage of trace laziness.

Looking at memory usage, Fig. 8c shows that each get_after call in lazy_trace_edithamt and lazy_trace_rbtree takes a small additional amount of memory. EditHAMTs use slightly more memory than red-black trees, partly because, due to EditHAMT laziness, we do not compute as many (parts of) EditHAMTs as red-black trees. Also, the memory cost of EditHAMTs is exaggerated in our Python-based implementation, since it makes extensive use of closures, which are rather costly memory-wise in Python.⁶ We believe that an implementation of EditHAMTs in more efficient languages such as C or OCaml would use much less memory. In contrast, lazy_trace_python_dict is quite expensive, using an increasing amount of memory after each get_after call. This is because Python dicts are implemented using hash tables, and since each dict in bar_maps has to be distinct, we have to make a deep copy of the dicts. This eventually takes O(n²) memory, where n is the number of bar calls in the execution, which is confirmed by Fig. 8c. The gdb_python script uses almost as much memory as lazy_trace_python_dict by the end of the micro-benchmark, since they both create many Python dicts, except that gdb_python stores them in a Python list, which has lower overhead than a trace. The strict_trace_edithamt script also uses a lot of memory, more than lazy_trace_edithamt by the end of the micro-benchmark, since strict_trace_edithamt has to create every EditHAMT in bar_maps (the EditHAMTs themselves are lazy), unlike lazy_trace_edithamt, which only creates EditHAMTs in bar_maps when they are looked up. We see a similar pattern in Fig. 8d for memory usage as in Fig. 8b for time usage: action 1 of lazy_trace_rbtree and lazy_trace_python_dict contributes almost all of the cumulative memory usage by the end of the execution. As we explained above for Fig. 8b, this is due to having to compute almost all red-black trees or Python dicts in bar_maps.

VI. FIREFOX CASE STUDY: DELAYED DEALLOCATION BUG

To put EXPOSITOR to the test, we used it to track down a subtle bug in Firefox that caused it to use more memory than expected [12]. The bug report contains a test page that, when scrolled, creates a large number of temporary JavaScript objects that should be immediately garbage collected. However, in a version of Firefox that exhibits the bug (revision c5e3c81d35ba), the memory usage increases by 70MB (as reported by top), and only decreases 20 seconds after a second scroll. As it turns out, this bug has never been directly fixed—the actual cause is a data race, but the official fix instead papers over the problem by adding another GC trigger.

Our initial hypothesis for this bug is that there is a problem in the JavaScript garbage collector (GC). To test this hypothesis, we first run Firefox under EXPOSITOR, load the test page, and scroll it twice, temporarily interrupting the execution to call the_execution.get_time() just before each scroll, at times t_scroll1 and t_scroll2, and after the memory usage decreases, at t_end. Then, we create several traces to help us understand the GC and track down the bug, as summarized in Fig. 9.

We observe the GC behavior using a trace of the calls to (gc_call) and returns from (gc_return) the function js_GC (Fig. 9a).⁷ Also, we find out when memory is allocated or released to the operating system using mmap2 and munmap traces of the same-named system calls (Fig. 9b). Printing these traces reveals some oddly inconsistent behavior: the GC is called only once after t_scroll1, but five times after t_scroll2; and memory is allocated after t_scroll1 and deallocated just before t_end. To make sense of these inconsistencies, we inspect the call stack of each snapshot in gc_call and discover that the first js_GC call immediately after a scroll is triggered by a scroll event, but subsequent calls are triggered by a timer.

We now suspect that the first scroll somehow failed to trigger the creation of subsequent GC timers. To understand how these timers are created, we write a function called set_tracing that creates a trace for analyzing set-like behavior, using EditHAMTs to track when values are inserted or removed, and apply set_tracing to create timer_trace by treating timer creation as set insertion and timer triggering as set removal (Fig. 9c). This trace reveals that each js_GC call creates a GC timer (between the gc_call and gc_return snapshots), except the js_GC call after the first scroll (and the last js_GC call, because GC is complete).

To find out why the first js_GC call does not create a GC timer, we inspect call stacks again and learn that a GC timer is only created when the variable gcChunksWaitingToExpire is nonzero, and yet it is zero when the first js_GC returns (at

⁶ In 32-bit Python 2.7.1, a closure over a single variable takes 108 bytes, and each additional variable takes 28 bytes.
⁷ The index=-1 optional argument to execution.breakpoints indicates that the breakpoint should be set at the end of the function.
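The set_tracing analysis used to build timer_trace (Fig. 9c) can be approximated outside the debugger. The sketch below is a deliberately eager, list-based stand-in for the EditHAMT-backed trace combinator described in Sec. VI: the event names, tuple layout, and the set_tracing signature here are illustrative assumptions, not EXPOSITOR's actual API.

```python
# Toy model of set_tracing: a "trace" is a sorted list of (time, event)
# pairs; "create" events are set insertions and "fire" events are set
# removals. We pair each removal with the earliest unmatched insertion
# of the same key and report insertions that were never removed --
# e.g., GC timers that were created but never fired.

def set_tracing(trace, insert_label, remove_label):
    """Return (matched, unmatched) insertion/removal pairs."""
    pending = {}   # key -> insertion times, oldest first
    matched = []   # (insert_time, remove_time, key)
    for time, (label, key) in trace:
        if label == insert_label:
            pending.setdefault(key, []).append(time)
        elif label == remove_label and pending.get(key):
            matched.append((pending[key].pop(0), time, key))
    unmatched = [(t, k) for k, ts in pending.items() for t in ts]
    return matched, sorted(unmatched)

# A GC timer created at t=1 fires at t=5; another created at t=7 never
# fires (cf. the missing timer after the first scroll).
trace = [(1, ("create", "gc-timer")), (5, ("fire", "gc-timer")),
         (7, ("create", "gc-timer"))]
matched, unmatched = set_tracing(trace, "create", "fire")
print(matched)    # [(1, 5, 'gc-timer')]
print(unmatched)  # [(7, 'gc-timer')]
```

Unlike this eager version, the EditHAMT-based timer_trace is lazy: it only materializes the parts of the set history that a query actually touches.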

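Similarly, the pairwise rule behind the one_lock race check applied in Fig. 9e can be sketched in isolation. In this hedged sketch, accesses are plain tuples and the lock state is supplied directly, whereas the real script derives both from EXPOSITOR traces and snapshots; all names below are illustrative.

```python
# Sketch of the one_lock rule from Sec. VI: compare each access to a
# chosen variable against prior accesses from other threads, and flag a
# pair as a race when at least one access is a write and the chosen
# lock was not held on both sides. We report the prior access of each
# racing pair, mirroring the paper's "item containing the snapshot of
# the prior access".

def one_lock(accesses):
    """accesses: list of (time, thread, kind, lock_held), kind in {'R','W'}.
    Returns the prior accesses involved in potential races."""
    races = []
    for i, (t1, th1, kind1, held1) in enumerate(accesses):
        for t0, th0, kind0, held0 in accesses[:i]:
            if th0 == th1:
                continue                        # same thread: not a race
            if 'W' not in (kind0, kind1):
                continue                        # read/read pairs are fine
            if not (held0 and held1):
                races.append((t0, th0, kind0))  # lock missing on either side
    return races

# A write in the GC thread racing with an earlier unlocked read in a
# helper thread (cf. gcChunksWaitingToExpire):
accesses = [(1, "helper", "R", False), (2, "gc", "W", True)]
print(one_lock(accesses))  # [(1, 'helper', 'R')]
```
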
[Fig. 9: timelines of the items in each trace, marked against t_scroll1, t_scroll2, and t_end:
(a) gc_call = the_execution.breakpoints("js_GC"); gc_return = the_execution.breakpoints("js_GC", index=-1)
(b) mmap2 = the_execution.syscalls("mmap2"); munmap = the_execution.syscalls("munmap")
(c) timer_trace = set_tracing(A=timer-create, R=timer-fire)
(d) chunkswaiting_trace = the_execution.watchpoints(gcChunksWaitingToExpire-variable).map(read-gcChunksWaitingToExpire)
(e) chunkswaiting_hb = one_lock(R=gcChunksWaitingToExpire-read, W=gcChunksWaitingToExpire-write, locks, unlocks)]

Fig. 9. Timeline of items in traces used to debug Firefox.

the first gc_return snapshot). Following this clue, we create a watchpoint trace on gcChunksWaitingToExpire and discover that it remains zero through the first js_GC call and becomes nonzero only after the first js_GC returns. It stays nonzero through the second scroll and second js_GC call, causing the first GC timer to be created after that (Fig. 9d).

We posit that, for the GC to behave correctly, gcChunksWaitingToExpire should become nonzero at some point during the first js_GC call. Inspecting call stacks again, we find that gcChunksWaitingToExpire is changed in a separate helper thread, and that, while the GC owns a mutex lock, it is not used consistently around gcChunksWaitingToExpire. This leads us to suspect that there is a data race. Thus, we develop a simple race detection script, one_lock, that works by comparing each access to a chosen variable against prior accesses from different threads (Sec. IV-A explains how we track prior accesses), and checking if a particular lock was acquired or released prior to those accesses. For each pair of accesses, if at least one access is a write, and the lock was not held in one or both accesses, then there is a race, which we indicate as an item containing the snapshot of the prior access. We apply this race detector to gcChunksWaitingToExpire and confirm our suspicion that, after t_scroll1, there is a write that races with a prior read during the first js_GC call when the timer should have been created (Fig. 9e).

To give a sense of EXPOSITOR's performance, it takes 2m6s to run the test page to t_scroll2 while printing the gc_call trace, with 383MB maximum resident memory (including GDB, since EXPOSITOR extends GDB's Python environment). The equivalent task in GDB/UndoDB without EXPOSITOR takes 2m19s and uses 351MB of memory (some difference is inevitable, as the test requires user input and Firefox has many sources of nondeterminism). As another data point, finding the race after t_scroll1 takes 37s and another 5.4MB of memory. The two analyses we developed, set_tracing and one_lock, take only 10 and 40 lines of code to implement, respectively, and both can be reused in other debugging contexts.

VII. RELATED WORK

EXPOSITOR provides scripting for time-travel debuggers, with the central idea that a target program's execution can be manipulated (i.e., queried and computed over) as a first-class object. Prior work on time-travel debugging has largely provided low-level access to the underlying execution without consideration for scripting. Of the prior work on scriptable debugging, EXPOSITOR is most similar to work that views the program as an event generator—with events seeded from function calls, memory reads/writes, etc.—and debugging scripts as database-style queries over event streams or as dataflow-oriented stream transformers. None of this scripting work includes the notion of time travel.

A. Time-Travel Debuggers

Broadly speaking, there are two classes of time-travel debuggers. Omniscient debuggers work by logging the state of the program being debugged after every instruction, and then reconstructing the state from the log on demand. Some examples of omniscient debuggers include ODB [13], Amber (also known as Chronicle) [6], Tralfamadore [14], and TOD [15]. In contrast, replay debuggers work by logging the results of system calls the program makes (as well as other sources of nondeterminism) and making intermediate checkpoints, so that the debugger can reconstruct a requested program state by starting at a checkpoint and replaying the program with the logged system calls. Several recent debuggers of this style include URDB [16] and UndoDB [5] (which we used in our prototype) for user-level programs, and TTVM [17] and VMware ReTrace [18] for entire virtual machines. EXPOSITOR could target either style of debugger in principle, but replay debugging scales much better (e.g., about 1.7× recording overhead for UndoDB vs. 300× for Amber). Engblom [19] provides a more comprehensive survey of time-travel debugging techniques and implementations.

The above work focuses on implementing time travel efficiently; most systems provide very simple APIs for accessing the underlying execution, and do not consider how time travel might best be exploited by debugging scripts.

Similarly, GDB's Python environment simply allows a Python program to execute GDB (and UndoDB) commands in a callback-oriented, imperative style. This is quite tedious, e.g., just counting the number of calls to a particular function takes 16 lines of code (Sec. II-B1), and cannot be composed with other scripts (e.g., to refine the count to calls that satisfy

predicate p). EXPOSITOR's notion of traces is simpler and more composable: function call counting can be done in one or two lines by computing the length of a breakpoint trace; to refine the count, we simply filter the trace with p before counting (Sec. II-B).

Tralfamadore [20] considers generalizing standard debugging commands to entire executions, but does not provide a way to customize these commands with scripts.

Whyline is a kind of omniscient debugger with which users can ask "why did" and "why didn't" questions about the control- and data-flow in the execution, e.g., "why did this Button's visible = true" or "why didn't Window appear" [21]. Whyline records execution events (adding 1.7× to 8.5× overhead), and when debugging begins, it uses program slicing [22] to generate questions and the corresponding answers (imposing up to a 20× further slowdown). Whyline is good at what it does, but its lack of scriptability limits its reach; it is hard to see how we might have used it to debug the Firefox memory leak, for example. In concept, Whyline can be implemented on top of EXPOSITOR, but limitations of GDB and UndoDB (in particular, the high cost of software watchpoints, and the inability to track data-flow through registers) make it prohibitively expensive to track fine-grained data-flow in an execution. We plan to overcome this limitation in future work, e.g., using EDDI [23] to implement fast software watchpoints.

B. High-Level (Non-callback Oriented) Debugging Scripts

EXPOSITOR's design was inspired by MzTake [3], a Scheme-based, interactive, scriptable debugger for Java based on functional reactive programming. In MzTake, the program being debugged is treated as a source of event streams consisting of events such as function calls or value changes. Event streams can be manipulated with combinators that filter, map, fold, or merge events to derive new event streams. As such, an event stream in MzTake is like a trace in EXPOSITOR. Computations in MzTake are implicitly over the most recent value of a stream and are evaluated eagerly as the target program runs.

To illustrate, consider our example of maintaining a shadow stack from Sec. II-C. In MzTake, when the target program calls a function, a new snapshot event s becomes available on the calls stream. The calls_rets stream's most recent event is the most recent of calls and rets, so MzTake updates it to s. Since shadow_stacks is derived from calls_rets, MzTake updates its most recent event by executing map(int, s.read_retaddrs()).

This eager updating of event streams, as the program executes, can be less efficient than using EXPOSITOR. In particular, EXPOSITOR evaluates traces lazily, so that computation can be narrowed to a few slices of time. In Sec. II-C, we find the latest smashed stack address without having to maintain the shadow stack for the entire program execution, as would be required for MzTake. Also, EXPOSITOR traces are time indexed, but MzTake event streams are not: there is no analogue to tr.get_at(i) or tr.slice(t0, t1) in MzTake. We find time indexing to be very useful for interactivity: we can run scripts to identify an interesting moment in the execution, then explore the execution before and after that time. Similarly, we can learn something useful from the end of the execution (e.g., the address of a memory block that is double-freed), and then use it in a script on an earlier part of the execution (e.g., looking for where that address was first freed). MzTake requires a rerun of the program, which can be a problem if nondeterminism affects the relevant computation.

Dalek [24] and Event Based Behavioral Abstraction (EBBA) [25] bear some resemblance to MzTake and suffer the same drawbacks, but are much lower-level; e.g., the programmer is responsible for manually managing the firing and suppression of events. Coca [26] is a Prolog-based query language that allows users to write predicates over program states; program execution is driven by Prolog backtracking, e.g., to find the next state to match the predicate. Coca provides a retrace primitive that restarts the entire execution to match against new predicates. This is not true time travel but re-execution, and thus suffers the same problems as MzTake.

PTQL [27], PQL [28], and UFO [29] are declarative languages for querying program executions, as a debugging aid. Queries are implemented by instrumenting the program to gather the relevant data. In principle, these languages are subsumed by EXPOSITOR, as it is straightforward to compile queries to traces. Running queries in EXPOSITOR would allow programmers to combine results from multiple queries, execute queries lazily, and avoid having to recompile (and potentially perturb the execution of) the program for each query. On the other hand, it remains to be seen whether EXPOSITOR traces would be as efficient as using instrumentation.

VIII. CONCLUSION

We have introduced EXPOSITOR, a novel scriptable, time-travel debugging system. EXPOSITOR allows programmers to project a program execution onto immutable traces, which support a range of powerful combinators including map, filter, merge, and scan. The trace abstraction gives programmers a global view of the program, and is easy to compose and reuse, providing a convenient way to correlate and understand events across the execution timeline. For efficiency, EXPOSITOR traces are implemented using a lazy, interval-tree-like data structure. EXPOSITOR materializes the tree nodes on demand, ultimately calling UndoDB to retrieve appropriate snapshots of the program execution. EXPOSITOR also includes the EditHAMT, which lets script writers create lazy sets, maps, multisets, and multimaps that integrate with traces without compromising their laziness. We ran two micro-benchmarks that show that EXPOSITOR scripts using lazy traces can be faster than the equivalent non-lazy scripts in common debugging scenarios, and that the EditHAMT is crucial to ensure that trace laziness is not compromised. We used EXPOSITOR to find a buffer overflow in a small program, and to diagnose a very complex, subtle bug in Firefox. We believe that EXPOSITOR holds promise for helping programmers better understand complex bugs in large software systems.

ACKNOWLEDGMENTS

This research was supported in part by National Science Foundation grants CCF-0910530 and CCF-0915978. We also thank Vic Zandy and Robert O'Callahan for helpful comments and inspiration to pursue this work, and Greg Law for providing prompt support and bug fixes for UndoDB.

REFERENCES

[1] R. O'Callahan. (2010) LFX2010: A browser developer's wish list. Mozilla. At 27:15. [Online]. Available: http://vimeo.com/groups/lfx/videos/12471856#t=27m15s
[2] A. Zeller, Why Programs Fail: A Guide to Systematic Debugging. Morgan Kaufmann, 2006.
[3] G. Marceau, G. Cooper, J. Spiro, S. Krishnamurthi, and S. Reiss, "The design and implementation of a dataflow language for scriptable debugging," in ASE, 2007.
[4] C. Elliott and P. Hudak, "Functional reactive animation," in ICFP, 1997.
[5] Undo Software. About UndoDB. [Online]. Available: http://undo-software.com/product/undodb-overview
[6] R. O'Callahan, "Efficient collection and storage of indexed program traces," 2006, unpublished.
[7] A. One, "Smashing the stack for fun and profit," Phrack, no. 49, 1996.
[8] J. D. Blackstone. Tiny HTTPd. [Online]. Available: http://tinyhttpd.sourceforge.net/
[9] S. Designer, "'return-to-libc' attack," Bugtraq, Aug. 1997.
[10] Khoo Y. P., J. S. Foster, and M. Hicks, "Expositor: Scriptable Time-Travel Debugging with First Class Traces (Extended Version)," UMD–College Park, Tech. Rep. CS-TR-5021, 2013.
[11] P. Bagwell, "Ideal hash trees," EPFL, Tech. Rep., 2001.
[12] A. Zakai. (2011, May) Bug 654028 - 70mb of collectible garbage not cleaned up. Bugzilla@Mozilla. [Online]. Available: https://bugzilla.mozilla.org/show_bug.cgi?id=654028
[13] B. Lewis, "Debugging backwards in time," in AADEBUG, 2003.
[14] G. Lefebvre, B. Cully, C. Head, M. Spear, N. Hutchinson, M. Feeley, and A. Warfield, "Execution mining," in VEE, 2012.
[15] G. Pothier, E. Tanter, and J. Piquer, "Scalable omniscient debugging," in OOPSLA, 2007.
[16] A.-M. Visan, K. Arya, G. Cooperman, and T. Denniston, "URDB: a universal reversible debugger based on decomposing debugging histories," in PLOS, 2011.
[17] S. T. King, G. W. Dunlap, and P. M. Chen, "Debugging operating systems with time-traveling virtual machines," in USENIX ATC, 2005.
[18] M. Sheldon and G. Weissman, "ReTrace: Collecting execution trace with virtual machine deterministic replay," in MoBS, 2007.
[19] J. Engblom, "A review of reverse debugging," in S4D, 2012.
[20] C. C. D. Head, G. Lefebvre, M. Spear, N. Taylor, and A. Warfield, "Debugging through time with the Tralfamadore debugger," in RESoLVE, 2012.
[21] A. J. Ko and B. A. Myers, "Debugging reinvented: asking and answering why and why not questions about program behavior," in ICSE, 2008.
[22] B. Xu, J. Qian, X. Zhang, Z. Wu, and L. Chen, "A brief survey of program slicing," SIGSOFT Softw. Eng. Notes, vol. 30, no. 2, pp. 1–36, Mar. 2005.
[23] Q. Zhao, R. Rabbah, S. Amarasinghe, L. Rudolph, and W.-F. Wong, "How to do a million watchpoints: Efficient debugging using dynamic instrumentation," in CC, 2008.
[24] R. A. Olsson, R. H. Crawford, and W. W. Ho, "A dataflow approach to event-based debugging," Software: Practice and Experience, vol. 21, no. 2, pp. 209–229, 1991.
[25] P. Bates, "Debugging heterogeneous distributed systems using event-based models of behavior," in PADD, 1988.
[26] M. Ducasse, "Coca: A debugger for C based on fine grained control flow and data events," in ICSE, 1999.
[27] S. F. Goldsmith, R. O'Callahan, and A. Aiken, "Relational queries over program traces," in OOPSLA, 2005.
[28] M. Martin, B. Livshits, and M. S. Lam, "Finding application errors and security flaws using PQL: a program query language," in OOPSLA, 2005.
[29] M. Auguston, C. Jeffery, and S. Underwood, "A framework for automatic debugging," in ASE, 2002.
