Debugging Concurrent Programs

CHARLES E. MCDOWELL and DAVID P. HELMBOLD Board of Studies in Computer and Information Sciences, University of California at Santa Cruz, Santa Cruz, California 95064

The main problems associated with debugging concurrent programs are increased complexity, the “probe effect,” nonrepeatability, and the lack of a synchronized global clock. The probe effect refers to the fact that any attempt to observe the behavior of a distributed system may change the behavior of that system. For some parallel programs, different executions with the same data will result in different results even without any attempt to observe the behavior. Even when the behavior can be observed, in many systems the lack of a synchronized global clock makes the results of the observation difficult to interpret. This paper discusses these and other problems related to debugging concurrent programs and presents a survey of current techniques used in debugging concurrent programs. Systems using three general techniques are described: traditional or breakpoint style debuggers, event monitoring systems, and static analysis systems. In addition, techniques for limiting, organizing, and displaying a large amount of data produced by the debugging systems are discussed.

Categories and Subject Descriptors: A.1 [General Literature]: Introductory and Survey; D.1.3 [Programming Techniques]: Concurrent Programming; D.2.4 [ Engineering]: Program Verification-assertion checkers; D.2.5 [Software Engineering]: Testing and Debugging-debugging aids; diagnostics; monitors; symbolic execution; tracing Additional Key Words and Phrases: Distributed computing, event history, nondeterminism, parallel processing, probe-effect, program replay, program visualization, static analysis

INTRODUCTION programs even harder than debugging se- quential programs. In the remainder of this The interest in parallel programming has section we will justify this claim and outline grown dramatically in recent years. New the basic approaches currently used for de- languages, such as Ada’ and Modula II, bugging parallel programs. In Sections l-4 have built-in features for concurrency. we discuss each of these approaches in de- Older languages, such as C and FORTRAN, tail. We conclude with Section 5 and an have been extended in a variety of ways in appendix with tables that summarize the order to support parallel programming features of 35 systems designed for debug- [Gehani and Roome 1985; Karp 19871. ging parallel programs. The added complexity of expressing concurrency has made debugging parallel Difficulty Debugging Concurrent Programs

‘Ada is a registered trademark of the U.S. Govern- The classic approach to debugging sequen- ment (Ada Joint Program Office). tial programs involves repeatedly stopping

This work was supported in part by IBM grants SL87033 and SL88096. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 0 1989 ACM 0360-0300/89/1200-0593 $01.50

ACM Computing Surveys, Vol. 21, No. 4, December 1989 594 l C. E. McDowell and D. P. Helmbold

the program during execution, examining existent [Lamport 19781. Without a syn- the state, and then either continuing or chronized global clock, it may be difficult reexecuting in order to stop at an earlier to determine the precise order of events point in the execution. This style of debug- occurring in distinct, concurrently execut- ging is called cyclical debugging. Unfortu- ing processors. nately, parallel programs do not always have reproducible behavior. Even when Basic Approaches they are run with the same inputs, their Some researchers distinguish between results can be radically different. These monitoring and traditional debugging differences are caused by races, which occur [Joyce et al. 19871. Monitoring is the pro- whenever two activities are allowed to pro- cess of gathering information about a pro- gress in parallel. For example, one gram’s execution. Debugging, as defined in may attempt to write a memory location the current ANSI/IEEE standard glossary while a second process is reading from that of software engineering terms, is “the pro- memory cell. The second process’s behavior cess of locating, analyzing, and correcting may differ radically, depending on whether suspected faults,” where a fault is defined its reads the new or old value. to be an accidental condition that causes a The cyclical debugging approach often program to fail to perform its required func- fails for parallel programs because the un- tion. Since monitoring is often an effective desirable behavior may not appear when procedure for locating incorrect behavior, the program is reexecuted. If the undesira- it should be considered a debugging tool. ble behavior occurs with very low probabil- For the purposes of this survey, tech- ity, the programmer may never be able to niques for debugging concurrent systems recreate the error situation. In fact, any have been organized into four groups: attempt to gain more information about the program may contribute to the difficulty of (1) Traditional debugging techniques can reproducing the erroneous behavior. This be applied with some success to parallel has been referred to as the “Heisenberg programs. These are discussed in Sec- Uncertainty” principle applied to software tion 1. [LeDoux and Parker 19851 or the “Probe (2) Event-based debuggers view the exe- Effect” [Gait 19851. For programs that con- cution of a parallel program as a se- tain races, any additional print or debug- quence (or several parallel sequences) ging statements may modify a crucial race, of events. The generation and analysis lowering the probability that the interest- of these sequences or event histories is ing behavior occurs. This interference can the subject of Section 2. be disastrous when attempting to diagnose (3) Techniques for displaying the flow of an error in a parallel program. control and distributed data associated The nondeterminism arising from races with parallel programs are presented in is particularly difficult to deal with because Section 3. the programmer often has little or no con- (4) Static analysis techniques based on trol over it. The resolution of a race may dataflow analysis of parallel programs depend on each CPU’s load, the amount of are presented in Section 4. These tech- network traffic, and nondeterminism in the niques allow some program errors to communication medium (e.g., exponential be detected without executing the backoff protocols [Tannenbaum 1981, pp. program. 292-2951). It is this nondeterministic behavior that tends to make understand- This survey covers a large number of ing, writing, and debugging parallel pro- research and commercial projects designed grams more difficult than their sequential to help produce error-free concurrent soft- counterparts. ware. It focuses primarily on systems that An additional problem found in distrib- are directed toward isolating program er- uted systems is that the concept of “global rors. A large body of work in formal pro- state” can be misleading or even non- gram verification and in program testing

ACM Computing Surveys, Vol. 21, No. 4, December 1989 Debugging Concurrent Program 595 has been explicitly excluded from this sur- They have the potential of identifying a vey. Most of the systems surveyed fall into large class of program errors that are par- one of two general categories, traditional ticularly difficult to find using current dy- parallel debuggers (or what are sometimes namic techniques. These techniques have called “breakpoint” debuggers) and event- been applied mostly to parallel versions of based debuggers. Of course, some systems FORTRAN that do not support recursion. contain aspects of both classes. All of the As with the event-based debuggers, static systems (or in some cases proposed sys- analysis systems are still in the prototype tems) in these two general categories are stage. The primary problem with most listed in the tables in Appendix A. static analysis is that their In addition to traditional parallel debug- worst-case computational complexity is gers and event-based debuggers, some static often exponential. analysis systems are included. The static All three types of debugging systems have analysis systems surveyed fall somewhere made some progress in presenting the com- between debugging and testing. The static plex concurrent program state and the ac- analysis systems are distinguished from companying massive amounts of data to testing by not requiring program execution the user. Multiple windows is a useful and by generally checking for structural mechanism for interfacing with traditional faults instead of functional faults. That is, style debuggers for parallel systems. The the analysis tools have no knowledge of the abstraction capabilities of event-based de- intended function of the program and sim- buggers (see Section 2) have been used to ply identify program structures that are present interesting and potentially useful generally indicative of an error. These views of system states graphically [Hough systems do not appear in the comparison and Cuny 19871. table in Appendix A but are discussed in Section 4. 1. EXTENDING TRADITIONAL DEBUGGING Each of the three types of systems sur- TO PARALLEL PROGRAMS veyed takes a different approach to the debugging problem. The traditional parallel The simplest type of debugger to imple- debuggers are the easiest to build and there- ment for parallel systems is (or behaves fore provide an immediate partial solution. like) a collection of sequential debuggers, They provide some control over program one per parallel process. To date, all com- execution and provide state examination. mercially available debuggers for parallel They are also severely limited by the probe programs fit this description. The primary effect. differences lie in how the output from the Event-based debuggers provide better several sequential debuggers is displayed abstraction than that provided by tradi- and how the separate sequential debuggers tional style debuggers. They also address are controlled. We will call these collections the probe effect by permitting deterministic of sequential debuggers traditional parallel replay of nondeterministic programs. If it debuggers. is not possible to record event histories The probe effect, discussed in the Intro- continuously, however, the probe effect will duction, has gone mostly unaddressed by still be a problem. Also, event-based debug- traditional parallel debuggers. This makes gers are generally research prototypes, ap- traditional parallel debuggers ineffective plicable only to systems without shared against timing dependent errors. The probe memory. A notable exception is instant effect, however, does not always rear its replay [LeBlanc and Mellor-Crummey ugly head, allowing many program errors 19871, which supports event tracing and to be isolated using traditional cyclic de- replay on the shared memory BBN Butter- bugging techniques. This can be attributed fly provided OS protocol routines are used to two factors. First, those errors in parallel for all shared memory accesses. programs that are not timing dependent Static analysis tools avoid the probe ef- would never be masked by the probe effect. fect entirely by not executing the programs. Second, even for timing related errors, the

ACM Computing Surveys, Vol. 21, No. 4, December 1989 596 . C. E. McDowell and D. P. Helmbold

effect of the probe may not disturb the The Sun Microsystems’ dbxtool is an ex- outcome of the critical races. ample of applying a set of sequential de- Another criticism of traditional parallel buggers to concurrent programs without debuggers is that they operate at too low a any explicit coordination. It is capable of level. For programs consisting of many con- attaching to an existing UNIX’ process, currently executing processes, the major making it possible to debug a system of difficulty may be in understanding what is communicating UNIX processes by attach- happening at the interprocess level. Tradi- ing a separate copy of dbxtool to each pro- tional debugging techniques work well for cess. (The UNIX process may not contain viewing the behavior at the instruction process creation calls such as “fork,” and level or at the procedure level. In Section the executable image being debugged can- 2.4 some recent developments for viewing not be shared.) program behavior at a more abstract level An alternative to relying on a window are presented. manager to direct commands to the proper sequential debugger is to control all of the debuggers from a single terminal or window 1.1 Coordinating Several Sequential [Sequent Corp. 19861. Commands are then Debuggers directed at a specific process using a com- In addition to the sequential capabilities of mand parameter or by defaulting to a standard sequential debuggers, traditional specific “current” process. For example, parallel debuggers should be able to do the “continue Pl” would continue process Pl, following: and “continue” without a parameter would continue the “current” process. The “cur- 1. direct any sequential debugger com- rent” process can be changed at any time. mand to a specific task, The use of a single control window also 2. direct any sequential debugger com- permits the commands to be sent to all mand to an arbitrary set of tasks, processes. For example, “continue all” 3. differentiate the terminal output from would continue all currently suspended the different tasks. processes. In general, all processes will not receive the command at the same instant. The most primitive debugger for parallel The commands will, however, arrive at programs would be nothing more than a times that differ by an amount approxi- sequential debugger capable of attaching to mating the communication delay in the any single process in a parallel program. system. If all processes could be instanta- All that would then be necessary is to pro- neously stopped (and started) then, in the vide the user with multiple real or virtual absence of timeouts, “stop all” breakpoints terminals from which to execute the mul- would not cause any probe effect. This is, tiple copies of the debugger. Today’s mul- of course, impossible, but anything that can tiple window workstations make this more be done to minimize the time difference practical than it might have been a few for receipt of stop signals should reduce years ago. With a window manager [Schei- the probe effect. In addition to reducing fler and Gettys 1986; Sun Microsystems the probe effect, broadcasting a single 19861 points 1 and 3 could be satisfied by command to a set of processes is a useful selecting the desired window. Satisfying feature. point 2 could be achieved simply by repeat- The “current” task notion is generalized ing the desired command in each of the in Griffin [1987] and Intel Corp. [1987] to desired windows. This approach, however, a current set of processes that all receive would become fairly unwieldy for more any process-related commands. In Griffin than a few processes. Furthermore, the time [ 19871, processes can be added to or re- lapse between sending the command to the moved from the set simply by pointing to a first and last processes in the set could symbol for the process in a special window. aggravate the probe effect. This is particu- larly true for commands such as “stop” and “continue.” * UNIX is a trademark of AT&T Bell Laboratories.

ACM Computing Surveys, Vol. 21, No. 4, December 1989 Debugging Concurrent Programs l 597 This could be generalized to permit arbi- notion of logical time that stops when any trary groupings of processes. For instance, process reaches a breakpoint [Cooper 1987; it might be desirable to alternate com- DiMaio et al. 19851. In systems that sus- mands between two disjoint sets of pro- pend only the selected process, other pro- cesses. With only a single “current” set and cesses will continue to execute until they no overlap between the desired sets, this encounter some explicitly time-dependent would require as many commands from the operation. In that case, the logical clock is user as would be required with no support the one used for time in the time-dependent for process grouping. It would, however, operation. For example, if a breakpoint is still reduce the probe effect. It appears that encountered, no timeouts will expire until the macro capability of Intel Corp. [1987] the suspended process is continued. In sys- combined with their “context” command tems that stop all processes upon encoun- for specifying the current set of pro- tering a breakpoint, logical time is stopped cesses would support this toggling back and so that all of the suspended processes can forth between disjoint sets of processes. be continued with minimal impact. This will certainly not eliminate the probe effect, but it can permit some traditional style 1.2 Breakpoints debugging in the presence of such explicitly time-dependent operations. The ability to set breakpoints is possibly The domain of expressions or predicates the most important feature of a sequential used to describe a breakpoint is larger for debugger. (Since tracing is equivalent to parallel programs than for sequential pro- setting a breakpoint that, when encoun- grams. These predicate expressions may tered, prints some information and auto- involve both process state and events. An matically continues, the discussion in this event can be loosely defined as any atomic section will refer only to breakpoints.) Tra- action visible beyond the scope of a single ditional parallel debuggers generally sup- process. port the same types of breakpoints as those Predicates involving global state in an found in sequential debuggers. These executing parallel program can be a prob- breakpoints include stop at a source state- lem. This results from the lack of global ment, stop on the occurrence of an excep- clock in most systems. For example, an tion or some user detectable event, stop expression such as “process A never modi- when a specific variable is accessed, and fies variable X while process B is modifying stop when some conditional expression is variable X” may appear to be true due to satisfied [Seidner and Tindall 19831. Un- the delay in communicating this informa- like sequential debuggers, there are two tion to the debugger, when in fact concur- possible actions to take when a breakpoint rent modification has occurred. The use of is encountered. Either all of the processes events and some notion of consistent global in the parallel program can be stopped im- time can be used to address this (see Sec- mediately or only the process encountering tion 2). Possibly more important is that it the breakpoint can be stopped. The former may not be possible to stop the desired can be difficult to achieve within a suffi- processes after detecting that the predicate ciently small interval of time, and the latter is satisfied yet before the state has changed. can have a serious impact on systems that The distinction between events and contain such things as timeouts. Assuming global states is admittedly vague in general message passing is the communication but is usually well defined for any particular mechanism, an to stop all pro- system. For example, an event-based pred- cesses in a consistent state is presented in icate might be “process A sends a message,” Miller and Choi [1988a]. and a global state-based predicate might be Using breakpoints to debug systems with “the message buffer contains a message explicit time-dependent operations (such as from process A.” This could even be repre- timeouts) can be especially difficult. Some sented as a collection of program counter- systems have attempted to deal with such based breakpoints, one immediately follow- explicit race conditions by supporting a ing each send statement in process A. In

ACM Computing Surveys, Vol. 21, No. 4, December 1989 598 l C. E. McDowell and D. P. Helmbold systems that deal with event-based predi- facilities: cates, there must be some language for de- scribing events. The language may be as l the ability to trap on any interprocess simple as naming one of a finite set of communication (IPC), events such as “taskstart” or “sendmes- l the ability to modify/insert/delete IPC sage”; or it may support relatively powerful messages, abstractions such as those described in Sec- l the ability to control the clock used for tion 2. Table A.4 summarizes the break- timeouts. point capabilities of the systems surveyed. Several approaches have been taken to provide these capabilities. Some debugging 1.3 OS Support for Parallel Debuggers systems modify the program source in order Parallel debuggers that support global to provide the necessary hooks for the de- state-based breakpoints and event-based bugger at run time. This avoids the need breakpoints place greater demands on the for modifying the at the operating system than sequential debug- expense of slower performance and re- gers. This is just one more step in the stricted capability. A second approach is to evolution of debuggers. Early debuggers provide an alternative set of system rou- only needed the ability to examine core tines. This permits the debugger to inter- memory and the saved values of CPU reg- vene in any interaction between the user isters after the program terminated. Next and system. The final approach is actually was added the ability to set breakpoints. to modify the operating system to provide The operating system provided a means of the necessary hooks. At some point it may modifying the executable image and of become cost effective to implement more of passing control to the debugger when the the debugging hooks directly in the ma- breakpoint instruction was encountered. chine architecture. The interface entry in Most operating systems also pass control Table A.2 summarizes where the hooks to the debugger for most program excep- were placed for the systems surveyed. tions. Some hardware architectures now also include a special trace mode to facili- 2. EVENT HISTORIES tate single stepping. The final feature that is provided is a mechanism for passing con- Since the various debuggers surveyed were trol to the debugger when a specific memory designed for different environments, it is location is accessed. To summarize, a state- only natural that they do not agree on the of-the-art sequential debugger may need definition of “event.” For example, in the the following capabilities to be provided by DISDEB system events are memory ac- the operating system or hardware: cesses, in Radar each message send or re- ceive is an event, in Instant Replay an l the ability to read or write a register or event is the access of an object, and in memory location, YODA events represent Ada tasking activ- ity. In some systems, such as TSL and l the ability to set and trap breakpoints, EDL, events can be defined by the pro- l the ability to trap program exceptions, grammer. In systems with explicit inter- l the ability to single step a program, process communication, events can be l the ability to trap memory accesses. divided into two classes: those representing inter-process communication activity and Traditional parallel debuggers that go those representing activity internal to a beyond being simply a collection of sequen- single process. This distinction does not tial debuggers acting together may require seem to hold for shared memory systems, more of the underlying operating system. since each memory access is a potential Debuggers that fit the traditional cyclic interprocess communication. debugging paradigm, using breakpoints and Some systems merely display the events tracing, require one or more of the following as they occur (see Appendix A). A more

ACM Computing Surveys, Vol. 21, No. 4, December 1989 Debugging Concurrent Programs 599 powerful approach is to record an event any single process. This permits the history containing all of the events gener- use of a sequential debugger on a pro- ated by the program. The history can then cess without reexecuting the entire be examined by the user after the program program. has completed. Since the event history is often very large, some debuggers provide Browsing requires only minimal infor- facilities to browse or query the history. mation about each event. Simply recording Event histories can also be used to guide the kinds of events executed by a process the program’s execution, allowing the re- can help isolate an error. Of course, if more production of erroneous computations. If information is recorded, then more infor- the history is complete enough, a single mation will be available to the programmer. process can be debugged in isolation with One problem with browsing event histo- the history providing the needed commu- ries is that the histories frequently contain nication. Finally, some systems can auto- enormous numbers of events, making it matically check the history for suspicious difficult to locate the events of interest. behavior or transform the lower level his- Some systems allow selective recording of tory of events into more meaningful high- information, and others include powerful level events. mechanisms for examining the event his- tory (see Section 2.2). 2.1 Recording Event Histories To replay an execution requires enough information so that the next event in which A common approach is for the debugger to do as little as possible, mainly recording each process participates can be deter- mined. LeBlanc and Mellor-Crummey information, at run time. By limiting the debugger’s activity, the probe effect should [1987] describe a method that reduces the be reduced. The recorded information can amount of information needed for replay then be analyzed following the program’s compared with previous methods that re- execution. corded the complete contents of all mes- sages. Their ideas work because the program generates the contents of the mes- 2.1.1 Which Information to Record sages during the reexecution. The amount of information that must be Simulating the rest of the program so recorded for each event depends upon how that a single process can be debugged in the event history is going to be used. Three isolation requires that all events visible to general levels of use that require increasing the process be recorded. This includes both amounts of detail to be recorded for each the contents of messages and the values event are the following: written to shared memory. Note that reex- ecuting a single process requires more in- (1) Browsing-The event history is exam- formation than reexecuting the entire ined possibly through the use of spe- system. cialized tools. Examination methods If the interesting portion of the execution range from text editors to “movies” can be identified, then the amount of infor- showing the state changes caused by mation required for replay can be consid- events [Hough and Cuny 1987; erably reduced. Instead of recording the Le Blanc and Robbins 19851. entire history, the debugger can take a (2) Replay-The debugger uses the event snapshot of the program’s state and keep history to control a reexecution of the only that part of the history that follows program. This permits the use of con- the snapshot. It may, however, be difficult ventional debugging techniques, such to obtain accurate snapshots in distributed as breakpoints, state examination, and systems efficiently (see Chandy and single stepping, without changing the Lamport [1985] for one method). This tech- behavior of the program. nique may work best for simulating a single (3) Simulation-The event history can be process, since only that process’s state used to simulate the environment of needs to be recorded.

ACM Computing Surveys, Vol. 21, No. 4, December 1989 600 . C. E. McDowell and D. P. Helmbold

2.1.2 How the History Gets Recorded arate history tape for each process. Al- though each history tape is a linear stream, In addition to the amount of information together they (with the program) represent recorded in an event history, some atten- the partial ordering of the events in tion must be given to how the recording is the computation. The Traveler debugger done and the resulting impact on perfor- [Manning 19871 for Acore (a LISP-like lan- mance and the probe effect. In the systems guage) keeps a history tape (“lifeline” in surveyed, the methods used to generate the their terminology) for each object in the event history varied from inserting appro- program. In addition, the Traveler system priate statements in the original source records the partial order by explicitly link- program to monitoring the buses passively. ing each action to the child actions it causes An intermediate method of recording event (or parent action it allows to continue). histories is to provide modified system rou- A general technique for obtaining the tines. These modified routines record the partial order involves associating each history in addition to performing their nor- event with a vector of “logical timestamps” mal system functions. In some cases such [Fidge 1988; Haban and Weisel19881. The as LeBlanc and Mellor-Crummey [1987] it order (or absence thereof) between two is the user’s responsibility to insert system events can be easily determined by com- calls that record the event information. In paring the vectors of timestamps associated others it is only necessary to link the pro- with the events. gram using special monitoring versions (see the interface entry in Table A.2). With the exception of the bus monitor- ing, all other recording methods resulted in 2.2 Browsing Event Histories possible changes in timing and hence are Many systems recognize the need for facil- potentially susceptible to the probe effect. ities that help interpret massive event his- In some papers (see the probe effect in tories. Graphical techniques such as time- Table A-2) it was argued that the perfor- process diagrams and animation are dis- mance impact of monitoring was suffi- cussed in Section 3. A simple feature found ciently small to justify leaving the recording in many systems is filtering, whereby the permanently enabled. In other papers it events that the programmer feels are un- was argued that the perturbations caused important are automatically discarded, by the monitoring software were suffi- usually based on process or kind of event ciently small to avoid the probe effect in (see Table A.@. Two systems go further, most cases. storing the event history in a for easy examination. The YODA system for Ada tasking pro- 2.1.3 Linear Versus Partially Ordered Event grams [LeDoux and Parker 19851 stores the Histories event history as Prolog facts. Prolog pred- An issue that arises in distributed systems icates define the common temporal rela- is whether the event history should be par- tionships such as during, before, and after. tially or linearly (i.e., totally) ordered. On The user is able to define and store addi- the one hand, a linearly ordered history is tional predicates (either temporal or non- simpler to understand and can be easier to temporal). Existential queries, such as work with. On the other hand, a linear “which tasks updated variable X before stream is often misleading since it implies task T,” are used to retrieve information an ordering between every pair of events- from the history. Note that the YODA even when the events are completely unre- system stores when (using a global event lated. A partial ordering on the events is counter) and to what values variables are necessary to reflect the behavior of a dis- updated, as well as the explicit intertask tributed system accurately. communications. One way to represent the partial order In [Snodgrass 19841, the events are (used by Instant Replay [LeBlanc and captured in a relational database. In addi- Mellor-Crummey 19871) is to record a sep- tion to the event relations, the database

ACM Computing Surveys, Vol. 21, No. 4, December 1989 Debugging Concurrent Programs l 601 contains interval relations indicating be- regard. On one example (gaussian elimina- havior with duration. Time is kept in mi- tion of a 400 X 400 matrix on up to 64 croseconds, presumably using a global processors) they report that (with tuned clock. TQuel, a version of Quel augmented monitoring) the overhead involved in col- with temporal constructs, is used to build lecting the event history amounts to less new relations from those in the event his- than 1 percent. Furthermore, there was no tory. The user queries the history by build- additional decrease in performance when ing and printing the appropriate relation. the execution was replayed. Replaying real-time systems has several 2.3 Replaying Event Histories additional problems. The external I/O and interrupts must be recorded in addition to Several of the debugging systems allow a the communications between the processes. program to be reexecuted under the control Since real-time systems generally have a of an event history (see Table A.5). We call strong time dependency, it may be impor- this capability replay. If the history cor- tant to simulate time during the replay. If rectly reflects the interprocess communi- behavior due to the absence of communi- cation of the original execution, then the cation, such as timeouts, takes place, then replay produces the same results. Although faithfully reproducing such behavior re- replay helps solve the reproducibility prob- quires that additional events appear in the lems, it is useful only if additional infor- history. mation is gained. One way to gain information is to replay 2.4 Checking and Transforming the program in “debug mode,” with a tra- the Event History ditional sequential debugger attached to each process. This allows the internal state Several debugging systems compare the of the processes to be examined, giving the event history generated by the program programmer significantly more informa- with a set of predicates. These systems are tion than a stream of IPC events. Some more than browsers since the analysis is systems allow a single suspect process to be done at run time. In addition, the violation replayed in isolation, with the remainder of of a predicate can trigger additional debug- the program simulated by the event history. ging action. Because the analysis is done at This approach reduces the parallel debug- run time, all of the predicates to be tested ging problem to that of debugging a single must be written before the program starts sequential process, once the faulty process executing. This disadvantage can be re- is identified. duced if the event history is recorded so Another way of gaining information is to that the execution can be replayed. execute a modified program under the con- The DISDEB system [Lazzerini and trol of an event history. The event history Prete 19861 relies on programmable debug- can control the new program as long as ging aids that eavesdrop on bus traffic, its behavior is compatible with the old avoiding the probe effect. Although this [Curtis and Wittie 1982; LeBlanc and system contains several interesting fea- Mellor-Crummey 19871. This facility al- tures, it has a very low-level interface. An lows the programmer to add additional event definition is of the form “process P debugging statements or experiment with with certain permissions accesses memory modified algorithms. Another advantage of location X reading/writing value V.” The this feature is that corrected programs can memory location is mandatory, and the be tested against the same input and his- value may be a range. If part of the event tory that caused the previous version to definition is omitted, then that part is fail. treated as “don’t care.” DISDEB allows The added synchronization involved dur- state to be stored in two ways. First, ing replay can dramatically slow down a counters and timers can be defined and parallel program. The Instant Replay sys- used; second, event definitions can be en- tem [LeBlanc and Mellor-Crummey 19871 abled and disabled. Once a suitable set of reported some of the best results in this events has been defined, it can be used to

ACM Computing Surveys, Vol. 21, No. 4, December 1989 602 l C. E. McDowell and D. P. Helmbold

trigger debugging actions; for example, this system is simplified by the restriction “when El occurs, display counter C and that each specification can only refer to stop process N.” Other potential actions events from a single process. include starting and stopping traces The TSL system [Helmbold and Luck- of memory locations and manipulating ham 1985a] automatically checks specifi- timers. cations against the events generated by an A similar approach is taken by the Ada tasking program. Each TSL specifica- HARD system for Ada tasking programs tion is of the form “when this occurs then [Di Maio et al. 19851. There the predicates that occurs before something else,” where and debugging actions are encoded in spe- each of the three parts is an event formula. cial Ada tasks called D tasks. Manually TSL contains placeholders allowing a sin- inserted calls to the D tasks enable them gle specification to constrain multiple to obtain information about the program’s tasks. Additional abstraction is gained by execution. Based on this information, they using macros for event subformulas. An can call routines that display or modify the important contribution of the TSL system program state. All of the Ada facilities can is its use of Ada semantics to guarantee be used inside of a D task, so the program- that, even in distributed systems, certain mer can use a familiar high-level language pairs of events appear in the history in the to control the debugging process. correct order. Rather than using the stream of events The Event Description Language (EDL) to control debugging activity, the following takes a slightly different approach [Bates systems automatically check specifications and Wileden 19831. Instead of checking for the program. Although this requires specifications against the event history, it that the programmer learn an additional provides a method for defining multiple language, it can complement a formal spec- levels of abstract events from the primitive ification/verification approach to program events generated by the program. Each development. Most of these systems have high-level event is defined by an event for- their own way of specifying complex event mula over lower level events. There is one formulas, usually based on the sequential clause that constrains the values associated and parallel composition of events. with the lower level events and another that The IDD system [Harter et al. 19851 uses determines the values associated with the an interval logic specification of the pro- higher level event. The Belvedere system gram. This specification is checked against uses EDL to help control its display (see the program’s behavior. When a specifica- Section 3.3). tion is violated, the program is stopped for All of these specification methods have inspection. Temporal logic views the com- simplifying restrictions. In EDL, an accu- putation as a sequence of states. The main rate global clock is assumed, the event operators are “always” and “eventually,” recognizer is a potential bottleneck, and meaning that the following predicate on the some ambiguity arises when a low-level state is either always true or eventually event can be used in multiple higher level becomes true. Interval logic adds expressi- events (see, however, Bates [1988]). The bility by restricting the temporal operators TSL specification checker requires a lin- to portions of the computation. early ordered stream of events and is also The ECSP debugger [Baiardi et al. 19861 a bottleneck in the current application. In can check behavior specifications that the IDD system, events are restricted to completely describe the allowable commu- broadcasts on a shared medium (such as an nication behavior of the processes. The Ethernet). A tree structure method for eval- specifications can refer to various commu- uating the IDD interval logic expressions is nication activity and can contain assertions briefly described. The ECSP assertion on the process’s state. One of the constructs checker is for a hierarchical fork-join in their language causes control to be re- method of parallelism. Its main disadvan- turned to the user (presumably so that the tages are that each specification deals only process can be examined). The ordering of with the activity of one process and all events and checking of specifications in processes must be completely specified.

ACM Computing Surveys, Vol. 21, No. 4, December 1989 Debugging Concurrent Programs 603

3. GRAPHICS proc Simple = reply(request(A) + request(B) + request(C)); Sequential programs lend themselves rea- proc A = reply(request(X) + request(Y)); sonably well to being debugged with a single proc B = reply(b); sequential output device. There is only one proc C = reply(c); proc X = reply(r); thread of execution that can accurately be proc Y = reply(y); displayed as sequential text. Also, the data are logically stored in one place and can be Figure 1. A stylized transaction program. displayed when desired. Parallel programs are different in these two areas. In parallel programs, there are multiple threads of 3.1 Text Windows control and the data may be logically as A simple text presentation of the debugging well as physically distributed. An impor- information is the most common type of tant goal of research into parallel debugging display. All of the systems make use of systems is to find ways of presenting the some simple text displays. For a traditional distributed data and control of parallel pro- parallel debugger, this may be the only type grams to users in a manner that aids in of data display (see Section 1.1). In an comprehension. event-based system, a sequential display of Four basic techniques for displaying de- the events as seen by a particular process bugging information are as follows: can be useful. The Traveler [Manning 19871, which is an object-based system, can (1) Textual presentation of the data, which display a “lifeline” that is a sequential list may involve color, highlighting, or a of processes in the order that they accessed display of control flow information. a particular shared object. Time may not progress monotonically It is not always necessary for time to from the top of the screen to the progress monotonically from the top of a bottom. sequential text display to the bottom. In (2) Time-process diagrams that present the Traveler events are displayed in their execution of the program in a two- causal order instead of in their temporal dimensional display with time on one order. Traveler is used to debug message axis and individual processes on the passing programs. In their model, all mes- other axis. The points in the display sages come in pairs, with a request and a are labeled to indicate the activity of reply. All requests block until a reply is the specified process at the specified received, and the programming model is time. such that making one request may result in many more nested requests before a reply (3) Animation of the program execution, whereby both dimensions of the display is sent. Also, one process may send several are spatial dimensions. The display requests concurrently. The Traveler dis- corresponds to a single instant in time, play presents these nested request-reply pairs, which they call transactions, in such or snapshot of the program state. These a way as to emphasize the nested structure. snapshots can be displayed one after another, animating the program’s exe- Requests sent concurrently will be dis- played paired with their responses and cution. The actual format of a single nested within the send-receive pair that frame can take many forms as dis- caused them to occur. Figure 1 gives a styl- cussed below. ized example program using requests and (4) Multiple windows, whose use permits replies. “Request(A)” sends a message to several simultaneous views of the pro- request a response from process A. “Re- gram being debugged. This frequently ply(x)” sends the reply “x”. Figure 2 gives involves using one window per process. a possible sequential ordering of the events for a partial execution of the request “re- Each of these approaches will be discussed quest(Simple)“. Figure 3 gives the nested below with examples taken from the sur- transaction display for the same partial veyed papers. execution.

ACM Computing Surveys, Vol. 21, No. 4, December 1989 604 . C. E. McDowell and D. P. Helmbold

request(Simple) request(A ) reauest(B) re&estiXj 2. A sequential display. Figure request(C) request(Y) reply(y) Figure 4. A simple time-process diagram. reply(c)

In Griffin [ 19871 time-process diagrams request(Simple) are used without enhancement to display request(A) request(X) the activity of this shared memory-based [no response] system. It has the advantage that it can be request(Y) presented on a simple text screen (see Fig- Figure 3. A nested display. reply(y) ure 5). This system is actually a uniproces- [no response] Iequest(B) sor simulation of a multiprocessor, and this [no response] is evident in the display. The unit of time request(C) is the occurrence of an event. A single char- reply(c) acter is used to represent each of the pos- [no response] sible events in which a process may engage. Because this is a uniprocessor, each column will have one event character and all other For large programs, all transactions and active processors will have a “.” in that their subtransactions might not fit on the column. The last character other than “.” display simultaneously. To handle this, the in a row corresponds to the last event in user may selectively open and close trans- which the process engaged. This can be actions. When a transaction is closed, all thought of as the process’s current state. of its subtransactions are hidden. For The display can be scrolled forward or example, in Figure 3, if the transaction backward in time. The last non “.” in each for “request(A)” were closed, then “re- row that was scrolled off to the left is quest(X)“, “request(Y)“, and their corre- maintained in the leftmost column. The sponding response lines would not be display contains additional text that pro- shown. As the computation advances, vides additional information about the the various “ [no response] ” entries will current (rightmost column) state. This in- be filled in. cludes such things as which signal a process is waiting for and which signals have been 3.2 Time-Process Diagrams posted. (To avoid confusion with our broader use of the word event, we use signal A time-process diagram is a two-dimen- here where the mtdbx system [Griffin sional representation of the state of a par- 19871 uses event.) allel system over time. One axis represents In [Harter et al. 19851 the message-pass- time, and the other axis represents the ing system, Idd, there is heavier use of processes. Each point in the display gives graphics. Instead of placing one character some information regarding the state of the at each point in the display, two points corresponding process at the corresponding in the display are connected by a line to time. For example, a simple time-process indicate the transmission and receipt of diagram for a hypothetical system is shown a message (see Figure 6). To aid in compre- in Figure 4. Each row describes the events hension of a potentially very cluttered dis- in which a process engages, and each col- play, the user can magnijl or scroll to see umn describes the state of the system at a only a selected portion of the display. The particular instant in time. At time 1, pro- display option allows the user to rearrange cess 1 engages in event A; at time 2, process the rows so that related processes can be 2 engages in event B and process 3 engages placed close together. The user may also in event C; and so on. select various filters to be used in displaying

ACM Computing Surveys, Vol. 21, No. 4, December 1989 Debugging Concurrent Programs l 605

1 STRRT] 1 STOP 11 CONT 11 QUIT 1 Display Rate Chm~l: tOI O-10

MRIN -...... TRSK 1 i? : : : . SC- . . . . I...(-)(-..I.( . . . . TRSK2 . R. . . : : : : : : . SP. UJ...... TRSK3 . . R.. SI . . . I. : : . . 1 -I(. . . . . -1. (. -I(- TRSK4.. . R.. -SPUl...... ::......

TASK ID STATUS TASK IO STATUS LOCK STRTUS OWNER 0 MAIN running 3 3 running 1 off 1 lockwait on 2 4 4 eventwait on 4 2 on 3 t 2 eventwait on 3

Figure 5. The mtdbx time-process diagram.

Display Filter

w Time in MS (Freers) Time in MS

<

Figure 6. The Idd time-process diagram. Used with permission [Harter 198510 1985 IEEE.

subsets of the messages that fall within the message. Figure 7 shows process histories time-process space currently being dis- for three processes and the corresponding played. Similar displays are included in concurrency map. The horizontal lines PPUTT [Fowler et al. 19881 in which the (partially obscured by the boxes) corre- emphasis is on the programmer noticing spond to logical time divisions. An event irregularities in the patterns of communi- may have occurred during any time division cation. touched by the box containing the event. For many of the variations on time- Time-process displays appear to have a process diagrams a global clock is required. definite place in viewing the activity of At least one system [Stone 19881, however, parallel systems. They do have their limi- uses a type of time-process display without tations. As the number of processes be- needing a global clock. This display is called comes large, the display may become too a concurrency map. Instead of displaying cluttered with information to be useful. exactly when events occurred based on a This can be addressed to a degree with global clock, events are arranged to show filtering and such features as the display only the order in which they occurred. This and magnify options described above for order is derived from the time dependencies Idd. In the next section, we present an in the program. For example, the receipt of alternative display that gives a much dif- a message must follow the sending of a ferent view of the system.

ACM Computing Surveys, Vol. 21, No. 4, December 1989 PROCESSA PROCEiS B PROCEiS C I II: Compute Cl: Compute

Compute Compute Computo “I Send Ml Rocelve Ml Receive Mz Compute Compute Compute Send I42 Send MI Receive MI COJllpUtO Computa Compute Send I43 Roaivo M3 Send M!5 Compute Computa Compute Recoivo H5 ...... Compute.. . I

Process Histories

/ /+fc;I: Compute t

Figure 7. Process histories and a concurrency map. Debugging Concurrent Programs l 607

3.3 Animation An alternative to time-process displays is to place each process (or selected portions of distributed data) at a different point in a two-dimensional display and have the entire display represent the system at a single instant in time. As time advances, the display changes and these changes can be played in sequence to give a type of animated movie. This movie displays the evolution of the state of the system. The placement of the processes (or data) in the two-dimensional display could be arbitrary, be under user control, or correspond to the underlying structure of the program (or data) being represented. In Belvedere [Hough and Cuny 19871, the placement of processes is very impor- Figure 8. Animation using Belvedere. Used with per- tant and is specified by the user. The basic mission of Hough and Cuny [1988]. animation elements are depicted in Fig- ure 8. This system animates primitive events and user defined events specified using EDL. To further help organize the can use existing views or build new ones. display, the user may request that events Four views have been constructed and are be displayed from different perspectives: described in Socha et al. [1988]: icon view, that of a processor, a channel or a data vector view, trace view, and linked-list item. For example, when viewed from the view. perspective of a single processor, the events In the icon view, the events indicate will be displayed in the order that they where in a two-dimensional picture a par- appeared to that processor. Examples of ticular icon should be drawn and when to this can be found in Hough and Cuny start a new animation frame. By observing [1987] and in Figure 8. Figure 8 is a snap- the position of the icons within a single shot of message traffic animation during a frame and their change in position from traveling salesman program. Activity is de- frame to frame, several errors have been picted by highlighting the appropriate ports detected. The vector view is a variation of (small boxes) and channels (lines). A port this where instead of drawing icons, vectors is highlighted during a receive and a chan- are drawn. nel during a send. Arrowheads indicate di- The trace view provides a small box (win- rection for sends, with multiple arrowheads dow) for each process. Connections to other indicating more than one message in the processors are shown as lines to other buffer. boxes. Values local to the process are then The Radar [LeBlanc and Robbins 19851 displayed inside the box. The linked-list system also uses animation of messages.In view is a variation of the trace view that Radar, the user can control how long each displays a linked-list data structure from time frame is displayed. Also, at any time, the program. The nodes are drawn as boxes, the user can have the contents of any mes- with the actual data values displayed. The sage displayed. boxes are then connected to show the actual Voyeur [Socha 19881 is a prototype sys- link structure. tem for the construction of application spe- The emphasis in Voyeur is the ability to cific “views” of parallel programs. Its input provide an easy method for programmers is a sequence of events generated from user- to animate their parallel programs for the inserted instrumentation code. The user purpose of uncovering errors.

ACM Computing Surveys, Vol. 21, No. 4, December 1989 608 . C. E. McDowell and D. P. Helmbold

3.4 Animation Versus Time Process would be to display several animation frames simultaneously. Using the abstrac- It should come as no surprise that neither tion of Belvedere, it might even be useful animation nor time-process diagrams alone to display several different perspectives in is sufficient to detect all of the errors in different windows simultaneously. It may parallel programs easily. Animation is good be that the flexibility of a system like for observing the instantaneous state of the Voyeur will be necessary because no single system. By only displaying a single instant view is sufficient for the many different of time, more state information can be dis- types of errors that must be addressed. played simultaneously. This may mean more information per process or more pro- cesses. Patterns of concurrent behavior can 4. STATIC ANALYSIS FOR DEBUGGING also be viewed (e.g., all processes except one PARALLEL PROGRAMS are sending messages to their neighbor). When the probe effect renders the tech- Animation, however, does not clearly show niques in Sections 1 and 2 useless, what patterns of behavior that occur across time. options are left to a programmer to debug This is addressed to some degree in Belve- a parallel program? Some researchers are dere by the use of high-level abstract events pursuing static analysis techniques for de- that may encompass an interval of time. tecting certain classes of anomalies in par- Time-process diagrams can display pat- allel programs. This is distinct from formal terns of behavior over time. This can be proof of correctness, because no attempt is especially helpful in finding performance made to prove conformance with a written bugs. The trade-off is that only a very small specification. Instead, an attempt is made amount of information can be displayed for to give assurance that the program can- each process at any point in time. In mtdbx not enter certain predefined states that [Griffin 19871 only a single character is generally indicate errors. displayed for the entire state of a process. Static analysis is being used to detect two In PPUTT a process is either waiting, run- classes of errors in parallel programs: syn- ning, sending, or receiving. chronization errors and data-usage errors. It appears that the ideal debugger for Synchronization errors include such things complex concurrent systems would support as deadlock and wait forever. Data-usage both animation and time-process displays. errors include the usual sequential data- Two pieces of evidence to support this usage errors, such as reading an uninitial- claim are the following: ized variable, and parallel data-usage errors The animation systems find some errors typified by two processes simultaneously by noticing changes from one frame to updating a shared variable. the next (the passage of time), and There appear to be two related but dis- tinct areas being investigated. One is apply- The time-process displays generally pro- ing dataflow analysis techniques, similar to vide some mechanism for displaying de- those used by optimizing and vectorizing tailed system state for a particular compilers, to determine data-usage prop- instant in time. erties in parallel programs. The other is answering the question, One._. approach to combining the two would be simultaneously to present a time- Is it possible for two statements Sl and S2 process diagram in one window of a graphic in a parallel program to execute in parallel? workstation and an animation in another. The time-process display would guide the These two areas are discussed in the re- programmer to the important point in time, mainder of this section. Some closely re- and the animation display would present lated work on combining static analysis detailed information about the state of the with dynamic debugging and a system de- program in a comprehensible way. An al- signed to aid in the development of parallel ternative method of displaying both time algorithms are presented at the end of and the detail found in animation frames Section 4.

ACM Computing Surveys, Vol. 21, No. 4, December 1989 Debugging Concurrent Programs l 609

4.1 Dataflow Analysis of Parallel Programs (6) a process waiting for the completion of another process that is guaranteed to Probably the most frequently referenced have already completed, and work on dataflow analysis of parallel pro- grams is that of Taylor and Osterweil (7) a process that is scheduled to execute [ 19801. Their algorithms generate four in parallel with itself. data-usage sets for each node of a program In addition to not permitting recursion, flow graph: gen, kill, live, and avail. These the algorithm for item (2) assumes the exis- correspond to the sets by the same names tence of an algorithm for determining if used in the global dataflow analysis of op- two statements can execute in parallel. timizing compilers [Fosdick and Osterweil This is the subject of Section 4.2. Also, it 1976; Hecht and Ullman 19751. The origi- is recognized that it is impossible to “create nal algorithms used to compute live and a fixed static procedure capable of con- avail have been extended to pass data-usage structing the PAF of any program written information across edges in the flow graph in a language which allows run-time deter- corresponding to synchronization opera- mination of tasks to be scheduled and tions. By reinterpreting the meaning of gen waited for.” and kill, it is possible to use the modified The difficulty of using dataflow to ana- data-usage sets to arrive at algorithms to lyze parallel programs is clearly shown in detect anomalies in parallel programs. Callahan and Subhlok [1988]. They present The algorithms in Taylor and Osterweil an algorithm for determining which data [1980] assume a simple process synchroni- dependencies present in a sequential exe- zation model. One process may cause an- cution of a program are preserved in a other process to begin execution with the parallel execution of the program. They statement “schedule X” and wait for the then show that determining if all data de- completion of another process with “wait pendencies are maintained is Co-NP-hard X”. Their model does not permit a process using only the information found in their to execute (be scheduled) in parallel with verison of the PAF which they call the itself. This would correspond to a recursive synchronized control flow graph. They also process invocation. Recursion is also not present approximations that execute in allowed within any single process. They polynomial time on programs written using present algorithms based on their modified a simple programming model. Two notable data-usage sets that operate on a represen- limitations of the model are that no syn- tation of the program called a Process chronization operations are permitted Augmented Flowgraph (PAF). This is con- within loops and all synchronization is structed by taking the flowgraphs of the done with event variables that cannot be individual processes and connecting them cleared. with edges to indicate process synchroni- zation constraints. For example, there 4.2 Parallel (i, j) would be an edge connecting the “schedule X” statement in one process with the initial A Boolean function parallel (i, j), which statement in process X. Their algorithms returns true if it is possible for program can detect the following: points “i” and “j ” to execute in parallel, can be used to detect parallel access errors. (1) a reference to an uninitialized variable, These occur when a variable is being read and written in parallel or when two pro- (2) a variable that is referenced while being cesses can simultaneously write to the same defined in parallel, variable. If the program can be represented (3) a definition of a variable that is never as a Petri net [Peterson 19771, then this referenced, function can be implemented by examining (4) a variable that may have an indeter- the reachable states for the net. Unfortu- minate value, nately, the number of reachable states in a (5) a process waiting for the completion of bounded Petri net grows exponentially with an unscheduled process, the number of places (nodes) in the net.

ACM Computing Surveys, Vol. 21, No. 4, December 1989 610 l C. E. McDowell and D. P. Helmbold

An algorithm similar to computing the and Osterweil [ 19801, if a node M is in any reachable states in a Petri net applied to of the above three execution sets at node Ada programs is presented by Taylor N, then the nodes N and M would be as- [1983]. Like the Petri net algorithm, the sumed to execute in parallel. The result of number of states generated by Taylor’s al- this could be a potentially large number of gorithm can increase exponentially with extraneous anomaly reports corresponding the number of parallel tasks in the Ada to infeasible paths. program. The three functions are actually repre- An algorithm presented in McDowell sented as execution sequence sets attached [ 19891 computes parallel (i, j ) for programs to each node of the PAF. The algorithms written in FORTRAN with extensions to for computing the execution sequence sets support explicit parallelism. Whereas the are very similar to the dataflow analysis simple language in Taylor and Osterweil algorithms mentioned in Section 4.1. The [1980] explicitly prohibits the execution of details of the algorithms can be found in a process with itself, the algorithm in Bristow et al. [ 1979131.For the nonrecursive McDowell [1989] uses the fact that many language HAL/S, the execution sequence parallel numerical applications are ex- sets can all be computed in polynomial pressed as collections of identical tasks ex- time. ecuting in parallel on shared data. The The result of applying the analysis men- result is that many fewer states are gener- tioned above is an anomaly report. It should ated. This algorithm is being used in a be possible to provide the user with suffi- prototype debugging tool [Appelbe and cient information to determine what source McDowell 19851. statements are involved in the anomaly. A somewhat different approach to com- There are, however, two problems related puting parallel (i, j ) was taken in Bristow to presenting the anomaly report. One et al. [1979a]. Their algorithms operate on problem is that the anomaly report may the same PAF representation of a program contain many anomalies that are the result described in Section 4.1. They can build of infeasible paths and do not correspond PAFs for the real language HAL/S. Al- to a real error in the program. Some prog- though more powerful than the simple lan- ress in removing infeasible paths from guage used in Taylor and Osterweil [ 19801, static analysis of sequential programs has HAL/S is still nonrecursive and con- been reported [Werner 19881. There is, tains relatively simple synchronization however, nothing in the literature concern- operators. ing the removal of infeasible paths from Instead of computing a single function, static analysis of concurrent programs. parallel (i, j), they compute eleven func- The second problem with presenting the tions which they call execution sequence anomaly report is presenting the informa- sets. Included are three execution sequence tion in such a way that the user under- sets: concurrent, always-concurrent, and stands how the erroneous concurrent state possibly-concurrent. The sets are computed could arise. It may not be sufficient to for each node in the PAF. The set concur- report that variable X is modified concur- rent for node N contains all nodes M such rently by a process executing line 100 and that on all execution paths on which both another process executing line 200. If the M and N occur, they occur with no forced user cannot understand how lines 100 and ordering. The set always-concurrent is a 200 can execute in parallel, then it may be subset of concurrent that satisfies the ad- difficult to determine how to resolve the ditional restriction that all program exe- problem. Furthermore, the user may simply cution paths containing N also contain M. decide (erroneously) that this situation Node M is in the set possibly-concurrent could never arise and that the anomaly for node N if there is some execution path report should be ignored. in which both M and N occur with no forced The approach taken in Applebe and ordering between the two. For use in the McDowell [1988] allows the user to exam- anomaly detection algorithms of Taylor ine not only the concurrency state causing

ACM Computing Surveys, Vol. 21, No. 4, December 1989 Debugging Concurrent Programs l 611 the anomaly report but also the concur- block that is not executed with the given rency states that led up to that state. A input. The analysis performed to identify multiwindow user interface is provided that races can also be used to help with break- displays an anomalous concurrency state point debugging. If the critical point in each along with a description of the anomaly process involved in a race can be identified, [McDowell 19881. The concurrency state is then a breakpoint can be placed just before represented by displaying a small portion that point in each process. This will stop of the source for each concurrent task in a the system in the state necessary to induce separate window. The user may then dis- the race and permit close examination. In play any previous or successor concurrency addition, by selectively continuing the state to determine how the situation arose. processes, alternative race outcomes can be This is somewhat like performing a coarse explored. forward or backward simulation. 4.4 Static Analysis in the Development 4.3 Combining Static Analysis with Process Dynamic Debugging In addition to analyzing parallel programs Taylor [1984] describes several ways in statically, and debugging them with run which static analysis could be productively time debuggers and monitors, there is the combined with dynamic analysis. One ap- possibility of eliminating the errors in pro- proach would be to use the information grams before they occur. Here we would from static analysis to help develop test like to present some current work that data for use in conjunction with a dynamic seeks to aid in the development of parallel debugger. Conversely, information from dy- programs that are free of the kinds of errors namic monitoring could be used to guide outlined at the beginning of Section 4. partial static analysis when complete static Automatic vectorizing compilers are the analysis would generate too many states. If predecessors of the work presented here. subparts of a program could be shown to be They represent a very restricted form of free of errors using static analysis, then parallel programs that are free of parallel those portions of the program would not bugs, assuming, of course, that the compi- need monitoring. This could reduce the lers are correct. This work has been ex- overhead associated with monitoring. If tended most notably by Banerjee et al. run-time assertion testing is included in the [ 19791 to permit parallel execution of a wide program, then the static analyzer could as- range of loops. Again, assuming that both sume that the assertions are true, reducing the sequential programs and the compilers the number of states that must be exam- are correct, the parallel programs that re- ined. A related technique is the use of sym- sult will also be correct. bolic execution to reduce the state space of In addition to the fully automatic tech- a static analysis tool by eliminating infeas- niques of Banerjee et al., researchers at ible paths [Young and Taylor 19861. Rice University are working on a system A somewhat different combined use of called PTOOL [Allen et al. 19861. PTOOL static and dynamic techniques is described performs interprocess dataflow analysis to in Allen and Padua [ 19871, Miller and Choi determine when a loop can be parallelized. [1988b], and Stone [1989]. Each of these It does this only for loops selected by the systems applies static analysis to a dynam- programmer. It interacts with the program- ically generated trace in order to identify mer for three reasons. First, the amount of parallel access anomalies that they call time to perform the analysis for all loops races. If a particular trace can be shown to and all combinations of loops is prohibitive. be free of races, then the program is free of It is assumed that the programmer under- races for the given input. This does not stands the overall structure of the program mean that the program is free of races in and knows which sections are most suitable general. For example, a race condition for parallelization. Second, the programmer could be present in a conditionally executed can make a judgment about the typical

ACM Computing Surveys, Vol. 21, No. 4, December 1989 612 l C. E. McDowell and D. P. Helmbold

values of certain variables at run time that a major problem with the animation dia- affect the decision of whether to parallelize grams is the placement of the symbols rep- a particular loop. This is particularly im- resenting the processes. Hough and Cuny portant when the overhead for parallel ex- [1987] make it clear that proper placement ecution is relatively high. Finally, by can be very important in comprehension of interacting with the programmer, PTOOL the display (see Figure 8). can provide information that might permit In addition to the problem of placement the programmer to change the program is the problem of too much information- slightly, thereby allowing an important even for a picture. A possible solution is a loop to be parallelized. In a fully automatic language for abstracting low-level events system the compiler would have to reject into higher level events for display. Event the loop as a candidate, possibly missing description languages can also be used an important opportunity for parallel to filter out irrelevant events, reducing speedup. the amount of information that must be By using automatic or semiautomatic displayed. techniques based on correctness preserving One prominent feature of several systems transformations, it is possible to debug a is modularity [Joyce et al. 1987; Victor sequential version of a program using con- 19771. By carefully designing a modular ventional debugging tools and then trans- system, the addition or modification of var- form it into an equivalent parallel version. ious features can be managed easily. An event-based system might have modules for low-level event monitoring, filtering, and 5. CONCLUSION recording of events (from the low-level Having completed the survey, the question event modules), display of recorded events, remains, What progress has been made and analysis of recorded events, and controlled where is more work needed? Because of the reexecution of the program. In traditional diversity of applications, languages, and parallel debuggers as described in Section systems, no single approach can satisfy all 1, there may be separate modules for inter- parallel debugging needs. The following acting with the low-level machine and for paragraphs summarize what has been interacting with the user. “Plug compati- achieved and speculate on possible research ble” modules are advantageous because directions. they allow experimentation with different The deficiencies of both static and dy- debugger functions. namic techniques have been discussed in A well-defined interface (or hierarchy of this paper. One promising approach that interfaces) between the user and the low- alleviates some of these deficiencies is the level machine isolates most components creation of a toolkit that integrates both from changes in any one part of the system. approaches (see Section 4.3). The user modules become machine inde- As the saying goes, “A picture is worth a pendent; the low-level machine modules thousand words.” With program activities become user interface and language distributed across both space and time, independent; and possibly some user in- simple sequential displays of program ac- terface modules may become language tivity are inadequate. The time-process dia- independent. grams (see Section 3.2) give a compact view The probe effect is possibly the most of the event history, whereas the animation significant difference between debugging diagrams (see Section 3.3) give a more parallel programs and sequential programs. detailed view of a single instant in time. The most obvious solution to the problem Both representations are valuable, and the of the probe effect is to have the probes use of multiwindow workstations makes it permanently in place. This does not help possible to have both. with breakpoint debugging, but it solves The appropriateness of each of these dia- the problem for event-based debugging grams needs further research. For example, using monitoring and event histories. The

ACM Computing Surveys, Vol. 21, No. 4, December 1989 Debugging Concurrent Programs 613 problem with this solution is the perfor- for expressing them. Furthermore, the issue mance penalty for having software probes of integrating design specifications with permanently enabled. The use of hardware run-time checking has not been addressed. assistance for high-level debugging was A variation of the specification approach proposed in Gentleman and Hoeksma is taken by EDL [Bates and Wileden 19831. [ 19831, and systems using hardware moni- They do not check the event history against toring for multiprocessors are described in specifications but instead transform it into Lazzerini and Prete [ 19861and Rubin et al. a higher level history. This approach may [1988]. help bridge the gap between low-level com- Before hardware designers will dedicate munication primitives and the more ab- precious silicon to “hooks” for parallel de- stract communication mechanisms used in buggers, it will be necessary to identify just the program. EDL has been successfully what hooks are useful. Having specified the used to control the presentation of graphic hooks, a cost-benefit analysis could deter- data in the Belvedere system [Hough and mine which low-level debugging elements Cuny 19871. should be implemented in hardware. Most We have seen some attempts at software uniprocessors today have hardware hooks solutions to the probe effect. These involve for breakpointing and single stepping. It some mechanism for the debugger to ma- seems only natural that hooks for parallel nipulate the logical passage of time. In none debugging be added to parallel systems. of the systems surveyed was this completely Event histories may be the most natural successful; real-time events do not lend abstraction of distributed systems. A vari- themselves well to manipulations of logical ety of tools and methods for examining time. Although appearing unsolvable in them have been developed. For small and general, software solutions might be attain- simple parallel programs, it may suffice to able for some systems such as message- print the events as they occur. In larger passing systems without timeouts. It systems, it may be preferable to save the certainly appears that designers of new event history for later examination. An parallel languages and synchronization alternative to examining the event history constructs should keep the probe effect in manually is to check the event history mind. against a set of specifications as it is gen- Perhaps the best way to avoid the bugs erated. Although this approach incurs a associated with current parallel constructs large overhead, it may be the only effective is to design languages for parallel machines way to monitor large continually executing that make such errors impossible. Dataflow systems. Ideally, specifications from the and functional languages are one example program’s design phase would be used to of systems that attempt to “define away the detect errors. In current systems, the pro- problem.” Still other examples can be found grammer must write the specifications in declarative languages such as Prolog for checking the event history. This is or higher level languages as described in usually done as part of a testing phase and Goldberg [ 19861. A somewhat less radical is not directly connected to any design approach is the use of tools for automati- specification. cally detecting parallelism. These may be The event specification languages we fully automatic or require some user inter- have seen [Baiardi et al. 1986; Harter et al. action. In either case, such systems would 1985; Helmbold and Luckham 1985b] can ensure that the parallel program produces all express simple constraints on the event exactly the same results as the sequential stream. It is unreasonable, however, to ex- version. pect the programmer to specify completely the intended behavior of a program with ACKNOWLEDGMENTS these languages. Much work needs to be We would like to thank Anil Sahai who contributed done before we know what kinds of asser- to an early version of this survey. We would also like tions are most useful and the best languages to thank the referees for their comments.

ACM Computing Surveys, Vol. 21, No. 4, December 1989 614 l C. E. McDowell and D. P. Helmbold

APPENDIX A. SUMMARY TABLES Hardware Hardware configuration debugger uses or re- The tables in this appendix present brief quires descriptions of the systems surveyed in a form that permits quick comparisons. To Status Completeness of debugger make it possible to place as much infor- implementation mation as possible in each summary table, partial Prototype missing major we use one- or two-word descriptors in features the tables. Informal explanations of the descriptors are given before the tables. production Production version avail- able A.1 General Characteristics Part 1 prototype Complete in-house system O.S. Operating system debug- ger runs under n/s Not specified

Table A.l. General Characteristics Part 1 System OS. Hardware Status Agora [For881 Agora LAN prototype Amoeba [Els88] Amoeba n/s prototype belvedere [HC87] simple sim emulator partial BUGNET [CW82] MICROS MICRONET partial CBUG [Gai85] UNIX any UNIX prototype cdbg [Int87] iPSC iPSC prototype dbxtool [AM861 UNIX(Sun) Sun production defence n/s uniprocessor partial DISDEB yiE; Mara Mara prototype EDL [BW83] VMT UMass VMT UMass partial HARD [ MCR85] UNIX any UNIX prototype IDD [HHK85] UNIX network(Sun) partial Instant [LM87] Chrysalis BBN butterfly prototype Jade [ JLSU87] Jipc vax/Sun prototype MAD [RRZ88] n/s mimd sh. bus prototype Meglos [GK86] UNIX MC68000 production mtdbx [Gri87] UNIX/COS Sun/Gray prototype Multibug [CP86] n/s multi-macro-proc prototype Parasight [AG88] Mach/Umax Multimax prototype pdbx [Se@61 Dynix Sequent production Pilgram [Coo871 Mayflower Cambridge DCS partial PPD [MC88b] n/s n/s partial RADAR [LR85] n/s PERQ prototype Recap PLW n/s n/s proposed Traveler [Man871 Apiary emulator prototype TSL [HL85b] any any prototype Voyeur [SBN88] n/s n/s prototype YODA [LP85] n/s n/s prototype [AP87] Cedar n/s partial [BDV86] MuTEAM MuTEAM partial [GB85] PathPascal n/s partial [GGKSI] n/s n/s prototype [GKY88] any any prototype UNIX any UNIX prototype Medusa/StarOS Cm* prototype

ACM Computing Surveys, Vol. 21, No. 4, December 1989 Debugging Concurrent Programs l 615

A.2 General Characteristics Part 2 Global Clock Whether debugger re- quires a global clock Interface How the debugger hooks assumed Debugger assumes the into the program existence of an accurate hardware Additional hardware is global clock used instead of a soft- self-timed Debugger simulates its ware interface own global time manual Calls to the debugger are uniprocessor Debugger is for a single manually inserted by the processor system or programmer simulation object Compiler modifies the ob- Languages Languages the debugger ject code supports oper sys Debugger interacts with Model The model of communica- the normal operating tion system block-send Message passing with source Automatic insertion of blocking sends source code statements gmem A bank of global memory calling the debugger equidistant from all Probe Effect How the debugger ad- processors dresses the Probe Effect hybrid Has both shared memory fast calls Fast monitoring opera- and message passing tions minimize the effect lmem The shared memory is lo- on program timing cated at the processors leave in Leave the debugger in the messages Message passing system rndzv Ada rendezvous logical time Logical time hides the WC Remote procedure call effects of debugging operations n/s Not specified

Table A.2. General Characteristics Part 2 System Interface Probe Effect Global Clock Languages Model Agora [For881 oper sys leave in self-timed n/s lmem Amoeba [Els88] object n/s none n/s messages belvedere [HC87] object logical time uniprocessor Simple Simon messages BUGNET [CWSZ] object n/s self-timed Modula2 messages CBUG [Gai85] source fast calls none C 23-m cdbg [I&37] oper sys n/s none C, Ftn messages dbxtool [AM861 oper sys n/s none C, Pscl, Ftn messages defence [~;UJ object n/s uniprocessor Cont. Euclid monitors DISDEB hardware none’ none any gmem + lmem EDL [BW83] n/s n/s assumed n/s n/s HARD [MCR85] source logical time assumed Ada rndzv + gmem IDD [HHK85] object n/s none C, ModulaQ messages Instant [LM87] object leave in none several gmem Jade [ JLSU87] object n/s none several block-send MAD [RRZ88] man + hard leave in assumed PARC(C) gmem Meglos GK861 y-c; n/s none C messages mtdbx [Gri87] logical time uniprocessor f77 + Cray gmem Multibug [CP86] oper sys n/s none low level messages Parasight [AGW oper sys fast calls none C gmem ’ DISDEB uses additional hardware to eavesdrop on the network traffic. This allows the DISDEB debuggers to run transparently, without disturbing the program’s timing. (continued)

ACM Computing Surveys, Vol. 21, No. 4, December 1989 616 l C. E. McDowell and D. P. Helmbold

Table A.2. (Continued) System Interface Probe Effect Global Clock Languages Model pdbx Bw861 oper sys n/s none C, P, F + dynix gmem Pilgram [Coo871 object logical time none Cont. Clu WC PPD [MC88b] object n/s none C gmem RADAR [LR85] object n/s none Pronet messages Recap [PL88] object n/s none n/s hybrid Traveler [Man871 oper sys n/s uniprocessor Acore(lisp) messages TSL [HL85b] source n/s none Ada rndzv Voyeur [SBN88] manual n/s assumed several hybrid YODA [LP85] source n/s assumed Ada rndzv + gmem [AP87] oper sys n/s none FORTRAN gmem [BDV86] object logical time none ECSP messages [GB85] oper sys n/s none Path Pascal gmem [GGK84] oper sys none none n/s messages [GKY88] source fast calls none Occam, NIL messages [MMSSG] oper sys fast calls none C messages [Sno84] object n/s assumed n/s hybrid

A.3 User Interface windows, win Processes are dis- played in separate Exam/Mod State Capabilities for ex- windows amining/modifying the program’s state Exam Event global, glb Global state can be History How user examines examined a recorded event ipc Communication state history can be examined browser Using an local Local states can be editor/browser examined ( law-we > Using queries in the + Modification of state indicated language is also possible replay Only examination sequent (Plans to) interface is to replay the with a sequential history debugger scroll tp Scrollable time- Rename Objects Whether program process diagrams objects are given Event Lang What language (if special names dur- any) is used to ing debugging express patterns no No objects can be of events given names procs Only processes can Control Sched Whether user can be given names control scheduling of the program yes Most objects can be given names hist History guides to Graphics Whether debugger scheduling uses graphics select, se1 Can select which commun Animated view of process to run next interprocess sus/cont, SC Can suspend or con- communications tinue individual Time-process processes diagrams can be displayed n/s Not specified

ACM Computing Surveys, Vol. 21, No. 4, December 1989 Debugging Concurrent Programs l 617

Table A.3. User Interface Examine Exam/Mod Rename Event Event Control System State Objects Graphics History Lang Sched Agora [For881 local no windows replay none sus/cont Amoeba [El9881 local+ no none replay reg. exp. sus/cont belvedere [HC87] ipc no commun replay EDL hist BUGNET [CWSZ] local, ipc no none replay none sus/cont CBUG [Gai85] local, ipc no windows none none sus/cont cdbg [Int87] local+ yes none none none sus/cont dbxtool [AM861 local+ no windows none none sus/cont defence [Web831 local+ no none none none sus/cont DISDEB [LP86] global+ no none none sus/cont EDL [BW83] n/s no none replay ;DL none HARD [MCR85] glb+, ipc+ no none none Adad sus/cont IDD [HHK85] local, ipc no tP scroll tp int. log. SC, hist Instant [LM87] sequent no tP” replay none hist Jade [ JLSU87] sequent yes tp, commun browser none sel, hist MAD [RRZ88] global no tP browser path rules n/s Meglos [GK86] local+ no none none none sus/cont mtdbx [Gri87] global+ no tp, win scroll tp none SC, sel, hist Multibug [CP86] local+ yes none none none sus/cont Parasight [AG88] local+ no none none none sus/cont pdbx L%W local+ no windows none none sus/cont Pilgram [Coo871 global+ no none none none sus/cont PPD [MC88b] local, ipc no b replay none sus/cont RADAR [ LR85] ipc no commun replay none none Recap [PL88] sequent no none replay none hist Traveler [Man871 n/s no windows browser none select TSL [HL85b] sequent yes none browser TSL select Voyeur [SBN88] global yes programmable browser none none YODA [LP85] ipc no none prolog none none [AP87] n/s no none none none none [BDV86] sequent no none none BS none [GB85] global+ no win, commun replay yes, n/s sus/cont [GGK84] local+ no none scroll tp yes, n/s sus/cont [GKY88] glb+, ipc+ no windows browser temp. logic SC, select [MMS86] n/s no none browser none none [Sno84] local no none TQuel TQuel none “In Fowler et al. [1988] a toolkit by the same authors includes process time diagrams that require a global clock. The process time diagrams are not presented in the 1987 paper. b PPD generates and displays dynamic dependence graphs. ’ DISDEB allows complex events to be built out of very low-level machine code like events using a low-level language. For example, the language can only refer to physical addresses rather than using identifiers from the source code. d There is a facility for calling the debugger from special tasks. These tasks can be used to implement arbitrarily complex breakpoints.

A.4 Breakpoints local Breakpoints can be set on local State Breakpoints Types of state state-based stmt Breakpoints breakpoints can be set at supported a source global Breakpoints can statement be set on global Event Breakpoints Types of (and local) event-based state breakpoints

ACM Computing Surveys, Vol. 21, No. 4, December 1989 618 . C. E. McDowell and D. P. Helmbold

mult. ( language) Breakpoints on during program conjunction, execution disjunction, or Breakpoint Effect What is halted repetition of when a break- events point is reached seq. ( language ) Breakpoints on either Either one pro- complex cess or the sequence entire program of events may be halted process One process is single Breakpoints on halted the occurrence program The entire pro- of single events gram is halted Modify Breakpoints Whether break- points can be n/a Not applicable added/disabled n/s Not specified

Table A.4. Breakpoints State Event Modify Breakpoint System Breakpoints Breakpoints Breakpoints Effect Agora [For881 local single yes -. Amoeba [El&] local + stmt seq(reg. expr.) yes either belvedere [HC87] no none no n/a BUGNET n/s single yes either CBUG ME; stmt none yes process cdbg [I&37] local + stmt single yes process dbxtool [AM861 local + stmt none yes process defence [Web831 local + stmt none yes program DISDEB [LP86] no multiple no either EDL [BW83] no none n/a n/a HARD [MCR85] local + stmt multiple remove process IDD [HHK85] global seq n/s program Instant [LM87] local + stmt none yes program Jade [JLSU87] no single yes MAD [ RRZ88] n/s n/s n/a n/a Meglos [GK86] local single yes either mtdbx [Gri87] local + stmt none yes either Multibug [CP86] local + stmt single yes process Parasight [AG88] stmt none yes process pdbx Pw861 local + stmt none yes process Pilgram [Coo871 stmt none 3-s process PPD [MC88b] stmt + local none yes program RADAR none n/a n/a Recap EEli; gal + stmt n/s yes process Traveler [Man871 no no (planned) n/a n/a TSL [HLSW] no seq(TSL) no process Voyeur [SBN88] n/s n/s n/s n/s YODA [LP85] no none n/a n/a [AP87] n/a n/a n/a n/a [BDV86] local seq(BS) no process [GB85] -2 yes process [GGKM] $a1 + global seq yes either [GKY88] stmt + global single no program [MMS86] no none n/a n/a [ Sno84] no none n/a n/a

ACM Computing Surveys, Vol. 21, No. 4, December 1989 Debugging Concurrent Programs l 619

A.5 Event Monitoring history Using predicates on the history of events Event Type What is an event (language > Specified language ipc Every (explicit) interpro- describes “interesting” cess communication events sh mem Shared memory references local Local state stmt Each statement execution process Specifying important History Kind of event history process(es) recorded Replay How complete are the buffer Last n events are stored in replay facilities a buffer commun Communication state can chk pt All events since the last be deduced checkpoint are saved complete Entire state (including complete All events are recorded and local vars) is available preserved Ordering How are event histories sparse Some events are kept; ordered others are not linear All events are forced into a Filtering How the information linear history recorded (or replayed) partial “Concurrent” events are can be reduced not ordered event Using predicates on single events n/a Not applicable global Global state n/s Not specified

Table AS. Event Monitoring System Event Type History Filtering Replay Ordering Agora [For881 sh mem chk. pt. n/s complete partial Amoeba [Els88] ipc chk. pt. history complete partial belvedere [HC87] ipc complete none commun partial BUGNET [CW82] ipc chk. pt. proc, event complete linear CBUG [Gai85] ipc none none none n/a cdbg [Int87] ipc none n/a n/a n/a dbxtool [AM861 stmt none none none n/a defence [Web831 stmt none n/a n/a n/a DISDEB [LP86] ipc, sh mem none a none linear EDL [BW83] n/s complete EDL commun linear HARD [MCR85] stmt none none none partial IDD [HHK85] ipc buffer proc, event none linear Instant [LM87] sh mem complete none complete partial Jade [JLSU87] ipc complete proc, event complete linear MAD stmt sparse history none linear Meglos [gEEi; sh mem none none none partial mtdbx [Gri87] ipc complete none complete linear Multibug [CP86] ipc none n/a n/a n/a Parasight [AG88] stmt none none none n/a pdbx [SeqW stmt none none none n/a Pilgram [Coo871 stmt none none none n/a PPD [MC88b] sh mem complete none none partial RADAR [ LR85] ipc complete none complete partial Recap [PL88] ipc, sh mem chk. pt. process complete partial ’ DISDEB allows complex events to be built out of very low-level machine code like events using a low-level language. For example, the language can only refer to physical addresses rather than using identifiers from the source code. ’ Filtering is done by transactions. The nested function calls can be hidden, giving a clearer picture of the high- level activity (see Section 3.1). (continued)

ACM Computing Surveys, Vol. 21, No. 4, December 1989 620 . C. E. McDowell and D. P. Helmbold

Tadle AS. (Continued) System Event Type History Filtering Replay Ordering Traveler [Man871 ipc complete b none partial TSL [HL85b] ipc complete TSL (planned) linear Voyeur [SBN88] stmt complete n/s none linear YODA [LP85] ipc, sh mem complete none none linear [AP87] ipc, sh mem sparse none none partial [BDV86] ipc none n/a n/a n/a [GB85] ipc complete n/s commun partial [GGK84] n/s complete suggested complete partial [GKY88] ipc, stmt complete history complete linear [MMS86] ipc complete event none linear [Sno84] stmt sparse event none linear

REFERENCES anomaly detection in HAL/S programs. Tech. Rep. CU-CS-151-79. Univ. of Colorado at Boul- [ABKP86] ALLEN, R., BAUMGARTNER, D., KEN- der. NEDY, K., AND PORTERFIELD, A. 1986. Ptool: A BDV86] BAIARDI, F., DEFRANCESCO, N., AND semiautomatic parallel programming assistant. VAGLINI, G. 1986. Development of a debugger In Proceedings of the International Conference on for a concurrent language. IEEE Trans. Softw. Parallel Processing. IEEE, pp. 164-170. Eng. SE-12,4 (Apr.), 547-553. [AG88] ARAL, Z., AND GERTNER, I. 1988. High-level BW83] BATES, P. C., AND WILEDEN, J. C. 1983. debugging in parasight. In Proceedings of Work- High-level debugging of distributed systems: The shop on Parallel and Distributed Debugging. behavioral abstraction approach. J. Syst. Softw. ACM, pp. 151-162. 3, 255-264. Also COINS Tech. Rep. #83-29. [AM851 APPELBE, W. F., AND MCDOWELL, C. E. 1985. [CL851 CHANDY, K. M., AND LAMPORT, L. 1985. Anomaly reporting: A tool for debugging and Distributed snapshots: Determining global states developing parallel numerical algorithms. In Pro- of distributed systems. ACM Trans. Comput. ceedings of the 1st International Conference on Syst. 3, 1 (Feb.), 63-75. Supercomputing Systems. IEEE, pp. 386-391. [Coo871 COOPER, R. 1987. Pilgram: A debugger for [AM861 ADAMS, E., AND MUCHNICK, S. S. 1986. distributed systems. In Proceedings of the 7th Dbxtook A window-based symbolic debugger for International Conference on Distributed Comput- sun workstations. Softw. Pratt. Exper. 16,7,653- ing Systems. IEEE, pp. 458-465. 669. [CP86] CORSINI, P., AND PRETE, C. A. 1986. [AM881 APPELBE, W. F., AND MCDOWELL, C. E. 1988. Multibug: Interactive debugging in distributed Developing multitasking applications programs. systems. IEEE Micro 6, 3, 26-33. In Proceedings of Hawaii International Confer- [CS881 CALLAHAN, D., AND SUBHLOK, J. 1988. ence on System Sciences. IEEE, pp. 94-101. Static analysis of low-level synchronization. In [AP87] ALLEN, T. R., AND PADUA, D. A. 1987. Proceedings of Workshop on Parallel and Distrib- Debugging FORTRAN on a shared memory uted Debugging. ACM, pp. 100-111. machine. In Proceedings of the International Con- [CW82] CURTIS, R. S., AND WITTIE, L. D. 1982. ference on Parallel Processing. Penn State BugNet: A debugging system for parallel pro- University, pp. 721-727. gramming environments. In Proceedings of the 3rd International Conference on Distributed Com- [Bat881 BATES, P. 1988. Debugging heterogeneous puting Systems. ACM, pp. 394-399. distributed systems using event-based models of behavior. In Proceedings of Workshop on Parallel [Els88] ELSHOFF, I. J. P. 1988. A distributed debug- and Distributed Debugging. ACM, pp. 11-22. ger for amoeba. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM, pp. [BCKT79] BANERJEE, U., CHEN, S., KUCK, D. J., l-10. AND TOWLE, R. A. 1979. Time and parallel pro- [Fid88] FIDGE, C. J. 1988. Partial orders for parallel cessor bounds for fortran-like loops. IEEE Trans. debugging. In Proceedings of Workshop on Comput. 28,9 (Sept.), 660-670. Parallel and Distributed Debugging. ACM, pp. [BDER79a] BRISTOW, G., DRAY, C., EDWARDS, B., 183-194. AND RIDDLE, W. 1979. Anomaly detection in [FLM88] FOWLER, R. J., LEBLANC, T. J., AND concurrent programs. In Proceedings of the 4th MELLOR-CRUMMEY, J. M. 1988. An integrated International Conference on Software Engineer- approach to parallel program debugging and per- ing. IEEE. formance analysis on large-scale multiprocessors. [BDER79b] BRISTOW, G., DREY, C., EDWARDS, B., In Proceedings of Workshop on Parallel and Dis- AND RIDDLE, W. 1979. Design of a system for tributed Debugging. ACM, pp. 163-173.

ACM Computing Surveys, Vol. 21, No. 4, December 1989 Debugging Concurrent Programs 621

[F076] FOSDICK, L. D., AND OSTERWEIL,L. J. 1976. [HW88] HABAN, D., AND WEIGEL, W. 1988. Global Data flow analysis in software reliability. ACM events and global breakpoints in distributed sys- Comput. Surv. 8 (Sept.), 305-330. tems. In Proceedings of Hawaii Znternational Con- [For881 FORIN, A. 1988. Debugging of heteroge- ference on System Sciences. IEEE, pp. 166-175. neous parallel systems. In Proceedings of Work- [Int87] INTEL CORP. 1987. iPSC Concurrent Debug- shop on Parallel and Distributed Debucminp.-- - ger Manual. AC-M, pp. 130-140. [JLSU87] JOYCE, J., LOMOW, G., SLIND, K., AND [Gai85] GAIT, J. 1985. A debugger for concurrent UNGER, B. 1987. Monitoring distributed sys- programs. Softw. Pratt. Exper. 15,6, 539-554. tems. ACM Trans. Comput. Syst. 5, 2 (May), [GB85] GARCIA, M. E., AND BERMAN, W. J. 1985. 121-150. An approach to concurrent systems debugging. In [Kar87] KARP, A. H. 1987. Programming for paral- Proceedings of the 5th International Conference lelism. Computer 20, 5, 43-57. on Distributed Computing Systems. IEEE, pp. [Lam781 LAMPORT, L. 1978. Time, clocks, and the 507-514. ordering of events in a distributed system. [GGK84] GARCIA-M• LINA, H., GERMANO, F., JR., Commun. ACM 21,7,558-565. AND KOHLER, W. H. 1984. Debugging a distrib- [LM87] LEBLANC, T. J., AND MELLOR-CRUMMEY, uted computing system. IEEE Trans. Softw. Eng. J. M. 1987. Debugging parallel programs with SE-IO, 2 (Mar.), 210-219. instant replay. IEEE Trans. Comput. C-36, 4 [GH83] GENTLEMAN, W. M., AND HOEKSMA, H. (Apr.), 471-482. 1983. Hardware assisted high level debugging. [LP85] LEDOUX, C. H., AND PARKER,D. S., JR. 1985. SZGPLAN Notices 18,8 (August), 140-144. Saving traces for ada debugging. In Ada In Use, [GK86] GAGLIANELLO, R. D., AND KATSEFF, H. P. Proceedings of the Ada International Conference. 1986. The meglos user interface. In Proceedings ACM, Cambridge University Press, pp. 97-108. of Full Joint Computer Conference. ACM, pp. [LP86] LAZZERINI, B., AND PRETE, C. A. 1986. 169-177. Disdeb: An interactive high-level debugging sys- [GKY88] GOLDSZMIDT, G., KATZ, S., AND YEMINI, tem for a multi-microprocessor system. Micropro- S. 1988. Interactive blackbox debugging for con- cess. Microprogram. 18, 401-408. current languages. In Proceedings >? %‘ork.shop [LR85] LEBLANC, R. J., AND ROBBINS, A. D. 1985. on Parallel and Distributed Debu,&ng. ACM, Event-driven monitoring of distributed programs. pp. 271-282. In Proceedings of the 5th International Conference [Go1861 GOLDBERG, A. T. 1986. Knowledge-based on Distributed Computing Systems. IEEE, pp. programming: A survey of program design and 515-522. construction techniques. IEEE Trans. Softw. [Man871 MANNING, C. R. 1987. Traveler: The api- Eng. SE-12,4 (Apr.), 752-768. ary observatory. In Proceedings of European [GR85] GEHANI, N. H., AND ROOME, W. D. 1985. Conference on Object Oriented Programming. Concurrent C. Tech. Rep., AT&T Bell Labora- pp. 97-105. tories. [MC88a] MILLER, B. P., AND CHOI, J.-D. 1988a. [Gri87] GRIFFIN, J. 1987. Parallel debugging system Breakpoints and halting in distributed systems. user’s guide. Tech. Rep., Los Alamos National In Proceedings of International Conference on Laboratory. Distributed Computing Systems. IEEE. [MC88b] MILLER, B. P., AND CHOI, J.-D. 1988. A [HC87] HOUGH, A. A., AND CUNY, J. 1987. mechanism for efficient debugging of parallel pro- Belvedere: Prototype of a pattern-oriented debug- grams. In Proceedings of Workshop on Parallel ger for highly parallel computation. In Proceed- and Distributed Debugging. ACM, pp. 141-150. ings of the International Conference on Parallel Processing. Penn State University, pp. 735-738. [McD88] MCDOWELL, C. E. 1988. Viewing anoma- lous states in parallel programs. In Proceedings [HHK85] HARTER, P. K., JR., HEIMBIGNER, D. M., of the Znternutionul Conference on Parallel Pro- AND KING, R. 1985. IDD: An interactive distrib- cessing. Penn State University, pp. 54-57. uted debugger. In Proceedings of the 5th Znter- [McD89] MCDOWELL, C. E. 1989. A practical algo- national Conference on Distributed Computing rithm for static analysis of parallel programs. Systems. IEEE, pp. 498-506. Journal of Parallel and Distributed Computing 6, [HL85a] HELMBOLD, D., AND LUCKHAM, D. 1985. 3 (June), 515-536. Debugging ada tasking programs. IEEE Softw. 2, [MCR85] DI MAIO, A., CERI, S., AND REGHIZZI, S. 2, 47-57. C. 1985. Execution monitoring and debugging [HL85b] HELMBOLD, D., AND LUCKHAM, D. 1985. tool for ada using relational algebra. In Ada In TSL: Task Sequencing Language. In Ada In Use, Use, Proceedings of the Ada International Confer- Proceedings of the Ada International Conference. ence. ACM, Cambridge University Press. ACM, Cambridge University Press. [MMS86] MILLER, B. D., MACRANDER, C., AND [HU75] HECHT, M. S., AND ULLMAN, J. D. 1975. A SECHREST, S. 1986. A distributed programs simple algorithm for global data flow analysis monitor for Berkeley UNIX. Softw. Pruct. Exper. problems. SIAM J. Comput. 4,519-532. 16,2,183-200.

ACM ComputingSurveys, Vol. 21,No. 4, December1989 622 l C. E. McDowell and D. P. Helmbold

[Pet771 PETERSON, J. L. 1977. Petri nets. ACM [Sto88] STONE, J. M. 1988. A graphical represen- Comput. Surv. 9, 3 (Sept.), 223-252. tation of concurrent processes. In Proceedings of [PL88] PAN, D. Z., AND LINTON, M. A. 1988. Workshop on Paralleiand Distributed Debugging. Supporting reverse execution of parallel pro- ACM. Published as SZGPLAN Notices 24.1I (Jan- arams. In Proceedings of Workshop on Parallel uary 1989). pp. 226-235. and Distributed Debugging. ACM. Published [Sun861 SUN MICROSYSTEMS. 1986. NeWS Prelim- as SZGPLAN Notices 24, 1 (January 1989). pp. inary Technical Overview. 124-129. [Tan811 TANENBAUM, A. S. 1981. Computer Net- lRRZ881 RUBIN, R. V., RUDOLPH, L., AND ZERNIK, works. Prentice-Hall, Englewood Cliffs, N.J. D. i988. Debugging parallel programs in paral- [Tay83] TAYLOR, R. N. 1983. A general-purpose lel. In Proceedings of Workshop on Parallel and algorithm for analyzing concurrent programs. Distributed Debugging. ACM. Published as SZG- CACM 26,5,362-376. PLAN Notices 24, 1 (January 1989). pp. 216-225. [Tay84] TAYLOR, R. N. 1984. Debugging real-time lSBN881 SOCHA, D., BAILEY, M. L., AND NOTKIN, D. software in a host-target environment. Tech. Rep. 1988. Voyeur: Graphical views of parallel pro- 212, Univ. of California at Irvine. mams. In Proceedings of Workshop on Parallel [TO801 TAYLOR, R. N., AND OSTERWEIL, L. J. 1980. and Distributed Deb&g&. ACM. -Published as Anomaly detection in concurrent software by SZGPLAN Notices 24, 1 (January 1989). pp. static data flow analysis. IEEE Trans. Softw. 206-215. Eng. SE-6,3 (May), 265-278. [Seq86] SEQUENT CORP. 1986. Dynix Pdbx Parallel [Vic77] VICTOR, K. E. 1977. The design and imple- Debugger User’s Manual. mentation of DAD, a multiprocess, multima- chine, multilanguage interactive debugger. In [SG86] SCHEIFLER, R. W., AND GETTYS, J. 1986. Proceedings of Hawaii International Conference The X window system. ACM Trans. Graph. 5, 2 on System Sciences. IEEE, pp. 196-199. (Apr.). [Web831 WEBER, J. C. 1983. Interactive debugging [Sno84] SNODGRASS, R. 1984. Monitoring in a soft- of concurrent programs. SZGPLAN Notices 18,8, ware development environment: a relational 112-113. approach. In Proceedings of the Software [Wer88] WERNER, L. L. 1988. Fault detection in Engineering Symposium on Practical Software production programs by means of data usage Development Environments. SIGPLAN, ACM analysis. Ph.D. dissertation UCSD. SIGSOFT. [YT86] YOUNG, M., AND TAYLOR, R. N. 1986. [ST831 SEIDNER, R., AND TINDALL, N. 1983. Combining static concurrency analysis with sym- Interactive debug requirements. SZGPLAN No- bolic execution. In Proceedings of Workshop on tices 9-22. Software Testing. pp. 10-178.

Received July 1988; final revision accepted January 1989.

ACM Computing Surveys, Vol. 21, No. 4, December 1989