Concurrent Breakpoints

Chang-Seo Park and Koushik Sen

Electrical Engineering and Computer Sciences, University of California at Berkeley

Technical Report No. UCB/EECS-2011-159 http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-159.html

December 18, 2011

Copyright © 2011, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Acknowledgement

Research supported by Microsoft (Award #024263) and Intel (Award #024894) funding and by matching funding by U.C. Discovery (Award #DIG07-10227), by NSF Grants CCF-101781, CCF-0747390, CCF-1018729, and CCF-1018730, and by a DoD NDSEG Graduate Fellowship. The last author is supported in part by a Sloan Foundation Fellowship. Additional support comes from Par Lab affiliates National Instruments, Nokia, NVIDIA, Oracle, and Samsung.

Concurrent Breakpoints

Chang-Seo Park and Koushik Sen
EECS Department, UC Berkeley
Berkeley, CA 94720, USA
{parkcs,ksen}@cs.berkeley.edu

Abstract

In program debugging, reproducibility of bugs is a key requirement. Unfortunately, bugs in concurrent programs are notoriously difficult to reproduce compared to their sequential counterparts. This is because bugs due to concurrency happen under very specific thread schedules and the likelihood of taking such corner-case schedules during regular testing is very low. We propose concurrent breakpoints, a light-weight and programmatic way to make a concurrency bug reproducible. We describe a mechanism that helps to hit a concurrent breakpoint in a concurrent execution with high probability. We have implemented concurrent breakpoints as a light-weight library (containing a few hundreds of lines of code) for Java and C/C++ programs. We have used the implementation to deterministically reproduce several known non-deterministic bugs in real-world concurrent Java and C/C++ programs involving 1.6M lines of code. In our evaluation, concurrent breakpoints made these non-deterministic bugs almost 100% reproducible.

1 Introduction

A key requirement in program debugging is reproducibility. When a bug is encountered in a program execution, in order to fix the bug, the developer first needs to confirm the existence of the bug by trying to reproduce it. Developers also require that the bug can be reproduced deterministically so that they can run the buggy execution repeatedly with the aid of a debugger and find the cause of the bug. Reproducing bugs for sequential programs is relatively easy when given enough details. If all sources of non-determinism, such as program inputs (and values of environment variables in some cases), are recorded, a bug in a sequential program can be reproduced deterministically by replaying the program with the recorded inputs. Bugs in sequential programs can be reported easily to a bug database because a user only needs to report the input on which the sequential program exhibits the bug.

Unfortunately, concurrent programs are notoriously difficult to debug compared to their sequential counterparts. This is because bugs due to concurrency happen under very specific thread schedules and are often not reproducible during regular testing. Therefore, even if a user reports the input that caused a bug in a concurrent program, the developer may not be able to recreate the bug and debug its cause. Such non-deterministic bugs in concurrent programs are called Heisenbugs. One could argue that Heisenbugs could be made reproducible if the thread schedule is recorded along with program inputs during a program execution. Unfortunately, recording and replaying a thread schedule poses several problems: 1) It requires observing the exact thread schedule either through program instrumentation or by using specialized hardware. Instrumentation often incurs huge overhead, and specialized hardware is often not easily available. 2) Replaying a thread schedule requires a special runtime on the development side, which could again incur huge overhead.

Nevertheless, we need some information about the thread schedule along with the program inputs to reproduce a Heisenbug. We, therefore, ask the question: is there a better way to provide the necessary information about a thread schedule along with program inputs so that one can reproduce a Heisenbug? We would like the information about the thread schedule to be portable so that we do not need a special runtime to reproduce the bug. In this paper, we propose a simple light-weight technique to specify enough information about a Heisenbug so that it can be reproduced with very high probability without requiring a special runtime or a full recording of the thread schedule.

Our technique for reproducibility is based on the observation that Heisenbugs can often be attributed to a set of program states, called conflict states. A program execution is said to be in a conflict state if there exist two threads such that both threads are either 1) trying to access the same memory location and at least one of the accesses is a write (i.e. a data race), or 2) trying to operate on the same synchronization object (e.g. contending to acquire the same lock). Depending on how a conflict state is resolved, i.e. which thread is allowed to execute first, a concurrent program execution could end up in different states. Such differences in program states often lead to Heisenbugs. Therefore, in order to reproduce a Heisenbug, one should be able to reach conflict states and control the program execution from those states.

In this paper, we propose concurrent breakpoints, a light-weight and programmatic tool that facilitates reproducibility of Heisenbugs in concurrent programs. A concurrent breakpoint is an object that defines a set of program states and a scheduling decision that the program needs to take if a state in the set is reached. Typically, the set of states described by a concurrent breakpoint would be a set of conflict states. Formally, a concurrent breakpoint is a tuple of the form (ℓ1, ℓ2, φ), where ℓ1 and ℓ2 are program locations and φ is a predicate over the program state. A concurrent program execution reaches a concurrent breakpoint (ℓ1, ℓ2, φ) if there exist two threads t1 and t2 such that t1 and t2 are at program locations ℓ1 and ℓ2, respectively, and the states of t1 and t2 jointly satisfy the predicate φ. After reaching a state denoted by a concurrent breakpoint, the program makes a scheduling decision: it executes the next instruction of t1 before t2. In this paper, we consider concurrent breakpoints involving two threads; however, concurrent breakpoints could easily be generalized to more than two threads.

We show that concurrent breakpoints can represent all conflict states, i.e. they can represent data races and lock contentions. We also illustrate that concurrent breakpoints can represent other buggy states, such as a deadlock state or a state where an atomicity violation or a missed notification happens. We argue that the necessary information about a buggy schedule can be represented using a small set of concurrent breakpoints: if a program execution can be forced to reach all the concurrent breakpoints in the set, then the execution hits the Heisenbug. We show using probabilistic arguments that concurrent breakpoints are hard to reach during normal program executions. We propose a mechanism, called BTRIGGER, that increases the likelihood of hitting a concurrent breakpoint during a program execution. We provide a simplified probabilistic argument to show the effectiveness of BTRIGGER in hitting a concurrent breakpoint.

We have implemented concurrent breakpoints and BTRIGGER as a light-weight library (containing a few hundreds of lines of code) for Java and C/C++ programs. We have used the implementation to reproduce several known Heisenbugs in real-world Java and C/C++ programs involving 1.6M lines of code. In our evaluation, concurrent breakpoints made these non-deterministic bugs almost 100% reproducible.

Concurrent breakpoints can be used as regression test cases for concurrent programs in a similar way we use data inputs as regression test cases for sequential programs. A set of concurrent breakpoints specifies the necessary information about a thread schedule that leads a program to a bug. After fixing a Heisenbug, the set of concurrent breakpoints denoting the Heisenbug can be kept as a regression test, in case a future change in the program leads to the same problem.

Our idea about concurrent breakpoints is motivated by recent testing techniques for concurrent programs, such as CalFuzzer [17, 39, 31, 18], AssetFuzzer [20], CTrigger [32], Penelope [40], and PCT [5]. In such testing techniques, one first identifies potential program statements where a delay or a context switch could be made to trigger a Heisenbug. Delays or context switches are then systematically or randomly inserted at those program statements to see if the resulting thread schedule could lead to a bug. The goal of this work is not to systematically or randomly explore thread schedules based on some prior information; rather, concurrent breakpoints make sure that once a bug is found, the bug can be made reproducible using a simple programmatic technique. Concurrent breakpoints also ensure that anyone can reproduce the bug deterministically without requiring the original testing framework and its runtime.

For reproducibility of Heisenbugs in concurrent programs, a number of light-weight and efficient techniques have been proposed to record and replay a concurrent execution [29, 36, 7, 35, 44, 28, 24, 33, 2]. A record-and-replay system dynamically tracks the execution of a concurrent program, recording the non-deterministic choices made by the scheduler. A trace is produced which allows the program to be re-executed, forcing it to take the same schedule. If captured in a trace, a concurrency bug can be replayed consistently during debugging. Note that these record-and-replay systems are automatic, but require heavy-weight machinery to record or replay a buggy thread schedule. In contrast, concurrent breakpoints provide a manual mechanism to make a Heisenbug reproducible, and they require no special runtime or heavy-weight machinery.

2 Concurrent Breakpoint

We define a concurrent breakpoint as the tuple (ℓ1, ℓ2, φ), where ℓ1 and ℓ2 are program locations and φ is a predicate over the program state. A program execution is said to have triggered a concurrent breakpoint (ℓ1, ℓ2, φ) if the following conditions are met:

• the program reaches a state that satisfies the following predicate:

  ∃t1, t2 ∈ Threads . (t1.pc = ℓ1) ∧ (t2.pc = ℓ2) ∧ (t1 ≠ t2) ∧ φ, and

• from the above state, the program executes the next instruction of thread t1 before the next instruction of thread t2.

That is, we say that a concurrent program execution triggers a concurrent breakpoint (ℓ1, ℓ2, φ) if the program reaches a program state and takes an action in the state. The state is such that it satisfies the predicate φ and there exist two threads t1 and t2 such that t1 and t2 are at program locations ℓ1 and ℓ2 in the state, respectively. The action at the state executes the thread t1 before the thread t2. In practice, we will restrict the predicate φ to the local states of t1 and t2, i.e. φ can only refer to variables local to t1 and t2. If v is
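This definition can be modeled directly in code. The sketch below is illustrative only; the ThreadState and Breakpoint names are ours, not the paper's library (which appears in Section 4). It encodes a breakpoint as the tuple (ℓ1, ℓ2, φ) and checks whether a pair of thread states reaches it:

```java
// Illustrative model of a concurrent breakpoint (l1, l2, phi).
// ThreadState and Breakpoint are hypothetical names, not the paper's API.
import java.util.function.BiPredicate;

final class ThreadState {
    final int id;       // thread identity
    final int pc;       // current program location (line number)
    final Object local; // a local variable that the predicate phi may inspect
    ThreadState(int id, int pc, Object local) {
        this.id = id; this.pc = pc; this.local = local;
    }
}

final class Breakpoint {
    final int l1, l2;
    final BiPredicate<ThreadState, ThreadState> phi;
    Breakpoint(int l1, int l2, BiPredicate<ThreadState, ThreadState> phi) {
        this.l1 = l1; this.l2 = l2; this.phi = phi;
    }
    // Reached iff two distinct threads sit at l1 and l2 and jointly satisfy phi.
    boolean reached(ThreadState t1, ThreadState t2) {
        return t1.id != t2.id && t1.pc == l1 && t2.pc == l2 && phi.test(t1, t2);
    }
}

public class BreakpointDemo {
    public static void main(String[] args) {
        Object p = new Object(); // shared Point-like object
        // Breakpoint (3, 9, t1.p1 == t2.p2) from the data-race example of Figure 1.
        Breakpoint bp = new Breakpoint(3, 9, (t1, t2) -> t1.local == t2.local);
        System.out.println(bp.reached(
            new ThreadState(1, 3, p), new ThreadState(2, 9, p)));            // same object
        System.out.println(bp.reached(
            new ThreadState(1, 3, p), new ThreadState(2, 9, new Object()))); // different objects
    }
}
```

Here the breakpoint is reached only in the first check, where both thread-local references point to the same object.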

1: void foo (Point p1) {
2:   ...
3:   p1.x = 10;
4:   ...
5: }
6:
7: void bar (Point p2) {
8:   ...
9:   t = p2.x;
10:  ...
11: }

Figure 1: Data Race

org.w3c.jigsaw.http.socket.SocketClientFactory {
130:  SocketClientState csList;

574:  synchronized boolean decrIdleCount() {
        ...
      }

618:  boolean clientConnectionFinished(...) {
623:    synchronized (csList) {
          ...
626:      decrIdleCount();
          ...
        }
      }

867:  synchronized void killClients(...) {
872:    synchronized (csList) {
          ...
        }
      }
}

Figure 2: Deadlock in Jigsaw

a local variable of thread t1, then we will denote the variable using t1.v.

Note that in our definition, a concurrent breakpoint involves two threads. The definition can be easily extended to involve more than two threads. For example, a concurrent breakpoint (ℓ1, ℓ2, ℓ3, φ) involves three threads. Our implementation of concurrent breakpoints, as described in the subsequent section, can be extended accordingly. To simplify exposition, in this paper, we restrict ourselves to concurrent breakpoints involving two threads.

We next illustrate how we can trigger various kinds of bugs in a concurrent program using concurrent breakpoints. We consider three kinds of common concurrency bugs: data races, deadlocks, and atomicity violations. For each kind of bug, we illustrate, through an example, the representation of the bug using a concurrent breakpoint.

For example, we can trigger a feasible data race in a program, i.e. reach a state in which two threads are about to access the same memory location and at least one of them is a write, using a concurrent breakpoint as follows. Consider the program in Figure 1. The concurrent breakpoint (3, 9, t1.p1 == t2.p2) represents the state where two threads are at lines 3 and 9, respectively, and are about to access the same memory location denoted by the field x of the object referenced by both p1 and p2, and at least one of the accesses is a write. Such a racy state, or the concurrent breakpoint, could be reached if foo and bar are executed in parallel by different threads on the same Point object. The concurrent breakpoint also specifies that if the racy state is ever reached, then the thread reaching line number 3 must execute its next instruction before the thread reaching line number 9 executes its next instruction. This forces the program to resolve the data race in a particular order.

A deadlock in a concurrent program can also be represented using a concurrent breakpoint. Consider the code of Jigsaw in Figure 2. When the http server shuts down, it calls cleanup code that shuts down the SocketClientFactory. The shutdown code holds a lock on the factory at line 867, and in turn attempts to acquire the lock on csList at line 872. On the other hand, when a SocketClient is closing, it also calls into the factory to update a global count. In this situation, the locks are held in the opposite order: the lock on csList is acquired first at line 623, and then the one on the factory at line 574. The concurrent breakpoint (626, 872, t1.csList == t2.csList ∧ t1.this == t2.this) represents this deadlock state. The deadlock state is reached if any two threads t1 and t2 reach the line numbers 626 and 872, respectively, the csList variable of both t1 and t2 points to the same object, and the this pointer refers to the same object in both t1 and t2.

Atomicity has been shown to be a widely applicable non-interference property. A block of code in a concurrent program can be specified as atomic. This means that for every program execution where threads are arbitrarily interleaved, there is an equivalent execution with the same overall behavior where the atomic block of code is executed serially, that is, the code block's execution is not interleaved with actions of other threads.

An atomicity violation can be represented using a concurrent breakpoint. For example, consider the code snippet in Figure 3 from the Java class java.lang.StringBuffer. In the append method, sb.length() returns the length of sb and stores it in the local variable len. A call to sb.setLength(0) at this point by a second thread will make the length of sb 0. This makes the value of len stale, i.e. len no longer has the latest value of the length of sb. Therefore, the call to sb.getChars(0,len,value,count) in the append method by the first thread will throw an exception. This is a classic atomicity violation of the append method of StringBuffer. A concurrent breakpoint, which would trigger this atomicity violation, is (239, 449, t1.sb == t2.this). The atomicity violation is triggered if two threads t1 and t2 reach the line numbers 449 and 239, respectively, the sb variable of t1 and the
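The stale-len interleaving described above can be forced deterministically without any breakpoint library, by hand-synchronizing the two steps of append. The harness below is our own sketch: it inlines the length()/getChars() steps of append against a real java.lang.StringBuffer rather than patching StringBuffer itself, and uses java.util.concurrent.CountDownLatch to pin down the schedule:

```java
import java.util.concurrent.CountDownLatch;

public class StringBufferAtomicityDemo {
    public static void main(String[] args) throws Exception {
        final StringBuffer sb = new StringBuffer("hello");
        final CountDownLatch lenRead = new CountDownLatch(1);
        final CountDownLatch cleared = new CountDownLatch(1);
        final Throwable[] error = new Throwable[1];

        // Thread 1 inlines the two steps of StringBuffer.append(StringBuffer):
        // read the length, then copy the characters.
        Thread t1 = new Thread(() -> {
            try {
                int len = sb.length();          // like line 444: len is read here...
                lenRead.countDown();
                cleared.await();                // ...and goes stale while we pause
                char[] value = new char[len];
                sb.getChars(0, len, value, 0);  // like line 449: len > sb.length() now
            } catch (Throwable t) {
                error[0] = t;                   // StringIndexOutOfBoundsException expected
            }
        });

        // Thread 2 plays the role of the setLength(0) call at line 239.
        Thread t2 = new Thread(() -> {
            try {
                lenRead.await();
                sb.setLength(0);                // invalidates the len read by thread 1
                cleared.countDown();
            } catch (InterruptedException ignored) { }
        });

        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(error[0] != null);   // the violation raised an exception
    }
}
```

The latches force exactly the interleaving read-length, set-length-to-zero, copy-characters, so the getChars call fails on every run instead of once in a while.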

143: public synchronized int length() {...}

322: public synchronized void getChars(...) {...}

437: public synchronized StringBuffer append(StringBuffer sb) {
       ...
444:   int len = sb.length();
       ...
449:   sb.getChars(0, len, value, count);
       ...
     }

239: public synchronized void setLength(...) {
240:   ...
     }

Figure 3: Atomicity Violation in StringBuffer

Initially: o.x = 0;

void foo(XObject o1) {
1.   synchronized(o1) {
2.     f1();
3.     f2();
4.     f3();
5.     f4();
6.     f5();
7.   }
8.   if (o1.x == 0)
9.     ERROR;
}

void bar(XObject o2) {
10.  o2.x = 1;
11.  synchronized(o2) {
12.    f6();
13.  }
}

thread1 executes foo(o) and thread2 executes bar(o) concurrently

Figure 4: A program with a hard-to-reach concurrent breakpoint (8, 10, t1.o1 == t2.o2)

this variable of t2 point to the same object, and the thread reaching line 239 is executed before the other thread from the state. This atomicity violation also results in an exception if len > 0.

3 Triggering a Concurrent Breakpoint

Given a concurrent breakpoint (ℓ1, ℓ2, φ), it is very unlikely that two threads will reach statements labelled ℓ1 and ℓ2, respectively, at the same time in a concurrent execution, even though each thread could reach the statements independently of the other threads several times during the execution. Therefore, a concurrent breakpoint could be difficult to hit during a normal concurrent execution unless we have a mechanism to force an execution to the concurrent breakpoint.

For example, consider the two-threaded program in Figure 4. The program uses a shared memory location o.x which is initialized to 0. The important statements in this program are statements 8, 9, and 10. We add the other statements in the program to ensure that statement 8 gets executed after the execution of a large number of statements by thread1 and statement 10 gets executed by thread2 at the beginning.

Now consider the concurrent breakpoint (8, 10, t1.o1 == t2.o2). If we run the program, then the likelihood of reaching a state where thread1 is at line 8 and thread2 is at line 10 is very low. Therefore, the concurrent breakpoint will hardly be hit by any program execution.

We next describe a mechanism that tries to force a program execution to a concurrent breakpoint. We call this mechanism BTRIGGER. We also provide a simplified probabilistic argument to justify the effectiveness of BTRIGGER. Recall that a concurrent breakpoint (ℓ1, ℓ2, φ) denotes a set of program states and an action, where a state in the set satisfies the following predicate:

  ∃t1, t2 ∈ Threads . (t1 ≠ t2) ∧ (t1.pc = ℓ1) ∧ (t2.pc = ℓ2) ∧ φ

The predicate can be rewritten as follows:

  ∃t1, t2 ∈ Threads . (t1 ≠ t2) ∧ φt1 ∧ φt2 ∧ φt1t2

where φt1 only refers to local variables of thread t1, φt2 only refers to local variables of thread t2, and φt1t2 refers to local variables of both t1 and t2.

BTRIGGER works as follows. During the execution of a program, whenever a thread reaches a state satisfying the predicate φti where i ∈ {1, 2}, we postpone the execution of the thread for T time units and keep the thread in a set Postponedti for the postponed period. We continue the execution of the other threads. If another thread, say t, reaches a state satisfying the predicate φtj where j ∈ {1, 2}, then we do the following. If there is a postponed thread, say t′, in the set Postponedti where i ≠ j and the local states of the two threads t and t′ satisfy the predicate φt1t2, then we report that the concurrent breakpoint has been reached. Otherwise, we postpone the execution of the thread t by T time units and keep the thread in the set Postponedtj for the postponed period. If the concurrent breakpoint is reached, we also order the execution of threads t and t′ according to the order given by the concurrent breakpoint.

Note that we do not postpone the execution of a thread indefinitely because this could result in a deadlock situation if all threads reach either ℓ1 or ℓ2 and none of the breakpoint predicates are satisfied by any pair of postponed threads. BTRIGGER ensures that if a thread reaches a state satisfying the concurrent breakpoint partly (i.e. reaches a state satisfying the predicate φt1 or φt2), it is paused for a reasonable amount of time, giving a chance to other threads to catch up and create a state that completely satisfies the concurrent breakpoint. This simple mechanism increases the likelihood of hitting a concurrent breakpoint.

We now describe a simplified probabilistic argument to show the effectiveness of BTRIGGER. Consider a two

threaded program where each thread executes N steps and the execution of one thread does not affect the execution of the other thread, i.e. the two threads execute independently. In these N steps, let us assume that a thread t visits a state satisfying φt M times uniformly at random and visits a state satisfying the concurrent breakpoint m times, again uniformly at random. Note that m ≤ M. Then the probability that the two threads will reach a state satisfying the concurrent breakpoint is

  1 − C(N−m, m) / C(N, m)

The above probability follows from the following facts. Assume that thread t1 has visited the m states satisfying the concurrent breakpoint at time steps N1, N2, ..., Nm, respectively. The number of possible ways in which t2 could visit m states, where each state satisfies the concurrent breakpoint, within N steps is C(N, m). Similarly, the number of ways in which t2 could visit m states, where each state satisfies the concurrent breakpoint and is not visited at time steps N1, N2, ..., Nm, is C(N−m, m). The above probability is upper bounded by

  1 − (1 − m/(N − m + 1))^m

For m ≪ N, this probability is approximately equal to (using the Binomial theorem)

  m² / (N − m + 1)

In BTRIGGER, we pause a thread t for T steps whenever it reaches a state satisfying φt. Therefore, a thread now takes N + MT time steps to complete its execution. Then the probability that two threads t1 and t2 will reach a state satisfying the concurrent breakpoint is greater than (using similar reasoning as before)

  1 − C(N + MT − M − mT, m) / C(N + MT − M, m)

which is lower bounded by

  1 − (1 − mT/(N + MT − M))^m

For m ≪ N, this probability is approximately equal to

  m²T / (N + MT − M)

Therefore, BTRIGGER increases the probability of hitting a concurrent breakpoint by a factor of at least

  T(N − m + 1) / (N + MT − M)

abstract public class BTrigger {
  private String name;

  public BTrigger(String name) {
    this.name = name;
  }

  abstract public boolean predicateGlobal(BTrigger bt);

  abstract public boolean predicateLocal();

  public boolean triggerHere(boolean isFirstAction,
                             int timeoutInMS);
}

Figure 5: BTrigger API for Java

This factor increases if we increase T and decrease M, which is lower bounded by m. We can decrease M if we make the concurrent breakpoint more precise, i.e. we make the concurrent breakpoint such that most states that satisfy φt1 ∧ φt2 must also satisfy φt1t2. We can increase T by making a thread t pause longer at a state satisfying φt, but pausing a thread longer increases the running time of the program. The programmer can modify these two parameters (wait time T and the precision of predicates) if she cannot hit a bug with a concurrent breakpoint with high probability. The effects are confirmed empirically in Section 6.

4 A Concurrent Breakpoint Library

We have implemented BTRIGGER as a light-weight library for Java and C/C++ programs. The breakpoints are inserted as extra code in the program under test. The breakpoints can be turned on or off like traditional assertions.

We next describe the design and implementation of a concurrent breakpoint library for Java programs. The API for C/C++ programs is similar to that of Java. While one could specify a concurrent breakpoint using the notations described above and then use a tool to weave the necessary code into a program, we provide a small library which programmers could use directly in the program to specify a concurrent breakpoint. Figure 5 shows the abstract concurrent breakpoint class BTrigger, and Figure 6 describes an implementation of the abstract class. The implementation allows one to specify a breakpoint of the form (ℓ1, ℓ2, t1.obj == t2.obj).

For example, Figure 7 shows the code in Figure 1 after the insertion of the concurrent breakpoint (9, 3, t1.p2 == t2.p1). Note that the breakpoint is split into two statements and inserted before line numbers 3 and 9, respectively. In general, if we have a concurrent breakpoint (ℓ1, ℓ2, φt1 ∧ φt2 ∧ φt1t2), we subclass BTrigger and encapsulate the local states of the thread that are relevant to compute φt1 ∧ φt2 ∧ φt1t2. We also implement its abstract method
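The paper does not show the body of triggerHere. One possible implementation is sketched below under our own simplifying assumptions: a single global list of postponed triggers, and the scheduling order approximated by briefly delaying the thread whose isFirstAction flag is false. A caller either matches a postponed peer whose predicateGlobal holds, or postpones itself for up to the timeout:

```java
// Sketch of one possible triggerHere implementation; this is our own code,
// not the paper's library, and the ordering step is only approximate.
import java.util.ArrayList;
import java.util.List;

abstract class BTrigger {
    private static final List<BTrigger> pending = new ArrayList<>(); // postponed triggers
    protected final String name;
    private boolean hit = false; // set when a peer completes the breakpoint

    BTrigger(String name) { this.name = name; }

    public abstract boolean predicateGlobal(BTrigger bt);
    public abstract boolean predicateLocal();

    public boolean triggerHere(boolean isFirstAction, int timeoutInMS) {
        synchronized (pending) {
            BTrigger match = null;
            for (BTrigger bt : pending)
                if (predicateGlobal(bt)) { match = bt; break; }
            if (match != null) {               // a postponed peer completes the breakpoint
                match.hit = true;
                hit = true;
                pending.remove(match);
                pending.notifyAll();           // wake the postponed peer
            } else {                           // partial match only: postpone this thread
                pending.add(this);
                long deadline = System.currentTimeMillis() + timeoutInMS;
                try {
                    while (!hit && System.currentTimeMillis() < deadline)
                        pending.wait(Math.max(1, deadline - System.currentTimeMillis()));
                } catch (InterruptedException ignored) { }
                pending.remove(this);
            }
        }
        // Crude ordering: the non-first thread steps back briefly so that the
        // first-action thread executes its next instruction first.
        if (hit && !isFirstAction) {
            try { Thread.sleep(10); } catch (InterruptedException ignored) { }
        }
        return hit;
    }
}

// Minimal ConflictTrigger-style subclass for the demo below.
class SameObjectTrigger extends BTrigger {
    private final Object obj;
    SameObjectTrigger(String name, Object obj) { super(name); this.obj = obj; }
    public boolean predicateGlobal(BTrigger bt) {
        return name.equals(bt.name) && bt instanceof SameObjectTrigger
               && obj == ((SameObjectTrigger) bt).obj;
    }
    public boolean predicateLocal() { return false; }
}

public class TriggerDemo {
    public static void main(String[] args) throws Exception {
        Object shared = new Object();
        final boolean[] r = new boolean[2];
        Thread a = new Thread(() ->
            r[0] = new SameObjectTrigger("t1", shared).triggerHere(true, 1000));
        Thread b = new Thread(() ->
            r[1] = new SameObjectTrigger("t1", shared).triggerHere(false, 1000));
        a.start(); b.start(); a.join(); b.join();
        System.out.println(r[0] && r[1]);   // both threads observe the breakpoint
    }
}
```

Whichever thread arrives first postpones itself on the shared list; the second thread finds it via predicateGlobal, marks both triggers hit, and wakes it, so both calls return true.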

class ConflictTrigger extends BTrigger {
  private Object obj;

  public ConflictTrigger(String name, Object obj) {
    super(name);
    this.obj = obj;
  }

  public boolean predicateGlobal(BTrigger bt) {
    if (name.equals(bt.name)) {
      if ((bt instanceof ConflictTrigger)
          && obj == ((ConflictTrigger) bt).obj) {
        System.err.println("Conflict");
        return true;
      }
    }
    return false;
  }

  public boolean predicateLocal() {
    return false;
  }
}

Figure 6: Implementation of a concurrent breakpoint for triggering data races

1: void foo (Point p1) {
2:   ...
     (new ConflictTrigger("trigger1", p1)).triggerHere(false, Global.TIMEOUT);
3:   p1.x = 10;
4:   ...
5: }
6:
7: void bar (Point p2) {
8:   ...
     (new ConflictTrigger("trigger1", p2)).triggerHere(true, Global.TIMEOUT);
9:   t = p2.x;
10:  ...
11: }

Figure 7: Concurrent breakpoint to trigger the data race in Fig. 1

class DeadlockTrigger extends BTrigger {
  private Object lok1, lok2;

  public DeadlockTrigger(String name, Object lok1, Object lok2) {
    super(name);
    this.lok1 = lok1;
    this.lok2 = lok2;
  }

  public boolean predicateGlobal(BTrigger bt) {
    if (name.equals(bt.name)) {
      if ((bt instanceof DeadlockTrigger)
          && lok1 == ((DeadlockTrigger) bt).lok2
          && lok2 == ((DeadlockTrigger) bt).lok1) {
        System.err.println("Deadlock");
        return true;
      }
    }
    return false;
  }

  public boolean predicateLocal() {
    return false;
  }
}

Figure 8: Implementation of a concurrent breakpoint for triggering deadlocks

predicateLocal so that it returns true if and only if φt1 (or φt2, depending on the context) is satisfied. Similarly, we implement the abstract method predicateGlobal so that it returns true if and only if φt1t2 is satisfied. The argument to the constructor of BTrigger uniquely identifies a concurrent breakpoint. Two BTrigger instances with the same name are considered to be part of the same concurrent breakpoint. We create two instances of the subclass and call their triggerHere methods just before line numbers ℓ1 and ℓ2, respectively. The first argument of triggerHere denotes the action: the first argument of the triggerHere call at line number ℓ1 is set to true and the first argument of the other triggerHere call is set to false. The second argument specifies the time for which a thread will pause if predicateLocal is satisfied. triggerHere returns true if and only if the concurrent breakpoint is satisfied, i.e. if both predicateLocal and predicateGlobal of the instance on which triggerHere has been called return true.

Figure 8 shows the implementation of a concurrent breakpoint to trigger a deadlock state. The class records the lock objects: lok1, the lock that the thread has already acquired, and lok2, the lock that the thread is about to acquire. If for two instances, c1 and c2, of this class c1.lok1 == c2.lok2 && c1.lok2 == c2.lok1, then the concurrent breakpoint denotes a deadlock. Usage of this class is shown in Figure 9. The concurrent breakpoint inserted in the figure triggers a deadlock in the Jigsaw webserver.

5 Inserting Concurrent Breakpoints

In this section, we describe two methodologies for inserting concurrent breakpoints in programs in order to make Heisenbugs reproducible. Heisenbugs happen due to one or more conflicting operations among various threads in a concurrent program. Two threads are said to be in a conflicting state if

• they are trying to access the same memory location and at least one of them is a write, or

• they are trying to operate on the same synchronization object, e.g. trying to acquire the same lock.

If the threads are trying to access the same memory location and at least one of them is a write, then the conflict state denotes a classic data race. If the threads are trying to acquire the same lock in a conflicting state, then the conflicting state

denotes a lock contention. Depending on how a conflict is resolved, i.e. which thread is allowed to execute first from a conflicting state, a concurrent program could end up in different states. Heisenbugs often arise due to such differences in program states. Therefore, if one could guess the conflicts that lead to a Heisenbug and resolve those conflicts in a way such that the Heisenbug could definitely be reached, then she could attribute those conflicts to the Heisenbug. We next describe two methodologies to guess such conflicting states.

Methodology I. If a Heisenbug is present in a concurrent program, concurrent breakpoints can be inserted in the program with the aid of a testing tool as follows. First, a testing technique for concurrent programs, such as CalFuzzer [17] and CTrigger [32], can be used to find various kinds of Heisenbugs, such as data races, deadlocks, and atomicity violations, in a concurrent program. These tools also provide useful diagnostic information whenever they find a bug. For example, if CalFuzzer finds a data race, it reports the shared memory involved in the data race and the line numbers of the program statements where the data race happened. A sample report is shown below:

Data race detected between
access of x.f at sample/Test1.java:line 15, and
access of y.f at sample/Test1.java:line 20.

Once we have such a bug report, we can easily insert a concurrent breakpoint in the program to force the bug. For example, for the above sample report, we insert the program statements

(new ConflictTrigger("trigger1", x)).triggerHere(true, Global.TIMEOUT);

and

(new ConflictTrigger("trigger1", y)).triggerHere(false, Global.TIMEOUT);

at lines 15 and 20 of sample/Test1.java, respectively. This denotes the concurrent breakpoint (15, 20, t1.x == t2.y). The concurrent breakpoint makes the data race reproducible and resolves the race in one particular way. Similarly, we can insert concurrent breakpoints to make deadlocks and atomicity violations reproducible using bug reports produced by a testing tool. For example, a deadlock report produced by CalFuzzer for the deadlock in Jigsaw (see Figure 2) is

Deadlock found:
Thread10 trying to acquire lock this while holding lock csList at org/w3c/jigsaw/http/socket/SocketClientFactory.java:line 623
Thread15 trying to acquire lock csList while holding lock this at org/w3c/jigsaw/http/socket/SocketClientFactory.java:line 872

To make this deadlock reproducible, we insert the program statements

(new DeadlockTrigger("trigger2", csList, this)).triggerHere(true, Global.TIMEOUT);

and

(new DeadlockTrigger("trigger2", this, csList)).triggerHere(false, Global.TIMEOUT);

at lines 623 and 872 of org/w3c/jigsaw/http/socket/SocketClientFactory.java, respectively.

org.w3c.jigsaw.http.socket.SocketClientFactory {
130:  SocketClientState csList;

574:  synchronized boolean decrIdleCount() {
        ...
      }

618:  boolean clientConnectionFinished(...) {
623:    synchronized (csList) {
          ...
          (new DeadlockTrigger("trigger2", csList, this)).triggerHere(true, Global.TIMEOUT);
626:      decrIdleCount();
          ...
        }
      }

867:  synchronized void killClients(...) {
        (new DeadlockTrigger("trigger2", this, csList)).triggerHere(false, Global.TIMEOUT);
872:    synchronized (csList) {
          ...
        }
      }
}

Figure 9: Concurrent breakpoint to trigger deadlock in Jigsaw

In our experiments, we used this methodology to make previously known bugs found by CalFuzzer reproducible. Note that one can argue that there is no need to insert a concurrent breakpoint because the cause of the bug has already been found and can be reproduced by the testing tool that discovered the bug. This is true; however, these testing tools often use heavy machinery and program instrumentation to reproduce the bug. Concurrent breakpoints make reproduction of bugs light-weight: they require neither instrumentation of the entire program nor any sophisticated record-and-replay mechanism.

Methodology II. Our second methodology requires more manual effort than our first methodology. This methodology is useful for Heisenbugs, such as missed notifications, that cannot be detected easily using concurrency testing techniques. If a concurrent program shows a Heisenbug rarely during stress testing, we run an off-the-shelf data race detector such as Eraser to find all potential conflicting states, i.e. data races as well as lock contentions and contentions over synchronization objects. Note that a data race detector

could trivially be modified to detect all potential lock contentions and contentions over synchronization objects; CalFuzzer [17] has such an implementation. We go over the reported list of potential conflict states and try to find whether some conflict states look suspicious and could be the cause of the Heisenbug. We insert concurrent breakpoints corresponding to those conflict states one by one. For each concurrent breakpoint, we also try both actions and see if the Heisenbug manifests more frequently during repeated executions. If the Heisenbug starts manifesting more frequently for a concurrent breakpoint, we try either to increase the pause time or to improve the precision of the breakpoint so that the frequency of the Heisenbug increases. Note that our probabilistic analysis in the previous section suggests that increasing the pause time or improving the precision of the concurrent breakpoint should increase the probability of hitting it. For example, we increase the pause time in BTRIGGER from 100 milliseconds to 1 second or 10 seconds and see if the Heisenbug becomes more frequent. We also try to make the concurrent breakpoint more precise by adding more context under which the concurrent breakpoint should be reached. For example, we could require that the concurrent breakpoint only be true if some particular statement has been executed more than 7200 times. We repeat this process with other conflict states that look suspicious until we reach a stage where the Heisenbug becomes frequent, i.e. happens on almost every execution.

We next describe this methodology on a missed notification bug in log4j 1.2.13, a logging library in Java.

1. First, we observed that in 5 out of 100 test executions the program would stall.

2. After running a conflict detector, we got the following lock contentions:

       Lock contention:
         org/apache/log4j/AsyncAppender.java:line 100,
         org/apache/log4j/AsyncAppender.java:line 309
       Lock contention:
         org/apache/log4j/AsyncAppender.java:line 236,
         org/apache/log4j/AsyncAppender.java:line 309
       Lock contention:
         org/apache/log4j/AsyncAppender.java:line 100,
         org/apache/log4j/AsyncAppender.java:line 236
       Lock contention:
         org/apache/log4j/AsyncAppender.java:line 277,
         org/apache/log4j/AsyncAppender.java:line 309

   Line 100 is in the append function, which has a wait and notify within its synchronized block. Line 236 is in the setBufferSize function. Line 277 is in the close function, with a notify in its synchronized block. Line 309 is inside the run function of the Dispatcher thread, with a wait and notify in its synchronized block.

3. For each of the conflicts, we added concurrent breakpoints before the lock acquisitions. We resolved the contention in both ways, i.e. the left lock (in the first column of the table below) acquisition is allowed before the right lock acquisition, and vice versa. The results of this experiment are summarized in the following table.

       Conflict resolve order   System stall (%)   BP hit (%)
       100 -> 309                 0                 100
       309 -> 100                 0                 100
       236 -> 309               100                 100
       309 -> 236                 0                 100
       100 -> 236                 0                 100
       236 -> 100                 0                 100
       309 -> 277                97                   3
       277 -> 309                99                   1

4. From the table in step 3, we can infer the following.

   (a) From the second pair, we inferred that when the lock at line 236 is acquired before the lock at line 309, the concurrent breakpoint was always triggered and resulted in a system stall. If the locks are acquired in the opposite order, the system did not stall.

   (b) From the fourth pair, we can see that adding the breakpoint resulted in much more frequent system stalls; however, the concurrent breakpoint was not reached in most of the executions. Therefore, we can conclude that the system stall happens because of a different set of conflicts.

   (c) Note that when we added concurrent breakpoints for the first and third pairs of statements, the system did not stall irrespective of the order in which we resolved the lock conflicts.

5. Based on the above observations, we can insert a breakpoint at the second pair of statements and make the stall bug reproducible.

6 Evaluation

We have applied BTRIGGER to 15 Java benchmark programs and libraries and three C/C++ programs. The Java programs and libraries had previously known bugs such as data races, atomicity violations, deadlocks, stalls, and exceptions. We used bug reports produced by CalFuzzer [17] to insert concurrent breakpoints in these programs using methodology I from section 5, except for three bugs. The three bugs (stalls in jigsaw, lucene, and log4j due to missed notifications) were reported in bug databases, and the breakpoints were added using methodology II. We added a total of 31 breakpoints to 15 Java programs with over 850K total lines of code. After the insertion of suitable concurrent breakpoints, i.e. breakpoints that make a Heisenbug almost always reproducible, we ran each program with the breakpoints 100 times to measure the empirical probability of hitting the breakpoint. We performed the experiments on a dual-socket quad-core Intel Xeon 2GHz Linux server with 8GB of RAM.
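The trigger calls used in the two methodologies above can be understood as a rendezvous between two threads. The following is a minimal, hypothetical sketch of a ConflictTrigger-style breakpoint. The class and method names follow the examples in this paper, but the monitor-based rendezvous, the boolean return value, and the ignored second constructor argument are our own assumptions for illustration, not the actual BTRIGGER implementation. Each thread that reaches the breakpoint waits up to the timeout for a peer registered under the same trigger name, and the call made with true is released before the call made with false.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a ConflictTrigger-style concurrent breakpoint.
// Assumed semantics (not the actual BTRIGGER implementation): two threads
// call triggerHere() on triggers that share a name; each waits up to
// timeoutMillis for its peer, and the call made with goFirst == true is
// released before the call made with goFirst == false. The method returns
// true if the peer arrived in time, i.e. the breakpoint was actually hit.
class ConflictTrigger {
    private static final ConcurrentHashMap<String, State> STATES =
            new ConcurrentHashMap<>();
    private final State state;

    // The second argument (the conflicting object) is ignored here; a real
    // implementation would use it to evaluate the breakpoint's predicate.
    ConflictTrigger(String name, Object conflictingObject) {
        this.state = STATES.computeIfAbsent(name, k -> new State());
    }

    boolean triggerHere(boolean goFirst, long timeoutMillis) {
        return state.await(goFirst, timeoutMillis);
    }

    // Single-use rendezvous state shared by all triggers with the same name.
    private static final class State {
        private int arrived = 0;          // threads currently parked here
        private boolean firstDone = false;

        synchronized boolean await(boolean goFirst, long timeoutMillis) {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            arrived++;
            notifyAll();
            boolean hit = false;
            try {
                if (goFirst) {
                    // Wait for the peer, then pass the breakpoint first.
                    while (arrived < 2) {
                        long left = deadline - System.currentTimeMillis();
                        if (left <= 0) break;
                        wait(left);
                    }
                    hit = arrived >= 2;
                    firstDone = true;     // release the goFirst == false side
                    notifyAll();
                } else {
                    // Wait until the goFirst side has passed the breakpoint.
                    while (!firstDone) {
                        long left = deadline - System.currentTimeMillis();
                        if (left <= 0) break;
                        wait(left);
                    }
                    hit = firstDone;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                arrived--;
            }
            return hit;
        }
    }
}

class ConflictTriggerDemo {
    // Runs two threads through a shared breakpoint; true iff both hit it.
    static boolean bothHit() {
        final boolean[] hit = new boolean[2];
        Thread a = new Thread(() ->
                hit[0] = new ConflictTrigger("t", null).triggerHere(true, 5000));
        Thread b = new Thread(() ->
                hit[1] = new ConflictTrigger("t", null).triggerHere(false, 5000));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            return false;
        }
        return hit[0] && hit[1];
    }

    public static void main(String[] args) {
        System.out.println("both hit: " + bothHit());
    }
}
```

Under this sketch, a lone thread simply times out at the breakpoint and continues, which matches the pause-time behavior discussed above: a longer timeout gives the peer thread more time to arrive, increasing the probability that the breakpoint is hit.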

We used the following benchmarks for our evaluation: cache4j, a fast thread-safe implementation for caching Java objects; hedc, a web-crawler from ETH; three computationally intensive multi-threaded applications, moldyn, montecarlo, and raytracer, from the Java Grande Forum [16]; and Jigsaw, W3C's leading-edge Web server platform. For Jigsaw, we used a test harness that simulates multiple clients making simultaneous web page requests and sending administrative commands to control the server.

We also evaluated our tool on the following libraries with known bugs: java.lang.StringBuffer; java.util.Collections$SynchronizedList backed by an ArrayList and Collections$SynchronizedMap backed by a LinkedHashMap; java.util.logging, which provides core logging facilities; javax.swing, which provides portable GUI components; log4j, an elaborate logging library; lucene, an indexing and searching library; and pool, an object pooling API.

We applied BTRIGGER for C/C++ on pbzip2, a parallel compression tool, the Apache httpd 2.0.45 web server, and various versions of MySQL servers. The details of the bugs were retrieved from related papers [23, 45] and from the actual bug reports in the bug databases. Based on this knowledge, the breakpoints were added manually at the conflict states such that each bug is repeatedly reproducible.

6.1 Results

Table 1 summarizes the results for our experiments on Java programs. For each benchmark, we report the lines of code. We report the normal runtime (in seconds) of each benchmark in the third column. The fourth column reports the runtime of each benchmark with concurrent breakpoints added. The fifth column reports the overhead (in percentage) of running each benchmark with concurrent breakpoints. The sixth column reports the type of bug for which we added the concurrent breakpoints, with numbers to distinguish between different bugs. The seventh column describes the error produced due to the bug. The eighth column gives the empirical probability of triggering the breakpoints and causing the bug, which is the fraction of executions that hit the bug over 100 executions. The ninth column states any additional parameters or conditions that were used to make the concurrent breakpoints more precise.

For most of the benchmarks, the overhead of running the program with the concurrent breakpoint library was within 40% of the normal runtime. However, in some cases where we increased the waiting time to achieve a higher probability of hitting the breakpoint, the overhead became as large as 13x. We will discuss this issue in section 6.2.

For Jigsaw, we cannot accurately measure the running time due to its interactiveness as a server, and we do not report it. In cases when the system stalled, we report the time at which we first detected the stall, e.g. when the deadlock conditions had been met. Stalls due to missed notifications are detected by large timeouts; therefore, the runtime and overhead for such errors are omitted.

In some cases, the overhead is negative. hedc is a web crawler whose running times can fluctuate with the network status and cause inaccuracies. For some of the smaller benchmarks, such as synchronizedMap, the overhead is negligible and within the margin of error.

    Benchmark          LoC      Runtime (s)        Overhead  Breakpoint      Error      Prob.  Comments
                                normal    w/ ctr   (%)       type/no.
    cache4j            3897     1.992     2.089    4.9       race1                      1.00
                                          2.116    6.2       race2                      0.99
                                          2.101    5.5       race3                      1.00
                                          2.051    3.0       atomicity1                 1.00   ignoreFirst=7200
    hedc               29,947   1.780     2.042    14.7      race1                      0.87   wait=100ms
                                          3.835    115.4     race1                      1.00   wait=1000ms
                                          1.659    -6.8      race2                      0.96   wait=1000ms
    jigsaw             160K     -         -        -         deadlock1       stall      1.00
                                                             deadlock2       stall      1.00
                                                             missed-notify1  stall      1.00   Meth. II
                                                             race1           stall      1.00
                                                             race2                      1.00
    log4j 1.2.13       32,095   0.190     0.208    9         deadlock1       stall      1.00
                                0.135     -        -         missed-notify1  stall      1.00   Meth. II
    logging            4250     0.140     0.140    0         deadlock1       stall      1.00
    lucene             171K     0.136     0.159    17        deadlock1       stall      1.00
    moldyn             1290     1.098     1.204    9.7       race1                      1.00   bound=4
                                          1.302    18.6      race2                      1.00   bound=10
    montecarlo         3560     1.841     2.162    17.4      race1                      1.00   bound=10
    pool               11,025   0.131     -        -         missed-notify1  stall      1.00   Meth. II
    raytracer          1860     1.097     1.274    16.1      race1           test fail  1.00
                                          1.196    9.0       race2           test fail  1.00
                                          1.360    24.0      race3                      1.00
                                          1.428    30.2      race4                      1.00
    stringbuffer       1320     0.131     0.159    21        atomicity1      exception  1.00
    swing              422K     0.902     5.597    521       deadlock1       stall      0.63   wait=100ms
                                          12.003   1230      deadlock1       stall      0.99   wait=1000ms
    synchronizedList   7913     0.134     0.142    6         atomicity1      exception  1.00
                                0.131     0.134    2         deadlock1       stall      1.00
    synchronizedMap    8626     0.132     0.173    31        atomicity1                 1.00
                                0.133     0.131    -2        deadlock1       stall      1.00
    synchronizedSet    8626     0.132     0.183    39        atomicity1      exception  1.00
                                0.132     0.134    2         deadlock1       stall      1.00

    Table 1: Experimental results for Java programs

Table 2 summarizes the results for our experiments on C/C++ programs. We report the size of the program in lines of code and the type of error that is reproduced using BTRIGGER. Since the programs we experimented on are continuously running servers (except pbzip2), the runtime of the program and the probability of reproducing the bug are meaningless, because all the bugs are eventually reproduced given enough client requests. Instead, we report the mean-time-to-error (MTTE) in column 4, to show the effectiveness of BTRIGGER in reproducing bugs quickly. Column 5 denotes the number of concurrent breakpoints that were required to consistently reproduce the error.

The results show that each bug can be reproduced very quickly. For the programs that crashed due to the bug, the MTTE corresponds to the mean time to the next crash after re-execution. For non-crashing bugs, the MTTE is calculated by counting the visible artifacts of the bug during the span of its execution. All the bugs are reproduced within a few seconds, showing the effectiveness of using concurrent breakpoints for reproducing complex bugs.

6.2 Increasing the Pause Time of Breakpoints

Most of the bugs in the Java programs were triggered with probability close to 1.0 when we added a concurrent breakpoint. However, some bugs were triggered with lower probability, as low as 0.63. For such bugs, we increased the pause time of the breakpoints. For example, in race1 for hedc, the breakpoint was triggered with probability 0.87 when the pause time was 100 milliseconds. When we increased the pause time to 1 second, the breakpoint triggered with probability 1.0. Similarly, for Swing, the deadlock occurred with probability 0.63 when the pause time was 100 milliseconds, and the probability increased to 0.99 when the pause time was increased to 1 second. In both of these cases, the probability was increased by lengthening the pause time at the breakpoints, but the runtime overhead increased as well.

6.3 Improving the Precision of Breakpoints

In benchmarks where breakpoints increased the execution time significantly, we found that refining the local predicates was useful to prevent unnecessary waiting. In cache4j, the test harness generates a fixed number of objects during initialization. The constructor of CacheObject is involved in an atomicity violation, but with a weak local predicate (based on the program counter and lock object), the breakpoint will wait too many times, causing a large execution overhead. In this case, we observed that the breakpoint was triggered after some fixed large number of timeouts for the breakpoint in the constructor. Therefore, we ignored the first n timeouts by adding the predicate thisBreakpointHit > n (where n = 7200 in our case) to the local predicate in the constructor of CacheObject. This helped us to reduce the execution time by removing the unnecessary waits.

Another cause of increased execution time was that some breakpoints were triggered many times in an execution. For example, in moldyn the breakpoint that causes a data race was observed hundreds of times within a single execution. We added a bound to the local predicate, i.e. triggers < bound, and incremented triggers whenever the race breakpoint was triggered. This decreased the execution time to a reasonable level, as reported in the table.

Refining the local predicate for deadlocks also helped in cases where the deadlock depended on the context of the breakpoint. For example, in Swing, the method addDirtyRegion0() of class RepaintManager is called within many contexts, but the deadlock occurs only if the corresponding BasicCaret lock is held. We increased the pause time of the breakpoint in addDirtyRegion0() to get a higher probability, but this caused the overall runtime to increase significantly, because the thread was pausing at the breakpoint even when a lock on BasicCaret was not held and the deadlock could not possibly happen. Therefore, to eliminate this unnecessary overhead, we added the predicate isLockTypeHeld(type), where type = BasicCaret, to the local predicate of the breakpoint in addDirtyRegion0(). The refinement decreased the execution time significantly, without sacrificing

    Benchmark            LoC    Error           MTTE (sec)  #CBR  Comments
    pbzip2 0.9.4         2.0K   program crash   1.2         2     null pointer dereference
    Apache httpd 2.0.45  270K   log corruption  0.14        1     (Bug #25520)
                                server crash    0.33        3     buffer overflow
    MySQL 4.0.12         526K   log omission    0.12        2     (Bug #791)
    MySQL 3.23.56        468K   log disorder    0.065       1     (Bug #169)
    MySQL 4.0.19         539K   server crash    2.67        3     null pointer dereference (Bug #3596)

Table 2: Experimental results for C/C++ programs (MTTE: mean time to error; #CBR: number of concurrent breakpoints required)

the probability of hitting the deadlock.

7 Related Work

Record and replay systems [21, 29, 36, 7, 34, 26] have been used to reproduce non-deterministic bugs and enable cyclic debugging. However, most record and replay systems require instrumentation of the program source or binary and other heavy-weight techniques incurring high overhead. BTRIGGER, on the other hand, is a light-weight technique for bug reproducibility requiring small manual programmatic additions instead of instrumentation of the entire program. ODR [2] and PRES [33] reduce overhead by recording only partial execution information, but require a potentially expensive offline search for reproducing executions.

There is a large body of work [38, 8, 9, 42, 37, 6, 1, 4, 11, 25, 10] on finding concurrency bugs using static and dynamic analyses. Random testing has also been proposed to find real bugs in concurrent programs. Active testing [39, 18, 17] uses a biased random model checker to guide the scheduler towards buggy states. ConTest [30] adds random noise to the scheduler to increase the probability of hitting concurrent bugs. PCT [5] uses a randomized scheduler to probabilistically guarantee finding concurrency bugs of a certain depth. Although these tools are largely successful in finding bugs, it is sometimes hard to convey to the developer the information required to deterministically reproduce bugs. Using BTRIGGER, we can specify the relevant parts of the schedule required to reproduce the bug in a light-weight programmatic manner that does not require the developer to download and run any tools.

IMUnit [15] proposes a novel language to specify and execute schedules for multithreaded tests. The proposed language is based on temporal logic, and testing requires instrumentation of code. IMUnit is less intrusive in that it does not require modifying the code under test. BTRIGGER can also be seen as a mechanism to constrain thread schedules; however, we do not introduce a new language to write concurrent breakpoints, and our approach requires no code instrumentation, but it does require manual breakpoint insertion.

Model checking is a promising technique to find concurrency bugs in programs before they manifest in the wild; however, the cause of the bug can be difficult to pinpoint in an error trace returned by a model checker. A number of researchers have tried to minimize an error trace and extract useful counterexamples when a bug is found [3, 14, 13]. Statistical sampling techniques can find bugs in the sequential setting [22], and extensions have been proposed to discover concurrency bugs [41]. Program slicing [43, 46] is a popular debugging approach that determines which parts of a program are relevant to a particular statement (e.g. a bug). Precise slicing for concurrent programs is undecidable in general, but a number of works have investigated efficient approximate approaches for debugging [19, 27, 12].

8 Discussion

Traditionally, programmers have used various ad-hoc tricks, such as inserting sleep statements and spawning a huge number of threads, to make a Heisenbug reproducible. These tricks are often found in various bug reports that are filed in open bug databases. We proposed a more scientific and programmatic technique to make a Heisenbug reproducible. We described how one can use concurrent breakpoints to reproduce Heisenbugs in concurrent programs. We empirically showed that concurrent breakpoints could be used to make Heisenbugs in real-world programs almost always reproducible in a light-weight and programmatic way. We believe that concurrent breakpoints could also be used in other novel ways.

For example, concurrent breakpoints could be used to constrain the thread scheduler of a concurrent program. This is because a concurrent breakpoint can denote a conflict state, i.e. a program state where an execution can make a non-deterministic choice, and specify a particular resolution of the conflict state. Therefore, if we can denote all possible conflict states in a program using concurrent breakpoints and specify their resolutions, we can force the program to take a unique thread schedule. Thus, concurrent breakpoints could be used to write concurrent unit tests that exercise a specific thread schedule. In general, one could use a few concurrent breakpoints to limit the number of allowed thread schedules.
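To illustrate the unit-testing idea, here is a small hypothetical sketch in which two "breakpoints" pin down one specific interleaving of a classic racy read-modify-write. To keep the sketch self-contained it models the breakpoints with plain CountDownLatches rather than the trigger classes described in this paper; the class and variable names are invented for illustration.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: a unit test that pins down one specific thread
// schedule for a racy read-modify-write. The two "breakpoints" are
// modeled with plain CountDownLatches; names are invented for
// illustration.
class ScheduledRaceTest {
    static int counter = 0;

    // Forces the schedule: T1 reads, T2 increments, T1 writes its stale
    // value. Returns the final counter, which is 1 under this schedule
    // even though two increments were executed (a lost update).
    static int runLostUpdate() {
        counter = 0;
        CountDownLatch readDone = new CountDownLatch(1);   // T1 has read
        CountDownLatch writeDone = new CountDownLatch(1);  // T2 has written

        Thread t1 = new Thread(() -> {
            int read = counter;        // read counter (sees 0)
            readDone.countDown();      // breakpoint: let T2 run now
            try {
                writeDone.await();     // breakpoint: wait for T2's update
            } catch (InterruptedException e) {
                return;
            }
            counter = read + 1;        // stale write clobbers T2's update
        });
        Thread t2 = new Thread(() -> {
            try {
                readDone.await();      // run only between T1's read and write
            } catch (InterruptedException e) {
                return;
            }
            counter += 1;
            writeDone.countDown();
        });

        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            // interrupted: counter may be in an intermediate state
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println("counter = " + runLostUpdate()); // prints "counter = 1"
    }
}
```

Under the forced schedule the lost update occurs on every run (the final counter is 1 even though two increments executed), rather than only under a rare interleaving; the trigger API described in this paper could play the role of the latches without modifying the test structure.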

9 Acknowledgements

Research supported by Microsoft (Award #024263) and Intel (Award #024894) funding and by matching funding by U.C. Discovery (Award #DIG07-10227), by NSF Grants CCF-101781, CCF-0747390, CCF-1018729, and CCF-1018730, and by a DoD NDSEG Graduate Fellowship. The last author is supported in part by a Sloan Foundation Fellowship. Additional support comes from Par Lab affiliates National Instruments, Nokia, NVIDIA, Oracle, and Samsung.

References

[1] R. Agarwal, A. Sasturkar, L. Wang, and S. D. Stoller. Optimized run-time race detection and atomicity checking using partial discovered types. In 20th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 233-242. ACM, 2005.

[2] G. Altekar and I. Stoica. ODR: output-deterministic replay for multicore debugging. In ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP), pages 193-206. ACM, 2009.

[3] T. Ball, M. Naik, and S. K. Rajamani. From symptom to cause: localizing errors in counterexample traces. In 30th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 97-105. ACM, 2003.

[4] S. Bensalem and K. Havelund. Scalable dynamic deadlock analysis of multi-threaded programs. In Parallel and Distributed Systems: Testing and Debugging 2005 (PADTAD'05), 2005.

[5] S. Burckhardt, P. Kothari, M. Musuvathi, and S. Nagarakatte. A randomized scheduler with probabilistic guarantees of finding bugs. In 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '10), pages 167-178, New York, NY, USA, 2010. ACM.

[6] J. D. Choi, K. Lee, A. Loginov, R. O'Callahan, V. Sarkar, and M. Sridharan. Efficient and precise datarace detection for multithreaded object-oriented programs. In Programming Language Design and Implementation (PLDI), pages 258-269, 2002.

[7] J.-D. Choi and H. Srinivasan. Deterministic replay of Java multithreaded applications. In Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, pages 48-59, 1998.

[8] A. Dinning and E. Schonberg. Detecting access anomalies in programs with critical sections. In Proc. of the ACM/ONR Workshop on Parallel and Distributed Debugging, 1991.

[9] M. B. Dwyer, J. Hatcliff, Robby, and V. P. Ranganath. Exploiting object escape and locking information in partial-order reductions for concurrent object-oriented programs. Formal Methods in System Design, 25(2-3):199-240, 2004.

[10] A. Farzan and P. Madhusudan. The complexity of predicting atomicity violations. In 15th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2009), 2009.

[11] C. Flanagan and S. N. Freund. Atomizer: a dynamic atomicity checker for multithreaded programs. In 31st Symposium on Principles of Programming Languages (POPL), pages 256-267, 2004.

[12] D. Giffhorn and C. Hammer. Precise slicing of concurrent programs. Automated Software Engineering, 16(2):197-234, 2009.

[13] A. Groce, S. Chaki, D. Kroening, and O. Strichman. Error explanation with distance metrics. International Journal on Software Tools for Technology Transfer, 8(3):229-247, 2006.

[14] A. Groce and W. Visser. What went wrong: Explaining counterexamples. In 10th International SPIN Workshop on Model Checking of Software, pages 121-135, 2003.

[15] V. Jagannath, M. Gligoric, D. Jin, Q. Luo, G. Rosu, and D. Marinov. Improved multithreaded unit testing. In 8th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), 2011.

[16] Java Grande Forum. http://www.javagrande.org/.

[17] P. Joshi, M. Naik, C.-S. Park, and K. Sen. CalFuzzer: An extensible active testing framework for concurrent programs. In CAV '09: 21st International Conference on Computer Aided Verification, pages 675-681, Berlin, Heidelberg, 2009. Springer-Verlag.

[18] P. Joshi, C.-S. Park, K. Sen, and M. Naik. A randomized dynamic program analysis technique for detecting real deadlocks. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'09), 2009.

[19] J. Krinke. Context-sensitive slicing of concurrent programs. In 9th European Software Engineering Conference / 11th ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE), pages 178-187. ACM, 2003.

[20] Z. Lai, S. C. Cheung, and W. K. Chan. Detecting atomic-set serializability violations in multithreaded programs through active randomized testing. In 32nd International Conference on Software Engineering (ICSE). ACM/IEEE, 2010.

[21] T. J. LeBlanc and J. M. Mellor-Crummey. Debugging parallel programs with instant replay. IEEE Transactions on Computers, 36(4):471-482, 1987.

[22] B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Scalable statistical bug isolation. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 15-26. ACM, 2005.

[23] S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. In ASPLOS XIII: 13th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 329-339, New York, NY, USA, 2008. ACM.

[24] P. Montesinos, L. Ceze, and J. Torrellas. DeLorean: Recording and deterministically replaying shared-memory multiprocessor execution efficiently. In 35th International Symposium on Computer Architecture (ISCA), pages 289-300. IEEE, 2008.

[25] M. Musuvathi and S. Qadeer. Iterative context bounding for systematic testing of multithreaded programs. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 446-455. ACM, 2007.

[26] M. Musuvathi, S. Qadeer, T. Ball, G. Basler, P. A. Nainar, and I. Neamtiu. Finding and reproducing heisenbugs in concurrent programs. In 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 267-280. USENIX Association, 2008.

[27] M. G. Nanda and S. Ramesh. Interprocedural slicing of multithreaded programs with applications to Java. ACM Transactions on Programming Languages and Systems, 28(6):1088-1144, 2006.

[28] S. Narayanasamy, G. Pokam, and B. Calder. BugNet: Continuously recording program execution for deterministic replay debugging. In 32nd Annual International Symposium on Computer Architecture (ISCA), pages 284-295. IEEE, 2005.

[29] R. H. B. Netzer. Optimal tracing and replay for debugging shared-memory parallel programs. In ACM/ONR Workshop on Parallel and Distributed Debugging (PADD), pages 1-11. ACM, 1993.

[30] Y. Nir-Buchbinder, R. Tzoref, and S. Ur. Deadlocks: From exhibiting to healing. In 8th Workshop on Runtime Verification, 2008.

[31] C.-S. Park and K. Sen. Randomized active atomicity violation detection in concurrent programs. In 16th International Symposium on Foundations of Software Engineering (FSE'08). ACM, 2008.

[32] S. Park, S. Lu, and Y. Zhou. CTrigger: exposing atomicity violation bugs from their hiding places. In ASPLOS '09: 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 25-36. ACM, 2009.

[33] S. Park, Y. Zhou, W. Xiong, Z. Yin, R. Kaushik, K. H. Lee, and S. Lu. PRES: probabilistic replay with execution sketching on multiprocessors. In 22nd Symposium on Operating Systems Principles (SOSP), pages 177-192. ACM, 2009.

[34] M. Ronsse, M. Christiaens, and K. De Bosschere. Cyclic debugging using execution replay. In ICCS '01: International Conference on Computational Science, Part II, pages 851-860, London, UK, 2001. Springer-Verlag.

[35] M. Ronsse and K. De Bosschere. RecPlay: a fully integrated practical record/replay system. ACM Transactions on Computer Systems, 17(2):133-152, 1999.

[36] M. Russinovich and B. Cogswell. Replay for concurrent non-deterministic shared-memory applications. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 258-266. ACM, 1996.

[37] S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. E. Anderson. Eraser: A dynamic data race detector for multithreaded programs. ACM Transactions on Computer Systems, 15(4):391-411, 1997.

[38] E. Schonberg. On-the-fly detection of access anomalies. In ACM SIGPLAN 1989 Conference on Programming Language Design and Implementation (PLDI '89), pages 285-297, New York, NY, USA, 1989. ACM.

[39] K. Sen. Race directed random testing of concurrent programs. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'08), 2008.

[40] F. Sorrentino, A. Farzan, and P. Madhusudan. PENELOPE: weaving threads to expose atomicity violations. In 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), pages 37-46, New York, NY, USA, 2010. ACM.

[41] A. Thakur, R. Sen, B. Liblit, and S. Lu. Cooperative crug isolation. In 7th International Workshop on Dynamic Analysis (WODA), pages 35-41, 2009.

[42] W. Visser, K. Havelund, G. Brat, and S. Park. Model checking programs. In 15th International Conference on Automated Software Engineering (ASE). IEEE, 2000.

[43] M. Weiser. Program slicing. IEEE Transactions on Software Engineering, 10(4):352-357, 1984.

[44] M. Xu, R. Bodik, and M. D. Hill. A "flight data recorder" for enabling full-system multiprocessor deterministic replay. In 30th Annual International Symposium on Computer Architecture (ISCA), pages 122-135. ACM, 2003.

[45] W. Zhang, C. Sun, and S. Lu. ConMem: detecting severe concurrency bugs through an effect-oriented approach. In ASPLOS '10: 15th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 179-192, New York, NY, USA, 2010. ACM.

[46] X. Zhang, R. Gupta, and Y. Zhang. Efficient forward computation of dynamic slices using reduced ordered binary decision diagrams. In 26th International Conference on Software Engineering (ICSE), pages 502-511. IEEE, 2004.
