A Tree Clock Data Structure for Causal Orderings in Concurrent Executions

Umang Mathur, University of Illinois at Urbana-Champaign, USA ([email protected])
Andreas Pavlogiannis, Aarhus University, Denmark ([email protected])
Mahesh Viswanathan, University of Illinois at Urbana-Champaign, USA ([email protected])

Conference'17, July 2017, Washington, DC, USA. 2021. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00. https://doi.org/10.1145/nnnnnnn.nnnnnnn

Abstract

Dynamic techniques are a scalable and effective way to analyze concurrent programs. Instead of analyzing all behaviors of a program, these techniques detect errors by focusing on a single program execution. Often a crucial step in these techniques is to define a causal ordering between events in the execution, which is then computed using vector clocks, a simple data structure that stores logical times of threads. The two basic operations of vector clocks, namely join and copy, require Θ(k) time, where k is the number of threads. Thus they are a computational bottleneck when k is large. In this work, we introduce tree clocks, a new data structure that replaces vector clocks for computing causal orderings in program executions. Joining and copying tree clocks takes time that is roughly proportional to the number of entries being modified, and hence the two operations do not suffer the a-priori Θ(k) cost per application. We show that when used to compute the standard happens-before (HB) partial order, tree clocks are optimal, in the sense that no other data structure can lead to smaller asymptotic running time. Moreover, we demonstrate that tree clocks can be used to compute other partial orders, such as schedulably-happens-before (SHB), and thus are a versatile data structure. Our experiments on standard benchmarks show that the time for computing HB and SHB reduces to 50% and 57%, respectively, simply by replacing vector clocks with tree clocks.

Keywords: concurrency, happens-before, vector clocks

1 Introduction

The analysis of concurrent programs is one of the major challenges in formal methods, due to the non-determinism of inter-thread communication. The large space of communication interleavings poses a significant challenge to the programmer, as intended invariants can be broken by unexpected communication patterns. The subtlety of these patterns also makes verification a demanding task, as exposing a bug requires searching an exponentially large space [29]. Consequently, significant efforts are made towards understanding and detecting concurrency bugs efficiently [4, 12, 24, 45, 50, 54].

Dynamic analyses and partial orders. One popular approach to the scalability problem of concurrent program verification is dynamic analysis [16, 28, 32, 42]. Such techniques have the more modest goal of discovering faults by analyzing program executions instead of whole programs. Although this approach cannot prove the absence of bugs, it is far more scalable than static analysis and typically makes sound reports of errors. These advantages have rendered dynamic analyses a very effective and widely used approach to error detection in concurrent programs.

The first step in virtually all techniques that analyze concurrent executions is to establish a causal ordering between the events of the execution. Although the notion of causality varies with the application, its transitive nature makes it naturally expressible as a partial order between these events. The partial order most commonly used in this context is Lamport's happens-before (HB) [23], initially proposed in the context of distributed systems [43]. In the context of testing multi-threaded programs, partial orders play a crucial role in dynamic race detection techniques, and have been thoroughly exploited to explore trade-offs between soundness, completeness, and running time of the underlying analysis. Prominent examples include the widespread use of HB [11, 16, 21, 32, 44], schedulably-happens-before (SHB) [25], causally-precedes (CP) [46], weak-causally-precedes (WCP) [22], doesn't-commute (DC) [36], strong/weak-dependently-precedes (SDP/WDP) [19], and M2 [31]. Beyond race detection, partial orders are often employed to detect and reproduce other concurrency bugs such as atomicity violations [2, 18, 27] and deadlocks [41, 48].

Vector clocks in dynamic analyses. Often, the computational task of determining the partial ordering between events of an execution is achieved using a simple data structure called a vector clock. Informally, a vector clock C is an integer array indexed by the processes/threads in the execution, and succinctly encodes the knowledge of a process about the whole system. For a vector clock Ct1 associated with thread t1, if Ct1(t2) = i then the latest event of t1 is ordered after the first i events of thread t2 in the partial order. Vector clocks thus seamlessly capture a partial order, with the point-wise ordering of the vector timestamps of two events capturing the ordering between the events with respect to the partial order. For this reason, vector clocks are instrumental in computing the HB partial order efficiently [14, 15, 28], and are ubiquitous in the efficient implementation of analyses based on partial orders even beyond HB [16, 22, 25, 27, 36, 41, 48].

The fundamental operation on vector clocks is the pointwise join Ct1 ← Ct1 ⊔ Ct2, which occurs whenever there is a causal ordering from thread t2 to t1. Operationally, a join is performed by updating Ct1(t) ← max(Ct1(t), Ct2(t)) for every thread t, and captures the transitivity of causal orderings: as t1 learns about t2, it also learns about other threads t that t2 knows about. Note that if t1 is already aware of a later event of t, this update is vacuous. With k threads, a vector clock join takes Θ(k) time, and can quickly become a bottleneck in systems with large k. This motivates the following question: is it possible to speed up join operations by proactively avoiding vacuous updates? The challenge in such a task comes from the efficiency of the join operation itself: since it only requires linear time in the size of the vector, any improvement must operate in sub-linear time, i.e., not even touch certain entries of the vector clock. We illustrate this idea on a concrete example, and present the key insight of this work.

Motivating example. Consider the example shown in Figure 1. It shows a partial trace of a concurrent system with 6 threads, with vector times at each event. When event e2 is ordered before event e3 due to a synchronization event, the vector clock Ct2 of t2 is joined with that of t1, i.e., the tj-th entry of Ct1 is updated to the maximum of Ct1(tj) and Ct2(tj). Now assume that thread t2 has learned of the current times of threads t3, t4, t5 and t6 via thread t3. Since the t3-th component of the vector timestamp of event e1 is larger than the corresponding component of event e2, t1 cannot possibly learn any new information about threads t4, t5, and t6 through the join performed at event e3. Hence the naive pointwise updates will be redundant for the indices j ∈ {3, 4, 5, 6}. Unfortunately, the flat structure of vector clocks is not amenable to such reasoning and cannot avoid these redundant operations.

To alleviate this problem, in this work we introduce a new data structure for maintaining vector times, called a tree clock. The nodes of the tree encode local clocks, just like entries in a vector clock. In addition, the structure of the tree naturally captures which clocks have been learned transitively via intermediate threads. Figure 1 (right) depicts a (simplified) tree clock encoding the vector times of Ct2. The subtree rooted at thread t3 encodes the fact that t2 has learned about the current times of t4, t5 and t6 transitively, via t3. To perform the join operation Ct1 ← Ct1 ⊔ Ct2, we start from the root of Ct2, and given a current node u, we proceed to the children of u if and only if u represents the time of a thread that is not known to t1. Hence, the join operation will now access only the light-gray area of the tree, and thus compute the join without accessing the whole tree, resulting in a sublinear running time.

The above principle, which we call direct monotonicity, is one of two key ideas exploited by tree clocks; the other being indirect monotonicity. The key technical challenge in developing the tree clock data structure lies in (i) using direct and indirect monotonicity to perform efficient updates, and (ii) performing these updates such that direct and indirect monotonicity are preserved for future operations. We refer to Section 3.1 for a more in-depth illustration of the intuition behind these two principles.

Contributions. The contributions of this work are as follows.

1. We introduce the tree clock, a new data structure for maintaining logical times in concurrent executions. In contrast to the flat structure of traditional vector clocks, the hierarchical structure of tree clocks allows for join and copy operations that run in sublinear time. As a data structure, tree clocks offer high versatility, as they can be used to compute many different ordering relations.
2. We prove that, for computing the HB partial order, tree clocks offer an optimality guarantee we call vector-time (or vt-) optimality. Intuitively, vt-optimality guarantees that tree clocks are an optimal data structure for HB, in the sense that the total computation time cannot be improved (asymptotically) by replacing tree clocks with any other data structure. Vector clocks, on the other hand, do not enjoy this property.
3. We illustrate the versatility of tree clocks by presenting a tree clock-based algorithm to compute the SHB partial order.
4. We perform an experimental evaluation of the tree clock data structure for computing HB and SHB, and compare its performance against the standard vector clock data structure. Our results show that, just by replacing vector clocks with tree clocks, the running time reduces on average to 50% for HB and 57% for SHB. Given our experimental results, we believe that replacing vector clocks with tree clocks in partial order-based algorithms can lead to significant improvements.
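To make the cost of the pointwise join concrete, here is a minimal vector clock sketch (the class and field names are ours, not the paper's implementation); its join touches all k entries even when, as in the motivating example, only one of them actually changes.

```java
// Minimal vector clock sketch (illustrative; not the paper's implementation).
// join() scans all k entries even when only a few of them actually change.
class VectorClock {
    final int[] times;

    VectorClock(int... times) { this.times = times.clone(); }

    // Pointwise join: C(t) <- max(C(t), other(t)) for every thread t.
    // Returns how many entries actually changed, to expose vacuous updates.
    int join(VectorClock other) {
        int changed = 0;
        for (int t = 0; t < times.length; t++) {   // always Theta(k) work
            if (other.times[t] > times[t]) {
                times[t] = other.times[t];
                changed++;
            }
        }
        return changed;
    }
}
```

On the clocks of Figure 1, the loop performs six comparisons but changes only the t2-entry; the figure's final clock additionally shows t1's own entry advanced from 27 to 28, which the pure join alone does not do.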


[Figure 1: (Left) threads t1, t2 with events e1, e2, e3 and clocks
Ct1 = [27, 5, 9, 45, 17, 26]
Ct2 = [11, 6, 5, 32, 14, 20]
after the join at e3: Ct1 = [28, 6, 9, 45, 17, 26].
(Right) the tree clock of Ct2: root (t2, 6) with children (t3, 5) and (t1, 11); under (t3, 5): (t5, 14), (t4, 32), (t6, 20).]

Figure 1. (Left) Illustration of the effect of a join operation Ct1 ← Ct1 ⊔ Ct2 on the clocks of the two threads. The j-th entry in the timestamps corresponds to thread tj. Red entries remain unchanged, as t1 already knows of a later time. (Right) A tree representation of the clock Ct2 that encodes transitivity. Dark gray marks the threads whose clock has progressed in Ct2 compared to Ct1 (i.e., just t2). Light gray marks the nodes that we need to examine when performing the join operation.

2 Preliminaries

We consider traces σ of events of the form ⟨t, op⟩, where t is a thread and op is an operation such as acq(ℓ) or rel(ℓ) on a lock ℓ (and, for SHB, a read or write of a variable). We write tid(e) for the thread of event e, lTimeσ(e) for its local time (the number of events of tid(e) up to and including e), ≤σtr for the trace order, and ≤σTO for the thread order, i.e., the restriction of ≤σtr to pairs of events of the same thread.

When two events e1 and e2 are unordered by a partial order P (denoted e1 ∥σP e2), they can be deemed concurrent. This principle forms the backbone of all partial order-based race detection techniques: look for two conflicting events in the trace that are unordered by the partial order of choice.

A naïve approach for detecting data races is to explicitly construct a directed acyclic graph with nodes as events in the trace and edges representing the orderings imposed by the partial order ≤σP, and then perform a graph search for two events that are not connected. Vector clocks, on the other hand, provide a more efficient method to represent partial orders and therefore are the key data structure used in most partial order-based algorithms. The use of vector clocks enables designing streaming algorithms that detect races as the execution is observed. These algorithms associate vector timestamps [14, 15, 28] with events so that the point-wise ordering between timestamps reflects the underlying partial order. Let us formalize these notions now.

Algorithm 1: Computing the HB partial order.
1 procedure acquire(t, ℓ)
2   Ct.Join(Cℓ)
3 procedure release(t, ℓ)
4   Cℓ.Copy(Ct)

Our data structures perform in-place operations. In particular, there are methods such as Join(·), Copy(·) or Increment(·, ·) that store the result of the corresponding vector time operation in the original instance of the data structure. For example, for a vector clock C and a vector time V, a function call C.Join(V) (resp. C.Copy(V)) stores the value C ⊔ V (resp. V) back in C. A typical implementation of this operation for the vector clock data structure iterates over all thread identifiers (indices of the internal array representation), compares the corresponding components of C and V, and stores the maximum of the two in the corresponding index of C. Assuming arithmetic operations take constant time, the running time of the in-place join operation for the vector clock data structure is thus O(k), where k is the number of threads in the trace. Similarly, copy and comparison operations take O(k) time, while an increment operation takes O(1) time with vector clocks.

Vector Timestamps. Let us fix the set of threads Thrds in the trace. A vector timestamp (or simply vector time) is a mapping V : Thrds → N. It supports the following operations:

V1 ⊑ V2  iff  ∀t : V1(t) ≤ V2(t)                       (Comparison)
V1 ⊔ V2  =  λt : max(V1(t), V2(t))                     (Join)
V[t′ ↦ +i] = λt : V(t) + i if t = t′, and V(t) otherwise   (Increment)

We write V1 = V2 to denote that V1 ⊑ V2 and V2 ⊑ V1. Let us see how vector timestamps provide an efficient implicit representation of partial orders.

Timestamping for a partial order. Consider a partial order ≤σP defined on the set of events of σ such that ≤σTO ⊆ ≤σP. In this case, we define the P-timestamp of an event e as the following vector timestamp:

C^P_e = λu : max { lTimeσ(f) | f ≤σP e, tid(f) = u }

We remark that C^P_e(tid(e)) = lTimeσ(e). The following observation shows that the timestamps defined above precisely capture the order ≤σP.

Lemma 2.1. Let ≤σP be a partial order defined on the set of events of trace σ such that ≤σTO ⊆ ≤σP. Then for any two events e1, e2 of σ, we have C^P_{e1} ⊑ C^P_{e2} ⇐⇒ e1 ≤σP e2.

2.3 Happens-Before in Dynamic Race Detection

Although happens-before is a general-purpose partial order, it has seen wide adoption in dynamic data race detection techniques, and forms the basis of popular race detectors such as ThreadSanitizer [44]. Here we use this context to illustrate happens-before.

Happens-before. Given a trace σ, the happens-before (HB) partial order ≤σHB of σ is the smallest partial order over the events of σ that satisfies the following conditions:
1. ≤σTO ⊆ ≤σHB.
2. For every release event rel(ℓ) and acquire event acq(ℓ) on the same lock ℓ with rel(ℓ) <σtr acq(ℓ), we have rel(ℓ) ≤σHB acq(ℓ).

For two events e1, e2 in trace σ, we use e1 ∥σHB e2 to denote that neither e1 ≤σHB e2 nor e2 ≤σHB e1. Two conflicting events e1, e2 with e1 ∥σHB e2 are reported as a data race.

1. Direct monotonicity. Recall that a vector clock-based algorithm like Algorithm 1 maintains a vector clock Ct which intuitively captures thread t's knowledge about all threads. However, it does not maintain how this information was acquired. Knowledge of how such information was acquired can be exploited in join operations, as we show through an example. Consider a computation of the HB partial order for the trace σ shown in Figure 2a. At event e7, thread t4 transitively learns information about events in the trace through thread t3, because e6 is ordered before e7.

3.2 Tree Clocks

We now present the tree clock data structure in detail.

Tree clocks. A tree clock TC consists of the following.

1. T = (V, E) is a rooted tree of nodes of the form (tid, clk, aclk) ∈ (Thrds, N, N). Every node u stores its children in an ordered list (e.g., a stack with random access) Chld(u), in descending aclk order. We also store a parent pointer Prnt(u) for each node u, and a map ThrMap from thread identifiers to their nodes in T.

[Figure 2: two traces over threads t1–t4, each with seven sync(ℓ) events on locks ℓ1, ℓ2, ℓ3, with HB edges drawn between critical sections on the same lock.]

(a) Direct monotonicity. (b) Indirect monotonicity.
Figure 2. Illustration of the two insights behind tree clocks. An event sync(ℓ) represents two events acq(ℓ), rel(ℓ).

Each node (tid, clk, aclk) points to the unique event e corresponding to (tid, clk), i.e., tid(e) = tid and lTime(e) = clk. Intuitively, if v = Prnt(u), then u represents the following information.
1. TC has the local time u.clk for thread u.tid.
2. u.aclk is the attachment time of u on v.tid, i.e., the local time that v had when u was attached to v (the time when v.tid learned about time u.clk of u.tid).

Naturally, if u = T.root then u.aclk = ⊥. Figure 3 illustrates the tree-clock representations of the event e7 in the two traces of Figure 2.

[Figure 3: left tree — root (t4, 2, ⊥) with child (t3, 2, 2), whose children are (t1, 1, 1) and (t2, 1, 2); right tree — root (t4, 2, ⊥) with children (t2, 2, 1) and (t3, 3, 2), the latter with child (t1, 1, 1).]

Figure 3. The tree clock of t4 after processing the event e7 in the traces of Figure 2a (left) and Figure 2b (right).

Tree clock operations. Just like vector clocks, tree clocks provide functions for initialization, update and comparison. There are two main operations worth noting. The first is Join — much like vector clocks, C1.Join(C2) joins the tree clock C2 to C1. In contrast to vector clocks, this operation takes advantage of the direct and indirect monotonicity outlined in Section 3.1 to perform the join in time sublinear in the size of C1 and C2 (when possible). The second is MonotoneCopy. We use C1.MonotoneCopy(C2) to copy C2 to C1 when we know that C1 ⊑ C2. The idea is that when this holds, the copy operation has the same semantics as the join, and hence the principles that make Join run in sublinear time also apply to MonotoneCopy.

Algorithm 2 gives a pseudocode description of this functionality. The functions in the left column present operations that can be performed on tree clocks, while the right column lists helper routines for the more involved functions Join and MonotoneCopy. In the following we give an intuitive description of each function.

1. Init(t). This function initializes a tree clock TCt that belongs to thread t, by creating a node u = (t, 0, ⊥). Node u will always be the root of TCt. This initialization function is only used for tree clocks that represent the clocks of threads. Auxiliary tree clocks that store the vector times of release events do not execute this initialization.

2. Get(t). This function simply returns the time of thread t stored in TC. If TC is not aware of any event of thread t, the value returned is 0.

3. Increment(i). This function increments the time of the root node of TC. It is only used on tree clocks that have been initialized using Init, i.e., tree clocks that belong to a thread, which is always stored at the root of the tree.

4. LessThan(TC′). This function compares the vector time of TC to the vector time of TC′, i.e., it returns True iff TC ⊑ TC′.

5. Join(TC′). This function implements the join operation with TC′, i.e., updating TC ← TC ⊔ TC′. At a high level, the function performs the following steps.
1. Routine getUpdatedNodesForJoin performs a pre-order traversal of TC′, and gathers in a stack S the nodes of TC′ that have progressed in TC′ compared to TC.
2. Routine detachNodes detaches from TC the nodes whose tid appears in the nodes of S, as these will be updated and repositioned in the tree.
3. Routine attachNodes updates the nodes of TC that were detached in the previous step, and repositions them in the tree. This step effectively creates a subtree of nodes of TC that is identical to the subtree of TC′ containing the progressed nodes computed by getUpdatedNodesForJoin.
4. Finally, the last 4 lines of Join attach the subtree constructed in the previous step under the root z of TC, at the front of the Chld(z) list.

We illustrate the tree-clock Join functionality in Figure 4.

6. MonotoneCopy(TC′). This function implements the copy operation TC ← TC′, assuming that TC ⊑ TC′. The function is very similar to Join. The key difference is that this time the root of TC′ is always considered to have progressed in TC, even if the respective times are equal. This is required for

[Figure 4: tree clocks TC1 (root (t1, 16, ⊥)) and TC2 (root (t12, 25, ⊥)) before the join, and the result TC2′, where the progressed subtree of TC1 is placed at the front of the child list of the root of TC2.]

Figure 4. Illustration of TC2.Join(TC1) of Algorithm 2. Light gray marks the nodes of TC1 whose time is compared to the time of the respective thread in TC2 (i.e., the total iterations of Line 37). Dark gray marks the nodes that are updating/being updated (i.e., the size of S). TC2′ is the result of the join, where dark gray marks the sub-tree updated by Join.
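The MonotoneCopy operation illustrated next rests on a simple vector-time identity: if C ⊑ C′ then C ⊔ C′ = C′, so under this precondition a copy has the same semantics as a join. A quick check of this identity with plain arrays (example values ours):

```java
// If C ⊑ C' (pointwise), then C ⊔ C' = C', so a monotone copy may be
// implemented with the same traversal machinery as a join.
class MonotoneCopyCheck {
    // Pointwise comparison: a ⊑ b.
    static boolean lessOrEqual(int[] a, int[] b) {
        for (int t = 0; t < a.length; t++)
            if (a[t] > b[t]) return false;
        return true;
    }

    // Pointwise join: (a ⊔ b)(t) = max(a(t), b(t)).
    static int[] join(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int t = 0; t < a.length; t++) r[t] = Math.max(a[t], b[t]);
        return r;
    }
}
```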

[Figure 5: tree clocks TC1 (root (t1, 28, ⊥)) and TC2 (root (t3, 14, ⊥)) before the copy, and the result TC2′, re-rooted at (t1, 28, ⊥).]

Figure 5. Illustration of TC2.MonotoneCopy(TC1) of Algorithm 2. Light gray marks the nodes of TC1 whose time is compared to the time of the respective thread in TC2 (i.e., the total iterations of Line 66). Dark gray marks the nodes that are updating/being updated (i.e., the size of S). TC2′ is the result of the copy, where dark gray marks the sub-tree updated by MonotoneCopy. Observe that node (t3, 14, ⊥) (i.e., the root) of TC2 is updated although thread t3 has not progressed in TC1, as it is placed under the new root (t1, 28, ⊥) in TC2′ (Line 71).

changing the root of TC from the current node to one whose tid equals that of the root of TC′. Figure 5 provides an illustration.

The crucial parts of Join and MonotoneCopy that exploit the hierarchical structure of tree clocks are in getUpdatedNodesForJoin and getUpdatedNodesForCopy. In each case, we proceed from a parent u′ to its children v′ only if u′ has progressed with respect to its time in TC (recall Figure 2a), capturing direct monotonicity. Moreover, we proceed from a child v′ of u′ to the next child v′′ (in order of appearance in Chld(u′)) only if TC is not yet aware of the attachment time of v′ on u′ (recall Figure 2b), capturing indirect monotonicity.

Race detection with tree clocks. Race detection with tree clocks is shown in Algorithm 3. When processing a lock-acquire event, the vector-clock join operation has been replaced by a tree-clock join. Moreover, in light of Lemma 4.1, when processing a lock-release event, the vector-clock copy operation has been replaced by a tree-clock monotone copy. Figure 6 shows an example run of Algorithm 3 on a trace σ, showing how tree clocks grow during the execution. The figure shows the tree clocks Ct of the threads; the tree clocks Cℓ of locks are just copies of the former after processing a lock-release event (shown in parentheses in the figure).

Correctness.
4 Tree Clocks for Happens-Before

Let us see how tree clocks are employed for computing the HB partial order. We start with the following observation.

Lemma 4.1 (Monotonicity of copies). Whenever Algorithm 1 processes a lock-release event ⟨t, rel(ℓ)⟩, we have Cℓ ⊑ Ct.

We now turn our attention to the correctness of Algorithm 3, i.e., we show that the algorithm indeed computes the HB partial order. We start with the following monotonicity invariants of tree clocks.

Lemma 4.2. Consider any tree clock C and a node u of C.T. For any tree clock C′, the following assertions hold.
1. Direct monotonicity: If u.clk ≤ C′.Get(u.tid), then for every descendant w of u we have w.clk ≤ C′.Get(w.tid).

Algorithm 2: The tree clock data structure.

// Initialize a clock-tree for thread t
1  function Init(t)
2    Let u ← (t, 0, ⊥)
3    Make u the root of T
4    Let ThrMap(t) ← u

// Get the clock for thread t
5  function Get(t)
6    if ThrMap(t) ≠ ⊥ then
7      Let u ← ThrMap(t)
8      return u.clk
9    return 0

// Increment the clock of the root thread
10 function Increment(i)
11   Let z ← T.root
12   z.clk ← z.clk + i

// True iff TC ⊑ TC′
13 function LessThan(TC′)
14   Let z ← T.root
15   return z.clk ≤ TC′.Get(z.tid)

// Update with TC ← TC ⊔ TC′
16 function Join(TC′)
17   Let z′ ← TC′.T.root
18   if z′.clk ≤ Get(z′.tid) then
19     return
20   Let S ← an empty stack
21   getUpdatedNodesForJoin(S, z′)
22   detachNodes(S)
23   attachNodes(S)
     // Place the updated subtree under the root of T
24   Let w ← ThrMap(z′.tid)
25   Let z ← T.root
26   Assign w.aclk ← z.clk
27   pushChild(w, z)

// Monotone copy, assumes that TC ⊑ TC′
28 function MonotoneCopy(TC′)
29   Let z′ ← TC′.T.root
30   Let z ← T.root
31   Let S ← an empty stack
32   getUpdatedNodesForCopy(S, z′, z)
33   detachNodes(S)
34   attachNodes(S)
     // New root has the same tid as the root of TC′.T
35   Assign T.root ← ThrMap(z′.tid)

// Populate S with a pre-order traversal of the subtree rooted at u′,
// restricted to nodes whose clock has progressed.
36 routine getUpdatedNodesForJoin(S, u′)
37   foreach v′ in Chld(u′) do
38     if Get(v′.tid) < v′.clk then
39       getUpdatedNodesForJoin(S, v′)
40     else
41       if v′.aclk ≤ Get(u′.tid) then
42         break
43   Push u′ in S

// Detach from T the nodes whose tid appears in S
44 routine detachNodes(S)
45   foreach v′ in S do
46     if ThrMap(v′.tid) ≠ ⊥ then
47       Let v ← ThrMap(v′.tid)
48       if v ≠ T.root then
49         Let x ← Prnt(v)
50         Remove v from Chld(x)

// Re-attach the nodes of T whose tid appears in S to obtain
// the shape corresponding to TC′.T
51 routine attachNodes(S)
52   while S is not empty do
53     Let u′ ← pop S
54     if ThrMap(u′.tid) ≠ ⊥ then
55       Let u ← ThrMap(u′.tid)
56     else
57       Let u ← (u′.tid, 0, ⊥)
58       Let ThrMap(u.tid) ← u
59     Assign u.clk ← u′.clk
60     Let y′ ← Prnt(u′)
61     if y′ ≠ ⊥ then
62       Assign u.aclk ← u′.aclk
63       Let y ← ThrMap(y′.tid)
64       pushChild(u, y)

// Similar to getUpdatedNodesForJoin
65 routine getUpdatedNodesForCopy(S, u′, z)
66   foreach v′ in Chld(u′) do
67     if Get(v′.tid) < v′.clk then
68       getUpdatedNodesForCopy(S, v′, z)
69     else
70       if z ≠ ⊥ and v′.tid = z.tid then
71         Push v′ in S
72       if v′.aclk ≤ Get(u′.tid) then
73         break
74   Push u′ in S

// Push u to the front of Chld(v)
75 routine pushChild(u, v)
76   Assign Prnt(u) ← v
77   Push u to the front of Chld(v)
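As a complement to the pseudocode, the following is a deliberately simplified, runnable rendering of the Join path (our own sketch, not Algorithm 2 itself): it exploits only direct monotonicity, skipping the entire subtree of any node whose time is already known, omits aclk and indirect monotonicity, and re-attaches updated nodes directly under the root instead of mirroring the source subtree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Simplified tree clock sketch (our illustration, not the paper's Algorithm 2):
// joins prune via direct monotonicity only, and updated nodes are re-attached
// directly under the root rather than mirroring the source subtree shape.
class TreeClock {
    static class Node {
        final int tid;
        int clk;
        Node parent;
        final LinkedList<Node> children = new LinkedList<>();
        Node(int tid, int clk) { this.tid = tid; this.clk = clk; }
    }

    final Map<Integer, Node> thrMap = new HashMap<>();
    final Node root;

    TreeClock(int tid) {                       // Init(t): node (t, 0) is the root
        root = new Node(tid, 0);
        thrMap.put(tid, root);
    }

    int get(int tid) {                         // Get(t): 0 if t is unknown
        Node u = thrMap.get(tid);
        return u == null ? 0 : u.clk;
    }

    void increment(int i) { root.clk += i; }   // Increment(i) on the root

    // TC <- TC ⊔ TC': traverse other's tree, pruning the subtrees of nodes
    // that have not progressed; detach each updated node and re-attach it
    // under the root, which has just learned the new values.
    void join(TreeClock other) {
        if (other.root.clk <= get(other.root.tid)) return;  // nothing to learn
        List<Node> progressed = new ArrayList<>();
        collectProgressed(other.root, progressed);
        for (Node u : progressed) {
            if (u.tid == root.tid) continue;   // own time is never behind (HB usage)
            Node mine = thrMap.get(u.tid);
            if (mine == null) {
                mine = new Node(u.tid, 0);
                thrMap.put(u.tid, mine);
            } else if (mine.parent != null) {
                mine.parent.children.remove(mine); // detach (O(deg) here; the
            }                                      // paper uses O(1) linked lists)
            mine.clk = u.clk;
            mine.parent = root;
            root.children.addFirst(mine);      // re-attach under the root
        }
    }

    // Pre-order traversal of other's subtree, recursing into a child only if
    // its clock has progressed; otherwise direct monotonicity guarantees that
    // every descendant is already known, so the whole subtree is skipped.
    private void collectProgressed(Node u, List<Node> out) {
        out.add(u);
        for (Node v : u.children)
            if (get(v.tid) < v.clk) collectProgressed(v, out);
    }
}
```

Re-attaching under the root preserves the invariant that every node's subtree was known to it at its recorded time (the root has just learned the updated values), at the price of weaker pruning than the detach/attach scheme of Algorithm 2, which reproduces the source subtree shape.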


[Figure 6: (a) an example trace σ over threads t1–t5 and locks ℓ1, ℓ2, ℓ3 with sixteen events e1–e16, in order: acq(ℓ1), rel(ℓ1), acq(ℓ2), rel(ℓ2), acq(ℓ3), rel(ℓ3), acq(ℓ1), acq(ℓ3), rel(ℓ3), rel(ℓ1), acq(ℓ3), rel(ℓ3), acq(ℓ1), rel(ℓ1), acq(ℓ2), rel(ℓ2). (b) The sequence of tree-clock updates after HB processes each event ei of σ. Only the tree clock TCj of the thread j that performs ei is shown. When the thread performs a lock-release, the corresponding tree clock of the lock is mentioned in parentheses.]

Figure 6. Example run of HB and the updates on the corresponding clock-trees.

Algorithm 3: HB with tree clocks.
1 procedure acquire(t, ℓ)
2   Ct.Join(Cℓ)
3 procedure release(t, ℓ)
4   Cℓ.MonotoneCopy(Ct)

2. Indirect monotonicity: If v is a child of u and v.aclk ≤ C′.Get(u.tid), then for every descendant w of v we have w.clk ≤ C′.Get(w.tid).

The following lemma establishes the correctness of Algorithm 3 and follows immediately from the above invariants.

Lemma 4.3. The following assertions hold.
1. After Algorithm 3 processes an event ⟨t, acq(ℓ)⟩ we have Ct = Ct ⊔ Cℓ.
2. After Algorithm 3 processes an event ⟨t, rel(ℓ)⟩ we have Cℓ = Ct.

Data structure optimality. Just like vector clocks, computing HB with tree clocks takes O(n · k) time in the worst case. However, as we have seen, tree clocks can take sublinear time for performing join and copy operations, whereas vector clocks always require time proportional to the size of the vector (i.e., O(k)). A natural question arises: is there a more efficient data structure than tree clocks? More generally, what is the most efficient data structure for the HB algorithm to represent vector times?

To answer this question, we next define vector-time work, which intuitively gives a lower bound on the number of data structure operations the HB algorithm has to perform regardless of the actual data structure used to store vector times. Then, we show that tree clocks achieve this lower bound, hence yielding a notion of optimality for Algorithm 3.

Vector-time work. Consider the general HB algorithm (i.e., Algorithm 1) and let D = {C1, C2, ..., Cm} be the set of the vector-time data structures used. Consider the execution of the algorithm on an input trace σ. Given a number 1 ≤ i ≤ |σ|, we let C_j^i denote the vector time represented by Cj after the algorithm has processed the i-th event of σ. We define the vector-time work (or vt-work, for short) on σ as

VTWork(σ) = Σ_{1 ≤ i ≤ |σ|} Σ_j |{t ∈ Thrds : C_j^{i−1}(t) ≠ C_j^i(t)}|.

In words, for every event processed by the algorithm we add the number of vector-time entries that change as a result of processing the event. Thus, VTWork(σ) counts the total number of entries of vector times that have been updated in the overall course of the algorithm. Note that vt-work is independent of the actual data structure used to represent each Cj, and satisfies the following inequality

n ≤ VTWork(σ) ≤ n · k,

since with every event of σ the algorithm updates at least one entry of some Cj.

Vector-time optimality. Given an input trace σ, we denote by T_DS(σ) the time taken by the HB algorithm (Algorithm 1) to process σ using the data structure DS to store vector times. Intuitively, VTWork(σ) captures the number of times that instances of DS change state. For data structures that represent vector times explicitly, VTWork(σ) presents a natural lower bound for T_DS(σ). Hence, we say that the data structure DS is vt-optimal if T_DS(σ) = O(VTWork(σ)). It is not hard to see that vector clocks are not vt-optimal: taking DS = VC to be the vector clock data structure, one can construct simple traces σ where VTWork(σ) = O(n) but T_VC(σ) = O(n · k), and thus the running time is k times more than the vt-work that must be performed on σ. In contrast, we will show that tree clocks are vt-optimal, i.e., when DS = TC is the tree clock data structure, we always have T_TC(σ) = O(VTWork(σ)).

First remote acquires. Consider a trace σ and a lock-release event e = ⟨t, rel(ℓ)⟩ of σ, such that there exists a later acquire event e′ = ⟨t′, acq(ℓ)⟩ (e <σtr e′) performed by a different thread t′ ≠ t; the first such event e′ is called the first remote acquire of e.

SHB is the smallest partial order satisfying:
1. ≤σHB ⊆ ≤σSHB.
2. For every read event r, we have lwσ(r) ≤σSHB r.

Algorithm for SHB. Similarly to HB, the SHB partial order can be computed by a single pass of the input trace σ using vector times [25]. The SHB algorithm processes synchronization events (i.e., acq(ℓ)/rel(ℓ)) in the same manner as HB (Algorithm 1). In addition, for each variable x, the algorithm maintains one additional data structure LWx that stores the vector time of the last write event on x.

Algorithm 4: SHB with tree clocks.
1 procedure acquire(t, ℓ)
2   Ct.Join(Lℓ)
3 procedure release(t, ℓ)
4   Lℓ.MonotoneCopy(Ct)
5 procedure read(t, x)
6   Ct.Join(LWx)
7 procedure write(t, x)
8   LWx.CopyCheckMonotone(Ct)  // Deep or Monotone
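The handlers of Algorithm 4 can be rendered compactly as follows. This is our sketch of its structure: it uses map-based vector clocks in place of tree clocks, advances the writer's local clock at each write as one common convention, and uses a plain copy where Algorithm 4 uses CopyCheckMonotone.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the SHB event handlers (our illustration of Algorithm 4's
// structure, using plain map-based vector clocks instead of tree clocks).
class ShbSketch {
    final Map<String, Map<String, Integer>> clockOf = new HashMap<>();   // C_t
    final Map<String, Map<String, Integer>> lockClock = new HashMap<>(); // L_l
    final Map<String, Map<String, Integer>> lastWrite = new HashMap<>(); // LW_x

    private Map<String, Integer> clock(Map<String, Map<String, Integer>> m,
                                       String id) {
        return m.computeIfAbsent(id, k -> new HashMap<>());
    }

    private static void join(Map<String, Integer> dst, Map<String, Integer> src) {
        src.forEach((t, c) -> dst.merge(t, c, Math::max));
    }

    void acquire(String t, String l) { join(clock(clockOf, t), clock(lockClock, l)); }

    void release(String t, String l) {
        lockClock.put(l, new HashMap<>(clock(clockOf, t)));  // L_l <- C_t (copy)
    }

    // read(t, x): the reader is ordered after the last write on x.
    void read(String t, String x) { join(clock(clockOf, t), clock(lastWrite, x)); }

    // write(t, x): record the writer's current time as LW_x.
    void write(String t, String x) {
        Map<String, Integer> c = clock(clockOf, t);
        c.merge(t, 1, Integer::sum);                 // advance local time (convention)
        lastWrite.put(x, new HashMap<>(c));
    }
}
```

A write followed by a read of the same variable orders the reader after the writer, which is exactly condition 2 of the SHB definition.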

The SHB partial order. Schedulably-happens-before (SHB) is a strengthening of HB. It was introduced in [25] in the context of race detection, and has the property that for every 6 Experiments ∥σ two events e1, e2 of a trace σ, if e1 SHB e2, then σ can be ∗ soundly reordered to a trace σ that ends with e1, e2. The In this section we report on an implementation and experi- partial order SHB is defined as follows. Given a trace σ and mental evaluation of the tree clock data structure. The pri- a read event r let lwσ (r) be the last write event of σ before mary goal of these experiments is to evaluate the practical r with Variable(w) = Variable(r). Then, SHB is the smallest advantage of tree clocks over the vector clocks for keeping partial order that satisfies the following conditions. track of logical times in a concurrent program. 10 A Tree Clock Data Structure for Causal Orderings in Concurrent Executions Conference’17, July 2017, Washington, DC, USA
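For intuition, the per-event update rules of the SHB algorithm can also be sketched with plain vector clocks, which is what the tree-clock variant in Algorithm 4 accelerates. The class below is an illustrative sketch, not the paper's artifact: the names are ours, and the placement of the local-clock increments (at release and write events) is a common FastTrack-style convention that we assume here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch of the SHB update rules with flat vector clocks of size k.
// C[t] is the clock of thread t, L[l] the clock of lock l, and LW[x]
// the clock of the last write on variable x (cf. Algorithm 4).
class ShbSketch {
    final int k;                                     // number of threads
    final int[][] C;                                 // thread clocks
    final Map<String, int[]> L = new HashMap<>();    // lock clocks
    final Map<String, int[]> LW = new HashMap<>();   // last-write clocks

    ShbSketch(int k) {
        this.k = k;
        this.C = new int[k][k];
        for (int t = 0; t < k; t++) C[t][t] = 1;     // local time starts at 1
    }

    // Pointwise maximum: always touches all k entries, unlike a tree clock.
    private static void join(int[] target, int[] src) {
        for (int i = 0; i < src.length; i++) target[i] = Math.max(target[i], src[i]);
    }

    void acquire(int t, String l) { join(C[t], L.computeIfAbsent(l, x -> new int[k])); }
    void release(int t, String l) { L.put(l, Arrays.copyOf(C[t], k)); C[t][t]++; }
    void read(int t, String x)    { join(C[t], LW.computeIfAbsent(x, y -> new int[k])); }
    void write(int t, String x)   { LW.put(x, Arrays.copyOf(C[t], k)); C[t][t]++; }
}
```

In this sketch, a read joins the last-write clock into the reader's clock, which realizes condition 2 of the SHB definition (lwσ(r) ≤σ_SHB r) on top of the usual HB lock rules.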

Implementation. Our implementation is in Java and closely follows Algorithm 2. For efficiency reasons, recursive routines have been made iterative. The thread map ThrMap is implemented as a list with random access, while the Chld(u) data structure for storing the children of node u is implemented as a doubly linked list. We also implemented Djit+-style optimizations [32] for both tree and vector clocks.

Benchmarks. Our benchmark set consists of standard benchmarks found in the recent literature, mostly in the context of race detection [20, 22, 25, 26, 31, 36]. It contains several concurrent programs taken from standard benchmark suites: the IBM Contest benchmark suite [12], the Java Grande suite [47], DaCapo [3], and SIR [10]. In order to fairly compare the performance of vector clocks and tree clocks, we logged the execution traces of each benchmark program using RV-Predict [38] and ran both the VC-based and TC-based algorithms on the same trace.

Setup. Our experimental setup consists of two implementations for each of the HB and SHB algorithms, one using vector clocks and the other using tree clocks to represent vector times (Algorithm 3 and Algorithm 4). Each algorithm was given as input a trace produced from the above benchmarks, and we measured the running time to construct the respective partial order. We do not include small benchmarks where this time is below 10ms, as these cases are very simple to handle using vector clocks and the tree clock data structure is not likely to offer any advantage. As our focus is on the impact of the vector-time data structure on timestamping, we did not perform any further analysis after constructing the partial order (e.g., detecting races). Although these partial orders are useful for various analyses, we remark that any analysis component will be identical under vector clocks and tree clocks, and thus does not contribute to the comparison of the two data structures.

Table 1. Running times for HB and SHB using vector clocks (VC) and tree clocks (TC). The geometric means of the speedups are GMeanHB = 2 and GMeanSHB = 1.75.

Benchmark     n      k    v      HB VC   HB TC   SHB VC   SHB TC
jigsaw        3.1M   12   103K   0.06s   0.07s   1.09s    0.73s
allocvector   29K    103  10K    0.07s   0.03s   0.29s    0.25s
startup       31M    5    307K   0.07s   0.05s   5.39s    2.34s
ftpserver     49K    12   5.5K   0.08s   0.05s   0.10s    0.12s
batik         157M   7    4.9M   0.09s   0.09s   42.73s   32.85s
xml           42M    43   1.9M   0.09s   0.10s   16.25s   4.34s
derby         1.4M   5    185K   0.10s   0.11s   0.51s    0.67s
biojava       221M   4    121K   0.12s   0.08s   40.23s   22.15s
elevator      211K   6    725    0.14s   0.09s   0.18s    0.21s
luindex       397M   3    2.5M   0.18s   0.15s   1m8s     49.36s
lusearch      217M   8    5.2M   0.19s   0.16s   51.69s   28.46s
cryptorsa     58M    9    1.7M   0.19s   0.15s   18.21s   6.68s
bbuffer       504K   16   34     0.21s   0.07s   0.37s    0.25s
zxing         546M   15   37M    0.21s   0.11s   4m33s    2m33s
chinserts     24M    4    5.8M   0.29s   0.12s   9.84s    11.22s
sor           606M   5    1.0M   0.31s   0.23s   2m47s    2m44s
chadddelete   8.0M   23   2.6M   0.31s   0.19s   3.76s    10.51s
hsqldb        18M    44   945K   0.34s   0.13s   9.76s    3.92s
eclipse       90M    15   10M    0.46s   0.27s   31.23s   20.27s
derby         471M   3    20M    0.52s   0.39s   1m27s    1m10s
tomcat        49M    48   6.1M   0.59s   0.31s   36.57s   20.56s
tradesoap     39M    221  2.8M   0.61s   0.15s   1m22s    13.79s
xalan         122M   7    4.4M   0.64s   0.39s   32.56s   21.62s
bufwriter     11M    7    56     0.74s   0.20s   2.18s    0.90s
tradebeans    39M    222  2.8M   0.93s   0.14s   1m20s    19.20s
chiterator    14M    16   3.0M   0.95s   0.34s   5.50s    4.08s
crypto        294M   43   13M    2.30s   0.63s   2m24s    41.75s
cassandra     259M   173  9.9M   37.43s  2.95s   7m41s    53.30s
raxextended   304M   26   48     49.94s  10.10s  1m11s    21.76s
Totals        -      -    -      1m38s   17.84s  29m7s    13m0s

Running times. The running times for vector clocks and tree clocks are shown in Table 1. We see that SHB requires significantly more time to construct than HB, for both vector clocks and tree clocks. This is expected, as HB only orders synchronization events (i.e., release/acquire events) across threads, while SHB also orders memory access events. We observe that tree clocks incur a significant speedup over vector clocks for both partial orders in most benchmarks. In the case of HB, the whole benchmark set is processed approximately 5.5 times faster using tree clocks, while for SHB it is processed 2.24 times faster. The speedup for SHB is smaller than for HB. This is expected, because SHB uses as many vector/tree clocks as there are variables. However, for some of these variables, the corresponding vector/tree clock is only used in a few join and copy operations. As the tree clock is a heavier data structure, the number of these operations is not large enough to balance the initialization overhead of tree clocks. Nevertheless, overall tree clocks deliver a generous speedup to both HB and SHB.

To get a better aggregate representation of the advantage of tree clocks, we have also computed the geometric mean of the ratios of vector-clock times over tree-clock times, averaged over all benchmarks. For HB, the speedup is GMeanHB = 2, while for SHB the speedup is GMeanSHB = 1.75. These numbers show that just by replacing vector clocks with tree clocks, the running time reduces on average to 50% for HB and to 57% for SHB, regardless of the total running time of the benchmark. These are significant reductions, especially coming from using a new data structure without any
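The aggregate numbers above are plain geometric means of the per-benchmark speedup ratios. A small sketch of that computation (the sample ratios below are illustrative placeholders, not the measured data):

```java
// Geometric mean of per-benchmark speedup ratios (VC time / TC time).
class GeoMean {
    static double geoMean(double[] ratios) {
        double logSum = 0.0;
        for (double r : ratios) logSum += Math.log(r);   // average in log-space
        return Math.exp(logSum / ratios.length);
    }

    public static void main(String[] args) {
        double[] hbRatios = {1.0, 2.0, 4.0};             // hypothetical VC/TC ratios
        double g = geoMean(hbRatios);                    // cbrt(1*2*4) = 2.0
        // A geometric-mean speedup of 2 means the TC time is, on average,
        // 100%/2 = 50% of the VC time.
        System.out.printf("GMean = %.2f, avg TC time = %.0f%% of VC%n", g, 100.0 / g);
        // prints: GMean = 2.00, avg TC time = 50% of VC
    }
}
```

The geometric mean is used (rather than an arithmetic mean) so that each benchmark contributes equally, regardless of its absolute running time.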

Figure 7. Comparing the number of operations (vt-work) performed using vector clocks and tree clocks in HB (y-axis is log-scale).

attempt to optimize other aspects, such as the algorithm that constructs the respective partial order.

Deep and monotone copies in SHB. Recall that in SHB with tree clocks, the processing of a write event leads to a CopyCheckMonotone operation, which might resolve to a deep copy between the tree clocks instead of a monotone one (Algorithm 4). If the number of deep copies is large, the advantage of tree clocks is lost, as the operation touches the whole data structure. Here we have evaluated our expectation that the frequency of deep copies is negligible compared to monotone copies. Indeed, we have seen that in all benchmarks, the number of deep copies is many orders of magnitude smaller than the number of monotone copies. For a few representative numbers, in the case of zxing, CopyCheckMonotone resolved 137M times to a monotone copy and only 43 times to a deep copy. In the case of hsqldb, the corresponding numbers are 2.3M and 25. Analogous observations hold for all other benchmarks.

Comparison with vt-work. We also investigate the total number of entries updated using each of the data structures. Recall that the metric VTWork(σ) (Section 4) measures the minimum number of updates that any implementation of the vector time must perform when computing a partial order. We can likewise define the metrics TCWork(σ) and VCWork(σ) as the number of entries updated when processing each event using, respectively, tree clocks and vector clocks. These metrics are visualized in Figure 7 for our benchmark suite and accurately explain the performance improvement of tree clocks over vector clocks (Table 1). The figure shows that the VCWork(σ)/VTWork(σ) ratio is often considerably large. In contrast, the ratio TCWork(σ)/VTWork(σ) is typically significantly smaller. The differences in running times between vector and tree clocks are a direct reflection of the discrepancies between TCWork(·) and VCWork(·). In fact, the benchmarks which have the highest ratios VCWork(σ)/TCWork(σ) also show a correspondingly high speedup (cassandra, tradebeans, hsqldb and raxextended). Next, we remark that TCWork(σ) is always at most a constant factor larger than VTWork(σ); our proof of Theorem 4.5 implies that TCWork(σ) ≤ 3 · VTWork(σ). The ratios in Figure 7 confirm this theoretical bound. Interestingly, for the benchmark bufwriter, we have TCWork(σ) ≃ 2.99 · VTWork(σ), i.e., this benchmark pushes tree clocks to their worst performance relative to vt-work.

7 Related Work and Conclusion

Here we discuss related work and potential applications of tree clocks in other analyses.

Other partial orders and tree clocks. As we have mentioned in the introduction, besides HB and SHB, many other partial orders are computed by dynamic analyses using vector clocks. In such cases, tree clocks can replace vector clocks either partially or completely, sometimes requiring small extensions to the data structure as presented here. To foster future research, we touch on these points here.

One such example is the DC partial order [36]. Although tree clocks can be directly applied here, a naive application might not be the most efficient. The reason is that the underlying algorithm maintains various queues which store copies of vector clocks. For a more efficient approach, one has to consider ways to alleviate the cost of deep copies of tree clocks. Another example is the WCP partial order [22]. The challenge here is that WCP does not contain the thread order, and hence, at first glance, the monotonicity and transitivity properties of tree clocks fail. However, because WCP composes with HB, these two properties only fail for the root of the tree clock, and apply as usual from the first level on. Thus, the join and monotone copy operations have to be adapted to account for this anomaly on the root. The SDP partial order has a similar flavor to WCP, but incurs fewer orderings [19]. Here, some tree clocks have to be further generalized to forest-like clocks that have multiple roots. Although such extensions are beyond the scope of this paper, they serve as fertile ground for future work.

Speeding up dynamic analyses. Vector-clock based dynamic race detection is known to be slow [39], which many prior works have aimed to mitigate. One of the most prominent performance bottlenecks is the linear dependence of the size of vector timestamps on the number of threads. Despite theoretical limits [6], prior research exploits special structure in traces [1, 7, 9, 13, 49] that enables succinct vector time representations. The Goldilocks [11] algorithm infers HB-orderings using locksets instead of vector timestamps, but incurs severe slowdown [16]. The FastTrack [16] optimization uses epochs for maintaining succinct access histories, and our work complements this optimization: tree clocks offer optimizations for the remaining clocks (thread and lock clocks). Other optimizations in clock representations are catered towards dynamic thread creation [33, 34, 51]. Another major source of slowdown is program instrumentation and expensive metadata synchronization. Several approaches have attempted to minimize this slowdown, including hardware assistance [8, 55], hybrid race detection [30, 53] based on the lockset principle [42], static analysis [17, 35], or sophisticated ownership protocols [5, 37, 52].

Conclusion. In this work we have introduced tree clocks, a new data structure for maintaining logical times in concurrent executions. In contrast to standard vector clocks, tree clocks can perform join and copy operations in sublinear time, thereby avoiding the traditional overhead of these operations when possible. Moreover, we have shown that tree clocks are vector-time optimal for computing the HB partial order. Finally, our experiments show that tree clocks effectively reduce the running time for computing the HB and SHB partial orders by significant factors, and thus offer a promising alternative to vector clocks in future research.


References

[1] Kunal Agrawal, Joseph Devietti, Jeremy T. Fineman, I-Ting Angelina Lee, Robert Utterback, and Changming Xu. 2018. Race Detection and Reachability in Nearly Series-Parallel DAGs. In SODA '18. SIAM, 156–171.
[2] Swarnendu Biswas, Jipeng Huang, Aritra Sengupta, and Michael D. Bond. 2014. DoubleChecker: Efficient Sound and Precise Atomicity Checking. In PLDI '14. ACM, 28–39. https://doi.org/10.1145/2594291.2594323
[3] Stephen M. Blackburn, Robin Garner, Chris Hoffmann, Asjad M. Khang, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony Hosking, Maria Jump, Han Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanović, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In OOPSLA '06.
[4] Hans-J. Boehm. 2011. How to Miscompile Programs with "Benign" Data Races. In HotPar '11. USENIX Association.
[5] Michael D. Bond, Milind Kulkarni, Man Cao, Minjia Zhang, Meisam Fathi Salmi, Swarnendu Biswas, Aritra Sengupta, and Jipeng Huang. 2013. OCTET: Capturing and Controlling Cross-Thread Dependences Efficiently. In OOPSLA '13. ACM, 693–712. https://doi.org/10.1145/2509136.2509519
[6] Bernadette Charron-Bost. 1991. Concerning the size of logical clocks in distributed systems. Inform. Process. Lett. 39, 1 (1991), 11–16. https://doi.org/10.1016/0020-0190(91)90055-M
[7] Guang-Ien Cheng, Mingdong Feng, Charles E. Leiserson, Keith H. Randall, and Andrew F. Stark. 1998. Detecting Data Races in Cilk Programs That Use Locks. In SPAA '98. ACM, 298–309.
[8] Joseph Devietti, Benjamin P. Wood, Karin Strauss, Luis Ceze, Dan Grossman, and Shaz Qadeer. 2012. RADISH: Always-on Sound and Complete Race Detection in Software and Hardware. In ISCA '12. IEEE, 201–212.
[9] Dimitar Dimitrov, Martin Vechev, and Vivek Sarkar. 2015. Race Detection in Two Dimensions. In SPAA '15. ACM, 101–110. https://doi.org/10.1145/2755573.2755601
[10] Hyunsook Do, Sebastian G. Elbaum, and Gregg Rothermel. 2005. Supporting Controlled Experimentation with Testing Techniques: An Infrastructure and its Potential Impact. Empirical Software Engineering 10, 4 (2005), 405–435.
[11] Tayfun Elmas, Shaz Qadeer, and Serdar Tasiran. 2007. Goldilocks: A Race and Transaction-aware Java Runtime. In PLDI '07. ACM, 245–255. https://doi.org/10.1145/1250734.1250762
[12] Eitan Farchi, Yarden Nir, and Shmuel Ur. 2003. Concurrent Bug Patterns and How to Test Them. In IPDPS '03. IEEE. http://dl.acm.org/citation.cfm?id=838237.838485
[13] Mingdong Feng and Charles E. Leiserson. 1997. Efficient Detection of Determinacy Races in Cilk Programs. In SPAA '97. ACM, 1–11. https://doi.org/10.1145/258492.258493
[14] Colin Fidge. 1991. Logical Time in Distributed Computing Systems. Computer 24, 8 (Aug. 1991), 28–33. https://doi.org/10.1109/2.84874
[15] Colin J. Fidge. 1988. Timestamps in message-passing systems that preserve the partial ordering. In Proc. 11th Australian Computer Science Conference. 56–66.
[16] Cormac Flanagan and Stephen N. Freund. 2009. FastTrack: Efficient and Precise Dynamic Race Detection. In PLDI '09. ACM, 121–133. https://doi.org/10.1145/1542476.1542490
[17] Cormac Flanagan and Stephen N. Freund. 2013. RedCard: Redundant Check Elimination for Dynamic Race Detectors. In ECOOP 2013. Springer, 255–280.
[18] Cormac Flanagan, Stephen N. Freund, and Jaeheon Yi. 2008. Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs. In PLDI '08. ACM, 293–303. https://doi.org/10.1145/1375581.1375618
[19] Kaan Genç, Jake Roemer, Yufan Xu, and Michael D. Bond. 2019. Dependence-Aware, Unbounded Sound Predictive Race Detection. In OOPSLA 2019. To appear.
[20] Jeff Huang, Patrick O'Neil Meredith, and Grigore Rosu. 2014. Maximal Sound Predictive Race Detection with Control Flow Abstraction. In PLDI '14. ACM, 337–348. https://doi.org/10.1145/2594291.2594315
[21] Ayal Itzkovitz, Assaf Schuster, and Oren Zeev-Ben-Mordehai. 1999. Toward Integration of Data Race Detection in DSM Systems. J. Parallel Distrib. Comput. 59, 2 (Nov. 1999), 180–203. https://doi.org/10.1006/jpdc.1999.1574
[22] Dileep Kini, Umang Mathur, and Mahesh Viswanathan. 2017. Dynamic Race Prediction in Linear Time. In PLDI 2017. ACM, 157–170. https://doi.org/10.1145/3062341.3062374
[23] Leslie Lamport. 1978. Time, Clocks, and the Ordering of Events in a Distributed System. Commun. ACM 21, 7 (July 1978), 558–565. https://doi.org/10.1145/359545.359563
[24] Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. 2008. Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics. In ASPLOS XIII. ACM, 329–339. https://doi.org/10.1145/1346281.1346323
[25] Umang Mathur, Dileep Kini, and Mahesh Viswanathan. 2018. What Happens-after the First Race? Enhancing the Predictive Power of Happens-before Based Dynamic Race Detection. Proc. ACM Program. Lang. 2, OOPSLA, Article 145 (Oct. 2018), 29 pages. https://doi.org/10.1145/3276515
[26] Umang Mathur, Andreas Pavlogiannis, and Mahesh Viswanathan. 2021. Optimal Prediction of Synchronization-Preserving Races. In POPL '21. To appear.
[27] Umang Mathur and Mahesh Viswanathan. 2020. Atomicity Checking in Linear Time Using Vector Clocks. In ASPLOS '20. ACM, 183–199. https://doi.org/10.1145/3373376.3378475
[28] Friedemann Mattern. 1989. Virtual Time and Global States of Distributed Systems. In Parallel and Distributed Algorithms. Elsevier, 215–226.
[29] Madanlal Musuvathi, Shaz Qadeer, Thomas Ball, Gerard Basler, Piramanayagam Arumuga Nainar, and Iulian Neamtiu. 2008. Finding and Reproducing Heisenbugs in Concurrent Programs. In OSDI '08. USENIX Association, 267–280. http://dl.acm.org/citation.cfm?id=1855741.1855760
[30] Robert O'Callahan and Jong-Deok Choi. 2003. Hybrid Dynamic Data Race Detection. SIGPLAN Not. 38, 10 (June 2003), 167–178. https://doi.org/10.1145/966049.781528
[31] Andreas Pavlogiannis. 2019. Fast, Sound, and Effectively Complete Dynamic Race Prediction. Proc. ACM Program. Lang. 4, POPL, Article 17 (Dec. 2019), 29 pages. https://doi.org/10.1145/3371085
[32] Eli Pozniansky and Assaf Schuster. 2003. Efficient On-the-fly Data Race Detection in Multithreaded C++ Programs. SIGPLAN Not. 38, 10 (June 2003), 179–190. https://doi.org/10.1145/966049.781529
[33] Raghavan Raman, Jisheng Zhao, Vivek Sarkar, Martin Vechev, and Eran Yahav. 2012. Efficient data race detection for async-finish parallelism. Formal Methods in System Design 41, 3 (2012), 321–347. https://doi.org/10.1007/s10703-012-0143-7
[34] Veselin Raychev, Martin Vechev, and Manu Sridharan. 2013. Effective Race Detection for Event-Driven Programs. In OOPSLA '13. ACM, 151–166. https://doi.org/10.1145/2509136.2509538
[35] Dustin Rhodes, Cormac Flanagan, and Stephen N. Freund. 2017. BigFoot: Static Check Placement for Dynamic Race Detection. In PLDI 2017. ACM, 141–156. https://doi.org/10.1145/3062341.3062350
[36] Jake Roemer, Kaan Genç, and Michael D. Bond. 2018. High-coverage, Unbounded Sound Predictive Race Detection. In PLDI 2018. ACM, 374–389. https://doi.org/10.1145/3192366.3192385
[37] Jake Roemer, Kaan Genç, and Michael D. Bond. 2020. SmartTrack: Efficient Predictive Race Detection. In PLDI 2020. ACM, 747–762. https://doi.org/10.1145/3385412.3385993
[38] Grigore Rosu. 2018. RV-Predict, Runtime Verification. Accessed: 2018-04-01.
[39] Caitlin Sadowski and Jaeheon Yi. 2014. How Developers Use Data Race Detection Tools. In PLATEAU '14. ACM, 43–51. https://doi.org/10.1145/2688204.2688205
[40] Mahmoud Said, Chao Wang, Zijiang Yang, and Karem Sakallah. 2011. Generating Data Race Witnesses by an SMT-based Analysis. In NFM '11. Springer, 313–327. http://dl.acm.org/citation.cfm?id=1986308.1986334
[41] Malavika Samak and Murali Krishna Ramanathan. 2014. Trace Driven Dynamic Deadlock Detection and Reproduction. In PPoPP '14. ACM, 29–42. https://doi.org/10.1145/2555243.2555262
[42] Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, and Thomas Anderson. 1997. Eraser: A Dynamic Data Race Detector for Multithreaded Programs. ACM Trans. Comput. Syst. 15, 4 (Nov. 1997), 391–411. https://doi.org/10.1145/265924.265927
[43] Reinhard Schwarz and Friedemann Mattern. 1994. Detecting causal relationships in distributed computations: In search of the holy grail. Distributed Computing 7, 3 (1994), 149–174.
[44] Konstantin Serebryany and Timur Iskhodzhanov. 2009. ThreadSanitizer: Data Race Detection in Practice. In WBIA '09.
[45] Yao Shi, Soyeon Park, Zuoning Yin, Shan Lu, Yuanyuan Zhou, Wenguang Chen, and Weimin Zheng. 2010. Do I Use the Wrong Definition? DeFuse: Definition-use Invariants for Detecting Concurrency and Sequential Bugs. In OOPSLA '10. ACM, 160–174. https://doi.org/10.1145/1869459.1869474
[46] Yannis Smaragdakis, Jacob Evans, Caitlin Sadowski, Jaeheon Yi, and Cormac Flanagan. 2012. Sound Predictive Race Detection in Polynomial Time. In POPL '12. ACM, 387–400. https://doi.org/10.1145/2103656.2103702
[47] L. A. Smith, J. M. Bull, and J. Obdrzálek. 2001. A Parallel Java Grande Benchmark Suite. In SC '01. ACM. https://doi.org/10.1145/582034.582042
[48] Martin Sulzmann and Kai Stadtmüller. 2018. Two-Phase Dynamic Analysis of Message-Passing Go Programs Based on Vector Clocks. In PPDP '18. ACM, Article 22, 13 pages. https://doi.org/10.1145/3236950.3236959
[49] Rishi Surendran and Vivek Sarkar. 2016. Dynamic determinacy race detection for task parallelism with futures. In RV 2016. Springer, 368–385.
[50] Tengfei Tu, Xiaoyu Liu, Linhai Song, and Yiying Zhang. 2019. Understanding Real-World Concurrency Bugs in Go. In ASPLOS '19. ACM, 865–878. https://doi.org/10.1145/3297858.3304069
[51] Xinli Wang, J. Mayo, W. Gao, and J. Slusser. 2006. An Efficient Implementation of Vector Clocks in Dynamic Systems. In PDPTA '06.
[52] Benjamin P. Wood, Man Cao, Michael D. Bond, and Dan Grossman. 2017. Instrumentation Bias for Dynamic Data Race Detection. Proc. ACM Program. Lang. 1, OOPSLA, Article 69 (Oct. 2017), 31 pages. https://doi.org/10.1145/3133893
[53] Yuan Yu, Tom Rodeheffer, and Wei Chen. 2005. RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking. SIGOPS Oper. Syst. Rev. 39, 5 (Oct. 2005), 221–234. https://doi.org/10.1145/1095809.1095832
[54] M. Zhivich and R. K. Cunningham. 2009. The Real Cost of Software Errors. IEEE Security and Privacy 7, 2 (March 2009), 87–90. https://doi.org/10.1109/MSP.2009.56
[55] P. Zhou, R. Teodorescu, and Y. Zhou. 2007. HARD: Hardware-Assisted Lockset-based Race Detection. In HPCA 2007. IEEE, 121–132. https://doi.org/10.1109/HPCA.2007.346191


A Proofs

Lemma 4.1 (Monotonicity of copies). Whenever Algorithm 1 processes a lock-release event ⟨t, rel(ℓ)⟩, we have C_ℓ ⊑ C_t.

Proof. Consider a trace σ, a release event rel(ℓ), and let acq(ℓ) be the matching acquire event. When acq(ℓ) is processed, the algorithm performs C_t ← C_t ⊔ C_ℓ, and thus C_ℓ ⊑ C_t after this operation. By lock semantics, there exists no release event rel′(ℓ) such that acq(ℓ) <_σ rel′(ℓ) <_σ rel(ℓ), hence C_ℓ is not modified until rel(ℓ) is processed. Since vector clock entries are never decremented, when rel(ℓ) is processed we still have C_ℓ ⊑ C_t, as desired. □

Lemma 4.2. Consider any tree clock C and node u of C.T. For any tree clock C′, the following assertions hold.
1. Direct monotonicity: If u.clk ≤ C′.Get(u.tid), then for every descendant w of u we have w.clk ≤ C′.Get(w.tid).
2. Indirect monotonicity: If v is a child of u and v.aclk ≤ C′.Get(u.tid), then for every descendant w of v we have w.clk ≤ C′.Get(w.tid).

Proof. First, note that after initialization u has no children, hence each statement is trivially true. Now assume that both statements hold when the algorithm processes an event e; we show that they both hold after the algorithm has processed e. We distinguish cases based on the type of e.

e = ⟨t, acq(ℓ)⟩. The algorithm performs the operation C_t.Join(L_ℓ), hence the only tree clock modified is C_t, and thus it suffices to examine the cases that C_t is C and C_t is C′.

1. C_t is C. First consider the case that u = C_t.T.root. Observe that u.clk > C′.Get(u.tid), and thus Item 1 holds trivially. For Item 2, we distinguish cases based on whether v.clk has progressed by the Join operation. If yes, then we have v.aclk = u.clk, and the statement holds trivially for the same reason as in Item 1. Otherwise, for every descendant w of v, the clock w.clk has not progressed by the Join operation, hence the statement holds by the induction hypothesis on C_t. Now consider the case that u ≠ C_t.T.root. If u.clk has not progressed by the Join operation, then each statement holds by the induction hypothesis on C_t. Otherwise, using the induction hypothesis one can show that for every descendant w of u, there exists a node w_ℓ of L_ℓ that is a descendant of a node u_ℓ such that w_ℓ.tid = w.tid and u_ℓ.tid = u.tid. Then, each statement holds by the induction hypothesis on L_ℓ.

2. C_t is C′. For Item 1, if u.clk ≤ C′.Get(u.tid) holds before the Join operation, then the statement holds by the induction hypothesis, since Join does not decrease the clocks of C_t. Otherwise, the statement follows by the induction hypothesis on L_ℓ. The analysis for Item 2 is similar. The desired result follows.

e = ⟨t, rel(ℓ)⟩. The algorithm performs the operation L_ℓ.MonotoneCopy(C_t). After this operation L_ℓ is isomorphic to C_t, and both statements follow by the induction hypothesis on C_t. □

Lemma 4.3. The following assertions hold.
1. After Algorithm 3 processes an event ⟨t, acq(ℓ)⟩, we have C_t = C_t ⊔ L_ℓ.
2. After Algorithm 3 processes an event ⟨t, rel(ℓ)⟩, we have C_ℓ = C_t.

Proof. The lemma follows directly from Lemma 4.2. In each case, if the clock of a node w of the tree clock that performs the corresponding operation (i.e., Join for an event ⟨t, acq(ℓ)⟩ and MonotoneCopy for ⟨t, rel(ℓ)⟩) does not progress, then we are guaranteed that w.clk is not smaller than the time of thread w.tid in the tree clock that is passed as an argument to the operation. □

Lemma 4.4. Consider the execution of Algorithm 3 on a trace σ. For every tree clock C_i and node u of C_i.T other than the root, the following assertions hold.
1. u points to a lock-release event rel(ℓ).
2. rel(ℓ) has a first remote acquire acq(ℓ), and (v.tid, u.aclk) points to acq(ℓ), where v is the parent of u in C_i.T.

Proof. The lemma follows by a straightforward induction on σ. □

Theorem 4.5 (Tree-clock Optimality). For any input trace σ, we have T_TC(σ) = O(VTWork(σ)).

Proof. Consider a critical section of a thread t on lock ℓ, marked by two events acq(ℓ), rel(ℓ). We define the following vector times.
1. V_t^1 and V_t^2 are the vector times of C_t right before and right after acq(ℓ) is processed, respectively.
2. V_ℓ^1 is the vector time of C_ℓ right before acq(ℓ) is processed.
3. V_t^3 is the vector time of C_t right before rel(ℓ) is processed.
4. V_ℓ^3 and V_ℓ^4 are the vector times of C_ℓ right before and right after rel(ℓ) is processed, respectively.
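The four snapshots just defined can be made concrete with a toy critical section. In the following sketch, the flat-list vector times and the helper `join` are our own illustration (not the paper's tree-clock algorithms); the point is that the number of entries that actually change at the acquire and at the release is exactly the vt-work that the proof charges to the two tree-clock operations.

```python
# A toy critical section over k = 4 threads, illustrating the four
# vector-time snapshots around acq(l)/rel(l).  List representation and
# helper names are illustrative assumptions, not the paper's code.

def join(v1, v2):
    """Pointwise maximum of two vector times."""
    return [max(a, b) for a, b in zip(v1, v2)]

Vt1 = [3, 0, 1, 0]    # V_t^1: C_t right before acq(l)
Vl1 = [1, 2, 1, 0]    # V_l^1: C_l right before acq(l)
Vt2 = join(Vt1, Vl1)  # V_t^2: C_t right after acq(l) = [3, 2, 1, 0]

Vt3 = [4, 2, 1, 0]    # V_t^3: C_t right before rel(l); note V_t^1 <= V_t^3
Vl3 = Vl1             # V_l^3 = V_l^1: no other release touched C_l meanwhile
Vl4 = list(Vt3)       # V_l^4: C_l right after rel(l) (copy of C_t)

# vt-work: entries that actually changed at each operation.
WJ = sum(1 for a, b in zip(Vt1, Vt2) if a != b)  # 1 (only thread 1 advanced)
WC = sum(1 for a, b in zip(Vl3, Vl4) if a != b)  # 1 (only thread 0 advanced)
print(WJ, WC)  # 1 1
```

A plain vector-clock implementation pays Θ(k) = 4 per operation here even though the vt-work of each operation is 1; the theorem shows that tree clocks meet the vt-work bound asymptotically.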

First, note that (i) V_t^1 ⊑ V_t^3, and (ii) due to lock semantics, we have V_ℓ^3 = V_ℓ^1. Let W = W_J + W_C, where

    W_J = |{t′ : V_t^2(t′) ≠ V_t^1(t′)}|   and   W_C = |{t′ : V_ℓ^4(t′) ≠ V_ℓ^3(t′)}|,

i.e., W_J and W_C are the vt-work for handling acq(ℓ) and rel(ℓ), respectively. Let T_J be the time spent in C_t.Join due to acq(ℓ). Similarly, let T_C be the time spent in C_ℓ.MonotoneCopy due to rel(ℓ). We will argue that T_J = O(W_J) and T_C = O(W_C), and thus T_J + T_C = O(W). Note that this proves the theorem, simply by summing over all critical sections of σ.

We start with T_J. Observe that the time spent in this operation is proportional to the number of times the loop in Line 37 is executed, i.e., the number of nodes v′ that the loop iterates over. Consider the if statement in Line 38. If Get(v′.tid) < v′.clk, then we have V_t^2(v′.tid) > V_t^1(v′.tid), and thus this iteration is accounted for in W_J. On the other hand, if Get(v′.tid) > v′.clk, then we have V_t^1(v′.tid) > V_ℓ^1(v′.tid). Due to (i) and (ii) above, we have V_ℓ^4(v′.tid) > V_ℓ^3(v′.tid), and thus this iteration is accounted for in W_C. Finally, consider the case that Get(v′.tid) = v′.clk, and let v be the node of C_t such that v.tid = v′.tid. There can be at most one such v that is not the root of C_t. For every other such v, let u = Prnt(v). Note that v′ is not the root of C_ℓ, and let u′ = Prnt(v′). Let rel′(ℓ) be the lock-release event that v and v′ point to. By Lemma 4.4, rel′(ℓ) has a first remote acquire acq′(ℓ) such that (i) u.tid = u′.tid = t′, where t′ is the thread of acq′(ℓ), and (ii) v.aclk is the local clock of acq′(ℓ). Since getUpdatedNodesForCopy examines v′, we must have u′.clk > u.clk. In turn, we have u.clk ≥ v.aclk, and thus u′.clk > v.aclk. Hence, due to Line 41, u′ can have at most one child v′ with v′.clk = Get(v′.tid). Thus, we can account for the time of this case in W_J. Hence, T_J = O(W_J), as desired.

We now turn our attention to T_C. Similarly to the previous case, the time spent in this operation is proportional to the number of times the loop in Line 66 is executed. Consider the if statement in Line 67. If Get(v′.tid) < v′.clk, then we have V_ℓ^4(v′.tid) > V_ℓ^3(v′.tid), and thus this iteration is accounted for in W_C. Note that since the copy is monotone (Lemma 4.1), we cannot have Get(v′.tid) > v′.clk. Finally, the reasoning for the case where Get(v′.tid) = v′.clk is similar to the analysis of T_J, using Line 72 instead of Line 41. Hence, T_C = O(W_C), as desired.

The desired result follows. □
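The accounting above rests on the fact that Lemma 4.2 lets the tree-clock operations skip entire subtrees whose root carries no new information. The following is a minimal sketch of that pruning under our own simplifying assumptions: the class and function names are ours, the target clock is a plain tid-to-clock dict, and the aclk-based indirect pruning of Lines 41/72 is omitted.

```python
# Sketch of subtree pruning via direct monotonicity (Lemma 4.2, Item 1).
# Node/updated_nodes are illustrative names, not the paper's algorithms.

class Node:
    def __init__(self, tid, clk, aclk=0, children=None):
        self.tid, self.clk, self.aclk = tid, clk, aclk
        self.children = children or []

def updated_nodes(u, other):
    """Nodes of u's subtree whose time is ahead of `other` (tid -> clk).
    If `other` already knows u's time, Lemma 4.2(1) guarantees it also
    knows every descendant's time, so the whole subtree is skipped."""
    if u.clk <= other.get(u.tid, 0):
        return []  # prune: no descendant can carry newer information
    out = [u]
    for v in u.children:
        out += updated_nodes(v, other)
    return out

# `other` already knows thread 2's time, so the subtree under thread 2
# (including thread 3) is never visited:
root = Node(1, 10, children=[
    Node(2, 4, aclk=7, children=[Node(3, 2, aclk=3)]),
    Node(4, 6, aclk=5),
])
other = {1: 8, 2: 4, 3: 2, 4: 1}
print([n.tid for n in updated_nodes(root, other)])  # [1, 4]
```

Only the nodes that actually advance the target clock (plus, in the full algorithm, at most one extra child per such node) are ever touched, which is what makes the total running time proportional to the vt-work.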
