Checking Race Freedom Via Linear Programming

Tachio Terauchi
Tohoku University

Abstract

We present a new static analysis for race freedom and race detection. The analysis checks race freedom by reducing the problem to (rational) linear programming. Unlike conventional static analyses for race freedom or race detection, our analysis avoids explicit computation of locksets and lock linearity/must-aliasness. Our analysis can handle a variety of synchronization idioms that more conventional approaches often have difficulties with, such as thread joining, semaphores, and signals. We achieve efficiency by utilizing modern linear programming solvers that can quickly solve large linear programming instances. This paper reports on the formal properties of the analysis and the experience with applying an implementation to real-world C programs.

Categories and Subject Descriptors: F.3.2 [Logics and Meaning of Programs]: Semantics of Programming Languages - Program analysis; F.3.3 [Logics and Meaning of Programs]: Studies of Program Constructs - Type structure

General Terms: Algorithms, Languages, Theory, Verification

Keywords: Fractional Capabilities, Linear Programming

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PLDI'08, June 7-13, 2008, Tucson, Arizona, USA.
Copyright © 2008 ACM 978-1-59593-860-2/08/06. $5.00.

    e ::= x                       (variable)
        | n                       (integer constant)
        | let x = e1 in e2        (variable binding)
        | if * then e1 else e2    (branch)
        | while * do e            (loop)
        | newtid                  (new thread identifier)
        | spawn(e1){e2}           (new thread)
        | join e                  (thread join)
        | ref e                   (new reference cell)
        | !e                      (reference read)
        | e1 := e2                (reference write)
        | newlock                 (new lock)
        | freelock e              (free lock)
        | lock e                  (lock acquire)
        | unlock e                (lock release)

Figure 1. The syntax of the simple concurrent language.

1. Introduction

A race condition occurs when one thread writes to a memory location that another thread is concurrently writing or reading. Race freedom, the absence of race conditions, is a basic building block for developing and verifying shared-memory parallel programs, and static analysis for race freedom and race detection has been an active focus of research.

In many static (or dynamic) analyses for race freedom or race detection, the central idea is to compute locksets. A lockset is the set of locks that are always held when accessing some memory location (an abstract memory location in static analyses), such that a potential race is detected when the lockset is empty. It is important that the locks in the locksets are linear [20] (or must-alias [17]), in the sense that each lock corresponds to a unique concrete lock. Inferring locksets and lock linearity/must-aliasness statically can be non-trivial, especially in the presence of non-lexically scoped locks, to the point that analyses sometimes make optimistic approximations. Also, this approach is usually limited to locks and other lock-like synchronization idioms.

This paper presents a different approach for statically checking race freedom. The key idea is to reduce the race checking problem to a linear programming problem such that if there exists a solution to the constructed linear programming instance, then the program is guaranteed to be race free. In contrast to previous approaches, our approach only needs standard may-aliasing information, and does not require locksets and lock linearity/must-aliasness. By utilizing efficient linear programming algorithms, we achieve both good theoretical computational complexity (polynomial in the size of the program) and good practical running times, without sacrificing soundness. The approach can be extended to synchronization methods beyond simple lock-based idioms. The prototype implementation described in Section 3 handles programming idioms that other analyses often have difficulties with, such as thread joining, semaphores, signals, and read-write locks, as well as read-only accesses and local accesses.

The rest of the paper is organized as follows. Section 2 introduces the key concepts with a toy language that contains only locks and thread spawning/joining, and formally proves soundness. Section 3 discusses the implementation, LP-Race, a tool for detecting races in multithreaded C programs. Section 4 discusses related work, Section 5 discusses open issues, and Section 6 concludes.

2. Formal Aspects

Figure 1 shows a simple first-order expression language (with side effects) that we use to present the key concepts of the analysis. The language is minimized in order to focus on the novel aspects of the analysis. We briefly describe the syntax. Variables are ranged over by metavariables x, x1, etc. The construct let x = e1 in e2 binds the result of evaluating e1 to x and evaluates e2. We write e1; e2 for let x = e1 in e2 such that x is not free in e2. The language contains non-deterministic branches and loops. The construct spawn(e1){e2} creates a new thread to evaluate e2. Here, e1 is a thread identifier that can be used in join e to join threads. Multiple threads are allowed to have the same thread identifier. The language contains reference cells, which can be written and read concurrently. Finally, the language contains syntax for creating, deleting, acquiring, and releasing locks.

    (a)
    let x = ref 0 in
    spawn(newtid){!x};
    x := 1

    (b)
    let x = ref 0 in
    let t = newtid in
    spawn(t){!x};
    join t;
    x := 1

    (c)
    let x = ref 0 in
    let l = newlock in
    spawn(newtid){
      lock l; !x;
      unlock l };
    lock l;
    x := 1;
    unlock l

Figure 2. Simple race (a), and race avoidance via thread joining (b) and locking (c).

    F ::= t.E | F || p | p || F
    E ::= [ ] | let x = E in e | ref E | E := e | v := E | !E
        | lock E | unlock E | freelock E | spawn(E){e} | join E

Figure 3. The evaluation contexts.

Figure 2 (a) shows a simple example program that contains a read-write race. The spawned thread may read from the reference cell bound to the variable x while the spawner thread writes to it. Such a race can be avoided by using locks, as shown in (c), or by using join to wait until the other thread finishes, as shown in (b).

2.1 The Semantics

We formally define the semantics of the language. The semantics is defined as small-step reductions from states to states. A state is a quadruple (ι, κ, θ, p) where ι is the set of currently allocated thread identifiers, κ is the current lock state, a mapping from the currently available locks to {U, L} where U denotes that the lock is unlocked and L denotes that the lock is locked, θ is a store mapping locations to values, and p is a program state.

    --------------------------------------------------------------------------- If1
    (ι, κ, θ, F[if * then e1 else e2]) → (ι, κ, θ, F[e1])

    --------------------------------------------------------------------------- If2
    (ι, κ, θ, F[if * then e1 else e2]) → (ι, κ, θ, F[e2])

    --------------------------------------------------------------------------- Loop
    (ι, κ, θ, F[while * do e]) → (ι, κ, θ, F[if * then (e; while * do e) else 0])

    --------------------------------------------------------------------------- Let
    (ι, κ, θ, F[let x = v in e]) → (ι, κ, θ, F[e[v/x]])

    t ∉ dom(ι)
    --------------------------------------------------------------------------- NewT
    (ι, κ, θ, F[newtid]) → (ι ∪ {t}, κ, θ, F[t])

    --------------------------------------------------------------------------- Spawn
    (ι, κ, θ, F[spawn(t){e}]) → (ι, κ, θ, F[0] || t.e)

    --------------------------------------------------------------------------- Join
    (ι, κ, θ, F[join t] || t.v) → (ι, κ, θ, F[0])

    ℓ ∉ dom(θ)
    --------------------------------------------------------------------------- Ref
    (ι, κ, θ, F[ref v]) → (ι, κ, θ[ℓ ↦ v], F[ℓ])

    --------------------------------------------------------------------------- Read
    (ι, κ, θ, F[!ℓ]) → (ι, κ, θ, F[θ(ℓ)])

    --------------------------------------------------------------------------- Write
    (ι, κ, θ, F[ℓ := v]) → (ι, κ, θ[ℓ ↦ v], F[0])

    l ∉ dom(κ)
    --------------------------------------------------------------------------- NewL
    (ι, κ, θ, F[newlock]) → (ι, κ[l ↦ U], θ, F[l])

    κ = κ′ ∪ {l ↦ U}    l ∉ dom(κ′)
    --------------------------------------------------------------------------- FreeL
    (ι, κ, θ, F[freelock l]) → (ι, κ′, θ, F[0])

    κ(l) = U
    --------------------------------------------------------------------------- Lck
    (ι, κ, θ, F[lock l]) → (ι, κ[l ↦ L], θ, F[0])

    --------------------------------------------------------------------------- Ulck
    (ι, κ, θ, F[unlock l]) → (ι, κ[l ↦ U], θ, F[0])

Figure 4. The operational semantics.
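As an informal aid, the first three components of a state can be modelled with ordinary data structures. The Python below is only an illustrative sketch and is not part of the paper's formalism or of LP-Race: the class `State`, the helper names, and the string encoding of thread identifiers and locations are all assumptions made here, and the program component p is omitted. The helpers loosely mirror the NewT, Ref, Read, and Write rules of Figure 4, except that they update the state in place, whereas the rules relate an old state to a new one.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class State:
    """Hypothetical model of (iota, kappa, theta); the program state p is omitted."""
    iota: set = field(default_factory=set)       # allocated thread identifiers
    kappa: dict = field(default_factory=dict)    # lock -> "U" (unlocked) or "L" (locked)
    theta: dict = field(default_factory=dict)    # location -> value
    fresh: count = field(default_factory=count)  # assumed supply of fresh names


def new_tid(s):
    """NewT: pick a thread identifier not already in iota and record it."""
    t = f"t{next(s.fresh)}"
    assert t not in s.iota
    s.iota.add(t)
    return t


def ref(s, v):
    """Ref: allocate a location outside dom(theta), initialised to v."""
    loc = f"loc{next(s.fresh)}"
    assert loc not in s.theta
    s.theta[loc] = v
    return loc


def read(s, loc):
    """Read: the expression !loc evaluates to theta(loc)."""
    return s.theta[loc]


def write(s, loc, v):
    """Write: update theta at loc; the expression itself evaluates to 0."""
    s.theta[loc] = v
    return 0


s = State()
x = ref(s, 0)       # let x = ref 0 in ...
t = new_tid(s)      # newtid
write(s, x, 1)      # x := 1
print(read(s, x))   # prints 1
```

The sketch deliberately says nothing about interleaving: in the semantics, which thread takes the next step is determined by the evaluation contexts of Figure 3.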
Values, v, are defined as

    v ::= t | l | ℓ | n

where the symbol t ranges over thread identifiers, l ranges over locks, and ℓ ranges over locations. A program state is defined as follows.

    e ::= ··· | v
    p ::= t.e | p1 || p2

Here, e is extended with values. Intuitively, an (extended) expression prefixed by a thread identifier, t.e, denotes a thread with the thread identifier t whose current program counter is e, whereas p1 || p2 denotes a parallel composition of threads. Therefore, a program state is a parallel composition of finitely many expressions [...]

[...] needs to evaluate to a value if they terminate, and so we use 0 as the "unit" value for expressions that are evaluated purely for their effects.

NewT creates a fresh thread identifier. Spawn spawns a new thread using the given thread identifier, and Join waits for a thread with the thread identifier to finish. Ref allocates a new reference cell that can be read by Read and written by Write. NewL creates a new lock, initialized to unlocked status. If the lock is unlocked, Lck may acquire the lock, setting the lock status to locked. Ulck releases the lock, setting the lock status to unlocked. FreeL deletes a lock if the lock is unlocked¹.

We write (ι1, κ1, θ1, p1) →* (ι2, κ2, θ2, p2) for zero or more reduction steps from the state (ι1, κ1, θ1, p1) to the state (ι2, κ2, θ2, p2).
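The effect of the four lock rules on the lock state κ can be sketched as pure functions on a dictionary. This is a hedged illustration rather than the paper's definition: the function names, the use of Python dictionaries for κ, and the convention of returning `None` when a rule's premise fails (so that the thread cannot take that step) are assumptions introduced here. The `unlock` function has no check because the Ulck rule in Figure 4 carries no premise.

```python
def newlock(kappa, l):
    """NewL: enabled only if l is fresh; the new lock starts unlocked."""
    if l in kappa:
        return None
    return {**kappa, l: "U"}


def lock(kappa, l):
    """Lck: enabled only if kappa(l) = U; the lock becomes locked."""
    if kappa.get(l) != "U":
        return None
    return {**kappa, l: "L"}


def unlock(kappa, l):
    """Ulck: no premise in Figure 4; l is mapped to U."""
    return {**kappa, l: "U"}


def freelock(kappa, l):
    """FreeL: enabled only if l is currently unlocked; l is removed from kappa."""
    if kappa.get(l) != "U":
        return None
    return {k: v for k, v in kappa.items() if k != l}


k0 = newlock({}, "l")    # {"l": "U"}
k1 = lock(k0, "l")       # {"l": "L"}
blocked = lock(k1, "l")  # None: a second acquire is not enabled
k2 = unlock(k1, "l")     # {"l": "U"}
k3 = freelock(k2, "l")   # {}
```

Returning a fresh dictionary at each step keeps the old lock state available, which matches the shape of the rules: each one relates a state before the step to a state after it.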
