SnapQueue: Lock-Free Queue with Constant Time Snapshots

Aleksandar Prokopec
École Polytechnique Fédérale de Lausanne, Switzerland
[email protected]

Abstract

We introduce SnapQueues – concurrent, lock-free queues with a linearizable, lock-free global-state transition operation. This transition operation can atomically switch between arbitrary SnapQueue states, and is used by the enqueue, dequeue, snapshot and concatenation operations. We show that implementing these operations efficiently depends on the persistent data structure at the core of the SnapQueue. This immutable support data structure is an interchangeable kernel of the SnapQueue, and drives its performance characteristics. The design allows reasoning about concurrent operation running time in a functional way, absent from concurrency considerations. We present a support data structure that enables O(1) queue operations, O(1) snapshot and O(log n) atomic concurrent concatenation. We show that the SnapQueue enqueue operation achieves up to 25% higher performance, while the dequeue operation has performance identical to standard lock-free concurrent queues.

Categories and Subject Descriptors E.1 [Data Structures]: Lists, stacks and queues

Keywords queues, lock-free, concatenation, snapshots

1. Introduction

Scalability in a concurrent data structure is achieved by allowing concurrent accesses to execute independently on separate parts of the data structure. While efficient concurrent implementations exist for many different data structure types [15], efficient operations that change their global state are still largely unexplored. For example, most concurrent hash tables [7] [13], concurrent skip lists [21], and concurrent queues [14] [22] do not have an atomic snapshot operation, size retrieval or a clear operation.

Concurrent queues are used as buffers between producers and consumers in streaming platforms, as event queues in reactive programming frameworks, or as mailbox implementations in actor systems. Traditionally, concurrent queues expose enqueue and dequeue operations. Augmenting them with atomic global operations opens several new use cases:

• Persisting actor state: Actor frameworks expose persistence modules to persist and recover actor state [1]. Persisting the state requires blocking the mailbox until the pending messages are copied. With an efficient snapshot operation, the mailbox can be persisted in real-time.

• Forking actors: Adding a fork or a choice operator to the actor model [26] requires copying the contents of the mailbox. To achieve atomicity, the corresponding actor is blocked during the copying and cannot access the mailbox. Efficient snapshots help avoid this problem.

• Dynamic stream fusion: Streaming platforms express computations as dataflow graphs. Each node in this graph executes a modular subset of the computation, but also introduces some buffering overhead [24]. Runtime optimizations eliminate this overhead by fusing subsets of the graph. Efficient, atomic buffer concatenation and joining allow optimizations without blocking the computation.

This paper introduces a lock-free queue implementation, called SnapQueue, which, in addition to enqueue and dequeue operations, exposes an efficient, atomic, lock-free transition operation. The transition operation is used to implement snapshots, concatenation, rebalancing and queue expansion. Although SnapQueue is a concurrent data structure, it relies on a persistent data structure to encode its state. We describe the SnapQueue data structure incrementally:

1. We describe lock-free single-shot queues, or segments, and their enqueue and dequeue operations in Section 2.

2. We show that segments can be atomically frozen, i.e. transformed into a persistent data structure.

3. We describe SnapQueues and their fundamental atomic operation called transition in Section 3.

4. We implement SnapQueue enqueue, dequeue, snapshot and concatenation using the transition operation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
SCALA '15, June 13–14, 2015, Portland, OR, USA.
Copyright © 2015 ACM 978-1-4503-3626-0/15/06...$15.00.
http://dx.doi.org/10.1145/2774975.2774976

Figure 1. Single-Shot Queue Illustration (the three array states EMPTY, NON-EMPTY and FULL, each annotated with the head and last positions).

class Segment[T](
  val array = new VolatileArray[T],
  @volatile var head: Int = 0,
  @volatile var last: Int = 0)

Figure 2. Single-Shot Queue Data Type

1  @tailrec def enq(p: Int, x: T): Boolean =
2    if (p >= 0 && p < array.length) {
3      if (CAS(array(p), EMPTY, x)) {
4        WRITE(last, p + 1)
5        true
6      } else enq(findLast(p), x)
7    } else false
8  @tailrec def findLast(p: Int): Int = {
9    val x = READ(array(p))
10   if (x == EMPTY) p
11   else if (x == FROZEN) array.length
12   else if (p + 1 == array.length) p + 1
13   else findLast(p + 1)
14 }

Figure 3. Single-Shot Queue Enqueue Operation
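The READ/WRITE/CAS pseudocode in Figures 2 and 3 maps directly onto Java's atomic classes. The following is a minimal runnable sketch of the single-shot queue, together with the matching dequeue logic described in Section 2.1. The class name SSQueue, the AnyRef sentinels and the Option-based result of deq are our adaptations for illustration, not the paper's API:

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicReferenceArray}

// Sentinels stand in for the paper's EMPTY, FROZEN and REMOVED singletons.
object Sentinel {
  val EMPTY = new AnyRef; val FROZEN = new AnyRef; val REMOVED = new AnyRef
}

class SSQueue(capacity: Int) {
  import Sentinel._
  private val array = new AtomicReferenceArray[AnyRef](capacity)
  for (i <- 0 until capacity) array.set(i, EMPTY)
  private val head = new AtomicInteger(0)
  // Underestimate of the first EMPTY position, as in Figure 2.
  @volatile private var last = 0

  @annotation.tailrec
  final def enq(p: Int, x: AnyRef): Boolean =
    if (p >= 0 && p < capacity) {
      if (array.compareAndSet(p, EMPTY, x)) { last = p + 1; true }
      else enq(findLast(p), x) // another enq won the CAS; skip ahead and retry
    } else false

  @annotation.tailrec
  private def findLast(p: Int): Int = {
    val x = array.get(p)
    if (x eq EMPTY) p
    else if (x eq FROZEN) capacity
    else if (p + 1 == capacity) p + 1
    else findLast(p + 1)
  }

  @annotation.tailrec
  final def deq(): Option[AnyRef] = {
    val p = head.get
    if (p >= 0 && p < capacity) {
      val x = array.get(p)
      if ((x eq EMPTY) || (x eq FROZEN)) None
      else if (head.compareAndSet(p, p + 1)) { array.set(p, REMOVED); Some(x) }
      else deq()
    } else None // used-up or frozen
  }
}
```

Note how enq with a stale position estimate still succeeds: findLast skips the REMOVED and occupied prefix, mirroring how the paper's enq tolerates an underestimated last.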

5. We show how to tune the running time of SnapQueue operations using persistent data structures in Section 4.

6. We evaluate SnapQueues against similar concurrent queue implementations in Section 5, and show that having snapshot support adds little or no overhead.

The goal of the paper is not only to propose a novel concurrent, lock-free queue with atomic constant time snapshots, but also to challenge the common belief that persistent data structures are slow, and consequently irrelevant for high-performance parallel and concurrent computing. As this paper shows, persistent data structures can simplify the development of, and reasoning about, concurrent data structures.

For the sake of conciseness, code listings slightly diverge from valid Scala code in several ways. First, the atomic READ, WRITE and CAS operations differ from standard Scala syntax, which relies on the Unsafe class. These operations retain the standard Java volatile semantics, and are used to access volatile object fields. Second, volatile arrays are represented with a non-existent VolatileArray class. Finally, generically typed methods sometimes accept or return special singleton values. This does not type-check in Scala, but can be worked around at a small syntactic cost. The complete SnapQueue implementation can be found in our online code repository [2].

2. Single-Shot Lock-Free Queue

We start by examining a simplistic lock-free queue implementation, called a single-shot lock-free queue. This queue is bounded in two ways: first, it can contain only up to L elements; second, only up to L enqueue and up to L dequeue operations can be invoked on it. Despite these limited capabilities, the single-shot lock-free queue is the basic building block of the SnapQueue, as we show in Section 3.

The single-shot queue is defined by a single data type called Segment, shown in Figure 2. A Segment contains an array of queue elements, initially filled with special EMPTY values; head – the position of the first element in the queue; and last – the estimated position of the first EMPTY array entry.

As we will see in Section 2.2, a single-shot queue can become frozen, in which case no subsequent enqueue or dequeue operations can succeed. In this frozen state, array may contain a special FROZEN value.

2.1 Basic Operations

In this section, we study the basic single-shot queue operations. The enqueue operation overwrites an EMPTY entry in the array. At all times, it ensures that the array corresponds to the string REMOVED^p · T^n · EMPTY^m, where T is the element type, and the array length is L = p + n + m. After inserting an element, enqueue sets the last field to point to the first EMPTY entry.

We define an auxiliary enq method in Figure 3, which takes an estimated position p of the first EMPTY entry, and an element x. The enq method first checks that p is within the bounds of array. It then attempts to atomically CAS an EMPTY entry with x in line 3. An unsuccessful CAS implies that another enq call succeeded, so findLast finds the first special entry, and enq is retried. A successful CAS means that x is enqueued, so the value of last is increased in line 4. Due to potential concurrent stale writes, last is an underestimate.

The enq precondition is that p is less than or equal to the actual position of the first EMPTY entry. This is trivially ensured by specifying the last field when calling enq.

Lemma 1. If enq returns true, then its CAS operation added an element to the queue. Otherwise, the queue is either full or frozen.

Dequeue atomically increments the head field to point to the next element in the queue. Unlike last, the head field always precisely describes the position of the first unread element. When head becomes larger than the array length, the queue is either empty or frozen. Similarly, when head points to EMPTY or FROZEN, the queue is considered empty or frozen, respectively.

The deq method in Figure 4 starts by reading head into a local variable p, and checks that p is within bounds. If it is greater than the array length, the queue is empty. If p is negative, the queue is frozen (explained in Section 2.2). If p is within bounds, deq reads the element at p into a local variable x in line 18. If x is either EMPTY or FROZEN, the end of the queue was reached, and deq returns NONE. Otherwise, deq atomically increments head in line 20, or retries the operation when the CAS fails. If the CAS in line 20 succeeds, the thread that executed it must eventually remove the element from array in line 21, to avoid memory leaks.

15 @tailrec def deq(): T = {
16   val p = READ(head)
17   if (p >= 0 && p < array.length) {
18     val x = READ(array(p))
19     if (x == EMPTY || x == FROZEN) NONE
20     else if (CAS(head, p, p + 1)) {
21       WRITE(array(p), REMOVED)
22       x
23     } else deq()
24   } else NONE // used-up or frozen
25 }

Figure 4. Single-Shot Queue Dequeue Operation

Lemma 2. If deq returns NONE, the queue is either empty or frozen. If deq returns an element, the CAS operation in deq previously incremented head by one.

2.2 Freeze Operation

The single-shot queue also exposes the freeze operation, which prevents subsequent updates from succeeding. The freeze method first calls freezeHead, which atomically replaces the head value with a corresponding negative value. This prevents subsequent deq operations. The freeze method then calls freezeLast, which enqueues a FROZEN element. This prevents subsequent enq operations.

26 def freeze() {
27   freezeHead(); freezeLast(READ(last))
28 }
29 @tailrec def freezeHead() {
30   val p = READ(head)
31   if (p >= 0) if (!CAS(head, p, -p - 1))
32     freezeHead()
33 }
34 @tailrec def freezeLast(p: Int) =
35   if (p >= 0 && p < array.length)
36     if (!CAS(array(p), EMPTY, FROZEN))
37       freezeLast(findLast(p))

Figure 5. Single-Shot Queue Freeze Operation

After the freeze method completes, the Segment object becomes immutable – no operation will subsequently change its state. A stale write may change the last field, but the logical state is defined only by the array and head.

Lemma 3. After freeze returns, subsequent operations can only modify the last field and the array entries preceding the position -head - 1.

Lemma 4. When freeze returns, array either contains no EMPTY entries, or contains a single FROZEN entry. In both cases, head is negative.

Note that, due to the findLast call, the complexity of the freeze operation is O(L), where L is the length of the array.

3. SnapQueue

The single-shot queue, or segment, is simple and consequently efficient. However, its boundedness and its L-operations limit make it uninteresting for most practical purposes. Our goal is an unbounded queue with arbitrary atomic operations, which we name SnapQueue. In this section, we show its data types.

When the SnapQueue size is less than L, the elements are stored in a single segment. SnapQueue overcomes the segment's L-operations limit by reallocating the segment when it becomes full. To overcome boundedness, SnapQueue uses a secondary representation: a segment-support pair. When the number of elements exceeds L, the segment is replaced by two segments, left and right. Subsequent enqueue operations use the right segment, and dequeue operations use the left segment.

If there are more than 2L elements, additional segments are stored in support data structures, hence the name segment-support pair. The support data structure is persistent, and kept in the support field. The support data structure must be such that pushing and removing segments retains the order in which the segments were pushed, i.e. it must be a sequence.

class SnapQueue[T] {
  abstract class Node
  class Frozen(val f: Trans, val root: Node)
    extends Node
  abstract class NonFrozen extends Node
  class Root(
    @volatile var left: Side,
    @volatile var right: Side)
    extends NonFrozen
  class Side(
    val isFrozen: Boolean,
    val segment: Segment[T],
    val support: Support[T])
  type Trans = NonFrozen => NonFrozen
  @volatile var root: Node = _
}

Figure 6. SnapQueue Data Types

The SnapQueue[T] class is shown in Figure 6. It has a volatile field root of the Node type. Node is a supertype of Segment from Section 2, and of the Root and Frozen types. Segment and Root have another common supertype called NonFrozen. The Root comprises two volatile fields left and right, pointing to immutable segment-support pairs of type Side. Their isFrozen field denotes whether the Side is frozen (explained in Section 3.1). The support field stores a persistent data structure of the Support[T] type, which stores intermediate segments. SnapQueue correctness does not depend on the choice of this data structure, so we defer the discussion of Support[T] until Section 4. A particular SnapQueue instance containing a Root is shown in Figure 7.

Figure 7. SnapQueue Illustration (a root field of type Node referring to a Frozen node with transition function f, whose NonFrozen root is a Root with left and right Sides, each holding a Segment and a Support).

The Frozen data type contains a reference to either a Root or a Segment, and denotes that the underlying SnapQueue is currently in the process of being frozen, or is already frozen. It also contains a transition function of type NonFrozen => NonFrozen.

As hinted earlier, SnapQueue uses an atomic transition operation, which freezes the SnapQueue before replacing its contents. In the next section, we study how freezing works.

38 @tailrec
39 def freeze(r: NonFrozen, f: Trans) = {
40   val fr = new Frozen(f, r)
41   if (READ(root) != r) null
42   else if (CAS(root, r, fr)) {
43     completeFreeze(fr.root)
44     fr
45   } else freeze(r, f)
46 }
47 def completeFreeze(r: NonFrozen) =
48   r match {
49     case s: Segment => s.freeze()
50     case r: Root =>
51       freezeLeft(r)
52       freezeRight(r)
53       READ(r.left).segment.freeze()
54       READ(r.right).segment.freeze()
55   }
56 @tailrec def freezeLeft(r: Root) {
57   val l = READ(r.left)
58   if (l.isFrozen) return
59   val nl = new Side(
60     true, l.segment, l.support)
61   if (!CAS(r.left, l, nl)) freezeLeft(r)
62 }

Figure 8. SnapQueue Freeze Operation
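The freezing scheme from Section 2.2, reused for SnapQueue segments, encodes a frozen head by mapping a non-negative position p to -p - 1. Every frozen value is therefore strictly negative (even for p = 0), and the encoding is invertible, which is why Lemma 3 refers to the position -head - 1. A two-line sketch of the encoding (the function names are ours):

```scala
// Encode a frozen head: a non-negative position p maps to -p - 1,
// so every frozen value is strictly negative, including p = 0.
def freezeValue(p: Int): Int = -p - 1

// Decode the original position back from a frozen head value.
def originalPos(frozen: Int): Int = -frozen - 1
```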

3.1 Freeze Operation

Instead of starting with the basic queue operations as in Section 2, we first describe the freeze operation, which is a prerequisite for the transition operation, shown later. As with the single-shot queue, invoking freeze turns a concurrent data structure into an immutable data structure. The goal of freeze is to invalidate subsequent writes to the volatile fields in the SnapQueue. To do this, freeze first freezes the root, then freezes the two Side references, and at the end freezes the underlying segments.

The SnapQueue#freeze method in Figure 8 takes the reference r to the previous SnapQueue root. It also takes the transition function, which is explained in Section 3.2 – for now, we treat it as extra payload in the Frozen object.

The freeze operation first attempts to replace the expected root r with a fresh Frozen object. If some other thread already changed root to a value different than r, freeze signals failure by returning null. Otherwise, if the CAS changes root to the Frozen object, completeFreeze is called to freeze the left and right side, and the segments.

Note that, unlike freeze on the single-shot queue from Section 2.2, the freeze operation in Figure 8 may fail and return null. In this case, it is up to the caller to retry. However, if the CAS in line 42 succeeds, the SnapQueue is eventually frozen, since other operations will never modify frozen left and right fields, or frozen segments.

Lemma 5. After freeze returns a non-null value, subsequent operations may only modify the last fields and the array entries preceding -head - 1 in the two segments. Furthermore, if a segment is frozen, then so are the left and right fields.

Proof. This follows from Lemma 3, and the fact that basic SnapQueue operations fail if the Side is frozen.

Lemma 6. If freeze returns a non-null value fr, then the value of root atomically changed from r to fr, where r is the specified root reference.

The complexity of the freeze operation in this section is again O(L), where L is the length of the array in Segment objects. Importantly, the complexity does not depend on the size of the persistent Support data structure.

3.2 Transition Operation

The transition operation is the most important SnapQueue operation. It atomically exchanges the contents and the structure of the SnapQueue. It is analogous to the CAS operation, with the difference that transition operates on the entire data structure, rather than on a single memory location.

The transition operation first freezes the SnapQueue. If freeze fails, then so does transition. However, if freezing succeeds, transition uses a transition function that maps a frozen SnapQueue into a new SnapQueue.

The SnapQueue#transition method in Figure 9 takes the expected root value r, and a transition function f. After freezing, it invokes completeTransition, which computes the new root with f, and atomically replaces the old root with the output of f in line 79. The helpTransition method is similar, but gets called by threads executing their separate operations, to help complete the transition in a lock-free way.

63 def transition(r: Node, f: Trans):
64     NonFrozen = {
65   if (r.isFrozen) {
66     completeTransition(r)
67     null
68   } else {
69     val fr = freeze(r, f)
70     if (fr == null) null
71     else {
72       completeTransition(fr)
73       fr.root
74     }
75   }
76 }
77 def completeTransition(fr: Frozen) {
78   val nr = fr.f(fr.root)
79   while (READ(root) == fr) CAS(root, fr, nr)
80 }
81 def helpTransition() {
82   READ(root) match {
83     case fr: Frozen =>
84       completeFreeze(fr.root)
85       completeTransition(fr)
86     case _ => // not frozen -- do nothing
87   }
88 }

Figure 9. SnapQueue Transition Operations

Lemma 7. If transition returns a non-null value, then the value of root atomically changed from the Node reference r to Frozen(r, f), and then atomically to f(r), where r is the specified root and f is the transition function.

Proof. This is a consequence of Lemmas 5 and 6.

Lemma 8. The transition running time is O(L + f(n, L)), where O(f(n, L)) is the transition function running time with respect to the support data structure size n and the segment length L.

Proof. This is a consequence of the O(L) complexity of the freeze operation, and the fact that the transition function is invoked at least once.

3.3 Transition Functions

The transition operation takes a referentially transparent transition function. By the time the transition function is invoked, the Node is already frozen, and is effectively a persistent data structure. Therefore, the transition function can be reasoned about functionally, separate from concurrency concerns.

89 def expand(r: NonFrozen) = r match {
90   case s: Segment =>
91     val head = s.locateHead
92     val last = s.locateLast
93     if (last - head < s.array.length / 2) {
94       copy(s)
95     } else new Root(
96       new Side(false, unfreeze(s), create()),
97       new Side(false, new Segment, create()))
98 }
99 def transfer(r: NonFrozen) = r match {
100   case r: Root =>
101     if (r.right.support.nonEmpty) new Root(
102       new Side(false,
103         copy(READ(r.left).segment),
104         READ(r.right).support),
105       new Side(false,
106         copy(READ(r.right).segment),
107         create()))
108     else copy(READ(r.right).segment)
109 }

Figure 10. SnapQueue Transition Functions

We consider some concrete transition functions in Figure 10. Assume that a SnapQueue represented with a segment of length L needs to be replaced with a Root, as outlined at the beginning of Section 3. If the SnapQueue contains less than L/2 elements, the expand transition function creates another segment – copy in line 94 copies a frozen segment into a freshly allocated one. If the size is above L/2, expand creates a Root object – here, create allocates an empty support data structure, and unfreeze reallocates the segment. The L/2 approach prevents eagerly alternating between the Root and Segment representations.

The transfer transition function is used when a Root queue runs out of elements in its left segment and support structure. It transfers the support structure from the right to the left side of the SnapQueue. In Section 3.4, we will see usages of both transfer and expand.

Lemma 9. The running time of both expand and transfer is O(L), where L is the segment size.

Proof. This is a consequence of calling unfreeze and copy, which take O(L) time for an L-element segment.

3.4 Basic Operations

In this section, we closely examine enqueue and dequeue. For efficiency, the SnapQueue basic operations in most cases only modify a segment. Occasionally, a segment is used up and replaced with a new segment from the support structure.

We define the top-level SnapQueue#enqueue method, and three internal, dynamically dispatched enqueue methods on the Root, Frozen and Segment data types. After the top-level enqueue in Figure 11 reads the root, it calls an internal enqueue. Since SnapQueue is an unbounded data structure, it must always be possible to enqueue a new element. If an internal enqueue returns false to indicate that the element was not added, the operation is retried.

110 // SnapQueue
111 @tailrec def enqueue(x: T): Unit =
112   if (!READ(root).enqueue(x)) enqueue(x)
113 // Root
114 @tailrec def enqueue(x: T): Boolean = {
115   val r = READ(right)
116   val p = READ(r.segment.last)
117   if (r.segment.enq(p, x)) true
118   else { // full or frozen
119     if (r.frozen) false
120     else { // full
121       val seg = new Segment
122       val sup = pushr(r.segment, r.support)
123       val nr = new Side(false, seg, sup)
124       CAS(right, r, nr)
125       enqueue(x)
126     }
127   }
128 }
129 // Frozen
130 def enqueue(x: T): Boolean = {
131   helpTransition()
132   false
133 }
134 // Segment
135 def enqueue(x: T): Boolean = {
136   val p = READ(last)
137   if (enq(p, x)) true
138   else {
139     if (READ(head) < 0) false // frozen
140     else { // full
141       transition(this, expand)
142       false
143     }
144   }
145 }

Figure 11. SnapQueue Enqueue Operation

The enqueue on the Root type reads the right segment-support pair, and the segment's last field. It then calls enq in line 117. If enq returns false, the segment is either full or frozen, by Lemma 1. In line 119, we rely on Lemmas 4 and 5 to check if the segment is frozen. By Lemma 5, if the segment is frozen, the Root is also frozen, so we return to the top-level enqueue in line 119, and then help complete the transition. Otherwise, if the segment is full, we push its array into the support structure with a persistent pushr operation, and replace the right side with a new Side object.

The enqueue on the Frozen type helps the current transition operation to complete, and then retries the enqueue. The enqueue on the Segment type calls enq – if the segment is full, it attempts to replace the segment by calling the transition operation from Section 3.2 with the expand function from Section 3.3, before retrying.

146 // SnapQueue
147 @tailrec def dequeue(): T = {
148   val r = READ(root)
149   val x = r.dequeue()
150   if (x != REPEAT) x else dequeue()
151 }
152 // Root
153 def dequeue(): T = {
154   val l = READ(left)
155   val x = l.segment.deq()
156   if (x != NONE) x
157   else { // empty or frozen
158     if (l.frozen) REPEAT
159     else { // empty
160       if (l.support.nonEmpty) {
161         val (seg, sup) = copy(popl(l.support))
162         val nl = new Side(false, seg, sup)
163         CAS(left, l, nl)
164         dequeue()
165       } else { // empty side
166         transition(this, transfer)
167         REPEAT
168       }
169     }
170   }
171 }
172 // Frozen
173 def dequeue(): T = {
174   helpTransition()
175   REPEAT
176 }
177 // Segment
178 def dequeue(): T = {
179   val x = deq()
180   if (x != NONE) x
181   else if (READ(head) < 0) REPEAT // frozen
182   else NONE // empty
183 }

Figure 12. SnapQueue Dequeue Operation

The dequeue operation in Figure 12 is similarly separated between the top-level SnapQueue#dequeue, and internal dequeue methods on the Root, Segment and Frozen types. Root#dequeue starts by calling deq on the left segment. If deq fails, the segment is either empty or frozen, by Lemma 2. If the segment is frozen, there is possibly an ongoing transition, so control returns to the top-level dequeue. Otherwise, if the support data structure is non-empty, dequeue invokes popl to extract an array for a new segment, refreshes the left side, and retries. If the support is empty, segments must be borrowed from the right side, so transition with the transfer function from Section 3.3 is called.

We can now state the following two theorems. The first theorem establishes the operation running time.

Theorem 1 (SnapQueue Running Time). Let L be the length of the segment array. The amortized running time of the enqueue operation is O(L + f(n)/L), where O(f(n)) is the running time of the pushr operation of the support structure containing n elements. The amortized running time of the dequeue operation is O(L + g(n)/L), where O(g(n)) is the running time of the popl operation of the support structure containing n elements.

The second theorem establishes the contention rate between producers and consumers, expressed as the number of operations between writes to the root field – the higher this value, the lower the contention. When the consumer periodically exhausts the support data structure in the left Side object, the root is frozen to call transfer. When this happens, the producers and consumers contend. We care about this, since typical actor systems have a single consumer and many producers – for example, multiple producers should not be able to flood the single consumer as a consequence of contention.

Theorem 2 (SnapQueue Contention). Let L be the length of the segment array, O(g(n)) the running time of the popl operation, and n the total number of elements in the SnapQueue. In a sequence of basic operations, there are on average O(L + n·g(n)/L) dequeue operations between two root field writes (that is, between two freeze operations).

Proof. Both theorems are a consequence of Lemmas 8 and 9, and of the implementations in Figures 11 and 12.

3.5 Snapshot and Concatenation

The transition operation from Section 3.2 is very expressive, as it atomically changes the state of the SnapQueue given an arbitrary transformation. In this section, we use it to implement atomic snapshots and atomic concatenation.

184 def id(r: NonFrozen) = r match {
185   case s: Segment => copy(s)
186   case r: Root => new Root(
187     new Side(false, unfreeze(r.left.segment),
188       r.left.support),
189     new Side(false, copy(r.right.segment),
190       r.right.support))
191 }
192 @tailrec def snapshot() = {
193   val r = READ(root)
194   val nr = transition(r, id)
195   if (nr == null) {
196     helpTransition()
197     snapshot()
198   } else id(nr)
199 }

Figure 13. SnapQueue Snapshot

The snapshot method in Figure 13 uses the identity transition function id. This method repetitively invokes the transition operation until it succeeds. It is easy to see that the snapshot running time is O(L).

Next, we note that Lemma 7 implies the following:

Lemma 10. If the value returned by the transition function f gets written to the root field by transition, then during the corresponding invocation of f, the SnapQueue was frozen.

We rely on Lemma 10 to implement atomic concatenation. To achieve atomicity, this operation must simultaneously freeze two SnapQueues. Note that two concatenation operations on the same pair of SnapQueues could potentially deadlock if they freeze the SnapQueues in opposite orders. To prevent this, we need to establish an ordering between SnapQueue instances – we assume that a stamp method associates unique integers with each queue.

200 @tailrec def concat(that: SnapQueue[T]) = {
201   if (stamp(this) < stamp(that)) {
202     val p = new Promise[NonFrozen]
203     val r = READ(this.root)
204     val nr = this.transition(r, rthis => {
205       if (!p.isCompleted) {
206         val rthat = that.snapshot()
207         val res = concatenate(rthis, rthat)
208         p.trySuccess(res)
209       }
210       id(rthis)
211     })
212     if (nr == null) this.concat(that)
213     else new SnapQueue(p.getValue)
214   } else { /* analogous */ }
215 }

Figure 14. SnapQueue Concatenation

The SnapQueue#concat operation in Figure 14 first calls stamp to establish the order. We assume that this comes before the argument that – the reverse case is analogous. The concat method creates a new Promise object [9], which serves as a placeholder for the resulting SnapQueue. The Promise object is a single-assignment variable – it can be assigned a value at most once, using the trySuccess method. After concat starts a transition on this SnapQueue, it creates a snapshot of that SnapQueue. At this point, we have two frozen data structures, and can concatenate them with a persistent concatenate operation in line 207. The result is stored into the Promise in line 208.

Assume that the transition returns a non-null value. This can only happen if some thread assigned the result into the Promise before transition returned. That thread called snapshot on that before writing to the Promise. By Lemma 10, this SnapQueue was frozen at that time, implying that the concatenation operation is atomic with respect to both SnapQueues.

4. Support Data Structures

In Section 3, we introduced SnapQueue, which relies on a persistent sequence data structure, called support, to store arrays of size L. Since the running time of SnapQueue operations depends on the running time of the support data structure, SnapQueue constitutes a framework for assessing different persistent data structure implementations. In this section, we consider support structures that provide the following operations: create, which creates an empty support structure; nonEmpty, which checks if the support structure is empty; pushr, which appends an element on the right side; popl, which removes the leftmost element; and concatenate, which concatenates two support structures.

When choosing a data structure, we usually consider the following properties. First, the asymptotic running time with respect to the data structure size should be as low as possible, in the ideal case O(1). Second, constant factors must be low to ensure good absolute running time. With high constant factors, even an O(1) operation can be prohibitively slow for typical data structure sizes. Finally, the data structure should be simple – it should be easy to comprehend and implement.

Although persistent data structures with worst-case O(1) implementations of the desired operations exist [12], their implementations are not simple, and have high constant factors. We know from Theorem 1 that the SnapQueue operations run in O(L + f(n)/L) time. To optimize SnapQueues, we must choose a support structure whose pushr and popl operations are O(1) with low constant factors.

In the rest of this section, we study Conc-Trees [17] as the support data structure. The Conc-Tree is a particular implementation of the conc-list abstraction from Fortress [5] [25], originally intended for functional task-parallel and data-parallel programming. Conc-Tree concatenation is O(log n), but its O(1) popl and pushr are simple and efficient.

The Conc data type is shown in Figure 15. This data type specifies the size of the tree, i.e. the number of elements, and the level, i.e. the tree height. A Conc is either an Empty tree, a tree with a Single element, or an inner node <> (pronounced conc) with two subtrees. Conc defines a method normalized, which returns the tree composed only of these three data types.

Conc-Trees maintain the following invariants. First, the Empty tree is never a child of other nodes. Second, the absolute level difference between the children of a <> node is at most 1. This ensures that Conc-Trees are balanced – the longest and the shortest path from the root differ by at most 2×.

    trait Conc[+T] {
      def level: Int
      def size: Int
      def normalized = this
    }

    case object Empty extends Conc[Nothing] {
      def level = 0
      def size = 0
    }

    case class <>[T](left: Conc[T], right: Conc[T])
    extends Conc[T] {
      val level = 1 + max(left.level, right.level)
      val size = left.size + right.size
    }

    case class Single[T](x: T) extends Conc[T] {
      def level = 0
      def size = 1
    }

    Figure 15. Conc-Tree Data Types

    implicit class ConcOps[T](xs: Conc[T]) {
      def <>(ys: Conc[T]) = {
        if (xs == Empty) ys
        else if (ys == Empty) xs
        else conc(xs.normalized, ys.normalized)
      }
    }

    def conc[T](xs: Conc[T], ys: Conc[T]): Conc[T] = {
      val diff = ys.level - xs.level
      if (abs(diff) <= 1) new <>(xs, ys)
      else if (diff < -1) {
        if (xs.left.level >= xs.right.level) {
          val nr = conc(xs.right, ys)
          new <>(xs.left, nr)
        } else {
          val nrr = conc(xs.right.right, ys)
          if (nrr.level == xs.level - 3) {
            val nr = new <>(xs.right.left, nrr)
            new <>(xs.left, nr)
          } else {
            val nl = new <>(xs.left, xs.right.left)
            new <>(nl, nrr)
          }
        }
      } else { /* analogous */ }
    }

    Figure 16. Conc-Tree Concatenation

In Figure 16, we implement concatenation for Conc-Trees. Similar to how the symbol of the Scala :: data type (pronounced cons) is used to prepend to a List, we borrow the <> data type symbol for Conc concatenation. Thus, the expression new <>(xs, ys) links the two trees xs and ys by creating a new <> object, whereas xs <> ys creates a balanced Conc-Tree that is the concatenation of xs and ys.

The public <> method eliminates Empty trees before calling the recursive conc method. The conc method links trees if the invariants allow it. If they do not, conc concatenates the smaller Conc-Tree with a subtree of the bigger Conc-Tree, before re-linking the result. The conc running time is O(|hxs − hys|), where hxs and hys are the tree heights [17].

The <> method is sufficient to implement pushr:

    def pushr(xs: Conc[T], x: T) = xs <> Single(x)

Unfortunately, this implementation is O(log n), and our goal is to achieve constant time. We will next extend the basic Conc-Tree data structure to achieve this goal.

In Figure 17, we introduce a new data type Ap, which is isomorphic to the <> data type. The distinction is that the subtrees of an Ap node need not differ in level by at most 1. Instead, Ap introduces two new invariants. First, an Ap node can only be the left subtree of another Ap node; otherwise, an Ap node must be the root of the tree. Second, if an Ap node n has an Ap node m in its left subtree, then n.right.level must be strictly smaller than m.right.level.

As a consequence of the two Ap invariants, Conc-Trees that contain Ap nodes correspond to numbers in the binary number system, as shown in Figure 18.
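To check the balance claim concretely, Figures 15 and 16 can be condensed into a single self-contained, runnable sketch. Two liberties are taken here, both assumptions of ours rather than the paper's code: the left and right accessors are lifted into the trait so that conc typechecks standalone, and the mirror case is elided with an error exactly as the /* analogous */ comment elides it in Figure 16 (repeated appending on the right never reaches it):

```scala
import scala.math.{abs, max}

sealed trait Conc[+T] {
  def level: Int
  def size: Int
  def normalized: Conc[T] = this
  // Lifted into the trait for the sketch; leaves never use these.
  def left: Conc[T] = sys.error("leaf has no left subtree")
  def right: Conc[T] = sys.error("leaf has no right subtree")
}
case object Empty extends Conc[Nothing] { def level = 0; def size = 0 }
case class Single[T](x: T) extends Conc[T] { def level = 0; def size = 1 }
case class <>[T](override val left: Conc[T], override val right: Conc[T])
extends Conc[T] {
  val level = 1 + max(left.level, right.level)
  val size  = left.size + right.size
}

implicit class ConcOps[T](xs: Conc[T]) {
  def <>(ys: Conc[T]): Conc[T] =
    if (xs == Empty) ys
    else if (ys == Empty) xs
    else conc(xs.normalized, ys.normalized)
}

def conc[T](xs: Conc[T], ys: Conc[T]): Conc[T] = {
  val diff = ys.level - xs.level
  if (abs(diff) <= 1) new <>(xs, ys)
  else if (diff < -1) {
    if (xs.left.level >= xs.right.level) {
      new <>(xs.left, conc(xs.right, ys)) // recurse along the right spine
    } else {
      val nrr = conc(xs.right.right, ys)
      if (nrr.level == xs.level - 3)
        new <>(xs.left, new <>(xs.right.left, nrr))
      else
        new <>(new <>(xs.left, xs.right.left), nrr)
    }
  } else sys.error("mirror case, analogous to the branch above")
}

// In-order traversal, to check that concatenation preserves element order.
def elems[T](t: Conc[T]): List[T] = t match {
  case Empty     => Nil
  case Single(x) => List(x)
  case l <> r    => elems(l) ::: elems(r)
}
```

Appending 100 singles through xs <> Single(i) yields a tree of size 100 whose level stays well within the balance bound, while elems returns the elements in insertion order.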

    [Figure: a Conc-Tree append list corresponds to a binary number; appending
    a Single to the tree for 1011 triggers a carry chain: 1011 + 1 = 1100.]

    Figure 18. Correspondence Between the Binary Number System and Conc-Trees
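The carry analogy of Figure 18 can be modeled directly. In this illustrative model (an assumption of ours, not code from the paper), an append list is reduced to its binary number, least-significant digit first, with digit i set exactly when a tree of height i is present; appending a Single is then a binary increment, and every carry corresponds to linking two trees of equal height:

```scala
// Digits are least-significant first: List(1, 1, 0, 1) encodes 1011.
def increment(digits: List[Int]): List[Int] = digits match {
  case 0 :: rest => 1 :: rest            // free slot: one link, no carry
  case 1 :: rest => 0 :: increment(rest) // carry: trees of equal height merge
  case Nil       => List(1)              // a new most-significant digit appears
}
```

Here increment(List(1, 1, 0, 1)) produces List(0, 0, 1, 1) – that is, 1011 + 1 = 1100, the carry chain of Figure 18. Since an increment flips O(1) digits on average, the model also explains why appending a Single is amortized O(1).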

    case class Ap[T](left: Conc[T], right: Conc[T])
    extends Conc[T] {
      val level = 1 + left.level.max(right.level)
      val size = left.size + right.size
      override def normalized = wrap(left, right)
    }

    def wrap[T](xs: Conc[T], ys: Conc[T]) =
      xs match {
        case Ap(ws, zs) => wrap(ws, zs <> ys)
        case xs => xs <> ys
      }

    Figure 17. Conc-Tree Append Data Type

    [Figure: plot of running time in ms against queue size (up to 8 · 10^5
    elements) for the Segment, SnapQ and CLQ implementations.]

    Figure 20. 1-Thread Enqueue, L = 128
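Combining Figure 17 with the append operation of Figure 19 gives a runnable sketch of the append-optimized Conc-Tree. The glue here goes beyond the paper's listings and should be read as our assumptions: subtree accessors are lifted into the trait, the public <> method is spelled cat, pushr is widened to accept and return plain Conc values, the elided mirror case of conc is written out by symmetry, and append compares against zs.level:

```scala
import scala.math.{abs, max}

sealed trait Conc[+T] {
  def level: Int
  def size: Int
  def normalized: Conc[T] = this
  def left: Conc[T] = sys.error("leaf has no left subtree")
  def right: Conc[T] = sys.error("leaf has no right subtree")
}
case object Empty extends Conc[Nothing] { def level = 0; def size = 0 }
case class Single[T](x: T) extends Conc[T] { def level = 0; def size = 1 }
case class <>[T](override val left: Conc[T], override val right: Conc[T])
extends Conc[T] {
  val level = 1 + max(left.level, right.level)
  val size  = left.size + right.size
}
case class Ap[T](override val left: Conc[T], override val right: Conc[T])
extends Conc[T] {
  val level = 1 + max(left.level, right.level)
  val size  = left.size + right.size
  override def normalized = wrap(left, right)
}

// Normalization: concatenate all right subtrees of the append list.
def wrap[T](xs: Conc[T], ys: Conc[T]): Conc[T] = xs match {
  case Ap(ws, zs) => wrap(ws, cat(zs, ys))
  case _          => cat(xs, ys)
}

// The public <> method of Figure 16, renamed cat: drop Empty, then balance.
def cat[T](xs: Conc[T], ys: Conc[T]): Conc[T] =
  if (xs == Empty) ys
  else if (ys == Empty) xs
  else conc(xs.normalized, ys.normalized)

def conc[T](xs: Conc[T], ys: Conc[T]): Conc[T] = {
  val diff = ys.level - xs.level
  if (abs(diff) <= 1) new <>(xs, ys)
  else if (diff < -1) {
    if (xs.left.level >= xs.right.level) {
      new <>(xs.left, conc(xs.right, ys))
    } else {
      val nrr = conc(xs.right.right, ys)
      if (nrr.level == xs.level - 3) new <>(xs.left, new <>(xs.right.left, nrr))
      else new <>(new <>(xs.left, xs.right.left), nrr)
    }
  } else {
    // Mirror case, written out by symmetry with the branch above.
    if (ys.right.level >= ys.left.level) {
      new <>(conc(xs, ys.left), ys.right)
    } else {
      val nll = conc(xs, ys.left.left)
      if (nll.level == ys.level - 3) new <>(new <>(nll, ys.left.right), ys.right)
      else new <>(nll, new <>(ys.left.right, ys.right))
    }
  }
}

// Figure 19: push ys into the append list, carrying while levels collide.
def append[T](xs: Ap[T], ys: Conc[T]): Conc[T] =
  if (xs.right.level > ys.level) Ap(xs, ys)
  else {
    val zs = new <>(xs.right, ys)
    xs.left match {
      case ws: Ap[T] => append(ws, zs)
      case ws        => if (ws.level <= zs.level) cat(ws, zs) else Ap(ws, zs)
    }
  }

def pushr[T](xs: Conc[T], x: T): Conc[T] = xs match {
  case a: Ap[T] => append(a, Single(x))
  case Empty    => Single(x)
  case t        => Ap(t, Single(x))
}

// In-order traversal; Ap nodes traverse exactly like <> nodes.
def elems[T](t: Conc[T]): List[T] = t match {
  case Empty     => Nil
  case Single(x) => List(x)
  case other     => elems(other.left) ::: elems(other.right)
}
```

Repeated pushr calls mirror the binary increment, and normalized rebalances the whole tree: after 300 appends, the normalized tree still holds the elements in order at a logarithmic level.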

Each Ap node n corresponds to a digit whose position equals the height of n.right. The absence of an Ap node with a specific height indicates the absence of the corresponding digit. This correspondence has two consequences. First, since we know how to increment a binary number by adding a single digit on the right, we can equivalently append Single trees by linking trees of the same height together [16]. Second, since incrementing a number in the binary number system takes O(1) computational steps on average, appending a Single tree also takes O(1) steps on average.

This is illustrated in Figure 18, where a Single tree is appended to a Conc-Tree corresponding to the binary number 1011. Appending triggers a chain of carries, which stops upon reaching a 0-digit. The append method that implements this is shown in Figure 19, and it is used to directly implement the pushr operation:

    def pushr(xs: Ap[T], x: T): Ap[T] =
      append(xs, Single(x))

    def append[T](xs: Ap[T], ys: Conc[T]): Conc[T] =
      if (xs.right.level > ys.level) new Ap(xs, ys)
      else {
        val zs = new <>(xs.right, ys)
        xs.left match {
          case ws @ Ap(_, _) => append(ws, zs)
          case ws =>
            if (ws.level <= zs.level) ws <> zs
            else new Ap(ws, zs)
        }
      }

    Figure 19. Conc-Tree Append Operation

The append implementation is similar to functional random access lists [16]. Both achieve amortized O(1) appends using a number system representation. However, Conc-Trees with Ap nodes can additionally be concatenated in O(log n) time. For this, an Ap tree must first be normalized – transformed into a tree composed only of Single and <> nodes. This is the task of the wrap method in Figure 17, which concatenates all the right subtrees in the append list. Since concatenation takes O(|hxs − hys|) time, and the sum of consecutive height differences between the right subtrees is O(log n), the wrap method runs in O(log n) time [17].

To conclude, we defined a persistent data structure with amortized O(1) pushr and O(log n) concatenation. Although we do not show it in this section, we can similarly define a Prep node to obtain O(1) popl.

5. Evaluation

In this section, we experimentally evaluate optimal segment lengths, and compare the SnapQueue against the JDK ConcurrentLinkedQueue (CLQ) implementation, which is based on Michael and Scott's concurrent queue [14]. The benchmarks are performed on an Intel 2.8 GHz i7-4900MQ quad-core processor with hyperthreading. We use the ScalaMeter tool to execute our benchmarks [18], which relies on standard JVM benchmarking methodologies [8].

The workload in the following benchmarks comprises either enqueuing or dequeuing size elements, spread across some number of threads. When we study single-thread performance, we vary the size on the x-axis. Otherwise, we

    [Figure: plot of running time in ms against parallelism level (1–8 threads)
    for the SnapQ and CLQ implementations.]

    Figure 21. N-Thread Enqueue, size = 500k, L = 128

    [Figure: plot of running time in ms against parallelism level (1–8 threads)
    for the SnapQ, leaky SnapQ and CLQ implementations.]

    Figure 23. N-Thread Dequeue, size = 500k, L = 128
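As a rough way to reproduce the flavor of these single-thread measurements without ScalaMeter, a plain nanoTime loop over the JDK queue suffices. This sketch is our own minimal harness (arbitrary iteration counts, no statistical rigor), not the paper's methodology:

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Time a single-threaded burst of `size` add calls on a fresh CLQ.
def timeEnqueue(size: Int): Long = {
  val q = new ConcurrentLinkedQueue[Integer]
  val start = System.nanoTime()
  var i = 0
  while (i < size) { q.add(i); i += 1 }
  System.nanoTime() - start
}

// Warm up the JIT first, then take the measurement that counts.
for (_ <- 1 to 5) timeEnqueue(50000)
val nanos = timeEnqueue(50000)
println(s"50k enqueues took ${nanos / 1000} us")
```

Each add allocates a fresh node, which is exactly the cost the segment-based enqueue avoids; a matching poll loop exercises the allocation-free dequeue path.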
vary the number of threads and keep the total number of elements size fixed. The y-axis shows the running time. The benchmarks test the two data structures under extreme conditions. In practical applications, producers and consumers perform some work between enqueue and dequeue calls, but here they exclusively call queue operations.

    [Figure: plot of dequeue running time in ms against queue size (2–8 · 10^5
    elements) for the Segment, leaky SnapQ, SnapQ and CLQ implementations.]

    Figure 22. 1-Thread Dequeue, L = 128

    [Figure: plot of running time in ms against queue size (2–8 · 10^5 elements)
    for the SnapQ and CLQ implementations.]

    Figure 24. 1 Producer, 1 Consumer

In Figure 20, we compare the single-thread running time of the SnapQueue enqueue method against the running time of the add method of CLQ. The running time of the enq method of the Segment class is almost 2× smaller than that of CLQ. This speedup is due to the extra cost of allocating a Node object, which CLQ pays when enqueuing an element. With L = 128, SnapQueue enqueue performance is around 25% better than CLQ – the performance loss with respect to the Segment is due to additional atomic reads of the root and right fields, intermediate method calls, and Conc-Tree manipulations.

In Figure 21, we show the same workload distributed across P threads, where P varies from 1 to 8. This benchmark shows that CLQ contention is larger than SnapQueue contention for fewer threads, but the two data structures have similar contention at higher parallelism levels.

Next, we show single-thread dequeue performance in Figure 22. Here, the SnapQueue and CLQ have almost identical running times – CLQ's poll method is faster than the add method from Figure 20, since poll does not have to allocate any objects. Calling deq on the Segment is about 15% faster.

The leaky SnapQ is a version of the SnapQueue with a slightly faster dequeue operation. In line 21 of the listing in Figure 4, we used an atomic WRITE operation to replace the logically removed element with a special REMOVED value. This ensures that the SnapQueue does not keep references to dequeued elements in the leftmost segment, and avoids a potential memory leak. The WRITE in the dequeue operation mutates arrays that come from the support data structure. Since the arrays in the support data structure can be shared between multiple copies of the SnapQueue, due to operations such as snapshot, dequeue must copy the array returned by popl in line 161 of the listing in Figure 12.

Some applications are unaffected by memory leaks, either because the objects are small, or because the dequeue operation is called often. In such applications, the WRITE in line 21 is unnecessary, and consequently, the copy call in line 161 can be removed. This improves the performance of the dequeue operation by about 10%, as shown in Figures 22 and 23.

In Figure 23, we show the same workload spread across multiple threads. The contention effects are similar between the SnapQueue and CLQ when P ≤ 4. For P > 4, the SnapQueue contention-related slowdowns are larger than those of CLQ. We believe that this is due to the increased amount of allocation that occurs when multiple threads simultaneously attempt to replace the leftmost segment after the popl in line 161.

Next, we test for contention between a single producer and a single consumer in Figure 24. Here, the producer starts by enqueuing size elements, and the consumer simultaneously dequeues them. While in this benchmark the SnapQueue shows almost the same performance as in Figure 22, CLQ

suffers from contention. We postulate that this is because the CLQ poll method is faster than the add method, so the consumer repeatedly catches up with the producer, causing contention. This effect is important only for those applications in which producers and consumers spend most of their time invoking queue operations.

For the purposes of the last benchmark, we recall Theorem 1, where we established that the running time of queue operations is bounded by O(L + f(n)/L), where L is the segment length and O(f(n)) is the complexity of the support data structure operations. Every queue operation can be interrupted by a snapshot, in which case two segments of length L must be copied, hence the first term. Then, after every L operations, a segment must be pushed to or popped from the support structure, hence the second term f(n)/L. Assuming that snapshots occur periodically at some rate T, the running time becomes O(L/T + f(n)/L). Optimizing the SnapQueue amounts to optimizing this expression for some period T.

Figure 25 shows the dependency between the running time of the enqueue method and the SnapQueue segment length, for different choices of the period T. For T = ∞, that is, when snapshots do not occur, the first term becomes 0 and the corresponding curve converges around L = 50, but has no optimum. For other values of T, the optimum appears somewhere between L = 64 and L = 128. For T = 3, that is, when snapshots occur very frequently, the first term becomes dominant and the optimum shifts to around L = 64. Based on these results, the recommended SnapQueue segment length is 64 for most applications.

    [Figure: plot of enqueue running time in ms against segment lengths up to
    200, for snapshot periods T = 3, 10, 20, 50, 100, 200 and ∞.]

    Figure 25. Running Time vs Segment Length

6. Related Work

Lock-free concurrent data structures are an active area of research, and there exist extensive concurrent data structure overviews [10] [15]. Here, we focus on related concurrent queues and data structures with snapshots.

Original lock-free concurrent queues allocate a separate node per enqueued object [14]. The SnapQueue avoids this by allocating a chunk of memory (a segment), and storing the objects within it. As a result, the SnapQueue has a decreased memory footprint compared to traditional lock-free queues. The single-shot lock-free queue is similar to the FlowPool data structure [20], which also allocates memory chunks instead of a single node per element. FlowPool basic operations use two CAS operations instead of one, and are more complex. FlowPools can also reduce contention at the cost of weakened ordering guarantees [23], but do not have atomic snapshots. The Disruptor [27] is a bounded concurrent ring-buffer data structure designed to serve as a high-performance buffer. Its memory is also allocated in chunks, for better cache locality and a decreased memory footprint.

Some concurrent data structures have in the past provided efficient snapshots. SnapTree [6] is a concurrent AVL tree that provides a fast clone operation and consistent iteration. The Ctrie [19] is a concurrent hash trie implementation with constant-time atomic lazy snapshots, an atomic clear operation and consistent iteration. Similar to how the SnapQueue uses a persistent data structure to efficiently switch between global states, the Ctrie relies on a persistent data structure to implement its lazy snapshot operation. A different approach from the past was to provide a general framework for snapshots [4] – although this is a general lock-free snapshot technique, these snapshots are generally O(n) in the size of the data structure.

An extensive overview of traditional persistent data structure design techniques is given by Okasaki [16]. The binary number representation used by Conc-Trees is inspired by random access lists, which represent data as a list of complete binary trees. In this regard, Conc-Trees are unlike most traditional immutable sequence data structures, where every binary digit of weight W corresponds to a complete tree with W elements – instead, they rely on the relaxed balancing typically used in AVL trees [3]. This combination of features allows Conc-Trees to retain logarithmic concatenation along with amortized constant-time prepend and append operations. The conc-list abstraction appeared in Fortress [5], where it was used for task-parallel programs [25]. More advanced Conc-Tree variants achieve worst-case O(1) prepend and append [17], but are more complicated. Some persistent trees achieve amortized O(1) [11] and worst-case O(1) [12] concatenation, at the cost of high implementation complexity.

7. Conclusion

We presented the SnapQueue data structure – a lock-free, concurrent queue implementation with an atomic global transition operation. This transition operation is the building block for the enqueue and dequeue operations, as well as for atomic snapshots and concatenation. We note that the transition operation is not limited to these applications, but can also be used for, e.g., an atomic clear, size retrieval or a reverse operation. The SnapQueue stores its state in a persistent support data structure, and the transition operation leverages this to improve performance.

Although we did not explicitly show lock-freedom, we note that lock-freedom can be easily proved by showing that each failing CAS implies the success of another concurrent

operation, and that each state change is a finite number of instructions apart.

We analyzed the running time and contention. This analysis shed light on several important design ideas. First, the contention between the producers and the consumers is an inverse function of the number of elements in the queue. Second, the cost of periodic segment reallocation is amortized by the segment length L – optimizing the SnapQueue is a matter of finding the optimal value for L. Third, SnapQueue performance depends on the underlying persistent support structure. We described Conc-Trees as a concrete support data structure example, with O(1) prepend and append, and O(log n) concatenation.

Most importantly, we showed that the SnapQueue does not incur any performance penalties by having atomic snapshot operations. The SnapQueue has similar, and in some cases better, performance compared to other concurrent queues.

This paper revealed several benefits of using persistent data structures when implementing global-state operations. We saw that, in conjunction with an atomic snapshot, it is sufficient to piggyback on the support structure to achieve the desired running time. This indicates that persistent data structures are not only relevant for functional programming, but are also crucial for concurrent data structure design.

Acknowledgments

We would like to thank prof. Doug Lea from SUNY Oswego for his feedback and useful suggestions.

References

[1] Akka documentation, 2015. http://akka.io/docs/.
[2] SnapQueue implementation, 2015. https://github.com/storm-enroute/reactive-collections/tree/master/reactive-collections-core.
[3] G. M. Adelson-Velsky and E. M. Landis. An algorithm for the organization of information. Doklady Akademii Nauk SSSR, 146:263–266, 1962.
[4] Y. Afek, N. Shavit, and M. Tzafrir. Interrupting snapshots and the Java™ size method. J. Parallel Distrib. Comput., 72(7):880–888, July 2012.
[5] E. Allen, D. Chase, J. Hallett, V. Luchangco, J.-W. Maessen, S. Ryu, G. Steele, and S. Tobin-Hochstadt. The Fortress Language Specification. Technical report, Sun Microsystems, Inc., 2007.
[6] N. G. Bronson, J. Casper, H. Chafi, and K. Olukotun. A practical concurrent binary search tree. SIGPLAN Not., 45(5):257–268, Jan. 2010.
[7] C. Click. Towards a scalable non-blocking coding style, 2007.
[8] A. Georges, D. Buytaert, and L. Eeckhout. Statistically rigorous Java performance evaluation. SIGPLAN Not., 42(10):57–76, Oct. 2007.
[9] P. Haller, A. Prokopec, H. Miller, V. Klang, R. Kuhn, and V. Jovanovic. Scala Improvement Proposal: Futures and Promises (SIP-14). 2012.
[10] M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2008.
[11] R. Hinze and R. Paterson. Finger trees: A simple general-purpose data structure. J. Funct. Program., 16(2):197–217, Mar. 2006.
[12] H. Kaplan and R. E. Tarjan. Purely functional representations of catenable sorted lists. In Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, pages 202–211, Philadelphia, Pennsylvania, USA, May 1996.
[13] D. Lea. Doug Lea's workstation, 2014.
[14] M. M. Michael and M. L. Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In PODC, pages 267–275, 1996.
[15] M. Moir and N. Shavit. Concurrent data structures. In Handbook of Data Structures and Applications, D. Metha and S. Sahni, editors, pages 47-14 – 47-30. Chapman and Hall/CRC Press, 2007.
[16] C. Okasaki. Purely Functional Data Structures. Cambridge University Press, New York, NY, USA, 1998.
[17] A. Prokopec. Data Structures and Algorithms for Data-Parallel Computing in a Managed Runtime. PhD thesis, IC, EPFL, Lausanne, 2014.
[18] A. Prokopec. ScalaMeter benchmarking suite website, 2014. http://scalameter.github.io.
[19] A. Prokopec, N. G. Bronson, P. Bagwell, and M. Odersky. Concurrent tries with efficient non-blocking snapshots. In PPoPP, pages 151–160, 2012.
[20] A. Prokopec, H. Miller, T. Schlatter, P. Haller, and M. Odersky. FlowPools: A lock-free deterministic concurrent dataflow abstraction. In LCPC, pages 158–173, 2012.
[21] W. Pugh. Concurrent maintenance of skip lists. Technical report, College Park, MD, USA, 1990.
[22] W. N. Scherer, D. Lea, and M. L. Scott. Scalable synchronous queues. Commun. ACM, 52(5):100–111, 2009.
[23] T. Schlatter, A. Prokopec, H. Miller, P. Haller, and M. Odersky. Multi-lane FlowPools: A detailed look. Technical report, EPFL, Lausanne, September 2012.
[24] R. Soulé, M. I. Gordon, S. Amarasinghe, R. Grimm, and M. Hirzel. Dynamic expressivity with static optimization for streaming languages. In Proceedings of the 7th ACM International Conference on Distributed Event-based Systems, DEBS '13, pages 159–170, New York, NY, USA, 2013. ACM.
[25] G. Steele. Organizing functional code for parallel execution; or, foldl and foldr considered slightly harmful. International Conference on Functional Programming (ICFP), 2009.
[26] S. Tasharofi. Efficient testing of actor programs with non-deterministic behaviors. PhD thesis, University of Illinois at Urbana-Champaign, 2014.
[27] M. Thompson, D. Farley, M. Barker, P. Gee, and A. Stewart. Disruptor: High performance alternative to bounded queues for exchanging data between concurrent threads. May 2011.
