Phase-Concurrent Hash Tables for Determinism
Julian Shun, Carnegie Mellon University
Guy E. Blelloch, Carnegie Mellon University

ABSTRACT

We present a deterministic phase-concurrent hash table in which operations of the same type are allowed to proceed concurrently, but operations of different types are not. Phase-concurrency guarantees that all concurrent operations commute, giving a deterministic hash table state: the state of the table at any quiescent point is independent of the ordering of operations. Furthermore, by restricting our hash table to be phase-concurrent, we show that we can support operations more efficiently than previous concurrent hash tables. Our hash table is based on linear probing, and relies on history-independence for determinism.

We experimentally compare our hash table on a modern 40-core machine to the best existing concurrent hash tables that we are aware of (hopscotch hashing and chained hashing) and show that we are 1.3–4.1 times faster on random integer keys when operations are restricted to be phase-concurrent. We also show that the cost of insertions and deletions for our deterministic hash table is only slightly more expensive than for a non-deterministic version that we implemented. Compared to standard sequential linear probing, we get up to 52 times speedup on 40 cores with dual hyper-threading. Furthermore, on 40 cores insertions are only about 1.3× slower than random writes (scatter). We describe several applications which have deterministic solutions using our phase-concurrent hash table, and present experiments showing that using our phase-concurrent deterministic hash table is only slightly slower than using our non-deterministic one and faster than using previous concurrent hash tables, so the cost of determinism is small.

Categories and Subject Descriptors: D.1.3 [Programming Techniques]: Concurrent Programming - Parallel Programming

Keywords: Hash Table, Determinism, Applications

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. SPAA'14, June 23–25, 2014, Prague, Czech Republic. Copyright 2014 ACM 978-1-4503-2821-0/14/06 ...$15.00 http://dx.doi.org/10.1145/2612669.2612687 .

1. INTRODUCTION

Many researchers have argued the importance of deterministic results in developing and debugging parallel programs (see e.g. [32, 6, 8, 25, 3]). Two types of determinism are distinguished: external determinism and internal determinism. External determinism requires that the program produce the same output when given the same input, while internal determinism also requires certain intermediate steps of the program to be deterministic (up to some level of abstraction) [3]. In the context of concurrent access, a data structure is internally deterministic if, even when operations are applied concurrently, the final observable state depends uniquely on the set of operations applied, but not on their order. This property is equivalent to saying the operations commute with respect to the final observable state of the structure [36, 32]. Deterministic concurrent data structures are important for developing internally deterministic parallel algorithms: they allow the structure to be updated concurrently while generating a deterministic result independent of the timing or scheduling of threads. Blelloch et al. show that internally deterministic algorithms using nested parallelism and commutative operations on shared state can be efficient, and their algorithms make heavy use of concurrent data structures [3].

However, for certain data structures, the operations naturally do not commute. For example, in a hash table, mixing insertions and deletions in time would inherently depend on ordering since inserting and deleting the same element do not commute, but insertions commute with each other and deletions commute with each other, independently of value. The same is true for searching mixed with either insertion or deletion. For a data structure in which certain operations commute but others do not, it is useful to group the operations into phases such that the concurrent operations within a phase commute. We define a data structure to be phase-concurrent if subsets of operations can proceed (safely) concurrently. If the operations within a phase also commute, then the data structure is deterministic. Note that phase-concurrency can have other uses besides determinism, such as giving more efficient data structures. It is the programmer's responsibility to separate concurrent operations into phases, with synchronization in between, which for most nested parallel programs is easy and natural to do.

In this paper, we focus on the hash table data structure. We describe a deterministic phase-concurrent hash table and prove its correctness. Our data structure builds upon a sequential history-independent hash table [4] and allows concurrent insertions, concurrent deletions, concurrent searches, and reporting the contents. It does not allow different types of operations to be mixed in time, because commutativity (and hence determinism) would be violated in general. We show that using one type of operation at a time is still very useful for many applications. The hash table uses open addressing with a prioritized variant of linear probing and guarantees that in a quiescent state (when there are no operations ongoing) the exact content of the array is independent of the ordering of previous updates. This allows, for example, quickly returning the contents of the hash table in a deterministic order simply by packing out the empty cells, which is useful in many applications. Returning the contents could be done deterministically by sorting, but this is more expensive. Our hash table can store key-value pairs either directly or via a pointer.

We present timings for insertions, deletions, finds, and returning the contents into an array on a 40-core machine. We compare these timings with the timings of several other implementations of concurrent and phase-concurrent hash tables, including the fastest concurrent open addressing [15] and closed addressing [20] hash tables that we could find, and two of our non-deterministic phase-concurrent implementations (based on linear probing and cuckoo hashing). We also compare the implementations to standard sequential linear probing, and to the sequential history-independent hash table. Our experiments show that our deterministic hash table significantly outperforms the existing concurrent (non-deterministic) versions on updates by a factor of 1.3–4.1. Furthermore, it gets up to a 52× speedup over the (standard) non-deterministic sequential version on 40 cores with two-way hyper-threading. We compare insertions to simply writing into an array at random locations (a scatter). On 40 cores, and for a load factor of 1/3, insertions into our table are only about 1.3× the cost of random writes. This is because most insertions only involve a single cache miss, as does a random write, and that is the dominant cost.

Such a deterministic hash table is useful in many applications. For example, Delaunay refinement iteratively adds triangles to a triangulation until all triangles satisfy some criteria (see Section 5). "Bad triangles" which do not satisfy the criteria are broken up into smaller triangles, possibly creating new bad triangles. The result of Delaunay refinement depends on the order in which bad triangles are added. Blelloch et al. show that using a technique called deterministic reservations [3], triangles can be added in parallel in a deterministic order on each iteration. However, for the algorithm to be deterministic, the list of new bad triangles returned in each iteration must also be deterministic.

[...] that phase-concurrency can be applied to hash tables to obtain both determinism and efficiency. We prove correctness and termination of our deterministic phase-concurrent hash table. Third, we present a comprehensive experimental evaluation of our hash tables with the fastest existing parallel hash tables. We compare our deterministic and non-deterministic phase-concurrent linear probing hash tables, our phase-concurrent implementation of cuckoo hashing, hopscotch hashing, which is the fastest existing concurrent open addressing hash table as far as we know, and an optimized implementation of concurrent chained hashing. Finally, we describe several applications of our deterministic hash table, and present experimental results comparing the running times of using different hash tables in these applications.

2. RELATED WORK

A data structure is defined to be history-independent if its layout depends only on its current contents, and not the ordering of the operations that created it [13, 24]. For sequential data structures, history-independence is motivated by security concerns, and in particular ensures that examining a structure after its creation does not reveal anything about its history. In this paper, we extend a sequential history-independent hash table based on open addressing [4] to work phase-concurrently. Our motivation is to design a data structure which is deterministic independent of the order of updates. Although we are not concerned with exact memory layout, we want to be able to return the contents of the hash table very quickly and in an order that is independent of when the updates arrived. For a history-independent open addressing table, this can be done easily by packing the non-empty elements into a contiguous array, which just involves a parallel prefix sum and cache-block friendly writes.

Several concurrent hash tables have been developed over the