How Caching Affects Hashing∗

Gregory L. Heileman ([email protected]), Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM
Wenbin Luo ([email protected]), Engineering Department, St. Mary's University, San Antonio, TX

∗ We wish to thank Bernard Moret for suggesting this topic to us.

Abstract

A number of recent papers have considered the influence of modern computer memory hierarchies on the performance of hashing algorithms [1, 2, 3]. Motivation for these papers is drawn from recent technology trends that have produced an ever-widening gap between the speed of CPUs and the latency of dynamic random access memories. The result is an emerging computing folklore which contends that inferior hash functions, in terms of the number of collisions they produce, may in fact lead to superior performance because these collisions mainly occur in cache rather than main memory. This line of reasoning is the antithesis of that used to justify most of the improvements that have been proposed for open address hashing over the past forty years. Such improvements have generally sought to minimize collisions by spreading data elements more randomly through the hash table. Indeed, the name "hashing" itself is meant to convey this notion [12]. However, the very act of spreading the data elements throughout the table negatively impacts their degree of spatial locality in computer memory, thereby increasing the likelihood of cache misses during long probe sequences. In this paper we study the performance trade-offs that exist when implementing open address hash functions on contemporary computers. Experimental analyses are reported that make use of a variety of different hash functions, ranging from linear probing to highly "chaotic" forms of double hashing, using data sets that are justified through information-theoretic analyses. Our results, contrary to those in a number of recently published papers, show that the savings gained by reducing collisions (and therefore probe sequence lengths) usually compensate for any increase in cache misses. That is, linear probing is usually no better than, and in some cases performs far worse than, double hash functions that spread data more randomly through the table. Thus, for general-purpose use, a practitioner is well advised to choose double hashing over linear probing. Explanations are provided as to why these results differ from those previously reported.

1 Introduction.

In the analysis of data structures and algorithms, an abstract view is generally taken of computer memory, treating it as a linear array with uniform access times. Given this assumption, one can easily judge two competing algorithms for solving a particular problem: the superior algorithm is the one that executes fewer instructions. In real computer architectures, however, the assumption that every memory access has the same cost is not valid. We will consider the general situation of a memory hierarchy that consists of cache memory, main memory, and secondary storage that is utilized as virtual memory. As one moves further from the CPU in this hierarchy, the memory devices become slower while their capacities become larger. Currently, the typical situation involves cache memory that is roughly 100 times faster than main memory, and main memory that is roughly 10,000 times faster than secondary storage [10]. In our model of the memory hierarchy, we will treat cache memory as a unified whole, making no distinction between L1 and L2 cache.

It is well known that the manner in which memory devices are utilized during the execution of a program can dramatically impact the performance of the program. That is, it is not just the raw quantity of memory accesses that determines the running time of an algorithm; the nature of these memory accesses is also important. Indeed, it is possible for an algorithm with a lower instruction count that does not effectively use cache memory to actually run slower than an algorithm with a higher instruction count that does make efficient use of cache. A number of recent papers have considered this issue [1, 2, 13, 14], and the topic of "cache-aware" programming is gaining in prominence [4, 6, 15].

In this paper we conduct analyses of open address hashing algorithms, taking into account the memory access patterns that result from implementing these algorithms on modern computing architectures. We begin in Section 2 with a description of open address hashing in general, and then we present three specific open address hashing algorithms: linear probing, linear double hashing, and exponential double hashing. We discuss the memory access patterns that each of these algorithms should produce when implemented, with particular consideration given to interactions with cache memory. These access patterns have a high degree of spatial locality in the case of linear probing, and a much lower degree of spatial locality in the case of double hashing. Indeed, the exponential double hashing algorithm that we describe creates memory access patterns very similar to those of uniform hashing, which tends to produce the optimal situation in terms of memory accesses, but is also the worst case in terms of spatial locality.

In Section 3 previous research results dealing with open address hashing are presented. First we describe theoretical results showing that double hashing closely approximates uniform hashing, under the assumption that data elements are uniformly distributed. It was subsequently shown experimentally that the performance of linear double hashing diverges significantly from that of uniform hashing when skewed data distributions are used, and that in the case of exponential double hashing this divergence is not as severe [20]. Thus, exponential double hashing is the best choice if the goal is to minimize the raw number of memory accesses. These results, however, do not consider how cache memory may impact performance. Thus, in Section 3 we also review a number of recent experimental results that have been used to make the case that, due to modern computer architectures, linear probing is generally a better choice than double hashing. We provide detailed analyses of these studies, pointing out the flaws that lead to this erroneous conclusion.

It is important to note that dynamic dictionary data sets are often too small to be used directly to test cache effects. For example, the number of unique words in the largest data set available in the DIMACS Dictionary Tests challenge, joyce.dat, will fit in the cache memory of most modern computers. Thus, in Section 4 we describe an information-theoretic technique that allows us to create large synthetic data sets that are based on real data sets such as joyce.dat. These data sets are therefore realistic, yet large enough to create the cache effects necessary to study the performance of linear probing and double hashing on modern architectures. This is followed in Section 5 by a description of a set of experiments that used these synthetic data sets to measure the actual performance of the aforementioned hashing algorithms. Three important performance factors considered are the number of probes (and therefore memory accesses) required during the search process, the number of cache misses generated by the particular probe sequence, and the load factor of the hash table. In addition, we also consider how the size of the hash table, and the size of the data elements stored in the hash table, affect performance. In general, hash functions that produce highly random probe sequences lead to shorter searches, as measured by the number of memory accesses, but are also more likely to produce cache misses while processing these memory access requests. Our experiments were aimed at quantifying the net effects of these competing phenomena, along with the impact of the other performance factors we have just mentioned, on the overall performance of open address hashing algorithms. The results, presented in Section 5, show that it is difficult to find cases, either real or contrived, whereby linear probing outperforms double hashing. However, it is easy to find situations where linear probing performs far worse than double hashing, particularly when realistic data distributions are considered.

2 Open Address Hashing.

We assume hashing with open addressing is used to resolve collisions. The data elements stored in the hash table are assumed to be from the dynamic dictionary D. Each data element x ∈ D has a key value k_x, taken from the universe U of possible key values, that is used to identify the data element. The dynamic dictionary operations of interest include:

• Find(k, D). Returns the element x ∈ D such that k_x = k, if such an element exists.
• Insert(x, D). Adds element x to D using k_x.
• Delete(k, D). Removes the element x ∈ D that satisfies k_x = k, if such an element exists.

When implementing these operations using a hash table with m slots, a table index is computed from the key value using an ordinary hash function, h, that performs the mapping h : U → Z_m, where Z_m denotes the set {0, 1, ..., m − 1}. Hashing with open addressing uses the mapping H : U × Z → Z_m, where Z = {0, 1, 2, ...}, and produces the probe sequence ⟨H(0, k), H(1, k), H(2, k), ...⟩ [18]. For a hash table containing m slots, there can be at most m unique elements in a probe sequence. A full-length probe sequence visits all m hash table locations using only m probes.
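The dictionary operations and probe-sequence definitions above translate directly into code. The following minimal sketch is ours, not the authors' implementation: it parameterizes the table by a probe function H(k, i), so any of the probing families discussed later can be plugged in.

```python
# A minimal open-addressing dictionary sketch (ours, for illustration):
# a probe function H(k, i) generates the probe sequence, and insert/find
# walk that sequence until an empty slot or a matching key is found.

class OpenAddressTable:
    def __init__(self, m, probe):
        self.m = m                 # number of slots
        self.probe = probe         # probe(k, i) -> slot for probe number i
        self.slots = [None] * m

    def insert(self, key, value):
        for i in range(self.m):
            j = self.probe(key, i)
            if self.slots[j] is None or self.slots[j][0] == key:
                self.slots[j] = (key, value)
                return i + 1       # probes used for this insertion
        raise RuntimeError("table full or probe sequence not full length")

    def find(self, key):
        for i in range(self.m):
            j = self.probe(key, i)
            if self.slots[j] is None:
                return None        # an empty slot ends the probe sequence
            if self.slots[j][0] == key:
                return self.slots[j][1]
        return None

# Linear probing with c = 1: keys 5 and 18 collide (18 mod 13 = 5), so 18
# is placed in the next sequential slot.
m = 13
t = OpenAddressTable(m, lambda k, i: (k % m + i) % m)
t.insert(5, "a")
t.insert(18, "b")
print(t.find(18))   # -> b
```

Deletion is omitted here because open addressing requires tombstone marking rather than simply emptying a slot; the search logic above is the part relevant to the probe-count analyses that follow.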
One of the key factors affecting the length of the probe sequence needed to implement any of the dynamic dictionary operations is the load factor, α, of the hash table, which is defined as the ratio of the number of data elements stored in the table to the number of slots, m, in the hash table.

Next, we describe three specific probing strategies. For each of these, we first describe the generic family of hash functions associated with the strategy, and then we consider the particular implementations used in our experiments.

2.1 Linear Probing. The family of linear hash functions (linear probing) can be expressed as

(2.1) H_L(k, i) = (h(k) + ci) mod m,

where h(k) is an ordinary hash function that maps a key to an initial location in the table, i = 0, 1, ... is the probe number, and c is a constant. This technique is known as linear probing because the argument of the modulus operator is linearly dependent on the probe number. In general, c needs to be chosen so that it is relatively prime to m in order to guarantee full-length probe sequences. For the most common case, when c = 1, this hash function simply probes the sequential locations ⟨h(k) mod m, (h(k) + 1) mod m, ..., m − 1, 0, 1, ..., (h(k) − 2) mod m, (h(k) − 1) mod m⟩ in the hash table. Whenever we use H_L in this paper we are assuming c = 1.

The limitations of linear probing are well known, namely that it tends to produce primary and secondary clustering. Note, however, that this clustering problem is counterbalanced by the fact that when c = 1, the probing that must be performed when implementing any of the dynamic dictionary operations will be to successive locations, modulo m, in the hash table. Thus, if the hash table is implemented in the standard way, using an array that maps hash table locations to successive locations in computer memory, then the memory accesses associated with these probes should lead to a high percentage of cache hits.

The ordinary hash function h(·) used in equation (2.1) has a dramatic impact on the performance of linear probing. A common choice is

(2.2) h(k) = k mod m.

If the keys are uniformly distributed, then statistically this ordinary hash function is optimal in the sense that it will minimize collisions. This is because for each key the initial probe into the hash table will follow the uniform distribution. When using equation (2.2) with linear probing, however, performance degrades rapidly as the key distribution diverges from the uniform distribution. Two approaches are commonly used to address this problem.

First, one can apply a randomizing transformation to the keys prior to supplying them to equation (2.2). This is actually a natural step to take in many applications. For example, consider compiler symbol tables, where strings must be converted into numeric key values in order to "hash" them into the table. One such popular algorithm, called hashPJW(), takes a string as input and outputs an integer in the range [0, 2^32 − 1] [1]. The transformation performed by hashPJW() tends to do a good job of producing numbers that appear uniform over this range, even when the strings being hashed are very similar.

A second approach involves using a more complicated ordinary hash function h(·) so that the initial probe into the hash table is more random. In addition, by randomizing the choice of h(·) itself, we can guarantee good average-case performance (relative to any fixed ordinary hash function) through the use of universal hashing [5]. A family of ordinary hash functions η is said to be universal if for each pair of distinct keys k_α, k_β ∈ U, the number of hash functions h ∈ η for which h(k_α) = h(k_β) is at most |η|/m. For example, the following is a universal hash function:

(2.3) h(k) = ((ak + b) mod p) mod m,

where a ∈ Z_p^* and b ∈ Z_p are chosen at random, Z_p^* denotes the set {1, 2, ..., p − 1}, and p is a prime number large enough so that every possible k ∈ U is in [0, p − 1]. Thus, for fixed p and m, there are p(p − 1) different hash functions in this family. We will make use of this family of universal ordinary hash functions in subsequent experiments.

2.2 Linear Double Hashing. Given two ordinary hash functions h(k) and g(k), the family of linear double hash functions can be written as

(2.4) H_LD(k, i) = (h(k) + i·g(k)) mod m.

For a particular key k, the values of h(k) and g(k) in the previous equation can be thought of as constants. That is, once a key is selected for use in probing the hash table, the probe sequence is determined by H_LD(k, i) = (c_h + i·c_g) mod m, where c_h = h(k) and c_g = g(k). This is the same form as linear probing described in equation (2.1). Thus, if c_g is relatively prime to m, all probe sequences are guaranteed to be full length. That is, g(k) should be chosen in such a way that the values it produces are always relatively prime to m. The easiest way to assure this is to choose m as a prime number, so that any value of g(k) in the range [1, m − 1] will be relatively prime to m.

The key advantage of linear double hashing over linear probing is that it is possible for both h(k) and g(k) to vary with k. Thus, in H_LD the probe sequence depends on k through both h(k) and g(k), and is linear in h(k) and g(k). A widely used member of H_LD proposed by Knuth [12] is h(k) = k mod m and g(k) = k mod (m − 2), with m prime. In this paper we will assume the use of these ordinary hash functions with m prime whenever we mention H_LD.
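The relationship between equations (2.1), (2.3), and (2.4) can be made concrete with a few lines of code. The sketch below is our own illustration, using the ordinary hash functions assumed in the text (h(k) = k mod m and g(k) = k mod (m − 2), with m prime); it prints the first few probe locations for one key, showing linear probing's successive slots versus double hashing's key-dependent stride.

```python
import random

# Concrete probe computations for the hash function families above (ours).

def H_L(k, i, m):                   # equation (2.1) with c = 1
    return (k % m + i) % m

def H_LD(k, i, m):                  # equation (2.4); stride g(k) = k mod (m - 2)
    # (a practical implementation must also avoid g(k) = 0, i.e. k a
    # multiple of m - 2, which would make every probe land on h(k))
    return (k % m + i * (k % (m - 2))) % m

def make_universal(p, m, rng=random):
    # equation (2.3): h(k) = ((a k + b) mod p) mod m, a in Z_p^*, b in Z_p
    a, b = rng.randrange(1, p), rng.randrange(p)
    return lambda k: ((a * k + b) % p) % m

m, k = 11, 24                       # m prime; h(24) = 2, g(24) = 24 mod 9 = 6
print([H_L(k, i, m) for i in range(5)])    # successive slots: [2, 3, 4, 5, 6]
print([H_LD(k, i, m) for i in range(5)])   # stride 6:         [2, 8, 3, 9, 4]

h = make_universal(10007, m)        # p = 10007 is prime
print(0 <= h(k) < m)                # -> True
```

Because m = 11 is prime and g(24) = 6 is nonzero, the H_LD sequence above is full length: eleven probes visit all eleven slots.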
2.3 Exponential Double Hashing. The exponential double hashing family was initially proposed by Smith, Heileman, and Abdallah [20], and subsequently improved by Luo and Heileman [17]. The improved version that we will consider is given by

(2.5) H_E(k, i) = (h(k) + a^i·g(k)) mod m,

where a is a constant. It has been shown that if m is prime, and a is a primitive root of m, then all probe sequences produced by this hash function are guaranteed to be full length [17]. A member of H_E that performs well is obtained by choosing the ordinary hash functions h(k) = k mod m and g(k) = k mod (m − 2) in equation (2.5), with m once again prime. In this paper we will assume the use of these ordinary hash functions with m prime whenever we mention H_E.

Let us consider the types of memory access patterns that should result from the proper implementation of hash functions from the families H_L, H_LD and H_E. Specifically, members of H_L will probe successive locations in a hash table, and therefore on average should produce more cache hits per probe than members from the other two hash function families. Furthermore, members of H_E tend to produce more random probe sequences than members of H_LD. Thus, of the three families of hash functions considered, members of H_E should produce the fewest number of cache hits per probe. The randomness of the probe sequences produced by these hash function families is more rigorously considered in [17].

3 Previous Research.

Before discussing our experimental design and results, it is important to consider some of the previous results on this topic that have appeared in the open literature. We start in Section 3.1 by describing results dealing with the randomness of double hashing. These results focus solely on the average probe length of double hashing algorithms, ignoring cache effects. Given the complexities associated with the analysis of programs executing on real computers, particularly when the memory hierarchy is considered, it is difficult to arrive at meaningful theoretical results that include cache effects. Thus, in Section 3.2 we describe some experimental results that measure how the use of cache memory affects the performance of open address hash functions. We previously described the important trade-off that should be used to evaluate results in this area: whether the savings gained through the reduction in average probe sequence length due to the use of a more "random" hash function is lost due to the corresponding increase in the number of cache misses per probe. Both of the experimental studies described in Section 3.2 seem to validate this claim, thereby leading to the previously mentioned computing folklore regarding linear probing; however, we also describe in Section 3.2 the flaws in these experiments that have led to erroneous conclusions.

3.1 Randomness of Double Hashing. We have already mentioned that if the goal is to minimize the total number of memory accesses, ignoring cache effects, then from a probabilistic perspective, the ideal case for open address hashing is uniform hashing [21, 22]. A uniform hash function always produces probe sequences of length m (in the table space), with each of the m! possible probe sequences being equally likely. The obvious way of implementing a uniform hash function involves the generation of independent random permutations over the table space for each key k ∈ U; however, the computational costs associated with this strategy make it completely impractical. Thus, practical approximations to uniform hashing have been sought. With this in mind, Knuth [12] noted that any good hash function should: (1) be easy to compute, involving only a few simple operations, and (2) spread the elements in random fashion throughout the table space. Only those hash functions satisfying these two conditions will be practical for use in implementing dynamic dictionaries. It is not difficult to see that these two conditions are conflicting; it is in general difficult to satisfy one while simultaneously maintaining the other. Knuth [12] has noted that the double hashing strategy, and in particular the one described by equation (2.4), strikes a reasonable balance between these competing conditions. Furthermore, it has been shown through probabilistic analyses that this hash function offers a reasonable approximation to uniform hashing [9, 16]. Let us consider these results in more detail.

It can be shown (see [12]) that the average probe length for uniform hashing is 1/(1 − α) + O(1/m). Guibas and Szemeredi [9] showed that the average probe length for double hashing is asymptotically equivalent to that of uniform hashing for load factors up to approximately 0.319. Later, Lueker and Molodowitch [16] extended this result to load factors arbitrarily close to 1. Both of these results are only valid under a strong uniformity assumption. Specifically, they assume that for any key k ∈ U,

(3.6) Pr{(h(k), g(k)) = (i, j)} = 1/(m(m − 1))

for all (i, j), with 0 ≤ i, j ≤ m − 1. Thus, these results only hold under the assumption that the keys will produce hash value pairs that are jointly uniformly distributed over the table space. This is a strong assumption indeed, one that is influenced both by the initial data distribution and by the choices of h(k) and g(k). Most data sets are far from uniform, and the widely used choices for h(k) and g(k), e.g., those discussed previously, would have to be considered poor choices if our goal is to satisfy equation (3.6).

3.2 Hashing with Cache Effects. First let us consider the experiments described in Binstock [1], which contains a good description of the practical uses of hashing, along with some references to useful ordinary hash functions that can be used to convert character strings into integers. Two experiments were performed; both involved allocating a large block of memory, and executing as many memory reads as there are bytes in the block. In the first experiment these bytes are read randomly, and in the second experiment the bytes are read sequentially. Predictably, the second experiment runs significantly faster than the first. The first experiment, however, does not consider the reduction in average probe length that should accompany the random memory reads, and thus no conclusions should be drawn with regard to the superiority of any one hashing method based on these experiments.

Black, Martel, and Qi have studied the effects of cache memory on basic graph and hashing algorithms [2]. In their study of hashing, they considered three basic collision resolution strategies, two of them based on open addressing. One of the open addressing strategies considered was linear probing, as we have described it in Section 2.1 with c = 1. The other was a double hashing algorithm, the details of which are not provided in their paper. However, the authors do refer readers to a website containing a link to the full version of the paper [19], and this paper provides the details of their double hashing algorithm. Specifically, the authors introduced the hash function

(3.7) H_BMQ(k, i) = (h(k) + i(c − g(k))) mod m,

with h(k) = k mod m and g(k) = k mod c, where c is a prime less than m. Note that for c = 1, this hash function reduces to linear probing. The authors mention that they experimented with different values for c, finding that small values led to better performance (we will show later that this choice in fact leads to poor performance). In all of the experiments described in the paper, it appears that m was always selected as some power of 2. Given this double hash function, the authors' experimental results indicated that linear probing tends to outperform double hashing, particularly when the load factor is less than 80% and the entire hash table does not fit in L1 cache. These results are quite different from the results we obtained using the double hashing functions described in Section 2.2. The difference is explained by the choice of the double hash function used in [2]. Specifically, based on the discussion provided in Section 2 of this paper, it is clear that a simple way to guarantee full-length probe sequences in the hash function of equation (3.7) is to make c − g(k) and m relatively prime to one another. The unfortunate choice of m as a power of 2, however, means that this condition will be violated whenever c − g(k) is even. Through experimentation, we have also found that the choice of c is critical, and that the performance of hash function (3.7) is worst when c ≪ m, and best when c ≈ m. In [2] it was the case that c ≪ m. For instance, one set of experiments used c = 43 and m = 2^22. Choosing m prime and c ≈ m should lead to a reduction in the number of collisions produced by H_BMQ. Indeed, with these choices it seems that the performance of H_BMQ should be very similar to that of H_LD.

In order to demonstrate the severity of the problem of choosing c small in equation (3.7), we conducted experiments that measured the number of probes per insertion generated by H_L, H_LD, H_E, and H_BMQ when inserting α·m uniformly distributed (i.e., initial probes are uniform) data elements into an initially empty table containing m slots. Note that this would be the same number of probes required to successfully search for all of these data elements after they have been inserted into the hash table. The results for m = 400,009 are shown in Table 1. Each entry was obtained by averaging over three separate experiments. The column under H*_BMQ corresponds to a choice of c = 43, while for H†_BMQ the choice was c = 400,007.
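The coverage problem described above for m a power of 2 is easy to check numerically. The sketch below is ours; the parameter values are illustrative rather than those used in [2], apart from m being a power of 2. It counts how many distinct slots H_BMQ of equation (3.7) visits in m probes: whenever the stride c − g(k) shares a factor with m, the sequence cycles before reaching most of the table.

```python
from math import gcd

m = 1024                        # a power of 2, as in the experiments of [2]
c = 3                           # a small prime, c << m (illustrative value)

def H_BMQ(k, i):                # equation (3.7) with h(k) = k mod m, g(k) = k mod c
    return (k % m + i * (c - k % c)) % m

def coverage(k):
    # number of distinct slots visited in the first m probes for key k
    return len({H_BMQ(k, i) for i in range(m)})

# The stride c - g(k) determines coverage: the sequence visits exactly
# m / gcd(c - g(k), m) slots before cycling.
for k in (5, 7):
    stride = c - k % c
    print(k, stride, coverage(k), m // gcd(stride, m))
# k = 5: g(5) = 2, stride 1 (odd)  -> all 1024 slots (full length)
# k = 7: g(7) = 1, stride 2 (even) -> only 512 slots can ever be probed
```

With m prime instead (as in our Table 1 experiments), every nonzero stride is relatively prime to m and full-length sequences are guaranteed, which is why the remaining weakness of a small c there is clustering rather than incomplete coverage.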
α      H_L    H_LD   H_E    H*_BMQ    H†_BMQ
0.1    1.06   1.06   1.06     1.06      1.11
0.2    1.12   1.12   1.12     1.16      1.23
0.3    1.21   1.20   1.20     1.32      1.36
0.4    1.33   1.31   1.30    14.89      1.52
0.5    1.50   1.44   1.43    40.78      1.70
0.6    1.75   1.61   1.59    63.67      1.92
0.7    2.16   1.86   1.81    87.23      2.21
0.8    2.98   2.23   2.13   117.31      2.62
0.9    5.38   2.92   2.73   153.13      3.35

Table 1: A comparison of the average number of probes per insertion generated by the hash functions H_L (using equation (2.2)), H_LD, H_E, and H_BMQ on uniformly distributed data as the load factor, α, ranges from 10% to 90%. The number of slots in the table for each of these was m = 400,009. For H*_BMQ, c = 43, and for H†_BMQ, c = 400,007.

Note that the average number of probes generated by H*_BMQ grows rapidly, relative to H_L, H_LD, and H_E, when the load factor exceeds 30%; however, for H†_BMQ the number of probes is similar to that of H_LD, and it outperforms linear probing when the load factor grows above 70%. Furthermore, the choice of m that we used in these experiments should actually improve somewhat on the results obtained in [2], as it at least guarantees full-length probe sequences. Thus, even if cache effects are not taken into account, linear probing outperforms H*_BMQ, and by taking them into account, linear probing's advantage over H*_BMQ increases. For all of the suitably defined double hash functions, however, it can be seen from the table that there is a load factor above which linear probing is inferior in terms of the number of probes it produces. That is, at load factors above 30%, H_LD and H_E execute fewer probes per insertion than H_L. For these double hash functions it makes sense to consider whether or not this advantage is lost once cache effects are accounted for. In Section 5 we describe experiments that measure this effect, but first we must consider how more realistic input distributions affect the average probe lengths of these hash functions.

4 Data Distributions.

We have already demonstrated in Table 1 that for the uniform distribution, as the load factor increases, the average probe length per insertion for linear probing grows more rapidly than it does for double hashing. This is a well-known fact, and the reasons for this performance have already been discussed in Section 2. However, note that the divergence in the growth rates of the probe lengths is not too severe; for a 90% load factor, the average probe length for linear probing is approximately twice that of exponential hashing. An important question is how much worse this effect becomes when more realistic input distributions are used. A complication arises when attempting to find realistic data sets that can be used to measure this effect, along with cache effects. Specifically, if cache effects are to be measured, then the dynamic dictionary must be large enough so that it will not fit entirely in cache memory. Note that in our memory hierarchy model, there is never an advantage to linear probing if the entire dynamic dictionary fits in cache memory. All of the data sets that we are aware of that have been used to test dynamic dictionary applications are too small; they easily fit in the cache memories of modern computers. Thus, for our experiments we constructed numerous synthetic data sets that are large enough to produce the desired cache effects. In this paper we report only three: one that is optimal and minimizes cache misses for linear probing (the uniform distribution previously considered), and two nonuniform distributions that are meant to capture the notion of real data sets.

The first of the nonuniform distributions is meant to be fairly benign, while the second is meant to demonstrate how poorly linear probing might perform on certain data sets. Specifically, the first nonuniform distribution we refer to as the "clipped Gaussian" distribution N(m/2, m/4)(x), where the initial probe into the table is drawn randomly according to a Gaussian distribution with mean m/2 and standard deviation m/4. Random initial probes outside the range [0, m − 1] are discarded. Thus, the tails of the Gaussian distribution are discarded, and this clipped Gaussian distribution somewhat resembles a uniform distribution in which the probability mass function has a slight "bulge" in the center of the table. The more severe nonuniform distribution is the "clustered" distribution c_β(x), where β ∈ (0, 1] corresponds to the fixed percentage of the hash table in which all initial probes will occur in a random fashion. For example, if β = 0.5, a contiguous region (modulo the table size) that corresponds to 50% of the hash table is chosen at random, and all initial probes will occur uniformly at random in this region.

Given the questions that invariably arise regarding synthetic distributions, and their correspondence to real data sets, we have sought ways to justify the choice of the aforementioned data distributions for experimental purposes. Below we make use of information theory in order to demonstrate, from a probabilistic perspective, that these nonuniform data distributions are in fact reasonable data sets. We start by presenting some information-theoretic measures that show how far a particular distribution is from the uniform distribution. Next we apply these measures to specific real data sets that have been used in other studies, and then we demonstrate that the synthetic data sets we constructed are no worse than these real data sets.
Let X be a random variable with alphabet 𝒳 = {0, 1, . . . , m − 1} and probability mass function p(x) = Pr{X = x}, x ∈ 𝒳. Consider a uniform distribution of data elements over a table of size m; that is, X has a uniform distribution u(x). Then the entropy of X is given by

    H_u(X) = − Σ_{x ∈ 𝒳} p(x) log p(x) = log m,

where all logarithms are assumed to be base 2, and therefore the units of entropy will be in bits. An information-theoretic quantity commonly used to measure the distance between two probability mass functions p(x) and q(x) is the relative entropy (or Kullback–Leibler distance) [7], defined as

    D(p ‖ q) = Σ_{x ∈ 𝒳} p(x) log ( p(x) / q(x) ).

We will use relative entropy as a means of comparing the various data distributions with the ideal case of the uniform distribution.

Consider the clustered distribution c_β(x), which places equal mass 1/(β|𝒳|) on β|𝒳| of the table locations and zero mass on the remainder. The entropy, H_{c_β}(X), of this distribution is given by

    H_{c_β}(X) = − Σ_{x ∈ 𝒳} p(x) log p(x)
               = − β|𝒳| · (1/(β|𝒳|)) log (1/(β|𝒳|))
               = log β|𝒳|.

The relative entropy between c_β(x) and the uniform distribution u is given by

    D(c_β ‖ u) = Σ (1/(βm)) log ( (1/(βm)) / (1/m) )
               = βm · (1/(βm)) log (1/β)
               = − log β.

Thus, the relative entropy between c_{0.5}(x) and u is 1 bit, while the relative entropy between c_{0.3}(x) and u is 1.74 bits.

The entropy of the clipped Gaussian distribution is difficult to determine analytically. Thus, we created experimental probabilities for each table location, and these experimental probabilities were then used to calculate what we refer to as the "experimental entropy" of the distribution. Many of the experiments reported in this paper used hash tables with m = 100,003 and m = 400,009; for these values the experimental entropies H_{N(m/2, m/4)}(X) are 15.65 and 17.65 bits, respectively, while the maximum entropies (corresponding to the uniform distribution) for these table sizes are 16.61 and 18.61 bits, respectively. Furthermore, for both m = 100,003 and m = 400,009, the experimental relative entropy D(N(m/2, m/4) ‖ u) = 0.96 bits.

4.2 Real Data Sets. In this section we consider a number of "real" data sets that are used to help justify the selection of the aforementioned synthetic data sets that we used in our experiments. For each of the real data sets described in this section, we considered the case when duplicate words are ignored, which will produce more uniform input distributions. In each experiment, a hash table that is 1.5 times larger than the number of words in the data set was created, the function hashPJW() was then used to create a key for each word, and a storage location was computed using equation (2.2).¹ Note that this corresponds to the set of initial probes that would occur in filling a table up to a load factor of 66%. As we have described previously, the number of words that hashed to each location was then divided by the total number of words in order to create an experimental probability for that location, and these experimental probabilities were used to calculate the experimental entropy of the data set.

¹ Experimentally we found that applying universal hashing to the output produced by hashPJW() has a negligible effect. That is, hashPJW() does a very good job of transforming skewed input distributions into distributions that are close to uniform.

The Bible. The Bible was used as a data set in [1]. The version that we tested [11] has 817,401 words in total, and 13,838 unique words. For |𝒳| = 13,838 × 1.5, H_u(X) = 14.34 bits, H_{N(m/2, m/4)}(X) = 13.96 bits, and H_{c_0.3}(X) = 12.55 bits. The experimental entropy of the Bible with no repeated words is 13.18 bits, and the experimental relative entropy between this data set and the uniform distribution is 1.16 bits.

United States Constitution. In the United States Constitution there are 7,671 words in total, of which 1,436 are unique. For |𝒳| = 1,436 × 1.5, H_u(X) = 11.07 bits, H_{N(m/2, m/4)}(X) = 10.69 bits, and H_{c_0.3}(X) = 9.29 bits. The experimental entropy of the United States Constitution with no repeated words is 9.86 bits, and the experimental relative entropy between this data set and the uniform distribution is 1.20 bits.

DIMACS Data Set. The largest file in the DIMACS data set for dictionaries is joyce.dat [8]. In this data set, 186,452 dynamic dictionary operations are applied to 8,580 unique words. For |𝒳| = 8,580 × 1.5, H_u(X) = 13.65 bits, H_{N(m/2, m/4)}(X) = 13.28 bits, and H_{c_0.3}(X) = 11.86 bits. The experimental entropy of the unique words in the data set is 12.49 bits, and the experimental relative entropy between this data set and the uniform distribution is 1.16 bits.

Thus, from an information-theoretic perspective, the synthetic distributions N(m/2, m/4) and c_0.3 do a good job of modeling the real data sets in the case when repeated words are not allowed. In particular, for every one of the real data sets, D(N(m/2, m/4) ‖ u) provides a lower bound for the experimental relative entropy between the real data set and the uniform distribution, while D(c_0.3 ‖ u) provides an upper bound for the same quantity.

                direct insertion            universal hashing
    α        u        N       c_0.3       u        N      c_0.3
   0.1     1.06     1.08       1.25     1.11     1.17      1.27
   0.2     1.12     1.20       2.01     1.26     1.29      1.61
   0.3     1.21     1.40     272.18     1.45     1.50      2.04
   0.4     1.33     1.80   20220.80     1.70     1.77      2.62
   0.5     1.50     3.41   40156.14     2.03     2.13      3.36
   0.6     1.75  2170.37   60123.89     2.48     2.63      4.36
   0.7     2.16 11162.53   80138.41     3.17     3.37      5.75
   0.8     2.98 23809.26  100135.75     4.42     4.65      7.91
   0.9     5.38 37860.20  120125.09     7.55     8.67     12.56

Table 2: A comparison of the average number of probes per insertion generated by H_L using equation (2.2), direct insertion, and equation (2.3), universal hashing, for the initial probe. Three distributions were tested: the uniform distribution u, the clipped Gaussian distribution N(m/2, m/4), and the clustered distribution c_0.3, as the load factor, α, ranges from 10% to 90%. The number of slots in the table for each is m = 100,003.
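The closed-form entropies above are straightforward to check numerically. The following sketch is ours, not the paper's code; the table size m = 1,000 (rather than the paper's 100,003 or 400,009) and all helper names are illustrative choices:

```python
import math

def entropy(p):
    """Shannon entropy of a probability mass function, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def relative_entropy(p, q):
    """Kullback-Leibler distance D(p||q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

m = 1000                # illustrative table size; the paper uses m = 100,003 and 400,009
beta = 0.3
k = round(beta * m)     # number of locations that carry probability mass under c_beta

uniform = [1 / m] * m
clustered = [1 / k] * k + [0.0] * (m - k)   # c_beta: uniform over the first beta*m slots

H_u = entropy(uniform)                      # equals log2(m)
H_c = entropy(clustered)                    # equals log2(beta * m)
D = relative_entropy(clustered, uniform)    # equals -log2(beta), about 1.74 bits here
```

Rerunning with β = 0.5 gives D = 1 bit, matching the values quoted in the text.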
For the case when repeated words are allowed, note that D(c_0.01 ‖ u) is less than the experimental relative entropy between the Bible data set with repeated words and the uniform distribution, but greater than the relative entropy between the United States Constitution data set with repeated words and the uniform distribution. Thus, we will use this distribution to study the case when repeated words are allowed.

5 Experimental Design and Results.
In this section we compare the performance of linear probing and double hashing experimentally. From the previous discussions we know that the average search time depends on the average number of probes and the manner in which cache memory is utilized, and that the average number of probes is dependent on the data distribution and the hash function used. Two sets of experiments were designed to measure how these factors influence the performance of hashing algorithms. The first set of experiments, described in Section 5.1, demonstrates how the data distribution influences the average probe length, while the second set of experiments, described in Section 5.2, measures the time required to actually execute these probes on real machines. Thus, the former experiments do not involve cache effects, while the latter experiments account for them.

5.1 Data Distributions Revisited. Table 1 in Section 3.2 showed the average number of probes per insertion generated by the hash functions H_L, H_LD, H_E, and H_BMQ on uniformly distributed keys. For tractability reasons, a uniform data assumption is often used in algorithmic analysis, with the hope that the performance of an algorithm does not diverge significantly from the analytical results if this assumption is violated. For instance, a uniformity assumption underlies the theoretical results described in Section 3.1. Furthermore, since it is difficult to derive data distributions that capture the essence of so-called "real-world" data sets, uniformly distributed data is also commonly used in experimental studies. For example, all of the experiments in Section 3.2 used uniformly distributed keys. Again, the hope is that these experiments demonstrate some performance aspects of the algorithm under consideration that extend in a reasonable way to other distributions, including real-world data sets. In terms of minimizing probe sequence length, it is important to recognize that a uniform data distribution generally represents the optimal stochastic input distribution. In the set of experiments described next, we investigate the performance losses that accompany nonuniform input distributions. These nonuniform input distributions are intended to be more representative of the types of data distributions encountered in practice.

First let us consider the sensitivity of linear probing to the input distribution. Table 2 shows the average number of probes per insertion generated by H_L when using equation (2.2), direct insertion, and equation (2.3), universal hashing, for the initial probe. The results were obtained by averaging over 10 different experiments for the direct insertion method, and 50 different experiments for universal hashing. Three distributions were tested. Notice that the best performance is obtained for the uniform distribution when direct insertion is used; however, when the distribution diverges from the uniform distribution, the performance of direct insertion suffers dramatically. For the case of the clipped Gaussian distribution there is an abrupt change in the average probe length once the load factor exceeds 50%, and for the more severe clustered distribution, a similar abrupt change occurs when the load factor exceeds 20%. Next, consider the right side of the table, where the results for universal hashing are presented. Notice that for the case of the uniform distribution, universal hashing, by modifying the input distribution, actually performs worse than direct insertion; however, as would be expected, it offers significant improvements for the nonuniform distributions. Thus, linear probing has an extreme sensitivity to the input distribution that is supplied to it. In addition, although the theoretical results for universal hashing show that on average it leads to good performance, it also produced the largest variance of all the hash functions in the experiments we conducted (which is why we averaged the results over a larger number of experiments). For example, for the clustered distribution c_0.3 at a load factor of 60%, the minimum probe length was 2.60, the maximum probe length was 14.0, and the standard deviation was 2.20.

Now let us consider similar experiments that include the double hash functions we have previously discussed (for comparison, we have also included the best results for H_L from the previous table). Table 3 lists the average number of probes per insertion generated by the hash functions H_L, H_LD, H_E, and H_BMQ on the nonuniform data distributions N(m/2, m/4) and c_0.3. Once again, each entry corresponds to the insertion of α·m data elements into an initially empty table containing m slots, and the average is taken over 10 independent experiments (50 in the case of H_L with universal hashing). These experiments are more severe in that they generate a greater number of collisions than the uniform case previously considered.

(a) Clipped Gaussian distribution N(m/2, m/4):

    α      H_L    H_LD     H_E    H*_BMQ    H†_BMQ
   0.1    1.17    1.08    1.08      1.10      1.15
   0.2    1.29    1.17    1.17      1.58      1.32
   0.3    1.50    1.27    1.28     15.26      1.52
   0.4    1.77    1.40    1.41     30.11      1.73
   0.5    2.13    1.57    1.57     43.99      1.98
   0.6    2.63    1.78    1.77    223.88      2.27
   0.7    3.37    2.06    2.03   1081.87      2.62
   0.8    4.65    2.48    2.41   2358.50      3.10
   0.9    8.67    3.24    3.07   3769.16      3.90

(b) Clustered distribution c_0.3:

    α      H_L    H_LD     H_E    H*_BMQ    H†_BMQ
   0.1    1.27    1.20    1.18      1.51      1.31
   0.2    1.61    1.46    1.39     22.05      1.61
   0.3    2.04    1.83    1.63     41.65      1.89
   0.4    2.62    2.38    1.91   1937.58      2.19
   0.5    3.36    3.17    2.22   3965.78      2.53
   0.6    4.36    4.37    2.60   5928.22      2.94
   0.7    5.75    5.93    3.07   7936.07      3.52
   0.8    7.91    7.98    3.69   9953.71      4.43
   0.9   12.56   10.69    4.66  11991.79      6.06

Table 3: A comparison of the average number of probes per insertion generated by the hash functions H_L (using equation (2.3)), H_LD, H_E, and H_BMQ on (a) the clipped Gaussian distribution N(m/2, m/4) and (b) the clustered distribution c_0.3, as the load factor, α, ranges from 10% to 90%. The number of slots in the table for each is m = 400,009. For H*_BMQ, c = 43, and for H†_BMQ, c = 400,007.

Hash function H_BMQ was included in these experiments in order to demonstrate its sensitivity to the input distribution if the c parameter is not selected properly. Notice that H*_BMQ has an extreme sensitivity to the input distribution. As compared to its performance on the uniform distribution (shown in Table 1), its performance on the clipped Gaussian distribution at a 70% load factor shows a 12-fold increase in probe sequence length, while for the same load factor the increase in probe sequence length for the clustered distribution is 91 times that of its performance on the uniform distribution. Contrast this with H†_BMQ, where at a load factor of 70%, for both the clipped Gaussian and the clustered distributions, the probe sequence length is not even twice that of the uniform distribution. Thus, H*_BMQ has an extreme sensitivity to the input distribution, while H†_BMQ does not.

What is most interesting to note about the results contained in Table 3 is the divergence in probe sequence lengths between H_L and H_E that starts at a load factor of roughly 50% for both the clipped Gaussian and the clustered distributions. Furthermore, notice how little the average probe sequence length grows for H_E for a given load factor when compared to the optimal case of the uniform distribution shown in Table 1. Indeed, for any α, there is little difference between the probe sequence lengths for H_E in Tables 1, 3(a), and 3(b). The same cannot be said for H_L. It should also be noted that for the case of linear probing, we are taking great care to ensure that the input distribution is randomized, while with double hashing, we are supplying even the skewed data distributions directly to the hashing algorithms without any preprocessing.

These results clearly show the obstacle that linear probing must overcome if it is to outperform double hashing across a wide variety of data distributions; the only means it has for doing so is to make more efficient use of the memory hierarchy. Specifically, for realistic data distributions, the increase in probe sequence lengths associated with linear probing must be counterbalanced by a savings in time due to cache and page hits that grows with the load factor. Given current technology, this is possible. For example, if we assume the worst case for double hashing (every memory access is a cache miss) and the best case for linear probing (every memory access is a cache hit), then the 100-fold penalty assigned to each memory access in the case of double hashing is enough to overcome the raw number of probes required by linear probing as the load factor increases. Next we consider exactly how these counteracting forces play out in real machines.

5.2 Cache Effects. In this section we describe experiments that measure the actual running times of hash functions under uniform and nonuniform data distributions. Four different machine configurations were used in these experiments:

Configuration 1: A Dell Dimension XPS T550 personal computer running the Debian Linux operating system (kernel version 2.4) with a 550 MHz Intel Pentium III processor, 328 MB of main memory, a 16 KB L1 data cache, and a 512 KB L2 cache.

Configuration 2: A Sun Blade 1000 workstation running the Solaris 5.9 operating system with a 750 MHz 64-bit UltraSPARC III processor, 1 GB of main memory, a 96 KB L1 cache (32 KB for instructions and 64 KB for data), and an 8 MB L2 cache.

Configuration 3: A Dell Inspiron I4150 laptop running the Microsoft Windows XP operating system with a 1.70 GHz Mobile Intel Pentium 4 processor, 384 MB of main memory, an 8 KB L1 cache, and a 512 KB L2 cache.

Configuration 4: The same computer as in Configuration 3 running the Mandrake Linux operating system (kernel version 2.4).

For the configurations that used the Linux and Solaris operating systems, the code was compiled using the GNU GCC compiler (configuration 1 used release 2.95.4, configuration 2 used release 2.95.3, and configuration 3 used release 3.2.2) with optimization flag -O3. For the Microsoft Windows XP configuration, the code was compiled using Microsoft Visual C++ 6.0 and optimized for maximum speed.

As with the previous experiments, these experiments involved inserting α·m data elements into an initially empty table containing m slots. In this case, however, the actual time necessary to do so was measured. This time is influenced by the size (in bytes) of the hash table, where the size of the hash table is determined by the product of the data record size and the number of slots m in the table. If the entire hash table fits into cache memory, then the only thing that matters is the average number of probes, and we have already shown that exponential double hashing is the clear winner in this case. If, on the other hand, the entire hash table does not fit into cache memory, then the size of the data records becomes very important. If the data records are made smaller, then more of them will fit simultaneously into cache memory, thereby increasing the probability of cache hits. Thus, in the following experiments we were careful to select hash table/data record sizes that resulted in hash tables that were large enough, but not too large. Specifically, we created hash tables that usually would not fit entirely into cache memory, but not so large as to lead to a significant number of page faults during probing.

The first set of experiments, shown in Figure 1, is the same as those that were used to create the results shown in Table 1. Specifically, keys were generated according to the uniform distribution, and α·m data elements with these keys were stored in an initially empty hash table of size m = 400,009. On each machine configuration, 10 independent experiments were conducted, and the average of these was taken in order to create the results. Figure 1 shows the results for the four machine configurations when 8-byte data records were used. From Table 1 we know that the average number of probes is fairly similar for all of the hash functions tested in this set of experiments. Furthermore, Table 1 showed that the average number of probes is relatively small, even at high load factors. Note in Figure 1 that for this particular combination of data record size and hash table size, linear probing gains an advantage over the double hashing algorithms on each of the machine configurations tested, and that on three of the four configurations the advantage grows with increasing load factor. It is worth noting that in configuration 2, shown in Figure 1(b), which is the Sun system with 8 MB of L2 cache, the advantage of linear probing is smaller, as one would expect. This particular set of experiments, involving the uniform distribution and small data records, is the only case we were able to construct where linear probing consistently outperformed double hashing over all load factors in terms of the actual time taken to perform the dynamic dictionary operations.
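The probe-count behavior summarized in Tables 2 and 3 can be reproduced with a small simulation, independently of any cache effects. The sketch below is not the paper's experimental code: the prime table size m = 4,001, the clustered-key generator (initial probes confined to the first 30% of the table, in the spirit of c_0.3), and the secondary step function used for double hashing are all illustrative choices of ours.

```python
import random

def avg_probes(m, keys, step_fn):
    """Insert the given keys into an initially empty open-address table of m
    slots and return the average number of probes per insertion."""
    table = [None] * m
    total = 0
    for k in keys:
        h = k % m            # initial probe (direct insertion)
        step = step_fn(k)    # 1 for linear probing; key-dependent for double hashing
        probes = 1
        while table[h] is not None:
            h = (h + step) % m
            probes += 1
        table[h] = k
        total += probes
    return total / len(keys)

m = 4001                     # prime, so any step in [1, m-1] eventually visits every slot
n = m // 2                   # fill to roughly a 50% load factor
rng = random.Random(1)

uniform_keys = [rng.randrange(m * m) for _ in range(n)]
clustered_keys = [rng.randrange(int(0.3 * m)) for _ in range(n)]  # c_0.3-style clustering

linear = lambda k: 1
double = lambda k: 1 + k % (m - 1)   # an illustrative secondary hash

lp_uniform = avg_probes(m, uniform_keys, linear)     # near 1.5, as in Table 2
lp_clustered = avg_probes(m, clustered_keys, linear) # long runs: hundreds of probes
dh_clustered = avg_probes(m, clustered_keys, double) # collisions scattered, stays small
```

The absolute numbers depend on these illustrative parameters, but the ordering mirrors the tables: linear probing on uniform keys is cheap, linear probing on clustered keys is orders of magnitude worse, and double hashing on the same clustered keys remains modest.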

[Figure 1 here: four line plots, (a)–(d), of average insertion time (µsec.) versus load factor α (%), comparing linear probing, linear double hashing, and exponential double hashing.]

Figure 1: The average insertion time for keys distributed according to the uniform distribution using a hash table of size m = 400,009, and 8-byte data records. Parts (a)–(d) correspond to configurations 1–4, respectively.

Note, however, that for most load factors the advantage is not very significant, particularly when the time scale is taken into account. Furthermore, if the size of the data record is increased to 256 bytes (and with m = 100,003), our experiments indicate that this advantage is lost, and for each machine configuration there is little performance difference between linear probing and the suitably defined double hash functions.

Next let us consider experiments that measure the cache effects related to the data shown previously in Table 3. Once again we tested the four different machine configurations on two different hash table/data record sizes: (1) m = 400,009 with 8-byte data records, and (2) m = 100,003 with 256-byte data records. Each of the machine configurations produced fairly similar results, with machine configuration 2 slightly more favorable to double hashing due to its larger cache size. In Figure 2 we present the results obtained on machine configuration 3. Once again, note that when the data record size is smaller (8 bytes), as in parts (a) and (c), linear probing with universal hashing does gain a slight advantage over the double hashing algorithms once the load factor exceeds approximately 50%. This is due to the larger number of cache hits per probe that occur in the case of linear probing. Notice, however, that in parts (b) and (d), where larger 256-byte records are used, double hashing tends to slightly outperform linear probing with universal hashing.

6 Conclusions.
In this paper we considered the performance of open address hashing algorithms when implemented on contemporary computers. An emerging computing folklore contends that, in practice, linear probing is a better choice than double hashing, due to linear probing's more effective use of cache memory. Specifically, although linear probing tends to produce longer probe sequences than double hashing, the fact that the percentage of cache hits per probe is higher in the case of linear probing gives it an advantage in practice. Our experimental analysis demonstrates that, given current technology and realistic data distributions, this belief is erroneous. Specifically, for many applications, where the data set largely fits in cache memory, there is never an advantage to linear probing, even when universal hashing is used. If a data set is large enough that it does not easily fit into cache memory, then linear probing has an advantage over suitably defined double hashing algorithms only if the data record size is quite small.

The results provided in the open literature supporting the superiority of linear probing appear to be largely reliant on the assumption of a uniform data distribution. The experimental analysis provided in this paper showed that, assuming a uniform input distribution, it is indeed possible for linear probing to consistently outperform double hashing. On the machine configurations we tested we were able to create some scenarios under which this occurred, but even in these cases the advantage was slight. In particular, if the hash table was made large enough (so that the entire table would not fit in the cache), and the data records small enough (so that a large number of them would reside in cache), then by applying a uniform data distribution, linear probing could be made to outperform double hashing across all load factors. However, the performance gain was not very significant, particularly when it is contrasted with the severe performance loss that accompanies linear probing for nonuniform data distributions if the input distribution is not sufficiently randomized.

In addition, it appears that in the case of the uniform distribution, the distribution itself is largely driving the performance. That is, by using a uniform distribution, there is a high probability that for any reasonably defined open address hash function, only one or two probes will be required to find an empty slot. Thus, the probing algorithm itself is largely unused. Therefore, as long as the uniformity of the data is preserved across the initial probes (this was not the case in one of the double hashing algorithms we considered), any open address hash function works about as well as any other, particularly when the load factor is not too large. Only the use of more realistic nonuniform data distributions causes the underlying characteristics of the probing algorithms to become evident. It was demonstrated that in this case double hashing, and in particular exponential double hashing, is far superior to linear probing. Indeed, even for very highly skewed input distributions the performance of exponential double hashing was little changed from that of the uniform input distribution case, while the running time for linear probing grows much more significantly.

Although the effects of cache memory were explicitly tested in our experiments, we believe that similar results would be obtained in testing the effects of virtual memory on open address hashing. In particular, for uniform (or nearly uniform) data distributions it should be possible to construct cases where linear probing slightly outperforms double hashing due to virtual memory effects. However, this advantage would rapidly disappear when more realistic data distributions are applied. Finally, it should be noted that one of the most common
types of large data sets is the database, and records are typically quite large in databases.

[Figure 2 here: four line plots, (a)–(d), of average insertion time (µsec.) versus load factor α (%), comparing linear probing, linear double hashing, and exponential double hashing.]

Figure 2: The average insertion time for keys distributed according to the nonuniform distributions on machine configuration 3. (a) N(m/2, m/4) with m = 400,009 and 8-byte data records, (b) N(m/2, m/4) with m = 100,003 and 256-byte data records, (c) c_0.3 with m = 400,009 and 8-byte data records, and (d) c_0.3 with m = 100,003 and 256-byte data records.

In summary, for the practitioner, the use of double hashing is clearly a good choice for general-purpose use in dynamic dictionary applications that make use of open address hashing.

References

[1] Andrew Binstock. Hashing rehashed: Is RAM speed making your hashing less efficient? Dr. Dobb's Journal, 4(2), April 1996.
[2] John R. Black, Jr., Charles U. Martel, and Hongbin Qi. Graph and hashing algorithms for modern architectures: Design and performance. In Kurt Mehlhorn, editor, Proceedings of the 2nd Workshop on Algorithm Engineering (WAE'98), Saarbrücken, Germany, August 1998.
[3] Andrei Broder and Michael Mitzenmacher. Using multiple hash functions to improve IP lookups. In Proceedings of the Twentieth Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE INFOCOM 2001), Anchorage, AK, April 2001.
[4] Randal E. Bryant and David O'Hallaron. Computer Systems: A Programmer's Perspective. Prentice-Hall, Inc., Upper Saddle River, NJ, 2003.
[5] J. Lawrence Carter and Mark N. Wegman. Universal classes of hash functions. Journal of Computer and System Sciences, 18(2):143–154, 1979.
[6] Trishul M. Chilimbi, James R. Larus, and Mark D. Hill. Tools for cache-conscious pointer structures. In Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation, Atlanta, GA, May 1999.
[7] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., New York, 1991.
[8] The fifth DIMACS challenge–dictionary tests, Center for Discrete Mathematics & Theoretical Computer Science. http://www.cs.amherst.edu/~ccm/challenge5/dicto/.
[9] L. Guibas and E. Szemeredi. The analysis of double hashing. Journal of Computer and Systems Sciences, 16:226–274, 1978.
[10] John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Francisco, 3rd edition, 2003.
[11] The King James Bible (KJV) download page. http://patriot.net/~bmcgin/kjvpage.html.
[12] Donald E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison-Wesley, Reading, MA, 1973.
[13] Anthony LaMarca and Richard E. Ladner. The influence of caches on the performance of heaps. ACM Journal of Experimental Algorithmics, 1(4):www.jea.acm.org/1996/LaMarcaInfluence/, 1996.
[14] Anthony LaMarca and Richard E. Ladner. The influence of caches on the performance of sorting. In Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'97), pages 370–379, New Orleans, LA, 1997.
[15] Alvin R. Lebeck and David A. Wood. Cache profiling and the SPEC benchmarks: A case study. Computer, 27(10):15–26, 1994.
[16] G. Lueker and M. Molodowitch. More analysis of double hashing. In Proceedings of the 20th Annual ACM Symposium on the Theory of Computing, pages 354–359, 1988.
[17] Wenbin Luo and Gregory L. Heileman. Improved exponential hashing. IEICE Electronics Express, 1(7):150–155, 2004.
[18] W. W. Peterson. Addressing for random access storage. IBM Journal of Research and Development, 1(2):130–146, 1957.
[19] Hongbin Qi and Charles U. Martel. Design and analysis of hashing algorithms with cache effects. http://theory.cs.ucdavis.edu/, 1998.
[20] Bradley J. Smith, Gregory L. Heileman, and Chaouki T. Abdallah. The exponential hash function. ACM Journal of Experimental Algorithmics, 2(3):www.jea.acm.org/1997/SmithExponential/, 1997.
[21] Jeffrey D. Ullman. A note on the efficiency of hash functions. Journal of the ACM, 19(3):569–575, 1972.
[22] Andrew C. Yao. Uniform hashing is optimal. Journal of the ACM, 32(3):687–693, 1985.