Generation-Friendly Eq Hash Tables

Abdulaziz Ghuloum and R. Kent Dybvig
Department of Computer Science, Indiana University, Bloomington, IN 47408
{aghuloum,dyb}@cs.indiana.edu

Proceedings of the 2007 Workshop on Scheme and Functional Programming
Université Laval Technical Report DIUL-RT-0701

Abstract

Eq hash tables, which support arbitrary objects as keys, distinguish keys via pointer comparison and often employ hash functions that utilize the address of the object. When compacting garbage collectors move garbage collected objects, the addresses of such objects may change, thus invalidating the hash function computation. A common solution is to rehash all of the entries in an eq hash table on the first access to the table after a collection. For a simple stop-and-copy garbage collector, which moves every element of a hash table, the rehashing overhead is proportional to the amount of work done by the collector and so the cost of rehashing adds only constant overhead. Generational copying collectors, however, may move a few or none of the entries, so rehashing a large table may cost considerably more than the garbage collection run that caused the rehash. In other words, such rehashing is not "generation friendly."

In this paper, we describe an efficient, generation-friendly mechanism for implementing eq hash tables. The amount of work required for rehashing is proportional to the work performed by the collector, as only objects that actually move during a collection are rehashed. The collector supports eq hash tables and their variants via a simple new type of object, a transport link cell, the handling of which by the collector is nearly trivial.

Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features - data structures; D.3.4 [Programming Languages]: Processors - memory management, runtime environments

General Terms Languages

Keywords Scheme, hash tables, generational garbage collection, generation-friendliness

1. Introduction

A hash table (Dumey 1956; Knuth 1998; Cormen et al. 2001) associates a set of keys with a corresponding set of values. A hash table employs a hashing function h that maps keys from the universe of possible keys to a limited range of integer values, which are used as indices into the table. Proper hashing depends on the property that h is a function in the mathematical sense: for every k, h(k) must be the same every time h(k) is computed. Efficient hashing (i.e., O(1) cost for lookup and update operations) relies on the ability of the hashing function to produce uniformly distributed values.

Hash tables can be characterized by the predicates used to distinguish keys. Eq hash tables, which support arbitrary objects as keys, distinguish keys via pointer comparison, e.g., Scheme's eq? predicate or C's == operator. Because keys are distinguished by their address in memory, the most effective eq hash functions are based on those addresses, since the address of an object is different from that of every other object and the addresses are already scattered in memory. Hash functions based on objects' addresses are also the most efficient, since no traversal of the objects' sub-parts is required. Unfortunately, compacting garbage collectors, which are used by many implementations, can change the address of an object at any time, in particular between the time it is entered into an eq hash table and the time it is retrieved. An attempted retrieval based on the new address is likely not to succeed. A common solution to this problem is to rehash all of the entries in an eq hash table on the first access to the table after a garbage collection run. For simple copying collectors, which touch every element of a table in shifting it from old to new space, the rehashing overhead is proportional to the amount of work done by the collector and so adds only constant overhead. Generational copying collectors, however, may move none or only some of the entries. Rehashing a large table may cost considerably more than the garbage collection run that caused the rehash. In other words, such rehashing is not "generation friendly."

In this paper, we describe an efficient, generation-friendly mechanism for implementing eq hash tables. The amount of work required for rehashing objects that move during garbage collection is at most proportional to the work performed by the collector. Consequently, the cost of rehashing is O(1) amortized over the lifetime of the hash table. Furthermore, the mechanism adds no overhead to programs that do not make use of hash tables. In addition, none of the common hash-table operations requires synchronization with the collector, i.e., disabling the collector during the operation. The mechanism is also flexible in that it allows eq hash tables, and variants such as weak eq hash tables or hash tables based only partly on object address, to be created by the user. The collector supports eq hash tables and their variants via a simple new type of object, the transport link cell, which is inspired by transport guardians (Dybvig et al. 1993). Transport link cells are simple to implement efficiently within the collector and thus contribute almost nothing to the complexity of the implementation or the cost of a collection run.

The rest of this paper is organized as follows: Section 2 gives brief background information about hash tables and compacting generational garbage collection techniques. Section 3 describes three existing implementation techniques for hash tables that are found in practice and how each fails to meet our performance criteria. Section 4 presents a prototype solution that implements eq hash tables using transport guardians. Section 5 presents our final solution based on transport link cells and describes their implementation within the collector. Section 6 analyzes the performance of our implementation. To illustrate the flexibility of our approach, Section 7 presents some possible user-space extensions to the basic hash-table mechanism that employ the same transport link cell mechanism. Finally, Section 8 presents our conclusions and discusses future research directions.

2. Background

This section gives a brief overview of chained hash tables, generational garbage collection, and the problems associated with the interaction of the two subsystems. Readers who are familiar with these topics can skip this section without loss of continuity.

2.1 Chained hash tables

A chained hash table (see Figure 1) is a data structure with the following components:

• Hash table descriptor: The descriptor contains information about the hash table such as the number of elements contained [...]

[Figure 1 diagram: a descriptor (usage : int, table : ptr, length : int), a vector of buckets, and chain links (key : ptr, val : ptr, next : ptr).]

Figure 1. Chained Hash Tables: A hash table is represented as a descriptor holding (a) a counter indicating the number of key/value pairs stored in the table, (b) a pointer to a vector of buckets, and (c) the number of buckets in the vector. A (possibly empty) linked list of key/value pairs is stored in each bucket.

The set of required operations on hash tables varies among programming languages and their implementations. At a minimum, a hash table implementation must support the following operations:

• (hash-table-put! ht k v): associates the value v with the key k in the hash table ht.
• (hash-table-get ht k d): retrieves the value associated with the key k in the hash table ht. If k does not exist in the table, the default value d is returned.
• (hash-table-delete ht k): removes any association of the key k in the hash table ht.

To find the value associated with a key in a hash table, the index of the bucket containing the key is obtained by computing idx = h(k) mod length(ht). The chain stored at bucket idx is then searched sequentially by comparing the key of each chain link to the key requested.

Many programming languages (e.g., Common LISP (American National Standards Institute and Information Technology Industry Council 1996), Perl (Wall et al. 1996), Python (Rossum 1992), Scheme (Sperber et al. 2007), etc.) require additional operations on hash tables such as:

• (hash-table-member? ht k): returns a boolean value indicating whether the hash table ht contains a binding for the key k.
• (hash-table-keys ht): returns the list of keys stored in the hash table.
• (hash-table-map ht p): applies the procedure p to every key/value pair in the hash table.

2.2 Compacting garbage collectors

Implementations of programming languages that support automatic memory management often use garbage collection to reclaim unused heap space (Wilson 1992). A large contiguous block of memory is pre-allocated to serve as an allocation heap from which individual objects are allocated. Exhaustion of the allocation area triggers the garbage collection routine, which moves all live objects (i.e., objects that are reachable through the transitive closure of the program roots, including the registers and the stack) to another area in memory, thus freeing the old allocation area.

Generational garbage collectors separate heap objects into generations based on their age (Lieberman and Hewitt 1983; Sansom and Jones 1993; Appel 1989; Dybvig et al. 1994). Memory is divided into a (typically fixed) number of generations. Objects in younger generations are collected more frequently than those in older generations. New objects are allocated sequentially into the youngest generation (a.k.a. the nursery). Once the nursery is exhausted, the garbage collector is invoked to clean up the nursery by moving the surviving young objects to the first generation.
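
The chained hash table of Section 2.1 and the rehash-on-first-access scheme described in the introduction can be sketched as follows. This is only an illustration under stated assumptions, written in Python rather than Scheme: the names Obj, Link, EqHashTable, move, and the stale flag are hypothetical, an integer addr field stands in for a real memory address, and move stands in for a collector relocating an object. It shows the conventional rehash-everything baseline, not the transport link cell mechanism.

```python
class Obj:
    """Heap object; `addr` is a stand-in for its memory address (assumption)."""
    def __init__(self, addr):
        self.addr = addr


class Link:
    """Chain link holding a key, a value, and the next link in the bucket."""
    def __init__(self, key, val, next_link):
        self.key = key
        self.val = val
        self.next = next_link


class EqHashTable:
    def __init__(self, length=8):
        self.usage = 0                # number of key/value pairs stored
        self.table = [None] * length  # vector of buckets
        self.length = length          # number of buckets
        self.stale = False            # hypothetical flag: keys may have moved

    def _index(self, k):
        # idx = h(k) mod length(ht), taking h(k) to be the key's address
        return k.addr % self.length

    def _rehash_if_stale(self):
        # Baseline scheme: relink every entry, whether or not its key moved.
        if not self.stale:
            return
        old_buckets = self.table
        self.table = [None] * self.length
        for link in old_buckets:
            while link is not None:
                following = link.next
                idx = self._index(link.key)
                link.next = self.table[idx]
                self.table[idx] = link
                link = following
        self.stale = False

    def _find(self, k):
        self._rehash_if_stale()
        link = self.table[self._index(k)]
        while link is not None:
            if link.key is k:         # eq? comparison: object identity
                return link
            link = link.next
        return None

    def put(self, k, v):
        link = self._find(k)
        if link is not None:
            link.val = v
            return
        idx = self._index(k)
        self.table[idx] = Link(k, v, self.table[idx])
        self.usage += 1

    def get(self, k, d):
        link = self._find(k)
        return d if link is None else link.val

    def delete(self, k):
        self._rehash_if_stale()
        idx = self._index(k)
        prev, link = None, self.table[idx]
        while link is not None:
            if link.key is k:
                if prev is None:
                    self.table[idx] = link.next
                else:
                    prev.next = link.next
                self.usage -= 1
                return
            prev, link = link, link.next


def move(obj, new_addr, tables, notify=True):
    """Simulate a collector relocating `obj`; optionally mark tables stale."""
    obj.addr = new_addr
    if notify:
        for ht in tables:
            ht.stale = True
```

If a key is moved without the table being marked stale, a lookup probes the bucket for the new address and misses even though the entry is still stored. Marking the table stale repairs the lookup, but the rehash loop visits every entry even when a single key moved, which is the cost the text describes as not generation friendly.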
