GPERF A Perfect Hash Function Generator
Douglas C. Schmidt [email protected]
http://www.cs.wustl.edu/ schmidt/ Department of Computer Science Washington University, St. Louis 63130
1 Introduction a sample input keyfile; Section 4 highlights design patterns and implementation strategies used to develop gperf; Sec- Perfect hash functions are a time and space efficient imple- tion 5 shows the results from empirical benchmarks between mentation of static search sets. A static search set is an ab- gperf-generated recognizers and other popular techniques stract data type (ADT) with operations initialize, insert,and for reserved word lookup; Section 6 outlines the limita- retrieve. Static search sets are common in system software tions with gperfand potential enhancements; and Section 7 applications. Typical static search sets include compiler and presents concluding remarks. interpreter reserved words, assembler instruction mnemonics, shell interpreter built-in commands, and CORBA IDL compil- ers. Search set elements are called keywords. Keywords are 2 Static Search Set Implementations inserted into the set once, usually off-line at compile-time. gperf is a freely available perfect hash function generator There are numerous implementations of static search sets. written in C++ that automatically constructs perfect hash func- Common examples include sorted and unsorted arrays and tions from a user-supplied list of keywords. It was designed in linked lists, AVL trees, optimal binary search trees, digital the spirit of utilities like lex [1] and yacc [2] to remove the search tries, deterministic finite-state automata, and various drudgery associated with constructing time and space efficient hash table schemes, such as open addressing and bucket chain- keyword recognizers manually. ing [5].
gperf translates an n element list of user-specified key- Different static search structure implementations offer words, called the keyfile, into source code containing a k ele- trade-offs between memory utilization and search time effi- ment lookup table and the following pair of functions: ciency and predictability. For example, an n element sorted array is space efficient. However, the average- and worst-case
hash uniquely maps keywords in the keyfile onto the
time complexity for retrieval operations using binary search