Improving Cuckoo Hashing with Perfect Hashing

IMPROVING CUCKOO HASHING WITH PERFECT HASHING A THESIS IN Computer Science Presented to the Faculty of the University of Missouri-Kansas City in partial fulfillment of the requirements for the degree MASTER OF SCIENCE By MOULIKA CHADALAVADA B. Tech, SASTRA University - Thanjavur, India, 2013 Kansas City, Missouri 2017 ©2017 MOULIKA CHADALAVADA ALL RIGHTS RESERVED ii IMPROVING CUCKOO HASHING WITH PERFECT HASHING Moulika Chadalavada, Candidate for the Master of Science Degree University of Missouri – Kansas City, 2017 ABSTRACT In computer science, the data structure is a systematic way of organizing data such that it can be used efficiently. There are many hashing techniques that aim at storing keys in memory to increase key access efficiency and to make hashing efficient. One option to increase throughput is to use the algorithms based on hashing. Cuckoo Hashing is one among the techniques which provide high memory usage in constant access time. Cuckoo Hashing, in turn, uses many implementations among which Parallel-d-Pipeline is more efficient. Perfect Hashing maps distinct elements to set of integers without any collision. Perfect Hashing is fast and hit ratio is high. There are many other hashing techniques like Perfect Hashing but the reason we choose perfect hashing as it doesn’t require collision resolution mechanism. Cuckoo Hashing has high memory usage in allocating keys to its memory. So, we are combining Cuckoo Hashing and Perfect Hashing to increase the keys hit ratio and memory utilization. Keywords: Cuckoo Hashing, Memory Utilization, Perfect Hashing, Collision Resolution, Hashing iii APPROVAL PAGE The faculty listed below, appointed by the Dean of School of Computing and Engineering, have examined a thesis titled “Improving Cuckoo Hashing with Perfect Hashing” presented by Moulika Chadalavada, candidate for the Master of Science degree, and certify that in their opinion it is worthy of acceptance. Supervisory Committee Yijie Han, Ph.D., Chair School of Computing and Engineering Baek-Young Choi, Ph.D. Department of Computer Science Electrical Engineering Mohammad Kuhail, Ph.D. Department of Computer Science Electrical Engineering iv CONTENTS ABSTRACT . iii ILLUSTRATIONS . .vii TABLES . viii ACKNOWLEDGEMENTS. ix CHAPTER 1. INTRODUCTION. 1 2. CUCKOO HASHING AND ITS IMPLEMENTATION . 3 2.1 Overview . .. .. 3 2.2 Example of Cuckoo Hashing . .. 4 2.3 Different Cuckoo Hashing Implementations . .. 6 2.4 Implementation of Parallel-d-Pipeline . ... .. 8 3. PERFECT HASHING. .10 3.1 Introduction . 10 3.2 Minimal Perfect Hashing . ... 11 4. OUR APPROACH . .. 15 4.1 Introduction . .. ...15 4.2 Allocating Key to Memory . ...16 4.3 Adding New Key to Memory. ... 19 v 5. IMPLEMENTATION. .. 22 5.1 Approach for Implementing Algorithm . ...22 6. ANALYSIS. .. 30 6.1 Comparative Analysis of Cuckoo Hashing and Our Approach. .. .. .. ...30 7. CONCLUSION. .. 32 REFERENCES . 33 VITA . .. .35 vi ILLUSTRATIONS Figure Page 1. Parallel Pipeline Implementation. 7 2. Parallel-d-Pipeline Implementation. 8 3. Illustration of Perfect Hashing. 10 4. Minimal Perfect Hashing. 11 5. Perfect Hashing Example – Intermediate and Value Table. 13 6. Illustration of getting frequency of element. 17 7. Table Indexing to get Memory Module. 17 8. Element lookup in Memory. .18 9. New Element Insertion in Memory. .20 vii TABLES Table Page 1. Cuckoo Hashing Example - Keys and Positions . 4 2. Cuckoo Hashing Example – Inserting Keys in Hash Tables . 5 3. Cuckoo Hashing Example – Moving Key to Secondary Position . 5 4. Cuckoo Hashing Example – Moving Key to Secondary Position2 . 5 5. Cuckoo Hashing Example – Inserting All Keys in Respective Positions . 6 6. Perfect Hashing Indexing Example. 21 viii ACKNOWLEDGEMENTS I would like to take this opportunity to thank following people who have directly or indirectly helped me in academic achievements. Firstly, I would like to thank Dr. Yijie Han for the constant and endearing support which has helped me in fulfilling my thesis. He has provided me with an opportunity to realize my potential in the field of my thesis. His encouragement and inputs were elements of vital guidance in my thesis. He has been a constant source of motivation and challenged me with deadlines, that have contributed to me acquiring inspiration and ideology. His expertise and innovative insights have been phenomenal in completing my thesis. I sincerely thank Dr. Baek-Young Choi and Dr. Mohammad Kuhail for accepting to be a part of my thesis committee and making time for me from their busy schedule. I would like to thank the University of Missouri- Kansas City for providing me with an opportunity to continue my research and supporting me in this regard. I would like to dedicate my thesis to my parents who constantly inspired me to pursue higher studies. I would like to thank my family who stood behind me all these years during my degree. Finally, I would like to thank all my teachers, educational administrators, present, and past and all who helped me achieve this academic goal. ix CHAPTER 1 INTRODUCTION In computer science, the data structure is a systematic way of organizing data such that it can be used efficiently. A good algorithm generally comes with the set of good data structures which allows designed algorithms to efficiently manipulate data. Hash Table is one of the data structure that maps keys to values using the hash function. Hashing is the mechanism that is used for storing and indexing a large amount of data using keys. There are many hashing techniques that aim at storing keys in memory to increase key access efficiency and to make hashing efficient. One option to increase throughput is to use the algorithms based on hashing [2]. Hash Table or Hash Map is used in developing a structure that maps keys to values. Hash Table uses a hash function to compute an index into the array, from which the required values are found. The main disadvantage of Hash Tables is that it maps multiple keys to the same index thus results in the collision in hashing [9]. To handle this issue many hashing techniques are introduced to avoid collisions in allocating memory. Cuckoo Hashing is one of the hash table schemas which provides high memory utilization and constant access time [6]. In this mechanism, every non-empty cell contains a key-value pair. It is a form of open addressing in which every non-empty cell in hash table contains a key-value pair. But open addressing may lead to the collision, this is where Cuckoo Hashing plays its role in resolving collisions. Cuckoo Hashing mainly aims at reducing collisions which might occur when more than one key is mapped to the same cell. It is also used in optimizing the throughput. Cuckoo Hashing uses two hash functions instead of one to reduce collisions. There were many implementations in Cuckoo Hashing such as Serial 1 Implementation, Parallel Implementation, Parallel Pipeline Implementation, Parallel d- Pipeline Implementation [9]. I studied research paper "Parallel d-Pipeline: A Cuckoo Hashing Implementation for Increased Throughput" by S. Pontarelli, P. Reviriego, and J.A. Maestro. In this paper new Cuckoo Hashing implementation called parallel, d-pipeline has been proposed. In this scheme, tables are accessed parallel such that throughput is increased and is also applicable for applications with high speed. Perfect Hash Function maps distinct element of a set S to set of integers with no collision. However, in perfect hashing, the set of keys to be hashed must be provided to create the hash function. In mathematical term, it is a total injective function. This hash function is used in implementing lookup table with constant worst-case access time. There are many hash functions that are like Perfect Hashing, but the main advantage is that no collision resolution should be implemented in Perfect Hashing. My research basically involves in finding the solution to improve Cuckoo Hashing using Perfect Hashing mechanism in terms of memory utilization and allocating memory based on the frequency of keys. 2 CHAPTER 2 CUCKOO HASHING AND ITS IMPLEMENTATION In this chapter, we review paper “Parallel d-Pipeline: A Cuckoo Hashing Implementation for Increased Throughput” by S. Pontarelli, P. Reviriego, and J.A. Maestro. 2.1 Overview Cuckoo Hashing is one of the hash table schemas which provides high memory utilization and constant access time [6]. In this mechanism, every non-empty cell contains a key-value pair. It is a form of open addressing in which every non-empty cell in hash table contains a key-value pair. But open addressing may lead to the collision, this is where Cuckoo Hashing plays its role in resolving collisions. Cuckoo hashing is used for resolving hash collisions for hash function values in a table [9]. It somewhere derives the features of cuckoo species, where the chick pushes young ones out from nest when it hatches. Similarly, in Cuckoo hashing, while inserting new key into the hash table we can push the older key to the table in a different location. To avoid collision instead using one hash function Cuckoo Hashing mechanism uses two hash functions. Because of which for each key two locations are provided in the hash table. It takes constant time in the worst case as the lookup of key requires inspection of two locations in the hash table. In Cuckoo Hashing mechanism inserting a new key first look if one of its two cells are empty or full, when empty then the key will be placed in the available cell. But when both the cells are full, the other keys will be moved to their respective second locations to insert the new key. 3 An element x can be placed in hash tables from 1, 2 …d in positions h1(x), h2(x)… where hi(x)'s are hash functions [9]. The main difference between d-left hashing [1] and Cuckoo hashing is that in d-left hashing new elements cannot be inserted when all positions are occupied because of which memory usage is limited [9].

Load more