Using Tabulation to Implement the Power of Hashing Using Tabulation to Implement the Power of Hashing Talk Surveys Results from I M
Total Page:16
File Type:pdf, Size:1020Kb
Talk surveys results from I M. Patras¸cuˇ and M. Thorup: The power of simple tabulation hashing. STOC’11 and J. ACM’12. I M. Patras¸cuˇ and M. Thorup: Twisted Tabulation Hashing. SODA’13 I M. Thorup: Simple Tabulation, Fast Expanders, Double Tabulation, and High Independence. FOCS’13. I S. Dahlgaard and M. Thorup: Approximately Minwise Independence with Twisted Tabulation. SWAT’14. I T. Christiani, R. Pagh, and M. Thorup: From Independence to Expansion and Back Again. STOC’15. I S. Dahlgaard, M.B.T. Knudsen and E. Rotenberg, and M. Thorup: Hashing for Statistics over K-Partitions. FOCS’15. Using Tabulation to Implement The Power of Hashing Using Tabulation to Implement The Power of Hashing Talk surveys results from I M. Patras¸cuˇ and M. Thorup: The power of simple tabulation hashing. STOC’11 and J. ACM’12. I M. Patras¸cuˇ and M. Thorup: Twisted Tabulation Hashing. SODA’13 I M. Thorup: Simple Tabulation, Fast Expanders, Double Tabulation, and High Independence. FOCS’13. I S. Dahlgaard and M. Thorup: Approximately Minwise Independence with Twisted Tabulation. SWAT’14. I T. Christiani, R. Pagh, and M. Thorup: From Independence to Expansion and Back Again. STOC’15. I S. Dahlgaard, M.B.T. Knudsen and E. Rotenberg, and M. Thorup: Hashing for Statistics over K-Partitions. FOCS’15. I Providing algorithmically important probabilisitic guarantees akin to those of truly random hashing, yet easy to implement. I Bridging theory (assuming truly random hashing) with practice (needing something implementable). I Many randomized algorithms are very simple and popular in practice, but often implemented with too simple hash functions, so guarantees only for sufficiently random input. I Too simple hash functions may work deceivingly well in random tests, but the real world is full of structured data on which they may fail miserably (as we shall see later). Target I Simple and reliable pseudo-random hashing. I Bridging theory (assuming truly random hashing) with practice (needing something implementable). I Many randomized algorithms are very simple and popular in practice, but often implemented with too simple hash functions, so guarantees only for sufficiently random input. I Too simple hash functions may work deceivingly well in random tests, but the real world is full of structured data on which they may fail miserably (as we shall see later). Target I Simple and reliable pseudo-random hashing. I Providing algorithmically important probabilisitic guarantees akin to those of truly random hashing, yet easy to implement. I Many randomized algorithms are very simple and popular in practice, but often implemented with too simple hash functions, so guarantees only for sufficiently random input. I Too simple hash functions may work deceivingly well in random tests, but the real world is full of structured data on which they may fail miserably (as we shall see later). Target I Simple and reliable pseudo-random hashing. I Providing algorithmically important probabilisitic guarantees akin to those of truly random hashing, yet easy to implement. I Bridging theory (assuming truly random hashing) with practice (needing something implementable). I Too simple hash functions may work deceivingly well in random tests, but the real world is full of structured data on which they may fail miserably (as we shall see later). Target I Simple and reliable pseudo-random hashing. I Providing algorithmically important probabilisitic guarantees akin to those of truly random hashing, yet easy to implement. I Bridging theory (assuming truly random hashing) with practice (needing something implementable). I Many randomized algorithms are very simple and popular in practice, but often implemented with too simple hash functions, so guarantees only for sufficiently random input. Target I Simple and reliable pseudo-random hashing. I Providing algorithmically important probabilisitic guarantees akin to those of truly random hashing, yet easy to implement. I Bridging theory (assuming truly random hashing) with practice (needing something implementable). I Many randomized algorithms are very simple and popular in practice, but often implemented with too simple hash functions, so guarantees only for sufficiently random input. I Too simple hash functions may work deceivingly well in random tests, but the real world is full of structured data on which they may fail miserably (as we shall see later). Applications of Hashing Hash tables (n keys and 2n hashes: expect 1/2 keys per hash) I chaining: follow pointers • x a t • ! ! • v • ! • f s r • ! ! ! Applications of Hashing Hash tables (n keys and 2n hashes: expect 1/2 keys per hash) I chaining: follow pointers • x a t x • ! ! ! • v • ! • f s r • ! ! ! Applications of Hashing Hash tables (n keys and 2n hashes: expect 1/2 keys per hash) I chaining: follow pointers I linear probing: sequential search in one array • q x a g ! c ! ! • • t Applications of Hashing Hash tables (n keys and 2n hashes: expect 1/2 keys per hash) I chaining: follow pointers I linear probing: sequential search in one array • q x a g ! c ! x ! • t Applications of Hashing Hash tables (n keys and 2n hashes: expect 1/2 keys per hash) I chaining: follow pointers I linear probing: sequential search in one array I cuckoo hashing: search 2 locations, complex updates a • s • z • y f x w • r • x b • Applications of Hashing Hash tables (n keys and 2n hashes: expect 1/2 keys per hash) I chaining: follow pointers I linear probing: sequential search in one array I cuckoo hashing: search 2 locations, complex updates a • s • z • y f x w • r • x b • Applications of Hashing Hash tables (n keys and 2n hashes: expect 1/2 keys per hash) I chaining: follow pointers I linear probing: sequential search in one array I cuckoo hashing: search 2 locations, complex updates a • s • z • y f x w • r • x b • Applications of Hashing Hash tables (n keys and 2n hashes: expect 1/2 keys per hash) I chaining: follow pointers I linear probing: sequential search in one array I cuckoo hashing: search 2 locations, complex updates a • s • z • y f x w • r • x b • Applications of Hashing Hash tables (n keys and 2n hashes: expect 1/2 keys per hash) I chaining: follow pointers I linear probing: sequential search in one array I cuckoo hashing: search 2 locations, complex updates a • s • z • y f x w • r • x b • Applications of Hashing Hash tables (n keys and 2n hashes: expect 1/2 keys per hash) I chaining: follow pointers I linear probing: sequential search in one array I cuckoo hashing: search 2 locations, complex updates z • s • w • y f x x • a • r x b I sketch A and B to later find A B / A B | \ | | [ | A B / A B = Pr[min h(A)=min h(B)] | \ | | [ | h We need h to be "-minwise independent: 1 " x S : Pr[h(x)=min h(S)] = ± 2 S | | Sketching, streaming, and sampling: 2 I second moment estimation: F2(x¯)= i xi P Applications of Hashing Hash tables (n keys and 2n hashes: expect 1/2 keys per hash) I chaining: follow pointers. I linear probing: sequential search in one array I cuckoo hashing: search 2 locations, complex updates I sketch A and B to later find A B / A B | \ | | [ | A B / A B = Pr[min h(A)=min h(B)] | \ | | [ | h We need h to be "-minwise independent: 1 " x S : Pr[h(x)=min h(S)] = ± 2 S | | Applications of Hashing Hash tables (n keys and 2n hashes: expect 1/2 keys per hash) I chaining: follow pointers. I linear probing: sequential search in one array I cuckoo hashing: search 2 locations, complex updates Sketching, streaming, and sampling: 2 I second moment estimation: F2(x¯)= i xi P Applications of Hashing Hash tables (n keys and 2n hashes: expect 1/2 keys per hash) I chaining: follow pointers. I linear probing: sequential search in one array I cuckoo hashing: search 2 locations, complex updates Sketching, streaming, and sampling: 2 I second moment estimation: F2(x¯)= i xi I sketch A and B to later find A B / A B | \ | | P[ | A B / A B = Pr[min h(A)=min h(B)] | \ | | [ | h We need h to be "-minwise independent: 1 " x S : Pr[h(x)=min h(S)] = ± 2 S | | Prototypical example: degree k 1 polynomial − I u = b prime; I choose a0, a1,...,ak 1 randomly in [u]; − k 1 I h(x)= a0 + a1x + + ak 1x − mod u. ··· − Many solutions for k-independent hashing proposed, but generally slow for k > 3 and too slow for k > 5. Wegman & Carter [FOCS’77] We do not have space for truly random hash functions, but Family = h :[u] [b] k-independent iff for random h : H { ! } 2H I ( )x [u], h(x) is uniform in [b]; 8 2 I ( )x ,...,x [u], h(x ),...,h(x ) are independent. 8 1 k 2 1 k Many solutions for k-independent hashing proposed, but generally slow for k > 3 and too slow for k > 5. Wegman & Carter [FOCS’77] We do not have space for truly random hash functions, but Family = h :[u] [b] k-independent iff for random h : H { ! } 2H I ( )x [u], h(x) is uniform in [b]; 8 2 I ( )x ,...,x [u], h(x ),...,h(x ) are independent. 8 1 k 2 1 k Prototypical example: degree k 1 polynomial − I u = b prime; I choose a0, a1,...,ak 1 randomly in [u]; − k 1 I h(x)= a0 + a1x + + ak 1x − mod u. ··· − Wegman & Carter [FOCS’77] We do not have space for truly random hash functions, but Family = h :[u] [b] k-independent iff for random h : H { ! } 2H I ( )x [u], h(x) is uniform in [b]; 8 2 I ( )x ,...,x [u], h(x ),...,h(x ) are independent.