CS240E: Data Structures and Data Management
Helena S. Ven
08 Jan. 2019

Class: T, Th at 0830 - 0950
Instructors: Therese Biedl
Office: DC 2341 (1000 - 1100)

Topics:
  Tutorial:
  Midterm: The class on the date of the midterm (28 Feb.): Tries, Hash for strings, Compressed tries. Class starts at 0910 instead of 0830.
  Final: Arithmetic compression, Cache oblivious trees, and External sorting are not on the final. No deletion in dictionaries (lazy deletion is fine).

By convention, log is base 2 unless stated otherwise.

Distribution of this document is prohibited without permission from the author and Prof. Biedl.

Index

1 Runtime and Asymptotic bounds
  1.1 Objective of this course
    1.1.1 Computational Problems
  1.2 Asymptotic Analysis
  1.3 Analysis of Algorithms
  1.4 Runtime of Randomised Algorithms
  1.5 Potential Method of Amortised Analysis

2 Comparison-Based Data Structures
  2.1 Array-based Data types
  2.2 ADT: Priority Queues
  2.3 Heap
    2.3.1 Operations in heaps
    2.3.2 Improvements of the Heap
  2.4 Heap Merging
    2.4.1 Method 1, Deterministic
    2.4.2 Method 2, Randomised
    2.4.3 Method 3, Modified Heap
    2.4.4 Almost-heaps
  2.5 ADT: Dictionaries
  2.6 Binary Search Trees
  2.7 Tree-based Implementations of the Dictionary
    2.7.1 Treaps
    2.7.2 AVL Trees
    2.7.3 Scapegoat Tree
  2.8 Skip Lists
  2.9 Dictionary with Biased Search Requests
    2.9.1 MTF Array and Transpose Array
  2.10 Splay Trees

3 Hashing and Spatial Data
  3.1 Hash Tables
    3.1.1 Probe Hashing
    3.1.2 Double Hashing
    3.1.3 Cuckoo Hashing
    3.1.4 Complexity of Probing Methods
    3.1.5 Universal Hashing
  3.2 Hash of Multi-dimensional data
  3.3 Tries
    3.3.1 Variation of Trie: No leaves
    3.3.2 Variation of Trie: Compressed labels
    3.3.3 Variation of Trie: Allow Prefixes
    3.3.4 Compressed Tries
  3.4 ADT: Dictionary with Range Search
  3.5 Spatial Data Structures
  3.6 Quad-Trees
  3.7 KD-Tree
  3.8 Range Tree
    3.8.1 Problem of Duplicates and Generalisations
    3.8.2 3-Sided Ranged Queries

4 Sorting and Searching Algorithms
  4.1 Problem: Selection and Sorting
    4.1.1 The Lower Bound of Comparison Sorting
  4.2 Quick-Select
    4.2.1 Randomised Pivoting
  4.3 Partitioning
  4.4 Quicksort
    4.4.1 Choice of the Pivot
  4.5 Sorting Integers
    4.5.1 Bucket Sort
    4.5.2 Radix Sort
  4.6 Problem: Search
  4.7 Interpolation Search

5 String Algorithms
  5.1 Problem: Pattern Matching
  5.2 Pattern Pre-processing
    5.2.1 Karp-Rabin Fingerprint Algorithm
    5.2.2 Boyer-Moore Algorithm
    5.2.3 Finite Automaton and Knuth-Morris-Pratt Method
  5.3 Text Pre-processing
    5.3.1 Trie of Suffixes, Suffix Trees
    5.3.2 Suffix Array
  5.4 Comparison of Pattern Matching Algorithms
  5.5 Problem: Compression
    5.5.1 Prefix-Free Encoding
  5.6 Huffman Tree
    5.6.1 Huffman Tree with Different Base
  5.7 Run-Length Encoding
  5.8 Lempel-Ziv-Welch
    5.8.1 Decoding Lempel-Ziv-Welch
  5.9 BZip2
    5.9.1 Burrows-Wheeler Transform
    5.9.2 Move-to-front Transform
  5.10 Arithmetic Compression
  5.11 Comparison of Compression Algorithms