String Hashing for Collection-Based Compression
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Fundamental Data Structures Contents
Fundamental Data Structures Contents 1 Introduction 1 1.1 Abstract data type ........................................... 1 1.1.1 Examples ........................................... 1 1.1.2 Introduction .......................................... 2 1.1.3 Defining an abstract data type ................................. 2 1.1.4 Advantages of abstract data typing .............................. 4 1.1.5 Typical operations ...................................... 4 1.1.6 Examples ........................................... 5 1.1.7 Implementation ........................................ 5 1.1.8 See also ............................................ 6 1.1.9 Notes ............................................. 6 1.1.10 References .......................................... 6 1.1.11 Further ............................................ 7 1.1.12 External links ......................................... 7 1.2 Data structure ............................................. 7 1.2.1 Overview ........................................... 7 1.2.2 Examples ........................................... 7 1.2.3 Language support ....................................... 8 1.2.4 See also ............................................ 8 1.2.5 References .......................................... 8 1.2.6 Further reading ........................................ 8 1.2.7 External links ......................................... 9 1.3 Analysis of algorithms ......................................... 9 1.3.1 Cost models ......................................... 9 1.3.2 Run-time analysis -
Remote Synchronization Using Differential Compression
Date of acceptance Grade th 6 November, 2012 ECLA Instructor Sasu Tarkoma Lossless Differential Compression for Synchronizing Arbitrary Single- Dimensional Strings Jari Karppanen Helsinki, September 20th, 2012 Master’s Thesis UNIVERSITY OF HELSINKI Department of Computer Science HELSINGIN YLIOPISTO − HELSINGFORS UNIVERSITET – UNIVERSITY OF HELSINKI Tiedekunta − Fakultet – Faculty Laitos − Institution − Department Faculty of Science Department of Computer Science Tekijä − Författare − Author Jari Karppanen Työn nimi − Arbetets titel − Title Lossless Differential Compression for Synchronizing Arbitrary Single-Dimensional Strings Oppiaine − Läroämne − Subject Computer Science Työn laji − Arbetets art − Level Aika − Datum − Month and year Sivumäärä − Sidoantal − Number of pages th Master’s thesis Sep 20 , 2012 93 pages Tiivistelmä − Referat − Abstract Differential compression allows expressing a modified document as differences relative to another version of the document. A compressed string requires space relative to amount of changes, irrespective of original document sizes. The purpose of this study was to answer what algorithms are suitable for universal lossless differential compression for synchronizing two arbitrary documents either locally or remotely. Two main problems in differential compression are finding the differences (differencing), and compactly communicating the differences (encoding). We discussed local differencing algorithms based on subsequence searching, hashtable lookups, suffix searching, and projection. We also discussed -
Data Structures
Data structures PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Thu, 17 Nov 2011 20:55:22 UTC Contents Articles Introduction 1 Data structure 1 Linked data structure 3 Succinct data structure 5 Implicit data structure 7 Compressed data structure 8 Search data structure 9 Persistent data structure 11 Concurrent data structure 15 Abstract data types 18 Abstract data type 18 List 26 Stack 29 Queue 57 Deque 60 Priority queue 63 Map 67 Bidirectional map 70 Multimap 71 Set 72 Tree 76 Arrays 79 Array data structure 79 Row-major order 84 Dope vector 86 Iliffe vector 87 Dynamic array 88 Hashed array tree 91 Gap buffer 92 Circular buffer 94 Sparse array 109 Bit array 110 Bitboard 115 Parallel array 119 Lookup table 121 Lists 127 Linked list 127 XOR linked list 143 Unrolled linked list 145 VList 147 Skip list 149 Self-organizing list 154 Binary trees 158 Binary tree 158 Binary search tree 166 Self-balancing binary search tree 176 Tree rotation 178 Weight-balanced tree 181 Threaded binary tree 182 AVL tree 188 Red-black tree 192 AA tree 207 Scapegoat tree 212 Splay tree 216 T-tree 230 Rope 233 Top Trees 238 Tango Trees 242 van Emde Boas tree 264 Cartesian tree 268 Treap 273 B-trees 276 B-tree 276 B+ tree 287 Dancing tree 291 2-3 tree 292 2-3-4 tree 293 Queaps 295 Fusion tree 299 Bx-tree 299 Heaps 303 Heap 303 Binary heap 305 Binomial heap 311 Fibonacci heap 316 2-3 heap 321 Pairing heap 321 Beap 324 Leftist tree 325 Skew heap 328 Soft heap 331 d-ary heap 333 Tries 335 Trie -
Tobiesen Ole.Pdf (2.195Mb)
Faculty of Science and Technology MASTER’S THESIS Study program/ Specialization: Spring semester, 2016 Master of Science in Computer Science Open Writer: Ole Tobiesen ………………………………………… (Writer’s signature) Faculty supervisor: Reggie Davidrajuh Thesis title: Data Fingerprinting -- Identifying Files and Tables with Hashing Schemes Credits (ECTS): 30 Key words: Pages: 145, including table of contents Data Fingerprinting and appendix Fuzzy Hashing Machine Learning Enclosure: 4 7z archives Merkle Trees Finite Fields Mersenne Primes Stavanger, 15 June 2016 Front page for master thesis Faculty of Science and Technology Decision made by the Dean October 30th 2009 Abstract INTRODUCTION: Although hash functions are nothing new, these are not lim- ited to cryptographic purposes. One important field is data fingerprinting. Here, the purpose is to generate a digest which serves as a fingerprint (or a license plate) that uniquely identifies a file. More recently, fuzzy fingerprinting schemes — which will scrap the avalanche effect in favour of detecting local changes — has hit the spotlight. The main purpose of this project is to find ways to classify text tables, and discover where potential changes or inconsitencies have happened. METHODS: Large parts of this report can be considered applied discrete math- ematics — and finite fields and combinatorics have played an important part. Ra- bin’s fingerprinting scheme was tested extensively and compared against existing cryptographic algorithms, CRC and FNV. Moreover, a self-designed fuzzy hashing algorithm with the preliminary name No-Frills Hash has been created and tested against Nilsimsa and Spamsum. NFHash is based on Mersenne primes, and uses a sliding window to create a fuzzy hash.