Space-Efficient Data Structures, Streams, and Algorithms
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
1 Suffix Trees
This material takes about 1.5 hours. 1 Suffix Trees Gusfield: Algorithms on Strings, Trees, and Sequences. Weiner 73 “Linear Pattern-matching algorithms” IEEE conference on automata and switching theory McCreight 76 “A space-economical suffix tree construction algorithm” JACM 23(2) 1976 Chen and Seifras 85 “Efficient and Elegegant Suffix tree construction” in Apos- tolico/Galil Combninatorial Algorithms on Words Another “search” structure, dedicated to strings. Basic problem: match a “pattern” (of length m) to “text” (of length n) • goal: decide if a given string (“pattern”) is a substring of the text • possibly created by concatenating short ones, eg newspaper • application in IR, also computational bio (DNA seqs) • if pattern avilable first, can build DFA, run in time linear in text • if text available first, can build suffix tree, run in time linear in pattern. • applications in computational bio. First idea: binary tree on strings. Inefficient because run over pattern many times. • fractional cascading? • realize only need one character at each node! Tries: • used to store dictionary of strings • trees with children indexed by “alphabet” • time to search equal length of query string • insertion ditto. • optimal, since even hashing requires this time to hash. • but better, because no “hash function” computed. • space an issue: – using array increases stroage cost by |Σ| – using binary tree on alphabet increases search time by log |Σ| 1 – ok for “const alphabet” – if really fussy, could use hash-table at each node. • size in worst case: sum of word lengths (so pretty much solves “dictionary” problem. But what about substrings? • Relevance to DNA searches • idea: trie of all n2 substrings • equivalent to trie of all n suffixes. -
Managing Unbounded-Length Keys in Comparison-Driven Data
Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing∗ Amihood Amir Gianni Franceschini Department of Computer Science Dipartimento di Informatica Bar-Ilan University, Israel Universita` di Pisa, Italy and Department of Computer Science John Hopkins University, Baltimore MD Roberto Grossi Tsvi Kopelowitz Dipartimento di Informatica Department of Computer Science Universita` di Pisa, Italy Bar-Ilan University, Israel Moshe Lewenstein Noa Lewenstein Department of Computer Science Department of Computer Science Bar-Ilan University, Israel Netanya College, Israel arXiv:1306.0406v1 [cs.DS] 3 Jun 2013 ∗Parts of this paper appeared as extended abstracts in [3, 20]. 1 Abstract This paper presents a general technique for optimally transforming any dynamic data struc- ture that operates on atomic and indivisible keys by constant-time comparisons, into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Exam- ples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g. records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work as no particular exploitation of the underlying structure of is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, online suffix tree construction can be done in worst case time O(log n) per input symbol (as opposed to amortized O(log n) time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves O(log n) worst case time per input symbol. -
Optimizing Sorting with Genetic Algorithms ∗
Optimizing Sorting with Genetic Algorithms ∗ Xiaoming Li, Mar´ıa Jesus´ Garzaran´ and David Padua Department of Computer Science University of Illinois at Urbana-Champaign {xli15, garzaran, padua}@cs.uiuc.edu http://polaris.cs.uiuc.edu Abstract 1 Introduction Although compiler technology has been extraordinarily The growing complexity of modern processors has made successful at automating the process of program optimiza- the generation of highly efficient code increasingly difficult. tion, much human intervention is still needed to obtain high- Manual code generation is very time consuming, but it is of- quality code. One reason is the unevenness of compiler im- ten the only choice since the code generated by today’s com- plementations. There are excellent optimizing compilers for piler technology often has much lower performance than some platforms, but the compilers available for some other the best hand-tuned codes. A promising code generation platforms leave much to be desired. A second, and perhaps strategy, implemented by systems like ATLAS, FFTW, and more important, reason is that conventional compilers lack SPIRAL, uses empirical search to find the parameter values semantic information and, therefore, have limited transfor- of the implementation, such as the tile size and instruction mation power. An emerging approach that has proven quite schedules, that deliver near-optimal performance for a par- effective in overcoming both of these limitations is to use ticular machine. However, this approach has only proven library generators. These systems make use of semantic in- successful on scientific codes whose performance does not formation to apply transformations at all levels of abstrac- depend on the input data. -
Augmentation: Range Trees (PDF)
Lecture 9 Augmentation 6.046J Spring 2015 Lecture 9: Augmentation This lecture covers augmentation of data structures, including • easy tree augmentation • order-statistics trees • finger search trees, and • range trees The main idea is to modify “off-the-shelf” common data structures to store (and update) additional information. Easy Tree Augmentation The goal here is to store x.f at each node x, which is a function of the node, namely f(subtree rooted at x). Suppose x.f can be computed (updated) in O(1) time from x, children and children.f. Then, modification a set S of nodes costs O(# of ancestors of S)toupdate x.f, because we need to walk up the tree to the root. Two examples of O(lg n) updates are • AVL trees: after rotating two nodes, first update the new bottom node and then update the new top node • 2-3 trees: after splitting a node, update the two new nodes. • In both cases, then update up the tree. Order-Statistics Trees (from 6.006) The goal of order-statistics trees is to design an Abstract Data Type (ADT) interface that supports the following operations • insert(x), delete(x), successor(x), • rank(x): find x’s index in the sorted order, i.e., # of elements <x, • select(i): find the element with rank i. 1 Lecture 9 Augmentation 6.046J Spring 2015 We can implement the above ADT using easy tree augmentation on AVL trees (or 2-3 trees) to store subtree size: f(subtree) = # of nodes in it. Then we also have x.size =1+ c.size for c in x.children. -
Optimal Finger Search Trees in the Pointer Machine
Alcom-FT Technical Report Series ALCOMFT-TR-02-77 Optimal Finger Search Trees in the Pointer Machine Gerth Stølting Brodal∗ George Lagogiannis† Christos Makris† Athanasios Tsakalidis† Kostas Tsichlas† Abstract We develop a new finger search tree with worst-case constant update time in the Pointer Machine (PM) model of computation. This was a major problem in the field of Data Structures and was tantalizingly open for over twenty years while many attempts by researchers were made to solve it. The result comes as a consequence of the innovative mechanism that guides the rebalancing operations combined with incremental multiple splitting and fusion techniques over nodes. Keywords balanced trees, update operations, finger search trees, data structures, complexity 1 Introduction The balanced search tree is one of the most common data structures used in algorithms. Assuming that the update position is known, balanced search trees with O(1) amortized update time have been presented long ago ([6, 14]). It has also been known ([6, 16]) that updates can be performed in O(1) structural changes, but the nodes to be changed have to be searched in Ω(log n) time. Levcopoulos and Overmars ([13]) presented an algorithm achieving O(1) worst case update time by using a global splitting lemma that is based on a pebble game combined with the bucketing technique of Overmars ([14]). Instead of storing single keys in the leaves of the search tree, each leaf can store a list of several keys. Unfortunately, the buckets in [13] have size O(log2 n), so they need a two level hierarchy of lists in order to guarantee O(log n) query time within the buckets. -
Bulk Updates and Cache Sensitivity in Search Trees
BULK UPDATES AND CACHE SENSITIVITY INSEARCHTREES TKK Research Reports in Computer Science and Engineering A Espoo 2009 TKK-CSE-A2/09 BULK UPDATES AND CACHE SENSITIVITY INSEARCHTREES Doctoral Dissertation Riku Saikkonen Dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Faculty of Information and Natural Sciences, Helsinki University of Technology for public examination and debate in Auditorium T2 at Helsinki University of Technology (Espoo, Finland) on the 4th of September, 2009, at 12 noon. Helsinki University of Technology Faculty of Information and Natural Sciences Department of Computer Science and Engineering Teknillinen korkeakoulu Informaatio- ja luonnontieteiden tiedekunta Tietotekniikan laitos Distribution: Helsinki University of Technology Faculty of Information and Natural Sciences Department of Computer Science and Engineering P.O. Box 5400 FI-02015 TKK FINLAND URL: http://www.cse.tkk.fi/ Tel. +358 9 451 3228 Fax. +358 9 451 3293 E-mail: [email protected].fi °c 2009 Riku Saikkonen Layout: Riku Saikkonen (except cover) Cover image: Riku Saikkonen Set in the Computer Modern typeface family designed by Donald E. Knuth. ISBN 978-952-248-037-8 ISBN 978-952-248-038-5 (PDF) ISSN 1797-6928 ISSN 1797-6936 (PDF) URL: http://lib.tkk.fi/Diss/2009/isbn9789522480385/ Multiprint Oy Espoo 2009 AB ABSTRACT OF DOCTORAL DISSERTATION HELSINKI UNIVERSITY OF TECHNOLOGY P.O. Box 1000, FI-02015 TKK http://www.tkk.fi/ Author Riku Saikkonen Name of the dissertation Bulk Updates and Cache Sensitivity in Search Trees Manuscript submitted 09.04.2009 Manuscript revised 15.08.2009 Date of the defence 04.09.2009 £ Monograph ¤ Article dissertation (summary + original articles) Faculty Faculty of Information and Natural Sciences Department Department of Computer Science and Engineering Field of research Software Systems Opponent(s) Prof. -
A Complete Bibliography of Publications in Nordisk Tidskrift for Informationsbehandling, BIT, and BIT Numerical Mathematics
A Complete Bibliography of Publications in Nordisk Tidskrift for Informationsbehandling, BIT,andBIT Numerical Mathematics Nelson H. F. Beebe University of Utah Department of Mathematics, 110 LCB 155 S 1400 E RM 233 Salt Lake City, UT 84112-0090 USA Tel: +1 801 581 5254 FAX: +1 801 581 4148 E-mail: [email protected], [email protected], [email protected] (Internet) WWW URL: http://www.math.utah.edu/~beebe/ 09 June 2021 Version 3.54 Title word cross-reference [3105, 328, 469, 655, 896, 524, 873, 455, 779, 946, 2944, 297, 1752, 670, 2582, 1409, 1987, 915, 808, 761, 916, 2071, 2198, 1449, 780, 959, 1105, 1021, 497, 2589]. A(α) #24873 [1089]. [896, 2594, 333]. A∗ [2013]. A∗Ax = b [2369]. n A [1640, 566, 947, 1580, 1460]. A = a2 +1 − 0 n (3) [2450]. (A λB) [1414]. 0=1 [1242]. 1 [334]. α [824, 1580]. AN [1622]. A(#) [3439]. − 12 [3037, 2711]. 1 2 [1097]. 1:0 [3043]. 10 AX − XB = C [2195, 2006]. [838]. 11 [1311]. 2 AXD − BXC = E [1101]. B [2144, 1953, 2291, 2162, 3047, 886, 2551, 957, [2187, 1575, 1267, 1409, 1489, 1991, 1191, 2007, 2552, 1832, 949, 3024, 3219, 2194]. 2; 3 979, 1819, 1597, 1823, 1773]. β [824]. BN n − p − − [1490]. 2 1 [320]. 2 1 [100]. 2m 4 [1181]. BS [1773]. BSI [1446]. C0 [2906]. C1 [1105]. 3 [2119, 1953, 2531, 1351, 2551, 1292, [3202]. C2 [3108, 2422, 3000, 2036]. χ2 1793, 949, 1356, 2711, 2227, 570]. [30, 31]. Cln(θ); (n ≥ 2) [2929]. cos [228]. D 3; 000; 000; 000 [575, 637]. -
Sorting Algorithm 1 Sorting Algorithm
Sorting algorithm 1 Sorting algorithm In computer science, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) that require sorted lists to work correctly; it is also often useful for canonicalizing data and for producing human-readable output. More formally, the output must satisfy two conditions: 1. The output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order); 2. The output is a permutation, or reordering, of the input. Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956.[1] Although many consider it a solved problem, useful new sorting algorithms are still being invented (for example, library sort was first published in 2004). Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as big O notation, divide and conquer algorithms, data structures, randomized algorithms, best, worst and average case analysis, time-space tradeoffs, and lower bounds. Classification Sorting algorithms used in computer science are often classified by: • Computational complexity (worst, average and best behaviour) of element comparisons in terms of the size of the list . For typical sorting algorithms good behavior is and bad behavior is . -
Finger Search Trees
11 Finger Search Trees 11.1 Finger Searching....................................... 11-1 11.2 Dynamic Finger Search Trees ....................... 11-2 11.3 Level Linked (2,4)-Trees .............................. 11-3 11.4 Randomized Finger Search Trees ................... 11-4 Treaps • Skip Lists 11.5 Applications............................................ 11-6 Optimal Merging and Set Operations • Arbitrary Gerth Stølting Brodal Merging Order • List Splitting • Adaptive Merging and University of Aarhus Sorting 11.1 Finger Searching One of the most studied problems in computer science is the problem of maintaining a sorted sequence of elements to facilitate efficient searches. The prominent solution to the problem is to organize the sorted sequence as a balanced search tree, enabling insertions, deletions and searches in logarithmic time. Many different search trees have been developed and studied intensively in the literature. A discussion of balanced binary search trees can e.g. be found in [4]. This chapter is devoted to finger search trees which are search trees supporting fingers, i.e. pointers, to elements in the search trees and supporting efficient updates and searches in the vicinity of the fingers. If the sorted sequence is a static set of n elements then a simple and space efficient representation is a sorted array. Searches can be performed by binary search using 1+⌊log n⌋ comparisons (we throughout this chapter let log x denote log2 max{2, x}). A finger search starting at a particular element of the array can be performed by an exponential search by inspecting elements at distance 2i − 1 from the finger for increasing i followed by a binary search in a range of 2⌊log d⌋ − 1 elements, where d is the rank difference in the sequence between the finger and the search element. -
Ankara Yildirim Beyazit University Graduate
ANKARA YILDIRIM BEYAZIT UNIVERSITY GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES DEVELOPMENT OF SORTING AND SEARCHING ALGORITHMS Ph. D Thesis By ADNAN SAHER MOHAMMED Department of Computer Engineering December 2017 ANKARA DEVELOPMENT OF SORTING AND SEARCHING ALGORITHMS A Thesis Submitted to The Graduate School of Natural and Applied Sciences of Ankara Yıldırım Beyazıt University In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in Computer Engineering, Department of Computer Engineering By ADNAN SAHER MOHAMMED December 2017 ANKARA PH.D. THESIS EXAMINATION RESULT FORM We have read the thesis entitled “Development of Sorting and Searching Algorithms” completed by Adnan Saher Mohammed under the supervision of Prof. Dr. Fatih V. ÇELEBİ and we certify that in our opinion it is fully adequate, in scope and in quality, as a thesis for the degree of Doctor of Philosophy. Prof. Dr. Fatih V. ÇELEBİ (Supervisor) Prof. Dr. Şahin EMRAH Yrd. Doç. Dr. Hacer YALIM KELEŞ (Jury member) (Jury member) Yrd. Doç. Dr. Özkan KILIÇ Yrd. Doç. Dr. Shafqat Ur REHMAN (Jury member) (Jury member) Prof. Dr. Fatih V. ÇELEBİ Director Graduate School of Natural and Applied Sciences ii ETHICAL DECLARATION I hereby declare that, in this thesis which has been prepared in accordance with the Thesis Writing Manual of Graduate School of Natural and Applied Sciences, • All data, information and documents are obtained in the framework of academic and ethical rules, • All information, documents and assessments are presented in accordance with scientific ethics and morals, • All the materials that have been utilized are fully cited and referenced, • No change has been made on the utilized materials, • All the works presented are original, and in any contrary case of above statements, I accept to renounce all my legal rights. -
Sorting Algorithm 1 Sorting Algorithm
Sorting algorithm 1 Sorting algorithm A sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) which require input data to be in sorted lists; it is also often useful for canonicalizing data and for producing human-readable output. More formally, the output must satisfy two conditions: 1. The output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order); 2. The output is a permutation (reordering) of the input. Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956.[1] Although many consider it a solved problem, useful new sorting algorithms are still being invented (for example, library sort was first published in 2006). Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as big O notation, divide and conquer algorithms, data structures, randomized algorithms, best, worst and average case analysis, time-space tradeoffs, and upper and lower bounds. Classification Sorting algorithms are often classified by: • Computational complexity (worst, average and best behavior) of element comparisons in terms of the size of the list (n). For typical serial sorting algorithms good behavior is O(n log n), with parallel sort in O(log2 n), and bad behavior is O(n2). -
Fundamental Data Structures Contents
Fundamental Data Structures Contents 1 Introduction 1 1.1 Abstract data type ........................................... 1 1.1.1 Examples ........................................... 1 1.1.2 Introduction .......................................... 2 1.1.3 Defining an abstract data type ................................. 2 1.1.4 Advantages of abstract data typing .............................. 4 1.1.5 Typical operations ...................................... 4 1.1.6 Examples ........................................... 5 1.1.7 Implementation ........................................ 5 1.1.8 See also ............................................ 6 1.1.9 Notes ............................................. 6 1.1.10 References .......................................... 6 1.1.11 Further ............................................ 7 1.1.12 External links ......................................... 7 1.2 Data structure ............................................. 7 1.2.1 Overview ........................................... 7 1.2.2 Examples ........................................... 7 1.2.3 Language support ....................................... 8 1.2.4 See also ............................................ 8 1.2.5 References .......................................... 8 1.2.6 Further reading ........................................ 8 1.2.7 External links ......................................... 9 1.3 Analysis of algorithms ......................................... 9 1.3.1 Cost models ......................................... 9 1.3.2 Run-time analysis