A Simple Parallel Cartesian Tree Algorithm and Its Application to Suffix Tree Construction ∗

Guy E. Blelloch† and Julian Shun‡

∗ This work was supported by generous gifts from Intel, Microsoft and IBM, and by the National Science Foundation under award CCF1018188.
† Carnegie Mellon University, E-mail: [email protected]
‡ Carnegie Mellon University, E-mail: [email protected]

Abstract

We present a simple linear work and space, and polylogarithmic time parallel algorithm for generating multiway Cartesian trees. As a special case, the algorithm can be used to generate suffix trees from suffix arrays on arbitrary alphabets in the same bounds. In conjunction with parallel suffix array algorithms, such as the skew algorithm, this gives a rather simple linear work parallel algorithm for generating suffix trees over an integer alphabet Σ ⊆ [1, ..., n], where n is the length of the input string. More generally, given a sorted sequence of strings and the longest common prefix lengths between adjacent elements, the algorithm will generate a pat tree (compacted trie) over the strings.

We also present experimental results comparing the performance of the algorithm to existing sequential implementations and a second parallel algorithm. We present comparisons for the Cartesian tree algorithm on its own and for constructing a suffix tree using our algorithm. The results show that on a variety of strings our algorithm is competitive with the sequential version on a single processor and achieves good speedup on multiple processors.

1 Introduction

For a string s of length n over a character set Σ ⊆ {1, ..., n}¹, the suffix tree data structure stores all the suffixes of s in a pat tree (a trie in which maximal branch-free paths are contracted into a single edge). In addition to supporting searches in s for any string t ∈ Σ* in O(|t|) expected time², suffix trees efficiently support many other operations on strings, such as longest common substring, maximal repeats, longest repeated substrings, and longest palindrome, among many others [Gus97]. As such it is one of the most important data structures for string processing. For example, it is used in several bioinformatic applications, such as REPuter [KS99], MUMmer [DPCS02], OASIS [MPK03] and Trellis+ [PZ08]. Both suffix trees and a linear time algorithm for constructing them were introduced by Weiner [Wei73] (although he used the term position tree). Since then various similar constructions have been described [McC76] and there have been many implementations of these algorithms. Although originally designed for fixed-sized alphabets with deterministic linear time, Weiner's algorithm can work on an alphabet {1, ..., n}, henceforth [n], in linear expected time simply by using hashing to access the children of a node.

¹ More general alphabets can be used by first sorting the characters and then labeling them from 1 to n.
² Worst-case time for constant-sized alphabets.

The algorithm of Weiner and its derivatives are all incremental and inherently sequential. The first parallel algorithm for suffix trees was given by Apostolico et al. [AIL+88] and was based on a quite different doubling approach. For a parameter 0 < ε ≤ 1 the algorithm runs in O((1/ε) log n) time, O((n/ε) log n) work and O(n^(1+ε)) space on the CRCW PRAM for arbitrary alphabets. Although reasonably simple, this algorithm is likely not practical since it is not work efficient and uses superlinear memory (by a polynomial factor). The parallel construction of suffix trees was later improved to linear work and space by Hariharan [Har94], with an algorithm taking O(log^4 n) time on the CREW PRAM, and then by Farach and Muthukrishnan to O(log n) time using a randomized CRCW PRAM [FM96] (high-probability bounds). These later results are for a constant-sized alphabet, are "considerably non-trivial", and do not seem to be amenable to efficient implementations.

One way to construct a suffix tree is to first generate a suffix array (an array of pointers to the lexicographically sorted suffixes), and then convert it to a suffix tree. For binary alphabets, given the length of the longest common prefix (LCP) between adjacent entries, this conversion can be done sequentially by generating a Cartesian tree in linear time and space. The approach can be generalized to arbitrary alphabets using multiway Cartesian trees without much difficulty. Using suffix arrays is attractive since in recent years there have been considerable theoretical and practical advances in the generation of suffix arrays (see e.g. [PST07]). The interest is partly due to their need in the widely used Burrows-Wheeler compression algorithm, and also as a more space-efficient alternative to suffix trees. As such there have been dozens of papers on efficient implementations of suffix arrays. Among these, Kärkkäinen and Sanders have developed a quite simple and efficient parallel algorithm for suffix arrays [KS03, KS07] that can also generate LCPs.
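To make the conversion above concrete, the following C++ sketch (an illustration with our own naming, not the authors' implementation) shows the classic stack-based, linear-time construction of a binary min-Cartesian tree over a value array such as the LCP array; it is also the kind of simple sequential baseline the experiments below compare against.

```cpp
#include <vector>

// Illustrative sketch (not the authors' code): stack-based, linear-time
// construction of a binary min-Cartesian tree over a value array, e.g.
// the LCP array of a suffix array. The stack holds the right spine of
// the tree built so far.
struct Node {
    int idx;                                    // position in the input array
    Node *left = nullptr, *right = nullptr;
    explicit Node(int i) : idx(i) {}
};

Node* cartesianTree(const std::vector<int>& a) {
    std::vector<Node*> rightSpine;              // current right spine, root at the front
    Node* root = nullptr;
    for (int i = 0; i < (int)a.size(); i++) {
        Node* cur = new Node(i);
        Node* lastPopped = nullptr;
        // Pop spine nodes with strictly larger values; they move below cur.
        while (!rightSpine.empty() && a[rightSpine.back()->idx] > a[i]) {
            lastPopped = rightSpine.back();
            rightSpine.pop_back();
        }
        cur->left = lastPopped;                 // popped chain becomes the left subtree
        if (rightSpine.empty()) root = cur;     // new minimum so far: cur is the new root
        else rightSpine.back()->right = cur;    // otherwise hang cur off the spine
        rightSpine.push_back(cur);
    }
    return root;
}
```

Each element is pushed and popped at most once, so the construction is linear time. Intuitively, the multiway generalization mentioned above can then be obtained by collapsing connected nodes that hold equal values into a single multiway node.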
The story with generating Cartesian trees in parallel is less satisfactory. Berkman et al. [BSV93] describe a parallel algorithm for the all nearest smaller values (ANSV) problem, which can be directly used to generate a binary Cartesian tree for fixed-sized alphabets. However, it cannot directly be used for non-constant-sized alphabets, and the algorithm is very complicated. Iliopoulos and Rytter [IR04] present two much simpler algorithms for generating suffix trees from suffix arrays, one based on merging and one based on a variant of the ANSV problem that allows for multiway Cartesian trees. However, they both require O(n log n) work.

In this paper we describe a linear work, linear space, and polylogarithmic time algorithm for generating multiway Cartesian trees. The algorithm is based on divide-and-conquer, and we describe two versions that differ in whether the merging step is done sequentially or in parallel. The first, based on a sequential merge, is very simple, and for a tree of height d, it runs in O(min{d log n, n}) time on the CREW PRAM. The second version is only slightly more complicated and runs in O(log^2 n) time on the CREW PRAM. They both use linear work and space.
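The excerpt does not spell out the merging step, so the C++ sketch below fills in one natural reading as an assumption: build the Cartesian trees of the two halves recursively, then join them by merging the right spine of the left tree with the left spine of the right tree by value. The merge is written sequentially here, in the spirit of the first version above; the two recursive calls are independent, which is where the parallelism comes from.

```cpp
#include <vector>

struct Node {
    int idx;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int i) : idx(i) {}
};

// Merge two Cartesian trees built over adjacent blocks of the input:
// walk down the right spine of u (left block) and the left spine of w
// (right block), always keeping the smaller value on top. Ties go to
// the left block, which preserves both heap order and inorder.
Node* merge(const std::vector<int>& a, Node* u, Node* w) {
    if (!u) return w;
    if (!w) return u;
    if (a[u->idx] <= a[w->idx]) {
        u->right = merge(a, u->right, w);
        return u;
    } else {
        w->left = merge(a, u, w->left);
        return w;
    }
}

// Divide-and-conquer construction over a[lo, hi). The recursive calls
// are independent and could run in parallel; everything here, including
// the spine merge, is written sequentially for clarity.
Node* build(const std::vector<int>& a, int lo, int hi) {
    if (hi - lo == 0) return nullptr;
    if (hi - lo == 1) return new Node(lo);
    int mid = lo + (hi - lo) / 2;
    Node* L = build(a, lo, mid);
    Node* R = build(a, mid, hi);
    return merge(a, L, R);
}
```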
Given any linear work and space algorithm for generating a suffix array and corresponding LCPs in O(S(n)) time, our results lead to a linear work and space algorithm for generating suffix trees in O(S(n) + log^2 n) time. For example, using the skew algorithm [KS03] on a CRCW PRAM, we have O(log^2 n) time for constant-sized alphabets and O(n^ε) time, 0 < ε ≤ 1, for the alphabet [n]. We note that a polylogarithmic time, linear work and linear space algorithm for the alphabet [n] would imply stable radix sort on [n] in the same bounds, which is a long-standing open problem.

For comparison we also present a technique for using the ANSV problem to generate multiway Cartesian trees on arbitrary alphabets in linear work and space. The algorithm runs in O(I(n) + log n) time on the CRCW PRAM, where I(n) is the best time bound for linear-work stable sorting of integers from [n].

We present experimental results on a multicore machine on a variety of inputs. First, we compare our Cartesian tree algorithm with a simple stack-based sequential implementation. On one core our algorithm is about 3x slower, but we achieve about 30x speedup on 32 cores and about 45x speedup with 32 cores using hyperthreading (two threads per core). We also analyze the algorithm when used as part of code to generate a suffix tree from the original string. We compare the code to the ANSV-based algorithm described in the previous paragraph and to existing sequential implementations of suffix trees. Our algorithm is always faster than the ANSV algorithm. The algorithm is competitive with the sequential code on a single processor (core), and achieves good speedup on 32 cores. Finally, we present timings for searching multiple strings in the suffix tree we generate. Our times are always faster than the sequential suffix tree on one core and always more than 50x faster on 32 cores using hyperthreading.

2 Preliminaries

Given a string s of length n over an ordered alphabet Σ, the suffix array, SA, represents the n suffixes of s in lexicographically sorted order. To be precise, SA[i] = j if and only if the suffix starting at the j'th position in s appears in the i'th position in the suffix-sorted order. A pat tree [Mor68] (or patricia tree, or compacted trie) of a set of strings S is a modified trie in which (1) edges can be labeled with a sequence of characters instead of a single character, (2) no node has a single child, and (3) every string in S corresponds to the concatenation of labels along a path from the root to a leaf. Given a string s of length n, the suffix tree for s stores the n suffixes of s in a pat tree.

In this paper we assume an integer alphabet Σ ⊆ [n], where n is the total number of characters. We require that the pat tree and suffix tree support the following queries on a node in constant expected time: finding the child edge based on the first character of the edge, finding the first child, finding the next and previous sibling in the character order, and finding the parent. If the alphabet is constant-sized, all these operations can easily be implemented in constant worst-case time.

A Cartesian tree [Vui80] on a sequence of elements taken from a total order is a binary tree that satisfies two properties: (1) heap order on values, i.e. a node has an equal or lesser value than any of its descendants, and (2) an inorder traversal of the tree yields the elements in the order of the original sequence.
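To illustrate the ANSV connection mentioned in the introduction: in a min-Cartesian tree over distinct values, the parent of element i is whichever of its nearest smaller value to the left and its nearest smaller value to the right is larger. The C++ sketch below (an illustration with assumed details, not the paper's parallel ANSV algorithm) computes both ANSV arrays with two sequential stack passes and derives the parent array.

```cpp
#include <cstdio>
#include <stack>
#include <vector>

// Illustrative sketch: min-Cartesian tree from all-nearest-smaller-values
// (ANSV) answers. Assumes distinct values; ties can be broken by position.
// parent[i] is the larger-valued (i.e. closer) of i's two nearest smaller
// neighbors; the global minimum has neither and becomes the root.
std::vector<int> cartesianFromANSV(const std::vector<int>& a) {
    int n = (int)a.size();
    std::vector<int> L(n, -1), R(n, -1), parent(n, -1);
    std::stack<int> st;
    for (int i = 0; i < n; i++) {              // nearest smaller value to the left
        while (!st.empty() && a[st.top()] > a[i]) st.pop();
        L[i] = st.empty() ? -1 : st.top();
        st.push(i);
    }
    while (!st.empty()) st.pop();
    for (int i = n - 1; i >= 0; i--) {         // nearest smaller value to the right
        while (!st.empty() && a[st.top()] > a[i]) st.pop();
        R[i] = st.empty() ? -1 : st.top();
        st.push(i);
    }
    for (int i = 0; i < n; i++) {              // pick the closer (larger) of the two
        if (L[i] == -1 && R[i] == -1) parent[i] = -1;        // root
        else if (L[i] == -1) parent[i] = R[i];
        else if (R[i] == -1) parent[i] = L[i];
        else parent[i] = (a[L[i]] > a[R[i]]) ? L[i] : R[i];
    }
    return parent;
}

int main() {
    std::vector<int> a = {3, 1, 4, 0, 5, 2};
    std::vector<int> p = cartesianFromANSV(a);
    for (int i = 0; i < (int)a.size(); i++)
        std::printf("a[%d]=%d  parent index %d\n", i, a[i], p[i]);
}
```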