A Simple Parallel Cartesian Tree Algorithm and its Application to Suffix Tree Construction∗

Guy E. Blelloch† Julian Shun‡

∗This work was supported by generous gifts from Intel, Microsoft and IBM, and by the National Science Foundation under award CCF1018188.
†Carnegie Mellon University. E-mail: [email protected]
‡Carnegie Mellon University. E-mail: [email protected]

Abstract
We present a simple linear work and space, polylogarithmic time parallel algorithm for generating multiway Cartesian trees. As a special case, the algorithm can be used to generate suffix trees from suffix arrays on arbitrary alphabets in the same bounds. In conjunction with parallel suffix array algorithms, such as the skew algorithm, this gives a rather simple linear work parallel algorithm for generating suffix trees over an integer alphabet Σ ⊆ [1, . . . , n], where n is the length of the input string. More generally, given a sorted sequence of strings and the lengths of the longest common prefixes between adjacent elements, the algorithm will generate a pat tree (compacted trie) over the strings.

We also present experimental results comparing the performance of the algorithm to existing sequential implementations and to a second parallel algorithm. We present comparisons for the Cartesian tree algorithm on its own and for constructing a suffix tree using our algorithm. The results show that on a variety of strings our algorithm is competitive with the sequential version on a single processor and achieves good speedup on multiple processors.

1 Introduction
For a string s of length n over a character set Σ ⊆ {1, . . . , n},¹ the suffix tree stores all the suffixes of s in a pat tree (a trie in which maximal branch-free paths are contracted into a single edge). In addition to supporting searches in s for any string t ∈ Σ∗ in O(|t|) expected time,² suffix trees efficiently support many other operations on strings, such as longest common substring, maximal repeats, longest repeated substring, and longest palindrome, among many others [Gus97]. As such, the suffix tree is one of the most important data structures for string processing. For example, it is used in several bioinformatics applications, such as REPuter [KS99], MUMmer [DPCS02], OASIS [MPK03] and Trellis+ [PZ08]. Both suffix trees and a linear time algorithm for constructing them were introduced by Weiner [Wei73] (although he used the term position tree). Since then various similar constructions have been described [McC76] and there have been many implementations of these algorithms. Although originally designed for fixed-size alphabets with deterministic linear time, Weiner's algorithm can work on the alphabet {1, . . . , n}, henceforth [n], in linear expected time simply by using hashing to access the children of a node.

¹More general alphabets can be used by first sorting the characters and then labeling them from 1 to n.
²Worst-case time for constant-sized alphabets.

The algorithm of Weiner and its derivatives are all incremental and inherently sequential. The first parallel algorithm for suffix trees was given by Apostolico et al. [AIL+88] and was based on a quite different doubling approach. For a parameter 0 < ε ≤ 1, the algorithm runs in O((1/ε) log n) time, O((n/ε) log n) work, and O(n^(1+ε)) space on the CRCW PRAM for arbitrary alphabets. Although reasonably simple, this algorithm is likely not practical since it is not work efficient and uses superlinear memory (by a polynomial factor). The parallel construction of suffix trees was later improved to linear work and space by Hariharan [Har94], with an algorithm taking O(log⁴ n) time on the CREW PRAM, and then by Farach and Muthukrishnan to O(log n) time using a randomized CRCW PRAM [FM96] (high-probability bounds). These later results are for a constant-sized alphabet, are "considerably non-trivial", and do not seem to be amenable to efficient implementations.

One way to construct a suffix tree is to first generate a suffix array (an array of pointers to the lexicographically sorted suffixes) and then convert it to a suffix tree. For binary alphabets, and given the lengths of the longest common prefixes (LCPs) between adjacent entries, this conversion can be done sequentially by generating a Cartesian tree in linear time and space. The approach can be generalized to arbitrary alphabets using multiway Cartesian trees without much difficulty. Using suffix arrays is attractive since in recent years there have been considerable theoretical and practical advances in the generation of suffix arrays (see e.g. [PST07]). The interest is partly due to their need in the widely used Burrows-Wheeler compression algorithm, and also as a more space-efficient alternative to suffix trees. As such, there have been dozens of papers on efficient implementations of suffix arrays. Among these, Karkkainen and Sanders have developed a quite simple and efficient parallel algorithm for suffix arrays [KS03, KS07] that can also generate LCPs.

The story with generating Cartesian trees in parallel is less satisfactory. Berkman et al. [BSV93] describe a parallel algorithm for the all nearest smaller values (ANSV) problem, which can be used directly to generate a binary Cartesian tree for fixed-size alphabets. However, it cannot be used directly for non-constant-sized alphabets, and the algorithm is very complicated. Iliopoulos and Rytter [IR04] present two much simpler algorithms for generating suffix trees from suffix arrays, one based on merging and one based on a variant of the ANSV problem that allows for multiway Cartesian trees. However, both require O(n log n) work.

In this paper we describe a linear work, linear space, and polylogarithmic time algorithm for generating multiway Cartesian trees. The algorithm is based on divide-and-conquer, and we describe two versions that differ in whether the merging step is done sequentially or in parallel. The first, based on a sequential merge, is very simple, and for a tree of height d it runs in O(min{d log n, n}) time on the CREW PRAM. The second version is only slightly more complicated and runs in O(log² n) time on the CREW PRAM. Both use linear work and space.

Given any linear work and space algorithm for generating a suffix array and corresponding LCPs in O(S(n)) time, our results lead to a linear work and space algorithm for generating suffix trees in O(S(n) + log² n) time. For example, using the skew algorithm [KS03] on a CRCW PRAM, we have O(log² n) time for constant-sized alphabets and O(n^ε) time, 0 < ε ≤ 1, for the alphabet [n]. We note that a polylogarithmic time, linear work, and linear space algorithm for the alphabet [n] would imply stable sorting on [n] in the same bounds, which is a long-standing open problem.

For comparison, we also present a technique for using the ANSV problem to generate multiway Cartesian trees on arbitrary alphabets in linear work and space. The algorithm runs in O(I(n) + log n) time on the CRCW PRAM, where I(n) is the best time bound for a linear-work stable sort of integers from [n]. We then show that a Cartesian tree can be used to generate the ANSVs in linear work and polylogarithmic time.

We have implemented the first version of our algorithm and present various experimental results analyzing it on a shared-memory multicore machine on a variety of inputs. First, we compare our Cartesian tree algorithm with a simple stack-based sequential implementation. On one core our algorithm is about 3x slower, but we achieve about 30x speedup on 32 cores and about 45x speedup with 32 cores using hyperthreading (two threads per core). We also analyze the algorithm when used as part of code to generate a suffix tree from the original string. We compare the code to the ANSV-based algorithm described in the previous paragraph and to existing sequential implementations of suffix trees. Our algorithm is always faster than the ANSV-based algorithm. It is competitive with the sequential code on a single processor (core), and achieves good speedup on 32 cores. Finally, we present timings for searching multiple strings in the suffix trees we generate. Our times are always faster than the sequential suffix tree on one core and always more than 50x faster on 32 cores using hyperthreading.

2 Preliminaries
Given a string s of length n over an ordered alphabet Σ, the suffix array SA represents the n suffixes of s in lexicographically sorted order. To be precise, SA[i] = j if and only if the suffix starting at the j'th position in s appears in the i'th position in the suffix-sorted order. A pat tree [Mor68] (or patricia tree, or compacted trie) of a set of strings S is a modified trie in which (1) edges can be labeled with a sequence of characters instead of a single character, (2) no node has a single child, and (3) every string in S corresponds to the concatenation of the labels on a path from the root to a leaf. Given a string s of length n, the suffix tree for s stores the n suffixes of s in a pat tree.

In this paper we assume an integer alphabet Σ ⊆ [n], where n is the total number of characters. We require that the pat tree and suffix tree support the following queries on a node in constant expected time: finding the child edge based on the first character of the edge, finding the first child, finding the next and previous sibling in the character order, and finding the parent. If the alphabet is constant sized, all these operations can easily be implemented in constant worst-case time.

A Cartesian tree [Vui80] on a sequence of elements taken from a total order is a binary tree that satisfies two properties: (1) heap order on values, i.e. a node has an equal or lesser value than any of its descendants, and (2) an inorder traversal of the tree defines the sequence order. If the elements in the sequence are distinct then the tree is unique; otherwise it might not be. When elements are not distinct, we refer to a connected component of equal-value nodes in a Cartesian tree as a cluster. A multiway Cartesian tree is derived from a Cartesian tree by contracting each cluster into a single node while maintaining the order of the children. The multiway Cartesian tree of a sequence is always unique.

Let LCP(s1, s2) be the length of the longest common prefix of s1 and s2. Given a sorted sequence of strings S = [s1, . . . , sn], if the string lengths are interleaved with the lengths of their longest common prefixes (i.e. [|s1|, LCP(s1, s2), |s2|, . . . , LCP(sn−1, sn), |sn|]), the corresponding multiway Cartesian tree has the structure of the pat tree for S. The pat tree can then be generated by adding the strings to the edges, which is easy to do; e.g. for a node with value v = LCP(si, si+1) and parent with value v′, the edge corresponds to the substring si[v′ + 1, . . . , v]. As a special case, interleaving a suffix array with its LCPs and generating the multiway Cartesian tree gives the suffix tree structure. In summary, beyond some trivial operations, generating a multiway Cartesian tree is sufficient for converting a suffix array and its corresponding LCPs to a suffix tree.
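As a small worked example (ours, not from the paper), take the sorted sequence of strings S = [ab, ba, bc]. Then |s1| = |s2| = |s3| = 2, LCP(ab, ba) = 0 and LCP(ba, bc) = 1, so the interleaved sequence is

  [|s1|, LCP(s1, s2), |s2|, LCP(s2, s3), |s3|] = [2, 0, 2, 1, 2].

Its multiway Cartesian tree has the 0 at the root with two children: the leaf 2 on the left (the string ab) and the node 1, which in turn has the two remaining 2s (ba and bc) as children. Interpreting each value as a depth in characters, this is exactly the pat tree of S: the root branches on a and b, and the node for the prefix b at depth 1 branches to the two leaves.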

In this paper, we use the concurrent-read exclusive-write parallel random access machine (PRAM) model and the concurrent-read concurrent-write (CRCW) PRAM. For the CRCW PRAM, we use the model where concurrent writes to the same location result in an arbitrary processor succeeding. We analyze the algorithms in the work-time framework, where we assume unlimited processors and count the number of time steps (T) and the total number of operations (W). Using Brent's WT scheduling principle, this implies an O(T) time bound using W/T processors, as long as processors can be allocated efficiently [Jaj92].

3 Parallel Cartesian Trees
We describe a work-efficient parallel divide-and-conquer algorithm for creating a Cartesian tree. The algorithm works recursively by splitting the input array A into two, generating the Cartesian tree for each subarray, and then merging the results into a Cartesian tree for A. We define the right-spine (left-spine) of a tree to consist of all nodes on the path from the root to the rightmost (leftmost) node of the tree. The merge works by merging the right-spine of the left tree and the left-spine of the right tree based on the value stored at each node. Our algorithm is similar to the O(n log n) work divide-and-conquer suffix-array-to-suffix-tree algorithm of Iliopoulos and Rytter [IR04], but the most important difference is that our algorithm only looks at the nodes on the spines at or deeper than the deeper of the two roots, and our fully parallel version uses trees instead of arrays to represent the spines. This leads to the O(n) work bound. In addition, Iliopoulos and Rytter's algorithm works directly on the suffix array rather than solving the Cartesian tree problem, so the specifics are different.

We describe a partially parallel version of this algorithm (Algorithm 1a) and a fully parallel version (Algorithm 1b). Algorithm 1a is very simple, and takes up to O(min{d log n, n}) time, where d is the depth of the resulting tree, although for most inputs it takes significantly less time (e.g. for the sequence [1, 2, . . . , n] it takes O(log n) time even though the resulting tree has depth n). The algorithm only needs to maintain parent pointers for the nodes in the Cartesian tree. The complete C code is provided in Figure 1, and line numbers from it will be referenced throughout our description.

The algorithm takes as input an array of n elements (Nodes) and recursively splits the array into two halves (lines 19-21), creates a Cartesian tree for each half, and then merges them into a single Cartesian tree (line 22). For the merge (lines 4-15), we combine the right spine of the left subtree with the left spine of the right subtree (see Figure 2). The right (left) spine of the left (right) subtree can be accessed by following parent pointers from the rightmost (leftmost) node of the left (right) subtree. The leftmost and rightmost nodes of a tree are simply the first and last elements in the input array of nodes. We note that once the merge reaches the deeper of the two roots it stops, and need not process the remaining nodes on the other spine. The code in Figure 1 does not keep child pointers since we do not need them for our experiments, but it is easy to add left and right child pointers and maintain them.

Theorem 3.1. Algorithm 1a produces a Cartesian tree on its input array.

Proof. We show that at every step in our algorithm, both the heap and the inorder properties of the Cartesian trees are maintained. At the base case, a Cartesian tree of one node trivially satisfies the two properties. During a merge, the heap property is maintained because a node's parent pointer only changes to point to a node with an equal or lesser value. Consider modifications to the left tree: only the right children of its right spine can be changed, so any descendants that a left-tree node gains from the right tree will correctly appear after it in the inorder traversal. Similarly, in the other direction, any left-tree descendants gained by a right-tree node will correctly appear before it in the inorder traversal. Furthermore, the order within each of the two trees is maintained, since any node that is a descendant on the right (left) in the trees before merging remains a descendant on the right (left) after the merge.

Theorem 3.2. Algorithm 1a for constructing a Cartesian tree requires O(n) work and O(n) space on the RAM.
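To connect this code to Section 2, the following hypothetical driver (ours, not the paper's code; it assumes SA and LCP have already been computed and uses 0-based indexing) builds the suffix tree structure by interleaving suffix lengths with LCPs and calling cartesianTree from Figure 1 below.

#include <stdlib.h>

// Hypothetical driver (ours): builds the suffix tree structure for a string
// of length n from its suffix array SA[0..n-1] and LCP[i] = LCP of the
// suffixes at SA[i] and SA[i+1], via the interleaved sequence of Section 2.
node* suffixArrayToTreeShape(int* SA, int* LCP, int n) {
  if (n == 0) return NULL;
  int m = 2 * n - 1;                   // n suffix lengths + n-1 LCP values
  node* Nodes = (node*) malloc(m * sizeof(node));
  for (int i = 0; i < m; i++) Nodes[i].parent = NULL;
  for (int i = 0; i < n; i++) Nodes[2 * i].value = n - SA[i];      // |suffix|
  for (int i = 0; i + 1 < n; i++) Nodes[2 * i + 1].value = LCP[i];
  cartesianTree(Nodes, m);             // parent pointers now encode the shape
  return Nodes;                        // even slots: leaves; odd: internal
}

Contracting the clusters of equal values (Lemma 3.1 below) then gives the multiway Cartesian tree, i.e. the suffix tree shape.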

1  struct node { node* parent; int value; };
2
3  void merge(node* left, node* right) {
4    node* head;
5    if (left->value > right->value) {
6      head = left; left = left->parent; }
7    else { head = right; right = right->parent; }
8
9    while (1) {
10     if (left == NULL) { head->parent = right; break; }
11     if (right == NULL) { head->parent = left; break; }
12     if (left->value > right->value) {
13       head->parent = left; left = left->parent; }
14     else { head->parent = right; right = right->parent; }
15     head = head->parent; } }
16
17 void cartesianTree(node* Nodes, int n) {
18   if (n < 2) return;
19   cilk_spawn cartesianTree(Nodes, n/2);
20   cartesianTree(Nodes + n/2, n - n/2);
21   cilk_sync;
22   merge(Nodes + n/2 - 1, Nodes + n/2); }

Figure 1: C code for Algorithm 1a. The cilk_spawn and cilk_sync declarations are part of the Cilk++ parallel extensions [Int10] and allow the two recursive calls to run in parallel.

Proof. We use the following definitions to help with proving the complexity bounds of our algorithm. A node in a tree is left-protected if it does not appear on the left spine of its tree, and a node is right-protected if it does not appear on the right spine of its tree. A node is protected if it is both left-protected and right-protected. Once a node becomes protected, it will always be protected and will never be looked at again, since the algorithm only processes the left and right spines of a tree. We show that during a merge, all but two of the nodes that are looked at become protected, and we charge the cost of processing those two nodes to the merge itself. Call the last node looked at on the right spine of the left tree lastnodeLeft, and the last node looked at on the left spine of the right tree lastnodeRight (see Figure 2).

All nodes that are looked at, except for lastnodeLeft and lastnodeRight, will be left-protected by lastnodeLeft. This is because those nodes become either descendants of the right child of lastnodeLeft (when lastnodeLeft is below lastnodeRight) or descendants of lastnodeRight (when lastnodeRight is below lastnodeLeft). A symmetric argument holds for nodes being right-protected. Therefore, all nodes looked at, except for lastnodeLeft and lastnodeRight, become protected after this sequence of operations. Other than when a node appears as lastnodeLeft or lastnodeRight, it is looked at only once and then becomes protected. Therefore the total number of node visits is at most 2n for the lastnodeLeft or lastnodeRight of the n − 1 merges, plus at most n for the nodes that become protected, for a total work of O(n). Since each node only maintains a constant amount of data, the space required is O(n).

Note that although Algorithm 1a makes parallel recursive calls, it uses a sequential merge routine, which in the worst case can take time equal to the depth of the tree. We now describe a fully parallel variant, which we refer to as Algorithm 1b. The algorithm maintains binary search trees for each spine and substitutes a parallel merge for the sequential merge. We will use split and join operations on the spines. A split goes down the spine tree and cuts it at a specified value v, so that all values less than or equal to v are in one tree and all values greater than v are in another. A join takes two trees such that all values in the second are greater than or equal to the largest value in the first, and joins them into one. Both can run in time proportional to the depth of the spine tree, and the join adds at most one to the height of the larger of the two trees. A sketch of the split operation appears after the proof of Theorem 3.3.

Without loss of generality, assume that the root of the right Cartesian tree has a smaller value than the root of the left Cartesian tree (as in Figure 2). For the left tree, the end point of the merge will be its root. To find where to stop merging on the right tree, the algorithm searches the left-spine of the right tree for the root value of the left tree and splits the spine at that point. Now it merges the whole right-spine of the left tree with the deeper portion of the left-spine of the right tree. After the merge, these two parts of the spines can be thrown out, since their nodes have become protected. Finally, the algorithm joins the shallower portion of the left spine of the right tree with the left spine of the left tree to form the new left spine. The right-spine of the resulting Cartesian tree is the same as that of the right Cartesian tree before the merge.

Figure 2: Merging two spines. Thick lines represent the spines of the resulting tree; dashed lines represent edges that existed before the merge but not after; dotted edges represent an arbitrary number of nodes; all non-dashed lines represent edges in the resulting tree.

Theorem 3.3. Algorithm 1b for constructing a Cartesian tree requires O(n) work, O(log² n) time, and O(n) space on the CREW PRAM.

Proof. The trees used to represent the spines are never deeper than O(log n), since each merge does only one join, which adds at most one to the depth. All splits and joins therefore take O(log n) time. The merge can be done using a parallel merging algorithm that runs in O(log n) time and O(n) work on the CREW PRAM [Jaj92], where n is the number of elements being merged. The depth of Algorithm 1b's recursion is O(log n), which gives an O(log² n) time bound. The O(n) work bound follows from an analysis similar to that of the sequential version, with the exception that the splits and joins on the spines cost an extra O(log n) per merge, giving the recurrence W(n) = 2W(n/2) + O(log n), which solves to O(n). The trees on the spines take linear space, so the O(n) space bound follows. Processor allocation is straightforward due to the O(log n) time of the merges.
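As promised above, here is a minimal sketch (ours, not the paper's code) of split, representing a spine as an unbalanced binary search tree keyed by the values of the spine nodes; join and the height bookkeeping that guarantees the depth bounds are omitted.

// Sketch (ours) of split on a spine tree. Each stree node wraps one
// Cartesian tree node on the spine; values increase with depth along a
// spine, so an inorder traversal of the stree gives the spine order.
typedef struct stree {
  struct stree *l, *r;
  node* sn;   /* Cartesian tree node on the spine; key is sn->value */
} stree;

// Cuts t so that *le receives all entries with value <= v and *gt the
// rest, in time proportional to the depth of t.
void split(stree* t, int v, stree** le, stree** gt) {
  if (t == NULL) { *le = *gt = NULL; return; }
  if (t->sn->value <= v) { *le = t; split(t->r, v, &t->r, gt); }
  else                   { *gt = t; split(t->l, v, le, &t->l); }
}

A join is symmetric: it descends one tree and attaches the other, using stored heights so that the result is at most one level deeper than the deeper argument, which is what keeps the spine trees at depth O(log n).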

Lemma 3.1. The output of Algorithm 1a can be used to construct a multiway Cartesian tree in O(n) work on the RAM. Algorithm 1b can perform the same task with O(n) work and O(log n) time on the EREW PRAM.

Proof. For Algorithm 1a, this can be done using path compression to compress all clusters of the same value into the root of the cluster, which can then be used as the "representative" of the cluster. All parent pointers to nodes in a cluster will then point to the representative of that cluster. This takes linear work. For Algorithm 1b, we substitute parallel tree contraction for the path compression; tree contraction can be done in O(n) work and O(log n) time on the EREW PRAM [Jaj92].
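A sequential rendering (ours, not the paper's code) of the path-compression step for Algorithm 1a follows; after contractClusters, every node points at its cluster's topmost member, which acts as the single node of the multiway Cartesian tree.

// rep() returns the topmost node of v's equal-value cluster, compressing
// parent pointers along the way so repeated calls take constant time.
node* rep(node* v) {
  if (v->parent == NULL || v->parent->value != v->value) return v;
  return v->parent = rep(v->parent);
}

void contractClusters(node* Nodes, int n) {
  for (int i = 0; i < n; i++) rep(Nodes + i);  /* members point at representatives */
  for (int i = 0; i < n; i++) {                /* reroute edges between clusters */
    node* v = Nodes + i;
    if (v->parent != NULL && v->parent->value != v->value)
      v->parent = rep(v->parent);
  }
}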

For non-constant-sized alphabets, if we want to search in the tree efficiently (O(1) expected time per edge), the edges need to be inserted into a hash table, which can be done in O(log n) time and O(n) work (both w.h.p.) on a randomized CRCW PRAM.

Corollary 3.1. Given a suffix array for a string over the alphabet [n] and the LCPs between adjacent elements, a suffix tree can be generated in hash-table format using Algorithm 1b, tree contraction, and hash table insertion, in O(n) work, O(log² n) time, and O(n) space on the randomized CRCW PRAM.

Proof. Follows directly.

4 Cartesian Trees and the ANSV Problem
The all nearest smaller values (ANSV) problem is defined as follows: for each element in a sequence of numbers, find the closest smaller element to the left of it and the closest smaller element to the right of it. Here we augment the ANSV-based Cartesian tree algorithm of Berkman et al. [BSV93] to support multiway Cartesian trees, and also show how to use Cartesian trees to solve the ANSV problem.

Berkman et al.'s algorithm solves the ANSV problem in O(n) work and O(log log n) time on the CRCW PRAM. The ANSVs can then be used to generate a Cartesian tree by noting that the parent of a node has to be the nearest smaller value in one of the two directions (in particular, the larger of the two nearest smaller values is the parent). To convert their Cartesian tree to the multiway Cartesian tree, one needs to group together all nodes pointing to the same parent and coming from the same direction. If I(n) is the best time bound for stably sorting integers from [n] using linear space and work, the grouping can be done in linear work and O(I(n) + log log n) time by sorting on the parent ids of the nodes. Stability is important since we need to maintain the relative order among the children of a node.

Theorem 4.1. A multiway Cartesian tree on an array of elements can be generated in O(n) work and space and O(I(n) + log log n) time on the CRCW PRAM.

Proof. This follows from the bounds of the ANSV algorithm and of stable integer sorting.

It is not currently known whether I(n) is polylogarithmic, so at present this result seems weaker than the result from the previous section. In the experimental section we compare the two algorithms. In related work, Iliopoulos and Rytter [IR04] present an O(n log n) work, polylogarithmic time algorithm based on a variant of ANSV.

We now describe a method for obtaining the ANSVs from a Cartesian tree in parallel using tree contraction. Note that for any node in the Cartesian tree, both of its nearest smaller neighbors (if it has them) must be on the path from the node to the root (one neighbor is trivially the node's parent). We first present a simple linear-work algorithm for the task that takes time equal to the depth of the Cartesian tree. Let d denote the depth of the tree, with the root being at depth 1. The following algorithm returns the left nearest neighbors of all nodes; a symmetric algorithm returns the right nearest neighbors.

For every node, maintain two variables: node.index, which is set to the node's index in the sequence corresponding to the inorder traversal of the Cartesian tree and never changed, and node.inherited, which is initialized to null.

For each level i of the tree from 1 to d: in parallel, for all nodes at level i, pass node.inherited to the node's left child and node.index to its right child. The child stores the passed value in its inherited variable.

Finally, for all nodes in parallel: if node.inherited ≠ null, then node.inherited denotes the index of the node's left smaller neighbor; otherwise the node does not have a left smaller neighbor. A sequential rendering of this pass is sketched below.
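In this sketch (ours, not the paper's code), the two per-level assignments collapse into a single top-down traversal: a left child inherits its parent's inherited value, and a right child receives its parent's index. It assumes child pointers, which Figure 1 omits but which are easy to maintain.

#define NONE (-1)   /* plays the role of null */

typedef struct tnode {
  struct tnode *left, *right;
  int index;      /* position in the inorder (input) sequence; never changes */
  int inherited;  /* on completion: index of left smaller neighbor, or NONE */
} tnode;

void leftNeighbors(tnode* v, int inh) {
  if (v == NULL) return;
  v->inherited = inh;                 /* value passed down from the parent */
  leftNeighbors(v->left, inh);        /* left child gets the inherited value */
  leftNeighbors(v->right, v->index);  /* right child gets this node's index */
}
/* invoke as: leftNeighbors(root, NONE); */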

We now present a linear-work and O(log n) time parallel algorithm for this task based on tree contraction [Jaj92]. To find the left neighbors, we describe several of the configurations for the rake and compress operations; the remaining configurations have a symmetric argument. For compress, there are the left-left and right-left configurations. For the left-left configuration, the compressed node takes the inherited value of its parent. For the right-left configuration, the compressed node takes the index value of its parent. For raking a left leaf, the raked leaf takes the inherited value of its parent, and for raking a right leaf, the raked leaf takes the index value of its parent. We can find the right neighbors by symmetry.

5 Experiments
The goal of our experiments is to analyze the effectiveness of our Cartesian tree algorithm both on its own and as part of code to generate suffix trees. We compare the algorithm to the ANSV-based algorithm and to the best available sequential code for suffix trees. We feel it is important to compare our parallel algorithm with existing sequential implementations to make sure that the algorithm can significantly outperform existing sequential ones even on the modest number of processors (cores) available on current desktop machines. In our discussion we refer to the two variants of our main algorithm (Section 3) as Algorithm 1a and Algorithm 1b, and to the ANSV-based algorithm as Algorithm 2. For the experiments, in addition to implementing Algorithm 1a and a variant of Algorithm 2, we implemented parallel code for suffix arrays and their corresponding LCPs, and parallel code for inserting the tree nodes into a hash table to allow for efficient search. We ran our experiments on a 32-core parallel machine using a variety of real-world and artificial strings.

5.1 Auxiliary Code. To generate the suffix array and LCPs we implemented a parallel version of the skew algorithm [KS03]. The implementation uses a parallel radix sort [Jaj92], requiring O(n) work and O(n^(1/c)) time for some constant c ≥ 1. Our LCP code is based on an O(n log n) work solution for the range minima problem instead of the optimal O(n). We did implement a parallel version of the linear-time range-minima algorithm of [FH06], but found that it was slower. Due to better locality in the parallel radix sort than in the sequential one, our code on a single core is actually faster than the version of [KS03] implemented by its authors and available online, even though that version does not compute the LCPs. We get a further 9- to 17-fold speedup on a 32-core machine. Compared to the parallel implementation of suffix arrays by Kulla and Sanders [KS07], our times on 32 cores are faster than the 64-core numbers reported by them (31.6 seconds vs. 37.8 seconds on 522 million characters), although their clock speed is 33% slower than ours and it is a different system, so it is hard to compare directly. Mori provides a parallel suffix array implementation using OpenMP [Mor10a], but we found that it is slower than the corresponding sequential implementation, even when using 32 cores. Our parallel implementation significantly outperforms that of Mori. These are the only two existing parallel implementations of suffix arrays that we are aware of.

We note that recent sequential suffix array codes are faster than ours running on one core [PST07, Mor10a, Mor10b], but most of them do not compute the LCPs. For real-world texts, those programs are faster than our code by a factor of between 3 and 6. This is due to the many optimizations these codes make. We expect many of these optimizations could be parallelized and could significantly improve the performance of parallel suffix array construction, but this was not the purpose of our studies. One advantage of basing suffix tree code on suffix array code, however, is that improvements made to parallel suffix arrays would improve the performance of the suffix tree code.

We use a parallel hash table to allow for fast search in the suffix tree. The hash table uses a compare-and-swap for concurrent insertion. Furthermore, we optimized the code so that most entries near the leaves of the tree are not inserted into the hash table and a linear search is used instead. In particular, since our Cartesian tree code stores the tree nodes as an inorder traversal of the suffixes of the suffix tree, a child and parent near the leaves are likely to be near each other in this array. In our code, if the child is within some constant k (16 in the experiments) of its parent in the array, we do not store it in the hash table but use a linear search instead.

For Algorithm 2, we implemented the O(n log n) work and O(log n) time ANSV algorithm of [BSV93] instead of the much more complicated work-optimal version. However, we used an optimization that makes it run significantly faster in practice: using a doubling search, each left query on location i runs in time O(log(k − j + 1)), where k is the resulting nearest lesser value and j = max{l ∈ [1, . . . , i − 1] | v[l] < v[l + 1]}, and symmetrically for right queries. A self-contained sketch of such a query is given below.
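The following sketch (ours, not the paper's implementation) illustrates a distance-sensitive left query: it gallops leftward, doubling the window until it contains a smaller value, then binary searches for the rightmost such position. Here rangeMin is a placeholder linear scan; the actual code answers these range-minimum queries from the structure built by the ANSV algorithm.

int* v;  /* input sequence v[0..n-1] (global for brevity) */

/* placeholder: min of v[lo..hi]; a precomputed structure makes this fast */
static int rangeMin(int lo, int hi) {
  int m = v[lo];
  for (int j = lo + 1; j <= hi; j++) if (v[j] < m) m = v[j];
  return m;
}

/* nearest position j < i with v[j] < v[i], or -1 if none; the number of
   rangeMin probes is logarithmic in the distance to the answer */
int leftQuery(int i) {
  if (i == 0) return -1;
  int w = 1;                        /* doubling window [i-w, i-1] */
  while (i - w > 0 && rangeMin(i - w, i - 1) >= v[i]) w *= 2;
  int lo = (i - w < 0) ? 0 : i - w, hi = i - 1;
  if (rangeMin(lo, hi) >= v[i]) return -1;  /* nothing smaller to the left */
  while (lo < hi) {                 /* rightmost position holding a smaller value */
    int mid = (lo + hi + 1) / 2;
    if (rangeMin(mid, hi) < v[i]) lo = mid; else hi = mid - 1;
  }
  return lo;
}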

5.2 Experimental Setup. We performed experiments on a 32-core (with hyper-threading) Dell PowerEdge 910 with 4 × 2.26GHZ Intel 8-core X7560 Nehalem Processors, a 1066MHz bus, and 64GB of main memory. The parallel programs were compiled using the cilk++ compiler (build 8503) with the -O2 flag. The sequential programs were compiled using g++ 4.4.1 with the -O2 flag.

For comparison to sequential suffix tree code we used Tsadok and Yona's code [TY03] and Kurtz's code from the MUMmer project [DPCS02, Kur99] (http://mummer.sourceforge.net), both of which are publicly available. We only list the results of Kurtz because they are superior to those of [TY03] for all of our test files. Kurtz's code is based on McCreight's suffix tree construction algorithm [McC76]; it is inherently sequential and completely different from our algorithms. Other researchers have experimented with building suffix trees in parallel [GM09], but due to difficulty in obtaining the source code, we do not have results for this implementation. However, our running times appear significantly faster than those reported in the corresponding papers, even after accounting for differences in machine specifications.

For running the experiments we used a variety of strings available online (http://people.unipmn.it/manzini/lightweight/corpus/), a Microsoft Word document (thesaurus.doc), XML code from wikipedia samples (wikisamp8.xml and wikisamp9.xml), and artificial inputs. We also included two files of integers: 10Mrandom-ints10K, with random integers ranging from 1 to 10³, and 10Mrandom-intsImax, with random integers ranging from 1 to 2³¹, to show that our algorithms run efficiently on arbitrary integer alphabets.

We present times for searching for random substrings in the suffix trees of several texts constructed using our code and Kurtz's code. We also report times for searches using Manber and Myers' suffix array code [MM90], as Abouelhoda et al. show that this code (mamy) performs searches more quickly than Kurtz's code does. For each text, we search 500,000 random substrings of the text (these should all be found) and 500,000 random strings (most will not be found), with lengths uniformly distributed between 1 and 25. The file etext99* consists of the etext99 data with special characters removed (mamy does not work with special characters), and its size is 100.8MB. The file 100Mrandom contains 10⁸ random characters.

5.3 Cartesian Trees. We experimentally compare our Cartesian tree algorithm (Algorithm 1a) to the linear-time stack-based sequential algorithm of [GBT84]. There is also a linear-time sequential algorithm based on ANSVs, but we verified that the stack-based algorithm outperforms the ANSV one, so we only report times for the former. Figure 3 shows the running time vs. number of processors for both our algorithm and the stack-based one in log-log scale. Our parallel algorithm achieves nearly linear speedup and outperforms the sequential algorithm with 4 or more cores.

Figure 3: Running times for the Cartesian tree algorithms (sequential stack vs. parallel merging) on 10M random characters on the 32-core machine, in log-log scale. The 64-core time is actually 32 cores using hyper-threading.
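For reference, here is a minimal sketch (ours, not the presentation of [GBT84]) of the stack-based sequential algorithm used in this comparison, written with parent pointers only so that it produces the same representation as Figure 1.

// Sketch (ours) of the linear-time stack-based sequential algorithm
// [GBT84], using the node struct of Figure 1. The right spine of the tree
// built so far serves as the stack, reached via parent pointers from its
// deepest node 'last'.
void sequentialCartesianTree(node* Nodes, int n) {
  if (n == 0) return;
  Nodes[0].parent = NULL;
  node* last = Nodes;                      /* deepest node on the right spine */
  for (int i = 1; i < n; i++) {
    node* cur = Nodes + i;
    node* p = last;
    node* popped = NULL;                   /* shallowest node popped so far */
    while (p != NULL && p->value > cur->value) { popped = p; p = p->parent; }
    cur->parent = p;                       /* first spine node with value <= cur's */
    if (popped != NULL) popped->parent = cur; /* popped chain becomes cur's left subtree */
    last = cur;                            /* cur is the new deepest spine node */
  }
}

Every node enters and leaves the spine at most once, giving the O(n) bound; equal values are kept as descendant chains, matching the clusters of Section 2.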

Text                 Size (MB)  Kurtz  Alg 1a   Alg 1a  Alg 1a   Alg 2    Alg 2  Alg 2
                                       32-core  seq.    Speedup  32-core  seq.   Speedup
10Midentical         10         1.04   0.46     4.34    9.4      1.32     9.64   7.66
10Mrandom            10         10.6   0.39     7.58    19.4     1.01     11.9   11.8
(a^√(10⁷) b)^√(10⁷)  10         1.69   0.54     5.78    10.7     1.36     11.0   8.1
chr22.dna            34.6       31.1   2.23     37.4    16.8     5.22     53.9   10.3
etext99              105        128    8.71     135     15.5     15.7     191    12.2
howto                39.4       33.1   2.90     45.8    15.8     5.88     64.1   11.0
jdk13c               69.7       18.1   5.30     90.2    17.0     11.0     125    11.6
rctail96             115        65.5   9.54     153     16.0     17.1     216    11.4
rfc                  116        89.6   8.96     150     16.7     17.4     220    12.6
sprot34.dat          110        89.8   9.42     143     15.2     15.2     202    13.3
thesaurus.doc        11.2       10.7   0.72     9.77    13.6     1.39     14.4   10.4
w3c2                 104        33.4   8.87     138     15.6     15.8     201    12.7
wikisamp8.xml        100        35.5   8.42     128     15.2     13.5     173    12.8
wikisamp9.xml        1000       —      118      1840    15.6     —        —      —
10MrandomInts10K     10         —      0.44     7.74    17.6     0.91     10.1   11.1
10MrandomIntsImax    10         —      0.41     5.96    14.5     0.73     8.18   11.2

Table 1: Comparison of running times (seconds) of MUMmer's sequential algorithm and our algorithms for suffix tree construction on different inputs, on the 32-core (with hyper-threading) Dell PowerEdge 910 with 4 × 2.26GHZ Intel 8-core X7560 Nehalem Processors and a 1066MHz bus.

5.4 Suffix Trees. We use Algorithms 1a and 2 along with our suffix array code and hash insertion to generate suffix trees from strings. Table 1 presents the runtimes for generating the suffix tree based on Algorithm 1a, our variant of Algorithm 2, and Kurtz's code. For the implementations based on Algorithms 1a and 2 we give both sequential (single thread) running times and parallel running times on 32 cores with hyper-threading. We note that the speedup ranges from 9 to 19, with the worst speedup on the string of all equal characters. Figure 4 shows runtime as a function of the number of cores for 10 million random characters (10Mrandom). Compared to Kurtz's code, our code running sequentially is between 1.4x faster and 5x slower. Our parallel code, however, is always faster than Kurtz's code, by up to 27x. Comparatively, Kurtz's code performs best on strings with lots of regularity (e.g. the all-identical string); this is because the incremental sequential algorithms based on Weiner's algorithm are particularly good on such strings. The runtime of our code is affected much less by the type of input string.

Figures 5 and 6 show the breakdown of the times for our implementations of Algorithm 1a and Algorithm 2, respectively, when run on 32 cores with hyper-threading. Figure 7 shows the breakdown of the time for generating the suffix array and LCP array. For Algorithm 1a, about 85% of the total time is spent in generating the suffix array, about 10% in inserting into the hash table, and less than 5% on generating the Cartesian tree from the suffix array (i.e., the code shown in Figure 1). For Algorithm 2, we note that the ANSV computation only takes about 10% of the total time even though it is an O(n log n) algorithm; this is likely due to the optimization discussed above. Since we did not spend a lot of time optimizing the suffix array or hash code, we expect the overall code performance could be improved significantly. Figure 8 shows the running time of Algorithm 1a on random character files of varying sizes. We note that the running time is superlinear in the file size, and we hypothesize that this is due to memory effects. While our implementation of Algorithm 1a is not truly parallel, it is incredibly straightforward and performs better than Algorithm 2.

Search times are reported in Table 2. Searches done in our code are on integers, while those done in Kurtz's code and in Manber and Myers' code are done on characters. We report both sequential and parallel search times for our algorithm. Our results show that even sequentially, our code performs searches faster than the other two implementations do. Our code would likely perform even better if we modified it to search on characters instead of integers.

5.5 Space Requirements. Since suffix trees are often constructed on large texts (e.g. the human genome), it is important to keep the space requirements minimal. As such, there has been related work on compactly representing suffix trees [AKO03, Sad07]. Our implementations of Algorithm 1a and Algorithm 2 use 3 integers per node (leaf and internal) and about 5n bytes for the hash table; with 4-byte integers and roughly 2n nodes, this totals about 29n bytes. This compares to about 12n bytes for Kurtz's code, which has been optimized for space [DPCS02, Kur99]. We leave further optimization of the space of our code to future work.

6 Conclusions
We have described a work-optimal parallel algorithm with O(log² n) time for constructing a Cartesian tree and a multiway Cartesian tree. By using this algorithm, we are able to transform a suffix array into a suffix tree in linear work and O(log² n) time. Our approach is much simpler than previous approaches for generating suffix trees in parallel, and it can handle arbitrary alphabets. We implemented our algorithm and showed that it achieves good speedup and outperforms existing suffix tree construction implementations on small-scale multicore machines.

Text         Alg 1a  Alg 1a   Kurtz  mamy
             seq.    32-core  seq.   seq.
100Mrandom   1.08    0.03     1.86   1.78
etext99*     1.24    0.03     7.67   1.73
sprot34.dat  0.96    0.02     3.75   1.73

Table 2: Comparison of times (seconds) for searching 1,000,000 strings of lengths 1 to 25, on the 32-core machine with hyper-threading.

Figure 4: Running times for the different algorithms (Algorithm 1a, Algorithm 2, and Kurtz) using varying numbers of cores on 10 million random characters, on the 32-core machine with hyper-threading.

Figure 5: Breakdown of running times for converting a suffix array to a suffix tree using Algorithm 1a on 32 cores with hyper-threading.

Figure 6: Breakdown of running times for the suffix tree portion of Algorithm 2 on 32 cores with hyper-threading.

Figure 7: Breakdown of running times for the suffix array portion of Algorithm 1a and Algorithm 2 on 32 cores.

Figure 8: Running times for Algorithm 1a on random character files of varying sizes, on the 32-core machine with hyper-threading.

References

[AIL+88] A. Apostolico, C. Iliopoulos, G. Landau, B. Schieber, and U. Vishkin, Parallel construction of a suffix tree with applications, Algorithmica 3 (1988), 347–365.
[AKO03] M. Abouelhoda, S. Kurtz, and E. Ohlebusch, Replacing suffix trees with enhanced suffix arrays, Journal of Discrete Algorithms 2 (2003), 53–86.
[BSV93] O. Berkman, B. Schieber, and U. Vishkin, Optimal doubly logarithmic parallel algorithms based on finding all nearest smaller values, Journal of Algorithms 14 (1993), 371–380.
[DPCS02] A. Delcher, A. Phillippy, J. Carlton, and S. Salzberg, Fast algorithms for large-scale genome alignment and comparison, Nucleic Acids Research 30 (2002), no. 11, 2478–2483.
[FH06] Johannes Fischer and Volker Heun, Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE, CPM, 2006, pp. 36–48.
[FM96] M. Farach and S. Muthukrishnan, Optimal logarithmic time randomized suffix tree construction, ICALP, 1996.
[GBT84] H. Gabow, J. Bentley, and R. Tarjan, Scaling and related techniques for geometry problems, STOC '84, 1984, pp. 135–143.
[GM09] A. Ghoting and K. Makarychev, Indexing genomic sequences on the IBM Blue Gene, Supercomputing '09, 2009.
[Gus97] D. Gusfield, Algorithms on strings, trees and sequences, Cambridge University Press, 1997.
[Har94] Ramesh Hariharan, Optimal parallel suffix tree construction, STOC '94: Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, 1994, pp. 290–299.
[Int10] Intel, Cilk++ programming language, 2010, http://software.intel.com/en-us/articles/intel-cilk.
[IR04] C. Iliopoulos and W. Rytter, On parallel transformations of suffix arrays into suffix trees, AWOCA, 2004.
[Jaj92] J. Jaja, Introduction to parallel algorithms, Addison-Wesley Professional, 1992.
[KS99] S. Kurtz and C. Schleiermacher, REPuter: Fast computation of maximal repeats in complete genomes, Bioinformatics 15 (1999), no. 5, 426–427.
[KS03] J. Karkkainen and P. Sanders, Simple linear work suffix array construction, ICALP, 2003.
[KS07] F. Kulla and P. Sanders, Scalable parallel suffix array construction, Parallel Computing 33 (2007), no. 9, 605–612.
[Kur99] S. Kurtz, Reducing the space requirement of suffix trees, Software Practice and Experience 29 (1999), no. 13, 1149–1171.
[McC76] Edward M. McCreight, A space-economical suffix tree construction algorithm, Journal of the ACM 23 (1976), no. 2, 262–272.
[MM90] U. Manber and G. Myers, Suffix arrays: A new method for on-line string searches, First ACM-SIAM Symposium on Discrete Algorithms, 1990, pp. 319–327.
[Mor68] Donald R. Morrison, PATRICIA: practical algorithm to retrieve information coded in alphanumeric, J. ACM 15 (1968), no. 4, 514–534.
[Mor10a] Yuta Mori, libdivsufsort: A lightweight suffix-sorting library, 2010, http://code.google.com/p/libdivsufsort.
[Mor10b] Yuta Mori, sais: An implementation of the induced sorting algorithm, 2010, http://sites.google.com/site/yuta256/sais.
[MPK03] C. Meek, J. M. Patel, and S. Kasetty, OASIS: An online and accurate technique for local-alignment searches on biological sequences, VLDB, 2003.
[PST07] S. Puglisi, W. F. Smyth, and A. H. Turpin, A taxonomy of suffix array construction algorithms, ACM Computing Surveys 39 (2007), no. 2.
[PZ08] B. Phoophakdee and M. Zaki, Trellis+: An effective approach for indexing genome-scale sequences using suffix trees, Pacific Symposium on Biocomputing, vol. 13, 2008, pp. 90–101.
[Sad07] K. Sadakane, Compressed suffix trees with full functionality, Theory of Computing Systems 41 (2007), no. 4.
[TY03] D. Tsadok and S. Yona, ANSI C implementation of a suffix tree, 2003, http://mila.cs.technion.ac.il/~yona/suffix_tree/.
[Vui80] Jean Vuillemin, A unifying look at data structures, Commun. ACM 23 (1980), no. 4, 229–239.
[Wei73] P. Weiner, Linear pattern matching algorithm, 14th Annual IEEE Symposium on Switching and Automata Theory, 1973, pp. 1–11.
