Bulk Updates and Cache Sensitivity in Search Trees

BULK UPDATES AND CACHE SENSITIVITY INSEARCHTREES TKK Research Reports in Computer Science and Engineering A Espoo 2009 TKK-CSE-A2/09 BULK UPDATES AND CACHE SENSITIVITY INSEARCHTREES Doctoral Dissertation Riku Saikkonen Dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Faculty of Information and Natural Sciences, Helsinki University of Technology for public examination and debate in Auditorium T2 at Helsinki University of Technology (Espoo, Finland) on the 4th of September, 2009, at 12 noon. Helsinki University of Technology Faculty of Information and Natural Sciences Department of Computer Science and Engineering Teknillinen korkeakoulu Informaatio- ja luonnontieteiden tiedekunta Tietotekniikan laitos Distribution: Helsinki University of Technology Faculty of Information and Natural Sciences Department of Computer Science and Engineering P.O. Box 5400 FI-02015 TKK FINLAND URL: http://www.cse.tkk.fi/ Tel. +358 9 451 3228 Fax. +358 9 451 3293 E-mail: [email protected].fi °c 2009 Riku Saikkonen Layout: Riku Saikkonen (except cover) Cover image: Riku Saikkonen Set in the Computer Modern typeface family designed by Donald E. Knuth. ISBN 978-952-248-037-8 ISBN 978-952-248-038-5 (PDF) ISSN 1797-6928 ISSN 1797-6936 (PDF) URL: http://lib.tkk.fi/Diss/2009/isbn9789522480385/ Multiprint Oy Espoo 2009 AB ABSTRACT OF DOCTORAL DISSERTATION HELSINKI UNIVERSITY OF TECHNOLOGY P.O. Box 1000, FI-02015 TKK http://www.tkk.fi/ Author Riku Saikkonen Name of the dissertation Bulk Updates and Cache Sensitivity in Search Trees Manuscript submitted 09.04.2009 Manuscript revised 15.08.2009 Date of the defence 04.09.2009 £ Monograph ¤ Article dissertation (summary + original articles) Faculty Faculty of Information and Natural Sciences Department Department of Computer Science and Engineering Field of research Software Systems Opponent(s) Prof. Peter Widmayer Supervisor Prof. Eljas Soisalon-Soininen Instructor Prof. Eljas Soisalon-Soininen Abstract This thesis examines two topics related to binary search trees: cache-sensitive memory layouts and avl-tree bulk-update operations. Bulk updates are also applied to adaptive sorting. Cache-sensitive data structures are tailored to the hardware caches in modern computers. The thesis presents a method for adding cache-sensitivity to binary search trees without changing the rebalancing strategy. Cache- sensitivity is maintained using worst-case constant-time operations executed when the tree changes. The thesis presents experiments performed on avl trees and red-black trees, including a comparison with cache-sensitive B-trees. Next, the thesis examines bulk insertion and bulk deletion in avl trees. Bulk insertion inserts several keys in one operation. The number of rotations used by avl-tree bulk insertion is shown to be worst-case logarithmic in the number of inserted keys, if they go to the same location in the tree. Bulk deletion deletes an interval of keys. When amortized over a sequence of bulk deletions, each deletion requires a number of rotations that is logarithmic in the number of deleted keys. The search cost and total rebalancing complexity of inserting or deleting keys from several locations in the tree are also analyzed. Experiments show that the algorithms work efficiently with randomly generated input data. Adaptive sorting algorithms are efficient when the input is nearly sorted according to some measure of pre- sortedness. The thesis presents an avl-tree-based variation of the adaptive sorting algorithm known as local insertion sort. Bulk insertion is applied by extracting consecutive ascending or descending keys from the input to be sorted. A variant that does not require a special bulk-insertion algorithm is also given. Experiments show that applying bulk insertion considerably reduces the number of comparisons and time needed to sort nearly sorted sequences. The algorithms are also compared with various other adaptive and non-adaptive sorting algorithms. Keywords cache-sensitivity, cache-consciousness, bulk insertion, bulk deletion, adaptive sorting, avl tree, binary search tree ISBN (printed) 978-952-248-037-8 ISSN (printed) 1797-6928 ISBN (pdf) 978-952-248-038-5 ISSN (pdf) 1797-6936 Language English Number of pages 144 p. Publisher Department of Computer Science and Engineering Print distribution Department of Computer Science and Engineering £ The dissertation can be read at http://lib.tkk.fi/Diss/2009/isbn9789522480385/ AB VAIT¨ OSKIRJAN¨ TIIVISTELMA¨ TEKNILLINEN KORKEAKOULU PL 1000, 02015 TKK http://www.tkk.fi/ Tekijä Riku Saikkonen Väitöskirjan nimi Eräpäivitykset ja välimuistitietoisuus hakupuissa Käsikirjoituksen päivämäärä 09.04.2009 Korjatun käsikirjoituksen päivämäärä 15.08.2009 Väitöstilaisuuden ajankohta 04.09.2009 £ Monografia ¤ Yhdistelmäväitöskirja (yhteenveto + erillisartikkelit) Tiedekunta Informaatio- ja luonnontieteiden tiedekunta Laitos Tietotekniikan laitos Tutkimusala Ohjelmistojärjestelmät Vastaväittäjä(t) prof. Peter Widmayer Työn valvoja prof. Eljas Soisalon-Soininen Työn ohjaaja prof. Eljas Soisalon-Soininen Tiivistelmä Väitöskirja käsittelee kahta binäärisiin hakupuihin liittyvää aihetta: välimuistitietoista solmujen sijoittamista muistiin sekä avl-puiden eräpäivitysoperaatioita. Eräpäivityksiä myös sovelletaan mukautuvaan järjestämiseen. Välimuistitietoiset tietorakenteet on sovitettu nykyisten tietokoneiden laitteistotason välimuisteihin. Väitöskirja esittelee tavan tehdä binäärisistä hakupuista välimuistitietoisia muuttamatta hakupuun tasapainotusmenetel- mää. Välimuistitietoisuus säilytetään tekemällä puun muuttuessa pahimmassakin tapauksessa vakioaikaisia operaatioita. Väitöskirja esittelee avl- ja punamustilla puilla tehtyjä kokeita, sisältäen vertailun välimuistitietoisiin B-puihin. Seuraavaksi väitöskirja käsittelee avl-puiden erälisäys- ja eräpoisto-operaatioita. Erälisäys lisää monta avainta puuhun kerralla. Väitöskirjassa näytetään, että avl-puun erälisäys tekee pahimmassa tapauksessa logaritmisen määrän kiertoja suhteessa lisättyjen avainten lukumäärään, jos avaimet kuuluvat puussa samaan kohtaan. Eräpoisto poistaa puusta kokonaisen avainvälin. Jonossa eräpoisto-operaatioita kukin poisto tarvitsee tasoite- tusti logaritmisen määrän kiertoja suhteessa poistettavien avainten lukumäärään. Lisäksi tutkitaan hakukus- tannusta ja koko tasapainotuksen kustannusta tilanteessa, jossa lisäyksiä tai poistoja tehdään useaan kohtaan puussa. Koetulosten mukaan algoritmit toimivat tehokkaasti satunnaisesti tuotetuilla syötteillä. Mukautuvat järjestämisalgoritmit ovat tehokkaita, kun syöte on ennestään melkein järjestyksessä jonkin järjestystä kuvaavan suureen mukaan. Väitöskirja esittelee avl-puupohjaisen muunnelman paikallinen lisäysjär- jestäminen -nimisestä mukautuvasta järjestämisalgoritmista. Erälisäystä sovelletaan keräämällä järjestettävästä syötteestä peräkkäisiä kasvavia tai väheneviä avaimia. Lisäksi esitellään muunnelma, joka ei tarvitse varsinais- ta erälisäysalgoritmia. Koetulosten mukaan erälisäyksen soveltaminen vähentää tarvittavia vertailuoperaatioita ja suoritusaikaa huomattavasti, kun järjestettävä jono on melkein järjestyksessä. Algoritmeja verrataan myös muihin mukautuviin ja ei-mukautuviin järjestämisalgoritmeihin. Asiasanat välimuistitietoisuus, erälisäys, eräpoisto, mukautuva järjestäminen, avl-puu, binäärinen hakupuu ISBN (painettu) 978-952-248-037-8 ISSN (painettu) 1797-6928 ISBN (pdf) 978-952-248-038-5 ISSN (pdf) 1797-6936 Kieli englanti Sivumäärä 144 s. Julkaisija Tietotekniikan laitos Painetun väitöskirjan jakelu Tietotekniikan laitos £ Luettavissa verkossa osoitteessa http://lib.tkk.fi/Diss/2009/isbn9789522480385/ ix Preface First of all, I would like to thank the supervisor of this work, Professor Eljas Soisalon-Soininen. He has introduced me to the field of algorithm and data structure research and provided an environment where I had the opportunity to concentrate primarily on the research work. He has been very supportive, and an extremely valuable guide and source of new ideas for the research. I am grateful to Professors Tapio Elomaa (Tampere University of Technology) and Otto Nurmi (University of Helsinki) for their detailed and valuable comments on the manuscript, and to Professor Peter Wid- mayer (eth Zürich) for kindly agreeing to be the opponent. The research presented in this thesis was carried out at the Depart- ment of Computer Science and Engineering at the Helsinki University of Technology. I thank the Department, as well as the Helsinki Gradu- ate School in Computer Science and Engineering and the Academy of Finland for providing the financial support that enabled me to work on this thesis. I would also like to thank the authors (far too many to list here) of the free open-source software that I used in this work. The most useful programs were gnu Emacs, Latex and its packages, Gnuplot, the gnu C compiler, and Perl. Thanks also to the people behind the Debian gnu/Linux distribution, which has provided a stable operating system that included all the above software. Finally, I wish to thank my family and friends for their support throughout my studies. Espoo, August 2009 Riku Saikkonen x xi Contents 1 INTRODUCTION 1 1.1 Bulk updates · p. 1 1.2 Cache sensitivity · p. 2 1.3 Adaptive sorting · p. 3 1.4 Organization · p. 4 2 BACKGROUND 5 2.1 Search trees · p. 5 2.2 avl trees · p. 6 2.3 Red-black trees · p. 7 2.4 Internal and external trees · p. 8 2.5 Height-valued trees · p. 10 2.6 Parent links · p. 11 2.7 Relaxed balancing · p. 11 2.8 Bulk updates · p. 12 2.9 Cache-conscious search trees · p. 13 2.10 Adaptive sorting · p. 15 2.11 Finger trees · p. 17 3 CACHE-SENSITIVE BINARY SEARCH TREES 19 3.1 Cache model · p. 19 3.2 Multi-level cache-sensitive layout · p. 21 3.3 Global relocation · p. 24 3.4 Local relocation · p.

Bulk Updates and Cache Sensitivity in Search Trees

1 Suffix Trees

Managing Unbounded-Length Keys in Comparison-Driven Data

Augmentation: Range Trees (PDF)

Advanced Data Structures

Search Trees

The Random Access Zipper: Simple, Purely-Functional Sequences

Design of Data Structures for Mergeable Trees∗

Optimal Finger Search Trees in the Pointer Machine

Space-Efficient Data Structures, Streams, and Algorithms

Finger Search Trees

An..Time Algorithm for Triangulating

Biased Finger Trees and Three-Dimensional Layers of Maxima