Bulk Updates and Cache Sensitivity in Search Trees
Total Page:16
File Type:pdf, Size:1020Kb
BULK UPDATES AND CACHE SENSITIVITY INSEARCHTREES TKK Research Reports in Computer Science and Engineering A Espoo 2009 TKK-CSE-A2/09 BULK UPDATES AND CACHE SENSITIVITY INSEARCHTREES Doctoral Dissertation Riku Saikkonen Dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Faculty of Information and Natural Sciences, Helsinki University of Technology for public examination and debate in Auditorium T2 at Helsinki University of Technology (Espoo, Finland) on the 4th of September, 2009, at 12 noon. Helsinki University of Technology Faculty of Information and Natural Sciences Department of Computer Science and Engineering Teknillinen korkeakoulu Informaatio- ja luonnontieteiden tiedekunta Tietotekniikan laitos Distribution: Helsinki University of Technology Faculty of Information and Natural Sciences Department of Computer Science and Engineering P.O. Box 5400 FI-02015 TKK FINLAND URL: http://www.cse.tkk.fi/ Tel. +358 9 451 3228 Fax. +358 9 451 3293 E-mail: [email protected].fi °c 2009 Riku Saikkonen Layout: Riku Saikkonen (except cover) Cover image: Riku Saikkonen Set in the Computer Modern typeface family designed by Donald E. Knuth. ISBN 978-952-248-037-8 ISBN 978-952-248-038-5 (PDF) ISSN 1797-6928 ISSN 1797-6936 (PDF) URL: http://lib.tkk.fi/Diss/2009/isbn9789522480385/ Multiprint Oy Espoo 2009 AB ABSTRACT OF DOCTORAL DISSERTATION HELSINKI UNIVERSITY OF TECHNOLOGY P.O. Box 1000, FI-02015 TKK http://www.tkk.fi/ Author Riku Saikkonen Name of the dissertation Bulk Updates and Cache Sensitivity in Search Trees Manuscript submitted 09.04.2009 Manuscript revised 15.08.2009 Date of the defence 04.09.2009 £ Monograph ¤ Article dissertation (summary + original articles) Faculty Faculty of Information and Natural Sciences Department Department of Computer Science and Engineering Field of research Software Systems Opponent(s) Prof. Peter Widmayer Supervisor Prof. Eljas Soisalon-Soininen Instructor Prof. Eljas Soisalon-Soininen Abstract This thesis examines two topics related to binary search trees: cache-sensitive memory layouts and avl-tree bulk-update operations. Bulk updates are also applied to adaptive sorting. Cache-sensitive data structures are tailored to the hardware caches in modern computers. The thesis presents a method for adding cache-sensitivity to binary search trees without changing the rebalancing strategy. Cache- sensitivity is maintained using worst-case constant-time operations executed when the tree changes. The thesis presents experiments performed on avl trees and red-black trees, including a comparison with cache-sensitive B-trees. Next, the thesis examines bulk insertion and bulk deletion in avl trees. Bulk insertion inserts several keys in one operation. The number of rotations used by avl-tree bulk insertion is shown to be worst-case logarithmic in the number of inserted keys, if they go to the same location in the tree. Bulk deletion deletes an interval of keys. When amortized over a sequence of bulk deletions, each deletion requires a number of rotations that is logarithmic in the number of deleted keys. The search cost and total rebalancing complexity of inserting or deleting keys from several locations in the tree are also analyzed. Experiments show that the algorithms work efficiently with randomly generated input data. Adaptive sorting algorithms are efficient when the input is nearly sorted according to some measure of pre- sortedness. The thesis presents an avl-tree-based variation of the adaptive sorting algorithm known as local insertion sort. Bulk insertion is applied by extracting consecutive ascending or descending keys from the input to be sorted. A variant that does not require a special bulk-insertion algorithm is also given. Experiments show that applying bulk insertion considerably reduces the number of comparisons and time needed to sort nearly sorted sequences. The algorithms are also compared with various other adaptive and non-adaptive sorting algorithms. Keywords cache-sensitivity, cache-consciousness, bulk insertion, bulk deletion, adaptive sorting, avl tree, binary search tree ISBN (printed) 978-952-248-037-8 ISSN (printed) 1797-6928 ISBN (pdf) 978-952-248-038-5 ISSN (pdf) 1797-6936 Language English Number of pages 144 p. Publisher Department of Computer Science and Engineering Print distribution Department of Computer Science and Engineering £ The dissertation can be read at http://lib.tkk.fi/Diss/2009/isbn9789522480385/ AB VAIT¨ OSKIRJAN¨ TIIVISTELMA¨ TEKNILLINEN KORKEAKOULU PL 1000, 02015 TKK http://www.tkk.fi/ Tekij¨a Riku Saikkonen V¨ait¨oskirjan nimi Er¨ap¨aivitykset ja v¨alimuistitietoisuus hakupuissa K¨asikirjoituksen p¨aiv¨am¨a¨ar¨a 09.04.2009 Korjatun k¨asikirjoituksen p¨aiv¨am¨a¨ar¨a 15.08.2009 V¨ait¨ostilaisuuden ajankohta 04.09.2009 £ Monografia ¤ Yhdistelm¨av¨ait¨oskirja (yhteenveto + erillisartikkelit) Tiedekunta Informaatio- ja luonnontieteiden tiedekunta Laitos Tietotekniikan laitos Tutkimusala Ohjelmistoj¨arjestelm¨at Vastav¨aitt¨aj¨a(t) prof. Peter Widmayer Ty¨on valvoja prof. Eljas Soisalon-Soininen Ty¨on ohjaaja prof. Eljas Soisalon-Soininen Tiivistelm¨a V¨ait¨oskirja k¨asittelee kahta bin¨a¨arisiin hakupuihin liittyv¨a¨a aihetta: v¨alimuistitietoista solmujen sijoittamista muistiin sek¨a avl-puiden er¨ap¨aivitysoperaatioita. Er¨ap¨aivityksi¨a my¨os sovelletaan mukautuvaan j¨arjest¨amiseen. V¨alimuistitietoiset tietorakenteet on sovitettu nykyisten tietokoneiden laitteistotason v¨alimuisteihin. V¨ait¨oskirja esittelee tavan tehd¨a bin¨a¨arisist¨a hakupuista v¨alimuistitietoisia muuttamatta hakupuun tasapainotusmenetel- m¨a¨a. V¨alimuistitietoisuus s¨ailytet¨a¨an tekem¨all¨a puun muuttuessa pahimmassakin tapauksessa vakioaikaisia ope- raatioita. V¨ait¨oskirja esittelee avl- ja punamustilla puilla tehtyj¨a kokeita, sis¨alt¨aen vertailun v¨alimuistitietoisiin B-puihin. Seuraavaksi v¨ait¨oskirja k¨asittelee avl-puiden er¨alis¨ays- ja er¨apoisto-operaatioita. Er¨alis¨ays lis¨a¨a monta avainta puuhun kerralla. V¨ait¨oskirjassa n¨aytet¨a¨an, ett¨a avl-puun er¨alis¨ays tekee pahimmassa tapauksessa logaritmi- sen m¨a¨ar¨an kiertoja suhteessa lis¨attyjen avainten lukum¨a¨ar¨a¨an, jos avaimet kuuluvat puussa samaan kohtaan. Er¨apoisto poistaa puusta kokonaisen avainv¨alin. Jonossa er¨apoisto-operaatioita kukin poisto tarvitsee tasoite- tusti logaritmisen m¨a¨ar¨an kiertoja suhteessa poistettavien avainten lukum¨a¨ar¨a¨an. Lis¨aksi tutkitaan hakukus- tannusta ja koko tasapainotuksen kustannusta tilanteessa, jossa lis¨ayksi¨a tai poistoja tehd¨a¨an useaan kohtaan puussa. Koetulosten mukaan algoritmit toimivat tehokkaasti satunnaisesti tuotetuilla sy¨otteill¨a. Mukautuvat j¨arjest¨amisalgoritmit ovat tehokkaita, kun sy¨ote on ennest¨a¨an melkein j¨arjestyksess¨a jonkin j¨arjestyst¨a kuvaavan suureen mukaan. V¨ait¨oskirja esittelee avl-puupohjaisen muunnelman paikallinen lis¨aysj¨ar- jest¨aminen -nimisest¨a mukautuvasta j¨arjest¨amisalgoritmista. Er¨alis¨ayst¨a sovelletaan ker¨a¨am¨all¨a j¨arjestett¨av¨ast¨a sy¨otteest¨a per¨akk¨aisi¨a kasvavia tai v¨ahenevi¨a avaimia. Lis¨aksi esitell¨a¨an muunnelma, joka ei tarvitse varsinais- ta er¨alis¨aysalgoritmia. Koetulosten mukaan er¨alis¨ayksen soveltaminen v¨ahent¨a¨a tarvittavia vertailuoperaatioita ja suoritusaikaa huomattavasti, kun j¨arjestett¨av¨a jono on melkein j¨arjestyksess¨a. Algoritmeja verrataan my¨os muihin mukautuviin ja ei-mukautuviin j¨arjest¨amisalgoritmeihin. Asiasanat v¨alimuistitietoisuus, er¨alis¨ays, er¨apoisto, mukautuva j¨arjest¨aminen, avl-puu, bin¨a¨arinen hakupuu ISBN (painettu) 978-952-248-037-8 ISSN (painettu) 1797-6928 ISBN (pdf) 978-952-248-038-5 ISSN (pdf) 1797-6936 Kieli englanti Sivum¨a¨ar¨a 144 s. Julkaisija Tietotekniikan laitos Painetun v¨ait¨oskirjan jakelu Tietotekniikan laitos £ Luettavissa verkossa osoitteessa http://lib.tkk.fi/Diss/2009/isbn9789522480385/ ix Preface First of all, I would like to thank the supervisor of this work, Professor Eljas Soisalon-Soininen. He has introduced me to the field of algorithm and data structure research and provided an environment where I had the opportunity to concentrate primarily on the research work. He has been very supportive, and an extremely valuable guide and source of new ideas for the research. I am grateful to Professors Tapio Elomaa (Tampere University of Technology) and Otto Nurmi (University of Helsinki) for their detailed and valuable comments on the manuscript, and to Professor Peter Wid- mayer (eth Z¨urich) for kindly agreeing to be the opponent. The research presented in this thesis was carried out at the Depart- ment of Computer Science and Engineering at the Helsinki University of Technology. I thank the Department, as well as the Helsinki Gradu- ate School in Computer Science and Engineering and the Academy of Finland for providing the financial support that enabled me to work on this thesis. I would also like to thank the authors (far too many to list here) of the free open-source software that I used in this work. The most useful programs were gnu Emacs, Latex and its packages, Gnuplot, the gnu C compiler, and Perl. Thanks also to the people behind the Debian gnu/Linux distribution, which has provided a stable operating system that included all the above software. Finally, I wish to thank my family and friends for their support throughout my studies. Espoo, August 2009 Riku Saikkonen x xi Contents 1 INTRODUCTION 1 1.1 Bulk updates · p. 1 1.2 Cache sensitivity · p. 2 1.3 Adaptive sorting · p. 3 1.4 Organization · p. 4 2 BACKGROUND 5 2.1 Search trees · p. 5 2.2 avl trees · p. 6 2.3 Red-black trees · p. 7 2.4 Internal and external trees · p. 8 2.5 Height-valued trees · p. 10 2.6 Parent links · p. 11 2.7 Relaxed balancing · p. 11 2.8 Bulk updates · p. 12 2.9 Cache-conscious search trees · p. 13 2.10 Adaptive sorting · p. 15 2.11 Finger trees · p. 17 3 CACHE-SENSITIVE BINARY SEARCH TREES 19 3.1 Cache model · p. 19 3.2 Multi-level cache-sensitive layout · p. 21 3.3 Global relocation · p. 24 3.4 Local relocation · p.