Appendix a Sift Optimizations in Van Emde Boas-Based Heap

Nuno Filipe Monteiro Faria Locality optimizations on irregular algorithms and data structures Dissertação de Mestrado Engenharia Informática Trabalho efectuado sob a orientação do Professor Doutor João Luís Sobral Outubro de 2011 Acknowledgments This dissertation would not have been possible without the aid of my parents and my sister, which have always endured and supported me through my life. Second, I thank my advisor, Professor Jo~aoLu´ısSobral, for giving me the opportunity to join him and his team in research, for fostering my critical thinking, guiding me and sharing his experience and wisdom with me. I would also like to thank the professors I found along my master course for introducing me to the highly motivating theme of high performance computing, Professor Jo~aoLu´ısSobral, Professor Alberto Proen¸ca,Professor Rui Ralha and Professor AntónioPina. Last but not least, thank you to all my friends and lab colleagues in university, Roberto Ribeiro, Rui Silva, Rui Gon¸calves, Diogo Mendes, Nuno Silva, for all the brainstorm-ish discus- sions and opinions that allowed me to improve and push myself further - a few themes presented in this dissertation were discussed in some of those brainstorms. To my colleagues/room-mates through my graduate years with whom I shared unforgettable experiences, Nuno Silva, Pedro Silva, Emanuel Gon¸calves, AndréFélix,Samuel Moreira, Roberto Ribeiro and Pedro Miranda - I believe I have made friends for life. Nuno Filipe Monteiro Faria The project that served as ground for this dissertation was funded under the agreement of the protocol between FCT and UTAustin Parallel Programming Refinements for Irregular Applications (UTAustin/CA/0056/2008) ii Otimiza¸c~oesde localidade em algoritmos e estruturas de dados irregulares Resumo Estruturas de grafos baseadas em apontadores têmsido amplamente discutidas por váriasco- munidades cient´ıficas que tencionam usar este tipo de estruturas. Muitas destas áreastêm a necessidade de usar e implementar algoritmos complexos e sofisticados, como por exemplo, encontrar a árvore de expans~aode custo m´ınimode um grafo. Estes algoritmos têmum comportamento t´ıpicamente irregular no que toca aos padr~oesde acesso àmemória. Como tal, o objectivo éotimizar estas implementa¸c~oesde estruturas de grafos visando aspetos que melhorem a localidade dos acessos àmemória. O impacto da latênciade memóriatem sido continua- mente atenuado atravésdo uso de memórias cache nas arquitecturas de computadores atuais - as técnicasde otimiza¸c~aopara estruturas regulares (ex., matrizes) s~aobem conhecidas neste âmbito, contudo estruturas irregulares inserem-se numa classe de problemas para os quais ainda n~aoexistem consolidadas representa¸c~oeseficientes. Nesta disserta¸c~aoéefetuado um estudo sobre as optimiza¸c~oesposs´ıveis em algoritmos como métodos de ordena¸c~ao heap-sort: s~aoapresentadas uma implementa¸c~aopadr~aode um algoritmo heap-sort e uma vers~aoamigável da cache, originalmente apresentada por Emde Boas. Junta- mente com os problemas de esfor¸code memória,existem tambémos problemas de algoritmos intensivos em linguagens de alto n´ıvel. Frameworks orientadas a objectos s~aonormalmente baseadas em conceitos, linguagens e mecanismos abstratos de alto-n´ıvel, o que pode criar incon- sistênciasquando combinadas com aspetos habituais da computa¸c~aoavan¸cada(contiguidade de elementos em memória,localidade, eliminar a redundânciado código-fonte, etc.). Aspetos como a gest~aode objetos em memória(introduzidos por pol´ıticascomo type-erasure e auto-boxing) e conceitos abstratos relativamente ao encapsulamento (uso ineficiente de APIs, em conjunto com mecanismos de tipos genéricos)s~aoidentificados como sendo problemáticosna resolu¸c~aode sobrecarga e introdu¸c~aode refinamentos nas implementa¸c~oes. E´ notada uma clara incompatibil- idade entre aplica¸c~oesirregulares intensivas e metodologias object-oriented, nomeadamente em Java. Otimiza¸c~oese altera¸c~oesao código-fonte s~aopropostas em aplica¸c~oesque sofrem destas limita¸c~oesde desempenho, com o intuito de diminuir a sobrecarga que a abstra¸c~aointroduz nas implementa¸c~oes. E´ feito um conjunto de experiênciascom aplica¸c~oesintensivas ao ordernar elementos com o heap-sort e com o algoritmo sobre grafos para encontrar a árvore de expans~ao de custo m´ınimo,para que se possam introduzir otimiza¸c~oescomo novas distribui¸c~oesde dados nas estruturas para melhorar os padr~oesde acesso àmemória,mais amigáveis da cache. Padr~oes eficientes de acesso àmemóriaéo principal interesse tendo em considera¸c~aoaspetos sobre a localidade, quer seja utilizando primitivas eficientes de distribui¸c~oesde dados em memória,ou diminuindo a complexidade do código-fonte dos algoritmos e estruturas. As otimiza¸c~oespro- postas melhoram o comportamento de cache verificado em vers~oesiniciais, assim como o tempo de execu¸c~ao.Aspetos baixo-n´ıvel como misses nas caches L1 e L2 salientam o carácteramigável da cache das distribui¸c~oesde dados e as melhorias no númerode misses; os menores misses na TLB mostram as melhorias na complexidade das implementa¸c~oes. iii Locality optimizations on irregular algorithms and data structures Abstract Pointer-based graph structures has been discussed by the scientific community that aims to use such structures on several areas. Many of these areas have the need to implement complex and, sometimes, extremely sophisticated algorithms, like finding the minimum spanning tree of a graph. These algorithms are well known for being hard to efficiently execute on current multi- core machines due to irregular patterns of memory accesses. Thus, the objective is optimizing the implementation of graph structures aiming for better memory locality. Memory latency problems have been attenuated by using cache memories in nowadays computer architectures - the memory optimization techniques for regular data-structures (e.g., matrices) are well known, as for irregular data-structures insert themselves in areas where efficient representations are not yet well known. Optimization algorithms like sorting problems are studied through the use of heap-sort meth- ods: we present a standard implementation and an optimized version originally presented by van Emde Boas. Along with the problems of memory straining we refer also to the problems of intensive algorithms in modern high-level languages. Object-oriented frameworks usually operate with high-level concepts and abstract mechanisms that may not combine well with core features of high performance computing (element contiguity, locality, eliminating code redundancy, etc.). We identify aspects like inefficient object-memory management (introduced by type-erasure and auto-boxing) abstract concepts regarding encapsulation (inefficient API usage alongside with generic mechanisms) as being problematic issues in the resolution of overheads of data-intensive algorithms. A mismatch is clearly noticed between irregular data-intensive applications and object-oriented in Java. We propose a series of optimizations and changes to the source code of applications that suffer from bottlenecks related to these, in order to demean the setbacks imposed by abstractions. We perform tests on heap sorting and Prim's minimal spanning tree algorithm in order to introduce the improvements made in data layouts optimizing irregular memory accesses. Efficient memory access patterns are the main concern in this thesis and good cache locality on modern memory architectures, whether by using efficient sorting techniques or improving pointer-chasing complexity in algorithms and data structures, are the main goals. The optimizations proposed are able to decrease the cache misses of applications and, most im- portant, execution time. We analyse the low-level instruction counts in order to accurately show that instruction complexity decreases; L1 and L2 cache miss lower counts prove the efficiency of cache-friendly layouts and show the miss behaviour improvement; TLB low miss counts verify the improvement in address and memory management. iv Contents Acknowledgments ii Resumo iii Abstract iv List of Figures vii List of Tables viii 1 Introduction 9 1.1 Data intensive and Irregular applications ....................... 10 1.2 Locality of reference .................................. 11 1.3 Optimizing data layouts ................................ 12 1.3.1 Object-oriented and Memory management . 12 1.3.2 Parallelism motivations ............................ 13 1.4 Main goals, Scope and Contributions ......................... 13 1.5 Organization of the Dissertation ........................... 14 2 Background 15 2.1 Cache-aware and cache-oblivious ........................... 15 2.2 Memory models ..................................... 15 2.2.1 RAM model ................................... 16 2.2.2 External memory model ............................ 16 2.2.3 Hierarchical memory model .......................... 17 2.2.4 Ideal-cache model ............................... 17 2.2.5 Caches in multi-processor environments ................... 19 2.3 Cache-aware/oblivious sorting ............................. 19 2.3.1 Efficient heap implementations ........................ 19 2.3.2 Cache-efficient heap and priority-queues ................... 21 2.4 Locality optimizations ................................. 23 2.4.1 Algorithmic locality optimizations ...................... 23 2.4.2 Data layout locality optimizations ...................... 25 2.4.3 JVM level optimizations ............................ 27 2.5 Graphs

Appendix a Sift Optimizations in Van Emde Boas-Based Heap

An Alternative to Fibonacci Heaps with Worst Case Rather Than Amortized Time Bounds∗

Advanced Data Structures

Introduction to Algorithms and Data Structures

Lecture 6 — February 9, 2017 1 Wrapping up on Hashing

Data Structures and Programming Spring 2016, Final Exam

Data Structures Binary Trees and Their Balances Heaps Tries B-Trees

Lecture Notes of CSCI5610 Advanced Data Structures

18.415 Advanced Algorithms Contents

FIBONACCI HEAPS ‣ Preliminaries ‣ Insert ‣ Extract the Minimum

Fibonacci Heaps

Fibonacci Heap

Package 'Datastructures'