Technisch-Naturwissenschaftliche Fakultät

Register Allocation on the Intel® Itanium® Architecture

DISSERTATION

zur Erlangung des akademischen Grades

Doktor

im Doktoratsstudium der

TECHNISCHEN WISSENSCHAFTEN

Eingereicht von: Dipl.-Math. Gerolf Fritz Hoflehner

Angefertigt am: Institut für Systemsoftware

Beurteilung: o.Univ.-Prof. Dr. Dr.h.c. Hanspeter Mössenböck (Betreuung), a.Univ.-Prof. Dr. Andreas Krall

San Jose, März 2010

Abstract

Register allocators based on graph-coloring have been implemented in commercial and research compilers since the pioneering work of Gregory Chaitin and his colleagues in the early 1980s. A coloring allocator decides which live ranges (program variables or compiler-generated temporaries) are allocated machine registers (the "allocation problem"). The Itanium processor architecture supports predicated code, control- and data speculation, and a dynamic register stack. These features make the allocation problem more challenging. This thesis analyses and describes efficient extensions of a coloring allocator for the Itanium processor.

• Predicated code: The thesis describes compile-time efficient coloring methods in the presence of predicated instructions that do not compromise the run-time performance achieved by a more elaborate allocator based on the predicate query system, PQS. In particular it classifies predicated live ranges and shows that classical techniques can be used effectively to engineer efficient coloring allocators for predicated code. When predicated code is generated from compiler control flow, more expensive predicate analysis frameworks like PQS don't have to be employed.

• Speculated code: The thesis describes a new method for efficiently allocating speculated live ranges that avoids spill code generated by a more conservative method. In particular the NaT propagation problem is solved efficiently.

• Dynamic register stack: The thesis reviews methods to use the dynamic register stack effectively, in particular in regions with function calls and/or pipelined loops.

• Scalable allocation: A generic problem of coloring allocators is that they can be slow for large candidate sets. This thesis proposes the scalable allocator as a generic coloring method capable of effectively allocating programs with large register candidate sets. The methods can also be used for parallel allocation, e.g. on multi-core machines.

The experimental results on the CPU2006 benchmark suite illustrate the effectiveness of the new methods. Finally, the thesis reviews the development of coloring allocators since Chaitin.


Kurzfassung

Gregory Chaitin und seine Kollegen haben um 1980 die Pionierarbeit für Registerallokation basierend auf Graphfärbung („Farballokator") geleistet. Diese Allokatoren entscheiden, welchen „Lebensspannen" (d.h. deklarierten oder vom Compiler erzeugten Variablen, engl. live ranges) Maschinenregister („Farben") zugeteilt werden (Allokationsproblem). Der Itanium-Prozessor unterstützt Instruktionen mit Prädikaten (bedingt ausführbare Instruktionen), Kontroll- und Datenspekulation sowie einen dynamischen Register Stack. Diese Eigenschaften erschweren die Lösung des Allokationsproblems. Die vorliegende Dissertation untersucht und beschreibt effiziente Erweiterungen in einem Farballokator für den Itanium-Prozessor.

• Prädikate: Es werden effiziente Methoden (zur Übersetzerzeit) für Farballokatoren vorgestellt, die Lebensspannen in Code mit bedingt ausführbaren Instruktionen allokieren. Insbesondere werden „prädikatierte" Lebensspannen klassifiziert, und es wird gezeigt, dass klassische Methoden zu einem effizienten Farballokator für diese Lebensspannen erweitert werden können. In einem Compiler kann die Allokation mit diesen Methoden genauso effizienten Code generieren wie aufwendigere Verfahren, insbesondere Verfahren, die das „predicate query system" (PQS) benutzen.

• Spekulation: Es wird eine neue Methode erläutert, die - im Vergleich zu konservativen Verfahren - Spill Code für spekulative Lebensspannen vermeiden kann. Insbesondere wird eine effiziente Lösung für das NaT-Propagation-Problem vorgestellt.

• Dynamischer Register Stack: Es wird beschrieben, wie der dynamische Register Stack in Code mit Funktionsaufrufen oder „pipelined" Schleifen (engl. „software-pipelined loops") effizient verwendet werden kann.

• Skalierbare Allokation: Es wird der skalierbare Allokator für die Lösung von Allokationsproblemen beliebiger Größe vorgeschlagen. Skalierbare Allokation erlaubt insbesondere die Parallelisierung des Allokationsproblems und ist unabhängig von der Prozessor-Architektur.

Die experimentellen Resultate für die CPU2006 Benchmark Suite zeigen die Effizienz der vorgestellten Verfahren. Schließlich enthält diese Dissertation einen ausführlichen Überblick über die Forschungsergebnisse für Farballokatoren seit Chaitin.


Acknowledgements

The tree with our fruits is watered by many. First, this work would not have been possible without Intel and many of its excellent engineers. Roland Kenner implemented the original version of the graph-coloring allocator and PQS. The idea to eliminate interferences for portions of a data speculated live range (Section 7.2) had been implemented in the compiler before 1999, the year I started at Intel. Daniel Lavery was my exceptional engineering manager for seven years, taking a lot of interest in and continuously challenging my thoughts on register allocation. His questions and curiosity helped shape the section on predicate-aware register allocation. I had the good fortune to benefit from many discussions with, and from working alongside, an outstanding team and colleagues: Howard Chen, Darshan Desai, Kalyan Muthukumar, Robyn Sampson and Sebastian Winkel. Alban Douillet and Alex Settle helped bring to life the product version of the multiple alloc algorithms during their summer internships. The Intel Compiler Lab provided a great place to work and let me drive innovation amidst the challenges of tight product schedules. Second, I would like to thank Prof. Hanspeter Mössenböck for his patience and support. His kind personality, openness, and advice were inspiring. I cannot imagine how he could have been more supportive. Finally, and most importantly, I thank the woman in my life, Elisabeth Reinhold. Her push and loving support were always there when I needed it most.


Contents

Abstract
Kurzfassung
Acknowledgements
1 Introduction
1.1 Compilers and Optimizations
1.2 Register Allocation based on Graph-Coloring
1.3 Itanium Processor Family
1.4 Overview
2 Background on IA-64
2.1 IA-64 Instructions
2.2 Predication
2.3 Architected Registers
2.4 Register Stack Frame
2.5 Register Spilling and Backing Store
2.6 Speculation
2.6.1 Control Speculation
2.6.2 Data Speculation
2.6.3 Combined Control- and Data Speculation
3 Review of Graph-Coloring based Register Allocation
3.1 Foundations
3.1.1 Chaitin-style Register Allocation
3.1.2 Priority-based Register Allocation
3.2 Worst-case Time and Space Analysis
3.3 Developments
3.3.1 Spill Cost Reduction
3.3.2 Scoping
3.3.3 Coalescing
3.3.4 Extensions
3.4 Alternative Models
3.5 Theory
3.5.1 Definitions of Interference
3.5.2 Coloring Inequality
4 The Intel Itanium Compiler
5 Exploring the Register Stack
6 Register Allocation for Predicated Code
6.1 Impact of Predicated Code
6.2 Predicate Partition Graph (PPG) and Query System (PQS)
6.3 A Family of Predicate-Aware Register Allocators
7 Register Allocation for Speculated Code
7.1 Control Speculation
7.1.1 NaT Propagation and Spill Code
7.1.2 An Advanced NaT Propagation Algorithm
7.2 Data Speculation

8 Scalable Register Allocation
9 Related Work
10 Results
10.1 Dynamic and Static Results
10.2 Compile Time Data
10.2.1 Cost of Predication-Aware Allocation
10.2.2 Cost of Speculation-Aware Allocation
10.2.3 The Case for Scalable Register Allocation
11 Conclusions and Future Work
12 Appendix
12.1 Assembly Code Example
12.2 Edge Classification, Irreducibility and Disjointness
12.3 PQS Queries
13 List of Figures
14 List of Examples
15 List of Tables
16 List of Theorems
17 Glossary
18 References
Index


1 Introduction

1.1 Compilers and Optimizations

An optimizing compiler can be thought of as a three-stage process that transforms a source code program into a linkable object file for a target machine: in the first stage, the front-end translates a source language like C or Fortran into an intermediate representation. The second stage, the optimizer, applies a set of local and global transformations (=optimizations) with the goal of speeding up the run-time performance of the final executable. Local transformations work on a sequence of branch-free statements. Global transformations gather information about e.g. expressions and variables in the entire routine using dataflow analysis. Classical optimizations like partial-redundancy elimination, common sub-expression elimination, or loop-invariant code motion apply this information. Optimizers may also apply loop transformations like loop fusion. More aggressive optimizers gather interprocedural information and perform optimizations like procedure inlining. Finally, the third stage, the code generator (=back-end), translates the intermediate representation produced by the optimizer into object code. Typically, this final stage involves several phases, including register allocation.

1.2 Register Allocation based on Graph-Coloring

A register is the fastest memory location in a CPU. Each CPU has a limited set of registers. During program execution registers hold the values of program variables or compiler temporaries. In general an optimizer performs optimizations under the assumption that an infinite number of registers is available in the target machine. Thus optimizations are register pressure unaware, which simplifies their design. Pressure refers to the fact that machine register resources are limited. It is the job of the register allocator to map the symbolic registers in the intermediate representation of the compiler to actual registers of the target machine. A graph-coloring based register allocator abstracts the allocation problem to coloring an undirected interference graph with K

colors, which represent K machine registers. The nodes in the graph are the symbolic registers, and two nodes are connected by an edge when they cannot be assigned the same register. As a rule, the more symbolic registers can be allocated to machine registers, the faster the program executes.

1.3 Itanium Processor Family

The Itanium processor family - or IA-64 - is a commercially available implementation of the EPIC ("Explicitly Parallel Instruction Computing") computing paradigm. In EPIC the compiler has the job of extracting instruction level parallelism (ILP) and communicating it to the hardware. Itanium enhances concepts usually seen in VLIW processors. The long instruction words are fixed-size bundles that contain three instructions (operations). It is a 64-bit computer architecture distinguished by a fully predicated instruction set, a dynamic register stack, rotating registers and support for control- and data speculation. Predication and speculation allow the compiler to remove or break two instruction dependence barriers: branches and stores. With predicates the compiler can remove branches ("branch barrier removal"), with control speculation it can hoist load instructions across branches ("breaking the branch barrier"), and with data speculation it can hoist load instructions across stores ("breaking the store barrier"). Using predication and rotating registers the compiler can generate kernel-only pipelined loops. The dynamic register stack gives the compiler fine-grain control over stacked register usage. In general, exploiting instruction-level parallelism using Itanium features increases register pressure and poses new challenges for the register allocator.

1.4 Overview

This thesis describes extensions of graph-coloring based register allocation methods that exploit the distinguishing IA-64 features. The methods have been implemented in the Intel Itanium production C/C++/Fortran compiler and are hardware specific. Orthogonal to the Itanium specific methods is the scalable allocator, which is applicable to other architectures and/or compilation environments. It can address compile time problems for programs with large candidate sets, e.g. some generated programs in server applications, and demonstrates how allocation can be parallelized. The rest of the thesis is organized as follows: the first three chapters develop background. Chapter 2 gives an overview of the

IA-64 (micro-) architecture, emphasizing aspects relevant for register allocation. Chapter 3 reviews the register allocation literature since Chaitin's seminal work. Chapter 4 takes a look at the major code generator phases in the Intel Itanium compiler. The following three chapters cover IA-64 specific allocation techniques: Chapter 5 discusses register stack allocation. Chapter 6 gives an in-depth discussion of predicate-aware register allocation. Chapter 7 describes allocation for control- and data-speculated code. Chapter 8 proposes scalable register allocation, which addresses compile time issues for large register candidate sets and is a method for parallelizing coloring allocators. This chapter is general and not specific to Itanium. Chapter 9 discusses related work with respect to the core contributions of this thesis. Chapter 10 presents implementation and experimental results. Chapter 11 concludes the thesis and lists future work. The Appendix contains an Itanium assembly code example, reviews the concept of irreducible control flow graphs, and lists code for the PQS query functions.


2 Background on IA-64

Fundamental to any computer architecture is the instruction set architecture (ISA). The first section gives a high-level survey of Itanium instructions. The goal is to provide sufficient background knowledge for reading basic Itanium assembly code. The remainder of this chapter focuses on the aspects of Itanium that are relevant for register allocation, including register files, register classes, the register stack, and predicated and speculated code. In general, the sections cover material at the architecture level. The term architecture specifies how to write or compile semantically correct Itanium programs. Only the discussion of the register stack will involve a (high-level) description of the micro-architecture. The micro-architecture specifies how the Itanium processor actually implements an architectural feature. Knowledge of the micro-architecture enables the compiler (or assembly programmer) to generate faster programs. However, micro-architecture details may change from one processor generation to the next. In this case, programs relying on micro-architectural details must be re-compiled (or re-tuned) for the newer generation to achieve the best possible run-time performance. The basic references for the background material are the Intel manuals [42] [43] [44]. Winkel [76] has an overview of the IA-64 ISA and the instruction dispersal rules, which we don't discuss.

2.1 IA-64 Instructions

IA-64 instructions are grouped into bundles. A bundle is a simple 128-bit structure that contains three 41-bit instruction fields ("slots") and a 5-bit template that describes the execution unit resource each instruction requires. The template bits can also specify the location of a stop bit, which delimits instructions that can execute in parallel. In general instructions can execute in parallel as long as there are no read-after-write (RAW) or write-after-write (WAW) dependencies between instruction operands. A set of instructions that could execute in parallel is an instruction group. Two instructions with a write-after-read (WAR) dependence can be contained in the same instruction group. Per cycle Itanium can execute up to six instructions or two bundles in parallel, although instruction groups may contain any number of instructions. The task of extracting instruction

level parallelism, forming instruction groups and packing instructions into bundles lies with the compiler. Instructions between two stop bits are in the same instruction group.

Instruction:

    [(qp)] mnemonic[.compl]* dest=src

Bundle (128 bits):

    127         87 86          46 45           5 4        0
    +-------------+--------------+--------------+----------+
    |   slot 2    |    slot 1    |    slot 0    | template |
    +-------------+--------------+--------------+----------+

Instruction Group:

    ... ;; MFI MFI MLX ;; ...

Figure 1 IA-64 Programming Building Blocks

Figure 1 illustrates the Itanium instruction format, bundles and instruction groups. "qp" is the qualifying predicate. It is encoded as a predicate register (see 2.3). It guards whether the instruction result is committed ("retired") or not. An instruction is retired only when the qualifying predicate is set (=True). The mnemonic is a unique identifier ("opcode") for an IA-64 instruction. Compl is a set of modifiers ("completers") of the basic mnemonic functionality. Completers are optional: an instruction may have no completer, one completer, or several. Dest is a comma-separated list of output registers or a store address. Src is a comma-separated list of input registers, constants, or a load address. Specific examples of Itanium instructions are given below. Itanium instructions are grouped into types (Table 1). A type is a qualifier that suggests on which functional unit an instruction can execute. There are:

• Six instruction types: M-type, I-type, A-type, F-type, B-type, and LX-type.
• Four types of execution units: M-unit, I-unit, F-unit, and B-unit.
• 12 basic bundle template types: MII, MI_I, MLX, MMI, M_MI, MFI, MMF, MIB, MBB, BBB, MMB, and MFB. A "_" in the bundle template type indicates a stop bit and is encoded in the template bits. There are only two template types that allow a stop in the middle of a bundle: M_MI has a stop bit after the first

instruction, and MI_I has a stop bit after the second instruction. A stop bit separates instruction groups. Therefore in M_MI the first M-type instruction is the last instruction of an instruction group, while the second M-type together with the I-type instruction is in a different instruction group. Similarly, in MI_I the M-type and the first I-type instruction are in the same instruction group, while the second I-type instruction is the first instruction of a new instruction group.

With this information about instruction types, the hypothetical instruction group MFI MFI MLX in Figure 1 consists of eight instructions that could execute in parallel: three M-type, two F-type and two I-type instructions, as well as one LX-type instruction.

Table 1 IA-64 Instruction Types and Execution Unit Types

Instruction Type   Description                      Execution Unit Type
A-type             Integer ALU instructions         Executes on I-unit or M-unit
I-type             Integer non-ALU instructions     Executes on I-unit
M-type             Memory instructions              Executes on M-unit
F-type             Floating-point instructions      Executes on F-unit
B-type             Branch instructions              Executes on B-unit
LX-type            Extended (2-slot) instructions   Executes on I-unit

A-type instructions can execute in a memory (M) unit or an integer (I) unit. There is one LX-type instruction ("movl"). It consumes two bundle slots to move a 64-bit constant into a register and executes on an I-unit. The Itanium-2 processor has four M [M0, M1, M2, M3], two F [F0, F1], two I [I0, I1], and three B [B0, B1, B2] execution units, which is sufficient to sustain a throughput of six instructions per cycle. Each of the 12 basic bundle templates may have a stop bit set after the third instruction, so Itanium supports 24 bundle template types. A future generation Itanium processor may define up to four new basic bundle types. This is determined by the 5 bits in the template bit field. The execution order of the instructions in a bundle is in-order, proceeding from slot 0 to slot 2.
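To make the bundle layout from Figure 1 concrete, the following C sketch extracts the template and slot fields from a 128-bit bundle. It is an illustrative decoder written against the layout shown above (bits 4:0 template, 45:5 slot 0, 86:46 slot 1, 127:87 slot 2), not code from the Intel compiler or any Itanium tool.

    #include <stdint.h>
    #include <stdio.h>

    typedef struct { uint64_t lo, hi; } Bundle;  /* lo = bits 63:0, hi = bits 127:64 */

    /* Extract len (< 64) bits starting at bit pos of the 128-bit bundle. */
    static uint64_t bits(Bundle b, int pos, int len) {
        uint64_t v;
        if (pos >= 64)            v = b.hi >> (pos - 64);
        else if (pos + len <= 64) v = b.lo >> pos;
        else                      v = (b.lo >> pos) | (b.hi << (64 - pos));
        return v & (~0ULL >> (64 - len));
    }

    int main(void) {
        Bundle b = { 0x1122334455667708ULL, 0x99aabbccddeeff00ULL };  /* arbitrary bits */
        printf("template: %llx\n", (unsigned long long)bits(b, 0, 5));
        printf("slot 0:   %llx\n", (unsigned long long)bits(b, 5, 41));
        printf("slot 1:   %llx\n", (unsigned long long)bits(b, 46, 41));
        printf("slot 2:   %llx\n", (unsigned long long)bits(b, 87, 41));
        return 0;
    }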

The remainder of this section gives specific examples of Itanium instructions. This will be sufficient to read small Itanium assembly programs like the example in Appendix 12.1. In all tables the instruction semantics is given in C-style pseudo-code or

informally. Subscripted letters (ri, pi, fi) represent general, predicate and floating-point registers, respectively. Table 2 has examples of A-type instructions. They include arithmetical operations like integer "add", logical operations like "and complement" and "or", and compare ("cmp") instructions, where crel denotes a comparison relation. Examples of comparison relations are 'eq' (="equal"), 'ne' (="not equal"), or 'gt' (="greater than").

Table 2 A-type Instructions

NR  Type    Syntax                          Semantics
1   A-type  (qp) add r1=r2,r3               r1=r2+r3
2   A-type  (qp) andcm r1=r2,r3             r1=r2&~r3
3   A-type  (qp) or r1=r2,r3                r1=r2|r3
4   A-type  (qp) cmp.crel p1,p2=r1,r2       qp=1 and r1 crel r2:    p1=1, p2=0
                                            qp=1 and !(r1 crel r2): p1=0, p2=1
5   A-type  (qp) cmp.crel.unc p1,p2=r1,r2   qp=0:                   p1=0, p2=0
                                            qp=1 and r1 crel r2:    p1=1, p2=0
                                            qp=1 and !(r1 crel r2): p1=0, p2=1
6   A-type  (qp) cmp.crel.or p1,p2=r1,r2    qp=1 and r1 crel r2:    p1=1, p2=1
7   A-type  (qp) cmp.crel.and p1,p2=r1,r2   qp=1 and !(r1 crel r2): p1=0, p2=0

Instruction 5 in Table 2 is an unconditional compare instruction. This instruction has two completers: crel for the actual comparison relation, and 'unc' to indicate that the compare is unconditional. This means that the two destination predicate registers, p1 and p2, are initialized to zero even when the qualifying predicate is clear (=False). Unconditional compares are used to initialize predicates in if-converted code (Section 2.2). Remarkable also are parallel compares, like instructions 6 and 7: they set or clear both destination predicates when the qualifying predicate is set, depending on the compare type and result. Parallel compares are used to evaluate logical 'or' and 'and' expressions in parallel (within the same bundle or instruction group).

I-type instructions (Table 3) include bit manipulations like deposit ("dep", instruction 8), extract ("extr", instruction 9), arithmetical shifts ("shr", instruction 12) and zero extension ("zxt", instruction 13). Table 3 also shows the 64-bit move instruction, which has LX-type, consumes two slots in a bundle and executes on an I-unit.

Table 3 I-type and LX-type Instructions

NR  Type     Syntax                      Semantics
8   I-type   (qp) dep r1=r2,r3,pos,len   Deposit bit field: merge the bit field of length len starting at bit 0 of r2 into r3 at bit position pos. The result of the merge is placed in r1.
9   I-type   (qp) extr r1=r3,pos,len     Extract the bit field of length len starting at bit pos in r3, sign-extend it and store it right-justified in r1.
10  I-type   (qp) chk.s r1,target        Control speculation check: branch to target when the NaT bit is set in register r1.
11  I-type   (qp) mov r1=pr              Read the predicate registers and store them in r1.
12  I-type   (qp) shr r1=r2,r3           Arithmetic shift right: r1=r2>>r3
13  I-type   (qp) zxt4 r1=r3             Zero-extend the value of r3 above bit 31 and store it in r1.
14  LX-type  (qp) mov r1=imm_64          Move the 64-bit immediate value imm_64 into r1.

M-type instructions (Table 4) include loads, stores and the alloc instruction, which manages the register stack. They also contain transfer instructions between the floating-point and integer units. A getf instruction is used to move data from a floating-point register to an integer register, while a setf instruction transfers the value of an integer

register into the 64-bit significand of a floating-point register. These transfers are necessary since integer multiply and divide must be performed in a floating-point unit.

Table 4 M-type Instructions

NR  Type    Syntax                     Semantics
15  M-type  (qp) ld8 r1=[r3]           Load 8 bytes into r1 from the address in r3.
16  M-type  (qp) ld4.s r1=[r3]         Control speculated 4-byte load.
17  M-type  (qp) ld2.a r1=[r3]         Data speculated 2-byte load.
18  M-type  (qp) ld1.sa r1=[r3]        Control and data speculated 1-byte load.
19  M-type  (qp) st8 [r3]=r1           Store the 8-byte content of r1 at the address in r3.
20  M-type  (qp) getf.sig r1=f2        Store the 64-bit significand of f2 in r1.
21  M-type  (qp) setf.sig f1=r2        Store the value of r2 in the significand of f1.
22  M-type  alloc r1=ar.pfs,i,l,o,r    See Figure 5.

F-type instructions (Table 5) are floating-point instructions. Itanium supports single, double and extended (82-bit) floating-point operations. A fundamental building block of all floating-point operations is the 'floating-point multiply and add' instruction, fma.

Table 5 F-type Instructions

NR  Type    Syntax                   Semantics
23  F-type  (qp) fma.s f1=f3,f4,f2   f1=f3*f4+f2, rounded to single precision ('s' completer).
24  F-type  (qp) xma.l f1=f3,f4,f2   f1=f3*f4+f2 as a 64-bit integer operation. The low ('l' completer) 64 bits of the result are stored in f1.
25  F-type  (qp) fadd.d f1=f2,f3     f1=f2+f3, rounded to double precision ('d' completer).

The fma instruction computes the product of f3 and f4, adds f2 in infinite precision, and rounds the final result to the format specified in the completer. The xma instruction computes f3*f4+f2, where the FP registers are interpreted as 64-bit integers. The intermediate value of the product is 128 bits wide. This instruction is used to perform integer multiply.

Table 6 B-type Instructions

NR  Type    Syntax                          Semantics
26  B-type  (qp) br.cond.sptk.many target   qp=1: IP-relative conditional branch. The condition is encoded in qp, which is set in a separate cmp instruction.
27  B-type  (qp) br.call foo                qp=1: invoke procedure foo.
28  B-type  (qp) br.cloop target            qp=1 and loop count LC is not zero: decrement LC and branch to target.
29  B-type  (qp) br.ctop target             In software pipelined loops: branch to target if qp=1 and (LC != 0 or EC > 1), where LC is the loop count and EC the epilog count. In this context the epilog count EC is the number of iterations that must finish before the pipelined loop can exit.
30  B-type  (qp) br.ret                     qp=1: procedure return.
31  B-type  clrrrb                          Clear the register rename base registers.

B-type instructions (Table 6) include IP-relative branches, calls, returns, loop branches and clrrrb, which resets the register rename bases for software-pipelined loops. All branches can be conditional when a qualifying predicate is specified. They may have 'whether' completers like sptk, which the compiler issues when it statically predicts that a branch is taken. The compiler can also specify whether the processor should fetch "few"

or "many" bundles at the target address. The values for "few" and "many" are micro-architecture dependent.

2.2 Predication

Predication is the conditional execution of an instruction guarded by a qualifying predicate. On IA-64 the qualifying predicate is a binary ("predicate") register that holds a value of 1 (=True) or 0 (=False). The (qualifying) predicate register is encoded in the instruction. When its value is 1 at run-time, the predicate is set. When the value is 0 at run-time, the predicate is clear. When the value of the predicate register is set, the instruction executes, potential exceptions get raised, and results are committed to the architectural state. When the value of the predicate register is clear, the instruction still executes, but no exception is raised and results are not committed. This means that the instruction "flows" through the instruction pipeline and gets discarded only in the last pipeline stage, the write-back (WRB) stage, when the qualifying predicate of the instruction is clear. The default qualifying predicate is the constant predicate register p0, which is always set. An unpredicated instruction, e.g. on a classical RISC architecture, can be considered a special case of a predicated instruction that has its qualifying predicate always set. On IA-64 there are 64 predicate registers. Therefore the encoding of the qualifying predicate consumes 6 bits, which is one of the reasons why operations (instructions) are 41 bits wide. Another reason for the odd number of bits per instruction is the large register file (see Section 2.3) with 128 general and 128 floating-point registers. The predicate registers are written by compare instructions. A compare instruction has two target predicate registers which - in the case of conditional or unconditional compares - represent a true and a false condition value at run-time. The typical consumer of a predicate register is a branch instruction. On IA-64 conditional branches are predicated branches. When the qualifying predicate of a branch is set at run-time, the branch is taken. When the qualifying predicate of the branch is clear, the branch is not taken and the instruction following (statically) the branch gets executed. On IA-64 almost all instructions are predicated. This means they may contain a qualifying predicate that is either set (=true) or clear (=false) at run-time.
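The commit rule just described can be summarized in a few lines of C. This is a minimal, hypothetical simulator sketch (the types and the single modeled instruction are illustrative assumptions, not hardware or any real simulator API): the instruction always flows down the pipeline, and the qualifying predicate only decides in the write-back stage whether the result is committed.

    #include <stdint.h>

    typedef struct { int qp; int dst, src1, src2; } Instr;  /* e.g. (qp) add dst=src1,src2 */
    typedef struct { int pr[64]; int64_t gr[128]; } State;  /* pr[0] is hard-wired to 1 */

    void write_back(const Instr *i, State *s) {
        int64_t result = s->gr[i->src1] + s->gr[i->src2];   /* executed regardless of qp */
        if (s->pr[i->qp])            /* predicate set: commit result, raise exceptions */
            s->gr[i->dst] = result;
        /* predicate clear: the result is silently discarded in the WRB stage */
    }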

As an (almost) fully predicated architecture, IA-64 supports if-conversion. If-conversion is a compiler optimization that eliminates conditional forward branches, and their associated branch mis-prediction penalty, if a branch is hard to predict for the hardware branch predictor. The branch mis-prediction penalty is the re-steer cost incurred when the hardware branch predictor predicts a branch direction incorrectly. The instructions dependent on the branch are predicated up to a merge point in the original control flow graph. This eliminates the conditional branches and converts control dependencies into data dependencies [2]. As a result it transforms a control flow region into a linear ("predicated") code region. The paths in the control flow graph become execution traces in the predicated code. In the predicated region all paths of the original control flow region overlap. Therefore a processor supporting if-conversion must have sufficient resources to potentially execute any of N program paths, although at any given point in time only one actually executes. If-conversion is illustrated in Figure 2. It shows a simple if-then-else structure ("hammock"), the unpredicated code with branches that a compiler generates without if-conversion, and the if-converted code. There are two paths in the original control flow graph and correspondingly two execution traces in the if-converted code (one trace contains the cmp and (p2)V1=..., the other the cmp and (p3)V2=...). B4 in the original control flow serves as the merge point. All branches have been eliminated in the if-converted code. In case the conditional branch is mis-predicted in the unpredicated code, the if-converted code is more efficient.

Control Flow Graph:

    B1 -> B2, B1 -> B3, B2 -> B4, B3 -> B4   (if-then-else "hammock")

Unpredicated Code:

    B1: cmp p3,p0=...
        (p3) br.cond B3
    B2: V1=...
        br B4
    B3: V2=...
    B4: ...

Predicated Code:

        cmp p3,p2=...
        (p2) V1=...
        (p3) V2=...
        ...

Figure 2 Unpredicated and If-converted Hammock


The notion that if-conversion eliminates branch mis-predictions is correct for "closed" regions like hammocks, where all branches can be eliminated. Theoretically it is possible that if-conversion transfers, but does not eliminate, a branch mis-prediction. This scenario can happen when a branch remains in the if-converted region. Figure 3 illustrates the case. The predicated region again is a hammock with merge point B4, but it has an "exit" branch to a block B5 outside the region. In this case a conditional branch (instruction 5 in the predicated code) remains in the region. It could happen that in the unpredicated code the first conditional branch (instruction 2) was mis-predicted, but that after if-conversion the remaining conditional branch is mis-predicted, so if-conversion is not guaranteed to be effective in this case. In practice we have not observed this scenario, but the notion that if-conversion eliminates branch mis-predicts seems valid only for "closed" regions, where if-conversion can eliminate all branches.

Control Flow Graph:

    B1 -> B2, B1 -> B3, B2 -> B4, B3 -> B4, B3 -> B5   (hammock with exit branch)

Unpredicated Code:

    B1: 1: cmp p3,p0=...
        2: (p3) br.cond B3
    B2: 3: V1=...
        4: br B4
    B3: 5: V2=...
        6: cmp p5,p0=...
        7: (p5) br.cond B5
    B4: ...

Predicated Code:

    1: cmp p3,p2=...
    2: (p2) V1=...
    3: (p3) V2=...
    4: (p3) cmp p5,p0=...
    5: (p5) br.cond B5
    B4: ...

Figure 3 If-converted Region with Exit Branch

Like any compiler optimization, if-conversion must consider trade-offs: the costs of if-conversion include code size increase (both static and dynamic) and execution time increase for execution paths in the if-converted region. The potential benefits, which include the elimination of branch mis-predicts and a potential code size decrease, must outweigh the costs. If-conversion also competes with other optimizations like speculation for resources, e.g. instruction slots. It is conceivable that if-conversion may enable or disable control- or data speculation by consuming hardware resources. From these considerations it is clear that the heuristics that govern if-conversion cannot be simple. In general the compiler has no perfect knowledge of branch mis-predicts, execution counts or the dynamic interaction of optimizations. The Intel compiler employs an if-conversion Oracle that carefully evaluates the benefits of if-conversion for a given region and decides to if-convert only when the Oracle suggests that the estimated average execution time of the predicated code is better than that of the original (unpredicated) code. Another benefit of if-conversion is that it enables the compiler to remove control flow in innermost loops for kernel-only software pipelining (Rau [65]). The problem of register allocation for predicated code will be discussed in Chapter 6.
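The following C fragment sketches the kind of expected-execution-time comparison such an Oracle might make for a hammock. The cost model and all names are illustrative assumptions for exposition; they are not the Intel compiler's actual heuristics.

    #include <stdbool.h>

    typedef struct {
        double prob_then;          /* estimated probability of the then-path */
        double cycles_then;        /* estimated path lengths, in cycles */
        double cycles_else;
        double cycles_predicated;  /* both paths overlapped in one predicated region */
        double mispredict_rate;    /* estimated branch mis-prediction rate */
        double mispredict_penalty; /* re-steer cost in cycles */
    } Hammock;

    /* If-convert when the predicated region is expected to be faster on average. */
    bool should_if_convert(const Hammock *h) {
        double branchy = h->prob_then * h->cycles_then
                       + (1.0 - h->prob_then) * h->cycles_else
                       + h->mispredict_rate * h->mispredict_penalty;
        return h->cycles_predicated < branchy;  /* predicated code pays for both paths */
    }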

2.3 Architected Registers

The Itanium processor has a large number of architected registers supporting predicated instructions, control- and data speculation, register rotation and a dynamic register stack. Relevant for register allocation are the 128 general (integer) registers r0-r127, the 128 floating-point registers f0-f127, the 64 predicate registers p0-p63 and the 8 branch registers b0-b7 (Figure 4). The register types are partitioned into preserved and scratch registers by the ABI [44]. The content of a preserved (=callee saved) register may not be destroyed by a callee (=a called function). If the callee is using a preserved register, it must restore the original value before returning. The content of a scratch (=caller saved) register can be destroyed by a callee. Itanium has four registers representing constants: integer value zero in r0, floating-point value 0.0 in f0, floating-point value 1.0 in f1 and predicate value 1 in p0. Special integer registers are reserved for data access (r1 is the global pointer to access global data, r12 is the stack pointer and r13 the thread pointer) and the return address (b0). The floating-point and predicate register files contain the rotating registers f32-f127 and p16-p63, respectively. Unique to Itanium are the 96 stacked integer registers, r32-r127, which are controlled by a special processor unit, the Register Stack Engine (RSE). Rotating registers and the register stack necessitate the distinction between architectural and physical registers. For example, architectural register r32 can be any

physical register from r32 up to the number of stacked registers implemented by the micro-architecture. Note that some but not all stacked registers may rotate. The actual number of rotating stacked registers is specified by the alloc instruction. Sections 2.4 and 2.5 cover the details about stacked and rotating registers.

General Registers:

    r0:       0 (constant)
    r1:       global pointer
    r2-r3:    scratch
    r4-r7:    preserved
    r8-r11:   return values
    r12:      stack pointer
    r13:      thread pointer
    r14-r31:  scratch
    (r0-r31: static registers)
    r32-r127: stacked registers, variable (preserved or scratch), possibly
              rotating; incoming and outgoing parameter registers

Floating-Point Registers:

    f0:       0.0 (constant)
    f1:       1.0 (constant)
    f2-f5:    preserved
    f6-f7:    scratch
    f8-f15:   parameter
    f16-f31:  preserved
    (f0-f31: static registers)
    f32-f127: scratch and rotating

Predicate Registers:

    p0:       1 (constant: True)
    p1-p5:    preserved
    p6-p15:   scratch
    (p0-p15: static registers)
    p16-p63:  preserved and rotating

Branch Registers:

    b0:       return address
    b1-b5:    preserved
    b6-b7:    scratch

Figure 4 Register Files and Partitions

2.4 Register Stack Frame

On Itanium, each procedure has its own variable-size register stack (Figure 5) with its own variable number of rotating registers. Any stacked register can be rotating, but the number of rotating registers must fit within the register stack and be a multiple of 8 (0, 8, 16 and so on registers can be rotating), starting at r32. At the bottom of the register stack (starting at r32) are up to 8 incoming argument registers. At the top of the stack are up to

8 outgoing parameter registers. The outgoing parameter registers are scratch and become the incoming arguments in the register stack of the callee. The number of incoming argument and outgoing parameter registers is dictated by the Itanium ABI [44]. The alloc instruction (Figure 5), which is an example of an instruction that cannot be predicated, specifies the register stack frame of a procedure: the number of incoming parameters (in), the number of local registers (loc) and the number of outgoing parameters (out). The total number of registers in a register stack is in+loc+out <= 96. Usually a register stack has up to 8 incoming argument registers and up to 8 outgoing parameter registers. Parameters that do not fit into the out section of the register stack are passed on the memory stack following the rules of the Itanium ABI [44]. The number of local registers is determined by the register allocator. The number of rotating stacked registers must be specified in the last parameter (rot) of the alloc instruction in routines that contain software pipelined loops. The application register ar.pfs contains fields that describe the register stack of the caller ("previous function state") and is saved into the destination register of the alloc instruction for register stack unwinding.

    alloc r1=ar.pfs,in,loc,out,rot

        +-------+
        |  out  |  scratch
        +-------+
        |  loc  |  preserved
        +-------+
        |  in   |
        +-------+

    sof (size of frame)  = in + loc + out
    sol (size of locals) = in + loc
    sor (size of rotating registers) must be a multiple of 8,
        e.g. 8, 16, 24, ...; sor cannot be larger than sof.

Figure 5 Alloc Instruction and Register Stack Frame

Figure 6 shows assembly snippets of a function foo() calling a function bar(), and snapshots of the register stacks at three points in time: after the allocation of 90 stacked registers in foo (1:), after the additional allocation of 50 stacked registers in bar (2:), and

after the return from bar (3:). Note that, to simplify matters, the alloc instructions in Figure 6 don't specify rotating registers, since the fact that some stacked registers can be rotating is irrelevant in this context. Combined, foo() and bar() use 140 stacked registers. Since more than 96 stacked registers are used on the call stack, the processor recognizes a stacked register overflow at the call of bar() and saves 50 registers from the register file to memory so that the register stack frame of bar() can reside in the register file. The memory that contains the saved registers is called the "backing store" and is managed by the operating system. Similarly, since registers allocated by foo have been saved and overwritten by operations in the callee, at the return from bar() the processor recognizes a stacked register underflow and restores registers from the backing store to the register file. The saves and restores of stacked registers are transparent to the program and controlled by the RSE. More details about the backing store are in Section 2.5.

    foo():
            alloc rx=0,90,0,0
      1:    br.call bar;;
            alloc rz=0,90,0,0
      3:    ...

    bar():
            alloc ry=0,50,0,0
      2:    ...
            br.ret;;

    Total stacked registers over time: 90 at point 1, 140 (90 in foo plus
    50 in bar) at point 2, and 90 again at point 3.

Figure 6 Snapshots of Stacked Register Usage

The values of the procedure frame parameters are maintained in the Current Frame Marker (CFM). When a call is executed, the content of the CFM is copied to the Previous Frame Marker (PFM) field of the Previous Function State application register (ar.pfs). The caller's output area becomes the callee's register stack frame. The size of the local area is zero, and the initial size of the frame, which at this point consists of the input area only, is the size of the caller's output

area. The stacked registers are renamed such that r32 becomes the first register on the stack. The alloc instruction creates the register stack frame for the callee. The input section of the local area in the new frame matches the output area of the caller's frame. In other words, the input registers in the callee's frame are the renamed registers of the caller's output area. This effectively passes the caller's register parameters to the callee. When the return executes, the CFM field is restored from the PFM field and the original register stack frame of the caller is re-instantiated. Figure 7 shows an example with a register stack frame of size 21 with 7 outgoing registers (r46-r52). After the execution of the call, the register stack frame of the callee consists of the 7 incoming registers: registers r46-r52 have been renamed to r32-r38. The new register stack frame has been recorded in the sol ("size of locals") and sof ("size of frame") fields of the CFM, where sol is 0 and sof is 7.

                         call              alloc             return
    Procedure A   --------->   callee   --------->  callee   --------->   Procedure A
    r32-r52                    r32-r38              r32-r50               r32-r52
    (out: r46-r52)             (in: r32-r38,        (loc: r32-r47,        (out: r46-r52)
                               renamed from         out: r48-r50)
                               r46-r52)

    CFM (sol, sof):   14, 21       0, 7                16, 19                14, 21
    PFM (sol, sof):    ?, ?       14, 21               14, 21                14, 21

Figure 7 Register Stack - Growth and Shrinkage
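The frame marker transitions in Figure 7 can be expressed as a small state machine. The following C sketch is a simplified model of the sol/sof bookkeeping only (no RSE, no rotation, and the structure names are invented for illustration); the assertions replay the numbers from the figure.

    #include <assert.h>

    typedef struct { int sol, sof; } FrameMarker;
    typedef struct { FrameMarker cfm, pfm; } FrameState;

    void do_call(FrameState *s) {              /* caller's out area becomes the new frame */
        s->pfm = s->cfm;
        s->cfm.sof -= s->cfm.sol;              /* size of the caller's out area */
        s->cfm.sol = 0;
    }
    void do_alloc(FrameState *s, int in, int loc, int out) {
        assert(in + loc + out <= 96);
        s->cfm.sol = in + loc;
        s->cfm.sof = in + loc + out;
    }
    void do_return(FrameState *s) { s->cfm = s->pfm; }

    int main(void) {
        FrameState s = { {14, 21}, {0, 0} };   /* procedure A: sol=14, sof=21 */
        do_call(&s);            assert(s.cfm.sol == 0  && s.cfm.sof == 7);
        do_alloc(&s, 7, 9, 3);  /* hypothetical callee frame: sol=16, sof=19 */
                                assert(s.cfm.sol == 16 && s.cfm.sof == 19);
        do_return(&s);          assert(s.cfm.sol == 14 && s.cfm.sof == 21);
        return 0;
    }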

2.5 Register Spilling and Backing Store

The register stack engine (RSE) manages the 96 stacked registers of the physical register file as a circular buffer. The stacked partition of the register file is divided into three parts: mapped registers belong to some stack frame of a procedure on the call stack, unmapped registers do not belong to any frame, and active registers (which can be viewed as special mapped registers) form the frame of the running ("active") procedure. Overflows occur when a new frame is allocated and overlaps with mapped registers. In this case, the RSE makes room for the new frame by spilling the overlapped mapped registers to memory, the backing store. Each process has its own backing store. Underflows occur at procedure returns when unmapped registers of the caller must be filled from the backing store. The RSE manages the register file and the backing store with a set of internal pointers. Figure 8 shows a snapshot with three frames in the register file.

[Figure: the stacked partition r32-r127 of the physical register file holds, from bottom to top, mapped registers of older frames, the active frame (SOF registers, with BOF pointing to its r32), unmapped registers, and further mapped registers in the circular buffer; StoreReg marks the mapped register that is spilled next. On the backing store side, BSP holds the address where the active frame's r32 will be written, and BSPSTORE the address of the next RSE store.]

Figure 8 Register File, Frames and Backing Store

In the figure above, BOF ("bottom of frame") points to r32 of the active frame. Should the active frame get saved, its r32 would be saved to the backing store address held in the BSP ("backing store pointer"). StoreReg points to the mapped register that is saved to address BSPSTORE in case of an overflow. In case of an underflow, registers are restored starting from address BSPSTORE-8. The actual RSE actions depend on whether the call stack, and thus the register stack, will grow or shrink in the future. It is clear that the RSE register saves and restores ("RSE traffic") of an application increase proportionally with the size of the register frames and the depth of the call stacks.
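The overflow/underflow rule can be illustrated with a deliberately crude C model. The counters and function names below are assumptions made for this sketch; the real RSE operates lazily and asynchronously on a circular buffer, which this model ignores.

    enum { STACKED = 96 };   /* physical stacked registers r32-r127 */

    typedef struct {
        int mapped;          /* stacked registers currently held in the register file */
        int spilled;         /* stacked registers currently saved on the backing store */
    } Rse;

    void rse_call(Rse *r, int callee_sof) {
        while (r->mapped + callee_sof > STACKED) { /* overflow: make room */
            r->mapped--;                           /* spill one mapped register */
            r->spilled++;                          /* ... to the backing store */
        }
        r->mapped += callee_sof;
    }

    void rse_return(Rse *r, int callee_sof, int caller_sof) {
        r->mapped -= callee_sof;                   /* discard the callee's frame */
        while (r->mapped < caller_sof) {           /* underflow: caller partly spilled */
            r->mapped++;                           /* fill one register */
            r->spilled--;                          /* ... from the backing store */
        }
    }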

2.6 Speculation Speculation means early execution of an operation. At this early point it is unknown if the result of the operation is needed. Itanium supports two major explicit forms of speculation: control- and data speculation. Both types of speculation are non-exclusive and can coexist.

2.6.1 Control Speculation

Control speculation (breaking the branch barrier) is an optimization that hoists a chain of instructions starting at a load above one or more controlling branches. Instructions can be divided into two classes: speculative and non-speculative instructions. Speculative instructions, which defer exceptions, may execute prematurely. In general, all arithmetical instructions, which write results to general or floating-point registers, are speculative. Ordinary loads are non-speculative instructions: they raise exceptions when they occur and therefore cannot be speculated. For this reason Itanium provides speculative loads (ld.s, ldf.s, ldfp.s) in addition to non-speculative loads (ld, ldf, ldfp). A failed speculative load (e.g. due to a page fault) causes a deferred exception token to be written into the destination register of the speculated load. For general registers, the token is an extra bit (NaT) per register. Thus the general registers are 65 bits wide. When the NaT ("Not a Thing") bit is set (value 1), the register contains a deferred exception token. Otherwise the NaT bit is clear (value 0). A floating-point register contains a deferred exception token when it holds a special zero encoding, the NaTVal. The speculative loads are the only producers of deferred exception tokens, which propagate across the chain of speculated instructions: the destination register inherits the deferred exception token if it is set in any source.
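The 65-bit register model and the propagation rule can be stated compactly in C. This is an illustrative sketch (the types are invented here, not hardware): a speculative arithmetic instruction computes its result as usual but ORs the NaT bits of its sources into the NaT bit of its destination.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct { uint64_t val; bool nat; } Gr;  /* 64-bit value plus NaT bit */

    /* Speculative add, e.g. add r1=r2,r3 in a speculated chain. */
    Gr spec_add(Gr a, Gr b) {
        Gr r;
        r.nat = a.nat || b.nat;  /* the deferred exception token propagates */
        r.val = a.val + b.val;   /* the value is meaningless when r.nat is set */
        return r;
    }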

Since control speculation can yield an invalid result, a validation mechanism must be provided. For this Itanium provides the chk.s instruction (for both general and floating-point registers), which must execute at the point of the original load when the result is needed. If the source register of the chk.s contains a deferred exception token, execution branches to recovery code, which re-executes a non-speculative instance of the speculative load and all instructions in the dependence chain, thereby clearing the deferred exception tokens in the chain, and branches back to the bundle after the chk.s. Example 1 shows a simple example of control speculation of a single load and a dependent add instruction. In case of an exception, the recovery code executes and computes the result. Program execution continues at the bundle after the chk.s.

Original Code:

          ld8    V3=[V1];;
          cmp.eq V10,p0=V3,0
          add    V5=V4,V3
    (V10) br     cont;;
          ...
          ld8    V8=[V7];;
          add    V9=V9,V8
    cont: ...

Control Speculated Code:

          ld8    V3=[V1]
          ld8.s  V8=[V7];;      // speculative load
          cmp.eq V10,p0=V3,0
          add    V5=V4,V3
          add    V9=V9,V8
    (V10) br     cont;;
          ...
          chk.s  V8,rec         // validate the speculative result
    cont: ...
          ...
    rec:  ld8    V8=[V7]        // recovery
          add    V9=V9,V8       // code
          br     cont;;

Example 1 Control Speculation with Recovery Code

Speculation can offer many benefits: it can decrease critical path length, increase ILP and hide memory latency. This is balanced by potential costs: first, there is the opportunity cost of wasted resources when the result is not needed. Second, an exception results in dynamic code duplication, chk.s branch overhead and potential I-cache pollution from executing recovery code. Third, careless speculation can increase the critical path length. Finally, like any code hoisting optimization, control speculation can increase register pressure.

The register allocator must be aware of deferred exception tokens. For integer registers, st8.spill/ld8.fill instructions are defined to save/restore a general register together with its NaT bit. The NaT bit of the spill/fill is stored in/restored from a preserved 64-bit application register, ar.unat. Bits 8:3 (six bits) of the memory address of the spill/fill determine the specific ar.unat bit that corresponds to the spill source/fill destination register. For floating-point registers, stf.spill/ldf.fill save/restore a register without raising an exception if the source/destination register contains a NaTVal.
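The ar.unat indexing rule just described amounts to one shift and one mask. A small sketch of the address-to-bit mapping (the function name is invented for illustration):

    #include <stdint.h>

    /* Bits 8:3 of the spill/fill address select one of the 64 ar.unat bits. */
    static inline unsigned unat_bit_index(uint64_t spill_addr) {
        return (unsigned)((spill_addr >> 3) & 0x3f);
    }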

2.6.2 Data Speculation

Data speculation (breaking the store barrier) is an optimization that hoists a chain of instructions starting at a load above one or more ambiguous stores. A load and a store are ambiguous when it is unknown at compile (assembly) time whether the load and store addresses overlap. Itanium provides advanced loads (ld.a, ldf.a, ldfp.a), an advanced load check (chk.a) and load check (ld.c, ldf.c, ldfp.c) instructions for data speculation. When a chain of instructions is speculated, the result must be checked with a chk.a, which is similar to chk.s. When the chain is only a single load, a load check instruction can be used. An advanced load records information about its physical destination register, memory address and data size in the Advanced Load Address Table (ALAT) [42]. When a subsequent store overlaps, the processor invalidates the corresponding ALAT entry to indicate the collision. A load check reloads the correct value only when it finds no valid ALAT entry. As with control speculation, a chk.a branches to recovery code and re-executes the speculated instruction chain (Example 2) when an address overlap has happened. For data speculation, the code generator has to handle two performance issues: ALAT conflicts and ALAT collisions. On the first Itanium processor ("Merced") the ALAT is a 2-way set associative cache with 16 sets. The four least significant bits of the physical ld.a destination register form the set index. When two ALAT live ranges interfere, the register allocator has to assign them two (mod 16) different physical registers to avoid ALAT conflicts. The register allocator can guarantee this only if both registers are in the same class: static, stacked or rotating. When the two registers are in different register classes, e.g. one static, one stacked, in general the compiler cannot

derive their physical register numbers. On Itanium-2 the ALAT has 32 entries and is fully associative [42], so the register allocator does not have to worry about set conflicts.
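On Merced the set-conflict constraint from the preceding paragraph reduces to a mod-16 test on physical register numbers. A one-line sketch (an illustrative helper, not compiler code):

    #include <stdbool.h>

    /* Two interfering ALAT live ranges should not map to the same ALAT set,
     * i.e. their physical register numbers should differ mod 16. */
    static inline bool alat_set_conflict(int phys_reg_a, int phys_reg_b) {
        return (phys_reg_a & 0xf) == (phys_reg_b & 0xf);
    }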

Original Code:

          ...
          st4  [V10]=V11
          ld8  V4=[V1]
          add  V5=V4,V6

Data Speculated Code:

          ld8.a V4=[V1]        // advanced load
          add   V5=V4,V6
          ...
          st4   [V10]=V11
          chk.a V4,rec         // check for an ALAT collision
    cont: ...

    rec:  ld8   V4=[V1]        // recovery
          add   V5=V4,V6       // code
          br    cont

Example 2 Data Speculation with Recovery Code

2.6.3 Combined Control- and Data Speculation

Control- and data speculation can co-exist and be performed simultaneously to break both the branch and the store barrier. Itanium supports control-speculative variants of advanced loads (ld.sa, ldf.sa, ldfp.sa). When such a load generates a deferred exception token, no ALAT entry for the (physical) destination register will exist. Thus an advanced load check or a load check instruction validates the control- and data-speculated result just as for "pure" data-speculated code.


3 Review of Graph-Coloring based Register Allocation

This chapter gives an overview of the rich body of literature on graph-coloring based register allocation. While it cannot be complete, it does review many key ideas of the subject. For perspective, a fundamental building block of Chaitin's graph-coloring based register allocator ("coloring allocator"), the simplification algorithm, was described by Kempe [47] as early as 1879.

3.1 Foundations

Register allocation solves the decision problem of which symbolic register should reside in a machine register. A symbolic register represents a user variable or a temporary in a compiler-internal program representation. Register assignment solves the decision problem of which specific machine register to assign to a given allocated symbolic register. Solutions to both problems must take into account constraints between symbolic registers. A coloring allocator abstracts the allocation problem to coloring an undirected interference graph with K colors, which represent K machine registers. The common thread of the relevant literature starts with Chaitin's paper [16], which describes the first complete implementation of a coloring allocator in an experimental IBM PL/I compiler. In a follow-up paper Chaitin describes - in "broad brush strokes" - the fundamental building blocks of coloring allocators [17]. Chow introduced priority-based graph coloring as part of the optimizing UCODE compiler [20]. Chaitin's and Chow's papers inspired many developments in the field. A short account of the history of graph coloring methods in computer science before Chaitin can be found in Briggs' thesis [12].

3.1.1 Chaitin-style Register Allocation

A Chaitin-style graph-coloring algorithm has six phases (Figure 9): "rename", "build", "coalesce", "simplify", "spill" and "select". At the start of the algorithm each symbolic register corresponds to a single register candidate node ("renaming"). This phase may split disjoint definition-use chains of a single variable into multiple disjoint live ranges. It also ensures a contiguous numbering of candidates, reducing the memory requirements for dataflow analysis and the interference graph. Node interference relies on dataflow analysis to

determine the live range of a node. The live range of a node consists of all program points where the register candidate is both live and available. Dataflow analysis is necessary only once, not at each build step. The "build" phase constructs the interference graph. The nodes in the interference graph represent register candidates. Two nodes are connected by an interference edge when they cannot be assigned the same register. The number of edges incident with a node is the degree of the node. Building the interference graph is a two-pass algorithm. In the first pass, starting with the live-out information, node interference is determined by a backward sweep over the instructions in each basic block. Interference is a symmetric relation stored in a triangular matrix. This is usually a large, sparse bit matrix, inadequate for querying the neighbors of a given node. To remedy this, an adjacency vector is allocated for each node in a second pass. The length of the vector is the degree of the node; it contains all neighbors of the node.
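A minimal C sketch of the triangular bit matrix follows. The layout (one bit per unordered node pair, indexed by the larger node number) is an assumption typical of such implementations, not the specific data structure of any particular compiler.

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct { uint8_t *bits; } IGraph;

    static size_t tri_index(int a, int b) {        /* requires a > b >= 0 */
        return (size_t)a * (a - 1) / 2 + b;
    }
    IGraph ig_create(int nodes) {
        size_t nbits = (size_t)nodes * (nodes - 1) / 2;
        IGraph g = { calloc((nbits + 7) / 8, 1) };
        return g;
    }
    void ig_add(IGraph *g, int a, int b) {         /* record that a and b interfere */
        if (a == b) return;
        if (a < b) { int t = a; a = b; b = t; }    /* symmetric: store each pair once */
        size_t i = tri_index(a, b);
        g->bits[i >> 3] |= (uint8_t)(1u << (i & 7));
    }
    int ig_test(const IGraph *g, int a, int b) {
        if (a == b) return 0;
        if (a < b) { int t = a; a = b; b = t; }
        size_t i = tri_index(a, b);
        return (g->bits[i >> 3] >> (i & 7)) & 1;
    }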

                  +<------- spill -------+
                  |                      |
    rename --> build --> coalesce --> simplify --> select

    (rename through simplify/spill solve the allocation problem;
     select performs the assignment)

Figure 9 Chaitin-style Register Allocator

The next phase, "coalescing" (aka "subsumption", "node fusion"), is an optimization that is not needed for solving the register allocation problem but was part of the original Chaitin allocator. It fuses the source and destination nodes of a move instruction when the nodes do not interfere. This reduces the size of the interference graph and eliminates the move instruction, since source and destination get assigned the same register. Chaitin's original implementation can coalesce any possible pair of nodes. This form of coalescing is called "aggressive coalescing". After possibly several iterations of "coalesce" and (re-)"build", the simplification phase iterates over the nodes in the interference graph using simple graph theory to find candidates that can be allocated to machine registers: when a

register candidate has fewer than K interference edges (a low degree node that has fewer than K neighbors), then it can always be assigned a color. Low degree nodes and their edges are removed from the graph ("simplify") and the nodes are pushed onto a stack (the "coloring stack"). Node removal may produce new low degree nodes. When only high degree ("significant") nodes with K or more neighbors are left, simplification is in a blocking state. It transitions out of the blocking state using a heuristic-based priority function that determines the "best node" to remove from the graph. A node that is removed from the graph in the blocking state is "spilled" and appended to a spill list. Spilling is an allocation decision: a spilled node will reside in memory (on the stack) rather than in a register. The edges of a spilled node are removed from the graph, so new low degree nodes can become exposed, and simplification continues until all nodes have been pushed onto the coloring stack or appended to the spill list. The cost function that decides on the "best node" estimates the execution time increase caused by spill code, normalized by the degree of the node. The higher the degree, the less likely a node will be allocated a register. The formula for the cost function is Equation 1. The sum is over all basic

blocks that contain a reference to node n. In Equation 1, d_i is the number of definitions of n in basic block i, u_i the number of uses, and f_i an estimate of the execution frequency of basic block i. The expected execution times of a store and a load on the specific target architecture are S and L, respectively.

\[
\mathrm{select}_{\mathrm{Chaitin}}(n) \;=\; \mathrm{Cost}_{\mathrm{Chaitin}}(n) \;=\; \frac{\mathrm{Cost}(n)}{\mathrm{Degree}(n)} \;=\; \frac{\sum_i \bigl(S \cdot d_i + L \cdot u_i\bigr) \cdot f_i}{\mathrm{Degree}(n)}
\]

Equation 1 Cost Function in Chaitin Allocator
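Equation 1 transcribes directly into code. The following C sketch uses illustrative data structures (the Node fields are assumptions made for this example):

    typedef struct {
        int           nblocks;  /* blocks that reference node n */
        const int    *defs;     /* d_i: definitions of n in block i */
        const int    *uses;     /* u_i: uses of n in block i */
        const double *freq;     /* f_i: estimated execution frequency of block i */
        int           degree;   /* neighbors in the interference graph */
    } Node;

    /* S = store cost, L = load cost on the target; the node with the
     * smallest result is the preferred spill candidate. */
    double chaitin_cost(const Node *n, double S, double L) {
        double cost = 0.0;
        for (int i = 0; i < n->nblocks; i++)
            cost += (S * n->defs[i] + L * n->uses[i]) * n->freq[i];
        return cost / n->degree;
    }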

The node with the smallest cost is picked for spilling. When the spill list is not empty at the end of "simplify", the "spill" phase substitutes a spilled candidate with possibly multiple new register candidates. In the worst case a new candidate is introduced for each definition and use of the original register candidate. Figure 10 shows the original code and the code after spilling. The register candidate of a definition is replaced with a new register candidate rc1, which is spilled immediately after its definition to the address

33 contained in the spill register. Similar, a fill is inserted before a use. Note that the move of the spill address into register sp is not shown in Figure 10. The replacement of the original spill candidate rc with new candidates rc1 and rc2 splits the original live range for rc into two small live ranges for rc1 and rc2. When spilling has occurred, the allocator must restart at the build phase and iterates until all register candidates are colored.

     Original Code     After Spilling
1:   add rc=...        add rc1=...
2:                     st8 [sp]=rc1
3:   ...               ...
4:                     ld8 rc2=[sp]
5:   sub ...=rc        sub ...=rc2

Figure 10 Illustration of Spill Code Insertion

After a few build-coalesce-simplify cycles, the spill list is empty. At this point the allocation problem is solved: the original candidates that are on the coloring stack have been allocated to registers. The coloring stack is fed into the coloring or "select" phase. This phase picks one node at a time, starting from the top of the stack, and assigns a color to the node while ensuring that interfering nodes receive different colors. For this the adjacency vectors are used: colors that have been assigned to neighbors are blocked and cannot be assigned to the current node. The order in which candidates are assigned registers impacts allocation. Simple examples show that some graphs with nodes of degree K or higher can be colored with K or fewer colors. Briggs et al. [13] used this insight and modified the Chaitin allocator by pushing all nodes onto the coloring stack during simplification. The actual spill decision is delayed until after register assignment. This delay technique is known as optimistic coloring, since significant nodes can get assigned a register rather than being spilled as in Chaitin's original allocator (Briggs [12]). With optimistic coloring (Figure 11), register assignment solves the allocation problem: when a candidate has been assigned a register, it has been allocated to a register. When all candidates on the coloring stack get assigned a register, the algorithm terminates. Otherwise it spills the unassigned candidates and restarts allocation at the build phase.
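The difference between the two strategies can be made concrete. The following Python sketch is a minimal model (hypothetical names; the "best node" choice is simplified to maximum degree instead of Chaitin's cost function): simplification pushes every node, and the spill decision falls out of the select phase.

def simplify_optimistic(adj, K):
    # Push all nodes, low degree nodes first; no node is spilled here.
    n = len(adj)
    degree = [len(adj[v]) for v in range(n)]
    removed = [False] * n
    stack = []
    for _ in range(n):
        cand = [v for v in range(n) if not removed[v]]
        low = [v for v in cand if degree[v] < K]
        v = low[0] if low else max(cand, key=lambda u: degree[u])
        removed[v] = True
        stack.append(v)
        for w in adj[v]:
            if not removed[w]:
                degree[w] -= 1
    return stack

def select(adj, stack, K):
    color, spilled = {}, []
    for v in reversed(stack):
        used = {color[w] for w in adj[v] if w in color}
        free = [c for c in range(K) if c not in used]
        if free:
            color[v] = free[0]    # even a significant node may be colored
        else:
            spilled.append(v)     # only now is the spill decision made
    return color, spilled

# Nodes 0-4 with the edges of the Figure 17 graph (1-indexed there):
adj = [[1, 4], [0, 2, 3, 4], [1], [1, 4], [0, 1, 3]]
stack = simplify_optimistic(adj, K=2)
print(select(adj, stack, K=2))

As the text notes, the outcome depends on the order in which nodes are removed and colored; with this particular order only one node is spilled.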

Phases: rename -> build -> coalesce -> simplify -> select, with spill feeding back to build after select; with optimistic coloring, assignment in select completes the allocation.

Figure 11 Chaitin-style Allocator with Optimistic Coloring

3.1.2 Priority-based Register Allocation

Priority-based register allocation was introduced by Chow [20]. This coloring allocator uses the basic building blocks of a Chaitin-style allocator, except for the coalescing phase. A priority function estimates the execution time decrease when a live range is assigned a register rather than residing in memory. The live ranges are composed of one or more live units. A live unit is a basic block where a given symbolic register could reside in a register. The more expensive (relative to Chaitin) representation of a live range supports live range splitting, which splits a given live range into a set of smaller live ranges when no color can be assigned. Rather than spilling the entire live range, live range splitting dissolves a given live range into new candidates that could become colorable. The phases of a Chow allocator (Figure 12) are "rename", "build", "simplify", "select I", "split" and "select II". A Chow allocator works on two pools of nodes: the constrained and the unconstrained node pool. These pools are the result of the Chow simplification phase: unconstrained live ranges, which are the low degree nodes with fewer than K neighbors, are collected in the unconstrained pool, and constrained live ranges, which are the significant nodes with K or more neighbors, are collected in the constrained pool.

Phases: rename -> build -> simplify -> select I -> select II, with split re-entering select I and spill leaving the constrained pool.

Figure 12 Chow-style Allocator

Unlike in a Chaitin allocator, the simplification phase sweeps over the nodes only once. In the assignment phase "select I" a priority function estimates the potential execution time savings from a register assignment. This function is normalized to the length of the live range. The rationale is that longer live ranges get lower preference, since they consume registers longer and reduce the chances of other live ranges to be assigned registers. The live range with the highest priority is assigned a register first. Equation 2 shows the priority function of a live range n: the sum is over all basic blocks that contain a reference to node n. d_i is the number of definitions, u_i the number of uses, m_i the number of reconciliation moves, which can be necessary to reconcile assignments to split live range segments, and f_i an estimate of the execution frequency of basic block i. The expected execution times of a load, store and move are L, S and M, respectively.

select_Chow(n) = priority_Chow(n) = Benefit(n) / Length(n) = ( Σ_i (S × d_i + L × u_i − M × m_i) × f_i ) / Length(n)

Equation 2 Priority Function in a Chow Allocator

The numerator in the priority function is similar to the numerator in Chaitin's cost function, except that it is interpreted as the execution time benefit of register assignment and models the reconciliation cost (M × m_i) for a split live range.

When no color is available in the priority select phase, the Chow allocator is in a blocking state. It transitions out of a blocking state by splitting a live range into two or more live range segments, which form new live ranges. Splitting starts at the first block where the node could be assigned a register and determines in a breadth-first search the maximal segment (= the first new live range) that can be colored. This is done recursively until the original node is split entirely into smaller live ranges, and the constrained and unconstrained pools are updated accordingly. When all live units in a live range are constrained, the node cannot be split. In this case the Chow allocator spills the node by removing it from the constrained pool ("spill"). After coloring of the constrained nodes has terminated, all unconstrained nodes are colored in "select II". The original Chow allocator does not support coalescing, operates on a high-level intermediate representation, assumes all candidates reside in memory initially, and reserves machine registers for spilling, using fewer colors than machine registers available. The complexity of the select I phase is O(K × (L − K)), where L is the total number of live ranges (Chow [20], [22]).

3.2 Worst-case Time and Space Analysis

This section reviews the time and space complexity of the phases of a Chaitin-style allocator. In its simplest form "rename" is a linear pass over the control flow graph renaming symbolic registers. It may allocate a table to record information about each candidate. The impact of renaming on the intermediate representation is illustrated in Figure 13. Renaming can compact the candidates, since the compiler may introduce new symbolic registers and dispose of used symbolic registers before it calls the allocator. Compaction reduces memory requirements, in particular for the interference graph. Rename may also select candidates: for example, the Intel compiler, which is discussed in more detail in chapter 4, separates allocation for integer and floating-point candidates. In this case the "rename" phase will select only the candidates of the class that gets allocated. A more elaborate implementation of "rename" could also split a live range into disjoint components. For example, the live range of D in Figure 13 has two disjoint components from lines 1-5 and lines 10-18 respectively and could get split into two candidates RC3 and RC4.

      Source     Intermediate Representation     After Rename
1:    D=A+B      add V5=V4,V3                    add RC3=RC2,RC1
      ...        ...                             ...
5:    =D         =V5                             =RC3
      ...        ...                             ...
10:   D=         V5=                             RC4=
      ...        ...                             ...
18:   =D         =V5                             =RC4

Figure 13 Illustration of simple ³rename´ phase

Live range formation as part of the "build" phase relies on live variable and available variable analysis. The live range of a variable is the set of all program points where the variable is both live and available. Time and space complexity is similar for both standard bit vector-based dataflow analysis algorithms (Figure 14), except that available variable analysis does not require the kill vector. The notation in Figure 14 follows Aho et al. [1]. It assumes that the control flow graph is normalized with two distinguished nodes: a single START and a single EXIT node. For a reducible control flow graph the deepest loop nesting level of the function is an upper limit for the trip count of the dataflow solver. When the control flow graph is irreducible, the worst-case time of a dataflow routine can be quadratic in the number of basic blocks. Available variable analysis is used to identify undefined variables in acyclic code and to stop live range extension for non-strict live ranges at basic block boundaries where the live range is undefined. A live range is non-strict when there is a path from the function entry to a use that contains no definition. When a non-strict live range is contained in a loop, its live range spans the entire loop. Figure 15 shows an example of a non-strict live range V.


                  Live Variable Analysis                     Available Variable Analysis
Lattice           Set of Variables                           Set of Variables
Top               ⊤ (= Empty Set)                            ⊤ (= Empty Set)
Meet              ∪ (= Set Union)                            ∪ (= Set Union)
Boundary          IN[EXIT] = ⊤                               OUT[START] = ⊤
Initialization    IN[B] = ⊤ for each basic block B           OUT[B] = ⊤ for each basic block B
Transfer          F_B(X) = Gen(B) ∪ (X \ Kill(B))            F_B(X) = Gen(B) ∪ X
Equations         IN(B) = F_B(OUT(B)),                       OUT(B) = F_B(IN(B)),
                  OUT(B) = ∪_{S ∈ Succ(B)} IN(S)             IN(B) = ∪_{P ∈ Pred(B)} OUT(P)
Direction         Backward                                   Forward

Figure 14 Live Variable and Available Variable Analyses
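A minimal round-robin solver for the backward problem of Figure 14 might look as follows in Python; the block encoding is hypothetical, and available variable analysis would run forward with the kill set dropped:

def live_variables(blocks, succ):
    # blocks: {name: (gen, kill)}; succ: {name: [successor names]}
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:            # trip count bounded by the loop nesting
        changed = False       # depth for reducible control flow graphs
        for b, (gen, kill) in blocks.items():
            out = set().union(*(live_in[s] for s in succ[b])) if succ[b] else set()
            inn = gen | (out - kill)   # F_B(X) = Gen(B) u (X \ Kill(B))
            if inn != live_in[b] or out != live_out[b]:
                live_in[b], live_out[b] = inn, out
                changed = True
    return live_in, live_out

# Two-block loop: V is defined in B1 and used in B2, so V stays live
# around the back edge.
blocks = {"B1": ({"A"}, {"V"}), "B2": ({"V"}, set())}
succ = {"B1": ["B2"], "B2": ["B1"]}
print(live_variables(blocks, succ))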

Cyclic Control Flow Graph: blocks B1-B7, V defined in B4 and used in B7; the Live Range of V is B2-B7.
Acyclic Control Flow Graph: blocks B1-B7, V defined in B4 and used in B7; the Live Range of V is B4-B7.

Figure 15 Example of a non-strict Live Range

In the cyclic case, the live range of V extends from the exit of the pre-header B1 to the use of V. In the acyclic case, the live range extends from the definition to the use of V.

The result of live range construction is the live vectors at the exits of the basic blocks. To construct the interference graph, the allocator sweeps backwards over the instructions of each basic block, updating the live vector at each instruction. In parallel it records interferences in the triangular interference matrix. Figure 16 shows a snippet of a basic block with five register candidates. Candidates RC2, RC4 and RC5 are live at exit. In the last instruction RC4 is defined. Thus it interferes with all live candidates, and the interferences with RC2 and RC5 are recorded in the interference matrix. Since RC4 is defined, RC4 is then removed from the live vector. Also, in the last instruction RC1 is used. Thus it is recorded as live in the live vector. There is no interference at this step between RC1 and RC4. The interference matrix shows the interferences after all instructions in the block have been visited. After the interference matrix has been recorded, the degree of each variable is known. In the second pass of the "build" phase all neighbors of a node are recorded in the adjacency vector. In the worst case the adjacency vectors can consume even more space than the interference matrix. This can happen, for example, for a complete graph with |V| nodes. A graph is complete when each pair of nodes is connected by an edge. In a complete graph with |V| nodes each node n has degree(n) = |V| − 1.

Basic Block Live Vector Interference Matrix

RC1 RC2 RC3 RC4 RC5 RC1 RC2 RC3 RC4 RC5 Live-at-entry 0 1 0 0 0 RC1 X X RC3 = op(RC2) 0 1 0 0 0 RC2 X X X RC1 = op(RC2,RC3) 0 1 1 0 0 RC3 RC5 = op(RC1,RC2) 1 1 0 0 0 RC4 X RC4 = op(RC1) 1 1 0 0 1 RC5 Live-at-exit 0 1 0 1 1

Figure 16 Interference Graph Construction Scheme
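The backward sweep of Figure 16 can be written down directly; the sketch below (illustrative Python with a hypothetical instruction encoding) reproduces the interference matrix of the figure:

def record_interferences(instructions, live_out, add_edge):
    # instructions: (defs, uses) per instruction, in block order;
    # add_edge records one interference in the triangular matrix.
    live = set(live_out)
    for defs, uses in reversed(instructions):
        for d in defs:
            for v in live:
                if v != d:
                    add_edge(d, v)   # a definition interferes with all
        live -= set(defs)            # candidates live across it
        live |= set(uses)
    return live                      # live vector at block entry

block = [({"RC3"}, {"RC2"}), ({"RC1"}, {"RC2", "RC3"}),
         ({"RC5"}, {"RC1", "RC2"}), ({"RC4"}, {"RC1"})]
edges = set()
entry = record_interferences(block, {"RC2", "RC4", "RC5"},
                             lambda a, b: edges.add(frozenset((a, b))))
print(entry)                                      # {'RC2'}
print(sorted(tuple(sorted(e)) for e in edges))    # six edges, as in Figure 16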

Figure 17 shows the interference graph for the basic block snippet in Figure 16. Assuming K = 2, the simplification phase will push node 3, which has only one neighbor, on the coloring stack. Since 1 < K = 2, this node is "unconstrained" and guaranteed to get assigned a register. After removing node 3, simplification is in a blocking state. Based on the cost function, it may pick node 4 as a spill node and place it "optimistically" onto the coloring stack, as illustrated in Figure 17, or in a spill list. The remaining nodes 1, 2 and 5 still form a clique of 3 nodes, thus simplification is still blocked. It may decide to remove node 1. At this point nodes 2 and 5 have fewer than 2 neighbors and are pushed onto the coloring stack, terminating simplification. The select phase will assign colors to the candidates in the reverse order in which they have been pushed onto the stack.

Figure 17 Simplification Phase and Coloring of Interference Graph

Figure 17 shows one possible assignment. Since nodes 1 and 4 did not get a register assigned, they get spilled and allocation restarts. In this particular example original Chaitin and optimistic coloring would produce identical spill code. In this example, the influence of node order on the allocation and assignment result is visible ("NP-noise", cf. Briggs [13]): assuming that node 2 is the cheapest node to spill in the first blocking state of the simplification phase, it will be removed from the interference graph together with its edges. Then the remaining nodes become unconstrained one after another. In this scenario only node 2 may get spilled. So the allocation outcome depends on the assignment and the order of the nodes 1, 4, and 5 in the final coloring stack. If nodes 1 and 4 end up on top of the stack and are assigned r1 and r2 respectively, node 5 will be spilled. But if nodes 1 and 4 are assigned r1, only node 2 must be spilled. The simplification algorithm can be implemented with time complexity O(N log N), where N is the number of nodes. The assignment (or coloring) phase is a single linear pass over all nodes on the coloring stack.

3.3 Developments

Since Chaitin's work the literature on register allocation has progressed, proposing new extensions, heuristics, scopes and alternatives. These developments can be summarized in six categories (Table 7).

Table 7 Research Categories and Goals

  Category               Goals
1 Spill code reduction   Performance (heuristics, remat., premat.)
2 Scoping                Compile time, Performance
3 Extensions             Functionality (e.g. load pairs), Performance
4 Coalescing             Performance, Compile time
5 Alternative Models     Compile time, Performance
6 Theory                 Complexity analysis, Polynomial time solutions

First, spill code reduction improves on Chaitin's spilling heuristics to reduce spill and fill instructions. Splitting methods are heuristics to split live ranges into smaller pieces. Smaller live ranges should have less interference and could yield better allocation results. Second, the goal of scoping is to improve register allocation compile time or to extend the allocation scope, for example, allocating candidates across procedure boundaries rather than within a procedure. One idea to save compile time is to partition the control flow graph into a disjoint set of regions, perform register allocation per region and reconcile allocations at region boundaries. Implicitly, the walk over the regions also prioritizes register candidates and can impact allocation and thus run-time performance. Third, over time researchers and practitioners have implemented and proposed extensions to the classical coloring allocator to cope with architecture peculiarities. Fourth, coalescing is almost a field of its own, separate from register allocation. Fifth, in addition to graph-coloring based register allocation many other approaches have been proposed. Finally, there is a rich body of literature on the theory of register allocation. The remainder of the section surveys the six categories.

3.3.1 Spill Cost Reduction

The goal is to issue as little spill code as possible. To this end clever heuristics are employed, including preferencing, rematerialization, live range shrinking (a.k.a. pre-materialization) and live range splitting methods.

3.3.1.1 Spill Heuristics

Two references of a live range are close when no other live range dies ("goes dead" in Chaitin [17]) between them. In other words, two references of a live range are close when no other live range ends between them. In this case, no new register can become available at the second reference. So, in the relevant case that the second reference is a use, the fill would have to use the same register the candidate is assigned to at the first reference and load the same value. Effectively the fill becomes a dead instruction. Chaitin's spilling heuristics a) exploit this fact and attempt to avoid spilling in a basic block when "closeness" of live range references can be detected, and b) replace loads with simpler operations when possible (Rule 1 below, "rematerialization"):
• Rule 1: If a value is easy to re-compute, do it.
• Rule 2: If two uses of a live range are close, don't reload at the second use ("load forwarding").
• Rule 3: Don't spill a live range when all its uses are close.
• Rule 4: If a use is close to its definition, use the stored value directly and don't fill ("store forwarding").
• Rule 4′: If the use is close to its definition, and both references are within one basic block, don't spill the live range.
It can be more effective to re-compute the value of a live range than to spill it. Trivial examples are live ranges that represent constants or easy-to-recompute values like stack pointer + offset. This technique is called "rematerialization". Chaitin [17] uses it as a spill heuristic. Briggs [12] generalizes Chaitin's observation and constructs a dataflow framework to model rematerializable live ranges. Figure 18 illustrates Chaitin's rules for two live ranges X and Y in one basic block. When no rule is applied, X is filled before each use. With rule 1, the load is replaced by a presumably cheaper add instruction. Rule 3 is a generalization of rule 2: when all uses are close, only the first load of X is necessary. In the case of live range Y there is a definition and a use in one basic block. When no rule is applied, Y is spilled after the definition and filled before the use. If use and definition are close, rule 4 applies and only the spill is needed. If Y is a local live range and the basic block contains all its references, it does not get spilled at all, based on rule 4′.

Spilled X      Rule 1         Rule 3       Spilled Y     Rule 4      Rule 4′

load X         add X=sp,12    load X       def Y         def Y       def Y
use X          use X          use X        store Y       store Y
...            ...            ...          ...           ...         ...
load X         add X=sp,12                 load Y
use X          use X          use X        use Y         use Y       use Y
...            ...            ...
load X         add X=sp,12
use X          use X          use X

Figure 18 Illustrations of Chaitin's Spill Rules
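Rules 2-4 all hinge on detecting "closeness". Under the simplifying assumption that each instruction is annotated with the live ranges that die there, a detector might look like this (a hypothetical encoding; deaths occurring at a reference itself are ignored for brevity):

def close_references(block, x):
    # block: list of (referenced live ranges, live ranges dying here).
    # Yields pairs of consecutive references to x with no other death
    # in between, i.e. pairs to which Chaitin's rules 2-4 can apply.
    last_ref, dies_between = None, False
    for pos, (refs, deaths) in enumerate(block):
        if x in refs:
            if last_ref is not None and not dies_between:
                yield (last_ref, pos)
            last_ref, dies_between = pos, False
        elif deaths - {x}:
            dies_between = True

blk = [({"X"}, set()), ({"Y"}, {"Y"}), ({"X"}, set()), ({"X"}, {"X"})]
print(list(close_references(blk, "X")))   # [(2, 3)]: Y dies between 0 and 2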

Bernstein et al. [7] introduce the "best of three" simplification heuristic, which decides which node to spill when the Chaitin allocator is in a blocking state. Chaitin's cost function (see Equation 1) prefers spilling a node with low cost and high degree, but it ignores register pressure. Bernstein et al. add weighted approximations to the cost function and use the inverse of the square of the degree of a node. Using the square of the degree rather than the degree makes it more likely that high degree nodes get spilled, which in turn could expose more unconstrained nodes ([7]). The three Bernstein spill heuristics are listed in Figure 19.

h_1(n) = Cost(n) / Degree(n)^2                     (1)

h_2(n) = Cost(n) / ( Area(n) × Degree(n) )         (2)

h_3(n) = Cost(n) / ( Area(n) × Degree(n)^2 )       (3)

where:
Area(n) = Σ_{I ∈ LIVE(n)} 5^depth(I) × width(I)
LIVE(n): set of all instructions where n is live at exit
depth(I): loop nesting level of instruction I
width(I): number of simultaneously live register candidates at instruction I

Figure 19 Bernstein et al: Three heuristic Spill Functions

Bernstein et al. experiment with all three heuristic functions, where Area(n) is a measure of the weighted register pressure of live range n (Figure 19). In their experiments none of the three heuristics is superior to the others, but the minimum of all three does give a superior select routine in all their tests:

select_Bernstein(n) = min( h_1(n), h_2(n), h_3(n) )

Equation 3 Modified Cost Function in a Chaitin Allocator
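In code, the "best of three" selection is a one-liner over the three functions of Figure 19 (a sketch with hypothetical inputs):

def area(live_exit_points):
    # (loop depth, width) per instruction where the node is live at exit.
    return sum(5 ** depth * width for depth, width in live_exit_points)

def bernstein_select(cost, area_n, degree):
    h1 = cost / degree ** 2
    h2 = cost / (area_n * degree)
    h3 = cost / (area_n * degree ** 2)
    return min(h1, h2, h3)

# A node live at the exits of two instructions inside a loop of depth 1:
print(bernstein_select(cost=10.0, area_n=area([(1, 4), (1, 5)]), degree=3))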

3.3.1.2 Splitting

Cooper and Simpson [24] exploit "containment" to reduce spill code: a live range L1 is contained in a live range L2 when L1 is not live at any definition or use of L2. The idea is that spilling a live range that is contained in another is ineffective, since it does not reduce interferences or provide any benefit for the spill cost. "Containment" is encoded as a directed interference graph: when L1 is contained in L2, there is a directed edge from L1 to L2. When neither L1 is contained in L2 nor L2 in L1, and both live ranges interfere, the interference edge remains undirected. Containment-based splitting is a lazy technique: for a spilled live range L it estimates the cost of splitting neighbors that contain L. If this is not effective, the algorithm tries to split L across live ranges that are contained in L. For example, assume L2 is spilled and L2 is contained in L1. The algorithm spills L2 "around" L1 first if this results in faster code (as estimated by heuristics in the algorithm). Otherwise, it attempts to spill L1 "around" live ranges it contains. The containment-based algorithm would split L1 at the boundaries of live ranges that it contains, if it (likely) benefits run-time performance. Bergner et al. [6] introduce interference region spilling. The interference region of two live ranges is the set of program points where both live ranges are simultaneously live. The idea is that spilling a live range in an interference region only ("partial spilling") is cheaper than spilling the entire live range. It is a lazy technique: for any spilled live range it evaluates the cost of only partially spilling the live range. For each spilled node their algorithm groups edges into K classes, one class for each color. Each class represents an interference region and contains the interference edges to neighbors assigned color k. A color k is chosen that minimizes the overall spill costs, and the edges in class k are removed from the interference graph. When a live range L1 is contained in live range L2, the interference region of L1 and L2 is all of L1. In this case, interference region spilling and Chaitin spilling give the same results. The basic idea of splitting live ranges in "zones" of register pressure is also mentioned in Section 5 of Chow [22]. Pre-materialization shrinks re-computable live ranges before register allocation when there is no cost to do so. Baev et al. [5] apply this technique in the HP-UX Itanium compiler when empty slots are found where the "rematerializable" live range can be pre-computed without impacting execution time. This reduces register pressure but does not increase schedule length. A more aggressive implementation could trade off the register pressure reduction with a schedule length increase. Bernstein et al. [7] describe a splitting technique called "cleaning". It is applied at the basic block level in the first two iterations of a Chaitin allocator: when a live range is spilled, only one load (store) is inserted at the first use (definition), independent of the number of references to the live range in the block. Also, the live range is renamed and becomes local in the block. Intuitively, "cleaning" should reduce spills in regions of low register pressure. "Cleaning" aggressively applies Chaitin's heuristics by ignoring "closeness". Since this could result in more allocation iterations, the technique is limited to the first two iterations only.

3.3.1.3 Preferencing

Preferencing methods attempt to reduce spill code by influencing the register assignment. The direct method, which is usually applied in region-based allocators, attempts to assign the same register in all regions. Other methods are more indirect and opportunistic. They attempt to influence the ordering of the nodes on the coloring stack for both constrained and unconstrained live ranges, so that the eventual assignment is more likely to yield a better allocation. For example, Lueh and Gross [54] compute the benefits of assigning a preserved or scratch register to a live range. The data is used to sort live ranges that contain function calls on the coloring stack prior to coloring. By examples they show that this can result in better allocations. Koseki et al. [50] developed an elaborate preferencing allocator. They introduce four classes of register preferences (dedicated, limited, preferred, and dependent) and build a register preference graph (RPG), which is a weighted directed graph where each node represents a candidate, register, or register class, and each edge represents a dependence. The coloring stack is partially ordered based on simplification precedence. This gives a lattice for the candidates. In their selection phase they sieve candidates starting at the top elements in an attempt to satisfy the preference that gives the most benefit. But their method is compile-time intensive and performance gains are not certain.

3.3.2 Scoping

In this category methods address compile time and performance. The contributions can be broken down into four sub-categories:
• Extension of scope to interprocedural allocation
• Partitioning of the control flow graph into allocation regions
• Partitioning of the interference graph
• Partitioning of the candidates
The sub-categories are not independent. For example, smaller allocation regions usually result in a smaller interference graph.

Interprocedural methods attempt to color candidates across call boundaries. Chow [21] describes a simple method where the routines in the call graph are visited in reverse post-order ("depth-first order"). The allocation results in the callee are used by the caller to assign registers that are unused by the callee to live ranges that cross the call. This can avoid spilling of scratch registers at call sites and spare preserved registers. Steenkiste and Hennessy's method [73] is similar to Chow's. They also traverse the call graph bottom-up. Cycles in the call graph are broken by replacing strongly connected components with single compound nodes. They don't necessarily follow strict calling conventions when it is beneficial to do so. Usually the interprocedural allocator runs out of registers in the routines on top of the call stack. If this happens, their allocator falls back to a regular (priority-based) intra-procedural allocator. Callahan and Koblenz [15] describe a general region allocation scheme ("Hierarchical Graph Coloring"). They partition the control flow graph into a set of tiles. Tiles are sets of basic blocks with additional properties, so that a tile tree can be constructed: two tiles are either disjoint or contained (tile 1 is a subset of tile 2 or vice versa) and there is a single root tile. Then graph-coloring is applied to each tile (region) in a bottom-up walk of the tree. At tile boundaries the allocations are reconciled. Reconciliation is necessary since a live range L1 may get assigned register r1 in tile 1 and register r2 in tile 2. At the boundary of tile 1 and tile 2 a "reconciliation" move from e.g. r1 to r2 must be inserted. Hierarchical graph coloring covers loop trees and graph partitions based on single-entry single-exit ("SESE") regions. It is noteworthy that their allocator uses "pseudo" registers, which are assigned physical registers in a reconciliation phase. Norris and Pollack [61] pursue region-based allocation in a similar fashion, but based on the program dependence graph ("PDG"). Statements guarded by the same control statement form an allocation region. Their regions may have multiple exits. Fusion-based allocation partitions the control flow graph into arbitrary disjoint regions (Lueh et al. [55]). The idea of the fusion allocator is to delay spilling until the interference graphs of two simplifiable regions get fused. When the combined graph would no longer be simplifiable, the fusion operator, based on feedback profiling information, attempts to split live ranges in order to minimize spill code at the region boundaries.

Gupta et al. [33] proposed clique separators for partitioning the interference graph. A clique separator is a clique that partitions a graph into two disjoint components. The clique allocator computes spans (definition-use chains) and identifies a set of clique separators. Each span can be contained in at most a fixed number of sub-graphs. Each sub-graph is colored separately, while it includes the nodes of a separating clique. The final graph is composed from the sub-graphs, possibly with renumbering of assigned registers and spilling (or register copies) at separator boundaries. Given n nodes and m clique separators, the algorithm consumes O(n²/m²) space and O(n²/m) time. Splitting the candidates is often implicit in the coloring heuristics. Well-known examples include coloring basic block local candidates first, then the global candidates (e.g. Briggs [12]).

3.3.3 Coalescing

Chaitin's coalescing is aggressive and can yield uncolorable interference graphs. This problem can be avoided by conservative coalescing (Briggs [12], [13]): the live ranges S and T, representing the source and destination of a move instruction respectively, are combined when they don't interfere and when the fused node ST has fewer than K neighbors of significant degree. This rule ensures that the fused node does not trigger a new blocking state in the simplification phase. Therefore conservative coalescing cannot transform a colorable into an uncolorable graph. Iterative coalescing is an attempt to make the conservative method more effective. An allocator with iterative coalescing interleaves "simplify" and "coalesce". Simplification removes only nodes that are non-move related. Here a node is non-move related if it is neither the source nor the destination of a move instruction. In a blocking state "simplify" has only high degree or move-related nodes left. It calls "coalesce", which applies conservative coalescing. When at least two nodes can be coalesced, simplification continues. Otherwise a low-degree move-related node is marked as non-move related ("freeze"), so simplification can find a low-degree node and transition out of the blocking state. If no node can be frozen, "spilling" must unblock "simplify". Optimistic coalescing (a delay technique like optimistic coloring) allows aggressive coalescing, but undoes the coalescing decision if the simplification phase cannot leave a blocking state because the graph has become uncolorable. Biasing, introduced by Briggs ([12], [13]), is an opportunistic and non-intrusive method for fusing two nodes. For a single live range L, the sources or destinations of move instructions that reference L are collected in a partner list for L. When "select" colors L, it checks if a color can be used that has been assigned to a partner. If successful, biasing can coalesce nodes where conservative coalescing could not be applied, while creating little time and implementation overhead. In other words, biasing expresses a coalesce preference. The actual coalesce decision is delayed and made at coloring time.
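The conservative (Briggs) test is cheap to state in code. The sketch below assumes an adjacency-set representation and current degrees; both names are hypothetical:

def can_coalesce_conservative(adj, degree, s, t, K):
    # Fuse move source s and destination t only if they do not
    # interfere and the fused node ST would have fewer than K
    # neighbors of significant degree (>= K).
    if t in adj[s]:
        return False
    significant = 0
    for w in adj[s] | adj[t]:
        d = degree[w]
        if w in adj[s] and w in adj[t]:
            d -= 1   # a common neighbor loses one edge after fusion
        if d >= K:
            significant += 1
    return significant < K

adj = {"s": {"a"}, "t": {"a", "b"}, "a": {"s", "t"}, "b": {"t"}}
degree = {v: len(n) for v, n in adj.items()}
print(can_coalesce_conservative(adj, degree, "s", "t", K=2))   # True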

3.3.4 Extensions

Nickerson [60] introduces the concepts of a weighted degree and an asymmetric interference matrix to handle "cluster" register candidates. Cluster register candidates must be assigned 2 or more registers, depending on the size of the cluster. Additional constraints are that the assigned register "cluster" must be aligned (e.g. the first element in the cluster must start at a register number divisible by the cluster size) and consist of adjacent registers (e.g. when register rx is assigned to the first cluster element, rx+1 must be assigned to its "mate" in a pair cluster). The key observation is that modeling interferences with edges for each element in the cluster would yield conservative results. Instead the interferences must be "normalized" with respect to the first element in the cluster. Smith et al. [72] developed a more general colorability criterion that can handle clusters and overlapping ("alias") registers. But their allocator can give more conservative results than Nickerson's method.

3.4 Alternative Models

The probabilistic allocator (Proebsting and Fischer [63]) generalizes the "furthest next use" of a variable to a probability. The probability prob(v) that a variable v stays in a register at a given program point is approximately the inverse of the distance to the next use. When the probability is low, the live range may get split at this point. The allocator has three phases, local and global allocation, and assignment, which uses a coloring allocator. The global allocation phase proceeds from inner loops to outer loops, where allocation is based on probabilities. Every time a global live range is allocated a register, the probabilities get recomputed, because "probabilities and allocations interact". Global allocation iterates until the probability that more live ranges can be assigned registers is zero. For a prototype implementation the authors report a 100x slowdown on a sample of six small test kernels. This approach demonstrates the cost of register allocation that attempts to steer allocation one node at a time.

Linear scan methods¹ project live ranges onto a line. The blocks of the CFG are put in a linear order, the instructions of the blocks are numbered consecutively, and the live ranges are sorted in ascending order of their first instruction. A scan line moves sequentially through the ordered set of live ranges and assigns them registers or spill slots as soon as they are hit by the scan line. A best-fit first-end allocator can find an optimal coloring for the ranges in linear time in a single scan. Simplicity comes at the loss of structure, since the projection may generate overlaps that don't exist in the original control flow graph. This can happen (only) when a live range L1 is contained entirely in a lifetime hole of a live range L2. A linear scan allocator can model lifetime holes, but at the cost of its simplicity. In general, linear scan allocators are less powerful and produce slower code than coloring allocators: a linear scan allocator assigns ("locally") a register immediately when the scan line hits a live range, without considering all interferences ("globally") like a coloring allocator does. But they can provide good allocation results at lower compile time compared to a coloring allocator.

Other methods model register allocation as a mathematical programming problem, applying (integer) LP (examples are Goodwin and Wilken [32], Fu and Wilken [29], Appel and George [4]) and quadratic solvers (Scholz and Eckstein [68], Hirnschrott et al. [36]). The benefit of these methods is modeling accuracy and the optimality of the solvers. Therefore these methods target mainly, but are not limited to, "irregular" architectures. Irregularity means constraints. Examples of constraints include partial register usage and register alignment for groups of candidates. For example, on Itanium a pair of floating-point registers must be allocated to consecutive registers fi, fi+1. The benefit of modeling accuracy seems to be offset by low scalability and long compile times. But common irregularities can be modeled in graph-coloring allocators as well (e.g. Nickerson [60] or Smith et al. [72]). To the best of our knowledge there is no study that compares the methods (linear scan, graph-coloring, and mathematical models) in a detailed cost/benefit analysis across a rich set of applications and architectures.

¹ This explanation of linear scan methods is from Prof. Mössenböck.
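Returning to the linear scan idea: a minimal best-fit/first-end scan without lifetime holes fits in a few lines. This is a sketch under the stated simplifications; the spill-furthest-end choice is one common heuristic, not the only option:

def linear_scan(intervals, K):
    # intervals: (start, end, name) on the linearized instruction line.
    intervals = sorted(intervals)
    active, free = [], list(range(K))   # active is sorted by end point
    assign, spills = {}, []
    for start, end, name in intervals:
        while active and active[0][0] <= start:   # expire passed intervals
            free.append(active.pop(0)[2])
        if free:
            reg = free.pop()
            assign[name] = reg
            active.append((end, name, reg))
            active.sort()
        else:
            last_end, last_name, reg = active[-1]
            if last_end > end:          # steal from the interval that
                assign[name] = reg      # ends last and spill that one
                del assign[last_name]
                spills.append(last_name)
                active[-1] = (end, name, reg)
                active.sort()
            else:
                spills.append(name)
    return assign, spills

print(linear_scan([(0, 10, "a"), (1, 4, "b"), (2, 6, "c")], K=2))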

3.5 Theory

This section reviews the basic foundations of coloring algorithms. In general, the register allocation problem is equivalent to finding a K-coloring of a graph and is NP-complete when K > 2. In this case there is a polynomial algorithm that verifies a solution, but no polynomial time algorithm exists (unless P = NP) that can decide if an arbitrary interference graph is K-colorable for a fixed K. When K is 2, a linear algorithm that decides 2-colorability exists; in this case the interference graph must be a bipartite graph. For special interference graphs, linear or polynomial algorithms are known. There is a hierarchy of structure: for interval graphs optimal linear time algorithms exist, and for perfect graphs polynomial time algorithms are known. Hack et al. [34] show that interference graphs derived from SSA form are chordal. Chordal graphs are a sub-family of perfect graphs that contains the interval graphs. As for interval graphs, optimal linear time algorithms for coloring chordal graphs exist. But in general the interference graph does not exhibit enough structure for a polynomial time solution (unless P = NP). In fact, the exploitation of structure is the common thread in the register allocation literature. This can be seen in many examples like containment graphs, preference graphs, live range splitting etc., which can improve allocation results. It seems to be the quintessence of an NP-complete problem that the information needed to determine the next step towards its solution in any solver is equivalent to finding the solution itself. Attempts to add structure to the interference graph to make better allocation choices ("solution step") can consume a lot of energy (= compile time) for no clear gain. Perhaps the probabilistic allocator (Proebsting and Fischer [63]) is an illuminating example.

3.5.1 Definitions of Interference

The key property of interference is that two interfering live ranges cannot be assigned the same register. The register allocation literature uses two different definitions of interference, "definition" and "simultaneous" interference. Both are conservative, because they can result in allocating more registers than necessary. For Chaitin, two live ranges interfere when one is live at the definition of the other (or vice versa). The alternative is that two live ranges interfere when they are live simultaneously at any program point. Both definitions can be shown to be equivalent for strict programs. In strict programs a variable is defined on every path from the function entry to a use. For non-strict programs, two live ranges that are live simultaneously need not interfere in Chaitin's sense. In Figure 20 there are 5 basic blocks forming a cross and two live ranges, V1 and V2, which are clearly live simultaneously. But when the control flow graph is acyclic, V1 and V2 do not interfere based on Chaitin's definition, since V1 is not live at the definition of V2 and vice versa. But there are also cases where definition interference is too conservative: when two live ranges interfere in Chaitin's sense, they can still be assigned the same register when they hold the same value at all points of interference. "Value" interference could be relevant to reduce register pressure for programs in SSA form, where every variable has a single definition.

a) V1 and V2 Live Simultaneously: five basic blocks forming a cross; B1: V1=... and B2: V2=... both flow into B3, which branches to B4: ...=V1 and B5: ...=V2.
b) V1 and V2 Live at Definition: straight-line code; V1 = ...; V2 = ... (V1 live); ... = V2; ... = V1.

Figure 20 Definitions of Interference

3.5.2 Coloring Inequality

Let IG = (V, E) be an interference graph with a set of nodes V (variables, live ranges) and a set of interference edges E. Let Maxlive be the maximal number of simultaneously live variables at any program point. Let χ(IG) be the chromatic number of IG and ω(IG) the clique number of IG. Let M = max{Degree(v) : v ∈ V}.

Then the following coloring inequality holds:

Maxlive ≤ ω(IG) ≤ χ(IG) ≤ M + 1

Equation 4 Fundamental Coloring Inequality for Strict Programs

The coloring inequality is folklore. It defines lower and upper bounds for the number of colors needed to color the interference graph IG. Perfect graphs are defined by ω(H) = χ(H) for every induced subgraph H. Chordal graphs have the additional property Maxlive = ω(IG) (Bouchez et al. [10]). In non-strict programs the leftmost inequality is not necessarily valid.
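The rightmost bound M + 1 is constructive: coloring the nodes greedily in any order, a node of degree d has at most d colored neighbors, so one of d + 1 colors is always free. A small sketch:

def greedy_color(adj):
    color = {}
    for v in range(len(adj)):
        used = {color[w] for w in adj[v] if w in color}
        color[v] = next(c for c in range(len(adj[v]) + 1) if c not in used)
    return color

# Triangle: M = 2, and the bound M + 1 = 3 colors is tight here.
print(greedy_color([[1, 2], [0, 2], [0, 1]]))   # {0: 0, 1: 1, 2: 2}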


4 The Intel Itanium Compiler

The Intel Itanium C/C++/Fortran compiler employs state-of-the-art analyses and optimizations. It incorporates inter-procedural analysis and optimization (IPO), high-level optimizations (HLO) focusing on loop and data transformations, a global scalar optimizer (IL0) and an optimizing code generator (ECG). All optimization techniques can be profile-guided and tuned for the Itanium architecture. This chapter gives a high-level overview of the code generator, develops an intuitive understanding of modulo scheduling and predication, and reviews the register allocators. The first phase of ECG, translation, converts the optimizer's intermediate program representation (IL0) into a low-level intermediate representation (EIL), which closely models Itanium instructions. The four major optimizations in the code generator are the software pipeliner, the predicator, the global scheduler and the register allocator (Figure 21).

Pipeline: Translator -> Modulo Scheduler -> Predicator -> Global Scheduler -> Register Allocator -> Bundle + Emit, with the Predicate Database (PDB) and the Machine Model (MM) as shared interfaces.

Figure 21 Components and Flow of Itanium Compiler Backend

The four major phases interface with the machine model, which describes the microarchitecture in detail, and a predicate database, which contains relations between qualifying predicates. The most important relation is predicate disjointness. Two predicates are disjoint when they cannot both be true at any point in the program. In this respect pipelined loops are special. Associated with each software-pipelined loop is a predicate disjointness vector. It holds the predicate disjointness information for all qualifying (block) predicates in the loop, which is used during modulo scheduling. For non-pipelined code a single interface, the predicate query system (PQS), is used.

Time (cycles 0-27) runs down the page for iterations Iter 1 to Iter 5: cycles 0-7 form the Prologue phase, cycles 8-19 the Kernel phase and cycles 20-27 the Epilogue phase; within each iteration the symbolic register s is defined (s =) and used (... = s) three cycles later, e.g. in cycles 6 and 9 of the first iteration.

Figure 22 Five Iterations of Pipelined Loop with three Stages and II=4

Modulo scheduling is a loop scheduling technique modeled after a hardware pipeline with SC ("stage count") stages. Each stage has a height of II ("Initiation Interval") cycles.

It takes SC-1 stages to fill the pipeline. Filling happens in the Prologue phase, which starts a new iteration every II cycles. Symmetrically, it takes SC-1 stages to drain the pipeline. Draining happens in the Epilogue phase, which ends a single iteration every II cycles. In steady state (Kernel phase) the pipeline is full and SC iterations execute in parallel. Every II cycles a new iteration N > SC starts, and iteration N − SC ends. Figure 22 illustrates the concept of software pipelining with its three phases: Prologue, Kernel and Epilogue. It shows a loop with a single-iteration schedule length of 12 and five iterations (Iter 1, ..., Iter 5) of the loop. We assume the loop can be scheduled with an II of 4. Therefore the loop has a stage count SC = 3 (= schedule length / II = 12/4). In the Prologue phase a new iteration starts every II (= 4) cycles. In steady state (Kernel phase) the pipeline is full: three stages execute in parallel, each belonging to a different loop iteration. Pipelining is a throughput optimization. In the Kernel phase SC iterations execute in parallel. Assuming we execute 5 iterations, the Epilogue phase starts at cycle 20. In the Epilogue phase one loop iteration ends every II cycles. We briefly explain the relation between one loop iteration and the modulo schedule of a loop. Figure 23 shows one loop iteration and the modulo schedule of a three-stage loop with an II of 4. Each stage has its own stage predicate starting with p16, which is the first rotating predicate register. p16 is the stage predicate for the first stage, p17 for the second stage and p18 for the third stage. The stage predicates control the phases of the pipelined loop. Before execution of the pipelined loop p16 is set. So when the modulo scheduled loop runs, all instructions of the first stage of the first loop iteration execute. The loop branch instruction then sets p17 in addition to p16. In the second run the instructions of the second stage of the first loop iteration and the first stage of the second loop iteration execute. The next loop branch also sets p18, and execution of the pipelined loop leaves the Prologue phase and enters the Kernel phase². Entering the Epilogue phase, p16 is cleared first (= set to False), then p17, etc. The single-iteration schedule length is II times SC. The II is determined by two factors: available machine resources and data recurrences. In the pipelined loop at time T (referred to as "Modulo" in Figure 23) instructions of all stages with cycle time % II = T may execute in parallel (in the Kernel phase) and may not exceed the machine resources available per cycle ("resII"). For example, when T = 1, instructions of cycles 1, 5 and 9 from the single-iteration schedule execute in parallel in the modulo schedule (in the Kernel phase). In addition, the II must be large enough to cover the maximal dependence distance of all loop data, so the following inequality must hold: II × (iteration distance) >= maximal dependence cycle length ("recII"). When a value is computed in iteration k and used in iteration k+n, the recII guarantees that there are n loop iterations to cover the dependence distance between the definition and the use. Combining both resource and recurrence constraints gives MinII = Max(resII, recII), where MinII is the best II possible. IA-64 supports rotating registers, predication, and special loop branches (br.ctop, br.cexit, br.wtop, br.wexit) to enable kernel-only innermost modulo scheduled loops, where the kernel contains the prologue, steady state and epilogue phases controlled by stage predicates and state registers like ar.lc (loop count) and ar.ec (epilog count).

² It is possible that a pipelined loop does not enter the Kernel phase and still gives performance benefits.
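The MinII computation can be summarized in a few lines (a sketch with a hypothetical resource model; real pipeliners use much more detailed machine descriptions):

from math import ceil

def min_ii(uses_per_resource, units_per_cycle, recurrences):
    # resII: each resource bounds II by ceil(uses / units per cycle);
    # recII: each dependence cycle bounds II by ceil(latency / distance),
    # the rearranged form of II x distance >= cycle latency.
    res_ii = max(ceil(uses_per_resource[r] / units_per_cycle[r])
                 for r in uses_per_resource)
    rec_ii = max((ceil(lat / dist) for lat, dist in recurrences), default=1)
    return max(res_ii, rec_ii)

# 8 memory ops on 2 memory units, one recurrence of latency 6 spanning
# 2 iterations: MinII = max(ceil(8/2), ceil(6/2)) = 4
print(min_ii({"mem": 8}, {"mem": 2}, [(6, 2)]))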

Cycle  One Iteration Schedule   Modulo  Kernel Modulo Schedule
0      I000 I001 I002           0       (p18)I200 (p18)I201 (p17)I100 (p16)I000 (p16)I001 (p16)I002
1      I010 I011                1       (p18)I210 (p17)I110 (p17)I111 (p16)I010 (p16)I011
2      I020                     2       (p17)I120 (p17)I121 (p17)I122 (p16)I020
3      I030                     3       (p18)I230 (p18)I231 (p17)I130 (p17)I131 (p16)I030
4      I100
5      I110 I111
6      I120 I121 I122
7      I130 I131
8      I200 I201
9      I210
10
11     I230 I231

Legend: pipelined loop with 3 stages and II=4. Ismi is an Itanium instruction, where s: stage, m: modulo cycle (mod II), i: instruction nr. (0 <= i <= 5). (p16), (p17), (p18): stage predicates for stages 0, 1, and 2.

Figure 23 One Iteration and Kernel Schedule for a Loop

As for register allocation, live ranges that span multiple stages of a pipelined loop are allocated to rotating registers. The allocation of these live ranges to rotating registers is handled by the software pipeliner. There is one uncommon feature: since rotating registers are allocated in multiples of 8 on the register stack, the pipeliner uses them carefully. It leaves small live ranges that have a length less than II to the register allocator. As long as the live range is contained within a pipeline stage, all references are under the same stage predicate. But it can happen that a small live range crosses a stage. Then the definition and use are under different stage predicates. For example, in Figure 22 symbolic register s is defined in cycle 6 of stage 2 and used in cycle 9 of stage 3. The dependence distance is zero, since the definition and use are within the same iteration. Since the length of the live range of s is 3 (9−6=3), its length is less than II and it is left for allocation to the coloring allocator. Unless the coloring allocator is pipeline-aware, however, the references to s in the modulo schedule will look as if there is a use of s before there is a definition. The reason is this: the use is in cycle 9, so the modulo cycle of the use is 1 (9%4=1). On the other hand, the modulo cycle of the definition is 2 (6%4=2), so the use appears before the definition in the modulo schedule. Applying live variable analysis naively would allow the live range of s to escape to the outermost loop that contains the pipelined loop. Since cross-stage live ranges are well defined, the live variable analysis recognizes them and ensures that they cannot escape the pipelined loop. It stops live propagation into the predecessor block of the pipelined loop, but ensures the live range is live across the loop back edge.
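The seam condition for such small cross-stage live ranges is just a comparison of modulo cycles; for the example of s (defined in cycle 6, used in cycle 9, II = 4):

def use_precedes_def_in_kernel(def_cycle, use_cycle, ii):
    # True when the use appears before the definition in the modulo
    # schedule, so naive liveness would let the live range escape.
    return use_cycle % ii < def_cycle % ii

print(use_precedes_def_in_kernel(6, 9, 4))   # True: 9 % 4 = 1 < 6 % 4 = 2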

Predication is the conditional execution of an instruction guarded by a qualifying predicate. Using predication the compiler can merge multiple control flow paths into a single predicated region. Predication can remove mispredicted branches, may decrease code size, increase code motion flexibility, remove control flow in pipelined loops and increase ILP. The cost of if-conversion includes a possible increase in code size, an increase in schedule length and the potential for a mere transfer of a misprediction rather than its removal. The predicator in the compiler performs if-conversion in a series of steps. In a preparation step it assigns a predicate to each block using the RK algorithm [62], where R and K are the names of mappings used to determine control flow equivalence. The characteristic of the RK algorithm is that control-dependent equivalent nodes get assigned the same predicate. The next steps are region picking, benefit evaluation and if-conversion: First, it picks a region. In the Intel compiler all candidate regions are single-entry, multiple-exit regions. Second, it evaluates the benefits of if-converting the region by estimating the execution times for the predicated and the unpredicated (= control flow) region. The estimates take machine resources, schedules and feedback profiling data (static or dynamic) into account. Third, it materializes the predicates by generating compare instructions. Each instruction in the predicated region is predicated with the basic block predicate. The materialized block predicate becomes the qualifying predicate of the instruction. An example is in Figure 24. It shows a snippet of a control flow graph, which is a single-entry, single-exit region. Basic blocks B1 and B7 are control-dependent equivalent and get assigned the same block predicate, P1. The edges from block B2 to B4 and from B2 to B7 are critical, because B2 has more than one successor and each successor (B4 and B7 respectively) has more than one predecessor. The completion phase inserts new basic blocks Bx and By with predicates Px and Py respectively. The predicated code in this case is a basic block. In the general case, when the control flow graph region is a superblock with multiple exits, the predicated region is a hyperblock. All paths in the original control flow graph overlap and correspond to execution traces in the hyperblock. The instructions in the hyperblock are guarded by qualifying predicates. Proper predicate initialization ensures that each path in the control flow graph maps 1:1 to an execution trace in the predicated code. An execution trace in the predicated code is a set of predicated instructions where the qualifying predicate is set. For predicate initialization IA-64 provides unconditional compare instructions (exemplified by CMPU in Figure 24) together with regular compare instructions. When the qualifying predicate of an unconditional compare is clear, the predicate destinations are initialized with zero. Otherwise the behavior of an unconditional compare matches that of the corresponding regular compare. The global code scheduler picks acyclic regions for scheduling. Like the predicator it can nest scheduled regions and hoist instructions across regions. The code scheduler generates control- and data-speculated code as well as predicated code. It is a major phase in the compiler and responsible for extracting ILP. It is well described in [9] [8].


Control Flow Region with Block Predicates: B1 (P1: X < Y?) branches to B2 (P2: A=...; I > 5?) and B3 (P3: A=...); the inserted blocks Bx (Px) and By (Py) split the critical edges out of B2; B4 (P4: A=...; B=...) and B5 (P5: A=...; B=...) merge in B6 (P6); B7 (P1: ...=A; ...=B).
Predicated Code (approximate): (P1) CMP P2,P3 = ...; (P2) A=...; (P3) CMPU P4,P5 = ...; (P4) A=...; (P5) A=...; (P4) B=...; (P5) B=...; (P1) ...=A; (P1) ...=B.

Figure 24 Control Flow and Predicated Region

The high-level architecture of the register allocator is shown in Figure 25. The input to the register allocator is local and global virtual registers, as well as physical registers. A virtual register is local if all its references are confined to a single basic block. All virtual registers have a unique number. All live ranges are coherent: they cannot be split into disjoint components. Non-coherent live ranges are split into their disjoint parts and renamed before allocation.3 The allocator is region-based. There can be multiple region entries and exits. Each region is allocated by invoking a Chaitin-style coloring allocator. Allocations are reconciled at region boundaries. At region transitions reconciliation must follow store (register to memory), move (register to register) and load (memory to register) order. For example, if a candidate resides in a register in region 1 and in memory in region 2, and the transition is from region 1 to region 2, then the store must happen before any reconciliation move or load of any other candidate. There are two allocation steps: first, floating-point, predicate, and branch registers are allocated. Spilling such register candidates introduces integer register candidates. Second, the integer registers are allocated. After the allocations, the memory stack layout is finalized. The register stack layout is determined during allocation, but the alloc instruction is issued in a later phase. The following describes the features of the register allocator in terms of the categories of Section 3.3, although the actual implementation may differ from the descriptions.

3 With the exception of speculated live ranges

Main Flow: Region Builder; Data Flow Analysis; Allocate FR, PR, BR; Allocate GR; Spill Stack Layout.
Core Allocation, per region: Chaitin-Style Allocation, Data Flow Analysis, Reconciliation at Boundaries.
Legend: FR, PR, BR, GR: Floating-point, Predicate, Branch and General Register candidates

Figure 25 Register Allocator Architecture


Spill code reduction:
• The allocator supports Chaitin's heuristics, rematerialization, and pre-materialization. It has benefit-driven simplification for speculated live ranges and a set of heuristics for ordering candidates on the coloring stack, although not all candidates are pushed onto the coloring stack. The assignment phase is also benefit-driven and computes benefits for assigning scratch, preserved or stacked registers.
Scoping:
• The global register allocator is region-based and allocation starts with the innermost loops. Pipelined loops are allocated first. When there is no loop in the control flow graph, the entire function is the only allocation region. All candidates in the region are mapped to physical registers. The mapping is stored as a global register preference, which subsequent allocations try to honor.
• The inter-procedural register allocator is opportunistic. Register usage is recorded. The caller avoids spills of scratch registers at call sites that don't use the register, if the live range contains the call.
Extensions:
• The register allocator is predicate-aware, pipeline-aware (pipelined loops), speculation-aware and employs efficient allocation schemes for the register stack. Itanium has load-pair instructions requiring even-odd or odd-even allocation for the load-pair destinations, which the allocator takes into account.
Coalescing:
• The allocator implements a form of biasing. It has a set of prioritized preferences associated with each candidate.

In addition the code generator has a local register allocator that can allocate register candidates within a basic block. It can get invoked for functions that have huge memory requirements because of many candidates. Such functions are usually generated by program generators in large server applications. Since the memory requirement of the interference matrix grows quadratically with the number of register candidates, a routine with e.g. 200000 register candidates would require more than 0.3 Gb of memory just to represent the interference matrix. In huge functions, memory consumption is the reason for long compile times of a coloring allocator.
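The quadratic growth is easy to quantify; a back-of-the-envelope estimate for a triangular bit matrix (one bit per unordered candidate pair):

def matrix_gigabytes(candidates):
    bits = candidates * (candidates - 1) // 2
    return bits / 8 / 2 ** 30

# 200000 candidates: roughly 2.3 GB, consistent with the figure quoted above.
print(matrix_gigabytes(200_000))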


5 Exploring the Register Stack

On IA-64, each procedure has its own variable size register stack (Figure 5) with its own variable number of rotating registers. The alloc instruction specifies the register stack frame of a procedure: the number of incoming parameters (in), the number of local registers (loc) and the number of outgoing parameters (out). The total number of registers in a register stack is in+loc+out <= 96. Usually a register stack frame has up to 8 incoming argument register and 8 outgoing parameter registers. The architectural registers ar.pfs contains fields that describe the frame of the caller (³previous function state´) and is saved into the destination register of the alloc instruction for register stack unwinding. In the standard code generation model the compiler issues a single alloc instruction to create the register stack frame. The drawback of this method is that the frame must be large enough for the entire procedure. For example, if a procedure has two different calls with 2 and 8 outgoing parameters respectively, the alloc instruction needs to reserve 8= max(2,8) outgoing registers. This example can be generalized: when the alloc instruction reserves more stacked registers than the dynamic execution trace uses, unnecessary spills to the backing store (Section 2.5, p. 26) could result. IA-64 allows resizing the register stack by using more than one alloc instruction within a function (³multiple alloc´, Hoflehner and Pierce [38], Settle et al. [69], Hoflehner et al. [39] ). Again, the key here is that alloc instructions in different functions on the call stack increase the register stack, but multiple alloc instructions within a single function only resize its register stack frame, but do not add or remove frames. The effectiveness of the method is demonstrated in the example in Figure 26 (where the ar.pfs and alloc destination register are omitted). The register stack is at first increased by two functions on the call stack, foo and foo1, causing spilling to the backing store. Originally, the compiler generates a single alloc instruction (alloc rx=1,2,1,0) in foo and foo1. In the assembly listing we increase the number of stacked registers so that the back to back call of foo and foo1 incurs RSE spills. In the last step we shrink the register stack before the calls to foo1 and foo2. Inserting multiple alloc instructions that shrink the stack before any RSE traffic can be triggered gives a 6.5X speed-up on Itanium compared to the

The performance of the multiple alloc versions matches the original, unmodified version.

Code Snippets

    foo2(int i) { return; }
    foo1(int i) { foo2(i+2); }
    foo(int i)  { foo1(i+1); }
    main()      { int i; for (i = 1; i < 10000000; i++) foo(i); }

Original and modified register stack for foo() and foo1():

    Original              Increased Frame       Multiple Alloc
    alloc rx=1,2,1,0      alloc rx=1,50,1,0     alloc rx=1,50,1,0
    ...                   ...                   ...
                                                alloc ry=1,2,1,0
    br.call {foo1|foo2}   br.call {foo1|foo2}   br.call {foo1|foo2}
    ...                   ...                   alloc ry=1,50,1,0

Figure 26 Proof of Concept to show Effectiveness of Multiple Alloc

More details about register stack resizing are in Figure 27. The register stack is shrunk before a call site and restored to its original size afterwards. This can reduce the total number of registers consumed by the caller and callee and consequently the overall RSE traffic for the application. In the example, procedure foo uses 90 stacked registers. At the point of the call to bar, 60 registers on the stack are found dead and the register stack gets resized accordingly. Procedure bar uses 50 stacked registers. The combined register stack of foo and bar uses 80 registers. It fits into the register file and spilling to the backing store is avoided. Without resizing, the combined register stack would be 140 = 90 + 50 registers, resulting in 44 = 140 - 96 spills to the backing store. Dead registers on top of the stack are determined by live range analysis. If the number of dead registers on top of the register stack exceeds a given threshold, the register stack is reduced by the amount of dead registers before the call. Parameter registers have to be remapped so that they stay on top of the resized register stack. This parameter mapping is not shown in the example. There is no principal difficulty.

Since the register allocator does not know the size of the register stack, it cannot pre-determine the parameter (= out) registers. Therefore it assigns symbolic registers, which are mapped to the proper physical registers in a post-pass. With multiple alloc this post-pass has to determine the actual size of the register stack at the call site. The optimization is opportunistic in the sense that the compiler cannot have perfect knowledge of the state of the RSE when it inserts the extra alloc instructions. Specifically, the compiler does not know if the reduction of the register stack will actually decrease the RSE traffic at run-time. On the other hand, the cost of the optimization is extra alloc instructions, which have scheduling constraints (e.g. alloc must be the first instruction in an instruction group) and may contribute to an increase in code size.
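The following sketch combines the shrink decision with the post-pass remapping of symbolic parameter registers. The threshold, names and numbers are illustrative; only the mapping out<k> = r(32 + sol + k), where sol is the size of the local area (in + loc), follows the register stack layout described in this chapter:

    #include <stdio.h>

    /* out<k> sits right after the local area of the frame: r32 + sol + k */
    static int map_out_register(int sol, int k) { return 32 + sol + k; }

    int main(void) {
        int locals = 90, dead_on_top = 60, outs = 2, threshold = 8;

        int sol = locals;               /* size of locals (in + loc)        */
        if (dead_on_top > threshold)    /* shrink the frame before the call */
            sol = locals - dead_on_top; /* e.g. alloc ry = ar.pfs, 0,30,2,0 */

        for (int k = 0; k < outs; k++)
            printf("out%d -> r%d\n", k, map_out_register(sol, k));
        return 0;
    }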

    foo():                      bar():
    1: alloc rx=0,90,0,0        3: alloc ry=0,50,0,0
    2: alloc rx=0,30,0,0           ...
       br.call bar;;               br.ret;;
    4: alloc rz=0,90,0,0
       ...

[Plot: total number of stacked registers over time at points 1-4.]

Figure 27 Shrinking Register Stack before Call

Multiple alloc can be effective only on paths that contain a call or a return. The algorithm in the Intel compiler forms regions of different register pressure after register allocation. Extra alloc instructions are inserted at region boundaries when the difference of the register stack sizes exceeds a threshold. Candidates for regions with high register pressure are software pipelined loops, since many loop iterations can execute in parallel. In the Itanium compiler rotating register allocation is handled by the pipeliner. Register lifetimes that fit within II are passed on to the graph-coloring allocator. The pipeliner

controls the number of interferences and checks the availability of registers with a register server. This guarantees spill-free code in pipelined loops, since a) pipelined loops get allocated first and b) the pipeliner ensures that sufficient registers are available to allocate all candidates. The rotating registers within a procedure cannot simply be assigned to global virtual registers that span software-pipelined loops. Consider live range V1 in Example 3, which is defined before (line 2) and used after the loop (line 7). Assume the first loop executes eight times and r40 (r32-r39 are used in the first loop, which uses eight rotating registers) is assigned to V1. Then the register allocator would have to perform context-sensitive register renaming to ensure correctness. After the swp loop r40 has to be accessed as r48 because of register rotation (a small sketch of this renaming follows Example 3). For this reason the allocator avoids such assignments to live ranges spanning pipelined loops. It also has to assume that such a live range interferes with all rotating registers. This is different from usual interferences since rotating registers can be implicit. For example, in the loop only r32 may be visible. But, with 8 registers rotating, r32 represents all registers r32-r39, so registers r33-r39 are implicit and not directly 'visible' to the allocator.

    .proc foo
     1: V2 = alloc ar.pfs, 2, 88, 2, 32
     2: mov V1=          // mov r40=
     3: prolog1: ...
     4: loop1:           // swp loop with 8 rotating registers
     5:    br.cloop loop1
     6: epilog1: ...
     7: add V3=V4,V1     // add r41=r44, r48
     8: ...
     9: prolog2: ...
    10: loop2:           // swp loop with 32 rotating registers
    11:    br.ctop loop2
    12: epilog2: ...
    13: br.ret

Example 3 Two Pipelined Loops in Single Alloc Routine
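To make the renaming above concrete, here is a minimal sketch; it is not compiler code, and the offset direction is inferred solely from the r40/r48 comment in Example 3:

    #include <stdio.h>

    /* Logical register name that addresses the same physical register
       after k rotations, for the rotating region r32 .. r32+nrot-1. */
    static int rename_after_rotation(int n, int k, int nrot) {
        return 32 + ((n - 32 + k) % nrot);
    }

    int main(void) {
        /* Example 3: frame with 32 rotating registers, first loop rotates 8x */
        printf("r40 -> r%d\n", rename_after_rotation(40, 8, 32));   /* r48 */
        return 0;
    }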

There is another unusual aspect of pipelined loops. They need multiple alloc to remain spill-free after register allocation. In single alloc procedures, the code generator has to specify the maximal number of rotating registers any swp loop in the procedure demands. In Example 3 the code generator specifies 32 = max(8, 32) rotating registers in the alloc instruction,

since the first loop uses 8, while the second loop requires 32 rotating registers. Therefore the pipeliner may underestimate the register pressure for the candidates in the first loop, which in fact uses only 8 rotating registers. Since the pipeliner works on one loop at a time, it is unaware of the register requirements of subsequent loops. For example, with 8 rotating registers, the pipeliner assumes 88 = 96 - 8 registers are available for the allocator. But since the alloc instruction must specify 32 rotating registers, de facto the allocator has only 64 = 96 - 32 registers. This can result in spills in pipelined loops with a significant (run-time) performance cost.

     1: procedure insert_alloc_instr(BLOCK B, INT r)
     2:   before first definition of a
     3:   physical rotating register in B:
     4:     insert alloc rx=ar.pfs, i,l,o,r;
     5: endproc
     6: procedure insert_clrrrb_instr(BLOCK B)
     7:   after last use of a physical rotating register
     8:   in B:
     9:     insert clrrrb
    10: endproc
    11: procedure ma_for_swp_loops
    12:   foreach software_pipelined_prolog P
    13:     r = #rotating registers in pipelined loop;
    14:     insert_alloc_instr(P, r);
    15:   endfor
    16:   foreach software_pipelined_epilog E
    17:     insert_clrrrb_instr(E);
    18:   endfor
    19: endproc

Algorithm 1 Multiple Alloc for SWP Loops

Spilling in pipelined loops can be avoided with multiple allocs. An alloc instruction can resize the number of rotating registers [13 Vol. 1, Section 4.2]. This can be used by the register allocator to adapt to the needs of rotating registers in a pipelined loop. In procedures that contain multiple swp loops with various numbers of rotating registers, Algorithm 1 inserts alloc and clrrrb instructions in the prolog and epilog of swp loops. The clrrrb instruction resets the rotating register base. This is necessary whenever an alloc instruction can dynamically follow in the same procedure. Otherwise the RSE could fault at the alloc. This also means that in general a live range spanning a clrrrb instruction cannot reside in a rotating register. Wrapping the pipelined loop with an alloc instruction ensures that the register allocator can match the estimate of the pipeliner and avoid spills in the loop.

Example 4 shows the effect of Algorithm 1. The swp loops are encapsulated by alloc/clrrrb instructions. As only 8 registers are rotating in the first loop, the register allocator can use 26 more registers, reducing the overall register pressure in the procedure. Register r40 can be assigned to V1 and r41 can be assigned to V2.

    .proc foo
     1: V2 = alloc ar.pfs, 2, 62, 2, 0
     2: mov V1=          // mov r40=
     3: prolog1: V = alloc ar.pfs, 2, 62, 2, 8
     4: loop1:           // swp loop with 8 rotating registers
     5:    br.cloop loop1
     6: epilog1: clrrrb
     7: add V3=V4,V1     // add r43=r44, r40
     8: ...
     9: prolog2: V = alloc ar.pfs, 2, 62, 2, 32
    10: loop2:           // swp loop with 32 rotating registers
    11:    br.wtop loop2
    12: epilog2: clrrrb
    13: ...
    14: br.ret

Example 4 SWP Loops with Multiple Alloc

There is a cost to the extra alloc instructions: the alloc instruction must be the first instruction in an instruction group, it is WAW-dependent on a call, and it needs a result register.


6 Register Allocation for Predicated Code

The IA-64 architecture is a fully predicated architecture with 64 predicate registers [42]. Each instruction (with some exceptions like the alloc instruction) is guarded by a qualifying predicate. If the qualifying predicate is clear (= set to zero or False), the instruction is discarded in the write-back stage of the processor pipeline. If the qualifying predicate is set (= set to one), the instruction is retired. This means the results are committed to architectural state. A fully predicated architecture supports if-conversion, an optimization that eliminates forward branches (Allen et al. [2]). If-conversion transforms a region of the control flow graph to linear ("predicated") code. In this predicated region all execution paths of the original control flow region overlap. Instructions that were control-dependent on a branch that gets eliminated become data-dependent on the qualifying predicates introduced during if-conversion. The compiler picks a single-entry acyclic control flow region as a candidate for if-conversion. Basic blocks where the paths originating from the single entry meet are merge points and mark a potential end node for the region former. The candidate region may have multiple exits, so in general it can be considered to be a superblock. The compiler limits the size of the candidate region by setting a threshold for the number of basic blocks that can be in the region. The decision whether or not to if-convert a candidate region is driven by a predication oracle. It computes and compares the estimated execution times of the predicated and the control flow version of the candidate region. When the estimated execution time for the predicated region is smaller than the estimated execution time for the original control flow region, the candidate region is if-converted into a hyperblock (Mahlke et al. [56]). The register allocator must handle predicated code formed from control flow graph regions. This chapter investigates the impact of predicated code on the register allocator, discusses the predicate query system (PQS), classifies predicated live ranges and interference tracking, and presents a family of predicate-aware register allocators. In particular the "use-plus-partition" allocator is equivalent to a PQS-based allocator.
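A toy model of such a predication oracle (a sketch; all cycle estimates, probabilities and the misprediction penalty are illustrative, not the Intel compiler's cost model):

    #include <stdio.h>

    int main(void) {
        double t_then = 4.0, t_else = 2.0, p_then = 0.5;   /* path estimates  */
        double miss_penalty = 12.0, p_miss = 0.2;          /* branch behavior */

        /* expected time of the branchy version vs. the hyperblock */
        double t_cfg  = p_then * t_then + (1.0 - p_then) * t_else
                      + p_miss * miss_penalty;
        double t_pred = 5.0;   /* all paths overlap in the predicated region */

        printf("if-convert: %s\n", t_pred < t_cfg ? "yes" : "no");
        return 0;
    }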

6.1 Impact of Predicated Code

On a predicated architecture, completeness and soundness of live variable and disjoint live range information are harder problems than for unpredicated code. For example, for live variables, completeness means that at any program point where a variable is actually live, the liveness computation reports it as live. Soundness means that at any point where the liveness computation reports a variable as live, it is actually live. The remainder of this section discusses live range extension, interference graph construction for predicated live ranges, and global disjointness information. In unpredicated code a definition is the start of a live range. This is not necessarily true for predicated code. In Example 5 the predicated live range for virtual register V1, which corresponds to variable "a" in the source code, must span lines 1 to 7. This shows that a predicated definition of V1 (line 4) is not necessarily the start of the live range. Otherwise, the live range for V1 would incorrectly extend only from line 4 to 7 in the predicated code.

    Source Code        IR with Predicated Code    LR for V1
    1: a=5;            mov V1=5                   |
    2: ...                                        |
    3: if (cond)       cmp P1,p0 = cond           |
    4:   a = 1;        (P1) mov V1=1              |
    5: ...                                        |
    6: ...                                        |
    7: c+=a;           add V2=V2,V1               |

Legend: IR Intermediate Representation LR Live Range

Example 5 Source Code, Predicated Code and Predicated Live Range

On the other hand, if no predicated definition is the start of a live range, predicated live ranges would extend to more program points than necessary, increasing register pressure and register consumption. All predicated live ranges would behave like live ranges of undefined or partially defined variables in unpredicated code. A variable is partially defined if there is (at least) one definition-free path from the function entry to a use and (at least) one other path that contains a definition. Depending on the run-time execution path, the variable is defined or undefined.

In cyclic code, the live range of a partially defined variable typically spans the entire loop nest that contains the definition. This is a result of the dataflow algorithms that determine a live range. Available variable analysis (forward) propagates the is_available property to every point in the loop nest. Live variable analysis (backward) propagates the is_live property from the use to every point in the loop nest. Thus the live range, which consists of all program points where a variable is both available and live, spans the entire loop nest. This is correct since usually the variable is defined in the first iteration and used in all subsequent iterations. Since the variable is live across the entire loop nest, it interferes with all variables in the loop. Therefore it will not be "destroyed" after being defined in the first iteration.
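The following sketch illustrates just the stated rule, live range = available AND live, with hand-fed vectors; a real compiler derives both vectors with iterative dataflow:

    #include <stdio.h>

    int main(void) {
        /* program points 0..5 of a small loop body; vectors hand-fed */
        int available[6] = { 0, 1, 1, 1, 1, 1 };  /* forward: after the def   */
        int live[6]      = { 1, 1, 1, 1, 1, 0 };  /* backward: up to last use */

        for (int p = 0; p < 6; p++)
            if (available[p] && live[p])
                printf("point %d is in the live range\n", p);
        return 0;
    }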

    Source Code        IR with Predicated Code    LR for V1
    1: ...
    2: loop:           loop:                      | (extended)
    3: ...                                        |
    4: if (cond)       cmp P1,P2=cond             |
    5:   a=2;          (P1) mov V1=2              | | (correct)
    6: else a=1;       (P2) mov V1=1              | |
    7: ...                                        | |
    8: c+=a;           add V2=V2,V1               | |
    9: ...                                        |
    10: goto loop;     br.cond loop               |

Legend: IR Intermediate Representation LR Live Range

Example 6 Live Range Extension in Predicated Code

In acyclic code, live range extension cannot occur because a live range cannot extend to a program point where it is not available. Example 6 illustrates live range extension in cyclic code for the predicated live range of V1: unless a predicated definition is recognized as the start of the live range for V1, it will extend across the entire loop nest (the large live range from lines 2 to 10). The correct live range is the small live range from line 5, the first predicated definition of V1, to line 8, the use of V1. Live range extension for predicated code could also cause non-termination of a coloring allocator: when the allocator spills a predicated live range, it introduces one or more new predicated live ranges replacing the original.

But the new live ranges share the same predicate. Due to live range extension the interferences in the next round of allocation may actually increase. Since the new live ranges introduced for spilling are marked as non-spillable, the allocator may no longer find spill candidates in the simplification phase. At this point the allocator would have to give up. So there is not only potential performance degradation from extra register pressure, but also a stability reason why a predicate-aware allocator must recognize the start of predicated live ranges. This impacts two building blocks of the allocator: live range analysis (in particular, live variable analysis) and interference graph construction. In unpredicated code interference is a function of liveness. In predicated code interference is a function of liveness and predicate disjointness. Disjoint live ranges do not interfere and can be assigned the same register. The allocator queries a PDB for disjointness information when it constructs the interference graph: if the sets of predicates that guard two live ranges A and B are disjoint, no interference edge needs to be added between them. Interference graph construction for predicated code is similar to unpredicated code: it is a backward scan of the instructions in a basic block with a single, unpredicated live vector initialized with the live-at-exit candidates. For each candidate, the predicates under which the candidate is live can be recorded as a predicate set associated with the live range in a separate table. The interference routine takes the qualifying predicate of the current instruction that defines a candidate, the candidate and the live vector as arguments. For each live candidate in the live vector it checks if the qualifying predicate is disjoint from all predicates in its predicate set. Interferences are recorded in the interference graph. Each candidate defined in the current instruction is removed from the live vector when it is the start of the live range. Each candidate used in the current instruction is added to the live vector and the qualifying predicate is recorded in its predicate set. The candidate is live under the predicates in its predicate set. The candidate is dead when its predicate set is empty. Algorithm 2 shows the code for interference graph construction for predicated live ranges.

     1: procedure interfere(PREDICATE qp, VAR v, SET L)
     2:   foreach member s in L
     3:     P = s.predicates;  // set of predicates under which s is live
     4:     if (any p in P and qp are NOT disjoint)
     5:       add_interference(v, s);  // not shown
     6:     fi
     7:   endfor
     8: endproc
     9: procedure add(PREDICATE qp, VAR v, SET L)
    10:   - add v to L and update the set of predicates
    11:     under which v is live.
    12: endproc
    13: procedure delete(PREDICATE qp, VAR v, SET L)
    14:   - update the set of predicates under which
    15:     v is live.
    16:   if (isEmpty(predicate set of v))
    17:     remove v from L;
    18:   fi
    19: endproc
    20: procedure build_interference_graph()
    21:   foreach basic block B
    22:     L = live_out[B];
    23:     foreach instruction I backwards
    24:       qp = qualifying_predicate(I);
    25:       foreach definition d in I
    26:         delete(qp, d, L);
    27:       endfor
    28:       foreach definition d in I
    29:         interfere(qp, d, L);
    30:       endfor
    31:       foreach use u in I
    32:         add(qp, u, L);
    33:       endfor
    34:     endfor  // foreach instruction
    35:   endfor    // foreach basic block
    36: endproc

Algorithm 2 Interference Graph Construction for Predicated Code

Finally, a predicate-aware live variable analysis must treat predicated live ranges conservatively across back edges in any practical scenario. For example, in Figure 28 variable B is live under P2 and variable A under predicate P1. In one scenario, P1 could be true in the first iteration and false in the second. If B were live only under P2, the allocator would recognize A and B as disjoint and could assign them the same physical register. In this case the assignment to A in the first iteration would overwrite B, which is used in the second iteration. When the live range of B becomes live under p0 (the True predicate) along the back edge, then, since p0 interferes with every predicate, the live ranges for A and B are no longer disjoint.

    Loop body:                  Notes:
      V=...                     - P1 and P2 are disjoint.
      B=...                     - Assume P1 is true in the first iteration
      cmp P1,P2=V,r0              and P2 is true in the second.
      (P1) A=...                - If A and B are assigned the same register,
      (P1) ...=A                  the use of B under P2 actually uses the
      (P2) ...=B                  value of A from the previous iteration.
      (back edge to loop top)   => Propagate the liveness property under p0
                                   across back edges. In irreducible graphs,
                                   propagate the liveness property under p0
                                   across retreat edges.

Figure 28 Liveness must be propagated under "p0" across Back Edges

For general graphs, global disjointness can be represented like interference in a triangular matrix of size O(|B|²), where |B| is the number of basic blocks in the routine. Global disjointness calculation is a reaching-definition analysis for block predicates on the acyclic control flow graph. This graph is derived from the original control flow graph by removing back edges. Attention must be paid to irreducible graphs, which have retreat edges that are not back edges. The prototype of an irreducible graph is the triangle graph in Aho [1]. Removing retreat edges gives an acyclic graph, but disjointness is not necessarily consistent with local disjointness, which is based on an arbitrary acyclic region in the graph. Irreducible graphs and disjointness are discussed in more detail in Appendix 12.2.
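A minimal sketch of such a packed triangular bit matrix (an illustration; a production allocator would wrap this in its own bitset machinery):

    #include <stdio.h>
    #include <stdlib.h>

    static unsigned char *m;   /* packed triangular bit matrix */

    /* index of the unordered pair (i, j), i != j, row-by-row triangle */
    static size_t pair_index(int i, int j) {
        if (i < j) { int t = i; i = j; j = t; }
        return (size_t)i * (i - 1) / 2 + (size_t)j;
    }

    static void set_disjoint(int i, int j) {
        size_t x = pair_index(i, j);
        m[x >> 3] |= (unsigned char)(1u << (x & 7));
    }

    static int is_disjoint(int i, int j) {
        size_t x = pair_index(i, j);
        return (m[x >> 3] >> (x & 7)) & 1;
    }

    int main(void) {
        int nblocks = 8;   /* number of basic blocks |B| */
        size_t bits = (size_t)nblocks * (nblocks - 1) / 2;
        m = calloc((bits + 7) / 8, 1);
        set_disjoint(2, 3);                 /* e.g. disjoint then/else blocks */
        printf("%d\n", is_disjoint(3, 2));  /* prints 1 */
        free(m);
        return 0;
    }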

6.2 Predicate Partition Graph (PPG) and Query System (PQS)

The predicate query system (PQS) is one possible implementation of the PDB referred to in Figure 21. In predicated code the allocator and dataflow algorithms must reason about predicates. The interference computation phase of the allocator and live variable analysis must find the start of a live range. Interference computation must recognize disjoint live ranges. The predicate query system provides predicate information to solve both problems. It is a set of predicate query routines on the predicate partition graph (PPG). The PPG is a directed acyclic graph whose nodes represent predicates and whose labeled edges represent partition relations between predicates. A partition P = P1 | P2 is

represented by labeled edges P →(r) P1 and P →(r) P2. The common label r indicates that both edges belong to the same partition (Johnson and Schlansker [45]). The partitions represent execution paths and are derived in one traversal of the control flow graph. The PPG is built by the following rules (a sketch of the successor rule follows the list):
• The start block is assigned the root predicate, e.g. P0. The construction assumes there is always one start block for the CFG of the predicated region. The root predicate is unique.
• Successor Rule: For each block in the graph that has two or more successors, the partition P → P1 | P2 | ... | PN is added to the graph, where P is the block predicate of the block and the predicates Pi (i = 1, ..., N) are the block predicates corresponding to the successors ("forward edges" P → Pi).
• Predecessor Rule: Similar to the successor rule, except that the partition is added when a block has two or more predecessors ("backward edges" P → Pi). For distinction, backward edges are shown as dashed edges in the graph.
• Completion Rule: When P → P1 | P2 is a partition and P is not reachable from the root, the graph is incomplete. If P1 and P2 are reachable from the root, there is a lowest common ancestor of P1 and P2, lca(P1, P2). Since P is reachable from lca(P1, P2), a partition lca(P1, P2) = P | Q1 | ... | Qk can be created, where the union of the Qj, j = 1, ..., k, is the relative complement of P with respect to lca(P1, P2).
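A sketch of the successor rule only, on hand-coded data in the spirit of the running example (the predecessor and completion rules would follow the same pattern):

    #include <stdio.h>

    #define MAXSUCC 4
    typedef struct { const char *pred; int nsucc; const char *succ[MAXSUCC]; } Block;

    int main(void) {
        /* block predicates and successor predicates of the running example */
        Block blocks[] = {
            { "P1", 2, { "P2", "P3" } },   /* B1 -> B2 | B3 */
            { "P2", 2, { "P4", "P5" } },   /* B2 -> B4 | B5 */
            { "P3", 2, { "Px", "P6" } },   /* B3 -> Bx | B6 */
        };
        int n = sizeof blocks / sizeof blocks[0], label = 1;

        for (int b = 0; b < n; b++) {
            if (blocks[b].nsucc < 2) continue;   /* successor rule needs a split */
            printf("partition \"%d\": %s ->", label++, blocks[b].pred);
            for (int s = 0; s < blocks[b].nsucc; s++)
                printf(" %s%s", blocks[b].succ[s],
                       s + 1 < blocks[b].nsucc ? " |" : "");
            printf("\n");
        }
        return 0;
    }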

Intuitively the PPG can always be completed. For simplicity our running example does not require the completion rule. But an example of a PPG that does will be given later (see Figure 38). PQS queries are based on the complete PPG and on the interpretation of predicates as sets: each predicate represents an execution set, which is a set of execution traces for which it is true. The execution traces in if-converted code correspond to paths in the original control flow graph. The interpretation of a predicate as a set makes set relations like subset, intersection etc. available for predicates. The implementation of PQS is based on the set interpretation (see Appendix 12.3). There are two preparation steps before the partition graph is built: first, the control flow graph is completed. Completion is necessary for the uniqueness of the predicate partitions and the preciseness of disjointness. Completion requires one pass over the control flow graph and inserts empty basic blocks on critical edges. A critical edge is defined as follows: if basic block B1 has two or more successors and basic block B2 has two or more predecessors, then the edge B1 → B2 is critical. The inserted basic block is referred to as a JS ("Join-Split") block. Second, a block predicate is assigned to each basic block. For this, the compiler uses the RK algorithm. The characteristic of the RK algorithm is that it assigns the same block predicate to a set of control-equivalent basic blocks. Informally, two basic blocks B1 and B2 are control-equivalent when B1 executes whenever B2 executes and vice versa. Using a single predicate for a class of control-equivalent blocks results in a more compact PPG representation of the predicate relations derived from the control flow graph. We use a rolling example, which is a more elaborate version of the example in Johnson and Schlansker [40], to illustrate the predicate partition graph and PQS. Figure 29 shows the control flow graph and source code snippets. Associated with each basic block B is a block predicate P. Control-equivalent basic blocks are assigned the same block predicates. For example, B2 and B7 are control-equivalent. So are B1 and B8. The edge from B3 to B8 is critical, and the completion phase inserted JS block Bx on the edge. A JS block splits a critical edge. The acyclic predicate partition graph corresponding to the control flow graph is in Figure 30. Partitions (edges "1"), (edges "5") and (edges "3") are forward partitions, partition (edges "4") is a backward partition, and partition (edges "2") is both a forward and a backward partition.

The corresponding if-converted code is in Figure 31, which also shows the live ranges under PQS.

    B1 (P1): D=...; x?                      -> B2, B3
    B2 (P2): a=...; C=...; a==3?            -> B4, B5
    B3 (P3): A=...; b=...; b>10?            -> Bx, B6
    B4 (P4): A=...; B=...; B1=...; ...=C    -> B7
    B5 (P5): A=...; B=...; B1=...; ...=C    -> B7
    Bx (Px): (JS block)                     -> B8
    B6 (P6): ...                            -> B8
    B7 (P2): ...=B; ...=B1                  -> B8
    B8 (P1): ...=D; ...=A; ...=B; i<5?      -> B9, B10
    B9 (P7)                                 B10 (P8)

Figure 29 Example with Control flow Graph and Source Code Snippets

[PPG drawing: root P1; second level P2, P7, P8, P3; third level P4, P5, Px, P6. Forward edges are solid, backward edges dashed.]

Root: P1
Forward Partitions: P1 → P2 | P3 ("1"), P2 → P4 | P5 ("2"), P3 → Px | P6 ("3"), P1 → P7 | P8 ("5")
Backward Partitions: P1 → P2 | Px | P6 ("4")
Completed Partitions: none.

Figure 30 Predicate Partition Graph (PPG) for Example in Figure 29

Both live variable analysis and interference calculation use PQS queries that walk the predicate partition graph (PPG) to compute accurate liveness information at each instruction. Figure 31 shows four predicated live ranges A, B, B1 and C and their predicate sets during a backward traversal of the instructions 1-20 in the if-converted code fragment of the rolling example. The figure shows the predicate set at the entry of each instruction for each variable. The predicate set is derived from the predicate set at the exit of the instruction and the qualifying predicate of the instruction. The algorithms PQS uses to find the predicate sets are in Figure 32, which specifies the "add" and "delete" interfaces the general predicate-aware interference build routine uses (see Algorithm 2). The PQS queries that traverse the PPG to find the predicate sets for candidates are listed in Appendix 12.3.

    Predicated Code                    A           B           B1       C
     1: (P1) D=                        {}          {P3}        {}       {}
     2: (P1) cmp P2,P3=(x?)            ...
     [instructions 3-8 are not recoverable from the source; at the entry
      of instruction 9 the sets are {P3}, {P3}, {}, {P4, P5}]
     9: (P4) A=...                     {P3}        {P3}        {}       {P4, P5}
    10: (P4) B=...; (P4) B1=...        {P4, P3}    {P3}        {}       {P4, P5}
    11: (P4) ...=C                     {P4, P3}    {P4, P3}    {P4}     {P4, P5}
    12: (P5) A=...                     {P4, P3}    {P4, P3}    {P4}     {P5}
    13: (P5) B=...; (P5) B1=...        {P1}        {P4, P3}    {P4}     {P5}
    14: (P5) ...=C                     {P1}        {P1, P2}    {P2}     {P5}
    15: (P2) ...=B; (P2) ...=B1        {P1}        {P1, P2}    {P2}     {}
    16: (P6) ...                       {P1}        {P1}        {}       {}
    17: (P1) ...=D                     {P1}        {P1}        {}       {}
    18: (P1) ...=A                     {P1}        {P1}        {}       {}
    19: (P1) ...=B                     {}          {P1}        {}       {}
    20: (P1) cmp P7,P8=(i!=5)          {}          {}          {}       {}

• Variable A is a "partition" live range. At instruction 12 A is live under P1 and defined under P5. Since P1 = P2 | P3 = P3 | P4 | P5, A is live under P3 and P4 at the beginning of instruction 12. After (reading backwards) the definition of A under P4 in instruction 9, A is live under P3. Finally, instruction 6 is the start of the live range of A.
• Variable B is a "partition" live range. At the definition under P5 in instruction 13, it is live under P1 and P2. Since P1 = P3 | P4 | P5 and P2 = P4 | P5, B is live under P3 and P4 at the beginning of instruction 13. Since there is no definition under P3, B is live at the entry of the predicated region under P3.
• Variable B1 is a "partition" live range similar to A or B.
• Variable C is a "dominate" live range. The definition under P2 in instruction 4 is the start of the live range since P2 = P4 | P5 (so P2 dominates P4 and P5).

Figure 31 Predicated Live Ranges under PQS

PQS is powerful, but has costs. First, it requires the construction of the predicate partition graph, which, although linear in space and time in the number of basic blocks, consumes extra memory. Second, unlike classical live variable analysis, which operates on basic blocks, PQS-based predicate live variable dataflow operates on instructions, which requires customized dataflow routines. Finally, PQS queries get invoked at every predicated instruction during the backward traversal of interference graph construction (Algorithm 2). These compile time, implementation and maintenance costs motivate the search for alternatives.

    procedure mark_live(SET P', VAR x)
      foreach p' ∈ P'
        if (p' ∈ B)
          live[p'] += x;
        else
          foreach pb ∈ B
            if (!IsDisjoint(p', pb))
              live[pb] += x;
            fi
          endfor
        fi
      endfor
    endproc

    procedure add(PREDICATE qp, VAR x)
      PSET P, P';
      // Collect all predicates p in basis B such that x is live under p.
      P = { p ∈ B | x live under p };
      if (qp ∈ P) return;       // x live under qp already.
      P' = lub_sum(qp, P);
      mark_live(P', x);
    endproc

    procedure delete(PREDICATE qp, VAR x)
      PSET P, P';
      // Collect all predicates p in basis B such that x is live under p.
      P = { p ∈ B | x live under p };
      if (qp ∈ P)
        live[qp] -= x;          // x dead under qp.
      P' = lub_diff(qp, P);
      // kill x under all predicates in B
      foreach pb in B
        live[pb] -= x;
      endfor
      mark_live(P', x);
    endproc

Figure 32 Add and Delete Routines in PQS-based Allocator

6.3 A Family of Predicate-Aware Register Allocators

This section assumes all predicated code is compiler-generated. We propose predicate-aware allocation schemes based solely on classical techniques. This is based on the observation that computing partitions based on PQS at every instruction during interference graph construction and live variable analysis is not necessary for all live ranges. Specifically, PQS partitions are not necessary when either the qualifying predicates for the definitions and uses of a live range match or a definition dominates all uses. "Dominance" is derived from the control flow graph: since each instruction has a predicate and the predicate is a block predicate, predicate dominance is control flow dominance. For other live ranges, partitions can be pre-computed at each use on demand. Building and repeatedly querying the PPG is not necessary. There are four fundamental relations between predicated definitions and uses (Figure 33). Predicated live ranges are classified based on the original control-flow region the predicated code is derived from. It is important to keep the correspondence between blocks and qualifying predicates in mind. A definition is clearly the start of the live range when the qualifying predicates of the definition and use match. This is also the case when the qualifying predicate of the definition dominates the qualifying predicates of the uses. When multiple definitions reach a use, two cases are possible. First, when the definitions form a partition, the qualifying predicates of the definitions are mutually disjoint and the first definition in the hyperblock is the start of the live range. In this case the allocator gets precise disjointness by tracking liveness under all predicates that reach the use. Therefore it can track liveness under all definition predicates reaching a use rather than the qualifying predicate of the use (instruction). For example, in the partition case in Figure 33, instead of tracking liveness under P3, recording liveness under P1 and P2 would give precise disjointness information. In this scenario, the definition of V under P2 (or P1) would kill liveness under P2 (or P1). Any subsequent (in the backward traversal) variables (defined or used) under P2 (or P1) do not interfere with V. This would not be the case if the live range were tracked using P3, unless a system like PQS partitioned P3 at the definition of V qualified under P2. Second, when the definitions don't form a partition ("overlap"), recording liveness under the reaching predicates would find the start of the live range, but disjointness would be conservative.

For example, variables under qualifying predicate P2 could interfere with V, although V might have been killed under P2, since V would be live under P1, too (see "overlap" in Figure 33). The live range for a variable defined in the region and live beyond it is completed (= made strict relative to the region) by adding pseudo definitions into region blocks based on two rules: first, if the variable V is live at the entry of two successors, follow both paths. Second, if block B1 has two successors, B2 and B3, and variable V is live at entry in B2, but dead at entry in B3, insert a pseudo definition at the beginning of B2. The pseudo definition does not start the live range, but splits it into separate components. We note that the predicated live ranges can be classified based on the original control flow graph.

    match          dominate         partition               overlap

    (P1) V=        (P1) V=          (P1) V=    (P2) V=      (P1) V=
                                                            (P2) V=
    (P1) =V        (P2) =V          (P3) =V                 (P3) =V

Figure 33 Fundamental Relations between Predicated Definitions and Uses

Our rolling example (Figure 30) illustrates the four fundamental relations. Live ranges D, a, and b are defined and used under a single predicate ("match"). In live range C the definition under (P2) dominates the uses under (P4) and (P5) ("dominate"). In live range A the qualifying predicates of the definitions (P3, P4 and P5) form a partition for the use under (P1) ("partition"). Live range B has uses under (P1) and (P2). The qualifying predicates (P4) and (P5) form a partition for (P2) ("partition"). For (P1) there is no partition. Since B is live at the entry of the predicated region, there is a pseudo definition of B in block B3 under (P3), since B is live at entry in block B3, but dead at entry in block B2. The start of predicated live ranges can be found by performing live variable analysis before if-conversion. After a region is if-converted, first definitions can be marked in a forward sweep over all instructions in the (linear) if-converted region, starting at the first

instruction in the region entry block with the live-at-entry vector and recording definitions: if a variable is defined under a predicate, the variable is not live-at-entry and no other definition of the variable has been seen, this must be the first definition (a sketch of this sweep follows Figure 34). Based on the types of predicated live ranges, three strategies for predicate-aware register allocation can be defined that model predicated live ranges with increasing accuracy:

Strategy 1: Dominate-or-Match
The qualifying predicates of instructions that use a variable form the predicate set for the variable. For live ranges with matching qualifying predicates for definition and uses, interference is precise. This is also true when the definition predicate dominates all use predicates.

Strategy 2: Partition Tracking
In addition to Strategy 1, live ranges are recorded under the qualifying predicates of the (possibly pseudo) definitions that reach a use, if this set is a partition and the qualifying predicate of the use post-dominates each definition predicate. The qualifying predicates at the definitions are either in the partition or, in case the predicate is from a pseudo definition, dominate a partition predicate. Figure 34 shows the predicated code from our example and considers two live ranges A and B to illustrate strategy 2. For live range A, {P3, P4, P5} reach the use under (P1). Since P3, P4 and P5 are mutually disjoint and the use post-dominates the definitions, this partition is the predicate set at the use of A. Live range B is similar to live range A, except that B is completed by an (implicit) pseudo definition at the entry of block B3. Completion ensures that all live ranges with uses in the region are strict and enables partition formation at uses. A live range is strict when there is a definition on every path to a use. After if-conversion each read operand ("use") is augmented with a list of qualifying predicates that represent the qualifying predicates of its reaching definitions. Strategy 2 relies on reaching-definition analysis per predicated region. When more than one definition predicate reaches a use and the predicates are disjoint, the partition that represents the reaching qualifying predicates is recorded at each use. In this case, the original (qualifying) predicates at the definitions are either in the partition or dominate a partition predicate.

    Predicated Code                    A             B             B1        C
     1: (P1) D=                        {}            {P3}          {}        {}
     2: (P1) cmp P2,P3=(x?)            ...
     [instructions 3-8 are not recoverable from the source; at the entry
      of instruction 9 the sets are {P3}, {P3}, {}, {P4, P5}]
     9: (P4) A=...                     {P3}          {P3}          {}        {P4, P5}
    10: (P4) B=...; (P4) B1=...        {P3, P4}      {P3}          {}        {P4, P5}
    11: (P4) ...=C                     {P3, P4}      {P3, P4}      {P4}      {P4, P5}
    12: (P5) A=...                     {P3, P4}      {P3, P4}      {P4}      {P5}
    13: (P5) B=...; (P5) B1=...        {P3, P4, P5}  {P3, P4}      {P4}      {P5}
    14: (P5) ...=C                     {P3, P4, P5}  {P3, P4, P5}  {P4, P5}  {P5}
    15: (P2) ...=B; (P2) ...=B1        {P3, P4, P5}  {P3, P4, P5}  {P4, P5}  {}
    16: (P6) ...                       {P3, P4, P5}  {P3, P4, P5}  {}        {}
    17: (P1) ...=D                     {P3, P4, P5}  {P3, P4, P5}  {}        {}
    18: (P1) ...=A                     {P3, P4, P5}  {P3, P4, P5}  {}        {}
    19: (P1) ...=B                     {}            {P3, P4, P5}  {}        {}
    20: (P1) cmp P7,P8=(i!=5)          {}            {}            {}        {}

• Variable A is a "partition" live range. The use of A in instruction 18 is reached by the definitions under P3, P4 and P5, which form a partition of the qualifying predicate P1. Liveness of A is tracked under P3, P4 and P5.
• Variables B and B1 are "partition" live ranges similar to A. The live range of B is "completed" with a pseudo definition under P3 in block B3; B is live at the entry of the predicated region under P3.
• Variable C is a "dominate" live range. The definition under P2 in instruction 4 is the start of the live range since P2 = P4 | P5 (so P2 dominates P4 and P5).

Figure 34 Predicate-aware Allocation with Partitions
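The forward sweep that marks first definitions, described before Strategy 1, can be sketched as follows (operands and the live-at-entry vector are simplified; names are illustrative):

    #include <stdio.h>

    #define MAXVARS 8
    typedef struct { int var; int is_def; } Op;   /* one operand per instruction */

    int main(void) {
        int live_at_entry[MAXVARS] = { [1] = 1 };  /* V1 is live at region entry */
        int seen_def[MAXVARS] = { 0 };
        /* defs (is_def=1) and uses (is_def=0) of V1 and V2 in region order */
        Op region[] = { {2, 1}, {1, 1}, {2, 1}, {2, 0} };
        int n = sizeof region / sizeof region[0];

        for (int i = 0; i < n; i++) {
            Op *o = &region[i];
            if (o->is_def) {
                if (!live_at_entry[o->var] && !seen_def[o->var])
                    printf("instruction %d is the first definition of V%d\n",
                           i, o->var);
                seen_def[o->var] = 1;  /* later defs do not start the range */
            }
        }
        return 0;
    }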

The following theorem lists the live ranges whose interferences can be modeled precisely by strategy 2.

Theorem 6-1 (Characterization of Simple Live Range Tracking)
Strategy 2 can model interferences precisely for the following live ranges:
• Definition and use predicate match.
• The definition predicate dominates the use predicate.
• The definition predicates form a partition and the use predicate post-dominates all partition predicates.
• Two definition predicates reaching a use are on at most one execution trace (or execution path in the original control flow region).

Proof: Precise interference means for each predicated live range L: when L is recognized as live at a given program point, L is actually live. When L is recognized as dead, it is actually dead. When L is recognized as disjoint from another live range L', it is actually disjoint. When L is recognized as interfering with L', it is actually interfering. The theorem is clear for the simple cases, match and dominate. In this case the predicate set of a live range consists of the qualifying predicates seen at its uses. The matching or dominating definition stops the live range (strategy 1). For the remaining cases we need to develop some intuition first. Tracking a live range under the qualifying predicate of the use ensures that in the predicated region disjointness is precise with respect to instructions that are not on a path to the use in the original control flow graph. In case the definition predicates form a partition and the use predicate post-dominates all partition predicates, all paths starting at the definitions end at the use. There cannot be an off-path instruction in the predicated region that would introduce an interference that is not visible in the original control flow graph. Therefore tracking the live range under partition predicates cannot introduce new interferences. The case of two definitions reaching a use where the definition predicates don't form a partition can be reduced to the partition case. Since there is only one path that contains both definitions, there must exist a split block dominated by the first definition. Inserting a pseudo definition in the successor of the split block that is not on the path to the second definition ensures the partition property: since the qualifying predicate of the pseudo definition is disjoint from the qualifying predicate of the second definition, they form a partition at the use. In this case one partition predicate (from the pseudo definition) will not be a definition predicate, but dominated by one (the definition predicate of the first definition). This proves the theorem for two definitions. The general case of N definitions is similar to this special case. ∎

Consider the example code in Figure 35 to visualize the difference between use and partition tracking when the use does not post-dominate the definition. P2|P3 form a partition for live range B, but the use under P5 does not post-dominate P2. In the predicated code there could be an off-path instruction like the definition of E under P4 "before" the use of B under P5. If liveness of B were tracked under the partition predicates P2|P3, E and B would interfere, since P2 and P4 are not disjoint. On the other hand, if liveness of B is tracked under P5, E and B cannot interfere, since clearly P4 and P5 are disjoint. This scenario cannot happen when the use post-dominates all partition predicates, since there cannot be an "off-path" instruction on the execution trace. The remaining live ranges require a more sophisticated method to model interferences precisely. There are two cases left: first, the use does not post-dominate the partition predicates. Second, two or more definitions overlap on more than one execution trace (or execution path in the original control flow graph). The first case can be handled by tracking the live range under the use and the partition predicates. The second case is reduced to partition, dominate or match live ranges by splitting. Splitting is described below.

    Control-Flow Region with Block Predicates      Predicated Region

    B1 (P1)                                        (P1) cmp  P2,P3 = ...
    B2 (P2): A=...; B=...                          (P2) cmpu P4,P5 = ...
    B3 (P3): A=...; B=...                          (P1) D = ...
    B4 (P4): E=...                                 (P2) B = ...
    B5 (P5): ...=B                                 (P3) A = ...
    B6 (P1)                                        (P2) A = ...
                                                   (P4) E = ...
                                                   (P5) ... = B

Figure 35 Extra Interference with Partition Tracking


Strategy 3: Use-and-Partition Tracking
In addition to strategy 2, track live variables under use-and-partition predicates and "split" live ranges when two definitions overlap on more than one path. When the use does not post-dominate the definition, the use predicate (= qualifying predicate of the instruction containing the use) gets associated with the partition predicates. This is necessary for precise disjointness information: when the use does not post-dominate all predicates in a partition (of two or more predicates), disjointness could be conservative. Therefore, the live range is tracked under the qualifying predicate of the use and the partition predicates. Since the partition predicates represent disjoint portions of execution traces, precise disjointness follows from the following rule used during interference calculation: at any given instruction, if the qualifying predicate of a definition of live range L1 interferes with the use predicate of live range L2, but not with any of its associated partition predicates, then live ranges L1 and L2 are (actually) disjoint at this point. This gives precise disjointness: first, when the use does not post-dominate the definitions, there can be instructions on the execution trace that are not on any path from the definition to the use. Since the qualifying predicate of the use is disjoint from the qualifying predicate of these instructions, no imprecise interference can be encountered. Second, false interferences could be recorded with variables in instructions on paths to a definition, but the rule above prevents this, since at every definition the qualifying predicate is removed from the partition. This argument was also used in the proof of Theorem 6-1. In case of definition overlap on more than one execution path, additional live range splitting is necessary. This is achieved by inserting an identity move under the qualifying predicate of the definition. This move only changes predicate tracking for the live range.
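The interference rule of strategy 3 can be sketched as follows; the pairwise-overlap table is hand-coded for the predicates of Figure 35 and the data is purely illustrative. The expected output, 0, matches the observation above that E and B cannot interfere:

    #include <stdio.h>

    /* Pairwise-overlap oracle for the predicates of Figure 35 (indices:
       P1=1 ... P5=5). Entry 1 means the two predicates can be true on
       the same execution trace; values hand-derived from the region. */
    static const int overlap[6][6] = {
        {0,0,0,0,0,0},
        {0,1,1,1,1,1},   /* P1 */
        {0,1,1,0,1,1},   /* P2 */
        {0,1,0,1,0,0},   /* P3 */
        {0,1,1,0,1,0},   /* P4 */
        {0,1,1,0,0,1},   /* P5 */
    };

    /* Strategy-3 rule: a definition under qp interferes with a live range
       tracked under 'use' plus partition predicates part[0..n-1] only if
       qp overlaps the use predicate AND at least one partition predicate. */
    static int interferes(int qp, int use, const int *part, int n) {
        if (!overlap[qp][use]) return 0;
        for (int i = 0; i < n; i++)
            if (overlap[qp][part[i]]) return 1;
        return 0;
    }

    int main(void) {
        int part[] = { 2, 3 };   /* B is tracked under use P5 plus {P2, P3} */
        /* E is defined under P4; P4 and P5 are disjoint: no interference */
        printf("E interferes with B: %d\n", interferes(4, 5, part, 2));
        return 0;
    }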

    B1 (P1): V=...
    B2 (P2): A=...; V=...
    B3 (P3)
    Bx (Px): V=V   (JS block)
    B4 (P1): ...=V

• On paths B1-B2 and B1-B3-B2 the definitions of V overlap.
• In this case the live range for V cannot be transformed into a partition live range.
• The pseudo move in JS block Bx splits the live range of V into a "partition" and a "dominate" live range.

Figure 36 Complex Live Range Tracking

Figure 36 illustrates a live range V in a control flow graph snippet. The definitions of V overlap on more than one path to the use. There are two definitions of V, in blocks B1 and B2. The use in block B4 post-dominates the definitions, but the definitions in blocks B1 and B2 overlap on the paths 1 → 2 and 1 → 3 → 2. Tracking the live range of V under the reaching predicates P2|P1 would give extra interferences, since P1 interferes with P2. The trick is adding an identity move in block Bx, which is inserted by control flow graph completion. The use in block B4 is recorded under P2 and Px, which form a partition. Since P1 dominates Px, the original live range has been split into a "simple" live range (handled by strategy 2) and a "dominate" live range. In general, splitting can also result in a partition live range where the use does not post-dominate the definitions. We note that this kind of live range splitting can be achieved with SSA representation [59] in the original control flow region. When translating out of SSA we would get the moves that split a complex live range. From the discussion it is clear that strategy 3 models interference precisely when a live range has two definitions that overlap on one execution path. The identity move can be inserted where the two definitions merge. The general case is:

Theorem 6-2 (Characterization of Complex Live Range Tracking)
Strategy 3 can model interferences precisely for the following live ranges:
• The use predicate does not post-dominate the partition predicates.
• When definitions overlap on more than one path, the live range can be split and handled by strategy 2 or the case above.

Proof: Preciseness for the first case is clear: the qualifying predicate of the use avoids interferences with an off-path instruction, which could be on the trace from a definition to a use. Partition tracking asserts there is no conservative interference with an instruction on the path from the entry code to the definition. The interference rule is modified: if, at a given instruction in the interference graph construction, a qualifying predicate interferes with the use predicate of a live candidate, but not with the associated partition predicates, the live range under the qualifying predicate is disjoint from the live candidate. Overlapping live ranges on multiple paths can be split into simpler live ranges: assume the live range has n > 2 definitions. As in Figure 36, an identity move can be inserted at a merge point of any two definitions. The live range section with the two definitions and the use in the identity "mov" is either a partition live range or can be modeled by use-and-partition tracking. This splitting technique can be applied iteratively until a split live range has only two definitions. This proves the theorem, since it holds in the case n = 2. ∎

    B1 (P1): B=...; C=...
    B2 (P2): V=...; E=...; ...=B        B3 (P3): V=...; E=...; ...=B
    Bu (Pu): (JS block)                 Bv (Pv): V=V; E=E   (JS block)
    B4 (P4): ...=D; ...=E
    Bx (Px): (JS block)                 By (Py): V=V   (JS block)
    B5 (P5): A=...; V=...
    B6 (P6): ...=A                      Bz (Pz): V=V   (JS block)
    B7 (P7): ...=V; ...=C

Figure 37 Completed Candidate Region and Live Ranges

We identified four types of predicated live ranges (match, dominate, partition and overlap), two methods for interference modeling ("simple" and "complex"), and three implementation strategies. The simple method models interferences of all "match", "dominate" and some "partition" live ranges precisely, and the remaining live ranges conservatively. The complex method models all live ranges precisely.

In implementation strategy 1, only match and dominate live ranges are modeled precisely. Strategy 2 models all simple live ranges, which include all "match", "dominate" and some "partition" live ranges, precisely, and strategy 3 models interferences of all live ranges precisely. Strategy 3 is equivalent to a PQS-based implementation when predicated code is derived from control-flow code. The rest of the section discusses a detailed example. Figure 37 shows the snippet of a completed control flow graph with block predicates. Bu, Bv, Bx, By and Bz are JS blocks inserted during completion. The candidate region for if-conversion includes all blocks but B7, which is an exit block. Block B6 is the merge point for the region. Also shown are six live ranges A, B, C, D, E and V. A is defined in the region, but used outside. B and C are examples of "dominate" live ranges, D is used in the region and escapes at the entry B1, E is a "partition" live range and V an "overlap" live range, which has multiple definitions on a path that can reach a use. Figure 39 shows the core predicated code ("hyperblock") corresponding to the superblock region in Figure 37. For the six live ranges their predicate sets are shown as they are recorded in a backward traversal of the hyperblock. The definition of live range A under predicate P5 dominates the exit at which A is live (line 16, B7). Thus A is tracked under P5 when it enters the predicated region. For live range B, the definition predicate P1 dominates the use predicates P2 and P3. The same holds for live range C, where P1 dominates the use predicate P6. Live range D has no definition. At the entry of the region D becomes live under P0, the true predicate. E is an example of a partition live range, but the use predicate P4 does not post-dominate the definition predicate P2. In this case, a partition can be formed because the control flow graph has been completed before if-conversion. The partition for E is (Pv, P3), where P2 dominates Pv and P4 post-dominates all partition predicates.

Root: P1
Forward Partitions: P1 → P2 | P3, P2 → Pu | Pv, P4 → Px | Py, P5 → P6 | Pz
Backward Partitions: P4 → P3 | Pv, P5 → Pz | Px, P7 → Pz | Py
Completed Partitions: P1 → P4 | Pu, P1 → P5 | Py, P1 → P7 | P6

Figure 38 Predicate Partition Graph for Figure 37

    Predicated Code              A     B         C     D     E         V
     1: (P1) B=                  {}    {}        {}    {P4}  {}        {}
     2: (P1) C=                  {}    {P2, P3}  {P7}  {P4}  {}        {}
     3: (P1) cmpu P7,P0=...      {}    {P2, P3}  {P7}  {P4}  {}        {}
     4: (P1) cmpu P4,P5=...      {}    {P2, P3}  {P7}  {P4}  {}        {}
     5: (P1) cmpu P2,P3=...      {}    {P2, P3}  {P7}  {P4}  {}        {}
     6: (P2) V=                  {}    {P2, P3}  {P7}  {P4}  {}        {}
     7: (P3) =B                  {}    {P2, P3}  {P7}  {P4}  {}        {Pz(Pv)}
     8: (P2) E=                  {}    {P2}      {P7}  {P4}  {}        {Pz(Pv)}
     9: (P3) E=                  {}    {P2}      {P7}  {P4}  {Pv}      {Pz(Pv)}
    10: (P3) V=                  {}    {P2}      {P7}  {P4}  {Pv, P3}  {Pz(Pv)}
    11: (P2) =B                  {}    {P2}      {P7}  {P4}  {Pv, P3}  {Pz(Pv, P3)}
    12: (P4) cmpu Px,Py=...      {}    {}        {P7}  {P4}  {Pv, P3}  {Pz(Pv, P3)}
    13: (P5) V=                  {}    {}        {P7}  {P4}  {Pv, P3}  {Pz(Pv, P3)}
    14: (P5) cmpu P6,Pz=...      {}    {}        {P7}  {P4}  {Pv, P3}  {Py, Pz(Pv, P3)}
    15: (P5) A=                  {}    {}        {P7}  {P4}  {Pv, P3}  {Py, Pz(Pv, P3)}
    16: (P4) =E                  {P5}  {}        {P7}  {P4}  {Pv, P3}  {Py, Pz(Pv, P3)}
    17: (P4) =D                  {P5}  {}        {P7}  {P4}  {}        {Py, Pz(Pv, P3)}
    18: (P5) br .b6              {P5}  {}        {P7}  {}    {}        {Py, Pz(Pv, P3)}
    19: (Pz) V=V                 {}    {}        {P7}  {}    {}        {Py, Pz(Pv, P3)}
    20: (P7) =C                  {}    {}        {P7}  {}    {}        {Py, Pz}
    21: (P7) =V                  {}    {}        {}    {}    {}        {Py, Pz}

• A is a "dominate" live range. It is used in B6 under predicate P6. P5 (or B5) dominates P6 (or B6) and the definition under P5 is the start of the live range for A.
• B is a "dominate" live range with one definition and two uses.
• C is a "dominate" live range with one definition and one use.
• D has no definition inside the hyperblock. It is tracked under the qualifying predicate of its use.
• E is a "partition" live range. The use does not post-dominate its definitions. E is partitioned into a "dominate" and a "partition" live range, where the use in block B4 post-dominates the definition in block B3 and the definition in the pseudo move E=E in block Bv. Liveness for the partitioned live range is tracked under the partition predicates Pv and P3. Since P2 dominates Pv, the start of the live range for E is recognized at instruction 8.
• V is a "complex" live range, since the definition in B2 or B3 can overlap with the definition in B5 on a path to the use in B7. The live range of V is split into two "dominate", one "partition" and one "complex" (with two definitions) live range: the two "dominate" live ranges stretch from B2-Bv and B5-Bz respectively, the partition live range is Bz-By-B7, and the complex live range with two definitions is Bv-B3-By. In line 21 the two definitions (from the pseudo moves V=V) in By and Bz reach the use of V under P7. Thus liveness of V is tracked under {Py, Pz}. In instruction 19, liveness under Pz is killed. The use in the same (pseudo move) instruction is the end of the "complex" (sub-)live range of V. Since the use does not post-dominate its reaching definitions, liveness is tracked under the use predicate Pz and the reaching definition predicates Pv, P3. Instruction 13 kills liveness under Py, since P5 dominates Py. Instruction 6 kills liveness under Pv, since P2 dominates Pv. Since Pv is the "last" predicate under which V is live, instruction 6 is the start of the live range.

Figure 39 Predicate Sets for Live Ranges in Figure 37 with Complex Tracking


    Predicated Code              A     B         C     D     E     V
     1: (P1) B=                  {}    {}        {P7}  {P4}  {}    {}
     2: (P1) C=                  {}    {P2, P3}  {P7}  {P4}  {}    {}
     3: (P1) cmpu P7,P0=...      {}    {P2, P3}  {P7}  {P4}  {}    {}
     4: (P1) cmpu P4,P5=...      {}    {P2, P3}  {P7}  {P4}  {}    {}
     5: (P1) cmpu P2,P3=...      {}    {P2, P3}  {P7}  {P4}  {}    {}
     6: (P2) V=                  {}    {P2, P3}  {P7}  {P4}  {}    {}
     7: (P3) =B                  {}    {P2, P3}  {P7}  {P4}  {}    {Pv}
     8: (P2) E=                  {}    {P2}      {P7}  {P4}  {}    {Pv}
     9: (P3) E=                  {}    {P2}      {P7}  {P4}  {Pv}  {Pv}
    10: (P3) V=                  {}    {P2}      {P7}  {P4}  {P4}  {Pv}
    11: (P2) =B                  {}    {P2}      {P7}  {P4}  {P4}  {Py}
    12: (P4) cmpu Px,Py=...      {}    {}        {P7}  {P4}  {P4}  {Py}
    13: (P5) V=                  {}    {}        {P7}  {P4}  {P4}  {Py}
    14: (P5) cmpu P6,P7=...      {}    {}        {P7}  {P4}  {P4}  {P7}
    15: (P5) A=                  {}    {}        {P7}  {P4}  {P4}  {P7}
    16: (P4) =E                  {P6}  {}        {P7}  {P4}  {P4}  {P7}
    17: (P4) =D                  {P6}  {}        {P7}  {P4}  {}    {P7}
    18: (P5) br .b6              {P6}  {}        {P7}  {}    {}    {P7}
    19: (P7) =C                  {}    {}        {P7}  {}    {}    {P7}
    20: (P7) =V                  {}    {}        {}    {}    {}    {P7}

• A is a "dominate" live range. It is used in B6 under predicate P6. P5 (or B5) dominates P6 (or B6) and the definition under P5 is the start of the live range for A.
• B is a "dominate" live range with one definition and two uses.
• C is a "dominate" live range with one definition and one use.
• D has no definition inside the hyperblock. It is tracked under the qualifying predicate of its use.
• E is a "partition" live range. The use does not post-dominate its definitions. At instruction 9, E is live at exit under predicate P4 and defined under predicate P3. Since there is a partition P4 = Pv | P3 in the PPG, E becomes live at entry under Pv. In instruction 8, E is defined under P2. Since P2 dominates Pv, instruction 8 is the start of the live range of E.
• V is a "complex" live range. At instruction 13 it is defined under P5. Since Py is the relative complement of P5 and P7 (the lowest common ancestor of P5 and P7 is P1, and Py is not on any path from P1 to P5), V becomes live under Py at the entry of instruction 13. At instruction 10, V is defined under P3. Since ...

Figure 40 Predicate Sets for Live Ranges in Figure 37 with PQS

The live range for V is another example of a complex live range, but, unlike E, it is characterized by the fact that multiple definitions on one path reach the use in B6. In this case the live range is first split into two partition live ranges by inserting an identity move into JS block Bz. The first live range is spanned by the definition predicates P4 and Pz, and the use predicate P6. Since P6 does not post-dominate P4, the partition in line 19 is {Py, Pz}, where P4 dominates Py and P6 post-dominates both Py and Pz. The second live range is spanned by the definition predicates P2 and P3, and the "pseudo" use under Pz. Since Pz post-dominates neither P2 nor P3, the partition records both the use predicate and the definition predicates. Only after all definitions have been seen is the live range no longer live under Pz (line 4).


7 Register Allocation for Speculated Code

IA-64 supports control and data speculation to enable the compiler to speculatively hoist a load and its dependent uses across a branch (control), a store (data) or both (control and data). Compiler heuristics decide when speculation is beneficial. The compiler has to prepare for the case of an exception or fault at a speculated load. It provides for mis-speculation by generating recovery code, which may re-execute (at run-time) the speculated dependence chain non-speculatively. Re-execution starts at the original program point where the load would have been executed non-speculatively. The challenge for the register allocator with respect to control speculation is correctness. In data-speculated code the allocator can reduce the interferences for the "tail" of some data-speculated live ranges.

7.1 Control Speculation

Itanium provides a speculative load (ld.s) and a validating speculation check (chk.s) instruction for control speculation (breaking the branch barrier). An ld.s causes a deferred exception token to be set in the destination register of the speculated load in case of an exception or fault, e.g. in the case of a page fault. For integer registers, the NaT (= Not a Thing) is encoded in an extra bit ("NaT bit") for the register. For floating-point registers, the deferred exception token is encoded as a special value in the register. We focus on the integer case, which, for the register allocator, is the more challenging one. The Itanium processor propagates NaTs to the destination register of an instruction when any source register has its NaT bit set. If a NaT bit is set for the source register of the chk.s, execution branches to recovery code, which re-executes a non-speculative instance of the speculative load and all speculated instructions of its dependence chain, and branches back to the bundle after the chk.s. Non-speculative re-execution of the speculated code at the place where the code would have been executed without speculation ensures program correctness in case of a mis-speculation. Example 7 shows control speculation and two ways of generating recovery code. The two loads in lines 7 and 8 are control speculated across the branch in line 6. The "add" in line 9, which is flow dependent on the load in line 8, is also speculated. The first method for generating recovery code duplicates the dependence chain for the speculative load and all its speculated, dependent instructions. The speculated load becomes non-speculated in the recovery code. The second method shows no speculative load in the recovery code. This method saves one chk.s and code size in the recovery code sections. Note that in the optimized recovery code sequence the ld.s destination and the chk.s use don't match: the chk.s will fire when the NaT bit for V9 is set. This is the case when V9 inherits the NaT bit from V7 or V8. In other words, NaT bits may be propagated along the dependence chain originating with a speculated load ld.s.

     a. Original Code      b. Speculated Code        c. Optimized Recovery Code

 1:
 2:                        ld8.s V7=[V6]             ld8.s V7=[V6]
 3:                        ld8.s V8=[V7]             ld8.s V8=[V7]
 4:  add V5=V4,V3          add V5=V4,V3;;            add V5=V4,V3;;
 5:                        add V9=V9,V8              add V9=V9,V8
 6:  br cont               br cont                   br cont
 7:  ld8 V7=[V6]           chk.s V7,rec1
 8:  ld8 V8=[V7];;     r1: chk.s V8,rec2             chk.s V9,rec1
 9:  add V9=V9,V8
10:  cont: ...             cont: ...                 cont: ...
11:                        rec1: ld8 V7=[V6]         rec1: ld8 V7=[V6]
12:                              ld8.s V8=[V7];;           ld8 V8=[V7];;
13:                              add V9=V9,V8              add V9=V9,V8
14:                              br r1                     br cont
15:                        rec2: ld8 V8=[V7];;
16:                              add V9=V9,V8
17:                              br cont

Example 7 Control Speculation and Two Methods of Recovery Code Generation

7.1.1 NaT Propagation and Spill Code

The Itanium processor propagates NaTs to the destination register of an instruction when any source register has its NaT bit set. The compiler has to model the NaT propagation of the processor to avoid NaT consumption faults. A NaT consumption fault occurs, for example, when the source register of a regular store has the NaT bit set. To avoid the NaT consumption fault, the Itanium ISA supports a special store instruction, st8.spill, which saves the NaT bit of the source register in the 64-bit AR.UNAT application register and does not cause a NaT consumption fault. In Example 7 b. and c., a NaT bit may be set (at run-time) for the physical registers assigned to V7, V8 and V9: it could be set for V7 at the speculated load in line 2 and propagate to V8 and V9. Or the NaT bit of the register assigned to V8 could be set at the speculated load in line 3 and propagate to (the assigned register of) V9 at the "add" in line 5. A set NaT bit is cleared when the speculated dependence chain that produces the NaT gets (re-)executed non-speculatively in recovery code. Modeling processor NaT propagation is crucial for register spilling to avoid possible NaT consumption faults. At a spill the register allocator must know whether or not the NaT bit of the register to be spilled could be set. The modeling can be done in a separate data flow analysis pass for speculated loads or in the code scheduler, which is responsible for generating the recovery code. As a result, the code generator marks each instruction (or even each operand per instruction) that could have a destination register with a set NaT bit.
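A minimal sketch of such a marking pass, assuming a linear (single hyperblock or trace) instruction stream and a hypothetical IR with opcode/srcs/dsts fields, could look like this; a production version would run as a data flow analysis over the whole control flow graph, and real mnemonics carry an access size (ld8.s etc.):

    def mark_nat_propagation(instructions):
        """Mark instructions whose destination register may have its NaT
        bit set at run-time (a sketch; IR accessors are hypothetical)."""
        may_have_nat = set()
        for inst in instructions:
            if inst.opcode == "ld.s":
                may_have_nat.update(inst.dsts)      # ld.s produces a NaT
                inst.nat_marked = True
            elif inst.opcode == "chk.s":
                # Recovery code re-executes the chain non-speculatively
                # and thereby clears the NaT.
                may_have_nat.difference_update(inst.srcs)
                inst.nat_marked = False
            elif any(s in may_have_nat for s in inst.srcs):
                may_have_nat.update(inst.dsts)      # processor propagates NaTs
                inst.nat_marked = True
            else:
                may_have_nat.difference_update(inst.dsts)  # redefinition clears
                inst.nat_marked = False
        return may_have_nat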

(Schematic of Figure 41: spilled candidates N1, ..., N64 occupy consecutive stack addresses 0x0, 0x8, ..., mapping to AR.UNAT bits 0-63; N65 is spilled at 0x200 and the AR.UNAT itself at 0x208.)

• N1-N65 are interfering live ranges.
• At run-time the NaT bit of the registers they are assigned to may be set.
• If spilled, N1-N64 must be spilled to consecutive memory addresses, since the memory address determines the bit in the AR.UNAT register that saves the NaT bit.
• For N65, there is no bit left in the AR.UNAT. In this case the compiler saves the NaT bit for N65 on the memory stack also.

Figure 41 Spill Addresses and AR.UNAT Register

The rest of this section assumes that a register may have its NaT bit set. When such a register is spilled, the code generator has to use a special spill/fill instruction pair (st8.spill/ld8.fill), which saves/restores the register's NaT bit to/from the AR.UNAT application register. The bit location in the AR.UNAT is determined by 6 significant bits (bits 8:3, see manual [13]) of the spill (= stack memory) address. To track the correspondence between AR.UNAT bits and memory addresses, the register allocator has to allocate contiguous memory in the local stack frame for the spilled speculated live ranges (Figure 41). Spilled live ranges that don't interfere may share the same spill address. Spilled live ranges that interfere can neither spill to the same address nor to addresses that map to the same bit in the AR.UNAT, which is a 64-bit register and can hold 64 (NaT) bits that are live simultaneously. When more than 64 speculated live ranges are live simultaneously, the AR.UNAT would overflow. Perhaps the best way to think about spilling a speculated live range (whose assigned register may have its NaT bit set) is as spilling a pair (value, NaT). Both components of the pair can have interferences. When more than 64 speculated live ranges interfere simultaneously and have to be spilled, 64 spill addresses are not enough. This means the AR.UNAT register would overflow and could not hold all corresponding NaT bits. In this case the allocator can use the "naïve" spill/fill scheme for speculated live ranges shown in Figure 42.
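The address-to-bit correspondence itself is simple arithmetic. A small sketch (the 6-bit field, bits 8:3, is architectural; the helper name is hypothetical):

    def unat_bit(spill_addr):
        """AR.UNAT bit that an st8.spill to 'spill_addr' writes:
        bits 8:3 of the 8-byte aligned spill address."""
        return (spill_addr >> 3) & 0x3F

    # Consecutive 8-byte slots map to consecutive bits, wrapping after 64:
    assert unat_bit(0x000) == 0
    assert unat_bit(0x008) == 1
    assert unat_bit(0x1F8) == 63
    assert unat_bit(0x200) == 0   # the 65th slot collides with the first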

Spill code sequence in case of AR.UNAT overflow

mov       rs=ar.unat      // save ar.unat register
st8.spill [rm]=rx,8;;     // rm = rm + 8
                          // if rx has its NaT bit set,
                          // it is stored in the ar.unat
mov       rt=ar.unat;;    // rt: second scratch register; read the
                          // ar.unat that now holds the NaT bit of rx
st8       [rm]=rt         // save ar.unat for later fill
mov       ar.unat=rs;;    // restore ar.unat register

Fill code sequence in case of AR.UNAT overflow

mov       rs=ar.unat    // save ar.unat register
ld8       rh=[rm],-8;;  // rm = rm - 8; rh = previously saved ar.unat
mov       ar.unat=rh;;  // ar.unat now contains the NaT bit for the
                        // candidate to be loaded into rx
ld8.fill  rx=[rm]       // reload rx; its NaT bit is taken from ar.unat
mov       ar.unat=rs;;  // restore ar.unat register

Figure 42 Spill/Fill Code in case of AR.UNAT overflow

When the AR.UNAT overflows, the allocator saves the AR.UNAT in a general register, spills the (register of the) speculated live range, then spills the AR.UNAT and finally restores the original AR.UNAT. Filling reverses the spills: after saving the original AR.UNAT in a general register, the allocator loads the spilled AR.UNAT, then the spilled live range, and finally restores the original AR.UNAT. These sequences ensure that the NaT bit of the spilled (filled) live range is in the spilled (filled) AR.UNAT (Figure 42). In this naïve approach the entire AR.UNAT register is spilled to save and restore one NaT bit. Spilling of the AR.UNAT register could be avoided if the compiler ensured that no more than 64 live ranges with a potentially set NaT bit interfere, but control speculation in the compiler is not aware of register pressure. While general NaT propagation enables more efficient recovery code, it can result in many spills of the AR.UNAT register, since the NaT bit must potentially be preserved for any symbolic register in the dependence chain originating at an ld.s. If such a symbolic register is spilled, its NaT must be available at a fill. If the compiler requires that the ld.s destination and the chk.s source match, modeling NaT propagation becomes simpler. Then only the NaT bit of a speculated load destination must be preserved. When any other speculated live range (in the dependence chain of an ld.s destination) is spilled, a st8.spill must still be used to avoid a NaT consumption fault at any program point where its NaT may be set. But the NaT bit does not need to be preserved, and a regular load, rather than an ld8.fill, can be used to load the value. This means that for most speculated live ranges the NaT bit value can be ignored and does not need to be preserved (in the case of spilling) in the AR.UNAT application register. The next section describes the algorithm that exploits "matching" ld.s and chk.s.

7.1.2 An Advanced NaT Propagation Algorithm

Modeling general NaT propagation in the compiler is challenging. It is conceivable to develop a predicate-aware NaT propagation algorithm based on available-NaT and live-NaT data flow algorithms, similar to the available variable and live variable algorithms used for live range approximation. But two observations give a simpler method for NaT modeling. First, under the assumption that each ld.s destination has a matching chk.s source (in other words, they are the same virtual or symbolic register), the compiler does not need to model the NaT propagation of the processor entirely. Instead, it can model NaT propagation only for the destination registers of ld.s. Second, the compiler can partition the bits of the AR.UNAT application register into two classes: preserved and scratch. A preserved bit corresponds either to the NaT bit of a static preserved general register (r4, r5, r6 or r7 in Figure 4, p. 22) or to the NaT bit of a destination register in a speculated load. The scratch bits correspond to any other register that may have its NaT bit set, like symbolic registers in the dependence chain of a speculated load. Ld.s destinations are the only NaT producers. So only for them (and the used preserved registers r4-r7, which are spilled at function entry and filled at function exit(s)) must the NaT bit be preserved in case of a spill and restored at a fill. To filter the symbolic registers that need only scratch NaT bits it is sufficient to mark instructions in the dependence chain from the ld.s to the matching chk.s. This can be done either in the scheduler, or the chain can be recomputed in a separate phase before register allocation. When a symbolic register in a marked instruction is spilled, a st8.spill must be used. For a fill, an ld8.fill must be used only when the symbolic register is the destination of a NaT producer (ld.s) and its NaT bit could be checked by a following chk.s. To find all places where an ld8.fill must be used we introduce the concept of a live NaT. This is a sub-live range of an ld.s destination where the NaT could be set. Live NaTs start at ld.s and end at chk.s. A live NaT analysis similar to a live variable analysis gives all program points of the live NaT range (a sketch follows below). We note that the compiler can use pseudo ld.s and chk.s instructions to model general NaT propagation under the restriction that an ld.s must have a matching chk.s. The pseudo chk.s instruction marks the end of one NaT range and the pseudo ld.s instruction the beginning of another. Although the actual ld.s and chk.s mismatch, like V7 and V9 in Example 7, the pseudo instructions partition the speculated live ranges so that the live NaT analysis is applicable. Figure 43 illustrates the effect of the algorithm and choices for stack memory layout for a hypothetical 5-bit AR.UNAT register. For illustration we assume one preserved register r4, which is spilled and filled at function entry and exit respectively, ten "scratch" spills s1, ..., s10 for candidates that may set a NaT, and two interfering ld.s destinations l1 and l2. Assume s1, ..., s10 interfere with l1 and l2. In layout 1, r4 and l1 are associated with AR.UNAT bits 4 and 3, respectively. Bit 4 cannot be overwritten, since it must be preserved for the entire function. Thus no other st8.spill can go to a memory address that would write the NaT of its source register to this bit. For bit 3 the situation is different: a non-interfering candidate can be written to bit 3 and even stored at the same memory address, but an interfering live range must not destroy bit 3. Therefore the memory address corresponding to bit 3 can be used when the allocator determines that there is no conflict with l1.
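The live NaT analysis mentioned above can be written as a standard backward liveness problem over the control flow graph. A sketch with hypothetical block/instruction accessors (chk.s sources generate a live NaT, ld.s destinations kill it going backward):

    def live_nat_analysis(blocks):
        """Iterative backward dataflow: returns, per block, the set of ld.s
        destinations whose NaT may be live at block entry (a sketch)."""
        live_in = {b: set() for b in blocks}
        changed = True
        while changed:
            changed = False
            for b in blocks:
                live = set()
                for s in b.succs:               # live-out = union of succ live-ins
                    live |= live_in[s]
                for inst in reversed(b.insts):
                    if inst.opcode == "ld.s":
                        live -= set(inst.dsts)  # the live NaT starts here
                    elif inst.opcode == "chk.s":
                        live |= set(inst.srcs)  # the live NaT ends here
                if live != live_in[b]:
                    live_in[b] = live
                    changed = True
        return live_in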

(Schematic of Figure 43: two stack memory layouts for a hypothetical 5-bit AR.UNAT register; a spill address at offset a maps to AR.UNAT bit (a >> 3) mod 5.)

Layout 1 (AR.UNAT spills possible): bits 0-2 are scratch, bits 3-4 are preserved. r4 (bit 4, address 0x20) and l1 (bit 3, address 0x18) keep their NaT bits; s1-s10 share the three scratch bits at consecutive addresses; l2 is spilled at 0xa0 and the AR.UNAT itself is spilled at 0xa8 to preserve l2's NaT bit.

Layout 2 (no AR.UNAT spills): bits 0-1 are scratch, bits 2-4 are preserved. r4 (bit 4), l1 (bit 3) and l2 (bit 2, address 0x10) keep their NaT bits; s1-s10 share the two scratch bits, at the cost of a larger stack frame.

Figure 43 5-bit AR.UNAT Register and Stack Memory Layout Options

The scratch spills interfere, but they can share the same NaT bits (bits 0-2), since for them the NaT bit does not need to be preserved. It only needs to be saved to avoid a NaT consumption fault. To save the NaT bit of l2, the entire AR.UNAT register is spilled as in the conservative method. But, unlike in the conservative method, for s1 to s7 the AR.UNAT register is not stored, which saves 7 spills/fills and 14 AR.UNAT moves. In the second layout one more AR.UNAT bit is preserved. This saves the AR.UNAT spill (and 2 AR.UNAT moves) for l2 at the expense of larger stack memory. In layout 2 three bits (2, 3 and 4) of the AR.UNAT are used as preserved bits. In this scenario l1, l2 and r4 can preserve their NaTs. Since there are only two scratch bits, which s1, ..., s10 can use, the memory stack grows compared to layout 1, but spills of the AR.UNAT can be avoided. The algorithm also applies to a live range that combines control- and data speculation. Data speculated live ranges have their own unique characteristics that can be exploited with special register allocation techniques.

7.2 Data Speculation

IA-64 provides an advanced load (ld.a) and two advanced load check (chk.a, ld.c) instructions for data speculation. This enables the code generator to schedule a load across a potentially overlapping store (breaking the store barrier). At execution, an advanced load records information about its physical destination register, memory address and data size in the Advanced Load Address Table (ALAT) [13]. If a subsequent store overlaps, then the hardware invalidates the corresponding ALAT entry to indicate the collision. As with control speculation, the compiler generates recovery code that will execute only in case speculation fails (Example 8). There is something peculiar about data speculated live ranges, which the register allocator can exploit to reduce their interferences. Data speculated live ranges that end in a chk.a have a range where interferences with non-ALAT live ranges can be ignored. Since the chk.a only checks the register number, but not the value, in the ALAT, the chk.a destination register can be shared with a non-data speculated candidate in a special circumstance. If the non-speculated candidate has a live range that is contained entirely in the range from the chk.a to the penultimate use of the data speculated live range, the interferences of the two live ranges can be ignored and both candidates can be assigned the same register. We call the section of a data speculated live range from the ending chk.a to its penultimate use the ALAT shadow.

Rather than building a containment graph (Cooper and Simpson [24]), the Itanium compiler associates a shadow ALAT live range with each data speculated live range ending in a chk.a. It uses pseudo instructions for modeling the shadow, i.e. the live range section from the chk.a (which is the last use and ends the live range) to its penultimate use (if it exists). In the shadow, all interferences with regular live ranges (not data speculated) are ignored. Interference with another data speculated candidate must be taken into account, since two overlapping data speculated live ranges cannot share the same register. Example 8 gives an illustration. Live range V4 is data speculated. The load in line 5 is hoisted above the store in line 4. The dependent add in line 6 is also speculated. We assume there is no use of V4 after the chk.a, so the live range for candidate V4 ends at the chk.a. A shadow live range V4' modeling the ALAT shadow for V4 is introduced, ranging from the chk.a to the penultimate use at the "add" in line 2.

     Original Code        Data Speculated Code    Pseudo Code LR of V4

 1:                       ld8.a V4=[V1]
 2:                       add V5=V4,V6            puse V4'
 3:  ...                  ...
 4:  st4 [V10]=V11        st4 [V10]=V11
 5:  ld8 V4=[V1]          chk.a V4, rec           pchk V4'
 6:  add V5=V4,V6         cont: ...
 7:  ...
 8:                       rec: ld8 V4=[V1]
 9:                            add V5=V4,V6
10:                            br cont

Example 8 Data Speculation with Recovery Code and ALAT Live Range

The modeling of the ALAT shadow is accomplished by pseudo instructions. Example 8 shows the pseudo instructions "puse" and "pchk". The live range for candidate V4 has two components: first, it starts at the advanced load in line 1 and ends at the chk.a in line 5; second, it includes the non-speculated dependence chain in the recovery code from line 8 to line 9. The shadow live range V4' ranges from line 2 ("puse") to line 5 ("pchk").
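A sketch of the resulting interference test (the overlap and shadow-containment queries are hypothetical helpers; only the filtering logic is the point):

    def interfere(a, b, overlap, contained_in_shadow):
        """Interference test honoring the ALAT shadow (a sketch).

        'overlap(a, b)' tests live range overlap; 'contained_in_shadow(x, s)'
        tests whether live range x lies entirely inside the ALAT shadow of
        the data speculated live range s (from its chk.a to its penultimate
        use)."""
        if not overlap(a, b):
            return False
        # A regular live range contained in the shadow of a data speculated
        # live range may share its register; two data speculated live
        # ranges must always be kept apart.
        if a.data_speculated and not b.data_speculated:
            return not contained_in_shadow(b, a)
        if b.data_speculated and not a.data_speculated:
            return not contained_in_shadow(a, b)
        return True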


8 Scalable Register Allocation

A coloring allocator is usually a fast and efficient compiler phase. It can cause a compile time problem because of memory consumption due to a large number of candidates. To address the memory problem, multiple strategies can be employed. This section discusses the pros and cons of these methods, and proposes scalable register allocation, which can solve the allocation problem for an arbitrary set of register candidates. Using a scalable allocation scheme, coloring allocators can handle allocation problems of any size quickly and efficiently. The register allocator in the Intel Itanium compiler supports a variety of allocation strategies:

1. Region based allocation
   a. This is the default strategy in the compiler. A region is either a loop or an outer acyclic region.
2. Conversion of local candidates to global candidates
   a. The register candidates consist of local and global candidates. For local candidates all references are within a basic block. All locals are numbered consecutively. For example, when the first basic block has local virtual registers v1-v10, the first local in the next block will be v11. When the allocator recognizes that the number of locals exceeds an internal threshold for the entire control flow graph, it hashes locals to global virtual registers, effectively reducing the overall number of register candidates (a sketch of this hashing scheme follows below). Hashing is interference agnostic, but locals in one basic block cannot hash to the same (new) global. Only locals from different basic blocks can hash to the same global. The method works since the number of locals per basic block is limited to a few hundred candidates. Therefore the thousands of locals in the entire routine can be hashed to relatively few new global candidates at the cost of extra interference: the interferences of a hashed global variable are the union of all interferences of the locals that have been hashed to it.

3. Basic block allocation only
   a. In this approach the candidates are allocated per basic block. Global virtual registers are always spilled at basic block boundaries. No dataflow analysis is employed. This approach can be seen as a simple variant of a region based allocator, where the regions are simply basic blocks.

All methods can reduce memory usage and compile time, but they have disadvantages as well. Region-based methods rely on global dataflow analysis, which itself can cause compile time problems. These methods also carry the cost of maintaining the global information necessary to reconcile allocations in different regions at the region boundaries. Loop based region allocators may not be able to handle large routines that have no loops. Basic block allocators that spill global candidates at block boundaries can cause huge performance regressions. Hashing local candidates and replacing them by global virtual registers reduces the number of register candidates and therefore the size of the interference graph, but is useful only when local candidates by far outnumber global candidates. Functions generated in the framework of large server applications may contain hundreds of thousands of global register candidates in a single, loop-free procedure and expose the memory problem. Among the methods above, only the third method can be used to compile such an application in reasonable compile time, but at a potentially big performance cost. This thesis proposes scalable coloring allocation, which can be applied to procedures with any number of global variables. The scalable register allocator is based on the observation that a coloring allocator solves small to medium sized allocation problems in almost linear time and space. Scalability is then achieved by a two step process. First, the scalable allocation partitions the set of register candidates directly. Second, it runs the coloring allocator on each set. Effectively it partitions the original allocation space into many small subspaces that can be solved quickly. Rather than solving the original allocation problem for all candidates at once, the algorithm avoids exponential compile time increase by partitioning the candidates into smaller subsets. Using the ideas in this chapter we show that a coloring allocator can be parallelized and allocate independent (disjoint) sets of variables in parallel.
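The local-to-global hashing of strategy 2 above admits a very small sketch (hypothetical IR accessors; the production compiler's thresholds and naming differ):

    def hash_locals_to_globals(blocks):
        """Replace basic-block-local candidates by shared global virtual
        registers. Locals in the same block map to distinct globals; locals
        from different blocks may share one (interference agnostic)."""
        mapping = {}
        width = 0  # number of new globals = max locals in any single block
        for block in blocks:
            locals_in_block = block.local_candidates()
            width = max(width, len(locals_in_block))
            for i, local in enumerate(locals_in_block):
                # Position i inside the block selects the global, so two
                # locals of one block never collide on the same global.
                mapping[local] = "gv%d" % i
        return mapping, width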

Input: S = {S1, ..., Sn}, a set of sets of candidates

procedure scalable_serial_allocation()
    foreach Si in S
        Allocate(Si);
    endfe
    AllocSpillMemory();
endproc

procedure scalable_parallel_allocation()
    - Partition physical or virtual target registers: R = {R1, ..., Rj}.
    - Partition the candidate set into multiples of j subsets, where j is the
      number of register partitions:
      S = {S1, ..., Sj, Sj+1, ..., S2j, ..., S(n-1)j, ..., Snj}
    while S ≠ ∅ do
        select worklist W = {Sx1, ..., Sxj};
        PAllocate(W, R);
        S = S \ W;
    od;
    AllocSpillMemory();
endproc

Figure 44 Serial and Parallel Scalable Register Allocation

Figure 44 shows the high level view of the scalable allocator. It may use techniques like live range splitting and pre-materialization in a preparation phase. This phase may also renumber the virtual register candidates in the case of a hierarchical allocator that uses virtual rather than physical colors. The hierarchical allocator will be discussed in more detail later in this section. After renumbering, a selection phase uses filters to partition the candidates. There are many possible choices for a filter. Some specific examples are:

• Select all candidates of the same type, e.g. all floating point or all integer candidates. This specific filter has been implemented in the Intel compiler. Floating-point candidates must be allocated either first or together with integer candidates, since spilling of a floating-point candidate introduces new integer candidates.
• Select candidates referenced in certain regions, unless they have been selected already. For example, all candidates in innermost loops can be picked. The difference to a region-based allocator is that the scalable allocator would assign the same register to a candidate across the entire routine. This is not necessarily so for a region-based allocator. It may assign different registers to the same candidate in different regions. The assignment is then reconciled (by adding reconciliation code, i.e. mov, store or load) at region boundaries.
• Select candidates based on profile data. When basic block profile data is available, one choice is to pick "hot" candidates first, where hotness is determined by a heuristic threshold based on the reference frequencies of the candidates. All hot candidates are allocated first, and if a candidate is assigned a register, the assignment, unlike in a region-based allocator, holds in the entire routine. Selection by hotness is an idea that goes back (at least) to the Chow allocator.
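As a concrete sketch, the simple scheme analyzed in the results section, fixed-size partitions per register class, can be written as follows ('reg_class' is a hypothetical accessor):

    def partition_candidates(candidates, n):
        """Split N candidates into partitions of about N/n candidates each,
        grouped by register class so each partition is homogeneous
        (a sketch, not the production filter)."""
        by_class = {}
        for c in candidates:
            by_class.setdefault(c.reg_class, []).append(c)
        partitions = []
        for members in by_class.values():
            size = max(1, len(members) // n)
            partitions.extend(members[i:i + size]
                              for i in range(0, len(members), size))
        return partitions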

The outcome of the selection phase is a set S of n partitions S1, ..., Sn. Each partition is a set of candidates. The number of candidates per partition can vary; it does not have to be the same in each partition. The key is to control the number of candidates in each partition so the coloring allocator can solve the allocation problem for the partition efficiently. In the results section we will analyze a simple partition scheme where each partition contains N/n candidates, where N is the total number of candidates and n the number of partitions. In the serial version the register allocator is invoked for each set of candidates. The full register set available (except for reserved registers and spill registers) is used for allocation. After allocating the first partition, all candidates are mapped to physical registers. Subsequent allocations will not change physical registers and take interferences with physical registers into account. Internally, access functions in the allocator and dataflow routines check that a specific candidate is in the selected candidate set. Candidates that are not in the set are ignored. Candidates in the set interfere only with each other or with physical registers. Each invocation of the register allocator uses symbolic stack memory accesses for spill code. Therefore the memory stack must be finalized after the last set of candidates has been allocated, because only then can all stack references be resolved. Figure 45 shows an example of a serial scalable allocator assuming eight physical registers r1, ..., r8. In the example, two sets have been selected and the partitions are:

S = {S1, S2} = {{V1, V2, ..., V4999}, {V5000, ..., V9999}}

The first instance of the allocator assigns e.g. r1 and r2 to V1 and V2 respectively, r8 to V4999, and spills V1000. When the allocator is invoked for the second partition, V9999 e.g. is assigned r8, since V9999 does not interfere with r8. After the second and final partition, spill addresses can be allocated in the memory stack.

Candidates   1st Allocation   2nd Allocation   Result
V1           r1               r1               r1
V2           r2               r2               r2
...          ...              ...              ...
V1000        spill            spill            spill 16
...          ...              ...              ...
V4999        r8               r8               r8
V5000        ignored          spill            spill 24
...          ...              ...              ...
V9999        ignored          r8               r8

Figure 45 Example for Serial Scalable Allocator for Registers {r1, ..., r8}

The parallel version of the scalable allocator is configurable depending on the number of candidates and processors/cores available. Also, parallelization can be achieved in many ways. For example, the physical register set can be partitioned into n sets, R = {R1, ..., Rn}. Similarly, the set of candidates can be partitioned into n or a multiple of n sets: S = {S1, ..., Sn, Sn+1, ..., Skn}. In this case registers in Ri get assigned to candidates in Si, Si+n, ..., Si+(k-1)n. Partitioning ensures that the candidates and the physical registers are compatible. This means, for example, that a set Si of floating point candidates is allocated to a partition of physical floating point registers Ri. In Figure 46 a simple register file with general registers r1, ..., r4 and floating-point registers f1, f2 is partitioned into three sets. Any filter as described above may be used to partition the candidates.

Global candidates V1-V10 can be assigned r1 or r2, V11-V100 can be assigned r3 or r4, and V101-V120 and the local candidates v1-v9 can be assigned f1 or f2. The PAllocate() routine will then spawn three allocations (S1, R1), (S2, R2) and (S3, R3) in parallel.

R = {R1, R2, R3} = {{r1, r2}, {r3, r4}, {f1, f2}}
S = {S1, S2, S3} = {{V1, ..., V10}, {V11, ..., V100}, {V101, ..., V120, v1, ..., v9}}

Figure 46 Example for Partition of Register File and Candidates

Partitioning the physical register file could be too restrictive. Since relatively few physical registers are available, the number of partition sets and/or the number of registers available per partition set must be restricted. The number of partition sets is an upper limit on the parallelism available for an allocation problem. Partitioning could also result in avoidable spill code. Since the registers in the partition sets must be disjoint, a single partition has relatively few registers available for assignment. But these restrictions can be worked around by staging the allocation. A more general method is to use virtual registers for each partition in a first allocation step. This reduces the size of the allocation problem, unless the interference graph is a complete graph. In the second and final step, the allocation algorithm assigns physical registers to all candidates. The point of this two step (or hierarchical) allocation scheme is that in the first step no physical register is committed and the number of candidates is reduced by virtual allocation. The original interference graph does not have to be built. In this scheme no limit on the number of virtual target registers is assumed. A suitable set of virtual colors may be chosen for each candidate set, and all virtual target registers can be assumed to be disjoint, enabling parallel incarnations of the allocator. Figure 48 illustrates the concept. The candidates are split into three integer sets and one floating-point set (Figure 47). They are allocated in parallel to virtual target registers. In the example, there are 768 integer and 128 floating point virtual target registers. Therefore, after the parallel allocation, the allocation problem has been reduced to 896 candidates. The target size of the virtual register partition can be chosen dynamically so that spill code in the parallel step can be avoided. This is evident from the fundamental inequality (Equation 4, p. 54): for a candidate partition the maximal number M of neighbors in the interference graph can be determined for the set before coloring. Forming a set of M+1 virtual target registers is then sufficient to color the set.
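A sketch of that bound (the adjacency query on the partition's interference graph is hypothetical):

    def virtual_colors_needed(partition, neighbors):
        """By the fundamental inequality, M+1 virtual target registers are
        sufficient to color a partition, where M is the maximal number of
        interference-graph neighbors of any candidate in the partition."""
        m = max((len(neighbors(c)) for c in partition), default=0)
        return m + 1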

R = {R1, R2, R3} = {{ivr1, ..., ivr256}, {fvr1, ..., fvr256}, {ivr257, ..., ivr512}}
S = {S1, S2, S3} = {{V1, ..., V75000}, {V75001, ..., V150000}, {V150001, ..., V225000}}

Figure 47 Example for Virtual Register and Candidate Partitions

Candidates    1st Allocation: Parallel    2nd Allocation: Serial    Result

(Schematic of Figure 48: in the parallel step the candidates V1, ..., V225000 are mapped to virtual target registers such as ivr10, ivr30, ivr257, ivr258, ivr500 and fvr12, fvr100; the serial step then maps these virtual registers to physical registers such as r1, r2, f5, f6, r8 or to spill slots at offsets 16, 24 and 32.)

Figure 48 3-Way Parallel Scalable Allocator with Staged Allocation

There can be more partitions in the set S of candidates than in the set R of registers, but the number of partitions in R is the maximal number of parallel allocations. After all candidate sets have been allocated in parallel, the virtual colors get allocated to physical registers in a final round of allocation, but on a much smaller problem space. As in the serial configuration, the parallel scalable allocator must take care of the memory stack layout in a post-allocation phase. It is conceivable to allow multiple stages of allocation with virtual target registers.

In summary, the scalable allocator partitions the set of candidates. Partitioning can be as simple as dividing the candidates (of one class, e.g. integer or floating-point) into equal subsets. Since the candidates are partitioned before allocation, data flow analysis runs only for the selected subset (and physical registers). In the serial allocation scheme the allocator is invoked for one subset at a time. The input to invocation N+1 is the output of invocation N. Each invocation takes into account interferences with physical registers that have been assigned in previous invocations. In the parallel allocation scheme the physical registers are partitioned also. Allocation can be run for subsets of candidates. There can be as many allocations running in parallel as there are partitions of the physical registers. In the parallel scheme interferences between subsets of candidates can be ignored since they are guaranteed to get assigned different registers. In either scheme (serial or parallel) a hierarchical allocator can use virtual registers to avoid spills at the first stage of allocation and reduce the number of candidates.
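To make the parallel scheme concrete, here is a minimal driver sketch in the spirit of Figure 44 (the allocate callback, which colors one candidate partition against one register partition, is hypothetical and assumed thread-safe because the partitions are disjoint):

    from concurrent.futures import ThreadPoolExecutor

    def scalable_parallel_allocation(candidate_sets, register_sets, allocate,
                                     alloc_spill_memory):
        """Run up to len(register_sets) allocations in parallel (a sketch)."""
        j = len(register_sets)   # upper limit on available parallelism
        with ThreadPoolExecutor(max_workers=j) as pool:
            for start in range(0, len(candidate_sets), j):
                batch = candidate_sets[start:start + j]
                # Register partitions are disjoint, so interferences between
                # candidates of different partitions can be ignored.
                futures = [pool.submit(allocate, cands, regs)
                           for cands, regs in zip(batch, register_sets)]
                for f in futures:
                    f.result()   # wait; propagate allocation errors
        # The stack frame can be finalized only after the last partition.
        alloc_spill_memory()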


9 Related Work

This thesis describes extensions of a coloring allocator covering features provided by the Itanium architecture like the dynamic register stack, control- and data speculation and predicated code. The idea of dynamically resizing the register stack is in the original IA-64 design, and can be seen in the resize semantics of the alloc instruction (Intel manuals [42]). The first description of compiler algorithms exploiting the semantics of the alloc instruction and applying it to general regions is Hoflehner and Pierce [38]. Douillet et al. [27] evaluated the idea in the ORC compiler on paths with high register pressure, but they measured negative results on six CINT2000 benchmarks: although they saved RSE traffic, they measured a slowdown of 27.34% in the geomean and suggested the cost of the alloc instruction is responsible for it. Other papers come to different conclusions. Settle et al. [69] apply multiple alloc to call sites and show static improvements, although the run-time of the CPU2000 benchmarks did not improve for their prototype implementation in the Intel Itanium compiler. For the production version of the algorithm Hoflehner et al. [39] report a 1% speed-up on Oracle TPC-C. This thesis gives a review of this algorithm and shows results for an improved version on the CPU2006 benchmark suite. The gains for this suite are still at about 1%, but this has to be seen in perspective. Overall the performance cost of RSE traffic is much less than 10% (Desai et al. [26]), so a 1% run-time performance gain corresponds to reducing the dynamic RSE traffic by more than 10%. Yang et al. [78] take an interprocedural approach, which they call the "quota assignment problem". The observation is that when RSE traffic kicks in for the functions on the call stack, it might be cheaper to actually spill some registers in the compiler and reduce the stack frame size. In this form it is somewhat reminiscent of Chow's shrink-wrapping approach [21]. Yang et al. report improvements from their interprocedural approach on two CINT2000 benchmarks, perlbmk (13%) and crafty (3.7%). Their numbers are for the ORC compiler, for which they report very high RSE traffic numbers compared to the Intel Itanium production compiler [26]. They don't compare their method with a multiple alloc implementation for call sites, which might have given them performance gains at a smaller implementation cost. Weldon et al. [75] study the effectiveness of the RSE and propose hardware mechanisms to reduce RSE spills and fills. Register allocation for speculated code seems to be neglected in the literature. The focus there is on headroom studies for performance gains from speculation (e.g. Wu et al. [77]) and general frameworks to handle recovery code (e.g. Lin et al. [53]). One major purpose of these papers is to expose speculation at a higher level IR, pass it on to the lower level IR and show how classical optimizations like PRE can exploit it. On the other hand, this thesis shows how to extend a coloring allocator to handle control- and data speculated code effectively. Our concern about control speculation is program correctness. In particular the compiler needs to generate code that cannot cause a NaT consumption fault at run-time. With respect to data speculation, this thesis describes how the allocator can reduce interferences for data speculated live ranges that end in a chk.a and have a penultimate use.⁴ This idea was also mentioned in Bharadwaj et al. [9], but not described in detail. There is a rich body of literature covering predication and representing predicate relations. The IMPACT compiler uses the Predicate Hierarchy Graph (PHG) (Mahlke et al. [56]). For each definition of a predicate the PHG tracks the predicates that guard the definition. It can also handle basic Boolean expressions and is applied to analyze predicated code of a hyperblock. A hyperblock is a predicated superblock, which is an acyclic single entry multiple exit region in a control flow graph. The PHG is mainly used to derive predicate disjointness. Rather than applying the PHG for predicate-aware dataflow analysis, the IMPACT compiler uses reverse if-conversion (Warter et al. [74]) to convert predicated code to control flow. On this graph it performs classical data flow analysis. A more sophisticated approach than the PHG is the predicate query system (PQS), which is based on the predicate partition graph (PPG). For PQS the key references are Johnson and Schlansker [45] and Gillies et al. [31]. Johnson and Schlansker [45] focus on the analysis of predicated code. They model predicate relations in terms of relations between execution sets and introduce the predicate partition graph to answer queries about predicate relations. PQS can accurately determine predicate relations that

⁴ If a live range ends in a chk.a but has no penultimate use, an ld.c (load check) should have been used instead. But the IA-64 architecture does not require the use of an ld.c.

can be expressed as logical partitions. Two predicates P2 and P3 form a predicate partition P1 when P1 is the union of P2 and P3, and P2 and P3 cannot both be true simultaneously (= at the same program point). In their paper Johnson and Schlansker [45] show how to build the PPG starting from predicated code, give pseudo-code for the PQS queries, including lub_sum and lub_diff (see Appendix 12.3, p. 141), and apply their system for predicate-aware live variable analysis. But their paper has no experimental evaluation of their method. Gillies et al. [31] can be seen as the follow-up to Johnson and Schlansker [45]. Their paper shows how to construct the predicate partition graph from the control-flow graph and discusses in some detail predicate-aware live variable analysis and predicate-aware interference construction. They show that predicate-aware allocation reduces register pressure for predicated code statically between 20% and 70% on 23 procedures picked from the SPEC92 benchmark suite. Their paper has no run-time data. In preparation of PPG construction they add basic blocks (JS blocks) on critical edges, but they only claim to do so to "simplify the creation of partitions" (p. 118). We showed that the blocks increase the accuracy of the analysis of predicated code (cf. the discussion of partition and overlap live ranges in Chapter 6.3, p. 83). Gillies et al. [31] also discuss the start of a live range problem (see Figure 15) and mention a case where predicate-aware data flow must be conservative across back edges, which is similar to our example in Figure 28. Since PQS based analysis can determine the start of a live range, the PQS papers do not discuss predicate-aware available variable analysis. In the classical case (= no predication) the purpose of available variable analysis is determining the start of a live range. For acyclic, if-converted regions predicate partitions are sufficient to derive this information. Our approach is close to Gillies in the sense that we assume predicated code is derived from compiler generated control flow graphs. But rather than focusing on the predicate analysis system, we ask the basic question of what kinds of live ranges exist and what information is necessary to model their interferences in the presence of predicated code. We identified four types of predicated live ranges, match, dominate, partition and overlap, two methods for interference modeling, "simple" and "complex", and three implementation strategies. The simple method models interferences of all "match", "dominate" and some "partition" live ranges precisely, and the remaining live ranges conservatively (Theorem 6-1). The complex method models all live ranges precisely (Theorem 6-2). In implementation strategy 1, only match and dominate live ranges are modeled precisely. We show that this method gives run-time performance comparable to PQS at lower compile time and, implicitly, lower implementation and maintenance cost. Strategy 2 models all simple live ranges, which include all "match", "dominate" and some "partition" live ranges, precisely, and strategy 3 models interferences of all live ranges precisely. Strategy 3 is equivalent to a PQS based implementation when predicated code is derived from control-flow code. We don't build extra structure like a PPG and use "classical" techniques only to handle the different classes of predicated live ranges. But a PQS based allocator, which includes the PPG, is used as a reference implementation to show the effectiveness of our method. Our result depends on two assumptions: First, the if-converter is conservative and if-converted regions are relatively simple. This may give a bias to "match" and "dominate" live ranges. Second, an increase in register pressure due to handling "partition" and "overlap" live ranges conservatively (by allowing false interferences for them) can be tolerated by the (relative to other architectures) large register file of IA-64. When the two assumptions are not valid, the choices are to implement strategy 2, strategy 3 or a scheme like PQS. The implementer also has the option to pick and choose different allocation strategies for different functions or even regions at compile time. The core of Chapter 6, which discusses register allocation for predicated code, can also be found in Hoflehner [40]. More subtle predicate analysis methods that derive accurate predicate relations for already predicated code have been developed as well. Eichenberger [28] represents logical predicate relations, so called P-facts, and determines predicate relations, in particular predicate interference, using a logic solver. He applies this information for register allocation in a hyperblock. Sias et al. [71] developed the predicate analysis system (PAS), which is as accurate as Eichenberger's, but can determine predicate relations globally using a BDD solver. Eichenberger's and Sias' approaches were evaluated in a research environment. It is not clear that their methodologies are practical for a production compiler. In contrast this thesis makes no attempt to address the general predicate relation problem, based on the premise that the key predicate relation the coloring allocator needs is predicate disjointness. Two predicates are disjoint if they are not true at the same program point. Since predicates materialize as block predicates, this information can be derived directly from the control flow graph. Our allocator is built on the assumption that predicated code is derived from acyclic control flow graph regions. If this assumption were violated, the allocator would still produce correct results by assuming all "unknown" predicates interfere with each other and with the block predicates. Coloring allocators cannot handle allocation problems of arbitrary size. To resolve this compile time problem many methods have been devised to reduce the allocation space. The ideas revolve around partitioning a given routine into regions (e.g. Hank [35]) or partitioning the interference graph. Callahan and Koblenz [15] describe a general region allocation scheme ("Hierarchical Graph Coloring"). They partition the control flow graph into a set of tiles. Tiles are sets of basic blocks with additional properties, so that a tile tree can be constructed: two tiles are either disjoint or contained (tile 1 is a subset of tile 2 or vice versa) and there is a single root tile. Then graph-coloring is applied to each tile (region) in a bottom-up walk of the tree. At tile boundaries the allocations are reconciled. Reconciliation is necessary since a live range L1 may get assigned register r1 in tile 1 and register r2 in tile 2. At the boundary of tile 1 and tile 2 a "reconciliation" move from e.g. r1 to r2 must be inserted. Hierarchical graph coloring covers loop trees and graph partitions based on single-entry single-exit ("SESE") regions. It is noteworthy that their allocator uses "pseudo" registers, which are assigned physical registers in a reconciliation phase. Norris and Pollack [61] pursue region based allocation in a similar fashion, but based on the program dependence graph ("PDG"). Statements guarded by the same control statement form an allocation region. Their regions may have multiple exits. Fusion-based allocation partitions the control flow graph into arbitrarily disjoint regions (Lueh et al. [55]). The idea of the fusion allocator is to delay spilling until the interference graphs of two simplifiable regions get fused. When the combined graph would no longer be simplifiable, the fusion operator, based on feedback profiling information, attempts to split live ranges in order to minimize spill code at the region boundaries. In all these methods the allocation problem is partitioned into a set of sub-problems that can be solved more efficiently than the original problem. The cost of the methods is extra book-keeping to glue together the local (per region) solutions at boundaries. But there are functions generated in the framework of large server applications that exhibit hundreds of thousands of register candidates. Region based

methods are usually focused on loops, but can be tuned to handle general regions and partition large programs. But they also rely on global dataflow computation, which itself can consume a significant amount of memory. In addition, they rely on global data structures to reconcile allocations at region boundaries. Also, it seems a hard problem to parallelize region-based allocators without synchronization in the form of region reconciliations. Gupta et al. [33] proposed clique separators for partitioning the interference graph. A clique separator is a clique that partitions a graph into two disjoint components. The clique allocator computes spans (definition-use chains) and identifies a set of clique separators. Each span can be contained in (at most) a fixed number of sub-graphs. Each sub-graph is colored separately, while it includes the nodes of a separating clique. The final graph is composed from the sub-graphs, possibly with renumbering of assigned registers and spilling (or register copies) at separator boundaries. Given n nodes and m clique separators, the clique allocator consumes O(n²/m²) space and O(n²/m) time. As for region based methods, additional cost comes from extra book-keeping to glue together the allocations for the sub-graphs and from the reliance on global dataflow computation, which itself can consume a significant amount of memory. In contrast, this thesis proposed a family of scalable allocators, which do not rely on region shapes and take the scalability of dataflow algorithms into account. Since the focus is on partitioning the symbolic registers before allocation, the dataflow analysis to determine live ranges needs to consider only the candidates contained in the current partition and the physical registers. A scalable allocator can be engineered to solve the allocation problem in parallel and does not need boundary reconciliation. It is capable of effectively allocating programs with very large sets of register candidates. We showed how a scalable allocator can also be used for parallel allocation, e.g. on multi-core machines, so that each core can run an "allocation thread" for a subset of candidates. Parallel allocation applies in particular to machines with large register files, since the register files can be partitioned and each allocation is associated with one partition. This guarantees that candidates in different partitions are independent: since they get assigned different registers, interferences between classes of candidates can be ignored. Scalable allocation can result in higher spill cost and performance degradation (assuming the allocation problem can be solved in a classic coloring allocator), since the register file is partitioned a priori and in fact more registers may get used than necessary. This can hurt, e.g., IA-64 in the form of higher RSE traffic. Scalable allocators can be seen as extensions of candidate splitting ideas. Splitting the candidates is often implicit in the coloring heuristics. Well-known examples include coloring basic block local candidates first, then the global candidates (e.g. Briggs [12]), or dividing the candidates into classes, like floating-point and integer candidates (see e.g. Chapter 4). We demonstrated the potential of a simple allocator that divides the candidates into constant subsets and allocates one subset at a time. It can effectively solve allocation problems for the "f_serverapp" test case, which has more than 500K candidates in a single function (Section 10.2.3).


10 Results

We obtained the performance data on a 1.6 GHz Montecito processor using the Intel Fortran/C++ optimizing product compiler (version 11.1, 2009). The detailed configuration is listed in Table 8. The benchmark suite is CPU2006, a popular industry-standardized CPU-intensive suite used by OEMs for stressing a system's processor, memory subsystem and compiler.

Table 8 Experimental Setup

Processor   Intel Itanium 2 (Montecito) Processor, 1.6 GHz
Compiler    Intel Fortran/C++ Compiler (Version 11.1)
Memory      4 GB Main, 16 KB L1D, 16 KB L1I, 256 KB L2D, 1 MB L2I, 12 MB L3 D+I
OS          Red Hat Enterprise Linux AS Release 4 (Kernel 2.6.9-36.EL #1 SMP)

CPU2006 has two sub-suites. The integer suite (CINT2006) consists of twelve benchmarks containing general user applications like compilers, interpreters, games, simulators and databases. A rough overview of the CINT2006 benchmarks can be found in Table 9. The floating-point suite (CFP2006) contains a set of 17 applications relevant in high-performance computing, including equation solvers, physical, chemical and biological system simulations, speech recognition systems and a ray tracer. More details about the CFP2006 benchmarks are in Table 10. To give a better idea about the relative complexity of the benchmarks, Table 9 and Table 10 show the lines of code (LOC), compile times (CT) normalized to the compile time of 470.lbm, and the number of hot routines (HR), i.e. the routines in which 90% or more of the total benchmark execution time is spent. As an example, the normalized compile time of 470.lbm is 1; it takes 189 times as long to compile 403.gcc. The number of hot routines varies a lot across the benchmarks. For example, 456.hmmer has only a single hot routine where 90% or more of total execution time is spent. This characteristic suggests 456.hmmer behaves more like an HPC (high-performance computing) application, which tend to have a few hot loops. On the other hand, CFP2006, which represents floating-point computing intense (and thus HPC) applications, has benchmarks like 481.wrf, where 90% or more of the total execution time is spread across 33 routines.


Table 9 SPEC CPU2006 Integer Benchmarks

Benchmark       LOC     CT     HR  LANG   Description
400.perlbench   155432  47.2   33  C      Based on perl V5.8.7
401.bzip2       8293    4.2    8   C      Based on bzip2 V1.0.3
403.gcc         518781  189    61  C      Based on gcc V3.2 generating code for Opteron
429.mcf         2685    1.6    2   C      Single-Depot Scheduling in Public Mass Transportation
445.gobmk       197215  40.2   39  C      Go Game Execution and Position Analysis
456.hmmer       35992   10.4   1   C      Sensitive Database Searching
458.sjeng       13487   5.8    14  C      A highly-ranked Chess Program
462.libquantum  4805    2      3   C      Quantum Computer Simulation running Shor's Algorithm
464.h264ref     51578   69.2   14  C      Video Compression Standard
471.omnetpp     48159   33.6   34  C++    Discrete Event Simulation of an Ethernet Network
473.astar       5842    2.2    3   C++    2D Path Finding Library used in Game AI
483.xalancbmk   326504  331    21  C,C++  XML Parser

Published SPEC performance numbers are a combination of two pairs of metrics, speed vs. rate and base vs. peak. Speed measures the time it takes to finish a single benchmark on the system, while rate is a throughput measure of how many parallel instances of a benchmark the system can handle in a certain time. Base and peak metrics refer to the compiler options used to compile the benchmarks. Base is more restrictive and attempts to represent build options any application can use to get good compiler performance. For example, all benchmarks written in the same language must use the same compiler options, and no feedback profiling is allowed. Peak does not have such restrictions. Any user visible compiler option can be used for any benchmark, and profile data on the train input set(s) may be generated. To be SPEC compliant each benchmark must be run 3 times on the reference input data set. Only SPEC compliant runs get accepted by the SPEC committee for publication on the SPEC website (http://www.spec.org). A full SPEC compliant run for a base speed publication for both CINT2006 and CFP2006 can take about 24 hours, depending on system configuration and compiler options.

Table 10 SPEC CPU2006 Floating-Point Benchmarks

Benchmark      LOC     CT      HR  LANG       Description
410.bwaves     918     4.6     2   Fortran    Computation of 3D Laminar Viscous Flow
416.gamess     932818  1649.4  10  Fortran    Atomic and Molecular Electronic Structure Analysis
433.milc       15042   154     5   C          Generator of Gauge Field with Dynamical Quarks
434.zeusmp     37326   74      8   Fortran    Astrophysical Phenomena Simulator
435.gromacs    87736   47.4    8   C          Newtonian Equation Solver
436.cactusADM  104047  40      1   Fortran    Einstein Equation Solver
437.leslie3d   3807    10.4    6   Fortran    Large-Eddy Simulations in 3D
444.namd       5315    9.2     9   C++        Simulation of Large Biomolecular Systems
447.dealII     199654  305     17  C++        Adaptive Finite Elements and Error Estimation
450.soplex     41417   25.4    14  C++        Linear Program Solver using the Simplex Algorithm
453.povray     157825  52      18  C++        Ray Tracer
454.calculix   49927   192.4   18  Fortran    Finite Element Solver
459.gemsFDTD   11580   40      4   Fortran90  3D Maxwell Equations Solver
465.tonto      143152  901.2   31  Fortran90  Quantum Chemistry Package
470.lbm        1176    1       1   C          Simulation of Incompressible Fluids in 3D
481.wrf        217896  1101.2  33  Fortran90  Weather Research and Forecasting System
482.sphinx3    207732  10.4    5   C          Speech Recognition System

Data presented in this chapter are collected for a (non-compliant) single speed run using base compiler options. In base the compiler supports a wide variety of optimizations such as whole program optimizations, interprocedural optimizations like inlining, software prefetching, loop transformations, software pipelining, predication, global code scheduling and graph-coloring based register allocation. With base options no dynamic profile data may be used. Run-time performance data are normalized relative to the baseline; performance and compile time data are relative to base options. Static data have been collected for implementations in the 11.1 compiler.

10.1 Dynamic and Static Results

The run-time benefits of multiple alloc and predicate-aware register allocation are shown in Table 11 for CINT2006 and in Table 12 for CFP2006. Multiple alloc is effective on CINT2006, with a gain of 1.9% on 400.perlbench and 1.18% on 458.sjeng. The overall gain in the geomean is 0.41%. These gains have to be seen in perspective. For example, on both 400.perlbench and 458.sjeng, the stall cycles due to RSE are about 5% (Desai et al. [26]). The 1.9% gain in 400.perlbench and the > 1% gain in 458.sjeng mean a more than 20% reduction in RSE stall cycles for these benchmarks. In the case of 400.perlbench, all of the gains are due to the reduction of stall cycles in the self-recursive function S_regmatch(), in which the benchmark spends more than 38% of its execution time.

Table 11 CINT2006 Performance Gains for MA and Predicate-aware (PA) Allocation

Benchmark        Multiple Alloc  PA w/ Strategy 1  PA w/ PQS  PA Delta
400.perlbench    1.90%           37.67%            37.67%     0.00%
401.bzip2        0.11%           5.01%             5.01%      0.00%
403.gcc          0.26%           1.71%             1.45%      -0.26%
429.mcf          0.00%           0.93%             0.93%      0.00%
445.gobmk        0.98%           2.91%             2.91%      0.00%
456.hmmer        0.00%           1.20%             1.20%      0.00%
458.sjeng        1.18%           8.01%             8.01%      0.00%
462.libquantum   0.55%           0.00%             0.37%      0.37%
464.h264ref      0.00%           0.00%             1.03%      1.03%
471.omnetpp      0.00%           0.40%             0.40%      0.00%
473.astar        0.00%           0.97%             0.00%      -0.96%
483.xalancbmk    0.00%           0.00%             0.00%      0.00%
Geomean          0.41%           4.48%             4.50%      0.01%

Table 12 CFP2006 Performance Gains for Predicate-aware (PA) Allocation

Benchmark | PA w/ Strategy 1 | PA w/ PQS | PA Delta
410.bwaves 0.00% 0.00% 0.00%
416.gamess 8.74% 8.74% 0.00%
433.milc 0.00% 0.00% 0.00%
434.zeusmp 0.00% 0.00% 0.00%
435.gromacs 2.90% 2.90% 0.00%
436.cactusADM 1.03% 1.03% 0.00%
437.leslie3d 0.58% 0.58% 0.00%
444.namd 0.84% 0.84% 0.00%
447.dealII 0.00% 0.00% -0.69%
450.soplex 0.23% 0.23% 0.23%
453.povray 1.07% 1.07% 0.11%
454.calculix 4.86% 4.86% 0.00%
459.GemsFDTD 0.84% 0.84% 0.83%
465.tonto 2.42% 2.42% 0.00%
470.lbm 0.65% 0.65% 0.32%
481.wrf 0.68% 0.68% 0.00%
482.sphinx3 1.36% 1.36% 0.00%
Geomean 1.52% 1.52% 0.05%

Floating-point benchmarks did not show gains from multiple alloc and are not shown in Table 12. The gains from predicate-aware register allocation are for two different

implementations: the PQS-based implementation and the allocator that models only match and dominate live ranges precisely. In Section 6.3 we classified four types of predicated live ranges, match, dominate, partition, and overlap, and showed that simple live range tracking gives precise interference for match and dominate live ranges, as well as for some partition and overlap live ranges. When a use predicate does not post-dominate all partition predicates, or definitions overlap on many (two or more) paths, complex live range tracking techniques must be employed to model predicated live ranges precisely. The implementation that tracks liveness under the qualifying use predicates and marks first predicate definitions gives practically identical run-time performance to the PQS-based implementation. This suggests that relatively simple predicate-awareness in the coloring allocator can reap the performance potential. The simple predicate-aware allocator models predicated live ranges precisely only for match and dominate live ranges; it is conservative for all partition and overlap live ranges.

For the experiment, live ranges were first completed in the original control flow graph of the candidate region. Then a region-based reaching definition analysis was performed. Together with dominator information, this is sufficient to classify the predicated live ranges in the region. When a live range falls into multiple classes, only the "most complex" class (overlap > partition > dominate > match) is accounted for. The distribution of predicated live ranges is shown in Table 13 and Table 14. In CINT2006 only two benchmarks, 401.bzip2 and 471.omnetpp, have more than 10% (11.44% and 12.01%) partition and overlap live ranges; for all other benchmarks this number is below 10%. In CFP2006 five benchmarks (410.bwaves, 416.gamess, 433.milc, 447.dealII and 465.tonto) have more than 10% partition and overlap live ranges. For all benchmarks, overlap live ranges usually account for less than 1%; the notable exceptions are 410.bwaves (4.99%), 401.bzip2 (2.61%) and 483.xalancbmk (2.39%). The data in the tables were collected for all predicated live ranges in all predicated regions of a benchmark. Since only relatively few overlap live ranges exist, a system like PQS or complex live range tracking is not necessary for precise predicated live range modeling. The data and code analysis suggest that the complex tracking cases are very rare. If conservative disjointness for partition and overlap live ranges is a concern, strategy 2 can model more than 95% of the predicated live ranges precisely for all benchmarks.
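The "most complex" ranking can be sketched compactly. The following C++ is illustrative only: the relation record is an assumed interface over the region-based reaching definitions and dominator information, not the thesis implementation.

#include <algorithm>
#include <vector>

// Ordered from least to most complex, so std::max picks the worst class.
enum class PredClass { Match = 0, Dominate = 1, Partition = 2, Overlap = 3 };

// Relation facts for one def-use pair of a live range, assumed to be
// precomputed from reaching definitions and the dominator tree.
struct DefUseRelation {
  bool same_predicate;      // def and use under the same qualifying predicate
  bool use_postdominates;   // use predicate post-dominates the def predicate
  bool defs_partition_use;  // def predicates partition the use predicate
};

PredClass Classify(const DefUseRelation& r) {
  if (r.same_predicate) return PredClass::Match;
  if (r.use_postdominates) return PredClass::Dominate;
  if (r.defs_partition_use) return PredClass::Partition;
  return PredClass::Overlap;
}

// A live range that falls into multiple classes is accounted for under its
// most complex class only.
PredClass ClassifyLiveRange(const std::vector<DefUseRelation>& pairs) {
  PredClass worst = PredClass::Match;
  for (const DefUseRelation& r : pairs) worst = std::max(worst, Classify(r));
  return worst;
}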

The run-time result for the predicate-aware allocator is interesting for another reason as well: the gains from the predicate-aware allocator can be higher than the gains from predication itself. A case in point is 400.perlbench, where the overall gain from if-conversion is 1.95% (Table 15), but the gain from the predicate-aware allocator is greater than 37%. This can be explained as follows: without a predicate-aware allocator, live ranges in predicated code extend to outer loop boundaries. This increase in register pressure can only be avoided with a predicate-aware allocator. For example, the 37.67% gain in 400.perlbench is due to reduced RSE traffic in S_regmatch, the hottest (and self-recursive) function of the benchmark. Without the predicate-aware allocator all 96 registers on the register stack get allocated; with the predicate-aware allocator only about half that number of registers are used. The increase in register pressure without the predicate-aware allocator can be explained by live range extensions in loops (Section 6.1). Another way to look at this is that a predicate-aware allocator must be present to secure the gains from if-conversion. But, at least for the SPEC benchmarks, a simple predicate-aware allocator is sufficient to secure the performance gains from if-conversion.

Table 13 Distribution of Predicated Live Ranges for CINT2006

Benchmark | match | dominate | partition | overlap
400.perlbench 73.92% 16.16% 9.46% 0.46%
401.bzip2 63.27% 22.68% 11.44% 2.61%
403.gcc 71.59% 20.78% 6.80% 0.83%
429.mcf 72.64% 20.46% 6.42% 0.48%
445.gobmk 77.94% 18.84% 2.91% 0.31%
456.hmmer 69.86% 22.67% 6.68% 0.79%
458.sjeng 73.68% 18.26% 7.37% 0.69%
462.libquantum 74.51% 16.67% 8.39% 0.44%
464.h264ref 74.70% 16.32% 8.87% 0.11%
471.omnetpp 66.77% 21.04% 12.01% 0.17%
473.astar 75.73% 18.63% 4.70% 0.94%
483.xalancbmk 69.81% 20.98% 6.82% 2.39%

Table 14 Distribution of Predicated Live Ranges for CFP2006

Benchmark | match | dominate | partition | overlap
410.bwaves 35.12% 48.24% 11.65% 4.99%
416.gamess 51.37% 36.90% 10.72% 1.01%
433.milc 48.36% 39.08% 11.99% 0.57%
434.zeusmp 63.94% 28.86% 6.15% 1.06%
435.gromacs 66.32% 24.39% 8.56% 0.72%
436.cactusADM 59.76% 30.35% 9.04% 0.84%
437.leslie3d 63.92% 26.51% 9.57% 0.00%
444.namd 64.15% 28.29% 7.56% 0.00%
447.dealII 68.70% 19.73% 11.42% 0.15%
450.soplex 75.06% 17.44% 6.74% 0.77%
453.povray 71.34% 21.76% 6.26% 0.64%
454.calculix 64.14% 28.44% 6.89% 0.52%
459.GemsFDTD 70.30% 20.06% 9.46% 0.18%
465.tonto 59.06% 30.11% 10.31% 0.52%
470.lbm 88.97% 10.29% 0.74% 0.00%
481.wrf 60.15% 31.10% 8.50% 0.25%
482.sphinx3 58.93% 33.30% 7.66% 0.11%

Table 15 Performance Gains from Predication for SPEC CPU2006

CINT2006 Predication Gains
400.perlbench 1.95%
401.bzip2 1.40%
403.gcc 1.18%
429.mcf 2.83%
445.gobmk 9.62%
456.hmmer 0.40%
458.sjeng 6.04%
462.libquantum 0.18%
464.h264ref 4.30%
471.omnetpp 0.20%
473.astar 26.67%
483.xalancbmk 1.70%
Geomean 4.49%

CFP2006 Predication Gains
410.bwaves 2.94%
416.gamess 6.23%
433.milc 0.78%
434.zeusmp 0.00%
435.gromacs 2.90%
436.cactusADM 0.69%
437.leslie3d 0.00%
444.namd 0.00%
447.dealII 2.86%
450.soplex 0.92%
453.povray 8.36%
454.calculix 7.86%
459.GemsFDTD 1.69%
465.tonto 1.60%
470.lbm 5.84%
481.wrf 0.00%
482.sphinx3 3.72%
Geomean 3.02%

The advanced UNAT algorithm is effective and statically delivers the expected result: it removes all spills of the ar.unat register. The data in Table 16 show that the new algorithm removes every ar.unat spill that the simple, conservative algorithm inserts.

On SPEC, run-time gains from this optimization did not materialize, because the redundant ar.unat spills were only in cold code. Most of the ar.unat references were in loop-intensive functions, where the conservative NaT propagation algorithm propagated the NaT property outside the loops or into cold outer loops. In cold code the extra ar.unat spills don't hurt. The data in Table 16 show that the static number of removed ar.unat references for CINT2006 is significant in 403.gcc, 458.sjeng, and 483.xalancbmk. For CFP2006, 416.gamess, 435.gromacs, 453.povray and 481.wrf benefit from the advanced UNAT algorithm.

Table 16 Effectiveness of advanced UNAT Algorithm

CINT2006 #spills/fills saved
400.perlbench 0
401.bzip2 0
403.gcc 35503
429.mcf 0
445.gobmk 889
456.hmmer 0
458.sjeng 90
462.libquantum 0
464.h264ref 1642
471.omnetpp 0
473.astar 0
483.xalancbmk 2664

CFP2006 #spills/fills saved
410.bwaves 0
416.gamess 11698
433.milc 0
434.zeusmp 0
435.gromacs 141
436.cactusADM 0
437.leslie3d 0
444.namd 0
447.dealII 0
450.soplex 0
453.povray 0
454.calculix 0
465.tonto 0
470.lbm 0
481.wrf 21926
482.sphinx3 0

10.2 Compile Time Data

The code generator in the Itanium compiler has major optimization phases like software pipelining, if-conversion, (global) code scheduling and register allocation. Since it also supports many classical optimizations, it is a full optimizing compiler back-end. The coloring allocator is usually not the top compile time consumer, although it has to manage aggressively optimized, predicated, pipelined and speculated

code. For comparison, Table 17 and Table 18 show the relative compile time spent in the code generator overall and in the software pipeliner, the code scheduler, and the register allocator.

Table 17 Compile Time Distribution of Code Generator Phases in CINT2006

Benchmark | Code Generation | Software Pipelining | Scheduling | Register Allocation
400.perlbench 38% 0% 17% 9%
401.bzip2 38% 7% 15% 8%
403.gcc 42% 5% 18% 7%
429.mcf 43% 9% 19% 6%
445.gobmk 41% 6% 17% 6%
456.hmmer 44% 10% 16% 7%
458.sjeng 36% 2% 15% 8%
462.libquantum 43% 9% 16% 6%
464.h264ref 39% 16% 11% 5%
471.omnetpp 31% 0% 11% 6%
473.astar 45% 15% 18% 4%
483.xalancbmk 33% 1% 15% 6%

Table 18 Compile Time Distribution of Code Generator Phases in CFP2006

Benchmark | Code Generation | Software Pipelining | Scheduling | Register Allocation
410.bwaves 37% 11% 11% 7%
416.gamess 31% 10% 5% 12%
433.milc 14% 7% 5% 1%
434.zeusmp 35% 12% 12% 5%
435.gromacs 37% 8% 15% 6%
436.cactusADM 36% 7% 14% 6%
437.leslie3d 30% 2% 10% 9%
444.namd 29% 7% 9% 6%
447.dealII 29% 5% 11% 3%
450.soplex 34% 5% 15% 4%
453.povray 34% 9% 14% 3%
454.calculix 24% 4% 8% 5%
459.GemsFDTD 25% 2% 7% 6%
465.tonto 25% 2% 10% 5%
470.lbm 41% 26% 9% 1%
481.wrf 35% 4% 18% 7%
482.sphinx3 42% 11% 15% 6%

The compile times are measured with triple runs using the compile time measurement capability of the compiler, which can be activated with the -i_tapi option. For example, for 403.gcc 42% of the total compile time is spent in the code generator, 5% in the

pipeliner, 18% in the scheduler and 7% in the allocator; the remaining 12% of the code generator time is spent in other phases, including the if-converter. The tables show that in practice the register allocator in the Itanium compiler is not a compile time bottleneck. In general the time spent in the allocator is well below 10%, with the exception of 416.gamess, where the allocator takes 12%.

10.2.1 Cost of Predicate-Aware Allocation

Table 19 and Table 20 show the compile time cost of predicate-aware register allocation. The base is the predicate-unaware allocator: it does not model predicated live ranges for global variables. In particular, it conservatively assumes that any two predicated live ranges interfere and that no predicated definition of a global live range starts the live range, unless the definition is in acyclic code. In cyclic code (loops), the start of the live range is recognized implicitly at the pre-header of the outermost loop of the loop nest that contains the definitions. This is driven by the approximation of a live range as the intersection of a forward available variable analysis and a backward live variable analysis (see Chapter 3.2, p. 37). On average the cost of predicate-aware allocation is about 46% for CINT2006 and 25% for CFP2006 for the match-or-dominate strategy. A PQS-based predicate-aware allocator is, compared to the match-or-dominate allocator, on average 5% slower on CINT2006 and 12% slower on CFP2006. The match-or-dominate allocator models match and dominate live ranges precisely, but is conservative for partition and overlap live ranges. Conservative means that the match-or-dominate allocator may infer false interferences between two live ranges; in other words, it may assume that two live ranges interfere although they do not. The extra interferences can sometimes hurt overall compile time: register assignment time can increase, and extra reconciliation can become necessary at region boundaries. Very few benchmarks show this behavior, and the only relevant example is the 4.2% slowdown for 464.h264ref. Ignoring compile time differences of less than 1%, for all other benchmarks the PQS-based allocator is more compile-time expensive than match-or-dominate without, at least for the given if-converter, giving performance benefits (cf. Table 11 and Table 12 for performance). It is curious that the predicate-aware allocator can actually speed up allocation. The two major examples for this phenomenon are 416.gamess

and 444.namd in CFP2006 (Table 20), where predicate-aware allocation improves allocation time by at least 6% and up to 18.89%. The reason is that the predicate-unaware allocator "sees" more interference and needs more allocation iterations (Chapter 3).
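The conservative treatment under the match-or-dominate strategy can be written down as a simple interference test. The following C++ sketch is illustrative only; the types and the disjointness query are assumptions, not the production allocator.

struct PredLiveRange {
  int qualifying_pred;     // predicate the liveness is tracked under
  bool precisely_modeled;  // true for match and dominate live ranges only
};

// Placeholder for the simple predicate relation query; a real implementation
// would consult the tracked predicate sets.
static bool KnownDisjoint(int /*p*/, int /*q*/) { return false; }

// Two simultaneously live candidates interfere unless both are precisely
// modeled and their predicates are provably disjoint; partition and overlap
// live ranges are treated conservatively and may yield false interferences.
bool Interfere(const PredLiveRange& a, const PredLiveRange& b) {
  if (!a.precisely_modeled || !b.precisely_modeled) return true;
  return !KnownDisjoint(a.qualifying_pred, b.qualifying_pred);
}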

Table 19 CINT2006 - Compile Time Cost of Predicate-Aware Register Allocation

Benchmark | PA w/ PQS | PA w/ Strategy 1 (S-1) | PQS vs. S-1
400.perlbench 64.26% 63.99% 0.28%
401.bzip2 55.94% 50.90% 5.04%
403.gcc 107.02% 98.22% 8.80%
429.mcf -2.56% -2.71% 0.14%
445.gobmk 1.74% 2.30% -0.56%
456.hmmer 42.67% 40.83% 1.84%
458.sjeng 31.80% 17.88% 13.92%
462.libquantum 55.20% 45.60% 9.60%
464.h264ref 2.35% 6.55% -4.20%
471.omnetpp 135.03% 129.73% 5.31%
473.astar 52.41% 45.94% 6.47%
483.xalancbmk 66.40% 52.27% 14.12%

Base: Predicate-Unaware Register Allocator

Table 20 CFP2006 - Compile Time Increase of Predicate-Aware Register Allocation

Benchmark | PA w/ PQS | PA w/ Strategy 1 (S-1) | PQS vs. S-1
410.bwaves 14.48% 0.27% 14.20%
416.gamess -6.24% -10.88% 4.65%
433.milc 27.46% 16.16% 11.30%
434.zeusmp 7.63% 3.95% 3.68%
435.gromacs 38.66% 17.67% 20.99%
436.cactusADM 99.70% 88.91% 10.79%
437.leslie3d 16.14% 10.24% 5.90%
444.namd -17.46% -18.89% 1.43%
447.dealII 62.51% 41.32% 21.19%
450.soplex 45.01% 28.61% 16.40%
453.povray 125.18% 104.59% 20.59%
454.calculix 12.74% 10.90% 1.85%
459.GemsFDTD 9.77% -3.54% 13.31%
465.tonto 7.27% 5.35% 1.92%
470.lbm 34.75% 27.12% 7.63%
481.wrf 16.40% 5.29% 11.11%
482.sphinx3 37.84% 29.00% 8.84%

Base: Predicate-Unaware Register Allocator


10.2.2 Cost of Speculation-Aware Allocation

In Chapter 7 we discussed speculation-aware allocation. For control speculation, we showed that, under the assumption that the destination of a speculative load (ld.s) and the source of its chk.s match (i.e., are the same symbolic or virtual register), an effective algorithm exists that avoids all spills of the ar.unat register. The allocator must be speculation-aware to guarantee program correctness. Table 21 shows the cost of control-speculation awareness. For the advanced NaT propagation algorithm (Section 7.1.2), the average allocation cost is 10.26% for CINT2006 and 7.3% for CFP2006. Data for control speculation were measured by turning speculation support off in the allocator. Since allocation time in general is less than 10% of the overall compile time (see Table 17 and Table 18), speculation support itself consumes less than 1%.
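The matching precondition can be checked with a simple scan. The following C++ sketch is illustrative; the instruction record is a stand-in for the compiler IR, not the production data structures.

#include <unordered_set>
#include <vector>

// Simplified instruction record (assumption for illustration).
struct Inst {
  enum Kind { LdS, ChkS, Other } kind;
  int symreg;  // ld.s destination or chk.s source
};

// Precondition of the spill-free algorithm: every chk.s checks the same
// symbolic register that some ld.s defined.
bool SpeculativePairsMatch(const std::vector<Inst>& insts) {
  std::unordered_set<int> lds_dests;
  for (const Inst& i : insts)
    if (i.kind == Inst::LdS) lds_dests.insert(i.symreg);
  for (const Inst& i : insts)
    if (i.kind == Inst::ChkS && lds_dests.count(i.symreg) == 0) return false;
  return true;
}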

Table 21 Compile Time (CT) Cost of Control Speculation

CINT2006 CT cost of speculation
400.perlbench 7.60%
401.bzip2 6.50%
403.gcc 8.42%
429.mcf 24.76%
445.gobmk 9.75%
456.hmmer 9.93%
458.sjeng 7.35%
462.libquantum 10.74%
464.h264ref 5.95%
471.omnetpp 7.90%
473.astar 13.60%
483.xalancbmk 10.57%
Average 10.26%

CFP2006 CT cost of speculation
410.bwaves 3.38%
416.gamess 3.29%
433.milc 10.32%
434.zeusmp 6.49%
435.gromacs 7.59%
436.cactusADM 6.94%
437.leslie3d 3.46%
444.namd 8.18%
447.dealII 9.57%
450.soplex 10.55%
453.povray 10.75%
454.calculix 6.54%
459.GemsFDTD 5.50%
465.tonto 7.50%
470.lbm 8.00%
481.wrf 5.02%
482.sphinx3 10.94%
Average 7.30%


10.2.3 The Case for Scalable Register Allocation

Since region-based allocation methods require global structure, for example for region reconciliation and in data flow analysis, this thesis proposed a scalable approach that partitions the symbolic registers and possibly the register file(s). Figure 49 shows the normalized compile times for a serial scalable allocator. The (non-standard) test case for the evaluation is "f_serverapp", a generated function in a major industry server application with more than 500K register candidates. The x-axis in Figure 49 shows the number of partitions; the y-axis shows normalized compile times, where the time for eight partitions is one. For the measurements, the global symbolic registers (candidates) were divided into 2, 4, 8, 16 and 32 subsets of equal size. The allocator is run once per subset: the output of allocation N is part of the input of allocation N+1. For example, the input to the second run is the symbolic registers of the second partition plus the physical registers that the candidates of the first partition were assigned. The number of partitions determines the number of allocation runs.
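The driver loop of the serial scheme can be sketched as follows. The candidate and assignment types and the run_allocator callback are illustrative assumptions, not the production interface.

#include <algorithm>
#include <functional>
#include <vector>

struct Assignment { int symreg, physreg; };

// One allocator run over a candidate subset; earlier assignments are passed
// in as fixed (pre-colored) registers. Supplied by the caller.
using AllocRun = std::function<std::vector<Assignment>(
    const std::vector<int>&, const std::vector<Assignment>&)>;

// Serial scalable allocation: split the candidates into num_partitions
// (>= 1) equal subsets and feed the output of run N into run N+1.
std::vector<Assignment> SerialScalableAlloc(const std::vector<int>& candidates,
                                            int num_partitions,
                                            const AllocRun& run_allocator) {
  std::vector<Assignment> assigned;
  const size_t chunk =
      (candidates.size() + num_partitions - 1) / num_partitions;
  for (int n = 0; n < num_partitions; ++n) {
    const size_t lo = std::min(candidates.size(), n * chunk);
    const size_t hi = std::min(candidates.size(), lo + chunk);
    std::vector<int> subset(candidates.begin() + lo, candidates.begin() + hi);
    // The output of allocation N becomes part of the input of allocation N+1.
    std::vector<Assignment> out = run_allocator(subset, assigned);
    assigned.insert(assigned.end(), out.begin(), out.end());
  }
  return assigned;
}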

[Graph: normalized compile time (Time, 0 to 5) vs. number of partitions (#Partitions: 1, 2, 4, 8, 16, 32)]

Figure 49 Compile Time for Serial Scalable Allocator

The graph of Figure 49 is close to a line up to 8 partitions. Let CT(2^k) be the compile time for a run with 2^k partitions. Then CT(2^k) = (√2)^(-1) · CT(2^(k-1)) for k = 1, 2, 3 gives the approximate relation between the partitions.

For more than 8 partitions (16 and 32), compile time increases; in these cases allocation cost outweighs the savings from partitioning. The parallel scalable allocator also partitions the register file accordingly. For example, when there are 8 partitions of the register candidates, the available (physical) registers are also partitioned by a factor of 8. For a parallel scalable allocator, the compile time must decrease with the number of partitions. The x-axis in Figure 50 shows the number of partitions; the y-axis shows normalized compile times, where the time for 32 partitions is one. The number (>150) for one partition is not shown in the graph. The numbers for the parallel allocator are experimental and optimistic: for each partitioning scheme the compile time was measured for one specific partition (both candidates and machine registers) and then extrapolated. Also, the synchronization overhead in the case of spill code is not taken into account. Since candidates in different partitions may interfere (although they are assumed not to), they cannot share the same spill location. That they cannot share the same register is guaranteed by partitioning the register file(s). The small number of registers available per partition did not have a material compile time impact in our experiment.
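A minimal sketch of the register file split for the parallel variant follows; the types are illustrative, and a real implementation would slice each register file (integer, floating-point, predicate) separately.

#include <cstddef>
#include <vector>

// With P partitions, partition n may only allocate from its own slice of the
// physical register file. This guarantees that candidates of different
// partitions never share a register, even though their mutual interference
// is not computed.
std::vector<int> RegistersForPartition(const std::vector<int>& phys_regs,
                                       int num_partitions, int n) {
  std::vector<int> slice;
  for (std::size_t i = 0; i < phys_regs.size(); ++i)
    if (static_cast<int>(i % num_partitions) == n)
      slice.push_back(phys_regs[i]);
  return slice;
}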

[Graph: normalized compile time (Time, 0 to 35) vs. number of partitions (#Partitions: 2, 4, 8, 16, 32)]

Figure 50 Compile Time for Parallel Scalable Allocator

While the compile time data for scalable allocators are preliminary, they show the potential of the technique. The "best" configuration of a scalable allocator depends on the allocation problem; there is no one-size-fits-all solution. For example, for our "golden" test case, 8 partitions are the compile time sweet spot (Figure 49), but for allocation problems with half the number of candidates, 4 partitions are likely to be the better choice.


11 Conclusions and Future Work

This thesis described extensions of a coloring allocator covering features provided by the Itanium architecture like predicated code, control- and data speculation and the dynamic register stack.
• Predicated code: It classified predicated live ranges and showed that classical techniques can be used effectively to engineer efficient coloring allocators for predicated code. Specifically, when predicated code is generated from compiler control flow, expensive predicate analysis frameworks like PQS don't have to be employed.
• Speculated code: It described a new method of efficiently allocating speculated live ranges, avoiding spill code generated by a more conservative method. The new method solves the NaT propagation problem efficiently.
• Dynamic register stack: It presented methods to control the register stack dynamically and effectively, in particular for regions with function calls and/or pipelined loops.
• Scalable allocation: It proposed the scalable allocator as a generic coloring method capable of allocating effectively programs with a large set of register candidates. It demonstrated that this method can also be used for parallel allocation, e.g. on multi-core machines.
Despite the rich body of work in the field of register allocation, there is plenty of future work, in the context of this thesis and beyond. We conclude this thesis with a small selection:
1. The IA-64 register stack can be dynamically partitioned. In particular, the (scratch) out registers on top of the stack are not limited to 8. In addition to partitioning schemes, more general allocation regions can be investigated.
2. The interface between pipeline-aware register allocation and a coloring allocator seems to be mostly neglected in the literature; pipeline allocators and coloring allocators are treated separately. On IA-64, rotating registers are a scarce resource and the pipeline allocator leaves candidates to the coloring allocator. The impact of pipeline-awareness and its potential benefits in a coloring allocator have not been investigated or published.

3. The classification of predicated live ranges in Chapter 6 is intuitive and the proofs of the theorems in Section 6.3 are informal. A formal theory that derives the classification and proves the theorems would put the results on a stronger mathematical foundation.
4. The scalable allocator is at a conceptual stage. More experiments need to be conducted to explore the concept and its configurations, especially with respect to the cost and benefits of parallel allocations.
5. There is no compiler framework that includes and evaluates the merits of the various ideas for coloring allocators. Results are reported in different experimental environments, which makes them difficult to compare and makes it hard to judge the relative merits of the proposed methods.
6. There seems to be no study that compares optimal methods with best-of-class coloring allocators and linear scan allocators on a wide class of benchmarks.

12 Appendix

12.1 Assembly Code Example

// Instruction / cycle (//N:) / comments
{ .mmi
        alloc r14=ar.pfs,1,8,0,8        //0: [1] res = 1;
        add r8=1,r0                     //0: [1]
        mov r11=ar.lc                   //0:
}
{ .mib
        cmp4.eq.unc p7,p0=r32,r0        //1: [2] if (n==0) return res;
        mov r40=pr                      //1:
  (p7)  br.cond.dpnt .b1_3 ;;           //1: [2]
}
// Block 1: Pred: 0 Succ:  -- Initializations
{ .mii
        nop.m 0
        mov ar.ec=1                     //0: [3] epilog count (ec): ec=1
        zxt4 r3=r32                     //0:
}
{ .mii
        mov r33=1                       //0:
        mov pr.rot=0x10000              //0: [4] set first rotating stage predicate p16=1
        add r2=-1,r3 ;;                 //0:
}
{ .mib
        nop.m 0
        mov ar.lc=r2                    //1: [5] set loop count (lc)
        nop.b 0 ;;
}
.b1_4:  // Block 4: pipelined Pred: 1 4 Succ: 4 3
{ .mmi                                  // Pipelined loop:
  (p16) setf.sig f32=r8                 //0:   for (i=1; i<=n; i++) {
  (p16) setf.sig f33=r33                //0:     res = res * i;
        nop.i 0 ;;                      //    }
}
{ .mfi
        nop.m 0
  (p16) xma.l f34=f32,f33,f0            //6:
        nop.i 0 ;;
}
{ .m_mi
  (p16) getf.sig r8=f34 ;;              //10: [6] r8 is result and return register.
  (p16) add r32=1,r33                   //14:
        nop.i 0
}
{ .mib
        nop.m 0
        nop.i 0
        br.ctop.sptk .b1_4 ;;           //14:
}
// Block 3: exit epilog modified Pred: 1 4
.b1_3:
{ .mi_i
        nop.m 0
        mov ar.lc=r11 ;;                //0:
        mov pr=r40,0x1003e              //1:
}
{ .mib
        nop.m 0
        nop.i 0
        br.ret.sptk.many b0 ;;          //1: [2] return res; // in r8
}

Figure 51 Itanium Assembly for the Factorial Function

12.2 Edge Classification, Irreducibility and Disjointness

This section lists a few basic facts about graphs that are not commonly found in the literature. Let G(V,E) be a directed graph. The edges E can be partitioned as follows. Let PRE(n) be the preorder number and RPO(n) the reverse post-order number of node n, and let h → t be an edge from node h to node t. Then a depth-first traversal (dft) yields the following edge partition:
• If RPO(h) > RPO(t), then h → t is a retreat edge.
• Otherwise, if PRE(h) < PRE(t), then h → t is a tree or forward ("advancing") edge.
• Otherwise (PRE(h) > PRE(t) and RPO(h) < RPO(t)), h → t is a cross edge.
An edge h → t is a back edge if t dominates h. A special case of a back edge is a self loop, where h is identical to t (and trivially PRE(h)==PRE(t) and RPO(h)==RPO(t)). The edge partition is not absolute, but relative to the depth-first search (dfs). For example, a tree edge in one traversal could be a cross edge in another traversal, or a retreat edge in one traversal could become a forward edge in another dfs. The reason is that the edge selection during dfs is non-deterministic, and different selections of edges change node numbering and edge classification (partitioning). A graph is reducible if all retreat edges are back edges. Back edges are invariant retreat edges: any depth-first search recognizes them as retreat edges. If there is a retreat edge that is not a back edge, the graph is irreducible. Irreducibility can be recognized indirectly, but simply: if the classical dataflow algorithm for dominator calculation does not terminate in two iterations (the first iteration computes the dominator tree, the second checks termination), the graph is irreducible. For irreducible graphs, global disjointness, which is computed on the acyclic graph with retreat edges removed, may not match run-time disjointness. Figure 52 demonstrates this with two depth-first traversals. In depth-first traversal B1 → B2 → B3 → B4 → B5 → B6 the reverse post-order numbers for the blocks are (B1|1), (B2|2), (B3|3), (B4|4), (B5|5) and (B6|6). The edge from B5 to B3 is recognized as a retreat edge.

Removing this retreat edge and computing disjointness yields that B2 and B6 are not disjoint. Since B3 does not dominate B5, the retreat edge is not a back edge. In depth-first traversal B1 → B4 → B5 → B3 → B6 → B2 the reverse post-order numbers for the blocks are (B1|1), (B2|2), (B3|6), (B4|3), (B5|4) and (B6|5). The edge from B3 to B4 is recognized as a retreat edge. Removing this retreat edge and computing disjointness yields that B2 and B6 are disjoint. Since B4 does not dominate B3, the retreat edge is not a back edge. Gillies et al. [31] suggest there can be "inaccuracies" for global predicate relations, but in fact there is a fundamental stability problem: the global disjointness computed by the compiler may not match run-time disjointness. This mismatch could result in register overwrites. For example, if B4, B5 and B6 are if-converted, and the retreat edge is B5 → B3, a local live range in B5 could overwrite a global live range with a use in B3. On the other hand, removing only back edges, but not all retreat edges, gives conservative global disjointness information, since cycles remain in the graph.
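The numbering and classification can be sketched compactly. The following C++ is illustrative and omits the dominator computation needed to distinguish back edges from other retreat edges.

#include <functional>
#include <vector>

struct DfsNums { std::vector<int> pre, rpo; };

// One depth-first traversal assigning preorder (PRE) and reverse post-order
// (RPO) numbers, both 1-based. Nodes are 0..n-1; entry must reach all nodes.
DfsNums Number(const std::vector<std::vector<int>>& succ, int entry) {
  const int n = static_cast<int>(succ.size());
  int pre_counter = 0, finished = 0;
  DfsNums d{std::vector<int>(n, 0), std::vector<int>(n, 0)};
  std::function<void(int)> dfs = [&](int v) {
    d.pre[v] = ++pre_counter;
    for (int s : succ[v])
      if (d.pre[s] == 0) dfs(s);
    d.rpo[v] = n - finished++;  // RPO = n + 1 - postorder position
  };
  dfs(entry);
  return d;
}

enum class EdgeKind { TreeOrForward, Cross, Retreat };

// Classification of an edge h -> t relative to this traversal. A retreat
// edge is additionally a back edge only if t dominates h (not shown).
EdgeKind Classify(const DfsNums& d, int h, int t) {
  if (d.rpo[h] >= d.rpo[t]) return EdgeKind::Retreat;  // includes self loops
  if (d.pre[h] < d.pre[t]) return EdgeKind::TreeOrForward;
  return EdgeKind::Cross;  // PRE(h) > PRE(t) and RPO(h) < RPO(t)
}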

[Figure: irreducible control flow graph with blocks B1 through B6; the two depth-first traversals discussed above start at B1]

Figure 52 Example for an Irreducible Graph

12.3 PQS Queries

In PQS a block predicate P also represents its execution set, the set of traces for which P is set (true). In this interpretation we can speak of subsets of predicates and of partitions of predicates. It is the basis of the predicate partition graph (PPG), which is used by PQS queries to derive predicate relations. The key queries are lub_sum(P,E) and lub_diff(P,E), which are used in the predicate-aware dataflow analysis routines and in interference graph construction. The least upper bound sum (lub_sum(P,E) in Figure 53) gives the smallest superset of the execution sets represented by the union of predicate P and the set of predicates E. The resulting set is reduced to parent nodes in the PPG for each complete partition that contains P. One application of the least upper bound sum is to determine the set of predicates under which a candidate is live, if it is live under E and under P. The approximation is necessary since the available predicates may not be sufficient to express the union of predicate sets accurately in all cases.

Set reduce(Predicate P, Set E)
  // if R = P|Qi and all Qi ∈ E, replace P|Qi by R

Set lub_sum(Predicate P, Set E)
  Set E' = {}; // empty set
  foreach Q in E
    if (Q ⊂ P) continue;
    elif (P ⊆ Q) return E;
    else E' = E' + Q; fi
  endfor
  return (reduce(P, E'));

Figure 53 PQS Query Least Upper Bound Sum

The least upper bound difference (lub_diff(P,E) in Figure 54) computes the smallest superset of the execution set represented by the set difference of predicate set E and predicate P. One application of the least upper bound difference is to determine the set of predicates under which a candidate is live, if it is live under E and killed under P. The

approximation is necessary since the available predicates may not be sufficient to express the difference of predicate sets accurately in all cases. Intuitively, the least upper bound difference computes representatives of all paths on which a candidate could still be live after it is killed under predicate P.

Set rel_cmpl(Predicate P, Predicate Q)
  if (!is_subset(P,Q)) return {}; // empty set
  E' = {}; // empty set
  foreach path from Q to P
    foreach edge R → S
      foreach partition R → S|T
        E' = E' + T
      endfor
    endfor
  endfor
  return E';

Set approx_diff(Predicate P, Predicate Q)
  // find_lca(P,Q): "least common ancestor of P and Q"
  return rel_cmpl(P, find_lca(P,Q));

Set lub_diff(Predicate P, Set E)
  Set E' = {}; // empty set
  foreach Q in E
    if (Q ⊂ P) continue;
    elif (P ⊆ Q) E' = E' + rel_cmpl(P,Q);
    elif (is_disjoint(P,Q)) E' = E' + Q;
    else E' = E' + approx_diff(P,Q); fi
  endfor
  return (reduce(P, E'));

Figure 54 PQS Query Least Upper Bound Difference
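For illustration only, the following C++ sketch executes the control flow of lub_sum on predicates represented directly as execution sets (bitsets over a made-up trace universe). This is an assumption for the example: a real PQS derives subset relations from the PPG instead, and the final reduce() step, which collapses complete partitions to their parent, is elided.

#include <bitset>
#include <vector>

using ExecSet = std::bitset<64>;  // illustrative trace universe

static bool IsSubset(const ExecSet& a, const ExecSet& b) {
  return (a & ~b).none();
}

// Follows the control flow of lub_sum in Figure 53.
std::vector<ExecSet> LubSum(const ExecSet& p, const std::vector<ExecSet>& e) {
  std::vector<ExecSet> result;
  for (const ExecSet& q : e) {
    if (IsSubset(q, p) && q != p) continue;  // Q ⊂ P: P will cover Q
    if (IsSubset(p, q)) return e;            // P ⊆ Q: E already covers P
    result.push_back(q);
  }
  result.push_back(p);  // stands in for reduce(P, E')
  return result;
}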


13 List of Figures

Figure 1 IA-64 Programming Building Blocks ...... 12
Figure 2 Unpredicated and If-converted Hammock ...... 19
Figure 3 If-converted Region with Exit Branch ...... 20
Figure 4 Register Files and Partitions ...... 22
Figure 5 Alloc instruction and Register Stack Frame ...... 23
Figure 6 Snapshots of Stacked Register Usage ...... 24
Figure 7 Register Stack - Growth and Shrinkage ...... 25
Figure 8 Register File, Frames and Backing Store ...... 26
Figure 9 Chaitin-style Register Allocator ...... 32
Figure 10 Illustration of Spill Code Insertion ...... 34
Figure 11 Chaitin-style Allocator with Optimistic Coloring ...... 35
Figure 12 Chow-style Allocator ...... 36
Figure 13 Illustration of simple "rename" phase ...... 38
Figure 14 Live Variable and Available Variable Analyses ...... 39
Figure 15 Example for a non-strict Live Range ...... 39
Figure 16 Interference Graph Construction Scheme ...... 40
Figure 17 Simplification Phase and Coloring of Interference Graph ...... 41
Figure 18 Illustrations of Chaitin's Spill Rules ...... 44
Figure 19 Bernstein et al: Three heuristic Spill Functions ...... 45
Figure 20 Definitions of Interference ...... 53
Figure 21 Components and Flow of Itanium Compiler Backend ...... 55
Figure 22 Five Iterations of Pipelined Loop with three Stages and II=4 ...... 56
Figure 23 One Iteration and Kernel Schedule for a Loop ...... 58
Figure 24 Control Flow and Predicated Region ...... 61
Figure 25 Register Allocator Architecture ...... 62
Figure 26 Proof of Concept to show Effectiveness of Multiple Alloc ...... 66
Figure 27 Shrinking Register Stack before Call ...... 67
Figure 28 Liveness must be propagated under "p0" across Back Edge ...... 76
Figure 29 Example with Control flow Graph and Source Code Snippets ...... 79
Figure 30 Predicate Partition Graph (PPG) for Example in Figure 29 ...... 80
Figure 31 Predicated Live Ranges under PQS ...... 81
Figure 32 Add and Delete Routines in PQS-based Allocator ...... 82
Figure 33 Fundamental Relations between Predicated Definitions and Uses ...... 84
Figure 34 Predicate-aware Allocation with Partitions ...... 86
Figure 35 Extra Interference with Partition Tracking ...... 88
Figure 36 Complex Live Range Tracking ...... 90
Figure 37 Completed Candidate Region and Live Ranges ...... 92
Figure 38 Predicate Partition Graph for Figure 37 ...... 93
Figure 39 Predicate Sets for Live Ranges in Figure 37 with Complex Tracking ...... 94
Figure 40 Predicate Sets for Live Ranges in Figure 37 with PQS ...... 95
Figure 41 Spill Addresses and AR.UNAT Register ...... 99
Figure 42 Spill/Fill Code in case of AR.UNAT overflow ...... 100
Figure 43 5-bit AR.UNAT Register and Stack Memory Layout Options ...... 103

Figure 44 Serial and Parallel Scalable Register Allocation ...... 108
Figure 45 Example for Serial Scalable Allocator for registers {r1, …, r8} ...... 110
Figure 46 Example for Partition of Register File and Candidates ...... 111
Figure 47 Example for Virtual Register and Candidate Partitions ...... 112
Figure 48 3-Way Parallel Scalable Allocator with Staged Allocation ...... 112
Figure 49 Compile Time for Serial Scalable Allocator ...... 133
Figure 50 Compile Time for Parallel Scalable Allocator ...... 134
Figure 51 Itanium Assembly for the Factorial Function ...... 138
Figure 52 Example for an Irreducible Graph ...... 140
Figure 53 PQS Query Least Upper Bound Sum ...... 141
Figure 54 PQS Query Least Upper Bound Difference ...... 142


14 List of Examples

Example 1 Control Speculation with Recovery Code ...... 28
Example 2 Data Speculation with Recovery Code ...... 30
Example 3 Two Pipelined Loops in Single Alloc Routine ...... 68
Example 4 SWP Loops with Multiple Alloc ...... 70
Example 5 Source Code, Predicated Code and Predicated Live Range ...... 72
Example 6 Live Range Extension in Predicated Code ...... 73
Example 7 Control Speculation and Two Methods of Recovery Code Generation ...... 98
Example 8 Data Speculation with Recovery Code and Alat Live Range ...... 105


15 List of Tables

Table 1 IA-64 Instruction Types and Execution Unit Types ...... 13
Table 2 A-type Instructions ...... 14
Table 3 I-type and LX-type Instructions ...... 15
Table 4 M-type Instructions ...... 16
Table 5 F-type Instructions ...... 16
Table 6 B-type Instructions ...... 17
Table 7 Research Categories and Goals ...... 42
Table 8 Experimental Setup ...... 121
Table 9 SPEC CPU2006 Integer Benchmarks ...... 122
Table 10 SPEC CPU2006 Floating-Point Benchmarks ...... 123
Table 11 CINT2006 Performance Gains for MA and Predicate-aware (PA) Allocation ...... 124
Table 12 CFP2006 Performance Gains for Predicate-aware (PA) Allocation ...... 124
Table 13 Distribution of Predicated Live Ranges for CINT2006 ...... 126
Table 14 Distribution of Predicated Live Ranges for CFP2006 ...... 127
Table 15 Performance Gains from Predication for SPEC CPU2006 ...... 127
Table 16 Effectiveness of advanced UNAT Algorithm ...... 128
Table 17 Compile Time Distribution of Code Generator Phases in CINT2006 ...... 129
Table 18 Compile Time Distribution of Code Generator Phases in CFP2006 ...... 129
Table 19 CINT2006 - Compile Time Cost of Predicate-Aware Register Allocation ...... 131
Table 20 CFP2006 - Compile Time Increase of Predicate-Aware Register Allocation ...... 131
Table 21 Compile Time (CT) Cost of Control Speculation ...... 132


16 List of Theorems

Theorem 6-1 (Characterization of Simple Live Range Tracking) ...... 86
Theorem 6-2 (Characterization of Complex Live Range Tracking) ...... 91


17 Glossary

Symbols

K number of colors (= number of machine registers)
V1, V2, … global register candidates
v1, v2, … local register candidates
A, B, C, … global variables

Definitions

Application: A program. For example, a SPEC benchmark is an application.
Available variable: A variable is available at a point when there is at least one path from program entry to the point that contains a definition of the variable.
Backedge, back edge: Edge t → h in a directed graph, where h dominates t.
Chordal graph, chordal: In every cycle of length >= 4 there is a chord. A chord is an edge to a cycle node that is not an immediate neighbor. An efficient graph-coloring algorithm is known when the interference graph is chordal. Chordal graphs are a subset of perfect graphs.
Chromatic number: Exact number of colors needed to color a graph. Determining the chromatic number of an arbitrary graph is NP-complete.
Control flow graph: A directed graph.
Clique: A complete subgraph. All nodes in the subgraph are pairwise adjacent ("mutually connected").

Clique number: Order of the largest clique in a graph.
Completion: The insertion of an (empty) basic block on a critical edge.
Constrained node: Node n with degree(n) >= K.
Control speculation ("breaking the branch barrier"): Compiler optimization that hoists a chain of instructions starting at a load above one or more controlling branches.
Critical edge: An edge from basic block A to B is critical if A has two or more successors and B has two or more predecessors.
Data speculation ("breaking the store barrier"): Compiler optimization that hoists a chain of instructions starting at a load above one or more (possibly dependent) stores.
Degree of a node n (= degree(n)): Number of edges of a node in an undirected graph.
Disjoint predicates: Set of predicates of which no two can be 'true' at any given program point.
Dominance: Node A dominates node B if all paths from the single entry block to B contain A.
END block: Last node in a control flow graph. Every control flow graph can be transformed to have a unique END node.
End of live range: Last use of a symbolic register on any path from START to END. Note that some authors consider the last use to be the start of the live range, e.g. Gillies et al. [31].
EPIC: Explicitly parallel instruction computing.

Execution set: Set of execution traces.
Execution trace: Set of predicated instructions in a predicated region.
Hyperblock: Predicated superblock.
ILP: Instruction level parallelism.
Interference graph: Undirected graph G(V, E). The set V of nodes represents register candidates. There is an edge between nodes A and B if they cannot be assigned the same register ("A and B interfere").
Irreducible graph: Control flow graph that is not reducible.
JS block: Join-Split block. Basic block inserted on a critical edge. Purpose: improves the accuracy of predicate analysis (associated with a JS block is a block predicate).
LP: Linear programming.
Live range: Set of program points where a variable is live and available.
Live variable: A variable is live at a point when there is at least one path from the point to a program exit that contains a use of the variable.
NaT bit: "Not a Thing": extra bit in a general register that signals a speculation fault or exception.
NaT consumption fault: Exception or fault that occurs because the NaT bit is set unexpectedly.
NaT producer: A speculative load instruction (ld.s).

NP-noise: Observation that a local assignment can change the global allocation outcome ("chaotic behavior").
Partition (of predicates): Set of disjoint predicates.
Path: Sequence of instructions or basic blocks in a control flow graph.
Perfect graph: Graph whose clique number equals its chromatic number.
Predicate set: Set of predicates under which a live range is live. This is non-standard terminology.
Predicate partition graph (PPG): Directed acyclic graph whose nodes are predicates and whose labeled edges represent partition relations between predicates. In PPG queries, predicates are interpreted as execution sets. A PPG is complete if it contains a unique root node from which all nodes are reachable.
Predication: Conditional execution of an instruction guarded by a qualifying predicate.
Pre-materialization: Shrinks re-computable live ranges before register allocation.
Program: Synonym for function or procedure.
Reconciliation code: A live range may be assigned different locations (e.g. two different registers, or a register and memory). Instructions that map between different assignments (e.g. a move to reconcile different register assignments) constitute reconciliation code.
Reducible graph: Control flow graph that has no retreat edges ("acyclic graph") or in which all retreat edges are back edges.

RSE: Register Stack Engine. A unit on the Itanium processor managing the dynamic register stack.
Significant node: Interference graph node with more than K edges.
Simplification phase: Phase in a graph coloring register allocator that maps the nodes of the interference graph onto the coloring stack. It uses the simplification criterion to remove unconstrained nodes from the interference graph. When no unconstrained nodes can be found, it uses spilling heuristics.
Simplification criterion (classical version): Remove an unconstrained node from the interference graph and push it on the coloring stack.
Spilling: 1. Allocating memory to a symbolic register. 2. Removal of a node from the interference graph.
Speculation: Early execution of an instruction.
START block: First node in a control flow graph. Every control flow graph can be transformed to have a unique START node.
Start of live range: First definition of a symbolic register on any path from START to END. The definition can be implicit and is determined by data flow analysis as the first point on any path from START to END where the symbolic register is both live and available. Note that some authors consider the first definition to be the end of a live range, for example Gillies et al. [31].
Strict program: Let V be any variable. On each path from program entry to a use of V there is a definition of V.

Superblock: Control flow graph structure with a single entry and multiple exits. This is one generalization of a basic block.
Unconstrained node: Node n with degree(n) < K.

18 References

[1] A. V. Aho, M.S. Lam, R. Sethi, and J. D. Ullman, "Compilers: Principles, Techniques, & Tools", Addison Wesley, Second Edition 2007
[2] J. R. Allen, K. Kennedy, C. Porterfield, and J. D. Warren, "Conversion of Control Dependence To Data Dependence", in Proceedings of the 10th ACM Symposium on Principles of Programming Languages, POPL'83, January 1983, pp. 177-189
[3] A.W. Appel, "Modern Compiler Implementation in Java", Cambridge University Press, First Edition 1997
[4] A.W. Appel and L. George, "Optimal Spilling for CISC Machines with Few Registers", Programming Language Design and Implementation (PLDI), June 2001, pp. 243-253
[5] I.D. Baev, R.E. Hank, and D.H. Gross, "Prematerialization: reducing register pressure for free", 15th International Conference on Parallel Architecture and Compilation Techniques (PACT 2006), pp. 285-294
[6] P. Bergner, P. Dahl, D. Engebretsen, and M. O'Keefe, "Spill code minimization via interference region spilling", Programming Language Design and Implementation (PLDI), June 1997, pp. 287-295
[7] D. Bernstein, D. Q. Goldin, M.C. Golumbic, H. Krawczyk, Y. Mansour, I. Nashon, and R.Y. Pinter, "Spill code minimization techniques for optimizing compilers", Proceedings of the ACM SIGPLAN '89 Programming Language Design and Implementation (PLDI), July 1989, pp. 258-263
[8] J. Bharadwaj, W. Y. Chen, W. Chuang, G. Hoflehner, K. Menezes, K. Muthukumar, and J. Pierce, "The Intel IA-64 Compiler Code Generator", IEEE Micro, Sep./Oct. 2000, pp. 44-52
[9] J. Bharadwaj, K. Menezes, and C. McKinsey, "Wavefront Scheduling: Path Based Data Representation and Scheduling of Subgraphs", Proceedings of the 33rd International Symposium on Microarchitecture, MICRO-33, December 2000, pp. 262-271
[10] F. Bouchez, A. Darte, and F. Rastello, "On the Complexity of Register Allocation", IEEE International Symposium on Code Generation and Optimization (CGO'07), March 2007, pp. 102-114
[11] P. Briggs, K.D. Cooper, and L. Torczon, "Rematerialization", Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation (PLDI), June 1992, pp. 311-321
[12] P. Briggs, "Register Allocation via Graph Coloring", PhD thesis, Rice University, April 1992
[13] P. Briggs, K.D. Cooper, and L. Torczon, "Improvements to Graph Coloring Register Allocation", ACM Transactions on Programming Languages and Systems, Vol. 16, No. 3, May 1994, pp. 428-455
[14] P. Briggs, K.D. Cooper, K. W. Kennedy, and L. Torczon, "Digital computer register allocation and code spilling using interference graph coloring", US Patent 5,249,295 (1993)
[15] D. Callahan and B. Koblenz, "Register Allocation via Hierarchical Graph Coloring", Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, June 1991, pp. 192-203
[16] G.J. Chaitin, M.A. Auslander, A.K. Chandra, J. Cocke, M.E. Hopkins, and P.W. Markstein, "Register allocation via coloring", Comp. Lang. 6 (1), 1981, pp. 47-57
[17] G. J. Chaitin, "Register Allocation and Spilling via Graph Coloring", Proceedings of the ACM SIGPLAN '82 Symposium on Compiler Construction, 1982, pp. 98-105

[18] G.J. Chaitin, "Register Allocation and Spilling via Graph Coloring", US Patent 4,571,678 (1986)
[19] Y. Choi, A.D. Knies, L. Gerke, and T-F. Ngai, "The impact of if-conversion and branch prediction on program execution on the Intel Itanium processor", Proceedings of the 34th International Symposium on Microarchitecture, MICRO-34, December 2001, pp. 182-191
[20] F.C. Chow, "A Portable Machine-Independent Global Optimizer - Design and Measurements", PhD Thesis, Stanford University, 1983, pp. 70-87
[21] F.C. Chow, "Minimizing Register Usage Penalty at Procedure Calls", Proceedings of the ACM SIGPLAN '88 Conference on Programming Language Design and Implementation (PLDI), June 1988, pp. 85-95
[22] F.C. Chow and J.L. Hennessy, "The priority-based coloring approach to register allocation", ACM Transactions on Programming Languages and Systems 12, No. 4, 1990, pp. 501-536
[23] K.D. Cooper, A. Dasgupta, and J. Eckhardt, "Revisiting Graph Coloring Register Allocation: A Study of the Chaitin-Briggs and Callahan-Koblenz Algorithms", LCPC 2005, LNCS 4339, 2007, pp. 1-16
[24] K. D. Cooper and L. Taylor Simpson, "Live Range Splitting in a Graph Coloring Register Allocator", Proceedings of the 7th International Conference on Compiler Construction, CC'1998, LNCS 1383, 1998, pp. 174-187
[25] K.D. Cooper, A. Dasgupta, and J. Eckhardt, "Revisiting Graph Coloring Register Allocation: A Study of the Chaitin-Briggs and Callahan-Koblenz Algorithms", LCPC 2005, LNCS 4339, 2007, pp. 1-16
[26] D. Desai, G. Hoflehner, A. Kejariwal, D. M. Lavery, A. Nicolau, A. V. Veidenbaum, and C. McNairy, "Performance Characterization of Itanium 2-Based Montecito Processor", SPEC Benchmark Workshop 2009, pp. 36-56
[27] A. Douillet, J. N. Amaral, and G.R. Gao, "Fine-Grain Stacked Register Allocation for the Itanium Architecture", 15th Workshop on Languages and Compilers for Parallel Computing (LCPC), 2002
[28] A. E. Eichenberger and E. S. Davidson, "Register allocation for predicated code", in Proceedings of the 28th Annual International Symposium on Microarchitecture, MICRO-28, December 1995, pp. 180-191
[29] C. Fu and K.D. Wilken, "Optimal and near-optimal global register allocation using 0-1 integer programming", Softw. Pract. Exper. 26(8), 1996, pp. 929-965
[30] L. George and A.W. Appel, "Iterated Register Coalescing", 23rd ACM Symposium on Principles of Programming Languages, POPL'96, 1996, pp. 208-218
[31] D. M. Gillies, R. D-C. Ju, R. Johnson, and M. S. Schlansker, "Global Predicate Analysis and its Application to Register Allocation", Proceedings of the 29th International Symposium on Microarchitecture, MICRO-29, December 1996, pp. 114-125
[32] D. W. Goodwin and K.D. Wilken, "A faster optimal register allocator", Proceedings of the 35th International Symposium on Microarchitecture, MICRO-35, November 2002, pp. 245-256
[33] R. Gupta, M.L. Soffa, and T. Steele, "Register Allocation Via Clique Separators", Programming Language Design and Implementation (PLDI), June 1989, pp. 264-274
[34] S. Hack, D. Grund, and G. Goos, "Register Allocation for Programs in SSA-Form", 15th International Conference on Compiler Construction, CC'2006, LNCS 3923, 2006, pp. 247-262

[35] R. E. Hank, W.-m. W. Hwu, and B. R. Rau, "Region-based compilation: an introduction and motivation", Proceedings of the 25th Annual International Symposium on Microarchitecture, MICRO-25, December 1992, pp. 158-168
[36] U. Hirnschrott, A. Krall, and B. Scholz, "Graph Coloring vs. Optimal Register Allocation for Optimizing Compilers", Joint Modular Languages Conference, JMLC 2003, Proceedings, LNCS 2789, pp. 202-213
[37] G. Hoflehner and M. Davis, "Method and Apparatus for dynamic register scratching", US Patent 7,647,482 (2010)
[38] G. Hoflehner and J. Pierce, "Method and Apparatus for inserting more than one alloc instruction within a routine", US Patent 6,907,601 (2005)
[39] G. Hoflehner, K. Kirkegaard, R. Skinner, D. M. Lavery, Y-F. Lee, and W. Li, "Compiler Optimizations for Transaction Processing Workloads on Itanium® Linux Systems", Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37), December 2004, pp. 294-303
[40] G. Hoflehner, "Strategies for Predicate-Aware Register Allocation", CC 2010, LNCS 6011, pp. 185-204
[41] J. Huck, D. Morris, J. Ross, A. Knies, H. Mulder, and R. Zahir, "Introducing The IA-64 Architecture", IEEE Micro, Sep/Oct 2000, pp. 12-22
[42] Intel Corporation, "Intel® Itanium® Architecture Software Developer's Manual", Vol. 1-3, Revision 2.2, January 2006, http://developer.intel.com/design/itanium/manuals/iiasdmanual.htm
[43] Intel Corporation, "Intel® Itanium® 2 Processor Reference Manual", May 2004, http://download.intel.com/design/Itanium2/manuals/25111003.pdf
[44] Intel Corporation, "Itanium™ Software Conventions and Runtime Architecture Guide", May 2001, http://download.intel.com/design/itanium/downloads/245358.pdf
[45] R. Johnson and M. Schlansker, "Analysis techniques for predicated code", in Proceedings of the 29th International Symposium on Microarchitecture, MICRO-29, December 1996, pp. 100-113
[46] V. Kathail, M.S. Schlansker, and B.R. Rau, "HPL PlayDoh architecture specification: Version 1.0", Tech. Rep. HPL-93-80, Hewlett-Packard Laboratories, Palo Alto, CA, 1994 (Feb.)
[47] A.B. Kempe, "On the geographical problem of the four colors", American Journal of Mathematics 2, 1879, pp. 193-200
[48] B.D. Koblenz and C.D. Callahan, "Register allocation methods having upward pass for determining and propagating variable usage information and downward pass for binding; both passes utilizing interference graphs via coloring", US Patent 5,530,866 (1996)
[49] T. Kong and K.D. Wilken, "Precise register allocation for irregular architectures", Proceedings of the 31st Annual International Symposium on Microarchitecture, MICRO-31, December 1998, pp. 297-307
[50] A. Koseki, H. Komatsu, and T. Nakatani, "Preference-directed graph coloring", Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, June 2002, pp. 33-44
[51] R. Krishnaiyer, D. Kulkarni, D. Lavery, W. Li, C. Lim, J. Ng, and D. Sehr, "An Advanced Optimizer for the IA-64 Architecture", IEEE Micro 20(6), Nov/Dec. 2000, pp. 60-68
[52] M. S. Lam, "Software Pipelining: An Effective Scheduling Technique for VLIW Machines", Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation, June 1988, pp. 318-328

[53] J. Lin, W.-C. Hsu, P.-C. Yew, R. D-C. Ju, and T-F. Ngai, "A Compiler Framework for Recovery Code Generation in General Speculative Optimizations", 13th International Conference on Parallel Architecture and Compilation Techniques (PACT'04), 2004, pp. 17-28
[54] G-Y. Lueh and T. Gross, "Call-Cost Directed Register Allocation", Proceedings of the ACM SIGPLAN 1997 Conference on Programming Language Design and Implementation (PLDI), June 1997, pp. 296-306
[55] G-Y. Lueh, T. Gross, and A-R. Adl-Tabatabai, "Fusion-Based Register Allocation", ACM Transactions on Programming Languages and Systems, Vol. 22, No. 3, May 2000, pp. 431-470
[56] S.A. Mahlke, D.C. Lin, W.Y. Chen, R.E. Hank, and R.A. Bringmann, "Effective compiler support for predicated execution using the hyperblock", Proceedings of the 25th Annual International Symposium on Microarchitecture, MICRO-25, December 1992, pp. 45-54
[57] S. A. Mahlke, R. E. Hank, R. A. Bringmann, J. C. Gyllenhaal, D. M. Gallagher, and W. W. Hwu, "Characterizing the Impact of Predicated Execution on Branch Prediction", Proceedings of the 27th International Symposium on Microarchitecture, MICRO-27, December 1994, pp. 217-227
[58] C. McNairy and D. Soltis, "Itanium 2 Processor Microarchitecture", IEEE Micro, March/April 2003, pp. 44-55
[59] S. Muchnick, "Advanced Compiler Design and Implementation", Morgan Kaufmann, 1997
[60] B. R. Nickerson, "Graph Coloring Register Allocation for Processors with Multi-Register Operands", Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation (PLDI'90), June 1990, pp. 40-51
[61] C. Norris and L.L. Pollock, "Register Allocation over the Program Dependence Graph", Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation (PLDI'94), June 1994, pp. 266-277
[62] J.C.H. Park and M.S. Schlansker, "On predicated execution", Tech. Rep. HPL-91-58, HP Laboratories, Palo Alto, CA, 1991 (May)
[63] T.A. Proebsting and C. N. Fischer, "Probabilistic Register Allocation", Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation (PLDI'92), June 1992, pp. 300-310
[64] M. Poletto and V. Sarkar, "Linear Scan Register Allocation", ACM Transactions on Programming Languages and Systems (TOPLAS), Vol. 21, No. 5, 1999, pp. 895-913
[65] B. R. Rau, "Iterative Modulo Scheduling: An Algorithm for Software Pipelining Loops", Proceedings of the 27th International Symposium on Microarchitecture, MICRO-27, Dec. 1994, pp. 63-74
[66] B.R. Rau, M. Lee, P.P. Tirumalai, and M.S. Schlansker, "Register Allocation for Software Pipelined Loops", Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation (PLDI'92), June 1992, pp. 283-299
[67] M.S. Schlansker and B. R. Rau, "EPIC: Explicitly Parallel Instruction Computing", Computer, February 2000, pp. 37-45
[68] B. Scholz and E. Eckstein, "Register allocation for irregular architectures", Compilers and Tools for Embedded Systems & Software and Compilers for Embedded Systems, June 2002, pp. 139-148
[69] A. Settle, D. Connors, G. Hoflehner, and D. Lavery, "Optimization for the Intel Itanium Architecture Register Stack", Proceedings of the International Symposium on Code Generation and Optimization, CGO 2003, pp. 115-124

[70] A. Settle, D. Connors, G. Hoflehner, and D. Lavery, "Compiler Controlled Register Stack Management for the Intel Itanium Architecture", EPIC-3 Workshop, 2004
[71] J.W. Sias, W.-m. W. Hwu, and D. I. August, "Accurate and Efficient Predicate Analysis with Binary Decision Diagrams", Proceedings of the 33rd International Symposium on Microarchitecture, MICRO-33, December 2000, pp. 112-123
[72] M.D. Smith, N. Ramsey, and G. Holloway, "A Generalized Algorithm for Graph-Coloring Register Allocation", Proceedings of the 2004 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'04), June 2004, pp. 277-288
[73] P.A. Steenkiste and J. L. Hennessy, "A Simple Interprocedural Register Allocation Algorithm and Its Effectiveness for LISP", ACM Transactions on Programming Languages and Systems (TOPLAS), Vol. 11, No. 1, January 1989, pp. 1-32
[74] N.J. Warter, S.A. Mahlke, W.W. Hwu, and B.R. Rau, "Reverse if-conversion", Proceedings of the 1993 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'93), June 1993, pp. 290-299
[75] R. D. Weldon, S. S. Chang, H. Wang, G. Hoflehner, P. H. Wang, D. M. Lavery, and J. P. Shen, "Quantitative Evaluation of the Register Stack Engine and Optimizations for Future Itanium Processors", Interaction between Compilers and Computer Architectures (Interact), 2002, pp. 57-67
[76] S. Winkel, "Optimal Global Instruction Scheduling for the Itanium® Processor Architecture", Dissertation, Univ. des Saarlandes, 2004
[77] Y. Wu, L-L. Chen, R. Ju, and J. Fang, "Performance potentials of compiler-directed data speculation", 2003 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2003, pp. 22-31
[78] L. Yang, S. Chan, G.R. Gao, R. Ju, G-Y. Lueh, and Z. Zhang, "Inter-Procedural Stacked Register Allocation for Itanium® Like Architecture",
[79] R. Zahir, J. Ross, D. Morris, and D. Hess, "OS and Compiler Considerations in the Design of the IA-64 Architecture", Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2000, pp. 212-221

Index

adjacency vector, 40
ALAT, see data speculation
allocation strategies
   dominate-or-match, 85
   partition tracking, 85
   use-and-partition tracking, 89
AR.UNAT application register, 98
backing store, see Register Stack Engine
biasing
   register biasing, 50
bundles, 9, 11, 12, 18
cleaning, 46
clique, 148
clique number, 149
clique separators, 49, 119
coalescing, 49
   conservative, 49
   iterative, 49
coloring allocator, 2, 5, 31, 35, 42, 50, 51, 61, 64, 67, 73, 106, 107, 109, 114, 115, 125, 128, 136
   Chaitin allocator, 31
   Chow allocator, 35, 37, 109
   clique allocator, 49, 119
   cluster allocator, 50
   hierarchical, 48, 118
   interference region spilling, 46
   optimistic coloring allocator, 35
   probabilistic allocator, 50
coloring inequality, 54
coloring stack, 34
complete graph, 40
complex live range tracking, 91
constrained node, 149
containment graph, 45
control flow graph
   irreducible, 38
   reducible, 38
   single entry single exit, 48, 118
control speculation, see speculation
cost function, 33
   Bernstein et al., 44
   of Chaitin Allocator, 33
critical edge, 149
data flow analysis
   available variable analysis, 38
   live variable analysis, 38
data speculation, see speculation
deferred exception token, 27, 28, 30, 97
degree, 40
degree of a node, 149
dependencies, 11
   RAW, 11
   WAR, 11
   WAW, 11, 70
dominance, 149
dominate-or-match, 85
EPIC, 9, 149
execution trace, 150
graph
   bipartite graph, 52
   chordal, 148
   interference graph, 150
   irreducible graph, see control flow graph
   perfect graph, 151
   reducible graph, see control flow graph
hammock, 19, 20
hyperblock, 150
IA-64, 9, 11, 12, 13, 18, 58, 60, 65, 71, 97, 104, 154
if-conversion, 18, 21, 71
ILP, 150
instruction group, 11, 12, 13, 15, 67, 70
interference, 52
   definition, 52
   simultaneous, 52
   value, 53
interference matrix, 40
Itanium instruction
   add r1=r2,r3, 14
   alloc r1=ar.pfs,i,l,o,r, 16
   andcm r1=r2,r3, 14
   br.call foo, 17
   br.cloop target, 17
   br.cond.sptk.many target, 17
   br.ctop target, 17
   br.ret, 17
   chk.s r1, target, 15
   clrrrb, 17, 69, 70
   cmp.crel.unc p1,p2=r1,r2, 14
   dep r1=r2,r3,pos,len, 15
   extr r1=r3,pos,len, 15
   fma.s f1=f3,f4,f2, 16
   getf.sig r1=f2, 16
   ld4.s r1=[r3], 16
   ld8 r1=[r3], 16
   setf.sig f1=r2, 16
   xma.l f1=f3,f4,f2, 16
Itanium processor family, 9
live range, 38, 150
   complex live range, 90
   complex live range tracking, 91
   containment of, 45
   dominate, 83
   match, 83
   overlap, 83
   partition, 83
   simple live range, 93
   simple live range tracking, 86
live range shrinking, see pre-materialization
live range splitting, 35
live variable, 150
modulo scheduling, 56
   epilogue phase, 57
   initiation interval (II), 56
   kernel phase, 57
   prologue phase, 57
   stage count, 56
multiple alloc algorithms, 5
NaT, 27
NaT consumption faults, 98
NaT producer, 102
NaT propagation, 98
   advanced NaT propagation algorithm, 101
NP-complete, 52
NP-noise, 151
optimistic coloring, 34
optimizing compilers, 8
partition of predicates, 151
partition tracking, 85
path, 151
PPG, see predicate partition graph
PQS, see predicate query system
predicate partition graph, 77
   backward edges, 77
   complete, 77
   forward edges, 77
predicate query system, 71
   lub_diff, 116
   lub_sum, 116
predicates
   disjoint predicates, 149
predication, 9, 18, 59, 151
pre-materialization, 46, 151
priority function, see cost function
   of Chow Allocator, 36
program dependence graph, 48, 118
reconciliation code, 151
register
   architectural register, 21
   branch registers, 21, 62
   floating point register, 21
   general registers, 21
   physical register, 22
   predicate registers, 14, 15, 18, 21, 71
   preserved register, 21
   rotating registers, 21
   scratch register, 21
   stacked integer registers, 21
register allocation, 31, 52, 115
   interprocedural, 48
   Priority-based register allocation, 35
register allocator
   linear scan allocator, 51
   LP allocator, 51
   quadratic solvers, 51
register assignment, 31
register preference graph, 47
register referencing, 47
Register Stack Engine, 21
   overflow, 24
   underflow, 24
rematerialization, 43
RK algorithm, 78
RSE, 152, see Register Stack Engine
scalable allocation
   hierarchical allocation, 111
   parallel allocation, 110
   serial allocation, 109
scalable allocator, 107
significant node, 152
simple live range tracking, 86
simplification phase, 32, 152
software pipelining, 8, 21, 123, 128, see modulo scheduling
speculation, 27, 28, 152
   control speculation, 27
   data speculation, 29
spill heuristics
   Bernstein, 44
   Chaitin, 43
spilling, 33, 62, 69, 152
strict program, 53, 152
superblock, 153
symbolic registers, 8, 31, 37, 67
unconstrained node, 40, 153
use-and-partition tracking, 89

Curriculum Vitae

Gerolf Fritz Hoflehner
434 Galleria Dr., Unit 04
San Jose, CA 95134

Home: +1-408-435-5066
Cell: +1-408-506-4240
Business: +1-408-765-8380
Email: [email protected]

Employment

2009 (Oct.) – present, Research Scientist, Intel Corp. (Research Lab)

• Binary translation technology and microarchitecture research

2008 (Dec.) – 2009 (Sep.), Software Engineer, Intel Corp. (Binary Translation Team)

• Implemented simple x86-to-x86 translator to enable the BT system
• Defined initial binary translation system validation and testability methodology

2005-2008 (Nov.), Team and Project Lead, Intel Corp. (Compiler Team)

• Led team of 3 engineers on code generation, optimizations and performance measurement/analysis. Projects included register allocation, if-conversion, classical optimizations, EH and feedback-based optimizations
• Managed customer relations and training for Fujitsu/Japan and Oracle/US
• Drove CPU2006 performance analysis (>30% compiler gain on CPU2006), optimizations and publication support
• Designed and implemented predicate-aware and pipeline-aware dataflow analysis, full hazard checking and speculated code analysis algorithm

1999 (Apr.) – 2004, Software Engineer, Intel Corp. (Compiler Team)

• Drove Oracle code analysis and compiler performance tuning. Key contributor to >40% compiler gain on TPC-C, resulting in world-record-setting 4P TPC-C publications on Itanium Linux systems
• Developed optimizations for Itanium register stack, opportunistic interprocedural register allocation, predication oracle and classical optimizations, and enabled PQS-based predicate-aware register allocation
• Drove bug dispatching, SPEC and customer code analysis, compile-time task force, local register allocation project, and 4 intern projects

1998-1999, Software Engineer, Siemens Pyramid @Sun Microsystems, Menlo Park

• Ported non-optimizing compiler backend to Itanium on Sun/Solaris
• Planned and oversaw tool deliverables in joint Siemens Pyramid/Sun project
• Developed prototype instruction scheduler for Itanium

1996-1998, Software Engineer, Siemens Pyramid, San Jose, CA

• SPEC95 performance work on pyrC6.0 C/C++ compiler (>3% gain on integer suite from new and tuned compiler optimizations)
• Oracle performance tuning for cached TPC-C setup (Oracle 7.3.2) for R10K. Collected feedback profiling and tuned compiler optimizations for 2-3% gain
• Ported MIPS tool chain (UCODE code generator, optimizer, assembler, linker) to 64-bit model

1990 (Nov.) – 1995, Software Engineer, Siemens Nixdorf Corp., Munich/Germany

• IBM 370 compiler development: re-engineered large parts of proprietary IBM 370 compiler backend to improve maintainability, added compiler optimizations (sign extensions, set of classical optimizations), enhanced register allocation and instruction selection
• Developed MIPS installation compiler for MIPS RxK
• Supervised students on mathematical library ports (to MIPS) and compiler projects

1988-1990 (Apr.), Intern, Siemens, Munich

• Fast mathematical library development (ULP-accurate in double precision for a set of elementary mathematical functions) in IBM 370 assembly. Shipped in mathematical library with C/C++/Fortran/Pascal product compilers
• Database application development for CAD product

1987-1988, Student Program, Siemens, Munich

• Four 2-3 month internships with 2-4 week training classes (mostly in proprietary tools and software packages)

1987-1988, Tutor and teaching assistant at Technical University Munich

• Linear Algebra homework classes for math and computer science students
• Analysis II and III homework classes for math students

Papers:

[12] Gerolf Hoflehner: Strategies for Predicate-Aware Register Allocation. Compiler Construction (CC) 2010, LNCS 6111, pp. 185-204
[11] Darshan Desai, Gerolf Hoflehner, Arun Kejariwal, Daniel M. Lavery, Alexandru Nicolau, Alexander V. Veidenbaum, Cameron McNairy: Performance Characterization of Itanium® 2-Based Montecito Processor. SPEC Benchmark Workshop 2009: 36-56
[10] Arun Kejariwal, Gerolf Hoflehner, Darshan Desai, Daniel M. Lavery, Alexandru Nicolau, Alexander V. Veidenbaum: Comparative characterization of SPEC CPU2000 and CPU2006 on Itanium architecture. SIGMETRICS 2007: 361-362
[9] Peng-fei Chuang, Howard Chen, Gerolf F. Hoflehner, Daniel M. Lavery, Wei-Chung Hsu: Dynamic Profile Driven Code Version Selection. Interaction between Compilers and Computer Architectures 2007 (Interact 11)
[8] Matthew Bridges, Howard Chen, Gerolf Hoflehner, Daniel Lavery: Fusing Instructions to Reduce Resource Usage in If-Converted Regions. Fifth Workshop on Explicitly Parallel Instruction Computing Architectures and Compiler Technology (EPIC-5), 2006
[7] Gerolf Hoflehner, Knud Kirkegaard, Rod Skinner, Daniel M. Lavery, Yong-Fong Lee, Wei Li: Compiler Optimizations for Transaction Processing Workloads on Itanium® Linux Systems. MICRO 2004: 294-303
[6] Alex Settle, Daniel A. Connors, Gerolf Hoflehner, Daniel M. Lavery: Compiler Controlled Register Stack Management for the Intel Itanium Architecture. Third Workshop on Explicitly Parallel Instruction Computing Architectures and Compiler Technology (EPIC-3), 2004
[5] Alex Settle, Daniel A. Connors, Gerolf Hoflehner, Daniel M. Lavery: Optimization for the Intel® Itanium® Architecture Register Stack. CGO 2003: 115-124
[4] Gerolf Hoflehner, Daniel M. Lavery, David C. Sehr: The compiler as a validation and evaluation tool. Electr. Notes Theor. Comput. Sci. 82(2) (2003)
[3] R. David Weldon, Steven S. Chang, Hong Wang, Gerolf Hoflehner, Perry H. Wang, Daniel M. Lavery, John Paul Shen: Quantitative Evaluation of the Register Stack Engine and Optimizations for Future Itanium Processors. Interaction between Compilers and Computer Architectures 2002: 57-67
[2] Shih-Wei Liao, Perry H. Wang, Hong Wang, John Paul Shen, Gerolf Hoflehner, Daniel M. Lavery: Post-Pass Binary Adaptation for Software-Based Speculative Precomputation. PLDI 2002: 117-128
[1] Jay Bharadwaj, William Y. Chen, Weihaw Chuang, Gerolf Hoflehner, Kishore N. Menezes, Kalyan Muthukumar, Jim Pierce: The Intel IA-64 Compiler Code Generator. IEEE Micro 20(5): 44-53 (2000)

Conferences:

• Compiler Round Table (organizer and host, @Gelato, San Jose, 2005)
• Co-chair, EPIC Workshop EPIC-4 (San Jose, 2005)
• Co-chair, EPIC Workshop EPIC-3 (San Jose, 2004)
• Reviewer for CGO, MICRO, Journal of Computing

US Patents:

7,647,482 Method and apparatus for dynamic register scratching, 2010
7,617,495 Resource-aware scheduling for compilers, 2009
7,603,546 System, method and apparatuses for dependency chain processing, 2009
7,398,521 Methods and apparatuses for thread management of multi-threading, 2008
7,328,433 Methods and apparatuses for reducing memory latency in a software application, 2008
7,260,705 Apparatus to implement mesocode, 2007
7,228,528 Building inter-block streams from a dynamic execution trace for a program, 2007
6,907,601 Method and apparatus for inserting more than one allocation instruction within a routine, 2005

Education:

1991 Diplom in Mathematik (Nebenfach Informatik), Dipl.-Math. Univ., Technische Universität München (master's degree in mathematics with a minor in computer science, Technical University of Munich, Germany)
     Thesis: "Eine Klasse allgemeiner Zetafunktionen und Ihre Funktionalgleichung" ("A Class of General Zeta Functions and Their Functional Equation", in German)
1986 Vordiplom Mathematik, Dipl.-Math. Cand., Technische Universität München
1984 Abitur, Viscardi Gymnasium, Fürstenfeldbruck/Germany

Memberships:

ACM/IEEE/AMS/MAA

Personal Information:

Citizenship: Austria
US Visa: Permanent Resident
City of Birth: Linz/Austria
Date of Birth: 29.08.1965
Interests: Marathon, Skiing, Hiking, Travel
