Compiler Strategies for Transport Triggered Architectures
Johan Janssen

Dissertation

for obtaining the degree of doctor at the Technische Universiteit Delft, by authority of the Rector Magnificus prof.ir. K.F. Wakker, chairman of the Board for Doctorates, to be defended in public on Monday 17 September 2001 at 16:00 by

Johannes Antonius Andreas Jozef JANSSEN
electrical engineer
born in Wamel

This dissertation has been approved by the promotors:
Prof.dr.ir. A.J. van de Goor
Prof.dr. H. Corporaal

Composition of the doctoral committee:
Rector Magnificus, chairman
Prof.dr.ir. A.J. van de Goor, Technische Universiteit Delft, promotor
Prof.dr. H. Corporaal, T.U. Eindhoven / IMEC, promotor
Prof.dr.ir. E.F. Deprettere, Universiteit Leiden
Prof.dr.ir. Th. Krol, Universiteit Twente
Prof.dr.ir. R.H.J.M. Otten, Technische Universiteit Eindhoven
Prof.dr.ir. H.J. Sips, Technische Universiteit Delft
Dr. C. Eisenbeis, INRIA, Rocquencourt

Published and distributed by: DUP Science
DUP Science is an imprint of Delft University Press
P.O. Box 98
2600 MG Delft
The Netherlands
Telephone: +31 15 27 85 678
Telefax: +31 15 27 85 706
E-mail: [email protected]

ISBN 90-407-2209-9

Keywords: Compilers, Instruction Scheduling, Register Assignment

This work was carried out in the ASCI graduate school. ASCI dissertation series number 69.
Advanced School for Computing and Imaging

Copyright © 2001 by Johan Janssen

All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the publisher: Delft University Press.

Printed in The Netherlands

This dissertation is dedicated to the loving memory of my mother

Acknowledgements

This Ph.D.
thesis is the result of my research at the Computer Engineering group of the Electrical Engineering department of the Delft University of Technology. First, I would like to express my gratitude to Henk Corporaal, my supervisor, for his everlasting support, valuable comments and for the numerous discussions we had. In addition, I want to thank prof. Ad van de Goor for being my promotor and for giving me the opportunity to perform my research in his group.

Secondly, I thank the reviewers of this thesis, Andrea Cilio, Henjo Schot and my sister Willemien Korfage for their valuable comments on (parts of) the first drafts of this thesis. I thank Walter Groeneveld for his contribution on the formulation of the "stellingen".

Furthermore, I would like to thank my fellow Ph.D. students within the MOVE project: Marnix Arnold, Jeroen Hordijk, Steven Roos and especially Jan Hoogerbrugge for their work on the TTA compiler. I would like to thank the system administrators Jean-Paul van der Jagt, Tobias Nijweide and Bert Meijs for providing an excellent working computer environment. In addition, I would also like to thank all my former colleagues and students of the Computer Engineering group for the enjoyable working environment.

Finally, I would like to thank my family, friends and TNO colleagues for their support and encouragement.

Johan Janssen
Delft, July 2001

Contents

Acknowledgements

1 Introduction
  1.1 Instruction-Level Parallelism
    1.1.1 ILP Architecture Arena
    1.1.2 Architectural Trade-off
  1.2 Research Goals
  1.3 Thesis Outline

2 TTAs: An Overview
  2.1 From VLIW to TTA
  2.2 Transport Triggered Architectures
    2.2.1 TTA Instruction Format
    2.2.2 Function Units
    2.2.3 Register Files
    2.2.4 Immediates
    2.2.5 Move Buses
    2.2.6 Sockets
    2.2.7 Control Flow and Conditional Execution
    2.2.8 Software Bypassing
    2.2.9 Operand Sharing

3 Compiler Overview
  3.1 Front-end
  3.2 Back-end Infrastructure
    3.2.1 Reading and Writing
    3.2.2 Control Flow Analysis
    3.2.3 Data Flow Analysis
    3.2.4 Data Dependence Analysis
    3.2.5 Loop Unrolling, Function Inlining and Grafting
  3.3 Register Assignment
    3.3.1 Graph Coloring
    3.3.2 Spilling
    3.3.3 State Preserving Code
    3.3.4 TTA vs. OTA
  3.4 Instruction Scheduling
    3.4.1 List Scheduling
    3.4.2 Resource Assignment
    3.4.3 Local Scheduling
    3.4.4 Global Scheduling
    3.4.5 Software Pipelining

4 Evaluation Methodology
  4.1 Benchmark Suite
  4.2 TTA Processor Suite
    4.2.1 Space Walking
    4.2.2 Selected TTA Processors
  4.3 Scheduling Scopes
  4.4 Exploitable ILP

5 The Phase Ordering Problem
  5.1 Early Register Assignment
    5.1.1 ILP and Early Register Assignment
    5.1.2 Dependence-Conscious Register Assignment Strategies
    5.1.3 Dependence-Conscious Early Register Assignment for TTAs
    5.1.4 Discussion, Experiments and Evaluation
  5.2 Late Register Assignment
    5.2.1 ILP and Late Register Assignment
    5.2.2 Register-Sensitive Instruction Scheduling Strategies
    5.2.3 Register-Sensitive Instruction Scheduling for TTAs
    5.2.4 Experiments and Evaluation
  5.3 Integrated Register Assignment
    5.3.1 Interleaved Register Assignment
    5.3.2 Integrated Instruction Scheduling and Register Assignment
  5.4 Conclusion

6 Integrated Assignment and Local Scheduling
  6.1 Resource Assignment and Phase Integration
  6.2 Register Resource Vectors
  6.3 The Interference Register Set
  6.4 Spilling
    6.4.1 Integrated Spilling
    6.4.2 Updating Data Flow and Data Dependence Relations
    6.4.3 Scheduling Issues
    6.4.4 Peephole Optimizations
  6.5 State Preserving Code
    6.5.1 Generation of Callee-saved Code
    6.5.2 Generation of Caller-Saved Code
  6.6 Experiments and Evaluation
    6.6.1 Register Selection
    6.6.2 Operation Selection
    6.6.3 Basic Block Selection
    6.6.4 Early vs. Integrated Assignment
  6.7 Conclusions

7 Integrated Assignment and Global Scheduling
  7.1 The Interference Register Set
    7.1.1 Importing a Use
    7.1.2 Importing a Definition
  7.2 Importing Operations
  7.3 Example
  7.4 Spilling
  7.5 State Preserving Code
  7.6 Experiments and Evaluation
    7.6.1 Region Selection
    7.6.2 Global Spill Cost Heuristic
    7.6.3 Early vs. Integrated Assignment
  7.7 Conclusions

8 Integrated Assignment and Software Pipelining
  8.1 Register Pressure
  8.2 Register Assignment and Software Pipelining
  8.3 Integrated Assignment and Modulo Scheduling
    8.3.1 The Interference Register Set
    8.3.2 Spilling
  8.4 Experiments and Evaluation
    8.4.1 Spilling or Increasing the II
    8.4.2 Early vs. Integrated Assignment
  8.5 Conclusions

9 The Partitioned Register File
  9.1 Register Files
    9.1.1 Silicon Area
    9.1.2 Access Time
    9.1.3 Power Consumption
    9.1.4 Partitioned Register Files
  9.2 Early Assignment and Partitioned Register Files
    9.2.1 Simple Distribution Methods
    9.2.2 Advanced Distribution Methods
    9.2.3 Equal Area Compiling
  9.3 Late Assignment and Partitioned Register Files
  9.4 Integrated Assignment and Partitioned Register Files
    9.4.1 Local Heuristics
    9.4.2 A Global Heuristic
  9.5 Conclusions

10 Summary and Future Research
  10.1 Summary
  10.2 Contributions