OPTIMAL SOFTWARE PIPELINING: INTEGER LINEAR PROGRAMMING APPROACH

by Artour V. Stoutchinin

School of Computer Science, McGill University, Montréal

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES AND RESEARCH

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

Copyright © by Artour V. Stoutchinin


Acknowledgements

First, I thank my family - my Mom, Nina Stoutchinina, my brother, Mark Stoutchinine, and my sister-in-law, Nina Denisova, for their love, support, and the infinite patience that comes with having to put up with someone like myself. I also thank my father, Viatcheslav Stoutchinin, who is no longer with us. Without them this thesis could not have happened.

My advisor, Professor Guang R. Gao, has directed me throughout this work. The great knowledge that he shared with his students, his high standards in class and in research, and his constant encouragement made this thesis possible. I am honored to have worked with him.

I also benefited a lot from discussions with Dr. Erik Altman from the IBM Watson Research Center and Dr. Govind Ramaswamy from the Indian Institute of Technology, who provided me with useful directions in my research. I feel lucky to have been able to work with them.

I have worked closely with Dana Tarlescu since my first day in school on many topics that led to the completion of this thesis. Dana's ingenuity and hard work inspired me, and I am grateful for all of her help during my first difficult steps in computer science, and for the wonderful friendship that I am proud of.

Professor Laurie Hendren, although not directly involved with my thesis, often helped me during this work by asking challenging questions and providing useful advice.

Many other people have made my life in Montreal a happy one. My roommates, Yuri Kroukov and Katya Baranovskaya, endured my presence for a long time while we were sharing an apartment. I will always treasure the time spent in their company. I am grateful to Igor and Lena Fomenko, Christina Parent, and Benoit and Anne de Dinechin for the help and support that they gave me in a variety of ways. I also thank my friends from our graduate office, Peter Alleyne, Fai Jacqueline Yeung, Xinan Tang and Khalil El-Khatib, for making my stay at McGill more than enjoyable.

My fellow students from the ACAPS group have given me a great deal of help during all this time: Andres Marquez, Luis Lozano, V.C. Sreedhar, Christopher Lapkowski, Shamir Merali, Kevin Theobald, Zhu Yingchun.

I also wish to thank Dr. John Ruttenberg, Woody Lichtenstein, David Lively, Verna Lee, Dr. Ross Towle, Violet Jen, Bettina Le Veille and others from the Developers Magic Group at Silicon Graphics Computer Systems in Mountain View, California, where the bulk of my research was done in the summer of 1995. Their expertise guided me and helped me to complete this work.

Finally, I owe special thanks to Mike Sung from the Massachusetts Institute of Technology for proofreading and correcting the final draft, and to Anne de Dinechin for translating the abstract of this thesis into French.

Abstract

In optimizing code for high-performance processors, software pipelining of innermost loops is of fundamental importance. In order to benefit from software pipelining, it is essential to: (i) find a rate-optimal legal schedule, and (ii) allocate registers to the found schedule (it must fit into the limited number of available machine registers). This thesis deals with the development of a software pipeliner that produces the best possible schedules in terms of required registers, thus assisting register allocation.

Software pipelining and register allocation can be formulated as an integer linear programming (ILP) problem, aiming to produce optimal schedules. In this thesis, we discuss the application of integer linear programming to software pipelining and design a pipeliner for the MIPS R8000 superscalar processor. We extended the previously developed ILP framework into a full software pipelining implementation by: (1) establishing an ILP model for the R8000 processor, (2) implementing the model in the Modulo Scheduling ToolSet (MOST), (3) integrating it into the MIPSpro compiler, (4) successfully producing real code and gathering runtime statistics, and (5) developing and implementing a model for optimization of the memory system behavior on the R8000 processor.

The ILP-based software pipeliner was tested as a functional replacement for the original MIPSpro software pipeliner. Our results indicate a need for improving the ILP formulation and its solution: (1) the existing technique failed to produce results for loops with large instruction counts, (2) it was not able to guarantee register optimality for many interesting and important loops, for which optimal scheduling is necessary in order to avoid spilling, and (3) the branching order, in which an ILP solver traverses the branch-and-bound tree, was the single most significant factor affecting the ILP solution time, leading to the conclusion that exploiting the scheduling problem structure is essential for improving the efficiency of ILP problem solving in the future.

Résumé

Le pipeline logiciel de boucles internes est d'une importance fondamentale dans l'optimisation des codes pour processeurs hautes-performances. Pour bénéficier du pipeline logiciel, il est essentiel: (i) de trouver l'ordonnancement valide à débit optimal, et (ii) d'allouer des registres à l'ordonnancement obtenu (en utilisant le nombre limité de registres machine disponibles). Cette thèse a pour objet le développement d'un pipelineur logiciel qui produise les meilleurs ordonnancements possibles en termes de registres requis, tout en facilitant l'allocation des registres.

Dans le but de produire des ordonnancements optimaux, le pipeline logiciel et l'allocation des registres peuvent être formulés comme un problème de programmation linéaire en nombres entiers (PLNE). Dans cette thèse, nous discutons de l'application de la programmation linéaire en nombres entiers au pipeline logiciel, et nous dérivons un pipelineur pour le processeur MIPS R8000. Nous étendons le cadre initial de la PLNE à une implantation complète d'un pipelineur logiciel (1) en établissant un modèle de PLNE pour le processeur R8000, (2) en implantant le modèle au MOST (Modulo Scheduling ToolSet), (3) en l'intégrant au compilateur MIPSpro, (4) en produisant avec succès du code réel et en collectionnant des statistiques d'exécution, (5) en développant et implantant un modèle pour l'optimisation du comportement du système mémoire sur le processeur R8000.

Le pipelineur logiciel PLNE a été testé en tant que remplaçant fonctionnel du pipelineur logiciel MIPS. Nos résultats montrent le besoin d'améliorer la formulation du PLNE et sa solution : (1) la technique existante ne peut produire de résultats pour des boucles avec un grand nombre d'instructions, (2) elle n'est pas capable de garantir l'optimisation des registres pour de nombreuses boucles intéressantes et importantes, pour lesquelles un ordonnancement optimal est nécessaire afin d'éviter tout débordement, (3) l'ordre de séparation, selon lequel un solveur traverse l'arbre de séparation-évaluation, s'avère être le facteur principal qui règle le temps pour obtenir une solution, nous faisant conclure qu'une exploitation de la structure du problème est essentielle pour améliorer l'efficacité de la méthode de résolution par PLNE pour les problèmes futurs.

Contents

Acknowledgements

Abstract

Résumé

1 Introduction

1.1 McGill ILP Formulation
1.2 Thesis Contributions
1.3 Thesis Organization

2 Software Pipelining Basics

2.1 Simple Example
2.2 Dependence
2.3 Basic Definitions
2.4 Modulo Scheduling

3 The R8000 Processor Design

3.1 Processor Overview
3.2 Instruction Pipeline
3.2.1 Instruction Fetch
3.2.2 Integer and Address Generation Pipelines
3.2.3 Floating Point Execution Pipeline
3.3 CPU - FPU Interface
3.3.1 Floating Point Queueing Mechanism
3.3.2 TBus
3.4 Memory System
3.4.1 Streaming Cache
3.5 Instruction Set Architecture

4 ILP Model for a Superscalar Processor

4.1 ILP Formulation
4.2 ILP Formulation for Superscalars
4.2.1 Modified Resource Constraints
4.2.2 Objective Function
4.2.3 Upper Bound on the Number of Registers
4.2.4 Lower Bound on the Number of Registers
4.2.5 Loop Overhead Optimization
4.3 R8000 Memory System Optimization
4.3.1 Memory Reference Analysis
4.3.2 Memory Constraints for the ILP Formulation

5 ILP Model for the MIPS R8000

5.1 Resource Scheduling on the MIPS R8000
5.2 R8000 Machine Description
5.3 Software Pipelining Algorithm

6 Experimental Results

6.1 Experimental Framework
6.2 Highlights of Experimental Results
6.3 Results and Analysis
6.3.1 Memory Stalls
6.3.2 Performance Comparison of ILP vs SGI Pipeliner
6.3.3 Minimizing Register Requirements
6.4 Branching Order
6.5 Short Trip Count Performance

7 Conclusions and Future Work

7.1 Summary
7.2 Future Work

Appendix A: Reservation Tables

Appendix B

List of Figures

2.1 Types of Dependence Arcs
2.2 The DDG for the loop in Section 2.1
3.1 R8000 Microprocessor
3.2 Streaming Cache Access
4.1 (a) Reservation table for a single-precision divide; (b) its CRT for II=11
4.2 (a) CRT of a two-stage function unit; (b) A matrix of a schedule
4.3 CRT for a single-precision divide for II=6
5.1 The Flow-Chart of the Software Pipeliner
6.1 Relative performance of ILP over SGI
6.2 Improvement in the ILP performance due to memory system optimization
6.3 Relative performance of the ILP over SGI on Livermore Loops

Chapter 1

Introduction

This introduction assumes a basic familiarity with the software pipelining and modulo scheduling techniques. If this is not the case, Chapter 2 reviews their fundamentals.

In recent years, the concept of instruction-level parallelism has played a central role in the microprocessor designs of all the major CPU manufacturers. Several processors, such as the DEC Alpha 21064, the IBM RS/6000, the MIPS R8000, the Intel i860 and i960, and the Sun Microsystems SPARC, derive their benefit from instruction-level parallelism.

Instruction-level parallel processors take advantage of the parallelism in programs by performing multiple machine-level operations simultaneously. A typical such processor (a superscalar or VLIW processor) provides multiple pipelined function units in parallel, thus allowing the simultaneous issue of multiple operations per clock cycle [39, 47, 25]. In order to take advantage of instruction-level parallelism, these machines have to be programmed at a very low level, and those who write programs for them must be familiar with the details of the hardware design, such as instruction timings and resource usage patterns. This is a very tedious, time-consuming and error-prone task. Compilation techniques are needed to expose parallelism in programs written in a high-level language.

Instruction scheduling [46, 24, 21] is a compiler parallelization technique used to facilitate the hardware's task of exploiting instruction-level parallelism, of which loops are the largest source. Because scheduling loops is of primary importance, much work is being done on this, and different loop scheduling algorithms, such as trace scheduling after a loop has been unrolled [18, 17], software pipelining [9, 40], percolation scheduling [a], and hierarchical reduction [29], have been developed.

This thesis studies software pipelining, which has received a lot of attention recently and has been successfully implemented in production compilers [44]. There are two reasons for this. First, software pipelining offers better code quality, i.e., achieves higher speedups, compared to the other scheduling techniques. And second, although the problem of finding the throughput-optimal software pipelined schedule is in general NP-hard [11], scheduling heuristics have been shown to effectively turn software pipelining into a practical and efficient compilation method.

The throughput is of primary concern in software pipelining and, although heuristics produce good quality parallel schedules, none of them can guarantee optimality of the throughput. Several distinct approaches to software pipelining exist, all of them trying to solve this problem. One approach is to move individual instructions across the loop branch [15, 14, 28, 20, 30]. The decision to move a particular operation is arbitrary, and it is not clear how to achieve optimal results. Another possibility is to unroll the loop and search for a repeating pattern [1, 2]. This approach sometimes requires too large an unrolling before such a pattern is found. One more approach, and the one that has been successfully implemented in production compilers, is Modulo Scheduling [40]. Modulo scheduling encompasses a broad class of algorithms and implementations, with some of them assuming special hardware support [48, 6, 41], or developed for conventional architectures [29] (see Chapter 2).

One question that such heuristic approaches leave unanswered is "How well do these methods do their job?" Indeed, how close do we get to the optimal schedule, is there any room for improvement, are we getting all that we want from the existing technology? These types of questions led to the development of exact scheduling methods that guarantee the optimality of their results. The main idea behind the exact methods is to represent the scheduling problem as an optimization problem with a set of linear scheduling constraints and an objective function minimizing some cost criterion (linear programming and integer linear programming) [32]. A number of interesting results of using the linear and integer linear programming approach for software pipelining have been published recently [35, 23, 5, 4, 16].

1.1 McGill ILP Formulation

The interest in software pipelining at McGill stemmed from work on register allocation for loops on data-flow machines. This work culminated in a mathematical formulation of the problem in a linear periodic form [19]. It was soon discovered that this formulation could also be applied to software pipelining for conventional architectures. The formulation was then used to prove an interesting theoretical result: the minimum storage assignment problem for rate-optimal software pipelined schedules can be solved using an efficient polynomial time method, provided the target machine has enough function units so that resource constraints can be ignored [35, 34]. This method used a linear programming formulation for finding, from the set of all schedules for a loop, the fastest schedule using minimum buffers. Buffers are an approximation to registers, and buffer minimization serves as a good approximation to register minimization. The drawback of this method was that it ignored the resource constraints of the target architecture.

This limitation was overcome in the following work [5, 23], where an integer linear programming formulation was proposed that guarantees finding the fastest software pipelined schedules that satisfy all the resource constraints of the target machine. Finally, register considerations were taken into account, and an integer linear programming formulation was developed that, in addition to producing the fastest schedules, guaranteed that they use the minimum number of registers [4].

This work was never intended to become a full software pipelining implementation: its output was a set of static quality measures, not runnable code, and its only targets were machine models that exhibited certain interesting properties, never a real commercial high-performance processor. Thus, it was not clear whether this method could be used for practical compiling. How would it compare with a heuristic implementation? How much better would its results be?

1.2 Thesis Contributions

This thesis concentrates on the development of an integer linear programming software pipeliner for the MIPS R8000 microprocessor, based on the previous work by E. Altman [4], and on implementing it in the Modulo Scheduling ToolSet (MOST). In developing this software pipeliner, our main interests were to study how well the ILP approach would work when targeted to a real processor in the setting of a production compiler, and to study the feasibility of a full implementation that would generate runnable code.

The first important contribution of this thesis is the development of a complete ILP model for the MIPS R8000 processor, taking full advantage of its superscalar capabilities, and the design and implementation of a software pipeliner based on this model. To our knowledge, most existing implementations of software pipelining that use the ILP approach represent research tools rather than real code generators, and have been too preliminary for detailed measurement and evaluation¹. We are not aware of any other ILP implementation targeting a real processor. In particular, we believe this thesis to be the first measurement of runtime performance for ILP-based code generation for software pipelines. In order to achieve this, the ILP software pipeliner was embedded in the MIPSpro compiler over the summer of 1995. In that context, the ILP pipeliner served as a functional replacement for the original MIPSpro software pipeliner, and enjoyed a proven pipelining framework - a full set of optimizations and analyses before pipelining, and a robust post-processing implementation to integrate the pipelined code correctly back into the program.

The second important contribution of this thesis is an assessment of the quality of exact software pipelining methods, and a study of the impact of various aspects of software pipelined schedules on the resulting performance. The main drawback of existing heuristic methods is their inability to guarantee optimality, specifically register optimality, of the schedules they generate. Exact methods that guarantee register optimality of the resulting code, on the other hand, are computationally complex, and optimal scheduling of many important loops is very expensive and often out of reach [4, 16]. However, we showed that register optimality only matters if its absence leads to register allocation failure. This allowed us to trade strict register optimality in exchange for broader application of our method. We also show that addressing R8000 memory organization issues has a greater impact on the resulting performance than minimizing the number of registers required for a schedule. We also studied overhead reduction issues where optimal code generation for short trip count loops is of interest.

¹Also, ILP software pipelining models mostly targeted VLIW-like architectures and did not deal with the issues of scheduling for superscalar architectures.

We compared the performance of the code scheduled by the ILP software pipeliner against that of the MIPSpro heuristic software pipeliner. It took designers nearly four years to develop and implement this heuristic, which is at the core of the MIPSpro compiler's code generation and whose basic goal is generating very high quality software pipelined inner loop code. At the time the R8000-based systems with the MIPSpro compiler were shipped, they had the best reported performance on a number of important benchmarks, in particular for the floating point SPEC92. The compiler's software pipelining capabilities played the central role in delivering this performance on the R8000. Our ILP software pipeliner was able to achieve performance comparable to that of the MIPSpro heuristics.

This work shows the possibility of using exact software pipelining methods in the setting of a production compiler. In our experiments, 743 loops from the SPEC92 floating point benchmark were scheduled, and code was generated for these loops. We discovered that the order in which the instructions of the loop are scheduled has a significant impact on the efficiency of the integer linear program solving, thus showing that exploiting the problem structure is essential for improving our pipeliner's efficiency.

In summary the main contributions of this thesis are:

• we developed a complete ILP model for the MIPS R8000 processor, including optimization of the R8000 processor's memory system behavior;

• we designed and implemented an ILP software pipeliner based on these models;

• we integrated the developed ILP software pipeliner with the MIPSpro production compiler;

• we evaluated the quality of the ILP software pipelining approach compared to the leading heuristic technology. Our results indicate the importance of optimizing the MIPS R8000 memory system behavior for good performance; we also show that optimal solution methods have the potential to improve upon existing heuristic methods in the performance of loops with large instruction counts and severe register pressure, and in loops with short trip counts.

1.3 Thesis Organization

The rest of this thesis is organized as follows. In Chapter 2 we describe software pipelining and modulo scheduling basics. The MIPS R8000 microprocessor architecture is described in Chapter 3. In Chapter 4 we present the integer linear programming formulation for modulo scheduling and our modification of it with an integer linear programming formulation for optimizing the R8000 memory system behavior. The integer linear programming based software pipeliner is described in Chapter 5, and the results of its performance evaluation using the SPEC92 floating point benchmark suite are presented in Chapter 6. Finally, we put forward some conclusions and give suggestions for future work after discussing related research in this area.

Chapter 2

Software Pipelining Basics

Software pipelining was proposed originally as a hand-coding technique that improved the quality of innermost loop code on pipelined machines [9]. The general formulation of the software pipelining problem for a single basic block appeared in [40]. Since then, software pipelining has become one of the most important optimizations performed by today's compilers for high-performance architectures. In Section 2.1 we illustrate the basic concept of software pipelining; dependencies are overviewed in Section 2.2; a basic introduction to software pipelining and modulo scheduling is given in Sections 2.3 and 2.4.

2.1 Simple Example

The principle of software pipelining can be illustrated by the following example. Suppose we have a loop:

x = load A(i)
z = z + x
A(i) = store z

Assume that the value of x is ready 2 cycles after the load is issued, and the result of the addition is ready on the next cycle after the add is issued.

The shortest sequence for the loop body is:

1. x = load A(i)
2.
3. z = z + x
4. A(i) = store z

Executed sequentially this loop needs 4 clock cycles per iteration to complete.

Notice that a new iteration can be initiated every 2 clock cycles (overlapped execution):

1. load
2.
3. add      load
4. store
5.          add
6.          store

The software pipelined loop body appears at clock cycles 3 and 4, and is called a Kernel or a Steady State. Each loop iteration needs only 2 clock cycles to complete, the instructions in clock cycles 1-2 (the Prologue) and 5-6 (the Epilogue) being the overhead needed to fill up and drain the pipeline. The frequency with which iterations of the loop are initiated is limited by the dependencies between the loop statements and by resource availability.
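The overlap above can be reproduced mechanically: given the issue cycles of one iteration and an initiation interval, laying successive iterations out II cycles apart yields the prologue, kernel, and epilogue. The sketch below is only illustrative (the cycle numbers come from the example; the function and variable names are ours):

```python
# Sketch: lay out overlapped iterations of the example loop for II = 2.
# One iteration issues: load at cycle 1, add at cycle 3, store at cycle 4
# (the add must wait 2 cycles for the load's result).
ITERATION = {1: "load", 3: "add", 4: "store"}
II = 2

def overlap(n_iters, ii=II):
    """Return {cycle: [ops]} for n_iters iterations started ii cycles apart."""
    schedule = {}
    for it in range(n_iters):
        for cycle, op in ITERATION.items():
            schedule.setdefault(cycle + it * ii, []).append(op)
    return schedule

sched = overlap(2)
for cycle in sorted(sched):
    print(cycle, " ".join(sched[cycle]))
# Cycles 3-4 hold the kernel ("add load", then "store"), matching the steady
# state shown above; cycles 1-2 form the prologue and 5-6 the epilogue.
```

With more iterations the same kernel simply repeats every II cycles, which is exactly why the kernel alone becomes the new loop body.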

2.2 Dependence

Dependence between two statements in a program can be viewed as a precedence relation between these two statements, i.e., if a statement T is dependent on a statement S, then T may only be executed after S has finished its execution (more precisely, after S's results are computed and available). Two types of dependencies occur in programs: control and data dependencies. In this section we treat the subject of dependence briefly for the purpose of the further discussion (a more comprehensive overview can be found in [49]).

A control dependence exists, for example, between each of the statements on the two alternative paths of an if statement and the if statement itself. Thus, the statements on the then path are control dependent on the if statement. In software pipelining, a loop is assumed to be executed a large number of times. Therefore, the flow of control is assumed to move sequentially from one loop iteration to another. For the rest of the discussion we will focus on loops consisting of a single basic block, i.e., not containing conditional statements that cause control dependencies to occur. Many loops can be transformed into such a form using if-conversion [36, 13].

There are three kinds of data dependencies that prevent reordering of the statements in programs, thereby limiting scheduling possibilities. These are flow, anti, and output dependencies (the fourth kind of dependence, the input dependence, does not prevent statements from being reordered, and we will not consider such dependencies here).

A flow dependence occurs when a value computed (defined) in statement S is read (used) in some statement T; we say that T is data flow-dependent on S. This type of data dependence shows how data flows between the statements of the program. For example, consider the following situation:

S: x = a + b
T: y = x + c

Statement S computes the value of x that statement T uses in its computation of y; therefore T is data flow-dependent on S. This kind of dependence is also called a read after write (RAW) dependence, and a scheduling algorithm cannot reorder these two statements because, otherwise, statement T might access the wrong value of x.

An anti-dependence occurs when a variable is used in statement S before that variable is reassigned (redefined) in statement T; we say that T is data anti-dependent on S. For example, consider this:

S: y = x + a
T: x = b

Statement T is data anti-dependent on S since it redefines the value of x after it has been used by statement S. A scheduling algorithm cannot reorder these two statements because, otherwise, S and any statement between S and T that uses (reads) x would access the wrong value of x. This kind of dependence is also called a write after read (WAR) dependence.

An output dependence occurs when a variable is assigned (defined) in statement S before that variable is reassigned (redefined) in some statement T; we say that T is data output-dependent on S. For example:

S: y = a
T: y = b

Statement T is data output-dependent on statement S because it writes into the variable y after statement S does. Again, a scheduler cannot reorder them because, in that case, all the statements that use variable y after T might use the unupdated value. This dependence is also called a write after write (WAW) dependence.

Figure 2.1: Types of Dependence Arcs

Data dependencies in the program must not be violated in order for a schedule to be legal (i.e., to produce correct results). When performing software pipelining, the loop body is modeled as a directed graph G = {V, E}, where V is the set of nodes that correspond to the statements of the loop body, and E is the set of arcs that correspond to the data dependencies between these nodes¹. The three types of dependence arcs, corresponding to the three types of data dependencies, are shown in Figure 2.1.

Such a graph is called the Data Dependence Graph (DDG). For each directed arc (i, j) we call xi its tail and xj its head. The dependence relationship is specified by assigning to each arc (i, j) in the DDG a pair of attributes (Ωij, dij):

• dij, or delay, is the time in clock cycles required for the tail operation xi to produce the result that the head operation xj will use (in the case of a flow or output dependence), or the time in clock cycles required for the tail operation xi to read its operand defined by the head operation xj (in the case of an anti-dependence).

• Ωij, or Iteration Distance, is the number of iterations that separates an instance of the tail operation xi from an instance of the head operation xj. For example, if Ωij = 0, the dependence between xi and xj exists within the same iteration, and is called a loop-independent dependence. If, on the other hand, Ωij = 2, then xj depends on xi from two iterations ahead, and such a dependence is called a loop-carried dependence with iteration distance 2.

¹A statement, an instruction, an operation, and a DDG node are used interchangeably throughout the text.

The DDG for the loop in Section 2.1 is shown in Figure 2.2.

Figure 2.2: The DDG for the loop in Section 2.1
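In code, a DDG of this kind reduces to a set of arcs labeled with (Ω, d) pairs. Below is one plausible encoding of the example loop's graph: the delays follow the latencies assumed in Section 2.1, and the distance-1 arc reflects z carrying its value between iterations. This is our reading of the example, not a transcription of Figure 2.2:

```python
# Sketch: the DDG of the Section 2.1 loop as arcs (tail, head) -> (omega, d).
DDG = {
    ("load", "add"):  (0, 2),  # flow: x is ready 2 cycles after the load issues
    ("add", "store"): (0, 1),  # flow: z is ready 1 cycle after the add issues
    ("add", "add"):   (1, 1),  # loop-carried flow: z = z + x reads last iteration's z
}

for (tail, head), (omega, d) in DDG.items():
    kind = "loop-carried" if omega > 0 else "loop-independent"
    print(f"{tail} -> {head}: omega={omega}, d={d} ({kind})")
```

Representing arcs this way makes the scheduling constraints of Section 2.4 directly checkable, since each constraint reads its (Ω, d) pair straight off the arc.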

2.3 Basic Definitions

The objective of software pipelining is to achieve the highest throughput possible for a given loop by overlapping the execution of consecutive iterations.

Assume a set V of N statements of a loop body {x1, x2, ..., xN}. Let X denote the number of loop iterations. A set of times at which each iteration of each loop statement is initiated is called a schedule, i.e.

Definition 2.1 An integer schedule is a set σ = {σ1, σ2, ..., σN}, where each σi is an integer function σi: {1, 2, ..., X} → ℕ, i = 1, 2, ..., N, such that for 1 ≤ k1 < k2 ≤ X, σi(k1) < σi(k2); here ℕ is the set of nonnegative integers.

If the loop iterations are initiated at a constant rate (computation rate), the schedule is said to be periodic.

Definition 2.2 A schedule σ is periodic with period II > 0 if there exist integer numbers ti such that:

σi(k) = ti + (k − 1) · II,   i = 1, 2, ..., N,  k = 1, 2, ..., X.   (2.1)

In equation (2.1), ti corresponds to the time of the initiation of the operation xi in the first iteration of the loop. The length of the interval at which iterations are initiated is called the Initiation Interval (II) [40]. Smaller values of II correspond to higher computation rates and higher throughputs.
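Equation (2.1) makes a periodic schedule fully determined by the first-iteration issue times ti and the period II; the monotonicity required by Definition 2.1 then holds automatically for II ≥ 1. A small illustrative sketch, using the issue times of the example loop from Section 2.1:

```python
# Sketch: a periodic schedule sigma_i(k) = t_i + (k - 1) * II  (equation 2.1).
def sigma(t_i, II):
    return lambda k: t_i + (k - 1) * II

II = 2
t = {"load": 1, "add": 3, "store": 4}            # first-iteration issue times
s = {op: sigma(ti, II) for op, ti in t.items()}

print(s["add"](1), s["add"](2), s["add"](3))     # 3 5 7: one add every II cycles
# Definition 2.1's monotonicity: sigma_i(k1) < sigma_i(k2) whenever k1 < k2.
assert all(s["load"](k) < s["load"](k + 1) for k in range(1, 10))
```

The whole scheduling problem thus collapses to choosing the integers ti and II, which is what makes the linear-programming formulations of Chapter 4 possible.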

Definition 2.3 A schedule σ is legal if it obeys the data dependence constraints and the resource constraints of the target architecture.

Definition 2.4 A schedule σ is rate-optimal if it is legal and achieves the maximal computation rate.

For arbitrary loops, rate-optimal software pipelining is impossible [45]. However, for loops consisting of a single basic block, schedules in the periodic form (2.1) are asymptotically rate-optimal. Thus, in our discussion we will focus on finding rate-optimal schedules of the form (2.1).

2.4 Modulo Scheduling

There are two basic approaches to finding the periodic schedules described in the previous section: the Operational approach and Modulo Scheduling. The Operational approach consists in the simultaneous unrolling and scheduling of a loop body until a repeating pattern is found. Although this approach has the potential to produce very good schedules, it is not guaranteed that the pattern will appear quickly, and the amount of information that needs to be maintained during the scheduling is fairly large [1, 2, 3].

Unlike the above approach, Modulo Scheduling encompasses a large number of algorithms and implementations that attempt to find a periodic schedule under the Modulo Constraints. There are two types of constraints that a schedule must obey in order to be legal:

1. Data Precedence constraints.

Let σ be a periodic schedule. σ_i(k) = t_i + (k - 1) · II means that operation x_i begins its k-th iteration at time t_i + (k - 1) · II under schedule σ.

σ is a legal schedule if for every pair of nodes x_i and x_j in the DDG that are connected by a dependence arc i → j : (d_ij, Ω_ij), the following is true:

t_j - t_i ≥ d_ij - Ω_ij · II, ∀(i, j) ∈ E (2.3)

This means that operation x_j must be issued at least d_ij cycles after operation x_i from the Ω_ij-th previous iteration. If Ω_ij = 0, operation x_j must follow operation x_i from the same iteration by at least d_ij cycles.
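The precedence test above is mechanical, so it can be sketched in a few lines (an illustration of ours, with illustrative latencies; the edge format and names are not from the thesis):

```python
# Sketch: verify the modulo precedence constraint
# t_j - t_i >= d_ij - Omega_ij * II for every dependence arc.
def obeys_precedence(t, edges, II):
    """t: dict node -> first-iteration start time t_i;
    edges: list of (i, j, d_ij, omega_ij) dependence arcs."""
    return all(t[j] - t[i] >= d - omega * II for i, j, d, omega in edges)

# Example in the spirit of the load/add/store loop: i1 -> i2 with
# delay 3 (same iteration), i2 -> i2 with delay 1 and distance 1,
# i2 -> i3 with delay 1 (same iteration). Latencies are assumed.
edges = [(1, 2, 3, 0), (2, 2, 1, 1), (2, 3, 1, 0)]
t = {1: 0, 2: 3, 3: 4}
print(obeys_precedence(t, edges, II=2))  # True
```

Shrinking t_2 to 1 would violate the first arc (1 - 0 < 3 - 0), so the same function returns False for that schedule.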

2. Resource constraints.

An operation x_i using a certain resource at time t in its k-th iteration uses the same resource at times ..., t - 2·II, t - II, t, t + II, t + 2·II, .... Thus, the placement of an operation in the schedule must avoid resource conflicts with future iterations of previously scheduled operations.

Because the form of the schedule and the scheduling constraints are defined in terms of the II, scheduling becomes too hard if the II is not known at scheduling time. To avoid such complications, Modulo Scheduling algorithms use an iterative approach, i.e. they begin by establishing lower and upper bounds on the II, and then search, among the possible values, for the smallest Initiation Interval for which a legal schedule can be found. The lower bound on the II, MinII, for a given loop is determined by two factors: the existence of recurrence cycles in the loop and the number of resources available for that loop's execution.

1. Recurrence cycles.

The recurrence cycles are possible because of the loop-carried dependencies between the loop statements. Consider a cycle C in the DDG:

x_1 → x_2 → ... → x_m → x_1

The corresponding dependence edges are:

(d_12, Ω_12), (d_23, Ω_23), ..., (d_m1, Ω_m1)

The corresponding data precedence constraints for each edge are:

t_2 - t_1 ≥ d_12 - Ω_12 · II, ..., t_1 - t_m ≥ d_m1 - Ω_m1 · II

Add equations (2.3) to obtain:

II ≥ (Σ_{(i,j)∈C} d_ij) / (Σ_{(i,j)∈C} Ω_ij)

The ratio of the sum of delays along the cycle C to the sum of iteration distances along the cycle C limits the rate at which the iterations of a loop containing C can be initiated. The cycle of the loop with the maximal such ratio is called a critical cycle and is in fact the most restrictive cycle in that loop (there may be more than one such cycle). Thus, the II of a loop cannot be smaller than the MinII determined by the critical cycles:

MinII_rec = ⌈(Σ_{(i,j)∈C_c} d_ij) / (Σ_{(i,j)∈C_c} Ω_ij)⌉

where C_c denotes one of the critical cycles of the loop.
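The recurrence bound can be computed by enumerating the simple cycles of the DDG and taking the maximal delay/distance ratio. The sketch below (ours, with illustrative data) uses a naive brute-force enumeration that is adequate only for tiny graphs:

```python
# Sketch: recurrence-based lower bound MinII_rec = max over cycles C
# of ceil(sum of delays along C / sum of distances along C).
from itertools import permutations
from math import ceil

def rec_min_ii(nodes, edges):
    """edges: dict (i, j) -> (delay d_ij, distance omega_ij)."""
    best = 0
    # brute force: try every ordering of every subset size (tiny graphs only)
    for r in range(1, len(nodes) + 1):
        for cyc in permutations(nodes, r):
            arcs = list(zip(cyc, cyc[1:] + cyc[:1]))
            if all(a in edges for a in arcs):
                d = sum(edges[a][0] for a in arcs)
                omega = sum(edges[a][1] for a in arcs)
                if omega > 0:
                    best = max(best, ceil(d / omega))
    return best

# Single recurrence i2 -> i2 with delay 1, distance 1: MinII_rec = 1.
edges = {(1, 2): (3, 0), (2, 2): (1, 1), (2, 3): (1, 0)}
print(rec_min_ii([1, 2, 3], edges))  # 1
```

A production compiler would use a polynomial minimum-cost-to-time-ratio cycle algorithm instead of enumeration, but the quantity computed is the same.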

2. Available Resources

Modulo Scheduling requires each operation of the loop to be executed exactly once during each II. Consider some resource s, such as an ALU, a write port of one of the register files, or an instruction issue slot. Let χ(s) be the set of instructions that use this resource s, and let s_i be the total number of clock cycles for which the instruction x_i ∈ χ(s) uses the resource s. The total number of cycles during which the resource s is used by all instructions x_i ∈ χ(s) in the loop during each loop iteration is Σ_{x_i∈χ(s)} s_i. If the number of copies of resource s in the target architecture is F_s ≥ 1, such that they may be used by different instructions in parallel, then at least ⌈(Σ_{x_i∈χ(s)} s_i) / F_s⌉ cycles are needed for all instructions in one loop iteration to complete their usage of s. Because each instruction must execute exactly once during II cycles, the II must be at least:

II ≥ ⌈(Σ_{x_i∈χ(s)} s_i) / F_s⌉

Taking the maximum over all the resources of the target architecture is equivalent to saying that the length of the II is bounded by the resource for which there is the greatest contention among the operations of the loop.
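The resource bound is a one-liner once per-resource usage is tabulated; this sketch (ours, with assumed resource names and counts) computes it:

```python
# Sketch: resource-based lower bound. For every resource s with F_s
# identical copies, II >= ceil(total use of s per iteration / F_s);
# the bound is the maximum over all resources.
from math import ceil

def res_min_ii(usage, copies):
    """usage: dict resource -> list of per-instruction cycle counts s_i;
    copies: dict resource -> number of copies F_s."""
    return max(ceil(sum(u) / copies[s]) for s, u in usage.items())

# e.g. four instructions use an ALU for 1 cycle each, with 2 ALUs;
# three instructions use a single store port for 1 cycle each.
usage = {"alu": [1, 1, 1, 1], "store_port": [1, 1, 1]}
copies = {"alu": 2, "store_port": 1}
print(res_min_ii(usage, copies))  # 3 (the store port is the bottleneck)
```

The overall MinII is then max(MinII_rec, MinII_res), and the iterative search for a schedule starts there.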

Definition 2.5 The Initiation Rate of a schedule equals 1/II.

Definition 2.6 (Modulo Scheduling Problem): Given a loop (its Data Dependence Graph), the number of available resources in the architecture, and an Initiation Rate, find a legal schedule for this loop.

Summary

In this Chapter the fundamentals of software pipelining and modulo scheduling were reviewed. Software pipelining is a code generation technique that

overlaps execution of different loop iterations in order to take advantage of the instruction-level parallelism in loops and of the target machine's computing capabilities. Modulo scheduling is one of the most successful approaches for performing software pipelining and has been implemented in production compilers. Under modulo scheduling, operations of the loop body are packed inside a time window whose size equals the interval between initiations of consecutive loop iterations. Dependencies between loop operations must be preserved and conflicts between operations due to hardware usage must be avoided. The modulo scheduling problem is, in general, NP-hard.

Chapter 3

The R8000 Processor Design

The R8000 microprocessor is a superscalar implementation of the MIPS 64-bit architecture with a target frequency of 75 MHz. The MIPS R8000 Microprocessor Chip Set, implemented using separate integer and floating point devices, delivers a peak performance of 300 MFLOPS. The R8000 CPU contains 2.6 million transistors. The R8010 Floating Point Unit contains 830 thousand transistors. The R8000 processor uses a superscalar machine organization that dispatches up to four instructions each clock cycle to two floating-point execution units, two memory load/store units, and two integer execution units. A split level cache structure reduces cache misses by directing integer data references to a 16 KByte on-chip cache while floating point data references are directed to a 4 MByte off-chip cache. Limited out-of-order execution for floating-point operations allows the R8000 processor to achieve performance comparable to that of a low-end vector supercomputer on floating-point intensive computation. In this Chapter, features of the MIPS R8000 Microprocessor that are relevant from the scheduling point of view are presented. For a more detailed architecture description, see [43, 26].

Figure 3.1: R8000 Microprocessor

3.1 Processor Overview

The diagram of the R8000 Microprocessor is shown in Figure 3.1.

The shaded areas identify the function units on the integer unit (R8000) and floating-point unit (R8010) chips. Additionally there are two tag RAM chips and a set of static RAMs making up the external cache.

Instructions are fetched from an on-chip 16 KByte instruction cache (Instruction cache). This cache is direct mapped with a line size of 32 bytes. Four instructions are fetched per cycle. There is a branch prediction cache (Branch cache) associated with the instruction cache. The branch cache is also direct mapped and contains 1K entries.

Instructions from the cache are realigned before going to the dispatch logic. Up to four instructions chosen from two integer, two memory, and four floating-point instruction types may be dispatched per cycle. Floating-point instructions are dispatched into a queue (FPQ) where they can wait for resource contention and data dependencies to clear without holding up the integer dispatching. In particular, the FP Queue decouples the floating-point unit to hide the latency of the external cache.

Integer and memory instructions get their operands from a 13-port integer register file (Integer Register File). Integer function units consist of two integer ALUs, one shifter, and one multiply-divide unit. Up to two integer operations may be initiated every cycle. Memory instructions go through the address generation units (Address Generator) and to the TLB. The TLB is a three-way set-associative cache containing 354 entries. It is dual ported so that two independent memory instructions can be supported per cycle. Integer loads and stores go to the on-chip data cache (Data cache). It, too, is dual ported

to support two loads or one load and one store per cycle. The Data cache is 16 KBytes.

Both the Instruction cache and the Data cache refill from the external cache (Streaming Cache). The Data cache refill time is 7 cycles: five to go through the external cache RAMs (described later) and two cycles to write a 32 byte line (the external cache delivers 16 bytes at a time). The Instruction cache refill time is 11 cycles (because of the branch prediction and the TLB).

Floating-point loads and stores go directly to the off-chip external cache described later. Its refill time depends on the system implementation. The floating-point unit contains two execution datapaths (Floating Point Units), each capable of fused multiply-adds, simple multiplies, adds, divides, square roots and conversions. A twelve-port register file (Floating Point Register File) feeds the execution datapaths. The two datapaths are completely symmetric and indistinguishable to the software - the compiler simply knows that it can schedule two floating-point operations per cycle.

3.2 Instruction Pipeline

Execution of each instruction is broken up into five stages:

(F) - Fetch and partial decode. Branch prediction.
(D) - Decode, read register file, scoreboard and dependency checks.
(A) - Generate address.
(E) - Execution.
(W) - Write the result into the register file.

Each stage takes one clock cycle except for the execution stage, which may take several clock cycles.

Such a pipeline stage sequence eliminates the load delay slot, i.e. an instruction that immediately follows a load and depends on the load's result does not incur a delay of one cycle. On the other hand, a one cycle delay happens whenever a load or a store immediately follows an instruction it depends on. Also, branches are resolved one stage later and, therefore, the branch miss penalty is increased. To overcome this problem the branch prediction is done very early in the pipeline, during the Fetch stage. Thus, an instruction at a branch target is fetched in the cycle following the branch, i.e. there is no branch delay slot. For backwards compatibility, however, one instruction immediately following a branch is executed and may be considered to be in the "branch delay slot".

3.2.1 Instruction Fetch

The R8000 has superscalar dispatch which allows issue of up to four instructions per clock to multiple execution units of the R8000 CPU and R8010 FPU. There are no boundary alignment restrictions. Instructions are fetched from the Instruction cache, predecoded, and placed into the instruction queue that acts as temporary storage for instructions waiting to be executed. There are four instructions in the dispatch unit at any given clock, chosen among all the instructions in the instruction queue. Should a situation arise where the queue is full, a stall is issued and the instruction cache will cease to fetch instructions until there is room in the queue. Instructions are sent out from the dispatch unit to execution depending on the resources available from cycle to cycle and the interdependencies between any of the four instructions in the dispatch unit.

3.2.2 Integer and Address Generation Pipelines

The R8000 handles all the integer operations. The R8000 contains two arithmetic logic units (ALUs) and two address generation units, capable of executing a maximum of four instructions per cycle. The ALUs and the address generation units have 1-cycle latencies. Dedicated busses allow each unit to run independently of the others. The R8000 has an on-chip dual ported 16 KByte Data Cache, with separate address and data busses for each port. This allows multiple simultaneous accesses to the cache. The R8000 has an on-chip single ported 16 KByte Instruction Cache. In addition to the Data and Instruction caches, the R8000 also contains Branch and Translation Lookaside Buffer (TLB) caches, giving a total of four caches on-chip. The Branch Cache implements a single prediction bit scheme to predict the branch outcome. A single TLB services both the Data and Instruction caches.

3.2.3 Floating Point Execution Pipeline

The R8010 Floating Point Unit (FPU) performs all the floating point functions. The R8010 has two execution units, allowing two arithmetic and two floating point memory operations per clock. The Floating Point Register File contains 8 read ports and 4 write ports. The R8010 has no on-chip cache and uses the Streaming Cache, which is a second level cache for the R8000, as its memory. Normally the R8010 FPU is controlled by the R8000 CPU. Dispatching of instructions, floating point loads and stores to the Streaming Cache, integer stores to the Streaming Cache, etc. are all under the control of the R8000 CPU.

3.3 CPU - FPU Interface

The CPU - FPU interface consists of floating-point instruction and data queues and the TBus that connects the CPU to the FPU. High throughput on the R8000 microprocessor is achieved through decoupling of the integer and floating point units. The streaming cache, which is accessed by floating point memory references, has a latency of 5 cycles on a hit. This presents a problem for floating-point code: a straightforward implementation would have floating-point loads casting a shadow at least five cycles or twenty instructions long (if a floating point load instruction is immediately followed by a compute instruction that uses the result of the load, the compute instruction would have to wait for five cycles until the data are fetched through the external cache pipeline). Streaming cache latency is hidden by decoupling the R8010 from the R8000 pipeline. The decoupling of the integer and floating-point functions allows the execution in the R8010 FPU to "slide" behind the execution in the R8000 CPU, thus hiding the streaming cache latency. Because floating-point operations are decoupled from the R8000, long floating-point operations or accesses to the main memory can be executed in parallel with other integer operations. Floating-point instructions are dispatched into a queue before going to the R8010. If a floating-point load instruction is immediately followed by a compute instruction that uses the result of the load, the queue allows both instructions to be dispatched together as if the load had no latency at all. The integer pipeline is immediately free to continue on to other instructions. The load instruction proceeds down the streaming cache pipeline and deposits the load data in the load data queue after 5 cycles. In the meantime the compute instruction waits in the floating point instruction queue until the load data is available.

By decoupling floating-point operations, a limited form of out-of-order execution of floating-point instructions is achieved. The R8000 is not held up, allowing vector start-up time to be reduced. For example, in transitioning from one loop to another, while the R8010 FPU is completing the first loop, the R8000 can begin processing the overhead code and get started on the second loop, even though the R8010 is not yet finished with the first loop.

3.3.1 Floating Point Queueing Mechanism

The Floating Point Queue Mechanism consists of an instruction queue and a load data queue, which together allow the R8000 CPU to run ahead of the R8010 FPU coprocessor.

Floating-point instructions are dispatched into a queue before going to the R8010. If a floating-point load instruction is immediately followed by a compute instruction that uses the result of the load, the queue allows both instructions to be dispatched together as if the load had no latency at all. The load instruction proceeds down the external cache pipeline and deposits the load data in the load data queue after five cycles. In the meantime the compute instruction waits in the floating-point instruction queue until the load data is available.

The TBus connects the R8000 Microprocessor, the R8010 Floating Point Unit, and the Cache Controller. In this subsection we only discuss the communication between the CPU and the FPU.

There are four types of instruction transfers dispatched by the R8000 CPU to the R8010 FPU through the TBus. Table 3.1 shows the content of the first

74 bits of the TBus bandwidth during each of the four types of transmissions (table notation is explained below):

Transfer Mode   73-65      64-56      55-28     27-0
Normal          MemSpecA   MemSpecB   FpOP-A    FpOP-B
MoveFrom        MfSpec     MemSpecB   FpOP-A    FpOP-B
IntStore        IStSpec    Data
MoveTo          MtSpec     Data

Table 3.1: Usage of the TBus Bandwidth (columns give TBus pin ranges)

Normal transfer

A normal dispatch contains two floating point arithmetic operations (FpOP-A and FpOP-B) and two floating point memory reference operations (MemSpecA and MemSpecB).

MoveFrom transfer

This is similar to Normal mode except that one of the memory reference operations is replaced by a move-from operation (MfSpec). This operation moves data from a floating point register to a general purpose register in the R8000.

IntStore transfer

The IntStore operation (IStSpec) supports integer stores to the streaming cache (see section 3.4). In this mode the TBus contains the 64-bit integer data

(Data) along with some store alignment information. No other operation can be placed on the TBus pins at the same time.

MoveTo transfer

The MoveTo operation (MtSpec) moves data from a general purpose register in the R8000 CPU to a floating point register. This format is similar to the IntStore format except that instead of store alignment information, the floating point register destination is transmitted along with the 64-bit data.

3.4 Memory System

In this section the cache organization of the R8000 is described. The cache system consists of a 16 KByte integer-only first level cache on chip and a 4 MByte second level Streaming Cache. The latter acts as a second level cache for the R8000 CPU and as a first level cache for the R8010 FPU coprocessor. Separation of the integer and the floating point data helps to achieve higher efficiency for floating point intensive code.

3.4.1 Streaming Cache

The Streaming Cache data RAMs have separate load and store data busses. Both a read and a write can be on their respective busses at the same time, although only one of them can be performed by the data RAM at a time. Having separate busses eliminates any bus turnaround time, which occurs when a write immediately follows a read, and allows read and write data to be pipelined to the RAM. The pipelined access to the streaming cache is shown in Figure 3.2.

Figure 3.2: Streaming Cache Access

There are five stages in the external streaming cache pipeline. Addresses are sent from the R8000 to the tag RAM in the first stage. The tags are looked up and hit/miss information is encoded in the second stage. The third stage is used for chip crossing from the tag RAM to the data RAMs. The data RAMs are accessed internally within the chip in the fourth stage. Finally, data is sent back to the R8000 and R8010 in the fifth stage. In the case of a cache hit, the total streaming cache access time is 5 cycles.

The 4 MByte, 4-way set associative cache implementation is split between the even and odd banks, each containing 2 MBytes and each having a dedicated Tag Table, allowing them to operate independently of each other. Each set consists of 2048 lines; each line contains sixty-four 64-bit words, divided as 32 words per bank.

Interleaving the streaming cache doubles the available bandwidth from the cache. However, two simultaneous accesses to the same bank will cause a one cycle stall. The compiler can mitigate bank conflicts by careful code generation. The hardware also is designed to help out the compiler by adding a one-deep queue called the "address bellow". Referring to Figure 3.2, immediately following the integer pipeline there is logic for sorting even and odd references, and a register that forms the address bellow. The address bellow resolves bank conflicts when both accesses alternate between odd and even. Imagine a sequence of pairs of both even references, followed in the next cycle by both odd references, followed in the next cycle by both even references, and so on. Without the address bellow, the interleaved cache would only be able to process one half of a pair of references per cycle - the pipeline would be stalled every other cycle and so the machine would run at half the speed. The address bellow slightly reorders the reference pattern, improving the chances of even and odd references lining up in time. For example, the second even reference in a cycle is enqueued in the address bellow and paired with the first odd reference from the next cycle, and so forth.
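The effect described above can be demonstrated with a rough behavioral model of our own devising (the real hardware's stall and queueing rules certainly differ in detail): the cache services at most one even and one odd reference per cycle, and the one-deep bellow lets a single leftover reference wait and pair up with an opposite-bank reference from the next cycle.

```python
# Rough model (an illustration, not the hardware specification):
# count cycles to service a stream of per-cycle reference pairs,
# each reference hitting the even ('e') or odd ('o') bank.
def service_cycles(pairs, use_bellow=True):
    incoming = [list(p) for p in pairs]
    waiting = []   # unserviced references; the bellow holds at most one
    cycles = 0
    while incoming or waiting:
        cycles += 1
        # a new pair enters if nothing is waiting, or (with the bellow)
        # if at most one deferred reference is waiting
        if incoming and (not waiting or (use_bellow and len(waiting) <= 1)):
            waiting += incoming.pop(0)
        # service at most one even and one odd reference this cycle
        for bank in ('e', 'o'):
            if bank in waiting:
                waiting.remove(bank)
    return cycles

pattern = [('e', 'e'), ('o', 'o'), ('e', 'e'), ('o', 'o')]
print(service_cycles(pattern, use_bellow=False))  # 8 (stall every other cycle)
print(service_cycles(pattern, use_bellow=True))   # 5 (near full rate)
```

With the bellow the even-even/odd-odd pattern drains in 5 cycles instead of 8, matching the text's observation that the machine would otherwise run at roughly half speed on such a pattern.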

3.5 Instruction Set Architecture

Incorporation of conditional move and fused floating-point multiply-add instructions into the instruction set of the R8000 microprocessor is essential for its enhanced performance.

The addition of four conditional move instructions helps avoid unnecessary branches. These instructions allow representation of IF-THEN-ELSE constructs without branches. The results of THEN and ELSE are computed unconditionally (speculatively) and placed in temporary registers. Then, depending on the condition, one of them is moved to the permanent register.

For example, the Fortran statement

    if (A(i).gt.big) idx = i

can be compiled into the following straight-line assembly code:

    fload   %f2 = A[%r1]            -- i in %r1
    cmp.gt  %cc1 = %f2 > %f1        -- big in %f1
    cmove   %r2 = %cc1 ? %r1 : %r2  -- idx in %r2

The addition of floating point multiply-add/subtract instructions allows two floating point computations to be performed with one instruction. Moreover, since no intermediate rounding is performed, lower latency and higher precision are possible.

Summary

In this chapter the design features of the R8000 microprocessor relevant to scheduling for this processor were described. Some of the more innovative features are: the integer pipeline has no load delay slot; code density is preserved by the alignment-free instruction dispatching mechanism that does not require nop instruction padding to achieve high issue rates; the split-level cache structure fetches large floating-point array reference streams from the external cache into the processor without thrashing the on-chip cache; interleaved cache bandwidth is enhanced with the address bellow mechanism that reorders conflicting accesses; and a limited form of out-of-order execution is supported by decoupling the floating-point unit from the integer pipeline.

Chapter 4

ILP Model for a Superscalar Processor

In section 4.1 the ILP formulation of the modulo scheduling problem, developed by R. Govindarajan, E. Altman, and G. Gao [5], is introduced. Based on this formulation, in section 4.2 we develop an ILP model for the MIPS R8000 microprocessor.

4.1 ILP Formulation

Modulo scheduling can be formulated as an integer linear programming (ILP) problem. As such, it defines a set of linear constraints imposed on a legal solution by dependencies in the program and by resource constraints of the target architecture. Out of the many legal schedules that satisfy such constraints, the best is chosen according to a certain optimality criterion.

Consider the example loop from Chapter 2:

i1: x = load A(i)
i2: z = z + x
i3: A(i) = store z

Table 4.1 shows a periodic schedule σ for this loop of the form (2.1):

Table 4.1: Periodic schedule for the example loop

Here II = 2, and the kernel appears in cycles 3, 4; t_i1 = 0, t_i2 = 3, t_i3 = 4.

For each t_i a pair of values is defined, k_i and o_i:

k_i = ⌊t_i / II⌋ and o_i = (t_i mod II)

If we consider a time window (frame) of size II, corresponding to the repetitive pattern, then this expression means that the operation x_i is initiated at the o_i-th clock cycle of the repetitive pattern, during the k_i-th time window from the beginning of the execution.
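The decomposition is just integer division and remainder; applying it to the start times of the schedule in Table 4.1 reproduces the vectors K and O used below:

```python
# Illustration: decompose each start time t_i into the slot o_i inside
# the repetitive pattern and the window index k_i, so t_i = o_i + k_i * II.
II = 2
t = [0, 3, 4]                  # t_i1, t_i2, t_i3 from Table 4.1
k = [ti // II for ti in t]     # k_i = floor(t_i / II)
o = [ti % II for ti in t]      # o_i = t_i mod II
print(k, o)                    # [0, 1, 2] [0, 1, 0]
assert all(oi + ki * II == ti for oi, ki, ti in zip(o, k, t))
```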

Each t_i can now be written in the following way:

t_i = o_i + k_i · II (4.2)

where k_i ≥ 0 and 0 ≤ o_i ≤ II - 1.

Since each o_i lies in [0, II - 1]:

o_i = Σ_{t=0}^{II-1} t · a_{t,i} (4.4)

where A = [a_{t,i}] is a 0-1 matrix:

a_{t,i} = 1, if o_i = t; a_{t,i} = 0, otherwise

In other words, a_{t,i} = 1 if operation x_i is issued at clock cycle t from the beginning of the repetitive pattern.

In our example, matrix A is:

A = [ 1 0 1
      0 1 0 ]

and the vectors O and K are: O = [0 1 0], K = [0 1 2].

Substituting (4.4) into (4.2) we obtain the matrix form of a periodic schedule σ:

II · K + [0, 1, ..., II - 1] × A = T (4.5)

Because each operation is allowed to execute only once within the repetitive pattern, the following condition applies:

Σ_{t=0}^{II-1} a_{t,i} = 1, i = 1, ..., N (4.6)

Figure 4.1: (a) Reservation table for a single-precision divide; (b) its CRTs for II=11

σ is legal if it satisfies the linear precedence constraints (2.3):

In our example:

To determine resource requirements at each clock cycle t, we need to know not only when each operation is initiated, but also the usage of the various pipeline stages during execution. For this purpose circular reservation tables (CRTs) of the pipelines are used.

A CRT of a pipeline is defined as follows:

• For each pipeline whose execution time d < II, we extend its reservation table to II columns by adding (II - d) zero column-vectors. Since d < II, each entry in the CRT is at most 1.

• For the case d > II, we fold the reservation table to II columns such that the t-th column of the original reservation table is added to the (t mod II)-th column in the CRT. Under the Modulo Scheduling Constraint no instruction can use a particular pipeline stage at clock cycles congruent modulo II (i.e. at clock cycles t, t + II, t + 2·II, etc.), and each entry in the CRT is at most 1. The above assumption is necessary for a fixed function unit assignment, i.e. all iterations of an instruction in the pipeline are initiated on a particular execution unit. Fixed function unit assignment is required for generating correct code on VLIW processors. Some IIs would be impossible because of the modulo scheduling constraint.

Figure 4.1 shows a reservation table of a single-precision floating point divide pipeline and its corresponding CRT for II = 11: the reservation table has been folded.
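The folding step can be sketched directly (an illustration of ours, with a toy reservation table rather than the divide pipeline of Figure 4.1):

```python
# Sketch: fold a reservation table (rows = pipeline stages, columns =
# cycles after initiation) into a circular reservation table (CRT) of
# II columns, adding column t of the original into column t mod II.
def fold_crt(table, II):
    stages = len(table)
    crt = [[0] * II for _ in range(stages)]
    for s in range(stages):
        for t, used in enumerate(table[s]):
            crt[s][t % II] += used
    return crt

# A toy 2-stage unit: stage 1 busy in cycles 0-1, stage 2 in cycles 2-3.
# Folded for II = 3; under the modulo scheduling constraint every entry
# of the folded table must remain at most 1, as it does here.
table = [[1, 1, 0, 0],
         [0, 0, 1, 1]]
print(fold_crt(table, 3))  # [[1, 1, 0], [1, 0, 1]]
```

Summing a row of the folded table over the instructions issued in each slot is exactly how the resource requirement R(t) below is obtained.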

The s-th row of the CRT of the pipeline p specifies the usage of pipeline stage s. This row is denoted by CRT_p^s, a vector of length II. An entry CRT_p^s[t] is 1 if stage s is required (t mod II) clock cycles after the initiation of an instruction in that pipeline. For each instruction x_i that uses p, the resource usage matrix U^s of stage s is defined as follows:

U^s[t, i] = Σ_{q=0}^{II-1} CRT_p^s[q] · a_{(t-q) mod II, i} (4.7)

Let χ(p) denote the set of operations being executed in p. Then, the resource requirements of pipeline stage s at clock cycle t in the repetitive pattern are:

R_p^s(t) = Σ_{x_i ∈ χ(p)} U^s[t, i]

where R_p^s(t) represents the number of instructions using stage s of the pipeline

Figure 4.2: (a) CRT of a two-stage function unit; (b) The A matrix of a schedule

p at time t. This number must not exceed R_p, the number of pipelines of type p in the architecture.

The linear resource constraints, therefore, are:

R_p^s(t) ≤ R_p, ∀t ∈ [0, II - 1], for every stage s of every pipeline type p

The linear scheduling constraints described above define a set of feasible periodic schedules for a given loop and the target architecture. An optimization objective can be set to obtain schedules that satisfy certain criteria. Two proposed objectives are:

- minimizing the number of required execution pipelines;

- minimizing the number of required registers.

Minimizing the number of pipelines is fairly straightforward [5]. Minimizing register usage is more complicated, but also more interesting, because the resulting schedule, even a rate-optimal one, is useless if it does not fit into the available number of machine registers. Minimizing register usage also requires additional register constraints to be included in the ILP formulation, making it complex and very hard to solve efficiently [4].

4.2 ILP Formulation for Superscalars

The ILP formulation presented in the previous section was implemented in MOST - the Modulo Scheduling ToolSet. MOST is a collection of modulo scheduling implementations that contains different scheduling heuristics such as Huff's lifetime-sensitive scheduling [27], Gasperoni's method [20], Decomposed Software Pipelining (DESP) [48], and the Exhaustive Enumeration method [4]. MOST was developed as a research tool for studying and analyzing many diverse approaches to software pipelining. Because it was never intended to be a component of a real compiler, this implementation was not a full software pipelining implementation: its output was a set of static quality measures, principally the II of the schedule found and the number of registers required, not a piece of runnable code; register allocation was not implemented; its only targets were models that exhibited certain interesting properties, never a real commercial high-performance processor.

In this thesis we take MOST one step further and establish an ILP model for the existing architecture of the MIPS R8000 superscalar processor.

4.2.1 Modified Resource Constraints

Definition 4.1 Two operations x_i and x_j are of the same type λ if their resource usage patterns are identical; we say x_i, x_j ∈ λ.

Each machine resource can be thought of as a stage in the instruction pipeline. All machine resources can be classified into two categories:

• shared stages are used by different pipelines;

• non-shared stages are used only by a particular pipeline.

The previous formulation assumed no sharing of machine resources (stages) among different pipelines. However, a realistic instruction pipeline includes such resources as instruction issue logic, busses, register file ports, etc., shared by all or some of the execution pipelines. Thus, instead of using the CRTs that represent the resource usage patterns of different pipelines, we use modified CRTs that represent the resource usage patterns of different instruction types. This way the machine resources are looked at as stages during instruction execution rather than as stages of the machine's execution units. We will describe modified CRTs later in this section.

The modulo scheduling constraint that forces each instruction to be "bound" to a particular execution unit is no longer required in the superscalar environment and may be excessively restrictive. Consider the single-precision divide operation in Figure 4.1(a). It uses the divide stage for 11 consecutive cycles, and under the modulo scheduling constraint the MinII for a loop containing such an instruction will be at least 11 cycles, no matter how many divide pipelines are available in the machine. However, one can easily see that if different iterations of this instruction are allowed to be initiated in different pipelines (which is the case with superscalar architectures), iterations can be initiated at a rate of one every 6 cycles, if two pipelines of that type are available.

In order to allow sharing of processor resources and to avoid unnecessary restrictions on the II, modified CRTs, specifying the resource usage of instructions of type λ, are constructed as follows:

- For each operation type λ whose execution time d < II, we extend its reservation table to II columns by adding (II − d) zero column-vectors.

- For the case d > II, we fold the reservation table to II columns such that the t-th column of the original reservation table is added to the (t mod II)-th column in the modified CRT. By dropping the modulo scheduling constraint we allow entries in the modified CRT to be greater than 1.

Figure 4.3: CRT for a single-precision divide for II = 6

The modified CRT for instructions using the single-precision divide pipeline from figure 4.1, for II = 6, is shown in Figure 4.3. Because the second iteration of the loop begins before the divide operation from the first iteration is completed, two divide execution pipelines are needed for the first five cycles of that second iteration: the one in which the divide from the first iteration is being completed, and the other one to execute the divide operation from the second iteration of the loop.
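The folding rule above can be sketched in a few lines (the function name and the list representation are ours, not the thesis's):

```python
def modified_crt(row, ii):
    """Fold one reservation-table row (the busy pattern of a single
    resource, cycle by cycle) into II columns: column t of the original
    table is added to column t mod II.  Entries may exceed 1, meaning
    that several in-flight iterations need that many units of the
    resource in the same modulo cycle."""
    folded = [0] * ii
    for t, busy in enumerate(row):
        folded[t % ii] += busy
    # A row shorter than II simply leaves the remaining columns at zero
    # (the "extend with zero column-vectors" case).
    return folded

# Single-precision divide: the divide stage is busy for 11 consecutive
# cycles, folded at II = 6 as in Figure 4.3.
print(modified_crt([1] * 11, 6))   # [2, 2, 2, 2, 2, 1]
```

The peak entry of 2 matches the observation above: two divide pipelines must be available during the first five cycles of the repetitive pattern.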

Let us denote the s-th row in the modified CRT for the type λ by CRT_s^λ, a vector of length II. An entry CRT_s^λ[t] = m if resource s is required by an instruction of type λ at clock cycles t, t + II, …, t + (m − 1)·II after the initiation of that instruction. Thus, for each instruction x_i ∈ λ we define a resource usage matrix Ū_s (similar to the matrix U_s from equation 4.7, except that it is computed using the modified CRTs) specifying the usage of resource s by this instruction:

The constraints of 4.1 are modified slightly to reflect the fact that any machine resource may be used as a part of any instruction pipeline. At each clock cycle t in the repetitive pattern, the resource requirements for a particular resource s are:

ρ_s(t) = Σ_λ Σ_{x_i ∈ λ} Ū_s[t, i],  ∀t ∈ [0, II − 1]
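A sketch of how this per-cycle requirement could be evaluated for one resource, given the folded CRTs and the issue cycles of the instructions (the data layout is our assumption):

```python
def resource_usage(schedule, crts, ii):
    """Demand on one resource at each cycle of the repetitive pattern.

    schedule: list of (instr_type, issue_cycle) pairs;
    crts: instr_type -> folded CRT row of length ii for this resource.
    The schedule is feasible for the resource when every entry of the
    result is <= the number of units the machine provides."""
    usage = [0] * ii
    for typ, t_issue in schedule:
        for c, need in enumerate(crts[typ]):
            # An instruction issued at t_issue uses the resource `need`
            # times at modulo cycle (t_issue + c) mod ii.
            usage[(t_issue + c) % ii] += need
    return usage

divide_crt = {"div.s": [2, 2, 2, 2, 2, 1]}   # folded row from Figure 4.3
print(resource_usage([("div.s", 0)], divide_crt, 6))   # [2, 2, 2, 2, 2, 1]
```

With two units of the divide resource, one divide per window fits; issuing a second divide one cycle later would push the demand to 4 units and violate the constraint.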

4.2.2 Objective Function

Because the number of available machine resources is fixed in the architecture, minimizing the number of pipelines used by the schedule is not of interest. On the other hand, obtaining register-wise optimal schedules is very important for loops whose register requirements are anywhere near the limit of available registers. In such cases a schedule that uses one extra register (compared to the optimal one) may not be register allocatable, and this will lead to spills and rescheduling, degrading the II.[1]

Unfortunately, using the integrated formulation for register minimization, mentioned in Section 4.1, is prohibited by its computational complexity. Interesting loops (those with register pressure near the limits of available registers) are usually medium and large size loops, and the optimal solution to the integrated formulation cannot be achieved in any reasonable time.

Instead, two different approximations for the register requirements of a schedule were used, leading to very similar results. One of the approximations corresponds to the upper bound, and the other to the lower bound on the register requirements of the schedule. Although minimizing the upper bound represents a certain backing off from strict optimality, it still produces good near-optimal schedules. On the other hand, using the lower bound on registers as a cutoff point, one can hope to reduce the scheduling search space. If the lower bound is tight enough, obtained schedules will be readily register allocatable. Using the lower bound on registers also represents giving up optimality in exchange for a shorter solution time.

[1] Register allocation is described in terms of generalized registers. However, the implementation distinguishes between all kinds of registers present in the target machine, which is a fairly straightforward extension.

First, we need to introduce some additional definitions. There are three types of variables in loops:

- loop variant variables are defined in each loop iteration and cause flow dependencies between the loop statements;

- loop invariant variables are defined outside and used inside the loop body. Such variables must be kept in registers, and, therefore, their presence reduces the number of registers available for allocation. Loop invariant variables do not cause any dependencies between the statements in the loop body;

- keeper variables (such as the stack pointer, for example) are defined in the loop body, but have a particular physical register associated with them. Such variables cannot be allocated to any other register and, therefore, cannot be renamed. They cause flow, anti, and output dependencies between statements in the loop body.

The set E′ of edges in the DDG is the subset of all flow dependence edges not associated with keeper variables.

The non-allocatable variables are not considered when a schedule's register requirements are being minimized, because they already have certain physical registers assigned to them and do not need to be allocated. However, they still occupy registers. Thus, when considering the number of registers available for allocation, the relevant quantities are:

N_T — the number of all registers available for allocation in the target architecture;
N_LI — the number of loop invariants;
N_NA — the number of keeper variables in the loop body.

We also assume that before scheduling, variable renaming has been performed and, except for the keeper variables and anti-dependencies due to conditional moves, the DDG is in Static Single Assignment (SSA) form [12].

4.2.3 Upper Bound on the Number of Registers

The upper bound on the number of registers required by a schedule is given by the sum of the buffer sizes of all values kept in registers [35]. The buffer size of a variable corresponds to the number of iterations this variable's lifetime spans, and can be defined as follows:

L(x_i) denotes the lifetime of the variable into which instruction x_i writes its result. Because the program is in SSA form and we do not consider "keeper" variables, i.e. there are no output dependencies, this variable's lifetime corresponds exactly to x_i's longest flow dependence.

Buffers overestimate register pressure. For example, let the loop's II be 10 cycles. A variable whose lifetime spans 3 loop iterations needs a buffer of size 3, b_i = 3. Since a buffer is allocated for a variable for the duration of the entire II, such a variable will be kept in the buffer for 30 cycles. However, if the actual lifetime of this variable is less than 30 cycles, say 23 cycles, it does not need to be kept in a register for the entire 30 cycles. It should also be noticed that registers can be shared by different values when their lifetimes do not overlap, which is not reflected by the buffers. Linear buffer constraints that approximate the buffer size of a variable look like the following [35]:

II·b_i + t_i − t_j ≥ II·(a_ij + 1) − 1,  ∀(i, j) ∈ E′,  b_i integer  (4.13)

and the objective function is:

min Σ_{x_i : ∃(i,j) ∈ E′} b_i

The complete formulation for software pipelining with buffer minimization is given in Appendix B.1.
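The overestimate discussed above is easy to reproduce numerically; a minimal sketch, assuming the buffer size is simply the number of II-windows the lifetime spans:

```python
import math

def buffer_size(lifetime, ii):
    """Number of iterations (windows of size II) a value's lifetime
    spans; the buffer bound charges a full register slot for each
    window, even when the value dies partway through the last one."""
    return math.ceil(lifetime / ii)

ii = 10
print(buffer_size(23, ii))        # 3 buffers are allocated ...
print(buffer_size(23, ii) * ii)   # ... holding the value for 30 cycles,
                                  # though it is live for only 23
```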

4.2.4 Lower Bound on the Number of Registers

Once a loop has been scheduled, an absolute lower bound on the schedule's register pressure can be found by computing the maximum number of variables that are live at any cycle of the schedule. This register pressure can be approximated by the average cumulative lifetime, i.e. the total length of all lifetimes divided by II, which is a schedule-independent lower bound on the loop's register pressure [27]:

As was mentioned earlier, when a program is in SSA form and "keeper" variables are not considered, variables' lifetimes correspond exactly to the longest flow dependencies of their defining instructions, thus:

MinAvg, the schedule's average cumulative lifetime, gives the lower bound on the register requirements of a schedule. For example, if II = 5 and x_1, x_2 are two variables whose lifetimes overlap, each live for 2 cycles, it is obvious that 2 registers are needed for this schedule, but MinAvg = ⌈(2 + 2)/5⌉ = 1. Thus, if a schedule's MinAvg exceeds the number of registers in the target machine, this schedule is not register allocatable. On the other hand, the fact that the MinAvg of a schedule is less than the number of available registers does not guarantee allocatability of this schedule. R. Huff estimated that for the majority of loops MinAvg is very close to the loop's real register requirements, and that such a lower bound is quite tight. Our experiments showed that MinAvg is a somewhat worse approximation than buffers.

The complete formulation for software pipelining under the limited average cumulative lifetime is given in Appendix B.2.
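The MinAvg bound can be sketched directly, reproducing the small example above:

```python
import math

def min_avg(lifetimes, ii):
    """Schedule-independent lower bound on register pressure: the total
    length of all lifetimes averaged over the II window, rounded up."""
    return math.ceil(sum(lifetimes) / ii)

# Two overlapping variables, each live 2 cycles, II = 5: two registers
# are really needed, but MinAvg = ceil((2 + 2) / 5) = 1.
print(min_avg([2, 2], 5))   # 1
```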

4.2.5 Loop Overhead Optimization

The ILP pipeliner strives to minimize the number of registers used in the software pipelined steady state. The motivation behind such an approach is to produce schedules that can be register allocated. However, there is another factor that affects the resulting performance - the time to enter and exit a pipelined steady state, or loop overhead. This overhead is constant relative to the trip count of the loop, and thus increases in importance as the trip count decreases and asymptotically disappears in importance as the trip count increases. If we ignore dynamic factors such as the memory system effects discussed later, different schedules of a loop with identical IIs and different register requirements differ at most in overhead, so long as they both fit in the machine's available registers.

[Figure: example DDG with two alternative schedules, Schedule 1 and Schedule 2]

Register usage affects the loop overhead because some registers must be saved before the loop is entered and restored after it is exited, which is done in the prologue and the epilogue. A more important factor than register usage that influences pipeline overhead is the number of instructions (or the number of cycles) in the loop's prologue and epilogue. The number of instructions in the prologue and epilogue is a function of how deeply pipelined each instruction in the loop is and whether the less deeply pipelined instructions can be executed speculatively during the final few iterations of the loop.

The pipeline depth of the instruction x_i in the schedule is precisely the number of the window of size II, corresponding to the repetitive pattern, in which it appears for the first time from the beginning of the schedule. In the ILP formulation this number is denoted by the variable k_i from (4.1). Thus, minimizing the depth of the pipeline is equivalent to minimizing:

Consider the example DDG in Figure 4.2.5 (a). In schedule 1 (Figure 4.2.5 (b)), instructions were scheduled in order and iterations were overlapped with II = 3. The first iteration of instruction A appears in the prologue, before the steady state is formed, with a pipeline depth of 1. Schedule 2 (Figure 4.2.5 (b)) does not have a prologue and the pipeline depth of all instructions is 0, because the execution of instructions B and D was overlapped (in this case, the schedule resembles a simple basic block schedule).

The complete formulation for software pipelining with pipeline depth minimization is given in Appendix B.3.
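The depth computation itself is just integer division of the issue time by II; a sketch with hypothetical issue times (the times below are ours, chosen to mimic Schedule 1, not taken from the figure):

```python
def pipeline_depths(issue_times, ii):
    """k_i from (4.1): the index of the II-window in which instruction
    x_i first appears.  The prologue must fill max(k_i) windows before
    the steady state is reached."""
    return {x: t // ii for x, t in issue_times.items()}

depths = pipeline_depths({"A": 0, "B": 1, "C": 3, "D": 4}, 3)
print(depths)                # {'A': 0, 'B': 0, 'C': 1, 'D': 1}
print(max(depths.values()))  # 1: one prologue window
```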

4.3 R8000 Memory System Optimization

The MIPS R8000 provides a simple implementation of an architecture supporting more than one memory reference per cycle. The processor can issue two references per cycle, and the memory (specifically the second-level streaming cache) is divided into two banks of double-words, the even address bank and the odd address bank. If two references in the same cycle both address the same bank, one is serviced immediately, and the other is queued for service in the one-element queue called the bellow (see Chapter 3). If this hardware configuration cannot keep up with the stream of references, the processor stalls. In the worst case there are two references every cycle all addressing the same bank, and the processor stalls once on each cycle, so that it ends up running at half speed.

The MIPSpro compiler attempts to find known even-odd pairs of references to schedule in the same cycle - it does not model the bellow feature of the memory system. The ILP pipeliner adopted a similar strategy. Before scheduling begins, for each memory reference m a list P(m) of all other references m', for which (m, m') is known to be an even-odd pair, is formed. A set of linear memory constraints is added to the ILP formulation that forces each reference m to be scheduled either with its pair m' or with no other memory references in the same cycle. This can lead to a failure to find a legal schedule at a given II. In such cases, the memory constraints are omitted and the scheduling proceeds for the same II but without any consideration of the memory system.

In order to build the list P(m) of pairable candidates for each memory reference m, a memory analysis is performed.

4.3.1 Memory Reference Analysis

Consider the following loop which references the array A:

Loop:
    x_i: ... = A[a·I + b]
    x_j: ... = A[c·I + d]
EndLoop

Suppose that the I-th iteration of x_i and the (I + δk_ij)-th iteration of x_j have been scheduled in the same cycle:

Loop:
    x_i: ... = A[a·I + b]
    x_j: ... = A[c·(I + δk_ij) + d]
EndLoop

To be pairable, these two memory references must address opposite memory banks. Because the banks contain double-words, two addresses map into opposite banks if they are separated in memory by a number of bytes divisible by 8, but not by 16. Thus, two references address the opposite memory banks when:

(c − a)·I + c·δk_ij + (d − b) ≡ 8 (mod 16)  (4.17)

Loop counter I is a variable in equality 4.17. Because b, d and δk_ij are constants for a given schedule (b and d define the addressed memory locations, and δk_ij depends on the schedule), equality 4.17 is true for all I, i.e. holds in all iterations of the loop, if and only if

(c − a)·I is a multiple of 16 for all I

(c − a) % 16 = 0

Otherwise there may exist an iteration I such that equation (4.17) does not hold.
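The double-word bank interleave can be stated directly on byte addresses; a small sketch (the function names are ours):

```python
def bank(addr):
    """Streaming-cache bank of a byte address: with an 8-byte
    (double-word) interleave, bit 3 of the address selects the even
    vs. the odd bank."""
    return (addr >> 3) & 1

def opposite_banks(addr_a, addr_b):
    """Two addresses fall into opposite banks iff their distance is a
    multiple of 8 bytes but not a multiple of 16."""
    d = abs(addr_a - addr_b)
    return d % 8 == 0 and d % 16 != 0

print(opposite_banks(0, 8))    # True  (even bank vs. odd bank)
print(opposite_banks(0, 16))   # False (same bank again)
```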

Because c ≥ 0 and δk_ij ≥ 0:

c·δk_ij + |d − b| = 8 + 16·m_ij,  m_ij = 0, 1, 2, …  (4.19)

From equation 4.19 we can derive the following non-interference rules for δk_ij:

1. if c%16 = 0 and (d − b)%16 = 8, then the two memory references A[aI + b] and A[c(I + δk_ij) + d] never interfere in cache, independent of the value of δk_ij;

2. if c%16 = 8 and (d − b)%16 = 0, then δk_ij ∈ {1, 3, 5, …} guarantees that A[aI + b] and A[c(I + δk_ij) + d] do not interfere in cache; in the same way:

3. if c%16 = 8 and (d − b)%16 = 8, then δk_ij ∈ {0, 2, 4, …};

4. if c%16 = 4 and (d − b)%16 = 0, then δk_ij ∈ {2, 6, 10, …};

5. if c%16 = 4 and (d − b)%16 = 8, then δk_ij ∈ {0, 4, 8, …};

6. if c%16 = 4 and (d − b)%16 = 4, then δk_ij ∈ {1, 5, 9, …};

7. if c%16 = 4 and (d − b)%16 = 12, then δk_ij ∈ {3, 7, 11, …}.

These rules merely state which values of δk_ij satisfy equality 4.19 for certain values of c, b and d. Any other values of c, b and d do not allow us to conclude whether the two memory references are pairable.

Definition 4.2 (Sufficient Conditions for Pairability) Two array references of the form A[aI + b] and A[c(I + δk_ij) + d] are pairable (i.e. they are mapped into the opposite memory banks) if:

- they reference the same array in memory;

- (c − a) % 16 = 0;

- δk_ij satisfies the non-interference rules.

From the non-interference rules, δk_ij has the form

δk_ij = off_ij + step_ij · m_ij,  m_ij = 0, 1, 2, …  (4.20)

For example, for δk_ij ∈ {3, 7, 11, …}, step_ij = 4 and off_ij = 3.
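Rather than tabulating the seven rules by hand, the admissible δk_ij values (and the resulting off_ij and step_ij) can be recovered by brute force from the congruence behind equation 4.19; a sketch, under the assumption that |d − b| enters only through its residue mod 16:

```python
def noninterference(c_mod, db_mod):
    """Solve c*dk + |d - b| = 8 (mod 16) for dk by enumeration, given
    c % 16 and |d - b| % 16.  Returns (off, step) describing the
    admissible dk = off + step*m, m = 0, 1, 2, ..., or None when no
    dk satisfies the congruence."""
    sols = [dk for dk in range(64) if (c_mod * dk + db_mod) % 16 == 8]
    if not sols:
        return None
    step = sols[1] - sols[0] if len(sols) > 1 else None
    return sols[0], step

print(noninterference(4, 12))   # (3, 4): dk in {3, 7, 11, ...} - rule 7
print(noninterference(8, 0))    # (1, 2): dk in {1, 3, 5, ...}  - rule 2
print(noninterference(0, 8))    # (0, 1): any dk works          - rule 1
```

Enumerating all residue combinations this way reproduces the rule table above, including δk_ij ∈ {2, 6, 10, …} for the c%16 = 4, (d − b)%16 = 0 case.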

Let M = {x_i}, i = 1, 2, …, m, be the set of instructions that represent memory references in the loop. For each x_i ∈ M, a vector P of 3-tuples (x_j, off_ij, step_ij) is built:

- each x_j ∈ M, x_j ≠ x_i, is a memory reference pairable with x_i;

- off_ij and step_ij define the set of acceptable δk_ij values that satisfy the sufficient conditions.

4.3.2 Memory Constraints for the ILP formulation

Linear memory constraints have been added to the ILP formulation that enforce the following rule: two memory references may be scheduled in the same cycle only if they represent a known memory pair, i.e. they satisfy the sufficient conditions.

1. a memory reference x_i that has no pairable candidates must not share a cycle with any other memory reference x_j:

a_{t,i} + a_{t,j} ≤ 1,  ∀x_i ∈ M without a memory pair candidate, ∀x_j ∈ M, ∀t ∈ [0, II − 1]  (4.21)

2. for every memory reference x_i and its pair candidate x_j the following must be true:

|k_i − k_j| = off_ij + step_ij · m_ij  (4.22)

if the k_i-th iteration of x_i and the k_j-th iteration of x_j are scheduled in the same cycle.

The condition (4.22) may be expressed in the following linear form:

k_i − k_j = (2·w_ij − 1)·off_ij + step_ij · (m_ij + (w_ij − 1)·maxm_ij),

∀t ∈ [0, II − 1] such that a_{t,i} = a_{t,j} = 1  (4.23)

where maxm_ij is the upper limit on m_ij, which will be computed later, and w_ij ∈ {0, 1} encodes the sign of k_i − k_j.

which can be expressed in the form of inequalities as:

Since these inequalities should only hold whenever for some t ∈ [0, II − 1] we have a_{t,i} = a_{t,j} = 1, we can write:

Inequalities (4.27) and (4.28) are always satisfied when a_{t,i} and a_{t,j} are not both 1, and therefore do not impose any constraints on k_i and k_j. When there exists a time t ∈ [0, II − 1] at which a_{t,i} = a_{t,j} = 1, however, inequalities (4.27) and (4.28) require that k_i and k_j be such that the I-th iteration of x_i and the (I + |k_i − k_j|)-th iteration of x_j are pairable.

It only remains to compute the value of maxm_ij. From equation (4.20), we notice that:

max |k_i − k_j| = off_ij + step_ij · maxm_ij  (4.29)

Consequently,

where

is the upper bound on the relative distance between the iterations from which the two references come. asap(x_i) and alap(x_i) are the earliest and the latest possible times when the instruction x_i may be issued.

The full set of linear constraints for the ILP formulation with memory constraints and buffer minimization is shown in Appendix B.4.
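One plausible way to bound maxm_ij, assuming (as the discussion around equation 4.30 suggests) that the largest iteration distance is limited by how far apart the two issue times can drift within their asap/alap ranges - this is our reading, not a verbatim transcription of 4.30:

```python
def max_m(off, step, asap_i, alap_i, asap_j, alap_j, ii):
    """Upper limit on m_ij in dk = off + step*m: the iteration distance
    |k_i - k_j| cannot exceed the asap/alap slack of the two references
    measured in windows of size II."""
    max_dist = (max(alap_i, alap_j) - min(asap_i, asap_j)) // ii
    return max(0, (max_dist - off) // step)

# With issue times confined to [0, 30] and II = 6, at most 5 windows
# separate the two references; for off = 3, step = 4 only m = 0 fits.
print(max_m(3, 4, 0, 30, 0, 30, 6))   # 0
```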

Summary

In this Chapter the integer linear programming (ILP) formulation of modulo scheduling was introduced. Contrary to heuristic methods, the ILP approach guarantees optimality of its solution. Within the ILP framework, the modulo scheduling problem is formulated as an optimization problem defined by a set of linear constraints and given some cost criterion to minimize. Minimizing the number of registers required by the schedule is important, because it may help to avoid spilling registers in loops with high register pressure. Unfortunately, using the integrated formulation for scheduling and register allocation is impossible due to its computational complexity. Two approximations to gauge register pressure are buffers [35] and average cumulative lifetime [27].

In this Chapter we developed the ILP formulation for modulo scheduling on a superscalar processor. This formulation assists register allocation by minimizing a schedule's register pressure, and takes into consideration memory system behavior by minimizing memory stalls due to simultaneous references to the same secondary cache bank.

Chapter 5

ILP Model for the MIPS R8000

In this chapter the ILP-based software pipeliner for the MIPS R8000 microprocessor is presented. In section 5.1, we discuss resource scheduling for the R8000 processor, in section 5.2 we describe the R8000 machine model, and in section 5.3 we present the design of the ILP software pipeliner.

5.1 Resource Scheduling on MIPS R8000

Resource conflicts arise when the hardware cannot support a given combination of instructions, because they might simultaneously require more units of some hardware resource (register write ports, for example) than are available in the machine. Instructions in the R8000 microprocessor are executed inside the five-stage pipeline. The following pipeline resources of the R8000 microprocessor must be considered during scheduling (see Chapter 3):

Resource    Notes

Instruction dispatch resources:
COMDISP     The superscalar issue/dispatch logic allows four instructions to be issued per cycle;
BRADISP     One branch instruction can be decoded per cycle;
INSTO       One integer store can be issued per cycle;

Integer execution and address generation resources:
ALUDISP     Two arithmetic logic units are available for performing integer computations;
ALUSHIF     One shifter is available;
ALUXILO     One high/low register pair for performing integer multiply and divide operations is available;
MEMDISP     Two address generators for address computations for integer and floating-point data are available;

Floating-point queue resources:
FPUDISP     No more than four floating-point instructions may be on the TBus at a time;
FPMDISP     Two floating-point memory reference instructions are allowed on the TBus at a time;
FPADISP     Two floating-point arithmetic instructions are allowed on the TBus at a time;

Floating-point execution resources:
- Two floating-point execution pipelines for performing floating-point multiply/add, divide, square root, and reciprocal operations are available;
- Two floating-point register file write ports are available;

Special resources:
- One special register used by a move instruction from the FPU to the CPU;
- One special register used by a move instruction from the CPU to the FPU;

Other hardware resources, although used, do not have to be scheduled, since scheduling one of the above-named resources guarantees conflict-free access for each instruction to the rest of the hardware. For example, the integer register file write port does not have to be scheduled because the ALU execution is fully pipelined and, therefore, once an instruction is issued, no further conflicts are possible (see Chapter 3).

Some hardware resources are not directly programmed by the user. Because loops containing instructions that manipulate these resources are not subject to software pipelining in the MIPSpro compiler, usage of such resources does not have to be scheduled. For example, there is the MiscBus internal to the R8000 CPU, which is used for data transfers in situations where dedicated buses are not available. Instructions which use the MiscBus include JAL, MFC0 and MTC0. This bus is not controlled by the scoreboard and there are certain restrictions that apply when using it. However, the MFC0 and MTC0 instructions are privileged instructions executed only with kernel permissions by the operating system software, which is not subject to software pipelining.

5.2 R8000 Machine Description

The machine model of the target architecture must specify:

1. A collection of the machine's schedulable resources, where the machine has {n_1, n_2, …, n_r} units of each resource. Each instruction type in the code follows its associated resource usage pattern during execution.

2. A collection of latencies associated with each instruction type. A program instruction requires an integral number of machine cycles to be executed, called the instruction's latency. Also, there are pipeline constraints imposed on the execution of instructions. Altogether, these define how many cycles are needed for the result of an instruction to be computed and become available for subsequent use by other instructions. Instruction latencies are modeled by integer delays assigned to the data dependence edges of the DDG. Although most of the instructions are executed in one cycle, there exist additional delays, such as:

- a delay of one cycle between an instruction and a memory reference that uses its result register, because of the pipeline structure;

- an additional delay of 6 cycles between an instruction and a floating-point instruction, that models the floating-point instruction queue;

- delays corresponding to the fully-pipelined execution with multi-cycle latency of such instructions as a move from the CPU to the FPU, floating-point multiply-add, and floating-point conditional move instructions;

- delays corresponding to non-pipelined execution patterns of such instructions as integer multiply/divide and some floating-point instructions;

- delays corresponding to the execution pattern with potential hazards (moves from the FPU to the CPU and select pseudo-operations);

- zero delays for the floating-point load instructions, because of the decoupled architecture of the R8000, and for stores, because they do not cause any delays.

Instructions in the MIPS IV instruction set [37] are classified into twenty-one types according to their resource usage patterns. Resource usage of each instruction type is modeled using reservation tables. Reservation tables for all instruction types are given in Appendix 7.2.
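A machine description in this style reduces to two tables: unit counts per schedulable resource and a reservation table per instruction type. The sketch below uses a tiny illustrative subset (the unit counts follow the table above, but the reservation rows are simplified single-cycle stand-ins, not the thesis's actual tables):

```python
# units available for each schedulable resource (subset of the table above)
UNITS = {"COMDISP": 4, "ALUDISP": 2, "MEMDISP": 2}

# instruction type -> {resource: busy bits, cycle by cycle}
RESERVATION = {
    "integer_add": {"COMDISP": [1], "ALUDISP": [1]},
    "load":        {"COMDISP": [1], "MEMDISP": [1]},
}

def fits_in_one_cycle(instr_types):
    """Can this group of instruction types be issued in the same cycle
    without oversubscribing any resource in its first cycle?"""
    demand = {}
    for i in instr_types:
        for res, row in RESERVATION[i].items():
            demand[res] = demand.get(res, 0) + row[0]
    return all(demand[r] <= UNITS[r] for r in demand)

print(fits_in_one_cycle(["load", "load", "integer_add"]))  # True
print(fits_in_one_cycle(["load", "load", "load"]))         # False: only two MEMDISP units
```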

5.3 Software Pipelining Algorithm

The ILP-based software pipeliner takes as input:

- the DDG of an innermost loop from the control flow graph of a program, produced by the MIPSpro compiler,

- the machine description that contains the instruction types, reservation tables for each instruction type, and the information about the number of available resources.

If successful, the software pipeliner produces a new loop body and all the necessary information for generating the loop's prologue and epilogue. The new loop is returned to the control flow graph of the program. If the software pipeliner fails, no changes are made to the control flow graph.

The flow-chart of the ILP-based software pipeliner is shown in Figure 5.1.


Figure 5.1: The Flow-Chart of the Software Pipeliner

1. First, the Ning-Gao linear programming (LP) formulation for buffer-optimal software pipelining without resource constraints is solved (see Appendix B.5). It is proven to always have a solution and to take low-degree polynomial time. If its solution obeys the resource constraints of the R8000 architecture, the schedule is accepted. More often, however, such a solution violates one or more resource constraints.

2. If the Ning-Gao solution fails, the integer linear programming formulation (ILP) with resource constraints but without an optimization criterion is attempted (see Appendix B.6). A solution to such a formulation is one of many possible legal schedules, not necessarily optimal. This formulation is solved faster than a formulation which minimizes buffers, because the search for a schedule stops after the first legal schedule is found. If a solution cannot be found, there is little hope that a solution to the more complex formulation will be obtained. Thus, the scheduling attempt starts from scratch with an increased value of the II. A solution to the ILP, on the other hand, is often not register allocatable, in which case the integer linear formulation with buffer minimization (ILPB) is attempted.

3. The ILPB formulation is more complicated than the two other formulations. It strives to find a schedule which uses the fewest buffers. If the optimal solution cannot be found (or proven to be optimal) in 3 minutes, the best schedule found so far is accepted. If a schedule that can be register allocated cannot be found, the II is incremented, and the scheduling attempt restarts from scratch.

Formulation using average cumulative lifetime:

When the formulation that uses average cumulative lifetime is being solved, the ILP in step 2 is augmented with the corresponding constraints (see Chapter 4), and step 3 is disabled. It is the subject of future work to try optimizing schedules obtained using such a formulation, in case they do not fit into the available number of registers.

Minimization of the pipeline depth:

For short trip count loops the ILP formulation aims at minimizing the depth of the software pipeline. In this case, after a solution to the ILP in step 2 is found, the ILP in step 3 that minimizes the pipeline depth of the most deeply pipelined instruction (ILPP) is applied (see Chapter 4).
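The three-stage driver described above can be summarized as follows; the four callables stand in for the real LP/ILP solvers and the register allocator, so this is only a control-flow sketch:

```python
def pipeline(min_ii, max_ii, try_lp, try_ilp, try_ilpb, allocatable):
    """Scheduling driver: for increasing II, try the Ning-Gao LP first,
    then the plain feasibility ILP, then the buffer-minimizing ILPB;
    move on to the next II only when no acceptable schedule is found."""
    for ii in range(min_ii, max_ii + 1):
        s = try_lp(ii)         # step 1: LP without resource constraints;
        if s is not None:      # accepted when it is also resource-feasible
            return ii, s
        s = try_ilp(ii)        # step 2: any legal resource-constrained schedule
        if s is None:
            continue           # no legal schedule at this II: increment II
        if allocatable(s):
            return ii, s
        s = try_ilpb(ii)       # step 3: minimize buffers (time-limited in practice)
        if s is not None and allocatable(s):
            return ii, s
    return None

print(pipeline(3, 6,
               try_lp=lambda ii: None,
               try_ilp=lambda ii: "legal" if ii >= 4 else None,
               try_ilpb=lambda ii: "tight" if ii >= 5 else "legal",
               allocatable=lambda s: s == "tight"))   # (5, 'tight')
```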

Summary

In this section the ILP software pipeliner for the MIPS R8000 processor was described. Multiple pipelined resources of this 4-way issue superscalar machine must be carefully scheduled in order to avoid processor stalls. The machine model of the R8000 processor consists of a collection of reservation tables that describe resource usage by all machine instructions, together with a collection of latencies associated with the execution of each machine instruction.

The ILP software pipeliner attempts to find a legal schedule using the Ning-Gao formulation. A solution to this formulation may not satisfy all the resource constraints of the R8000. If such is the case, a resource-constrained schedule is sought using the ILP formulation without considering register allocation issues. If a solution can be allocated registers, it is accepted. If the found solution has too much register pressure, the ILP formulation that minimizes buffers is used to produce a register allocatable schedule. The ILP software pipeliner was integrated and tested as a functional replacement of the MIPSpro software pipeliner.

Chapter 6

Experimental Results

In this chapter, the performance of the ILP pipeliner is analyzed. In section 6.1 we describe the experimental framework used for testing the ILP software pipeliner. The main results of the study are presented in sections 6.2 and 6.3. The significance of the scheduling order of operations for efficient ILP solving is discussed in section 6.4. Finally, the results of minimizing the loop overhead via reducing the software pipeline depth of a schedule are discussed in Section 6.5.

6.1 Experimental Framework

The ILP formulation for software pipelining was developed as an alternative to heuristic methods, in the hope that an integrated solution to software pipelining and register allocation would result in code of better quality. Unfortunately, using the ILP formulation of the integrated register allocation and scheduling problem [4] was too slow and unacceptably limited the size of loops that could be scheduled.[1]

To overcome this difficulty, two approaches were considered:

1. Simplifying the integrated formulation so that it contains fewer integer variables and fewer constraints, thus reducing the computational burden on the ILP solver. Thus, to guarantee register optimality, we substituted the buffer equations [35] for the coloring formulation from [4] (see Chapter 4).[2]

2. The number of different subproblems solved by the branch-and-bound algorithm is an important measure of the complexity of an ILP formulation. In our case this number was quite large. However, restructuring the formulation so that the amount of branching done by the solver is minimized would lead to faster solutions. The possibility of deriving a well-structured ILP formulation is discussed in Chapter 7, and its implementation is left for future work.

Thus, in order to generate register optimal software pipelined schedules, we used the simplified ILP formulations described in Chapter 4. Because we wanted to answer the question of whether optimizing schedules improves performance relative to heuristically generated schedules, comparison with one of the leading production compilers seemed reasonable. In order to measure its performance, the ILP software pipeliner was embedded in the Silicon Graphics MIPSpro compiler [44]. Thus, Silicon Graphics' software pipeliner served as a reference for evaluating the effectiveness of the ILP software pipeliner.

[1] In our experiments we used a 3 minute time limit for ILP solving, and our experience shows that there is not much to gain by increasing this time limit.
[2] E. Altman estimated that for 94% of the loops, buffer-optimal schedules correspond to register optimal schedules.

Unfortunately, there are some problems that could distort the outcome of the experiment by favoring one pipeliner over another.

1. One problem that could strongly favor the heuristic pipeliner is the fact that not every loop that can be scheduled by the MIPS compiler can also be scheduled by the ILP pipeliner within a reasonable time. This is a particular problem because the penalty for not pipelining can be very high. Because the simplified ILP formulation using Ning's buffers is very difficult to solve to optimality, the ILP software pipeliner failed to schedule 45 loops in the SPEC92 floating-point benchmark suite that were successfully scheduled by the SGI software pipeliner. This problem was addressed by using the SGI pipeliner as a backup for the ILP pipeliner: instead of falling back to the single block scheduler used when the MIPSpro compiler fails to schedule and register allocate a loop, the ILP pipeliner falls back to the MIPSpro pipeliner itself. This should only reveal the deficiencies in the MIPSpro compiler, and demonstrate how much improvement is gained by using the ILP pipeliner.

2. Another problem that could favor one of the two pipeliners is the random factor introduced by the memory system of the R8000 processor (see Chapter 3). Unbalanced memory references cause stalls, and thus two different schedules with the same II may have different dynamic performance. Because these memory effects account for up to 10% of the performance loss [44], their impact on the experiment had to be reduced in order to conduct meaningful measurements. This problem was addressed by introducing heuristics in both pipeliners that minimize the likelihood of memory accesses causing stalls. The integer linear programming constraints for reducing unbalanced memory accesses on the R8000 chip are described in Chapter 4.
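The bank-balance issue described in item 2 can be made concrete with a small check. The sketch below is not the ILP constraint set from Chapter 4; it only counts, for a hypothetical schedule (operation names and bank assignments are made up), pairs of memory operations that hit the same bank in the same steady-state cycle, i.e. whose issue cycles collide modulo II:

```python
from collections import defaultdict

# Count pairs of memory operations that access the same memory bank in
# the same steady-state cycle of a software pipelined loop: with
# initiation interval ii, operations issued at cycles t1 and t2 overlap
# in the kernel whenever t1 == t2 (mod ii).
def bank_conflicts(schedule, bank_of, ii):
    """schedule: op -> issue cycle; bank_of: op -> memory bank index."""
    by_slot = defaultdict(list)
    for op, t in schedule.items():
        by_slot[(bank_of[op], t % ii)].append(op)
    # every pair sharing a (bank, cycle mod ii) slot is a potential stall
    return sum(len(ops) * (len(ops) - 1) // 2 for ops in by_slot.values())
```

For example, two loads of the same bank issued at cycles 0 and 2 with II = 2 collide in the kernel, while moving one of them to an odd cycle removes the conflict.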

Figure 6.1: Relative performance of ILP over SGI

6.2 Highlights of Experimental Results

In our experiment the ILP pipeliner and the SGI pipeliner were used in turn for compiling the SPEC92 benchmark suite, consisting of 14 benchmark programs. The SGI pipeliner scheduled 798 loops in total in this benchmark suite and the ILP pipeliner scheduled 753 loops. We compared the execution time of programs scheduled by the ILP software pipeliner to the execution time of the same programs scheduled by the MIPSpro software pipeliner.

Figure 6.1 shows the relative improvement of ILP schedules over SGI schedules for each of the fourteen SPEC92 programs. The Y-axis measures how much faster the ILP scheduled code is compared to the SGI code in percentage points.

Our experiment revealed the following:

1. Memory system behavior is one of the major factors affecting the performance of programs on the R8000. The ILP formulation that minimizes random memory stalls proved to be very effective in generating high quality code.

2. Code scheduled by the ILP pipeliner is just a little slower than code scheduled by the SGI pipeliner because (1) the 3 minute time limit imposed on the duration of ILP solving sometimes prevents the ILP solver from finding a good solution, and (2) the R8000 memory system introduces a random factor that can affect the resulting performance.

3. The ILP pipeliner could not consistently produce schedules superior to the SGI pipeliner in terms of required registers, because the ILP solving time limit prevented many loops from being optimally scheduled.

4. One important result of our experiments is the discovery that the order in which the branch-and-bound tree is traversed is by far the most important factor affecting whether or not the ILP problem can be solved.

5. An ILP formulation for short trip count loops is able to successfully minimize the loop overhead. However, this did not directly translate into a performance gain for such loops. Other factors, such as cache and memory issues, have a greater impact on the performance of short trip count loops than the pipeline loop overhead.

6.3 Results and Analysis

In this section the results of benchmarking are analyzed. Performance effects of minimizing registers, minimizing memory stalls, and minimizing loop

pipeline overhead are discussed. The search order of the branch-and-bound algorithm is shown to be an important factor in solving the ILP formulation.

Table 6.1 summarizes the results of our experiments. We ran the 14 SPEC92 floating-point programs scheduled by the ILP software pipeliner with and without the memory optimization described in Chapter 4:

Benchmark | Execution time in seconds (memory optimization on) | Execution time in seconds (memory optimization off)

Table 6.1: Execution times of the SPECfp92 benchmarks with and without memory optimization (table data not legible in this copy)

Effects of the memory optimization are discussed later in this section.

As expected, the ILP software pipeliner is slow. Table 6.2 shows the number of loops scheduled in less than 1 second, 10 seconds, 12 minutes, and over 12 minutes out of 753 loops in the SPEC92 floating-point benchmark suite:

Under 1 sec. | Under 10 secs | Under 12 mins | Over 12 mins

Table 6.2: ILP time for scheduling loops in the SPECfp92 benchmark suite (table data not legible in this copy)

Roughly estimating, all loops scheduled by the ILP pipeliner in less than 12 minutes were scheduled optimally. Sometimes the ILP solver needed this much time because it tried four different priority orders in turn, each of which was given up to 3 minutes of time. It sometimes happened that only during the last priority order attempt was the optimal solution found³.

6.3.1 Memory Stalls

Figure 6.2 shows the effect of memory stall reduction for the ILP pipeliner. It depicts the relative performance of the ILP pipeliner with memory stall optimization enabled over the ILP pipeliner with memory stall optimization disabled.

The majority of benchmarks benefited from memory optimization. The average improvement due to minimizing memory stalls is 7 percent. Three programs (mdljdp2, alvinn, and mdljsp2) run significantly faster when scheduled with memory optimization; these programs are very sensitive to the memory

³This does not mean that all loops scheduled in over 12 minutes were scheduled suboptimally, but optimality in those cases is very unlikely.

Figure 6.2: Improvement in the ILP performance due to memory system optimization

system behavior. However, some of the programs suffered a slight loss in performance. Specifically, this performance loss was noticeable in two programs, swm256 and fpppp. Why? There are a couple of reasons for this. Additional memory stall constraints sometimes allow the ILP formulation to find a register allocatable schedule at a lower II than it would have found without these constraints. However, it could be that a schedule at a higher II generates fewer stalls than the one at a lower II, or it could be that without these additional constraints the scheduler spills some variables and, ironically, comes up with an even lower II on a spilled loop. These problems can be dealt with at the cost of additional search in the scheduling space, a search that is too expensive for the ILP pipeliner.

6.3.2 Performance Comparison of ILP vs SGI Pipeliner

Referring to Figure 6.1, the SGI pipeliner slightly outperforms the ILP pipeliner in the majority of benchmarks. How can this be? The design of the experiment should have prevented the ILP pipeliner from ever finding a worse schedule than could be found by MIPSpro.

The main reason for this is the time limit imposed on the ILP pipeliner's search for a schedule. Because of the exponential computational complexity of our ILP formulation, the ILP solver's search was restricted to the 3 minute time limit, after which it would accept the best suboptimal solution found. Heuristics were available as a fall-back for the ILP pipeliner only when it could not find any schedule. However, sometimes the ILP pipeliner could not find a schedule at a given II within the 3 minute time limit, but was successful at a higher II. At this point it did not fall back to the heuristic pipeliner and accepted the best schedule found. Thus, the heuristic pipeliner sometimes found schedules at better IIs than the ILP pipeliner.

For example, fifteen loops were scheduled by both pipeliners (all of the software pipelineable loops) in swm256. For two of them, the loops in line 273 and in line 324, the ILP pipeliner scheduled at a higher II than did the SGI pipeliner. These two loops have the greatest trip counts in the program and clearly dominate its execution time. As a result, the code scheduled by the heuristic pipeliner runs faster. The SGI pipeliner rejected a schedule at II = 6 for the loop in line 344, which appears to give an advantage to the ILP pipeliner, because of possible stalls due to the memory system (SGI's heuristic is better tuned than the ILP at this point to handle such subtleties). The IIs at which the loops in swm256 were scheduled by both software pipeliners, along with the trip counts of these loops, are shown in Table 6.3:

Software Pipelined Loop    Trip Count          ILP's II   SGI's II
swm256: line 228           256 * 256           11         11
swm256: line 228           256 * 256           3          3
swm256: line 235           256                 4          4
swm256: line 239           256                 4          4
swm256: line 246           257                 3          3
swm256: line 273           256 * 256 * 1200    14         12
swm256: line 286           256 * 1200          4          4
swm256: line 292           256 * 1200          4          4
swm256: line 324           256 * 256 * 1200    14         11
swm256: line 339           256 * 1200          6          6
swm256: line 344           256 * 1200          6          8
swm256: line 370           257 * 257           6          6
swm256: line 399           256 * 256 * 1199    8          8
swm256: line 412           256 * 1199          6          6
swm256: line 420           256 * 1199          6          6

Table 6.3: IIs in the swm256 Benchmark

Another reason why code scheduled by the SGI pipeliner runs faster is the random behavior of the memory system. The SGI memory heuristic performs some additional search looking for a schedule with, perhaps, a greater II but fewer possible memory stalls, so that the overall performance is improved. Such search is too expensive for the ILP pipeliner. On the other hand, although the random factor introduced by the memory system is considerably reduced, it may still favor either one of the pipeliners in different programs.

Investigation of the three benchmarks on which the ILP pipeliner performs better than the SGI pipeliner (spice2g6, tomcatv, and ear) showed that memory system behavior may be the reason for the ILP code being faster. For example, Table 6.4 shows static quality measures of the performance of both pipeliners in the ear benchmark:

Software Pipelined Loop     ILP II   SGI II   ILP reg.   SGI reg.
correlate.c: line 205       2        2        9          11
correlate.c: line 305       89       88       22         19
correlate.c: line 374       4        4        9          11
correlate.c: line 407       22       20       26         21
correlate.c: line 468       4        4        24         25
correlate.c: line 504       29       29       21         18
correlate.c: line 512       4        4        9          11
correlate.c: line 534       4        4        18         20
correlate.c: line 557       4        4        23         22
correlate.c: line 666       6        6        39         31
correlate.c: line 670       4        4        9          11
ear.c: line 138             7        7        23         17
ear.c: line 158             6        6        20         22
earfilters.c: line 74       2        2        4          4
earfilters.c: line 66       4        4        7          8
earfilters.c: line 95       4        4        9          11
earfilters.c: line 101      6        6        21         18
earfilters.c: line 139      6        6        30         36
earfilters.c: line 163      8        8        39         29
earfilters.c: line 172      5        5        14         17
earfilters.c: line 190      11       11       33         28
earfilters.c: line 209      2        2        4          4
earfilters.c: line 213      6        6        20         24
earfilters.c: line 230      8        8        24         25
fft.c: line 70              68       68       22         14
fft.c: line 99              20       20       16         17
fft.c: line 116             4        4        7          7
file.c: line 174            4        4        11         9
file.c: line 286            4        4        11         9
utilities.c: line 408       20       20       14         15

Table 6.4: Static Scheduling Quality for the Ear Benchmark

All loops were scheduled by both pipeliners at the same II, except for the loops in line 305 and in line 407 in the "correlate.c" program, which the SGI pipeliner scheduled at a better II. Because the performance of these programs is dominated by the loops with long trip counts, and because the ILP pipeliner neither scheduled any loops at a better II than the SGI compiler, nor has a clear advantage in the number of required registers, we conclude that the ILP just generated fewer stalls by chance. The major source of such stalls is the memory system; therefore, it must be the main reason for the ILP scheduled code running faster in this benchmark.

6.3.3 Minimizing Register Requirements

As mentioned earlier, minimizing register pressure in software pipelined loops is rather costly in terms of compile time. Moreover, as long as the schedule is register allocatable, the number of registers it uses should not really matter. After all, a couple of spills before the loop is entered and a couple of restores after the exit can not significantly slow down a program. In this respect, minimizing register usage should only be important for loops with high register pressure, where a schedule that uses some extra registers compared to the optimal one may lead to a register allocation failure and spills. Unfortunately, the exponential nature of the ILP formulation prevented the ILP pipeliner from finding optimal schedules for many interesting loops with high register pressure. Register usage in loops scheduled by both pipeliners in the mdljdp2 benchmark is shown in Table 6.5.

In 4 out of 20 loops, the ILP pipeliner failed to find and register allocate a schedule. In 6 loops, the ILP pipeliner produced schedules that require fewer registers than schedules produced by the SGI pipeliner, and schedules produced by the SGI pipeliner use fewer registers in 4 loops. Because neither

Software Pipelined Loop: mdljdp2, lines 779, 902, 1751, 2067, 2074, 2084, 2248, 2307, 2336, 2370, 4132, 4144, 4156, and 4168 (the II and register columns of this table are not legible in this copy).

Table 6.5: Static Measurements of the Schedule Quality for mdljdp2

of the two schedulers outperforms the other in terms of required registers, there is no clear evidence of performance being affected by this factor.

The ILP formulation including constraints on the average cumulative lifetime (see Chapter 4) aimed at reducing the compile time cost of finding register allocatable schedules. Huff [27] argued that the average cumulative lifetime of a schedule is a very good approximation of the loop's register pressure. Table 6.6 shows the compile time of 755 loops scheduled using the ILP formulation when constraints on the average cumulative lifetime were included:

Under 1 sec. | Under 10 secs | Under 12 mins | Over 12 mins

Table 6.6: ILP time for scheduling loops in the SPECfp92 benchmark suite (table data not legible in this copy)

Notice that 2 loops were scheduled where the ILP formulation with buffer minimization failed, and 35 loops were scheduled at a better II compared to the formulation that uses buffer minimization. Also, 464 loops were scheduled in less than 10 seconds, while only 241 loops were scheduled in less than 10 seconds using the ILP formulation with buffer minimization. 16 loops,

however, were scheduled at a higher II by the formulation that constrains the average cumulative lifetime. This shows that buffer minimization is important in some cases. Perhaps a hybrid pipeliner, using the ILP formulation with the average cumulative lifetime as a first step and the ILP formulation that minimizes buffers whenever the former fails to produce a register allocatable schedule, would have greater efficiency than either of the two separately.
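The hybrid scheme suggested above amounts to a two-step fallback. A minimal sketch, with the two scheduler functions and the allocation test passed in as parameters (all of them assumptions, not part of the thesis' implementation; both schedulers are assumed to return None on failure):

```python
# Hybrid pipelining driver: try the cheap average-cumulative-lifetime
# (ACL) formulation first; fall back to the slower buffer-minimization
# formulation only when the ACL schedule cannot be register allocated.
def hybrid_pipeline(loop, acl_scheduler, buffer_scheduler, allocatable):
    schedule = acl_scheduler(loop)
    if schedule is not None and allocatable(schedule):
        return schedule                  # fast path: ACL schedule fits
    return buffer_scheduler(loop)        # register-optimal fallback
```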

6.4 Branching Order

Surprisingly, for a given loop, one ordering may lead to the optimal solution in a very short time, while another may not find any solution at all. Why? The ILP pipeliner calls the CPLEX mixed integer linear solver⁴ to solve the ILP formulation. CPLEX is a very powerful and flexible tool, but it does not allow us to fully exploit the problem's structure. Making the integer solver aware of such structure increases the effectiveness of integer problem solving.

The SGI pipeliner uses multiple scheduling orders to facilitate register allocation [44]. The idea behind it is that at least one of the different scheduling orders should produce a schedule which fits into the available registers. Facilitating the search for register allocatable schedules was also the original motivation for trying, in turn, many different orders in which the ILP

⁴CPLEX is a trademark of CPLEX Optimization, Inc.

solver traverses the branch-and-bound tree. However, we soon discovered that traversing the branch-and-bound tree in a "good" order significantly reduces the time spent searching for the optimal solution.

The four orders that were used by the ILP pipeliner are⁵:

1. Folded depth-first ordering with the final memory sort - In the simple cases, this is just a depth first ordering starting with the roots (stores) of the calculation. However, when there are difficult to schedule operations (operations with complex resource usage patterns, such as the floating-point divide) or large strongly connected components, they are folded and become virtual roots. Then the depth first search proceeds outward from the fold points, backward to the leaves (loads) and forward to the roots (stores). Finally, stores with no successors and loads with no predecessors are pulled to the end of the list.

2. Data precedence graph heights with the final memory sort - The operations are ordered in terms of the maximum sum of the latencies along any path to a root (store). This traversal order corresponds to scheduling in the topological order of the data dependence graph. Finally, stores with no successors and loads with no predecessors are pulled to the end of the list.

3. Reversed heights with the final memory sort - The data precedence list may be reversed. This ordering corresponds to scheduling backward in the topological order of the data dependence graph. Finally, stores with no successors and loads with no predecessors are pulled to the end of the list. In the SGI compiler, it was useful for loops other than those in the SPEC92 floating-point benchmark.

⁵These orders are built by the SGI compiler.

4. Folded depth-first ordering without the final memory sort.
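Order 2 above (data precedence graph heights) is straightforward to sketch: an operation's height is the maximum sum of latencies along any path to a root, and operations are tried in decreasing height. A minimal version over a toy dependence graph (the operation names and latencies are illustrative, not taken from the R8000 model):

```python
# Compute data precedence graph heights: height(op) = latency(op) plus
# the maximum height of any successor, i.e. the longest latency path
# from op down to a root (store) of the calculation.
def heights(succs, latency):
    """succs: op -> list of successor ops; latency: op -> cycles."""
    memo = {}
    def h(op):
        if op not in memo:
            memo[op] = latency[op] + max((h(s) for s in succs[op]), default=0)
        return memo[op]
    return {op: h(op) for op in succs}

def priority_order(succs, latency):
    # Try the highest operations first: this is a topological order of
    # the dependence graph, weighted by path latency.
    hs = heights(succs, latency)
    return sorted(succs, key=lambda op: -hs[op])
```

For a chain load -> mul -> store with latencies 2, 3, 1, the heights come out as 6, 4, 1, so the load is placed first, matching scheduling in topological order.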

Table 6.7 shows how many loops out of the total 753 loops were scheduled by the ILP pipeliner, and how many of the resulting schedules were successfully allocated registers, using each of the four branching orders:

FDFO w/memory sort | Data precedence heights | Reversed heights | FDFO

Table 6.7: Number of loops scheduled by each searching order out of the total 753 loops (table data not legible in this copy)

No branching order was able to successfully schedule all the loops. The most efficient was the folded depth-first search order. Because of the "folding", placement of the difficult to schedule operations is attempted first, and this allows the ILP solver to detect earlier in the branch-and-bound tree that such operations can not be scheduled and not to spend much time exploring their subtrees. Final sorting of the loads and stores, which are easy to schedule in the R8000 architecture, is another way of trying to place operations that are more difficult to schedule ahead in the scheduling order. However, when this is done, the effectiveness of searching in the scheduling solution space is significantly reduced. This shows that scheduling operations out of their topological order in the data dependence graph causes the ILP solver to explore parts of the branch-and-bound tree that do not contain any solution.

A more interesting discovery was that different search orders work well with different loop bodies. No single search order works best with all loop bodies. However, folded depth-first ordering works better than others for a vast majority of loops. Thus, scheduling in the backward topological order of the data

dependence graph and giving proper consideration to the properties of the target architecture, such as placing the most difficult to schedule operations before the others, noticeably improved the effectiveness of the ILP solution.

These results show that further improvements in ILP solving have to come from exploiting the problem properties, such as relations between different operations in the data dependence graph and characteristics of the target architecture.

6.5 Short Trip Count Performance

The Livermore Loops benchmark was used to measure the impact of minimizing the depth of the software pipeline on the performance of short trip count loops (see Chapter 4). This benchmark is particularly well suited for this measurement. It measures the performance on each of 24 floating point kernels for short, medium, and long trip counts. Figure 6.3 shows the relative performance of the ILP over the SGI pipeliner on 18 out of 24 kernels⁶:

These results show better performance for the SGI scheduler in nearly all cases. But as we have just seen, these results can be distorted by the effects of the machine's memory system. We would like a way to make a more direct comparison.

Table 6.8 gives some static performance information about the individual loops in the benchmark. Relative performance of the two pipeliners is given in terms of:

1. initiation intervals for each loop,

⁶These were software pipelined by both pipeliners.

Figure 6.3: Relative performance of the ILP over SGI on Livermore Loops

2. register usage, measured in the total number of both floating point and integer registers used,

3. depth of the resulting software pipeline, and

4. overall pipeline overhead, measured in cycles required to enter and exit the loop.

The length of the initiation interval and the overall pipeline overhead directly affect performance, while register usage is important only insofar as it impacts pipeline overhead. However, this chart shows that:

Loop               II      Reg.     Pipeline Depth   Overhead
Kernel 1 (218)     4/4     37/28    3/5              27/30
Kernel 2 (239)     6/6     33/26    3/4              39/43
Kernel 9 (354)     7/6     39/36    4/7              59/74
Kernel 10 (368)    10/10   38/33    3/4              67/71
Kernel 11 (400)    8/8     33/25    4/5              61/69
Kernel 12 (413)    4/4     22/22    1/2              7/12
Kernel 14 (455)    6/6     22/23    3/4              38/43
Kernel 14 (464)    8/8     40/43    6/6              97/91

Table 6.8: Static performance on the Livermore Loops (values given as ILP/SGI; only part of the original table is legible in this copy)

1. Minimizing software pipeline depth directly translates into a reduced loop overhead penalty, and the ILP pipeliner consistently produces better schedules in terms of overhead cycles. This proves that the number of overhead cycles is directly affected by the pipeline depth.

2. There is no clear correlation between register usage and overhead. For 13 out of 26 loops, the schedule with smaller overhead did not use fewer registers. This proves that the number of registers used is not the most important parameter for optimizing short trip count loops (saving and restoring registers used by the steady state is only one of

the things that need to be done in the prologue and epilogue, and the R8000 is able to save or restore two registers per clock cycle).

3. Although minimizing software pipeline depth reduces the resulting modulo expansion, it does not seem to affect register requirements. Not one loop was scheduled by the SGI compiler that is less deeply pipelined than the loop scheduled by the ILP pipeliner, and yet the SGI pipeliner uses fewer registers in 10 out of 26 loops. This proves that modulo expansion is not a very significant factor in causing high register pressure.

4. Reducing loop overhead does not seem to translate directly into a performance gain. This shows that there are more important parameters than loop overhead affecting short trip count loop performance. Some of them are cache and memory issues. This is not as surprising as it may seem at first. One secondary cache miss can have much heavier consequences for a short trip count loop than it would for a long trip count loop, because the miss penalty is not amortized over many iterations by the R8000 integer floating-point decoupling mechanism. Future research on improving the performance of short trip count loops needs to focus on cache and memory issues as well as on the overall loop overhead.

Summary

In this Chapter the results of the experimental testing of the ILP software pipeliner for the MIPS R8000 processor were presented. Its performance was compared to the performance of the MIPSpro heuristic pipeliner. As expected, the ILP pipeliner is much slower than the heuristic-based one. This was especially important for large loop bodies, where this factor limited the

ILP pipeliner's functionality. Because heuristic techniques can handle modest-sized loops near-optimally, very rarely did the optimal technique schedule and allocate a loop at a lower II than the heuristic. However, there is significant room for improvement in scheduling loops with large bodies and high register pressure. In order to achieve such improvements, the efficiency of the ILP solution must be increased. Our experiments demonstrated the importance for ILP solving of the order in which the solution space is searched. This proves that exploiting problem structure is essential for improving the ILP results.

Our experiments also showed that issues related to the memory system organisation can not be ignored. By using the ILP formulation for minimizing processor stalls due to simultaneous accesses to the same memory bank, an average performance improvement of 7% was obtained for the programs in the SPEC92 floating-point benchmark suite. Moreover, the performance of short trip count loops was affected much more significantly by cache and memory system issues than by the overall loop's pipeline overhead.

Future work in this direction will include improving the efficiency of the branch-and-bound techniques via exploitation of the problem's structure, and exploring the possibilities of including optimizations related to the memory system in the existing modulo scheduling framework.

Chapter 7

Conclusions and Future Work

7.1 Summary

Scheduling operations for instruction-level parallelism allows the creation of faster, more efficient code sequences. In loops, the most parallelism is achieved by simultaneously executing operations from different iterations of the loop. One method for generating such code is software pipelining. Modulo scheduling is one of the possible approaches to software pipelining. In modulo scheduling a new loop body is constructed by packing loop instructions into a time window of the size of the interval that separates the initiation of successive loop iterations. The size of the window, or initiation interval, is known before the scheduling starts. Finding the best software pipelined schedules under limited resources is NP-hard, and heuristics have been developed that attempt to overcome this difficulty. When a schedule produced by a heuristic for a given initiation interval can not be accepted (can not be allocated registers, for example), the initiation interval has to be increased in order to find an acceptable schedule. Thus, the schedule's quality degrades, even though better

schedules might exist. The integer linear programming framework offers an optimal solution to the software pipelining problem. This thesis deals with the design of an ILP-based software pipeliner, and, in particular, with the design and implementation of the software pipeliner for the MIPS R8000 superscalar microprocessor.
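The modulo scheduling window described in the summary above implies the usual modulo reservation rule: an operation issued at cycle t occupies its resource in slot t mod II, so two operations needing the same resource must not collide modulo II. A minimal sketch of that check, assuming single-slot resources and made-up operation names:

```python
# Check a candidate schedule against a modulo reservation table: with
# initiation interval ii, an operation issued at cycle t uses its
# resource in slot t % ii, and each (resource, slot) pair may only be
# occupied once across the whole loop body.
def modulo_reservation_ok(schedule, resource_of, ii):
    """schedule: op -> issue cycle; resource_of: op -> resource name."""
    table = {}
    for op, t in schedule.items():
        slot = (resource_of[op], t % ii)
        if slot in table:
            return False      # two operations collide on a resource mod ii
        table[slot] = op
    return True
```

For instance, a multiply at cycle 1 and an add at cycle 3 on the same functional unit collide at II = 2 (both map to slot 1) but are legal at II = 3.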

We developed a complete ILP model of the R8000 processor. This machine has multiple execution pipelines that share certain stages, and an in-order superscalar dispatching mechanism that issues instructions as soon as their operands are ready and there are sufficient resources for their execution. We also developed the ILP formulation that optimizes code for better memory system performance. In order to evaluate its performance, the ILP software pipeliner was embedded in the Silicon Graphics' MIPSpro compiler. Embedding the ILP software pipeliner in the MIPSpro compiler was a perfect opportunity to answer many questions: how well would the ILP work when targeted to a real processor? Were there any unexpected problems standing in the way of a full implementation, one that would generate runnable code? How would it compare with a more specialized heuristic implementation? It was bound to be slower, but how much better would its results be? Because heuristic approaches can have near-linear running time, they would certainly be able to handle larger loops. How much larger? The bulk of this job was conducted at Silicon Graphics during the summer of 1995.

What did we discover? So long as a software pipelined loop actually fits in a machine's registers, the number of registers used by the loop's steady state is not the most important parameter to optimize. We discovered that requiring strict register optimality is often unnecessary, and other factors, e.g. memory system behavior, may have a more significant performance impact. In fact, minimizing memory stalls that result from multiple simultaneous references to the

same memory bank gave an average performance improvement of 7 percent, and up to 43 percent for one of the benchmarks that we used. Nevertheless, certain software pipelined loops reach high register pressure, and finding the optimal schedule in terms of the required registers is important as it allows us to avoid spilling in those loops. However, the exponential nature of integer linear programming prevented the ILP pipeliner from scheduling many interesting and important loops optimally in an acceptable amount of time. Some loops could not be scheduled at all because of their size. In order to produce results in such cases, the ILP pipeliner had to use a heuristic fallback. Altogether, the ILP pipeliner scheduled 753 out of 798 loops in the SPEC92 floating-point benchmark suite. 9 loops were scheduled by the ILP pipeliner at a lower II than by the SGI pipeliner in the course of this study, and in those cases some increase in the backtracking limits of the heuristic method equalized the situation.

7.2 Future Work

Future work in this direction should focus on reducing the ILP solution time. Careful investigation showed that exploitation of the problem structure must be the foundation of such an improvement.

The largest loop solved to optimality in our experiments contains only 14 instructions. The ILP formulation of resource-constrained scheduling (which does not address register allocation) can be solved considerably faster¹. However, as our results indicate, addressing the register allocation problem is important. Can an ILP formulation of software pipelining that is amenable to fast optimal

¹As a matter of fact, for the majority of loops the time to solve this formulation is comparable with the time of the heuristic software pipeliner.

solution be derived?

The ILP framework has been effective in solving other computationally difficult problems [32, 31, 38, 7, 22]. In these problems, formulating a "good" model is of crucial importance to solving that model [33]. Therefore, analyzing the structure of scheduling constraints and developing a well-structured formulation can serve as a solid theoretical foundation for future improvement in solution efficiency. The key observation is that 0-1 ILP problems can be solved much more efficiently by traditional branch-and-bound methods than ILP problems with arbitrary integer variables. Our formulation is not a 0-1 integer linear formulation. It has been shown that the cyclic scheduling problem can be formulated as a 0-1 integer linear problem [10].

A 0-1 ILP formulation is written as:

    minimize c·x subject to x ∈ P_I, where P_I = {x : Ax ≤ b, x ∈ [0,1]^n, x integer}

where c is an (n × 1) real vector, b is an (m × 1) vector of integers, and A is an (m × n) matrix of integers.

A legal schedule Q may be expressed as a 0-1 vector x^Q = (x^Q_{1,0}, ..., x^Q_{n,II-1}):

    x^Q_{i,t} = 1 if operation x_i is issued at time t, and 0 otherwise.

Then the precedence and resource scheduling constraints can be formulated in terms of the x^Q_{i,t} variables.
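As a toy sketch of this encoding (hypothetical instance and helper names, not the thesis implementation), a schedule can be packed into 0-1 variables and a precedence constraint checked over them:

```python
# Sketch: a schedule as 0-1 variables x[i][t] (toy instance; names are illustrative).
# x[i][t] == 1 iff operation i is issued at time t; each row has exactly one 1.

def encode(schedule, horizon):
    """Turn a dict {op: issue_time} into rows of 0-1 variables."""
    x = {i: [0] * horizon for i in schedule}
    for i, t in schedule.items():
        x[i][t] = 1
    return x

def issue_time(x, i):
    """Recover t_i from the 0-1 row (exactly one entry is 1)."""
    return x[i].index(1)

def precedence_ok(x, i, j, latency):
    """Precedence constraint t_j - t_i >= latency, stated over the 0-1 variables."""
    return issue_time(x, j) - issue_time(x, i) >= latency

# Toy chain: op 1 depends on op 0 (latency 2); op 2 depends on op 1 (latency 3).
x = encode({0: 0, 1: 2, 2: 5}, horizon=8)
assert all(sum(row) == 1 for row in x.values())   # each op issued exactly once
assert precedence_ok(x, 0, 1, 2) and precedence_ok(x, 1, 2, 3)
```

The "issued exactly once" check corresponds to the assignment constraint that each operation's row of 0-1 variables sums to one.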

Such problems are solved using branch-and-bound techniques, where a linear relaxation of the ILP problem is solved first:

    minimize c·x subject to x ∈ P_F, where P_F = {x : Ax ≤ b, x ∈ [0,1]^n}

Because x is no longer required to be integer, P_I ⊆ P_F. Additional constraints are added at each step of the branch-and-bound, producing a new relaxation, in order to gradually reduce P_F to P_I and guarantee the integrality of the solution. If P_I = P_F, the linear relaxation is called tight. Obviously, the tighter the initial relaxation is, the less work is performed by the branch-and-bound, and the faster the problem is solved. Thus, obtaining a tight formulation from the beginning is of interest. It has also been shown [10] that the linear relaxations of the precedence constraints alone, as well as of the resource constraints alone, are tight. Their union, however, does not produce a tight relaxation. The authors of [10] reported that they were able to optimally schedule a benchmark with over 40 instructions reasonably fast. The above-mentioned formulation did not address the register allocation problem. Conceivably, it can be extended to do so, resulting perhaps in an efficient scheduler able to optimally handle bigger loops than the ILP pipeliner handles so far.
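The branch-and-bound mechanics can be sketched on a toy 0-1 maximization problem (a knapsack-style instance, not the scheduling formulation itself); a fractional relaxation plays the role of P_F, and the prune test shows why a tighter bound means less branching. The instance and all names are illustrative:

```python
# Sketch of branch-and-bound over 0-1 variables, using a fractional relaxation
# as the bound (toy knapsack-style maximization, not the scheduling model).

def relaxation_bound(values, weights, cap, fixed):
    """Upper bound: the not-yet-fixed variables may take fractional values in [0, 1]."""
    total_v = sum(v for k, v in enumerate(values) if fixed.get(k) == 1)
    total_w = sum(w for k, w in enumerate(weights) if fixed.get(k) == 1)
    if total_w > cap:
        return float("-inf")                      # infeasible partial assignment
    free = [k for k in range(len(values)) if k not in fixed]
    free.sort(key=lambda k: values[k] / weights[k], reverse=True)
    bound, room = total_v, cap - total_w
    for k in free:
        take = min(1.0, room / weights[k])        # fractional values allowed here
        bound += take * values[k]
        room -= take * weights[k]
        if room <= 0:
            break
    return bound

def branch_and_bound(values, weights, cap):
    best = 0
    stack = [{}]                                  # partial 0-1 assignments
    while stack:
        fixed = stack.pop()
        if relaxation_bound(values, weights, cap, fixed) <= best:
            continue                              # prune: relaxation can't beat incumbent
        if len(fixed) == len(values):
            best = sum(values[k] for k, v in fixed.items() if v == 1)
            continue
        k = len(fixed)                            # branch on the next 0-1 variable
        for v in (0, 1):
            stack.append({**fixed, k: v})
    return best

print(branch_and_bound([60, 100, 120], [10, 20, 30], 50))  # prints 220
```

The prune test `bound <= best` is exactly where a tighter relaxation pays off: the closer the fractional bound is to the best integer value, the more subtrees are cut before they are expanded.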

The integer linear programming framework is very malleable and easy to use, and can be employed to improve different aspects of the schedules. The lower bound on the initiation interval of a schedule has always given us an idea of how much room there was for performance improvement. On the other hand, this does not necessarily apply to software pipelined loops when the trip count is short. In this thesis, we showed that an ILP formulation can be developed that optimizes the loop overhead more directly than by minimizing register usage. However, performance of short trip count loops is affected by other important factors, such as cache and memory system behavior. Issues related to the performance improvement of short trip count loops are

currently being investigated. Short trip count performance becomes especially important in light of loop nest transformations such as tiling.

There is also room for improvement in the usage of the machine's memory system. Perhaps an ILP formulation can be made that minimizes processor stalls due to cache misses and simultaneous memory bank accesses. Cache issues, for example, can no longer be ignored because of the relatively high cache miss penalties and, therefore, the significant performance improvement opportunities that optimal cache utilization may lead to. A good solution to the problem of optimizing the memory system could have interesting implications for the design of such systems.

Appendix A: Reservation Tables

The machine model of the MIPS R8000 processor includes a set of reservation tables that define instruction resource usage patterns. The reservation table descriptions below consist of:

- the instruction type with the corresponding resource usage,

- a table of resources required for a given instruction type, along with the time in clock cycles (from the start of an instruction of this type) at which it uses a particular resource, and the number of required units of that resource.

For example, the single-precision integer multiply instruction uses one unit of the COM-DISP resource, the issue stage of the pipeline (see Chapter 5), at clock cycle 0 after it has been issued; the ALU-DISP resource, the arithmetic logic unit, at clock cycle 0; and the ALU-HILO resource, the high/low pair of registers, for four consecutive cycles, from cycle 0 to cycle 3.
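As an illustrative sketch (not the actual compiler data structures; the availability counts and names below are assumptions), a reservation table can be stored as a per-resource vector of units used at each cycle offset, and a modulo check at a candidate II sums every instruction's pattern into II slots:

```python
# Sketch: reservation tables as {resource: [units used at each cycle offset]}.
# Unit counts in AVAILABLE are assumed for illustration, not taken from the manual.

RESERVATION = {
    "int_multiply": {"COM_DISP": [1, 0, 0, 0],
                     "ALU_DISP": [1, 0, 0, 0],
                     "ALU_HILO": [1, 1, 1, 1]},   # HI/LO pair busy cycles 0..3
    "alu":          {"COM_DISP": [1],
                     "ALU_DISP": [1]},
}

AVAILABLE = {"COM_DISP": 4, "ALU_DISP": 2, "ALU_HILO": 1}  # assumed unit counts

def modulo_conflict_free(issues, ii):
    """issues: list of (instruction_type, issue_time). Sum per-resource demand
    into II modulo slots and compare against the available units."""
    demand = {r: [0] * ii for r in AVAILABLE}
    for kind, t in issues:
        for res, pattern in RESERVATION[kind].items():
            for offset, units in enumerate(pattern):
                demand[res][(t + offset) % ii] += units
    return all(d <= AVAILABLE[r] for r in demand for d in demand[r])

# The multiply holds ALU_HILO for 4 cycles, so at II=2 it collides with its
# own next iteration; at II=4 it fits, even alongside an ALU op.
assert not modulo_conflict_free([("int_multiply", 0)], ii=2)
assert modulo_conflict_free([("int_multiply", 0), ("alu", 1)], ii=4)
```

This is the same modulo-reservation reasoning the scheduler's resource constraints encode: total demand on each resource in each of the II slots must not exceed the units available.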

Integer load type:

COM-DISP 1
MEM-DISP 1

Integer merge type:

COM-DISP 1
MEM-DISP 1
IE-STORE 1

Integer store type:

COM-DISP 1
MEM-DISP 1
FPU-DISP 4
IE-STORE 1

Move to/from HiLo registers type:

COM-DISP 1
ALU-DISP 1
ALU-HILO 1

ALU type:

COM-DISP 1
ALU-DISP 1

Shift type:

COM-DISP 1
ALU-DISP 1
ALU-SHIF 1

Single-precision integer multiply type:

COM-DISP 1 0 0 0
ALU-DISP 1 0 0 0
ALU-HILO 1 1 1 1

Double-precision integer multiply type:

COM-DISP 1 0 0 0 0 0
ALU-DISP 1 0 0 0 0 0
ALU-HILO 1 1 1 1 1 1

Signed integer divide type:

COM-DISP 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ALU-DISP 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ALU-HILO 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Unsigned integer divide type:

COM-DISP 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ALU-DISP 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ALU-HILO 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Branch type:

COM-DISP 1 0
ALU-DISP 1 0
BRA-DISP 1 1

FPU memory type:

COM-DISP 1 0 0 0 0 0 0 0
MEM-DISP 1 0 0 0 0 0 0 0
FPU-DISP 1 0 0 0 0 0 0 0
FPM-DISP 0 0 0 0 0 0 0 1

Move to FPU type:

COM-DISP 1 0 0 0 0 0 0 0
FPU-DISP 4 0 0 0 0 0 0 0
IE-STORE 1 0 0 0 0 0 0 0
FPM-DISP 0 0 0 0 0 0 0 1

Move from FPU type:

COM-DISP 1 0 0 0 0 0 0 0 0 0 0 0
FPU-DISP 1 0 0 0 0 0 0 0 0 0 0 0
ISSUE-03 0 0 0 0 0 0 0 1 0 0 1 0
ISSUE-24 0 0 0 0 0 0 0 1 0 1 0 1
FPM-DISP 0 0 0 0 0 0 0 1 0 0 0 0

FPU 1 cycle type:

COM-DISP 1 0 0 0 0 0 0 0 0 0 0
FPU-DISP 1 0 0 0 0 0 0 0 0 0 0
FPA-DISP 0 0 0 0 0 0 0 1 0 0 0
FPU-WREG 0 0 0 0 0 0 0 0 0 0 1

FPU conditional move type:

COM-DISP 1 0 0 0 0 0 0 0 0 0 0
ALU-DISP 1 0 0 0 0 0 0 0 0 0 0
FPU-DISP 1 0 0 0 0 0 0 0 0 0 0
FPA-DISP 0 0 0 0 0 0 0 1 0 0 0
FPU-WREG 0 0 0 0 0 0 0 0 0 0 1

FPU compare type:

COM-DISP 1 0 0 0 0 0 0 0 0 0
ALU-DISP 1 0 0 0 0 0 0 0 0 0
FPU-DISP 1 0 0 0 0 0 0 0 0 0
FPA-DISP 0 0 0 0 0 0 0 1 0 0
FPU-BREG 0 0 0 0 0 0 0 0 0 1

FPU multicycle type:

COM-DISP 1 0 0 0 0 0 0 0 0 0 0 0 0 0
FPU-DISP 1 0 0 0 0 0 0 0 0 0 0 0 0 0
FPA-DISP 0 0 0 0 0 0 0 1 0 0 0 0 0 0
FPU-BREG 0 0 0 0 0 0 0 0 0 1 0 0 0 0
FPU-WREG 0 0 0 0 0 0 0 0 0 0 0 0 0 1

FPU single-precision type:

COM-DISP 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
FPU-DISP 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
FPA-DISP 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
FPU-BREG 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0
FPU-WREG 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

FPU double-precision type:

COM-DISP 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
FPU-DISP 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
FPA-DISP 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
FPU-BREG 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
FPU-WREG 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

Appendix B

The ILP software pipeliner uses a number of ILP formulations for modulo scheduling.

B.1 ILP Formulation With Buffers

minimize Σ_{x_i : ∃(i,j) ∈ E′} b_i

subject to

1. Precedence constraints:

2. Resource constraints:

R_s ≥ Σ_{X ∈ ISA} Σ_{x_i ∈ X} Σ_l a_{((t−l) mod II), i} · CRT_X[l], ∀t ∈ [0, II−1], ∀ scheduled resource s

3. Buffer constraints:

II·b_i + t_i − t_j ≥ II·(Ω_ij + 1) − 1, ∀(i,j) ∈ E′

t_i ≥ 0 are real; k_i ≥ 0, 0 ≤ a_{t,i} ≤ 1, b_i ≥ 0 are integers

B.2 ILP Formulation With Bounded Lifetimes

find a legal schedule subject to

1. Precedence constraints:

2. Resource constraints:

R_s ≥ Σ_{X ∈ ISA} Σ_{x_i ∈ X} Σ_l a_{((t−l) mod II), i} · CRT_X[l], ∀t ∈ [0, II−1], ∀ scheduled resource s

3. Average cumulative lifetime:

MinAvg ≤ Number of registers available for allocation

MinAvg = (1/II) · Σ L_i, over all x_i such that ∃(i,j) ∈ E′

L_i ≥ t_j − t_i + II·Ω_ij, ∀(i,j) ∈ E′

t_i ≥ 0 are real; k_i ≥ 0, 0 ≤ a_{t,i} ≤ 1, L_i ≥ 0 are integers
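A hypothetical sketch of how these bounded-lifetime quantities can be evaluated for a given schedule (instance and names assumed, not the thesis implementation): each L_i is the largest t_j − t_i + II·Ω_ij over the flow edges leaving x_i, and MinAvg averages the lifetimes over one initiation interval:

```python
# Sketch: cumulative register-lifetime bound from a schedule (toy instance).
# L_i >= t_j - t_i + II * omega_ij for every flow edge (i, j);
# MinAvg = sum(L_i) / II averages the lifetimes over one initiation interval.

def lifetimes(t, edges, ii):
    """t: {op: issue_time}; edges: list of (i, j, omega) flow dependences."""
    L = {}
    for i, j, omega in edges:
        L[i] = max(L.get(i, 0), t[j] - t[i] + ii * omega)  # L_i >= 0 enforced by default
    return L

def min_avg(L, ii):
    return sum(L.values()) / ii

t = {0: 0, 1: 2, 2: 3}
edges = [(0, 1, 0), (1, 2, 0), (2, 0, 1)]   # (2, 0) is carried across one iteration
L = lifetimes(t, edges, ii=4)
print(L, min_avg(L, ii=4))
```

The loop-carried edge contributes II·Ω to the lifetime because the value must survive until its consumer issues Ω iterations later.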

B.3 ILP Formulation For Short Loops

subject to

1. Min-max constraint:

2. Precedence constraints:

3. Resource constraints:

R_s ≥ Σ_{X ∈ ISA} Σ_{x_i ∈ X} Σ_l a_{((t−l) mod II), i} · CRT_X[l], ∀t ∈ [0, II−1], ∀ scheduled resource s

t_i ≥ 0 are real; k_i ≥ 0, 0 ≤ a_{t,i} ≤ 1 are integers

B.4 ILP Formulation With Buffers and Memory Constraints

minimize Σ_{x_i : ∃(i,j) ∈ E′} b_i

subject to

1. Precedence constraints:

2. Resource constraints:

R_s ≥ Σ_{X ∈ ISA} Σ_{x_i ∈ X} Σ_l a_{((t−l) mod II), i} · CRT_X[l], ∀t ∈ [0, II−1], ∀ scheduled resource s

3. Buffer constraints:

4. Memory constraints:

a_{t,i} + a_{t,j} ≤ 1, ∀x_i without a memory pair, ∀x_j ∈ M

t_i ≥ 0 are real; k_i ≥ 0, 0 ≤ a_{t,i} ≤ 1, b_i ≥ 0, m_i ≥ 0 are integers

B.5 Ning-Gao LP Formulation

minimize Σ b_i

subject to

1. Precedence constraints:

2. Buffer constraints:

t_i ≥ 0 are real; k_i ≥ 0, 0 ≤ a_{t,i} ≤ 1, b_i ≥ 0 are integers

B.6 Resource-constrained ILP Formulation

find a legal schedule

subject to

1. Precedence constraints:

2. Resource constraints:

t_i ≥ 0 are real; k_i ≥ 0, 0 ≤ a_{t,i} ≤ 1 are integers

Bibliography

[l] Alexander Aiken and Alexandru Nicolau. Optimal loop parallelization. In Conference on Programming Language Design and Implementation, pages 308-317, Atlanta, GA, June 1988.

[2] Alexander Aiken and Alexandru Nicolau. A realistic resource-constrained software pipelining algorithm. Advances in Languages and Compilers for Parallel Processing, pages 274 - 290, 1991.

[3] Alexander Aiken and Alexandru Nicolau. Resource-constrained software pipelining. IEEE Transactions on Parallel and Distributed Systems, 6:1248 - 1270, December 1995.

[4] Erik R. Altman. Optimal Software Pipelining with Function Unit and Register Constraints. PhD thesis, McGill University, Montreal, Quebec, 1995.

[5] Erik R. Altman, R. Govindarajan, and Guang R. Gao. Scheduling and mapping: Software pipelining in presence of structural hazards. In Conference on Programming Language Design and Implementation, pages 139 - 150, La Jolla, CA, June 1995. ACM SIGPLAN.

[6] Gary R. Beck, David W. L. Yen, and Thomas L. Anderson. The cydra 5 mini-supercomputer: Architecture and implementation. The Journal of Supercomputing (Special Issue on Instruction-Level Parallelism), 7(1/2):143 - 180, 1993.

[7] R. Bixby, Ken Kennedy, and Uwe Kremer. Automatic data layout using 0-1 integer linear programming. In Conference on Parallel Architectures and Compilation Techniques, pages 111 - 122, August 1994.

[8] C. C. Foster and E. M. Riseman. Percolation of code to enhance parallel dispatching and distribution. IEEE Transactions on Computers, C-21:1411 - 1415, December 1972.

[9] A. E. Charlesworth. An approach to scientific array processing: The architectural design of the ap-120b/fps 164 family. Computer, 14(9):18 - 27, September 1981.

[10] Samit Chaudhuri, Robert A. Walker, and John E. Mitchell. Analyzing and exploiting the structure of the constraints in the ILP approach to the scheduling problem. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2(4), December 1994.

[11] E. G. Coffman. Computer and Job-Shop Scheduling Theory. John Wiley & Sons, New York, 1976.

[12] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, and Mark N. Wegman. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4):451 - 490, October 1991.

[13] James C. Dehnert and Ross A. Towle. Compiling for cydra 5. The Journal of Supercomputing (Special Issue on Instruction-Level Parallelism), 7(1/2), July 1993.

[14] K. Ebcioglu and T. Nakatani. A new compilation technique for parallelizing loops with unpredictable branches on a vliw architecture. Languages and Compilers for Parallel Computing, pages 213 - 229, 1989.

[15] Kemal Ebcioglu. A compilation technique for software pipelining of loops with conditional jumps. In 20th Annual Workshop on Microprogramming, pages 69 - 79, Colorado Springs, Colorado, December 1987.

[16] Alexandre E. Eichenberger, David S. Davidson, and Santosh G. Abraham. Optimum modulo schedules for minimum register requirements. In International Conference on Supercomputing, pages 31 - 40, Barcelona, Spain, July 1995. ACM SIGARCH.

[17] J. R. Ellis. Bulldog: A Compiler for VLIW Architectures. The MIT Press, Cambridge, Massachusetts, 1987.

[18] J. A. Fisher. Trace scheduling: A technique for global microcode compaction. IEEE Transactions on Computers, C-30:478 - 490, July 1981.

[19] G. R. Gao and Q. Ning. Loop storage optimization for dataflow machines. In 4th Annual Workshop on Languages and Compilers for Parallel Computing, pages 359 - 373, August 1991.

[20] Franco Gasperoni and Uwe Schwiegelshohn. Scheduling loops on parallel processors: A simple algorithm with close to optimal performance. In International Conference CONPAR, pages 625 - 636, 1992.

[21] P. B. Gibbons and S. S. Muchnick. Efficient instruction scheduling for a pipelined processor. In SIGPLAN'86 Symposium on Compiler Construction, pages 11 - 16, Palo Alto, CA, June 1986. ACM.

[22] David W. Goodwin and Kent D. Wilken. Optimal and near-optimal global register allocation using 0-1 integer programming. Software - Practice and Experience, 1996.

[23] R. Govindarajan, Erik R. Altman, and Guang R. Gao. Minimizing register requirements under resource-constrained rate-optimal software pipelining. In 27th Annual International Symposium on Microarchitecture, pages 85 - 94, San Jose, CA, November - December 1994.

[24] J. L. Hennessy and T. R. Gross. Postpass code optimization of pipeline constraints. ACM Transactions on Programming Languages and Systems, 5(3):422 - 448, July 1983.

[25] John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, Inc., 1995.

[26] Peter Yan-Tek Hsu. Design of the R8000 Microprocessor. MIPS Technologies Inc., June 1994.

[27] Richard A. Huff. Lifetime-sensitive modulo scheduling. In Conference on Programming Language Design and Implementation, pages 258 - 267, Albuquerque, N. M., June 1993. ACM SIGPLAN.

[28] Suneel Jain. Circular scheduling: A new technique to perform software pipelining. In Conference on Programming Language Design and Implementation, pages 219 - 228, Toronto, ON, June 1991. ACM SIGPLAN.

[29] Monica S. Lam. Software pipelining: An effective scheduling technique for VLIW machines. In Conference on Programming Language Design and Implementation, pages 318 - 328, Atlanta, GA, June 1988. ACM SIGPLAN.

[30] S.-M. Moon and Kemal Ebcioglu. An efficient resource-constrained global scheduling technique for superscalar and VLIW processors. In Proceedings of the 25th Annual International Symposium on Microarchitecture, Portland, Oregon, December 1992.

[31] G. Nemhauser. The age of optimization: Solving large-scale real world problems. Operations Research, 42(1):5 - 13, January - February 1994.

[32] G. Nemhauser and L. Wolsey. Integer and Combinatorial Optimization. John Wiley & Sons, 1988.

[33] G. L. Nemhauser and L. A. Wolsey. Handbooks in Operations Research and Management Science: Optimization, volume 1. Elsevier Science, New York, 1989. ch. 6.

[34] Qi Ning. Register Allocation for Optimal Loop Scheduling. PhD thesis, McGill University, Montreal, Quebec, 1993.

[35] Qi Ning and Guang R. Gao. A novel framework of register allocation for software pipelining. In 20th Annual International Symposium on Principles of Programming Languages, pages 29 - 42, January 1993.

[36] Joseph C. H. Park and Mike Schlansker. On predicated execution. Technical Report HPL-91-58, Hewlett Packard Software and Systems Laboratory, May 1991.

[37] Charles Price. MIPS IV Instruction Set. Silicon Graphics Computer Systems, January 1995. Revision 3.1.

[38] W. Pugh. The omega test: A fast and practical integer programming algorithm for dependence analysis. In Supercomputing, pages 18 - 22, November 1991.

[39] B. R. Rau and Joseph A. Fisher. Instruction-level parallel processing: History, overview, and prospective. The Journal of Supercomputing (Special Issue on Instruction-Level Parallelism), 7(1/2):9 - 50, 1993.

[40] B. R. Rau and C. D. Glaeser. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In 14th Annual Workshop on Microprogramming, pages 183 - 198, October 1981.

[41] B. R. Rau, M. S. Schlansker, and P. P. Tirumalai. Code generation schemas for modulo scheduled loops. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 158 - 169, Portland, Oregon, December 1992.

[42] B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle. The cydra 5 departmental supercomputer: Design philosophies, decisions and tradeoffs. Computer, 22(1):12 - 35, January 1989.

[43] Steve Rhodes. MIPS R8000 Microprocessor Chip Set. Users Manual. Silicon Graphics Computer Systems, July 1994. Revision 3.0.

[44] John C. Ruttenberg, Guang R. Gao, Artour Stoutchinin, and Woody Lichtenstein. Software pipelining showdown: Optimal vs heuristic methods in a production compiler. In Conference on Programming Language Design and Implementation, pages 1 - 11, Philadelphia, PA, May 1996. ACM SIGPLAN.

[45] Uwe Schwiegelshohn, Franco Gasperoni, and Kemal Ebcioglu. On optimal parallelization of arbitrary loops. Journal of Parallel and Distributed Computing, 11(2):130 - 134, February 1991.

[46] R. Sites. Instruction ordering for the cray-1 computer. Technical Report 78-CS-023, Department of Computer Science, University of California, San Diego, July 1979.

[47] Harold S. Stone. High-Performance Computer Architecture. Addison-Wesley Publishing, 3rd edition, 1993.

[48] Jian Wang, Christine Eisenbeis, Martin Jourdan, and Bogong Su. Decomposed software pipelining: A new perspective and a new approach. International Journal of Parallel Programming, 22(3):357 - 379, 1994.

[49] Michael Wolfe. Optimizing Supercompilers for Supercomputers. The MIT Press, Cambridge, Massachusetts, 1989.
