A Tiling Perspective for Register Optimization Fabrice Rastello, Sadayappan Ponnuswany, Duco Van Amstel

A Tiling Perspective for Register Optimization Fabrice Rastello, Sadayappan Ponnuswany, Duco Van Amstel

A Tiling Perspective for Register Optimization Fabrice Rastello, Sadayappan Ponnuswany, Duco van Amstel To cite this version: Fabrice Rastello, Sadayappan Ponnuswany, Duco van Amstel. A Tiling Perspective for Register Op- timization. [Research Report] RR-8541, Inria. 2014, pp.24. hal-00998915 HAL Id: hal-00998915 https://hal.inria.fr/hal-00998915 Submitted on 3 Jun 2014 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. A Tiling Perspective for Register Optimization Łukasz Domagała, Fabrice Rastello, Sadayappan Ponnuswany, Duco van Amstel RESEARCH REPORT N° 8541 May 2014 Project-Teams GCG ISSN 0249-6399 ISRN INRIA/RR--8541--FR+ENG A Tiling Perspective for Register Optimization Lukasz Domaga la∗, Fabrice Rastello†, Sadayappan Ponnuswany‡, Duco van Amstel§ Project-Teams GCG Research Report n° 8541 — May 2014 — 21 pages Abstract: Register allocation is a much studied problem. A particularly important context for optimizing register allocation is within loops, since a significant fraction of the execution time of programs is often inside loop code. A variety of algorithms have been proposed in the past for register allocation, but the complexity of the problem has resulted in a decoupling of several important aspects, including loop unrolling, register promotion, and instruction reordering. In this paper, we develop an approach to register allocation and promotion in a unified optimization framework that simultaneously considers the impact of loop unrolling and instruction scheduling. This is done via a novel instruction tiling approach where instructions within a loop are represented along one dimension and innermost loop iterations along the other dimension. By exploiting the regularity along the loop dimension, and imposing essential dependence based constraints on intra- tile execution order, the problem of optimizing register pressure is cast in a constraint programming formalism. Experimental results are provided from thousands of innermost loops extracted from the SPEC benchmarks, demonstrating improvements over the current state-of-the-art. Key-words: compilation, compiler optimisation, register allocation,register spilling, register pro- motion, scheduling, constraint programming, loop transformations, loop unrolling, tiling, register tiling, locality ∗ Inria † Inria ‡ OSU § Inria RESEARCH CENTRE GRENOBLE – RHÔNE-ALPES Inovallée 655 avenue de l’Europe Montbonnot 38334 Saint Ismier Cedex Perspective de tuillage pour l’optimisation de registres. R´esum´e: L’allocation de registres est un probl`eme largement ´etudi´e. Un contexte partic- uli`erement important pour l’optimisation de l’allocation de registres est celui des boucles car elles constituent une fraction importante du temps d’ex´ecution du programme. De nombreux al- gorithmes d’allocation de registres ont ´et´epropos´es dans le pass´emais la complexit´edu probl`eme `adonn´elieu `aun d´ecouplage de plusieurs aspects importants, incluant notamment le d´eroulage de boucles, la promotion de registres ou le r´eordonnance d’instructions. Dans ce rapport nous d´eveloppons une approche unifi´ee au probl`eme d’allocation et promo- tion de registres dans un cadre d’optimisation qui combine l’impact du d´eroulage de boucles et le r´eordonnancement d’instructions. Ceci est r´ealis´egrˆace `a une nouvelle approche de pavage- registres dans lequel les instructions du corps de boucle sont repr´esent´ees le long d’une dimension et les it´erations de la boucle interne le long d’une autre dimension. En profitant de r´egularit´es le long d’une dimension et en imposant `al’ordre intra-tuile les contraintes de d´ependances, le probl`eme d’optimisation de la pression registres est exprim´ee dans un formalisme de programma- tion par contraintes. Les r´esultats exp´erimentaux issus de milliers de boucles internes extraites de la suite de benchmarks SPEC, d´emontrent l’am´elioration par rapport `al’´etat de l’art. Mots-cl´es : compilation, optimisation de compilation, allocation de registres, vidage en m´emoire, promotion de registres, ordonnancement, programmation par contraintes, transfor- mation de boucles, d´eroulage de boucle, d´ecoupage de boucles, pavage, localit´e A Tiling Perspective for Register Optimization 3 1 Introduction The efficient use of machine registers has been a fundamental compiler optimization goal for over half a century. Typically, the goal is to minimize the number of loads (stores) from (to) memory and/or cache. Register allocation [8] is the central technique and domain of study for this optimizations. Its goal is to map variables in a program to either machine registers or memory locations. Register allocation is subdivided into two sub-problems: first, the allocation selects the set of variables that will reside in registers at each point of the program; then, the assignment or coloring assigns each variable to a specific machine register. In general, it is not possible for all variables of a program to reside in registers throughout the execution of the program. The spilling problem [5] is that of determining which variables should be stored (or spilled) to memory to make the assignment possible; it aims at minimizing load/store overhead and thus attempts to maximize the reuse of values held in the registers. Register allocation is a very complex problem because of the interaction of multiple factors. Even within a single basic block, and a fixed schedule of operations, minimizing the number of loads and stores is very hard [14]. Further, there are many possible valid schedules for the order of execution of the instructions and it is generally recognized that it can have a significant impact on the number of loads/stores: different orders of instructions imply different live ranges for values and thus differences in the number of necessary registers to perform an allocation. For example, re-materialization [6, 2], which can be viewed as a form of very limited re-scheduling is the main source of performance improvement when integrated in the spilling formulation [11]. A particularly important context for considering the allocation problem is within loop com- putations, since they constitute a significant fraction of the execution time of many codes. In this paper, we take a fresh new look at this problem of optimizing register usage within innermost loops of programs Instead of the standard approach of analyzing the inter-statement dependen- cies among the operations in the loop body and live range of values based on a pre-determined schedule of operations, we view the set of operations in a novel two dimensional Cartesian space that is tiled to optimize register use. This new representation simultaneously considers both intra-iteration and inter-iteration register reuse, along with the use of register promotion, a tech- nique that enables an inter-iteration memory-dependence flow through a register instead [21]. Historically, register allocation did not at first consider loops when performing the coupled allocation and assignment problem [8]. Building on the observation that loops often generate multiple redundant loads of the same memory addresses, new analytic methods were developed that could detect such unnecessary loads. This information can then be used for a technique called scalar replacement or register promotion that keeps the memory content in a register until the next read of this value instead of reloading it from memory [7]. Improving on this first solution, register pipelining [13] provided a more extended formalization of the reuse problem from the perspective of register allocation. However, both register promotion and register pipelining do not consider the possibility for rescheduling the instructions within a loop. The advent of the Static Single Assignment (SSA) form allowed a new perspective where allocation could be performed separately from the assignment [18]. New aggressive allocation algorithms were designed that also perform re-materialization to reduce register pressure. In the context of architectures with ILP (instruction level parallelism), combined register allocation and instruction scheduling have also been extensively studied. Common to both of these approaches targeting the improvement of register reuse are many orthogonal loop transformations among which is loop unrolling. This technique, combined with register promotion, allows for a better exposure of register reuse between consecutive loop iterations. In this paper, we develop a novel approach to integrated register optimization considering register pipelining, instruction rescheduling and loop unrolling. We note that while these tech- RR n° 8541 4 Domagala, Rastello, Ponnuswany and van Amstel for i =0: N − 1 S0: (X[i], a, d[i][... ]) ← f0(X[i − 1]) S1: (c, b[i]) ← f1(a, b[i − 1]) S2: (Y [i], d[i][... ], e) ← f2(Y [i − 1]) S3: f3(c, e) (a) X[i] and Y [i] are struc- (b) s is the size of the flow tures made of two words each. edge; δ is the iteration dis- a, b[i], c, d[i][∗], e are a single tance. word each. Figure 1: A toy example and its corresponding dependences. Data-flow must-dependences (can- didate for promotion) are represented using solid edges. Other dependences, such as the output may-dependence related to d are represented using dashed edges. niques could be extended to also address the interplay between register allocation and ILP, it is beyond the scope and is not addressed by this paper. Our aim is to show the impact of combining rescheduling and loop unrolling with register allocation to reduce register pressure, hence reducing the amount of shuffle code between memory and registers. We integrate the three optimizations into a common framework using our a two-dimensional tiles loop perspective and a new model for the computation of spill costs.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    25 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us