
Register Promotion by Sparse Partial Redundancy Elimination of Loads and Stores

Raymond Lo, Fred Chow, Robert Kennedy, Shin-Ming Liu, Peng Tu¹

lo@sgi.com
Silicon Graphics Computer Systems, 2011 N. Shoreline Blvd., Mountain View, CA 94043

Abstract

An algorithm for register promotion is presented, based on the observation that the circumstances for promoting a memory location's value to register coincide with situations where the program exhibits partial redundancy between accesses to the memory location. The recent SSAPRE algorithm for eliminating partial redundancy using a sparse SSA representation forms the foundation for the present algorithm to eliminate redundancy among memory accesses, enabling us to achieve both computational and live range optimality in our register promotion results. We discuss how to effect speculative code motion in the SSAPRE framework. We present two different algorithms for performing speculative code motion: the conservative speculation algorithm used in the absence of profile data, and the profile-driven speculation algorithm used when profile data are available. We define the static single use (SSU) form and develop the dual of the SSAPRE algorithm, called SSUPRE, to perform the partial redundancy elimination of stores. We provide measurement data on the SPECint95 benchmark suite to demonstrate the effectiveness of our register promotion approach in removing loads and stores. We also study the relative performance of the different speculative code motion strategies when applied to scalar loads and stores.

1 Introduction

Register allocation is among the most important functions performed by an optimizing compiler. Prior to register allocation, it is necessary to identify the data items in the program that are candidates for register allocation. To represent register allocation candidates, compilers commonly use an unlimited number of pseudo-registers [CAC+81, TWL+81]. Pseudo-registers are also called symbolic registers or virtual registers, to distinguish them from real or physical registers. Pseudo-registers have no alias, and the process of assigning them to real registers involves only renaming them. Thus, using pseudo-registers simplifies the register allocator's job.

Optimization phases generate pseudo-registers to hold the values of computations that can be reused later, like common subexpressions and loop-invariant expressions. Variables declared with the register attribute in the programming language, together with local variables determined by the compiler to have no alias, can be directly represented as pseudo-registers. All remaining candidates have to be assigned pseudo-registers through the process of register promotion [CL97]. Register promotion identifies sections of code in which it is safe to place the value of a data object in a pseudo-register. Register promotion is regarded as an optimization because instructions generated to access a data object in a register are more efficient than if it is not in a register. If later register allocation cannot find a real register to map to a pseudo-register, it can either spill the pseudo-register to memory or re-materialize it, depending on the nature of the data object.²

This paper addresses the problem of register promotion within a procedure. We assume that earlier alias analysis has already identified the points of aliasing in the program, and that these aliases are accurately characterized in the static single assignment (SSA) representation of the program [CCL+96]. The register promotion phase inserts efficient code that sets up data objects in pseudo-registers, and rewrites the program code to operate on them. The pseudo-registers introduced by register promotion are maintained in valid SSA form. Our targets for register promotion include scalar variables, indirectly accessed memory locations and program constants. Since program constants can only be referenced but not stored into, they represent only a subset of the larger problem of register promotion for memory-resident objects. For convenience, we choose to exclude them from discussion for the rest of this paper, even though our solution does apply to them.³

Register promotion is relevant only to objects that need to be memory-resident in some part of the program. Global variables are targets for register promotion because they reside in memory between procedures. Aliased variables need to reside in memory at the points of aliasing. Before register promotion, we can regard the register promotion candidates as being memory-resident throughout the program.

¹ Present address: Intel Corporation, 2200 Mission College Blvd., Santa Clara, CA 95052.
² For a program variable that has an allocated home location, spilling its pseudo-register to the home location produces more efficient code than spilling to a new temporary location. Spilling-to-home and re-materialization can be regarded as the reverse of register promotion [Bri92].
³ In applying register promotion to program constants, the process of materializing a program constant in a register corresponds to the loading of a variable from memory to a register.
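To make this concrete, the following C sketch (our illustration, not taken from the paper; the global g and the function names are hypothetical) shows the rewrite register promotion aims for when a memory-resident global is accessed in a loop. The placement shown assumes single-threaded execution.

    int g;                      /* global: memory-resident between procedures */

    /* Before promotion: every access to g is a memory operation. */
    void accumulate_before(int n)
    {
        for (int i = 0; i < n; i++)
            g = g + i;          /* one load and one store of g per iteration */
    }

    /* After promotion: g lives in a pseudo-register (modeled here as a
     * local) for the duration of the loop; the PRE of loads places the
     * load at the loop entry, and the PRE of stores places the store at
     * the loop exit. */
    void accumulate_after(int n)
    {
        int r = g;              /* load inserted once, before the loop */
        for (int i = 0; i < n; i++)
            r = r + i;          /* register-only accesses */
        g = r;                  /* store inserted once, after the loop */
    }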

Because the candidates are memory-resident before promotion, there is a load operation associated with each of their uses, and there is a store operation associated with each assignment to them. Hence register promotion can be modeled as two separate problems: partial redundancy elimination of loads, followed by partial redundancy elimination of stores.

Partial redundancy elimination (PRE) is a powerful optimization concept first developed by Morel and Renvoise [MR79]. By performing data flow analysis on a computation, it determines where in the program to insert the computation. These insertions in turn cause partially redundant computations to become fully redundant, and therefore safe to delete. Knoop et al. came up with a different PRE algorithm, called lazy code motion, that improves on Morel and Renvoise's results [KRS92, DS93, KRS94a]. The result of lazy code motion is optimal: the number of computations cannot be further reduced by safe code motion, and the lifetimes of the pseudo-registers introduced are minimized.

Our team at Silicon Graphics has recently developed a new algorithm to perform PRE based on SSA form, called SSAPRE [CCK+97]. SSAPRE achieves the same optimal result as lazy code motion. Applying SSAPRE to loads thus has the effect of moving them backwards with respect to the control flow while inserting them as late as possible. The development of SSAPRE was motivated by the fact that traditional data flow analysis based on bit vectors does not interface well with the SSA form of program representation. In contrast, the SSAPRE algorithm takes advantage of the SSA representation and intrinsically produces its optimized output in SSA form. It does not use bit vectors, and instead works on one candidate at a time, using the built-in use-def edges in SSA to propagate data flow information. The SSAPRE algorithm exhibits the same attributes of sparseness inherent in other SSA-based optimization algorithms. The entire program is maintained in valid SSA form as SSAPRE iterates through the PRE candidates.

For the sake of recognizing redundancy, loads behave like ordinary expressions, because the later occurrences are the ones to be deleted. For stores, the reverse is true: the earlier stores are the ones to be deleted, as is evident in the examples of Figure 1(a) and (b). The PRE of stores problem, also called partial dead store elimination, can thus be treated as the dual of the PRE of loads problem. Performing PRE of stores thus has the effect of moving stores forward while inserting them as early as possible. By combining the effects of the PRE of loads and stores, our register promotion approach results in optimal placements of loads and stores while minimizing the live ranges of the pseudo-registers.

[Figure 1: Duality between load and store redundancies. (a) loads of x: the later load is redundant; (b) stores to x: the earlier store is redundant.]

The rest of this paper is organized as follows. Section 2 surveys previous work related to register promotion and partial dead store elimination. Section 3 gives an overall perspective of our register promotion approach. Section 4 discusses our algorithm for the PRE of loads. Section 5 discusses how speculative code motion can be incorporated into the SSAPRE framework, and presents two different strategies for performing speculation, depending on whether profile data are available. In Section 6, we develop and present our algorithm for the PRE of stores. In Section 7, we provide measurement data to demonstrate the effectiveness of the techniques presented in removing loads and stores, and to study the relative performance of the different speculative code motion strategies when applied to scalar loads and stores. We conclude in Section 8.

2 Related Work

Different approaches have been used in the past to perform register promotion. Chow and Hennessy use data flow analysis to identify the live ranges over which a register allocation candidate can be safely promoted [CH90]. Because their global register allocation is performed relatively early, at the end of global optimization, they do not require a separate register promotion phase. Instead, their register promotion is integrated into the global register allocator, and profitable placement of loads and stores is performed only if a candidate is assigned to a real register. In optimizing the placement of loads and stores, they use a simplified and symbolic version of PRE that makes use of the fact that the blocks that make up each live range must be contiguous.

Cooper and Lu use an approach that is entirely loop-based [CL97]. By scanning the contents of the blocks comprising each loop, they identify candidates that can be safely promoted to register in the full extent of the loop. The load to a pseudo-register is generated at the entry to the outermost loop where the candidate is promotable. The store, if needed, is generated at the exit of the same loop. Their algorithm handles both scalar variables and pointer-based memory accesses where the base is loop-invariant. Their approach is all-or-nothing, in the sense that if only one part of a loop contains an aliased reference, the candidate will not be promoted for the entire loop. They do not handle straight-line code, relying instead on the PRE phase to achieve the effects of promotion outside loops, but it is not clear if their PRE phase can handle stores appropriately.

Dhamdhere was first to recognize that register promotion can be modeled as a problem of code placement for loads and stores, thereby benefiting from the established results of PRE [Dha88, Dha90]. His Load-Store Insertion Algorithm (LSIA) is an adaptation of Morel and Renvoise's PRE algorithm for load and store placement optimization. LSIA solves for the placements of both loads and stores at the same time.

The PRE of stores in the context of register promotion can be viewed as another approach to partial dead store elimination (PDE), for which numerous algorithms have been described. Chow applied the dual of Morel and Renvoise's PRE algorithm to the optimization of store statements [Cho83]. After solution of the data flow equations in bit vector form, an insertion pass identifies the latest insertion point for each store statement, taking into account any possible modification of the right hand side expression. The algorithm by Knoop et al. is also PRE-based, but they separate it into an elimination step and a sinking step, and iterate them exhaustively so as to cover second order effects [KRS94b]. Their algorithm is thus more expensive than straight PRE. To additionally cover faint code elimination,⁴ they use slotwise solution of the data flow equations [DRZ92].

The PRE-based approaches to PDE do not modify the control flow structure of the program, but this limits the partial dead stores that can be removed.

⁴ A store is faint if it is dead or becomes dead after some other dead stores have been deleted.

Non-PRE-based PDE algorithms can remove additional partial dead stores by modifying the control flow. In the revival transformation [FKCX94], a partially dead statement is detached from its original place in the program and reattached at a later point at which it is minimally dead. In cases where movement of a single store is not possible, the transformation will move a superstructure that includes other statements and branches. However, the coverage of the revival transformation is limited, because it cannot be applied across loop boundaries. The algorithm as presented also does not consider situations that require multiple reattachment points to remove a partially dead store.

A PDE approach using slicing transformations was recently proposed by Bodík and Gupta [BG97]. Instead of moving partially dead statements, they take the approach of predicating them. The predication embeds the partially dead statement in a control flow structure, determined through program slicing, such that the statement is executed only if the result of the statement is eventually used. A separate branch deletion phase restructures and simplifies the flow graph. Their algorithm works on one partially dead statement at a time. Since the size of the code may grow after the PDE of each statement, complete PDE may take exponential time and result in massive code restructuring. The vastly different code shape can cause additional variation in program performance.

Another PDE algorithm, described by Gupta et al. [GBF97a], uses predication to enable code sinking in removing partial dead stores. The technique uses path profiling information to target only statements in frequently executed paths. A cost-benefit data flow analysis technique determines the profitability of sinking, taking into account the frequencies of each path considered. The same approach is used in [GBF97b] to speculatively hoist computations in PRE. Decisions to speculate are made locally at individual merge or split points based on the affected paths. Acyclic and cyclic code are treated by different versions of the algorithm.

3 Overview of Approach

In our PRE-based approach to register promotion, we apply the PRE of loads first, followed by the PRE of stores. This is different from Dhamdhere's LSIA, which solves for the placements of both loads and stores at the same time. Our ordering is based on the fact that the PRE of loads is not affected by the results of the PRE of stores, but the PRE of loads creates more opportunities for the PRE of stores by deleting loads that would otherwise have blocked the movement of stores. Decoupling the treatments of loads and stores also allows us to use an algorithm essentially unchanged from the base PRE algorithm. Our approach is SSA-based and worklist-driven. We cannot benefit from the parallelism inherent in bit vector operations, but we make up for that by doing data flow analysis on the sparse SSA representation, which takes fewer steps. Handling one candidate at a time allows easier, more intuitive and flexible implementation. When there are fewer candidates to work on, our approach finishes earlier, whereas a bit-vector-based approach always incurs some fixed cost. Our approach is thus more cost-effective, because the number of candidates for register promotion in a procedure often shows wide variation.

An advantage of SSAPRE related to the optimization of loads is that, given our SSA program representation that encodes alias information using virtual variables [CCL+96], it is easy to perform additional context-sensitive alias analyses during SSAPRE's Rename step to expose more redundancy among loads that have potential aliases. In situations where there is a chain of aliasing stores, our sparse approach can stop after identifying the first aliasing store. In contrast, traditional bit-vector-based approaches would have to analyze the sequence completely in order to initialize the bit vectors for data flow analyses. Hence, in programs with many aliased loads and stores, SSAPRE is often faster than traditional bit-vector-based PRE.

A further advantage of using the SSAPRE framework is that, given an existing implementation of SSAPRE for general expressions, only a small effort is needed to obtain coverage for the PRE of indirect loads and scalar loads. In our case, most of the additional implementation effort was spent in implementing the PRE of stores.

There is one important difference between PRE-based and non-PRE-based register promotion approaches. PRE by its nature does not perform speculative code motion, but in the area of register promotion it is sometimes beneficial to insert loads and stores speculatively. In Figure 2(a), it is highly desirable to promote variable x to register for the duration of the loop, but the branch in the loop that contains the accesses to x may not be executed, so promoting x to register, as shown in Figure 2(b), is speculative. In this respect, non-PRE-based promotion approaches have an advantage over PRE-based approaches. On the other hand, it is not always good to insert loads and stores speculatively. In the slightly different example of Figure 3(a), the call f() contains aliases to variable x. Promoting x to register requires storing it before the call and reloading it after the call (Figure 3(b)). If the path containing the call is executed more frequently than the path containing the increment of x, promoting x to register will degrade the performance of the loop. We have developed new techniques to control speculation in the SSAPRE framework.

[Figure 2: Speculative insertion of load and store. (a) original code; (b) speculative load/store inserted.]

[Figure 3: Load and store inserted on unpromoted path. (a) original code; (b) x promoted to register.]
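The two situations of Figures 2 and 3 can be sketched in C as follows. This is our own illustration, not code from the paper: the function f() and the branch shapes are hypothetical, and the transformed versions assume single-threaded execution.

    int x;
    void f(void);                    /* may access x through an alias */

    /* Figure 2 situation: the accesses to x sit on a branch that may
     * never execute, so promoting x for the whole loop is speculative. */
    void promoted_speculatively(int n)
    {
        int r = x;                   /* speculative load  */
        for (int i = 0; i < n; i++)
            if (i & 1)
                r = r + 1;           /* was: x = x + 1    */
        x = r;                       /* speculative store */
    }

    /* Figure 3 situation: one path calls f(), which may alias x, so the
     * promoted value must be stored before the call and reloaded after
     * it.  If the call path executes more often than the increment path,
     * promotion adds more memory operations than it removes. */
    void promotion_can_lose(int n)
    {
        int r = x;
        for (int i = 0; i < n; i++) {
            if (i & 1) {
                x = r;               /* store back before the call */
                f();
                r = x;               /* reload after the call      */
            } else {
                r = r + 1;           /* was: x = x + 1             */
            }
        }
        x = r;
    }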

To perform the PRE of stores, we develop the dual of the SSAPRE algorithm, called SSUPRE. Because we treat PDE in the context of register promotion, we do not take into account the right hand side of the store. We view each store statement x ← (expr) as if it is made up of the sequence:

    r ← (expr)
    x ← r

PDE is then applied purely to the store x ← r. This allows greater movement of the store, because it is not blocked by any modification to (expr), while simplifying our algorithm. We also do not need to be concerned with second order effects, because doing the earlier PRE of loads effectively removes most of the loads that can block the movement of stores. Before we perform register promotion, we invoke the standard SSA-based dead store elimination algorithm [CFR+91], which deletes all dead or faint stores. We cannot eliminate stores that become partially dead after some other partially dead stores have been removed; if desired, they can still be eliminated by iterating the PRE of stores phase.

4 Load Placement Optimization

Before register promotion, there is a load associated with each reference to the variable. Applying PRE to loads removes redundancy among the loads and introduces pseudo-registers to hold the values of redundant loads to be reused. The same holds for indirect load operations in the program. The SSAPRE algorithm can be applied to both types of loads without much modification. We now give a brief overview of the SSAPRE algorithm; the reader is referred to [CCK+97] for a full discussion.

SSAPRE performs PRE on one program computation at a time. For a given program computation E, SSAPRE consists of six separate steps. The first two steps, (1) Φ-Insertion and (2) Rename, construct the SSA form for the hypothetical temporary h that represents the value of E. The next two steps, (3) DownSafety and (4) WillBeAvail, perform sparse computation of global data flow attributes based on the SSA graph for h. The fifth step, (5) Finalize, determines points in the program to insert computations of E, marks computations that need to be saved and computations that are redundant, and determines the use-def relationship among SSA versions of the real temporary t that will hold the value of E. The last step, (6) CodeMotion, transforms the code to form the optimized output.

In our SSA representation [CCL+96], indirect loads are in the form of expression trees, while direct loads are leaves in the expression trees. SSAPRE processes the operations in an expression tree bottom-up. If two occurrences of an indirect load, *(a1 + b1) and *(a2 + b2), have partial redundancy between them, the two address expressions (a1 + b1) and (a2 + b2) must also have partial redundancy between them. Because of the bottom-up processing order, by the time SSAPRE works on the indirect loads, the address expressions must have been converted to temporaries t1 and t2. Hence, SSAPRE only needs to handle indirect loads whose bases are (or have been converted to) leaves.

A store of the form x ← (expr) can be regarded as being made up of the sequence:

    r ← (expr)
    x ← r

Because the pseudo-register r contains the current value of x, any subsequent occurrence of the load of x can reuse the value from r, and thus can be regarded as redundant. The same observations apply to indirect stores, replacing x by *p. Figure 4 gives examples of loads made redundant by stores.

[Figure 4: Redundant loads after stores. The load of x (or *p) that follows a store to the same location is redundant.]

The implication of this store-load interaction is that we have to take into account the occurrences of the stores when we perform the PRE of loads. During PRE on the loads of x, x ← is called a left occurrence. The Φ-Insertion step will also insert Φ's at the iterated dominance frontiers of left occurrences. In the Rename step, a left occurrence is always given a new h-version, because a store is a definition. Any subsequent load renamed to the same h-version is redundant with respect to the store.

In the CodeMotion step, if a left occurrence is marked save, the corresponding store statement will be split into two statements:

    x ← (expr)   ⇒   t1 ← (expr)
                     x ← t1

The placement of the new store x ← t1 will be optimized by the PRE of stores performed after the PRE of loads.

The importance of store-load interaction is illustrated by Figure 5. Ordinarily, we cannot move the load of x out of the loop because x's value is changed inside the loop. Recognizing x ← as a left occurrence exposes partial redundancy in the load of x. PRE in turn moves the load of x to the loop header. The store to x will be moved to the loop exit when we perform PRE of stores later (see Section 6).

[Figure 5: Load placement via store-load interaction. (a) original code; (b) after PRE of loads.]

In performing SSAPRE for direct loads, the Φ-Insertion and Rename steps can be streamlined by taking advantage of the variable being already in SSA form. We can just map the φ's and SSA versions of the variable to the Φ's and h-versions of its load operation.
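The following C fragment is our sketch of the store-load interaction just described (the variable and temporary names are hypothetical); it shows the rewrite that the CodeMotion step performs when a left occurrence is marked save.

    int x;

    /* Before: the load of x following the store is redundant, because
     * the value being stored is still available. */
    int before(int a, int b)
    {
        x = a + b;           /* store: a "left occurrence" of x */
        return x + 1;        /* load of x: redundant            */
    }

    /* After the PRE of loads: the store is split through the temporary
     * t1, and the later load reuses t1.  The placement of the residual
     * store x = t1 is optimized later by the PRE of stores. */
    int after(int a, int b)
    {
        int t1 = a + b;
        x = t1;
        return t1 + 1;       /* load replaced by t1 */
    }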

5 Speculative Code Motion

In its basic framework, PRE does not allow the insertion of a computation at a point in the program where the computation is not down-safe (i.e., anticipated). This is necessary to ensure the safety of the code placement. Speculation corresponds to inserting computations during SSAPRE at Φ's where the computation is not down-safe. We can accomplish this effect by selectively marking non-down-safe Φ's as down-safe in the DownSafety step of SSAPRE. In the extreme case, we can mark all Φ's as down-safe. We refer to the resulting code motion as full speculation.

A necessary condition for speculative code motion in general is that the operation moved must not cause any unmaskable exception. Direct loads can usually be speculated. However, indirect loads from unknown pointer values need to be excluded, unless the hardware can tolerate them.

Speculation may or may not be beneficial to program performance, depending on which execution paths are taken more frequently. Thus, it is best to base speculation decisions on the profile data of the program being compiled. Even in the absence of profile data, there are situations where it is desirable to speculatively insert loads and stores, as we have discussed with respect to Figure 2. In this section, we present two different speculative code motion strategies, depending on whether profile data are available.

5.1 Conservative Speculation

The conservative speculation strategy is used when profile data are not available. Under this situation, we restrict speculative code motion to moving loop-invariant computations out of single-entry loops.
We base our analysis on the Φ located at the start of the loop body. We perform speculative insertion at the loop header only if no other insertion inside the loop is needed to make the computation fully available at the start of the loop body. The algorithm for this analysis is given by function Can_speculate, shown in Figure 6. For the Φ F, we identify the Φ operands that correspond to the back edges of the loop, and call function Has_⊥_Φ_opnd for each of these Φ operands. Since ⊥ represents potential insertion points, Has_⊥_Φ_opnd returning false indicates that the computation is available at that back edge without requiring any insertion other than the one at the operand of F that corresponds to the loop entry. If the checks succeed, we mark F as down-safe. Figure 7 shows the results of applying this algorithm to the program of Figure 2(a). The algorithm marks the Φ corresponding to h1 in Figure 7(a) as down-safe. Figure 7(b) shows the program after the load has been speculatively moved out of the loop. Subsequent application of speculative code motion in the PRE of stores will move the store out of the loop to yield the result shown in Figure 2(b).

    function Has_⊥_Φ_opnd(X, F)
      if (X is ⊥) return true
      if (X not defined by Φ) return false
      if (X = F) return false
      if (visited(X)) return false
      visited(X) ← true
      for each operand opnd of X do
        if (Has_⊥_Φ_opnd(opnd, F)) return true
      return false
    end Has_⊥_Φ_opnd

    function Can_speculate(F)
      for each Φ X in the loop do
        visited(X) ← false
      for each back-edge operand opnd of F do
        if (Has_⊥_Φ_opnd(opnd, F)) return false
      return true
    end Can_speculate

Figure 6: Algorithm for Can_speculate

[Figure 7: Speculative load placement example. (a) SSA graph for h, with the Φ for h1 marked down-safe; (b) resulting program.]

5.2 Profile-driven Speculation

Without knowledge of execution frequency, any speculation can potentially hurt performance. But when execution profile data are provided, it is possible to tailor the use of speculation to maximize run-time performance for executions of the program that match the given profile. The optimum code placement lies somewhere between no speculation and full speculation. Code placement with no speculation corresponds to the results obtained by traditional PRE. Code placement with full speculation corresponds to the results of SSAPRE if all Φ's are marked down-safe in the DownSafety step. The problem of determining an optimum placement of loads and stores can be expressed as instances of the integer programming problem, but we know of no practical algorithm for solving it. Instead of aiming for the optimal solution, we settle on a practical, versatile and easy-to-implement solution that never performs worse than no speculation, subject to accuracy of the profile data.

In our approach, the granularity for deciding whether to speculate is each connected component of the SSA graph formed by SSAPRE. For each connected component, we either perform full speculation, or do not speculate at all. Though this limits the possible placements that we can consider, it enables us to avoid the complexity of finding and evaluating the remaining placement possibilities. When the connected components are small, we usually get better results, since we miss fewer placement possibilities. The connected components for expressions are generally quite small.

Our profile-driven speculative code motion algorithm works by comparing the performance with and without speculation, using the basic block frequency data provided by the execution profile. The overall technique is an extension of the SSAPRE algorithm. Pseudo-code is given in Figure 8. Procedure SSAPRE_with_profile gives the overall phase structure for profile-driven SSAPRE. We start by performing regular SSAPRE with no speculation. Before the CodeMotion step, we call function Speculating(), which determines whether there is any connected component in the computation being processed that warrants full speculation. If Speculating() returns true, it will have marked the relevant Φ's as down-safe. Thus, it is necessary to re-apply the WillBeAvail and Finalize steps, which yield the new code placement result with speculation. The last step, CodeMotion, works the same way regardless of whether speculation is performed.

    procedure Compute_speculation_savings
      for each connected component C in SSA graph
        savings[C] ← 0
      for each real occurrence R in SSA graph
        if (Reload(R) = false and R is defined by a Φ F)
          savings[Connected_component(F)] += freq(R)
      for each Φ occurrence F in SSA graph
        for each operand opnd of F
          if (Insert(opnd) = true) {
            if (opnd is defined by a Φ)
              savings[Connected_component(F)] += freq(opnd)
          } else if (opnd is ⊥)
            savings[Connected_component(F)] -= freq(opnd)
    end Compute_speculation_savings

    function Speculating()
      has_speculation ← false
      remove Φ's that are not partially available or not partially anticipated
      identify connected components in SSA graph
      Compute_speculation_savings()
      for each connected component C in SSA graph
        if (savings[C] > 0) {
          mark all Φ's in C down-safe
          has_speculation ← true
        }
      return has_speculation
    end Speculating

    procedure SSAPRE_with_profile
      Φ-Insertion Step
      Rename Step
      DownSafety Step
      WillBeAvail Step
      Finalize Step
      if (Speculating()) {
        WillBeAvail Step
        Finalize Step
      }
      CodeMotion Step
    end SSAPRE_with_profile

Figure 8: Profile-driven SSAPRE algorithm

Speculating() is responsible for determining if full speculation should be applied to each connected component in the SSA graph. First it prunes the SSA graph by removing Φ's where the computation is not partially available or not partially anticipated. Φ's where the computation is not partially available are never best insertion points, because some later insertion points yield the same redundancy and are better from the point of view of the temporary's live range. Insertions made at Φ's where the computation is not partially anticipated are always useless, because they do not make possible any deletion. After removing these Φ's and deleting all references to them, Speculating() partitions the SSA graph into its connected components. Next, procedure Compute_speculation_savings determines whether speculation can reduce the dynamic counts of the computation on a per-connected-component basis, using the results of SSAPRE with no speculation as the baseline.

Given a connected component of an SSA graph where the computation is partially available throughout, it is straightforward to predict the code placement that corresponds to full speculation. Since we regard all Φ's in the component as down-safe, the WillBeAvail step will find that can_be_avail holds for all of them. The purpose of the computation of Later in the WillBeAvail step is only for live range optimality, and does not affect computational optimality. If we ignore the Later property, the Finalize step will decide to insert at all the ⊥ operands of the Φ's. In addition, the insertions will render fully redundant any real occurrence within the connected component whose h-version is defined by a Φ.

By predicting the code placement for full speculation, Compute_speculation_savings can compute the benefit of performing full speculation for individual connected components and store it in the array savings, indexed by the connected components. savings is the sum of the dynamic counts of the real occurrences deleted and any non-speculative SSAPRE insertions suppressed due to full speculation, minus the dynamic counts for the new insertions made by full speculation. Procedure Compute_speculation_savings iterates through all real and Φ occurrences in the SSA graph. Whenever the algorithm encounters a non-deleted real occurrence that is defined by a Φ, it increases savings for the connected component by the execution frequency of the real occurrence. Deletions made by SSAPRE are passed over, because those deletions would have been done with or without speculation. Other real occurrences are not affected by full speculation. For each Φ occurrence, procedure Compute_speculation_savings iterates through its operands. If the Φ operand is marked insert and is defined by another Φ, the algorithm also increases savings for the connected component by the execution frequency of the Φ operand, because full speculation will hoist such insertions to earlier Φ's. If the Φ operand is ⊥, the algorithm decreases savings for the connected component by the execution frequency of the Φ operand, because under full speculation, insertions will be performed at these Φ operands.
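The tally just described can be rendered as the following C sketch. It is a simplified model under assumed data structures (the RealOcc and PhiOpnd layouts and the freq and component fields are ours), not the MIPSpro implementation.

    #include <stdbool.h>
    #include <string.h>

    typedef struct Phi Phi;

    typedef struct {
        bool reload;     /* occurrence already deleted (reloaded) by SSAPRE */
        Phi *def_phi;    /* defining Phi, or NULL                           */
        long freq;       /* execution count of the enclosing basic block    */
    } RealOcc;

    typedef struct {
        bool insert;     /* non-speculative SSAPRE inserts at this operand  */
        bool is_bottom;  /* operand carries no available value (bottom)     */
        Phi *def_phi;    /* operand defined by another Phi, or NULL         */
        long freq;       /* execution count of the incoming CFG edge        */
    } PhiOpnd;

    struct Phi {
        int component;   /* connected component this Phi belongs to */
        int num_opnds;
        PhiOpnd *opnd;
    };

    void compute_speculation_savings(RealOcc *real, int nreal,
                                     Phi *phi, int nphi,
                                     long *savings, int ncomp)
    {
        memset(savings, 0, (size_t)ncomp * sizeof savings[0]);

        /* Real occurrences defined by a Phi become fully redundant
         * under full speculation: count them as savings. */
        for (int i = 0; i < nreal; i++)
            if (!real[i].reload && real[i].def_phi)
                savings[real[i].def_phi->component] += real[i].freq;

        for (int i = 0; i < nphi; i++)
            for (int j = 0; j < phi[i].num_opnds; j++) {
                PhiOpnd *op = &phi[i].opnd[j];
                if (op->insert && op->def_phi)
                    /* insertion suppressed: hoisted to an earlier Phi */
                    savings[phi[i].component] += op->freq;
                else if (op->is_bottom)
                    /* new insertion made only under full speculation */
                    savings[phi[i].component] -= op->freq;
            }
    }

A component C is then fully speculated only when savings[C] > 0, mirroring Speculating() in Figure 8.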
After procedure Compute_speculation_savings returns, Speculating() iterates through the list of connected components. If the tallied result in savings is positive for a connected component, it means speculative insertion is profitable, and Speculating() will effect full speculation by marking all Φ's in the connected component as down-safe.

With our profile-driven speculation algorithm, speculation is performed only in those connected components where it is beneficial. In contrast to the technique by Gupta et al. [GBF97a], our decisions are made globally on a per-connected-component basis in the SSA graph. In the example of Figure 3, our algorithm will promote x to a register if the execution count of the path containing the call is lower than that of the path containing the store. In the absence of profile data, our conservative speculation algorithm of Section 5.1 will not trigger the speculative code motion in this example, because it requires insertion in the body of the loop.

6 Store Placement Optimization

In this section, we develop our algorithm to perform PRE of stores (or PDE). We first present the underlying principles behind our sparse approach to PRE. We then relate and contrast the characteristics of loads and stores to establish the duality between load redundancy and store redundancy. Given this duality, we then describe our SSUPRE algorithm, which performs the PRE of stores.

6.1 Foundation of Sparse PRE

Suppose we are working on a computation C, which performs an operation to yield a value. Let us focus on the occurrence C1 with respect to which other occurrences are redundant, and assume there is no modification of the value computed by C1 in the program.⁵ Any occurrence of C in the region of the control flow graph dominated by C1 is fully redundant with respect to C1; an occurrence of C outside this region may be partially redundant with respect to C1. The earliest possible strictly partially redundant occurrences of C with respect to C1 are in the dominance frontier of C1. Dominance frontiers are also the places where φ operators are required in minimal SSA form [CFR+91], intuitively indicating that there are common threads between PRE and properties of SSA form.

Our sparse approach to PRE, as exemplified by SSAPRE, relies on a representation that can directly expose partial redundancy; such a representation can be derived as follows. Suppose an occurrence C2 is partially redundant with respect to C1. We represent this redundancy by a directed edge from C2 to C1. In general, if the computation C occurs many times throughout the program, there will be many such edges. The relation represented by these edges is many-to-many, because an occurrence can be redundant with respect to multiple occurrences. We factor these redundancy edges by introducing a Φ operator at control flow merge points in the program. The effect of this factoring is to remove the many-to-many relationships and convert them to many-to-one, so that each occurrence can be redundant with respect to only a single occurrence, which may be a Φ occurrence. In the factored form, each edge represents full redundancy, because the head of each edge must dominate the tail of the edge after the factoring. Strict partial redundancy is exposed whenever there is a missing incoming edge to a Φ, i.e., a ⊥ Φ operand (Figure 9).

[Figure 9: Factoring of redundancy edges. (a) before factoring; (b) after factoring. An arrow denotes a redundancy edge and a plain line a control flow edge; C is either an expression or a load.]

Having identified this sparse graph representation that can expose partial redundancy, we need a method to build the representation. Because the representation shares many of the characteristics of SSA form, the method to build this sparse graph closely parallels the standard SSA construction algorithm.⁶ The Φ-Insertion step inserts Φ's at the iterated dominance frontiers of each computation to serve as anchor points for placement analysis. In the Rename step, we assign SSA versions to occurrences according to the values they compute. The resulting SSA versions encode the redundancy edges of the sparse graph as follows: if an occurrence has the same SSA version as an earlier occurrence, it is redundant with respect to that earlier occurrence.

In our sparse approach to PRE, the next two steps, DownSafety and WillBeAvail, perform data flow analysis on the sparse graph. The results enable the next step, Finalize, to pinpoint the locations in the program at which to insert the computation. These insertions make partially redundant occurrences become fully redundant, and such occurrences are marked. At this point, the form of the optimized output has been determined. The final step, CodeMotion, transforms the code to form the optimized program.

⁵ We use subscripts here purely to identify individual occurrences; they are not SSA versions.
⁶ SSA form is actually a kind of factored use-def representation; discussion of this can be found in [Wol96].
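As a concrete illustration (ours, not from the paper) of how a ⊥ operand exposes strict partial redundancy, consider the following C diamond; the comments mark where the factoring operator and the insertion would go.

    /* a + b is computed on only one side of the diamond; at the merge,
     * the factored representation has one incoming redundancy edge and
     * one missing edge, i.e. h = Phi(h1, bottom). */
    int partially_redundant(int a, int b, int cond)
    {
        int t;
        if (cond)
            t = a + b;       /* occurrence C1                     */
        else
            t = 0;           /* no occurrence: the bottom operand */
        /* merge point: Phi for the value of a + b */
        return t + (a + b);  /* strictly partially redundant      */
    }

    /* After PRE inserts a + b at the bottom operand, the remaining
     * occurrence is fully redundant and reuses the temporary r. */
    int after_insertion(int a, int b, int cond)
    {
        int t, r;
        if (cond) {
            r = a + b;       /* saved to the real temporary */
            t = r;
        } else {
            t = 0;
            r = a + b;       /* insertion at the former bottom operand */
        }
        return t + r;        /* redundant computation deleted */
    }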
6.2 Duality between Loads and Stores

For register promotion, we assign a unique pseudo-register r to each memory location involved in load and store placement optimization. For indirect loads and stores, we assign a unique pseudo-register to each lexically identical address expression. The discussion in this section applies to both direct and indirect loads and stores, though we use direct loads and stores as examples.

A load of the form r ← x is fully (partially) redundant if the load is fully (partially) available. Thus, given two occurrences of the load, the later occurrence is the redundant one. On the other hand, a store of the form x ← r is fully (partially) redundant if the store is fully (partially) anticipated. Given two occurrences of the store, the earlier occurrence is the redundant one (Figure 1). As a result, redundancy edges for loads point backwards with respect to the control flow, while redundancy edges for stores point forward. The availability and anticipation of a load are killed when the memory location is modified. On the other hand, the availability and anticipation of a store are killed when the memory location is used; the movement of an available store is additionally blocked by an aliased store.

A load of x is fully redundant with respect to an earlier load of x only if the earlier load occurs at a place that dominates the current load, because this situation implies the earlier load must be executed before control flow reaches the current load. Since load redundancy edges are factored across control flow merge points, the targets of the new edges always dominate their sources. All the edges then represent full load redundancies, and partial load redundancies are exposed when there are ⊥ operands in the factoring operator Φ. A store to x is fully redundant with respect to a later store to x only if the later store occurs at a place that post-dominates the current store, because this implies the later store must eventually be executed after the current store. Since store redundancy edges are factored across control flow split points, the targets of the new edges always post-dominate their sources. All the edges then represent full store redundancies, and partial store redundancies are exposed when there are ⊥ operands in the factoring operator Λ. In performing PRE, we move loads backward with respect to the control flow and insert them as late as possible to minimize r's lifetime; for stores, however, we move them forward and insert them as early as possible to minimize r's lifetime.

We define static single use (SSU) form to be the dual of SSA form: in SSU form each use of a variable establishes a new version (we say the load uses the version), and every store reaches exactly one load. Just as the SSA factoring operator φ is regarded as a definition of the corresponding variable and always defines a new version, the SSU factoring operator Λ is regarded as a use of its variable and always establishes (uses) a new version. Each use post-dominates all the stores of its version. Just as SSA form serves as the framework for the SSAPRE algorithm, SSU form serves as the framework for our algorithm for eliminating partial redundancy among stores, which we call SSUPRE. We annotate SSU versions using superscripts. Table 1 summarizes our discussion of the duality between load and store redundancies.

[Table 1: Duality between load and store redundancies.]

6.3 SSUPRE Algorithm

Having established the duality between load and store redundancies, we are now ready to give the algorithm for our sparse approach to the PRE of stores. For a general store statement of the form x ← (expr), we view it as if it is made up of the sequence:

    r ← (expr)
    x ← r

PRE is applied only to the store x ← r, where x is a direct or indirect store target and r is a pseudo-register. For maximum effectiveness, the PRE of stores should be performed after the PRE of loads, because the PRE of loads will convert many loads into register references so that they do not block the movement of the stores, as shown in Figure 2.

Our SSUPRE algorithm for the PRE of stores is transcribed and dualized from the SSAPRE algorithm, except that it cannot exploit the SSA representation of the input in the same way as SSAPRE. As a result, it is less efficient than SSAPRE on a program represented in SSA form. To achieve the same efficiency as SSAPRE, it would have been necessary for the input program to be represented in SSU form. Such a representation of the input is not practical, because it would benefit only this particular optimization, so SSUPRE constructs the required parts of SSU form on demand.

Like SSAPRE, the SSUPRE algorithm is made up of six steps, and is applicable to both direct and indirect stores. It works by constructing the graph of factored redundancy edges of the stores being optimized, called the SSU graph. The first two steps, Λ-Insertion and Rename, work on all stores in the program at the same time while conducting a pass through the entire program. The remaining steps can be applied to each store placement optimization candidate one at a time.

6.3.1 The Λ-Insertion Step

The purpose of Λ is to expose the potential insertion points for the store being optimized. There are two different scenarios in which Λ's have to be placed. First, Λ's have to be placed at the iterated post-dominance frontiers of each store in the program. Second, Λ's also have to be placed when a killed store reaches a split point; since stores are killed by loads, this means Λ's have to be placed at the iterated post-dominance frontiers of each load (including aliased loads) of the memory location. In Figure 10(b), the Λ at the bottom of the loop body is inserted because it is a post-dominance frontier of x ← inside the loop, and the Λ at the split in the loop body is inserted because it is a post-dominance frontier of the use of x in one branch of the split. Λ insertion is performed in one pass through the entire program for all stores that are PRE candidates.

6.3.2 The Rename Step

This step assigns SSU versions to all the stores. Each use is assigned a new SSU version, which is applied to the stores that reach it. Each Λ is assigned a new SSU version, because we regard each Λ as a use. The result of renaming is such that any control flow path that includes two different versions must cross an (aliased) use of the memory location or a Λ.

Renaming is performed by conducting a preorder traversal of the post-dominator tree, beginning at the exit points of the program. We maintain a renaming stack for every store that we are optimizing. When we come across an aliased load (use) or a Λ, we generate a new SSU version and push it onto the stack. When we come to a store, we assign it the SSU version at the top of its stack and also push it onto the stack. Entries on the stacks are popped as we back up past the blocks containing the uses that generated them. The operands of Λ's are renamed at the entries of their corresponding successor blocks. An operand is assigned the SSU version at the top of its stack if the top of its stack is a store or a Λ; otherwise, it is assigned ⊥.

To recognize that local variables are dead at exits, we assume there is a virtual store to each local variable at each exit of the program unit. Since these virtual stores are first occurrences in the preorder traversal of the post-dominator tree, they are assigned unique SSU versions. Any stores further down in the post-dominator tree that are assigned the same SSU versions are redundant and will be deleted.

6.3.3 The UpSafety Step

One criterion required for PRE to insert a store is that the store be up-safe (i.e., available) at the point of insertion. This step computes up-safety for the Λ's by forward propagation along the edges of the SSU graph. A Λ is up-safe if, on each backward control flow path toward the procedure entry, another store is encountered before reaching the procedure entry, an aliased load or an aliased store. The propagation algorithm is an exact transposition of the DownSafety algorithm in SSAPRE.

To perform speculative store placement, we apply the strategies discussed in Section 5. Figure 11 shows an example where a store is speculatively moved out of the loop.
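To illustrate the kind of store this machinery targets, here is our C sketch of a store that is speculatively sunk out of a loop (compare Figure 11; it assumes single-threaded execution).

    int x;

    /* The store to x in each iteration is killed by the store in the
     * next iteration; only the last one is needed.  Sinking the store
     * past the back edge is speculative when the loop may run zero
     * times. */
    void store_in_loop(int n)
    {
        for (int i = 0; i < n; i++)
            x = i * 2;       /* partially redundant (partially dead) store */
    }

    /* After speculative PRE of stores: one store at the loop exit.  The
     * compensating load of x keeps the stored value correct when the
     * loop body never executes (see the CodeMotion step, Section 6.3.6). */
    void store_sunk(int n)
    {
        int r = x;           /* compensating load */
        for (int i = 0; i < n; i++)
            r = i * 2;
        x = r;               /* speculative store */
    }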

6.3.4 The WillBeAnt Step

The WillBeAnt step predicts whether the store will be anticipated at each Λ following the insertions made for PRE. The algorithm again is an exact transposition of the WillBeAvail algorithm in SSAPRE. It consists of two backward propagation passes performed sequentially. The first pass computes the can_be_ant predicate for each Λ. The second pass works within the region of can_be_ant Λ's and computes earlier. A false value of earlier implies that the insertion of the store cannot be hoisted earlier without introducing unnecessary store redundancy. At the end of the second pass, will_be_ant for a Λ is given by:

    will_be_ant = can_be_ant ∧ ¬earlier

[Figure 10: Sparse PRE of stores. (a) original code; (b) after Λ insertion; (c) after WillBeAnt; (d) final result. *p is an aliased use of x.]

Figure 10(c) shows the values of up_safe (us), can_be_ant (cba), earlier and will_be_ant (wba) at each Λ for the example. The predicate insert indicates whether we will perform insertion at a Λ operand. insert holds for a Λ operand if and only if the following hold:

• the Λ satisfies will_be_ant; and

• the operand is ⊥, or has_real_def is false for the operand and the operand is used by a Λ that does not satisfy will_be_ant; i.e., the store is not anticipated at the Λ operand.

6.3.5 The Finalize Step

The Finalize step in SSUPRE is simpler than the corresponding step in SSAPRE, because placement optimization of stores does not require the introduction of temporaries. This step only identifies the stores that will be fully redundant after taking into account the insertions that will be performed. This is done in a preorder traversal of the post-dominator tree of the program. Renaming stacks are not required, because SSU versions have already been assigned. For each store being optimized, we update and use an array Ant_use (cf. Avail_def in SSAPRE), indexed by SSU version, to identify stores that are fully redundant.

6.3.6 The CodeMotion Step

This last step performs the insertion and deletion of stores to reflect the results of the store placement optimization. The stores inserted always use the pseudo-register as their right hand side, and take either of the following forms, depending on whether the store is direct or indirect:

    x ← r    or    *p ← r

It is necessary to make sure that the value of the pseudo-register r is current at the inserted store. This implies that we need to check whether the definitions of r track the definitions of the redundant stores. To do this, we follow the use-def edges in SSA form to get to all the definitions of x that reach the point of store insertion. If the right hand side of a definition is r, the store is simply deleted.⁷ If the right hand side is not r, we change it to r, thereby removing the store, which must have been marked redundant by the Finalize step:

    x ← (expr)   ⇒   r ← (expr)

In cases where the inserted store is speculative, it may be necessary to insert a load on the path where the store is not available, so that the pseudo-register will have the right value at the inserted store. In the example of Figure 11, the load r ← x is inserted at the head of the loop for this reason.

[Figure 11: Speculative insertion of store. (a) original code; (b) speculative store inserted.]

One of our requirements is that the program be maintained in valid SSA form. This implies introducing φ's at iterated dominance frontiers and assigning the correct SSA versions for r and x.⁸ The current version of r can easily be found by following the use-def edges of x. For x, we assign a new SSA version in each store inserted. Uses of x reached by this inserted store in turn need to be updated to the new version, and can be conveniently handled if def-use chains are also maintained. Instead of maintaining def-use chains, we find it more expedient to perform this task for all affected variables by adding a post-pass to SSUPRE. The post-pass is essentially the renaming step of the SSA construction algorithm, except that rename stacks need to be maintained only for the affected variables.

⁷ The right hand side is a pseudo-register if the store was a left occurrence that effected redundancy elimination in the load placement phase.
⁸ In the case of indirect stores, the virtual variables have to be maintained in correct SSA versions; see [CCL+96].
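A small C example of the rewriting performed by CodeMotion (ours; the names are hypothetical): the redundant stores' right hand sides are redirected into the pseudo-register r, and the surviving store is placed at the point chosen by SSUPRE.

    int x;

    /* Both stores are redundant with respect to a store placed after
     * the if-else, which post-dominates them. */
    void stores_before(int c, int a, int b)
    {
        if (c)
            x = a;
        else
            x = b;
    }

    /* After SSUPRE and CodeMotion: each x = (expr) becomes r = (expr),
     * and the single inserted store x = r follows the merge. */
    void stores_after(int c, int a, int b)
    {
        int r;
        if (c)
            r = a;
        else
            r = b;
        x = r;               /* inserted store */
    }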

    benchmark      A (PRE of loads off)   B (PRE of loads on)    B/A
    099.go                  23498625              23210822      0.988
    132.ijpeg              258354279             227019081      0.879
    147.vortex             558680056             395136113      0.707
    geom. mean                                                  0.744

Table 2: Dynamic counts of all loads executed in SPECint95 (rows for the remaining benchmarks were lost in extraction)

    benchmark      A (PRE of stores off)  B (PRE of stores on)   B/A
    124.m88ksim              9174980               9138470      0.996
    134.perl               357417768             293678627      0.822
    geom. mean                                                  0.988

Table 3: Dynamic counts of all stores executed in SPECint95 (rows for the remaining benchmarks were lost in extraction)

7 Measurements

We have implemented the register promotion techniques described in this paper in the Silicon Graphics MIPSpro Compilers Release 7.2. In this section, we study the effectiveness of our techniques by compiling the SPECint95 benchmark suite and measuring the resulting dynamic load and store counts when the benchmarks are executed using the training input data. The benchmarks were compiled at the -O2 optimization level with no inlining. Only intra-procedural alias analysis was applied. The measurement data are gathered by simulating the compiled program after register promotion, but before code generation and register allocation. In the simulation, accesses to pseudo-registers are not counted as loads or stores. This is equivalent to assuming that the underlying machine has an infinite number of registers; the assumption allows us to measure the effects of register promotion without confounding effects such as spilling performed by the register allocator.

Our measurement data are organized into two sets. In the first set, we measure the overall effectiveness of the SSAPRE and SSUPRE approaches in removing scalar and indirect loads and stores in the programs. In the second set, we study the relative performance of the different speculative code motion strategies presented in Section 5.

7.1 Overall Performance

Since our register promotion is made up of the PRE of loads and the PRE of stores, we present our data for loads and stores in separate tables. Tables 2 and 3 show the effects of performing the PRE of loads and of stores, respectively, on the SPECint95 benchmarks, without speculative code motion. The data shown include both scalar and indirect loads/stores. Column A in the tables shows the number of loads/stores executed in the benchmark programs if register promotion is turned off. Even though register promotion is disabled, non-aliased local variables and compiler-generated temporaries are still assigned pseudo-registers by other parts of the compiler, so the baselines shown in column A represent quite respectable code quality. Column B shows the effects on the number of executed loads and stores when the PRE of loads and the PRE of stores are enabled.

According to Table 2, the PRE of loads reduces the dynamic load counts by an average of 25.6%. In contrast, Table 3 shows the PRE of stores is able to reduce the dynamic store counts by only an average of 1.2%. There are a number of reasons. First, earlier optimization phases have applied the SSA-based dead store elimination algorithm [CFR+91], which efficiently removes all faint and dead stores; the only opportunities left are those exposed by the removal of loads or those due to strictly partial store redundancy. A side effect of earlier loop normalization also moves invariant stores to the ends of loops [LLC96]. Second, for aliased variables, there are usually aliased uses around aliased stores, and these uses block movement of the stores. Third, apart from aliased local variables, the other candidates for the PRE of stores are global variables, and they tend to exhibit few store redundancies. Our PRE of stores is performed after the PRE of loads. If the PRE of stores is performed with the PRE of loads turned off, the resulting dynamic store counts are identical to those in column A, indicating that removing loads is crucial to the removal of stores.

7.2 Speculative Code Motion Strategies

In this section, we study the relative performance of the different speculative code motion strategies we presented in Section 5. Although speculative code motion is applicable to any computation, we concern ourselves only with loads and stores in this paper. In our target architecture, the MIPS R10000, indirect loads and indirect stores cannot be speculated. Thus, we exclude register promotion data for indirect loads and stores from this section, and present data only for scalar loads and stores.

Tables 4 and 5 show the effects of the different speculative code motion strategies on the executed scalar load and store counts, respectively, in the SPECint95 benchmarks. Our baseline is column A, which shows the scalar load and store counts without register promotion. Column B shows the same data when the PRE of loads and stores is applied without speculation. Column C shows the result of applying the conservative speculative code motion strategy presented in Section 5.1. Column D shows the result when we apply the profile-driven speculative code motion strategy described in Section 5.2, guided by execution profile data. Column E is provided for comparison purposes only; it shows the results if we perform full speculation, as defined in Section 5, in the PRE of loads and stores. Only column D in the tables uses profile data.

According to the results given by columns C, D and E, full speculation yields the worst performance of the three different speculative code motion strategies. This supports our conviction that it is worth the effort to avoid overdoing speculation. Full speculation yields improvement over no speculation only for loads in 126.gcc and stores in 134.perl. When profile data are unavailable, our conservative speculation strategy yields mixed results compared with no speculation, as indicated by comparing columns B and C. There is no effect on a number of the benchmarks. Where there is a change, the effect is biased towards the negative side.

Table 4: Dynamic counts of scalar loads executed in SPECint95 due to the three different speculation strategies

[Table body not recovered except its geometric-mean row: B 0.958, C 0.962, D 0.952, E 0.987, relative to column A.]

Table 5: Dynamic counts of scalar stores executed in SPECint95 due to the three different speculation strategies

Our conservative speculative code motion strategy can increase the executed operation count if some operations inserted in loop headers would not have been executed in the absence of speculation. In the case of the SPECint95 benchmark suite, this situation seems to arise more often than not.
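A hypothetical fragment (ours, not from the benchmarks) shows the trade-off that makes unguided speculation risky: hoisting a conditional load of the global g into the loop header trades the t loads executed when the guard succeeds for one load per loop entry, and loses exactly when t = 0. The frequency test sketched in the comment is a simplification of the Section 5.2 criterion; we again assume a cannot point to g.

    int g;    /* global scalar; alias analysis shows 'a' cannot point to it */

    int sum_pos(int *a, int n)
    {
        int i, s = 0;
        for (i = 0; i < n; i++)
            if (a[i] > 0)
                s += g;        /* 'g' is loaded only when a[i] > 0 */
        return s;
    }

    int sum_pos_speculated(int *a, int n)
    {
        int i, s = 0;
        int r = g;             /* Speculative load inserted in the loop
                                  header: executed once per call, whether
                                  or not any a[i] > 0.  If the guard
                                  succeeds t times, speculation replaces
                                  t loads by one; it is a net loss only
                                  when t == 0.  Profile-driven speculation
                                  inserts the load only when the profile
                                  shows the guarded access executing at
                                  least as often as the insertion point. */
        for (i = 0; i < n; i++)
            if (a[i] > 0)
                s += r;
        return s;
    }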
When profile data are available, our profile-driven speculative code motion strategy consistently yields the best results. This outcome is indicated in column D, which shows the best number for each benchmark among all the columns in Tables 4 and 5. The data show that our profile-driven speculation strategy is successful in using execution frequency information to avoid over-speculation, speculating only in cases where it is certain of improvement. In programs with complicated control flow like 126.gcc, our profile-driven speculation yields greater improvements. Since inlining augments control flow, we can expect even better results if the benchmarks are compiled with inlining. Overall, profile-driven speculation contributes a 2% further reduction in dynamic load counts and a 0.5% further reduction in dynamic store counts. When profile-driven speculation is applied to other types of operations that can be speculated, we can expect similar reductions in the dynamic counts of those operations. Given the current trend toward hardware support for more types of instructions that can be speculated, we can expect our profile-driven speculation algorithm to play a greater role in improving program performance on newer generations of processors.

8 Conclusion

In this paper, we have presented a pragmatic approach to register promotion by modeling the optimization as two separate problems: the PRE of loads and the PRE of stores. Both of these problems can be solved through a sparse approach to PRE. Since the PRE of loads uses the same algorithm as the PRE of expressions, it can be integrated into an existing implementation of SSAPRE with minimal effort and little impact on compilation time. The PRE of stores uses a different algorithm, SSUPRE, which is the dual of SSAPRE, and is performed after the PRE of loads, taking advantage of the loads' having been converted into pseudo-register references so that there are fewer barriers to the movement of stores. SSUPRE is not as efficient as SSAPRE because the SSA form of its input does not directly facilitate the construction of SSU form. Since register promotion is relevant only to aliased variables and global variables, the number of candidates in each program unit is usually not large. Therefore a sparse, per-variable approach to the problem is justified. In contrast, a bit-vector-based approach takes advantage of the parallelism inherent in bit vector operations, but incurs a larger fixed cost in initializing and operating on the bit vectors over the entire control flow graph. As a result, we have seen very little degradation in compilation time with our sparse approach compared to a bit-vector-based implementation that precedes this work.

We have presented techniques to effect speculative code motion in our sparse PRE framework. In particular, the profile-driven speculation technique enables us to use profile data to control the amount of speculation. These techniques could not be efficiently applied in a bit-vector-based approach to PRE. Our measurement data on the SPECint95 benchmark suite demonstrate that a substantial reduction in load counts is possible by applying our techniques to aliased variables, global variables and indirectly accessed memory locations. There is not a large reduction in store counts, due to earlier optimizations in the compiler. Our study of the different speculative code motion strategies shows that the profile-driven speculative code motion algorithm is the most promising in improving program performance over what can be achieved by partial redundancy elimination alone.

References

[BG97] R. Bodik and R. Gupta. Partial dead code elimination using slicing transformations. In Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation, pages 159-170, June 1997.

[Bri92] P. Briggs. Rematerialization. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 311-321, June 1992.

[CAC+81] G. Chaitin, M. Auslander, A. Chandra, J. Cocke, M. Hopkins, and P. Markstein. Register allocation via coloring. Computer Languages, 6:47-57, January 1981.

[CCK+97] F. Chow, S. Chan, R. Kennedy, S. Liu, R. Lo, and P. Tu. A new algorithm for partial redundancy elimination based on SSA form. In Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation, pages 273-286, June 1997.

[CCL+96] F. Chow, S. Chan, S. Liu, R. Lo, and M. Streich. Effective representation of aliases and indirect memory operations in SSA form. In Proceedings of the Sixth International Conference on Compiler Construction, pages 253-267, April 1996.

[CFR+91] R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and F. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. on Programming Languages and Systems, 13(4):451-490, October 1991.

[CH90] F. Chow and J. Hennessy. The priority-based coloring approach to register allocation. ACM Trans. on Programming Languages and Systems, 12(4):501-536, October 1990.

[Cho83] F. Chow. A portable machine-independent global optimizer - design and measurements. Technical Report 83-254 (PhD Thesis), Computer Systems Laboratory, Stanford University, December 1983.

[CL97] K. Cooper and J. Lu. Register promotion in C programs. In Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation, pages 308-319, June 1997.

[Dha88] D. Dhamdhere. Register assignment using code placement techniques. Journal of Computer Languages, 13(2):75-93, 1988.

[Dha90] D. Dhamdhere. A usually linear algorithm for register assignment using edge placement of load and store instructions. Journal of Computer Languages, 15(2):83-94, 1990.

[DRZ92] D. Dhamdhere, B. Rosen, and K. Zadeck. How to analyze large programs efficiently and informatively. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 212-223, June 1992.

[DS93] K. Drechsler and M. Stadel. A variation of Knoop, Rüthing and Steffen's lazy code motion. SIGPLAN Notices, 28(5):29-38, May 1993.

[FKCX94] L. Feigen, D. Klappholz, R. Casazza, and X. Xue. The revival transformation. In Conference Record of the Twenty-First ACM Symposium on Principles of Programming Languages, pages 147-158, January 1994.

[GBF97a] R. Gupta, D. Berson, and J. Fang. Path profile guided partial dead code elimination using predication. In Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, pages 102-113, November 1997.

[GBF97b] R. Gupta, D. Berson, and J. Fang. Resource-sensitive profile-directed data flow analysis for code optimization. In Proceedings of the 30th Annual International Symposium on Microarchitecture, pages 358-368, December 1997.

[KRS92] J. Knoop, O. Rüthing, and B. Steffen. Lazy code motion. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 224-234, June 1992.

[KRS94a] J. Knoop, O. Rüthing, and B. Steffen. Optimal code motion: Theory and practice. ACM Trans. on Programming Languages and Systems, 16(4):1117-1155, October 1994.

[KRS94b] J. Knoop, O. Rüthing, and B. Steffen. Partial dead code elimination. In Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation, pages 147-158, June 1994.

[LLC96] S. Liu, R. Lo, and F. Chow. Loop induction variable canonicalization in parallelizing compilers. In Proceedings of the Fourth International Conference on Parallel Architectures and Compilation Techniques, pages 228-237, October 1996.

[MR79] E. Morel and C. Renvoise. Global optimization by suppression of partial redundancies. Comm. ACM, 22(2):96-103, February 1979.

[TWL+91] S. Tjiang, M. Wolf, M. Lam, K. Pieper, and J. Hennessy. Integrating scalar optimization and parallelization. In Proceedings of the 4th International Workshop on Languages and Compilers for Parallel Computing, pages 137-151, August 1991.

[Wol96] M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley, 1996.
