Fighting Register Pressure in GCC

Vladimir N. Makarov
Red Hat
[email protected]
GCC Developers' Summit 2004

Abstract

The necessity of decreasing register pressure in compilers is discussed. Various approaches to decreasing register pressure in compilers are given, including different algorithms for register live range splitting, register rematerialization, and register pressure sensitive instruction scheduling before register allocation.

Some of the mentioned algorithms were tried and rejected. Implementation of the rest, including region based register live range splitting and rematerialization driven by the register allocator, is in progress and will probably become part of GCC. The effects of the discussed optimizations will be reported. Possible directions for improving register allocation in GCC will be given.

Introduction

Modern computers have several levels of storage. The faster the storage, the smaller its size. This is the consequence of a trade-off between the speed of the computer and its price. The fastest storage units are registers (or hard registers). There are not enough of them to store the values of operations and directly referenced variables for any serious program.

It is very hard to force any optimization in a compiler (especially in a portable one) to use the hard registers effectively. Therefore most compiler optimizations are written as if there were an infinite number of virtual registers, called pseudo-registers. The optimizations use them to store intermediate values and the values of small variables. There is also an unconventional approach that uses only memory to store the values. For both approaches we need a special pass (or optimization) to map pseudo-registers onto hard registers and memory; for the second approach we need to map memory onto hard registers instead, because most instructions work with hard registers. This pass is called register allocation.

A good register allocator has become a very significant component of an optimizing compiler because the gap between access times to registers and to first level memory (cache) is widening for high-end processors. Many optimizations (especially inter-procedural and SSA-based ones) tend to create lots of pseudo-registers. The number of hard registers stays the same because it is a part of the architecture. Even processors with new architectures containing more hard registers need a good register allocator (although to a lesser degree) because the programs run on these computers tend to be more complicated too.

The register allocator is one or more compiler components that can be viewed as solving two major tasks (mostly in an integrated way). The first and most interesting one is to decrease register pressure, by different transformations, to the level defined by the number of hard registers. The second one is to assign hard registers to pseudo-registers effectively.

So what is register pressure? There are two commonly used definitions. The wider one is the number of hard registers needed to store the values of the pseudo-registers at a given program point. The other one is the number of live pseudo-registers.

There are a lot of known transformations that decrease register pressure. Some of these transformations generate code which could and should be corrected later. Some transformations are easily and naturally integrated with other transformations, such as the ones decreasing register pressure, assigning hard registers, and fixing the pitfalls of the previous transformations (such as register coalescing in a colouring based register allocator). Some of them are hard to integrate in one pass.

Currently GCC has two register allocators. The new one was written about two years ago and is described in detail in [Matz03]. It is based on the Chaitin, Briggs, and Appel approaches to register allocation [Chaitin81, Briggs94, Appel96].

The old register allocator (I will call it the original register allocator) has existed since the very first version of GCC. It was written by Richard Stallman. Some of its important components have stayed practically unchanged since the first version. Richard Stallman took the register allocator design from a portable Pastel (an extension of the programming language Pascal) compiler written in Livermore Laboratories [Stallman04]. The design of the Pastel register allocator (which actually was a second version for the Pastel compiler) is very similar to the GCC one [Killian04]: both have the same separation into a pass assigning hard registers to pseudo-registers and a pass which actually changes the code following the assignment and, if that is not possible, generates additional instructions to reload the registers.

Despite its lack of many modern optimizations present in the new register allocator, the original register allocator can easily compete with the new one in terms of compiler speed, quality of the generated code, and code size. This was a major reason for me to start work on improving the original register allocator. After thorough investigation, I found that its method of assigning hard registers is very similar to the priority based colouring register allocator [Chow84, Chow90], although it is more similar to the modifications described in [Sorkin96]. This was confirmed later.

Chow's approach is a real competitor to the Chaitin/Briggs approach. Some advantages of Chow's approach are acknowledged even by Preston Briggs [Briggs89]. Chow's algorithm is used in the SGI Pro64 [Pro64] compiler and derived compilers like Open64 [Open64] and ORC [ORC]. For example, like Briggs' optimistic colouring, Chow's algorithm easily finds hard registers for the diamond conflict graph (see Figure 1).

[Figure 1: Diamond graph (nodes a, b, c, d)]

All that was mentioned above was a major motivation to start work on improving the original register allocator. This article is focused on improving the original GCC register allocator. The first section describes the original GCC register allocator. The second section describes the method for decreasing register pressure for the original register allocator based on register live range splitting. The third section describes decreasing register pressure based on the live range shrinking approach. The fourth section describes other possible improvements to the original register allocator. The fifth section gives conclusions from my work.

1 The original register allocator in GCC

The original register allocator contains a lot of passes. Figure 2 describes the major passes and their order.

[Figure 2: The original register allocator. Passes shown: regmove (regmove.c), insn scheduler, regclass (regclass.c), local allocator (local-alloc.c), global allocator (global.c), reload (reload1.c, reload.c), post-reload (postreload.c); one edge in the figure is labelled retry_global.]

The regmove pass is usually not considered to be a part of the original register allocator. I included it because the pass mainly solves one task (register coalescing) peculiar to register allocators. The pass removes some register moves if the registers have the same value and this can be determined within the scope of a basic block. The major task of regmove, however, is to generate move instructions that satisfy two-operand instruction constraints, where the destination and source registers must be the same. The reload pass can solve this task too, but less effectively.

If register coalescing and global value numbering (mentioned in Section 4) become a part of GCC, we could try to remove register coalescing from this pass.

The instruction scheduler is not a part of the original register allocator. It is present just to show GCC's major passes starting with the regmove pass. The instruction scheduler could, however, solve the task of decreasing register pressure (see section "register pressure sensitive instruction scheduling").

Regclass. GCC has a very powerful model for describing the target processor's register file. In this model there is the notion of a register class. A register class is a set of hard registers. You can describe as many register classes as you like. Of course, they should reflect the target processor's register file. For example, some instructions can accept only a subset of all registers. In this case you should define a register class for the subset. Any relations are possible between different register classes: they can intersect, or one register class can be a subset of another register class (there are reserved register classes like NO_REGS, which does not contain any register, and ALL_REGS, which contains all registers).

The pass regclass (file regclass.c) mainly finds the preferred and alternative register classes for each pseudo-register. The preferred class is the smallest class containing the union of all register classes which result in the minimal cost of their usage for the given pseudo-register. The alternative class is the smallest class containing the union of all register classes whose usage is still more profitable than memory (the class NO_REGS is used for the alternative if there are no such registers besides the ones in the preferred class).

It is interesting to note that the pass also implicitly does code selection. Regclass works in two passes. On the first pass, it defines the preferred and alternative classes without taking the possible classes of other operands into account. For example, an instruction with two operand pseudo-registers exists in two variants: one accepting classes A and B, and the other one accepting C and D.

[...]

The global allocator also tries to do this in a less general way. The local allocator also performs a simple copy and constant propagation. It is implemented in the function update_reg_equiv.

Actually all hard registers could be assigned in the global allocator. Such division between the local and global allocator has historical roots. In my opinion it is reasonable to remove the local allocator in the future because faster allocation of local pseudo-registers does not [...]
