Proceedings of the ACM SIGPLAN '84 Symposium on Compiler Construction, SIGPLAN Notices Vol. 19, No. 6, June 1984

Peep - An Architectural Description Driven Peephole Optimizer

Robert R. Kessler 1
Portable AI Support Systems Project
Department of Computer Science
University of Utah
Salt Lake City, Utah 84112

Abstract

Peep is an architectural description driven peephole optimizer that is being adapted for use in the Portable Standard Lisp compiler. Tables of optimizable instructions are generated prior to the creation of the compiler from the architectural description of the target machine. Peep then performs global flow analysis on the target machine code and optimizes instructions as defined in the table. This global flow analysis allows optimization across basic blocks of instructions, and the use of tables created at compiler-generation time minimizes the overhead of discovering optimizable instructions.

1. Introduction

Peep is functionally similar to traditional peephole optimizers [12, 11]. That is, its purpose is to pass over the target machine code produced by a compiler, eliminating redundant operations and combining instructions into more efficient ones. The term "peephole" was derived from the fact that the optimizer only looks at a small local window (peephole) of adjacent machine instructions when searching for optimizations and does not use any global knowledge of the program. For example, a typical peephole optimization would be to combine the two-instruction sequence of load 1 into register X and add X to Y, into an increment of Y. Peep is different from traditional peephole optimizers in two important ways: first, instead of being hand written it is automatically generated from an architectural description of the target machine; and second, it performs global flow analysis [14] over the code to relax the adjacency constraint and allow optimization across basic blocks (instruction sequences with one entrance and one exit). However, it is a restricted global flow analysis, because even though it uses the global information it still performs "local" peephole transformations, e.g. it doesn't do code motion and loop invariant removal.

Peep is currently being integrated into the Portable Standard Lisp (PSL) compiler [9]. The PSL environment is an ideal place to utilize Peep, since its compiler needs a new optimizer for each target machine. (Currently PSL is supported on DecSystem-20, DEC Vax, Motorola 68000, Cray-1 and IBM-370.) The PSL compiler generates code by translating the input language into a sequence of virtual machine instructions, which are then macro expanded into the target machine "LAP" instructions. These instructions are then translated into binary code for subsequent direct loading into the running image, or into a FASL file for later loading. Peep has been inserted into the compiler as a LAP to LAP pass. This provides a good environment to experiment with Peep, allowing easy comparison of optimized and unoptimized code sequences. We have chosen the Motorola MC68000 [13] as the first target machine to which we will apply the Peep optimizations. The primary reason is that the architecture is fairly contemporary and offers a wide range of different addressing modes. Also, in looking over a number of generated code sequences, we observed quite a few instructions that would benefit from Peep optimizations.

We begin with a review of the first significant machine independent peephole optimizer, PO, developed by Davidson and Fraser [7, 5] and its latest version [4, 6]. PO and Peep have recently grown closer in their functionality and will be used for comparison. We follow this comparison with a discussion of the previous versions of Peep and a brief description of the machine description language that is used in Peep. That is followed by a discussion of the two main parts of Peep: 1) the Peep Table Generator (PTG), which performs the analysis of the target machine; and 2) the Peep optimizer (embedded in the compiler) which first performs a global flow analysis upon the LAP code and then performs optimizations as specified by the tables produced by the PTG. We conclude with a complexity comparison with PO, the current status of Peep and finally, directions for future Peep research.

1 Work supported in part by the Hewlett Packard Company, International Business Machines Corporation and the National Science Foundation under Grant Numbers MCS81-21750 and MCS82-04247.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

©1984 ACM 0-89791-139-3/84/0600/0106 $00.75

2. Previous Peephole Optimizers

In the past, peephole optimization has been an ad-hoc technique, customized for each compiler, usually added as an afterthought when it was discovered that, for certain localized cases, the compiler was not generating the most efficient code possible. The first serious work attempting to formalize this type of optimization was conducted by Davidson and Fraser [7, 5]. They defined PO, a peephole optimizer written in SNOBOL that used a target machine description (written in the Instruction Set Processor language ISP [3, 2]) to create a machine independent peephole optimizer. PO performed the following steps:

1. If PO operates directly upon an input assembly code, it first translates each instruction into a list of equivalent semantic equations (derived from the ISP description). When PO is placed directly into a compiler, the compiler can emit the semantic equations, and eliminate this step;

2. PO then scans backward through the code, eliminating dead register assignments (e.g. if a register is set to some value in one instruction and then set again in the next, the first assignment is eliminated);

3. PO then attempts to combine the two semantic equations of lexically adjacent instructions within a basic block. This is performed by substituting referenced resources in the second instruction with the values of the resources in the first instruction. The combined semantic equation is then used to search the machine instruction descriptions for an instruction that performs the equivalent operation;

4. PO performs one other operation. When it finds a label, it searches to determine if that label is still referenced (the reference could have been optimized away). If it is not, PO removes the label and attempts to combine the labeled instruction with the previous one.

PO was enhanced in late 1981 and has been described in Davidson's dissertation [4] and POPL-9 [6]. The enhancements included a reimplementation in C and a new technique for optimizing "logically adjacent" instructions instead of lexically adjacent ones. PO uses a simple flow analysis upon each basic block of code, to determine where resources are accessed. These are linked together into a set of lists of related instructions (related by access to the various machine resources). Each list is then scanned for possible optimizations. Thus, even though there may be a lexically intervening instruction it will not be logically included in a particular list and may be ignored by the optimizer. This is an excellent technique, since it requires only one pass over the code to create the lists and one more pass over each list to find the optimizations.

PO is a major advance over the previous peephole optimization techniques, mainly because it is target machine independent. Both PO and Peep have many areas of commonality, including [...] compiler. The optimizer performs flow analysis, and upon finding an instruction pair that is in the table, performs the optimization.

3. Early Versions of Peep

Peep has undergone a number of changes in its evolutionary cycle, from its beginnings in the COG system [10]. The general structure still remains the same, consisting of the Peep Table Generator, which derives optimization tables at compiler generation time, and the actual Peep optimizer that utilizes these tables to perform the optimizations within the compiler. Originally, the PTG derived two types of optimization tables:

Cancellation: Pairs of instructions that could both be eliminated because their effect was to cancel each other (for example, a PUSH followed by a POP); and

Compression: Pairs of instructions that could be compressed into a single more efficient instruction (for example, load a 1 and add, which compresses into an increment).

Peep then took these tables and laboriously scanned the input code searching for optimizable instructions. It checked the resources accessed by each instruction, and ignored those that would not conflict with the optimization pair being searched for. This algorithm could potentially perform many passes over the code, resulting in a complexity measure of N^2.

After investigating PO, it was decided that flow analysis was a better technique for performing the optimization pass in Peep. Flow analysis was both a faster technique (order N within a basic block) and eliminated the necessity of deriving the cancellation pairs in the PTG (the flow analysis can do dead register elimination, which is functionally equivalent to the cancellation pairs, using only the knowledge of which resources each instruction accesses). Finally, it was decided that with only a little more work, global flow analysis could be added to allow optimizations across basic blocks. This addition allows many more optimizations including dead "resource" elimination. This has resulted in the current version of Peep, which has a simplified PTG, and uses global flow analysis.

4. Target Machine Description

Peep utilizes a target machine description that is Lisp based. This allows easy expression of the constructs, and maximal flexibility in writing the definition (by allowing the machine definer to write Lisp macros where needed). It is also in the spirit of the PSL compiler and system, in which nearly all parts of the system are written in PSL itself.
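
Section 3 notes that dead register elimination needs only the knowledge of which resources each instruction accesses. The following Python sketch illustrates how little information that is. It is not Peep's or PO's actual implementation: the Instr record, the make helper, the register names, the instruction strings and the live_out set are all hypothetical, and the sketch handles a single basic block and ignores condition codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: its text plus the resources it accesses."""
    text: str
    reads: frozenset
    writes: frozenset
    side_effects: bool = False  # assumed flag for stores, calls, branches


def make(text, reads=(), writes=(), side_effects=False):
    """Convenience constructor (hypothetical helper, for illustration only)."""
    return Instr(text, frozenset(reads), frozenset(writes), side_effects)


def eliminate_dead_assignments(block, live_out):
    """Scan one basic block backward and drop dead assignments.

    An instruction is dead when it has no side effects and none of the
    resources it writes is read later before being overwritten.
    """
    live = set(live_out)  # resources still needed after the current point
    kept = []
    for instr in reversed(block):
        is_dead = (instr.writes
                   and not instr.side_effects
                   and not (instr.writes & live))
        if is_dead:
            continue
        live -= instr.writes  # values written here are not needed earlier
        live |= instr.reads   # values read here are needed earlier
        kept.append(instr)
    kept.reverse()
    return kept


block = [
    make("move #1,d0", writes=["d0"]),
    make("move #2,d0", writes=["d0"]),  # overwrites d0 before any read
    make("add d0,d1", reads=["d0", "d1"], writes=["d1"]),
    make("move #0,d2", writes=["d2"]),  # d2 is assumed dead at block exit
]
optimized = eliminate_dead_assignments(block, live_out={"d1"})
for instr in optimized:
    print(instr.text)
```

With d1 as the only resource assumed live at the end of the block, the first and last instructions are removed. The single backward pass is linear in the block length, which matches the order N behavior within a basic block that Section 3 attributes to flow analysis.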