THEME ARTICLE: THE TOP 10 ALGORITHMS

THE FORTRAN I COMPILER

The Fortran I compiler was the first demonstration that it is possible to automatically generate efficient machine code from high-level languages. It has thus been enormously influential. This article presents a brief description of the techniques used in the Fortran I compiler for the parsing of expressions, loop optimization, and register allocation.

DAVID PADUA
University of Illinois at Urbana-Champaign

During initial conversations about the topic of this article, it became evident that we can't identify the top compiler algorithm of the century if, as the CiSE editors originally intended, we consider only the parsing, analysis, and code optimization algorithms found in undergraduate compiler textbooks and the research literature. Making such a selection seemed, at first, the natural thing to do, because fundamental compiler algorithms belong to the same class as the other algorithms discussed in this special issue. In fact, fundamental compiler algorithms, like the other algorithms in this issue, are often amenable to formal descriptions and, as a result, to mathematical treatment. However, in the case of compilers, the difficulty is that, paraphrasing John Donne, no algorithm is an island, entire of itself. A compiler's components are designed to work together to complement each other. Furthermore, next to this conceptual objection, there is the very practical issue that we don't have enough information to decide whether any of the fundamental compiler algorithms has had a determinant impact on the quality of compilers.

At the same time, it is almost universally agreed that the most important event of the 20th century in compiling, and in computing, was the development of the first Fortran compiler between 1954 and 1957. By demonstrating that it is possible to automatically generate quality machine code from high-level descriptions, the IBM team led by John Backus opened the door to the Information Age.

The impressive advances in scientific computing, and in computing in general, during the past half century would not have been possible without high-level languages. Although the word algorithm is not usually used in that sense, from the definition it follows that a compiler is an algorithm and, therefore, we can safely say that the Fortran I translator is the 20th century's top compiler algorithm.

The language

The IBM team not only developed the compiler but also designed the Fortran language, and today, almost 50 years later, Fortran is still the language of choice for scientific programming. The language has evolved, but there is a clear family resemblance between Fortran I and today's Fortran 77, 90, and 95.

Fortran's influence is also evident in the most popular languages today, including numerically oriented languages such as Matlab as well as general-purpose languages such as C++ and Java.

Ironically, Fortran has been the target of criticism almost from the beginning, and even Backus voiced serious objections: "'von Neumann languages' [like Fortran] create enormous, unnecessary intellectual roadblocks in thinking about programs and in creating the higher-level combining forms required in a powerful programming methodology."1

Clearly, some language features, such as implicit typing, were not the best possible choices, but Fortran's simple, direct design enabled the development of very effective compilers. Fortran I was the first of a long line of very good Fortran compilers that IBM and other companies developed. These powerful compilers are perhaps the single most important reason for Fortran's success.

The compiler

The Fortran I compiler was fairly small by today's standards. It consisted of 23,500 assembly language instructions and required 18 person-years to develop. Modern commercial compilers might contain 100 times more instructions and require many more person-years to develop. However, its size notwithstanding, the compiler was a very sophisticated and complex program. It performed many important optimizations, some quite elaborate even by today's standards, and it "produced code of such efficiency that its output would startle the programmers who studied it."1

However, as expected, the success was not universal.2 The compiler seemingly generated very good code for regular computations; however, irregular computations, including sparse and symbolic computations, are generally more difficult to analyze and transform. Based on my understanding of the techniques used in the Fortran I compiler, I believe that it did not do as well on these types of computations.
A manifestation of the difficulties with irregular computations is that subscripted subscripts, such as A(M(I,J),N(I,J)), were not allowed in Fortran I.

The compiler's sophistication was driven by the need to produce efficient object code. The project would not have succeeded otherwise. According to Backus:

It was our belief that if Fortran, during its first months, were to translate any reasonable scientific source program into an object program only half as fast as its hand-coded counterpart, the acceptance of our system would be in serious danger.1

The flip side of using novel and sophisticated compiler algorithms was implementation and debugging complexity. Late delivery and many bugs created more than a few Fortran skeptics, but Fortran eventually prevailed:

It gradually got to the point where a program in Fortran had a reasonable expectancy of compiling all the way through and maybe even of running. This gradual change in status from an experimental to a working system was true of most compilers. It is stressed here in the case of Fortran only because Fortran is now almost taken for granted, as if it were built into the hardware.2

Optimization techniques

The Fortran I compiler was the first major project in code optimization. It tackled problems of crucial importance whose general solution was an important research focus in compiler technology for several decades. Many classical techniques for compiler analysis and optimization can trace their origins and inspiration to the Fortran I compiler. In addition, some of the terminology the Fortran I implementers used almost 50 years ago is still in use today. Two of the terms today's compiler writers share with the 1950s IBM team are basic block ("a stretch of program which has a single entry point and a single exit point"3) and symbolic/real registers. Symbolic registers are variable names the compiler uses in an intermediate form of the code to be generated. The compiler eventually replaces symbolic registers with real ones that represent the target machine's registers.

Although more general and perhaps more powerful methods have long since replaced those used in the Fortran I compiler, it is important to discuss Fortran I methods to show their ingenuity and to contrast them with today's techniques.

Parsing expressions

One of the difficulties the designers faced was how to compile arithmetic expressions taking into account the precedence of operators. That is, in the absence of parentheses, exponentiations should be evaluated first, then products and divisions, followed by additions and subtractions. Operator precedence was needed to avoid extensive use of parentheses and the problems associated with them. For example, IT, an experimental compiler completed by A. Perlis and J.W. Smith in 1956 at the Carnegie Institute of Technology,4 did not assume operator precedence.

As Knuth pointed out: "The lack of operator priority (often called precedence or hierarchy) in the IT language was the most frequent single cause of errors by the users of that compiler."5

The Fortran I compiler would expand each operator into a sequence of parentheses. In a simplified form of the algorithm, it would

• replace + and - with ))+(( and ))-((, respectively;
• replace * and / with )*( and )/(, respectively;
• add (( at the beginning of each expression and after each left parenthesis in the original expression; and
• add )) at the end of the expression and before each right parenthesis in the original expression.

Although not obvious, the algorithm was correct, and, in the words of Knuth, "The resulting formula is properly parenthesized, believe it or not."5 For example, the expression A + B * C was expanded as ((A))+((B)*(C)). The translation algorithm then scanned the resulting expression from left to right and inserted a temporary variable for each left parenthesis. Thus, it translated the previous expression as follows:

u1=u2+u4; u2=u3; u3=A; u4=u5*u6; u5=B; u6=C.

Here, variable ui (2 ≤ i ≤ 6) is generated when the (i-1)th left parenthesis is processed. Variable u1 is generated at the beginning to contain the expression's value. These assignment statements, when executed from right to left, will evaluate the original expression according to the operator precedence rules. A subsequent optimization eliminates redundant temporaries. This optimization reduces the code to only two instructions:

u1=A+u4; u4=B*C.

Here, variables A, B, and C are propagated to where they are needed, eliminating the instructions rendered useless by this propagation. In today's terminology, this optimization was equivalent to applying, at the expression level, copy propagation followed by dead-code elimination.6 Given an assignment x = a, copy propagation substitutes a for occurrences of x whenever it can determine it is safe to do so. Dead-code elimination deletes statements that do not affect the program's output. Notice that if a is propagated to all uses of x, x = a can be deleted.

The Fortran I compiler also identified permutations of operations, which reduced memory accesses and eliminated redundant computations resulting from common subexpressions.3
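To see the whole scheme in action, here is a minimal Python sketch of the simplified algorithm just described, together with the temporary-variable translation and the cleanup pass. The function names and single-letter tokens are my own illustrative choices, and the cleanup is the expression-level copy propagation and dead-code elimination discussed above, not the compiler's actual code.

def expand(expr):
    # Step 1: replace each operator with a parenthesis sequence and
    # wrap the expression (and any original parentheses) with (( )).
    out = ["(("]
    for ch in expr:
        if ch in "+-":
            out.append("))" + ch + "((")
        elif ch in "*/":
            out.append(")" + ch + "(")
        elif ch == "(":
            out.append("(((")            # original ( followed by the extra ((
        elif ch == ")":
            out.append(")))")            # the extra )) before the original )
        else:
            out.append(ch)               # an operand (a single letter here)
    out.append("))")
    return "".join(out)

def translate(expanded):
    # Step 2: scan left to right, generating temporary u(i) for the
    # (i-1)th left parenthesis; u1 holds the expression's value.
    counter, rhs = 1, {"u1": []}
    stack, current = [], "u1"
    for ch in expanded:
        if ch == "(":
            counter += 1
            temp = "u%d" % counter
            rhs[current].append(temp)    # the new temporary feeds its parent
            rhs[temp] = []
            stack.append(current)
            current = temp
        elif ch == ")":
            current = stack.pop()
        else:
            rhs[current].append(ch)      # operand or operator
    return {t: "".join(v) for t, v in rhs.items()}

def simplify(assigns):
    # Step 3: copy propagation plus dead-code elimination. An assignment
    # whose right-hand side contains no operator is a plain copy:
    # substitute it everywhere, then delete it. (This string-based
    # substitution assumes temporary names do not collide.)
    for temp, value in list(assigns.items()):
        if temp != "u1" and not any(op in value for op in "+-*/"):
            del assigns[temp]
            assigns = {t: v.replace(temp, value) for t, v in assigns.items()}
    return assigns

print(expand("A+B*C"))                      # ((A))+((B)*(C))
print(translate(expand("A+B*C")))           # u1=u2+u4, u2=u3, ..., u6=C
print(simplify(translate(expand("A+B*C")))) # u1=A+u4, u4=B*C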
It is interesting to contrast the parsing algorithm of Fortran I with more advanced parsing algorithms developed later on. These algorithms, which are much easier to understand, are based on a syntactic representation of expressions such as:7

expression = term [ [ + | - ] term ]...
term = factor [ [ * | / ] factor ]...
factor = constant | variable | ( expression )

Here, a factor is a constant, variable, or expression enclosed by parentheses. A term is a factor possibly followed by a sequence of factors separated by * or /, and an expression is a term possibly followed by a sequence of terms separated by + or -. The precedence of operators is implicit in the notation: terms (sequences of products and divisions) must be formed before expressions (sequences of additions and subtractions). When represented in this manner, it is easy to build a recursive descent parser with a routine associated with each type of object, such as term or factor. For example, the routine associated with term will be something like

procedure term(){
  call factor()
  while token is * or / {
    get next token
    call factor()
  }
}

Multiplication and division instructions could be generated inside the while loop (and addition or subtraction in a similar routine written to represent expressions) without redundancy, thus avoiding the need for copy-propagation or dead-code-elimination optimization within an expression.
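For concreteness, the pseudocode above can be rendered as a small, runnable recursive descent parser. The Python below is only a sketch of this later technique; the single-character tokens and the tuple representation of the parse are illustrative assumptions, not part of any particular compiler.

class Parser:
    # One routine per kind of object (expression, term, factor);
    # precedence is implicit: a term is completed before an expression.
    def __init__(self, text):
        self.tokens = list(text.replace(" ", ""))
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):                 # term [ [+|-] term ]...
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.advance(), node, self.term())
        return node

    def term(self):                       # factor [ [*|/] factor ]...
        node = self.factor()
        while self.peek() in ("*", "/"):
            node = (self.advance(), node, self.factor())
        return node

    def factor(self):                     # constant | variable | (expression)
        tok = self.advance()
        if tok == "(":
            node = self.expression()
            assert self.advance() == ")"  # consume the closing parenthesis
            return node
        return tok

print(Parser("A+B*C").expression())       # ('+', 'A', ('*', 'B', 'C'))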

DO loop optimizations and subscript computations

One of the Fortran I compiler's main objectives was "to analyze the entire structure of the program in order to generate optimal code from DO statements and references to subscripted variables."1

For example, the address of the Fortran array element A(I,J,c3*K+6) could take the form

base_A + I-1 + (J-1)*di + (c3*K+6-1)*di*dj

where di and dj are the lengths of the first two dimensions of A, and these two values as well as the coefficient c3 are assumed to be constant. Clearly, address expressions such as this can slow down a program if not computed efficiently.

It is easy to see that there are constant subexpressions in the address expression that can be incorporated in the address of the instruction that makes the reference.3 Thus, an instruction making reference to the previous array element could incorporate the constant base_A+(6-1)*di*dj-di-1. It is also important to evaluate the variant part of the expression as efficiently as possible. The Fortran I compiler used a pattern-based approach to achieve this goal. For the previous expression, every time "K is increased by n (under control of a DO), the index quantity is increased by c3didjn, giving the correct new value."3
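To make this concrete, here is a small Python sketch of the transformation; the numeric values of base_A, di, dj, c3, I, and J are made up for the demonstration, and the code is a reconstruction of the idea rather than the compiler's actual pattern matcher. The constant part of the address is folded once, and the variant part is updated by a single addition each time K steps.

base_A, di, dj, c3 = 1000, 10, 20, 4      # made-up constants for the demo
I, J = 3, 5                               # subscripts held fixed in this loop

def address_naive(K):
    # Direct evaluation: two multiplications on every access.
    return base_A + (I - 1) + (J - 1)*di + (c3*K + 6 - 1)*di*dj

# The constant subexpression folded into the instruction, as in the text:
const = base_A + (6 - 1)*di*dj - di - 1

# Strength-reduced loop: the index quantity starts at its value for K = 1
# and is increased by c3*di*dj*n whenever K is increased by n (here n = 1).
step = c3*di*dj
addr = const + I + J*di + step*1          # address for K = 1
for K in range(1, 6):                     # DO K = 1, 5
    assert addr == address_naive(K)       # same address, no multiplications
    addr += step                          # K increased by 1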
Today's compilers apply removal of loop invariants, induction-variable detection, and strength reduction to accomplish similar results.6,8 The idea of induction-variable detection is to identify those variables within a loop that assume a sequence of values forming an arithmetic sequence. After identifying these induction variables, strength reduction replaces multiplications of induction-variable and loop-invariant values with additions. The Fortran I compiler applied, instead, a single transformation that simultaneously moved subexpressions to the outermost possible level and applied strength reduction. A limitation of the Fortran I compiler, with respect to modern methods, was that it only recognized loop indices as induction variables:

It was decided that it was not practical to track down and identify linear changes in subscripts resulting from assignment statements. Thus, the sole criterion for linear changes, and hence for efficient handling of array references, was to be that the subscripts involved were being controlled by DO statements.1

Register allocation

The IBM 704, the Fortran I compiler's target machine, had three index registers. The compiler applied register allocation strategies to reduce the number of load instructions needed to bring values to the index registers. The compiler section that Sheldon Best designed, which performed index-register allocation, was extremely complex and probably had the greatest influence on later compilers. Indeed, seven years after Fortran I was delivered, Saul Rosen wrote:

Part of the index register optimization fell into disuse quite early, but much of it was carried along into Fortran II and is still in use on the 704/9/90. In many programs it still contributes to the production of better code than can be achieved on the new Fortran IV compiler.2

The register allocation section was preceded by another section whose objective was to create what today is called a control-flow graph. The nodes of this graph are basic blocks, and its arcs represent the flow of execution. Absolute execution frequencies were computed for each basic block using a Monte Carlo method and the information provided by Frequency statements. Fortran I programmers had to insert the Frequency statements in the source code to specify the branching probability of IF statements, computed GOTO statements, and average iteration counts for DO statements that had variable limits.9

Compilers have used Frequency information for register allocation and for other purposes. However, modern compilers do not rely on programmers to insert information about frequency in the source code. Modern register allocation algorithms usually estimate execution frequency using syntactic information such as the level of nesting. When compilers used actual branching frequencies, as was the case with the Multiflow compiler,10 they obtained the information from actual executions of the source program.
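The following Python sketch illustrates the flavor of this frequency estimation. The control-flow graph, the branch probabilities (standing in for what a Frequency statement would supply), and all names are invented for the example; this is not the Fortran I implementation.

import random
from collections import Counter

# Toy control-flow graph: each basic block lists (successor, probability)
# pairs, standing in for programmer-supplied branching probabilities.
cfg = {
    "entry": [("loop", 1.0)],
    "loop":  [("then", 0.7), ("else", 0.3)],   # an IF inside the loop
    "then":  [("latch", 1.0)],
    "else":  [("latch", 1.0)],
    "latch": [("loop", 0.9), ("exit", 0.1)],   # about ten iterations
    "exit":  [],
}

def simulate(max_steps=1000):
    # Trace one random walk through the graph, counting block executions;
    # the walk is cut off after a fixed number of steps, as in the text.
    counts, block = Counter(), "entry"
    for _ in range(max_steps):
        counts[block] += 1
        if not cfg[block]:
            break
        r, acc = random.random(), 0.0
        for succ, p in cfg[block]:
            acc += p
            if r < acc:
                break
        block = succ    # the last successor serves as a rounding fallback
    return counts

# Repeat several times "in an effort to secure reliable frequency statistics".
runs = [simulate() for _ in range(100)]
print({b: sum(r[b] for r in runs) / len(runs) for b in cfg})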

Although the Monte Carlo algorithm delivered the necessary results, not everybody liked the strategy:

The possibility of solving the simultaneous equations determining path frequency in terms of transition frequency using known methods for solving sparse matrix equations was considered, but no methods which would work in the presence of DO-loops and assigned GOTO statements [were] hit upon, although IF-type branches alone could have been handled without explicit interpretation. The frequency estimating simulation traced the flow of control in the program through a fixed number of steps, and was repeated several times in an effort to secure reliable frequency statistics. Altogether an odd method!9

With the estimated value of execution frequency at hand, the compiler proceeded to create connected regions, similar to the traces used many years later in the Multiflow compiler. Regions were created iteratively. In each iteration, the basic blocks of the control-flow graph were scanned to find the one with the highest absolute execution frequency. Then, working backwards and forward, a chain was formed by following the branches with the highest probability of execution as specified in the Frequency statements. Then, registers were allocated in the new region

… by simulating the action of the program. Three cells are set aside to represent the object machine index registers. As each new tagged instruction is encountered, these cells are examined to see if one of them contains the required tag; if not, the program is searched ahead to determine which of the index registers is the least undesirable to replace.3
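A schematic Python rendering of this look-ahead search appears below. It keeps three cells for the 704's index registers and, on a miss, evicts the register whose tag is next needed farthest in the future. The function and tag names are mine, and this is a reconstruction of the rule the quotation describes, not Best's code.

def allocate(tags, num_registers=3):
    # tags: the sequence of symbolic index registers that successive
    # tagged instructions require. Returns the number of loads issued.
    registers, loads = [], 0
    for i, tag in enumerate(tags):
        if tag in registers:              # a cell already holds this tag
            continue
        loads += 1                        # must load the tag into a register
        if len(registers) < num_registers:
            registers.append(tag)
            continue
        # Search ahead for the least undesirable register to replace:
        # the one whose tag is reused most remotely, or never again.
        def next_use(r):
            future = tags[i + 1:]
            return future.index(r) if r in future else float("inf")
        victim = max(registers, key=next_use)
        registers[registers.index(victim)] = tag
    return loads

# Symbolic index registers i1..i4 required by successive instructions:
print(allocate(["i1", "i2", "i3", "i1", "i4", "i1", "i2"]))   # -> 4 loads
# To mimic the loop treatment described below (a loop concatenated with a
# copy of itself so that look-ahead sees the next iteration), one can run
# the same procedure on a body concatenated with itself.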

The new regions could connect with old regions and subsume them into larger regions.

In processing a new path connecting two previously disconnected regions, register usage was matched by permuting all the register designations of one region to match those of the other as necessary.9

The process of dealing with loops was somewhat involved:

In processing a new path linking a block to itself and thus defining a loop, the loop was first considered to be concatenated with a second copy of itself, and straight-line register allocation carried out in normal fashion through the first of the two copies, with look-ahead extending into the second copy. … Straight-line allocation was carried out for a second loop copy in essentially normal fashion.9

The only difference was that the look-ahead procedure employed during this allocation was a version of the original look-ahead procedure modified to account for register reuse across loop iterations. Finally, "the allocation produced for the second loop was that ultimately used in generating machine code."9

The "least undesirable" register the look-ahead procedure identified was one whose value was dead or, if all registers were live, the one reused most remotely within the region. This strategy is the same as that proved optimal by Laszlo A. Belady in a 1965 paper on page replacement strategies.11 Belady's objective was to minimize the number of page faults; as a result, the algorithm is optimal "as long as one is concerned only with minimizing the number of loads of symbolic indexes into actual registers and not with minimizing the stores of modified indexes."9

The goal, of course, was not to prove or even achieve optimality of the register allocation algorithm. In fact,

[i]n order to simplify the index register allocation, it was implicitly assumed that calculations were not to be reordered. The contrary assumption would have introduced a new order of difficulty into the allocation process, and required the abstraction of additional information from the program to be processed.9

This assumption meant that the result is not always optimal because, in some cases, "… there is much advantage to be had by reordering computations."7 Nevertheless, "… empirically, Best's 1955–1956 procedure appeared to be optimal."1

During the last decade, the relative importance of traditional programming languages as the means to interact with computers has rapidly declined. The availability of powerful interactive applications has made it possible for many people to use computers without needing to write a single line of code. Although traditional programming languages and their compilers are still necessary to implement these applications, this is bound to change. I do not believe that 100 years hence computers will still be programmed the same way they are today. New applications-development technology will supersede our current strategies that are based on conventional languages. Then, the era of compilers that Fortran I initiated will come to an end.

Technological achievements are usually of interest for a limited time only. New techniques or devices rapidly replace old ones in an endless cycle of progress. All the techniques used in the Fortran I compiler have been replaced by more general and effective methods. However, Fortran I remains an extraordinary achievement that will forever continue to impress and inspire.

Acknowledgments
This work was supported in part by US Army contract N66001-97-C-8532; NSF contract ACI98-70687; and Army contract DABT63-98-1-0004. This work is not necessarily representative of the positions or policies of the Army or Government.

References
1. J. Backus, "The History of Fortran I, II, and III," IEEE Annals of the History of Computing, Vol. 20, No. 4, 1998.
2. S. Rosen, "Programming Systems and Languages—A Historical Survey," Proc. Eastern Joint Computer Conf., Vol. 25, 1964, pp. 1–15; reprinted in Programming Systems and Languages, S. Rosen, ed., McGraw-Hill, New York, 1967, pp. 3–22.
3. J.W. Backus et al., "The Fortran Automatic Coding System," Proc. Western Joint Computer Conf., Vol. 11, 1957, pp. 188–198; reprinted in Programming Systems and Languages, S. Rosen, ed., McGraw-Hill, New York, 1967, pp. 29–47.
4. D.E. Knuth and L.T. Pardo, "Early Development of Programming Languages," Encyclopedia of Computer Science and Technology, Vol. 7, Marcel Dekker, New York, 1977, p. 419.
5. D.E. Knuth, "A History of Writing Compilers," 1962; reprinted in Compiler Techniques, B.W. Pollack, ed., Auerbach Publishers, Princeton, N.J., 1972, pp. 38–59.
6. A.V. Aho, R. Sethi, and J.D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, Reading, Mass., 1988.
7. W.M. McKeeman, "Compiler Construction," Compiler Construction: An Advanced Course, F.L. Bauer and J. Eickel, eds., Lecture Notes in Computer Science, Vol. 21, Springer-Verlag, Berlin, 1976.
8. S.S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, San Francisco, 1997.
9. J. Cocke and J.T. Schwartz, Programming Languages and Their Compilers, Courant Inst. of Mathematical Sciences, New York Univ., New York, 1970.
10. J.A. Fisher, "Trace Scheduling: A Technique for Global Microcode Compaction," IEEE Trans. Computers, Vol. C-30, No. 7, July 1981, pp. 478–490.
11. L.A. Belady, "A Study of Replacement Algorithms for a Virtual-Storage Computer," IBM Systems J., Vol. 5, No. 2, 1966, pp. 78–101.

David Padua is a professor of computer science at the University of Illinois, Urbana-Champaign. His interests are in compiler technology, especially for parallel computers, and machine organization. He is a member of the ACM and a fellow of the IEEE. Contact him at the Dept. of Computer Science, 3318 Digital Computer Laboratory, 1304 W. Springfield Ave., Urbana, IL 61801; [email protected]; polaris.cs.uiuc.edu/~padua.
