Control-Flow Analysis of Higher-Order Languages
or
Taming Lambda

Olin Shivers
May 1991
CMU-CS-91-145

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

© 1991 Olin Shivers

This research was sponsored by the Office of Naval Research under Contract N00014-84-K-0415. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of ONR or the U.S. Government.

Keywords: data-flow analysis, Scheme, LISP, ML, CPS, type recovery, higher-order functions, functional programming, optimising compilers, denotational semantics, non-standard abstract semantic interpretations

Abstract

Programs written in powerful, higher-order languages like Scheme, ML, and Common Lisp should run as fast as their FORTRAN and C counterparts. They should, but they don't. A major reason is the level of optimisation applied to these two classes of languages. Many FORTRAN and C compilers employ an arsenal of sophisticated global optimisations that depend upon data-flow analysis: common-subexpression elimination, loop-invariant detection, induction-variable elimination, and many, many more. Compilers for higher-order languages do not provide these optimisations. Without them, Scheme, LISP and ML compilers are doomed to produce code that runs slower than their FORTRAN and C counterparts.

The problem is the lack of an explicit control-flow graph at compile time, something which traditional data-flow analysis techniques require. In this dissertation, I present a technique for recovering the control-flow graph of a Scheme program at compile time. I give examples of how this information can be used to perform several data-flow analysis optimisations, including copy propagation, induction-variable elimination, useless-variable elimination, and type recovery.
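The core difficulty named above, that in a higher-order program the callee at a call site is itself a value computed by data flow, can be made concrete with a toy analysis. The sketch below is emphatically not the dissertation's CPS-based algorithm; it is a minimal constraint-based, monovariant flow analysis over a bare lambda calculus, written in Python for illustration, with hypothetical names (`zero_cfa`, `Var`, `Lam`, `App`). It computes, for every subexpression, the set of lambda terms that may flow there, which is exactly the information needed to turn implicit control flow into an explicit call graph.

```python
from dataclasses import dataclass

# A bare lambda-calculus AST. Frozen dataclasses are hashable, so
# subexpressions can serve directly as keys in the flow cache.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def zero_cfa(program):
    """Monovariant flow analysis: returns a cache mapping each
    subexpression (and each variable name) to the set of Lam terms
    that may flow to it."""
    cache = {}

    def flows(key):
        return cache.setdefault(key, set())

    # Collect every subexpression of the program.
    exprs = []
    def walk(e):
        exprs.append(e)
        if isinstance(e, Lam):
            walk(e.body)
        elif isinstance(e, App):
            walk(e.fn)
            walk(e.arg)
    walk(program)

    changed = True
    def add(key, vals):
        nonlocal changed
        new = vals - flows(key)
        if new:
            cache[key] |= new
            changed = True

    # Iterate the flow constraints to a fixed point.
    while changed:
        changed = False
        for e in exprs:
            if isinstance(e, Lam):
                add(e, {e})                       # a lambda evaluates to itself
            elif isinstance(e, Var):
                add(e, flows(e.name))             # a reference yields its binding's values
            elif isinstance(e, App):
                for lam in list(flows(e.fn)):     # each possible callee...
                    add(lam.param, flows(e.arg))  # ...receives the argument's values
                    add(e, flows(lam.body))       # ...and returns its body's values
    return cache
```

For `((lambda (x) x) (lambda (y) y))`, the analysis discovers that exactly one lambda flows to the operator position, so the call site has a single known callee: the implicit control transfer has been recovered. Being monovariant, the sketch merges all flows into a given variable, in the same spirit as the 0CFA abstraction of Chapter 3.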
The analysis is defined in terms of a non-standard semantic interpretation. The denotational semantics is carefully developed, and several theorems establishing the correctness of the semantics and the implementing algorithms are proven.

To my parents, Julia and Olin.

Contents

Acknowledgments
1 Introduction
  1.1 My Thesis
  1.2 Structure of the Dissertation
  1.3 Flow Analysis and Higher-Order Languages
2 Basic Techniques: CPS and NSAS
  2.1 CPS
  2.2 Control Flow in CPS
  2.3 Non-Standard Abstract Semantic Interpretation
3 Control Flow I: Informal Semantics
  3.1 Notation
  3.2 Syntax
  3.3 Standard Semantics
  3.4 Exact Control-Flow Semantics
  3.5 Abstract Control-Flow Semantics
  3.6 0CFA
  3.7 Other Abstractions
  3.8 Semantic Extensions
4 Control Flow II: Detailed Semantics
  4.1 Patching the Equations
  4.2 Abstraction Functions
  4.3 Existence
  4.4 Properties of the Semantic Functions
5 Implementation I
  5.1 The Basic Algorithm
  5.2 Exploiting Monotonicity
  5.3 Time-Stamp Approximations
  5.4 Particulars of My Implementation
6 Implementation II
  6.1 Preliminaries
  6.2 The Basic Algorithm
  6.3 The Aggressive-Cutoff Algorithm
  6.4 The Time-Stamp Algorithm
7 Applications I
  7.1 Induction-Variable Elimination
  7.2 Useless-Variable Elimination
  7.3 Constant Propagation
8 The Environment Problem and Reflow Analysis
  8.1 Limits of Simple Control-Flow Analysis
  8.2 Reflow Analysis and Temporal Distinctions
9 Applications II: Type Recovery
  9.1 Type Recovery
  9.2 Quantity-based Analysis
  9.3 Exact Type Recovery
  9.4 Approximate Type Recovery
  9.5 Implementation
  9.6 Discussion and Speculation
  9.7 Related Work
10 Applications III: Super-β
  10.1 Copy Propagation
  10.2 Lambda Propagation
  10.3 Cleaning Up
  10.4 A Unified View
11 Future Work
  11.1 Theory
  11.2 Other Optimisations
  11.3 Extensions
  11.4 Implementation
12 Related Work
  12.1 Early Abstract Interpretation: the Cousots and Mycroft
  12.2 Hudak
  12.3 Higher-Order Analyses: Deutsch and Harrison
13 Conclusions

Appendices
A 1CFA Code Listing
  A.1 Preliminaries
  A.2 1cfa.t
B Analysis Examples
  B.1 Puzzle example
  B.2 Factorial example

Bibliography

Acknowledgments

It is a great pleasure to thank the people who helped me with my graduate career. First, I thank the members of my thesis committee: Peter Lee, Allen Newell, John Reynolds, Jeannette Wing, and Paul Hudak.

Peter Lee, my thesis advisor and good friend, taught me how to do research, taught me how to write, helped me publish papers, and kept me up-to-date on the latest developments in Formula-1 Grand Prix. I suppose that I recursively owe a debt to Uwe Pleban for teaching Peter all the things he taught me.

Allen Newell was my research advisor for my first four years at CMU, and co-advised me with Peter after I picked up my thesis topic. He taught me about having "a bright blue flame for science." Allen sheltered me during the period when I was drifting in research limbo; I'm reasonably certain that had I chosen a different advisor, I'd have become an ex-Ph.D.-candidate pretty rapidly. Allen believed in me, certainly more than I did.

John Reynolds taught (drilled/hammered/beat into) me the importance of formal rigour in my work. Doing mathematics with John is an exhilarating experience, if you have lots of stamina and don't mind chalk dust or feelings of inadequacy. John's emphasis on formal methods rescued my dissertation from being just another pile of hacks.

Jeannette Wing was my contact with the programming-systems world while I was officially an AI student. She supported my control-flow analysis research in its early stages. She also has an eagle eye for poor writing and a deep appreciation for edged weapons.
Paul Hudak's work influenced me a great deal. His papers on abstract semantic interpretations caused me to consider applying the whole approach to my problem.

It goes without saying that all of these five scholars are dedicated researchers, else they would not have attained their current positions. However, and this is not a given in academe, they also have a deep commitment to their students' welfare and development. I was very, very lucky to be trained in the craft of science under the tutelage of these individuals.

In the summer of 1984, Forest Baskett hired the entire T group to come to DEC's Western Research Laboratory and write a Scheme compiler. I was going to do the data-flow analysis module of the compiler; it would be fair to say that I am now finally finishing my 1984 summer job. The compiler Forest hired us to write turned into the ORBIT Scheme compiler, and this dissertation is the third one to come out of that summer project (so far). Both Forest and DEC have a record of sponsoring grad students for summer research projects that are interesting longshots, and I'd like to acknowledge their foresight, scope and generosity.

If one is going to do research on advanced programming languages, it certainly helps to be friends with someone like Jonathan Rees. In 1982, Jonathan was the other guy at Yale who'd read the Sussman/Steele papers. We agreed that hacking Scheme and spinning great plans was much more fun than interminable theory classes. Jonathan's deep understanding of computer languages and good systems taste has influenced me repeatedly.

Norman Adams, like Jonathan a member of the T project, contributed his technical and