A Flow-Analysis Framework for Realistic Scheme Programs

A flow-analysis framework for realistic Scheme programs Dissertation der Fakultät fur¨ Informations- und Kognitionswissenschaften der Eberhard-Karls-Universität Tubingen¨ zur Erlangung des Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.) vorgelegt von Dipl.-Inform. Eric Jean Knauel aus Buchholz in der Nordheide Tubingen¨ 2008 Tag der mundlichen¨ Qualifikation: 7. Mai 2008 Dekan: Prof. Dr. Michael Diehl 1. Berichterstatter: Prof. Dr. Herbert Klaeren 2. Berichterstatter: Prof. Dr. Schroeder-Heister Abstract It is possible to scale control-flow analyses for higher-order languages to complete, fully-fledged programming languages and consequently compute the flow analysis of realistic programs. This dissertation gives a formal specification of a universal flow-analysis framework for the functional programming languages Scheme and PreScheme that covers all aspects and features of these languages. Moreover, the dissertation proposes new implementation strategies and techniques that have been developed and tested in context of an implementation for Scheme 48. These techniques yield an efficient implementation that enables analysis of realistic programs. Zusammenfassung Es ist möglich, Kontrollflussanalysen fur¨ Sprachen höherer Ordnung auf reali- stische Programme anzuwenden. Diese Dissertation spezifiziert eine universell verwendbare Flussanalyse fur¨ die funktionalen Programmiersprachen Scheme und PreScheme, die alle Aspekte und Fähigkeiten dieser Sprachen abdeckt. Fur¨ die praktische Umsetzung dieser Analyse werden neuartige Implementierungs- strategien und -techniken vorgestellt, die im Rahmen einer Implementierung fur¨ Scheme 48 entwickelt und erprobt wurden. Diese Techniken fuhren¨ zu einer effi- zienten Implementierung, welche die Analyse realistischer Programme erlaubt. 4 Contents 1 Introduction 9 1.1 Implementation Scalability ..................... 9 1.2 Executable semantic model ..................... 12 1.3 Setting the scene ........................... 13 1.4 Contributions ............................. 14 1.5 Structure of this dissertation .................... 14 2 Intermediate Language 17 2.1 Syntax ................................. 17 2.2 Printing CPS Programs ....................... 22 2.3 Semantics ............................... 24 2.4 Complete programs .......................... 30 2.4.1 Complete PreScheme programs ............... 30 2.4.2 Complete Scheme programs ................. 31 2.4.3 Treating complete programs ................. 31 2.4.4 Initial state .......................... 32 3 Flow analysis of Programs 33 3.1 Control flow .............................. 33 3.2 Using control-flow information for optimization .......... 35 3.3 Control-flow analysis for higher-order languages .......... 36 4 Analysis Framework 39 4.1 Semantic domains .......................... 39 4.2 State transition ............................ 43 4.3 Semantic correctness ......................... 48 4.4 Language-specific abstractions .................... 52 4.4.1 Abstracting PreScheme values ............... 53 4.4.2 Abstracting Scheme values ................. 60 4.5 Precision and Timed ......................... 73 4.5.1 k-CFA time abstractions ................... 74 4.6 Flow analysis examples ........................ 75 5 Executable Semantic Model 83 5.1 Example trace ............................ 84 5.2 Implementation with PLT Redex .................. 86 5.2.1 Grammar definition ..................... 86 5.2.2 Metafunctions ........................ 90 5 6 CONTENTS 5.2.3 Reduction relations ...................... 100 5.2.4 Generating traces ....................... 102 6 Implementation 105 6.1 Abstract syntax ............................ 105 6.2 Semantic domains .......................... 106 6.2.1 Representing time ...................... 106 6.2.2 Representing the binding environment ........... 106 6.2.3 Representing variable environments ............ 107 6.2.4 Representing abstract values ................ 107 6.2.5 Implementing the visited set ................ 112 6.3 Description of flow information ................... 114 6.4 Analysis with Garbage Collection .................. 116 6.4.1 Semantics with global garbage collection .......... 117 6.4.2 Abstract values with GC marks ............... 122 6.4.3 Garbage Collection and variable environment ....... 124 6.4.4 Splitting the root set ..................... 127 6.5 Experimental results ......................... 127 6.6 Testing and Debugging ........................ 131 7 Conclusion 133 7.1 Review ................................. 133 7.2 Future work .............................. 134 A Notation 137 B Semantic Correctness 139 C PreScheme Primops 149 C.1 Primops with one return value ................... 149 C.2 Primops for compound values .................... 151 C.3 Primops with multiple return values ................ 153 D Scheme Primops 155 D.1 Arithmetic Primops ......................... 157 D.2 Primops for compound values .................... 158 D.3 Procedures with rest list argument ................. 161 D.4 Multiple return values ........................ 162 Acknowledgments First, I thank my advisor Professor Herbert Klaeren. Professor Klaeren provided a pleasant work environment and allowed me great latitude in my choice of research activities, and agreed with this dissertation project. I thank Professor Peter Schroeder-Heister who immediately agreed to co-review this dissertation. I am greatly indebted to Mike Sperber. Mike was the first to direct my attention to research in compiler construction and especially to flow analyses. He wakened my curiosity for these subjects. I am deeply thankful that Mike shared his extensive knowledge with me, and the time and effort he spent read- ing the various drafts of this dissertation. Mike’s profound remarks, numerous corrections and suggestions — the remaining mistakes are mine — really helped me improve this work. My colleague Marcus Crestani deserves special thanks. Marcus not only carefully proof-read this dissertation and suggested valuable improvements, but also relieved me of parts of my teaching workload during the last semester. I am grateful to Matthew Might and Olin Shivers for patiently answering all my e-mails on flow analyses, garbage collection for flow analyses, and frame strings. Matthew Might also gave me access to his ∆CFA and ΓCFA implementations. Matthew and Olin also shared draft versions of their latest research results with me. For all of this, I am grateful. I thank Robby Findler for answering my questions on PLT Redex — using this tool to model the semantics ranks among the most pleasant parts of this work. Martin Gasbichler is an expert on Scheme 48 and I thank him for patiently answering all my questions on the byte code, native code, and PreScheme compilers, the virtual machine, and other aspects of this complex system. Espe- cially, I thank Martin for helping me develop the id tables for Scheme 48. My colleague Holger Gast was always available to help me with typesetting problems — I would probably still be struggling with strange TEX problems without his help. Frank Sühl and Thomas Rasch, both close friends of mine, provided many helpful comments and encouragement — for both I am grateful. I am very grateful for the support from my parents Hermann and Marie- Louise. Marie-Louise proof-read the whole work and suggested lots of linguistic improvements. Both supported and encouraged me tireless from day one of my studies till the end of this dissertation project. Finally, I thank my wife Gertraud for her love, support, and patience during the months I suffered from something she identified as the “compiling syn- drome” — a state of mind, characterized by confusion and mental absence, caused by long debugging sessions of flow analyses. 8 CONTENTS Chapter 1 Introduction It is possible to scale control-flow analyses for higher-order languages to complete, fully-fledged programming languages and consequently compute the flow analysis of realistic programs. Only few compilers for functional languages employ a flow analysis to gain information on the run-time behavior of a program and subsequently optimize the program. This failure of flow analysis for functional languages in practice is paradoxical, as a rich amount of technical literature proposes new techniques for computing analyses, develops more precise abstractions, and finds new applica- tions for analyses. The following observations help understanding the reasons for this paradox: • Technical literature considers minimalist toy languages, leaving open many questions that arise when adapting an analysis for a real programming language. • There are very few reports on experience with implementation techniques for flow analyses. Most literature only reports on prototype and proof-of- concept implementations. • Only few descriptions of flow analyses show how they scale to realistic programs. This dissertation aims at eliminating these shortcomings. This introduction describes the problems that arise when scaling to a complete language (Section 1.1). Section 1.2 motivates the need for an executable semantic model. Section 1.3 describes the compiler architecture that is the ba- sis for my flow analysis. A summary of the contributions (Section 1.4) and an overview of dissertation (Section 1.5) follow. 1.1 Implementation Scalability Research literature in the area of flow analyses for higher-order languages presents new ideas as formal specifications with corresponding soundness proofs, accom- panied by measurements supposed to show the relevance and usefulness of the

Load more