A UNIFIED COMPILER FRAMEWORK FOR PROGRAM ANALYSIS, OPTIMIZATION, AND AUTOMATIC VECTORIZATION WITH CHAINS OF RECURRENCES

Name: Yixin Shou
Department: Department of Computer Science
Major Professor: Robert A. van Engelen
Degree: Ph.D.
Term Degree Awarded: Spring, 2009

In this dissertation I propose a unified compiler framework for program analysis, optimization, and automatic vectorization with techniques based on the Chains of Recurrences (CR) algebra. The theoretical foundations of the CR algebra and the CR# additions for program analysis, which I developed specifically for this dissertation research, are given. Based on the theory, I propose an extension of the Static Single Assignment (SSA) form, called the Inductive SSA form. The form associates inductive proofs with SSA variables to describe their inductive properties. The proofs are derived with algorithms based on the CR# theory. Inductive SSA provides an internal program representation that facilitates more effective loop analysis by a compiler. Based on the Inductive SSA form, I describe and evaluate a collection of compiler algorithms for program optimization at the SSA level.

More specifically, Inductive SSA merges static SSA information with semantic recurrence information to describe the value progressions of loop-variant variables. I will show how Inductive SSA enables and/or enhances loop-variant variable analysis, improves dependence testing for loops, and how it can be effectively used by many classic compiler optimizations to improve compilation results. Inductive SSA is based on my theoretical work on extending the CR formalism by the CR# (CR-sharp) algebra and lattice to effectively represent the value sequences and value ranges of conditionally updated variables in loops. Using empirical results on SPEC benchmarks with GCC 4.1, I will show that the Inductive SSA form captures more recurrence relations than state-of-the-art induction variable recognition techniques. Because the CR# term rewriting system is complete (confluent and terminating), it can be used for congruence detection in programs at all optimization levels. The Inductive SSA program analysis is far less sensitive to syntactical code changes, due to the real semantic information it captures. As a result of this independence of code structure, the phase ordering problem, in which the ordering of multiple optimizing transformations matters because of their effect on enabling or disabling other transformations, is greatly alleviated with Inductive SSA for many traditional low-level and loop-level optimizations. The experimental results show that the effectiveness of loop-variant variable analysis, congruence detection, and loop optimizations can be significantly increased with Inductive SSA.

For this dissertation research I also developed a math function library generator. Many computational tasks require repeated evaluation of functions over structured grids, such as plotting in a coordinate system, rendering of parametric objects in 2D and 3D, numerical grid generation, and signal processing. The library generator speeds up closed-form function evaluations over grids by vectorizing the CR forms of these functions. The approach is comparable to aggressive strength reduction and short-vector vectorization of the resulting CRs. Strength reduction of floating point operations is usually prevented in a compiler due to safety issues related to floating point roundoff. Auto-tuning of the library ensures safe floating point evaluation. Experiments show significantly improved streaming SIMD extensions (SSE) execution and instruction-level parallelism (ILP) of math function evaluations.

THE FLORIDA STATE UNIVERSITY

COLLEGE OF ARTS AND SCIENCES

A UNIFIED COMPILER FRAMEWORK FOR PROGRAM ANALYSIS, OPTIMIZATION, AND AUTOMATIC VECTORIZATION WITH CHAINS OF RECURRENCES

By

YIXIN SHOU

A Dissertation submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Ph.D.

Degree Awarded: Spring Semester, 2009

The members of the Committee approve the Dissertation of Yixin Shou defended on April 8, 2009.

Robert A. van Engelen Professor Directing Dissertation

Gordon Erlebacher Outside Committee Member

Kyle Gallivan Committee Member

David Whalley Committee Member

Xin Yuan Committee Member

Approved:

David Whalley, Chair Department of Computer Science

Joseph Travis, Dean, College of Arts and Sciences

The Office of Graduate Studies has verified and approved the above named committee members.


To my father

ACKNOWLEDGEMENTS

This dissertation would not have been possible without the support of many individuals. First of all, I would like to thank my advisor, Dr. Robert A. van Engelen, for his extensive support, patience, and encouragement throughout my doctoral work. His insightful comments and guidance at every stage of this dissertation were essential for me to complete my research and taught me innumerable lessons about the workings of research in general. I am also grateful to him for his great help with scientific writing and his careful reading, commenting, and revising of the manuscript.

I am also very grateful to my committee members, Dr. Gordon Erlebacher, Dr. Kyle Gallivan, Dr. David Whalley, and Dr. Xin Yuan, for their guidance and support throughout my graduate studies and for their insightful comments on my dissertation research.

I would also like to thank all of the members of the ACIS laboratory at FSU for all the support and help I have received. I would especially like to thank Johnnie Birch for many valuable and interesting discussions and for his collaboration and help in integrating the CR library into the experimental compiler.

Last but not least, I would like to thank my family members, who have been a constant source of love, support, and strength. I thank my parents with my deepest gratitude for their unconditional love and support all these years. I am grateful to my husband Yunsong for his understanding and love during these past years. His understanding and encouragement made it possible to complete this study.

TABLE OF CONTENTS

List of Tables...... vii

List of Figures...... viii

Abstract...... xii

1. INTRODUCTION ...... 1
   1.1 Motivation ...... 1
   1.2 The Inductive SSA Concept ...... 12
   1.3 Automatic SIMD Vectorization of Chains of Recurrences ...... 15
   1.4 Thesis Statement ...... 16
   1.5 Dissertation Outline ...... 17

2. PRELIMINARIES AND RELATED WORK ...... 18
   2.1 Static Single Assignment Form ...... 18
   2.2 Induction Variable Analysis ...... 21
   2.3 Classic Compiler Optimizations ...... 30
   2.4 The Chains of Recurrences Formalism ...... 42

3. THEORY ...... 50
   3.1 The CR# Chains of Recurrences Extension ...... 50
   3.2 Inductive SSA ...... 71

4. ALGORITHMS ...... 80
   4.1 Inductive SSA Construction ...... 80
   4.2 Compiler Analysis and Optimization with Inductive SSA ...... 87

5. IMPLEMENTATION ...... 101
   5.1 GNU Compiler Collection (GCC) ...... 101
   5.2 Implementation of Inductive SSA Extension ...... 103
   5.3 Loop-variant Variable Analysis Implementation ...... 105
   5.4 Enhanced GVN-PRE with Inductive SSA-based Optimizations ...... 105

6. EXPERIMENTAL RESULTS ...... 106
   6.1 SPEC CPU2000 Benchmarks ...... 106
   6.2 Measuring the Inductive Proof Choice Set Sizes ...... 106
   6.3 Results of Loop-Variant Variable Classification with Inductive SSA ...... 108
   6.4 Results for Inductive SSA-based Optimizations ...... 112
   6.5 Inductive SSA Impact Analysis on GVN-PRE ...... 116

7. SIMD Vectorization of Chains of Recurrences ...... 123
   7.1 Introduction ...... 123
   7.2 Vector Chains of Recurrences ...... 125
   7.3 Results ...... 136
   7.4 Related Work ...... 149

8. CONCLUSION ...... 152

REFERENCES ...... 154

BIOGRAPHICAL SKETCH ...... 163

LIST OF TABLES

2.1 Chains of Recurrences simplification rules ...... 48

2.2 Chains of Recurrences inverse simplification rules ...... 49

3.1 CR# simplification rules ...... 52

3.2 CR# inverse simplification rules ...... 53

6.1 The SPEC CPU2000 benchmark suite ...... 107

6.2 Measuring the sizes of choice sets in Inductive SSA on SPEC2000 ...... 108

6.3 Results of loop-variant variable classification with Inductive SSA on SPEC2000 ...... 109

6.4 Inductive SSA classification of conditional loop-variant variables in SPEC2000 ...... 110

6.5 Number of additional equivalences and loop-invariants detected by Inductive SSA enhancement of GVN-PRE for SPEC CPU2000 benchmarks using GCC 4.1 -O3. Note: − indicates no difference ...... 113

6.6 Number of additional equivalences, loop-invariants, and constants detected by GVN-PRE enhanced with Inductive SSA for SPEC CPU2000 using GCC 4.1 -O2 -fno-gcse -fno-tree-ccp -fno-tree-loop-im -fno-move-loop-invariants -fno-ivopts. Note: − indicates no difference ...... 119

7.1 Interpolated maximum relative errors ...... 132

7.2 Benchmark functions used in performance testing ...... 137

7.3 Overview of VCR optimizations applied to benchmarks ...... 137

7.4 Maximum relative error for n = 100 ...... 138

7.5 Static code size statistics of three Poly3 benchmark codes ...... 148

LIST OF FIGURES

1.1 Example loop (a) and its Inductive SSA form (b)...... 3

1.2 Example wrap-around variable j1 in a loop shown in SSA form...... 4 1.3 Example showing that partial redundancy elimination and induction vari- able removal ordering is relevant. Traditionally, GVN and/or PRE is per- formed first followed by IV removal. However, performing IV removal before GVN/PRE can lead to better results in certain cases...... 7

1.4 SSA example showing weakness of traditional IV removal to detect j2 = i2..8

1.5 SSA example showing weakness of GVN and PRE to detect k1 = j2...... 8
1.6 Quicksort partition algorithm with redundant operations. Because i−n ≡ j−k, the source code (a) has a redundant loop counter i and useless test i < n.

2.2 Example cyclic variables in a loop. The first example performs a swap to generate sequences with period 2. The second example generates the Fibonacci sequence via the common cyclic variable update...... 22 2.3 Example conditionally-updated variables in a loop with branch-specific up- dates. The first example has a closed form for the value sequence of k. The second example does not...... 24 2.4 An example loop with direct and indirect IVs...... 25 2.5 An example loop with a first-order wrap-around variable w and indirect wrap- around variable x of second order...... 26 2.6 Example IVs in a loop in SSA form and their corresponding closed-form characteristic functions...... 29

2.7 Overview and summary of IV recognition methods found in the literature.. 31

2.8 IV removal of IV j from an example loop. Because IV j is identical to IV i it can be removed and its uses replaced by i...... 31

2.9 IV substitution of IV k by a function of the loop variable i...... 32 2.10 Loop strength reduction of linear expression 3 × i by a linear IV j...... 33 2.11 Constant propagation of a = 1 in an example loop...... 35

2.12 Loop-invariant code motion of variable k and its operation to the start of the loop. Note that the assignment is conditional on the non-emptyness of the loop. 37 2.13 Common-subexpression elimination of two identical expressions a + b in an example loop. Note that the third expression is not identical because of the intervening definition of a...... 37

2.14 Partial redundancy elimination of b + c on a program trace. A temporary t is introduced to fix the absence of the redundancy on the second trace...... 39

2.15 Hybrid GVN-PRE application to an example loop. Because the value of b1+a1 is congruent with b1 + c1 based on c1 = a1, the assignment to y1 is redundant. 42 3.1 Example code fragment (a) demonstrating integer overflow (b) of IV a.... 67 3.2 Loop transformation example 1...... 69 3.3 Loop transformation example 2...... 70

3.4 Lifting choice sets in Π terms...... 72 3.5 Example 1: loop fragment and corresponding Inductive SSA form...... 75 3.6 Example 2: loop fragment and corresponding Inductive SSA form...... 76

3.7 Example 3: loop fragment and corresponding Inductive SSA form...... 76 3.8 Example 4: loop fragment and corresponding Inductive SSA form...... 77 3.9 Example 5: loop fragment and corresponding Inductive SSA form...... 78

3.10 Example 6: loop fragments and corresponding Inductive SSA forms...... 79

ix 4.1 Example inductive proofs for loop φ-nodes in SSA form. The proofs for (a) and (b) show the equivalence of b under associativity induced by operation reordering between (a) and (b). Example (c) has a linear equivalence b − a ≡ c − d detected by proofs b = {a, +, 1}3 and c = {d, +, 1}3. Example (d) has a nonlinear equivalence b − a ≡ c − d detected by proofs b = {a, ∗, 2}4 ≡ {a, +, a, ∗, 2}4 and c = {d, +, a, ∗, 2}4. Example (e) has a wrap-around induction variable b that indirectly induces wrap-around behavior in variable c...... 84

4.2 Example path-dependent inductive proofs for loop φ-nodes in SSA form. Example (a) has congruent operations under commutativity in blocks B2 and B3. Example (b) shows that h is constant by path correlation in proof construction and #-wrap-around simplification of the CR form. Example (c) has a loop invariant computation for i, because of the congruence i ≡ c − d ≡ a − b and a − b is loop invariant. This is proven by i ≡ c − d ≡ [{a, +, 1}9→12→10 −{b, +, 1}9→12→10, {a}9→12→11 −{b}9→12→11] ≡ [a−b, a−b] ≡ [a − b]...... 85 4.3 Result of IV removal (b) from code fragment (a) with Inductive SSA..... 90 4.4 Result of IVS (b) on example code fragment (a) with Inductive SSA..... 91

4.5 Result of SR (d) on example code fragment (a) in Inductive SSA form (b) using the CR# form coefficients (c) as IVs in (d)...... 93

4.6 Effect of SR on ILP (c) for example code fragment (a) in Inductive SSA (b). 94 4.7 Inductive SSA applied to an example taken from [1, p.368]. The SCCP algorithm [2] marks the branch to B4 dead, because t1 ≤ 0 holds, and thus determines that f3 = 6. Marking B4 dead refines f3 7→ [6] in Inductive SSA. 95 4.8 Result of CP (b) on an example code fragment (a) with Inductive SSA.... 96 4.9 Result of LICM (b) on an example code fragment with Inductive SSA.... 98 4.10 Result of PRE transformation (b) on an example code fragment (a) with Inductive SSA...... 100 5.1 GCC intermediate representation integration schematic...... 102

6.1 Coverage of loop-variant variable classification in SPEC CINT2000...... 111 6.2 Coverage of loop-variant variable classification in SPEC CFP2000...... 112 6.3 Relative speedup of code produced by GVN-PRE enhanced with Inductive SSA for SPEC2000 benchmarks using GCC 4.1 -O3...... 114

6.4 Relative compile time and static code size of Inductive SSA with GVN-PRE for SPEC2000 benchmarks using GCC 4.1 -O3...... 115

x 6.5 Relative speedup of Inductive SSA with GVN-PRE for SPEC2000 benchmarks using GCC 4.1 -O2...... 117 6.6 Relative compile time and static code size of Inductive SSA with GVN-PRE for SPEC2000 benchmarks using GCC 4.1 -O2...... 118 6.7 Relative speedup of code optimized by GVN-PRE with Inductive SSA for SPEC2000 using GCC 4.1 -O2 -fno-gcse -fno-tree-ccp -fno-tree-loop-im -fno- move-loop-invariants -fno-ivopts...... 121 6.8 Relative compile time and static code size of code optimized by GVN-PRE with Inductive SSA for SPEC2000 benchmarks using GCC 4.1 -O2 -fno-gcse -fno-tree-ccp -fno-tree-loop-im -fno-move-loop-invariants -fno-ivopts...... 122 7.1 Blocked VCR loop template...... 131 7.2 Scalar CR loop and loop transformed with SSE Shuffle-based CR vectorization.133

7.3 Scalar CR code of Sine benchmark...... 134 7.4 VCR vectorized Sine sequences for decoupling factor d = 4...... 134 7.5 SSE VCR code for Sine benchmark for decoupling factor d = 4 (unblocked). 135

7.6 VCR vectorized Sine sequences for decoupling factor d = 8...... 136 7.7 Performance comparison of VCR for n = 1, 000 and n = 100, 000 (Intel Dual Core Xeon 5160 3GHz)...... 140 7.8 Sine single and double precision performance comparison (Intel Dual Core Xeon 5160 3GHz)...... 141 7.9 Code optimized with sCR4.1 for the Bin15 benchmark...... 143 7.10 Code optimized with sCR4.4 for the Poly3 benchmark...... 144

7.11 Performance comparison of shuffle-based CR vectorization for the polynomial benchmarks with n = 1, 000 and for Poly3 with n = 10,..., 106 (Intel Dual Core Xeon 5160 3GHz)...... 145

7.12 Performance comparison of VCR-enhanced ILP (UltraSPARC IIIi 1.2GHz).. 147 7.13 Poly3 benchmark performance results of VCR-enhanced ILP-optimized code (UltraSPARC IIIi 1.2GHz)...... 149

ABSTRACT

In this dissertation I propose a unified compiler framework for program analysis, optimization, and automatic vectorization with techniques based on the Chains of Recurrences (CR) algebra. The theoretical foundations of the CR algebra and the CR# additions for program analysis, which I developed specifically for this dissertation research, are given. Based on the theory, I propose an extension of the Static Single Assignment (SSA) form, called the Inductive SSA form. The form associates inductive proofs with SSA variables to describe their inductive properties. The proofs are derived with algorithms based on the CR# theory. Inductive SSA provides an internal program representation that facilitates more effective loop analysis by a compiler. Based on the Inductive SSA form, I describe and evaluate a collection of compiler algorithms for program optimization at the SSA level.

More specifically, Inductive SSA merges static SSA information with semantic recurrence information to describe the value progressions of loop-variant variables. I will show how Inductive SSA enables and/or enhances loop-variant variable analysis, improves dependence testing for loops, and how it can be effectively used by many classic compiler optimizations to improve compilation results.

Inductive SSA is based on my theoretical work on extending the CR formalism by the CR# (CR-sharp) algebra and lattice to effectively represent the value sequences and value ranges of conditionally updated variables in loops. Using empirical results on SPEC benchmarks with GCC 4.1, I will show that the Inductive SSA form captures more recurrence relations compared to state-of-the-art induction variable recognition techniques. Because the CR# term rewriting system is complete (confluent and terminating), it can be used for congruence detection in programs at all optimization levels. The Inductive SSA program analysis is far less sensitive to syntactical code changes, due to the real semantic information

it captures. As a result of this independence of code structure, the phase ordering problem, in which the ordering of multiple optimizing transformations matters because of their effect on enabling or disabling other transformations, is greatly alleviated with Inductive SSA for many traditional low-level and loop-level optimizations. The experimental results show that the effectiveness of loop-variant variable analysis, congruence detection, and loop optimizations can be significantly increased with Inductive SSA.

For this dissertation research I also developed a math function library generator. Many computational tasks require repeated evaluation of functions over structured grids, such as plotting in a coordinate system, rendering of parametric objects in 2D and 3D, numerical grid generation, and signal processing. The library generator speeds up closed-form function evaluations over grids by vectorizing the CR forms of these functions. The approach is comparable to aggressive strength reduction and short-vector vectorization of the resulting CRs. Strength reduction of floating point operations is usually prevented in a compiler due to safety issues related to floating point roundoff. Auto-tuning of the library ensures safe floating point evaluation. Experiments show significantly improved execution of streaming SIMD extensions (SSE) and instruction-level parallelism (ILP) of math function evaluations.

CHAPTER 1

INTRODUCTION

1.1 Motivation

Compiler techniques to analyze and optimize programs for modern processor architectures are critical, because modern architectural features such as multi-core and multimedia extensions can only be exploited efficiently with effective compiler support. An optimizing compiler relies heavily on program analysis techniques, such as induction variable analysis and data dependence analysis, to optimize and/or parallelize programs effectively. Optimizations can increase the execution speed of the program and/or reduce the code size [3, 4, 5, 1, 6, 7, 8].

1.1.1 Loop-Variant Variable Analysis

Automatic parallelizing compilers generally focus on the optimization of sets of loops, because most of the execution time of a program is typically spent in loops [9, 8, 10]. Induction Variables (IVs) [11, 4, 12, 1, 13, 8] are an important class of loop-variant variables whose value progressions form linear, polynomial, or geometric sequences. Nonlinear IVs are also referred to as Generalized Induction Variables (GIVs) [14, 15]. Recognition of IVs and GIVs1 plays a critical role in optimizing compilers as a prerequisite to loop analysis and transformation. For example, a loop-level optimizing compiler applies array dependence testing [8], which requires an accurate analysis of the memory access patterns of IV-indexed arrays and arrays accessed with pointer arithmetic [16, 17]. Other example applications are array bounds check elimination [18], loop-level cache reuse analysis [19], software prefetching [9], loop blocking, variable privatization, IV elimination [11, 4, 12, 7], and auto-parallelization and vectorization [8].

1In this dissertation we consider all classes of IVs, so the use of IV refers to all IVs including GIVs.

The challenge in this research is to identify and accurately classify all loop-variant variables in a loop structure that are flow-sensitive, which means that the updates of these variables depend on the control flow through the branches in the loop. While the recognition of "traditional" forms of IVs is extensively described in the literature, there is a limited body of work on methods to analyze more complicated flow-sensitive loop-variant variables that have arbitrary conditional update patterns along multiple paths in a loop nest. The relative occurrence frequency in modern codes of flow-sensitive loop-variant variables that exhibit more complicated update patterns than IVs is significant. In this dissertation research we found that 9.32% of the variables that occur in loops in CINT2000 and 2.82% of the variables in loops in CFP2000 are conditionally updated. Most of these variables have no known closed-form function equivalent that describes the value sequence as a function of the iteration point, because the updates depend on run-time conditions that cannot be inferred at compile time. Current IV recognition methods fail to classify conditionally updated variables, whether they have closed forms or not. The result is a pessimistic compiler analysis outcome and lower performance expectations.

There are many types of IVs and loop-variant variables. A linear IV such as a loop counter is a simple example of a loop-variant variable. All linear IVs and polynomial, geometric, and periodic GIVs are loop-variant variables that generate value sequences that can be described by closed-form characteristic functions [12]. But there are many more loop-variant variables that are not IVs, such as conditionally updated variables in loops with value sequences that are determined by the control flow in the loop. When the control flow conditions cannot be determined at compile time, the control flow becomes unpredictable. The flow-sensitive loop-variant variables updated in the unpredictable branches may exhibit many different behaviors. In some cases the value sequences are predictable while the control flow is not, for example when the variable updates in the branches are semantically identical and yield the same value progression of the variable regardless of the branches taken.

Consider the loop in Figure 1.1(a). Loop-variant variable g is conditionally updated in the branches of the if statement. Since it does not match an IV pattern, compilers will simply give up on IV recognition for this flow-sensitive variable, even though the update of variable g in both branches produces the same linear value sequence with initial value 0 and increment 2 in each iteration. Thus, the updates to g are syntactically different in the two branches while semantically equivalent. Therefore, g is a linear IV and has the closed-form function g = f(I) = 2I at the loop body entry, where I = 0, 1, ... is the iteration number.

(a) Example loop:

    g = 0;
    h = 1;
    j = 0;
    k = 0;
    for (i = 1; i < n; i++) {
      if (...)
        g = i * 2;
      else
        g = g + 2;
      k = k + 1;
      x = i << 1;
      y = i + k;
      z = k * 2;
      h = h * 2;
      if (...)
        j = j + 3;
      else if (...)
        j = 2 * j + 1;
      else
        j = 2 * j;
      ...
    }

(b) Inductive SSA form:

    loop L1:
      i1 = φ(1, i2)       ↦ {1, +, 1}L1
      j1 = φ(1, j5)       ↦ {1, +, 1, ∗, 1}L1, {1, +, 3, ∗, 2}L1
      g1 = φ(0, g4)       ↦ {0, +, 2}L1
      k1 = φ(0, k2)       ↦ {0, +, 1}L1
      h1 = φ(1, h2)       ↦ {1, ∗, 2}L1
      if (...)
        g2 = i1 ∗ 2       ↦ {2, +, 2}L1
      else
        g3 = g1 + 2       ↦ {2, +, 2}L1
      g4 = φ(g2, g3)      ↦ {2, +, 2}L1
      k2 = k1 + 1         ↦ {1, +, 1}L1
      x1 = i1 << 1        ↦ {2, +, 2}L1
      y1 = i1 + k2        ↦ {2, +, 2}L1
      z1 = k2 ∗ 2         ↦ {2, +, 2}L1
      h2 = h1 ∗ 2         ↦ {2, ∗, 2}L1
      if (...)
        j2 = j1 + 3       ↦ {4, +, 1, ∗, 1}L1, {4, +, 3, ∗, 2}L1
      else if (...)
        j3 = 2 ∗ j1 + 1   ↦ {3, +, 2, ∗, 1}L1, {3, +, 6, ∗, 2}L1
      else
        j4 = 2 ∗ j1       ↦ {2, +, 2, ∗, 1}L1, {2, +, 6, ∗, 2}L1
      j5 = φ(j2, j3, j4)  ↦ {2, +, 1, ∗, 1}L1, {4, +, 6, ∗, 2}L1
      ...
      i2 = i1 + 1         ↦ {2, +, 1}L1
    endloop L1

Figure 1.1: Example loop (a) and its Inductive SSA form (b).

The Static Single Assignment (SSA) form [1] has become a standard intermediate representation in many modern compilers. The SSA form of the example loop in Figure 1.1(a) is shown in Figure 1.1(b). Note that the right-hand side expressions of the assignments to SSA variables g2, g3, g4, x1, y1, and z1 in Figure 1.1(b) are syntactically different in SSA form. However, it is easily observed by tracing the changes of the variables through the loop that these variables all have exactly the same value sequence in the loop, with initial value 2 at the SSA assignment statement and an increment of 2 in each iteration. From the figure it is clear that these variables have the same semantic value progression in the loop. Thus, they are semantically equivalent. The problem is that many such cases are not detected by compilers that only apply syntactic matching and can only determine IVs that are syntactically equivalent.

    loop
      i1 = φ(0, i2)          i1 = {0, 1, 2, 3, 4, ...}
      j1 = φ(99, i1)         j1 = {99, 0, 1, 2, 3, ...}
      a[j1 + 1] = ...
      ...
      i2 = i1 + 1
    endloop

Figure 1.2: Example wrap-around variable j1 in a loop shown in SSA form.

Certain flow-sensitive loop-variant variables do not have closed-form function equivalents, such as j in Figure 1.1(a). We say that these variables have discordant updates in the branches of the loop body. Most compilers will simply give up on loop analysis and optimization due to the unknown value progression of these kinds of variables. Closer inspection of real benchmarks, such as SPEC2000, reveals that the value progressions of all of these flow-sensitive loop-variant variables can be bounded with tight bounding functions over the iteration space. Typically a pair of dynamic lower- and upper-bound functions over the loop iteration space suffices. Note that static bounds are not as useful as dynamic bounds, where the latter are functions over the loop iteration space. Bounding the value progressions of flow-sensitive loop-variant variables has the advantage of increased analysis coverage in a compiler. Bounding also significantly helps loop analysis by resolving accuracy problems in the presence of unknowns, e.g. in dependence analysis, where bounds on array index expressions can help filter more dependences [20, 21]. With the availability of tight functional (iteration-specific) bounds on variables, analysis and optimization can continue.

Other special classes of loop-variant variables are variables with Out-of-Sequence Values (OSV), which include the classic wrap-around variables shown in Figure 1.2. Even though the relative percentage of these types of variables in benchmarks is low (0.55% in CINT2000 and 0.62% in CFP2000), their classification is important to enable loop restructuring [12]. A wrap-around variable is flow-sensitive: it is assigned a value outside the loop for the first iteration and then takes the value sequence of another IV for the remainder of the iterations. Wrap-around variables may cascade: any IV that depends on the value of a wrap-around variable is a wrap-around variable, possibly of one order higher [4], giving two iterations with out-of-sequence values.

Compilers typically utilize a pattern matching mechanism [4] to recognize wrap-around variables, which is often ad-hoc, very specific to certain patterns, and may cause many loop-variant variables to remain unrecognized as IVs or flow-sensitive loop-variant variables. The problem with classic and state-of-the-art compiler IV analysis methods is that loop-variant variables are analyzed by using specific pattern matching schemes, ad-hoc rules, or structural expression matching. This problem is further reviewed in Chapter 2. To improve loop-variant variable analysis and the related optimizations, the real semantic value progressions of these variables should be captured as accurately as possible.
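The following small C program is an illustration added here (it is not one of the benchmark codes studied later): k is conditionally incremented by 1 or by 2, so it has no closed form, yet at the top of every iteration its value lies between the dynamic bounds given by the CR forms {0, +, 1} and {0, +, 2} over the iteration space.

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int n = 100, k = 0;
        for (int i = 0; i < n; i++) {
            /* at the top of iteration i, k is bounded by the CR forms
             * {0,+,1} <= k <= {0,+,2}, i.e. i <= k <= 2*i              */
            assert(k >= i && k <= 2 * i);
            /* conditionally updated: no closed form for k as a function of i */
            if (rand() & 1)
                k = k + 1;
            else
                k = k + 2;
        }
        printf("final k = %d lies in [%d, %d]\n", k, n, 2 * n);
        return 0;
    }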

1.1.2 Congruence Detection

Proving that two different variables have the same value at a program point is generally undecidable, even without interpreting the conditionals [22]. The equivalence relation under this restriction is called Herbrand equivalence. In practice, global Common-Subexpression Elimination (CSE) [11], Global Value Numbering (GVN), Partial Redundancy Elimination (PRE), and hybrid GVN-PRE methods efficiently detect and remove many congruences in the lexical and semantic domains, i.e. a safe subset of the Herbrand equivalences. GVN [23, 24, 25] partitions variables and expressions into classes by assigning them a unique "value number" for identification. Variables and expressions in the same equivalence class are guaranteed to have the same value, i.e. they are congruent. If a value is already computed, subsequent congruent operations can be removed from an execution trace. PRE [26, 27] is more generic than CSE, because it identifies operations that are redundant on traces rather than at a program point. PRE finds redundant operations on most, but not necessarily all, traces of the program that include the same operation. It then removes these partially redundant operations by hoisting them to an earlier program point. PRE methods also classify variables and expressions as constant, loop-invariant, and (loop-variant) variable. Additional optimizations can then be performed accordingly, such as Loop-Invariant Code Motion (LICM) to hoist loop-invariant variables and operations out of loops so that they are evaluated only once instead of in every iteration.

The challenge in this research is to find redundant congruences that occur at the loop level. GVN, PRE (and therefore also CSE by implication), and hybrid GVN-PRE methods cannot remove many redundancies that only occur at the loop level.

To remove redundant operations on IVs at the loop level, IV removal (also known as IV elimination [11]) is used. However, IV removal only targets redundant IVs that produce identical or linearly mappable value sequences in loops. It is typically applied after GVN and PRE. IV removal classifies loop-variant variables into linear and nonlinear IV categories and considers the removal of all linear IVs except one. Thus, the detection of the initial and stride values of variables in a loop is sufficient to describe all linear IVs and reduce them to just one by using linear function replacements.

The interaction between GVN, PRE, and IV removal in a compiler is illustrated in Figure 1.3. The example is based on [1, p.415] and slightly extended to illustrate the specifics of the interaction. Although the traditional ordering of GVN and PRE followed by IV removal is very effective in many cases, this is not always true, as can be seen in the figure. The example has a redundant IV j and a loop-invariant variable c. The optimization path via GVN/PRE followed by IV removal is the one typically implemented, but it leads in this case to a sub-optimal result. Reversing the optimization order by applying IV removal before PRE with reassociation (or, alternatively, re-applying PRE after IV removal) leads to the desired result. Note that the loop-invariant property of c can only be detected after removal of the IV j, based on the linear equality j = i − m established by IV removal.

The fundamentally incomparable properties of IV removal and GVN/PRE are illustrated in Figures 1.4 and 1.5 using two detailed examples in SSA form. In SSA form each variable has a single definition.

A φ-function is inserted at join points, where two or more control flow edges merge, such as in the loop header block B2.

In Figure 1.4 variable i2 is a basic IV with the value sequence i2 = 0, 1, 2, .... Traditional IV removal methods that rely on the detection of IV update patterns in the code to classify IVs are too weak to detect the linear sequence j2 = 0, 1, 2, .... Because there is a use of j2 before the assignment j3 = i3, this pattern does not correspond to a linear IV. By contrast, GVN/PRE methods simply detect that j2 = i2, since we have that j2 = φ(j1, j3) = φ(0, i3) = i2. Thus, GVN/PRE removes j2 as redundant.

In Figure 1.5 variable i2 is a basic IV and j2 = 0, 4, 8, 12, ... is a linear IV. The value of k1 = 4 ∗ i2 is clearly congruent with j2. This congruence is not detected by GVN/PRE. By contrast, it is detected by classic IV removal methods [11] that classify the linear basic IVs and dependent IVs by their initial value and stride.

    Original loop:
        j = 0
        do i = m, n
          a = b + i
          c = a - j
          d = a
          j = j + 1
        enddo

    After GVN/PRE followed by IV removal (sub-optimal):
        do i = m, n
          t = i - m
          a = b + i
          c = a - t
          d = a
        enddo

    After IV removal followed by PRE with reassociation:
        c = b + m
        do i = m, n
          a = b + i
          d = b + i
        enddo

Figure 1.3: Example showing that partial redundancy elimination and induction variable removal ordering is relevant. Traditionally, GVN and/or PRE is performed first, followed by IV removal. However, performing IV removal before GVN/PRE can lead to better results in certain cases.

It is well known in the compiler research community that different optimization orders and/or repeated optimizations can lead to better code, a phenomenon referred to as the phase ordering problem. Several authors have proposed approaches to deal with the phase ordering problem [28], for example using search techniques to find effective optimization sequences [29, 30]. However, exhaustive search-based approaches are expensive, even with search space pruning. Furthermore, the resulting optimization sequence is broadly applied to an entire program or at least to a function body. There is no guarantee of optimality for a limited code region such as a single loop. Therefore, synergistic approaches, such as hybrid GVN-PRE [31] and our Inductive SSA (introduced in more detail in Section 1.2) for GVN/PRE with IV integration, show promising advantages. Whereas the previous examples in Figures 1.3, 1.4, and 1.5 can be optimized by considering combinations and repetitions of GVN/PRE and IV removal, there are many cases in practice where no combination will succeed in eliminating the redundant operations. Theoretically speaking, this is to be expected from the undecidable properties of the Herbrand equivalence problem. However, many equivalences of variables are ignored in practice due to limitations of loop-variant variable classification in GVN/PRE and IV methods.

    B1: i1 ← 0
        j1 ← 0
    B2: i2 ← φ(i1, i3)
        j2 ← φ(j1, j3)
        k1 ← j2 + 1
        ...
        i3 ← i2 + 1
        j3 ← i3
        goto B2

Figure 1.4: SSA example showing weakness of traditional IV removal to detect j2 = i2.

    B1: i1 ← 0
        j1 ← 0
    B2: i2 ← φ(i1, i3)
        j2 ← φ(j1, j3)
        k1 ← 4 ∗ i2
        ...
        i3 ← i2 + 1
        j3 ← j2 + 4
        goto B2

Figure 1.5: SSA example showing weakness of GVN and PRE to detect k1 = j2.

Although GIV recognition techniques [4, 12] detect linear and nonlinear IVs, even by following the IV updates through all possible paths [32, 33], none of these IV algorithms can efficiently detect and exploit correlations between IVs updated in branches. In addition, PRE methods do not handle cyclic (i.e. loop-header) φ-nodes in the SSA form accurately [34]. Consider for example the classic quicksort partition problem shown in Figure 1.6(a). The quicksort partition algorithm uses two indices j and k that move in opposite directions from the ends of an array to swap array elements into the two partitions based on a pivot value. Because the index variables are updated exclusively, it takes n iterations before j = k, with n the size of the array. For the sake of exposition, we introduced a loop counter i so that the loop invariant i − n ≡ j − k holds. Hence, the while-condition i < n is congruent with j < k.

(a) Source code:

    i = 0;
    k = j+n;
    do {
      if (a[j] <= p) {
        j++;
      } else {
        k--;
        SWAP(a[j], a[k]);
      }
      i++;
    } while (j < k && i < n);

(b) Inductive SSA form (loop blocks):

    B1: i1 ← 0
        k1 ← j1 + n1
    B2: i2 ← φ(i1, i3)  ↦ [{0, +, 1}2→∗]
        j2 ← φ(j1, j4)  ↦ [{j1, +, 1}2→5→3, {j1}2→5→4]
        k2 ← φ(k1, k4)  ↦ [{j1+n1}2→5→3, {j1+n1, +, −1}2→5→4]
    B3: j3 ← j2 + 1     ↦ [{j1+1, +, 1}2→5→3, {j1+1}2→5→4]
    B4: k3 ← k2 − 1     ↦ [{j1+n1+1}2→5→3, {j1+n1−1, +, −1}2→5→4]
    B5: j4 ← φ(j2, j3)  ↦ [{j1+1, +, 1}2→5→3, {j1}2→5→4]
        k4 ← φ(k2, k3)  ↦ [{j1+n1}2→5→3, {j1+n1−1, +, −1}2→5→4]

Figure 1.6: Quicksort partition algorithm with redundant operations. Because i−n ≡ j−k, the source code (a) has a redundant loop counter i and a useless test i < n.

The detection of the congruence i < n ≡ j < k, which is implied by the correlation of two distinct variables describing non-deterministic value sequences, requires powerful static execution trace analysis and flow-sensitive recurrence solvers. Though such congruences occur frequently in practice, e.g. in logically-controlled loops, to the best of our knowledge no state-of-the-art congruence detection algorithm can detect these types of congruences.
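To show the payoff, the following runnable C sketch is a hand-optimized version added here for illustration only (assuming the loop test of Figure 1.6(a) combines j < k with i < n): once the congruence is established, the counter i, its increment, and its test can all be removed.

    #include <stdio.h>
    #define SWAP(x, y) do { int t = (x); (x) = (y); (y) = t; } while (0)

    /* Partition a[0..n-1] around pivot p; the counter i and its test are
     * removed, since i - n == j - k makes "i < n" congruent with "j < k". */
    static int partition(int *a, int n, int p) {
        int j = 0, k = j + n;
        do {
            if (a[j] <= p) {
                j++;
            } else {
                k--;
                SWAP(a[j], a[k]);
            }
        } while (j < k);      /* formerly: while (j < k && i < n) */
        return j;             /* first index of the upper partition */
    }

    int main(void) {
        int a[] = { 7, 2, 9, 4, 5, 1, 8 };
        int split = partition(a, 7, 5);
        for (int i = 0; i < 7; i++) printf("%d ", a[i]);
        printf("| split at %d\n", split);
        return 0;
    }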

1.1.3 Array and Pointer Data Dependence Testing

Current dependence analyzers and frameworks for symbolic analysis in restructuring compilers are quite powerful, such as the traditional simple test [9] and the GCD test [9, 8]. Other state-of-the-art dependence methods working on affine array index expressions are by Banerjee [3], Wolfe [8], Goff, Kennedy and Tseng [5], and the I-test by Psarris et al. [35, 36].

(a)
    int i, *p = A, *q = B+n;
    for (i = 0; i < n; i++) {
      ...
      *p++ = *--q;
      ...
    }

(b)
    int i;
    for (i = 0; i < n; i++) {
      ...
      A[i] = B[n-i-1];
      ...
    }

Figure 1.7: Array recovery of pointer arithmetic on arrays in a loop.

A linear programming method based on Fourier-Motzkin variable elimination for exact dependence testing was introduced by Dantzig [37], and later improved with an integer programming method by Pugh [38] for his Omega test. However, recent work by Psarris [39, 40], Franke and O'Boyle [41], and Wu et al. [42], and earlier work by Shen, Li, and Yew [43], Haghighat [12], and Collard et al. [44], mention the difficulty dependence analyzers have with nonlinear symbolic expressions, pointer arithmetic, and conditional control flow in loop nests.

The presence of pointer arithmetic in a loop nest introduces aliasing problems that hamper or disable data dependence analysis [45]. Array recovery [41] can be an effective method to convert pointer accesses to array accesses in a loop nest. Figure 1.7(a) shows a loop with pointer accesses that are recovered as arrays in Figure 1.7(b). The resulting array accesses can then be analyzed using conventional data dependence testing methods that operate on the closed-form index expressions of array accesses. The analysis of pointer updates in a loop nest can be viewed as a special form of IV recognition. Pointer Access Descriptions (PADs) [17] are canonical CR forms that capture the pointer-based access functions defined over the iteration space of the loop nest. When the CR forms of pointers have closed forms, the closed-form conversion produces array accesses similar to array recovery. Because this dissertation focuses on a wider set of loop-variant variables, and pointer arithmetic in loops can be viewed as a form of IV manipulation, the Inductive SSA form is also directly applicable to methods for array recovery and data dependence analysis on pointers.
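As a small worked reading of Figure 1.7 in this CR/PAD notation (added here for illustration), the two pointer accesses over the loop index i are:

    p   (store target): {A, +, 1}        = A + i          ⇒  A[i]
    --q (load source):  {B+n−1, +, −1}   = B + n − 1 − i  ⇒  B[n-i-1]

Converting the closed forms back to subscripted accesses yields exactly the loop of Figure 1.7(b).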

The challenge is to derive bounds on array index expressions and pointer accesses when these expressions contain loop-variant variables that are conditionally updated. Once the bounds are found, they can be provided to a dependence test, such as the CR-based dependence test described and evaluated in [20]. This nonlinear dependence test is based on the Banerjee bounds test [3], also known as the Extreme Value Test (EVT). The test computes direction vector hierarchy information by performing symbolic subscript-by-subscript testing for multidimensional loops. The test builds the direction vector hierarchy by solving a set of dependence equations. Our nonlinear EVT can determine absence of dependence for a larger set of dependence problems than the standard EVT, by including common nonlinear forms. The standard EVT and most other tests require affine closed forms, while our EVT can handle conditional induction variable updates, pointer arithmetic, and polynomial IVs. Array dependence testing is essentially a mechanism that requires the detection of cross-iteration congruences of the array accesses. The root idea of CR-based dependence testing is to replace a dependence equation with a CR form. Consider the general form of a dependence equation for an n-dimensional loop nest:

    a1 x1 + a2 x2 + a3 x3 + ... + an xn + b = 0

over loop indices xi, where the coefficients ai are determined from the array index expression pairs to be tested. The EVT computes a lower and an upper bound for the left-hand side of the dependence equation and then checks whether zero lies in the interval. If the interval does not include zero, no dependence exists. In our CR-enhanced EVT [20] we replace the terms in the dependence equation with the CR forms Φ for each index expression:

    Φ1 + Φ2 + Φ3 + ... + Φn + b = 0

and then apply the CR bounds analysis to determine whether zero is included in the solution interval. The Φ terms can be nonlinear instead of just affine, and the CR simplification can lead to many more cancellations, which helps to prove that zero is not included in the equation's range. In this way, we can apply dependence testing directly on the CR forms of affine and nonlinear array index expressions; see [20, 21] for details.
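To make the bounds check concrete, here is a small worked instance added for illustration (the subscripts and loop bounds are chosen only for this example): test the pair A[i+10] (write) and A[i] (read) in a loop with 0 ≤ i ≤ 9.

    Dependence equation:   x1 − x2 + 10 = 0,   with 0 ≤ x1, x2 ≤ 9
    EVT bounds of the left-hand side:   minimum 0 − 9 + 10 = 1,   maximum 9 − 0 + 10 = 19
    Zero is not in [1, 19], so no dependence exists.

In the CR-enhanced test described above, the index terms are replaced by their CR forms {10, +, 1} and {0, +, 1} and the CR bounds analysis is applied to the simplified difference; the same reasoning then carries over to nonlinear index expressions for which the affine EVT is not applicable.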

To apply dependence testing on array expressions containing flow-sensitive loop-variant variables and on pointers for array recovery, a more powerful form of analysis is needed that takes the variable updates on all branches in a loop nest into account. The result of Inductive SSA analysis is a set of CR forms, each describing the behavior of the variable in a branch of the loop. In the past, straightforward approaches have been used to trace all updates along all execution paths in a loop nest, as in [21], but this leads to an exponential complexity. A less costly algorithm is needed to classify flow-sensitive loop-variant variables as a prerequisite to array dependence testing. This is where the dissertation work on flow-sensitive loop-variant analysis can help to improve dependence testing results.

1.2 The Inductive SSA Concept

In this dissertation we present the underlying theory and algorithms to construct Inductive SSA, an approach to improve loop-variant variable analysis and the optimizations related to congruence detection by composing inductive proof structures over SSA variables in linear time (linear in the size of the SSA graph). Many classic compiler analyses and optimizations are enhanced by the use of the SSA form, such as IV recognition, Constant Propagation (CP), LICM, CSE, GVN, and PRE. However, the SSA form and popular extensions such as Gated SSA (GSA) [46, 6] only provide structural information related to the use-def chains of scalar variables (and predicated forms in GSA). By contrast, in Inductive SSA each SSA variable in a loop, or embedded within a set of loops, is bound to a list of canonical forms that describe the variable's updates across all execution traces in the enveloping loops. Congruent variables are then proven to have equivalent values under structural isomorphism, i.e. when they can be mapped to the same canonical structures.

Our canonical representation for inductive proofs in the Inductive SSA form is based on the CR form [47]. We chose the CR form because of its algebraic properties, which are particularly amenable to linear and nonlinear IV methods [48, 49]. In [50, 49], IV recognition and manipulation were improved based on CR forms, because the CR approach is able to capture the semantic value progressions of the induction variables. In [50], Van Engelen proved that the CR algebra implemented as a Term Rewriting System (TRS) is complete (both confluent and terminating). Therefore, any rule application order will result in a unique CR normal form. Because CR forms are normal forms for loop-variant variables, the value progressions revealed by the CR forms represent the real semantic behavior of the variables. For this dissertation research we extended the CR algebra to the CR# algebra (CR-sharp algebra), which represents flow-sensitive loop-variant variables and their value ranges more effectively.
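As a small illustration of the rewriting involved (the rules used are of the kind listed later in Table 2.1; the particular expression is chosen only for this example), consider simplifying ({0, +, 1} + {2, +, 3}) ∗ 2 in two different orders:

    ({0, +, 1} + {2, +, 3}) ∗ 2  →  {2, +, 4} ∗ 2          →  {4, +, 8}
    ({0, +, 1} + {2, +, 3}) ∗ 2  →  {0, +, 2} + {4, +, 6}  →  {4, +, 8}

Both orders reach the same normal form {4, +, 8}, describing the value sequence 4, 12, 20, ..., which is what confluence guarantees.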

It is proven in Chapter 3 that the CR# algebra implemented as a TRS is also complete. Based on the experiments reported in this dissertation, we found that the use of the CR# algebra in Inductive SSA captures more recurrence relations than the CR algebra and than existing state-of-the-art induction variable recognition techniques. By associating CR# normal forms with each operation in the SSA graph, matching of semantically equivalent variables and expressions is made simpler and more effective.

More specifically, an inductive proof structure for an Inductive SSA variable is a three-component canonical description of the variable's value sequence in a loop:

• Execution trace information. Some inductive proof structures are only valid along a specific execution path in the loop.

• The base value of the SSA variable, i.e. the start value of the variable in the loop’s iteration space.

• A compact representation of an update operation, consisting of an operator and an update value, so that the next value of the variable in the loop can be derived. If the update value is variable, then the update itself is recursively derived by an inductive proof.

We augment this representation with decision path information as follows:

    {initial value, ⊙, update value}path

where the value coefficients are expressions over SSA variables or nested proofs (CR# forms), ⊙ is an arithmetic operator, and the path subscript spans a sequence of blocks from the loop header to the back edge and then back up through the decision φ-nodes to the loop header. An execution path may contain ∗ to denote any path segment. The proof structures are constructed by a two-stage traversal over the SSA graph. Consider the example shown in Figure 1.6(a). The fully annotated Inductive SSA form is shown in Figure 1.6(b). To illustrate the inductive proof construction process and the power derived from the extended SSA form, we analyze the proof of the equivalence a1 = b1. The inductive proof for i2 is constructed from the loop header φ-node in block B2 by observing that

i2 = φ(i1, i3) = φ(0, i2 + 1)

This recurrence equation is the basis for the inductive proof structure for i2, which in canonical form is the CR# form {0, +, 1}2→∗. The form describes a simple basic IV with initial value 0 and unit stride with respect to the execution path starting at B2. Traditional IV methods solve the recurrence relation to derive the characteristic function, if it exists [4, 12]. Instead, we use CR#-based forms to eliminate the expense and limitations of a recurrence solver.

Variables j2 and k2 are more interesting, since they are updated in branches and do not conform to any regular IV pattern. The proof structure j2 ↦ [{j1, +, 1}2→5→3, {j1}2→5→4] is constructed from two instantiations, the first for the branch in which the variable is updated:

    via B3:   j2 = φ(j1, j4) = φ(j1, φ(j2, j3)) ⇒ φ(j1, j3) = φ(j1, j2+1)   giving j2 ↦ {j1, +, 1}2→5→3,

and the second for the branch in which it is not updated:

    via B4:   j2 = φ(j1, j4) = φ(j1, φ(j2, j3)) ⇒ φ(j1, j2)   giving j2 ↦ {j1}2→5→4.

The proof for k2 is similar. After this first stage (proof construction), all loop-header φ-nodes have inductive proofs for their value sequences. In the second stage (proof propagation), the proofs are propagated to all other variables in the SSA graph. Now, observe that by correlating the paths and applying the CR# rewrite rules based on CR [49, 47] to simplify the propagated proof structures, we obtain the canonical forms for a1 and b1 as follows:

    a1 = i3 − n1
       ↦ [{1, +, 1}2→∗ − n1]
       ↦ [{1−n1, +, 1}2→∗]

    b1 = j4 − k4
       ↦ [{j1+1, +, 1}2→5→3, {j1}2→5→4] − [{j1+n1}2→5→3, {j1+n1−1, +, −1}2→5→4]
       ↦ [{j1+1, +, 1}2→5→3 − {j1+n1}2→5→3, {j1}2→5→4 − {j1+n1−1, +, −1}2→5→4]
       ↦ [{1−n1, +, 1}2→5→3, {1−n1, +, 1}2→5→4]
       ↦ [{1−n1, +, 1}2→∗]

Hence, a1 and b1 are congruent. This example demonstrates the key aspects of our approach. Execution path-dependent canonical proof structures are automatically constructed. Congruent variables have equivalent canonical proofs under structural isomorphism. Similarly, the loop in Figure 1.1(a) was converted to Inductive SSA form in Figure 1.1(b).

For the SSA variables of variable g in Figure 1.1(a), the identical CR forms of g2, g3, and g4 reveal that variable g is updated identically in the two branches. Therefore, matching of CR forms establishes that variable g is a linear IV, which most IV recognition methods fail to recognize. Furthermore, although the right-hand side expressions of the variables g2, g3, g4, x1, y1, and z1 in Figure 1.1(b) are syntactically different, the identical CR forms of these variables reveal that they have the same semantic value progression in the loop. Therefore, they are semantically equivalent, which will not be detected by classic expression-based compiler optimizations that only recognize syntactic equivalence. For example, the state-of-the-art PRE algorithm will fail to recognize the semantic redundancies in Figure 1.1.

Based on our earlier research and experiments with dependence testing with CR forms [20, 21] and IV classification [51, 52, 53], we present convincing arguments and further experiments in this dissertation that Inductive SSA provides a solid basis for a unified framework to generalize flow-sensitive loop-variant variable analysis and to improve array dependence testing and many classic optimizations in the context of loops at the SSA level, such as CP, LICM, IV removal, CSE, GVN, and PRE. To make the Inductive SSA form applicable to general-purpose compilers, the performance of the algorithm that extends the SSA form to Inductive SSA is very important. To limit the overhead of Inductive SSA construction, a fast algorithm that is linear in the size of the SSA region of a loop nest was developed and will be presented in this dissertation.
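The construction algorithm is presented in Chapter 4; the following self-contained C program is only a much-simplified sketch of the two-stage flavor described above (it is not the Chapter 4 algorithm). It derives a CR form {init, +, step} for each loop-header φ-node whose update is a constant increment (stage 1) and propagates the forms to dependent statements (stage 2); real Inductive SSA additionally handles products, conditional update paths, choice sets, and nested loops.

    #include <stdio.h>

    enum kind { PHI, ADD_CONST };

    struct stmt {
        const char *name;     /* defined SSA variable                           */
        enum kind   k;
        int         init;     /* PHI only: the value entering the loop          */
        int         operand;  /* index of the SSA variable used on the rhs      */
        int         c;        /* ADD_CONST only: the constant increment         */
        int         cr_init, cr_step, has_cr;  /* derived CR form {init,+,step} */
    };

    int main(void) {
        /* Loop body: i1 = phi(0, i2); j1 = phi(5, j2); i2 = i1 + 1; j2 = j1 + 1; x = i2 + 3 */
        struct stmt s[] = {
            { "i1", PHI,       0, 2, 0, 0, 0, 0 },
            { "j1", PHI,       5, 3, 0, 0, 0, 0 },
            { "i2", ADD_CONST, 0, 0, 1, 0, 0, 0 },
            { "j2", ADD_CONST, 0, 1, 1, 0, 0, 0 },
            { "x",  ADD_CONST, 0, 2, 3, 0, 0, 0 },
        };
        int n = (int)(sizeof s / sizeof s[0]);

        /* Stage 1 (proof construction): a loop-header phi-node d = phi(init, u)
         * whose update u is "d + c" yields the CR form {init, +, c}.           */
        for (int d = 0; d < n; d++) {
            int u = s[d].operand;
            if (s[d].k == PHI && s[u].k == ADD_CONST && s[u].operand == d) {
                s[d].cr_init = s[d].init;
                s[d].cr_step = s[u].c;
                s[d].has_cr  = 1;
            }
        }
        /* Stage 2 (proof propagation): {b, +, st} + c simplifies to {b+c, +, st}.
         * One forward pass suffices here because uses follow definitions.       */
        for (int d = 0; d < n; d++) {
            int u = s[d].operand;
            if (s[d].k == ADD_CONST && s[u].has_cr) {
                s[d].cr_init = s[u].cr_init + s[d].c;
                s[d].cr_step = s[u].cr_step;
                s[d].has_cr  = 1;
            }
        }
        for (int d = 0; d < n; d++)
            if (s[d].has_cr)
                printf("%-2s -> {%d, +, %d}\n", s[d].name, s[d].cr_init, s[d].cr_step);
        return 0;
    }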

1.3 Automatic SIMD Vectorization of Chains of Recurrences

This dissertation also presents a method and toolset to speed up closed-form function evaluations over grids by vectorizing the CR forms constructed for common math functions [54]. Since the CR formalism provides an efficient means of representing functions over unit-distant grids, CR-based techniques are effective at decreasing the operation counts and reducing the instruction latencies of repeated function evaluations over regular grids.
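To illustrate what a CR form buys for grid evaluation, the following C program is a generic forward-differencing sketch added here (it is not code produced by the library generator of Chapter 7): the cubic f(x) = x^3 on the grid x = i*h is advanced with three additions per point instead of re-evaluating the closed form.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        const int    n = 8;
        const double h = 0.25;
        /* CR form of f(x) = x^3 on the grid x = i*h, i = 0..n-1:
         * {0, +, h^3, +, 6*h^3, +, 6*h^3} (classic forward differences of a cubic). */
        double cr0 = 0.0, cr1 = h*h*h, cr2 = 6*h*h*h, cr3 = 6*h*h*h;
        for (int i = 0; i < n; i++) {
            double exact = pow(i * h, 3.0);          /* closed-form evaluation      */
            printf("i=%d  CR=%.6f  closed form=%.6f\n", i, cr0, exact);
            cr0 += cr1;                              /* three additions advance the */
            cr1 += cr2;                              /* CR to the next grid point   */
            cr2 += cr3;
        }
        return 0;
    }

The roundoff behavior of such strength-reduced floating-point recurrences is exactly the safety concern discussed next, which the auto-tuned library addresses.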

The approach is comparable to aggressive strength reduction and short-vector vectorization of the resulting scalar updates. However, strength reduction of floating point operations is usually prevented in a compiler due to safety issues related to floating point roundoff. In addition, strength-reduced loops are not trivially vectorizable at the loop level. I present a high-level algebraic approach to restructure functions into vector CR forms for efficient execution over a set of discrete grid points. The high-level algebraic rewrite of a function to Vector CR (VCR) form consists of a decoupling method to translate math functions. The VCR coefficients are packed in short vector registers for efficient execution. This approach effectively reduces the instruction counts for short-vector SIMD execution of functions over grids. As a result, our method outperforms separately optimized scalar CR forms and SIMD-vectorized closed-form function evaluation. Our results show that the VCR code is much faster than the vector code generated by the Intel compiler and the short vector math library (SVML). VCR can also significantly increase the Instruction Level Parallelism (ILP) exposed in modulo scheduling [55], resulting in faster kernels for RISC and VLIW processors.

In general, CR optimization as a form of aggressive strength reduction can introduce floating point roundoff errors that are propagated across the iteration space [56]. Error analysis is required to ensure that the roundoff errors introduced in CR-based evaluation methods are bounded. To address error propagation, we combined our VCR method with an auto-tuning approach. The VCR code is run to determine its error properties and to select an optimal vector length for performance. This approach ensures optimal performance with high floating point accuracy.

1.4 Thesis Statement

The thesis statement for my dissertation research is:

The Inductive SSA form, constructed with algorithms based on the CR# algebra, improves flow-sensitive loop-variant variable analysis and increases the effectiveness of classic compiler optimizations that rely on (induction) expression matching and manipulation.

1.5 Dissertation Outline

The remainder of this dissertation is organized as follows. Chapter 2 introduces the preliminaries of the static single assignment form, followed by a review of current induction variable analysis methods and several important compiler optimizations related to loop-variant variables (expressions); the basic CR formalism is then introduced. Chapter 3 presents the theory, which includes the CR# algebra and Inductive SSA: a new operator # together with new algebraic simplification rules for CR forms containing the new operator are given, the CR# alignment and bounds mechanism with its new lemmas and corresponding proofs is presented, and the notation of Inductive SSA is introduced. Chapter 4 presents the linear-time Inductive SSA construction algorithms and a unified compiler analysis and optimization framework based on Inductive SSA; the algorithms that generalize loop-variant variable recognition and classification based on the CR forms attached in Inductive SSA are given, and several improved classic optimizations related to loop-variant variables (expressions) with Inductive SSA are presented. Chapter 5 discusses the implementation of Inductive SSA and the corresponding optimization algorithms in GCC 4.1. Chapter 6 discusses the experimental results. Chapter 7 presents the automatic SIMD vectorization of Chains of Recurrences to speed up closed-form function evaluations over grids; in this chapter, the VCR notation and the decoupling method that translates math functions into VCR forms are introduced, followed by the auto-tuning approach and performance results. Finally, Chapter 8 summarizes the dissertation.

CHAPTER 2

PRELIMINARIES AND RELATED WORK

This chapter presents preliminaries of the static single assignment form, followed by a survey of induction variable analysis methods and a review of classic compiler optimizations. Finally, the Chains of Recurrences formalism is introduced.

2.1 Static Single Assignment Form

The Static Single Assignment (SSA) form [1, 57] is an Intermediate Representation (IR) of program code used by a compiler, in which each use of a variable has a single definition point. Each definition in a program in SSA form has a unique name, created by attaching an index to the variable at each definition point to create a new version of that variable. Thus, a program in SSA form has explicit use-def information, which is very helpful to compiler optimizations that need to gather accurate data flow information from the program. When multiple control flow paths meet before a use, SSA inserts a special join function, called a φ-function, at the point where the control flow paths merge.

For example, Figure 2.1(a) shows a high-level program fragment and Figure 2.1(b) its corresponding SSA form. A unique new subscript index for a variable is introduced in each assignment statement. When this variable is used in a later statement, it is replaced with an indexed copy of that variable that identifies the single definition by its name. Also notice that there are two assignments to variable c that could reach the use of variable c in the last statement: one outside of the if-statement and the other inside the if-statement. To handle the merging of definitions, SSA inserts a φ-function at the point where the control flow paths that carry the variable definitions meet. In this case, a new SSA variable c3 is created for c by means of a φ-function at the merge point. Loop-carried definitions likewise yield the placement of φ-functions that merge the initial definition of a variable with the last definition in the loop body.

(a)
a = 2
b = 3
c = 4
if . . . then
  b = b * a
  d = b + a
  c = b + d
endif
b = b + 1
c = c + 1
...
i = 0
for (. . . )
  i = i + 1
  ...
endfor

(b)
a1 = 2
b1 = 3
c1 = 4
if . . . then
  b2 = b1 ∗ a1
  d1 = b2 + a1
  c2 = b2 + d1
endif
b3 = φ(b1, b2)
c3 = φ(c1, c2)
b4 = b3 + 1
c4 = c3 + 1
...
loop:
  i1 = φ(0, i2)
  ...
  i2 = i1 + 1
endloop

Figure 2.1: Example high-level program fragment and its SSA form.

a variable with the last definition in the loop body. For example, the loop header φ for i1 in Figure 2.1(b) merges the initial value of variable i and the last definition in the loop, which are 0 and i2, respectively.

2.1.1 SSA Construction

To convert a program into SSA form, φ-functions are inserted and the variables are renamed (subscripted). An efficient algorithm for SSA construction is introduced in [57]. This algorithm determines the location of the φ-functions for each variable in the program and is based on the notion of the dominance frontier.

• Dominance relation – a node A strictly dominates a different node B in the control flow graph if node A is on every path from program entry to node B. A dominates B if either A strictly dominates B or A = B.

• Dominance frontier – the dominance frontier of a node A is the set of nodes Z such that A does not strictly dominate Z, but does dominate some immediate predecessor of Z. The dominance frontier of a region of code gives the precise places to insert φ-functions. Using the dominance frontiers, the locations of the φ-functions for each variable in the original program are first determined, and then the variables are renamed by replacing each variable with a subscripted version. A small sketch of this φ-placement step follows.
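To make the φ-placement step concrete, here is a minimal sketch in C (an illustration only, not the complete renaming algorithm of [57] and not GCC's implementation). It assumes a hard-coded three-block CFG whose dominance frontiers have already been computed, and inserts φ-functions for a single variable by iterating over the frontiers of its definition blocks until a fixed point is reached.

/* Iterated dominance-frontier phi-placement for one variable c.
 * Assumed CFG: B0 (entry) -> B1 (then-branch) -> B2 (merge), and B0 -> B2,
 * so DF(B0) = DF(B1) = {B2} and DF(B2) = {} (precomputed by assumption). */
#include <stdio.h>

#define NBLOCKS 3

static int df[NBLOCKS][NBLOCKS] = {    /* df[b][z] = 1 iff z is in DF(b) */
    {0, 0, 1},
    {0, 0, 1},
    {0, 0, 0},
};

int main(void) {
    int defs[NBLOCKS]    = {1, 1, 0};  /* blocks containing a def of c */
    int has_phi[NBLOCKS] = {0, 0, 0};
    int work[2 * NBLOCKS], n = 0;

    for (int b = 0; b < NBLOCKS; b++)  /* seed worklist with the def blocks */
        if (defs[b]) work[n++] = b;

    while (n > 0) {                    /* iterate to a fixed point */
        int b = work[--n];
        for (int z = 0; z < NBLOCKS; z++) {
            if (df[b][z] && !has_phi[z]) {
                has_phi[z] = 1;
                printf("insert phi for c at B%d\n", z);
                if (!defs[z]) work[n++] = z;  /* a phi is itself a def of c */
            }
        }
    }
    return 0;
}

On this CFG the sketch reports a single φ for c at the merge block B2, matching the c3 = φ(c1, c2) placement in Figure 2.1(b).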

2.1.2 SSA Extensions

There are several extensions to SSA mentioned in the literature, each proposing to improve the applicability of the SSA form for a particular code analysis goal. Gated SSA (GSA) [46] introduces control flow information in the φ nodes of the classic SSA form. Specialized gating functions are introduced in GSA to include the predicate of conditional branches [6]. This helps to evaluate and analyze symbolic expressions more efficiently. Our Inductive SSA form is compatible with GSA and augments the SSA form further with more precise value progression information, which can benefit the precision of the GSA form by eliminating true and false conditions. The Region Array SSA form [58] was proposed to accurately represent the use-def relations between array regions. Several new nodes are introduced to represent the control dependences and data flow relations of arrays more accurately. A symbolic representation of array regions is used to represent memory location sets uniformly and compactly for both static and dynamic analysis techniques. The work by Chow et al. [59] extends the SSA form to cover indirect pointer references. They introduced two new assignment functions χ and µ to model the MayDefs and MayUses of defs and uses that are possibly affected by pointer aliases. Their extended SSA representation is referred to as HSSA. HSSA enables indirect memory operations to be globally analyzed, which in turn benefits optimizations developed for scalar variables. This work on SSA continued with the work by Lin et al. [60]. They presented a general compiler analysis framework based on a speculative SSA form that incorporates the runtime alias profile into SSA form to facilitate data speculation. A Concurrent SSA (CSSA) form was introduced by Lee et al. [61]. It extends classical CSE, redundant load/store elimination, and loop invariant detection to parallel programs

without violating sequential consistency. Two additional confluence functions π and ψ are introduced in CSSA form to model the new confluence points due to parallel regions in the program.

2.2 Induction Variable Analysis

Induction variable updates form recurrences in loops. These recurrences must be analyzed and solved to obtain the characteristic function of the IV to determine its value sequence. The value sequence is critical when the IV is used in array indexes to determine the array access patterns in dependence testing for loop optimization. IV values are also useful to detect expression congruences for CSE, GVN, and PRE.

2.2.1 IV Classification

Induction variable analysis is crucial for compilers that perform loop optimizations. Compiler optimizations on loops depend heavily on the accuracy of IV analysis to detect loop properties and variable sequences. Most loop optimizations require IV recognition to obtain closed-form characteristic functions of array index expressions to facilitate dependence testing, and to eliminate the cross-iteration recurrences of IVs to enable loop parallelization. A subclass of IVs are variables whose successive values form an arithmetic progression over some part of a program, usually a loop. In general, IVs are divided into several types for identification:

• Fundamental or Basic IVs (BIV) – are explicitly modified by a constant amount during each iteration. Thus a BIV is a variable whose only assignments within a loop are of the form x = x ± c, where c is either a constant or a loop-invariant value.

• Linear IVs – are variables whose value is a linear function of some BIV or other linear IVs. BIVs and linear IVs are the most common IV forms recognized by compilers and have been widely discussed in the literature, e.g. [11, 1, 8, 7].

• Polynomial IVs – are variables whose value is a polynomial function of a linear IV. Polynomial sequences occur when the variable update adds a linear IV or several linear IVs to the variable.

(a)
x = 1
y = 2
do
  ...
  t = x
  x = y
  y = t
  ...
end do

(b)
a = 1
b = 1
do
  c = a + b
  a = b
  b = c
  ...
while (. . . )

Figure 2.2: Example cyclic variables in a loop. The first example performs a swap to generate sequences with period 2. The second example generates the Fibonacci sequence via the common cyclic variable update.

• Geometric IVs – are variables whose value is a geometric function of a linear IV. Geometric sequences occur when the variable update is the variable itself multiplied by some loop invariant. Polynomial and geometric IVs are referred to as Generalized IVs (GIVs) in [14, 62], and their recognition and classification are discussed in [63, 4, 15, 12, 13]. They are variables whose update value is not constant. They are commonly generated by triangular and trapezoidal loops. The most common GIVs found in loops form either polynomial or geometric sequences. Finding the closed form of GIVs is not as easily accomplished as it is with BIVs, but efficient algorithms for such a task do exist [50, 49].

• Wrap-around variables – are variables which take on the value of another IV after a fixed number n of iterations of the loop [64, 4, 15, 13]. They are called wrap-around variables of order n. Typically n = 1. Wrap-around variables with order greater than one (n > 1) will be referred to as indirect wrap-around variables.

• Cyclic variables – also called periodic variables in [4], cause periodic value sequences through the loop iterations. In Figure 2.2(a), variables x and y are simple cyclic variables with period 2, because variable x has the periodic sequence 1, 2, 1, 2, . . . and variable y has the periodic sequence 2, 1, 2, 1, . . . . Cyclic variables may also follow more complex

sequence patterns when arithmetic operations occur in the variable assignments [65, 50]. For example, in Figure 2.2(b), variables a, b, and c are non-constant cyclic variables which produce the Fibonacci sequence.

Note that IVs can be mutually dependent and have a cyclic use-def relationship even though they are not cyclic variables. Such a cyclic flow dependence in the loop can be eliminated with forward substitution. For example, variables x and y in the following code have a cyclic flow dependence.

do
  x = y + a
  y = x + b
enddo

After substitution:

do
  x = y + a
  y = y + a + b
enddo

Thus, x and y are linear IVs, where x is a copy of y offset by a constant.

Using the algorithms presented in [50, 49], y is recognized as a linear IV with initial value y0 and step a + b. Variable x is identified as an indirect linear IV with initial value y0 + a and step a + b.

• Conditionally updated loop-variant variables – are incremented or decremented in conditional statements [4, 15, 42]. Most optimizing compilers do not recognize this category of IVs (they are not IVs in the true sense). In some cases, conditionally assigned IVs may have semantically identical updates in both branches of the conditional statement. For example, in Figure 2.3(a), both branches in the loop update k by the same amount; if a compiler can recognize this using static and semantic information, the loop can be optimized. However, in Figure 2.3(b), variable k is assigned different values in the branches of the if-statement in the loop. Thus, in this example k does not have a closed-form function.

(a)
k = 0
do j = 1, n
  a[k] = . . .
  if (. . . ) then
    k = k + 1
  else
    k = j
  endif
  ...
end do

(b)
k = 0
do
  if . . .
    k = k + 1
  else
    k = k + 2
  ...
end do

Figure 2.3: Example conditionally-updated variables in a loop with branch-specific updates. The first example has a closed form for the value sequence of k. The second example does not.

In addition to these categories of IVs, we can also further classify IVs using the following characteristics:

• Direct or independent IVs – are defined as IVs whose update is not derived from any other IV. The update pattern may be linear or non-linear.

• Indirect or derived or dependent IVs – are defined as IVs whose update is derived from at least one other IV. The value of a derived IV depends on the values of a basic IV. Derived/dependent IVs are IVs whose value is a linear function of some basic IV, or variables that are updated with a polynomial sequence of values.

Figure 2.4 shows examples of loop IVs to illustrate some simple classifications of direct and indirect IVs. Based on these classifications, the example loop has a basic (independent) IV i, which is a scalar integer variable with one unconditional update. The value of variable j is updated with a linear function of variable i, so it is a derived linear IV. Because k is updated with a polynomial function and l is updated with a geometric function, variables k and l are nonlinear IVs. However, k is an indirect IV and l is a direct IV. In Figure 2.5, variable w is a wrap-around variable of the first order, because w has value m on the first iteration and then follows the linear sequence of the IV i = 0, . . . . Variable x is a second

i = 0
j = 1
k = 1
do
  ...
  i = i + 1
  j = 2 * i + 1
  k = k + i
  l = l * 2 + 1
  ...
while (. . . )

Figure 2.4: An example loop with direct and indirect IVs.

order wrap-around variable, because x takes the value of n, then m, and then i = 0, . . . after the first two iterations. In practice, IV analysis requires type information of the variables it analyzes. There is not much literature on this subject. Therefore, we add to the classification of IVs:

• Range-constrained IVs – In the context of typed IVs, we cannot simply assume that no overflow occurs in IV update operations. Thus, the value range of any IV is constrained by the value range determined by its type. An integer overflow occurs when the result of an integer operation cannot fit in the value range of the integer type. Typically the high-order bits of the result are discarded. Some processors support saturation arithmetic, in which case the result saturates instead of overflowing. A small example follows.
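As an illustration of a range-constrained IV, the following small C program (a hedged example, not tied to any particular compiler analysis) uses an 8-bit unsigned counter whose update wraps around modulo 256, so an analysis that assumed no overflow would predict the wrong value sequence.

/* An 8-bit IV whose value range is constrained by its type. */
#include <stdio.h>

int main(void) {
    unsigned char i = 250;             /* 8-bit induction variable */
    for (int step = 0; step < 10; step++) {
        printf("%u ", (unsigned)i);    /* 250 252 254 0 2 4 ... */
        i = i + 2;                     /* wraps around at 256 */
    }
    printf("\n");
    return 0;
}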

2.2.2 IV Recognition

IV recognition is a classic compiler analysis problem. There are many compiler analysis methods designed to recognize IVs. Some of them utilize some sort of pattern matching scheme [63, 1]. However, pattern-matching approaches are often ad hoc and very specific to certain patterns, and may leave many IVs unrecognized. In addition, there are also algorithms that utilize data flow graph information to detect and classify IVs [66, 67, 4, 7, 8]. We will introduce a few of those techniques in this section and also look at another scheme

i = 0
w = m
x = n
do
  ...
  . . . = a[w] + b[x]
  x = w
  w = i
  i = i + 1
  ...
while (. . . )

Figure 2.5: An example loop with a first-order wrap-around variable w and indirect wrap- around variable x of second order.

using a symbolic analyzer to find IVs [12]. In Figure 2.7, we summarize the IV recognition capabilities of these methods.

Ammerguallat and Harrison’s Technique

Ammerguallat et al. [63] use abstract interpretation and a pattern matching scheme to automatically recognize IVs in loops. Their method has two parts. First, abstract interpretation is used to construct a map that associates each stored (updated) variable with a symbolic expression of its value. To illustrate the abstract interpretation, they define a simple grammar where an abstract store is a mapping from an identifier to an expression, or subscript expression. In the second part of their method, the elements of the map explained above are unified with a recurrence template which describes some common recurrence relations. They define a method for mapping with arrays, conditionals, and iterations. Although it is not able to detect some GIVs, their technique is easily extensible and works effectively in finding BIVs.

Aho’s Algorithm

Aho’s algorithm in [11] for IV recognition is a classic algorithm. The algorithm takes as input data flow and loop invariant information of the loop, and identifies an IV using a triple that contains an initial value and the step (stride) of the IV.

The algorithm proceeds as follows. First, the algorithm finds the BIVs of the loop using loop-invariant computation information. Next, the algorithm searches for a single assignment of the form k = j op b, where b is a constant and j is an IV with triple (i, c, d). If op is a multiplication, then k has triple (i, c ∗ b, d ∗ b); if op is an addition, then k has triple (i, c, b + d). IVs are detected by this algorithm in a high-level language representation or in a compiler's IR, such as three-address code in the control flow graph (CFG) [11] or SSA form. Since an IV is always related to a loop, loop analysis algorithms detect IVs by first analyzing back edges to detect loops in the internal form. In SSA form, an IV is represented by a cycle in the SSA graph. The intuition behind this is that the value of an IV assigned in the current iteration will be used in the next iteration. Once the families of IVs have been found, loop optimizations, such as strength reduction, can be performed based on the triple information of each recognized IV. This algorithm is limited to linear IVs and geometric forms.

Cocke and Kennedy’s Algorithm

Cocke and Kennedy [66] describe an algorithm for finding IVs and performing strength reduction. Although their algorithm can find some polynomial IVs other than BIVs, the algorithm is limited to a rather restricted class of IVs and cannot find GIVs. Cocke and Kennedy utilize the CFG form and use loops in CFGs to find IVs in the IR.

A CFG, by definition, is simply a directed graph denoted by a triple (N, V, n0), where N is a set of basic blocks, V is a set of edges, and n0 is the program's unique entry node. Cocke and Kennedy's algorithm finds IVs by traversing the operations and operands of the set of instructions in the loop and eliminating those variables that do not satisfy the definition of an IV. Initially the algorithm sets the set of IVs to contain all target variables of instructions in the loop. It then notes that IVs must have the form:

x ← ±y and x ← ±y ± z, where y and z represent either region constants or other IVs. Variables are eliminated if they satisfy one of the following two criteria:

1. If x ← op(y, z) and op is not one of the operations store, negate, add, subtract, then x is not an IV.

2. If x ← op(y, z) and y and z are not both elements of IV ∪ RC, where IV is the set of IVs and RC is the set of region constants, then x is not an IV.

The algorithm iterates through the code, eliminating variables from the IV set until no more eliminations can be made. The remaining set consists of the IVs.

Gerlek, Stoltz and Wolfe’s Algorithm

Gerlek, Stoltz and Wolfe [4] present a classical IV recognition approach based on SSA forms and Factored Use-Def (FUD) chains to detect a broader class of IVs. Their first approach was based on FUD chains, which are similar to the SSA form. The primary observation in their approach is that each Strongly Connected Component (SCC) in an SSA/FUD digraph represents the cycle of operations on a variable, where the SCC spans a set of assignment and operation nodes in the graph. The SSA graph is searched to detect SCCs by a variant of Tarjan's algorithm [68]. By detecting SCCs, a compiler can examine the operations within the SCC cycle to determine the nature of the variable sequences. The method divides the value sequences of IVs into several disjoint classes based on the number and type of φ functions and arithmetic operations in an SCC cycle. Once variable sequences are classified, the corresponding solvers are applied to solve their sequence expressions. They present several solvers to detect basic, polynomial, wrap-around of order n = 1, and selected periodic IVs. In Figure 2.6, linear, quadratic, and cubic polynomial IVs are shown with their corresponding closed-form characteristic functions. The closed-form polynomial functions are derived from a polynomial IV solver. Variable h is a basic loop counter whose value is zero on the first iteration of the loop and is incremented by one at the end of each subsequent iteration.
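The closed forms attached to these IVs can be checked numerically. The following C sketch (an illustration only, not part of any solver) runs the recurrences of Figure 2.6 and compares them against the stated polynomials, with the fractions cleared so that only integer arithmetic is used:

/* Check the closed forms of Figure 2.6 against the recurrences
 * i2 = i1 + 1, j2 = j1 + i2, k2 = k1 + j2 + 1 for h = 0, 1, 2, ...  */
#include <stdio.h>

int main(void) {
    long i = 0, j = 1, k = 1;
    for (long h = 0; h < 6; h++) {
        i = i + 1;                                             /* i2 */
        j = j + i;                                             /* j2 */
        k = k + j + 1;                                         /* k2 */
        long ic = h + 1;                                       /* h + 1 */
        long jc = (h * h + 3 * h + 4) / 2;                     /* 1/2 h^2 + 3/2 h + 2 */
        long kc = (h * h * h + 6 * h * h + 23 * h + 24) / 6;   /* 1/6 h^3 + h^2 + 23/6 h + 4 */
        printf("h=%ld  i2=%ld/%ld  j2=%ld/%ld  k2=%ld/%ld\n",
               h, i, ic, j, jc, k, kc);
    }
    return 0;
}

Each line prints the recurrence value next to the closed-form value, and the two agree for every h.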

Haghighat’s Symbolic Differencing

Haghighat and Polychronopoulos [12] developed a symbolic GIV analysis framework for the Parafrase-2 parallelizing compiler. They use symbolic differencing as a framework to attain

i0 = 0
j0 = 1
k0 = 1
loop
  i1 = φ(i0, i2)
  j1 = φ(j0, j2)
  k1 = φ(k0, k2)
  ...
  i2 = i1 + 1
  j2 = j1 + i2
  k2 = k1 + j2 + 1
endloop

h  = {0, 1, 2, 3, . . .}
i2 = {1, 2, 3, 4, . . .},   h + 1
j2 = {2, 4, 7, 11, . . .},  1/2 h^2 + 3/2 h + 2
k2 = {4, 9, 17, 29, . . .}, 1/6 h^3 + h^2 + 23/6 h + 4

Figure 2.6: Example IVs in a loop in SSA form and their corresponding closed-form characteristic functions.

larger solution spaces for program analysis problems than traditional methods. Running the Perfect Club Benchmarks suite through the Parafrase-2 parallelizing compiler, they demonstrate improvement in the solution space for code restructuring optimizations that promote parallelism. The basis of their approach is to use abstract interpretation to detect generalized IVs. They detect GIVs in loops by symbolically differencing the values of each variable through symbolically executing loop iterations and using Newton’s interpolation formula to find the closed-form characteristic polynomial function. Their work is a very powerful method for IV recognition and can handle conditionally assigned IVs with the restriction that the assignments must be (semantically) identical. However, no path search algorithm is given to find the conditionally-assigned variable updates. Furthermore, the method can result in incorrectly transformed programs [49].

Van Engelen’s CR-Based Methods

In [49, 32, 33, 21] van Engelen et al. presented a novel method and algorithms to effectively analyze GIVs including factorials and exponentials, pointer arithmetic and conditional updates in complex loop nests based on the CR algebra [69, 47]. The approach is more efficient compared to the other approaches that require abstract interpretation or symbolic differencing and can be applied to high-level code or intermediate forms such as SSA. This

method can handle multi-dimensional loops and detect generalized IVs that form polynomial and geometric progressions through loop iterations from the coupled IV update statements. The CR algebra is used to rewrite the coupled IVs into CR form and then simplify the resulting CR expressions to determine their functional equivalents. The CR IV analysis method is an effective means to improve loop optimization and data dependence testing, since it provides more accurate information about GIVs, which is the crucial factor for both. In [70], a nontrivial extension CR# of the CR algebra with “delayed” recurrences was introduced to simplify the manipulation of recurrences of loops in the presence of direct and indirect wrap-around variables; it makes these algorithms substantially more powerful, yet simple to implement and efficient.

Berlin’s SSA Algorithm Based on van Engelen’s earlier work [49], Berlin et al. [48, 71] developed an algorithm for detecting polynomial IVs using a restricted form of CRs by analyzing scalar variables in SSA forms of loops. The basic idea of their approach is to combine the SCC analysis by Gerlek et al. [4] on the SSA graph and the CR formalism to construct CR forms for IVs. The SCCs are the basis for constructing CR forms for the linear and polynomials IVs using an SSA-based algorithm that is similar to the CFG- based algorithm in van Engelen [49]. They implemented this algorithm in GCC 4.x for data dependence testing and loop vectorization. However, their implementation has several restrictions. In particular, they do not support the wider range of GIVs and variables that may not have closed forms. Furthermore, they do not support scalar analysis in non- rectangular loops. 2.3 Classic Compiler Optimizations

Several important compiler optimizations are introduced in this section, including IV-related optimizations and redundancy elimination optimizations.

2.3.1 IV Removal and IV Substitution

IV removal (also referred to as IV elimination) [11] eliminates useless linear IVs in a loop if the loop contains two or more linear IVs. IV removal can improve the performance of the program by removing unnecessary computations. The traditional IV removal methods [11, 1] only compare linear IVs to eliminate useless IVs.

Author(s)                   | Methods                                                                        | IV Recognition Capability
Ammerguallat et al.         | Abstract interpretation and a pattern matching scheme                          | BIVs and some GIVs
Aho                         | Identify linear IV with a triple (var, step, initial value)                    | Linear IVs
Cocke, Kennedy              | Based on loops in the CFG                                                      | Linear and polynomial IVs
Gerlek, Stoltz, Wolfe       | Demand-driven sequence classification based on SCC construction from SSA       | GIVs using specialized solvers
Haghighat, Polychronopoulos | Symbolic differencing                                                          | GIVs, powerful but not safe
Van Engelen et al.          | CR algebra on high-level code or CFG (also pointers and conditional variables) | GIVs and mixers with a unified representation
Berlin et al.               | Detect SCC on the SSA graph using a restricted form of CRs                     | Linear and polynomial IVs

Figure 2.7: Overview and summary of IV recognition methods found in the literature.

(a)
j = 0;
for (i = 0; i < n; i++) {
  A[i] = B[j] + 1;
  j = j + 1;
  ...
}

(b)
for (i = 0; i < n; i++) {
  A[i] = B[i] + 1;
  ...
}
j = 0;

Figure 2.8: IV removal of IV j from an example loop. Because IV j is identical to IV i it can be removed and its uses replaced by i.

For example, there are two linear IVs i and j in Figure 2.8(a). Since variables i and j have the same value sequences, one can be eliminated; the optimized code after performing IV removal is shown in Figure 2.8(b). IV Substitution (IVS) replaces an IV update with its closed-form function over the loop index variables. IVS is often applied to eliminate the cross-iteration dependencies of IV updates and thus to enable the parallelization of a loop nest. For example, the variable k in Figure 2.9(a) is a linear IV. Due to the loop-carried scalar data dependence on variable k,

(a)
k = 1;
for (i = 1; i < n; i++) {
  k = k + 10;
  A[k] = 0;
  ...
}

(b)
k = 1;
for (i = 1; i < n; i++) {
  k = 1 + 10 * i;
  A[k] = 0;
  ...
}

Figure 2.9: IV substitution of IV k by a function of the loop variable i.

the loop cannot be parallelized. After IV substitution, the update of k in the loop is replaced with a closed-form function of loop index variable i in Figure 2.9(b). Now the loop-carried dependence is removed and the loop can be parallelized.

2.3.2 Strength Reduction

Strength reduction (SR) is an optimization technique that substitutes expensive operations with equivalent but faster ones [11, 72]. The classic example replaces multiplication operations with addition operations in a loop; this case often appears in loop nests with arrays indexed by IVs, since such multiplications are frequently introduced by the compiler as part of array element address translation. Since a multiplication usually takes longer than an addition, SR is very important because this case arises routinely and SR benefits any loop in which it does. Even on a microprocessor where multiplication and addition take equal time, SR still benefits the loops in a program, since strength reduction often decreases the total number of operations in a loop and creates opportunities for other loop transformation methods, such as IV elimination, to reduce the number of IVs used in a loop. For example, the multiplication operation in Figure 2.10(a) is strength-reduced to addition operations in Figure 2.10(b). Several classic SR methods are presented in the literature.

(a)
for (i = 1; i < n; i++) {
  j = 3 × i;
  ...
}

(b)
j = 0;
for (i = 1; i < n; i++) {
  j = j + 3;
  ...
}

Figure 2.10: Loop strength reduction of linear expression 3 × i by a linear IV j.

Allen, Cocke and Kennedy’s Algorithm

Allen, Cocke and Kennedy [72] present a classic SR approach focusing on loops formed by Strongly Connected Regions (SCRs) in a CFG. Their algorithm starts from the most deeply nested loop and works its way outward. The method identifies candidate operations that can be reduced. Two sets, the region constant set RC(SCR) and the IV set IV(SCR), with respect to an SCR are computed for this purpose. A worklist of instructions that are candidates for reduction is then initialized based on these two sets. Candidate instructions are of the following forms: x ← i × j, x ← j × i, x ← i ± j, x ← j + i, where i ∈ IV(SCR) and j ∈ RC(SCR). The algorithm then repeatedly removes an instruction from the worklist and creates a temporary variable to hold the value that it computes. The candidate instruction is then replaced with a copy operation from the temporary name associated with the expression on the candidate's right-hand side. Next, the algorithm inserts instructions that compute the value of that temporary by following the use-def chains for each of the operands of the expression. If the operation of the expression is a multiply, a new instruction with an addition operation instead of the multiplication is inserted in the proper place based on the operation and operand information provided by the use-definition chains. The algorithm is limited to a restricted form of candidate instructions for strength reduction, and some new candidate instructions that appear (by this definition) as the algorithm proceeds may go unrecognized.

Cooper, Simpson, and Vick's Algorithm

Cooper, Simpson, and Vick [67] developed a new loop SR algorithm which is driven by a depth-first search of the SSA data flow graph. Their algorithm improves on Allen's algorithm by applying techniques that take advantage of an SSA representation of the program. They first identify IVs and region constants by examining each SCC in the SSA graph. If all of the updates in one SCC have the allowed forms (i.e., IV + RC, IV − RC, a copy operation, or a φ-function), the SCC is an IV. Once the algorithm identifies an SCC, it starts to check the operations in the SCC. If the SCC contains a single operation, they try to reduce it. If it can be reduced, a new SSA IV is created, and this allows further reduction of operations using this new IV. If the SCC contains more than one operation, they try to determine whether it is an IV, and perform the appropriate reductions for each member of the SCC if it is not an IV. This algorithm operates on the SSA form and produces results quite similar to those produced by Allen, Cocke and Kennedy, but is easier to implement. However, the candidate instructions for SR in their algorithm are limited to the following category: x ← iv × rc, x ← rc × iv, x ← iv ± rc, x ← rc + iv, where iv ∈ IV(SCC) and rc ∈ RC(SCC). Furthermore, the operations that are strength reduced are limited to the class of linear functions.

Kennedy and Chow’s Algorithm

Kennedy and Chow [73] describe an algorithm for SR in the context of PRE. Their algorithm can handle more instructions for SR than the algorithms described above. Kennedy et al. [74] developed a new algorithm to perform PRE in SSA form, called SSAPRE. They made extensions that enable SSAPRE to perform SR in their framework. They regard expressions of the form a ± b, −a, and a × b as strength reduction candidates. For a × b, the algorithm requires that the assignment to a must be an incremental update by a constant or loop-invariant amount. The increment to a2 injures the computation a1 × b1 in the following SSA form.

(a)
a = 1;
for (i = 0; i < n; i++) {
  A[3 + a] = . . . ;
  ...
}

(b)
for (i = 0; i < n; i++) {
  A[4] = . . . ;
  ...
}

Figure 2.11: Constant propagation of a = 1 in an example loop.

a1 × b1
   ↓
a2 = a1 + 1
   ↓
a2 × b1

The injured computation can be repaired by incrementally updating the expression temporary t by a corresponding amount, so that the repaired value can be used at the second computation a2 × b1 instead of recomputing it. They modified their PRE framework so that the phases of PRE recognize the strength reduction candidates by identifying the injuring instructions; injury repairs are then generated by creating temporary variables that are updated by the corresponding increment amount for strength reduction. Their method does not require control flow analysis, so strength reduction applies also to straight-line code and non-IVs. The drawback of their method is that they cannot handle mutually defined IVs.
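The repair step can be illustrated with a tiny C example (a hedged sketch of the idea only, not the SSAPRE implementation): instead of recomputing a × b after a is incremented, the expression temporary t is bumped by b.

/* "Injury repair": keep t equal to a * b across an increment of a. */
#include <stdio.h>

int main(void) {
    long a = 3, b = 7;                 /* arbitrary values */
    long t = a * b;                    /* first computation a1 * b1 */
    a = a + 1;                         /* the increment "injures" t */
    t = t + b;                         /* repair: t == a2 * b1 without a multiply */
    printf("t = %ld, a * b = %ld\n", t, a * b);
    return 0;
}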

2.3.3 Constant Propagation

Constant propagation (CP) is an optimization that is frequently applied in compilers; it supports other optimizations by exposing constants in the program. Constants assigned to a variable can be propagated through the flow graph and substituted at the uses of the variable at compile time. As shown in Figure 2.11, variable a in the array index expression in Figure 2.11(a) is replaced with the constant value it holds in Figure 2.11(b). CP is a well-known data flow analysis problem and corresponding algorithms for data flow graphs are discussed in [2, 1]. Wegman et al. [2] summarized several well-known CP algorithms and developed a classic CP algorithm that exploits the properties of the SSA form. The algorithm uses a three-level lattice to describe the value of a variable node. The

highest level is ⊤, which means that the variable may be some undetermined constant. The lowest level is ⊥, which means a constant value cannot be guaranteed. The middle level is any constant. The algorithm starts by assigning ⊤ to all variable nodes and uses two worklists: FlowWorkList, which contains control flow graph edges and is initialized to the edges out of the entry node, and SSAWorkList, which contains SSA edges and is initialized to the empty set. The algorithm then proceeds by taking an edge e from a worklist. During this process, constants are propagated when visiting nodes via edges. If the node is an assignment and the right-hand side of the assignment evaluates to a constant, the lattice value of this variable node is set to that constant and all the SSA edges coming out of the assignment are added to the worklist so that the uses of this variable can be visited. If the node is a conditional statement, the outgoing edges are marked as executable or not executable depending on the predicate value of that statement. If the node is an SSA φ-node, the meet operation is applied to set the lattice value of this node. All successors of the current SSA edges and CFG edges are added to the worklist. The algorithm stops when both worklists are empty. The algorithm is efficient in both time and space, with a complexity of O(|SSA edges| + |flow edges|). Sparse Conditional Constant Propagation (SCCP) [2] enhances CP by taking deterministic branches into account. It is efficiently implemented in SSA. The SCCP method is based on lexical and structural SSA information to utilize the deterministic outcome of conditions that turn true or false as a result of the constant propagation. When conditions are known, this may further affect the propagation of constants, e.g. when a branch is always taken, the constants assigned to variables can be propagated [1].
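The three-level lattice and its meet operation can be sketched in C as follows; this is a hedged illustration of the lattice idea only (the names Level, LatticeVal, and meet are made up for the example, and the full SCCP worklist machinery is omitted).

/* Three-level constant-propagation lattice: TOP, a constant, or BOTTOM. */
#include <stdio.h>

typedef enum { TOP, CONSTANT, BOTTOM } Level;
typedef struct { Level level; long value; } LatticeVal;

/* Meet of two incoming lattice values at a phi node. */
static LatticeVal meet(LatticeVal a, LatticeVal b) {
    if (a.level == TOP) return b;
    if (b.level == TOP) return a;
    if (a.level == CONSTANT && b.level == CONSTANT && a.value == b.value)
        return a;                          /* same constant on both edges */
    return (LatticeVal){ BOTTOM, 0 };      /* conflicting or unknown values */
}

int main(void) {
    LatticeVal x = { CONSTANT, 4 };        /* edge 1 carries the constant 4 */
    LatticeVal y = { CONSTANT, 4 };        /* edge 2 carries the constant 4 */
    LatticeVal z = meet(x, y);
    printf("phi: level=%d value=%ld\n", z.level, z.value);   /* still constant 4 */

    y = (LatticeVal){ CONSTANT, 7 };       /* now the edges disagree */
    z = meet(x, y);
    printf("phi: level=%d value=%ld\n", z.level, z.value);   /* falls to BOTTOM */
    return 0;
}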

2.3.4 Loop-Invariant Code Motion

Loop-invariant code motion (LICM) (also referred to as loop-invariant removal) [1] moves computations that are unchanged in the loop out of the loop, thus eliminating redundant computations in every iteration of the loop. For example, the expression 3 + a in the loop in Figure 2.12(a) is a loop-invariant expression, since variable a is defined outside of the loop and not assigned in the loop. Therefore, after performing LICM, the expression can be hoisted out of the loop, as shown in Figure 2.12(b). The classical LICM algorithms rely on the structural information in the program to identify loop invariants. An instruction is identified as loop-invariant if, for each of its operands:

(a)
for (i = 0; i < n; i++) {
  k = 3 + a;
  A[k] = . . . ;
  ...
}

(b)
if (n > 0)
  k = 3 + a;
for (i = 0; i < n; i++) {
  A[k] = . . . ;
  ...
}

Figure 2.12: Loop-invariant code motion of variable k and its operation to the start of the loop. Note that the assignment is conditional on the non-emptyness of the loop.

(a)
for (i = 0; i < n; i++) {
  A[a + b] = . . . ;
  B[a + b] = . . . ;
  a = i;
  c = a + b;
  ...
}

(b)
for (i = 0; i < n; i++) {
  k = a + b;
  A[k] = . . . ;
  B[k] = . . . ;
  a = i;
  c = a + b;
  ...
}

Figure 2.13: Common-subexpression elimination of two identical expressions a + b in an example loop. Note that the third expression is not identical because of the intervening definition of a.

• the operand is constant, or

• the definition of the operand is located outside the loop, or

• the definition of the operand is itself loop-invariant

The checking is applied iteratively (or recursively on demand), so new loop-invariant operations can be detected when operands are detected as loop-invariant.

2.3.5 Common Sub-expression Elimination

The classic Common-Subexpression Elimination (CSE) methods [11, 75, 1] recognize lexically identical expressions in the program and replace them with a single variable holding the

expression. Global CSE eliminates redundancies in a CFG region of code. Local CSE only eliminates redundancies in a basic block of the CFG. For example, the expression a + b in Figure 2.13(a) is a global common sub-expression for two statements in the loop. After performing CSE, the expressions are replaced with a temporary k to avoid the redundant computation of a + b, as shown in Figure 2.13(b). Note that the third expression is not identical because of the intervening definition of a. This also motivates the use of SSA, since identical variables in SSA have the same name/subscript and redefined variables are simply identified by name. Therefore, newer global CSE algorithms are enhanced by SSA. Muchnick [1] presents both local CSE and global CSE algorithms. If identical expressions are matched, CSE inserts a new instruction to save the expression's value to a temporary variable and modifies the dependent instructions to use the temporary variable instead. Global CSE requires data flow analysis to compute available expression information, which means detecting intervening definitions and tracing control flow paths in a CFG. An expression e is available at entry to basic block B if the expression is evaluated on all paths prior to that block in the CFG. Once the available expressions are computed, for each available expression CSE searches backward in the CFG to find the evaluations of that expression, creates a new temporary variable to hold the previous evaluations, and replaces this expression by the new temporary variable. Note that classic CSE only recognizes lexically equivalent expressions as candidates for CSE.

2.3.6 Partial Redundancy Elimination

A partial redundancy is a computation that is done more than once on some path through a control flow graph. Partial redundancy elimination (PRE) is an important optimization technique that removes partial redundancies from a program, which requires solving a data flow problem. Basically, PRE converts partially redundant expressions into fully redundant expressions. Full redundancy elimination is easier to handle compared to PRE. PRE is a powerful optimization method and subsumes global CSE and LICM. Consider the example in Figure 2.14(a): expression b + c is computed twice on the left path. This is called a partial redundancy, and we can perform partial redundancy elimination to obtain the optimized version shown in Figure 2.14(b), where the partial redundancy is converted into a full redundancy. Thus, the instruction d = b + c can be replaced with d = t, where t is a temporary variable introduced to hold the value of the redundant expression

(a) Original code:
  left path:   a = b + c
  right path:  (b + c not computed)
  merge point: d = b + c

(b) After partial redundancy elimination:
  left path:   t = b + c ; a = t
  right path:  t = b + c (inserted)
  merge point: d = t

Figure 2.14: Partial redundancy elimination of b + c on a program trace. A temporary t is introduced to fix the absence of the redundancy on the second trace.

b + c. After PRE, the computation b + c is computed only once on either path of the control flow graph. All practical methods for congruence detection used by PRE methods detect only a subset of Herbrand equivalences [76], i.e. a safe approximation of all possible run-time equivalences of variables. Thus, it can be expected that there are many PRE algorithms [27, 26, 77] that implement equivalence checking. These algorithms mostly address the PRE problem by performing data flow analysis and solving data flow equations to determine where to insert the temporary variable and the corresponding computation to convert partially redundant expressions into fully redundant expressions. Muchnick [1] gives an excellent introduction to PRE. Morel and Renvoise [27] first developed a PRE algorithm based on bidirectional data flow analysis. They also showed that LICM and CSE are subsumed by PRE. Their approach was improved by Knoop et al. and Drechsler et al. [26, 77] with optimal unidirectional data flow analysis. Kennedy et al. [74] developed a PRE method based on SSA form, called SSAPRE.

Ruthing et al. [78] developed a variant of the work in [26] that takes code growth as well as register pressure into consideration to avoid the code-size increase of PRE. Hosking et al. [79] applied PRE to pointer dereferencing operations, termed “access paths”. All of these classic PRE algorithms only recognize lexically equivalent expressions. This problem with classic PRE was noted by several authors in [25, 80, 81, 82, 31]. Integration of Global Value Numbering (GVN) [23, 81, 34] and PRE was proposed in several works [25, 81, 80, 82, 31]. Global Value Numbering is one of several important compiler optimizations for redundancy elimination and is incomparable with CSE and PRE [1]: each catches redundancies the others miss. GVN assigns a value number to variables or expressions and then determines whether a set of variables or expressions are congruent by checking whether those variables or expressions have the same value number. GVN was first developed by Alpern, Rosen, Wegman and Zadeck [23, 25]. Hash-based GVN [83, 81] assigns a unique value number to a variable or expression by hashing the operator and the value numbers of the operands of the expression. Partition-based GVN [23, 34, 84] starts with an initial partition and keeps refining the existing partitions until a fixed point of partitions is reached. Most GVN algorithms are incomplete and cannot discover all Herbrand equivalences. Gulwani et al. [85] presented a polynomial-time randomized algorithm for GVN. Their algorithm is a complete GVN algorithm that discovers all Herbrand equivalences. However, there is a probability that the algorithm reports a false equality. Nie and Cheng [86] developed a fast SSA-based algorithm for complete GVN. Their algorithm uses one global graph to represent all equivalences at different program points to address the low efficiency of complete GVN algorithms caused by huge data structures and slow abstract evaluations. Integration of GVN and PRE was first proposed in [25]. Rosen et al. [25] and Click [81] propose to use GVN to improve redundancy elimination by finding more equivalent expressions than lexical equivalence alone. Briggs and Cooper [80] increase the effectiveness of PRE by sorting, reassociating and renaming the expressions using information obtained from a prior GVN phase to expose more opportunities for PRE. Knoop et al. [82] improve PRE by considering values instead of lexical expressions. They use the Value Flow Graph (VFG), where each node represents a set of equivalent terms and each edge represents the data flow, to identify the equivalent expressions for PRE. Bodik et al. [87, 88] developed an algorithm which also improves PRE by considering

values. Their algorithm constructs a Value Name Graph (VNG), where the nodes represent the value expressions at different program points and the edges capture the flow of values through control flow. Data flow analysis is then used to identify insertion points for PRE. VanDrunen and Hosking [31] developed a state-of-the-art SSA-based formulation of GVN-PRE which subsumes GVN and PRE to eliminate redundancies that are missed by either approach; it was adopted in GCC 4.x. These new approaches are able to find more expression equivalences than classic PRE. Besides recognizing lexically different but equivalent expressions produced by trivial assignments like A → B, such that C + A is equivalent to C + B, they also exploit some ad-hoc algebraic rules and reassociation, renaming, ranking, and sorting of expressions to find more redundant computations. GVN-PRE is a state-of-the-art hybrid PRE algorithm based on SSA form which subsumes both traditional PRE and GVN. Traditional PRE only considers expressions that are lexically equivalent. GVN only considers operations that are fully redundant. The GVN-PRE hybrid algorithm combines the strengths of PRE and GVN and eliminates redundancies that would be undetected by either working in isolation. The GVN-PRE algorithm has three major steps. First, the available value set and the anticipatable value set are computed based on a set of flow equations. Values, instead of lexical expressions, are considered and added into the sets by walking through each statement in the basic blocks in a top-down traversal of the dominator tree. Then the Insert phase performs insertions to make partially redundant expressions fully redundant. In GVN-PRE, a value is partially redundant if it is in the available set of some predecessors of a given block and it is anticipated in all the predecessors of that block. Thus, this phase iterates over blocks that have more than one predecessor and inspects all values anticipated there. If a value is partially redundant, this value is inserted into the predecessors where it is not available to make it fully redundant. Finally, an elimination phase eliminates all fully redundant expressions. Consider the example in Figure 2.15. Classic PRE does not detect the partially redundant computation for variables x1 and y1. In GVN-PRE, because of the assignment c1 = a1, the value numbers of x1 and y1 are identical. The optimized code is shown in Figure 2.15(b), where the redundant computation is eliminated.

(a)
if (. . . )
  x1 = b1 + a1
c1 = a1
y1 = b1 + c1
...

(b)
if (. . . )
  x1 = b1 + a1
else
  t1 = b1 + a1
t2 = φ(x1, t1)
y1 = t2
...

Figure 2.15: Hybrid GVN-PRE application to an example loop. Because the value of b1 + a1 is congruent with b1 + c1 based on c1 = a1, the assignment to y1 is redundant.
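The congruence in Figure 2.15 can be illustrated with a minimal hash-based value numbering sketch in C (hypothetical helper names, a linear lookup instead of a real hash table, and no PRE phase): after the copy c1 = a1 gives c1 the value number of a1, the triples for b1 + a1 and b1 + c1 coincide, so x1 and y1 receive the same value number.

/* Minimal value numbering: identical (op, vn_left, vn_right) triples share
 * a value number, so expressions over congruent operands become congruent. */
#include <stdio.h>

#define MAX 64

struct entry { char op; int l, r, vn; };
static struct entry table[MAX];
static int nentries = 0, next_vn = 0;

static int fresh_vn(void) { return next_vn++; }

static int value_number(char op, int l, int r) {
    for (int i = 0; i < nentries; i++)
        if (table[i].op == op && table[i].l == l && table[i].r == r)
            return table[i].vn;            /* seen before: reuse the number */
    table[nentries] = (struct entry){ op, l, r, fresh_vn() };
    return table[nentries++].vn;
}

int main(void) {
    int a1 = fresh_vn(), b1 = fresh_vn();
    int c1 = a1;                           /* copy c1 = a1: same value number */
    int x1 = value_number('+', b1, a1);    /* x1 = b1 + a1 */
    int y1 = value_number('+', b1, c1);    /* y1 = b1 + c1 */
    printf("vn(x1)=%d vn(y1)=%d -> %s\n", x1, y1,
           x1 == y1 ? "congruent" : "different");
    return 0;
}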

2.4 The Chains of Recurrences Formalism

The Chains of Recurrences (CR) formalism provides an efficient means of representing functions over unit-distant grids. The CR algebra of the framework was originally developed by Zima [47] and later extended by Bachmann [69] and Van Engelen [49]. The CR form was first introduced as a means to expedite the evaluation of polynomials, rational functions, geometric series, factorials, and trigonometric functions on regular grids. In our work we use CR forms to represent the recurrences of IVs and use the CR algebra for simplification of IVs and symbolic expressions. Thus, the CR form notation provides the mathematical basis for our IV recognition and dependence analysis algorithms. CR-based compiler analysis algorithms were developed in [49, 32, 33, 21] to effectively analyze nonlinear IVs, pointer arithmetic, and conditional updates in more complex loop nests. In this section we review the properties of the CR formalism.

2.4.1 Chains of Recurrences Notation

A closed-form function f evaluated in a loop with loop counter variable i can be rewritten into a mathematically equivalent system of recurrence relations f0(i), f1(i), . . . , fk(i), where the value of k depends on the complexity of function f. Evaluating the function represented by fj(i) for j = 0, 1, . . . , k − 1 within a loop with loop counter i, can be written as the

recurrence relation:

    fj(i) = ϕj                              if i = 0
    fj(i) = fj(i − 1) ⊙j+1 fj+1(i − 1)      if i > 0        (2.1)

with ⊙j+1 ∈ {+, ∗}, j = 0, 1, . . . , k − 1, and coefficients ϕj that are loop-invariant expressions.

Expression fk is a loop-invariant expression or a similar recurrence system. A shorthand notation for Equation 2.1 is the Basic Recurrence (BR):

    fj(i) = {ϕj, ⊙j+1, fj+1}i        (2.2)

The BR notation allows the system defined by Equation 2.1 to be written as

    Φi = {ϕ0, ⊙1, {ϕ1, ⊙2, . . . , {ϕk−1, ⊙k, fk}i}i}i        (2.3)

when flattened to a single tuple:

    Φi = {ϕ0, ⊙1, ϕ1, ⊙2, . . . , ϕk−1, ⊙k, fk}i        (2.4)

CRs are usually denoted by the Greek letter Φ, and that is the notation we will use throughout this dissertation.

A CR Φ = {ϕ0, ⊙1, ϕ1, ⊙2, . . . , ⊙k, fk}i has the following properties:

• ϕ0, ϕ1, . . . , ϕk are called the coefficients of Φ.

• k is called the length of Φ.

• if ⊙j = + for all j = 1, 2, . . . , k, then Φ is called polynomial.

• if ⊙j = ∗ for all j = 1, 2, . . . , k, then Φ is called exponential.

• if ik does not occur in the coefficients, then Φ is called regular.

• if Φ is regular and fk is a constant, then Φ is called simple.

Any discrete real- or complex-valued function can be represented by a CR, assuming that the stride function ϕk = fk(i) is sufficiently expressive. For constant ϕk, the CR notation represents polynomials, exponentials, factorials, and trigonometric functions (exponentials in the complex domain). The value sequences of several example CR forms are illustrated below:

iteration i =                        0    1    2    3    4     5     ...
{2, +, 1}i value sequence =          2    3    4    5    6     7     ...
{1, ∗, 2}i value sequence =          1    2    4    8    16    32    ...
{1, ∗, 1/2}i value sequence =        1    1/2  1/4  1/8  1/16  1/32  ...
{0, +, 0, +, 1}i value sequence =    0    0    1    3    6     10    ...
{1, ∗, 2, +, 1}i value sequence =    1    2    6    24   120   720   ...
{0, +, 1, ∗, 2}i value sequence =    0    1    3    7    15    31    ...

The CR formalism can be used to represent the value sequences of IVs in a loop, and thus can be used to effectively recognize and analyze generalized IVs. This idea was first presented in [50] by Van Engelen. For example, according to the semantics of CR forms, the CR form

{2, +, 1}i in the above examples accurately represents the value sequence of a linear IV which has an initial value of 2 and is updated by adding a stride of 1 in each iteration.

2.4.2 Chains of Recurrences Semantics

A CR form Φi = {ϕ0, ⊙1, ϕ1, ⊙2, · · · , ⊙k, ϕk}i represents a set of recurrence relations over a grid i = 0, . . . , n−1 defined by the following loop template, which is perhaps the simplest way to express the meaning of a CR form:

cr0 = ϕ0
cr1 = ϕ1
   ⋮
crk−1 = ϕk−1
for i = 0 to n−1
  val[i] = cr0
  cr0 = cr0 ⊙1 cr1
  cr1 = cr1 ⊙2 cr2
     ⋮
  crk−1 = crk−1 ⊙k ϕk
endfor

The loop produces the sequence val[i] of the CR form. This sequence is one-dimensional. A multidimensional loop nest is constructed for multivariate CR forms (CR forms with

nested CR form coefficients), where the indices of the outermost loops are the indices of the innermost CR forms.

For example, the value sequence of the CR form {1, ∗, 1, +, 1}, where cr0 = cr1 = ϕ2 = 1, is computed for n = 5 as follows:

iteration i =   0    1    2    3    4     5
val[i] =        1    1    2    6    24    120
cr0 =           1    2    6    24   120   720
cr1 =           2    3    4    5    6     7
ϕ2 =            1    1    1    1    1     1

During each iteration i, the value at val[i] is set to cr0, cr0 is set to cr0 ∗ cr1, and cr1 is set to cr1 + ϕ2, thereby producing the well-known factorial sequence val[i] = i!. Multi-variate CRs (MCRs) are CRs with coefficients that are CRs in a higher dimension [69]. Multi-dimensional loops are used to evaluate MCRs over grids.
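The factorial example translates directly into the loop template. The following C sketch (illustrative only) evaluates the CR form {1, ∗, 1, +, 1}i and prints the factorial sequence:

/* Evaluate the CR form {1, *, 1, +, 1}_i with the CR loop template. */
#include <stdio.h>

int main(void) {
    long cr0 = 1, cr1 = 1;                   /* coefficients phi_0 and phi_1 */
    const long phi2 = 1;                     /* constant last coefficient    */
    for (int i = 0; i <= 5; i++) {
        printf("val[%d] = %ld\n", i, cr0);   /* 1 1 2 6 24 120 = i!          */
        cr0 = cr0 * cr1;                     /* cr0 = cr0 (*) cr1 */
        cr1 = cr1 + phi2;                    /* cr1 = cr1 (+) phi2 */
    }
    return 0;
}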

2.4.3 Chains of Recurrences Term Rewriting System

The CR algebra [89, 49, 90] provides an efficient way of representing and evaluating functions over uniform intervals and defines a set of term rewriting rules, shown in Table 2.1, for the simplification and construction of CR forms for closed-form formulae. These rewrite rules enable the development of an algebra such that the process of construction and simplification of CRs for arbitrary closed-form functions can be automated within a symbolic computing environment. The application of the rewrite rules is straightforward and not computationally intensive. The required symbolic processing is comparable to classical constant folding [11].

CR Simplification Rules The application of the CR simplification rules produces an equivalent CR expression that can be evaluated more efficiently using the CR loop template. The simplification rules were initially introduced in [69, 47] and extended by Van Engelen [49]. Table 2.1 shows the CR simplification rewrite rules. All the CR simplification rules are applicable to CRs representing both integer and floating-point typed expressions.

CR Inverse Rules In Van Engelen [50] a set of CR inverse rules (CR−1) was introduced to translate CR expressions to their equivalent closed forms. Exhaustive application of the rules on a CR results in a closed-form function that does not contain any CRs as sub-expressions, if the CR has such a closed-form function. This means that the CR inverse system can only derive

a subset of the closed-form functions corresponding to CR expressions, since the mapping from a closed-form function to a CR is neither injective nor surjective. Table 2.2 shows the CR inverse rules. Note that, as was the case with the CR simplification rules, all the CR−1 rules are applicable to CRs representing both integer and floating-point typed expressions. The transformation strategy for applying the CR−1 rules is to reduce the innermost redexes first on nested CR tuples of the form of Equation 2.3. This results in closed expressions for the increments in CR tuples, which can be matched by other rules. After the application of the inverse transformation CR−1, the resulting closed-form expression can be factorized to compress the representation.

In [50] Van Engelen proved that the CR algebra as a Term Rewriting System (TRS) is complete (confluent and terminating). Therefore, any rule application order will result in a unique CR normal form.

For example, suppose j is a linear IV with CR form {j0, +, 1}i, where j0 is a symbolic unknown initial value for j and i is a loop index variable. Then expression j^2 − 1 is converted to CR forms and simplified as follows using the CR algebra simplification rules:

j^2 − 1 ⇒ {j0, +, 1}i ∗ {j0, +, 1}i − 1
        ⇒ {j0^2, +, {j0, +, 1}i + {j0, +, 1}i + 1}i − 1
        ⇒ {j0^2 − 1, +, 2 ∗ j0 + 1, +, 2}i

CR forms are normal forms for loop-variant variables. Any rule application order will result in a unique CR normal form. For example, the expression (j − 1)(j + 1) is converted to CR forms as follows:

(j − 1)(j + 1) ⇒ {j0 − 1, +, 1}i ∗ {j0 + 1, +, 1}i
               ⇒ {(j0 − 1)(j0 + 1), +, {j0 − 1, +, 1}i + {j0 + 1, +, 1}i + 1}i
               ⇒ {j0^2 − 1, +, 2 ∗ j0 + 1, +, 2}i

As we can see, (j − 1)(j + 1) and j^2 − 1 have the same CR normal form despite the syntactical difference.
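The normal form can also be checked numerically. The C sketch below (an illustration with an arbitrarily chosen j0 = 5) advances the CR {j0^2 − 1, +, 2 ∗ j0 + 1, +, 2}i with the loop template and compares it against the closed forms j^2 − 1 and (j − 1)(j + 1):

/* Compare the CR normal form against j*j - 1 and (j - 1)*(j + 1). */
#include <stdio.h>

int main(void) {
    long j0 = 5;                            /* arbitrary initial value of j */
    long cr0 = j0 * j0 - 1;                 /* coefficient j0^2 - 1 */
    long cr1 = 2 * j0 + 1;                  /* coefficient 2*j0 + 1 */
    const long cr2 = 2;
    long j = j0;
    for (int i = 0; i < 6; i++) {
        printf("i=%d  CR=%ld  j^2-1=%ld  (j-1)(j+1)=%ld\n",
               i, cr0, j * j - 1, (j - 1) * (j + 1));
        cr0 += cr1;                         /* advance the CR */
        cr1 += cr2;
        j += 1;                             /* advance the IV */
    }
    return 0;
}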

2.4.4 Chains of Recurrences Construction for In-Loop Expressions

The CR algebra provides an efficient mechanism to construct CR forms for symbolic expressions evaluated in multidimensional iteration spaces, i.e. in loops. The translation

46 of a closed-form symbolic expression ei1,...,in defined over a set of index variables i1, . . . , in to a multivariate nested CR form is defined by:

CR(ei1,...,in ) = CR(CR(···CR(ei1 )i2 ··· )in )

CR(eij) = e[ij ← Φ(ij)]

where Φ(ij) is the CR representation of the index variable ij. When the index variables i1, . . . , in span a unit-distance grid with origin (x1, . . . , xn), then Φ(ij) = {xj, +, 1}ij for all j = 1, . . . , n. The mapping replaces variables ij with their corresponding CR forms using substitution, denoted by e[ij ← Φ(ij)]. The CR algebra is then applied to normalize the expression to (nested) CR forms. Consider for example the nonlinear index expression n ∗ j + i + 2 ∗ k + 1, where i ≥ 0 and j ≥ 0 are index variables that span a two-dimensional iteration space with unit distance, and k is an IV with recurrence k = k + i and initial value k = 0. The recurrence of k in CR form is Φ(k) = {0, +, 0, +, 1}i. The CR construction of the example expression yields:

CR(CR(CR(n ∗ j + i + 2 ∗ k + 1)))
  = CR(CR(n ∗ j + {0, +, 1}i + 2 ∗ k + 1))                     (replacing i)
  = CR(n ∗ {0, +, 1}j + {0, +, 1}i + 2 ∗ k + 1)                (replacing j)
  = n ∗ {0, +, 1}j + {0, +, 1}i + 2 ∗ {0, +, 0, +, 1}i + 1     (replacing k)
  = {{1, +, n}j, +, 1, +, 2}i                                  (normalize)

This nested CR form can be evaluated in the nested loop template to efficiently produce the value sequence over the i and j iteration space.
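The nested CR form can be checked against the original index expression with a small C sketch (illustrative, with an arbitrary n = 10 and a small 3 × 4 iteration space):

/* Evaluate {{1, +, n}_j, +, 1, +, 2}_i against n*j + i + 2*k + 1, k = k + i. */
#include <stdio.h>

int main(void) {
    const long n = 10;
    long outer = 1;                        /* the outer CR {1, +, n}_j */
    for (long j = 0; j < 3; j++) {
        long cr0 = outer, cr1 = 1;         /* inner CR {outer, +, 1, +, 2}_i */
        const long cr2 = 2;
        long k = 0;                        /* IV with recurrence k = k + i */
        for (long i = 0; i < 4; i++) {
            long direct = n * j + i + 2 * k + 1;
            printf("j=%ld i=%ld  CR=%ld  direct=%ld\n", j, i, cr0, direct);
            cr0 += cr1;                    /* advance the inner CR */
            cr1 += cr2;
            k += i;                        /* original IV update */
        }
        outer += n;                        /* advance the outer CR */
    }
    return 0;
}

Both columns print the same values over the whole i, j iteration space.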

Table 2.1: Chains of Recurrences simplification rules.

Expression ⟹ Rewrite

 1. {ι0, +, 0}i ⟹ ι0
 2. {ι0, ∗, 1}i ⟹ ι0
 3. {0, ∗, f1}i ⟹ 0
 4. −{ι0, +, f1}i ⟹ {−ι0, +, −f1}i
 5. −{ι0, ∗, f1}i ⟹ {−ι0, ∗, f1}i
 6. {ι0, +, f1}i ± E ⟹ {ι0 ± E, +, f1}i
 7. {ι0, ∗, f1}i ± E ⟹ {ι0 ± E, +, ι0 ∗ (f1 − 1), ∗, f1}i
 8. E ∗ {ι0, +, f1}i ⟹ {E ∗ ι0, +, E ∗ f1}i
 9. E ∗ {ι0, ∗, f1}i ⟹ {E ∗ ι0, ∗, f1}i
10. E / {ι0, +, f1}i ⟹ 1 / {ι0/E, +, f1/E}i
11. E / {ι0, ∗, f1}i ⟹ {E/ι0, ∗, 1/f1}i
12. {ι0, +, f1}i ± {ψ0, +, g1}i ⟹ {ι0 ± ψ0, +, f1 ± g1}i
13. {ι0, ∗, f1}i ± {ψ0, +, g1}i ⟹ {ι0 ± ψ0, +, {ι0 ∗ (f1 − 1), ∗, f1}i ± g1}i
14. {ι0, +, f1}i ∗ {ψ0, +, g1}i ⟹ {ι0 ∗ ψ0, +, {ι0, +, f1}i ∗ g1 + {ψ0, +, g1}i ∗ f1 + f1 ∗ g1}i
15. {ι0, ∗, f1}i ∗ {ψ0, ∗, g1}i ⟹ {ι0 ∗ ψ0, ∗, f1 ∗ g1}i
16. {ι0, ∗, f1}i^E ⟹ {ι0^E, ∗, f1^E}i
17. {ι0, ∗, f1}i^{ψ0, +, g1}i ⟹ {ι0^ψ0, ∗, {ι0, ∗, f1}i^g1 ∗ f1^{ψ0, +, g1}i ∗ f1^g1}i
18. E^{ι0, +, f1}i ⟹ {E^ι0, ∗, E^f1}i
19. {ι0, +, f1}i^n ⟹ {ι0, +, f1}i^(n−1) ∗ {ι0, +, f1}i   if n ∈ N, n > 1
                   ⟹ 1 / {ι0, +, f1}i^(−n)               if n ∈ Z, n < 0
20. {ι0, +, f1}i! ⟹ {ι0!, ∗, ∏ j=1..f1 {ι0 + j, +, f1}i}i            if f1 ≥ 0
                  ⟹ {ι0!, ∗, (∏ j=1..|f1| {ι0 + j, +, f1}i)^(−1)}i   if f1 < 0
21. {ι0, +, ι1, ∗, f2}i ⟹ {ι0, ∗, f2}i   when ι0 = ι1 / (f2 − 1)

Table 2.2: Chains of Recurrences inverse simplification rules.

Expression ⟹ Rewrite

 1. {ϕ0, +, f1}i ⟹ ϕ0 + {0, +, f1}i
 2. {ϕ0, ∗, f1}i ⟹ ϕ0 ∗ {1, ∗, f1}i
 3. {0, +, −f1}i ⟹ −{0, +, f1}i
 4. {ϕ0, +, f1 + g1}i ⟹ {ϕ0, +, f1}i + {0, +, g1}i
 5. {ϕ0, ∗, f1 ∗ g1}i ⟹ f1 ∗ {ϕ0, ∗, g1}i
 6. {0, +, f1^i}i ⟹ (f1^i − 1) / (f1 − 1)
 7. {0, +, f1^(g1+h1)}i ⟹ {0, +, f1^g1 ∗ f1^h1}i
 8. {0, +, f1^(g1∗h1)}i ⟹ {0, +, (f1^g1)^h1}i
 9. {0, +, f1}i ⟹ i ∗ f1
10. {0, +, i}i ⟹ (i^2 − i) / 2
11. {0, +, i^n}i ⟹ Σ k=0..n (C(n+1, k) / (n+1)) Bk i^(n−k+1)
12. {1, ∗, −f1}i ⟹ (−1)^i {1, ∗, f1}i
13. {1, ∗, 1/f1}i ⟹ {1, ∗, f1}i^(−1)
14. {1, ∗, f1 ∗ g1}i ⟹ {1, ∗, f1}i ∗ {1, ∗, g1}i
15. {1, ∗, f1^g1}i ⟹ f1^{1, ∗, g1}i
16. {1, ∗, g1^f1}i ⟹ {1, ∗, g1}i^f1
17. {1, ∗, f1}i ⟹ f1^i
18. {1, ∗, i}i ⟹ 0^i
19. {1, ∗, i + f1}i ⟹ (i + f1 − 1)! / (f1 − 1)!
20. {1, ∗, f1 − i}i ⟹ (−1)^i ∗ (i − f1 − 1)! / (−f1 − 1)!

(In rule 11, C(n+1, k) denotes the binomial coefficient "n+1 choose k".)

CHAPTER 3

THEORY

In this dissertation, we propose to extend the SSA form to Inductive SSA, which is based on the Chains of Recurrences (CR) formalism using the extended CR# (CR-sharp) algebra that effectively represents the value progressions of (conditionally) updated loop-variant variables in loops. The Inductive SSA form augments the classic SSA form with semantic information that captures recurrence variable progressions and induction expressions using the CR# notation and algebra. The CR# algebra enables Inductive SSA to capture more recurrence relations compared to state-of-the-art induction variable recognition techniques. The unique CR normal form in Inductive SSA enables semantic matching of syntactically different expressions. This chapter introduces the Inductive SSA concept and the CR# algebra, which is the basis for constructing the inductive proofs of the Inductive SSA form.

3.1 The CR# Chains of Recurrences Extension

We augmented the CR algebra with a nontrivial extension, referred to as the CR# (CR-sharp) algebra. Its main purpose is to represent and analyze loop-variant variables in a uniform, consistent, and mathematically sound way. CR# is used for the analysis and manipulation of sequences. The CR# algebra introduces a new operator #, new algebraic simplification rules on CR# forms containing this operator, and CR# alignment and bounds operations. With these extensions, we are able to analyze more complicated flow-sensitive loop-variant variables, including out-of-sequence loop-variant variables that state-of-the-art induction variable recognition methods fail to handle. Thus, the CR# algebra covers more types of loop-variant variables and, more importantly, provides a unified and systematic classification approach that generalizes loop-variant variable analysis.

3.1.1 The Delay Operator #

Definition 1 The delay operator # is defined by

(x#y) = y for any x and y.

A CR# form containing the delay operator will be referred to as a delayed CR# form. The delay operator allows a set of initial values to take effect before the sequence starts. Thus, delayed CR# forms define recurrences with out-of-sequence values. This serves two important purposes. First, delayed CR# forms can be used to define any sequence of values x0, x1, . . . , xk, possibly followed by a polynomial, geometric, or another delayed form Ψi, as in Φi = {x0, #, x1, #, · · · , #, xk, #, Ψi}i. Second, wrap-around variables and indirect wrap-around variables can be accurately represented using delayed CR# forms. Wrap-around variables are currently handled with ad-hoc techniques by existing IVS methods, and indirect wrap-around variables cannot be handled at all. For example, the value sequences of several delayed CR# forms are listed as follows:

   iteration i              =  0   1   2   3   4   5   ...
   {9, #, 1, +, 2}i         =  9   1   3   5   7   9   ...
   {9, #, 2, #, 1, +, 2}i   =  9   2   1   3   5   7   ...
   {1, ∗, 1, #, 2}i         =  1   1   2   4   8  16   ...
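As an illustration (not code from the dissertation), the following C sketch produces the value sequence of the delayed CR# form {9, #, 1, +, 2}i using the CR loop template with a wrap-around variable:

   /* {9, #, 1, +, 2}_i : emit the delayed initial value 9 once, then follow
      the tail CR {1, +, 2}_i; prints "9 1 3 5 7 9". */
   #include <stdio.h>

   int main(void) {
       long s = 9;              /* delayed initial value */
       long cr0 = 1, cr1 = 2;   /* tail CR {1, +, 2} */
       for (int i = 0; i < 6; i++) {
           printf("%ld ", s);
           s = cr0;             /* x # y = y: next value comes from the tail */
           cr0 += cr1;
       }
       printf("\n");
       return 0;
   }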

3.1.2 CR# Simplification Rules

The CR algebra rewrite rules are extended with new rules that handle the simplification of CR# expressions containing the # operator; see Tables 3.1 and 3.2 for the CR# rewrite rules.

3.1.3 Completeness of CR#

Theorem 1 CR# is terminating.

Proof. In CR#, every redex is reduced to a CR of the form {E, ⊙, F} for some term E, operator ⊙ ∈ {+, ∗}, and term F. Because E does not contain any redexes, it is not further reduced. For rules 28 and 29 in Table 3.1, term F contains new redexes.

Table 3.1: CR# simplification rules.

1.  {ι0, +, 0}i ⇒ ι0
2.  {ι0, ∗, 1}i ⇒ ι0
3.  {0, ∗, f1}i ⇒ 0
4.  −{ι0, +, f1}i ⇒ {−ι0, +, −f1}i
5.  −{ι0, ∗, f1}i ⇒ {−ι0, ∗, f1}i
6.  {ι0, +, f1}i ± E ⇒ {ι0 ± E, +, f1}i
7.  {ι0, ∗, f1}i ± E ⇒ {ι0 ± E, +, ι0 ∗ (f1 − 1), ∗, f1}i
8.  E ∗ {ι0, +, f1}i ⇒ {E ∗ ι0, +, E ∗ f1}i
9.  E ∗ {ι0, ∗, f1}i ⇒ {E ∗ ι0, ∗, f1}i
10. E / {ι0, +, f1}i ⇒ 1 / {ι0/E, +, f1/E}i
11. E / {ι0, ∗, f1}i ⇒ {E/ι0, ∗, 1/f1}i
12. {ι0, +, f1}i ± {ψ0, +, g1}i ⇒ {ι0 ± ψ0, +, f1 ± g1}i
13. {ι0, ∗, f1}i ± {ψ0, +, g1}i ⇒ {ι0 ± ψ0, +, {ι0 ∗ (f1 − 1), ∗, f1}i ± g1}i
14. {ι0, +, f1}i ∗ {ψ0, +, g1}i ⇒ {ι0 ∗ ψ0, +, {ι0, +, f1}i ∗ g1 + {ψ0, +, g1}i ∗ f1 + f1 ∗ g1}i
15. {ι0, ∗, f1}i ∗ {ψ0, ∗, g1}i ⇒ {ι0 ∗ ψ0, ∗, f1 ∗ g1}i
16. {ι0, ∗, f1}i^E ⇒ {ι0^E, ∗, f1^E}i
17. {ι0, ∗, f1}i^{ψ0, +, g1}i ⇒ {ι0^ψ0, ∗, {ι0, ∗, f1}i^g1 ∗ f1^{ψ0, +, g1}i ∗ f1^g1}i
18. E^{ι0, +, f1}i ⇒ {E^ι0, ∗, E^f1}i
19. {ι0, +, f1}i^n ⇒ {ι0, +, f1}i^(n−1) ∗ {ι0, +, f1}i if n ∈ ℕ, n > 1;  1 / {ι0, +, f1}i^(−n) if n ∈ ℤ, n < 0
20. {ι0, +, f1}i! ⇒ {ι0!, ∗, ∏_{j=1..f1} {ι0 + j, +, f1}i}i if f1 ≥ 0;  {ι0!, ∗, (∏_{j=1..|f1|} {ι0 + j, +, f1}i)^(−1)}i if f1 < 0
21. {ι0, +, ι1, ∗, f2}i ⇒ {ι0, ∗, f2}i when ι0 = ι1/(f2 − 1)
22. {ϕ0, #, Ff1}i ⇒ f1 when ϕ0 = Vf1 (F is the forward shift operator of Lemma 2)
23. −{ι0, #, f1}i ⇒ {−ι0, #, −f1}i
24. {ι0, #, f1}i ± E ⇒ {ι0 ± E, #, f1 ± E}i
25. E ∗ {ι0, #, f1}i ⇒ {E ∗ ι0, #, E ∗ f1}i
26. {ι0, #, f1}i ± {ψ0, #, g1}i ⇒ {ι0 ± ψ0, #, f1 ± g1}i
27. {ι0, #, f1}i ∗ {ψ0, #, g1}i ⇒ {ι0 ∗ ψ0, #, f1 ∗ g1}i
28. {ι0, #, f1}i ± {ψ0, +, g1}i ⇒ {ι0 ± ψ0, #, f1 ± F{ψ0, +, g1}i}i
29. {ι0, #, f1}i ∗ {ψ0, +, g1}i ⇒ {ι0 ∗ ψ0, #, f1 ∗ F{ψ0, +, g1}i}i
30. {ι0, +, ι1, #, f2}i ⇒ {ι0, #, ι0 + ι1, +, f2}i
31. {ι0, ∗, ι1, #, f2}i ⇒ {ι0, #, ι0 ∗ ι1, ∗, f2}i

Table 3.2: CR# inverse simplification rules.

1.  {ϕ0, +, f1}i ⇒ ϕ0 + {0, +, f1}i
2.  {ϕ0, ∗, f1}i ⇒ ϕ0 ∗ {1, ∗, f1}i
3.  {0, +, −f1}i ⇒ −{0, +, f1}i
4.  {ϕ0, +, f1 + g1}i ⇒ {ϕ0, +, f1}i + {0, +, g1}i
5.  {ϕ0, ∗, f1 ∗ g1}i ⇒ f1 ∗ {ϕ0, ∗, g1}i
6.  {0, +, f1^i}i ⇒ (f1^i − 1)/(f1 − 1)
7.  {0, +, f1^(g1+h1)}i ⇒ {0, +, f1^g1 ∗ f1^h1}i
8.  {0, +, f1^(g1∗h1)}i ⇒ {0, +, (f1^g1)^h1}i
9.  {0, +, f1}i ⇒ i ∗ f1
10. {0, +, i}i ⇒ (i² − i)/2
11. {0, +, i^n}i ⇒ Σ_{k=0..n} (C(n+1, k)/(n+1)) Bk i^(n−k+1)
12. {1, ∗, −f1}i ⇒ (−1)^i ∗ {1, ∗, f1}i
13. {1, ∗, 1/f1}i ⇒ {1, ∗, f1}i^(−1)
14. {1, ∗, f1 ∗ g1}i ⇒ {1, ∗, f1}i ∗ {1, ∗, g1}i
15. {1, ∗, f1^g1}i ⇒ f1^{1, ∗, g1}i
16. {1, ∗, g1^f1}i ⇒ {1, ∗, g1}i^f1
17. {1, ∗, f1}i ⇒ f1^i
18. {1, ∗, i}i ⇒ 0^i
19. {1, ∗, i + f1}i ⇒ (i + f1 − 1)! / (f1 − 1)!
20. {1, ∗, f1 − i}i ⇒ (−1)^i ∗ (i − f1 − 1)! / (−f1 − 1)!
21. {ϕ0, #, f1}i ⇒ (i = 0) ? ϕ0 : f1[i ← i − 1]

In rule 28 the redex in term F is {ι0 ± ψ0, #, f1 ± F{ψ0, +, g1}i}i, assuming that f1 is fully reduced. The function F shifts {ψ0, +, g1}i to {ψ0 + g1, +, g1}i. The redex can be reduced using rules 6 and 12, which are CR simplification rules. Since the CR TRS is terminating, as proved in [50], the reduction of the redexes on the right-hand side of rule 28 terminates.

In rule 29 the redex in term F is {ι0 ∗ ψ0, #, f1 ∗ F{ψ0, +, g1}i}i, assuming that f1 is fully reduced. The function F shifts {ψ0, +, g1}i to {ψ0 + g1, +, g1}i. The redex can be reduced using rules 8 and 14, which are CR simplification rules. Since the CR TRS is terminating, as proved in [50], the reduction of the redexes on the right-hand side of rule 29 terminates.

Theorem 2 (Knuth and Bendix [91]) Let a TRS R be terminating. Then R is confluent iff each critical pair has a common reduct.

Theorem 3 CR# is complete.

Proof. Since CR# is terminating (Theorem 1), by Theorem 2 it suffices to show that every critical pair has a common reduct. The critical pairs involving rule 22 are formed by unifying the redex of the rule with a subterm of the redexes of rules 23–31.

Consider the redex {ϕ0, #, ϕ0} of rule 22.

• The redex of rule 23: −{ϕ0, #, ϕ0} forms the critical pair ⟨−ϕ0, {−ϕ0, #, −ϕ0}⟩. The pair has a common reduct found by applying rule 22: {−ϕ0, #, −ϕ0} ⇒ −ϕ0.

• The redex of rule 24: E ± {ϕ0, #, ϕ0} forms the critical pair ⟨E ± ϕ0, {E ± ϕ0, #, E ± ϕ0}⟩. The pair has a common reduct found by applying rule 22: {E ± ϕ0, #, E ± ϕ0} ⇒ E ± ϕ0.

• The redex of rule 25: E ∗ {ϕ0, #, ϕ0} forms the critical pair ⟨E ∗ ϕ0, {E ∗ ϕ0, #, E ∗ ϕ0}⟩. The pair has a common reduct found by applying rule 22: {E ∗ ϕ0, #, E ∗ ϕ0} ⇒ E ∗ ϕ0.

• The redex of rule 26: {ϕ0, #, ϕ0} ± {ψ0, #, g1} forms the critical pair ⟨ϕ0 ± {ψ0, #, g1}, {ϕ0 ± ψ0, #, ϕ0 ± g1}⟩. The pair has a common reduct found by applying rule 24: ϕ0 ± {ψ0, #, g1} ⇒ {ϕ0 ± ψ0, #, ϕ0 ± g1}.

• The redex of rule 27: {ϕ0, #, ϕ0} ∗ {ψ0, #, g1} forms the critical pair ⟨ϕ0 ∗ {ψ0, #, g1}, {ϕ0 ∗ ψ0, #, ϕ0 ∗ g1}⟩. The pair has a common reduct found by applying rule 25: ϕ0 ∗ {ψ0, #, g1} ⇒ {ϕ0 ∗ ψ0, #, ϕ0 ∗ g1}.

• The redex of rule 28: {ϕ0, #, ϕ0} ± {ψ0, +, g1} forms the critical pair ⟨ϕ0 ± {ψ0, +, g1}, {ϕ0 ± ψ0, #, ϕ0 ± F{ψ0, +, g1}}⟩. This pair has a common reduct: ϕ0 ± {ψ0, +, g1} ⇒ (rule 6) {ϕ0 ± ψ0, +, g1}, and {ϕ0 ± ψ0, #, ϕ0 ± F{ψ0, +, g1}} = {ϕ0 ± ψ0, #, ϕ0 ± {ψ0 + g1, +, g1}} ⇒ (rule 6) {ϕ0 ± ψ0, #, ϕ0 ± ψ0 + g1, +, g1} ⇒ (rule 22) {ϕ0 ± ψ0, +, g1}.

• The redex of rule 29: {ϕ0, #, ϕ0} ∗ {ψ0, +, g1} forms the critical pair ⟨ϕ0 ∗ {ψ0, +, g1}, {ϕ0 ∗ ψ0, #, ϕ0 ∗ F{ψ0, +, g1}}⟩. This pair has a common reduct: ϕ0 ∗ {ψ0, +, g1} ⇒ (rule 8) {ϕ0 ∗ ψ0, +, ϕ0 ∗ g1}, and {ϕ0 ∗ ψ0, #, ϕ0 ∗ F{ψ0, +, g1}} = {ϕ0 ∗ ψ0, #, ϕ0 ∗ {ψ0 + g1, +, g1}} ⇒ (rule 8) {ϕ0 ∗ ψ0, #, ϕ0 ∗ ψ0 + ϕ0 ∗ g1, +, ϕ0 ∗ g1} ⇒ (rule 22) {ϕ0 ∗ ψ0, +, ϕ0 ∗ g1}.

• The redex of rule 30: {ϕ1, +, ϕ0, #, ϕ0} forms the critical pair ⟨{ϕ1, +, ϕ0}, {ϕ1, #, ϕ1 + ϕ0, +, ϕ0}⟩. The pair has a common reduct found by applying rule 22: {ϕ1, #, ϕ1 + ϕ0, +, ϕ0} ⇒ {ϕ1, +, ϕ0}.

• The redex of rule 31: {ϕ1, ∗, ϕ0, #, ϕ0} forms the critical pair ⟨{ϕ1, ∗, ϕ0}, {ϕ1, #, ϕ1 ∗ ϕ0, ∗, ϕ0}⟩. The pair has a common reduct found by applying rule 22: {ϕ1, #, ϕ1 ∗ ϕ0, ∗, ϕ0} ⇒ {ϕ1, ∗, ϕ0}.

The critical pairs involving rule 1 are formed by unifying the redex of the rule with a subterm of the redexes of rules 28 and 29. Consider the redex {ψ0, +, 0} of rule 1.

• The redex of rule 28: {ϕ0, #, f1} ± {ψ0, +, 0} forms the critical pair ⟨{ϕ0, #, f1} ± ψ0, {ϕ0 ± ψ0, #, f1 ± F{ψ0, +, 0}}⟩. This pair has a common reduct: {ϕ0, #, f1} ± ψ0 ⇒ (rule 24) {ϕ0 ± ψ0, #, f1 ± ψ0}, and {ϕ0 ± ψ0, #, f1 ± F{ψ0, +, 0}} = {ϕ0 ± ψ0, #, f1 ± {ψ0, +, 0}} ⇒ (rule 6) {ϕ0 ± ψ0, #, f1 ± ψ0, +, 0} ⇒ (rule 1) {ϕ0 ± ψ0, #, f1 ± ψ0}.

• The redex of rule 29: {ϕ0, #, f1} ∗ {ψ0, +, 0} forms the critical pair ⟨{ϕ0, #, f1} ∗ ψ0, {ϕ0 ∗ ψ0, #, f1 ∗ F{ψ0, +, 0}}⟩. This pair has a common reduct: {ϕ0, #, f1} ∗ ψ0 ⇒ (rule 25) {ϕ0 ∗ ψ0, #, f1 ∗ ψ0}, and {ϕ0 ∗ ψ0, #, f1 ∗ F{ψ0, +, 0}} = {ϕ0 ∗ ψ0, #, f1 ∗ {ψ0, +, 0}} ⇒ (rule 8) {ϕ0 ∗ ψ0, #, f1 ∗ ψ0, +, f1 ∗ 0} ⇒ (rule 1) {ϕ0 ∗ ψ0, #, f1 ∗ ψ0}.



3.1.4 The CR# Lattice

We define a lattice on the CR# forms in the CR# algebra by introducing three special values ⊥, ⊤, and ⊤≠.

Definition 2 The ⊥, ⊤, and ⊤≠ elements are defined by

• ⊥ denotes an unknown quantity, −∞ ≤ ⊥ ≤ ∞;

• ⊤ denotes a nonnegative unknown, 0 ≤ ⊤ ≤ ∞;

• ⊤≠ denotes a positive unknown, 0 < ⊤≠ ≤ ∞.

Definition 3 The reduction operator R is defined by

   R{ϕ0, ⊙1, f1}i = {Rϕ0, ⊙1, Rf1}i

with Rx of a non-CR (symbolic) coefficient x defined by

   Rx = −⊤≠   if x < 0
        −⊤    if x ≤ 0
        0     if x = 0
        ⊤     if x ≥ 0
        ⊤≠    if x > 0
        ⊥     if the sign of x is unknown

The following CR# rewrite rules are applied to reduce a CR# form to a single ⊥, ±⊤, ±⊤≠, or 0 value:

   {⊤, ∗, ⊤}i ⇒ ⊤          {⊥, ⊙1, f1}i ⇒ ⊥
   {−⊤, ∗, ⊤}i ⇒ −⊤        {ϕ0, ⊙1, ⊥}i ⇒ ⊥
   {ϕ0, ∗, −⊤}i ⇒ ⊥
   {⊤, +, ⊤}i ⇒ ⊤          {⊤, #, ⊤}i ⇒ ⊤
   {−⊤, +, −⊤}i ⇒ −⊤       {−⊤, #, −⊤}i ⇒ −⊤
   {⊤, +, −⊤}i ⇒ ⊥         {−⊤, #, ⊤}i ⇒ ⊥
   {−⊤, +, ⊤}i ⇒ ⊥         {⊤, #, −⊤}i ⇒ ⊥

The rules for 0 and ⊤≠ are similar.

Note that to determine the value of Rx when x is symbolic, we can use common rules such as

   ⊤ + E ⇒ ⊤    if E ≥ 0
           ⊥    otherwise

   E ∗ ⊤ ⇒ ⊤    if E > 0
          −⊤    if E < 0
           ⊥    otherwise

The reduction operation traverses the lattice starting with a CR# form and stops at ⊥, ±⊤, ±⊤≠, or 0. For example

   R{0, +, 0, +, 1}i = {R0, +, R{0, +, 1}i}i ⇒ {0, +, ⊤}i ⇒ ⊤

Thus, this shows that the sequence of {0, +, 0, +, 1}i is nonnegative.
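A minimal C sketch of this reduction (an illustration under an assumed sign-lattice encoding, not the dissertation's implementation) for CR# forms built only with + operators:

   #include <stdio.h>

   typedef enum { BOT, NEG_NZ, NEG, ZERO, POS, POS_NZ } Sign;  /* assumed encoding of the lattice */

   /* R applied to a known integer coefficient. */
   static Sign sign_of(long x) { return x > 0 ? POS_NZ : x < 0 ? NEG_NZ : ZERO; }

   /* Reduce {s0, +, s1}: nonnegative start and step stay nonnegative, etc. */
   static Sign reduce_add(Sign s0, Sign s1) {
       if (s0 == BOT || s1 == BOT) return BOT;
       int nonneg0 = (s0 == ZERO || s0 == POS || s0 == POS_NZ);
       int nonneg1 = (s1 == ZERO || s1 == POS || s1 == POS_NZ);
       int nonpos0 = (s0 == ZERO || s0 == NEG || s0 == NEG_NZ);
       int nonpos1 = (s1 == ZERO || s1 == NEG || s1 == NEG_NZ);
       if (s0 == ZERO && s1 == ZERO) return ZERO;
       if (nonneg0 && nonneg1) return (s0 == POS_NZ) ? POS_NZ : POS;
       if (nonpos0 && nonpos1) return (s0 == NEG_NZ) ? NEG_NZ : NEG;
       return BOT;  /* mixed signs: direction unknown */
   }

   int main(void) {
       /* R{0, +, 0, +, 1}_i: reduce the inner CR {0,+,1} first, then the outer one. */
       Sign inner = reduce_add(sign_of(0), sign_of(1));
       Sign outer = reduce_add(sign_of(0), inner);
       printf("%s\n", (outer == POS || outer == POS_NZ) ? "nonnegative" : "unknown");
       return 0;
   }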

3.1.5 CR# Monotonicity

The monotonicity of a CR# form is a useful property, when it can be proven, to determine the value range of an expression represented by a CR# form. To determine the monotonic properties of a CR# form we extract directional information by applying the reduction operator on the increment function of a CR# form.

Definition 4 The monotonic operator M is defined by

   M{ϕ0, +, f1}i = Rf1
   M{ϕ0, ∗, f1}i = Rϕ0 if R(f1 − 1) = ⊤≠, and ⊥ otherwise
   M{ϕ0, #, f1}i = Mf1 ⊓ R(Vf1 − ϕ0)

with Mx = 0 when x is a (symbolic) constant. The lattice meet ⊓ returns the maximum element z = x ⊓ y in the lattice such that z ⊑ x and z ⊑ y.

The monotonic operator returns directional information on a CR# form as a lattice element

⊤ (monotonically increasing), ⊤≠ (strictly monotonically increasing), −⊤ (monotonically decreasing), −⊤≠ (strictly monotonically decreasing), 0 (constant), or ⊥ (unknown). Consider for example

   M{0, #, 0, +, 1}i = M{0, +, 1}i ⊓ R(V{0, +, 1}i − 0) = R1 ⊓ R0 = ⊤≠ ⊓ 0 = ⊤

Thus, this shows that the sequence generated by {0, #, 0, +, 1}i is monotonically increasing. Determining the monotonicity of a CR# form enables accurate value range analysis [92] and nonlinear dependence testing [21].

3.1.6 Bounding the Value Range of a CR# Form

The calculation of the constant static bounds on the range of possible values of a function is useful for data dependence testing, value range analysis, and loop bounds analysis, where (symbolic) constant bounds are required. Static bounds are limited for loop-variant variables, since the bounds may move with the loop iterations. Thus we seek to derive loop-variant bounds on the value range of a loop-variant variable. Dynamic range information can be used in array dependence testing when loop-variant variables are used for array indexing and

in pointer arithmetic. Bounding the value range of a CR# form using dynamic loop-variant bounds is the basis for improving the accuracy and the capabilities of nonlinear dependence testing using the CR framework, such as the CR-based nonlinear EVT and the nonlinear value range test [93]. To determine the direction of a recurrence, we define the step function of a CR.

Definition 5 The step function ∆Φi of a CR# form Φi = {ϕ0, ⊙1, ϕ1, ⊙2, . . . , ⊙k, ϕk}i is defined by

   ∆Φi = {ϕ1, ⊙2, . . . , ⊙k, ϕk}i

The direction-wise step function ∆jΦi of a multivariate CR# form Φi is the step function with respect to an index variable j

   ∆jΦi = ∆Φi     if i = j
          ∆jVΦi   otherwise

where the initial value VΦi of a CR# form is the first coefficient, which is the starting value of the CR# form evaluated on a unit grid in the i-direction:

   VΦi = ϕ0

The direction-wise step information indicates the growth rate of a function on an axis in the iteration space.

Note that Φi = {VΦi, ⊙1, ∆Φi}i.

Definition 6 The lower bound LΦi of a multivariate CR# form Φi evaluated on i = 0, . . . , n, n ≥ 0, is

   LΦi = L VΦi                  if L MΦi ≥ 0
         L CR⁻¹i(Φi)[i ← n]     if U MΦi ≤ 0
         L CR⁻¹i(Φi)            otherwise

and the upper bound UΦi of a multivariate CR# form Φi is

   UΦi = U VΦi                  if U MΦi ≤ 0
         U CR⁻¹i(Φi)[i ← n]     if L MΦi ≥ 0
         U CR⁻¹i(Φi)            otherwise

where CR⁻¹i(Φi) is the closed form of Φi with respect to i (i.e., nested CR# forms are not converted). It is important to point out that the L and U bounds applied to the recurrence of a monotonic function give the exact (symbolic) value range of the function on a discrete domain, when the function is monotonic on the discrete grid rather than in the continuous domain. A function that is monotonic on discrete grid points is not necessarily monotonic in the continuous domain. These L and U bounds have important applications in CR-enhanced dependence tests, see [93].
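The following small C check (an illustration with assumed values, not from the dissertation's experiments) exercises these bounds for the monotonically increasing CR# form {0, +, 2, +, 2}i on i = 0, . . . , n, whose closed form is i² + i:

   #include <assert.h>

   int main(void) {
       long n = 50, cr0 = 0, cr1 = 2;
       for (long i = 0; i <= n; i++) {
           assert(cr0 == i * i + i);    /* closed form CR^-1 of {0,+,2,+,2} */
           assert(cr0 >= 0);            /* L = initial value 0 (increasing) */
           assert(cr0 <= n * n + n);    /* U = closed form evaluated at i = n */
           cr0 += cr1;                  /* CR loop template updates */
           cr1 += 2;
       }
       return 0;
   }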

3.1.7 CR# Alignment and Bounds

To compare two CR# forms representing different functions, we need a method to normalize their representations so that the coefficients can serve as a reference point for comparison. Essentially, we need an alignment procedure that aligns the coefficients. This serves two purposes: first, to compare coefficients to determine equality or inequality of the CR# forms; and second, to combine the CR# forms into one by an arithmetic operation such as addition or multiplication. The latter also enables the computation of the minimum and maximum CR# form of two or more aligned CR# forms. The following lemmas, which are used to align and bound CR# forms, and their proofs are both part of my research on the flow-sensitive induction variable analysis problem. Two or more CR# forms of different lengths or with different operations can be aligned for comparison.

Definition 7 Two CR# forms Φi and Ψi over the same index variable i are aligned if they have the same length k and the operators ⊙j, j = 1, . . . , k, form a pairwise match.

To align a (delayed) CR# form of a mixed polynomial and geometric function to a longer (delayed) CR# form, + operators can be inserted for pairwise alignment of the ∗ operations between two or more CR# forms.

Lemma 1 Let Φi = {a, ∗, r}i be a geometric CR# form with initial value a and ratio r (r is invariant of i). Then,

   Φi = {a, +, a(r − 1), +, a(r − 1)², +, · · · , +, a(r − 1)^m, ∗, r}i

for any positive integer m > 0.

Proof 1 The proof is by induction on m.

• For the base case m = 1 we show that {a, ∗, r}i = {a, +, a(r − 1), ∗, r}i in two steps.

1. Consider a = 1. By the definition of the CR semantics the sequences f[i] for {1, ∗, r}i and g[i] for {1, +, r−1, ∗, r}i are computed by

      cr0 = 1                        cr0 = 1
      for i = 0 to n−1               cr1 = r − 1
        f[i] = cr0                   for i = 0 to n−1
        cr0 = cr0 ∗ r                  g[i] = cr0
      end for                          cr0 = cr0 + cr1
                                       cr1 = cr1 ∗ r
                                     end for

   (a) For iteration i = 0, we find that f[0] = g[0].

   (b) For iterations i = 1, . . . , n − 1, we find that

      f[i] = ∏_{j=0}^{i−1} r = r^i

      g[i] = 1 + Σ_{j=0}^{i−1} (r − 1) r^j
           = 1 + r Σ_{j=0}^{i−1} r^j − Σ_{j=0}^{i−1} r^j
           = 1 + Σ_{j=1}^{i} r^j − Σ_{j=0}^{i−1} r^j
           = r^i

2. Consider a ≠ 1. It follows from the CR algebra that {a, ∗, r}i = a{1, ∗, r}i and a{1, +, r−1, ∗, r}i = {a, +, a(r−1), ∗, r}i, and therefore that

      {a, ∗, r}i = a{1, ∗, r}i
                 = a{1, +, r − 1, ∗, r}i
                 = {a, +, a(r − 1), ∗, r}i

• Suppose the equation holds for k = m − 1. We have

      Φi = {a, +, a(r − 1), +, a(r − 1)², +, · · · , +, a(r − 1)^k, ∗, r}i

Because the "flat" CR form Φi is identical to a nested CR form [89, 90], we use the base case to rewrite the tail part of the nested CR# form as follows

      {a, +, a(r−1), +, · · · , +, a(r−1)^k, ∗, r}i
      = {a, +, a(r−1), +, · · · , +, {a(r−1)^k, ∗, r}i}i
      = {a, +, a(r−1), +, · · · , +, {a(r−1)^k, +, a(r−1)^k(r−1), ∗, r}i}i
      = {a, +, a(r−1), +, · · · , +, {a(r−1)^k, +, a(r−1)^(k+1), ∗, r}i}i
      = {a, +, a(r−1), +, · · · , +, a(r−1)^(m−1), +, a(r−1)^m, ∗, r}i

Thus, it follows from the induction hypothesis that

      Φi = {a, +, a(r−1), +, a(r−1)², +, · · · , +, a(r−1)^m, ∗, r}i.

Corollary 1 Let Φi = {ϕ0, ⊙1, · · · , ⊙k−1, ϕk−1, ∗, ϕk}i such that ϕk is invariant of i. Then, any number m > 0 of + operators can be inserted at the (k − 1)th coefficient as follows

   Φi = {ϕ0, ⊙1, · · · , ⊙k−1, ϕk−1, +, ϕk−1(ϕk − 1), +, ϕk−1(ϕk − 1)², +, · · · , +, ϕk−1(ϕk − 1)^m, ∗, ϕk}i

(the coefficients ϕk−1(ϕk − 1), . . . , ϕk−1(ϕk − 1)^m are the inserted ones) without changing the sequence of Φi.

Consider for example Φi = {1, +, 1}i and Ψi = {1, ∗, 2}i. The CR# forms are aligned using Lemma 1:

   Φi = {1, +, 1}i = {1, +, 1, ∗, 1}i
   Ψi = {1, ∗, 2}i = {1, +, 1, ∗, 2}i
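The following small C check (illustration only) confirms the Lemma 1 instance used here, namely that {1, ∗, 2}i and its lengthened form {1, +, 1, ∗, 2}i produce the same sequence 2^i:

   #include <assert.h>
   #include <stdio.h>

   int main(void) {
       long f = 1;            /* {1, *, 2}: cr0 = 1 */
       long g0 = 1, g1 = 1;   /* {1, +, 1, *, 2}: cr0 = 1, cr1 = 1 */
       for (int i = 0; i < 20; i++) {
           assert(f == g0);   /* both sequences equal 2^i */
           f = f * 2;
           g0 = g0 + g1;
           g1 = g1 * 2;
       }
       printf("sequences agree\n");
       return 0;
   }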

A delay operator can be inserted anywhere in a CR# form according to the following lemma.

Lemma 2 Let Φi = {ϕ0, ⊙1, ϕ1, ⊙2, . . . , ⊙k, ϕk}i be a (multivariate) CR# form. Then,

   Φi = {ϕ0, #, FΦi}i

where F is called the forward shift operator, defined by

   FΦi = {ψ0, ⊙1, ψ1, ⊙2, . . . , ⊙k, ψk}i

with ψj = ϕj ⊙j+1 ϕj+1 for j = 0, . . . , k−1 and ψk = ϕk.

Proof. The value sequence val[i] of the CR# form Φi = {ϕ0, ⊙1, ϕ1, ⊙2, . . . , ⊙k, ϕk}i for i = 0, . . . , n − 1 is computed by the following CR loop template:

   cr0 = ϕ0
   cr1 = ϕ1
   ⋮
   crk−1 = ϕk−1
   for i = 0 to n−1
     val[i] = cr0
     cr0 = cr0 ⊙1 cr1
     cr1 = cr1 ⊙2 cr2
     ⋮
     crk−1 = crk−1 ⊙k ϕk
   end for

After loop peeling, we obtain

   cr0 = ϕ0
   cr1 = ϕ1
   ⋮
   crk−1 = ϕk−1
   val[0] = cr0
   cr0 = cr0 ⊙1 cr1
   cr1 = cr1 ⊙2 cr2
   ⋮
   crk−1 = crk−1 ⊙k ϕk
   for i = 1 to n−1
     val[i] = cr0
     cr0 = cr0 ⊙1 cr1
     cr1 = cr1 ⊙2 cr2
     ⋮
     crk−1 = crk−1 ⊙k ϕk
   end for

After rewriting the initialization assignments, we have

   s = ϕ0
   cr0 = ϕ0 ⊙1 ϕ1
   cr1 = ϕ1 ⊙2 ϕ2
   ⋮
   crk−1 = ϕk−1 ⊙k ϕk
   val[0] = s
   for i = 1 to n−1
     val[i] = cr0
     cr0 = cr0 ⊙1 cr1
     cr1 = cr1 ⊙2 cr2
     ⋮
     crk−1 = crk−1 ⊙k ϕk
   end for

By using s as a wrap-around variable in the loop to sink val[0] = s back into the loop, we obtain

   s = ϕ0
   cr0 = ϕ0 ⊙1 ϕ1
   cr1 = ϕ1 ⊙2 ϕ2
   ⋮
   crk−1 = ϕk−1 ⊙k ϕk
   for i = 0 to n−1
     val[i] = s
     s = cr0
     cr0 = cr0 ⊙1 cr1
     cr1 = cr1 ⊙2 cr2
     ⋮
     crk−1 = crk−1 ⊙k ϕk
   end for

To see that this sequence computation is equivalent to the sequence of {ϕ0, #, FΦi}i, we use the definition of the forward shift operator F. The value sequence val[i] of the CR# form {ϕ0, #, FΦi}i is computed by the CR loop template:

   s = ϕ0
   cr0 = ψ0 = ϕ0 ⊙1 ϕ1
   cr1 = ψ1 = ϕ1 ⊙2 ϕ2
   ⋮
   crk−1 = ψk−1 = ϕk−1 ⊙k ϕk
   for i = 0 to n−1
     val[i] = s
     s = s # cr0
     cr0 = cr0 ⊙1 cr1
     cr1 = cr1 ⊙2 cr2
     ⋮
     crk−1 = crk−1 ⊙k ϕk
   end for

Because the assignment s = s # cr0 is identical to s = cr0, the value computations are semantically equivalent. Thus, this completes the proof that

Φi = {ϕ0, #, FΦi}i

 To align two CR# forms of unequal length, the shorter CR can be lengthened by adding dummy operations as follows.

Lemma 3 Let Φi = {ϕ0, ⊙1, ϕ1, ⊙2, · · · , ⊙k, ϕk}i be a univariate or multivariate CR# form, where ϕk is invariant of i. Then, the following identities hold

   Φi = {ϕ0, ⊙1, ϕ1, ⊙2, · · · , ⊙k, ϕk, +, 0}i
   Φi = {ϕ0, ⊙1, ϕ1, ⊙2, · · · , ⊙k, ϕk, ∗, 1}i
   Φi = {ϕ0, ⊙1, ϕ1, ⊙2, · · · , ⊙k, ϕk, #, ϕk}i

Proof. The proof immediately follows as a consequence of the CR semantics, because the initial value of the induction variable crk for coefficient ϕk is set to ϕk and the value of crk is unchanged in the loop (either by adding zero or multiplying by one).  Two or more CR# forms can be aligned using Lemmas 1, 2, and 3. Consider for example

   Φi = {1, #, 1, +, 2}i = {1, #, 1, +, 2, ∗, 1}i
   Ψi = {1, ∗, 2}i = {1, #, 2, ∗, 2}i = {1, #, 2, +, 2, ∗, 2}i

Alignment allows us to compare the coefficients of CR# forms to determine bounds. By comparing the coefficients of two CR# forms we can determine the min/max bounds of two CR# forms.

Definition 8 The minimum of two CR# forms is inductively defined by

   min({ϕ0, #, f1}i, {ψ0, #, g1}i) = {min(ϕ0, ψ0), #, min(f1, g1)}i
   min({ϕ0, +, f1}i, {ψ0, +, g1}i) = {min(ϕ0, ψ0), +, min(f1, g1)}i
   min({ϕ0, ∗, f1}i, {ψ0, ∗, g1}i) =
      {min(ϕ0, ψ0), ∗, min(f1, g1)}i                 if ϕ0>0 ∧ ψ0>0 ∧ f1>0 ∧ g1>0
      {min(ϕ0, ψ0), ∗, max(f1, g1)}i                 if ϕ0<0 ∧ ψ0<0 ∧ f1>0 ∧ g1>0
      {ϕ0, ∗, f1}i                                   if ϕ0<0 ∧ ψ0>0 ∧ f1>0 ∧ g1>0
      {ψ0, ∗, g1}i                                   if ϕ0>0 ∧ ψ0<0 ∧ f1>0 ∧ g1>0
      {−max(|ϕ0|, |ψ0|), ∗, max(|f1|, |g1|)}i        if f1<0 ∨ g1<0
      ⊥                                              otherwise

where the sign of the coefficients is determined using the monotonicity properties of the coefficients. The maximum of two CR# forms is inductively defined by

   max({ϕ0, #, f1}i, {ψ0, #, g1}i) = {max(ϕ0, ψ0), #, max(f1, g1)}i
   max({ϕ0, +, f1}i, {ψ0, +, g1}i) = {max(ϕ0, ψ0), +, max(f1, g1)}i
   max({ϕ0, ∗, f1}i, {ψ0, ∗, g1}i) =
      {max(ϕ0, ψ0), ∗, max(f1, g1)}i                 if ϕ0>0 ∧ ψ0>0 ∧ f1>0 ∧ g1>0
      {max(ϕ0, ψ0), ∗, min(f1, g1)}i                 if ϕ0<0 ∧ ψ0<0 ∧ f1>0 ∧ g1>0
      {ϕ0, ∗, f1}i                                   if ϕ0>0 ∧ ψ0<0 ∧ f1>0 ∧ g1>0
      {ψ0, ∗, g1}i                                   if ϕ0<0 ∧ ψ0>0 ∧ f1>0 ∧ g1>0
      {max(|ϕ0|, |ψ0|), ∗, max(|f1|, |g1|)}i         if f1<0 ∨ g1<0
      ⊥                                              otherwise

Consider for example

   min({1, #, 1, +, 2, ∗, 1}i, {1, #, 2, +, 2, ∗, 2}i) = {1, #, 1, +, 2, ∗, 1}i
   max({1, #, 1, +, 2, ∗, 1}i, {1, #, 2, +, 2, ∗, 2}i) = {1, #, 2, +, 2, ∗, 2}i

Thus, the sequence of values of the CR# form {1, #, 1, +, 2, ∗, 1}i provides a lower bound and the sequence of values of the CR# form {1, #, 2, +, 2, ∗, 2}i provides an upper bound on the two CR# forms. This accurately captures the following behavior of the variable k (SSA variable k1) in the loop:

   loop
     j1 = φ(1, j2)        // j1 = {1, +, 2}
     k1 = φ(1, k4)        // {1, #, 1, +, 2, ∗, 1} ≤ k1 ≤ {1, #, 2, +, 2, ∗, 2}
     if (. . . )
       k2 = j1            // update represented by {1, #, 1, +, 2}
     else
       k3 = 2 ∗ k1        // update represented by {1, ∗, 2}
     j2 = j1 + 2          // update represented by {1, +, 2}
     k4 = φ(k2, k3)       // merge and align {1, #, 1, +, 2} and {1, ∗, 2}
   endloop

3.1.8 Typed CR# Forms

To make the CR# framework applicable for integration in compilers, we also need to consider typed CR# forms for range-constrained IVs due to type representation limitations. A CR# form for an induction variable in the program cannot be used safely if an integer overflow can occur along that induction variable's update cycle in the program.

Integer Overflow Issues

Integer overflow occurs when the result of an integer operation cannot fit in the value range of its integer type. The value range that an integer variable can hold depends on the size of the integer type in bytes and on whether the integer is signed or unsigned. Common integer sizes include 8, 16, 32, 64, and 128 bits. Integer overflow causes the result to be truncated to a size that can be represented by the integer variable. Different compilers may treat the overflow problem in different ways: they can ignore the overflow entirely or they can abort the program. Most compilers ignore the overflow. Consider the example in Figure 3.1(a) with IV a of unsigned char type. The value sequence of variable a is shown in Figure 3.1(b). Overflow occurs at iteration 8. The untyped CR# form for a is {0, +, 30}. Since the number of iterations of this loop is 10, the value range of the CR# form is [0, 270] if the type information is not considered, which exceeds the maximum value that an unsigned char can hold and is not the actual value sequence of variable a. So, to ensure that a CR# form represents the real value sequence of a variable in the loop when overflow occurs, the type information needs to be considered and attached to the CR#

(a)
   unsigned char a = 0;
   int i;
   for (i = 0; i < 10; i++) {
     ...
     a = a + 30;
     ...
   }

(b)
   i =  0   1   2   3    4    5    6    7    8   9
   a =  0  30  60  90  120  150  180  210  240  14

Figure 3.1: Example code fragment (a) demonstrating integer overflow (b) of IV a.

form. Note that range constrained IVs can have periodic sequences, when overflow is allowed to occur periodically. Such periodic sequences can be represented by taking the modulo of the IV (or CR form) value as discussed below.

Implementing Typed CR# Representations

In a regular CR# form Φi = {ϕ0, ⊙1, ϕ1, ⊙2, . . . , ⊙k, ϕk}i, each operator ⊙1, . . . , ⊙k is annotated with the corresponding data type information for its operands. A typed CR# form

   Φi = {ϕ0, ⊙1,t1, ϕ1, ⊙2,t2, · · · , ⊙k,tk, ϕk}i

represents a set of recurrence relations over a grid i = 0, . . . , n−1 defined by the following loop template with t-typed CR variables:

   t1 cr0 = ϕ0
   t2 cr1 = ϕ1
   ⋮
   tk crk−1 = ϕk−1
   for i = 0 to n−1
     val[i] = cr0
     cr0 = cr0 ⊙1 cr1
     cr1 = cr1 ⊙2 cr2
     ⋮
     crk−1 = crk−1 ⊙k ϕk
   end for

For example, the CR# form {0, + unsigned char, 30} represents the value sequence defined by the following loop:

   unsigned char cr0 = 0
   for i = 0 to n−1
     val[i] = cr0
     cr0 = cr0 + 30
   end for

For typed CR# forms, the CR# simplification rules in Tables 3.1 and 3.2 are valid only when the CR forms and values in the rules are of exactly the same data type. The monotonic properties of a typed CR# form depend not only on the directional information we discussed in Section 3.1.5, but also on whether overflow occurs.

Typed IVs in Unbounded Loops

In some cases, the number of loop iterations of a loop cannot be determined accurately at compile time. Consider the example shown in Figure 3.2(a). The typed CR# form for variable k is {10, +unsigned char, 2}. If the upper bound u of the loop iteration count is small enough, overflow of the byte-sized variable k does not occur. In that case, the value of the array subscript k in a[k] is monotonically increasing, based on monotonicity analysis of the CR# form {10, +, 2} of k, and this loop can be executed in parallel. Otherwise, there may be a potential output dependence on array a, because overflow on variable k wraps its value around and breaks the monotonicity of k's value sequence. So, when overflow occurs, this loop needs to be executed sequentially due to the possible output dependence between array elements.

(a)
   unsigned char k = 10;
   int i;
   for (i = 0; i < u; i++) {
     ...
     a[k] = ...
     k = k + 2;
     ...
   }

(b)
   unsigned char k = 10;
   int i;
   if (u < 123)
     forall (i = 0; i < u; i++) {
       ...
       a[k] = ...
       k = k + 2;
       ...
     }
   else
     for (i = 0; i < u; i++) {
       ...
       a[k] = ...
       k = k + 2;
       ...
     }

Figure 3.2: Loop transformation example 1.

The maximum value of variable u for which overflow on k does not occur can be computed by taking the closed form of the typed CR# form of variable k, which is 2∗i+10, and solving the inequality 2∗i+10 < 256. The transformed loop is shown in Figure 3.2(b): when u is smaller than 123, overflow does not occur and the loop can run in parallel; otherwise, the serial version needs to be run. However, there are still many cases in which the array subscript contains variables or constants other than the induction variable. For example, consider the loop in Figure 3.3(a), where the array subscript expression is k + 1. The CR expression for k + 1 is {10, +, 2} + 1. The monotonic characteristic of the CR expression {10, +, 2} + 1, without considering the overflow problem, can be analyzed as follows:

   M({10, +, 2} + 1) = M{10 + 1, +, 2} = R2 = ⊤

This means that the value of expression k + 1 is monotonically increasing. However, when we consider that overflow may occur on variable k, {10, +, 2} + 1 cannot be rewritten to {10 + 1, +, 2}. The reason is that the two addition operations are of different types, as we can see in the following

(a)
   unsigned char k = 10;
   int i;
   for (i = 0; i < u; i++) {
     ...
     a[k+1] = ...
     k = k + 2;
     ...
   }

(b)
   unsigned char k = 10;
   int i;
   if (value sequence of k + 1 is monotonically increasing or decreasing && u < 123)
     forall (i = 0; i < u; i++) {
       ...
       a[k+1] = ...
       k = k + 2;
       ...
     }
   else
     for (i = 0; i < u; i++) {
       ...
       a[k+1] = ...
       k = k + 2;
       ...
     }

Figure 3.3: Loop transformation example 2.

forms:

   {10, +unsigned char, 2} +int 1 ⇏ {10 +int 1, +unsigned char, 2}

All the CR simplification rules hold only when the operations (addition or multiplication) in the rules are of the same data type. Thus, we cannot simply analyze the monotonicity of the array subscript expression based on simplification rules that are not valid when overflow occurs. The solution is to add an extra overflow test after the regular CR monotonicity test. The overflow test checks the induction variable k that appears in the subscript expression to determine whether overflow occurs in the values of k, based on the typed CR# form of k, as described earlier in this section. If we can guarantee that overflow does not occur, then our regular monotonicity analysis result can be used safely. Otherwise, we know the loop cannot be run in parallel due to the overflow problem. If overflow cannot be determined at compilation time due to an unknown symbol, the overflow test condition together with the monotonicity test condition needs to be added to guard the parallel execution of the loop, as shown in Figure 3.3(b).
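A hedged C sketch of the overflow test discussed above, assuming a typed linear CR {k0, +unsigned char, stride} over u iterations (the helper name and its use are illustrative, not the dissertation's implementation):

   #include <limits.h>
   #include <stdio.h>

   /* No unsigned char overflow occurs when even the final update
      k0 + stride*u still fits in UCHAR_MAX. */
   static int no_uchar_overflow(unsigned k0, unsigned stride, unsigned long u) {
       return (unsigned long)k0 + (unsigned long)stride * u <= UCHAR_MAX;
   }

   int main(void) {
       /* For {10, +unsigned char, 2}: safe up to u = 122, i.e. u < 123
          as in the guard of Figure 3.2(b). Prints "1 0". */
       printf("%d %d\n", no_uchar_overflow(10, 2, 122),
                         no_uchar_overflow(10, 2, 123));
       return 0;
   }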

3.2 Inductive SSA

This section introduces Inductive SSA.

3.2.1 Notation

In Inductive SSA form, each variable definition x that occurs in a loop has a proof structure:

x ↦ Π

where ↦ binds x to a collection of proof terms Π in canonical form. Each term in the collection describes a path-dependent value that is the result of combining partial proofs of the variable x's values defined over execution traces that merge at non-loop join points in the SSA graph. The structure of a proof term Π is syntactically defined by:

   Π  ::= const                   constant values
       |  var                     SSA variables
       |  Φp                      CR# forms
       |  (⊖ Π)                   unary operations
       |  (Π ⊕ Π)                 2-ary operations
       |  [Π, . . . , Π]          choice set (set of Π terms)
       |  ∅                       empty (note: [] = ∅)
       |  ⊥                       undefined (unprovable)
   Φp ::= {Π, ⊙, . . . , ⊙, Π}p   CR# form
   p  ::= p → bn | bn             decision paths (φ nodes)
   bn ::= n | ∗                   block number / wildcard
   ⊖  ::= − | +
   ⊕  ::= − | + | ∗ | / | << | >>
   ⊙  ::= ⊕ | #

The CR# forms Φp are based on the CR# algebra and represent the path-dependent recurrence relations induced by loop-header φ-nodes. The decision path p spans the blocks with φ-nodes that influenced the value. Choice sets [. . .] represent values that are dependent on non-loop-header φ-nodes. Proof structures are lifted and canonicalized to normal form, as discussed below.
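For concreteness, one possible C encoding of proof terms is sketched below; the type and field names are assumptions made for illustration only, not the dissertation's data structures:

   /* A minimal sketch of proof terms: constants, SSA variables, CR# forms
      tagged with a decision path, unary/binary operations, and choice sets. */
   typedef enum { P_CONST, P_VAR, P_CR, P_UNOP, P_BINOP, P_CHOICE, P_EMPTY, P_BOTTOM } PiKind;

   typedef struct Pi Pi;
   struct Pi {
       PiKind kind;
       long cval;                    /* P_CONST */
       const char *var;              /* P_VAR: SSA variable name */
       struct {                      /* P_CR: coefficients, operators, decision path */
           Pi **coeff; char *ops; int len; const char *path;
       } cr;
       struct { char op; Pi *lhs, *rhs; } bin;   /* P_UNOP / P_BINOP (lhs unused for unary) */
       struct { Pi **terms; int n; } choice;     /* P_CHOICE: outermost choice set */
   };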

   [. . . , ∅, . . .] ⇒ [. . . , . . .]                                                    flatten
   [. . . , [Π1, . . . , Πn], . . .] ⇒ [. . . , Π1, . . . , Πn, . . .]                      flatten
   {. . . , ⊙, [Π1, . . . , Πn], ⊙, . . .}p ⇒ [{. . . , ⊙, Π1, ⊙, . . .}p, . . . , {. . . , ⊙, Πn, ⊙, . . .}p]   unswitching
   ⊖[Π1, . . . , Πn] ⇒ [⊖Π1, . . . , ⊖Πn]                                                  ⊖ promotion
   [Π1, . . . , Πn] ⊕ [Π′1, . . . , Π′m] ⇒ [Π1 ⊕ Π′1] ∪ [Π1 ⊕ Π′2] ∪ · · · ∪ [Π1 ⊕ Π′m] ∪ [Π2 ⊕ Π′1] ∪ · · · ∪ [Πn ⊕ Π′m]   ⊕ promotion
   [. . . , ⊥, . . .] ⇒ ⊥                                                                  ⊥ propagation

Figure 3.4: Lifting choice sets in Π terms.

3.2.2 Proof Lifting

During the proof construction process, choice sets are introduced that represent the merging of partial proofs at join points. Nested choice sets are lifted to ensure that canonical proof structures only have outermost choice sets. No nested or inner choice sets remain after lifting. Figure 3.4 shows the lifting of Π terms. Choice sets are flattened and unswitched. Unswitching flips a choice set embedded inside a value sequence represented by a CR# form to the outside. It is the semantic equivalent of unswitching conditionals in loops. Promotion of ⊕ results in a choice set with ⊕ applied to all combinations of pairs of proofs. Duplicates are removed, and "unreachable" pairs that combine terms defined over disjoint decision paths (i.e., are unreachable) are effectively eliminated by proof canonicalization. In the worst case, the set contains n² pairs, with n the number of distinct paths through the loop body. To ensure the linear-time constraints of our implementation, we bound the set to a maximum of 10 proof terms; a sketch of this bounded promotion is given below. In practice, we observed that the set never exceeds 5 terms (see Chapter 4).
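A minimal C sketch of the bounded ⊕ promotion (terms are represented by plain integers purely for illustration; duplicate removal is omitted, and the cap value follows the 10-term bound stated above):

   #include <stdio.h>
   #define MAX_TERMS 10

   /* Combine two choice sets with a binary op (addition here); return the
      number of result terms, or -1 to stand for the unprovable proof when
      the bound on the choice-set size is exceeded. */
   static int promote(const int *a, int n, const int *b, int m, int *out) {
       int k = 0;
       for (int i = 0; i < n; i++)
           for (int j = 0; j < m; j++) {
               if (k >= MAX_TERMS) return -1;   /* bound the choice set */
               out[k++] = a[i] + b[j];
           }
       return k;
   }

   int main(void) {
       int a[] = {1, 2}, b[] = {2, 1}, out[MAX_TERMS];
       int k = promote(a, 2, b, 2, out);
       for (int i = 0; i < k; i++) printf("%d ", out[i]);  /* prints: 3 2 4 3 */
       printf("\n");
       return 0;
   }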

3.2.3 Proof Canonicalization

After lifting, proof structures are canonicalized by reduction to normal form. In this normalization process, arithmetic operations over constants, variables, and CR# forms are combined and simplified using the CR# TRS to produce normal forms (canonicalization).

Theorem 4 A CR# form Φp has a unique normal form, called the CR# normal form, in the CR#-TRS.

Proof 2 See Section 3.1.3. Furthermore, termination remains unaffected.

This leads to the following observation.

Corollary 2 A proof structure Π is canonical if it is an ordered choice set of the form

Π = [Π1,..., Πn] with no nested or innermost choice sets and each Πi is either an expression over constants and variables, or a CR normal form.

Proof 3 Any total ordering over Π terms suffices to sort the terms [Π1,..., Πn]. None of the Πi terms contain a CR#-TRS redex that can be reduced and none of the Πi terms can be further reduced by lifting. Hence, each Πi term is a normal form.

The underlying assumption in Corollary 2 is that expressions over constants and variables are normalized by local reassociation using the laws of associativity and commutativity, while obeying the constraints of exact arithmetic. This requirement is also used for reassociation in GVN/PRE methods. For example, consider the following SSA fragment with two distinct paths:

            b ← a − 1
          ↙           ↘
   c ← b + e         d ← e + b
          ↘           ↙
            f ← φ(c, d)

Observe that f = c = d = a + e − 1 under associativity and commutativity, but c ≠ d otherwise. Likewise, for this example we obtain a singleton proof structure f ↦ [a + e − 1] or a choice f ↦ [(a − 1) + e, e + (a − 1)], respectively.

Definition 9 A proof structure Π is strongly canonical if it is a singleton. Otherwise, it is weakly canonical.

Strongly canonical proof structures are more useful because the value of the variable associated with the proof is deterministic. One of the distinguishing features in our approach compared to previous work is that we introduce additional rewrite rules to ensure convergence to singletons whenever possible. This is achieved by using the CR#-TRS selectively by matching CR# forms defined over compatible paths. When paths diverge, operations may become inconsistent. The goal of proof construction is to eliminate inconsistent terms from the proofs to increase the accuracy.

Because no two distinct paths can be taken simultaneously, we introduce path-dependent reduction rules for 2-ary operators over CR# forms. After lifting by ⊕ promotion, inconsistent combinations of CR# forms are eliminated by the rewrite contraction:

   Φb→p ⊕ Φb′→p′ ⇒ ∅   if b = b′ ∧ p ≠ p′

Note that b and b′ are the loop header blocks. This rule states that if two CR# forms Φb→p and Φb′→p′ are defined for the same loop b = b′ and the decision paths are distinct p ≠ p′, then the ⊕ operation is inconsistent. When CR# simplification rules are applied, CR# forms defined for the outer loops are considered loop-invariant in the inner loops. Therefore, the dominating relationships of the loop header block numbers b in the path b → p can be used to check invariance of one CR# form with respect to another. That is, Φb→p is invariant with respect to Φb′→p′ if b ∈ Dom(b′).

All remaining variables that are not replaced by Φp forms are loop invariant, because in the proof construction process all loop-variant variables are replaced by CR# forms in the proof structures or ⊥ (unprovable).

3.2.4 Examples

Several example code fragments and their corresponding Inductive SSA forms are shown in Figures 3.5 to 3.10. Each variable in SSA form has a unique index for each definition point, and φ-functions merge the definitions before their next use. The Inductive SSA form is augmented with inductive proofs represented by CR# forms for each scalar variable in a loop nest. The CR# form represents the value progression of that variable. If the variable progression is a sequence of values from a linear, polynomial, or other discrete function over the iteration space of the loop, then the CR# algebra can be used to reveal that function. However, because CR# forms are normal forms for generalized induction variables, pattern matching on CR# forms is preferable to identify semantically equivalent variables and expressions. Figure 3.5(a) shows a simple example code and its corresponding Inductive SSA form in

Figure 3.5(b). The loop has two linear induction variables i and k. Note that i1 and k1 both have a loop-header φ-node in the SSA form to merge the initial definition of the variable with the feedback update in the loop. We use the symbol ↦ to map the variable i1 to its CR# form {1, +, 1} and k1 to {0, +, 2}. For example, based on the CR semantics, the CR# form

(a)
   k = 0
   h = 1
   for i = 1 to n
     a[k] = ...
     k = k + 2
     h = h ∗ 2
   end for

(b)
   k0 = 0
   h0 = 1
   loop1:
   B1:
     i1 = φ(1, i2)      ↦ {1, +, 1}B1
     k1 = φ(k0, k2)     ↦ {0, +, 2}B1
     h1 = φ(h0, h2)     ↦ {1, ∗, 2}B1
     a[k1] = ...
     i2 = i1 + 1        ↦ {2, +, 1}B1
     k2 = k1 + 2        ↦ {2, +, 2}B1
     h2 = h1 ∗ 2        ↦ {2, ∗, 2}B1
   endloop1

Figure 3.5: Example 1: loop fragment and corresponding Inductive SSA form.

{1, +, 1} represents the linear value sequence of linear induction variable i1 with initial value 1, updated by adding stride 1 in each iteration. SSA variables i2 and k2 also have their corresponding linear CR# forms, as shown in the Inductive SSA form.

IV annotated with polynomial CR# form {a, +, i0, +, 1} in Inductive SSA, which represents a nonlinear value sequence of variable p1. Figure 3.7(a) shows an example loop with wrap around variable k. The corresponding

CR# form of k1 in the to loop header φ node in Figure 3.7 (b) is {n, #, 1, +, 1}, where # is the delay operator. The semantics of this form is an initial (unknown) symbolic value n followed by a linear sequence {1, +, 1} for the remaining loop iterations, which describes the value progression of wrap around variable k1 concisely and accurately. Conditional updates of variables require flow-sensitive analysis to determine the behavior of such variables. Consider the loop shown in Figure 3.8(a). The loop exhibits a conditional update of variable k. Since the variable k behaves differently depending whether the branch is taken or not, it has no single closed-form function that describes it. Therefore, the path- dependent inductive proofs are annotated in its corresponding Inductive SSA form for each

(a)
   p = a
   for i = i0 to n
     p = p + i
     ... = ∗p
   end for

(b)
   loop1:
   B1:
     i1 = φ(i0, i2)     ↦ {i0, +, 1}B1
     p1 = φ(a, p2)      ↦ {a, +, i0, +, 1}B1
     p2 = p1 + i1       ↦ {a + i0, +, i0 + 1, +, 1}B1
     ... = ∗p2
     i2 = i1 + 1        ↦ {i0 + 1, +, 1}B1
   endloop1

Figure 3.6: Example 2: loop fragment and corresponding Inductive SSA form.

(a)
   k = n
   for i = 1 to n
     a[k] = ...
     k = i
   end for

(b)
   k0 = n
   loop1:
   B1:
     i1 = φ(1, i2)      ↦ {1, +, 1}B1
     k1 = φ(k0, i1)     ↦ {n, #, 1, +, 1}B1
     a[k1] = ...
     i2 = i1 + 1        ↦ {2, +, 1}B1
   endloop1

Figure 3.7: Example 3: loop fragment and corresponding Inductive SSA form.

SSA variable shown in Figure 3.8(b). Inductive SSA forms of multi-dimensional loop nests exhibit multivariate CR# forms for scalar variables and expressions that are dependent on inner and outer loop iterations. Consider the two-dimensional triangular loop nest shown in Figure 3.9 (a). From the

Inductive SSA form of Figure 3.9 (b), the initial value of linear variable k4 in the inner loop is an induction variable k1 from the outer loop. To analyze the outer loop header φ term k1, the aggregate value of the recurrence updates to k4 in the inner loop need to be computed. The variables k1, k4 and k3 are mapped to the CR# forms shown in Figure 3.9(b). As a result, every SSA variable in Inductive SSA has a CR# form to represent their value

(a)
   k = 1
   for i = 1 to n
     a[k] = ...
     if . . . then
       k = k + 3
     else
       k = k ∗ 2
     endif
   end for

(b)
   k0 = 1
   loop1:
   B1:
     i1 = φ(1, i2)      ↦ {1, +, 1}B1
     k1 = φ(k0, k2)     ↦ [{1, +, 3}B1→B4→B2, {1, ∗, 2}B1→B4→B3]
     a[k1] = ...
     if . . . then
   B2:
       k3 = k1 + 3      ↦ [{4, +, 3}B1→B4→B2, {4, +, 1, ∗, 2}B1→B4→B3]
     else
   B3:
       k4 = k1 ∗ 2      ↦ [{2, +, 6}B1→B4→B2, {2, ∗, 2}B1→B4→B3]
     endif
   B4:
     k2 = φ(k3, k4)     ↦ [{4, +, 3}B1→B4→B2, {2, +, 6}B1→B4→B2, {4, +, 1, ∗, 2}B1→B4→B3, {2, ∗, 2}B1→B4→B3]
     i2 = i1 + 1        ↦ {2, +, 1}B1
   endloop1

Figure 3.8: Example 4: loop fragment and corresponding Inductive SSA form.

progression behavior in a multi-dimensional loop iteration space. Figures 3.10(a) and (c) show two examples with IV k updated by an unknown quantity. The difference between k in Figure 3.10(a) and Figure 3.10(c) is that in the former code it is updated with a nonnegative unknown value. The corresponding Inductive SSA forms are given in Figures 3.10(b) and (d), where ⊤ denotes a nonnegative unknown and ⊥ denotes an unknown quantity. Thus, in Figure 3.10(b), the CR# form of k1 includes ⊤. The CR# form of k2 is computed based on CR arithmetic rules and has a nonnegative initial value and step. The use of a lattice enables the recognition of monotonic sequences, such as k2 in the first loop in Figure 3.10(a), which has the CR# form {⊤, +, ⊤}B1 that is nonnegative and monotonically increasing. Such properties are important for dependence testing, when array index expressions are analyzed.
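A small C illustration (not taken from the dissertation) of the monotonic behavior captured by {0, +, ⊤}: an IV updated by an unknown but nonnegative amount each iteration produces a nondecreasing index sequence:

   #include <assert.h>
   #include <stdlib.h>

   int main(void) {
       long k = 0, prev;
       for (int i = 0; i < 1000; i++) {
           prev = k;
           k = k + (rand() % 8);   /* nonnegative unknown stride: lattice value T */
           assert(k >= prev);      /* sequence of k is monotonically nondecreasing */
       }
       return 0;
   }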

(a)
   k = 0
   for i = 1 to n
     for j = 1 to i
       a[k] = ...
       k = k + 2
     end for
   end for

(b)
   k0 = 0
   loop1:
   B1:
     i1 = φ(1, i2)      ↦ {1, +, 1}B1
     k1 = φ(k0, k3)     ↦ {0, +, 2, +, 2}B1
     i2 = i1 + 1        ↦ {2, +, 1}B1
     loop2:
     B2:
       j1 = φ(j2, 1)    ↦ {1, +, 1}B2
       k4 = φ(k3, k1)   ↦ {{0, +, 2, +, 2}B1, +, 2}B2
       a[k4] = ...
       k3 = k4 + 2      ↦ {{2, +, 2, +, 2}B1, +, 2}B2
       j2 = j1 + 1      ↦ {2, +, 1}B2
     endloop2
   endloop1

Figure 3.9: Example 5: loop fragment and corresponding Inductive SSA form.

(a)
   k = 0
   for i = 1 to n
     a[k] = ...
     k = k + rand()
   end for

(b)
   k0 = 0
   loop1:
   B1:
     i1 = φ(1, i2)       ↦ {1, +, 1}B1
     k1 = φ(k0, k2)      ↦ {0, +, ⊤}B1
     a[k1] = ...
     i2 = i1 + 1         ↦ {2, +, 1}B1
     k2 = k1 + rand()    ↦ {⊤, +, ⊤}B1
   endloop1

(c)
   k = 0
   for i = 1 to n
     a[k] = ...
     k = k + f(i)
   end for

(d)
   k0 = 0
   loop1:
   B1:
     i1 = φ(1, i2)       ↦ {1, +, 1}B1
     k1 = φ(k0, k2)      ↦ {0, +, ⊥}B1
     a[k1] = ...
     i2 = i1 + 1         ↦ {2, +, 1}B1
     k2 = k1 + f(i)      ↦ {⊥, +, ⊥}B1
   endloop1

Figure 3.10: Example 6: loop fragments and corresponding Inductive SSA forms.

CHAPTER 4

ALGORITHMS

This chapter introduces the Inductive SSA construction algorithm. The algorithm annotates SSA variables with inductive proofs. The linear complexity of the annotation algorithm is proven. Then, the compiler optimizations that benefit directly from Inductive SSA are presented, with a discussion of their implementations based on Inductive SSA.

4.1 Inductive SSA Construction

Construction of the Inductive SSA form proceeds in two steps:

• Proof construction: Canonical proofs are constructed for all loop-header variables. As a side effect of this step's local "caching" of partially completed proofs in SSA annotations, partial proofs are also constructed for all other non-loop-header variables. These partial proofs are not in canonical form.

• Proof propagation: The partial proofs of the non-loop header variables are completed and canonicalized.

We discuss the two-step algorithm in more detail.

4.1.1 Proof Construction

The outline of the proof construction algorithm is shown in Algorithm 1. The ConstructProof procedure is applied to each loop-header variable v in the SSA graph, starting with the innermost loops and proceeding to the outermost loops. The order of the loop-header variables is immaterial. When a loop-header variable's proof depends on the proof of another variable, the proofs are automatically constructed in the correct order.

Procedure ConstructProof(v ← φb1,...,bk(x1, . . . , xk))
  Input: SSA loop header variable v and its φ-node
  Result: proof structure annotation v ↦ Π in SSA graph
  begin
    Π := ∅
    h := header block number of v
    Π1 := Proof(x1, −, −)
    for i = 2 to k do
      Πi := Proof(xi, v, h → bi)
      foreach Πi,j ∈ Πi with path p do
        if v does not occur in Πi,j then
          Π := Π ∪ [{Π1, #, Πi,j}p]
        else if Πi,j is variable v then
          Π := Π ∪ Π1
        else
          Reassociate Πi,j to isolate v
          if Πi,j is of the form v ⊙ Φp then
            Π := Π ∪ [{Π1, ⊙, Φp}p]
          else
            Π := [⊥]
          end
        end
      end
    end
    Canonicalize Π using the CR#-TRS
    Add proof annotation v ↦ Π to the SSA graph
  end

Algorithm 1: Proof construction procedure.

Proof construction produces proofs for all SSA assignment operations except array operations, which receive ⊥ annotations to represent unknown array values. Proof construction for loop-header variables is essentially inductive. The process first determines the base value x1 from the first φ-node operand, converts it to canonical form Π1, and then determines the inductive step of the proof by following the loop cycle to construct

Πi for the other loop-header φ-node operands. The resulting CR forms in Π describe the value sequence of v. When the proof structure Π is a singleton, the value sequence is deterministic. There are four tests in the ConstructProof procedure:

• Variable v is “wrap-around” when its value for the next loop iteration xi, i > 1, only

Function Proof(x, v, p)
  Input: operand x, loop header variable v, and path p
  Output: proof structure for x
  begin
    switch x do
      case x is a constant const: return const
      case x is the variable v: return v
      case x has proof annotation x ↦ Π: return Π
      case x is a variable defined outside the loop: return {x}p
      otherwise: return AddProof(x, v, p)
    end
  end

Function AddProof(x, v, p)
  Input: variable x, loop header variable v, path p
  Output: proof structure Π for x
  begin
    switch assignment pattern x ← rhs in SSA do
      case x ← const:      Π := const
      case x ← var:        Π := Proof(var, v, p)
      case x ← ⊖y:         Π := Lift(⊖ Proof(y, v, p))
      case x ← y ⊕ z:      Π := Lift(Proof(y, v, p) ⊕ Proof(z, v, p))
      case x ← φb1,...,bk(x1, . . . , xk):
        if φ is a loop header node then
          Π := ConstructProof(x ← φ)
        else
          Π := ⋃i=1,...,k Proof(xi, v, p → bi)
        end
      otherwise:           Π := ⊥
    end
    Add proof annotation x ↦ Π to the SSA graph
    return Π
  end

Algorithm 2: Searching and adding proofs in SSA.

depends on other variables' values. Thus, the proof structure is a wrap-around CR

form {Π1, #, Πi,j}p, where Πi,j ∈ Πi and Πi is the proof (a choice set) of xi .

• There is no recurrence relation for v through the execution trace up to the loop cycle,

which means that v is unchanged Πi,j = v. The variable’s proof structure is the initial

loop-invariant Π1.

• There is a recurrence relation for v through the execution trace up to the loop cycle

defined by decision path p. The variable's update Φp in its CR form {Π1, ⊙, Φp}p can be loop invariant or another CR that describes a value sequence defined by other loop-variant variables.

• Otherwise, a proof cannot be established (⊥).

The proof construction algorithm uses the function Proof defined in Algorithm 2. The function returns proofs in non-canonical lifted form and uses locally-cached proofs that are already bound to variables in the annotated SSA form. It traverses the SSA graph to detect the cycles that define recurrences of loop-variant variables, which is similar to most IV detection methods [4, 8]. However, by contrast to these methods, AddProof effectively analyzes and combines all variable updates through all branches based on the φ-node join points. Figure 4.1 illustrates the construction of inductive proofs for loop φ-nodes in SSA form. In each example we start with the loop φ-node and complete the proof by traversing the SSA graph using Proof and AddProof, as shown by the successive instantiations of the φ operands. These small examples demonstrate the systematic and uniform approach to prove congruences under associativity induced by operation reordering (a) and (b), to detect linear equivalences for IV elimination (c), and to detect more complex nonlinear congruences (d). Example (e) shows the detection of an indirect wrap-around variable, which is useful for many IV-based methods. Figure 4.2 illustrates the construction of path-dependent proofs for loop φ-nodes in SSA form. Instantiations of φ-nodes and renumberings of φ-nodes are used to simplify the exposition of the internal states of the Proof and AddProof algorithms that traverse the SSA graph. In this case, the path information is essential and carried through the instantiations

(a)
   B1: b ← φ(a, f)
       c ← d + e
       f ← b + c
       goto B1

   b = φ(a, f)
     = φ(a, b+c)
     = φ(a, b+(d+e))
     ↦ [{a, +, d+e}1]

(b)
   B2: b ← φ(a, f)
       c ← b + d
       f ← c + e
       goto B2

   b = φ(a, f)
     = φ(a, c+e)
     = φ(a, (b+d)+e)
     ↦ [{a, +, d+e}2]

(c)
   B3: b ← φ(a, f)
       c ← φ(d, e)
       e ← c + 1
       f ← b + 1
       goto B3

   b = φ(a, f)
     = φ(a, b+1)
     ↦ [{a, +, 1}3]
   c = φ(d, e)
     = φ(d, c+1)
     ↦ [{d, +, 1}3]

(d)
   B4: b ← φ(a, f)
       c ← φ(d, e)
       e ← b + c
       f ← b << 1
       goto B4

   b = φ(a, f)
     = φ(a, b<<1)
     ↦ [{a, ∗, 2}4]
   c = φ(d, e)
     = φ(d, b+c)
     = φ(d, [{a, ∗, 2}4]+c)
     ↦ [{d, +, a, ∗, 2}4]

(e)
   B5: b ← φ(a, f)
       c ← φ(d, e)
       e ← b + c
       f ← 1
       goto B5

   b = φ(a, f)
     = φ(a, 1)
     ↦ [{a, #, 1}5]
   c = φ(d, e)
     = φ(d, b+c)
     ↦ [{d, +, a, #, 1}5]
     ↦ [{d, #, a+d, +, 1}5]

Figure 4.1: Example inductive proofs for loop φ-nodes in SSA form. The proofs for (a) and (b) show the equivalence of b under associativity induced by operation reordering between (a) and (b). Example (c) has a linear equivalence b − a ≡ c − d detected by proofs b = {a, +, 1}3 and c = {d, +, 1}3. Example (d) has a nonlinear equivalence b − a ≡ c − d detected by proofs b = {a, ∗, 2}4 ≡ {a, +, a, ∗, 2}4 and c = {d, +, a, ∗, 2}4. Example (e) has a wrap-around induction variable b that indirectly induces wrap-around behavior in variable c.

(a)
   B1: b ← φ0,4(a, f)
       ↙           ↘
   B2: c ← b + d    B3: e ← d + b
       ↘           ↙
   B4: f ← φ2,3(c, e)
       goto B1

   b = φ0,4(a, f)
     = φ0,4(a, φ2,3(c, e))
     = φ0,4(a, φ2,3(b+d, d+b))
     = [φ0,4,2(a, b+d), φ0,4,3(a, d+b)]
     ↦ [{a, +, d}1→4→2, {a, +, d}1→4→3]
     ↦ [{a, +, d}1→∗]

(b)
   B5: h ← φ0,8(3, g)
       ↙           ↘
   B6: a ← 1        B7: b ← 2
       c ← 2            d ← 1
       ↘           ↙
   B8: e ← φ6,7(a, b)
       f ← φ6,7(c, d)
       g ← e + f
       goto B5

   h = φ0,8(3, g)
     = φ0,8(3, e + f)
     = φ0,8(3, φ6,7(a, b) + φ6,7(c, d))
     = φ0,8(3, φ6,7(1, 2) + φ6,7(2, 1))
     = φ0,8(3, [1_{5→8→6}, 2_{5→8→7}] + [1_{5→8→7}, 2_{5→8→6}])
     = φ0,8(3, [3_{5→∗}])
     ↦ [{3, #, 3}5→∗]
     ↦ [3]

(c)
   B9:  c ← φ0,12(a, g)
        d ← φ0,12(b, h)
        ↙            ↘
   B10: e ← c + 1     B11: i ← c − d
        f ← d + 1
        ↘            ↙
   B12: g ← φ10,11(e, c)
        h ← φ10,11(f, d)
        goto B9

   c = φ0,12(a, g)
     = φ0,12(a, φ10,11(e, c))
     = φ0,12(a, φ10,11(c + 1, c))
     = [φ0,12,10(a, c + 1), φ0,12,11(a, c)]
     ↦ [{a, +, 1}9→12→10, {a}9→12→11]
   d = φ0,12(b, h)
     = [φ0,12,10(b, d + 1), φ0,12,11(b, d)]
     ↦ [{b, +, 1}9→12→10, {b}9→12→11]

Figure 4.2: Example path-dependent inductive proofs for loop φ-nodes in SSA form. Example (a) has congruent operations under commutativity in blocks B2 and B3. Example (b) shows that h is constant by path correlation in proof construction and #-wrap-around simplification of the CR form. Example (c) has a loop-invariant computation for i, because of the congruence i ≡ c − d ≡ a − b and a − b is loop invariant. This is proven by i ≡ c − d ≡ [{a, +, 1}9→12→10 − {b, +, 1}9→12→10, {a}9→12→11 − {b}9→12→11] ≡ [a − b, a − b] ≡ [a − b].

4.1.2 Proof Propagation

After proof construction, the proofs of the non-loop-header variables are completed by propagating the proof structures of the loop-header variables to all of the variables' uses in the partial proofs. Each partial proof is then lifted and canonicalized. The proof propagation procedure is defined in Algorithm 3 and is applied depth-first from the bottom to the top (loop header). For example, consider Figure 4.1(a). After proof construction, we obtain a canonical proof for b and partial proofs for c and f:

   b ↦ [{a, +, d+e}1]
   c ↦ d+e
   f ↦ b+(d+e)

Proof propagation results in the complete canonical proofs by using the CR#-TRS reduction b+(d+e) = {a, +, d+e}1 + (d+e) ⇒ {a+(d+e), +, d+e}1:

   b ↦ [{a, +, d+e}1]
   c ↦ [d+e]
   f ↦ [{a+(d+e), +, d+e}1]

Note that c is loop invariant and b is a linear IV with initial value a and stride d+e.
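The following small C check (with arbitrary assumed values for the loop-invariant symbols a, d, and e; illustration only) confirms that the propagated proof f ↦ {a+(d+e), +, d+e} matches the value of f = b + c computed by the loop body of Figure 4.1(a):

   #include <assert.h>

   int main(void) {
       long a = 7, d = 3, e = 4;          /* assumed loop-invariant values */
       long b = a, cr_f = a + (d + e);    /* CR template for f */
       for (int i = 0; i < 100; i++) {
           long c = d + e;
           long f = b + c;                /* loop body of Figure 4.1(a) */
           assert(f == cr_f);             /* f follows {a+(d+e), +, d+e} */
           b = f;
           cr_f = cr_f + (d + e);
       }
       return 0;
   }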

4.1.3 Asymptotic Complexity

The proof construction and propagation algorithms take linear time in the size of the SSA graph. Each node and each edge in the SSA graph is visited at most twice, since SSA nodes are marked by (partial) proofs in the construction step and modified in the propagation step. Not surprisingly, the complexity corresponds to the construction of the strongly connected components (SCCs) of a digraph using a depth-first search. The critical cost is ⊕ promotion (Figure 3.4) resulting in a choice set that contains all combinations of Π-pairs. Each choice corresponds to a decision path that affects the value of

Procedure PropagateProof(x, h)
  Input: non-loop header variable x, header block number h
  Result: proof structure annotation for x in SSA
  begin
    if x has SSA annotation x ↦ Π then
      Replace each loop-header variable v in Π by the proof of v and lift the resulting term
    else
      Π := Proof(x, −, h)
    end
    Canonicalize Π using the CR#-TRS
    Update SSA annotation x ↦ Π
  end

Algorithm 3: Proof propagation procedure.

a variable, which can be exponential in the worst case. By limiting the size of the choice set to a fixed constant length and by using ⊥ when the size is exceeded, the overall asymptotic running time is linear in the number of edges, O(|E| + |V|), in the SSA graph (the same as computing SCCs). In practice, the algorithm is linear in the size of the SSA code region, since the number of edges is practically bounded in realistic codes. Furthermore, experiments show that the size of the choice set does not exceed 5 terms in practice (see Chapter 6).

4.2 Compiler Analysis and Optimization with Inductive SSA

With semantic information readily provided in the inductive proofs for SSA variables, Inductive SSA provides a foundation that unifies compiler analysis. Several compiler analysis problems can be replaced with Inductive SSA, such as IV recognition, IV classification, and constant propagation. Optimizations that benefit from Inductive SSA are IV removal, IV substitution, strength reduction, loop-invariant code motion, common subexpression elimination, and partial redundancy elimination.

4.2.1 Loop-Variant Variable Classification

Inductive SSA generalizes loop-variant variable analysis, and the following classes of loop-variant variables are recognized and classified based on the attached CR form in Inductive SSA:

Linear induction variables are represented by nested CR forms {a, +, s}i, where a is the integer-valued initial value and s is the integer-valued stride in the direction of i. The coefficient a can be a nested CR form in another loop dimension. Linear IVs are the most common IV category. For example, the variables i1 and k1 in Figure 3.5 are linear IVs with the linear CR forms {1, +, 1} and {0, +, 2}, respectively.

Polynomial induction variables are represented by nested CR forms of length k, where k is the order of the polynomial. All operators in the CR form are additions, i.e., ⊙ = +. For example, the variable p1 in Figure 3.6(b) is a pointer IV with the polynomial CR form {a, +, i0, +, 1}. The variable k1 in Figure 3.9(b), with the polynomial CR form {0, +, 2, +, 2}, is also a polynomial variable.

Geometric induction variables are represented by the CR form {a, ∗, r}i, where a and r are loop invariant. For example, the variable h1 in Figure 3.5(b) is a geometric induction variable with CR form {1, ∗, 2}.

Mixed induction variables have CR forms that contain both + and ∗ operators. For example, the variable k3 in Figure 3.8(b) has the mixed CR form {4, +, 1, ∗, 2}.

Out-of-sequence (OSV) variables are variables that contain out-of-sequence values, which include wrap-around variables. They are represented by (a set of) CR forms {a, #, s}i, where a is the initial out-of-sequence value and s is a nested CR form. For example, the variable k1 in Figure 3.7(b) is a wrap-around variable with CR form {n, #, 1, +, 1}.

Cyclic induction variables have cyclic dependence between the recurrence relations of variables. For example:

   a = 1, b = 0          a1 = φ(1, a2)  ↦ ⟨a1, (1, b1)⟩
   for (. . . )      ⇒   b1 = φ(0, b2)  ↦ ⟨b1, (0, a1)⟩
     t = b                b2 = a1
     b = a                a2 = b1
     a = t

In some cases cyclic IVs can be represented by geometric sequences [4, 12], but most cyclic forms represent special functions (e.g. the Fibonacci sequence is such an example). Some cyclic forms can be degenerated into monotonic sequences, by replacing a variable’s update with an unknown [70].

88 Conditional induction variables are represented by multiple CR forms represent path-

dependent value progressions, as variable k1 in Figure 3.8(b).

Unknown variables have unknown initial values or unknown update values, for example, the

variable k1 in Figure 3.10 (b) and (d). These unknown are typically function returns, updates with (unbounded) symbolic variables, or bit-operator recurrences. Some of

these are identified as monotonic. For example, an IV k1 in Figure 3.10 (b) with initial value 0 and a “random” positive stride function has a CR {0, +, >}, where the stride is represented by the lattice value >.

4.2.2 Induction Variable Removal and Substitution

Induction variable removal eliminates useless induction variables in a loop that contains two or more induction variables. Since the CR form in Inductive SSA captures the value sequence of a variable, it is easy to compare the value sequences of two induction variables to support induction variable elimination. Compared to the traditional IV removal methods [11, 1], which only compare linear induction variables to eliminate useless induction variables, IV removal with Inductive SSA is not only a simpler method to compare two linear induction variables based on their CR forms, but it also achieves greater coverage by comparing nonlinear variables. Algorithm 4 shows how a pair of induction variables is compared to eliminate the useless variable in Inductive SSA.

Procedure IV-Difference (v1, v2)
Input: program in Inductive SSA form, variables v1 and v2
Result: a constant, or a symbolic variable, or a CR# form
begin
  Get cached CR form c1 from Inductive SSA for variable v1 ;
  Get cached CR form c2 from Inductive SSA for variable v2 ;
  diff := c1 − c2 ;
  Simplify diff using the CR# algebra TRS ;
  return diff
end
Algorithm 4: Algorithm to compare a pair of IVs for IV removal in Inductive SSA.
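As an illustration only, a C sketch of IV-Difference on top of a CR library could look as follows; the cr_form type and the cr_lookup, cr_sub, and cr_simplify routines are hypothetical placeholders, not CRLIB's actual interface:

    /* Hypothetical CR-library interface (names are placeholders). */
    typedef struct cr_form cr_form;                    /* a CR# form or symbolic expression */
    extern cr_form *cr_lookup(int ssa_var);            /* cached CR form of an SSA variable */
    extern cr_form *cr_sub(cr_form *a, cr_form *b);    /* symbolic difference               */
    extern cr_form *cr_simplify(cr_form *e);           /* CR# term rewriting system         */

    /* IV-Difference: the simplified difference of the CR forms of two IVs.
       A constant (or plain symbolic) result means v1 and v2 differ by a
       loop-invariant amount, so one of the two variables can be removed.   */
    static cr_form *iv_difference(int v1, int v2)
    {
        cr_form *c1 = cr_lookup(v1);
        cr_form *c2 = cr_lookup(v2);
        return cr_simplify(cr_sub(c1, c2));
    }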

Comparison of two induction variables can be done by comparing their attached CR forms in Inductive SSA. If IV-Difference returns 0, the two induction variables v1 and v2 have exactly the same value sequence.

(a)
loop L1
B1:
  i1 = φ(0, i2) 7→ {0, +, 1}B1
  j1 = φ(0, j2) 7→ {0, +, 1}B1
  k1 = φ(0, k2) 7→ {0, +, 0, +, 1}B1
  l1 = φ(a, l2) 7→ {a, +, 0, +, 1}B1
  . . . = i1 . . .
  . . . = k1 . . .
  . . . = j1 . . .
  . . . = l1 . . .
  k2 = k1 + i1 7→ {0, +, 1, +, 1}B1
  l2 = l1 + j1 7→ {a, +, 1, +, 1}B1
  i2 = i1 + 1 7→ {1, +, 1}B1
  j2 = j1 + 1 7→ {1, +, 1}B1
endloop L1

(b)
loop L1
B1:
  i1 = φ(0, i2) 7→ {0, +, 1}B1
  k1 = φ(0, k2) 7→ {0, +, 0, +, 1}B1
  . . . = i1 . . .
  . . . = k1 . . .
  . . . = i1 . . .
  . . . = k1 + a . . .
  k2 = k1 + i1 7→ {0, +, 1, +, 1}B1
  i2 = i1 + 1 7→ {1, +, 1}B1
endloop L1

Figure 4.3: Result of IV removal (b) from code fragment (a) with Inductive SSA.

Thus, one of them can be eliminated and every use of the eliminated variable is updated to use the remaining variable. If IV-Difference returns a constant or a symbolic variable that is not a CR form, say a, then the value sequence of v2 + a is equal to the value sequence of v1; if v1 is chosen for elimination, then every use of v1 is replaced with v2 + a. If IV-Difference returns a CR form that is neither a constant nor a symbolic variable, IV removal is generally not worth doing, since it may introduce another new induction variable in the loop. The example in Figure 4.3 shows how induction variable comparison in Inductive SSA helps to eliminate useless IVs. Comparing i1 and j1 with IV-Difference returns 0, which shows that the two variables have the same value sequence. Therefore j1, together with j2, can be eliminated and the uses of j1 are replaced with i1. Comparing the nonlinear induction variables k1 and l1 returns the symbolic variable a. Therefore, any use of l1 can be replaced with k1 + a, and variable l1, together with l2, can be eliminated. Classic IV removal fails to eliminate variable l.

Induction Variable Substitution (IVS) replaces induction variable updates with closed-form expressions involving loop index variables.

(a)
loop L1:
B1:
  i1 = φ(0, i2) 7→ {0, +, 1}B1
  j1 = φ(1, j2) 7→ {1, +, 2}B1
  k1 = φ(0, k2) 7→ {0, +, 1, +, 2}B1
  k2 = k1 + j1 7→ {1, +, 3, +, 2}B1
  a[k2] = . . .
  j2 = j1 + 2 7→ {3, +, 2}B1
  i2 = i1 + 1 7→ {1, +, 1}B1
  . . .
endloop L1

(b)
loop L1:
B1:
  i1 = φ(0, i2) 7→ {0, +, 1}B1
  k2 = i1² + 2 ∗ i1 + 1 7→ {1, +, 3, +, 2}B1
  a[k2] = . . .
  i2 = i1 + 1 7→ {1, +, 1}B1
  . . .
endloop L1

Figure 4.4: Result of IVS (b) on example code fragment (a) with Inductive SSA

IVS is often applied to eliminate the cross-iteration dependences of induction variable updates and thereby enable the parallelization of a loop nest. IVS is a simple application of Inductive SSA, since the CR inverse rules (CR−1) of the CR algebra automatically translate CR expressions back to their equivalent closed forms. Consider the example in Figure 4.4. The variable k is a nonlinear induction variable. Due to the loop-carried scalar data dependences on variables k and j, the loop cannot be parallelized.

After IVS with Inductive SSA, the CR form for variable k1 is translated to its closed-form function i² + 2·i + 1 using the CR inverse rules. The update of k in the loop is then replaced with a closed-form function of the loop index variable i, as shown in Figure 4.4(b). The loop-carried dependence is removed and the loop can be parallelized.
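To make the effect concrete, the following minimal C sketch mirrors Figure 4.4 (the array a and bound n are assumed); the OpenMP pragma only illustrates that the substituted loop has no loop-carried scalar dependence and is not part of the transformation itself:

    /* Before IVS: k and j carry values across iterations (sequential). */
    void before_ivs(double *a, long n)      /* a must hold (n+1)*(n+1)+1 elements */
    {
        long j = 1, k = 0;
        for (long i = 0; i <= n; i++) {
            k = k + j;                      /* k2: {1, +, 3, +, 2}  (nonlinear IV) */
            a[k] = 1.0;
            j = j + 2;                      /* j:  {1, +, 2}                       */
        }
    }

    /* After IVS: k is a closed-form function of i (obtained by CR inverse). */
    void after_ivs(double *a, long n)
    {
        #pragma omp parallel for            /* no cross-iteration scalar dependence */
        for (long i = 0; i <= n; i++) {
            long k = i*i + 2*i + 1;
            a[k] = 1.0;
        }
    }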

4.2.3 Strength Reduction

Strength reduction is an important optimization that is often applied as part of array element address translation, benefiting loop nests with arrays indexed by induction variables. As discussed in the related work on strength reduction in Chapter 2, the candidate expressions of the classical strength reduction methods are typically limited to linear induction expressions. Compared with these classic methods, strength reduction is a simple application of

Inductive SSA and does not share the limitations of the classic SR methods. Loop strength reduction is straightforward to implement with CR forms, because a CR's recurrences are defined by the CR loop template (the CR semantics). In [50] it was suggested that the CR analysis method generalizes strength reduction of polynomials, exponentials, logarithms, trigonometric functions, and factorials. Strength reduction replaces expensive expressions in a loop by recurrences. Each recurrence can be expanded into a sequence of induction variable updates in the loop body using the semantically equivalent CR template. Thus, both linear and non-linear arithmetic operations can be replaced with cheaper addition operations, which the classic strength reduction approaches fail to handle.

A CR form Φi = {ϕ0, ⊙1, ϕ1, ⊙2, · · · , ⊙k, ϕk}i is defined by the following loop template, which represents the semantics of a CR form:

    cr0 = ϕ0
    cr1 = ϕ1
      :
    crk−1 = ϕk−1
    for i = 0 to n−1
        val[i] = cr0
        cr0 = cr0 ⊙1 cr1
        cr1 = cr1 ⊙2 cr2
          :
        crk−1 = crk−1 ⊙k ϕk
    end for
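For instance, instantiating this template for the polynomial f(i) = i², whose CR form is {0, +, 1, +, 2}i, gives the following C sketch (an illustration only), in which the squaring is strength-reduced to two additions per iteration:

    /* Evaluate f[i] = i*i for i = 0..n-1 with the CR loop template of
       {0, +, 1, +, 2}: cr0 holds f(i), cr1 holds the running first difference. */
    void eval_square_cr(long *f, long n)
    {
        long cr0 = 0;            /* phi_0 */
        long cr1 = 1;            /* phi_1 */
        for (long i = 0; i < n; i++) {
            f[i] = cr0;
            cr0 = cr0 + cr1;     /* cr0 = cr0 (+)_1 cr1   */
            cr1 = cr1 + 2;       /* cr1 = cr1 (+)_2 phi_2 */
        }
    }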

In Inductive SSA, every scalar left-hand-side SSA variable is mapped to a CR form. Therefore, both linear and non-linear arithmetic with multiply operations can be replaced with cheaper addition operations by strength reduction with CRs, based on the CR loop template. Consider the example shown in Figure 4.5(a) and its Inductive SSA form shown in Figure 4.5(b). The CR simplification rules are used to simplify the CR form of variable f; the simplified CR form is shown in Figure 4.5(c). Based on the CR semantics, the CR form of variable f defines two induction variables x and y. The initial value of x is the first coefficient of the new CR form; the step of x is the induction variable y, whose initial value is the second coefficient of the new CR form and whose step is 2ab. Thus, based on the CR loop template, the CR form of variable f can be expanded into the sequence of induction variable updates shown in Figure 4.5(d).

(a)
i = i0
j = j0
for (k = 0 to n)
  f = i × j
  i = i + a
  j = j + b
end for

(b)
loop L1:
B1:
  i1 = φ(i0, i2) 7→ {i0, +, a}B1
  j1 = φ(j0, j2) 7→ {j0, +, b}B1
  . . .
  f = i1 × j1 7→ {i0j0, +, i0b + j0a + ab, +, 2ab}B1
  i2 = i1 + a 7→ {i0 + a, +, a}B1
  j2 = j1 + b 7→ {j0 + b, +, b}B1
endloop L1

(c)
f(k) = i1 × j1
     = {i0, +, a} × {j0, +, b}
     = {i0j0, +, i0b + j0a + ab, +, 2ab}

(d)
x = i0 × j0
y = i0 × b + j0 × a + a × b
for (k = 0 to n)
  f = x
  x = x + y
  y = y + 2ab
end for

Figure 4.5: Result of SR (d) on example code fragment (a) in Inductive SSA form (b) using the CR# form coefficients (c) as IVs in (d).

The computation of i × j on the right-hand side of the first instruction in the loop is replaced by a sequence of additions of induction variables. As we can see, the non-linear arithmetic i × j is replaced with cheaper add operations. The induction variables i and j become useless and can be eliminated. When the loop is multi-dimensional, SR can still be performed based on the same idea; however, the transformation must start from the innermost loops, and multivariate CR forms are considered. Strength reduction in Inductive SSA does not have the limitations of the classic SR methods. Furthermore, strength reduction in Inductive SSA may achieve higher ILP for a program: the sequence of induction variable updates follows the CR loop template, in which there are only anti dependences between the statements in each iteration of the loop, so they can be executed in parallel.

(a)
for i = 0 to n
s1:  j = i × i
s2:  k = 2 × i
s3:  x = j + k
  . . .
end for

(b)
loop L1
B1:
  i1 = φ(0, i2) 7→ {0, +, 1}B1
  j1 = i1 × i1 7→ {0, +, 1, +, 2}B1
  k1 = 2 × i1 7→ {0, +, 2}B1
  x1 = j1 + k1 7→ {0, +, 3, +, 2}B1
  . . .
  i2 = i1 + 1 7→ {1, +, 1}B1
endloop L1

(c)
u = 0
v = 3
for i = 0 to n
s1:  x = u
s2:  u = u + v
s3:  v = v + 2
  . . .
end for

Figure 4.6: Effect of SR on ILP (c) for example code fragment (a) in Inductive SSA (b).

The example loop in Figure 4.6(a) is transformed to the code in Figure 4.6(c) after strength reduction in Inductive SSA. The instruction-level parallelism (ILP) is higher in Figure 4.6(c) than in Figure 4.6(a). Since there are flow dependences between s1 and s3 and between s2 and s3, the result of s3 depends on the results of s1 and s2. Thus instruction s3 cannot be executed in parallel with s1 and s2; s3 may be scheduled by a compiler in a subsequent VLIW instruction. After strength reduction with the CR approach, in contrast to the code in Figure 4.6(a), there are only anti dependences between instructions s1, s2 and s3 in each iteration of the loop in Figure 4.6(c). Since all three operations can run in parallel, they can be arranged in one VLIW instruction, and the ILP is higher than that of Figure 4.6(a).

4.2.4 Constant Propagation

Constant propagation [2, 1] is an optimization frequently applied in compilers. Constants assigned to a variable can be propagated through the flow graph and substituted at the uses of the variable at compile time. Sparse conditional constant propagation (SCCP) [2] is efficiently implemented on SSA. Inductive SSA provides an alternative, improved method for SCCP; the asymptotic time complexity of SCCP and Inductive SSA is the same. The CR methods of Inductive SSA automatically propagate constants in the loop during CR form construction. Since CR forms are simplified during construction, constant folding is applied at the same time.

B1:  a1 ← 3
     d1 ← 2
B2:  d3 ← φ(d1, d2) 7→ [2]    (by {2, #, 2}2→5 ⇒ 2)
     a3 ← φ(a1, a2) 7→ [3]    (by {3, #, 5 − 2}2 ⇒ 3)
     f1 ← a3 + d3 7→ [5]
     g1 ← 5 7→ [5]
     a2 ← g1 − d3 7→ [3]
     t1 ← f1 − g1 7→ [0]
B3:  f2 ← g1 + 1 7→ [6]             B4:  t2 ← g1 − a2 7→ [2]
B5:  f3 ← φ(f1, f2) 7→ [5{2→5→4}, 6{2→5→3}]
     d2 ← 2 7→ [2]
     goto B2
(B2 branches to B3 or B4; both join at B5.)

Figure 4.7: Inductive SSA applied to an example taken from [1, p.368]. The SCCP algorithm [2] marks the branch to B4 dead, because t1 ≤ 0 holds, and thus determines that f3 = 6. Marking B4 dead refines f3 7→ [6] in Inductive SSA.

To obtain a sparse version of Inductive SSA, just marking the proofs of dead branches suffices. Decision-path information is already carried in the proofs. Proof terms for unreachable paths are then eliminated. Consider the example from [1, p.368] shown in Inductive SSA form in Figure 4.7.

The CR#-TRS reduces the wrap-around forms d3 7→ {2, #, 2}2→5 ⇒ 2 and a3 7→ {3, #, g1−d3}2 = {3, #, 5−2}2 ⇒ 3 in the proof construction process. By marking the dead branch B4 as unreachable, f3 is simplified to f3 7→ [6]. The advantage of Inductive SSA is that it propagates constants in certain cases that SCCP cannot. CP and SCCP methods are based on lexical and structural SSA information; therefore, they cannot recognize semantic constants in loops, and only the static constants in the program are recognized and propagated. In Inductive SSA, additional constants with a semantic origin are automatically derived from CR#-TRS rewrites and algebraic cancellations, for example when two linear induction variables are compared and the loop-invariant difference yields a constant. Therefore, semantic constants can be detected simply by checking the CR result in Inductive SSA. Algorithm 5 shows how to detect a semantic constant for a given expression via its CR result in Inductive SSA: if the length of the CR form for the expression is 1 and its first coefficient is a constant, then the expression represents a constant.

Procedure Constant-P (v)
Input: program in Inductive SSA form, variable v
Result: true if v is a constant, false otherwise
begin
  Get cached CR form c from Inductive SSA for variable v ;
  if length of c is 1 and the first coefficient of c is constant then
    return true
  else
    return false
  end
end
Algorithm 5: Algorithm to detect constants for CP in Inductive SSA.
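A minimal C sketch of this check is shown below; the cr_form layout and all names are invented for exposition and do not correspond to CRLIB's actual data structures:

    /* Illustrative CR-form representation: coeff[0] is the initial value,
       op[j] is the operator ('+', '*', or '#') between coefficients j and j+1. */
    enum coeff_kind { COEFF_CONST, COEFF_SYMBOLIC, COEFF_CR };

    struct cr_coeff { enum coeff_kind kind; double constant; void *sym; };

    struct cr_form {
        int              index_var;   /* loop (basic block) the CR is bound to */
        int              length;      /* number of coefficients                */
        char            *op;          /* operators between coefficients        */
        struct cr_coeff *coeff;
    };

    /* Constant-P: a CR form of length 1 whose only coefficient is a constant
       denotes a (possibly semantic) constant.                                 */
    static int constant_p(const struct cr_form *c)
    {
        return c->length == 1 && c->coeff[0].kind == COEFF_CONST;
    }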

(a)
loop:
B1:
  i1 = φ(1, i2) 7→ {1, +, 2}B1
  j1 = φ(2, j2) 7→ {2, +, 2}B1
  i2 = i1 + 2 7→ {3, +, 2}B1
  j2 = j1 + 2 7→ {4, +, 2}B1
  k1 = j2 − i2 7→ {1}B1
  t1 = k1 + 1 7→ {2}B1
  . . .
endloop

(b)
loop:
B1:
  i1 = φ(1, i2) 7→ {1, +, 2}B1
  j1 = φ(2, j2) 7→ {2, +, 2}B1
  i2 = i1 + 2 7→ {3, +, 2}B1
  j2 = j1 + 2 7→ {4, +, 2}B1
  k1 = 1 7→ {1}B1
  t1 = 2 7→ {2}B1
  . . .
endloop

Figure 4.8: Result of CP (b) on an example code fragment (a) with Inductive SSA.

The example in Figure 4.8 shows how constant propagation is performed automatically in Inductive SSA. The CR form for variable k1 reveals that the value of k1 remains constant in each iteration. The code after CP is shown in Figure 4.8(b). The variable t1 also turns out to be a constant based on its mapped CR form. Note that current constant propagation techniques, such as SCCP, only take structural information of the SSA form into account by inspecting statements that are constant assignments; therefore, these techniques do not optimize k1 and t1 to constants.

4.2.5 Loop-Invariant Code Motion

Loop-Invariant Code Motion (LICM) [1] moves computations that are unchanged in the loop out of the loop, thus eliminating redundant computation in every iteration of the loop. The classic algorithms that identify loop invariants [1] rely on the structural information in the program: a computation is loop invariant if the reaching definitions of its operands are outside the loop or are themselves loop invariant. Compared to loop-invariant detection in Inductive SSA, classic LICM fails to consider the semantic value information of variables and misses semantic loop-invariant expressions.

Procedure Loop-Invariant-P (v)
Input: program in Inductive SSA form, variable v
Result: true if v is a loop invariant, false otherwise
begin
  Get cached CR form c from Inductive SSA for variable v ;
  if length of c is equal to 1 then
    return true
  else
    return false
  end
end
Algorithm 6: Algorithm to detect loop-invariant variables for LICM in Inductive SSA.
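Under the same illustrative cr_form layout used above for Constant-P, the loop-invariance test only checks that the form has a single coefficient, i.e. no update function; the coefficient itself may be a constant or a symbolic expression:

    /* Loop-Invariant-P: a single-coefficient CR form has no update function,
       so the variable's value does not change across loop iterations.        */
    static int loop_invariant_p(const struct cr_form *c)
    {
        return c->length == 1;
    }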

The algorithm to detect loop invariants for a given expression via its CR result in Inductive SSA is shown in Algorithm 6. In Inductive SSA, the CRs that are simplified to loop-invariant expressions are the candidates for loop-invariant expression removal. If the length of a CR form is 1, the form has no update function and only an initial value as its single coefficient (the initial value can be a constant or a symbolic expression), so the variable with this CR form represents a loop invariant. Compared to classic loop-invariant removal, LICM with Inductive SSA recognizes semantic loop-invariant expressions. The example in Figure 4.9 shows a semantic loop invariant detected with Inductive SSA.

A non-trivial example is variable u1 in Figure 4.9(a): it is not obvious that the expression i1 ∗ 3 − y1 is loop invariant. Classic LICM fails to recognize the loop-invariant variable u1 because the expression that updates u1 contains the loop-variant variables i1 and y1.

In Inductive SSA, the CR form {i0 ∗ 2 − 4}B1 computed for variable u1, where i0 is the initial value of variable i at the start of the loop, reveals that the value of variable u1 does not change in the loop.

(a)
loop:
B1:
  i1 = φ(i0, i2) 7→ {i0, +, 1}B1
  t1 = φ(i0, t2) 7→ {i0, +, 3}B1
  x1 = t1 + 1 7→ {i0 + 1, +, 3}B1
  y1 = x1 + 3 7→ {i0 + 4, +, 3}B1
  t2 = t1 + 3 7→ {i0 + 3, +, 3}B1
  u1 = i1 ∗ 3 − y1 7→ {i0 ∗ 2 − 4}B1
  i2 = i1 + 1 7→ {i0 + 1, +, 1}B1
  . . .
endloop

(b)
u1 = i0 ∗ 2 − 4
loop:
B1:
  i1 = φ(i0, i2) 7→ {i0, +, 1}B1
  i2 = i1 + 1 7→ {i0 + 1, +, 1}B1
  . . .
endloop

Figure 4.9: Result of LICM (b) on an example code fragment with Inductive SSA.

Therefore, the assignment to variable u1 can be moved out of the loop using the simplified expression for u1, namely its CR result i0 ∗ 2 − 4, as shown in Figure 4.9(b).

4.2.6 Common Sub-expression Elimination and Partial Redundancy Elimination

CSE and PRE are compiler optimizations that rely on proving the equivalence of expressions; they are the two most important redundancy elimination optimizations. As discussed in Section 2.3.6, Muchnick [1] presents both local and global CSE algorithms that only recognize lexically equivalent expressions as CSE candidates. Several classic PRE algorithms [27, 26, 77, 74] likewise only recognize lexically equivalent expressions. This problem of classic PRE was noted by several authors [25, 80, 81, 82, 31], whose algorithms find more expression equivalences than classic PRE, either by exploiting ad-hoc algebraic rules, reassociation, renaming, ranking, and sorting of expressions to uncover more redundant computations, or by integrating GVN and PRE to eliminate redundancies missed by either technique. A state-of-the-art SSA-based formulation of GVN-PRE [31] subsumes GVN and PRE and eliminates redundancies missed by previous work. In contrast, CSE and PRE in the Inductive SSA framework capture the actual value progression behavior of the variables, work in a more unified way, and recognize semantic redundancies that these approaches miss. Inductive SSA supports and significantly enhances

GVN-PRE by systematically detecting a wider range of congruence relations through exploiting the properties of loop-variant variables at the loop level. These equivalences include the syntactically different but semantically identical IV expressions and the semantic forms of loop invariants, which are not removed by the original GVN-PRE method; see the motivating examples in Figure 1.1, Figure 1.5 and Figure 1.6 in Chapter 1.

Procedure Equivalent-P (v1, v2)
Input: program in Inductive SSA form, variables v1 and v2
Result: true if v1 and v2 are equivalent, false otherwise
begin
  Get cached inductive proof c1 for variable v1 from Inductive SSA ;
  Get cached inductive proof c2 for variable v2 from Inductive SSA ;
  if v1 ≡ v2 or c1 ≡ c2 then
    return true
  else
    return false
  end
end
Algorithm 7: Algorithm to detect equivalent expressions for CSE and PRE in Inductive SSA.

Algorithm 7 shows the procedure to detect equivalent expressions in Inductive SSA for CSE and PRE. Matching of equivalent expressions is done by matching their corresponding CR forms in Inductive SSA. If the two CR forms of the given expressions match completely, including their index variables, lengths, coefficients, and operators, then the two expressions are equivalent despite any lexical difference. With this semantic matching of expressions, CSE and PRE achieve better coverage. Not shown is a hash-based implementation that uses hash tables to reduce the matching cost; the CR# forms and inductive proofs should be hashed to save space and matching time.
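A rough sketch of the matcher and of the hash mentioned above, again over the illustrative cr_form layout introduced earlier (symbolic coefficients would need a recursive comparison, which is omitted here), might look like this:

    /* Structural match: same index variable, length, operators, and coefficients. */
    static int cr_equal(const struct cr_form *a, const struct cr_form *b)
    {
        if (a->index_var != b->index_var || a->length != b->length)
            return 0;
        for (int j = 0; j + 1 < a->length; j++)
            if (a->op[j] != b->op[j])
                return 0;
        for (int j = 0; j < a->length; j++)
            if (a->coeff[j].kind != b->coeff[j].kind ||
                (a->coeff[j].kind == COEFF_CONST &&
                 a->coeff[j].constant != b->coeff[j].constant))
                return 0;               /* symbolic coefficients: compare recursively */
        return 1;
    }

    /* Cheap hash over the CR "shape", used to bucket forms before cr_equal. */
    static unsigned cr_hash(const struct cr_form *c)
    {
        unsigned h = (unsigned)(31 * c->index_var + c->length);
        for (int j = 0; j + 1 < c->length; j++)
            h = 31u * h + (unsigned)c->op[j];
        return h;
    }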

Consider the example in Figure 4.10(a). Although the expressions of t3 and t5 are syntactically different, the identical CR form of variables t3 and t5 in Inductive SSA reveals that their value sequences are equivalent. Therefore, the redundant computation is eliminated, as shown in Figure 4.10(b). The classic PRE methods fail to recognize the semantic equivalence of t3 and t5.

(a)
loop:
B1:
  i1 = φ(0, i2) 7→ {0, +, 1}B1
  if . . .
    t1 = a ∗ i1 7→ {0, +, a}B1
    t2 = b ∗ i1 7→ {0, +, b}B1
    t3 = t1 + t2 7→ {0, +, a + b}B1
  endif
  t4 = a + b 7→ {a + b}B1
  t5 = t4 ∗ i 7→ {0, +, a + b}B1
  i2 = i1 + 1 7→ {1, +, 1}B1
  . . .
endloop

(b)
loop:
B1:
  i1 = φ(0, i2) 7→ {0, +, 1}B1
  t0 = a ∗ i1 7→ {0, +, a}B1
  t1 = b ∗ i1 + t0 7→ {0, +, a + b}B1
  if . . .
    t3 = t1 7→ {0, +, a + b}B1
  endif
  t5 = t1 7→ {0, +, a + b}B1
  i2 = i1 + 1 7→ {1, +, 1}B1
  . . .
endloop

Figure 4.10: Result of PRE transformation (b) on an example code fragment (a) with Inductive SSA.

CHAPTER 5

IMPLEMENTATION

This chapter presents our implementation of Inductive SSA and the enhanced optimizations in the GCC 4.x compiler environment.

5.1 GNU Compiler Collection (GCC)

We implemented the Inductive SSA extension and enhanced the GVN-PRE algorithm with Inductive SSA-based CP, LICM and congruence detection on top of the Tree SSA [94] in GCC 4.1.

5.1.1 Introduction

The GNU Compiler Collection (GCC) [95] is an open-source and free compiler distributed under the GNU General Public License. GCC is used as the system compiler for GNU/Linux and many other operating systems. GCC targets many popular platforms and provides front ends for numerous programming languages, such as C, C++, and Java. As of today, GCC is the most popular compiler toolchain available. In the last few years, GCC has undergone a major transition from version 3 to version 4. GCC versions after 4.0 include numerous state-of-the-art optimizations and functional improvements over their predecessors and are still in very active development. For these reasons, we chose GCC as our experimental compiler to implement our new compiler analysis framework and to compare the effectiveness of our algorithms with the state-of-the-art optimizations implemented in the latest version of GCC.

[Figure 5.1 diagram: each language front end (C, C++, Java, ...) produces trees that are genericized to GENERIC, gimplified to GIMPLE, processed by SSA optimization passes 1..N, converted out of SSA, and handed to the RTL back end.]

Figure 5.1: GCC intermediate representation integration schematic.

5.1.2 Intermediate Representation in GCC

RTL

In the older versions of GCC, code-improving transformations are performed on an intermediate representation called Register Transfer Language (RTL) [96], an architecture-independent, lisp-like assembly language. Parse trees generated by the front end are immediately converted into RTL and passed on to the optimizer. RTL works well for optimizations that are close to the target [94]. Unfortunately, for many code transformations RTL is not a suitable and effective representation, because it is too close to the actual machine language, which hinders many high-level optimizations performed by modern compilers. It became increasingly obvious that a new, high-level intermediate representation needed to be added to GCC.

GENERIC and GIMPLE

To address the need for a high-level intermediate representation in GCC, two new IRs, GENERIC and GIMPLE [97], were introduced with the release of GCC 4.0. The purpose of GENERIC is simply to provide a language-independent way of representing an entire function in trees. All language front ends emit GENERIC IR. Next, this tree is lowered to GIMPLE form. GIMPLE is a three-address representation derived from GENERIC by breaking down GENERIC expressions into tuples of no more than 3 operands. Temporaries are introduced to hold intermediate values needed to compute complex expressions. Additionally, all the control structures used in GENERIC are lowered into conditional jumps. The compiler pass which converts GENERIC into GIMPLE is referred to as the gimplifier. The gimplifier works recursively, generating GIMPLE tuples out of the original GENERIC expressions.
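For instance, a statement with a nested expression is broken into three-address tuples during gimplification. The following hand-written illustration shows the idea (actual GIMPLE dumps, e.g. from -fdump-tree-gimple, differ in naming and detail):

    /* Gimplification example: the nested expression in f() is broken into
       three-address tuples; the approximate GIMPLE is shown as a comment. */
    int f(int b, int c, int d)
    {
        return b + c * d;
        /* Approximate GIMPLE form:
             D.1 = c * d;
             D.2 = b + D.1;
             return D.2;                                                   */
    }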

Tree SSA

Tree SSA [94] is a new optimization framework based on the SSA form that operates on GCC's tree representation. The goal of the Tree SSA design is to build a completely machine-independent optimization framework based on the SSA form. SSA is an intermediate representation (IR) that is becoming increasingly popular because it allows efficient implementations of data flow analysis and optimizing transformations. Converting the program into Tree SSA form provides a basis for implementing many high-level optimizations and analyses that are either difficult or impossible to implement in RTL. The integration of GCC's new intermediate representations is shown in Figure 5.1.

5.2 Implementation of Inductive SSA Extension

We implemented the Inductive SSA extension on top of the Tree SSA [94] in GCC 4.1.

5.2.1 CR Library Implementation

The inductive proofs in Inductive SSA are constructed based on the CR# algebra. To construct inductive proofs for each SSA variable, a CR analysis infrastructure with the CR#-TRS is needed. To implement an efficient and extendable CR framework, it is preferable to avoid reliance on a specific compiler infrastructure. We therefore developed a modular and

independent CR analysis library, called CRLIB, written in C. The framework captures the expressiveness of the CR form, implements the CR algebra rules for simplification and inverse computation, and, in addition, implements algorithms for CR construction, monotonicity evaluation, and value range analysis.

Representation  The CR infrastructure is built around a set of data structures used for representing CR expressions and routines used for CR simplification and CR−1 computation. In addition, non-CRs, including constants and symbolic expressions, are also represented. Symbols are implemented as void pointers that point to the data structure (or interface) used to access the symbol at its location in the linked compiler. This creates the need for an API, implemented by the linked compiler, that is called by CRLIB to access value, type, and value range information for each symbol.
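A rough sketch of what such a representation and compiler-callback API could look like in C is given below; all type and function names here are invented for illustration and do not correspond to CRLIB's actual declarations:

    /* Illustrative CR-expression node: either a CR form, a constant, or a
       symbol owned by the linked compiler (referenced through a void *).   */
    enum cr_kind { CR_FORM, CR_CONST, CR_SYMBOL };

    struct cr_expr {
        enum cr_kind kind;
        union {
            struct { int length; struct cr_expr **coeff; char *op; } form;
            double  constant;
            void   *symbol;      /* opaque handle into the host compiler     */
        } u;
    };

    /* Callbacks the host compiler registers so the library can query symbols. */
    struct cr_compiler_api {
        int  (*symbol_is_constant)(void *symbol, double *value_out);
        int  (*symbol_type_is_integer)(void *symbol);
        void (*symbol_value_range)(void *symbol, double *lo, double *hi);
    };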

Rewriting System  The manipulation of CR forms for simplification and construction requires a term rewriting system over CR forms. CRLIB includes a term rewriting system to aid in the normalization of CR forms through simplification, where simplification defines a system of reductions for CR expressions that is both confluent and terminating. Simplification is used throughout the analysis process, i.e., in CR construction, the computation of a CR inverse, the analysis of CR monotonicity, and CR bounds analysis; it is therefore important that the simplification process be efficient. Our implementation is similar to constant folding in an optimizing compiler, and consequently the actions and overhead of simplification are similar. In addition to symbolic simplification, the rewriting system implements the CR# simplification rules for CR expressions and the CR# alignment and CR# bounding algorithms given in Chapter 3.

5.2.2 Inductive SSA Construction

We implemented the Inductive SSA construction algorithm, presented in Chapter 4, in GCC 4.1 to annotate the SSA form with the value progressions of scalar variables in a loop nest. The Inductive SSA annotation algorithm is implemented as an optimization pass in GCC 4.1. It is placed after the SSA construction pass and the loop construction pass in the Tree SSA optimizations, since the SSA form and the loop structure are prerequisites of the Inductive SSA annotation. With these two prerequisites satisfied, the Inductive SSA pass can be invoked flexibly before any other tree optimization. Once the recurrence relations are collected for an

SSA variable, a call to the CRLIB interface returns the inductive proofs, represented by CR forms or expressions simplified and constructed by CRLIB. As a result of the Inductive SSA annotation algorithm, each SSA variable in the loop nest has a mapped inductive proof that can be exploited by later optimizations.

5.3 Loop-variant Variable Analysis Implementation

Inductive SSA generalizes loop-variant variable analysis. With the inductive proofs in Inductive SSA available for every SSA variable, we are able to recognize and classify extensive classes of loop-variant variables based on their attached CR forms. We implemented the algorithm described in Chapter 4 to analyze and classify all SSA loop-variant variables based on their inductive proofs.

5.4 Enhanced GVN-PRE with Inductive SSA-based Optimizations

With the Inductive SSA extension available, we enhanced the GVN-PRE algorithm in GCC 4.1 with Inductive SSA-based constant detection, loop-invariant detection and congruence detection, as described in Section 4.2 of Chapter 4. Since PRE subsumes LICM and CSE, these Inductive SSA-based algorithms were integrated into the PRE phase of GCC 4.1 to detect the additional equivalent expressions, constants in the loop, and loop invariants that the PRE pass of GCC 4.1 fails to recognize. GCC 4.x adopted a state-of-the-art PRE algorithm, named GVN-PRE, developed by VanDrunen and Hosking [31]. Whenever GVN-PRE fails to detect a congruence, the proofs in Inductive SSA are compared, as described in Algorithm 7 of Chapter 4, to detect additional equivalences of SSA variables that were missed. Algorithms 5 and 6 of Chapter 4 were implemented in GVN-PRE to check the inductive proof of each SSA variable for the detection of additional constants and loop invariants.

CHAPTER 6

EXPERIMENTAL RESULTS

This chapter gives an experimental evaluation of Inductive SSA for loop-variant variable analysis, constant detection, loop-invariant detection, and congruence detection.

6.1 SPEC CPU2000 Benchmarks

SPEC CPU2000 [98] is a software benchmark produced by the Standard Performance Evaluation Corp. (SPEC). It is designed to provide performance measurements that can be used to compare compute-intensive workloads on different computer systems; that is, SPEC CPU2000 focuses on the performance of the CPU, the memory architecture, and the compilers. SPEC CPU2000 contains two benchmark suites: CINT2000 for measuring and comparing compute-intensive integer performance, and CFP2000 for measuring and comparing compute-intensive floating point performance. The SPEC CPU2000 benchmark suite represents a diverse set of actual end-user applications, listed in Table 6.1, and provides a comparative measure of integer and/or floating point compute-intensive performance. The wide usage of SPEC, its openly published results, and its focus on performance and compilers provide a very good reference point for our experiments to measure the impact of our Inductive SSA framework.

6.2 Measuring the Inductive Proof Choice Set Sizes

The Inductive SSA construction algorithm uses a choice set in the inductive proofs to model values produced along control flow paths. Because the number of control flow paths is exponential in the number of decision points, full path analysis is potentially exponential. However, we use a path matching mechanism on proof structures and only store proof

Table 6.1: The SPEC CPU2000 benchmark suite.

Benchmark   Description
CINT2000
GZIP        Data compression utility
VPR         FPGA circuit placement and routing
GCC         C compiler
MCF         Minimum cost network flow solver
CRAFTY      Chess program
PARSER      Natural language processing
EON         Ray tracing
PERLBMK     Perl
GAP         Computational group theory
VORTEX      Object-oriented database
BZIP2       Data compression utility
TWOLF       Place and route simulator
CFP2000
WUPWISE     Quantum chromodynamics
SWIM        Shallow water modeling
MGRID       Multi-grid solver in 3D potential field
APPLU       Parabolic/elliptic partial differential equations
MESA        3D graphics library
GALGEL      Fluid dynamics: analysis of oscillatory instability
ART         Neural network simulation: adaptive resonance theory
EQUAKE      Finite element simulation: earthquake modeling
FACEREC     Computer vision: recognizes faces
AMMP        Computational chemistry
LUCAS       Number theory: primality testing
FMA3D       Finite-element crash simulation
SIXTRACK    Particle accelerator model
APSI        Solves problems regarding temperature, wind, distribution of pollutants

structures that have multiple choices rather than explicitly representing all paths on which the proofs were derived. Therefore, in practice the choice sets in the inductive proofs have limited size, and we bound the choice set size to at most ten terms to ensure an asymptotically linear-time Inductive SSA construction algorithm. The experimental results in Table 6.2 indicate that the size of the choice set in inductive proofs in Inductive SSA does not exceed 5 terms for any SPEC CPU2000 benchmark. The average size of the choice set for conditionally updated variables of the SPEC2000 benchmarks ranges from 2.00 to 2.41.

Table 6.2: Measuring the sizes of choice sets in Inductive SSA on SPEC2000.

CINT2000
Benchmark   Max   Average   Standard Deviation
GZIP         3     2.41      0.49
VPR          3     2.08      0.27
MCF          3     2.22      0.33
CRAFTY       3     2.05      0.21
PARSER       3     2.04      0.20
GAP          4     2.26      0.49
VORTEX       3     2.08      0.28
BZIP2        4     2.32      0.61
TWOLF        3     2.06      0.24
Average      3.22  2.17      0.35

CFP2000
Benchmark   Max   Average   Standard Deviation
APPLU        2     2         0.0
MESA         2     2         0.0
ART          2     2         0.0
EQUAKE       2     2         0.0
FACEREC      2     2         0.0
AMMP         3     2.2       0.4
LUCAS        2     2         0.0
SIXTRACK     2     2         0.0
Average      2.13  2.03      0.04

The standard deviation for each benchmark ranges from 0.0 to 0.61, which is very small. Since a small standard deviation means that the size of the recurrence relation list of a conditionally updated variable is close to the average size, we found that the threshold value of ten used to limit the size of the choice set is sufficiently large to handle the SPEC2000 benchmarks accurately.

6.3 Results of Loop-Variant Variable Classification with Inductive SSA

By using our Inductive SSA construction algorithm implemented in GCC 4.1, we are able to analyze the loop-variant variables in the SPEC CPU2000 benchmarks based on their attached inductive proofs, as described in Section 4.2.1 of Chapter 4. Table 6.3 shows the experimental results of all induction variables categorized in SPEC2000¹ with our algorithm. The first column in the table names the benchmark. The columns labeled "Linear", "Polynomial", "Geometric", "OSV", "Cyclic", "Conditional", "Mix" and "Unknown" show each loop-variant variable category as a percentage of the total number of loop-variant variables in each benchmark.

¹ Three CINT2000 and one CFP2000 benchmark results are not listed because of a known compatibility issue between GCC 4.1 and SPEC2000.

Table 6.3: Results of loop-variant variable classification with Inductive SSA on SPEC2000.

Benchmark   Linear   Polyn'l  Geom.   OSV     Cyclic  Cond'l   Mix     Unknown
CINT2000
GZIP        59.45%   0.00%    0.00%   0.79%   0.00%    7.48%   0.00%   32.29%
VPR         59.47%   0.00%    0.21%   0.21%   0.00%    9.05%   0.00%   31.07%
MCF         38.18%   0.00%    0.00%   0.00%   0.00%   10.91%   0.00%   50.91%
CRAFTY      47.91%   0.00%    0.00%   0.00%   0.00%   12.71%   0.00%   39.37%
PARSER      35.19%   0.00%    0.00%   0.51%   0.00%    5.22%   0.51%   58.58%
GAP         62.73%   0.00%    2.52%   1.00%   0.33%    5.85%   0.38%   27.51%
VORTEX      66.06%   3.03%    0.61%   2.42%   0.00%   15.15%   0.00%   12.73%
BZIP2       54.67%   0.00%    0.93%   0.00%   0.00%   12.15%   1.40%   30.84%
TWOLF       40.21%   0.00%    0.00%   0.00%   0.00%    5.35%   0.00%   54.45%
Average     51.54%   0.34%    0.47%   0.55%   0.04%    9.32%   0.25%   37.53%
CFP2000
WUPWISE     80.20%   0.00%    0.00%   0.00%   0.00%    0.00%   0.00%   19.80%
SWIM        96.30%   0.00%    0.00%   0.00%   0.00%    0.00%   0.00%    3.70%
MGRID       84.06%   0.00%    0.00%   0.00%   0.00%    0.00%   0.00%   15.94%
APPLU       94.77%   0.00%    0.00%   0.00%   0.00%    1.31%   0.00%    3.92%
MESA        79.57%   0.00%    0.30%   0.00%   0.00%   12.73%   0.00%    7.40%
ART         73.12%   0.00%    0.00%   0.00%   0.00%    4.30%   0.00%   22.58%
EQUAKE      81.25%   0.00%    0.00%   2.08%   1.04%    3.12%   0.00%   13.54%
FACEREC     86.92%   0.00%    0.42%   0.00%   0.00%    2.53%   0.00%   10.13%
AMMP        59.89%   0.00%    0.00%   2.54%   0.00%    3.95%   0.00%   33.62%
LUCAS       87.68%   0.00%    1.48%   0.00%   0.00%    1.97%   0.99%    7.88%
FMA3D       74.57%   0.00%    0.00%   0.77%   0.12%    2.72%   0.00%   21.77%
SIXTRACK    83.87%   0.00%    2.15%   2.15%   0.00%    1.08%   1.08%    9.68%
APSI        80.89%   0.00%    0.69%   1.26%   0.92%    1.26%   0.00%   14.99%
Average     81.77%   0.00%    0.39%   0.68%   0.15%    2.69%   0.16%   14.22%

From the results in Table 6.3, the percentage of conditional induction variables ranges from 5.22% to 15.15% in CINT2000, with 9.32% on average. None of these are detected by GCC or by other compilers such as Open64 and Polaris [99] (Polaris uses advanced nonlinear IV recognition algorithms [13]). Our algorithm also identifies all geometric, mix, cyclic and wrap-around induction variables; none of these are currently detected by the GCC implementation. Table 6.4 shows the sub-classification results of the conditional loop-variant variable category in SPEC2000.

Table 6.4: Inductive SSA classification of conditional loop-variant variables in SPEC2000.

Benchmark   Linear    Polyn'l  Geom.   OSV       Mix      Unknown
CINT2000
GZIP         42.11%   0.00%    0.00%     5.26%   15.79%   36.84%
VPR          72.73%   0.00%    0.00%     9.09%    0.00%   18.18%
MCF          66.67%   0.00%    0.00%     0.00%   16.67%   16.67%
CRAFTY       32.81%   0.00%    0.00%    10.94%    7.81%   48.44%
PARSER       70.97%   0.00%    0.00%    19.35%    0.00%    9.68%
GAP          45.53%   0.00%    0.00%    20.33%    6.50%   27.64%
VORTEX       60.00%   4.00%    0.00%    24.00%    0.00%   12.00%
BZIP2        61.54%   0.00%    0.00%    23.08%    7.69%    7.69%
TWOLF        51.61%   0.00%    0.00%    40.32%    0.00%    8.06%
Average      55.99%   0.44%    0.00%    16.93%    6.05%   20.58%
CFP2000
APPLU       100.00%   0.00%    0.00%     0.00%    0.00%    0.00%
MESA         63.26%   0.00%    0.00%     1.40%    0.47%   34.88%
ART           0.00%   0.00%    0.00%   100.00%    0.00%    0.00%
EQUAKE      100.00%   0.00%    0.00%     0.00%    0.00%    0.00%
FACEREC      83.33%   0.00%    0.00%     0.00%    0.00%   16.67%
AMMP         28.57%   0.00%    7.14%    14.29%    0.00%   50.00%
LUCAS        75.00%   0.00%    0.00%     0.00%   25.00%    0.00%
FMA3D        18.06%   0.00%    0.00%    53.55%    0.65%   27.74%
SIXTRACK    100.00%   0.00%    0.00%     0.00%    0.00%    0.00%
APSI         18.18%   0.00%    0.00%    18.18%    0.00%   63.64%
Average      58.64%   0.00%    0.71%    18.60%    2.55%   19.29%

This sub-classification identifies the percentage of all conditionally updated variables (taken as 100%) that have "Linear" bounds, "Polynomial" bounds, "Geometric" bounds, "OSV" bounds, "Mix" bounds, and "Unknown" bounds. Only the benchmarks that have at least one conditional loop-variant variable are listed in the table. For example, of the 7.48% conditionally updated variables in GZIP, 42.11% are sub-classified as conditional CR forms with linear lower and upper bounds; thus, 3.15% of the variables are conditionally updated variables with linear lower and upper bounds. From the results of Table 6.4, conditional variables with linear lower and upper bounds are the most common category. The average percentage of linear conditional variables is 55.99% in CINT2000 and 58.64% in CFP2000.

[Bar chart per CINT2000 benchmark: percentage (0–100%) of loop-variant variables recognized as linear IVs versus all IVs recognized with Inductive SSA.]

Figure 6.1: Coverage of loop-variant variable classification in SPEC CINT2000.

We also identify all re-initialized variables, the "OSV" category. The "Mix" conditional variables have an average percentage of 6.05% and 2.55% in CINT2000 and CFP2000, respectively, and were identified separately from OSV by our implementation using the bounding technique. This result shows that CR alignment is applied frequently for bounding the CR forms. Figure 6.1 and Figure 6.2 show the total coverage of loop-variant variable analysis with Inductive SSA for CINT2000 and CFP2000, respectively. For the CINT2000 benchmarks, beyond the linear IVs that most traditional IV detection methods are able to recognize, the extra percentage of loop-variant variables recognized with Inductive SSA ranges from 5.23% to 16.21%, with 10.34% on average. For the CFP2000 benchmarks, the percentage of loop-variant variables recognized with Inductive SSA beyond linear IVs ranges from 0% to 13.03%, with 4.00% on average. Clearly, the loop-

[Bar chart per CFP2000 benchmark: percentage (0–100%) of loop-variant variables recognized as linear IVs versus all IVs recognized with Inductive SSA.]

Figure 6.2: Coverage of loop-variant variable classification in SPEC CFP2000.

variant variable analysis with Inductive SSA has greater coverage than the traditional methods, which only recognize linear IVs, for all benchmarks.

6.4 Results for Inductive SSA-based Optimizations

Using the inductive proofs produced by Inductive SSA, we enhanced the GVN-PRE algorithm in GCC 4.1, a state-of-the-art hybrid GVN-PRE algorithm developed by VanDrunen and Hosking [31]. Table 6.5 shows the number of additional redundant expressions detected in the enhanced GVN-PRE phase of GCC 4.1 on SPEC2000. The first column in the table lists the benchmark; only the benchmarks with additional redundant expressions detected are listed. The column labeled "EQ" shows the number of additional equivalences detected in Inductive SSA for each benchmark.

Benchmark   EQ   LI
GAP         10    −
VORTEX       2    −
BZIP2        2    −
TWOLF        4    −
MGRID       38    −
APPLU        4    −
FACEREC     10    −
LUCAS        5   30
FMA3D       12    −
SIXTRACK     2    −
APSI        41    −

Table 6.5: Number of additional equivalences and loop-invariants detected by Inductive SSA enhancement of GVN-PRE for SPEC CPU2000 benchmarks using GCC 4.1 -O3. Note: − indicates no difference.

The column labeled "LI" shows the number of additional loop-invariant expressions detected in Inductive SSA for each benchmark. None of these are detected by GVN-PRE, the state-of-the-art PRE algorithm, in the GCC implementation. No additional constants in loops were detected in any benchmark. Note that mgrid and apsi have a large number of additional equivalences detected, 38 and 41 respectively. Also note that lucas has 30 additional loop-invariants detected, which were missed by LICM in GVN-PRE; these invariants are operations in which two IVs cancel out, resulting in a loop-invariant expression. Performance testing with Inductive SSA is useful to see whether the compiler can use Inductive SSA throughout the optimization phases effectively and without negative consequences. However, performance testing results only show the effect of optimizations in "hot spots" of frequently executed code. Therefore, not all of the additional congruences and loop-invariants detected are expected to increase the code speed; only a subset of these optimizations are expected to be visible in the performance results. Figure 6.3 shows the speedup of the SPEC2000 benchmarks optimized by GVN-PRE with and without Inductive SSA. Only the benchmarks for which the code differs from the code produced by the original GVN-PRE are included (benchmarks with EQ ≠ 0 in Table 6.5). The tests were conducted on a 2.8GHz Intel Pentium 4 running Redhat Linux

[Bar chart: speedup (0.90–1.20) per benchmark for GCC 4.1 GVN-PRE versus GCC 4.1 GVN-PRE with Inductive SSA.]

Figure 6.3: Relative speedup of code produced by GVN-PRE enhanced with Inductive SSA for SPEC2000 benchmarks using GCC 4.1 -O3.

9, using GCC 4.1 at the -O3 optimization level. From the figure, it can be concluded that Inductive SSA for GVN-PRE improves performance in 6 out of 11 benchmarks, with an overall 2% speedup for the CFP2000 benchmarks. A significant speedup of 17% is observed for mgrid, which has 38 additional congruences detected by Inductive SSA. Also, lucas shows improvement from the additional LICM. The speedups of the four CINT2000 benchmarks are flatter than those of CFP2000: the additional redundant equivalences detected were not in "hot parts" of the code that are frequently executed, and additional optimizations in the cold parts do not show up in the performance results. Another concern with GVN-PRE is that register pressure increases when the live ranges of additional temporary variables are long, and the resulting spills can lower performance. From the CINT2000 experiments we did not see a drop in performance due to register spills.

[Two bar charts per benchmark: (top) normalized compile time for GCC 4.1 -O3 versus GCC 4.1 -O3 with Inductive SSA; (bottom) normalized static code size for GCC 4.1 GVN-PRE versus GVN-PRE with Inductive SSA.]

Figure 6.4: Relative compile time and static code size of Inductive SSA with GVN-PRE for SPEC2000 benchmarks using GCC 4.1 -O3.

Figure 6.4 shows the compile times of GCC and the static code sizes of the SPEC2000 benchmarks optimized by GVN-PRE with and without Inductive SSA. The compile times and code sizes are normalized with respect to the compile time and code size of the original GCC 4.1 implementation. Due to the Inductive SSA construction and the inductive proof matching for expressions in the enhanced GVN-PRE algorithm, "GCC 4.1 -O3 with Inductive SSA" is more expensive than "GCC 4.1 -O3". The compile time of GCC 4.1 with the Inductive SSA integration increases on average by 7%, with a maximum of 15% observed for applu when tested with SPEC2000. No combinatorial explosion of the path-dependent proofs was observed; as discussed above, the intermediate and final proof choice set sizes did not exceed 5 terms. From Figure 6.4 it can be concluded that the static code size does not change significantly with the Inductive SSA enhancement of GVN-PRE; the variability of the code size is within 1% of the total size.

6.5 Inductive SSA Impact Analysis on GVN-PRE

We also tested the enhanced GVN-PRE with Inductive SSA using the optimization option "-O2" of GCC 4.1 and other optimization flags to determine the causes of the impact of Inductive SSA on performance in more detail. Compared with the optimization option "-O3" of GCC 4.1, "-O2" performs nearly all supported optimizations except some optimizations that may greatly increase the size of the generated code, such as function inlining. The number of additional redundant expressions and loop invariants detected in the enhanced GVN-PRE when using GCC 4.1 -O2 turns out to be the same as with GCC 4.1 -O3; see Table 6.5. Figure 6.5 shows the speedup of the SPEC2000 benchmarks optimized by GVN-PRE with and without Inductive SSA using GCC 4.1 at the -O2 optimization level. Only the benchmarks for which the code differs from the code produced by the original GVN-PRE are included (benchmarks with EQ ≠ 0 in Table 6.5). From the figure, it can be concluded that with optimization level "-O2", Inductive SSA for GVN-PRE improves performance in 6 out of 11 benchmarks, the same as for the "-O3" level. Again, a significant speedup of 17% is observed for mgrid. The speedup result

[Bar chart: speedup (0.90–1.20) per benchmark for GCC 4.1 GVN-PRE versus GCC 4.1 GVN-PRE with Inductive SSA at -O2.]

Figure 6.5: Relative speedup of Inductive SSA with GVN-PRE for SPEC2000 benchmarks using GCC 4.1 -O2.

for benchmark vortex is not shown because of a runtime error produced by GCC 4.1 due to a known incompatibility of GCC 4.1 with SPEC2000. Figure 6.6 shows the compile times of GCC and the static code sizes of the SPEC2000 benchmarks optimized by GVN-PRE with and without Inductive SSA when using optimization option "-O2". The compile times and the code sizes are normalized with respect to the compile time and the code size of the original GCC 4.1 implementation. With the additional implementation of Inductive SSA and optimizations, "GCC 4.1 -O2 with Inductive SSA" is more expensive than "GCC 4.1 -O2". The compile time of GCC 4.1 after integrating Inductive SSA increases on average by 6.7%, with a maximum of 15% observed for applu when tested with SPEC2000. Inductive SSA for GVN-PRE improves static code size in 6 out of 11 benchmarks; 3 benchmarks show a very small code size increase. A significant code size decrease of 10% is observed for mgrid, which shows that the static code size can also benefit greatly from the additional congruence and loop-invariant detection by Inductive SSA in GVN-PRE.

[Two bar charts per benchmark: (top) normalized compile time for GCC 4.1 -O2 versus GCC 4.1 -O2 with Inductive SSA; (bottom) normalized static code size for GCC 4.1 GVN-PRE versus GVN-PRE with Inductive SSA.]

Figure 6.6: Relative compile time and static code size of Inductive SSA with GVN-PRE for SPEC2000 benchmarks using GCC 4.1 -O2.

Benchmark   EQ   LI   CP
VPR          −    −    9
GAP         10   11    −
VORTEX       2    −    −
BZIP2        2    −    −
TWOLF        4    −    1
MGRID       38    −    −
APPLU        4    −    −
FACEREC     10    −    −
LUCAS        6   40    −
FMA3D       12    −    −
SIXTRACK     2    −    −
APSI        41    −    2

Table 6.6: Number of additional equivalences, loop-invariants, and constants detected by GVN-PRE enhanced with Inductive SSA for SPEC CPU2000 using GCC 4.1 -O2 -fno-gcse -fno-tree-ccp -fno-tree-loop-im -fno-move-loop-invariants -fno-ivopts. Note: − indicates no difference.

Table 6.6 shows the number of additional congruences, loop invariants and constants detected in the enhanced GVN-PRE phase of GCC 4.1 on SPEC2000 when using optimization level "-O2" with several extra optimizations disabled, namely global common sub-expression elimination, loop-invariant motion, constant propagation, and induction variable optimizations. Disabling these optimizations means that the analysis, detection, and optimization are completely left to GVN-PRE. Thus, by comparing the results we can isolate the impact of the Inductive SSA GVN-PRE improvement from other optimizations that perform similar transformations (and are currently not enhanced by Inductive SSA). The first column in the table lists the benchmark. The column labeled "EQ" shows the number of additional equivalences detected and PRE-optimized in Inductive SSA for each benchmark. The column labeled "LI" shows the number of additional loop-invariant expressions detected and LICM-optimized in Inductive SSA for each benchmark. The column labeled "CP" shows the number of additional constants detected and propagated by CP in Inductive SSA for each benchmark. None of these equivalences and invariants are

detected by GVN-PRE, the state-of-the-art PRE algorithm, in the GCC implementation. The additional constants and loop invariants detected are semantic constants arising from algebraic translations in the inductive proofs. Note that gap and lucas have 11 and 40 additional loop-invariants detected, respectively; that is 11 and 10 more, respectively, than with the full "-O2" level. Also note that additional constants are detected for vpr, twolf and apsi, due to the disabled constant propagation optimization of GCC 4.1. With GCSE and LICM disabled, GVN-PRE in GCC 4.1 becomes responsible for detecting the lexical redundancies those passes would have removed, which explains why the number of additional equivalences detected remains essentially the same as with the full "-O2" level. Since the additional equivalences detected are real semantic equivalences, the detection is less sensitive to syntactical code changes. Figure 6.7 shows the speedup of the SPEC2000 benchmarks optimized by GVN-PRE with and without Inductive SSA using GCC 4.1 at the -O2 optimization level, with GCSE, LICM, CCP and IV optimizations disabled. Only the benchmarks for which the code differs from the code produced by the original GVN-PRE are included (benchmarks with EQ ≠ 0, LI ≠ 0, or CP ≠ 0 in Table 6.6). With optimization level "-O2" and these specific optimizations disabled, Inductive SSA for GVN-PRE improves performance in 6 out of 11 benchmarks. A significant speedup of 8.3% is observed for mgrid. The speedup of mgrid is smaller than with the full "-O2" level because the execution time saved by the Inductive SSA-based GVN-PRE is a smaller fraction of the longer benchmark execution times, which are caused by the redundancy elimination and constant propagation opportunities on global variables, or on local variables outside the loop, that the disabled optimizations would otherwise have handled. The speedup result for benchmark vortex is not shown because of a known incompatibility of GCC 4.1 with SPEC2000. Figure 6.8 shows the compile times of GCC and the static code sizes of the SPEC2000 benchmarks optimized by GVN-PRE with and without Inductive SSA when using optimization option "-O2" with these optimizations disabled. Note that Inductive SSA for GVN-PRE improves or does not change the static code size in all benchmarks except sixtrack. Code size decreases of 3% and 2% are observed for mgrid and facerec.

[Bar chart: speedup (0.90–1.10) per benchmark for GCC 4.1 GVN-PRE versus GCC 4.1 GVN-PRE with Inductive SSA, with GCSE, LICM, CCP and IV optimizations disabled.]

Figure 6.7: Relative speedup of code optimized by GVN-PRE with Inductive SSA for SPEC2000 using GCC 4.1 -O2 -fno-gcse -fno-tree-ccp -fno-tree-loop-im -fno-move-loop- invariants -fno-ivopts.

[Two bar charts per benchmark: (top) normalized compile time for GCC 4.1 -O2 (with the listed optimizations disabled) versus the same configuration with Inductive SSA; (bottom) normalized static code size for GCC 4.1 GVN-PRE versus GVN-PRE with Inductive SSA.]

Figure 6.8: Relative compile time and static code size of code optimized by GVN-PRE with Inductive SSA for SPEC2000 benchmarks using GCC 4.1 -O2 -fno-gcse -fno-tree-ccp -fno-tree-loop-im -fno-move-loop-invariants -fno-ivopts.

CHAPTER 7

SIMD Vectorization of Chains of Recurrences

In this chapter, we present a library generator for fast execution of math functions over grid-based computational domains. The library generator uses a vectorization method based on an algebraic transformation of the math function into vector CR forms.

7.1 Introduction

Many computational tasks in numerical, visualization, and engineering applications require the repeated evaluation of functions over structured grids, such as plotting in a coordinate system, rendering of parametric objects in 2D and 3D, numerical grid generation, and signal processing. Thread-level parallelism is typically exploited to speed up the evaluation of closed-form functions across points in a grid. Another effective optimization that yields good speedups when applicable to structured grids or meshes is short-vector SIMD execution of arithmetic operations, e.g. "array arithmetic".

Virtually all modern general-purpose processor architectures feature instruction sets and instruction set extensions that support short-vector SIMD floating point and integer operations, such as MMX/SSE, 3DNow!, AltiVec, and Cell BE SPU SIMD instructions. Coding with these instruction sets is simplified by software, such as state-of-the-art compilers that automatically SIMD-vectorize computationally intensive loops by restructuring these loops for SIMD vectorization [9], often in combination with highly-optimized vector math library kernels [100]. At the hardware level, advances in algorithms for floating point arithmetic have led to significantly faster execution of math library functions in hardware for both scalar and SIMD instructions, such as trigonometric, square root, logarithmic, and exponential functions. For example, the Intel numerics family (8087, 80287, and 80387) uses the fast CORDIC

(COordinate Rotation DIgital Computer) methods [101, 102]. However, repeated execution of numerics instructions for each grid point over a structured grid is still costly, especially in loops over the grid points where the execution latency of the iterated floating point instructions cannot be hidden in the instruction pipeline. On an Intel Pentium 4 processor, for example, the execution latency of a numerics instruction typically approaches 200 cycles [103].

An effective technique to decrease operation counts and reduce instruction latencies of repeated function evaluations over regular grids is provided by the Chains of Recurrences (CR) formalism [89, 50, 47], which uses an algebraic approach to translate closed-form functions and math kernels into recurrence forms that, in a way, implements aggressive loop strength reduction [11]. Any floating point expression composed of math library functions and standard arithmetic can be transformed into a CR form and then optimized to achieve efficient execution. For example, if two functions f(x) and g(x) have CR forms, then f(x) ± g(x), f(x) ∗ g(x), and f(g(i)) have CR forms (with some obvious exceptions). CR forms of closed-form functions require fewer operations per grid point by reusing the value computed at a previous point to determine the function value at the next point. Unfortunately, however, this aspect of the CR formalism prohibits loop parallelization and vectorization, which require independent operations across the iteration space.

We present a library generator for fast execution of math functions over grid-based computational domains. The library generator uses an algebraic transformation of the math function into a vector representation of a CR form. More specifically, we developed a decoupling method for the CR algebra to translate math functions into Vector Chains of Recurrences (VCR) forms. The VCR coefficients are packed into short vector registers for efficient execution. This approach effectively reduces the instruction counts for short-vector SIMD execution of functions over grids. As a result, our method outperforms separately optimized scalar CR forms and SIMD-vectorized closed-form function evaluation. We validated the performance gains using several benchmark functions evaluated in single and double precision and compared the results to the Intel compiler's auto-vectorized code with the high-performance Short Vector Math Library (SVML). The results show a dramatic performance increase of our VCR method over SIMD/SVML auto-vectorization and scalar CRs, ranging from doubling the execution speed to running an order of magnitude faster. In addition, the results also show that the VCR code can significantly increase the Instruction

CR optimizations introduce potential roundoff errors that are propagated across the iteration space [56]. Error analysis is required to ensure that roundoff errors introduced in CR-based evaluation methods are bounded. To address error propagation, we combined our VCR method with an auto-tuning approach. The generated VCR code is run to measure its error properties and to select the decoupling factor and block size that give the best performance within the error bound. This ensures high performance while keeping the floating point error bounded. In the remainder of this chapter, the VCR notation, semantics, and algebraic construction are introduced. An auto-tuning approach for fast and safe floating point evaluation is described and implementation details are given. Performance results showing improved SIMD execution and ILP scheduling are presented, followed by a discussion of alternative approaches and related work.

7.2 Vector Chains of Recurrences

This section introduces the VCR formalism, which is a generalization of the scalar CR formalism [89, 50, 47]. The notation and semantics are presented and an algorithm to symbolically construct VCR forms of math functions is given.

7.2.1 VCR Notation and Semantics

The key idea of our VCR method is to translate a math function or floating point expression into a set of d decoupled CR forms, where each CR represents the value progression of the math function at one of the offsets 0, . . . , d−1 with stride d ≥ 1. The decoupling factor d increases the level of parallelism by a factor of d and also slows the propagation of roundoff errors by a factor of d.

Definition 10. A d-dimensional VCR form $\Phi_i^d$ of a function f(i) evaluated over i = 0, . . . , n−1 is denoted by

$$\Phi_i^d = \{\vec{\varphi}_0,\ \odot_1,\ \vec{\varphi}_1,\ \odot_2,\ \cdots,\ \odot_k,\ \vec{\varphi}_k\}_i,$$

where each $\odot_j$ is "+" or "∗" and the $\vec{\varphi}_m$ are vector coefficients
$$\vec{\varphi}_0 = \begin{pmatrix}\vec{\varphi}_0[0]\\ \vec{\varphi}_0[1]\\ \vdots\\ \vec{\varphi}_0[d-1]\end{pmatrix};\quad \vec{\varphi}_1 = \begin{pmatrix}\vec{\varphi}_1[0]\\ \vec{\varphi}_1[1]\\ \vdots\\ \vec{\varphi}_1[d-1]\end{pmatrix};\quad \cdots\quad \vec{\varphi}_k = \begin{pmatrix}\vec{\varphi}_k[0]\\ \vec{\varphi}_k[1]\\ \vdots\\ \vec{\varphi}_k[d-1]\end{pmatrix}.$$

Hence, a VCR $\Phi_i^d$ has (k+1) × d scalar values and k operations ($\odot$ = + or $\odot$ = ∗) on d-vectors.

Definition 11. The evaluation of a VCR form $\Phi_i^d$ of a function f(i) over i = 0, . . . , n−1 is defined by the loop

    vector[d] v0 = ϕ0
    vector[d] v1 = ϕ1
    ...
    vector[d] vk−1 = ϕk−1
    for i = 0 to n − 1 step d do
        y[i : i+d−1] = v0
        v0 = v0 ⊙1 v1
        v1 = v1 ⊙2 v2
        ...
        vk−1 = vk−1 ⊙k ϕk
    endfor

Note that the loop computes y[i] = f(i) for i = 0, . . . , n−1 in vectors of d values per iteration (assuming d divides n). A method to construct the VCR form of a function is given in the next section. Here, we give a simple example to illustrate VCR evaluation. For d = 2, the VCR form of f(i) = i! is

$$\Phi_i^2 = \{\vec{\varphi}_0, *, \vec{\varphi}_1, +, \vec{\varphi}_2, +, \vec{\varphi}_3\}_i = \left\{\begin{pmatrix}1\\1\end{pmatrix}, *, \begin{pmatrix}2\\6\end{pmatrix}, +, \begin{pmatrix}10\\14\end{pmatrix}, +, \begin{pmatrix}8\\8\end{pmatrix}\right\}_i.$$

Using Definition 11 we obtain the sequence

    i:        0    2    4     6     ...
    y[i]:     1    2    24    720   ...
    y[i+1]:   1    6    120   5040  ...

where pairs of values are computed in each iteration. In general, for any d ≥ 2 a speedup is obtained with a vector-based VCR over scalar CR forms for functions evaluated over structured grids and meshes.
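To make the evaluation loop of Definition 11 concrete, the following minimal C sketch evaluates the d = 2 VCR form of f(i) = i! shown above. The variable names, array size, and output are illustrative choices for this sketch, not part of the formal notation; the point is the mixed "*, +, +" update chain per vector lane.

    #include <stdio.h>

    /* Evaluate the d = 2 VCR form of f(i) = i! from the example above
       (assumes d divides n). The operators between coefficients are *, +, +. */
    int main(void) {
        enum { d = 2, n = 8 };
        double v0[d] = {1, 1};          /* phi_0 */
        double v1[d] = {2, 6};          /* phi_1 */
        double v2[d] = {10, 14};        /* phi_2 */
        const double phi3[d] = {8, 8};  /* phi_3 */
        double y[n];
        for (int i = 0; i < n; i += d) {
            for (int l = 0; l < d; l++) {
                y[i + l] = v0[l];       /* emit d function values     */
                v0[l] *= v1[l];         /* v0 = v0 * v1               */
                v1[l] += v2[l];         /* v1 = v1 + v2               */
                v2[l] += phi3[l];       /* v2 = v2 + phi_3            */
            }
        }
        for (int i = 0; i < n; i++)
            printf("y[%d] = %g\n", i, y[i]);  /* 1 1 2 6 24 120 720 5040 */
        return 0;
    }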

7.2.2 VCR Construction

We present an algorithm to construct a VCR form of a function for any d ≥ 1. For d = 1 the VCR degenerates to the scalar CR form published in [89, 50, 47].

Given a function f(i) represented as a closed-form expression in symbolic form, where f(i) is to be evaluated over n grid points i = 0, . . . , n−1, the VCR form $\Phi_i^d$ for d ≥ 1 is constructed in two steps:

Step 1. Construct the symbolic parametric scalar CR form $\Phi_{i'}(j)$ of f(d i' + j)

by substituting index i in f(i) by $\{j, +, d\}_{i'}$, followed by the application of the CR algebra to simplify this term:

$$f(i) = f(\{j, +, d\}_{i'}) \;\stackrel{\mathrm{CR}}{\Longrightarrow}\; \Phi_{i'}(j),$$

which gives the parametric scalar CR form

$$\Phi_{i'}(j) = \{\varphi_0(j),\ \odot_1,\ \varphi_1(j),\ \odot_2,\ \cdots,\ \odot_k,\ \varphi_k(j)\}_{i'}.$$

Step 2. Construct the symbolic parametric VCR form

$$\Phi_i^d(j) = \{\vec{\varphi}_0(j),\ \odot_1,\ \vec{\varphi}_1(j),\ \odot_2,\ \cdots,\ \odot_k,\ \vec{\varphi}_k(j)\}_i,$$

with coefficients defined by
$$\vec{\varphi}_0(j) = \begin{pmatrix}\varphi_0(j)\\ \varphi_0(j+1)\\ \vdots\\ \varphi_0(j+d-1)\end{pmatrix};\quad \cdots\quad \vec{\varphi}_k(j) = \begin{pmatrix}\varphi_k(j)\\ \varphi_k(j+1)\\ \vdots\\ \varphi_k(j+d-1)\end{pmatrix},$$

where the coefficients ϕ(j) were constructed in Step 1.

We prove the correctness of the algorithm.

Theorem 5. Let $\Phi_i^d(j)$, d ≥ 1, be the VCR form constructed by the two-step algorithm above for a given function f(i) defined over n points i = 0, . . . , n−1. Then, the loop template defined in Definition 11 for $\Phi_i^d(j)$ with j = 0 computes y[i] = f(i) exactly for all i = 0, . . . , n−1, assuming exact arithmetic, i.e. in the absence of floating point roundoff.

Proof. From Step 1 we have f(i) = f(d i' + j) with i' = ⌊i/d⌋ and j = i mod d for i = 0, . . . , n−1. Let $\Phi_{i'}(j)$ denote the scalar CR form of f(d i' + j). Then for each j = 0, . . . , d−1 the CR forms $\Phi_{i'}(j)$ of f(d i' + j) are evaluated for i' = 0, . . . , n/d − 1 to compute f(i) = f(d i' + j) = y[d i' + j] with the CR loop template [50] (the scalar form of Definition 11):

    for j = 0 to d − 1 do
        x0(j) = ϕ0(j)
        ...
        xk−1(j) = ϕk−1(j)
        for i' = 0 to n/d − 1 do
            y[d∗i' + j] = x0(j)
            x0(j) = x0(j) ⊙1 x1(j)
            ...
            xk−1(j) = xk−1(j) ⊙k ϕk(j)
        endfor
    endfor

Because the j-loop iterations are independent, we can reorder the loops and use array assignment notation to obtain

    x0(0 : d−1) = ϕ0(0 : d−1)
    ...
    xk−1(0 : d−1) = ϕk−1(0 : d−1)
    for i' = 0 to n/d − 1 do
        y[d∗i' : d∗i'+d−1] = x0(0 : d−1)
        x0(0 : d−1) = x0(0 : d−1) ⊙1 x1(0 : d−1)
        ...
        xk−1(0 : d−1) = xk−1(0 : d−1) ⊙k ϕk(0 : d−1)
    endfor

Taking i = d ∗ i', this loop is equivalent to the loop in Definition 11 for $\Phi_i^d$ of f(i). □

In Step 1 of the algorithm, the CR algebra described in [89, 50, 47] is used to simplify the parametric scalar CR form of f. A comprehensive list of CR algebra rules can be found in [17]. In certain cases, the exhaustive application of CR rules does not simplify to a single CR form but results in CR-expressions [89], which contain multiple CR forms (see Example 3 at the end of this section). These CR forms can be evaluated in separate loops or in fused loops. Parameter j in the symbolic form $\Phi_i^d(j)$ is also used to generate blocked versions of the VCR loops (see Section 7.2.3).

VCR construction is applicable to factorials, sums, products, polynomials, exponentials, transcendentals, and compositions. We conclude this section with some examples.

Example 1. (From the example shown in Section 7.2.1.) Consider f(i) = i! for i = 0, . . . , n−1. Let d = 2. Then, f(i) = f(2i' + j) = f({j, +, 2}_{i'}) = {j, +, 2}_{i'}!. Applying the CR algebra, {j, +, 2}_{i'}! ⟹ {j!, ∗, j²+3j+2, +, 4j+10, +, 8}_{i'}. The VCR $\Phi_i^2$ is constructed from the coefficients ϕ0(j) = j!, ϕ1(j) = j²+3j+2, ϕ2(j) = 4j+10, and ϕ3(j) = 8, giving
$$\vec{\varphi}_0 = \begin{pmatrix} j! \\ (j+1)! \end{pmatrix};\quad \vec{\varphi}_1 = \begin{pmatrix} j^2+3j+2 \\ j^2+5j+6 \end{pmatrix};\quad \vec{\varphi}_2 = \begin{pmatrix} 4j+10 \\ 4j+14 \end{pmatrix};\quad \vec{\varphi}_3 = \begin{pmatrix} 8 \\ 8 \end{pmatrix},$$
and we set j = 0 (non-blocked loops), which simplifies to
$$\Phi_i^2 = \left\{\begin{pmatrix}1\\1\end{pmatrix}, *, \begin{pmatrix}2\\6\end{pmatrix}, +, \begin{pmatrix}10\\14\end{pmatrix}, +, \begin{pmatrix}8\\8\end{pmatrix}\right\}_i.$$
The value sequence of $\Phi_i^2$ is shown in Section 7.2.1.

Example 2. Consider f(i) = r i² for i = 0, . . . , n−1. Let d = 4. Then, f(i) = f(4i' + j) = f({j, +, 4}_{i'}) = r{j, +, 4}_{i'}² and

$$r\{j, +, 4\}_{i'}^2 \;\stackrel{\mathrm{CR}}{\Longrightarrow}\; r\{j, +, 4\}_{i'}\{j, +, 4\}_{i'} \;\stackrel{\mathrm{CR}}{\Longrightarrow}\; r\{j^2, +, 8\{j, +, 4\}_{i'} + 16\}_{i'}$$

$$\stackrel{\mathrm{CR}}{\Longrightarrow}\; \{rj^2, +, 8rj+16r, +, 32r\}_{i'}.$$

The VCR $\Phi_i^4$ is constructed from the symbolic coefficients ϕ0(j) = rj², ϕ1(j) = 8rj+16r, and ϕ2(j) = 32r, giving
$$\vec{\varphi}_0(j) = r\begin{pmatrix} j^2 \\ j^2+2j+1 \\ j^2+4j+4 \\ j^2+6j+9 \end{pmatrix};\quad \vec{\varphi}_1(j) = r\begin{pmatrix} 8j+16 \\ 8j+24 \\ 8j+32 \\ 8j+40 \end{pmatrix};\quad \vec{\varphi}_2(j) = r\begin{pmatrix} 32 \\ 32 \\ 32 \\ 32 \end{pmatrix},$$
and we set j = 0 to obtain the $\Phi_i^4$ coefficients
$$\Phi_i^4 = \left\{\begin{pmatrix} 0 \\ r \\ 4r \\ 9r \end{pmatrix}, +, \begin{pmatrix} 16r \\ 24r \\ 32r \\ 40r \end{pmatrix}, +, \begin{pmatrix} 32r \\ 32r \\ 32r \\ 32r \end{pmatrix}\right\}_i.$$
By Definition 11 we obtain the sequence

    i:        0     4     8      12    ...
    y[i]:     0     16r   64r    144r  ...
    y[i+1]:   r     25r   81r    169r  ...
    y[i+2]:   4r    36r   100r   196r  ...
    y[i+3]:   9r    49r   121r   225r  ...
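The following minimal C sketch checks Example 2 numerically: it packs the parametric coefficients ϕ0(j), ϕ1(j), ϕ2(j) for j = 0, . . . , d−1 (Step 2) and runs the evaluation loop of Definition 11. The value of r, the grid size, and the names are illustrative assumptions of this sketch.

    #include <stdio.h>

    /* Check of Example 2: VCR evaluation of f(i) = r*i*i with d = 4. */
    int main(void) {
        enum { d = 4, n = 16 };
        const double r = 0.5;               /* arbitrary scale factor */
        double v0[d], v1[d], v2[d], y[n];
        for (int j = 0; j < d; j++) {       /* Step 2: pack phi_m(j)  */
            v0[j] = r * j * j;              /* phi0(j) = r*j^2        */
            v1[j] = r * (8 * j + 16);       /* phi1(j) = 8rj + 16r    */
            v2[j] = r * 32;                 /* phi2(j) = 32r          */
        }
        for (int i = 0; i < n; i += d) {    /* loop of Definition 11  */
            for (int l = 0; l < d; l++) {
                y[i + l] = v0[l];
                v0[l] += v1[l];
                v1[l] += v2[l];
            }
        }
        for (int i = 0; i < n; i++)
            printf("y[%2d] = %6g  (closed form: %6g)\n", i, y[i], r * (double)i * i);
        return 0;
    }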

Example 3. Consider f(i) = sin(h i) for i = 0, . . . , n−1. For d ≥ 1,
f(i) = f(d i' + j) = f({j, +, d}_{i'}) = sin(h{j, +, d}_{i'})

$$\stackrel{\mathrm{CR}}{\Longrightarrow}\; \Re\!\left(\{\tfrac{1}{2}\sin(hj) - \tfrac{1}{2}\cos(hj)I,\ *,\ \cos(dh) + \sin(dh)I\}_{i'}\right) + \Re\!\left(\{\tfrac{1}{2}\sin(hj) + \tfrac{1}{2}\cos(hj)I,\ *,\ \cos(dh) - \sin(dh)I\}_{i'}\right),$$
where I is the imaginary unit and ℜ(z) is the real part of a complex number z. Note that the CR forms represent exponential sequences in the complex domain. Let d = 2, then

$$\Phi_i^2 = \Re\!\left(\left\{\begin{pmatrix}-\tfrac{1}{2}I\\ \gamma\end{pmatrix}, *, \begin{pmatrix}\alpha+\beta\\ \alpha+\beta\end{pmatrix}\right\}_i\right) + \Re\!\left(\left\{\begin{pmatrix}\tfrac{1}{2}I\\ \delta\end{pmatrix}, *, \begin{pmatrix}\alpha-\beta\\ \alpha-\beta\end{pmatrix}\right\}_i\right)$$

with α = cos(2h), β = sin(2h)I, γ = ½ sin h − ½ I cos h, and δ = ½ sin h + ½ I cos h. This VCR requires only one vector addition and two complex multiplications per two grid points.

7.2.3 VCR Loop Blocking

Zima et al. [104] analyzed the error characteristics of real- and complex-valued CR forms by considering two primary CR categories, “pure-sum CR” and “pure-product CR”. The results indicate that the error of pure-sum CR forms (polynomials) increases with increasing recurrence iteration distance, but is constrained and independent of the actual function value at each point. Thus, evaluation of polynomials with CR forms is numerically stable. The error in exponential CR forms is constrained, but depends both on the iteration distance and on the function values. Evaluations of these and mixed CR forms using (complex) floating-point arithmetic may not be numerically stable. Therefore, blocking of VCR loops is essential. It forces re-initialization of the VCR sequence to stop error propagation. Figure 7.1 shows our blocked VCR loop template. The block parameter b is used to bound the recurrence iteration length. After b recurrence steps in the inner i-loop, the recurrences are re-initialized in the j-loop. The remainder of the code handles the case when n is not a multiple of b ∗ d. Theoretically, the block size b should be an inverse function of the growth of the relative error. That is, a small growth in relative errors generally yields a larger block size. Because of the unpredictability of the error propagation in non-polynomial CR forms, the block size b must be determined by empirical evaluation.

    for j = 0 to n − (n mod (b ∗ d)) − 1 step (b ∗ d) do
        v0 = ϕ0(j)
        v1 = ϕ1(j)
        ...
        vk−1 = ϕk−1(j)
        for i = j to j + (b ∗ d) − 1 step d do
            y[i : i+d−1] = v0
            v0 = v0 ⊙1 v1
            v1 = v1 ⊙2 v2
            ...
            vk−1 = vk−1 ⊙k ϕk(j)
        endfor
    endfor
    j = ⌊n/(b ∗ d)⌋ ∗ b ∗ d
    v0 = ϕ0(j)
    v1 = ϕ1(j)
    ...
    vk−1 = ϕk−1(j)
    for i = j to n − (n mod d) − 1 step d do
        y[i : i+d−1] = v0
        v0 = v0 ⊙1 v1
        v1 = v1 ⊙2 v2
        ...
        vk−1 = vk−1 ⊙k ϕk(j)
    endfor
    for i = n − (n mod d) to n − 1 do
        y[i] = v0[i mod d]
    endfor

Figure 7.1: Blocked VCR loop template.
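As a concrete illustration of the blocked template, the following C sketch evaluates f(i) = i² with d = 2, using the parametric coefficients ϕ0(j) = j², ϕ1(j) = 4j+4, ϕ2(j) = 8 obtained by the construction of Section 7.2.2. The function choice, block size, and remainder handling (here via the closed form rather than the template's epilogue loops) are simplifying assumptions of this sketch.

    #include <stdio.h>

    #define D 2                                    /* decoupling factor (assumed) */

    /* Blocked VCR evaluation of f(i) = i*i, following the shape of Figure 7.1:
       re-initializing the recurrence every b*D points stops error propagation. */
    static void eval_blocked(double *y, int n, int b) {
        double v0[D], v1[D], v2[D];
        int j, i, l;
        for (j = 0; j + b * D <= n; j += b * D) {
            for (l = 0; l < D; l++) {              /* re-initialize from phi_m(j) */
                v0[l] = (double)(j + l) * (j + l); /* phi0(j+l) = (j+l)^2   */
                v1[l] = 4.0 * (j + l) + 4.0;       /* phi1(j+l) = 4(j+l)+4  */
                v2[l] = 8.0;                       /* phi2     = 8          */
            }
            for (i = j; i < j + b * D; i += D) {
                for (l = 0; l < D; l++) {
                    y[i + l] = v0[l];
                    v0[l] += v1[l];
                    v1[l] += v2[l];
                }
            }
        }
        for (i = j; i < n; i++)                    /* remainder (simplified) */
            y[i] = (double)i * i;
    }

    int main(void) {
        double y[23];
        eval_blocked(y, 23, 4);                    /* n = 23, block size b = 4 */
        for (int i = 0; i < 23; i++)
            printf("y[%d] = %g\n", i, y[i]);
        return 0;
    }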

7.2.4 VCR Auto-Tuning

We use an auto-tuning approach for the generated code to optimize the decoupling factor d for speed and the block size b for floating-point accuracy. A set of parametric loops is generated for d = 1, 2, 4, 8 to find the best speedup and the largest b. Small values of b negatively impact performance, while b cannot be so large that the error threshold is exceeded. Increasing d yields better SIMD vector utilization. Increasing d beyond 8, however, typically results in performance degradation due to the increased vector register pressure needed to hold the VCR coefficients in vector registers, possibly leading to register spill.

    α(n)
    Function   Single Precision   Double Precision
    Poly3      3.8·10^-9 n        8.2·10^-18 n
    Spline     5.0·10^-9 n^2      6.0·10^-18 n^2

Table 7.1: Interpolated maximum relative errors.

Increasing d also increases the level of ILP in modulo scheduling [55], which may result in faster kernels for non-SIMD superscalar processors. The reduced number of anti and flow dependences between the VCR form updates, compared to scalar CR forms, allows the scheduler to schedule instructions more effectively. Here we assume the kernel is small, i.e. a loop that executes a few math functions over a grid. The basic approach to find the optimal d and b for ILP is the same as for SIMD auto-tuning.

More specifically, given a maximum relative error threshold ε_max, the goal is to find a block size b such that the VCR evaluation error is bounded by ε_b ≤ ε_max. Because performance typically increases with larger b, results are computed faster when the error tolerance is higher. Thus, b should be maximized while ε_b ≤ ε_max. The auto-tuning phase determines b as follows. A blocked VCR profile loop is generated for each value of d = 1, 2, 4, 8, a starting value of b (b = 10^6, which is sufficiently large), and a given maximum n to compute

$$\varepsilon_b = \max_{i=0,\ldots,n-1} \left|\frac{f(i) - \Phi_i^d}{f(i)}\right|.$$

To refine b, this process repeats for exponentially diminishing values of b until ε_b ≤ ε_max. The best-performing combination of d and b is then selected. For pure-sum VCR forms, which are polynomials and combinations of polynomials such as spline calculations, the error analysis is simpler. The error of the scalar CR form (with d = 1) is computed for increasing n, tabulated, and interpolated to determine the error bound α(n). The error for a choice of b and d can then be determined from ε_b = α(b).

Because we require ε_b ≤ ε_max, the optimal block size b for a given d can be determined from b = α^{-1}(ε_max). Consider for example the α(n) values of the third-order polynomial Poly3 and the bicubic Spline function shown in Table 7.1.

[Figure 7.2 shows (a) the scalar CR loop for a cubic polynomial and (b) the same loop transformed with SSE shuffle-based CR vectorization, using a __m128 register crv for the CR coefficients, a temporary tmp, and the shuffle mask _MM_SHUFFLE(0, 3, 2, 1).]

Figure 7.2: Scalar CR loop and loop transformed with SSE Shuffle-based CR vectorization.
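For reference, a minimal sketch of a scalar CR loop of the kind shown in Figure 7.2(a) is given below; the coefficient names are illustrative. It performs three floating point additions per grid point (the basis of the standardized flop count in Section 7.3.1) and carries the cross-iteration flow dependences discussed in Section 7.3.5, which prevent SIMD vectorization of the i-loop.

    /* Sketch of a scalar CR loop for a cubic polynomial (cf. Figure 7.2(a)). */
    static void poly3_scalar_cr(float *x, int n,
                                float cr0, float cr1, float cr2, float cr3) {
        for (int i = 0; i < n; i++) {
            x[i] = cr0;        /* emit current value                     */
            cr0 += cr1;        /* each update feeds the next iteration:  */
            cr1 += cr2;        /* cross-iteration flow dependences       */
            cr2 += cr3;
        }
    }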

Suppose that four digits of precision are required, i.e. ε_max ≤ 10^-5. Then b = 2.6·10^3 for Poly3 and b = 45 for Spline using single-precision floating point, and b = 1.2·10^12 for Poly3 and b = 1.3·10^6 for Spline using double precision.
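A minimal sketch of the profile-based block-size search described above is given below. The helper functions vcr_eval_blocked and f_exact stand in for the generated blocked VCR loop and the closed-form reference; they, the threshold handling, and the termination at b = 1 are assumptions of this sketch, not the library generator's actual implementation.

    #include <math.h>
    #include <stdlib.h>

    extern void vcr_eval_blocked(double *y, int n, int d, long b);  /* assumed */
    extern double f_exact(int i);                                   /* assumed */

    /* Shrink the block size b until the maximum relative error of the blocked
       VCR evaluation stays under eps_max. */
    long tune_block_size(int n, int d, double eps_max) {
        double *y = malloc((size_t)n * sizeof *y);
        long b = 1000000L;                       /* start sufficiently large */
        for (;;) {
            vcr_eval_blocked(y, n, d, b);
            double eps = 0.0;
            for (int i = 0; i < n; i++) {        /* eps_b = max |f(i)-y[i]|/|f(i)| */
                double fi = f_exact(i);
                double rel = fi != 0.0 ? fabs((fi - y[i]) / fi) : fabs(y[i]);
                if (rel > eps) eps = rel;
            }
            if (eps <= eps_max || b == 1) break; /* accept b, or give up at b = 1 */
            b /= 10;                             /* exponentially diminishing b   */
        }
        free(y);
        return b;
    }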

7.2.5 Shuffle-Based SIMD Vectorization

We compare the shuffle-based CR vectorization method suggested in [104] to our VCR approach. When all k operators in a CR form of length k are identical, i.e. all (+) or all (∗), this regularity can be exploited with a vector operation of length k + 1. Figure 7.2(b) shows the SSE code for a cubic polynomial (CR length k = 3). A copy of the vector crv = [cr0, cr1, cr2, cr3] is made into vector tmp using _mm_sub_ss such that tmp[0] = 0, then tmp is rotated to tmp = [cr1, cr2, cr3, 0] and added to crv. The shuffle-based method is limited to SIMD vectorization of polynomial and exponential CR forms, not mixed forms, e.g. factorials. In addition, the method is only optimal when the length of the CR is a multiple of the SIMD register size. This method is not considered further until Section 7.3.
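The update described above has the following shape; this is a small self-contained sketch of one shuffle-based CR step, written to mirror the pattern that reappears in Figures 7.9 and 7.10 rather than to reproduce Figure 7.2(b) verbatim.

    #include <xmmintrin.h>

    /* One shuffle-based CR update step for a cubic polynomial. crv holds
       [cr0, cr1, cr2, cr3]; after the step it holds
       [cr0+cr1, cr1+cr2, cr2+cr3, cr3]. */
    static inline __m128 scr_update(__m128 crv) {
        __m128 tmp = _mm_sub_ss(crv, crv);                       /* [0, cr1, cr2, cr3]  */
        tmp = _mm_shuffle_ps(tmp, tmp, _MM_SHUFFLE(0, 3, 2, 1)); /* [cr1, cr2, cr3, 0]  */
        return _mm_add_ps(crv, tmp);                             /* element-wise update */
    }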

7.2.6 Implementation Details

We implemented the function translation and auto-tuned VCR code generation techniques with the help of the Ctadel system [105] (implemented in SWI-Prolog [106]), which has an algebraic optimizer, a term rewriting system, and a code generator. The CR rules [17] were implemented along with an SSE intrinsics code generator for C.

    cr0r = 0.0;     cr0i = -0.5;
    cr1r = 0.0;     cr1i = 0.5;
    cr2r = cos(h);  cr2i = sin(h);
    cr3r = cr2r;    cr3i = -cr2i;
    for (i = 0; i < n; i++) { ... }

Figure 7.3: Scalar CR code of Sine benchmark.
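The loop body of Figure 7.3 is elided above. Based on the complex CR form of Example 3 with d = 1, each iteration stores the sum of the real parts of the two complex sequences and advances them by complex multiplication. A self-contained sketch of such a loop is given below; the body is an inferred completion for illustration, not a verbatim copy of the generated code.

    #include <math.h>

    /* Sketch of the scalar CR loop of Figure 7.3: y[i] = sin(h*i) as the sum of
       the real parts of two complex geometric sequences (Example 3, d = 1). */
    static void sine_scalar_cr(float *y, int n, float h) {
        float cr0r = 0.0f, cr0i = -0.5f;         /* z0 = -I/2            */
        float cr1r = 0.0f, cr1i =  0.5f;         /* z1 = +I/2            */
        float cr2r = cosf(h), cr2i = sinf(h);    /* ratio of z0: e^{Ih}  */
        float cr3r = cr2r,    cr3i = -cr2i;      /* ratio of z1: e^{-Ih} */
        for (int i = 0; i < n; i++) {
            y[i] = cr0r + cr1r;                  /* Re(z0) + Re(z1) = sin(h*i) */
            float t;
            t    = cr0r * cr2r - cr0i * cr2i;    /* z0 *= (cr2r + I*cr2i)      */
            cr0i = cr0r * cr2i + cr0i * cr2r;
            cr0r = t;
            t    = cr1r * cr3r - cr1i * cr3i;    /* z1 *= (cr3r + I*cr3i)      */
            cr1i = cr1r * cr3i + cr1i * cr3r;
            cr1r = t;
        }
    }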

                               i = 0   i = 1   ...
    f0 = sin(h·4i)       ⇒     y[0]    y[4]    ...
    f1 = sin(h·(4i+1))   ⇒     y[1]    y[5]    ...
    f2 = sin(h·(4i+2))   ⇒     y[2]    y[6]    ...
    f3 = sin(h·(4i+3))   ⇒     y[3]    y[7]    ...
    (one 4-wide vector of values per time step)

Figure 7.4: VCR vectorized Sine sequences for decoupling factor d = 4.

The symbolic form of a math expression containing library function calls to be evaluated over variable i is entered, together with a constant decoupling factor d ≥ 1. Using Ctadel we generate two loop codes: one for profile-based auto-tuning, which also includes the original closed-form function to compute the maximum relative error, and the optimized VCR blocked loop with SSE intrinsics. The final code is easily integrated into the user's code as a precomputation.

7.2.7 An Example

Consider f(i) = sin(h i) from Example 3 of Section 7.2.2, evaluated over a grid i = 0, . . . , n−1 in single-precision floating point. The scalar CR loop is shown in Figure 7.3. As discussed, the standard CR loop [89, 47] is not vectorizable due to cross-iteration dependences. For d = 4, the generated VCR form for sin(h i) represents four parallel running sequences, as illustrated in Figure 7.4.

    register __m128 crv0r, crv1r, crv2r, crv3r;
    register __m128 crv0i, crv1i, crv2i, crv3i;
    register __m128 crv0rt, crv1rt, vt1, vt2, *yv;
    ...
    ch = cos(h); sh = sin(h);
    c2h = 2*ch*ch - 1; s2h = 2*sh*ch;
    c4h = 2*c2h*c2h - 1; s4h = 2*s2h*c2h;
    c_4h = c4h; s_4h = -s4h;
    crv0r = _mm_set_ps(sh*ch*ch+0.5*sh*c2h, sh*ch, 0.5*sh, 0.0);
    crv1r = _mm_set_ps(sh*ch*ch+0.5*sh*c2h, sh*ch, 0.5*sh, 0.0);
    crv2r = _mm_set_ps(c4h, c4h, c4h, c4h);
    crv3r = _mm_set_ps(c_4h, c_4h, c_4h, c_4h);
    crv0i = _mm_set_ps(-0.5*ch*c2h+sh*ch*sh, -0.5*c2h, -0.5*ch, -0.5);
    crv1i = _mm_set_ps(0.5*ch*c2h-sh*ch*sh, 0.5*c2h, 0.5*ch, 0.5);
    crv2i = _mm_set_ps(s4h, s4h, s4h, s4h);
    crv3i = _mm_set_ps(s_4h, s_4h, s_4h, s_4h);
    for (i = 0; i < n; i += 4) { ... }

Figure 7.5: SSE VCR code for Sine benchmark for decoupling factor d = 4 (unblocked).
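The loop body of Figure 7.5 is elided above. Per the VCR form of Example 3, each iteration stores the sum of the real parts and advances the two complex vector recurrences by element-wise complex multiplication with (crv2r, crv2i) and (crv3r, crv3i). A sketch of what such a body could look like is shown below; it assumes the declarations and initializations of Figure 7.5 and a 16-byte aligned output array y, and it is an illustrative assumption rather than the Ctadel-generated code.

    /* Possible SSE loop body for Figure 7.5 (illustrative). */
    for (i = 0; i < n; i += 4) {
        _mm_store_ps(&y[i], _mm_add_ps(crv0r, crv1r));   /* y[i..i+3] = Re(z0)+Re(z1) */
        crv0rt = _mm_sub_ps(_mm_mul_ps(crv0r, crv2r),    /* Re(z0 * w0)               */
                            _mm_mul_ps(crv0i, crv2i));
        crv0i  = _mm_add_ps(_mm_mul_ps(crv0r, crv2i),    /* Im(z0 * w0)               */
                            _mm_mul_ps(crv0i, crv2r));
        crv0r  = crv0rt;
        crv1rt = _mm_sub_ps(_mm_mul_ps(crv1r, crv3r),    /* Re(z1 * w1)               */
                            _mm_mul_ps(crv1i, crv3i));
        crv1i  = _mm_add_ps(_mm_mul_ps(crv1r, crv3i),    /* Im(z1 * w1)               */
                            _mm_mul_ps(crv1i, crv3r));
        crv1r  = crv1rt;
    }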

Figure 7.5 shows the corresponding 4-wide vectorized VCR code for sin(h i) with Intel SSE instructions (loop blocking and epilogue code are elided for clarity). The coefficients of the VCR form are stored in 128-bit vector registers. As expected, four values are computed and stored in array yv per iteration. As can be seen in the code in Figure 7.5, additional optimizations have been applied to reduce the evaluation of trigonometric functions. Simplification rules are integrated in Ctadel to simplify expressions and remove redundant computations.

                               i = 0   i = 1   ...
    f0 = sin(h·8i)       ⇒     y[0]    y[8]    ...   ⎫
    f1 = sin(h·(8i+1))   ⇒     y[1]    y[9]    ...   ⎪ first 4-wide vector
    f2 = sin(h·(8i+2))   ⇒     y[2]    y[10]   ...   ⎪
    f3 = sin(h·(8i+3))   ⇒     y[3]    y[11]   ...   ⎭
    f4 = sin(h·(8i+4))   ⇒     y[4]    y[12]   ...   ⎫
    f5 = sin(h·(8i+5))   ⇒     y[5]    y[13]   ...   ⎪ second 4-wide vector
    f6 = sin(h·(8i+6))   ⇒     y[6]    y[14]   ...   ⎪
    f7 = sin(h·(8i+7))   ⇒     y[7]    y[15]   ...   ⎭

Figure 7.6: VCR vectorized Sine sequences for decoupling factor d = 8.

For example, the following four formulas for the sine and cosine functions are adopted in Ctadel as rewrite rules to facilitate the transformation to VCR forms:

    cos(−h) = cos(h)
    sin(−h) = −sin(h)
    cos(2h) = 2 cos(h) cos(h) − 1
    sin(2h) = 2 sin(h) cos(h)

Figure 7.6 illustrates the increased level of parallelism with d = 8 to evaluate sin(h i) over i = 0, . . . , n−1. This 8-way, 4-wide VCR code uses twice as many vector registers as the 4-way, 4-wide implementation. While the performance of the code is typically better due to improved scheduling with higher ILP, the extra register pressure could lead to spill, which degrades performance. The performance results and floating point accuracy of this example application and other benchmark math functions are presented in the next section and compared to the performance of scalar CR forms, shuffle-based vector CR forms, and the Intel compiler's advanced auto-vectorization.

7.3 Results

In this section we present the performance results of our VCR technique applied to benchmark functions that represent common classes of math operations.

    Benchmark   Description
    Poly3       Cubic polynomial
    Spline      Bicubic spline
    Bin15       Binomial coefficient (i choose 15)
    Sine        Sine function sin(h i)
    Sinh        Hyperbolic sine sinh(h i)
    Exp         Exponential exp(h i(i + 1)/2)

Table 7.2: Benchmark functions used in performance testing.

    Class       Description
    Scalar      Original closed-form w/o vectorization
    SVML        Original closed-form vectorized w/ SVML
    Scalar CR   Serial scalar CR form of a function
    VCR4.4      4-way 4-wide VCR vectorization
    VCR8.4      8-way 4-wide VCR vectorization
    VCR2.2      2-way 2-wide VCR vectorization
    VCR4.2      4-way 2-wide VCR vectorization
    VCR4.1      Serial CR decoupled 4 times (= 4 CR forms)
    sCR1.4      1-way 4-wide shuffle-based CR vectorization
    sCR4.4      4-way 4-wide shuffle-based CR vectorization

Table 7.3: Overview of VCR optimizations applied to benchmarks.

7.3.1 Math Function Benchmark Codes

The benchmark functions listed in Table 7.2 are used in the performance experiments. These benchmark functions were selected as representative examples of classes of common math operations found in numerical applications. There are two main classes: polynomial VCR forms (Poly3, Spline, and Bin15, which is a 15th-order polynomial) and exponential VCR forms (Sine, Sinh, and Exp). The functions are evaluated over the interval i = 0, . . . , n−1 for n ∈ [1, . . . , 10^6]. Table 7.3 classifies the benchmark codes according to the type of optimization applied in the performance comparisons. The "Scalar" codes run the original closed-form benchmark function. The "SVML" code represents the original closed-form function auto-vectorized by the Intel compiler (option -axW) with SVML. The "Scalar CR" codes refer to the use of standard scalar CR forms published in [89, 104] to optimize function evaluations.

                  Maximum Relative Error
    Benchmark   Single-precision   Double-precision
    Poly3       3.72·10^-7         8.15·10^-16
    Spline      1.62·10^-5         4.90·10^-14
    Bin15       2.96·10^-7         2.07·10^-16
    Sine        1.37·10^-4         3.62·10^-13
    Sinh        5.26·10^-6         8.81·10^-15
    Exp         1.76·10^-6         3.00·10^-14

Table 7.4: Maximum relative error for n = 100.

The "VCRd.w" codes are produced using the techniques described in this chapter, where d is the decoupling factor and w is the vector register length (d must be a multiple of w for VCR). The "sCRd.w" codes denote shuffle-based CR vectorized forms for performance comparison, as described in Section 7.3.4. Because floating-point operation counts (flops) vary significantly between the CR and non-CR optimizations, the performance is expressed in normalized GFLOPS. In the sequel, GFLOPS is the total standardized flop count divided by the total execution time t in seconds: GFLOPS = flops / (t · 10^9). The basis of the standardized flop count is the Scalar CR code, which uses the fewest flops to compute the function per grid point. For example, the total standardized flop count is 3n for Poly3 over n points (see Figure 7.2(a)), regardless of the vectorization and/or CR optimizations applied. This ensures a fair comparison of the optimizations by expressing performance as the inverse of execution time scaled by problem size.
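As a worked illustration with hypothetical numbers: if a Poly3 kernel evaluates n = 10^6 points in t = 0.5 ms, its normalized performance is 3·10^6 / (5·10^-4 · 10^9) = 6 GFLOPS, regardless of how many instructions the optimized code actually executed.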

7.3.2 Floating Point Accuracy

The maximum relative error of the scalar CR form (with d = 1) after n = 100 steps is given for single-precision and double-precision floating point for each benchmark in Table 7.4. The table shows that the relative error of CR evaluation for these benchmarks is quite small, especially in double precision. In general, using double precision to compute single-precision results with the VCR method for large n appears desirable, though this effectively halves the vector register width and thus reduces performance. A VCR blocking size b is selected based on the error properties of a specific function, as described in Sections 7.2.3 and 7.2.4.

Larger decoupling factors d limit the error propagation. For example, suppose d = 8 and b = 100 recurrence steps are taken. Then 800 grid points are efficiently computed with the accuracy listed in Table 7.4. Results discussed in the next section indicate that the performance of our VCR method exceeds the other methods for as few as 10 grid points and reaches its maximum at about 1,000 grid points. Thus, b = 100 is a realistic bound to keep the error low and the performance high with d = 4 or d = 8.

7.3.3 SSE Performance Results

The performance results presented in this section were obtained using the Intel C++ compiler v9.1 with options "icc -O3 -restrict -fno-inline -axW". The compiler uses a clever technique to vectorize math functions using SSE2 instructions and SVML by expanding the series computations [100]. Experiments were performed on an Intel Dual Core Xeon 5160 3GHz running Linux Fedora Core 7. Each benchmark was run repeatedly and the average performance recorded; the execution time variance was low (≈ 1%). Figure 7.7 compares the performance in GFLOPS for the benchmarks listed in Table 7.2, for n = 1,000 grid points and n = 100,000 grid points separately. Recall that the accuracy of CR methods is high for n = 1,000, but can be poor for n = 100,000, necessitating blocking, e.g. in intervals of 1,000 points, which results in the performance shown for n = 1,000. A dramatic increase in performance is observed for the SVML code compared to the Scalar code, due to the efficient auto-vectorization of the scalar code by the Intel compiler. However, the Intel compiler failed to vectorize Bin15. The use of CR-based methods to reduce operation counts also has a dramatic impact on performance. The Scalar CR codes are significantly faster than the Scalar codes of the original benchmark functions (both codes are highly optimized, e.g. with loop unrolling). However, operation reduction alone is not sufficient to achieve high performance, as can be seen in the figure by comparing SVML to Scalar CR. In fact, the SVML code for Poly3, Spline, and Sine is generally faster than the Scalar CR code. The vectorization of CR forms using our VCR method gives the highest overall performance improvement, as can be seen in Figure 7.7. The VCR4.4 code executes at least twice as fast as the SVML and Scalar CR codes, peaking at an almost fourfold speedup for Bin15 and Sine. Further performance improvements are obtained with VCR8.4. For n = 1,000, the exponential VCR benchmarks Sine, Sinh, and Exp are faster.

[Figure 7.7: two bar charts comparing the GFLOPS of the Scalar, SVML, Scalar CR, VCR4.4, and VCR8.4 codes for each benchmark, for n = 1,000 (top) and n = 100,000 (bottom).]

Figure 7.7: Performance comparison of VCR for n = 1,000 and n = 100,000 (Intel Dual Core Xeon 5160 3GHz).

[Figure 7.8: two line plots of MFlop/s versus grid size n = 10, . . . , 10^6 for the Sine benchmark; top: single precision (VCR8.4, VCR4.4, Scalar CR, SVML), bottom: double precision (VCR4.2, VCR2.2, Scalar CR, SVML).]

Figure 7.8: Sine single and double precision performance comparison (Intel Dual Core Xeon 5160 3GHz).

For n = 100,000 the performance of VCR8.4 is the best overall. The level of parallelism is increased by a factor p = d/w = 2 for VCR8.4. Therefore, the instruction scheduler has fewer dependences to take into account, which leads to a higher instructions-per-cycle (IPC) ratio. To investigate the scalability of the performance over the interval [10, . . . , 10^6] for SVML, Scalar CR, and VCR, we plotted the normalized MFLOPS of the Sine benchmark in Figure 7.8 for single- and double-precision floating point. For n ≥ 50, the VCR8.4 and VCR4.2 codes are the fastest for the single- and double-precision experiments, respectively. The lower performance for smaller n is caused by the initialization overhead to set up the VCR vectors, where the initialization cost increases with larger d. Note that for the single-precision case the VCR4.4 code is about a factor 3 to 4 faster than the Scalar CR code, showing close-to-perfect vectorization cost reduction and scalability for n ≥ 500. Similarly, for the double-precision case the VCR2.2 code is about a factor 2 faster than the Scalar CR code for n ≥ 500. The performance of the VCR codes drops after n = 500,000 due to cache capacity effects, while the SVML code performance drops much earlier due to temporary vector memory storage and cache effects. The apparent jitter observed in the graphs for the VCR4.4 and VCR8.4 single-precision codes is consistent across runs and appears to be related to the use of more aggressive optimization options with the Intel compiler, indicating that certain problem sizes benefit from loop optimizations such as unrolling. The VCR8.4 code exhibits higher levels of independence between iterations than VCR4.4 (and similarly for VCR4.2 versus VCR2.2). This significantly increases the computational throughput for the Sine benchmark in the single-precision case, and more dramatically in the double-precision case, as observed in Figure 7.8. Note that the performance of the double-precision SVML code drops to around half of the performance of the single-precision SVML code, because the effective SIMD vector length is reduced by half. Similarly, the performance of the VCR2.2 code and the VCR4.2 code is about half of the performance of the VCR4.4 code and the VCR8.4 code, respectively. The performance of the serial Scalar CR version is about the same as its single-precision version, which is not surprising given that the code is not vectorized. Overall, the performance of the VCR code is superior to any of the other optimization methods, sometimes even an order of magnitude faster than the original scalar code and the SVML-vectorized code.

    register __m128 cr0, cr1, cr2, cr3, tmp;
    int s = _MM_SHUFFLE(0, 3, 2, 1);
    cr0 = . . . ; cr1 = . . . ; cr2 = . . . ; cr3 = . . . ;
    for (i = 0; i < n; i++) {
        _mm_store_ss(&x[i], cr0);
        tmp = _mm_move_ss(cr0, cr1);  tmp = _mm_shuffle_ps(tmp, tmp, s);  cr0 = _mm_add_ps(cr0, tmp);
        tmp = _mm_move_ss(cr1, cr2);  tmp = _mm_shuffle_ps(tmp, tmp, s);  cr1 = _mm_add_ps(cr1, tmp);
        tmp = _mm_move_ss(cr2, cr3);  tmp = _mm_shuffle_ps(tmp, tmp, s);  cr2 = _mm_add_ps(cr2, tmp);
        tmp = _mm_sub_ss(cr3, cr3);   tmp = _mm_shuffle_ps(tmp, tmp, s);  cr3 = _mm_add_ps(cr3, tmp);
    }

Figure 7.9: Code optimized with sCR1.4 for the Bin15 benchmark.

The VCR transformation effectively combines CR-based operation reduction with SIMD vectorization. The choice of the decoupling factor d that gives the best VCR speedup depends on the function being optimized and is difficult to predict, which leads us to the auto-tuning approach. Note that d should be a multiple of the SIMD vector register length w; reasonable values are therefore d = 2, d = 4, and d = 8.

7.3.4 SSE Shuffle-Based CR Performance Results

In this section, the performance results of the shuffle-based CR vectorization, denoted sCR, are compared. Experiments were performed on an Intel Dual Core Xeon 5160 3GHz running Linux Fedora Core 7, using the Intel C++ compiler with compiler flags "icc -O3 -restrict -fno-inline -axW". Recall that the shuffle-based CR vectorization approach is only applicable to the polynomial benchmarks Poly3, Spline, and Bin15. The optimized design of the SSE code for shuffle-based CR operations illustrates the use of advanced instructions and some tricks to reduce the operation count to the bare minimum. The transformation from scalar CR code to shuffle-based sCR code was shown in Figure 7.2. A more elaborate example of sCR1.4 code for Bin15 is shown in Figure 7.9,

    register __m128 crv0, crv1, crv2, crv3, tmp;
    int s = _MM_SHUFFLE(0, 3, 2, 1);
    crv0 = . . . ; crv1 = . . . ; crv2 = . . . ; crv3 = . . . ;
    for (i = 0; i < n; i += 4) {
        _mm_store_ss(&y[i  ], crv0);
        _mm_store_ss(&y[i+1], crv1);
        _mm_store_ss(&y[i+2], crv2);
        _mm_store_ss(&y[i+3], crv3);
        tmp = _mm_sub_ss(crv0, crv0);  tmp = _mm_shuffle_ps(tmp, tmp, s);  crv0 = _mm_add_ps(crv0, tmp);
        tmp = _mm_sub_ss(crv1, crv1);  tmp = _mm_shuffle_ps(tmp, tmp, s);  crv1 = _mm_add_ps(crv1, tmp);
        tmp = _mm_sub_ss(crv2, crv2);  tmp = _mm_shuffle_ps(tmp, tmp, s);  crv2 = _mm_add_ps(crv2, tmp);
        tmp = _mm_sub_ss(crv3, crv3);  tmp = _mm_shuffle_ps(tmp, tmp, s);  crv3 = _mm_add_ps(crv3, tmp);
    }

Figure 7.10: Code optimized with sCR4.4 for the Poly3 benchmark.

which further draws on specialized SSE instructions to manipulate CR coefficients packed in vector registers. To further speed up the sCR code, we used a decoupling factor to generate independent execution sequences, which increases the level of ILP. The SSE code for the sCR optimization with a decoupling factor d = 4, the sCR4.4 shuffle-based CR vectorization of the Poly3 benchmark, is shown in Figure 7.10. The code essentially replicates the shuffle-based template of Figure 7.2 by applying our decoupling technique to the shuffle-based CR method. Figure 7.11 summarizes the performance for all three polynomial benchmarks for n = 1,000 grid points. Note that the Intel compiler failed to vectorize Bin15. Except for Bin15, the Intel SVML-vectorized code is the fastest. The Poly3 sCR1.4 code has the worst performance among these versions. The shuffle operation used in this version is the reason for the slowdown: shuffle operations that involve a data reorganization in vector registers are usually expensive and appear to be more costly than the non-vector floating point operations on floating point registers in the Scalar CR code. Increasing the ILP appears to help for long CR forms, e.g. the 15th-order polynomial Bin15 code. To investigate the scalability of the performance of sCR over the interval [10, . . . , 10^6], the performance results for the Poly3 benchmark are also shown in Figure 7.11.

[Figure 7.11, top: bar chart of MFLOPS for SVML, Scalar CR, sCR1.4, and sCR4.4 on the Poly3, Spline, and Bin15 benchmarks at n = 1,000. Bottom: line plot of MFlop/s versus grid size n = 10, . . . , 10^6 for Poly3 with SVML, Scalar CR, sCR4.4, and sCR1.4.]

Figure 7.11: Performance comparison of shuffle-based CR vectorization for the polynomial benchmarks with n = 1,000 and for Poly3 with n = 10, . . . , 10^6 (Intel Dual Core Xeon 5160 3GHz).

As expected, the Poly3 sCR4.4 code shows an almost four-fold performance increase over the Poly3 sCR1.4 code due to the higher ILP. The overall performance of sCR is disappointing compared to SVML; thus, sCR is also much slower than our proposed VCR method (since VCR is overall faster than SVML). Given the low performance of sCR and its limited applicability, the sCR optimization is not a viable alternative. However, new SSE instructions, such as a "shift-add" that would fit the CR update operations, could make the performance of sCR more competitive.

7.3.5 Superscalar RISC ILP Enhancement Results

The performance results presented in this section were obtained using the Sun Studio 12 C compiler with compiler options "suncc -fast -fma=fused -xrestrict -xalias_level=layout -xinline=no -xarch=native". Note that the "-fma=fused" option is used to improve the performance of the code with fused multiply-add (FMA) instructions when possible. Experiments were performed on an UltraSPARC IIIi 1.2GHz running Sun Solaris 10. The UltraSPARC IIIi processor is a high-performance, highly-integrated 4-issue superscalar implementation of the 64-bit SPARC-V9 RISC architecture, supporting a 64-bit virtual address space. We did not use the UltraSPARC VIS instruction set for SIMD short-vector operations, because it supports only integer and fixed-point operand types. Therefore, any improvement is a result of the improved utilization of ILP in modulo scheduling. The 4-issue superscalar processor benefits from increased levels of ILP; thus modulo scheduling is an essential compiler optimization for small loop kernels. The Scalar CR code exhibits loop-independent anti-dependences between the CR updates and cross-iteration dependences (see Figure 7.2(a) for example) that limit the effectiveness of the modulo scheduler in scheduling instructions in parallel. The best performance is obtained with the lowest steady-state cycle count (also referred to as the initiation interval of the kernel), assuming no stalls. Figure 7.12 summarizes the performance of all benchmark codes, for n = 1,000 grid points and n = 100,000 grid points. The performance of the transcendental math library functions on this machine is very poor (the Scalar codes), making the data for Sinh and Exp barely visible in the graph. The difference is marginally better for the higher n = 100,000, because of the relatively lower overhead of CR initialization and the relatively lower prologue and epilogue costs in the modulo schedule.

[Figure 7.12: two bar charts of MFLOPS for the Scalar, Scalar CR, and VCR4.1 codes on each benchmark, for n = 1,000 (top) and n = 100,000 (bottom).]

Figure 7.12: Performance comparison of VCR-enhanced ILP (UltraSPARC IIIi 1.2GHz).

    Poly3 Benchmark
    Measurement                                Scalar             Scalar CR   VCR4.1
    Unroll factor                              8                  8           2
    Steady-state cycle count                   5                  4           3 (12 per 4 pts)
    Floating point operations per iteration    3 FMA + 2 FPadds   3 FPadds    12 FPadds (per 4 points)

Table 7.5: Static code size statistics of three Poly3 benchmark codes.

The hypothesized increase in ILP of our VCR method is indeed confirmed by the results in Figure 7.12. The VCR4.1 code (decoupling factor d = 4 and vector width w = 1) achieves the best performance for all benchmark functions except Bin15. The modulo scheduler of the Sun Studio compiler failed to construct a schedule for the Bin15 VCR4.1 code, due to the high floating-point register pressure of four decoupled 15th-order polynomial CR forms, which require 64 registers just to hold the CR coefficients. To study the impact of enhanced ILP on modulo scheduling, Table 7.5 lists the static statistics extracted from the Sun Studio compiler for the modulo-scheduled computational loop in the three optimized versions of the Poly3 code. The statistics are similar for the other benchmarks. The steady-state cycle count determines the performance: lower cycle counts indicate that fewer cycles are executed per loop iteration, where each cycle issues at most 4 instructions. The observed performance is indeed best for Poly3 VCR4.1. To investigate the scalability of the ILP-enhanced VCR4.1 code over the interval [10, . . . , 10^6], we plotted the results in Figure 7.13 for the Poly3 benchmark. The steady-state cycle count (Table 7.5) is clearly reflected in the performance differences, with VCR4.1 having the lowest steady-state cycle count (3) and the highest performance (1.0 GFLOPS peak). Overall, the VCR approach significantly increases the ILP for decoupling factors d > 1. This benefits the performance of modulo scheduling for superscalar RISC architectures, as long as sufficient registers are available to hold the VCR coefficients. The number of registers needed for the VCR computation is d·k, where k is the length of the CR form constructed for a function.

[Figure 7.13: line plot of MFlop/s versus grid size n = 10, . . . , 10^6 for the Poly3 benchmark, comparing VCR4.1, Scalar CR, and Scalar.]

Figure 7.13: Poly3 benchmark performance results of VCR-enhanced ILP-optimized code (UltraSPARC IIIi 1.2GHz).

7.4 Related Work

The CR formalism was originally developed by Zima [90] and applied in computer algebra systems [47]. The formalism was improved by Bachmann, Zima, and Wang [69, 89] to expedite the evaluation of multivariate functions on regular grids. Multivariate CR forms (MCR) are nested CR forms that represent multivariate closed-form functions over sets of index variables. Van Engelen [32, 17] extended the CR algebra with new rules and techniques for induction variable detection and optimization in compilers. Zima et al. [104] investigated parallel mappings to evaluate CR forms using coarse-grain thread-level parallelism. A data partitioning approach is proposed to divide the iteration domain into p sub-domains, given p processors, to speed up execution and reduce the error.

Several thread-level parallel execution strategies (functional parallel, data parallel, and subdomain parallel) are compared for two CR forms. By contrast, our approach proposes a decoupling strategy to exploit fine-grain parallelism by using a CR translation technique to generate independent value sequences, where the sequences are stored and updated in vector registers. Our technique can be combined with thread-level parallelism of the outer block loop (Figure 7.1) to further increase the execution speed of math function evaluations over grids. Zima et al. [104] concluded that the performance of "functional parallel" CR evaluation is poor when using thread-level parallelism. In our work, despite reducing the overhead of "functional parallel" execution of CR forms with an efficient shuffle-based SSE implementation, we also found that the resulting shuffle-based CR vectorization is not competitive, due to the overhead of the tightly-coupled copy, shuffle, and add (or multiply) operations required for each grid point. The CORDIC [101] family of algorithms provides fast methods to evaluate transcendental functions, roots, logarithms, and exponentials for a single point. The CORDIC algorithm performs a rotation using a series of specific incremental rotation angles to approach the target angle, and each step requires only addition, subtraction, bit-shifts, and a table lookup. It is widely used in pocket calculators and real-time systems, since CORDIC is generally faster than other approaches when a hardware multiplier is not available or when the cost of the chip needs to be minimized. The Intel 80x87 coprocessor series up to the Intel 80486 all use CORDIC algorithms [102]. These hardware advances are instrumental in reducing latencies that would otherwise diminish the effectiveness of our vectorization method. Vector math libraries, such as Intel's SVML and the Vector Math Library (VML), are highly optimized for SIMD short-vector execution [103]. SVML was developed to provide efficient software implementations of transcendental functions on packed floating-point numbers with full accuracy [100] and is only intended for use by the Intel compiler vectorizer. VML is an application-level library designed to compute math functions on vector arguments [107]. In contrast to these highly optimized vector math libraries, VCR vectorization is not restricted to math functions alone: a floating point expression composed of multiple math functions can be transformed to a VCR and then optimized as a whole, resulting in more efficient evaluation.

The main contributions of this chapter can be summarized as follows:

• The present CR formalism to expedite function evaluation over regular grids introduces dependences across the iteration space, which prevents loop parallelization and vectorization (these require independent operations). A new fine-grain decoupling strategy is proposed to vectorize CR evaluation for efficient SIMD vector execution and enhanced ILP. The CR-based method is applicable to any function defined over a commutative ring (Z, +, ∗).

• A systematic performance evaluation of the proposed vectorization method is conducted for polynomial and non-polynomial (exponential) classes of functions in the real and complex domains and compared to existing work.

• A systematic performance evaluation of the ILP-enhancing properties of the method is conducted and the impact on modulo scheduling is analyzed.

• An overview of the roundoff error properties of CR forms is given, motivating the decoupling strategy as a way to slow the rate of error propagation.

• An auto-tuning approach is proposed to determine the optimal decoupling factor and block size for vector CR execution, ensuring high performance together with floating point accuracy.

CHAPTER 8

CONCLUSION

This dissertation developed a new theoretical and algorithmic framework for improved compiler analysis of loops, optimization of loops, and increased performance of closed-form function evaluations over regular grids using a library generator approach. A new Inductive SSA extension was proposed based on the extended Chains of Recurrences CR# (CR-sharp) algebra. Inductive SSA and CR# enhance the recognition and manipulation of flow-sensitive loop-variant variables by compiler analysis methods. For a practical implementation in a compiler to validate our approach, we developed a CR# library and integrated it in GCC for constructing Inductive SSA. We used Inductive SSA to improve congruence detection for partial redundancy elimination. Experimental results show that the Inductive SSA form increased the capabilities of the compiler to detect congruences and improve the optimization of the SPEC benchmark codes. Also, a toolset was developed to speed up closed-form function evaluations over grids by vectorizing CR forms. Experimental results indicated that loop-variant variable analysis and the classic compiler optimizations are greatly enhanced by using Inductive SSA. The performance of floating-point function evaluations in loops was dramatically increased by VCR vectorization compared to state-of-the-art vector libraries such as Intel's SVML. Chapter 1 motivated our work by presenting related work and motivational examples to illustrate the Inductive SSA form. The use of Inductive SSA for IV optimizations, congruence detection, and array data dependence testing was discussed. In Chapter 2, we introduced the preliminaries of the original CR formalism, the SSA form, and a survey of induction variable analysis methods. A review of the classic compiler optimizations that can be enhanced with Inductive SSA was also given. In Chapter 3, we introduced our new extension, the CR# algebra, and explained the purpose of the additions.

We presented the new extended CR# simplification rules, which handle the simplification of CR expressions with # operators, the CR# lattice, monotonicity, and the CR# alignment and bound mechanisms, which together provide the foundation for constructing and analyzing CR forms for flow-sensitive loop-variant variables. The CR# form is the basis for describing and manipulating loop-variant variables. In addition, we discussed typed CR forms to make CR# applicable to general-purpose compiler implementations that deal with many differently typed (induction) variables. We then introduced the Inductive SSA notation and formalism. In Inductive SSA form, every scalar variable is annotated with an inductive proof, represented by CR forms, describing the value progression of the variable in the loop. In Chapter 4, we presented the asymptotically linear-time Inductive SSA construction algorithm to annotate the SSA form with inductive proofs containing CR# forms describing the semantic value progressions of scalar variables in a loop nest. A unified framework to generalize compiler analysis and optimization based on Inductive SSA was discussed. In Chapter 5, we discussed the GCC compiler environment used to perform our experiments. We then presented the implementation of Inductive SSA and the enhanced optimizations in the GCC compiler environment. In Chapter 6, experimental results were evaluated on the SPEC2000 benchmarks. The results corroborate the claims and motivations for our work on Inductive SSA. In Chapter 7, we presented a library generator based on a vectorization method that uses an algebraic transformation of math functions to speed up function evaluation over structured grids, using vector representations of CR forms. The VCR representation allows packing into short vector registers for efficient execution. An auto-tuning approach for the library ensured safe floating point evaluation. Experimental results showed improved SIMD execution and ILP scheduling.

REFERENCES

[1] S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, San Francisco, CA, 1997.
[2] Mark N. Wegman and F. Kenneth Zadeck. Constant propagation with conditional branches. ACM Trans. Program. Lang. Syst., 13(2):181–210, 1991.
[3] Utpal Banerjee. Dependence Analysis for Supercomputing. Kluwer, Boston, 1988.
[4] M.P. Gerlek, E. Stolz, and M. Wolfe. Beyond induction variables: Detecting and classifying sequences using a demand-driven SSA form. ACM Transactions on Programming Languages and Systems (TOPLAS), 17(1):85–122, Jan 1995.
[5] G. Goff, Ken Kennedy, and C-W. Tseng. Practical dependence testing. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation (PLDI), volume 26, pages 15–29, Toronto, Ontario, Canada, June 1991.
[6] P. Tu and D. Padua. Gated SSA-based demand-driven symbolic analysis for parallelizing compilers. In Proceedings of the 9th ACM International Conference on Supercomputing (ICS), pages 414–423, New York, Jul 1995. ACM Press.
[7] M.J. Wolfe. Beyond induction variables. In ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 162–174, San Francisco, CA, 1992.
[8] M.J. Wolfe. High Performance Compilers for Parallel Computers. Addison-Wesley, Redwood City, CA, 1996.
[9] John R. Allen and Ken Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002.
[10] H. Zima and B. Chapman. Supercompilers for Parallel and Vector Computers. ACM Press, New York, 1990.
[11] A. Aho, R. Sethi, and J. Ullman. Compilers: Principles, Techniques and Tools. Addison-Wesley Publishing Company, Reading, MA, 1985.
[12] Mohammad R. Haghighat and Constantine D. Polychronopoulos. Symbolic analysis for parallelizing compilers. ACM Transactions on Programming Languages and Systems, 18(4):477–518, July 1996.
[13] W. Pottenger and R. Eigenmann. Parallelization in the presence of generalized induction and reduction variables. Technical report 1396, Univ. of Illinois at Urbana-Champaign, Center for Supercomputing Research & Development, 1995.
[14] R. Eigenmann, J. Hoeflinger, G. Jaxon, Z. Li, and D.A. Padua. Restructuring Fortran programs for Cedar. In Proceedings of ICPP '91, volume 1, pages 57–66, St. Charles, Illinois, 1991.
[15] Mohammad R. Haghighat. Symbolic Analysis for Parallelizing Compilers. Kluwer Academic Publishers, 1995.
[16] Björn Franke and Michael O'Boyle. Array recovery and high-level transformations for DSP applications. ACM Transactions on Embedded Computing Systems (TECS), 2(2):132–162, 2003.
[17] R.A. van Engelen and K.A. Gallivan. An efficient algorithm for pointer-to-array access conversion for compiling and optimizing DSP applications. In Proceedings of the International Workshop on Innovative Architectures for Future Generation High-Performance Processors and Systems (IWIA) 2001, pages 80–89, Maui, Hawaii, 2001.
[18] Rajiv Gupta. A fresh look at optimizing array bound checking. SIGPLAN Not., 25(6):272–282, 1990.
[19] D. Andrade, M. Arenaz, B.B. Fraguela, J. Touriño, and R. Doallo. Automated and accurate cache behavior analysis for codes with irregular access patterns. In Concurrency and Computation: Practice and Experience (to appear), 2007.
[20] J. Birch, R.A. van Engelen, K.A. Gallivan, and Y. Shou. An empirical evaluation of chains of recurrences for array dependence testing. In PACT '06: Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, pages 295–304, New York, NY, USA, 2006. ACM Press.
[21] R.A. van Engelen, J. Birch, Y. Shou, B. Walsh, and K.A. Gallivan. A unified framework for nonlinear dependence testing and symbolic analysis. In Proceedings of the ACM International Conference on Supercomputing (ICS), pages 106–115, 2004.
[22] John H. Reif and Harry R. Lewis. Symbolic evaluation and the global value graph. In POPL '77: Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pages 104–118, 1977.
[23] B. Alpern, M. N. Wegman, and F. K. Zadeck. Detecting equality of variables in programs. In POPL '88: Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 1–11, New York, NY, USA, 1988. ACM.
[24] G.A. Kildall. A unified approach to global program optimization. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '73), pages 194–206, 1973.
[25] B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Global value numbers and redundant computations. In POPL '88: Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 12–27, 1988.
[26] J. Knoop, O. Ruething, and B. Steffen. Lazy code motion. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, volume 27, pages 224–234, San Francisco, CA, June 1992.
[27] E. Morel and C. Renvoise. Global optimization by suppression of partial redundancies. Commun. ACM, 22(2):96–103, 1979.
[28] D. Whitfield and M. L. Soffa. An approach to ordering optimizing transformations. In PPOPP '90: Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 137–146, 1990.
[29] L. Almagor, Keith D. Cooper, Alexander Grosul, Timothy J. Harvey, Steve Reeves, Devika Subramanian, Linda Torczon, and Todd Waterman. Compilation order matters: Exploring the structure of the space of compilation sequences using randomized search algorithms. In Proceedings of the ACM SIGPLAN Symposium on Languages, Compilers, and Tools for Embedded Systems (LCTES), pages 231–239, 2004.
[30] Prasad A. Kulkarni, David B. Whalley, Gary S. Tyson, and Jack W. Davidson. Exhaustive optimization phase order space exploration. In Code Generation and Optimization, IEEE/ACM International Symposium on, pages 306–318, 2006.
[31] Thomas VanDrunen and Antony L. Hosking. Value-based partial redundancy elimination. In Proceedings of the International Conference on Compiler Construction, 2004.
[32] R.A. van Engelen, J. Birch, and K.A. Gallivan. Array data dependence testing with the chains of recurrences algebra. In Proceedings of the IEEE International Workshop on Innovative Architectures for Future Generation High-Performance Processors and Systems (IWIA), pages 70–81, January 2004.
[33] R.A. van Engelen, J. Birch, Y. Shou, and K.A. Gallivan. Array data dependence testing with the chains of recurrences algebra. Technical report TR-041201, Computer Science Dept., Florida State University, 2004.
[34] Karthik Gargi. A sparse algorithm for predicated global value numbering. In PLDI '02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, pages 45–56, New York, NY, USA, 2002. ACM.
[35] Xiangyun Kong, David Klappholz, and Kleanthis Psarris. The I Test: An improved dependence test for automatic parallelization and vectorization. IEEE Transactions on Parallel and Distributed Systems, 2(3):342–349, 1991.
[36] Kleanthis Psarris, Xiangyun Kong, and David Klappholz. The direction vector I Test. IEEE Transactions on Parallel and Distributed Systems, 4(11):1280–1290, November 1993.
[37] George B. Dantzig and B. Curtis Eaves. Fourier-Motzkin elimination and its dual. Journal of Combinatorial Theory, 14(14):288–297, 1973.
[38] William Pugh. The Omega test: a fast and practical integer programming algorithm for dependence analysis. In Proceedings of Supercomputing, pages 4–13, 1991.
[39] Kleanthis Psarris. Program analysis techniques for transforming programs for parallel systems. Parallel Computing, 28(3):455–469, 2003.
[40] Kleanthis Psarris. Program analysis techniques for transforming programs for parallel systems. Parallel Computing, 28(3):455–469, 2003.
[41] Björn Franke and Michael O'Boyle. Compiler transformation of pointers to explicit array accesses in DSP applications. In Proceedings of the ETAPS Conference on Compiler Construction 2001, LNCS 2027, pages 69–85, 2001.
[42] P. Wu, A. Cohen, J. Hoeflinger, and D. Padua. Monotonic evolution: An alternative to induction variable substitution for dependence analysis. In Proceedings of the ACM International Conference on Supercomputing (ICS), pages 78–91, 2001.
[43] Zhiyu Shen, Zhiyuan Li, and Pen-Chung Yew. An empirical study on array subscripts and data dependencies. In Proceedings of the International Conference on Parallel Processing, volume 2, pages 145–152, 1989.
[44] J.-F. Collard, D. Barthou, and P. Feautrier. Fuzzy array dataflow analysis. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 92–101, 1995.
[45] Radu Rugina and Martin Rinard. Symbolic bounds analysis of array indices, and accessed memory regions. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 182–195, Vancouver, British Columbia, Canada, June 2000.
[46] Karl J. Ottenstein, Robert A. Ballance, and Arthur B. MacCabe. The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages. In PLDI '90: Proceedings of the ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation, pages 257–271, New York, NY, USA, 1990. ACM Press.
[47] E.V. Zima. Recurrent relations and speed-up of computations using computer algebra systems. In Proceedings of DISCO '92, pages 152–161. LNCS 721, 1992.
[48] D. Berlin, D. Edelsohn, and S. Pop. High-level loop optimizations for GCC. In Proceedings of the 2004 GCC Developers' Summit, pages 37–54, 2004.
[49] Robert van Engelen. Efficient symbolic analysis for optimizing compilers. In Proceedings of the ETAPS Conference on Compiler Construction 2001, LNCS 2027, pages 118–132, 2001.
[50] R.A. van Engelen. Symbolic evaluation of chains of recurrences for loop optimization. Technical report TR-000102, Computer Science Dept., Florida State University, 2000.
[51] Yixin Shou, Robert van Engelen, Johnnie Birch, and Kyle Gallivan. Toward efficient flow-sensitive induction variable analysis and dependence testing for loop optimization. In Proceedings of the ACM SouthEast Conference, pages 1–6, 2006.
[52] Y. Shou, R. van Engelen, and J. Birch. Flow-sensitive loop-variant variable classification in linear time. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing (LCPC), pages 323–337, 2007.
[53] Yixin Shou, Robert van Engelen, and Johnnie Birch. Flow-sensitive loop-variant variable classification in linear time. Technical report TR-071005, Computer Science Dept., Florida State University, 2007.
[54] Yixin Shou and Robert A. van Engelen. Automatic SIMD vectorization of chains of recurrences. In Proceedings of the ACM International Conference on Supercomputing (ICS), pages 245–255, 2008.
[55] Vicki H. Allan, Reese B. Jones, Randall M. Lee, and Stephen J. Allan. Software pipelining. ACM Computing Surveys, 27(3):367–432, 1995.
[56] Eugene V. Zima. On computational properties of chains of recurrences. In Proceedings of the 2001 International Symposium on Symbolic and Algebraic Computation, page 345. ACM Press, 2001.
[57] R. Cytron, J. Ferrante, B.K. Rosen, M.N. Wegman, and F.K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13:451–490, October 1991.
[58] Silvius Rus, Guobin He, Christophe Alias, and Lawrence Rauchwerger. Region array SSA. In PACT '06: Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, pages 43–52, New York, NY, USA, 2006. ACM Press.
[59] Fred C. Chow, Sun Chan, Shin-Ming Liu, Raymond Lo, and Mark Streich. Effective representation of aliases and indirect memory operations in SSA form. In CC '96: Proceedings of the 6th International Conference on Compiler Construction, pages 253–267, London, UK, 1996. Springer-Verlag.
[60] Jin Lin, Tong Chen, Wei-Chung Hsu, Pen-Chung Yew, Roy Dz-Ching Ju, Tin-Fook Ngai, and Sun Chan. A compiler framework for speculative analysis and optimizations. In PLDI '03: Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, pages 289–299, New York, NY, USA, 2003. ACM Press.
[61] Jaejin Lee, David A. Padua, and Samuel P. Midkiff. Basic compiler algorithms for parallel programs. In PPoPP '99: Proceedings of the Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 1–12, New York, NY, USA, 1999. ACM Press.
[62] R. Eigenmann, J. Hoeflinger, Z. Li, and D.A. Padua. Experience in the automatic parallelization of four Perfect-benchmark programs. In 4th Annual Workshop on Languages and Compilers for Parallel Computing, LNCS 589, pages 65–83, Santa Clara, CA, 1991. Springer Verlag.
[63] Z. Ammerguallat and W.L. Harrison III. Automatic recognition of induction variables and recurrence relations by abstract interpretation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 283–295, White Plains, NY, 1990.

[47] E.V. Zima. Recurrent relations and speed-up of computations using computer algebra systems. In Proceedings of DISCO’92, pages 152–161. LNCS 721, 1992. 1.2, 2.2.2, 2.4, 2.4.3, 7.1, 7.2, 7.2.2, 7.2.2, 7.2.7, 7.4
[48] D. Berlin, D. Edelsohn, and S. Pop. High-level loop optimizations for GCC. In Proceedings of the 2004 GCC Developers’ Summit, pages 37–54, 2004. 1.2, 2.2.2
[49] Robert van Engelen. Efficient symbolic analysis for optimizing compilers. In Proceedings of the ETAPS Conference on Compiler Construction 2001, LNCS 2027, pages 118–132, 2001. 1.2, 2.2.1, 2.2.1, 2.2.2, 2.2.2, 2.2.2, 2.4, 2.4.3, 2.4.3
[50] R.A. van Engelen. Symbolic evaluation of chains of recurrences for loop optimization. Technical report, TR-000102, Computer Science Dept., Florida State University, 2000. 1.2, 2.2.1, 2.2.1, 2.4.1, 2.4.3, 2.4.3, 3.1.3, 4.2.3, 7.1, 7.2, 7.2.2, 7.2.2

[51] Yixin Shou, Robert van Engelen, Johnnie Birch, and Kyle Gallivan. Toward efficient flow-sensitive induction variable analysis and dependence testing for loop optimization. In Proceedings of the ACM SouthEast Conference, pages 1–6, 2006. 1.2

[52] Y. Shou, R. van Engelen, and J. Birch. Flow-sensitive loop-variant variable classification in linear time. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing (LCPC), pages 323–337, 2007. 1.2
[53] Yixin Shou, Robert van Engelen, and Johnnie Birch. Flow-sensitive loop-variant variable classification in linear time. Technical report, TR-071005, Computer Science Dept., Florida State University, 2007. 1.2
[54] Yixin Shou and Robert A. van Engelen. Automatic SIMD vectorization of chains of recurrences. In Proceedings of the ACM International Conference on Supercomputing (ICS), pages 245–255, 2008. 1.3

[55] Vicki H. Allan, Reese B. Jones, Randall M. Lee, and Stephen J. Allan. Software pipelining. ACM Computing Surveys, 27(3):367–432, 1995. 1.3, 7.1, 7.2.4

[56] Eugene V. Zima. On computational properties of chains of recurrences. In Proceedings of the 2001 International Symposium on Symbolic and Algebraic Computation, page 345. ACM Press, 2001. 1.3, 7.1

[57] R. Cytron, J. Ferrante, B.K. Rosen, M.N. Wegman, and F.K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13:451–490, October 1991. 2.1, 2.1.1

[58] Silvius Rus, Guobin He, Christophe Alias, and Lawrence Rauchwerger. Region array SSA. In PACT ’06: Proceedings of the 15th international conference on Parallel architectures and compilation techniques, pages 43–52, New York, NY, USA, 2006. ACM Press. 2.1.2
[59] Fred C. Chow, Sun Chan, Shin-Ming Liu, Raymond Lo, and Mark Streich. Effective representation of aliases and indirect memory operations in SSA form. In CC ’96: Proceedings of the 6th International Conference on Compiler Construction, pages 253–267, London, UK, 1996. Springer-Verlag. 2.1.2
[60] Jin Lin, Tong Chen, Wei-Chung Hsu, Pen-Chung Yew, Roy Dz-Ching Ju, Tin-Fook Ngai, and Sun Chan. A compiler framework for speculative analysis and optimizations. In PLDI ’03: Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, pages 289–299, New York, NY, USA, 2003. ACM Press. 2.1.2
[61] Jaejin Lee, David A. Padua, and Samuel P. Midkiff. Basic compiler algorithms for parallel programs. In PPoPP ’99: Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 1–12, New York, NY, USA, 1999. ACM Press. 2.1.2
[62] R. Eigenmann, J. Hoeflinger, Z. Li, and D.A. Padua. Experience in the automatic parallelization of four Perfect-benchmark programs. In 4th Annual Workshop on Languages and Compilers for Parallel Computing, LNCS 589, pages 65–83, Santa Clara, CA, 1991. Springer Verlag. 2.2.1
[63] Z. Ammarguellat and W.L. Harrison III. Automatic recognition of induction variables and recurrence relations by abstract interpretation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 283–295, White Plains, NY, 1990. 2.2.1, 2.2.2, 2.2.2

[64] David A. Padua and Michael J. Wolfe. Advanced compiler optimizations for super- computers. Commun. ACM, 29(12):1184–1201, 1986. 2.2.1

[65] M.R. Haghighat and C.D. Polychronopoulos. Symbolic program analysis and optimization for parallelizing compilers. In 5th Annual Workshop on Languages and Compilers for Parallel Computing, LNCS 757, pages 538–562, New Haven, Connecticut, 1992. Springer Verlag. 2.2.1
[66] John Cocke and Ken Kennedy. An algorithm for reduction of operator strength. Communications of the ACM, 20(11):850–856, 1977. 2.2.2, 2.2.2
[67] Keith D. Cooper, L. Taylor Simpson, and Christopher A. Vick. Operator strength reduction. ACM Transactions on Programming Languages and Systems, 23(5):603–625, 2001. 2.2.2, 2.3.2
[68] R.E. Tarjan. Depth first search and linear graph algorithms. SIAM Journal on Computing, 1(2):146–160, 1972. 2.2.2

[69] O. Bachmann. Chains of Recurrences. PhD thesis, Kent State University, College of Arts and Sciences, 1996. 2.2.2, 2.4, 2.4.2, 2.4.3, 7.4
[70] R.A. van Engelen. The CR# algebra and its application in loop analysis and optimization. Technical report, Computer Science Dept., Florida State University, 2004. 2.2.2, 4.2.1
[71] Sebastian Pop, Albert Cohen, and Georges-André Silber. Induction variable analysis with delayed abstractions. In Proceedings of the 2005 International Conference on High Performance Embedded Architectures and Compilers, pages 218–232. Springer-Verlag, 2005. 2.2.2
[72] F. Allen, J. Cocke, and K. Kennedy. Reduction of operator strength. In S. Muchnick and N. Jones, editors, Program Flow Analysis, pages 79–101, New Jersey, 1981. Prentice-Hall. 2.3.2, 2.3.2
[73] Robert Kennedy, Fred C. Chow, Peter Dahl, Shin-Ming Liu, Raymond Lo, and Mark Streich. Strength reduction via SSAPRE. In Proceedings of the International Conference on Compiler Construction, pages 144–158, 1998. 2.3.2
[74] Robert Kennedy, Sun Chan, Shin-Ming Liu, Raymond Lo, Peng Tu, and Fred Chow. Partial redundancy elimination in SSA form. ACM Trans. Program. Lang. Syst., 21(3):627–676, 1999. 2.3.2, 2.3.6, 4.2.6
[75] John Cocke. Global common subexpression elimination. SIGPLAN Not., 5(7):20–24, 1970. 2.3.5
[76] Markus Müller-Olm, Oliver Rüthing, and Helmut Seidl. Checking Herbrand equalities and beyond. In VMCAI, pages 79–96, 2005. 2.3.6
[77] Karl-Heinz Drechsler and Manfred P. Stadel. A variation of Knoop, Rüthing, and Steffen’s lazy code motion. SIGPLAN Not., 28(5):29–38, 1993. 2.3.6, 4.2.6
[78] Oliver Rüthing, Jens Knoop, and Bernhard Steffen. Sparse code motion. In POPL ’00: Proceedings of the 27th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 170–183, 2000. 2.3.6
[79] Antony L. Hosking, Nathaniel Nystrom, David Whitlock, Quintin Cutts, and Amer Diwan. Partial redundancy elimination for access path expressions. Softw. Pract. Exper., 31(6):577–600, 2001. 2.3.6
[80] Preston Briggs and Keith D. Cooper. Effective partial redundancy elimination. In PLDI ’94: Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, pages 159–170, 1994. 2.3.6, 4.2.6
[81] Cliff Click. Global code motion/global value numbering. SIGPLAN Not., 30(6):246–257, 1995. 2.3.6, 4.2.6
[82] Jens Knoop, Oliver Rüthing, and Bernhard Steffen. Code motion and code placement: Just synonyms? In ESOP ’98: Proceedings of the 7th European Symposium on Programming, pages 154–169, London, UK, 1998. Springer-Verlag. 2.3.6, 4.2.6

[83] Preston Briggs, Keith D. Cooper, and L. Taylor Simpson. Value numbering. Softw. Pract. Exper., 27(6):701–724, 1997. 2.3.6
[84] Loren Taylor Simpson. Value-driven redundancy elimination. PhD thesis, Rice University, Houston, TX, USA, 1996. Advisor: Keith D. Cooper. 2.3.6
[85] Sumit Gulwani and George C. Necula. Global value numbering using random interpretation. SIGPLAN Not., 39(1):342–352, 2004. 2.3.6

[86] Jiu-Tao Nie and Xu Cheng. An efficient SSA-based algorithm for complete global value numbering. In APLAS, pages 319–334, 2007. 2.3.6

[87] Rastislav Bodík and Sadun Anik. Path-sensitive value-flow analysis. In POPL ’98: Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 237–251, 1998. 2.3.6
[88] Rastislav Bodík, Rajiv Gupta, and Mary Lou Soffa. Complete removal of redundant expressions. In PLDI ’98: Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, pages 1–14, 1998. 2.3.6
[89] O. Bachmann, P.S. Wang, and E.V. Zima. Chains of recurrences - a method to expedite the evaluation of closed-form functions. In Proceedings of the International Symposium on Symbolic and Algebraic Computing (ISSAC), pages 242–249, Oxford, 1994. ACM. 2.4.3, 1, 7.1, 7.2, 7.2.2, 7.2.2, 7.2.7, 7.3.1, 7.4
[90] Eugene V. Zima. Automatic construction of systems of recurrence relations. USSR Computational Mathematics and Mathematical Physics, 24(11-12):193–197, 1986. 2.4.3, 1, 7.4
[91] D.E. Knuth and P.B. Bendix. Simple word problems in universal algebras. In J. Leech, editor, Computational Problems in Abstract Algebra, pages 263–297. Pergamon Press, 1970. 2
[92] J. Birch, R.A. van Engelen, and K.A. Gallivan. Value range analysis of conditionally updated variables and pointers. In Proceedings of Compilers for Parallel Computing (CPC), pages 265–276, 2004. 3.1.5

[93] Johnnie L. Birch. Methods for linear and nonlinear array data dependence analysis with the chains of recurrences algebra. PhD thesis, Florida State University, Tallahassee, FL, USA, 2007. 3.1.6, 3.1.6 [94] D. Novillo. Tree SSA: A new optimization infrastructure for GCC. In Proceedings of the 2003 GCC Developers’ Summit, pages 181–193, 2003. 5.1, 5.1.2, 5.1.2, 5.2

[95] The GNU Homepage. Available from http://www.gnu.org. 5.1.1

[96] RTL representation. Available from http://gcc.gnu.org/onlinedocs/gccint/RTL.html. 5.1.2

[97] GCC internals manual. Available from http://gcc.gnu.org/onlinedocs/. 5.1.2
[98] SPEC CPU2000 benchmark homepage. Available from http://www.spec.org/cpu2000/. 6.1
[99] W. Blume, R. Doallo, R. Eigenmann, J. Grout, J. Hoeflinger, T. Lawrence, J. Lee, D. Padua, Y. Paek, B. Pottenger, L. Rauchwerger, and P. Tu. Advanced program restructuring for high-performance computers with Polaris. IEEE Computer, 29(12):78–82, 1996. 6.3
[100] Aart J. C. Bik. The Software Vectorization Handbook: Applying Intel Multimedia Extensions for Maximum Performance. Intel Press, 2004. 7.1, 7.3.3, 7.4
[101] J. E. Volder. The CORDIC trigonometric computing technique. IRE Transactions on Electronic Computers, EC-8:330–334, September 1959. 7.1, 7.4
[102] A. K. Yuen. Intel’s floating-point processors. In Electro/88 Conference Record, pages 48/5/1–7, 1988. 7.1, 7.4
[103] L. Kylex. How to Avoid Bottlenecks in Simple Math Functions, 2007. Available from http://softwarecommunity.intel.com/articles/eng/3524.htm, Intel 2004. 7.1, 7.4
[104] Eugene V. Zima, Karthi R. Vadivelu, and Thomas L. Casavant. Mapping techniques for parallel evaluation of chains of recurrences. In IPPS ’96: Proceedings of the 10th International Parallel Processing Symposium, pages 620–624, Washington, DC, USA, 1996. IEEE Computer Society. 7.2.3, 7.2.5, 7.3.1, 7.4

[105] R.A. van Engelen, L. Wolters, and G. Cats. Ctadel: A generator of multi-platform high performance codes for PDE-based scientific applications. In 10th ACM International Conference on Supercomputing, pages 86–93, New York, 1996. ACM Press. 7.2.6

[106] SWI-Prolog’s Home. Available from http://www.swi-prolog.org/. 7.2.6 [107] Intel Math Kernel Library Reference Manual, March 2007. Intel Document number: 630813-025US. 7.4

BIOGRAPHICAL SKETCH

Yixin Shou

Yixin Shou completed her B.S. degree in Computer Science at Inner Mongolia University, China, in 1997. She received her M.S. degree in Computer Science at Peking University, China, in 2000. In the fall of 2002, she came to Florida State University to pursue her Ph.D. in the Department of Computer Science. Yixin’s research interests include compiler optimizations, high-performance computing, and MPEG-4 encoding parallelization.
