UCAM-CL-TR-607 Technical Report ISSN 1476-2986

Number 607

Computer Laboratory

Code size optimization for embedded processors

Neil E. Johnson

November 2004

15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom
phone +44 1223 763500

http://www.cl.cam.ac.uk/

© 2004 Neil E. Johnson

This technical report is based on a dissertation submitted May 2004 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Robinson College.

Technical reports published by the University of Cambridge Computer Laboratory are freely available via the Internet:

http://www.cl.cam.ac.uk/TechReports/

ISSN 1476-2986

Abstract

This thesis studies the problem of reducing code size produced by an optimizing compiler. We develop the Value State Dependence Graph (VSDG) as a powerful intermediate form. Nodes represent computation, and edges represent value (data) and state (control) dependencies between nodes. The edges specify a partial ordering of the nodes—sufficient ordering to maintain the I/O semantics of the source program, while allowing optimizers greater freedom to move nodes within the program to achieve better (smaller) code. Optimizations, both classical and new, transform the graph through graph rewriting rules prior to code generation. Additional (semantically inessential) state edges are added to transform the VSDG into a Control Flow Graph, from which target code is generated.

We show how procedural abstraction can be advantageously applied to the VSDG. Graph patterns are extracted from a program's VSDG. We then select repeated patterns giving the greatest size reduction, generate new functions from these patterns, and replace all occurrences of the patterns in the original VSDG with calls to these abstracted functions.

Several embedded processors have load- and store-multiple instructions, representing several loads (or stores) as one instruction. We present a method, benefiting from the VSDG form, for using these instructions to reduce code size by provisionally combining loads and stores before code generation.

The final contribution of this thesis is a combined register allocation and code motion (RACM) algorithm. We show that our RACM algorithm formulates these two previously antagonistic phases as one combined pass over the VSDG, transforming the graph (moving or cloning nodes, or spilling edges) to fit within the physical resources of the target processor.

We have implemented our ideas within a prototype C compiler and suite of VSDG optimizers, generating code for the Thumb 32-bit processor. Our results show improvements for each optimization and that we can achieve code sizes comparable to, and in some cases better than, those produced by commercial compilers with significant investments in optimization technology.

Contents

1 Introduction 15 1.1 Compilation and Optimization ...... 16 1.1.1 What is a Compiler? ...... 16 1.1.2 Intermediate Code Optimization ...... 17 1.1.3 The Phase Order Problem ...... 18 1.2 Size Reducing Optimizations ...... 18 1.2.1 Compaction and Compression ...... 19 1.2.2 Procedural Abstraction ...... 19 1.2.3 Multiple Memory Access Optimization ...... 19 1.2.4 Combined Code Motion and Register Allocation ...... 21 1.3 Experimental Framework ...... 21 1.4 Thesis Outline ...... 21

2 Prior Art 25 2.1 A Cornucopia of Program Graphs ...... 26 2.1.1 Control Flow Graph ...... 26 2.1.2 Data Flow Graph ...... 26 2.1.3 Program Dependence Graph ...... 27 2.1.4 Program Dependence Web ...... 27 2.1.5 Click’s IR ...... 27 2.1.6 Value Dependence Graph ...... 28 2.1.7 Static Single Assignment ...... 29 2.1.8 Gated Single Assignment ...... 30 2.2 Choosing a Program Graph ...... 31 2.2.1 Best Graph for Control Flow Optimization ...... 32 2.2.2 Best Graph for Loop Optimization ...... 32 2.2.3 Best Graph for Expression Optimization ...... 32 2.2.4 Best Graph for Whole Program Optimization ...... 33 2.3 Introducing the Value State Dependence Graph ...... 33 2.3.1 Control Flow Optimization ...... 33 2.3.2 Loop Optimization ...... 33

2.3.3 Expression Optimization ...... 33 2.3.4 Whole Program Optimization ...... 34 2.4 Our Approaches to Code Compaction ...... 34 2.4.1 Procedural Abstraction ...... 34 2.4.2 Multiple Memory Access Optimization ...... 36 2.4.3 Combining Register Allocation and Code Motion ...... 37 2.5 1000₂ Code Compacting Optimizations ...... 39 2.5.1 Procedural Abstraction ...... 39 2.5.2 Cross Linking ...... 40 2.5.3 Algebraic Reassociation ...... 41 2.5.4 Address Code Optimization ...... 41 2.5.5 Leaf Function Optimization ...... 42 2.5.6 Type Conversion Optimization ...... 42 2.5.7 ...... 43 2.5.8 Elimination ...... 44 2.6 Summary ...... 45

3 The Value State Dependence Graph 47 3.1 A Critique of the Program Dependence Graph ...... 48 3.1.1 Definition of the Program Dependence Graph ...... 48 3.1.2 Weaknesses of the Program Dependence Graph ...... 49 3.2 Graph Theoretic Foundations ...... 50 3.2.1 Dominance and Post-Dominance ...... 50 3.2.2 The Dominance Relation ...... 50 3.2.3 Successors and Predecessors ...... 51 3.2.4 Depth From Root ...... 52 3.3 Definition of the Value State Dependence Graph ...... 53 3.3.1 Node Labelling with Instructions ...... 54 3.4 Semantics of the VSDG ...... 57 3.4.1 The VSDG’s Pull Semantics ...... 57 3.4.2 A Brief Summary of Push Semantics ...... 60 3.4.3 Equivalence Between Push and Pull Semantics ...... 61 3.4.4 The Benefits of Pull Semantics ...... 63 3.5 Properties of the VSDG ...... 63 3.5.1 VSDG Well-Formedness ...... 63 3.5.2 VSDG Normalization ...... 64 3.5.3 Correspondence Between θ-nodes and GSA Form ...... 64 3.6 Compiling to VSDGs ...... 66 3.6.1 The LCC Compiler ...... 66 3.6.2 VSDG File Description ...... 67 3.6.3 Compiling Functions ...... 67 3.6.4 Compiling Expressions ...... 69 3.6.5 Compiling if Statements ...... 70 3.6.6 Compiling Loops ...... 71 3.7 Handling Irreducibility ...... 73 3.7.1 The Reducibility Property ...... 73

3.7.2 Irreducible Programs in the Real World ...... 76 3.7.3 Eliminating Irreducibility ...... 76 3.8 Classical Optimizations and the VSDG ...... 77 3.8.1 Dead Node Elimination ...... 78 3.8.2 Common Subexpression Elimination ...... 79 3.8.3 Loop-Invariant Code Motion ...... 81 3.8.4 Partial Redundancy Elimination ...... 81 3.8.5 Reassociation ...... 82 3.8.6 ...... 83 3.8.7 γ Folding ...... 83 3.9 Summary ...... 84

4 Procedural Abstraction via Patterns 85 4.1 Pattern Abstraction Algorithm ...... 86 4.2 Pattern Generation ...... 87 4.2.1 Pattern Generation Algorithm ...... 88 4.2.2 Analysis of Pattern Generation Algorithm ...... 88 4.3 Pattern Selection ...... 90 4.3.1 Pattern Cost Model ...... 91 4.3.2 Observations on the Cost Model ...... 91 4.3.3 Overlapping Patterns ...... 92 4.4 Abstracting the Chosen Pattern ...... 92 4.4.1 Generating the Abstract Function ...... 92 4.4.2 Generating Abstract Function Calls ...... 92 4.5 Summary ...... 93

5 Multiple Memory Access Optimization 95 5.1 Examples of MMA Instructions ...... 96 5.2 Simple Offset Assignment ...... 96 5.3 Multiple Memory Access on the Control Flow Graph ...... 97 5.3.1 Generic MMA Instructions ...... 98 5.3.2 Access Graph and Access Paths ...... 98 5.3.3 Construction of the Access Graph ...... 99 5.3.4 SOLVEMMA and Maximum Weight Path Covering ...... 100 5.3.5 The Phase Order Problem ...... 101 5.3.6 Scheduling SOLVEMMA Within A Compiler ...... 101 5.3.7 Complexity of Heuristic Algorithm ...... 102 5.4 Multiple Memory Access on the VSDG ...... 103 5.4.1 Modifying SOLVEMMA for the VSDG ...... 103 5.5 Target-Specific MMA Instructions ...... 104 5.6 Motivating Example ...... 104 5.7 Summary ...... 105

6 Resource Allocation 107 6.1 Serializing VSDGs ...... 108 6.2 Computing Liveness in the VSDG ...... 108 6.3 Combining Register Allocation and Code Motion ...... 109

6.3.1 A Non-Deterministic Approach ...... 109 6.3.2 The Classical Algorithms ...... 110 6.4 A New Register Allocation Algorithm ...... 111 6.5 Partitioning the VSDG ...... 112 6.5.1 Identifying if/then/else ...... 112 6.5.2 Identifying Loops ...... 112 6.6 Calculating Liveness Width ...... 114 6.6.1 Pass Through Edges ...... 115 6.7 Register Allocation ...... 115 6.7.1 Code Motion ...... 115 6.7.2 Node Cloning ...... 116 6.7.3 Spilling Edges ...... 116 6.8 Summary ...... 117

7 Evaluation 119 7.1 VSDG Framework ...... 119 7.2 Code Generation ...... 120 7.2.1 CFG Generation ...... 120 7.2.2 Register Colouring ...... 120 7.2.3 Stack Frame Layout ...... 120 7.2.4 ...... 120 7.2.5 Literal Pool Management ...... 121 7.3 Benchmark Code ...... 121 7.4 Effectiveness of the RACM Algorithm ...... 121 7.5 Procedural Abstraction ...... 122 7.6 MMA Optimization ...... 124 7.7 Summary ...... 125

8 Future Directions 127 8.1 Hardware Compilation ...... 127 8.2 VLIW and SuperScalar Optimizations ...... 128 8.2.1 Very Long Instruction Word Architectures ...... 129 8.2.2 SIMD Within A Register ...... 129 8.3 Instruction Set Design ...... 129

9 Conclusion 131

A Concrete Syntax for the VSDG 133 A.1 File Structure ...... 133 A.2 Visibility of Names ...... 134 A.3 Grammar ...... 135 A.3.1 Non-Terminal Rules ...... 136 A.3.2 Terminal Rules ...... 137 A.4 Parameters ...... 138 A.4.1 Node Parameters ...... 138 A.4.2 Edge Parameters ...... 139 A.4.3 Memory Parameters ...... 139

B Survey of MMA Instructions 141 B.1 MIL-STD-1750A ...... 141 B.2 ARM ...... 141 B.3 Thumb ...... 142 B.4 MIPS16 ...... 143 B.5 PowerPC ...... 143 B.6 SPARC V9 ...... 144 B.7 Vector Co-Processors ...... 144

C VSDG Tool Chain Reference 145 C.1 C Compiler ...... 145 C.2 Classical Optimizer ...... 147 C.3 Procedural Abstraction Optimizer ...... 147 C.4 MMA Optimizer ...... 148 C.5 Register Allocator and Code Scheduler ...... 148 C.6 Thumb Code Generator ...... 148 C.7 Graphical Output Generator ...... 149 C.8 Statistical Analyser ...... 149

Bibliography 151

List of Figures

1.1 A simplistic view of procedural abstraction ...... 20 1.2 Block diagram of VECC ...... 22

2.1 A Value Dependence Graph ...... 28 2.2 Example showing SSA-form for a single loop ...... 29 2.3 A Program Dependence Web ...... 31 2.4 Example showing cross-linking on a switch statement ...... 40 2.5 Reassociation of expression-rich code ...... 41

3.1 A VSDG and its dominance and post-dominance trees ...... 52 3.2 A recursive factorial function illustrating the key VSDG components ...... 54 3.3 Two different code schemes (a) & (b) map to the same γ-node structure . . . . 56 3.4 A θ-node example showing a for loop ...... 56 3.5 Pull-semantics for the VSDG ...... 58 3.6 Equivalence of push and pull semantics ...... 62 3.7 Acyclic theta node version of Figure 3.4 ...... 64 3.8 Why some nodes cannot be combined without introducing loops into the VSDG 65 3.9 An example of C function to VSDG function translation...... 68 3.10 VSDG of example code loop...... 74 3.11 Reducible and Irreducible Graphs ...... 75 3.12 Duff’s Device ...... 76 3.13 Node duplication breaks irreducibility ...... 77 3.14 Dead node elimination of VSDGs ...... 79

4.1 Two programs which produce similar VSDGs suitable for Procedural Abstraction 86 4.2 VSDGs after Procedural Abstraction has been applied to Figure 4.1 ...... 87 4.3 The pattern generation algorithm GenerateAllPatterns and support func- tion GeneratePatterns ...... 89

5.1 An example of the SOLVESOA algorithm ...... 97 5.2 Scheduling MMA optimization with other compiler phases ...... 102 5.3 The VSDG of Figure 5.1 ...... 104

5.4 A motivating example of MMA optimization ...... 105

6.1 Two different code schemes (a) & (b) map to the same γ-node structure . . . . 110 6.2 The locations of the five spill nodes associated with a θ-node...... 111 6.3 Illustrating the θ-region of a θ-node ...... 114 6.4 Node cloning can reduce register pressure by recomputing values ...... 116

A.1 Illustrating the VSDG description file hierarchy ...... 134

C.1 Block diagram of the VECC framework ...... 146

List of Tables

3.1 Comparison of PDG data-dependence edges and VSDG edges ...... 50

7.1 Performance of VECC with just the RACM optimizer ...... 122 7.2 Effect of Procedural Abstraction on program size ...... 123 7.3 Patterns and pattern instances generated by Procedural Abstraction ...... 124 7.4 Measured behaviour of MMA optimization on benchmark functions ...... 124

CHAPTER 1

Introduction

We are at the very beginning of time for the human race. It is not unreasonable that we grapple with problems. But there are tens of thousands of years in the future. Our responsibility is to do what we can, learn what we can, improve the solutions, and pass them on. RICHARD FEYNMAN (1918–1988)

Computers are everywhere. Beyond the desktop PC, embedded computers dominate our lives: from the moment our electronic alarm clock wakes us up; as we drive to work surrounded by micro-controllers in the engine, the lights, the radio, the heating, ensuring our safety through automatic braking systems and monitoring road conditions; to the workplace, where every modern appliance comes with at least one micro-controller; and when we relax in the evening, watching a film on our digital television, perhaps from a set-top box, or recorded earlier on a digital video recorder. And all the while, we have been carrying micro-controllers in our credit cards, watches, mobile phones, and electronic organisers.

Vital to this growth of ubiquitous computing is the embedded processor—a computer system hidden away inside a device that we would not otherwise call a computer, but perhaps a mobile phone, washing machine, or camera. Characteristics of their design include compactness, the ability to run on a battery for weeks, months or even years, and robustness¹.

Central to all embedded systems is the software that instructs the processor how to behave. Whereas the modern PC is equipped with many megabytes (or even gigabytes) of memory, embedded systems must fit inside ever-shrinking envelopes, limiting the amount of memory available to the system designer. Together with the increasing push for more features, the need for storage space for programs is at an increasing premium.

In this thesis, we tackle the code size issue from within the compiler. We examine the current state of the art in code size optimization, and present a new dependence-based program graph, together with three optimizations for reducing code size. We begin this introductory chapter with a look at the rôle of the compiler, and introduce our three optimization strategies—pattern-based procedural abstraction, multiple-memory access optimization, and combined register allocation and code motion.

¹While it is rare to see a kernel panic in a washing machine, it is a telling fact that software failures in embedded systems are now becoming more and more commonplace. This is a worrying trend as embedded processors take control of increasingly important systems, such as automotive engine control and braking systems.

1.1 Compilation and Optimization

Earlier we made mention of what is called a compiler, and in particular an optimizing compiler. In this section we develop these terms into a description of what a compiler is and does, and what we mean by optimizing.

1.1.1 What is a Compiler?

In the sense of a compiler being a person who compiles, the term compiler has been known since the 1300s. Our more usual notion of a compiler—a software tool that translates a program from one form to another form—has existed for little over half a century. For a definition of what a compiler is, we refer to Aho et al [6]:

A compiler is a program that reads a program written in one language—the source language—and translates it into an equivalent program in another language—the target language.

Early compilers were simple machines that did little more than macro expansion or direct translation; these exist today as assemblers, translating assembly instructions (e.g., “add r3,r1,r2”) into machine code (“0xE0813002” in ARM code).

Over time, the capabilities of compilers have grown to match the size of programs being written. However, Proebsting [90] suggests that while processors may be getting faster at the rate originally proposed by Moore [79], compilers are not keeping pace with them, and indeed seem to be an order of magnitude behind. When we say “not keeping pace” we mean that, where processors have been doubling in capability every eighteen months or so, the same doubling of capability in compilers seems to take around eighteen years!

This then leads to the question of what we mean by the capability of a compiler. Specifically, it is a measure of the power of the compiler to analyse the source program, and translate it into a target program that has the same meaning (does the same thing) but does it in fewer processor clock cycles (is faster) or in fewer target instructions (is smaller) than a naïve compiler. Improving the power of an optimizing compiler has many attractions:

Increase performance without changing the system Ideally, we would like to see an improvement in the performance of a system just by changing the compiler for a better one, without upgrading the processor or adding more memory, both of which incur some cost either in the hardware itself, or indirectly through, for example, higher power consumption.

More features at zero cost We would like to add more features (i.e., software) to an embedded program. But this extra software will require more memory to store it. If we can reduce the target code size by upgrading our compiler, we can squeeze more functionality into the same space as was used before.

Good programmers know their worth The continual demand for more software, sooner, drives the need for more programmers to design and implement the software. But the number of good programmers who are able to produce fast or compact code is limited, leading technology companies to employ average-grade programmers and rely on compilers to bridge (or at the very least, reduce) this ability gap.

Same code, smaller/faster code One mainstay of software engineering is code reuse, for two good reasons. Firstly, it takes time to develop and test code, so re-using existing components that have proven reliable reduces the time necessary for modular testing. Secondly, the time-to-market pressures mean there just is not the time to start from scratch on every project, so reusing software components can help to reduce the development time, and also reduce the development risk. The problem with this approach is that the reused code may not meet the desired time or space requirements of the project. So it becomes the compiler’s task to transform the code into a form that meets the requirements.

CASE in point Much of today’s embedded software is automatically generated by computer-aided software engineering (CASE) tools, widely used in the automotive and aerospace industries, and becoming more popular in commercial software companies. They are able to abstract low-level details away from the programmers, allowing them to concentrate on the product functionality rather than the minutiæ of coding loops, state machines, message passing, and so on. In order to make these tools as generic as possible, they typically emit C or C++ code as their output. Since these tools are primarily concerned with simplifying the development process rather than producing fast or small code, their output can be large, slow, and look nothing like any software that a programmer might produce.

In some senses the name optimizing compiler is misleading, in that the optimal solution is rarely achieved on a global scale, simply due to the complexity of analysis. A simplified model of optimization is:

Optimization = Analysis + Transformation.

Analysis identifies opportunities for changes to be made (to instructions, to variables, to the structure of the program, etc); transformation then changes the program as directed by the results of the analysis. Some analyses are intractable; e.g., optimal register allocation via graph colouring [25] is NP-complete for three or more physical registers (3-SAT [46] can be reduced to it). In practice heuristics (often tuned to a particular target processor) are employed to produce near-optimal solutions. Other analyses are simply too expensive, and so either less powerful analyses must be used, or only those whose cost is high locally but low globally.
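To make the Analysis + Transformation split concrete, here is a deliberately tiny constant-folding example in C. It is our own illustration, not drawn from the thesis; the function names are hypothetical.

    /* Analysis: both operands of the multiplication are compile-time
     * constants.  Transformation: rewrite the expression as its value. */
    int seconds_per_hour(void)
    {
        return 60 * 60;     /* analysis finds a constant expression... */
    }

    /* ...so after transformation the compiler emits the equivalent of: */
    int seconds_per_hour_folded(void)
    {
        return 3600;
    }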

1.1.2 Intermediate Code Optimization

Optimizations applied at the intermediate code level are appealing for three reasons:

1. Intermediate code statements are semantically simpler than source program statements, thus simplifying analysis.

2. Intermediate code has a normalizing effect on programs: different source code produces the same, or similar, intermediate code.

3. Intermediate code tends to be uniform across a number of target architectures, so the same optimization algorithm can be applied to a number of targets.

This thesis introduces the Value State Dependence Graph (VSDG). It is a graph-based intermediate language building on the ideas presented in the Value Dependence Graph [112]. Our implementation is based on a human-readable text-based graph description language, on which a variety of optimizers can be applied.

1.1.3 The Phase Order Problem

One important question that remains to be solved is the so-called Phase Order Problem, which can be stated as “In which order do we apply a number of optimizations to the program to achieve the greatest benefit?”. The problem extends to consider such transformations as register allocation and instruction scheduling. The effect of this problem is illustrated in the following code:

    (i)            (ii)           (iii)
    a := b;        ld r1, b       ld r1, b
    c := d;        st r1, a       ld r2, d
                   ld r1, d       st r1, a
                   st r1, c       st r2, c

The original code (i) makes two reads (of variables b and d) and two writes (to a and c). If we do minimal register allocation first, the result is sequence (ii), needing only one target register, r1. The problem with this code sequence is that there is a data dependency between the first and the second instructions, and between the third and the fourth instructions. On a typical pipelined processor this will result in pipeline stalls, with a corresponding reduction in throughput. However, if we reverse the phase order, so that instruction scheduling comes before register allocation, then schedule (iii) is the result. Now there are no data dependencies between pairs of instructions, so the program will run faster, but the register allocator has used two registers (r1 and r2) for this sequence. Moreover, this sequence might force the register allocator to introduce spill code² in other parts of the program if there were insufficient registers available at this point in the program.

1.2 Size Reducing Optimizations

This thesis presents three optimizations for compacting embedded systems target code: pattern-based procedural abstraction, multiple-memory access optimization, and combined code motion and register allocation. All three are presented as applied to programs in VSDG form.

²Load and store instructions inserted by the compiler that spill registers to memory, and then reload them when the program needs to use the previously spilled values. This has two undesirable effects: it increases the size of the program by introducing extra instructions, and it increases the register–memory traffic, with a corresponding reduction in execution speed.

1.2.1 Compaction and Compression

It is worth highlighting the differences between compaction and compression of program code. Compaction transforms a program, P, into another program, P′, where |P′| < |P|. Note that P′ is still directly executable by the target processor—no preprocessing is required to execute P′. We say that the ratio of |P′| to |P| is the Compaction Ratio:

    Compaction Ratio = (|P′| / |P|) × 100%.

Compression, on the other hand, transforms P into one or more blocks of non-executable data, Q. This then requires runtime decompression to turn Q back into P (or a part of P) before the target processor can execute it. This additional step requires both time (to run the decompressor) and space (to store the decompressed code). Hardware decompression schemes, such as that used by ARM’s Thumb processor, reduce the process of decompression to a simple translation function (e.g., table look-up). This has predictable performance, but fixes the granularity of (de)compression to individual instructions or functions (e.g., the ARM Thumb executes either 16-bit Thumb instructions or 32-bit ARM instructions, specified on a per-function basis).
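For illustration only (this is not code from VECC, and the sizes are hypothetical), the Compaction Ratio above can be computed directly:

    /* A minimal C sketch of the Compaction Ratio defined above. */
    #include <stdio.h>

    double compaction_ratio(double original_size, double compacted_size)
    {
        /* Compaction Ratio = |P'| / |P| * 100% */
        return 100.0 * compacted_size / original_size;
    }

    int main(void)
    {
        /* Hypothetical example: a 10000-byte program compacted to
         * 8500 bytes gives a compaction ratio of 85%. */
        printf("%.1f%%\n", compaction_ratio(10000.0, 8500.0));
        return 0;
    }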

1.2.2 Procedural Abstraction

Procedural abstraction reduces a program’s size by placing common code patterns into compiler-generated functions, and replacing all occurrences of the patterns with calls to these functions (see Figure 1.1). Clearly, the more occurrences of a given pattern that can be found and replaced with function calls, the greater the reduction in code size.

However, the cost model for procedural abstraction is not simple. As defined by a given target’s procedure calling standard, functions can modify some registers, while preserving others across the call³. Thus at each point in the program where a function call is inserted there can be greater pressure on the register allocator, with a potential increase in spill code.

There are two significant advantages to applying procedural abstraction on VSDG intermediate code. Firstly, the normalizing effect the VSDG has on common program structures increases the number of occurrences of a given pattern that can be found within a program. Secondly, operating at the intermediate level rather than the lower target code levels avoids much of the “noise” (i.e., trivial variations) introduced by later phases in the compiler chain, especially with respect to register assignment and instruction selection and scheduling, which can reduce the number of occurrences of a pattern.
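The following hand-written C fragment sketches the idea at source level; it is our own illustration (the functions and the abstracted-function name abs_0 are hypothetical, not taken from the thesis benchmarks). A fragment shared by two functions is hoisted into a new function, and each occurrence becomes a call.

    /* Before: both functions contain the same scale-and-clamp fragment. */
    int brightness(int pixel, int gain)
    {
        int v = (pixel * gain) >> 8;
        if (v > 255) v = 255;              /* common fragment */
        return v;
    }

    int contrast(int pixel, int slope)
    {
        int v = (pixel * slope) >> 8;
        if (v > 255) v = 255;              /* common fragment */
        return v - 128;
    }

    /* After abstraction: one compiler-generated function, two calls. */
    static int abs_0(int x, int k)
    {
        int v = (x * k) >> 8;
        if (v > 255) v = 255;
        return v;
    }

    int brightness2(int pixel, int gain) { return abs_0(pixel, gain); }
    int contrast2(int pixel, int slope)  { return abs_0(pixel, slope) - 128; }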

1.2.3 Multiple Memory Access Optimization

Many microprocessors have instructions which load or store two or more registers. These multiple memory access (MMA) instructions can replace several memory access instructions with a single MMA instruction. Some forms of these instructions encode the working registers as a bit pattern within the instruction; others define a range of contiguous registers, specifying the start and end registers.

A typical example is the ARM7 processor: it has ‘LDM’ load-multiple and ‘STM’ store-multiple instructions which, together with a variety of pre- and post-increment and -decrement

³For example, the ARM procedure calling standard defines R0–R3 as argument registers, while R4–R11 must be preserved across calls.


Figure 1.1: Original program (left) has common code sequences (shown in dashed boxes). After abstraction, the resulting program (right) has fewer instructions.

addressing modes, can load or store one or more of its sixteen registers. The working registers are encoded as a bitmap within the instruction, scanning the bitmap from the lowest bit (representing R0) to the highest bit (R15). Effective use of this instruction can save up to 480 bits of code space⁴.

This thesis describes the SOLVEMMA algorithm as a way of using MMA instructions to reduce code size. MMA optimization can be applied to both source-defined loads and stores (e.g., array or struct accesses), or spill code inserted by the compiler during register allocation. In the first case, the algorithm is constrained by the programmer’s expectation of treating global memory as a large struct, with each global variable at a known offset from its schematic neighbour⁵. The algorithm can only combine loads from, or stores to, contiguous blocks where the variables appear in order. In addition, the register allocator can bias allocations which promote combined loads and stores.

The second case—local variables and compiler-generated temporaries—provides a greater degree of flexibility. The algorithm defines the order of temporary variables on the stack to maximise the use of MMA instructions. This is beneficial for two reasons: many load and store instructions are generated from register spills, so giving the algorithm a greater degree of freedom will have a greater benefit; and as spills are invisible to the programmer the compiler can infer a greater degree of information about the use of spill code (e.g., its address is never taken outside the enclosing function).

⁴Sixteen separate loads (or stores) would require 512 bits, but only 32 bits for a single LDM or STM.

⁵One could argue that since such behaviour is not specified in the language the compiler should be free to do what it likes with the layout of global variables. Sadly, such an expectation does exist, and programmers complain if the compiler does not honour it.
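As an illustration of the opportunity (our own example; the register assignment and instruction spellings are merely indicative of ARM/Thumb-style code, not VECC output), copying four adjacent words can collapse eight memory instructions into two:

    struct quad { int a, b, c, d; };

    void copy_quad(struct quad *dst, const struct quad *src)
    {
        /* Naive code issues four loads and four stores:
         *     ldr r2, [r1, #0]        str r2, [r0, #0]
         *     ldr r3, [r1, #4]        str r3, [r0, #4]
         *     ldr r4, [r1, #8]        str r4, [r0, #8]
         *     ldr r5, [r1, #12]       str r5, [r0, #12]
         * With load- and store-multiple the same effect is, e.g.:
         *     ldmia r1, {r2-r5}
         *     stmia r0, {r2-r5}
         */
        dst->a = src->a;
        dst->b = src->b;
        dst->c = src->c;
        dst->d = src->d;
    }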

1.2.4 Combined Code Motion and Register Allocation

The third technique presented in this thesis for compacting code is a method of combining two traditionally antagonistic compiler phases: code motion and register allocation. We distinguish between register allocation—transforming the program such that at every point of execution there are guaranteed to be sufficient registers to ensure assignment—and register assignment—the process of assigning physical registers to the virtual registers in the intermediate graph. We present our Register Allocation and Code Motion (RACM) algorithm, which aims to reduce register pressure (i.e., the number of live values at any given point) firstly by moving code (code motion), secondly by live-range splitting (code cloning), and thirdly by spilling.

This optimization is applied to the VSDG intermediate code, which greatly simplifies the task of code motion. Data (value) dependencies are explicit within the graph, and so moving an operation node within the graph ensures that all relationships with dependent nodes are maintained. Also, it is trivial to compute the live range of variables (edges) within the graph; computing register requirements at any given point (called a cut) within the graph is a matter of enumerating all of the edges (live variables) that are intersected by that cut.
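A minimal sketch of that cut-based measure, with hypothetical types and no claim to match the thesis implementation: given the producer and consumer "depths" of each value edge, the register requirement at a cut is simply the number of edges crossing it.

    #include <stddef.h>

    /* One value edge of the graph, summarised by the depths of its
     * producing node and its (deepest) consuming node. */
    typedef struct {
        int producer_depth;
        int consumer_depth;
    } value_edge;

    /* Count the edges whose live range straddles the cut: produced at or
     * above the cut, consumed below it. */
    size_t liveness_width(const value_edge *edges, size_t n, int cut_depth)
    {
        size_t width = 0;
        for (size_t i = 0; i < n; i++) {
            if (edges[i].producer_depth <= cut_depth &&
                edges[i].consumer_depth >  cut_depth)
                width++;
        }
        return width;
    }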

1.3 Experimental Framework

The VSDG Experimental C Compiler (VECC) is outlined in Figure 1.2. Source files are compiled into VSDG form by our experimental C compiler, based on the LCC [42] compiler. We use the front-end components, replacing LCC’s intermediate language generation functions with code to emit VSDGs. The only optimizations performed by the front end are those mandated by the C standard (e.g., folding of constant expressions) and trivial and address arithmetic optimizations. The VSDG graph description language provides module-level and function-level scoping, such that linking together separate compilation units is reduced to concatenating multiple VSDG files.

A variety of optimizers are then applied to the program whilst in the VSDG intermediate form. The final stage translates the optimized VSDGs into target code, which in this thesis is for ARM’s Thumb processor. Optimizers read in a VSDG file, perform some transformation on the VSDGs, and write the modified program as another VSDG file. Using UNIX pipes we are able to construct any sequence of optimizers directly from the command line, providing similar power to CoSy’s ACE [4] but without its complexity.

1.4 Thesis Outline

The remainder of this thesis is organized into the following chapters. Chapter 2 looks at the many and varied approaches to reducing code size that have been proposed in the last thirty-odd years, examines their strengths and weaknesses, and considers how they might interact with each other, either supportively or antagonistically.

Chapter 3 formally introduces the VSDG. Initially developed as an exploratory tool, the VSDG has become a useful and powerful intermediate representation. It is based on a functional data-dependence paradigm (rather than control-flow) with explicit state edges representing the monadic-like system state.


Figure 1.2: Block diagram of the experimental framework, VECC, showing C compiler, standard libraries (pre-compiled source), optimizer sequence (dependent on experiment) and target code generator.

It has several important properties, most notably a powerful normalizing effect, and is somewhat simpler than prior representations in that it under-specifies the program structure, while retaining sufficient structure to maintain the I/O semantics. We also present CCS-style pull semantics and show, through an informal bisimulation, equivalence to traditional push semantics⁶.

Chapter 4 examines the application of pattern matching techniques to the VSDG for procedural abstraction. We make a clear distinction between procedural abstraction, as applied to the VSDG, and other code factoring techniques (tail merging, cross-linking, etc).

Chapter 5 describes the use of multiple-load and -store instructions for reducing code size. We show that systematic use of MMA instructions can reduce target code size by combining loads or stores into single MMA instructions, and show that our SOLVEMMA algorithm never increases code size.

In Chapter 6 we show how combining register allocation and instruction scheduling as a single pass over the VSDG both reduces the effects of the phase-ordering problem, and results in a simpler algorithm for resource allocation (where we define resources as both time—instruction scheduling—and space—register allocation).

Chapter 7 presents experimental evidence of the effectiveness of the work presented in this thesis, by applying the VSDG-based optimizers to benchmark code. The results of these experiments show that the VSDG is a powerful and effective data structure, and provides a framework in which to explore code space optimizations.

Chapter 8 discusses future directions and applications of the VSDG to both software and hardware compilation. Finally, Chapter 9 concludes.

⁶Think of values as represented by tokens: push semantics describes producers pushing the tokens around the graph, while pull semantics describes tokens being pulled (demanded) by consumers.

CHAPTER 2

Prior Art

To acquire knowledge, one must study; but to acquire wisdom, one must observe. MARILYN VOS SAVANT (1946–)

Interest in compact representations of programs has been the subject of much research, be it target code for direct execution on a processor or high-level intermediate code for execution on a virtual machine. Most of this research can be split into two areas: the development of intermediate program graphs, and analyses and transformations on these graphs¹.

This chapter examines both areas of research. We begin with a review of the more popular program representation graphs, and for four areas of optimization we choose among those presented. For the same four optimizations we briefly describe how they are supported in our new Value State Dependence Graph. We then compare the three techniques developed in this thesis—procedural abstraction, multiple memory access optimization, and combined register allocation and code motion—with comparable approaches proposed by other compiler researchers. Finally, we present a selection of favourable optimization algorithms that either directly or indirectly produce compact code.

¹It should be noted that considerably more effort has been put into making programs faster rather than smaller. Fortunately, many of these optimizations also benefit code size, such as fitting inner loops into instruction caches.

2.1 A Cornucopia of Program Graphs

There have been many program graphs presented in the literature. Here we review the more prominent ones, and consider their individual strengths and weaknesses.

2.1.1 Control Flow Graph

The Control Flow Graph (CFG) [6] is perhaps the oldest program graph. The basis of the traditional flowchart, each node in the CFG corresponds to a linear block of instructions such that if one instruction executes then all execute, with a unique initial instruction, and with the last instruction in the block being a (possibly predicated) jump to one or more successor blocks. Edges represent the flow of control between blocks. A CFG represents a single function with a unique entry node, and zero or more exit nodes.

The CFG has no means of representing inter-procedural control flow. Such information is separately described by a Call Graph. This is a directed graph with nodes representing functions, and an edge (p, q) if function p can call function q; cycles are permitted. Note that there is no notion of sequence in the call graph, only that on any given trace of execution function p may call function q zero or more times.

The CFG is a very simple graph, presenting an almost mechanical view of the program. It is trivial to compute the set of next instructions which may be executed after any given instruction—in a single block the next instruction is that which follows the current instruction; after a predicate the set of next instructions is given by the first instruction of the blocks at the tails of the predicated edges. Being so simple, the CFG is an excellent graph for both control-flow-based optimizations (e.g., unreachable code elimination [6] or cross-linking [114]) and for generating target code, whose structure is almost an exact duplicate of the CFG. However, other than the progress of the program counter, the CFG says nothing about what the program is computing.

2.1.2 Data Flow Graph

The Data Flow Graph (DFG) is the companion to the CFG: nodes still represent instructions, but with edges now indicating the flow of data from the output of one data operation to the input of another. A partial order on the instructions is such that an instruction can only execute once all its input data values are available. The instruction computes a new value which propagates along the outward edges to other nodes in the DFG.

The DFG is state-less: it says nothing about what the next instruction to be executed is (there is no concept of the program counter as there is in the CFG). In practice, both the CFG and the DFG can be computed together to support a wider range of optimizations: the DFG is used for dead code elimination (DCE) [94], constant folding, common subexpression elimination (CSE) [7], etc. Together with the CFG, live range analysis [6] determines when a variable becomes live and where it is last used, with this information being used during register allocation [25].

Separating out the control- and data-flow information is not a good thing. Firstly, there are now two separate data structures to manage within the compiler—changes to the program require both graphs to be updated, with seemingly trivial changes requiring considerable effort in regenerating one or both of the graphs (e.g., ). Secondly, analysis of the program must process two data structures, with very little commonality between the two. And thirdly, any relationship between control-flow and data-flow is not expressed, but is split across the two data structures.

2.1.3 Program Dependence Graph

The Program Dependence Graph (PDG) [39] is an attempt to combine the CFG and the DFG. Again, nodes represent instructions, but now there are edges to represent the essential flow of control and data within the program. Control dependencies are derived from the usual CFG, while data dependencies represent the relevant data flow relationships between instructions. There are several advantages to this combined approach: many optimizations can now be performed in a single walk of the PDG; there is now only one data structure to maintain within the compiler; and optimizations that would previously have required complex analysis of the CFG and DFG are more easily achieved (e.g., vectorization [16]).

But this tighter integration of the two types of flow information comes at a cost. The PDG (and one has also to say whose version of the PDG one is using: e.g., the original Ferrante et al PDG [39], Horwitz et al’s PDG [55], or the System Dependence Graph [56] which extends the PDG to incorporate collections of procedures) is a multigraph, with typically six different types of edges (control, output, anti, loop-carried, loop-independent, and flow). Merge nodes within the PDG make some operations dependent on their location within the PDG.

The Hierarchical Task Graph (HTG) [48] is a similar structure to the PDG. It differs from the PDG in constructing a graph based on a hierarchy of loop structures rather than the general control-dependence structure of the PDG. Its main focus is a more general approach to synchronization between data dependencies, resulting in a potential increase in parallelism.

2.1.4 Program Dependence Web

The Program Dependence Web (PDW) [14] is an augmented PDG. Construction of the PDW follows on from the construction of the PDG, replacing data dependencies by Gated Single Assignment form (Section 2.1.8).

PDWs are costly to generate—the original presentation requires five passes over the PDG to generate the corresponding PDW, with time complexity of O(N³) in the size of the program. The PDW restricts the control-flow structure to reducible flow graphs [58], spending considerable effort in determining the control-flow predicates for gated assignments.

2.1.5 Click’s IR

Click and Paleczny’s Intermediate Representation (IR) [27] is an interesting variation of the PDG. They define a model of execution based on Petri nets, where control tokens move from node to node as execution proceeds. Their IR can be viewed as two subgraphs—the control subgraph and the data subgraph—which meet at their PHI-nodes and IF-nodes (comparable to φ-functions of SSA-form, described below). The design and implementation of this IR is focused on simplicity and speed of compilation. Having explicit control edges solves the VDG’s (described below) problem of not preserving the terminating properties of programs.

(a) Source code:

    c = 10;
    x = 0;
    y = 1;
    do {
        c = c - 1;
        x = x + 1;
        y = y << 1;
    } while (c != 0);
    print(c, x, y);

(b) [Graph rendering of the corresponding Value Dependence Graph omitted; see caption.]

Figure 2.1: A Value Dependence Graph for the function (a). Note especially that the loop is modelled as a recursive call (think of a λ-abstraction). The supposed advantage of treating loops as functions is that only one mechanism is required to transform both loops and functions. However, the result is one large and complex mechanism, rather than two simpler mechanisms for handling loops and functions separately.

2.1.6 Value Dependence Graph

The Value Dependence Graph (VDG) [112] inverts the sense of the dependency graphs presented so far. In the VDG there is an edge (p, q) if the execution of node p depends on the result of node q, whereas the previous dependence graphs would say that data flows from q to p. The VDG uses γ-nodes to represent selection, with an explicit control dependency whose value determines which of the guarded inputs is evaluated. The VDG uses λ-nodes to represent both functions and loop bodies, where loop bodies are seen as a call to a tail-recursive function, thereby representing loops and functions as one abstraction mechanism. Figure 2.1 shows an example VDG with a loop and function call to illustrate this feature.

A significant issue with the VDG is that of failing to preserve the terminating properties of a program—“Evaluation of the VDG may terminate even if the original program would not...” [112]. Another significant issue with the VDG is the process of generating target code from the VDG. The authors describe converting the VDG into a demand-based Program Dependence Graph (dPDG)—a normal PDG with additional edges representing demand dependence—then converting that into a traditional CFG before finally generating target code from the CFG. However, it seems that no further progress was made on this².

²Discussion with the original authors indicates this was due to changing commercial pressures rather than insurmountable problems.

     (a)                       (b)

(1)  c = 10;                   c1 = 10;
(2)  x = 0;                    x1 = 0;
(3)  y = 1;                    y1 = 1;
(4)  do {                      do {
(5)                              c2 = φ(c1, c3);
(6)                              x2 = φ(x1, x3);
(7)                              y2 = φ(y1, y3);
(8)    c = c - 1;                c3 = c2 - 1;
(9)    x = x + 1;                x3 = x2 + 1;
(10)   y = y << 1;               y3 = y2 << 1;
(11) } while (c != 0);         } while (c3 != 0);
(12) print(c, x, y);           print(c3, x3, y3);

Figure 2.2: (a) Original and (b) SSA-form code for discussion. The φ-functions maintain the single assignment property of SSA-form. The suffixes in (b) make each variable uniquely assigned while maintaining a relationship with the original variable name.

2.1.7 Static Single Assignment

A program is said to be in Static Single Assignment form (SSA) [7] if, for each variable in the program, there is exactly one assignment statement for that variable. This is achieved by replacing each assignment to a variable with an assignment to a new unique variable.

SSA-form is not strictly a program graph in its own right (unlike, say, the PDG). It is a transformation applied to a program graph, changing the names of variables in the graph (usually by adding a numerical suffix), and inserting φ-functions into the graph at control-flow merge points³.

SSA-form has properties which aid data-flow analysis of the program. It can be efficiently computed from the CFG [32] or from the Control Dependence Graph (CDG) [33], and it can be incrementally maintained [31] during optimization passes. Many classical optimizations are simplified by SSA-form, due in part to the properties of SSA-form obviating the need to generate definition-use chains (described below). This greatly simplifies and enhances optimizations including constant propagation, strength reduction and partial redundancy elimination [80].

Two important points of SSA-form are shown in Figure 2.2. In order to maintain the single assignment property of SSA-form we insert φ-functions [31] into the program at points where two or more paths in the control-flow graph meet—in Figure 2.2 this is at the top of the do...while loop. The φ-function returns the argument that corresponds to the edge that was taken to reach the φ-function; for the variable c the first edge corresponds to c1 and the second edge to c3. The second point is that while there is only one assignment to a variable, that assignment can be executed zero or more times at runtime, i.e., dynamically. For example, there is only one statement that assigns to variable c3, but that statement is executed ten times.

For a program in SSA-form data-flow analysis becomes trivial.

³Appel [8] refers to this as the ‘magic trick’ of SSA-form.

While it would be fair to say that SSA-form by itself does not provide any new optimizations, it does make many optimizations—DCE, CSE, loop-invariant code motion, and so on—far easier. For example, Alpern et al [7] invented SSA-form to improve on value numbering, a technique used extensively for CSE.

Another data structure previously non-trivial to compute is the definition-use (def-use) chain [6]. A def-use chain is a set of uses S of a variable, x say, such that there is no redefinition of x on any path between the definition of x and any element of S. In SSA-form this is trivial: with exactly one definition of x there can be no redefinition of x (if there were, the program would not be in SSA-form), and so all uses of x are in the def-use chain for x. For example, in Figure 2.2 variable c3 is defined in line 8 and used in lines 5, 11 and 12.

2.1.8 Gated Single Assignment

Originally formulated as an intermediate step in forming the PDW [14], Gated SSA-form (GSA) is generated from a CFG in SSA-form, replacing φ-functions with gating (γ-) functions. The γ-functions turn the non-interpretable SSA-form into the directly interpretable GSA-form. The gating functions combine SSA-form φ-functions with explicit control flow edges. For example, in Figure 2.2 the φ-function in line 5, φ(c1, c3), is replaced with γ(c3 != 0, c1, c3), the first argument being the control condition for choosing between c1 and c3.

A refinement of GSA-form was proposed by Havlak: Thinned GSA-form (TGSA) [53]. The thinned form uses fewer γ-functions than the original GSA-form, reducing the amount of work in maintaining the program graph. The formulation of TGSA-form relies on the input CFG being reducible [58]. Irreducible CFGs can be converted to reducible ones through code duplication or additional Boolean variables.

In both GSA- and TGSA-forms, loops are constructed from three nodes: a µ-function to control when control flow enters a loop body, an η-function (with both true and false variants) to determine when values leave a loop body, and a non-deterministic merge gate to break cyclic dependencies between predicates and µ-functions. For example, the GSA-form of the function of Figure 2.2(b) is shown in Figure 2.3.

The µ-function has three inputs: ρ, v_init and v_iter. The initial value of the µ-function is consumed from the v_init input; while the predicate input, ρ, is True the µ-function returns its value and then consumes its next value from the v_iter input; when ρ becomes False, the µ-function does not consume any further inputs, its value being that of the last consumed input value. Hence values are represented as output ports of nodes, and follow the dataflow model of being produced and consumed.

There are two kinds of η-function. The η_T(ρ, v) node takes a loop predicate ρ and a value v, and returns v if the predicate is True, otherwise merely consuming v. The behaviour of the η_F-function is similar, for a False predicate.

Finally, the non-deterministic merge gate, shown as ⊗ in Figure 2.3, breaks the cyclic dependencies between loop predicates and the controlling µ-function. In the example, a True is merged into the loop predicate, LP, enabling the loop body to execute at least once, with subsequent iterations computing the next value of LP.

GSA-form inherits the advantages of SSA-form (namely, improving the runtime performance of many optimizations), while the addition of explicit control information (transforming φ-functions to γ-nodes) improves the effectiveness of control-flow-based optimizations (e.g., conditional constant propagation and unreachable code elimination).


Figure 2.3: A Program Dependence Web for the function of Figure 2.2(a). The initial values enter the loop through the µ nodes, and after updating are used to compute the next iteration of the loop. The three inputs to the µ nodes are ρ, v_init and v_iter respectively. If the predicate LP is False, the η_F nodes propagate the values of the loop variables out of the loop; the µ nodes also monitor the predicate, and stop when it is False. The bold arrows to the left of the nodes indicate the control dependence (CD) of that node. The nodes within the loop body are control dependent on the predicate LP being True, which is initialised by injecting a True value into LP through the merge node (⊗).

2.2 Choosing a Program Graph

The previous section outlined the more widely-known program graphs, and there are many more that have been developed, and doubtless many more yet to come. But when faced with such a large number of choices, which one is the best? In this section we consider four definitions of best, and choose one of the graphs from Section 2.1 which best fits that definition. Each choice is based on the following metrics:

Analysis Cost How much effort is required in analysing the program graph to determine some property about some statement or value;

Transformation Cost How much effort is required to transform the program graph, based on the result of the preceding analysis;

Maintenance Cost After transformation, how much extra effort is required in preserving the properties of (i.e., maintaining) the program graph.

The first two metrics are derived from the informal notion that

Optimization = Analysis + Transformation.

The third metric encapsulates the wider context of a program graph within a compiler: not only must we analyse the program graph to identify the opportunity for, and then perform, a transformation, we must also maintain the properties (whatever they may be) of the chosen program graph for subsequent optimization passes.

2.2.1 Best Graph for Control Flow Optimization

The category of control flow optimization includes unreachable code elimination, block re-ordering, tail merging, cross-linking/jumping, and partial redundancy elimination. All of these are chiefly concerned with the progress of the Program Counter, i.e., which statement is next to be executed.

The obvious candidate is, naturally, the CFG. It captures the essential flow of control within a program, with edges representing potentially non-incremental changes to the Program Counter, such as might be taken after a conditional test or the backwards branch at the end of a loop. By not considering data flows or dependencies, the CFG is both fast to analyse and transform, and has low maintenance costs associated with it.

2.2.2 Best Graph for Loop Optimization

There are so many optimizations aimed at loops—fusion, peeling, reversal, elimination, and unrolling, to name a few (see Bacon et al [11] for more details on a range of loop optimizations)—that choosing one program graph that best supports loop optimization is difficult. However, the majority of these optimizations require a combination of control- and data-flow information, which simplifies the choice.

The PDG would be a reasonable choice, combining both control- and data-flow, and having explicit loop-aware edges. The PDW is also a good choice, with its explicit loop operators (µ, η and ⊗). The VDG treats loops as functions which, on the one hand, means that optimizations for loops are automatically applicable to functions as well, but on the other hand complicates the design of optimizers, which must now handle loops and functions, with no distinction between the two.

2.2.3 Best Graph for Expression Optimization

Expression optimizations include algebraic reassociation, common subexpression elimination, strength reduction, and constant folding. Clearly, a graph that focuses on data (or values) will provide the best support for these optimizations.

The CFG is certainly not the best graph for expression optimization. The combined graphs (PDG, PDW and Click’s IR) could be used, but the presence of the control-flow information (which must also be maintained during optimization) can complicate expression optimization on these graphs. The two graphs that would be good for expression optimization are the DFG and the VDG. Both graphs are primarily concerned with data (values). Excluding γ-nodes, both graphs are almost identical, save for the direction of the edges.

2.2.4 Best Graph for Whole Program Optimization

There are few graphs which truly reflect the whole program in a single graph. With the exception of the SDG, all the graphs presented here are intraprocedural, i.e., they describe the behaviour of single functions, without consideration of other functions within the program.

The best candidate currently is the SDG (the extended form of the PDG with edges to connect caller arguments with callee parameters, and vice versa for return values). By representing the global flow of control and data within a multi-function program, the SDG supports such global optimizations as global constant propagation, function inlining, global register allocation, and global code motion.

2.3 Introducing the Value State Dependence Graph

The previous section chose from the graphs presented in Section 2.1 those that would be suitable for four goals of program optimization. However, the basis of this thesis is the Value State Dependence Graph (described fully in the next chapter). For the same four optimization goals, we describe briefly how this new graph is a suitable candidate.

2.3.1 Control Flow Optimization

The VSDG does not explicitly identify the flow of control within a function. Instead, it describes essential sequential dependencies between I/O nodes, loops, and the enclosing function, and all other information is solely concerned with the flow of data. In essence, there are no explicit basic blocks within the VSDG.

Of the optimizations listed in Section 2.2.1, block re-ordering, tail merging and cross-linking have no direct equivalents in the VSDG. Unreachable code is readily computed by walking the value and state edges in the VSDG from the function exit node, marking all reachable nodes, and then deleting all the unmarked nodes. Partial redundancy elimination requires additional analysis to identify partially redundant expressions, and this can be incorporated into CSE optimization on the VSDG.
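A minimal sketch of that reachability walk, with hypothetical node types (this is not the thesis implementation):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct vsdg_node {
        struct vsdg_node **deps;   /* targets of this node's value/state edges */
        size_t             n_deps;
        bool               marked;
    } vsdg_node;

    /* Mark every node reachable from the function's exit node by following
     * value and state edges; anything left unmarked is dead. */
    static void mark_reachable(vsdg_node *n)
    {
        if (n == NULL || n->marked)
            return;
        n->marked = true;
        for (size_t i = 0; i < n->n_deps; i++)
            mark_reachable(n->deps[i]);
    }

    /* Returns the number of dead (unreachable) nodes; actual deletion of
     * those nodes is elided here. */
    size_t count_dead_nodes(vsdg_node *exit_node,
                            vsdg_node **all_nodes, size_t n_nodes)
    {
        mark_reachable(exit_node);
        size_t dead = 0;
        for (size_t i = 0; i < n_nodes; i++)
            if (!all_nodes[i]->marked)
                dead++;
        return dead;
    }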

2.3.2 Loop Optimization

The VSDG distinguishes between functions and loops, unlike the VDG. Also, like the PDW, loops are explicitly identified with loop entry and exit nodes. The VSDG differs from many program graphs in that, initially, it does not specify whether loop-invariant nodes are placed inside or outside loops; this decision is left to later phases, based on specific optimization goals (e.g., placing loop-invariant code inside a loop can reduce register pressure over the loop body, potentially reducing spill code, but may increase execution time).

2.3.3 Expression Optimization

As its name suggests, the VSDG is derived from the VDG. The addition of state edges to describe the essential control flow information frees the graph from overly constraining the order of subexpressions within the graph. This leads to greater freedom and simplicity in optimizing the expressions described in VSDG form, for optimizations such as CSE, strength reduction, reassociation, etc. The graph properties of the VSDG even simplify the task of transforming and maintaining the data structure.

2.3.4 Whole Program Optimization

The VSDG is not a whole-program graph like the SDG. However, the structure of the VSDG can be readily extended in the same way as the SDG is an extension of the PDG. Then all the usual whole-program optimizations—function inlining, cloning and specialization, global constant propagation, etc—can be supported, with all the inherent benefits of the VSDG.

2.4 Our Approaches to Code Compaction

The previous sections described the major program graph representations that have been presented in the literature. In this section we discuss the three main approaches that are developed in this thesis.

2.4.1 Procedural Abstraction

Procedural abstraction takes a whole-program view of code compaction. Common sequences of instructions from any of the procedures (including library code) within the program can be extracted, and replaced by one abstract procedure and a corresponding number of calls to that abstract procedure. The potential costs of procedural abstraction are register marshalling before and after the procedure call, and the extra processing overhead associated with the procedure call and return.

2.4.1.1 Transforming Intermediate Code

Runeson [95] applied procedural abstraction to intermediate code prior to register allocation. The implementation, based on a commercial C compiler, achieved up to a 30% reduction in code size4. Fraser and Proebsting [44] analysed common intermediate instruction patterns derived from a C program to generate a customized interpretive code and compact interpreter. They achieved up to a 50% reduction in code size, but at considerable runtime penalty due to the interpretive model of execution (reportedly 20 times slower than native code).

2.4.1.2 Transforming Target Code

A significant advantage of transforming target code is that the optimizer can be developed either independently from the compiler, or as a later addition to the compiler chain, even from a different vendor (e.g., aiPop [3], which can reduce Siemens C16x5 native code by 4–20%). De Sutter et al's Squeeze++ [34] achieved an impressive 33% reduction on C++ benchmark code through aggressive post-link optimization, including procedural abstraction. In one case (the gtl test program from the Graph Template Library) procedural abstraction reduced the size of the original code by more than half. Its predecessor, Alto [84], achieved a reduction of 4.2% due to factoring (procedural abstraction) alone. In both cases the target architecture was the Alpha RISC processor.

4In private communication with the authors this figure was later revised to 12% due to errors in their implementation.
5A 16-bit embedded microcontroller, used in mobile phones and automotive engine control units.

Cooper and McIntosh [29] developed techniques to compress target code for embedded RISC processors through cross-linking and procedural abstraction. But, as for Squeeze++ and Alto, transforming target code requires care to ensure that variations in register colouring between similar code blocks do not reduce the effectiveness of the algorithms. As they noted: “[register] Renaming can sometimes fail to make two fragments equivalent, since the compiler must work within the constraints imposed by the existing register allocation.” In Liao’s thesis [74], two methods were developed—one purely in software, the other requiring some (albeit minimal) hardware support. In the software-only approach, where the search for repeating patterns was restricted to basic blocks, Liao achieved a reduction of approximately 15% for TMS320C256 target code. Fraser, Myers and Wendt [43] used suffix trees and normalization on VAX assembler code to reduce code size by 7% on average, and also noted that while the CPU time of compressed programs was slightly higher (1–5%), the programs actually ran faster overall, which they attributed to shorter load times. A similar scheme [69], using illegal processor instructions to indicate tokens, compressed a range of large applications to between 52% and 70% of their original size. For the same set of programs, the Unix utility compress—using LZW coding—achieved a compression ratio only about 5% better. However, it could be argued that compress’s encoding scheme may not be optimal because of the differing statistical distributions of opcodes and operands [9, 93]. An early form of procedural abstraction, due to Marks [76], is related to table-based compression: common instruction sequences were placed in a table, and pseudo-instructions inserted into the instruction stream in place of the original instruction sequences (tailored interpretation). On the IBM System/370, this method achieved a typical saving of 15% in code size at a cost of 15% in execution speed. Storer and Szymanski [102] formulated this as the external pointer macro, where the pointers into the dictionary are implemented as calls to abstracted procedures.

2.4.1.3 Parameterized Procedural Abstraction

The power of procedural abstraction is enhanced yet further by parameterizing the abstract pro- cedures [115]. This allows a degree of approximate-matching [12, 13] between the code blocks; the differences are parameterized, and each call site specifies, through additional parameters, the required behaviour of the abstract procedure. The area of parameterized procedural abstraction has concentrated mainly on the use of suffix trees [77, 47] to highlight sections of similar code. Finding a degree of equivalence be- tween two code blocks allows parameterization of the mismatch between blocks. For instance, block B1 may differ from block B2 in one instruction; this difference can be parameterized with an if...then...else placed around both instructions, and a selection parameter passed to the abstract procedure to control which of the two instructions is to be executed. Other near- equivalences can be merged into abstract procedures in a similar way, the limit being when there is no further code size reduction benefit in mapping any additional code sequences onto an abstract procedure.
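A small, hedged C illustration of the B1/B2 case just described (shown at source level rather than as target code, with invented names):

    /* Block B1:                 Block B2:
     *   t = a + b;                t = a + b;
     *   t = t * c;                t = t - c;    <-- the only difference
     *   *p = t;                   *p = t;
     */

    /* Both blocks are replaced by one abstract procedure; the mismatch is
     * parameterized by the extra selection argument 'sel'.               */
    void abs_block(int a, int b, int c, int *p, int sel)
    {
        int t = a + b;
        if (sel)
            t = t * c;
        else
            t = t - c;
        *p = t;
    }

    /* Call sites:  abs_block(a, b, c, p, 1)  replaces B1,
     *              abs_block(a, b, c, p, 0)  replaces B2.  */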

6A popular 16-bit fixed-point digital signal processor developed and sold by Texas Instruments.

2.4.1.4 Pattern Fingerprinting

Identifying isomorphic regions of a program efficiently is important, both for the speed of compilation when finding matching blocks of code, and for compact representation of code blocks for later matching. The general approach is fingerprinting—computing some fingerprint for a given block of code which can be compared with other fingerprints very quickly. Several fingerprinting algorithms have been proposed. Debray et al [36] computed a fingerprint by concatenating the 4-bit encoded opcodes of the first sixteen instructions in a code block. The question of which opcodes are encoded in these four bits was determined at compile-time by choosing the fifteen most common instructions in a static instruction count, with the special code 0000 representing all other instructions. Uniqueness was not guaranteed, so they implemented a hashing scheme on top of their fingerprinting to minimize the number of pairwise comparisons. The approach taken by Fraser, Myers and Wendt [43] was to construct a suffix tree [111], comparing instructions based on their hash address (derived from hashing the assembly code). Thus comparable code sequences are rooted at the same node of the suffix tree. Branch instructions require special handling, as do variations in register colourings.
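A minimal sketch of a Debray-style fingerprint in C, assuming opcodes have already been mapped to 4-bit codes (the fifteen statically most frequent opcodes receive codes 1–15, everything else 0); the function and parameter names are illustrative:

    #include <stdint.h>

    /* Map an opcode to its 4-bit code: 1..15 for the fifteen most frequent
     * opcodes (chosen by a static count over the whole program), 0 otherwise. */
    extern unsigned opcode_code4(unsigned opcode);

    /* Concatenate the 4-bit codes of the first sixteen instructions of a block
     * into a 64-bit fingerprint.  Blocks with different fingerprints cannot
     * match; equal fingerprints still require a full pairwise comparison.     */
    uint64_t block_fingerprint(const unsigned *opcodes, int ninsns)
    {
        uint64_t fp = 0;
        int n = ninsns < 16 ? ninsns : 16;
        for (int i = 0; i < n; i++)
            fp = (fp << 4) | (opcode_code4(opcodes[i]) & 0xF);
        return fp;
    }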

2.4.2 Multiple Memory Access Optimization

Multiple Memory Access (MMA) optimization is a new idea, with very little directly related work for comparison. However, one related area is that of optimizing address computation code for Digital Signal Processors (DSPs). For architectures with no register-plus-offset addressing modes, such as many DSPs, over half of the instructions in a typical program are spent computing addresses and accessing memory [107]. The problem of generating optimal address-computing code has been formulated as the Simple Offset Assignment (SOA) problem, first studied by Bartley [15] and Liao et al [72], the latter formulating the SOA problem for a single address register and then extending it to the General Offset Assignment (GOA) problem for k address registers. Liao et al showed that SOA (and GOA) are NP-hard, by reduction from the Hamiltonian path problem. Their approximating heuristic is similar to Kruskal’s maximum spanning tree algorithm. Rao and Pande [91] generalized the SOA problem for optimizing expression trees [99] to minimize address computation code. Their formulation of the problem—Least Cost Access Sequence—is also NP-complete, but achieved a gain of 2% over Liao et al’s earlier work [72]. This approach reordered a sequence of loads without consideration of the store of the result; however, breaking big expression trees into smaller sub-trees can reduce optimization possibilities if the stores are seen as fixed. Leupers and Marwedel [70] adopted a tie-breaking heuristic for SOA and a variable partitioning strategy for GOA. Later, Liao, together with Sudarsanam and Devadas [103], extended k-register GOA by considering auto-increment/decrement over the range −l . . . +l. Both SOA and GOA optimize target code in the absence of register-plus-offset addressing modes, which would otherwise render the SOA problem trivial. Lim, Kim and Choi [75] took a more general approach to reordering code to improve the effectiveness of SOA optimization. Their algorithm was implemented at the medium-level intermediate representation within the SPAM compiler, and achieved an improvement of 3.6% over fixed-schedule SOA when generating code for the TMS320C25 DSP. This was a more general approach than that taken by Rao and Pande, with their most favourable algorithm being a hybrid of a greedy list scheduling algorithm and an exhaustive search algorithm. DSP-enhanced architectures (e.g., Intel MMX [57], PowerPC AltiVec [81] and Sun VIS [104]) include MMA-like block loads and stores. However, these instructions are limited to fixed block loads and stores to special data or vector registers, not general purpose registers. This restriction limits the use of these MMA instructions to specific data processing code, not to general purpose code (e.g., spill code). A related area to MMA optimization is that of SIMD Within A Register (SWAR) [40]. The research presented in this thesis considers only word-sized variables; applying the same algorithm to sub-word-sized variables could achieve additional reductions in code size (e.g., combining four byte-sized loads into a single word load). SOA was recently applied by Sharma and Gupta [100] to GCC’s intermediate language. They achieved an average code size reduction of 3.5% over a range of benchmarks for the TMS320C25. Interestingly, while most benchmarks showed a reduction in code size, a few showed a measurable increase.
The authors attributed this to deficiencies in later phases of the GCC compiler chain. All of the above approaches have been applied to the Control Flow Graph [6], which precisely specifies the sequence of memory access instructions. Koseki et al considered the problem of colouring a CFG (where the order of instructions is fixed) in the presence of instructions which have particular preferences for register assignment [66]. They suggested using these preferences to guide register assignment to enable the use of MMA instructions. The work presented in Chapter 5 differs from their work in two ways: (1) because the VSDG underspecifies the ordering of instructions in a graph7 we can consider combining loads and stores that are not adjacent to each other in the CFG into provisional MMA nodes; and (2) we use the Access Graph to bias the layout of the stack frame for spilled and local variables during target code generation.
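To make the SOA cost model concrete, a small hedged C sketch (an illustration only, not any of the cited algorithms): given a frame layout and an access sequence served by a single address register, it counts the explicit address computations needed when only auto-increment/decrement by one word is free:

    #include <stdlib.h>

    /* seq[i] is the i-th variable accessed; offset[v] is the frame slot
     * assigned to variable v.  Moving the address register by +/-1 slot is
     * free (auto-increment/decrement); any larger move costs an explicit
     * address computation.                                                */
    int soa_cost(const int *seq, int nseq, const int *offset)
    {
        int cost = 0;
        for (int i = 1; i < nseq; i++) {
            int delta = offset[seq[i]] - offset[seq[i - 1]];
            if (abs(delta) > 1)
                cost++;
        }
        return cost;
    }

    /* Example: variables a, b, c accessed in the order a,b,a,c,b.
     * Laying them out as a,b,c (offsets 0,1,2) costs 1 (the jump a->c),
     * whereas the layout a,c,b (offsets 0 for a, 1 for c, 2 for b) costs 2,
     * so SOA would prefer the first layout.                               */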

2.4.3 Combining Register Allocation and Code Motion

Formulating register allocation as a graph colouring problem was originally proposed by Chaitin [25]. The intermediate code is analysed to determine live variable ranges [6], producing a clash graph G(N, E), where each node n ∈ N represents a single live range, and there is an edge e(p, q) ∈ E if the two live ranges p and q clash (are live at the same time) within the program. Colouring proceeds by assigning (colouring) physical registers to nodes whose degree is less than the number of physical registers, and removing them from the clash graph. This process repeats until either there are no more nodes in the clash graph to colour (in which case the process is complete), or there are nodes that cannot be coloured. Uncoloured nodes are split in two, with a partitioning of the incident clash edges, together with the insertion of spill code into the intermediate code. Colouring then continues as before, splitting nodes where necessary. This is a formulation of the k-register graph colouring problem, which is NP-complete in the number of registers (colours) available to colour the graph (for k ≥ 3, shown by reduction from 3-SAT [45]). However, carefully designed heuristics can produce near-optimal results in the majority of cases, with acceptable performance even in pathological cases. A minimal sketch of this simplify-and-spill loop is given below.
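The sketch assumes an adjacency-matrix clash graph and k physical registers, and shows only the simplification and spill-candidate-selection step, not the colour assignment itself; it illustrates the general scheme rather than any particular allocator discussed here:

    #include <stdbool.h>

    #define MAXN 256

    /* clash[i][j] is true if live ranges i and j interfere. */
    extern bool clash[MAXN][MAXN];

    /* Degree of node n among the not-yet-removed nodes. */
    static int degree(int n, const bool *removed, int nnodes)
    {
        int d = 0;
        for (int i = 0; i < nnodes; i++)
            if (!removed[i] && clash[n][i])
                d++;
        return d;
    }

    /* One round of Chaitin-style simplification with k registers: nodes of
     * degree < k are trivially colourable and are removed (a full allocator
     * pushes them on a stack and colours them in reverse order).  If nodes
     * remain, one is returned as a spill/split candidate (here simply the
     * node of highest degree); the caller inserts spill code and re-runs.
     * Returns -1 if the whole graph was simplified.                        */
    int simplify(int nnodes, int k, bool *removed)
    {
        bool progress = true;
        while (progress) {
            progress = false;
            for (int n = 0; n < nnodes; n++) {
                if (!removed[n] && degree(n, removed, nnodes) < k) {
                    removed[n] = true;
                    progress = true;
                }
            }
        }
        int spill = -1, worst = -1;
        for (int n = 0; n < nnodes; n++) {
            int d;
            if (!removed[n] && (d = degree(n, removed, nnodes)) > worst) {
                worst = d;
                spill = n;
            }
        }
        return spill;
    }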

7We show in Chapter 6 that the special case of a VSDG with enough serializing edges to enforce a linear order corresponds to a CFG.

Chow and Hennessey [26] proposed a modification of Chaitin’s original graph colouring algorithm. They split the uncolourable nodes into two separate live ranges [26], and tailor heuristics to favour more frequently accessed nodes (e.g., inner loop variables). Goodwin and Wilken [50] formulated global register allocation (including all possible spill placements) as a 0-1 integer programming problem. While they did achieve quite impressive results, the cost was very high: the complexity of their algorithm is O(n³), and within a given time budget their allocator did not guarantee to allocate all functions. All of the register allocation approaches described so far are applied to fixed-order blocks of instructions. The only transformation available is to insert spill code to reduce register pressure, the goal being to minimize the amount of spill code inserted into the program. Another way to reduce spill code is to move code in a way that reduces register pressure. Code motion as an optimization is not new (e.g., Partial Redundancy Elimination [80]). Perhaps the work closest in spirit to that presented in Chapter 6 is that of Rüthing et al [96], who presented algorithms for optimal placement of expressions and sub-expressions, combining both raising and lowering of code within basic blocks. The CRAIG framework [20], implemented within the ROCKET compiler [105], took a brute force approach:

1. Attempt register allocation after instruction scheduling.

2. If the schedule cost is not acceptable (by some defined metric) attempt register allocation before scheduling.

3. While the cost is acceptable (i.e., there is some better schedule) add back in information from the first pass until the schedule just becomes too costly.

Brasier et al’s experience with an instance of CRAIG (CRAIG0) defined the metric as the existence of spill code (a schedule is considered unacceptable in the presence of spill code). Their experimental results showed improvements in execution time, but did not document the change in code size. Rao [92] improved on CRAIG0 with additional heuristics to allow some spilling, where it could be shown that spilling had a beneficial effect. Touati’s thesis [106] argued that register allocation is the primary determinant of performance, not scheduling. The goal of his thesis was again to minimize the insertion of spill code, both through careful analysis of register pressure, and by adding serializing edges to each basic block’s data dependency DAG. By using integer linear programming (cf. Goodwin and Wilken) Touati achieved optimal solutions for cyclic register allocation (where the output register mapping of one loop iteration is the input register mapping of the next, thus avoiding register copies at loop iteration boundaries), allowing for a trade-off between ILP efficiency and register usage. An early attempt at combining register allocation with instruction scheduling was proposed by Pinter [88]. That work was based on an instruction-level register-based intermediate code, and was preceded by a phase to determine data dependencies. This dependence information then drove the allocator, generating a Parallelizable Interference Graph to suggest possible register allocations. Further, the Global Scheduling Graph (global in the sense of crossing basic block boundaries) was then used to schedule instructions within a region. Another approach was the region-based algorithm of Janssen and Corporaal [59], which defined regions as corresponding to the bodies of natural loops. They then used this hierarchy of nested regions to focus register allocation, with the inner-most regions receiving the best register allocation. The “Resource Spackling Framework” of Berson et al [18] applied a Measure and Reduce paradigm to combine both phases—their approach first measured the resource requirements of a program using a unified representation, and then moved instructions out of excessive sets into resource holes. This approach was basic-block-based: a local scheduler attempted to satisfy the target constraints without increasing the execution time of a block, while the more complicated global scheduler moved instructions between blocks. Their approach most closely matches the one described in Chapter 6, which by design is a global version of their local scheduler, since the VSDG does not have explicit basic blocks.

2.5 1000₂ Code Compacting Optimizations

Many optimization algorithms have been discussed in the literature over the last thirty or forty years. Some are designed to improve the execution speed of a program; some aim to reduce the size of a program, either through compaction or compression; and others reduce the runtime data storage requirements of programs. In this section we choose eight8 optimizations which have the effect of reducing a program’s size. For each optimization, we discuss its merits, its algorithmic complexity, and its effectiveness at reducing code size. While we do not claim this list to be exhaustive, we believe the optimizations presented here to be the most beneficial.

2.5.1 Procedural Abstraction

Procedural abstraction makes significant changes to a whole program. It is a form of code compression, replacing multiple instances of a repeating code block with a single instance (a function) and multiple references (calls) to that instance. As a concrete example, De Sutter et al’s Squeeze++ [34] demonstrates how effective procedural abstraction can be on modern code. But why is procedural abstraction so effective, and does it have a future? The answer lies in modern approaches to software engineering. Large applications are written by teams of programmers9, and in many cases the problems that the individual teams solve look similar (e.g., walking a list, sorting, iterating over arrays, etc). Additionally, some of the code will be produced by programmers copying other sections of code (“cut-and-paste programming”). As projects are maintained over time, programmers prefer to copy a known, working function and make minor changes rather than risk changing a tested function. Or a maintenance programmer may simply write a function to implement some new behaviour, unaware that such a function already exists in another module (perhaps among millions of lines of code). The use of object-oriented programming languages like C++, Java and C# further increases the opportunities for code compaction from procedural abstraction. Software engineering methods emphasise the use of subclassing, part of which includes overriding methods with specialised methods particular to the derived class.

8Eight being the closest binary number to ten. 9One could even argue that a lone programmer is actually part of a large team if you factor in the operating system and third-party library developers. Chapter 2. Prior Art 40 switch(i) { switch(i) { case 1: s1; case 1: s1; BigCode1; break1; break; case 2: s2; case 2: s2; s3; s3; break; break; case 3: s4; case 3: s4; break1; BigCode1; break; case 4: s5; break2; case 4: s5; BigCode2; case 5: s6; break; break2;

case 5: s6; default: break1; BigCode2; } break; /* break1: */ BigCode1; default: BigCode1; goto break; break; } /* break2: */ /* break jumps to here */ BigCode2;

/* break: */

Figure 2.4: An example of cross-linking on a switch statement. The original code (a) has two common code tails for many of the cases (assume that BigCode1 and BigCode2 are non-trivial code blocks). In the cross-link optimized version (b) the two tails have been factored out, and now at least three exit labels are generated—one each for BigCode1, BigCode2 and the normal exit.

In many cases, these new methods are very similar to those they override, with perhaps a few minor changes. This design methodology produces a large amount of similar code, which partly explains Squeeze++’s impressive results.

2.5.2 Cross Linking

As a variation on procedural abstraction, both within a function and across functions, cross-linking similarly benefits from modern software engineering practice. Cross-linking can be very beneficial for optimizing switch statements when multiple cases have the same tail code. For example, consider the code in Figure 2.4, where savings have been made in removing two copies of BigCode1 and one copy of BigCode2.

(a) C source:

    int foo(int x,int y,int z)
    {
        int a, b, c, d,
            e, f, g, h, j;

        a = x + y;
        b = y - z;
        c = a + b;
        d = b + x;
        e = x + a;
        f = a + c;
        g = c + d;
        h = -d;
        j = g + h;

        return bar(e,f,j);
    }

(b) ARM compiler output:

    ||foo|| PROC
        ADD r3,r0,r1
        SUB r1,r1,r2
        ADD r12,r1,r0
        ADD r2,r3,r1
        ADD r1,r3,r2
        ADD r0,r0,r3
        RSB r3,r12,#0
        ADD r2,r2,r12
        ADD r2,r2,r3
        B bar

(c) GCC output:

    foo:
        str lr, [sp, #-4]!
        rsb r2, r2, r1
        add r1, r0, r1
        add r2, r1, r2
        add r0, r0, r1
        add r1, r1, r2
        ldr lr, [sp], #4
        b bar

Figure 2.5: Reassociation of expression-rich code in (a). The ARM commercial compiler generates ten instructions (b) which almost exactly mirror the original source code. The GCC compiler does better through algebraic manipulation, generating only eight instructions (c), fewer still if we discount the superfluous link register-preserving instructions. Both compilers were asked to optimize for code size.

2.5.3 Algebraic Reassociation

Most computer programs process data in one form or another, evaluating expressions given some input data in order to generate output data. This is especially true of expression-rich functions, such as the one shown in Figure 2.5. Such expressions can be the result of, for example, naïve implementation, macro expansion, or compiler-generated address offset computation. The need for effective algebraic optimizations, like that for procedural abstraction and cross-linking, is growing, both as the general size of programs increases, and as the data processing demands of multimedia applications grow (e.g., speech and image processing, video compression, games, etc).
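As a small, hedged illustration of the kind of rewriting involved (not taken from either compiler's output, and valid for integer arithmetic only), reassociating operands into a canonical order can expose a common subexpression that the original parse trees hide:

    int a = 1, b = 2, c = 3, t, t1, t2;

    /* Source: syntactically different, semantically equal (for integers). */
    t1 = (a + b) + c;
    t2 = a + (c + b);

    /* After reassociation into a canonical operand order, both right-hand
     * sides become a + b + c, and common subexpression elimination then
     * computes the sum once:                                              */
    t  = (a + b) + c;
    t1 = t;
    t2 = t;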

2.5.4 Address Code Optimization

In tandem with algebraic reassociation, as the number of data processing expressions increases, so too does the number of memory access instructions, with a corresponding increase in address computation code. While there are gains to be had in optimizing the address expressions themselves, there are also opportunities in rearranging the layout of data in memory to reduce or simplify address computations. Both Simple Offset Assignment and General Offset Assignment reduce address computation code by rearranging variables in memory. In addition, aggressive use of processor addressing modes to precompute addresses can also reduce instruction count or execution time.

2.5.5 Leaf Function Optimization

The Call Graph described in section 2.1.1 can identify functions which do not themselves call other functions. These form the leaves of the call graph, and thus receive special attention such as aggressive inlining, specialization, and simplified entry and exit code. Leaf functions can result from the original source code, as well as being generated by procedural abstraction. Additionally, some functions which initially look like non-leaf functions can in fact be treated as leaf functions. For example, the functional-programming style factorial function

    int fac(int n, int acc)
    {
        if (n==0) return acc;
        else      return fac(n-1, acc*n);
    }

can be transformed into a loop, noting that the recursive call at the end of the function can be implemented as a jump to the top of the function (a loop form is sketched at the end of this subsection). Other recursive functions can be transformed into iterative leaf functions through the addition of accumulator variables (explicitly identified in the above program as the variable acc). Grune et al [51] further describe the compilation of functional languages. Leaf functions are prime candidates for inlining, especially those with only one caller. The code saved comes from avoiding the need for function entry and exit code, and from removing any register assignment constraints from the original function call site. This then leads to further opportunities for optimization, as the body of the inlined function is now within the context of the parent function.
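For concreteness, a hedged sketch of the loop form of fac referred to above (the transformation a compiler would perform; the exact code is illustrative):

    int fac(int n, int acc)
    {
        /* The tail call fac(n-1, acc*n) becomes an update of the
         * parameters followed by a jump back to the top of the loop. */
        while (n != 0) {
            acc = acc * n;
            n   = n - 1;
        }
        return acc;
    }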

2.5.6 Type Conversion Optimization

Another aspect of data processing relates to the physical size of the data itself, and the semantics of data processing operations. To support a variety of data types, compilers insert many implicit data type conversions, the most common being sign or zero extension. For example, the C programming language specifies that arithmetic operators operate on integer-sized values (typically 32-bit). So the statement

    char a, b, c;
    ...
    c = a + b;
    ...

first promotes the char (say, signed 8-bit) variables a and b to integer types before doing the addition. The result of the addition is then demoted to char type before being stored in c. If a and b are held in memory, then most processors have some form of load-with-sign-extend operation, so the above expression would indeed compile into four instructions (two loads, add, and store).

However, if either a or b is held in a register, then the compiler must generate code that behaves as if they were loaded from memory with sign extension. The effect of this is that a naïve compiler will generate code that sign-extends a and b before doing the addition. For a processor without explicit sign-extend instructions the compiler will generate two instructions for each sign extension (a left shift to move the char sign bit into the int sign bit, and then a right shift to restore the value and propagate the sign bit into the upper 24 bits of the register).

Careful analysis can identify that, since the result of the expression will be converted to char, the addition too can be performed with 8-bit precision. This information can then be used to eliminate the sign extension operations prior to code generation.
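A C-level sketch of the point being made, assuming an 8-bit char, a 32-bit int, and a target whose only way to sign-extend is a shift pair; it is written in C purely to mirror the instruction sequences, and assumes two's-complement representation with arithmetic right shift:

    char a, b, c;              /* as in the example above */
    ...
    /* Naive lowering: each operand is sign-extended by a shift pair
     * (mirroring a shift-left/shift-right sequence), the 32-bit add
     * is performed, and the result is truncated on the store to c.  */
    int ea = ((int)a << 24) >> 24;
    int eb = ((int)b << 24) >> 24;
    c = (char)(ea + eb);

    /* After type analysis: only the low 8 bits of the sum survive the
     * final truncation, so both extensions are redundant:            */
    c = (char)(a + b);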

Clearly this is a trivial example used for illustration, but the basic principle of using type analysis to remove redundant type conversions can produce useful reductions in code size. More aggressive type analysis can identify further type conversions that can be eliminated.

2.5.7 Dead Code Elimination

Dead code—code that, while it may be executed, has no effect on the output of the program—is a prime candidate for removal. Dead code can arise as the result of previous optimization passes, as the result of macro expansion or other automated code generation, or simply as a result of a change to a large body of code that has effects outside the scope of the change (e.g., consider a 10,000-line function where a maintenance programmer removes an expression at line 9,876, making an expression at line 12 dead).

Partial redundancy elimination identifies expressions which are only partially redundant, i.e., there is at least one path of execution where a given statement is redundant (dead), and at least one path where it is not redundant. For example, consider

    a = ...;
    b = ...;
    do {
        x = a+b;
        a = x+1;
        y = a+b;
    } while (...)

The first instance of expression “a+b” is partially redundant: on the entry path into the loop it is not redundant, but on the back path it is redundant, having already been computed for the assignment to y. We can therefore move it out of the loop, and replace it with a temporary variable:

    a = ...;
    b = ...;
    T = a+b;
    do {
        x = T;
        a = x+1;
        y = a+b;
        T = y;
    } while (...)

with the added advantage of exposing the first instance of a+b to optimizations with the preceding code. Dead code elimination may be one of the oldest optimizations known, but it still has its place in modern optimizing compilers. This is especially so with automated code generation and object-oriented languages emphasising code reuse—it is better to leave redundant code in the unchanged source code, and let the optimizer remove it during compilation.

2.5.8 Unreachable Code Elimination

While dead code is code that may execute but has no effect on the output of the program, unreachable code is code that will never be executed. For example, in

#define DEBUG ( 0 )

...

    if (DEBUG)
        print(...);

the print statement is unreachable, and both it and the enclosing if can be deleted. However, in considering

    int bigfunc(int n)
    {
        if (n>10) {
            ... HUGE CODE ...
        }
    }

it would be very beneficial if we could determine whether the predicate n>10 was always false, as then the “HUGE CODE” could be removed, with potentially huge savings. The only way to achieve this is by interprocedural analysis: firstly to determine all of the call sites for this function, and secondly to determine the range of values that can be passed into the function. The predicate can then be identified as being always-false, always-true, or otherwise. If the result of analysis is always-false, the body of the if statement can be deleted.

Such global analysis is non-trivial, and the safe option is simply not to delete any code if there is even the slightest doubt about whether it is unreachable. Leaving the code in place only results in a program larger than necessary; wrongly deleting it would produce a broken program. The same is true of predicate analysis: in simple cases, like the example above, the predicate is trivial to analyse. However, in general, predicate analysis is undecidable, and so we might take the conservative approach of restricting analysis to schematic equivalence. For example, is the following predicate always-true, always-false, or otherwise for all values of x?

if ( ((x+1)*(x+1)) == (x*x + 2*x + 1) ) { ... }

The problem is especially difficult for floating point types, where it is entirely possible for the left and right hand sides of this predicate to compute different results for the same value of x.

2.6 Summary

This chapter has taken a broad look at the more significant program graphs that have been proposed and used within compilers, from the original Control Flow Graph and the Data Flow Graph, through Ferrante et al’s Program Dependence Graph, Ballance et al’s Program Dependence Web and Click’s Intermediate Representation, to the Value Dependence Graph of Weise et al. As well as the graphs themselves, we have also reviewed two forms of expressing values within these graphs: Static Single Assignment form and Gated Single Assignment form. They simplify dataflow analyses by placing a program in a form where there exists exactly one assignment statement for every variable in the program. We then considered four important optimization goals—control flow optimization, loop optimization, expression optimization, and whole program optimization—and chose from the program graphs previously described the one that best supported each given optimization goal. We then briefly described how our new Value State Dependence Graph (presented in the next chapter) can be applied to all of these optimization goals, before going on to describe the three main approaches to code size optimization presented in this thesis: procedural abstraction, multiple memory access optimization, and combined register allocation and code motion, described in chapters 4, 5, and 6 respectively. Finally we chose eight optimizations that are important for code size: procedural abstraction, cross-linking, algebraic reassociation, address code optimization, leaf function optimization, type conversion optimization, dead code elimination, and unreachable code elimination. In the next chapter we introduce the foundation of this thesis: the Value State Dependence Graph, and describe how some of the optimizations presented in this chapter can be applied to programs described in VSDG form.

CHAPTER 3

The Value State Dependence Graph

The secret to creativity is knowing how to hide your sources. ALBERT EINSTEIN (1879–1955)

An important decision when building a compiler is the choice of the internal data structure for representing the program. Amongst its many properties it should simplify analysis of the program and minimize the effort in transforming and maintaining the structure. The previous chapter reviewed a range of program graphs. It showed that they have various desirable features, but that there are issues relating to the implementation of analyses and transformations. This chapter begins with a critique of the Program Dependence Graph, showing that its complex structure over-specifies (and thus overly constrains) the program graph. The Value State Dependence Graph (VSDG) is then described, showing how it solves the inherent problems of its predecessor, the Value Dependence Graph, through the addition of state edges and by careful treatment of loops. The pull-model semantics of the VSDG are given, and shown to be equivalent to the push-model semantics of the PDG. Generating VSDGs from syntax-directed translation of structured C programs is described in detail, together with the implementation of a C-subset1 compiler. Finally, the benefits of the VSDG are demonstrated with a number of well-known control- and data-flow optimizations.

1Excluding goto and labels. This is fixable, through code duplication or boolean variables, but is not the focus of this thesis.

3.1 A Critique of the Program Dependence Graph

The Program Dependence Graph (PDG) [39] is a multigraph, combining the Control Flow Graph (CFG) and the Data Dependence Graph (DDG) into a single, unified graph structure. In this section we describe the PDG in sufficient detail, and then discuss its weaknesses, showing that the VSDG (Section 3.3) offers a better framework for optimization.

3.1.1 Definition of the Program Dependence Graph

Vertices in the PDG represent single statements and predicate expressions (or operators and operands), and edges represent both data dependencies and control dependencies.

Definition 3.1 (Program Dependence Graph) The Program Dependence Graph of a program P is a directed graph G = (V, E), where V is the set of vertices (statements, operators, predicate expressions, etc), and E ⊆ V × V is the set of dependence edges (of which the four categories are further defined below), where each edge (p, q) ∈ E connects vertex p ∈ V to vertex q ∈ V.

There are two main categories of edge in the PDG: control dependence edges and data dependence edges.

Definition 3.2 (Control Dependence Edge) There exists a control dependence edge (p, q) iff either: 1. p is the entry vertex and q is a vertex in P that is not within any loop or conditional; or

2. p is a control vertex, and q is a vertex in P immediately dominated by p.

In the PDG there are several types of data dependence edges. In the original Ferrante, Ottenstein and Warren formulation of the PDG [39] there are three types of data dependence edge.

Definition 3.3 (Flow Dependence Edge) There exists a flow dependence edge (p, q) iff 1. p is a vertex that defines variable x,

2. q is a vertex that uses x, and

3. control can reach q after p via an execution path within the CFG of P along which there is no intervening definition of x.

Flow dependences are further classified as loop-carried dependencies, for definitions that are referenced on subsequent iterations of a loop, and loop-independent dependencies, for definitions that are independent of loop iteration. The definitions of output dependence edges and anti-dependence edges follow the usual conventions:

Definition 3.4 (Output Dependence Edge) There exists an output dependence edge (p, q) iff both p and q define the same variable x, and control can reach q after p via an execution path within the CFG of P.

Definition 3.5 (Anti-Dependence Edge) There exists an anti-dependence edge (p, q) iff p is a vertex that uses variable x, q is a vertex that defines x, and control can reach q after p via an execution path within the CFG of P along which there is no intervening definition of x.

In the Horwitz et al [56] version of the PDG there are no anti-dependence edges, while the output dependence edges are replaced by def-order dependency edges. This does not change the basic properties of the PDG, nor the underlying problems discussed below.

3.1.2 Weaknesses of the Program Dependence Graph

The data dependence subgraph is constructed from vertices which represent operators. Thus the data dependencies refer to the results of operations, rather than, say, the contents of registers or memory. Because there is no clear distinction between data held in registers and data held in memory-bound variables (other than arrays), the PDG suffers from aliasing and side-effect problems. Languages with well-behaved pointers (such as Pascal [113]) are easily supported through data flow methods [6]. Other languages, such as C [61], with pointers that can point to almost anything, can preclude PDG construction altogether, or at the very least increase the effort of construction and analysis2. While SSA-form elegantly solves many of these problems, the PDG is not necessarily in SSA-form, so there is no guarantee that a given PDG formulation is free of these problems. The worst case is to assume that every pointer aliases every other pointer, and to treat the PDG accordingly. When pointer arithmetic is well-behaved, the same data flow methods as before can be applied. The large number of edge types within the PDG (six for the original PDG, five for the Horwitz et al version) increases the cost of maintaining and repairing the PDG during transformation. As programs grow in size, the runtime cost of any optimization algorithm must be carefully weighed against the benefits obtained. Any additional effort just to maintain the intermediate representation must be considered very carefully. In contrast, the VSDG has only two types of edge: value-dependency edges and state-dependency edges. Being implicitly in SSA-form (in that values have only one definition), the VSDG does not need PDG-like special (i.e., output- or anti-dependence) edges, as such information is explicitly represented in the SSA-form. Table 3.1 describes the correspondence between the three main PDG data-dependency edges and their VSDG equivalents. Several different semantics have been presented for PDGs. Selke provides a graph rewriting semantics [98], and Cartwright and Felleisen [23] develop a non-strict denotational semantics of PDGs. However, none of these semantics describes the full PDG. SSA-form [7] simplifies many dataflow-based optimizations. The PDG, however, is not an SSA-based graph, with the result that SSA-based optimization requires additional effort to transform a PDG into SSA-form, and then to maintain the SSA-form during optimization. From the view of data flow-based optimization it would be better to adopt a data flow model. However, this is problematic in the PDG: multiple definitions of a variable may reach a vertex, yet the PDG does not have a natural way to say which of the reaching definitions to choose from (in contrast to SSA-form, which does), thus complicating reachability analysis, dead code elimination, and so forth.

2A pessimistic approach is to treat all memory as a single array. While safe, this monolithic approach does not provide much information about the behaviour of the program, so subsequent analysis yields little information other than which instructions access memory.

                     output                 flow                  anti
    Node p           S′ = st(S, a, v1)      S′ = st(S, a, v)      . . . = ld(S, a)
    Node q           S″ = st(S′, a, v2)     . . . = ld(S′, a)     S′ = st(S, a, v)
    PDG dependence   output                 flow                  anti
    VSDG dependence  explicit               explicit              implicit

Table 3.1: A comparison of the data-dependence edges in the PDG and the VSDG. The three main types of data-dependence edge in the PDG—output, flow and anti-dependence—are illustrated with two nodes p and q, where S, S′, S″ are state variables, a is an address, and v is a value. The output- and flow-dependence edges are explicitly represented by the VSDG’s state-dependency edges (q depends on the state produced by p), and the PDG’s anti-dependence edges are implied by the VSDG’s restriction of a single live state at any point in a VSDG (q kills the state that p depends on).

Indeed, as Cartwright and Felleisen note [23]:

“A rigorous formulation of the data-flow semantics of PDGs reveals a disturbing feature of the conventional PDG representation: the sequencing information pro- vided by the output (or def-order) dependencies is not transmitted by the data-flow edges to the use nodes that must discriminate among multiple definitions of the same identifier.”

While the PDG has been an important and widely used intermediate graph for compilers, it is evident that it has problems, most notably with languages like C which allow arbitrary use of pointers, and also in its lack of data flow information such as that computed by SSA-form.

3.2 Graph Theoretic Foundations

Before describing the Value State Dependence Graph we introduce the graph theoretic foundations which will be used in this thesis. Given two nodes, p and q, such that p depends on the result of q for its execution, then an edge p → q is drawn, also written as (p, q). Thus both notation and visualisation are consistent with respect to each other3.

3.2.1 Dominance and Post-Dominance

In CFG-based compilers the dominance relation is used for discovering loops in the CFG, and for deciding where to place SSA-form φ-functions. The VSDG uses dominance and post dominance for identifying nodes within the VSDG’s concept of basic blocks, i.e., predicate-controlled nodes and loop-variant nodes.

3.2.2 The Dominance Relation

The dominance relation is a binary relation between two nodes in a graph, defined in terms of their relation to the entry node (which we call N0) of the graph.

3We do not wish to start a religious war over the direction of arrows in graphs. We simply feel that our choice of direction of arrows in VSDG drawings aids understanding of the relationships between nodes.

Definition 3.6 (Dominance) A node p dominates node q iff every path from N0 to q includes p. We write the relation as p dom q.

The dominance relation is reflexive (every node dominates itself), transitive (if p dom q and q dom r then p dom r), and antisymmetric (if p dom q and q dom p then p = q). Two further definitions specialise the dominance relation:

Definition 3.7 (Immediate Dominance) A node p immediately dominates node q iff p dom q and there does not exist a node r, with r ≠ p and r ≠ q, for which p dom r and r dom q. We write this relation as p idom q.

Immediate dominance has two important properties. Firstly, the immediate dominator of a node is unique; and secondly, the immediate dominance relation forms the edges of a tree (the dominance tree, e.g., Figure 3.1) showing all the dominance relations of the graph, with the entry node, N0, as the root of the tree. We further say,

Definition 3.8 (Strict Dominance) A node p strictly dominates node q iff p dom q and p ≠ q. We write this relation as p sdom q.

The strict dominance relation identifies all the nodes dominated by a given node, other than the node itself. The post dominance relation is the inverse of the dominance relation, using the function exit node, N∞, as the reference node.

Definition 3.9 (Post Dominance) A node p post dominates node q iff every possible path from q to N∞ includes p. We write this relation as p pdom q. and

Definition 3.10 (Immediate Post Dominance) A node p immediately post dominates node q iff p pdom q and there does not exist a node r, with r ≠ p and r ≠ q, for which p pdom r and r pdom q. We write this relation as p ipdom q.

Figure 3.1 illustrates the dominance and post dominance trees, constructed from the immediate dominators and immediate post dominators respectively, for a simple VSDG.

3.2.3 Successors and Predecessors The dominance and post dominance relations are defined with respect to the entry and exit nodes of the graph, respectively. More general relations between two nodes are the successor and predecessor relations, defined below in the context of two sets of edges ES and EV (defined later).

Definition 3.11 (Successor and Predecessor) A node p is a successor of node q iff there is a path in EV ∪ ES from p to q. Conversely, q is a predecessor of p.

If we wish to be specific we will write V-successors or S-successors for EV and ES successors respectively. Similarly, we will write succV(n) and succS(n) for the appropriate sets of V-successors or S-successors respectively. We simplify this to succ(n) for (V ∪ S)-successors, and likewise for predecessors.

[Figure 3.1 comprises three panels: (a) a simplified VSDG with nodes ENTRY, A to F, P, a γ-node and EXIT, (b) its dominance tree, and (c) its post dominance tree; the diagrams are not reproduced in this text version.]

Figure 3.1: A (simplified) VSDG (a), its dominance tree (b) and its post dominance tree (c). Note that in (c) some of the γ-node post dominance relations are qualified by the port name (shown as a superscript on the dominated node): node B is post dominated by γ-node port T, node C is post dominated by γ-node port F, and node P is post dominated by γ-node port C.

For example, in Figure 3.1 node B is a successor of node D, while node D is the only predecessor of node B. Further,

    succV(D) = {B, C, P}
    predV(D) = {E}.

The definitions of successor and predecessor follow the execution model of the nodes in the graph; for example, with respect to node D, node E is executed before (precedes) node D, while nodes B, C and P execute after (succeed) D.

3.2.4 Depth From Root We define the property “Depth From Root” of nodes in the VSDG:

Definition 3.12 The (maximal) Depth From Root, D(p), of a node p is the length of the longest path in (EV ∪ ES)∗ from the root to p.

3.3 Definition of the Value State Dependence Graph

The Value State Dependence Graph (VSDG), introduced in [60], is a directed graph consisting of operation, loop and merge nodes together with value and state dependency edges. Cycles are permitted but must satisfy various restrictions. A VSDG represents a single procedure; this matches the classical CFG but differs from the VDG, in which loops were converted to tail-recursive procedures called at the logical start of the loop. The original Value Dependence Graph (VDG) [112], from which the VSDG is derived, represents programs as value dependencies. This representation removes any specific ordering of instructions (nodes), but does not elegantly handle loop and function termination dependencies. The VSDG introduces state dependency edges to model sequentialised computation. These edges also have the surprising benefit of generalising the VSDG: by adding sufficient serializing edges (state dependency edges added to a VSDG to enforce some ordering of nodes) we can construct any one of a number of possible CFGs that correspond to a total ordering of the partially ordered VSDG. Another benefit is that the under-specified node serialization of the VSDG more easily supports a combined register allocation and code motion algorithm than does the exact serialization of the CFG (see Chapter 6). An example VSDG is shown in Figure 3.2. In (a) we have the original C source for a recursive factorial function. The corresponding VSDG, (b), shows both value- and state-dependency edges and a selection of nodes. The VSDG is formally defined thus:

Definition 3.13 (Value State Dependence Graph) A VSDG is a labelled directed graph G = (N, EV, ES, ℓ, N0, N∞) consisting of nodes N (with unique entry node N0 and exit node N∞), value dependency edges EV ⊆ N × N, and state dependency edges ES ⊆ N × N. The labelling function ℓ associates each node with an operator (see Section 3.3.1 for details).

VSDGs have to satisfy two well-formedness conditions. Firstly, ℓ and the (EV) arity must be consistent (e.g., a binary arithmetic operator must have two inputs); secondly, the VSDG must correspond to a structured program, i.e., there are no cycles in the VSDG except those mediated by θ (loop) nodes (described in Section 3.5.1). Value dependency (EV) indicates the flow of values between nodes, and must be preserved during optimization. State dependency (ES) has two uses. The first is to represent the essential sequential dependency required by the original program. For example, a load instruction may be required to follow a store instruction without being re-ordered; a return node must wait for a loop to terminate even though there might be no value-dependency between the loop and the return node. The second is that additional serializing state dependency edges can be added incrementally until the VSDG corresponds to a unique CFG. A well-formed VSDG has the property that there must be exactly one live state, i.e., no two states can be live at the same time. γ-nodes ensure that, even though there may be two different states within the true and false subgraphs, during execution exactly one state will be live. The VSDG inherits from the VDG the property that a program is represented in Static Single Assignment (SSA) form [7, 33]: for each value in the program there is exactly one node port which produces that value. Note that, in implementation terms, a single register can hold the produced value for consumption at all successors; it is therefore useful to talk about the output port of q being allocated a specific register, r, to abbreviate the idea of r being used for each edge (p, q) where p ∈ succV(q). Similarly, we will say the “right-hand input port” of a subtraction instruction.

[Figure 3.2 shows (a) the C source of a recursive factorial function and (b) its VSDG; the graph drawing is not reproduced in this text version. The source in (a) is:]

    int fac( int n )
    {
        int result;
        if ( n == 1 )
            result = n;
        else
            result = n * fac( n - 1 );
        return result;
    }

Figure 3.2: A recursive factorial function, whose VSDG illustrates the key graph components—value dependency edges (solid lines), state dependency edges (dashed lines), const nodes, a call node, a tupled γ-node, a comparison node (eq), and the function entry and exit nodes. The γ-node returns the original function argument and state if the condition is true, or that of the expression if false (including the state returned from the call node).

3.3.1 Node Labelling with Instructions

There are four main classes of VSDG nodes: value nodes (representing pure arithmetic), state nodes (with explicit side effects), γ-nodes (conditionals), and θ-nodes (loops).

3.3.1.1 Value Nodes

The majority of nodes in a VSDG generate a value based on some computation (add, subtract, etc) applied to their dependent values. Constant nodes are a special case, having no V-predecessors but possibly S-predecessors after instruction scheduling (especially if a constant node loads a value into a register, rather than being part of another instruction). We assume that all the value nodes are free of side-effects. Value nodes which are to be modelled as generating side-effects (e.g., throwing an exception for divide-by-zero, or arithmetic over/under-flow) are placed in the fourth category—state nodes. We also assume that, other than as defined by the edges within the graph, the order of evaluation of dependent nodes for n-ary value nodes is arbitrary. Where evaluation order is important (as might be defined by the semantics of the source language) this is represented in the VSDG by state edges within the graph. This has the pleasing result that two value nodes, say p and q, that are independent of each other (such that there is no path from p to q nor from q to p) can be executed in any order, or even in parallel.

3.3.1.2 State Nodes

State nodes are nodes which depend on, and may also generate, the distinguished state value. There are three main categories of state nodes: memory access nodes, call nodes, and θ-nodes (discussed below). The memory access nodes constitute the load and store nodes. Load nodes depend on both a state, S, and an address value, a, and return the value read from the memory cell MEM[a] within S. Volatile loads also produce a new state, reflecting the (potential) changes to the state that executing the load might cause. Store nodes depend on a state, S, an address value, a, and a data value, d. The store node writes d into the memory cell MEM[a] within S, and returns a new state, S′. Volatile stores are similar, except during optimization when the volatile property provides additional information. The call node takes both the name of the function to call and a list of arguments, and returns a list of results. It is treated as a state node as the function body may depend on or update the state. We maintain the simplicity of the VSDG by imposing the restriction that all functions have one exit node N∞, which depends only on state. Special return nodes depend on state (and a tuple of values in the case of non-void functions), and generate a new state to preserve the structure of the VSDG (see Figure 3.2 (b)). Return nodes can be implemented by, for example, a jump to the end of the function after marshalling the return values into the given target’s return register(s).
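As an illustration of how memory accesses thread the state, a short C fragment annotated with the st/ld notation of Table 3.1; the S0, S1, S2 labels simply name successive states and are not part of the VSDG itself:

    /*  C fragment        corresponding VSDG nodes (st/ld notation)
     *  ----------        -----------------------------------------
     *  *p = x;           S1 = st(S0, p, x)
     *  y  = *p;          y  = ld(S1, p)      -- the state edge forces the load after the store
     *  *q = y;           S2 = st(S1, q, y)   -- produces the single live state S2
     */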

3.3.1.3 γ-Nodes

The VSDG’s γ-node is similar to the γ-node of Gated Single Assignment form [14] in that it depends on a control predicate, in contrast to the control-independent nature of SSA φ-functions.

Definition 3.14 (γ-Node) A γ-node γ(C, T, F) evaluates the condition dependency C, and returns the value of T if C is true, otherwise F.

We generally treat γ-nodes as tupled nodes: one can treat several γ-nodes with the same predicate expression as a single tupled γ-node. Figure 3.3 illustrates two γ-nodes that can be combined in this way.

3.3.1.4 θ-Nodes

The θ-node models the iterative behaviour of loops, modelling loop state with the notion of an internal value which may be updated on each iteration of the loop. It has five specific ports which represent dependencies at various stages of computation.

(a) if (P) x = 2, y = 3;
    else   x = 4, y = 5;

(b) if (P) x = 2; else x = 4;
    ...
    if (P) y = 3; else y = 5;

Figure 3.3: Two different code schemes (a) & (b) map to the same γ-node structure.

j = ...;
for( i = 0; i < 10; ++i )
    --j;
... = j;

Figure 3.4: A θ-node example showing a for loop. Evaluating the θ-node's X port triggers it to evaluate the I value (outputting the value on the L port). While C evaluates to true, it evaluates the R value (which in this case also uses the θ-node's L value). When C is false, it returns the final internal value through the X port. As i is not used after the loop, there is no dependency on the i port of X.

Definition 3.15 (θ-Node) A θ-node θ(C, I, R, L, X) sets its internal value to the initial value I. Then, while the condition value C holds true, it sets L to the current internal value and updates the internal value with the repeat value R. When C evaluates to false, computation ceases and the internal value is returned through the X port.

A loop which updates k variables will have: a single condition port C, initial-value ports I1, . . . , Ik, loop iteration ports L1, . . . , Lk, loop repeat ports R1, . . . , Rk, and loop exit ports X1, . . . , Xk. The example in Figure 3.4 shows a pair (2-tuple) of values being used for I, R, L and X, one for each loop-variant value.

For some purposes the L and X ports could be combined, as both represent outputs within, or exiting, a loop (the values are identical, while the C input merely selects their routing). This is avoided for two reasons: (i) our semantics for VSDGs (Section 3.4) require separation of these concerns; and (ii) our construction of Gnoloop (Section 3.5.1) requires it.

The θ-node directly implements 0-trip loops (while, for); 1-trip loops (do...while, repeat...until) can be synthesised by code duplication, addition of boolean flags, or augmentation of the semantics to support 1-trip loops.

Code duplication has two important benefits: it exposes the first loop iteration to optimization (cf. loop-peeling [86]), and it normalizes all loops to one loop structure, which both reduces the cost of optimization and increases the likelihood of two schematically-dissimilar loops being isomorphic in the VSDG. A boolean flag avoids duplicating code, but increases the complexity of the loop predicate, and does not benefit from normalization. Finally, augmenting the semantics with an additional loop node type does not increase code size per se, but does reduce the benefits of loop normalization.

Note also that the VSDG neither forces loop-invariant code (Section 3.8.3) into nor out of loop bodies, but rather allows later phases to determine, by adding serializing edges, the placement of loop-invariant nodes.
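As a concrete illustration of the code-duplication route, a 1-trip loop can be rewritten into the 0-trip form that the θ-node models directly by peeling one copy of the body. In the sketch below, body() and cond() are hypothetical placeholders for an arbitrary loop body and predicate.

/* Hypothetical helpers standing in for an arbitrary loop body and predicate. */
extern void body(void);
extern int  cond(void);

/* 1-trip form as written by the programmer: */
void one_trip(void)
{
    do {
        body();
    } while (cond());
}

/* Equivalent 0-trip form after duplicating the body once (cf. loop peeling): */
void zero_trip(void)
{
    body();
    while (cond())
        body();
}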

3.4 Semantics of the VSDG

Given the definition of the VSDG from the previous section, we define pull (or lazy) semantics in terms of CCS operators [78]. We also compare the pull semantics with the traditional push semantics of a data flow machine. We use the following notations:

• inputs are Ii,
• outputs are Q,
• Z is a reinitialisation port needed for loops,
• θ- and γ-nodes are augmented with a C (condition) input,
• we write '+' for choice,
• '|' for concurrency,
• '.' for serial composition,
• 'I?x' for a read from port I, placing the value read into x, and
• 'Q!e' for a write to port Q of the value of expression e.

In the demand-driven pull semantics, each input port I (including C) has an associated request port Ireq (a null, i.e., pure, synchronisation) which is written to before the value on I is read. Similarly, output ports Q await input on Qreq before proceeding. The pull semantics for the main types of VSDG node are shown in Figure 3.5 (page 58). From these it is trivial to generate the pull semantics for the remaining VSDG nodes.

3.4.1 The VSDG's Pull Semantics

The semantics of the constant nodes (integer, float, and identifier) are all identical:

const_pull(v) =def Q^req?().Q!v.const_pull(v)

const_pull(v)    =def Q^req?().Q!v.const_pull(v)

minus_pull(i, r) =def (Z?().minus_pull(false, r))
                  + (Q^req?(). if i then Q!r.minus_pull(true, r)
                               else (I1^req!() | I2^req!()).(I1?x | I2?y).
                                    Q!(x − y).minus_pull(true, (x − y)))

γ_pull(i, r)     =def (Z?().γ_pull(false, r))
                  + (Q^req?(). if i then Q!r.γ_pull(true, r)
                               else C^req!().C?c.
                                    if c then IT^req!().IT?x.Q!x.γ_pull(true, x)
                                         else IF^req!().IF?x.Q!x.γ_pull(true, x))

θ_pull(i, r)     =def (Z?().θ_pull(false, r))
                  + (X^req?(). if i then X!r.θ_pull(true, r)
                               else I^req!().I?x.θ'_pull(x))

θ'_pull(r)       =def C^req!().C?c.
                       if c then Z^req!().R^req!().(L^req?().L!r.R?x.θ'_pull(x) + R?x.θ'_pull(x))
                            else X!r.θ_pull(true, r)

ld_pull(i, r)    =def (Z?().ld_pull(false, r))
                  + (Q^req?(). if i then Q!r.ld_pull(true, r)
                               else (A^req!() | S^req!()).(A?a | S?s).
                                    let x′ = READ(s, MEM[a]) in Q!x′.ld_pull(true, x′))

vld_pull(i, r)   =def (Z?().vld_pull(false, r))
                  + (Q^req?(). if i then Q!r.vld_pull(true, r)
                               else (A^req!() | S^req!()).(A?a | S?s).
                                    let (x, s′) = READ(s, MEM[a]) in
                                    Q!(x, s′).vld_pull(true, (x, s′)))

st_pull(i, s)    =def (Z?().st_pull(false, s))
                  + (Q^req?(). if i then Q!s.st_pull(true, s)
                               else (A^req!() | D^req!() | S^req!()).(A?a | D?d | S?s).
                                    let s′ = WRITE(s, MEM[a], d) in Q!s′.st_pull(true, s′))

Figure 3.5: Pull semantics for the VSDG. Note that the auxiliary process θ'_pull for θ_pull has a state variable which is used to pass the next iteration value, read on the R port, to the repeat-test. In general, r is a tuple r0, r1, . . . , rN−1 of which at most one component is state. All nodes, other than const, have a process state variable i which indicates either that a node has been initialized (i = false) and must re-evaluate its input ports, or that the node has completed execution and so responds to any pull request with its last computed value (i = true). Nodes are reset by a pull on the Z port.

The constant node waits for a pull on its Qreq port, responds with the value, v, of the node on its Q port, and then continues to wait.

Unary and binary ALU nodes (i.e., all arithmetic, logic, shift and comparison nodes) share the same behaviour. As well as the Q port (and its associated Qreq port), an ALU node has one or two (depending on its arity) value-dependency ports, In, and an initialize port, Z.

A node can have multiple successors in the VSDG. Each successor can pull the node's value zero or more times; the semantics ensure that the node evaluates its predecessors exactly once (in the case of a loop-variant node it must evaluate exactly once per loop iteration). Each node thus exists in one of two states, evaluated and unevaluated, which in Figure 3.5 is indicated by the i process state variable. Consider the minus node semantics:

minus_pull(i, r) =def (Z?().minus_pull(false, r))
                  + (Q^req?(). if i then Q!r.minus_pull(true, r)
                               else (I1^req!() | I2^req!()).(I1?x | I2?y).
                                    Q!(x − y).minus_pull(true, (x − y)))

A node waits for a pull on one of its two ports: the Z port and the Qreq port. A pull on the Z port puts the node back into the unevaluated state. The result of a pull on the Qreq port depends on the state of the node. If the node is in the evaluated state (i = true) then it responds with the value computed the last time it was evaluated (i.e., since it was last initialised). However, if the node is unevaluated (i = false) then:

1. The node sends, in parallel, requests to its two predecessor nodes via ports I1^req and I2^req.

2. The node waits for both values to be returned, storing them in x and y.

3. The node computes the result (x − y for this node) and emits the result on its output port Q.

4. Finally, the node returns to waiting on its input ports, but now in the evaluated state (i = true). A small C sketch of this demand-driven behaviour follows.
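The following is a minimal sketch in C (hypothetical data structures, not the thesis implementation) of this demand-driven evaluation: the evaluated flag plays the role of the process state variable i, pull() corresponds to a request on Qreq, and reset() models a pull on the Z port.

#include <stdbool.h>

struct node {
    struct node *left, *right;     /* value-dependency ports I1, I2        */
    int  (*op)(int, int);          /* the node's computation; NULL for a
                                      constant node, whose value is preset */
    bool evaluated;                /* i = false / true                     */
    int  value;                    /* r, the last computed value           */
};

static int pull(struct node *n)
{
    if (n->evaluated)              /* i = true: answer with the cached r   */
        return n->value;
    if (n->op) {                   /* request and read I1 and I2           */
        int x = pull(n->left);
        int y = pull(n->right);
        n->value = n->op(x, y);
    }
    n->evaluated = true;           /* move to the evaluated state          */
    return n->value;
}

static void reset(struct node *n)  /* pull on Z: back to unevaluated       */
{
    n->evaluated = false;
}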

The γ-node has an additional C (condition) port to pull on the conditional expression (from this point on we will assume the full request-response behaviour), and IT and IF for the true and false values respectively.

γ_pull(i, r) =def (Z?().γ_pull(false, r))
              + (Q^req?(). if i then Q!r.γ_pull(true, r)
                           else C^req!().C?c.
                                if c then IT^req!().IT?x.Q!x.γ_pull(true, x)
                                     else IF^req!().IF?x.Q!x.γ_pull(true, x))

If the value pulled from the conditional, c, is true, then the γ-node pulls on the IT port, and if c is false it pulls on the IF port.

The θ-node is of particular interest as it is described with the aid of an auxiliary process θ'_pull. The main process θ_pull follows the same pattern as the previous semantics, with process state variable i and port Z together determining whether the node requires evaluating or not.

θ_pull(i, r) =def (Z?().θ_pull(false, r))
              + (X^req?(). if i then X!r.θ_pull(true, r)
                           else I^req!().I?x.θ'_pull(x))

The auxiliary process θ'_pull requires careful explanation. Note that θ'_pull is only executed when the loop body is (re)evaluated.

θ'_pull(r) =def C^req!().C?c.
               if c then Z^req!().R^req!().(L^req?().L!r.R?x.θ'_pull(x) + R?x.θ'_pull(x))
                    else X!r.θ_pull(true, r)

Exactly as for the γ-node, execution of the θ-node first pulls on the C port. If the result of the conditional expression is true, then an initialise request is sent to all the nodes within the loop4 via the Zreq port. It then sends a request on the Rreq port, and waits for values on the R port (while optionally responding to pulls on the L port). Once the new loop-variant values have been pulled into x, θ'_pull iterates with the new loop-variant values. If the conditional value is false, then the auxiliary process returns the final loop-variant values through the X port, and continues with the parent process.

Finally, the memory access nodes are perhaps the most semantically complex, due to the explicit interaction with the state. Consider the st_pull semantics:

st_pull(i, s) =def (Z?().st_pull(false, s))
               + (Q^req?(). if i then Q!s.st_pull(true, s)
                            else (A^req!() | D^req!() | S^req!()).(A?a | D?d | S?s).
                                 let s′ = WRITE(s, MEM[a], d) in Q!s′.st_pull(true, s′))

As before, a pull on the Z port initialises the node. Pulling the output (state) port Q initiates (if the node was previously unevaluated) pulls on the A (address value), the D (data value) and the S (predecessor state) ports. The values pulled then update the memory cell, producing the new state s′. This is returned through the Q port.

3.4.2 A Brief Summary of Push Semantics

The pull semantics warrant comparison with the traditional push semantics of such graphs as the Program Dependence Graph and the Program Dependence Web. First, push semantics are described, and then a comparison is made between the pull semantics and the push semantics.

We first introduce two new nodes: the split node (which is the push equivalent of the γ-node in that it combines data flow and control flow) and the merge node, which has no pull equivalent. The split node takes a value, d, and a condition, c, and outputs the value to one of the dT or dF outputs, depending on the condition value. Each split node has a matching merge node, which recombines the values after a split node. It has two inputs, i1 and i2, and output d. The merge node outputs the values that arrive at i1 and i2. Unfortunately, deciding the optimal placement of split nodes is NP-Complete [108]. For the purposes of this discussion we will assume that split nodes have already been inserted.

4That is, all nodes which are post dominated by the θtail node and are successors of the θhead node.

In the following descriptions we refer to nodes as producers, which generate values, and consumers, which consume values generated by producers. We also assume that edges behave as blocking queues of length 0 (i.e., unbuffered with no state), and that nodes execute immediately (i.e., there is no difference between one node consuming one value from a producer node, and ten nodes consuming the same value from a single producer node).

The push semantics for the const node is very similar to the pull semantics:

const_push(v) =def Q!v.const_push(v)

The output Q blocks until the value is consumed by a consumer node. Compared to the pull semantics, the notion of request is implied by the blocking nature of the output. This implied blocking behaviour has the benefit of simplifying the semantic description of nodes. For example, the push semantics for the minus node is

minus_push(x, y, r) =def ((I1?x | I2?y).minus_push(x, y, (x − y)))
                      + (Q!r.minus_push(x, y, r))

This describes the behaviour of either receiving a value on one (or both) of its inputs I1 and I2 and computing the arithmetic result, or of outputting the value r when output Q is ready to push the resulting value (i.e., when a consumer is ready for it). The use of the auxiliary variable r parallels the runtime behaviour of computing the arithmetic value and storing it in a register.

The split node has two inputs: a data input I and a control input C.

split_push(d, c) =def ((I?d | C?c).split_push(d, c))
                   + (if c then T!d.split_push(d, c))
                   + (if ¬c then F!d.split_push(d, c))

The split node waits for a value arriving on either the I or C port, and retains this information. The predicate guards on the second and third clauses ensure that only those successors which should be served values are satisfied. It emits the d value on either the T or F port depending on the condition value c. Remember that outputs block waiting for a consumer to consume a value; the second clause will block on T!d, and the third clause on F!d.

The push semantics for the merge node are similar to those of the minus node.

merge_push(t, f, c, r) =def ((C?c | IT?t | IF?f).merge_push(t, f, c, (if c then t else f)))
                         + (Q!r.merge_push(t, f, c, r))

The corresponding push semantics for loops assumes the placement of loop entry and exit nodes (Section 3.5.3). Construction of the semantics then proceeds as shown above.

3.4.3 Equivalence Between Push and Pull Semantics

Suppose we have two nodes: a producer node p and a consumer node q (Figure 3.6). The push and pull graphs for this relationship are the same, but note that we need additional split nodes for the pull semantics. The push semantics explicitly shows the flow of data, and implies the flow of data requests. Conversely, in pull semantics the flow of data is implied as a consequence of requesting the data5.

5This is comparable to the edges in the demand PDG.


Figure 3.6: Equivalence of push (a) and pull (b) semantics. The solid lines follow the standard edges within the specified semantics. The dashed lines indicate the implied edges.

Lemma 3.1 (Pull/Push Equivalence) Pull semantics and push semantics describe the same process but from different points of view: push semantics considers producers, while pull semantics considers consumers. Both semantics describe the flow of requests for data and the corresponding flow of the requested data.

PROOF We sketch the proof by induction as follows. Consider the base case of a single- value producer and a single-value consumer (Figure 3.6), where the solid edges indicate 0-length queues (i.e., they block, but have no state). In the push semantics, assume that p is ready to push its data into q. When q is ready for p’s data it consumes the next data from its input, implicitly requesting data from p. In the pull semantics, again assume p is ready to produce its data. When q is ready for p’s data it sends a request for (“pulls on”) p’s data, which it then consumes on arrival from p via the implied data flow. Both semantics describe p producing one data value, and q consuming one data value. The inductive step is intuitive: if p has two consumers, then in the push semantics p will push two instances of its data value into consumers q1 and q2, and in the pull semantics p will receive two requests, one each from q1 and q2, and send the same value to both. 

Loops in push semantics are particularly interesting in two aspects:

• Push semantics relies on the pushing of new values into nodes to force recomputation; pull semantics employs lazy evaluation (nodes only recompute if necessary, otherwise returning the last computed value), so a separate mechanism (the initialise input) is necessary to force nodes in a loop to recompute rather than simply return the last computed value.

• Loops described in push semantics are difficult to start (hence the need for the non-deterministic merge node), run until forced to stop, and then need η nodes to extract the resultant values from the loop body. The pull semantics of loops are simpler: the initial pull on the output starts the loop, which iterates while the control expression is true, stopping when the control expression is false and producing the final output value.

3.4.4 The Benefits of Pull Semantics

The pull semantics of the VSDG have several important benefits, which aid both program comprehension and optimization.

1. The pull semantics avoids the non-deterministic merge nodes of the GSA form.

2. The flow of data is implied in the pull semantics, with the request for data made explicit. The semantics are more explicit about control flow, avoiding the need for split nodes.

3. The closer connection between data flow and data request allows the VSDG to be less constrained during construction as regards the placement of nodes within loop bodies or predicates. By under-constraining the graph until explicit node placement is needed, there is greater freedom to optimize the program.

3.5 Properties of the VSDG

The VSDG as presented above has certain appealing properties (well-formedness and normalization) which we explain further below. We also note the correspondence between our θ-nodes and GSA form's µ, η and ⊗ nodes.

3.5.1 VSDG Well-Formedness

As in the VDG we restrict attention to reducible graphs (Section 3.7, page 73) for the VSDG. Any CFG can be made reducible by duplicating code or by introducing additional boolean variables. We further restrict attention to programs whose only loops are while-loops and which exit them through their unique loop exit (i.e., no jumps out of the loop). In order to specify the "all cycles in a VSDG are mediated by θ-nodes" restriction, it is convenient to define a transformation on VSDGs.

Definition 3.16 (VSDG Gnoloop Form) Given a VSDG, G, we define Gnoloop to be identical to G except that each θ-node θi is replaced with two nodes, θi^head and θi^tail; edges to or from ports I and L of θi are redirected to θi^head, and those to or from ports R, X, and C are redirected to θi^tail.
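Read operationally, Definition 3.16 is a simple edge-redirection pass. The sketch below is a minimal illustration under assumed node and edge representations (not the thesis implementation) of the transformation for a single θ-node.

#include <stddef.h>

enum port { PORT_I, PORT_L, PORT_R, PORT_X, PORT_C };

struct vnode;                    /* opaque VSDG node                        */

struct endpoint {                /* one end of a dependency edge            */
    struct vnode *node;
    enum port     port;
};

/* Split one theta node: edge endpoints on ports I and L are redirected to
 * the new head node; endpoints on ports R, X and C go to the new tail node. */
static void split_theta(struct vnode *theta,
                        struct vnode *head, struct vnode *tail,
                        struct endpoint *ends, size_t nends)
{
    for (size_t i = 0; i < nends; i++) {
        if (ends[i].node != theta)
            continue;
        switch (ends[i].port) {
        case PORT_I:
        case PORT_L:
            ends[i].node = head;      /* initial values and loop-body uses  */
            break;
        case PORT_R:
        case PORT_X:
        case PORT_C:
            ends[i].node = tail;      /* repeat values, exits and condition */
            break;
        }
    }
}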

We then require Gnoloop to be an acyclic graph. Figure 3.7 shows the acyclic version of Figure 3.4. Note that for computing DFR (Section 3.2.4) loop bodies are traversed once, such that a θ-node has two DFRs, one each for the θhead and θtail nodes.

When adding serializing edges we must maintain this acyclic property. To serialize nodes connected to a θ-node's X port we add serializing edges to θtail; all nodes within the body of the loop are on the sequential path from θtail to θhead; all other nodes are serialized before θhead. Definition 3.17 below sets out the conditions for a node to be within a loop.

Although this is merely a formal transformation, note that if we interpret θtail as a γ-node and interpret θhead as an identity operation then Gnoloop represents a VSDG in which each loop is executed zero or one times according to the condition. The formal definition of a VSDG being well-formed is then:


Figure 3.7: The acyclic θ-node version of Figure 3.4. The θhead and θtail node-pair define the loop entry and loop exit nodes within the VSDG. The loop body is marked as provisional: some nodes must be in the body of the loop (the add, subtract, and comparison nodes) as they depend on loop-variant variables, while other nodes (the constant nodes in this instance) that are loop-invariant can be placed either in or outside the loop body. This decision is left to later phases, based on criteria such as register pressure and instruction selection.

Definition 3.17 (Well-Formed VSDG) A VSDG, G, is well-formed if (i) Gnoloop is acyclic, and (ii) for each pair of (θhead, θtail) nodes in Gnoloop, θtail post dominates all nodes in succ+(θhead) ∩ pred+(θtail).

The second condition says that no value computed inside a loop can be used outside the loop, except via the X port of the loop's θtail node.

3.5.2 VSDG Normalization

The register allocation and code motion (RACM) algorithm presented in Chapter 6 assumes (for maximal optimization potential rather than correctness) that the VSDG has been normalized, roughly in the manner of 'hash-CONSing': any two identical nodes which have identical dependencies are assumed to have been replaced with a single node, provided that this does not violate the single-assignment (SSA) or single state properties of the VSDG. Section 3.8.2 describes this in detail. Consider Figure 3.8 on page 65, which shows two loads that, because of an intervening store, cannot be combined even though they load from the same address.

Note that this is a safe form of CSE (Section 3.8.2, page 79) and loop-invariant code lifting (Section 3.8.3, page 81); this optimization is selectively undone (node cloning) during the RACM phase when required by register pressure.

3.5.3 Correspondence Between θ-nodes and GSA Form

It is clear that there should be some connection or correspondence between the VSDG's θ-nodes and GSA form's µ, η and ⊗ nodes. As a first approximation, the µ node corresponds to the θhead node and the ηF node to the θtail node.

int f( int v[], int i )
{
    int a = v[i+1];
    v[7] = 0;
    return v[i+1] + a;
}

Figure 3.8: An example showing how some nodes cannot be combined without introducing loops into the VSDG. There will only be one node for the constant 4 ('1' in (a) after type size scaling), and one for the addition of this node to the second formal parameter (i+1). But there are two nodes for the load from v[i+1], because sharing this node would violate the single state property of the VSDG: a single load from v[i+1] would state-depend both on the state entering the function and on the modified state produced by the store to v[7].

The non-deterministic merge node, ⊗, is redundant due to the explicit loop behaviour of the θ-node.

A more detailed comparison considers the fundamental difference between the semantics of GSA form flow graphs and VSDG dependence graphs. In a flow graph, values enter the function through the entry node, and execution proceeds by pushing (flowing) values around the graph until the exit node is reached. Nodes execute when they have input values, and then proceed to push their result to their consuming successor nodes. A trivial example is straight line code:

Nentry → A → B → C → Nexit

A value, v, enters via the entry node Nentry and flows into node A, which computes value vA. This then flows into node B, which computes vB. Node C then computes, consuming value vB and producing value vC, which flows into the exit node, Nexit, and is returned to the function caller. The behaviour in the dependence form is similar, save for an initial flow of dependence from Nexit up through C, B, and A, to demand the function argument from Nentry; values and computation then proceed as in the data flow sense.

Loops require careful handling. The loop body must execute exactly the number of times specified by the source program. GSA form's µ node consumes either the initial value vinit or the next iteration value viter; the choice of which to consume is determined by the control value consumed by the predicate ρ input; the µ node executes (i.e., consumes and produces) if its control condition is true. The VSDG's θhead node performs a similar function: on the loop's initial evaluation, the θhead node pulls on its dependent nodes, which return the initial values I. It then makes these values available on its L port for the loop body and control nodes. Subsequent iterations of the loop (i.e., when the value pulled from the C port is true) see the θhead node copy the θtail node's R values to its L port in preparation for satisfying any dependencies from the loop body on the next iteration.

GSA form's ηF node consumes loop values as they iterate around the loop. Then, when its control condition becomes false, it produces this value on its output port. This alone does not guarantee termination of the loop; the control condition produced by the predicate node(s) must be false in order to terminate the loop. The VSDG's θtail node serves a similar purpose: it evaluates the loop control predicate and then either re-evaluates the loop body, or returns the final loop values if the predicate is false.

The important difference between the two forms is the necessity of the merge node, ⊗, in GSA form. At issue is the fact that the µ node executes (consumes its input value and produces its output) only when its control condition is true. However, its control condition is produced by the predicate node, which is waiting to consume values from the µ node. The ⊗ node breaks this deadlock by allowing the merging of an initial value (usually true) to start the loop, and the subsequent values produced during iteration. This deadlock is avoided in the VSDG, obviating the need for a corresponding merge node. The semantics of the θ-node explicitly describe both the computation of the predicate value and the execution of the loop body.

3.6 Compiling to VSDGs

Traditionally, SSA-form program graphs have been constructed from CFGs or PDGs, to which φ-functions are added to merge variables at join points [30]. A different approach is that of syntax-directed translation of source code into an SSA-form intermediate program graph, where φ-functions, or other merging functions, are inserted as the program graph is constructed [19].

3.6.1 The LCC Compiler

Our experimental compiler framework, VECC, uses Fraser and Hanson's LCC [42] C compiler. It is of modest size (about 11,000 lines of C), uses a hand-written recursive-descent parser, and benefits from a clean interface between the language-dependent front end and the target-dependent code generator.

We replace the intermediate and backend functions of the compiler with functions which directly emit VSDG nodes and edges (Appendix A). At the point in the compiler where these functions are called, all types have been decomposed into suitably aligned byte-addressed variables, and all pointer arithmetic, index scaling, and struct and union address offsets have been generated. Target-specific details, such as the sizes of the C data types, are described to the compiler with a machine description file.

During compilation LCC performs some simple transformations (strength reduction, constant folding, etc.), some of which are mandated by the language standard (e.g., constant folding). For each statement, LCC generates an abstract syntax tree (AST), recursively descending into source-level statement blocks. VSDG nodes and edges are generated directly from the AST with the aid of the symbol tables.

Symbols in the compiler are augmented with the current VSDG node name and a stack for storing VSDG node names, employing a similar approach to that of Brandis and Mössenböck [19]. In straight-line code each new assignment of a variable, v, updates the VSDG node name field in v's symbol table entry. Selection constructs use the name stack to save, and later restore, the names of the VSDG nodes when control paths split and later merge.

Expressions naturally translate into VSDG graphs, and control structures (if, while, etc.) generate γ- and θ-nodes. Functions generate special entry and exit nodes, as well as a compiler-defined intrinsic variable __STATE__, added to the symbol table at the function scope. VSDG nodes which touch the state graph access and/or modify this __STATE__ variable.

3.6.2 VSDG File Description

The compiler reads in a C source file and emits a VSDG file. The VSDG file is a human-readable description of the resulting VSDG. The syntax of the VSDG file is given in Appendix A. The key points are:

• A VSDG file contains one or more module definitions, each of which contains one or more data or function definitions. Each source file read by the compiler creates a new VSDG module, whose name is that of the source file. All data and functions within the source file are emitted into the module, providing module-level name scoping.

• Functions contain data definitions, nodes and edges.

• The name scoping rules follow those of C: names declared at the module scope are visible only within that module, unless given the public attribute; names declared within a function are only visible within that function. The compiler automatically renames function-scope static variables so that they are only accessible from within the enclosing function. For example, a static variable foo in function bar might be renamed LS 2, and would be defined at the module scope level, corresponding to the C source file scope.

3.6.3 Compiling Functions

Each C function generates a VSDG function definition. A C function can return at most one value, either a scalar (integer, float or pointer) value, or a struct value6; void functions do not return a value, only state.

6Most compilers implement functions returning structs using anonymous pointers and/or block copying; a few architectures support returning structs by combining multiple physical registers.

int glbVar;

int foo(int a, int b)
{
    int lclVar;
    lclVar = glbVar + a + b;
    return lclVar;
}

 1  module funcvsdg.c {
 2  public function foo (a,b), 1 {
 3      node node0 [op=ld,size=-4];
 4      node node1 [op=constid,value=_glbVar];
 5      edge node0:A -> node1 [type=value];
 6      edge node0:__STATE__ -> foo:__STATE__ [type=state];
 7      node node2 [op=add];
 8      edge node2:L -> node0:D [type=value];
 9      edge node2:R -> foo:a [type=value];
10      node node3 [op=add];
11      edge node3:L -> node2 [type=value];
12      edge node3:R -> foo:b [type=value];
13      node node4 [op=return];
14      edge node4:ret1 -> node3 [type=value];
15      edge node4:__STATE__ -> foo:__STATE__ [type=state];
16      edge foo:__STATE__ -> node4 [type=state];
17  }
18  data _glbVar [size=4];
19  }

Figure 3.9: An example of C function to VSDG function translation. The C function shown first produces the VSDG description that follows it (line numbers added for illustration purposes). A single module is generated from the source file funcvsdg.c. The function foo takes two named parameters, a and b, and returns one value. Node node0 (line 3) is a signed word load. The size parameter indicates both the type size (in this case four bytes) and whether it is a signed (negative value, as in this case) or unsigned (positive value) type. node0 is value-dependent on node1 for its address (lines 4, 5), and state-dependent on the function entry node (line 6). The first add node, node2 (line 7), is value-dependent on node0's D-port (line 8) and function parameter a (line 9). The second add node, node3, is similarly value-dependent on node node2 and function parameter b (lines 11, 12). Return node node4 (line 13) depends on both the value produced by node3 (line 14) and the initial state of the function (line 15). Finally, the function exits with the final state (line 16). The global variable glbVar is declared in line 18 with a size of four bytes, as specified for our chosen target machine (the ARM Thumb).

The VSDG function definition includes the names of the arguments, and the number of return values. The argument names are derived from the source names. During function generation the intrinsic variable __STATE__ is added to the function's symbol table7. Additionally, the function exit node state-depends on the final definition of the __STATE__ variable. An example is shown in Figure 3.9.

The compiler supports the use of registers for the first few function arguments as specified by the target's Application Binary Interface (ABI). On entry, the first NUM_ARG_REGS arguments are placed in registers; subsequent arguments are placed in special memory cells which are later translated into stack offsets by the target code generator.

A special VSDG node, the return node, provides an exit from a function while maintaining the consistency of the state graph. It value-depends on the return value of the function, if any, and also depends on state. It produces a new state, which may be depended on by subsequent nodes (e.g., the function exit node). Lines 13–16 of Figure 3.9 illustrate this.

Variadic functions require special handling. A variadic function is one which takes one or more fixed arguments, and zero or more optional arguments. An example of a variadic function is the standard library function printf, declared as:

7The C standard [22] specifies all names beginning with two underscores are reserved for the implementation.

int printf( char *fmt, ... );

The fixed argument is "fmt", and the "..." identifies where the optional arguments appear in the function invocation. The traditional stack model of the C machine pushes arguments right-to-left, thus guaranteeing that the fixed arguments (left-most) are always at a known location on the stack, and that subsequent arguments can be found by walking the stack.

This leads to two separate issues when compiling C into VSDGs: compiling variadic function invocations, and compiling variadic function definitions. The solution we adopt solves both problems. In essence, we compile a variadic function, f^V, such that the rightmost fixed argument and all of its optional arguments are placed in stack memory cells8. This is achieved by setting the number of function argument registers for f^V to the minimum of NUM_ARG_REGS and one less than the number of fixed arguments. For example, the printf function has one fixed argument, the format specifier string fmt. When compiling the function definition, we set the number of register arguments to zero, and generate a suitable function definition. When compiling a call to printf all arguments, including fmt, are placed in stack memory cells. Within the body of the function, references to the address of fmt will refer to the stack as expected.
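The register-count rule can be written down directly. The sketch below is a hypothetical helper, not compiler code from the thesis, and NUM_ARG_REGS is an assumed ABI constant; for printf (one fixed argument) it yields zero register arguments, as described above.

#define NUM_ARG_REGS 4   /* assumed ABI value; the real number is target-specific */

/* Number of leading arguments passed in registers for a call to a function
 * with 'nfixed' fixed parameters.  For a variadic function the rightmost
 * fixed argument (and everything after it) must live in stack memory cells
 * so that va_start() can take its address and walk the remaining arguments. */
static int reg_args_for_call(int nfixed, int is_variadic)
{
    int limit = is_variadic ? nfixed - 1 : nfixed;
    return limit < NUM_ARG_REGS ? limit : NUM_ARG_REGS;
}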

3.6.4 Compiling Expressions

Expressions in C produce two results: the algebraic result as expected by the programmer (e.g., in "x=a+b" the programmer expects x to be assigned the sum of a and b), and side-effects such as accessing memory or calling functions. It is not safe to consider that operators such as '+' or '-' have no side-effects, since it is entirely possible that, for a given target, these operations could be performed by a library function which may raise exceptions. For example, many embedded processors implement floating point operations as calls to a support library supplied as part of the compiler.

The VSDG explicitly supports both results of expressions. The algebraic result is expressed as a value dependency, and the side-effect is represented as a state dependency. Memory accesses, both explicit such as array accesses, and implicit such as accesses to global or address-taken variables, depend on state, and in the case of stores and volatile loads, produce a new state. Again, in many cases it is not immediately clear from the source code whether or not an expression depends on, or modifies, state.

Conditional expressions produce exactly the same VSDG as if-statements (Section 3.6.5). This normalizing effect benefits program analysis by minimizing the number of different program forms. It also removes artifacts due to differing programming styles; e.g., if a is a local variable then the two statements below produce the exact same VSDG:

if ( P )
    a = 5;
else
    a = 6;

a = ( P ) ? 5 : 6;

8The rightmost fixed argument needs to be on the stack so that its address can be computed by the variable-argument access macros va_start, va_arg and va_end [89].

Compiling expressions walks the AST provided by the front end, emitting VSDG nodes and edges during traversal. Nodes which assign to variables (and this includes nodes which modify the __STATE__ variable) also update the symbol table information for each modified variable, setting the current VSDG node name (Section 3.6.1) to the names of the defining VSDG nodes.

3.6.5 Compiling if Statements

Both if-statements and conditional expressions are described in the VSDG by γ-nodes (Section 3.3.1.3). A γ-node has a control (C) value-dependency, and true (T) and false (F) dependencies, one of which is depended on as determined by the control value. Side-effect-free if-statements have only value-dependencies; if any predicated expression or statement block has state-changing nodes (i.e., function calls, stores, or volatile loads) then the γ-node will also have state-dependencies on its T and F ports.

Compiling if-statements requires compiling the predicated statements and expressions, with the addition of γ-nodes to merge multiple value- and state-dependencies. Prior approaches to compiling code into SSA-form add the φ-functions after the program graph has been generated, requiring effort in computing the minimal number of φ-functions to add [33]. Single-pass construction of SSA-form is not new. Brandis and Mössenböck [19] insert split nodes where control flow diverges as repositories for φ-function variable information. This information is later used to generate φ-functions on exit of loops and conditional statements.

Our approach is different in that the information necessary to generate γ-nodes is maintained as part of the symbol data within the symbol tables. Each symbol in the compiler's symbol table has a private name stack, onto which VSDG node names are temporarily stored, reflecting the "stack" of lexical scoping expressed in the source code. A simple example helps illustrate the concepts:

a = ... ;          /* node1 */
b = ... ;          /* node2 */

if ( P )
    a = 12;        /* node3 */
else
    b = a;

print( a, b );

The steps to compile this are as follows:

• On entry to the if, for each register variable push the current node name onto its node name stack (the current node name remains unchanged).
  For variable a, then, a.VSDGnode = node1 and its name stack becomes a.stack = {node1, . . .}.

• On exit of the then block and entry to the else block, for each register variable swap the top of the node name stack with the current node name. This restores the original variable node name prior to entry of the else block.
  Variable a was changed by the assignment to constant node node3, so after the swap a.VSDGnode = node1 and its name stack becomes:

a.stack = {node3, . . .}.

• On exit of the else block, for each register variable the current VSDG node name is compared with that on the top of the node name stack. If the names are different, then add edges from the gamma node T and F ports to the respective nodes.
  In the example, variable a requires a γ-node as a.VSDGnode ≠ a.stack[TOP]. Variable b was changed in the else block to node1, so that too will require a γ-node. A 2-tuple γ-node is emitted, with two true edges γ_T(a) → node3 and γ_T(b) → node2 for variables a and b respectively, and two false edges γ_F(a) → node1 and γ_F(b) → node1.

• Finally, for each register variable pop the old name off the top of the node name stack (undoing the initial push) and set the current VSDG node name to that of the γ-node if the two names were different.
  To complete the example, after the if-statement a.VSDGnode = γ(a) and b.VSDGnode = γ(b). A small C sketch of these stack operations follows.
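The following is a minimal sketch in C of the per-variable name stack (hypothetical data structures and helper names, not the compiler's own code): if_enter saves the current name on entry to the if, if_else swaps it at the then/else boundary, and on exit the caller emits a γ-node whenever the saved and current names differ.

#include <string.h>

#define MAX_NEST 32

struct symbol {
    const char *vsdg_node;            /* current defining VSDG node name     */
    const char *stack[MAX_NEST];      /* saved names, one per nesting level  */
    int top;
};

static void if_enter(struct symbol *s)        /* entry to the if             */
{
    s->stack[s->top++] = s->vsdg_node;        /* push current name           */
}

static void if_else(struct symbol *s)         /* then -> else transition     */
{
    const char *t = s->stack[s->top - 1];     /* swap top of stack with      */
    s->stack[s->top - 1] = s->vsdg_node;      /* the current name            */
    s->vsdg_node = t;
}

/* On exit of the else block: pop the saved 'then' name; if it differs from
 * the current ('else') name, return it so the caller can emit a gamma node
 * with T -> then name and F -> current name.  Otherwise return NULL. */
static const char *if_exit_needs_gamma(struct symbol *s)
{
    const char *then_name = s->stack[--s->top];
    return strcmp(then_name, s->vsdg_node) ? then_name : NULL;
}

static void if_set_gamma(struct symbol *s, const char *gamma_name)
{
    s->vsdg_node = gamma_name;                /* variable now named by gamma */
}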

This algorithm leads to an efficient implementation. Each register variable requires one additional field for the current VSDG node name, and a name stack of depth proportional to the worst-case nesting depth of the program. Empirical evidence [65] suggests that many programs have at most three loop nesting levels9. Applying the same argument to γ-nodes, we can say with some degree of certainty that for all practical programs, there is a constant increase in the runtime storage and processing requirement per variable.

3.6.6 Compiling Loops

Loops are perhaps the most interesting part of the C language to compile into VSDGs. They may never execute, they may execute exactly once, they may execute a number of times, or they may execute forever. Some variables in loops change (are variant) while others remain constant (invariant). The controlling predicate may be executed before the loop body (for and while loops) or after (do...while loops). In addition, loops can restart with continue, or finish early with break. Note that both maintain the semantics of loops as they can be synthesized from boolean variables and γ-nodes.

For compiling loops to VSDGs two restrictions are enforced: firstly, that there is exactly one entry node into the loop; and secondly, that there is exactly one exit node out of the loop. These two nodes correspond to the two components of the θ-node: θhead and θtail respectively. During compilation we treat all register variables (i.e., non-address-taken local variables) as live over the entire loop body. Delaying the optimization of loop-invariant edges greatly simplifies the compiler.

In order to preserve the terminating property of a program, all loops for which termination cannot be determined10 modify state. Remember that state also describes progress (imagine a clock ticking), so each iteration of the loop by itself modifies state, and thus the state edge from θtail to θhead must be preserved.

9This famous empirical study analysed approximately 40,000 cards of human-generated FORTRAN programs. As to whether the same results would be found with today's increasing use of machine-generated source code remains to be seen.

10Deciding this for the general case is equivalent to solving the halting problem.

For example, consider the following function:

int foo( void )
{
    int i = 0;
    while( 1 ) {
        i++;
    }
    return 0;
}

It is trivial in this example to determine that the loop control expression is always true; in the general case it might not be so easy to determine this. Also note that even if the loop did terminate, the final value of variable i has no effect on the result of the function. By keeping the state edge in the loop, a dead node elimination pass (Section 3.8.1) will remove the redundant variable and produce the following, smaller, function:

int foo( void )
{
    while( 1 ) {
        /* empty */
    }
    return 0;
}

Note that the return statement also depends on state, as every function returns a (possibly modified) state.

Two special nodes modify the runtime execution of loops: break and continue. Both nodes have the exact same value and state dependencies as the θtail node, and produce a new state11. These nodes indicate early termination (break) or restart (continue) of loops. During target code generation they generate jumps to the loop exit and loop entry labels respectively.

The steps to compile a loop are described with the aid of the following example code:

a = 1 ;                        /* local variable (nodeP) */
b = 2 ;                        /* global variable (nodeQ stores to loc. B) */
for ( i = 0; i < 10; ++i ) {
    a = a + 10;                /* nodeR */
    b = b * 2;                 /* nodeS */
}
print( a, b );

• Before entry to the loop, for each register variable we add an edge from the θhead node I-port to the source node.
  For the example, the value edge θhead_I(a) → nodeP and the state edge θhead_I(__STATE__) → nodeQ are emitted (note that nodeQ is a store node, generating a new state).

11While these nodes do more than simply consume values, treating them as generating a new state maintains the consistency of the state graph within the loop.

• On entry to the loop, all register variables are updated to refer to the θhead node L-port.
  For the example, the VSDG node name for variable a is updated to θhead_L(a), and variable __STATE__ is updated to θhead_L(__STATE__).

• Before exit from the loop, edges are added from the θtail node R-port to the register variable source nodes.
  For the example, two edges are added: θtail_R(a) → nodeR and θtail_R(__STATE__) → nodeS (note that nodeS is a store node, generating a new state).

• On exit from the loop, all register variables are updated to refer to the θtail node X-port.
  For the example, the VSDG node name for variable a is updated to θtail_X(a), and variable __STATE__ is updated to θtail_X(__STATE__).

The resulting VSDG is shown in Figure 3.10 on page 74. This has been produced directly from the output of our compiler, and also shows the body and control regions of the loop.

3.7 Handling Irreducibility

A directed graph is either reducible, in that through various transformations it can be reduced to a single node, or irreducible if it cannot be. Directed acyclic graphs are always reducible, and thus all VSDGs of the Gnoloop form are reducible. However, not all C programs can be compiled to reducible graphs. This section describes reducibility, what irreducibility means in practice (with some examples in C), and how irreducible functions can be compiled into VSDGs. Note that for the remainder of this section we assume all VSDGs are in the Gnoloop form and have no unreachable nodes; see Section 3.8.1 for a description of a dead node elimination algorithm applicable to VSDGs.

3.7.1 The Reducibility Property

Reducibility is a property of a graph such that it can be reduced to a single node. The original interval method of computing reducibility was first presented by Cocke for global common subexpression elimination [28]. A graph G(N, E) is said to be reducible if its edges can be partitioned into the following two sets:

1. The forward edges, EF , form a DAG GDAG (N, EF ) in which every node can be reached from the initial node of G.

2. The back edges, EB, consist of only edges whose heads dominate their tails.

The properties of the Gnoloop form VSDGs (Section 3.5) ensure that all such VSDGs are reducible:

Lemma 3.2 All well-formed VSDGs are reducible.

PROOF All well-formed VSDGs are DAGs, therefore the first reducibility criterion holds for all well-formed VSDGs. Consequently, there are, by definition, no loops in well-formed VSDGs. Thus EB = ∅, so the second criterion for reducibility is guaranteed to hold for all well-formed VSDGs. Therefore all well-formed VSDGs are reducible. □


Figure 3.10: VSDG of the example code loop from the text.


Figure 3.11: The upper graph (a) is reducible to one node ABCDEFG, while the lower graph (b), differing in only one edge (G → D), is irreducible, as it cannot be reduced any further after the second round of T2 transformation.

A later method, proposed by Hecht and Ullman, and found in many of the popular compiler texts [6, 83], uses two transformations, T1 and T2. The approach is to repeatedly apply T1 and T2 to the graph until no further transformations apply; if the graph has been reduced to a single node then the original graph is reducible, otherwise it is irreducible. The two transformations, T1 and T2, are:

T1: If there is a loop edge n → n, then delete that edge.

T2: If there is a node n (not the entry node) that has a unique predecessor, m, then m may consume n by deleting n and making all successors of n (which may include m) be successors of m.

A small C sketch of this reduction procedure follows.
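The sketch below is a minimal illustration, under assumed representations (a small adjacency matrix, not the thesis implementation), of testing reducibility by exhaustively applying T1 and T2; it assumes every node is reachable from the entry node.

#include <stdbool.h>

#define MAXN 16

/* Repeatedly apply T1 (delete self-loops) and T2 (merge a node into its
 * unique predecessor).  The graph is reducible iff it collapses to one node. */
static bool reducible(bool edge[MAXN][MAXN], int n, int entry)
{
    bool alive[MAXN];
    int  live = n;
    for (int i = 0; i < n; i++) alive[i] = true;

    for (bool changed = true; changed && live > 1; ) {
        changed = false;
        for (int v = 0; v < n; v++) {
            if (!alive[v]) continue;
            if (edge[v][v]) { edge[v][v] = false; changed = true; }  /* T1 */
            if (v == entry) continue;
            int pred = -1, npred = 0;
            for (int u = 0; u < n; u++)
                if (alive[u] && edge[u][v]) { pred = u; npred++; }
            if (npred == 1) {                                        /* T2 */
                for (int w = 0; w < n; w++)
                    if (edge[v][w]) { edge[pred][w] = true; edge[v][w] = false; }
                edge[pred][v] = false;    /* delete v, inheriting its edges */
                alive[v] = false;
                live--;
                changed = true;
            }
        }
    }
    return live == 1;
}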

An example of a reducible graph, and its reduction to a single node, is shown in Figure 3.11.

send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}

Figure 3.12: The original presentation of "Duff's Device". The switch is used as a computed goto, jumping to one of the case labels inside the while loop. This is presented as a portable loop-unrolled block copy, as an alternative to the standard C library memcpy or memmove functions. It is clearly irreducible: there are multiple entries into the loop (eight in this case).

3.7.2 Irreducible Programs in the Real World

Informal experiments over 22,000 functions indicate that functions with irreducible flow graphs account for about 0.07% of those functions, most of which appear to be machine-generated code.

One interesting source of irreducibility arises from unstructured, or multiple, jumps (edges) into loops. Such edges add additional predecessor nodes to the arrival node (the loop node at the tail of the flow edge), thus blocking transformation T2 from consuming the arrival node.

Another source of irreducibility is unreachable nodes. These fail the first property of reducibility, in that such nodes are unreachable from the entry node. An unreachable node elimination pass can remove such nodes (Section 3.8.1).

One interesting program structure that is irreducible is Duff's Device [37], named after Tom Duff. His original C source is shown in Figure 3.12. Fortunately such programs are rare. In fact, Duff presents his device with the following observation: "Many people have said that the worst feature of C is that switches don't break automatically before each case label. This code forms some sort of argument in that debate, but I'm not sure whether it's for or against."

3.7.3 Eliminating Irreducibility

So, irreducible programs do exist, but are generally quite rare. This leads to two approaches to dealing with irreducible graphs.


Figure 3.13: The irreducible graph of Figure 3.11 (b) is made reducible by duplicating node D. The alternative would be to duplicate node EFG instead, but this would result in a larger program (assuming |EFG| > |D|).

3.7.3.1 Avoiding Irreducibility

Irreducibility can be avoided at the language level by removing explicit gotos, including computed gotos (i.e., switch). This is the current strategy in the VSDG compiler framework: the compiler rejects switch and goto statements, reporting the error to the user.

3.7.3.2 Handling Irreducibility

The alternative to avoiding irreducibility is to remove it through transformation. The general approach to transforming irreducible graphs into reducible graphs is through code duplication [58]. In the worst case this approach leads to exponential code growth, which is clearly undesirable. For example, the irreducible graph in Figure 3.11 (b) can be made reducible by duplicating node D, and then reducing by the usual means (Figure 3.13).

3.8 Classical Optimizations and the VSDG

Optimizations on the VSDG can be considered as a combination of graph rewriting and edge or node marking. In this section we describe how five well-known optimizations (dead code elimination, common subexpression elimination, loop-invariant code motion, partial redundancy elimination, and reassociation) are applied to the VSDG data structure.

The method of graph rewriting replaces one sub-graph that matches some pattern or some other specification with another sub-graph (which may be smaller, faster, cheaper, etc.). This process continues until no further matching sub-graphs are found. Assmann [10] discusses in considerable detail the application of graph rewriting as a means of program optimization. Examples of rewriting-based optimizations include: constant folding (replacing a computation involving constants with the constant value that would be computed on the target machine); strength reduction (replacing an expensive computation with a cheaper one); and predicate reduction (replacing an expensive predicate expression with a cheaper one). Note that in this thesis, cost is a measure of code size. It is equally valid to equate cost with execution speed, cache performance, or any other desired metric.

The second class of transformation walks over the graph marking edges or nodes if they meet some criteria, and then performing a global operation on all marked (or unmarked) objects. Dead node elimination is a good example of this kind of optimization: the VSDG is walked from the exit node marking all nodes that can be reached; any nodes not so marked are unreachable, and can therefore be deleted.

3.8.1 Dead Node Elimination

Dead node elimination (DNE) combines both dead code elimination and unreachable code elimination (both [6]). The former removes code which has no effect on the result of the function (global dead code elimination extends this to include the whole program), while the latter removes code which will never execute (i.e., there is no path in the CFG from the function entry point to the code in question).

Dead code generates VSDG nodes for which there is no value- or state-dependency path from the function exit node, i.e., the result of the function does not in any way depend on the results of the dead nodes. Unreachable code generates VSDG nodes that are either dead, or become dead after some other optimization (e.g., γ-folding).

Definition 3.18 A dead node is a node that is not post dominated by the exit node N∞.

The method described here (Algorithm 3.1) is both simple and safe. It is simple in that only two passes over the VSDG are required resulting in linear runtime complexity: one pass to identify all of the live nodes, and a second pass to delete the unmarked (i.e., dead) nodes. It is safe because all nodes which are deleted are guaranteed never to be reachable from (i.e., are not dominated by) the exit node.

Algorithm 3.1 Dead Node Elimination

Input: A VSDG G(N, EV, ES, N∞) with zero or more dead nodes.

Output: A VSDG with no dead nodes.

Method:

1. WalkAndMark( n ) =
       if n is marked then finish;
       mark n;
       ∀ { m | m ∈ N ∧ (n, m) ∈ (EV ∪ ES) } do
           WalkAndMark( m );
   in WalkAndMark( N∞ );

2. ∀ n ∈ N do
       if n is unmarked then delete(n).

We now show that dead node elimination has linear runtime complexity.

Lemma 3.3 Dead node elimination has runtime complexity O(|N|).

PROOF By inspection. The analysis pass, WalkAndMark, visits at most all nodes, and for each node also visits all its dependent nodes. For all nodes, there will be at most kmaxarity dependent nodes, which for all practical purposes is a constant. This pass runs in O(kmaxarity|N|) time. The second pass walks a list of the nodes in the graph, deleting those not marked. This pass runs in O(|N|) time. Therefore the total running time of dead node elimination is O(kmaxarity|N| + |N|) ≃ O(|N|). □
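The two passes of Algorithm 3.1 translate almost directly into code. The sketch below uses an assumed adjacency representation (not the thesis data structures): each node lists the nodes it depends on, the first pass marks everything reachable from the exit node, and the second pass removes whatever is left unmarked.

#include <stdbool.h>
#include <stddef.h>

struct dnode {
    struct dnode **deps;   /* value- and state-dependencies of this node */
    size_t ndeps;
    bool marked;
    bool deleted;
};

/* Pass 1: depth-first walk from the exit node, marking live nodes. */
static void walk_and_mark(struct dnode *n)
{
    if (n->marked)
        return;
    n->marked = true;
    for (size_t i = 0; i < n->ndeps; i++)
        walk_and_mark(n->deps[i]);
}

/* Pass 2: every unmarked node is dead and can be removed. */
static void dead_node_elimination(struct dnode **nodes, size_t nnodes,
                                  struct dnode *exit_node)
{
    walk_and_mark(exit_node);
    for (size_t i = 0; i < nnodes; i++)
        if (!nodes[i]->marked)
            nodes[i]->deleted = true;   /* or unlink and free the node */
}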


Figure 3.14: The VSDG of the example code from page 79 is shown in (a). After a round of dead node elimination (DNE) and constant folding, the unreachable const nodes have been removed, and the predicate has been folded into a constant false (b). Then γ-folding removes the γ-node (c), and a final DNE pass removes the last of the unreachable const nodes (d).

Dead node elimination by itself only eliminates nodes which cannot be reached from the function exit node, i.e., nodes which would be marked unreachable in the CFG by using only control flow analysis and live range analysis. For example, consider the following program:

i = 4;             // S1
i = 3;             // S2

if ( i == 4 )      // P1
    j = 2;         // S3
else
    j = 3;         // S4

return j; // S5

j = 6; // S6

Assignment S1 is killed by S2 without ever being used, so there will be no successors in the VSDG. Similarly, S6 is dead because S5 exits the function, so there is no path in the VSDG from the exit node to the constant node assigned to j in S6. Both S1 and S6 can be eliminated without additional analysis of the program.

Further optimizations fold predicate P1 into the constant false (Section 3.8.5). Then γ-folding deletes the γ-node for j and moves its dependent edges to the const(3) node generated for S4. A final pass of DNE will remove the remaining dead nodes. Figure 3.14 gives the VSDG for the above code, and shows the steps taken to optimize the code to a single constant node.
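γ-folding itself is a small rewrite. The sketch below (hypothetical node structure, not the thesis code) re-points each dependent edge of a γ-node whose condition has been folded to a constant, so that the dependants reach the selected operand directly; the γ-node and the unselected operand are then dead nodes for the next DNE pass.

#include <stddef.h>

enum op { OP_CONST, OP_GAMMA, OP_OTHER };

struct gnode {
    enum op op;
    int value;              /* OP_CONST: the constant value                 */
    struct gnode *in[3];    /* OP_GAMMA: in[0]=C, in[1]=T, in[2]=F;         */
    size_t nin;             /* other ops: their value/state dependencies    */
};

/* If g is a gamma node with a constant condition, return the operand it
 * selects; otherwise return g unchanged. */
static struct gnode *fold_target(struct gnode *g)
{
    if (g->op == OP_GAMMA && g->in[0]->op == OP_CONST)
        return g->in[0]->value ? g->in[1] : g->in[2];
    return g;
}

/* Re-point every dependency edge in the graph past foldable gamma nodes;
 * the gamma nodes and their unselected operands lose all consumers and are
 * removed by a following DNE pass. */
static void gamma_fold_all(struct gnode **nodes, size_t nnodes)
{
    for (size_t i = 0; i < nnodes; i++)
        for (size_t j = 0; j < nodes[i]->nin; j++)
            nodes[i]->in[j] = fold_target(nodes[i]->in[j]);
}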

3.8.2 Common Subexpression Elimination

A common subexpression is an expression that appears multiple times in the same function. Traditionally, value numbering [7] has been used to identify common subexpressions—two computations have the same symbolic value if they compute the same value.

Unfortunately value numbering is an expensive process—a symbolic value must be computed for each computation, in a manner that guarantees only equivalent computations are assigned the same value. Only after the value numbering phase can any subsequent optimization process begin. There is also the issue of efficiently maintaining the value numbering during transformation.

Algorithm 3.2 aggressively combines all common expressions (nodes) in a VSDG such that two nodes p and q are guaranteed to be common (equivalent) if nodes p and q perform the same computation, and the (V ∪ S)-predecessors of both nodes are the same. If both conditions are met then q can be merged into p. The first criterion is easy to test, based solely on the node operator. The second criterion uses the address of the node data as a unique symbolic value for that node. Additionally, using the address of the node as its symbolic value automates the maintenance of the symbolic values during transformation. While the algorithm is guaranteed to terminate, due to the VSDG being a reducible DAG (Section 3.7), this optimization can be potentially costly to reach a fixed point: combining one common subexpression may then make two previously non-matching expressions common, which may make further expressions common, and so on.

Algorithm 3.2 Common Subexpression Elimination
Input: A VSDG G(N, EV, ES).

Output: A VSDG with all common subexpressions combined.

Method:

1. Nsort = SORTBYDESCENDINGDFR(N);

2. ∀ n ∈ Nsort do
       Nsort = Nsort − n;
       if n is not marked do
           Msort = Nsort;
           ∀ m ∈ Msort do
               Msort = Msort − m;
               if not n.marked and not m.marked and op(n) == op(m) and pred(n) == pred(m) then
                   move all dependency edges from m to n;
                   mark m;

3. ∀ n ∈ N do if n is marked then delete(n).

By marking nodes the algorithm does not process those nodes that will be deleted by the final DNE pass. The sorting guarantees that the algorithm moves edges from a later (lower DFR) node to an earlier (higher DFR) node.

Lemma 3.4 The common subexpression elimination algorithm has runtime complexity of O(|N|²).

PROOF The complexity of the first stage is O(|N| log |N|). Two loops range over the set of nodes, giving rise to O(|N|²). We move all edges at most once. In general E ⊆ N × N, but under the condition that all nodes have at most kmaxarity dependence edges this reduces to O(kmaxarity |N|). The final stage is a simplified dead node elimination pass, with running time O(|N|). Thus the overall complexity is O(|N|²). □

The high runtime cost of this algorithm can, in practice, be reduced by pre-sorting nodes by their operation type into sets (e.g., Nadd, Nsub), at a cost of O(|N|). This removes the need to test for operation equality in the above algorithm, and in general reduces the size of the set of nodes to be optimized. So while it is still an O(|N|²) algorithm, we have significantly reduced |N|.

It would be possible to perform this operation during construction of the VSDG by the compiler. However, other optimizations (e.g., constant folding or strength reduction) can generate common subexpressions, leading to a need for this algorithm.

The algorithm presented above is an aggressive algorithm, combining all common subexpressions within the function, even when doing so might have a detrimental effect on register pressure or instruction scheduling. This can be undone (either through node duplication or by spilling) as needed by later phases.
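A minimal C sketch of the pairwise merge is given below. It assumes, as the text describes, that nodes carry an integer opcode, that predecessor identity (the node's address) acts as the symbolic value, and that the node array is already sorted by descending DFR; the fixed predecessor arity and the "merged" flag standing in for edge movement are simplifications of ours.

    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_PRED 3

    typedef struct VNode {
        int            op;             /* node operator (opcode)               */
        bool           merged;         /* marked: removed by the final DNE pass */
        struct VNode  *pred[MAX_PRED]; /* (V ∪ S)-predecessors, NULL-padded     */
    } VNode;

    static bool same_preds(const VNode *a, const VNode *b)
    {
        for (int i = 0; i < MAX_PRED; i++)
            if (a->pred[i] != b->pred[i])   /* node address as symbolic value   */
                return false;
        return true;
    }

    /* Merge q into p: the real optimizer moves q's dependency edges onto p;
     * here we only record the decision.                                       */
    static void merge_into(VNode *p, VNode *q) { (void)p; q->merged = true; }

    /* nodes[] is assumed to be sorted by descending DFR, so edges always move
     * from a later node to an earlier one.                                    */
    static void cse(VNode **nodes, size_t count)
    {
        for (size_t i = 0; i < count; i++) {
            if (nodes[i]->merged) continue;
            for (size_t j = i + 1; j < count; j++) {
                if (nodes[j]->merged) continue;
                if (nodes[i]->op == nodes[j]->op &&    /* same operation...      */
                    same_preds(nodes[i], nodes[j]))    /* ...same predecessors   */
                    merge_into(nodes[i], nodes[j]);
            }
        }
    }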

3.8.3 Loop-Invariant Code Motion

Expressions that compute the same value on each iteration of a loop are loop invariant and can be placed outside the loop, computing their value into a register which remains constant over the loop. Examples of loop-invariant code include address computations and macro-expanded expressions; loop-invariant code may also be the result of other optimizations on the program. The VSDG does not explicitly specify the position of loop-invariant expressions, other than that they are post dominated by the θtail node and not a successor of the θhead node. Later phases can then choose to place the code in the loop, or move the code outside the loop and place the invariant value in a (possibly spilled) register.

Definition 3.19 A node n is loop-invariant in a loop θl iff (a) n is post dominated by θl^tail, and (b) n ∉ succ(θl^head).

A loop-invariant node, v, is moved out of the loop by adding a value dependency edge from the θhead node's I port to v, and moving all value dependencies previously to v to the θhead node's L port. A loop-invariant node is placed within the loop by the addition of serializing edges from the invariant node to the θhead node.

3.8.4 Partial Redundancy Elimination

A statement is partially redundant if it is redundant on some, but not all, paths of execution from that statement to the exit of the function. The original formulation of partial redundancy (busy code motion) is due to Morel and Renvoise [80]. Later work by Knoop et al has refined the method as lazy code motion [63]. Horspool and Ho formulate partial redundancy elimination using a cost-benefit analysis [54], producing significant improvements to loop-intensive flowgraphs, and avoiding unnecessary code motion compared to the original Morel and Renvoise approach.

Busy code motion moves computations as far up the program graph as is possible, constrained only by the position of the definitions of its arguments. This extends the live range (register lifetime) of such computations as far as possible, allowing for reuse later in the program. Lazy code motion, on the other hand, moves computations to where they achieve the same result as for busy code motion, but also where the register lifetime is shortest. The advantage of this over the former is in the reduced lifetimes of registers, such that registers are kept alive over the optimal range of code.

The VSDG naturally represents lazy code motion: a node is placed just high enough to be above its highest (greatest DFR) successor node, and no higher. The dependency nature of the VSDG ensures that as the graph is transformed during optimization and allocation, such computations are kept precisely where they need to be in relation to their successors, while maintaining minimal register lifetimes.

3.8.5 Reassociation

Reassociation [83, pages 333–343] uses specific algebraic properties—associativity, commutativity and distributivity—to simplify expressions. Opportunities for reassociation range from the original source code (e.g., macro expansion), through address arithmetic (e.g., array indexing and pointer arithmetic), to side-effects of other optimizations (e.g., constant folding).

Care must be taken in applying reassociation, especially with regard to integer and floating point arithmetic. For integer arithmetic, care must be taken with over- or under-flow of intermediate expressions. Address arithmetic is generally less sensitive to these problems, since overflow makes no difference in address calculations12. Floating point arithmetic [49] requires care with precision and special values (e.g., NaN). Farnum [38] suggests that the only safe floating point optimizations are removing unnecessary type coercions, and replacing division by a constant with multiplication by a constant if the reciprocal of the constant can be exactly represented and the multiplication operator has the same effects as the division operator it replaces.

12 Although some processors are sensitive to bad address formation.

3.8.5.1 Algebraic Reassociation

Applying reassociation to the VSDG is most naturally implemented as a graph rewriting system, replacing one expression graph with another, cheaper, graph. Some integer reassociations are applicable irrespective of target machine arithmetic specifics. Others require special care to ensure that the optimized computation produces exactly the same result for all arguments (e.g., overflow of intermediate results). For example, Muchnick [83] catalogues twenty graph rewrite rules for address arithmetic alone. Other transformations replace expensive operations with cheaper ones (strength reduction). For example, multiplication is slow on many embedded processors. Quite often, multiplication by a constant can be replaced by a number of shifts and additions from expansion of the binary series of the n-bit multiplicand:

    x × y = 2^(n−1) x y_(n−1) + 2^(n−2) x y_(n−2) + · · · + 2^1 x y_1 + 2^0 x y_0

where y_m ∈ {0, 1}. For example, the right-hand expression in

    b = a * 5

noting that 5 is 0101 in binary, can be transformed into

    b = (a << 2) + a

Booth's algorithm (most commonly used to simplify hardware multipliers) can also be applied, noting that

    2^m + 2^(m−1) + · · · + 2^(n+1) + 2^n = 2^(m+1) − 2^n.

For example, the expression "a * 14" (14 is 01110 in binary, giving m = 3 and n = 1) can be replaced with a faster expression involving two shifts and a subtraction:

b = (a << 4) - (a << 1)

As a concrete example for the Thumb processor, consider the expression “a * 1025”. The Thumb has a single register-to-register multiplication operation, and additionally can only move 8-bit constants into registers (larger constants are either generated in one or two instructions, or loaded from a literal pool). A direct compilation of this expression would yield a load from the literal pool and a multiply, with a total size of 64 bits. However, this expression can be rewritten as “(a<<10)+a”, which can be compiled into just two instructions—a left shift, and an add—occupying only 32 bits of code space.
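To make the shift-and-add rewriting concrete, the following hedged C sketch expands a multiplication by an arbitrary constant using its binary expansion and checks it against the examples above. This is an illustration only, not the rewrite rules used in the compiler: a real rewriter would also weigh Booth-style subtractive forms and the target's actual multiply and literal-pool costs.

    #include <stdio.h>

    /* Multiply x by a constant k using only shifts and adds, following the
     * binary expansion  x*k = sum over set bits i of (x << i).             */
    static unsigned mul_by_const(unsigned x, unsigned k)
    {
        unsigned acc = 0;
        for (int bit = 0; k != 0; bit++, k >>= 1)
            if (k & 1u)
                acc += x << bit;      /* one shift-and-add per set bit      */
        return acc;
    }

    int main(void)
    {
        unsigned a = 37;
        /* The Thumb example from the text: a*1025 == (a<<10) + a. */
        printf("%u %u %u\n", a * 1025u, (a << 10) + a, mul_by_const(a, 1025u));
        /* Booth-style form for 14: a*14 == (a<<4) - (a<<1). */
        printf("%u %u\n", a * 14u, (a << 4) - (a << 1));
        return 0;
    }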

3.8.6 Constant Folding

Constant folding computes constant expressions at compile time rather than at runtime. The optimizer must ensure that the computed constant is that which would be computed by the target processor, and not that computed by the compiler's host processor (e.g., "100 + 200" on an 8-bit microcontroller would compute 44, while the compiler running on a 32-bit desktop processor would compute 300). This is especially important for intermediate values within more complex expressions.

A variant of constant folding is predicate folding. This is applied to predicate expressions, where the result is either true or false. Some predicate folds derive from the identity functions, while others must be computed within the optimizer. For example

    a ∨ true  = a ∨ ¬a = true
    a ∧ false = a ∧ ¬a = false

Note that in the case that a modifies state (e.g., a volatile load), then while we can eliminate the value dependencies on a we must maintain the state dependencies.
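A minimal sketch of target-aware folding for unsigned addition is shown below; the target width parameter and the masking are our simplification (a real folder must also handle signedness and the other operators), but the truncation reproduces the 8-bit example above, where 100 + 200 folds to 44.

    #include <stdint.h>
    #include <stdio.h>

    /* Fold an unsigned addition as the *target* would compute it, by
     * truncating the result to the target's integer width in bits.     */
    static uint64_t fold_add(uint64_t a, uint64_t b, unsigned target_bits)
    {
        uint64_t mask = (target_bits >= 64) ? ~0ull
                                            : ((1ull << target_bits) - 1);
        return (a + b) & mask;
    }

    int main(void)
    {
        printf("%llu\n", (unsigned long long)fold_add(100, 200, 8));  /* 44  */
        printf("%llu\n", (unsigned long long)fold_add(100, 200, 32)); /* 300 */
        return 0;
    }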

3.8.7 γ-Folding

For γ-nodes whose predicates have been simplified to a constant true or false, the γ-node can be eliminated, and its successor edges moved to either the T-predecessors or the F-predecessors as appropriate. A subsequent DNE pass over the VSDG will then remove any dead nodes in the unused γ-region. This is equivalent to branch folding on the CFG, since it replaces the conditional branches of an if/then/else with a jump to the appropriate label.

3.9 Summary

This chapter has introduced the Value State Dependence Graph (VSDG). It has shown that the VSDG exhibits many useful properties, particularly for program optimization. The VSDG represents the same information as the PDG but with fewer edge types (five or six for the PDG versus two for the VSDG), and the VSDG is more normalising, in that more programs map to the same VSDG than to the same PDG—in the PDG the assignment "x:=a+b+c" would be represented as a single node, and thus if we wish to exploit the subexpression "a+b" we must transform the PDG, inserting a temporary variable, etc. The VSDG, in contrast, employs a three-address-like form that has the benefit of breaking up large expressions into a tree of subexpressions, making them available for later CSE passes.

It has also been shown how to construct the VSDG from a fast and efficient syntax-directed translation of a subset of C, together with guidance on the parts of the language that are not so trivial to compile into VSDGs (most notably irreducible programs). A number of well-known classical optimizations have been shown to be applicable to the VSDG, demonstrating the elegance of the VSDG in both describing and implementing powerful optimizations.

The VSDG is particularly effective in expressing loops, with θ-nodes encapsulating the cyclic nature of loops. Loop invariants are naturally represented: in the VSDG they are neither "in" nor "out of" the loop in the traditional sense of the PDG; they are undefined until allocation, which decides on a node-by-node basis if each should be in or out of the loop. While the data dependencies of the DDG can be modelled in the VSDG, the VSDG's θ-node encapsulates cyclic behaviour, rather than the explicit general cycles of the CFG.

CHAPTER 4

Procedural Abstraction via Patterns

When truth is nothing but the truth, it’s unnatural, it’s an abstraction that resembles nothing in the real world. In nature there are always so many other irrelevant things mixed up with the essential truth. ALDOUS HUXLEY (1894–1963)

Compiling source code into VSDG intermediate code has the effect of normalizing different source structures into similar VSDGs. One result of this normalizing effect is to produce multiple occurrences of a relatively small number of common code patterns. Procedural abstraction replaces these multiple occurrences of common code patterns with calls to new, compiler-generated, functions containing just single copies of the code patterns. Clearly, the more occurrences of a pattern are found, the greater the benefit achieved from abstraction.

Procedural abstraction can be applied at the source level, at the target code level, and at the intermediate levels within the compiler. In this chapter we apply procedural abstraction at the VSDG intermediate level. This has the advantage of avoiding many of the variations and irrelevant information present in source code—differing variable names, schematically different code structures—and in target code—differing register assignments and instruction schedules.

Illustrative Example

To illustrate how the VSDG aids procedural abstraction, consider Figure 4.1. The two functions foo and bar perform similar computations: multiplying or left-shifting the second argument by five depending on the value of the first parameter. In foo the scaled value is then added to the first parameter, and the result passed back to the caller. The behaviour of bar is similar, except it subtracts the scaled value rather than adding it.

    int foo(int x, int y) {
        int a = x < 0 ? y * 5 : y << 5;
        return x + a;
    }

    int bar(int a, int b) {
        int p = b << 5;
        if (a < 0)
            p = b * 5;
        return a - p;
    }

(Figure 4.1(a): the source of foo and bar; (b): the corresponding VSDG for foo, drawing not reproduced.)

Figure 4.1: Two functions which produce similar VSDGs suitable for Procedural Abstraction. Both functions foo and bar (a) produce similar VSDGs (b): the final operator in foo's VSDG (shown) is add, with a γ-node selecting between the result of the left shift or of the multiplication; in bar the final operator is sub, again with a γ-node selecting either the result of the multiplication or of the left shift.

Now, both schematically and in the CFG, the two functions look markedly different—foo uses a conditional expression to compute the scaled value, while bar uses an if-statement to assign a new value to p if a is negative. However, apart from the operation of the final node (add for foo and subtract for bar) the VSDGs of both functions are identical (Figure 4.1(b)). This greatly increases the opportunities for abstraction.

The result of applying our procedural abstraction algorithm is shown in Figure 4.2. The algorithm has abstracted the common γ-selection of the two expressions "var*5" and "var<<5" (where var is y in foo and b in bar). The net result is a saving of three instructions.

4.1 Pattern Abstraction Algorithm

Procedural abstraction can be viewed as a form of dictionary-based compression using External Pointer Macros [102]. Storer and Szymanski showed that the problem of deciding "whether the length of the shortest possible compressed form is less than k" is NP-hard [102]. The problem for procedural abstraction, then, is deciding which patterns to abstract, and how many of all occurrences of these patterns should be abstracted. We apply a greedy algorithm to abstract as many patterns with as many occurrences as possible. While this does not guarantee the best solution, in many cases it finds the globally optimal one, and its simple approach often gives an acceptable level of performance.


Figure 4.2: The code of Figure 4.1 after procedural abstraction, showing all three functions: foo, bar, and the newly created abstract function AbsFunc1.

The following sections describe in greater detail each stage of the algorithm. We begin by defining some basic terms used in this chapter:

Definition 4.1 A Pattern, π, is a rooted tree of nodes, with additional leaf nodes arg_i for pattern arguments, and where all arg_i are distinct.

Definition 4.2 An Abstract Function is a function f^A_π holding a single occurrence of an abstracted code pattern π.

Definition 4.3 The number of nodes in a pattern is denoted ν = |π|.

Definition 4.4 The Arity of a pattern is the number of arguments of that pattern.

Pattern arguments become the formal parameters of the abstract function. Abstract functions are generated which have the same procedure calling standard as source-level functions. This approach has two benefits. Firstly, there is no difference between abstract functions and source code functions. This maintains a degree of consistency, with no discernible difference between a program that has been manually abstracted and one where the abstraction has been automated. And secondly, all subsequent optimizers and code generators need only consider a single procedure calling standard. This allows certain assumptions to be safely made regarding register and stack usage, and function behaviour.

4.2 Pattern Generation

We apply procedural abstraction to VSDGs early on in the compiler, immediately after compilation of the source code. For simplicity of our algorithm we place two restrictions on the contents and structure of patterns: patterns may not write to memory or call functions (we say they are lightweight functions), and patterns may not include loops (they must be representable as trees).

Algorithm 4.1 Find all beneficial patterns in a program and abstract them.
Input: Program P with N nodes.

Output: A semantically-equivalent program P′ with fewer nodes, |N′| ≤ |N|.

Method: Apply a greedy algorithm:

1. START: Generate all patterns in P (Section 4.2);

2. Find a pattern π with greatest benefit > 0 (Section 4.3);

3. If we found such a π that saves space, then abstract:

• Generate abstract function f^A_π, insert calls to f^A_π at every occurrence of π, and remove now-redundant nodes (Section 4.4);
• Go back to START.

4. Otherwise END.

Definition 4.5 A Lightweight Function is an abstract function f^A such that no node modifies state, and arity(f^A) ≤ MAX_ARGS, where MAX_ARGS is the maximum number of register arguments defined in the target's procedure calling standard.

When generating patterns, we only consider constant nodes, ALU nodes (add, subtract, etc.), 1-tupled γ-nodes (which do not merge state), and ordinary load nodes (i.e., loads which do not modify state or access the parent function's stack), with the additional restriction that all loads within a pattern depend on the same state. This leads to a simplified abstraction method, but one which still yields a worthwhile reduction. The restriction on the number of arguments is reasonable given that any processor has a limited set of registers.
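These node-kind restrictions can be summarised as a simple per-node eligibility test, sketched below in C. The enumeration and field names are hypothetical, chosen only to make the conditions concrete; the further restriction that all loads in a pattern share the same state is a per-pattern check and is not shown.

    #include <stdbool.h>

    typedef enum { N_CONST, N_ALU, N_GAMMA, N_LOAD,
                   N_STORE, N_CALL, N_THETA } NodeKind;

    typedef struct PNode {
        NodeKind kind;
        bool     merges_state;    /* for gamma nodes                       */
        bool     modifies_state;  /* volatile or otherwise stateful load   */
        bool     accesses_stack;  /* load from the parent function's stack */
        int      gamma_arity;     /* number of value tuples selected       */
    } PNode;

    /* May this node appear inside an abstractable (lightweight) pattern? */
    static bool pattern_eligible(const PNode *n)
    {
        switch (n->kind) {
        case N_CONST:
        case N_ALU:
            return true;
        case N_GAMMA:                        /* 1-tupled, non-state gammas */
            return n->gamma_arity == 1 && !n->merges_state;
        case N_LOAD:                         /* "ordinary" loads only      */
            return !n->modifies_state && !n->accesses_stack;
        default:                             /* stores, calls, theta nodes */
            return false;
        }
    }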

4.2.1 Pattern Generation Algorithm

We add all patterns of a given size to a pattern database, then prune out all singly-occurring patterns before searching for patterns of the next size up, stopping when no further patterns are added. The structure of the database is a table, having one row for each node type. Each entry in the table consists of a list of patterns, with each pattern having a list of occurrence root nodes in P. The two functions implementing Algorithm 4.2 are shown in Figure 4.3. The algorithm itself is implemented in GenerateAllPatterns(), which generates all single-node patterns, prunes the database, then iterates over ν = 2 . . . νmax.

4.2.2 Analysis of Pattern Generation Algorithm

The number of rooted binary trees with ν nodes is a formulation of the binary bracketing problem, for which the number of solutions is given by the Catalan number [64] Cν. This is

1. global DataBase : ptnDatabase

2. procedure GenerateAllPatterns( P:VSDG ) 3. addSingleNodePatterns( P ) 4. deleteSingleOccurrencePatterns() 5. for ptnSize = 2 . . . MAX NODES do 6. foreach node in P 7. GeneratePatterns( node, ptnSize ) 8. deleteSingleOccurrencePatterns() 9. endforeach 10. if ( no patterns added ) 11. return 12. endif 13. endfor 14. endproc

15. procedure GeneratePatterns ( n:node, ptnSize:int )
16.     if ( n is a CONST or ARG node and ptnSize == 1 ) then
17.         DataBase ∪= { n }
18.     elseif ( n is a unary or load node ) then
19.         pChild ← GetPatterns( n.child, DataBase )
20.         DataBase ∪= { (n, x) | x ∈ pChild ∧ |(n, x)| = ptnSize }
21.     elseif ( n is a binary or predicate node ) then
22.         pL ← GetPatterns( n.Lchild, DataBase )
23.         pR ← GetPatterns( n.Rchild, DataBase )
24.         DataBase ∪= { (n, x, y) | x ∈ pL ∧ y ∈ pR ∧ |(n, x, y)| = ptnSize }
25.     elseif ( n is a γ-node ) then
26.         pT ← GetPatterns( n.Tchild, DataBase )
27.         pF ← GetPatterns( n.Fchild, DataBase )
28.         pC ← GetPatterns( n.Cchild, DataBase )
29.         DataBase ∪= { (n, x, y, z) | x ∈ pC ∧ y ∈ pT ∧ z ∈ pF ∧ |(n, x, y, z)| = ptnSize }
30.     else
31.         /* ignore θ-nodes, store nodes and call nodes */
32.     endif
33. endproc

Figure 4.3: Algorithm to generate patterns. The function GetPatterns() (not shown) returns a list of patterns from the database where the pattern root node matches the argument. addSingleNodePatterns() adds a pattern for each node in the program to the database, while deleteSingleOccurrencePatterns() removes any patterns that occur exactly once in the pattern database.

Algorithm 4.2 Generate all multiply-occurring patterns into the pattern database.
Input: A set of VSDGs representing an entire program P, an empty pattern database D, and a maximum pattern size νmax.

Output: All multiply-occurring patterns, up to the maximum size νmax , found in P will be recorded in D.

Method:

1. START: Generate all single-node (ν = 1) patterns in P and place them in D, followed by deleting (“pruning”) all singly-occurring patterns.

2. Then, for each pattern size ν = 2 . . . νmax generate into D all patterns of size ν that can be derived from sub-patterns already in D.

3. If no patterns are generated (i.e., none of size ν were found) then END.

4. Otherwise, prune D and go back to START.

defined as

    Cν ≡ (2ν)! / ((ν + 1)! ν!)        (4.1)

The asymptotic form of (4.1) approximates to

    Cν ∼ 4^ν / (√π ν^(3/2)),        (4.2)

i.e., exponential in the number of nodes in the tree. Setting an upper bound on both the total number of nodes in a pattern, νmax, and the pattern arity (set by the procedure calling standard for the target architecture) bounds this potential exponential growth in patterns to a constant. Thus, for |N| nodes and a fixed upper pattern size νmax the total number of patterns, Ψmax, that can be generated is

    Ψmax = |N| Σ_{i=2}^{νmax} C_{i−1}.        (4.3)

Since each node for a given pattern size i can generate at most C_{i−1} patterns, the summation in (4.3) will be constant for a given value of νmax. Thus the upper bound on Ψmax for |N| nodes is O(|N|). Patterns are stored in memory proportional to the size of the pattern. Thus the maximum runtime storage space needed is also O(|N|).

4.3 Pattern Selection

The pattern generation algorithm fills the pattern database with all multiply-occurring patterns that will produce lightweight functions. A benefit is computed for each pattern, and the pattern with the greatest positive benefit is then chosen for abstraction.

4.3.1 Pattern Cost Model

Our model considers the following parameters:

Number of Occurrences For a given pattern, determines both the cost (each occurrence will insert a call node into the parent function) and benefit (abstraction removes all occurrences of the pattern from the program, and inserts a new occurrence of the pattern into the abstract function).

Pattern Size This directly affects the number of nodes saved (together with the number of occurrences) but must also offset the costs of call nodes and the abstract function return node.

During pattern selection, we apply the following cost function to compute the benefit, βπ, for pattern π:

    β_π = Gain − Loss        (4.4)

where

    Gain = (n_π − 1) ν_π
    Loss = n_π + 1

and where n_π is the number of occurrences of pattern π, and ν_π is the number of nodes in π. The Gain term describes the number of nodes that are removed from the program: for n_π occurrences, we delete n_π ν_π nodes, but add ν_π nodes into the newly-created abstract function. The Loss term accounts for n_π call nodes inserted into the program, and for the additional return node appended to the abstract function. Rearranging (4.4) as an inequality to say if it is worth abstracting a pattern (i.e., β_π > 0) gives

    (n_π − 1)(ν_π − 1) > 2.        (4.5)

The least benefit is from either a four-node pattern appearing twice, or a two-node pattern appearing four times in the program. The example in Figure 4.2 (page 87) has n_π = 2 and ν_π = 7, giving a benefit of 4, i.e., it is worth doing the abstraction.
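The cost model transcribes directly into C, which is convenient for checking candidate patterns by hand; the function names below are ours, not part of the compiler.

    #include <stdio.h>

    /* Benefit of abstracting a pattern with n_occurrences occurrences of
     * n_nodes nodes each:  beta = Gain - Loss = (n-1)*nu - (n+1).        */
    static int pattern_benefit(int n_occurrences, int n_nodes)
    {
        int gain = (n_occurrences - 1) * n_nodes;
        int loss = n_occurrences + 1;
        return gain - loss;
    }

    /* Equivalent test from inequality (4.5): (n-1)(nu-1) > 2. */
    static int worth_abstracting(int n_occurrences, int n_nodes)
    {
        return (n_occurrences - 1) * (n_nodes - 1) > 2;
    }

    int main(void)
    {
        printf("%d %d\n", pattern_benefit(2, 7), worth_abstracting(2, 7)); /* 4 1  */
        printf("%d %d\n", pattern_benefit(2, 4), worth_abstracting(2, 4)); /* 1 1  */
        printf("%d %d\n", pattern_benefit(2, 2), worth_abstracting(2, 2)); /* -1 0 */
        return 0;
    }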

4.3.2 Observations on the Cost Model

There are several issues that we have avoided in our simplistic cost model. Foremost is the potential cost of additional spill code that may be generated in the parent functions. This is a difficult figure to quantify: on the one hand the additional abstract function calls will increase pressure on the variable (non-scratch) registers, with a potential increase in spilling; on the other hand, the resulting simplification of the parent functions can reduce the register pressure1.

Another issue is the runtime penalty. In straight-line code this is minimal, as the abstract call will be made at most once. Where the abstracted code is inside a loop, however, the additional processing time of the call is likely to be significant. Because we only generate lightweight abstract functions this cost is minimised, and in the best case the penalty is the cost of the call and the return.

1 A first approximation might be to consider that half the abstract function arguments will need register spilling or copying.

Finally, our cost model is based on intermediate code size, not target code size. While this has the benefits of platform-independence and of working at the intermediate levels within the compiler, the cost model does suffer from loss of precision due to the abstract nature of the intermediate code.

Our main interest is embedded systems, where code size is the dominant factor. We consider these (potential) runtime penalties to be worth the significant savings achieved through procedural abstraction.

4.3.3 Overlapping Patterns

Maximum benefit from procedural abstraction is obtained when for any two pattern occurrences πm and πn there are no common nodes (i.e., πm ∩ πn = ∅).

The problem of deciding which of a number of overlapping pattern occurrences to abstract is NP-Complete in the number of clashing pattern occurrences. It can be formulated as finding the minimum weighted path of a graph where nodes correspond to pattern occurrences, there is an edge (πm, πn) in the graph if patterns πm and πn overlap, and the weight of the edge is |πm ∩ πn|. Pattern occurrences that do not overlap with any other pattern occurrences can be abstracted without further consideration.

In practice we have found that, even though we do suffer some clashes, the overall effect is tolerable—in the worst case we replace the root node with a call to the abstract function, at a cost of one call node in place of the original root node.

4.4 Abstracting the Chosen Pattern

Having chosen the pattern that is to be abstracted, the final stage is to perform the actual abstraction process.

4.4.1 Generating the Abstract Function

We have as input a tree pattern from which we generate a new function. The function is placed into a new module ("AbstractLib") created specifically to hold the abstract functions. Because only lightweight functions (Section 4.2) are abstracted, the upper limit on the number of function arguments is set by the chosen target's procedure calling standard. The function body is then generated from the tree pattern, with VSDG nodes and edges emitted as necessary.

4.4.2 Generating Abstract Function Calls

For each occurrence of the abstraction pattern we add a new call node. Value dependency edges are inserted from the call node to the corresponding V-predecessors of the pattern occurrence. If the abstract function contains one or more load nodes then a state-dependency edge is added from the call node to the S-predecessor of the load node(s) being abstracted. All value-dependency edges from the occurrence's root node V-successors are then moved to the call node, resulting in at least the occurrence root node becoming dead, together with all nodes post-dominated by the root node.

Finally, a dead node elimination pass will remove at least the pattern occurrence root node, and any other nodes that become unreachable. For example, consider Figure 4.1(b): after inserting the call to the abstract function and moving the edge from the output of the γ-node to the output of the call node, both the γ-node and every node post-dominated by the γ-node are now dead and can be removed from the graph.

4.5 Summary

Procedural abstraction has been the subject of research for many years. However, it is only recently that it has been implemented in production compilers, either as part of the compilation process itself, or as a post-compilation pass.

The approach we have taken to procedural abstraction achieves a reduction of around 18% (Chapter 7). This compares favourably with prior research, e.g., aiPop [3], Squeeze++ [35] and Liao's work [74]. Though we take a simplified view of program analysis (we do not abstract loops, function calls or memory store operations) we attribute this result to applying procedural abstraction at a higher level within the compiler chain, and in particular to the VSDG. This has the benefit of removing many of the variations between regions of code, allowing potentially many more opportunities for abstraction.

Adding support for the remaining operations could increase the effectiveness of the algorithm. All three remaining VSDG nodes place artificial boundaries on the size of patterns that can be abstracted. Supporting function calls in abstract patterns would increase the potential cost of abstraction, due to additional instructions to save and restore the callee-preserved registers, and a greater potential for introducing spill code within abstract functions due to increased register pressure in the presence of call nodes. Supporting θ-nodes in patterns would also remove an artificial boundary to abstract patterns. However, abstracting loops is an all-or-nothing transformation: we either abstract the entire loop body and its enclosing θ-node, or just (a part of) the loop body: we could not abstract a θhead node without its matching θtail node, nor a loop without its control predicate.

As well as reducing code size, lightweight procedural abstraction is also a suitable tool for instruction set exploration, where we consider abstract functions as potential candidates for new instructions. In this context, we gain execution speed since we no longer suffer the penalty of a function call and execution of multiple instructions.

The potential number of patterns that can be generated from intermediate code is huge, with a corresponding runtime cost during pattern generation. Our pattern generation algorithm, coupled with aggressive database pruning during the pattern generation phase and hard limits on the dimensions of patterns, keeps this potentially prohibitive cost in check.

CHAPTER 5

Multiple Memory Access Optimization

As memory may be a paradise from which we cannot be driven, it may also be a hell from which we cannot escape. JOHN LANCASTER SPALDING (1840–1916)

Memory is a blessing and a curse. It allows a processor with limited internal memory (i.e., registers) to handle large data sets, but the additional instructions that access the memory increase the code size by a considerable amount.

One potentially profitable route to reducing code size, for processors that support them, is through the effective use of Multiple Memory Access (MMA) instructions. These combine several loads or stores into a single instruction, where a set of registers (typically defined by some range or bitmap representation) are loaded from, or stored to, successive words in memory. Their compactness comes from expressing a set of registers with only a few instruction-word bits; for example, the ARM LDM instruction uses only sixteen bits to encode up to sixteen register loads in a single 32-bit instruction (cf. 512 bits without the use of the LDM instruction).

This chapter begins with a review of an algorithm with properties appealing to MMA optimization: Liao et al's SOLVESOA algorithm. We then describe the SOLVEMMA algorithm, which identifies profitable memory access sequences for combining into MMA instructions, and selects stack frame layouts that facilitate multiple reads and writes of local variables. Interestingly, it turns out that array accesses and local variable accesses are best treated separately. We describe SOLVEMMA as applied to the Control Flow Graph, and then show how the less-constrained VSDG provides greater opportunities for merging multiple load and store instructions. Finally, we specialize the algorithm to the Thumb processor.

5.1 Examples of MMA Instructions

The Thumb processor [97] executes a compact version of the ARM instruction set. A hardware translator expands, at runtime, the 16-bit Thumb instructions into 32-bit ARM instructions (a similar approach is taken in the MIPS16 embedded processor [62]). A disadvantage of this approach is that fewer instruction bits are available to specify registers. This restriction artificially starves the register allocator, resulting in more register spill code (providing more potential sources of MMA optimization, which is poorly done in existing compilers).

The load-store nature of a RISC architecture also gives rise to many explicit memory access instructions. In contrast, a CISC machine with memory-access-with-operation instructions can achieve better code size by combining a load or store operation (especially where the address offset is small) with an arithmetic operation (e.g., "add EAX,ESI[EBX*4]+Offset" on the Intel x86 [1]). Restricting the number of accessible registers, as described above, increases the number of memory access instructions: by way of example, the adpcm benchmark from MediaBench [68] generates approximately 35% more memory access instructions for the Thumb than for the ARM.

However, the state of the art in commercial compilers appears1 to be based on opportunistic peephole-style optimization. The GCC compiler also takes an ad-hoc approach to MMA optimization2.

5.2 Simple Offset Assignment

Liao et al's Simple Offset Assignment (SOA) algorithm [72] rearranges local variables within a function's stack frame in order to minimize address computations. It was originally formulated for a single-address-register Digital Signal Processor (DSP) with word-oriented auto-increment/decrement addressing modes. While not directly solving the MMA optimization problem, it forms the starting point of our approach, and so deserves explanation as a foundation for this chapter.

The input to the algorithm is an instruction-scheduled and register-allocated program, i.e., a Control Flow Graph, with a fixed memory access sequence. The SOLVESOA algorithm constructs an Access Graph (V, E), an undirected graph with vertices V and edges E, where vertices correspond to variables, and there is an edge e = (p, q) between vertices p and q with weight w(e) if there are w(e) adjacent accesses to variables p and q. The algorithm then covers the Access Graph with one or more maximally-weighted disjoint paths. A covering of a graph is a subset of its edges; a path is an alternating sequence of vertices and edges, with each edge connecting its adjacent vertices and no cycles in the path. Each path specifies an ordering on the stack of the variables in the path, thereby minimizing the number of address computations through the use of auto-increment/decrement addressing to walk along the access path, where each increment or decrement is a step forwards or backwards. The number of address computations is then given by the sum of the weights of the uncovered edges in the Access Graph.

Finding an optimal path covering is a formulation of the Maximum Weight Path Covering (MWPC) problem, which has been shown [72] to be NP-Complete in the number of nodes in the graph.

1 Through literature and private communication with industrial compiler developers.
2 For example, the GCC compiler uses hard registers to enforce a particular register assignment within the RTL intermediate form.

(Figure 5.1 panels. (a) code sequence:

    c = a + b;
    f = d + e;
    a = a + d;
    c = d + a;
    b = d + f + a;

(b) access sequence:  a b c d e f a d a d a c d f a b

(c) Access Graph and (d) stack frame layout: drawings not reproduced.)

Figure 5.1: Example illustrating SOA. In (a) is a short code sequence, accessing variables a–f. The access sequence (b) describes the sequence of read or write accesses to the variables (for illustration we assume variables in expressions are accessed in left-to-right order). This leads to the Access Graph shown in (c), where the edge weights correspond to the number of times a given sequence occurs in (b). The graph is covered using the MWPC algorithm, with covered edges shown in bold. The result is the stack frame layout shown in (d).

Liao et al proposed a greedy algorithm (similar to Kruskal's spanning tree algorithm [5]) which iteratively chooses the edge with the greatest weight to add to the path while preserving the properties of the path (if two or more edges have the same weight the algorithm non-deterministically chooses one of them). It terminates when either no further edges can be added to the solution, or there are no more edges. An example of SOA is shown in Figure 5.1. Computing this approximate MWPC can be done in O(|E| log |E| + |L|) time, where |E| is the number of edges in the Access Graph, and |L| the number of variable accesses. Liao et al showed that for large programs the Access Graphs are generally quite sparse.

The General Offset Assignment (GOA) algorithm extends SOA to consider k address registers (consider SOA as a special case of GOA, with k = 1). The SOLVEGOA algorithm grows an Access Graph, adding vertices to it if adding an additional address register to the graph would likely contribute the greatest reduction in cost. The decision of which variables to address is left as a heuristic.
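The greedy covering can be sketched as follows. This is an illustration in the spirit of Liao's SOLVESOA, not their implementation: the edge list is assumed to be pre-sorted by descending weight, and a small union-find replaces the path-element bookkeeping of [72].

    #define MAX_VARS 64

    typedef struct { int p, q, weight; } Edge;

    static int parent[MAX_VARS];   /* union-find, standing in for path elements      */
    static int degree[MAX_VARS];   /* a vertex inside a path may have degree <= 2    */

    static int find(int v)
    {
        return parent[v] == v ? v : (parent[v] = find(parent[v]));
    }

    /* Greedy MWPC sketch: edges[] must already be sorted by descending weight;
     * accepted edges are flagged in covered[] and define the stack orderings. */
    static int solve_soa(const Edge *edges, int nedges, int nvars, int *covered)
    {
        int taken = 0;
        for (int v = 0; v < nvars; v++) { parent[v] = v; degree[v] = 0; }
        for (int i = 0; i < nedges; i++) covered[i] = 0;
        for (int i = 0; i < nedges && taken < nvars - 1; i++) {
            int p = edges[i].p, q = edges[i].q;
            if (degree[p] >= 2 || degree[q] >= 2) continue; /* internal vertex   */
            if (find(p) == find(q)) continue;               /* would close cycle */
            parent[find(p)] = find(q);                      /* splice two paths  */
            degree[p]++; degree[q]++;
            covered[i] = 1;
            taken++;
        }
        return taken;                                       /* covered edges     */
    }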

5.3 Multiple Memory Access on the Control Flow Graph

In this section we describe the SOLVEMMA algorithm in the context of the Control Flow Graph (CFG) [6]. The SOLVEMMA algorithm differs from SOLVESOA in two distinct ways. Firstly, the goal of SOLVEMMA is to identify groups of loads or stores that can be profitably combined into MMA instructions. And secondly, SOLVEMMA is applied before instruction scheduling, when it is easier to combine loads and stores into provisional MMA instructions and to give hints to the register assignment phase.

Similar to the SOLVESOA algorithm, we construct an Access Graph from the input program and then find a covering which maximizes provisional MMA instructions and biases register assignment to enable as many of these as possible to become real.

5.3.1 Generic MMA Instructions

For discussion purposes we use two generic MMA instructions—LDM and STM—for load-multiple and store-multiple respectively. The format of the LDM (and STM) instruction is:

    LDM Addr, { reglist }

where Addr specifies some address expression (e.g., register or register+offset) computing the address from which the first word is read, and the registers to be loaded are specified in reglist. As in the Thumb instructions, the only constraint on the list of registers is that they must be sorted in increasing numerical order. Section 5.5 discusses the details of target-specific MMA instructions.

The SOLVEMMA algorithm is applied twice to the flow graph. The first pass transforms global variables, whose address and access ordering is fixed in the CFG3, while the second pass transforms local and spill variables, whose addresses are not fixed and can in many cases be re-ordered.
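As a small illustration of the register-list constraint just mentioned, the hedged C helper below checks whether a provisional register list can be encoded directly as a single LDM/STM (registers in strictly increasing numerical order); the representation is ours, and a real code generator would fall back to splitting the instruction when the check fails, as described in Section 5.3.6.

    #include <stdbool.h>

    /* A provisional MMA instruction: the base address expression is elided;
     * reglist[] holds the assigned register numbers in access order.        */
    typedef struct {
        int reglist[16];
        int count;
    } ProvisionalMMA;

    /* Encodable as one LDM/STM only if the registers appear in strictly
     * increasing numerical order (the generic reglist constraint).          */
    static bool encodable_as_single_mma(const ProvisionalMMA *m)
    {
        for (int i = 1; i < m->count; i++)
            if (m->reglist[i] <= m->reglist[i - 1])
                return false;
        return true;
    }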

5.3.2 Access Graph and Access Paths

We define the Access Graph and Access Paths as a framework in which to formulate the problem and the SOLVEMMA algorithm.

Definition 5.1 Let α(p) be the set of instructions which access variable p and op(i) be the operation (load or store) of instruction i. Then the Access Graph is a weighted directed graph AG = (V, E), consisting of vertices V, one for each variable, and edges E ⊆ V × V such that there exists an edge (p, q) ∈ E between vertices p and q iff (1) i ∈ α(p) and j ∈ α(q) are in the same basic block; (2) op(i) = op(j); (3) j is scheduled after i; and (4) j is not data-dependent on i.

Vertices in V are tagged with the direction property dir ∈ {UND, HEAD, TAIL} to mark undecided, head and tail vertices respectively. Initially, all nodes are marked UND. For some access sequences the direction of accesses must be enforced (e.g., accesses to memory-mapped hardware registers), while in others the sequence is equally valid if traversed in either direction. The direction property marks explicit directions, while deferring the final direction of undefined access sequences to the covering algorithm. The use of a directed Access Graph is in contrast to SOA's undirected Access Graph.

Definition 5.2 The Weight w(e) of an edge e = (p, q) ∈ E is given by the number of times the four criteria of Definition 5.1 are satisfied.

It is possible that some pairs of vertices are accessed in both combinations (p, q) and (q, p). Some of these edge pairs are likely to have differing weights, due to explicit preferences in the program (e.g., a pair of stores). Such edges are called unbalanced edges:

3But note Section 5.4 where using the VSDG instead allows only essential (programmer-required) ordering to be fixed. Chapter 5. Multiple Memory Access Optimization 99

Definition 5.3 An Unbalanced Edge is an edge e = (p, q) ∈ E where either there is an opposite edge e′ = (q, p) ∈ E such that w(e) ≠ w(e′), or where such an opposite edge does not exist (i.e., w(e′) = 0).

The definition of the Access Path follows from the above definition of the Access Graph.

Definition 5.4 An Access Path C = (VC ⊆ V, EC ⊆ E), where |EC| = |VC| − 1, is a sequence of vertices {v1, v2, . . . , vm} where (vi, vi+1) ∈ EC and no vi appears more than once in the sequence.

An Access Graph can be covered by two or more disjoint access paths:

Definition 5.5 Two paths CA = (VA, EA) and CB = (VB, EB) are disjoint if VA ∩ VB = ∅.

Note that SOLVESOA does not consider the type of access operation. In contrast, SOLVEMMA combines instructions of the same operation, so path covering must be sensitive to access operations. Consider this access sequence of three local variables a, b and c:

    a_r b_r c_w b_r a_r b_w

where the subscript r denotes read and w denotes write. It may be that a, b and c are placed in contiguous locations, but the write to c prevents the construction of a single MMA access path {a, b, c}, whereas SOA might place all three in a path. When SOLVEMMA is applied to the VSDG (Section 5.4), which would not specify the order of the reads of a and b, then either {a, b} or {b, a} can be normalized to the other, subject to any restrictions imposed by target-specific MMA instructions. Additionally, we may arrange a and b in memory to allow a multiple load for a_r b_r, but we cannot combine a_r b_w as they are different access operations.
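The operation-sensitivity can be illustrated with the following toy C program, which scans an access sequence and proposes an Access Graph edge only for adjacent accesses with the same operation. It deliberately omits the other criteria of Definition 5.1 (same basic block, scheduling order, no data dependence), so it is a didactic sketch rather than the construction used in the compiler.

    #include <stdio.h>

    typedef enum { READ, WRITE } Op;
    typedef struct { char var; Op op; } Access;

    /* Propose candidate edges from a flat access sequence: adjacent
     * accesses to distinct variables with the same operation.         */
    static void propose_edges(const Access *seq, int n)
    {
        for (int i = 0; i + 1 < n; i++)
            if (seq[i].op == seq[i + 1].op && seq[i].var != seq[i + 1].var)
                printf("edge (%c,%c) [%s]\n", seq[i].var, seq[i + 1].var,
                       seq[i].op == READ ? "load" : "store");
    }

    int main(void)
    {
        /* The sequence  a_r b_r c_w b_r a_r b_w  from the text. */
        Access seq[] = { {'a', READ}, {'b', READ}, {'c', WRITE},
                         {'b', READ}, {'a', READ}, {'b', WRITE} };
        propose_edges(seq, 6);   /* emits (a,b) and (b,a), but nothing for a_r b_w */
        return 0;
    }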

5.3.3 Construction of the Access Graph

The Access Graph is constructed in a single pass over the input program, as directed by Definition 5.1. The first two criteria restrict merging to access instructions of the same operation (load or store) where both are within the same basic block (this maintains the ordering of memory accesses in the CFG). The third criterion ensures that there are no intervening load or store instructions that might interfere with one or the other instruction (e.g., consider two loads with an intervening store to the same address as the second load).

Vertices in the Access Graph represent variables whose address is statically determinable or which are guaranteed not to alias with any other vertex4. While this restriction may seem harsh, it supports global variables, local (stack-based) variables, including compiler-generated temporaries, and register-plus-constant-offset (indexed array) addressing. In these cases we can statically determine if two memory accesses do not alias.

4 Aliasing may produce loops in the access path. For instance, consider the path a-b-c-d; if c is an alias for b, after addresses have been assigned the resulting offset sequence would be, say, 0-4-4-8. MMA instructions generally can only access contiguous memory locations, neither skipping nor repeating addresses, and thus this sequence of accesses is not possible.

5.3.4 SOLVEMMA and Maximum Weight Path Covering

We follow the approach taken in SOLVESOA of a greedy MWPC algorithm (Section 5.2). The SOLVEMMA algorithm (Algorithm 5.1) is greedy in that on each iteration it selects the edge with the greatest weight; if two or more edges have the same weight the algorithm non-deterministically chooses one of them. In the following let AG be an Access Graph (Definition 5.1) with vertices V and edges E.

Definition 5.6 A partial disjoint path cover of a weighted Access Graph AG is a subgraph C = (V′ ⊆ V, E′ ⊆ E) of AG such that ∀ v ∈ V′, degree(v) ≤ 2 and there are no cycles in C; an orphan vertex is a vertex v ∈ V \ V′.

Algorithm 5.1 The SOLVEMMA algorithm.

// Take program P, construct its Access Graph (V, E) and return a covering
// Access Graph (V′, E′).
1.  procedure SOLVEMMA ( P:CFG ): AccessGraph
2.      (V, E) ← CONSTRUCTACCESSGRAPH(P);
3.      Esort ← SORTDESCENDINGORDER(E);   // Sort E by weight.
4.      V′ ← V, E′ ← ∅;
5.      while |E′| < |V′| − 1 and Esort ≠ ∅ do
6.          e ← greatest edge in Esort;
7.          Esort ← Esort − { e };
8.          if e does not cause any vertex in V′ to have degree > 2 and
9.             e does not cause a cycle in E′ and
10.+           head(e).dir ∈ { UND, TAIL } and
11.+           tail(e).dir ∈ { UND, HEAD } then
12.            E′ ← E′ + { e };
13.+           e′ ← reverse-edge ∈ E of e;
14.+           if e′ = ∅ or weight(e′) ≠ weight(e) then
15.+               walk from head(e) ∈ V′ marking UND vertices as HEAD;
16.+               walk from tail(e) ∈ V′ marking UND vertices as TAIL;
17.+           endif
18.        else
19.            discard e;
20.        endif
21.    endwhile
22.    return (V′, E′);
23. endproc

The steps additional to the SOLVESOA algorithm (marked with a '+') are lines 10 and 11, which ensure that we never add edges that violate the direction of directed paths, and lines 13–17, which convert undirected paths into directed paths if e is an unbalanced edge (Definition 5.3).

Applying the SOLVEMMA algorithm to the Access Graph produces a partial covering of the graph. It is partial in that some vertices in the graph may not be covered by an access path. An orphan vertex identifies a variable that cannot be profitably accessed with an MMA instruction.

The first step of the SOLVEMMA algorithm constructs the Access Graph (V, E) from the input program P. The set of edges E is then sorted in descending order by weight, a step which simplifies the operation of the algorithm. We initialize the output Access Graph (V′, E′) with V′ = V, and no edges.

The main body of the algorithm (lines 5–21) processes each edge, e, in Esort in turn until either there are just enough edges in the solution to form one long path (at which point we can add no further edges to E′ that would satisfy the criteria on lines 8 or 9), or there are no more edges in Esort. We remove e from Esort and then decide whether it should be added to C. To do so, it must meet the following four criteria:

1. The edge must not cause a cycle in E′;

2. The edge must not cause any vertex in V′ to have degree > 2, i.e., no edge can connect to an internal vertex within a path;

3. The head of the edge can only connect to a vertex that is either the tail of a directed path, or the end of an undirected path; and

4. The tail of the edge can only connect to a vertex that is either the head of a directed path, or the end of an undirected path.

If all four criteria are satisfied we add e to E′.

Initially, all vertices (and hence paths constructed from them) are marked UNDecided, reflecting no preference in the access order. However, it is very likely that in (V, E) there will be some sequences of access that are more favourable than others (e.g., if there was one instance of (p, q) and two instances of (q, p)). This is reflected in a difference between the weights of the edges (p, q) and (q, p). Indeed, there may not even be a matching reverse edge (line 13). The remaining lines, 15 and 16, handle the case of e being unbalanced, marking all the vertices from the head of e with HEAD, and all the vertices from the tail of e with TAIL. Note that this can happen at most once per path, as any subsequent addition to the path must respect its direction (lines 10 and 11). This simple heuristic works well in practice and has low runtime cost.

5.3.5 The Phase Order Problem

An important problem encountered by compiler designers is the phase ordering problem, which can be phrased as "in which order does one schedule two (or more) phases to give the best target code?". Many phases are very antagonistic towards each other; for example, if SOLVEMMA is applied before register allocation then any subsequent spill code generated by the register allocator would not be considered by SOLVEMMA, and additional constraints specified by the semantics of a given target's MMA instructions would be imposed on the operation of the register allocator. Alternatively, if SOLVEMMA is applied after the instruction scheduling phase then the scheduler might construct a sequence that prevents several memory access instructions becoming a single MMA instruction.

5.3.6 Scheduling SOLVEMMA Within A Compiler

We now describe the order in which MMA optimizations are applied to a CFG. While this order does not produce optimal code for all programs, the performance is generally good, with acceptable worst-case behaviour.


Figure 5.2: Scheduling MMA optimization with other compiler phases.

The first pass of MMA Optimization takes in a scheduled CFG, where the order of instructions (especially loads and stores) is fixed. At this stage the only memory accesses are to global variables (fixed addresses) and local and global arrays and structs (known to be on the stack or in global memory)5. The access graph constructed during this pass identifies provisional MMA instructions, albeit in a fashion which can be undone later.

The second phase is Register Allocation. This inserts spill code (compiler-generated loads and stores to temporaries placed on the stack) into the CFG where there are insufficient physical registers to be able to colour a virtual register.

The third phase is another MMA Optimization pass, but now concerned with loads and stores introduced by register spilling. Spill temporaries and their access sequences are added to the Access Graph of the first pass.

Phase four—Offset Assignment—assigns stack offsets to spilt locals and temporaries, guided by the access paths generated in the previous phase.

Finally, Register Assignment maps the virtual registers onto the physical registers of the target architecture. Again, the access paths are used to guide the assignment of registers to the MMA instructions, since most target MMA instructions enforce some order on the register list.

The output of this chain of phases is a CFG from which target code can be emitted. Where the register assigner has been unable to comply with the constraints of the target's MMA instruction register ordering, a single provisional MMA instruction is decomposed into a number of smaller MMA instructions, or even single loads or stores.

5 We assume that local user variables have been mapped to virtual registers, except in the case of address-taken variables, which are turned into one-element arrays.

5.3.7 Complexity of Heuristic Algorithm

SOLVEMMA processes each edge in E ⊆ V × V, which is potentially quadratic in the number of variables in the program. Thus we require SOLVEMMA to be an efficient algorithm, else it will be too costly for all but trivial programs. Here, we show that SOLVEMMA has similar complexity to SOLVESOA.

Lemma 5.1 The running time of SOLVEMMA is O(|E| log |E| + |L|), where |E| is the number of edges in the Access Graph, and |L| is the number of variable accesses in the program.

PROOF SOLVEMMA is derived from SOLVESOA, whose complexity has been shown to be O(|E| log |E| + |L|). Thus we only consider the additional complexity introduced in the

construction of the MMA Access Graph and in lines 10–11 and 13–16 of Algorithm 5.1. For every variable access l ∈ L there can be at most 1 adjacent variable access. Lines 10, 11 and 14 incur a constant cost per edge, as does line 13 in a well-implemented program. Lines 15–16 together walk paths to convert them into directed paths. A path can be converted from undirected to directed at most once. The total complexity is then O(|E| log |E| + |E| + |E| + |E| + |L|), i.e., O(|E| log |E| + |L|). □

Using path elements [72] the test for whether an edge e forms a cycle in C is reduced to testing whether the head and tail vertices of e are common to a single path element. This test can be performed in constant time.

5.4 Multiple Memory Access on the VSDG

The previous section applied MMA optimization to a program in CFG form. The transformations that are possible are constrained by the precise ordering specified by the CFG. In this section MMA optimization is applied to the Value State Dependence Graph (Chapter 3), which has fewer constraints on the ordering of instructions. Figure 5.3 shows the VSDG of the example program from Figure 5.1, showing memory load, memory store and add nodes, with value- and state-dependency edges, together with the resulting Access Graph and memory layout.

5.4.1 Modifying SOLVEMMA for the VSDG

The VSDG under-specifies the ordering of instructions. Thus we can reformulate the four criteria of Definition 5.1 to “(1) i ∈ α(p) and j ∈ α(q) are in the same γ- and θ-dominated regions6; (2) op(i) = op(j); (3) for loads, j has the same state-dependent node as i; for stores, j state-depends on i; and (4) j is not value-dependent on i.” For example, in “x=v[a+b]” there are three loads—one each for a, b and the indexed access into array v[]. But the order of the first two loads is unspecified. This is represented in the Access Graph by two edges, (a, b) and (b, a), of equal weight.

We gain two benefits from using the VSDG. The first is due to the state dependency edges defining the necessary ordering of loads and stores. For example, if a group of loads all depend on the same state, then those loads can be scheduled in any order; the VSDG underspecifies the order of the loads, allowing MMA optimization to find an order that benefits code size.

The second benefit is due to the separation of value dependency and state dependency. Such a separation facilitates a simple method of code motion by inserting additional serializing edges into the VSDG. For example, we can hoist an expression from between two stores, allowing them to be combined into a single MMA store.

The VSDG does not change the SOLVEMMA algorithm itself, but it is a better data structure, allowing greater flexibility in the mixing of phases within the compiler. For example, the register allocation phase (Chapter 6) can favour an instruction ordering which allows provisional MMA instructions to become actual MMA instructions.

6 These are equivalent to basic blocks in the VSDG—see Chapter 6, page 112.

Figure 5.3: The VSDG of Figure 5.1. The solid lines in (a) are value dependencies, and the dashed lines state dependencies (the middle portion of the graph is not shown, but can be readily constructed). The labels of the memory load ‘ld’ and memory store ‘st’ nodes include the variable name after the period. The Access Graph of (a) is shown in (b), with the covered edges in bold, resulting in memory layout (c).
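To make the reformulated criteria of Definition 5.1 concrete, the following C sketch tests whether two memory-access nodes i and j of a VSDG may be merged into one provisional MMA instruction. It is a minimal illustration under assumed data structures (the Node record, the region field and the two dependence walks are not VECC's); criterion (1) is approximated by comparing a precomputed region identifier.

    #include <stddef.h>

    typedef enum { OP_LOAD, OP_STORE } MemOp;

    typedef struct Node {
        MemOp op;
        int   region;                   /* enclosing gamma/theta region id */
        const struct Node *state_dep;   /* immediate state dependency      */
        const struct Node *val_dep[4];  /* value dependencies              */
        int   n_val;
    } Node;

    /* Transitive state dependence: does a depend on b through state edges? */
    static int state_depends_on(const Node *a, const Node *b)
    {
        const Node *s = a->state_dep;
        return s != NULL && (s == b || state_depends_on(s, b));
    }

    /* Transitive value dependence: does a depend on b through value edges? */
    static int value_depends_on(const Node *a, const Node *b)
    {
        for (int k = 0; k < a->n_val; k++)
            if (a->val_dep[k] == b || value_depends_on(a->val_dep[k], b))
                return 1;
        return 0;
    }

    /* Criteria (1)-(4) of the reformulated Definition 5.1. */
    static int may_combine(const Node *i, const Node *j)
    {
        if (i->region != j->region) return 0;            /* (1) same regions   */
        if (i->op != j->op)         return 0;            /* (2) same operation */
        if (i->op == OP_LOAD) {
            if (i->state_dep != j->state_dep) return 0;  /* (3) loads          */
        } else if (!state_depends_on(j, i)) {
            return 0;                                    /* (3) stores         */
        }
        return !value_depends_on(j, i);                  /* (4)                */
    }

In the “x=v[a+b]” example above, the two address loads satisfy all four criteria in either order, which is why the Access Graph receives both (a, b) and (b, a).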

5.5 Target-Specific MMA Instructions

The generic MMA instructions (Section 5.3.1) do not specify any addressing mode, since the underlying algorithm is neutral in this respect. Its purpose is to combine multiple instructions into a single instruction to reduce code size.

Tailoring the algorithm for a given architecture requires adding constraints to the code generator and register assigner. For example, the PowerPC's lmw and stmw instructions require a contiguous block of registers ending in R31. Appendix B describes the MMA instructions of a number of popular MMA-capable embedded and desktop processors.

The Thumb LDM and STM instructions use a post-incremented register. Thus we only consider access paths where the address is increasing. Raddr is updated with the address of the word after the last word accessed by the MMA instruction. This gives another form of register re-use similar to GOA which we resolve opportunistically: after an MMA instruction the base register may be available for re-use later, saving an address computation.
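Since the Thumb's LDMIA/STMIA step the base register upwards through consecutive words, a provisional MMA instruction survives target-specific checking only if its access path visits ascending, word-adjacent addresses. A small C sketch of such a check is shown below; the offset array, the word size and the three-access threshold (taken from the observation in Section 7.6) are assumptions of this illustration, not VECC code.

    #include <stddef.h>

    #define WORD 4

    /* offset[k] is the byte offset of the k-th access on the access path. */
    static int path_suits_thumb_mma(const int *offset, size_t n)
    {
        if (n < 3)                 /* below the Thumb break-even (Section 7.6) */
            return 0;
        for (size_t k = 1; k < n; k++)
            if (offset[k] != offset[k - 1] + WORD)
                return 0;          /* not the next ascending word */
        return 1;
    }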

5.6 Motivating Example

The SOLVEMMA algorithm has been implemented as a transformation tool within our experimental compiler, VECC (Chapter 7). The Thumb's MMA instructions (LDMIA and STMIA) are a subset of those in the full ARM instruction set, so any implementation will be applicable to both instruction sets, and there is greater interest in producing compact code for the Thumb than for the ARM.

A motivating example is shown in Figure 5.4. It is a function which performs some operation on an array, as might be found in network packet processing. SOLVEMMA combined all four loads into a provisional LDM instruction, and similarly for the four stores. However, the code generator found it cheaper (avoiding mov instructions) to undo the provisional LDM and emit four separate load instructions. Using classical optimizations not yet implemented in VECC we can remove the four instructions highlighted in Figure 5.4(b) by modifying the final add instruction.

(a) C source:

    void bufswap(int * array)
    {
        int i;
        for ( i = 0; i < 256; i += 4 ) {
            int t1, t2, t3, t4;

            t1 = array[i];
            t2 = array[i+1];
            t3 = array[i+2];
            t4 = array[i+3];

            array[i]   = t4;
            array[i+1] = t3;
            array[i+2] = t2;
            array[i+3] = t1;
        }
    }

(b) Thumb code (14 instructions; ** marks the four highlighted instructions):

    bufswap PROC
            push    {r4-r6,LR}
            mov     r1, #0
    tn_2    cmp     r1, #255        **
            bgt     tn_1            **
            lsl     r2, r1, #2      **
            add     r6, r0, r2      **
            ldr     r5, [r6, #0]
            ldr     r4, [r6, #4]
            ldr     r3, [r6, #8]
            ldr     r2, [r6, #12]
            stmia   r6!, {r2-r5}
            add     r1, r1, #4
            b       tn_2
    tn_1    pop     {r4-r6,PC}

Figure 5.4: A motivating example of MMA optimization of the function in (a). The output of our compiler (b) shows that we combine the four stores into a single STMIA instruction. This code compares favourably with that produced by the ARM commercial compiler, which required 17 instructions and one extra register; using classical optimizations our code gives 10 instructions.

5.7 Summary

This chapter presents the SOLVEMMA algorithm as a tool for combining several memory access instructions into a single MMA instruction. Using a technique similar to that of Liao's SOLVESOA algorithm, we both identify loads and stores that can be combined into single MMA instructions and guide stack frame layout. Implemented as a transformation within VECC we achieve up to 6% code size reduction, and never increase code size. Further experimental evidence is given in Chapter 7.

The current implementation of SOLVEMMA takes a simple approach to alias analysis, considering only loads or stores where we can directly infer their address relationship from the VSDG. More aggressive alias analysis should identify yet more opportunities for combining loads and stores into MMA instructions.

One wider question that remains unanswered is which of the possible MMA instructions produces the best results for code compaction. In our choice of the Thumb we have a single address register and a bitmap to specify the desired data registers. In contrast the MMA instructions of the PowerPC support register+offset addressing and specify the start index of a block of contiguous registers ending in R31.

CHAPTER 6

Resource Allocation

We are generally the better persuaded by the reasons we discover ourselves than by those given to us by others. BLAISE PASCAL (1623–1662)

Consider two important resources that target code consumes: time and space. Statically, we can view time as a schedule of instructions; we can view space as the registers and temporary variables that the compiler fills with intermediate values (subexpression values, loop counters, etc). The challenge, then, is to allocate these two resources in such a way as to minimize code size.

This leads to the phase order problem—which can be phrased as “in which order does one schedule two (or more) compiler phases to give the best target code?”, where best in this thesis is smallest. Many phases are very antagonistic towards each other; two examples being code motion (which may increase register pressure) and register allocation (which places additional dependencies between instructions, artificially constraining instruction scheduling).

In this chapter we show that a unified approach, in which both register allocation (space) and code motion (time) are considered together, resolves the problem of which phase to do first. Our approach uses the flexibility and simplicity of the Value State Dependence Graph (Chapter 3) to present an elegant solution to this problem.

6.1 Serializing VSDGs

Weise et al [112] observe that their mapping from CFGs to VDGs is many-to-one; they also suggest that “Code motion optimizations are decided when the demand dependence graph1 is constructed from the VDG”—i.e., that a VDG should be turned back into a CFG for further processing—but do not give an algorithm for this or consider which of the many CFGs corresponding to a VDG should be selected.

We identify VSDGs with enough serializing edges with CFGs. Such VSDGs can be simply transformed into CFGs if desired, the task of resource allocation then being to make the VSDG sufficiently sequential.

Definition 6.1 A sequential VSDG is one which has enough serializing edges and matching split nodes for every γ-node to make it correspond to a single CFG.

Here ‘enough' means in essence that each node in the VSDG has a unique (EV ∪ ES) immediate dominator which can be seen as its predecessor in the CFG. Exceptions arise for the start node (which has no predecessors in the VSDG or corresponding CFG), γ-nodes and θ-nodes.

Given a γ-node g we interpret those nodes which the T port post-dominates as the true sub-VSDG (the T γ-region) and those which the F port post-dominates as the false sub-VSDG; a control-split node (corresponding to a CFG test-and-branch node) is added to the VSDG as the immediate ES-dominator of both sub-VSDGs. For a θ-node, we recursively require this sequential property for its body and interpret the unique immediate post-dominator property as a constraint on its I port.

Upton [108] has shown that optimal placement of these split nodes is NP-Complete. While this is unfortunate, it should be noted that once the split nodes have been placed, then generating good code from the split-augmented VSDG is no longer NP-Complete.

6.2 Computing Liveness in the VSDG

For the purposes of register allocation we need to know which (output ports of) VSDG nodes may hold values simultaneously so we can forbid them being allocated the same register. We define a cut as a partition of a VSDG in Gnoloop form (Chapter 3, page 63), and the notion of interfering nodes in a given cut.

Definition 6.2 A cut is a partition N1 ∪ N2 of nodes in the VSDG with the property that there is no EV ∪ ES edge from N2 to N1.

Definition 6.3 Nodes n and n′ are said to interfere if there is a cut N1 ∪ N2 with n, n′ ∈ N1 and with both succV∪S(n) and succV∪S(n′) having non-empty intersections with N2.

This generalises the normal concept of register interference in a CFG; there a cut is just a program point and interference means “simultaneously live at any program point”. Similarly “virtual register” corresponds to our “output port of a node” and note that we use the concept of “cut based on Depth From Root” in Section 6.4 for our new greedy algorithm.

1 The demand dependence graph (DDG) has a similar structure to the control dependence graph (CDG) [39]. The DDG is constructed using the predicates that lead to a computation contributing to the output of the program, while the CDG uses the predicates that lead to a computation being performed, even if that computation does not contribute to the program's output.

6.3 Combining Register Allocation and Code Motion

The goal of register allocation in the VSDG is to allocate one physical register (from a limited set) to each node's output port. Tupled nodes are a special case, as they require multiple registers on their tupled ports. To wit: these are non-trivial γ- and θ-nodes, and multiple loads and stores (Chapter 5). Register requirements can be reduced by serializing computations (two live ranges can be targeted to the same physical register if they are disjoint), or by reducing the range over which a value is live by duplicating a computation, or by spilling a value to memory.

6.3.1 A Non-Deterministic Approach

Given a VSDG we repeatedly apply the following non-deterministic algorithm until all the nodes are coloured and the VSDG is sequential. In the following discussion we assume the use of an interference graph (e.g., see [25]), where nodes correspond to live ranges of variables (i.e., virtual registers), and there is an edge between two nodes if the two live ranges overlap.

1. Colour a port with a physical register—provided no port it interferes with is already coloured with the same register;

2. Add a serializing edge to force one node before another—this removes edges from the classical register interference graph by forbidding interleaving of computations;

3. Clone a node, i.e., recalculate it to reduce register pressure (a form of rematerialization [21]).

4. Tunnel values through memory by introducing store/load spill nodes.

5. Merge two γ-nodes γa and γb into a tuple, provided their C ports depend on the same predicate node and there is no path from γa to γb or from γb to γa.

6. Insert a split node for a γ-node to identify the boundaries of the γ-node regions (Section 6.5.1).

The first action assigns a physical register to a port of a given node. The second moves a node; the choice of which node to move is determined by specific algorithms (Sections 6.3.2 and 6.4). Node cloning replaces a single instance of a node that has multiple uses with multiple copies (clones) of the node, each with a subset of the original dependency edges. For example, a node n with two successor nodes p and q can be cloned into nodes n′ and n″, with p dependent on n′ and q dependent on n″.

Spilling follows the traditional Chaitin-style register spilling [25] where we add store and load nodes, together with some temporary storage in memory. The cost of spilling loop-variant variables is higher than the store-and-reload for a normal spill. For θ-nodes where the tuple is wider than the available physical registers, we must spill one or more of the θ-node variables over the loop test code, not merely within the loop itself. At most this requires two stores and three loads for each variable spilled. Figure 6.2 shows the location of the five spill nodes and when they are needed.

Finally, we can arrange to move nodes out of a γ-node region by inserting a split node. This will reduce the number of live edges in the γ-node region, at the cost of increasing the overall number of live edges over the γ-node. For example, consider the following

a) if (P) x = 2, y = 3;
   else x = 4, y = 5;

b) if (P) x = 2; else x = 4;
   ...
   if (P) y = 3; else y = 5;

Figure 6.1: Two different code schemes (a) & (b) map to the same γ-node structure.

if ( P ) a = ( p * q ) + ( r / s ) - ( v % u ); else a = c + d;

The complex expression in the true region of this γ-node has six live variables on entry. If there are insufficient registers then moving either the entire expression or part of it out of the γ-node would yield the following

temp = ( r / s ) - ( v % u ); if ( P ) a = ( p * q ) + temp; else a = c + d;

6.3.2 The Classical Algorithms

We can phrase the classical Chaitin/Chow-style register allocators as instances of the above algorithm:

1. Perform all code motion transforms through adding serializing edges and merging γ-nodes if not already sequentialized;

2. Map the VSDG onto a CFG by adding enough (Section 6.1) additional serializing edges;

3. If there are insufficient physical registers to colour a node port, then:

(a) Chaitin-style allocation [25]: spill edges, with the restriction that the destination register of the reload is the same as the source register of the store. Chaitin's heuristics can be applied to determine which edge to spill.

(b) Chow-style allocation [26]: spill edges, but without the register restriction of Chaitin-style, splitting the live-range of the virtual register; use Chow's heuristics to decide which edge to split.

    Spill Node    Needed if variable v...
    A             is initialised
    B             is used in Loop Body
    C             is defined in Loop Body
    D             is used after the loop
    E             is used in the predicate P

Figure 6.2: The locations of the five spill nodes associated with a θ-node (left), describing (right) when each of the spill nodes is needed.

In both allocation algorithms post-code-motion transformations during register allocation are limited to inserting spill nodes into the program. Note that there is a range of possible approaches to allocating VSDGs. One approach is a depth-first traversal of the VSDG, another being the “snow-plough” algorithm described in the following section.

6.4 A New Register Allocation Algorithm

The Chaitin/Chow algorithms do not make full use of the dependence information within the VSDG. Instead, they assume that a previous phase has performed code motion to produce a sequential VSDG—corresponding to a single CFG—on which traditional register colouring algorithms are applied.

We now present the central point of this chapter—a register allocation algorithm specifically designed to maximise the usage of information within the VSDG. The algorithm consists of two distinct phases:

1. Starting at the exit node N∞, walk up the graph edges calculating the maximal Depth From Root (DFR) of each node (Definition 3.12, page 52); for each set of nodes of equal depth calculate their liveness width (the number of distinct values on which they depend, taking into account control flow in the form of γ- and θ-regions).

2. Apply a forward “snow-plough”2-like graph reshaping algorithm, starting from N∞ and pushing up towards N0, to ensure that the liveness width (Section 6.6) is never more than the number of physical registers. This is achieved by moving or cloning nodes, or spilling edges, in a greedy way so that the previously smoothed-out parts of the graph (nearer the exit) are not re-visited.

2 Imagine a snow plough pushing forward, scooping up excess snow, and depositing it where there is little snow.

The result is a colourable VSDG; colouring it constitutes register assignment. The following sections describe the steps necessary to allocate physical registers to a VSDG.
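A minimal C sketch of the first phase is given below. It assumes an array-of-nodes representation in which each node lists the nodes it depends on (its EV and ES targets); the exit node is given depth 1, matching Figure 6.4, and every other node receives the maximal depth over all of its consumers plus one. None of these names come from VECC.

    typedef struct Node {
        struct Node **deps;    /* nodes this node depends on (E_V and E_S) */
        int n_deps;
        int dfr;               /* result: maximal Depth From Root */
    } Node;

    /* Propagate depth from a consumer n to its dependencies, keeping the
     * maximum.  The VSDG in Gnoloop form is acyclic, so this terminates;
     * a production version would visit nodes in topological order rather
     * than re-relaxing as done here for brevity. */
    static void relax(Node *n)
    {
        for (int k = 0; k < n->n_deps; k++) {
            Node *d = n->deps[k];
            if (d->dfr < n->dfr + 1) {
                d->dfr = n->dfr + 1;
                relax(d);
            }
        }
    }

    static void compute_dfr(Node **nodes, int n_nodes, Node *exit_node)
    {
        for (int k = 0; k < n_nodes; k++)
            nodes[k]->dfr = 0;
        exit_node->dfr = 1;    /* N_inf has depth 1, as in Figure 6.4 */
        relax(exit_node);
    }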

6.5 Partitioning the VSDG

The first pass annotates a VSDG with the DFR. The second pass then processes each cut of the VSDG in turn.

Definition 6.4 A depth-first cut Sd is the set of nodes with the same DFR d:

    Sd = { n ∈ N | D(n) = d }

It is also convenient to write

    S≤d = { n ∈ N | D(n) ≤ d }
    S>d = { n ∈ N | D(n) > d }

Note that the partition (S≤d, S>d) is a cut according to Definition 6.2 (page 108).

6.5.1 Identifying if/then/else

A necessary task during VSDG register allocation and code motion is to identify γ-regions, i.e., the nodes which will be in the true or false regions of the γ-nodes in a VSDG.

Definition 6.5 A γ-region for a γ-node g is the set of nodes, Rγ(gp), strictly post-dominated by port p of g, such that

    Rγ(gp) = { n | n ∈ N ∧ gp spdom n }.

During sequentialization of a VSDG we identify the node which the γ-node's C port depends on as the split node, and insert serializing edges from all nodes in the two γ-regions to this split node.

6.5.2 Identifying Loops

The second region of interest in the VSDG is the θ-region: the set of loop-variant nodes that must be in a loop body.

Definition 6.6 A θ-region for a θ-node l is the set of nodes, Rθ(l), such that

    Rθ(l) = { n | n ∈ N ∧ ltail spdom n ∧ ∃ n −→ lhead },

where −→ denotes a path in (EV ∪ ES).

Loop-invariant nodes may be placed either inside or outside a loop as directed by the resource allocation algorithm. For example, consider

    a = ...;  b = ...;
    for( i = 0; i < 10; i++ ) {
        ...
        x = a + b;
        ...
    }
    print(x);

The expression “a+b” is loop-invariant, so it can be executed either on each iteration (if placed inside the loop) or computed once into a temporary, which is then read from within the loop, producing:

    temp = a + b;
    for( i = 0; i < 10; i++ ) {
        ...
        x = temp;
        ...
    }
    print(x);

Such subexpressions can occur from address expressions or macro expansion. It is not clear whether such expressions should be inside or outside of a loop, and it is left to the RACM algorithm to move the expression out of the loop if possible (thus reducing execution time) or to place it in the loop if this minimises spill code or code duplication.

The Gnoloop form (Definition 3.16, page 63) on VSDGs guarantees that all paths from a node n outside Rθ to θhead include θtail, and therefore its DFR must be less than that of the θtail node. Figure 6.3 illustrates the intersection of the two conditions: nodes strictly dominated by the θtail node, and nodes dependent on the θhead node.

Many optimization passes, together with code generation, need to compute the θ-regions of a VSDG. While speed of compilation is not the main focus of this thesis, it is desirable to employ fast algorithms. Fortunately, Algorithm 6.1 has O(|N|) runtime complexity:

Lemma 6.1 The complexity of computing the θ-region of a θ-node is O(|N|), for |N| nodes.

PROOF Clearly, step 1 is constant time, O(1). Step 2 first passes over all |N| nodes in G, adding them to A if they match the DFR test. Sorting the list then takes O(|N|) time since we know beforehand how many bins there will be. This step has complexity O(|N| + |N|). Finally, step 3 processes each node in the set A, and tests at most |E| edges. The total complexity is therefore O(|N| + |E|). However, in the VSDG, |E| = kmaxarity × |N| (where kmaxarity is the maximum arity of nodes in G). Therefore the complexity of computing the θ-region is reduced to O(|N|). □

Figure 6.3: Illustrating the θ-region (hatched) of a θ-node as the intersection of nodes post dominated by the θtail node (dashed line), and nodes dependent on the θhead node (dash-dot line). The Gnoloop form ensures the former, while Algorithm 6.1 computes the latter.

Algorithm 6.1 Compute Theta Region

Input: A VSDG G = (N, EV, ES, ℓ, N0, N∞) and a θ-node l ∈ N.
Output: All loop-variant nodes in N are in Rθ(l).

Method:

1. Initialise Rθ(l) = { l }.

2. Let A = SORTBYDESCENDINGDFR( { n | n ∈ N ∧ D(ltail) < D(n) < D(lhead) } ).

3. For each a ∈ A, from first to last, if a is (EV ∪ ES)-dependent on a node in Rθ(l), then move a from A to Rθ(l).
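One possible C rendering of Algorithm 6.1 is sketched below. The node representation and the use of qsort in place of the linear bucket sort assumed by Lemma 6.1 are illustrative choices only; following Figure 6.3, the θ-node is seeded into the region through its head component, so that step 3 pulls in exactly the nodes that (transitively) depend on θhead.

    #include <stdlib.h>

    typedef struct TNode {
        struct TNode **deps;   /* (E_V or E_S) dependency targets */
        int n_deps;
        int dfr;
        int in_region;         /* set when the node joins R_theta(l) */
    } TNode;

    static int by_descending_dfr(const void *a, const void *b)
    {
        const TNode *x = *(const TNode *const *)a;
        const TNode *y = *(const TNode *const *)b;
        return y->dfr - x->dfr;
    }

    static void theta_region(TNode **nodes, int n_nodes,
                             TNode *l_head, TNode *l_tail)
    {
        /* Step 1: seed the region with the theta-node, here represented by
         * its head component (Figure 6.3: nodes depending on theta-head). */
        l_head->in_region = 1;

        /* Step 2: A = { n | D(l_tail) < D(n) < D(l_head) }, sorted by
         * descending DFR.  Lemma 6.1 assumes a linear bucket sort; qsort
         * is used here only for brevity. */
        TNode **A = malloc(sizeof *A * (size_t)n_nodes);
        if (A == NULL) return;
        int na = 0;
        for (int k = 0; k < n_nodes; k++)
            if (nodes[k]->dfr > l_tail->dfr && nodes[k]->dfr < l_head->dfr)
                A[na++] = nodes[k];
        qsort(A, (size_t)na, sizeof *A, by_descending_dfr);

        /* Step 3: a node joins the region if it depends on a region node;
         * descending DFR order guarantees its targets were seen first. */
        for (int i = 0; i < na; i++)
            for (int k = 0; k < A[i]->n_deps; k++)
                if (A[i]->deps[k]->in_region) { A[i]->in_region = 1; break; }

        free(A);
    }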

6.6 Calculating Liveness Width

We wish to transform each cut so that the number of nodes having edges passing through it is no greater than R, the number of registers available for colouring.

For a cut of depth d the set of such live nodes is given by

    Win(d) = S>d ∩ predV(S≤d)

i.e., those nodes which are further than d from the exit node but whose values may be used on the path to the exit node. Note that only EV and not ES edges count towards liveness, and that nodes which will not require registers (e.g., small constants, labels for call nodes, etc) do not contribute towards Win.

One might expect that |Win(d)| is the number of registers required to compute the nodes in S≤d, but this overstates the number of registers required for γ-nodes. The edges of each γ-region are disjoint—on any given execution trace, exactly one path to a γ-node, g, will be executed at a time, and so we can reuse the same registers for colouring Rγ(gT) and Rγ(gF).

To compute the maximal |Win(d)| we compute the maximum width (at depth d) of S≤d ∩ Rγ(gT) and S≤d ∩ Rγ(gF). This computation is recursively applied to all γ-nodes, g, where S≤d ∩ Rγ(gT) ≠ ∅ or S≤d ∩ Rγ(gF) ≠ ∅. Thus for a set of nested γ-nodes we compute the maximal (i.e., safe) value of |Win(d)|.

6.6.1 Pass Through Edges

Some edges pass through a cut. These pass-through (PT) edges may also interact with the cut. However, even the ordinary PT edges require a register, and so must be accommodated in any colouring scheme.

Definition 6.7 The lifetime of an edge (n, n′) is the number of cuts over which it spans:

    L((n, n′)) = D(n′) − D(n)

Definition 6.8 An edge (n, n′) ∈ EV is a Pass Through (PT) edge over cut Sd when:

    D(n′) > d > D(n)

A Used Pass Through (UPT) edge is a PT edge from a node which is also used by one or more nodes in Sd, i.e., there is n″ ∈ Sd with (n″, n′) ∈ EV.

In particular, PT (and to a lesser extent, UPT) edges are ideal candidates for spilling when transforming a cut. The next section discusses this further.
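As an illustration, |Win(d)| can be computed by a single scan over the value edges: a producer counts towards the width of cut d when it lies in S>d and at least one of its consumers lies in S≤d. The C sketch below ignores the γ-region sharing refinement and the exclusion of register-free nodes described above, and the edge orientation (consumer, producer) is an assumption of this illustration.

    typedef struct VNode { int dfr; int counted; } VNode;

    typedef struct VEdge { VNode *consumer, *producer; } VEdge;  /* E_V only */

    /* |W_in(d)|: producers deeper than d whose value is consumed at a depth
     * no greater than d. */
    static int liveness_width(VEdge *edges, int n_edges,
                              VNode **nodes, int n_nodes, int d)
    {
        int width = 0;
        for (int k = 0; k < n_nodes; k++)
            nodes[k]->counted = 0;
        for (int k = 0; k < n_edges; k++) {
            VNode *p = edges[k].producer, *c = edges[k].consumer;
            if (p->dfr > d && c->dfr <= d && !p->counted) {
                p->counted = 1;           /* p is live over cut S_d */
                width++;
            }
        }
        return width;
    }

    /* Definition 6.8: (c, p) is a Pass Through edge over cut d when
     * D(p) > d > D(c), i.e. neither end point lies in S_d itself. */
    static int is_pass_through(const VEdge *e, int d)
    {
        return e->producer->dfr > d && e->consumer->dfr < d;
    }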

6.7 Register Allocation

To colour the graph successfully with R physical registers, no cut of the graph must be wider than the number of physical registers available. The register allocation algorithm proceeds from N∞ upwards, processing each cut in turn. For each cut Sd, at depth d, calculate Win(d). Then, while |Win(d)| > R we apply three transformations to the VSDG in order of increasing cost: (i) node raising (code motion), (ii) node cloning (undoing CSE), or (iii) edge spilling, where we first choose non-loop edges followed by loop edges. Note that |Win(d)| = |Wout(d + 1)|, such that once a given cut, Sd, has been register allocated, then the next cut, Sd+1, is guaranteed to have |Wout(d + 1)| ≤ R.

6.7.1 Code Motion

The first strategy for reducing |Win(d)| is to push up nodes which have more value dependencies than outputs. For example, moving a binary node (e.g., add) may reduce |Win(d)| by one edge: its two value dependencies reduce |Win(d)|, and its output increases |Win(d)|.

We say may because PT edges, including UPT edges, reduce the effect of pushing nodes which are incident to them. If one of the value-dependency edges of the add node from above was a UPT edge, then pushing up that node would not reduce |Win(d)|. Indeed, if both value-dependency edges were UPT edges, then |Win(d)| would be increased, due to the addition of the result edge(s) of the node (which was previously only in Wout(d)).

Code motion first identifies the base node of the cut—the node which will least reduce |Win(d)|. Then, while |Win(d)| > R, we choose a node in the cut that will most reduce |Win(d)|.

[Graph drawing omitted: in (a), values a and b (depth 5) feed node c (depth 4); c feeds d and e (depth 3), which feed f (depth 2), which feeds print (depth 1). In (b) node c is cloned, and the clone c′ sits in cut 2 alongside f.]
Figure 6.4: Node cloning can reduce register pressure by recomputing values. In (a), cut 2 has |Win(2)| = 5. If R = 4, then this cut is not colourable. With only one node (node f) in cut 2, code motion is not possible. The least-cost option, then, is to clone node c, producing node c′ (see (b)), making |Win(2)| = 4.

Then serializing edges are added from the base node to the node being pushed to move it up the graph and into the next cut. We repeat this until either |Win(d)| ≤ R, or there is only one node (the base node) left in the cut.
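The overall per-cut strategy of Section 6.7 can be summarised by the following C outline. The four helpers are assumptions standing in for the transformations described in this chapter; each of the first two reports whether it made progress, so the cheapest applicable transformation is always tried first.

    typedef struct Vsdg Vsdg;                              /* opaque here    */

    extern int  width_in(const Vsdg *g, int d);            /* |W_in(d)|      */
    extern int  raise_node(Vsdg *g, int d);                /* code motion    */
    extern int  clone_node(Vsdg *g, int d);                /* undo CSE       */
    extern void spill_cheapest_pt_edge(Vsdg *g, int d);    /* Equation 6.1   */

    /* Apply the cheapest transformation that still makes progress until the
     * cut fits within R registers. */
    static void allocate_cut(Vsdg *g, int d, int R)
    {
        while (width_in(g, d) > R) {
            if (raise_node(g, d)) continue;      /* (i)  node raising   */
            if (clone_node(g, d)) continue;      /* (ii) node cloning   */
            spill_cheapest_pt_edge(g, d);        /* (iii) edge spilling */
        }
    }

    /* The allocator walks the cuts upwards from the exit node:
     *     for (d = 1; d <= max_depth; d++) allocate_cut(g, d, R);       */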

6.7.2 Node Cloning

In node cloning, we take a node and generate copies (clones), undoing node merging (CSE) performed during earlier stages of VSDG construction and optimization. Cloning nodes in the current cut, Sd, will not reduce |Win(d)|. The greatest benefit is achieved from cloning nodes which generate PT edges over the cut, and which are V-successors of values that are live over the cut, and may also be used by other nodes in that cut. The benefit comes from removing the generated PT edges from |Win(d)|.

An example of the application of node cloning is shown in Figure 6.4. Without node cloning, the register allocator would be forced to spill the edge from the print node to the c node, at a cost of two spill nodes (one load/store pair).

Node cloning is not always applicable as it may increase the liveness width of higher cuts (when the in-registers of the cloned node did not previously pass through the cut); placing a cloned node in a lower cut can increase the liveness width. But, used properly [96], node cloning can reduce the liveness width of lower cuts by splitting the live range of a node, which potentially has a lower cost than spilling.

6.7.3 Spilling Edges

Potentially the most costly transformation is spilling. When the allocation algorithm finds a cut, Sd, which is too wide, and for which neither node pushing nor node cloning can reduce |Win(d)|, it chooses an edge to spill.

The aim of spilling is to reduce |Win(d)| of cut Sd. By the time the algorithm considers spilling, both node pushing and cloning have already been exhaustively applied to the cut in question. Therefore the algorithm only considers PT edges for spilling, since spilling a UPT edge will not change |Win(d)|.

Given a set of PT edges over a cut Sd, the choice then is which edge to spill. This is decided by a heuristic cost (Equation 6.1) computed for each edge, with the edge with the lowest cost being chosen. As discussed previously (page 109), spilling variant θ-node edges (Figure 6.2) suffers the highest cost of spilling, with up to two stores and three loads. Our initial heuristic cost function for edge spilling is

    C(e) = 5 · L(e)^−1   if e is loop-variant,
           2 · L(e)^−1   otherwise.                  (6.1)

It favours loop-invariant edges over loop-variant edges since we save on not storing the updated value, with savings both in static and dynamic instruction counts. The constant factors reflect the number of nodes added to the program when spilling both types of edge. It also favours edges with greater lifetimes, i.e., greater separation between the spill node and the corresponding reload node(s), which affords good performance on modern pipelined RISC processors.
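Equation 6.1 translates directly into a comparison key; a hedged C rendering (the edge record is an assumption for this illustration) is:

    typedef struct SpillEdge {
        int lifetime;       /* L(e) = D(n') - D(n); at least 1  */
        int loop_variant;   /* non-zero for theta-variant edges */
    } SpillEdge;

    /* Equation 6.1: lower cost means a better spill candidate. */
    static double spill_cost(const SpillEdge *e)
    {
        double k = e->loop_variant ? 5.0 : 2.0;
        return k / (double)e->lifetime;
    }

Choosing the PT edge with the smallest cost therefore prefers long-lived, loop-invariant edges, as argued above.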

6.8 Summary

Combining code motion and register allocation avoids the well-known conflict between these two antagonistic phases. The method presented in this chapter uses the inherent flexibility of the VSDG as a step towards this goal, due to its underspecification of the ordering of the instructions (nodes) within a program. A levelling, single-pass, algorithm walks up the VSDG, from the exit node, moving or cloning nodes, or spilling edges, to reshape the graph so that it can later be coloured with physical registers.

Experimental evidence presented in the next chapter shows that this relatively simple, and platform-neutral, algorithm can achieve results as good as, or better than, those of carefully-tuned commercial compilers.

CHAPTER 7

Evaluation

Advice is judged by results, not by intentions. CICERO (106–43 BC)

The goal of this thesis is reducing code size for embedded systems through the optimization of intermediate code. The chapter begins with a description of the process of generating Thumb target code from a VSDG, including register assignment and instruction scheduling. We then present the results of experiments on procedural abstraction, MMA optimization, and the RACM algorithm.

Our experimental compiler generates VSDG intermediate code from a large subset of C (excluding labels for convenience); a number of optimizations are applied to the intermediate code, which is then turned into target code for the Thumb 32-bit processor. The framework includes the optimizations described in Chapter 4 (Pattern-based Procedural Abstraction), Chapter 5 (Multiple Memory Access Optimization), and Chapter 6 (Combined Register Allocation and Code Motion).

7.1 VSDG Framework

The VECC (VSDG Experimental C Compiler) framework, shown in Figure 1.2 on page 22, provides a complete tool set, including a C compiler, a variety of optimizers and analysis tools, and a Thumb target code generator. Between the various components of VECC the program is represented as text files, as described in Appendix A. This allows experimentation with different optimization strategies through the use of shell scripts, and intermediate results can be easily saved for later analysis.

7.2 Code Generation

The vsdg2thm tool generates code for the Thumb 32-bit processor [97]. It assumes that register allocation has already been performed on the VSDG by the RACM optimizer (Chapter 6). The code generator comprises five main functions—CFG generation, register assignment, stack frame layout, instruction selection, and literal pool management.

7.2.1 CFG Generation

The code generator assumes that all code scheduling and register allocation has been applied to the input VSDG. The remaining scheduling inserts additional serializing edges to construct a CFG, with exactly one node per cut (Section 6.1).

7.2.2 Register Colouring

A Chaitin-style graph-colouring register allocator assigns (colours) edges with physical registers. An undirected interference graph, (V, E), is constructed. Each output port in the VSDG that requires a register is represented as a vertex v ∈ V, and there is an edge (v, v′) ∈ E if the output port identified with vertex v is live at the same time as the output port identified with vertex v′. Where an edge cannot be coloured (perhaps due to specific register assignments to satisfy calling conventions or instruction-specific registers) a simple spill is inserted. Previous RACM passes minimize the occurrences of such late spills.
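As a sketch of this step (independent of VECC's data structures), once the RACM pass has kept every cut within R registers the interference graph can be coloured by a simple greedy scan. The adjacency-matrix representation and the bounds on R and the port count are assumptions of this illustration, and a production Chaitin-style allocator would use the simplify/select discipline instead.

    #define MAX_PORTS 256

    /* Greedy colouring sketch: interferes[i][j] is non-zero when output
     * ports i and j are live at the same time; colour[i] receives a
     * register number.  Returns 0 on success, -1 where a real allocator
     * would insert a spill.  Assumes R <= 32 and n <= MAX_PORTS. */
    static int colour_ports(int n, int R,
                            const unsigned char interferes[][MAX_PORTS],
                            int colour[])
    {
        for (int i = 0; i < n; i++) {
            unsigned char used[32] = { 0 };
            for (int j = 0; j < i; j++)
                if (interferes[i][j] && colour[j] >= 0)
                    used[colour[j]] = 1;
            colour[i] = -1;
            for (int r = 0; r < R; r++)
                if (!used[r]) { colour[i] = r; break; }
            if (colour[i] < 0)
                return -1;
        }
        return 0;
    }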

7.2.3 Stack Frame Layout

Local variables (address-taken variables, arrays, structs, spills, etc) generate local data objects in the VSDG. These are mapped to stack locations for which offsets are assigned, guided by MMA information (Chapter 5). The VSDG is then updated with these offsets prior to instruction generation.

7.2.4 Instruction Selection

Instructions are selected in three stages. The first stage walks backwards (with respect to control flow) up the scheduled and coloured VSDG, selecting instructions and pushing them onto a stack. This mirrors the direction of dependence in the VSDG, and supports good use of the target's instruction set, e.g., register-plus-offset addressing:

[Diagram omitted: an add node combining r3 and the constant 4, feeding a load node whose result is r4, is selected as the single instruction ldr r4,[r3,#4].]

After the VSDG has been traversed and all the instructions have been pushed onto the stack, a simple optimizer walks down the stack applying peephole optimizations, such as eliminating jumps and simplifying conditional branches. The final stage emits the target code, starting with the function prologue (setting up the stack frame and marshalling the argument registers), then the contents of the instruction stack, and finally the epilogue code (restoring the stack and returning to the caller).

7.2.5 Literal Pool Management

The Thumb is limited to loading 8-bit unsigned immediate values, so larger values (e.g., pointers or large constants) must either be loaded from a literal pool, or computed at runtime. The break-even point for computing a literal with two instructions is for two occurrences: computing it twice consumes sixty-four bits of code space (four 16-bit instructions), and loading from the literal pool also consumes sixty-four bits (one 32-bit literal, and two 16-bit loads).

VECC computes literals if they can be generated from two instructions and they occur exactly once in a function; all other literals are loaded from a literal pool. Literals that cannot be computed in two instructions are always stored in the literal pool. For example, the constant 0x3F00 is too large for a single instruction. If it appears once, then it can be computed from an immediate load of 0x3F and a left-shift by eight.
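The policy above can be phrased as a small decision function. The following C sketch is illustrative only: can_build_in_two_insns covers just the mov-plus-shift recipe used in the 0x3F00 example, and the occurrence count is assumed to be supplied by the caller.

    #include <stdint.h>

    /* True when v can be formed as (imm8 << s), e.g. 0x3F00 = 0x3F << 8.
     * Only the mov+lsl recipe is modelled; other two-instruction recipes
     * are ignored in this sketch. */
    static int can_build_in_two_insns(uint32_t v)
    {
        for (int s = 1; s < 32; s++)
            if ((v >> s) != 0 && (v >> s) <= 0xFF && ((v >> s) << s) == v)
                return 1;
        return 0;
    }

    /* Policy of Section 7.2.5: compute a large literal in line only when
     * it can be built in two instructions and occurs exactly once. */
    static int should_compute_inline(uint32_t v, int occurrences)
    {
        if (v <= 0xFF)
            return 1;      /* fits a single mov immediate */
        return occurrences == 1 && can_build_in_two_insns(v);
    }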

7.3 Benchmark Code

VECC has been applied to the gsm benchmark from the MediaBench benchmark suite [68]. This code is typical of the kind found in mobile phones, set-top boxes, PDAs, games, and a wide variety of other similarly-sized embedded systems. It is an implementation of the GSM-standard realtime speech compressor.

The gsm benchmark consists of over three thousand lines of source code generating over six thousand VSDG nodes. Due to limitations in VECC some minor changes were made to the source code (primarily to rephrase switch and case statements as if-trees); when comparing with other compilers the exact same modified source code is used for consistency.

7.4 Effectiveness of the RACM Algorithm

The baseline for VECC is the effectiveness of the RACM algorithm and our code generator. We compare VECC with both the commercial Thumb C compiler from ARM (version ADS1.2 [Build 805]), and the open-source GCC C compiler (version 3.2.1). Both compilers are run with their standard size-optimizing flags (“-Ospace” for ARM's compiler, and “-Os” for the GCC compiler).

The modules chosen for analysis represent a range of different sizes and styles of source code. The smaller modules consist mostly of expression-rich functions with many live registers and thus opportunity for optimizing spill code. The larger modules consist mostly of long lists of similar, small statements generated from macro expansion or element-by-element array processing.

The results of this experiment are shown in Table 7.1 (page 122). Overall, ARM's Thumb compiler produces around 9% fewer instructions than VECC. This is attributable to a carefully designed register allocator, narrow sub-word optimizations, and so on. In a couple of cases (lpc.c and short term.c) VECC beats ARM's compiler by up to 5%—analysis of the resulting code indicates some functions where ARM's compiler generates more spills, and hence more stores and reloads. In contrast, VECC produces almost 13% smaller code than the GCC compiler with its “-Os” optimizations. It achieves this by spilling fewer callee-save registers and reusing address expressions. In gsm encode.c GCC produces smaller code by, for example, using arithmetic identities, which are not the focus of this thesis.

    Module            ARM Thumb    VECC Thumb       GCC Thumb
    add.c                   253    298 (117.8%)     332 (131.2%)
    code.c                   91     92 (101.1%)     104 (114.3%)
    decode.c                122    128 (104.9%)     142 (116.4%)
    gsm decode.c            480    584 (121.7%)     603 (125.6%)
    gsm encode.c            430    559 (130.0%)     552 (128.4%)
    gsm explode.c           433    609 (140.1%)     567 (130.9%)
    gsm print.c             690    748 (108.4%)     874 (126.7%)
    lpc.c                  1129   1094 (96.9%)     1281 (113.5%)
    short term.c            899    850 (94.5%)     1134 (126.1%)
    Numeric totals         4527   4962 (109.6%)    5589 (123.5%)

Table 7.1: Measured performance of VECC with the RACM optimizer. Numbers indicate static instruction counts; percentages are relative to ARM's Thumb compiler.

7.5 Procedural Abstraction

The procedural abstraction algorithm (Chapter 4) was scheduled between the C-to-VSDG compiler and the RACM optimizer. The same code from the previous experiment was applied to the compiler, with the results shown in Table 7.2. For each file we give the number of instructions before procedural abstraction and, for maximum pattern sizes of 5, 10 and 20 nodes1, the resulting total number of instructions, including any abstract functions created.

VECC achieves a combined reduction of about 18%, which is mostly independent of the maximum pattern size.

1 Additional experiments were carried out for maximum pattern sizes of 30 nodes, but there was no improvement over the results for pattern sizes of 20 nodes.

    File             # Insn   5 Node Patterns   10 Node Patterns   20 Node Patterns
    add.c               253       228 (90%)          228 (90%)          228 (90%)
    code.c               91        91 (100%)          91 (100%)          91 (100%)
    decode.c            122       122 (100%)         122 (100%)         122 (100%)
    gsm decode.c        480       399 (83%)          399 (83%)          399 (83%)
    gsm encode.c        430       289 (67%)          289 (67%)          289 (67%)
    gsm explode.c       433       367 (85%)          367 (85%)          367 (85%)
    gsm implode.c       490       377 (77%)          377 (77%)          377 (77%)
    gsm print.c         690       601 (87%)          601 (87%)          601 (87%)
    long term.c         793       635 (80%)          635 (80%)          635 (80%)
    lpc.c              1129      1011 (89%)         1011 (89%)         1011 (89%)
    preprocess.c        115       100 (87%)          100 (87%)          100 (87%)
    rpe.c               701       552 (79%)          565 (81%)          531 (76%)
    short term.c        899       783 (87%)          801 (89%)          792 (88%)
    COMBINED           6626      5446 (82%)         5450 (82%)         5466 (82%)

Table 7.2: Applying pattern abstraction algorithm to the gsm benchmark for three pattern maxima. The first node column is for the unmodified program. The following columns indicate the total number of instructions after procedural abstraction, and the resulting compaction ratio. The final row is for procedural abstraction applied to all files combined into one program; note that this is not the same as the arithmetic sum due to common patterns shared between files.

Interestingly, the size reduction decreases slightly as the maximum pattern size increases. This is due to the increased likelihood of two patterns overlapping as the abstract pattern size increases, reducing the effectiveness of the algorithm and distorting the benefit cost model.

Table 7.3 (page 124) shows the number of abstract functions created (and thus the number of beneficial patterns found in the source program) and the total numbers of calls made to the abstract functions (i.e., the total number of pattern occurrences). Most of the reduction in size is due to smaller patterns occurring very frequently, rather than larger patterns occurring once or twice. For example, short term.c and rpe.c generate two more pattern instances when increasing the maximum pattern size from five to ten nodes. Note that while adding support for loops would allow even larger patterns to be abstracted, as the majority of abstracted patterns are small (patterns of five nodes or less account for almost all of the abstractions) the benefit would be small.

The COMBINED figures show the results after applying the procedural abstractor to all of the modules combined into a single VSDG. This provides more program for the abstraction algorithm to analyse and to find more occurrences of common patterns. For example, for 5-node patterns, considered separately there were 67 patterns abstracted, but together only 32 patterns abstracted. Also note that while fewer patterns were abstracted, over 10% more pattern occurrences were abstracted. This is due to the greater number of pattern occurrences found in the larger body of code, thus making it beneficial to abstract pattern occurrences that may occur only once or twice in a module, but overall many times in the program. This clearly shows the benefits of whole-program optimization: the more code that is visible to the optimizer, the more pattern occurrences it finds, and so the greater the degree of compaction of the code.

                          5 Nodes          10 Nodes         20 Nodes
    File              Funcs   Calls     Funcs   Calls     Funcs   Calls
    add.c                 3      15         3      15         3      15
    code.c                0       0         0       0         0       0
    decode.c              0       0         0       0         0       0
    gsm decode.c          4     111         4     111         4     111
    gsm encode.c          6     110         6     110         6     110
    gsm explode.c         4     111         4     111         4     111
    gsm implode.c         6     110         6     110         6     110
    gsm print.c           4     111         4     111         4     111
    long term.c           7     100         7     100         7     100
    lpc.c                15     187        15     187        15     187
    preprocess.c          1       4         1       4         1       4
    rpe.c                 5      68         6      68         6      68
    short term.c         12     116        13     106        13     106
    COMBINED             32   1,162        32   1,176        33   1,172

Table 7.3: Continuing from Table 7.2, showing for each module the number of abstract functions (beneficial patterns) created and the number of abstract calls (pattern occurrences) inserted into the program.

    Function                          VECC   VECC + MMA   ARM Thumb   GCC Thumb
    code.c::Gsm Coder()                 93       92            91         105
    decode.c::Gsm Decoder()             86       81            66          82
    gsm encode.c::gsm encode()         561      559           430         552
    lpc.c::Reflection coeffs()         193      191           211         234

Table 7.4: Measured behaviour of MMA optimization on benchmark functions. Numbers indi- cate instruction counts; both ARM’s Thumb and GCC’s Thumb compilers were run with space optimization selected.

7.6 MMA Optimization

The SOLVEMMA algorithm performs best on spill-code and global-variable-intensive code, where a significant proportion of the instructions are loads or stores that can be combined into MMA instructions. Of note is that SOLVEMMA degrades gracefully—in the worst case a provisional MMA instruction is decomposed into a corresponding number of load or store instructions.

The results shown in Table 7.4 are for individual functions where there was a reduction in code size. In all other cases, while SOLVEMMA identified many provisional MMA instructions, the limited addressing modes available to the Thumb's MMA instructions require a minimum of three loads or stores to be combined for there to be any benefit in this optimization, replacing them with one address computation and the MMA instruction. Thus all two-way provisional MMA instructions are decomposed back into single loads or stores.

In the first case, Gsm Coder(), the saving was through the use of a multiple push (a special form of STM using the stack pointer as the address register with pre-decrement) to place two function arguments onto the stack prior to a function call.

The remaining cases utilise LDM or STM instructions to reduce code size. Further examination of the generated Thumb code indicates places where provisional MMA instructions have been decomposed into single instructions during register allocation. These suggest improvements to the processor instruction set2 where there may be potential savings in code size.

7.7 Summary

This chapter has shown the effectiveness of the algorithms presented in this thesis on benchmark code typical of smaller, code size critical, embedded systems. The results indicate three main points: that the VSDG is a viable intermediate language for code size optimization research and development; that the simplified procedural abstraction algorithm of Chapter 4 achieves a respectable 18% reduction in code size; and that MMA optimization, while being less effective than the RACM or procedural abstraction algorithms, still offers a worthwhile reduction in code size.

2 Assuming, of course, that we have the luxury of defining or modifying the processor's instruction set.

CHAPTER 8

Future Directions

Prediction is very difficult, especially about the future. NIELS BOHR (1885–1962)

This thesis considers the code size optimization of program code for embedded systems. Its foundation is the Value State Dependence Graph (VSDG), and in this section we consider three future directions of research that the VSDG is eminently suitable for, in addition to the techniques already described.

8.1 Hardware Compilation

The growing complexity of logic circuits (microprocessors, image processors, network routers, etc) is driving the need for increasingly-higher level languages to describe these circuits, using the power of abstraction and ever-more powerful compilers and optimizers to produce circuits better and faster than can be achieved with current tools.

The spectrum of hardware design languages extends from high-level languages, such as Sharp's SAFL [85] and SAFL+ [101] based on a functional programming style, Frankau's SASL [41] extending the functional style to include lazy streams, and the family of synchronous declarative languages, such as the control-flow oriented Esterel [17], and the data-flow oriented Lustre [24].

Other languages present a more familiar C-like programming paradigm; for example, C++-based SystemC [73], the SPARK [52] project, and OCCAM-like Handel-C [87], as well as the industry-standard VHDL and Verilog, both of which have been standardised by the IEEE (IEEE Std 1076 and IEEE Std 1364 respectively).

During the VSDG's development it has often been observed that it has some very interesting properties relating to the generation of hardware. Almost all of the VSDG's nodes have a close mapping to a hardware equivalent form:

• γ-nodes naturally map to multiplexers;

• θ-nodes map to Moore state machines, where the body of the loop generates the logic which computes the next-state information, the θhead node generates an input multiplexer, the θtail node generates a set of registers for all loop-variant variables, and the control-region nodes generate a loop controller that determines when to stop the loop iterating and when to generate a “data ready” signal for the following circuits;

• load and store nodes map to memory access cycles over a shared memory system;

• arithmetic and predicate nodes generate combinatorial logic.

Sharing of resources and scheduling function calls could be managed through the use of, say, SAFL as the target language.

Interestingly, most of the components required to translate C source code to Verilog hardware description language already exist: a C-to-VSDG compiler, a set of VSDG optimizers, and a functional hardware compiler (SAFL). The remaining component, the VSDG-to-SAFL code generator, is all that is needed to explore the use of the VSDG as a potential intermediate language for hardware compilation.

8.2 VLIW and SuperScalar Optimizations

Many high-performance embedded processors and DSPs are based on multi-issue very long instruction word (VLIW) architectures. VLIW processors have multiple processor cores running in parallel. Examples include the Philips Trimedia PNX1500 (5-way issue), and Texas Instruments' C64x DSP family (8-way issue).

A difficulty with such processors is identifying parallelism from legacy source code (something that Fortran compiler writers have been solving for many years). Previous program graphs have over-specified the ordering of instructions within a program, so any effort to identify sets of parallelizable instructions first has to remove the inessential relations between instructions, and then discover groups of instructions that can be executed in parallel, the goal being to fill all the instruction slots within the target code with useful instructions, rather than simply padding with no-ops.

The VSDG seems to offer an elegant solution to this problem. By under-specifying the relationships between instructions (maintaining only the essential I/O semantics) scheduling instructions for parallel execution is a modification of the RACM algorithm, with a modified constraint on the number of (and perhaps the type of) instructions in each cut. Then each cut becomes a single VLIW instruction, with empty slots filled with no-ops.

The recent introduction of the restrict keyword to the C language [22] can enable two sequential stores (e.g., “st a,[r]; st b,[s]”) to be considered randomly reorderable if r and s are marked as restricted pointers, i.e., that the programmer guarantees that they will never alias. For a VLIW target we can place both stores in the same instruction, gaining both execution speed and reduced code size; even more so if r and s are adjacent words in memory (Section 5).

8.2.1 Very Long Instruction Word Architectures

A variation on VLIW is Single Instruction Multiple Data (SIMD) architectures, such as the Motorola AltiVec vector co-processor (Appendix B). This can perform an arithmetic operation on sixteen 8-bit variables, eight 16-bit variables, or four 32-bit (float or long) variables in parallel. Aimed at multimedia data processing applications (audio and video processing, 3D image rendering, desktop publishing, etc), access to these instructions is usually through vendor-supplied hand-written libraries, rather than from the compiler itself, and requiring special data types to guarantee correct alignment of variables in memory.

Again, the VSDG provides an intermediate language where any parallelism in a program can be readily identified. In the case of SIMD the goal is to find or generate cuts where all of the ALU operations are the same (e.g., add) and the data types are the same (e.g., int).

8.2.2 SIMD Within A Register

A generally applicable variation of SIMD is SWAR—SIMD Within A Register [40]. The goal is to combine several similar operations into one single common operation, fitting the operands into a single physical register, and executing a single instruction.

Polymorphic SWAR instructions, such as bitwise AND, OR and NOT, are easily implemented. Other SWAR instructions (ADD, SUBTRACT, etc) require some additional effort to handle over- or under-flow between fields in the physical register. One software solution is to separate the data fields with guard bits, ensuring that any carry-over from one addition will not affect another addition; for example three 8-bit additions within a single 32-bit register. Some hardware solutions already support this feature, with special instructions like “add.8”, that do not propagate the carry bit between specific bits in the ALU (e.g., from bit 7 to bit 8 in the case of an 8-bit addition).

Further benefit can be achieved if the data is stored in memory in prepacked form, reducing the need to pack and unpack them at runtime. It should be possible to extend the MMA algorithm (Chapter 5) to consider variables of sizes other than int and thus to prepack these narrower variables in an order favourable to SIMD or SWAR instruction generation.
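The guard-bit idea can be demonstrated in portable C: three 8-bit fields are packed into one 32-bit word with a spare bit between fields, so a carry out of one addition cannot reach its neighbour. This is purely a software illustration of the technique described above, not code generated by VECC.

    #include <stdint.h>
    #include <stdio.h>

    /* Pack three 8-bit fields into 9-bit slots: bit 8 of each slot is a
     * guard bit that absorbs any carry out of the field below it. */
    static uint32_t pack3(uint8_t a, uint8_t b, uint8_t c)
    {
        return (uint32_t)a | ((uint32_t)b << 9) | ((uint32_t)c << 18);
    }

    static uint8_t field(uint32_t w, int i)     /* i = 0, 1, 2 */
    {
        return (uint8_t)(w >> (9 * i));         /* low 8 bits of slot i */
    }

    int main(void)
    {
        uint32_t x = pack3(200, 7, 0xFF);
        uint32_t y = pack3(100, 8, 0x01);
        uint32_t s = x + y;                     /* one add, three 8-bit sums */
        /* Prints 44 15 0: each field wraps modulo 256 without disturbing
         * its neighbours. */
        printf("%d %d %d\n",
               (int)field(s, 0), (int)field(s, 1), (int)field(s, 2));
        return 0;
    }

With prepacked data the pack and unpack steps disappear and only the single add remains, which is the saving that the proposed MMA extension aims for.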

8.3 Instruction Set Design

Another area to which the VSDG could be applied is that of processor instruction set design. When designing a new processor, there are many choices about which instructions are worth including in the instruction set, and which either never occur at all (perhaps due to limitations of a compiler), or occur so infrequently in benchmark code that the cost of including them in the instruction set is greater than emulating the functionality in a software library.

On the VSDG, the procedural abstraction optimizer (Chapter 4) could be used. The approach might take the following steps:

• Start with a simple RISC machine;

• Compile up lots of benchmark code into VSDGs;

• Analyse the programs with the Procedural Abstraction optimizer;

• Identify the most used abstract functions;

• Create new custom VSDG nodes which implement the functionality of these abstract functions;

• Repeat the above, refining the instruction set until some pre-defined goal is reached (e.g., code size).

The custom VSDG nodes then describe the additional instructions that could be added to the processor's instruction set.

CHAPTER 9

Conclusion

Every day you may make progress. Every step may be fruitful. Yet there will stretch out before you an ever-lengthening, ever-ascending, ever-improving path. You know you will never get to the end of the journey. But this, so far from discouraging, only adds to the joy and glory of the climb. SIR WINSTON CHURCHILL (1874–1965)

Computers—in the form of embedded systems—are everywhere. And just as they grow in numbers, they shrink in size. This thesis has studied the problem of reducing code size from within the compiler, benefiting both legacy applications ported to new systems, and new applications made possible through reduced code size.

Previous work in optimizing code for size has been influenced by the choice of the intermediate representation—the more powerful the representation, the more powerful the optimizations that can be applied to it. Many such intermediate representations have been developed: from the original Control Flow Graph and the Data Flow Graph, Ferrante et al's Program Dependence Graph, Ballance et al's Program Dependence Web and Click's Intermediate Representation, to the Value Dependence Graph of Weise et al. Developments in the Static Single Assignment and Gated Single Assignment forms have augmented these intermediate representations, simplifying many optimizations (of which there are many), and making others viable within a practical compiler.

The first contribution of this thesis is the Value State Dependence Graph (VSDG) as an intermediate form for compilers. Primarily developed for compiling for small code size, it can equally be applied to producing fast code, or parallel code. The VSDG has the property of partially specifying (through essential value and state dependencies) the order of instructions within the program. Subsequent optimization phases are at liberty to transform the graph through graph rewriting rules. Code generation from the VSDG applies sufficient serializing edges to the VSDG to construct a single CFG, from which code is generated.

The next contribution showed how procedural abstraction can be applied to the VSDG. Code patterns are generated from the VSDGs of a program, and stored in a database. Transformation then selects patterns with the greatest benefit—from these patterns new compiler-generated abstract functions are created, and all occurrences of the patterns in the original VSDG are replaced by calls to the abstract functions.

Several popular embedded processors have load-multiple and store-multiple instructions. On first inspection, these would seem to be ideal instructions to use in generating compact code—representing multiple (perhaps as many as thirty two) loads or stores in a single instruction. However, it seems that most commercial compilers only combine multiple loads or stores opportunistically during peephole optimization. This makes such optimization subject to unfortunate instruction schedules and register assignments. As a first attempt at tackling this problem, we presented a new method specifically targeted at using these multiple-memory-access instructions, combining loads and stores within the VSDG before target code generation, in a way that can be undone during later stages to avoid spilling or code duplication.

The final contribution of this thesis is a combined approach to register allocation and code motion (RACM). These two separate phases have been shown to be antagonistic towards each other: register allocation can artificially constrain instruction scheduling, while the instruction scheduler can produce a schedule that forces a poor register allocation, perhaps even generating spill code.
We have shown that our RACM algorithm formulates these two previously antagonistic phases as one combined pass over the VSDG, transforming nodes (moving, cloning, or spilling) as directed by the register pressure measured from the graph. Additionally, preferences for instruction schedules (e.g., separating loads and stores to minimize load/store dependencies in pipelined architectures) can be factored into the algorithm.

The ideas presented in this thesis have been implemented within a prototype compiler. Our C front end is based on Fraser and Hanson's LCC compiler, modified to generate VSDGs from C source. All other transformation and analysis tools have been implemented as stand-alone command line utilities reading, transforming, and writing VSDGs as text files. Our results show that, even though we implement only a subset of the optimizations described in the literature, we can achieve code sizes comparable to, and in some cases better than, those produced by commercial compilers.

Finally, the ideas in this thesis can be extended both within the original problem domain—compacting code for embedded systems—and in other areas, from hardware description languages, through parallelizing code for VLIW or SIMD processors (and software emulation of SIMD), to instruction set design.

APPENDIX A

Concrete Syntax for the VSDG

The Value State Dependence Graph data structure, described in Chapter 3, formally represents the VSDG within the context of a software tool. Here we describe, in a format suitable for a grammar generator tool such as YACC [71], the structure of the VSDG syntax. This syntax serves two purposes: it provides a clean, human-readable, portable file format to support the development of analysis and transformation tools, and it clearly defines the structure of VSDGs, from the whole-program level down to the individual components of a VSDG (modules, functions, etc).

A.1 File Structure

A VSDG description file is a two-level description of a VSDG. The upper level defines modules, and the lower level defines the functions within each module.

Description files by themselves do not define any scope—they provide storage for module definitions. There is no difference between a single file containing multiple modules, and multiple files containing single modules. This approach has the benefit of allowing textual concatenation of VSDG files to combine multiple modules. VSDG transformation tools can thus read in one or more VSDG files (e.g., programmer-generated VSDG files and library VSDG files) and treat them as a single, large, application file.

Modules contain data definitions, constant definitions, and function definitions. Names of objects defined within a module are only visible within the enclosing module. This can be overridden with the public attribute.

Data definitions define both initialised and uninitialised statically-allocated variables. The parameter list for a data definition (Section A.4.3) specifies the size, in bytes, of the data object, and optionally a contiguous ordered list of byte-sized and byte-aligned initial values. Constant definitions are of the same format as initialised data definitions.

Function definitions take a function name, an argument name list (which is empty for void functions), the number of return values, and a body. A function's body consists of an unordered list of node definitions, edge definitions, and data and constant definitions. The data and constant definitions are exactly the same as described above for module-level data and constant definitions, but with function-level scope.

example.c:

    int globalVar;

    int foo(int a, int b)
    {
        return a + b;
    }

example.vsdg:

    module example.c {
        data _globalVar [size=4];
        public function foo (a,b), 1 {
            node node0 [op=return];
            node node1 [op=add];
            edge node1:L -> foo:a [type=value];
            edge node1:R -> foo:b [type=value];
            edge node0:ret1 -> node1 [type=value];
            edge node0:STATE -> foo:STATE [type=state];
            edge foo:STATE -> node0 [type=state];
        }
    }

Figure A.1: Illustrating the VSDG description file hierarchy. Top is the C source which produces the VSDG file shown (bottom). The compiler has added the preceding underscore to the global variable globalVar as part of its naming convention. The compiler has also marked the function as public, the default for C.

Node definitions describe nodes within a function. A node has both a name and a set of parameters describing properties of the node (Section A.4.1).

Edge definitions describe dependencies between nodes in the same function (edges are not allowed to cross function or module boundaries). An edge definition names the node ports at the head and tail of the edge, and specifies parameters of the edge such as its type (Section A.4.2). By default, edges are value edges.
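To illustrate the memory definitions further, an initialised module-level data object and a constant might be written as follows (the names and values here are purely illustrative; only the size and init parameters are defined by the syntax, see Section A.4.3):

    data lookup [size=4, init={1, 2, 4, 8}];
    const mask [size=2, init={255, 15}];

Since each init value is byte-sized, a definition of size n would be expected to carry at most n initial values.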

A.2 Visibility of Names

The scoping rules follow the same hierarchical structure as the C language:

• Names defined within a module definition and having the public qualifier have global scope.

• All other names defined within a module definition have module scope and are visible only within the enclosing module.

• Names defined within a function have function scope and are visible only within the enclosing function.

A.3 Grammar

The grammar is described in the following pages in a form suitable for processing with YACC [71]. This is followed by definitions of the terminal symbols, which can be similarly turned into pattern definitions suitable for a lexical analysis tool such as LEX [71]. The following table lists the font styles and their meanings used in the syntax:

Style               Meaning
typewriter          literal terminal symbols (keywords, punctuation, etc)
bold italicized     other terminal symbols (literals, constants, etc)
italicized          non-terminals

Note: the suffix "opt" indicates optional symbols.

A.3.1 Non-Terminal Rules

program:
    module
    program module

module:
    module identifieropt { module-body }

module-body:
    publicopt module-item
    publicopt module-item module-body

module-item:
    data-definition
    const-definition
    function identifier ( argument-listopt ) num-retsopt { function-body }

data-definition:
    data identifier parameters ;

const-definition:
    const identifier parameters ;

argument-list:
    identifier
    identifier , argument-list

num-rets:
    , integer-constant

function-body:
    function-item
    function-item function-body

function-item:
    node identifier parameters ;
    edge identifier port-nameopt tuple-nameopt -> identifier port-nameopt tuple-nameopt parametersopt ;
    data-definition
    const-definition

port-name:
    : identifier

tuple-name:
    < identifier >

parameters:
    [ parameter-list ]

parameter-list:
    parameter
    parameter , parameter-list

parameter:
    identifier = identifier
    identifier = integer-constant
    identifier = float-constant
    identifier = { value-list }

value-list:
    value
    value , value-list

value:
    identifier
    integer-constant
    float-constant
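To relate these rules to a concrete description, consider the edge definition from Figure A.1:

    edge node1:L -> foo:a [type=value];

Read against the edge alternative of function-item above, node1 and foo are the two identifiers, :L and :a are their port-names, no tuple-names are present, and [type=value] is the optional parameters list containing the single parameter type = value.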

A.3.2 Terminal Rules

The structure of the terminals identifier, integer-constant and float-constant follows that of the C programming language [22] with minor modifications.

A.3.2.1 Identifiers

identifier:
    nondigit
    identifier nondigit
    identifier digit

nondigit: one of
    _ $ a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

digit: one of 0 1 2 3 4 5 6 7 8 9

A.3.2.2 Integer constants

The terminal integer-constant is as specified in the C standard [22].

A.3.2.3 Floating constants

The terminal float-constant is as specified in the C standard [22].

A.4 Parameters

The grammar defines the syntactic structure of VSDG descriptions. Nodes, edges, and memory definitions (data and const) take additional information in the form of a list of parameters. The following three sections describe the parameters for nodes, edges, and memory definitions.

A.4.1 Node Parameters

Nodes take one required parameter, op; constant nodes additionally take the value of that node.

op:
    consti constf constid        constant
    ld st                        plain load/store
    vld vst                      volatile load/store
    tld tst                      temporary load/store
    neg add sub mul div mod      signed arithmetic
    udiv umod                    unsigned arithmetic
    lsh rsh ursh                 arithmetic shift
    not and or xor               bitwise arithmetic
    fadd fsub fmul fdiv fneg     floating-point
    call                         function call
    eq ne gt gte lt lte          conditional test
    gamma theta                  selection and loop
    break cont                   loop branches
    return                       function return

value:
    identifier          if op = constid
    integer-constant    if op = consti
    float-constant      if op = constf

Other parameters, such as code generation hints or debugging information (e.g., source coordinates), can be added as required and must be preserved during transformation.
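As an illustration (the node names are arbitrary, and we assume the constant's value is supplied through a parameter named value, following the value production above), an integer constant node and an add node might be written as:

    node c42 [op=consti, value=42];
    node sum [op=add];

The add node would then receive its operands through value edges, as in Figure A.1.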

A.4.2 Edge Parameters

Edges are value edges by default, but can also be state or serial edges.

type:
    value
    state
    serial

Additionally, one can specify a target register number as an integer constant, but this may be ignored.

reg:
    integer-constant
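For example (with illustrative node names), a state edge and a value edge carrying a register hint might be written as:

    edge node3 -> node2 [type=state];
    edge node5:L -> node4 [reg=3];

The second edge is a value edge because type defaults to value when omitted; the reg parameter is only a hint and may be ignored.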

A.4.3 Memory Parameters

Both data and const memory definitions require the size parameter, and an optional init initial-value list for data definitions.

size:
    integer-constant

init:
    value-list

APPENDIX B

Survey of MMA Instructions

Many processors have instructions which implement some form of multiple memory access. This appendix summarises the MMA instructions of a variety of popular processors: MIL-STD-1750A, ARM and Thumb, PowerPC, MIPS16, SPARC V9, and the vector-based co-processors now popular in desktop systems—Intel's MMX, Motorola's AltiVec, and Sun's VIS. In all cases the instruction mnemonics are those used in the relevant source material.

B.1 MIL-STD-1750A

The MIL-STD-1750A 16-bit processor [109] is used primarily in the military and aerospace industries, not least because of its availability in radiation-hardened form. It has sixteen 16-bit registers, with a 64k word address space for program code, data and stack.

The 1750A has four MMA instructions. The first two are load multiple LM and store multiple STM. The two addressing modes for these instructions are immediate (constant address) and register-plus-offset; the address counter is incremented after each access (there is no register write-back as on the ARM). All register transfers start at register R0 and stop at the end register specified in the instruction.

The other two MMA instructions are PUSH and POP, for transferring data between registers and the stack. Both instructions implicitly use R15 as the Stack Pointer. The instruction specifies the lowest and highest registers to be transferred. Interestingly, if the end register number is lower than the start register number (e.g., PUSH R14,R2), then the processor proceeds up to R15, wraps round to R0, and continues up to the end register.

B.2 ARM

The ARM [97] load-multiple and store-multiple instructions are very capable, able to load or store a non-contiguous subset of all sixteen registers (including the Stack Pointer and the Program Counter) with four addressing modes. While the ARM MMA instructions lack the register-plus-displacement addressing modes of the other architectures described in this appendix, this is partly compensated for by a choice of four address register modes, and an optional write-back of the final address to the address register.

The syntax of the load-multiple instruction¹ is:

LDM{<cond>}<addr_mode> Rn{!}, <registers>{^}

with optional items enclosed in '{ }'. The five fields of this instruction are:

1. The '<cond>' field indicates the condition which must be met for the instruction to be executed. The default is "execute always".

2. The '<addr_mode>' field specifies one of four modes:

IA    increment after (post-increment)
IB    increment before (pre-increment)
DA    decrement after (post-decrement)
DB    decrement before (pre-decrement)

3. The optional ‘!’ indicates a write-back to the address register Rn upon completion of the instruction.

4. The '<registers>' field specifies the set of registers to be loaded, encoded as a bit pattern within the instruction. The bit pattern is scanned from the lowest bit (corresponding to register r0) to the highest bit (r15), with a load executed if the corresponding bit is set.

5. Finally, the optional '^' indicates that the User mode registers are the destination (for when the processor is in a different operating mode, such as within an interrupt handler or pre-emptive scheduler).

The <registers> field allows for non-contiguous register sets to be loaded anywhere within the register block (e.g., "r4-r5,r9,r12-r14"). This offers a greater degree of flexibility over the PowerPC, which demands a contiguous register block beginning at a specified register and ending at register r31.

A final point to note is that since the Program Counter (PC) is one of the sixteen registers (it is a pseudonym for register r15) it too can be the destination (or source) of an MMA instruction. For a load-multiple, the effect is to jump to the address loaded into the PC. The load-multiple instruction is very effective in function epilogue code, both restoring callee-saved registers and branching back to the caller in a single instruction.
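For instance, a typical ARM function epilogue (an illustrative example, not taken from the compiler described in this thesis) restores the callee-saved registers and returns in a single instruction:

    LDMIA   SP!, {r4-r7, PC}

This pops r4–r7 and the saved return address from a full descending stack, writes the final address back to SP, and loads the PC, so the one instruction acts as both register restore and function return.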

B.3 Thumb

The Thumb (the compact instruction set version of the ARM) is restricted in its MMA instructions: only the first eight registers can be loaded or stored; all MMA instructions do a write-back to the address register; there is no condition code field; and there is no user mode flag.

¹ The syntax and operation of the store-multiple instruction are very similar, and the reader should have little difficulty relating the operation of the store-multiple instruction to that of the load-multiple described here.

The four MMA instructions of the Thumb have names which clearly identify their intended purposes, which are reflected in the choices made by the designers to fit the instructions into 16 bits.

The LDMIA and STMIA instructions map directly onto the equivalent ARM instructions, with write-back to the address register; both the address register and the register list can only specify registers in the range r0–r7. These two instructions are primarily aimed at block copy operations, allowing up to eight words of data to be moved per instruction.

The PUSH and POP instructions are specifically intended for function prologue and epilogue code. Both instructions implicitly use the Stack Pointer (register r13) as the address register (with write-back), and the addressing modes support a downwards-growing stack (PUSH is equivalent to STMDB SP!,<registers>, and POP is equivalent to LDMIA SP!,<registers>).

In addition to the eight Thumb registers, PUSH can optionally store the Link Register (register r14), and POP can optionally load the Program Counter (register r15). These are clearly designed to produce compact function prologue and epilogue code: on entry to a function PUSH saves the callee-saved registers and the return address (in the Link Register) on the stack, and on exit the POP instruction both restores the callee-saved registers and loads the Program Counter with the return address, with the effect of a return instruction.

The following example shows both the use of PUSH and POP, and allocating space on the downwards-growing stack for local variables:

    PROC foo
        push    {r4-r7,LR}
        sub     SP, #...        ; reserve space for locals

        ... procedure body ...

        add     SP, #...        ; restore stack pointer
        pop     {r4-r7,PC}
    ENDPROC

B.4 MIPS16

The MIPS16 [62, 2] MMA instructions, SAVE and RESTORE, are specifically designed for function prologue and epilogue code. They can load or store a limited set of non-contiguous registers, but only those specified in the MIPS application binary interface, and only to or from the stack (the Stack Pointer is implied). There are two versions of both instructions: 'normal' 16-bit instructions, which can save or restore the return address and registers GPR16 and GPR17, and adjust the stack pointer by a specific number of words (up to 128 bytes); and 'extended' 32-bit instructions, which additionally can save or restore specific combinations of GPR[4-7] and GPR[18-23,30], and adjust the stack frame by up to 64kB.

B.5 PowerPC

The PowerPC [82] has two classes of MMA instructions—Load/Store Multiple Word (lmw and stmw respectively) for handling word-sized and word-aligned data, and Load/Store String Word (lsw and stsw) for transferring unaligned blocks of bytes.

Both lmw and stmw transfer all registers from the start register up to, and including, r31². Both instructions use only the register-plus-offset addressing mode (written as "d(rA)" in the assembler mnemonic); the offset can be 0.

The string instructions lsw and stsw transfer at most 128 bytes (32 words). The sequence of registers wraps round through r0 if required, and successive bytes are transferred in left-to-right order for each register in turn. These instructions have two addressing modes: immediate (register addressing) and indexed (base register plus offset register). In the former, the number of bytes transferred is specified as a constant within the instruction; in the latter, the number of bytes transferred is read from the XER register.
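As an illustrative example (not drawn from the thesis), the single instruction

    lmw   r29, 8(r1)

loads registers r29, r30 and r31 from three consecutive words starting at address r1 + 8; the corresponding stmw stores the same register range.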

B.6 SPARC V9

The SPARC V9 [110] processor architecture defines two multiple-word memory access instructions: LDD and STD. They are very limited in their application, transferring the lower 32 bits of exactly two registers to or from memory.

Both instructions are deprecated in the V9 architecture, and are maintained only for backwards compatibility with the earlier V8 architecture. While there are no replacement MMA instructions, the recommended approach is to use the LDX/STX extended instructions to transfer 64 bits between a single register and memory.

B.7 Vector Co-Processors

Vector co-processors are now widely adopted in many 32-bit and larger microprocessors. While they enjoy differing acronyms, they all share the common single-instruction-multiple-data approach to improving the speed of multimedia-type data processing code.

The Intel MMX [57], the PowerPC AltiVec [81] and Sun's VIS [104] all include MMA-like multiple loads and stores. However, these instructions are limited to fixed-size (typically 64- or 128-bit) memory block loads and stores to special data, or vector, registers, and not to general purpose processor registers. This restriction limits the use of these MMA instructions to specific data processing code, not to general purpose code (e.g., spill code).

² It is interesting to note that the IBM S/360 processor had a similar instruction, but additionally specified the end register.

APPENDIX C

VSDG Tool Chain Reference

The VECC framework (Figure C.1) provides a complete toolset, including a C compiler, a variety of optimizers and analysis tools, and a Thumb target code generator.

During processing, all intermediate files are stored as text files, either as files written to and read from disk, or as Unix pipes. This allows experimentation with different optimizers and optimization strategies through the use of shell scripts, simple textual analysis of the intermediate files (such as comments against specific instructions), and inspection and modification of intermediate files during testing and development. The format of the intermediate files is specified in Appendix A.

In addition to the command line flags described below, all of the VSDG tools may be invoked with a -v ("verbose") flag. This generates additional debugging and progress information to the standard error file¹.

C.1 C Compiler

The experimental C compiler is based on Fraser and Hanson's LCC compiler [42]. Our modifications to the compiler are described in detail in Chapter 3. The compiler is invoked as

    mlcc -S infile -o outfile

¹ Note that for non-trivial programs the amount of debugging information can be very large.

Figure C.1: Block diagram of the VECC framework, showing C compiler, standard libraries (pre-compiled source), optimizer sequence (dependent on experiment) and target code generator. (Reproduced from Chapter 1)

The "-S" flag specifies only the preprocessor and compiler proper (the other parts of the LCC compiler are not used in this project). If outfile is "-" then the output of the compiler is redirected to the standard output. This is useful for piping the output of the compiler through, say, the classical optimizer:

mlcc -S foo.c -o - | vsdgopt -O D+ -o foo.vsdg

C.2 Classical Optimizer

The classical optimizer provides a framework for up to 26 individual optimizers (Chapter 3), each identified with a letter 'A-Z'. An optimization command string is supplied as a command line argument specifying which optimizations to apply, in what order, and whether a fixed-point should be reached before completion of any given pass (including nested passes).

The optimizer tool has the following command line invocation:

    vsdgopt flags [infiles...] [-o outfile]

If no infiles are specified the tool reads from the standard input. Similarly, if no outfile is specified the tool sends its output to the standard output.

Command Line Flags

-O sequence The string sequence specifies the sequence of optimization passes to be applied to the input.

The optimization sequence string consists of the letters ‘A-Z’, to invoke each of the 26 possible optimizations, the ‘+’ operator to mark a pass that should be repeated until no further changes (fixed point) have been made to the program, and ‘[...]’ to indicate substrings (which may also be specified with the fixed point operator). For example, the command string

AB+[C[D+E]+]+

applies optimization A, then B repeatedly until fixed point, then the substring "C[D+E]+" until fixed point. The substring applies C, then the substring "D+E" repeatedly until fixed point. This substring, in turn, applies D repeatedly until fixed point, and then a single application of E.
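Putting the invocation and the sequence string together, one plausible command line (the file names are illustrative) would be:

    vsdgopt -O "AB+[C[D+E]+]+" foo.vsdg -o foo_opt.vsdg

The quotes stop the shell from interpreting the '[' and ']' characters; the tool reads foo.vsdg, applies the sequence described above, and writes the optimized VSDG to foo_opt.vsdg.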

C.3 Procedural Abstraction Optimizer

The procedural abstraction optimizer tool applies procedural abstraction optimization (Chapter 4) to a VSDG. The procedural abstraction tool has the following command line invocation:

    vsdgabs flags [infiles...] [-o outfile]

If no infiles are specified the tool reads from the standard input. Similarly, if no outfile is specified the tool sends all output to the standard output.

Command Line Flags

--maxsize n      n specifies the maximum number of nodes in a pattern (the default value is 6).

--showtable      Dump the pattern table after it has finished processing the input. The table is sent on the standard error output, which can be redirected to a file for later analysis.

C.4 MMA Optimizer

MMA optimization (Chapter 5) is performed by the MMA optimizer tool. It has the following command line invocation:

    vsdgmma flags [infiles...] [-o outfile]

If no infiles are specified the tool reads from the standard input. Similarly, if no outfile is specified the tool sends all output to the standard output.

In addition to the output file, the MMA optimizer generates a file, ag.dot, which can be post-processed by dot [67] to view the access graph constructed during analysis.

Command Line Flags

--maxtuples n      n specifies the maximum number of loads or stores that may be combined into a single MMA instruction (the default value is 6).

--showalledges     Show all edges in the access graph, not just the covered ones.

--portrait         Specify portrait page orientation of the access graph. This is the default.

--landscape        Specify landscape page orientation of the access graph.
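For example, the generated access graph can then be rendered to PostScript with dot (the output file name is illustrative):

    dot -Tps ag.dot -o ag.ps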

C.5 Register Allocator and Code Scheduler

The register allocation and code scheduling tool applies the RACM algorithm (Chapter 6) to a VSDG. The RACM tool has the following command line invocation:

    vsdgracm flags [infiles...] [-o outfile]

If no infiles are specified the tool reads from the standard input. Similarly, if no outfile is specified the tool sends all output to the standard output.

Command Line Flags

-r n      n specifies the maximum number of registers available to the algorithm (the default value is 8).

C.6 Thumb Code Generator

The Thumb target code generator produces a Thumb assembler file from the supplied VSDG source files. The code generator tool has the following command line invocation:

    vsdg2thm flags [infiles...] [-o outfile]

If no infiles are specified the tool reads from the standard input. Similarly, if no outfile is specified the tool sends all output to the standard output.

Command Line Flags

--dumpvsdg Dumps the scheduled and register-coloured VSDG to the standard output, which can be captured and processed with the vsdg2dot tool.
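Because every tool reads the standard input and writes the standard output by default, the whole chain can be driven from a single shell pipeline. One plausible example (the file names are illustrative, and only a subset of the optimizers is shown) is:

    mlcc -S foo.c -o - | vsdgopt -O D+ | vsdgracm -r 8 | vsdg2thm -o foo.s

which compiles foo.c to a VSDG, applies one classical optimization to a fixed point, performs register allocation and code motion for eight registers, and emits Thumb assembler to foo.s.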

C.7 Graphical Output Generator

As an aid to debugging and presenting VSDGs, the graphical output generator reads in VSDGs and emits a layout file suitable for the dot graph drawing tool [67]. The graphical generator tool has the following command line invocation:

    vsdg2dot flags [infiles...] [-o outfile]

If no infiles are specified the tool reads from the standard input. Similarly, if no outfile is specified the tool sends all output to the standard output.

Command Line Flags

--noserial     Turns off the plotting of serialising state-dependency edges.

--showcut      Forces the layout of nodes to reflect the depth from root as computed from the graph, overriding the node placement algorithm of dot.

The two-level module/function hierarchy is displayed, together with data and const memory objects. There are many examples of the output of this tool in this thesis (e.g., Figure 4.1, page 86).
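A typical use (file names are illustrative) is to convert a VSDG to dot format and then render it, for example as PostScript:

    vsdg2dot foo.vsdg -o foo.dot
    dot -Tps foo.dot -o foo.ps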

C.8 Statistical Analyser

The statistical analyser prints the total number of nodes, edges, arguments and return values for each function, together with module-by-module node and edge totals, and overall node and edge totals. The statistical analyser has the following command line invocation:

    vsdgstat flags [infiles...]

If no infiles are specified the tool reads from the standard input.

Command Line Flags

--summary      Prints only the overall totals for nodes and edges.

Bibliography

[1] i486 Processor Programmer’s Reference Manual. Intel Corp./Osborne McGraw-Hill, San Francisco, CA, 1990.

[2] MIPS32 Architecture for Programmers Volume IV-a: The MIPS16e Application-Specific Extension to the MIPS32 Architecture. MIPS Technologies, Inc, Mountain View, CA, 2003.

[3] ABSINT ANGEWANDTE INFORMATIK GMBH. aiPop: Code Compaction for C16x/ST10. http://www.AbsInt.com.

[4] ACE ASSOCIATED COMPUTER EXPERTS BV. Product details available from http://www.ace.nl/products/cosy.htm.

[5] AHO, A. V., HOPCROFT, J. E., AND ULLMAN, J. D. The Design and Analysis of Computer Algorithms. Addison Wesley, 1974.

[6] AHO, A. V., SETHI, R., AND ULLMAN, J. D. Compilers: Principles, Techniques and Tools. Addison Wesley, 1986.

[7] ALPERN, B., WEGMAN, M. N., AND ZADECK, F. K. Detecting Equality of Variables in Programs. In Proc. Conf. Principles of Programming Languages (January 1988), vol. 10, ACM Press, pp. 1–11.

[8] APPEL, A. W. Compiling with Continuations. Cambridge University Press, Cambridge, UK, 1992.

[9] ARAUJO, G., CENTODUCATTE, P., CORTES, M., AND PANNAIN, R. Code Compression Based on Operand Factorization. In Proc. of 31st Annual Intl. Symp. on Microarchitecture (MICRO'98) (December 1998), pp. 194–201.

[10] ASSMANN, U. Graph Rewrite Systems for Program Optimization. ACM Trans. Programming Languages and Systems 22, 4 (July 2000), 583–637.

[11] BACON, D. F., GRAHAM, S. L., AND SHARP, O. J. Compiler Transformations for High-Performance Computing. Tech. rep., University of California, Berkeley, CA, 1993.

[12] BAKER, B. S. A Theory of Parameterized Pattern Matching: Algorithms and Applications (Extended Abstract). In Proc. of the 25th Annual Symp. on Theory of Computing (San Diego, California, 1993), ACM Press, pp. 71–80.

[13] BAKER, B. S. Parameterized Pattern Matching by Boyer-Moore-type Algorithms. In Proc. of the 6th ACM-SIAM Annual Symp. on Discrete Algorithms (San Francisco, California, 1995), ACM Press, pp. 541–550.

[14] BALLANCE, R. A., MACCABE, A. B., AND OTTENSTEIN, K. J. The Program Dependence Web: A Representation Supporting Control-, Data-, and Demand-Driven Interpretation of Imperative Languages. In Proc. ACM SIGPLAN Conf. on Prog. Lang. Design and Implementation (PLDI'90) (June 1990), ACM Press, pp. 257–271.

[15] BARTLEY, D. H. Optimizing Stack Frame Accesses for Processors with Restricted Addressing Modes. Software—Practice and Experience 22, 2 (February 1992), 101–110.

[16] BAXTER, W., AND HENRY R. BAUER, III. The Program Dependence Graph and Vectorization. In Proc. 16th ACM Symp. on the Principles of Programming Languages (POPL'89) (Austin, TX, January 1989), pp. 1–11.

[17] BERRY, G., AND GONTHIER, G. The Esterel Synchronous Programming Language: Design, Semantics, Implementation. Science of Computer Programming 19, 2 (1992), 87–152.

[18] BERSON, D. A., GUPTA, R., AND SOFFA, M. L. Resource Spackling: A Framework for Integrating Register Allocation in Local and Global Schedulers. Tech. rep., Dept. Computer Science, University of Pittsburgh, February 1994.

[19] BRANDIS, M. M., AND MÖSSENBÖCK, H.-P. Single-Pass Generation of Static Single Assignment Form for Structured Languages. ACM Trans. Programming Languages and Systems 16, 6 (November 1994), 1684–1698.

[20] BRASIER, T. S., SWEANY, P. H., BEATY, S. J., AND CARR, S. CRAIG: A Practical Framework for Combining Instruction Scheduling and Register Assignment. In Proc. Intl. Conf. Parallel Architectures and Compilation Techniques (PACT'95) (Limassol, Cyprus, June 1995), V. Malyshkin, Ed., vol. 964 of Lecture Notes in Computer Science, Springer-Verlag.

[21] BRIGGS, P., COOPER, K. D., AND TORCZON, L. Rematerialization. In Proc. ACM SIGPLAN Conf. on Prog. Lang. Design and Implementation (PLDI'92) (New York, NY, 1992), vol. 27, ACM Press, pp. 311–321.

[22] BRITISH STANDARDS INSTITUTE. The C Standard. John Wiley & Sons Ltd, 2003.

[23] CARTWRIGHT, R., AND FELLEISEN, M. The Semantics of Program Dependence. In Proceedings of the ACM SIGPLAN’89 Conference on Programming Language Design and Implementation (Portland, OR, June 1989), vol. 24, pp. 13–27.

[24] CASPI, P., PILAUD, D., HALBWACHS, N., AND PLAICE, J. Lustre: A Declarative Language for Programming Synchronous Systems. In Proc. 14th ACM Conf. on Principles of Programming Languages (POPL'87) (Munich, Germany, January 1987), ACM Press.

[25] CHAITIN, G. Register Allocation and Spilling via Graph Coloring. ACM SIGPLAN Notices 17, 6 (June 1982), 98–105.

[26] CHOW, F. C., AND HENNESSY, J. L. The Priority-Based Coloring Approach to Register Allocation. ACM Trans. Prog. Lang. and Syst. 12, 4 (October 1990), 501–536.

[27] CLICK, C., AND PALECZNY, M. A Simple Graph-Based Intermediate Language. In Proc. ACM SIGPLAN Workshop on Intermediate Representations (IR'95) (San Francisco, CA, January 1995), ACM Press.

[28] COCKE, J. Global Common Subexpression Elimination. In Proc. ACM Symp. on Compiler Optimization (July 1970), vol. 7 of SIGPLAN Notices, pp. 20–24.

[29] COOPER, K. D., AND MCINTOSH, N. Enhanced Code Compression for Embedded RISC Processors. In Proc. ACM SIGPLAN Conf. on Prog. Lang. Design and Implementation (PLDI'99) (May 1999), vol. 34, ACM Press, pp. 139–149.

[30] CYTRON, R., FERRANTE, J., AND SARKAR, V. Compact representations for control dependence. In Proc. of the ACM SIGPLAN Conf. on Prog. Lang. Design and Implementation (1990), ACM Press, pp. 337–351.

[31] CYTRON, R. K., AND FERRANTE, J. Efficiently computing φ-nodes on-the-fly. ACM Trans. Programming Languages and Systems 17, 3 (May 1995), 487–506.

[32] CYTRON, R. K., FERRANTE, J., ROSEN, B. K., WEGMAN, M. N., AND ZADECK, F. K. An efficient method of computing the static single assignment form. In Proc. ACM Symp. Principles of Programming Languages (POPL’89) (January 1989), ACM Press, pp. 25–35.

[33] CYTRON, R. K., FERRANTE, J., ROSEN, B. K., WEGMAN, M. N., AND ZADECK, F. K. Efficiently Computing the Static Single Assignment Form and the Control Dependence Graph. ACM Trans. Programming Languages and Systems 12, 4 (October 1991), 451–490.

[34] DE BUS, B., KÄSTNER, D., CHANET, D., VAN PUT, L., AND DE SUTTER, B. Post-Pass Compaction Techniques. Comm. ACM 46, 8 (August 2003), 41–46.

[35] DE SUTTER, B., DE BUS, B., AND DE BOSSCHERE, K. Sifting out the Mud: Low Level C++ Code Reuse. In Proc. ACM Conf. Object Oriented Programming, Systems, Languages and Applications (OOPSLA'02) (Seattle, Washington, USA, November 2002), pp. 275–291.

[36] DEBRAY, S. K., EVANS, W., MUTH, R., AND DE SUTTER, B. Compiler Techniques for Code Compaction. ACM Trans. Programming Languages and Systems 22, 2 (March 2000), 378–415.

[37] DUFF, T. Duff’s Device. Webpage summarising Duff’s Device can be found at http://www.lysator.liu.se/c/duffs-device.html.

[38] FARNUM, C. Compiler Support for Floating-point Computation. Software—Practice and Experience 18, 7 (July 1988), 701–709.

[39] FERRANTE, J., OTTENSTEIN, K. J., AND WARREN, J. D. The program dependence graph and its use in optimization. ACM Trans. Prog. Lang. and Syst. 9, 3 (July 1987), 319–349.

[40] FISHER, R. J., AND DIETZ, H. G. Compiling For SIMD Within A Register. In Proc. 11th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC'98) (August 1998), vol. 1656 of LNCS (Springer-Verlag), pp. 290–304.

[41] FRANKAU, S., AND MYCROFT, A. Stream Processing Hardware from Functional Language Specifications. In Proc. 36th Hawaii Intl. Conf. on Systems Sciences (HICSS-36) (Big Island, Hawaii, January 2003), IEEE.

[42] FRASER, C. W., AND HANSON, D. R. A Retargetable C Compiler: Design and Implementation. Benjamin/Cummings, 1995.

[43] FRASER, C. W., MYERS, E. W., AND WENDT, A. L. Analyzing and Compressing Assembly Code. In Proc. ACM SIGPLAN Symp. Compiler Construction (June 1984), vol. 19, ACM Press, pp. 117–121.

[44] FRASER, C. W., AND PROEBSTING, T. A. Custom Instruction Sets for Code Compression. http://research.microsoft.com/~toddpro, October 1995.

[45] GAREY, M., AND JOHNSON, D. The Complexity of Near-Optimal Graph Coloring. J. ACM 23, 1 (January 1976), 43–49.

[46] GAREY, M. R., AND JOHNSON, D. S. Computers and Intractability—A Guide to the Theory of NP-Completeness. W. H. Freeman, New York, 1979.

[47] GIEGERICH, R., AND KURTZ, S. From Ukkonen to McCreight and Weiner: A unifying view of linear-time suffix tree construction. Algorithmica 19 (1997), 331–353.

[48] GIRKAR, M., AND POLYCHRONOPOULOS, C. D. Automatic Extraction of Functional Parallelism from Ordinary Programs. IEEE Trans. Parallel and Distributed Systems 3, 2 (March 1992), 166–178.

[49] GOLDBERG, D. What every computer scientist should know about floating-point arithmetic. In ACM Computing Surveys (1991), ACM Press.

[50] GOODWIN, D. W., AND WILKEN, K. D. Optimal and Near-optimal Global Register Allocation Using 0-1 Integer Programming. Software—Practice and Experience 26, 8 (August 1996), 929–965.

[51] GRUNE, D., BAL, H. E., JACOBS, C. J. H., AND LANGENDOEN, K. G. Modern Compiler Design. John Wiley & Sons Ltd, Chichester, England, 2000.

[52] GUPTA, S., DUTT, N., GUPTA, R., AND NICOLAU, A. SPARK: A High-Level Synthesis Framework For Applying Parallelizing Compiler Transformations. In Proc. Intl Conf. on VLSI Design (January 2003).

[53] HAVLAK, P. Construction of thinned gated single-assignment form. In Proc. Sixth Workshop on Languages and Compilers for Parallel Computing (Portland, Ore., August 1993), U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, Eds., no. 768 in Lecture Notes in Computer Science, Springer Verlag, pp. 477–499.

[54] HORSPOOL, R. N., AND HO, H. C. Partial Redundancy Elimination Driven by a Cost-Benefit Analysis. In Proc. 8th Israeli Conf. on Computer Systems Engineering (ICSSE'97) (Herzliya, Israel, June 1997), IEEE Computer Society, pp. 111–118.

[55] HORWITZ, S., PRINS, J., AND REPS, T. On the Adequacy of Program Dependence Graphs for Representing Programs. In Proc. Fifteenth Annual ACM Symp. on Principles of Programming Languages (San Diego, California, 1988), pp. 146–157.

[56] HORWITZ, S., REPS, T., AND BINKLEY, D. Interprocedural Slicing Using Dependence Graphs. ACM Trans. Prog. Langs and Systems 12, 1 (January 1990), 26–60.

[57] INTEL. Intel Architecture Software Developer’s Manual vol. 2. Intel, 1999.

[58] JANSSEN, J., AND CORPORAAL, H. Making Graphs Reducible with Controlled Node Splitting. ACM Trans. Prog. Lang. and Systems 19, 6 (November 1997), 1031–1052.

[59] JANSSEN, J., AND CORPORAAL, H. Registers on Demand: An Integrated Region Scheduler and Register Allocator. In Proc. Conf. on Compiler Construction (April 1998).

[60] JOHNSON, N., AND MYCROFT, A. Combined Code Motion and Register Allocation using the Value State Dependence Graph. In Proc. 12th Intl. Conf. on Compiler Construction (CC'03) (April 2003), vol. 2622 of LNCS, pp. 1–16.

[61] KERNIGHAN, B. W., AND RITCHIE, D. M. The C Programming Language, 2nd ed. Prentice Hall, 1988.

[62] KISSELL, K. MIPS16: High-density MIPS for the Embedded Market. In Proc. Conf. Real Time Systems (RTS’97) (1997).

[63] KNOOP, J., RÜTHING, O., AND STEFFEN, B. Lazy Code Motion. In Proc. ACM SIGPLAN Conf. on Prog. Lang. Design and Implementation (PLDI'92) (1992), pp. 224–234.

[64] KNUTH, D. The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley, 1998.

[65] KNUTH, D. E. An Empirical Study of FORTRAN Programs. Software—Practice and Experience 1, 2 (April–June 1971), 105–133.

[66] KOSEKI, A., KOMATSU, H., AND NAKATANI, T. Preference-Directed Graph Coloring. In Proc. ACM SIGPLAN 2002 Conference on Prog. Lang. Design and Implementation (June 2002), ACM Press, pp. 33–44.

[67] KOUTSOFIOS, E., AND NORTH, S. C. Drawing graphs with dot. Tech. rep., AT&T Bell Laboratories, Murray Hill, NJ, October 1993.

[68] LEE, C., POTKONJAK, M., AND MANGIONE-SMITH, W. H. Mediabench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems. In Proc. 30th Intl. Symp. Microarchitectures (MICRO’30) (December 1997), ACM/IEEE, pp. 330–335.

[69] LEFURGY, C., BIRD, P., CHEN, I.-C., AND MUDGE, T. Improving Code Density using Compression Techniques. Tech. Rep. CSE-TR-342-97, EECS Department, University of Michigan, Ann Arbor, MI, 1997.

[70] LEUPERS, R., AND MARWEDEL, P. Algorithms for Address Assignment in DSP Code Generation. In Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design (1996), IEEE Computer Society Press, pp. 109–112.

[71] LEVINE, J. R., MASON, T., AND BROWN, D. Lex and Yacc. O’Reilly & Associates, Inc., 1992.

[72] LIAO, S., DEVADAS, S., KEUTZER, K., TJIANG, S., AND WANG, A. Storage Assignment to Decrease Code Size. In Proceedings of the ACM SIGPLAN 1995 conference on Programming Language Design and Implementation (June 1995), ACM Press, pp. 186–195.

[73] LIAO, S., TIJANG, S., AND GUPTA, R. An Efficient Implementation of Reactivity for Modeling Hardware in the Scenic Design Environment. In Proc. Design Automation Conference (DAC’97) (Anaheim, California, USA, June 1997), pp. 70–75.

[74] LIAO, S. Y.-H. Code Generation for Embedded Digital Signal Processors. PhD thesis, Massachusetts Institute of Technology, June 1996.

[75] LIM, S., KIM, J., AND CHOI, K. Scheduling-based Code Size Reduction in Processors with Indirect Addressing Mode. In Proc. Ninth Intl. Symp. on Hardware/Software Codesign (CODES 2001) (April 2001), pp. 165–169.

[76] MARKS, B. Compilation to Compact Code. IBM J. Research and Development 22, 6 (November 1980), 684–691.

[77] MCCREIGHT, E. A space-economical suffix tree construction algorithm. J. ACM 23, 2 (April 1976), 262–272.

[78] MILNER, R. Communication and Concurrency. Computer Science. Prentice Hall Intl., 1989.

[79] MOORE, G. E. Cramming more components onto integrated circuits. Electronics 38, 8 (April 1965).

[80] MOREL, E., AND RENVOISE, C. Global optimization by suppression of partial redundancies. Comm. ACM 22, 2 (February 1979), 96–103.

[81] MOTOROLA INC. AltiVec Technology Programming Interface Manual. No. ALTIVECPIM/D. Motorola Inc, 1999.

[82] MOTOROLA, INC. Programming Environments Manual For 32-Bit Implementations of the PowerPC Architecture. Motorola, Inc., 2001.

[83] MUCHNICK, S. S. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers, San Francisco, CA, 1997.

[84] MUTH, R. ALTO: A Platform for Object Code Modification. PhD thesis, Dept. Computer Science, University of Arizona, 1999.

[85] MYCROFT, A., AND SHARP, R. A Statically Allocated Parallel Functional Language. In Proc. Intl Conf. on Automata, Languages and Programming (2000), no. 1853 in LNCS, Springer-Verlag.

[86] PADUA, D. A., AND WOLFE, M. J. Advanced Compiler Optimizations for Supercomputers. Comm. ACM 29, 12 (December 1986), 1184–1201.

[87] PAGE, I. Constructing Hardware-Software Systems from a Single Description. J. VLSI Signal Processing 12, 1 (1996), 87–107.

[88] PINTER, S. S. Register Allocation with Instruction Scheduling: A New Approach. In Proc. ACM SIGPLAN Conference on Prog. Lang. Design and Implementation (Albuquerque, NM, June 1993), pp. 248–257.

[89] PLAUGER, P. The Standard C Library. Prentice Hall, 1992.

[90] PROEBSTING, T. Proebsting's law. May be accessed from the website http://research.microsoft.com/~toddpro/papers/law.htm, May 1999.

[91] RAO, A., AND PANDE, S. Storage Assignment Optimizations to Generate Compact and Efficient Code on Embedded DSPs. In Proc. ACM SIGPLAN Conf. on Prog. Lang. Design and Implementation (1999), ACM Press, pp. 128–138.

[92] RAO, M. P. Combining register assignment and instruction scheduling. Master’s thesis, Michigan Technological University, 1998.

[93] RICHARDS, M. Compact Target Code Design for Interpretation and Just-in-Time Com- pilation. (unpublished).

[94] ROSEN, B. K., WEGMAN, M. N., AND ZADECK, F. K. Global Value Numbers and Redundant Computations. In Proc. ACM SIGPLAN Conf. Principles of Prog. Langs (POPL’88) (January 1988), vol. 10, ACM Press, pp. 12–27.

[95] RUNESON, J. Code Compression Through Procedural Abstraction Before Register Allocation. Master's thesis, Computer Science Department, Uppsala University, 2000.

[96] RÜTHING, O., KNOOP, J., AND STEFFEN, B. Sparse code motion. In Proc. 27th ACM SIGPLAN-SIGACT Symp. Principles of Prog. Langs (POPL) (Boston, MA, 2000), ACM Press, pp. 170–183.

[97] SEAL, D. ARM Architecture Reference Manual. Addison-Wesley, UK, 2000.

[98] SELKE, R. P. A Rewriting Semantics for Program Dependence Graphs. In Proc. ACM Symp. Principles of Programming Languages (January 1989), ACM Press, pp. 12–24.

[99] SETHI, R., AND ULLMAN, J. D. The Generation of Optimal Code for Arithmetic Expressions. J. ACM 17, 4 (October 1970), 715–728.

[100] SHARMA, N., AND GUPTA, S. K. Optimal Stack Slot Assignment in GCC. In Proc. GCC Developers Summit (May 2003), pp. 223–228.

[101] SHARP, R., AND MYCROFT, A. A Higher-Level Language for Hardware Synthesis. In Proc. 11th Advanced Research Working Conf. on Correct Hardware Design and Verification Methods (2001), no. 2144 in LNCS, Springer-Verlag.

[102] STORER, J., AND SZYMANSKI, T. Data Compression via Textual Substitution. J. ACM 29, 4 (October 1982), 928–951.

[103] SUDARSANAM, A., LIAO, S., AND DEVADAS, S. Analysis and Evaluation of Address Arithmetic Capabilities in Custom DSP Architectures. In Proceedings of the 34th Annual Design Automation Conference (1997), ACM Press, pp. 287–292.

[104] SUN MICROSYSTEMS. VIS Instruction Set User's Manual. No. 805-1394-03. Sun Microsystems, May 2001.

[105] SWEANY, P., AND BEATY, S. Post-Compaction Register Assignment in a Retargetable Compiler. In Proc. 23rd Annual Workshop on Microprogramming and Microarchitecture (November 1990), pp. 107–116.

[106] TOUATI, S.-A.-A. Register Pressure in Instruction Level Parallelism. PhD thesis, Université de Versailles Saint-Quentin, June 2002.

[107] UDAYANARAYANAN, S., AND CHAKRABARTI, C. Address Code Generation for Digital Signal Processors. In Proceedings of the 38th Conference on Design Automation (2001), ACM Press, pp. 353–358.

[108] UPTON, E. Optimal Sequentialization of Gated Data Dependence Graphs is NP-Complete. In Proc. 2003 Intl. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA'03) (June 2003), CSREA Press.

[109] USAF. MIL-STD-1750A: SIXTEEN-BIT COMPUTER INSTRUCTION SET ARCHITECTURE, 1980.

[110] WEAVER, D. L., AND GERMOND, T. The SPARC Architecture Manual (version 9). SPARC International, Inc, Santa Clara, CA, 2000.

[111] WEINER, P. Linear pattern matching algorithms. In Proc. 14th IEEE Annual Symp. on Switching and Automata Theory (1973), IEEE, pp. 1–11.

[112] WEISE, D., CREW, R. F., ERNST, M., AND STEENSGAARD, B. Value Dependence Graphs: Representation Without Taxation. In ACM SIGPLAN-SIGACT Symp. on Principles of Prog. Langs (POPL) (January 1994), ACM Press.

[113] WIRTH, N. The Programming Language PASCAL. Acta Informatica 1 (1971), 35–63.

[114] WULF, W., JOHNSSON, R. K., WEINSTOCK, C. B., HOBBS, S. O., AND GESCHKE, C. M. The Design of an Optimizing Compiler. Elsevier Computer Science Library, 1975.

[115] ZASTRE, M. J. Compacting Object Code via Parameterized Procedural Abstraction. Master’s thesis, Department of Computer Science, University of Victoria, Canada, 1995.