A General Compiler Framework for Speculative Optimizations Using Data Speculative Code Motion


Xiaoru Dai, Antonia Zhai, Wei-Chung Hsu, Pen-Chung Yew
Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455
{dai, zhai, hsu, yew}@cs.umn.edu

Abstract

Data speculative optimization refers to code transformations that allow load and store instructions to be moved across potentially dependent memory operations. Existing research on data speculative optimizations has mainly focused on individual code transformations: the speculative analysis that identifies data speculative optimization opportunities and the recovery code generation that guarantees correct execution are handled separately for each optimization. This paper proposes a new compiler framework to facilitate the design and implementation of general data speculative optimizations such as dead store elimination, redundancy elimination, copy propagation, and code scheduling. The framework allows different data speculative optimizations to share (i) a speculative analysis mechanism that identifies data speculative optimization opportunities by ignoring low-probability data dependences, and (ii) a recovery code generation mechanism that guarantees the correctness of the data speculative optimizations. The proposed recovery code generation is based on Data Speculative Code Motion (DSCM), which uses code motion to facilitate a desired transformation; based on the position of the moved instruction, recovery code can be generated accordingly. The proposed framework greatly simplifies the task of incorporating data speculation into non-speculative optimizations by sharing the recovery code generation and the speculative analysis. We have implemented the proposed framework in the ORC 2.1 compiler and demonstrated its effectiveness on SPEC2000 benchmark programs.

1. Introduction

Imprecise data dependence information may decrease the effectiveness of compiler optimizations. However, obtaining precise data dependence analysis is both difficult and expensive for languages such as C, in which dynamic and pointer-based data structures are frequently used. When the data dependence analysis is unable to show that there is definitely no data dependence between two memory references, the compiler must assume that there is one. Quite often, such an assumption is overly conservative. The examples in Figure 1 illustrate how such conservative data dependences may affect compiler optimizations.

    a) Example 1:

        S1:  ... = *q
        S2:  *p = b      (1) possible true dependence with the load of *q in S3
        S3:  ... = *q
        S4:  *r = ...    (2) possible output dependence with the store *p in S2
        S5:  ... = *p    (3) possible true dependence with the store *r in S4
        S6:  *r = ...

    b) Example 2, original loop (left) and after unrolling (right):

        while (p) {                     while (p) {
          S1: if (p->f == 0)              S1: if (p->f == 0)
              ...                             ...
          S2: p->f = 0;                   S2: p->f = 0;
          ...                             ...
          p = p->n;                       p = p->n;
        }                                 S3: if (p->f == 0)
                                              ...
                                          S4: p->f = 0;
                                          ...
                                          p = p->n;
                                        }

    Figure 1. Examples of compiler optimizations disabled by possible data
    dependences. Numbered arcs (1)-(3) mark possible data dependences
    between memory references.

In Figure 1, the arcs represent possible data dependences between memory references. For the example in Figure 1(a), the possible true dependence between *p and *q (arc 1) prevents redundancy elimination of *q in S3. The possible output dependence between *p and *r (arc 2) inhibits copy propagation of *p in S5. The possible true dependence between *p and *r (arc 3) disallows dead store elimination in S4.
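If the compiler speculates that *p and *q do not alias, the redundant load in S3 can be eliminated, with recovery code compensating for the rare conflicting case. The following C-style sketch is our illustration, not code from the paper; the temporary t and the explicit address check stand in for whatever recovery mechanism the compiler or hardware provides:

    t = *q;         /* S1: first load of *q; keep its value in t        */
    *p = b;         /* S2: may, though rarely, write the location of q  */
    if (p == q)     /* runtime verification of the speculation          */
        t = *q;     /* recovery: re-execute the eliminated load         */
    ... = t;        /* S3 reuses t instead of re-loading *q             */

When p and q never alias at run time, the check is the only overhead; when they do alias, the recovery load restores correctness.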
In this example, three compiler optimizations (redundancy elimination, copy propagation, and dead store elimination) are inhibited by possible data dependences. Without these possible dependences, those optimizations could have been performed by the compiler.

In the left column of Figure 1(b), another motivating example with a C while loop is shown. Here, the load of p->f in S1 may have a possible data dependence with the store of p->f in S2. After the loop is unrolled, the resulting code is shown in the right column of Figure 1(b): S1 and S2 come from the first iteration, S3 and S4 from the second. The load of p->f in S3 cannot be scheduled ahead of the store in S2 because of the possible data dependence. If this dependence rarely occurs at run time, it may be profitable to schedule the load in S3 before the store in S2 to hide the load latency; if the dependence does occur, recovery code must be executed to guarantee correct results.
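In C-like pseudocode, the speculative schedule for the unrolled loop body looks roughly as follows (our sketch, not code from the paper; processors with hardware support for data speculation, such as the Itanium's advanced loads, replace the explicit address comparison used here):

    q = p->n;       /* compute the next node early                        */
    t = q->f;       /* S3's load of p->f, hoisted above the store in S2   */
    p->f = 0;       /* S2: conflicts with the hoisted load only if q == p */
    if (q == p)     /* runtime verification of the speculation            */
        t = q->f;   /* recovery code: re-execute the load                 */
    p = q;
    /* S3: if (t == 0) ...  now uses the verified value                   */

The latency of the hoisted load is overlapped with the store, and the recovery path executes only when the speculation fails.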
Getting precise data dependence information is difficult because it is hard for a compiler to know what memory locations a memory reference may access at run time; it is even more difficult when pointers are involved. Therefore, using data speculation and runtime verification to overcome possible data dependences (with low probabilities) has recently been proposed in [11-16]. Here, data speculation refers to the execution of instructions that may potentially violate memory dependences, albeit infrequently.

Compiler optimizations are normally divided into two phases: the analysis phase and the code transformation phase. The analysis phase identifies optimization opportunities based on the internal representation (IR) and data dependence information. The code transformation phase modifies the IR to generate improved code. To support data speculation, we need a recovery mechanism, using either hardware or software support, to guarantee the correctness of speculative optimizations.

The work in [12] and [15] uses data speculation in code scheduling to generate more efficient code sequences. In [13][14][22], data speculation is used to enable speculative register allocation. These are all examples of specific speculative code optimizations.

In [11], Ju et al. proposed a unified compiler framework for control and data speculation in a code scheduler. There are three main tasks in their speculative code scheduler: marking speculative dependence edges, selecting speculative instructions as scheduling candidates, and check insertion and DAG update. These three tasks are integrated with the rest of the instruction scheduling phase.

In [16], a framework that augments SSA form to incorporate data speculative information (obtained either from alias profiling or from compiler heuristic rules) is proposed. Speculative partial redundancy elimination based on SSAPRE [5] is presented to exemplify the use of such a framework.

In both [11] and [16], the data speculative information is explicitly annotated, either through speculative dependence edges in the dependence graph [11] or through speculative weak updates in SSA form (i.e., the χ and µ operators in [16]). All optimizations that try to incorporate data speculation must therefore be modified and made aware of such explicitly annotated data speculative information. In [11], the construction of the dependence graph, the selection of scheduling candidates, and the DAG update are all modified to handle the speculative dependence edges. Recovery code generation is decoupled from the scheduling phase and works well only for code scheduling; it may not handle other optimizations directly. For example, the identification of speculative chains in their recovery code generation is not applicable to instructions eliminated through speculative redundancy elimination. In [16], the construction of SSA form, the Φ-insertion step, the rename step, and the code motion step in SSAPRE all need to be modified to identify speculative optimization opportunities and to generate recovery code. In both [11] and [16], the accommodation of data speculative information in optimizations and the recovery code generation have to be tailored to each specific compiler optimization; they cannot be shared among optimizations. Such frameworks are difficult to adopt, to extend, and to maintain.

In our framework, as shown in Figure 2, the data speculative information is integrated into a shared Speculative Data Dependence Analysis (SDDA) phase by ignoring low-probability data dependences. Hence, more optimization opportunities can be exposed for existing optimizations without requiring any modification to accommodate such information as in [11] and [16]. When an optimization opportunity identified in the analysis phase of an optimization is data speculative, a shared mechanism is provided for recovery code generation. The proposed recovery code generation is based on Data Speculative Code Motion (DSCM), which uses a code motion model to determine whether a transformation is data speculative and to generate the necessary recovery code. Our framework has two advantages: first, SDDA and DSCM are shared by all optimizations; second, the existing non-speculative optimizations need no modifications.

        Speculative Data Dependence Analysis (SDDA)
                          |
        Analysis phase of       ...   Analysis phase of
          optimization 1                optimization n
                          |
           Data Speculative Code Motion (DSCM)
                          |
        Code transformation of  ...   Code transformation of
          optimization 1                optimization n

    Figure 2. Structure of our proposed data speculative optimizations.

In SDDA, we do not assume a data dependence between two memory references unless we can prove that it is very likely, or most definite, that the two references will access the same memory location. Any data dependence with a low probability is treated as no data dependence in the speculative optimizations. As it turns out, the probability distribution of most data dependences is strongly bimodal, i.e., a dependence is either very likely or not likely at all [20]. Using
Recommended publications
  • Precise Null Pointer Analysis Through Global Value Numbering
Precise Null Pointer Analysis Through Global Value Numbering
Ankush Das (Carnegie Mellon University, Pittsburgh, PA, USA) and Akash Lal (Microsoft Research, Bangalore, India)

Abstract. Precise analysis of pointer information plays an important role in many static analysis tools. The precision, however, must be balanced against the scalability of the analysis. This paper focuses on improving the precision of standard context- and flow-insensitive alias analysis algorithms at a low scalability cost. In particular, we present a semantics-preserving program transformation that drastically improves the precision of existing analyses when deciding if a pointer can alias Null. Our program transformation is based on Global Value Numbering, a scheme inspired by the compiler optimization literature. It allows even a flow-insensitive analysis to make use of branch conditions, such as a check that a pointer is Null, and gain precision. We perform experiments on real-world code and show that the transformation improves precision (in terms of the number of dereferences proved safe) from 86.56% to 98.05%, while incurring a small overhead in running time.

Keywords: Alias Analysis, Global Value Numbering, Static Single Assignment, Null Pointer Analysis

1 Introduction. Detecting and eliminating null-pointer exceptions is an important step towards developing reliable systems. Static analysis tools that look for null-pointer exceptions typically employ techniques based on alias analysis to detect possible aliasing between pointers. Two pointer-valued variables are said to alias if they hold the same memory location during runtime. Statically, aliasing can be decided in two ways: (a) may-alias [1], where two pointers are said to may-alias if they can point to the same memory location under some possible execution, and (b) must-alias [27], where two pointers are said to must-alias if they always point to the same memory location under all possible executions.
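The flavor of the transformation can be seen in a small C fragment (our sketch; the name p_nonnull and the exact rewriting are illustrative, not taken from the paper):

    /* Before: a flow-insensitive analysis merges every fact about p,
       so the guarded dereference still appears potentially null.     */
    if (p != NULL) {
        *p = 1;
    }

    /* After a GVN-based rewrite: the non-null branch uses a fresh
       name, so even a flow-insensitive analysis can record that
       p_nonnull never aliases Null.                                  */
    if (p != NULL) {
        int *p_nonnull = p;
        *p_nonnull = 1;
    }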
  • A Formally-Verified Alias Analysis
A Formally-Verified Alias Analysis
Valentin Robert (INRIA Paris-Rocquencourt and University of California, San Diego) and Xavier Leroy (INRIA Paris-Rocquencourt)

Abstract. This paper reports on the formalization and proof of soundness, using the Coq proof assistant, of an alias analysis: a static analysis that approximates the flow of pointer values. The alias analysis considered is of the points-to kind and is intraprocedural, flow-sensitive, field-sensitive, and untyped. Its soundness proof follows the general style of abstract interpretation. The analysis is designed to fit in the CompCert C verified compiler, supporting future aggressive optimizations over memory accesses.

1 Introduction. Alias analysis. Most imperative programming languages feature pointers, or object references, as first-class values. With pointers and object references comes the possibility of aliasing: two syntactically-distinct program variables, or two semantically-distinct object fields, can contain identical pointers referencing the same shared piece of data. The possibility of aliasing increases the expressiveness of the language, enabling programmers to implement mutable data structures with sharing; however, it also complicates tremendously formal reasoning about programs, as well as optimizing compilation. In this paper, we focus on optimizing compilation in the presence of pointers and aliasing. Consider, for example, the following C program fragment:

    ... *p = 1; *q = 2; x = *p + 3; ...

Performance would be increased if the compiler propagates the constant 1 stored in p to its use in *p + 3, obtaining

    ... *p = 1; *q = 2; x = 4; ...

This optimization, however, is unsound if p and q can alias.
  • Aliases, Intro. to Optimization
Aliasing. Two variables are aliased if they can refer to the same storage location. Possible sources: pointers, parameter passing, storage overlap. Ex.: the address of x is passed; then x and y are aliased! Pointer analysis and alias analysis give less conservative information, needed for correct, aggressive optimization.

Procedures, terminology: a, e global; b, c formal arguments; d local; a call site has actual arguments. At a procedure call, formals are bound to actuals and may be aliased, e.g. (b, a), (c, d). Globals and actuals may be modified and used, e.g. a, b.

Call graphs. Determine the possible flow of control interprocedurally. G = (N, LE, s): N is the set of nodes, LE the set of labelled edges n -> m, s the start node. Question: why do we need call-site labels? Why a list? Example call graph (diagram).

Interprocedural dataflow analysis. Based on the call graph: forward, backward; Gen, Kill. Need to summarize procedures per call. Flow-sensitive: take the procedure's control flow into account. Flow-insensitive: ignore the procedure's control flow. Difficulties: hard and complex; flow-sensitive alias analysis is intractable; separate compilation? The Scale compiler can do both flow-sensitive and flow-insensitive analysis; most compilers are ultraconservative, or flow-insensitive.

Scalar replacement of aggregates. Use a scalar temporary instead of an aggregate variable; a compiler may limit optimization to such scalars. Enables better register allocation, constant propagation, etc. Particularly useful when there is a small number of constant values; constant propagation and dead code elimination can then specialize code.

Value numbering of basic blocks. Eliminates computations whose values are already computed in the basic block; the value needn't be constant. Method: value-number hash table.

Global copy propagation. Given A := B, replace later uses of A by B, as long as A and B are not redefined (with dead code elimination), as sketched below.
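A minimal C illustration of that rule (our example, not from the slides):

    a = b;        /* the copy A := B                                */
    x = a + 1;    /* later uses of a; neither a nor b redefined     */
    y = a * 2;

    /* after global copy propagation */
    a = b;        /* the copy is now dead if a has no other uses    */
    x = b + 1;    /* uses of a replaced by b                        */
    y = b * 2;    /* dead code elimination can then remove a = b    */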
  • Automatic Parallelization of C by Means of Language Transcription
Automatic Parallelization of C by Means of Language Transcription
Richard L. Kennell and Rudolf Eigenmann (Purdue University, School of Electrical and Computer Engineering)

Abstract. The automatic parallelization of C has always been frustrated by pointer arithmetic, irregular control flow and complicated data aggregation. Each of these problems is similar to familiar challenges encountered in the parallelization of more rigidly-structured languages such as FORTRAN. By creating a mapping from one language to the other, we can expose the capabilities of existing automatically parallelizing compilers to the C language. In this paper, we describe our approach to mapping applications written in C to a form suitable for the Polaris source-to-source FORTRAN compiler. We also describe the improvements in the compiled applications realized by this second level of transformation and show results for a small application in comparison to commercial compilers.

1.0 Introduction. Polaris is an automatically parallelizing source-to-source FORTRAN compiler. It accepts FORTRAN77 input and produces FORTRAN output in a new dialect that supports explicit parallelism by means of embedded directives such as the OpenMP [Ope97] or Sun FORTRAN Directives [Sun96]. The benefit that Polaris provides is in automating the analysis of the loops and array accesses in the application to determine how they can best be expressed to exploit available parallelism. Since FORTRAN naturally constrains the way in which parallelism exists, the analysis is somewhat more straightforward than with other languages. This allows Polaris to perform very complicated interprocedural and global analysis without risk of misinterpretation of programmer intent. Experimental results show that Polaris is able to markedly improve the run-time of applications without additional programmer direction [PVE96, BDE+96].
  • Automatic Loop Parallelization Via Compiler Guided Refactoring
Automatic Loop Parallelization via Compiler Guided Refactoring
Per Larsen, Razya Ladelsky, Jacob Lidman, Sally A. McKee, Sven Karlsson and Ayal Zaks
DTU Informatics, Technical U. Denmark, 2800 Kgs. Lyngby, Denmark; Computer Science Engineering, Chalmers U. Technology, 412 96 Gothenburg, Sweden; IBM Haifa Research Labs, Mount Carmel, Haifa, 31905, Israel

Abstract—For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler's ability to generate loop-parallel code. We use this compilation system to modify two sequential benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an octo-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. We compare with hand-parallelized and optimized versions using an octo-core Intel Xeon 5570 system and a quad-core IBM POWER6 SCM system.

Our contributions are as follows.
• First, we present our interactive compilation system.
• Second, we perform an extensive performance evaluation. We use two benchmark kernels and two parallel architectures, and also study the behavior of the benchmarks.

After modification with our compilation system, we find that the two benchmarks run up to 6.0 and 8.3 times faster.
  • Topic 1D Slides
Topic I (d): Static Single Assignment Form (SSA)

Reading list: slides for Topic I; other readings as assigned in class.

ABET outcomes: ability to apply knowledge of the SSA technique in compiler optimization; ability to formulate and solve the basic SSA construction problem based on the techniques introduced in class; ability to analyze the basic algorithms using SSA form to express and formulate dataflow analysis problems; knowledge of contemporary issues in this topic.

Roadmap: motivation; introduction to SSA form; construction method; application of SSA to dataflow analysis problems; PRE (partial redundancy elimination) and SSAPRE; summary.

Prelude. A program is said to be in SSA form iff each variable is statically defined exactly once, and each use of a variable is dominated by that variable's definition. So, is straight-line code in SSA form?

Example. Two definitions X1 and X2 on different control-flow paths reach a merge point that defines X3 = φ(X1, X2), which a later use X4 refers to. In general, how do we transform an arbitrary program into SSA form? Does the definition of X2 dominate its use in the example?

SSA motivation: provides a uniform basis of an IR to solve a wide range of classical dataflow problems; encodes both dataflow and control flow information; an SSA form can be constructed and maintained efficiently; many SSA dataflow analysis algorithms are more efficient (have lower complexity) than their CFG counterparts.

Algorithm complexity. Assume a 1 GHz machine, and an algorithm that takes f(n) steps (1 step = 1 nanosecond).
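A minimal sketch of the construction in C-like pseudocode (ours, not from the slides; phi is written as an ordinary call):

    /* original program: two definitions of x reach the use */
    if (c)
        x = 1;
    else
        x = 2;
    y = x + 3;

    /* SSA form: each variable is defined exactly once */
    if (c)
        x1 = 1;
    else
        x2 = 2;
    x3 = phi(x1, x2);   /* selects x1 or x2 according to the path taken */
    y1 = x3 + 3;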
  • Autotuning for Automatic Parallelization on Heterogeneous Systems
Autotuning for Automatic Parallelization on Heterogeneous Systems
Dissertation approved by the KIT Department of Informatics of the Karlsruhe Institute of Technology (KIT) for the academic degree of Doktor der Ingenieurwissenschaften, by Philip Pfaffe. Date of oral examination: 24.07.2019. First reviewer: Prof. Dr. Walter F. Tichy. Second reviewer: Prof. Dr. Michael Philippsen.

Abstract. To meet the surging demand for high-speed computation in an era of stagnating increase in performance per processor, systems designers resort to aggregating many and even heterogeneous processors into single systems. Automatic parallelization tools relieve application developers of the tedious and error-prone task of programming these heterogeneous systems. For these tools, there are two aspects to maximizing performance: optimizing the execution on each parallel platform individually, and executing work on the available platforms cooperatively. To date, various approaches exist targeting either aspect. Automatic parallelization for simultaneous cooperative computation with optimized per-platform execution, however, remains an unsolved problem. This thesis presents the APHES framework to close that gap. The framework combines automatic parallelization with a novel technique for input-sensitive online autotuning. Its first component, a parallelizing polyhedral compiler, transforms implicitly data-parallel program parts for multiple platforms. Targeted platforms then automatically cooperate to process the work. During compilation, the code is instrumented to interact with libtuning, our new autotuner and second component of the framework. Tuning the work distribution and per-platform execution maximizes overall performance. The autotuner enables always-on autotuning through a novel hybrid tuning method, combining a new efficient search technique and model-based prediction.
  • Vbcc Compiler System
vbcc compiler system
Volker Barthelmann

Table of Contents
1 General
  1.1 Introduction
  1.2 Legal
  1.3 Installation
    1.3.1 Installing for Unix
    1.3.2 Installing for DOS/Windows
    1.3.3 Installing for AmigaOS
  1.4 Tutorial
2 The Frontend
  2.1 Usage
  2.2 Configuration
3 The Compiler
  3.1 General Compiler Options
  3.2 Errors and Warnings
  3.3 Data Types
  3.4 Optimizations
    3.4.1 Register Allocation
    3.4.2 Flow Optimizations
    3.4.3 Common Subexpression Elimination
    3.4.4 Copy Propagation
    3.4.5 Constant Propagation
    3.4.6 Dead Code Elimination
    3.4.7 Loop-Invariant Code Motion
  • A Performance-Based Approach to Automatic Parallelization
A Performance-based Approach to Automatic Parallelization
Lecture Notes for ECE 663, Advanced Optimizing Compilers (draft)
Rudolf Eigenmann, School of Electrical and Computer Engineering, Purdue University

Chapter 1: Motivation and Introduction

Slide 2: Optimizing compilers are in the center of the (software) universe. Compilers are the translators between "human and machine". This interface is one of the more challenging and important issues in all of computer science. Through programming languages, software engineers tell machines what to do. Today's programming languages are still rather cryptic, borrowing a few words from the English vocabulary. Perhaps tomorrow's languages will be more like natural languages; they may offer higher-level expressions, allowing problems to be specified rather than coding the detailed problem-solution algorithms. Translating these languages onto modern architectures with their low-level machine code languages is already a challenge. As every generation of computer architectures tends to grow in complexity, translating future programming languages onto future machines will be an even grander challenge.

In performing such translation, optimization is important. Basic translation is usually inefficient, and performance almost always matters. There are other optimization criteria as well: energy efficiency is of increasing importance and is critical for battery-operated devices, and small code and memory size can be of great value for embedded systems. In this course, we will focus mainly on performance.

Optimizations for parallel machines are the focus of this course. Today's CPUs contain multiple cores; they are one of the most important classes of current parallel machines. This was not always so.
  • Effective Representation of Aliases and Indirect Memory Operations in SSA Form
Effective Representation of Aliases and Indirect Memory Operations in SSA Form
Fred Chow, Sun Chan, Shin-Ming Liu, Raymond Lo, Mark Streich
Silicon Graphics Computer Systems, 2011 N. Shoreline Blvd., Mountain View, CA 94043
Contact: Fred Chow (phone: USA (415) 933-4270)

Abstract. This paper addresses the problems of representing aliases and indirect memory operations in SSA form. We propose a method that prevents explosion in the number of SSA variable versions in the presence of aliases. We also present a technique that allows indirect memory operations to be globally commonized. The result is a precise and compact SSA representation based on global value numbering, called HSSA, that uniformly handles both scalar variables and indirect memory operations. We discuss the capabilities of the HSSA representation and present measurements that show the effects of implementing our techniques in a production global optimizer.

Keywords. Aliasing, factoring dependences, hash tables, indirect memory operations, program representation, static single assignment, value numbering.

1 Introduction. The Static Single Assignment (SSA) form [CFR+91] is a popular and efficient representation for performing analyses and optimizations involving scalar variables. Effective algorithms based on SSA have been developed to perform constant propagation, redundant computation detection, dead code elimination, induction variable recognition, and others [AWZ88, RWZ88, WZ91, Wolfe92]. But until now, SSA has mostly been used for distinct variable names in the program. When applied to indirect variable constructs, the representation is not straightforward, and results in added complexities in the optimization algorithms that operate on the representation [CFR+91, CG93, CCF94].
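To see why aliases strain SSA, consider a C-like sketch (ours, not from the paper; SSA-based optimizers in this tradition annotate may-defs and may-uses of aliased variables with χ and µ operators):

    x1 = 4;         /* direct definition of x: version x1               */
    *p = 3;         /* if p may point to x, x may be redefined here:    */
                    /*   x2 = chi(x1)  - a may-def annotation           */
    y1 = x2 + 1;    /* later uses of x must refer to version x2         */
    z1 = *q;        /* mu(x2) - the indirect load may read x            */

Without a way to factor such annotations, every potentially aliasing store creates a new version of every variable it may touch; this is the version explosion the HSSA representation is designed to prevent.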
  • Debloating Software Through Piece-Wise Compilation and Loading
Debloating Software through Piece-Wise Compilation and Loading
Anh Quach and Aravind Prakash (Binghamton University), Lok Yan (Air Force Research Laboratory)

Abstract. Programs are bloated. Our study shows that only 5% of libc is used on average across the Ubuntu Desktop environment (2016 programs); the heaviest user, vlc media player, only needed 18%. In this paper: (1) We present a debloating framework built on a compiler toolchain that can successfully debloat programs (shared/static libraries and executables). Our solution can successfully compile and load most libraries on Ubuntu Desktop 16.04. (2) We demonstrate the elimination of over 79% of code from coreutils and 86% of code from SPEC CPU 2006 benchmark programs without affecting functionality. We show that even complex programs such as Firefox and curl can be debloated without a need to recompile.

This extraneous code may contain its own bugs and vulnerabilities and therefore broadens the overall attack surface. Additionally, these features add unnecessary burden on modern defenses (e.g., CFI) that do not distinguish between used and unused features in software. Accumulation of unnecessary code in a binary, either by design (e.g., shared libraries) or due to software development inefficiencies, amounts to code bloat. As a typical example, shared libraries are designed to contain the union of all functionality required by their users. Static dead-code elimination, a static analysis technique used to identify unused code paths and remove them from the final binary, employed during compilation is an effective means to reduce bloat. In fact, under higher levels of optimization, modern compilers (clang,
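A minimal C illustration of what compile-time dead-code elimination can and cannot remove (our example, not from the paper):

    /* never referenced: with optimization enabled, the compiler can
       drop this function from the final binary                       */
    static int unused_helper(int x) { return x * 2; }

    int main(void) { return 0; }

    /* An unused function exported from a shared library, by contrast,
       must be kept at compile time, since an external user might call
       it - exactly the bloat that piece-wise compilation and loading
       targets. */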
  • Compiler Construction
Compiler construction

Contents
Introduction
  Compiler construction
  Compiler
  Interpreter
  History of compiler writing
Lexical analysis
  Lexical analysis
  Regular expression
  Regular expression examples
  Finite-state machine
  Preprocessor
Syntactic analysis
  Parsing
  Lookahead
  Symbol table
  Abstract syntax
  Abstract syntax tree
  Context-free grammar
  Terminal and nonterminal symbols
  Left recursion
  Backus–Naur Form
  Extended Backus–Naur Form
  TBNF
  Top-down parsing
  Recursive descent parser
  Tail recursive parser
  Parsing expression grammar
  LL parser
  LR parser
  Parsing table
  Simple LR parser
  Canonical LR parser
  GLR parser
  LALR parser
  Recursive ascent parser
  Parser combinator
  Bottom-up parsing
  Chomsky normal form
  CYK algorithm
  Simple precedence grammar
  Simple precedence parser
  Operator-precedence grammar
  Operator-precedence parser
  Shunting-yard algorithm
  Chart parser
  Earley parser
  The lexer hack
  Scannerless parsing
Semantic analysis
  Attribute grammar
  L-attributed grammar
  LR-attributed grammar
  S-attributed grammar
  ECLR-attributed grammar
  Intermediate language
  Control flow graph
  Basic block
  Call graph
  Data-flow analysis
  Use-define chain
  Live variable analysis
  Reaching definition
  Three address