COMP 181 Compilers

Lecture 1: The view from 35,000 feet
Monday, Sept. 12, 2005

Introduction
What is this artifact? The Rosetta Stone. Its significance: it carries the same document in Greek and in Egyptian hieroglyphics. Am I in the wrong class? No; compilers are translators.

Language translation
[Diagram: a high-level language and assembly/machine code each have their own syntax; translation maps between them at the level of semantics, preserving meaning.]

Compiler overview
Responsibilities:
- Recognize legal programs
- Generate correct code
- Manage hardware resources: registers, memory
- Cooperate with the OS: system calls, object code, virtual memory layout

Traditional two-pass compiler
[Diagram: source code -> Front End -> IR -> Back End -> machine code, with errors reported from both ends.]
Implications:
- Use an intermediate representation (IR)
- The front end maps legal source code into the IR
- The back end maps the IR into target machine code
- Admits multiple front ends and multiple passes (better code)
- Typically the front end is O(n) or O(n log n), while the back end contains NP-complete problems

A common fallacy
[Diagram: Fortran, Scheme, Java, and Smalltalk front ends feeding shared back ends for Target 1, Target 2, and Target 3.]
Can we build n*m compilers with n+m components?
- Must encode all language knowledge in each front end
- Must encode all features in a single IR
- Must encode all target knowledge in each back end
There has been limited success, in systems with very low-level IRs.

The front end
[Diagram: source code -> Scanner -> tokens -> Parser -> IR, with errors reported.]
Responsibilities:
- Recognize legal (and illegal) programs
- Report errors in a useful way
- Produce the IR and a preliminary storage map
- Shape the code for the back end
Much of front-end construction can be automated.

The front end: scanner
- Maps the character stream into words, the basic unit of syntax
- Produces pairs — a word and its part of speech; for example

      x = x + y;  becomes  <id,x> = <id,x> + <id,y> ;

- word ≅ lexeme, part of speech ≅ token type; in casual speech we call the pair a token
- Typical tokens include number, identifier, +, -, while, if
- The scanner eliminates white space (including comments)
- Speed is important

The front end: parser
- Recognizes context-free syntax and reports errors
- Guides context-sensitive ("semantic") analysis, such as type checking
- Builds the IR for the source program
Most parsers are built using automated tools; hand-coded parsers are not too difficult to build.

The front end: grammars
Context-free syntax is specified with a grammar, for example

    sentence → subject verb object
    subject  → noun | proper-noun

written in a variant of Backus–Naur Form (BNF). Formally, a grammar is G = (S, N, T, P):
- S is the start symbol, chosen from N
- N is a set of non-terminal symbols
- T is a set of terminal symbols, or words
- P is a set of productions or rewrite rules (P : N → (N ∪ T)*)

Applied to a programming language:

    1. goal → expr
    2. expr → expr op term
    3.      | term
    4. term → number
    5.      | id
    6. op   → +
    7.      | -

with S = goal, T = { number, id, +, - }, N = { goal, expr, term, op }, and P = { 1, 2, 3, 4, 5, 6, 7 }.

This grammar defines simple expressions with addition and subtraction over "number" and "id". This grammar, like many, falls in a class called "context-free grammars".
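This grammar is small enough to parse by hand. The following C sketch (illustrative only, not course code) scans and parses the sentence x + 2 - y, printing the productions it recognizes. Because expr → expr op term is left-recursive, the parser uses the standard iterative rewrite expr → term { op term } for top-down parsing.

    /* Minimal scanner + parser sketch for the lecture's expression grammar.
       Hypothetical illustration, not code from the course. */
    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef enum { T_NUM, T_ID, T_PLUS, T_MINUS, T_EOF } TokType;
    typedef struct { TokType type; char lexeme[32]; } Token;

    static const char *src;
    static Token tok;

    static void next_token(void) {          /* the scanner: chars -> <type, lexeme> */
        while (isspace((unsigned char)*src)) src++;      /* skip white space */
        if (*src == '\0') { tok.type = T_EOF; strcpy(tok.lexeme, "<eof>"); return; }
        if (isdigit((unsigned char)*src)) {              /* a number */
            int n = 0;
            while (isdigit((unsigned char)src[n])) n++;  /* no overflow guard; sketch only */
            memcpy(tok.lexeme, src, n); tok.lexeme[n] = '\0';
            tok.type = T_NUM; src += n; return;
        }
        if (isalpha((unsigned char)*src)) {              /* an identifier */
            int n = 0;
            while (isalnum((unsigned char)src[n])) n++;
            memcpy(tok.lexeme, src, n); tok.lexeme[n] = '\0';
            tok.type = T_ID; src += n; return;
        }
        tok.lexeme[0] = *src; tok.lexeme[1] = '\0';      /* single-char operators   */
        tok.type = (*src == '+') ? T_PLUS :              /* anything else ends the  */
                   (*src == '-') ? T_MINUS : T_EOF;      /* sketch                  */
        src++;
    }

    static void term(void) {                /* term -> number | id */
        if (tok.type != T_NUM && tok.type != T_ID) {
            fprintf(stderr, "syntax error near '%s'\n", tok.lexeme);
            exit(1);
        }
        printf("  term -> %s (%s)\n",
               tok.type == T_NUM ? "number" : "id", tok.lexeme);
        next_token();
    }

    static void expr(void) {                /* expr -> term { op term }: iteration
                                               replaces the grammar's left recursion */
        term();
        while (tok.type == T_PLUS || tok.type == T_MINUS) {
            printf("  op   -> %s\n", tok.type == T_PLUS ? "+" : "-");
            next_token();
            term();
        }
    }

    int main(void) {
        src = "x + 2 - y";                  /* the sentence derived on the next slide */
        next_token();
        printf("parsing \"x + 2 - y\"\n");
        expr();                             /* goal -> expr */
        if (tok.type == T_EOF) printf("accepted: goal -> expr\n");
        return 0;
    }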
The front end: derivations
Given a grammar, we can derive sentences by repeated substitution. To recognize a valid sentence in some CFG, we reverse this process and build up a parse. For x + 2 - y:

    Production   Result
                 goal
        1        expr
        2        expr op term
        5        expr op y
        7        expr - y
        2        expr op term - y
        4        expr op 2 - y
        6        expr + 2 - y
        3        term + 2 - y
        5        x + 2 - y

The front end: parse trees
A parse can be represented by a tree, called a parse tree or syntax tree. [Figure: the parse tree for x + 2 - y. goal expands to expr; expr expands to expr op term, where op derives - and term derives <id,y>; that expr again expands to expr op term, where op derives + and term derives <number,2>; the innermost expr derives term, which derives <id,x>.] The parse tree contains a lot of unneeded information.

The front end: abstract syntax trees
Compilers often use an abstract syntax tree. The AST summarizes grammatical structure without including detail about the derivation. [Figure: the AST for x + 2 - y is a - node whose children are a + node and <id,y>; the + node's children are <id,x> and <number,2>.] This is much more concise. ASTs are one kind of intermediate representation (IR).

Front end to back end: code shape and lowering
Start with the high-level AST and dismantle complex structures into simple ones. For example, the fragment

    if (x > 0) {
      y = a + b * c;
      z = d + b * c;
    }

(shown on the slide as an AST, with nodes for the comparison x > 0 and the two assignments) is dismantled into

    t0 = x > 0
    br t0 label1
    goto label2
    label1:
    t1 = b * c
    y = a + t1
    z = d + t1
    label2:
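To make the lowering step concrete, here is a small C sketch (not from the lecture; the Node layout and lower() are invented for illustration) that walks an expression AST bottom-up and emits one three-address instruction per operator, inventing a fresh temporary t0, t1, ... for each result:

    #include <stdio.h>

    typedef struct Node {
        char op;                     /* '+' or '*'; 0 marks a leaf     */
        const char *leaf;            /* identifier or number if a leaf */
        struct Node *left, *right;
    } Node;

    static int temp = 0;             /* next free temporary number     */

    /* Post-order walk: lower both children, then emit one instruction
       for this operator; returns the name holding the result. */
    static const char *lower(const Node *n) {
        static char names[16][8];    /* storage for t0, t1, ... (sketch) */
        if (n->op == 0) return n->leaf;
        const char *l = lower(n->left);
        const char *r = lower(n->right);
        int t = temp++;
        sprintf(names[t], "t%d", t);
        printf("%s = %s %c %s\n", names[t], l, n->op, r);
        return names[t];
    }

    int main(void) {
        /* the slide's  y = a + b * c */
        Node b = {0, "b", 0, 0}, c = {0, "c", 0, 0}, a = {0, "a", 0, 0};
        Node mul = {'*', 0, &b, &c};
        Node add = {'+', 0, &a, &mul};
        printf("y = %s\n", lower(&add));
        return 0;
    }

Running it prints t0 = b * c, then t1 = a + t0, then y = t1, matching the shape of the lowered code above (the temporary numbering differs from the slide's).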
The back end
[Diagram: IR -> Instruction Selection -> IR -> Register Allocation -> IR -> Instruction Scheduling -> machine code, with errors reported.]
Responsibilities:
- Translate IR into target machine code
- Choose instructions to implement each IR operation
- Decide which values to keep in registers
- Ensure conformance with system interfaces
Automation has been less successful in the back end.

The back end: instruction selection
- Produce fast, compact code
- Take advantage of target features: addressing modes, special instructions (e.g., multadd)
- Often viewed as a pattern-matching problem, with both ad hoc and automated methods
- Used to be a bigger problem on CISC machines such as the VAX-11; the orthogonality of RISC simplified it

The back end: example with RISC instructions
The lowered code

    t1 = b * c
    y  = a + t1
    z  = d + t1

becomes

    load  @b     => r1
    load  @c     => r2
    mult  r1, r2 => r3
    load  @a     => r4
    add   r3, r4 => r5
    store r5     => @y
    load  @d     => r6
    add   r3, r6 => r7
    store r7     => @z

Notice: explicit loads and stores, and lots of registers – "virtual registers".

The back end: register allocation
- Have each value in a register when it is used
- Manage a limited set of resources
- Can change instruction choices and insert LOADs and STOREs
- Optimal allocation is NP-complete (with 1 or k registers)
- Compilers approximate solutions to NP-complete problems
Why it matters (Pentium 4 latencies):

    Registers   1 cycle
    Cache       3-8 cycles
    Memory      30-150 cycles

The back end: instruction scheduling
- Avoid hardware stalls and interlocks
- Use all functional units productively
- Can increase the lifetime of variables (changing the allocation)
- Optimal scheduling is NP-complete in nearly all cases
- Heuristic techniques are well developed

The back end: conflicting goals
Move operations together to reduce registers, or move loads early to avoid waiting:

    load  @b     => r1          load  @b     => r1
    load  @c     => r2          load  @c     => r2
    mult  r1, r2 => r3          load  @a     => r4
    load  @a     => r1          load  @d     => r5
    add   r3, r1 => r1          mult  r1, r2 => r3
    store r1     => @y          add   r3, r4 => r4
    load  @d     => r1          store r4     => @y
    add   r3, r1 => r1          add   r3, r5 => r5
    store r1     => @z          store r5     => @z

The left schedule uses only 3 registers but may stall on loads; the right schedule starts loads early to hide latency but needs 5 registers.
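The slides note that optimal allocation is NP-complete and that compilers approximate it. As a hypothetical illustration (the interference graph below is invented, and real allocators add live-range analysis, spill costs, and coalescing), here is a Chaitin-style simplify/select allocator in C that colors five virtual registers with K = 3 physical registers, or reports that it would have to spill:

    #include <stdio.h>

    #define N 5          /* virtual registers v0..v4 */
    #define K 3          /* physical registers       */

    /* interfere[i][j] = 1 if vi and vj are live at the same time */
    static int interfere[N][N] = {
        {0,1,1,0,0},
        {1,0,1,1,0},
        {1,1,0,1,1},
        {0,1,1,0,1},
        {0,0,1,1,0},
    };

    int main(void) {
        int removed[N] = {0}, stack[N], sp = 0, color[N];

        /* Simplify: repeatedly remove a node with fewer than K live neighbors. */
        for (int pass = 0; pass < N; pass++) {
            int picked = -1;
            for (int i = 0; i < N && picked < 0; i++) {
                if (removed[i]) continue;
                int deg = 0;
                for (int j = 0; j < N; j++)
                    if (!removed[j] && interfere[i][j]) deg++;
                if (deg < K) picked = i;
            }
            if (picked < 0) { printf("stuck: would spill a register\n"); return 1; }
            removed[picked] = 1;
            stack[sp++] = picked;
        }

        /* Select: pop nodes, giving each the lowest color no neighbor uses. */
        while (sp > 0) {
            int i = stack[--sp];
            removed[i] = 0;
            int used[K] = {0};
            for (int j = 0; j < N; j++)
                if (!removed[j] && interfere[i][j]) used[color[j]] = 1;
            for (int c = 0; c < K; c++)
                if (!used[c]) { color[i] = c; break; }
        }
        for (int i = 0; i < N; i++) printf("v%d -> r%d\n", i, color[i]);
        return 0;
    }

The greedy order is a heuristic: it finds a 3-coloring here, but for some graphs it gives up and spills even when an optimal allocator would not. That is exactly the "approximate solutions to NP-complete problems" trade-off the slide describes.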
Traditional three-pass compiler
[Diagram: source code -> Front End -> IR -> Middle End -> IR -> Back End -> machine code, with errors reported from each stage.]
Code improvement (or "optimization", a well-understood misnomer):
- Analyzes the IR and rewrites (or transforms) it
- Primary goal is to reduce running time
- May also improve space, power consumption, ...
- Must preserve the "meaning" of the code
Next semester...

The optimizer (or middle end)
[Diagram: the IR flows through a series of passes, Opt 1, Opt 2, Opt 3, ..., Opt n, with errors reported.]
Modern optimizers are structured as a series of passes. Typical transformations:
- Discover and propagate a constant value
- Move a computation to a less frequently executed place
- Specialize a computation based on context
- Discover a redundant computation and remove it
- Remove useless or unreachable code
- Encode an idiom in some particularly efficient form

Optimization example: array accesses

    for (i = 0; i < N; i++)
      for (j = 0; j < M; j++)
        A[i][j] = A[i][j] + C;

The subscript is first lowered into address arithmetic:

    for (i = 0; i < N; i++)
      for (j = 0; j < M; j++) {
        t0 = &A + (i * M) + j;
        (*t0) += C;
      }

Moving the loop-invariant i * M out of the inner loop:

    for (i = 0; i < N; i++) {
      t1 = i * M;
      for (j = 0; j < M; j++) {
        t0 = &A + t1 + j;
        (*t0) += C;
      }
    }

Replacing the multiply with a running sum, incremented once per outer iteration:

    t1 = 0;
    for (i = 0; i < N; i++) {
      for (j = 0; j < M; j++) {
        t0 = &A + t1 + j;
        (*t0) += C;
      }
      t1 = t1 + M;
    }

Role of the run-time system
- Memory management services: allocate (in the heap or in an activation record, i.e., a stack frame), deallocate, collect garbage
- Run-time type checking
- Error processing (exception handling)
- Interface to the operating system: input and output
- Support of parallelism: parallel thread initiation, communication and synchronization

Related systems
- Interpreters: scripting languages, XML processing
- Just-in-time compilers: the Java JVM, Microsoft's CLR
- Binary instrumentation: PIN, valgrind

A bit about my research: getting more out of compilers
- High-level optimizations: library-level optimization
- Checking for errors and security vulnerabilities: approximating vulnerability
- Cooperation with the run-time system: compiler-assisted garbage collection
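As a closing sketch of the "series of passes" structure (hypothetical: the quadruple IR and pass functions below are invented for illustration, not the course's infrastructure), here is a toy middle end in C that runs two passes in sequence: constant folding with forward propagation, then dead-code elimination.

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NQ 4
    typedef struct { char dst[4], lhs[8], rhs[8]; char op; int dead; } Quad;

    static Quad code[NQ] = {             /* toy input program          */
        {"t0", "4",  "8", '*', 0},       /* t0 = 4 * 8                 */
        {"t1", "t0", "x", '+', 0},       /* t1 = t0 + x                */
        {"t2", "t0", "2", '*', 0},       /* t2 = t0 * 2  (never used)  */
        {"y",  "t1", "",  0,   0},       /* y  = t1  (op 0 means copy) */
    };

    static int is_num(const char *s) { return *s && isdigit((unsigned char)*s); }

    /* Pass 1: fold constant operations, then propagate the results forward. */
    static void fold_and_propagate(void) {
        for (int i = 0; i < NQ; i++) {
            Quad *q = &code[i];
            if (q->op && is_num(q->lhs) && is_num(q->rhs)) {
                int v = q->op == '*' ? atoi(q->lhs) * atoi(q->rhs)
                                     : atoi(q->lhs) + atoi(q->rhs);
                sprintf(q->lhs, "%d", v);
                q->op = 0; q->rhs[0] = '\0';         /* becomes dst = v */
            }
            if (q->op == 0 && is_num(q->lhs))        /* propagate dst = const */
                for (int j = i + 1; j < NQ; j++) {
                    if (!strcmp(code[j].lhs, q->dst)) strcpy(code[j].lhs, q->lhs);
                    if (!strcmp(code[j].rhs, q->dst)) strcpy(code[j].rhs, q->lhs);
                }
        }
    }

    /* Pass 2: delete definitions whose result is never used ("y" is live out). */
    static void dead_code_elim(void) {
        for (int i = NQ - 1; i >= 0; i--) {
            Quad *q = &code[i];
            if (!strcmp(q->dst, "y")) continue;
            int used = 0;
            for (int j = i + 1; j < NQ; j++)
                if (!code[j].dead && (!strcmp(code[j].lhs, q->dst) ||
                                      !strcmp(code[j].rhs, q->dst))) used = 1;
            q->dead = !used;
        }
    }

    int main(void) {
        fold_and_propagate();
        dead_code_elim();
        for (int i = 0; i < NQ; i++)
            if (!code[i].dead) {
                if (code[i].op) printf("%s = %s %c %s\n", code[i].dst,
                                       code[i].lhs, code[i].op, code[i].rhs);
                else            printf("%s = %s\n", code[i].dst, code[i].lhs);
            }
        return 0;
    }

The first pass folds t0 = 4 * 8 to t0 = 32 and propagates the constant into its uses; the second pass then deletes the definitions of t0 and t2, which are no longer needed. The output is just t1 = 32 + x and y = t1: the flavor of the transformations listed above, at miniature scale.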