Compiler Construction

Total pages: 16

File type: PDF, size: 1020 KB

PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 10 Dec 2011 02:23:02 UTC

Contents

Introduction (p. 1): Compiler construction 1 · Compiler 2 · Interpreter 10 · History of compiler writing 14

Lexical analysis (p. 22): Lexical analysis 22 · Regular expression 26 · Regular expression examples 37 · Finite-state machine 41 · Preprocessor 51

Syntactic analysis (p. 54): Parsing 54 · Lookahead 58 · Symbol table 61 · Abstract syntax 63 · Abstract syntax tree 64 · Context-free grammar 65 · Terminal and nonterminal symbols 77 · Left recursion 79 · Backus–Naur Form 83 · Extended Backus–Naur Form 86 · TBNF 91 · Top-down parsing 91 · Recursive descent parser 93 · Tail recursive parser 98 · Parsing expression grammar 100 · LL parser 106 · LR parser 114 · Parsing table 123 · Simple LR parser 125 · Canonical LR parser 127 · GLR parser 129 · LALR parser 130 · Recursive ascent parser 133 · Parser combinator 140 · Bottom-up parsing 143 · Chomsky normal form 148 · CYK algorithm 150 · Simple precedence grammar 153 · Simple precedence parser 154 · Operator-precedence grammar 156 · Operator-precedence parser 159 · Shunting-yard algorithm 163 · Chart parser 173 · Earley parser 174 · The lexer hack 178 · Scannerless parsing 180

Semantic analysis (p. 182): Attribute grammar 182 · L-attributed grammar 184 · LR-attributed grammar 185 · S-attributed grammar 185 · ECLR-attributed grammar 186 · Intermediate language 186 · Control flow graph 188 · Basic block 190 · Call graph 192 · Data-flow analysis 195 · Use-define chain 201 · Live variable analysis 204 · Reaching definition 206 · Three address code 207 · Static single assignment form 209 · Dominator 215 · C3 linearization 217 · Intrinsic function 218 · Aliasing 219 · Alias analysis 221 · Array access analysis 223 · Pointer analysis 223 · Escape analysis 224 · Shape analysis 225 · Loop dependence analysis 227 · Program slicing 230

Code optimization (p. 233): Compiler optimization 233 · Peephole optimization 244 · Copy propagation 247 · Constant folding 248 · Sparse conditional constant propagation 250 · Common subexpression elimination 251 · Partial redundancy elimination 252 · Global value numbering 253 · Strength reduction 254 · Bounds-checking elimination 265 · Inline expansion 266 · Return value optimization 269 · Dead code 272 · Dead code elimination 273 · Unreachable code 275 · Redundant code 278 · Jump threading 279 · Superoptimization 279 · Loop optimization 280 · Induction variable 282 · Loop fission 285 · Loop fusion 286 · Loop inversion 287 · Loop interchange 289 · Loop-invariant code motion 290 · Loop nest optimization 291 · Manifest expression 295 · Polytope model 296 · Loop unwinding 298 · Loop splitting 305 · Loop tiling 306 · Loop unswitching 308 · Interprocedural optimization 309 · Whole program optimization 313 · Adaptive optimization 313 · Lazy evaluation 314 · Partial evaluation 318 · Profile-guided optimization 320 · Automatic parallelization 320 · Loop scheduling 322 · Vectorization 323 · Superword Level Parallelism 331

Code generation (p. 332): Code generation 332 · Name mangling 334 · Register allocation 343 · Chaitin's algorithm 345 · Rematerialization 346 · Sethi-Ullman algorithm 347 · Data structure alignment 349 · Instruction selection 357 · Instruction scheduling 358 · Software pipelining 360 · Trace scheduling 364 · Just-in-time compilation 364 · Bytecode 368 · Dynamic compilation 370 · Dynamic recompilation 371 · Object file 373 · Code segment 374 · Data segment 374 · .bss 376 · Literal pool 377 · Overhead code 377 · Link time 378 · Relocation 378 · Library 380 · Static build 388 · Architecture Neutral Distribution Format 389

Development techniques (p. 391): Bootstrapping 391 · Compiler correctness 392 · Jensen's Device 394 · Man or boy test 395 · Cross compiler 397 · Source-to-source compiler 403
Tools (p. 405): Compiler-compiler 405 · PQCC 407 · Compiler Description Language 408 · Comparison of regular expression engines 410 · Comparison of parser generators 416 · Lex 427 · flex lexical analyser 430 · Quex 437 · JLex 440 · Ragel 441 · yacc 442 · Berkeley Yacc 443 · ANTLR 444 · GNU bison 446 · Coco/R 456 · GOLD 458 · JavaCC 463 · JetPAG 464 · Lemon Parser Generator 467 · ROSE compiler framework 468 · SableCC 470 · Scannerless Boolean Parser 471 · Spirit Parser Framework 472 · S/SL programming language 474 · SYNTAX 475 · Syntax Definition Formalism 476 · TREE-META 478 · Frameworks supporting the polyhedral model 480

Case studies (p. 485): GNU Compiler Collection 485 · Java performance 495

Literature (p. 505): Compilers: Principles, Techniques, and Tools 505 · Principles of Compiler Design 507 · The Design of an Optimizing Compiler 507

References: Article Sources and Contributors 508 · Image Sources, Licenses and Contributors 517

Article Licenses: License 518

Introduction

Compiler construction

Compiler construction is an area of computer science that deals with the theory and practice of developing programming languages and their associated compilers. The theoretical portion is primarily concerned with the syntax, grammar, and semantics of programming languages, which gives this area a strong tie with linguistics. Some courses on compiler construction include a simplified grammar of a spoken language that can be used to form valid sentences, as an analogy to help students understand how grammar works for programming languages. The practical portion covers the actual implementation of compilers for languages. Students will typically end up writing the front end of a compiler for a small teaching language, such as Micro.

Subfields
• Parsing
• Program analysis
• Program transformation
• Compiler or program optimization
• Code generation

Further reading
• Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools.
• Michael Wolfe. High-Performance Compilers for Parallel Computing. ISBN 978-0805327304

External links
• Let's Build a Compiler, by Jack Crenshaw [1], a tutorial on compiler construction.

References
[1] http://compilers.iecc.com/crenshaw/

Compiler

A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program.

[Figure: a diagram of the operation of a typical multi-language, multi-target compiler]

The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower-level language (e.g., assembly language or machine code). If the compiled program can run on a computer whose CPU or operating system is different from the one on which the compiler runs, the compiler is known as a cross-compiler. A program that translates from a low-level language to a higher-level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source-to-source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language.
A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis (syntax-directed translation), code generation, and code optimization. Program faults caused by incorrect compiler behavior can be very difficult to track down and work around; therefore, compiler implementors invest a lot of time ensuring the correctness of their software. The term compiler-compiler is sometimes used to refer to a parser generator, a tool often used to help create the lexer and parser.

History

Software for early computers was primarily written in assembly language for many years. Higher-level programming languages were not invented until the benefits of being able to reuse software on different kinds of CPUs started to become significantly greater than the cost of writing a compiler. The very limited memory capacity of early computers also created many technical problems when implementing a compiler.

Towards the end of the 1950s, machine-independent programming languages were first proposed. Subsequently, several experimental compilers were developed. The first compiler was written by Grace Hopper, in 1952, for the A-0 programming language. The FORTRAN team led by John Backus at IBM is generally credited as having introduced the first complete compiler, in 1957. COBOL was an early language to be compiled on multiple architectures, in 1960.[1]

In many application domains the idea of using a higher-level language quickly caught on. Because of the expanding functionality supported by newer programming languages and the increasing complexity of computer architectures, compilers have become more and more complex. Early compilers were written in assembly language. The first self-hosting compiler — capable of compiling its own source code in a high-level language — was created for Lisp by Tim Hart and Mike Levin at MIT in 1962.[2] Since the 1970s it has become common practice to implement a compiler in the language it compiles, although both Pascal and C have been popular choices for implementation language. Building a self-hosting compiler is a bootstrapping problem: the first such compiler for a language must be compiled either by a compiler written in a different language, or (as in Hart and Levin's Lisp compiler) by running the compiler in an interpreter.

Compilers in education

Compiler construction and compiler optimization are taught at universities and schools as part of the computer science curriculum. Such courses are usually supplemented with the implementation of a compiler for an educational programming language.
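Picking up the list of compiler operations above, here is a toy end-to-end pipeline in Python covering three of them (lexical analysis, parsing, and code generation) for integer expressions with + and *, targeting a small stack machine. This is a minimal illustrative sketch, not the design of any particular compiler:

import re

def lex(src):                                # lexical analysis
    return re.findall(r"\d+|[+*]", src)

def parse(tokens):                           # parsing: precedence-aware AST
    def expr(pos):
        lhs, pos = term(pos)
        while pos < len(tokens) and tokens[pos] == "+":
            rhs, pos = term(pos + 1)
            lhs = ("+", lhs, rhs)
        return lhs, pos
    def term(pos):
        lhs, pos = ("num", int(tokens[pos])), pos + 1
        while pos < len(tokens) and tokens[pos] == "*":
            rhs = ("num", int(tokens[pos + 1]))
            lhs, pos = ("*", lhs, rhs), pos + 2
        return lhs, pos
    return expr(0)[0]

def codegen(node, out):                      # code generation: stack machine
    if node[0] == "num":
        out.append(f"PUSH {node[1]}")
    else:
        codegen(node[1], out)                # emit operands first,
        codegen(node[2], out)                # then the operator
        out.append("ADD" if node[0] == "+" else "MUL")
    return out

print(codegen(parse(lex("2+3*4")), []))
# ['PUSH 2', 'PUSH 3', 'PUSH 4', 'MUL', 'ADD']

The parser gives * higher precedence than +, and the code generator emits operands before their operator, as a stack machine requires.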
Recommended publications
  • Type Inference on Executables
    Type Inference on Executables. Juan Caballero, IMDEA Software Institute; Zhiqiang Lin, University of Texas at Dallas.

    In many applications, the source code and debugging symbols of a target program are not available; the program executable is all we can access. A fundamental challenge with executables is that critical information such as variables and types is lost during compilation. Given that typed variables provide fundamental semantics of a program, for the last 16 years a large amount of research has been carried out on binary code type inference, a challenging task that aims to infer typed variables from executables (also referred to as binary code). In this article we systematize the area of binary code type inference according to its most important dimensions: the applications that motivate its importance, the approaches used, the types that those approaches infer, the implementation of those approaches, and how the inference results are evaluated. We also discuss limitations, point to underdeveloped problems and open challenges, and propose further applications.

    Categories and Subject Descriptors: D.3.3 [Language Constructs and Features]: Data types and structures; D.4.6 [Operating Systems]: Security and Protection. General Terms: Languages, Security. Additional Key Words and Phrases: type inference, program executables, binary code analysis.

    ACM Reference Format: Juan Caballero and Zhiqiang Lin, 2015. Type Inference on Executables. ACM Comput. Surv. V, N, Article A (January YYYY), 35 pages. DOI: http://dx.doi.org/10.1145/0000000.0000000

    1. INTRODUCTION. Being the final deliverable of software, executables (or binary code, as we use both terms interchangeably) are everywhere. They contain the final code that runs on a system and truly represent the program behavior.
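The abstract's idea, recovering types from how registers are used, can be made concrete with a toy constraint-propagation pass. Everything here (the pseudo-IR, the tiny lattice, the rules) is a hypothetical sketch for illustration, not the approach of any surveyed system:

def join(a, b):
    """Least upper bound in the lattice: unknown < {int, ptr} < conflict."""
    if a == "unknown": return b
    if b == "unknown": return a
    return a if a == b else "conflict"

# (opcode, destination, (operand kind, operand)) -- an invented mini-IR
code = [
    ("load", "r2", ("mem", "r1")),   # r2 = *r1  => r1 is used as a pointer
    ("add",  "r3", ("reg", "r2")),   # r3 += r2  => r2 is used arithmetically
]

types = {}
def note(reg, t):
    types[reg] = join(types.get(reg, "unknown"), t)

for op, dst, (kind, src) in code:
    if op == "load" and kind == "mem":
        note(src, "ptr")             # a dereferenced operand must be a pointer
        note(dst, "int")             # simplification: loaded values are scalars
    elif op == "add":
        note(dst, "int")
        if kind == "reg":
            note(src, "int")

print(types)  # {'r1': 'ptr', 'r2': 'int', 'r3': 'int'}

Dereferencing r1 forces it to ptr, while arithmetic uses drive r2 and r3 to int; a real system would consume many more type sources (calling conventions, known library signatures) and resolve conflicts more gracefully.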
  • Parsing Techniques (IJRDO Journal of Computer Science and Engineering)
    Parsing Techniques. Ojesvi Bhardwaj*. IJRDO Journal of Computer Science and Engineering, ISSN: 2456-1843, Volume 1, Issue 12, December 2015.

    Abstract: 'Parsing' is the term used to describe the process of automatically building syntactic analyses of a sentence in terms of a given grammar and lexicon. The resulting syntactic analyses may be used as input to a process of semantic interpretation (or perhaps phonological interpretation, where aspects of this, like prosody, are sensitive to syntactic structure). Occasionally, 'parsing' is also used to include both syntactic and semantic analysis. We use it in the more conservative sense here, however. In most contemporary grammatical formalisms, the output of parsing is something logically equivalent to a tree, displaying dominance and precedence relations between constituents of a sentence, perhaps with further annotations in the form of attribute-value equations ('features') capturing other aspects of linguistic description. However, there are many different possible linguistic formalisms, and many ways of representing each of them, and hence many different ways of representing the results of parsing. We shall assume here a simple tree representation and an underlying context-free grammar (CFG) formalism.

    *Student, CSE, Dronacharya College of Engineering, Gurgaon

    1. INTRODUCTION. Parsing or syntactic analysis is the process of analyzing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. The term parsing comes from Latin pars (ōrātiōnis), meaning part (of speech). The term has slightly different meanings in different branches of linguistics and computer science.
  • An Efficient Implementation of the Head-Corner Parser
    An Efficient Implementation of the Head-Corner Parser. Gertjan van Noord, Rijksuniversiteit Groningen.

    This paper describes an efficient and robust implementation of a bidirectional, head-driven parser for constraint-based grammars. This parser is developed for the OVIS system: a Dutch spoken dialogue system in which information about public transport can be obtained by telephone. After a review of the motivation for head-driven parsing strategies, and head-corner parsing in particular, a nondeterministic version of the head-corner parser is presented. A memoization technique is applied to obtain a fast parser. A goal-weakening technique is introduced, which greatly improves average-case efficiency, both in terms of speed and of space requirements. I argue in favor of such a memoization strategy with goal-weakening in comparison with ordinary chart parsers because such a strategy can be applied selectively and therefore enormously reduces the space requirements of the parser, while no practical loss in time-efficiency is observed. On the contrary, experiments are described in which head-corner and left-corner parsers implemented with selective memoization and goal weakening outperform "standard" chart parsers. The experiments include the grammar of the OVIS system and the Alvey NL Tools grammar. Head-corner parsing is a mix of bottom-up and top-down processing. Certain approaches to robust parsing require purely bottom-up processing, so it seems that head-corner parsing is unsuitable for such robust parsing techniques. However, it is shown how underspecification (which arises very naturally in a logic programming environment) can be used in the head-corner parser to allow such robust parsing techniques.
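The memoization idea central to the abstract, caching parse results keyed by (goal nonterminal, input position) so that re-derivations of the same goal cost nothing, can be sketched in a few lines of Python. The toy grammar and names are illustrative, not taken from the OVIS implementation:

from functools import lru_cache

TOKENS = ("d", "n", "v", "d", "n")   # toy part-of-speech-tagged input

@lru_cache(maxsize=None)             # memo table: (function, position) -> result
def parse_np(i):
    """NP -> d n ; returns the end position of the NP, or None."""
    if TOKENS[i:i + 2] == ("d", "n"):
        return i + 2
    return None

@lru_cache(maxsize=None)
def parse_s(i):
    """S -> NP v NP"""
    j = parse_np(i)
    if j is not None and j < len(TOKENS) and TOKENS[j] == "v":
        return parse_np(j + 1)
    return None

print(parse_s(0) == len(TOKENS))     # True: the whole input is an S

The paper's point is that, unlike a chart parser which records everything, such caching can be applied selectively to only the expensive goals.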
  • Derivatives of Parsing Expression Grammars
    Derivatives of Parsing Expression Grammars. Aaron Moss, Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada. [email protected]

    This paper introduces a new derivative parsing algorithm for recognition of parsing expression grammars. Derivative parsing is shown to have a polynomial worst-case time bound, an improvement on the exponential bound of the recursive descent algorithm. This work also introduces asymptotic analysis based on inputs with a constant bound on both grammar nesting depth and number of backtracking choices; derivative and recursive descent parsing are shown to run in linear time and constant space on this useful class of inputs, with both the theoretical bounds and the reasonability of the input class validated empirically. This common-case constant memory usage of derivative parsing is an improvement on the linear space required by the packrat algorithm.

    1. Introduction. Parsing expression grammars (PEGs) are a parsing formalism introduced by Ford [6]. Any LR(k) language can be represented as a PEG [7], but there are some non-context-free languages that may also be represented as PEGs (e.g. aⁿbⁿcⁿ [7]). Unlike context-free grammars (CFGs), PEGs are unambiguous, admitting no more than one parse tree for any grammar and input. PEGs are a formalization of recursive descent parsers allowing limited backtracking and infinite lookahead; a string in the language of a PEG can be recognized in exponential time and linear space using a recursive descent algorithm, or in linear time and space using the memoized packrat algorithm [6]. PEGs are formally defined and these algorithms outlined in Section 3.
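The derivative idea the paper lifts to PEGs is easiest to see on plain regular expressions, a simpler setting than full PEGs: the derivative of an expression r with respect to a character c matches exactly the strings s such that r matches c followed by s. A minimal Brzozowski-derivative matcher as an illustrative sketch:

# Regex representation: None = empty set, "" = epsilon, one-char strings,
# ("cat", a, b), ("alt", a, b), ("star", a).
def nullable(r):
    """Does r match the empty string?"""
    if r is None: return False
    if r == "": return True
    if isinstance(r, str): return False
    tag = r[0]
    if tag == "cat": return nullable(r[1]) and nullable(r[2])
    if tag == "alt": return nullable(r[1]) or nullable(r[2])
    if tag == "star": return True

def deriv(r, c):
    """Derivative of r with respect to character c."""
    if r is None or r == "": return None
    if isinstance(r, str): return "" if r == c else None
    tag = r[0]
    if tag == "alt": return ("alt", deriv(r[1], c), deriv(r[2], c))
    if tag == "star": return ("cat", deriv(r[1], c), r)
    if tag == "cat":
        d = ("cat", deriv(r[1], c), r[2])
        return ("alt", d, deriv(r[2], c)) if nullable(r[1]) else d

def matches(r, s):
    for c in s:
        r = deriv(r, c)
    return nullable(r)

ab_star = ("star", ("cat", "a", "b"))   # (ab)*
print(matches(ab_star, "abab"))         # True
print(matches(ab_star, "aba"))          # False

The paper's contribution is making this style of stepwise differentiation work for PEG operators (ordered choice, lookahead) within a polynomial bound.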
  • Redundancy Elimination and Common Subexpression Elimination
    Redundancy Elimination

    Aim: eliminate redundant operations in dynamic execution. Why do redundancies occur? Loop-invariant code (e.g. a constant assignment in a loop), or the same expression computed more than once (e.g. in addressing arithmetic). Value numbering is one example; dataflow analysis is required. Other related optimizations: common subexpression elimination, loop-invariant code motion, partial redundancy elimination.

    Common Subexpression Elimination

    Replace recomputation of an expression by a use of a temporary that holds its value:

    (s1) y := a + b          (s1)  temp := a + b
    (s2) z := a + b   ==>    (s1') y := temp
                             (s2)  z := temp

    When is this illegal? How does it differ from value numbering? Consider:

    (s1) read(i)
    (s2) j := i + 1
    (s3) k := i
    (s4) l := k + 1

    Here i + 1 and k + 1 receive the same value number, but there is no textual common subexpression, so CSE does not apply. Why is the temporary needed?

    Local and global. Local CSE works within one basic block:

    (s1) c := a + b          (s1)  t1 := a + b
    (s2) d := m & n          (s1') c := t1
    (s3) e := a + b   ==>    (s2)  d := m & n
    (s4) m := 5              (s3)  e := t1
    (s5) if (m & n) ...      (s4)  m := 5
                             (s5)  if (m & n) ...

    Before: 5 instructions, 4 operations, 7 variables. After: 6 instructions, 3 operations, 8 variables. Is it always better? Method: keep track of the expressions computed in the block whose operands have not changed value, in a CSE hash table holding (+, a, b) and (&, m, n). Note that (s5) must still recompute m & n, because m is redefined at (s4).

    Global CSE example (assuming b is used later):

    i := j                 i := j
    a := 4*i               t := 4*i
    i := i + 1     ==>     a := t
    b := 4*i               i := i + 1
    c := 4*i               t := 4*i
                           b := t
                           c := t

    Global CSE: an expression e is available at entry to B if on every path p from Entry to B there is an evaluation of e at some B' on p whose operand values are not redefined between B' and B.
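A sketch of the local hash-table method just described, as a Python pass over an invented tuple form of three-address code. For simplicity it reuses the first result variable as the value holder instead of introducing a fresh temporary:

def local_cse(block):
    avail = {}   # (op, a, b) -> variable currently holding that value
    out = []
    for dst, op, a, b in block:
        key = (op, a, b)
        if key in avail:
            out.append((dst, "copy", avail[key], None))   # reuse earlier result
        else:
            out.append((dst, op, a, b))
        # dst is redefined: kill every expression that reads dst,
        # or whose cached result lived in dst
        avail = {k: v for k, v in avail.items()
                 if v != dst and dst not in (k[1], k[2])}
        if key not in avail and dst not in (a, b):
            avail[key] = dst
    return out

block = [
    ("c", "+", "a", "b"),
    ("d", "&", "m", "n"),
    ("e", "+", "a", "b"),      # redundant: same as the value in c
    ("m", "const", 5, None),
    ("t", "&", "m", "n"),      # NOT redundant: m was redefined
]
for instr in local_cse(block):
    print(instr)

On the example block from the notes it rewrites e := a + b into a copy from c, but leaves the second m & n computation alone because m changed in between.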
  • The IELR(1) Algorithm for Generating Minimal LR(1) Parser Tables for Non-LR(1) Grammars with Conflict Resolution
    The IELR(1) algorithm for generating minimal LR(1) parser tables for non-LR(1) grammars with conflict resolution. Joel E. Denny and Brian A. Malloy, School of Computing, Clemson University, Clemson, SC 29634, USA. Science of Computer Programming 75 (2010) 943–979.

    Abstract: There has been a recent effort in the literature to reconsider grammar-dependent software development from an engineering point of view. As part of that effort, we examine a deficiency in the state of the art of practical LR parser table generation. Specifically, LALR sometimes generates parser tables that do not accept the full language that the grammar developer expects, but canonical LR is too inefficient to be practical, particularly during grammar development. In response, many researchers have attempted to develop minimal LR parser table generation algorithms. In this paper, we demonstrate that a well-known algorithm described by David Pager and implemented in Menhir, the most robust minimal LR(1) implementation we have discovered, does not always achieve the full power of canonical LR(1) when the given grammar is non-LR(1) coupled with a specification for resolving conflicts. We also detail an original minimal LR(1) algorithm, IELR(1) (Inadequacy Elimination LR(1)), which we have implemented as an extension of GNU Bison and which does not exhibit this deficiency.

    Keywords: grammarware, canonical LR, LALR, minimal LR, Yacc, Bison.
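Common to LALR, canonical LR(1), and IELR(1) is building parser states from sets of items. As background for the abstract above, here is a minimal sketch of the LR(0) closure step in Python, with a toy grammar; this illustrates the shared machinery, not the IELR algorithm itself:

GRAMMAR = {
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}

def closure(items):
    """items: set of (lhs, rhs, dot). Add predicted items until a fixpoint."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:   # nonterminal after the dot
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return items

# Initial parser state, from the augmented start item S' -> . S
for item in sorted(closure({("S'", ("S",), 0)})):
    print(item)

The table-generation algorithms then differ in how they merge such states and attach lookaheads, which is exactly where LALR loses precision and IELR(1) recovers it.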
  • Chapter 4 Syntax Analysis
    Chapter 4: Syntax Analysis (by Varun Arora)

    Outline: the role of the parser; context-free grammars; top-down parsing; bottom-up parsing; parser generators.

    The role of the parser: the lexical analyzer reads the source program and supplies tokens to the parser on getNextToken requests; the parser produces a parse tree for the rest of the front end, which builds the intermediate representation. Both phases share the symbol table.

    Uses of grammars. An expression grammar:

    E -> E + T | T
    T -> T * F | F
    F -> (E) | id

    and its left-factored, non-left-recursive equivalent:

    E  -> T E'
    E' -> + T E' | ε
    T  -> F T'
    T' -> * F T' | ε
    F  -> (E) | id

    Error handling. Common programming errors: lexical errors, syntactic errors, semantic errors. Error handler goals: report the presence of errors clearly and accurately; recover from each error quickly enough to detect subsequent errors; add minimal overhead to the processing of correct programs.

    Error-recovery strategies: panic mode recovery (discard input symbols one at a time until one of a designated set of synchronization tokens is found); phrase-level recovery (replace a prefix of the remaining input by some string that allows the parser to continue); error productions (augment the grammar with productions that generate the erroneous constructs); global correction (choose a minimal sequence of changes to obtain a globally least-cost correction).

    Context-free grammars consist of terminals, nonterminals, a start symbol, and productions:

    expression -> expression + term
    expression -> expression - term
    expression -> term
    term -> term * factor
    term -> term / factor
    term -> factor
    factor -> (expression)
    factor -> id

    Derivations: productions are treated as
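The left-factored grammar above is exactly what makes top-down (recursive-descent) parsing possible. A minimal Python recognizer for it, with single-character tokens and 'x' standing for id; an illustrative sketch, not code from the chapter:

class Parser:
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else "$"

    def eat(self, t):
        if self.peek() != t:
            raise SyntaxError(f"expected {t!r}, got {self.peek()!r} at {self.pos}")
        self.pos += 1

    # One method per nonterminal; epsilon alternatives become "do nothing".
    def E(self):
        self.T(); self.Eprime()

    def Eprime(self):
        if self.peek() == "+":
            self.eat("+"); self.T(); self.Eprime()

    def T(self):
        self.F(); self.Tprime()

    def Tprime(self):
        if self.peek() == "*":
            self.eat("*"); self.F(); self.Tprime()

    def F(self):
        if self.peek() == "(":
            self.eat("("); self.E(); self.eat(")")
        else:
            self.eat("x")

p = Parser(list("x+x*(x+x)"))
p.E(); p.eat("$")
print("accepted")

Note how each lookahead decision uses a single token, which is why the grammar had to be left-factored and freed of left recursion first.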
  • Integrating Program Optimizations and Transformations with the Scheduling of Instruction Level Parallelism
    Integrating Program Optimizations and Transformations with the Scheduling of Instruction Level Parallelism. David A. Berson 1, Pohua Chang 1, Rajiv Gupta 2, Mary Lou Soffa 2. 1 Intel Corporation, Santa Clara, CA 95052. 2 University of Pittsburgh, Pittsburgh, PA 15260.

    Abstract. Code optimizations and restructuring transformations are typically applied before scheduling to improve the quality of generated code. However, in some cases, the optimizations and transformations do not lead to a better schedule or may even adversely affect the schedule. In particular, optimizations for redundancy elimination and restructuring transformations for increasing parallelism are often accompanied by an increase in register pressure. Therefore their application in situations where register pressure is already too high may result in the generation of additional spill code. In this paper we present an integrated approach to scheduling that enables the selective application of optimizations and restructuring transformations by the scheduler when it determines their application to be beneficial. The integration is necessary because information that is used to determine the effects of optimizations and transformations on the schedule is only available during instruction scheduling. Our integrated scheduling approach is applicable to various types of global scheduling techniques; in this paper we present an integrated algorithm for scheduling superblocks.

    1. Introduction. Compilers for multiple-issue architectures, such as superscalar and very long instruction word (VLIW) architectures, are typically divided into phases, with code optimizations, scheduling and register allocation being the latter phases. The importance of integrating these latter phases is growing with the recognition that the quality of code produced for parallel systems can be greatly improved through the sharing of information.
  • Earley Parsing (Informatics 2A: Lecture 21)
    Earley Parsing. Informatics 2A: Lecture 21. Shay Cohen. 3 November 2017.

    Outline: 1. The CYK chart as a graph; what's wrong with CYK; adding prediction to the chart. 2. The Earley parsing algorithm: the Predictor operator, the Scanner operator, the Completer operator; Earley parsing example; comparing Earley and CYK.

    A note about CYK: the CYK algorithm parses input strings in Chomsky normal form. Can you see how to change it to an algorithm with an arbitrary RHS length (of only nonterminals)? We would have to split a given span into all possible subspans according to the length of the RHS. What is the complexity of such an algorithm? Still O(n^2) charts, but now it takes O(n^(k-1)) time to process each cell, where k is the maximal length of an RHS. Therefore: O(n^(k+1)). For CYK, k = 2. Can we do better than that? The point of the Earley algorithm is to avoid this, by only building constituents that are compatible with the input read so far.
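For comparison with the complexity discussion above, a compact CYK recognizer over a grammar in Chomsky normal form; the grammar here (for the language aⁿbⁿ) and all names are illustrative, not from the lecture:

from itertools import product

# CNF grammar for { a^n b^n : n >= 1 }:
#   S -> A B | A C,  C -> S B,  A -> a,  B -> b
binary = {("A", "B"): {"S"}, ("A", "C"): {"S"}, ("S", "B"): {"C"}}
unary = {"a": {"A"}, "b": {"B"}}

def cyk(s):
    n = len(s)
    if n == 0:
        return False
    # table[i][j] = set of nonterminals deriving s[i : i+j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):
        table[i][0] = unary.get(ch, set())
    for span in range(2, n + 1):            # substring length
        for i in range(n - span + 1):       # start position
            for k in range(1, span):        # split point
                for x, y in product(table[i][k - 1], table[i + k][span - k - 1]):
                    table[i][span - 1] |= binary.get((x, y), set())
    return "S" in table[0][n - 1]

print(cyk("aabb"))   # True
print(cyk("aab"))    # False

Every cell is filled regardless of whether it can contribute to a full parse; that blind bottom-up filling is exactly what Earley's prediction step avoids.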
  • Targeting Embedded PowerPC
    CodeWarrior™ Development Studio, PowerPC™ ISA Communications Processors Edition: Targeting Manual. Freescale Semiconductor, Inc. Revised: 28 March 2005. For more information: www.freescale.com

    Metrowerks, the Metrowerks logo, and CodeWarrior are trademarks or registered trademarks of Metrowerks Corporation in the United States and/or other countries. All other trade names and trademarks are the property of their respective owners. Copyright © 2005 by Metrowerks, a Freescale Semiconductor company. All rights reserved. No portion of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, without prior written permission from Metrowerks. Use of this document and related materials is governed by the license agreement that accompanied the product to which this manual pertains. This document may be printed for non-commercial personal use only in accordance with the aforementioned license agreement. If you do not have a copy of the license agreement, contact your Metrowerks representative or call 1-800-377-5416 (if outside the U.S., call +1-512-996-5300). Metrowerks reserves the right to make changes to any product described or referred to in this document without further notice. Metrowerks makes no warranty, representation or guarantee regarding the merchantability or fitness of its products for any particular purpose, nor does Metrowerks assume any liability arising
  • LATE Ain't Earley: A Faster Parallel Earley Parser
    LATE Ain't Earley: A Faster Parallel Earley Parser. Peter Ahrens, John Feser, Joseph Hui. [email protected], [email protected], [email protected]. July 18, 2018.

    Abstract. We present the LATE algorithm, an asynchronous variant of the Earley algorithm for parsing context-free grammars. The Earley algorithm is naturally task-based, but is difficult to parallelize because of dependencies between the tasks. LATE uses additional data structures to maintain information about the state of the parse so that work items may be processed in any order. This property allows the LATE algorithm to be sped up using task parallelism. We show that the LATE algorithm can achieve a 120x speedup over the Earley algorithm on a natural language task.

    1. Introduction. Improvements in the efficiency of parsers for context-free grammars (CFGs) have the potential to speed up applications in software development, computational linguistics, and human-computer interaction. The Earley parser has an asymptotic complexity that scales with the complexity of the CFG, a unique, desirable trait among parsers for arbitrary CFGs. However, while the more commonly used Cocke-Younger-Kasami (CYK) [2, 5, 12] parser has been successfully parallelized [1, 7], the Earley algorithm has seen relatively few attempts at parallelization. Our research objectives were to understand when there exists parallelism in the Earley algorithm, and to explore methods for exploiting this parallelism. We first tried to naively parallelize the Earley algorithm by processing the Earley items in each Earley set in parallel. We found that this approach does not produce any speedup, because the dependencies between Earley items force much of the work to be performed sequentially.
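The sequential baseline being parallelized is the classic Earley recognizer, with its predictor, scanner, and completer steps. A minimal Python sketch with a toy grammar, not the LATE implementation; the item-to-item dependencies the abstract mentions are visible in how the completer reaches back into earlier chart sets:

GRAMMAR = {
    "S": [["S", "+", "S"], ["n"]],
}

def earley(tokens, start="S"):
    # An item is (lhs, rhs, dot, origin).
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, tuple(rhs), 0, 0))
    for i in range(len(tokens) + 1):
        work = list(chart[i])
        while work:
            lhs, rhs, dot, origin = work.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in GRAMMAR:                          # predictor
                    for prod in GRAMMAR[sym]:
                        new = (sym, tuple(prod), 0, i)
                        if new not in chart[i]:
                            chart[i].add(new); work.append(new)
                elif i < len(tokens) and tokens[i] == sym:  # scanner
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                           # completer
                for (l2, r2, d2, o2) in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new); work.append(new)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[-1])

print(earley(["n", "+", "n"]))  # True
print(earley(["n", "+"]))       # False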
  • Abstract Syntax Trees & Top-Down Parsing
    Abstract Syntax Trees & Top-Down Parsing

    Review of parsing: given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree. Issues: how do we recognize that s ∈ L(G)? A parse tree of s describes how s ∈ L(G). Ambiguity: more than one parse tree (possible interpretation) for some string s. Error: no parse tree for some string s. How do we construct the parse tree?

    Abstract syntax trees: so far, a parser traces the derivation of a sequence of tokens, but the rest of the compiler needs a structural representation of the program. Abstract syntax trees are like parse trees but ignore some details; abbreviated AST.

    Consider the grammar E → int | ( E ) | E + E and the string 5 + (2 + 3). After lexical analysis we have a list of tokens: int5 '+' '(' int2 '+' int3 ')'. During parsing we build a parse tree. The parse tree traces the operation of the parser and captures the nesting structure, but it carries too much information: parentheses and single-successor nodes.

    The corresponding abstract syntax tree, PLUS(5, PLUS(2, 3)), also captures the nesting structure but abstracts from the concrete syntax; it is more compact and easier to use, and an important data structure in a compiler.

    Semantic actions: this is what we'll use to construct ASTs. Each grammar symbol may have attributes; an attribute is a property of a programming language construct
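The semantic actions described in the slides can be sketched directly in Python: each reduction builds an AST node, and parentheses and single-successor chains simply disappear. The ambiguous, left-recursive E → E + E is rewritten here to an equivalent iterative form so recursive descent terminates; an illustrative sketch, not the course's code:

import re

def tokenize(src):
    return re.findall(r"\d+|[()+]", src)

def parse_E(toks, pos):
    node, pos = parse_P(toks, pos)
    while pos < len(toks) and toks[pos] == "+":
        right, pos = parse_P(toks, pos + 1)
        node = ("PLUS", node, right)      # semantic action: build an AST node
    return node, pos

def parse_P(toks, pos):
    if toks[pos] == "(":
        node, pos = parse_E(toks, pos + 1)
        assert toks[pos] == ")", "expected ')'"
        return node, pos + 1              # parentheses vanish from the AST
    return ("INT", int(toks[pos])), pos + 1

ast, _ = parse_E(tokenize("5 + (2 + 3)"), 0)
print(ast)   # ('PLUS', ('INT', 5), ('PLUS', ('INT', 2), ('INT', 3)))

The printed tree is exactly the slides' PLUS(5, PLUS(2, 3)): the nesting survives, the concrete syntax does not.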