LR Parsing LR Parsing • Limits of SLR Parsing • LR Parsing Lecture Notes by • LALR Parsing Profs Aiken and Necula (UCB) • Implementation of Semantic Actions

Total Page:16

File Type:pdf, Size:1020Kb

LR Parsing LR Parsing • Limits of SLR Parsing • LR Parsing Lecture Notes by • LALR Parsing Profs Aiken and Necula (UCB) • Implementation of Semantic Actions Outline • Review of SLR parsing LR Parsing • Limits of SLR parsing • LR parsing Lecture Notes by • LALR parsing Profs Aiken and Necula (UCB) • Implementation of semantic actions • Using parser generators CS780(Prasad) L16LR 1 CS780(Prasad) L16LR 2 Review of SLR(1) Parsing LR Parsing Algorithm • LR parser maintains a stack Let I = w$ be initial input Let j = 0 á sym1, state1 ñ . á symn, staten ñ Let DFA state 1 have item S’ ® .S staten is the final state of the DFA on sym1 … symn Let stack = á dummy, 1 ñ • Goto table: the transition function of the DFA repeat A – Goto[i,A] = j if statei ® statej case action[top_state(stack), I[j]] of • Action table: for each state and terminal: shift k: push á I[j++], k ñ reduce X ® A: Shift j pop |A| pairs, Reduce X ® a push áX, Goto[X,top_state(stack)]ñ Action[i, a] = Accept accept: halt normally Error error: halt and report error CS780(Prasad) L16LR 3 CS780(Prasad) L16LR 4 Review. Items SLR(1) Action Table • An item [X ® a.b] says that For each state si and terminal a – the parser is looking for an X – If si has item X ® a.ab and Goto[i,a] = j then – it has an a on top of the stack Action[i,a] = shift j – Expects to find a string derived from b next in the input – If si has item S’ ® S. then Action[i,$] = accept • Notes: – [X ® a.ab] means that a should follow. Then we can – If si has item X ® a. and a Î Follow(X) and X ¹ S’ shift it and still have a viable prefix. then Action[i,a] = reduce X ® a – [X ®a.] means that we could reduce X • But this is not always a good idea ! – Otherwise, Action[i,a] = error CS780(Prasad) L16LR 5 CS780(Prasad) L16LR 6 1 Limits of SLR Parsing Limits of SLR Parsing (cont.) • SLR(1) is the simplest LR parsing method • Consider two states of the DFA for • SLR(1) is almost powerful enough, but … recognizing viable prefixes • … some common programming language ® ® constructs are not SLR(1). S’ . S S L . = E L • Consider the grammar S ® . L = E Þ E ® L . ® S ® L = E | E S . E SLR(1) parser on input “=“ ® L ® * E | id L . * E • shift (item L . = E ) L ® . id • reduce by E ® L E ® L (since “=“ Î Follow(E)) E ® . L CS780(Prasad) L16LR 7 CS780(Prasad) L16LR 8 What’s The Problem? What’s The Problem? (Cont.) • The grammar is not SLR(1), but why? • Problem: the SLR table has too many reduce • Focus on the reduce move in the second state actions. – We are in the context of S ® E ® L – Using Follow is too coarse. – No = can follow E in this context • In any given context, only some elements of – Even though = Î Follow(E) (in S ® L = E ® *E = E) Follow can actually follow a non-terminal. – The reduce move should not happen if an = follows • For example: in this context. Follow(E) = {=, $}, but In context S ® E only $ can follow E In context S ® L = E ® * E1 = E only = can follow E1 CS780(Prasad) L16LR 9 CS780(Prasad) L16LR 10 One Way to Fix The Problem: LR(1) Items LR(1) Items. Intuition • Idea: • [X ® a .b, a] describes a state of the parser: – refine Follow based on context. – We are trying to find an X, and – The context is described through items. – We have a already on top of the stack, and • An LR(1) item is a pair – We expect to see a prefix derived from ba [X ® a .b, a] ® a where X ® ab is a production and a is the • Back to reduce actions: have an [X ., a] lookahead token or $ – Perform the reduce only if next token is a ! – Will have fewer reduce actions • LR(k) is similar but with k tokens of lookahead – Not for all b Î Follow(X) – In practice, k = 1 CS780(Prasad) L16LR 11 CS780(Prasad) L16LR 12 2 Constructing Sets of LR(1) Items (1) Constructing Sets of LR(1) Items (2) • Similar to construction for LR(0). 1. For each LR(1) item [Y ® a.Xb, a] Add an X-transition [Y ® a.Xb, a] ®X [Y ® aX.b, a] • The states of the NFA are the LR(1) items of G. 2. For each LR(1) item [Y ® a.Xb, a] • The start state is [S’ ® . S, $] For each production X ® g For each terminal b Î First(ba) Add an e transition [Y ® a.Xb, a] ®e [X ® .g, b] CS780(Prasad) L16LR 13 CS780(Prasad) L16LR 14 NFA for Viable Prefixes in Detail (1) NFA for Viable Prefixes in Detail (2) S’ ® S . $ S’ ® S . $ L ® . id = S S e S ® . L = E $ S ® . L = E $ L e e e S’ ® . S $ S’ ® . S $ S ® L . = E $ L ® . * E = e e S ® . E $ S ® . E $ CS780(Prasad) L16LR 15 CS780(Prasad) L16LR 16 NFA for Viable Prefixes in Detail (3) NFA for Viable Prefixes in Detail (4) S’ ® S . $ S’ ® S . $ L ® . id = L ® . id = S e S e S ® . L = E $ L S ® . L = E $ L e e e e S’ ® . S $ S ® L . = E $ S’ ® . S $ S ® L . = E $ L ® . * E = L ® . * E = L e e E ® L . $ e E ® . L $ e E ® . L $ e S ® . E $ S ® . E $ e E E L ® . * E $ L ® . id $ CS780(Prasad) L16LR 17 CS780(Prasad) L16LR 18 3 An Example Revisited Constructing LR(1) Parsing Tables • Consider the state from last slide 1. Add a dummy S’ ® S production S ® L . = E $ 2. Construct the NFA of LR(1) items as before 3. Convert the NFA into a DFA E ® L . $ 4. Goto is defined exactly as before: A Goto[i, A] = j if statei ® statej • LR(1) parser on input “ =“ (the transition function of the DFA) • only shift (item L . = E ) CS780(Prasad) L16LR 19 CS780(Prasad) L16LR 20 Constructing LR(1) Parsing Tables (Cont.) LALR Parsing 5. For each state si of the DFA and terminal a • Two bottom-up parsing methods: SLR and LR – If si has item [X ® a.ab, c] and Goto[i, a] = j then action[i,a] = shift j • Which one we use? Neither – SLR is not powerful enough. – If si has item [X ® a., a] and X ¹ S’ then action[i,a] = reduce X ® a – LR parsing tables are too big (1000’s of states vs. 100’s of states for SLR). – If si has item [S’ ® S., $] then action[i,$] = accept • In practice, use LALR(1) – Otherwise, – Stands for Look-Ahead LR action[i,a] = error – A compromise between SLR(1) and LR(1) • LR(1) grammar Û action[i,a] uniquely defined CS780(Prasad) L16LR 21 CS780(Prasad) L16LR 22 LALR Parsing (Cont.) The Core of a Set of LR Item • Rough intuition: A LALR(1) parser for G has • Definition: The core of a set of LR items is – The number of states of an SLR parser. the set of first components. – Some of the lookahead discrimination of LR(1). • Example: the core of • Idea: construct the DFA for the LR(1). { [X ® a.b, b], [Y ® g.d, d]} • Then merge the DFA states whose items is ® a b ® g d differ only in the lookahead tokens {X . , Y . } – We say that such states have the same core. • The core of an LR item is an LR(0) item. CS780(Prasad) L16LR 23 CS780(Prasad) L16LR 24 4 A LALR(1) DFA The LALR Parser Can Have Conflicts • Repeat until all states have distinct core. • Consider for example the LR(1) states – Choose two distinct states with same core. {[X ® a., a], [Y ® b., b]} – Merge the states by creating a new one with the {[X ® a., b], [Y ® b., a]} union of all the items. – Point edges from predecessors to new state. • And the merged LALR(1) state – New state points to all the previous successors. {[X ® a., a/b], [Y ® b., a/b]} • Has a new reduce-reduce conflict. A B C A C BE • In practice such cases are rare. D E F D F CS780(Prasad) L16LR 25 CS780(Prasad) L16LR 26 LALR vs. LR Parsing A Hierarchy of Grammar Classes • LALR languages are not natural. – They are an efficiency hack on LR languages • Any reasonable programming language has an LALR(1) grammar. • LALR(1) has become a standard for programming languages and for parser generators. CS780(Prasad) L16LR 27 CS780(Prasad) L16LR 28 Semantic Actions Performing Semantic Actions. Example • We can now illustrate how semantic actions • Recall the example from earlier lecture are implemented for LR parsing. • Keep attributes on the stack. E ® T + E1 { E.val = T.val + E1.val } | T { E.val = T.val } • On shift a, push attribute for a on stack. T ® int * T1 { T.val = int.val * T1.val } • On reduce X ® a | int { T.val = int.val } – pop attributes for a – compute attribute for X • Consider the parsing of the string 3 * 5 + 8 – and push it on the stack CS780(Prasad) L16LR 29 CS780(Prasad) L16LR 30 5 Performing Semantic Actions. Example Notes | int * int + int shift • The previous discussion shows how int3 | * int + int shift synthesized attributes are computed by LR int3 * | int + int shift parsers. int3 * int5 | + int reduce T ® int int3 * T5 | + int reduce T ® int * T T15 | + int shift • It is also possible to compute inherited T15 + | int shift attributes in an LR parser.
Recommended publications
  • Buffered Shift-Reduce Parsing*
    BUFFERED SHIFT-REDUCE PARSING* Bing Swen (Sun Bin) Dept. of Computer Science, Peking University, Beijing 100871, China [email protected] Abstract A parsing method called buffered shift-reduce parsing is presented, which adds an intermediate buffer (queue) to the usual LR parser. The buffer’s usage is analogous to that of the wait-and-see parsing, but it has unlimited buffer length, and may serve as a separate reduction (pruning) stack. The general structure of its parser and some features of its grammars and parsing tables are discussed. 1. Introduction It is well known that the LR parsing is hitherto the most general shift-reduce method of bottom-up parsing [1], and is among the most widely used. For example, almost all compilers of mainstream programming languages employ the LR-like parsing (via an LALR(1) compiler generator such as YACC or GNU Bison [2]), and many NLP systems use a generalized version of LR parsing (GLR, also called the Tomita algorithm [3]). In some earlier research work on LR(k) extensions for NLP [4] and generalization of compiler generator [5], an idea naturally occurred that one can make a modification to the LR parsing so that after each reduction, the resulting symbol may not be necessarily pushed onto the analysis stack, but instead put back into the input, waiting for decisions in the following steps. This strategy postpones the time of some reduction actions in traditional LR parser, partly deviating from leftmost pruning (Leftmost Reduction). This may cause performance loss for some grammars, with the gained opportunities of earlier error detecting and avoidance of false reductions, as well as later handling of ambiguities, which is significant for the processing of highly ambiguous input strings like natural language sentences.
    [Show full text]
  • Validating LR(1) Parsers
    Validating LR(1) Parsers Jacques-Henri Jourdan1;2, Fran¸coisPottier2, and Xavier Leroy2 1 Ecole´ Normale Sup´erieure 2 INRIA Paris-Rocquencourt Abstract. An LR(1) parser is a finite-state automaton, equipped with a stack, which uses a combination of its current state and one lookahead symbol in order to determine which action to perform next. We present a validator which, when applied to a context-free grammar G and an automaton A, checks that A and G agree. Validating the parser pro- vides the correctness guarantees required by verified compilers and other high-assurance software that involves parsing. The validation process is independent of which technique was used to construct A. The validator is implemented and proved correct using the Coq proof assistant. As an application, we build a formally-verified parser for the C99 language. 1 Introduction Parsing remains an essential component of compilers and other programs that input textual representations of structured data. Its theoretical foundations are well understood today, and mature technology, ranging from parser combinator libraries to sophisticated parser generators, is readily available to help imple- menting parsers. The issue we focus on in this paper is that of parser correctness: how to obtain formal evidence that a parser is correct with respect to its specification? Here, following established practice, we choose to specify parsers via context-free grammars enriched with semantic actions. One application area where the parser correctness issue naturally arises is formally-verified compilers such as the CompCert verified C compiler [14]. In- deed, in the current state of CompCert, the passes that have been formally ver- ified start at abstract syntax trees (AST) for the CompCert C subset of C and extend to ASTs for three assembly languages.
    [Show full text]
  • Efficient Computation of LALR(1) Look-Ahead Sets
    Efficient Computation of LALR(1) Look-Ahead Sets FRANK DeREMER and THOMAS PENNELLO University of California, Santa Cruz, and MetaWare TM Incorporated Two relations that capture the essential structure of the problem of computing LALR(1) look-ahead sets are defined, and an efficient algorithm is presented to compute the sets in time linear in the size of the relations. In particular, for a PASCAL grammar, the algorithm performs fewer than 15 percent of the set unions performed by the popular compiler-compiler YACC. When a grammar is not LALR(1), the relations, represented explicitly, provide for printing user- oriented error messages that specifically indicate how the look-ahead problem arose. In addition, certain loops in the digraphs induced by these relations indicate that the grammar is not LR(k) for any k. Finally, an oft-discovered and used but incorrect look-ahead set algorithm is similarly based on two other relations defined for the fwst time here. The formal presentation of this algorithm should help prevent its rediscovery. Categories and Subject Descriptors: D.3.1 [Programming Languages]: Formal Definitions and Theory--syntax; D.3.4 [Programming Languages]: Processors--translator writing systems and compiler generators; F.4.2 [Mathematical Logic and Formal Languages]: Grammars and Other Rewriting Systems--parsing General Terms: Algorithms, Languages, Theory Additional Key Words and Phrases: Context-free grammar, Backus-Naur form, strongly connected component, LALR(1), LR(k), grammar debugging, 1. INTRODUCTION Since the invention of LALR(1) grammars by DeRemer [9], LALR grammar analysis and parsing techniques have been popular as a component of translator writing systems and compiler-compilers.
    [Show full text]
  • LLLR Parsing: a Combination of LL and LR Parsing
    LLLR Parsing: a Combination of LL and LR Parsing Boštjan Slivnik University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia [email protected] Abstract A new parsing method called LLLR parsing is defined and a method for producing LLLR parsers is described. An LLLR parser uses an LL parser as its backbone and parses as much of its input string using LL parsing as possible. To resolve LL conflicts it triggers small embedded LR parsers. An embedded LR parser starts parsing the remaining input and once the LL conflict is resolved, the LR parser produces the left parse of the substring it has just parsed and passes the control back to the backbone LL parser. The LLLR(k) parser can be constructed for any LR(k) grammar. It produces the left parse of the input string without any backtracking and, if used for a syntax-directed translation, it evaluates semantic actions using the top-down strategy just like the canonical LL(k) parser. An LLLR(k) parser is appropriate for grammars where the LL(k) conflicting nonterminals either appear relatively close to the bottom of the derivation trees or produce short substrings. In such cases an LLLR parser can perform a significantly better error recovery than an LR parser since the most part of the input string is parsed with the backbone LL parser. LLLR parsing is similar to LL(∗) parsing except that it (a) uses LR(k) parsers instead of finite automata to resolve the LL(k) conflicts and (b) does not perform any backtracking.
    [Show full text]
  • Parser Generation Bottom-Up Parsing LR Parsing Constructing LR Parser
    CS421 COMPILERS AND INTERPRETERS CS421 COMPILERS AND INTERPRETERS Parser Generation Bottom-Up Parsing • Main Problem: given a grammar G, how to build a top-down parser or • Construct parse tree “bottom-up” --- from leaves to the root a bottom-up parser for it ? • Bottom-up parsing always constructs right-most derivation • parser : a program that, given a sentence, reconstructs a derivation for • Important parsing algorithms: shift-reduce, LR parsing that sentence ---- if done sucessfully, it “recognize” the sentence • LR parser components: input, stack (strings of grammar symbols and • all parsers read their input left-to-right, but construct parse tree states), driver routine, parsing tables. differently. input: a1 a2 a3 a4 ......an $ • bottom-up parsers --- construct the tree from leaves to root stack s shift-reduce, LR, SLR, LALR, operator precedence m LR Parsing output Xm • top-down parsers --- construct the tree from root to leaves .... s1 recursive descent, predictive parsing, LL(1) X1 parsing s0 Copyright 1994 - 2000 Carsten Schürmann, Zhong Shao, Yale University Parser Generation: Page 1 of 28 Copyright 1994 - 2000 Carsten Schürmann, Zhong Shao, Yale University Parser Generation: Page 2 of 28 CS421 COMPILERS AND INTERPRETERS CS421 COMPILERS AND INTERPRETERS LR Parsing Constructing LR Parser How to construct the parsing table and ? • A sequence of new state symbols s0,s1,s2,..., sm ----- each state ACTION GOTO sumarize the information contained in the stack below it. • basic idea: first construct DFA to recognize handles, then use DFA to construct the parsing tables ! different parsing table yield different LR • Parsing configurations: (stack, remaining input) written as parsers SLR(1), LR(1), or LALR(1) (s0X1s1X2s2...Xmsm , aiai+1ai+2...an$) • augmented grammar for context-free grammar G = G(T,N,P,S) is next “move” is determined by sm and ai defined as G’ = G’(T,N ??{ S’}, P ??{ S’ -> S}, S’) ------ adding non- • Parsing tables: ACTION[s,a] and GOTO[s,X] terminal S’ and the production S’ -> S, and S’ is the new start symbol.
    [Show full text]
  • Compiler Design
    CCOOMMPPIILLEERR DDEESSIIGGNN -- PPAARRSSEERR http://www.tutorialspoint.com/compiler_design/compiler_design_parser.htm Copyright © tutorialspoint.com In the previous chapter, we understood the basic concepts involved in parsing. In this chapter, we will learn the various types of parser construction methods available. Parsing can be defined as top-down or bottom-up based on how the parse-tree is constructed. Top-Down Parsing We have learnt in the last chapter that the top-down parsing technique parses the input, and starts constructing a parse tree from the root node gradually moving down to the leaf nodes. The types of top-down parsing are depicted below: Recursive Descent Parsing Recursive descent is a top-down parsing technique that constructs the parse tree from the top and the input is read from left to right. It uses procedures for every terminal and non-terminal entity. This parsing technique recursively parses the input to make a parse tree, which may or may not require back-tracking. But the grammar associated with it ifnotleftfactored cannot avoid back- tracking. A form of recursive-descent parsing that does not require any back-tracking is known as predictive parsing. This parsing technique is regarded recursive as it uses context-free grammar which is recursive in nature. Back-tracking Top- down parsers start from the root node startsymbol and match the input string against the production rules to replace them ifmatched. To understand this, take the following example of CFG: S → rXd | rZd X → oa | ea Z → ai For an input string: read, a top-down parser, will behave like this: It will start with S from the production rules and will match its yield to the left-most letter of the input, i.e.
    [Show full text]
  • A Simple, Possibly Correct LR Parser for C11 Jacques-Henri Jourdan, François Pottier
    A Simple, Possibly Correct LR Parser for C11 Jacques-Henri Jourdan, François Pottier To cite this version: Jacques-Henri Jourdan, François Pottier. A Simple, Possibly Correct LR Parser for C11. ACM Transactions on Programming Languages and Systems (TOPLAS), ACM, 2017, 39 (4), pp.1 - 36. 10.1145/3064848. hal-01633123 HAL Id: hal-01633123 https://hal.archives-ouvertes.fr/hal-01633123 Submitted on 11 Nov 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. 14 A simple, possibly correct LR parser for C11 Jacques-Henri Jourdan, Inria Paris, MPI-SWS François Pottier, Inria Paris The syntax of the C programming language is described in the C11 standard by an ambiguous context-free grammar, accompanied with English prose that describes the concept of “scope” and indicates how certain ambiguous code fragments should be interpreted. Based on these elements, the problem of implementing a compliant C11 parser is not entirely trivial. We review the main sources of difficulty and describe a relatively simple solution to the problem. Our solution employs the well-known technique of combining an LALR(1) parser with a “lexical feedback” mechanism. It draws on folklore knowledge and adds several original as- pects, including: a twist on lexical feedback that allows a smooth interaction with lookahead; a simplified and powerful treatment of scopes; and a few amendments in the grammar.
    [Show full text]
  • CLR(1) and LALR(1) Parsers
    CLR(1) and LALR(1) Parsers C.Naga Raju B.Tech(CSE),M.Tech(CSE),PhD(CSE),MIEEE,MCSI,MISTE Professor Department of CSE YSR Engineering College of YVU Proddatur 6/21/2020 Prof.C.NagaRaju YSREC of YVU 9949218570 Contents • Limitations of SLR(1) Parser • Introduction to CLR(1) Parser • Animated example • Limitation of CLR(1) • LALR(1) Parser Animated example • GATE Problems and Solutions Prof.C.NagaRaju YSREC of YVU 9949218570 Drawbacks of SLR(1) ❖ The SLR Parser discussed in the earlier class has certain flaws. ❖ 1.On single input, State may be included a Final Item and a Non- Final Item. This may result in a Shift-Reduce Conflict . ❖ 2.A State may be included Two Different Final Items. This might result in a Reduce-Reduce Conflict Prof.C.NagaRaju YSREC of YVU 6/21/2020 9949218570 ❖ 3.SLR(1) Parser reduces only when the next token is in Follow of the left-hand side of the production. ❖ 4.SLR(1) can reduce shift-reduce conflicts but not reduce-reduce conflicts ❖ These two conflicts are reduced by CLR(1) Parser by keeping track of lookahead information in the states of the parser. ❖ This is also called as LR(1) grammar Prof.C.NagaRaju YSREC of YVU 6/21/2020 9949218570 CLR(1) Parser ❖ LR(1) Parser greatly increases the strength of the parser, but also the size of its parse tables. ❖ The LR(1) techniques does not rely on FOLLOW sets, but it keeps the Specific Look-ahead with each item. Prof.C.NagaRaju YSREC of YVU 6/21/2020 9949218570 CLR(1) Parser ❖ LR(1) Parsing configurations have the general form: A –> X1...Xi • Xi+1...Xj , a ❖ The Look Ahead
    [Show full text]
  • Packrat Parsing: Simple, Powerful, Lazy, Linear Time
    Packrat Parsing: Simple, Powerful, Lazy, Linear Time Functional Pearl Bryan Ford Massachusetts Institute of Technology Cambridge, MA [email protected] Abstract 1 Introduction Packrat parsing is a novel technique for implementing parsers in a There are many ways to implement a parser in a functional pro- lazy functional programming language. A packrat parser provides gramming language. The simplest and most direct approach is the power and flexibility of top-down parsing with backtracking and top-down or recursive descent parsing, in which the components unlimited lookahead, but nevertheless guarantees linear parse time. of a language grammar are translated more-or-less directly into a Any language defined by an LL(k) or LR(k) grammar can be rec- set of mutually recursive functions. Top-down parsers can in turn ognized by a packrat parser, in addition to many languages that be divided into two categories. Predictive parsers attempt to pre- conventional linear-time algorithms do not support. This additional dict what type of language construct to expect at a given point by power simplifies the handling of common syntactic idioms such as “looking ahead” a limited number of symbols in the input stream. the widespread but troublesome longest-match rule, enables the use Backtracking parsers instead make decisions speculatively by try- of sophisticated disambiguation strategies such as syntactic and se- ing different alternatives in succession: if one alternative fails to mantic predicates, provides better grammar composition properties, match, then the parser “backtracks” to the original input position and allows lexical analysis to be integrated seamlessly into parsing. and tries another.
    [Show full text]
  • Parser Generation Bottom-Up Parsing LR Parsing Constructing LR Parser
    CS421 COMPILERS AND INTERPRETERS CS421 COMPILERS AND INTERPRETERS Parser Generation Bottom-Up Parsing • Main Problem: given a grammar G, how to build a top-down parser or a • Construct parse tree “bottom-up” --- from leaves to the root bottom-up parser for it ? • Bottom-up parsing always constructs right-most derivation • parser : a program that, given a sentence, reconstructs a derivation for that • Important parsing algorithms: shift-reduce, LR parsing sentence ---- if done sucessfully, it “recognize” the sentence • LR parser components: input, stack (strings of grammar symbols and states), • all parsers read their input left-to-right, but construct parse tree differently. driver routine, parsing tables. • bottom-up parsers --- construct the tree from leaves to root input: a1 a2 a3 a4 ...... an $ shift-reduce, LR, SLR, LALR, operator precedence stack s m LR Parsing output • top-down parsers --- construct the tree from root to leaves Xm .. recursive descent, predictive parsing, LL(1) s1 X1 s0 parsing Copyright 1994 - 2015 Zhong Shao, Yale University Parser Generation: Page 1 of 27 Copyright 1994 - 2015 Zhong Shao, Yale University Parser Generation: Page 2 of 27 CS421 COMPILERS AND INTERPRETERS CS421 COMPILERS AND INTERPRETERS LR Parsing Constructing LR Parser How to construct the parsing table and ? • A sequence of new state symbols s0,s1,s2,..., sm ----- each state ACTION GOTO sumarize the information contained in the stack below it. • basic idea: first construct DFA to recognize handles, then use DFA to construct the parsing tables ! different parsing table yield different LR parsers SLR(1), • Parsing configurations: (stack, remaining input) written as LR(1), or LALR(1) (s0X1s1X2s2...Xmsm , aiai+1ai+2...an$) • augmented grammar for context-free grammar G = G(T,N,P,S) is defined as G’ next “move” is determined by sm and ai = G’(T, N { S’}, P { S’ -> S}, S’) ------ adding non-terminal S’ and the • Parsing tables: ACTION[s,a] and GOTO[s,X] production S’ -> S, and S’ is the new start symbol.
    [Show full text]
  • Elkhound: a Fast, Practical GLR Parser Generator
    Published in Proc. of Conference on Compiler Construction, 2004, pp. 73–88. Elkhound: A Fast, Practical GLR Parser Generator Scott McPeak and George C. Necula ? University of California, Berkeley {smcpeak,necula}@cs.berkeley.edu Abstract. The Generalized LR (GLR) parsing algorithm is attractive for use in parsing programming languages because it is asymptotically efficient for typical grammars, and can parse with any context-free gram- mar, including ambiguous grammars. However, adoption of GLR has been slowed by high constant-factor overheads and the lack of a general, user-defined action interface. In this paper we present algorithmic and implementation enhancements to GLR to solve these problems. First, we present a hybrid algorithm that chooses between GLR and ordinary LR on a token-by-token ba- sis, thus achieving competitive performance for determinstic input frag- ments. Second, we describe a design for an action interface and a new worklist algorithm that can guarantee bottom-up execution of actions for acyclic grammars. These ideas are implemented in the Elkhound GLR parser generator. To demonstrate the effectiveness of these techniques, we describe our ex- perience using Elkhound to write a parser for C++, a language notorious for being difficult to parse. Our C++ parser is small (3500 lines), efficient and maintainable, employing a range of disambiguation strategies. 1 Introduction The state of the practice in automated parsing has changed little since the intro- duction of YACC (Yet Another Compiler-Compiler), an LALR(1) parser genera- tor, in 1975 [1]. An LALR(1) parser is deterministic: at every point in the input, it must be able to decide which grammar rule to use, if any, utilizing only one token of lookahead [2].
    [Show full text]
  • A Simple, Possibly Correct LR Parser for C11
    14 A simple, possibly correct LR parser for C11 Jacques-Henri Jourdan, Inria Paris, MPI-SWS François Pottier, Inria Paris The syntax of the C programming language is described in the C11 standard by an ambiguous context-free grammar, accompanied with English prose that describes the concept of “scope” and indicates how certain ambiguous code fragments should be interpreted. Based on these elements, the problem of implementing a compliant C11 parser is not entirely trivial. We review the main sources of difficulty and describe a relatively simple solution to the problem. Our solution employs the well-known technique of combining an LALR(1) parser with a “lexical feedback” mechanism. It draws on folklore knowledge and adds several original as- pects, including: a twist on lexical feedback that allows a smooth interaction with lookahead; a simplified and powerful treatment of scopes; and a few amendments in the grammar. Although not formally verified, our parser avoids several pitfalls that other implementations have fallen prey to. We believe that its simplicity, its mostly-declarative nature, and its high similarity with the C11 grammar are strong informal arguments in favor of its correctness. Our parser is accompanied with a small suite of “tricky” C11 programs. We hope that it may serve as a reference or a starting point in the implementation of compilers and analysis tools. CCS Concepts: •Software and its engineering ! Parsers; Additional Key Words and Phrases: Compilation; parsing; ambiguity; lexical feedback; C89; C99; C11 ACM Reference Format: Jacques-Henri Jourdan and François Pottier. 2017. A simple, possibly correct LR parser for C11 ACM Trans.
    [Show full text]