COMP3012/G53CMP: Lecture 4 - Syntactic Analysis: Parser Generators


Henrik Nilsson
University of Nottingham, UK

This Lecture
• Parser generators ("compiler compilers")
• The parser generator Happy
• A TXL parser written using Happy
• A TXL interpreter written using Happy

Parser Generators (1)
• Constructing parsers by hand can be very tedious and time consuming.
• This is true in particular for LR(k) and LALR parsers: constructing the corresponding DFAs is extremely laborious.
• E.g., this simple grammar (from the previous lecture)

      S → aABe
      A → bcA | c
      B → d

  gives rise to a 10-state LR(0) DFA!

Parser Generators (2)
An LR(0) DFA recognizing viable prefixes for

      S → aABe
      A → bcA | c
      B → d

(shown as a state diagram in the slides; its ten states and transitions are:)

      I0: S → ·aABe                        a → I1
      I1: S → a·ABe, A → ·bcA, A → ·c      b → I2, c → I3, A → I5
      I2: A → b·cA                         c → I4
      I3: A → c·
      I4: A → bc·A, A → ·bcA, A → ·c       b → I2, c → I3, A → I7
      I5: S → aA·Be, B → ·d                d → I6, B → I8
      I6: B → d·
      I7: A → bcA·
      I8: S → aAB·e                        e → I9
      I9: S → aABe·

Parser Generators (3)
• Parser construction is in many ways a very mechanical process. Why not write a program to do the hard work for us?
• A Parser Generator (or "compiler compiler") takes a grammar as input and outputs a parser (a program) for that grammar.
• The input grammar is augmented with "semantic actions": code fragments that get invoked when a derivation step is performed.
• The semantic actions typically construct an AST or interpret the program being parsed.

Parser Generators (4)
Consider an LR shift-reduce parser:
• Some of the actions when parsing abccde:

      State   Stack (γ)   Input (w)   Move
      ...
      I5      aA          de          Shift
      I6      aAd         e           Reduce by B → d
      I8      aAB         e           Shift
      I9      aABe        ε           Reduce by S → aABe
              S           ε           Done

• A reduction corresponds to a derivation step in the grammar (an LR parser performs a rightmost derivation in reverse).

Parser Generators (5)
• At a reduction, the terminals and non-terminals of the RHS of the production (the handle) are on the parse stack, associated with semantic information or semantic values; e.g., the corresponding AST fragments, expression values.
• Think of the RHS symbols as variables bound to the semantic value resulting from a successful derivation for that symbol.
• Construction of the AST, evaluation of expressions, etc. proceeds in bottom-up order.

Parser Generators (6)
Some examples of parser generators:
• Yacc ("Yet Another Compiler Compiler"): a classic UNIX LALR parser generator for C. http://dinosaur.compilertools.net/
• Bison: GNU project parser generator, a free Yacc replacement, for C and C++.
• Happy: a parser generator for Haskell, similar to Yacc and Bison. http://www.haskell.org/happy/
• Cup: LALR parser generator for Java.

Parser Generators (7)
• ANTLR: LL(k) (recursive descent) parser generator and other translator-oriented tools for Java, C#, C++. http://www.antlr.org/
• Many more compiler tools for Java here: http://catalog.compilertools.net/java.html
• And a general catalogue of compiler tools: http://catalog.compilertools.net/

Happy Parser for TXL (1)
We are going to develop a TXL parser using Happy. The TXL CFG:

      TXLProgram → Exp
      Exp        → AddExp
      AddExp     → MulExp
                 | AddExp + MulExp
                 | AddExp - MulExp

Note: Left-recursive! (To impart associativity.) LR parsers have no problems with left- or right-recursion (except right recursion uses more stack).
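To make the associativity remark concrete, here is a sketch, in the Happy rule syntax introduced later in this lecture and using the AST constructors defined below, of a left-recursive versus a right-recursive version of the addition rule:

      -- Left-recursive: a + b + c is grouped as (a + b) + c, and each '+' can
      -- be reduced as soon as its right operand has been parsed, so the
      -- parser's stack stays small.
      add_exp : mul_exp               { $1 }
              | add_exp '+' mul_exp   { BinOpApp Plus $1 $3 }

      -- Right-recursive alternative (also accepted by an LR parser):
      -- a + b + c is grouped as a + (b + c), and the whole of "a + b +"
      -- is stacked before the first reduction can happen.
      add_exp : mul_exp               { $1 }
              | mul_exp '+' add_exp   { BinOpApp Plus $1 $3 }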
Happy Parser for TXL (2)
Haskell datatype for tokens:

      data Token = T_Int Int
                 | T_Id Id
                 | T_Plus
                 | T_Minus
                 | T_Times
                 | T_Divide
                 | T_LeftPar
                 | T_RightPar
                 | T_Equal
                 | T_Let
                 | T_In

Happy Parser for TXL (3)
The TXL CFG continued:

      MulExp  → PrimExp
              | MulExp * PrimExp
              | MulExp / PrimExp
      PrimExp → IntegerLiteral
              | Identifier
              | ( Exp )
              | let Identifier = Exp in Exp

Happy Parser for TXL (4)
A simple Happy parser specification:

      { Module Header }
      %name ParserFunctionName
      %tokentype { TokenTypeName }
      %token
        Specification of Terminal Symbols
      %%
      Grammar productions with semantic actions
      { Further Haskell Code }

Happy Parser for TXL (5)
Haskell datatypes for the AST:

      data BinOp = Plus | Minus | Times | Divide

      data Exp = LitInt Int
               | Var Id
               | BinOpApp BinOp Exp Exp
               | Let Id Exp Exp

Happy Parser for TXL (6)
The terminal symbol specification specifies the terminals to be used in productions and relates them to Haskell constructors for the tokens:

      %token
        int   { T_Int $$ }
        ident { T_Id $$ }
        '+'   { T_Plus }
        '-'   { T_Minus }
        ...
        '='   { T_Equal }
        let   { T_Let }
        in    { T_In }

Happy Parser for TXL (7)
• The code fragment between curly braces is a Haskell pattern that is matched against the actual tokens produced by the lexer.
• If this pattern contains the special variable $$, then the corresponding part of the matched token becomes the semantic value. Examples: T_Int $$, T_Id $$
• Otherwise the entire token becomes the semantic value. Examples: T_Plus, T_In
• The semantic values of different terminal symbols may thus have different types.

Happy Parser for TXL (8)
The grammar productions are written in BNF, with an additional semantic action defining the semantic value for each production:

      add_exp
        : mul_exp                { $1 }
        | add_exp '+' mul_exp    { BinOpApp Plus $1 $3 }
        | add_exp '-' mul_exp    { BinOpApp Minus $1 $3 }

      mul_exp
        : prim_exp               { $1 }
        | mul_exp '*' prim_exp   { BinOpApp Times $1 $3 }
        | mul_exp '/' prim_exp   { BinOpApp Divide $1 $3 }

Happy Parser for TXL (9)
It is also possible to add type annotations:

      add_exp :: { Exp }
      add_exp
        : mul_exp                { $1 }
        | add_exp '+' mul_exp    { BinOpApp Plus $1 $3 }
        | add_exp '-' mul_exp    { BinOpApp Minus $1 $3 }

Most useful when semantic values are of different types.
See HappyTXL.y for the complete example.

Shift/Red. and Red./Red. Conflicts (1)
Context-free grammars are often initially ambiguous. In LR-parsing, ambiguous grammars lead to shift/reduce and reduce/reduce conflicts:
• shift/reduce: some states have mixed complete and incomplete items:

      A → a ·
      A → a · b

  Should the parser shift or reduce?

Shift/Red. and Red./Red. Conflicts (2)
Consider the grammar fragment:

      Cmd → ...
          | if Exp then Cmd
          | if Exp then Cmd else Cmd

According to this grammar, a program fragment

      if e1 then if e2 then c1 else c2

can be parsed in two ways, with very different meanings (the "dangling else" problem):

      if e1 then (if e2 then c1) else c2
      if e1 then (if e2 then c1 else c2)
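As a sketch of how this dangling-else fragment might look in a Happy specification (the token and constructor names if, then, else, If and IfElse are made up for illustration; they are not part of the TXL example):

      -- A grammar like this gives rise to a shift/reduce conflict: after
      --   if exp then cmd . else ...
      -- the parser can either reduce by the first rule or shift the 'else'.
      cmd : if exp then cmd            { If $2 $4 }
          | if exp then cmd else cmd   { IfElse $2 $4 $6 }

As discussed below, the default resolution in Happy is to shift, which attaches the else to the nearest if.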
Shift/Red. and Red./Red. Conflicts (3)
• reduce/reduce: some states have more than one complete item:

      A → a ·
      B → a ·

  Reduce, but by which production?

Shift/Red. and Red./Red. Conflicts (4)
• Shift/reduce conflicts are often resolved by opting for shifting:
  - Typically the default option (e.g. Yacc, Bison, Happy).
  - Usually gives the desired result; e.g., resolves the dangling else problem in a natural way.
• Reduce/reduce conflicts are worse, as there is no reason to pick one production over another: the grammar has to be manually disambiguated.

Precedence and Associativity
Happy (like e.g. Yacc and Bison) allows operator precedence and associativity to be explicitly specified to disambiguate a grammar:

      %left '+' '-'
      %left '*' '/'

      exp : exp '+' exp   { BinOpApp Plus $1 $3 }
          | exp '-' exp   { BinOpApp Minus $1 $3 }
          | exp '*' exp   { BinOpApp Times $1 $3 }
          | exp '/' exp   { BinOpApp Divide $1 $3 }
          ...

See HappyTXL2.y for further details.

A TXL Interpreter (1)
The semantic actions do not have to construct an AST. An alternative is to interpret the code being parsed. Basic idea:

      exp :: { Int }
      exp : exp '+' exp   { $1 + $3 }
          | exp '-' exp   { $1 - $3 }
          ...

But TXL has a let-construct ... what about TXL variables? E.g., in let x = 3 in x + x, what is the semantic value of x?

A TXL Interpreter (2)
One way:
• Each semantic action returns a function of type Env -> Int, where (for example)

      type Env = Id -> Int

• The semantic action for evaluating a composite expression passes on the environment. E.g. the semantic action for +:

      | exp '+' exp   { \env -> $1 env + $3 env }

A TXL Interpreter (3)
• The semantic action for a variable looks up the variable's value in the environment:

      | ident   { \env -> env $1 }

• The semantic action for let extends the argument environment and evaluates the body in the extended environment:

      | let ident '=' exp in exp
          { \env -> let v = $4 env
                    in $6 (\i -> if i == $2
                                 then v
                                 else env i) }

A TXL Interpreter (4)
• The semantic action for a literal just returns the value of the literal, ignoring the environment:

      | int   { \_ -> $1 }

• A program gets evaluated by applying the overall result function to the empty environment:

      (\_ -> error "undefined variable")

See HappyTXLInterpreter.y for further details.

A TXL Interpreter (5)
Another way: Attribute Grammars
• Each non-terminal is associated with a set of semantic attributes (values).
• A set of equations for each production defines the attributes.
• Attributes can be:
  - Inherited: defined in terms of attributes of parents or siblings in the derivation tree.
  - Synthesized: defined in terms of attributes of children in the derivation tree.

A TXL Interpreter (6)
• Semantic equation for literals:

      | int   { $$ = $1 }

• Semantic equation for identifiers:

      | ident   { $$ = $$.env $1 }

• Semantic equations for let:

A TXL Interpreter (7)
• We need two attributes:

      %attribute value { Int }
      %attribute env   { Env }

  The first one is the default attribute: assumed unless another attribute name is explicitly stated, and gives the overall result.
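The environment-passing scheme sketched in A TXL Interpreter (2)-(4) corresponds to an ordinary Haskell evaluator over the AST built earlier. The following sketch spells this out; the helper names emptyEnv and binOp, and the choice of String for Id, are assumptions and not part of the slides:

      -- AST types as on the slides; Id is left abstract there, String assumed here.
      type Id = String

      data BinOp = Plus | Minus | Times | Divide
      data Exp   = LitInt Int
                 | Var Id
                 | BinOpApp BinOp Exp Exp
                 | Let Id Exp Exp

      -- Environments map identifiers to values, as on the slides.
      type Env = Id -> Int

      -- The "empty" environment used to kick off evaluation.
      emptyEnv :: Env
      emptyEnv = \_ -> error "undefined variable"

      binOp :: BinOp -> Int -> Int -> Int
      binOp Plus   = (+)
      binOp Minus  = (-)
      binOp Times  = (*)
      binOp Divide = div

      -- Each case mirrors the corresponding semantic action.
      eval :: Exp -> Env -> Int
      eval (LitInt n)          _   = n        -- literal: ignore the environment
      eval (Var x)             env = env x    -- variable: look it up
      eval (BinOpApp op e1 e2) env = binOp op (eval e1 env) (eval e2 env)
      eval (Let x e1 e2)       env =          -- let: evaluate body in extended env
        let v = eval e1 env
        in  eval e2 (\y -> if y == x then v else env y)

      -- eval for "let x = 3 in x + x" applied to emptyEnv gives 6.
      example :: Int
      example = eval (Let "x" (LitInt 3) (BinOpApp Plus (Var "x") (Var "x"))) emptyEnv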