The Impact of Terminal Prefixing on LL(1) Parsing

Total Page:16

File Type:pdf, Size:1020Kb

The Impact of Terminal Prefixing on LL(1) Parsing J. Appl. Environ. Biol. Sci. , 7(5)64-76, 2017 ISSN: 2090-4274 Journal of Applied Environmental © 2017, TextRoad Publication and Biological Sciences www.textroad.com The Impact of Terminal Prefixing on LL(1) Parsing Muhammad Shumail Naveed Department of Computer Science, University of Balochistan, Pakistan Received: December 10, 2016 Accepted: February 22, 2017 ABSTRACT In this paper a terminal prefixing is presented as a normal form of context-free grammar and applied on a LL(1) grammar. Its impact on the size of derivations, magnitude of parse trees and particularly on the efficiency of LL(1) parsing is analyzed. Results show that terminal prefixing has a significant impact on size of parse trees and similarly on the efficiency of LL(1) parser. The discussion included in this paper is helpful in the advancement of other parsers and also for the anatomization of topical normal forms. KEYWORDS : parsing, context-free grammar, LL(1) parsing, Greibach normal form, ambiguity 1. INTRODUCTION Parsing is the process of arranging a linear representation according to a particular grammar [1], and parser is a program that accomplished this process. Parsing has a great importance and extensive history of academic study yet it is not a solved problem [2], and therefore researchers have devoted decades analyzing to develop efficient parsing methods. Parsing has diverse applications especially in the area of compiler construction [3], natural language processing [4,5] and automatic speech recognition [6]. In classical compilers the parsing is a second phase and deals with the verification of syntax and generation of parse tree which defines the structure of a program. In contemporary compilers the top-down and bottom-up are the common parsing techniques. The top-down parsing starts the parsing from the root of the parse tree and determine from the input token stream how to develop the parse tree. Top-down parser perform this duty by using the information of the lookahead tokens and of the productions in the grammar. Furthermore, it scan the incoming tokens from left to right. Top-down parsing can be performed with a set of recursive functions or through table-driven approach which is easy to construct and modify [7]. Top-down parser can only parse a restricted class of a context-free grammar. Fundamentally, the context-free grammar (CFG) is one of a powerful concept of computational linguistics, programming languages and computer science. In computer science the context-free grammar is generally used to describe the syntax of programming languages [8]. It is also very useful in the description of restricted natural language based programming languages [9, 10]. Formally the context-free grammar is a four tuple (N,T,P,S), where N is a set of nonterminals. T is a set of terminals. Nonterminals and terminals are always disjoint. P is set of productions or rules, in which each rule takes the following form [11]: B → α, where B ∈ N and α ∈ (N U T) *. S is the start symbol and a subset of N. The context-free grammar is usually normalized before using in any real solution and parsing is no exception. The normalization usually restrict the structure of a grammar either by restricting the size of production or the arrangement of symbols in the productions. A restricted class of context-free grammar is used in top-down parsing. The top-down parsing has two main types: recursive-descent parsing and LL(1) parsing. The recursive-descent parsing analyze the input by using the recursive procedures where each such procedure represents one of the production of its context-free grammar. On the other hand, the LL(1) parsing generally does not require backtracking during building a parse tree and commonly used in compilers. LL(1) parsing is one of a simple and viable choice for syntax analysis which is a second phase of a compiler. Syntax analyzer is one of a time consuming phase of a compiler and consequently the development of efficient parser remains an active area of research in computer science and computational linguistics. This paper introduced a terminal prefixing which is a more general form of Greibach normal form and apply this terminal prefixing to simplify the LL(1) grammar and determine its impact on derivation, parse tree and LL(1) parsing. The paper is structured as follows. In section 2 the LL(1) parsing is described and the major work conducted in the improvement of LL parsing is included in section 3. In section 4 the terminal prefixing is introduced. The initial evaluation and results are included in section 5. Finally, section 6 describes the conclusion. *Corresponding Author : Dr. Muhammad Shumail Naveed, Department of Computer Science, University of Balochistan, Quetta, Pakistan. Email: [email protected] 64 Naveed, 2017 2. LL(1) Parsing The LL(1) parsing is a top-down, non-recursive predictive parsing technique which requires no backtracking and constructed over the class of grammars called LL(1) grammars. The fundamental concepts of LL(1) grammars were formally presented by Lewis and Stearns [12], and the attributes of LL(1) grammar were refined by Rosenkrantz and Stearnsc [13]. LL(1) parsing used an explicit stack to parse the sentence. It is also used in the definition of other parsing techniques especially in a bottom-up parsing algorithms [8]. The first “L” in LL(1) indicates that input is processed from left to right. The second “L” indicates that it performs the left most derivation for the input, and “1” indicates that in each step only one symbol of input is used as a lookahead to predict the parsing action. In LL parsing the lookahead symbols may be the multiple symbols. LL parsing is the foundation of a contemporary parser generator ANother Tool for Language Recognition (ANTLR) [14]. LL(1) parsing is usually implemented through a table called the parsing table. The construction of a LL(1) parsing table involves the computing of two functions associated with the LL(1) grammar. These two functions, FIRST and FOLLLOW facilitate to fill up the entries of LL(1) parsing table. LL(1) grammars are the restricted class of context-free grammars and therefore LL(1) parsing is inherently a restricted class of parsing technique. There are several limitations of LL(1) parsing. First, it cannot handle the ambiguous grammar. Similarly it cannot handle the left recursive grammar and it also can’t handle a grammar containing the conflict. If G is a context-free grammar and L( G) is a language defined over G, then grammar G is said to be ambiguous if there is at least one string in L(G) for which G generates multiple parse trees [15]. Ambiguity is a common characteristic of natural language; however in programming language it is necessary to remove the ambiguity if possible. Concretely there is no well defined algorithm for the identification and removal of ambiguity, nevertheless in most of the situations it is possible to identify and remove the ambiguity. Principally any language L can be defined by different grammars. If every grammar that defines L is ambiguous grammar, then the said language is known as inherently ambiguous. A grammar G is said to be left recursive, if it contains at least one left recursive production. A left recursive is one that takes the following form: X → X α X → β Where X is a nonterminal and α and β are the strings of nonterminals and terminals and β does not begin with X. The production having left recursion can be handled by rewriting that production into two rules: one that defines β first and another one that defines the repetition of α: X → β X’ X’ → α X’ | ε Conflict is an undesirable condition for LL(1) parser, that occur when multiple grammar productions choices have a common prefix: X → β α X → β γ This problem can be solved by left factoring. Through left factoring the β is factored on the left and rule is rewritten as: X → β X’ X’ → α X’ → γ 3. Previous Work For decades, experts and scientists have worked towards expanding the recognition strength and the performance of LL parsing. Parr et al. [16] introduced the ALL(*) parsing which integrates the plainness, effectiveness and predictability of traditional LL(k) parsers with the ability of a GLR-type method to take parsing decisions. The significant novelty is to move analysis to parse-time, which allows ALL(*) deal context-free grammar without any left recursion. Theoretically ALL(*) is O(n 4) but executes linearly on grammar used in practice. Shilling [17] introduced an efficient incremental parsing algorithm. It is devised for the language-based editors and implemented in Fred language-based editor. It can parse input at intervals of extremely tiny granularity and bound the amount of incremental parsing required when alterations are perform internal to the editing buffer. In [18], the notion of LLLR is introduced. An LLLR parser generates the left parse of the input in a linear time and can be built for any LL(k) grammar. It works as an LL(k) and used as a foundation for a syntax-directed translation. LL(k) is the backbone of the LLLR(k) parser and generate semantic action by means of top-down approach just similar to the canonical LR(k) parser. LLLR parsing is similar to LL(*) parsing but does not perform backtracking and utilize LR(k) parser to manage the LL(k) conflicts [19]. The LL(k) parser of an LL(k) grammar is fundamentally a single state deterministic push-down automaton that performs left parse of the input string. The LL( k) parsing is evolved as duple of LR ( k) parsing.
Recommended publications
  • LLLR Parsing: a Combination of LL and LR Parsing
    LLLR Parsing: a Combination of LL and LR Parsing Boštjan Slivnik University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia [email protected] Abstract A new parsing method called LLLR parsing is defined and a method for producing LLLR parsers is described. An LLLR parser uses an LL parser as its backbone and parses as much of its input string using LL parsing as possible. To resolve LL conflicts it triggers small embedded LR parsers. An embedded LR parser starts parsing the remaining input and once the LL conflict is resolved, the LR parser produces the left parse of the substring it has just parsed and passes the control back to the backbone LL parser. The LLLR(k) parser can be constructed for any LR(k) grammar. It produces the left parse of the input string without any backtracking and, if used for a syntax-directed translation, it evaluates semantic actions using the top-down strategy just like the canonical LL(k) parser. An LLLR(k) parser is appropriate for grammars where the LL(k) conflicting nonterminals either appear relatively close to the bottom of the derivation trees or produce short substrings. In such cases an LLLR parser can perform a significantly better error recovery than an LR parser since the most part of the input string is parsed with the backbone LL parser. LLLR parsing is similar to LL(∗) parsing except that it (a) uses LR(k) parsers instead of finite automata to resolve the LL(k) conflicts and (b) does not perform any backtracking.
    [Show full text]
  • Compiler Design
    CCOOMMPPIILLEERR DDEESSIIGGNN -- PPAARRSSEERR http://www.tutorialspoint.com/compiler_design/compiler_design_parser.htm Copyright © tutorialspoint.com In the previous chapter, we understood the basic concepts involved in parsing. In this chapter, we will learn the various types of parser construction methods available. Parsing can be defined as top-down or bottom-up based on how the parse-tree is constructed. Top-Down Parsing We have learnt in the last chapter that the top-down parsing technique parses the input, and starts constructing a parse tree from the root node gradually moving down to the leaf nodes. The types of top-down parsing are depicted below: Recursive Descent Parsing Recursive descent is a top-down parsing technique that constructs the parse tree from the top and the input is read from left to right. It uses procedures for every terminal and non-terminal entity. This parsing technique recursively parses the input to make a parse tree, which may or may not require back-tracking. But the grammar associated with it ifnotleftfactored cannot avoid back- tracking. A form of recursive-descent parsing that does not require any back-tracking is known as predictive parsing. This parsing technique is regarded recursive as it uses context-free grammar which is recursive in nature. Back-tracking Top- down parsers start from the root node startsymbol and match the input string against the production rules to replace them ifmatched. To understand this, take the following example of CFG: S → rXd | rZd X → oa | ea Z → ai For an input string: read, a top-down parser, will behave like this: It will start with S from the production rules and will match its yield to the left-most letter of the input, i.e.
    [Show full text]
  • The Use of Predicates in LL(K) and LR(K) Parser Generators
    Purdue University Purdue e-Pubs ECE Technical Reports Electrical and Computer Engineering 7-1-1993 The seU of Predicates In LL(k) And LR(k) Parser Generators (Technical Summary) T. J. Parr Purdue University School of Electrical Engineering R. W. Quong Purdue University School of Electrical Engineering H. G. Dietz Purdue University School of Electrical Engineering Follow this and additional works at: http://docs.lib.purdue.edu/ecetr Parr, T. J.; Quong, R. W.; and Dietz, H. G., "The sU e of Predicates In LL(k) And LR(k) Parser Generators (Technical Summary) " (1993). ECE Technical Reports. Paper 234. http://docs.lib.purdue.edu/ecetr/234 This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information. TR-EE 93-25 JULY 1993 The Use of Predicates In LL(k) And LR(k) Parser ene era tors? (Technical Summary) T. J. Purr, R. W. Quong, and H. G. Dietz School of Electrical Engineering Purdue University West Lafayette, IN 47907 (317) 494-1739 [email protected] Abstract Although existing LR(1) or U(1) parser generators suffice for many language recognition problems, writing a straightforward grammar to translate a complicated language, such as C++ or even C, remains a non-trivial task. We have often found that adding translation actions to the grammar is harder than writing the grammar itself. Part of the problem is that many languages are context-sensitive. Simple, natural descriptions of these languages escape current language tool technology because they were not designed to handle semantic information.
    [Show full text]
  • Part 1: Introduction
    Part 1: Introduction By: Morteza Zakeri PhD Student Iran University of Science and Technology Winter 2020 Agenda • What is ANTLR? • History • Motivation • What is New in ANTLR v4? • ANTLR Components: How it Works? • Getting Started with ANTLR v4 February 2020 Introduction to ANTLR – Morteza Zakeri 2 What is ANTLR? • ANTLR (pronounced Antler), or Another Tool For Language Recognition, is a parser generator that uses LL(*) for parsing. • ANTLR takes as input a grammar that specifies a language and generates as output source code for a recognizer for that language. • Supported generating code in Java, C#, JavaScript, Python2 and Python3. • ANTLR is recursive descent parser Generator! (See Appendix) February 2020 Introduction to ANTLR – Morteza Zakeri 3 Runtime Libraries and Code Generation Targets • There is no language specific code generators • There is only one tool, written in Java, which is able to generate Lexer and Parser code for all targets, through command line options. • The available targets are the following (2020): • Java, C#, C++, Swift, Python (2 and 3), Go, PHP, and JavaScript. • Read more: • https://github.com/antlr/antlr4/blob/master/doc/targets.md 11 February 2020 Introduction to ANTLR – Morteza Zakeri Runtime Libraries and Code Generation Targets • $ java -jar antlr4-4.8.jar -Dlanguage=CSharp MyGrammar.g4 • https://github.com/antlr/antlr4/tree/master/runtime/CSharp • https://github.com/tunnelvisionlabs/antlr4cs • $ java -jar antlr4-4.8.jar -Dlanguage=Cpp MyGrammar.g4 • https://github.com/antlr/antlr4/blob/master/doc/cpp-target.md • $ java -jar antlr4-4.8.jar -Dlanguage=Python3 MyGrammar.g4 • https://github.com/antlr/antlr4/blob/master/doc/python- target.md 11 February 2020 Introduction to ANTLR – Morteza Zakeri History • Initial release: • February 1992; 24 years ago.
    [Show full text]
  • Compiler Construction
    Compiler construction PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 10 Dec 2011 02:23:02 UTC Contents Articles Introduction 1 Compiler construction 1 Compiler 2 Interpreter 10 History of compiler writing 14 Lexical analysis 22 Lexical analysis 22 Regular expression 26 Regular expression examples 37 Finite-state machine 41 Preprocessor 51 Syntactic analysis 54 Parsing 54 Lookahead 58 Symbol table 61 Abstract syntax 63 Abstract syntax tree 64 Context-free grammar 65 Terminal and nonterminal symbols 77 Left recursion 79 Backus–Naur Form 83 Extended Backus–Naur Form 86 TBNF 91 Top-down parsing 91 Recursive descent parser 93 Tail recursive parser 98 Parsing expression grammar 100 LL parser 106 LR parser 114 Parsing table 123 Simple LR parser 125 Canonical LR parser 127 GLR parser 129 LALR parser 130 Recursive ascent parser 133 Parser combinator 140 Bottom-up parsing 143 Chomsky normal form 148 CYK algorithm 150 Simple precedence grammar 153 Simple precedence parser 154 Operator-precedence grammar 156 Operator-precedence parser 159 Shunting-yard algorithm 163 Chart parser 173 Earley parser 174 The lexer hack 178 Scannerless parsing 180 Semantic analysis 182 Attribute grammar 182 L-attributed grammar 184 LR-attributed grammar 185 S-attributed grammar 185 ECLR-attributed grammar 186 Intermediate language 186 Control flow graph 188 Basic block 190 Call graph 192 Data-flow analysis 195 Use-define chain 201 Live variable analysis 204 Reaching definition 206 Three address
    [Show full text]
  • ANTLR Reference PDF Manual
    ANTLR Reference Manual Home | Download | News | About ANTLR | Support Latest version is 2.7.3. Download now! » » Home » Download ANTLR Reference Manual » News »Using ANTLR » Documentation ANTLR » FAQ » Articles Reference Manual » Grammars Credits » File Sharing » Code API Project Lead and Supreme Dictator Terence Parr » Tech Support University of San Franciso »About ANTLR Support from » What is ANTLR jGuru.com » Why use ANTLR Your View of the Java Universe » Showcase Help with initial coding » Testimonials John Lilly, Empathy Software » Getting Started C++ code generator by » Software License Peter Wells and Ric Klaren » ANTLR WebLogs C# code generation by »StringTemplate Micheal Jordan, Kunle Odutola and Anthony Oguntimehin. »TML Infrastructure support from Perforce: »PCCTS The world's best source code control system Substantial intellectual effort donated by Loring Craymer Monty Zukowski Jim Coker Scott Stanchfield John Mitchell Chapman Flack (UNICODE, streams) Source changes for Eclipse and NetBeans by Marco van Meegen and Brian Smith ANTLR Version 2.7.3 March 22, 2004 What's ANTLR http://www.antlr.org/doc/index.html (1 of 6)31.03.2004 17:11:46 ANTLR Reference Manual ANTLR, ANother Tool for Language Recognition, (formerly PCCTS) is a language tool that provides a framework for constructing recognizers, compilers, and translators from grammatical descriptions containing Java, C++, or C# actions [You can use PCCTS 1.xx to generate C-based parsers]. Computer language translation has become a common task. While compilers and tools for traditional computer languages (such as C or Java) are still being built, their number is dwarfed by the thousands of mini-languages for which recognizers and translators are being developed.
    [Show full text]
  • Town-Down LL Parsing
    Lecture Notes on Top-Down Predictive LL Parsing 15-411: Compiler Design Frank Pfenning∗ Lecture 8 1 Introduction In this lecture we discuss a parsing algorithm that traverses the input string from left to right giving a left-most derivation and makes a decision on ¯ ¯ which grammar production to use based on the first character/token of the input string. This parsing algorithm is called LL(1). If that were ambigu- ous, the grammar would have to be rewritten to fall into this class, which is not always possible. Most hand-written parsers are recursive descent parsers, which follow the LL(1) principles. Alternative presentations of the material in this lecture can be found in a paper by Shieber et al. [SSP95]. The textbook [App98, Chapter 3] covers parsing. For more detailed background on parsing, also see [WM95]. 2 LL(1) Parsing We have seen in the previous section, that the general idea of recursive de- scent parsing without restrictions forces us to non-deterministically choose between several productions which might be applied and potentially back- track if parsing gets stuck after a choice, or even loop (if the grammar is left-recursive). Backtracking is not only potentially very inefficient, but it makes it difficult to produce good error messages in case the string is not grammatically well-formed. Say we try three different ways to parse a given input and all fail. How could we say which of these is the source ∗With edits by Andre´ Platzer LECTURE NOTES L8.2 Top-Down Predictive LL Parsing of the error? How could we report an error location? This is compounded because nested choices multiply the number of possibilities.
    [Show full text]
  • Types of Parse Trees
    TOP DOWN PARSERS C.Naga Raju B.Tech(CSE),M.Tech(CSE),PhD(CSE),MIEEE,MCSI,MISTE Professor Department of CSE YSR Engineering College of YVU Proddatur 6/15/2020 Prof.C.NagaRaju YSRCE of YVU 9949218570 • Contents • Role of the Parser • Types of parsers • Recursive descent parser • Recursive Descent Predictive Parser • Non-Recursive Predictive Parser • GATE Problems and Solutions 6/15/2020 Prof.C.NagaRaju YSRCE of YVU 9949218570 The Role of the Parser • Parsing is the process of determining, a stream of tokens generated by a grammar. • The Main Tasks of the Parsers are to ✓ Obtain a String of Tokens from the Lexical Analyzer ✓ Verify the syntactic structure of the string, related to the Programming Language ✓ If it finds any Syntax Errors ,send to error handling 6/15/2020 Prof.C.NagaRaju YSRCE of YVU 9949218570 Types of Parsers Parsing Techniques are divided into Two general classes Top-Down Parsing Bottom-Up Parsing 6/15/2020 Prof.C.NagaRaju YSRCE of YVU 9949218570 Top-Down Parser • Top-Down Parser parses an input string of tokens with Left- Most Derivation. • Top-Down Parser traverse the parse tree from the Root to the Leaves with left most derivation Ex: for the given grammar and input string E→E + E E→E * E Id * Id + Id E→ id 6/15/2020 Prof.C.NagaRaju YSRCE of YVU 9949218570 construction of Top-Down Parsing E E →E * E E → E * E E * E →Id * E E → id →Id * E + E E → E + E id E + E →Id * Id + E E → id →Id * Id + Id E → id id id Top-Down Parsing • Top-Down Parsing is divided into Two Techniques • Recursive Descent Parser (TDP with Backtracking) • Predictive Parser (TDP without Backtracking) Recursive Descent Parser • Top-Down Parsing with Backtracking is called as Recursive Descent Parser • It requires backtracking to find the correct production.
    [Show full text]
  • Generalized LR Parsing in Haskell
    Generalized LR Parsing in Haskell Jo~ao Fernandes [email protected] Techn. Report DI-PURe-04.11.01 2004, October PURe Program Understanding and Re-engineering: Calculi and Applications (Project POSI/ICHS/44304/2002) Departamento de Informatica´ da Universidade do Minho Campus de Gualtar | Braga | Portugal DI-PURe-04.11.01 Generalized LR Parsing in Haskell by Jo~ao Fernandes Abstract Parser combinators elegantly and concisely model generalised LL parsers in a purely functional language. They nicely illustrate the concepts of higher- order functions, polymorphic functions and lazy evaluation. Indeed, parser combinators are often presented as a motivating example for functional pro- gramming. Generalised LL, however, has an important drawback: it does not handle (direct nor indirect) left recursive context-free grammars. In a different context, the (non-functional) parsing community has been doing a considerable amount of work on generalised LR parsing. Such parsers handle virtually any context-free grammar. Surprisingly, little work has been done on generalised LR by the functional programming community ( [9] is a good exception). In this report, we present a concise and elegant implementation of an incremental generalised LR parser generator and interpreter in Haskell. For good computational complexity, such parsers rely heavily on lazy evaluation. Incremental evaluation is obtained via function memoisation. An implementation of our generalised LR parser generator is available as the HaGLR tool. We assess the performance of this tool with some benchmark examples. 1 Motivation The Generalised LR parsing algorithm was first introduced by Tomita [14] in the context of natural language processing. Several improvements have been proposed in the literature, among others to handle context-free gram- mars with hidden left-recursion [11, 7].
    [Show full text]
  • CSE P 501 – Compilers
    CSE P 501 – Compilers LL and Recursive-Descent Parsing Hal Perkins Autumn 2019 UW CSE P 501 Autumn 2019 F-1 Agenda • Top-Down Parsing • Predictive Parsers • LL(k) Grammars • Recursive Descent • Grammar Hacking – Left recursion removal – Factoring UW CSE P 501 Autumn 2019 F-2 Basic Parsing Strategies (1) • Bottom-up – Build up tree from leaves • Shift next input or reduce a handle • Accept when all input read and reduced to start symbol of the grammar – LR(k) and subsets (SLR(k), LALR(k), …) remaining input UW CSE P 501 Autumn 2019 F-3 Basic Parsing Strategies (2) • Top-Down – Begin at root with start symbol of grammar – Repeatedly pick a non-terminal and expand – Success when expanded tree matches input – LL(k) A UW CSE P 501 Autumn 2019 F-4 Top-Down Parsing • Situation: have completed part of a left-most derivation S =>* wAa =>* wxy • Basic Step: Pick some production A ::= b1 b2 … bn that will properly expand A to match the input – Want this to be deterministic (i.e., no backtracking) A UW CSE P 501 Autumn 2019 F-5 Predictive Parsing • If we are located at some non-terminal A, and there are two or more possible productions A ::= a A ::= b we want to make the correct choice by looking at just the next input symbol • If we can do this, we can build a predictive parser that can perform a top-down parse without backtracking UW CSE P 501 Autumn 2019 F-6 Example • Programming language grammars are often suitable for predictive parsing • Typical example stmt ::= id = exp ; | return exp ; | if ( exp ) stmt | while ( exp ) stmt If the next part of the input begins with the tokens IF LPAREN ID(x) … we should expand stmt to an if-statement UW CSE P 501 Autumn 2019 F-7 LL(1) Property • A grammar has the LL(1) property if, for all non-terminals A, if productions A ::= a and A ::= b both appear in the grammar, then it is true that FIRST(a) ∩ FIRST(b) = Ø (Provided that neither a or b is ε (i.e., empty).
    [Show full text]
  • LL and LR Parsing with Abstract Machines, Or Syntax for Semanticists Midlands Graduate School 2016
    LL and LR parsing with abstract machines, or Syntax for Semanticists Midlands Graduate School 2016 Hayo Thielecke University of Birmingham http://www.cs.bham.ac.uk/~hxt April 14, 2016 S S A π ρ A α π ρ α v w v w Contents Introduction Background: grammars, derivations, and parse trees Parsing machines LL machine LR machine LL(1) machine using lookahead LR(0) machine using LR items Parser generators Recursive descent: call stack as parsing stack Should a naturalist who had never studied the elephant except by means of the microscope think himself sufficiently acquainted with that animal? Henri Poincar´e Intro to this MGS course I aimed at (beginning) PhD students with background in CS and/or math I some classic compiling I operational semantics via abstract machine, in a very simple setting, more so than lambda I some simple correctness proof on machines by induction on steps I many correctness properties look obvious I until you try to do the induction I need to find induction hypothesis I if we have a good grasp of the relevant structures, the induction should be easy Some connections to other MGS 2016 modules: There are some connections to other MGS course, perhaps not obviously so: I denotational semantics: the two kinds of semantics are operational and denotational I category theory: think of machine runs as morphisms I linear logic: syntax is a substructrual logic (e.g. Lambek) I type theory: operational semantics has connections via type preservation etc These lectures - aims and style I Some of this material is implicit in what you can find in any comprehensive compilers textbook I such as Dragon Book = Aho, Lam, Sethi, Ullman Compilers: Principles, Techniques, and Tools I Hopcroft, John E.; Ullman, Jeffrey D.
    [Show full text]
  • The Modelcc Model-Based Parser Generator
    The ModelCC Model-Based Parser Generator Luis Quesada, Fernando Berzal, and Juan-Carlos Cubero Department of Computer Science and Artificial Intelligence, CITIC, University of Granada, Granada 18071, Spain {lquesada, fberzal, jc.cubero}@decsai.ugr.es Abstract Formal languages let us define the textual representation of data with precision. Formal grammars, typically in the form of BNF-like productions, describe the language syntax, which is then annotated for syntax-directed translation and completed with semantic actions. When, apart from the textual representation of data, an explicit representation of the corresponding data structure is required, the language designer has to devise the mapping between the suitable data model and its proper language specification, and then develop the conversion procedure from the parse tree to the data model instance. Unfortunately, whenever the format of the textual representation has to be modified, changes have to propagated throughout the entire language processor tool chain. These updates are time-consuming, tedious, and error-prone. Besides, in case different applications use the same language, several copies of the same language specification have to be maintained. In this paper, we introduce ModelCC, a model-based parser generator that decouples language specification from language processing, hence avoiding many of the problems caused by grammar-driven parsers and parser generators. ModelCC incorporates reference resolution within the parsing process. Therefore, instead of returning mere abstract syntax trees, ModelCC is able to obtain abstract syntax graphs from input strings. Keywords: Data models, text mining, language specification, parser generator, abstract syntax graphs, Model-Driven Software Development. 1. Introduction arXiv:1501.03458v1 [cs.FL] 11 Jan 2015 A formal language represents a set of strings [45].
    [Show full text]