Approximating Context-Free Grammars for Parsing and Verification, by Sylvain Schmitz
Total Pages: 16
File Type: PDF, Size: 1020 KB
Recommended publications
Rewriting Games on Nested Words
Slides by Martin Schuster and Thomas Schwentick, TU Dortmund (Highlights 2014). Context-free games on strings: the basic idea is to view a context-free grammar as a two-player game. JULIET chooses symbols, ROMEO chooses replacement strings, and JULIET wins if a target string is reached. Example: with the rule f → af | b, the target set T = {afba, aafafa, abafa}, and the initial word affa, one play proceeds affa → afafa → abafa, reaching a string in T. The algorithmic problem JWIN asks: given a context-free game G and a string w, does JULIET have a winning strategy on w in G? Previous results from Muscholl, Schwentick and Segoufin (2006): JWIN is undecidable in general, which motivates the left-to-right (L2R) restriction. With the L2R restriction and a DFA-represented target language, JWIN is EXPTIME-complete for finite or (D/N)FA-represented regular replacement languages, and PTIME-complete for finite replacement languages and bounded recursion. Application: Active XML schema rewriting. A server schema (City → Name, Weather, Events; Events → @event_svc, where the service event_svc returns (Sports|Concert)*) should be rewritten into a client schema (City → Name, Weather, Events; Events → (Sports|Concert)*); the goal is to rewrite server documents into the client schema. Milo et al. (2003) handled DTD schema languages via a reduction to string rewriting and claimed a PTIME algorithm for bounded recursion. Our interest is in stronger schema languages (XML Schema), with the main target of identifying tractable restrictions; this leads to context-free games on nested words. Nested words capture the correct nesting of tags over a label alphabet Σ and are exactly the linearisations of Σ-labelled forests; for example, <a></a><a><b><c></c></b><a></a></a> linearises the forest consisting of a leaf a and a tree a whose children are b (with child c) and a.
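To make the game mechanics concrete, here is a small sketch (not the algorithm from the slides or paper) that decides the example above by brute force: JULIET may rewrite any occurrence of a function symbol, ROMEO answers adversarially with one of finitely many replacement strings, and the search is cut off at a fixed depth. The L2R restriction and regular (rather than finite) replacement and target languages are deliberately ignored here.

```python
# Depth-bounded brute-force check of the rewriting game sketched above.
# Assumptions (not from the paper): finite replacement sets, an explicit
# finite target set, no left-to-right restriction, and a depth cut-off.

def juliet_wins(word, funcs, target, depth):
    """True if JULIET can force a word in `target` within `depth` rewrites."""
    if word in target:
        return True
    if depth == 0:
        return False
    for i, sym in enumerate(word):
        if sym in funcs:
            # JULIET plays position i; ROMEO answers with the worst
            # replacement for her, so she must win against all of them.
            if all(juliet_wins(word[:i] + rep + word[i + 1:],
                               funcs, target, depth - 1)
                   for rep in funcs[sym]):
                return True
    return False

# The example above: rule f -> af | b, start word "affa",
# target T = {afba, aafafa, abafa}.
print(juliet_wins("affa", {"f": ["af", "b"]},
                  {"afba", "aafafa", "abafa"}, depth=4))   # True
```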
Regular Description of Context-Free Graph Languages
Joost Engelfriet and Vincent van Oostrom, Department of Computer Science, Leiden University, The Netherlands. Abstract: A set of labeled graphs can be defined by a regular tree language and one regular string language for each possible edge label, as follows. For each tree t from the regular tree language, the graph gr(t) has the same nodes as t, with the same labels, and there is an edge with label σ from node x to node y if the string of labels of the nodes on the shortest path from x to y in t belongs to the regular string language for σ. Slightly generalizing this definition scheme, we allow gr(t) to have only those nodes of t that have certain labels, and we allow a relabeling of these nodes. It is shown that in this way exactly the class of C-edNCE graph languages generated by C-edNCE graph grammars is obtained, one of the largest known classes of context-free graph languages. Introduction: There are many kinds of context-free graph grammars; see e.g. [ENRR, EKR]. Some are node rewriting and others are edge rewriting. In both cases a production of the grammar is of the form X → (D, C). Applying such a production to a labeled graph H consists of removing a node or edge labeled X from H, replacing it by the graph D, and connecting D to the remainder of H according to the embedding procedure C. Since these grammars are context-free in the sense that one node or edge is replaced, their derivations can be modeled by derivation trees.
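The definition of gr(t) is easy to prototype. The sketch below assumes a naive encoding that is not the paper's notation: a tree is a parent map, and each regular string language is given as a Python regular expression over node labels.

```python
import re

# Sketch of the gr(t) construction described above (assumed encoding):
# a tree is {node: (label, parent)} with parent None for the root;
# edge_langs maps each edge label sigma to a regex over node labels.
# gr(t) gets an edge x --sigma--> y whenever the labels on the shortest
# path from x to y match edge_langs[sigma].

def path_labels(tree, x, y):
    def ancestors(v):
        out = []
        while v is not None:
            out.append(v)
            v = tree[v][1]
        return out
    up, down = ancestors(x), ancestors(y)
    lca = next(v for v in up if v in down)        # lowest common ancestor
    xs = up[:up.index(lca) + 1]                   # x .. lca
    ys = down[:down.index(lca)]                   # y .. just below lca
    return "".join(tree[v][0] for v in xs + list(reversed(ys)))

def gr(tree, edge_langs):
    edges = set()
    for sigma, regex in edge_langs.items():
        pat = re.compile(regex)
        for x in tree:
            for y in tree:
                if x != y and pat.fullmatch(path_labels(tree, x, y)):
                    edges.add((x, sigma, y))
    return edges

# Example: a chain a - b - c; draw an edge labelled 'e' between the two
# endpoints of every path whose label string reads "abc".
t = {1: ("a", None), 2: ("b", 1), 3: ("c", 2)}
print(gr(t, {"e": "abc"}))        # {(1, 'e', 3)}
```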
Context-Free Languages & Grammars (CFLs & CFGs)
Lecture slides; reading: Chapter 5. Not all languages are regular, so what happens to the languages which are not regular? Can we still come up with a language recognizer, i.e., something that will accept (or reject) strings that belong (or do not belong) to the language? Context-free languages are a language class larger than the class of regular languages, supporting a natural, recursive notation called the "context-free grammar". Applications: parse trees, compilers, XML. (Regular languages correspond to FA/RE, context-free languages to PDA/CFG.) An example: a palindrome is a word that reads identically from both ends, e.g., madam, redivider, malayalam, 010010010. Let L = {w | w is a binary palindrome}. Is L regular? No. Proof: let w = 0^N 1 0^N, taking N to be the pumping-lemma constant. By the pumping lemma, w can be rewritten as xyz such that xy^k z is also in L for any k ≥ 0. But |xy| ≤ N and y ≠ ε imply y = 0^+, so xy^k z will not be in L for k = 0, a contradiction. But the language of palindromes is a CFL, because it supports recursive substitution in the form of a CFG. We can construct a grammar with a single variable (non-terminal) A and the productions A → 0A0 | 1A1 | 0 | 1 | ε (equivalently, the five productions A → ε, A → 0, A → 1, A → 0A0, A → 1A1). How does this grammar work? An input string belongs to the language (i.e., is accepted) iff it can be generated by the CFG G. Example: w = 01110 is generated by G as A ⇒ 0A0 ⇒ 01A10 ⇒ 01110.
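Because the grammar is so small, its membership test can be written directly as a recursive function that mirrors the productions. This is a minimal illustrative sketch, not part of the slides.

```python
# Direct recursive membership test for the palindrome grammar
# A -> 0A0 | 1A1 | 0 | 1 | ε  (illustrative sketch).

def generated_by_A(w):
    if w in ("", "0", "1"):                   # A -> ε | 0 | 1
        return True
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        return generated_by_A(w[1:-1])        # A -> 0A0 | 1A1
    return False

assert generated_by_A("01110")    # derived as A => 0A0 => 01A10 => 01110
assert not generated_by_A("011")
```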
A Declarative Extension of Parsing Expression Grammars for Recognizing Most Programming Languages
Journal of Information Processing, Vol. 24, No. 2, 256–264 (Mar. 2016). [DOI: 10.2197/ipsjjip.24.256] Regular Paper. Tetsuro Matsumura and Kimio Kuramitsu. Received: May 18, 2015; accepted: November 5, 2015. Abstract: Parsing Expression Grammars are a popular foundation for describing syntax. Unfortunately, several syntactic features of programming languages are still hard to recognize with pure PEGs. Notorious cases appear: typedef-defined names in C/C++, indentation-based code layout in Python, and HERE documents in many scripting languages. To recognize such PEG-hard syntax, we have addressed a declarative extension to PEGs. The "declarative" extension means no programmed semantic actions, which are traditionally used to realize the extended parsing behavior. Nez is our extended PEG language, including symbol tables and conditional parsing. This paper demonstrates that the use of Nez extensions can realize many practical programming languages, such as C, C#, Ruby, and Python, which involve PEG-hard syntax. Keywords: parsing expression grammars, semantic actions, context-sensitive syntax, case studies on programming languages. 1. Introduction. Parsing Expression Grammars [5], or PEGs, are a popular foundation for describing programming language syntax [6], [15]. Indeed, the formalism of PEGs has many desirable properties, including deterministic behaviors, unlimited look-aheads, and integrated lexical analysis known as scanner-less parsing. Semantic actions, however, invalidate the declarative property of PEGs, thus resulting in reduced reusability of grammars. As a result, many developers need to redevelop grammars for their software engineering tools. In this paper, we propose a declarative extension of PEGs for recognizing context-sensitive syntax; the "declarative" extension means no arbitrary semantic actions written in a general-purpose programming language.
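For readers unfamiliar with PEGs, the following sketch shows plain PEG semantics (literals, sequence, ordered choice, recursion) as parser combinators. It illustrates only the baseline formalism and contains nothing Nez-specific such as symbol tables or conditional parsing; the combinator names are my own.

```python
# Minimal PEG-style recognizer combinators (illustrative sketch, not Nez).
# A parser takes (text, pos) and returns the new position or None on failure.

def lit(s):
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    # PEG ordered choice: the first alternative that succeeds wins,
    # and later alternatives are never reconsidered.
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

# Balanced parentheses:  P <- '(' P ')' P / ε
def P(text, pos):
    return choice(seq(lit("("), P, lit(")"), P), lit(""))(text, pos)

assert P("(()())", 0) == 6      # consumes the whole input
assert P(")(", 0) == 0          # ε alternative succeeds without consuming
```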
Weighted Regular Tree Grammars with Storage
Discrete Mathematics and Theoretical Computer Science, DMTCS vol. 20:1, 2018, #26. Zoltán Fülöp (Department of Foundations of Computer Science, University of Szeged, Hungary), Luisa Herrmann and Heiko Vogler (Faculty of Computer Science, Technische Universität Dresden, Germany). Received 19th May 2017, revised 11th June 2018, accepted 22nd June 2018. We introduce weighted regular tree grammars with storage as a combination of (a) regular tree grammars with storage and (b) weighted tree automata over multioperator monoids. Each weighted regular tree grammar with storage generates a weighted tree language, which is a mapping from the set of trees to the multioperator monoid. We prove that, for multioperator monoids canonically associated to particular strong bimonoids, the support of the generated weighted tree languages can be generated by (unweighted) regular tree grammars with storage. We characterize the class of all generated weighted tree languages by the composition of three basic concepts. Moreover, we prove results on the elimination of chain rules and of finite storage types, and we characterize weighted regular tree grammars with storage by a new weighted MSO-logic. Keywords: regular tree grammars, weighted tree automata, multioperator monoids, storage types, weighted MSO-logic. 1. Introduction. In automata theory, weighted string automata with storage are a very recent topic of research [HV15, HV16, VDH16]. This model generalizes finite-state string automata in two directions. On the one hand, it is based on the concept of "sequential program + machine", which Scott [Sco67] proposed in order to harmonize the wealth of upcoming automaton models.
A Note on Nested Words
Andreas Blass (Mathematics Department, University of Michigan, Ann Arbor, MI 48109) and Yuri Gurevich (Microsoft Research, Redmond, WA 98052). Abstract: For every regular language of nested words, the underlying strings form a context-free language, and every context-free language can be obtained in this way. Nested words and nested-word automata are generalized to motley words and motley-word automata. Every motley-word automaton is equivalent to a deterministic one. For every regular language of motley words, the underlying strings form a finite intersection of context-free languages, and every finite intersection of context-free languages can be obtained in this way. 1. Introduction. In [1], Rajeev Alur and P. Madhusudan introduced and studied nested words, nested-word automata and regular languages of nested words. A nested word is a string of letters endowed with a so-called nested relation. Alur and Madhusudan prove that every nested-word automaton is equivalent to a deterministic one. In Section 2, we recall some of their definitions. We quote from the introduction to [1]: "The motivating application area for our results has been software verification. Given a sequential program P with stack-based control flow, the execution of P is modeled as a nested word with nesting edges from calls to returns. Specification of the program is given as a nested word automaton A, and verification corresponds to checking whether every nested word generated by P is accepted by A. Nested-word automata can express a variety of requirements such as stack-inspection properties, pre-post conditions, and interprocedural data-flow properties." In Section 3, we show that all context-free properties, and only context-free properties, can be captured by a nested-word automaton in the following precise sense.
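As a concrete illustration of the quoted model, here is a small sketch of a run of a deterministic nested-word automaton with an explicit stack. The encoding (symbols tagged call/return/internal, partial transition maps as Python dicts, only well-nested inputs) is an assumption made for the example and is not the exact formalism of [1] or of this note.

```python
# Minimal simulator for a deterministic nested-word automaton over a
# simplified alphabet: each input symbol is tagged 'call', 'ret', or 'int'.
# Calls push a hierarchical symbol, returns pop it, internal symbols step
# like an ordinary DFA. Missing transitions reject.

def run_nwa(word, start, d_call, d_int, d_ret, accepting):
    state, stack = start, []
    for tag, a in word:
        if tag == "call":
            nxt = d_call.get((state, a))
            if nxt is None:
                return False
            state, gamma = nxt
            stack.append(gamma)
        elif tag == "ret":
            if not stack:
                return False
            state = d_ret.get((state, stack.pop(), a))
            if state is None:
                return False
        else:
            state = d_int.get((state, a))
            if state is None:
                return False
    return not stack and state in accepting

# Example: calls 'a' matched by returns 'b'; the underlying strings of the
# accepted nested words form the context-free language { a^n b^n | n >= 0 }.
d_call = {("q0", "a"): ("q0", "A")}
d_ret = {("q0", "A", "b"): "q1", ("q1", "A", "b"): "q1"}
word = [("call", "a"), ("call", "a"), ("ret", "b"), ("ret", "b")]
print(run_nwa(word, "q0", d_call, {}, d_ret, {"q0", "q1"}))   # True
```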
Practical Experiments with Regular Approximation of Context-Free Languages
Mark-Jan Nederhof, German Research Center for Artificial Intelligence. Several methods are discussed that construct a finite automaton given a context-free grammar, including both methods that lead to subsets and those that lead to supersets of the original context-free language. Some of these methods of regular approximation are new, and some others are presented here in a more refined form with respect to existing literature. Practical experiments with the different methods of regular approximation are performed for spoken-language input: hypotheses from a speech recognizer are filtered through a finite automaton. 1. Introduction. Several methods of regular approximation of context-free languages have been proposed in the literature. For some, the regular language is a superset of the context-free language, and for others it is a subset. We have implemented a large number of methods and, where necessary, refined them with an analysis of the grammar. We also propose a number of new methods. The analysis of the grammar is based on a sufficient condition for context-free grammars to generate regular languages. For an arbitrary grammar, this analysis identifies sets of rules that need to be processed in a special way in order to obtain a regular language. The nature of this processing differs for the respective approximation methods. For other parts of the grammar, no special treatment is needed and the grammar rules are translated to the states and transitions of a finite automaton without affecting the language.
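One widely used sufficient condition of this kind is strong regularity: within every set of mutually recursive nonterminals, the rules must be either all right-linear or all left-linear with respect to that set. The sketch below is an illustration of such an analysis under an assumed grammar encoding, not Nederhof's implementation; it computes the mutually recursive sets and reports those whose rules violate the condition and would therefore need special processing.

```python
# Illustrative grammar analysis: report the sets of mutually recursive
# nonterminals whose rules are neither all right-linear nor all left-linear
# with respect to the set (a "strong regularity" check).
# A grammar is a dict: nonterminal -> list of right-hand sides (lists of symbols).

def strongly_connected_components(graph):
    """Tarjan's algorithm over a dict-of-lists graph."""
    index, low, on_stack, stack, comps, counter = {}, {}, set(), [], [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return comps

def non_strongly_regular_sets(grammar):
    graph = {a: [s for rhs in rhss for s in rhs if s in grammar]
             for a, rhss in grammar.items()}
    bad = []
    for m in strongly_connected_components(graph):
        rhss = [rhs for a in m for rhs in grammar[a]]
        recursive = len(m) > 1 or any(s in m for rhs in rhss for s in rhs)
        if not recursive:
            continue
        right = all(all(s not in m for s in rhs[:-1]) for rhs in rhss)
        left = all(all(s not in m for s in rhs[1:]) for rhs in rhss)
        if not (right or left):
            bad.append(m)
    return bad

# S -> a S b | ε is self-embedding, so {S} needs special treatment:
print(non_strongly_regular_sets({"S": [["a", "S", "b"], []]}))   # [{'S'}]
```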
Formal Language and Automata Theory (CS21004)
Lecture slides by Soumyajit Dey, CSE, IIT Kharagpur. Announcements: the slides are only a short summary; follow the discussion and the boardwork, and solve problems (apart from those handed out in class). Table of contents: 1. Context-free grammars; 2. Normal forms; 3. Derivations and ambiguities; 4. Pumping lemma for CFLs; 5. PDA; 6. Parsing; 7. CFL properties; 8. DPDA, DCFL; 9. Membership; 10. CSL.
Efficient Recursive Parsing
Christoph M. Hoffmann, Computer Science Department, Purdue University, West Lafayette, Indiana 47907. Purdue e-Pubs, Department of Computer Science Technical Reports, CSD-TR 215 (Report Number 77-215), December 1976. Citation: Hoffmann, Christoph M., "Efficient Recursive Parsing" (1976), Department of Computer Science Technical Reports, Paper 155, https://docs.lib.purdue.edu/cstech/155. Abstract: Algorithms are developed which construct, from a given LL(1) grammar, a recursive descent parser with as much recursion resolved by iteration as is possible without introducing auxiliary memory. Unlike other methods proposed in the literature to arrive at parsers of this kind, the algorithms do not require extensions of the notational formalism, nor do they alter the grammar in any way. The algorithms constructing the parsers operate in O(k·s) steps, where s is the size of the grammar, i.e. the sum of the lengths of all productions, and k is a grammar-dependent constant. A speedup of the algorithm is possible which improves the bound to O(s) for all LL(1) grammars and constructs smaller parsers with some auxiliary memory in the form of parameters.
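As a small illustration of what "recursion resolved by iteration" means for an LL(1) grammar, the sketch below hand-writes a recursive-descent parser for E → T E', E' → '+' T E' | ε, T → 'n' in which the tail recursion of E' is replaced by a while loop. It is a toy example under assumed token names, not the construction of the report.

```python
# Recursive-descent parser sketch for  E -> T E',  E' -> '+' T E' | ε,
# T -> 'n'  with the tail-recursive nonterminal E' turned into a loop.

class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def expect(self, t):
        if self.peek() != t:
            raise SyntaxError(f"expected {t!r} at position {self.i}")
        self.i += 1

    def parse_T(self):
        self.expect("n")                  # T -> 'n'

    def parse_E(self):
        self.parse_T()                    # E -> T E'
        while self.peek() == "+":         # E' -> '+' T E' | ε, as iteration
            self.expect("+")
            self.parse_T()

p = Parser(["n", "+", "n", "+", "n"])
p.parse_E()
assert p.peek() is None                   # whole input consumed
```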
Equivalence of Deterministic Nested Word to Word Transducers
Sławomir Staworko (INRIA, Lille), Grégoire Laurence (University of Lille), Aurélien Lemay (University of Lille) and Joachim Niehren (INRIA), all in the Mostrare project, INRIA & LIFL (CNRS UMR 8022). Abstract: We study the equivalence problem of deterministic nested word to word transducers and show it to be surprisingly robust. Modulo polynomial time reductions, it can be identified with four equivalence problems for diverse classes of deterministic non-copying order-preserving transducers. In particular, we present polynomial time back and forth reductions to the morphism equivalence problem on context-free languages, which is known to be solvable in polynomial time. Keywords: trees, transducers, automata, context-free grammars, XML. 1. Introduction. Nested word automata (NAs) [1] are tree automata that operate on linearizations of unranked trees in a streaming manner. All nodes of the tree are visited twice, as usual in preorder traversals. At the first visit (opening event), a symbol is pushed onto a stack that is popped at the second visit (closing event). NAs were introduced as a reformulation of visibly pushdown automata [2] and proved equivalent to pushdown forest automata [11] and streaming tree automata [6]. More formally, a nested word over Σ is a word of parentheses in Σ^ = {op, cl} × Σ such that all opening parentheses are properly closed. We consider nested words as linearizations of unranked trees. For instance, the linearization of a(b(c), d) is the nested word (op, a)·(op, b)·(op, c)·(cl, c)·(cl, b)·(op, d)·(cl, d)·(cl, a), or in XML notation <a><b><c></c></b><d></d></a>.
Tree Automata Techniques and Applications
Hubert Comon, Max Dauchet, Rémi Gilleron, Florent Jacquemard, Denis Lugiez, Sophie Tison, Marc Tommasi. Contents (excerpt): Introduction; Preliminaries; 1. Recognizable Tree Languages and Finite Tree Automata (finite tree automata; the pumping lemma for recognizable tree languages; closure properties of recognizable tree languages; tree homomorphisms; minimizing tree automata; top-down tree automata; decision problems and their complexity; exercises; bibliographic notes); 2. Regular Grammars and Regular Expressions (tree grammars; regular tree grammars and recognizable tree languages; regular expressions and Kleene's theorem for tree languages, with substitution and iteration; regular equations; context-free word languages and regular tree languages; beyond regular tree languages: context-free tree languages, IO and OI tree grammars; exercises; bibliographic notes); 3. Automata and n-ary Relations (introduction; automata on tuples of finite trees; three notions of recognizability, examples, and comparisons between the three classes; closure properties for Rec× and Rec, cylindrification and projection).
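The book's central object, a bottom-up finite tree automaton, fits in a few lines. The following sketch uses an assumed encoding of trees as (symbol, children) pairs and is an illustration, not an excerpt from the book.

```python
# Minimal sketch of a deterministic bottom-up finite tree automaton.
# A tree is (symbol, [children]); delta maps (symbol, tuple of child states)
# to a state; missing transitions reject.

def run(tree, delta):
    sym, children = tree
    states = tuple(run(c, delta) for c in children)
    return delta[(sym, states)]

def accepts(tree, delta, final):
    try:
        return run(tree, delta) in final
    except KeyError:            # no applicable transition: reject
        return False

# Example: boolean terms over {and/2, or/2, true/0, false/0} that evaluate
# to true are exactly the accepted trees (state q1 = "true", q0 = "false").
delta = {
    ("true", ()): "q1", ("false", ()): "q0",
    **{("and", (a, b)): ("q1" if a == b == "q1" else "q0")
       for a in ("q0", "q1") for b in ("q0", "q1")},
    **{("or", (a, b)): ("q1" if "q1" in (a, b) else "q0")
       for a in ("q0", "q1") for b in ("q0", "q1")},
}
t = ("and", [("true", []), ("or", [("false", []), ("true", [])])])
assert accepts(t, delta, {"q1"})
```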
Advanced Parsing Techniques
Lecture slides. Announcements: Written Set 1 graded; hard copies available for pickup; feedback on electronic submissions returned later today. Parsing so far: we have explored five deterministic parsing algorithms: LL(1), LR(0), SLR(1), LALR(1), and LR(1). These algorithms all have their limitations. Can we parse arbitrary context-free grammars? Why parse arbitrary grammars? They are easier to write: operator precedence and associativity can be left out of the grammar, and there are no worries about shift/reduce or FIRST/FOLLOW conflicts. If the grammar is ambiguous, invalid trees can be filtered out at the end: generate candidate parse trees, then eliminate those that are not needed. It is also a practical concern for some languages: we need to have C and C++ compilers. Questions for today: How do you go about parsing ambiguous grammars efficiently? How do you produce all possible parse trees? What else can we do with a general parser? The Earley parser. Motivation, the limits of LR: LR parsers use shift and reduce actions to reduce the input to the start symbol. LR parsers cannot deterministically handle shift/reduce or reduce/reduce conflicts, but they can handle them nondeterministically by guessing which option to choose. What if we try all options and see if any of them work? The Earley parser maintains a collection of Earley items, which are LR(0) items annotated with a start position. The item A → α·ω @n means we are working on recognizing A → αω, have seen α, and the start position of the item was the nth token. Using techniques similar to LR parsing, we try to scan across the input creating these items.
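Following that description, here is a compact Earley recognizer sketch: items are (lhs, rhs, dot, origin) tuples, and the usual predictor, scanner, and completer steps fill a chart. The grammar encoding is an assumption made for the example, and this simple version deliberately skips the well-known extra bookkeeping needed for ε-productions.

```python
# Compact Earley recognizer (illustrative sketch). Grammar format:
# dict nonterminal -> list of right-hand sides (tuples of symbols);
# terminals are any symbols that are not keys of the grammar dict.
# Note: no handling of the ε-production corner case (Aycock-Horspool).

def earley_recognize(grammar, start, tokens):
    n = len(tokens)
    # chart[k] holds items (lhs, rhs, dot, origin) ending at position k
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, tuple(rhs), 0, 0))

    for k in range(n + 1):
        worklist = list(chart[k])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                      # predictor
                    for prod in grammar[sym]:
                        new = (sym, tuple(prod), 0, k)
                        if new not in chart[k]:
                            chart[k].add(new)
                            worklist.append(new)
                elif k < n and tokens[k] == sym:        # scanner
                    chart[k + 1].add((lhs, rhs, dot + 1, origin))
            else:                                       # completer
                for (l2, r2, d2, o2) in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[k]:
                            chart[k].add(new)
                            worklist.append(new)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[n])

# Ambiguous expression grammar: E -> E + E | n
g = {"E": [("E", "+", "E"), ("n",)]}
print(earley_recognize(g, "E", ["n", "+", "n", "+", "n"]))   # True
```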