Context-Free Grammars: a Way to Specify Some Nonregular Languages

Tuesday, November 2, 2010
Reading: Sipser 2.1, Stoughton 4.1
CS235 Languages and Automata
Department of Computer Science
Wellesley College

You Are Here!
Reg = Regular Languages (examples: 0*1*, (01)*, 0*1*+(01)*)
  • Deterministic Finite Automaton
  • Nondeterministic Finite Automaton
  • Regular Expression
  • Right-Linear Grammar
CFL = Context-Free Language (examples: $0^n1^n$, $ww^R$)
  • Context-Free Grammar
  • Nondeterministic Pushdown Automaton
Dec = Recursive (Turing-Decidable) Language (examples: $0^n1^n2^n$, $ww$)
  • Turing Machine
  • Unrestricted Grammar
RE = Recursively Enumerable (Turing-Recognizable/Acceptable) Language
Lan = All Languages

Overview
o Introduce Context-Free Grammars (CFGs), a new formalism for specifying a class of languages called Context-Free Languages (CFLs).
o Define the set of strings denoted by a CFG via the notions of derivations and parse trees.
o Show that CFGs can specify some simple nonregular languages.
o Show how grammars are manipulated in Forlan.

A Sample Context-Free Grammar (CFG)
S → AB
A → 0A1
A → %
B → 1B0
B → %
Informally, a context-free grammar (CFG) is a collection of productions (substitution rules) for rewriting variables (a.k.a. nonterminals) to strings of variables and terminals (non-variable symbols). Here % denotes the empty string. Each rule has a left-hand side (LHS) consisting of a single variable and a right-hand side (RHS) consisting of variables and terminals. A CFG has a start variable, which by convention is the variable in the LHS of the first rule. A string of variables and terminals can be rewritten to another by substituting the RHS of a rule for an occurrence of the variable in its LHS.
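The single substitution step just described can be sketched in a few lines. This is an illustration in Python, not Forlan code: the representation is an assumption (each symbol is one character, a production is a (variable, rhs) pair, and the empty string "" stands for %), and `rewrite_once` is a hypothetical helper that returns every string reachable by replacing one occurrence of a variable with one of its right-hand sides.

```python
# Hypothetical representation (not Forlan's): a production is a (variable, rhs)
# pair of strings, one character per symbol; rhs == "" stands for % (empty string).
Production = tuple[str, str]

SAMPLE: list[Production] = [
    ("S", "AB"),
    ("A", "0A1"),
    ("A", ""),
    ("B", "1B0"),
    ("B", ""),
]


def rewrite_once(form: str, productions: list[Production]) -> set[str]:
    """Return every string obtainable from `form` by one substitution step."""
    results: set[str] = set()
    for i, sym in enumerate(form):
        for lhs, rhs in productions:
            if sym == lhs:
                # Substitute the RHS for this single occurrence of the LHS variable.
                results.add(form[:i] + rhs + form[i + 1:])
    return results


print(sorted(rewrite_once("0A11B0", SAMPLE)))
# ['00A111B0', '011B0', '0A110', '0A111B00']
```

A string containing no variables, such as 001110, has no one-step rewrites, so `rewrite_once` returns the empty set for it.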
For example, here are the one-step rewrites of 0A11B0 (one per A- or B-production):
0A11B0 ⇒ 00A111B0
0A11B0 ⇒ 0%11B0 = 011B0
0A11B0 ⇒ 0A111B00
0A11B0 ⇒ 0A11%0 = 0A110

Derivations Generate Strings
A sequence of substitution steps that rewrites the start variable to a string of terminals is called a derivation. A CFG generates a string of terminals s if there is a derivation of s. E.g.:
S ⇒ AB ⇒ %B = B ⇒ %
S ⇒ AB ⇒ 0A1B ⇒ 0%1B = 01B ⇒ 01% = 01
S ⇒ AB ⇒ A1B0 ⇒ A1%0 = A10 ⇒ %10 = 10
S ⇒ AB ⇒ 0A1B ⇒ 0A11B0 ⇒ 00A111B0 ⇒ 00A111%0 = 00A1110 ⇒ 00%1110 = 001110
Explicit %s are usually omitted from derivations unless % is the final string:
S ⇒ AB ⇒ B ⇒ %
S ⇒ AB ⇒ 0A1B ⇒ 01B ⇒ 01

Leftmost and Rightmost Derivations
There are often multiple derivations of the same string that differ inconsequentially in the order in which substitutions are performed. E.g.:
S ⇒ AB ⇒ 0A1B ⇒ 0A11B0 ⇒ 00A111B0 ⇒ 00A1110 ⇒ 001110
S ⇒ AB ⇒ A1B0 ⇒ 0A11B0 ⇒ 0A110 ⇒ 00A1110 ⇒ 001110
We can standardize the sequence by always substituting for the leftmost variable (resulting in a leftmost derivation):
S ⇒ AB ⇒ 0A1B ⇒ 00A11B ⇒ 0011B ⇒ 00111B0 ⇒ 001110
or for the rightmost variable (resulting in a rightmost derivation):
S ⇒ AB ⇒ A1B0 ⇒ A10 ⇒ 0A110 ⇒ 00A1110 ⇒ 001110

Parse Trees
Any sequence of rewriting steps can be depicted as a parse tree in which each internal node shows how a variable rewrites to the node's children. The derivations of 001110 above all correspond to this parse tree (in bracket notation, each variable followed by its children):
S[ A[ 0 A[ 0 A[ % ] 1 ] 1 ] B[ 1 B[ % ] 0 ] ]
The yield of a parse tree is the string formed by the leaves of the tree read from left to right; it is the result of the rewriting steps. For the above parse tree, the yield is 00%111%0 = 001110.

Alternative Ways to Write CFGs
Multiple productions for the same variable can be combined using |. In the literature, the → in productions is often replaced by ::=, especially in so-called Backus-Naur Form (BNF).
S → AB           S ::= AB
A → 0A1 | %      A ::= 0A1 | %
B → 1B0 | %      B ::= 1B0 | %

Formally, a CFG is a quadruple:
({S,A,B},                                     (1) set of variables
 {0,1},                                       (2) set of terminals
 S,                                           (3) start variable
 {(S,AB), (A,0A1), (A,%), (B,1B0), (B,%)})    (4) productions

In Forlan format, only three parts are specified, because the terminals are implicitly defined as Sym - variables (any symbol that is not a variable):
{variables}
S, A, B
{start variable}
S
{productions}
S -> AB; A -> % | 0A1; B -> % | 1B0

The Language of a CFG
The language of a CFG is the set of all terminal strings generated by the grammar.
What is the language of our sample CFG?
A language that can be specified by a CFG is called a context-free language (CFL).

Designing CFGs for Some Simple Languages
What is a CFG for $\{0^n1^n \mid n \in \mathrm{Nat}\}$?
What is a CFG for $\{0^m1^n \mid m \ge n\}$?
What is a CFG for $\{0^m1^n \mid m > n\}$?

{w | w ∈ {a,b}* contains equal numbers of as and bs}
What is a CFG for the above language?
For intuition, consider annotating each symbol with #as - #bs in the prefix so far:
 1  2  3  2  3  2  1  0 -1 -2 -3 -2 -1 -2 -1  0  1  0
 a  a  a  b  a  b  b  b  b  b  b  a  a  b  a  a  a  b
Note that each a matches a particular b.

Balanced Parentheses
Consider a language in which the only two terminals are ( and ). Let L(x) = # of left parens in x and R(x) = # of right parens in x. A string x of parentheses is balanced iff
(1) L(x) = R(x) (alternatively, L(x) - R(x) = 0), and
(2) for every prefix y of x, L(y) ≥ R(y) (alternatively, L(y) - R(y) ≥ 0).
This is just like the language with equal numbers of as and bs, except that the difference can never be < 0.
 1  2  3  2  3  2  1  0  1  2  3  2  1  2  1  0  1  0
 (  (  (  )  (  )  )  )  (  (  (  )  )  (  )  )  (  )
Note that each ( matches a particular ) after it.

What is a CFG for Balanced Parentheses?
Intuitively, why is the CFG correct?
(For a formal proof of correctness, see Kozen, Lecture 20.)

CFGs Can Specify Natural Languages
<Sentence> → <NounPhrase><VerbPhrase>
<NounPhrase> → <Article><NounUnit>
<NounUnit> → <Noun> | <Adjective><NounUnit> | <NounUnit> that <VerbPhrase>
<VerbPhrase> → <Verb> <NounPhrase>
<Article> → a | the
<Adjective> → big | small | black | gray | furry
<Noun> → dog | cat | mouse | bug
<Verb> → loves | chases | eats
(Imagine the nonterminals are indecomposable tokens, not strings.)
Give a parse tree for the following sentence:
The big black dog that chases the gray cat loves a furry mouse that eats a bug

CFGs Can Specify Programming Languages
Here is a CFG for SLiP:
<Stm> → <Stm> ; <Stm> | [ID(string)] := <Exp> | print ( <ExpList> )
<Exp> → [ID(string)] | [INT(integer)] | <Exp> [OP(binop)] <Exp> | ( <Stm> , <Exp> )
<ExpList> → <Exp> | <ExpList> , <Exp>
(Note: ; stands for the [SEMI] token, ( stands for the [LPAREN] token, etc.)
Give a parse tree for the following statement:
prod := (print (sum, sum-1), 10*sum)

Ambiguity
A CFG is ambiguous if there is more than one parse tree for some string that it generates. This is an example of an ambiguous grammar:
S → %
S → SS
S → aSb
S → bSa
The string abba has an infinite number of parse trees! (For instance, the steps S ⇒ SS ⇒ S, using S → % on one of the two Ss, can be repeated any number of times.) Here are a few of them:
[Figure: several distinct parse trees for abba]

Ambiguity Can Affect Meaning
Ambiguity can affect the meaning of a phrase in both natural languages and programming languages. Here are some natural language examples:
High school principal
Fruit flies like bananas.
A woman without her man is nothing.
A classic example in programming languages is arithmetic expressions:
E → ID(str) | INT(int) | E B E | ( E )
B → + | - | * | /

Arithmetic Expressions: Precedence
E → ID(str) | INT(int) | E B E | ( E )
B → + | - | * | /
What does 2 * 3 + 4 mean?
Parse tree 1: E[ E[Int(2)] B[*] E[ E[Int(3)] B[+] E[Int(4)] ] ], i.e., 2 * (3 + 4) = 14
Parse tree 2: E[ E[ E[Int(2)] B[*] E[Int(3)] ] B[+] E[Int(4)] ], i.e., (2 * 3) + 4 = 10

Arithmetic Expressions: Associativity
E → ID(str) | INT(int) | E B E | ( E )
B → + | - | * | /
What does 2 - 3 - 4 mean?
Parse tree 1: E[ E[Int(2)] B[-] E[ E[Int(3)] B[-] E[Int(4)] ] ], i.e., 2 - (3 - 4) = 3
Parse tree 2: E[ E[ E[Int(2)] B[-] E[Int(3)] ] B[-] E[Int(4)] ], i.e., (2 - 3) - 4 = -5
In a later lecture we'll see how to rewrite the arithmetic expression grammar to unambiguously express the standard precedence and associativity rules.

Forlan Gram Module
- open Gram;
opening Gram
  type gram
  val fromString : string -> gram
  val toString : gram -> string
  val input : string -> gram
  val output : string * gram -> unit
  val variables : gram -> sym set
  val startVariable : gram -> sym
  val productions : gram -> prod set
  val numVariables : gram -> int
  val numProductions : gram -> int
  val alphabet : gram -> sym set
  val renameVariables : gram * sym_rel -> gram
  val renameVariablesCanonically : gram -> gram
  val generated : gram -> str -> bool
  (* Many other bindings omitted; we'll see a few more later *)

Forlan Gram Examples
- val L1gram = Gram.input "L1.gram";
val L1gram = - : gram
- Gram.output ("", L1gram);
{variables}
A, B, S
{start variable}
S
{productions}
A -> % | 0A1; B -> % | 1B0; S -> AB
val it = () : unit
- Gram.numVariables L1gram;
val it = 3 : int
- SymSet.toString (Gram.variables L1gram);
val it = "A, B, S" : string

Forlan Gram Examples, Part 2
- Gram.numProductions L1gram;
val it = 5 : int
- fun prodToString (sym, str) = (Sym.toString sym) ^ " -> " ^ (Str.toString str);
val prodToString = fn : sym * str -> string
- List.map prodToString (Set.toList (Gram.productions L1gram));
val it = ["A -> %","A -> 0A1","B -> %","B -> 1B0","S -> AB"] : string list
- fun testOnString gram string =
= Gram.generated gram (Str.fromString (if string = "" then "%" else string));
val testOnString = fn : gram -> string -> bool
(* Gram.generated uses a general parsing technique
   that we'll study later to automatically determine whether a string is generated by a grammar *)
- testOnString L1gram "011100";
val it = true : bool
- testOnString L1gram "010101";
val it = false : bool
(* Note: testOnString and other grammar testing functions can be found in the CS235 module GramTest in ~/cs235/download/utils/GramTest.sml *)

Some Things Forlan Can't Do
Here are some functions we'd like the Forlan Gram module to provide that it doesn't:
val equalLanguages : gram -> gram -> bool
  (* Determine if two grammars generate the same language *)
val isAmbiguous : gram -> bool
  (* Determine if the given grammar is ambiguous, i.e., there is a string in the language generated by the grammar that has more than one parse tree. *)
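Both problems are known to be undecidable for CFGs in general, so no function could answer them for every input grammar. A bounded approximation of equalLanguages is still easy to sketch. The Python below is an illustration under assumptions, not Forlan code: the dict-based grammar representation and the names `strings_up_to` and `agree_up_to` are hypothetical. It computes, for each variable, the strings of length at most n that the variable generates, by re-applying every production until nothing new appears (a least fixpoint), and then compares two grammars on those finite sets.

```python
# Hypothetical representation (not Forlan's gram type): a dict mapping each
# variable to its right-hand sides. Symbols are single characters, a symbol is
# a variable iff it is a key of the dict, and "" stands for % (empty string).
Grammar = dict[str, list[str]]


def strings_up_to(grammar: Grammar, start: str, n: int) -> set[str]:
    """Return all terminal strings of length <= n generated from `start`.

    For every variable, the set of strings of length <= n it generates is
    computed as a least fixpoint. The loop terminates because only finitely
    many such strings exist and every non-final pass adds at least one.
    """
    gen: dict[str, set[str]] = {v: set() for v in grammar}
    changed = True
    while changed:
        changed = False
        for var, rhss in grammar.items():
            for rhs in rhss:
                # Strings (length <= n) derivable from the part of rhs seen so far.
                partial = {""}
                for sym in rhs:
                    options = gen[sym] if sym in grammar else {sym}
                    partial = {p + o for p in partial for o in options
                               if len(p) + len(o) <= n}
                new = partial - gen[var]
                if new:
                    gen[var] |= new
                    changed = True
    return gen[start]


def agree_up_to(g1: Grammar, s1: str, g2: Grammar, s2: str, n: int) -> bool:
    """Bounded test: do both grammars generate the same strings of length <= n?"""
    return strings_up_to(g1, s1, n) == strings_up_to(g2, s2, n)


SAMPLE = {"S": ["AB"], "A": ["0A1", ""], "B": ["1B0", ""]}
ZEROS_ONES = {"S": ["0S1", ""]}       # generates {0^n 1^n}
FINITE = {"S": ["", "01", "0011"]}    # a finite language

print(sorted(strings_up_to(SAMPLE, "S", 4), key=lambda s: (len(s), s)))
# ['', '01', '10', '0011', '0110', '1100']
print("011100" in strings_up_to(SAMPLE, "S", 6))     # True
print("010101" in strings_up_to(SAMPLE, "S", 6))     # False
print(agree_up_to(ZEROS_ONES, "S", FINITE, "S", 4))  # True
print(agree_up_to(ZEROS_ONES, "S", FINITE, "S", 6))  # False
```

Agreement up to length n is evidence, not proof, of equal languages: ZEROS_ONES and FINITE agree on all strings of length at most 4 but differ at length 6 (000111). The same enumeration also gives a crude membership test in the spirit of testOnString, although Gram.generated uses a general parsing technique rather than enumerating strings.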