Context-Free Grammars: a Way to Specify Some Nonregular Languages

Tuesday, November 2, 2010
Reading: Sipser 2.1, Stoughton 4.1
CS235 Languages and Automata
Department of Computer Science
Wellesley College

You Are Here!

[Diagram: nested classes of languages, innermost first, with example languages and the formalisms that specify them]

Reg = Regular Languages
  Examples: 0*1*, (01)*, 0*1*+(01)*
  • Deterministic Finite Automaton
  • Nondeterministic Finite Automaton
  • Regular Expression
  • Right-Linear Grammar
CFL = Context-Free Languages
  Examples: 0^n1^n, ww^R
  • Context-Free Grammar
  • Nondeterministic Pushdown Automaton
Dec = Recursive (Turing-Decidable) Languages
  Examples: 0^n1^n2^n, ww
RE = Recursively Enumerable (Turing-Recognizable/Acceptable) Languages
  Formalisms shown at the Dec/RE levels of the diagram:
  • Turing Machine
  • Unrestricted Grammar
Lan = All Languages

Context Free Grammars 23-2

Overview

o Introduce Context-Free Grammars (CFGs), a new formalism for specifying languages. The languages they specify are called Context-Free Languages (CFLs).
o Define the set of strings denoted by a Context-Free Grammar via the notions of derivations and parse trees.
o Show that CFGs can specify some simple nonregular languages.
o Show how grammars are manipulated in Forlan.

A Sample Context-Free Grammar (CFG)

    LHS    RHS
    S   →  AB
    A   →  0A1
    A   →  %
    B   →  1B0
    B   →  %

Informally, a context-free grammar (CFG) is a collection of productions (substitution rules) for rewriting variables (a.k.a. nonterminals) to strings of variables and terminals (non-variable symbols). Here % denotes the empty string.

Each rule has a left-hand side (LHS) consisting of a single variable and a right-hand side (RHS) consisting of variables and terminals.

A CFG has a start variable, which is conventionally the variable in the LHS of the first rule.

A string of variables and terminals can be rewritten to another by replacing one occurrence of a rule's LHS variable with that rule's RHS.
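The substitution step described above can be sketched in a few lines of Python. This is only an illustration, not Forlan code: the names PRODUCTIONS and one_step_rewrites are hypothetical, every symbol is assumed to be a single character, and the empty Python string stands in for %.

```python
# Productions of the sample grammar as (LHS, RHS) pairs.
# Assumption: "" plays the role of % (the empty string).
PRODUCTIONS = [("S", "AB"), ("A", "0A1"), ("A", ""), ("B", "1B0"), ("B", "")]

def one_step_rewrites(form, productions=PRODUCTIONS):
    """Return every string obtainable from form by a single substitution."""
    results = []
    for i, symbol in enumerate(form):
        for lhs, rhs in productions:
            if symbol == lhs:
                # Replace only the occurrence at position i with the RHS.
                results.append(form[:i] + rhs + form[i + 1:])
    return results

print(one_step_rewrites("0A11B0"))
# ['00A111B0', '011B0', '0A111B00', '0A110']
```

A string with no variables, such as 01, has no rewrites, so the function returns an empty list for it.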
E.g.:
    0A11B0 ⇒ 00A111B0
    0A11B0 ⇒ 0%11B0 = 011B0
    0A11B0 ⇒ 0A111B00
    0A11B0 ⇒ 0A11%0 = 0A110

Derivations Generate Strings

Sample grammar: S → AB, A → 0A1, A → %, B → 1B0, B → %

A sequence of substitution steps that rewrites the start variable to a string of terminals is called a derivation. A CFG generates a string of terminals s if there is a derivation of s. E.g.:

    S ⇒ AB ⇒ %B = B ⇒ %
    S ⇒ AB ⇒ 0A1B ⇒ 0%1B = 01B ⇒ 01% = 01
    S ⇒ AB ⇒ A1B0 ⇒ A1%0 = A10 ⇒ %10 = 10
    S ⇒ AB ⇒ 0A1B ⇒ 0A11B0 ⇒ 00A111B0 ⇒ 00A111%0 = 00A1110 ⇒ 00%1110 = 001110

Explicit %s are usually omitted from derivations unless % is the final string:

    S ⇒ AB ⇒ B ⇒ %
    S ⇒ AB ⇒ 0A1B ⇒ 01B ⇒ 01

Leftmost and Rightmost Derivations

There are often multiple derivations generating the same string that differ inconsequentially in the order of substitutions performed. E.g.:

    S ⇒ AB ⇒ 0A1B ⇒ 0A11B0 ⇒ 00A111B0 ⇒ 00A1110 ⇒ 001110
    S ⇒ AB ⇒ A1B0 ⇒ 0A11B0 ⇒ 0A110 ⇒ 00A1110 ⇒ 001110

We can standardize the sequence by always substituting for the leftmost variable (resulting in a leftmost derivation):

    S ⇒ AB ⇒ 0A1B ⇒ 00A11B ⇒ 0011B ⇒ 00111B0 ⇒ 001110

or the rightmost variable (resulting in a rightmost derivation):

    S ⇒ AB ⇒ A1B0 ⇒ A10 ⇒ 0A110 ⇒ 00A1110 ⇒ 001110

Parse Trees

Any sequence of rewriting steps can be depicted as a parse tree in which each internal node shows how a variable rewrites to the children of the node.

          S
        /   \
      A       B
    / | \   / | \
   0  A  1 1  B  0
    / | \     |
   0  A  1    %
      |
      %

The yield of a parse tree is the string consisting of the leaves of the tree read from left to right, which is the result of the rewriting steps. For the above parse tree, the yield = 00%111%0 = 001110.

Alternative Ways to Write CFGs

Multiple productions from the same variable can be combined using |.

In the literature, the → in productions is often replaced by ::=, especially in so-called Backus-Naur Form (BNF).
    S → AB              S ::= AB
    A → 0A1 | %         A ::= 0A1 | %
    B → 1B0 | %         B ::= 1B0 | %

Formally, a CFG is a quadruple:

    ({S,A,B},                                    (1) set of variables
     {0,1},                                      (2) set of terminals
     S,                                          (3) start variable
     {(S,AB), (A,0A1), (A,%), (B,1B0), (B,%)}    (4) productions
    )

Forlan format:

    {variables}
    S, A, B
    {start variable}
    S
    {productions}
    S -> AB;
    A -> % | 0A1;
    B -> % | 1B0

Only three parts are specified in the Forlan format because the terminals are implicitly defined as Sym - variables (the symbols that are not variables).

The Language of a CFG

Sample grammar: S → AB, A → 0A1, A → %, B → 1B0, B → %

The language of a CFG is the set of all terminal strings generated by the grammar.

What is the language of our sample CFG?

A language that can be specified by a CFG is called a context-free language (CFL).

Designing CFGs for Some Simple Languages

What is a CFG for {0^n1^n | n ∈ Nat}?

What is a CFG for {0^m1^n | m ≥ n}?

What is a CFG for {0^m1^n | m > n}?

{w | w in {a,b}* contains equal # of as & bs}

What is a CFG for the above language?

For intuition, consider annotating each symbol with (#as - #bs) so far:

      1  2  3  2  3  2  1  0 -1 -2 -3 -2 -1 -2 -1  0  1  0
      a  a  a  b  a  b  b  b  b  b  b  a  a  b  a  a  a  b

Note that each a matches a particular b.

Balanced Parentheses

Consider a language in which the only two terminals are ( and ).

Let L(x) = # of left parens in x and R(x) = # of right parens in x.

A string x of parentheses is balanced iff
(1) L(x) = R(x) (alternatively, L(x) - R(x) = 0)
(2) For every prefix y of x, L(y) ≥ R(y) (alternatively, L(y) - R(y) ≥ 0)

This is just like the language with equal # of as and bs, except that the difference can never be < 0.

      1  2  3  2  3  2  1  0  1  2  3  2  1  2  1  0  1  0
      (  (  (  )  (  )  )  )  (  (  (  )  )  (  )  )  (  )

Note that each ( matches a particular ) after it.

What is a CFG for Balanced Parentheses?

Intuitively, why is the CFG correct?
(For a formal proof of correctness, see Kozen Lecture 20.)

CFGs can Specify Natural Languages

    <Sentence> → <NounPhrase> <VerbPhrase>
    <NounPhrase> → <Article> <NounUnit>
    <NounUnit> → <Noun> | <Adjective> <NounUnit> | <NounUnit> that <VerbPhrase>
    <VerbPhrase> → <Verb> <NounPhrase>
    <Article> → a | the
    <Adjective> → big | small | black | gray | furry
    <Noun> → dog | cat | mouse | bug
    <Verb> → loves | chases | eats

(Imagine the nonterminals are indecomposable tokens, not strings.)

Give a parse tree for the following sentence:

    The big black dog that chases the gray cat loves a furry mouse that eats a bug

CFGs can Specify Programming Languages

Here is a CFG for SLiP:

    <Stm> → <Stm> ; <Stm> | [ID(string)] := <Exp> | print ( <ExpList> )
    <Exp> → [ID(string)] | [INT(integer)] | <Exp> [OP(binop)] <Exp> | ( <Stm> , <Exp> )
    <ExpList> → <Exp> | <ExpList> , <Exp>

(Note: ; stands for the [SEMI] token, ( stands for the [LPAREN] token, etc.)

Give a parse tree for the following statement:

    prod := (print (sum, sum-1), 10*sum)

Ambiguity

A CFG is ambiguous if there is more than one parse tree for a string that it generates.

    S → %
    S → SS
    S → aSb
    S → bSa

This is an example of an ambiguous grammar. The string abba has an infinite number of parse trees! One derivation is S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abbSa ⇒ abba. Applying S → SS and then rewriting one of the two copies of S to % can be repeated any number of times, and each repetition gives a different tree with the same yield.

[Figure: several different parse trees for abba]

Ambiguity Can Affect Meaning

Ambiguity can affect the meaning of a phrase in both natural languages and programming languages.

Here are some natural language examples:

    High school principal
    Fruit flies like bananas.
    A woman without her man is nothing.

A classic example in programming languages is arithmetic expressions:

    E → ID(str) | INT(int) | E B E | ( E )
    B → + | - | * | /

Arithmetic Expressions: Precedence

    E → ID(str) | INT(int) | E B E | ( E )
    B → + | - | * | /

What does 2 * 3 + 4 mean?
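The question above can be made concrete by evaluating both parse trees that the grammar allows for 2 * 3 + 4. The following Python sketch is an illustration only: the nested-tuple encoding of trees and the names OPS and evaluate are assumptions, not part of the lecture or of Forlan.

```python
import operator

# Map each binary-operator terminal derivable from B to its arithmetic meaning.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(tree):
    """Evaluate a tree given as an int leaf or an (op, left, right) tuple."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

# The two parse trees of 2 * 3 + 4 under E -> E B E, keeping only operators and leaves.
times_at_root = ("*", 2, ("+", 3, 4))   # reads as 2 * (3 + 4)
plus_at_root = ("+", ("*", 2, 3), 4)    # reads as (2 * 3) + 4

print(evaluate(times_at_root))  # 14
print(evaluate(plus_at_root))   # 10
```

The two trees have the same yield but different values, so the grammar alone does not determine what the expression means.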
Under this grammar the expression has two parse trees, written here with X(...) listing the children of node X:

    E( E(Int(2))  B(*)  E( E(Int(3))  B(+)  E(Int(4)) ) )      reads as 2 * (3 + 4) = 14
    E( E( E(Int(2))  B(*)  E(Int(3)) )  B(+)  E(Int(4)) )      reads as (2 * 3) + 4 = 10

Arithmetic Expressions: Associativity

    E → ID(str) | INT(int) | E B E | ( E )
    B → + | - | * | /

What does 2 - 3 - 4 mean?

    E( E(Int(2))  B(-)  E( E(Int(3))  B(-)  E(Int(4)) ) )      reads as 2 - (3 - 4) = 3
    E( E( E(Int(2))  B(-)  E(Int(3)) )  B(-)  E(Int(4)) )      reads as (2 - 3) - 4 = -5

In a later lecture we'll see how to rewrite the arithmetic expression grammar to unambiguously express the standard precedence and associativity rules.

Forlan Gram Module

    - open Gram;
    opening Gram
      type gram
      val fromString : string -> gram
      val toString : gram -> string
      val input : string -> gram
      val output : string * gram -> unit
      val variables : gram -> sym set
      val startVariable : gram -> sym
      val productions : gram -> prod set
      val numVariables : gram -> int
      val numProductions : gram -> int
      val alphabet : gram -> sym set
      val renameVariables : gram * sym_rel -> gram
      val renameVariablesCanonically : gram -> gram
      val generated : gram -> str -> bool
    (* Many other bindings omitted; we'll see a few more later *)

Forlan Gram Examples

    - val L1gram = Gram.input "L1.gram";
    val L1gram = - : gram
    - Gram.output ("", L1gram);
    {variables} A, B, S {start variable} S
    {productions} A -> % | 0A1; B -> % | 1B0; S -> AB
    val it = () : unit
    - Gram.numVariables L1gram;
    val it = 3 : int
    - SymSet.toString (Gram.variables L1gram);
    val it = "A, B, S" : string

Forlan Gram Examples, Part 2

    - Gram.numProductions L1gram;
    val it = 5 : int
    - fun prodToString (sym,str) = (Sym.toString sym) ^ " -> " ^ (Str.toString str);
    val prodToString = fn : sym * str -> string
    - List.map prodToString (Set.toList (Gram.productions L1gram));
    val it = ["A -> %","A -> 0A1","B -> %","B -> 1B0","S -> AB"] : string list
    - fun testOnString gram string =
    =   Gram.generated gram (Str.fromString (if string = "" then "%" else string));
    val testOnString = fn : gram -> string -> bool
    (* Gram.generated uses a general parsing technique
       that we'll study later to automatically determine
       whether a string is generated by a grammar *)
    - testOnString L1gram "011100";
    val it = true : bool
    - testOnString L1gram "010101";
    val it = false : bool
    (* Note: testOnString and other grammar testing functions can be found in
       the CS235 module GramTest in ~/cs235/download/utils/GramTest.sml *)

Some Things Forlan Can't Do

Here are some functions we'd like the Forlan Gram module to provide that it doesn't:

    val equalLanguages: gram -> gram -> bool
    (* Determine if two grammars generate the same language *)

    val isAmbiguous: gram -> bool
    (* Determine if the given grammar is ambiguous, i.e., there is a string
       in the language generated by the grammar that has more than one parse tree. *)
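Ambiguity of an arbitrary CFG is undecidable in general, but for one given string the parse trees can be counted directly. The Python sketch below does this for the arithmetic expression grammar. It is an illustration under stated assumptions, not a Forlan feature: the names GRAMMAR and count_parse_trees are hypothetical, the terminal n stands in for an INT token, and the method assumes a grammar with no % productions and no productions whose RHS is a single variable, so that every recursive call works on a shorter span.

```python
from functools import lru_cache

# Arithmetic-expression grammar; "n" is a stand-in for an INT token.
# Assumption: no % productions and no single-variable right-hand sides.
GRAMMAR = {
    "E": [("n",), ("E", "B", "E"), ("(", "E", ")")],
    "B": [("+",), ("-",), ("*",), ("/",)],
}

def count_parse_trees(tokens, start="E", grammar=GRAMMAR):
    """Count the parse trees with root start whose yield is tokens."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        # Number of parse trees rooted at symbol with yield tokens[i:j].
        if symbol not in grammar:  # terminal: must match exactly one token
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(splits(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def splits(rhs, i, j):
        # Number of ways the symbols of rhs jointly derive tokens[i:j].
        if len(rhs) == 1:
            return trees(rhs[0], i, j)
        total = 0
        # First symbol takes tokens[i:k]; each later symbol needs >= 1 token.
        for k in range(i + 1, j - len(rhs) + 2):
            left = trees(rhs[0], i, k)
            if left:
                total += left * splits(rhs[1:], k, j)
        return total

    return trees(start, 0, len(tokens))

print(count_parse_trees("n - n - n".split()))  # 2
```

A count greater than 1 for any string shows that the grammar is ambiguous. A count of 1 for the strings tried shows nothing about the grammar as a whole, which is why this is not a substitute for isAmbiguous.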
