
Context-Free Grammars
A Way to Specify Some Nonregular Languages

Tuesday, November 2, 2010
Reading: Sipser 2.1, Stoughton 4.1
CS235 Languages and Automata
Department of Computer Science
Wellesley College


You Are Here!

Reg = Regular Languages (examples: 0*1*, (01)*, 0*1*+(01)*)
  • Deterministic Finite Automaton
  • Nondeterministic Finite Automaton
  • Regular Expression
  • Right-Linear Grammar
CFL = Context-Free Languages (examples: 0^n 1^n, ww^R)
  • Context-Free Grammar
  • Nondeterministic Pushdown Automaton
Dec = Recursive (Turing-Decidable) Languages (examples: 0^n 1^n 2^n, ww)
  • Turing Machine
  • Unrestricted Grammar
RE = Recursively Enumerable (Turing-Recognizable/Acceptable) Languages
Lan = All Languages

Context Free Grammars 23-2


Overview

o Introduce Context-Free Grammars (CFGs), a new formalism for specifying languages. The languages they specify are the so-called Context-Free Languages (CFLs).
o Define the set of strings denoted by a Context-Free Grammar via the notions of derivations and parse trees.
o Show that CFGs can specify some simple nonregular languages.
o Show how grammars are manipulated in Forlan.


A Sample Context-Free Grammar (CFG)

  LHS   RHS
   S  →  AB
   A  →  0A1
   A  →  %
   B  →  1B0
   B  →  %

Informally, a context-free grammar (CFG) is a collection of productions = substitution rules for rewriting variables (a.k.a. nonterminals) to strings of variables and terminals (non-variable symbols). Here % denotes the empty string.

Each rule has a left-hand side (LHS) consisting of a single variable and a right-hand side (RHS) consisting of variables and terminals.

A CFG has a start variable, which is conventionally the variable in the LHS of the first rule.

A string of variables and terminals can be rewritten to another by substituting the RHS of a rule for an occurrence of the variable in its LHS.
E.g.:
  0A11B0 ⇒ 00A111B0            (using A → 0A1)
  0A11B0 ⇒ 0%11B0 = 011B0      (using A → %)
  0A11B0 ⇒ 0A111B00            (using B → 1B0)
  0A11B0 ⇒ 0A11%0 = 0A110      (using B → %)


Derivations Generate Strings

A sequence of substitution steps that rewrites the start variable to a string of terminals is called a derivation. A CFG generates a string of terminals s if there is a derivation of s. E.g., for the sample grammar:

  S ⇒ AB ⇒ %B = B ⇒ %
  S ⇒ AB ⇒ 0A1B ⇒ 0%1B = 01B ⇒ 01% = 01
  S ⇒ AB ⇒ A1B0 ⇒ A1%0 = A10 ⇒ %10 = 10
  S ⇒ AB ⇒ 0A1B ⇒ 0A11B0 ⇒ 00A111B0 ⇒ 00A111%0 = 00A1110 ⇒ 00%1110 = 001110

Explicit %s are usually omitted from derivations unless % is the final string:

  S ⇒ AB ⇒ B ⇒ %
  S ⇒ AB ⇒ 0A1B ⇒ 01B ⇒ 01


Leftmost and Rightmost Derivations

There are often multiple derivations generating the same string that differ inconsequentially in the order of substitutions performed. E.g.:

  S ⇒ AB ⇒ 0A1B ⇒ 0A11B0 ⇒ 00A111B0 ⇒ 00A1110 ⇒ 001110
  S ⇒ AB ⇒ A1B0 ⇒ 0A11B0 ⇒ 0A110 ⇒ 00A1110 ⇒ 001110

We can standardize the sequence by always substituting for the leftmost variable (resulting in a leftmost derivation):

  S ⇒ AB ⇒ 0A1B ⇒ 00A11B ⇒ 0011B ⇒ 00111B0 ⇒ 001110

or the rightmost variable (resulting in a rightmost derivation):

  S ⇒ AB ⇒ A1B0 ⇒ A10 ⇒ 0A110 ⇒ 00A1110 ⇒ 001110


Parse Trees

Any sequence of rewriting steps can be depicted as a parse tree in which each internal node shows how a variable rewrites to the children of the node.

              S
           /     \
          A       B
        / | \   / | \
       0  A  1 1  B  0
        / | \     |
       0  A  1    %
          |
          %

The yield of a parse tree = the string consisting of the leaves of the tree from left to right = the result of the rewriting steps. For the above parse tree, the yield = 00%111%0 = 001110.


Alternative Ways to Write CFGs

Multiple productions from the same variable can be combined using |.

In the literature, the → in productions is often replaced by ::=, especially in so-called Backus-Naur Form (BNF).
  S → AB              S ::= AB
  A → 0A1 | %         A ::= 0A1 | %
  B → 1B0 | %         B ::= 1B0 | %

Formally, a CFG is a quadruple:

  ({S,A,B},                                    (1) set of variables
   {0,1},                                      (2) set of terminals
   S,                                          (3) start variable
   {(S,AB), (A,0A1), (A,%), (B,1B0), (B,%)}    (4) productions
  )

Forlan format:

  {variables} S, A, B
  {start variable} S
  {productions} S -> AB; A -> % | 0A1; B -> % | 1B0

Only three parts are specified in the Forlan format because the terminals are implicitly defined as Sym − variables, i.e., the symbols that are not variables.


The Language of a CFG

The language of a CFG is the set of all terminal strings generated by the grammar.

What is the language of our sample CFG?

  S → AB
  A → 0A1
  A → %
  B → 1B0
  B → %

A language that can be specified by a CFG is called a context-free language (CFL).


Designing CFGs for Some Simple Languages

What is a CFG for {0^n 1^n | n ∈ Nat}?

What is a CFG for {0^m 1^n | m ≥ n}?

What is a CFG for {0^m 1^n | m > n}?


{w | w in {a,b}* contains equal # of as & bs}

What is a CFG for the above language?

For intuition, consider annotating each symbol with (#as − #bs) so far:

   1  2  3  2  3  2  1  0 -1 -2 -3 -2 -1 -2 -1  0  1  0
   a  a  a  b  a  b  b  b  b  b  b  a  a  b  a  a  a  b

Note that each a matches a particular b.


Balanced Parentheses

Consider a language in which the only two terminals are ( and ).

Let L(x) = # of left parens in x; R(x) = # of right parens in x.

A string x of parentheses is balanced iff
(1) L(x) = R(x) (alternatively, L(x) − R(x) = 0)
(2) For every prefix y of x, L(y) ≥ R(y) (alternatively, L(y) − R(y) ≥ 0)

This is just like the language with equal # of as and bs, except that the difference can never be < 0.

   1  2  3  2  3  2  1  0  1  2  3  2  1  2  1  0  1  0
   (  (  (  )  (  )  )  )  (  (  (  )  )  (  )  )  (  )

Note that each ( matches a particular ) after it.


What is a CFG for Balanced Parentheses?

Intuitively, why is the CFG correct?
(For a formal proof of correctness, see Kozen Lecture 20.)


CFGs can Specify Natural Languages

  <Sentence>   → <NounPhrase> <VerbPhrase>
  <NounPhrase> → <Article> <NounUnit>
  <NounUnit>   → <Noun> | <Adjective> <NounUnit> | <NounUnit> that <VerbPhrase>
  <VerbPhrase> → <Verb> <NounPhrase>
  <Article>    → a | the
  <Adjective>  → big | small | black | gray | furry
  <Noun>       → dog | cat | mouse | bug
  <Verb>       → loves | chases | eats

(Imagine the nonterminals are indecomposable tokens, not strings.)

Give a parse tree for the following sentence:

  The big black dog that chases the gray cat loves a furry mouse that eats a bug


CFGs can Specify Programming Languages

Here is a CFG for SLiP:

  <Stm>     → <Stm> ; <Stm> | [ID(string)] := <Exp> | print ( <ExpList> )
  <Exp>     → [ID(string)] | [INT(integer)] | <Exp> [OP(binop)] <Exp> | ( <Stm> , <Exp> )
  <ExpList> → <Exp> | <ExpList> , <Exp>

(Note: ; stands for the [SEMI] token, ( stands for the [LPAREN] token, etc.)

Give a parse tree for the following statement:

  prod := (print (sum, sum-1), 10*sum)


Ambiguity

A CFG is ambiguous if there is more than one parse tree for a string that it generates.

  S → %
  S → SS
  S → aSb
  S → bSa

This is an example of an ambiguous grammar. The string abba has an infinite number of parse trees! Here are a few of them:

  [Figure: several parse trees for abba]


Ambiguity Can Affect Meaning

Ambiguity can affect the meaning of a phrase in both natural languages and programming languages.

Here are some natural language examples:

  High school principal
  Fruit flies like bananas.
  A woman without her man is nothing.

A classic example in programming languages is arithmetic expressions:

  E → ID(str) | INT(int) | E B E | ( E )
  B → + | - | * | /


Arithmetic Expressions: Precedence

  E → ID(str) | INT(int) | E B E | ( E )
  B → + | - | * | /

What does 2 * 3 + 4 mean?
Parse tree 1 (groups as 2 * (3 + 4)):

  E
    E: Int(2)
    B: *
    E
      E: Int(3)
      B: +
      E: Int(4)

Parse tree 2 (groups as (2 * 3) + 4):

  E
    E
      E: Int(2)
      B: *
      E: Int(3)
    B: +
    E: Int(4)


Arithmetic Expressions: Associativity

  E → ID(str) | INT(int) | E B E | ( E )
  B → + | - | * | /

What does 2 - 3 - 4 mean?

Parse tree 1 (groups as 2 - (3 - 4)):

  E
    E: Int(2)
    B: -
    E
      E: Int(3)
      B: -
      E: Int(4)

Parse tree 2 (groups as (2 - 3) - 4):

  E
    E
      E: Int(2)
      B: -
      E: Int(3)
    B: -
    E: Int(4)

In a later lecture we'll see how to rewrite the arithmetic expression grammar to unambiguously express the standard precedence and associativity rules.


Forlan Gram Module

  - open Gram;
  opening Gram
    type gram
    val fromString : string -> gram
    val toString : gram -> string
    val input : string -> gram
    val output : string * gram -> unit
    val variables : gram -> sym set
    val startVariable : gram -> sym
    val productions : gram -> prod set
    val numVariables : gram -> int
    val numProductions : gram -> int
    val alphabet : gram -> sym set
    val renameVariables : gram * sym_rel -> gram
    val renameVariablesCanonically : gram -> gram
    val generated : gram -> str -> bool
    (* Many other bindings omitted; we'll see a few more later *)


Forlan Gram Examples

  - val L1gram = Gram.input "L1.gram";
  val L1gram = - : gram
  - Gram.output ("", L1gram);
  {variables} A, B, S
  {start variable} S
  {productions} A -> % | 0A1; B -> % | 1B0; S -> AB
  val it = () : unit
  - Gram.numVariables L1gram;
  val it = 3 : int
  - SymSet.toString (Gram.variables L1gram);
  val it = "A, B, S" : string


Forlan Gram Examples, Part 2

  - Gram.numProductions L1gram;
  val it = 5 : int
  - fun prodToString (sym,str) = (Sym.toString sym) ^ " -> " ^ (Str.toString str);
  val prodToString = fn : sym * str -> string
  - List.map prodToString (Set.toList (Gram.productions L1gram));
  val it = ["A -> %","A -> 0A1","B -> %","B -> 1B0","S -> AB"] : string list
  - fun testOnString gram string =
  =   Gram.generated gram (Str.fromString (if string = "" then "%" else string));
  val testOnString = fn : gram -> string -> bool
  (* Gram.generated uses a general parsing technique
     that we'll study later to automatically determine
     whether a string is generated by a grammar *)
  - testOnString L1gram "011100";
  val it = true : bool
  - testOnString L1gram "010101";
  val it = false : bool
  (* Note testOnString and other grammar testing functions can be found in
     the CS235 module GramTest in ~/cs235/download/utils/GramTest.sml *)


Some Things Forlan Can't Do

Here are some functions we'd like the Forlan Gram module to provide that it doesn't:

  val equalLanguages: gram -> gram -> bool
  (* Determine if two grammars generate the same language *)

  val isAmbiguous: gram -> bool
  (* Determine if the given grammar is ambiguous, i.e., there is a string
     in the language generated by the grammar that has more than one parse tree. *)
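
For one fixed string, ambiguity can at least be observed by counting that string's parse trees. The following is a minimal sketch in plain Standard ML (it does not use Forlan) for a cut-down version of the arithmetic expression grammar, E → INT | E B E. It assumes that every token is a single character (a digit for INT, one of the four operator characters for B) and it leaves out ID and parentheses. The names isOp and countTrees are hypothetical and are not part of Forlan's Gram module.

```sml
(* Hypothetical helper: is c one of the four operator characters of B? *)
fun isOp (c : char) : bool = Char.contains "+-*/" c

(* Hypothetical helper: number of parse trees deriving s from E, for the
   simplified grammar  E -> INT | E B E  with single-character tokens. *)
fun countTrees (s : string) : int =
  let
    val n = String.size s
    fun tok i = String.sub (s, i)
    (* trees (i, j) counts the parse trees for the tokens at positions i .. j-1 *)
    fun trees (i, j) =
      if j - i = 1 then
        (* a single token must be an INT, i.e., a digit *)
        (if Char.isDigit (tok i) then 1 else 0)
      else
        let
          (* try each operator position k as the B of a root step E -> E B E *)
          fun split k =
            if k >= j - 1 then 0
            else (if isOp (tok k) then trees (i, k) * trees (k + 1, j) else 0)
                 + split (k + 1)
        in
          split (i + 1)
        end
  in
    if n = 0 then 0 else trees (0, n)
  end
```

With these definitions, countTrees "2*3+4" and countTrees "2-3-4" both evaluate to 2, matching the two parse trees shown for each expression on the precedence and associativity slides. This only inspects one string at a time, so it is not an implementation of isAmbiguous.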