Context-Free Grammars
A Way to Specify Some Nonregular Languages
Tuesday, November 2, 2010 Reading: Sipser 2.1, Stoughton 4.1,
CS235 Languages and Automata
Department of Computer Science Wellesley College
You Are Here!
Reg = Regular Languages (examples: 0*1*, (01)*, 0*1*+(01)*)
• Deterministic Finite Automaton
• Nondeterministic Finite Automaton
• Regular Expression
• Right-Linear Grammar
CFL = Context-Free Language (examples: 0^n1^n, ww^R)
• Context-Free Grammar
• Nondeterministic Pushdown Automaton
Dec = Recursive (Turing-Decidable) Language (examples: 0^n1^n2^n, ww)
• Turing Machine
• Unrestricted Grammar
RE = Recursively Enumerable (Turing-Recognizable/Acceptable) Language
Lan = All Languages
Context Free Grammars 23-2
Overview
o Introduce Context-Free Grammars (CFGs), a new formalism for specifying languages = so-called Context-Free Languages (CFLs).
o Define the set of strings denoted by a Context-Free Grammar via the notions of derivations and parse trees.
o Show that CFGs can specify some simple nonregular languages.
o Show how grammars are manipulated in Forlan.
A Sample Context-Free Grammar (CFG)
Informally, a context-free grammar (CFG) is a collection of productions = substitution rules for rewriting variables (a.k.a. nonterminals) to strings of variables and terminals (non-variable symbols). Each rule has a left-hand side (LHS) consisting of a single variable and a right-hand side (RHS) consisting of variables and terminals. The symbol % denotes the empty string.
LHS   RHS
S  →  AB
A  →  0A1
A  →  %
B  →  1B0
B  →  %
A CFG has a start variable, which is conventionally the variable in the LHS of the first rule. A string of variables and terminals can be rewritten to another by substituting the RHS of a rule for the variable in the LHS. E.g.:
0A11B0 ⇒ 00A111B0
0A11B0 ⇒ 0%11B0 = 011B0
0A11B0 ⇒ 0A111B00
0A11B0 ⇒ 0A11%0 = 0A110
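The single rewriting step described above can be sketched in Python. This is only an illustration, not part of Forlan: the function name `rewrite_once` is hypothetical, sentential forms are plain strings of one-character symbols, and the empty Python string stands in for %.

```python
def rewrite_once(form, position, production):
    """Replace the variable at `position` in `form` by the production's RHS."""
    lhs, rhs = production
    if form[position] != lhs:
        raise ValueError(f"expected variable {lhs!r} at position {position}")
    return form[:position] + rhs + form[position + 1:]

form = "0A11B0"
print(rewrite_once(form, 1, ("A", "0A1")))  # 00A111B0
print(rewrite_once(form, 1, ("A", "")))     # 011B0   (A -> %)
print(rewrite_once(form, 4, ("B", "1B0")))  # 0A111B00
print(rewrite_once(form, 4, ("B", "")))     # 0A110   (B -> %)
```

The four calls reproduce the four example steps from the slide.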
Derivations Generate Strings
S → AB
A → 0A1
A → %
B → 1B0
B → %
A sequence of substitution steps that rewrites the start variable to a string of terminals is called a derivation. A CFG generates a string of terminals s if there is a derivation of s. E.g.:
S ⇒ AB ⇒ %B = B ⇒ %
S ⇒ AB ⇒ 0A1B ⇒ 0%1B = 01B ⇒ 01% = 01
S ⇒ AB ⇒ A1B0 ⇒ A1%0 = A10 ⇒ %10 = 10
S ⇒ AB ⇒ 0A1B ⇒ 0A11B0 ⇒ 00A111B0 ⇒ 00A111%0 = 00A1110 ⇒ 00%1110 = 001110
Explicit %s are usually omitted from derivations unless % is the final string:
S ⇒ AB ⇒ B ⇒ %
S ⇒ AB ⇒ 0A1B ⇒ 01B ⇒ 01
Leftmost and Rightmost Derivations
S → AB
A → 0A1
A → %
B → 1B0
B → %
There are often multiple derivations generating the same string that differ inconsequentially in the order of substitutions performed. E.g.:
S ⇒ AB ⇒ 0A1B ⇒ 0A11B0 ⇒ 00A111B0 ⇒ 00A1110 ⇒ 001110
S ⇒ AB ⇒ A1B0 ⇒ 0A11B0 ⇒ 0A110 ⇒ 00A1110 ⇒ 001110
We can standardize the sequence by always substituting for the leftmost variable (resulting in a leftmost derivation):
S ⇒ AB ⇒ 0A1B ⇒ 00A11B ⇒ 0011B ⇒ 00111B0 ⇒ 001110
or the rightmost variable (resulting in a rightmost derivation):
S ⇒ AB ⇒ A1B0 ⇒ A10 ⇒ 0A110 ⇒ 00A1110 ⇒ 001110
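As a rough sketch of the leftmost/rightmost convention, the Python function below replays a list of chosen right-hand sides, always rewriting the leftmost (or rightmost) variable. The names `PRODUCTIONS` and `derive` are hypothetical, symbols are single characters, and the empty string stands in for %.

```python
# Sample grammar; "" plays the role of %.
PRODUCTIONS = {"S": ["AB"], "A": ["0A1", ""], "B": ["1B0", ""]}

def derive(start, rhs_choices, leftmost=True):
    """Return the list of sentential forms obtained by applying each chosen
    RHS to the leftmost (or rightmost) variable of the current form."""
    forms = [start]
    form = start
    for rhs in rhs_choices:
        positions = [i for i, c in enumerate(form) if c in PRODUCTIONS]
        if not positions:
            raise ValueError("no variable left to rewrite")
        idx = positions[0] if leftmost else positions[-1]
        if rhs not in PRODUCTIONS[form[idx]]:
            raise ValueError(f"{form[idx]} -> {rhs or '%'} is not a production")
        form = form[:idx] + rhs + form[idx + 1:]
        forms.append(form)
    return forms

print(" => ".join(derive("S", ["AB", "0A1", "0A1", "", "1B0", ""])))
# S => AB => 0A1B => 00A11B => 0011B => 00111B0 => 001110
print(" => ".join(derive("S", ["AB", "1B0", "", "0A1", "0A1", ""], leftmost=False)))
# S => AB => A1B0 => A10 => 0A110 => 00A1110 => 001110
```

Both calls end in 001110 and match the leftmost and rightmost derivations shown on the slide.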
Parse Trees
S → AB
A → 0A1
A → %
B → 1B0
B → %
Any sequence of rewriting steps can be depicted as a parse tree in which each internal node shows how a variable rewrites to the children of the node.

              S
           /     \
         A         B
       / | \     / | \
      0  A  1   1  B  0
       / | \       |
      0  A  1      %
         |
         %

The yield of a parse tree = the string consisting of the leaves of the tree from left to right = the result of the rewriting steps. For the above parse tree, the yield = 00%111%0 = 001110
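The yield computation can be sketched in a few lines of Python. The tree encoding here is an assumption made for illustration: an internal node is a pair (variable, list of children) and a leaf is a one-character string, with "%" leaves dropped at the end.

```python
def leaves(tree):
    """Return the leaves of a parse tree from left to right."""
    if isinstance(tree, str):          # a leaf: a terminal or %
        return [tree]
    _variable, children = tree
    result = []
    for child in children:
        result.extend(leaves(child))
    return result

# The parse tree from the slide.
tree = ("S", [
    ("A", ["0", ("A", ["0", ("A", ["%"]), "1"]), "1"]),
    ("B", ["1", ("B", ["%"]), "0"]),
])

raw = "".join(leaves(tree))
print(raw)                   # 00%111%0
print(raw.replace("%", ""))  # 001110
```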
Alternative Ways to Write CFGs
Can combine multiple productions from the same variable using |:
S → AB
A → 0A1 | %
B → 1B0 | %
In the literature, the → in productions is often replaced by ::=, especially in so-called Backus-Naur Form (BNF):
S ::= AB
A ::= 0A1 | %
B ::= 1B0 | %
Formally, a CFG is a quadruple:
({S,A,B},                                    (1) set of variables
 {0,1},                                      (2) set of terminals
 S,                                          (3) start variable
 {(S,AB), (A,0A1), (A,%), (B,1B0), (B,%)}    (4) productions
)
Forlan format:
{variables}
S, A, B
{start variable}
S
{productions}
S -> AB;
A -> % | 0A1;
B -> % | 1B0
Only three parts are specified because the terminals are implicitly defined as Sym - variables.
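To illustrate why the terminals need not be listed separately, here is a small Python sketch that recovers them from the other three components. It is an approximation of the idea, not Forlan's implementation: it only considers symbols that actually occur in the productions, and the name `terminals_of` is hypothetical.

```python
# The sample grammar; "" stands for %.
variables = {"S", "A", "B"}
start = "S"
productions = [("S", "AB"), ("A", "0A1"), ("A", ""), ("B", "1B0"), ("B", "")]

def terminals_of(variables, productions):
    """Collect every symbol used in the productions, then remove the variables."""
    symbols = set()
    for lhs, rhs in productions:
        symbols.add(lhs)
        symbols.update(rhs)   # adds each one-character symbol of the RHS
    return symbols - variables

print(sorted(terminals_of(variables, productions)))  # ['0', '1']
```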
The Language of a CFG
S → AB
A → 0A1
A → %
B → 1B0
B → %
The language of a CFG is the set of all terminal strings generated by the grammar.
What is the language of our sample CFG?
A language that can be specified by a CFG is called a context-free language (CFL).
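One way to explore the language of the sample grammar is to enumerate the short strings it generates. The Python sketch below does a breadth-first search over leftmost rewriting steps and discards any form with more than `max_len` terminals. The names are hypothetical, and termination relies on an assumption that holds for this grammar (a form never contains more than two variables); it is not a general-purpose algorithm.

```python
from collections import deque

# Sample grammar; "" plays the role of %.
PRODUCTIONS = {"S": ["AB"], "A": ["0A1", ""], "B": ["1B0", ""]}

def generated_strings(start, max_len):
    """Return all generated terminal strings of length <= max_len,
    ordered by length and then alphabetically."""
    seen = {start}
    queue = deque([start])
    results = set()
    while queue:
        form = queue.popleft()
        idx = next((i for i, c in enumerate(form) if c in PRODUCTIONS), None)
        if idx is None:               # no variables left: a generated string
            results.add(form)
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            num_terminals = sum(1 for c in new_form if c not in PRODUCTIONS)
            if num_terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=lambda s: (len(s), s))

print(generated_strings("S", 4))
# ['', '01', '10', '0011', '0110', '1100']
```

Looking at which strings appear (and which do not) for a few values of `max_len` is a quick way to form a conjecture about the language before proving it.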
Designing CFGs for Some Simple Languages
What is a CFG for {0^n1^n | n ∈ Nat}?
What is a CFG for {0^m1^n | m ≥ n}?
What is a CFG for {0^m1^n | m > n}?
{w | w in {a,b}* contains equal # of as & bs}
What is a CFG for the above language?
For intuition, consider annotating each symbol with #as - #bs so far:
 1  2  3  2  3  2  1  0 -1 -2 -3 -2 -1 -2 -1  0  1  0
 a  a  a  b  a  b  b  b  b  b  b  a  a  b  a  a  a  b
Note that each a matches a particular b.
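The annotation above is easy to compute mechanically. The following Python sketch (with the hypothetical name `running_difference`) returns #as - #bs after each symbol; a string has equal numbers of as and bs exactly when the last value is 0.

```python
def running_difference(w):
    """Return the list of (#as - #bs) values after each symbol of w."""
    diffs = []
    total = 0
    for ch in w:
        if ch == "a":
            total += 1
        elif ch == "b":
            total -= 1
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
        diffs.append(total)
    return diffs

print(running_difference("aaababbbbbbaabaaab"))
# [1, 2, 3, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1, -2, -1, 0, 1, 0]
```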
Balanced Parentheses
Consider a language in which the only two terminals are ( and ).
Let L(x) = # of left parens in x; R(x) = # of right parens in x.
A string x of parentheses is balanced iff
(1) L(x) = R(x) (alternatively, L(x) – R(x) = 0)
(2) For every prefix y of x, L(y) ≥ R(y) (alternatively, L(y) – R(y) ≥ 0)
This is just like the language with equal # of as and bs, except that the difference can never be < 0.
 1  2  3  2  3  2  1  0  1  2  3  2  1  2  1  0  1  0
 (  (  (  )  (  )  )  )  (  (  (  )  )  (  )  )  (  )
Note that each ( matches a particular ) after it.
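Conditions (1) and (2) translate directly into a single left-to-right scan. The Python sketch below (the name `is_balanced` is hypothetical) tracks L(y) – R(y) for each prefix y and rejects as soon as it drops below 0.

```python
def is_balanced(x):
    """Check conditions (1) and (2) for a string of parentheses."""
    depth = 0                      # L(y) - R(y) for the prefix y read so far
    for ch in x:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
        if depth < 0:              # condition (2) fails for this prefix
            return False
    return depth == 0              # condition (1)

print(is_balanced("((()()))((())())()"))  # True
print(is_balanced("())("))                # False: equal counts, bad prefix
```

The second call shows why condition (2) is needed: the string has two left and two right parentheses, but the prefix ()) has more right than left.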
What is a CFG for Balanced Parentheses?
Intuitively, why is the CFG correct?
(For a formal proof of correctness, see Kozen Lecture 20)
CFGs can Specify Natural Languages
(Imagine the nonterminals are indecomposable tokens, not strings.)
Give a parse tree for the following sentence:
The big black dog that chases the gray cat loves a furry mouse that eats a bug
CFGs can Specify Programming Languages
Here is a CFG for SLiP:
(Note: ; stands for [SEMI] token, ( stands for [LPAREN] token, etc.)
Give a parse tree for the following statement: prod := (print (sum, sum-1), 10*sum)
Ambiguity
A CFG is ambiguous if there is more than one parse tree for a string that it generates.
S → %
S → SS
S → aSb
S → bSa
This is an example of an ambiguous grammar. The string abba has an infinite number of parse trees! Here are a few of them:
[Figure: several parse trees for abba]
Ambiguity Can Affect Meaning
Ambiguity can affect the meaning of a phrase in both natural languages and programming languages.
Here are some natural language examples:
High school principal
Fruit flies like bananas.
A woman without her man is nothing.
A classic example in programming languages is arithmetic expressions:
E → ID(str) | INT(int) | E B E | ( E )
B → + | - | * | /
Arithmetic Expressions: Precedence
E → ID(str) | INT(int) | E B E | ( E )
B → + | - | * | /
What does 2 * 3 + 4 mean?
        E                            E
     /  |  \                      /  |  \
    E   B   E                    E   B   E
    |   | / | \                / | \ |   |
Int(2)  * E B E               E  B  E +  Int(4)
          | | |               |  |  |
     Int(3) + Int(4)     Int(2)  *  Int(3)
Arithmetic Expressions: Associativity
E → ID(str) | INT(int) | E B E | ( E )
B → + | - | * | /
What does 2 - 3 - 4 mean?
        E                            E
     /  |  \                      /  |  \
    E   B   E                    E   B   E
    |   | / | \                / | \ |   |
Int(2)  - E B E               E  B  E -  Int(4)
          | | |               |  |  |
     Int(3) - Int(4)     Int(2)  -  Int(3)
In a later lecture we’ll see how to rewrite the arithmetic expression grammar to unambiguously express the standard precedence and associativity rules.
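To see concretely how the choice of parse tree changes the meaning, the Python sketch below enumerates every parse tree of a flat token list under the productions E → INT(int) | E B E and evaluates each one. This is a simplified illustration with hypothetical names: it ignores the ID(str) and ( E ) productions and assumes the tokens alternate between integers and operators.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def parse_trees(tokens):
    """Yield every parse tree of tokens using E -> INT | E B E.
    A tree is an int or a triple (left_tree, operator, right_tree)."""
    if len(tokens) == 1:
        yield tokens[0]
        return
    for i in range(1, len(tokens), 2):        # operators sit at odd indices
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                yield (left, tokens[i], right)

def evaluate(tree):
    """Evaluate a parse tree bottom-up."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))

print([evaluate(t) for t in parse_trees([2, "*", 3, "+", 4])])  # [14, 10]
print([evaluate(t) for t in parse_trees([2, "-", 3, "-", 4])])  # [3, -5]
```

Each expression has two parse trees with different values: 2 * (3 + 4) = 14 versus (2 * 3) + 4 = 10, and 2 - (3 - 4) = 3 versus (2 - 3) - 4 = -5. The standard precedence and associativity rules select the second tree in both cases.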
Forlan Gram Module
- open Gram;
opening Gram
  type gram
  val fromString : string -> gram
  val toString : gram -> string
  val input : string -> gram
  val output : string * gram -> unit
  val variables : gram -> sym set
  val startVariable : gram -> sym
  val productions : gram -> prod set
  val numVariables : gram -> int
  val numProductions : gram -> int
  val alphabet : gram -> sym set
  val renameVariables : gram * sym_rel -> gram
  val renameVariablesCanonically : gram -> gram
  val generated : gram -> str -> bool
(* Many other bindings omitted; we’ll see a few more later *)
Forlan Gram Examples
- val L1gram = Gram.input "L1.gram";
val L1gram = - : gram
- Gram.output ("", L1gram);
{variables}
A, B, S
{start variable}
S
{productions}
A -> % | 0A1; B -> % | 1B0; S -> AB
val it = () : unit
- Gram.numVariables L1gram;
val it = 3 : int
- SymSet.toString (Gram.variables L1gram);
val it = "A, B, S" : string
Forlan Gram Examples, Part 2
- Gram.numProductions L1gram;
val it = 5 : int
- fun prodToString (sym,str) = (Sym.toString sym) ^ " -> " ^ (Str.toString str);
val prodToString = fn : sym * str -> string
- List.map prodToString (Set.toList (Gram.productions L1gram));
val it = ["A -> %","A -> 0A1","B -> %","B -> 1B0","S -> AB"] : string list
- fun testOnString gram string =
=   Gram.generated gram (Str.fromString (if string = "" then "%" else string));
val testOnString = fn : gram -> string -> bool
(* Gram.generated uses a general parsing technique that we’ll study later to automatically determine whether a string is generated by a grammar *)
- testOnString L1gram "011100";
val it = true : bool
- testOnString L1gram "010101";
val it = false : bool
(* Note testOnString and other grammar testing functions can be found in the CS235 module GramTest in ~/cs235/download/utils/GramTest.sml *)
Some Things Forlan Can’t Do
Here are some functions we’d like the Forlan Gram module to provide that it doesn’t:
val equalLanguages: gram -> gram -> bool (* Determine if two grammars accept the same language *)
val isAmbiguous: gram -> bool (* Determine if the given grammar is ambiguous – i.e., there is a string in the language generated by the grammar that has more than one parse tree. *)
This isn’t a fault of Forlan. The problems these functions would solve are undecidable, so they can’t be implemented in any programming language! More on this later in the course …