Context-Free Grammars

A Way to Specify Some Nonregular Languages

Tuesday, November 2, 2010
Reading: Sipser 2.1, Stoughton 4.1

CS235 Languages and Automata

Department of Computer Science Wellesley College

You Are Here!

Reg = Regular Languages: 0*1*, (01)*, 0*1*+(01)*
• Deterministic Finite Automaton • Nondeterministic Finite Automaton • Right-…

CFL = Context-Free Languages: 0^n1^n, ww^R
• Context-Free Grammar • Nondeterministic …

Dec = Recursive (Turing-Decidable) Languages: 0^n1^n2^n, ww
• Unrestricted Grammar

RE = Recursively Enumerable (Turing-Recognizable/Acceptable) Languages

Lan = All Languages

Context Free Grammars 23-2

Overview

o Introduce Context-Free Grammars (CFGs), a new formalism for specifying languages; the languages it specifies are called Context-Free Languages (CFLs).

o Define the set of strings denoted by a Context-Free Grammar via the notions of derivations and parse trees.

o Show that CFGs can specify some simple nonregular languages.

o Show how grammars are manipulated in Forlan.


A Sample Context-Free Grammar (CFG)

S → AB
A → 0A1
A → %
B → 1B0
B → %

Informally, a context-free grammar (CFG) is a collection of productions, i.e., substitution rules for rewriting variables (a.k.a. nonterminals) to strings of variables and terminals (non-variable symbols). Each rule has a left-hand side (LHS) consisting of a single variable and a right-hand side (RHS) consisting of a string of variables and terminals; % denotes the empty string.

A CFG has a start variable, which is conventionally the variable on the LHS of the first rule. A string of variables and terminals can be rewritten to another by substituting the RHS of a rule for an occurrence of the variable on its LHS. E.g.:

0A11B0 ⇒ 00A111B0 (using A → 0A1)
0A11B0 ⇒ 0%11B0 = 011B0 (using A → %)
0A11B0 ⇒ 0A111B00 (using B → 1B0)
0A11B0 ⇒ 0A11%0 = 0A110 (using B → %)
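The single rewriting step just described can be sketched in Python. This is a minimal illustration, not Forlan code: it assumes every symbol is a single character, that the uppercase letters S, A, B are the variables, and that the empty Python string "" plays the role of %. The names `GRAMMAR` and `one_step_rewrites` are hypothetical, chosen only for this sketch.

```python
# Hypothetical encoding of the sample CFG: each variable maps to its RHS strings.
# Assumptions: symbols are single characters; "" stands for % (the empty string).
GRAMMAR = {
    "S": ["AB"],
    "A": ["0A1", ""],
    "B": ["1B0", ""],
}


def one_step_rewrites(form, grammar):
    """Return every string obtainable from `form` by one substitution step."""
    results = []
    for i, sym in enumerate(form):
        # Only variables have productions; terminals are skipped.
        for rhs in grammar.get(sym, []):
            # Replace the single occurrence at position i by the RHS.
            results.append(form[:i] + rhs + form[i + 1:])
    return results


print(one_step_rewrites("0A11B0", GRAMMAR))
# ['00A111B0', '011B0', '0A111B00', '0A110']
```

The four results are the four rewrites listed above, with the %s already erased.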


Derivations Generate Strings

Grammar: S → AB, A → 0A1, A → %, B → 1B0, B → %

A sequence of substitution steps that rewrites the start variable to a string of terminals is called a derivation. A CFG generates a string of terminals s if there is a derivation of s. E.g.:

S ⇒ AB ⇒ %B = B ⇒ %
S ⇒ AB ⇒ 0A1B ⇒ 0%1B = 01B ⇒ 01% = 01
S ⇒ AB ⇒ A1B0 ⇒ A1%0 = A10 ⇒ %10 = 10
S ⇒ AB ⇒ 0A1B ⇒ 0A11B0 ⇒ 00A111B0 ⇒ 00A111%0 = 00A1110 ⇒ 00%1110 = 001110

Explicit %s are usually omitted from derivations unless % is the final string:

S ⇒ AB ⇒ B ⇒ %
S ⇒ AB ⇒ 0A1B ⇒ 01B ⇒ 01
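Checking that a sequence of strings is a derivation amounts to checking every step. The sketch below is a hypothetical helper, not part of Forlan; it assumes single-character symbols with "" standing for %, so %s are implicitly omitted, matching the convention above.

```python
GRAMMAR = {
    "S": ["AB"],
    "A": ["0A1", ""],
    "B": ["1B0", ""],
}
START = "S"


def one_step_rewrites(form, grammar):
    """All strings reachable from `form` by rewriting one variable occurrence."""
    return [form[:i] + rhs + form[i + 1:]
            for i, sym in enumerate(form)
            for rhs in grammar.get(sym, [])]


def is_derivation(steps, grammar, start):
    """True iff `steps` rewrites `start` to a terminal string, one step at a time."""
    if not steps or steps[0] != start:
        return False
    for before, after in zip(steps, steps[1:]):
        if after not in one_step_rewrites(before, grammar):
            return False
    # A derivation must end in a string of terminals (no variables left).
    return not any(sym in grammar for sym in steps[-1])


print(is_derivation(["S", "AB", "0A1B", "0A11B0", "00A111B0", "00A1110", "001110"],
                    GRAMMAR, START))                      # True
print(is_derivation(["S", "AB", "B", ""], GRAMMAR, START))  # True: derives %
print(is_derivation(["S", "AB", "01"], GRAMMAR, START))     # False: two rewrites at once
```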


Leftmost and Rightmost Derivations

Grammar: S → AB, A → 0A1, A → %, B → 1B0, B → %

There are often multiple derivations generating the same string that differ inconsequentially in the order of substitutions performed. E.g.:

S ⇒ AB ⇒ 0A1B ⇒ 0A11B0 ⇒ 00A111B0 ⇒ 00A1110 ⇒ 001110
S ⇒ AB ⇒ A1B0 ⇒ 0A11B0 ⇒ 0A110 ⇒ 00A1110 ⇒ 001110

We can standardize the sequence by always substituting for the leftmost variable (resulting in a leftmost derivation):

S ⇒ AB ⇒ 0A1B ⇒ 00A11B ⇒ 0011B ⇒ 00111B0 ⇒ 001110

or for the rightmost variable (resulting in a rightmost derivation):

S ⇒ AB ⇒ A1B0 ⇒ A10 ⇒ 0A110 ⇒ 00A1110 ⇒ 001110


Parse Trees

Grammar: S → AB, A → 0A1, A → %, B → 1B0, B → %

Any sequence of rewriting steps can be depicted as a parse tree in which each internal node shows how a variable rewrites to the children of the node:

S
  A
    0
    A
      0
      A
        %
      1
    1
  B
    1
    B
      %
    0

The yield of a parse tree is the string consisting of the leaves of the tree read from left to right, i.e., the result of the rewriting steps. For the above parse tree, the yield is 00%111%0 = 001110.
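Reading off the yield is a left-to-right traversal of the leaves. In this hypothetical Python encoding (not Forlan's), an internal node is a pair (variable, children) and a leaf is a one-character string; "%" is kept as a leaf so that both the raw leaf sequence and the yield of the tree above can be shown.

```python
# Parse tree above: internal node = (variable, [children]); leaf = symbol string.
TREE = ("S", [
    ("A", ["0", ("A", ["0", ("A", ["%"]), "1"]), "1"]),
    ("B", ["1", ("B", ["%"]), "0"]),
])


def leaves(tree):
    """Leaves of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _variable, children = tree
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


def tree_yield(tree):
    """Concatenate the leaves, erasing % (the empty string)."""
    return "".join(sym for sym in leaves(tree) if sym != "%")


print("".join(leaves(TREE)))  # 00%111%0
print(tree_yield(TREE))       # 001110
```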


Alternative Ways to Write CFGs

Multiple productions for the same variable can be combined using |. In the literature, the → in productions is often replaced by ::=, especially in so-called Backus-Naur Form (BNF).

S → AB             S ::= AB
A → 0A1 | %        A ::= 0A1 | %
B → 1B0 | %        B ::= 1B0 | %

Formally, a CFG is a quadruple:

({S,A,B},                                    (1) set of variables
 {0,1},                                      (2) set of terminals
 S,                                          (3) start variable
 {(S,AB), (A,0A1), (A,%), (B,1B0), (B,%)})   (4) productions

Forlan format:

{variables}
S, A, B
{start variable}
S
{productions}
S -> AB; A -> % | 0A1; B -> % | 1B0

Only three parts are specified in Forlan format because the terminals are implicitly defined as Sym - variables.
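The quadruple view translates directly into a small data structure. The class below is a hypothetical Python sketch, not how Forlan represents grammars: it stores all four components explicitly (unlike Forlan's format, which leaves the terminals implicit) and checks that variables and terminals are disjoint, that the start symbol is a variable, and that each production has a single variable on its LHS and only known symbols on its RHS. As before, "" stands for %.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A CFG as a quadruple; each production is a pair (lhs, rhs), "" meaning %."""
    variables: frozenset
    terminals: frozenset
    start: str
    productions: frozenset

    def is_well_formed(self):
        if not self.variables.isdisjoint(self.terminals):
            return False                      # no symbol may be both kinds
        if self.start not in self.variables:
            return False                      # the start symbol must be a variable
        symbols = self.variables | self.terminals
        return all(lhs in self.variables and all(s in symbols for s in rhs)
                   for lhs, rhs in self.productions)


G = CFG(
    variables=frozenset({"S", "A", "B"}),
    terminals=frozenset({"0", "1"}),
    start="S",
    productions=frozenset({("S", "AB"), ("A", "0A1"), ("A", ""),
                           ("B", "1B0"), ("B", "")}),
)
print(G.is_well_formed())  # True
```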


The Language of a CFG

Grammar: S → AB, A → 0A1, A → %, B → 1B0, B → %

The language of a CFG is the set of all terminal strings generated by the grammar. What is the language of our sample CFG?

A language that can be specified by a CFG is called a context-free language (CFL).
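One way to get a feel for the language of a CFG is to list its short strings. The sketch below is a hypothetical helper (not Forlan's Gram.generated): it does a breadth-first search over leftmost derivations of the sample grammar, pruning any string with more than max_len terminals; the pruning is sound because terminals, once produced, are never removed. It assumes variables cannot pile up without producing terminals; for a grammar such as S → SS | %, this search would not terminate.

```python
from collections import deque

GRAMMAR = {
    "S": ["AB"],
    "A": ["0A1", ""],
    "B": ["1B0", ""],
}


def strings_up_to(grammar, start, max_len):
    """Terminal strings of length <= max_len generated by the grammar."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        i = next((k for k, s in enumerate(form) if s in grammar), None)
        if i is None:                      # no variables left: a generated string
            found.add(form)
            continue
        for rhs in grammar[form[i]]:       # expand the leftmost variable
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(s not in grammar for s in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


words = sorted(strings_up_to(GRAMMAR, "S", 4), key=lambda w: (len(w), w))
print(", ".join(w or "%" for w in words))  # %, 01, 10, 0011, 0110, 1100
```

Inspecting this output is one way to start answering the question above.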


Designing CFGs for Some Simple Languages

What is a CFG for {0^n1^n | n ∈ Nat}?

What is a CFG for {0^m1^n | m ≥ n}?

What is a CFG for {0^m1^n | m > n}?
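When designing grammars for these languages, it helps to have a reference membership test to compare against. The hypothetical sketch below encodes each set-builder definition directly as a Python predicate and lists the short members in order of length; the short strings generated by a candidate grammar can then be compared with these lists.

```python
import re
from itertools import product


def zeros_then_ones(w):
    """Return (m, n) if w == 0^m 1^n, otherwise None."""
    match = re.fullmatch(r"(0*)(1*)", w)
    if match is None:
        return None
    return len(match.group(1)), len(match.group(2))


def in_equal(w):       # {0^n 1^n | n in Nat}
    mn = zeros_then_ones(w)
    return mn is not None and mn[0] == mn[1]


def in_at_least(w):    # {0^m 1^n | m >= n}
    mn = zeros_then_ones(w)
    return mn is not None and mn[0] >= mn[1]


def in_more(w):        # {0^m 1^n | m > n}
    mn = zeros_then_ones(w)
    return mn is not None and mn[0] > mn[1]


def members(pred, max_len):
    """All strings over {0,1} of length <= max_len satisfying pred."""
    return [w for n in range(max_len + 1)
            for w in ("".join(t) for t in product("01", repeat=n))
            if pred(w)]


print(members(in_equal, 4))     # ['', '01', '0011']
print(members(in_at_least, 3))  # ['', '0', '00', '01', '000', '001']
print(members(in_more, 3))      # ['0', '00', '000', '001']
```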


{w | w in {a,b}* contains equal # of as & bs}

What is a CFG for the above language?

For intuition, consider annotating each symbol with #as - #bs so far:

 1  2  3  2  3  2  1  0 -1 -2 -3 -2 -1 -2 -1  0  1  0
 a  a  a  b  a  b  b  b  b  b  b  a  a  b  a  a  a  b

Note that each a matches a particular b.
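The annotation above is a running count, and one way to pick a particular b for each a is with a stack. Both are sketched below as hypothetical Python helpers; the example string is the one annotated above, and the stack-based pairing is just one possible matching, not necessarily the one intended on the slide.

```python
def running_difference(w):
    """List of (#as - #bs) after each prefix of w, for w over {a, b}."""
    diffs, d = [], 0
    for ch in w:
        if ch not in "ab":
            raise ValueError(f"unexpected symbol {ch!r}")
        d += 1 if ch == "a" else -1
        diffs.append(d)
    return diffs


def match_pairs(w):
    """Pair each a with a b using a stack; returns (pairs, unmatched positions)."""
    stack, pairs = [], []
    for j, ch in enumerate(w):
        if stack and w[stack[-1]] != ch:
            # ch matches the most recent unmatched opposite symbol.
            pairs.append((stack.pop(), j))
        else:
            stack.append(j)
    return pairs, stack


w = "aaababbbbbbaabaaab"
print(running_difference(w))
# [1, 2, 3, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1, -2, -1, 0, 1, 0]
pairs, unmatched = match_pairs(w)
print(len(pairs), unmatched)  # 9 []
```

The stack only ever holds copies of one symbol, so it is empty at the end exactly when the counts of as and bs are equal.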


Balanced Parentheses

Consider a language in which the only two terminals are ( and ). Let
L(x) = # of left parens in x
R(x) = # of right parens in x
A string x of parentheses is balanced iff
(1) L(x) = R(x) (alternatively, L(x) - R(x) = 0), and
(2) for every prefix y of x, L(y) ≥ R(y) (alternatively, L(y) - R(y) ≥ 0).
This is just like the language with equal # of as and bs, except that the difference can never be < 0.

 1  2  3  2  3  2  1  0  1  2  3  2  1  2  1  0  1  0
 (  (  (  )  (  )  )  )  (  (  (  )  )  (  )  )  (  )

Note that each ( matches a particular ) after it.
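Conditions (1) and (2) can be checked in a single left-to-right pass that tracks L(y) - R(y) for the prefix y read so far. A minimal Python sketch, with the hypothetical name `is_balanced`:

```python
def is_balanced(x):
    """Check conditions (1) and (2) for a string over the terminals ( and )."""
    depth = 0                  # L(y) - R(y) for the prefix y read so far
    for ch in x:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
        if depth < 0:          # condition (2) fails for this prefix
            return False
    return depth == 0          # condition (1)


print(is_balanced("((()()))((())())()"))  # True
print(is_balanced("())("))                # False: prefix ()) has difference -1
```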


What is a CFG for Balanced Parentheses?

Intuitively, why is the CFG correct?

(For a formal proof of correctness, see Kozen Lecture 20)


CFGs can Specify Natural Languages

| | that
→ a | the
→ big | small | black | gray | furry
→ dog | cat | mouse | bug
→ loves | chases | eats

(Imagine the nonterminals are indecomposable tokens, not strings.)

Give a parse tree for the following sentence: The big black dog that chases the gray cat loves a furry mouse that eats a bug


CFGs can Specify Programming Languages

Here is a CFG for SLiP:

Stm → Stm ; Stm | [ID(string)] := Exp | print ( ExpList )
Exp → [ID(string)] | [INT(integer)] | Exp [OP(binop)] Exp | ( Stm , Exp )
ExpList → Exp | Exp , ExpList

(Note: ; stands for [SEMI] token, ( stands for [LPAREN] token, etc.)

Give a parse tree for the following statement: prod := (print (sum, sum-1), 10*sum)


Ambiguity

A CFG is ambiguous if there is more than one parse tree for a string that it generates.

S → %
S → SS
S → aSb
S → bSa

This is an example of an ambiguous grammar. The string abba has an infinite number of parse trees! Here are a few of them:

[Figure: several distinct parse trees for abba]
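Because the set of parse trees for abba is infinite, a program can only count them up to some bound. The hypothetical sketch below counts parse trees of height at most h (height = number of production levels on the longest root-to-leaf path) for the ambiguous grammar above, using memoized recursion over the ways each RHS can split the target string; "" stands for %.

```python
from functools import lru_cache

# The ambiguous grammar above; "" stands for %.
GRAMMAR = {"S": ("", "SS", "aSb", "bSa")}


@lru_cache(maxsize=None)
def trees(var, s, height):
    """Number of parse trees rooted at `var` with yield `s` and height <= height."""
    if height == 0:
        return 0
    return sum(seq(rhs, s, height - 1) for rhs in GRAMMAR[var])


@lru_cache(maxsize=None)
def seq(rhs, s, height):
    """Ways the symbols of `rhs` jointly derive `s`, each subtree of height <= height."""
    if rhs == "":
        return 1 if s == "" else 0
    first, rest = rhs[0], rhs[1:]
    if first not in GRAMMAR:                 # terminal: must match literally
        return seq(rest, s[1:], height) if s[:1] == first else 0
    # Variable: try every split of s between `first` and the remaining symbols.
    return sum(trees(first, s[:k], height) * seq(rest, s[k:], height)
               for k in range(len(s) + 1))


for h in range(1, 6):
    print(h, trees("S", "abba", h))
```

The count is 1 at height 3 (S → SS with children S → aSb and S → bSa, each with an inner S → %) and keeps growing as the bound increases, since extra S → SS and S → % nodes can always be inserted.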


Ambiguity Can Affect Meaning

Ambiguity can affect the meaning of a phrase in both natural languages and programming languages.

Here are some natural language examples:
High school principal
Fruit flies like bananas.
A woman without her man is nothing.

A classic example in programming languages is arithmetic expressions:

E → ID(str) | INT(int) | E B E | ( E )
B → + | - | * | /


Arithmetic Expressions: Precedence

E → ID(str) | INT(int) | E B E | ( E )
B → + | - | * | /

What does 2 * 3 + 4 mean?

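Two parse trees, written in bracketed form:

[E [E Int(2)] [B *] [E [E Int(3)] [B +] [E Int(4)]]]     groups as 2 * (3 + 4)
[E [E [E Int(2)] [B *] [E Int(3)]] [B +] [E Int(4)]]     groups as (2 * 3) + 4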
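The difference in meaning shows up as soon as the two trees are evaluated. In this hypothetical Python sketch, each tree above is written as nested (left, operator, right) tuples, with plain integers standing for the Int(n) leaves; the E and B wrapper nodes are dropped because they do not affect the value.

```python
import operator

# (left, op, right) for E -> E B E; ints stand for Int(n) leaves.
TREE_1 = (2, "*", (3, "+", 4))   # first tree above: 2 * (3 + 4)
TREE_2 = ((2, "*", 3), "+", 4)   # second tree above: (2 * 3) + 4

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def evaluate(tree):
    """Evaluate a parse tree bottom-up."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))


print(evaluate(TREE_1), evaluate(TREE_2))  # 14 10
```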


Arithmetic Expressions: Associativity

E → ID(str) | INT(int) | E B E | ( E )
B → + | - | * | /

What does 2 - 3 - 4 mean?

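Two parse trees, written in bracketed form:

[E [E Int(2)] [B -] [E [E Int(3)] [B -] [E Int(4)]]]     groups as 2 - (3 - 4)
[E [E [E Int(2)] [B -] [E Int(3)]] [B -] [E Int(4)]]     groups as (2 - 3) - 4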

In a later lecture we’ll see how to rewrite the arithmetic expression grammar to unambiguously express the standard precedence and associativity rules.
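More generally, every way of choosing the top-level operator gives a different parse tree, and with an ambiguous grammar these trees need not agree in value. The hypothetical sketch below enumerates all parse trees for a flat token list using only the E → E B E and E → INT(int) rules (identifiers and parentheses are ignored for brevity) and evaluates each one.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def all_trees(tokens):
    """All parse trees of a flat [num, op, num, ..., num] list under E -> E B E | INT."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Each operator position can be the root (the top-level E B E).
    for i in range(1, len(tokens), 2):
        for left in all_trees(tokens[:i]):
            for right in all_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


def evaluate(tree):
    """Evaluate a (left, op, right) tree bottom-up."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))


for tree in all_trees([2, "-", 3, "-", 4]):
    print(tree, "=", evaluate(tree))
# (2, '-', (3, '-', 4)) = 3
# ((2, '-', 3), '-', 4) = -5
```

With four operators there are already 14 different trees, so the number of possible meanings grows quickly with the length of the expression.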


Forlan Gram Module

- open Gram;
opening Gram
  type gram
  val fromString : string -> gram
  val toString : gram -> string
  val input : string -> gram
  val output : string * gram -> unit
  val variables : gram -> sym set
  val startVariable : gram -> sym
  val productions : gram -> prod set
  val numVariables : gram -> int
  val numProductions : gram -> int
  val alphabet : gram -> sym set
  val renameVariables : gram * sym_rel -> gram
  val renameVariablesCanonically : gram -> gram
  val generated : gram -> str -> bool
  (* Many other bindings omitted; we’ll see a few more later *)


Forlan Gram Examples

- val L1gram = Gram.input "L1.gram";
val L1gram = - : gram
- Gram.output ("", L1gram);
{variables}
A, B, S
{start variable}
S
{productions}
A -> % | 0A1; B -> % | 1B0; S -> AB
val it = () : unit
- Gram.numVariables L1gram;
val it = 3 : int
- SymSet.toString (Gram.variables L1gram);
val it = "A, B, S" : string


Forlan Gram Examples, Part 2

- Gram.numProductions L1gram;
val it = 5 : int
- fun prodToString (sym,str) = (Sym.toString sym) ^ " -> " ^ (Str.toString str);
val prodToString = fn : sym * str -> string
- List.map prodToString (Set.toList (Gram.productions L1gram));
val it = ["A -> %","A -> 0A1","B -> %","B -> 1B0","S -> AB"] : string list

- fun testOnString gram string =
=   Gram.generated gram (Str.fromString (if string = "" then "%" else string));
val testOnString = fn : gram -> string -> bool
(* Gram.generated uses a general parsing technique that we’ll study later to automatically determine whether a string is generated by a grammar *)
- testOnString L1gram "011100";
val it = true : bool
- testOnString L1gram "010101";
val it = false : bool
(* Note: testOnString and other grammar testing functions can be found in the CS235 module GramTest in ~/cs235/download/utils/GramTest.sml *)

Some Things Forlan Can’t Do

Here are some functions we’d like the Forlan Gram module to provide that it doesn’t:

val equalLanguages : gram -> gram -> bool (* Determine if two grammars generate the same language *)

val isAmbiguous: gram -> bool (* Determine if the given grammar is ambiguous – i.e., there is a string in the language generated by the grammar that has more than one parse tree. *)

This isn’t a fault of Forlan. These functions are undecidable: they can’t be implemented in any programming language! More on this later in the course …
