
Unit 7

Context-free Grammars Context-free Languages

Reading: Sipser, ch. 2.1; Hopcroft et al., ch. 5

1 Grammar

• A DFA/NFA describes a language in a computational way: given a word, we run it on the FA and check whether it stops in an accepting state.
• A grammar is another method of describing languages, in a syntactic way.
• A grammar is a set of rules: we start with the start variable and rewrite it according to the grammar rules until we obtain the desired word.

2 The Origin of Grammar

• The origin of the name grammar for this computational model is in natural languages, where grammar is a collection of rules. • This collection defines what is legal in the language and what is not.

3 Example of a grammar

• Symbols: Σ={a,b}, Variables: V={S,B}
• The following grammar generates a*b*.

S → aS (S is the starting variable)
S → B
B → Bb
B → ε

How can we generate ab, aab, ε?

4 The Grammars Formalism

A grammar is composed of:
1. Terminals Σ = symbols of the alphabet of the language being defined.
2. Variables V = a finite set of other symbols, each of which represents a language.
3. Start S ∈ V = the variable whose language is the one being defined.
4. A collection of production rules.

5 Common Notations

• Terminals: lower case, beginning of the alphabet (a, b, c).
• Variables: upper case, beginning of the alphabet (A, B, C).
• Strings of terminals: lower case, end of the alphabet (x, y, u, v, w).
• Mixed strings (terminals + variables): lower case Greek letters (α, β, γ).
• Starting variable: S

6 Production Rules

• A production rule has the form: α → β
• It means:
– α can be replaced by β
– α constructs β
– α produces β

7 Example

• Symbols: Σ={0,1}, Variables: V={S}
• The language: L = {0^n 1^n | n > 0}.

S → 0S1 (S is the starting variable)
S → 01

8 Another Example

• Σ={a,b,#}
• The following grammar generates {a^n b^m # b^m a^n | n, m ≥ 0}.

S → aSa
S → B
B → bBb
B → #

9 Derivation of a word

1. Write down the start variable.
2. Find a variable A that is written down and a rule A → α.
3. Replace the variable A with the string α.
4. Repeat steps 2+3 until no variables remain.
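The four steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the course material: a grammar is stored as a dict from a single-character variable to a list of right-hand-side strings, ε is written as the empty string, and the helper name `generate` and the length bound `max_len` are illustrative. The search always replaces the leftmost variable and uses the a*b* grammar of slide 3; it terminates for grammars such as this one, where the number of variables in a form stays bounded.

```python
from collections import deque

# Grammar from slide 3 (generates a*b*); "" stands for the empty string ε.
RULES = {"S": ["aS", "B"], "B": ["Bb", ""]}

def generate(rules, start="S", max_len=3):
    """Return all words of length <= max_len derivable from `start`.

    Breadth-first search over sentential forms, always replacing the
    leftmost variable (steps 2 and 3 of the procedure).
    """
    words, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # Locate the leftmost variable, if any.
        pos = next((i for i, ch in enumerate(form) if ch in rules), None)
        if pos is None:              # step 4: no variables remain
            words.add(form)
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Prune forms whose terminals alone already exceed the bound.
            terminals = sum(ch not in rules for ch in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(words, key=lambda w: (len(w), w))
```

Calling `generate(RULES, max_len=2)` lists the words of a*b* up to length 2; the word ba never appears, since no rule can place a b before an a.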

10 ⇒ notation

• We use the notation "⇒" to represent an actual derivation: α ⇒ β
• It means: the string β was derived from α using a production rule.
• We can derive αAβ ⇒ αγβ if A → γ is a production rule.
• Example: S → 01; S → 0S1. S ⇒ 0S1 ⇒ 00S11 ⇒ 000111.

11 Production w=aacb

1. S → aS
2. S → bS
3. S → cS
4. S → ε

can be written: S → aS | bS | cS | ε

When a variable has several production rules, they can all be written in one line.

• How can the word w=aacb be produced? (The number of the applied rule is given in parentheses.)

S ⇒(1) aS ⇒(1) aaS ⇒(3) aacS ⇒(2) aacbS ⇒(4) aacb

12 Parsing / Parsing Tree

• Producing a word according to a given grammar is called parsing.
• We can represent the same production sequence by a parsing tree.
• Each node in the tree is either a variable or a terminal.
• A terminal node is a leaf.
• The resulting word is a concatenation of the labels of the leaves in left-to-right order (preorder traversal).
• This is called the yield of the parsing tree.

13 Parsing Tree of w=aacb

S
  a
  S
    a
    S
      c
      S
        b
        S
          ε

14 Parsing Tree of w=aacb

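Or a step-by-step derivation (each tree adds one rule application; the children of a node are listed in parentheses):

S
S(a S)
S(a S(a S))

15 Parsing w=aacb (cont.)

S(a S(a S(c S)))
S(a S(a S(c S(b S))))
S(a S(a S(c S(b S(ε)))))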
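The yield of the parsing tree for w=aacb (slides 12 to 15) can be computed with a short recursive function. This is a minimal sketch, assuming a hypothetical encoding that is not from the slides: an inner node is a pair `(variable, children)`, a leaf is a plain string, and an ε leaf is the empty string.

```python
def tree_yield(node):
    """Concatenate the leaf labels from left to right."""
    if isinstance(node, str):        # a leaf: a terminal, or "" for ε
        return node
    _variable, children = node       # an inner node: (variable, children)
    return "".join(tree_yield(child) for child in children)

# The parsing tree of slide 13, written with the encoding above.
TREE = ("S", ["a",
        ("S", ["a",
         ("S", ["c",
          ("S", ["b",
           ("S", [""])])])])])
```

`tree_yield(TREE)` visits the leaves a, a, c, b, ε in that order, so the ε leaf contributes nothing to the word.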

16 Context-Free Grammar (CFG)

A context-free grammar (CFG) G is a 4-tuple

G = (V, Σ, S, R), where

1. V is a finite set called the variables.
2. Σ is a finite set, disjoint from V, called the terminals.
3. S ∈ V is the start symbol.
4. R is a finite set of production rules of the form A → α, where A ∈ V and α ∈ (V ∪ Σ)*.

17 Derivation in CFG

• Let α, β and γ be strings of variables and terminals.
• If A → γ is a rule in the grammar, we say that αAβ derives αγβ, written αAβ ⇒ αγβ.
• We write x ⇒* y if x = y or there exists a sequence x1, x2, ..., xk, k ≥ 0, such that x ⇒ x1 ⇒ x2 ⇒ ... ⇒ xk ⇒ y.

⇒ means derives in one step
⇒+ means derives in one or more steps
⇒* means derives in zero or more steps

18 Context-Free Languages

• The language of the grammar is

L(G) = {w ∈ Σ* | S ⇒* w}

• The language generated by a Context Free Grammar (CFG) is called a context-free language (CFL).

19 Examples over Σ={0,1}

• Construct a grammar for the following language: L = {0,00,1}

• G = (V={S}, Σ={0,1}, S, R)

• R:
S → 0
S → 00
S → 1

Alternatively: S → 0 | 00 | 1

20 Examples over Σ={0,1}

• Construct a grammar for the following language: L = {0^n 1^n | n ≥ 0}

• G = (V={S}, Σ={0,1}, S, R) where R: S → 0S1 | ε

• Example: let's parse the word 0011: S ⇒ 0S1 ⇒ 00S11 ⇒ 0011

21 Examples over Σ={0,1}

• Construct a grammar for the following language: L = {0^n 1^n | n ≥ 1}

• G = (V={S}, Σ={0,1}, S, R) where R:

S → 0S1 | 01

• Let's parse the word w=00001111

22 Examples over Σ={0,1}

• Construct a grammar for the following language: L = {0^n 1^m | m ≥ n > 0}

• G = (V={S,B}, Σ={0,1}, S, R) where R:

S → 0S1 | 0B1
B → B1 | ε

23 Examples over Σ={0,1}

• Construct a grammar for the following language: L = 0*1⁺ (zero or more 0s followed by one or more 1s)

• G = (V={S,B}, Σ={0,1}, S, R) where R:

S → 0S | 1B
B → 1B | ε

What about 0*1*?

24 Examples over Σ={0,1}

• Construct a grammar for the following language: L = {0^(2i+1) | i ≥ 0}

• G = (V={S}, Σ={0,1}, S, R) where R:

S → 00S | 0

Alternatively: S → 0S0 | 0

25 Examples over Σ={0,1}

• Construct a grammar for the following language: L = {0^(i+1) 1^i | i ≥ 0}

• G = (V={S}, Σ={0,1}, S, R) where R:

S → 0S1 | 0

26 Examples over Σ={0,1}

• Construct a grammar for the following language: L = {w ∈ {0,1}* | |w| mod 2 = 1}

• G = (V={S}, Σ={0,1}, S, R) where R:

S → 0 | 1 | 1S1 | 0S0 | 1S0 | 0S1

Let's parse: 011100101

• Alternatively: S → DSD | D ; D → 0 | 1
• Alternatively: S → DDS | D ; D → 0 | 1

27 Examples over Σ={0,1}

• Construct a grammar for the following language: L = {0^n 1^n | n > 0} ∪ {1^n 0^n | n ≥ 0}

• G = (V={S,A,B}, Σ={0,1}, S, R) where R:

S → A | B
A → 0A1 | 01
B → 1B0 | ε

28 Exercise

Construct grammars for the following languages over Σ={0,1}, where #1(w) and #0(w) denote the number of 1s and 0s in w:

1. L1 = {w | #1(w) is even}

2. L2 = {w | #1(w) is odd}

3. L3 = {w | #1(w) = #0(w)}

4. L4 = {0^n 1 0^m 1 0^(n+m) | n,m ≥ 0}

Solution: In class

29 Define the Language for a CFG

• Give a description of L(G) for the following grammar G: S → 0S0 | 1

• L(G) = {0^n 1 0^n | n ≥ 0}

30 Define the Language for a CFG

• Give a description of L(G) for the following grammars G:

S → 0S0 | 1S1 | ε
– L(G) = {w ∈ {0,1}* | w = w^R, |w| is even}

S → 0S0 | 1S1 | 1 | 0 | ε
– L(G) = {w ∈ {0,1}* | w = w^R}

31 Define the Language for a CFG

• Give a description of L(G) for the following grammar G:
S → 0A | 0B
A → 1S
B → 1

• L(G) = {(01)^n | n ≥ 1}

• Simpler version: S → 01S | 01

32 Define the Language for a CFG

• Give a description of L(G) for the following grammar G: S → 0S11 | 0

• L(G) = {0^(n+1) 1^(2n) | n ≥ 0}

33 Define the Language for a CFG

• Give a description of L(G) for the following grammar G:
S → E | NE
N → D | DN
D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
E → 0 | 2 | 4 | 6

• L(G) = {w | w represents an even octal number}

34 Define the Language for a CFG

• Give a description of L(G) for the following grammar G:
S → N.N | -N.N
N → D | DN
D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

• L(G) = {w | w represents a decimal rational number (that has a finite representation) }

35 Exercise

Give a description of L(G) for each of the following grammars over Σ={a,b,$}:

G1: S → aSb | A
    A → Aa | ε

G2: S → aSb | SS | ε

G3: S → aSa | bSb | aS | bS | $

Solution: In class

36 Exercise

Give a description of L(G) for the following grammar over Σ={0,…,9,+,×}:

E → E+E | E×E | D
D → 0|1|2|..|9

• Let's parse the string 3+4×5
– E ⇒ E+E ⇒ D+E ⇒ 3+E ⇒ 3+E×E ⇒ … ⇒ 3+4×5
– E ⇒ E×E ⇒ E×D ⇒ E×5 ⇒ E+E×5 ⇒ … ⇒ 3+4×5

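37 Exercise (cont.)

E → E+E | E×E | D
D → 0|1|2|..|9

• The string 3+4×5 can be produced in several ways:

[Figure: two parsing trees for 3+4×5. In the first, the root applies E → E+E, with 3 on the left and the subtree for 4×5 on the right. In the second, the root applies E → E×E, with the subtree for 3+4 on the left and 5 on the right.]

38 Ambiguous Production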

• So if we use this grammar to define a programming language, then we will have several computations for 3+4×5.
• There is no precedence of '×' over '+'.
• This language will be impossible to use because the user won't know which computation the compiler uses.
• Two possible results: 35 or 23.
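The two results can be reproduced by enumerating every parsing tree of the expression under E → E+E | E×E | D. This is a minimal sketch with assumptions: the input is a list of single-character tokens, the multiplication sign is typed as `x`, and the function name `parse_values` is illustrative. Each operator position is tried as the rule applied at the root.

```python
def parse_values(tokens):
    """Values of all parsing trees of `tokens` under E → E+E | E×E | D."""
    if len(tokens) == 1:                      # E → D and D → digit
        return [int(tokens[0])]
    results = []
    for i in range(1, len(tokens), 2):        # every operator may be the root
        op = tokens[i]
        for left in parse_values(tokens[:i]):
            for right in parse_values(tokens[i + 1:]):
                # "+" adds; any other operator token is read as multiplication.
                results.append(left + right if op == "+" else left * right)
    return results
```

For `list("3+4x5")` the function finds exactly two trees: one with + at the root (3 plus 20) and one with × at the root (7 times 5).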

39 Ambiguity

• The ability of a grammar to generate the same string in several ways is called ambiguity.
• A grammar is ambiguous if there exists a string w that has at least two different parse trees.
• Sometimes an ambiguous grammar has an equivalent unambiguous grammar defining the same language.
• Some CFLs are inherently ambiguous: every grammar that generates them is ambiguous.
– Example: {a^i b^j c^k | i=j or j=k}

40 Another example

E → E + T | T
T → T × F | F
F → (E) | D
D → 0 | 1 | 2 | … | 9

• What are the terminals?
• Parse the string: 3+4×5 (is it unique?)
• Parse the string: (3 + 4) × 5

Solution: in class

41 Finite Languages

Theorem: Any finite language can be generated by a CFG.

Proof:

• Let L = {w_i | 1 ≤ i ≤ n and w_i ∈ Σ*} be a finite language over Σ.
• We construct the following grammar:

S → w_1 | w_2 | w_3 | … | w_n

42 Regular Languages

Questions:
• Can the regular languages be generated by CFGs?
• Are the RL a subset of the CFL?

• Answer: in the following.

43 The Regular Grammar

A grammar is called regular if each production has one of the following forms: A → w or A → wB, where w ∈ Σ* and A,B ∈ V.

44 Regular grammar example

Example

S → 012
S → 01A
A → 0A
A → 0
A → ε

45 Simple Regular Grammar

A grammar is called simple regular if each production has one of the following forms: A → σ or A → σB

where σ ∈ Σ_ε (that is, Σ ∪ {ε}) and A,B ∈ V.

• Claim: Each regular grammar can be modified into a simple regular form.

46 Simple Regular grammar

Example (each column is one rewriting step, from left to right):

Original      Step 1       Step 2
S → 012       S → 0B       S → 0B
S → 01A       S → 0C       S → 0C
A → 0A        A → 0A       A → 0A
A → 0         A → 0        A → 0
A → ε         A → ε        A → ε
              B → 12       B → 1D
              C → 1A       C → 1A
                           D → 2

47 The Regular Grammar

• Theorem: The set of languages that have a regular grammar is the set of regular languages.

• Proof Idea: Given an NFA we will create an equivalent regular grammar. Given a regular grammar we will build an equivalent NFA.

48 From NFA to Regular Grammars

Lemma: A regular grammar (RG) can be constructed for any NFA.

The basic idea (no proof): Translate the transitions of the NFA into rules of an RG.

49 Algorithm: from NFA to RG

1. Rename all states of the NFA to a set of capital letters.
2. Name the start state of the NFA by S.
3. Translate each transition δ(A,σ)=B into the rule A → σB and δ(A,ε)=B into the rule A → B.
4. For each accepting state A in the NFA, add the rule A → ε.

50 Example:

[Figure: NFA with states q0 and q1 over the alphabet {0,1}]

Denote q0 by S and q1 by A

51 Example:

[Figure: the same NFA with its states renamed to S and A]

The regular grammar is:
S → 0S | 0A | 1A | ε
A → 0A | 1S
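The translation of slide 49 can be sketched in Python. This is a minimal sketch under assumptions that are not in the slides: the NFA is given as a dict from `(state, symbol)` to a set of next states, ε is the empty string, states are already named by capital letters, and the function name `nfa_to_grammar` is illustrative. The transition table below is read back from the grammar of slide 51, because the state diagram itself is not reproduced here.

```python
def nfa_to_grammar(delta, accepting):
    """Translate NFA transitions into regular-grammar rules.

    delta maps (state, symbol) to a set of next states; symbol "" is ε.
    """
    rules = {}
    for (state, symbol), targets in delta.items():
        for target in sorted(targets):
            # δ(A, σ) = B becomes A → σB; δ(A, ε) = B becomes A → B.
            rules.setdefault(state, []).append(symbol + target)
    for state in sorted(accepting):
        rules.setdefault(state, []).append("")   # accepting A gives A → ε
    return rules

# Assumed transition table, reconstructed from the grammar on slide 51.
DELTA = {("S", "0"): {"S", "A"}, ("S", "1"): {"A"},
         ("A", "0"): {"A"}, ("A", "1"): {"S"}}
GRAMMAR = nfa_to_grammar(DELTA, accepting={"S"})
```

`GRAMMAR["S"]` then holds the four alternatives 0S, 0A, 1A and ε, and `GRAMMAR["A"]` holds 0A and 1S, matching the grammar on the slide up to the order of alternatives.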

Generate w=1010

52 From RG to NFA

Lemma: An NFA can be constructed for any regular grammar G.

The basic idea (no proof): Construct a state for each variable and define the transition functions according to the regular grammar.

53 Algorithm: from RG to NFA

Input: A regular grammar G.

Algorithm:
1. Transform all rules of the grammar into simple regular form.

Denote the simple regular grammar: G = (V, Σ, S, R)

54 Algorithm (cont.)

2. The equivalent NFA is (Q, Σ, δ, q0, F) where:
3. Q = V and q0 = S
4. For each rule "A → cB" ∈ R add δ(A, c) = B (an edge labeled c from A to B)
5. For each rule "A → B" ∈ R add δ(A, ε) = B (an edge labeled ε from A to B)

55 Algorithm (cont.)

6. For each rule "A → c" ∈ R, c ∈ Σ_ε:
a. Add a new state f to Q: Q = Q ∪ {f}
b. Add δ(A, c) = f (an edge labeled c from A to f)
c. Add F = F ∪ {f}
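Steps 2 to 6 can be sketched in Python, together with a small simulator to try the result. This is a minimal sketch with assumptions: the grammar is already in simple regular form, variables are single capital letters, ε is the empty string, and the names `grammar_to_nfa` and `accepts` are illustrative. As a simplification, one shared accepting state `f` is used instead of a fresh state per rule; since `f` has no outgoing transitions, the accepted language is the same.

```python
def grammar_to_nfa(rules, start="S"):
    """Build an NFA from a simple regular grammar.

    rules maps a variable to right-hand sides of the form "cB", "B",
    "c" or "" (ε). Returns (delta, start_state, accepting_states).
    """
    delta, final = {}, "f"                     # one shared accepting state
    for var, alternatives in rules.items():
        for rhs in alternatives:
            if rhs and rhs[-1] in rules:       # A → cB or A → B
                symbol, target = rhs[:-1], rhs[-1]
            else:                              # A → c or A → ε
                symbol, target = rhs, final
            delta.setdefault((var, symbol), set()).add(target)
    return delta, start, {final}

def accepts(nfa, word):
    """Simulate the NFA on `word`, following ε-moves (symbol "")."""
    delta, start, accepting = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for nxt in delta.get((stack.pop(), ""), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    current = closure({start})
    for ch in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, ch), set())
        current = closure(moved)
    return bool(current & accepting)
```

With the simple regular grammar S → 0S | 1B, B → 1A, A → 1A | 0, the resulting NFA accepts 110 and 00110 but rejects 10, because the grammar requires at least two 1s before the final 0.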

56 Example:

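Input:
S → 0S | 11A
A → 1A | 0

New grammar:
S → 0S | 1B
B → 1A
A → 1A | 0

Resulting NFA:
[Figure: states S (start), B, A and F (accepting), with transitions S --0--> S, S --1--> B, B --1--> A, A --1--> A, A --0--> F]

57 The Regular Grammar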

• Conclusion: The regular languages are a proper subset of the context-free languages.

Regular Languages

58 59 Today’s Topics:

• Context Free Closures: - Union - Concatenation - Kleene Star - CFL Substitution - Intersection with RL

• Chomsky Normal Forms

60 Operations over Grammars

Corollary 1: The context-free languages are closed under the following operations:
1. Union
2. Concatenation
3. Kleene star
4. Intersection with RL
5. CFL substitution

Corollary 2: The CFL are not closed under complement or under intersection.

61 Union

Proposition: The CFL are closed under union.

Given

• L1 with G1 = (V1, Σ1, S1, R1) and

• L2 with G2 = (V2, Σ2, S2, R2)

such that V1 ∩ V2 = ∅, we construct their union by merging their grammars:

G∪ = (V1 ∪ V2 ∪ {S}, Σ1 ∪ Σ2, S, R1 ∪ R2 ∪ {S → S1 | S2})

Proof idea: The rule S → S1 | S2 enables a string w to be derived either from S1 or from S2.

62 Example

Language Grammar

L1 = {a^n b^n}             S1 → aS1b | ε
L2 = {ww^R}                S2 → aS2a | bS2b | ε
Union:
L = {a^n b^n} ∪ {ww^R}     S → S1 | S2

63 Concatenation

Prop. The CFL are closed under concatenation.

Given

• L1 with G1 = (V1, Σ1, S1, R1) and

• L2 with G2 = (V2, Σ2, S2, R2)

such that V1 ∩ V2 = ∅, we construct their concatenation:

Gcon = (V1 ∪ V2 ∪ {S}, Σ1 ∪ Σ2, S, R1 ∪ R2 ∪ {S → S1S2})

Proof idea: The rule S → S1S2 enables the creation of a string w=uv where u can be derived from S1 and v from S2.

64 Example

Language Grammar

L1 = {a^n b^n}             S1 → aS1b | ε
L2 = {ww^R}                S2 → aS2a | bS2b | ε
Concatenation:
L = {a^n b^n}{ww^R}        S → S1S2

65 Kleene star

Proposition: The CFL are closed under Kleene Star.

Given a language L1 and its grammar G1 = (V1, Σ1, S1, R1),

then G* is the grammar for L1*:

G* = (V1 ∪ {S}, Σ1, S, R1 ∪ {S → S1S | ε})

Proof idea:

• The rule S → S1S means that a word w in L(G*) is built of two parts w=uv such that u is derived from S1 and v is derived from S.
• The rule S → ε ends the derivation from S; applied first, it derives the empty string ε.

66 Example

Language                  Grammar
L = {a^n b^n}             S1 → aS1b | ε
Star operation:
L = {a^n b^n}*            S → S1S | ε
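The three constructions (union, concatenation, Kleene star) only add one rule for a new start variable, which a short Python sketch makes concrete. The encoding is an assumption, not from the slides: a grammar is a dict from a variable name to a list of right-hand sides, each a list of symbols, with `[]` for ε. The helper `words`, which lists short generated words by leftmost derivation, is illustrative; it terminates for the example grammars here, where every form has at most two variables.

```python
def union(g1, s1, g2, s2, start="S"):
    """Rules for L(g1) ∪ L(g2): add S → S1 | S2."""
    return {**g1, **g2, start: [[s1], [s2]]}

def concat(g1, s1, g2, s2, start="S"):
    """Rules for L(g1)L(g2): add S → S1 S2."""
    return {**g1, **g2, start: [[s1, s2]]}

def star(g1, s1, start="S"):
    """Rules for L(g1)*: add S → S1 S | ε."""
    return {**g1, start: [[s1, start], []]}

def words(rules, start, max_len):
    """Set of generated words of length <= max_len (leftmost derivations)."""
    found, seen, todo = set(), {(start,)}, [(start,)]
    while todo:
        form = todo.pop()
        pos = next((i for i, s in enumerate(form) if s in rules), None)
        if pos is None:                       # only terminals are left
            found.add("".join(form))
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + tuple(rhs) + form[pos + 1:]
            n_terminals = sum(s not in rules for s in new)
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return found

# The example grammars: L1 = {a^n b^n} and L2 = {w w^R}.
G1 = {"S1": [["a", "S1", "b"], []]}
G2 = {"S2": [["a", "S2", "a"], ["b", "S2", "b"], []]}
```

For instance, `words(star(G1, "S1"), "S", 4)` contains abab, which is in neither L1 nor L2, and `words(concat(G1, "S1", G2, "S2"), "S", 4)` contains abaa. The constructions assume, as on the slides, that the variable sets are disjoint and that S is a fresh name.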

67 More closure properties on CFL

Corollary 1: CFL are not closed under intersection.
Corollary 2: CFL are not closed under complement.
Corollary 3: CFL are not closed under difference.
Corollary 4: CFL are closed under intersection with regular languages.

Proof: in class. (Are languages with regular grammars closed under complement or intersection?)

68 Even More Closure Properties: Substitution

• Substitution means that we associate each terminal σ ∈ Σ with a language L_σ = h(σ).
• Given a word w = σ_1σ_2…σ_n, the substitution is

h(w) = {w_1w_2…w_n | w_i ∈ L_σi}

• For a language L we define h(L) as the union of h(w) over all words of L:

h(L) = ⋃ {h(w) | w ∈ L}

• Regular Substitution: every L_σ is a RL

• Context-Free Substitution: every L_σ is a CFL

69 Substitution Examples

Example for Regular Substitution:
• L = ab* and let h(a) = (a+c)*, h(b) = 001
• Then h(L) = (a+c)*(001)*
– If w = ab ∈ L, then {a001, c001, aa001, ca001} ⊂ h(w)

Example for Context-Free Substitution:
L = {0^n 1^n | n ≥ 0}, h(0) = {a^n 0 b^n | n ≥ 0}, h(1) = {1}
• G: S → 0S1 | ε
• h(0): S0 → aS0b | 0 and h(1): 1

The resulting grammar h(G):
• S → S0 S 1 | ε
• S0 → aS0b | 0

70 Regular Substitution

Proposition 1: The RL are closed under regular substitution.

Proof:

Let L be generated by an RE s and ∀a ∈ Σ let L_a be generated by an RE r_a.
• For each a ∈ Σ substitute each occurrence of a in s with r_a.
• The resulting expression is a legitimate RE.

71 CF Substitution

Proposition 2: The CFL are closed under CF substitution. (What about regular substitution?)

Proof: Let L be generated by a CFG G = (V, Σ, S, R), and ∀a ∈ Σ let L_a be generated by G_a = (V_a, Σ_a, S_a, R_a).
• In each production in R, and ∀a ∈ Σ, replace the symbol a with S_a.
• Add V_a, Σ_a and R_a to grammar G (for each a ∈ Σ).
• The resulting grammar is a CFG.

72 Exercises

1. L = {a^n b^n | n ≠ 100, n ≥ 0}. Prove that L is CFL.

2. L = {w ∈ {a,b,c}* | #a(w) = #b(w) = #c(w)}. Prove that L is not CFL.

3. Let w = a_1a_2…a_n and define odd(w) = a_1a_3… (the letters of w at the odd positions). For a language L define odd(L) = {odd(w) | w ∈ L}. Prove that if L is CFL then odd(L) is CFL.

4. Let A ⋄ B = {xy | x ∈ A, y ∈ B, |x| = |y|, x,y ∈ {0,1}*}. Prove that if A and B are RL then A ⋄ B is CFL.

Proofs: in class.

73 Simplified Grammars

A simplified form of a grammar is a grammar that
1. doesn't have ε-rules (only S → ε is permitted)
2. doesn't have unit rules.

• An ε-rule is a rule of the form: A → ε.
• A unit rule is a rule of the form: A → B.

Theorem: Every context-free grammar can be rewritten in a simplified form.

74 Algorithm to convert CFG to simplified grammar

Step 1: removing ε-rules (in the following)
• A CFL that does not contain ε can be written without ε-rules.
• But, if ε ∈ L then
– remove ε from L.
– Build a simplified form CFG without ε-rules.
– Add a new start variable S' and the rule S' → S | ε.
Step 2: removing unit rules (in the following)
• Any CFG can be rewritten without unit rules.

75 Step 1: Removing ε-rules

1. Find an ε-rule A → ε (A ≠ S) and remove it from R.
2. For each rule in R of the form B → αAβ where α,β ∈ (V ∪ Σ)*, add to R the rule B → αβ.
– Note: We do so for each occurrence of A, e.g. for B → αAβAγ we add B → αβAγ | αAβγ | αβγ.
– Note: For a rule B → A, we add a new rule B → ε unless this rule has already been removed through this process.
3. Repeat from step 1 until we eliminate all ε-rules.
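The removal of ε-rules can be sketched in Python. Note that this sketch swaps in a different but equivalent technique: instead of removing one ε-rule at a time, it first computes the set of nullable variables (those that can derive ε) and then adds, for every rule, each variant obtained by dropping nullable occurrences. The encoding is an assumption: single-character symbols, right-hand sides as strings, ε as the empty string. The function name is illustrative, and the start variable is assumed not to occur on any right-hand side.

```python
from itertools import product

def remove_epsilon_rules(rules, start="S"):
    """Return an equivalent grammar whose only ε-rule, if any, is start → ε."""
    # 1. Nullable variables: A is nullable if some right-hand side of A
    #    consists of nullable symbols only (the empty string qualifies).
    nullable, changed = set(), True
    while changed:
        changed = False
        for var, alternatives in rules.items():
            if var in nullable:
                continue
            if any(all(s in nullable for s in rhs) for rhs in alternatives):
                nullable.add(var)
                changed = True
    # 2. For each rule, keep or drop every nullable occurrence.
    result = {}
    for var, alternatives in rules.items():
        variants = set()
        for rhs in alternatives:
            options = [(s, "") if s in nullable else (s,) for s in rhs]
            for choice in product(*options):
                variants.add("".join(choice))
        variants.discard("")                   # no ε-rules ...
        if var == start and start in nullable:
            variants.add("")                   # ... except start → ε
        result[var] = sorted(variants)
    return result
```

Applied to a grammar such as S → aAB, A → aA | B, B → bB | ε, the nullable set is {A, B}, so S gains the variants aA, aB and a. The output may still contain unit rules (here A → B), which the next step removes.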

76 Step 2: Removing Unit Rules

1. For each unit rule A → B, remove this rule from R and add all productions of B to A: for each B → α in R add the rule A → α.
2. Repeat step 1 until all unit rules are removed.
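Unit-rule removal can be sketched in Python as well. Instead of repeating step 1 rule by rule, this sketch computes for every variable the set of variables reachable through unit rules only, then copies their non-unit productions; this variant also copes with cycles such as A → B, B → A. The encoding is an assumption: single-character symbols and right-hand sides as strings, so a unit rule is a right-hand side that is exactly one variable. The function name is illustrative.

```python
def remove_unit_rules(rules):
    """Return an equivalent grammar without rules of the form A → B."""
    result = {}
    for var in rules:
        # Variables reachable from `var` using unit rules only.
        reachable, stack = {var}, [var]
        while stack:
            for rhs in rules[stack.pop()]:
                if rhs in rules and rhs not in reachable:
                    reachable.add(rhs)
                    stack.append(rhs)
        # Copy every non-unit production of the reachable variables.
        result[var] = sorted({rhs
                              for b in reachable
                              for rhs in rules[b]
                              if rhs not in rules})
    return result
```

For a chain such as X → Y | c, Y → Z | c, Z → cZ | d, the variable X inherits cZ and d from Z in addition to its own c.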

77 Removing ε-rules:

Example 1:
S → aBBAc
A → ε
B → ε

Example 2:
S → aAB
A → aA | B
B → bB | ε

Example 3:
S → aBaC
B → bB | C
C → cC | ε

Removing unit-rules:

Example 1:
S → A | b
A → B | b
B → bB | a

Solutions: In class

78

• A special form of context free grammar is called Chomsky Normal Form (CNF).

Definition: A CFG is in CNF if every rule is of the form A → BC or A → σ, where σ ∈ Σ, A,B,C ∈ V, and B,C ≠ S.
• If the language contains ε then the rule S → ε is allowed.

79 Noam Chomsky (from Wikipedia)

An American linguist, philosopher, cognitive scientist, political activist, author, and lecturer. He is an Institute Professor and professor emeritus of linguistics at the Massachusetts Institute of Technology. Chomsky is well known in the academic and scientific community as one of the fathers of modern linguistics. Since the 1960s, he has become known more widely as a political dissident, an anarchist, and a libertarian socialist intellectual. Chomsky is often viewed as a notable figure in contemporary philosophy.

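80 Chomsky Normal Form (cont.)

A grammar in Chomsky Normal Form has several properties and usages:
– Any derivation of a string of length n ≥ 1 takes exactly 2n-1 steps (n-1 rules of the form A → BC and n rules of the form A → σ).
– The parsing tree is a binary tree.

[Figure: a binary parsing tree with root S, inner nodes A, B, C, D and a leaf a]

81 Chomsky Normal Form (cont.)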

Theorem: Any context free language can be generated by a CNF grammar.

Converting CFG to CNF
1. Add a new start symbol S' and the rule S' → S to the CFG.
2. Remove all ε-rules.
3. Remove all unit rules.
4. (next slide)

82

4. Convert all remaining rules into a proper form:

– 4.1 Replace each terminal σ_i ∈ Σ in a rule whose right-hand side has two or more symbols with a new variable U_i, and add the rule U_i → σ_i to the CFG.

– 4.2 For each rule of the form A → B1B2..Bn where n>2, replace it with the two following rules: A → B1C1 and C1 → B2..Bn, where C1 is a new variable.

– 4.3 Repeat step 4 until all rules have the proper form (right-hand side of length ≤ 2).
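Steps 4.1 and 4.2 can be sketched in Python. This is a minimal sketch under assumptions that are not in the slides: the input grammar already has no ε-rules and no unit rules, right-hand sides are tuples of symbols, and the fresh variable names `U_a` (for a terminal a) and `X1`, `X2`, … (for the split rules) are hypothetical and assumed not to clash with existing variables. The illustrative grammar S → aSb | ab is likewise an assumption, chosen so as not to give away the in-class exercise.

```python
def to_cnf_step4(rules):
    """Apply steps 4.1 and 4.2 to a grammar given as {variable: [tuples]}."""
    variables = set(rules)
    result, counter = {}, 0
    for var, alternatives in rules.items():
        for rhs in alternatives:
            if len(rhs) >= 2:
                # 4.1: replace each terminal a by U_a and add U_a → a.
                replaced = []
                for s in rhs:
                    if s in variables:
                        replaced.append(s)
                    else:
                        result.setdefault("U_" + s, [(s,)])
                        replaced.append("U_" + s)
                rhs = tuple(replaced)
            # 4.2: split A → B1 B2 .. Bn (n > 2) into a chain of binary rules.
            head = var
            while len(rhs) > 2:
                counter += 1
                fresh = "X%d" % counter
                result.setdefault(head, []).append((rhs[0], fresh))
                head, rhs = fresh, rhs[1:]
            result.setdefault(head, []).append(rhs)
    return result

# Illustrative input: S → aSb | ab, i.e. {a^n b^n | n >= 1}.
CNF = to_cnf_step4({"S": [("a", "S", "b"), ("a", "b")]})
```

The loop in 4.2 plays the role of step 4.3: it keeps splitting until every right-hand side has length at most 2.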

83 Example

Write the following grammar in CNF.

S → A | 0B0
A → S | 1
B → A | 0

Solution: in class

84 Chomsky Hierarchy

The Chomsky hierarchy consists of 4 types of grammars:
1. Regular (type 3)
2. Context-free (type 2)
3. Context-sensitive (type 1)
4. Recursively enumerable (type 0)

85 Chomsky Hierarchy (cont.)

Regular grammars:
– Restricted to rules of the form: S → σ or S → σB, where σ ∈ Σ and S,B ∈ V (different from our definition, which allows w ∈ Σ*)

• Generate the regular languages.
• Can be decided by a Finite Automaton (FA).

86 Chomsky Hierarchy (cont.)

Context-free grammars:
– Restricted to rules of the form: A → α, where A ∈ V and α ∈ (V ∪ Σ)*

• Generate the context-free languages.
• Can be decided by a pushdown automaton (PDA).

87 Chomsky Hierarchy (cont.)

Context-sensitive grammars:
– Restricted to rules of the form: αAβ → αγβ, where A ∈ V and α, β, γ ∈ (V ∪ Σ)*

• Generate the context-sensitive languages.
• Can be decided by a linear-bounded nondeterministic Turing machine (BTM).

88 Example: L = {a^n b^n | n ≥ 1}

S → aSBC | abC
CB → BC
bB → bb
bC → b

[Figure: derivation of aabb] S ⇒ aSBC ⇒ aabCBC ⇒ aabBC ⇒ aabbC ⇒ aabb

89 Chomsky Hierarchy (cont.)

Recursively enumerable grammars:
– No restrictions on rules

• Generate the recursively enumerable languages.
• Can be recognized (accepted, but not necessarily decided) by a Turing machine (TM).

90 End Of Unit 7

91 Examples of regular grammars

Construct a regular grammar for the following regular expressions:

L1 = 0*
Regular grammar:
S → 0S | ε

L2 = (0+1)⁺
Regular grammar:
S → 0S | 1S | 0 | 1

L3 = 0*+1*
Regular grammar:
S → A | B
A → 0A | ε
B → 1B | ε

L4 = (01)⁺
Regular grammar:
S → 01S | 01

92