
Chapter 3 Context-Free Grammars

Context-Free Grammars and Languages

• Defn. 3.1.1. A context-free grammar is a quadruple (V, ∑, P, S), where

  - V is a finite set of variables (nonterminals)

  - ∑, the alphabet, is a finite set of terminal symbols, disjoint from V

  - P is a finite set of rules, i.e., a finite subset of V × (V ∪ ∑)*; a rule (A, w) is written A → w, and

  - S ∈ V is the start symbol

• A production rule of the form A → w, where w ∈ (V ∪ ∑)*, applied to the string uAv yields uwv; u and v define the context in which A occurs.

  - Because the context places no limitations on the applicability of a rule, such a grammar is called a context-free grammar (CFG).
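To make the quadruple concrete, here is a minimal Python sketch. It assumes every symbol is a one-character string and that '' stands for λ; the class name `CFG` and the method `step` are illustrative, not part of the definition. It checks the side conditions of Defn. 3.1.1 and applies one rule A → w to uAv to obtain uwv.

```python
from dataclasses import dataclass

# Sketch: names are illustrative; every symbol is a single character, '' is λ.

@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V
    terminals: frozenset   # ∑, the alphabet
    rules: tuple           # P: pairs (A, w), A in V, w a string over V ∪ ∑
    start: str             # S

    def __post_init__(self):
        # Check the side conditions of Defn. 3.1.1.
        if self.variables & self.terminals:
            raise ValueError("V and the alphabet must be disjoint")
        if self.start not in self.variables:
            raise ValueError("the start symbol must be a variable")
        symbols = self.variables | self.terminals
        for lhs, rhs in self.rules:
            if lhs not in self.variables or any(x not in symbols for x in rhs):
                raise ValueError(f"bad rule {lhs} -> {rhs!r}")

    def step(self, form: str, pos: int, rule: tuple) -> str:
        """Apply rule A -> w to the A at index pos of uAv, yielding uwv."""
        lhs, rhs = rule
        if rule not in self.rules or form[pos] != lhs:
            raise ValueError("rule not in P, or its left-hand side is not at pos")
        return form[:pos] + rhs + form[pos + 1:]

# S -> aSb | λ
g = CFG(frozenset("S"), frozenset("ab"), (("S", "aSb"), ("S", "")), "S")
print(g.step("aSb", 1, ("S", "aSb")))   # aaSbb
print(g.step("aaSbb", 2, ("S", "")))    # aabb
```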


• Defn. 3.1.2. Let G = (V, ∑, P, S) be a CFG and v ∈ (V ∪ ∑)*. The set of strings derivable from v is defined recursively as follows:

i) Basis: v is derivable from v.
ii) Recursion: If u = xAy is derivable from v and A → w ∈ P, then xwy is derivable from v.
iii) Closure: Precisely the strings constructed from v by a finite number of applications of (ii) are derivable from v.

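• The derivability of w ∈ (V ∪ ∑)* from v ∈ (V ∪ ∑)* is denoted $v \Rightarrow^* w$; related notations are $v \Rightarrow^+ w$ (one or more steps), $v \Rightarrow^n w$ (exactly n steps), and $v \Rightarrow^*_G w$ (derivation in the grammar G).

• The language of the grammar G is the set of terminal strings derivable from the start symbol of G.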

CFG and Languages

• Defn. 3.1.3. Let G = (V, ∑, P, S) be a CFG.
(i) A string w ∈ (V ∪ ∑)* is a sentential form of G if $S \Rightarrow^*_G w$.
(ii) A string w ∈ ∑* is a sentence of G if $S \Rightarrow^*_G w$.
(iii) The language of G, denoted L(G), is the set $\{ w \in \Sigma^* \mid S \Rightarrow^*_G w \}$.

  - A set of strings L over an alphabet ∑ is called a context-free language (CFL) if there is a CFG G that generates it, i.e., L = L(G).
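L(G) is usually infinite, so a program can only list the members reachable within a bounded number of steps. The hypothetical helper `language_upto` below is a sketch that explores leftmost derivations breadth-first (by Theorem 3.5.1, every string of L(G) has a leftmost derivation). It assumes variables are single uppercase letters given as keys of `rules`, every other character is a terminal, and '' stands for λ.

```python
from collections import deque

# Sketch: `language_upto` is a hypothetical helper, not a standard API.

def language_upto(rules: dict, start: str, max_steps: int) -> set:
    """Every w in L(G) with a leftmost derivation S =>* w of at most max_steps steps."""
    found = set()
    seen = {start}
    queue = deque([(start, 0)])          # (sentential form, steps used so far)
    while queue:
        form, steps = queue.popleft()
        # position of the leftmost variable, or -1 if form is already a sentence
        pos = next((i for i, x in enumerate(form) if x in rules), -1)
        if pos == -1:
            found.add(form)              # a sentence of G, i.e., a member of L(G)
        elif steps < max_steps:
            for rhs in rules[form[pos]]:
                nxt = form[:pos] + rhs + form[pos + 1:]
                if nxt not in seen:      # BFS reaches each form first by a shortest derivation
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
    return found

# S -> aSb | λ generates { a^n b^n | n >= 0 }
print(sorted(language_upto({"S": ["aSb", ""]}, "S", 4), key=len))
# ['', 'ab', 'aabb', 'aaabbb']
```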

• Leftmost (rightmost) derivation: a derivation in which every step rewrites the first variable occurring in the string when it is read from left to right (right to left).

e.g., Fig. 3.1(a) and (b) exhibit leftmost derivations, whereas Fig. 3.1(c) shows a rightmost derivation.

• The derivation of a string can be graphically depicted by a derivation tree.

• Design CFGs for the following languages:

(i) The set $\{ 0^n1^n \mid n \ge 0 \}$.

(ii) The set $\{ a^ib^jc^k \mid i \ne j \text{ or } j \ne k \}$, i.e., the set of strings of a's followed by b's followed by c's such that the number of a's differs from the number of b's, or the number of b's differs from the number of c's, or both.

• Given the following grammar:

S → A1B
A → 0A | λ
B → 0B | 1B | λ

give the leftmost and rightmost derivations of the string 00101.
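Answers to this exercise can be checked by brute-force search. The sketch below uses a hypothetical helper `find_derivation` (single-character symbols, '' for λ): a breadth-first search that always rewrites the leftmost, or the rightmost, variable and prunes forms with more terminals than the target, since terminals are never rewritten.

```python
from collections import deque

# Sketch: `find_derivation` is a hypothetical helper; '' stands for λ.
RULES = {"S": ["A1B"], "A": ["0A", ""], "B": ["0B", "1B", ""]}

def find_derivation(rules, start, target, leftmost=True, max_steps=50):
    """BFS for a leftmost (or rightmost) derivation; returns the sentential forms or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        positions = [i for i, x in enumerate(form) if x in rules]
        if not positions:
            if form == target:
                return path
            continue
        if len(path) > max_steps:
            continue
        pos = positions[0] if leftmost else positions[-1]   # leftmost or rightmost variable
        for rhs in rules[form[pos]]:
            nxt = form[:pos] + rhs + form[pos + 1:]
            # terminals are never rewritten, so too many terminals is a dead end
            if sum(x not in rules for x in nxt) <= len(target) and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(" => ".join(find_derivation(RULES, "S", "00101")))
print(" => ".join(find_derivation(RULES, "S", "00101", leftmost=False)))
```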


• Defn. 3.1.4. Let G = (V, ∑, P, S) be a CFG and $S \Rightarrow^*_G w$ a derivation. The derivation tree, DT, of $S \Rightarrow^*_G w$ is an ordered tree that can be built iteratively as follows:

(i) Initialize DT with root S.

(ii) If A → x1 ... xn, where xi ∈ (V ∪ ∑), is a rule in the derivation applied to uAv, then add x1, ..., xn as the children of A in DT.

(iii) If A → λ is a rule in the derivation applied to uAv, then add λ as the only child of A in DT.

e.g., Fig. 3.2 gives the DT for the derivation in Fig. 3.1(a), S ⇒ AA ⇒ aA ⇒ aAAA ⇒ abAAA ⇒ abaAA ⇒ ababAA ⇒ ababaA ⇒ ababaa; see Fig. 3.3 for Fig. 3.1(a)...(d).
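The construction of Defn. 3.1.4 can be replayed mechanically. In a leftmost derivation the next rule always rewrites the leftmost unexpanded variable, which is the order of a left-to-right depth-first traversal; the sketch below relies on that, so it accepts only the rule sequence of a leftmost derivation. The names `Node`, `build_tree`, and `frontier` are illustrative. It rebuilds the tree for the derivation of ababaa and reads its leaves back.

```python
from dataclasses import dataclass, field

# Sketch: names are illustrative; symbols are single characters, '' is λ.

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def build_tree(start, applied, variables):
    """Build the derivation tree of Defn. 3.1.4 from the rules of a leftmost derivation."""
    steps = iter(applied)

    def expand(symbol):
        node = Node(symbol)
        if symbol in variables:
            rule = next(steps, None)
            if rule is None or rule[0] != symbol:
                raise ValueError("not the rule sequence of a complete leftmost derivation")
            rhs = rule[1]
            # step (ii): children x1 ... xn; step (iii): a λ-rule gets the single child λ
            node.children = [expand(x) for x in rhs] if rhs else [Node("λ")]
        return node

    root = expand(start)                       # step (i): the root is S
    if next(steps, None) is not None:
        raise ValueError("rules left over after the tree is complete")
    return root

def frontier(node):
    """Read the leaves from left to right; λ leaves contribute nothing."""
    if not node.children:
        return "" if node.label == "λ" else node.label
    return "".join(frontier(c) for c in node.children)

# Rules of S => AA => aA => aAAA => abAAA => abaAA => ababAA => ababaA => ababaa
applied = [("S", "AA"), ("A", "a"), ("A", "AAA"), ("A", "bA"),
           ("A", "a"), ("A", "bA"), ("A", "a"), ("A", "a")]
tree = build_tree("S", applied, variables={"S", "A"})
print(frontier(tree))   # ababaa
```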

• Example. Let G be the CFG such that P = { S → zMNz, M → aMa | z, N → bNb | z },

which generates strings of the form $z a^n z a^n b^m z b^m z$, where n, m ≥ 0.

3.2 Examples of Context-Free Grammars (CFGs)

• Many CFGs are the union of simpler CFGs: the individual grammars, with start symbols S1, S2, ..., Sn and pairwise disjoint sets of variables, are combined by putting all their rules together and adding a new start symbol S with the rule

S → S1 | S2 | ... | Sn

• Example. Consider the language $\{ 0^n1^n \mid n \ge 0 \} \cup \{ 1^n0^n \mid n \ge 0 \}$.

Step 1. Construct the CFG for the language $\{ 0^n1^n \mid n \ge 0 \}$:

S1 → 0 S1 1 | λ

Step 2. Construct the CFG for the language $\{ 1^n0^n \mid n \ge 0 \}$:

S2 → 1 S2 0 | λ

Step 3. Construct the CFG for the language $\{ 0^n1^n \mid n \ge 0 \} \cup \{ 1^n0^n \mid n \ge 0 \}$:

S → S1 | S2
S1 → 0 S1 1 | λ
S2 → 1 S2 0 | λ
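The union construction can be written as a small function. This is a sketch under the stated assumption of disjoint variable sets; right-hand sides are tuples so that multi-character variable names such as S1 and S2 work, and `union_grammar` and `language_upto` are hypothetical helpers. Enumerating the derivations of at most four steps recovers the short strings of both languages.

```python
from collections import deque

# Sketch: hypothetical helpers; right-hand sides are tuples of symbols, () is λ.

def union_grammar(g1, s1, g2, s2, start="S"):
    """Grammar for L(G1) ∪ L(G2): pool the rules and add start -> s1 | s2."""
    # Assumption: disjoint variables and a fresh start symbol, else rules could interfere.
    if set(g1) & set(g2) or start in g1 or start in g2:
        raise ValueError("variables must be disjoint and the start symbol fresh")
    return {start: [(s1,), (s2,)], **g1, **g2}

def language_upto(g, start, max_steps):
    """Terminal strings with a leftmost derivation of at most max_steps steps."""
    found, seen = set(), {(start,)}
    queue = deque([((start,), 0)])
    while queue:
        form, steps = queue.popleft()
        pos = next((i for i, x in enumerate(form) if x in g), None)
        if pos is None:
            found.add("".join(form))
        elif steps < max_steps:
            for rhs in g[form[pos]]:
                nxt = form[:pos] + rhs + form[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
    return found

G1 = {"S1": [("0", "S1", "1"), ()]}   # S1 -> 0 S1 1 | λ
G2 = {"S2": [("1", "S2", "0"), ()]}   # S2 -> 1 S2 0 | λ
G = union_grammar(G1, "S1", G2, "S2")
print(sorted(language_upto(G, "S", 4)))   # ['', '0011', '01', '10', '1100']
```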

• Example. Consider the following grammar: S → aSa | bSb | a | b | λ, where the rules S → aSa | bSb capture the recursive generation process and S → a | b | λ terminate it; the grammar generates the set of palindromes over {a, b}.
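A quick sanity check, offered as evidence for small lengths rather than a proof: a palindrome of length l needs l // 2 + 1 rule applications, so derivations of at most 3 steps should produce exactly the palindromes of length at most 5. `language_upto` is a hypothetical helper assuming single-character symbols and '' for λ.

```python
from collections import deque
from itertools import product

# Sketch: `language_upto` is a hypothetical helper; '' stands for λ.
RULES = {"S": ["aSa", "bSb", "a", "b", ""]}

def language_upto(rules, start, max_steps):
    """Terminal strings with a leftmost derivation of at most max_steps steps."""
    found, seen = set(), {start}
    queue = deque([(start, 0)])
    while queue:
        form, steps = queue.popleft()
        pos = next((i for i, x in enumerate(form) if x in rules), -1)
        if pos == -1:
            found.add(form)
        elif steps < max_steps:
            for rhs in rules[form[pos]]:
                nxt = form[:pos] + rhs + form[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
    return found

# A palindrome of length l needs l // 2 + 1 steps, so 3 steps give those of length <= 5.
generated = language_upto(RULES, "S", 3)
palindromes = {"".join(p) for n in range(6) for p in product("ab", repeat=n)
               if p == p[::-1]}
print(generated == palindromes)   # True
```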

• Example. Consider a CFG that generates the language of strings with an even number of a's and an even number of b's:
S → aB | bA | λ   {S: even a's and even b's}
A → aC | bS       {A: even a's and odd b's}
B → aS | bC       {B: odd a's and even b's}
C → aA | bB       {C: odd a's and odd b's}
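The grammar behaves like a finite automaton whose four states are the variables. The sketch below compares all strings derivable in at most 7 steps with the strings of length at most 6 that have even counts of a's and b's; since every rule except S → λ adds exactly one terminal, a string of length l needs exactly l + 1 steps. The helper name `language_upto` is hypothetical.

```python
from collections import deque
from itertools import product

# Sketch: `language_upto` is a hypothetical helper; '' stands for λ.
# S: even a's, even b's; A: even a's, odd b's; B: odd a's, even b's; C: odd a's, odd b's
EVEN_EVEN = {"S": ["aB", "bA", ""], "A": ["aC", "bS"], "B": ["aS", "bC"], "C": ["aA", "bB"]}

def language_upto(rules, start, max_steps):
    """Terminal strings with a leftmost derivation of at most max_steps steps."""
    found, seen = set(), {start}
    queue = deque([(start, 0)])
    while queue:
        form, steps = queue.popleft()
        pos = next((i for i, x in enumerate(form) if x in rules), -1)
        if pos == -1:
            found.add(form)
        elif steps < max_steps:
            for rhs in rules[form[pos]]:
                nxt = form[:pos] + rhs + form[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
    return found

generated = language_upto(EVEN_EVEN, "S", 7)
expected = {"".join(p) for n in range(7) for p in product("ab", repeat=n)
            if p.count("a") % 2 == 0 and p.count("b") % 2 == 0}
print(generated == expected)   # True
```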

• Example. Same as above, except odd a's and odd b's:
S → aB | bA
A → aC | bS
B → aS | bC
C → aA | bB | λ

4.5 Chomsky Normal Form

• A simplified normal form that restricts the length and composition of the right-hand side (R.H.S.) of a rule in a CFG

• Defn. 4.5.1. A CFG G = (V, ∑, P, S) is in Chomsky normal form if each rule in G has one of the following forms:
i) A → BC
ii) A → a
iii) S → λ

where A, B, C ∈ V, B, C ∈ V - {S}, and a ∈ ∑

• The derivation tree for a string generated by a CFG in Chomsky normal form is a binary tree.
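A direct transcription of Defn. 4.5.1 as a checker. It assumes single-character symbols, variables given by the keys of the rule dictionary, and '' for λ; `is_cnf` is an illustrative name.

```python
# Sketch: `is_cnf` is an illustrative name; '' stands for λ.

def is_cnf(rules, start, terminals):
    """Check Defn. 4.5.1: each rule is A -> BC (B, C != S), A -> a, or S -> λ."""
    variables = set(rules)
    for lhs, rhss in rules.items():
        for rhs in rhss:
            if len(rhs) == 2 and all(x in variables and x != start for x in rhs):
                continue                      # form i)   A -> BC
            if len(rhs) == 1 and rhs in terminals:
                continue                      # form ii)  A -> a
            if rhs == "" and lhs == start:
                continue                      # form iii) S -> λ
            return False
    return True

print(is_cnf({"S": ["AB", ""], "A": ["AB", "a"], "B": ["b"]}, "S", {"a", "b"}))  # True
print(is_cnf({"S": ["aSb", ""]}, "S", {"a", "b"}))                               # False
print(is_cnf({"S": ["SS", "a"]}, "S", {"a"}))                                    # False: S on a R.H.S.
```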


• Theorem 4.5.2. Let G = (V, ∑, P, S) be a CFG. There is an algorithm to construct a grammar G' = (V', ∑', P', S') in Chomsky normal form that is equivalent to G.
Proof (sketch): Assume G has first been transformed so that its start symbol is nonrecursive and it has no λ-rules other than S → λ and no chain rules A → B.
(i) For each rule A → w, where |w| > 1, replace each terminal symbol a in w by a distinct new variable Y and create the new rule Y → a.
(ii) After step (i), the R.H.S. w of every rule X → w other than S → λ is either a terminal or a string in V+. Rules of the latter form with |w| > 2 must be broken into a sequence of rules, each of whose R.H.S. consists of two variables.

  - Example 4.5.1
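Steps (i) and (ii) of the proof sketch can be coded as follows. This is a sketch, not a full conversion: it assumes G has no λ-rules other than S → λ, no chain rules A → B, and no start symbol on any right-hand side, and the fresh variable names T_a (step (i)) and R1, R2, ... (step (ii)) are hypothetical choices that must not clash with existing variables. The example grammar is made up for illustration.

```python
# Sketch: `to_cnf_steps` and the names T_a, R1, R2, ... are hypothetical.
# Right-hand sides are tuples of symbols; () is λ.

def to_cnf_steps(rules, terminals):
    """Apply steps (i) and (ii) of the proof sketch of Theorem 4.5.2."""
    new = {}
    term_var = {}    # terminal a -> its variable T_a (step (i))
    fresh = 0

    def add(lhs, rhs):
        rhss = new.setdefault(lhs, [])
        if rhs not in rhss:
            rhss.append(rhs)

    for lhs, rhss in rules.items():
        for rhs in rhss:
            if len(rhs) <= 1:            # A -> a or S -> λ: already allowed
                add(lhs, rhs)
                continue
            body = []
            for x in rhs:                # step (i): replace terminals by variables
                if x in terminals:
                    if x not in term_var:
                        term_var[x] = f"T_{x}"
                        add(term_var[x], (x,))
                    x = term_var[x]
                body.append(x)
            current = lhs                # step (ii): split into two-variable rules
            while len(body) > 2:
                fresh += 1
                add(current, (body[0], f"R{fresh}"))
                current, body = f"R{fresh}", body[1:]
            add(current, tuple(body))
    return new

G = {
    "S": [("a", "A", "B", "C"), ("a",)],
    "A": [("a", "A"), ("a",)],
    "B": [("b", "c", "B"), ("b", "c")],
    "C": [("c", "C"), ("c",)],
}
cnf = to_cnf_steps(G, terminals={"a", "b", "c"})
for lhs, rhss in cnf.items():
    print(lhs, "->", " | ".join(" ".join(rhs) for rhs in rhss))
```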

• One application of CFGs in Chomsky normal form: constructing binary search trees to accomplish "optimal" time and space search complexity for an input string

3.5 Leftmost Derivations and Ambiguity

• Theorem 3.5.1. Let G = (V, ∑, P, S) be a CFG. A string w is in L(G) iff there is a leftmost derivation of w from S.
Proof. Clearly, if there is a leftmost derivation of w from S, then w ∈ L(G). Conversely, every string w ∈ L(G) is derivable in a leftmost manner, i.e., by a leftmost derivation $S \Rightarrow^* w$: if a derivation of w contains a rule application that is not leftmost, the rule applications can be reordered so that they are all leftmost.

• Is there a unique leftmost derivation for every string in a CFL?

  - Answer: No. (Consider the two leftmost derivations in Fig. 3.1.)

  - The possibility of a string having several leftmost derivations introduces the notion of ambiguity.

  - Ambiguity increases the burden of debugging a program and should be avoided.

• Defn. 3.5.2. A CFG G is ambiguous if there is a string w ∈ L(G) that can be derived by two distinct leftmost derivations. A grammar that is not ambiguous is called unambiguous.

• Example 3.5.1. The grammar G defined by S → aS | Sa | a is ambiguous, since there are two leftmost derivations of aa: S ⇒ aS ⇒ aa and S ⇒ Sa ⇒ aa. However, the grammar G' defined by S → aS | a is unambiguous.
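Ambiguity can be probed by counting leftmost derivations. The memoized sketch below (`count_leftmost` is a hypothetical helper) assumes single-character symbols and grammars in which every rule produces at least one terminal, as in G and G', so the search depth is bounded by the length of the target. It confirms two leftmost derivations of aa in G and one in G'.

```python
from functools import lru_cache

# Sketch: `count_leftmost` is a hypothetical helper, not a standard API.

def count_leftmost(rules, start, target):
    """Count the distinct leftmost derivations of target from start.

    Assumes single-character symbols (variables are the keys of rules) and that
    every rule produces at least one terminal, so no search path is infinite.
    """
    @lru_cache(maxsize=None)
    def count(form):
        if sum(x not in rules for x in form) > len(target):
            return 0                                  # terminals are never erased
        pos = next((i for i, x in enumerate(form) if x in rules), -1)
        if pos == -1:
            return int(form == target)                # a sentence: is it the target?
        if form[:pos] != target[:pos]:
            return 0                                  # the terminal prefix is fixed
        return sum(count(form[:pos] + rhs + form[pos + 1:])
                   for rhs in rules[form[pos]])
    return count(start)

G = {"S": ["aS", "Sa", "a"]}        # S -> aS | Sa | a  (ambiguous)
G_prime = {"S": ["aS", "a"]}        # S -> aS | a       (unambiguous)
print(count_leftmost(G, "S", "aa"), count_leftmost(G_prime, "S", "aa"))  # 2 1
```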

• Unfortunately, there are some CFLs that cannot be generated by any unambiguous grammar. Such languages are called inherently ambiguous.

• A grammar is unambiguous if, at each leftmost-derivation step, there is only one rule that can lead to a derivation of the desired string.

• Example. An inherently ambiguous language: $L = \{ a^nb^nc^m \mid n, m \ge 0 \} \cup \{ a^nb^mc^m \mid n, m \ge 0 \}$

  - Every grammar that generates L is ambiguous.

  - Consider the following grammar of L:

S → S1 | S2
S1 → S1c | A,  A → aAb | λ
S2 → aS2 | B,  B → bBc | λ

  - The strings in $\{ a^nb^nc^n \mid n \ge 0 \}$ always have two different DTs, e.g.,

[Figure: two DTs of the same string; the left DT starts S → S1, S1 → S1 c, S1 → S1 c, ...; the right DT starts S → S2, S2 → a S2, S2 → a S2, ...]
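The two derivation trees can be confirmed by counting leftmost derivations. In the sketch below (a hypothetical helper; forms are tuples because of the names S1 and S2), the search prunes forms with too many terminals or a wrong terminal prefix. It terminates because no sequence of rules in this grammar can loop without producing a terminal, an assumption that does not hold for every CFG. Strings of the form $a^nb^nc^n$ should get 2, other strings of L should get 1.

```python
from functools import lru_cache

# Sketch: hypothetical helper; right-hand sides are tuples of symbols, () is λ.
# S -> S1 | S2,  S1 -> S1c | A,  A -> aAb | λ,  S2 -> aS2 | B,  B -> bBc | λ
RULES = {
    "S": [("S1",), ("S2",)],
    "S1": [("S1", "c"), ("A",)],
    "A": [("a", "A", "b"), ()],
    "S2": [("a", "S2"), ("B",)],
    "B": [("b", "B", "c"), ()],
}

def count_leftmost(rules, start, target):
    """Count distinct leftmost derivations of target.

    Assumes no sequence of rules can loop without producing a terminal
    (true for this grammar), which keeps the pruned search finite.
    """
    target = tuple(target)

    @lru_cache(maxsize=None)
    def count(form):
        if sum(x not in rules for x in form) > len(target):
            return 0                                  # too many terminals already
        pos = next((i for i, x in enumerate(form) if x in rules), None)
        if pos is None:
            return int(form == target)
        if form[:pos] != target[:pos]:
            return 0                                  # wrong terminal prefix
        return sum(count(form[:pos] + rhs + form[pos + 1:])
                   for rhs in rules[form[pos]])

    return count((start,))

for w in ["", "abc", "aabbcc", "aabbc", "abcc"]:
    print(repr(w), count_leftmost(RULES, "S", w))
```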

• Another example of an inherently ambiguous language: $L = \{ a^nb^nc^md^m \mid n, m > 0 \} \cup \{ a^nb^mc^md^n \mid n, m > 0 \}$

• The problem of determining whether an arbitrary context-free language is inherently ambiguous is recursively unsolvable.

  - i.e., there is no algorithm that determines whether an arbitrary context-free language is inherently ambiguous.

• Reference: "Ambiguity in context free languages," S. Ginsburg and J. Ullian, Journal of the ACM, 13(1): 62-89, January 1966.


• Example 3.5.2. The ambiguous grammar G,

S → bS | Sb | a

can be converted into the unambiguous grammar G1 or G2, where

G1: S → bS | aA,  A → bA | λ

G2: S → bS | A,  A → Ab | a
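Counting leftmost derivations of strings $b^iab^j$ shows the difference (a check on small strings, not a proof). The sketch assumes single-character symbols and that no derivation in these grammars loops without producing a terminal; `count_leftmost` is a hypothetical helper. In G, each order of applying S → bS i times and S → Sb j times gives a different leftmost derivation, so the count should be the binomial coefficient C(i+j, i); in G1 and G2 it should be 1.

```python
from functools import lru_cache
from math import comb

# Sketch: `count_leftmost` is a hypothetical helper; '' stands for λ.

def count_leftmost(rules, start, target):
    """Count distinct leftmost derivations of target (single-character symbols).

    Assumes no sequence of rules loops without producing a terminal
    (true for G, G1, and G2), so the pruned search is finite.
    """
    @lru_cache(maxsize=None)
    def count(form):
        if sum(x not in rules for x in form) > len(target):
            return 0                                  # terminals are never erased
        pos = next((i for i, x in enumerate(form) if x in rules), -1)
        if pos == -1:
            return int(form == target)
        if form[:pos] != target[:pos]:
            return 0                                  # the terminal prefix is fixed
        return sum(count(form[:pos] + rhs + form[pos + 1:])
                   for rhs in rules[form[pos]])
    return count(start)

G = {"S": ["bS", "Sb", "a"]}
G1 = {"S": ["bS", "aA"], "A": ["bA", ""]}
G2 = {"S": ["bS", "A"], "A": ["Ab", "a"]}

for i in range(3):
    for j in range(3):
        w = "b" * i + "a" + "b" * j
        counts = [count_leftmost(g, "S", w) for g in (G, G1, G2)]
        print(w, counts, comb(i + j, i))
```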

• Example 3.5.3. The following grammar G (from Example 3.2.4) is ambiguous:

S → aSb | aSbb | λ

since S ⇒ aSb ⇒ aaSbbb ⇒ aabbb and S ⇒ aSbb ⇒ aaSbbb ⇒ aabbb. It can be converted into an unambiguous grammar:

S → aSb | A | λ
A → aAbb | abb

3.4 CFG and CFL

• A terminal string w is in the language of a grammar G if w can be derived from the start symbol using the rules of G, i.e., w is derivable in G.

• To show L = L(G), prove both inclusions: L ⊆ L(G), by a derivation schema that derives every string in L with the rules of G, and L(G) ⊆ L, by induction on the length of the derivations in G.

• Example. Let G be

S → AASB | AAB
A → a
B → bbb

and let $L = \{ a^{2n}b^{3n} \mid n > 0 \}$.

(I) L ⊆ L(G):

Derivation | Applicable rule
$S \Rightarrow^{n-1} (AA)^{n-1} S B^{n-1}$ | S → AASB
$\Rightarrow (AA)^n B^n$ | S → AAB
$\Rightarrow^{2n} (aa)^n B^n$ | A → a
$\Rightarrow^{n} (aa)^n (bbb)^n = a^{2n}b^{3n}$ | B → bbb
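As a bounded check of part (I), not a substitute for the proof: every derivation of $a^{2n}b^{3n}$ uses exactly n + 2n + n = 4n rule applications, so enumerating leftmost derivations of at most 12 steps should give exactly the strings with n = 1, 2, 3. The helper `language_upto` is a hypothetical sketch assuming single-character symbols.

```python
from collections import deque

# Sketch: `language_upto` is a hypothetical helper.
RULES = {"S": ["AASB", "AAB"], "A": ["a"], "B": ["bbb"]}

def language_upto(rules, start, max_steps):
    """Sentences with a leftmost derivation of at most max_steps steps."""
    found, seen = set(), {start}
    queue = deque([(start, 0)])
    while queue:
        form, steps = queue.popleft()
        pos = next((i for i, x in enumerate(form) if x in rules), -1)
        if pos == -1:
            found.add(form)
        elif steps < max_steps:
            for rhs in rules[form[pos]]:
                nxt = form[:pos] + rhs + form[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
    return found

expected = {"a" * (2 * n) + "b" * (3 * n) for n in (1, 2, 3)}
print(language_upto(RULES, "S", 12) == expected)   # True
```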

• (II) L(G) ⊆ L: By induction on the number of rule applications, show that every sentential form u derived from S in one or more steps satisfies the following three conditions, where $\eta_x(u)$ denotes the number of occurrences of the symbol x in u:

i) $3(\eta_a(u) + \eta_A(u)) = 2(\eta_b(u) + 3\eta_B(u))$

ii) $\eta_A(u) + \eta_a(u) > 1$

iii) All a's and A's precede all b's and B's

  - Basis: n = 1, i.e., derivations of length one: S ⇒ AASB or S ⇒ AAB.

Condition (i) holds since $3(\eta_a(u) + \eta_A(u)) = 6$ and $2(\eta_b(u) + 3\eta_B(u)) = 6$; conditions (ii) and (iii) also hold.

  - Induction Hypothesis: Assume that conditions (i), (ii), and (iii) hold for every sentential form derived by a derivation of length n or less.

• (II) L(G) ⊆ L (continued):

  - Induction: Let w be a string derivable from S by a derivation of length n+1. The derivation $S \Rightarrow^{n+1} w$ can be written as $S \Rightarrow^{n} u \Rightarrow w$ with u ∈ (V ∪ ∑)*. Let $j(u) = 3(\eta_a(u) + \eta_A(u))$ and $k(u) = 2(\eta_b(u) + 3\eta_B(u))$; by the I.H., j(u) = k(u) and j(u)/3 > 1. Applying the rule used in u ⇒ w yields:

Applicable rule | j(w) | k(w) | j(w)/3
S → AASB | j(u) + 6 | k(u) + 6 | j(u)/3 + 2
S → AAB | j(u) + 6 | k(u) + 6 | j(u)/3 + 2
A → a | j(u) | k(u) | j(u)/3
B → bbb | j(u) | k(u) | j(u)/3

i) j(u) = k(u) ⇒ j(w) = k(w), since in every row the j(w) and k(w) columns change by the same amount
ii) j(u)/3 > 1 ⇒ j(w)/3 > 1
iii) The order is preserved, since either (a) S is replaced by an appropriately ordered string (AASB or AAB), or (b) a variable is replaced by the corresponding terminal(s)

Consequently, every terminal string w ∈ L(G) satisfies $3\eta_a(w) = 2\eta_b(w)$, $\eta_a(w) > 1$, and all a's precede all b's, so $w = a^{2n}b^{3n}$ for some n > 0. Hence, L(G) ⊆ L.
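The invariants can also be tested mechanically on all sentential forms reachable in a bounded number of steps, through all derivations and not only leftmost ones. The sketch below, with hypothetical helper names, computes η with `str.count` and checks conditions (i)-(iii) on every form derivable in 1 to 8 steps; it also collects the sentences reached, which should be $a^2b^3$ and $a^4b^6$.

```python
# Sketch: helper names are hypothetical; eta_x(u) is computed as u.count(x).
RULES = {"S": ["AASB", "AAB"], "A": ["a"], "B": ["bbb"]}

def successors(form):
    """Forms reachable in one step by rewriting any variable, not only the leftmost."""
    for i, x in enumerate(form):
        for rhs in RULES.get(x, []):
            yield form[:i] + rhs + form[i + 1:]

def conditions_hold(u):
    """Conditions (i)-(iii) of the proof."""
    eta = u.count
    cond_i = 3 * (eta("a") + eta("A")) == 2 * (eta("b") + 3 * eta("B"))
    cond_ii = eta("A") + eta("a") > 1
    letters = [x for x in u if x in "aAbB"]           # ignore S when checking the order
    first_b = next((k for k, x in enumerate(letters) if x in "bB"), len(letters))
    cond_iii = all(x in "bB" for x in letters[first_b:])
    return cond_i and cond_ii and cond_iii

def explore(max_steps):
    """Check (i)-(iii) on every form derivable in 1..max_steps steps; collect sentences."""
    level, violations, sentences = {"S"}, [], set()
    for _ in range(max_steps):
        level = {v for u in level for v in successors(u)}
        violations += [u for u in level if not conditions_hold(u)]
        sentences |= {u for u in level if u.islower()}
    return violations, sentences

violations, sentences = explore(8)
print(violations)          # []
print(sorted(sentences))   # ['aaaabbbbbb', 'aabbb']
```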

• Example. Let G be the grammar S → aSb | ab. Then $L = \{ a^nb^n \mid n > 0 \} = L(G)$:

Derivation | Applicable rule
$S \Rightarrow^{n-1} a^{n-1} S b^{n-1}$ | S → aSb
$\Rightarrow a^nb^n$ | S → ab

• Example 3.4.1. Let G be the grammar

S → aS | bB | λ
B → aB | bS | bC
C → aC | λ

Show that L(G) = L, where L = a*(a*ba*ba*)*, i.e., L(G) is the set of strings over {a, b} with an even number of b's.


• (I) L(G) ⊆ L: The following condition must be satisfied by every sentential form u derived from S:

(c) $(\eta_b(u) + \eta_B(u)) \bmod 2 = 0$. Let n be the length of the derivation of u.

  - Basis: n = 1: S ⇒ aS, S ⇒ bB, or S ⇒ λ; (c) holds in each case.

  - Induction Hypothesis: Assume that (c) holds for every string u derivable by a derivation of length n.

  - Induction: Let w be a string derivable from S by a derivation of length n+1. Since $S \Rightarrow^{n+1} w$ can be written as $S \Rightarrow^{n} u \Rightarrow w$, by the I.H., $\eta_b(u) + \eta_B(u)$ is even. Applying the rule used in u ⇒ w yields:

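• (I) L(G) ⊆ L (continued)

  - Induction:

Applicable rule | $\eta_b(w) + \eta_B(w)$
S → aS | $\eta_b(u) + \eta_B(u)$
S → bB | $\eta_b(u) + \eta_B(u) + 2$
S → λ | $\eta_b(u) + \eta_B(u)$
B → aB | $\eta_b(u) + \eta_B(u)$
B → bS | $\eta_b(u) + \eta_B(u)$
B → bC | $\eta_b(u) + \eta_B(u)$
C → aC | $\eta_b(u) + \eta_B(u)$
C → λ | $\eta_b(u) + \eta_B(u)$

In every case $\eta_b(w) + \eta_B(w)$ is even, so (c) holds for w. In particular, every terminal string in L(G) has an even number of b's; hence L(G) ⊆ L.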
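A bounded check of both the invariant (c) and the claimed language: since every rule other than S → λ and C → λ adds exactly one terminal, a string of length l needs l + 1 steps, so derivations of at most 7 steps should yield exactly the strings of length at most 6 with an even number of b's. The sketch uses Python's `re.fullmatch` to test membership in a*(a*ba*ba*)*; the helper names are hypothetical.

```python
import re
from itertools import product

# Sketch: helper names are hypothetical; '' stands for λ.
RULES = {"S": ["aS", "bB", ""], "B": ["aB", "bS", "bC"], "C": ["aC", ""]}

def successors(form):
    """Forms reachable in one step by rewriting any variable."""
    for i, x in enumerate(form):
        for rhs in RULES.get(x, []):
            yield form[:i] + rhs + form[i + 1:]

def explore(max_steps):
    """Check (c) on every form derivable in 1..max_steps steps; collect the sentences."""
    level, invariant_ok, sentences = {"S"}, True, set()
    for _ in range(max_steps):
        level = {v for u in level for v in successors(u)}
        invariant_ok &= all((u.count("b") + u.count("B")) % 2 == 0 for u in level)
        sentences |= {u for u in level if not any(x in RULES for x in u)}
    return invariant_ok, sentences

invariant_ok, sentences = explore(7)
strings = {"".join(p) for n in range(7) for p in product("ab", repeat=n)}
even_b = {w for w in strings if w.count("b") % 2 == 0}
regex_l = {w for w in strings if re.fullmatch(r"a*(a*ba*ba*)*", w)}
print(invariant_ok, sentences == even_b == regex_l)   # True True
```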


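• Example 3.4.2. (II) L ⊆ L(G), i.e., show that every string in a*(a*ba*ba*)* is derivable in G. A string w ∈ L with 2k b's (k ≥ 1) can be written as $w = a^{n_1} b a^{n_2} b a^{n_3} \cdots a^{n_{2k}} b a^{n_{2k+1}}$ with all $n_i \ge 0$, and is derived as follows:

Derivation | Applicable rule
$S \Rightarrow^{n_1} a^{n_1} S$ | S → aS
$\Rightarrow a^{n_1} b B$ | S → bB
$\Rightarrow^{n_2} a^{n_1} b a^{n_2} B$ | B → aB
$\Rightarrow a^{n_1} b a^{n_2} b S$ | B → bS
$\vdots$ |
$\Rightarrow^{n_{2k}} a^{n_1} b a^{n_2} b a^{n_3} \cdots a^{n_{2k}} B$ | B → aB
$\Rightarrow a^{n_1} b a^{n_2} b a^{n_3} \cdots a^{n_{2k}} b C$ | B → bC
$\Rightarrow^{n_{2k+1}} a^{n_1} b a^{n_2} b a^{n_3} \cdots a^{n_{2k}} b a^{n_{2k+1}} C$ | C → aC
$\Rightarrow a^{n_1} b a^{n_2} b a^{n_3} \cdots a^{n_{2k}} b a^{n_{2k+1}}$ | C → λ

If w contains no b's, then $w = a^{n_1}$ and $S \Rightarrow^{n_1} a^{n_1} S \Rightarrow a^{n_1}$ using S → aS and S → λ.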
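The schema is constructive: given w, it determines the rule sequence. The sketch below (hypothetical helpers `schema_rules` and `replay`, single uppercase letters as variables, '' for λ) reads off n1, ..., n_{2k+1} by splitting w at its b's, emits the rules in the order of the schema, and replays them, always on the leftmost variable, to recover w.

```python
# Sketch: `schema_rules` and `replay` are hypothetical helpers; '' stands for λ.

def schema_rules(w):
    """Rules used by the derivation schema for w in a*(a*ba*ba*)*, in order."""
    if set(w) - {"a", "b"} or w.count("b") % 2:
        raise ValueError("w must be a string over {a, b} with an even number of b's")
    runs = [len(r) for r in w.split("b")]         # n_1, n_2, ..., n_{2k+1}
    k = (len(runs) - 1) // 2
    rules = [("S", "aS")] * runs[0]               # S =>^{n_1} a^{n_1} S
    if k == 0:
        return rules + [("S", "")]                # w = a^{n_1}
    for j in range(k):
        rules += [("S", "bB")] + [("B", "aB")] * runs[2 * j + 1]
        if j < k - 1:                             # more pairs of b's follow
            rules += [("B", "bS")] + [("S", "aS")] * runs[2 * j + 2]
        else:                                     # last pair: finish through C
            rules += [("B", "bC")] + [("C", "aC")] * runs[2 * k] + [("C", "")]
    return rules

def replay(rules, start="S"):
    """Apply the rules in order, each to the leftmost variable (an uppercase letter)."""
    form = start
    for lhs, rhs in rules:
        pos = next(i for i, x in enumerate(form) if x.isupper())
        if form[pos] != lhs:
            raise ValueError(f"expected {lhs} at position {pos} of {form!r}")
        form = form[:pos] + rhs + form[pos + 1:]
    return form

w = "aabaaba"
print(replay(schema_rules(w)) == w, len(schema_rules(w)))   # True 8
```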

• Example 3.4.3 (P.76)
