Context-Free Grammars and Languages

Context-free grammars are more powerful than finite automata. They are used to describe document formats, e.g. via the DTD (document-type definition) used in XML (extensible markup language). We introduce the context-free grammar notation and parse trees. There also exists an automaton, the pushdown automaton, that accepts all and only the context-free languages; it will be described later.

Example

A palindrome is a string that reads the same forward and backward, like otto or Madam, I’m Adam. Formally, w is a palindrome iff w = w^R.

The language Lpal of palindromes is not a regular language. We use the pumping lemma: if Lpal were regular, let n be the associated constant and consider w = 0^n 1 0^n. For a regular L, we could break w = xyz such that y consists of one or more 0’s from the first group. Then xz would also have to be in Lpal, but xz has fewer 0’s before the 1 than after it, so it is not a palindrome, a contradiction.

A context-free grammar for palindromes:
1. P → λ
2. P → 0
3. P → 1
4. P → 0P0
5. P → 1P1

A context-free grammar consists of one or more variables that represent classes of strings, i.e., languages.
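To make the grammar concrete, the following small Python sketch (ours, not part of the slides; the function name derivable_from_P is an assumption of the example) checks whether a binary string can be derived from P using exactly the five productions above.

def derivable_from_P(w: str) -> bool:
    """Check whether w is derivable from P with the productions
    P -> lambda | 0 | 1 | 0P0 | 1P1 (binary palindromes)."""
    if w in ("", "0", "1"):                     # P -> lambda, P -> 0, P -> 1
        return True
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        return derivable_from_P(w[1:-1])        # P -> 0P0 or P -> 1P1
    return False

assert derivable_from_P("0110")
assert not derivable_from_P("01101")            # not a palindrome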

Definition (Grammar) A grammar G = (V, T, P, S) consists of:
Finite set of terminal symbols (terminals) T, like {0, 1} in the previous example.
Finite set of variables V (nonterminals, syntactic categories), like {P} in the previous example.
Start symbol S, a variable that represents the language being defined; P in the previous example.
Finite set of rules (productions) P that represent the recursive definition of the language. Each has the form αAβ → ω, A ∈ V, α, β, ω ∈ (V ∪ T)∗; notice that the left side (head) contains at least one variable.
A production consists of the head (the left side), the production symbol →, and the body (the right side).

Definition (Context free grammar CFG) A context free grammar (CFG) is a grammar G = (V, T, P, S) that has only productions of the form A → α, A ∈ V, α ∈ (V ∪ T)∗.


Grammar types according to productions allowed.

Type 0 (recursively enumerable languages L0) general rules α → β, α, β ∈ (V ∪ T )∗, α contains at least one variable

Type 1 (context languages L1) productions of the form αAβ → αωβ, A ∈ V, α, β ∈ (V ∪ T)∗, ω ∈ (V ∪ T)+, with the only exception S → λ, in which case S does not appear on the right side of any production

Type 2 (context free languages L2) productions of the form A → ω, A ∈ V , ω ∈ (V ∪ T )∗

Type 3 (regular (right linear) languages L3) productions of the form A → ωB, A → ω, A, B ∈ V , ω ∈ T ∗

Chomsky hierarchy

The classes of languages are ordered

L0 ⊇ L1 ⊇ L2 ⊇ L3

Later we show that the inclusions are proper:

L0 ⊃ L1 ⊃ L2 ⊃ L3

L0 ⊇ L1: recursively enumerable languages contain the context languages, since productions αAβ → αωβ have a variable A in the head

L2 ⊇ L3: context free languages contain the regular languages, since productions A → ωB, A → ω have a body in (V ∪ T)∗

L1 ⊇ L2: context languages contain the context free languages; we have to eliminate rules A → λ, which can be done (shown later).

Derivations Using a Grammar

Definition (One step derivation) Suppose G = (V, T, P, S) is a grammar. Let α, ω, η, ν ∈ (V ∪ T)∗. Let α → ω be a production rule of G.

Then one derivation step is: ηαν ⇒G ηων or just ηαν ⇒ ηων.

We extend ⇒ to any number of derivation steps as follows.
Definition (Derivation ⇒∗) Let G = (V, T, P, S) be a CFG.
Basis: Any string α ∈ (V ∪ T)∗ derives itself, α ⇒∗G α.
Induction: If α ⇒∗G β and β ⇒G γ, then α ⇒∗G γ.
If the grammar G is understood, then we use ⇒∗ in place of ⇒∗G.

Example (derivation E ⇒∗ a ∗ (a + b00)) E ⇒ E ∗E ⇒ I ∗E ⇒ a ∗E ⇒ a ∗(E) ⇒ a ∗(E +E) ⇒ a ∗(I +E) ⇒ a ∗(a +E) ⇒ ⇒ a ∗ (a + I) ⇒ a ∗ (a + I0) ⇒ a ∗ (a + I00) ⇒ a ∗ (a + b00)
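The derivation above can be replayed mechanically. The sketch below (our own helper, not from the slides) implements a single derivation step ηαν ⇒ ηων by replacing one occurrence of a variable with a production body and chains the eleven steps of the example; in this particular derivation every step happens to rewrite the leftmost variable.

def step(form, head, body, occurrence=0):
    """One derivation step eta A nu => eta omega nu:
    replace the chosen occurrence of variable `head` by `body`."""
    parts = form.split(head)
    return head.join(parts[:occurrence + 1]) + body + head.join(parts[occurrence + 1:])

form = "E"
for head, body in [("E", "E*E"), ("E", "I"), ("I", "a"), ("E", "(E)"),
                   ("E", "E+E"), ("E", "I"), ("I", "a"), ("E", "I"),
                   ("I", "I0"), ("I", "I0"), ("I", "b")]:
    form = step(form, head, body)   # always rewrites the leftmost occurrence here
print(form)                          # a*(a+b00)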

The Language of a Grammar, Notation Convention for CFG Derivations

a, b, c: terminals
A, B, C: variables
w, z: strings of terminals
X, Y: either terminals or variables
α, β, γ: strings of terminals and/or variables.

Definition (The Language of a Grammar) Let G = (V, T, P, S) be a CFG. The language L(G) of G is the set of terminal strings that have derivations from the start symbol.

L(G) = {w ∈ T∗ | S ⇒∗G w}

The language of a variable A ∈ V is defined L(A) = {w ∈ T∗ | A ⇒∗G w}.

Example (Not a CFL) L = {0^n 1^n 2^n | n ≥ 1} is not context-free; there does not exist a CFG generating it.

Type 3 grammars and regular languages

Productions have the form A → wB, A → w, A, B ∈ V, w ∈ T∗.
An example of a derivation with P: S → 0S | 1A | λ, A → 0A | 1B, B → 0B | 1S:
S ⇒ 0S ⇒ 01A ⇒ 011B ⇒ 0110B ⇒ 01101S ⇒ 01101
Observations:
each sentential form contains exactly one variable (except the last one)
the variable is always at the rightmost position
the production A → w is the last one of the derivation
any step generates a terminal prefix and (possibly) changes the variable
The relation of the grammar to a finite automaton:
variable = state of the finite automaton
productions = transition function

Example of the reduction FA to a grammar

Example (G, FA: binary numbers divisible by 5)
L = {w | w ∈ {0, 1}∗ and w is a binary number divisible by 5}
(State diagram omitted; states A, B, C, D, E correspond to remainders 0–4.)

A → 1B | 0A | λ
B → 0C | 1D
C → 0E | 1A
D → 0B | 1C
E → 0D | 1E

Derivation examples:
A ⇒ 0A ⇒ 0 (0)
A ⇒ 1B ⇒ 10C ⇒ 101A ⇒ 101 (5)
A ⇒ 1B ⇒ 10C ⇒ 101A ⇒ 1010A ⇒ 1010 (10)
A ⇒ 1B ⇒ 11D ⇒ 111C ⇒ 1111A ⇒ 1111 (15)
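Because every sentential form of this grammar has exactly one variable, at its right end, a derivation can be simulated like a run of the automaton. A minimal sketch (our encoding, not from the slides):

# Right-linear productions written as: variable -> {bit: next variable};
# "erasing" holds the variables with an X -> lambda production (here only A).
delta = {
    "A": {"1": "B", "0": "A"},
    "B": {"0": "C", "1": "D"},
    "C": {"0": "E", "1": "A"},
    "D": {"0": "B", "1": "C"},
    "E": {"0": "D", "1": "E"},
}
erasing = {"A"}

def generated(w: str) -> bool:
    """Derive w bit by bit using X -> bit Y; the derivation may end
    only at a variable that has a lambda production."""
    state = "A"
    for bit in w:
        state = delta[state][bit]
    return state in erasing

for w in ["0", "101", "1010", "1111", "111"]:
    print(w, generated(w), int(w, 2) % 5 == 0)   # the two answers agree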

FA to Grammar reduction

Theorem (L regular ⇒ L ∈ L3) For any language recognized by a finite automaton there exists a Type 3 grammar generating the language.

Proof: FA to Grammar reduction

L = L(A) for some automaton A = (Q, Σ, δ, q0, F ).

We define a grammar G = (Q, Σ, P, q0) with productions P:
p → aq iff δ(p, a) = q
p → λ iff p ∈ F
Is L(A) = L(G)?

λ ∈ L(A) ⇔ q0 ∈ F ⇔ (q0 → λ) ∈ P ⇔ λ ∈ L(G)
a1 ... an ∈ L(A) ⇔ ∃ q0, ..., qn ∈ Q such that δ(qi, ai+1) = qi+1 and qn ∈ F ⇔ (q0 ⇒ a1q1 ⇒ ... ⇒ a1 ... anqn ⇒ a1 ... an) is a derivation of a1 ... an ⇔ a1 ... an ∈ L(G)
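The construction in this proof is easy to mechanize. A sketch (the function name and the encoding of the automaton are ours); applied to the divisible-by-5 automaton from the earlier slide it reproduces exactly the grammar given there.

def fa_to_grammar(Q, Sigma, delta, q0, F):
    """Construction from the proof: p -> a q whenever delta(p, a) = q,
    and p -> lambda whenever p is an accepting state.  Returns (V, T, P, S)."""
    P = []
    for p in sorted(Q):
        for a in sorted(Sigma):
            P.append((p, a + delta[(p, a)]))      # p -> a q
        if p in F:
            P.append((p, ""))                      # p -> lambda
    return (set(Q), set(Sigma), P, q0)

# Divisible-by-5 automaton: states A..E are remainders 0..4,
# reading bit b maps remainder r to (2r + b) mod 5.
states = "ABCDE"
delta = {(states[r], str(b)): states[(2 * r + b) % 5] for r in range(5) for b in (0, 1)}
V, T, P, S = fa_to_grammar(set(states), {"0", "1"}, delta, "A", {"A"})
for head, body in P:
    print(head, "->", body if body else "lambda")  # A -> 0A | 1B | lambda, B -> 0C | 1D, ...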

Grammar to FA reduction

We aim to construct the opposite reduction, from a grammar to a finite automaton:
productions A → aB are encoded into the transition function
productions A → λ define the accepting states
productions A → B correspond to λ transitions
productions A → a1 ... anB, A → a1 ... an with more terminals are rewritten: we introduce new variables H2, ..., Hn and define productions A → a1H2, H2 → a2H3, ..., Hn → anB, or A → a1H2, H2 → a2H3, ..., Hn → an

Lemma For any Type 3 grammar there exists a Type 3 grammar generating the same language with all productions of the form: A → aB, A → B, A → λ, A, B ∈ V, a ∈ T.

Standard form of a Type 3 grammar

Lemma For any Type 3 grammar there exists a Type 3 grammar generating the same language with all productions of the form: A → aB, A → B, A → λ, A, B ∈ V, a ∈ T.

Proof. We define G′ = (V′, T, P′, S); for each rule we introduce a set of new variables Y2, ..., Yn, Z1, ..., Zn and define P′ from P as follows:
A → aB stays A → aB
A → λ stays A → λ
A → a1 ... anB becomes A → a1Y2, Y2 → a2Y3, ..., Yn → anB
Z → a1 ... an becomes Z → a1Z1, Z1 → a2Z2, ..., Zn−1 → anZn, Zn → λ
We may also eliminate the rules A → B: compute the transitive closure U(A) = {B | B ∈ V and A ⇒∗ B} and add A → w for all Z ∈ U(A) with (Z → w) ∈ P′.
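The table in this proof can be turned into a small routine. In the sketch below (ours; grammar symbols are single characters, fresh variables get names Y1, Y2, ...), a production is a pair (head, body) with the body given as a list of symbols.

def to_standard_form(P, variables):
    """Apply the table from the proof so that every production has the form
    A -> aB, A -> B or A -> lambda."""
    out, n = [], 0
    def fresh():
        nonlocal n
        n += 1
        return f"Y{n}"
    for head, body in P:
        ends_in_var = bool(body) and body[-1] in variables
        terminals = body[:-1] if ends_in_var else body
        if not terminals and not ends_in_var:
            out.append((head, body))                         # A -> lambda
        elif len(terminals) <= 1 and ends_in_var:
            out.append((head, body))                         # A -> B or A -> aB
        else:
            cur = head
            for a in terminals[:-1]:                         # chain of A -> a Y rules
                nxt = fresh()
                out.append((cur, [a, nxt]))
                cur = nxt
            if ends_in_var:
                out.append((cur, [terminals[-1], body[-1]]))  # ... -> an B
            else:
                z = fresh()
                out.append((cur, [terminals[-1], z]))         # ... -> an Zn
                out.append((z, []))                           # Zn -> lambda
    return out

# Example: S -> 011S | 01 (a toy right-linear grammar of ours).
for head, body in to_standard_form([("S", list("011S")), ("S", list("01"))], {"S"}):
    print(head, "->", " ".join(body) if body else "lambda")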

Theorem (Reduction of a Type 3 grammar to a λ–NFA) For any language L of a Type 3 grammar there exists a λ–NFA recognizing the same language.

Proof: Reduction Type 3 grammar to a λ–NFA

We take a grammar G = (V, T, P, S) generating L with all productions of the form A → aB, A → B, A → λ, A, B ∈ V, a ∈ T (previous lemma).
We define a non–deterministic λ–NFA A = (V, T, δ, {S}, F) where:
F = {A | (A → λ) ∈ P}
δ(A, a) = {B | (A → aB) ∈ P}
δ(A, λ) = {B | (A → B) ∈ P}
Then L(G) = L(A):
λ ∈ L(G) ⇔ (S → λ) ∈ P ⇔ S ∈ F ⇔ λ ∈ L(A)
a1 ... an ∈ L(G) ⇔ there exists a derivation (S ⇒∗ a1H1 ⇒∗ ... ⇒∗ a1 ... anHn ⇒ a1 ... an) ⇔ ∃ H0, ..., Hn ∈ V such that H0 = S, Hn ∈ F, and
Hi+1 ∈ δ(Hi, ak) for the step a1 ... ak−1Hi ⇒ a1 ... ak−1akHi+1,
Hi+1 ∈ δ(Hi, λ) for the step a1 ... akHi ⇒ a1 ... akHi+1
⇔ a1 ... an ∈ L(A)
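The same construction sketched in Python (our encoding; productions are (head, body) strings and the grammar is assumed to be in the standard form of the lemma):

def grammar_to_nfa(V, P, S):
    """Construction from the proof: states are the variables,
    delta(A, a) contains B for every A -> aB, delta(A, lambda) contains B
    for every A -> B, and A is accepting iff A -> lambda is a production."""
    delta, accepting = {}, set()
    for head, body in P:
        if body == "":                                   # A -> lambda
            accepting.add(head)
        elif len(body) == 2 and body[1] in V:            # A -> aB
            delta.setdefault((head, body[0]), set()).add(body[1])
        elif body in V:                                  # A -> B
            delta.setdefault((head, ""), set()).add(body)
    return delta, S, accepting

# Standard-form grammar S -> 0S | 1A, A -> 0A | lambda (our example).
P = [("S", "0S"), ("S", "1A"), ("A", "0A"), ("A", "")]
delta, start, accepting = grammar_to_nfa({"S", "A"}, P, "S")
print(delta)        # {('S','0'): {'S'}, ('S','1'): {'A'}, ('A','0'): {'A'}}
print(accepting)    # {'A'}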

Left (and right) linear grammars

Definition (Left (and right) linear grammars) Type 3 grammars are also called right linear (the variable is always at the right). A grammar G is left linear iff all productions have the form A → Bw, A → w, A, B ∈ V, w ∈ T∗.

Lemma Languages generated by left linear grammars are exactly regular languages.

Proof: by ’reversing’ productions we get a right linear grammar:
A → Bw, A → w are rewritten to A → w^R B, A → w^R
the new grammar generates L^R
we know regular languages are closed under reversal; L^R is regular, and so is L = (L^R)^R
any regular language can be expressed in this form (FA ⇒ reverse ⇒ right linear grammar ⇒ left linear grammar)

Linear grammars and languages

Left and right linear grammars together are stronger.

Definition (linear grammar, language) A grammar is linear iff all productions have the form A → uBw, A → w, A, B ∈ V , u, w ∈ T ∗ (at most one variable in the body). Linear languages are languages generated by linear grammars.

Obviously: regular languages ⊆ linear languages. It is a proper subset ⊂.

Example (linear, non–regular language) The language L = {0^n 1^n | n ≥ 1} is not regular but it is linear. It is generated by the grammar S → 0S1 | 01.

Observation: linear rules can be split into left linear and right linear rules: S → 0A, A → S1.

A context-free grammar for simple expressions

Example (CFG for simple expressions)
A grammar for (simple) expressions is G = ({E, I}, {+, ∗, (, ), a, b, 0, 1}, P, E), where P is the set of rules listed below.
Rules 1–4 describe expressions.
Rules 5–10 describe identifiers I; they correspond to the regular expression (a + b)(a + b + 0 + 1)∗.
1. E → I
2. E → E + E
3. E → E ∗ E
4. E → (E)
5. I → a
6. I → b
7. I → Ia
8. I → Ib
9. I → I0
10. I → I1

Leftmost and Rightmost Derivations

Definition (Leftmost and Rightmost Derivation)
Leftmost derivation ⇒lm, ⇒∗lm: at each step the leftmost variable is replaced by one of its production bodies.
Rightmost derivation ⇒rm, ⇒∗rm: at each step the rightmost variable is replaced by one of its production bodies.

Example (leftmost derivation)

E ⇒lm E ∗ E ⇒lm I ∗ E ⇒lm a ∗ E ⇒lm a ∗ (E) ⇒lm a ∗ (E + E) ⇒lm a ∗ (I + E) ⇒lm a ∗ (a + E) ⇒lm a ∗ (a + I) ⇒lm a ∗ (a + I0) ⇒lm a ∗ (a + I00) ⇒lm a ∗ (a + b00)

Example (rightmost derivation)

E ⇒rm E ∗ E ⇒rm E ∗ (E) ⇒rm E ∗ (E + E) ⇒rm E ∗ (E + I) ⇒rm E ∗ (E + I0) ⇒rm E ∗ (E + I00) ⇒rm E ∗ (E + b00) ⇒rm E ∗ (I + b00) ⇒rm E ∗ (a + b00) ⇒rm I ∗ (a + b00) ⇒rm a ∗ (a + b00)

The Language of a Grammar

Theorem

L(Gpal) is the set of palindromes over {0, 1}, where Gpal is the palindrome grammar given earlier.

Proof.

IF: Suppose w is a palindrome. We show by induction on |w| that w is in L(Gpal).
BASIS: If |w| = 0 or |w| = 1, then w is λ, 0 or 1. We have production rules P → λ, P → 0, P → 1, therefore P ⇒∗ w in any basis case.
INDUCTION: Suppose |w| ≥ 2. Since w = w^R, we have w = 0x0 or w = 1x1. Moreover, x must be a palindrome, x = x^R. If w = 0x0, then we invoke the inductive hypothesis to claim P ⇒∗ x. Then P ⇒ 0P0 ⇒∗ 0x0 = w. If w = 1x1, do it yourself.
ONLY-IF: We assume w ∈ L(Gpal), that is P ⇒∗ w. We claim that w is a palindrome. Induction on the number of steps in a derivation of w from P.
BASIS: λ, 0, 1 are palindromes.
INDUCTION: The derivation has the form P ⇒ 0P0 ⇒∗ 0x0 = w or P ⇒ 1P1 ⇒∗ 1x1 = w. By the inductive hypothesis x is a palindrome. Therefore 0x0 and 1x1 are also palindromes.

Sentential Forms

Definition (Sentential Forms) Let G = (V, T, P, S) be a CFG. Any string α ∈ (V ∪ T)∗ such that S ⇒∗ α is a sentential form. If S ⇒∗lm α, then α is a left sentential form; if S ⇒∗rm α, then α is a right sentential form.

Example The string E ∗ (I + E) is a sentential form but it is neither a left nor a right sentential form.

Parse Trees

The tree is the data structure of choice to represent the source program in a compiler. The structure facilitates the translation into executable code.

Definition (Parse Tree) Let us fix a grammar G = (V, T, P, S). The parse trees for G are trees such that: Each interior node is labeled by a variable in V. Each leaf is labeled by a symbol in V ∪ T ∪ {λ}. If a leaf is labeled λ, it must be the only child of its parent.

If an interior node is labeled A and its children are labeled X1, ..., Xk (ordered from the left), then A → X1X2 ... Xk is a production in P.

Notation (Tree terminology) Nodes, parent, child, root, interior nodes, leaves, descendants, ancestors.

Tree Examples, Yield Definition

A parse tree of E ⇒∗ I + E: root E with children E, +, E; the first child E has the single child I.
A parse tree of P ⇒∗ 0110: root P with children 0, P, 0; the middle P has children 1, P, 1; the innermost P has the single child λ.
(Tree diagrams omitted.)

Definition (The Yield) The yield of a parse tree is the string of its leaf labels concatenated from the left.

Of special importance are those yields where: the yield is a terminal string, and the root is labeled by the start symbol. These yields are exactly the strings in the language of the underlying grammar.
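A parse tree and its yield can be written down directly. A minimal sketch (our encoding as nested tuples), using the tree of P ⇒∗ 0110 described above:

# A parse tree is (label, children); leaves are plain strings, lambda is "".
tree_0110 = ("P", ["0", ("P", ["1", ("P", [""]), "1"]), "0"])

def tree_yield(t):
    """Concatenate the leaf labels of a parse tree from the left."""
    return t if isinstance(t, str) else "".join(tree_yield(c) for c in t[1])

print(tree_yield(tree_0110))   # 0110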

Inference, Derivations, and Parse Trees

Theorem Given a grammar G = (V, T, P, S) and w ∈ T∗, the following are equivalent:
The recursive inference procedure determines that terminal string w is in the language of variable A.
A ⇒∗ w.
A ⇒∗lm w.
A ⇒∗rm w.
There is a parse tree with root A and yield w.

Proof directions (diagram omitted): recursive inference ⇒ parse tree ⇒ leftmost (or rightmost) derivation ⇒ derivation ⇒ recursive inference.

From Inferences to Trees

Theorem Let G = (V , T , P, S) be a CFG. If the recursive inference procedure tells us that terminal string w is in the language of variable A, then there is a parse tree with root A and yield w.

Proof. Induction on the number of steps used to infer that w is in L(A).
BASIS: One step, so A → w ∈ P; the tree with root A and leaves reading w is a parse tree with yield w.
INDUCTION: The last step uses a production A → X1X2 ... Xn ∈ P with w = w1w2 ... wn; by the inductive hypothesis we have parse trees with roots Xi and yields wi. Attaching them as children of a new root A gives a parse tree with root A and yield w.

From Trees to Derivations

Example (Context free derivation) Assume the following is a derivation:

E ⇒ I ⇒ Ib ⇒ ab.

Then for any strings α, β the following is also a derivation:

αEβ ⇒ αIβ ⇒ αIbβ ⇒ αabβ.

Theorem Let G = (V, T, P, S) be a CFG, and suppose there is a parse tree with root labeled by variable A and with yield w ∈ T∗. Then there is a leftmost derivation A ⇒∗lm w in grammar G.

Proof. Induction on the height of the tree.
BASIS: height 1: root A with children that read w. Since it is a parse tree, A → w is a production, so A ⇒lm w is a one-step leftmost derivation.

INDUCTION: Height n > 1. Root A with children X1, X2,..., Xk .

If Xi is a terminal, define wi to be the string consisting of Xi alone.
If Xi is a variable, then by the inductive hypothesis Xi ⇒∗lm wi.
We construct the leftmost derivation inductively on i = 1, ..., k, showing A ⇒∗lm w1w2 ... wi Xi+1Xi+2 ... Xk.
If Xi is a terminal, do nothing, just i++. If Xi is a variable, rewrite the derivation Xi ⇒lm α1 ⇒lm α2 ⇒lm ... ⇒lm wi to

w1w2 ... wi−1Xi Xi+1Xi+2 ... Xk ⇒lm

w1w2 ... wi−1α1Xi+1Xi+2 ... Xk ⇒lm ...

⇒lm w1w2 ... wi−1wi Xi+1Xi+2 ... Xk .

When i = k, the result is a leftmost derivation of w from A.

Leftmost Derivation from Parse Tree Example

(Parse tree of a ∗ (a + b00) omitted.)
Leftmost child of the root: E ⇒lm I ⇒lm a.
Rightmost child of the root: E ⇒lm (E) ⇒lm (E + E) ⇒lm (I + E) ⇒lm (a + E) ⇒lm (a + I) ⇒lm (a + I0) ⇒lm (a + I00) ⇒lm (a + b00).
Root: E ⇒lm E ∗ E.
Leftmost child integrated into the root: E ⇒lm E ∗ E ⇒lm I ∗ E ⇒lm a ∗ E.
Full derivation: E ⇒lm E ∗ E ⇒lm I ∗ E ⇒lm a ∗ E ⇒lm a ∗ (E) ⇒lm a ∗ (E + E) ⇒lm a ∗ (I + E) ⇒lm a ∗ (a + E) ⇒lm a ∗ (a + I) ⇒lm a ∗ (a + I0) ⇒lm a ∗ (a + I00) ⇒lm a ∗ (a + b00).
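The construction in the proof can also be run on a concrete tree. The sketch below (our encoding and helper names) reads the leftmost derivation off a parse tree given as nested tuples; the small tree of P ⇒∗ 0110 from earlier keeps the example short.

def tree_yield(t):
    """Concatenate the leaf labels of a parse tree from the left."""
    return t if isinstance(t, str) else "".join(tree_yield(c) for c in t[1])

def leftmost_derivation(tree):
    """Read a leftmost derivation off a parse tree, as in the proof: expand the
    root, then expand the children left to right, keeping the already-derived
    terminal prefix and the not-yet-expanded suffix of the sentential form."""
    forms = [tree[0]]

    def symbol(c):
        return c if isinstance(c, str) else c[0]

    def expand(node, prefix, suffix):
        _, children = node
        forms.append(prefix + "".join(symbol(c) for c in children) + suffix)
        done = prefix
        for i, c in enumerate(children):
            if isinstance(c, str):
                done += c                            # terminal child, nothing to expand
            else:
                rest = "".join(symbol(x) for x in children[i + 1:]) + suffix
                expand(c, done, rest)
                done += tree_yield(c)

    expand(tree, "", "")
    return forms

tree = ("P", ["0", ("P", ["1", ("P", [""]), "1"]), "0"])
print(" => ".join(leftmost_derivation(tree)))   # P => 0P0 => 01P10 => 0110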

From Derivations to Recursive Inferences

Lemma (Extracting the Xi derivation from an A derivation)
Suppose we have a derivation A ⇒ X1X2 ... Xk ⇒∗ w. Then we can break w into pieces w = w1w2 ... wk such that Xi ⇒∗ wi. If Xi is a variable, we can obtain the derivation Xi ⇒∗ wi by starting with A ⇒∗ w and stripping away:
All positions of the sentential forms that are either to the left or to the right of the positions that are derived from Xi, and

All steps that are not relevant to the derivation of wi from Xi .

Example Consider derivation E ⇒ E ∗ E ⇒ E ∗ E + E ⇒ I ∗ E + E ⇒ I ∗ I + E ⇒ I ∗ I + I ⇒ a ∗ I + I ⇒ a ∗ b + I ⇒ a ∗ b + a. The derivation of central E from the third sentential form is: E ⇒ I ⇒ b.

From Derivations to Recursive Inference

Theorem (From Derivations to Recursive Inference) Let G = (V, T, P, S) be a CFG, and suppose there is a derivation A ⇒∗ w, w ∈ T∗. Then the recursive inference procedure applied to G determines that w is in the language of variable A.

Proof. An induction on the length of the derivation A ⇒∗ w.
BASIS: One step, so A → w must be a production. Since w ∈ T∗, w will be discovered to be in L(A) in the basis part of the recursive inference procedure.
INDUCTION: n + 1 steps. Write the derivation as A ⇒ X1X2 ... Xk ⇒∗ w. By the lemma we can break w into w = w1w2 ... wk where either Xi ∈ T and Xi = wi, or Xi ∈ V and Xi ⇒∗ wi in n or fewer steps. By the inductive hypothesis each such wi is inferred to be in the language of Xi. Now we have the production A → X1X2 ... Xk, so the next round of the recursive inference procedure discovers that w1w2 ... wk = w is in L(A).

Applications of Context-Free Grammars

Parsers
Markup Languages (HTML)
XML and Document Type Definitions (DTD)
See Chapter 5.3 in the book.

Ambiguity in Grammars

Two derivations of the same expression:
E ⇒ E + E ⇒ E + E ∗ E
E ⇒ E ∗ E ⇒ E + E ∗ E
(The two corresponding parse trees are omitted.)
The difference is important; the first tree groups as 1 + (2 ∗ 3) = 7, the second as (1 + 2) ∗ 3 = 9. This grammar can be modified to be unambiguous.

Example Different derivations may represent the same parse tree. Then, it is not a problem. 1. E ⇒ E + E ⇒ I + E ⇒ a + E ⇒ a + I ⇒ a + b 2. E ⇒ E + E ⇒ E + I ⇒ I + I ⇒ I + b ⇒ a + b.

Definition (ambiguous CFG) We say a CFG G = (V, T, P, S) is ambiguous if there is at least one string w ∈ T∗ for which we can find two different parse trees, each with root labeled S and yield w. If each string has at most one parse tree in the grammar, then the grammar is unambiguous.

Two parse trees with yield a + a ∗ a, showing the ambiguity of the grammar (diagrams omitted): in one tree the root production is E → E + E with E ∗ E below it, in the other the root production is E → E ∗ E with E + E below it.

Removing Ambiguity From Grammars

There is no algorithm that can even tell us whether a CFG is ambiguous.
There are context-free languages that have nothing but ambiguous CFG’s.
There are some hints for removing ambiguity. There are two causes of ambiguity:
The precedence of operators is not respected.
A sequence of identical operators can group either from the left or from the right.

Enforcing Precedence

The solution enforcing precedence is to introduce several different variables, each for one level of ’binding strength’. Specifically:
A factor is an expression that cannot be broken by any operator: identifiers and any parenthesized expression.
A term is an expression that cannot be broken by the + operator.
An expression can be broken by either ∗ or +.
An unambiguous expression grammar:
1. I → a | b | Ia | Ib | I0 | I1
2. F → I | (E)
3. T → F | T ∗ F
4. E → T | E + T
(The sole parse tree for a + a ∗ a is omitted.)

Automata and Grammars Grammars 6 March 30, 2017 33 / 39 1. I → a|b|Ia|Ib|I0|I1 2. F → I|(E) 3. T → F |T ∗ F Unambiguity of our grammar. 4. I → T |E + T . Points why no string can have two different parse T trees. A factor is either a single identifier or any parenthesized expression. T * F Any string derived from T is a sequence of factors connected by ∗. T * F Because of two productions of T , the only parse tree breaks f1 ∗ f2 ... ∗ fn into a term ... * F f1 ∗ f2 ... ∗ fn−1 and a factor fn. Likewise, an expression is a sequence of terms connected by +. The production T E → E + T takes as the term always the last one. T * F

F
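To see the layered grammar in action, here is a parser sketch (entirely ours, not from the slides). The left-recursive rules E → E + T and T → T ∗ F are handled by loops that build the tree left-associatively, which produces the same trees the grammar defines.

def parse_expression(s: str):
    """Parser sketch for the unambiguous I/F/T/E grammar above."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def eat(c):
        nonlocal pos
        assert peek() == c, f"expected {c!r} at position {pos}"
        pos += 1
        return c

    def identifier():                      # I -> a | b | Ia | Ib | I0 | I1
        assert peek() in ("a", "b"), f"identifier expected at position {pos}"
        node = ("I", [eat(peek())])
        while peek() in ("a", "b", "0", "1"):
            node = ("I", [node, eat(peek())])
        return node

    def factor():                          # F -> I | (E)
        if peek() == "(":
            return ("F", [eat("("), expression(), eat(")")])
        return ("F", [identifier()])

    def term():                            # T -> F | T * F
        node = ("T", [factor()])
        while peek() == "*":
            node = ("T", [node, eat("*"), factor()])
        return node

    def expression():                      # E -> T | E + T
        node = ("E", [term()])
        while peek() == "+":
            node = ("E", [node, eat("+"), term()])
        return node

    tree = expression()
    assert pos == len(s), f"unparsed input at position {pos}"
    return tree

print(parse_expression("a+a*a"))   # the + node is the root; a*a is grouped below it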

Leftmost Derivations as a Way to Express Ambiguity

Different leftmost derivations correspond to different parse trees. Two leftmost derivations of a + a ∗ a:
E ⇒lm E ∗ E ⇒lm E + E ∗ E ⇒lm I + E ∗ E ⇒lm a + E ∗ E ⇒lm a + I ∗ E ⇒lm a + a ∗ E ⇒lm a + a ∗ I ⇒lm a + a ∗ a
E ⇒lm E + E ⇒lm I + E ⇒lm a + E ⇒lm a + E ∗ E ⇒lm a + I ∗ E ⇒lm a + a ∗ E ⇒lm a + a ∗ I ⇒lm a + a ∗ a
(They correspond to the two parse trees shown earlier.)
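These two derivations can be checked mechanically. A small sketch (ours) that always rewrites the leftmost variable and replays both rule sequences:

def leftmost_step(form, head, body):
    """Replace the leftmost variable occurrence (the variables here are E and I)."""
    i = min((form.find(v) for v in "EI" if v in form), default=-1)
    assert i != -1 and form[i] == head, "leftmost variable does not match the rule head"
    return form[:i] + body + form[i + 1:]

derivation_1 = [("E", "E*E"), ("E", "E+E"), ("E", "I"), ("I", "a"),
                ("E", "I"), ("I", "a"), ("E", "I"), ("I", "a")]
derivation_2 = [("E", "E+E"), ("E", "I"), ("I", "a"),
                ("E", "E*E"), ("E", "I"), ("I", "a"), ("E", "I"), ("I", "a")]

for steps in (derivation_1, derivation_2):
    form = "E"
    for head, body in steps:
        form = leftmost_step(form, head, body)
    print(form)    # both derivations, though distinct, end in a+a*a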

Theorem
For each grammar G = (V, T, P, S) and string w ∈ T∗, w has two distinct parse trees if and only if w has two distinct leftmost derivations from S.

Proof. (Only-if) Wherever the two parse trees first have a node at which different productions are used, the leftmost derivations constructed will also use different productions and thus be different derivations.
(If) Start constructing a tree with only the root, labeled S. From each production used, determine which node is being changed and what the children of this node should be. If there are two distinct derivations, then at the first step where the derivations differ, the nodes being constructed will get different lists of children, and this difference guarantees that the parse trees are distinct.

Inherent Ambiguity

Definition (Inherent Ambiguity) A context-free language L is said to be inherently ambiguous if all its grammars are ambiguous. If one grammar for L is unambiguous, then L is an unambiguous language.

Example (Inherently ambiguous language) An example of an inherently ambiguous language:

L = {a^n b^n c^m d^m | n ≥ 1, m ≥ 1} ∪ {a^n b^m c^m d^n | n ≥ 1, m ≥ 1}.
A grammar for L:
1. S → AB | C
2. A → aAb | ab
3. B → cBd | cd
4. C → aCd | aDd
5. D → bDc | bc.
This grammar is ambiguous. For example, the string aabbccdd has the two leftmost derivations:
1. S ⇒lm AB ⇒lm aAbB ⇒lm aabbB ⇒lm aabbcBd ⇒lm aabbccdd
2. S ⇒lm C ⇒lm aCd ⇒lm aaDdd ⇒lm aabDcdd ⇒lm aabbccdd

Two parse trees for aabbccdd (diagrams omitted): one starts with S → AB, the other with S → C. No matter what modifications we make to the basic grammar, it will generate at least some of the strings of the form a^n b^n c^n d^n in two different ways, just as the grammar presented here does.

Summary of Chapter 5

Context-Free Grammars: G = (V, T, P, S), where P is a set of recursive rules called productions.
Derivations and Languages: beginning with S, we repeatedly replace a variable by the body of one of its productions. The language is the set of terminal strings we can derive in this way.
Leftmost, Rightmost Derivations.
Sentential Forms: the strings appearing at any step of a derivation from S.
Parse Trees: interior nodes are labeled by variables, and leaves are labeled by terminals or λ. For each interior node, there must be a production justifying the node-children relation.
Equivalence of Parse Trees and Derivations: a terminal string is in the language of a grammar iff it is the yield of at least one parse tree. The existence of leftmost derivations and of parse trees define exactly the strings in the language of a CFG.
Ambiguous Grammars: for some CFG’s, it is possible to find a terminal string with more than one parse tree.
Eliminating Ambiguity: for many useful grammars it is possible to find an unambiguous grammar that generates the same language.
