
CS 3813: Introduction to Formal Languages and Automata
Chomsky normal form (Sec 6.1 – 6.2)

Grammar simplification
There are many CFGs for any given CFL. When reasoning about a CFL, it often helps to assume that a grammar for it has some particularly simple form.
“The somewhat tedious nature of the material in this chapter lies in the fact that many of the arguments are manipulative and give little intuitive insight” (p. 150)
Well, this doesn’t sound promising, but…
“The various conclusions are significant; they will be used many times in later discussions.” (p. 150)

Chomsky normal form
• A context-free grammar (CFG) is said to be in Chomsky normal form if every rule in the grammar is of the form
    A → BC
  or
    A → a
  where A, B and C are any nonterminals, and a is any terminal
• For languages that include the empty string λ, the rule S → λ may also be allowed, where S is the start symbol, as long as S does not occur on the right-hand side of any rule

Conversion to Chomsky normal form
Theorem: Any context-free language is generated by a context-free grammar in Chomsky normal form.
Proof idea: Convert any CFG to one in Chomsky normal form by removing or replacing all rules in the wrong form
1. Add a new start symbol
2. Eliminate λ rules of the form A → λ
3. Eliminate unit rules of the form A → B
4. Convert remaining rules into proper form
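The definition of Chomsky normal form above can be turned into a small checker. This is only a sketch under assumptions that are not in the slides: a grammar is a Python dict mapping each variable to a set of right-hand sides, each right-hand side is a tuple of symbols, λ is the empty tuple, and any symbol that is not a key of the dict counts as a terminal. The name `is_cnf` is hypothetical.

```python
def is_cnf(rules, start):
    """Return True if every rule is A -> BC, A -> a, or (start) -> λ.

    Assumed encoding: rules maps each variable to a set of tuples;
    the empty tuple () stands for λ; non-keys are terminals.
    """
    variables = set(rules)
    # S -> λ is only allowed when S never occurs on a right-hand side.
    start_on_rhs = any(start in rhs
                       for bodies in rules.values() for rhs in bodies)
    for head, bodies in rules.items():
        for rhs in bodies:
            if len(rhs) == 2 and all(sym in variables for sym in rhs):
                continue  # form A -> BC
            if len(rhs) == 1 and rhs[0] not in variables:
                continue  # form A -> a
            if rhs == () and head == start and not start_on_rhs:
                continue  # form S -> λ
            return False
    return True


good = {"S": {("A", "B"), ()}, "A": {("a",)}, "B": {("b",)}}
bad = {"S": {("a", "S", "b"), ("a", "b")}}
print(is_cnf(good, "S"), is_cnf(bad, "S"))
```

The second grammar fails because its right-hand sides mix terminals with variables and one of them has length 3.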

Step 1: Add new start symbol
1. Add a new start symbol
- Create the following new rule
    S0 → S
  where S is the start symbol and S0 is not used in the CFG

Step 2: Remove λ-productions
2. Eliminate all λ rules A → λ, where A is not the start variable
- For each rule with an occurrence of A on the right-hand side, add a new rule with the A deleted
    R → uAv becomes R → uAv | uv
    R → uAvAw becomes R → uAvAw | uvAw | uAvw | uvw
- If we have R → A, add R → λ unless we had already removed R → λ
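Steps 1 and 2 can be sketched in code. The encoding is an assumption, not something from the slides: a dict from variables to sets of tuples, with the empty tuple standing for λ. Note also that `eliminate_lambda` uses the common "nullable set" formulation (find every variable that can derive λ, then add each rule variant with nullable occurrences deleted) rather than removing one λ rule at a time as described above; for this purpose the two should produce the same rules. The function names are hypothetical.

```python
from itertools import product


def add_start(rules, start, new_start="S0"):
    """Step 1: add new_start -> start (new_start must be unused)."""
    assert new_start not in rules
    new_rules = {head: set(bodies) for head, bodies in rules.items()}
    new_rules[new_start] = {(start,)}
    return new_rules, new_start


def eliminate_lambda(rules, start):
    """Step 2: remove every A -> λ except possibly start -> λ."""
    # Find all variables that can derive λ.
    nullable, changed = set(), True
    while changed:
        changed = False
        for head, bodies in rules.items():
            if head not in nullable and any(
                    all(sym in nullable for sym in rhs) for rhs in bodies):
                nullable.add(head)
                changed = True
    result = {head: set() for head in rules}
    for head, bodies in rules.items():
        for rhs in bodies:
            # Each nullable occurrence is either kept or deleted.
            choices = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
            for pick in product(*choices):
                new_rhs = tuple(s for part in pick for s in part)
                if new_rhs or head == start:
                    result[head].add(new_rhs)
    return result


grammar = {
    "S": {("S1",), ("S2",)},
    "S1": {("S1", "b"), ("A", "b")},
    "A": {("a", "A", "b"), ("a", "b"), ()},
    "S2": {("S2", "a"), ("B", "a")},
    "B": {("b", "B", "a"), ("b", "a"), ()},
}
with_start, s0 = add_start(grammar, "S")
no_lambda = eliminate_lambda(with_start, s0)
print(sorted(no_lambda["S1"]))
```

Deleting A from Ab is what produces the new rule S1 → b in this grammar.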

Step 3: Remove unit productions
3. Eliminate all unit rules of the form A → B
- For each rule B → u, add a new rule A → u, where u is a string of terminals and variables, unless A → u is a unit rule that had already been removed
- Repeat until all unit rules have been replaced

Step 4: Convert rules to proper form
4. Convert remaining rules into proper form
- What’s left?
- Replace each rule A → u1u2…uk, where k ≥ 3 and each ui is a variable or a terminal, with k-1 rules
    A → u1A1
    A1 → u2A2
    …
    Ak-2 → uk-1uk
  where A1, …, Ak-2 are new variables
- In every rule whose right-hand side has two symbols, replace each terminal ui with a new variable Ui and add the rule Ui → ui
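Steps 3 and 4 can be sketched the same way. The encoding is again an assumption (a dict from variables to sets of tuples, with non-keys treated as terminals), and the names `eliminate_units`, `split_long` and the fresh variables `X1`, `X2`, … are hypothetical. `eliminate_units` uses the "unit closure" formulation: for each variable it collects everything reachable through unit rules and copies over the non-unit rules, which should give the same result as the repeat-until-done procedure. `split_long` only performs the chain-splitting part of step 4; it does not replace terminals that sit next to another symbol.

```python
def eliminate_units(rules):
    """Step 3: remove all unit rules A -> B (B a variable)."""
    variables = set(rules)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in variables

    result = {}
    for head in rules:
        # Variables reachable from head using unit rules only.
        reach, stack = {head}, [head]
        while stack:
            cur = stack.pop()
            for rhs in rules[cur]:
                if is_unit(rhs) and rhs[0] not in reach:
                    reach.add(rhs[0])
                    stack.append(rhs[0])
        result[head] = {rhs for v in reach for rhs in rules[v]
                        if not is_unit(rhs)}
    return result


def split_long(rules):
    """Step 4 (splitting only): A -> u1...uk with k >= 3 becomes a chain."""
    result = {head: set() for head in rules}
    counter = 0
    for head, bodies in rules.items():
        for rhs in bodies:
            cur = head
            while len(rhs) > 2:
                counter += 1
                while f"X{counter}" in result:  # keep fresh names unused
                    counter += 1
                fresh = f"X{counter}"
                result[fresh] = set()
                result[cur].add((rhs[0], fresh))
                cur, rhs = fresh, rhs[1:]
            result[cur].add(rhs)
    return result


grammar = {
    "S0": {("S",)},
    "S": {("S1",), ("S2",)},
    "S1": {("S1", "b"), ("A", "b"), ("b",)},
    "A": {("a", "A", "b"), ("a", "b")},
    "S2": {("S2", "a"), ("B", "a"), ("a",)},
    "B": {("b", "B", "a"), ("b", "a")},
}
no_units = eliminate_units(grammar)
short = split_long(no_units)
print(sorted(no_units["S0"]))
print(max(len(rhs) for bodies in short.values() for rhs in bodies))
```

Because sets are unordered, the numbering of the fresh variables can differ between runs; only the shape of the rules matters.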

Example
S → S1 | S2
S1 → S1b | Ab
A → aAb | ab | λ
S2 → S2a | Ba
B → bBa | ba | λ
Step 1: Add a new start symbol

Example
S0 → S
S → S1 | S2
S1 → S1b | Ab
A → aAb | ab | λ
S2 → S2a | Ba
B → bBa | ba | λ
Step 2: Eliminate λ rules

Example
S0 → S
S → S1 | S2
S1 → S1b | Ab | b
A → aAb | ab
S2 → S2a | Ba | a
B → bBa | ba
Step 3: Eliminate all unit rules

Example
S0 → S1b | Ab | b | S2a | Ba | a
S → S1b | Ab | b | S2a | Ba | a
S1 → S1b | Ab | b
A → aAb | ab
S2 → S2a | Ba | a
B → bBa | ba
Step 4: Convert rules to proper form

Example
S0 → S1b | Ab | b | S2a | Ba | a
S → S1b | Ab | b | S2a | Ba | a
S1 → S1b | Ab | b
A → aA1 | ab
A1 → Ab
S2 → S2a | Ba | a
B → bB1 | ba
B1 → Ba
To finish, each terminal in a two-symbol right-hand side (as in S1b or ab) is replaced by a new variable, e.g. Ub with Ub → b.

Exercises
• Convert the following grammar to CNF
    A → a | aaA | abBc
    B → abbA | b
• Convert the following grammar to CNF
    S → SS | (S) | λ

Chomsky normal form and …
• The fact that any CFG can be converted to Chomsky normal form lets us develop a parsing algorithm that shows that the membership problem can be solved for context-free languages (CFLs).
• Here is the idea of the algorithm. For a grammar in Chomsky normal form, any derivation of a string w has 2n-1 steps, where n ≥ 1 is the length of w. (Why?) So, it is only necessary to check derivations of 2n-1 steps to decide whether G generates w.
• Of course, this parsing algorithm is inefficient! It would never be used in practice. But it solves the membership problem for CFLs.

Uses of Chomsky Normal Form
• Allows determination of:
• Membership problem
  – Is string w a member of language L(G)?
• Emptiness problem
  – Is L(G) = ∅?
• Finiteness problem
  – Is language L(G) finite?
• Used by an efficient parsing algorithm (called the CYK algorithm) described in section 6.3
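To make the membership problem concrete, here is a minimal sketch of the CYK idea mentioned above. It is not taken from the slides or from section 6.3; it is the usual table-filling formulation, written under the assumptions that the grammar is already in Chomsky normal form and is encoded as a dict from variables to sets of tuples (the empty tuple standing for λ). The function name `cyk` and the sample grammar for the language of strings aⁿbⁿ with n ≥ 1 are illustrative only.

```python
def cyk(rules, start, word):
    """Decide whether a CNF grammar generates word (table-filling CYK)."""
    n = len(word)
    if n == 0:
        return () in rules.get(start, set())
    # table[i][l] holds the variables that derive word[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        for head, bodies in rules.items():
            if (ch,) in bodies:          # rules of the form A -> a
                table[i][1].add(head)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for head, bodies in rules.items():
                    # rules of the form A -> BC
                    if any(len(rhs) == 2 and rhs[0] in left
                           and rhs[1] in right for rhs in bodies):
                        table[i][length].add(head)
    return start in table[0][n]


# Assumed example grammar in CNF: S -> AB | AT, T -> SB, A -> a, B -> b
grammar = {
    "S": {("A", "B"), ("A", "T")},
    "T": {("S", "B")},
    "A": {("a",)},
    "B": {("b",)},
}
print(cyk(grammar, "S", "aabb"), cyk(grammar, "S", "aab"))
```

The table has on the order of n² cells and each cell tries up to n split points, which is why this approach is far cheaper than enumerating every derivation of 2n-1 steps.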
