Chapter 3 Context-Free Grammars
By Dr Zalmiyah Zakaria
•Context-Free Grammars and Languages •Regular Grammars Formal Definition of Context-Free Grammars (CFG) A CFG can be formally defined by a quadruple of (V, , P, S) where: – V is a finite set of variables (non-terminal) – (the alphabet) is a finite set of terminal symbols , where V = – P is a finite set of rules (production rules) written as: A for A V, (v )*. – S is the start symbol, S V
2 Formal Definition of CFG
• We can give a formal description to a particular CFG by specifying each of its four components, for example, G = ({S, A}, {0, 1}, P, S) where P consists of three rules: S → 0S1 S → A A →
Sept2011 Theory of Computer Science 3 Context-Free Grammars
• Terminal symbols – elements of the alphabet • Variables or non-terminals – additional symbols used in production rules • Variable S (start symbol) initiates the process of generating acceptable strings.
4 Terminal or Variable ? • S → (S) | S + S | S × S | A • A → 1 | 2 | 3
• The terminal symbols are { (, ), +, ×, 1, 2, 3} • The variable symbols are S and A
Sept2011 Theory of Computer Science 5 Context-Free Grammars • A rule is an element of the set V (V )*. • An A rule: [A, w] or A w • A null rule or lambda rule: A
6 Context-Free Grammars • Grammars are used to generate strings of a language. • An A rule can be applied to the variable A whenever and wherever it occurs. • No limitation on applicability of a rule – it is context free
8 Context-Free Grammars
• CFG have no restrictions on the right-hand side of production rules. All of the following are valid CFG production rules: S aSbS S c S λ S xyz S ABcDEF S MZK • Notice that there can be a λ, one or more terminals, one or more non-terminals and any combination of terminals and non-terminals on the right-hand side of a production rule.
Sept2011 Theory of Computer Science 9 Generating Strings with CFG • Generate a string by applying rules – Start with the initial symbol – Repeat: • Pick any non-terminal in the string • Replace that non-terminal with the right-hand side of some rule that has that non-terminal as a left-hand side • Repeat until all elements in the string are terminals • E.g. : P: S uAv A w We can derived string uwv as: S ⇒ uAv ⇒ uwv
10 Generating Strings with CFG
P: S aS S Bb B cB B • Generating a string: S replace S with aS aS replace S with Bb aBb replace B with cB acBb replace B with acb Final String
Sept2011 Theory of Computer Science 11 Generating Strings with CFG
P: S aS S Bb B cB B • Generating a string: S replace S with aS aS replace S with aS aaS replace S with Bb aaBb replace B with cB aacBb replace B with cB aaccBb replace B with aaccb Final String
Sept2011 Theory of Computer Science 12 Generating Strings with CFG P: S aS S Bb B cB B • Regular expression equivalent to this CFG: a∗c∗b • Shortest string: S ⇒ Bb ⇒ b.
Sept2011 Theory of Computer Science 13 Derivation • A derivation is a listing of how a string is generated – showing what the string looks like after every replacement. S AB S ⇒ AB A aA | ⇒ aAB B bB | ⇒ aAbB ⇒ abB ⇒ abbB ⇒ abb
Sept2011 Theory of Computer Science 14 Derivation • A terminal string is in the language of the grammar if it can be derived from the start symbol S using the rules of the grammar: • Example: S AASB | AAB A a B bbb
15 Derivation Derivation Rule Applied S ⇒ AASB S AASB ⇒ AAAASBB S AASB ⇒ AAAAAASBBB S AASB ⇒ AAAAAAAABBBB S AAB ⇒ aaaaaaaaBBBB A a ⇒ aaaaaaaabbbbbbbbbbbb B bbb
16 Derivation • Let G be the grammar : S aS | aA A bA | • The derivation of aabb is as shown: S aS aaA aabA aabbA aabb aabb
17 Example • Let G be the grammar : S aSb | ab • The derivation of aabb is as shown: S aSb aaSbb aaaSbbb aaaaSbbbb aaaabbbb Set notation equivalent to this CFG is L = {anbn | n > 0} 18 CFG – Generating strings • E.g. : Formal description of G = ({S}, {a, b}, P, S) P: S aS | bS | • The following strings can be derived: S ⇒ S ⇒ aS ⇒ a ⇒ a S ⇒ bS ⇒ b ⇒ b S ⇒ aS ⇒ aaS ⇒ aa ⇒ aa S ⇒ aS ⇒ abS ⇒ ab ⇒ ab S ⇒ bS ⇒ baS ⇒ ba ⇒ ba S ⇒ aS ⇒ abS ⇒ abbS ⇒ abb ⇒ abb
19 CFG – Generating strings
• E.g. : G = ({S}, {a,b}, P, S) P: S aS | bS | • The language above can also be defined using regular expression: L(G) = (a + b)*
20 CFG – Generating strings • E.g. : G = ({S, A}, {a, b}, P, S) P: S AA A AAA | bA | Ab | a • The following strings can be derived: S ⇒ AA S ⇒ aA rule [A a] Shortest S ⇒ aAAA rule [A AAA] string ? S ⇒ abAAA rule [A bA] S ⇒ abaAA rule [A a] S ⇒ abaAbA rule [A Ab] S ⇒ abaabA rule [A a] S ⇒ abaaba rule [A a] 21 CFG – Generating strings G = (V, , P, S), V = {S, A}, = {a, b}, P: S AA A AAA | bA | Ab | a • Four distinct derivations of ababaa in G: (a) S ⇒ AA (b) S ⇒ AA (c) S ⇒ AA (d) S ⇒ AA ⇒ aA ⇒ AAAA ⇒ Aa ⇒ aA ⇒ aAAA ⇒ aAAA ⇒ AAAa ⇒ aAAA ⇒ abAAA ⇒ abAAA ⇒ AAbAa ⇒ aAAa ⇒ abaAA ⇒ abaAA ⇒ AAbaa ⇒ abAAa ⇒ ababAA ⇒ ababAA ⇒ AbAbaa ⇒ abAbAa ⇒ ababaA ⇒ ababaA ⇒ Ababaa ⇒ ababAa ⇒ ababaa ⇒ ababaa ⇒ ababaa ⇒ ababaa
(a) & (b) leftmost, (c) rightmost, (d) arbitrary 22 Context-Free Grammars
S ⇒ aS ⇒ aSa ⇒ aba
• Sentencial forms are the strings derivable
from start symbol of the grammar. • Sentences are forms that contain only terminal symbols. • A set of strings over is context-free language if there is a context-free grammar that generates it.
23 Derivation Tree, DT
G = ({S, A}, {a, b}, P, S) P: S AA
A AAA | bA | Ab | a
The derivation tree for S ⇒ AA ⇒ aA ⇒ abA ⇒ abAb ⇒ abab
24 Derivation Tree
S ⇒ aS ⇒ aSa ⇒ aba S ⇒ Sa ⇒ aSa ⇒ aba
25 Derivation Tree
• A CFG G is ambiguous if there exist more than 1 DT for n, where n L(G). • Example: G = ({S}, {a, b}, P, S) P: S aS | Sa | b The string aba can be derived as: S ⇒ aS ⇒ aSa ⇒ aba or S ⇒ Sa ⇒ aSa ⇒ aba
26 Example
• Let G be the grammar given by the production S aSa | aBa B bB | b
• Then L(G) = {anbman | n > 0, m > 0}
27 Example • Let L(G) = {anbmcmd2n | n≥0, m>0} • Then the production rules for this grammar is: S aSdd | A A bAc | bc
28 Example
• A string w is a palindrome if w = wR • The set of palindrome over {a, b} can be derived using rules: S a | b | S aSa | bSb
29 More Examples
• What is the language of this grammar ? • S aA a(a + b)*b A aA | bA | b • S aA aa*bb* A aA | bB B bB | • S aS | bA a*bba* A bB B aB |
30 How to convert RE to CFG
• General rules: • a + b S a | b • ab S aA A b • a* S → aS | λ
Sept2011 Theory of Computer Science 31 Regular Grammars • Regular grammars are an important subclass of context-free grammars that play an important role in the lexical analysis and parsing of programming languages. • Regular grammars is a subset of CFG. • Regular grammars are obtained by placing restrictions on the form of the right hand side of the rules.
33 Definition
• A regular grammar is a CFG in which each rule has one of the following form: 1. A a 2. A aB 3. A λ where A, B V, and a , A, B is a non- terminal and a can be one or more terminals
34 Regular Grammars
• There is at most ONE variable in a sentential form – the rightmost symbol in the string. • Each rule application adds ONE terminal to the derived string. Derivation is terminated by rules: A a OR A λ
35 Example
• Consider the grammar: G : S abSA | A Aa | • The equivalent regular grammar:
Gr : S aB | B bS | bA A aA |
36 Example Syntax of Pascal in Backus-Naur Form
37 Example Is A := B*(A+C) Syntactically correct?
38 Example Is A := B*(A+C) Syntactically correct?
39 How to construct RG from a RE • Use a new non-terminal for every new character • Each loop state turns into a recursive definition on a non-terminal • Example: R.E. ab*ab RG S aA A bA A aB B b Sept2011 Theory of Computer Science 40 Reg Expression to Reg Grammar
• RE a(a + b)*b RG S → aA A → aA A → bA A → b
• RE ab*a RG S → aA A → bA A → a
Sept2011 Theory of Computer Science 41 How to construct RG from a RE (cont.)
• Eg. R.E. a(a + b)*b Regular Grammar S aA A aA A bA A b
Sept2011 Theory of Computer Science 42 RE ⇒ RG
• Consider the regular expression: a+b* • The regular grammar is: S aS aR R bR
43 RE ⇒ RG ⇒ RL • The regular language L = a+b* can be defined as: L = ( V, , P, S) where: V = {S, R} = {a, b} P: S aS aR R bR
44 Non-Regular Language
• The language L = a+b* can also be defined as: L = ( V, , P, S) where: V = {S, A, B} = {a, b} P: S AB non-regular A aA | a grammar B bB |
45 Non-RG to RG
• Since every RG defines a RL and every RL must have a RE, it would be easier to go ahead and figure out what the RE is and convert it to the RG. • Eg. : CFG that is not regular: S XYX X aX | bX | Y aa | bb • The equivalent RE for it is: (a + b)*(aa + bb)(a + b)* • Convert it to RG: S aS | bS | X X aaY | bbY Y aY | bY |
Sept2011 Theory of Computer Science 46 Example
• Grammar for even-length strings over {a, b}: S aE | bE | E aS | bS
47 Design example L = {0n1n | n 0} These strings have recursive structure: 000000111111 0000011111 00001111 000111 0011 S 0S1| λ 01 λ Design examples
Allowed strings n n m m L = {0 1 0 1 | n 0, m 0} 010011 00110011 These strings have two parts: 000111
L = L1L2 n n L1 = {0 1 | n 0} L = {0m1m | m 0} 2 S AA A 0A1 | λ rules for L1: S1 0S11| λ
L2 is the same as L1 Design examples Allowed strings 011001 L = {0n1m0m1n | n 0, m 0} 0011 1100 00110011 These strings have nested structure:
outer part: 0n1n S 0S1|A m m inner part: 1 0 A 1A0 | λ Example of RG
• G = ({S, A, B, C}, {a, b}, P, S) P: S aS | aB B bC C aC | a • L(G) = {amban |m, n 1}
Sept2011 Theory of Computer Science 51 Example of CFG
• G = ({S, A, B, C}, {a, b}, P, S) P: S aSbb | abb • L(G) = {anb2n |n 1}
Sept2011 Theory of Computer Science 52