Context-Free Grammars
Total Page:16
File Type:pdf, Size:1020Kb
Chapter 3 Context-Free Grammars By Dr Zalmiyah Zakaria •Context-Free Grammars and Languages •Regular Grammars Formal Definition of Context-Free Grammars (CFG) A CFG can be formally defined by a quadruple of (V, , P, S) where: – V is a finite set of variables (non-terminal) – (the alphabet) is a finite set of terminal symbols , where V = – P is a finite set of rules (production rules) written as: A for A V, (v )*. – S is the start symbol, S V 2 Formal Definition of CFG • We can give a formal description to a particular CFG by specifying each of its four components, for example, G = ({S, A}, {0, 1}, P, S) where P consists of three rules: S → 0S1 S → A A → Sept2011 Theory of Computer Science 3 Context-Free Grammars • Terminal symbols – elements of the alphabet • Variables or non-terminals – additional symbols used in production rules • Variable S (start symbol) initiates the process of generating acceptable strings. 4 Terminal or Variable ? • S → (S) | S + S | S × S | A • A → 1 | 2 | 3 • The terminal symbols are { (, ), +, ×, 1, 2, 3} • The variable symbols are S and A Sept2011 Theory of Computer Science 5 Context-Free Grammars • A rule is an element of the set V (V )*. • An A rule: [A, w] or A w • A null rule or lambda rule: A 6 Context-Free Grammars • Grammars are used to generate strings of a language. • An A rule can be applied to the variable A whenever and wherever it occurs. • No limitation on applicability of a rule – it is context free 8 Context-Free Grammars • CFG have no restrictions on the right-hand side of production rules. All of the following are valid CFG production rules: S aSbS S c S λ S xyz S ABcDEF S MZK • Notice that there can be a λ, one or more terminals, one or more non-terminals and any combination of terminals and non-terminals on the right-hand side of a production rule. Sept2011 Theory of Computer Science 9 Generating Strings with CFG • Generate a string by applying rules – Start with the initial symbol – Repeat: • Pick any non-terminal in the string • Replace that non-terminal with the right-hand side of some rule that has that non-terminal as a left-hand side • Repeat until all elements in the string are terminals • E.g. : P: S uAv A w We can derived string uwv as: S ⇒ uAv ⇒ uwv 10 Generating Strings with CFG P: S aS S Bb B cB B • Generating a string: S replace S with aS aS replace S with Bb aBb replace B with cB acBb replace B with acb Final String Sept2011 Theory of Computer Science 11 Generating Strings with CFG P: S aS S Bb B cB B • Generating a string: S replace S with aS aS replace S with aS aaS replace S with Bb aaBb replace B with cB aacBb replace B with cB aaccBb replace B with aaccb Final String Sept2011 Theory of Computer Science 12 Generating Strings with CFG P: S aS S Bb B cB B • Regular expression equivalent to this CFG: a∗c∗b • Shortest string: S ⇒ Bb ⇒ b. Sept2011 Theory of Computer Science 13 Derivation • A derivation is a listing of how a string is generated – showing what the string looks like after every replacement. S AB S ⇒ AB A aA | ⇒ aAB B bB | ⇒ aAbB ⇒ abB ⇒ abbB ⇒ abb Sept2011 Theory of Computer Science 14 Derivation • A terminal string is in the language of the grammar if it can be derived from the start symbol S using the rules of the grammar: • Example: S AASB | AAB A a B bbb 15 Derivation Derivation Rule Applied S ⇒ AASB S AASB ⇒ AAAASBB S AASB ⇒ AAAAAASBBB S AASB ⇒ AAAAAAAABBBB S AAB ⇒ aaaaaaaaBBBB A a ⇒ aaaaaaaabbbbbbbbbbbb B bbb 16 Derivation • Let G be the grammar : S aS | aA A bA | • The derivation of aabb is as shown: S aS aaA aabA aabbA aabb aabb 17 Example • Let G be the grammar : S aSb | ab • The derivation of aabb is as shown: S aSb aaSbb aaaSbbb aaaaSbbbb aaaabbbb Set notation equivalent to this CFG is n n L = {a b | n > 0} 18 CFG – Generating strings • E.g. : Formal description of G = ({S}, {a, b}, P, S) P: S aS | bS | • The following strings can be derived: S ⇒ S ⇒ aS ⇒ a ⇒ a S ⇒ bS ⇒ b ⇒ b S ⇒ aS ⇒ aaS ⇒ aa ⇒ aa S ⇒ aS ⇒ abS ⇒ ab ⇒ ab S ⇒ bS ⇒ baS ⇒ ba ⇒ ba S ⇒ aS ⇒ abS ⇒ abbS ⇒ abb ⇒ abb 19 CFG – Generating strings • E.g. : G = ({S}, {a,b}, P, S) P: S aS | bS | • The language above can also be defined using regular expression: L(G) = (a + b)* 20 CFG – Generating strings • E.g. : G = ({S, A}, {a, b}, P, S) P: S AA A AAA | bA | Ab | a • The following strings can be derived: S ⇒ AA S ⇒ aA rule [A a] Shortest S ⇒ aAAA rule [A AAA] string ? S ⇒ abAAA rule [A bA] S ⇒ abaAA rule [A a] S ⇒ abaAbA rule [A Ab] S ⇒ abaabA rule [A a] S ⇒ abaaba rule [A a] 21 CFG – Generating strings G = (V, , P, S), V = {S, A}, = {a, b}, P: S AA A AAA | bA | Ab | a • Four distinct derivations of ababaa in G: (a) S ⇒ AA (b) S ⇒ AA (c) S ⇒ AA (d) S ⇒ AA ⇒ aA ⇒ AAAA ⇒ Aa ⇒ aA ⇒ aAAA ⇒ aAAA ⇒ AAAa ⇒ aAAA ⇒ abAAA ⇒ abAAA ⇒ AAbAa ⇒ aAAa ⇒ abaAA ⇒ abaAA ⇒ AAbaa ⇒ abAAa ⇒ ababAA ⇒ ababAA ⇒ AbAbaa ⇒ abAbAa ⇒ ababaA ⇒ ababaA ⇒ Ababaa ⇒ ababAa ⇒ ababaa ⇒ ababaa ⇒ ababaa ⇒ ababaa (a) & (b) leftmost, (c) rightmost, (d) arbitrary 22 Context-Free Grammars S ⇒ aS ⇒ aSa ⇒ aba • Sentencial forms are the strings derivable from start symbol of the grammar. • Sentences are forms that contain only terminal symbols. • A set of strings over is context-free language if there is a context-free grammar that generates it. 23 Derivation Tree, DT G = ({S, A}, {a, b}, P, S) P: S AA A AAA | bA | Ab | a The derivation tree for S ⇒ AA ⇒ aA ⇒ abA ⇒ abAb ⇒ abab 24 Derivation Tree S ⇒ aS ⇒ aSa ⇒ aba S ⇒ Sa ⇒ aSa ⇒ aba 25 Derivation Tree • A CFG G is ambiguous if there exist more than 1 DT for n, where n L(G). • Example: G = ({S}, {a, b}, P, S) P: S aS | Sa | b The string aba can be derived as: S ⇒ aS ⇒ aSa ⇒ aba or S ⇒ Sa ⇒ aSa ⇒ aba 26 Example • Let G be the grammar given by the production S aSa | aBa B bB | b • Then L(G) = {anbman | n > 0, m > 0} 27 Example • Let L(G) = {anbmcmd2n | n≥0, m>0} • Then the production rules for this grammar is: S aSdd | A A bAc | bc 28 Example • A string w is a palindrome if w = wR • The set of palindrome over {a, b} can be derived using rules: S a | b | S aSa | bSb 29 More Examples • What is the language of this grammar ? • S aA a(a + b)*b A aA | bA | b • S aA aa*bb* A aA | bB B bB | • S aS | bA a*bba* A bB B aB | 30 How to convert RE to CFG • General rules: • a + b S a | b • ab S aA A b • a* S → aS | λ Sept2011 Theory of Computer Science 31 Regular Grammars • Regular grammars are an important subclass of context-free grammars that play an important role in the lexical analysis and parsing of programming languages. • Regular grammars is a subset of CFG. • Regular grammars are obtained by placing restrictions on the form of the right hand side of the rules. 33 Definition • A regular grammar is a CFG in which each rule has one of the following form: 1. A a 2. A aB 3. A λ where A, B V, and a , A, B is a non- terminal and a can be one or more terminals 34 Regular Grammars • There is at most ONE variable in a sentential form – the rightmost symbol in the string. • Each rule application adds ONE terminal to the derived string. Derivation is terminated by rules: A a OR A λ 35 Example • Consider the grammar: G : S abSA | A Aa | • The equivalent regular grammar: Gr : S aB | B bS | bA A aA | 36 Example Syntax of Pascal in Backus-Naur Form <assign> <var> := <exp> <var> A B C <exp> <var> + <exp> <var> - <exp> (<exp>) <var> * <exp> <var> 37 Example Is A := B*(A+C) Syntactically correct? <assign> <var> := <expr> A := <expr> A := <var>*<expr> A := B*<expr> A := B*(<var>+<expr>) A := B*(A+<expr>) A := B*(A+<var>) A := B*(A+C) 38 Example Is A := B*(A+C) Syntactically correct? 39 How to construct RG from a RE • Use a new non-terminal for every new character • Each loop state turns into a recursive definition on a non-terminal • Example: R.E. ab*ab RG S aA A bA A aB B b Sept2011 Theory of Computer Science 40 Reg Expression to Reg Grammar • RE a(a + b)*b RG S → aA A → aA A → bA A → b • RE ab*a RG S → aA A → bA A → a Sept2011 Theory of Computer Science 41 How to construct RG from a RE (cont.) • Eg. R.E. a(a + b)*b Regular Grammar S aA A aA A bA A b Sept2011 Theory of Computer Science 42 RE ⇒ RG • Consider the regular expression: a+b* • The regular grammar is: S aS aR R bR 43 RE ⇒ RG ⇒ RL • The regular language L = a+b* can be defined as: L = ( V, , P, S) where: V = {S, R} = {a, b} P: S aS aR R bR 44 Non-Regular Language • The language L = a+b* can also be defined as: L = ( V, , P, S) where: V = {S, A, B} = {a, b} P: S AB non-regular A aA | a grammar B bB | 45 Non-RG to RG • Since every RG defines a RL and every RL must have a RE, it would be easier to go ahead and figure out what the RE is and convert it to the RG.