Context-Free Languages & Grammars (Cfls & Cfgs)

Context-Free Languages & Grammars (Cfls & Cfgs)

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5 1 Not all languages are regular n So what happens to the languages which are not regular? n Can we still come up with a language recognizer? n i.e., something that will accept (or reject) strings that belong (or do not belong) to the language? 2 Context-Free Languages n A language class larger than the class of regular languages n Supports natural, recursive notation called “context- free grammar” n Applications: Context- n Parse trees, compilers Regular free n XML (FA/RE) (PDA/CFG) 3 An Example n A palindrome is a word that reads identical from both ends n E.g., madam, redivider, malayalam, 010010010 n Let L = { w | w is a binary palindrome} n Is L regular? n No. n Proof: N N (assuming N to be the p/l constant) n Let w=0 10 k n By Pumping lemma, w can be rewritten as xyz, such that xy z is also L (for any k≥0) n But |xy|≤N and y≠e + n ==> y=0 k n ==> xy z will NOT be in L for k=0 n ==> Contradiction 4 But the language of palindromes… is a CFL, because it supports recursive substitution (in the form of a CFG) n This is because we can construct a “grammar” like this: Same as: 1. A ==> e A => 0A0 | 1A1 | 0 | 1 | e Terminal 2. A ==> 0 3. A ==> 1 Variable or non-terminal Productions 4. A ==> 0A0 5. A ==> 1A1 How does this grammar work? 5 How does the CFG for palindromes work? An input string belongs to the language (i.e., accepted) iff it can be generated by the CFG G: n Example: w=01110 A => 0A0 | 1A1 | 0 | 1 | e n G can generate w as follows: Generating a string from a grammar: 1. A => 0A0 1. Pick and choose a sequence 2. => 01A10 of productions that would allow us to generate the 3. => 01110 string. 2. At every step, substitute one variable with one of its productions. 6 Context-Free Grammar: Definition n A context-free grammar G=(V,T,P,S), where: n V: set of variables or non-terminals n T: set of terminals (= alphabet U {e}) n P: set of productions, each of which is of the form V ==> a1 | a2 | … n Where each ai is an arbitrary string of variables and terminals n S ==> start variable CFG for the language of binary palindromes: G=({A},{0,1},P,A) P: A ==> 0 A 0 | 1 A 1 | 0 | 1 | e 7 More examples n Parenthesis matching in code n Syntax checking n In scenarios where there is a general need for: n Matching a symbol with another symbol, or n Matching a count of one symbol with that of another symbol, or n Recursively substituting one symbol with a string of other symbols 8 Example #2 n Language of balanced paranthesis e.g., ()(((())))((()))…. n CFG? G: S => (S) | SS | e How would you “interpret” the string “(((()))()())” using this grammar? Sà SS à (S) à (SS)à((S)SS)à (((S))(S)(S))à (((()))()()) 9 Example #3 m n n A grammar for L = {0 1 | m≥n} n CFG? G: S => 0S1 | A A => 0A | e How would you interpret the string “00000111” using this grammar? Sà0S1à 0 0S1 1à 0 00S11 1à 0 000A11 1à 0 0000A11 1à 000001111 10 Example #4 A program containing if-then(-else) statements if Condition then Statement else Statement (Or) if Condition then Statement CFG? 11 More examples n n L1 = {0 | n≥0 } n n L2 = {0 | n≥1 } i j k n L3={0 1 2 | i=j or j=k, where i,j,k≥0} i j k n L4={0 1 2 | i=j or i=k, where i,j,k≥1} 12 Applications of CFLs & CFGs n Compilers use parsers for syntactic checking n Parsers can be expressed as CFGs 1. Balancing paranthesis: n B ==> BB | (B) | Statement n Statement ==> … 2. If-then-else: n S ==> SS | if Condition then Statement else Statement | if Condition then Statement | Statement n Condition ==> … n Statement ==> … 3. C paranthesis matching { … } 4. Pascal begin-end matching 5. YACC (Yet Another Compiler-Compiler) 13 More applications n Markup languages n Nested Tag Matching n HTML n <html> …<p> … <a href=…> … </a> </p> … </html> n XML n <PC> … <MODEL> … </MODEL> .. <RAM> … </RAM> … </PC> 14 Tag-Markup Languages Roll ==> <ROLL> Class Students </ROLL> Class ==> <CLASS> Text </CLASS> Text ==> Char Text | Char Char ==> a | b | … | z | A | B | .. | Z Students ==> Student Students | e Student ==> <STUD> Text </STUD> Here, the left hand side of each production denotes one non-terminals (e.g., “Roll”, “Class”, etc.) Those symbols on the right hand side for which no productions (i.e., substitutions) are defined are terminals (e.g., ‘a’, ‘b’, ‘|’, ‘<‘, ‘>’, “ROLL”, etc.) 15 Structure of a production head derivation body A =======> a1 | a2 | … | ak The above is same as: 1. A ==> a1 2. A ==> a2 3. A ==> a3 … K. A ==> ak 16 CFG conventions n Terminal symbols <== a, b, c… n Non-terminal symbols <== A,B,C, … n Terminal or non-terminal symbols <== X,Y,Z n Terminal strings <== w, x, y, z n Arbitrary strings of terminals and non- terminals <== a, b, g, .. 17 Syntactic Expressions in Programming Languages result = a*b + score + 10 * distance + c terminals variables Operators are also terminals Regular languages have only terminals n Reg expression = [a-z][a-z0-1]* n If we allow only letters a & b, and 0 & 1 for constants (for simplification) n Regular expression = (a+b)(a+b+0+1)* 18 String membership How to say if a string belong to the language defined by a CFG? 1. Derivation n Head to body Both are equivalent forms 2. Recursive inference n Body to head G: Example: A => 0A0 | 1A1 | 0 | 1 | e n w = 01110 A => 0A0 n Is w a palindrome? => 01A10 => 01110 19 Simple Expressions… n We can write a CFG for accepting simple expressions n G = (V,T,P,S) n V = {E,F} n T = {0,1,a,b,+,*,(,)} n S = {E} n P: n E ==> E+E | E*E | (E) | F n F ==> aF | bF | 0F | 1F | a | b | 0 | 1 20 Generalization of derivation n Derivation is head ==> body n A==>X (A derives X in a single step) n A ==>*G X (A derives X in a multiple steps) n Transitivity: IF A ==>*GB, and B ==>*GC, THEN A ==>*G C 21 Context-Free Language n The language of a CFG, G=(V,T,P,S), denoted by L(G), is the set of terminal strings that have a derivation from the start variable S. n L(G) = { w in T* | S ==>*G w } 22 Left-most & Right-most G: Derivation Styles E => E+E | E*E | (E) | F F => aF | bF | 0f | 1f | e * Derive the string a*(ab+10) from G: E = =>G a*(ab+10) nE nE n==> E * E n==> E * E Left-most n==> F * E n==> E * (E) Right-most n==> aF * E n==> E * (E + E) derivation: derivation: n==> a * E n==> E * (E + F) n==> a * (E) n==> E * (E + 1F) Always Always substitute n==> a * (E + E) n==> E * (E + 10F) n==> a * (F + E) n==> E * (E + 10) substitute leftmost n==> a * (aF + E) n==> E * (F + 10) rightmost variable n==> a * (abF + E) n==> E * (aF + 10) variable n==> a * (ab + E) n==> E * (abF + 0) n==> a * (ab + F) n==> E * (ab + 10) n==> a * (ab + 1F) n==> F * (ab + 10) n==> a * (ab + 10F) n==> aF * (ab + 10) n==> a * (ab + 10) n==> a * (ab + 10) 23 Leftmost vs. Rightmost derivations Q1) For every leftmost derivation, there is a rightmost derivation, and vice versa. True or False? True - will use parse trees to prove this Q2) Does every word generated by a CFG have a leftmost and a rightmost derivation? Yes – easy to prove (reverse direction) Q3) Could there be words which have more than one leftmost (or rightmost) derivation? Yes – depending on the grammar 24 How to prove that your CFGs are correct? (using induction) 25 Gpal: CFG & CFL A => 0A0 | 1A1 | 0 | 1 | e n Theorem: A string w in (0+1)* is in L(Gpal), if and only if, w is a palindrome. n Proof: n Use induction n on string length for the IF part n On length of derivation for the ONLY IF part 26 Parse trees 27 Parse Trees n Each CfG can be represented using a parse tree: n Each internal node is labeled by a variable in V n Each leaf is terminal symbol n for a production, A==>X1X2…Xk, then any internal node labeled A has k children which are labeled from X1,X2,…Xk from left to right Parse tree for production and all other subsequent productions: A ==> X1..Xi..Xk A X1 … Xi … Xk 28 Examples E A E + E 0 A 0 f f 1 A 1 a 1 e Derivation Recursiveinference Parse tree for 0110 Parse tree for a + 1 G: G: E => E+E | E*E | (E) | F A => 0A0 | 1A1 | 0 | 1 | e F => aF | bF | 0f | 1f | 0 | 1 | a | b 29 Parse Trees, Derivations, and Recursive Inferences Production: A ==> X ..X ..X A 1 i k X1 … Xi … Xk Derivation Recursive inference Left-most Parse tree derivation Derivation Right-most derivation Recursive inference 30 Interchangeability of different CFG representations n Parse tree ==> left-most derivation n DFS left to right n Parse tree ==> right-most derivation n DFS right to left n ==> left-most derivation == right-most derivation n Derivation ==> Recursive inference n Reverse the order of productions n Recursive inference ==> Parse trees n bottom-up traversal of parse tree 31 Connection between CFLs and RLs 32 What kind of grammars result for regular languages? CFLs & Regular Languages n A CFG is said to be right-linear if all the productions are one of the following two forms: A ==> wB (or) A ==> w Where: • A & B are variables, • w is a string of terminals n Theorem 1: Every right-linear CFG generates a regular language n Theorem 2: Every regular language has a right-linear grammar n Theorem 3: Left-linear CFGs also represent RLs 33 Some Examples 0 1 0,1 0 1 ØA => 01B | C 1 0 B => 11B | 0C | 1A A B C 1 0 A B 1 C C => 1A | 0 | 1 0 Right linear CFG? Right linear CFG? Finite Automaton? 34 Ambiguity in CFGs and CFLs 35 Ambiguity in CFGs n A CFG is said to be ambiguous if there exists a string which has more than one left-most derivation Example: S ==> AS | e LM derivation #1: LM derivation #2: S => AS S => AS A ==> A1 | 0A1 | 01 => 0A1S => A1S =>0A11S => 0A11S => 00111S => 00111S Input string: 00111 => 00111 => 00111 Can be derived in two ways 36 Why does ambiguity matter? Values are E ==> E + E | E * E | (E) | a | b | c | 0 | 1 different !!! string = a * b + c E • LM derivation #1: •E => E + E => E * E + E E + E (a*b)+c ==>* a * b + c E * E c a b E • LM derivation #2 •E => E * E => a * E => E * E a*(b+c) a * E + E ==>* a * b + c a E + E The calculated value depends on which b c of the two parse trees is actually used.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    40 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us