Parse Trees (=Derivation Tree) by Constructing a Derivation
Total Page:16
File Type:pdf, Size:1020Kb
CFG: Parsing •Generative aspect of CFG: By now it should be clear how, from a CFG G, you can derive strings wÎL(G). CFG: Parsing •Analytical aspect: Given a CFG G and strings w, how do you decide if wÎL(G) and –if so– how do you determine the derivation tree or the sequence of production rules Recognition of strings in a that produce w? This is called the problem of parsing. language 1 2 CFG: Parsing CFG: Parsing • Parser A program that determines if a string w Î L(G) Parse trees (=Derivation Tree) by constructing a derivation. Equivalently, A parse tree is a graphical representation it searches the graph of G. of a derivation sequence of a sentential form. – Top-down parsers • Constructs the derivation tree from root to leaves. • Leftmost derivation. Tree nodes represent symbols of the – Bottom-up parsers grammar (nonterminals or terminals) and • Constructs the derivation tree from leaves to tree edges represent derivation steps. root. • Rightmost derivation in reverse. 3 4 CFG: Parsing CFG: Parsing Parse Tree: Example Parse Tree: Example 1 E ® E + E | E * E | ( E ) | - E | id Given the following grammar: Lets examine this derivation: E Þ -E Þ -(E) Þ -(E + E) Þ -(id + id) E ® E + E | E * E | ( E ) | - E | id E E E E E Is the string -(id + id) a sentence in this grammar? - E - E - E - E Yes because there is the following derivation: ( E ) ( E ) ( E ) E Þ -E Þ -(E) Þ -(E + E) Þ -(id + id) E + E E + E This is a top-down derivation because we start building the id id parse tree at the top parse tree 5 6 1 CFG: Parsing CFG: Parsing S ® SS | a | b Parse Tree: Example 2 Rightmost Parse Tree: Example 2 abÎ L( S) derivation S Þ SS Þ Sb Þ ab S S S S S S S S Derivation S S S S Derivation S S S S Trees S S Trees a a b S S b a b S S S Leftmost S derivation S Þ SS Þ aS Þ ab Rightmost Derivation a b S S in Reverse 7 8 CFG: Parsing CFG: Parsing Practical Parsers Example 3 Consider the CFG grammar G • Language/Grammar designed to enable deterministic (directed S ® A and backtrack-free) searches. A ® T | A + T T ® b | ( A ) Show that (b)+b Î L(G)? – Top-down parsers : LL(k) languages S S • E.g., Pascal, Ada, etc. S S S S S S • Better error diagnosis and recovery. – Bottom-up parsers : LALR(1), LR(k) languages A A A A A A A • E.g., C/C++, Java, etc. A T A T A T A T A T A T • Handles left recursion in the grammar. T T T T T – Backtracking parsers A A • E.g., Prolog interpreter. A A T T T ( b )+ ( b )+ b 9 10 + + ( )+ ( )+ CFG: Parsing CFG: Parsing Top-down Exhaustive Parsing Flaws of Top-down Exhaustive Parsing nExhaustive parsing is a form of top-down parsing where nObvious flaw: it will take a long time and a lot of memory you start with S and systematically go through all possible (say for moderately long strings w: It is inefficient. leftmost) derivations until you produce the string w. n(You can remove sentential forms that will not work.) nFor cases wÏL(G) exhaustive parsing may never end. This will especially happen if we have rules like A? ? that make nExample: Can the CFG S ? SS | aSb | bSa | ? produce the the sentential forms ‘shrink’ so that we will never know if we string w = aabb, and how? went ‘too far’ with our parsing attempts. nAfter one step: S Þ SS or aSb or bSa or ?. nSimilar problems occur if the parsing can get in a loop according nAfter two steps: S Þ SSS or aSbS or bSaS or S, to A Þ B Þ A Þ B… or S Þ aSSb or aaSbb or abSab or ab. nFortunately, it is always possible to remove problematic rules ? ? nAfter three steps we see that: S Þ aSb Þ aaSbb Þ aabb. like A ? and A B from a CFG G. 11 12 2 Grammar Ambiguity Grammar Ambiguity Definition A string wÎL(G) is derived ambiguously if it has more than one derivation tree (or equivalently: if it has more than one leftmost derivation (or rightmost)). Definition: a string is derived ambiguously A grammar is ambiguous if some strings are derived in a context-free grammar if it has two or ambiguously. more different parse trees Typical example: rule S ® 0 | 1 | S+S | S´S Definition: a grammar is ambiguous if it S Þ S+S Þ S´S+S Þ 0´S+S Þ 0´1+S Þ 0´1+1 generates some string ambiguously versus S Þ S´S Þ 0´S Þ 0´S+S Þ 0´1+S Þ 0´1+1 13 14 Grammar Ambiguity Grammar Ambiguity The ambiguity of 0´1+1 is shown by the two Note that the two different derivations: different parse trees: S S Þ S+S Þ 0+S Þ 0+1 S S and S Þ S+S Þ S+1 Þ 0+1 do not constitute an ambiguous string 0 S ´ S 0+1 as have the same parse tree: + 1 S + S 0 Ambiguity causes troubles when trying to interpret strings 1 S + like: “She likes men who love women who don't smoke.” S ´ S S Solutions: Use parentheses, or use precedence rules such as a+(b´c) = a+b´c ? (a+b)´c. 0 1 1 1 15 16 Grammar Ambiguity Grammar Ambiguity Example <EXPR> ? <EXPR> + <EXPR> Inherently Ambiguous <EXPR> ? <EXPR> * <EXPR> <EXPR> ? ( <EXPR> ) nLanguages that can only be generated by <EXPR> ? a ambiguous grammars are inherently ambiguous. Build a parse tree for a + a * a nExample 5.13: L = {anbncm} È {anbmcm}. <EXPR> <EXPR> L ={ aib jck | i = j Ú j = k} nThe way to make a CFG for this L somehow has to <EXPR> <EXPR> involve the step S ? S1|S2 where S1 produces the n n m n m m <EXPR> <EXPR> strings a b c and S2 the strings a b c . nThis will be ambiguous on strings anbncn. <EXPR> <EXPR> <EXPR> <EXPR> 17 18 a + a * a a + a * a 3 Grammar Ambiguity Grammar Ambiguity Example Example E ® E + E | E * E | ( E ) | - E | id E ® E + E | E * E | ( E ) | - E | id Find a derivation for the expression: id + id * id Find a derivation for the expression: id + id * id E E E E E According to the grammar, both are correct. E + E E + E E + E E + E E * E id E * E id E * E id id A grammar that produces more than one id id E E E E parse tree for any input sentence is said E to be an ambiguous grammar. E * E E * E E * E E + E E + E E + E id E * E id Which derivation tree is correct? id id 19 id id 20 Grammar Ambiguity Grammar Ambiguity Example One way to resolve ambiguity is to associate precedence to the operators. stm ® if expr then stm Example Grammar: | if expr then stm • * has precedence over + else stm 1 + 2 * 3 = 1 + (2 * 3) 1 + 2 * 3 ? (1 + 2)*3 if B1 then if B2 then S1 else S2 Ambiguity: • Associativity and precedence information is typically vs used to disambiguate non-fully parenthesized if B1 then if B2 then S1 else S2 expressions containing unary prefix/postfix operators or binary infix operators. 21 22 Grammar Ambiguity Grammar Ambiguity Quiz 1 Quiz 2 Is the following grammar ambiguous? S ® PC | AQ Is the following grammar ambiguous? S ® aS | Sb | ab P ® aPb | l Yes: consider ab Yes: consider the string abc C ® cC | l Q ® bQc | l A ® aA | l 23 24 4 Grammar Ambiguity Simple Grammar Definition Quiz A CFG (V,T,S,P) is a simple grammar (s-grammar) if and only if all its productions are of the form Is the following grammar ambiguous? S ® SS | l A ? ax with AÎV, aÎT, xÎV* and any pair (A,a) occurs at most once. •Note, for simple grammars a left most derivation of a S string wÎL(G) is straightforward and requires time |w|. Yes •Example: Take the s-grammar S ? aS|bSS|c with aabcc: SS S Þ aS Þ aaS Þ aabSS Þ aabcS Þ aabcc. l Cyclic structure Quiz: is the grammar S ? aS|bSS|aSS|c s-grammar ? SSS (Illustrates ambiguous grammar with cycles.) 25 NO Why? The pair (S,a) occurs twice 26 Chomsky Normal Form CNF Even though we can’t get every grammar into right-linear form, or in general even Normal Forms get rid of ambiguity, there is an especially simple form that general CFG’s can be converted into: Chomsky Normal Form Griebach Normal Form 27 28 Chomsky Normal Form CNF Chomsky Normal Form Noam Chomsky came up with an especially simple Definition 6.4: A CFG is in Chomsky normal form type of context free grammars which is able to if and only if all production rules are of the form A ® BC capture all context free languages. or A ® x Chomsky's grammatical form is particularly useful with variables A,B,CÎV and xÎT. when one wants to prove certain facts about (Sometimes rule S®? is also allowed.) context free languages. This is because CFGs in CNF can be parsed in time O(|w|3). assuming a much more restrictive kind of grammar can often make it easier to prove that Named after Noam Chomsky who in the generated language has whatever property the 60s made seminal contributions you are interested in.