Context Free Languages and Pushdown Automata
COMP2600 — Formal Methods for Software Engineering
Ranald Clouston
Australian National University
Semester 2, 2013

COMP 2600 — Context Free Languages and Pushdown Automata

Parsing

The process of parsing a program is partly about confirming that a given program is well-formed (syntax). But it is also about representing the structure of the program so that it can be executed (semantics).

For this purpose, the trail of sentential forms created en route to generating a given sentence is just as important as the question of whether the sentence can be generated at all.

The Semantics of Parses

Take the code

    if e1 then if e2 then s1 else s2

where e1, e2 are boolean expressions and s1, s2 are subprograms. Does this mean

    if e1 then (if e2 then s1 else s2)

or

    if e1 then (if e2 then s1) else s2

We'd better have an unambiguous way to tell which is right, or we cannot know what the program will do at runtime!

Ambiguity

Recall that we can present CFG derivations as parse trees. Until now this was mere pretty presentation; now it will become important.

A context-free grammar G is unambiguous iff every string can be derived by at most one parse tree.

G is ambiguous iff there exists some word $w \in L(G)$ derivable by more than one parse tree.

Example: If-Then and If-Then-Else

Consider the CFG

    S → if bexp then S | if bexp then S else S | prog

where bexp and prog stand for boolean expressions and (if-statement free) programs respectively, defined elsewhere.
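For a small grammar like this one, the number of parse trees for a particular string can be counted mechanically. The Python sketch below is an illustration, not course code: the helper `count_parse_trees` and the grammar encoding are hypothetical. It tries every way of splitting the input among the symbols of each production. It assumes the grammar has no ε-productions and no cycles of unit productions, so that the recursion terminates. It also treats bexp and prog as single tokens, writing e1, e2 as `bexp` and s1, s2 as `prog`.

```python
from functools import lru_cache

# Ambiguous grammar from the slide: S → if bexp then S | if bexp then S else S | prog.
# bexp and prog are treated as single terminal tokens (a simplifying assumption).
AMBIGUOUS_IF = {
    "S": [
        ("if", "bexp", "then", "S"),
        ("if", "bexp", "then", "S", "else", "S"),
        ("prog",),
    ],
}


def count_parse_trees(grammar, start, tokens):
    """Count the distinct parse trees for `tokens` rooted at `start`.

    Assumes no ε-productions and no cycles of unit productions, so every
    symbol derives at least one token and the recursion always terminates.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        if symbol not in grammar:  # terminal: must match exactly one token
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_sequence(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        first, rest = rhs[0], rhs[1:]
        total = 0
        # `first` covers tokens[i:k]; leave at least one token per remaining symbol
        for k in range(i + 1, j - len(rest) + 1):
            left = count_symbol(first, i, k)
            if left:
                total += left * count_sequence(rest, k, j)
        return total

    return count_symbol(start, 0, len(tokens))


# if e1 then if e2 then s1 else s2, with e1, e2 written as bexp and s1, s2 as prog
dangling_else = "if bexp then if bexp then prog else prog".split()
print(count_parse_trees(AMBIGUOUS_IF, "S", dangling_else))  # 2
```

Running the same function on the unambiguous grammar S → if bexp then S | T, T → if bexp then T else S | prog gives 1 for the same token list. Note that the helper only counts trees for one string at a time. It cannot decide whether a grammar is ambiguous, which is uncomputable in general.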
The string if e1 then if e2 then s1 else s2 then has two parse trees. They are shown here in bracketed form, where [S ...] is a node labelled S whose children are listed inside:

    [S if e1 then [S if e2 then [S s1] else [S s2]]]
    [S if e1 then [S if e2 then [S s1]] else [S s2]]

That grammar was ambiguous. But here is a grammar accepting exactly the same language that is unambiguous:

    S → if bexp then S | T
    T → if bexp then T else S | prog

There is now only one parse tree for if e1 then if e2 then s1 else s2:

    [S if e1 then [S [T if e2 then [T s1] else [S [T s2]]]]]

You cannot parse this string as if e1 then (if e2 then s1) else s2.

Reflecting on This Example

We have seen that the same language can be presented by both ambiguous and unambiguous grammars. Ambiguity is in general a property of grammars, not of languages.

From the point of view of semantics and parsing, it is often desirable to turn an ambiguous grammar into an equivalent unambiguous one. This generally involves choices that are not driven by the language itself. For example, there is another unambiguous grammar for the language of the previous slides that allows the parse if e1 then (if e2 then s1) else s2 but not if e1 then (if e2 then s1 else s2).

What Ambiguity Isn't

You might wonder whether our grammar is still ambiguous, given the production

    T → if bexp then T else S

After a derivation step that uses this production, we have a choice of whether to expand the T or the S first. Is this an ambiguity?

No. A context-free grammar gets its name because non-terminals can be expanded without regard for the context they appear in.
In other words, it does not matter whether you expand T or S 'first'. From the perspective of the parse tree these expansions happen in parallel. This is one reason why a parse tree is a better formalism for presenting a CFG derivation than a list of individual steps.

Inherently Ambiguous Languages

In fact not all context-free languages can be given unambiguous grammars: some are inherently ambiguous. Consider the language

    $L = \{a^i b^j c^k \mid i = j \text{ or } j = k\}$

How do we know that this is context-free? First, notice that

    $L = \{a^i b^i c^k\} \cup \{a^i b^j c^j\}$

We then combine CFGs for each side of this union (a standard trick):

    S → T | W
    T → U V
    U → a U b | ε
    V → c V | ε
    W → X Y
    X → a X | ε
    Y → b Y c | ε

The problem with this CFG is that the two languages in the union have a non-empty intersection, namely the strings in which the a's, b's and c's all occur equally often. The sentences in this intersection are a source of ambiguity. For example, abc has two parse trees:

    [S [T [U a [U ε] b] [V c [V ε]]]]
    [S [W [X a [X ε]] [Y b [Y ε] c]]]

In fact (not proved here!) there is no alternative grammar for this language that avoids this ambiguity.

The Bad News

We would like to have an algorithm that turns grammars into equivalent unambiguous grammars where possible. However, this is an uncomputable problem.

Worse, determining whether a grammar is ambiguous in the first place is also uncomputable!

Uncomputable problems are everywhere in computer science. You will see many more next week in particular.

The best response is not despair; rather, we should see what tricks and techniques might help us out at least some of the time.

Example: Subtraction

Consider the grammar

    S → S − S | int

where int could be any integer and the symbol '−' is intended to be executed as subtraction.

This grammar is ambiguous, and this matters. The string 5 − 3 − 1 has two parse trees (writing each integer directly in place of its S → int subtree):

    [S [S 5 − 3] − 1]
    [S 5 − [S 3 − 1]]

The left tree evaluates 5 − 3 − 1 to 1; the right tree evaluates the same string to 3!

Technique 1: Associativity

We can remove the ambiguity of a binary infix operator by making it associate to the left or to the right:

    S → S − int | int

Now 5 − 3 − 1 can only be read as (5 − 3) − 1; this is left associativity. For right associativity we would use the production S → int − S instead.

Idea: force one side of the operator to a 'lower' level, making sure we avoid loops in the grammar that allow the original non-terminal to be recovered. Here we force the right-hand side of the minus sign to the lowest possible level: a terminal symbol.

Example: Multiplication and Addition

    S → S ∗ S | S + S | int

where ∗ is to be executed as multiplication and + as addition.

Again this is obviously ambiguous: 1 + 2 ∗ 3 could evaluate to 7 or 9. (Note that 1 + 2 + 3 is also a source of ambiguity, as it can be produced by different parse trees, even though it is not ambiguous under the interpretation we have in mind, since addition is associative.)

If all we cared about were resolving ambiguity, we could use the same trick as before, making both ∗ and + left (or right) associative. But this is not the behaviour we expect from these operations: we expect ∗ to have higher precedence than +.

Technique 2: Precedence

    S → S + T | T
    T → T ∗ int | int

Given a string 1 + 2 ∗ 3, or 2 ∗ 3 + 1, we have no choice but to expand S to S + T first, so that (thinking bottom-up) the + will be the last operation executed.

Suppose we tried to derive 1 + 2 ∗ 3 by first doing S ⇒ T ⇒ T ∗ 3.
We are then stuck, because we cannot send T to 1 + 2!

As with associativity, this trick works by forcing ourselves down to a lower level to generate parts of our sentences. Here we have three levels: S, then T, then the integers as terminals.

Example: Basic Arithmetic

    S → S + T | S − T | T
    T → T ∗ U | T / U | U
    U → (S) | int

Note that we have brackets available to give a clearly labelled way to loop back to the top: if we want to break the usual rules of arithmetic to get the execution (1 + 2) ∗ 3, then we must indicate this with explicit brackets.

(Note also that the previous slides' languages were actually regular, and so could have been generated by right-linear grammars. The language above is truly context-free because of the need to keep track of bracket balancing to an arbitrary depth.)

Example: Balanced Brackets

The following grammar generates a language in which each left bracket is exactly matched by a closing bracket:

    S → ε | (S) | SS

This is ambiguous in a rather stupid way: to generate () we could use the derivation S ⇒ (S) ⇒ (), which seems sensible.
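Returning to the basic arithmetic grammar (S → S + T | S − T | T, T → T ∗ U | T / U | U, U → (S) | int), its three levels map directly onto a recursive-descent evaluator. The Python sketch below is an illustration under stated assumptions, not code from the course. The names `ArithmeticParser` and `evaluate` are hypothetical. It writes −, ∗ and / as ASCII `-`, `*` and `/`, accepts only non-negative integer literals for int, and uses exact `Fraction` arithmetic. Recursive descent loops forever on a left-recursive production such as S → S + T, so each left-recursive rule is rewritten as a loop that folds results from the left. This preserves left associativity.

```python
import re
from fractions import Fraction


class ArithmeticParser:
    """Recursive-descent evaluator for
         S → S + T | S − T | T
         T → T ∗ U | T / U | U
         U → (S) | int
    using ASCII '-', '*', '/' and non-negative integer literals (assumptions).
    """

    def __init__(self, text):
        # Integer literals, operators and brackets; any other non-space
        # character becomes its own token and is rejected by parse_u.
        self.tokens = re.findall(r"[0-9]+|[-+*/()]|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse_s(self):
        # S → S + T | S − T | T, rewritten as T {(+|−) T}; folding from the
        # left keeps + and − left associative.
        value = self.parse_t()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.parse_t()
            value = value + rhs if op == "+" else value - rhs
        return value

    def parse_t(self):
        # T → T ∗ U | T / U | U, rewritten as U {(∗|/) U}.
        value = self.parse_u()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.parse_u()
            value = value * rhs if op == "*" else value / rhs
        return value

    def parse_u(self):
        # U → (S) | int: brackets loop back to the top level S.
        tok = self.take()
        if tok == "(":
            value = self.parse_s()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if re.fullmatch(r"[0-9]+", tok):
            return Fraction(int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = ArithmeticParser(text)
    value = parser.parse_s()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("5 - 3 - 1"))    # 1, i.e. (5 - 3) - 1
print(evaluate("1 + 2 * 3"))    # 7: * binds tighter than +
print(evaluate("(1 + 2) * 3"))  # 9: brackets override precedence
```

Because `parse_s` consumes a complete T before looking for + or −, the string 1 + 2 ∗ 3 is evaluated as 1 + (2 ∗ 3). This mirrors the observation that S must be expanded to S + T first. Likewise, 8 / 4 / 2 evaluates to 1 rather than 4. The loop form is only one way to handle left recursion; an LR (bottom-up) parser could use the left-recursive grammar directly.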