LR Parsing LALR Parser Generators

Outline LR Parsing • Review of bottom-up parsing LALR Parser Generators • Computing the parsing DFA • Using parser generators Compiler Design I (2011) 2 Bottom-up Parsing (Review) The Shift and Reduce Actions (Review) • A bottom-up parser rewrites the input string • Recall the CFG: E → int | E + (E) to the start symbol • A bottom-up parser uses two kinds of actions: • The state of the parser is described as α I γ • Shift pushes a terminal from input on the – α is a stack of termin als and non-terminals stack – γ is the string of terminals not yet examined E + ( I int ) ⇒ E + ( int I ) • Reduce pops 0 or more symbols off of the • Initially: I x1x2 . xn stack (production RHS) and pushes a non- terminal on the stack (production LHS) E + (E + ( E ) I ) ⇒ E + ( E I ) Compiler Design I (2011) 3 Compiler Design I (2011) 4 Key Issue: When to Shift or Reduce? LR(1) Parsing: An Example int I int + (int) + (int)$ shift • Idea: use a deterministic finite automaton 0 1 E I → (DFA) to decide when to shift or reduce E → int int + (int) + (int)$ E int ( on $, + E I + (int ) + (int)$ shift (x3) – The input is the stack 2 + 3 4 E + (int I ) + (int)$ E → int – The language consists of terminals and non-terminals accept on $ E int E + (E I ) + (int)$ shift ) E → int I → • We run the DFA on the stack and we examine 7 6 5 E + (E) + (int)$ E E+(E) on ), + the resulting state X and the token tok after I E → E + (E) E I + (int)$ shift (x3) on $, + + int E + (int I )$ E → int – If X has a transition labeled tok then shift ( E + (E I )$ shift – If X is labeled with “A →βon tok”then reduce 8 9 E + (E) I $ E → E+(E) E + E I $ accept 10 11 E → E + (E) ) on ), + Compiler Design I (2011) 5 Representing the DFA Representing the DFA: Example • Parsers represent the DFA as a 2D table The table for a fragment of our DFA: – Recall table-driven lexical analysis int + ( ) $ E ( • Lines correspond to DFA states 3 4 … 3 s4 • Columns correspond to terminals and non- E int terminals 4 s5 g6 5 • Typically columns are split into: 6 5 rE → int rE → int E → int 6 s8 s7 – Those for terminals: the action table ) on ), + 7 r → r → – Those for non-terminals: the goto table E E+(E) E E+(E) … 7 sk is shift and goto state k E → E + (E) rX → α is reduce on $, + gk is goto state k Compiler Design I (2011) 7 Compiler Design I (2011) 8 The LR Parsing Algorithm The LR Parsing Algorithm • After a shift or reduce action we rerun the let I = w$ be initial input DFA on the entire stack let j = 0 – This is wasteful, since most of the work is repeated let DFA state 0 be the start state let stack = 〈 dummy, 0 〉 repeat • Remember for each stack element on which case action[top_state(stack), I[j]] of state it brings the DFA shift k: push 〈 I[j++], k 〉 reduce X → A: • LR parser maintains a stack pop |A| pairs, 〈 sym1, state1 〉 . 〈 symn, staten 〉 push 〈 X, goto[top_state(stack), X] 〉 statek is the final state of the DFA on sym1 …sy mk accept: halt normally error: halt and report error Compiler Design I (2011) 9 Compiler Design I (2011) 10 Key Issue: How is the DFA Constructed? LR(0) Items • The stack describes the context of the parse • An LR(0) item is a production with a “I” – What non-terminal we are looking for somewhere on the RHS – What production RHS we are looking for – What we have seen so far from the RHS • The items for T → (E) are T → I (E) • Each DFA state describes several such T → ( I E) contexts T → (E I ) – E.g., when we are looking for non-terminal E, we T → (E) I might be looking either for an int or an E + (E) RHS • The only item for X →εis X → I Compiler Design I (2011) 11 Compiler Design I (2011) 12 LR(0) Items: Intuition LR(1) Items • An item [X →α I β] says that • An LR(1) item is a pair: – the parser is looking for an X X →α I β, a – it has an α on top of the stack – X →αβis a production – Expects to find a string derived from β next in the – a is a terminal (the lookahead terminal) input – LR(1) means 1 lookahead terminal • [X →α I β, a] describes a context of the parser • Notes: – We are trying to find an X followed by an a, and →αI β – [X a ] means that a should follow. Then we – We have (at least) α already on top of the stack can shift it and still have a viable prefix – Thus we need to see next a prefix derived from βa →α I – [X ] means that we could reduce X • But this is not always a good idea ! Compiler Design I (2011) 13 Compiler Design I (2011) 14 Note Convention • The symbol I was used before to separate the • We add to our grammar a fresh new start stack from the rest of input symbol S and a production S → E – α I γ, where α is the stack and γ is the remaining – Where E is the old start symbol string of terminals • In items I is used to mark a prefix of a • The initial parsing context contains: production RHS: S → I E , $ X →α I β, a – Trying to find an S as a string derived from E$ – Here β might contain terminals as well – The stack is empty • In both case the stack is on the left of I Compiler Design I (2011) 15 Compiler Design I (2011) 16 LR(1) Items (Cont.) LR(1) Items (Cont.) • In context containing • Consider the item E → E + I ( E ) , + E → E + ( I E ) , + – If ( follows then we can perform a shift to context • We expect a string derived from E ) + containing • There are two productions for E → I E E + ( E ) , + E → int and E → E + ( E) • In context containing • We describe this by extending the context E → E + ( E ) I , + with two more items: – We can perform a reduction with E → E + ( E ) E → I int , ) – But only if a + follows E → I E + ( E ) , ) Compiler Design I (2011) 17 Compiler Design I (2011) 18 The Closure Operation Constructing the Parsing DFA (1) • The operation of extending the context with • Construct the start context: Closure({S → I E, $}) items is called the closure operation S → I E , $ E → I E+(E), $ Closure(Items) = E → I int , $ repeat E → I E+(E), + for each [X → α I Yβ, a] in Items E → I int , + for each production Y →γ for each b in First(βa) • We abbreviate as: add [Y → I γ, b] to Items S → I E , $ until Items is unchanged E → I E+(E) , $/+ E → I int , $/+ Compiler Design I (2011) 19 Compiler Design I (2011) 20 Constructing the Parsing DFA (2) The DFA Transitions • A DFA state is a closed set of LR(1) items • A state “State” that contains [X →α I yβ, b] has a transition labeled y to a state that • The start state contains [S → I E , $] contains the items “Transition (State, y)” – y can be a terminal or a non-terminal • A state that contains [X →α I, b] is labelled Transition(State, y) with “reduce with X →αon b” Items = ∅ for each [X → α I yβ, b] in State • And now the transitions … add [X → αy I β, b] to Items return Closure(Items) Compiler Design I (2011) 21 Compiler Design I (2011) 22 Constructing the Parsing DFA: Example LR Parsing Tables: Notes 0 1 • Parsing tables (i.e., the DFA) can be S → I E , $ E → int E → int I, $/+ → I on $, + constructed automatically for a CFG E E+(E), $/+ int E → I int , $/+ E → E+ I (E), $/+ 3 E 2 + • But we still need to understand the S → E I , $ ( construction to work with parser generators E → E I +(E), $/+ E → E+(I E) , $/+ 4 – E.g., they report errors in terms of sets of items accept E E → I E+(E) , )/+ on $ E → I int , )/+ • What kind of errors can we expect? → I int 6 E E+(E ) , $/+ 5 E → E I +(E) , )/+ E → int I , )/+ E → int + ) on ), + Compiler Design I (2011) and so on… 23 Compiler Design I (2011) 24 Shift/Reduce Conflicts Shift/Reduce Conflicts • If a DFA state contains both • Typically due to ambiguities in the grammar [X → α I aβ, b] and [Y → γ I, a] • Classic example: the dangling else S → if E then S | if E then S else S | OTHER • Then on input “a” we could either • Will have DFA state containing – Shift into state [X → α a I β, b], or [S → if E then S I, else] – Reduce with Y → γ [S → if E then S I else S, x] • If else follows then we can shift or reduce • This is called a shift-reduce conflict • Default (yacc, ML-yacc, etc.) is to shift – Default behavior is as needed in this case Compiler Design I (2011) 25 Compiler Design I (2011) 26 More Shift/Reduce Conflicts More Shift/Reduce Conflicts • Consider the ambiguous grammar • In yacc declare precedence and associativity: E → E + E | E * E | int %left + %left * • We will have the states containing [E → E * I E, +] [E → E * E I, +] • Precedence of a rule = that of its last terminal [E → I E + E, +] ⇒E [E → E I + E, +] See yacc manual for ways to override this default …… • Resolve shift/reduce conflict with a shift if: • Again we have a shift/reduce on input + – no precedence declared for either rule or terminal – We need to reduce (* binds more tightly than +) – input terminal has higher precedence than the rule – Recall solution: declare the precedence of * and + – the precedences are the same and right associative Compiler Design I (2011) 27 Compiler Design I (2011) 28 Using Precedence to Solve S/R Conflicts Using Precedence to Solve S/R Conflicts • Back to our example: • Same grammar as before [E → E * I E, +] [E →E * E I, +] E → E + E | E * E | int [E → I E + E, +] ⇒E [E →E I + E, +] • We will also have the states …… [E → E + I E, +] [E → E + E I, +] [E → I E + E, +] ⇒E [E → E I + E, +] • Will choose reduce because precedence of …… rule E → E * E is higher than of terminal + • Now we also have a shift/reduce on input + – We choose reduce because E → E + E and + have the same precedence and + is left-associative Compiler Design I (2011) 29 Compiler Design I (2011) 30 Using Precedence to Solve S/R Conflicts Precedence Declarations Revisited • Back to our dangling else example The term “precedence declaration” is misleading!

LR Parsing LALR Parser Generators

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support