
LR Parsing
LALR Parser Generators

Outline
• Review of bottom-up parsing
• Computing the parsing DFA
• Using parser generators

Bottom-up Parsing (Review)
• A bottom-up parser rewrites the input string to the start symbol
• The state of the parser is described as α I γ
  – α is a stack of terminals and non-terminals
  – γ is the string of terminals not yet examined
• Initially: I x1 x2 … xn

The Shift and Reduce Actions (Review)
• Recall the CFG: E → int | E + (E)
• A bottom-up parser uses two kinds of actions:
• Shift pushes a terminal from the input on the stack
    E + ( I int )  ⇒  E + ( int I )
• Reduce pops 0 or more symbols off of the stack (the production RHS) and pushes a non-terminal on the stack (the production LHS)
    E + ( E + ( E ) I )  ⇒  E + ( E I )

Key Issue: When to Shift or Reduce?
• Idea: use a deterministic finite automaton (DFA) to decide when to shift or reduce
  – The input is the stack
  – The language consists of terminals and non-terminals
• We run the DFA on the stack and we examine the resulting state X and the token tok after I
  – If X has a transition labeled tok, then shift
  – If X is labeled with “A → β on tok”, then reduce

LR(1) Parsing: An Example
(DFA diagram with states 0–11; state 1 is labeled “E → int on $, +”, state 2 “accept on $”, state 5 “E → int on ), +”, state 7 “E → E+(E) on $, +”, state 11 “E → E+(E) on ), +”)

  I int + (int) + (int)$     shift
  int I + (int) + (int)$     E → int
  E I + (int) + (int)$       shift (x3)
  E + (int I ) + (int)$      E → int
  E + (E I ) + (int)$        shift
  E + (E) I + (int)$         E → E+(E)
  E I + (int)$               shift (x3)
  E + (int I )$              E → int
  E + (E I )$                shift
  E + (E) I $                E → E+(E)
  E I $                      accept

Representing the DFA
• Parsers represent the DFA as a 2D table
  – Recall table-driven lexical analysis
• Lines correspond to DFA states
• Columns correspond to terminals and non-terminals
• Typically columns are split into:
  – Those for terminals: the action table
  – Those for non-terminals: the goto table

Representing the DFA: Example
The table for a fragment of our DFA:

       int    +             (     )             $             E
  3                         s4
  4    s5                                                     g6
  5           rE → int            rE → int
  6           s8                  s7
  7           rE → E+(E)                        rE → E+(E)

  sk is shift and goto state k
  gk is goto state k
  rX → α is reduce

The LR Parsing Algorithm
• After a shift or reduce action we rerun the DFA on the entire stack
  – This is wasteful, since most of the work is repeated
• Remember for each stack element on which state it brings the DFA
• The LR parser maintains a stack
    〈 sym1, state1 〉 … 〈 symn, staten 〉
  statek is the final state of the DFA on sym1 … symk

The LR Parsing Algorithm
  let I = w$ be the initial input
  let j = 0
  let DFA state 0 be the start state
  let stack = 〈 dummy, 0 〉
  repeat
    case action[top_state(stack), I[j]] of
      shift k:        push 〈 I[j++], k 〉
      reduce X → A:   pop |A| pairs,
                      push 〈 X, goto[top_state(stack), X] 〉
      accept:         halt normally
      error:          halt and report error
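To make the driver loop above concrete, here is a small Python sketch of the table-driven algorithm for the example grammar E → int | E + (E). Only a fragment of the table appears on the slides, so the full set of ACTION/GOTO rows below is reconstructed from the DFA picture and should be read as illustrative rather than authoritative.

  # Sketch of the table-driven LR parsing loop from the slides, in Python.
  # The ACTION/GOTO tables encode the example DFA for E -> int | E + ( E );
  # rows beyond the fragment shown on the slides are a reconstruction.

  # ACTION[state][terminal] = ("shift", k) | ("reduce", lhs, |rhs|) | ("accept",)
  ACTION = {
      0:  {"int": ("shift", 1)},
      1:  {"+": ("reduce", "E", 1), "$": ("reduce", "E", 1)},
      2:  {"+": ("shift", 3), "$": ("accept",)},
      3:  {"(": ("shift", 4)},
      4:  {"int": ("shift", 5)},
      5:  {"+": ("reduce", "E", 1), ")": ("reduce", "E", 1)},
      6:  {"+": ("shift", 8), ")": ("shift", 7)},
      7:  {"+": ("reduce", "E", 5), "$": ("reduce", "E", 5)},
      8:  {"(": ("shift", 9)},
      9:  {"int": ("shift", 5)},
      10: {"+": ("shift", 8), ")": ("shift", 11)},
      11: {"+": ("reduce", "E", 5), ")": ("reduce", "E", 5)},
  }
  GOTO = {0: {"E": 2}, 4: {"E": 6}, 9: {"E": 10}}   # GOTO[state][non-terminal]

  def parse(tokens):
      tokens = tokens + ["$"]
      stack = [("dummy", 0)]                      # pairs <sym, state>, as on the slides
      j = 0
      while True:
          state, tok = stack[-1][1], tokens[j]
          act = ACTION[state].get(tok)
          if act is None:
              raise SyntaxError(f"unexpected {tok!r} in state {state}")
          if act[0] == "shift":
              stack.append((tok, act[1]))
              j += 1
          elif act[0] == "reduce":
              _, lhs, rhs_len = act
              del stack[len(stack) - rhs_len:]    # pop |RHS| pairs
              stack.append((lhs, GOTO[stack[-1][1]][lhs]))
          else:                                   # accept
              return True

  print(parse(["int", "+", "(", "int", ")", "+", "(", "int", ")"]))   # True

Running it on int + (int) + (int) performs exactly the shift/reduce sequence traced in the example slide.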
Key Issue: How is the DFA Constructed?
• The stack describes the context of the parse
  – What non-terminal we are looking for
  – What production RHS we are looking for
  – What we have seen so far from the RHS
• Each DFA state describes several such contexts
  – E.g., when we are looking for non-terminal E, we might be looking either for an int or an E + (E) RHS

LR(0) Items
• An LR(0) item is a production with a “I” somewhere on the RHS
• The items for T → (E) are
    T → I (E)
    T → ( I E)
    T → (E I )
    T → (E) I
• The only item for X → ε is X → I

LR(0) Items: Intuition
• An item [X → α I β] says that
  – the parser is looking for an X
  – it has an α on top of the stack
  – it expects to find a string derived from β next in the input
• Notes:
  – [X → α I aβ] means that a should follow; then we can shift it and still have a viable prefix
  – [X → α I] means that we could reduce X
    • But this is not always a good idea!

LR(1) Items
• An LR(1) item is a pair:
    X → α I β, a
  – X → αβ is a production
  – a is a terminal (the lookahead terminal)
  – LR(1) means 1 lookahead terminal
• [X → α I β, a] describes a context of the parser
  – We are trying to find an X followed by an a, and
  – We have (at least) α already on top of the stack
  – Thus we need to see next a prefix derived from βa

Note
• The symbol I was used before to separate the stack from the rest of the input
  – α I γ, where α is the stack and γ is the remaining string of terminals
• In items, I is used to mark a prefix of a production RHS:
    X → α I β, a
  – Here β might contain terminals as well
• In both cases the stack is on the left of I

Convention
• We add to our grammar a fresh new start symbol S and a production S → E
  – where E is the old start symbol
• The initial parsing context contains:
    S → I E , $
  – Trying to find an S as a string derived from E$
  – The stack is empty
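Items are easy to make concrete as a small data structure. The sketch below is illustrative (the class name and fields are my own, not from any particular tool): an LR(0) item is a production plus a dot position, and an LR(1) item additionally carries one lookahead terminal. Running it prints the four items of T → (E) shown earlier, paired with an arbitrary lookahead $.

  # Illustrative encoding of LR(1) items in Python; names are hypothetical.
  from dataclasses import dataclass
  from typing import Tuple

  @dataclass(frozen=True)
  class LR1Item:
      lhs: str                  # X in  X -> alpha I beta, a
      rhs: Tuple[str, ...]      # the full right-hand side alpha beta
      dot: int                  # position of I, from 0 to len(rhs)
      lookahead: str            # the terminal a expected after X

      def __str__(self):
          before = " ".join(self.rhs[:self.dot])
          after = " ".join(self.rhs[self.dot:])
          return f"[{self.lhs} -> {before} I {after}, {self.lookahead}]"

  # All dot positions for T -> ( E ), with lookahead $:
  rhs = ("(", "E", ")")
  for d in range(len(rhs) + 1):
      print(LR1Item("T", rhs, d, "$"))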
LR(1) Items (Cont.)
• In a context containing
    E → E + I ( E ) , +
  – If ( follows, then we can perform a shift to a context containing
    E → E + ( I E ) , +
• In a context containing
    E → E + ( E ) I , +
  – We can perform a reduction with E → E + ( E )
  – But only if a + follows

LR(1) Items (Cont.)
• Consider the item
    E → E + ( I E ) , +
• We expect a string derived from E ) +
• There are two productions for E
    E → int  and  E → E + ( E )
• We describe this by extending the context with two more items:
    E → I int , )
    E → I E + ( E ) , )

The Closure Operation
• The operation of extending the context with items is called the closure operation

  Closure(Items) =
    repeat
      for each [X → α I Yβ, a] in Items
        for each production Y → γ
          for each b in First(βa)
            add [Y → I γ, b] to Items
    until Items is unchanged

Constructing the Parsing DFA (1)
• Grammar: E → E + ( E ) | int
• Construct the start context: Closure({S → I E, $})
    S → I E , $
    E → I E+(E) , $
    E → I int , $
    E → I E+(E) , +
    E → I int , +
• We abbreviate as:
    S → I E , $
    E → I E+(E) , $/+
    E → I int , $/+

Constructing the Parsing DFA (2)
• A DFA state is a closed set of LR(1) items
• The start state contains [S → I E , $]
• A state that contains [X → α I, b] is labelled with “reduce with X → α on b”
• And now the transitions …

The DFA Transitions
• A state “State” that contains [X → α I yβ, b] has a transition labeled y to a state that contains the items “Transition(State, y)”
  – y can be a terminal or a non-terminal

  Transition(State, y)
    Items = ∅
    for each [X → α I yβ, b] in State
      add [X → αy I β, b] to Items
    return Closure(Items)
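The Closure and Transition definitions above translate directly into code. Below is a small Python sketch specialized to the example grammar (S → E, E → E + ( E ) | int); the item encoding and helper names are my own, and the First computation is simplified because this grammar has no ε-productions. Closing the start item {S → I E, $} reproduces the five items listed for the start context.

  # Illustrative Python sketch of Closure and Transition for the example grammar.
  # Items are tuples (lhs, rhs, dot, lookahead); names are hypothetical.

  GRAMMAR = {
      "S": [("E",)],
      "E": [("E", "+", "(", "E", ")"), ("int",)],
  }

  def compute_first():
      # Fixpoint computation of First for each non-terminal.
      # Simplification: no epsilon-productions, so only the first RHS symbol matters.
      first = {nt: set() for nt in GRAMMAR}
      changed = True
      while changed:
          changed = False
          for nt, prods in GRAMMAR.items():
              for rhs in prods:
                  sym = rhs[0]
                  new = first[sym] if sym in GRAMMAR else {sym}
                  if not new <= first[nt]:
                      first[nt] |= new
                      changed = True
      return first

  FIRST = compute_first()

  def first_of(beta, a):
      # First(beta a): with no nullable symbols this is First of the first symbol,
      # or {a} when beta is empty.
      if not beta:
          return {a}
      return FIRST[beta[0]] if beta[0] in GRAMMAR else {beta[0]}

  def closure(items):
      items = set(items)
      while True:
          new = set()
          for (lhs, rhs, dot, a) in items:
              if dot < len(rhs) and rhs[dot] in GRAMMAR:        # [X -> alpha I Y beta, a]
                  Y, beta = rhs[dot], rhs[dot + 1:]
                  for gamma in GRAMMAR[Y]:
                      for b in first_of(beta, a):
                          new.add((Y, gamma, 0, b))             # add [Y -> I gamma, b]
          if new <= items:
              return items
          items |= new

  def transition(state, y):
      # Move I over y in every item that expects y, then close.
      moved = {(lhs, rhs, dot + 1, a)
               for (lhs, rhs, dot, a) in state
               if dot < len(rhs) and rhs[dot] == y}
      return closure(moved)

  start = closure({("S", ("E",), 0, "$")})
  for item in sorted(start):
      print(item)            # the five items of the start context

Repeatedly applying transition() to every state on every symbol with an outgoing item, until no new states appear, yields the full parsing DFA sketched on the next slide.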
Constructing the Parsing DFA: Example
  State 0:          S → I E , $        E → I E+(E) , $/+     E → I int , $/+
  State 1 (on int): E → int I , $/+                          reduce E → int on $, +
  State 2 (on E):   S → E I , $        E → E I +(E) , $/+    accept on $
  State 3 (on +):   E → E+ I (E) , $/+
  State 4 (on ( ):  E → E+( I E) , $/+   E → I E+(E) , )/+   E → I int , )/+
  State 5 (on int): E → int I , )/+                          reduce E → int on ), +
  State 6 (on E):   E → E+(E I ) , $/+   E → E I +(E) , )/+
  and so on…

LR Parsing Tables: Notes
• Parsing tables (i.e., the DFA) can be constructed automatically for a CFG
• But we still need to understand the construction to work with parser generators
  – E.g., they report errors in terms of sets of items
• What kinds of errors can we expect?

Shift/Reduce Conflicts
• If a DFA state contains both
    [X → α I aβ, b]  and  [Y → γ I, a]
• Then on input “a” we could either
  – Shift into state [X → α a I β, b], or
  – Reduce with Y → γ
• This is called a shift-reduce conflict

Shift/Reduce Conflicts
• Typically due to ambiguities in the grammar
• Classic example: the dangling else
    S → if E then S | if E then S else S | OTHER
• Will have a DFA state containing
    [S → if E then S I, else]
    [S → if E then S I else S, x]
• If else follows, then we can shift or reduce
• The default (yacc, ML-yacc, etc.) is to shift
  – The default behavior is as needed in this case

More Shift/Reduce Conflicts
• Consider the ambiguous grammar
    E → E + E | E * E | int
• We will have the states containing
    [E → E * I E, +]          [E → E * E I, +]
    [E → I E + E, +]   ⇒E    [E → E I + E, +]
    ……                        ……
• Again we have a shift/reduce conflict on input +
  – We need to reduce (* binds more tightly than +)
  – Recall the solution: declare the precedence of * and +

More Shift/Reduce Conflicts
• In yacc, declare precedence and associativity:
    %left +
    %left *
• Precedence of a rule = that of its last terminal
  – See the yacc manual for ways to override this default
• Resolve a shift/reduce conflict with a shift if:
  – no precedence is declared for either the rule or the terminal, or
  – the input terminal has higher precedence than the rule, or
  – the precedences are the same and right-associative

Using Precedence to Solve S/R Conflicts
• Back to our example:
    [E → E * I E, +]          [E → E * E I, +]
    [E → I E + E, +]   ⇒E    [E → E I + E, +]
    ……                        ……
• Will choose reduce because the precedence of rule E → E * E is higher than that of terminal +

Using Precedence to Solve S/R Conflicts
• Same grammar as before
    E → E + E | E * E | int
• We will also have the states
    [E → E + I E, +]          [E → E + E I, +]
    [E → I E + E, +]   ⇒E    [E → E I + E, +]
    ……                        ……
• Now we also have a shift/reduce conflict on input +
  – We choose reduce because E → E + E and + have the same precedence and + is left-associative

Using Precedence to Solve S/R Conflicts
• Back to our dangling else example
    [S → if E then S I, else]
    [S → if E then S I else S, x]
• Can eliminate the conflict by declaring else as having higher precedence than then
• But this starts to look like “hacking the tables”
• Best to avoid overuse of precedence declarations or we will end up with unexpected parse trees

Precedence Declarations Revisited
• The term “precedence declaration” is misleading!
• These declarations do not define precedence: they define conflict resolutions
  – I.e., they instruct shift-reduce parsers to resolve conflicts in certain ways
• The two are not quite the same thing!

Reduce/Reduce Conflicts
• If a DFA state contains both
    [X → α I, a]  and  [Y → β I, a]

Reduce/Reduce Conflicts
• Usually due to gross ambiguity in the grammar
• Example: a sequence of
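Conflict detection itself is mechanical: a shift/reduce conflict is a state containing both a shift item and a completed item on the same terminal, and two distinct completed items that share a lookahead give a reduce/reduce conflict. The sketch below is illustrative, not any particular generator's implementation; it reuses the tuple item encoding from the closure sketch earlier. Run on the dangling-else state from the slides, it reports the expected shift/reduce conflict on else.

  # Illustrative sketch: detecting conflicts in one closed DFA state.
  # Items are (lhs, rhs, dot, lookahead) tuples, as in the closure sketch above.

  NONTERMINALS = {"S", "E"}

  def find_conflicts(state):
      # Terminals we could shift on: some item has I immediately before them.
      shift_terminals = {rhs[dot] for (lhs, rhs, dot, la) in state
                         if dot < len(rhs) and rhs[dot] not in NONTERMINALS}
      conflicts, completed = [], {}
      for (lhs, rhs, dot, la) in state:
          if dot == len(rhs):                      # completed item [Y -> gamma I, a]
              if la in shift_terminals:
                  conflicts.append(("shift/reduce", la, lhs))
              if la in completed and completed[la] != (lhs, rhs):
                  conflicts.append(("reduce/reduce", la))
              completed[la] = (lhs, rhs)
      return conflicts

  # The dangling-else state from the slides:
  dangling_else = {
      ("S", ("if", "E", "then", "S"), 4, "else"),
      ("S", ("if", "E", "then", "S", "else", "S"), 4, "x"),
  }
  print(find_conflicts(dangling_else))    # [('shift/reduce', 'else', 'S')]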