LR Parsing and LALR Parser Generators


Outline
• Review of bottom-up parsing
• Computing the parsing DFA
• Using parser generators

Bottom-up Parsing (Review)
• A bottom-up parser rewrites the input string to the start symbol
• The state of the parser is described as α I γ
  – α is a stack of terminals and non-terminals
  – γ is the string of terminals not yet examined
• Initially: I x1 x2 … xn

The Shift and Reduce Actions (Review)
• Recall the CFG: E → int | E + (E)
• A bottom-up parser uses two kinds of actions:
  – Shift pushes a terminal from the input onto the stack
      E + ( I int )  ⇒  E + ( int I )
  – Reduce pops 0 or more symbols off the stack (a production RHS) and pushes a non-terminal onto the stack (the production LHS)
      E + ( E + ( E ) I )  ⇒  E + ( E I )

Key Issue: When to Shift or Reduce?
• Idea: use a deterministic finite automaton (DFA) to decide when to shift or reduce
  – The input is the stack
  – The language consists of terminals and non-terminals
• We run the DFA on the stack and examine the resulting state X and the token tok after I
  – If X has a transition labeled tok, then shift
  – If X is labeled with “A → β on tok”, then reduce

LR(1) Parsing: An Example
The slide shows the parsing DFA for E → int | E + (E), with states 0–11 and, among others, the labels “E → int on $, +” (state 1), “accept on $” (state 2), “E → int on ), +” (state 5), “E → E+(E) on $, +” (state 7) and “E → E+(E) on ), +” (state 11). Running the parser on int + (int) + (int) gives:

  I int + (int) + (int)$     shift
  int I + (int) + (int)$     E → int
  E I + (int) + (int)$       shift (x3)
  E + (int I ) + (int)$      E → int
  E + (E I ) + (int)$        shift
  E + (E) I + (int)$         E → E+(E)
  E I + (int)$               shift (x3)
  E + (int I )$              E → int
  E + (E I )$                shift
  E + (E) I $                E → E+(E)
  E I $                      accept

Representing the DFA
• Parsers represent the DFA as a 2D table
  – Recall table-driven lexical analysis
• Lines correspond to DFA states
• Columns correspond to terminals and non-terminals
• Typically the columns are split into:
  – Those for terminals: the action table
  – Those for non-terminals: the goto table

Representing the DFA: Example
The table for a fragment of our DFA:

  state   int    +             (     )            $             E
    3                          s4
    4     s5                                                    g6
    5            rE → int            rE → int
    6            s8                  s7
    7            rE → E+(E)                       rE → E+(E)

  sk       is shift and go to state k
  rX → α   is reduce with X → α
  gk       is go to state k

The LR Parsing Algorithm
• After a shift or reduce action we rerun the DFA on the entire stack
  – This is wasteful, since most of the work is repeated
• Remember, for each stack element, which state it brings the DFA to
• The LR parser therefore maintains a stack of pairs
    ⟨ sym1, state1 ⟩ … ⟨ symn, staten ⟩
  where statek is the final state of the DFA on sym1 … symk

The LR Parsing Algorithm

  let I = w$ be the initial input
  let j = 0
  let DFA state 0 be the start state
  let stack = ⟨ dummy, 0 ⟩
  repeat
    case action[top_state(stack), I[j]] of
      shift k:        push ⟨ I[j++], k ⟩
      reduce X → A:   pop |A| pairs,
                      push ⟨ X, goto[top_state(stack), X] ⟩
      accept:         halt normally
      error:          halt and report error
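The pseudocode above translates almost directly into a runnable driver. The following sketch (Python, not part of the original slides) hand-codes the ACTION and GOTO tables for the lecture's grammar E → int | E + (E); the state numbers follow the DFA sketched above, and names such as lr_parse, ACTION, GOTO and PRODS are illustrative only.

  # A minimal table-driven LR(1) driver for the lecture's grammar
  #   S -> E           (production 0, the added start production)
  #   E -> int         (production 1)
  #   E -> E + ( E )   (production 2)
  # The ACTION/GOTO tables were written out by hand from the DFA sketched
  # in the slides; the state numbering is therefore illustrative.

  PRODS = {1: ("E", 1), 2: ("E", 5)}   # LHS and RHS length of each production

  # ACTION[state][terminal] = ("s", k) shift to k, ("r", p) reduce by p, or ("acc",)
  ACTION = {
      0:  {"int": ("s", 1)},
      1:  {"$": ("r", 1), "+": ("r", 1)},
      2:  {"+": ("s", 3), "$": ("acc",)},
      3:  {"(": ("s", 4)},
      4:  {"int": ("s", 5)},
      5:  {")": ("r", 1), "+": ("r", 1)},
      6:  {")": ("s", 7), "+": ("s", 8)},
      7:  {"$": ("r", 2), "+": ("r", 2)},
      8:  {"(": ("s", 9)},
      9:  {"int": ("s", 5)},
      10: {")": ("s", 11), "+": ("s", 8)},
      11: {")": ("r", 2), "+": ("r", 2)},
  }
  # GOTO[state][non-terminal] = next state
  GOTO = {0: {"E": 2}, 4: {"E": 6}, 9: {"E": 10}}

  def lr_parse(tokens):
      tokens = tokens + ["$"]
      stack = [("dummy", 0)]              # pairs <symbol, state>, as on the slide
      j = 0
      while True:
          state, tok = stack[-1][1], tokens[j]
          act = ACTION.get(state, {}).get(tok)
          if act is None:
              raise SyntaxError(f"unexpected {tok!r} in state {state}")
          if act[0] == "s":               # shift: push the terminal and its state
              stack.append((tok, act[1]))
              j += 1
          elif act[0] == "r":             # reduce X -> alpha: pop |alpha| pairs ...
              lhs, n = PRODS[act[1]]
              del stack[-n:]
              top = stack[-1][1]          # ... then push <X, GOTO[top, X]>
              stack.append((lhs, GOTO[top][lhs]))
              print("stack:", " ".join(s for s, _ in stack[1:]), "| input:", " ".join(tokens[j:]))
          else:
              return True                 # accept

  lr_parse("int + ( int ) + ( int )".split())

Running it prints the stack after each reduction, reproducing the trace of the worked example above.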
Key Issue: How is the DFA Constructed?
• The stack describes the context of the parse
  – What non-terminal we are looking for
  – What production RHS we are looking for
  – What we have seen so far from the RHS
• Each DFA state describes several such contexts
  – E.g., when we are looking for the non-terminal E, we might be looking either for an int or for an E + (E) RHS

LR(0) Items
• An LR(0) item is a production with a “I” somewhere on the RHS
• The items for T → (E) are
    T → I (E)
    T → ( I E)
    T → (E I )
    T → (E) I
• The only item for X → ε is X → I

LR(0) Items: Intuition
• An item [X → α I β] says that
  – the parser is looking for an X
  – it has an α on top of the stack
  – it expects to find a string derived from β next in the input
• Notes:
  – [X → α I aβ] means that a should follow; then we can shift it and still have a viable prefix
  – [X → α I ] means that we could reduce with X
  – But this is not always a good idea!

LR(1) Items
• An LR(1) item is a pair:  X → α I β, a
  – X → αβ is a production
  – a is a terminal (the lookahead terminal)
  – LR(1) means 1 lookahead terminal
• [X → α I β, a] describes a context of the parser
  – We are trying to find an X followed by an a, and
  – we have (at least) α already on top of the stack
  – Thus we need to see next a prefix derived from βa

Note
• The symbol I was used before to separate the stack from the rest of the input
  – α I γ, where α is the stack and γ is the remaining string of terminals
• In items, I is used to mark a prefix of a production RHS:
    X → α I β, a
  – Here β might contain terminals as well
• In both cases the stack is on the left of I

Convention
• We add to our grammar a fresh new start symbol S and a production S → E
  – where E is the old start symbol
• The initial parsing context contains:  S → I E , $
  – Trying to find an S as a string derived from E$
  – The stack is empty
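To make the notion of items concrete, here is a small sketch (Python, not from the slides) that represents an item as a plain tuple and enumerates the LR(0) items of one production; an LR(1) item would simply carry an extra lookahead component. The names items_of and show are made up for illustration.

  # Representing items as tuples (lhs, rhs, dot): one item per dot position.
  def items_of(lhs, rhs):
      # For X -> epsilon pass rhs = (), which yields the single item X -> .
      return [(lhs, rhs, dot) for dot in range(len(rhs) + 1)]

  def show(item):
      lhs, rhs, dot = item
      return lhs + " -> " + " ".join(rhs[:dot] + (".",) + rhs[dot:])

  for it in items_of("T", ("(", "E", ")")):
      print(show(it))
  # T -> . ( E )
  # T -> ( . E )
  # T -> ( E . )
  # T -> ( E ) .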
LR(1) Items (Cont.)
• In a context containing
    E → E + I ( E ) , +
  – if ( follows, then we can perform a shift to a context containing
    E → E + ( I E ) , +
• In a context containing
    E → E + ( E ) I , +
  – we can perform a reduction with E → E + ( E )
  – but only if a + follows

LR(1) Items (Cont.)
• Consider the item
    E → E + ( I E ) , +
• We expect a string derived from E ) +
• There are two productions for E:
    E → int   and   E → E + ( E )
• We describe this by extending the context with two more items:
    E → I int , )
    E → I E + ( E ) , )

The Closure Operation
• The operation of extending the context with items is called the closure operation

  Closure(Items) =
    repeat
      for each [X → α I Yβ, a] in Items
        for each production Y → γ
          for each b in First(βa)
            add [Y → I γ, b] to Items
    until Items is unchanged

Constructing the Parsing DFA (1)
• Construct the start context: Closure({S → I E, $})
    S → I E , $
    E → I E+(E) , $
    E → I int , $
    E → I E+(E) , +
    E → I int , +
• We abbreviate this as:
    S → I E , $
    E → I E+(E) , $/+
    E → I int , $/+

Constructing the Parsing DFA (2)
• A DFA state is a closed set of LR(1) items
• The start state contains [S → I E , $]
• A state that contains [X → α I , b] is labelled with “reduce with X → α on b”
• And now the transitions …

The DFA Transitions
• A state “State” that contains [X → α I yβ, b] has a transition labeled y to the state that contains the items Transition(State, y)
  – y can be a terminal or a non-terminal

  Transition(State, y) =
    Items = ∅
    for each [X → α I yβ, b] in State
      add [X → αy I β, b] to Items
    return Closure(Items)

Constructing the Parsing DFA: Example
The slide sketches the first states of the construction:
  State 0:           S → I E , $;  E → I E+(E) , $/+;  E → I int , $/+
  State 1 (on int):  E → int I , $/+                        — E → int on $, +
  State 2 (on E):    S → E I , $;  E → E I +(E) , $/+       — accept on $
  State 3 (on +):    E → E+ I (E) , $/+
  State 4 (on ( ):   E → E+( I E) , $/+;  E → I E+(E) , )/+;  E → I int , )/+
  State 5 (on int):  E → int I , )/+                        — E → int on ), +
  State 6 (on E):    E → E+(E I ) , $/+;  E → E I +(E) , )/+
  and so on …

LR Parsing Tables: Notes
• Parsing tables (i.e., the DFA) can be constructed automatically for a CFG
• But we still need to understand the construction to work with parser generators
  – e.g., they report errors in terms of sets of items
• What kind of errors can we expect?
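The Closure and Transition procedures above can be transcribed into executable form. The sketch below (Python, not part of the original slides) is specialized, for brevity, to grammars without ε-productions, so that First(βa) reduces to the First set of the first symbol of βa; GRAMMAR, closure, transition and first_of_symbol are illustrative names, and the grammar is the lecture's running example.

  # Items are tuples (lhs, rhs, dot, lookahead); rhs is a tuple of symbols.
  GRAMMAR = {
      "S": [("E",)],
      "E": [("int",), ("E", "+", "(", "E", ")")],
  }
  TERMINALS = {"int", "+", "(", ")", "$"}

  def first_of_symbol(sym, seen=None):
      # First set of one symbol; no epsilon handling needed for this grammar.
      if sym in TERMINALS:
          return {sym}
      seen = seen or set()
      out = set()
      for rhs in GRAMMAR[sym]:
          if rhs[0] not in seen:
              out |= first_of_symbol(rhs[0], seen | {sym})
      return out

  def closure(items):
      items = set(items)
      changed = True
      while changed:                       # repeat ... until Items is unchanged
          changed = False
          for (lhs, rhs, dot, a) in list(items):
              if dot < len(rhs) and rhs[dot] not in TERMINALS:
                  Y, beta_a = rhs[dot], list(rhs[dot + 1:]) + [a]
                  for gamma in GRAMMAR[Y]:
                      for b in first_of_symbol(beta_a[0]):   # First(beta a)
                          item = (Y, gamma, 0, b)
                          if item not in items:
                              items.add(item)
                              changed = True
      return frozenset(items)

  def transition(state, y):
      moved = {(lhs, rhs, dot + 1, a)
               for (lhs, rhs, dot, a) in state
               if dot < len(rhs) and rhs[dot] == y}
      return closure(moved)

  start = closure({("S", ("E",), 0, "$")})
  for lhs, rhs, dot, a in sorted(start):   # prints the five items of the start state
      print(lhs, "->", " ".join(rhs[:dot] + (".",) + rhs[dot:]), ",", a)
  # Two items after shifting int: the state written "E -> int I , $/+" above.
  print(len(transition(start, "int")), "item(s) after shifting int")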
Shift/Reduce Conflicts
• If a DFA state contains both
    [X → α I aβ, b]   and   [Y → γ I , a]
• then on input “a” we could either
  – shift into state [X → αa I β, b], or
  – reduce with Y → γ
• This is called a shift/reduce conflict

Shift/Reduce Conflicts
• Typically due to ambiguities in the grammar
• Classic example: the dangling else
    S → if E then S | if E then S else S | OTHER
• It will have a DFA state containing
    [S → if E then S I , else]
    [S → if E then S I else S, x]
• If else follows, then we can either shift or reduce
• The default (yacc, ML-yacc, etc.) is to shift
  – The default behavior is what is needed in this case

More Shift/Reduce Conflicts
• Consider the ambiguous grammar
    E → E + E | E * E | int
• We will have states containing
    [E → E * I E, +]            [E → E * E I , +]
    [E → I E + E, +]    ⇒E     [E → E I + E, +]
    …                            …
• Again we have a shift/reduce conflict on input +
  – We need to reduce (* binds more tightly than +)
  – Recall the solution: declare the precedence of * and +

More Shift/Reduce Conflicts
• In yacc, declare precedence and associativity:
    %left +
    %left *
• Precedence of a rule = that of its last terminal
  – See the yacc manual for ways to override this default
• Resolve a shift/reduce conflict with a shift if:
  – no precedence is declared for either the rule or the terminal
  – the input terminal has higher precedence than the rule
  – the precedences are the same and right-associative

Using Precedence to Solve S/R Conflicts
• Back to our example:
    [E → E * I E, +]            [E → E * E I , +]
    [E → I E + E, +]    ⇒E     [E → E I + E, +]
    …                            …
• We will choose to reduce, because the precedence of the rule E → E * E is higher than that of the terminal +

Using Precedence to Solve S/R Conflicts
• Same grammar as before: E → E + E | E * E | int
• We will also have states containing
    [E → E + I E, +]            [E → E + E I , +]
    [E → I E + E, +]    ⇒E     [E → E I + E, +]
    …                            …
• Now we also have a shift/reduce conflict on input +
  – We choose to reduce, because E → E + E and + have the same precedence and + is left-associative

Using Precedence to Solve S/R Conflicts
• Back to our dangling else example

Precedence Declarations Revisited
• The term “precedence declaration” is misleading!
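To make the yacc-style resolution rules listed above concrete, here is a small sketch (Python, illustrative only, not yacc's actual implementation). PREC, rule_precedence and resolve are made-up names; the table encodes %left + and %left * from the example, with a higher level meaning tighter binding.

  PREC = {"+": (1, "left"), "*": (2, "left")}   # %left +  then  %left *

  def rule_precedence(rhs):
      # Precedence of a rule = that of its last terminal (yacc's default).
      for sym in reversed(rhs):
          if sym in PREC:
              return PREC[sym]
      return None

  def resolve(rule_rhs, lookahead):
      """Return 'shift' or 'reduce' for a shift/reduce conflict."""
      rule = rule_precedence(rule_rhs)
      tok = PREC.get(lookahead)
      if rule is None or tok is None:
          return "shift"                  # no precedence declared: default to shift
      if tok[0] > rule[0]:
          return "shift"                  # terminal binds tighter than the rule
      if tok[0] < rule[0]:
          return "reduce"                 # rule binds tighter than the terminal
      return "shift" if tok[1] == "right" else "reduce"   # equal: use associativity

  print(resolve(["E", "*", "E"], "+"))   # reduce: * has higher precedence than +
  print(resolve(["E", "+", "E"], "+"))   # reduce: same precedence, + is left-associative
  print(resolve(["E", "+", "E"], "*"))   # shift:  * binds tighter than E -> E + E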