LR, LALR Parser Generators

Outline

• Review of bottom-up parsing
• Computing the parsing DFA
• Using parser generators

Bottom-up Parsing (Review)

• A bottom-up parser rewrites the input string to the start symbol
• The state of the parser is described as α I γ
  – α is a stack of terminals and non-terminals
  – γ is the string of terminals not yet examined
• Initially: I x1 x2 . . . xn

The Shift and Reduce Actions (Review)

• Recall the CFG: E → int | E + (E)
• A bottom-up parser uses two kinds of actions:
• Shift pushes a terminal from the input on the stack
      E + ( I int )  ⇒  E + ( int I )
• Reduce pops 0 or more symbols off of the stack (the production RHS) and pushes a non-terminal on the stack (the production LHS)
      E + ( E + ( E ) I )  ⇒  E + ( E I )

Key Issue: When to Shift or Reduce?

• Idea: use a deterministic finite automaton (DFA) to decide when to shift or reduce
  – The input is the stack
  – The language consists of terminals and non-terminals
• We run the DFA on the stack and we examine the resulting state X and the token tok after I
  – If X has a transition labeled tok, then shift
  – If X is labeled with “A → β on tok”, then reduce

LR(1) Parsing: An Example

[DFA diagram: states 0–11 for E → int | E + (E); e.g., state 1 is labeled “E → int on $, +”, state 5 “E → int on ), +”, state 7 “E → E+(E) on $, +”, state 11 “E → E+(E) on ), +”, and state 2 “accept on $”]

  I int + (int) + (int)$      shift
  int I + (int) + (int)$      E → int
  E I + (int) + (int)$        shift (x3)
  E + (int I ) + (int)$       E → int
  E + (E I ) + (int)$         shift
  E + (E) I + (int)$          E → E+(E)
  E I + (int)$                shift (x3)
  E + (int I )$               E → int
  E + (E I )$                 shift
  E + (E) I $                 E → E+(E)
  E I $                       accept

Representing the DFA

• Parsers represent the DFA as a 2D table
  – Recall the table-driven lexical analyzer
• Lines correspond to DFA states
• Columns correspond to terminals and non-terminals
• Typically columns are split into:
  – Those for terminals: the action table
  – Those for non-terminals: the goto table

Representing the DFA: Example

The table for a fragment of our DFA:

        int     +            (     )            $            E
  …
  3                          s4
  4     s5                                                   g6
  5             rE→int             rE→int
  6             s8                 s7
  7             rE→E+(E)                        rE→E+(E)
  …

  sk is shift and goto state k
  rX→α is reduce
  gk is goto state k

The LR Parsing Algorithm

• After a shift or reduce action we rerun the DFA on the entire stack
  – This is wasteful, since most of the work is repeated
• Remember for each stack element on which state it brings the DFA
• The LR parser maintains a stack 〈 sym1, state1 〉 . . . 〈 symn, staten 〉
  – statek is the final state of the DFA on sym1 . . . symk

The LR Parsing Algorithm

  let I = w$ be the initial input
  let j = 0
  let DFA state 0 be the start state
  let stack = 〈 dummy, 0 〉
  repeat
    case action[top_state(stack), I[j]] of
      shift k:      push 〈 I[j++], k 〉
      reduce X → α: pop |α| pairs,
                    push 〈 X, goto[top_state(stack), X] 〉
      accept:       halt normally
      error:        halt and report error
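The loop above can be sketched as a runnable table-driven parser. The table below is a minimal sketch for the example grammar E → int | E + (E), using the merged state numbering that appears later in the LALR discussion; the exact entries are a reconstruction, not the output of a generator.

```python
# Table-driven LR parsing for E -> int | E + ( E ), per the algorithm above.
# States (merged/LALR numbering): 0 start, 1 after int, 2 after E,
# 3 after +, 4 after (, 6 after the inner E, 7 after ).
ACTION = {
    (0, 'int'): ('shift', 1),
    (2, '+'):   ('shift', 3),
    (2, '$'):   ('accept',),
    (3, '('):   ('shift', 4),
    (4, 'int'): ('shift', 1),
    (6, ')'):   ('shift', 7),
    (6, '+'):   ('shift', 3),
}
# State 1 reduces E -> int; state 7 reduces E -> E + ( E )
for tok in ('+', ')', '$'):
    ACTION[(1, tok)] = ('reduce', 'E', 1)   # pop |int| = 1 pair
    ACTION[(7, tok)] = ('reduce', 'E', 5)   # pop |E + ( E )| = 5 pairs

GOTO = {(0, 'E'): 2, (4, 'E'): 6}

def lr_parse(tokens):
    """Run the LR loop on a stack of <symbol, state> pairs."""
    stack = [('dummy', 0)]                  # <dummy, start state>
    j = 0
    while True:
        state = stack[-1][1]
        act = ACTION.get((state, tokens[j]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[j]!r}")
        if act[0] == 'shift':
            stack.append((tokens[j], act[1]))
            j += 1
        elif act[0] == 'reduce':
            _, lhs, n = act
            del stack[-n:]                  # pop |RHS| pairs
            stack.append((lhs, GOTO[(stack[-1][1], lhs)]))
        else:                               # accept
            return True

print(lr_parse(['int', '+', '(', 'int', ')', '$']))  # True
```

Note how a reduce never reruns the DFA: the state stored under the popped pairs is exactly the state the DFA would reach on the remaining stack.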


Key Issue: How is the DFA Constructed?

• The stack describes the context of the parse
  – What non-terminal we are looking for
  – What production RHS we are looking for
  – What we have seen so far from the RHS
• Each DFA state describes several such contexts
  – E.g., when we are looking for non-terminal E, we might be looking either for an int or an E + (E) RHS

LR(0) Items

• An LR(0) item is a production with a “I” somewhere on the RHS
• The items for T → (E) are
      T → I (E)
      T → ( I E)
      T → (E I )
      T → (E) I
• The only item for X → ε is X → I

LR(0) Items: Intuition

• An item [X → α I β] says that
  – the parser is looking for an X
  – it has an α on top of the stack
  – it expects to find a string derived from β next in the input
• Notes:
  – [X → α I aβ] means that a should follow. Then we can shift it and still have a viable prefix
  – [X → α I] means that we could reduce X
  – But this is not always a good idea!

LR(1) Items

• An LR(1) item is a pair:
      X → α I β, a
  – X → αβ is a production
  – a is a terminal (the lookahead terminal)
  – LR(1) means 1 lookahead terminal
• [X → α I β, a] describes a context of the parser
  – We are trying to find an X followed by an a, and
  – We have (at least) α already on top of the stack
  – Thus we need to see next a prefix derived from βa
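A concrete representation of LR(1) items might look like the following sketch (the field names are hypothetical; parser generators use their own encodings):

```python
# An LR(1) item [X -> alpha I beta, a] as a small immutable record.
from collections import namedtuple

# lhs: X;  rhs: the full production RHS;  dot: position of I;  la: lookahead a
Item = namedtuple('Item', 'lhs rhs dot la')

# The item [E -> E + I ( E ), +] from the slides:
it = Item('E', ('E', '+', '(', 'E', ')'), 2, '+')
print(it.rhs[:it.dot])   # ('E', '+')      -- already on the stack
print(it.rhs[it.dot:])   # ('(', 'E', ')') -- expected next, followed by la
```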


Note

• The symbol I was used before to separate the stack from the rest of the input
  – α I γ, where α is the stack and γ is the remaining string of terminals
• In items, I is used to mark a prefix of a production RHS:
      X → α I β, a
  – Here β might contain terminals as well
• In both cases the stack is on the left of I

Convention

• We add to our grammar a fresh new start symbol S and a production S → E
  – Where E is the old start symbol
• The initial parsing context contains:
      S → I E , $
  – Trying to find an S as a string derived from E$
  – The stack is empty

LR(1) Items (Cont.)

• In a context containing
      E → E + I ( E ) , +
  – If ( follows, then we can perform a shift to a context containing
      E → E + ( I E ) , +
• In a context containing
      E → E + ( E ) I , +
  – We can perform a reduction with E → E + ( E )
  – But only if a + follows

LR(1) Items (Cont.)

• Consider the item
      E → E + ( I E ) , +
• We expect a string derived from E ) +
• There are two productions for E:
      E → int  and  E → E + ( E )
• We describe this by extending the context with two more items:
      E → I int , )
      E → I E + ( E ) , )


The Closure Operation

• The operation of extending the context with items is called the closure operation

  Closure(Items) =
    repeat
      for each [X → α I Yβ, a] in Items
        for each production Y → γ
          for each b in First(βa)
            add [Y → I γ, b] to Items
    until Items is unchanged

Constructing the Parsing DFA (1)

• Construct the start context (for our grammar E → E + (E) | int):
      Closure({S → I E, $}) =
        S → I E , $
        E → I E+(E) , $
        E → I int , $
        E → I E+(E) , +
        E → I int , +
• We abbreviate as:
        S → I E , $
        E → I E+(E) , $/+
        E → I int , $/+

Constructing the Parsing DFA (2)

• A DFA state is a closed set of LR(1) items
• The start state contains [S → I E , $]
• A state that contains [X → α I, b] is labeled with “reduce with X → α on b”
• And now the transitions …

The DFA Transitions

• A state “State” that contains [X → α I yβ, b] has a transition labeled y to a state that contains the items “Transition(State, y)”
  – y can be a terminal or a non-terminal

  Transition(State, y)
    Items = ∅
    for each [X → α I yβ, b] in State
      add [X → αy I β, b] to Items
    return Closure(Items)
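Closure and Transition can be sketched directly from the two definitions above. This is a toy for the example grammar only; in particular, First is hard-coded (FIRST(E) = {int}), an assumption that holds for this grammar but is computed properly by a real generator.

```python
# Closure and Transition for S -> E, E -> int | E + ( E ).
# Items are (lhs, rhs, dot, lookahead) tuples.
GRAMMAR = {'S': [('E',)], 'E': [('int',), ('E', '+', '(', 'E', ')')]}

def first_of(symbols):
    # FIRST for this grammar only: FIRST(E) = {int}; a terminal is itself
    return {'int'} if symbols[0] in GRAMMAR else {symbols[0]}

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:       # [X -> a I Yb, la]
            for b in first_of(rhs[dot + 1:] + (la,)):    # b in First(b la)
                for gamma in GRAMMAR[rhs[dot]]:          # Y -> gamma
                    item = (rhs[dot], gamma, 0, b)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return frozenset(items)

def transition(state, y):
    # advance the dot past y in every item that expects y, then close
    moved = {(lhs, rhs, dot + 1, la)
             for (lhs, rhs, dot, la) in state
             if dot < len(rhs) and rhs[dot] == y}
    return closure(moved)

start = closure({('S', ('E',), 0, '$')})     # the 5 items of state 0
after_int = transition(start, 'int')          # state 1: E -> int I , $/+
print(len(start), len(after_int))             # 5 2
```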


Constructing the Parsing DFA: Example

  State 0:           S → I E , $        E → I E+(E) , $/+     E → I int , $/+
  State 1 (on int):  E → int I , $/+                          (E → int on $, +)
  State 2 (on E):    S → E I , $        E → E I +(E) , $/+    (accept on $)
  State 3 (on +):    E → E+ I (E) , $/+
  State 4 (on ():    E → E+( I E) , $/+   E → I E+(E) , )/+   E → I int , )/+
  State 5 (on int):  E → int I , )/+                          (E → int on ), +)
  State 6 (on E):    E → E+(E I ) , $/+   E → E I +(E) , )/+
  and so on…

LR Parsing Tables: Notes

• Parsing tables (i.e., the DFA) can be constructed automatically for a CFG
• But we still need to understand the construction to work with parser generators
  – E.g., they report errors in terms of sets of items
• What kind of errors can we expect?

Shift/Reduce Conflicts

• If a DFA state contains both
      [X → α I aβ, b]  and  [Y → γ I, a]
• Then on input “a” we could either
  – Shift into state [X → αa I β, b], or
  – Reduce with Y → γ
• This is called a shift-reduce conflict

Shift/Reduce Conflicts

• Typically due to ambiguities in the grammar
• Classic example: the dangling else
      S → if E then S | if E then S else S | OTHER
• Will have a DFA state containing
      [S → if E then S I, else]
      [S → if E then S I else S, x]
• If else follows, then we can shift or reduce
• The default (ML-yacc, etc.) is to shift
  – The default behavior is as needed in this case


More Shift/Reduce Conflicts

• Consider the ambiguous grammar
      E → E + E | E * E | int
• We will have the states containing
      [E → E * I E, +]            [E → E * E I, +]
      [E → I E + E, +]     ⇒E     [E → E I + E, +]
      …                           …
• Again we have a shift/reduce conflict on input +
  – We need to reduce (* binds more tightly than +)
  – Recall the solution: declare the precedence of * and +

More Shift/Reduce Conflicts

• In yacc, declare precedence and associativity:
      %left +
      %left *
• Precedence of a rule = that of its last terminal
  – See the yacc manual for ways to override this default
• Resolve a shift/reduce conflict with a shift if:
  – no precedence is declared for either the rule or the terminal
  – the input terminal has higher precedence than the rule
  – the precedences are the same and right-associative
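The three resolution rules above can be sketched as a small decision function (a simplification: real yacc also handles %nonassoc and reports unresolved conflicts):

```python
# Resolve a shift/reduce conflict from precedence declarations.
# rule_prec / term_prec: higher number = higher precedence, None = undeclared.
def resolve(rule_prec, rule_assoc, term_prec):
    if rule_prec is None or term_prec is None:
        return 'shift'       # no precedence declared: default to shift
    if term_prec > rule_prec:
        return 'shift'       # input terminal binds tighter than the rule
    if term_prec < rule_prec:
        return 'reduce'      # rule binds tighter than the input terminal
    # same precedence: associativity decides
    return 'shift' if rule_assoc == 'right' else 'reduce'

# %left +  %left *   =>  prec(+) = 1, prec(*) = 2, both left-associative
print(resolve(2, 'left', 1))   # rule E -> E * E vs input '+': reduce
print(resolve(1, 'left', 1))   # rule E -> E + E vs input '+': reduce
```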

Using Precedence to Solve S/R Conflicts

• Back to our example:
      [E → E * I E, +]            [E → E * E I, +]
      [E → I E + E, +]     ⇒E     [E → E I + E, +]
      …                           …
• Will choose reduce because the precedence of the rule E → E * E is higher than that of the terminal +

Using Precedence to Solve S/R Conflicts

• Same grammar as before
      E → E + E | E * E | int
• We will also have the states
      [E → E + I E, +]            [E → E + E I, +]
      [E → I E + E, +]     ⇒E     [E → E I + E, +]
      …                           …
• Now we also have a shift/reduce conflict on input +
  – We choose reduce because E → E + E and + have the same precedence and + is left-associative


Using Precedence to Solve S/R Conflicts

• Back to our dangling else example
      [S → if E then S I, else]
      [S → if E then S I else S, x]
• Can eliminate the conflict by declaring else with higher precedence than then
• But this starts to look like “hacking the tables”
• Best to avoid overuse of precedence declarations, or we will end up with unexpected parse trees

Precedence Declarations Revisited

• The term “precedence declaration” is misleading!
• These declarations do not define precedence: they define conflict resolutions
• I.e., they instruct shift-reduce parsers to resolve conflicts in certain ways
• The two are not quite the same thing!

Reduce/Reduce Conflicts

• If a DFA state contains both
      [X → α I, a]  and  [Y → β I, a]
  – Then on input “a” we don’t know which production to reduce
• This is called a reduce/reduce conflict

Reduce/Reduce Conflicts

• Usually due to gross ambiguity in the grammar
• Example: a sequence of identifiers
      S → ε | id | id S
• There are two parse trees for the string id
      S → id
      S → id S → id
• How does this confuse the parser?
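Both kinds of conflict can be detected mechanically in a state. A sketch, with items represented as (lhs, rhs, dot, lookahead) tuples:

```python
# Detect shift/reduce and reduce/reduce conflicts in one parser state.
def conflicts(state):
    # symbols some item can shift next
    shift_syms = {rhs[dot] for (_x, rhs, dot, _a) in state if dot < len(rhs)}
    # lookaheads of complete items (one per possible reduction)
    reduce_las = [la for (_x, rhs, dot, la) in state if dot == len(rhs)]
    sr = shift_syms & set(reduce_las)                           # shift/reduce
    rr = {la for la in reduce_las if reduce_las.count(la) > 1}  # reduce/reduce
    return sr, rr

# [X -> a I b, $] and [Y -> a I, b]: shift/reduce conflict on 'b'
print(conflicts({('X', ('a', 'b'), 1, '$'), ('Y', ('a',), 1, 'b')}))
# [X -> a I, $] and [Y -> b I, $]: reduce/reduce conflict on '$'
print(conflicts({('X', ('a',), 1, '$'), ('Y', ('b',), 1, '$')}))
```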


More on Reduce/Reduce Conflicts

• Consider the states
      [S’ → I S, $]               [S → id I, $]
      [S → I, $]                  [S → id I S, $]
      [S → I id, $]        ⇒id    [S → I, $]
      [S → I id S, $]             [S → I id, $]
                                  [S → I id S, $]
• Reduce/reduce conflict on input $
      S’ → S → id
      S’ → S → id S → id
• Better to rewrite the grammar as: S → ε | id S

Using Parser Generators

• Parser generators automatically construct the parsing DFA given a CFG
  – They use precedence declarations and default conventions to resolve conflicts
  – The parser algorithm is the same for all grammars (and is provided as a library function)
• But most parser generators do not construct the DFA as described before
  – Because the LR(1) parsing DFA has 1000s of states even for a simple language

LR(1) Parsing Tables are Big

• But many states are similar, e.g.
      state 1:  E → int I , $/+    (E → int on $, +)
      state 5:  E → int I , )/+    (E → int on ), +)
• Idea: merge the DFA states whose items differ only in the lookahead tokens
  – We say that such states have the same core
• We obtain
      state 1’:  E → int I , $/+/)    (E → int on $, +, ))

The Core of a Set of LR Items

• Definition: the core of a set of LR items is the set of first components
  – Without the lookahead terminals
• Example: the core of
      {[X → α I β, b], [Y → γ I δ, d]}
  is
      {X → α I β, Y → γ I δ}


LALR States

• Consider for example the LR(1) states
      {[X → α I, a], [Y → β I, c]}
      {[X → α I, b], [Y → β I, d]}
• They have the same core and can be merged
• The merged state contains:
      {[X → α I, a/b], [Y → β I, c/d]}
• These are called LALR(1) states
  – Stands for LookAhead LR
  – Typically 10 times fewer LALR(1) states than LR(1)

A LALR(1) DFA

• Repeat until all states have distinct cores
  – Choose two distinct states with the same core
  – Merge the states by creating a new one with the union of all the items
  – Point edges from predecessors to the new state
  – The new state points to all the previous successors

  [Diagram: states A, B, C, D, E, F; B and E have the same core and are merged into a single state BE]
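The core computation and the merge loop can be sketched as follows (items as (lhs, rhs, dot, lookahead) tuples; grouping by core replaces the pairwise merging described above and yields the same result):

```python
def core(state):
    # drop the lookahead component of every item in the state
    return frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _la) in state)

def merge_lalr(states):
    # group LR(1) states by core and union the items in each group
    merged = {}
    for st in states:
        merged.setdefault(core(st), set()).update(st)
    return list(merged.values())

# states 1 and 5 from the example: E -> int I with lookaheads $/+ and )/+
s1 = {('E', ('int',), 1, '$'), ('E', ('int',), 1, '+')}
s5 = {('E', ('int',), 1, ')'), ('E', ('int',), 1, '+')}
merged = merge_lalr([s1, s5])
print(len(merged))      # 1 -- same core, merged into a single state
print(len(merged[0]))   # 3 -- lookaheads $, +, )
```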

Conversion LR(1) to LALR(1): Example

  [Diagram: the 12-state LR(1) DFA for E → int | E + (E) collapses to an LALR(1) DFA by merging states with equal cores: 1 and 5 (E → int, now on $, +, )), 3 and 8, 4 and 9, 6 and 10, and 7 and 11 (E → E + (E), now on $, +, ))]

The LALR Parser Can Have Conflicts

• Consider for example the LR(1) states
      {[X → α I, a], [Y → β I, b]}
      {[X → α I, b], [Y → β I, a]}
• The merged LALR(1) state
      {[X → α I, a/b], [Y → β I, a/b]}
  has a new reduce/reduce conflict
• In practice such cases are rare

LALR vs. LR Parsing: Things to Keep in Mind

• LALR languages are not natural
  – They are an efficiency hack on LR languages
• Any reasonable programming language has a LALR(1) grammar
• LALR(1) parsing has become a standard for programming languages and for parser generators

A Hierarchy of Grammar Classes

  [Diagram: containment of grammar classes, from Andrew Appel, “Modern Compiler Implementation in ML”]

Semantic Actions in LR Parsing

• We can now illustrate how semantic actions are implemented for LR parsing
• Keep attributes on the stack
• On shifting a, push the attribute for a on the stack
• On reduce X → α
  – pop the attributes for α
  – compute the attribute for X
  – and push it on the stack

Performing Semantic Actions: Example

• Recall the example
      E → T + E1     { E.val = T.val + E1.val }
        | T          { E.val = T.val }
      T → int * T1   { T.val = int.val * T1.val }
        | int        { T.val = int.val }
• Consider the parsing of the string 4 * 9 + 6


Performing Semantic Actions: Example

Parsing 4 * 9 + 6:

  I int * int + int          shift
  int4 I * int + int         shift
  int4 * I int + int         shift
  int4 * int9 I + int        reduce T → int
  int4 * T9 I + int          reduce T → int * T
  T36 I + int                shift
  T36 + I int                shift
  T36 + int6 I               reduce T → int
  T36 + T6 I                 reduce E → T
  T36 + E6 I                 reduce E → T + E
  E42 I                      accept

Notes

• The previous example shows how synthesized attributes are computed by LR parsers
• It is also possible to compute inherited attributes in an LR parser
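Replaying the trace above with an explicit attribute stack; the shift/reduce helpers are hypothetical names for this sketch, and the reduce sequence is hard-wired to follow the trace rather than driven by a parsing table:

```python
stack = []   # entries are (symbol, attribute) pairs, as on the slides

def shift(sym, val=None):
    stack.append((sym, val))

def reduce(n, lhs, action):
    # pop the attributes for the RHS, compute the LHS attribute, push it
    rhs_vals = [stack.pop()[1] for _ in range(n)][::-1]
    stack.append((lhs, action(rhs_vals)))

# parse 4 * 9 + 6
shift('int', 4); shift('*'); shift('int', 9)
reduce(1, 'T', lambda v: v[0])            # T -> int        T.val = 9
reduce(3, 'T', lambda v: v[0] * v[2])     # T -> int * T    T.val = 36
shift('+'); shift('int', 6)
reduce(1, 'T', lambda v: v[0])            # T -> int        T.val = 6
reduce(1, 'E', lambda v: v[0])            # E -> T          E.val = 6
reduce(3, 'E', lambda v: v[0] + v[2])     # E -> T + E      E.val = 42
print(stack)                              # [('E', 42)]
```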

Notes on Parsing

• Parsing
  – A solid foundation: context-free grammars
  – A simple parser: LL(1)
  – A more powerful parser: LR(1)
  – An efficiency hack: LALR(1)
  – LALR(1) parser generators
• Next time we move on to semantic analysis

Supplement to LR Parsing

Strange Reduce/Reduce Conflicts due to LALR Conversion (and how to handle them)


Strange Reduce/Reduce Conflicts

• Consider the grammar
      S  → P R ,
      NL → N | N , NL
      P  → T | NL : T
      R  → T | N : T
      N  → id
      T  → id
  – P: parameters specification
  – R: result specification
  – N: a parameter or result name
  – T: a type name
  – NL: a list of names

Strange Reduce/Reduce Conflicts

• In P an id is a
  – N when followed by , or :
  – T when followed by id
• In R an id is a
  – N when followed by :
  – T when followed by ,
• This is an LR(1) grammar
• But it is not LALR(1). Why?
  – For obscure reasons

A Few LR(1) States

  State 1:                        State 2:
      P  → I T          id            R → I T          ,
      P  → I NL : T     id            R → I N : T      ,
      NL → I N          :             T → I id         ,
      NL → I N , NL     :             N → I id         :
      N  → I id         :
      N  → I id         ,
      T  → I id         id

  On id, state 1 goes to state 3 and state 2 goes to state 4:

  State 3:                        State 4:
      T → id I          id            T → id I         ,
      N → id I          :             N → id I         :
      N → id I          ,

  States 3 and 4 have the same core, so the LALR merge produces
      T → id I          id/,
      N → id I          :/,
  — an LALR reduce/reduce conflict on “,”

What Happened?

• Two distinct states were confused because they have the same core
• Fix: add dummy productions to distinguish the two confused states
• E.g., add
      R → id bogus
  – bogus is a terminal not used by the lexer
  – This production will never be used during parsing
  – But it distinguishes R from P


A Few LR(1) States After Fix

  State 1:                        State 2:
      P  → I T          id            R → I T          ,
      P  → I NL : T     id            R → I N : T      ,
      NL → I N          :             R → I id bogus   ,
      NL → I N , NL     :             T → I id         ,
      N  → I id         :             N → I id         :
      N  → I id         ,
      T  → I id         id

  On id:

  State 3:                        State 4:
      T → id I          id            T → id I         ,
      N → id I          :             N → id I         :
      N → id I          ,             R → id I bogus   ,

  Different cores ⇒ no LALR merging