LR Parsing, LALR Parser Generators Lecture 10 Outline • Review Of

Outline • Review of bottom-up parsing LR Parsing, LALR Parser Generators • Computing the parsing DFA • Using parser generators Lecture 10 Prof. Fateman CS 164 Lecture 10 1 Prof. Fateman CS 164 Lecture 10 2 Bottom-up Parsing (Review) The Shift and Reduce Actions (Review) • A bottom-up parser rewrites the input string • Recall the CFG: E → int | E + (E) to the start symbol • A bottom-up parser uses two kinds of actions: • The state of the parser is described as •Shiftpushes a terminal from input on the stack α I γ – α is a stack of terminals and non-terminals E + (I int ) ⇒ E + (int I ) – γ is the string of terminals not yet examined •Reducepops 0 or more symbols off the stack (the rule’s rhs) and pushes a non-terminal on • Initially: I x1x2 . xn the stack (the rule’s lhs) E + (E + ( E ) I ) ⇒ E +(E I ) Prof. Fateman CS 164 Lecture 10 3 Prof. Fateman CS 164 Lecture 10 4 LR(1) Parsing. An Example Key Issue: When to Shift or Reduce? int int + (int) + (int)$ shift 0 1 I int + (int) + (int)$ E → int • Idea: use a finite automaton (DFA) to decide E E → int I ( on $, + E I + (int) + (int)$ shift(x3) when to shift or reduce 2 + 3 4 E + (int I ) + (int)$ E → int – The input is the stack accept E int E + (E ) + (int)$ shift – The language consists of terminals and non-terminals on $ I ) E → int E + (E) I + (int)$ E → E+(E) 7 6 5 on ), + E I + (int)$ shift (x3) • We run the DFA on the stack and we examine E → E + (E) + E + (int )$ E → int the resulting state X and the token tok after I on $, + int I E + (E )$ shift –If X has a transition labeled tok then shift ( I 8 9 β E + (E) I $ E → E+(E) –If X is labeled with “A → on tok”then reduce E + E I $ accept 10 11 E → E + (E) Prof. Fateman CS 164 Lecture 10 5 ) on ), + Representing the DFA Representing the DFA. Example • Parsers represent the DFA as a 2D table • The table for a fragment of our DFA: – Recall table-driven lexical analysis int + ( ) $ E ( • Lines correspond to DFA states 3 4 … int 3s4 • Columns correspond to terminals and non- E 4s5 g6 terminals 5 6 5 rE→ int rE→ int • Typically columns are split into: E → int 6s8 s7 – Those for terminals: the action table ) on ), + 7 rE→ E+(E) rE→ E+(E) – Those for non-terminals: the goto table … sk is shift and goto state k 7 rX → α is reduce E → E + (E) gk is goto state k, Prof. Fateman CS 164 Lecture 10 7 on $, + Prof. Fateman CS 164 Lecture 10 8 The LR Parsing Algorithm The LR Parsing Algorithm • After a shift or reduce action we rerun the Let I = w$ be initial input DFA on the entire stack Let j = 0 Let DFA state 0 be the start state – This is wasteful, since most of the work is repeated Let stack = 〈 dummy, 0 〉 repeat • Remember for each stack element on which case action[top_state(stack), I[j]] of state it brings the DFA; use extra memory. shift k: push 〈 I[j++], k 〉 reduce X → A: pop |A| pairs, • LR parser maintains a stack push 〈X, Goto[top_state(stack), X]〉〈 sym1, state1 〉 . 〈 symn, staten 〉 accept: halt normally statek is the final state of the DFA on sym1 …symk error: halt and report error Prof. Fateman CS 164 Lecture 10 9 Prof. Fateman CS 164 Lecture 10 10 Key Issue: How is the DFA Constructed? LR(1) Items • The stack describes the context of the parse • An LR(1) item is a pair: – What non-terminal we are looking for X →α•β, a – What production rhs we are looking for –X → αβ is a production – What we have seen so far from the rhs –ais a terminal (the lookahead terminal) – LR(1) means 1 lookahead terminal • Each DFA state describes several such contexts •[X →α•β, a] describes a context of the parser – E.g., when we are looking for non-terminal E, we – We are trying to find an X followed by an a, and might be looking either for an int or a E + (E) rhs – We have (at least) α already on top of the stack – Thus we need to see next a prefix derived from βa Prof. Fateman CS 164 Lecture 10 11 Prof. Fateman CS 164 Lecture 10 12 Note Convention •The symbol I was used before to separate the • We add to our grammar a fresh new start stack from the rest of input symbol S and a production S → E – α I γ, where α is the stack and γ is the remaining –Where E is the old start symbol string of terminals •In items • is used to mark a prefix of a • The initial parsing context contains: production rhs: S → •E, $ →α β X • , a – Trying to find an S as a string derived from E$ β –Here might contain terminals as well –The stack is empty • In both case the stack is on the left Prof. Fateman CS 164 Lecture 10 13 Prof. Fateman CS 164 Lecture 10 14 LR(1) Items (Cont.) LR(1) Items (Cont.) • In context containing •Consider the item E → E + • ( E ), + E → E + (• E ) , + –If ( follows then we can perform a shift to context • We expect a string derived from E ) + containing • There are two productions for E → • E E + ( E ), + E → int and E → E + ( E) • In context containing • We describe this by extending the context E → E + ( E ) •, + with two more items: – We can perform a reduction with E → E + ( E ) E → • int, ) –But only if a + follows E → • E + ( E ) , ) Prof. Fateman CS 164 Lecture 10 15 Prof. Fateman CS 164 Lecture 10 16 The Closure Operation Constructing the Parsing DFA (1) • Extending the context with items is called the • Construct the start context: Closure({S → •E, $}) closure operation. S → •E, $ E → •E+(E), $ Closure(Items) = E → •int, $ repeat E → •E+(E), + for each [X → α•Yβ, a] in Items E → •int, + for each production Y → γ • We abbreviate as: for each b ∈ First(βa) S → •E, $ γ add [Y → • , b] to Items E → •E+(E), $/+ until Items is unchanged E → •int, $/+ Prof. Fateman CS 164 Lecture 10 17 Prof. Fateman CS 164 Lecture 10 18 Constructing the Parsing DFA (2) The DFA Transitions • A DFA state is a closed set of LR(1) items • A state “State” that contains [X → α•yβ, b] has a transition labeled y to a state that the • The start state contains [S → •E, $] items “Transition(State, y)” – y can be a terminal or a non-terminal • A state that contains [X → α•, b] is labeled Transition(State, y) α with “reduce with X → on b” Items ← ∅ for each [X → α•yβ, b] ∈ State • And now the transitions … add [X → αy•β, b] to Items return Closure(Items) Prof. Fateman CS 164 Lecture 10 19 Prof. Fateman CS 164 Lecture 10 20 Constructing the Parsing DFA. Example. LR Parsing Tables. Notes → • 0 1 S E, $ E → int•, $/+ E → int • Parsing tables (i.e. the DFA) can be E → •E+(E), $/+ int on $, + constructed automatically for a CFG E → •int, $/+ E → E+• (E), $/+ 3 E + 2 • Why study this at all in CS164? We still need S → E•, $ ( to understand the construction to work with E → E•+(E), $/+ E → E+(•E), $/+ 4 parser generators accept E E → •E+(E), )/+ on $ E → •int, )/+ – E.g., they report errors in terms of sets of items E → E+(E•), $/+ int 5 6 • What kind of errors can we expect? E → E•+(E), )/+ E → int•, )/+ E → int on ), + and so on… Prof. Fateman CS 164 Lecture 10 21 Prof. Fateman CS 164 Lecture 10 22 Shift/Reduce Conflicts Shift/Reduce Conflicts • If a DFA state contains both • They are a typical symptom if there is an [X → α•aβ, b] and [Y → γ•, a] ambiguity in the grammar • Classic example: the dangling else S → if E then S | if E then S else S | OTHER •Then on input “a” we could either • Will have DFA state containing –Shift into state [X → αa•β, b], or [S → if E then S•, else] – Reduce with Y → γ [S → if E then S• else S, x] •Ifelse follows then we can shift or reduce • This is called a shift-reduce conflict • Default (bison, CUP, JLALR, etc.) is to shift – Default behavior is right in this case Prof. Fateman CS 164 Lecture 10 23 Prof. Fateman CS 164 Lecture 10 24 More Shift/Reduce Conflicts More Shift/Reduce Conflicts • Consider the ambiguous grammar Some parser generators (YACC, BISON) → E E + E | E * E | int provide precedence declarations. • We will have the states containing – Precedence left PLUS, → → [E E * • E, +] [E E * E•, +] – Precedence left TIMES [E → • E + E, +] ⇒E [E → E • + E, +] – Precedence right EXP …… • Again we have a shift/reduce on input + •Bison, YACC – We need to reduce (* binds more tightly than +) – Declare precedence and associativity: – Solution: somehow impose the precedence of * and %left + + %left * Prof. Fateman CS 164 Lecture 10 25 Prof. Fateman CS 164 Lecture 10 26 More Shift/Reduce Conflicts Reduce/Reduce Conflicts Our LALR generator doesn’t do this. Instead, we • If a DFA state contains both “Stratify” the grammar. (Less explanation!) [X → α•, a] and [Y → β•, a] E → E + E | E * E | int ;; original New –Then on input “a”we don’t know which E → E + E1 | E1 ;; E1+E would be right associative production to reduce E1 → E1 * int | int (Many “layers” may be necessary for elaborate languages.

LR Parsing, LALR Parser Generators Lecture 10 Outline • Review Of

Validating LR(1) Parsers

CLR(1) and LALR(1) Parsers

Parser Generation Bottom-Up Parsing LR Parsing Constructing LR Parser

Elkhound: a Fast, Practical GLR Parser Generator

Unit-5 Parsers

Bottom-Up Parsing Swarnendu Biswas

Compiler Construction

Practical Dynamic Grammars for Dynamic Languages Lukas Renggli, St´Ephane Ducasse, Tudor Gˆırba,Oscar Nierstrasz

Parser Generators

LNCS 9031, Pp

Principled Procedural Parsing Nicolas Laurent

One Parser to Rule Them All