LR Parsing
Lecture Notes by Profs Aiken and Necula (UCB)

Outline
• Review of SLR parsing
• Limits of SLR parsing
• LR parsing
• LALR parsing
• Implementation of semantic actions
• Using parser generators

CS780 (Prasad) L16LR

Review of SLR(1) Parsing
• LR parser maintains a stack
      ⟨sym1, state1⟩ . . . ⟨symn, staten⟩
  staten is the final state of the DFA on sym1 … symn
• Goto table: the transition function of the DFA
  – Goto[i, A] = j if statei --A--> statej
• Action table: for each state and terminal:
      Action[i, a] = Shift j | Reduce X → α | Accept | Error

LR Parsing Algorithm
    Let I = w$ be initial input
    Let j = 0
    Let DFA state 1 have item S' → . S
    Let stack = ⟨dummy, 1⟩
    repeat
        case action[top_state(stack), I[j]] of
            shift k: push ⟨I[j++], k⟩
            reduce X → α:
                pop |α| pairs,
                push ⟨X, Goto[top_state(stack), X]⟩
            accept: halt normally
            error: halt and report error

Review. Items
• An item [X → α.β] says that
  – the parser is looking for an X
  – it has an α on top of the stack
  – it expects to find a string derived from β next in the input
• Notes:
  – [X → α.aβ] means that a should follow. Then we can shift it and still have a viable prefix.
  – [X → α.] means that we could reduce X
    • But this is not always a good idea!

SLR(1) Action Table
For each state si and terminal a
  – If si has item X → α.aβ and Goto[i, a] = j then Action[i, a] = shift j
  – If si has item S' → S. then Action[i, $] = accept
  – If si has item X → α. and a ∈ Follow(X) and X ≠ S' then Action[i, a] = reduce X → α
  – Otherwise, Action[i, a] = error

Limits of SLR Parsing
• SLR(1) is the simplest LR parsing method
• SLR(1) is almost powerful enough, but …
• … some common programming language constructs are not SLR(1).
• Consider the grammar
      S → L = E | E
      L → * E | id
      E → L
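Because the SLR(1) table reduces by X → α whenever the next token is in Follow(X), it helps to see the Follow sets of the grammar above. The following is a minimal Python sketch, not part of the original notes. The names `GRAMMAR`, `first_sets` and `follow_sets` are illustrative, and the code assumes a grammar without ε-productions, which holds for this example.

```python
# Sketch: FIRST and FOLLOW sets for the example grammar.
# Assumption: no production has an empty right-hand side, so nullable
# symbols do not need to be handled.
GRAMMAR = {
    "S": [["L", "=", "E"], ["E"]],
    "L": [["*", "E"], ["id"]],
    "E": [["L"]],
}
START = "S"


def first_sets(grammar):
    """Fixed-point computation of FIRST for every non-terminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    """Fixed-point computation of FOLLOW for every non-terminal."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, prods in grammar.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # terminals have no FOLLOW set
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[lhs]  # sym ends the production
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR, START)
print(sorted(FOLLOW["E"]))  # ['$', '=']
```

Since `=` is in Follow(E), an SLR(1) table will contain a reduce by E → L on `=` in every state that holds the item E → L., regardless of how that state was reached.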
Limits of SLR Parsing (Cont.)
• Consider two states of the DFA for recognizing viable prefixes

      First state:            Second state (reached on L):
      S' → . S                S → L . = E
      S → . L = E             E → L .
      S → . E
      L → . * E
      L → . id
      E → . L

• SLR(1) parser in the second state on input “=” has a shift-reduce conflict:
  – shift (item S → L . = E)
  – reduce by E → L (since “=” ∈ Follow(E))

What's The Problem?
• The grammar is not SLR(1), but why?
• Focus on the reduce move in the second state
  – We are in the context of S → E → L
  – No = can follow E in this context
  – Even though = ∈ Follow(E) (in S → L = E → * E = E)
  – The reduce move should not happen if an = follows in this context.

What's The Problem? (Cont.)
• Problem: the SLR table has too many reduce actions.
  – Using Follow is too coarse.
• In any given context, only some elements of Follow can actually follow a non-terminal.
• For example: Follow(E) = {=, $}, but
  – In context S → E only $ can follow E
  – In context S → L = E → * E1 = E only = can follow E1

One Way to Fix The Problem: LR(1) Items
• Idea:
  – refine Follow based on context.
  – The context is described through items.
• An LR(1) item is a pair
      [X → α.β, a]
  where X → αβ is a production and a is the lookahead token or $
• LR(k) is similar but with k tokens of lookahead
  – In practice, k = 1

LR(1) Items. Intuition
• [X → α.β, a] describes a state of the parser:
  – We are trying to find an X, and
  – We have α already on top of the stack, and
  – We expect to see a prefix derived from βa
• Back to reduce actions: have an [X → α., a]
  – Perform the reduce only if next token is a!
  – Will have fewer reduce actions
  – Not for all b ∈ Follow(X)

Constructing Sets of LR(1) Items (1)
• Similar to construction for LR(0).
• The states of the NFA are the LR(1) items of G.
• The start state is [S' → . S, $]

Constructing Sets of LR(1) Items (2)
1. For each LR(1) item [Y → α.Xβ, a]
     Add an X-transition [Y → α.Xβ, a] --X--> [Y → αX.β, a]
2. For each LR(1) item [Y → α.Xβ, a]
     For each production X → γ
       For each terminal b ∈ First(βa)
         Add an ε-transition [Y → α.Xβ, a] --ε--> [X → .γ, b]

NFA for Viable Prefixes in Detail (1)–(4)
[Figure: the NFA for the example grammar, built up over four slides. The part shown contains these transitions:]
      [S' → . S, $]      --S-->  [S' → S ., $]
      [S' → . S, $]      --ε-->  [S → . L = E, $]
      [S' → . S, $]      --ε-->  [S → . E, $]
      [S → . L = E, $]   --L-->  [S → L . = E, $]
      [S → . L = E, $]   --ε-->  [L → . id, =]
      [S → . L = E, $]   --ε-->  [L → . * E, =]
      [S → . E, $]       --ε-->  [E → . L, $]
      [E → . L, $]       --L-->  [E → L ., $]
      [E → . L, $]       --ε-->  [L → . * E, $]
      [E → . L, $]       --ε-->  [L → . id, $]

An Example Revisited
• Consider the state from last slide
      [S → L . = E, $]
      [E → L ., $]
• LR(1) parser on input “=”
  – only shift (item S → L . = E)

Constructing LR(1) Parsing Tables
1. Add a dummy S' → S production
2. Construct the NFA of LR(1) items as before
3. Convert the NFA into a DFA
4. Goto is defined exactly as before:
     Goto[i, A] = j if statei --A--> statej
     (the transition function of the DFA)

Constructing LR(1) Parsing Tables (Cont.)
5. For each state si of the DFA and terminal a
  – If si has item [X → α.aβ, c] and Goto[i, a] = j then action[i, a] = shift j
  – If si has item [X → α., a] and X ≠ S' then action[i, a] = reduce X → α
  – If si has item [S' → S., $] then action[i, $] = accept
  – Otherwise, action[i, a] = error
• LR(1) grammar ⇔ action[i, a] uniquely defined

LALR Parsing
• Two bottom-up parsing methods: SLR and LR
• Which one do we use? Neither
  – SLR is not powerful enough.
  – LR parsing tables are too big (1000's of states vs. 100's of states for SLR).
• In practice, use LALR(1)
  – Stands for Look-Ahead LR
  – A compromise between SLR(1) and LR(1)
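The LR(1) item construction described earlier (ε-transitions whose lookaheads come from First(βa), followed by X-transitions) can be tried out on the grammar S → L = E | E, L → * E | id, E → L. The following is a minimal Python sketch, not taken from the original notes. It computes DFA states directly as closures instead of building the NFA first. The names `closure`, `goto` and `first_of` and the tuple encoding of items are assumptions made for illustration, and `first_of` is only valid because the grammar has no ε-productions.

```python
# Sketch: LR(1) closure and goto for the example grammar.
# An item is a tuple (lhs, rhs, dot_position, lookahead).
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "E"), ("E",)],
    "L": [("*", "E"), ("id",)],
    "E": [("L",)],
}


def first_of(symbols):
    """FIRST of a non-empty symbol string (no epsilon productions assumed)."""
    seen, todo, result = set(), [symbols[0]], set()
    while todo:
        sym = todo.pop()
        if sym not in GRAMMAR:
            result.add(sym)  # terminal (or $)
        elif sym not in seen:
            seen.add(sym)
            todo.extend(rhs[0] for rhs in GRAMMAR[sym])
    return result


def closure(items):
    """Add [X -> .gamma, b] for each [Y -> alpha.X beta, a], b in First(beta a)."""
    result = set(items)
    todo = list(items)
    while todo:
        lhs, rhs, dot, la = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            x = rhs[dot]
            for b in first_of(rhs[dot + 1:] + (la,)):
                for gamma in GRAMMAR[x]:
                    item = (x, gamma, 0, b)
                    if item not in result:
                        result.add(item)
                        todo.append(item)
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol`, then take the closure."""
    moved = {(l, r, d + 1, a) for (l, r, d, a) in state
             if d < len(r) and r[d] == symbol}
    return closure(moved)


start = closure({("S'", ("S",), 0, "$")})
after_L = goto(start, "L")

# Tokens on which the state reached on L would reduce, and on which it shifts.
reduce_on = {a for (l, r, d, a) in after_L if d == len(r)}
shift_on = {r[d] for (l, r, d, a) in after_L
            if d < len(r) and r[d] not in GRAMMAR}
print(sorted(reduce_on), sorted(shift_on))  # ['$'] ['=']
```

The state reached on L reduces by E → L only on `$` and shifts on `=`, so the conflict seen in the SLR(1) table does not arise.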
LALR Parsing (Cont.)
• Rough intuition: A LALR(1) parser for G has
  – The number of states of an SLR parser.
  – Some of the lookahead discrimination of LR(1).
• Idea: construct the DFA for the LR(1).
• Then merge the DFA states whose items differ only in the lookahead tokens
  – We say that such states have the same core.

The Core of a Set of LR Items
• Definition: The core of a set of LR items is the set of first components.
• Example: the core of
      { [X → α.β, b], [Y → γ.δ, d] }
  is
      { X → α.β, Y → γ.δ }
• The core of an LR item is an LR(0) item.

A LALR(1) DFA
• Repeat until all states have distinct core.
  – Choose two distinct states with same core.
  – Merge the states by creating a new one with the union of all the items.
  – Point edges from predecessors to new state.
  – New state points to all the previous successors.
[Figure: a DFA with states A, B, C, D, E, F before merging, and the resulting DFA with states A, BE, C, D, F after merging B and E.]

The LALR Parser Can Have Conflicts
• Consider for example the LR(1) states
      { [X → α., a], [Y → β., b] }
      { [X → α., b], [Y → β., a] }
• And the merged LALR(1) state
      { [X → α., a/b], [Y → β., a/b] }
• Has a new reduce-reduce conflict.
• In practice such cases are rare.

LALR vs. LR Parsing
• LALR languages are not natural.
  – They are an efficiency hack on LR languages
• Any reasonable programming language has an LALR(1) grammar.
• LALR(1) has become a standard for programming languages and for parser generators.

A Hierarchy of Grammar Classes
[Figure not recovered.]

Semantic Actions
• We can now illustrate how semantic actions are implemented for LR parsing.
• Keep attributes on the stack.
• On shift a, push attribute for a on stack.
• On reduce X → α
  – pop attributes for α
  – compute attribute for X
  – and push it on the stack

Performing Semantic Actions. Example
• Recall the example from earlier lecture
      E → T + E1      { E.val = T.val + E1.val }
        | T           { E.val = T.val }
      T → int * T1    { T.val = int.val * T1.val }
        | int         { T.val = int.val }
• Consider the parsing of the string 3 * 5 + 8

Performing Semantic Actions. Example
      Stack | Input                 Action
      | int * int + int             shift
      int3 | * int + int            shift
      int3 * | int + int            shift
      int3 * int5 | + int           reduce T → int
      int3 * T5 | + int             reduce T → int * T
      T15 | + int                   shift
      T15 + | int                   shift

Notes
• The previous discussion shows how synthesized attributes are computed by LR parsers.
• It is also possible to compute inherited attributes in an LR parser.
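The attribute-stack discipline above (push an attribute on shift; on reduce, pop the attributes of the right-hand side, compute the attribute of the left-hand side, and push it) can be sketched in Python. This is an illustrative sketch, not the notes' implementation. It does not build a parse table. Instead it replays a fixed list of parser actions for 3 * 5 + 8: the first seven follow the trace above, and the last four (shift, reduce T → int, reduce E → T, reduce E → T + E) are an assumed continuation of that trace. The names `PRODUCTIONS` and `run` are hypothetical.

```python
# Sketch: computing synthesized attributes on the parse stack.
# Each production maps to (lhs, number of rhs symbols, semantic action);
# the action receives the attributes of the rhs symbols, left to right.
PRODUCTIONS = {
    "E -> T + E":   ("E", 3, lambda t, _plus, e: t + e),
    "E -> T":       ("E", 1, lambda t: t),
    "T -> int * T": ("T", 3, lambda i, _star, t: i * t),
    "T -> int":     ("T", 1, lambda i: i),
}


def run(tokens, actions):
    """Replay shift/reduce actions while maintaining (symbol, attribute) pairs."""
    stack = []
    pos = 0
    for act in actions:
        if act == "shift":
            stack.append(tokens[pos])  # push the token with its attribute
            pos += 1
        else:
            lhs, n, semantic_action = PRODUCTIONS[act]
            attrs = [attr for _, attr in stack[-n:]]  # attributes of the rhs
            del stack[-n:]                            # pop |rhs| pairs
            stack.append((lhs, semantic_action(*attrs)))
    return stack


TOKENS = [("int", 3), ("*", None), ("int", 5), ("+", None), ("int", 8)]
ACTIONS = [
    "shift", "shift", "shift",
    "T -> int", "T -> int * T",
    "shift", "shift",
    # Assumed continuation of the trace:
    "T -> int", "E -> T", "E -> T + E",
]

print(run(TOKENS, ACTIONS))  # [('E', 23)]
```

In a real LR parser the action list would come from the Action table rather than being supplied by hand. The stack handling on shift and reduce is the part this sketch is meant to show.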
