<<

Outline

• Review LL parsing Introduction to Bottom-Up Parsing • Shift-reduce parsing

• The LR parsing

• Constructing LR parsing tables

Compiler Design 1 (2011) 2

Top-Down Parsing: Review Top-Down Parsing: Review

• Top-down parsing expands a from • Top-down parsing expands a parse tree from the start symbol to the leaves the start symbol to the leaves – Always expand the leftmost non-terminal – Always expand the leftmost non-terminal

E E • The leaves at any point T E T E form a string βAγ + + – β contains only terminals T – The input string is βbδ int * – The prefix β matches – The next token is b int * int + int int * int + int

Compiler Design 1 (2011) 3 Compiler Design 1 (2011) 4 Top-Down Parsing: Review Top-Down Parsing: Review

• Top-down parsing expands a parse tree from • Top-down parsing expands a parse tree from the start symbol to the leaves the start symbol to the leaves – Always expand the leftmost non-terminal – Always expand the leftmost non-terminal

E E • The leaves at any point • The leaves at any point T E form a string βAγ T E form a string βAγ + + – β contains only terminals – β contains only terminals T T T – The input string is βbδ T – The input string is βbδ int * int * – The prefix β matches – The prefix β matches int – The next token is b int int – The next token is b int * int + int int * int + int

Compiler Design 1 (2011) 5 Compiler Design 1 (2011) 6

Predictive Parsing: Review Constructing Predictive Parsing Tables

• A predictive parser is described by a table 1. Consider the state S →* βAγ

– For each non-terminal A and for each token b we – With b the next token specify a production A →α – Trying to match βbδ – When trying to expand A we use A →αif b follows There are two possibilities: next • b belongs to an expansion of A

• Once we have the table • Any A →αcan be used if b can start a string derived from α – The parsing algorithm is simple and fast ∈ α • We say that b First( ) – No is necessary

Or…

Compiler Design 1 (2011) 7 Compiler Design 1 (2011) 8 Constructing Predictive Parsing Tables (Cont.) Computing First Sets

2. b does not belong to an expansion of A Definition – The expansion of A is empty and b belongs to an First(X) = { b | X →* bα } ∪ { ε | X →* ε } expansion of γ Algorithm sketch – Means that b can appear after A in a derivation of the form S →* βAb ω 1. First(b) = { b }

– We say that b ∈ Follow(A) in this case 2. ε∈First(X) if X →εis a production

3. ε∈First(X) if X → A1 …An – What productions can we use in this case? and ε∈First(Ai) for 1 ≤ i ≤ n

• Any A → α can be used if α can expand to ε 4. First(α) ⊆ First(X) if X → A1 …An α • We say that ε ∈ First(A) in this case and ε∈First(A ) for 1 ≤ i ≤ n i

Compiler Design 1 (2011) 9 Compiler Design 1 (2011) 10

First Sets: Example Computing Follow Sets

• Recall the • Definition E → T X X → + E | ε Follow(X) = { b | S →* β X b δ } T → ( E ) | int Y Y → * T | ε • First sets • Intuition

First( ( ) = { ( } First( T ) = {int, ( } – If X → A B then First(B) ⊆ Follow(A)

First( ) ) = { ) } First( E ) = {int, ( } and Follow(X) ⊆ Follow(B) First( int) = { int } First( X ) = { +, ε } – Also if B →* ε then Follow(X) ⊆ Follow(A)

First( + ) = { + } First( Y ) = { *, ε } – If S is the start symbol then $ ∈ Follow(S)

First( * ) = { * }

Compiler Design 1 (2011) 11 Compiler Design 1 (2011) 12 Computing Follow Sets (Cont.) Follow Sets: Example

Algorithm sketch • Recall the grammar 1. $ ∈ Follow(S) E → T X X → + E | ε

2. First( β) - {ε} ⊆ Follow(X) T → ( E ) | int Y Y → * T | ε – For each production A →αX β • Follow sets

3. Follow(A) ⊆ Follow(X) Follow( + ) = { int, ( } Follow( * ) = { int, ( }

– For each production A →αX β where ε∈First(β) Follow( ( ) = { int, ( } Follow( E ) = { ), $ } Follow( X ) = { $, ) } Follow( T ) = { +, ) , $ } Follow( ) ) = { + , ) , $ } Follow( Y ) = { +, ) , $ } Follow( int) = { *, +, ) , $ }

Compiler Design 1 (2011) 13 Compiler Design 1 (2011) 14

Constructing LL(1) Parsing Tables Constructing LL(1) Tables: Example

• Construct a parsing table T for CFG G • Recall the grammar E → T X X → + E | ε • For each production A →αin G do: T → ( E ) | int Y Y → * T | ε – For each terminal b ∈ First(α) do • T[A, b] = α • Where in the line of Y we put Y →* T ?

– If ε∈First(α), for each b ∈ Follow(A) do – In the lines of First( *T) = { * } • T[A, b] = α

– If ε∈First(α) and $ ∈ Follow(A) do • Where in the line of Y we put Y →ε ? α • T[A, $] = – In the lines of Follow(Y) = { $, +, ) }

Compiler Design 1 (2011) 15 Compiler Design 1 (2011) 16 Notes on LL(1) Parsing Tables

• If any entry is multiply defined then G is not LL(1) Bottom Up Parsing – If G is ambiguous – If G is left recursive – If G is not left-factored – And in other cases as well

• For some there is a simple parsing strategy: Predictive parsing • Most programming grammars are not LL(1) • Thus, we need more powerful parsing strategies

Compiler Design 1 (2011) 17

Bottom-Up Parsing An Introductory Example

• Bottom-up parsing is more general than top- • LR parsers don’t need left-factored grammars down parsing and can also handle left-recursive grammars – And just as efficient – Builds on ideas in top-down parsing • Consider the following grammar: – Preferred method in practice E → E + ( E ) | int • Also called LR parsing – L means that tokens are read left to right – Why is this not LL(1)?

– R means that it constructs a rightmost derivation ! • Consider the string: int + ( int ) + ( int )

Compiler Design 1 (2011) 19 Compiler Design 1 (2011) 20 The Idea A Bottom-up Parse in Detail (1)

• LR parsing reduces a string to the start symbol by inverting productions: int + (int) + (int)

str w input string of terminals repeat – Identify β in str such that A →βis a production (i.e., str = αβγ) – Replace β by A in str (i.e., str w = α A γ) until str = S (the start symbol) OR all possibilities are exhausted int + ( int ) + ( int )

Compiler Design 1 (2011) 21 Compiler Design 1 (2011) 22

A Bottom-up Parse in Detail (2) A Bottom-up Parse in Detail (3)

int + (int) + (int) int + (int) + (int) E + (int) + (int) E + (int) + (int) E + (E) + (int)

E E E

int + ( int ) + ( int ) int + ( int ) + ( int )

Compiler Design 1 (2011) 23 Compiler Design 1 (2011) 24 A Bottom-up Parse in Detail (4) A Bottom-up Parse in Detail (5)

int + (int) + (int) int + (int) + (int) E + (int) + (int) E + (int) + (int) E + (E) + (int) E + (E) + (int)

E + (int) E E + (int) E E + (E)

E E E E E

int + ( int ) + ( int ) int + ( int ) + ( int )

Compiler Design 1 (2011) 25 Compiler Design 1 (2011) 26

A Bottom-up Parse in Detail (6) Important Fact #1

E int + (int) + (int) Important Fact #1 about bottom-up parsing: E + (int) + (int) E + (E) + (int) An LR parser traces a rightmost derivation in E + (int) E E + (E) reverse E

A rightmost E E E derivation in reverse int + ( int ) + ( int )

Compiler Design 1 (2011) 27 Compiler Design 1 (2011) 28 Where Do Reductions Happen Notation

Important Fact #1 has an interesting • Idea: Split string into two substrings consequence: – Right substring is as yet unexamined by parsing – Let αβγ be a step of a bottom-up parse (a string of terminals) – Assume the next reduction is by using A→β – Left substring has terminals and non-terminals – Then γ is a string of terminals • The dividing point is marked by a I

Why? Because αAγ→αβγis a step in a right- – The I is not part of the string

most derivation • Initially, all input is unexamined: Ix1x2 . . . xn

Compiler Design 1 (2011) 29 Compiler Design 1 (2011) 30

Shift-Reduce Parsing Shift

Bottom-up parsing uses only two kinds of actions: Shift: Move I one place to the right – Shifts a terminal to the left string

Shift E + (I int ) ⇒ E + (int I )

Reduce In general: ABC I xyz ⇒ ABCx I yz

Compiler Design 1 (2011) 31 Compiler Design 1 (2011) 32 Reduce Shift-Reduce Example

Reduce: Apply an inverse production at the right E → E + ( E ) | int I int + (int) + (int)$ shift end of the left string – If E → E + ( E ) is a production, then

E + ( E + ( E ) I ) ⇒ E + ( E I )

In general, given A → xy, then: Cbxy I ijk ⇒ CbA I ijk

int + ( int ) + ( int )

Compiler Design 1 (2011) 33

Shift-Reduce Example Shift-Reduce Example

E → E + ( E ) | int E → E + ( E ) | int I int + (int) + (int)$ shift I int + (int) + (int)$ shift int I + (int) + (int)$ reduce E → int int I + (int) + (int)$ reduce E → int E I + (int) + (int)$ shift 3 times

E

int + ( int ) + ( int ) int + ( int ) + ( int ) Shift-Reduce Example Shift-Reduce Example

E → E + ( E ) | int E → E + ( E ) | int I int + (int) + (int)$ shift I int + (int) + (int)$ shift int I + (int) + (int)$ reduce E → int int I + (int) + (int)$ reduce E → int E I + (int) + (int)$ shift 3 times E I + (int) + (int)$ shift 3 times E + (int I ) + (int)$ reduce E → int E + (int I ) + (int)$ reduce E → int E + (E I ) + (int)$ shift

E E E

int + ( int ) + ( int ) int + ( int ) + ( int )

Shift-Reduce Example Shift-Reduce Example

E → E + ( E ) | int E → E + ( E ) | int I int + (int) + (int)$ shift I int + (int) + (int)$ shift int I + (int) + (int)$ reduce E → int int I + (int) + (int)$ reduce E → int E I + (int) + (int)$ shift 3 times E I + (int) + (int)$ shift 3 times E + (int I ) + (int)$ reduce E → int E + (int I ) + (int)$ reduce E → int E + (E I ) + (int)$ shift E + (E I ) + (int)$ shift E E + (E) I + (int)$ reduce E → E + (E) E + (E) I + (int)$ reduce E → E + (E) E I + (int)$ shift 3 times

E E E E

int + ( int ) + ( int ) int + ( int ) + ( int ) Shift-Reduce Example Shift-Reduce Example

E → E + ( E ) | int E → E + ( E ) | int I int + (int) + (int)$ shift I int + (int) + (int)$ shift int I + (int) + (int)$ reduce E → int int I + (int) + (int)$ reduce E → int E I + (int) + (int)$ shift 3 times E I + (int) + (int)$ shift 3 times E + (int I ) + (int)$ reduce E → int E + (int I ) + (int)$ reduce E → int E + (E I ) + (int)$ shift E E + (E I ) + (int)$ shift E E + (E) I + (int)$ reduce E → E + (E) E + (E) I + (int)$ reduce E → E + (E) E I + (int)$ shift 3 times E I + (int)$ shift 3 times E + (int I )$ reduce E → int E + (int I )$ reduce E → int E + (E I )$ shift E E E E E

int + ( int ) + ( int ) int + ( int ) + ( int )

Shift-Reduce Example Shift-Reduce Example

E → E + ( E ) | int E → E + ( E ) | int I int + (int) + (int)$ shift I int + (int) + (int)$ shift E int I + (int) + (int)$ reduce E → int int I + (int) + (int)$ reduce E → int E I + (int) + (int)$ shift 3 times E I + (int) + (int)$ shift 3 times E + (int I ) + (int)$ reduce E → int E + (int I ) + (int)$ reduce E → int E + (E I ) + (int)$ shift E E + (E I ) + (int)$ shift E E + (E) I + (int)$ reduce E → E + (E) E + (E) I + (int)$ reduce E → E + (E) E I + (int)$ shift 3 times E I + (int)$ shift 3 times E + (int I )$ reduce E → int E + (int I )$ reduce E → int E + (E I )$ shift E + (E I )$ shift E E E E E + (E) I $ reduce E → E + (E) E E + (E) I $ reduce E → E + (E) E E I $ accept int + ( int ) + ( int ) int + ( int ) + ( int ) The Stack Key Question: To Shift or to Reduce?

• Left string can be implemented by a stack Idea: use a finite automaton (DFA) to decide – Top of the stack is the I when to shift or reduce – The input is the stack

• Shift pushes a terminal on the stack – The language consists of terminals and non-terminals

• We run the DFA on the stack and we examine • Reduce pops 0 or more symbols off of the the resulting state X and the token tok after I stack (production RHS) and pushes a non- – If X has a transition labeled tok then shift terminal on the stack (production LHS) – If X is labeled with “A →βon tok”then reduce

Compiler Design 1 (2011) 45 Compiler Design 1 (2011) 46

LR(1) Parsing: An Example Representing the DFA

int 0 1 I int + (int) + (int)$ shift • Parsers represent the DFA as a 2D table E E → int int I + (int) + (int)$ E → int – Recall table-driven lexical ( on $, + E I + (int ) + (int)$ shift(x3) 2 + 3 4 • Lines correspond to DFA states accept E + (int I ) + (int)$ E → int E int • Columns correspond to terminals and non- on $ E + (E I ) + (int)$ shift terminals ) E → int E + (E) I + (int)$ E → E+(E) 7 6 5 on ), + E I + (int)$ shift (x3) • Typically columns are split into: E → E + (E) on $, + + int E + (int I )$ E → int – Those for terminals: action table – Those for non-terminals: goto table ( E + (E I )$ shift 8 9 E + (E) I $ E → E+(E) E + E I $ accept 10 11 E → E + (E) ) on ), + Compiler Design 1 (2011) 48 Representing the DFA: Example The LR Parsing Algorithm

• The table for a fragment of our DFA: • After a shift or reduce action we rerun the DFA on the entire stack – This is wasteful, since most of the work is repeated ( int + ( ) $ E 3 4 … int E 3 s4 • Remember for each stack element on which

4 s5 g6 state it brings the DFA 6 5 5 r → r → E → int E int E int 6 s8 s7 • LR parser maintains a stack ) on ), + 7 〈 〉 〈 〉 rE →E+(E) rE → E+(E) sym1, state1 . . . symn, staten state is the final state of the DFA on sym …sy m 7 … k 1 k E → E + (E) on $, + Compiler Design 1 (2011) 49 Compiler Design 1 (2011) 50

The LR Parsing Algorithm LR Parsers

Let I = w$ be initial input • Can be used to parse more grammars than LL Let j = 0 Let DFA state 0 be the start state Let stack = 〈 dummy, 0 〉 • Most programming grammars are LR repeat case action[top_state(stack), I[j]] of • LR Parsers can be described as a simple table shift k: push 〈 I[j++], k 〉 reduce X → A: • There are tools for building the table pop |A| pairs, push 〈X, Goto[top_state(stack), X]〉 accept: halt normally • How is the table constructed? error: halt and report error

Compiler Design 1 (2011) 51 Compiler Design 1 (2011) 52