Transforming a grammar for LL(1)

Ambiguous grammars are not LL(1) but unambiguous grammars are not necessarily LL(1) Having a non-LL(1) unambiguous grammar for a language does not mean that this language is not LL(1). But there are languages for which there exist unambiguous context-free grammars but no LL(1) grammar.

We will see two grammar transformations that improve the chance to get a LL(1) grammar:

I Elimination of left-recursion I Left-factorization

Syntax analysis 150 Left-recursion The following expression grammar is unambiguous but it is not LL(1):

Exp Exp + Exp2 ! Exp Exp Exp2 ! Exp Exp2 ! Exp2 Exp2 Exp3 ! ⇤ Exp2 Exp2/Exp3 ! Exp2 Exp3 ! Exp3 num ! Exp3 (Exp) !

Indeed, First(↵)isthesameforallRHS↵ of the productions for Exp et Exp2 This is a consequence of left-recursion.

Syntax analysis 151 Left-recursion Recursive productions are productions defined in terms of themselves. Examples: A Ab ou A bA. ! ! When the recursive nonterminal is at the left (resp. right), the production is said to be left-recursive (resp. right-recursive). Left-recursive productions can be rewritten with right-recursive productions Example:

N 1N0 N N↵ ! ! 1 . . . . N nN0 N N↵m ! ! N0 ↵ N0 N ! 1 ! 1 . . . . , N0 ↵mN0 N n ! ! N0 ✏ !

Syntax analysis 152 Right-recursive expression grammar

Exp Exp2Exp0 ! Exp Exp + Exp2 Exp0 +Exp2Exp0 ! ! Exp Exp Exp2 Exp0 Exp2Exp0 ! ! Exp Exp2 Exp0 ✏ ! ! Exp2 Exp2 Exp3 Exp2 Exp3Exp20 ! ⇤ ! Exp2 Exp2/Exp3 Exp20 Exp3Exp20 ! !⇤ Exp2 Exp3 , Exp20 /Exp3Exp20 ! ! Exp3 num Exp20 ✏ ! ! Exp3 (Exp) Exp3 num ! ! Exp3 (Exp) !

Syntax analysis 153 Left-factorisation The RHS of these two productions have the same First set. Stat if Exp then Stat else Stat ! Stat if Exp then Stat !

The problem can be solved by left factorising the grammar: Stat if Exp then Stat ElseStat ! ElseStat else Stat ! ElseStat ✏ !

Note I The resulting grammar is ambiguous and the parsing table will contain two rules for M[ElseStat, else] (because else Follow(ElseStat)andelse First(else Stat)) 2 2 I Ambiguity can be solved in this case by letting M[ElseStat, else]= ElseStat else Stat . { ! } Syntax analysis 154 Hidden left-factors and hidden left recursion

Sometimes, left-factors or left recursion are hidden Examples:

I The following grammar:

A da acB ! | B abB daA Af ! | | has two overlapping productions: B daA and B ⇤ daf . ! ) I The following grammar:

S Tu wx ! | T Sq vvS ! | has left recursion on T (T ⇤ Tuq) ) Solution: expand the production rules by substitution to make left-recursion or left factors visible and then eliminate them

Syntax analysis 155 Summary

Construction of a LL(1) parser from a CFG grammar Eliminate ambiguity Eliminate left recursion left factorization Add an extra start production S S$ to the grammar 0 ! Calculate First for every production and Follow for every nonterminal Calculate the parsing table Check that the grammar is LL(1)

Syntax analysis 156 Recursive implementation Recursive implementation Recursive implementation From the parsing table, it is easy to implement a predictive parser recursivelyFrom the parsing table, it is easy to implement a predictive parser From the parsing table,3.12. it LL(1) is PARSING easy to implement a predictive81 parser recursively function3.12. parseT’() LL(1)3.12. PARSING = LL(1) PARSING 81 81 recursively (with one functionif next = ’a’ peror next =nonterminal) ’b’ or next = ’$’ then parseT()function ; parseT’() match(’$’) = else reportError() if next = ’a’ or next = ’b’ or next = ’$’ then T 0 T $ parseT()function ; match(’$’) parseT’() = ! function parseT() = else reportError() T R if next = ’b’if or next next = ’c’ = or ’a’ next = or ’$’ thennext = ’b’ or next = ’$’ then T 0 ! T $ T !aTc parseR() parseT() ; match(’$’) T ! R elsefunction if next parseT() = ’a’ then = R !✏ match(’a’)if nextelse ; = parseT() ’b’ or reportError() next ; match(’c’) = ’c’ or next = ’$’ then T ! aTc else reportError()parseR() R !bR else if next = ’a’ then R ! ✏ function parseR()match(’a’) = ; parseT() ; match(’c’) ! if nextelse =function ’c’ reportError() or next = ’$’ parseT() then = abc$ R bR (* do nothingif *) next = ’b’ or next = ’c’ or next = ’$’ then T 0 T 0 T $ T 0 !T $ T 0 T $ elsefunction if next parseR() = ’b’ then = T T !aTc T ! RT RT! R match(’b’)if next ; = parseR() ’c’parseR() or next = ’$’ then abc$ else reportError()(* do nothing *) R ! R !bR R ! ✏ R ! ✏ T 0 T 0 T $ T!0 T $ ! T!0 T $ else ifelse next = ’b’if then next = ’a’ then ! ! ! match(’b’)Figure 3.16: Recursive; parseR() descent parser for grammar 3.9 T T aTc T RT RT R else reportError()match(’a’) ; parseT() ; match(’c’) R ! R !bR R ! ✏ R ! ✏ ! ! ! For parseR, weelse must choose reportError() the empty production on symbols in FOLLOW(R) (c or $). The productionFigureR 3.16:bR is Recursive chosen on descent input b parser. Again, for all grammar other symbols 3.9 ! produce an error. Syntax analysis 62 The functionfunctionmatch takes as parseR()argument a symbol, = which it tests for equality with the nextFor inputparseR symbol., we must If they choose are equal, the empty the following production symbol on symbols is read in intoFOLLOW(R) (c or $). The production R bR is chosen on input b. Again, all other symbols the variable next.if We assume nextnext! =is initialised ’c’ or to thenext first input symbol = ’$’ before then produce an error. Syntax analysis parseT’ is called. 62 The programThe function in figure(*match 3.16 do onlytakes nothing checks as argument if the input a *) symbol, is valid. which It can it easily tests be for equality extendedwith to constructthe nextelse ainput syntax symbol.if tree by next letting If they the are parse= equal, ’b’ functions the thenfollowing return the symbol sub-trees is read into for thethe parts variable of inputnext that. they We parse. assume next is initialised to the first input symbol before parseT’ is called.match(’b’) ; parseR() The program in figure 3.16 only checks if the input is valid. It can easily be 3.12.2 Table-driven LL(1) parsing extended toelse construct areportError() syntax tree by letting the parse functions return the sub-trees In table-drivenfor the parts LL(1) of parsing, input that we they encode parse. the selection of productions into a table instead of in the program text. A simple non-recursive program uses this table and a stack to perform the parsing. (Mogensen) The3.12.2 table is cross-indexedTable-driven by LL(1) nonterminalFigure parsing and 3.16: terminal and Recursive contains for each descent parser for grammar 3.9 such pairIn table-driven the production LL(1) (if any) parsing, that is we chosen encode for thatthe selection nonterminal of whenproductions that ter- into a table Syntax analysis minal isinstead the next ofinput in the symbol. program This text. decision A simple is made non-recursive just as for program recursive uses descent this table and 157 a stack to perform the parsing. The tableFor is cross-indexedparseR by, nonterminal we must and choose terminal and the contains empty for each production on symbols in FOLLOW(R) such pair(c theor production $). The (if any) production that is chosen forR that nonterminalbR is when chosen that ter- on input b. Again, all other symbols minal is the next input symbol. This decision is made! just as for recursive descent produce an error. The function match takes as argument a symbol, which it tests for equality with the next input symbol. If they are equal, the following symbol is read into the variable next. We assume next is initialised to the first input symbol before parseT’ is called. The program in figure 3.16 only checks if the input is valid. It can easily be extended to construct a syntax tree by letting the parse functions return the sub-trees for the parts of input that they parse.

3.12.2 Table-driven LL(1) parsing

In table-driven LL(1) parsing, we encode the selection of productions into a table instead of in the program text. A simple non-recursive program uses this table and a stack to perform the parsing. The table is cross-indexed by nonterminal and terminal and contains for each such pair the production (if any) that is chosen for that nonterminal when that ter- minal is the next input symbol. This decision is made just as for recursive descent Outline

1. Introduction

2. Context-free grammar

3. Top-down parsing

4. Bottom-up parsing Shift/reduce parsing LR parsers Operator precedence parsing Using ambiguous grammars

5. Conclusion and some practical considerations

Syntax analysis 158 Bottom-up parsing

A bottom-up parser creates the starting from the leaves towards the root It tries to convert the program into the start symbol Most common form of bottom-up parsing: shift-reduce parsing

Syntax analysis 159 Bottom-up parsing: example

Bottum-up parsing of int +(int + int + int) Grammar: One View of a Bottom-Up Parse S S E$ S → E$ ! E → T E E T E → E + T T ! T → int E E + T T → (E) ! E T int ! E T ( E ) ! E E T T T T

int + ( int + int + int ) $

(Keith Schwarz)

Syntax analysis 160 Bottom-up parsing: example

Bottum-up parsing of int +(int + int + int):

Grammar: int +(int + int + int)$ T +(int + int + int)$ S E$ ! E +(int + int + int)$ E T ! E +(T + int + int)$ E E + T E +(E + int + int)$ ! T int E +(E + T + int)$ ! E +(E + int)$ T ( E ) ! E +(E + T )$ E +(E)$ E + T $ E$ S Top-down parsing is often done as a rightmost derivation in reverse (There is only one if the grammar is unambiguous). Syntax analysis 161 Terminology

A Rightmost (canonical) derivation is a derivation where the rightmost nonterminal is replaced at each step. A rightmost derivation from ↵ to is noted ↵ ⇤ . )rm A reduction transforms uwv to uAv if A w is a production ! ↵ is a right sentential form if S ⇤ ↵. )rm A handle of a right sentential form (= ↵w) is a production A and a position in where may be found and replaced by A ! to produce the previous right-sentential form in a rightmost derivation of : S ⇤ ↵Aw ↵w )rm )rm

I Informally, a handle is a production we can reverse without getting stuck. I If the handle is A ,wewillalsocall the handle. !

Syntax analysis 162 Handle: example

Bottum-up parsing of int +(int + int + int)

Grammar: int +(int + int + int)$ T +(int + int + int)$ S E ! E +(int + int + int)$ E T ! E +(T + int + int)$ E E + T E +(E + int + int)$ ! T int E +(E + T + int)$ ! E +(E + int)$ T ( E ) ! E +(E + T )$ E + (E)$ E + T $ E$ S The handle is in red in each right sentential form

Syntax analysis 163 Finding the handles

Bottom-up parsing = finding the handle in the right sentential form obtained at each step This handle is unique as soon as the grammar is unambiguous (because in this case, the rightmost derivation is unique) Suppose that our current form is uvw and the handle is A v ! (getting uAw after reduction). w can not contain any nonterminals (otherwise we would have reduced a handle somewhere in w)

Syntax analysis 164 Shift/reduce parsing

Proposed model for a bottom-up parser: Split the input into two parts:

I Left substring is our work area I Right substring is the input we have not yet processed All handles are reduced in the left substring Right substring consists only of terminals At each point, decide whether to:

I Move a terminal across the split (shift) I Reduce a handle (reduce)

Syntax analysis 165 Shift/reduce parsing: example

Grammar: Left substring Right substring Action $ id + id id$Shift ⇤ E E + T T $id +id id$ReducebyF id ! | $F +id ⇤ id$ReducebyT ! F T T F F $T +id ⇤ id$ReducebyE ! T ! ⇤ | ⇤ ! F ( E ) id $E +id id$Shift ! | $E+ id ⇤ id$Shift $E + id ⇤ id$ReducebyF id $E + F ⇤id$ReducebyT ! F $E + T ⇤id$Shift ! Bottum-up parsing of $E + T ⇤id$Shift ⇤ id + id id $E + T id $ReducebyF id ⇤ $E + T ⇤ F $ReducebyT ! T F $E + T ⇤ $ReducebyE ! E +⇤ T $E $Accept !

Syntax analysis 166 Shift/reduce parsing

In the previous example, all the handles were to the far right end of the left area (not inside) This is convenient because we then never need to shift from the left to the right and thus could process the input from left-to-right in one pass.

Is it the case for all grammars? Yes ! Sketch of proof: by induction on the number of reduces

I After no reduce, the first reduction can be done at the right end of the left area I After at least one reduce, the very right of the left area is a nonterminal (by induction hypothesis). This nonterminal must be part of the next reduction, since we are tracing a rightmost derivation backwards.

Syntax analysis 167 Shift/reduce parsing

Consequence: the left area can be represented by a stack (as all activities happen at its far right)

Four possible actions of a shift-reduce parser: 1. Shift: push the next terminal onto the stack 2. Reduce: Replace the handle on the stack by the nonterminal 3. Accept: parsing is successfully completed 4. Error: discover a syntax error and call an error recovery routine

Syntax analysis 168 Shift/reduce parsing

There still remain two open questions: At each step:

I How to choose between shift and reduce? I If the decision is to reduce, which rules to choose (i.e., what is the handle)? Ideally, we would like this choice to be deterministic given the stack and the next k input symbols (to avoid ), with k typically small (to make parsing ecient) Like for top-down parsing, this is not possible for all grammars Possible conflicts:

I shift/reduce conflict: it is not possible to decide between shifting or reducing I reduce/reduce conflict: the parser can not decide which of several reductions to make

Syntax analysis 169 Shift/reduce parsing

We will see two main categories of shift-reduce parsers: LR-parsers

I They cover a wide range of grammars I Di↵erent variants from the most specific to the most general: SLR, LALR, LR Weak precedence parsers

I They work only for a small class of grammars I They are less ecient than LR-parsers I They are simpler to implement

Syntax analysis 170 Outline

1. Introduction

2. Context-free grammar

3. Top-down parsing

4. Bottom-up parsing Shift/reduce parsing LR parsers Operator precedence parsing Using ambiguous grammars

5. Conclusion and some practical considerations

Syntax analysis 171 LR-parsers

LR(k) parsing: Left-to-right, Rightmost derivation, k symbols lookahead. Advantages:

I The most general non-backtracking shift-reduce parsing, yet as ecient as other less general techniques I Can detect syntactic error as soon as possible (on a left-to-right scan of the input) I Can recognize virtually all programming language constructs (that can be represented by context-free grammars) I Grammars recognized by LR parsers is a proper superset of grammars recognized by predictive parsers (LL(k) LR(k)) ⇢ Drawbacks:

I More complex to implement than predictive (or operator precedence) parsers Like table-driven predictive parsing, LR parsing is based on a parsing table.

Syntax analysis 172 LR Parsing Algorithm Structure of a LR parser

input a1 ... ai ... an $ stack

Sm

X m LR Parsing Algorithm output Sm-1

Xm-1 . . Action Table Goto Table S 1 terminals and $ non-terminal s s X1 t four different t each item is S0 a actions a a state number t t e e s s

30

Syntax analysis 173 Structure of a LR parser

A configuration of a LR parser is described by the status of its stack and the part of the input not analysed (shifted) yet:

(s0X1s1 ...Xmsm, ai ai+1 ...an$)

where Xi are (terminal or nonterminal) symbols, ai are terminal symbols, and si are state numbers (of a DFA) A configuration corresponds to the right sentential form

X1 ...Xmai ...an

Analysis is based on two tables:

I an action table that associates an action ACTION[s, a]toeachstate s and nonterminal a. I a goto table that gives the next state GOTO[s, A] from state s after a reduction to a nonterminal A

Syntax analysis 174 Actions of a LR-parser Let us assume the parser is in configuration

(s0X1s1 ...Xmsm, ai ai+1 ...an$)

(initially, the state is (s0, a1a2 ...an$),wherea1 ...an is the input word) ACTION[sm, ai ] can take four values: 1. Shift s: shifts the next input symbol and then the state s on the

stack (s0X1s1 ...Xmsm, ai ai+1 ...an) (s0X1s1 ...Xmsmai s, ai+1 ...an) 2. Reduce A (denoted by rn where! n is a production number) ! I Pop 2 (= r) items from the stack | | I Push A and s where s = GOTO[sm r , A] (s0X1s1 ...Xmsm, ai ai+1 ...an) ! (s0X1s1 ...Xm r sm r As, ai ai+1 ...an) I Output the prediction A ! 3. Accept: parsing is successfully completed 4. Error: parser detected an error (typically an empty entry in the action table).

Syntax analysis 175 LR-parsing algorithm

Create a stack with the start state s0 a = getnexttoken() while (True) s = pop() if (ACTION[s, a] = shift t) Push a and t onto the stack a = getnexttoken() elseif (ACTION[s, a]=reduceA ) ! Pop 2 elements o↵the stack | | Let state t now be the state on the top of the stack Push A onto the stack Push GOTO[t, A]ontothestack Output A ! elseif (ACTION[s, a]=accept) break // Parsing is over else call error-recovery routine

Syntax analysis 176 Example: parsing table for the expression grammar

(SLR) Parsing Tables for Expression Grammar Action Table Goto Table 1) E → E+T state id + * ( ) $ E T F 2) E → T 0 s5 s4 1 2 3 1. E E + T 1 s6 acc 3) T T*F ! → 2 r2 s7 r2 r2 2. E T 4) T → F ! 3 r4 r4 r4 r4 3. T T 5)F F → (E) 4 s5 s4 8 2 3 ! ⇤6) F → id 5 r6 r6 r6 r6 4. T F 6 s5 s4 9 3 ! 5. F (E) 7 s5 s4 10 ! 8 s6 s11 6. F id 9 r1 s7 r1 r1 ! 10 r3 r3 r3 r3 11 r5 r5 r5 r5

34

Syntax analysis 177 Example: LRActions parsing with of A the (S)LR-Parser expression grammar -- Example

stack input action output 0 id*id+id$ shift 5 0id5 *id+id$ reduce by F→id F→id 0F3 *id+id$ reduce by T→F T→F 0T2 *id+id$ shift 7 0T2*7 id+id$ shift 5 0T2*7id5 +id$ reduce by F→id F→id 0T2*7F10 +id$ reduce by T→T*F T→T*F 0T2 +id$ reduce by E→T E→T 0E1 +id$ shift 6 0E1+6 id$ shift 5 0E1+6id5 $ reduce by F→id F→id 0E1+6F3 $ reduce by T→F T→F 0E1+6T9 $ reduce by E→E+T E→E+T 0E1 $ accept

35

Syntax analysis 178 Constructing the parsing tables

There are several ways of building the parsing tables, among which:

I LR(0): no lookahead, works for only very few grammars I SLR: the simplest one with one symbol lookahead. Works with less grammars than the next ones I LR(1): very powerful but generate potentially very large tables I LALR(1): tradeo↵between the other approaches in terms of power and simplicity I LR(k), k> 1: exploit more lookahead symbols

Main idea of all methods: build a DFA whose states keep track of where we are in the parsing

Syntax analysis 179 Parser generators

LALR(1) is used in most parser generators like /Bison

We will nevertheless only see SLR in details:

I It’s simpler. I LALR(1) is only minorly more expressive. I When a grammar is SLR, then the tables produced by SLR are identical to the ones produced by LALR(1). I Understanding of SLR principles is sucient to understand how to handle a grammar rejected by LALR(1) parser generators (see later).

Syntax analysis 180 LR(0) item An LR(0) item (or item for short) of a grammar G is a production of G with a dot at some position of the body. Example: A XYZ yields four items: ! A .XYZ ! A X .YZ ! A XY .Z ! A XYZ. ! (A ✏ generates one item A .) ! ! An item indicates how much of a production we have seen at a given point in the parsing process.

I A X .YZ means we have just seen on the input a string derivable from! X (and we hope to get next YZ). Each state of the SLR parser will correspond to a set of LR(0) items A particular collection of sets of LR(0) items (the canonical LR(0) collection) is the basis for constructing SLR parsers

Syntax analysis 181 Construction of the canonical LR(0) collection

The grammar G is first augmented into a grammar G 0 with a new start symbol S and a production S S where S is the start 0 0 ! symbol of G We need to define two functions:

I Closure(I ): extends the set of items I when some of them have a dot to the left of a nonterminal I Goto(I , X ): moves the dot past the symbol X in all items in I These two functions will help define a DFA:

I whose states are (closed) sets of items I whose transitions (on terminal and nonterminal symbols) are defined by the Goto function

Syntax analysis 182 Closure

Closure(I ) repeat for any item A ↵.X in I ! for any production X ! I = I X . [{ ! } until I does not change return I Example:

0 0 E 0 E Closure( E .E )= E .E, ! { ! } { ! E E + T E .E + T ! E T ! ! E .T T T F ! ! ⇤ T .T F T F ! ⇤ ! F (E) T .F ! ! F id F .(E) ! ! F . id ! } Syntax analysis 183 Goto

Goto(I , X ) Set J to the empty set for any item A ↵.X in I ! J = J A ↵X . { ! } return closure(J) S Example:

0 E 0 E I0 = E .E, ! { ! goto(I0, E)= E 0 E., E E. + T E E + T E .E + T { ! ! } ! ! goto(I0, T )= E T ., T T . F E T E .T { ! ! ⇤ } ! goto(I0, F )= T F . T T F ! { ! } ! ⇤ T .T F goto(I0,0 (0)=Closure( F (.E) ) T F ! ⇤ { ! } ! = F (.E) (I0 E 0 E ) F (E) T .F { ! }[ \{ ! } ! ! goto(I0, id) = F id. F id F .(E) { ! } ! ! F . id ! }

Syntax analysis 184 Construction of the canonical collection

C = closure( S .S ) { { 0 ! } } repeat for each item set I in C for each item A ↵.X in I ! C = C Goto(I , X ) [{ } until C did not change in this iteration return C

Collect all sets of items reachable from the initial state by one or several applications of goto. Item sets in C are the states of a DFA, goto is its transition function

Syntax analysis 185 ExampleExample Example

ExampleExample Example Example Example I : E .E, 0 0 ! I : E E + .T I0 : E 0 .E.E+, T Example6 Example !Example ExampleExample: parsing! table for the expression grammar E ! ..TE + T I6 : E ET+ .T.TI6 : FE E + .T ! ! ! ⇤ ! ExampleTE ! .T F T .TT F.F T .T F ! ⇤ ! !⇤ I6 :!E E⇤+ .T T ..FT ExampleF T .FF .(SLR)(E) T Parsing.!F Tables for Expression Grammar ! ⇤ Example ! ! ExampleT .T F FT ..(FE) F .F(E) . id !T ! .F ⇤ I0 : E 0 ! .E, ! F .(E) Action Table Goto Table ExampleExample F !Example.(E) ! ! IEF : E..Eid+ TE + .T FI9 : .Eid E + T . !F state. (Eid) + * ( ) $ E T F 6F ! . Exampleid ! !1) E → E+TF .!id I : E ! E!. I : E E T+ .TT . F F 0 . ids5 s4 1 2 3 Actions of a LR-parserExample: parsing table1 forE 0! theT.T expression.T F grammar6 I9 : E E + 2)T . E → T ! ! I : E !! E. I6 : E E +!.T! ! ⇤ I : E 1 E + Ts6. acc I0 : E 0 .E, I6 : E E +1.TTE 0 .ET.!I+6 :FTE ⇤E + .T ! 1.T TEII6 :.:TETET.+FFTEIT9+:.TFE.9 E + T . Let us assume the! parser is in configuration !!T .F! ExampleT .T F10! 3) T → T*F ! ! E .EExample+ T T ! .ExampleT F E ! E. ⇤+ TT .T F !! ⇤!⇤! ⇤ T 2 T . Fr2 s7 r2 r2 I2 : TE ! .TF.! ! ⇤ T !I10.F2.T:⇤ETI11 .:FTTTF 4)F. T (. TE →).F F T T!. ⇤F ! (SLR)! ParsingI⇤ : E ! TablesFT . . T(forE) Expression.F GrammarExample: ! !⇤! parsing⇤ ExampleI10 table!: T 3 forT⇤ F ther4. r4 expressionr4 r4 grammar E .T T .F 2 FT .(TE.) F ! !I : !F !T(E). .F ! ⇤ ! (s0X1s1 ...Xmsm,!ai ai+1 ...an$)!! !⇤ I6F: E .(E)E + .T F 11.(FE) .(E) 5) I F10 →: (E)IIT 6 :: EF T4 E(E+s5) F..T. s4 8 2 3 T .T F F .(E) FT F.Tid. ActionF. id Table! IGoto6 : ETable! E3.+T.F ! T !F 11 ! II30 :: TE 0 ! F.E. ,⇤ !TExample.T F F !. id !!F ⇤6). ( EF ) id !T !5 .T⇤ Fr6 r6 r6 r6 ! ⇤ ! accept!! ! F . id T ! .FT F . id! I11→: F (E). T .F 1)F E .E+TidII3 :: ETstate! EFid . + * ! ( ! ) ⇤$ E T F !6 s5⇤ s4 9 3 (initially,Example the state! is (s0, a1a2 ...an$)→,whereaI14 : I9FE10...:!Ea(.nE.Eis+) theTE input+T T ..F I9 : E !E4.+⇤!T. FF . id (SLR)!T Parsing.F Tables for Expression Grammar F .(E) ! !0 s5 I9 : E s4E! + T . 1 T2 !3 .F ! ! ! I9 :2)E E → ET +I4T: .EEF E.(.ET..E!++) TT !FExample.(E) TI9 !: TE. F E + T . F 7 .(Es5) s4 10 word) ! !1 s6 T T . F acc F .5.(E)FI9 : E(E) E + T . F 1.. idE E + T ! E !!T.EI6+:TTE. !F E + .T ! ⇤! ! !8 s6 Actions11 Table Goto Table I6 : E E +3).T T T → T*FT .I2 :F EET T..TT.!F !F⇤ ⇤. id I10 : T! T F .!T . F F . id ACTION[sm, a!i ] can take! four values:! ⇤ !2 I10r2 : Ts7 !!T r2 F .r2 I : EF .Eid+ .F T T . F !9 stater1 ids7 + *r1 (r1 ) $ E T F I1 : E E. ! I E !!!: .TT ⇤I6 :TTE F .TE +F.6T ! 6. ⇤F! id⇤ 1) EI →: E+TE E + T . 0 2.TE .TTI10 :4)F T T → FT FT10.T 3 T..TF. r4FF I9 :r4 E! E⇤r4 +r4T . I11 : F ! (E). ! ⇤ 9 1. Shift s: shifts! the next input symbol and then the!! stateI⇤11!s Example:onF the⇤!!(!E). ⇤ I7 : TI10!: .TTT I.10F!: TT F .T F . !10 0 r3 s5r3 r3 s4r3 1 2 3 E E. +!!T ⇤ ! ⇤ T ! .T Example⇤ F !T T . TF F 2) EI6 →: TET ET+. .FT (s X s ...X s I,6a:a IE11...:5)a F E)F →+ ((E)s.(TEXI 3)s.: ...IT11TFX4: a sF.F,(s5F.aE ) ...(aTE) )s4. .F 8 2 !3 ⇤⇤ ! ⇤! ⇤ !11 r5 r5 r5 r5 stack Example:0 1 !1 3.TmTm parsingi .TiF+1 F n table! 0 1 for1 the!m!i expressioni+1⇤ n ! grammarExample⇤ TIF : ..(FEF)I111.:(EEF). E(E+).T T ! .T ⇤ F1 s6 acc I2 : EExampleT . ! ! FT 5 ..Fid!r6 I10r6: T!!r6T r6 F⇤. 11! 3) TI10 : T*FT T F . 2. Reduce A (denoted!! byT⇤rn6) where .FT → idnF Iis4 a: productionFF (..(EE)) number)F T .(E.F) ! !! → !! ⇤⇤2 r2 s7 r2 r2 ! F .(E) ! ! ⇤ F .(idE) ! T .F 34 T! 4.T .T F F ! ⇤ F !!6 .(s5E ) I11 : !Fs4 !(E). 9 !3 I11 : F (E). I Pop 2 (= r)! items fromT the.F stack I1 : EFE 0!..EEid+. T F F !. id.(E) I : F ! (E.) 2. E T 4) T → F !! 3 r4 r4 r4 r4 | !| F ⇤ !. id F !!7 !.s5id s4 I6 : E 7E +F .T10 . id ! F .(E) II3Push: TA andFs. where s = GOTO!(SLR)[smExample Parsingr , A] EE Tables.ET. + Tfor Expression!! Grammar! 5) F (E)! 4 s5 s4 8 2 3 5. F! (EF) .(E) I5 : F ! id. ! E E. + F → F . id I6 : !E E + .T !!8 I9 s6: EF Es11 .+TidTI.7 .:TT F T .F3. T T F I4(s:0XF1Is91 ...: (X.EmEs)m,!ai aEi+1 ...+!aTn) . I5 : TEF .idT.. ActionF TableI : E GotoE Table+ .!T ⇤ ! 5 r6 r6 r6 r6 ! ! F .!id 2 !9 ⇤ r1 s7 !!6r1 r1 ! F⇤ .(ESyntax) analysis! ⇤6) Syntax FI9 →: analysisidE E + T . 88 88 (s0X1s1 T...6.XmF.r!Tsm r AsFid, ai ai+1 ...an) ! I9 :TE TE.T+!FT.F. ! 6 s5 s4 9 3 E .ET+ T T .1)! FE → E+T TstateT 10 id. TF . + r3F * r3 ( ) r3 $ Tr3 !E .TT !F 4. T F T T . F I Output! the! predictionI!9⇤: EA E + T .I0 : E 0! .E, !! ⇤ F Syntax. id analysis 88 T .F! ⇤ 0 !s5 ⇤ I6 :s4 E EF+!.T1 .(E2 )⇤!3 ! ! ⇤ 7 s5 s4 10 E .T 2)!! E T I3 : FT 11! .(FEI.) r5 : r5 TT r5T TTr5. FF..F Syntax analysis I : T T F . 80 ! → E .E 10+ T ! !I8 : F (E.) 5. F (E) 10 3. Accept: parsingI10!F1.: E isT. successfully(E)E +TT T completedFT.. F 1 !! s6 I6 : ET!Eacc.TF+ .⇤!TF . id ! ⇤ 8 s6 s11 T .T F Syntax analysisF !. id ! ⇤ ! I11 : F (E). 88 ! ! 3)⇤! T → T*F⇤ I4 : EF ! .(T.EI11)I10: :F T!!(ETF)⇤.!F.(.EE) E. + F34 ! 4. Error: parserI11!F detected: F.⇤!idI10 an:( errorET). (typicallyT SyntaxF . an empty analysis2 ! entryr2 ins7 the T actionT r2 .T.r2F !F ! 9 r1 s7 88 r1 r1 T I2.6 :.FEE TE +4) . TT F I5 : FE !id.E. + T !I9!: E I6⇤: EE+ T!.E + .T6. F id table). !! ! !→ ⇤ T3 ! .T r4 IFr4 : F !r4 r4(F E⇤!). . id ! 10 r3 r3 r3 r3 F .(E)!I!11 : F (E). ! 11 TF .F.(TE)Syntax! TT analysis. F .T F !Syntax analysis 88 88 I9 : E TE + T.T. 5) F F (E) E4 !s5. SyntaxT ⇤ analysiss4 !! 8 2 3 88 !3.!T !T F⇤! → T ! .F FFI9 :.(.EidE) ! E +⇤!T . ⇤ 11 r5 r5 r5 r5 Syntax analysis F T . idT . F T5 ! .T r6 Fr6 Ir6! 78:r6 T! T F .F ! T! .F⇤6) F → id F .(E) I : E!10 ET+ T . T . F ! ⇤Syntax analysis ! ⇤ 9 F . id ! ⇤! 88 34 I5 : F 4.idT. !F T6 !s5 .F s4 ! F9 3 .(E) I10 :!T F T .F(E. ) F . id T!I11 :T .F !F (E⇤). Syntax analysis 88 !!!⇤ 7 !s5 SyntaxI : analysiss4E I10E: +TT!. T !10F . Syntax analysis 88 88 I Syntax: analysisF F(E).. id Syntax analysisF ! .(E) 9 ! ⇤ ! F ⇤ . id 80 88 11 5. F (E) I1 : E 0! E. I10 : T! T F . !!! F8 !. ids6 T Is1111 !T: . F⇤SyntaxF (E analysis)!. 88 I9 : E E + T . SyntaxE9 ! analysisE. +r1 Ts7 I11 : F!r1 r1( E⇤)I.9!: SyntaxE E analysis+ T . 88 88 6. F !id I5 : F ! id. I10 : T !T F . ! T! T . F I : E10 T . r3 r3 r3! r3 ⇤ T T . F Syntax analysis 2 ! I11 : F (E). ! ⇤ 88 ! ⇤ 11 ! r5 r5 r5 r5 I10 : T T F . T T . F ! I10 : T T F . ! ⇤ ! ⇤ !Syntax⇤ analysis 80 I11 : F (E). I3 : T F . I11 : F (E).34 Syntax analysis 88 ! ! Syntax analysis! 88 SyntaxI4 : analysisF (.E) 88 E ! .E + T E ! .T Syntax analysis Syntax analysis 88 88 SyntaxSyntax analysis analysis T ! .T F 186 88 Syntax analysis T ! .FSyntax⇤ analysis 80 88 F ! .(E) Syntax analysis 88 Syntax analysis F ! . id Syntax analysis 88 88 ! Syntax analysis 88 I5 : F id. ! Syntax analysis 88

Syntax analysis Syntax analysis 88 88 Syntax analysis 88 Constructing the LR(0) parsing table

1. Construct C = I , I ,...,I , the collection of sets of LR(0) items { 0 1 n} for G 0 (the augmented grammar) 2. State i of the parser is derived from Ii . Actions for state i are as follows: 2.1 If A ↵.a is in I and goto(I , a)=I ,thenACTION[i, a]=Shiftj ! i i j 2.2 If A ↵. is in Ii ,thensetACTION[i, a]=ReduceA ↵ for all terminals! a. ! 2.3 If S 0 S. is in I ,thensetACTION[i, $] = Accept ! i 3. If goto(Ii , X )=Ij ,thenGOTO[i, X ]= j. 4. All entries not defined by rules (2) and (3) are made “error” 5. The initial state s is the set of items containing S .S 0 0 !

LR(0) because the chosen action (shift or reduce) only depends on the ) current state (but the choice of the next state still depends on the token)

Syntax analysis 187 3.3. LR PARSING

128 x x S' . S $ S x . L L , . S 9 S . ( L ) S . ( L ) S CHAPTER THREE. PARSING( x 3 ( L L , S . S . x S . x Example of aCHAPTER LR(0) THREE. grammar PARSING3.3.S LR PARSING ( . L ) L . S , 0 S′ S$ S 12L . L , S 8 5 CHAPTER THREE. PARSING→ ( x Lx 0 S′ S$ S' . S3 $ L SS S . (x L . ) L L , . S 9 → S ( L . ) S → S . ( L ) x 3 S . ( L ) 1 S ( L ) 4 3 LL ( SL S , S . x ( L L . , S L L , S . → S 4. x →→ S ( . L ) S . x 0 S′ S$ 21 S x( L ) 4 L L S, S → → S' S . $ → L . S , ) 3 L S 7 6 2 S x S L . L , S 5 GRAMMAR→ 3.20. ( L 1 S ( L ) 4 L →L , S L S S .. ( L ) S S ( L (. )L ) . → → S . x 2 S x GRAMMAR 3.20. 4 L L . , S → S' S . $ S ) Rather than rescan the stack for each7 token, the parser6 can remember in- GRAMMAR 3.20. L S . S ( L ) . FIGUREsteadRather 3.21. the state than reachedLR(0) rescan states the for forstack each Grammar stack for each element. 3.20. token, Then the parser the parsing can remember algorithm in- isstead the state reached for each stack element. Then the parsing algorithm Rather than rescan the stack forLook each up token, top stack the parser state, can and remember input symbol, in- to get action; is ()x,$SL stead the state reached for each stackIfLookFIGURE action element. up 3.21. is top ThenstackLR(0) thestate, states parsing and for Grammar input algorithm symbol, 3.20. to get action; is 1 s3 s2 g4 2 IfShift( actionr2n):is r2 Advance r2 r2 input r2 one token; push n on stack. Look up top stack state, and input symbol,()x,$ to get action; SL 3 Reduce(s3k): Pop s2 stack as many timesg7 as g5 the number of If action is Shift(1 ns3): Advance s2 input oneg4 token; push n on stack. 4 2 r2 r2symbols r2 r2 ona r2 the right-hand side of rule k; Shift(n): Advance input one token;Reduce(3 pushs3k):n Popon stack. s2 stack as manyg7 times g5 as the number of 5 s6Let X be s8 the left-hand-side symbol of rule k; Reduce(k): Pop stack as many6 timesr14 as r1 the number r1symbols ofr1 r1 ona the right-hand side of rule k; 7 r35 r3Ins6 r3 the state r3 s8 now r3 on top of stack, look up X to get “goto n”; symbols on the right-hand6 r1 side r1Let of r1X rulebe r1k the; left-hand-sider1 symbol of rule k; 8 s3Push s2 n on top of stack.g9 Let X be the left-hand-side7 r3 symbol r3In of the r3 rule state r3k; now r3 on top of stack, look up X to get “goto n”; 9 Accept:r48 s3 r4 Stop r4 s2parsing, r4 r4 reportg9 success. In the state now on top9 of stack,r4 r4 lookPush r4 upn Xon r4to top get r4 of “goto stack.n”; Error: Stop parsing, report failure. Push n onTABLE top of stack. 3.22. LR(0) parsing table for Grammar 3.20. (Appel) TABLEAccept: 3.22. StopLR(0) parsing, parsing table report for Grammar success. 3.20. Accept: Stop parsing, reportError: success. Stop parsing, report failure. Error:Syntax analysis Stop parsing, reportLR(0) failure. PARSER GENERATION 188 An LR(k)parserusesthecontentsofitsstackandthenextk tokens of the LR(0) PARSER GENERATION input to decide which action to take. Table 3.19 shows the use of one sym- LR(0) PARSER GENERATIONAn LR(k)parserusesthecontentsofitsstackandthenextWe can now construct a parsing table for this grammark tokens (Table of 3.22). the For k bol of lookahead. ForWek can now2,k theX construct table has a columns parsing tablefor every for this two-token grammar se- (Table 3.22). For An LR( )parserusesthecontentsofitsstackandthenextinput to decide whicheach action edge= I tokens toJ take.where of Table theX is a 3.19 terminal, shows we put the the use action ofshift one J sym-at position quence and so on; in practice,X→k > 1 is not used for compilation. This is input to decide which action tobol take. of Table lookahead. 3.19each shows For(I edge, Xk the) ofI use the2, table; oftheJ onewhere table if X sym-ishas aX nonterminal,is columns a terminal, we for put we everygoto put J two-token theat position action(Ishift se-, X).For J at position partly because the tables would→ be huge, but more because most reasonable bol of lookahead. For k 2, the table has columns(I, Xeach for) of every state the= I table; two-tokencontaining if X is an se- a item nonterminal,S′ S.$weputan we put gotoaccept Jactionat position at (I, $)(. I, X).For = quence and so on; in practice, k > 1 is not used→ for compilation. This is quence and so on; in practice,programmingk > 1 is not languages used forFinally, compilation. can for be a statedescribed containing This isby LR an item(1) grammars.A γ. (production n with the dot at partly becauseeach the tables state wouldI containing be huge, an butitem moreS′ because→S.$weputan most reasonableaccept action at (I, $). partly because the tables would beLR(0) huge, grammars but more becausethe are end), those most we that put reasonable a canreduce be n parsedaction at looking(I→, Y ) for only every at token theY stack,. programming languagesFinally, for can a statebe described containing by anLR item(1) grammars.A γ. (production n with the dot at programming languages can bemaking described shift/reduce by LR(1) grammars. decisionsIn principle, without since LR(0) any needslookahead. no lookahead, Though→ we justthis need class a single of action the end),for each we state: put A a statereduce willn shiftaction or reduce, at (I but, Y not) for both. every In practice, token Y since. we LR(0) grammars are those thatgrammarsLR(0) can be isgrammars parsed too weak looking are to be those only very at that useful, the can stack, the be algorithm parsed looking for constructing only at the LR(0) stack, Inneed principle, to know whatsince state LR(0) to shift needs into, no we lookahead,have rows headed we byjust state need numbers a single action making shift/reduce decisions withoutparsingmaking any tables shift/reduce lookahead. is a good decisions Though introduction this without class to the anyof LR(1) lookahead. parserThough construction this algo-class of for eachand columns state: A headed state by will grammar shift symbols. or reduce, but not both. In practice, since we grammars is too weak to be veryrithm.grammars useful, the is algorithm too weak for to constructing be very useful, LR(0) the algorithm for constructing LR(0) need to know what state to shift into, we have rows headed by state numbers parsing tables is a good introductionparsingWe to will thetables use LR(1) Grammar is a parser good 3.20 construction introduction to illustrate algo- toLR(0) the LR(1) parser parser generation. construction Consider algo- 61 and columns headed by grammar symbols. rithm. whatrithm. the parser for this grammar will be doing. Initially, it will have an empty We will use Grammar 3.20 to illustrateWe will LR(0) use Grammar parser generation. 3.20 to illustrate Consider LR(0) parser generation. Consider stack, and the input will be a complete S-sentence followed by $; that is, 61 what the parser for this grammar will be doing. Initially, it will have an empty what the parser for this grammarthe will right-hand be doing.Initially, side of the it willS′ haverule anwill empty be on the input. We indicate this as stack, and the input will be a completestack, andS-sentence the input followed will be by a complete $; that is,S-sentence followed by $; that is, S′ .S$wherethedotindicatesthecurrentpositionoftheparser. → the right-hand side of the S′ rulethe will right-hand be on the side input. of theWe indicateS′ rule will this beas on the input. We indicate this as S′ .S$wherethedotindicatesthecurrentpositionoftheparser.58 S′ .S$wherethedotindicatesthecurrentpositionoftheparser. → → 58 58 ExampleExample Example

ExampleExample ExampleExampleExample Example ExampleExample I : E .E, 0 0 ! I : E E + .T I0 : E 0 .E.E+, T Example6 Example Example ExampleExample!Example ExampleExample: parsing! table for the expression grammar E ! ..TE + T I6 : E ET+ .T.TI6 : FE E + .T Example! Example! ! ⇤ ! ExampleTE ! .T F T .TT F.F T .T F ! ! !⇤ I6 : E E + .T T I!0 : .TE 0⇤ F.E, T .FF .(E) ! ! ⇤ T .F Example! ExampleI6 : E (SLR)E +Example.TT ParsingT .F .TablesT F for Expression Grammar Example I!0 : E 0⇤ .E.E+, T Example ! !! ! ⇤ FT ..(FE)!!Example ExampleExample:I F: E .F(E parsingET) +. .idT.TI table: FE!T forE . the+F .T expression grammar I0 : E 0 !!Example.EE, ..TE + T 6 ! ! 6 F .(E) Action Table Goto Table ExampleExample F ! ..(idE!) FI : !.Eid !E + T⇤ . !F ! .(E) Example of a non LR(0)IE6Example:!E.ETE+ T grammarE.T +F.T 9 T .TT1) FE. F→ E+TT ! state.T id F + * ( ) $ E T F F ! . Exampleid! ⇤ ! ! !!⇤ F I6 : E.!idE + .T Actions of a LR-parser I1 : E 0! ET!. .T F I6 :I E: E TEET++.FF.TT. .(E)F !F !0 . id⇤s5 s4 1 2 3 Example: parsing table forE ! theT.TT expression! .FTExample⇤ F grammar 9 2) E(SLR) → T ExampleT Parsing!T!.F .TablesT F for Expression Grammar I : E !! E. ! I6 :ExampleE E +!.T! ! !! ⇤ I : E !1 E +⇤Ts6. acc I0 : E 0 .E, I6 : E E +1.TTE 0 .ETFT.!I+6 :FT..E(FE)⇤E + .T ! 1. TEII6 F: :ETET..+F(EFTE) IT9+. id:.TFE.9 !TE +.FT . I0!: E 0 ! .E,! ExampleT 10 .T F3)! T → T*FF !.(E) Action Table Goto Table Let us assumeExample theExample! parserExample is in of configuration! aExample nonE ! LR(0)TEF. ⇤+!ExampleT.T(FE) grammar.T F T .T !F !! ⇤!⇤! ⇤ !T !2 T . Fr2 s7 r2 r2 E .EExample+ T T .TI2 F: TE .TIFF.: E. id E + .T ! ⇤ I FI:9T:F.Eid.T(EE)+.F T . !FTstate. . (EidF) + * ( ) $ E T F ! E6 !!.E +!T ⇤ T I10.F2.T: ET11 .FTT 4)F 1). T → E →F E+T !!3 ⇤ r4 r4 r4 r4 ! (SLR)! Parsing⇤ ! TablesF ! . TforExampleid Expression.F GrammarExample: ! ! parsing! ⇤ ExampleI10F table: TF . id for.Tid F the. expression grammar ActionsE of a.T LR-parserExample:T parsing.F I2 : tableFTE I1 :F. forT(EE.0) theF..T(EE!. expression) grammar!I6 :I9!E: !E!TEET+⇤!+..FTT. . F ! 0 ⇤ s5 s4 1 2 3 (s X s ...X s , a a ...a $)! !!TI6 : !E .T E +F.T F I11.(E:) F (E).5) 2)I F → E : →(E) TIT : !E !T4 E +s5⇤ F.T. s4 8 2 3 ! 0 1 1 m m !i i+1 n !I!1 : E!0!⇤ FEI. : .(EE) E + .T I6 : E 3.F ET+!..T(!ET)!F!10I ⇤: I116EI9:: FE E1 +(EE+)T.T.s6. acc T I.0T: EF0 .E, F I6 .:(EE) FET+ .FT.TTidE. ActionF.ET.id!+6 F TableT! ⇤ IGoto6 : ETable! E!+1.T.FT!EII610:.:TETET.+FFTET9+ .TF . ! II30 :: TE 0 ! F.E. ,!⇤! !TExample!.T F ExampleF !T. id .!T !F F ⇤6). ( 3) EF ) T →id T*F !T !5 .T⇤ Fr6 r6 r6 r6 Let us! assume⇤E the!. parserEExample+ T is! in configuration! Exampleaccept!! E!!TEF. ⇤+ T..TFid .T F T ! .FT F !. id!!⇤!⇤!I11→:⇤ F !T (E2 T).. Fr2 s7 r2 r2 T .F 1)F E .E+TidTII3 :: E.TTstateI2! F: EFTidE . !+ .TF.* ! ( !!) ⇤$ ⇤ E T F !I102.:⇤ETI11 :TTTF 4)F. T (. TE ).F F T !T!6 . s5⇤ ⇤F s4 9 3 (initially,Example the state is (s0,!a1a2 ...an$)→,wherea(SLR)I14 !: I FE10 ...:Parsing!⇤ Ea(.nE.Eis+!) theTablesTE input!+T T Tfor.. FExpression.F I9 : E Grammar!TE4.+Example:⇤.!TF . FF.F !. parsingid ⇤→(SLR)ExampleI table!:T TParsing3 .F forT F ther4Tables. r4 expression forr4 Expressionr4 grammar Grammar ! E .T ! T 9.F I0! 2 : Fs5TE I9 :F.T(EE.) Fs4.E!( E+)T . 1 T2 3 .!F !!! ⇤! 10 !! ⇤ F .(E) I4 : F ! (.E) ExampleI : !E E + .T !F I11.(E:)!F !T(E).5). F F (E) 7 !4 s5s5 ⇤ s4s4 8 2 3 10 word) ! (Is90X:2)1sE 1 ...E →X ETm s+m,!Tai .aEEi+1 ...E..EaT.n!+$)+!!TT !!⇤F 6F .(E.()E) TI9 !: TE. F E +.(ET) . I10→: IIT116 ::FEF T.(E(EE+))F..T. ! T .T F ! F .(E!!1) TFT s6 F.TTid.. ActionF.T!Fid. TableF!acc IFGoto6 :! ETable.5.!(E⇤ )EF3.I9+:T.FE(!E)T !EF+ T . !! F 1.. idE E + T E I!I30 :: .ETEI60+!:TF.E. !,⇤ !E⇤TExample+ ..TT F F !.!id !!!F ⇤6). ( EF ) id F!T 8! 5 . .idT⇤ Fs6r6 r6 Actionr6s11 Tabler6 Goto Table I6 : E !!E +⇤3).T T T → T*FT .I2 !:F EET T..TTaccept.!!!!F !F⇤ F . id.!id I⇤10 : T!T !T .FTF .!TF . idF! I11→: F (E). ACTION[sm, a!i ] can takeT four.F values:! 1)F ⇤E → .E+Tid!II!21 3 :: ETIstate10r2 : EFidT s7. +! !T* !r2 F( .r2 ) I $ : EFE !T.E id+F ⇤4..F T T F T . F !!9 6 s5state⇤ r1 ids7 +s4 *r1 (r1 ) $9 3 E T F I1 :(initially,E 0 Example theE. ! state! is (s0, a1a2 ...an$)I,whereaE I!!:4 : I.TT9FE10...:⇤!I6Ea(.nE.:TETisI+)E! the:TEEF input+.⇤TTET+.+.FT.6T I9 : E 6.!EF!+⇤!T . id⇤F . id1) EI(SLR) →: E+TE!T Parsing E.F+ T . Tables for Expression Grammar 2.TEF .T.TI(10E):4)F T T → FT !FT10.T 3 T..TF. !0 r4FF I9s5 :r4 9E Er4 s4+! r4T . I11 : F1 !T2 (!E3 ).F !!! ⇤ 9 ! 1. Shift s: shifts! the next input symbolI9 and:2)E E then → ET the+I!4T state: .EEFI11s Example:onE.(.ET..FE!+ the+) TT!!(!E!)F.Example⇤.(EI)7 : TI10!TI:9.TT!:TTEI.10F. !F: TTE +FT.T. F . F!10 7 .(Es5) 0 r3 s5r3 s4 r3 s4r3 10 1 2 3 word)E E. +!T! ! ⇤ T !! .T⇤Example⇤!F ⇤ T T . F I9 : E E +2) T EI6. : TET ET+. .FT F! 1..I11idE⇤:5) F FE →+ (E)(TEI 3)!.: ITTF 4: F.F(s5F.E !!)1 T(TIE6!Ts6): s4.T E..TF.!FTEF +accF8 .T2 !F3 !⇤⇤.5.(E⇤)!F (!E) ⇤ → !8 s6 Actions11 Table Goto Table stack Example:(s0X1s!1 ...3.XTmTsmII,6 parsingai:a.TiF+1E...FanE)E table++(s.3)0T.XT T 1Ts for1→... 11T*FTX. theIm! a:Fi s,ETEa expressioni+1⇤ ...T..ET.an+) FT!!F grammar⇤ ⇤. id IF : T.(!E)T!F . ⇤ ! F!11 . id⇤ 1 r5 r5 s6 r5 r5 acc ACTIONExample[s , a!] can take! four! values: !2! E 2 .T!I r2 !: T!s7 ⇤!ExampleT r2 ⇤F .r2 TI1011 : .FF TI111.:(E!EFT)T. E(FE+T)..T F T .T F I2 : E T . m!i !! ! ! FT 5 ⇤ ..Fid!!r6 I10r6: 10⇤T r6T !r6 F . I6 :!EF !.Eid+⇤ .F 3) TI10 →: T*FT !9 T stateFr1 . ids7 + *r1 (r1 ) $ E T F 2. Reduce A I1(denoted: E!0 2. byET⇤E.rn6) where .FT → idn4)F I is 4 T a: production FF (.I.(T10ETE)) number): T.T.TF.F TIF6I9:TT: .E!(E.F).E⇤TE+ T+F. I.11T!: F ! (6.E)F.! !!id⇤! ⇤1) EI9 →: E+TE!! E +⇤⇤2T . r2 s7 r2 r2 ! F .(ET) .TI10 : FT → T!F .T 3 .TI r4F : F!r4 (⇤E)r4. r4 F .(!idE) ! T .F 34 1.T!Shift4.Ts:.T shiftsF ! theF next! input⇤ symbol! andF !!6 ⇤ then.(s5E the!)! stateI11 :⇤11⇤!sFExampleons4 ! the(⇤!E!!). ⇤ I79 :!TI3 10!: .TTT I.10F!: TT F .T FI11. : F !10 (E). 0 r3 s5r3 r3 s4r3 1 2 3 I Pop 2 (= r)E itemsE from. +!! theT stack⇤ I1 : EFE 0 ..EETid+. T.T Example!F !T T . TF F! 2. E T 4) 2) T E→I6 → :F TET !! ET+. .3 FT r4 r4 r4 r4 ! (s ⇤X!s ...XT s I,6a:.aFIE11...:5)a F E)F →+ ((E)s.(TEXI !3)s.: ...IT11TFX4: a sF.F,(s5F.aE F)F...(!aTE.) )id.s4( E.F) I7 : F8 2 .(E!id3. ) ⇤⇤ ! ⇤! ⇤ F !11. (E)⇤ r5 r5 r5 r5 |stack| FExample:0 1!!1 .3.idTmTm parsingi .TiF+1 F n table!F0 !!71 ! for1 .s5id the!m!i expressioni+1⇤ s4 In6 : !E! grammarExample⇤E⇤+ .T10TIF : ..(FEF)I111.:(!EEF). E(E+).T T .T F1 s6 acc II3Push: TA andIF2 s:. whereEExamplesT=. GOTO!(SLR)[smExample! Parsingr , A]! EE Tables.ETFT. +5 Tfor..Fid !Expressionr6! I10!r6: T! r6T Grammarr6 F . !11! 5) 3) F TI10 → (E): T*FT! T 4 F . s5 s4 8 2 3 2. Reduce5. FA! (E(denotedF) !!.(E byT)⇤rn6) whereI 5.FT :→ idFnF Iis!4 a: id productionFF. (..(EE)) number)F T .(!E.F) E E!. + F !! → F !!. id ⇤⇤2 r2 s7 r2 r2 ! ! F .(E) !!8 !s6 F s11 . T!id I7⇤.:TT FF T .(.idEF3.) !T T F T .F 34 (sI06X1:s1 E...XmsEmT,!+ai ai.4.+1TT....T!aFn) F !I5 : ⇤F ! idF.I9!!6 : .(s5E ) I11 :EFs4+ !T(E.). !9 !3 I11 : F! (E). I4 : FI9 :I (Pop.EE) 2 E(= +r) itemsT . from theI2 stack: TE I1 :.TEFE.0 ActionF..EEid+. TableT I 6 :!E GotoE Table+ .!T !⇤ 2. E T 4) T → F !! 5 3 r4r6 r4r6 r4 r6 r4 r6 ! ! F ⇤!!.!idT .F !9 !r1 s7 !!F r1F !r1. !id.(E) IF7⇤: F.(ESyntax).(Eid.) analysis! ⇤6) Syntax FI9 →: analysisidE F E.(+E)T . 88 88 (s0X1s1!T...6.XmF.r!Tsm| |r AsFidF, ai ai+1....idan)(SLR) Parsing! F ⇤! !7 Tables!.s5id for Expressions4 TI6 :!E.F GrammarE + .T10 ! E II.3EPush:+TATandFs. where! s = GOTO! [smTstateTExampler ,I A]:id. TFEFE. + I9F.idET*: .T+E( T ) T!E.$! +FTE .T !F ! 4. T F 5) F (E)!! 6 4 s5 s4s4 8 2 9 3 3 !T !⇤5.T .F1)! FE →(EF E+T) .(E) 105 !r3 r3 r3 Tr3 ! .T! FE E. + F →TF T. .id F I Output! the predictionI6 :I9!E: EA E +E.T+ T .I0 : E 0!! .E!,!8⇤ I9 s6: !!EF ⇤Es11 .+TidTIF.7 .Syntax:TT. Fid analysisT .F3. T T F 88 E T I.4T(s:0X.F1!Is91 ...: (X.EmEs)m⇤,!ai aEi+1 ...+!aTn) . F 0 I5 :s5.( TEF) .idTI..6 Action:s4F E TableEFI +:!.T1E .(EGoto2 )⇤E! 3 Table+ .!T ⇤ ! !! ⇤ 7 5 s5 r6 r6 s4 r6 r6 10 !2)!! E →F T I.3!id: T 112! FI. !9 r5 : r5 TTr1 s7 r5T !T!Tr56. r1 FF.r1 .F!SyntaxF⇤ analysis.(ESyntax) analysis! ⇤6)I Syntax 10 FI9 →:: analysisTidE ET+ TF. 80 88 88 !(s0X1s1!...6.XmFr sm r Asid, ai ai+1 ...aEn) .E 10+!T ⇤ ! !I8T: F .F (E.) 3. Accept: parsingI10!: isT successfullyE T .TET+.!TT completedFT.F. !F 1 !! Tstates6 idI. F : + IE9T*: !TE(E acc.TF+ ) T.⇤TEF.$ .+!FidTE .T !F 5. F (E) ! ⇤ 8 6 s5 s6 s4 s11 9 3 T F1. .TE .(EF)E !+TT!⇤T .1)Syntax FE → E+T analysisF !. idT 10 6T . r3F !r3 ⇤r3 !Tr3 ! .!TF F . id 4. T F I : FT (TE.). F 88 I Output!! the prediction3)⇤! I 9T :→EA T*F⇤ IE4 :+FT .I0 : (.EE0!)I .E:, T!!!!TF ⇤ F⇤.(.E) Syntax34 analysis! 11 88 4. Error: parser! detected! ⇤! T an error.F! (typically⇤ an emptyE2 ! entry.TI0 11r2! in10s5: s7 the F⇤ actionI6r2 :s4( Er2 ).!EF+!.TE1 .(E2 )⇤E!3 . + F ! !! ⇤ 7 s5 s4 10 I11F : F.EidI10 :(.TET). T 2)!!Syntax F E. T analysis !I3 : FT 11! .(FEI.) r5T :Tr5 TT.T.F r5!T FTTr5. FF..F Syntax analysis I : T T 9 F . r1 s7 88 r1 r1 80 T I6 :.FE I E:!+T .T T IF5→.: FE !id.EE. !+ T.E 10+ T!I9!: E!I6⇤: E!E+I8 :T!F.E +(.ET6..) 5.F F id(E) 10 table). 3. Accept:!2.!E parsing!10!FT1. E is4). successfully( E! T)E →+T F⇤ T completedT . TF3 .T1 r4! Fr4 s6 I6 !:r4! ETr4F !⇤Eacc.TF+ . .⇤!idTF . id ! ⇤ 8 s6 s11 TI! :.TF F (E). Syntax!! analysisF !I11. id: TFF .F.!((EE)Syntax!). ⇤ analysis! ! !Syntax analysisI11 : F (E)10. r3 r3 88 r3 r3 88 88 I F: E .(EE)!+11 T!. ! 3)⇤! T → T*F⇤ I!4 : EF ⇤ .(T.EI )I10: :F T!T!(!ETF)T⇤.!T. F.(F.EE).T EF. + F34 ! 94. Error: parserT I11! detected:.TF5)⇤! I F 10F an →:(error E(E)T) . (typicallyT ESyntaxF4 . ans5.empty analysisSyntaxT2 !!entry analysis11r2 s4 in s7 the! !T actionT r2 .T.r2F8 F2 3 ! 9 r1 s7 88 r1 88r1 !3.!T T FIT6 :.FEF. !id E + .T T I5 : .F !id. FI9 : .!idEI9!:!EEI!6⇤+:⇤!ETE+. T!.E⇤ + .T6. F id 11 r5 r5 r5 r5 Syntax analysis F table).. id !2. E ⇤ !T 4) ! T → F⇤ ! TE3 .ET r4+ FFTr4 .78(E!r4!) r4F ⇤ . id T TT . !F.!F I! : F (E)T. 5 ! .T!r6! Fr6 I11 I:r6!10 TF:Fr6 T!.F.((EE)SyntaxT!). F analysis.!F !Syntax analysis 10 r3 r3 r3 r3 88 88 ! I9 !F: E .(⇤ET6)E)! +11F →T.T .id F F .(E )! .TI ⇤: E! ET+!TT. T!. TFT. F .T F ! ⇤Syntax analysis5) F (E) ! 4 ⇤ s5 Syntax9 F analysiss4. id!!! 8 ⇤!2 3 88 88 34 I5 : F 4.idT. !!3.F!T T F! → 6 !s5 T .F s4 FI9 : . idE !9 E3 +⇤!T . ⇤ 11 r5 r5 r5 r5 Syntax analysisI10 : T FFT .F.(Eid. ) ! ⇤ T .F ! !I11!F: F.!78(E)(FE⇤). .(E) Syntax analysis 88 ! T TT . F.F F . Tid5 ! .T r6 FTr6 T!Ir6.!10 :Fr6 T! T F .F !!!!⇤ ! ⇤6)Syntax F → id analysis 7 !s5 F SyntaxI.9(E: ) analysiss4E II910:E:E+TT!E.T+TT .!T10F . . FSyntax analysis 88 88 88 Syntax analysis ! !⇤Syntax analysis F ! .(E)! ⇤ !F ⇤. id ! ⇤! 80 88 34 I11 : FI5 :FF(E)4...ididT. F T6 !s5 .F ! s4 ! ! F!⇤ F9 .⇤id3 .(E) 5. FI10 : (TE) F T .F(E. ) I1 : E8 0! EF.s6 . idI10 : TIs11 :T!TI11F :FT..F(EF ).(E). Syntax analysis 88 ! !! ! ! SyntaxT 11 analysisT .I10SyntaxF: T analysis!T !F . 88 88 Syntax! analysis! !⇤ FSyntax! analysis. Fid7 !s5. (E) I9 : !s4E !⇤E +⇤T!. 10 Syntax analysis80 88 88 I9 : IE11 : FE +F(TE)... id SyntaxE9 analysisE. +r1 Ts7 I11 : F!r1 r1(! E⇤)I.9!: SyntaxE! F E analysis⇤ +. Tid. 88 88 5. F (E) I!1 : E8 0! E.s6 I10 : TIs11 : TF F . (E). 6. F !id! ! I5 : F ! idF. I.10id: T !TT 11!TF.. ⇤SyntaxF ! analysis! 88 T! I T: .E!F E + T . 10 Syntaxr3 ! analysisr3 Ir3! : r3F! ⇤ (E⇤)IT.9!: SyntaxE T . E analysisF+ T . 88 88 Syntax analysis 9 I2 : E ! TE.9 !IE. +r1: TFs7 11 (Er1) . r1 88 !6. F ⇤ !id !I5 : F ! id11. I10 : T !T F .! ! ⇤ I : T TT!F .T . F T11 I : TE10. r5 FTr5 . r3 r3 !r5 r5 r3! I10r3 :⇤ T T T TF. . F Syntax10 analysis 2 ! I11 : F (E). Syntax! analysis⇤ 88 80 ! ⇤ ! ⇤ ! 11 !⇤ r5 r5 r5 r5 ! ⇤ ! I10 : T T34 F . I11 : F I10(:E)T. T FI3. : T FT. T . F I11 : F (E)Syntax. analysis Syntax analysis 80 88 ! ⇤ ! ⇤ Syntax analysis ! ⇤ 88 ! ! 34 I11 : F (E). I!3 : T F . I11 : F (E). Syntax analysis 88 ! SyntaxI4 : analysisF (.E)! Syntax analysis! 88 88 !I : F (.E) ESyntax4 . analysisE +!T 88 E ! .TE .E + T Conflict:Syntax analysis in state 2, we don’tSyntax! analysis know whether to shift or reduce. 88 88 Syntax analysis Syntax analysis ! E .T 88 88 Syntax analysis T .T !FSyntax analysis 88 88 Syntax analysis ! TSyntax⇤ .T analysisF 80 88 Syntax analysis T .F ! Syntax⇤ analysis 80 88 ! T .F Syntax analysis 88 F .(E)! Syntax analysis 88 Syntax analysis F Syntax.(E) analysis 91 88 Syntax analysisSyntax analysis ! ! Syntax analysis 88 189 88 Syntax analysis F . Fid . id Syntax analysis 88 88 ! ! Syntax analysis 88 I5 : F I5 :idF. id. ! !Syntax analysisSyntax analysis 88 88

Syntax analysisSyntax analysis SyntaxSyntax analysis analysis 8888 88 88 Syntax analysisSyntax analysis 8888 Constructing the SLR parsing tables

1. Construct c = I , I ,...,I , the collection of sets of LR(0) items { 0 1 n} for G 0 (the augmented grammar) 2. State i of the parser is derived from Ii . Actions for state i are as follows:

2.1 If A ↵.a is in Ii and goto(Ii , a)=Ij ,thenACTION[i, a]=Shiftj 2.2 If A ! ↵. is in I ,thenACTION[i, a]=ReduceA ↵ for all ! i ! terminals a in Follow(A)whereA = S 0 6 2.3 If S 0 S. is in I ,thensetACTION[i, $] = Accept ! i 3. If Goto(Ii , A)=Ij for a nonterminal A,thenGOTO[i, A]=j 4. All entries not defined by rules (2) and (3) are made “error” 5. The initial state s is the set of items containing S .S 0 0 !

the simplest form of one symbol lookahead, SLR (Simple LR) )

Syntax analysis 190 Example (SLR) Parsing Tables for Expression Grammar Action Table Goto Table 1) E → E+T state id + * ( ) $ E T F 2) E → T 0 s5 s4 1 2 3 1 s6 acc 3) T T*F → 2 r2 s7 r2 r2 4) T → F 3 r4 r4 r4 r4 5) F → (E) 4 s5 s4 8 2 3 6) F → id 5 r6 r6 r6 r6 6 s5 s4 9 3 7 s5 s4 10 8 s6 s11 9 r1 s7 r1 r1 First Follow 10 r3 r3 r3 r3 E id ( $+) 11 r5 r5 r5 r5

T id ( $+*) 34 F id ( $+*)

Syntax analysis 191 SLR(1) grammars

A grammar for which there is no (shift/reduce or reduce/reduce) conflict during the construction of the SLR table is called SLR(1) (or SLR in short). All SLR grammars are unambiguous but many unambiguous grammars are not SLR There are more SLR grammars than LL(1) grammars but there are LL(1) grammars that are not SLR.

Syntax analysis 192

Conflict example for SLR parsing

(Dragonbook) Follow(R) contains ’=’. In I2, when seeing ’=’ on the input, we don’t know whether to shift or to reduce with R L . !

Syntax analysis 193

Summary of SLR parsing

Construction of a SLR parser from a CFG grammar Eliminate ambiguity (or not, see later) Add the production S S,whereS is the start symbol of the 0 ! grammar Compute the LR(0) canonical collection of LR(0) item sets and the Goto function (transition function) Add a shift action in the action table for transitions on terminals and goto actions in the goto table for transitions on nonterminals Compute Follow for each nonterminals (which implies first adding S S $ to the grammar and computing First and Nullable) 00 ! 0 Add the reduce actions in the action table according to Follow Check that the grammar is SLR (and if not, try to resolve conflicts, see later)

Syntax analysis 194 Outline

1. Introduction

2. Context-free grammar

3. Top-down parsing

4. Bottom-up parsing Shift/reduce parsing LR parsers Operator precedence parsing Using ambiguous grammars

5. Conclusion and some practical considerations

Syntax analysis 195 Operator precedence parsing

Bottom-up parsing methods that follow the idea of shift-reduce parsers Several flavors: operator, simple, and weak precedence. In this course, only weak precedence

Main di↵erences compared to LR parsers:

I There is no explicit state associated to the parser (and thus no state pushed on the stack) I The decision of whether to shift or reduce is taken based solely on the symbol on the top of the stack and the next input symbol (and stored in a shift-reduce table) I In case of reduction, the handle is the longest sequence at the top of stack matching the RHS of a rule

Syntax analysis 196 Structure of the weak precedence parser

input a1 ai an $ stack

Xm

Xm 1 Weak precedence parsing output

X2

X1 Shift-reduce table terminals and $

Shift/Reduce/Error terminals, nonterminals and $

Syntax(Amodifier)˜ analysis 197 Weak precedence parsing algorithm

Create a stack with the special symbol $ a = getnexttoken() while (True) if (Stack= = $S and a = = $) break // Parsing is over Xm = top(Stack) if (SRT [Xm, a] = shift) Push a onto the stack a = getnexttoken() elseif (SRT [Xm, a]=reduce) Search for the longest RHS that matches the top of the stack if no match found call error-recovery routine Let denote this rule by Y Xm r+1 ...Xm ! Pop r elements o↵the stack Push Y onto the stack Output Y Xm r+1 ...Xm ! else call error-recovery routine

Syntax analysis 198 Example for the expression grammar

Example: Shift/reduce table + ( ) id $ ⇤ E E + T E S S R ! E T T S R R R ! T T F F R R R R ! ⇤ T F S S ! ⇤ F (E) + S S ! F id ( S S ! ) R R R R id R R R R $ S S

Syntax analysis 199 Example of parsing

Stack Input Action $ id + id id$ Shift ⇤ $id +id id$ReducebyF id ⇤ ! $F +id id$ReducebyT F ⇤ ! $T +id id$ReducebyE T ⇤ ! $E +id id$ Shift ⇤ $E+ id id$ Shift ⇤ $E + id id$ReducebyF id ⇤ ! $E + F id$ReducebyT F ⇤ ! $E + T id$ Shift ⇤ $E + T id$ Shift ⇤ $E + T id $ReducebyF id ⇤ ! $E + T F $ReducebyT T F ⇤ ! ⇤ $E + T $ReducebyE E + T ! $E $Accept

Syntax analysis 200 Precedence relation: principle

We define the (weak precedence) relations l and m between symbols of the grammar (terminals or nonterminals) I X l Y if XY appears in the RHS of a rule or if X precedes a reducible word whose leftmost symbol is Y I X m Y if X is the rightmost symbol of a reducible word and Y the symbol immediately following that word

Shift when Xm l a,reducewhenXm m a Reducing changes the precedence relation only at the top of the stack (there is thus no need to shift backward)

Syntax analysis 201 Precedence relation: formal definition

Let G =(V , ⌃, R, S) be a context-free grammar and $ a new symbol acting as left and right end-marker for the input word. Define V = V $ 0 [{ } The weak precedence relations l and m are defined respectively on V 0 V and V V 0 as follows: ⇥ ⇥ + 1. X l Y if A ↵XB is in R,andB Y , ! ) 2. X l Y if A ↵XY is in R +! 3. $ l X if S X ↵ ) + 4. X m a if A ↵B is in R,andB X and ⇤ a !+ ) ) 5. X m $ifS ↵X ) for some ↵, , ,andB

Syntax analysis 202 Construction of the SR table: shift

Shift relation, l:

Initialize to the empty set. S 1add$l S to S 2 for each production X L1L2 ...Lk for i =1to k 1! add Li l Li+1 to 3 repeat S for each⇤ pair X l Y in for each production SY L L ...L ! 1 2 k Add X l L1 to until did not change in thisS iteration. S

⇤ We only need to consider the pairs X l Y with Y anonterminalthatwereaddedin at the previous iteration S

Syntax analysis 203 Example of the expression grammar: shift

Step 1 S l $ Step 2 E l + + l T T l ⇤ l F ⇤ E E + T (lE ! E T El) ! T T F Step 3.1 + l F ! ⇤ T F l id ! ⇤ F (E) l ( ! ⇤ F id (lT ! Step 3.2 + l id + l ( (lF Step 3.3 (l( (lid

Syntax analysis 204 Construction of the SR table: reduce

Reduce relation, m:

Initialize to the empty set. R 1addS m $to 2 for each productionR X L L ...L ! 1 2 k for each pair X l Y in S add Lk m Y in 3 repeat R for each⇤ pair X m Y in for each production RX L L ...L ! 1 2 k Add Lk m Y to until did not change in thisR iteration. R

⇤ We only need to consider the pairs X m Y with X anonterminalthatwereaddedin at the previous iteration. R

Syntax analysis 205 Example of the expression grammar: reduce

Step 1 E m $ Step 2 T m + F m ⇤ T m) Step 3.1 T m $ E E + T F m + ! E T ) m ! ⇤ T T F id m ! ⇤ ⇤ T F F m) ! F (E) Step 3.2 F $ ! m F id ) + ! m id m + )m) idm) Step 3.3 id m $ ) m $

Syntax analysis 206 Weak precedence grammars

Weak precedence grammars are those that can be analysed by a weak precedence parser. A grammar G =(V , ⌃, R, S)iscalledaweak precedence grammar if it satisfies the following conditions: 1. There exist no pair of productions with the same right hand side 2. There are no empty right hand sides (A ✏) 3. There is at most one weak precedence relation! between any two symbols 4. Whenever there are two syntactic rules of the form A ↵X and ! B , we don’t have X l B ! Conditions 1 and 2 are easy to check Conditions 3 and 4 can be checked by constructing the SR table.

Syntax analysis 207 Example of the expression grammar

Shift/reduce table

+ ( ) id $ E E + T ⇤ ! E S S R E T ! T S R R R T T F ! ⇤ F R R R R T F S S ! ⇤ F (E) + S S ! F id ( S S ! ) R R R R id R R R R $ S S Conditions 1-3 are satisfied (there is no conflict in the SR table) Condition 4: I E E + T and E T but we don’t have + l E (see slide 204) ! ! I T T F and T F but we don’t have l T (see slide 204) ! ⇤ ! ⇤

Syntax analysis 208 Removing ✏ rules

Removing rules of the form A ✏ is not dicult ! For each rule with A in the RHS, add a set of new rules consisting of the di↵erent combinations of A replaced or not with ✏. Example:

S AbA B ! | B b c ! | A ✏ ! is transformed into

S AbA Ab bA b B ! | | | | B b c ! |

Syntax analysis 209 Summary of weak precedence parsing

Construction of a weak precedence parser Eliminate ambiguity (or not, see later) Eliminate productions with ✏ and ensure that there are no two productions with identical RHS Construct the shift/reduce table Check that there is no conflict during the construction Check condition 4 of slide 207

Syntax analysis 210 Outline

1. Introduction

2. Context-free grammar

3. Top-down parsing

4. Bottom-up parsing Shift/reduce parsing LR parsers Operator precedence parsing Using ambiguous grammars

5. Conclusion and some practical considerations

Syntax analysis 211 Using ambiguous grammars with bottom-up parsers

All grammars used in the construction of Shift/Reduce parsing tables must be un-ambiguous We can still create a parsing table for an but there will be conflicts We can often resolve these conflicts in favor of one of the choices to disambiguate the grammar Why use an ambiguous grammar?

I Because the ambiguous grammar is much more natural and the corresponding unambiguous one can be very complex I Using an ambiguous grammar may eliminate unnecessary reductions Example: E E + T T ! | E E + E E E (E) id T T F F ! | ⇤ | | ) ! ⇤ | F (E) id ! |

Syntax analysis 212 Set of LR(0) items of the ambiguous expression grammar

E E + E E E (E) id ! | ⇤ | |

Follow(E)= $, +, , ) { ⇤ } states 7 and 8 have ) shift/reduce conflicts for +and . ⇤

(Dragonbook)

Syntax analysis 213

Disambiguation Example: Parsing of id + id id will give the configuration ⇤ (0E1+4E7, id$) ⇤ We can choose:

I ACTION[7, ]=shift5 precedence to ⇤ ) ⇤ I ACTION[7, ]=reduceE E + E precedence to + ⇤ ! )

Parsing of id + id + id will give the configuration

(0E1+4E7, +id$)

We can choose:

I ACTION[7, +] =shift 4 + is right-associative ) I ACTION[7, +] =reduce E E + E +isleft-associative ! ) (same analysis for I8)

Syntax analysis 214 outline

1. Introduction

2. Context-free grammar

3. Top-down parsing

4. Bottom-up parsing Shift/reduce parsing LR parsers Operator precedence parsing Using ambiguous grammars

5. Conclusion and some practical considerations

Syntax analysis 215 Top-down versus bottom-up parsing

Top-down

I Easier to implement (recursively), enough for most standard programming languages I Need to modify the grammar sometimes strongly, less general than bottom-up parsers I Used in most hand-written compilers and some parser generators (JavaCC, ANTLR) Bottom-up:

I More general, less strict rules on the grammar, SLR(1) powerful enough for most standard programming languages I More dicult to implement, less easy to maintain (add new rules, etc.) I Used in most parser generators (Yacc, Bison)

Syntax analysis 216 Hierarchy of grammar classes CHAPTER THREE. PARSING

Unambiguous Grammars Ambiguous Grammars LL(k) LR(k)

LL(1) LR(1)

LALR(1)

SLR

LL(0) LR(0)

(Appel) FIGURE 3.29. Ahierarchyofgrammarclasses.

Syntax analysis For example, the items in states 6 and 13 of the LR(1) parser217 for Gram- mar 3.26 (Figure 3.27) are identical if the lookahead sets are ignored. Also, states 7 and 12 are identical except for lookahead, as are states 8 and 11 and states 10 and 14. Merging these pairs of states gives the LALR(1) parsing table shown in Table 3.28b. For some grammars, the LALR(1) table contains reduce-reduce conflicts where the LR(1) table has none, but in practice the difference matters little. What does matter is that the LALR(1) parsing table requires less memory to represent than the LR(1) table, since there can be many fewer states.

HIERARCHY OF GRAMMAR CLASSES AgrammarissaidtobeLALR(1)ifitsLALR(1)parsingtablecontainsno conflicts. All SLR grammars are LALR(1), but not vice versa. Figure 3.29 shows the relationship between several classes of grammars. Any reasonable programming language has a LALR(1) grammar, and there are many parser-generator tools available for LALR(1) grammars. For this

66 Error detection and recovery

In table-driven parsers, there is an error as soon as the table contains no entry (or an error entry) for the current stack (state) and input symbols The least one can do: report a syntax error and give information about the position in the input file and the tokens that were expected at that position In practice, it is however desirable to continue parsing to report more errors There are several ways to recover from an error:

I Panic mode I Phrase-level recovery I Introduce specific productions for errors I Global error repair

Syntax analysis 218 Panic-mode recovery

In case of syntax error within a “phrase”, skip until the next synchronizing token is found (e.g., semicolon, right parenthesis) and then resume parsing In LR parsing:

I Scan down the stack until a state s with a goto on a particular nonterminal A is found I Discard zero or more input symbols until a symbol a is found that can follow A I Stack the state GOTO(s, A)andresumenormalparsing

Syntax analysis 219 Phrase-level recovery

Examine each error entry in the parsing table and decide on an appropriate recovery procedure based on the most likely programmer error. Examples in LR parsing: E E + E E E (E) id ! | ⇤ | | I id + id: is unexpected⇤ after a +: report a “missing operand” error, push an arbitrary⇤ number on the stack and go to the appropriate next state I id + id)+id: Report an “unbalanced right parenthesis” error and remove the right parenthesis from the input

Syntax analysis 220 Other error recovery approaches

Introduce specific productions for detecting errors: Add rules in the grammar to detect common errors Examples for a C compiler: I if EI(parenthesis are missing around the expression) ! I if (E) then I (then is not needed in C) ! Global error repair: Try to find globally the smallest set of insertions and deletions that would turn the program into a syntactically correct string Very costly and not always e↵ective

Syntax analysis 221 Building the syntax tree

Parsing algorithms presented so far only check that the program is syntactically correct In practice, the parser also needs to build the parse tree (also called concrete syntax tree) Its construction is easily embedded into the parsing algorithm

Top-down parsing:

I Recursive descent: let each parsing function return the sub-trees for the parts of the input they parse I Table-driven: each nonterminal on the stack points to its node in the partially built syntax tree. When the nonterminal is replaced by one of its RHS, nodes for the symbols on the RHS are added as children to the nonterminal node

Syntax analysis 222 would be grouped into the lexemes x3, =, y, +, 3, and ;.

A token is a pair. For example

1. The lexeme x3 would be mapped to a token such as . The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases. 2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair, whose second component is ignored. The point is that there are many different identifiers so we need the second component, but there is only one assignment symbol =. 3. The lexeme y is mapped to the token 4. The lexeme + is mapped to the token <+>. 5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to , but what is the something. On the one hand there is only one 3 so we could just use the token . However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs double). Perhaps the token should point to the symbol table where an entry for "this kind of 3" is stored. Another possibility is to have a separate "numbers table". 6. The lexeme ; is mapped to the token <;>.

Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.

Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).

1.2.2: Syntax Analysis (or Parsing) would be grouped into the lexemes x3, =, y, +, 3, and ;. ParsingA token involves is a a further grouping pair.in which For example tokens are grouped into grammatical1. The lexeme phrases, x3 would bewhich mapped are to a oftentoken such represented as . The in name a parse id is short for identifier. The value 1 is tree. For examplethe index of the entry for x3 in the symbolBuilding table produced by the the compiler. syntax This table tree is used to pass information to subsequent phases. 2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair, whose second x3 =component y + 3; is ignored. The point is that there are manyBottom-up different identifiers parsing: so we need the second component, but there is only one assignment symbol =. 3. The lexeme y is mapped to the token I Each stack element points to a subtree of the syntax tree would 4be. The parsed lexeme into + is mappedthe tree to theon token the <+>.right. 5. The lexeme 3 is somewhat interesting and is discussedI furtherWhen in subsequent performing chapters. It a is reduce,mapped to a new syntax tree is built with the , but what is the something. On the onenonterminal hand there is only at one the 3 so rootwe could and just use the the popped-o↵stack elements as children This parsingtoken would . result However, from therea grammar can be a difference containing between rules how suchthis should as be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs double). Perhaps the asst-stmttoken should ! point id to= the expr symbol ;table where an entry for "this kind of 3" is stored. Another possibility is to have a separate "numbers table". Note: expr6. The lexeme !; is numbermapped to the token <;>. | id I In practice, the concrete syntax tree is not built but rather a Note that non-significant | expr blanks +are exprnormally removed during scanning.simplified In C, most (abstract) blanks are non-significant. syntax tree Blanks inside strings are an exception. I Depending on the complexity of the compiler, the syntax tree might Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion Note(compare the recursive with parsing definition below). of expression (expr). Noteeven also not the be hierarchical constructed decomposition in the figure on the right.

The division1.2.2: Syntax between Analysis scanning (or Parsing) and parsing is somewhat arbitrary, but invariably if a recursive definition is involved, it is consideredParsing involves parsing a further notgrouping scanning. in which tokens are grouped into grammatical phrases, which are often represented in a parse Oftentree. we For utilize example a simpler tree called the syntax tree with operators as interior nodes and operands x3 as = they + children3; of the operator. The syntax tree on the right corresponds to the parse tree above it. would be parsed into the tree on the right.

(TechnicalThis parsing point.) would The result syntaxfrom a grammar tree represents containing rules an such assignment as expression not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement asst-stmt ! id = expr ; Syntax analysis 223 terminator expr not a statement ! number separator. | id | expr + expr 1.2.3: Semantic Analysis Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right. There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, The division between scanning and parsing is somewhat arbitrary, but invariably if a recursive definition is involved, real, itpointer is considered to array parsing of not integers, scanning. etc) of the objects involved. This enables checking for semantic errors and inserting

Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.

(Technical point.) The syntax tree represents an assignment expression not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator not a statement separator.

1.2.3: Semantic Analysis

There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc) of the objects involved. This enables checking for semantic errors and inserting For your project

The choice of a parsing technique is left open for the project You can either use a parser generator or implement the parser by yourself Motivate your choice in your report and explain any transformation you had to apply to your grammar to make it fit the constraints of the parser

Parser generators:

I Yacc: Unix parser generator, LALR(1) (companion of Lex) I Bison: free implementation of Yacc, LALR(1) (companion of Flex) I ANTLR: LL(*), implemented in Java but output code in several languages I ... http://en.wikipedia.org/wiki/Comparison_of_parser_generators

Syntax analysis 224 An example with Flex/Bison Example: Parsing of the following expression grammar:

Input Input Line ! Input ✏ ! Line Exp EOL ! Line EOL ! Exp num ! Exp Exp + Exp ! Exp Exp Exp ! Exp Exp Exp ! ⇤ Exp Exp/Exp ! Exp (Exp) !

https://github.com/prashants/calc

Syntax analysis 225 Flex file: calc.lex

%{ #define YYSTYPE double /* Define the main semantic type */ #include "calc.tab.h" /* Define the token constants */ #include %} %option yylineno /* Ask flex to put line number in yylineno */ white [ \t]+ digit [0-9] integer {digit}+ exponent [eE][+-]?{integer} real {integer}("."{integer})?{exponent}? %% {white} {} {real} { yylval=atof(yytext); return NUMBER; } "+" { return PLUS; } "-" { return MINUS; } "*" { return TIMES; } "/" { return DIVIDE; } "(" { return LEFT; } ")" { return RIGHT; } "\n" { return END; } .{yyerror("Invalidtoken");}

Syntax analysis 226 Bison file: calc.y

Declaration: %{ #include #include #include #define YYSTYPE double /* Define the main semantic type */ extern char *yytext; /* Global variables of Flex */ extern int yylineno; extern FILE *yyin; %}

Definition of the tokens and start symbol %token NUMBER %token PLUS MINUS TIMES DIVIDE %token LEFT RIGHT %token END

%start Input

Syntax analysis 227 Bison file: calc.y

Operator associativity and precedence: %left PLUS MINUS %left TIMES DIVIDE %left NEG

Production rules and associated actions: %%

Input: /* epsilon */ | Input Line ;

Line: END | Expression END { printf("Result: %f\n", $1); } ;

Syntax analysis 228 Bison file: calc.y

Production rules and actions (continued): Expression: NUMBER { $$ = $1; } | Expression PLUS Expression { $$ = $1 + $3; } | Expression MINUS Expression { $$ = $1 - $3; } | Expression TIMES Expression { $$ = $1 * $3; } | Expression DIVIDE Expression { $$ = $1 / $3; } | MINUS Expression %prec NEG { $$ = -$2; } | LEFT Expression RIGHT { $$ = $2; } ;

Error handling: %%

int yyerror(char *s) { printf("%s on line %d - %s\n", s, yylineno, yytext); }

Syntax analysis 229 Bison file: calc.y Main functions: int main(int argc, char **argv) { /* if any input file has been specified read from that */ if (argc >= 2) { yyin = fopen(argv[1], "r"); if (!yyin) { fprintf(stderr, "Failed to open input file\n"); } return EXIT_FAILURE; }

if (yyparse()) { fprintf(stdout, "Successful parsing\n"); }

fclose(yyin); fprintf(stdout, "End of processing\n"); return EXIT_SUCCESS; }

Syntax analysis 230 Bison file: makefile

How to compile: bison -v -d calc.y flex -o calc.lex.c calc.lex gcc -o calc calc.lex.c calc.tab.c -lfl -lm

Example: >./calc 1+2*3-4 Result: 3.000000 1+3*-4 Result: -11.000000 *2 syntax error on line 3 - * Successful parsing End of processing

Syntax analysis 231 The state machine Excerpt of calc.output (with Expression abbreviated in Exp): state 9 state 11 6Exp:Exp.PLUSExp 6 Exp: Exp PLUS . Exp 7|Exp.MINUSExp 8|Exp.TIMESExp 9|Exp.DIVIDEExp NUMBER shift, and go to state 3 10 | MINUS Exp . MINUS shift, and go to state 4 LEFT shift, and go to state 5 $default reduce using rule 10 (Exp) Exp go to state 17 state 10 6Exp:Exp.PLUSExp 7|Exp.MINUSExp 8|Exp.TIMESExp 9|Exp.DIVIDEExp 11 | LEFT Exp . RIGHT

PLUS shift, and go to state 11 MINUS shift, and go to state 12 TIMES shift, and go to state 13 DIVIDE shift, and go to state 14 RIGHT shift, and go to state 16

Syntax analysis 232