Top-Down Parsing

Originated from Prof. Aiken CS 143
Modified by Yu Zhang

Outline

• Recursive Descent
• Predictive Parsers

Intro to Top-Down Parsing: The Idea

• The parse tree is constructed
  – From the top
  – From left to right
• Terminals are seen in order of appearance in the token stream: t2 t5 t6 t8 t9

[Figure: parse tree; node labels by level, top to bottom: 1 / t2 3 t9 / 4 7 / t5 t6 t8]

Recursive Descent Parsing

• Consider the grammar
    E → T | T + E
    T → int | int * T | ( E )
• Token stream is: ( int5 )
• Start with top-level non-terminal E
  – Try the rules for E in order

Recursive Descent Parsing

Grammar:
    E → T | T + E
    T → int | int * T | ( E )

Parse of the token stream ( int5 ), step by step:

1. Start with E; the input pointer is at (.
2. Try E → T.
3. Try T → int. Mismatch: int is not (! Backtrack …
4. Try T → int * T. Mismatch: int is not (! Backtrack …
5. Try T → ( E ). Match! Advance input.
6. Inside the parentheses, try E → T and then T → int. Match! Advance input.
7. The closing ) matches. Advance input.
8. End of input, accept.

A Recursive Descent Parser. Preliminaries

• Let TOKEN be the type of tokens
  – Special tokens INT, OPEN, CLOSE, PLUS, TIMES
• Let the global next point to the next token

A (Limited) Recursive Descent Parser (2)

• Define boolean functions that check the token string for a match of
  – A given token terminal
      bool term(TOKEN tok) { return *next++ == tok; }
  – The nth production of S:
      bool Sn() { … }
  – Try all productions of S:
      bool S() { … }

A (Limited) Recursive Descent Parser (3)

• For production E → T
    bool E1() { return T(); }
• For production E → T + E
    bool E2() { return T() && term(PLUS) && E(); }
• For all productions of E (with backtracking)
    bool E() {
      TOKEN *save = next;
      return (next = save, E1())
          || (next = save, E2()); }

A (Limited) Recursive Descent Parser (4)

• Functions for non-terminal T
    bool T1() { return term(INT); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(OPEN) && E() && term(CLOSE); }
    bool T() {
      TOKEN *save = next;
      return (next = save, T1())
          || (next = save, T2())
          || (next = save, T3()); }

Recursive Descent Parsing. Notes.

• To start the parser
  – Initialize next to point to first token
  – Invoke E()
• Notice how this simulates the example parse
• Easy to implement by hand
  – But not completely general
  – Cannot backtrack once a production is successful
  – Works for grammars where at most one production can succeed for a non-terminal

Example

Grammar:
    E → T | T + E
    T → int | int * T | ( E )
Input: ( int )

    bool term(TOKEN tok) { return *next++ == tok; }

    bool E1() { return T(); }
    bool E2() { return T() && term(PLUS) && E(); }
    bool E() { TOKEN *save = next;
               return (next = save, E1())
                   || (next = save, E2()); }

    bool T1() { return term(INT); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(OPEN) && E() && term(CLOSE); }
    bool T() { TOKEN *save = next;
               return (next = save, T1())
                   || (next = save, T2())
                   || (next = save, T3()); }
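The functions above can be assembled into a complete program. The following is a minimal sketch, not the course's reference code: the END sentinel token and the parse() driver are assumptions added here so that the example is self-contained and can check whether all input was consumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* END is an assumed end-of-input sentinel; the slides do not define one. */
typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;

static const TOKEN *next;            /* points to the next unread token */

static bool E(void);
static bool T(void);

/* Compare the next token with tok and advance, whether or not it matches. */
static bool term(TOKEN tok) { return *next++ == tok; }

/* E -> T | T + E */
static bool E1(void) { return T(); }
static bool E2(void) { return T() && term(PLUS) && E(); }
static bool E(void) {
    const TOKEN *save = next;
    return (next = save, E1()) || (next = save, E2());
}

/* T -> int | int * T | ( E ) */
static bool T1(void) { return term(INT); }
static bool T2(void) { return term(INT) && term(TIMES) && T(); }
static bool T3(void) { return term(OPEN) && E() && term(CLOSE); }
static bool T(void) {
    const TOKEN *save = next;
    return (next = save, T1()) || (next = save, T2()) || (next = save, T3());
}

/* Hypothetical driver: accept only if E() succeeds and all input is used. */
bool parse(const TOKEN *tokens) {
    assert(tokens != NULL);
    next = tokens;
    return E() && *next == END;
}
```

With this driver, ( int ) and int are accepted, but int * int is rejected even though the grammar generates it: T1 succeeds on the first int, so T2 is never tried. This is the "cannot backtrack once a production is successful" limitation in action.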

When Recursive Descent Does Not Work

• Consider a production S → S a
    bool S1() { return S() && term(a); }
    bool S() { return S1(); }
• S() goes into an infinite loop
• A left-recursive grammar has a non-terminal S with
    S →+ S α for some α
• Recursive descent does not work in such cases

Elimination of Left-Recursion

• Consider the left-recursive grammar
    S → S α | β
• S generates all strings starting with a β and followed by a number of α
• Can rewrite using right-recursion
    S → β S’
    S’ → α S’ | ε
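To make the rewrite concrete, here is a small sketch for the instance S → S a | b, that is, with α and β chosen as the single tokens a and b. That choice, the END sentinel and the parse() driver are assumptions made for illustration. The right-recursive form S → b S’, S’ → a S’ | ε can be parsed by recursive descent without looping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed concrete instance: alpha is the token A, beta is the token B. */
typedef enum { A, B, END } TOKEN;

static const TOKEN *next;            /* points to the next unread token */

static bool term(TOKEN tok) { return *next++ == tok; }

/* S' -> a S' | epsilon */
static bool Sprime(void) {
    const TOKEN *save = next;
    if (term(A) && Sprime()) return true;
    next = save;                     /* epsilon alternative: consume nothing */
    return true;
}

/* S -> b S'   (consumes a token before recursing, so it terminates) */
static bool S(void) { return term(B) && Sprime(); }

/* Hypothetical driver: accept only if S() succeeds and all input is used. */
bool parse(const TOKEN *tokens) {
    assert(tokens != NULL);
    next = tokens;
    return S() && *next == END;
}
```

Every recursive call is now preceded by a successful term(), so the input pointer advances between calls and the infinite loop of the left-recursive version cannot occur.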

More Elimination of Left-Recursion

• In general
    S → S α1 | … | S αn | β1 | … | βm
• All strings derived from S start with one of β1,…,βm and continue with several instances of α1,…,αn
• Rewrite as
    S → β1 S’ | … | βm S’
    S’ → α1 S’ | … | αn S’ | ε

General Left Recursion

• The grammar
    S → A α | δ
    A → S β
  is also left-recursive because
    S →+ S β α
• This left-recursion can also be eliminated
• See Dragon Book for general algorithm
  – Section 4.3

Summary of Recursive Descent

• Simple and general parsing strategy
  – Left-recursion must be eliminated first
  – … but that can be done automatically
• Unpopular because of backtracking
  – Thought to be too inefficient
• In practice, backtracking is eliminated by restricting the grammar

Predictive Parsers

• Like recursive-descent but parser can “predict” which production to use
  – By looking at the next few tokens
  – No backtracking
• Predictive parsers accept LL(k) grammars
  – L means “left-to-right” scan of input
  – L means “leftmost derivation”
  – k means “predict based on k tokens of lookahead”
• In practice, LL(1) is used

LL(1) vs. Recursive Descent

• In recursive-descent,
  – At each step, many choices of production to use
  – Backtracking used to undo bad choices
• In LL(1),
  – At each step, only one choice of production
  – That is
    • When a non-terminal A is leftmost in a derivation
    • The next input symbol is t
    • There is a unique production A → α to use
      – Or no production to use (an error state)
• LL(1) is a recursive descent variant without backtracking

Predictive Parsing and Left Factoring

• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )
• Hard to predict because
  – For T two productions start with int
  – For E it is not clear how to predict
• We need to left-factor the grammar

Left-Factoring Example

• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )
• Factor out common prefixes of productions
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε

Left-Recursion & Left-Factoring Example

• Recall the grammar
    E → E + T | T          -- +: left assoc.
    T → int | int * T | ( E )
• Eliminate left-recursion & factor out common prefixes
    E → T X
    X → + T X | ε
    T → ( E ) | int Y
    Y → * T | ε
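Once the grammar is left-factored, each function can pick its production by inspecting only the next token. The sketch below does this for the first factored grammar (E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε). The END sentinel, the match() helper and the parse() driver are assumptions introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* END is an assumed end-of-input sentinel. */
typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;

static const TOKEN *next;            /* points to the next unread token */

static bool E(void);
static bool T(void);

/* Consume tok if it is the next token; otherwise fail and consume nothing. */
static bool match(TOKEN tok) {
    if (*next != tok) return false;
    next++;
    return true;
}

/* X -> + E | epsilon : decided by one token of lookahead */
static bool X(void) {
    if (*next == PLUS) return match(PLUS) && E();
    return true;                     /* epsilon; a bad token is caught later */
}

/* Y -> * T | epsilon */
static bool Y(void) {
    if (*next == TIMES) return match(TIMES) && T();
    return true;                     /* epsilon */
}

/* T -> ( E ) | int Y */
static bool T(void) {
    switch (*next) {
    case OPEN: return match(OPEN) && E() && match(CLOSE);
    case INT:  return match(INT) && Y();
    default:   return false;         /* no production applies: error */
    }
}

/* E -> T X */
static bool E(void) { return T() && X(); }

/* Hypothetical driver: accept only if E() succeeds and all input is used. */
bool parse(const TOKEN *tokens) {
    assert(tokens != NULL);
    next = tokens;
    return E() && *next == END;
}
```

No function saves or restores next, so there is no backtracking, and int * int is now accepted.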

LL(1) Parsing Table Example

• Left-factored grammar
    E → T X              X → + E | ε
    T → ( E ) | int Y    Y → * T | ε
• The LL(1) parsing table (rows: leftmost non-terminal; columns: next input token; entries: rhs of production to use):

         int     *       +       (       )       $
    E    T X                     T X
    X                    + E             ε       ε
    T    int Y                   ( E )
    Y            * T     ε               ε       ε

LL(1) Parsing Table Example (Cont.)

• Consider the [E, int] entry
  – “When current non-terminal is E and next input is int, use production E → T X”
  – This can generate an int in the first position
• Consider the [Y,+] entry
  – “When current non-terminal is Y and current token is +, get rid of Y”
  – Y can be followed by + only if Y → ε

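LL(1) Parsing Tables. Errors

• Blank entries indicate error situations
• Consider the [E,*] entry
  – “There is no way to derive a string starting with * from non-terminal E”

Predictive Parsing

[Figure: model of a predictive parser. Input: a + b $. Stack: X Y Z $. The predictive parsing program reads the input and the stack, consults parsing table M, and produces the output.]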

Using Parsing Tables

• Method similar to recursive descent, except
  – For the leftmost non-terminal S
  – We look at the next input token a
  – And choose the production shown at [S,a]
• A stack records frontier of parse tree
  – Non-terminals that have yet to be expanded
  – Terminals that have yet to be matched against the input
  – Top of stack = leftmost pending terminal or non-terminal
• Reject on reaching error state
• Accept on end of input & empty stack

LL(1) Parsing Algorithm

    initialize stack = <S $> and next
    repeat
      case stack of
        <X, rest> : if T[X,*next] == Y1…Yn
                      then stack ← <Y1…Yn rest>;
                      else error ();
        <t, rest> : if t == *next ++
                      then stack ← <rest>;
                      else error ();
    until stack == < >

LL(1) Parsing Algorithm (notes)

• $ marks bottom of stack
• For non-terminal X on top of stack, lookup production
• Pop X, push production rhs on stack. Note leftmost symbol of rhs is on top of the stack.
• For terminal t on top of stack, check t matches next input token.

LL(1) Parsing Example

    Stack          Input          Action
    E $            int * int $    T X
    T X $          int * int $    int Y
    int Y X $      int * int $    terminal
    Y X $          * int $        * T
    * T X $        * int $        terminal
    T X $          int $          int Y
    int Y X $      int $          terminal
    Y X $          $              ε
    X $            $              ε
    $              $              ACCEPT
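The stack-based algorithm can be sketched in C for the left-factored grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. The encoding below is an assumption made for illustration: symbols as one enum, productions as small arrays, table entries as production indices with -1 for a blank entry, and a fixed stack size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Terminals first (DOLLAR is the end marker $), then non-terminals. */
typedef enum { INT, TIMES, PLUS, OPEN, CLOSE, DOLLAR, NUM_TERMS,
               E = NUM_TERMS, X, T, Y } Symbol;

enum { MAX_RHS = 3, MAX_STACK = 256, NUM_NONTERMS = 4 };

typedef struct { int len; Symbol rhs[MAX_RHS]; } Production;

static const Production prods[] = {
    {2, {T, X}},             /* 0: E -> T X    */
    {2, {PLUS, E}},          /* 1: X -> + E    */
    {0, {INT}},              /* 2: epsilon (rhs unused) */
    {3, {OPEN, E, CLOSE}},   /* 3: T -> ( E )  */
    {2, {INT, Y}},           /* 4: T -> int Y  */
    {2, {TIMES, T}},         /* 5: Y -> * T    */
};

/* table[non-terminal][next token] = production index, -1 = error. */
static const int table[NUM_NONTERMS][NUM_TERMS] = {
    /*        int   *   +   (   )   $  */
    /* E */ {  0,  -1, -1,  0, -1, -1 },
    /* X */ { -1,  -1,  1, -1,  2,  2 },
    /* T */ {  4,  -1, -1,  3, -1, -1 },
    /* Y */ { -1,   5,  2, -1,  2,  2 },
};

/* input must be a sequence of terminals ending with DOLLAR. */
bool parse(const Symbol *input) {
    Symbol stack[MAX_STACK];
    int sp = 0;
    assert(input != NULL);
    stack[sp++] = DOLLAR;                    /* $ marks bottom of stack */
    stack[sp++] = E;                         /* start symbol on top */
    while (sp > 0) {
        Symbol top = stack[--sp];
        if (top >= NUM_TERMS) {              /* non-terminal: look up production */
            assert(*input < NUM_TERMS);
            int p = table[top - NUM_TERMS][*input];
            if (p < 0) return false;         /* blank entry: error */
            assert(sp + prods[p].len <= MAX_STACK);
            for (int i = prods[p].len - 1; i >= 0; i--)
                stack[sp++] = prods[p].rhs[i];   /* leftmost symbol ends on top */
        } else {                             /* terminal: must match the input */
            if (top != *input) return false;
            input++;
        }
    }
    return true;                             /* stack empty: accept */
}
```

On the input int * int $ the stack goes through the same configurations as the trace table above, ending when $ on the stack is matched against $ in the input.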

Constructing Parsing Tables: The Intuition

• Consider non-terminal A, production A → α, & token t
• T[A,t] = α in two cases:
• If α →* t β
  – α can derive a t in the first position
  – We say that t ∈ First(α)
• If A → α and α →* ε and S →* β A t δ
  – Useful if stack has A, input is t, and A cannot derive t
  – In this case only option is to get rid of A (by deriving ε)
    • Can work only if t can follow A in at least one derivation
  – We say t ∈ Follow(A)

Computing First Sets

Definition
    First(X) = { t | X →* t α } ∪ { ε | X →* ε }

Algorithm sketch:
1. First(t) = { t }
2. ε ∈ First(X)
   • if X → ε
   • if X → A1 … An and ε ∈ First(Ai) for 1 ≤ i ≤ n
3. First(α) ⊆ First(X) if X → A1 … An α
   – and ε ∈ First(Ai) for 1 ≤ i ≤ n
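The algorithm sketch for First sets can be run as a fixed-point iteration: apply the rules to every production until no set changes. The code below does this for the left-factored grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. The bit-set representation, the EPS bit and the function name compute_first are assumptions chosen for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Terminals first, then non-terminals. */
typedef enum { INT, TIMES, PLUS, OPEN, CLOSE, NUM_TERMS,
               E = NUM_TERMS, X, T, Y, NUM_SYMS } Symbol;

enum { MAX_RHS = 3, NUM_PRODS = 7 };
#define EPS (1u << NUM_TERMS)        /* bit that stands for epsilon */

typedef struct { Symbol lhs; int len; Symbol rhs[MAX_RHS]; } Production;

static const Production prods[NUM_PRODS] = {
    {E, 2, {T, X}},
    {X, 2, {PLUS, E}},
    {X, 0, {INT}},                   /* X -> epsilon (rhs unused) */
    {T, 3, {OPEN, E, CLOSE}},
    {T, 2, {INT, Y}},
    {Y, 2, {TIMES, T}},
    {Y, 0, {INT}},                   /* Y -> epsilon (rhs unused) */
};

/* first[s] is a bit set: bit t for terminal t, EPS for epsilon. */
void compute_first(unsigned first[NUM_SYMS]) {
    assert(first != NULL);
    for (int s = 0; s < NUM_SYMS; s++)       /* rule 1: First(t) = { t } */
        first[s] = (s < NUM_TERMS) ? (1u << s) : 0u;

    bool changed = true;
    while (changed) {                        /* repeat until nothing is added */
        changed = false;
        for (int p = 0; p < NUM_PRODS; p++) {
            unsigned add = 0u;
            int i = 0;
            /* rule 3: take First(Ai) while A1 … A(i-1) all derive epsilon */
            for (; i < prods[p].len; i++) {
                unsigned f = first[prods[p].rhs[i]];
                add |= f & ~EPS;
                if (!(f & EPS)) break;
            }
            if (i == prods[p].len) add |= EPS;   /* rule 2: whole rhs can vanish */
            unsigned old = first[prods[p].lhs];
            first[prods[p].lhs] = old | add;
            if (first[prods[p].lhs] != old) changed = true;
        }
    }
}
```

For this grammar the iteration yields First(E) = First(T) = { int, ( }, First(X) = { +, ε } and First(Y) = { *, ε }.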

First Sets. Example

• Recall the grammar
    E → T X              X → + E | ε
    T → ( E ) | int Y    Y → * T | ε
• First sets
    First( ( ) = { ( }        First( T ) = { int, ( }
    First( ) ) = { ) }        First( E ) = { int, ( }
    First( int ) = { int }    First( X ) = { +, ε }
    First( + ) = { + }        First( Y ) = { *, ε }
    First( * ) = { * }

Computing Follow Sets

• Definition:
    Follow(X) = { t | S →* β X t δ }
• Intuition
  – If X → A B then First(B) ⊆ Follow(A) and Follow(X) ⊆ Follow(B)
    • if B →* ε then Follow(X) ⊆ Follow(A)
  – If S is the start symbol then $ ∈ Follow(S)

Computing Follow Sets (Cont.)

Algorithm sketch:
1. $ ∈ Follow(S)
2. First(β) - {ε} ⊆ Follow(X)
   – For each production A → α X β
3. Follow(A) ⊆ Follow(X)
   – For each production A → α X β where ε ∈ First(β)

Follow Sets. Example

• Recall the grammar
    E → T X              X → + E | ε
    T → ( E ) | int Y    Y → * T | ε
• Follow sets
    Follow( + ) = { int, ( }      Follow( * ) = { int, ( }
    Follow( ( ) = { int, ( }      Follow( E ) = { ), $ }
    Follow( X ) = { $, ) }        Follow( T ) = { +, ), $ }
    Follow( ) ) = { +, ), $ }     Follow( Y ) = { +, ), $ }
    Follow( int ) = { *, +, ), $ }
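The Follow algorithm can be sketched the same way, as a fixed-point iteration over the productions of E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. To keep the example short, the First sets of the non-terminals are hard-coded from the First Sets example rather than computed. The bit-set encoding and the name compute_follow are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { INT, TIMES, PLUS, OPEN, CLOSE, DOLLAR, NUM_TERMS,
               E = NUM_TERMS, X, T, Y, NUM_SYMS } Symbol;

enum { MAX_RHS = 3, NUM_PRODS = 7 };
#define BIT(t) (1u << (t))
#define EPS BIT(NUM_TERMS)           /* bit that stands for epsilon */

typedef struct { Symbol lhs; int len; Symbol rhs[MAX_RHS]; } Production;

static const Production prods[NUM_PRODS] = {
    {E, 2, {T, X}},          {X, 2, {PLUS, E}},  {X, 0, {INT}},
    {T, 3, {OPEN, E, CLOSE}}, {T, 2, {INT, Y}},
    {Y, 2, {TIMES, T}},      {Y, 0, {INT}},      /* len 0: epsilon */
};

/* First sets, hard-coded from the First Sets example. */
static const unsigned first[NUM_SYMS] = {
    [INT] = BIT(INT),   [TIMES] = BIT(TIMES), [PLUS] = BIT(PLUS),
    [OPEN] = BIT(OPEN), [CLOSE] = BIT(CLOSE), [DOLLAR] = BIT(DOLLAR),
    [E] = BIT(INT) | BIT(OPEN), [X] = BIT(PLUS) | EPS,
    [T] = BIT(INT) | BIT(OPEN), [Y] = BIT(TIMES) | EPS,
};

/* follow[s] is a bit set of terminals (including DOLLAR) for every symbol s. */
void compute_follow(unsigned follow[NUM_SYMS]) {
    assert(follow != NULL);
    for (int s = 0; s < NUM_SYMS; s++) follow[s] = 0u;
    follow[E] = BIT(DOLLAR);                 /* rule 1: $ in Follow(start) */

    bool changed = true;
    while (changed) {
        changed = false;
        for (int p = 0; p < NUM_PRODS; p++) {
            const Production *pr = &prods[p];
            for (int i = 0; i < pr->len; i++) {      /* X = rhs[i], beta = rest */
                unsigned add = 0u;
                int j = i + 1;
                for (; j < pr->len; j++) {           /* rule 2: First(beta) - {eps} */
                    unsigned f = first[pr->rhs[j]];
                    add |= f & ~EPS;
                    if (!(f & EPS)) break;
                }
                if (j == pr->len)                    /* rule 3: beta derives eps */
                    add |= follow[pr->lhs];
                unsigned merged = follow[pr->rhs[i]] | add;
                if (merged != follow[pr->rhs[i]]) {
                    follow[pr->rhs[i]] = merged;
                    changed = true;
                }
            }
        }
    }
}
```

The result agrees with the Follow sets listed in the example, for terminals as well as non-terminals.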

Constructing LL(1) Parsing Tables

• Construct a parsing table T for CFG G
• For each production A → α in G do:
  – For each terminal t ∈ First(α) do
    • T[A, t] = α
  – If ε ∈ First(α), for each t ∈ Follow(A) do
    • T[A, t] = α
  – If ε ∈ First(α) and $ ∈ Follow(A) do
    • T[A, $] = α

Notes on LL(1) Parsing Tables

• If any entry is multiply defined then G is not LL(1)
  – If G is ambiguous
  – If G is left recursive
  – If G is not left-factored
  – And in other cases as well
• Most programming language CFGs are not LL(1)
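The table construction can be sketched for the grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. The First and Follow sets of the non-terminals are hard-coded from the examples. Because $ is stored as an ordinary bit in the Follow sets, the two ε cases of the construction collapse into one loop. The encoding, the names build_table and first_of_seq, and the use of production indices as table entries are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { INT, TIMES, PLUS, OPEN, CLOSE, DOLLAR, NUM_TERMS,
               E = NUM_TERMS, X, T, Y, NUM_SYMS } Symbol;

enum { MAX_RHS = 3, NUM_PRODS = 7, NUM_NONTERMS = 4 };
#define BIT(t) (1u << (t))
#define EPS BIT(NUM_TERMS)           /* bit that stands for epsilon */

typedef struct { Symbol lhs; int len; Symbol rhs[MAX_RHS]; } Production;

static const Production prods[NUM_PRODS] = {
    {E, 2, {T, X}},           /* 0 */
    {X, 2, {PLUS, E}},        /* 1 */
    {X, 0, {INT}},            /* 2: X -> epsilon (rhs unused) */
    {T, 3, {OPEN, E, CLOSE}}, /* 3 */
    {T, 2, {INT, Y}},         /* 4 */
    {Y, 2, {TIMES, T}},       /* 5 */
    {Y, 0, {INT}},            /* 6: Y -> epsilon (rhs unused) */
};

/* Hard-coded from the First Sets and Follow Sets examples. */
static const unsigned first[NUM_SYMS] = {
    [INT] = BIT(INT),   [TIMES] = BIT(TIMES), [PLUS] = BIT(PLUS),
    [OPEN] = BIT(OPEN), [CLOSE] = BIT(CLOSE), [DOLLAR] = BIT(DOLLAR),
    [E] = BIT(INT) | BIT(OPEN), [X] = BIT(PLUS) | EPS,
    [T] = BIT(INT) | BIT(OPEN), [Y] = BIT(TIMES) | EPS,
};
static const unsigned follow[NUM_SYMS] = {
    [E] = BIT(CLOSE) | BIT(DOLLAR),
    [X] = BIT(CLOSE) | BIT(DOLLAR),
    [T] = BIT(PLUS) | BIT(CLOSE) | BIT(DOLLAR),
    [Y] = BIT(PLUS) | BIT(CLOSE) | BIT(DOLLAR),
};

/* First(alpha) for a sequence of symbols alpha. */
static unsigned first_of_seq(const Symbol *seq, int len) {
    unsigned set = 0u;
    for (int i = 0; i < len; i++) {
        unsigned f = first[seq[i]];
        set |= f & ~EPS;
        if (!(f & EPS)) return set;
    }
    return set | EPS;                /* all symbols (or none) derive epsilon */
}

/* Fills table with production indices (-1 = blank entry).
   Returns false if some entry is multiply defined, i.e. G is not LL(1). */
bool build_table(int table[NUM_NONTERMS][NUM_TERMS]) {
    bool ll1 = true;
    assert(table != NULL);
    for (int a = 0; a < NUM_NONTERMS; a++)
        for (int t = 0; t < NUM_TERMS; t++) table[a][t] = -1;

    for (int p = 0; p < NUM_PRODS; p++) {
        int a = prods[p].lhs - NUM_TERMS;
        unsigned f = first_of_seq(prods[p].rhs, prods[p].len);
        unsigned where = f & ~EPS;                   /* t in First(alpha) */
        if (f & EPS) where |= follow[prods[p].lhs];  /* t (or $) in Follow(A) */
        for (int t = 0; t < NUM_TERMS; t++) {
            if (!(where & BIT(t))) continue;
            if (table[a][t] != -1) ll1 = false;      /* multiply defined */
            table[a][t] = p;
        }
    }
    return ll1;
}
```

For this grammar no entry is defined twice, so build_table reports that the grammar is LL(1), and the filled entries correspond to the parsing table shown earlier.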