Top-Down Parsing
Originated from Prof. Aiken, CS 143. Modified by Yu Zhang.

Outline
• Recursive Descent Parsing
• Predictive Parsers

Intro to Top-Down Parsing: The Idea
• The parse tree is constructed
  – From the top
  – From left to right
• Terminals are seen in order of appearance in the token stream: t2 t5 t6 t8 t9
[Figure: parse tree with internal nodes 1, 3, 4, 7 and leaves t2, t5, t6, t8, t9, numbered in the order they are constructed]

Recursive Descent Parsing
• Consider the grammar
    E → T | T + E
    T → int | int * T | ( E )
• Token stream is: ( int5 )
• Start with top-level non-terminal E
  – Try the rules for E in order

Recursive Descent Parsing: Example Trace
Grammar:
    E → T | T + E
    T → int | int * T | ( E )
Token stream: ( int5 )

1. Start with the top-level non-terminal E.
2. Try E → T.
3. Try T → int. Mismatch: int is not ( ! Backtrack …
4. Try T → int * T. Mismatch: int is not ( ! Backtrack …
5. Try T → ( E ). The ( matches. Advance input.
6. Expand the inner E → T, then try T → int. int matches int5. Advance input.
7. The ) matches. Advance input.
8. End of input, accept.

Resulting parse tree: E → T → ( E ), where the inner E → T → int.

A Recursive Descent Parser. Preliminaries
• Let TOKEN be the type of tokens
  – Special tokens INT, OPEN, CLOSE, PLUS, TIMES
• Let the global next point to the next token

A (Limited) Recursive Descent Parser (2)
• Define boolean functions that check the token string for a match of
  – A given token terminal
      bool term(TOKEN tok) { return *next++ == tok; }
  – The nth production of S:
      bool Sn() { … }
  – Try all productions of S:
      bool S() { … }

A (Limited) Recursive Descent Parser (3)
• For production E → T
    bool E1() { return T(); }
• For production E → T + E
    bool E2() { return T() && term(PLUS) && E(); }
• For all productions of E (with backtracking)
    bool E() {
      TOKEN *save = next;
      return (next = save, E1())
          || (next = save, E2()); }

A (Limited) Recursive Descent Parser (4)
• Functions for non-terminal T
    bool T1() { return term(INT); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(OPEN) && E() && term(CLOSE); }
    bool T() {
      TOKEN *save = next;
      return (next = save, T1())
          || (next = save, T2())
          || (next = save, T3()); }

Recursive Descent Parsing. Notes.
• To start the parser
  – Initialize next to point to first token
  – Invoke E()
• Notice how this simulates the example parse
• Easy to implement by hand
  – But not completely general
  – Cannot backtrack once a production is successful
  – Works for grammars where at most one production can succeed for a non-terminal

Example
Grammar:
    E → T | T + E
    T → int | int * T | ( E )
Input: ( int )

    bool term(TOKEN tok) { return *next++ == tok; }

    bool E1() { return T(); }
    bool E2() { return T() && term(PLUS) && E(); }
    bool E() { TOKEN *save = next;
               return (next = save, E1())
                   || (next = save, E2()); }

    bool T1() { return term(INT); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(OPEN) && E() && term(CLOSE); }
    bool T() { TOKEN *save = next;
               return (next = save, T1())
                   || (next = save, T2())
                   || (next = save, T3()); }
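
The functions above can be assembled into a compilable unit roughly as follows. This is a minimal sketch, not the course's reference code: the END sentinel token and the parse() driver are assumptions added here so that the end of input can be detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Token kinds from the slides; END is an added end-of-input sentinel (assumption). */
typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;

static TOKEN *next;                       /* points to the next unread token */

static bool E(void);
static bool T(void);

/* Match one terminal and advance, exactly as on the slides. */
static bool term(TOKEN tok) { return *next++ == tok; }

static bool E1(void) { return T(); }                        /* E -> T     */
static bool E2(void) { return T() && term(PLUS) && E(); }   /* E -> T + E */
static bool E(void) {
    TOKEN *save = next;                   /* remember where this attempt started */
    return (next = save, E1()) || (next = save, E2());
}

static bool T1(void) { return term(INT); }                          /* T -> int     */
static bool T2(void) { return term(INT) && term(TIMES) && T(); }    /* T -> int * T */
static bool T3(void) { return term(OPEN) && E() && term(CLOSE); }   /* T -> ( E )   */
static bool T(void) {
    TOKEN *save = next;
    return (next = save, T1()) || (next = save, T2()) || (next = save, T3());
}

/* Hypothetical driver: accept only if E() succeeds and all input was consumed.
   The token array must end with END. */
bool parse(TOKEN *tokens) {
    assert(tokens != NULL);
    next = tokens;
    return E() && *next == END;
}
```

With this driver, ( int ) is accepted, but int * int is rejected even though it is in the language: T1 succeeds on the first int, so T never tries T2. That is the "cannot backtrack once a production is successful" limitation noted above.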

When Recursive Descent Does Not Work
• Consider a production S → S a
    bool S1() { return S() && term(a); }
    bool S() { return S1(); }
• S() goes into an infinite loop
• A left-recursive grammar has a non-terminal S with
    S →+ S α for some α
• Recursive descent does not work in such cases

Elimination of Left Recursion
• Consider the left-recursive grammar
    S → S α | β
• S generates all strings starting with a β and followed by a number of α
• Can rewrite using right-recursion
    S → β S'
    S' → α S' | ε
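
As a concrete illustration of the rewrite, take α = a and β = b (an illustrative choice, not from the slides). The left-recursive grammar S → S a | b becomes S → b S', S' → a S' | ε, which a recursive-descent recognizer can handle without looping. The names Sprime and matches below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Recognizer for S -> b S',  S' -> a S' | epsilon  (the language b a*). */
static const char *next;                  /* next unread input character */

static bool term(char c) { return *next++ == c; }

/* S' -> a S' | epsilon */
static bool Sprime(void) {
    const char *save = next;
    if (term('a') && Sprime()) return true;
    next = save;                          /* epsilon alternative: consume nothing */
    return true;
}

/* S -> b S' : the first call consumes a token, so the recursion terminates. */
static bool S(void) { return term('b') && Sprime(); }

/* Hypothetical driver: the whole string must be consumed. */
bool matches(const char *input) {
    assert(input != NULL);
    next = input;
    return S() && *next == '\0';
}
```

A direct translation of S → S a would call S() before consuming any input and recurse forever; in the rewritten grammar every recursive call is preceded by a successful term().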

More Elimination of Left-Recursion
• In general
    S → S α1 | … | S αn | β1 | … | βm
• All strings derived from S start with one of β1,…,βm and continue with several instances of α1,…,αn
• Rewrite as
    S → β1 S' | … | βm S'
    S' → α1 S' | … | αn S' | ε

General Left Recursion
• The grammar
    S → A α | δ
    A → S β
  is also left-recursive because
    S →+ S β α
• This left-recursion can also be eliminated
• See Dragon Book for general algorithm
  – Section 4.3

Summary of Recursive Descent
• Simple and general parsing strategy
  – Left-recursion must be eliminated first
  – … but that can be done automatically
• Unpopular because of backtracking
  – Thought to be too inefficient
• In practice, backtracking is eliminated by restricting the grammar

Predictive Parsers
• Like recursive-descent but parser can "predict" which production to use
  – By looking at the next few tokens
  – No backtracking
• Predictive parsers accept LL(k) grammars
  – L means "left-to-right" scan of input
  – L means "leftmost derivation"
  – k means "predict based on k tokens of lookahead"
• In practice, LL(1) is used

LL(1) vs. Recursive Descent
• In recursive-descent,
  – At each step, many choices of production to use
  – Backtracking used to undo bad choices
• In LL(1),
  – At each step, only one choice of production
  – That is
    • When a non-terminal A is leftmost in a derivation
    • The next input symbol is t
    • There is a unique production A → α to use
      – Or no production to use (an error state)
• LL(1) is a recursive descent variant without backtracking

Predictive Parsing and Left Factoring
• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )
• Hard to predict because
  – For T two productions start with int
  – For E it is not clear how to predict
• We need to left-factor the grammar

Left-Factoring Example
• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )
• Factor out common prefixes of productions
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε

Left-Recursion & Left-Factoring Example
• Recall the grammar
    E → E + T | T        -- +: left assoc.
    T → int | int * T | ( E )
• Eliminate left-recursion & factor out common prefixes
    E → T X
    X → + T X | ε
    T → ( E ) | int Y
    Y → * T | ε
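
Once the grammar is left-factored, each function can choose its production by peeking at the next token, so no save/restore of next is needed. The sketch below does this for the first left-factored grammar (E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε). The END sentinel, the non-advancing term(), and the parse() driver are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;  /* END: added sentinel */

static TOKEN *next;                       /* next unread token (one-token lookahead) */

static bool E(void);
static bool T(void);

/* Unlike the backtracking version, do not advance on a mismatch. */
static bool term(TOKEN tok) {
    if (*next != tok) return false;
    next++;
    return true;
}

/* X -> + E | epsilon : pick by looking at *next */
static bool X(void) { return *next == PLUS ? (next++, E()) : true; }

/* Y -> * T | epsilon */
static bool Y(void) { return *next == TIMES ? (next++, T()) : true; }

/* E -> T X */
static bool E(void) { return T() && X(); }

/* T -> ( E ) | int Y */
static bool T(void) {
    switch (*next) {
    case OPEN: next++; return E() && term(CLOSE);
    case INT:  next++; return Y();
    default:   return false;              /* no production starts with this token */
    }
}

/* Hypothetical driver: the token array must end with END. */
bool parse(TOKEN *tokens) {
    assert(tokens != NULL);
    next = tokens;
    return E() && *next == END;
}
```

Here X and Y take the ε alternative whenever the lookahead is not + or *, and leave it to the caller to reject a bad token. A table-driven parser instead checks that the token may actually follow the non-terminal.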

LL(1) Parsing Table Example
• Left-factored grammar
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
• The LL(1) parsing table (rows: leftmost non-terminal; columns: next input token; entries: rhs of production to use):

          int      *      +      (       )     $
    E     T X                    T X
    X                     + E            ε     ε
    T     int Y                  ( E )
    Y              * T    ε              ε     ε

LL(1) Parsing Table Example (Cont.)
• Consider the [E, int] entry
  – "When current non-terminal is E and next input is int, use production E → T X"
  – This can generate an int in the first position
• Consider the [Y, +] entry
  – "When current non-terminal is Y and current token is +, get rid of Y"
  – Y can be followed by + only if Y → ε
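
LL(1) Parsing Tables. Errors
• Blank entries indicate error situations
• Consider the [E, *] entry
  – "There is no way to derive a string starting with * from non-terminal E"

Predictive Parsing
[Figure: model of a predictive parser. The predictive parsing program reads the input (a + b $), keeps a stack (X Y Z $, with $ at the bottom), consults the parsing table M, and produces the output.]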
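
Using Parsing Tables
• Method similar to recursive descent, except
  – For the leftmost non-terminal S
  – We look at the next input token a
  – And choose the production shown at [S,a]

LL(1) Parsing Algorithm
    initialize stack = <S $> and next          ($ marks bottom of stack)
    repeat
      case stack of
        <X, rest> : if T[X, *next] = Y1 … Yn   (for non-terminal X on top of stack, look up production)
                      then stack ← <Y1 … Yn rest>;
                      else error();
        <t, rest> : if t == *next++            (for terminal t on top of stack, check that t matches next input token)
                      then stack ← <rest>;
                      else error();
    until stack == < >

LL(1) Parsing Example
    Stack          Input          Action
    E $            int * int $    T X
    T X $          int * int $    int Y
    int Y X $      int * int $    terminal
    Y X $          * int $        * T
    * T X $        * int $        terminal
    T X $          int $          int Y
    int Y X $      int $          terminal
    Y X $          $              ε
    X $            $              ε
    $              $              ACCEPT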
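
The stack-based algorithm can be sketched in C with the parsing table for the left-factored grammar (E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε) hard-coded. The symbol encoding, the PROD struct, the fixed stack size, and the name ll1_parse are all assumptions made for this illustration; END plays the role of $.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Terminals first (they double as table columns), then non-terminals. */
typedef enum { INT, TIMES, PLUS, OPEN, CLOSE, END,      /* END stands for $ */
               E, X, T, Y } SYM;
enum { NTERM = 6, NNONTERM = 4, MAXRHS = 3, MAXSTACK = 256 };

typedef struct { int len; SYM rhs[MAXRHS]; } PROD;

static const PROD prods[] = {
    {2, {T, X}},              /* 0: E -> T X   */
    {2, {PLUS, E}},           /* 1: X -> + E   */
    {0, {0}},                 /* 2: epsilon (used for both X and Y; rhs unused) */
    {3, {OPEN, E, CLOSE}},    /* 3: T -> ( E ) */
    {2, {INT, Y}},            /* 4: T -> int Y */
    {2, {TIMES, T}},          /* 5: Y -> * T   */
};

/* table[A - E][t] = production index, or -1 for a blank (error) entry. */
static const int table[NNONTERM][NTERM] = {
    /*         int   *   +   (   )   $ */
    /* E */ {   0,  -1, -1,  0, -1, -1 },
    /* X */ {  -1,  -1,  1, -1,  2,  2 },
    /* T */ {   4,  -1, -1,  3, -1, -1 },
    /* Y */ {  -1,   5,  2, -1,  2,  2 },
};

/* The input must contain only terminals and end with END. */
bool ll1_parse(const SYM *next) {
    SYM stack[MAXSTACK];
    int top = 0;
    assert(next != NULL);
    stack[top++] = END;                   /* $ marks the bottom of the stack */
    stack[top++] = E;                     /* start symbol on top */
    while (top > 0) {
        SYM s = stack[--top];
        if (s >= E) {                     /* non-terminal: consult the table */
            int p = table[s - E][*next];
            if (p < 0) return false;      /* blank entry: error */
            assert(top + prods[p].len <= MAXSTACK);
            for (int i = prods[p].len - 1; i >= 0; i--)
                stack[top++] = prods[p].rhs[i];   /* push rhs right to left */
        } else {                          /* terminal: must match the input */
            if (s != *next) return false;
            if (s != END) next++;         /* never advance past $ */
        }
    }
    return true;                          /* stack empty: accept */
}
```

Pushing the right-hand side in reverse keeps its first symbol on top of the stack, which is what makes the parse a leftmost derivation.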

Constructing Parsing Tables: The Intuition
• Consider non-terminal A, production A → α, & token t
• T[A,t] = α in two cases:
• If α →* t β
  – α can derive a t in the first position
  – We say that t ∈ First(α)
• If A → α and α →* ε and S →* β A t δ
  – Useful if stack has A, input is t, and A cannot derive t
  – In this case only option is to get rid of A (by deriving ε)
    • Can work only if t can follow A in at least one derivation
  – We say t ∈ Follow(A)

Computing First Sets
Definition
    First(X) = { t | X →* t α } ∪ { ε | X →* ε }
Algorithm sketch:
1. First(t) = { t }
2. ε ∈ First(X)
   • if X → ε
   • if X → A1 … An and ε ∈ First(Ai) for 1 ≤ i ≤ n
3. First(α) ⊆ First(X) if X → A1 … An α
   – and ε ∈ First(Ai) for 1 ≤ i ≤ n
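
The First-set algorithm sketch amounts to a fixed-point iteration over the productions. Below is one possible way to code it for the left-factored grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. The bitmask representation, the EPS bit, and the name compute_first are assumptions chosen for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Terminals, then non-terminals; END stands for $. */
typedef enum { INT, TIMES, PLUS, OPEN, CLOSE, END, E, X, T, Y, NSYM } SYM;
enum { EPS = NSYM, MAXRHS = 3, NPROD = 7 };   /* EPS: extra bit standing for epsilon */
#define BIT(s) (1u << (s))

typedef struct { SYM lhs; int len; SYM rhs[MAXRHS]; } PROD;

static const PROD prods[NPROD] = {
    {E, 2, {T, X}},
    {X, 2, {PLUS, E}},        {X, 0, {0}},    /* len 0 encodes an epsilon production */
    {T, 3, {OPEN, E, CLOSE}}, {T, 2, {INT, Y}},
    {Y, 2, {TIMES, T}},       {Y, 0, {0}},
};

/* Fills first[s] with a bitmask of terminals (plus EPS if s derives epsilon). */
void compute_first(unsigned first[NSYM]) {
    for (int s = 0; s < NSYM; s++)
        first[s] = (s < E) ? BIT(s) : 0;          /* rule 1: First(t) = { t } */
    bool changed = true;
    while (changed) {                              /* repeat until nothing grows */
        changed = false;
        for (int p = 0; p < NPROD; p++) {
            unsigned add = 0;
            int i = 0;
            /* rule 3: a rhs symbol contributes while all symbols before it derive epsilon */
            for (; i < prods[p].len; i++) {
                unsigned f = first[prods[p].rhs[i]];
                add |= f & ~BIT(EPS);
                if (!(f & BIT(EPS))) break;
            }
            if (i == prods[p].len) add |= BIT(EPS);   /* rule 2: whole rhs derives epsilon */
            unsigned old = first[prods[p].lhs];
            first[prods[p].lhs] = old | add;
            if (first[prods[p].lhs] != old) changed = true;
        }
    }
}
```

For this grammar the iteration yields First(E) = First(T) = {int, (}, First(X) = {+, ε}, and First(Y) = {*, ε}.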

First Sets. Example
• Recall the grammar
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
• First sets
    First( ( ) = { ( }          First( T ) = { int, ( }
    First( ) ) = { ) }          First( E ) = { int, ( }
    First( int ) = { int }      First( X ) = { +, ε }
    First( + ) = { + }          First( Y ) = { *, ε }
    First( * ) = { * }

Computing Follow Sets
• Definition:
    Follow(X) = { t | S →* β X t δ }
• Intuition
  – If X → A B then First(B) ⊆ Follow(A) and Follow(X) ⊆ Follow(B)
    • if B →* ε then Follow(X) ⊆ Follow(A)
  – If S is the start symbol then $ ∈ Follow(S)

Computing Follow Sets (Cont.)
Algorithm sketch:
1. $ ∈ Follow(S)
2. First(β) - {ε} ⊆ Follow(X)
   – For each production A → α X β
3. Follow(A) ⊆ Follow(X)
   – For each production A → α X β where ε ∈ First(β)

Follow Sets. Example
• Recall the grammar
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
• Follow sets
    Follow( + ) = { int, ( }        Follow( * ) = { int, ( }
    Follow( ( ) = { int, ( }        Follow( E ) = { ), $ }
    Follow( X ) = { $, ) }          Follow( T ) = { +, ), $ }
    Follow( ) ) = { +, ), $ }       Follow( Y ) = { +, ), $ }
    Follow( int ) = { *, +, ), $ }
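
The three Follow rules can be coded as another fixed-point iteration on top of the First sets. This is a sketch for the same left-factored grammar; the bitmask encoding and the names first_of, compute_first, and compute_follow are assumptions for illustration. Following the example above, Follow is computed for terminals as well as non-terminals.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { INT, TIMES, PLUS, OPEN, CLOSE, END, E, X, T, Y, NSYM } SYM;
enum { EPS = NSYM, MAXRHS = 3, NPROD = 7 };   /* EPS: extra bit standing for epsilon */
#define BIT(s) (1u << (s))

typedef struct { SYM lhs; int len; SYM rhs[MAXRHS]; } PROD;

static const PROD prods[NPROD] = {
    {E, 2, {T, X}},
    {X, 2, {PLUS, E}},        {X, 0, {0}},
    {T, 3, {OPEN, E, CLOSE}}, {T, 2, {INT, Y}},
    {Y, 2, {TIMES, T}},       {Y, 0, {0}},
};

/* First of the string rhs[from..len-1]; contains EPS if the whole string derives epsilon. */
static unsigned first_of(const unsigned first[NSYM], const PROD *p, int from) {
    unsigned out = 0;
    for (int i = from; i < p->len; i++) {
        unsigned f = first[p->rhs[i]];
        out |= f & ~BIT(EPS);
        if (!(f & BIT(EPS))) return out;
    }
    return out | BIT(EPS);
}

static void compute_first(unsigned first[NSYM]) {
    for (int s = 0; s < NSYM; s++) first[s] = (s < E) ? BIT(s) : 0;
    for (bool changed = true; changed; ) {
        changed = false;
        for (int p = 0; p < NPROD; p++) {
            unsigned old = first[prods[p].lhs];
            first[prods[p].lhs] |= first_of(first, &prods[p], 0);
            changed |= first[prods[p].lhs] != old;
        }
    }
}

/* Fills follow[s] with a bitmask of terminals (END stands for $). E is the start symbol. */
void compute_follow(unsigned follow[NSYM]) {
    unsigned first[NSYM];
    compute_first(first);
    for (int s = 0; s < NSYM; s++) follow[s] = 0;
    follow[E] = BIT(END);                                  /* rule 1 */
    for (bool changed = true; changed; ) {
        changed = false;
        for (int p = 0; p < NPROD; p++)
            for (int i = 0; i < prods[p].len; i++) {       /* production A -> alpha X beta */
                SYM x = prods[p].rhs[i];
                unsigned fb = first_of(first, &prods[p], i + 1);   /* First(beta) */
                unsigned add = fb & ~BIT(EPS);                     /* rule 2 */
                if (fb & BIT(EPS)) add |= follow[prods[p].lhs];    /* rule 3 */
                unsigned old = follow[x];
                follow[x] |= add;
                changed |= follow[x] != old;
            }
    }
}
```

When X is the last symbol of a production, β is empty, first_of returns just EPS, and only rule 3 applies.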

Constructing LL(1) Parsing Tables
• Construct a parsing table T for CFG G
• For each production A → α in G do:
  – For each terminal t ∈ First(α) do
    • T[A, t] = α
  – If ε ∈ First(α), for each t ∈ Follow(A) do
    • T[A, t] = α
  – If ε ∈ First(α) and $ ∈ Follow(A) do
    • T[A, $] = α

Notes on LL(1) Parsing Tables
• If any entry is multiply defined then G is not LL(1)
  – If G is ambiguous
  – If G is left recursive
  – If G is not left-factored
  – And in other cases as well
• Most programming language CFGs are not LL(1)