Top-Down Parsing

Originated from Prof. Aiken, CS 143. Modified by Yu Zhang.

Outline
• Recursive Descent Parsing
• Predictive Parsers

Intro to Top-Down Parsing: The Idea
• The parse tree is constructed
  – From the top
  – From left to right
• Terminals are seen in order of appearance in the token stream: t2 t5 t6 t8 t9

[Figure: parse tree with internal nodes numbered 1, 3, 4, 7 and leaves t2, t5, t6, t8, t9.]

Recursive Descent Parsing
• Consider the grammar
    E → T | T + E
    T → int | int * T | ( E )
• Token stream is: ( int5 )
• Start with top-level non-terminal E
  – Try the rules for E in order

Recursive Descent Parsing: walkthrough on ( int5 )
1. Start with E and expand E → T.
2. Try T → int. Mismatch: int is not ( ! Backtrack.
3. Try T → int * T. Mismatch: int is not ( ! Backtrack.
4. Try T → ( E ). The ( matches. Advance input.
5. Expand the inner E → T, then T → int. The int matches int5. Advance input.
6. The ) matches. Advance input.
7. End of input, accept.

A Recursive Descent Parser.
Preliminaries
• Let TOKEN be the type of tokens
  – Special tokens INT, OPEN, CLOSE, PLUS, TIMES
• Let the global next point to the next token

A (Limited) Recursive Descent Parser (2)
• Define boolean functions that check the token string for a match of
  – A given token terminal
      bool term(TOKEN tok) { return *next++ == tok; }
  – The nth production of S:
      bool Sn() { … }
  – Try all productions of S:
      bool S() { … }

A (Limited) Recursive Descent Parser (3)
• For production E → T
    bool E1() { return T(); }
• For production E → T + E
    bool E2() { return T() && term(PLUS) && E(); }
• For all productions of E (with backtracking)
    bool E() {
      TOKEN *save = next;
      return (next = save, E1())
          || (next = save, E2()); }

A (Limited) Recursive Descent Parser (4)
• Functions for non-terminal T
    bool T1() { return term(INT); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(OPEN) && E() && term(CLOSE); }
    bool T() {
      TOKEN *save = next;
      return (next = save, T1())
          || (next = save, T2())
          || (next = save, T3()); }

Recursive Descent Parsing. Notes.
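As a hedged illustration, the fragments from the preceding slides can be assembled into one compilable C sketch. The TOKEN enum, the END sentinel, and the parse helper are assumptions added here for illustration; they do not appear in the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed token type; END is a hypothetical end-of-input sentinel. */
typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;

static const TOKEN *next;            /* points to the next unread token */

static bool E(void);
static bool T(void);

/* Compare the next token with tok and advance, as on the slides. */
static bool term(TOKEN tok) { return *next++ == tok; }

/* E -> T | T + E */
static bool E1(void) { return T(); }
static bool E2(void) { return T() && term(PLUS) && E(); }
static bool E(void) {
    const TOKEN *save = next;        /* remember position for backtracking */
    return (next = save, E1())
        || (next = save, E2());
}

/* T -> int | int * T | ( E ) */
static bool T1(void) { return term(INT); }
static bool T2(void) { return term(INT) && term(TIMES) && T(); }
static bool T3(void) { return term(OPEN) && E() && term(CLOSE); }
static bool T(void) {
    const TOKEN *save = next;
    return (next = save, T1())
        || (next = save, T2())
        || (next = save, T3());
}

/* Hypothetical driver: accept only if E() succeeds and all input is consumed.
   The token array must be terminated by END. */
bool parse(const TOKEN *tokens) {
    assert(tokens != NULL);
    next = tokens;
    return E() && *next == END;
}
```

With this sketch, parse accepts the token sequence OPEN INT CLOSE END, but rejects INT TIMES INT END: T1 succeeds on the first int, so T() and E() return before T2 is tried, and the unread tokens make the final check fail. That is the "cannot backtrack once a production is successful" limitation.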
• To start the parser
  – Initialize next to point to first token
  – Invoke E()
• Notice how this simulates the example parse
• Easy to implement by hand
  – But not completely general
  – Cannot backtrack once a production is successful
  – Works for grammars where at most one production can succeed for a non-terminal

Example
    E → T | T + E
    T → int | int * T | ( E )
  Input: ( int )

    bool term(TOKEN tok) { return *next++ == tok; }

    bool E1() { return T(); }
    bool E2() { return T() && term(PLUS) && E(); }
    bool E() { TOKEN *save = next;
               return (next = save, E1())
                   || (next = save, E2()); }

    bool T1() { return term(INT); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(OPEN) && E() && term(CLOSE); }
    bool T() { TOKEN *save = next;
               return (next = save, T1())
                   || (next = save, T2())
                   || (next = save, T3()); }

When Recursive Descent Does Not Work
• Consider a production S → S a
    bool S1() { return S() && term(a); }
    bool S() { return S1(); }
• S() goes into an infinite loop
• A left-recursive grammar has a non-terminal S such that
    S →+ S α for some α
• Recursive descent does not work in such cases

Elimination of Left Recursion
• Consider the left-recursive grammar
    S → S α | β
• S generates all strings starting with a β and followed by a number of α
• Can rewrite using right-recursion
    S → β S'
    S' → α S' | ε

More Elimination of Left-Recursion
• In general
    S → S α1 | … | S αn | β1 | … | βm
• All strings derived from S start with one of β1,…,βm and continue with several instances of α1,…,αn
• Rewrite as
    S → β1 S' | … | βm S'
    S' → α1 S' | … | αn S' | ε

General Left Recursion
• The grammar
    S → A α | δ
    A → S β
  is also left-recursive because
    S →+ S β α
• This left-recursion can also be eliminated
• See Dragon Book for general algorithm
  – Section 4.3

Summary of Recursive Descent
• Simple and general parsing strategy
  – Left-recursion must be eliminated first
  – … but that can be done automatically
• Unpopular because of backtracking
  – Thought to be too inefficient
• In practice, backtracking is eliminated by restricting the grammar

Predictive Parsers
• Like recursive-descent but parser can "predict" which production to use
  – By looking at the next few tokens
  – No backtracking
• Predictive parsers accept LL(k) grammars
  – The first L means "left-to-right" scan of input
  – The second L means "leftmost derivation"
  – k means "predict based on k tokens of lookahead"
  – In practice, LL(1) is used

LL(1) vs. Recursive Descent
• In recursive-descent,
  – At each step, many choices of production to use
  – Backtracking used to undo bad choices
• In LL(1),
  – At each step, only one choice of production
  – That is
    • When a non-terminal A is leftmost in a derivation
    • The next input symbol is t
    • There is a unique production A → α to use
      – Or no production to use (an error state)
• LL(1) is a recursive descent variant without backtracking

Predictive Parsing and Left Factoring
• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )
• Hard to predict because
  – For T two productions start with int
  – For E it is not clear how to predict
• We need to left-factor the grammar

Left-Factoring Example
• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )
• Factor out common prefixes of productions
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε

Left-Recursion & Left-Factoring Example
• Recall the grammar
    E → E + T | T          (+ is left-associative)
    T → int | int * T | ( E )
• Eliminate left-recursion & factor out common prefixes
    E → T X
    X → + T X | ε
    T → ( E ) | int Y
    Y → * T | ε

LL(1) Parsing Table Example
• Left-factored grammar
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
• The LL(1) parsing table (rows: leftmost non-terminal; columns: next input token; entries: rhs of production to use):

          int      *      +      (       )      $
    E     T X                    T X
    X                     + E            ε      ε
    T     int Y                  ( E )
    Y              * T    ε              ε      ε

LL(1) Parsing Table Example (Cont.)
• Consider the [E, int] entry
  – "When current non-terminal is E and next input is int, use production E → T X"
  – This can generate an int in the first position
• Consider the [Y, +] entry
  – "When current non-terminal is Y and current token is +, get rid of Y"
  – Y can be followed by + only if Y → ε

LL(1) Parsing Tables.
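One possible way to encode the table for the left-factored grammar is sketched below in C, together with a small stack-driven loop that exercises it. The SYM enum, the production numbering, the END symbol standing in for $, the fixed stack size, and the ll1_parse function are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Terminals first (END stands in for $), then non-terminals. */
typedef enum { INT, TIMES, PLUS, OPEN, CLOSE, END,
               NT_E, NT_X, NT_T, NT_Y } SYM;

enum { MAX_RHS = 3, STACK_MAX = 256 };

/* A production right-hand side; len == 0 encodes epsilon. */
typedef struct { int len; SYM rhs[MAX_RHS]; } PROD;

static const PROD prods[] = {
    /* 0: E -> T X   */ {2, {NT_T, NT_X}},
    /* 1: X -> + E   */ {2, {PLUS, NT_E}},
    /* 2: X -> eps   */ {0, {INT}},          /* rhs unused */
    /* 3: T -> int Y */ {2, {INT, NT_Y}},
    /* 4: T -> ( E ) */ {3, {OPEN, NT_E, CLOSE}},
    /* 5: Y -> * T   */ {2, {TIMES, NT_T}},
    /* 6: Y -> eps   */ {0, {INT}},          /* rhs unused */
};

/* table[non-terminal][terminal] = production index, or -1 for a blank entry. */
static const int table[4][6] = {
    /*        int   *   +   (   )   $  */
    /* E */ {  0,  -1, -1,  0, -1, -1 },
    /* X */ { -1,  -1,  1, -1,  2,  2 },
    /* T */ {  3,  -1, -1,  4, -1, -1 },
    /* Y */ { -1,   5,  6, -1,  6,  6 },
};

/* Input must contain only terminals and end with END. */
bool ll1_parse(const SYM *next) {
    SYM stack[STACK_MAX];
    int sp = 0;
    stack[sp++] = END;                    /* $ marks bottom of stack */
    stack[sp++] = NT_E;                   /* start symbol on top */
    while (sp > 0) {
        SYM top = stack[--sp];
        if (top >= NT_E) {                /* non-terminal: look up production */
            int p = table[top - NT_E][*next];
            if (p < 0) return false;      /* blank entry: error */
            assert(sp + prods[p].len <= STACK_MAX);
            /* push rhs in reverse so its leftmost symbol ends up on top */
            for (int i = prods[p].len - 1; i >= 0; i--)
                stack[sp++] = prods[p].rhs[i];
        } else if (top != *next++) {      /* terminal: must match the input */
            return false;
        }
    }
    return true;                          /* stack empty: accept */
}
```

Under these assumptions, ll1_parse accepts INT TIMES INT END and rejects TIMES INT END, because the [E, *] entry is blank.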
Errors
• Blank entries indicate error situations
• Consider the [E, *] entry
  – "There is no way to derive a string starting with * from non-terminal E"

Predictive Parsing
[Figure: model of a predictive parser. An input buffer (a + b $) and a stack (X Y Z $) feed the predictive parsing program, which consults parsing table M and produces the output.]

Using Parsing Tables
• Method similar to recursive descent, except
  – For the leftmost non-terminal S
  – We look at the next input token a
  – And choose the production shown at [S, a]
• A stack records frontier of parse tree
  – Non-terminals that have yet to be expanded
  – Terminals that have yet to be matched against the input
  – Top of stack = leftmost pending terminal or non-terminal
• Reject on reaching error state
• Accept on end of input & empty stack

LL(1) Parsing Algorithm
    initialize stack = <S $> and next        ($ marks bottom of stack)
    repeat
      case stack of
        <X, rest> : if T[X,*next] == Y1…Yn   (for non-terminal X on top of stack, look up production)
                      then stack ← <Y1…Yn rest>;
                      else error ();
        <t, rest> : if t == *next++          (for terminal t on top of stack, check t matches next input token)
                      then stack ← <rest>;
                      else error ();
    until stack == < >
• For a non-terminal: pop X, push production rhs on stack. Note that the leftmost symbol of the rhs is on top of the stack.

LL(1) Parsing Example
    Stack          Input          Action
    E $            int * int $    T X
    T X $          int * int $    int Y
    int Y X $      int * int $    terminal
    Y X $          * int $        * T
    * T X $        * int $        terminal
    T X $          int $          int Y
    int Y X $      int $          terminal
    Y X $          $              ε
    X $            $              ε
    $              $              ACCEPT

Constructing Parsing Tables: The Intuition
• Consider non-terminal A, production A → α, & token t
• T[A,t] = α in two cases:
• If α →* t β
  – α can derive a t in the first position
  – We say that t ∈ First(α)
• If A → α and α →* ε and S →* β A t δ
  – Useful if stack has A, input is t, and A cannot derive t
  – In this case only option is to get rid of A (by deriving ε)
    • Can work only if t can follow A in at least one derivation
  – We say t ∈ Follow(A)

Computing First Sets
Definition
    First(X) = { t | X →* t α } ∪ { ε | X →* ε }
Algorithm sketch:
1. First(t) = { t }
2. ε ∈ First(X)
   • if X → ε
   • if X → A1 … An and ε ∈ First(Ai) for 1 ≤ i ≤ n
3. First(α) ⊆ First(X) if X → A1 … An α
   – and ε ∈ First(Ai) for 1 ≤ i ≤ n

First Sets.
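The First-set rules can be turned into a fixed-point iteration. The C sketch below applies them to the left-factored grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. The bit-set representation, the SYM enum with its EPS marker, and the compute_first function are assumptions chosen for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Terminals, then EPS (a marker for epsilon), then non-terminals. */
typedef enum { INT, TIMES, PLUS, OPEN, CLOSE, EPS,
               NT_E, NT_X, NT_T, NT_Y } SYM;

enum { FIRST_NT = NT_E, NUM_SYMS = NT_Y + 1, MAX_RHS = 3 };

typedef struct { SYM lhs; int len; SYM rhs[MAX_RHS]; } PROD;

static const PROD prods[] = {
    { NT_E, 2, {NT_T, NT_X} },          /* E -> T X   */
    { NT_X, 2, {PLUS, NT_E} },          /* X -> + E   */
    { NT_X, 0, {INT} },                 /* X -> eps (rhs unused) */
    { NT_T, 2, {INT, NT_Y} },           /* T -> int Y */
    { NT_T, 3, {OPEN, NT_E, CLOSE} },   /* T -> ( E ) */
    { NT_Y, 2, {TIMES, NT_T} },         /* Y -> * T   */
    { NT_Y, 0, {INT} },                 /* Y -> eps (rhs unused) */
};
enum { NUM_PRODS = sizeof prods / sizeof prods[0] };

#define BIT(s) (1u << (s))

/* first[s] is a bit set over the terminals and EPS. */
void compute_first(unsigned first[NUM_SYMS]) {
    /* Rule 1: First(t) = { t } for terminals; non-terminals start empty. */
    for (int s = 0; s < NUM_SYMS; s++)
        first[s] = (s < FIRST_NT) ? BIT(s) : 0u;

    bool changed = true;
    while (changed) {                   /* iterate until nothing grows */
        changed = false;
        for (int p = 0; p < NUM_PRODS; p++) {
            unsigned add = 0u;
            int i = 0;
            /* Rule 3: add First(Ai) minus eps while A1..A(i-1) all derive eps. */
            for (; i < prods[p].len; i++) {
                unsigned f = first[prods[p].rhs[i]];
                add |= f & ~BIT(EPS);
                if (!(f & BIT(EPS))) break;
            }
            /* Rule 2: eps is in First(X) if the rhs is empty or all Ai derive eps. */
            if (i == prods[p].len) add |= BIT(EPS);

            unsigned old = first[prods[p].lhs];
            first[prods[p].lhs] = old | add;
            if (first[prods[p].lhs] != old) changed = true;
        }
    }
}
```

For this grammar the iteration yields First(E) = First(T) = { int, ( }, First(X) = { +, ε }, and First(Y) = { *, ε }.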
