Top-Down Parsing

Adapted from a lecture by Profs. Alex Aiken & George Necula (UCB)
CS780(Prasad) L101TDP

Lecture Outline
• Implementation of parsers
• Two approaches
  – Top-down
  – Bottom-up
• Top-Down
  – Easier to understand and program manually
• Bottom-Up
  – More powerful and used by most parser generators

Intro to Top-Down Parsing
• The parse tree is constructed
  – From the top
  – From left to right
• Terminals are seen in order of appearance in the token stream: t2 t5 t6 t8 t9
[Figure: parse tree with nodes 1, 3, 4, 7 and terminal leaves t2, t5, t6, t8, t9]

Recursive Descent Parsing
• Consider the grammar
    E → T + E | T
    T → ( E ) | int | int * T
• Token stream is: int5 * int2
• Start with top-level non-terminal E
• Try the rules for E in order

Recursive Descent Parsing. Example (Cont.)
• Try E0 → T1 + E2
• Then try a rule for T1 → ( E3 )
  – But ( does not match input token int5
• Try T1 → int. Token matches.
  – But + after T1 does not match input token *
• Try T1 → int * T2
  – This will match but + after T1 will be unmatched
• Has exhausted the choices for T1
  – Backtrack to choice for E0

Recursive Descent Parsing. Example (Cont.)
• Try E0 → T1
• Follow same steps as before for T1
  – And succeed with T1 → int * T2 and T2 → int
  – With the following parse tree

    E0
      T1
        int5
        *
        T2
          int2

A Recursive Descent Parser. Preliminaries
• Let TOKEN be the type of tokens
  – Special tokens INT, OPEN, CLOSE, PLUS, TIMES
• Let the global next point to the next token

A Recursive Descent Parser (2)
• Define boolean functions that check the token string for a match of
  – A given token terminal
      bool term(TOKEN tok) { return *next++ == tok; }
  – A given production of S (the nth)
      bool Sn() { … }
  – Any production of S:
      bool S() { … }
• These functions advance next

A Recursive Descent Parser (3)
• For production E → T + E
    bool E1() { return T() && term(PLUS) && E(); }
• For production E → T
    bool E2() { return T(); }
• For all productions of E (with backtracking)
    bool E() {
      TOKEN *save = next;
      return (next = save, E1())
          || (next = save, E2()); }

A Recursive Descent Parser (4)
• Functions for non-terminal T
    bool T1() { return term(OPEN) && E() && term(CLOSE); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(INT); }

    bool T() {
      TOKEN *save = next;
      return (next = save, T1())
          || (next = save, T2())
          || (next = save, T3()); }

Recursive Descent Parsing. Notes.
• To start the parser
  – Initialize next to point to first token
  – Invoke E()
• Notice how this simulates our previous example.
• Easy to implement by hand
• But does not always work …

When Recursive Descent Does Not Work
• Consider a production S → S a
    bool S1() { return S() && term(a); }
    bool S() { return S1(); }
• S() will get into an infinite loop
• A left-recursive grammar has a non-terminal S with
    S →+ S α for some α
• Recursive descent does not work in such cases.
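The fragments on the recursive descent slides can be assembled into one compilable unit. The following is a minimal sketch, not the lecture's definitive code: the END sentinel token and the parse() driver are assumptions added for illustration, and the token array passed in is assumed to end with END.

```c
#include <stdbool.h>

/* Token kinds from the slides; END is an assumed end-of-input sentinel. */
typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;

static const TOKEN *next;          /* global cursor into the token array */

static bool E(void);
static bool T(void);

/* Match one terminal. Always advances; callers restore next on failure. */
static bool term(TOKEN tok) { return *next++ == tok; }

static bool E1(void) { return T() && term(PLUS) && E(); }   /* E -> T + E */
static bool E2(void) { return T(); }                         /* E -> T     */

/* Try each production of E in order, rewinding next before each attempt. */
static bool E(void) {
    const TOKEN *save = next;
    return (next = save, E1())
        || (next = save, E2());
}

static bool T1(void) { return term(OPEN) && E() && term(CLOSE); } /* T -> ( E )   */
static bool T2(void) { return term(INT) && term(TIMES) && T(); }  /* T -> int * T */
static bool T3(void) { return term(INT); }                        /* T -> int     */

static bool T(void) {
    const TOKEN *save = next;
    return (next = save, T1())
        || (next = save, T2())
        || (next = save, T3());
}

/* Hypothetical driver: succeeds if E matches and stops exactly at END. */
bool parse(const TOKEN *tokens) {
    next = tokens;
    return E() && *next == END;
}
```

Calling parse on the array INT, TIMES, INT, END mirrors the int5 * int2 example. Note that each function commits to the first alternative that succeeds, so the order in which productions are tried matters.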
Elimination of Left Recursion
• Consider the left-recursive grammar
    S → S α | β
• S generates all strings starting with a β and followed by a number of α: β α*
• Can rewrite using right-recursion
    S → β S'
    S' → α S' | ε

More Elimination of Left-Recursion
• In general
    S → S α1 | … | S αn | β1 | … | βm
• All strings derived from S start with one of β1,…,βm and continue with several instances of α1,…,αn
• Rewrite as
    S → β1 S' | … | βm S'
    S' → α1 S' | … | αn S' | ε

General Left Recursion
• The grammar
    S → A α | δ
    A → S β
  is also left-recursive because
    S →+ S β α
• This left-recursion can also be eliminated.
• More examples on the following slides.

(Cf. Gaussian Elimination)
• Start with
    A → Bb | a
    B → Aa | b
• Substitute A into B
    A → Bb | a
    B → (Bb | a)a | b
• Expand
    A → Bb | a
    B → Bba | aa | b
• Eliminate the immediate left recursion in B
    A → Bb | a
    B → (aa | b)Z | (aa | b)
    Z → baZ | ba

Example: Related to conversion to Greibach Normal Form
• Eliminating left recursion
    A → BC
    B → CA | b
    C → AB | a
  Substitute A into C
    A → BC
    B → CA | b
    C → BCB | a
  Substitute B into C
    C → CACB | bCB | a
  Remove the immediate left recursion
    C → (bCB | a)R | bCB | a
    R → ACBR | ACB
• Introducing terminals as first element on RHS
    C → bCBR | aR | bCB | a
    B → bCBRA | aRA | bCBA | aA | b
    A → bCBRAC | aRAC | bCBAC | aAC | bC
    R → (bCBRAC | ... | bC)(CBR | CB)

Summary of Recursive Descent
• Simple and general parsing strategy
  – Left-recursion must be eliminated first
  – … but that can be done automatically
• Unpopular because of backtracking
  – Thought to be too inefficient
  – Cf. Prolog execution strategy
• In practice, backtracking is eliminated by restricting the grammar
  – To enable "look-before-you-leap" strategy

Predictive Parsers
• Like recursive-descent but parser can "predict" which production to use.
  – By looking at the next few tokens.
  – No backtracking.
• Predictive parsers accept LL(k) grammars.
  – The first L means "left-to-right" scan of input.
  – The second L means "leftmost derivation".
  – k means "predict based on k tokens of lookahead".
• In practice, LL(1) is used.

• LL(k) grammars
• LR(k) grammars
  – L means "left-to-right" scan of input
  – R means "rightmost derivation"
  – k means "predict based on k tokens of lookahead"
• RL(1) grammars
  – R means "right-to-left" scan of input
• LR(0), LR(1) grammars
• SLR(1) grammars, LALR(1) grammars

LL(1) Languages
• In recursive-descent, for each non-terminal and input token there may be a choice of production.
• LL(1) means that for each non-terminal and token there is only one production.
• Can be specified via 2D tables.
  – One dimension for current non-terminal to expand.
  – One dimension for next token.
  – A table entry contains one production.

Predictive Parsing and Left Factoring
• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )
• Hard to predict because
  – For T, two productions start with int.
  – For E, it is not clear how to predict.
• A grammar must be left-factored before use for predictive parsing.

Left-Factoring Example
• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )
• Factor out common prefixes of productions, possibly introducing ε-productions
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε

LL(1) Parsing Table Example
• Left-factored grammar
    E → T X          X → + E | ε
    T → ( E ) | int Y    Y → * T | ε
• The LL(1) parsing table:

         int      *      +      (      )      $
    E    T X                    T X
    X                    + E           ε      ε
    T    int Y                  ( E )
    Y             * T    ε             ε      ε

LL(1) Parsing Table Example (Cont.)
• Consider the [E, int] entry
  – "When current non-terminal is E and next input is int, use production E → T X."
  – This production can generate an int in the first place.

LL(1) Parsing Tables. Errors
• Blank entries indicate error situations
  – Consider the [E,*] entry
  – "There is no way to derive a string starting with * from non-terminal E"
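The LL(1) parsing table for the left-factored grammar can be stored as a two-dimensional array indexed by non-terminal and next token. The sketch below is one possible encoding, not the lecture's: the enum names, the lookup() helper, and the choice to store right-hand sides as strings (with "" for ε and NULL for a blank error entry) are all assumptions made for illustration.

```c
#include <stddef.h>

/* Columns of the table: the next input token ($ is T_END). */
typedef enum { T_INT, T_TIMES, T_PLUS, T_OPEN, T_CLOSE, T_END, NUM_TOKENS } Token;

/* Rows of the table: the non-terminal to expand. */
typedef enum { N_E, N_X, N_T, N_Y, NUM_NONTERMS } NonTerm;

/* Each entry is the right-hand side to use.
   ""   encodes an epsilon production,
   NULL (any entry not listed) encodes a blank cell, i.e. an error. */
static const char *const table[NUM_NONTERMS][NUM_TOKENS] = {
    [N_E] = { [T_INT] = "T X",   [T_OPEN] = "T X" },
    [N_X] = { [T_PLUS] = "+ E",  [T_CLOSE] = "", [T_END] = "" },
    [N_T] = { [T_INT] = "int Y", [T_OPEN] = "( E )" },
    [N_Y] = { [T_TIMES] = "* T", [T_PLUS] = "", [T_CLOSE] = "", [T_END] = "" },
};

/* Return the production chosen for (n, t), or NULL on a parse error. */
const char *lookup(NonTerm n, Token t) { return table[n][t]; }
```

With this encoding, lookup(N_E, T_INT) yields the right-hand side of E → T X, while lookup(N_E, T_TIMES) yields NULL because no string derived from E can start with *.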
LL(1) Parsing Table Example (Cont.)
• Consider the [Y,+] entry
  – "When current non-terminal is Y and current token is +, get rid of Y".
  – Y can be followed by + only in a derivation in which Y → ε.

Using Parsing Tables
• Method similar to recursive descent, except
  – For each non-terminal X
  – We look at the next token t
  – And choose the production shown at [X,t]
• We use a stack to keep track of pending non-terminals.
• We reject when we encounter an error state.
• We accept when we encounter end-of-input.

LL(1) Parsing Algorithm
    initialize stack = <S $> and next
    repeat
      case stack of
        <X, rest> : if T[X,*next] = Y1…Yn
                      then stack ← <Y1…Yn, rest>;
                      else error ();
        <t, rest> : if t == *next++
                      then stack ← <rest>;
                      else error ();
    until stack == < >

LL(1) Parsing Example

    Stack          Input          Action
    E $            int * int $    T X
    T X $          int * int $    int Y
    int Y X $      int * int $    terminal
    Y X $          * int $        * T
    * T X $        * int $        terminal
    T X $          int $          int Y
    int Y X $      int $          terminal
    Y X $          $              ε
    X $            $              ε
    $              $              ACCEPT

Constructing Parsing Tables
• LL(1) languages are those defined by a parsing table for the LL(1) algorithm.
• No table entry can be multiply defined.
• We want to generate parsing tables from CFG.

Constructing Parsing Tables (Cont.)
• If A → α, where in the line of A do we place α?
• In the column of t where t can start a string derived from α.
  – α →* t β
  – We say that t ∈ First(α).
• In the column of t if α is or derives ε and t can follow an A.
  – S →* β A t δ
  – We say t ∈ Follow(A).

Computing First Sets
Definition
    First(X) = { t | X →* t α } ∪ { ε | X →* ε }
Algorithm sketch:
1. First(t) = { t }
2. ε ∈ First(X) if X → ε is a production
3. ε ∈ First(X) if X → A1 … An
   – and ε ∈ First(Ai) for 1 ≤ i ≤ n
4. First(α) – {ε} ⊆ First(X) if X → A1 … An α
   – and ε ∈ First(Ai) for 1 ≤ i ≤ n

First Sets. Example
• Recall the grammar
    E → T X          X → + E | ε
    T → ( E ) | int Y    Y → * T | ε
• First sets
    First( ( ) = { ( }        First( T ) = { int, ( }
    First( ) ) = { ) }        First( E ) = { int, ( }
    First( int ) = { int }    First( X ) = { +, ε }
    First( + ) = { + }        First( Y ) = { *, ε }
    First( * ) = { * }

Computing Follow Sets
• Definition:
    Follow(X) = { t | S →* β X t δ }
• Intuition
  – If X → A B then First(B) ⊆ Follow(A) (excluding ε) and Follow(X) ⊆ Follow(B)
  – Also if B →* ε then Follow(X) ⊆ Follow(A)
  – If S is the start symbol then $ ∈ Follow(S)
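The First and Follow rules can be run as a fixed-point iteration: keep applying them until no set grows. The following is a minimal sketch for the left-factored grammar, under several assumptions that are not from the slides: sets are stored as bit masks, EPS is a pseudo-symbol standing for ε, END stands for $, and the names Prod, first_of_seq, and compute_first_follow are hypothetical.

```c
#include <stdbool.h>

/* Symbols: non-terminals first, then terminals. END is $, EPS stands for epsilon. */
enum { E, X, T, Y, NUM_NT,
       INT = NUM_NT, TIMES, PLUS, OPEN, CLOSE, END,
       EPS, NUM_SYM };

#define BIT(s) (1u << (s))

typedef struct { int lhs, len, rhs[3]; } Prod;

static const Prod prods[] = {
    { E, 2, { T, X } },            /* E -> T X   */
    { X, 2, { PLUS, E } },         /* X -> + E   */
    { X, 0, { 0 } },               /* X -> eps   */
    { T, 3, { OPEN, E, CLOSE } },  /* T -> ( E ) */
    { T, 2, { INT, Y } },          /* T -> int Y */
    { Y, 2, { TIMES, T } },        /* Y -> * T   */
    { Y, 0, { 0 } },               /* Y -> eps   */
};
enum { NUM_PRODS = sizeof prods / sizeof prods[0] };

unsigned first[NUM_SYM], follow[NUM_NT];   /* sets as bit masks */

/* First of a symbol sequence; has BIT(EPS) iff every symbol can derive epsilon. */
static unsigned first_of_seq(const int *syms, int n) {
    unsigned set = 0;
    for (int i = 0; i < n; i++) {
        set |= first[syms[i]] & ~BIT(EPS);
        if (!(first[syms[i]] & BIT(EPS))) return set;
    }
    return set | BIT(EPS);
}

void compute_first_follow(void) {
    bool changed;
    for (int s = 0; s < NUM_SYM; s++) first[s] = 0;
    for (int s = 0; s < NUM_NT; s++) follow[s] = 0;
    for (int t = NUM_NT; t < EPS; t++) first[t] = BIT(t);  /* First(t) = { t } */
    do {                               /* grow First sets until nothing changes */
        changed = false;
        for (int p = 0; p < NUM_PRODS; p++) {
            unsigned old = first[prods[p].lhs];
            first[prods[p].lhs] |= first_of_seq(prods[p].rhs, prods[p].len);
            changed |= first[prods[p].lhs] != old;
        }
    } while (changed);
    follow[E] = BIT(END);              /* $ follows the start symbol */
    do {                               /* grow Follow sets until nothing changes */
        changed = false;
        for (int p = 0; p < NUM_PRODS; p++) {
            const Prod *pr = &prods[p];
            for (int i = 0; i < pr->len; i++) {
                int a = pr->rhs[i];
                if (a >= NUM_NT) continue;          /* only non-terminals */
                unsigned rest = first_of_seq(pr->rhs + i + 1, pr->len - i - 1);
                unsigned old = follow[a];
                follow[a] |= rest & ~BIT(EPS);      /* First(rest) minus epsilon */
                if (rest & BIT(EPS)) follow[a] |= follow[pr->lhs];
                changed |= follow[a] != old;
            }
        }
    } while (changed);
}
```

After compute_first_follow() runs, first[E] holds the bits for int and (, matching the First sets listed for this grammar, and follow[X] holds the bits for ) and $.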
