
Top-Down Parsing
Adapted from Lecture by Profs. Alex Aiken & George Necula (UCB)
CS780(Prasad) L101TDP

Lecture Outline
• Implementation of parsers
• Two approaches
  – Top-down
  – Bottom-up
• Top-Down
  – Easier to understand and program manually
• Bottom-Up
  – More powerful and used by most parser generators

Intro to Top-Down Parsing
• The parse tree is constructed
  – From the top
  – From left to right
• Terminals are seen in order of appearance in the token stream: t2 t5 t6 t8 t9
[Figure: parse tree with internal nodes 1, 3, 4, 7 and leaf tokens t2, t5, t6, t8, t9]

Recursive Descent Parsing
• Consider the grammar
    E → T + E | T
    T → ( E ) | int | int * T
• Token stream is: int5 * int2
• Start with top-level non-terminal E
• Try the rules for E in order

Recursive Descent Parsing. Example (Cont.)
• Try E0 → T1 + E2
• Then try a rule for T1 → ( E3 )
  – But ( does not match input token int5
• Try T1 → int. Token matches.
  – But + after T1 does not match input token *
• Try T1 → int * T2
  – This will match but + after T1 will be unmatched
• Has exhausted the choices for T1
  – Backtrack to choice for E0

Recursive Descent Parsing. Example (Cont.)
• Try E0 → T1
• Follow same steps as before for T1
  – And succeed with T1 → int * T2 and T2 → int
  – With the following parse tree
    E0
      T1
        int5
        *
        T2
          int2

A Recursive Descent Parser.
Preliminaries
• Let TOKEN be the type of tokens
  – Special tokens INT, OPEN, CLOSE, PLUS, TIMES
• Let the global next point to the next token

A Recursive Descent Parser (2)
• Define boolean functions that check the token string for a match of
  – A given token terminal
      bool term(TOKEN tok) { return *next++ == tok; }
  – A given production of S (the nth)
      bool Sn() { … }
  – Any production of S:
      bool S() { … }
• These functions advance next

A Recursive Descent Parser (3)
• For production E → T + E
    bool E1() { return T() && term(PLUS) && E(); }
• For production E → T
    bool E2() { return T(); }
• For all productions of E (with backtracking)
    bool E() {
      TOKEN *save = next;
      return (next = save, E1())
          || (next = save, E2()); }

A Recursive Descent Parser (4)
• Functions for non-terminal T
    bool T1() { return term(OPEN) && E() && term(CLOSE); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(INT); }
    bool T() {
      TOKEN *save = next;
      return (next = save, T1())
          || (next = save, T2())
          || (next = save, T3()); }

Recursive Descent Parsing. Notes.
• To start the parser
  – Initialize next to point to first token
  – Invoke E()
• Notice how this simulates our previous example.
• Easy to implement by hand
• But does not always work …

When Recursive Descent Does Not Work
• Consider a production S → S a
    bool S1() { return S() && term(a); }
    bool S() { return S1(); }
• S() will get into an infinite loop
• A left-recursive grammar has a non-terminal S
    S →+ S α for some α
• Recursive descent does not work in such cases.
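
The function fragments on the preceding slides can be assembled into one compilable unit. What follows is a minimal sketch in C, not the lecture's own program: the END token, the convention that the token array is terminated by END, and the parse() driver are assumptions added for illustration, while term, E1, E2, T1, T2, T3, E and T follow the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Token kinds named on the slides; END is an assumed end-of-input marker. */
typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;

static const TOKEN *next;            /* global cursor into the token array */

static bool E(void);
static bool T(void);

/* Match one terminal. Always advances, so callers restore next on failure. */
static bool term(TOKEN tok) { return *next++ == tok; }

static bool E1(void) { return T() && term(PLUS) && E(); }   /* E -> T + E */
static bool E2(void) { return T(); }                        /* E -> T     */

/* Try each production of E in order, rewinding the cursor before each try. */
static bool E(void) {
    const TOKEN *save = next;
    return (next = save, E1())
        || (next = save, E2());
}

static bool T1(void) { return term(OPEN) && E() && term(CLOSE); } /* T -> ( E )   */
static bool T2(void) { return term(INT) && term(TIMES) && T(); }  /* T -> int * T */
static bool T3(void) { return term(INT); }                        /* T -> int     */

static bool T(void) {
    const TOKEN *save = next;
    return (next = save, T1())
        || (next = save, T2())
        || (next = save, T3());
}

/* Hypothetical driver: true if the END-terminated array is derivable from E. */
bool parse(const TOKEN *tokens) {
    assert(tokens != NULL);
    next = tokens;
    return E() && *next == END;      /* all input must be consumed */
}
```

Calling parse on the array { INT, TIMES, INT, END } mirrors the int5 * int2 example and is accepted. Because term always advances next, every alternative starts by restoring next from save, which is what makes the backtracking work.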

Elimination of Left Recursion
• Consider the left-recursive grammar
    S → S α | β
• S generates all strings starting with a β and followed by a number of α, that is, βα*
• Can rewrite using right-recursion
    S → β S’
    S’ → α S’ | ε

More Elimination of Left-Recursion
• In general
    S → S α1 | … | S αn | β1 | … | βm
• All strings derived from S start with one of β1,…,βm and continue with several instances of α1,…,αn
• Rewrite as
    S → β1 S’ | … | βm S’
    S’ → α1 S’ | … | αn S’ | ε

General Left Recursion
• The grammar
    S → A α | δ
    A → S β
  is also left-recursive because
    S →+ S β α
• This left-recursion can also be eliminated.
• More examples on the following slides.

(Cf. Gaussian Elimination)
    A → Bb | a
    B → Aa | b
  Substitute A into B:
    A → Bb | a
    B → (Bb | a)a | b
  Expand:
    A → Bb | a
    B → Bba | aa | b
  Eliminate the left recursion in B:
    A → Bb | a
    B → (aa | b)Z | (aa | b)
    Z → baZ | ba

Example: Related to conversion to Greibach Normal Form
  Original grammar:
    A → BC
    B → CA | b
    C → AB | a
  Substituting for A and then B in C:
    C → BCB | a
    C → CACB | bCB | a
  Eliminating left recursion:
    C → (bCB | a)R | bCB | a
    R → ACBR | ACB
  Introducing terminals as first element on RHS:
    C → bCBR | aR | bCB | a
    B → bCBRA | aRA | bCBA | aA | b
    A → bCBRAC | aRAC | bCBAC | aAC | bC
    R → (bCBRAC | ... | bC)(CBR | CB)

Summary of Recursive Descent
• Simple and general parsing strategy
  – Left-recursion must be eliminated first
  – … but that can be done automatically
• Unpopular because of backtracking
  – Thought to be too inefficient
  – Cf. Prolog execution strategy
• In practice, backtracking is eliminated by restricting the grammar
  – To enable “look-before-you-leap” strategy

Predictive Parsers
• Like recursive-descent but parser can “predict” which production to use.
  – By looking at the next few tokens.
  – No backtracking.
• LL(k) grammars
• LR(k) grammars
  – L means “left-to-right” scan of input
  – R means “rightmost derivation”
  – k means “predict based on k tokens of lookahead”
• RL(1) grammars
  – R means “right-to-left” scan of input
• LR(0), LR(1) grammars
• SLR(1) grammars, LALR(1) grammars
• Predictive parsers accept LL(k) grammars.
  – L means “left-to-right” scan of input.
  – L means “leftmost derivation”.
  – k means “predict based on k tokens of lookahead”.
• In practice, LL(1) is used.

LL(1) Languages
• In recursive-descent, for each non-terminal and input token there may be a choice of production.
• LL(1) means that for each non-terminal and token there is only one production.
• Can be specified via 2D tables.
  – One dimension for current non-terminal to expand.
  – One dimension for next token.
  – A table entry contains one production.

Predictive Parsing and Left Factoring
• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )
• Hard to predict because
  – For T, two productions start with int.
  – For E, it is not clear how to predict.
• A grammar must be left-factored before use for predictive parsing.

Left-Factoring Example
• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )
• Factor out common prefixes of productions, possibly introducing ε-productions
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε

LL(1) Parsing Table Example
• Left-factored grammar
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
• The LL(1) parsing table:
          int      *      +      (        )      $
    E     T X                    T X
    X                     + E             ε      ε
    T     int Y                  ( E )
    Y              * T    ε               ε      ε

LL(1) Parsing Tables. Errors
• Blank entries indicate error situations
  – Consider the [E,*] entry
  – “There is no way to derive a string starting with * from non-terminal E”

LL(1) Parsing Table Example (Cont.)
• Consider the [E, int] entry
  – “When current non-terminal is E and next input is int, use production E → T X.”
  – This production can generate an int in the first place.
• Consider the [Y,+] entry
  – “When current non-terminal is Y and current token is +, get rid of Y”.
  – Y can be followed by + only in a derivation in which Y → ε.

Using Parsing Tables
• Method similar to recursive descent, except
  – For each non-terminal X
  – We look at the next token t
  – And choose the production shown at [X,t]
• We use a stack to keep track of pending non-terminals.
• We reject when we encounter an error state.
• We accept when we encounter end-of-input.

LL(1) Parsing Algorithm
    initialize stack = <S $> and next
    repeat
      case stack of
        <X, rest> : if T[X,*next] = Y1…Yn
                      then stack ← <Y1…Yn, rest>;
                      else error ();
        <t, rest> : if t == *next++
                      then stack ← <rest>;
                      else error ();
    until stack == < >

LL(1) Parsing Example
    Stack          Input          Action
    E $            int * int $    T X
    T X $          int * int $    int Y
    int Y X $      int * int $    terminal
    Y X $          * int $        * T
    * T X $        * int $        terminal
    T X $          int $          int Y
    int Y X $      int $          terminal
    Y X $          $              ε
    X $            $              ε
    $              $              ACCEPT

Constructing Parsing Tables
• LL(1) languages are those defined by a parsing table for the LL(1) algorithm.
• No table entry can be multiply defined.
• We want to generate parsing tables from CFG.

Constructing Parsing Tables (Cont.)
• If A → α, where in the line of A do we place α?
• In the column of t where t can start a string derived from α.
  – α →* t β
  – We say that t ∈ First(α).
• In the column of t if α is or derives ε and t can follow an A.
  – S →* β A t δ
  – We say t ∈ Follow(A).

Computing First Sets
Definition
    First(X) = { t | X →* tα } ∪ { ε | X →* ε }
Algorithm sketch:
1. First(t) = { t }
2. ε ∈ First(X) if X → ε is a production
3. ε ∈ First(X) if X → A1 … An
   – and ε ∈ First(Ai) for 1 ≤ i ≤ n
4. First(α) − {ε} ⊆ First(X) if X → A1 … An α
   – and ε ∈ First(Ai) for 1 ≤ i ≤ n

First Sets.
Example
• Recall the grammar
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε
• First sets
    First( ( ) = { ( }        First( T ) = { int, ( }
    First( ) ) = { ) }        First( E ) = { int, ( }
    First( int ) = { int }    First( X ) = { +, ε }
    First( + ) = { + }        First( Y ) = { *, ε }
    First( * ) = { * }

Computing Follow Sets
• Definition:
    Follow(X) = { t | S →* β X t δ }
• Intuition
  – If X → A B then First(B) − {ε} ⊆ Follow(A) and Follow(X) ⊆ Follow(B)
  – Also if B →* ε then Follow(X) ⊆ Follow(A)
  – If S is the start symbol then $ ∈ Follow(S)

Computing Follow Sets (Cont.)

Follow Sets.
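
The First and Follow rules above lend themselves to a fixed-point computation: apply every rule repeatedly until no set grows. The following is a minimal sketch in C for the left-factored grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. The bitmask encoding, the symbol names (T_INT, N_E, and so on), the Production struct and the function names are all assumptions made for illustration; they do not come from the lecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed encoding: terminals, then $ (T_END) and epsilon (T_EPS), then non-terminals. */
enum { T_INT, T_STAR, T_PLUS, T_OPEN, T_CLOSE, T_END, T_EPS,
       N_E, N_X, N_T, N_Y, NSYM };
#define BIT(s) (1u << (s))

typedef struct { int lhs; int len; int rhs[3]; } Production;

static const Production grammar[] = {
    { N_E, 2, { N_T, N_X } },              /* E -> T X     */
    { N_X, 2, { T_PLUS, N_E } },           /* X -> + E     */
    { N_X, 0, { 0 } },                     /* X -> epsilon */
    { N_T, 3, { T_OPEN, N_E, T_CLOSE } },  /* T -> ( E )   */
    { N_T, 2, { T_INT, N_Y } },            /* T -> int Y   */
    { N_Y, 2, { T_STAR, N_T } },           /* Y -> * T     */
    { N_Y, 0, { 0 } },                     /* Y -> epsilon */
};
#define NPROD (sizeof grammar / sizeof grammar[0])

unsigned first[NSYM], follow[NSYM];        /* one bit per symbol */

/* First of rhs[from..len); contains T_EPS only if every symbol can derive epsilon. */
static unsigned first_of_string(const int *rhs, int from, int len) {
    unsigned set = 0;
    for (int i = from; i < len; i++) {
        set |= first[rhs[i]] & ~BIT(T_EPS);
        if (!(first[rhs[i]] & BIT(T_EPS))) return set;
    }
    return set | BIT(T_EPS);
}

void compute_first_follow(void) {
    assert(NSYM <= 16);                    /* every symbol must fit in an unsigned mask */
    for (int s = 0; s < NSYM; s++) { first[s] = 0; follow[s] = 0; }
    for (int t = T_INT; t <= T_CLOSE; t++) first[t] = BIT(t);  /* First(t) = { t } */
    follow[N_E] = BIT(T_END);              /* $ is in Follow(start symbol) */

    bool changed = true;
    while (changed) {                      /* iterate until no set grows */
        changed = false;
        for (size_t p = 0; p < NPROD; p++) {
            const Production *pr = &grammar[p];
            unsigned f = first[pr->lhs] | first_of_string(pr->rhs, 0, pr->len);
            if (f != first[pr->lhs]) { first[pr->lhs] = f; changed = true; }
            for (int i = 0; i < pr->len; i++) {
                int b = pr->rhs[i];
                if (b < N_E) continue;     /* Follow is tracked for non-terminals only */
                unsigned rest = first_of_string(pr->rhs, i + 1, pr->len);
                unsigned g = follow[b] | (rest & ~BIT(T_EPS));
                if (rest & BIT(T_EPS)) g |= follow[pr->lhs];   /* rest can vanish */
                if (g != follow[b]) { follow[b] = g; changed = true; }
            }
        }
    }
}
```

After compute_first_follow() returns, first[N_X] holds the bits for + and ε, matching First( X ) = { +, ε } in the example above. Interleaving the First and Follow updates in one loop is a simplification: every update only adds elements, so the loop still ends at the same fixed point as computing First completely before Follow.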