
CS 335: Top-down Parsing
Swarnendu Biswas
Semester 2019-2020-II
CSE, IIT Kanpur
Content influenced by many excellent references, see References slide for acknowledgements.

Example Expression Grammar

Start  → Expr
Expr   → Expr + Term | Expr − Term | Term
Term   → Term × Factor | Term ÷ Factor | Factor
Factor → ( Expr ) | num | name

Priority increases from top to bottom: × and ÷ bind more tightly than + and −.

Derivation of name + name × name

The ↑ marks the parser's position in the input. The current input terminal being scanned is called the lookahead symbol.

Sentential Form            Input
Expr                       ↑ name + name × name
Expr + Term                ↑ name + name × name
Term + Term                ↑ name + name × name
Factor + Term              ↑ name + name × name
name + Term                ↑ name + name × name
name + Term                name ↑ + name × name
name + Term                name + ↑ name × name
name + Term × Factor       name + ↑ name × name
name + Factor × Factor     name + ↑ name × name
name + name × Factor       name + ↑ name × name
name + name × Factor       name + name ↑ × name
name + name × Factor       name + name × ↑ name
name + name × name         name + name × ↑ name
name + name × name         name + name × name ↑

[Figure: the parse tree for name + name × name grown step by step, one leftmost (lm) derivation step at a time, from Start down to the name leaves.]
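The leftmost derivation of name + name × name can be replayed mechanically. The sketch below is illustrative only: it assumes the productions of the expression grammar are numbered 0 to 9 from top to bottom, and the names RULES, NONTERMINALS and leftmost_derivation are hypothetical helpers, not part of the course material. At each step it rewrites the leftmost nonterminal of the current sentential form.

```python
# Productions of the expression grammar, numbered top to bottom (assumed numbering).
RULES = {
    0: ("Start", ["Expr"]),
    1: ("Expr", ["Expr", "+", "Term"]),
    2: ("Expr", ["Expr", "-", "Term"]),
    3: ("Expr", ["Term"]),
    4: ("Term", ["Term", "×", "Factor"]),
    5: ("Term", ["Term", "÷", "Factor"]),
    6: ("Term", ["Factor"]),
    7: ("Factor", ["(", "Expr", ")"]),
    8: ("Factor", ["num"]),
    9: ("Factor", ["name"]),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def leftmost_derivation(start, rule_sequence):
    """Apply each rule to the leftmost nonterminal; return every sentential form."""
    form = [start]
    forms = [list(form)]
    for number in rule_sequence:
        lhs, rhs = RULES[number]
        # Find the position of the leftmost nonterminal, if any is left.
        index = next((i for i, sym in enumerate(form) if sym in NONTERMINALS), None)
        if index is None or form[index] != lhs:
            raise ValueError(f"rule {number} cannot be applied to {' '.join(form)}")
        form = form[:index] + rhs + form[index + 1:]
        forms.append(list(form))
    return forms


# Rule sequence for name + name × name, starting from Expr.
steps = leftmost_derivation("Expr", [1, 3, 6, 9, 4, 6, 9, 9])
for form in steps:
    print(" ".join(form))
```

The printed forms are the distinct sentential forms of the derivation. The steps that only advance the input pointer past a matched terminal do not change the sentential form and so do not appear.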
General Idea of Top-down Parsing

Start with the root (start symbol) of the parse tree.
Grow the tree downwards by expanding productions at the lower levels of the tree
• Select a nonterminal and extend it by adding children corresponding to the right side of some production for the nonterminal
Repeat until either
• The lower fringe consists only of terminals and the input is consumed
• There is a mismatch between the lower fringe and the remaining input stream
A mismatch has two possible causes
• Wrong choice of production while expanding a nonterminal (selecting a production may involve trial and error)
• The input character stream is not part of the language
Top-down parsing finds a leftmost derivation for an input string.

Leftmost Top-down Parsing Algorithm

root = node for Start symbol
curr = root
push(null) // Stack
word = nextWord()
while (true):
    if curr ∈ Nonterminal:
        pick next rule A → β1β2…βn to expand curr
        create nodes for β1, β2, …, βn as children of curr
        push(βn, βn−1, …, β2)
        curr = β1
    else if curr == word:
        word = nextWord()
        curr = pop()
    else if word == eof and curr == null:
        accept input
    else:
        backtrack

Implementing Backtracking

• Extend the previous algorithm to backtrack
• Set curr to its parent and delete the children
• Expand the node curr with untried rules, if any
  • Create child nodes for each symbol in the right-hand side of the production
  • Push those symbols, except the first, onto the stack in reverse order
  • Set curr to the first child node
• Move curr up the tree if there are no untried rules
• Report a syntax error when there are no more moves

Example of Top-down Parsing

Rule #  Production
0       Start → Expr
1       Expr → Expr + Term
2       Expr → Expr − Term
3       Expr → Term
4       Term → Term × Factor
5       Term → Term ÷ Factor
6       Term → Factor
7       Factor → ( Expr )
8       Factor → num
9       Factor → name

Rule #  Sentential Form            Input
        Expr                       ↑ name + name × name
1       Expr + Term                ↑ name + name × name
3       Term + Term                ↑ name + name × name
6       Factor + Term              ↑ name + name × name
9       name + Term                ↑ name + name × name
        name + Term                name ↑ + name × name
        name + Term                name + ↑ name × name
4       name + Term × Factor       name + ↑ name × name
6       name + Factor × Factor     name + ↑ name × name
9       name + name × Factor       name + ↑ name × name
        name + name × Factor       name + name ↑ × name
        name + name × Factor       name + name × ↑ name
9       name + name × name         name + name × ↑ name
        name + name × name         name + name × name ↑

Rows without a rule number advance the input past a matched terminal.

How does a top-down parser choose which rule to apply?

If the parser always tries rule 1 first, the derivation never consumes any input:

Rule #  Sentential Form            Input
        Expr                       ↑ name + name × name
1       Expr + Term                ↑ name + name × name
1       Expr + Term + Term         ↑ name + name × name
1       Expr + Term + Term + ⋯     ↑ name + name × name
1       …                          ↑ name + name × name
1       …                          ↑ name + name × name

A top-down parser can loop indefinitely with a left-recursive grammar.

Left Recursion

• A grammar is left-recursive if it has a nonterminal A such that there is a derivation A ⇒⁺ Aα for some string α
• Direct left recursion: There is a production of the form A → Aα
• Indirect left recursion: The first symbol on the right-hand side of a rule can derive the symbol on the left
We can often reformulate a grammar to avoid left recursion.

Remove Left Recursion

A → Aα1 | Aα2 | … | Aαm | β1 | … | βn

is replaced by

A  → β1A′ | β2A′ | … | βnA′
A′ → α1A′ | α2A′ | … | αmA′ | ε

Example:

Left-recursive             Right-recursive
E → E + T | T              E  → TE′
T → T ∗ F | F              E′ → +TE′ | ε
F → ( E ) | id             T  → FT′
                           T′ → ∗FT′ | ε
                           F  → ( E ) | id

Non-Left-Recursive Expression Grammar

Original grammar:

Rule #  Production
0       Start → Expr
1       Expr → Expr + Term
2       Expr → Expr − Term
3       Expr → Term
4       Term → Term × Factor
5       Term → Term ÷ Factor
6       Term → Factor
7       Factor → ( Expr )
8       Factor → num
9       Factor → name

After removing left recursion:

Rule #  Production
0       Start → Expr
1       Expr → Term Expr′
2       Expr′ → + Term Expr′
3       Expr′ → − Term Expr′
4       Expr′ → ε
5       Term → Factor Term′
6       Term′ → × Factor Term′
7       Term′ → ÷ Factor Term′
8       Term′ → ε
9       Factor → ( Expr )
10      Factor → num
11      Factor → name
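The rewrite rule for direct left recursion is mechanical enough to code up. The following is a minimal sketch, assuming a grammar is stored as a dictionary from each nonterminal to a list of alternatives, each alternative being a list of symbols, with the empty list standing for ε. The function name and the convention of naming the new nonterminal by appending an apostrophe are assumptions made for illustration.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without direct left recursion.

    Returns a dict with the new alternatives for A and, if needed, for A'.
    Assumes there is no alternative A -> A (a cycle) and that the
    name A' is not already used in the grammar.
    """
    # Alternatives of the form A alpha: keep only the alpha part.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # Alternatives beta that do not start with A (including epsilon).
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    fresh = nonterminal + "'"  # hypothetical naming scheme for A'
    return {
        nonterminal: [beta + [fresh] for beta in others],
        fresh: [alpha + [fresh] for alpha in recursive] + [[]],  # [] is epsilon
    }


grammar = {
    "Expr": [["Expr", "+", "Term"], ["Expr", "-", "Term"], ["Term"]],
    "Term": [["Term", "×", "Factor"], ["Term", "÷", "Factor"], ["Factor"]],
    "Factor": [["(", "Expr", ")"], ["num"], ["name"]],
}

result = {}
for head, alts in grammar.items():
    result.update(eliminate_immediate_left_recursion(head, alts))

for head, alts in result.items():
    print(head, "->", " | ".join(" ".join(alt) if alt else "ε" for alt in alts))
```

Applied to the expression grammar, the sketch yields the same productions as the non-left-recursive grammar above, with Expr' and Term' playing the roles of Expr′ and Term′.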
Indirect Left Recursion

S → Aa | b
A → Ac | Sd | ε
• There is a left recursion because S ⇒ Aa ⇒ Sda

Eliminating Left Recursion

• Input: Grammar G with no cycles or ε-productions
• Algorithm
    Arrange nonterminals in some order A1, A2, …, An
    for i ← 1 to n
        for j ← 1 to i − 1
            if ∃ a production Ai → Ajγ
                Replace Ai → Ajγ with Ai → δ1γ | δ2γ | … | δkγ, where Aj → δ1 | δ2 | … | δk are all the current productions that expand Aj
        Eliminate the immediate left recursion among the Ai productions
• Loop invariant at the start of outer iteration i
    ∀k < i, no production expanding Ak has Al as the leftmost symbol of its right-hand side, for any l < k

Eliminating Indirect Left Recursion

Original grammar:
S → Aa | b
A → Ac | Sd | ε

Order the nonterminals as S, A. S has no immediate left recursion. Substituting the S productions into A → Sd gives
A → Ac | Aad | bd | ε
and eliminating the immediate left recursion among the A productions gives

S  → Aa | b
A  → bdA′ | A′
A′ → cA′ | adA′ | ε

Strictly, the algorithm requires a grammar without ε-productions; in this example the production A → ε turns out to be harmless.

Cost of Backtracking

Backtracking is expensive
• Parser expands a nonterminal with the wrong rule
• Mismatch between the lower fringe of the parse tree and the input is detected
• Parser undoes the last few actions
• Parser tries other productions, if any

Avoid Backtracking

• The parser has to select the next rule
  • Compare the curr symbol with the next input symbol, called the lookahead
  • Use the lookahead to disambiguate the possible production rules
• A backtrack-free grammar is a CFG for which the leftmost, top-down parser can always predict the correct rule with one word of lookahead
  • Also called a predictive grammar

FIRST Set

• Intuition
  • Each alternative for the leftmost nonterminal leads to a distinct terminal symbol
  • Which rule to choose becomes obvious by comparing the next word in the input stream
• Given a string γ of terminal and nonterminal symbols, FIRST(γ) is the set of all terminal symbols that can begin any string derived from γ
• We also need to keep track of which symbols can produce the empty string
• FIRST: (NT ∪ T ∪ {ε, EOF}) → subsets of (T ∪ {ε, EOF})

Steps to Compute FIRST Set
1.
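The definition of FIRST can be illustrated with a small fixed-point computation. The sketch below is not taken from the slides; it is one common textbook formulation, under the assumptions that the grammar is a dictionary from nonterminals to lists of alternatives, that the empty list stands for ε, and that the names EPSILON and compute_first are hypothetical. It repeatedly propagates FIRST sets through the productions until nothing changes.

```python
EPSILON = "ε"


def compute_first(grammar, terminals):
    """Compute FIRST for every terminal and nonterminal by fixed-point iteration."""
    first = {t: {t} for t in terminals}  # FIRST of a terminal is itself
    first[EPSILON] = {EPSILON}
    for nonterminal in grammar:
        first[nonterminal] = set()
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for rhs in alternatives:
                # Scan the right-hand side while the prefix can derive epsilon.
                rhs_first = set()
                all_nullable = True
                for symbol in rhs:
                    rhs_first |= first[symbol] - {EPSILON}
                    if EPSILON not in first[symbol]:
                        all_nullable = False
                        break
                if all_nullable:  # also true for the empty right-hand side
                    rhs_first.add(EPSILON)
                if not rhs_first <= first[head]:
                    first[head] |= rhs_first
                    changed = True
    return first


# The non-left-recursive expression grammar; [] stands for epsilon.
grammar = {
    "Start": [["Expr"]],
    "Expr": [["Term", "Expr'"]],
    "Expr'": [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term": [["Factor", "Term'"]],
    "Term'": [["×", "Factor", "Term'"], ["÷", "Factor", "Term'"], []],
    "Factor": [["(", "Expr", ")"], ["num"], ["name"]],
}
terminals = {"+", "-", "×", "÷", "(", ")", "num", "name"}

first = compute_first(grammar, terminals)
for symbol in grammar:
    print(symbol, sorted(first[symbol]))
```

For this grammar the alternatives of Expr' begin with +, − or nothing, and those of Factor begin with (, num or name, so one word of lookahead is enough to pick among the alternatives that begin with a terminal.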