CS 335: Top-Down Parsing Swarnendu Biswas

Semester 2019-2020-II
CSE, IIT Kanpur

Content influenced by many excellent references, see References slide for acknowledgements.

Example Expression Grammar

Start  → Expr
Expr   → Expr + Term | Expr − Term | Term
Term   → Term × Factor | Term ÷ Factor | Factor
Factor → ( Expr ) | num | name

Priority increases from Expr (lowest) to Factor (highest).

Derivation of name + name × name

(↑ marks the parser's current position in the input.)

Sentential Form              Input
Expr                         ↑ name + name × name
Expr + Term                  ↑ name + name × name
Term + Term                  ↑ name + name × name
Factor + Term                ↑ name + name × name
name + Term                  ↑ name + name × name
name + Term                  name ↑ + name × name
name + Term                  name + ↑ name × name
name + Term × Factor         name + ↑ name × name
name + Factor × Factor       name + ↑ name × name
name + name × Factor         name + ↑ name × name
name + name × Factor         name + name ↑ × name
name + name × Factor         name + name × ↑ name
name + name × name           name + name × ↑ name
name + name × name           name + name × name ↑

The current input terminal being scanned is called the lookahead symbol.

[Figure: parse tree for name + name × name, grown step by step from Start by leftmost (lm) derivation steps]
General Idea of Top-down Parsing

Start with the root (start symbol) of the parse tree
Grow the tree downwards by expanding productions at the lower levels of the tree
• Select a nonterminal and extend it by adding children corresponding to the right side of some production for the nonterminal
Repeat till
• Lower fringe consists only of terminals and the input is consumed
• Mismatch in the lower fringe and the remaining input stream
  • Selection of a production may involve trial-and-error
  • Wrong choice of productions while expanding nonterminals
  • Input character stream is not part of the language
Top-down parsing basically finds a leftmost derivation for an input string

Leftmost Top-down Parsing Algorithm

root = node for Start symbol
curr = root
push(null)  // Stack
word = nextWord()
while (true):
    if curr ∈ Nonterminal:
        pick next rule A → β1 β2 … βn to expand curr
        create nodes for β1, β2, …, βn as children of curr
        push(βn, βn−1, …, β2)   // β1 is not pushed: it becomes curr
        curr = β1
    else if curr == word:
        word = nextWord()
        curr = pop()
    else if word == eof and curr == null:
        accept input
    else:
        backtrack

Implementing Backtracking

• Extend the previous algorithm to backtrack
• Set curr to parent and delete the children
• Expand the node curr with untried rules if any
  • Create child nodes for each symbol in the right-hand side of the production
  • Push those symbols onto the stack in reverse order
  • Set curr to the first child node
• Move curr up the tree if there are no untried rules
• Report a syntax error when there are no more moves

Example of Top-down Parsing

Rule #  Production
0       Start → Expr
1       Expr → Expr + Term
2       Expr → Expr − Term
3       Expr → Term
4       Term → Term × Factor
5       Term → Term ÷ Factor
6       Term → Factor
7       Factor → ( Expr )
8       Factor → num
9       Factor → name

Rows without a rule number advance the input past a matched terminal.

Rule #  Sentential Form              Input
        Expr                         ↑ name + name × name
1       Expr + Term                  ↑ name + name × name
3       Term + Term                  ↑ name + name × name
6       Factor + Term                ↑ name + name × name
9       name + Term                  ↑ name + name × name
        name + Term                  name ↑ + name × name
        name + Term                  name + ↑ name × name
4       name + Term × Factor         name + ↑ name × name
6       name + Factor × Factor       name + ↑ name × name
9       name + name × Factor         name + ↑ name × name
        name + name × Factor         name + name ↑ × name
        name + name × Factor         name + name × ↑ name
9       name + name × name           name + name × ↑ name
        name + name × name           name + name × name ↑

How does a top-down parser choose which rule to apply?

Example of Top-down Parsing

With rule 1 chosen at every step, the parser never consumes any input:

Rule #  Sentential Form              Input
        Expr                         ↑ name + name × name
1       Expr + Term                  ↑ name + name × name
1       Expr + Term + Term           ↑ name + name × name
1       Expr + Term + Term + ⋯       ↑ name + name × name
1       …                            ↑ name + name × name
1       …                            ↑ name + name × name

A top-down parser can loop indefinitely with a left-recursive grammar.

Left Recursion

• A grammar is left-recursive if it has a nonterminal A such that there is a derivation A ⇒⁺ Aα for some string α
• Direct left recursion: There is a production of the form A → Aα
• Indirect left recursion: First symbol on the right-hand side of a rule can derive the symbol on the left
We can often reformulate a grammar to avoid left recursion

Remove Left Recursion

A  → Aα1 | Aα2 | … | Aαm | β1 | … | βn

is rewritten as

A  → β1A′ | β2A′ | … | βnA′
A′ → α1A′ | α2A′ | … | αmA′ | ε

Remove Left Recursion

Original             Without left recursion
E → E + T | T        E  → T E′
                     E′ → + T E′ | ε
T → T ∗ F | F        T  → F T′
                     T′ → ∗ F T′ | ε
F → ( E ) | id       F  → ( E ) | id

Non-Left-Recursive Expression Grammar

Rule #  Original production       Rule #  Non-left-recursive production
0       Start → Expr              0       Start → Expr
1       Expr → Expr + Term        1       Expr → Term Expr′
2       Expr → Expr − Term        2       Expr′ → + Term Expr′
3       Expr → Term               3       Expr′ → − Term Expr′
4       Term → Term × Factor      4       Expr′ → ε
5       Term → Term ÷ Factor      5       Term → Factor Term′
6       Term → Factor             6       Term′ → × Factor Term′
7       Factor → ( Expr )         7       Term′ → ÷ Factor Term′
8       Factor → num              8       Term′ → ε
9       Factor → name             9       Factor → ( Expr )
                                  10      Factor → num
                                  11      Factor → name
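The rewrite rule for direct left recursion can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the course: the dictionary encoding of the grammar, the function name `remove_immediate_left_recursion`, the ASCII spellings `-`, `*`, `/` for the operators, and the convention of naming the fresh nonterminal by appending `'` are all assumptions made here. An empty list stands for ε, and the sketch assumes the grammar has no production of the form A → A.

```python
from typing import Dict, List

# Assumed encoding: nonterminal -> list of alternatives, each a list of symbols.
Grammar = Dict[str, List[List[str]]]


def remove_immediate_left_recursion(grammar: Grammar) -> Grammar:
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
    A -> b1 A' | ... | bn A'  and  A' -> a1 A' | ... | am A' | ε."""
    result: Grammar = {}
    for head, alternatives in grammar.items():
        # alphas: what follows the leading A in each left-recursive alternative.
        alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
        # betas: the alternatives that do not start with A.
        betas = [alt for alt in alternatives if not alt or alt[0] != head]
        if not alphas:
            # No direct left recursion: copy the productions unchanged.
            result[head] = [list(alt) for alt in alternatives]
            continue
        prime = head + "'"  # hypothetical name for the fresh nonterminal
        result[head] = [beta + [prime] for beta in betas]
        result[prime] = [alpha + [prime] for alpha in alphas] + [[]]  # [] is ε
    return result


expression_grammar: Grammar = {
    "Start": [["Expr"]],
    "Expr": [["Expr", "+", "Term"], ["Expr", "-", "Term"], ["Term"]],
    "Term": [["Term", "*", "Factor"], ["Term", "/", "Factor"], ["Factor"]],
    "Factor": [["(", "Expr", ")"], ["num"], ["name"]],
}

transformed = remove_immediate_left_recursion(expression_grammar)
for head, alternatives in transformed.items():
    for alt in alternatives:
        print(head, "→", " ".join(alt) if alt else "ε")
```

Run on the left-recursive expression grammar, the sketch produces the same twelve productions as the non-left-recursive table above, in the same order. It handles direct left recursion only; indirect left recursion needs the substitution step described next.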
Indirect Left Recursion

S → Aa | b
A → Ac | Sd | ε

• There is a left recursion because S ⇒ Aa ⇒ Sda

Eliminating Left Recursion

• Input: Grammar G with no cycles or ε-productions
• Algorithm

Arrange nonterminals in some order A1, A2, …, An
for i ← 1 … n
    for j ← 1 to i − 1
        if ∃ a production Ai → Ajγ
            Replace Ai → Ajγ with one or more productions that expand Aj
    Eliminate the immediate left recursion among the Ai productions

Loop invariant at the start of outer iteration i:
∀ k < i, no production expanding Ak has Al as the leftmost symbol of its right-hand side, for any l < k

Eliminating Indirect Left Recursion

S → Aa | b
A → Ac | Sd | ε

Substituting the S-productions into A → Sd gives A → Ac | Aad | bd | ε. Eliminating the immediate left recursion among the A-productions then yields

S  → Aa | b
A  → bdA′ | A′
A′ → cA′ | adA′ | ε

Cost of Backtracking

Backtracking is expensive
• Parser expands a nonterminal with the wrong rule
• Mismatch between the lower fringe of the parse tree and the input is detected
• Parser undoes the last few actions
• Parser tries other productions if any

Avoid Backtracking

• Parser is to select the next rule
  • Compare the curr symbol and the next input symbol, called the lookahead
  • Use the lookahead to disambiguate the possible production rules
• A backtrack-free grammar is a CFG for which the leftmost, top-down parser can always predict the correct rule with one word of lookahead
  • Also called a predictive grammar

FIRST Set

• Intuition
  • Each alternative for the leftmost nonterminal leads to a distinct terminal symbol
  • Which rule to choose becomes obvious by comparing the next word in the input stream
• Given a string γ of terminal and nonterminal symbols, FIRST(γ) is the set of all terminal symbols that can begin any string derived from γ
• We also need to keep track of which symbols can produce the empty string
• FIRST: (NT ∪ T ∪ {ε, EOF}) → (T ∪ {ε, EOF})

Steps to Compute FIRST Set

1.
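To make the FIRST definition concrete, here is a minimal Python sketch of one common textbook formulation, a fixed-point iteration. It is not necessarily the procedure these slides go on to give. The grammar encoding, the helper names `first_of_string` and `compute_first`, and the ASCII operator spellings are assumptions made for illustration; an empty list stands for an ε alternative.

```python
from typing import Dict, List, Set

EPSILON = "ε"
EOF = "EOF"

# Non-left-recursive expression grammar (assumed encoding; [] is ε).
GRAMMAR: Dict[str, List[List[str]]] = {
    "Start": [["Expr"]],
    "Expr": [["Term", "Expr'"]],
    "Expr'": [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term": [["Factor", "Term'"]],
    "Term'": [["*", "Factor", "Term'"], ["/", "Factor", "Term'"], []],
    "Factor": [["(", "Expr", ")"], ["num"], ["name"]],
}


def first_of_string(symbols: List[str], first: Dict[str, Set[str]]) -> Set[str]:
    """FIRST of a symbol string, given FIRST of every individual symbol."""
    result: Set[str] = set()
    for symbol in symbols:
        result |= first[symbol] - {EPSILON}
        if EPSILON not in first[symbol]:
            return result  # this symbol cannot vanish, so stop here
    result.add(EPSILON)  # every symbol (or the empty string) can derive ε
    return result


def compute_first(grammar: Dict[str, List[List[str]]]) -> Dict[str, Set[str]]:
    """Grow the FIRST sets until no set changes (a fixed point)."""
    terminals = {s for alts in grammar.values() for alt in alts for s in alt}
    terminals -= set(grammar)
    # FIRST of a terminal, ε, or EOF is the symbol itself.
    first: Dict[str, Set[str]] = {t: {t} for t in terminals | {EPSILON, EOF}}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


first = compute_first(GRAMMAR)
for nonterminal in GRAMMAR:
    print(f"FIRST({nonterminal}) = {sorted(first[nonterminal])}")
```

For this grammar the two non-ε alternatives of Expr′ begin with the distinct terminals + and -, and likewise for Term′, so a single word of lookahead is enough to choose between them.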
