CS 335: Top-down Parsing Swarnendu Biswas
Semester 2019-2020-II CSE, IIT Kanpur
Content influenced by many excellent references; see the References slide for acknowledgements.
Example Expression Grammar
Start → Expr
Expr → Expr + Term | Expr − Term | Term
Term → Term × Factor | Term ÷ Factor | Factor
Factor → ( Expr ) | num | name
Operator priority increases down the grammar: × and ÷ bind tighter than + and −.
Derivation of name + name × name
Sentential Form           Input
Expr                      ↑ name + name × name
Expr + Term               ↑ name + name × name
Term + Term               ↑ name + name × name
Factor + Term             ↑ name + name × name
name + Term               ↑ name + name × name
name + Term               name ↑ + name × name
name + Term               name + ↑ name × name
name + Term × Factor      name + ↑ name × name
name + Factor × Factor    name + ↑ name × name
name + name × Factor      name + ↑ name × name
name + name × Factor      name + name ↑ × name
name + name × Factor      name + name × ↑ name
name + name × name        name + name × ↑ name
name + name × name        name + name × name ↑
The current input terminal being scanned (marked ↑) is called the lookahead symbol.
Derivation of name + name × name
[Figure: the parse tree for name + name × name, grown one leftmost step (⇒lm) at a time from Start: Start → Expr, Expr → Expr + Term, the left Expr → Term → Factor → name, and the right Term → Term × Factor with Term → Factor → name and Factor → name.]
General Idea of Top-down Parsing
Start with the root (start symbol) of the parse tree
Grow the tree downward by expanding productions at the lower levels of the tree
• Select a nonterminal and extend it by adding children corresponding to the right-hand side of some production for that nonterminal
Repeat until either
• the lower fringe consists only of terminals and the input is consumed, or
• there is a mismatch between the lower fringe and the remaining input stream, caused by
  • a wrong choice of production while expanding a nonterminal, or
  • an input character stream that is not part of the language
Selecting a production may therefore involve trial and error.
In effect, top-down parsing finds a leftmost derivation for an input string.
Leftmost Top-down Parsing Algorithm
root = node for Start symbol
curr = root
push(null)                  // stack
word = nextWord()
while (true):
    if curr ∈ Nonterminal:
        pick next rule A ⟶ β1β2…βn to expand curr
        create nodes for β1, β2, …, βn as children of curr
        push(βn, βn−1, …, β2)   // β1 becomes curr, so it is not pushed
        curr = β1
    else if curr == word:
        word = nextWord()
        curr = pop()
    else if word == eof and curr == null:
        accept input
    else:
        backtrack
Implementing Backtracking
• Extend the previous algorithm to backtrack:
  • Set curr to its parent and delete the children
  • Expand the node curr with untried rules, if any
    • Create child nodes for each symbol on the right-hand side of the production
    • Push those symbols, except the first, onto the stack in reverse order
    • Set curr to the first child node
  • Move curr up the tree if there are no untried rules
  • Report a syntax error when there are no more moves
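The backtracking steps above can be sketched in Python as a depth-first search over rule choices. This is a minimal sketch under stated assumptions, not the slides' algorithm verbatim: it uses recursion in place of the explicit stack, the names GRAMMAR and parse are hypothetical, × and ÷ are written as * and /, and it uses the non-left-recursive form of the expression grammar presented later in these notes, because on the left-recursive grammar the search never terminates.

```python
# Non-left-recursive expression grammar; × and ÷ are written as * and /, [] is ε
GRAMMAR = {
    "Expr":   [["Term", "Expr'"]],
    "Expr'":  [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term":   [["Factor", "Term'"]],
    "Term'":  [["*", "Factor", "Term'"], ["/", "Factor", "Term'"], []],
    "Factor": [["(", "Expr", ")"], ["num"], ["name"]],
}


def parse(tokens, start="Expr", grammar=GRAMMAR):
    """Return the leftmost derivation as a list of (lhs, rhs) pairs, or None on failure."""

    def search(stack, pos, steps):
        # stack: unmatched part of the sentential form, leftmost symbol first
        if not stack:
            return steps if pos == len(tokens) else None  # accept only if input consumed
        top, rest = stack[0], stack[1:]
        if top in grammar:
            # Nonterminal: expand with each rule in order; a failed rule is undone
            for rhs in grammar[top]:
                result = search(rhs + rest, pos, steps + [(top, tuple(rhs))])
                if result is not None:
                    return result
            return None  # no untried rules left: backtrack to the caller
        if pos < len(tokens) and tokens[pos] == top:
            return search(rest, pos + 1, steps)  # terminal matches the lookahead
        return None  # mismatch between the lower fringe and the input: backtrack

    return search([start], 0, [])
```

For name + name * name the returned list holds only the rules on the successful path; rules that were tried and undone are discarded. In the worst case the running time is exponential, which is the cost discussed under Cost of Backtracking.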
Example of Top-down Parsing
Rule #  Production
0       Start → Expr
1       Expr → Expr + Term
2       Expr → Expr − Term
3       Expr → Term
4       Term → Term × Factor
5       Term → Term ÷ Factor
6       Term → Factor
7       Factor → ( Expr )
8       Factor → num
9       Factor → name

Rule #  Sentential Form           Input
        Expr                      ↑ name + name × name
1       Expr + Term               ↑ name + name × name
3       Term + Term               ↑ name + name × name
6       Factor + Term             ↑ name + name × name
9       name + Term               ↑ name + name × name
→       name + Term               name ↑ + name × name
→       name + Term               name + ↑ name × name
4       name + Term × Factor      name + ↑ name × name
6       name + Factor × Factor    name + ↑ name × name
9       name + name × Factor      name + ↑ name × name
→       name + name × Factor      name + name ↑ × name
→       name + name × Factor      name + name × ↑ name
9       name + name × name        name + name × ↑ name
→       name + name × name        name + name × name ↑
(→ marks a match that advances the input.)
How does a top-down parser choose which rule to apply?
Example of Top-down Parsing
Using the same numbered rules, suppose the parser always expands Expr with rule 1:
Rule #  Sentential Form              Input
        Expr                         ↑ name + name × name
1       Expr + Term                  ↑ name + name × name
1       Expr + Term + Term           ↑ name + name × name
1       Expr + Term + Term + ⋯       ↑ name + name × name
1       …                            ↑ name + name × name
1       …                            ↑ name + name × name
A top-down parser can loop indefinitely with a left-recursive grammar.
Left Recursion
• A grammar is left-recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α
• Direct left recursion: there is a production of the form A → Aα
• Indirect left recursion: the first symbol on the right-hand side of a rule can derive a string that begins with the nonterminal on the left-hand side
We can often reformulate a grammar to avoid left recursion
Remove Left Recursion
A → Aα1 | Aα2 | … | Aαm | β1 | … | βn      (no βi begins with A)
becomes
A → β1A′ | β2A′ | … | βnA′
A′ → α1A′ | α2A′ | … | αmA′ | ε
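A minimal Python sketch of the transformation above, assuming a grammar stored as a dict from each nonterminal to a list of alternatives (tuples of symbols, with () standing for ε). The function name eliminate_immediate and the convention of naming the new nonterminal by appending ' are illustrative assumptions.

```python
def eliminate_immediate(grammar, a):
    """Rewrite A → Aα1 | … | Aαm | β1 | … | βn as
    A → β1A' | … | βnA'  and  A' → α1A' | … | αmA' | ε; returns a new grammar."""
    alphas = [alt[1:] for alt in grammar[a] if alt[:1] == (a,)]
    if not alphas:
        return dict(grammar)  # no immediate left recursion: nothing to do
    betas = [alt for alt in grammar[a] if alt[:1] != (a,)]
    new = a + "'"  # hypothetical fresh name; assumes it is not already in use
    result = dict(grammar)
    result[a] = [beta + (new,) for beta in betas]
    result[new] = [alpha + (new,) for alpha in alphas] + [()]  # () is ε
    return result


g = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
g = eliminate_immediate(eliminate_immediate(g, "E"), "T")
```

Applied to E and then T, this reproduces the E/T/F example on the next slide: E → TE′, E′ → +TE′ | ε, T → FT′, T′ → ∗FT′ | ε.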
Remove Left Recursion
Left-recursive:          Without left recursion:
E → E + T | T            E → T E′
                         E′ → + T E′ | ε
T → T ∗ F | F            T → F T′
                         T′ → ∗ F T′ | ε
F → ( E ) | id           F → ( E ) | id
Non-Left-Recursive Expression Grammar
Original grammar                 Non-left-recursive grammar
0  Start → Expr                  0   Start → Expr
1  Expr → Expr + Term            1   Expr → Term Expr′
2  Expr → Expr − Term            2   Expr′ → + Term Expr′
3  Expr → Term                   3   Expr′ → − Term Expr′
4  Term → Term × Factor          4   Expr′ → ε
5  Term → Term ÷ Factor          5   Term → Factor Term′
6  Term → Factor                 6   Term′ → × Factor Term′
7  Factor → ( Expr )             7   Term′ → ÷ Factor Term′
8  Factor → num                  8   Term′ → ε
9  Factor → name                 9   Factor → ( Expr )
                                 10  Factor → num
                                 11  Factor → name
Indirect Left Recursion
S → Aa | b
A → Ac | Sd | ε
• There is a left recursion because S ⇒ Aa ⇒ Sda
Eliminating Left Recursion
• Input: grammar G with no cycles or ε-productions
• Algorithm:
  arrange the nonterminals in some order A1, A2, …, An
  for i ← 1 … n
      for j ← 1 … i − 1
          if ∃ a production Ai → Aj γ
              replace Ai → Aj γ with Ai → δ1γ | δ2γ | … | δkγ, where Aj → δ1 | δ2 | … | δk are the current Aj productions
      eliminate the immediate left recursion among the Ai productions
Loop invariant at the start of outer iteration i:
∀k < i, no production expanding Ak has Al as the first symbol of its right-hand side, for any l < k
Eliminating Indirect Left Recursion
Original:                   S → Aa | b
                            A → Ac | Sd | ε
Substitute S in A → Sd:     A → Ac | Aad | bd | ε
Result:                     S → Aa | b
                            A → bdA′ | A′
                            A′ → cA′ | adA′ | ε
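The full algorithm above (substitute earlier nonterminals, then remove immediate left recursion) can be sketched in Python as follows. The function names are hypothetical and the grammar encoding is an assumed dict of nonterminal to alternatives, with () for ε. The example grammar contains A → ε even though the algorithm's input is assumed to be ε-free; for this particular grammar the output still matches the result above.

```python
def eliminate_immediate(grammar, a):
    """A → Aα1 | … | Aαm | β1 | … | βn  becomes  A → β1A' | … | βnA',
    A' → α1A' | … | αmA' | ε."""
    alphas = [alt[1:] for alt in grammar[a] if alt[:1] == (a,)]
    if not alphas:
        return grammar
    betas = [alt for alt in grammar[a] if alt[:1] != (a,)]
    new = a + "'"  # hypothetical fresh name; assumes it is unused
    grammar = dict(grammar)
    grammar[a] = [beta + (new,) for beta in betas]
    grammar[new] = [alpha + (new,) for alpha in alphas] + [()]
    return grammar


def eliminate_left_recursion(grammar, order):
    """Remove direct and indirect left recursion, processing nonterminals in `order`."""
    g = {nt: list(alts) for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt[:1] == (aj,):
                    # Replace Ai → Aj γ by Ai → δ γ for every current Aj → δ
                    expanded.extend(delta + alt[1:] for delta in g[aj])
                else:
                    expanded.append(alt)
            g[ai] = expanded
        g = eliminate_immediate(g, ai)
    return g


G = {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ()]}
result = eliminate_left_recursion(G, ["S", "A"])
```

With the order S, A, the intermediate step turns A → Sd into A → Aad | bd, exactly as in the substitution line above.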
Cost of Backtracking
Backtracking is expensive:
• Parser expands a nonterminal with the wrong rule
• Mismatch between the lower fringe of the parse tree and the input is detected
• Parser undoes the last few actions
• Parser tries other productions, if any
Avoid Backtracking
• When selecting the next rule, the parser can compare the current symbol (curr) with the next input symbol, called the lookahead
• Use the lookahead to disambiguate the possible production rules
• A backtrack-free grammar is a CFG for which the leftmost, top-down parser can always predict the correct rule with one word of lookahead
  • Also called a predictive grammar
FIRST Set
• Intuition
  • Each alternative for the leftmost nonterminal leads to a distinct terminal symbol
  • Which rule to choose then becomes obvious by comparing the next word in the input stream
• Given a string γ of terminal and nonterminal symbols, FIRST(γ) is the set of all terminal symbols that can begin any string derived from γ
  • We also need to keep track of which symbols can produce the empty string
  • FIRST maps each symbol in NT ∪ T ∪ {ε, EOF} to a subset of T ∪ {ε, EOF}
Steps to Compute FIRST Set
1. If X is a terminal, then FIRST(X) = {X}
2. If X → ε is a production, then ε ∈ FIRST(X)
3. If X is a nonterminal and X → Y1Y2…Yk is a production:
   I. Everything in FIRST(Y1) except ε is in FIRST(X)
   II. If for some i, a ∈ FIRST(Yi) and ε ∈ FIRST(Yj) for all 1 ≤ j < i, then a ∈ FIRST(X)
   III. If ε ∈ FIRST(Yj) for all 1 ≤ j ≤ k, then ε ∈ FIRST(X)
Apply these rules until no FIRST set changes.
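FIRST Set
• Generalize the FIRST relation to strings of symbols:
$$\mathrm{FIRST}(X\gamma) = \begin{cases} \mathrm{FIRST}(X) & \text{if } X \not\Rightarrow^{*} \epsilon \\ \left(\mathrm{FIRST}(X) - \{\epsilon\}\right) \cup \mathrm{FIRST}(\gamma) & \text{if } X \Rightarrow^{*} \epsilon \end{cases}$$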
Compute FIRST Set
Start → Expr
Expr → Term Expr′
Expr′ → + Term Expr′ | − Term Expr′ | ε
Term → Factor Term′
Term′ → × Factor Term′ | ÷ Factor Term′ | ε
Factor → ( Expr ) | num | name
FIRST(Expr) = {(, name, num}
FIRST(Expr′) = {+, −, ε}
FIRST(Term) = {(, name, num}
FIRST(Term′) = {×, ÷, ε}
FIRST(Factor) = {(, name, num}
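The FIRST rules can be run as a fixed-point iteration. This Python sketch computes FIRST for each nonterminal and also returns a helper for FIRST of a string of symbols, following the generalized definition. The names GRAMMAR, compute_first and first_of are hypothetical; × and ÷ are written as * and /, () stands for an ε-alternative, and the string "ε" marks that a symbol can derive the empty string.

```python
EPS = "ε"  # marks that a symbol (or string) can derive the empty string

# Expression grammar from the slide; × and ÷ written as * and /, () is ε
GRAMMAR = {
    "Start":  [("Expr",)],
    "Expr":   [("Term", "Expr'")],
    "Expr'":  [("+", "Term", "Expr'"), ("-", "Term", "Expr'"), ()],
    "Term":   [("Factor", "Term'")],
    "Term'":  [("*", "Factor", "Term'"), ("/", "Factor", "Term'"), ()],
    "Factor": [("(", "Expr", ")"), ("num",), ("name",)],
}


def compute_first(grammar):
    first = {nt: set() for nt in grammar}

    def first_of(symbols):
        # FIRST of a string Y1 Y2 … Yk
        out = set()
        for sym in symbols:
            f = first[sym] if sym in grammar else {sym}  # rule 1: FIRST(t) = {t}
            out |= f - {EPS}                             # rules 3.I and 3.II
            if EPS not in f:
                return out
        return out | {EPS}  # rules 2 and 3.III: every symbol can vanish

    changed = True
    while changed:  # iterate until no FIRST set grows
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of(alt) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first, first_of


FIRST, first_of = compute_first(GRAMMAR)
```

The resulting sets match the slide; for example, first_of(("Term'", ")")) gives {*, /, )} because Term′ can vanish.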
FOLLOW Set
• FOLLOW(X) is the set of terminals that can immediately follow X
  • That is, t ∈ FOLLOW(X) if some sentential form derived from the start symbol contains Xt
[Figure: a parse tree rooted at S whose frontier is α A a β, where A derives c … γ. Terminal c is in FIRST(A) and a is in FOLLOW(A).]
Steps to Compute FOLLOW Set
Apply these rules until no FOLLOW set changes:
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the end marker
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B)
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)
Compute FOLLOW Set
(Same expression grammar as in Compute FIRST Set.)
FOLLOW(Expr) = {$, )}
FOLLOW(Expr′) = {$, )}
FOLLOW(Term) = {$, +, −, )}
FOLLOW(Term′) = {$, +, −, )}
FOLLOW(Factor) = {$, +, −, ×, ÷, )}
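A fixed-point sketch in Python computes FOLLOW alongside FIRST, using FIRST of the suffix β for rules 2 and 3. It is self-contained, so it recomputes FIRST; the names first_follow, GRAMMAR and FOLLOW and the dict-of-tuples grammar encoding (with () for ε, and * and / for × and ÷) are illustrative assumptions.

```python
EPS = "ε"  # marks that a string of symbols can derive the empty string

# Expression grammar from the slides: × and ÷ written as * and /, () stands for ε
GRAMMAR = {
    "Start":  [("Expr",)],
    "Expr":   [("Term", "Expr'")],
    "Expr'":  [("+", "Term", "Expr'"), ("-", "Term", "Expr'"), ()],
    "Term":   [("Factor", "Term'")],
    "Term'":  [("*", "Factor", "Term'"), ("/", "Factor", "Term'"), ()],
    "Factor": [("(", "Expr", ")"), ("num",), ("name",)],
}


def first_follow(grammar, start):
    """Iterate the FIRST and FOLLOW rules together until no set changes."""
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")  # FOLLOW rule 1

    def first_of(symbols):
        # FIRST of a string of symbols (generalized definition)
        out = set()
        for sym in symbols:
            f = first[sym] if sym in grammar else {sym}
            out |= f - {EPS}
            if EPS not in f:
                return out
        return out | {EPS}

    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for alt in alts:
                updates = [(first[lhs], first_of(alt))]
                for i, b in enumerate(alt):
                    if b not in grammar:
                        continue  # FOLLOW is tracked only for nonterminals
                    rest = first_of(alt[i + 1:])  # FIRST(β) for lhs → αBβ
                    new = rest - {EPS}            # FOLLOW rule 2
                    if EPS in rest:
                        new |= follow[lhs]        # FOLLOW rule 3
                    updates.append((follow[b], new))
                for target, new in updates:
                    if not new <= target:
                        target |= new
                        changed = True
    return first_of, follow


first_of, FOLLOW = first_follow(GRAMMAR, "Start")
```

Iterating FIRST and FOLLOW in the same loop is a design shortcut: both updates are monotone, so the combined iteration still reaches the same fixed point as computing FIRST first.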
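Conditions for Backtrack-Free Grammar
• Consider a production A → β and define
$$\mathrm{FIRST}^{+}(A \to \beta) = \begin{cases} \mathrm{FIRST}(\beta) & \text{if } \epsilon \notin \mathrm{FIRST}(\beta) \\ \mathrm{FIRST}(\beta) \cup \mathrm{FOLLOW}(A) & \text{otherwise} \end{cases}$$
• For any nonterminal A where A → β1 | β2 | … | βn, a backtrack-free grammar has the property
$$\mathrm{FIRST}^{+}(A \to \beta_i) \cap \mathrm{FIRST}^{+}(A \to \beta_j) = \emptyset, \quad \forall\, 1 \le i, j \le n,\ i \ne j$$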
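To make the condition concrete, the following Python sketch computes FIRST+ for each alternative and checks pairwise disjointness. It hardcodes the FIRST and FOLLOW sets listed on the earlier slides for the expression grammar (× and ÷ written as * and /); the function names first_of, first_plus and is_backtrack_free are hypothetical.

```python
EPS = "ε"

# FIRST and FOLLOW sets of the expression grammar, copied from the slides
FIRST = {
    "Expr": {"(", "name", "num"}, "Expr'": {"+", "-", EPS},
    "Term": {"(", "name", "num"}, "Term'": {"*", "/", EPS},
    "Factor": {"(", "name", "num"},
}
FOLLOW = {
    "Expr": {"$", ")"}, "Expr'": {"$", ")"},
    "Term": {"$", "+", "-", ")"}, "Term'": {"$", "+", "-", ")"},
    "Factor": {"$", "+", "-", "*", "/", ")"},
}


def first_of(symbols):
    """FIRST of a string of grammar symbols; a terminal's FIRST set is itself."""
    out = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def first_plus(lhs, rhs):
    """FIRST+(lhs → rhs): add FOLLOW(lhs) only when rhs can derive ε."""
    f = first_of(rhs)
    return f if EPS not in f else f | FOLLOW[lhs]


def is_backtrack_free(lhs, alternatives):
    """True if the FIRST+ sets of all alternatives for lhs are pairwise disjoint."""
    sets = [first_plus(lhs, rhs) for rhs in alternatives]
    return all(sets[i].isdisjoint(sets[j])
               for i in range(len(sets)) for j in range(i + 1, len(sets)))
```

For Expr′ the three FIRST+ sets are {+}, {−} and {ε, $, )}, which are disjoint. Alternatives for Factor that all begin with name fail the check, which is the situation on the next slide.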
Backtracking
Start → Expr
Expr → Term Expr′
Expr′ → + Term Expr′ | − Term Expr′ | ε
Term → Factor Term′
Term′ → × Factor Term′ | ÷ Factor Term′ | ε
Factor → name | name [ Arglist ] | name ( Arglist )
Arglist → Expr MoreArgs
MoreArgs → , Expr MoreArgs | ε
Not all grammars are backtrack free: all three Factor alternatives begin with name, so their FIRST+ sets overlap and one word of lookahead cannot choose among them.
Left Factoring
• Left factoring is the process of extracting and isolating common prefixes in a set of productions
  Factor → name Arguments
  Arguments → [ ArgList ] | ( ArgList ) | ε
• Algorithm
  A → αβ1 | αβ2 | … | αβn | γ1 | γ2 | … | γj
  becomes
  A → αB | γ1 | γ2 | … | γj
  B → β1 | β2 | … | βn
  where B is a new nonterminal
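One round of the left-factoring algorithm above can be sketched in Python. Following the usual formulation, α is taken as the longest prefix shared by two or more alternatives. The function name left_factor, its new parameter, and the default naming scheme for B (appending ') are hypothetical, and the grammar uses the same assumed dict-of-tuples encoding with () for ε.

```python
def left_factor(grammar, a, new=None):
    """One round of left factoring for nonterminal `a`:
    A → αβ1 | … | αβn | γ1 | … | γj  becomes  A → αB | γ1 | … | γj,  B → β1 | … | βn."""
    alts = grammar[a]
    alpha = ()
    # Find the longest prefix shared by at least two alternatives
    for i in range(len(alts)):
        for j in range(i + 1, len(alts)):
            k = 0
            while k < min(len(alts[i]), len(alts[j])) and alts[i][k] == alts[j][k]:
                k += 1
            if k > len(alpha):
                alpha = alts[i][:k]
    if not alpha:
        return grammar  # no common prefix: nothing to factor
    b = new or a + "'"  # hypothetical fresh nonterminal name
    betas = [alt[len(alpha):] for alt in alts if alt[:len(alpha)] == alpha]
    gammas = [alt for alt in alts if alt[:len(alpha)] != alpha]
    result = dict(grammar)
    result[a] = [alpha + (b,)] + gammas
    result[b] = betas  # () among the betas is ε
    return result


g = {"Factor": [("name",), ("name", "[", "ArgList", "]"), ("name", "(", "ArgList", ")")]}
g = left_factor(g, "Factor", new="Arguments")
```

This yields Factor → name Arguments and Arguments → ε | [ ArgList ] | ( ArgList ), the slide's example up to the order of alternatives. If the new suffixes still share prefixes, apply the function again until no two alternatives of any nonterminal share one.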
Key Insight in Using Top-Down Parsing
• Efficiency depends on the accuracy of selecting the correct production for expanding a nonterminal
• Parser may not terminate in the worst case
• A large subset of the context-free grammars can be parsed without backtracking
Recursive-Descent Parsing
• Recursive-descent parsing is a form of top-down parsing that may require backtracking
• It consists of a set of procedures, one for each nonterminal
void A() {
    choose an A-production A → X1X2…Xk
    for i ← 1 … k
        if Xi is a nonterminal
            call procedure Xi()
        else if Xi equals the current input symbol a
            advance the input to the next symbol
        else
            // error
}
Limitations with Recursive-Descent Parsing
• Consider a grammar with two productions X → γ1 and X → γ2
• Suppose FIRST(γ1) ∩ FIRST(γ2) ≠ ∅
  • Say a is the common terminal symbol
• The function corresponding to X will not know which production to use on input token a
Recursive-Descent Parsing with Backtracking
• To support backtracking
  • All productions should be tried in some order
  • Failure for some production implies we need to try the remaining productions
  • Report an error only when there are no other rules
Predictive Parsing
• Special case of recursive-descent parsing that does not require backtracking
• The lookahead symbol unambiguously determines which production rule to use
• The advantage is that the algorithm is simple and the parser can be constructed by hand
stmt → expr ; | if ( expr ) stmt | for ( optexpr ; optexpr ; optexpr ) stmt | other
optexpr → ε | expr
Pseudocode for a Predictive Parser
void stmt() {
    switch (lookahead) {
    case expr:
        match(expr); match(';'); break;
    case if:
        match(if); match('('); match(expr); match(')'); stmt(); break;
    case for:
        match(for); match('(');
        optexpr(); match(';'); optexpr(); match(';'); optexpr();
        match(')'); stmt(); break;
    case other:
        match(other); break;
    default:
        report("syntax error");
    }
}
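The pseudocode above translates directly into a runnable Python sketch. The class names StmtParser and ParseError, the parse_stmt entry point, and the "$" end marker are assumptions for illustration; expr and other are treated as single tokens, as in the grammar. The body of optexpr is not given on the slide: here it consumes expr when that is the lookahead and otherwise chooses optexpr → ε.

```python
class ParseError(Exception):
    """Raised by match() or stmt() on a syntax error."""


class StmtParser:
    """Predictive recursive-descent parser for
        stmt    → expr ; | if ( expr ) stmt | for ( optexpr ; optexpr ; optexpr ) stmt | other
        optexpr → ε | expr
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # Consume the lookahead if it is the expected terminal, else fail
        if self.lookahead != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def stmt(self):
        # One lookahead token selects the production; no backtracking is needed
        if self.lookahead == "expr":
            self.match("expr")
            self.match(";")
        elif self.lookahead == "if":
            for t in ("if", "(", "expr", ")"):
                self.match(t)
            self.stmt()
        elif self.lookahead == "for":
            self.match("for")
            self.match("(")
            self.optexpr()
            self.match(";")
            self.optexpr()
            self.match(";")
            self.optexpr()
            self.match(")")
            self.stmt()
        elif self.lookahead == "other":
            self.match("other")
        else:
            raise ParseError("syntax error")

    def optexpr(self):
        # optexpr → expr if the lookahead is expr; otherwise use optexpr → ε
        if self.lookahead == "expr":
            self.match("expr")


def parse_stmt(tokens):
    parser = StmtParser(tokens)
    parser.stmt()
    parser.match("$")  # the whole input must be consumed
    return True
```

For example, parse_stmt(["for", "(", ";", "expr", ";", "expr", ")", "other"]) succeeds; the first optexpr sees ; as the lookahead and takes the ε alternative.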
LL(1) Grammars
• Class of grammars for which no backtracking is required
• The first L stands for left-to-right scan, the second L stands for leftmost derivation
• There is one lookahead token
• No left-recursive or ambiguous grammar can be LL(1)
• In LL(k), k stands for the number of lookahead tokens
• Predictive parsers accept LL(k) grammars
• Every LL(1) grammar is an LL(2) grammar
Nonrecursive Table-Driven Predictive Parser
[Figure: a predictive parsing program reads the input buffer (a + b $), consults parsing table M, uses a stack (X, Y, Z above the bottom marker $), and produces output.]
Predictive Parsing Algorithm
• Input: string w and parsing table M for grammar G
• Initially the input buffer holds w$ and the stack holds the start symbol of G above $
• Algorithm:
  let a be the first symbol of w
  let X be the symbol at the top of the stack
  while X ≠ $:
      if X == a:
          pop the stack and advance the input (a ← next symbol of w)
      else if X is a terminal or M[X, a] is an error entry:
          error
      else if M[X, a] == X → Y1Y2…Yk:
          output the production
          pop the stack
          push Yk, Yk−1, …, Y1 onto the stack, with Y1 on top
      X ← top stack symbol
Predictive Parsing Table
E → TE′        E′ → +TE′ | ε
T → FT′        T′ → ∗FT′ | ε
F → (E) | id
Nonterminal | id       | +          | *          | (        | )       | $
E           | E → TE′  |            |            | E → TE′  |         |
E′          |          | E′ → +TE′  |            |          | E′ → ε  | E′ → ε
T           | T → FT′  |            |            | T → FT′  |         |
T′          |          | T′ → ε     | T′ → ∗FT′  |          | T′ → ε  | T′ → ε
F           | F → id   |            |            | F → (E)  |         |
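The predictive parsing algorithm, driven by the table above, can be sketched in Python. The table is transcribed by hand as a dict keyed by (nonterminal, terminal), with * for ∗ and () for ε. The names TABLE, NONTERMINALS, ParseError and predictive_parse are hypothetical. The final check that the whole input was consumed is an addition not spelled out in the slide's loop.

```python
class ParseError(Exception):
    """Raised when the parser reaches an error entry."""


NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[(X, a)] = right-hand side to expand X with; () is ε; a missing key is an error entry
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"), ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}


def predictive_parse(tokens, start="E", table=TABLE):
    """Return the productions output by the parser as (X, right-hand side) pairs."""
    stack = ["$", start]            # the end of the list is the top of the stack
    tokens = list(tokens) + ["$"]   # the input buffer holds w$
    i, output = 0, []
    while stack[-1] != "$":
        x, a = stack[-1], tokens[i]
        if x == a:                  # terminal on top matches: pop and advance
            stack.pop()
            i += 1
        elif x not in NONTERMINALS or (x, a) not in table:
            raise ParseError(f"unexpected {a!r} with {x!r} on top of the stack")
        else:
            rhs = table[(x, a)]
            output.append((x, rhs))        # output the production X → Y1…Yk
            stack.pop()
            stack.extend(reversed(rhs))    # push Yk … Y1 so that Y1 is on top
    if tokens[i] != "$":            # extra check: all input must be consumed
        raise ParseError(f"unexpected {tokens[i]!r} after the end of the parse")
    return output
```

Calling predictive_parse(["id", "+", "id", "*", "id"]) outputs the same eleven productions, in the same order, as the trace on the Working of Predictive Parser slide.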
Construction of a Predictive Parsing Table
• Input: grammar G
• Algorithm: for each production A → α in G,
  • for each terminal a in FIRST(α), add A → α to M[A, a]
  • if ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A, b]
  • if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $]
• An empty entry M[A, a] indicates an error
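The construction above can be sketched in Python, together with a fixed-point FIRST/FOLLOW computation so that the block is self-contained. Each cell holds a list of right-hand sides, so multiply-defined cells (conflicts) are easy to detect. The names first_follow and build_table and the dict-of-tuples grammar encoding (() for ε, * for ∗) are illustrative assumptions.

```python
EPS = "ε"


def first_follow(grammar, start):
    """Fixed-point computation of FIRST (as a helper on strings) and FOLLOW."""
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")

    def first_of(symbols):
        out = set()
        for sym in symbols:
            f = first[sym] if sym in grammar else {sym}
            out |= f - {EPS}
            if EPS not in f:
                return out
        return out | {EPS}

    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for alt in alts:
                updates = [(first[lhs], first_of(alt))]
                for i, b in enumerate(alt):
                    if b in grammar:
                        rest = first_of(alt[i + 1:])
                        new = rest - {EPS}
                        if EPS in rest:
                            new |= follow[lhs]
                        updates.append((follow[b], new))
                for target, new in updates:
                    if not new <= target:
                        target |= new
                        changed = True
    return first_of, follow


def build_table(grammar, start):
    """Return (M, conflicts); M maps (A, a) to the list of right-hand sides placed there."""
    first_of, follow = first_follow(grammar, start)
    table = {}
    for lhs, alts in grammar.items():
        for alt in alts:
            f = first_of(alt)
            cols = f - {EPS}            # terminals a in FIRST(α)
            if EPS in f:
                cols |= follow[lhs]     # FOLLOW(A); includes $ when $ ∈ FOLLOW(A)
            for t in cols:
                table.setdefault((lhs, t), []).append(alt)
    conflicts = {cell for cell, rhss in table.items() if len(rhss) > 1}
    return table, conflicts


EXPR = {"E": [("T", "E'")], "E'": [("+", "T", "E'"), ()], "T": [("F", "T'")],
        "T'": [("*", "F", "T'"), ()], "F": [("(", "E", ")"), ("id",)]}
DANGLING_ELSE = {"S": [("i", "E", "t", "S", "S'"), ("a",)], "S'": [("e", "S"), ()],
                 "E": [("b",)]}
```

For EXPR the sketch reproduces the 13 entries of the table on the Predictive Parsing Table slide with no conflicts; for DANGLING_ELSE it reports a single conflict, in cell M[S′, e].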
Working of Predictive Parser
Matched        | Stack       | Input           | Action
               | E$          | id + id ∗ id$   |
               | TE′$        | id + id ∗ id$   | output E → TE′
               | FT′E′$      | id + id ∗ id$   | output T → FT′
               | id T′E′$    | id + id ∗ id$   | output F → id
id             | T′E′$       | + id ∗ id$      | match id
id             | E′$         | + id ∗ id$      | output T′ → ε
id             | + TE′$      | + id ∗ id$      | output E′ → + TE′
id +           | TE′$        | id ∗ id$        | match +
id +           | FT′E′$      | id ∗ id$        | output T → FT′
id +           | id T′E′$    | id ∗ id$        | output F → id
id + id        | T′E′$       | ∗ id$           | match id
id + id        | ∗ FT′E′$    | ∗ id$           | output T′ → ∗ FT′
id + id ∗      | FT′E′$      | id$             | match ∗
id + id ∗      | id T′E′$    | id$             | output F → id
id + id ∗ id   | T′E′$       | $               | match id
id + id ∗ id   | E′$         | $               | output T′ → ε
id + id ∗ id   | $           | $               | output E′ → ε
Predictive Parsing
• Grammars whose predictive parsing tables contain no multiply-defined entries are called LL(1)
• If grammar G is left-recursive or ambiguous, then parsing table M will have at least one multiply-defined cell
• Some grammars cannot be transformed into LL(1)
  • The following grammar is ambiguous (it abstracts the dangling-else problem, with i, t and e standing for if, then and else):
    S → iEtSS′ | a
    S′ → eS | ε
    E → b
Predictive Parsing Table
S → iEtSS′ | a      S′ → eS | ε      E → b
Nonterminal | a      | b      | e                 | i           | t  | $
S           | S → a  |        |                   | S → iEtSS′  |    |
S′          |        |        | S′ → ε, S′ → eS   |             |    | S′ → ε
E           |        | E → b  |                   |             |    |
M[S′, e] is multiply defined, so the grammar is not LL(1).
Error Recovery in Predictive Parsing
• Error conditions
  • The terminal on top of the stack does not match the next input symbol
  • Nonterminal A is on top of the stack, a is the next input symbol, and M[A, a] is error
• Choices
  • Raise an error and quit parsing
  • Print an error message, try to recover from the error, and continue with compilation
Error Recovery in Predictive Parsing
• Panic mode: skip over input symbols until a token in a set of synchronizing (synch) tokens appears
  • Add all tokens in FOLLOW(A) to the synch set for A
  • Add symbols in FIRST(A) to the synch set for A
  • Add keywords that can begin sentences
  • …
Predictive Parsing Table with Synchronizing Tokens
E → TE′        E′ → +TE′ | ε
T → FT′        T′ → ∗FT′ | ε
F → (E) | id
Nonterminal | id       | +          | *          | (        | )       | $
E           | E → TE′  |            |            | E → TE′  | synch   | synch
E′          |          | E′ → +TE′  |            |          | E′ → ε  | E′ → ε
T           | T → FT′  | synch      |            | T → FT′  | synch   | synch
T′          |          | T′ → ε     | T′ → ∗FT′  |          | T′ → ε  | T′ → ε
F           | F → id   | synch      | synch      | F → (E)  | synch   | synch
Error Recovery Moves by Predictive Parser
Stack       | Input          | Remark
E$          | ) id ∗ + id$   | error, skip )
E$          | id ∗ + id$     | id is in FIRST(E)
TE′$        | id ∗ + id$     |
FT′E′$      | id ∗ + id$     |
id T′E′$    | id ∗ + id$     |
T′E′$       | ∗ + id$        |
∗ FT′E′$    | ∗ + id$        |
FT′E′$      | + id$          | error, M[F, +] = synch
T′E′$       | + id$          | F has been popped
E′$         | + id$          |
+ TE′$      | + id$          |
TE′$        | id$            |
FT′E′$      | id$            |
id T′E′$    | id$            |
T′E′$       | $              |
E′$         | $              |
$           | $              |
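The recovery moves above can be reproduced with a panic-mode variant of the table-driven parser, sketched in Python. It follows the usual rules: a blank entry skips the input symbol, a synch entry pops the nonterminal, and a mismatched terminal on top of the stack is popped. One assumption is labeled explicitly: when a synch entry is hit while the nonterminal is the only symbol above $ (the first row of the trace), the sketch skips the input symbol instead of popping, and it never skips the end marker $. The names TABLE, SYNCH and parse_with_recovery are hypothetical.

```python
SYNCH = "synch"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Table with synchronizing entries; () is ε and a missing key is a blank (error) entry
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"), ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"), ("T", "+"): SYNCH, ("T", "("): ("F", "T'"),
    ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"), ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", "("): ("(", "E", ")"),
    ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}


def parse_with_recovery(tokens, start="E", table=TABLE):
    """Panic-mode predictive parse; returns (productions output, recovery actions)."""
    stack = ["$", start]
    tokens = list(tokens) + ["$"]
    i, output, errors = 0, [], []
    while stack[-1] != "$":
        x, a = stack[-1], tokens[i]
        if x == a:
            stack.pop()
            i += 1
        elif x not in NONTERMINALS:
            errors.append(("pop", x))      # terminal on top does not match: pop it
            stack.pop()
        else:
            entry = table.get((x, a))
            if entry is not None and entry != SYNCH:
                output.append((x, entry))  # normal expansion
                stack.pop()
                stack.extend(reversed(entry))
            elif a == "$" or (entry == SYNCH and len(stack) > 2):
                errors.append(("pop", x))  # synch (or end of input): pop the nonterminal
                stack.pop()
            else:
                # Blank entry, or synch with x the only symbol above $ (assumption): skip a
                errors.append(("skip", a))
                i += 1
    return output, errors
```

On ) id ∗ + id the sketch records two recovery actions, skipping ) and popping F, and passes through the same stack contents as the trace above.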
References
• A. Aho et al. Compilers: Principles, Techniques, and Tools, 2nd edition, Section 4.4.
• K. Cooper and L. Torczon. Engineering a Compiler, 2nd edition, Section 3.3.