
CS 335: Top-down Parsing Swarnendu Biswas

Semester 2019-2020-II CSE, IIT Kanpur

Content influenced by many excellent references, see References slide for acknowledgements.

Example Expression Grammar

Start → Expr
Expr → Expr + Term | Expr − Term | Term
Term → Term × Factor | Term ÷ Factor | Factor
Factor → ( Expr ) | num | name

Priority increases from Expr to Term to Factor: × and ÷ bind more tightly than + and −.

Derivation of name + name × name

| Sentential Form | Input |
| Expr | ↑ name + name × name |
| Expr + Term | ↑ name + name × name |
| Term + Term | ↑ name + name × name |
| Factor + Term | ↑ name + name × name |
| name + Term | ↑ name + name × name |
| name + Term | name ↑ + name × name |
| name + Term | name + ↑ name × name |
| name + Term × Factor | name + ↑ name × name |
| name + Factor × Factor | name + ↑ name × name |
| name + name × Factor | name + ↑ name × name |
| name + name × Factor | name + name ↑ × name |
| name + name × Factor | name + name × ↑ name |
| name + name × name | name + name × ↑ name |
| name + name × name | name + name × name ↑ |

The current input terminal being scanned is called the lookahead symbol.

[Figure: the parse tree for name + name × name growing one leftmost (lm) derivation step at a time, from Start ⇒lm Expr ⇒lm Expr + Term down to the leaves name + name × name.]

General Idea of Top-down Parsing

Start with the root (start symbol) of the parse tree

Grow the tree downwards by expanding productions at the lower levels of the tree
• Select a nonterminal and extend it by adding children corresponding to the right side of some production for the nonterminal

Repeat till
• Lower fringe consists only of terminals and the input is consumed

Top-down parsing, in effect, finds a leftmost derivation for an input string

General Idea of Top-down Parsing

Start with the root of the parse tree

Grow the tree by expanding productions at the lower levels of the tree
• Extend a nonterminal by adding children corresponding to the right side of some production for the nonterminal

Repeat till
• Lower fringe consists only of terminals and the input is consumed, or
• There is a mismatch between the lower fringe and the remaining input stream
  • Selection of a production may involve trial-and-error
  • A mismatch arises from a wrong choice of productions while expanding nonterminals, or because the input character stream is not part of the language

Leftmost Top-down Parsing

root = node for Start symbol
curr = root
push(null)            // stack
word = nextWord()
while (true):
    if curr ∈ Nonterminal:
        pick next rule A → β1β2…βn to expand curr
        create nodes for β1, β2, …, βn as children of curr
        push(βn, βn−1, …, β2)
        curr = β1
    else if curr == word:
        curr = pop()
        word = nextWord()
    else if word == eof and curr == null:
        accept input and exit the loop
    else:
        backtrack

Implementing Backtracking

• Extend the previous algorithm to backtrack
  • Set curr to its parent and delete the children
  • Expand the node curr with untried rules, if any
    • Create child nodes for each symbol in the right-hand side of the production
    • Push those symbols onto the stack in reverse order
    • Set curr to the first child node
  • Move curr up the tree if there are no untried rules
  • Report an error when there are no more moves
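The backtracking search above can be sketched in Python. This is a minimal sketch under stated assumptions, not the slides' exact algorithm: instead of building tree nodes and deleting them on failure, it keeps an explicit stack of choice points, each holding the input position and the pending grammar symbols, so backtracking means popping the next untried configuration. The grammar encoding (a dict from nonterminal to a list of right-hand-side tuples, with `()` for ε), the ASCII `*`, `/`, `-` standing in for ×, ÷, −, and the name `backtracking_parse` are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed encoding: nonterminal -> list of alternatives, each a tuple of
# symbols. Any symbol that is not a key is a terminal; () is an ε-production.
Grammar = Dict[str, List[Tuple[str, ...]]]

# Non-left-recursive expression grammar; *, / and - stand in for ×, ÷, −.
EXPR_GRAMMAR: Grammar = {
    "Expr":   [("Term", "Expr'")],
    "Expr'":  [("+", "Term", "Expr'"), ("-", "Term", "Expr'"), ()],
    "Term":   [("Factor", "Term'")],
    "Term'":  [("*", "Factor", "Term'"), ("/", "Factor", "Term'"), ()],
    "Factor": [("(", "Expr", ")"), ("num",), ("name",)],
}


def backtracking_parse(grammar: Grammar, start: str,
                       tokens: Sequence[str]) -> Optional[List[Tuple[str, int]]]:
    """Leftmost top-down parse with backtracking over an explicit stack.

    Returns the (nonterminal, alternative index) expansions of the first
    successful leftmost derivation, or None when every choice fails.
    """
    # A choice point: (input position, pending symbols with the top of the
    # parser stack at the end, expansions applied so far).
    choices: List[Tuple[int, Tuple[str, ...], Tuple[Tuple[str, int], ...]]] = [
        (0, (start,), ())
    ]
    while choices:
        pos, pending, used = choices.pop()
        if not pending:
            if pos == len(tokens):      # fringe matched and input consumed
                return list(used)
            continue                    # words left over: backtrack
        top, rest = pending[-1], pending[:-1]
        if top in grammar:
            # Push the alternatives in reverse so rule 0 is tried first; the
            # others stay on `choices` as untried rules for backtracking.
            for idx in reversed(range(len(grammar[top]))):
                rhs = grammar[top][idx]
                choices.append((pos, rest + tuple(reversed(rhs)),
                                used + ((top, idx),)))
        elif pos < len(tokens) and tokens[pos] == top:
            choices.append((pos + 1, rest, used))  # terminal matches word
        # Otherwise the fringe disagrees with the input: drop this choice.
    return None
```

For example, `backtracking_parse(EXPR_GRAMMAR, "Expr", ["name", "+", "name", "*", "name"])` returns the expansions of a leftmost derivation, while `["name", "+"]` yields `None`. The sketch deliberately uses the non-left-recursive grammar: because the search is depth-first and tries rule 0 first, the left-recursive grammar would keep expanding Expr → Expr + Term without consuming input.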

Example of Top-down Parsing

| Rule # | Production |
| 0 | Start → Expr |
| 1 | Expr → Expr + Term |
| 2 | Expr → Expr − Term |
| 3 | Expr → Term |
| 4 | Term → Term × Factor |
| 5 | Term → Term ÷ Factor |
| 6 | Term → Factor |
| 7 | Factor → ( Expr ) |
| 8 | Factor → num |
| 9 | Factor → name |

| Rule # | Sentential Form | Input |
| - | Expr | ↑ name + name × name |
| 1 | Expr + Term | ↑ name + name × name |
| 3 | Term + Term | ↑ name + name × name |
| 6 | Factor + Term | ↑ name + name × name |
| 9 | name + Term | ↑ name + name × name |
| match | name + Term | name ↑ + name × name |
| match | name + Term | name + ↑ name × name |
| 4 | name + Term × Factor | name + ↑ name × name |
| 6 | name + Factor × Factor | name + ↑ name × name |
| 9 | name + name × Factor | name + ↑ name × name |
| match | name + name × Factor | name + name ↑ × name |
| match | name + name × Factor | name + name × ↑ name |
| 9 | name + name × name | name + name × ↑ name |
| match | name + name × name | name + name × name ↑ |

How does a top-down parser choose which rule to apply?

Example of Top-down Parsing

| Rule # | Sentential Form | Input |
| - | Expr | ↑ name + name × name |
| 1 | Expr + Term | ↑ name + name × name |
| 1 | Expr + Term + Term | ↑ name + name × name |
| 1 | Expr + Term + Term + ⋯ | ↑ name + name × name |
| 1 | ⋯ | ↑ name + name × name |

A top-down parser can loop indefinitely with a left-recursive grammar.

Left Recursion

• A grammar is left-recursive if it has a nonterminal A such that there is a derivation $A \overset{+}{\Rightarrow} A\alpha$ for some string α
  • Direct left recursion: there is a production of the form A → Aα
  • Indirect left recursion: the first symbol on the right-hand side of a rule can derive the symbol on the left

We can often reformulate a grammar to avoid left recursion

Remove Left Recursion

A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn, where no βi begins with A

is replaced by

A → β1A′ | β2A′ | … | βnA′
A′ → α1A′ | α2A′ | … | αmA′ | ε

where A′ is a new nonterminal.

Remove Left Recursion

Left-recursive grammar:
E → E + T | T
T → T ∗ F | F
F → ( E ) | id

After removing left recursion:
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → ∗FT′ | ε
F → ( E ) | id
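The transformation for direct left recursion is mechanical. A minimal Python sketch, assuming grammars are dicts mapping each nonterminal to a list of right-hand-side tuples (`()` for ε) and that the new nonterminal is named by appending a prime; the function name is hypothetical. Applying it to E and then T reproduces the grammar above.

```python
from typing import Dict, List, Tuple

# Assumed encoding: nonterminal -> list of right-hand sides; () means ε.
Grammar = Dict[str, List[Tuple[str, ...]]]


def remove_immediate_left_recursion(grammar: Grammar, a: str) -> Grammar:
    """Rewrite A -> A α1 | ... | A αm | β1 | ... | βn as
         A  -> β1 A' | ... | βn A'
         A' -> α1 A' | ... | αm A' | ε
    Assumes no alternative is exactly A -> A (a cycle)."""
    alphas = [rhs[1:] for rhs in grammar[a] if rhs and rhs[0] == a]
    betas = [rhs for rhs in grammar[a] if not rhs or rhs[0] != a]
    if not alphas:
        return dict(grammar)                # no direct left recursion
    a_prime = a + "'"
    while a_prime in grammar:               # pick a fresh nonterminal name
        a_prime += "'"
    result = dict(grammar)
    result[a] = [beta + (a_prime,) for beta in betas]
    result[a_prime] = [alpha + (a_prime,) for alpha in alphas] + [()]
    return result


g: Grammar = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
g = remove_immediate_left_recursion(g, "E")
g = remove_immediate_left_recursion(g, "T")
print(g["E"], g["E'"])   # [('T', "E'")] [('+', 'T', "E'"), ()]
```

The function returns a new dict rather than mutating its argument, so the original grammar stays available for comparison.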

Non-Left-Recursive Expression Grammar

Original (left-recursive):

| Rule # | Production |
| 0 | Start → Expr |
| 1 | Expr → Expr + Term |
| 2 | Expr → Expr − Term |
| 3 | Expr → Term |
| 4 | Term → Term × Factor |
| 5 | Term → Term ÷ Factor |
| 6 | Term → Factor |
| 7 | Factor → ( Expr ) |
| 8 | Factor → num |
| 9 | Factor → name |

Transformed (non-left-recursive):

| Rule # | Production |
| 0 | Start → Expr |
| 1 | Expr → Term Expr′ |
| 2 | Expr′ → + Term Expr′ |
| 3 | Expr′ → − Term Expr′ |
| 4 | Expr′ → ε |
| 5 | Term → Factor Term′ |
| 6 | Term′ → × Factor Term′ |
| 7 | Term′ → ÷ Factor Term′ |
| 8 | Term′ → ε |
| 9 | Factor → ( Expr ) |
| 10 | Factor → num |
| 11 | Factor → name |

Indirect Left Recursion

S → Aa | b
A → Ac | Sd | ε

• There is a left recursion because S ⇒ Aa ⇒ Sda

Eliminating Left Recursion

• Input: Grammar G with no cycles or ε-productions
• Algorithm:

    Arrange the nonterminals in some order A1, A2, …, An
    for i ← 1 … n
        for j ← 1 … i − 1
            if ∃ a production Ai → Ajγ
                replace Ai → Ajγ with Ai → δ1γ | δ2γ | … | δkγ,
                where Aj → δ1 | δ2 | … | δk are the current Aj-productions
        eliminate the immediate left recursion among the Ai-productions


Loop invariant at the start of outer iteration i:

∀k < i, no production expanding Ak has Al as the first symbol of its right-hand side, for any l ≤ k

Eliminating Indirect Left Recursion

Original:
S → Aa | b
A → Ac | Sd | ε

After elimination:
S → Aa | b
A → bdA′ | A′
A′ → cA′ | adA′ | ε
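The general algorithm can be sketched in Python, assuming the same dict-of-tuples grammar encoding (`()` for ε) and primed names for new nonterminals; `eliminate_left_recursion` and `_remove_immediate` are hypothetical names. The example above contains A → ε, which the algorithm's precondition excludes, but running the sketch on it with the order S, A still yields exactly the grammar shown.

```python
from typing import Dict, List, Tuple

# Assumed encoding: nonterminal -> list of right-hand sides; () means ε.
Grammar = Dict[str, List[Tuple[str, ...]]]


def _remove_immediate(grammar: Grammar, a: str) -> None:
    """In place: A -> A α | β  becomes  A -> β A',  A' -> α A' | ε."""
    alphas = [rhs[1:] for rhs in grammar[a] if rhs and rhs[0] == a]
    if not alphas:
        return
    betas = [rhs for rhs in grammar[a] if not rhs or rhs[0] != a]
    a_prime = a + "'"
    while a_prime in grammar:               # fresh nonterminal name
        a_prime += "'"
    grammar[a] = [beta + (a_prime,) for beta in betas]
    grammar[a_prime] = [alpha + (a_prime,) for alpha in alphas] + [()]


def eliminate_left_recursion(grammar: Grammar, order: List[str]) -> Grammar:
    """`order` lists the nonterminals as A1, A2, ..., An."""
    g = {nt: list(alts) for nt, alts in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            new_alts: List[Tuple[str, ...]] = []
            for rhs in g[a_i]:
                if rhs and rhs[0] == a_j:
                    # Replace Ai -> Aj γ by Ai -> δ γ for every Aj -> δ.
                    gamma = rhs[1:]
                    new_alts.extend(delta + gamma for delta in g[a_j])
                else:
                    new_alts.append(rhs)
            g[a_i] = new_alts
        _remove_immediate(g, a_i)
    return g


g = eliminate_left_recursion(
    {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ()]},
    ["S", "A"],
)
print(g["A"], g["A'"])
```

Substitution happens before the immediate-recursion step, which is why A → Sd first becomes A → Aad | bd and only then is rewritten into the A′ productions.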

Cost of Backtracking

Backtracking is expensive
• Parser expands a nonterminal with the wrong rule
• Mismatch between the lower fringe of the parse tree and the input is detected
• Parser undoes the last few actions
• Parser tries other productions, if any

Avoid Backtracking

• The parser must be able to select the correct next rule
  • Compare the curr symbol and the next input symbol, called the lookahead
  • Use the lookahead to disambiguate the possible production rules

• A backtrack-free grammar is a CFG for which the leftmost, top-down parser can always predict the correct rule with one-word lookahead
  • Also called a predictive grammar

FIRST Set

• Intuition
  • Each alternative for the leftmost nonterminal leads to a distinct terminal symbol
  • Which rule to choose becomes obvious by comparing the next word in the input stream

• Given a string γ of terminal and nonterminal symbols, FIRST(γ) is the set of all terminal symbols that can begin any string derived from γ
  • We also need to keep track of which symbols can produce the empty string

$$\text{FIRST}: NT \cup T \cup \{\epsilon, \text{eof}\} \to \mathcal{P}(T \cup \{\epsilon, \text{eof}\})$$

Steps to Compute FIRST Set

1. If X is a terminal, then FIRST(X) = {X}
2. If X → ε is a production, then ε ∈ FIRST(X)
3. If X is a nonterminal and X → Y1Y2…Yk is a production:
   I. Every terminal in FIRST(Y1) is in FIRST(X)
   II. If for some i, a ∈ FIRST(Yi) and ε ∈ FIRST(Yj) for all 1 ≤ j < i, then a ∈ FIRST(X)
   III. If ε ∈ FIRST(Yj) for all 1 ≤ j ≤ k, then ε ∈ FIRST(X)

Apply the rules repeatedly until no FIRST set changes.
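The rules above amount to a fixed-point computation. A minimal Python sketch, assuming the dict-of-tuples grammar encoding (`()` for ε), the string `"ε"` as the empty-string marker, and ASCII `*`, `/`, `-` for ×, ÷, −; `first_of_string` and `compute_first` are hypothetical names. `first_of_string` also implements the extension of FIRST to strings of symbols.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed encoding: nonterminal -> list of right-hand sides; () means ε.
Grammar = Dict[str, List[Tuple[str, ...]]]
EPS = "ε"  # stands for the empty string inside FIRST sets


def first_of_string(symbols: Iterable[str], first: Dict[str, Set[str]]) -> Set[str]:
    """FIRST(Y1 Y2 ... Yk) from the current per-nonterminal FIRST sets."""
    result: Set[str] = set()
    for y in symbols:
        f = first.get(y, {y})           # rule 1: FIRST(terminal) = {terminal}
        result |= f - {EPS}
        if EPS not in f:                # Yi cannot vanish, so stop here
            return result
    result.add(EPS)                     # every Yi derives ε (or k = 0)
    return result


def compute_first(grammar: Grammar) -> Dict[str, Set[str]]:
    """Apply the rules until no FIRST set of a nonterminal grows."""
    first: Dict[str, Set[str]] = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


EXPR_GRAMMAR: Grammar = {
    "Start":  [("Expr",)],
    "Expr":   [("Term", "Expr'")],
    "Expr'":  [("+", "Term", "Expr'"), ("-", "Term", "Expr'"), ()],
    "Term":   [("Factor", "Term'")],
    "Term'":  [("*", "Factor", "Term'"), ("/", "Factor", "Term'"), ()],
    "Factor": [("(", "Expr", ")"), ("num",), ("name",)],
}
```

On the non-left-recursive expression grammar, `compute_first(EXPR_GRAMMAR)` gives the sets listed on the "Compute FIRST Set" slide. Since the sets only ever grow and are bounded by the terminal alphabet, the loop terminates.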

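FIRST Set

• Generalize the FIRST relation to strings of symbols

$$\text{FIRST}(X\gamma) = \begin{cases} \text{FIRST}(X) & \text{if } \epsilon \notin \text{FIRST}(X) \\ \left(\text{FIRST}(X) - \{\epsilon\}\right) \cup \text{FIRST}(\gamma) & \text{if } \epsilon \in \text{FIRST}(X) \end{cases}$$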

Compute FIRST Set

Start → Expr
Expr → Term Expr′
Expr′ → + Term Expr′ | − Term Expr′ | ε
Term → Factor Term′
Term′ → × Factor Term′ | ÷ Factor Term′ | ε
Factor → ( Expr ) | num | name

FIRST sets for this grammar:

FIRST(Expr) = {(, name, num}
FIRST(Expr′) = {+, −, ε}
FIRST(Term) = {(, name, num}
FIRST(Term′) = {×, ÷, ε}
FIRST(Factor) = {(, name, num}

FOLLOW Set

• FOLLOW(X) is the set of terminals that can immediately follow X
  • That is, t ∈ FOLLOW(X) if there is any derivation containing Xt

[Figure: a parse tree in which the start symbol derives α A a β and A derives c … γ; terminal c is in FIRST(A) and terminal a is in FOLLOW(A).]

Steps to Compute FOLLOW Set

1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input end marker
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B)
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)

Apply rules 2 and 3 repeatedly until no FOLLOW set changes.
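The three rules can also be run to a fixed point. A minimal Python sketch, assuming the dict-of-tuples grammar encoding (`()` for ε), `"ε"` and `"$"` as markers, ASCII `*`, `/`, `-` for ×, ÷, −, and the FIRST sets of the nonterminals supplied as input (copied from the "Compute FIRST Set" slide); `compute_follow` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]
EPS, END = "ε", "$"


def first_of_string(symbols: Iterable[str], first: Dict[str, Set[str]]) -> Set[str]:
    """FIRST of a string of symbols; a terminal's FIRST set is itself."""
    result: Set[str] = set()
    for y in symbols:
        f = first.get(y, {y})
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}               # the whole string can vanish


def compute_follow(grammar: Grammar, start: str,
                   first: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    follow: Dict[str, Set[str]] = {nt: set() for nt in grammar}
    follow[start].add(END)                      # rule 1
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in grammar:        # only nonterminals get FOLLOW
                        continue
                    tail = first_of_string(rhs[i + 1:], first)
                    new = tail - {EPS}          # rule 2
                    if EPS in tail:             # rule 3 (empty tail: A -> αB)
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


EXPR_GRAMMAR: Grammar = {
    "Start":  [("Expr",)],
    "Expr":   [("Term", "Expr'")],
    "Expr'":  [("+", "Term", "Expr'"), ("-", "Term", "Expr'"), ()],
    "Term":   [("Factor", "Term'")],
    "Term'":  [("*", "Factor", "Term'"), ("/", "Factor", "Term'"), ()],
    "Factor": [("(", "Expr", ")"), ("num",), ("name",)],
}
FIRST: Dict[str, Set[str]] = {
    "Expr": {"(", "name", "num"}, "Expr'": {"+", "-", EPS},
    "Term": {"(", "name", "num"}, "Term'": {"*", "/", EPS},
    "Factor": {"(", "name", "num"},
}
```

`compute_follow(EXPR_GRAMMAR, "Start", FIRST)` reproduces the FOLLOW sets on the "Compute FOLLOW Set" slide. Treating an empty tail as FIRST = {ε} lets rule 3 cover both the A → αB and the A → αBβ cases with one test.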

Compute FOLLOW Set

FOLLOW sets for the same grammar:

FOLLOW(Expr) = {$, )}
FOLLOW(Expr′) = {$, )}
FOLLOW(Term) = {$, +, −, )}
FOLLOW(Term′) = {$, +, −, )}
FOLLOW(Factor) = {$, +, −, ×, ÷, )}

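Conditions for Backtrack-Free Grammar

• Consider a production A → β

$$\text{FIRST}^+(A \to \beta) = \begin{cases} \text{FIRST}(\beta) & \text{if } \epsilon \notin \text{FIRST}(\beta) \\ \text{FIRST}(\beta) \cup \text{FOLLOW}(A) & \text{otherwise} \end{cases}$$

• For any nonterminal A where A → β1 | β2 | … | βn, a backtrack-free grammar has the property

$$\text{FIRST}^+(A \to \beta_i) \cap \text{FIRST}^+(A \to \beta_j) = \emptyset, \quad \forall\, 1 \le i, j \le n,\ i \ne j$$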
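The two definitions translate directly into a check on one nonterminal. A minimal Python sketch, assuming FIRST(βi) and FOLLOW(A) are already known (for example from the slides) and `"ε"` marks the empty string; `first_plus` and `backtrack_free` are hypothetical names.

```python
from itertools import combinations
from typing import List, Set

EPS = "ε"


def first_plus(first_beta: Set[str], follow_a: Set[str]) -> Set[str]:
    """FIRST+(A -> β) from FIRST(β) and FOLLOW(A)."""
    if EPS not in first_beta:
        return set(first_beta)
    return first_beta | follow_a


def backtrack_free(first_of_alternatives: List[Set[str]], follow_a: Set[str]) -> bool:
    """True if the FIRST+ sets of A -> β1 | ... | βn are pairwise disjoint."""
    sets = [first_plus(f, follow_a) for f in first_of_alternatives]
    return all(not (s & t) for s, t in combinations(sets, 2))


# Expr' -> + Term Expr' | - Term Expr' | ε, with FOLLOW(Expr') = {$, )}
print(backtrack_free([{"+"}, {"-"}, {EPS}], {"$", ")"}))        # True

# Three alternatives that all start with name; FOLLOW is irrelevant here
# because no alternative derives ε.
print(backtrack_free([{"name"}, {"name"}, {"name"}], set()))    # False
```

The second call anticipates the Factor productions on the next slide, whose three alternatives share the first terminal name.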

Backtracking

Start → Expr
Expr → Term Expr′
Expr′ → + Term Expr′ | − Term Expr′ | ε
Term → Factor Term′
Term′ → × Factor Term′ | ÷ Factor Term′ | ε
Factor → name | name [ Arglist ] | name ( Arglist )
Arglist → Expr MoreArgs
MoreArgs → , Expr MoreArgs | ε

Not all productions are backtrack free: the three Factor alternatives all begin with name, so their FIRST+ sets intersect.

Left Factoring

• Left factoring is the process of extracting and isolating common prefixes in a set of productions
• Example: the Factor productions above become

Factor → name Arguments
Arguments → [ ArgList ] | ( ArgList ) | ε

• Algorithm

A → αβ1 | αβ2 | … | αβn | γ1 | γ2 | … | γj

is rewritten as

A → αB | γ1 | γ2 | … | γj
B → β1 | β2 | … | βn

where B is a new nonterminal.
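A minimal Python sketch of left factoring, assuming the dict-of-tuples grammar encoding (`()` for ε) and primed names for new nonterminals. It is one variant of the algorithm above: it groups alternatives by their first symbol, factors out the prefix shared by the whole group, and repeats on both the old and the new nonterminal until no two alternatives start with the same symbol. `left_factor` and `common_prefix` are hypothetical names, and `Factor'` stands in for the slide's `Arguments`.

```python
from typing import Dict, List, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]


def common_prefix(seqs: List[Tuple[str, ...]]) -> Tuple[str, ...]:
    """Longest prefix shared by every sequence in `seqs`."""
    prefix: List[str] = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return tuple(prefix)


def left_factor(grammar: Grammar, a: str) -> Grammar:
    g = {nt: list(alts) for nt, alts in grammar.items()}
    worklist = [a]
    while worklist:
        nt = worklist.pop()
        groups: Dict[str, List[Tuple[str, ...]]] = {}
        for rhs in g[nt]:
            if rhs:                             # ε alternatives share nothing
                groups.setdefault(rhs[0], []).append(rhs)
        shared = [grp for grp in groups.values() if len(grp) > 1]
        if not shared:
            continue
        group = shared[0]
        alpha = common_prefix(group)            # non-empty: same first symbol
        b = nt + "'"
        while b in g:                           # fresh nonterminal name
            b += "'"
        g[nt] = [rhs for rhs in g[nt] if rhs not in group] + [alpha + (b,)]
        g[b] = [rhs[len(alpha):] for rhs in group]
        worklist += [nt, b]                     # both may need more factoring
    return g


g = left_factor(
    {"Factor": [("name",), ("name", "[", "Arglist", "]"),
                ("name", "(", "Arglist", ")")]},
    "Factor",
)
print(g["Factor"], g["Factor'"])
```

Each step strictly shortens the alternatives moved into the new nonterminal, so the loop terminates; the result matches Factor → name Arguments with Arguments → ε | [ ArgList ] | ( ArgList ).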

Key Insight in Using Top-Down Parsing

• Efficiency depends on the accuracy of selecting the correct production for expanding a nonterminal
  • Parser may not terminate in the worst case
• A large subset of the context-free grammars can be parsed without backtracking


Recursive-Descent Parsing

• Recursive-descent parsing is a form of top-down parsing that may require backtracking
• Consists of a set of procedures, one for each nonterminal

void A() {
    choose an A-production, A → X1X2…Xk
    for i ← 1 … k
        if Xi is a nonterminal
            call procedure Xi()
        else if Xi equals the current input symbol a
            advance the input to the next symbol
        else
            // error
}

Limitations with Recursive-Descent Parsing

• Consider a grammar with two productions X → γ1 and X → γ2
• Suppose FIRST(γ1) ∩ FIRST(γ2) ≠ ∅
  • Say a is the common terminal symbol
• The function corresponding to X will not know which production to use on input token a

Recursive-Descent Parsing with Backtracking

• To support backtracking
  • All productions should be tried in some order
  • Failure for some production implies we need to try the remaining productions
  • Report an error only when there are no other rules
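The backtracking strategy above can be sketched with Python generators: each call yields every input position where a symbol can finish, so a failure later in a production transparently resumes the next untried alternative. This is a minimal sketch, assuming the dict-of-tuples grammar encoding (`()` for ε) and ASCII `*`, `/`, `-` for ×, ÷, −; it uses one generic procedure instead of one hand-written procedure per nonterminal, and the names `parse_symbol`, `parse_sequence`, and `recognizes` are hypothetical. The grammar is the one from the "Backtracking" slide, whose Factor alternatives all start with name.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

ARG_GRAMMAR: Grammar = {
    "Start":    [("Expr",)],
    "Expr":     [("Term", "Expr'")],
    "Expr'":    [("+", "Term", "Expr'"), ("-", "Term", "Expr'"), ()],
    "Term":     [("Factor", "Term'")],
    "Term'":    [("*", "Factor", "Term'"), ("/", "Factor", "Term'"), ()],
    "Factor":   [("name",), ("name", "[", "Arglist", "]"),
                 ("name", "(", "Arglist", ")")],
    "Arglist":  [("Expr", "MoreArgs")],
    "MoreArgs": [(",", "Expr", "MoreArgs"), ()],
}


def parse_symbol(grammar: Grammar, symbol: str,
                 tokens: Sequence[str], pos: int) -> Iterator[int]:
    """Yield each position where `symbol`, starting at `pos`, can end."""
    if symbol not in grammar:                   # terminal: match one token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for rhs in grammar[symbol]:                 # try productions in order
        yield from parse_sequence(grammar, rhs, tokens, pos)


def parse_sequence(grammar: Grammar, symbols: Tuple[str, ...],
                   tokens: Sequence[str], pos: int) -> Iterator[int]:
    if not symbols:
        yield pos
        return
    for mid in parse_symbol(grammar, symbols[0], tokens, pos):
        yield from parse_sequence(grammar, symbols[1:], tokens, mid)


def recognizes(grammar: Grammar, start: str, tokens: Sequence[str]) -> bool:
    # Report an error only after every combination of choices has failed.
    return any(end == len(tokens)
               for end in parse_symbol(grammar, start, tokens, 0))


print(recognizes(ARG_GRAMMAR, "Start", ["name", "(", "name", ",", "name", ")"]))
```

On `name ( name , name )` the parser first accepts Factor → name, finds input left over, and backtracks into the third Factor alternative. Like any backtracking top-down parser, it requires a grammar without left recursion and can take exponential time in the worst case.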

Predictive Parsing

• Special case of recursive-descent parsing that does not require backtracking
• The lookahead symbol unambiguously determines which production rule to use
• Advantage: the algorithm is simple and the parser can be constructed by hand

stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other
optexpr → ε | expr

Pseudocode for a Predictive Parser

void stmt() {
    switch (lookahead) {
    case expr:
        match(expr); match(';'); break;
    case if:
        match(if); match('('); match(expr); match(')'); stmt();
        break;
    case for:
        match(for); match('(');
        optexpr(); match(';'); optexpr(); match(';'); optexpr();
        match(')'); stmt(); break;
    case other:
        match(other); break;
    default:
        report("syntax error");
    }
}

void optexpr() {
    if (lookahead == expr) match(expr);
}

void match(terminal t) {
    if (lookahead == t) lookahead = nextTerminal;
    else report("syntax error");
}
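The pseudocode translates almost line for line into runnable Python. A minimal sketch, assuming tokens are plain strings in which `expr` and `other` are single terminals, `"$"` marks the end of input, and a `ParseError` exception replaces `report("syntax error")`; the class and function names are hypothetical.

```python
from typing import Sequence


class ParseError(Exception):
    """Raised where the pseudocode calls report("syntax error")."""


class StmtParser:
    """Predictive parser for
    stmt    -> expr ; | if ( expr ) stmt
             | for ( optexpr ; optexpr ; optexpr ) stmt | other
    optexpr -> ε | expr
    """

    def __init__(self, tokens: Sequence[str]) -> None:
        self.tokens = list(tokens) + ["$"]     # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self) -> str:
        return self.tokens[self.pos]

    def match(self, terminal: str) -> None:
        if self.lookahead != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def stmt(self) -> None:
        la = self.lookahead                    # one token picks the production
        if la == "expr":
            self.match("expr")
            self.match(";")
        elif la == "if":
            for t in ("if", "(", "expr", ")"):
                self.match(t)
            self.stmt()
        elif la == "for":
            self.match("for")
            self.match("(")
            self.optexpr()
            self.match(";")
            self.optexpr()
            self.match(";")
            self.optexpr()
            self.match(")")
            self.stmt()
        elif la == "other":
            self.match("other")
        else:
            raise ParseError("syntax error")

    def optexpr(self) -> None:
        if self.lookahead == "expr":           # otherwise use optexpr -> ε
            self.match("expr")


def parse_stmt(tokens: Sequence[str]) -> bool:
    parser = StmtParser(tokens)
    parser.stmt()
    parser.match("$")                          # the whole input must be used
    return True


print(parse_stmt(["for", "(", ";", "expr", ";", "expr", ")", "other"]))  # True
```

No procedure ever tries a second alternative: the lookahead alone selects the branch, which is what makes the parser predictive.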

LL(1) Grammars

• Class of grammars for which no backtracking is required
  • The first L stands for left-to-right scan, the second L for leftmost derivation
  • There is one lookahead token
• No left-recursive or ambiguous grammar can be LL(1)
• In LL(k), k stands for k lookahead tokens
  • Predictive parsers accept LL(k) grammars
  • Every LL(1) grammar is an LL(2) grammar

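Nonrecursive Table-Driven Predictive Parser

[Figure: model of a table-driven predictive parser. The predictive parsing program reads the input buffer (a + b $), uses a stack of grammar symbols (X, Y, Z above the bottom marker $), consults parsing table M, and writes the output.]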

Predictive Parsing Algorithm

• Input: String w and parsing table M for grammar G
• Initially, w$ is in the input buffer and the stack holds the start symbol S on top of $
• Algorithm:

    let a be the first symbol of w
    let X be the symbol at the top of the stack
    while X ≠ $:
        if X == a:
            pop the stack and let a be the next symbol of w
        else if X is a terminal or M[X, a] is an error entry:
            error
        else if M[X, a] == X → Y1Y2…Yk:
            output the production X → Y1Y2…Yk
            pop the stack
            push Yk, Yk−1, …, Y1 onto the stack, with Y1 on top
        X ← the top stack symbol

Predictive Parsing Table

E → TE′
E′ → +TE′ | ε
T → FT′
T′ → ∗FT′ | ε
F → ( E ) | id

| Nonterminal | id | + | * | ( | ) | $ |
| E | E → TE′ | | | E → TE′ | | |
| E′ | | E′ → +TE′ | | | E′ → ε | E′ → ε |
| T | T → FT′ | | | T → FT′ | | |
| T′ | | T′ → ε | T′ → ∗FT′ | | T′ → ε | T′ → ε |
| F | F → id | | | F → ( E ) | | |
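The predictive parsing algorithm, driven by the table above, can be sketched in Python. This is a minimal sketch, assuming tokens are strings, `()` encodes an ε right-hand side, `*` stands for ∗, and `ParseError` replaces the "error" step; `TABLE` and `predictive_parse` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

END = "$"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[(nonterminal, lookahead)] = right-hand side (() is ε); absent = error.
TABLE: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", END): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (), ("T'", END): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}


class ParseError(Exception):
    """Raised for an error entry or a terminal mismatch."""


def predictive_parse(tokens: Sequence[str], start: str = "E") -> List[str]:
    """Return the productions the parser outputs, in order."""
    inp = list(tokens) + [END]
    stack = [END, start]                 # start symbol on top of $
    i = 0
    output: List[str] = []
    while stack[-1] != END:
        x, a = stack[-1], inp[i]         # X = top of stack, a = lookahead
        if x == a:                       # terminal matches: pop and advance
            stack.pop()
            i += 1
        elif x not in NONTERMINALS or (x, a) not in TABLE:
            raise ParseError(f"unexpected {a!r} with {x!r} on top of the stack")
        else:
            rhs = TABLE[(x, a)]
            output.append(f"{x} -> {' '.join(rhs) or 'ε'}")
            stack.pop()
            stack.extend(reversed(rhs))  # push Yk ... Y1, leaving Y1 on top
    if inp[i] != END:
        raise ParseError("input left over after the stack emptied")
    return output


for production in predictive_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

For id + id ∗ id the printed productions are exactly the "Output" actions of the trace on the "Working of Predictive Parser" slides, from E → TE′ through E′ → ε.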

Construction of a Predictive Parsing Table

• Input: Grammar G
• Algorithm: for each production A → α in G,
  • For each terminal a in FIRST(α), add A → α to M[A, a]
  • If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A, b]
  • If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well
• No production in M[A, a] indicates an error
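A minimal Python sketch of the table construction, assuming the FIRST and FOLLOW sets of the nonterminals are given (the values below are the standard ones for the E/T/F grammar and for the dangling-else grammar on the following slides), `"ε"` and `"$"` as markers, and `()` for an ε right-hand side; `build_table` and `conflicts` are hypothetical names. Each cell holds a list, so a multiply-defined cell is easy to detect.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]
Table = Dict[Tuple[str, str], List[Tuple[str, ...]]]
EPS, END = "ε", "$"


def first_of_string(symbols: Iterable[str], first: Dict[str, Set[str]]) -> Set[str]:
    result: Set[str] = set()
    for y in symbols:
        f = first.get(y, {y})            # a terminal's FIRST set is itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}


def build_table(grammar: Grammar, first: Dict[str, Set[str]],
                follow: Dict[str, Set[str]]) -> Table:
    table: Table = defaultdict(list)
    for a, alternatives in grammar.items():
        for alpha in alternatives:
            f = first_of_string(alpha, first)
            for t in f - {EPS}:
                table[(a, t)].append(alpha)
            if EPS in f:
                for b in follow[a]:      # FOLLOW sets already include $
                    table[(a, b)].append(alpha)
    return dict(table)


def conflicts(table: Table) -> Table:
    """Multiply-defined cells; an empty result means the grammar is LL(1)."""
    return {cell: rhss for cell, rhss in table.items() if len(rhss) > 1}


EXPR: Grammar = {"E": [("T", "E'")], "E'": [("+", "T", "E'"), ()],
                 "T": [("F", "T'")], "T'": [("*", "F", "T'"), ()],
                 "F": [("(", "E", ")"), ("id",)]}
EXPR_FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
              "T'": {"*", EPS}, "F": {"(", "id"}}
EXPR_FOLLOW = {"E": {")", END}, "E'": {")", END}, "T": {"+", ")", END},
               "T'": {"+", ")", END}, "F": {"+", "*", ")", END}}

IF_ELSE: Grammar = {"S": [("i", "E", "t", "S", "S'"), ("a",)],
                    "S'": [("e", "S"), ()], "E": [("b",)]}
IF_FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"}}
IF_FOLLOW = {"S": {"e", END}, "S'": {"e", END}, "E": {"t"}}

print(conflicts(build_table(EXPR, EXPR_FIRST, EXPR_FOLLOW)))      # {}
print(conflicts(build_table(IF_ELSE, IF_FIRST, IF_FOLLOW)))
```

The second call reports the single multiply-defined cell M[S′, e], holding both S′ → eS and S′ → ε.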

Working of Predictive Parser

| Matched | Stack | Input | Action |
| | E$ | id + id ∗ id$ | |
| | TE′$ | id + id ∗ id$ | Output E → TE′ |
| | FT′E′$ | id + id ∗ id$ | Output T → FT′ |
| | id T′E′$ | id + id ∗ id$ | Output F → id |
| id | T′E′$ | + id ∗ id$ | Match id |
| id | E′$ | + id ∗ id$ | Output T′ → ε |
| id | + TE′$ | + id ∗ id$ | Output E′ → +TE′ |
| id + | TE′$ | id ∗ id$ | Match + |
| id + | FT′E′$ | id ∗ id$ | Output T → FT′ |
| id + | id T′E′$ | id ∗ id$ | Output F → id |
| id + id | T′E′$ | ∗ id$ | Match id |
| id + id | ∗ FT′E′$ | ∗ id$ | Output T′ → ∗FT′ |
| id + id ∗ | FT′E′$ | id$ | Match ∗ |
| id + id ∗ | id T′E′$ | id$ | Output F → id |
| id + id ∗ id | T′E′$ | $ | Match id |
| id + id ∗ id | E′$ | $ | Output T′ → ε |
| id + id ∗ id | $ | $ | Output E′ → ε |

Predictive Parsing

• Grammars whose predictive parsing tables contain no multiply-defined entries are called LL(1)

• If grammar G is left-recursive or ambiguous, then parsing table M will have at least one multiply-defined cell
• Some grammars cannot be transformed into LL(1)
  • The following grammar is ambiguous:

S → iEtSS′ | a
S′ → eS | ε
E → b

Predictive Parsing Table

S → iEtSS′ | a
S′ → eS | ε
E → b

| Nonterminal | a | b | e | i | t | $ |
| S | S → a | | | S → iEtSS′ | | |
| S′ | | | S′ → eS, S′ → ε | | | S′ → ε |
| E | | E → b | | | | |

M[S′, e] is multiply defined, so this grammar is not LL(1).

Error Recovery in Predictive Parsing

• Error conditions
  • Terminal on top of the stack does not match the next input symbol
  • Nonterminal A is on top of the stack, a is the next input symbol, and M[A, a] is error
• Choices
  • Raise an error and quit parsing
  • Print an error message, try to recover from the error, and continue with compilation

Error Recovery in Predictive Parsing

• Panic mode: skip over symbols until a token in a set of synchronizing (synch) tokens appears
  • Add all tokens in FOLLOW(A) to the synch set for A
  • Add symbols in FIRST(A) to the synch set for A
  • Add keywords that can begin sentences
  • …

Predictive Parsing Table with Synchronizing Tokens

E → TE′
E′ → +TE′ | ε
T → FT′
T′ → ∗FT′ | ε
F → ( E ) | id

| Nonterminal | id | + | * | ( | ) | $ |
| E | E → TE′ | | | E → TE′ | synch | synch |
| E′ | | E′ → +TE′ | | | E′ → ε | E′ → ε |
| T | T → FT′ | synch | | T → FT′ | synch | synch |
| T′ | | T′ → ε | T′ → ∗FT′ | | T′ → ε | T′ → ε |
| F | F → id | synch | synch | F → ( E ) | synch | synch |
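Panic-mode recovery plugs into the table-driven driver with three extra moves: a blank cell skips the input symbol, a synch cell pops the nonterminal, and a mismatched terminal on the stack is popped. A minimal Python sketch using the table above, assuming `()` for ε, `*` for ∗, and hypothetical names `TABLE` and `parse_with_recovery`. One tie-break is an assumption of this sketch: when the nonterminal is the only symbol above $, popping it would abandon the rest of the input, so the sketch skips the input symbol instead; this reproduces the first move (skip `)`) of the trace on the next slide.

```python
from typing import Dict, List, Sequence, Tuple, Union

SYNCH, END = "synch", "$"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# A tuple is a right-hand side, SYNCH marks a synchronizing entry,
# and a missing key is a blank (error) entry.
TABLE: Dict[Tuple[str, str], Union[Tuple[str, ...], str]] = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E", ")"): SYNCH, ("E", END): SYNCH,
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", END): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", END): SYNCH,
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (), ("T'", END): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", END): SYNCH,
}


def parse_with_recovery(tokens: Sequence[str], start: str = "E"
                        ) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (productions output, recovery actions taken)."""
    inp = list(tokens) + [END]
    stack = [END, start]
    i = 0
    output: List[str] = []
    errors: List[Tuple[str, str]] = []
    while stack[-1] != END:
        x, a = stack[-1], inp[i]
        if x == a:                                # terminal matches
            stack.pop()
            i += 1
        elif x not in NONTERMINALS:               # mismatched terminal: pop it
            errors.append(("pop", x))
            stack.pop()
        else:
            entry = TABLE.get((x, a))
            if entry is None or entry == SYNCH:
                lone = len(stack) == 2            # assumption: x alone above $
                if a != END and (entry is None or lone):
                    errors.append(("skip", a))    # discard the input symbol
                    i += 1
                else:
                    errors.append(("pop", x))     # synch: pop x and resume
                    stack.pop()
            else:
                output.append(f"{x} -> {' '.join(entry) or 'ε'}")
                stack.pop()
                stack.extend(reversed(entry))
    return output, errors


print(parse_with_recovery([")", "id", "*", "+", "id"])[1])
# [('skip', ')'), ('pop', 'F')]
```

The two recovery actions match the "Error, skip )" and "Error, M[F, +] = synch" remarks in the trace that follows; every other step is an ordinary LL(1) move.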

Error Recovery Moves by Predictive Parser

| Stack | Input | Remark |
| E$ | ) id ∗ + id$ | Error, skip ) |
| E$ | id ∗ + id$ | id is in FIRST(E) |
| TE′$ | id ∗ + id$ | |
| FT′E′$ | id ∗ + id$ | |
| id T′E′$ | id ∗ + id$ | |
| T′E′$ | ∗ + id$ | |
| ∗ FT′E′$ | ∗ + id$ | |
| FT′E′$ | + id$ | Error, M[F, +] = synch |
| T′E′$ | + id$ | F has been popped |
| E′$ | + id$ | |
| + TE′$ | + id$ | |
| TE′$ | id$ | |
| FT′E′$ | id$ | |
| id T′E′$ | id$ | |
| T′E′$ | $ | |
| E′$ | $ | |
| $ | $ | |

References

• A. Aho et al. Compilers: Principles, Techniques, and Tools, 2nd edition, Chapter 4.4.
• K. Cooper and L. Torczon. Engineering a Compiler, 2nd edition, Chapter 3.3.
