Bottom-Up Parsing Swarnendu Biswas
Total Page:16
File Type:pdf, Size:1020Kb
CS 335: Bottom-up Parsing Swarnendu Biswas Semester 2019-2020-II CSE, IIT Kanpur Content influenced by many excellent references, see References slide for acknowledgements. Rightmost Derivation of 푎푏푏푐푑푒 푆 → 푎퐴퐵푒 Input string: 푎푏푏푐푑푒 퐴 → 퐴푏푐 | 푏 푆 → 푎퐴퐵푒 퐵 → 푑 → 푎퐴푑푒 → 푎퐴푏푐푑푒 → 푎푏푏푐푑푒 푆 푟푚 푆 푟푚 푆 푟푚 푆 푟푚 푆 푎 퐴 퐵 푒 푎 퐴 퐵 푒 푎 퐴 퐵 푒 푎 퐴 퐵 푒 푑 퐴 푏 푐 푑 퐴 푏 푐 푑 푏 CS 335 Swarnendu Biswas Bottom-up Parsing Constructs the parse tree starting from the leaves and working up toward the root 푆 → 푎퐴퐵푒 Input string: 푎푏푏푐푑푒 퐴 → 퐴푏푐 | 푏 푆 → 푎퐴퐵푒 푎푏푏푐푑푒 퐵 → 푑 → 푎퐴푑푒 → 푎퐴푏푐푑푒 → 푎퐴푏푐푑푒 → 푎퐴푑푒 → 푎푏푏푐푑푒 → 푎퐴퐵푒 → 푆 reverse of rightmost CS 335 Swarnendu Biswas derivation Bottom-up Parsing 푆 → 푎퐴퐵푒 Input string: 푎푏푏푐푑푒 퐴 → 퐴푏푐 | 푏 푎푏푏푐푑푒 퐵 → 푑 → 푎퐴푏푐푑푒 → 푎퐴푑푒 → 푎퐴퐵푒 → 푆 푎푏푏푐푑푒⇒ 푎 퐴 푏 푐 푑 푒 ⇒ 푎 퐴 푑 푒 ⇒ 푎 퐴 퐵 푒 ⇒ 푆 푏 퐴 푏 푐 퐴 푏 푐 푑 푎 퐴 퐵 푒 푏 푏 퐴 푏 푐 푑 푏 CS 335 Swarnendu Biswas Reduction • Bottom-up parsing reduces a string 푤 to the start symbol 푆 • At each reduction step, a chosen substring that is the rhs (or body) of a production is replaced by the lhs (or head) nonterminal Derivation 푆 훾0 훾1 훾2 … 훾푛−1 훾푛 = 푤 푟푚 푟푚 푟푚 푟푚 푟푚 푟푚 Bottom-up Parser CS 335 Swarnendu Biswas Handle • Handle is a substring that matches the body of a production • Reducing the handle is one step in the reverse of the rightmost derivation Right Sentential Form Handle Reducing Production 퐸 → 퐸 + 푇 | 푇 ∗ 퐹 → 푇 → 푇 ∗ 퐹 | 퐹 id1 id2 id1 id 퐹 → 퐸 | id 퐹 ∗ id2 퐹 푇 → 퐹 푇 ∗ id2 id2 퐹 → id 푇 ∗ 퐹 푇 ∗ 퐹 푇 → 푇 ∗ 퐹 푇 푇 퐸 → 푇 CS 335 Swarnendu Biswas Handle Although 푇 is the body of the production 퐸 → 푇, 푇 is not a handle in the sentential form 푇 ∗ id2 Right Sentential Form Handle Reducing Production 퐸 → 퐸 + 푇 | 푇 ∗ 퐹 → 푇 → 푇 ∗ 퐹 | 퐹 id1 id2 id1 id 퐹 → 퐸 | id 퐹 ∗ id2 퐹 푇 → 퐹 푇 ∗ id2 id2 퐹 → id 푇 ∗ 퐹 푇 ∗ 퐹 푇 → 푇 ∗ 퐹 푇 푇 퐸 → 푇 CS 335 Swarnendu Biswas Handle • If 푆 ∗ 훼퐴푤 훼훽푤, then 퐴 → 푟푚 푟푚 푆 훽 is a handle of 훼훽푤 • String 푤 right of a handle must 퐴 contain only terminals 훼 훽 푤 A handle 퐴 → 훽 in the parse tree for 훼훽푤 CS 335 Swarnendu Biswas Handle If grammar 퐺 is unambiguous, then every right sentential form has only one handle If 퐺 is ambiguous, then there can be more than one rightmost derivation of 훼훽푤 CS 335 Swarnendu Biswas Shift-Reduce Parsing CS 335 Swarnendu Biswas Shift-Reduce Parsing • Type of bottom-up parsing with two primary actions, shift and reduce • Other obvious actions are accept and error • The input string (i.e., being parsed) consists of two parts • Left part is a string of terminals and nonterminals, and is stored in stack • Right part is a string of terminals read from an input buffer • Bottom of the stack and end of input are represented by $ CS 335 Swarnendu Biswas Shift-Reduce Actions • Shift: shift the next input symbol from the right string onto the top of the stack • Reduce: identify a string on top of the stack that is the body of a production, and replace the body with the head CS 335 Swarnendu Biswas Shift-Reduce Parsing • Initial Stack Input $ 푤$ ⇒ Shift ∗ Reduce • Final goal Stack Input $푆 $ CS 335 Swarnendu Biswas 퐸 → 퐸 + 푇 | 푇 Shift-Reduce Parsing 푇 → 푇 ∗ 퐹 | 퐹 퐹 → 퐸 | id Stack Input Action $ 퐢퐝1 ∗ 퐢퐝2$ Shift $퐢퐝1 ∗ 퐢퐝2$ Reduce by 퐹 → id $퐹 ∗ 퐢퐝2$ Reduce by 푇 → 퐹 $푇 ∗ 퐢퐝2$ Shift $푇 ∗ 퐢퐝2$ Shift $푇 ∗ 퐢퐝2 $ Reduce by 퐹 → id $푇 ∗ 퐹 $ Reduce by 푇 → 푇 ∗ 퐹 $푇 $ Reduce by 퐸 → 푇 $퐸 $ Accept CS 335 Swarnendu Biswas Handle on Top of the Stack • Is the following scenario possible? Stack Input Action … $ 훼훽훾 푤$ Reduce by 퐴 → 훾 $ 훼훽퐴 푤$ Reduce by 퐵 → 훽 $훼퐵퐴 푤$ … CS 335 Swarnendu Biswas Possible Choices in Rightmost Derivation 1. 푆 훼퐴푧 훼훽퐵푦푧 훼훽훾푦푧 2. 푆 훼퐵푥퐴푧 훼퐵푥푦푧 훼훾푥푦푧 푟푚 푟푚 푟푚 푟푚 푟푚 푟푚 푆 푆 퐴 퐵 퐴 퐵 훼 훾 푥 푦 푧 훼 훽 훾 푦 푧 CS 335 Swarnendu Biswas Handle on Top of the Stack • Is the following scenario possible? Stack Input Action … $ 훼훽훾 푤$ Reduce by 퐴 → 훾 Handle$ 훼훽퐴 always eventually appears on top of푤$ theReduce stack, by 퐵 never→ 훽 inside $훼퐵퐴 푤$ … CS 335 Swarnendu Biswas Shift-Reduce Actions • Shift: shift the next input symbol from the right string onto the top of the stack • Reduce: identify a string on top of the stack that is the body of a production, and replace the body with the head How do you decide when to shift and when to reduce? CS 335 Swarnendu Biswas Steps in Shift-Reduce Parsers General shift-reduce technique If there is no handle on the stack, then shift If there is a handle on the stack, then reduce • Bottom up parsing is essentially the process of detecting handles and reducing them • Different bottom-up parsers differ in the way they detect handles CS 335 Swarnendu Biswas Challenges in Bottom-up Parsing Which action do you •Both shift and reduce are valid, pick when there is a choice? implies a shift-reduce conflict Which rule to use if reduction is possible by more than one •Reduce-reduce conflict rule? CS 335 Swarnendu Biswas Shift-Reduce Conflict 퐸 → 퐸 + 퐸 퐸 ∗ 퐸 id id + id ∗ id id + id ∗ id Stack Input Action Stack Input Action $ id + id ∗ id$ Shift $ id + id ∗ id$ Shift … … $퐸 + 퐸 ∗ id$ Reduce by 퐸 → 퐸 + 퐸 $퐸 + 퐸 ∗ id$ Shift $퐸 ∗ id$ Shift $퐸 + 퐸 ∗ id$ Shift $퐸 ∗ id$ Shift $퐸 + 퐸 ∗ id $ Reduce by 퐸 → id $퐸 ∗ id $ Reduce by 퐸 → id $퐸 + 퐸 ∗ 퐸 $ Reduce by 퐸 → 퐸 ∗ 퐸 $퐸 ∗ 퐸 $ Reduce by 퐸 → 퐸 ∗ 퐸 $퐸 + 퐸 $ Reduce by 퐸 → 퐸 + 퐸 $퐸 $ $퐸 $ CS 335 Swarnendu Biswas Shift-Reduce Conflict S푡푚푡 → if 퐸푥푝푟 then 푆푡푚푡 | if 퐸푥푝푟 then 푆푡푚푡 else 푆푡푚푡 | 표푡ℎ푒푟 Stack Input Action … if 퐸푥푝푟 then 푆푡푚푡 else … $ CS 335 Swarnendu Biswas Shift-Reduce Conflict S푡푚푡 → if 퐸푥푝푟 then 푆푡푚푡 | if 퐸푥푝푟 then 푆푡푚푡 else 푆푡푚푡 | 표푡ℎ푒푟 Stack Input Action … if 퐸푥푝푟 then 푆푡푚푡 else … $ What is a correct thing to do for this grammar – shift or reduce? CS 335 Swarnendu Biswas 푀 → 푅 + 푅 푅 + 푐 푅 Reduce-Reduce Conflict 푅 → 푐 푐 + 푐 푐 + 푐 Stack Input Action Stack Input Action $ 푐 + 푐$ Shift $ 푐 + 푐$ Shift $푐 +푐$ Reduce by 푅 → 푐 $푐 +푐$ Reduce by 푅 → 푐 $푅 +푐$ Shift $푅 +푐$ Shift $푅 + 푐$ Shift $푅 + 푐$ Shift $푅 + 푐 $ Reduce by 푅 → 푐 $푅 + 푐 $ Reduce by 푀 → 푅 + 푐 $푅 + 푅 $ Reduce by 푅 → 푅 + 푅 $푀 $ $푀 $ CS 335 Swarnendu Biswas LR Parsing CS 335 Swarnendu Biswas LR(k) Parsing • Popular bottom-up parsing scheme • L is for left-to-right scan of input • R is for reverse of rightmost derivation • k is the number of lookahead symbols • LR parsers are table-driven, like the nonrecursive LL parser • LR grammar is one for which we can construct an LR parsing table CS 335 Swarnendu Biswas Popularity of LR Parsing Can recognize all language constructs with CFGs Most general nonbacktracking shift-reduce parsing method Works for a superset of grammars parsed with predictive or LL parsers Why? CS 335 Swarnendu Biswas Popularity of LR Parsing Can recognize all language constructs with CFGs Most general nonbacktracking shift-reduce parsing method Works for a superset of grammars parsed with predictive or LL parsers • LL(k) parsing predicts which production to use having seen only the first k tokens of the right-hand side • LR(k) parsing can decide after it has seen input tokens corresponding to the entire right-hand side of the production CS 335 Swarnendu Biswas Block Diagram of LR Parser Input 푎1 … … 푎푖 … … 푎푛 $ LR Parsing Stack 푠푚 Output Program 푠푚−1 … ACTION GOTO $ Parse Table CS 335 Swarnendu Biswas LR Parsing • Remember the basic question: when to shift and when to reduce! • Information is encoded in a DFA constructed using canonical LR(0) collection I. Augmented grammar 퐺′ with new start symbol 푆′ and rule 푆′ → 푆 II. Define helper functions Closure() and Goto() CS 335 Swarnendu Biswas LR(0) Item • An LR(0) item (also called item) Production Items of a grammar 퐺 is a production of 퐺 with a dot at some position 퐴 → •푋푌푍 in the body 퐴 → 푋•푌푍 퐴 → 푋푌푍 퐴 → 푋푌•푍 • An item indicates how much of a production we have seen 퐴 → 푋푌푍• • Symbols on the left of “•” are already on the stack 퐴 → •푋푌푍 indicates that we expect a • Symbols on the right of “•” are string derivable from 푋푌푍 next on expected in the input the input CS 335 Swarnendu Biswas Closure Operation • Let 퐼 be a set of items for a grammar 퐺 • Closure(퐼) is constructed by 1. Add every item in 퐼 to Closure(퐼) 2. If 퐴 → 훼•퐵훽 is in Closure(퐼) and 퐵 → 훾 is a rule, then add 퐵 → 훾 to Closure(퐼) if not already added 3. Repeat until no more new items can be added to Closure(퐼) CS 335 Swarnendu Biswas Example of Closure Suppose 퐼 = {퐸′ → •퐸 }, compute 퐸′ → 퐸 퐸 → 퐸 + 푇 | 푇 Closure(퐼) 푇 → 푇 ∗ 퐹 | 퐹 퐹 → 퐸 | id CS 335 Swarnendu Biswas Example of Closure Suppose 퐼 = {퐸′ → •퐸 } 퐸′ → 퐸 퐸 → 퐸 + 푇 | 푇 Closure(퐼) = { 푇 → 푇 ∗ 퐹 | 퐹 퐸′ → •퐸, 퐹 → 퐸 | id 퐸 → •퐸 + 푇, 퐸 → •푇, 푇 → •푇 ∗ 퐹, 푇 → •퐹, 퐹 → • 퐸 , 퐹 → •id } CS 335 Swarnendu Biswas Kernel and Nonkernel Items • If one 퐵-production is added to Closure(퐼) with the dot at the left end, then all 퐵-productions will be added to the closure • Kernel items • Initial item 푆′ → •푆, and all items whose dots are not at the left end • Nonkernel items • All items with their dots at the left end, except for 푆′ → •푆 CS 335 Swarnendu Biswas Goto Operation • Suppose 퐼 is a set of items and 푋 is a grammar symbol • Goto(퐼, 푋) is the closure of set all items [퐴 → 훼푋•훽] such that [퐴 → 훼•푋훽] is in 퐼 • If 퐼 is a set of items for some valid prefix 훼, then Goto(퐼,푋) is set of valid items for prefix 훼푋 • Intuitively, Goto(퐼, 푋) defines the transitions in the LR(0) automaton • Goto(퐼, 푋) gives the transition from state 퐼 under input 푋 CS 335 Swarnendu Biswas Example of Goto • Compute Goto(퐼, +) 퐸′ → 퐸 퐸 → 퐸 + 푇 | 푇 푇 → 푇 ∗ 퐹 | 퐹 퐹 → 퐸 | id Suppose 퐼 = { 퐸′ → 퐸•, 퐸 → 퐸• + 푇 } CS 335 Swarnendu Biswas Example of Goto Goto(퐼, +) = { 퐸′ → 퐸 퐸 → 퐸 + 푇 | 푇 퐸 → 퐸 + •푇, 푇 → 푇 ∗ 퐹 | 퐹 푇 → •푇 ∗ 퐹, 퐹 → 퐸 | id 푇 → •퐹, 퐹 → • 퐸 , Suppose 퐼 = { 퐹 → •id 퐸′ → 퐸•, } 퐸 → 퐸• + 푇 } CS 335