COMP 181: What Is the Tufts Mascot? “Jumbo” the Elephant
COMP 181, Lecture 6: Top-down Parsing
September 21, 2006
Tufts University Computer Science

Prelude
• What is the Tufts mascot? “Jumbo” the elephant
• Why?
  • P. T. Barnum was an original trustee of Tufts
  • 1884: donated $50,000 for a natural museum on campus (Barnum Museum, later Barnum Hall)
  • “Jumbo”: famous circus elephant
  • 1885: Jumbo died, was stuffed, donated to Tufts
  • 1975: Fire destroyed Barnum Hall, Jumbo

Last time
• Finished scanning
  • Produces a stream of tokens
  • Removes things we don’t care about, like white space and comments
• Context-free grammars
  • Formal description of language syntax
  • Deriving strings using CFG
  • Depicting derivation as a parse tree

Grammar issues
• Often: more than one way to derive a string
  • Why is this a problem?
• Parsing: is string a member of L(G)?
  • We want more than a yes or no answer
• Key:
  • Represent the derivation as a parse tree
  • We want the structure of the parse tree to capture the meaning of the sentence

Parse tree: x - 2 * y

  #   Production rule
  1   expr → expr op expr
  2        | number
  3        | identifier
  4   op   → +
  5        | -
  6        | *
  7        | /

Right-most derivation:

  Rule  Sentential form
  -     expr
  1     expr op expr
  3     expr op <id,y>
  6     expr * <id,y>
  1     expr op expr * <id,y>
  2     expr op <num,2> * <id,y>
  5     expr - <num,2> * <id,y>
  3     <id,x> - <num,2> * <id,y>

[Figure: parse tree. The root expr expands to expr op expr with op = * and y on the right; the left expr expands to expr op expr, giving x - 2.]

Abstract syntax tree
• Parse tree contains extra junk
  • Eliminate intermediate nodes
  • Move operators up to parent nodes
• Result: abstract syntax tree

[Figure: the parse tree above next to its abstract syntax tree: * at the root with children - and y; - has children x and 2.]

Left vs right derivations
• Two derivations of “x - 2 * y”

Left-most derivation:

  Rule  Sentential form
  -     expr
  1     expr op expr
  3     <id,x> op expr
  5     <id,x> - expr
  1     <id,x> - expr op expr
  2     <id,x> - <num,2> op expr
  6     <id,x> - <num,2> * expr
  3     <id,x> - <num,2> * <id,y>

Right-most derivation:

  Rule  Sentential form
  -     expr
  1     expr op expr
  3     expr op <id,y>
  6     expr * <id,y>
  1     expr op expr * <id,y>
  2     expr op <num,2> * <id,y>
  5     expr - <num,2> * <id,y>
  3     <id,x> - <num,2> * <id,y>

Derivations
• One captures meaning, the other doesn’t
  • Left-most derivation: - at the root, with children x and *; * has children 2 and y. This reads as x - (2 * y).
  • Right-most derivation: * at the root, with children - and y; - has children x and 2. This reads as (x - 2) * y.

With precedence
• Last time: ways to force the right tree shape
• Add productions to represent precedence

Original grammar:

  #   Production rule
  1   expr → expr op expr
  2        | number
  3        | identifier
  4   op   → +
  5        | -
  6        | *
  7        | /

Grammar with precedence:

  #   Production rule
  1   expr   → expr + term
  2          | expr - term
  3          | term
  4   term   → term * factor
  5          | term / factor
  6          | factor
  7   factor → number
  8          | identifier

[Figure: two parse trees for x - 2 * y. Original grammar: expr → expr op expr with op = * and y on the right, x - 2 on the left. Precedence grammar: expr → expr - term, where the left expr derives term → fact → x and the right term derives term * fact, with term → fact → 2 and fact → y.]

Parsing
• What is parsing?
  • Discovering the derivation of a string, if one exists
  • Harder than generating strings (not surprisingly)
• Two major approaches
  • Top-down parsing
  • Bottom-up parsing
• Don’t work on all context-free grammars
  • Properties of grammar determine parse-ability
  • Our goal: make parsing efficient
  • We may be able to transform a grammar

Two approaches
• Top-down parsers: LL(1), recursive descent
  • Start at the root of the parse tree and grow toward leaves
  • Pick a production & try to match the input
  • Bad “pick” → may need to backtrack
• Bottom-up parsers: LR(1), operator precedence
  • Start at the leaves and grow toward root
  • As input is consumed, encode possible parse trees in an internal state (similar to our NFA → DFA conversion)
  • Bottom-up parsers handle a large class of grammars

Grammars and parsers
• LL(1) parsers
  • Left-to-right input
  • Leftmost derivation
  • 1 symbol of look-ahead
  • Grammars that this can handle are called LL(1) grammars
• LR(1) parsers
  • Left-to-right input
  • Rightmost derivation
  • 1 symbol of look-ahead
  • Grammars that this can handle are called LR(1) grammars
• Also: LL(k), LR(k), SLR, LALR, …

Top-down parsing
• Start with the root of the parse tree
  • Root of the tree: node labeled with the start symbol
• Algorithm: repeat until the fringe of the parse tree matches input string
  • At a node A, select a production for A; add a child node for each symbol on rhs
  • If a terminal symbol is added that doesn’t match, backtrack
  • Find the next node to be expanded (a non-terminal)
• Done when:
  • Leaves of parse tree match input string (success)
  • All productions exhausted in backtracking (failure)

Example
• Expression grammar (with precedence)

  #   Production rule
  1   expr   → expr + term
  2          | expr - term
  3          | term
  4   term   → term * factor
  5          | term / factor
  6          | factor
  7   factor → number
  8          | identifier

• Input string x - 2 * y

Example (↑ marks the current position in the input stream)

  Rule  Sentential form    Input string
  -     expr               ↑ x - 2 * y
  1     expr + term        ↑ x - 2 * y
  3     term + term        ↑ x - 2 * y
  6     factor + term      ↑ x - 2 * y
  8     <id> + term        x ↑ - 2 * y
  -     <id,x> + term      x ↑ - 2 * y

• Problem:
  • Can’t match next terminal
  • We guessed wrong at step 2

Backtracking
• The derivation above is stuck at its last step: undo all these productions
• Rollback productions
• Choose a different production for expr
• Continue

Retrying

  Rule  Sentential form    Input string
  -     expr               ↑ x - 2 * y
  2     expr - term        ↑ x - 2 * y
  3     term - term        ↑ x - 2 * y
  6     factor - term      ↑ x - 2 * y
  8     <id> - term        x ↑ - 2 * y
  -     <id,x> - term      x - ↑ 2 * y
  6     <id,x> - factor    x - ↑ 2 * y
  7     <id,x> - <num>     x - 2 ↑ * y

• Problem:
  • More input to read
  • Another cause of backtracking

Successful parse

  Rule  Sentential form           Input string
  -     expr                      ↑ x - 2 * y
  2     expr - term               ↑ x - 2 * y
  3     term - term               ↑ x - 2 * y
  6     factor - term             ↑ x - 2 * y
  8     <id> - term               x ↑ - 2 * y
  -     <id,x> - term             x - ↑ 2 * y
  4     <id,x> - term * fact      x - ↑ 2 * y
  6     <id,x> - fact * fact      x - ↑ 2 * y
  7     <id,x> - <num> * fact     x - 2 ↑ * y
  -     <id,x> - <num,2> * fact   x - 2 * ↑ y
  8     <id,x> - <num,2> * <id>   x - 2 * y ↑

• All terminals match: we’re done

Other possible parses

  Rule  Sentential form                     Input string
  -     expr                                ↑ x - 2 * y
  1     expr + term                         ↑ x - 2 * y
  1     expr + term + term                  ↑ x - 2 * y
  1     expr + term + term + term           ↑ x - 2 * y
  1     expr + term + term + term + term    ↑ x - 2 * y

• Problem: termination
  • Wrong choice leads to infinite expansion (more importantly: without consuming any input!)
  • May not be as obvious as this
  • Our grammar is left recursive

Left recursion
• Formally, a grammar is left recursive if ∃ a non-terminal A such that
    A →* A α   (for some string of symbols α)
  • What does →* mean? The recursion may be indirect, for example:
      A → B x
      B → A y
• Bad news: top-down parsers cannot handle left recursion
• Good news: we can systematically eliminate left recursion

Notation
• Non-terminals
  • Capital letter: A, B, C
• Terminals
  • Lowercase, underline: x, y, z
• Some mix of terminals and non-terminals
  • Greek letters: α, β, γ
• Example:

  #   Production rule
  1   A → B + x
  1   A → B α      (α = + x)

Eliminating left recursion
• Consider this grammar:

  #   Production rule
  1   foo → foo α
  2       | β

  Language is β followed by zero or more α
• Rewrite as:

  #   Production rule
  1   foo → β bar      (this production gives you one β)
  2   bar → α bar      (these two productions give you zero or more α)
  3       | ε

  bar is a new non-terminal

Back to expressions
• Two cases of left recursion:

  #   Production rule
  1   expr → expr + term
  2        | expr - term
  3        | term
  4   term → term * factor
  5        | term / factor
  6        | factor

• Transform as follows:

  #   Production rule
  1   expr  → term expr2
  2   expr2 → + term expr2
  3         | - term expr2
  4         | ε
  4   term  → factor term2
  5   term2 → * factor term2
  6         | / factor term2
            | ε

Eliminating left recursion
• Resulting grammar
  • All right recursive
  • Retain original language and associativity
  • Not as intuitive to read

  #   Production rule
  1   expr   → term expr2
  2   expr2  → + term expr2
  3          | - term expr2
  4          | ε
  5   term   → factor term2
  6   term2  → * factor term2
  7          | / factor term2
  8          | ε
  9   factor → number
  10         | identifier

• Top-down parser
  • Will always terminate
  • May still backtrack
• There’s a lovely algorithm to do this automatically, which we will skip
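The rewrite from foo → foo α | β to foo → β bar, bar → α bar | ε can be sketched in a few lines of Python. This is only an illustration of that one step, not the general algorithm the slides skip. It handles immediate left recursion only (not indirect cases such as A → B x, B → A y), and it assumes no alternative of a left-recursive rule is ε. The function name, the dictionary representation of the grammar, and the "2" suffix for the new non-terminal are assumptions chosen to mirror expr2 and term2.

```python
EPSILON = "ε"  # assumed marker for the empty production


def eliminate_immediate_left_recursion(grammar):
    """Rewrite every rule of the form A -> A alpha | beta as
    A -> beta A2 and A2 -> alpha A2 | epsilon.

    grammar maps a non-terminal to a list of right-hand sides,
    each right-hand side being a list of symbols.
    """
    result = {}
    for head, bodies in grammar.items():
        # alpha: what follows the left-recursive occurrence of head
        alphas = [body[1:] for body in bodies if body and body[0] == head]
        # beta: alternatives that do not start with head
        betas = [body for body in bodies if not body or body[0] != head]
        if not alphas:
            # No immediate left recursion: copy the rule unchanged.
            result[head] = [list(body) for body in bodies]
            continue
        new_head = head + "2"  # hypothetical naming, as in expr2 / term2
        result[head] = [list(beta) + [new_head] for beta in betas]
        result[new_head] = [list(alpha) + [new_head] for alpha in alphas]
        result[new_head].append([EPSILON])
    return result


# The left-recursive expression grammar from the slides.
grammar = {
    "expr": [["expr", "+", "term"], ["expr", "-", "term"], ["term"]],
    "term": [["term", "*", "factor"], ["term", "/", "factor"], ["factor"]],
    "factor": [["number"], ["identifier"]],
}

rewritten = eliminate_immediate_left_recursion(grammar)
```

Applied to the expression grammar, `rewritten` holds the same ten productions as the resulting right-recursive grammar: expr → term expr2, expr2 → + term expr2 | - term expr2 | ε, and likewise for term and term2, with factor unchanged.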
Right-recursive grammar

  #   Production rule
  1   expr  → term expr2
  2   expr2 → + term expr2
  3         | - term expr2
  4         | ε
  5   term  → factor term2
  6   term2 → * factor term2
  7         | / factor term2

• Two productions with no choice at all
• All other productions are uniquely identified by a terminal symbol at the start of RHS

Top-down parsers
• Problem: Left-recursion
• Solution: Technique to remove it
• What about backtracking?
  • Current algorithm is brute force
• Problem: how to choose the