Lecture 11: Parsing Parsing Arithmetic Grammar Recursive Descent Parser

Parsing Lecture 11: Parsing • Build parse tree from an expression! CSC 131! Interested in abstract syntax tree! Fall, 2014! • ! - drops irrelevant details from parse tree Kim Bruce Arithmetic grammar Recursive Descent Parser Base recognizer (ignore building tree now) on productions:! <exp> ::= <exp> <addop> <term> ! <exp> ::= <exp> <addop> <term>! | <term> ! ! <term> ::= <term> <mulop> <factor> ! addop (fst:rest) = if fst==’+’ or fst==’-‘ then rest! | <factor> ! else error ...! <factor> ::= ( <exp> ) ! ! | NUM ! exp input = let! | ID ! inputAfterExp = exp input! <addop> ::= + | -! inputAfterAddop = addOp inputAfterExp! <mulop> ::= * | / rest = term inputAfterAddop! in ! rest or Look at parse tree & abstract syntax tree for 2 * 3 + 7 fun exp input = term(addOp(exp input)); Problems Rewrite Grammar <exp> ::= <term> <termTail> (1)! <termTail> ::= <addop> <term> <termTail> (2)! | ε (3)! <term> ::= <factor> <factorTail> (4)! <factorTail> ::= <mulop> <factor> <factorTail> (5)! • How do we select which production to use | ε (6)! when alternatives?! <factor> ::= ( <exp> ) (7)! | NUM (8)! Left-recursive - never terminates • | ID (9)! <addop> ::= + | - (10)! <mulop> ::= * | / (11) No lef recursion" How do we know which production to take? Predictive Parsing FIRST Goal: a1a2...an! • Intuition: b ∈ First(X) i% there is a derivation ! X →* bω for some ω.! S → α! 1.First(b) = b for b a terminal or the empty string! ...! 2.If have X ::= ω1 | ω2 | ... | ωn then → a1a2Xβ First(X) = First(ω1) ∪ ... ∪ First(ωn) Want next terminal character derived to be a3! 3.For any right hand side u1u2...un! ! a3 in First(γ) - First(u1) ⊆ First(u1u2...un) ! Need to apply a production X ::= γ where ! if all of u , u ..., u can derive the empty string then 1) γ can eventually derive a string starting with a3 or! - 1 2 i-1 2) If X can derive the empty string, and also ! also First(ui) ⊆ First(u1u2...un) if β can derive a string starting with a3. - empty string is in First(u1u2...un) i% all of u1, u2..., un can derive the empty string a3 in Folow(X) First for Arithmetic Follow • Intuition: A terminal b ∈ Follow(X) i% there is a FIRST(<addop>) = & +, - ' ! derivation S →* vXbω for some v and ω.! FIRST(<mulop>) = & *, / ' ! 1.If S is the start symbol then put EOF ∈ Follow(S) FIRST(<factor>) = & (, NUM, ID ' ! 2.For all rules of the form A ::= wXv,! FIRST(<term>) = & (, NUM, ID ' ! a.Add all elements of First(v) to Follow(X) FIRST(<exp>) = & (, NUM, ID ' ! If v can derive the empty string then add all elts of FIRST(<termTail>) = & +, -, ε ' ! b. Follow(A) to Follow(X)" FIRST(<factorTail>) = & *, /, ε ' • Follow(X) only used if can derive empty string from X. Follow for ArithmeticOnly needed to Predictive Parsing, redux calculate for <termTail>, Goal: a1a2...an! FOLLOW(<exp>) = & EOF, ) ' ! <factorTail> ! ! FOLLOW(<termTail>) = FOLLOW(<exp>) = & EOF, ) ' ! S → α! FOLLOW(<term>) = FIRST(<termTail>) ∪ ! ...! FOLLOW(<exp>) ∪ FOLLOW(<termTail>) ! → a1a2Xβ = & +, -, EOF, ) ' ! Want next terminal character derived to be a FOLLOW(<factorTail>) = & +, -, EOF, ) ' ! 3! ! FOLLOW(<factor>) = & *, /, +, -, EOF ( " Need to apply a production X ::= γ where ! FOLLOW(<addop>) = & (, NUM, ID ( Not needed!" 1) γ can eventually derive a string starting with a3 or! FOLLOW(<mulop>) = & (, NUM, ID ( ' 2) If X can derive the empty string, then see ! if β can derive a string starting with a3. Building Table Need Unambiguous • Put X ::= α in entry (X,a) if either! • No table entry should have more than one production to ensure it’s unambiguous, as otherwise we - a in First(α), or! don’t know which rule to apply.! - e in First(α) and a in Follow(X) • Laws of predictive parsing:! • Consequence: X ::= α in entry (X,a) i% there is - If A ::= α1 | ...| αn then for all i() j, a derivation s.t. applying production can First(αi) ∩ First(αj) = ∅. ! eventually lead to string starting with a. - If X →* ε, then First(X) ∩ Follow(X) = ∅. See ArithParse.hs Non- ID NUM Addop Mulop ( ) EOF • Laws of predictive parsing:! terminals <exp> 1 1 1 - If A ::= α1 | ...| αn then for all i() j, First(αi) ∩ First(αj) = ∅. ! <termTail> 2 3 3 - If X →* ε, then First(X) ∩ Follow(X) = ∅.! <term> 4 4 4 • 2nd is OK for arithmetic:! <factTail> 6 5 6 6 - FIRST(<termTail>) = & +, -, " ' ! ' <factor> 9 8 7 - FOLLOW(<termTail>) = & EOF, ) ' ! no overlap! <addop> 10 - FIRST(<factorTail>) = & *, /, " '! <mulop> 11 - FOLLOW(<factorTail>) = & +, -, EOF, ) ' ' Read o) fom table which production to apply! Table-Driven Stack-based Parser • http://en.wikipedia.org/wiki/LL_parser! • Start with “S $” on stack and “input $” to be recognized.! Alternatives to Recursive Descent • Use table to replace non-terminals on top of Parsers stack.! • If terminal on top of stack matches next input then erase both and proceed.! • Success if end up clearing stack and input! • Show with ID * (NUM + NUM)$ Another alternative More Options • LR(1) parsers -- bottom up, gives right-most derivation. Also stack-based.! Parser Combinators! • YACC is LR(1). ANTLR is LL(1).! • - Domain speciﬁc language for parsing.! k in LL(k) and LR(k) indicates how many • - Even easier to tie to grammar than recursive descent! letters of look ahead are necessary -- e.g. length of strings in columns of table.! - Build into Haskell and Scala, deﬁnable elsewhere! • Talk about when cover Scala • Compiler writers are happiest with k=1 to avoid exponential blow-up of table. May have to rewrite grammars..

Load more