LR Parsing Lecture 7

Compiler Construction, January 30, 2018

Midterm 1 Next Week
- In-class exam
- One sheet of letter paper, two sides
- Some review questions online
- Things to know
  - Cool syntax
  - Compiler/language processor steps
  - Regular languages
    - NFA/DFAs
    - Regular expressions
  - Parsing
    - Context-free grammars

From Last Week
- Top-down parsing strategies
  - Tell us how to get from the start symbol of a grammar to a sequence of terminals by following a sequence of productions

Top Down Parsing
  S → a B c
  B → C x B | ε
  C → d
Input string: "adxdxc"
Parse tree, built from the root down, one production at a time:
  S
  ├─ a
  ├─ B
  │  ├─ C ─ d
  │  ├─ x
  │  └─ B
  │     ├─ C ─ d
  │     ├─ x
  │     └─ B ─ ε
  └─ c

Top Down Parsing (2)
  E → T + E | T
  T → ( E ) | int | int * T
Input: int * int
- Start at E and try E → T + E
- Try T → ( E ): "(" does not match the first token int, so backtrack
- Back under E → T + E, try T → int: int matches, but "+" is expected next and the input has "*", so backtrack
- Try T → int * T: int and "*" match, and the inner T can match the last int, but "+" is still expected and the input is exhausted, so backtrack
- Every T option under E → T + E has failed, so backtrack to E and try E → T
- Try T → int * T, then T → int: the tree now derives int * int and all input is consumed

Top-Down Summary
- Exhaust all grammatical rule options at each step
- Notice significant backtracking!
- Backtracking results from having multiple options for rules when considering tokens!
- If we restrict our grammar slightly, we can use LL(1) parsing
- Key idea: deterministically select the grammar rule based on one token to eliminate backtracking

LL(1) Summary
- We want a parsing strategy that lets us parse an input string by considering only one token at a time
- Intuition: if input string x ∈ L(G), there must be a parse tree.
  We want to pick a grammar rule one token at a time so we can quickly find out whether the string is in the language!
- Implementation: parse table
- Intuition: 2-d table indexed by 'current' non-terminal and 'lookahead' token. Each cell contains the production rule corresponding to the lookahead.

Parse Table Construction
- Constructing the parse table requires a PREDICT set for the productions of every non-terminal in the grammar.
- We need a way to know which rule must be used to successfully construct the next part of the parse tree
- Consider:
    S → a | b
  You know exactly which rule to take based on the first thing you see as part of the S rule.

Parse Table Construction
  S → a A a | b A
  A → c c A | b b A
- Same idea! We always start with S. You can see that the first terminal in the S rules tells you unambiguously which rule is used!
- Similarly, after you consume one 'a', the next non-terminal you consider is A. You can decide which A rule to choose based on the first terminal in the A rules!

Parse Table Construction
  S → A a
  A → B b
  B → c B | a A
- A bit trickier, but you can still decide which rule to take with only one terminal

Parse Table Construction
- Sounds like we need to know the first terminal that can appear as a result of expanding a non-terminal!
- This is called the FIRST set
  FIRST(A) is the set of terminals that can appear first in the expansion of the non-terminal A
- Note this means recursing through other non-terminals!
- FIRST(X) = { b | X →* bα } ∪ { ε | X →* ε }

Parse Table Construction
- That includes ε rules
    S → a A a
    A → b A | ε

Parse Table Construction
- Recall we're trying to make a parse table
- It is almost enough to know the FIRST sets of all non-terminals. The token will tell us which rule to take next.
- However, how do we know when a non-terminal should expand to ε?
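
The FIRST definition above can be computed by iterating to a fixed point. The following is a minimal sketch, not the course's reference implementation. The dictionary encoding of grammars and the names `first_sets`, `first_of_seq`, `G_EPS`, and `G_NESTED` are assumptions made for illustration. The two grammars are the ε-rule example and the "a bit trickier" example from the slides.

```python
EPS = "ε"

def first_of_seq(symbols, first):
    """FIRST of a sequence of symbols, given the FIRST sets computed so far.
    Any symbol that is not a key of `first` is treated as a terminal."""
    out = set()
    for sym in symbols:
        if sym not in first:            # terminal: it is the first token
            out.add(sym)
            return out
        out |= first[sym] - {EPS}       # recurse through the non-terminal
        if EPS not in first[sym]:       # it cannot vanish, so stop here
            return out
    out.add(EPS)                        # every symbol can derive ε
    return out

def first_sets(grammar):
    """grammar maps each non-terminal to a list of right-hand sides (tuples).
    The empty tuple () encodes an ε-production."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # repeat until no set grows
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_seq(rhs, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first

# S → a A a,  A → b A | ε
G_EPS = {"S": [("a", "A", "a")], "A": [("b", "A"), ()]}
# S → A a,  A → B b,  B → c B | a A
G_NESTED = {"S": [("A", "a")], "A": [("B", "b")], "B": [("c", "B"), ("a", "A")]}

if __name__ == "__main__":
    for g in (G_EPS, G_NESTED):
        for nt, terminals in first_sets(g).items():
            print(nt, sorted(terminals))
```

For the second grammar, FIRST(S) is only known once FIRST(A) and FIRST(B) have been filled in, which is why the loop repeats until nothing changes.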

Parse Table Construction
  S → a A B a
  A → b B | ε
  B → c A d | ε
If my input string is aba, after consuming the first 'a' token, how do we know that the next 'b' token results in B → ε?

Parse Table Construction: FOLLOW
- We also need to know the terminals that could come after the previous non-terminal
- FOLLOW(X) = { b | S →* β X b ω }

Computing FOLLOW set
- Compute FIRST sets for all non-terminals
- Add $ to FOLLOW(S)
  - The start symbol always ends with end-of-input
- For all productions Y → ... X A1 ... An
  - Add FIRST(Ai) − {ε} to FOLLOW(X) for i = 1, 2, ...; stop at the first Ai with ε ∉ FIRST(Ai)
  - If the scan never stops (every Ai can derive ε, or X is the last symbol), add FOLLOW(Y) to FOLLOW(X)

Example FOLLOW Set
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε
FOLLOW("+") = { int, ( }
FOLLOW("(") = { int, ( }
FOLLOW(X) = { $, ) }
FOLLOW(Y) = { +, ), $ }

Back to Parsing Tables
- Recall: we want to build an LL(1) parsing table
- For each production A → α in G do:
  - For each terminal b ∈ FIRST(α) do:
    - T[A][b] = α
  - If α →* ε, for each b ∈ FOLLOW(A) do:
    - T[A][b] = α

Parsing Table
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε
- Where do we put Y → * T?
  - Well, FIRST(* T) = { * }, thus column * of row Y gets * T
- Where do we put Y → ε?
  - Well, FOLLOW(Y) = { $, +, ) }, thus columns $, +, and ) in row Y get Y → ε

       int      *        +        (        )        $
  T    int Y                      ( E )
  E    T X                        T X
  X                      + E               ε        ε
  Y             * T      ε                 ε        ε

Another Parse Table Example
  E → T X
  X → + T X | ε
  T → F R
  R → * F R | ε
  F → int | ( E )

  Production     Prediction
  E → T X        (, int
  X → + T X      +
  X → ε          $, )
  T → F R        (, int
  R → * F R      *
  R → ε          +, $, )
  F → int        int
  F → ( E )      (

       +        *        (        )        int      $
  E                      T X               T X
  X    + T X                      ε                 ε
  T                      F R               F R
  R    ε        * F R             ε                 ε
  F                      ( E )             int
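
The FOLLOW rules and the table-filling loop above can be sketched in a few lines. This is an illustrative sketch under assumptions, not the lecture's own code: the grammar encoding and the names `follow_sets`, `build_table`, and `GRAMMAR` are hypothetical. It uses the E → T X grammar from the "Parsing Table" slide, and each cell holds a list so that a multiply defined entry would be visible.

```python
EPS, END = "ε", "$"

# E → T X,  X → + E | ε,  T → ( E ) | int Y,  Y → * T | ε
GRAMMAR = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],              # () encodes an ε-production
    "T": [("(", "E", ")"), ("int", "Y")],
    "Y": [("*", "T"), ()],
}

def first_of_seq(symbols, first):
    """FIRST of a symbol sequence; symbols not in `first` are terminals."""
    out = set()
    for sym in symbols:
        if sym not in first:
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)
    return out

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_seq(rhs, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first

def follow_sets(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)              # the start symbol is followed by $
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:      # FOLLOW only for non-terminals
                        continue
                    tail = first_of_seq(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:             # the rest can vanish
                        new |= follow[lhs]
                    new -= follow[sym]
                    if new:
                        follow[sym] |= new
                        changed = True
    return follow

def build_table(grammar, start):
    """Map (non-terminal, lookahead) to the list of predicted right-hand sides."""
    first = first_sets(grammar)
    follow = follow_sets(grammar, start, first)
    table = {}
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            f = first_of_seq(rhs, first)
            lookaheads = f - {EPS}
            if EPS in f:                        # α →* ε: use FOLLOW(lhs)
                lookaheads |= follow[lhs]
            for tok in lookaheads:
                table.setdefault((lhs, tok), []).append(rhs)
    return table

if __name__ == "__main__":
    for (nt, tok), rhss in sorted(build_table(GRAMMAR, "E").items()):
        print(nt, tok, [" ".join(r) or EPS for r in rhss])
```

A cell whose list has more than one entry would mean the grammar is not LL(1); for this grammar every cell has exactly one.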

Notes on LL(1) Parsing Tables
- If any entry is multiply defined then G is not LL(1). Possible causes:
  - G is ambiguous
  - G is left-recursive
  - G is not left-factored

Ambiguity in parse tables
  E → E + T | T
  T → T * F | F
  F → id | ( E )
For the E productions, we need FIRST(T) = { (, id } and FIRST(E) = { (, id }.
But now, which rule (E → E + T or E → T) gets put in T[E][(] and T[E][id]?

       +     *     (     )     id    $
  E                ?           ?
  T
  F

LR Parsing
- LL(1) parsing is simple and fast
- However, what if we wanted more expressiveness in the grammar?
- Enter LR parsing, a bottom-up parsing approach
  - As efficient as LL(1)
  - Not quite as easy by hand
  - Preferred practical method (see bison/yacc)

LR Parsing
An LR parser reads tokens from left to right and builds the parse bottom-up, tracing out a rightmost derivation in reverse.
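
To make "bottom-up" and "rightmost derivation in reverse" concrete, here is a toy shift-reduce recognizer. It is a sketch under strong assumptions and not a table-driven LR parser: the grammar E → E + int | int is chosen for illustration (it does not appear in the slides), and the greedy "reduce whenever the top of the stack matches a right-hand side" policy only happens to work for this grammar. The names `RULES`, `shift_reduce`, and `rightmost_derivation` are hypothetical.

```python
# Toy grammar (an assumption for illustration): E → E + int | int
RULES = [
    ("E", ("E", "+", "int")),   # longer right-hand side is tried first
    ("E", ("int",)),
]

def shift_reduce(tokens):
    """Return (accepted, reductions), with reductions in the order performed."""
    stack, reductions = [], []
    for tok in tokens:
        stack.append(tok)                       # shift the next token
        reduced = True
        while reduced:                          # reduce while a rule matches
            reduced = False
            for lhs, rhs in RULES:
                n = len(rhs)
                if tuple(stack[-n:]) == rhs:
                    del stack[-n:]              # pop the right-hand side
                    stack.append(lhs)           # push the non-terminal
                    reductions.append((lhs, rhs))
                    reduced = True
                    break
    return stack == ["E"], reductions

def rightmost_derivation(reductions, start="E"):
    """Replay the reductions backwards as a derivation from the start symbol."""
    form = [start]
    steps = [" ".join(form)]
    for lhs, rhs in reversed(reductions):
        i = len(form) - 1 - form[::-1].index(lhs)   # rightmost occurrence
        form[i:i + 1] = rhs
        steps.append(" ".join(form))
    return steps

if __name__ == "__main__":
    ok, reds = shift_reduce(["int", "+", "int", "+", "int"])
    print(ok)
    for step in rightmost_derivation(reds):
        print(step)
```

Reading the printed steps from bottom to top gives the order in which the recognizer performed its reductions, which is the sense in which the derivation is built in reverse.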
