LL and LR Parsing Lecture 6

February 5, 2018
Compiler Construction

Context-free Grammars

A context-free grammar consists of
- A set of non-terminals $N$
  - Written in uppercase throughout these notes
- A set of terminals $T$ comprised of tokens
  - Lowercase or punctuation throughout these notes
- A start symbol $S$ (a non-terminal)
- A set of productions (rewrite rules)

Assuming $E \in N$:
  $E \to \varepsilon$, or
  $E \to Y_1 Y_2 \ldots Y_n$ where $Y_i \in N \cup T$

Context-free?

Production rules hint at expressiveness!

  Regular             $A \to aB$, $C \to \varepsilon$
  Context-free        $A \to \alpha$
  Context-sensitive   $\alpha A \beta \to \alpha \gamma \beta$
  Type-0              $\alpha \to \beta$

  where $\alpha, \beta, \gamma \in \{N \cup T\}^{*}$

“What just happened? We must be missing some context...”

Parsing and Context-free Grammars

- Lexical Analysis
  - Regular Expressions specify a Regular Language containing strings of characters (lexeme) that correspond to a token
- Parsing
  - Context-free Grammars specify a Context-free Language containing strings of tokens that correspond to a grammatical rule (production)

Generativeness

- Regular expressions and context-free grammars are generative
- You can generate every string in the language using the regex or grammar!

Generating Strings

- Consider regex: ab*a
  - You can generate aa, aba, abba, abbba, ...
- Consider context-free grammar: E → (E)E | ε
  - You can generate ε, (), (()), (())(), ...
- Generating strings with a grammar can be thought of as creating a parse tree!

Language membership

- We care about whether an input string of tokens is syntactically correct (e.g., obeys our language’s grammar)
- So far, we have looked at theoretical implications of grammars

  $L(G) = \{ a_1 \ldots a_n \mid S \to^{*} a_1 \ldots a_n \}$

For an input string $x$, is $x \in L(G)$?

Parsing part 1: We need a yes/no answer!

Language membership

  S → a B | b C
  B → b b C
  C → c c

What strings are in this language? (Hint: there’s only two!)

If my input string is “dabc”, we ask: can the grammar generate this string?
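One way to make the membership question concrete is to enumerate the language by brute force. The sketch below assumes the slide's grammar reads S → a B | b C, B → b b C, C → c c; the names `GRAMMAR`, `generate`, `LANGUAGE`, and `is_member` are illustrative and not part of the lecture. Enumeration only terminates here because this grammar has no recursion, so it is not a general parsing technique.

```python
from typing import Dict, List, Set

# Assumed reading of the slide's grammar; every key is a non-terminal,
# anything else that appears in a production is treated as a terminal.
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["a", "B"], ["b", "C"]],
    "B": [["b", "b", "C"]],
    "C": [["c", "c"]],
}


def generate(symbol: str) -> Set[str]:
    """Return every terminal string derivable from `symbol`.

    Terminates only because the grammar above is non-recursive.
    """
    if symbol not in GRAMMAR:
        return {symbol}  # a terminal derives only itself
    results: Set[str] = set()
    for production in GRAMMAR[symbol]:
        partials = {""}
        for part in production:
            # Concatenate every string so far with every expansion of `part`.
            partials = {p + s for p in partials for s in generate(part)}
        results |= partials
    return results


LANGUAGE = generate("S")


def is_member(x: str) -> bool:
    """Answer the yes/no question: is x in L(G)?"""
    return x in LANGUAGE


print(sorted(LANGUAGE))
print(is_member("dabc"))  # False
```

For grammars with recursion the language is usually infinite, which is why the parsing algorithms in the rest of the lecture work from the input string instead of enumerating L(G).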
(No)

- N.B. From a theoretical perspective it doesn’t matter how the answer is found; that’s the job of the parsing algorithm!

Parsing Algorithms

- LL (top down)
  - Reads input from left to right and uses leftmost derivations to construct a parse tree
- LR (bottom up)
  - Reads input from left to right and uses rightmost derivations (traced in reverse) to construct a parse tree
- Both algorithms are driven by the input grammar and the input to be parsed.

Parsing Algorithm Intuition

- You start with a sequence of tokens, t1 t2 t3 t4 t5
  - and also a grammar!
- Two general approaches to constructing the parse tree
  - top-down parsing is when you predict the grammatical rule used to produce the tokens seen so far
  - bottom-up parsing is when you consider tokens one at a time until you match a grammatical rule

Top Down Parsing

  S → a B c
  B → C x B | ε
  C → d

Input string: “adxdxc”

[Figure: the parse tree for “adxdxc” grows from the root S downward over successive frames. S expands to a B c, each B expands to C x B with C matching d, and the last B expands to ε.]

Bottom-up Parsing

  S → a B c
  B → C x B | ε
  C → d

Input string: “adxdxc”

Tokens right now, one frame per step:
  a
  ad
  aC
  aCx
  aCxd
  aCxC
  aCxCx
  aCxCxε
  aCxCxB
  aCxB
  aB
  aBc
  S

[Figure: the parse tree for “adxdxc” is assembled from the leaves upward as each reduction above is applied.]

LL(k) parsing

An LL parser reads tokens from left to right and constructs a top-down leftmost derivation.

LL(k) parsing predicts which production rule to use from k tokens of lookahead.

LL(1) parsing is a special case using one token of lookahead.
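To make the one-token-lookahead idea concrete, here is a minimal sketch of a predictive parser for the example grammar from the earlier slides, assumed to read S → a B c, B → C x B | ε, C → d. The class `LL1Parser`, the helper `accepts`, and the end-of-input marker `$` are illustrative choices rather than anything defined in the lecture. Each non-terminal becomes one method, and the only decision point (which B production to use) is settled by peeking at a single token.

```python
from typing import List


class ParseError(Exception):
    """Raised when the next token fits no production."""


class LL1Parser:
    def __init__(self, tokens: List[str]) -> None:
        # "$" is an assumed end-of-input marker, not a token of the grammar.
        self.tokens = tokens + ["$"]
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos]

    def expect(self, terminal: str) -> None:
        if self.peek() != terminal:
            raise ParseError(f"expected {terminal!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_S(self) -> None:
        # S -> a B c
        self.expect("a")
        self.parse_B()
        self.expect("c")

    def parse_B(self) -> None:
        if self.peek() == "d":
            # B -> C x B, chosen because "d" can start C
            self.parse_C()
            self.expect("x")
            self.parse_B()
        elif self.peek() == "c":
            # B -> ε, chosen because "c" can follow B
            return
        else:
            raise ParseError(f"unexpected token {self.peek()!r} while parsing B")

    def parse_C(self) -> None:
        # C -> d
        self.expect("d")


def accepts(text: str) -> bool:
    """Treat each character of `text` as one token and try to parse S."""
    parser = LL1Parser(list(text))
    try:
        parser.parse_S()
        parser.expect("$")
        return True
    except ParseError:
        return False


print(accepts("adxdxc"))  # True
print(accepts("adc"))     # False
```

No backtracking is needed here because the two B productions are distinguished by the very next token.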
LL(1) parsing is fast and easy, but does not work if the grammar is ambiguous, left-recursive, or non-left-factored.

General LL(1) Algorithm

- Process 1 token at a time
- Consider a ‘current’ non-terminal symbol, start with S
- While input is not empty
  - Given the next token (t) and the ‘current’ non-terminal N, choose a rule R of the form N → α
  - For each element X in rule R from left to right
    - If X is a non-terminal, ‘expand’ X by recursing! Set ‘current’ to X and consider the same token t.
    - If X is a terminal, check whether t matches. If it matches, consume t from input, loop
- Note the need for particular types of grammars! What if we have a rule S → Sα?

Recursive Descent Parsing

- Recursive Descent Parsing can parse LL(k) grammars with backtracking
- We can use RDP to parse LL(1) grammars by recursing through the rules of the grammar based upon the next available token
- Intuition: Construct mutually-recursive functions that consume tokens according to the grammar rules!
- TL;DR “Try all productions exhaustively, backtrack”

Recursive Descent Parsing

  E → T + E | T
  T → (E) | int | int * T

Input: int * int

1. Try E0 → T1 + E2
2. Try T1 → (E3)
   - Nope! token ‘int’ does not match ‘(’ in T1 → (E3)
3. Try T1 → int. Match!
   - But the next token ‘*’ does not match ‘+’ from E0
4. Try T1 → int * T2
   - Matches ‘int’, but ‘+’ from E0 remains unmatched
5. Exhausted choices for T1, so we backtrack to E0

Recursive Descent Parsing (2)

6.
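The backtracking search traced in the numbered steps can be sketched as a pair of mutually recursive generators, assuming the grammar reads E → T + E | T and T → (E) | int | int * T. The function names `parse_E`, `parse_T`, and `accepts` are illustrative. Each function yields every input position at which its non-terminal could end, so a caller that fails to match what comes next simply asks for the next candidate; that is the backtracking. This is one possible way to organize exhaustive search, not necessarily the formulation used in the lecture.

```python
from typing import Iterator, List


def parse_E(tokens: List[str], pos: int) -> Iterator[int]:
    """Yield every position where an E starting at `pos` could end."""
    # E -> T + E
    for after_t in parse_T(tokens, pos):
        if after_t < len(tokens) and tokens[after_t] == "+":
            yield from parse_E(tokens, after_t + 1)
    # E -> T (tried once the first alternative has been explored)
    yield from parse_T(tokens, pos)


def parse_T(tokens: List[str], pos: int) -> Iterator[int]:
    """Yield every position where a T starting at `pos` could end."""
    # T -> ( E )
    if pos < len(tokens) and tokens[pos] == "(":
        for after_e in parse_E(tokens, pos + 1):
            if after_e < len(tokens) and tokens[after_e] == ")":
                yield after_e + 1
    if pos < len(tokens) and tokens[pos] == "int":
        # T -> int
        yield pos + 1
        # T -> int * T
        if pos + 1 < len(tokens) and tokens[pos + 1] == "*":
            yield from parse_T(tokens, pos + 2)


def accepts(tokens: List[str]) -> bool:
    """The input is accepted if some parse of E consumes every token."""
    return any(end == len(tokens) for end in parse_E(tokens, 0))


print(accepts(["int", "*", "int"]))  # True
print(accepts(["int", "*"]))         # False
```

The search terminates because the grammar has no left recursion: every recursive call is made only after at least one token has been consumed or a T has been parsed. A rule such as S → Sα would recurse forever at the same position.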
