CMSC 330: Organization of Programming Languages Recall

Recall: Steps of Compilation CMSC 330: Organization of Programming Languages Parsing (aka scanning) Converts Converts strings to tokens to tokens trees CMSC 330 1 CMSC 330 2 Implementing Parsers Top-Down Parsing E → id = n | { L } Many efficient techniques for parsing E • I.e., turning strings (token lists) into parse trees L → E ; L | ε • Examples L ! LL(k), SLR(k), LR(k), LALR(k)… (Assume: id is E L ! Take CMSC 430 for more details variable name, n is integer) E L One simple technique: recursive descent parsing ε • This is a top-down parsing algorithm L Show parse tree for • Other algorithms are bottom-up E { x = 3 ; { y = 4 ; } ; } L ε { x = 3 ; { y = 4 ; } ; } CMSC 330 3 CMSC 330 4 Bottom-up Parsing Example: Shift-Reduce Parsing E → id = n | { L } Replaces RHS of production with LHS L → E ; L | ε E (nonterminal) Show parse tree for L { x = 3 ; { y = 4 ; } ; } Example grammar E L • S → aA, A → Bc, B → b Note that final trees constructed are same E L Example parse as for top-down; only order in which nodes ε • abc aBc aA S are added to tree is L different • Derivation happens in reverse E L Something to look forward to in CMSC 430 ε { x = 3 ; { y = 4 ; } ; } CMSC 330 5 CMSC 330 6 Tradeoffs Recursive Descent Parsing Recursive descent parsers: easy to write & fast Goal • The formal definition is a little clunky • Determine if we can produce the string to be parsed ! but if you follow the code then it’s almost what you might from the grammar's start symbol have done if you weren't told about grammars formally Approach • Implemented with a simple table (and/or recursion) • Recursively replace nonterminal with RHS of production Shift-reduce parsers handle more grammars At each step, we'll keep track of two facts • Complicated to write; requires tools • What tree node are we trying to match? ! yacc, bison, etc. convert a CFG to a shift-reduce parser • What is the lookahead (next token of the input string)? • Error messages may be confusing ! Helps guide selection of production used to replace nonterminal Most languages use hacked parsers (!) • Strange combination of the two CMSC 330 7 CMSC 330 8 Recursive Descent Parsing (cont.) Parsing Example At each step, 3 possible cases E → id = n | { L } • If were trying to match a terminal L → E ; L | ε ! If the lookahead is that token, then succeed, advance the • Here n is an integer and id is an identifier lookahead, and continue • If were trying to match a nonterminal ! Pick which production to apply based on the lookahead One input might be • Otherwise fail with a parsing error • { x = 3; { y = 4; }; } • This would get turned into a list of tokens { x = 3 ; { y = 4 ; } ; } • And we want to turn it into a parse tree CMSC 330 9 CMSC 330 10 Parsing Example (cont.) Recursive Descent Parsing (cont.) E E → id = n | { L } Key step { L } L → E ; L | ε • Choosing which production should be selected E ; L Two approaches { x = 3 ; { y = 4 ; } ; } x = 3 E ; L • Backtracking ! Choose some production { L } ε ! If fails, try different production ! Parse fails if all choices fail E ; L • Predictive parsing y = 4 ε ! Analyze grammar to find FIRST sets for productions ! Compare with lookahead to decide which production to select ! Parse fails if lookahead does not match FIRST CMSC 330 11 CMSC 330 12 First Sets First Sets Motivating example Definition • The lookahead is x • First(γ), for any terminal or nonterminal γ, is the set of initial terminals of all strings that γ may expand to • Given grammar S → xyz | abc • Well use this to decide what production to apply ! Select S → xyz since 1st terminal in RHS matches x • Given grammar S → A | B A → x | y B → z Examples ! Select S → A, since A can derive string beginning with x • Given grammar S → xyz | abc ! First(xyz) = { x }, First(abc) = { a } In general ! First(S) = First(xyz) U First(abc) = { x, a } • Choose a production that can derive a sentential form • Given grammar S → A | B A → x | y B → z beginning with the lookahead ! First(x) = { x }, First(y) = { y }, First(A) = { x, y } • Need to know what terminal may be first in any ! First(z) = { z }, First(B) = { z } sentential form derived from a nonterminal / production ! First(S) = { x, y, z } CMSC 330 13 CMSC 330 14 Calculating First(γ) First( ) Examples For a terminal a E → id = n | { L } E → id = n | { L } | ε • First(a) = { a } L → E ; L | ε L → E ; L For a nonterminal N • If N → ε, then add ε to First(N) First(id) = { id } First(id) = { id } • If N → α α ... α , then (note the α are all the 1 2 n i First("=") = { "=" } First("=") = { "=" } symbols on the right side of one single production): First(n) = { n } First(n) = { n } ! Add First(α1α2 ... αn) to First(N), where First(α1 α2 ... αn) is defined as First("{")= { "{" } First("{")= { "{" } • First(α ) if ε First(α ) 1 ∉ 1 First("}")= { "}" } First("}")= { "}" } • Otherwise (First(α1) – ε) First(α2 ... αn) ! If ε ∈ First(αi) for all i, 1 ≤ i ≤ k, then add ε to First(N) First(";")= { ";" } First(";")= { ";" } First(E) = { id, "{" } First(E) = { id, "{", ε } First(L) = { id, "{", ε } First(L) = { id, "{", ";" } CMSC 330 15 CMSC 330 16 Recursive Descent Parser Implementation Parser Implementation (cont.) For terminals, create function match(a) The body of parse_N for a nonterminal N does • If lookahead is a it consumes the lookahead by the following advancing the lookahead to the next token, and returns • Let N → β1 | ... | βk be the productions of N • Otherwise fails with a parse error if lookahead is not a ! Here βi is the entire right side of a production- a sequence of • In algorithm descriptions, consider parse_a, terminals and nonterminals parse_term(a) to be aliases for match(a) • Pick the production N → βi such that the lookahead is in First(βi) For each nonterminal N, create a function parse_N ! It must be that First(βi) ∩ First(βj) = ∅ for i ≠ j • Called when were trying to parse a part of the input ! If there is no such production, but N → ε then return which corresponds to (or can be derived from) N ! Otherwise fail with a parse error • parse_S for the start symbol S begins the parse • Suppose βi = α1 α2 ... αn. Then call parse_α1(); ... ; parse_αn() to match the expected right-hand side, and return CMSC 330 17 CMSC 330 18 Parser Implementation (cont.) Recursive Descent Parser Parse is built on procedure calls Given grammar S → xyz | abc Procedures may be (mutually) recursive • First(xyz) = { x }, First(abc) = { a } Parser parse_S( ) { Note: if (lookahead == x) { • These procedures are imperative: assume there is a match(x); match(y); match(z); // S → xyz global list of tokens we are consuming } • In Ocaml, the list of tokens would be passed as an else if (lookahead == a) { argument, and returned in the result match(a); match(b); match(c); // S → abc ! E.g., match(a) becomes match_tok a l } • Argument l is list of tokens else error( ); • Return value is token list minus a, if a was at the front, } or throws an exception if not CMSC 330 19 CMSC 330 20 Recursive Descent Parser Example Given grammar S → A | B A → x | y B → z E → id = n | { L } First(E) = { id, "{" } • First(A) = { x, y }, First(B) = { z } L → E ; L | ε Parser parse_E( ) { parse_L( ) { parse_S( ) { parse_A( ) { if (lookahead == id) { if ((lookahead == id) || if (lookahead == x) if ((lookahead == x) || match(id); (lookahead == {)) { match(x); // A → x match(=); // E → id = n parse_E( ); (lookahead == y)) else if (lookahead == y) match(n); match(;); // L → E ; L parse_A( ); // S → A match(y); // A → y } parse_L( ); else if (lookahead == z) else error( ); else if (lookahead == {) { } parse_B( ); // S → B } match({); else ; // L → ε parse_B( ) { else error( ); parse_L( ); // E → { L } } if (lookahead == z) match(}); } match(z); // B → z } else error( ); else error( ); } } CMSC 330 21 CMSC 330 22 Things to Notice Things to Notice (cont.) If you draw the execution trace of the parser This is a predictive parser • You get the parse tree • Because the lookahead determines exactly which production to use Examples • Grammar • Grammar This parsing strategy may fail on some grammars S → xyz S → A | B • Production First sets overlap S → abc A → x | y • Production First sets contain ε • String xyz B → z • Possible infinite recursion parse_S( ) • String x S S Does not mean grammar is not usable match(x) parse_S( ) /|\ | • Just means this parsing method not powerful enough match(y) parse_A( ) A x y z • May be able to change grammar match(z) match(x) | x CMSC 330 23 CMSC 330 24 Conflicting FIRST Sets Left Factoring Algorithm Consider parsing the grammar E → ab | ac Given grammar • First(ab) = a Parser cannot choose between • A → xα1 | xα2 | … | xαn | β RHS based on lookahead! • First(ac) = a Rewrite grammar as Parser fails whenever A → α1 | α2 and • A → xL | β • First(α1) ∩ First(α2) != ε or • L → α1 | α2 | … | αn Solution Repeat as necessary • Rewrite grammar using left factoring Examples • S → ab | ac S → aL L → b | c • S → abcA | abB | a S → aL L → bcA | bB | ε • L → bcA | bB | ε L → bL | ε L → cA | B CMSC 330 25 CMSC 330 26 Alternative Approach Left Recursion Change structure of parser Consider grammar S → Sa | ε • First match common prefix of productions • Try writing parser parse_S( ) { • Then use lookahead to choose between productions if (lookahead == a) { parse_S( ); Example match(a); // S → Sa Consider parsing the grammar E → a+b | a*b | a } • else { } parse_E( ) { } match(a); // common prefix if (lookahead == +) { // E → a+b match(+); match(b); } • Body of parse_S( ) has an infinite loop if (lookahead == *) { // E → a*b ! if (lookahead = "a") then parse_S( ) match(*);

CMSC 330: Organization of Programming Languages Recall

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support