Abstract Syntax Trees & Top-Down Parsing

Review of Parsing
• Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree
• Issues:
  – How do we recognize that s ∈ L(G)?
  – A parse tree of s describes how s ∈ L(G)
  – Ambiguity: more than one parse tree (possible interpretation) for some string s
  – Error: no parse tree for some string s
  – How do we construct the parse tree?

Abstract Syntax Trees
• So far, a parser traces the derivation of a sequence of tokens
• The rest of the compiler needs a structural representation of the program
• Abstract syntax trees
  – Like parse trees but ignore some details
  – Abbreviated as AST

Abstract Syntax Trees (Cont.)
• Consider the grammar
    E → int | ( E ) | E + E
• And the string
    5 + (2 + 3)
• After lexical analysis (a list of tokens)
    int5 ‘+’ ‘(’ int2 ‘+’ int3 ‘)’
• During parsing we build a parse tree …

Example of Parse Tree

            E
          / | \
         E  +  E
         |   / | \
       int5 (  E  )
             / | \
            E  +  E
            |     |
          int2  int3

• Traces the operation of the parser
• Captures the nesting structure
• But too much info
  – Parentheses
  – Single-successor nodes

Example of Abstract Syntax Tree

        PLUS
       /    \
      5     PLUS
           /    \
          2      3

• Also captures the nesting structure
• But abstracts from the concrete syntax ⇒ more compact and easier to use
• An important data structure in a compiler

Semantic Actions
• This is what we will use to construct ASTs
• Each grammar symbol may have attributes
  – An attribute is a property of a programming language construct
  – For terminal symbols (lexical tokens) attributes can be calculated by the lexer
• Each production may have an action
  – Written as: X → Y1 … Yn { action }
  – That can refer to or compute symbol attributes

Semantic Actions: An Example
• Consider the grammar
    E → int | E + E | ( E )
• For each symbol X define an attribute X.val
  – For terminals, val is the associated lexeme
  – For non-terminals, val is the expression’s value (which is computed from values of subexpressions)
• We annotate the grammar with actions:
    E → int        { E.val = int.val }
      | E1 + E2    { E.val = E1.val + E2.val }
      | ( E1 )     { E.val = E1.val }

Semantic Actions: An Example (Cont.)
• String: 5 + (2 + 3)
• Tokens: int5 ‘+’ ‘(’ int2 ‘+’ int3 ‘)’

    Productions        Equations
    E  → E1 + E2       E.val  = E1.val + E2.val
    E1 → int5          E1.val = int5.val = 5
    E2 → ( E3 )        E2.val = E3.val
    E3 → E4 + E5       E3.val = E4.val + E5.val
    E4 → int2          E4.val = int2.val = 2
    E5 → int3          E5.val = int3.val = 3

Semantic Actions: Dependencies
• Semantic actions specify a system of equations
  – Order of executing the actions is not specified
• Example:
    E3.val = E4.val + E5.val
  – Must compute E4.val and E5.val before E3.val
  – We say that E3.val depends on E4.val and E5.val
• The parser must find the order of evaluation

Dependency Graph
• Each node labeled with a non-terminal E has one slot for its val attribute
• Note the dependencies
[Figure: the parse tree for 5 + (2 + 3), with nodes E, E1, …, E5 and tokens int5, int2, int3, each carrying a val slot; edges between the slots show the dependencies]

Evaluating Attributes
• An attribute must be computed after all its successors in the dependency graph have been computed
  – In the previous example attributes can be computed bottom-up
• Such an order exists when there are no cycles
  – Cyclically defined attributes are not legal
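The ordering requirement above can be made concrete with a small sketch. The Python below is an illustration only, not part of the lecture. It assumes a dictionary representation of the six equations for 5 + (2 + 3), where each attribute maps to the attributes it depends on and a function that computes it. The standard-library graphlib.TopologicalSorter (Python 3.9 or later) then produces an order in which every attribute comes after the attributes it depends on. If the equations were cyclic, graphlib would raise CycleError, which matches the rule that cyclically defined attributes are not legal.

```python
from graphlib import CycleError, TopologicalSorter

# Assumed representation: attribute name -> (dependencies, function).
# The names follow the equations for the string 5 + (2 + 3).
equations = {
    "int5.val": ((), lambda: 5),
    "int2.val": ((), lambda: 2),
    "int3.val": ((), lambda: 3),
    "E1.val": (("int5.val",), lambda a: a),
    "E4.val": (("int2.val",), lambda a: a),
    "E5.val": (("int3.val",), lambda a: a),
    "E3.val": (("E4.val", "E5.val"), lambda a, b: a + b),
    "E2.val": (("E3.val",), lambda a: a),
    "E.val": (("E1.val", "E2.val"), lambda a, b: a + b),
}


def evaluate(equations):
    """Evaluate every attribute after all the attributes it depends on."""
    # TopologicalSorter expects node -> set of predecessors (dependencies).
    graph = {name: set(deps) for name, (deps, _) in equations.items()}
    values = {}
    for name in TopologicalSorter(graph).static_order():
        deps, compute = equations[name]
        values[name] = compute(*(values[d] for d in deps))
    return values


values = evaluate(equations)
print(values["E.val"])  # 10
```

Because every attribute here is computed only from attributes of its children, any bottom-up traversal of the parse tree would give the same result; the topological sort is the more general mechanism.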
Semantic Actions: Notes (Cont.)
• Synthesized attributes
  – Calculated from attributes of descendants in the parse tree
  – E.val is a synthesized attribute
  – Can always be calculated in a bottom-up order
• Grammars with only synthesized attributes are called S-attributed grammars
  – Most frequent kinds of grammars

Inherited Attributes
• Another kind of attributes
• Calculated from attributes of the parent node(s) and/or siblings in the parse tree
• Example: a line calculator

A Line Calculator
• Each line contains an expression
    E → int | E + E
• Each line is terminated with the = sign
    L → E = | + E =
• In the second form, the value of evaluation of the previous line is used as starting value
• A program is a sequence of lines
    P → ε | P L

Attributes for the Line Calculator
• Each E has a synthesized attribute val
  – Calculated as before
• Each L has a synthesized attribute val
    L → E =      { L.val = E.val }
      | + E =    { L.val = E.val + L.prev }
• We need the value of the previous line
• We use an inherited attribute L.prev

Attributes for the Line Calculator (Cont.)
• Each P has a synthesized attribute val
  – The value of its last line
    P → ε       { P.val = 0 }
      | P1 L    { P.val = L.val; L.prev = P1.val }
• Each L has an inherited attribute prev
  – L.prev is inherited from sibling P1.val
• Example …

Example of Inherited Attributes
• val synthesized
• prev inherited
• All can be computed in depth-first order
[Figure: parse tree P → P L in which the inner P derives ε (val 0) and L → + E = with E → E4 + E5 over int2 and int3; the val and prev slots are drawn on the nodes]
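The line calculator attributes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's implementation: each line is assumed to be already parsed into a pair (uses_prev, operands), where uses_prev is True for the form + E = and operands lists the integers that E adds up. The parameter prev plays the role of the inherited attribute L.prev, and the return values play the role of the synthesized val attributes.

```python
def eval_line(line, prev):
    """L -> E = | + E =, with prev standing in for the inherited L.prev."""
    uses_prev, operands = line
    e_val = sum(operands)                 # E -> int | E + E
    if uses_prev:
        return e_val + prev               # L -> + E =  { L.val = E.val + L.prev }
    return e_val                          # L -> E =    { L.val = E.val }


def eval_program(lines):
    """P -> ε | P1 L, returning P.val (the value of the last line)."""
    p_val = 0                             # P -> ε { P.val = 0 }
    for line in lines:
        # L.prev = P1.val, then P.val = L.val
        p_val = eval_line(line, prev=p_val)
    return p_val


# The program "5 =" followed by "+ 2 + 3 =".
print(eval_program([(False, [5]), (True, [2, 3])]))  # 10
```

The value flows down into each line through prev and back up through the return value, which is the depth-first order the example slide refers to.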
Semantic Actions: Notes (Cont.)
• Semantic actions can be used to build ASTs
• And many other things as well
  – Also used for type checking, code generation, …
• Process is called syntax-directed translation
  – Substantial generalization over CFGs

Constructing an AST
• We first define the AST data type
• Consider an abstract tree type with two constructors:
    mkleaf(n)      = a leaf node n
    mkplus(T1, T2) = a PLUS node with subtrees T1 and T2

Constructing a Parse Tree
• We define a synthesized attribute ast
  – Values of the ast attribute are ASTs
  – We assume that int.lexval is the value of the integer lexeme
  – Computed using semantic actions
    E → int        { E.ast = mkleaf(int.lexval) }
      | E1 + E2    { E.ast = mkplus(E1.ast, E2.ast) }
      | ( E1 )     { E.ast = E1.ast }

Parse Tree Example
• Consider the string int5 ‘+’ ‘(’ int2 ‘+’ int3 ‘)’
• A bottom-up evaluation of the ast attribute:
    E.ast = mkplus(mkleaf(5), mkplus(mkleaf(2), mkleaf(3)))

        PLUS
       /    \
      5     PLUS
           /    \
          2      3

Review of Abstract Syntax Trees
• We can specify language syntax using CFG
• A parser will answer whether s ∈ L(G)
• … and will build a parse tree
• … which we convert to an AST
• … and pass on to the rest of the compiler
• Next two & a half lectures:
  – How do we answer s ∈ L(G) and build a parse tree?
• After that: from AST to assembly language

Second-Half of Lecture 5: Outline
• Implementation of parsers
• Two approaches
  – Top-down
  – Bottom-up
• Today: Top-Down
  – Easier to understand and program manually
• Then: Bottom-Up
  – More powerful and used by most parser generators

Introduction to Top-Down Parsing
• Terminals are seen in order of appearance in the token stream:
    t2 t5 t6 t8 t9
• The parse tree is constructed
  – From the top
  – From left to right
[Figure: a parse tree whose nodes are numbered 1 to 9 in the order they are constructed; the leaves t2, t5, t6, t8, t9 are the terminals]

Recursive Descent Parsing
• Consider the grammar
    E → T + E | T
    T → int | int * T | ( E )
• Token stream is: int5 * int2
• Start with top-level non-terminal E
• Try the rules for E in order
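The try-the-rules-in-order strategy, combined with the AST-building actions from the first half, can be sketched as follows. This is a hedged illustration rather than the lecture's code. It assumes that int tokens are Python integers and all other tokens are one-character strings. mkleaf and mkplus follow the constructors defined in the slides, while mktimes is a hypothetical third constructor added here because this grammar has a * operator. Backtracking is expressed with generators: each function yields every (ast, next_index) pair that matches, so a caller that gets stuck simply asks for the next alternative.

```python
def mkleaf(n):
    return n


def mkplus(t1, t2):
    return ("PLUS", t1, t2)


def mktimes(t1, t2):
    # Hypothetical constructor, not defined in the slides.
    return ("TIMES", t1, t2)


def parse_E(toks, i):
    """Yield (ast, next_index) for every way E can match toks[i:]."""
    for t, j in parse_T(toks, i):                       # E -> T + E
        if j < len(toks) and toks[j] == "+":
            for e, k in parse_E(toks, j + 1):
                yield mkplus(t, e), k
    yield from parse_T(toks, i)                         # E -> T


def parse_T(toks, i):
    """Yield (ast, next_index) for every way T can match toks[i:]."""
    if i >= len(toks):
        return
    if toks[i] == "(":                                  # T -> ( E )
        for e, j in parse_E(toks, i + 1):
            if j < len(toks) and toks[j] == ")":
                yield e, j + 1
    elif isinstance(toks[i], int):
        yield mkleaf(toks[i]), i + 1                    # T -> int
        if i + 1 < len(toks) and toks[i + 1] == "*":    # T -> int * T
            for t, j in parse_T(toks, i + 2):
                yield mktimes(mkleaf(toks[i]), t), j


def parse(toks):
    """Return the first AST that consumes all tokens, or None on error."""
    for ast, j in parse_E(toks, 0):
        if j == len(toks):
            return ast
    return None


print(parse([5, "*", 2]))                     # ('TIMES', 5, 2)
print(parse([5, "+", "(", 2, "+", 3, ")"]))   # ('PLUS', 5, ('PLUS', 2, 3))
```

For int5 * int2 the first alternative E → T + E yields nothing, because no + follows either way of matching T, so the parse succeeds only through E → T with T → int * T.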
Recursive Descent Parsing. Example (Cont.)
Token stream: int5 * int2
Grammar:
    E → T + E | T
    T → ( E ) | int | int * T
• Try E0 → T1 + E2
• Then try a rule for T1 → ( E3 )
  – But ( does not match input token int5
• Try T1 → int. Token matches.
  – But + after T1 does not match input token *
• Try T1 → int * T2
  – This will match and will consume the two tokens.
    • Try T2 → int (matches) but + after T1 will be unmatched
    • Try T2 → int * T3 but * does not match with end-of-input
• Has exhausted the choices for T1
  – Backtrack to choice for E0

Recursive Descent Parsing. Example (Cont.)
Token stream: int5 * int2
• Try E0 → T1
• Follow same steps as before for T1
  – And succeed with T1 → int5 * T2 and T2 → int2
  – With the following parse tree

        E0
        |
        T1
      / | \
   int5 *  T2
           |
          int2

Recursive Descent Parsing. Notes.
• Easy to implement by hand
• Somewhat inefficient (due to backtracking)
• But does not always work …

When Recursive Descent Does Not Work
• Consider a production S → S a
    bool S1() { return S() && term(a); }
    bool S()  { return S1(); }
• S() will get into an infinite loop
• A left-recursive grammar has a non-terminal S
    S →+ S α for some α
• Recursive descent does not work in such cases

Elimination of Left Recursion
• Consider the left-recursive grammar
    S → S α | β
• S generates all strings starting with a β and followed by any number of α’s
• The grammar can be rewritten using right-recursion
    S  → β S’
    S’ → α S’ | ε

More Elimination of Left-Recursion
• In general
    S → S α1 | … | S αn | β1 | … | βm
• All strings derived from S start with one of β1,…,βm and continue with several instances of α1,…,αn
• Rewrite as
    S  → β1 S’ | … | βm S’
    S’ → α1 S’ | … | αn S’ | ε

General Left Recursion
• The grammar
    S → A α | δ
    A → S β
  is also left-recursive because
    S →+ S β α
• This left-recursion can also be eliminated
  [See a Compilers book for a general algorithm]

Summary of Recursive Descent
• Simple and general parsing strategy
  – Left-recursion must be eliminated first
  – … but that can be done automatically
• Unpopular because of backtracking
  – Thought to be too inefficient
• In practice, backtracking is eliminated by restricting the grammar

Predictive Parsers
• Like recursive-descent but parser can “predict” which production to use
  – By looking at the next few tokens
  – No backtracking
• Predictive parsers accept LL(k) grammars
  – L means “left-to-right” scan of input
  – L means “leftmost derivation”
  – k means “predict based on k tokens of lookahead”
• In practice, LL(1) is used

LL(1) Languages
• In recursive-descent, for each non-terminal and input token there may be a choice of production
• LL(1) means that for each non-terminal and token there is only one production
• Can be specified via 2D tables
  – One dimension for current non-terminal to expand
  – One dimension for next token
  – A table entry contains one production

Predictive Parsing and Left Factoring
• Recall the grammar for arithmetic expressions
    E → T + E | T
    T → ( E ) | int

Left-Factoring Example
• Recall the grammar
    E → T + E | T
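The rewrite for immediate left recursion given earlier (S → S α1 | … | S αn | β1 | … | βm) is mechanical, and a minimal sketch of it is shown below. The representation is an assumption made for this example: a production's right-hand side is a list of symbol names, the empty list stands for ε, and the new non-terminal is named by appending an apostrophe. The sketch handles only immediate left recursion with non-empty αi. The general case (such as S → A α | δ, A → S β) needs the fuller algorithm the slides defer to a compilers book.

```python
def eliminate_left_recursion(nt, productions):
    """Rewrite the productions of nt so none starts with nt itself.

    productions is a list of right-hand sides, each a list of symbols;
    [] stands for ε. Returns a dict mapping non-terminals to their
    rewritten right-hand sides.
    """
    new_nt = nt + "'"
    # S -> S αi: keep only αi, the part after the leading S.
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    # S -> βj: everything that does not start with S.
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not alphas:
        return {nt: productions}          # nothing to eliminate
    return {
        nt: [beta + [new_nt] for beta in betas],                  # S  -> βj S'
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],    # S' -> αi S' | ε
    }


print(eliminate_left_recursion("S", [["S", "a"], ["b"]]))
# {'S': [['b', "S'"]], "S'": [['a', "S'"], []]}
```

The result for S → S a | b is S → b S’ and S’ → a S’ | ε, which is the right-recursive form from the Elimination of Left Recursion slide.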
