Review of

Abstract Syntax Trees • Given a language L(G), a parser consumes a & sequence of tokens s and produces a parse

Top-Down Parsing • Issues: – How do we recognize that s ∈ L(G) ? – A of s describes how s ∈ L(G) – Ambiguity: more than one parse tree (possible interpretation) for some string s – Error: no parse tree for some string s – How do we construct the parse tree?

2

Abstract Syntax Trees Abstract Syntax Trees (Cont.)

• So far, a parser traces the derivation of a • Consider the grammar sequence of tokens E → int | ( E ) | E + E • The rest of the needs a structural • And the string representation of the program 5 + (2 + 3) • Abstract syntax trees • After (a list of tokens) – Like parse trees but ignore some details int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’

– Abbreviated as AST • During parsing we build a parse tree …

3 4 Example of Parse Tree Example of Abstract Syntax Tree

PLUS E • Traces the operation of the parser PLUS E + E • Captures the nesting structure • But too much info int5 ( E ) 5 2 3 – Parentheses – Single-successor nodes • Also captures the nesting structure + E E • But abstracts from the concrete syntax a more compact and easier to use int 2 int3 • An important in a compiler

5 6

Semantic Actions Semantic Actions: An Example

• This is what we will use to construct ASTs • Consider the grammar E → int | E + E | ( E ) • Each grammar symbol may have attributes • For each symbol X define an attribute X.val

– An attribute is a property of a programming – For terminals, val is the associated lexeme

language construct – For non-terminals, val is the expression’s value – For terminal symbols (lexical tokens) attributes can (which is computed from values of subexpressions) be calculated by the lexer • We annotate the grammar with actions: • Each production may have an action E → int { E.val = int.val }

– Written as: X → Y1 …Yn { action } | E1 + E2 { E.val = E1.val + E2.val } – That can refer to or compute symbol attributes | ( E1 ) { E.val = E1.val }

7 8 Semantic Actions: An Example (Cont.) Semantic Actions: Dependencies

Semantic actions specify a system of equations • String: 5 + (2 + 3) – Order of executing the actions is not specified

• Tokens: int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’

• Example: Productions Equations E3.val = E4.val + E5.val E → E1 + E2 E.val = E1.val + E2.val – Must compute E .val and E .val before E .val 4 5 3 E1 → int5 E1.val = int5.val = 5 – We say that E .val depends on E .val and E .val 3 4 5 E2 → (E 3) E2.val = E3.val

E3 → E4 + E5 E3.val = E4.val + E5.val • The parser must find the order of evaluation E4 → int 2 E4.val = int2.val = 2

E5 → int 3 E5.val = int3.val = 3

9 10

Dependency Graph Evaluating Attributes

• Each node labeled with • An attribute must be computed after all its + E a non-terminal E has successors in the dependency graph have been one slot for its val computed E + E attribute 1 2 – In the previous example attributes can be • Note the dependencies computed bottom-up int 5 5 ( E3 + ) • Such an order exists when there are no cycles + E4 E – Cyclically defined attributes are not legal 5 int 2 2 int3 3

11 12 Semantic Actions: Notes (Cont.) Inherited Attributes

• Synthesized attributes • Another kind of attributes – Calculated from attributes of descendents in the • Calculated from attributes of the parent parse tree node(s) and/or siblings in the parse tree – E.val is a synthesized attribute – Can always be calculated in a bottom-up order • Example: a line calculator

• Grammars with only synthesized attributes are called S-attributed grammars – Most frequent kinds of grammars

13 14

A Line Calculator Attributes for the Line Calculator

• Each line contains an expression • Each E has a synthesized attribute val E → int | E + E – Calculated as before

• Each line is terminated with the = sign • Each L has a synthesized attribute val

L → E = | + E = L → E = { L.val = E.val } | + E = { L.val = E.val + L.prev } • In the second form, the value of evaluation of the previous line is used as starting value • We need the value of the previous line

• A program is a sequence of lines • We use an inherited attribute L.prev

P →ε| P L

15 16 Attributes for the Line Calculator (Cont.) Example of Inherited Attributes

• Each P has a synthesized attribute val • val synthesized – The value of its last line P P →ε { P.val = 0 } + P L • prev inherited | P1 L { P.val = L.val;

L.prev = P1.val } + = • Each L has an inherited attribute prev ε + E3 0 • All can be – L.prev is inherited from sibling P1.val computed in E + 4 E5 depth-first • Example … order int 2 2 int3 3

17 18

Semantic Actions: Notes (Cont.) Constructing an AST

• Semantic actions can be used to build ASTs • We first define the AST data type • Consider an abstract tree type with two • And many other things as well constructors: – Also used for type checking, code generation, …

• Process is called syntax-directed translation mkleaf(n) = n – Substantial generalization over CFGs mkplus( , ) = PLUS

T1 T2 T1 T2

19 20 Constructing a Parse Tree Parse Tree Example

• We define a synthesized attribute ast – Values of ast values are ASTs • Consider the string int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’

– We assume that int. lexval is the value of the • A bottom-up evaluation of the ast attribute: integer lexeme E. ast = mkplus(mkleaf(5), – Computed using semantic actions mkplus(mkleaf(2), mkleaf(3))

PLUS E → int { E.ast = mkleaf(int.lexval) }

| E1 + E2 { E.ast = mkplus(E1.ast, E2.ast) } PLUS

| ( E1 ) { E.ast = E1.ast }

5 2 3

21 22

Review of Abstract Syntax Trees Second-Half of Lecture 5: Outline

• We can specify language syntax using CFG • Implementation of parsers • A parser will answer whether s ∈ L(G) • Two approaches • … and will build a parse tree – Top-down

• … which we convert to an AST – Bottom-up

• … and pass on to the rest of the compiler • Today: Top-Down – Easier to understand and program manually

• Next two & a half lectures: • Then: Bottom-Up

– How do we answer s ∈ L(G) and build a parse tree? – More powerful and used by most parser generators

• After that: from AST to assembly language

23 24 Introduction to Top-Down Parsing Recursive Descent Parsing

• Consider the grammar • Terminals are seen in order of E → T + E | T 1 appearance in the token T → int | int * T | ( E ) stream: t2 3 t9 • Token stream is: int5 * int2 t t t t t 2 5 6 8 9 • Start with top-level non-terminal E 4 7 • The parse tree is constructed t t t • Try the rules for E in order – From the top 5 6 8 – From left to right

25 26

Recursive Descent Parsing. Example (Cont.) Recursive Descent Parsing. Example (Cont.)

Token stream: int5 * int2 Token stream: int5 * int2 • Try E0 → T1 + E2 • Try E0 → T1

• Then try a rule for T1 → ( E3 ) • Follow same steps as before for T1

– But ( does not match input token int5 – And succeed with T1 → int5 * T2 and T2 → int2 – With the following parse tree • Try T1 → int . Token matches.

– But + after T1 does not match input token * E 0 • Try T1 → int * T2

– This will match and will consume the two tokens. T1 • Try T2 → int (matches) but + after T1 will be unmatched

• Try T → int * T but * does not match with end-of-input 2 3 int5 * T2

• Has exhausted the choices for ET →1 T + E | T E → T + E | T

– Backtrack to choice for E0 T → (E) | int | int * T int2 T → (E) | int | int * T

27 28 Recursive Descent Parsing. Notes. When Recursive Descent Does Not Work

• Easy to implement by hand • Consider a production S → S a

bool S1() { return S() && term(a); }

• Somewhat inefficient (due to ) bool S() { return S1(); } • S() will get into an infinite loop

• But does not always work … • A left-recursive grammar has a non-terminal S S →+ Sα for some α

• Recursive descent does not work in such cases

29 30

Elimination of Left Recursion More Elimination of Left-Recursion

• Consider the left-recursive grammar • In general S → S α | β S → S α1 | … | S αn | β1 | … | βm

• S generates all strings starting with a β and • All strings derived from S start with one of followed by any number of α’s β1,…,βm and continue with several instances of

α1,…,αn • The grammar can be rewritten using right- • Rewrite as

recursion S →β1 S’ | … | βm S’ →β S S’ S’ →α1 S’ | … | αn S’ | ε

S’ →αS’ | ε

31 32 General Left Recursion Summary of Recursive Descent

• The grammar • Simple and general parsing strategy S → A α | δ – Left-recursion must be eliminated first

A → S β – … but that can be done automatically

is also left-recursive because • Unpopular because of backtracking

S →+ S βα – Thought to be too inefficient

• This left-recursion can also be eliminated • In practice, backtracking is eliminated by restricting the grammar [See a book for a general algorithm]

33 34

Predictive Parsers LL(1) Languages

• Like recursive-descent but parser can • In recursive-descent, for each non-terminal “predict” which production to use and input token there may be a choice of – By looking at the next few tokens production – No backtracking • LL(1) means that for each non-terminal and • Predictive parsers accept LL(k) grammars token there is only one production – L means “left-to-right” scan of input • Can be specified via 2D tables – L means “leftmost derivation” – One dimension for current non-terminal to expand – k means “predict based on k tokens of lookahead” – One dimension for next token • In practice, LL(1) is used – A table entry contains one production

35 36 Predictive Parsing and Left Factoring Left-Factoring Example

• Recall the grammar for arithmetic expressions • Recall the grammar E → T + E | T E → T + E | T T → ( E ) | int | int * T T → ( E ) | int | int * T

• Hard to predict because • Factor out common prefixes of productions – For T two productions start with int E → T X – For E it is not clear how to predict X → + E | ε

T → ( E ) | int Y • A grammar must be left-factored before it is Y → * T | ε used for predictive parsing

37 38

LL(1) Parsing Table Example LL(1) Parsing Table Example (Cont.)

• Left-factored grammar • Consider the [E, int] entry E → T X X → + E | ε – “When current non-terminal is E and next input is T → ( E ) | int Y Y → * T | ε int, use production E → T X – This production can generate an int in the first place • The LL(1) parsing table: • Consider the [Y,+] entry int * + ( ) $ – “When current non-terminal is Y and current token E T X T X is +, get rid of Y” X + E ε ε – Y can be followed by + only in a derivation in which T int Y ( E ) Y →ε Y * T ε ε ε

39 40 LL(1) Parsing Tables: Errors Using Parsing Tables

• Blank entries indicate error situations • Method similar to recursive descent, except – Consider the [E,*] entry – For each non-terminal S – “There is no way to derive a string starting with * – We look at the next token a from non-terminal E” – And chose the production shown at [S,a] • We use a stack to keep track of pending non- terminals • We reject when we encounter an error state • We accept when we encounter end-of-input

41 42

LL(1) Parsing Algorithm LL(1) Parsing Example

Stack Input Action initialize stack = and next E $ int * int $ T X repeat T X $ int * int $ int Y case stack of int Y X $ int * int $ terminal : if T[X,*next] = Y1 …Yn Y X $ * int $ * T then stack ← ; 1 n * T X $ * int $ terminal else error(); : if t == *next++ T X $ int $ int Y then stack ← ; int Y X $ int $ terminal else error(); Y X $ $ ε until stack == <> X $ $ ε

$ $ ACCEPT int * + ( ) $ E T X T X X + E ε ε T int Y ( E ) 43 Y * T ε ε ε44 Constructing Parsing Tables Constructing Parsing Tables (Cont.)

• LL(1) languages are those defined by a parsing • If A →α, where in the line of A we place α ? table for the LL(1) algorithm • In the column of t where t can start a string • No table entry can be multiply defined derived from α – α→* t β

• We want to generate parsing tables from CFG – We say that t ∈ First(α) • In the column of t if α is ε and t can follow an A – S →* β A t δ

– We say t ∈ Follow(A)

45 46

Computing First Sets First Sets: Example

Definition • Recall the grammar

First(X) = { t | X →* tα} ∪ {ε | X →* ε} E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε Algorithm sketch • First sets 1. First(t) = { t } First( ( ) = { ( } First( ) ) = { ) } 2. ε∈First(X) if X →εis a production First( + ) = { + } First( * ) = { * } 3. ε∈First(X) if X → A1 …An First( int) = { int } and ε∈First(A ) for each 1 ≤ i ≤ n i 4. First(α) ⊆ First(X) if X → A …A α First( T ) = { int, ( } 1 n and ε∈First(A ) for each 1 ≤ i ≤ n First( E ) = { int, ( } i First( X ) = { +, ε } First( Y ) = { *, ε } 47 48 Computing Follow Sets Computing Follow Sets (Cont.)

• Definition Algorithm sketch Follow(X) = { t | S →* β X t δ } 1. $ ∈ Follow(S)

2. First( β) - {ε} ⊆ Follow(X) • Intuition For each production A →αX β – If X → A B then First(B) ⊆ Follow(A) 3. Follow(A) ⊆ Follow(X) and Follow(X) ⊆ Follow(B) For each production A →αX β where ε∈First(β) – Also if B →* ε then Follow(X) ⊆ Follow(A)

– If S is the start symbol then $ ∈ Follow(S)

49 50

Follow Sets: Example Constructing LL(1) Parsing Tables

• Recall the grammar • Construct a parsing table T for CFG G E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε • For each production A →αin G do: • Follow sets – For each terminal t ∈ First(α) do Follow( + ) = { int, ( } Follow( * ) = { int, ( } • T[A, t] = α

Follow( ( ) = { int, ( } Follow( E ) = { ), $ } – If ε∈First(α), for each t ∈ Follow(A) do • T[A, t] = α Follow( X ) = { $, ) } Follow( T ) = { +, ) , $ } – If ε∈First(α) and $ ∈ Follow(A) do Follow( ) ) = { +, ) , $ } Follow( Y ) = { +, ) , $ } • T[A, $] = α Follow( int) = { *, +, ) , $ }

51 52 Notes on LL(1) Parsing Tables Review

• If any entry is multiply defined then G is not • For some grammars there is a simple parsing LL(1) strategy – If G is ambiguous Predictive parsing – If G is left recursive – If G is not left-factored • Next time: a more powerful parsing strategy – And in other cases as well • Most grammars are not LL(1) • There are tools that build LL(1) tables

53 54