Announcements

• PA1 – Due today at midnight Error Handling – README, test case Syntax-Directed Translation – Your name(s)! Recursive Descent • WA1 – Due today at 5pm Lecture 6 • PA2 – Assigned today • WA2 – Assigned Tuesday

Prof. Aiken CS 143 Lecture 6 1 Prof. Aiken CS 143 Lecture 6 2

Outline Error Handling

• Extensions of CFG for parsing • Purpose of the is – Precedence declarations – To detect non-valid programs – Error handling – To translate the valid ones – Semantic actions • Many kinds of possible errors (e.g. in ) Error kind Example Detected by … • Constructing a parse Lexical … $ … Lexer Syntax … x *% … Parser • Recursive descent Semantic … int x; y = x(3); … Type checker Correctness your favorite program Tester/User

Prof. Aiken CS 143 Lecture 6 3 Prof. Aiken CS 143 Lecture 6 4

Syntax Error Handling Approaches to Syntax Error Recovery

• Error handler should • From simple to complex – Report errors accurately and clearly – Panic mode – Recover from an error quickly – Error productions – Not slow down compilation of valid code – Automatic local or global correction

• Good error handling is not easy to achieve • Not all are supported by all parser generators

Prof. Aiken CS 143 Lecture 6 5 Prof. Aiken CS 143 Lecture 6 6

1 Error Recovery: Panic Mode Syntax Error Recovery: Panic Mode (Cont.)

• Simplest, most popular method • Consider the erroneous expression (1 + + 2) + 3 • When an error is detected: • Panic-mode recovery: – Discard tokens until one with a clear role is found – Skip ahead to next integer and then continue – Continue from there • Bison: use the special terminal error to • Such tokens are called synchronizing tokens describe how much input to skip – Typically the statement or expression terminators E  int | E + E | ( E ) | error int | ( error )

Prof. Aiken CS 143 Lecture 6 7 Prof. Aiken CS 143 Lecture 6 8

Syntax Error Recovery: Error Productions Error Recovery: Local and Global Correction

• Idea: specify in the grammar known common mistakes • Idea: find a correct “nearby” program – Try token insertions and deletions • Essentially promotes common errors to alternative – Exhaustive search syntax

• Example: • Disadvantages: – Write 5 x instead of 5 * x – Hard to implement – Add the production E  … | E E – Slows down parsing of correct programs • Disadvantage – “Nearby” is not necessarily “the intended” program – Complicates the grammar – Not all tools support it

Prof. Aiken CS 143 Lecture 6 9 Prof. Aiken CS 143 Lecture 6 10

Syntax Error Recovery: Past and Present Abstract Syntax Trees

• Past • So far a parser traces the derivation of a – Slow recompilation cycle (even once a day) sequence of tokens – Find as many errors in one cycle as possible – Researchers could not let go of the topic • The rest of the compiler needs a structural • Present representation of the program – Quick recompilation cycle – Users tend to correct one error/cycle • Abstract syntax trees – Complex error recovery is less compelling – Like parse trees but ignore some details – Panic-mode seems enough – Abbreviated as AST

Prof. Aiken CS 143 Lecture 6 11 Prof. Aiken CS 143 Lecture 6 12

2 . (Cont.) Example of

• Consider the grammar E E  int | ( E ) | E + E • Traces the operation E + E of the parser • And the string 5 + (2 + 3) • Does capture the int5 ( E ) nesting structure

• After (a list of tokens) • But too much info E + int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’ E – Parentheses – Single-successor int nodes • During parsing we build a parse tree … 2 int3

Prof. Aiken CS 143 Lecture 6 13 Prof. Aiken CS 143 Lecture 6 14

Example of Abstract Syntax Tree Semantic Actions

PLUS • This is what we’ll use to construct ASTs

PLUS • Each grammar symbol may have attributes – For terminal symbols (lexical tokens) attributes can 5 2 3 be calculated by the lexer • Also captures the nesting structure • But abstracts from the concrete syntax • Each production may have an action – Written as: X  Y … Y { action } => more compact and easier to use 1 n – That can refer to or compute symbol attributes • An important in a compiler

Prof. Aiken CS 143 Lecture 6 15 Prof. Aiken CS 143 Lecture 6 16

Semantic Actions: An Example Semantic Actions: An Example (Cont.)

• Consider the grammar • String: 5 + (2 + 3) E  int | E + E | ( E ) • Tokens: int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’ • For each symbol X define an attribute X.val – For terminals, val is the associated lexeme Productions Equations – For non-terminals, val is the expression’s value (and is  computed from values of subexpressions) E E1 + E2 E.val = E1.val + E2.val E1  int5 E1.val = int5.val = 5

• We annotate the grammar with actions: E2  ( E3) E2.val = E3.val E  int { E.val = int.val } E3  E4 + E5 E3.val = E4.val + E5.val | E1 + E2 { E.val = E1.val + E2.val } E4  int2 E4.val = int2.val = 2 | ( E1 ) { E.val = E1.val } E5  int3 E5.val = int3.val = 3 Prof. Aiken CS 143 Lecture 6 17 Prof. Aiken CS 143 Lecture 6 18

3 Semantic Actions: Notes Dependency Graph

• Semantic actions specify a system of E + • Each node labeled E equations has one slot for the val – Order of resolution is not specified E1 + E2 attribute • Note the dependencies • Example: int 5 5 ( E3 + ) E3.val = E4.val + E5.val – Must compute E4.val and E5.val before E3.val E + – We say that E3.val depends on E4.val and E5.val 4 E5

int 2 • The parser must find the order of evaluation 2 int3 3

Prof. Aiken CS 143 Lecture 6 19 Prof. Aiken CS 143 Lecture 6 20

Evaluating Attributes Dependency Graph

• An attribute must be computed after all its E 10 successors in the dependency graph have been 5 5 computed E1 + E2 – In previous example attributes can be computed bottom-up int 5 5 ( E3 5 )

• Such an order exists when there are no cycles E 2 + 4 E5 3 – Cyclically defined attributes are not legal int 2 2 int3 3

Prof. Aiken CS 143 Lecture 6 21 Prof. Aiken CS 143 Lecture 6 22

Semantic Actions: Notes (Cont.) Inherited Attributes

• Synthesized attributes • Another kind of attribute – Calculated from attributes of descendents in the parse tree • Calculated from attributes of parent and/or – E.val is a synthesized attribute siblings in the parse tree – Can always be calculated in a bottom-up order

• Grammars with only synthesized attributes • Example: a line calculator are called S-attributed grammars – Most common case

Prof. Aiken CS 143 Lecture 6 23 Prof. Aiken CS 143 Lecture 6 24

4 A Line Calculator Attributes for the Line Calculator

• Each line contains an expression • Each E has a synthesized attribute val E  int | E + E – Calculated as before • Each line is terminated with the = sign • Each L has an attribute val L  E = | + E = L  E = { L.val = E.val } | + E = { L.val = E.val + L.prev } • In second form the value of previous line is used as starting value • We need the value of the previous line • A program is a sequence of lines • We use an inherited attribute L.prev P   | P L

Prof. Aiken CS 143 Lecture 6 25 Prof. Aiken CS 143 Lecture 6 26

Attributes for the Line Calculator (Cont.) Example of Inherited Attributes

• Each P has a synthesized attribute val P • val synthesized – The value of its last line + P   { P.val = 0 } P L

| P1 L { P.val = L.val; • prev inherited = L.prev = P1.val } +  + E3 – Each L has an inherited attribute prev 0 – L.prev is inherited from sibling P .val • All can be 1 E + 4 E5 computed in depth-first • Example … int 2 order 2 int3 3

Prof. Aiken CS 143 Lecture 6 27 Prof. Aiken CS 143 Lecture 6 28

Example of Inherited Attributes Semantic Actions: Notes (Cont.)

P 5 • val synthesized • Semantic actions can be used to build ASTs

P 0 0 L 5 • And many other things as well • prev inherited – Also used for type checking, code generation, … + E 5 =  0 3 • All can be • Process is called syntax-directed translation E 2 + 4 E5 3 computed in – Substantial generalization over CFGs depth-first int 2 order 2 int3 3

Prof. Aiken CS 143 Lecture 6 29 Prof. Aiken CS 143 Lecture 6 30

5 Constructing An AST Constructing a Parse Tree

• We first define the AST data type • We define a synthesized attribute ast – Supplied by us for the project – Values of ast values are ASTs • Consider an abstract tree type with two – We assume that int.lexval is the value of the constructors: integer lexeme – Computed using semantic actions mkleaf(n) = n E  int E.ast = mkleaf(int.lexval) PLUS mkplus( , ) = | E1 + E2 E.ast = mkplus(E1.ast, E2.ast)

| ( E1 ) E.ast = E1.ast

T1 T2 T1 T2 Prof. Aiken CS 143 Lecture 6 31 Prof. Aiken CS 143 Lecture 6 32

Parse Tree Example Summary

• Consider the string int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’ • We can specify language syntax using CFG • A bottom-up evaluation of the ast attribute: E.ast = mkplus(mkleaf(5), • A parser will answer whether s  L(G) mkplus(mkleaf(2), mkleaf(3)) – … and will build a parse tree PLUS – … which we convert to an AST – … and pass on to the rest of the compiler PLUS

5 2 3

Prof. Aiken CS 143 Lecture 6 33 Prof. Aiken CS 143 Lecture 6 34

Intro to Top-Down Parsing: The Idea Recursive Descent Parsing

• Consider the grammar • The parse tree is constructed 1 E  T |T + E – From the top T  int | int * T | ( E )

– From left to right t2 3 t9

• Token stream is: ( int5 ) • Terminals are seen in order of 4 7 appearance in the token • Start with top-level non-terminal E stream: t5 t6 t8 – Try the rules for E in order

t2 t5 t6 t8 t9

Prof. Aiken CS 143 Lecture 6 35 Prof. Aiken CS 143 Lecture 6 36

6 Recursive Descent Parsing Recursive Descent Parsing

E  T |T + E E  T |T + E T  int | int * T | ( E ) T  int | int * T | ( E )

E E

T

( int5 ) ( int5 ) Prof. Aiken CS 143 Lecture 6 37 Prof. Aiken CS 143 Lecture 6 38

Recursive Descent Parsing Recursive Descent Parsing

E  T |T + E E  T |T + E T  int | int * T | ( E ) T  int | int * T | ( E )

E E

T T Mismatch: int is not ( ! int Backtrack …

( int5 ) ( int5 ) Prof. Aiken CS 143 Lecture 6 39 Prof. Aiken CS 143 Lecture 6 40

Recursive Descent Parsing Recursive Descent Parsing

E  T |T + E E  T |T + E T  int | int * T | ( E ) T  int | int * T | ( E )

E E

T T Mismatch: int is not ( ! int * T Backtrack …

( int5 ) ( int5 ) Prof. Aiken CS 143 Lecture 6 41 Prof. Aiken CS 143 Lecture 6 42

7 Recursive Descent Parsing Recursive Descent Parsing

E  T |T + E E  T |T + E T  int | int * T | ( E ) T  int | int * T | ( E )

E E

T T Match! Advance input. ( E ) ( E )

( int5 ) ( int5 ) Prof. Aiken CS 143 Lecture 6 43 Prof. Aiken CS 143 Lecture 6 44

Recursive Descent Parsing Recursive Descent Parsing

E  T |T + E E  T |T + E T  int | int * T | ( E ) T  int | int * T | ( E )

E E

T T Match! Advance input. ( E ) ( E )

T T ( int5 ) ( int5 ) Prof. Aiken CS 143 Lecture 6 45 Prof. Aiken CS 143 Lecture 6 46 int

Recursive Descent Parsing Recursive Descent Parsing

E  T |T + E E  T |T + E T  int | int * T | ( E ) T  int | int * T | ( E )

E E

T T Match! Advance input. End of input, accept. ( E ) ( E )

T T ( int5 ) ( int5 ) Prof. Aiken CS 143 Lecture 6 47 Prof. Aiken CS 143 Lecture 6 48 int int

8 A . Preliminaries A Recursive Descent Parser (2)

• Let TOKEN be the type of tokens • Define boolean functions that check the token – Special tokens INT, OPEN, CLOSE, PLUS, TIMES string for a match of – A given token terminal • Let the global next point to the next token bool term(TOKEN tok) { return *next++ == tok; } – The nth production of S:

bool Sn() { … } – Try all productions of S: bool S() { … }

Prof. Aiken CS 143 Lecture 6 49 Prof. Aiken CS 143 Lecture 6 50

A Recursive Descent Parser (3) A Recursive Descent Parser (4)

• For production E  T • Functions for non-terminal T bool T1() { return term(INT); } bool E1() { return T(); } bool T2() { return term(INT) && term(TIMES) && T(); } • For production E  T + E bool T3() { return term(OPEN) && E() && term(CLOSE); } bool E2() { return T() && term(PLUS) && E(); } • For all productions of E (with ) bool T() { TOKEN *save = next; bool E() { return (next = save, T ()) TOKEN *save = next; 1 || (next = save, T2()) return (next = save, E1()) || (next = save, T3()); } || (next = save, E2()); }

Prof. Aiken CS 143 Lecture 6 51 Prof. Aiken CS 143 Lecture 6 52

Recursive Descent Parsing. Notes. Example

• To start the parser E  T |T + E ( int ) T  int | int * T | ( E ) – Initialize next to point to first token – Invoke E() bool term(TOKEN tok) { return *next++ == tok; }

bool E1() { return T(); } bool E () { return T() && term(PLUS) && E(); } • Notice how this simulates the example parse 2 bool E() {TOKEN *save = next; return (next = save, E1())

|| (next = save, E2()); }

bool T1() { return term(INT); } • Easy to implement by hand bool T2() { return term(INT) && term(TIMES) && T(); } bool T3() { return term(OPEN) && E() && term(CLOSE); }

bool T() { TOKEN *save = next; return (next = save, T1()) || (next = save, T2())

|| (next = save, T3()); } Prof. Aiken CS 143 Lecture 6 53 Prof. Aiken CS 143 Lecture 6 54

9 When Recursive Descent Does Not Work Elimination of

• Consider a production S  S a • Consider the left-recursive grammar

bool S1() { return S() && term(a); } S  S  | 

bool S() { return S1(); } •S generates all strings starting with a  and • S() goes into an infinite loop followed by a number of 

• A left-recursive grammar has a non-terminal S • Can rewrite using right-recursion S + S for some  S   S’ • Recursive descent does not work in such cases S’   S’ | 

Prof. Aiken CS 143 Lecture 6 55 Prof. Aiken CS 143 Lecture 6 56

More Elimination of Left-Recursion General Left Recursion

• In general • The grammar S  A  |  S  S  | … | S  |  | … |  1 n 1 m A  S  • All strings derived from S start with one of is also left-recursive because + 1,…,m and continue with several instances of S  S   1,…,n • Rewrite as • This left-recursion can also be eliminated

S  1 S’ | … | m S’

S’  1 S’ | … | n S’ |  • See Dragon Book for general algorithm – Section 4.3

Prof. Aiken CS 143 Lecture 6 57 Prof. Aiken CS 143 Lecture 6 58

Summary of Recursive Descent

• Simple and general parsing strategy – Left-recursion must be eliminated first – … but that can be done automatically

• Unpopular because of backtracking – Thought to be too inefficient

• In practice, backtracking is eliminated by restricting the grammar

Prof. Aiken CS 143 Lecture 6 59

10