Intermediate Representations Abstract Syntax Trees Source Front IRMiddle IR Back Target The parser’s output is an abstract syntax Code End End End Code (AST) representing the grammatical structure of the parsed input. • Front end - produces an intermediate representation (IR) But first a digression. • Middle end - transforms the IR into an equivalent IR that runs more efficiently (usually consists of several passes) • Back end - transforms the IR into native code • The IR encodes the ’s knowledge of the program at any point in time 1 2

Typical Implementation of a Compiler Abstract Syntax Trees Source Lexical Analyzer Program • The parser’s output is an abstract syntax tree Tokens Syntax Analyzer (AST) representing the grammatical structure Abstract syntax tree of the parsed input Semantic Analyzer • ASTs represent only semantically meaningful aspects of input program, unlike concrete AST and symbol Intermediate Code tables Generator syntax trees which record the complete textual form of the input IR Code Optimizer – There’s no need to record keywords or punctuation like () , ;, else IR Back end – The rest of compiler only cares about the abstract structure Target Program 3 4

Concrete Syntax vs. Abstract Syntax Parse trees and abstract syntax trees

• Graphically represent grammatical structure of input program – : tree representation of grammar derivation – AST: condensed form of parse tree • Concrete syntax: what the programmer wrote • Operators and keywords do not appear as leaves • Chains of single productions are collapsed => Parse Tree Parse trees Abstract syntax trees • Abstract syntax: what the compiler needs S If-then-else => Abstract Syntax Tree IF B THEN S1 ELSE S2 B S1 S2

E + E + T 3 5 T 5

5 3 6

1 AST Class Hierarchy AST Node Classes

• AST classes are organized into an inheritance Each node in an AST is an instance of an AST class hierarchy based on commonalities of meaning and – IfStmt , AssignStmt , AddExpr , VarDecl , etc. structure • Each "abstract non-terminal" that has multiple Each AST class declares its own instance variables alternative concrete forms will have an abstract class holding its AST subtrees that’s the superclass of the various alternative forms – IfStmt has testExpr , thenStmt , and elseStmt – Stmt is abstract superclass of IfStmt , AssignStmt , etc. – AssignStmt has lhsVar and rhsExpr – Expr is abstract superclass of AddExpr , VarExpr , etc. – AddExpr has arg1Expr and arg2Expr – Type is abstract superclass of IntType , ClassType , etc. – VarDecl has typeExpr and varName

7 8

Automatic Parser Generation in MiniJava

We use the CUP tool to automatically create a parser from a specification file, Parser/minijava.cup The MiniJava Makefile automatically rebuilds the parser Notes on MiniJava Project whenever its specification file changes

A CUP file has several sections: – introductory declarations included with the generated parser – declarations of the terminals and nonterminals with their types – The AST node or other value returned when finished that nonterminal or terminal – precedence declarations – productions + actions

9 10

Terminal and Nonterminal Declarations Precedence Declarations Terminal declarations we saw before: /* reserved words: */ Can specify precedence and associativity of operators terminal CLASS, PUBLIC, STATIC, EXTENDS; – equal precedence in a single declaration ... /* tokens with values: */ – lowest precedence textually first terminal String IDENTIFIER; – specify left, right, or nonassoc with each declaration terminal Integer INT_LITERAL; Examples: Nonterminals are similar: precedence left AND_AND; nonterminal Program Program; precedence nonassoc EQUALS_EQUALS, nonterminal MainClassDecl MainClassDecl; EXCLAIM_EQUALS; nonterminal List/*<...>*/ ClassDecls; precedence left LESSTHAN, LESSEQUAL, nonterminal RegularClassDecl ClassDecl; GREATEREQUAL, GREATERTHAN; ... nonterminal List/**/ Stmts; precedence left PLUS, MINUS; nonterminal Stmt Stmt; precedence left STAR, SLASH; nonterminal List/**/ Exprs; precedence left EXCLAIM; nonterminal List/**/ MoreExprs; precedence left PERIOD; nonterminal Expr Expr; 11 12 nonterminal String Identifier;

2 AST Extensions For Project Productions New variable declarations: All of the form: – StaticVarDecl LHS ::= RHS1 {: Java code 1 :} | RHS2 {: Java code 2 :} New types: | ... – DoubleType | RHSn {: Java code n :}; – ArrayType Can label symbols in RHS with :var suffix to refer to its New/changed statements: result value in Java code – IfStmt can omit else branch • varleft is set to line in input where var symbol was – ForStmt E.g.: Expr ::= Expr:arg1 PLUS Expr:arg2 – BreakStmt {: RESULT = new AddExpr( arg1,arg2,arg1left);:} – ArrayAssignStmt | INT_LITERAL:value{: RESULT = new IntLiteralExpr( New expressions: value.intValue(),valueleft);:} – DoubleLiteralExpr | Expr:rcvr PERIOD Identifier:message OPEN_PAREN Exprs:args CLOSE_PAREN – OrExpr {: RESULT = new MethodCallExpr( – ArrayLookupExpr rcvr,message,args,rcvrleft);:} – ArrayLengthExpr 13 14 | ... ; – ArrayNewExpr

Error Handling

Extra Slides Start Here How to handle syntax error? Option 1: quit compilation + easy - inconvenient for programmer Option 2: error recovery + try to catch as many errors as possible on one compile - difficult to avoid streams of spurious errors Option 3: error correction + fix syntax errors as part of compilation - hard!!

15 16

Panic Mode Error Recovery When finding a syntax error, skip tokens until reaching a “landmark” • landmarks in MiniJava: ;, ), } • once a landmark is found, hope to have gotten back on track In top-down parser, maintain set of landmark tokens as recursive descent proceeds • landmarks selected from terminals later in production • as parsing proceeds, set of landmarks will change, depending on the parsing context In bottom-up parser, can add special error nonterminals, followed by landmarks • if syntax error, then will skip tokens till seeing landmark, then reduce and continue normally • E.g. Stmt ::= ... | error ; | { error } Expr ::= ... | ( error ) 17

3