CSE131 D2: Project 2

Xinxin Jin Apr 19, 2013

Special Thanks to Chung-Huang Hsiao

Announcements

• p2.zip with the complete scanner.l has been uploaded to Piazza
• The P2 deadline is extended to Apr 28th, midnight

Project 2: What to Do

• Now that you have a scanner, the next step is syntax analysis based on Decaf's grammar
• You will learn to use the Yacc/Bison parser generator to build a parser for Decaf
  – Just like using lex/flex to generate a scanner
• Build and print the abstract syntax tree (AST) if the input program is syntactically valid; report syntax errors otherwise
  – The abstract syntax tree will then be used in the semantic analysis phase in later projects

Syntax Analysis

• Given the token sequence, we now want to recognize expressions and statements
• Let's start with an English example: Alice ate apples.
  – After lexical analysis, we have a sequence of tokens: N(Alice) V(ate) N(apples) Period(.)
  – The goal of syntax analysis is to recognize that this is a sentence with a subject, verb and object: S(Alice) V(ate) O(apples)
• In a programming language, we want to know how the code can be derived from the grammar
  – By knowing which rules were used, we can get the meaning of the code

Yacc and Bison

• Yacc: Yet Another Compiler Compiler
  – Using a (Yacc) compiler to generate another compiler!
• Automatically generates an LALR parser given the CFG of a language
  – Requires a scanner as its front end to recognize the terminals
• Bison: a Yacc-compatible GNU replacement
  – Just as Flex is to Lex
  – In addition to LALR parsers, it can also generate other kinds of LR parsers

Yacc Syntax

• Structure of a .y file:
    %{
    PROLOGUE
    %}
    DECLARATIONS
    %%
    RULES
    %%
    USERCODE

Yacc Example: Infix Calculator

• $n: the semantic value (yylval) of the nth symbol on the right-hand side of the rule
• $$: the semantic value of the resulting (left-hand-side) symbol
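To make $n and $$ concrete, here is a simplified C++ model of the semantic-value stack that a Yacc/Bison parser keeps alongside its state stack. It is an illustration under assumptions, not Bison's generated code: the struct and function names are hypothetical, values are plain ints as in the calculator, and the intermediate exp : NUM reductions are skipped because their default action just copies $1 into $$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model of a parser's semantic-value stack (ints, as in the calculator).
// Every shifted symbol gets one slot, even tokens like '+' whose value is unused.
struct ValueStack {
    std::vector<int> vals;

    // Shift: push the token's semantic value (what the scanner put in yylval).
    void shift(int v) { vals.push_back(v); }

    // $k for a rule whose RHS has n symbols: the k-th of the top n slots.
    int dollar(std::size_t k, std::size_t n) const {
        assert(k >= 1 && k <= n && n <= vals.size());
        return vals[vals.size() - n + (k - 1)];
    }

    // Reduce: pop the n RHS slots and push the single LHS value $$.
    void reduce(std::size_t n, int lhsValue) {
        vals.resize(vals.size() - n);
        vals.push_back(lhsValue);
    }
};

// Action for a hypothetical rule  exp : exp '+' exp { $$ = $1 + $3; }
void reduceAdd(ValueStack& s) {
    int result = s.dollar(1, 3) + s.dollar(3, 3);  // $1 + $3
    s.reduce(3, result);                            // $$ replaces $1 $2 $3
}
```

For the input 2 + 3, the three shifted slots hold 2, an unused value for '+', and 3; reducing the rule leaves the single value 5 on the stack.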

• Demo

Connecting Flex and Yacc

• Use %token in the DECLARATIONS section of the .y file to define terminals
    %token T_Number
• Use yacc -d to generate y.tab.h
    enum yytokentype {
        T_Number = 258,
    };
• Include y.tab.h in scanner.l and return the corresponding terminal for each pattern
    [0-9]+    return T_Number;

Constructing AST

• Instead of doing the calculation in place, each action needs to return a specific AST node
• Yacc uses bottom-up (LALR(1)) parsing
• An action is performed when a handle is reduced through its corresponding rule
  – By the time the handle is reduced, all actions associated with its RHS symbols have already been performed
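The sketch below, in C++ (the language of the actions on these slides), builds the AST for (1+2*3)^2 in the order an LALR(1) parser for the calculator grammar would reduce, assuming the usual precedence declarations that rank '*' above '+'. The node classes are hypothetical minimal versions of the names used in that grammar; eval and show exist only so the tree can be checked. Each constructor call stands for one action, and every child is built before its parent, which is why an action can treat $1 and $3 as finished subtrees.

```cpp
#include <cassert>
#include <string>

// Hypothetical AST base class; real project classes live in ast_*.h.
struct ExprNode {
    virtual ~ExprNode() = default;
    virtual long eval() const = 0;         // evaluate the subtree
    virtual std::string show() const = 0;  // fully parenthesized form
};

struct IntConstNode : ExprNode {
    long value;
    explicit IntConstNode(long v) : value(v) {}
    long eval() const override { return value; }
    std::string show() const override { return std::to_string(value); }
};

// Base for binary operators; owns and deletes both children.
struct BinaryNode : ExprNode {
    ExprNode *left, *right;
    BinaryNode(ExprNode* l, ExprNode* r) : left(l), right(r) {}
    ~BinaryNode() override { delete left; delete right; }
    BinaryNode(const BinaryNode&) = delete;
    BinaryNode& operator=(const BinaryNode&) = delete;
    virtual char op() const = 0;
    std::string show() const override {
        return "(" + left->show() + " " + op() + " " + right->show() + ")";
    }
};

struct AddNode : BinaryNode {
    using BinaryNode::BinaryNode;
    char op() const override { return '+'; }
    long eval() const override { return left->eval() + right->eval(); }
};

struct MultiplyNode : BinaryNode {
    using BinaryNode::BinaryNode;
    char op() const override { return '*'; }
    long eval() const override { return left->eval() * right->eval(); }
};

struct ExponentNode : BinaryNode {
    using BinaryNode::BinaryNode;
    char op() const override { return '^'; }
    long eval() const override {
        long base = left->eval(), result = 1;
        for (long e = right->eval(); e > 0; --e) result *= base;  // assumes a non-negative exponent
        return result;
    }
};

// Builds the AST for (1+2*3)^2 in reduction order: children before parents.
ExprNode* buildExample() {
    ExprNode* one   = new IntConstNode(1);           // exp : NUM
    ExprNode* two   = new IntConstNode(2);           // exp : NUM
    ExprNode* three = new IntConstNode(3);           // exp : NUM
    ExprNode* mul   = new MultiplyNode(two, three);  // exp : exp '*' exp
    ExprNode* add   = new AddNode(one, mul);         // exp : exp '+' exp
    ExprNode* paren = add;                           // exp : '(' exp ')'  { $$ = $2; }
    ExprNode* exponent = new IntConstNode(2);        // exp : NUM
    return new ExponentNode(paren, exponent);        // exp : exp '^' exp
}
```

Evaluating the returned tree gives 49, and its parenthesized form is ((1 + (2 * 3)) ^ 2). Note that the parenthesis rule creates no node at all; it simply passes its inner subtree up as $$.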

Constructing AST

[Figure: AST of "(1+2*3)^2"]

    exp : NUM                 { $$ = new IntConstNode($1); }
        | exp '+' exp         { $$ = new AddNode($1, $3); }
        | exp '-' exp         { $$ = new SubtractNode($1, $3); }
        | exp '*' exp         { $$ = new MultiplyNode($1, $3); }
        | exp '/' exp         { $$ = new DivideNode($1, $3); }
        | '-' exp %prec NEG   { $$ = new NegateNode($2); }
        | exp '^' exp         { $$ = new ExponentNode($1, $3); }
        | '(' exp ')'         { $$ = $2; }
        ;

Semantic Value Types (1/2)

• $$ and $n refer to the semantic values (yylval) of the LHS and of the nth RHS symbol
• @$ and @n refer to the locations (yylloc) of the LHS and of the nth RHS symbol
• Use %union to define the fields of yylval
    %union {
        int number;
        ExprNode *expr;
    }
• It generates the union type for yylval in y.tab.h
    typedef union YYSTYPE {
        int number;
        ExprNode *expr;
    } YYSTYPE;
    extern YYSTYPE yylval;

Semantic Value Types (2/2)

• For a terminal, specify its yylval field with %token
    %token <number> T_Number
• For a nonterminal, specify its field with %type
    %type <expr> E T F
• Then the following action
    E : E '+' T_Number { $$ = new AddNode($1, $3); }
  (semantically) becomes
    { yylvalLHS.expr = new AddNode(yylvalRHS1.expr, yylvalRHS3.number); }

Shift/Reduce Conflict

• As Bison reads tokens, it pushes them onto a parser stack along with their semantic values. Pushing a token is traditionally called shifting
  – Parser stack: 1 + 15
• When the last n tokens and groupings shifted match a grammar rule, they can be replaced by a single grouping whose symbol is the result (left-hand side) of that rule. This is called reduction
  – Parser stack: 16
• A shift/reduce conflict happens when, in the same parser state and with the same lookahead token, both a shift and a reduction are valid!

Ambiguous Grammar: Operator Precedence & Associativity (1/2)

• Consider the following grammar
    E : E '+' E | E '-' E | E '*' E | E '=' E | T_Number ;
• This grammar is ambiguous
  – Without precedence, the grouping is arbitrary: grouping to the right gives 1+2*3 = 1+(2*3) = 7 but 2*3+1 = 2*(3+1) = 8
• The ambiguity can be resolved with the precedence declarations %left, %right and %nonassoc; operators declared on later lines have higher precedence
    %nonassoc '='
    %left '+' '-'
    %left '*' '/'

Ambiguous Grammar: Operator Precedence & Associativity (2/2)

• How about the following grammar?
    E : E '-' E | '-' E | T_Number ;
• The ambiguity can be resolved by adding a dummy token to specify the precedence of negation
    %left '-'
    %left NEG
    %%
    E : E '-' E
      | '-' E %prec NEG
      | T_Number
      ;
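The precedence rules above can be tried out without Bison. The following C++ sketch is not an LALR parser: it is a small operator-precedence shift-reduce evaluator (a swapped-in technique; all names in it are hypothetical) that applies the same comparison Bison uses for a precedence-annotated conflict. When the lookahead is a binary operator, it compares the precedence of the rule on top of the stack with the lookahead: reduce if the rule's precedence is higher, or equal under %left; otherwise shift. Unary minus plays the role of '-' E %prec NEG, and NEG is assumed to rank above '*' and '/', as in the usual calculator setup.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Precedence levels mirroring  %left '+' '-'  %left '*' '/'  %left NEG
// (later declarations bind tighter). 'N' marks the unary-minus rule.
int prec(char op) {
    switch (op) {
        case '+': case '-': return 1;
        case '*': case '/': return 2;
        case 'N': return 3;
    }
    throw std::invalid_argument("unknown operator");
}

// Reduce: pop one operator with its operands and push the result ($$).
void reduce(std::vector<long>& vals, std::vector<char>& ops) {
    char op = ops.back();
    ops.pop_back();
    if (op == 'N') { vals.back() = -vals.back(); return; }  // '-' E %prec NEG
    long rhs = vals.back();
    vals.pop_back();
    long& lhs = vals.back();
    switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;  // no division-by-zero check in this sketch
    }
}

// Evaluates E : E op E | '-' E %prec NEG | T_Number with all binary ops %left.
long evaluate(const std::string& s) {
    std::vector<long> vals;
    std::vector<char> ops;
    bool expectOperand = true;  // true where a number or a unary '-' may appear
    for (std::size_t i = 0; i < s.size();) {
        char c = s[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (expectOperand) {
            if (c == '-') { ops.push_back('N'); ++i; continue; }  // shift unary minus
            if (!std::isdigit(static_cast<unsigned char>(c))) throw std::invalid_argument("syntax error");
            long v = 0;
            while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) v = v * 10 + (s[i++] - '0');
            vals.push_back(v);  // shift T_Number (then reduce E : T_Number)
            expectOperand = false;
        } else {
            if (c != '+' && c != '-' && c != '*' && c != '/') throw std::invalid_argument("syntax error");
            // Conflict resolution: the rule on the stack wins on higher precedence,
            // or on equal precedence because every operator here is %left.
            while (!ops.empty() && prec(ops.back()) >= prec(c)) reduce(vals, ops);
            ops.push_back(c);  // otherwise shift the lookahead operator
            expectOperand = true;
            ++i;
        }
    }
    if (expectOperand) throw std::invalid_argument("syntax error");
    while (!ops.empty()) reduce(vals, ops);  // end of input: reduce everything
    return vals.back();
}
```

With these rules, 1 + 15 reduces to 16, 1+2*3 and 2*3+1 both evaluate to 7, 1-2-3 groups to the left as (1-2)-3 = -4, and -2-3 is (-2)-3 = -5 because NEG outranks binary '-'.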

Ambiguous Grammar: Dangling Else (1/3)

• Consider the following grammar
    Stmt : if Expr then Stmt
         | if Expr then Stmt else Stmt
         ;
• This grammar is ambiguous
  – Why?

Ambiguous Grammar: Dangling Else (2/3)

• Consider the following input; after reading "if x then if y then win;" the parser sees "else" and can either shift it or reduce
    if x then if y then win; else lose;
• Shift "else" =>
    if x then do;
        if y then win; else lose;
    end.
• Reduce =>
    if x then do;
        if y then win;
    end;
    else lose;

Ambiguous Grammar: Dangling Else (3/3)

• Use the same trick as we do for operator precedence!
    %nonassoc NoELSE
    %nonassoc else
    Stmt : Expr
         | if Expr then Stmt %prec NoELSE
         | if Expr then Stmt else Stmt
         ;

Project 2: Start Code

Decaf grammar rules:
    Program ::= Decl+
    Decl    ::= VariableDecl | FunctionDecl | ClassDecl | InterfaceDecl

/* Bison rules */
    Program  : DeclList       {
                                @1;
                                Program *program = new Program($1);
                                // if no errors, advance to next phase
                                if (ReportError::NumErrors() == 0)
                                    program->Print(0);
                              }
             ;
    Decl     : VariableDecl   { $$ = $1; }
             | FunctionDecl   { ?? }
             | ClassDecl      { ?? }
             | InterfaceDecl  { ?? }
             ;
    DeclList : DeclList Decl  { ($$=$1)->Append($2); }
             | Decl           { ($$ = new List<Decl*>)->Append($1); }
             ;
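The left-recursive DeclList rules can be traced by hand. This C++ sketch assumes minimal, hypothetical versions of the start code's List and Decl classes (only Append is taken from the slides; NumElements and Nth are assumed names) and a union with the declList and decl fields that the start code declares with %union. Each function plays the role of one reduction, with $1, $2 and $$ spelled out as union field accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal stand-in for the start code's List class.
template <typename T>
class List {
    std::vector<T> elems_;
public:
    void Append(const T& elem) { elems_.push_back(elem); }
    int NumElements() const { return static_cast<int>(elems_.size()); }
    T Nth(int i) const { return elems_.at(static_cast<std::size_t>(i)); }
};

// Hypothetical stand-in for a declaration node.
struct Decl {
    std::string name;
    explicit Decl(std::string n) : name(std::move(n)) {}
};

// Semantic values with the two fields used by DeclList and Decl.
union YYSTYPE {
    List<Decl*>* declList;
    Decl* decl;
};

// DeclList : Decl  { ($$ = new List<Decl*>)->Append($1); }
YYSTYPE reduceFirst(YYSTYPE rhs1) {
    YYSTYPE lhs;
    (lhs.declList = new List<Decl*>)->Append(rhs1.decl);
    return lhs;
}

// DeclList : DeclList Decl  { ($$=$1)->Append($2); }
// The existing list object is reused, so declarations stay in source order.
YYSTYPE reduceMore(YYSTYPE rhs1, YYSTYPE rhs2) {
    YYSTYPE lhs;
    (lhs.declList = rhs1.declList)->Append(rhs2.decl);
    return lhs;
}
```

Left recursion is the form the Bison manual recommends for sequences: the parser reduces after every Decl, so its stack stays shallow no matter how many declarations the program has, while the single list grows in source order.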

Proj2: Start code

• Set the types of nonterminals
    /* Add attributes to yylval */
    %union {
        List<Decl*> *declList;
        Decl *decl;
    }
    /* Set nonterminal types */
    %type <declList> DeclList
    %type <decl> Decl

Hints on Project 2

• Read ast_*.h and think about which class is meant for which kind of node
• What action is expected for each LHS?
• Be careful with operator precedence
• You may set yydebug = true for debugging

References

• http://www.gnu.org/software/bison/manual/html_node/index.html
• http://dinosaur.compilertools.net/#yacc
• http://dinosaur.compilertools.net/#bison
• http://epaperpress.com/lexandyacc/

Thanks