Week 3 Slides on Abstract Syntax Trees

COMP 520 Fall 2010: Abstract syntax trees

A compiler pass is a traversal of the program. A compiler phase is a group of related passes.

A one-pass compiler scans the program only once. It is naturally single-phase. The following all happen at the same time:

• scanning
• parsing
• weeding
• symbol table creation
• type checking
• resource allocation
• code generation
• optimization
• emitting

This is a terrible methodology:

• it ignores natural modularity;
• it gives unnatural scope rules; and
• it limits optimizations.

However, it used to be popular:

• it’s fast (if your machine is slow); and
• it’s space efficient (if you only have 4K).

A modern multi-pass compiler uses 5–15 phases, some of which may have many individual passes: you should skim through the optimization section of ‘man gcc’ some time!

A multi-pass compiler needs an intermediate representation of the program between passes.

We could use a parse tree, or concrete syntax tree (CST):

            E
          / | \
         E  +  T
         |   / | \
         T  T  *  F
         |  |     |
         F  F     id
         |  |
         id id

or we could use a more convenient abstract syntax tree (AST), which is essentially a parse tree/CST but for a more abstract grammar:

          +
         / \
        id  *
           / \
          id  id

Instead of constructing the tree:

          +
         / \
        id  *
           / \
          id  id

a compiler can generate code for an internal compiler-specific grammar, also known as an intermediate language.

Early multi-pass compilers wrote their IL to disk between passes. For the above tree, the string +(id,*(id,id)) would be written to a file and read back in for the next pass.

It may also be useful to write an IL out for debugging purposes.

Examples of modern intermediate languages:

• Java bytecode
• C, for certain high-level language compilers
• Jimple, a 3-address representation of Java bytecode specific to Soot that you learn about in COMP 621
• Simple, the precursor to Jimple that Laurie Hendren created for McCAT
• Gimple, the IL based on Simple that gcc uses

In this course, you will generally use an AST as your IR without the need for an explicit IL.

Note: somewhat confusingly, both industry and academia use the terms IR and IL interchangeably.

    $ cat tree.h tree.c # AST construction for Tiny language
    [...]
    typedef struct EXP {
      enum {idK,intconstK,timesK,divK,plusK,minusK} kind;
      union {
        char *idE;
        int intconstE;
        struct {struct EXP *left; struct EXP *right;} timesE;
        struct {struct EXP *left; struct EXP *right;} divE;
        struct {struct EXP *left; struct EXP *right;} plusE;
        struct {struct EXP *left; struct EXP *right;} minusE;
      } val;
    } EXP;

    EXP *makeEXPid(char *id) {
      EXP *e;
      e = NEW(EXP);
      e->kind = idK;
      e->val.idE = id;
      return e;
    }
    [...]
    EXP *makeEXPminus(EXP *left, EXP *right) {
      EXP *e;
      e = NEW(EXP);
      e->kind = minusK;
      e->val.minusE.left = left;
      e->val.minusE.right = right;
      return e;
    }

    $ cat tiny.y # Tiny parser that creates EXP *theexpression
    %{
    #include <stdio.h>
    #include "tree.h"

    extern char *yytext;
    extern EXP *theexpression;

    void yyerror() {
      printf ("syntax error before %s\n", yytext);
    }
    %}

    %union {
      int intconst;
      char *stringconst;
      struct EXP *exp;
    }

    %token <intconst> tINTCONST
    %token <stringconst> tIDENTIFIER
    %type <exp> program exp
    [...]
    %start program

    %left '+' '-'
    %left '*' '/'

    %%
    program: exp
             { theexpression = $1; }
    ;

    exp : tIDENTIFIER
          { $$ = makeEXPid ($1); }
        | tINTCONST
          { $$ = makeEXPintconst ($1); }
        | exp '*' exp
          { $$ = makeEXPmult ($1, $3); }
        | exp '/' exp
          { $$ = makeEXPdiv ($1, $3); }
        | exp '+' exp
          { $$ = makeEXPplus ($1, $3); }
        | exp '-' exp
          { $$ = makeEXPminus ($1, $3); }
        | '(' exp ')'
          { $$ = $2; }
    ;
    %%

Constructing an AST with flex/bison:

• AST node kinds go in tree.h

    enum {idK,intconstK,timesK,divK,plusK,minusK} kind;

• AST node semantic values go in tree.h

    struct {struct EXP *left; struct EXP *right;} minusE;

• Constructors for node kinds go in tree.c

    EXP *makeEXPminus(EXP *left, EXP *right) {
      EXP *e;
      e = NEW(EXP);
      e->kind = minusK;
      e->val.minusE.left = left;
      e->val.minusE.right = right;
      return e;
    }

• Semantic value type declarations go in tiny.y

    %union {
      int intconst;
      char *stringconst;
      struct EXP *exp;
    }

• (Non-)terminal types go in tiny.y

    %token <intconst> tINTCONST
    %token <stringconst> tIDENTIFIER
    %type <exp> program exp

• Grammar rule actions go in tiny.y

    exp : exp '-' exp { $$ = makeEXPminus ($1, $3); }

A “pretty”-printer:

    $ cat pretty.c
    #include <stdio.h>
    #include "pretty.h"

    void prettyEXP(EXP *e) {
      switch (e->kind) {
        case idK:
          printf("%s",e->val.idE);
          break;
        case intconstK:
          printf("%i",e->val.intconstE);
          break;
        case timesK:
          printf("(");
          prettyEXP(e->val.timesE.left);
          printf("*");
          prettyEXP(e->val.timesE.right);
          printf(")");
          break;
        [...]
        case minusK:
          printf("(");
          prettyEXP(e->val.minusE.left);
          printf("-");
          prettyEXP(e->val.minusE.right);
          printf(")");
          break;
      }
    }

The following pretty printer program:

    $ cat main.c
    #include "tree.h"
    #include "pretty.h"

    int yyparse();
    EXP *theexpression;

    int main() {
      yyparse();
      prettyEXP(theexpression);
      return 0;
    }

will on input:

    a*(b-17) + 5/c

produce the output:

    ((a*(b-17))+(5/c))

As mentioned before, a modern compiler uses 5–15 phases. Each phase contributes extra information to the IR (AST in our case):

• scanner: line numbers;
• symbol tables: meaning of identifiers;
• type checking: types of expressions; and
• code generation: assembler code.

Example: adding line number support.

First, introduce a global lineno variable:

    $ cat main.c
    [...]
    int lineno;

    int main() {
      lineno = 1; /* input starts at line 1 */
      yyparse();
      prettyEXP(theexpression);
      return 0;
    }

Second, increment lineno in the scanner:

    $ cat tiny.l # modified version of previous exp.l
    %{
    #include "y.tab.h"
    #include <string.h>
    #include <stdlib.h>

    extern int lineno; /* declared in main.c */
    %}

    %%
    [ \t]+   /* ignore */;   /* no longer ignore \n */
    \n       lineno++;       /* increment for every \n */
    [...]

Third, add a lineno field to the AST nodes:

    typedef struct EXP {
      int lineno;
      enum {idK,intconstK,timesK,divK,plusK,minusK} kind;
      union {
        char *idE;
        int intconstE;
        struct {struct EXP *left; struct EXP *right;} timesE;
        struct {struct EXP *left; struct EXP *right;} divE;
        struct {struct EXP *left; struct EXP *right;} plusE;
        struct {struct EXP *left; struct EXP *right;} minusE;
      } val;
    } EXP;

Fourth, set lineno in the node constructors:

    extern int lineno; /* declared in main.c */

    EXP *makeEXPid(char *id) {
      EXP *e;
      e = NEW(EXP);
      e->lineno = lineno;
      e->kind = idK;
      e->val.idE = id;
      return e;
    }

    EXP *makeEXPintconst(int intconst) {
      EXP *e;
      e = NEW(EXP);
      e->lineno = lineno;
      e->kind = intconstK;
      e->val.intconstE = intconst;
      return e;
    }
    [...]
    EXP *makeEXPminus(EXP *left, EXP *right) {
      EXP *e;
      e = NEW(EXP);
      e->lineno = lineno;
      e->kind = minusK;
      e->val.minusE.left = left;
      e->val.minusE.right = right;
      return e;
    }

The SableCC 2 grammar for our Tiny language:

    Package tiny;

    Helpers
      tab = 9;
      cr = 13;
      lf = 10;
      digit = ['0'..'9'];
      lowercase = ['a'..'z'];
      uppercase = ['A'..'Z'];
      letter = lowercase | uppercase;
      idletter = letter | '_';
      idchar = letter | '_' | digit;

    Tokens
      eol = cr | lf | cr lf;
      blank = ' ' | tab;
      star = '*';
      slash = '/';
      plus = '+';
      minus = '-';
      l_par = '(';
      r_par = ')';
      number = '0' | [digit-'0'] digit*;
      id = idletter idchar*;

    Ignored Tokens
      blank, eol;

    Productions
      exp =
          {plus}   exp plus factor |
          {minus}  exp minus factor |
          {factor} factor;

      factor =
          {mult}   factor star term |
          {divd}   factor slash term |
          {term}   term;

      term =
          {paren}  l_par exp r_par |
          {id}     id |
          {number} number;

SableCC generates subclasses of the ‘Node’ class for terminals, non-terminals and production alternatives:

• Node classes for terminals: ‘T’ followed by (capitalized) terminal name:

    TEol, TBlank, ..., TNumber, TId

• Node classes for non-terminals: ‘P’ followed by (capitalized) non-terminal name:

    PExp, PFactor, PTerm

• Node classes for alternatives: ‘A’ followed by (capitalized) alternative name and (capitalized) non-terminal name:

    APlusExp (extends PExp), ..., ANumberTerm (extends PTerm)

SableCC populates an entire directory structure:

    tiny/
     |--analysis/ Analysis.java
     |            AnalysisAdapter.java
     |            DepthFirstAdapter.java
     |            ReversedDepthFirstAdapter.java
     |
     |--lexer/    Lexer.java lexer.dat
     |            LexerException.java
     |
     |--node/     Node.java TEol.java ... TId.java
     |            PExp.java PFactor.java PTerm.java
     |            APlusExp.java...
     |            AMultFactor.java...
     |            AParenTerm.java...
     |
     |--parser/   parser.dat Parser.java
     |            ParserException.java ...
     |
     |-- custom code directories, e.g. symbol, type, ...

Given some grammar, SableCC generates a parser that in turn builds a concrete syntax tree (CST) for an input program.

A parser built from the Tiny grammar creates the following CST for the program ‘a+b*c’:

                  Start
                    |
                APlusExp
                /      \
        AFactorExp    AMultFactor
            |          /       \
       ATermFactor  ATermFactor  AIdTerm
            |          |           |
         AIdTerm    AIdTerm        c
            |          |
            a          b

This CST has many unnecessary intermediate nodes. Can you identify them?

We only need an abstract syntax tree (AST) to operate on:

           APlusExp
           /      \
       AIdExp    AMultExp
         |        /    \
         a    AIdExp  AIdExp
                |       |
                b       c

Recall that bison relies on user-written
