CSE401 Introduction to Compiler Construction

CSE401 Introduction to Compiler Construction

9/26/2008 Source “Compiler”: from the web Program Target Program [Higher-Level • The Oxford English Dictionary (OED) indicates that the first Programming Compiler [Lower-Level usage of the term is circa 1330, referring to one who collects Language] Language/ Architecture] and puts together materials – They also note a usage “Diuerse translatours and compilaris” from Scotland in 1549 • Most dictionaries give the above definition as well as the CSE401 computing-based definition (which the OED dates to 1953) – A program that translates programs written in a high-level programming language into equivalent programs in a lower- Introduction to Compiler Construction level language • Wikipedia credits Grace Hopper with the first compiler (for a David Notkin language called A-0) in 1952, and John Backus’ IBM team with the first complete compiler (for FORTRAN) in 1957 Autumn 2008 Trivia: In what year was I born? CSE401 Au08 2 A world with no compilers Assembly/machine language coding • …is slow, error-prone, tedious, not portable, … • The size (roughly, lines of code) of a high-level language program relative to its assembly language equivalent is approximately linear – but that may well be a factor of 10 or even 100 – Microsoft Vista is something like 50 million lines of source code (50 MLOC) • Printed double-sided something like triple the height of the Allen Center • Something like 20 person-years just to retype • Q: Why is harder to build a program 10 times larger? CSE401 Au08 3 CSE401 Au08 4 1 9/26/2008 Ergo: we need compilers But why might you care? • And to have compilers, somebody has to build • Crass reasons: jobs compilers • Class reasons: grade in 401 – At least every time there is a need to program in a • Cool reasons: loveliest blending of theory and practice in computer science & engineering new <programming language, architecture> pair • Cruel reasons: we all had to learn it – Roughly how many pl’s and how many ISA’s? • Practice reasons: more experience with software design, Cross product? modifying software written by others, etc. • Unless the compilers could be generated • Practical reasons: the techniques are widely used outside of automatically – and parts can (a bit more on this later conventional compilers in the course) • Super-practical reasons: lays foundation for understanding or even researching really cool stuff like JIT (just-in-time) compilers, compiling for multicore, building interpreters, scripting Trivia: In what year did I first write a program? languages, (de)serializing data for distribution, and more… In what language? On what architecture? CSE401 Au08 5 CSE401 Au08 6 Better understand… Compiling (or related) Turing Awards • Compile-time vs. run-time • 1966 Alan Perlis • 1984 Niklaus Wirth • Interactions among • 1972 Edsger Dijkstra • 1987 John Cocke – language features • 1976 Michael Rabin • 2001 Ole-Johan Dahl – implementation efficiency and Dana Scott and Kristen Nygaard – compiler complexity • 1977 John Backus • 2003 Alan Kay – architectural features • 1978 Bob Floyd • 2005 Peter Naur • 1979 Bob Iverson • 2006 Fran Allen • 1980 Tony Hoare CSE401 Au08 7 CSE401 Au08 8 2 9/26/2008 Questions? Administrivia: see web • Text: Engineering a Compiler, Cooper and Torczon, Morgan-Kaufmann 2004 • Mail list – automatically subscribed • Google calendar with links • Grading – Project 40% – Homework 15% – Midterm 15% – Final 25% – Other (class participation, extra credit, etc.) 5% CSE401 Au08 9 CSE401 Au08 10 Project Compiler structure: overview • Start with a MiniJava compiler in Java Source Target Program Compiler • Add features such as comments, floating-point, Program arrays, class variables, for loops, etc. • Completed in stages over the term • Not teams: but you can talk to each other (“Prison Break” rule, see web) for the project Analyze Intermediate Generate (front end) Representation (back end) • Grading basis: correctness, clarity of design and implementation, quality of test cases, etc. Lexical & Intermediate Code Generation & Syntactic & Optimization & Semantic Code Generation CSE401 Au08 11 CSE401 Au08 12 3 9/26/2008 name=t6,assign,name=Fac,period, name=ComputeFac,lparen,name=this, Lexical analysis (scanning, lexing) Syntactic analysis comma,name=t3,rparen,semicolon Assignment Analyze: Source Analyze: Intermediate statement scan; parse Program scan; parse Representation Lefthand Righthand side side t 6 := Abstract Fac. Comput eFac ( t hi s , t 3) ; Character Stream Identifier: Method syntax tree 28 characters not counting t6 invocation whitespace Scan Method name Parameter List (lexical analysis) Identifier: Identifier: name=t6,assign,name=Fac,period, QualifiedName name=ComputeFac,lparen,name=this, this t3 comma,name=t3,rparen,semicolon (11 tokens) Token Stream Identifier: Identifer: Fac ComputeFac CSE401 Au08 13 CSE401 Au08 14 Semantic analysis Code generation (backend) Assign… • Annotate abstract statement Annotated abstract Generate Target Lefthand Righthand syntax tree (back end) Program syntax tree side side Identifier Method • Primarily determine : t6 invocation which identifiers Intermediate Method name Parameter List Annotated abstract Intermediate code are associated with syntax tree Language Identifier: Identifier Qualified… generation which declarations this : t3 Identifier: Identifer: • Scoping is key Fac ComputeFac Target code Target issue generation Program • Symbol table is key data structure CSE401 Au08 15 CSE401 Au08 16 4 9/26/2008 Optimization Quotations about optimization • Takes place at various (and multiple) places during • Michael Jackson code generation – Rule 1: Don't do it. – Might optimize the intermediate language code – Rule 2 (for experts only): Don't do it yet. – Might optimize the target code • Bill Wulf – Might optimize during execution of the program – More computing sins are committed in the name of efficiency (without necessarily achieving it) than for • Q: Is it better to have an optimizing compiler or to any other single reason – including blind stupidity. hand-optimize code? • Don Knuth – We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. CSE401 Au08 17 CSE401 Au08 18 Questions? Lexing: reprise • Read in characters • Clump into tokens • Strip out whitespace and comments • Tokens are specified using regular expressions Ident ::= Letter AlphaNum* Integer ::= Digit+ AlphaNum ::= Letter | Digit Letter ::= 'a' | … | 'z' | 'A' | … | 'Z' Digit ::= '0' | … | '9' • Q: regular expressions are equivalent to something you’ve previously learned about… what is it? CSE401 Au08 19 CSE401 Au08 20 5 9/26/2008 Syntactic analysis: reprise Semantic analysis: reprise • Read in tokens • Do name resolution and type checking on the abstract syntax • Build a tree based on syntactic structure tree • Report any syntax errors – What declaration does each name refer to? • EBNF (extended Backus-Naur Form) is a common notation for – Are types consistent? Are other static properties consistent? defining programming language syntax as a context-free • Symbol table grammar – maps names to information about name derived from Stmt ::= if (Expr) Stmt [else Stmt] declaration | while (Expr) Stmt | ID = Expr; | … – represents scoping usually through a tree of per-scope Expr ::= Expr + Expr | Expr < Expr | … | ! Expr symbol tables | Expr . ID ([Expr {,Expr}]) | ID | Integer | (Expr) | … • Overall process 1. Process each scope top down • The grammar specifies the concrete syntax of language 2. Process declarations in each scope into symbol table • The parser constructs the abstract syntax tree 3. Process body of each scope in context of symbol table CSE401 Au08 21 CSE401 Au08 22 Intermediate code generation: reprise Target code generation: reprise • Translate annotated AST and symbol tables into • Instruction selection: choose target instructions for lower-level intermediate code (subsequences) of intermediate representation (IR) • Intermediate code is a separate language instructions – Source-language independent • Register allocation: allocate IR code variables to – Target-machine independent registers, spilling to memory when necessary • Intermediate code is simple and regular • Compute layout of each procedures stack frames and other runtime data structures – Good representation for doing optimizations • Emit target code – Might be a reasonable target language itself, e.g. Java bytecode CSE401 Au08 23 CSE401 Au08 24 6 9/26/2008 Example: source Example: intermediate representation Sample (extended) MiniJava program: Factorial.java Int Fac.ComputeFac(*? this, int num) { // Computes 10! and prints it out int t1, numAux, t8, t3, t7, t2, t6, t0; class Factorial { t0 := 1; public static void main(String[] a) { t1 := num < t0; System.out.println( ifnonzero t1 goto L0; new Fac().ComputeFac(10)); } t2 := 1; } t3 := num - t2; class Fac { t6 := Fac.ComputeFac(this, t3); // the recursive helper function t7 := num * t6; public int ComputeFac(int num) { numAux := t7; int numAux; goto L2; if (num < 1) label L0; numAux = 1; t8 := 1; else numAux = num * this.ComputeFac(num-1); numAux := t8 return numAux; label L2; } } return numAux } CSE401 Au08 25 CSE401 Au08 26 Questions? Don’t forget • Survey (before Friday) • Readings (on calendar) • Visit office hours (on calendar) • Ask questions CSE401 Au08 27 CSE401 Au08 28 7.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    7 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us