Codegenassem.Java) Lexical Analysis (Scanning)

Codegenassem.Java) Lexical Analysis (Scanning)

Undergraduate Compilers Review Structure of a Typical Interpreter Compiler Analysis Synthesis Announcements character stream – Send me email if you have set up a group. Include account names for group. lexical analysis IR code generation – Each project is 10% of grade tokens “words” IR Today syntactic analysis optimization – Outline of planned topics for course AST “sentences” IR – Overall structure of a compiler semantic analysis code generation – Lexical analysis (scanning) – Syntactic analysis (parsing) annotated AST target language interpreter CS553 Lecture Undergrad Compilers Review 2 CS553 Lecture Undergrad Compilers Review 3 Structure of the MiniJava Compiler (CodeGenAssem.java) Lexical Analysis (Scanning) Analysis Synthesis Break character stream into tokens (“words”) character stream – Tokens, lexemes, and patterns – Lexical analyzers are usually automatically generated from patterns Lexer lexical analysis IR code generation Translate (regular expressions) (e.g., lex) tokens “words” IRT Tree/ Examples Parser.parse() syntactic analysis instruction selection Mips/Codegen token lexeme(s) pattern AST “sentences” Assem const const const Project 4 if if if BuildSymTable semantic analysis optimization relation <,<=,=,!=,... < | <= | = | != | ... CheckTypes identifier foo,index [a-zA-Z_]+[a-zA-Z0-9_]* Assem CodeGenAssem number 3.14159,570 [0-9]+ | [0-9]*.[0-9]+ string “hi”, “mom” “.*” AST and symbol table code generation minijava.node/ MIPS const pi := 3.14159 ⇒ const, identifier(pi), assign,number(3.14159) SymTable/ CS553 Lecture Undergrad Compilers Review 4 CS553 Lecture Undergrad Compilers Review 5 1 Interaction Between Scanning and Parsing Specifying Tokens with SableCC Theory meets practice: – Regular expressions, formal languages, grammars, parsing… lexer.next() parse tree lexer.peek() SableCC example input file: character stream or AST Tokens Lexical Package minijava; t_plus = '+'; Parser Helpers analyzer t_if = 'if'; all = [0..0xFFFF]; cr = 13; token t_id = letter (letter | digit | underscore)*; digit = ['0'..'9']; t_blank = (' ' | eol | tab)+; letter = ['a'..'z'] | ['A'..'Z']; t_comment = c_comment | line_comment; underscore = ’_’; not_star = [all - '*']; Ignored Tokens not_star_slash = [not_star - '/']; t_blank, t_comment; c_comment = '/*' not_star* ('*' (not_star_slash not_star*)?)* '*/'; CS553 Lecture Undergrad Compilers Review 6 CS553 Lecture Undergrad Compilers Review 7 Recognizing Tokens with DFAs Syntactic Analysis (Parsing) i f Impose structure on token stream ‘if‘ 1 4 5 t_if – Limited to syntactic structure (⇒ high-level) – Structure usually represented with an abstract syntax tree (AST) – Parsers are usually automatically generated from context-free grammars (e.g., yacc, bison, cup, javacc, sablecc) letter or digit for Example letter letter (letter | digit)* 1 2 t_id i 1 10 asg for i = 1 to 10 do a[i] = x * 5; arr tms Ambiguity due to matching substrings a i x 5 – Longest match – Rule priority for id(i) equal number(1) to number(10) do id(a) lbracket id(i) rbracket equal id(x) times number(5) semi CS553 Lecture Undergrad Compilers Review 8 CS553 Lecture Undergrad Compilers Review 9 2 Interaction Between Scanning and Parsing Bottom-Up Parsing: Shift-Reduce Grammer a + b + c (1) S -> E S -> E lexer.next() (2) E -> E + T -> E + T parse tree lexer.peek() (3) E -> T -> E + id character stream or AST (4) T -> id -> E + T + id Lexical Parser -> E + id + id analyzer -> T + id + id token -> id + id + id Rightmost derivation: expand rightmost non-terminals first SableCC, yacc, and bison generate shift-reduce parsers: – LALR(1): look-ahead, left-to-right, rightmost derivation in reverse, 1 symbol lookahead – LALR is a parsing table construction method, smaller tables than canonical LR Reference: Barbara Ryder’s 198:515 lecture notes CS553 Lecture Undergrad Compilers Review 10 CS553 Lecture Undergrad Compilers Review 11 LR Parse Table Shift-Reduce Parsing Example (1) S -> E Stack Input Action (1) S -> E Action Goto (2) E -> E + T (2) E -> E + T $ 0 a + b + c shift 4 State + id $ S E T (3) E -> T (3) E -> T $ 0 a 4 + b + c reduce (4) 0 s4 1 2 (4) T -> id (4) T -> id $ 0 T 2 + b + c reduce (3) 1 s3 accept $ 0 E 1 + b + c shift 2 r(3) r(3) r(3) $ 0 E 1 + 3 b + c shift 3 s4 5 $ 0 E 1 + 3 b 4 + c reduce (4) 4 r(4) r(4) r(4) $ 0 E 1 + 3 T 5 + c reduce (2) 5 r(2) r(2) r(2) $ 0 E 1 + c shift Look at current state and input symbol to get action $ 0 E 1 + 3 c shift shift(n): advance input, push n on stack $ 0 E 1 + 3 c 4 reduce (4) reduce(k): pop rhs of grammar rule k, k = (lhs -> rhs) $ 0 E 1 + 3 T 5 reduce (2) look up state on top of stack and lhs for goto n $ 0 E 1 accept push n accept: stop and success error: stop and fail Reference: Barbara Ryder’s 198:515 lecture notes CS553 Lecture Undergrad Compilers Review 12 CS553 Lecture Undergrad Compilers Review 13 3 Syntax-directed Translation: AST Construction example Using SableCC to specify grammar and generate AST Productions Grammer with production rules cst_program {-> program} = S: E { $$ = $1; }; cst_main_class cst_class_decl* E: E ‘+’ T { $$ = new node(“+”, $1, $3); } {-> New program(cst_main_class.main_class,[cst_class_decl.class_decl])} ; Abstract Syntax Tree | T { $$ = $1; } ; cst_exp_list {-> exp* } = program = T: T_ID { $$ = new leaf(“id”, $1); }; {many_rule} cst_exp cst_exp_rest* main_class [class_decls]:class_decl*; {-> [cst_exp.exp, cst_exp_rest.exp] } exp = Implicit parse tree for a+b+c AST for a+b+c | {empty_rule} {call} exp t_id [args]:exp* | ... S {-> [] } + ; type = {int} | {bool} | ... E cst_exp_rest {-> exp* } = t_comma cst_exp + E + {-> [cst_exp.exp] }; T c E + T T_ID cst_type {-> type } = ... a b | {int_rule} int T T_ID {-> New type.int() } T_ID c b a Reference: Barbara Ryder’s 198:515 lecture notes CS553 Lecture Undergrad Compilers Review 14 CS553 Lecture Undergrad Compilers Review 15 AST, SymTable, IRT Tree, Assem, and MIPS for example Concepts class And { Compilation stages in a compiler public static void main(String[] a){ – Scanning, parsing, semantic analysis, intermediate code generation, System.out.println(new Foo().testing(42)); } } optimization, code generation class Foo { Lexical analysis or scanning – Tools: SableCC, lex, flex, etc. public int testing(int p) { Syntactic analysis or parsing int x; – Tools: SableCC, yacc, bison, etc. if (p < 10 && 2 < p) { Parsing Terms x = 7; – see attached slides, be familiar with these terms } else { x = 22; } return x; }C S }553 Lecture Undergrad Compilers Review 16 CS553 Lecture Undergrad Compilers Review 17 4 Next Time Parsing Terms CFG (Context-free Grammer) Lecture – production rule – More undergraduate compilers review – terminal – nonterminal BNF (Backus-Naur Form) and EBNF (Extended BNF): equivalent to CFGs CS553 Lecture Undergrad Compilers Review 18 CS553 Lecture Undergrad Compilers Review 19 Parsing Terms cont … Top-down parsing – LL(1): left-to-right reading of tokens, leftmost derivation, 1 symbol look-ahead – Predictive parser: an efficient non-backtracking top-down parser that can handle LL(1) – More generally recursive descent parsing may involve backtracking Bottom-up Parsing – LR(1): left-to-right reading of tokens, rightmost derivation in reverse, 1 symbol lookahead – Shift-reduce parsers: for example, bison, yacc, and SableCC generated parsers – Methods for producing an LR parsing table – SLR, simple LR – Canonical LR, most powerful – LALR(1) CS553 Lecture Undergrad Compilers Review 20 5.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    5 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us