Scanning and Parsing

CS553 Lecture: Scanning and Parsing

Announcements
– Project 1 is 5% of total grade
– Project 2 is 10% of total grade
– Project 3 is 15% of total grade
– Project 4 is 10% of total grade

Today
– Outline of planned topics for course
– Overall structure of a compiler
– Lexical analysis (scanning)
– Syntactic analysis (parsing)

Structure of a Typical Interpreter / Compiler

Analysis:
    character stream
      → lexical analysis   → tokens (“words”)
      → syntactic analysis → AST (“sentences”)
      → semantic analysis  → annotated AST
      → interpreter
Synthesis:
    IR code generation → IR
      → optimization    → IR
      → code generation → target language

Lexical Analysis (Scanning)

Break character stream into tokens (“words”)
– Tokens, lexemes, and patterns
– Lexical analyzers are usually automatically generated from patterns (regular expressions) (e.g., lex)

Examples

    token        lexeme(s)        pattern
    const        const            const
    if           if               if
    relation     <,<=,=,!=,...    < | <= | = | != | ...
    identifier   foo,index        [a-zA-Z_]+[a-zA-Z0-9_]*
    number       3.14159,570      [0-9]+ | [0-9]*\.[0-9]+
    string       "hi", "mom"      ".*"

    const pi := 3.14159  ⇒  const, identifier(pi), assign, number(3.14159)

Interaction Between Scanning and Parsing

    character stream → Lexical analyzer → token → Parser → parse tree or AST

The parser pulls tokens from the lexical analyzer by calling lexer.next() and lexer.peek().

Specifying Tokens with SableCC

Theory meets practice:
– Regular expressions, formal languages, grammars, parsing…

SableCC example input file:

    Package minijava;

    Helpers
      all = [0..0xFFFF];
      cr = 13;
      digit = ['0'..'9'];
      letter = ['a'..'z'] | ['A'..'Z'];
      underscore = '_';
      not_star = [all - '*'];
      not_star_slash = [not_star - '/'];
      c_comment = '/*' not_star* ('*' (not_star_slash not_star*)?)* '*/';

    Tokens
      t_plus = '+';
      t_if = 'if';
      t_id = letter (letter | digit | underscore)*;
      t_blank = (' ' | eol | tab)+;
      t_comment = c_comment | line_comment;

    Ignored Tokens
      t_blank, t_comment;

Recognizing Tokens with DFAs

    'if':                       1 --i--> 4 --f--> 5   (state 5 accepts t_if)
    letter (letter | digit)*:   1 --letter--> 2       (state 2 accepts t_id and loops on letter or digit)

Ambiguity due to matching substrings
– Longest match
– Rule priority

Syntactic Analysis (Parsing)

Impose structure on token stream
– Limited to syntactic structure (⇒ high-level)
– Structure usually represented with an abstract syntax tree (AST)
– Parsers are usually automatically generated from context-free grammars (e.g., yacc, bison, cup, javacc, sablecc)

Example

    for i = 1 to 10 do a[i] = x * 5;

    Token stream:
    for id(i) equal number(1) to number(10) do id(a) lbracket id(i) rbracket equal id(x) times number(5) semi

    AST:
    for(i, 1, 10, asg(arr(a, i), tms(x, 5)))

Bottom-Up Parsing: Shift-Reduce

Grammar

    (1) S -> E
    (2) E -> E + T
    (3) E -> T
    (4) T -> id

Rightmost derivation of a + b + c (expand rightmost non-terminals first):

    S -> E
      -> E + T
      -> E + id
      -> E + T + id
      -> E + id + id
      -> T + id + id
      -> id + id + id

SableCC, yacc, and bison generate shift-reduce parsers:
– LALR(1): look-ahead, left-to-right, rightmost derivation in reverse, 1 symbol lookahead
– LALR is a parsing table construction method, smaller tables than canonical LR

Shift-Reduce Parsing Example

Same grammar, input a + b + c:

    Stack        Input        Action
    $            a + b + c    shift
    $ a          + b + c      reduce (4)
    $ T          + b + c      reduce (3)
    $ E          + b + c      shift
    $ E +        b + c        shift
    $ E + b      + c          reduce (4)
    $ E + T      + c          reduce (2)
    $ E          + c          shift
    $ E +        c            shift
    $ E + c                   reduce (4)
    $ E + T                   reduce (2)
    $ E                       reduce (1)
    $ S                       accept

Reference: Barbara Ryder’s 198:515 lecture notes

Shift-Reduce Parsing Example (precedence problem)

    (1) S -> E
    (2) E -> E + T
    (3) E -> E * T
    (4) E -> T
    (5) T -> id

    Stack        Input        Action
    $            a + b * c    shift

Syntax-directed Translation: AST Construction example

Grammar with production rules

    S: E            { $$ = $1; };
    E: E '+' T      { $$ = new node("+", $1, $3); }
     | T            { $$ = $1; }
     ;
    T: T_ID         { $$ = new leaf("id", $1); };

Implicit parse tree for a+b+c:

    S(E(E(E(T(T_ID a)) + T(T_ID b)) + T(T_ID c)))

AST for a+b+c:

    +(+(a, b), c)

Reference: Barbara Ryder’s 198:515 lecture notes

Using SableCC to specify grammar and generate AST

    Productions
      cst_program {-> program} =
          cst_main_class cst_class_decl*
            {-> New program(cst_main_class.main_class, [cst_class_decl.class_decl])} ;

      cst_exp_list {-> exp* } =
          {many_rule} cst_exp cst_exp_rest*
            {-> [cst_exp.exp, cst_exp_rest.exp] }
        | {empty_rule} {-> [] } ;

      cst_exp_rest {-> exp* } =
          t_comma cst_exp {-> [cst_exp.exp] };

    Abstract Syntax Tree
      program = main_class [class_decls]:class_decl*;
      exp = {call} exp t_id [args]:exp* | ...

Parsing Terms

CFG (Context-free Grammar)
– production rule
– terminal
– nonterminal
– FOLLOW(X): “the set of terminals that can immediately follow X”

BNF (Backus-Naur Form) and EBNF (Extended BNF): equivalent to CFGs

Parsing Terms cont …

Top-down parsing
– LL(1): left-to-right reading of tokens, leftmost derivation, 1 symbol look-ahead
– Predictive parser: an efficient non-backtracking top-down parser that can handle LL(1)
– More generally recursive descent parsing may involve backtracking

Bottom-up Parsing
– LR(1): left-to-right reading of tokens, rightmost derivation in reverse, 1 symbol lookahead
– Shift-reduce parsers: for example, bison, yacc, and SableCC generated parsers
– Methods for producing an LR parsing table
    – SLR, simple LR
    – Canonical LR, most powerful
    – LALR(1)

Concepts

Compilation stages in a compiler
– Scanning, parsing, semantic analysis, intermediate code generation, optimization, code generation

Lexical analysis or scanning
– Tools: SableCC, lex, flex, etc.

Syntactic analysis or parsing
– Tools: SableCC, yacc, bison, etc.

Next Time

Lecture
– More undergraduate compilers review

Language Implementation Timeline (for entertainment purposes only!)

[Figure: timeline running from the ’50s to 2000. Entries shown: A-0 [Hopper]; Fortran [Backus]; Algol [Comm.]; LISP [McCarthy]; COBOL [Short Range Comm.]; Parser generators; Simula [Dahl & Nygaard]; Smalltalk [Kay] & PFC [Kennedy]; BASIC [Kemeny & Kurtz]; Trace sched. [Fisher]; Coloring reg. alloc. [Chaitin]; 1st RISC (IBM 801), Wolfe’s thesis; C++ [Stroustrup]; Dep. vectors [Karp et al.]; OCaml [INRIA]; Value numbering [Cocke & Schwartz]; Dragon book [ASU]; Copying GC [Cheney]; PDG [Ferrante]; Perl [Wall]; Pascal [Wirth] & 1st uproc [4004]; SW pipelining [Lam]; C [Ritchie] & ML [Milner et al.]; SSA [Cytron]; 486 w/ cache; Prolog [Colmerauer]; Modern DFA [Kildall] & Lamport’s parallelism; Sparse cond. const. [Wegman & Zadeck]; Lex & YACC [Johnson]; Superblock scheduling [Hwu]; GCD test [Banerjee & Towle]; Parafrase [Kuck]; May v. must [Barth]; Java [Gosling & Sun]; Flow-sens. defined [Banning]; PRE [Morel et al.]; Itanium ships & Jikes RVM [IBM].]

CS553 @ CSU 2010
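The two disambiguation rules from the scanning slides (longest match, then rule priority) can be illustrated with a small hand-written scanner. This is only a sketch, not what lex or SableCC generate: real tools compile the patterns into a single DFA, while this version simply tries every pattern at each position. The rule names, the `assign` rule for `:=`, and the `scan` function are assumptions made for illustration.

```python
import re

# Token rules in priority order: on a tie in match length, the earlier rule wins.
# Rule names and the ":=" assign rule are illustrative assumptions taken from
# the slide's example "const pi := 3.14159".
# Python's "|" picks the first alternative that matches, so longer alternatives
# are listed first inside each pattern.
RULES = [
    ("const", r"const"),
    ("if", r"if"),
    ("assign", r":="),
    ("relation", r"<=|!=|<|="),
    ("identifier", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("number", r"[0-9]*\.[0-9]+|[0-9]+"),
    ("blank", r"[ \t\r\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]
IGNORED = {"blank"}  # plays the role of SableCC's "Ignored Tokens" section


def scan(text):
    """Return a list of (token, lexeme) pairs for text."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            # Longest match: only a strictly longer lexeme replaces the current
            # best, so ties stay with the higher-priority (earlier) rule.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if best_name not in IGNORED:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

For example, `scan("iffy if")` classifies `iffy` as an identifier because the identifier rule matches four characters while the `if` rule matches only two, and classifies the standalone `if` as the keyword because both rules match two characters and `if` is listed first.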
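The shift-reduce trace for a + b + c and the yacc-style AST actions can be combined into one small table-driven parser. The sketch below assumes a hand-built SLR(1) table for the four-production grammar on the slides; a generator such as yacc, bison, or SableCC would compute an LALR(1) table automatically. The state numbers, the tuple encoding of AST nodes, and the `parse` function are all hypothetical choices for this example.

```python
# Productions numbered as on the slides: rule number -> (lhs, length of rhs)
PRODUCTIONS = {
    1: ("S", 1),  # S -> E
    2: ("E", 3),  # E -> E + T
    3: ("E", 1),  # E -> T
    4: ("T", 1),  # T -> id
}

# Hand-built SLR(1) table for this grammar (an assumption of this sketch).
# Keys are (state, terminal). State 6 is reached after reducing S -> E.
ACTION = {
    (0, "id"): ("shift", 3),
    (1, "+"): ("shift", 4),
    (1, "$"): ("reduce", 1),
    (2, "+"): ("reduce", 3), (2, "$"): ("reduce", 3),
    (3, "+"): ("reduce", 4), (3, "$"): ("reduce", 4),
    (4, "id"): ("shift", 3),
    (5, "+"): ("reduce", 2), (5, "$"): ("reduce", 2),
    (6, "$"): ("accept", None),
}
GOTO = {(0, "S"): 6, (0, "E"): 1, (0, "T"): 2, (4, "T"): 5}


def parse(lexemes):
    """Parse identifiers separated by '+'; return (AST, list of actions taken)."""
    tokens = [("+" if lx == "+" else "id", lx) for lx in lexemes] + [("$", "$")]
    states = [0]   # stack of parser states
    values = []    # semantic values, one per grammar symbol on the stack
    trace = []
    pos = 0
    while True:
        kind, lexeme = tokens[pos]
        entry = ACTION.get((states[-1], kind))
        if entry is None:
            raise SyntaxError(f"unexpected {lexeme!r} at token {pos}")
        action, arg = entry
        if action == "shift":
            trace.append("shift")
            states.append(arg)
            values.append(lexeme)
            pos += 1
        elif action == "reduce":
            trace.append(f"reduce ({arg})")
            lhs, length = PRODUCTIONS[arg]
            popped = values[-length:]
            del values[-length:]
            del states[-length:]
            if arg == 2:      # E -> E + T: build an interior "+" node
                value = ("+", popped[0], popped[2])
            elif arg == 4:    # T -> id: build a leaf
                value = ("id", popped[0])
            else:             # S -> E and E -> T pass the value through
                value = popped[0]
            values.append(value)
            states.append(GOTO[(states[-1], lhs)])
        else:
            trace.append("accept")
            return values[0], trace
```

Calling `parse(["a", "+", "b", "+", "c"])` yields the same action sequence as the Action column of the slide's trace, and an AST that nests to the left, `+(+(a, b), c)`, because rule (2) is left-recursive.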
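The definition of FOLLOW(X) in the parsing terms can be made concrete by computing the sets for the slides' expression grammar. The following is a minimal sketch of the usual fixed-point computation of FIRST and FOLLOW; the grammar encoding, the use of an empty string as the epsilon marker, and the function names are assumptions made for this example.

```python
EPSILON = ""   # marker for the empty string (a convention of this sketch)
END = "$"      # end-of-input marker

# The expression grammar from the shift-reduce slides: (lhs, rhs symbols)
GRAMMAR = [
    ("S", ["E"]),
    ("E", ["E", "+", "T"]),
    ("E", ["T"]),
    ("T", ["id"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of_sequence(symbols, first):
    """FIRST of a string of symbols, given the FIRST sets of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in NONTERMINALS else {sym}
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result
    result.add(EPSILON)  # reached only if every symbol can derive the empty string
    return result


def compute_first():
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:  # iterate until no set grows
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of_sequence(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(start="S"):
    first = compute_first()
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                # Terminals that can start whatever comes after sym in this rule
                tail = first_of_sequence(rhs[i + 1:], first)
                new = tail - {EPSILON}
                if EPSILON in tail:  # sym can end the rule, so inherit FOLLOW(lhs)
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow
```

For this grammar the computation gives FOLLOW(S) = {$} and FOLLOW(E) = FOLLOW(T) = {+, $}: a `+` or the end of input is all that can appear immediately after an E or a T.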
