Homework 2: Parser and Lexer

Homework 2: Parser and Lexer

Homework 2: Parser and Lexer Remi Meier Compiler Design – 15.03.2018 1 Compiler Phases x86 Javali Compiler Assembly IR IR Front-end Optimizations Back-end Machine Machine independent dependent Lexical Syntactc AST Semantic Analysis Analysis Analysis 2 Homework 2 Class class Main { Method void main() { write(222); ? writeln(); Seq } } Text Write WriteLn Abstract IntConst Syntax Tree How do we… • check if a program follows the syntax of Javali? • extract meaning / structure? 3 Homework 2 Token Lexer Parser Parse Tree Javali AST Stream Part 2a Part 2b 4 Lexical Analysis Token Parse Tree / Text Lexer Parser Stream AST Lexer • Read input character by character • Recognize character groups → tokens Token • Sequence of characters with a collectve meaning → grammar terminals • E.g. constants, identiiers, keywords, … 5 Lexical Analysis class Main { ID : [a-zA-Z]+ ; void main() { NUM : [0-9]+ ; write(222); MISC : [{()};] ; writeln(); WS : ('\n'|' ') → skip ; } } Token stream: ID: class ID: Main MISC: { ID: void ID: main MISC: ( MISC: ) … 6 Syntactc Analysis Token Parse Tree / Text Lexer Parser Stream AST Parser • Check if token stream follows the grammar • Group tokens hierarchically (extract structure) → Parse Tree / Abstract Syntax Tree … 7 TOP-DOWN PARSER 8 Top-down Parsers return a + b ; Grammar in Extended Backus- Naur Form (EBNF): return a + b ; statement: statement return | assign return: return ‘return’ expr ‘;’ ‘return’ expr ‘;’ assign: ID ‘=‘ expr ‘;’ expr: ID ‘+’ ID ‘a’ ‘+’ ‘b’ 9 Implementaton return a + b ; Grammar in Extended Backus- void statement() { Naur Form (EBNF): return(); or assign(); statement: } return | assign void return() { match(‘return’); return: expr(); ‘return’ expr ‘;’ match(‘;’); assign: } ID ‘=‘ expr ‘;’ void expr() { match(ID); expr: ID ‘+’ ID match(‘+’); How to deal match(ID);with alternatves?} 10 Lookahead return a + b ; Grammar in Extended Backus- Naur Form (EBNF): statement: void statement() { return if (next() is ‘return’) { | assign return(); } else if (next() is ID) { return: assign(); ‘return’ expr ‘;’ } assign: } ID ‘=‘ expr ‘;’ expr: ID ‘+’ ID LL(1) 11 http://www.antlr4.org/ (or HW2 fragment) 12 ANTLR Token speciications MyLexer.java + MyParser.java Grammar Top-down parser generator ● ALL(*) adaptive, arbitrary lookahead ● handles any non-lef-recursive context-free grammar 13 ANTLR – Grammar Descripton /* This is an example */ grammar Example; /* Parser rules = Non-terminals */ program : Start rule matching end-of-ile statement* EOF ; Lower-case initial: Parser statement : assignment ';' | expression ';' ; /* Lexer rules = Terminals */ Identifier : Letter (Letter | Digit)* ; Upper-case initial: Lexer Letter : '\u0024' | '\u0041'..'\u005a'; → Tokens 14 ANTLR – Operators Extended Backus-Naur Form (EBNF) program : EBNF operators statement* EOF; x | y | z (ordered) alternative x? at most once (optional) statement : assignment ';' x* 0 .. n times x+ | expression ';' 1 .. n times l e x ; one of the chars, e.g.: e [charset] r [a-zA-Z] - o method : n l 'x'..'y' characters in range y type name '(' params? ')' ; 15 Demo 1 16 ANTLR – Troubleshootng ANTLR does not warn about ambiguous rules ● resolves ambiguity at runtime → requires lots of testing ANTLR does not handle indirect lef-recursion ● direct lef-recursion supported 17 ANTLR – Lexer Ambiguity What if some input is matched by multiple lexer rules? d o creates implicit lexer rule parserRule : 'foo' parserRule ; c u T123 : 'foo' m e n fragment enforces that the rule fragment t o Letter : [a-z] ; r never produces a token, but can be d e r used in other lexer rules Identifier : Letter+ ; can never match foo, but e.g., foot Lexer decides based on: 1. rule with the longest match irst 2. literal tokens before all regular Lexer rules 3. document order 4. fragment rules never match on their own 18 ANTLR – Parser Ambiguity stmt: 'if' expr 'then' stmt 'else' stmt (1) | 'if' expr 'then' stmt (2) | ID '=' expr ; if a then if c then d else e ) (2 (2 ), ), (1 (1 ) if a then if c then d else e if a then if c then d else e Ambiguous since there exist more than one parse trees for the same input. 19 ANTLR – Parser Ambiguity stmt: 'if' expr 'then' stmt 'else' stmt (1) | 'if' expr 'then' stmt (2) | ID '=' expr ; At decision points, if more than one alternative matches a given input, follow document order. if a then if c then d else e ) (2 (2 ), ), (1 (1 ) if a then if c then d else e if a then if c then d else e 20 ANTLR – Parser Ambiguity stmt: 'if' expr 'then' stmt 'else' stmt (1) | 'if' expr 'then' stmt (2) | ID '=' expr ; At decision points, if more than one alternative matches a given input, follow document order. stmt: 'if' expr 'then' stmt Soluton | 'if' expr 'then' stmt 'else' stmt | ID '=' expr ; 21 ANTLR – Parser Ambiguity At decision points, if more than one alternative matches a given input, follow document order. Alternative solution: stmt: 'if' expr 'then' stmt ('else' stmt)? (…)? → (…| ) | ID '=' expr ; Sub-rules introduce additional decision points. 22 ANTLR – Lef-recursion Without: “a, b, c” list : LETTER (',' LETTER)*; Direct: list : list ',' LETTER void list() { | LETTER ; if (???) { list(); Indirect: match(‘,’); match(LETTER); list : LETTER } else { | longlist ; match(LETTER); } } longlist : list ',' LETTER; 23 ANTLR – Direct Lef-recursion exp : exp '*' exp A grammar that implicitly assigns | exp '+' exp rewrite priorites to alternatives in | ID ; document order a * a + a * a a + a * a https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Lef-recursive+rules 24 Demo 2 25 Homework Token Lexer Parser Parse Tree Javali AST Stream Part 2a Part 2b Parser grammar: Javali.g4 JavaliAstVisitor.java 26 Javali AST Generaton BinaryOp(+) “a * a + a * a” Visitor BinaryOp(*) BinaryOp(*) Var('a') Var('a') Var('a') Var('a') → JavaliAstVisitor.java Generated Files ANTLR JavaliLexer/Parser.java ● the real thing Javali(Base)Visitor.java ● base class for parse-tree visitor Javali(Lexer).tokens ● token → number mapping for debugging 28 Generated Visitor start : exp EOF ; exp : exp '*' exp one method | exp '+' exp per rule | ID ; start : exp EOF ; exp : exp '*' exp # MULT one method | exp '+' exp # ADD per label / rule | ID # TERM ; https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Parser+Rules 29 Constructng the Javali AST start : exp EOF ; exp : exp '*' exp # MULT | exp '+' exp # ADD | ID # TERM ; BinaryOp(+) “a * a + a * a” visitADD visitMULT visitMULT Visitor BinaryOp(*) BinaryOp(*) visitTERM Var('a') Var('a') Var('a') Var('a') 30 Demo 3 31 Final Notes • Look on our website for more material. • Due date is March, 29th at 10 a.m. Please don't do that… 32.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    32 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us