
Compiler:

Computer Science and Engineering • College of Engineering • The Ohio State University

Lecture 37 Backus-Naur Form (BNF)

• Classic notation for writing a CFG
• Very old: invented to describe the syntax of ALGOL 60
• But still used today!
• Basic syntax
  • <symbol> is a non-terminal symbol
  • ::= is the arrow in a production rule
  • Vertical bar ( | ) for choice in a rule
• Common extensions (many variants)
  • ( )'s to group elements
  • RE repetition operators * + ? (or { } for *, [ ] for ?)
  • Comments

Example: Mailing Addresses

::=

::= |

::= "." |

::=

::= ","

::= "#" | ""

Parse Tree

• A parse tree records how rules are applied to form a string
  • Root: start symbol
  • Internal nodes: non-terminals
  • Leaves: terminals (i.e., tokens)

Example

• Grammar: <exp> ::= <exp> <exp> | "a" <exp> "b" | "a" "b"
• String: abaababb

             exp
           /     \
        exp       exp
       /   \     / | \
      a     b   a exp b
                 /   \
              exp     exp
             /   \   /   \
            a     b a     b

Your Turn

• Grammar:
    ::= (("+"|"-") )*
    ::= (("*"|"/") )*
    ::= | "(" ")"
• String: 14 + 2 * 3 - 6 / 2
• Parse tree:

Your Turn

• MEAN := SUM DIV 100;

• An ambiguous grammar is one that permits two different parse trees to be formed for the same string
• Example: ::= + | - |
• Consider: 3 + 6 - 2
• Parse tree?
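
The ambiguity can be checked mechanically by enumerating every parse tree for the string. The sketch below assumes the grammar has the shape E ::= E "+" E | E "-" E | number; the name `parse_trees` and the tuple representation `(operator, left, right)` are illustrative choices, not part of the lecture.

```python
from functools import lru_cache


def parse_trees(tokens):
    """Return every parse tree for `tokens` under the assumed grammar
    E ::= E "+" E | E "-" E | number."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(lo, hi):
        # All trees that derive exactly tokens[lo:hi].
        if hi - lo == 1:
            return (tokens[lo],) if tokens[lo].isdigit() else ()
        found = []
        # Try every operator position as the root of the tree.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "-"):
                for left in trees(lo, mid):
                    for right in trees(mid + 1, hi):
                        found.append((tokens[mid], left, right))
        return tuple(found)

    return list(trees(0, len(tokens)))


for tree in parse_trees(["3", "+", "6", "-", "2"]):
    print(tree)
# ('+', '3', ('-', '6', '2'))
# ('-', ('+', '3', '6'), '2')
```

Two distinct trees for one string is exactly what makes the grammar ambiguous.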

• How do we calculate a parse tree from a sequence of tokens?
• In general, there are two basic ways to tackle a problem of tree construction:
  1. Bottom up
     • Start at leaves
     • Calculate their parents, then their parents…
  2. Top down
     • Start at root
     • Calculate its children, then their children…

Operator Precedence Parsing

• An early bottom-up technique
• Define binding priority between "operators" (i.e., token types)
  • e.g., A + B * C - D
  • Priority: '+' < '*' and '-' < '*'
  • Resulting parse tree:

Operator Precedence Parsing

• Operators are tokens
• Binding priority only defined between terminals
• Grammar (implicitly) defines a matrix of binding priorities
  • Note 1: Not all pairs defined!
  • Note 2: Ordering is not antisymmetric!

Example

• Reductions
• Parse tree creation

    BEGIN  READ  (  id  )  ;
         <     =  <   >  >
    BEGIN  READ  (  nt1  )  ;
         <     =     =    >
    BEGIN  nt2  ;
            <

• Notes:
  • Each reduction adds an internal node
  • Internal node names do not matter

Shift-Reduce Parsing

• Generalizes idea of operator precedence
• Two phases:
  1. Shift: Scan tokens, placing them on a stack
  2. Reduce: Group tokens at top of stack
     • Pop the tokens that group together off the stack
     • Push corresponding non-terminal
• Repeat until done
• Should be left with ______

Shift-Reduce Parsing II
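
The shift and reduce phases can be sketched as a small stack machine. This sketch decides when to reduce by comparing operator priorities, in the spirit of operator precedence parsing. The `PRIORITY` table, the function name `shift_reduce`, and the tuple form `(operator, left, right)` standing in for a pushed non-terminal are all assumptions for illustration; it handles only binary operators and does no error checking.

```python
# Assumed binding priorities: higher numbers bind tighter.
PRIORITY = {"+": 1, "-": 1, "*": 2, "/": 2}


def shift_reduce(tokens):
    """Parse operands separated by binary operators; return the final stack."""
    stack = []

    def reduce():
        # Pop "operand operator operand" off the top and push one node.
        right = stack.pop()
        op = stack.pop()
        left = stack.pop()
        stack.append((op, left, right))

    for tok in tokens:
        if tok in PRIORITY:
            # Look at the incoming operator: reduce while the operator
            # already on the stack binds at least as tightly.
            while len(stack) >= 3 and PRIORITY[stack[-2]] >= PRIORITY[tok]:
                reduce()
        stack.append(tok)  # shift

    while len(stack) >= 3:  # input exhausted: reduce whatever is left
        reduce()
    return stack


print(shift_reduce(["A", "+", "B", "*", "C", "-", "D"]))
# [('-', ('+', 'A', ('*', 'B', 'C')), 'D')]
```

Note that every reduction touches only the top three stack entries, never anything buried deeper.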

• Grammar must be "LR"
  • "Left-to-right scan of the input, producing a right-most derivation" (in reverse)
• Symbols to be reduced always appear at top of stack (never inside it)
• Need to "look ahead" to decide how/when to reduce symbols at the top of the stack
  • If we only need to look ahead 1 token: LR(1) grammar

Recursive Descent

• Top-down approach
• Each rule has an associated function
  • Scan forward
  • Try to identify string matching this rule
  • Function may have to call other functions
• Example: Function to recognize
    find "READ";
    find "(";
    find ; //another function call
    find ")";

Recursive Descent: Problem
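
The one-function-per-rule idea from the READ example can be sketched in runnable form. This is a minimal sketch under stated assumptions: the statement has the shape READ ( id-list ), where the id-list is taken to be one or more identifiers separated by commas; the class name `Parser` and the method names `read`, `id_list`, and `identifier` are hypothetical.

```python
class Parser:
    """Recursive-descent recognizer for the assumed rule READ ( id-list )."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        # Next unconsumed token, or None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def find(self, expected):
        # Consume the next token if it is the expected one, else fail.
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1

    def read(self):
        self.find("READ")
        self.find("(")
        names = self.id_list()  # another function call
        self.find(")")
        return ("read", names)

    def id_list(self):
        # A token is always consumed before looking for more input.
        names = [self.identifier()]
        while self.peek() == ",":
            self.pos += 1
            names.append(self.identifier())
        return names

    def identifier(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier():
            raise SyntaxError(f"expected an identifier, got {tok!r}")
        self.pos += 1
        return tok


print(Parser(["READ", "(", "A", ",", "B", ")"]).read())
# ('read', ['A', 'B'])
```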

• Subtle potential problem: "left-recursion"
• Occurs when the left-most (first) symbol of a rule is the same non-terminal (recursive)
    ::= id | "," id
• If we want to expand the 2nd alternative, we first call ourselves!
• Results in infinite recursion
• Solution: use optional repetition
    ::= id { "," }
• Now the function always consumes a token before the recursive call

Abstract Syntax Tree (AST)

• Concrete parse tree: Faithful representation of each grammar rule application
  • Often contains syntactic clutter
• Abstract syntax tree: Faithful representation of structure of program
  • Only semantically important information is included

Parse Tree

• MEAN := SUM DIV 100;

[Figure: parse tree with leaves id (MEAN), :=, id (SUM), DIV, int (100)]

AST

• MEAN := SUM DIV 100;

            :=
           /  \
     id MEAN   DIV
              /   \
        id SUM     int 100

Code Generation

• Output produced from the AST
• Semantic routines: one routine per internal node in AST
• Two approaches:
  • Create entire tree, then transform and walk the tree, generating output
  • Generate output as the grammar rules are recognized, bottom up

Example

• Code snippet
    MID := (MAX + MIN) DIV 2
• Grammar rule
    ::= DIV
• Semantic routine:
  • Needs results from children, e.g., the registers which contain the values being divided
  • Generates output: machine code for the division
  • Returns location where result is placed, for its parent to use
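
The three duties of a semantic routine (use the children's results, emit code, return the result location) can be sketched as a walk over the expression part of the snippet. Everything concrete here is an assumption for illustration: the tuple-based AST, the function name `generate`, the register names `r1`, `r2`, and the instruction mnemonics `LOAD`, `LOADI`, `ADD`, `DIV`, which belong to a made-up machine rather than any real instruction set.

```python
import itertools


def generate(ast):
    """Emit code for an expression AST; return (instructions, result register)."""
    code = []
    registers = (f"r{n}" for n in itertools.count(1))

    def visit(node):
        kind = node[0]
        if kind == "id":  # leaf: load a variable into a fresh register
            reg = next(registers)
            code.append(f"LOAD {reg}, {node[1]}")
            return reg
        if kind == "int":  # leaf: load a constant into a fresh register
            reg = next(registers)
            code.append(f"LOADI {reg}, {node[1]}")
            return reg
        # Internal node: first obtain the children's result locations.
        left = visit(node[1])
        right = visit(node[2])
        opcode = {"+": "ADD", "DIV": "DIV"}[kind]
        code.append(f"{opcode} {left}, {right}")  # result stays in `left`
        return left  # location reported to the parent

    result = visit(ast)
    return code, result


# AST for (MAX + MIN) DIV 2
ast = ("DIV", ("+", ("id", "MAX"), ("id", "MIN")), ("int", 2))
code, result = generate(ast)
print("\n".join(code))
# LOAD r1, MAX
# LOAD r2, MIN
# ADD r1, r2
# LOADI r3, 2
# DIV r1, r3
print(result)
# r1
```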

        DIV
       /   \
      +     int 2

Optimization

• An optimizing compiler attempts to generate the most efficient code
  • Time (fast execution times)
  • Space (small object files)
• Requires sophisticated analysis
• Often uses an intermediate representation (IR) of code
  • IR is not executed directly
  • IR is analyzed for deciding register allocation, instruction ordering, branch shadows, etc.

Example: LLVM IR

; 14 bytes: "hello, world", a newline (\0A), and a NUL terminator (\00)
@.str = internal constant [14 x i8] c"hello, world\0A\00"

declare i32 @printf(i8*, ...)

define i32 @main(i32 %argc, i8** %argv) nounwind {
entry:
  ; pointer to the first character of the string
  %tmp1 = getelementptr [14 x i8], [14 x i8]* @.str, i32 0, i32 0
  %tmp2 = call i32 (i8*, ...) @printf( i8* %tmp1 ) nounwind
  ret i32 0
}

Compiler Compilers

• Write:
  • Token definitions (REs)
  • Grammar definition (CFG)
  • Semantic routines (code to execute when visiting/generating the nodes of the tree)
• Use a tool to translate this information into a compiler (in C or Java or…)
  • Translation tool: a compiler compiler!
• Classic Unix tools:
  • Old school: lex and yacc ("lexical analyzer", "yet another compiler compiler")
  • Better: GNU's flex and bison
  • Output a lexer and a compiler that calls the generated lexer

Modern Tool: ANTLR

• ANother Tool for Language Recognition
• See: .org, github.com/antlr/antlr4
• Examples: github.com/antlr/grammars-v4 (simple one: arithmetic.g4)
• Can generate code in many languages (Java, C#, Python, JavaScript, C++…)
• Two parts:
  • The tool (processes grammar to generate the lexer/parser)
  • The runtime (libraries for running the generated lexer/parser)

Summary

• BNF: Syntax for grammar definition
• Parse trees reflect application of grammar rules to produce program
• Parse tree vs abstract syntax tree
• Two strategies:
  • Bottom up (shift reduce)
  • Top down (recursive descent)
• Code generation
• IR and optimizations
• Compiler compilers: lex/yacc, flex/bison, antlr