
Anatomy of a Compiler Computer Language Engineering Program (character stream) Lexical Analyzer (Scanner) • How to give instructions to a computer? Token Stream – Programming Languages. Syntax Analyzer (Parser) Parse Tree • How to make the computer carryout the Intermediate Code Generator instructions efficiently? Intermediate Representation Intermediate Code Optimizer – Compilers. Optimized Intermediate Representation Code Generator Assembly code cs480(Prasad) L2Syntax 1 cs480(Prasad) L2Syntax 2 What is a Lexical Analyzer? Lexical Analyzer in Action Source program text Tokens • Examples of Token f o r v a r 1 = 1 0 v a r 1 < = • Operators = + - > ( { := == <> •Keywords if while for int double • Numeric literals 43 6.035 -3.6e10 0x13F3A for ID(“var1”) eq_op Num(10) ID(“var1”) leq_op • Character literals ‘a’ ‘~’ ‘\’’ • String literals “3.142” “Fall” “\”\” = empty” • Partition input program text into sequence of • Examples of non-token tokens, attaching corresponding attributes. • White space space(‘ ’) tab(‘\t’) end-of-line(‘\n’) – E.g., C-lexeme “015” token NUM attribute 13 • Comments /*this is not a token*/ • Eliminate white space and comments cs480(Prasad) L2Syntax 3 cs480(Prasad) L2Syntax 4 1 Syntax and Semantics of a programming Input to and output of a parser language Input: - (123.3 + 23.6) •Syntax Token Stream Parse Tree – What is the structure of the program? minus_op – Textual representation. left_paren_op - • Formally defined using context-free num(123.3) grammars (Backus-Naur Formalism) plus_op • Semantics ( ) num(23.6) – What is the meaning of a program? right_paren_op • Harder to give a mathematical definition. Syntax Analyzer (Parser) 123.3 23.6 + cs480(Prasad) L2Syntax 5 cs480(Prasad) L2Syntax 6 Example: A CFG for expressions • Simple arithmetic expressions with + and * Example: A CFG for expressions – 8.2 + 35.6 – 8.32 + 86 * 45.3 <expr> <expr> <op> <expr> – (6.001 + 6.004) * (6.035 * -(6.042 + 6.046)) <expr> ( <expr> ) • Terminals (or tokens) <expr> - <expr> – num for all the numbers <expr> num – plus_op, minus_op, times_op, left_paren_op, right_paren_op <op> + <op> * • What is the grammar for all possible expressions? cs480(Prasad) L2Syntax 7 cs480(Prasad) L2Syntax 8 2 Context-Free Grammars (CFGs) English Language • Terminals • Letters: a,b,c, … – Symbols for strings or tokens { num, (, ), +, *, - } • Alphabet: { a,b, …, z} U { A, B, …, Z} • Nonterminals – Syntactic variables { <expr>, <op> } • Tokens (Terminals) : English words • Start symbol • Nonterminals: <Sentence>, <Subject>, – A special nonterminal <expr> <Clause>, … • Productions • Context-free aspect: Replacing <Noun> by – The manner in which terminals and nonterminals are <Proper Noun> or <Common Noun> combined to form strings. • Context-sensitive aspects: Agreement among – A nonterminal in LHS and a string of terminals and non- words w.r.t. number, gender, tense, etc. terminals in RHS. <expr> - <expr> cs480(Prasad) L2Syntax 9 cs480(Prasad) L2Syntax 10 Example Derivation Another Example Derivation <expr> <expr> <op> <expr> <expr> <expr> <op> <expr> num <op> <expr> <expr> * <expr> num * <expr> <expr> *(<expr> ) num * ( <expr> ) <expr> *(<expr> <op> <expr> ) num * ( <expr> <op> <expr> ) <expr> * ( num <op> <expr> ) num * ( num <op> <expr> ) <expr> * ( num + <expr> ) num * ( num + <expr> ) num * ( num + num ) <expr> * ( num + num ) num * ( num + num ) num * ( num + num ) cs480(Prasad) L2Syntax 11 cs480(Prasad) L2Syntax 12 3 Parse/Derivation Tree Parse/Derivation Tree Example • Graphical Representation of the parsed structure num * ( num + num ) • Shows the sequence of derivations performed – Internal nodes are non-terminals. <expr> – Leaves are terminals. <expr> <op> <expr> – Each parent node is LHS and the children are RHS of a production. num * ()<expr> • Abstracts the details of sequencing of the rule applications, but preserves decomposition of <expr> <op> <expr> a non-terminal. num num cs480(Prasad) L2Syntax 13 + Expression – Expression – One possible parse tree Another possible parse tree num + num * num num + num * num <expr> <expr> <expr> <op> <expr> <expr> <op> <expr> <expr><op> <expr> <expr><op> <expr> num + * num num* num num+ num 124 + (23.5 * 86) = 2145 (124 + 23.5) * 86 = 12685 cs480(Prasad) L2Syntax 15 cs480(Prasad) L2Syntax 16 4 Same string – Two distinct parse (derivation) trees Ambiguous Grammar num + num * num • Applying different derivation orders produces different parse trees. <expr> !AMBIGUITY! <expr> – This can lead to ambiguous/unexpected results. – Note that multiple derivations leading to the same parse <expr> <op> <expr> <expr> <op> <expr> tree is not a problem. <expr><op> <expr> <expr><op> <expr> • A CFG is ambiguous if the same string can num + * num be associated with two distinct parse trees. num* num num+ num – E.g., The expression grammar can be shown to be ambiguous using expressions containing binary infix operators and non-fully parenthesized expressions. 124 + (23.5 * 86) = 2145 (124 + 23.5) * 86 = 12685 cs480(Prasad) L2Syntax 17 cs480(Prasad) L2Syntax 18 Eliminating Ambiguity Removing Ambiguity <expr> <term> + <expr> • Sometimes rewriting a grammar to reflect <expr> <term> operator precedence with additional <term> <unit> * <term> nonterminals will eliminate ambiguity. <term> <unit> * more binding than +. <unit> num && has precedence over ||. <expr> <expr> <op> <expr> <unit> ( <expr> ) <expr> ( <expr> ) <expr> num • One can view the rewrite as “breaking the <op> + symmetry” through “stratification” . <op> * cs480(Prasad) L2Syntax 19 cs480(Prasad) L2Syntax 20 5 Extended BNF (Backus Naur Formalism) Expression Parser Fragment <expr> <term> ( + <term> ) * (String => Parse Tree) Node expr() { <term> <unit> ( * <unit> ) * // PRE: Expects lookahead token. // POST: Consume an Expression <unit> num | ( <expr> ) // and update Lookahead token. Node temp = term(); <expr> <expr> ( <op> <expr> ) * while ( inTok.ttype == '+') { inTok.nextToken(); Kleene Star : <expr> ( <expr> ) Node temp1 = term(); Zero or more temp = new OpNode(temp,'+',temp1); occurrences <expr> num } <op> + | * return temp; } cs480(Prasad) L2Syntax 21 cs480(Prasad) L2Syntax 22 Arithmetic Expressions (ambiguous) (with variables) Resolving Ambiguity in Expressions <expr> <expr> + <expr> | <expr> * <expr> • Different operators : precedence relation | ( <expr> ) | <variable> • Same operator : associativity relation | <constant> 5-2-3 2**3**4 5-2 - 3 5- 2-3 <variable> x | y | z 2^12 2^81 = 0 = 6 <constant> 0 | 1 | 2 Left Associative ( (5-2)-3) Right Associative (2**(3**4)) cs480(Prasad) L2Syntax 23 cs480(Prasad) L2Syntax 24 6 C++ Operator Precedence and Associativity Level Operator Function Level Operator Function 17R :: global scope (unary) 12L + , - arithmetic operators 17L :: class scope (binary) 11L << , >> bitwise shift 16L -> , .member selectors 10L < , <= , > , >= relational operators 16L [] array index 9L == , != equaltity, inequality 16L () function call 8L & bitwise AND 16L () type construction 15R sizeof size in bytes 7L ^ bitwise XOR 15R ++ , -- increment, decrement 6L | bitwise OR 15R ~ bitwise NOT 5L && logical AND 15R ! logical NOT 4L || logical OR 15R + , - uniary minus, plus 3L ?: arithmetic if 15R * , & dereference, address-of 2R = , *= , /= , %= , += , -= 15R () type conversion (cast) <<= , >>= , &= , |= , ^= assignment 15R new , delete free store management operators 14L ->* , .* member pointer select 1L , comma operator 13L * , / , % multiplicative operators cs480(Prasad) L2Syntax 25 7.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages7 Page
-
File Size-