Lexical Analyzer in Action

Lexical Analyzer in Action

Anatomy of a Compiler Computer Language Engineering Program (character stream) Lexical Analyzer (Scanner) • How to give instructions to a computer? Token Stream – Programming Languages. Syntax Analyzer (Parser) Parse Tree • How to make the computer carryout the Intermediate Code Generator instructions efficiently? Intermediate Representation Intermediate Code Optimizer – Compilers. Optimized Intermediate Representation Code Generator Assembly code cs480(Prasad) L2Syntax 1 cs480(Prasad) L2Syntax 2 What is a Lexical Analyzer? Lexical Analyzer in Action Source program text Tokens • Examples of Token f o r v a r 1 = 1 0 v a r 1 < = • Operators = + - > ( { := == <> •Keywords if while for int double • Numeric literals 43 6.035 -3.6e10 0x13F3A for ID(“var1”) eq_op Num(10) ID(“var1”) leq_op • Character literals ‘a’ ‘~’ ‘\’’ • String literals “3.142” “Fall” “\”\” = empty” • Partition input program text into sequence of • Examples of non-token tokens, attaching corresponding attributes. • White space space(‘ ’) tab(‘\t’) end-of-line(‘\n’) – E.g., C-lexeme “015” token NUM attribute 13 • Comments /*this is not a token*/ • Eliminate white space and comments cs480(Prasad) L2Syntax 3 cs480(Prasad) L2Syntax 4 1 Syntax and Semantics of a programming Input to and output of a parser language Input: - (123.3 + 23.6) •Syntax Token Stream Parse Tree – What is the structure of the program? minus_op – Textual representation. left_paren_op - • Formally defined using context-free num(123.3) grammars (Backus-Naur Formalism) plus_op • Semantics ( ) num(23.6) – What is the meaning of a program? right_paren_op • Harder to give a mathematical definition. Syntax Analyzer (Parser) 123.3 23.6 + cs480(Prasad) L2Syntax 5 cs480(Prasad) L2Syntax 6 Example: A CFG for expressions • Simple arithmetic expressions with + and * Example: A CFG for expressions – 8.2 + 35.6 – 8.32 + 86 * 45.3 <expr> <expr> <op> <expr> – (6.001 + 6.004) * (6.035 * -(6.042 + 6.046)) <expr> ( <expr> ) • Terminals (or tokens) <expr> - <expr> – num for all the numbers <expr> num – plus_op, minus_op, times_op, left_paren_op, right_paren_op <op> + <op> * • What is the grammar for all possible expressions? cs480(Prasad) L2Syntax 7 cs480(Prasad) L2Syntax 8 2 Context-Free Grammars (CFGs) English Language • Terminals • Letters: a,b,c, … – Symbols for strings or tokens { num, (, ), +, *, - } • Alphabet: { a,b, …, z} U { A, B, …, Z} • Nonterminals – Syntactic variables { <expr>, <op> } • Tokens (Terminals) : English words • Start symbol • Nonterminals: <Sentence>, <Subject>, – A special nonterminal <expr> <Clause>, … • Productions • Context-free aspect: Replacing <Noun> by – The manner in which terminals and nonterminals are <Proper Noun> or <Common Noun> combined to form strings. • Context-sensitive aspects: Agreement among – A nonterminal in LHS and a string of terminals and non- words w.r.t. number, gender, tense, etc. terminals in RHS. <expr> - <expr> cs480(Prasad) L2Syntax 9 cs480(Prasad) L2Syntax 10 Example Derivation Another Example Derivation <expr> <expr> <op> <expr> <expr> <expr> <op> <expr> num <op> <expr> <expr> * <expr> num * <expr> <expr> *(<expr> ) num * ( <expr> ) <expr> *(<expr> <op> <expr> ) num * ( <expr> <op> <expr> ) <expr> * ( num <op> <expr> ) num * ( num <op> <expr> ) <expr> * ( num + <expr> ) num * ( num + <expr> ) num * ( num + num ) <expr> * ( num + num ) num * ( num + num ) num * ( num + num ) cs480(Prasad) L2Syntax 11 cs480(Prasad) L2Syntax 12 3 Parse/Derivation Tree Parse/Derivation Tree Example • Graphical Representation of the parsed structure num * ( num + num ) • Shows the sequence of derivations performed – Internal nodes are non-terminals. <expr> – Leaves are terminals. <expr> <op> <expr> – Each parent node is LHS and the children are RHS of a production. num * ()<expr> • Abstracts the details of sequencing of the rule applications, but preserves decomposition of <expr> <op> <expr> a non-terminal. num num cs480(Prasad) L2Syntax 13 + Expression – Expression – One possible parse tree Another possible parse tree num + num * num num + num * num <expr> <expr> <expr> <op> <expr> <expr> <op> <expr> <expr><op> <expr> <expr><op> <expr> num + * num num* num num+ num 124 + (23.5 * 86) = 2145 (124 + 23.5) * 86 = 12685 cs480(Prasad) L2Syntax 15 cs480(Prasad) L2Syntax 16 4 Same string – Two distinct parse (derivation) trees Ambiguous Grammar num + num * num • Applying different derivation orders produces different parse trees. <expr> !AMBIGUITY! <expr> – This can lead to ambiguous/unexpected results. – Note that multiple derivations leading to the same parse <expr> <op> <expr> <expr> <op> <expr> tree is not a problem. <expr><op> <expr> <expr><op> <expr> • A CFG is ambiguous if the same string can num + * num be associated with two distinct parse trees. num* num num+ num – E.g., The expression grammar can be shown to be ambiguous using expressions containing binary infix operators and non-fully parenthesized expressions. 124 + (23.5 * 86) = 2145 (124 + 23.5) * 86 = 12685 cs480(Prasad) L2Syntax 17 cs480(Prasad) L2Syntax 18 Eliminating Ambiguity Removing Ambiguity <expr> <term> + <expr> • Sometimes rewriting a grammar to reflect <expr> <term> operator precedence with additional <term> <unit> * <term> nonterminals will eliminate ambiguity. <term> <unit> * more binding than +. <unit> num && has precedence over ||. <expr> <expr> <op> <expr> <unit> ( <expr> ) <expr> ( <expr> ) <expr> num • One can view the rewrite as “breaking the <op> + symmetry” through “stratification” . <op> * cs480(Prasad) L2Syntax 19 cs480(Prasad) L2Syntax 20 5 Extended BNF (Backus Naur Formalism) Expression Parser Fragment <expr> <term> ( + <term> ) * (String => Parse Tree) Node expr() { <term> <unit> ( * <unit> ) * // PRE: Expects lookahead token. // POST: Consume an Expression <unit> num | ( <expr> ) // and update Lookahead token. Node temp = term(); <expr> <expr> ( <op> <expr> ) * while ( inTok.ttype == '+') { inTok.nextToken(); Kleene Star : <expr> ( <expr> ) Node temp1 = term(); Zero or more temp = new OpNode(temp,'+',temp1); occurrences <expr> num } <op> + | * return temp; } cs480(Prasad) L2Syntax 21 cs480(Prasad) L2Syntax 22 Arithmetic Expressions (ambiguous) (with variables) Resolving Ambiguity in Expressions <expr> <expr> + <expr> | <expr> * <expr> • Different operators : precedence relation | ( <expr> ) | <variable> • Same operator : associativity relation | <constant> 5-2-3 2**3**4 5-2 - 3 5- 2-3 <variable> x | y | z 2^12 2^81 = 0 = 6 <constant> 0 | 1 | 2 Left Associative ( (5-2)-3) Right Associative (2**(3**4)) cs480(Prasad) L2Syntax 23 cs480(Prasad) L2Syntax 24 6 C++ Operator Precedence and Associativity Level Operator Function Level Operator Function 17R :: global scope (unary) 12L + , - arithmetic operators 17L :: class scope (binary) 11L << , >> bitwise shift 16L -> , .member selectors 10L < , <= , > , >= relational operators 16L [] array index 9L == , != equaltity, inequality 16L () function call 8L & bitwise AND 16L () type construction 15R sizeof size in bytes 7L ^ bitwise XOR 15R ++ , -- increment, decrement 6L | bitwise OR 15R ~ bitwise NOT 5L && logical AND 15R ! logical NOT 4L || logical OR 15R + , - uniary minus, plus 3L ?: arithmetic if 15R * , & dereference, address-of 2R = , *= , /= , %= , += , -= 15R () type conversion (cast) <<= , >>= , &= , |= , ^= assignment 15R new , delete free store management operators 14L ->* , .* member pointer select 1L , comma operator 13L * , / , % multiplicative operators cs480(Prasad) L2Syntax 25 7.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    7 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us