Anatomy of a Compiler Computer Language Engineering Program (character stream) Lexical Analyzer (Scanner) • How to give instructions to a computer? Token Stream – Programming Languages. Syntax Analyzer (Parser) Parse Tree • How to make the computer carryout the Intermediate Code Generator instructions efficiently? Intermediate Representation Intermediate Code Optimizer – Compilers. Optimized Intermediate Representation Code Generator Assembly code cs480(Prasad) L2Syntax 1 cs480(Prasad) L2Syntax 2
What is a Lexical Analyzer? Lexical Analyzer in Action Source program text Tokens • Examples of Token f o r v a r 1 = 1 0 v a r 1 < = • Operators = + - > ( { := == <> •Keywords if while for int double • Numeric literals 43 6.035 -3.6e10 0x13F3A for ID(“var1”) eq_op Num(10) ID(“var1”) leq_op • Character literals ‘a’ ‘~’ ‘\’’ • String literals “3.142” “Fall” “\”\” = empty” • Partition input program text into sequence of • Examples of non-token tokens, attaching corresponding attributes. • White space space(‘ ’) tab(‘\t’) end-of-line(‘\n’) – E.g., C-lexeme “015” token NUM attribute 13 • Comments /*this is not a token*/ • Eliminate white space and comments
cs480(Prasad) L2Syntax 3 cs480(Prasad) L2Syntax 4
1 Syntax and Semantics of a programming Input to and output of a parser language Input: - (123.3 + 23.6) •Syntax Token Stream Parse Tree – What is the structure of the program? minus_op – Textual representation. left_paren_op - • Formally defined using context-free num(123.3) grammars (Backus-Naur Formalism) plus_op • Semantics ( ) num(23.6) – What is the meaning of a program? right_paren_op
• Harder to give a mathematical definition. Syntax Analyzer (Parser) 123.3 23.6 +
cs480(Prasad) L2Syntax 5 cs480(Prasad) L2Syntax 6
Example: A CFG for expressions • Simple arithmetic expressions with + and * Example: A CFG for expressions – 8.2 + 35.6 – 8.32 + 86 * 45.3
cs480(Prasad) L2Syntax 7 cs480(Prasad) L2Syntax 8
2 Context-Free Grammars (CFGs) English Language
• Terminals • Letters: a,b,c, … – Symbols for strings or tokens { num, (, ), +, *, - } • Alphabet: { a,b, …, z} U { A, B, …, Z} • Nonterminals – Syntactic variables {
cs480(Prasad) L2Syntax 9 cs480(Prasad) L2Syntax 10
Example Derivation Another Example Derivation
cs480(Prasad) L2Syntax 11 cs480(Prasad) L2Syntax 12
3 Parse/Derivation Tree Parse/Derivation Tree Example • Graphical Representation of the parsed structure num * ( num + num ) • Shows the sequence of derivations performed – Internal nodes are non-terminals.
– Leaves are terminals.
Expression – Expression – One possible parse tree Another possible parse tree num + num * num num + num * num
num* num num+ num
124 + (23.5 * 86) = 2145 (124 + 23.5) * 86 = 12685
cs480(Prasad) L2Syntax 15 cs480(Prasad) L2Syntax 16
4 Same string – Two distinct parse (derivation) trees Ambiguous Grammar num + num * num • Applying different derivation orders produces different parse trees.
cs480(Prasad) L2Syntax 17 cs480(Prasad) L2Syntax 18
Eliminating Ambiguity Removing Ambiguity
cs480(Prasad) L2Syntax 19 cs480(Prasad) L2Syntax 20
5 Extended BNF (Backus Naur Formalism) Expression Parser Fragment
cs480(Prasad) L2Syntax 21 cs480(Prasad) L2Syntax 22
Arithmetic Expressions (ambiguous) (with variables) Resolving Ambiguity in Expressions
5-2 - 3 5- 2-3
cs480(Prasad) L2Syntax 23 cs480(Prasad) L2Syntax 24
6 C++ Operator Precedence and Associativity
Level Operator Function Level Operator Function 17R :: global scope (unary) 12L + , - arithmetic operators 17L :: class scope (binary) 11L << , >> bitwise shift 16L -> , .member selectors 10L < , <= , > , >= relational operators 16L [] array index 9L == , != equaltity, inequality 16L () function call 8L & bitwise AND 16L () type construction 15R sizeof size in bytes 7L ^ bitwise XOR 15R ++ , -- increment, decrement 6L | bitwise OR 15R ~ bitwise NOT 5L && logical AND 15R ! logical NOT 4L || logical OR 15R + , - uniary minus, plus 3L ?: arithmetic if 15R * , & dereference, address-of 2R = , *= , /= , %= , += , -= 15R () type conversion (cast) <<= , >>= , &= , |= , ^= assignment 15R new , delete free store management operators 14L ->* , .* member pointer select 1L , comma operator 13L * , / , % multiplicative operators cs480(Prasad) L2Syntax 25
7