Lexical Analyzer, Lexer Or Scanner

Lexical Analyzer, Lexer Or Scanner

COMP 442/6421 – Compiler Design 1 COMPILER DESIGN Lexical analysis Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 2 Lexical analysis • Lexical analysis is the process of converting a sequence of characters into a sequence of tokens. • A program or function which performs lexical analysis is called a lexical analyzer, lexer or scanner. • A scanner often exists as a single function which is called by the parser, whose functionality is to extract the next token from the source code. • The lexical specification of a programming language is defined by a set of rules which defines the scanner, which are understood by a lexical analyzer generator such as lex or flex. These are most often expressed as regular expressions. • The lexical analyzer (either generated automatically by a tool like lex, or hand- crafted) reads the source code as a stream of characters, identifies the lexemes in the stream, categorizes them into tokens, and outputs a token stream. • This is called "tokenizing." • If the scanner finds an invalid token, it will report a lexical error. Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 3 Roles of the scanner • Removal of comments • Comments are not part of the program’s meaning • Multiple-line comments? • Nested comments? • Case conversion • Is the lexical definition case sensitive? • For identifiers • For keywords Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 4 Roles of the scanner • Removal of white spaces • Blanks, tabulars, carriage returns • Is it possible to identify tokens in a program without spaces? • Interpretation of compiler directives • #include, #ifdef, #ifndef and #define are directives to “redirect the input” of the compiler • May be done by a pre-compiler • Initial creation of the symbol table • A symbol table entry is created when an identifier is encountered • The lexical analyzer cannot create the whole entries • Can convert literals to their value and assign a type • Convert the input file to a token stream • Input file is a character stream • Lexical specifications: literals, operators, keywords, punctuation Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 5 Lexical specifications: tokens and lexemes • Token: An element of the lexical definition of the language. • Lexeme: A sequence of characters identified as a token. Token Lexeme id distance,rate,time,a,x relop >=,<,== openpar ( if if then then assignop = semi ; Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 6 Design of a lexical analyzer Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 7 Design of a lexical analyser • Procedure 1. Construct a set of regular expressions (REs) that define the form of any valid token 2. Derive an NDFA from the REs 3. Derive a DFA from the NDFA 4. Translate the NDFA to a state transition table 5. Implement the table 6. Implement the algorithm to interpret the table • This is exactly the procedure that a scanner generator is implementing. • Scanner generators include: • Lex, flex • Jlex • Alex • Lexgen • re2c Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 8 Regular expressions : { } s : {s | s in s^} a : {a} r | s : {r | r in r^} or {s | s in s^} s* : {sn | s in s^ and n>=0} s+ : {sn | s in s^ and n>=1} id ::= letter(letter|digit)* Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 9 Deriving DFA from REs • Thompson’s construction is an algorithm invented by Ken Thompson in 1968 to translate regular expressions into an NFA. • Rabin-Scott powerset construction is an algorithm invented by Michael O. Rabin and Dana Scott in 1959 to transform an NFA to a DFA. Ken Thompson • Kleene’s algorithm, is an algorithm invented by Stephen Cole Kleene in 1956 to transform a DFA into a regular expression. • These algorithms are the basis of the implementation of all scanner generators. Dana Scott Michael O. Rabin Stephen Cole Kleene Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 10 Thompson’s construction Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 11 REs to NDFA: Thompson’s construction id ::= letter(letter|digit)* ɛ 2 l 3 ɛ ɛ 0 l 1 6 ɛ 7 ɛ ɛ 4 d 5 ɛ Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 12 Thompson’s construction • Thompson’s construction s i s f works recursively by splitting an expression into its constituent subexpressions. • Each subexpression st i N(s) N(t) f corresponds to a subgraph. • Each subgraph is then grafted with other subgraphs depending on the nature of ɛ N(s) ɛ the composed subexpression, i.e. s|t i f • An atomic lexical symbol ɛ N(t) ɛ • A concatenation expression • A union expression ɛ • A Kleene star expression s* i ɛ N(s) ɛ f ɛ Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 13 Thompson’s construction: example (a|b)*abb a 2 a 3 e i e f b 4 b 5 N(s) ɛ 2 a 3 ɛ ɛ ɛ a|b 1 6 s|t i f ɛ 4 b 5 ɛ ɛ N(t) ɛ Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 14 Thompson’s construction: example (a|b)*abb ɛ ɛ 2 a 3 2 ɛ a 3 ɛ ɛ ɛ a(a||bb)* 1 0 ɛ 1 6 6 ɛ 7 ɛ ɛ ɛ 4 b 5 4 ɛ b 5s* i ɛ N(s) ɛ f ɛ ɛ Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 15 Thompson’s construction: example (a|b)*abb a 2 a 3 b 4 b 5 st i N(s) N(t) f ɛ 2 a 3 ɛ ɛ ɛ (a|b)* 0 ɛ 1 6 ɛ 7 2 a 3 ɛ ɛ ɛ ɛ 4 b 5 (a|b)*abb 0 ɛ 1 6 ɛ 7 a 8 b 9 b 10 ɛ ɛ ɛ 4 b 5 ɛ Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 16 Rabin-Scott powerset construction Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 17 Rabin-Scott powerset construction: concepts • SDFA: set of states in the DFA • SNFA: set of states in the NFA • Σ: set of all symbols in the lexical specification. • ɛ-closure(S): set of states in the NDFA that can be reached with ɛ transitions from any element of the set of states S, including the state itself. • MoveNFA(T,a): state in SNFA to which there is a transition from one of the states in states set T, having encounteredɛ symbol a. • Move (T,a): state in S to which there is a transition from one of the DFA DFA 2 l 3 states in states set T, having encounteredɛ symbolɛ a. 0 l 1 id ::= letter(letter|digit)*6 ɛ 7 ɛ ɛ l 4 d 5 C l ɛ A l B d l d D d Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 18 Rabin-Scott powerset construction: algorithm Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 19 Rabin-Scott powerset construction: example ɛ 2 l 3 ɛ ɛ 0 l 1 6 ɛ 7 ɛ ɛ l 4 d 5 C l ɛ A l B d l d D d Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 20 Rabin-Scott powerset construction: example Starting state A = ɛ-closure(0) = {0} A State A : {0} moveDFA(A,l) = ɛ-closure(moveNFA(A,l)) A l B = ɛ-closure({1}) = {1,2,4,7} = B moveDFA(A,d) = ɛ-closure(moveNFA(A,d)) = ɛ-closure({}) = {} Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 21 Rabin-Scott powerset construction: example State B : {1,2,4,7} moveDFA(B,l) = ɛ-closure(moveNFA(B,l)) A l B l C = ɛ-closure({3}) = {1,2,3,4,6,7} = C move (B,d) DFA C = ɛ-closure(moveNFA(B,d)) l = ɛ-closure({5}) A l B = {1,2,4,5,6,7} d = D D Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 22 Rabin-Scott powerset construction: example State C : {1,2,3,4,6,7} l C moveDFA(C,l) l = ɛ-closure(moveNFA(C,l)) A l B = ɛ-closure({3}) d = {1,2,3,4,6,7} D = C l move (C,d) DFA C = ɛ-closure(moveNFA(C,d)) l = ɛ-closure({5}) A l B d = {1,2,4,5,6,7} d = D D Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 23 Rabin-Scott powerset construction: example State D : {1,2,4,5,6,7} l C moveDFA(D,l) l = ɛ-closure(moveNFA(D,l)) A l B d l = ɛ-closure({3}) d = {1,2,3,4,6,7} D = C l move (D,d) DFA C = ɛ-closure(moveNFA(D,d)) l = ɛ-closure({5}) A l B d l = {1,2,4,5,6,7} d = D D d Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 24 Rabin-Scott powerset construction: example Final states: l C l A l B d l d D d Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 25 Generate state transition table state letter digit final A B N B C D Y l C C D Y C l D C D AY l B d l d D d Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 26 Implementation Concordia University Department of Computer Science and Software Engineering Joey Paquet, 2000-2018 COMP 442/6421 – Compiler Design 27 Implementation concerns • Backtracking • Principle : A token is normally recognized only when the next character is read.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    43 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us