Compiler Design Spring 2018

3.2 Lexical Analysis
Thomas R. Gross
Computer Science Department
ETH Zurich, Switzerland

Overview
§ 3.1 Introduction
§ 3.2 Lexical analysis
§ 3.3 “Top down” parsing
§ 3.4 “Bottom up” parsing

Outline
§ Ambiguity (from Tuesday)
§ Lexical analysis
§ Top-down parsing
  § Simple backtracking parsers
  § Simple predictive parsers

3.2 Lexical analysis
§ Use regular expression to describe elements of language
  § Names of variables, fields, methods, classes, …
  § Constants (int, float, double, hex, …)
  § Keywords of the language (if then else while class …)
§ Example (from previous lecture):
  § Id: L { L | N } *
  § L = { a | b | c | … | z }
  § N = { 0 | 1 | 2 | … | 9 }
§ Regular expressions → DFA
  § Automatic construction easy
§ DFA produces the tokens

How it works
[Figure: Source “a3 + b” → Lexer (or scanner), a DFA → Tokens: Id(a3) Term(+) Id(b) → Parser → Yes / No. Label: Analyzer program.]

Scanner
§ Also known as lexer
§ Problem: Characters → tokens
§ How to identify token?
  § Example: b + 3 a
§ How to stop building a token and then start a new one?
§ How much does the scanner “need to know”?

Token assembly
§ First (part of) answer: Stop when encountering a character that does not belong to current token
§ For many languages: Stop when encountering whitespace
§ Whitespace: Invisible and/or irrelevant for program
§ Look at C, C++, Java
  § <space> ␣
  § Newline, form feed, CR (carriage return)
  § Tab
  § Comments

Comments and whitespace
§ Some languages attach meaning to whitespace
  § Nesting level in Python
  § “make” utility
§ Warning: macro facilities, pragma
§ Not all comments are whitespace
  § Directives hidden in comments
  § Example: Fortran90 comments start with “!”

    !DEC$ IVDEP        – ignore vector dependencies
    DO I=1, N
      A(INDARR(I)) = A(INDARR(I)) + B(I)
    END DO

IVDEP – what’s that?
§ “ignore vector dependencies”

    A(INDARR(I)) = A(INDARR(I)) + B(I)

§ Parallelization
  § Processor 0: A[INDARR[1]] = …
  § Processor 1: A[INDARR[2]] = …
§ Possible outcomes
  § INDARR[1] == 10     192
  § INDARR[2] == 100    192

Simple strategy
§ Works for JavaLi and other languages
§ Put as many characters into a token as possible until it is obvious that a new token starts

    a3 + b

  § a – first token
  § 3 – add to token
  § + – new token
§ “Maximal munch”
§ May make good error reporting difficult
  § a12 vs. a + 12

Maximal munch limitations
§ Does not work for all programming languages
§ Example C program segment

    int j, k;
    int* kaddr;
    int** kkaddr;
    kaddr = & k;
    j = *kaddr + 2;
    kkaddr = & kaddr;
    Token: “**”
    j = ** kkaddr + 3;
    k = 5 * * * kkaddr;
    Token: “*”
    j = 7 * * kaddr;

What can be done?
§ Close(r) coupling between scanner and parser.

[Figure: Input program → Scanner → Token → Parser. The parser sends Requests back to the scanner: Id, “*” (type of token expected or a list of types expected).]
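
The two ideas above, maximal munch and a parser that tells the scanner which token types it expects, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's JavaLi scanner: the class `Scanner`, the helper `scan_all`, and the token kinds `Num`, `Plus`, `Star` and `StarStar` are hypothetical names. Only `Id`, defined as L { L | N } *, comes from the slides. It uses Python's `re` module instead of a hand-built DFA.

```python
import re

# Token kinds are assumptions for this sketch. "StarStar" is a hypothetical
# two-character token that competes with "Star" under maximal munch.
PATTERNS = {
    "Id": re.compile(r"[a-z][a-z0-9]*"),  # Id: L { L | N } *
    "Num": re.compile(r"[0-9]+"),
    "Plus": re.compile(r"\+"),
    "Star": re.compile(r"\*"),
    "StarStar": re.compile(r"\*\*"),
}


class Scanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self, expected=None):
        """Return (kind, lexeme) for the longest match among the expected kinds.

        `expected` models the parser's request: a list of token kinds, or
        None for "any kind". Returns None at end of input.
        """
        # Whitespace separates tokens and is otherwise ignored.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos == len(self.text):
            return None
        kinds = list(PATTERNS) if expected is None else expected
        best = None
        for kind in kinds:
            match = PATTERNS[kind].match(self.text, self.pos)
            # Maximal munch: keep the longest lexeme found so far.
            if match and (best is None or len(match.group()) > len(best[1])):
                best = (kind, match.group())
        if best is None:
            raise SyntaxError(
                f"unexpected character {self.text[self.pos]!r} at position {self.pos}"
            )
        self.pos += len(best[1])
        return best


def scan_all(text, expected=None):
    """Scan the whole input, passing the same request for every token."""
    scanner = Scanner(text)
    tokens = []
    token = scanner.next_token(expected)
    while token is not None:
        tokens.append(token)
        token = scanner.next_token(expected)
    return tokens


print(scan_all("a3 + b"))
print(scan_all("**kkaddr"))
print(scan_all("**kkaddr", expected=["Star", "Id"]))
```

Without a request, the scanner groups `**` into one `StarStar` token. When the (imagined) parser asks only for `Star` or `Id`, the same input yields two `Star` tokens, which is the behavior the C pointer example needs. A real parser would change the request from token to token; `scan_all` keeps it fixed only to keep the sketch short.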
