Lexical and Syntactic Analysis


10/4/2012
COS 301 Programming Languages
Lexical and Syntactic Analysis
Sebesta Chapter 4.1-4.4

Lexical and Syntactic Analysis
• Language implementation systems must analyze source code, regardless of the specific implementation approach (compiler or interpreter)
• Nearly all syntax analysis is based on a formal description of the syntax of the source language (BNF)
  – Lexical analysis uses less powerful grammars than syntactic analysis

Source Code Syntax Analysis
• The syntax analysis portion of a language processor nearly always consists of two parts:
  – A low-level part called a lexical analyzer (mathematically, a finite automaton based on a regular grammar)
  – A high-level part called a syntax analyzer, or parser (mathematically, a push-down automaton based on a context-free grammar, or BNF)

Why Separate Lexical and Syntax Analysis?
• Simplicity: less complex approaches can be used for lexical analysis; separating them simplifies the parser
• Efficiency: separation allows optimization of the lexical analyzer
  – About 75% of execution time for a non-optimizing compiler is lexical analysis
• Portability: parts of the lexical analyzer may not be portable, but the parser is always portable
  – The lexical analyzer has to deal with low-level details of the character set, such as what a newline character looks like, EOF, etc.
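To make the two-part split concrete, here is a minimal Python sketch that is not taken from the slides: the lexical analyzer is a set of regular expressions, and the parser is a hand-written recursive-descent parser for a small, hypothetical arithmetic grammar. The token names (NUM, IDENT, OP), the grammar, and the names `lex` and `Parser` are illustrative assumptions. A recursive-descent parser uses the call stack where a pushdown automaton would use an explicit stack.

```python
import re
from typing import Dict, Iterator, NamedTuple


class Token(NamedTuple):
    kind: str    # lexical category (the "token"), e.g. "NUM" or "IDENT"
    lexeme: str  # the matched substring of the source program


# Lexical level: every token category is described by a regular expression.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()]"),
    ("SKIP", r"\s+"),  # whitespace: recognized, then discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(text: str) -> Iterator[Token]:
    """Yield tokens one at a time, ending with an EOF token."""
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())
        pos = match.end()
    yield Token("EOF", "")


class Parser:
    """Recursive-descent parser (and evaluator) for a hypothetical grammar:
        expr   -> term   (("+" | "-") term)*
        term   -> factor (("*" | "/") factor)*
        factor -> NUM | IDENT | "(" expr ")"
    """

    def __init__(self, text: str, env: Dict[str, float]):
        self.tokens = lex(text)
        self.env = env  # values for identifiers
        self.current = next(self.tokens)

    def advance(self) -> Token:
        """Consume the current token and ask the lexer for the next one."""
        token = self.current
        self.current = next(self.tokens, Token("EOF", ""))
        return token

    def parse(self) -> float:
        value = self.expr()
        if self.current.kind != "EOF":
            raise SyntaxError(f"unexpected token {self.current.lexeme!r}")
        return value

    def expr(self) -> float:
        value = self.term()
        while self.current.lexeme in ("+", "-"):
            op = self.advance().lexeme
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self) -> float:
        value = self.factor()
        while self.current.lexeme in ("*", "/"):
            op = self.advance().lexeme
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self) -> float:
        token = self.advance()
        if token.kind == "NUM":
            return int(token.lexeme)
        if token.kind == "IDENT":
            return self.env[token.lexeme]
        if token.lexeme == "(":
            value = self.expr()
            if self.current.lexeme != ")":
                raise SyntaxError("expected ')'")
            self.advance()
            return value
        raise SyntaxError(f"unexpected token {token.lexeme!r}")


print(Parser("sum + 2 * (x - 1)", {"sum": 10, "x": 4}).parse())  # prints 16
```

Because `lex` is a generator, the parser requests tokens one at a time instead of receiving the whole token list up front, so the parser never looks at raw characters and the lexer never needs to know the grammar.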
Lexical Analysis
• A lexical analyzer is a pattern matcher for character strings
• A lexical analyzer is a "front-end" for the parser
• It identifies substrings of the source program that belong together: lexemes
  – Lexemes match a character pattern, which is associated with a lexical category called a token
  – sum is a lexeme; its token may be IDENT
• "Token" is often used in place of "lexeme"

Lexical Analyzer
• Purpose: transform the program representation from a sequence of characters to a sequence of tokens
• Input: a stream of characters
• Output: lexemes / tokens
• Discard: whitespace, comments

Example Tokens
• Identifiers
• Literals: 123, 5.67, 'x', true
• Keywords or reserved words: bool, while, char
• Operators: + - * / ...
• Punctuation: ; , ( ) { }

Other Sequences
• Whitespace: space, tab
• Comments, e.g.
    // {any-char} end-of-line
    /* {any-char} */
• End-of-line
• End-of-file
• Note: in some languages (C, C++, Java, ...) end-of-line or newline characters are considered whitespace
• In other languages (BASIC, Fortran, etc.) they are statement delimiters

Lexical Analyzer (continued)
• The lexical analyzer is usually a function that is called by the parser when it needs the next token
• Three approaches to building a lexical analyzer:
  – Write a formal description of the tokens (grammar or regular expressions) and use a software tool that constructs table-driven lexical analyzers from such a description (e.g., lex, flex, flex++)
  – Design a state diagram that describes the tokens and write a program that implements the state diagram
  – Design a state diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram

The Chomsky Hierarchy (Again)
• Four levels of grammar:
  1. Regular
  2. Context-free
  3. Context-sensitive
  4. Unrestricted (recursively enumerable)
• CFGs are used for syntax parsing
• Regular grammars are used for lexical analysis

Productions
• All grammars are tuples {P, T, N, S}
  – where P is a set of productions, T a set of terminal symbols, N a set of non-terminal symbols, and S the start symbol (a member of N)
• The form of the production rules distinguishes the grammars in the hierarchy

Three Models of the Lexical Level
• Although the lexical level can be described with BNF, regular grammars can be used
• Equivalent to regular grammars are:
  – Regular expressions
  – Finite state automata

Context-Sensitive Grammars
• Production: α → β, where |α| ≤ |β| and α, β ∈ (N ∪ T)*
  – The left-hand side can be composed of strings of terminals and nonterminals
  – The length of the RHS cannot be less than the length of the LHS (the sentential form cannot shrink in a derivation), except that S is allowed to derive the empty string
• Note that context-sensitive grammars can have productions such as
  – aXb → aYZc
  – aXc → aaXb

Context-Free Grammars
• Already discussed as BNF, a stylized form of CFG
• Every production is of the form A → ω, where A is a single non-terminal and ω is a string of terminals and/or non-terminals (possibly empty)
• Equivalent to a pushdown automaton
• For a wide class of unambiguous CFGs, there are table-driven, linear-time parsers

Regular Grammars
• Simplest and least powerful; equivalent to:
  – Regular expressions
  – Finite-state automata
• A regular grammar is a right-regular or a left-regular grammar
  – If a grammar mixes both types of rules, it is a linear grammar, which is more powerful than a regular grammar
  – Regular langs ⊂ linear langs ⊂ context-free langs
• Example of a linear language that is not a regular language: { aⁿ bⁿ | n ≥ 1 }
  – i.e., we cannot balance symbols that have matching pairs, such as ( ), { }, begin end, with a regular grammar

Regular Grammars (continued)
• Left-regular grammar: α ∈ T*, B ∈ N
    A → B α
    A → α
• Right-regular grammar: α ∈ T*, B ∈ N
    A → α B
    A → α
  – i.e., the rhs of any production must contain at most one nonterminal AND it must be the rightmost symbol
  – Direct recursion is permitted: A → α A
• All productions must be right-regular, or all must be left-regular

Right-regular Integer Grammar
    Integer → 0 Integer | 1 Integer | ... | 9 Integer
    Integer → 0 | 1 | ... | 9
• In EBNF:
    Integer → (0 | ... | 9) Integer
    Integer → 0 | ... | 9

Summary of Grammatical Forms
• Regular Grammars
  – Only one nonterminal on the lhs; the rhs of any production must contain at most one nonterminal AND it must be the rightmost (leftmost) symbol
• Context-Free Grammars
  – Only one non-terminal symbol on the lhs
• Context-Sensitive Grammars
  – The lhs can contain any number of terminals and non-terminals
  – The sentential form cannot shrink in a derivation
• Unrestricted Grammars
  – Same as CSGs, but without the restriction on shrinking sentential forms

Left-regular Integer Grammar
    Integer → Integer 0 | Integer 1 | ... | Integer 9
    Integer → 0 | 1 | ... | 9
• In EBNF:
    Integer → Integer (0 | ... | 9)
    Integer → 0 | ... | 9

Finite State Automata
• An abstract machine that is useful for lexical analysis
  – Also known as Finite State Machines
• Two varieties (equivalent in power):
  – Non-deterministic finite state automata (NFSA)
  – Deterministic finite state automata (DFSA)
• Only DFSAs are directly useful for constructing programs
  – Any NFSA can be converted into an equivalent DFSA
• We will use an informal approach to describe DFSAs

What is a Finite State Machine?
• A device that has a finite number of states
• It accepts input from a "tape"
• Each state and each input symbol uniquely determine another state (hence deterministic)
• The device is in a designated state before any input is read: this is the "start state"
• At the end of input the device may be in an "accepting" state
  – If the inputs are characters, then the device recognizes a language
• Some inputs may cause the device to enter an "error" state (not usually explicitly represented)

Other Uses of FSAs / FSMs
• Finite state machines can be used to describe things other than languages
• Many relatively simple embedded systems can be described with a finite state machine

FSA Graph Representation
• A finite state automaton has
  1. A set of states, represented by nodes in a graph
  2. An input alphabet augmented with a unique end-of-input symbol
  3. A state transition function, represented by directed edges in the graph, labeled with symbols from the alphabet or set of inputs
  4. A unique start state
  5. One or more final (accepting) states, with no exiting edges

Example: Vending Machine
• Adapted from Wulf, Shaw, Hilfinger, Flon, Fundamental Structures of Computer Science, p. 17.

Example: Battery Charger
• From http://www.jcelectronica.com/articles/state_machines.htm

A Finite State Automaton for Identifiers
[Diagram: from start state S, a Letter leads to state 1; state 1 loops on Letter or Digit; the end-of-input symbol $ leads from state 1 to the accepting state F.]
• This diagram indicates an explicit transition to an accepting state
• We could also use this diagram:
[Diagram: S goes to state 1 on L (letter); state 1 loops on L or D (digit) and is itself the accepting state.]

FSM for a Childish Language
• What language is described by this diagram?
[State diagram from start state S with transitions labeled m, a, and d.]

Quiz Oct 2
1. Draw a DFSA that recognizes binary strings that start with 1 and end with 0
2. Draw a DFSA that recognizes binary strings with at least three consecutive 1's
3. Below is a BNF grammar for fractional numbers. Rewrite it as EBNF.
    S -> -FN | FN
    FN -> DL | DL.DL
    DL -> D | D DL
    D -> 0|1|2|3|4|5|6|7|8|9

Regular Expressions
• An alternative to regular grammars for specifying a language at the lexical level
• Also used extensively in text processing
• Very useful for web applications
• Built-in support in many languages, e.g., Perl, Ruby, Java, JavaScript, Python, .NET languages
• There are several different syntactic conventions for regexes

Regex      Meaning
x          a character x (stands for itself)
\x         an escaped character, e.g., \n
M | N      M or N
M N        M followed by N
M*         zero or more occurrences of M

Note: the meaning of \ varies with software; typical usage:
• certain non-printable characters (e.g., \n = newline and \t = tab)
• ASCII hex (\xFF) or Unicode hex (\uFFFF in Java, JavaScript, Python and .NET; \x{FFFF} in Perl)
• shorthand character classes (\w = word, \s = whitespace, \d = digit)
• escaping a literal, e.g. \* or \.

Regular Expression Metasymbols
Regex      Meaning
M+         one or more occurrences of M
M?         zero or one occurrence of M
M*         zero or more occurrences of M
[aeiou]    the set of vowels
[0-9]      the set of digits
.          any single character
( )        grouping

Regex Examples – 1
Let Σ = { a, b, c }
r = ( a | b )* c
This regex specifies repetition (0, 1, 2, etc. occurrences) of either a or b, followed by c. Strings that match this regular expression include:
    c
    ac
    bc
    abc
    aabbaabbc

Regex Examples – 2
Let Σ = { a, b, c }
r = ( a | c )* b ( a | c )*
This regular expression specifies repetition of either a or c, followed by b, followed by repetition of either a or c.

Regex Examples – 3
• A regular expression to represent a signed integer.
• There is an optional leading sign (+ or -) followed by at…
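The identifier automaton and the hand-constructed, table-driven approach described above can be sketched in Python as follows. This is an illustrative sketch, not code from the course: the transition table encodes the second identifier diagram (state 1 accepting, no explicit $ transition), letters and digits are assumed to be ASCII only, and a missing table entry plays the role of the implicit error state. The equivalent regular expression is included as a cross-check, along with Regex Examples 1 and 2 rewritten in Python's `re` syntax.

```python
import re

# Transition table for the identifier automaton: (state, input class) -> next state.
# A missing entry stands for the implicit "error" state.
TRANSITIONS = {
    ("S", "letter"): "1",
    ("1", "letter"): "1",
    ("1", "digit"): "1",
}
START_STATE = "S"
ACCEPTING_STATES = {"1"}


def char_class(ch: str) -> str:
    """Map a character onto the automaton's input alphabet (ASCII only, by assumption)."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


def dfa_accepts(text: str) -> bool:
    """Run the table-driven DFA over the whole string."""
    state = START_STATE
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:  # no transition: the implicit error state
            return False
    return state in ACCEPTING_STATES


# The same language as a regular expression.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

# Regex Examples 1 and 2, in Python's re syntax.
EXAMPLE_1 = re.compile(r"(a|b)*c")
EXAMPLE_2 = re.compile(r"(a|c)*b(a|c)*")

for s in ["sum", "x1", "1x", "", "a_b"]:
    # The two columns should always agree: DFA vs. regular expression.
    print(repr(s), dfa_accepts(s), IDENTIFIER.fullmatch(s) is not None)
```

Keeping the transitions in a dictionary rather than in nested `if` statements means a new token class only changes the table, not the driver loop, which is the point of the table-driven option.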
Recommended publications
  • C Constants and Literals Integer Literals Floating-Point Literals
    C Constants and Literals
    The constants refer to fixed values that the program may not alter during its execution. These fixed values are also called literals. Constants can be of any of the basic data types, like an integer constant, a floating constant, a character constant, or a string literal. There are also enumeration constants. The constants are treated just like regular variables except that their values cannot be modified after their definition.
    Integer literals
    An integer literal can be a decimal, octal, or hexadecimal constant. A prefix specifies the base or radix: 0x or 0X for hexadecimal, 0 for octal, and nothing for decimal. An integer literal can also have a suffix that is a combination of U and L, for unsigned and long, respectively. The suffix can be uppercase or lowercase and can be in any order. Here are some examples of integer literals:
    Floating-point literals
    A floating-point literal has an integer part, a decimal point, a fractional part, and an exponent part. You can represent floating-point literals either in decimal form or in exponential form. A floating-point literal must include the decimal point, the exponent, or both, and it must include the integer part, the fractional part, or both. The signed exponent is introduced by e or E. Here are some examples of floating-point literals:
    Character constants
    Character literals are enclosed in single quotes, e.g., 'x', and can be stored in a simple variable of char type. A character literal can be a plain character (e.g., 'x'), an escape sequence (e.g., '\t'), or a universal character (e.g., '\u02C0').
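The floating-point rules above translate fairly directly into a regular expression. The following Python sketch is an illustration, not part of the original tutorial: it covers only decimal floating constants with an optional f/F/l/L suffix, does not handle C99 hexadecimal floating constants such as 0x1.8p3, and treats a leading minus sign as a separate unary operator rather than part of the literal. The names `FLOAT_LITERAL` and `is_float_literal` are hypothetical.

```python
import re

# Building blocks of a decimal floating-point literal in C.
FRACTION = r"(?:[0-9]*\.[0-9]+|[0-9]+\.)"  # e.g. ".5", "3.14", "3."
EXPONENT = r"(?:[eE][+-]?[0-9]+)"          # signed exponent, e.g. "e10", "E-5"
SUFFIX = r"[fFlL]?"                         # optional f/F (float) or l/L (long double)

# Either a fraction with an optional exponent, or digits with a mandatory exponent.
# A leading sign is not matched: "-1.5" is unary minus applied to the literal 1.5.
FLOAT_LITERAL = re.compile(rf"(?:{FRACTION}{EXPONENT}?|[0-9]+{EXPONENT}){SUFFIX}")


def is_float_literal(text: str) -> bool:
    return FLOAT_LITERAL.fullmatch(text) is not None


for candidate in ["3.14159", "314159E-5L", "510E", "210f", ".e55"]:
    print(candidate, is_float_literal(candidate))
```

The two alternatives mirror the two requirements in the text: a decimal point or an exponent must be present, and at least one of the integer and fractional parts must contain a digit.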
  • PYTHON NOTES What Is Python?
    PYTHON NOTES
    What is Python?
    Python is a popular programming language. It was created by Guido van Rossum and first released in 1991. It is used for web development (server-side), software development, mathematics, and system scripting.
    What can Python do?
    Python can be used on a server to create web applications. Python can connect to database systems. It can also read and modify files. Python can be used to handle big data and perform complex mathematics. Python can be used for production-ready software development.
    Why Python?
    Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc.). Python has a simple syntax similar to the English language. Python's syntax allows developers to write programs with fewer lines than some other programming languages. Python runs on an interpreter system, meaning that code can be executed as soon as it is written. This means that prototyping can be very quick.
    Good to know
    The most recent major version of Python is Python 3, which we shall be using in this tutorial. Python 2 reached end of life in January 2020 and no longer receives even security updates, although it is still found in older code.
    Python Syntax compared to other programming languages
    Python was designed for readability, and has some similarities to the English language with influence from mathematics. Python uses new lines to complete a command, as opposed to other programming languages, which often use semicolons or parentheses. Python relies on indentation, using whitespace, to define scope, such as the scope of loops, functions and classes. Other programming languages often use curly brackets for this purpose.
    Python Install
    Many PCs and Macs will have Python already installed.
  • Variables and Calculations
    3 VARIABLES AND CALCULATIONS
    Now you're ready to learn your first elements of Python and start learning how to solve programming problems. Although programming languages have myriad features, the core parts of any programming language are the instructions that perform numerical calculations. In this chapter, we'll explore how math is performed in Python programs and learn how to solve some problems using only mathematical operations.
    Sample Program
    Let's start by looking at a very simple problem and its Python solution.
    PROBLEM: THE TOTAL COST OF TOOTHPASTE
    A store sells toothpaste at $1.76 per tube. Sales tax is 8 percent. For a user-specified number of tubes, display the cost of the toothpaste, showing the subtotal, sales tax, and total, including tax.
    First I'll show you a program that solves this problem:
    toothpaste.py
        tube_count = int(input("How many tubes to buy: "))
        toothpaste_cost = 1.76
        subtotal = toothpaste_cost * tube_count
        sales_tax_rate = 0.08
        sales_tax = subtotal * sales_tax_rate
        total = subtotal + sales_tax
        print("Toothpaste subtotal: $", subtotal, sep = "")
        print("Tax: $", sales_tax, sep = "")
        print("Total is $", total, " including tax.", sep = "")
    Parts of this program may make intuitive sense to you already; you know how you would answer the question using a calculator and a scratch pad, so you know that the program must be doing something similar. Over the next few pages, you'll learn exactly what's going on in these lines of code. For now, enter this program into your Python editor exactly as shown and save it with the required .py extension. Run the program several times with different responses to the question to verify that the program works.
  • Regular Expressions: The Following Are Sections of Different Programming Language Specifications
    Regular Expressions
    The following are sections of different programming language specifications. Write the corresponding regular expressions for each of the described lexical elements.
    1) C Identifiers [Kernighan]
    An identifier is a sequence of letters and digits. The first character must be a letter; the underscore _ is considered a letter. Uppercase and lowercase letters are considered different.
    2) Java 6 and previous Integer Literals [Gosling]
    An integer literal may be expressed in decimal (base 10), hexadecimal (base 16), or octal (base 8). An integer literal is of type long if it is suffixed with an ASCII letter L or l (ell); otherwise it is of type int. The suffix L is preferred, because the letter l (ell) is often hard to distinguish from the digit 1 (one).
    A decimal numeral is either the single ASCII character 0, representing the integer zero, or consists of an ASCII digit from 1 to 9, optionally followed by one or more ASCII digits from 0 to 9, representing a positive integer.
    A hexadecimal numeral consists of the leading ASCII characters 0x or 0X followed by one or more ASCII hexadecimal digits and can represent a positive, zero, or negative integer. Hexadecimal digits with values 10 through 15 are represented by the ASCII letters a through f or A through F, respectively; each letter used as a hexadecimal digit may be uppercase or lowercase.
    An octal numeral consists of an ASCII digit 0 followed by one or more of the ASCII digits 0 through 7 and can represent a positive, zero, or negative integer. Note that octal numerals always consist of two or more digits; 0 is always considered to be a decimal numeral, not that it matters much in practice, for the numerals 0, 00, and 0x0 all represent exactly the same integer value.
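One possible set of answers, written as Python regular expressions, is sketched below. These are illustrative solutions rather than an official answer key, and the names are hypothetical. They follow the quoted rules literally: ASCII letters only for C identifiers, and no binary (0b) literals or underscores in the Java literals, both of which were only added in Java 7.

```python
import re

# 1) C identifier: a letter or underscore, then letters, digits, or underscores.
#    Matching is case-sensitive, so "sum" and "Sum" are different identifiers.
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# 2) Java 6 integer literal, built from the numeral rules quoted above.
DECIMAL = r"0|[1-9][0-9]*"        # a lone 0, or no leading zero
HEX = r"0[xX][0-9a-fA-F]+"        # 0x/0X plus at least one hex digit
OCTAL = r"0[0-7]+"                # 0 plus at least one octal digit
JAVA_INT_LITERAL = re.compile(rf"(?:{HEX}|{OCTAL}|{DECIMAL})[lL]?")  # optional long suffix


def is_c_identifier(text: str) -> bool:
    return C_IDENTIFIER.fullmatch(text) is not None


def is_java_int_literal(text: str) -> bool:
    return JAVA_INT_LITERAL.fullmatch(text) is not None


print([s for s in ["0", "00", "08", "0x0", "123L", "0x"] if is_java_int_literal(s)])
```

Using `fullmatch` forces each alternative to account for the whole string, so "08" is rejected even though its first character alone would be a valid decimal numeral.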
  • CS414-2004S-01 Compiler Basics & Lexical Analysis 1 01-0: Syllabus
    CS414-2004S-01 Compiler Basics & Lexical Analysis
    01-0: Syllabus
    • Office Hours
    • Course Text
    • Prerequisites
    • Test Dates & Testing Policies
    • Projects
      • Teams of up to 2
    • Grading Policies
    • Questions?
    01-1: Notes on the Class
    • Don't be afraid to ask me to slow down!
    • We will cover some pretty complex stuff here, which can be difficult to get the first (or even the second) time. ASK QUESTIONS
    • While specific questions are always preferred, "I don't get it" is always an acceptable question. I am always happy to stop and re-explain a topic in a different way.
    • If you are confused, I can guarantee that at least 2 other people in the class would benefit from more explanation
    01-2: Notes on the Class
    • Projects are non-trivial
      • Using new tools (JavaCC)
      • Managing a large scale project
      • Lots of complex classes & advanced programming techniques
    • START EARLY!
      • Projects will take longer than you think (especially starting with the semantic analyzer project)
    • ASK QUESTIONS!
    01-4: What is a compiler?
    Simplified view: Source Program → Compiler → Machine code
    01-5: What is a compiler?
    More accurate view: Source File → Lexical Analyzer → Token Stream → Parser → Abstract Syntax Tree → Semantic Analyzer → Assembly Tree Generator → Abstract Assembly Tree → Code Generator → Assembly → Assembler → Relocatable Object Code → Linker (with Libraries) → Machine code
  • AAE 875 – Fundamentals of Object Oriented Programming and Data Analytics
    AAE 875 – Fundamentals of Object Oriented Programming and Data Analytics
    Cornelia Ilin, PhD, Department of Ag & Applied Economics, UW-Madison
    Week 1 - Summer 2019
    Programming languages - Types
    Low-level languages (created in the 1940s)
    • They are 'low' because they are very close to how the different hardware elements of a computer communicate with each other
    • They require extensive knowledge of computer hardware and its configuration
    Machine language ('machine code')
    • The only language directly understood by a computer; it does not need to be translated (by a compiler or interpreter, more on this later)
    • All instructions use binary notation and are written as strings of 1s and 0s, e.g. 011 1100001 001001 1100010
    • However, binary notation is very difficult to understand, so assembly language was developed to make machine language more readable by humans
    Assembly language
    • Consists of a set of symbols and letters
    • Requires an assembler to translate the assembly language to machine language
    • A second-generation language, because it no longer requires a set of 1s and 0s to write instructions, but uses terms like:
        Mul 97, #9, 98
        Add 96, #3, 92
        Div 92, #4, 97
    • Assemblers automatically translate assembly language instructions such as 'Mul 97, #9, 98' into machine code (011 1100001 001001 1100010)
    • Easier than machine language, but still difficult to understand, which led to the development of high-level languages
    High-level languages
    • Use English and mathematical symbols in their instructions
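The worked example above can be reproduced with a toy assembler. This Python sketch is hypothetical and not part of the course notes: the field widths (a 3-bit opcode, then 7, 6, and 7 bits for the three operands) are inferred from the single example 'Mul 97, #9, 98' → 011 1100001 001001 1100010, and only the Mul opcode (011) is known, so Add and Div are deliberately left out of the opcode table rather than guessed.

```python
# Toy assembler for the single worked example in the notes.
# Only Mul's opcode (011) is given; Add, Div, etc. would need their own
# (unknown) opcodes before they could be assembled.
OPCODES = {"Mul": "011"}

# Bit widths of the three operand fields, inferred from the one example
# (an assumption, not a documented instruction format).
OPERAND_WIDTHS = (7, 6, 7)


def assemble(line: str) -> str:
    """Translate e.g. 'Mul 97, #9, 98' into space-separated binary fields."""
    mnemonic, operand_text = line.split(maxsplit=1)
    # '#' marks an immediate value; this sketch simply strips it.
    operands = [op.strip().lstrip("#") for op in operand_text.split(",")]
    if len(operands) != len(OPERAND_WIDTHS):
        raise ValueError(f"expected {len(OPERAND_WIDTHS)} operands, got {len(operands)}")
    fields = [OPCODES[mnemonic]]  # KeyError for mnemonics with no known opcode
    for text, width in zip(operands, OPERAND_WIDTHS):
        value = int(text)
        if not 0 <= value < 2 ** width:
            raise ValueError(f"{value} does not fit in {width} bits")
        fields.append(format(value, f"0{width}b"))  # zero-padded binary
    return " ".join(fields)


print(assemble("Mul 97, #9, 98"))  # 011 1100001 001001 1100010
```

The sketch shows why assembly was a step up: the programmer writes decimal operands and a mnemonic, and the zero-padded binary fields are produced mechanically.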
  • XL C/C++: Language Reference (The Static Storage Class Specifier)
    IBM XL C/C++ for Blue Gene/Q, V12.1 Language Reference, Version 12.1 (GC14-7364-00)
    First edition. This edition applies to IBM XL C/C++ for Blue Gene/Q, V12.1 (Program 5799-AG1) and to all subsequent releases and modifications until otherwise indicated in new editions. © Copyright IBM Corporation 1998, 2012.
    Contents (excerpt):
    About this information: Who should read this information; How to use this information; How this information is organized; Conventions; Related information (IBM XL C/C++ information, Standards and specifications, Other IBM information, Other information); Technical support; How to send your comments
    Chapter 1. Scope and linkage: Scope ...
    The register storage class specifier; The __thread storage class specifier (IBM extension)
    Type specifiers: Integral types; Boolean types; Floating-point types; Character types; The void type; Vector types (IBM extension); User-defined types; The auto type specifier (C++0x); The decltype(expression) type specifier (C++0x)
    Compatibility of arithmetic types (C only)
    Type qualifiers: The __align type qualifier (IBM extension) ...
  • C Language O Level / A Level
    Programming and Problem Solving through C Language, O Level / A Level
    Chapter 3: Introduction to 'C' Language
    Constants and Literals
    Like a variable, a constant is a data storage location used by the user's program. Unlike a variable, the value stored in a constant can't be changed during program execution. C has two types of constants, each with its own specific uses:
    o Literal Constants
    o Symbolic Constants
    Literal Constants
    20 and 'R' are examples of literal constants:
        int count = 20;
        char name = 'R';
    Symbolic Constants
    A symbolic constant is a constant that is represented by a name (symbol) in the program.
    1. To define a symbolic constant with the #define directive:
        #define CONSTNAME literal
    For example:
        #define PI 3.14159
        Area = PI * (radius) * (radius);
    2. To define a symbolic constant with the const keyword:
        const double PI = 3.14159;
        Area = PI * (radius) * (radius);
    A complete example:
        #include <stdio.h>
        #define LENGTH 10
        #define WIDTH 5
        #define NEWLINE '\n'
        int main(void)
        {
            int area;
            area = LENGTH * WIDTH;          /* computed from the two symbolic constants */
            printf("value of area : %d", area);
            printf("%c", NEWLINE);
            return 0;
        }
    Literals
    The constants refer to fixed values that the program may not alter during its execution. These fixed values are also called literals. Constants can be of any of the basic data types, like
    o an integer constant,
    o a floating constant,
    o a character constant, or
    o a string literal.
    There are also enumeration constants.
    Integer literals
    An integer literal can be a decimal, octal, or hexadecimal constant. A prefix specifies the base or radix: 0x or 0X for hexadecimal, 0 for octal, and nothing for decimal.
  • Modern C++ Tutorial: C++11/14/17/20 on the Fly
    Modern C++ Tutorial: C++11/14/17/20 On the Fly
    Changkun Ou (hi[at]changkun.de)
    Last update: August 28, 2021
    Notice: the content in this PDF file may be outdated; please check our website or GitHub repository for the latest book updates.
    License: this work was written by Ou Changkun and licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. http://creativecommons.org/licenses/by-nc-nd/4.0/
    Contents
    Preface: Introduction; Targets; Purpose; Code; Exercises
    Chapter 01: Towards Modern C++: 1.1 Deprecated Features; 1.2 Compatibilities with C; Further Readings
    Chapter 02: Language Usability Enhancements: 2.1 Constants (nullptr, constexpr); 2.2 Variables and initialization (if-switch, Initializer list, Structured binding); 2.3 Type inference ...
  • CS 1622: Lexical Analysis
    CS 1622: Lexical Analysis
    Jonathan Misurda, 9/5/2012
    Lexical Analysis
    Problem: we want to break the input into meaningful units of information.
    Input: a string of characters
    Output: a set of partitions of the input string (tokens)
    Example: the program if(x==y) { z=1; } else { z=0; } reaches the lexer as the character string "if(x==y){\n\tz=1;\n} else {\n\tz=0;\n}"
    Tokens
    Token: a sequence of characters that can be treated as a single logical entity.
    Tokens in English: noun, verb, adjective, ...
    Tokens in a programming language: identifier, integer, keyword, whitespace, ...
    Tokens correspond to sets of strings:
    • Identifier: strings of letters and digits, starting with a letter
    • Integer: a non-empty string of digits
    • Keyword: "else", "if", "while", ...
    • Whitespace: a non-empty sequence of blanks, newlines, and tabs
    Why Tokens?
    We need to classify substrings of our source according to their role. Since a parser takes a list of tokens as input, the parser relies on token distinctions:
    • For example, a keyword is treated differently than an identifier
    Design of a Lexer
    1. Define a finite set of tokens
    • Describe all items of interest
    • Depend on the language and the design of the parser
    Recall "if(x==y){\n\tz=1;\n} else {\n\tz=0;\n}":
    • Keyword, identifier, integer, whitespace
    • Should "==" be one token or two tokens?
    Lexer Implementation
    An implementation must do two things:
    1. Recognize substrings corresponding to tokens
    2. Return the value or lexeme of the token
    A token is a tuple (type, lexeme):
    • Identifier: (id, 'x'), (id, 'y'), (id, 'z')
    • Keywords: if, else
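A lexer along the lines described above can be sketched in Python with one regular expression per token class. This is an illustration, not the course's implementation: the type names follow the slides where possible (keyword, id, int, whitespace), while op and punct are added assumptions needed to cover the example string. Keywords are recognized by first matching a whole identifier and then looking it up, so "iffy" stays one identifier, and because Python's `re` tries alternatives from left to right, "==" is listed before "=" so that it becomes a single token.

```python
import re
from typing import List, Tuple

KEYWORDS = {"if", "else", "while"}

# One regular expression per token class. Python's re tries alternatives
# left to right, so "==" is listed before "=" to keep it as one token.
TOKEN_SPEC = [
    ("whitespace", r"[ \t\n]+"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),  # keywords are split off afterwards
    ("int", r"[0-9]+"),
    ("op", r"==|="),
    ("punct", r"[(){};]"),
]
LEXER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Return (type, lexeme) tuples, discarding whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = LEXER.match(source, pos)
        if match is None:
            raise SyntaxError(f"no token matches {source[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        if kind == "id" and lexeme in KEYWORDS:
            kind = "keyword"  # the whole identifier must equal a keyword
        if kind != "whitespace":
            tokens.append((kind, lexeme))
        pos = match.end()
    return tokens


print(tokenize("if(x==y){\n\tz=1;\n} else {\n\tz=0;\n}"))
```

The whitespace class is still matched, since the lexer has to get past it, but it is never handed to the parser, which answers the slides' point that the parser only sees meaningful token distinctions.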
  • C++ Constants/Literals (tutorialspoint.com)
    C++ CONSTANTS/LITERALS
    http://www.tutorialspoint.com/cplusplus/cpp_constants_literals.htm
    Copyright © tutorialspoint.com
    Constants refer to fixed values that the program may not alter, and they are called literals. Constants can be of any of the basic data types and can be divided into Integer Numerals, Floating-Point Numerals, Characters, Strings and Boolean Values. Again, constants are treated just like regular variables except that their values cannot be modified after their definition.
    Integer literals:
    An integer literal can be a decimal, octal, or hexadecimal constant. A prefix specifies the base or radix: 0x or 0X for hexadecimal, 0 for octal, and nothing for decimal. An integer literal can also have a suffix that is a combination of U and L, for unsigned and long, respectively. The suffix can be uppercase or lowercase and can be in any order. Here are some examples of integer literals:
        212      // Legal
        215u     // Legal
        0xFeeL   // Legal
        078      // Illegal: 8 is not an octal digit
        032UU    // Illegal: cannot repeat a suffix
    Following are other examples of various types of integer literals:
        85       // decimal
        0213     // octal
        0x4b     // hexadecimal
        30       // int
        30u      // unsigned int
        30l      // long
        30ul     // unsigned long
    Floating-point literals:
    A floating-point literal has an integer part, a decimal point, a fractional part, and an exponent part. You can represent floating-point literals either in decimal form or in exponential form. A floating-point literal must include the decimal point, the exponent, or both, and it must include the integer part, the fractional part, or both.
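The integer-literal rules and the examples above can be checked with a short Python sketch. It is illustrative only, and `INT_LITERAL` and `classify` are hypothetical names: it handles the prefixes and the U/L suffixes described in the text (each letter at most once, either case, either order), but not later C++ additions such as the ll/LL suffix, binary 0b literals, or ' digit separators. It reports the type implied by the suffix alone, ignoring the rule that a literal too large for int gets a wider type, and it labels a lone 0 as decimal even though the C and C++ grammars technically classify it as octal.

```python
import re
from typing import Optional, Tuple

# Prefix rules: 0x/0X hexadecimal, leading 0 octal, otherwise decimal.
# Suffix rules: at most one U and one L, either case, either order.
INT_LITERAL = re.compile(
    r"(?P<digits>0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)"
    r"(?P<suffix>[uU][lL]?|[lL][uU]?)?"
)


def classify(text: str) -> Optional[Tuple[str, str]]:
    """Return (base, type implied by the suffix) for a legal literal, else None."""
    match = INT_LITERAL.fullmatch(text)
    if match is None:
        return None
    digits = match.group("digits")
    if digits[:2].lower() == "0x":
        base = "hexadecimal"
    elif len(digits) > 1 and digits.startswith("0"):
        base = "octal"
    else:
        base = "decimal"  # a lone 0 is reported as decimal here
    suffix = (match.group("suffix") or "").lower()
    type_name = ("unsigned " if "u" in suffix else "") + ("long" if "l" in suffix else "int")
    return base, type_name


for literal in ["212", "215u", "0xFeeL", "078", "032UU", "0213", "30ul"]:
    print(literal, classify(literal))
```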