Syntax (Pre Lecture)
Dr. Neil T. Dantam
CSCI-400, Colorado School of Mines
Spring 2021

Dantam (Mines CSCI-400) Syntax (Pre Lecture) Spring 2021

Introduction

- Syntax: what programs we can write / what the language "looks like"
- Semantics: what these programs mean / what the language does (more later in the course)
- Concrete syntax: human-readable
- Abstract syntax: encoded for use by the interpreter/compiler
- Formal language: mathematical basis to represent and analyze syntax

Outcomes

- Know basic definitions of formal language theory
- Understand parse trees and abstract syntax trees
- Design grammars for common programming language constructs

Phases of Interpretation/Compilation

Analysis (front-end):
- Lexical: convert text to terminals (aka lexing, scanning)
- Syntax: convert terminals to a syntax tree (aka parsing)
- Semantic: check or infer types (aka type checking, type inference)

Synthesis (back-end):
- Compiler: construct machine code
- Interpreter: execute the program

Pipeline: text → Lexical Analysis → terminal sequence → Syntax Analysis → abstract syntax tree → Semantic Analysis → annotated syntax tree → Back-end → machine code

Phases of Analysis

Text: "foo+bar*bif"

Lexical Analysis: [foo, +, bar, *, bif]

Syntax Analysis:
    +
    ├── foo
    └── *
        ├── bar
        └── bif

Semantic Analysis:
    + : float
    ├── foo : float
    └── * : int
        ├── bar : int
        └── bif : int

Automatically Generating Code for Analysis

Compiler Compilers
- Describe terminals/syntax using formal language theory
  - Scanner: regular expressions
  - Parser: grammar
- Automatically generate code
  - Lexical analysis: Regular Expressions → Scanner Generator → Scanner
  - Syntax analysis: Grammar → Parser Generator → Parser

Example
- Scanner generators: Lex / Flex, Ragel
- Parser generators: YACC / Bison
- Combined: JavaCC, ANTLR
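The first two analysis phases can be sketched by hand for the "foo+bar*bif" example. This is only an illustration, not generated scanner/parser code: the token pattern, the function names `lex` and `parse`, and the assumption that `*` binds tighter than `+` are all choices made for this sketch.

```python
import re

OPERATORS = {"+", "*"}

# Hypothetical token classes for this sketch: identifiers and two operators,
# each described by a regular expression.
TOKEN_RE = re.compile(r"\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<op>[+*]))")


def lex(text):
    """Lexical analysis: convert program text to a sequence of terminals."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group("ident") or m.group("op"))
        pos = m.end()
    return tokens


def parse(tokens):
    """Syntax analysis: build a tree as nested tuples (operator, left, right).
    Assumes * has higher precedence than + and both associate to the left."""
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in OPERATORS:
            raise SyntaxError("expected an identifier")
        tok = tokens[pos]
        pos += 1
        return tok

    def product():
        nonlocal pos
        left = operand()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            left = ("*", left, operand())
        return left

    def total():
        nonlocal pos
        left = product()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            left = ("+", left, product())
        return left

    tree = total()
    if pos != len(tokens):
        raise SyntaxError("unexpected token after expression")
    return tree


print(lex("foo+bar*bif"))         # ['foo', '+', 'bar', '*', 'bif']
print(parse(lex("foo+bar*bif")))  # ('+', 'foo', ('*', 'bar', 'bif'))
```

The nested tuple mirrors the syntax tree on the slide, with `+` at the root and `*` as its right child. Semantic analysis (type annotation) is left out of the sketch.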
Outline

- Formal Language Theory
- Grammars
- Definition
- Grammars for the Functional Programs
- Ambiguity and Precedence
- Abstract Syntax

Why use formal language?

Overview
- Some program text is "valid"
- And some is "invalid"
- Formal language lets us:
  - Precisely define the program text that is valid/invalid
  - Automatically recognize (parse) program text
  - (Also, profound implications on what computers can do (CSCI-561))

Example
- Valid
  - if true then false else true
  - 1 + 2 * 3
- Invalid
  - if true else then false true
  - 1 + * 3

Sets Review

Definition (Set): An unordered collection of objects without repetition.

Notation
- S = {s₀, s₁, s₂, …, sₙ}
- Empty set: {} = ∅
- Set membership: x ∈ S (x in S), x ∉ S (x not in S)

Sequences

Definition (Sequence): An ordered list of objects.
Example: (1, 2, 3, 5, 8, …)

Definition (Tuple): A sequence of finite length.
- k-tuple: a tuple of length k
- pair: a 2-tuple

Example
- 3-tuple: (2, 4, 8)
- pair: (a, b)

Strings

Definition (Symbol): An abstract, primitive, atomic "thing".
Example (Symbols): 0, 1, a, x, foo, bar, +, -, if, match

Definition (Alphabet): A non-empty, finite set of symbols.
Example (Alphabets)
- Σ_B = {0, 1}
- Σ_E = {a, b, c, d}
- Σ_C = {if, match, case, +, −}

Definition (String): A sequence over some alphabet.
Example (Strings)
- Γ_B = (1, 0, 1, 0, 1, 0)
- Γ_E = (h, e, l, l, o)
- Γ_C = (3, +, x)

Formal Languages

Definition (Formal Language): A formal language is a set of strings.
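The definitions of alphabet, string, and formal language map directly onto sets and tuples. The sketch below uses the binary alphabet Σ_B and the string Γ_B from the examples above. The names `is_string_over`, `SHORT_BINARY`, and `in_even_zeros`, and the two example languages, are hypothetical and chosen only for illustration.

```python
# Alphabet: a non-empty, finite set of symbols (Sigma_B from the slides).
SIGMA_B = {"0", "1"}

# String: a sequence over an alphabet, written here as a tuple of symbols.
GAMMA_B = ("1", "0", "1", "0", "1", "0")


def is_string_over(seq, alphabet):
    """True if every element of the sequence is a symbol of the alphabet."""
    return all(sym in alphabet for sym in seq)


# A finite formal language can be written out as a set of strings.
# Hypothetical example: four short binary strings, including the empty string.
SHORT_BINARY = {(), ("0",), ("1",), ("0", "1")}


# An infinite language cannot be listed, so one option is a membership test.
# Hypothetical example: binary strings with an even number of 0 symbols.
def in_even_zeros(seq):
    return is_string_over(seq, SIGMA_B) and seq.count("0") % 2 == 0


print(is_string_over(GAMMA_B, SIGMA_B))  # True
print(("0", "1") in SHORT_BINARY)        # True
print(in_even_zeros(GAMMA_B))            # False (three 0 symbols)
```

Listing strings only works for finite languages, and an ad hoc membership function says little about structure. Grammars, introduced next, are a more systematic way to describe such sets.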
Representation
- How would you represent:
  - The language (set) of arithmetic expressions?
  - The language (set) of well-formed XML documents?
  - The language (set) of valid variable names in C?
  - The language (set) of C programs?

Overview of Grammars

Overview
- Programs are written as text
- There is a structure to the program
- Grammars represent this structure

Example (Conditional Expression)
    if e1 then e2 else e3

A conditional consists of the following sequence:
1. the keyword "if"
2. an expression
3. the keyword "then"
4. an expression
5. the keyword "else"
6. an expression

Grammar
    cond → "if" exp "then" exp "else" exp

Terminal and Nonterminal Symbols

Terminals and Nonterminals
- Terminals: the alphabet of the language. Atomic.
- Nonterminals: decompose into multiple terminals and nonterminals. Non-atomic.

Example
Grammar
    cond → "if" exp "then" exp "else" exp
    exp → "true" | "false"
- Terminals: if, then, else, true, false
- Nonterminals: cond, exp

Context-Free Grammars

Definition: A context-free grammar G is the tuple G = (V, T, P, S), where:
- V is a finite set of nonterminals
- T is a finite set of terminals
- P is a finite set of productions of the form A → X₁ … Xₙ, where A ∈ V and each Xᵢ ∈ V ∪ T
- S ∈ V is the start symbol

Example
Grammar
    cond → "if" exp "then" exp "else" exp
    exp → "true" | "false"
Elements
- V = {cond, exp}
- T = {if, then, else, true, false}
- P = {cond → "if" exp "then" exp "else" exp, exp → "true", exp → "false"}
- S = cond

What's "context-free?"

    v → x0 x1 … xn

- Left-hand side: v, a single nonterminal with no surrounding symbols
- Right-hand side: x0 x1 … xn

A nonterminal's expansion is independent of its surrounding context.

What's not "context-free?"

Non-context-free languages
- Most programming language syntax is context-free (or close)
- C/C++ are almost context-free
- In practice: integrate parsing and scanning to distinguish type and variable names

Counterexample (C/C++)
    /* Context: Is x a type or variable? */
    x * y;        // declaration or multiplication?
    f((x) * y);   // multiplication or deref. and cast?

Backus-Naur Form (BNF)

Example
LaTeX
    ⟨cond⟩ → "if" ⟨exp⟩ "then" ⟨exp⟩ "else" ⟨exp⟩
    ⟨exp⟩ → "true" | "false"
Plain Text
    <cond> ::= "if" <exp> "then" <exp> "else" <exp>
    <exp> ::= "true" | "false"

What can we do with a grammar?

Definition (Generation): Generation uses a grammar to produce a string in its language through a sequence of substitutions or rewrites called a derivation.

Definition (Recognition): Recognition or parsing determines if an input string is in the language of a grammar. Equivalently, parsing determines whether a derivation exists from the grammar's start symbol to the input string.

Derivation Overview

- Input: Grammar
- Output: String of terminals in the grammar's language
- Approach: Rewriting
  1. Begin with the start symbol
  2. Find a nonterminal in the current string and rewrite it with a right-hand side of one of its productions
  3. Repeat step 2 until no nonterminals remain

Example
Grammar
    ⟨cond⟩ → "if" ⟨exp⟩ "then" ⟨exp⟩ "else" ⟨exp⟩
    ⟨exp⟩ → "true" | "false"
Derivation
    ⟨cond⟩
    "if" ⟨exp⟩ "then" ⟨exp⟩ "else" ⟨exp⟩
    "if" "true" "then" ⟨exp⟩ "else" ⟨exp⟩
    "if" "true" "then" "false" "else" ⟨exp⟩
    "if" "true" "then" "false" "else" "true"

Parsing Overview

- Input: Grammar, String
- Output: Is the string in the grammar's language?
- Approach: Construct a derivation for the string, corresponding to a parse tree

Parse Tree

Parse Trees
- Leaves: terminals
- Interior nodes: nonterminals
- Edges: productions

Grammar
    cond → "if" exp "then" exp "else" exp
    exp → "true" | "false"

Text
    if true then false else true

Parse Tree
    cond
    ├── "if"
    ├── exp
    │   └── "true"
    ├── "then"
    ├── exp
    │   └── "false"
    ├── "else"
    └── exp
        └── "true"

Example: Lambda Calculus Grammar

(λa . a) b

Grammar
    ⟨exp⟩ → ⟨sym⟩
          | "λ" ⟨sym⟩ "." ⟨exp⟩
          | ⟨exp⟩ ⟨exp⟩
          | "(" ⟨exp⟩ ")"
    ⟨sym⟩ → "a" | "b" | "c" | …

Parse Tree
    ⟨exp⟩
    ├── ⟨exp⟩
    │   ├── "("
    │   ├── ⟨exp⟩
    │   │   ├── "λ"
    │   │   ├── ⟨sym⟩
    │   │   │   └── "a"
    │   │   ├── "."
    │   │   └── ⟨exp⟩
    │   │       └── ⟨sym⟩
    │   │           └── "a"
    │   └── ")"
    └── ⟨exp⟩
        └── ⟨sym⟩
            └── "b"

Example: Lambda Calculus Grammar (continued)

(λa . a) b

Derivation
    ⟨exp⟩
    ⟨exp⟩ ⟨exp⟩
    "(" ⟨exp⟩ ")" ⟨exp⟩
    "(" "λ" ⟨sym⟩ "." ⟨exp⟩ ")" ⟨exp⟩
    "(" "λ" "a" "." ⟨exp⟩ ")" ⟨exp⟩
    "(" "λ" "a" "." ⟨sym⟩ ")" ⟨exp⟩
    "(" "λ" "a" "." "a" ")" ⟨exp⟩
    "(" "λ" "a" "." "a" ")" ⟨sym⟩
    "(" "λ" "a" "." "a" ")" "b"

Exercise: Lambda Calculus Grammar

a λb . (b c)

Parse Tree
    ⟨exp⟩
    ├── ⟨exp⟩
    │   └── ⟨sym⟩
    │       └── "a"
    └── ⟨exp⟩
        ├── "λ"
        ├── ⟨sym⟩
        │   └── "b"
        ├── "."
        └── ⟨exp⟩
            ├── "("
            ├── ⟨exp⟩
            │   ├── ⟨exp⟩
            │   │   └── ⟨sym⟩
            │   │       └── "b"
            │   └── ⟨exp⟩
            │       └── ⟨sym⟩
            │           └── "c"
            └── ")"

Grammar
    ⟨exp⟩ → ⟨sym⟩ | "λ" ⟨sym⟩ "." ⟨exp⟩ | ⟨exp⟩ ⟨exp⟩
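The tuple definition G = (V, T, P, S) and the rewriting procedure from the derivation slides can be sketched in a few lines, using the conditional grammar from the earlier examples. The names `Grammar`, `COND`, `derive`, `language`, and `recognize` are hypothetical. The recognizer is a brute-force enumeration that only terminates because this grammar's language is finite; it is not a practical parsing algorithm.

```python
from collections import namedtuple

# G = (V, T, P, S): nonterminals, terminals, productions, start symbol.
Grammar = namedtuple("Grammar", ["V", "T", "P", "S"])

COND = Grammar(
    V={"cond", "exp"},
    T={"if", "then", "else", "true", "false"},
    P=[
        ("cond", ("if", "exp", "then", "exp", "else", "exp")),
        ("exp", ("true",)),
        ("exp", ("false",)),
    ],
    S="cond",
)


def leftmost_nonterminal(grammar, string):
    """Index of the first nonterminal in the string, or None if none remain."""
    return next((i for i, sym in enumerate(string) if sym in grammar.V), None)


def derive(grammar, picks):
    """Generation: start from S and rewrite the leftmost nonterminal, where
    picks[k] selects which of its productions to apply at step k.
    Returns every intermediate string; stops early if picks run out."""
    current = (grammar.S,)
    steps = [current]
    for pick in picks:
        idx = leftmost_nonterminal(grammar, current)
        if idx is None:
            break
        options = [rhs for lhs, rhs in grammar.P if lhs == current[idx]]
        current = current[:idx] + options[pick] + current[idx + 1:]
        steps.append(current)
    return steps


def language(grammar):
    """Enumerate all strings of the grammar. Assumes the language is finite;
    a recursive grammar would make this loop forever."""
    done, todo = set(), [(grammar.S,)]
    while todo:
        current = todo.pop()
        idx = leftmost_nonterminal(grammar, current)
        if idx is None:
            done.add(current)
            continue
        for lhs, rhs in grammar.P:
            if lhs == current[idx]:
                todo.append(current[:idx] + rhs + current[idx + 1:])
    return done


def recognize(grammar, tokens):
    """Recognition: is the token sequence derivable from the start symbol?"""
    return tuple(tokens) in language(grammar)


steps = derive(COND, [0, 0, 1, 0])
print(" ".join(steps[-1]))  # if true then false else true
print(recognize(COND, "if true else then false true".split()))  # False
```

The `picks` list [0, 0, 1, 0] reproduces the conditional derivation shown earlier, one rewrite per step. The lambda calculus grammar is recursive, so its language is infinite and `language` would not terminate on it; recognizing such languages needs an actual parsing algorithm.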