Syntax (Pre Lecture)
Dr. Neil T. Dantam
CSCI-400, Colorado School of Mines
Spring 2021
Dantam (Mines CSCI-400) Syntax (Pre Lecture) Spring 2021 1 / 36 Introduction
Introduction
- Syntax: what programs we can write / what the language “looks like”
- Semantics: what these programs mean / what the language does (more later in the course)
- Concrete syntax: human-readable
- Abstract syntax: encoded for use by interpreter/compiler
- Formal language: mathematical basis to represent and analyze syntax

Outcomes
- Know basic definitions of formal language theory
- Understand parse trees and abstract syntax trees
- Design grammars for common programming language constructs

Phases of Interpretation/Compilation

Pipeline: text → Lexical Analysis → terminal sequence → Syntax Analysis → abstract syntax tree → Semantic Analysis → annotated syntax tree → Back-end → machine code

- Analysis: Front-end
  - Lexical: convert text to terminals, aka lexing, scanning
  - Syntax: convert terminals to syntax tree, aka parsing
  - Semantic: check or infer types, aka type checking, type inference
- Synthesis: Back-end
  - Compiler: construct machine code
  - Interpreter: execute the program

Phases of Analysis

- Text: “foo+bar*bif”
- Lexical Analysis produces the terminal sequence: [foo; +; bar; ∗; bif]
- Syntax Analysis produces the syntax tree:
    +
      foo
      ∗
        bar
        bif
- Semantic Analysis produces the annotated syntax tree:
    + : float
      foo : float
      ∗ : int
        bar : int
        bif : int
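
The lexical analysis step above can be sketched in a few lines of Python. This is only an illustrative scanner, not the course's implementation: the token pattern (identifiers plus the single-character operators `+` and `*`) and the function name `lex` are assumptions chosen to match the “foo+bar*bif” example.

```python
import re

# Assumed token set for this sketch: identifiers and the operators + and *.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[+*]")

def lex(text):
    """Convert program text into a list of terminals (tokens)."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace separates tokens but is not a token itself
            continue
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(match.group(0))
        pos = match.end()
    return tokens

print(lex("foo+bar*bif"))  # ['foo', '+', 'bar', '*', 'bif']
```

The scanner only groups characters into terminals; deciding that `bar*bif` binds tighter than `foo+...` is left to syntax analysis.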

Automatically Generating Code for Analysis

Compiler Compilers
- Describe terminals/syntax using formal language theory
  - Scanner: Regular Expressions
  - Parser: Grammar
- Automatically generate code

Example
- Scanner Generators: Lex / Flex, Ragel
- Parser Generators: YACC / Bison
- Combined: JavaCC, ANTLR

Diagram
- Lexical Analysis: Regular Expressions → Scanner Generator → Scanner
- Syntax Analysis: Grammar → Parser Generator → Parser

Outline
- Formal Language Theory
- Grammars
  - Definition
  - Grammars for the Functional Programs
  - Ambiguity and Precedence
- Abstract Syntax

Why use formal language?

Overview
- Some program text is “valid”
- And some is “invalid”
- Formal language lets us:
  - Precisely define the program text that is valid/invalid
  - Automatically recognize (parse) program text
  - (Also, profound implications on what computers can do (CSCI-561))

Example
- Valid
  - if true then false else true
  - 1 + 2 * 3
- Invalid
  - if true else then false true
  - 1 + * 3

Sets Review

Definition (Set)
An unordered collection of objects without repetition.

Notation
- S = {s0, s1, s2, ..., sn}
- Empty Set: {} = ∅
- Set Membership: x ∈ S (x in S), x ∉ S (x not in S)

Sequences

Definition (Sequence)
An ordered list of objects.

Example
(1, 2, 3, 5, 8, ...)

Definition (Tuple)
A sequence of finite length.
- k-tuple: a tuple of length k
- pair: a 2-tuple

Example
- 3-tuple: (2, 4, 8)
- pair: (a, b)

Strings

Definition (Symbol)
An abstract, primitive, atomic “thing”.

Example (Symbols)
0, 1, a, x, foo, bar, +, -, if, match

Definition (Alphabet)
A non-empty, finite set of symbols.

Example (Alphabets)
- ΣB = {0, 1}
- ΣE = {a, b, c, d}
- ΣC = {if, match, case, +, −}

Definition (String)
A sequence over some alphabet.

Example (Strings)
- ΓB = (1, 0, 1, 0, 1, 0)
- ΓE = (h, e, l, l, o)
- ΓC = (3, +, x)
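
As a small illustration of these definitions, an alphabet can be modeled as a Python set and a string as a tuple of symbols. The names `SIGMA_B` and `is_string_over` are hypothetical and exist only for this sketch.

```python
# Alphabet: a non-empty, finite set of symbols.
SIGMA_B = frozenset({"0", "1"})

def is_string_over(alphabet, sequence):
    """A string over an alphabet is a sequence whose elements all come from that alphabet."""
    return all(symbol in alphabet for symbol in sequence)

gamma_b = ("1", "0", "1", "0", "1", "0")
print(is_string_over(SIGMA_B, gamma_b))                    # True
print(is_string_over(SIGMA_B, ("h", "e", "l", "l", "o")))  # False
```

Note that a symbol need not be a single character: in an alphabet such as {if, match, case, +, −}, the keyword `if` is one atomic symbol.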

Formal Languages

Definition (Formal Language)
A formal language is a set of strings.

Representation
- How would you represent:
  - The language (set) of arithmetic expressions?
  - The language (set) of well-formed XML documents?
  - The language (set) of valid variable names in C?
  - The language (set) of C programs?
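
One possible answer for the simplest of these languages: an infinite language cannot be listed element by element, so it needs a finite description, and a regular expression is one such description. The sketch below assumes ASCII-only identifiers and ignores C keywords and universal character names; `C_IDENTIFIER` and `in_language` are illustrative names.

```python
import re

# Assumption: ASCII identifiers only; reserved words such as "int" are not excluded.
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def in_language(string):
    """Membership test: is the whole string a valid identifier under the assumptions above?"""
    return C_IDENTIFIER.fullmatch(string) is not None

print(in_language("foo_1"))  # True
print(in_language("1foo"))   # False
```

Regular expressions are enough for variable names, but nested constructs such as arithmetic expressions need the more expressive grammars introduced next.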

Overview of Grammars

Overview
- Programs are written as text
- There is a structure to the program
- Grammars represent this structure

Example (Conditional Expression)
if e1 then e2 else e3

A conditional consists of the following sequence:
1. the keyword “if”
2. an expression
3. the keyword “then”
4. an expression
5. the keyword “else”
6. an expression

Grammar
cond → “if” exp “then” exp “else” exp

Terminal and Nonterminal Symbols

Terminals and Nonterminals
- Terminals: The alphabet of the language. Atomic.
- Nonterminals: Decompose into multiple terminals and nonterminals. Non-atomic.

Example

Grammar
cond → “if” exp “then” exp “else” exp
exp → “true” | “false”

- Terminals: if, then, else, true, false
- Nonterminals: cond, exp

Context-Free Grammars

Definition
A context-free grammar G is the tuple G = (V, T, P, S), where:
- V is a finite set of nonterminals
- T is a finite set of terminals
- P is a finite set of productions of the form A → X1 ... Xn, where A ∈ V and each Xi ∈ V ∪ T
- S ∈ V is the start symbol

Example

Grammar
cond → “if” exp “then” exp “else” exp
exp → “true” | “false”

Elements
- V = {cond, exp}
- T = {if, then, else, true, false}
- P = {cond → “if” exp “then” exp “else” exp, exp → “true”, exp → “false”}
- S = cond
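
The tuple definition maps directly onto a data structure. The following is a minimal sketch, assuming productions are stored as (left-hand side, right-hand side) pairs; the `Grammar` class and `is_well_formed` are hypothetical names, and the disjointness check on V and T is a standard side condition that the slide leaves implicit.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    nonterminals: frozenset  # V
    terminals: frozenset     # T
    productions: tuple       # P, as (left-hand side, right-hand side) pairs
    start: str               # S

G = Grammar(
    nonterminals=frozenset({"cond", "exp"}),
    terminals=frozenset({"if", "then", "else", "true", "false"}),
    productions=(
        ("cond", ("if", "exp", "then", "exp", "else", "exp")),
        ("exp", ("true",)),
        ("exp", ("false",)),
    ),
    start="cond",
)

def is_well_formed(g):
    """Check the side conditions from the definition of a context-free grammar."""
    symbols = g.nonterminals | g.terminals
    return (
        g.start in g.nonterminals                       # S ∈ V
        and g.nonterminals.isdisjoint(g.terminals)      # assumed: V and T do not overlap
        and all(
            lhs in g.nonterminals and all(x in symbols for x in rhs)
            for lhs, rhs in g.productions               # A ∈ V, each Xi ∈ V ∪ T
        )
    )

print(is_well_formed(G))  # True
```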

What’s “context-free”?

v → x0 x1 ... xn
- Left-hand side: v, a single nonterminal with no surrounding symbols
- Right-hand side: x0 x1 ... xn

A nonterminal’s expansion is independent of the surrounding context.

What’s not “context-free”?

Non-context-free languages
- Most programming language syntax is context-free (or close)
- C/C++ are almost context-free
- In practice: integrate parsing and scanning to distinguish type and variable names

Counterexample (C/C++)

    /* Context:
     * Is x a type or variable?
     */
    x * y;       // declaration or
                 // multiplication?

    f((x) * y);  // multiplication or
                 // deref. and cast?

Backus-Naur Form (BNF)

Example

LaTeX
⟨cond⟩ → “if” ⟨exp⟩ “then” ⟨exp⟩ “else” ⟨exp⟩
⟨exp⟩ → “true” | “false”

Plain Text
<cond> ::= "if" <exp> "then" <exp> "else" <exp>
<exp> ::= "true" | "false"

What can we do with a grammar?

Definition (Generation)
Generation uses a grammar to produce a string in its language through a sequence of substitutions or rewrites called a derivation.

Definition (Recognition)
Recognition or parsing determines if an input string is in the language of a grammar. Equivalently, parsing determines whether a derivation exists from the grammar’s start symbol to the input string.

Derivation

Overview
- Input: Grammar
- Output: String of terminals in the grammar’s language
- Approach: Rewriting
  1. Begin with the start symbol
  2. Find a nonterminal in the current string and rewrite it with a right-hand side
  3. Repeat 2 until no nonterminals remain

Example

Grammar
⟨cond⟩ → “if” ⟨exp⟩ “then” ⟨exp⟩ “else” ⟨exp⟩
⟨exp⟩ → “true” | “false”

Derivation
- ⟨cond⟩
- “if” ⟨exp⟩ “then” ⟨exp⟩ “else” ⟨exp⟩
- “if” “true” “then” ⟨exp⟩ “else” ⟨exp⟩
- “if” “true” “then” “false” “else” ⟨exp⟩
- “if” “true” “then” “false” “else” “true”
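
The rewriting procedure can be sketched as a short loop. This version always rewrites the leftmost nonterminal (a leftmost derivation), which is one valid choice among several; the `PRODUCTIONS` table and the `choices` argument, which selects a right-hand side at each step, are assumptions of this sketch.

```python
PRODUCTIONS = {
    "cond": [["if", "exp", "then", "exp", "else", "exp"]],
    "exp": [["true"], ["false"]],
}

def derive(start, choices):
    """Leftmost derivation: return every intermediate string, ending with terminals only.

    `choices` gives, for each rewriting step, the index of the right-hand side to use.
    """
    current = [start]                 # Step 1: begin with the start symbol.
    steps = [list(current)]
    choices = iter(choices)
    while True:
        # Step 2: find the leftmost nonterminal in the current string.
        index = next((i for i, s in enumerate(current) if s in PRODUCTIONS), None)
        if index is None:
            return steps              # Step 3: stop when no nonterminals remain.
        rhs = PRODUCTIONS[current[index]][next(choices)]
        current = current[:index] + rhs + current[index + 1:]
        steps.append(list(current))

for step in derive("cond", [0, 0, 1, 0]):
    print(" ".join(step))
```

With the choices `[0, 0, 1, 0]`, the final line printed is `if true then false else true`, matching the derivation in the example.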

Parsing

Overview
- Input: Grammar, String
- Output: Is the string in the grammar’s language?
- Approach: Construct a derivation for the string, corresponding to a parse tree

Parse Tree

Parse Trees
- Leaves: Terminals
- Nodes: Nonterminals
- Edges: Productions

Grammar
cond → “if” exp “then” exp “else” exp
exp → “true” | “false”

Text
if true then false else true

Parse Tree
cond
  if
  exp
    true
  then
  exp
    false
  else
  exp
    true
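
For a grammar this small, a hand-written recursive-descent parser is enough to build the parse tree. The sketch below is one possible approach, not a general parsing algorithm: it assumes tokens are separated by whitespace, and it represents a tree node as a `(nonterminal, children)` pair, with terminals as plain strings.

```python
def expect(tokens, pos, keyword):
    """Consume one specific terminal, or fail."""
    if pos < len(tokens) and tokens[pos] == keyword:
        return pos + 1
    raise SyntaxError(f"expected {keyword!r} at token {pos}")

def parse_exp(tokens, pos):
    """exp → "true" | "false". Returns (parse tree, next position)."""
    if pos < len(tokens) and tokens[pos] in ("true", "false"):
        return ("exp", [tokens[pos]]), pos + 1
    raise SyntaxError(f"expected an expression at token {pos}")

def parse_cond(tokens):
    """cond → "if" exp "then" exp "else" exp. Returns the parse tree."""
    pos = expect(tokens, 0, "if")
    test, pos = parse_exp(tokens, pos)
    pos = expect(tokens, pos, "then")
    then_branch, pos = parse_exp(tokens, pos)
    pos = expect(tokens, pos, "else")
    else_branch, pos = parse_exp(tokens, pos)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return ("cond", ["if", test, "then", then_branch, "else", else_branch])

def recognize(text):
    """Recognition: is the text in the grammar's language?"""
    try:
        parse_cond(text.split())
        return True
    except SyntaxError:
        return False

print(parse_cond("if true then false else true".split()))
print(recognize("if true else then false true"))  # False
```

Each function mirrors one production, so the call structure of a successful parse has the same shape as the parse tree it returns.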

Example: Lambda Calculus Grammar

Input: (λa . a) b

Grammar
⟨exp⟩ → ⟨sym⟩
      | “λ” ⟨sym⟩ “.” ⟨exp⟩
      | ⟨exp⟩ ⟨exp⟩
      | “(” ⟨exp⟩ “)”
⟨sym⟩ → “a” | “b” | “c” | ...

Derivation
⟨exp⟩
⟨exp⟩ ⟨exp⟩
“(” ⟨exp⟩ “)” ⟨exp⟩
“(” “λ” ⟨sym⟩ “.” ⟨exp⟩ “)” ⟨exp⟩
“(” “λ” “a” “.” ⟨exp⟩ “)” ⟨exp⟩
“(” “λ” “a” “.” ⟨sym⟩ “)” ⟨exp⟩
“(” “λ” “a” “.” “a” “)” ⟨exp⟩
“(” “λ” “a” “.” “a” “)” ⟨sym⟩
“(” “λ” “a” “.” “a” “)” “b”

Parse Tree
⟨exp⟩
  ⟨exp⟩
    “(”
    ⟨exp⟩
      “λ”
      ⟨sym⟩
        “a”
      “.”
      ⟨exp⟩
        ⟨sym⟩
          “a”
    “)”
  ⟨exp⟩
    ⟨sym⟩
      “b”

Exercise: Lambda Calculus Grammar

Input: a λb . (b c)

Grammar
⟨exp⟩ → ⟨sym⟩
      | “λ” ⟨sym⟩ “.” ⟨exp⟩
      | ⟨exp⟩ ⟨exp⟩
      | “(” ⟨exp⟩ “)”
⟨sym⟩ → “a” | “b” | “c” | ...

Derivation
⟨exp⟩
⟨exp⟩ ⟨exp⟩
⟨sym⟩ ⟨exp⟩
“a” ⟨exp⟩
“a” “λ” ⟨sym⟩ “.” ⟨exp⟩
“a” “λ” “b” “.” ⟨exp⟩
“a” “λ” “b” “.” “(” ⟨exp⟩ “)”
“a” “λ” “b” “.” “(” ⟨exp⟩ ⟨exp⟩ “)”
“a” “λ” “b” “.” “(” ⟨sym⟩ ⟨exp⟩ “)”
“a” “λ” “b” “.” “(” “b” ⟨exp⟩ “)”
“a” “λ” “b” “.” “(” “b” ⟨sym⟩ “)”
“a” “λ” “b” “.” “(” “b” “c” “)”

Parse Tree
⟨exp⟩
  ⟨exp⟩
    ⟨sym⟩
      “a”
  ⟨exp⟩
    “λ”
    ⟨sym⟩
      “b”
    “.”
    ⟨exp⟩
      “(”
      ⟨exp⟩
        ⟨exp⟩
          ⟨sym⟩
            “b”
        ⟨exp⟩
          ⟨sym⟩
            “c”
      “)”

Exercise: Let Expression

Input: let x ← f y in (x z)

Grammar
⟨exp⟩ → ⟨sym⟩
      | “λ” ⟨sym⟩ “.” ⟨exp⟩
      | ⟨exp⟩ ⟨exp⟩
      | “(” ⟨exp⟩ “)”
      | “let” ⟨sym⟩ “←” ⟨exp⟩ “in” ⟨exp⟩
⟨sym⟩ → “a” | “b” | “c” | ...

Parse Tree
⟨exp⟩
  “let”
  ⟨sym⟩
    “x”
  “←”
  ⟨exp⟩
    ⟨exp⟩
      ⟨sym⟩
        “f”
    ⟨exp⟩
      ⟨sym⟩
        “y”
  “in”
  ⟨exp⟩
    “(”
    ⟨exp⟩
      ⟨exp⟩
        ⟨sym⟩
          “x”
      ⟨exp⟩
        ⟨sym⟩
          “z”
    “)”

Exercise: Arithmetic

Input: 1 + 2 ∗ 3

Grammar
⟨e⟩ → ⟨e⟩ “+” ⟨e⟩
    | ⟨e⟩ “∗” ⟨e⟩
    | “1” | “2” | “3” | ...

Parse Tree (first possibility)
⟨e⟩
  ⟨e⟩
    “1”
  “+”
  ⟨e⟩
    ⟨e⟩
      “2”
    “∗”
    ⟨e⟩
      “3”

Parse Tree (second possibility)
⟨e⟩
  ⟨e⟩
    ⟨e⟩
      “1”
    “+”
    ⟨e⟩
      “2”
  “∗”
  ⟨e⟩
    “3”

Ambiguous: multiple valid parse trees
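
The ambiguity can be checked mechanically by enumerating every parse tree for the input. The brute-force enumeration below is only a sketch for tiny inputs (its cost grows quickly with input length); trees are written as nested `(left, operator, right)` tuples, collapsing each ⟨e⟩ → number node into the number itself, and ASCII `*` stands in for ∗.

```python
def parse_trees(tokens):
    """Return every parse tree of `tokens` under e → e "+" e | e "*" e | number."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0].isdigit() else []
    trees = []
    for i, token in enumerate(tokens):
        if token in ("+", "*"):
            # Try this operator as the root, with all trees for each side.
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, token, right))
    return trees

def evaluate(tree):
    """Evaluate a tree, to show that the two parses mean different things."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) * evaluate(right)

for tree in parse_trees(["1", "+", "2", "*", "3"]):
    print(tree, "=", evaluate(tree))
```

The two trees evaluate to 7 and 9, which is why a grammar for a real language has to settle which one is intended.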

Handling Precedence: Modify Grammar

Input: 1 + 2 ∗ 3

Modify Grammar
⟨e⟩ → ⟨t⟩ | ⟨e⟩ “+” ⟨t⟩
⟨t⟩ → ⟨n⟩ | ⟨t⟩ “∗” ⟨n⟩
⟨n⟩ → “1” | “2” | “3” | ...

Parse Tree
⟨e⟩
  ⟨e⟩
    ⟨t⟩
      ⟨n⟩
        “1”
  “+”
  ⟨t⟩
    ⟨t⟩
      ⟨n⟩
        “2”
    “∗”
    ⟨n⟩
      “3”
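
The stratified grammar translates into one parsing function per nonterminal. Because plain recursive descent cannot follow left-recursive productions such as ⟨e⟩ → ⟨e⟩ “+” ⟨t⟩ directly, this sketch rewrites each one as a loop (⟨t⟩ followed by zero or more “+” ⟨t⟩), which accepts the same strings and keeps left associativity. It returns a simplified operator tree rather than the full parse tree, and uses ASCII `*` for ∗; all function names are illustrative.

```python
def parse_n(tokens, pos):
    """n → number."""
    if pos < len(tokens) and tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    raise SyntaxError(f"expected a number at token {pos}")

def parse_t(tokens, pos):
    """t → n | t "*" n, written as the loop: n ("*" n)*."""
    tree, pos = parse_n(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_n(tokens, pos + 1)
        tree = ("*", tree, right)
    return tree, pos

def parse_e(tokens, pos=0):
    """e → t | e "+" t, written as the loop: t ("+" t)*."""
    tree, pos = parse_t(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_t(tokens, pos + 1)
        tree = ("+", tree, right)
    return tree, pos

def parse(tokens):
    tree, pos = parse_e(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree

print(parse(["1", "+", "2", "*", "3"]))  # ('+', 1, ('*', 2, 3))
print(parse(["1", "*", "2", "+", "3"]))  # ('+', ('*', 1, 2), 3)
```

Multiplication ends up deeper in the tree in both cases because ⟨t⟩ can only appear below ⟨e⟩, never the other way around.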

Handling Precedence: Parser-specific directives

YACC/Bison Grammar

    expr: expr '+' expr
        | expr '-' expr
        | expr '*' expr
        | expr '/' expr
        | num
        ;

YACC/Bison Precedence

    %left '+' '-'
    %left '*' '/'

Directs (some) parsing algorithms to resolve ambiguity. Operators declared on later lines have higher precedence, so '*' and '/' bind tighter than '+' and '-'; %left makes each group left-associative.

Abstract Syntax

Overview
- Data structure encoding the program for compiler or interpreter
- Some details addressed in the parser, and omitted from the abstract syntax
  - Ambiguity / precedence
  - Keyword and operator strings
- Abstract Syntax Tree (AST): Use algebraic data types

Conditional Type

    type Exp ←
      | TrueExp
      | FalseExp
      | CondExp of Exp × Exp × Exp

Example
if true then false else true

    CondExp
      TrueExp
      FalseExp
      TrueExp
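
The conditional type can be approximated in Python with one small class per constructor. This is a sketch using `dataclasses` in place of a true algebraic data type; the field names and the `evaluate` function are assumptions added to show how an interpreter might consume the tree.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrueExp:
    pass

@dataclass(frozen=True)
class FalseExp:
    pass

@dataclass(frozen=True)
class CondExp:
    test: object         # an Exp
    then_branch: object  # an Exp
    else_branch: object  # an Exp

def evaluate(exp):
    """Interpret the AST; no keywords or token strings are needed at this stage."""
    if isinstance(exp, TrueExp):
        return True
    if isinstance(exp, FalseExp):
        return False
    if isinstance(exp, CondExp):
        if evaluate(exp.test):
            return evaluate(exp.then_branch)
        return evaluate(exp.else_branch)
    raise TypeError(f"not an expression: {exp!r}")

# AST for: if true then false else true
ast = CondExp(TrueExp(), FalseExp(), TrueExp())
print(evaluate(ast))  # False
```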

Abstract Syntax Tree vs. Parse Tree

Parse Tree
Directly corresponds to concrete syntax and grammar.

    ⟨e⟩
      ⟨e⟩
        ⟨t⟩
          ⟨n⟩
            “1”
      “+”
      ⟨t⟩
        ⟨t⟩
          ⟨n⟩
            “2”
        “∗”
        ⟨n⟩
          “3”

Abstract Syntax Tree (AST)
Omits precedence, parentheses, keywords, etc.

    type Exp ←
      | NumExp of int
      | AddExp of Exp × Exp
      | MulExp of Exp × Exp

    AddExp
      NumExp 1
      MulExp
        NumExp 2
        NumExp 3
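
The relationship between the two trees can be made concrete by converting one into the other. In this sketch a parse tree node is a `(nonterminal, children)` pair with terminals as plain strings, ASCII `*` stands in for ∗, and `to_ast` is a hypothetical helper; single-child nodes (the ⟨e⟩ → ⟨t⟩ and ⟨t⟩ → ⟨n⟩ steps) exist only to encode precedence, so the conversion drops them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NumExp:
    value: int

@dataclass(frozen=True)
class AddExp:
    left: object   # an Exp
    right: object  # an Exp

@dataclass(frozen=True)
class MulExp:
    left: object   # an Exp
    right: object  # an Exp

def to_ast(tree):
    """Convert a parse tree for the e/t/n grammar into an AST."""
    label, children = tree
    if label == "n":
        return NumExp(int(children[0]))
    if len(children) == 1:
        return to_ast(children[0])      # e → t or t → n: no AST node of its own
    left, op, right = children
    if op == "+":
        return AddExp(to_ast(left), to_ast(right))
    return MulExp(to_ast(left), to_ast(right))

# Parse tree for 1 + 2 * 3 under the precedence grammar.
parse_tree = ("e", [
    ("e", [("t", [("n", ["1"])])]),
    "+",
    ("t", [("t", [("n", ["2"])]), "*", ("n", ["3"])]),
])

expected = AddExp(NumExp(1), MulExp(NumExp(2), NumExp(3)))
print(to_ast(parse_tree) == expected)  # True
```

The parse tree has twelve nodes and the AST five: everything that was only there to guide parsing has disappeared.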

Exercise: Lambda Calculus Abstract Syntax

Input: a λb . (b c)

Data Type

    type Exp ←
      | SymExp of string
      | LambdaExp of string × Exp
      | CallExp of Exp × Exp

AST

    CallExp
      SymExp a
      LambdaExp b
        CallExp
          SymExp b
          SymExp c
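
As a sketch of the same data type in Python, the classes below mirror the three constructors, and a hypothetical `show` function prints an AST back out as fully parenthesized concrete syntax. Full parenthesization is an assumption made to keep the printer simple; it produces more parentheses than the original input but describes the same structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SymExp:
    name: str

@dataclass(frozen=True)
class LambdaExp:
    param: str
    body: object      # an Exp

@dataclass(frozen=True)
class CallExp:
    function: object  # an Exp
    argument: object  # an Exp

def show(exp):
    """Print an AST as fully parenthesized concrete syntax."""
    if isinstance(exp, SymExp):
        return exp.name
    if isinstance(exp, LambdaExp):
        return f"(λ{exp.param} . {show(exp.body)})"
    if isinstance(exp, CallExp):
        return f"({show(exp.function)} {show(exp.argument)})"
    raise TypeError(f"not an expression: {exp!r}")

# AST for: a λb . (b c)
ast = CallExp(SymExp("a"), LambdaExp("b", CallExp(SymExp("b"), SymExp("c"))))
print(show(ast))  # (a (λb . (b c)))
```

The AST stores no parentheses at all; `show` has to reintroduce them, which illustrates that grouping is a concern of the concrete syntax only.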

Summary
- Formal languages: underlying theory for lexical and syntax analysis
- Grammars: representation for the concrete programming language syntax
- Abstract Syntax: the important structure of the program