Syntax (Pre Lecture)

Dr. Neil T. Dantam

CSCI-400, Colorado School of Mines

Spring 2021

Introduction

Introduction
- Syntax: what programs we can write / what the language “looks like”
- Semantics: what these programs mean / what the language does (more later in the course)
- Concrete syntax: human-readable
- Abstract syntax: encoded for use by the interpreter or compiler
- Formal language: mathematical basis to represent and analyze syntax

Outcomes
- Know basic definitions of formal language theory
- Understand parse trees and abstract syntax trees
- Design grammars for common programming language constructs

Phases of Interpretation/Compilation

- Analysis: Front-end
  - Lexical: convert text to terminals, aka lexing, scanning
  - Syntax: convert terminals to syntax tree, aka parsing
  - Semantic: check or infer types, aka type checking, type inference
- Synthesis: Back-end
  - Compiler: construct machine code
  - Interpreter: execute the program

Pipeline: text → Lexical Analysis → terminal sequence → Syntax Analysis → abstract syntax tree → Semantic Analysis → annotated syntax tree → Back-end → machine code

Phases of Analysis

Text: "foo+bar*bif"

Lexical Analysis produces the terminal sequence:

  [foo; +; bar; ∗; bif]

Syntax Analysis produces the syntax tree:

  +
    foo
    ∗
      bar
      bif

Semantic Analysis produces the annotated syntax tree:

  + : float
    foo : float
    ∗ : int
      bar : int
      bif : int
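The lexical analysis step can be sketched in a few lines of Python. This is only an illustration under assumptions not in the slides: the names `TOKEN` and `lex` are hypothetical, symbols are taken to be identifier-like words, and the only operators are `+` and `*`.

```python
import re

# Assumed token shapes: an identifier-like symbol, or one of the operators + and *.
TOKEN = re.compile(r"\s*(?:(?P<sym>[A-Za-z_]\w*)|(?P<op>[+*]))")

def lex(text):
    """Convert program text into a list of terminals (lexing/scanning)."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group("sym") or match.group("op"))
        pos = match.end()
    return tokens

print(lex("foo+bar*bif"))  # ['foo', '+', 'bar', '*', 'bif']
```

A scanner generator produces code of roughly this kind from the regular expressions, so it does not have to be written by hand.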

Automatically Generating Code for Analysis

- Describe terminals/syntax using formal language theory
  - Scanner: Regular Expressions
  - Parser: Grammar
- Automatically generate code

Example
- Scanner Generators: Flex, Ragel
- Parser Generators: YACC / Bison
- Combined: JavaCC, ANTLR

Diagram (inside the compiler):
- Lexical Analysis: Regular Expressions → Scanner Generator → Scanner
- Syntax Analysis: Grammar → Parser Generator → Parser

Outline

- Formal Language Theory
- Grammars
  - Definition
  - Grammars for the Functional Programs
  - Ambiguity and Precedence
- Abstract Syntax

Formal Language Theory / Why use formal language?

Overview
- Some program text is “valid”
- And some is “invalid”
- Formal language lets us:
  - Precisely define the program text that is valid/invalid
  - Automatically recognize (parse) program text
  - (Also, profound implications on what computers can do (CSCI-561))

Example

Valid
- if true then false else true
- 1 + 2 * 3

Invalid
- if true else then false true
- 1 + * 3

Formal Language Theory / Sets Review

Definition (Set)
An unordered collection of objects without repetition.

Notation
- S = {s_0, s_1, s_2, ..., s_n}
- Empty Set: {} = ∅
- Set Membership: x ∈ S (x in S), x ∉ S (x not in S)

Formal Language Theory / Sequences

Definition (Sequence)
An ordered list of objects.

Example
(1, 2, 3, 5, 8, ...)

Definition (Tuple)
A sequence of finite length.
- k-tuple: a tuple of length k
- pair: a 2-tuple

Example
- 3-tuple: (2, 4, 8)
- pair (2-tuple): (a, b)

Formal Language Theory / Strings

Definition (Symbol)
An abstract, primitive, atomic “thing”.

Example (Symbols)
0, 1, a, x, foo, bar, +, -, if, match

Definition (Alphabet)
A non-empty, finite set of symbols.

Example (Alphabets)
- Σ_B = {0, 1}
- Σ_E = {a, b, ...}
- Σ_C = {if, match, case, +, −}

Definition (String)
A sequence over some alphabet.

Example (Strings)
- Γ_B = (1, 0, 1, 0, 1, 0)
- Γ_E = (h, e, l, l, o)
- Γ_C = (3, +, x)

Formal Language Theory / Formal Languages

Definition (Formal Language)
A formal language is a set of strings.

Representation
How would you represent:
- The language (set) of arithmetic expressions?
- The language (set) of well-formed XML documents?
- The language (set) of valid variable names in C?
- The language (set) of C programs?
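As one possible answer for the third question, an infinite language can be represented by a membership test instead of an explicit set. The sketch below uses a regular expression for C variable names; it assumes the basic rule (a letter or underscore followed by letters, digits, or underscores) and deliberately ignores reserved keywords and compiler-specific extensions. The names `C_IDENTIFIER` and `in_language` are hypothetical.

```python
import re

# Assumption: letter or underscore first, then letters, digits, or underscores.
# Reserved keywords such as "if" are NOT excluded by this simplified pattern.
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def in_language(string):
    """Return True if the string belongs to the (simplified) language."""
    return C_IDENTIFIER.fullmatch(string) is not None

# A finite language can simply be written out as a set of strings.
BOOLEAN_LITERALS = {"true", "false"}

print(in_language("foo_1"))           # True
print(in_language("1foo"))            # False
print("true" in BOOLEAN_LITERALS)     # True
```

Regular expressions are enough for variable names; nested structure such as arithmetic expressions or whole C programs calls for grammars.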


Grammars / Definition / Overview of Grammars

Overview
- Programs are written as text
- There is a structure to the program
- Grammars represent this structure

Example (Conditional Expression)

  if e1 then e2 else e3

A conditional consists of the following sequence:
1. the keyword “if”
2. an expression
3. the keyword “then”
4. an expression
5. the keyword “else”
6. an expression

Grammar

  cond → “if” exp “then” exp “else” exp

Grammars / Definition / Terminal and Nonterminal Symbols

Terminals and Nonterminals
- Terminals: The alphabet of the language. Atomic.
- Nonterminals: Decompose into multiple terminals and nonterminals. Non-atomic.

Example

Grammar

  cond → “if” exp “then” exp “else” exp
  exp → “true” | “false”

- Terminals: if, then, else, true, false
- Nonterminals: cond, exp

Grammars / Definition / Context-Free Grammars

Definition
A context-free grammar G is the tuple G = (V, T, P, S), where:
- V is a finite set of nonterminals
- T is a finite set of terminals
- P is a finite set of productions of the form v → X_1 ... X_n, where v ∈ V and each X_i ∈ V ∪ T
- S ∈ V is the start symbol

Example

Grammar

  cond → “if” exp “then” exp “else” exp
  exp → “true” | “false”

Elements
- V = {cond, exp}
- T = {if, then, else, true, false}
- P = {cond → “if” exp “then” exp “else” exp, exp → “true”, exp → “false”}
- S = cond
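The four-tuple definition maps directly onto a small data structure. The following Python sketch is one possible encoding, not a standard API: the `Grammar` type, its field names, and `is_well_formed` are hypothetical, and the check that V and T are disjoint is a common convention that the definition above does not state explicitly.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    nonterminals: frozenset  # V
    terminals: frozenset     # T
    productions: tuple       # P, as (head, body) pairs
    start: str               # S

def is_well_formed(g):
    """Check the side conditions of G = (V, T, P, S)."""
    symbols = g.nonterminals | g.terminals
    return (
        g.start in g.nonterminals                      # S is a nonterminal
        and g.nonterminals.isdisjoint(g.terminals)     # assumed convention
        and all(
            head in g.nonterminals and all(x in symbols for x in body)
            for head, body in g.productions
        )
    )

# The conditional grammar from the example.
COND = Grammar(
    nonterminals=frozenset({"cond", "exp"}),
    terminals=frozenset({"if", "then", "else", "true", "false"}),
    productions=(
        ("cond", ("if", "exp", "then", "exp", "else", "exp")),
        ("exp", ("true",)),
        ("exp", ("false",)),
    ),
    start="cond",
)

print(is_well_formed(COND))  # True
```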

Grammars / Definition / What’s “context-free?”

  v → x_0 x_1 ... x_n

- Left-hand side: v, a single nonterminal with no surrounding symbols
- Right-hand side: x_0 x_1 ... x_n

Nonterminals’ expansion is independent of surrounding context.

Grammars / Definition / What’s not “context-free?”

Non-context-free languages
- Most programming language syntax is context-free (or close)
- C/C++ are almost context-free
- In practice: integrate parsing and scanning to distinguish type and variable names

Counterexample (C/C++)

  /* Context:
   * Is x a type or variable?
   */

  x * y;        // declaration or multiplication?

  f((x) * y);   // multiplication or deref. and cast?

Grammars / Definition / Backus-Naur Form (BNF)

Example

LaTeX

  ⟨cond⟩ → “if” ⟨exp⟩ “then” ⟨exp⟩ “else” ⟨exp⟩
  ⟨exp⟩ → “true” | “false”

Plain Text

  <cond> ::= "if" <exp> "then" <exp> "else" <exp>
  <exp> ::= "true" | "false"

Grammars / Definition / What can we do with a grammar?

Definition (Generation)
Generation uses a grammar to produce a string in its language through a sequence of substitutions or rewrites called a derivation.

Definition (Recognition)
Recognition or parsing determines if an input string is in the language of a grammar. Equivalently, parsing determines whether a derivation exists from the grammar’s start symbol to the input string.

Grammars / Definition / Derivation

Overview
- Input: Grammar
- Output: String of terminals in the grammar’s language
- Approach: Rewriting
  1. Begin with the start symbol
  2. Find a nonterminal in the current string and rewrite it with a right-hand side
  3. Repeat 2 until no nonterminals remain

Example

Grammar

  ⟨cond⟩ → “if” ⟨exp⟩ “then” ⟨exp⟩ “else” ⟨exp⟩
  ⟨exp⟩ → “true” | “false”

Derivation

  ⟨cond⟩
  ⇒ “if” ⟨exp⟩ “then” ⟨exp⟩ “else” ⟨exp⟩
  ⇒ “if” “true” “then” ⟨exp⟩ “else” ⟨exp⟩
  ⇒ “if” “true” “then” “false” “else” ⟨exp⟩
  ⇒ “if” “true” “then” “false” “else” “true”
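The rewriting procedure can be sketched in Python as follows. This is an illustrative encoding, not part of the slides: `GRAMMAR` and `derive` are hypothetical names, the sketch always rewrites the leftmost nonterminal, and the caller supplies which alternative to use at each step.

```python
# Each nonterminal maps to its list of alternative right-hand sides.
GRAMMAR = {
    "cond": [["if", "exp", "then", "exp", "else", "exp"]],
    "exp": [["true"], ["false"]],
}

def derive(grammar, start, choices):
    """Return every intermediate string of a leftmost derivation.

    choices gives, for each rewriting step, the index of the
    alternative to substitute for the leftmost nonterminal.
    """
    choices = iter(choices)
    current = [start]                      # 1. begin with the start symbol
    steps = [list(current)]
    while True:
        # 2. find the leftmost nonterminal in the current string
        index = next((i for i, s in enumerate(current) if s in grammar), None)
        if index is None:                  # 3. stop when none remain
            return steps
        try:
            alternative = next(choices)
        except StopIteration:
            raise ValueError("ran out of choices before the derivation finished") from None
        body = grammar[current[index]][alternative]
        current = current[:index] + body + current[index + 1:]
        steps.append(list(current))

for step in derive(GRAMMAR, "cond", [0, 0, 1, 0]):
    print(" ".join(step))
```

With the choices `[0, 0, 1, 0]` the printed lines reproduce the five strings of the example derivation, ending in `if true then false else true`.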

Grammars / Definition / Parsing

Overview
- Input: Grammar, String
- Output: Is the string in the grammar’s language?
- Approach: Construct a derivation for the string, corresponding to a parse tree

Grammars / Definition / Parse Tree

Parse Trees
- Leaves: Terminals
- Nodes: Nonterminals
- Edges: Productions

Grammar

  cond → “if” exp “then” exp “else” exp
  exp → “true” | “false”

Text

  if true then false else true

Parse Tree

  cond
    “if”
    exp
      “true”
    “then”
    exp
      “false”
    “else”
    exp
      “true”
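For a grammar this small, a hand-written recursive-descent parser can build the parse tree directly. The sketch below is one way to do it and is not taken from the slides: the function names are hypothetical, the input is assumed to be already split into terminals, and tree nodes are represented as `(nonterminal, children)` pairs.

```python
def parse_exp(tokens, pos):
    """exp → "true" | "false"; returns (node, next position)."""
    if pos < len(tokens) and tokens[pos] in ("true", "false"):
        return ("exp", [tokens[pos]]), pos + 1
    raise SyntaxError(f"expected 'true' or 'false' at position {pos}")

def expect(tokens, pos, terminal):
    """Consume one specific terminal or fail."""
    if pos < len(tokens) and tokens[pos] == terminal:
        return pos + 1
    raise SyntaxError(f"expected {terminal!r} at position {pos}")

def parse_cond(tokens):
    """cond → "if" exp "then" exp "else" exp; returns the parse tree."""
    children = []
    pos = 0
    for part in ("if", "exp", "then", "exp", "else", "exp"):
        if part == "exp":
            node, pos = parse_exp(tokens, pos)
            children.append(node)       # nonterminal: interior node
        else:
            pos = expect(tokens, pos, part)
            children.append(part)       # terminal: leaf
    if pos != len(tokens):
        raise SyntaxError("unexpected input after the conditional")
    return ("cond", children)

tree = parse_cond("if true then false else true".split())
print(tree)
```

A string outside the language, such as `if true else then false true`, makes the sketch raise `SyntaxError` because no derivation exists for it.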

Grammars / Grammars for the Functional Programs / Example: Lambda Calculus Grammar

Text

  (λa . a) b

Grammar

  ⟨exp⟩ → ⟨sym⟩
        | “λ” ⟨sym⟩ “.” ⟨exp⟩
        | ⟨exp⟩ ⟨exp⟩
        | “(” ⟨exp⟩ “)”
  ⟨sym⟩ → “a” | “b” | “c” | ...

Parse Tree

  ⟨exp⟩
    ⟨exp⟩
      “(”
      ⟨exp⟩
        “λ”
        ⟨sym⟩
          “a”
        “.”
        ⟨exp⟩
          ⟨sym⟩
            “a”
      “)”
    ⟨exp⟩
      ⟨sym⟩
        “b”

Grammars / Grammars for the Functional Programs / Example: Lambda Calculus Grammar (continued)

Text

  (λa . a) b

Derivation

  ⟨exp⟩
  ⇒ ⟨exp⟩ ⟨exp⟩
  ⇒ “(” ⟨exp⟩ “)” ⟨exp⟩
  ⇒ “(” “λ” ⟨sym⟩ “.” ⟨exp⟩ “)” ⟨exp⟩
  ⇒ “(” “λ” “a” “.” ⟨exp⟩ “)” ⟨exp⟩
  ⇒ “(” “λ” “a” “.” ⟨sym⟩ “)” ⟨exp⟩
  ⇒ “(” “λ” “a” “.” “a” “)” ⟨exp⟩
  ⇒ “(” “λ” “a” “.” “a” “)” ⟨sym⟩
  ⇒ “(” “λ” “a” “.” “a” “)” “b”

Parse Tree: the same tree as on the previous slide; each derivation step expands one of its nonterminal nodes.

Grammars / Grammars for the Functional Programs / Exercise: Lambda Calculus Grammar

Text

  a λb . (b c)

Grammar

  ⟨exp⟩ → ⟨sym⟩
        | “λ” ⟨sym⟩ “.” ⟨exp⟩
        | ⟨exp⟩ ⟨exp⟩
        | “(” ⟨exp⟩ “)”
  ⟨sym⟩ → “a” | “b” | “c” | ...

Parse Tree

  ⟨exp⟩
    ⟨exp⟩
      ⟨sym⟩
        “a”
    ⟨exp⟩
      “λ”
      ⟨sym⟩
        “b”
      “.”
      ⟨exp⟩
        “(”
        ⟨exp⟩
          ⟨exp⟩
            ⟨sym⟩
              “b”
          ⟨exp⟩
            ⟨sym⟩
              “c”
        “)”

Grammars / Grammars for the Functional Programs / Exercise: Lambda Calculus Grammar (continued)

Text

  a λb . (b c)

Derivation

  ⟨exp⟩
  ⇒ ⟨exp⟩ ⟨exp⟩
  ⇒ ⟨sym⟩ ⟨exp⟩
  ⇒ “a” ⟨exp⟩
  ⇒ “a” “λ” ⟨sym⟩ “.” ⟨exp⟩
  ⇒ “a” “λ” “b” “.” ⟨exp⟩
  ⇒ “a” “λ” “b” “.” “(” ⟨exp⟩ “)”
  ⇒ “a” “λ” “b” “.” “(” ⟨exp⟩ ⟨exp⟩ “)”
  ⇒ “a” “λ” “b” “.” “(” ⟨sym⟩ ⟨exp⟩ “)”
  ⇒ “a” “λ” “b” “.” “(” “b” ⟨exp⟩ “)”
  ⇒ “a” “λ” “b” “.” “(” “b” ⟨sym⟩ “)”
  ⇒ “a” “λ” “b” “.” “(” “b” “c” “)”

Parse Tree: the same tree as on the previous slide; each derivation step expands one of its nonterminal nodes.

Grammars / Grammars for the Functional Programs / Exercise: Let Expression

Text

  let x ← f y in (x z)

Grammar

  ⟨exp⟩ → ⟨sym⟩
        | “λ” ⟨sym⟩ “.” ⟨exp⟩
        | ⟨exp⟩ ⟨exp⟩
        | “(” ⟨exp⟩ “)”
        | “let” ⟨sym⟩ “←” ⟨exp⟩ “in” ⟨exp⟩
  ⟨sym⟩ → “a” | “b” | “c” | ...

Parse Tree

  ⟨exp⟩
    “let”
    ⟨sym⟩
      “x”
    “←”
    ⟨exp⟩
      ⟨exp⟩
        ⟨sym⟩
          “f”
      ⟨exp⟩
        ⟨sym⟩
          “y”
    “in”
    ⟨exp⟩
      “(”
      ⟨exp⟩
        ⟨exp⟩
          ⟨sym⟩
            “x”
        ⟨exp⟩
          ⟨sym⟩
            “z”
      “)”

Grammars / Ambiguity and Precedence / Exercise: Arithmetic

Text

  1 + 2 ∗ 3

Grammar

  ⟨e⟩ → ⟨e⟩ “+” ⟨e⟩
      | ⟨e⟩ “∗” ⟨e⟩
      | “1” | “2” | “3” | ...

Parse Tree (first)

  ⟨e⟩
    ⟨e⟩
      “1”
    “+”
    ⟨e⟩
      ⟨e⟩
        “2”
      “∗”
      ⟨e⟩
        “3”

Parse Tree (second)

  ⟨e⟩
    ⟨e⟩
      ⟨e⟩
        “1”
      “+”
      ⟨e⟩
        “2”
    “∗”
    ⟨e⟩
      “3”

Ambiguous: multiple valid parse trees
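The ambiguity can be made concrete by enumerating every parse tree for the text. The Python sketch below does this by brute force, trying each operator as the root; `parses` and `evaluate` are hypothetical helper names, ASCII `*` stands in for the slide's ∗, and evaluating the trees as ordinary arithmetic is an assumption used only to show that the two trees mean different things.

```python
def parses(tokens):
    """Return all parse trees for e → e "+" e | e "*" e | number."""
    if len(tokens) == 1 and tokens[0].isdigit():
        return [tokens[0]]
    trees = []
    # Try every operator position as the root production.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in ("+", "*"):
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees

def evaluate(tree):
    """Assumed arithmetic meaning of a tree, for illustration only."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) * evaluate(right)

for tree in parses("1 + 2 * 3".split()):
    print(tree, "=", evaluate(tree))
```

The sketch finds exactly two trees: one groups `2 * 3` first and evaluates to 7, the other groups `1 + 2` first and evaluates to 9.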

Grammars / Ambiguity and Precedence / Handling Precedence: Modify Grammar

Text

  1 + 2 ∗ 3

Modify Grammar

  ⟨e⟩ → ⟨t⟩ | ⟨e⟩ “+” ⟨t⟩
  ⟨t⟩ → ⟨n⟩ | ⟨t⟩ “∗” ⟨n⟩
  ⟨n⟩ → “1” | “2” | “3” | ...

Parse Tree

  ⟨e⟩
    ⟨e⟩
      ⟨t⟩
        ⟨n⟩
          “1”
    “+”
    ⟨t⟩
      ⟨t⟩
        ⟨n⟩
          “2”
      “∗”
      ⟨n⟩
        “3”
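The layered grammar translates into one parsing function per nonterminal. The sketch below is a hand-written recursive-descent version under stated assumptions: the left-recursive productions are handled with loops (a common technique that the slides do not prescribe), ASCII `*` stands in for ∗, and the result is a nested tuple per operator, not a full parse tree. All function names are hypothetical.

```python
def parse_n(tokens, pos):
    """n → "1" | "2" | "3" | ..."""
    if pos < len(tokens) and tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    raise SyntaxError(f"expected a number at position {pos}")

def parse_t(tokens, pos):
    """t → n | t "*" n, with the left recursion written as a loop."""
    left, pos = parse_n(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_n(tokens, pos + 1)
        left = ("*", left, right)
    return left, pos

def parse_e(tokens, pos=0):
    """e → t | e "+" t, with the left recursion written as a loop."""
    left, pos = parse_t(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_t(tokens, pos + 1)
        left = ("+", left, right)
    return left, pos

def parse(text):
    tokens = text.split()
    tree, pos = parse_e(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree

print(parse("1 + 2 * 3"))  # ('+', 1, ('*', 2, 3))
```

Because `*` is only consumed inside `parse_t`, a product is always complete before `parse_e` sees the next `+`, so multiplication binds tighter and only one tree is possible.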

Grammars / Ambiguity and Precedence / Handling Precedence: Parser-specific directives

YACC/Bison Grammar

  expr: expr '+' expr
      | expr '-' expr
      | expr '*' expr
      | expr '/' expr
      | num
      ;

YACC/Bison Precedence

  %left '+' '-'
  %left '*' '/'

Directs (some) parsing algorithms to resolve ambiguity


Abstract Syntax / Abstract Syntax

Overview
- Data structure encoding the program for compiler or interpreter
- Some details addressed in the parser, and omitted from the abstract syntax
  - Ambiguity / precedence
  - Keyword and operator strings
- Abstract Syntax Tree (AST): Use algebraic data types

Conditional Type

  type Exp ←
    | TrueExp
    | FalseExp
    | CondExp of Exp × Exp × Exp

Example

  if true then false else true

  CondExp
    TrueExp
    FalseExp
    TrueExp
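The conditional type can be approximated in Python with one frozen dataclass per constructor. This is a sketch under assumptions, since Python has no algebraic data types built in: the field names `test`, `then`, and `otherwise` and the helper `to_ast` are hypothetical, and the input is assumed to be already split into terminals.

```python
from dataclasses import dataclass

class Exp:
    """Base class standing in for the Exp type."""

@dataclass(frozen=True)
class TrueExp(Exp):
    pass

@dataclass(frozen=True)
class FalseExp(Exp):
    pass

@dataclass(frozen=True)
class CondExp(Exp):
    test: Exp
    then: Exp
    otherwise: Exp

def to_ast(tokens):
    """Build the AST for: "if" exp "then" exp "else" exp."""
    def leaf(token):
        if token == "true":
            return TrueExp()
        if token == "false":
            return FalseExp()
        raise SyntaxError(f"expected 'true' or 'false', got {token!r}")

    # The keywords are checked here and then dropped from the result.
    if len(tokens) != 6 or (tokens[0], tokens[2], tokens[4]) != ("if", "then", "else"):
        raise SyntaxError("not a conditional expression")
    return CondExp(leaf(tokens[1]), leaf(tokens[3]), leaf(tokens[5]))

print(to_ast("if true then false else true".split()))
```

The resulting value holds only the three sub-expressions; the strings “if”, “then”, and “else” no longer appear anywhere in it.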

Abstract Syntax / Abstract Syntax Tree vs. Parse Tree

Parse Tree
Directly corresponds to concrete syntax and grammar.

  ⟨e⟩
    ⟨e⟩
      ⟨t⟩
        ⟨n⟩
          “1”
    “+”
    ⟨t⟩
      ⟨t⟩
        ⟨n⟩
          “2”
      “∗”
      ⟨n⟩
        “3”

Abstract Syntax Tree (AST)
Omits precedence, parentheses, keywords, etc.

  type Exp ←
    | NumExp of int
    | AddExp of Exp × Exp
    | MulExp of Exp × Exp

  AddExp
    NumExp 1
    MulExp
      NumExp 2
      NumExp 3
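The relationship between the two trees can be shown by converting one into the other. In this Python sketch the parse tree is written by hand as nested `(nonterminal, children)` pairs, the AST uses dataclasses in place of the algebraic data type, and `to_ast` collapses the chain productions. The representation, field names, and function name are all assumptions made for illustration.

```python
from dataclasses import dataclass

class Exp:
    """Base class standing in for the Exp type."""

@dataclass(frozen=True)
class NumExp(Exp):
    value: int

@dataclass(frozen=True)
class AddExp(Exp):
    left: Exp
    right: Exp

@dataclass(frozen=True)
class MulExp(Exp):
    left: Exp
    right: Exp

def to_ast(node):
    """Convert a parse tree node (label, children) into an Exp."""
    label, children = node
    if label == "n":
        return NumExp(int(children[0]))
    if len(children) == 1:
        # Chain productions such as e → t and t → n leave no trace in the AST.
        return to_ast(children[0])
    left, operator, right = children
    if operator == "+":
        return AddExp(to_ast(left), to_ast(right))
    return MulExp(to_ast(left), to_ast(right))

# Parse tree for 1 + 2 * 3 under the precedence grammar.
PARSE_TREE = (
    "e",
    [
        ("e", [("t", [("n", ["1"])])]),
        "+",
        ("t", [("t", [("n", ["2"])]), "*", ("n", ["3"])]),
    ],
)

print(to_ast(PARSE_TREE))
```

The printed value is `AddExp` over `NumExp(1)` and a `MulExp` of `NumExp(2)` and `NumExp(3)`: the layers that existed only to encode precedence are gone, while the grouping they enforced remains in the tree shape.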

Abstract Syntax / Exercise: Lambda Calculus Abstract Syntax

Text

  a λb . (b c)

Data Type

  type Exp ←
    | SymExp of string
    | LambdaExp of string × Exp
    | CallExp of Exp × Exp

AST

  CallExp
    SymExp a
    LambdaExp b
      CallExp
        SymExp b
        SymExp c
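Going from text to this AST requires choosing how to resolve the grammar's ambiguity. The Python sketch below assumes the usual lambda calculus conventions, which the slides do not state: application associates to the left, and a lambda body extends as far right as possible. It also assumes whitespace-separated tokens; the dataclass field names and function names are hypothetical.

```python
from dataclasses import dataclass

class Exp:
    """Base class standing in for the Exp type."""

@dataclass(frozen=True)
class SymExp(Exp):
    name: str

@dataclass(frozen=True)
class LambdaExp(Exp):
    param: str
    body: Exp

@dataclass(frozen=True)
class CallExp(Exp):
    function: Exp
    argument: Exp

def is_sym(token):
    return token.isalpha() and token != "λ"

def parse_atom(tokens, pos):
    """Parse a symbol, a parenthesized expression, or a lambda."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == "λ":
        if pos + 2 >= len(tokens) or not is_sym(tokens[pos + 1]) or tokens[pos + 2] != ".":
            raise SyntaxError("expected 'λ sym .'")
        body, end = parse_exp(tokens, pos + 3)   # body extends to the right
        return LambdaExp(tokens[pos + 1], body), end
    if token == "(":
        inner, end = parse_exp(tokens, pos + 1)
        if end >= len(tokens) or tokens[end] != ")":
            raise SyntaxError("expected ')'")
        return inner, end + 1                    # parentheses are dropped
    if is_sym(token):
        return SymExp(token), pos + 1
    raise SyntaxError(f"unexpected token {token!r}")

def parse_exp(tokens, pos=0):
    """Parse a run of atoms as left-associative application."""
    node, pos = parse_atom(tokens, pos)
    while pos < len(tokens) and tokens[pos] != ")":
        argument, pos = parse_atom(tokens, pos)
        node = CallExp(node, argument)
    return node, pos

def parse(text):
    tokens = text.split()
    node, pos = parse_exp(tokens)
    if pos != len(tokens):
        raise SyntaxError("unbalanced ')'")
    return node

print(parse("a λ b . ( b c )"))
```

For the exercise text the sketch yields a `CallExp` whose function is `SymExp('a')` and whose argument is a `LambdaExp` with parameter `b` and body `CallExp(SymExp('b'), SymExp('c'))`, matching the AST above.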

Summary

- Formal languages: underlying theory for lexical and syntax analysis
- Grammars: representation for the concrete programming language syntax
- Abstract Syntax: the important structure of the program

References

- Clarkson. Functional Programming in OCaml. Ch 10.1 Lexing and Parsing
- Hennessy. The Semantics of Programming Languages. Ch 1.2 Concrete and Abstract Syntax
- Aho. Compilers, 2nd ed. Ch 4 Syntax Analysis
