Type Systems

Pierce Ch. 3, 8, 11, 15

A Simple Language

t ::= true | false | if t then t else t | 0 | succ t | pred t | iszero t

- Simple untyped expressions
- Natural numbers encoded as succ … succ 0
  - E.g. succ succ succ 0 represents 3
- term: a string from this language
- To improve readability, we will sometimes write parentheses: e.g. iszero (pred (succ 0))

CSE 6341 1 CSE 6341 2

Semantics (informally)

- A term evaluates to a value
  - Values are terms themselves
  - Boolean constants: true and false
  - Natural numbers: 0, succ 0, succ (succ 0), …
- Given a program (i.e., a term), the result of “running” this program is a boolean value or a natural number
  - if false then 0 else succ 0 evaluates to succ 0
  - iszero (pred (succ 0)) evaluates to true
- Problematic: succ true or if 0 then 0 else 0

Equivalent Ways to Define the Syntax

- Inductive definition: the smallest set S s.t.
  - { true, false, 0 } ⊆ S
  - if t1 ∈ S, then { succ t1, pred t1, iszero t1 } ⊆ S
  - if t1, t2, t3 ∈ S, then if t1 then t2 else t3 ∈ S
- Same thing, written as inference rules
  - Axioms (no premises):
    $$\text{true} \in S \qquad \text{false} \in S \qquad 0 \in S$$
  - Rules with premises:
    $$\dfrac{t_1 \in S}{\text{succ } t_1 \in S} \qquad \dfrac{t_1 \in S}{\text{pred } t_1 \in S} \qquad \dfrac{t_1 \in S}{\text{iszero } t_1 \in S}$$
    $$\dfrac{t_1 \in S \qquad t_2 \in S \qquad t_3 \in S}{\text{if } t_1 \text{ then } t_2 \text{ else } t_3 \in S}$$
- If we have established the premises (above the line), we can derive the conclusion (below the line)
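The inductive definition of S can be read as a recursive membership test. The sketch below assumes an encoding that is not part of the slides: constants are the strings "true", "false", "0", and compound terms are tuples such as ("succ", t) or ("if", t1, t2, t3). The name is_term is hypothetical.

```python
# Assumed encoding (not from the slides):
#   constants: "true", "false", "0"
#   compound:  ("succ", t), ("pred", t), ("iszero", t), ("if", t1, t2, t3)
CONSTANTS = {"true", "false", "0"}
UNARY = {"succ", "pred", "iszero"}

def is_term(t):
    """Return True iff t can be derived with the inference rules for S."""
    if isinstance(t, str):
        return t in CONSTANTS                    # axioms: no premises
    if isinstance(t, tuple) and len(t) == 2 and t[0] in UNARY:
        return is_term(t[1])                     # one premise: t1 in S
    if isinstance(t, tuple) and len(t) == 4 and t[0] == "if":
        return all(is_term(s) for s in t[1:])    # three premises
    return False

three = ("succ", ("succ", ("succ", "0")))        # encodes the number 3
print(is_term(three))
print(is_term(("succ", "true")))                 # in S, although semantically problematic
print(is_term(("succ",)))                        # not in S: missing subterm
```

Note that succ true passes the test: membership in S is purely syntactic, which is why a separate notion of "good" programs is needed.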

Why Does This Matter?

- Key property: for any t ∈ S, one of three things must be true:
  - It is a constant (i.e., derived from an axiom)
  - It is of the form succ t1, pred t1, or iszero t1 where t1 is some smaller term
  - It is of the form if t1 then t2 else t3 where t1, t2, and t3 are some smaller terms
- The inference rules make this explicit, and make it easy for us to have
  - Inductive definitions of functions over S
  - Inductive proofs of properties of S

Inductive Proofs

- Structural induction: used very often
- Suppose P is a predicate over terms (i.e., a function mapping elements of S to truth values)
  - When P(t) is true, we will just write P(t)
- For each term t, let ti be its immediate subterms. Suppose we can prove that
  - Whenever P(ti) for all ti, we also have P(t)
  - For terms without subterms, P(t) holds
- This means that P(t) for all terms in S


Semantics: Why?

- We need to define the semantics before we can discuss type systems
  - The semantics defines the difference between “good” and “bad” programs
  - A type system can help us prove that certain programs are “good”, for all possible inputs
- Safety (a.k.a. soundness) of a type system: if a program is well-typed, it will not “go wrong”
  - But only for certain bad behaviors: e.g. a type system typically cannot assure the absence of “division by zero” or “array index out of bounds”

Semantics: How?

- Operational semantics in the general sense: imagine an abstract machine
  - Some notion of the state of this machine
  - Transition function: given the current state, what is the next state?
  - It is possible that the machine gets “stuck”: there is no valid transition
- The semantics we will define for this simple language is a specific form of “small-step” operational semantics
  - state = term; transition = term simplification
  - Later we will discuss “big-step” semantics

Semantics: How?

- Initial state: the term whose meaning we are trying to determine
  - i.e., the expression we are trying to evaluate
- One of two things can happen:
  - We reach a state (i.e. a term) which is a semantic value
  - We get stuck
- All of this depends on what we consider to be the set of semantic values

Semantics (formally)

- The domain of values (a subset of the terms)
  - v ::= bv | nv          values
  - bv ::= true | false    boolean values
  - nv ::= 0 | succ nv     numeric values
- Operational semantics defined by an evaluation relation on terms: t → t’
  - → is a binary relation: → ⊆ S × S
  - t → t’ means “t evaluates to t’ in one step”
  - Thus, “small-step” operational semantics


Evaluation Relation: Booleans

- Relation → ⊆ S × S defined with inference rules
  - Just a way of writing an inductive definition

$$\text{if true then } t_2 \text{ else } t_3 \to t_2 \qquad \text{if false then } t_2 \text{ else } t_3 \to t_3$$

$$\dfrac{t_1 \to t_1'}{\text{if } t_1 \text{ then } t_2 \text{ else } t_3 \;\to\; \text{if } t_1' \text{ then } t_2 \text{ else } t_3}$$

- These rules get instantiated with concrete terms, to get rule instances

Example

if true then (if (if false then false else false) then true else false) else true → ? (a value, i.e. a term that is true or false)

- Step 1: ... → if (if false then false else false) then true else false
- Step 2: if false then false else false → false
- Step 3: if (if false then false else false) then true else false → if false then true else false
- Step 4: if false then true else false → false
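The three boolean rules translate directly into a one-step function, and repeated stepping reproduces the example. This is only a sketch under an assumed encoding (constants as strings, if t1 then t2 else t3 as the tuple ("if", t1, t2, t3)); the names step and trace are hypothetical.

```python
def step(t):
    """One evaluation step for the boolean fragment; None if no rule applies."""
    if isinstance(t, tuple) and t[0] == "if":
        _, t1, t2, t3 = t
        if t1 == "true":                   # if true then t2 else t3 -> t2
            return t2
        if t1 == "false":                  # if false then t2 else t3 -> t3
            return t3
        t1_next = step(t1)                 # premise: t1 -> t1'
        if t1_next is not None:
            return ("if", t1_next, t2, t3)
    return None

def trace(t):
    """All states visited from t until no rule applies."""
    states = [t]
    while True:
        nxt = step(states[-1])
        if nxt is None:
            return states
        states.append(nxt)

inner = ("if", "false", "false", "false")
example = ("if", "true", ("if", inner, "true", "false"), "true")
for state in trace(example):
    print(state)
```

The printed sequence has four states and ends in the value false. Step 2 of the example does not appear as a separate state: it is the premise used to justify Step 3.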

More on the Evaluation Relation

- We can generalize to the natural numbers by adding more inference rules
  - Will not go into these details here
- A key issue: what if we reach a term that cannot be evaluated anymore (no inference rule applies), but the term is not a semantic value?
  - Examples: if 0 then 0 else 0 and pred false
  - There is no inference rule that can be used to make “the next step”
- We get “stuck”, i.e. have a run-time error: the program has reached a meaningless state

Typed Expressions

- Goal: without evaluating a term, can we guarantee that it will not get stuck?
- Idea: define types, and establish a relationship between terms and types
- For our simple example:
  - Type Bool, which is the set of all terms that evaluate to a boolean value
  - Type Nat, which is the set of all terms that evaluate to a numeric value
- To determine that a term t has type T (i.e., t ∈ T), we will only look at the structure of t (i.e., will do a compile-time analysis)
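Getting stuck, which the type system is meant to rule out, can be made concrete with a small interpreter. The slides do not list the rules for natural numbers, so the sketch below assumes the standard ones from Pierce Ch. 3 (pred 0 → 0, pred (succ nv) → nv, iszero 0 → true, iszero (succ nv) → false, plus one rule per construct for evaluating the subterm). The tuple encoding and the names step and run are also assumptions.

```python
def is_numeric(t):
    """nv ::= 0 | succ nv"""
    return t == "0" or (isinstance(t, tuple) and t[0] == "succ" and is_numeric(t[1]))

def is_value(t):
    return t in ("true", "false") or is_numeric(t)

def step(t):
    """One small step, or None when no inference rule applies."""
    if not isinstance(t, tuple):
        return None                                  # constants take no step
    op = t[0]
    if op == "if":
        _, t1, t2, t3 = t
        if t1 == "true":
            return t2
        if t1 == "false":
            return t3
        nxt = step(t1)
        return None if nxt is None else ("if", nxt, t2, t3)
    t1 = t[1]
    if op in ("pred", "iszero") and is_numeric(t1):
        if op == "pred":                             # pred 0 -> 0, pred (succ nv) -> nv
            return "0" if t1 == "0" else t1[1]
        return "true" if t1 == "0" else "false"      # iszero 0 / iszero (succ nv)
    nxt = step(t1)                                   # evaluate the subterm first
    return None if nxt is None else (op, nxt)

def run(t):
    """Evaluate until no rule applies; report whether the result is a value."""
    while True:
        nxt = step(t)
        if nxt is None:
            return t, ("value" if is_value(t) else "stuck")
        t = nxt

print(run(("iszero", ("pred", ("succ", "0")))))      # ('true', 'value')
print(run(("if", "0", "0", "0")))                    # stuck immediately
print(run(("pred", "false")))                        # stuck immediately
```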

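Typing Relation

- Relation : ⊆ S × { Bool, Nat }
  - t : T is the same as t ∈ T

$$\text{true} : \text{Bool} \qquad \text{false} : \text{Bool} \qquad 0 : \text{Nat}$$

$$\dfrac{t_1 : \text{Bool} \qquad t_2 : T \qquad t_3 : T}{\text{if } t_1 \text{ then } t_2 \text{ else } t_3 : T}$$

$$\dfrac{t_1 : \text{Nat}}{\text{succ } t_1 : \text{Nat}} \qquad \dfrac{t_1 : \text{Nat}}{\text{pred } t_1 : \text{Nat}} \qquad \dfrac{t_1 : \text{Nat}}{\text{iszero } t_1 : \text{Bool}}$$

Example: Typing Derivation

- if (iszero 0) then 0 else (succ 0) : ?

$$\dfrac{\dfrac{0 : \text{Nat}}{\text{iszero } 0 : \text{Bool}} \qquad 0 : \text{Nat} \qquad \dfrac{0 : \text{Nat}}{\text{succ } 0 : \text{Nat}}}{\text{if (iszero 0) then 0 else (succ 0)} : \text{Nat}}$$

- This structure is a derivation tree: the leaves are instances of axioms, the inner nodes are instances of inference rules with premises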
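Because each typing rule is determined by the outermost construct of the term, the typing relation can be sketched as a recursive function whose call tree has the same shape as the derivation tree. The tuple encoding of terms, the IllTyped exception, and the names typeof and typable are assumptions made for illustration.

```python
class IllTyped(Exception):
    """Raised when no typing rule applies to a term."""

def typeof(t):
    """Return "Bool" or "Nat" for a well-typed term; raise IllTyped otherwise."""
    if t in ("true", "false"):
        return "Bool"                                # axioms
    if t == "0":
        return "Nat"                                 # axiom
    op = t[0]
    if op == "if":
        T1, T2, T3 = typeof(t[1]), typeof(t[2]), typeof(t[3])
        if T1 == "Bool" and T2 == T3:                # both branches share one type T
            return T2
    elif op in ("succ", "pred", "iszero"):
        if typeof(t[1]) == "Nat":
            return "Bool" if op == "iszero" else "Nat"
    raise IllTyped(t)

def typable(t):
    try:
        typeof(t)
        return True
    except IllTyped:
        return False

example = ("if", ("iszero", "0"), "0", ("succ", "0"))
print(typeof(example))                               # Nat
print(typable(("if", "0", "0", "0")))                # False: the condition is not Bool
```

Only the structure of the term is inspected; nothing is evaluated.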


More on the Typing Relation

- A term t is typable (or well typed) if there is some T such that t : T
- In this particular simple type system, each term has at most one type
  - In general, a term may have multiple types (e.g. when the type system has subtypes)
- The guarantee that a well-typed term does not get stuck does not work in the other direction: a term which is not well typed may or may not get stuck (conservative analysis)
  - if (iszero 0) then 0 else false (not well typed, yet evaluates to 0)
  - if true then 0 else false (not well typed, yet evaluates to 0)

More on the Typing Relation

- Safety = Progress + Preservation
- Safety (a.k.a. soundness) of a type system: if a program is well-typed, it will not “go wrong”
  - For this type system: a well-typed term t : T will not get stuck
  - And will evaluate to a value of type T
- Progress: A well-typed term will not be stuck: it either is a value, or it can take a step according to the evaluation rules
- Preservation: If a well-typed term takes a step of evaluation, the result is also well typed


An Extended Simple Language

t ::= true | false | if t then t else t | 0 | succ t | pred t | iszero t | { t , t } | t.1 | t.2

- Pairs: pairing { , } and projection .1/.2
- Need to add pair values to the semantics
  - v ::= bv | nv | { v , v }
- Generalization to n-tuples is trivial
- For typing: need to add pair types T1 × T2
  - E.g. Bool × Nat, Nat × Nat, etc.

Typing Relation Again

- No surprises here …

$$\dfrac{t_1 : T_1 \qquad t_2 : T_2}{\{ t_1 , t_2 \} : T_1 \times T_2} \qquad \dfrac{t_1 : T_1 \times T_2}{t_1.1 : T_1} \qquad \dfrac{t_1 : T_1 \times T_2}{t_1.2 : T_2}$$

- {if (iszero 0) then 0 else (succ 0),true}.2 : ?
  - { … } : Nat × Bool
  - { … }.2 : Bool
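The pair rules add two cases to a recursive type checker. In this sketch the encoding is assumed, not taken from the slides: {t1, t2} is ("pair", t1, t2), t.i is ("proj", t, i), and the type T1 × T2 is ("Pair", T1, T2). The names IllTyped, typeof and typable are hypothetical.

```python
class IllTyped(Exception):
    """Raised when no typing rule applies to a term."""

def typeof(t):
    if t in ("true", "false"):
        return "Bool"
    if t == "0":
        return "Nat"
    op = t[0]
    if op == "if":
        T1, T2, T3 = typeof(t[1]), typeof(t[2]), typeof(t[3])
        if T1 == "Bool" and T2 == T3:
            return T2
    elif op in ("succ", "pred", "iszero"):
        if typeof(t[1]) == "Nat":
            return "Bool" if op == "iszero" else "Nat"
    elif op == "pair":                               # {t1, t2} : T1 x T2
        return ("Pair", typeof(t[1]), typeof(t[2]))
    elif op == "proj":                               # t.1 : T1 and t.2 : T2
        T = typeof(t[1])
        if isinstance(T, tuple) and T[0] == "Pair" and t[2] in (1, 2):
            return T[t[2]]
    raise IllTyped(t)

def typable(t):
    try:
        typeof(t)
        return True
    except IllTyped:
        return False

pair = ("pair", ("if", ("iszero", "0"), "0", ("succ", "0")), "true")
print(typeof(pair))                                  # ('Pair', 'Nat', 'Bool')
print(typeof(("proj", pair, 2)))                     # Bool
```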


Records

t ::= … | { l1=t , l2=t , …, ln=t } | t.l

- Example: { sum=succ 0,overdraft=true }
- Labels li are from some pre-defined set of labels
  - In any term, all labels must be different
- In the semantics, introduce record values
- In the type system, introduce record types { l1:T1 , l2:T2 ,…, ln:Tn }
  - E.g. { sum:Nat , overdraft:Bool }

Typing Relation

- Similar to the handling of tuples

$$\dfrac{t_1 : T_1 \qquad t_2 : T_2 \qquad \dots \qquad t_n : T_n}{\{ l_1 = t_1, l_2 = t_2, \dots, l_n = t_n \} : \{ l_1 : T_1, l_2 : T_2, \dots, l_n : T_n \}}$$

$$\dfrac{t_1 : \{ l_1 : T_1, l_2 : T_2, \dots, l_n : T_n \}}{t_1.l_k : T_k}$$

- {sum=succ 0,overdraft=true}.sum : ?
  - { … } : { sum:Nat , overdraft:Bool }
  - { … }.sum : Nat
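The two record rules can be sketched the same way. The encoding here is an assumption: a record term is ("record", ((label, term), ...)), a field access is ("field", term, label), and a record type is ("Record", ((label, type), ...)) with the fields kept in their written order. To stay short, only constants and succ are included from the base language; IllTyped, typeof and typable are hypothetical names.

```python
class IllTyped(Exception):
    """Raised when no typing rule applies to a term."""

def typeof(t):
    if t in ("true", "false"):
        return "Bool"
    if t == "0":
        return "Nat"
    op = t[0]
    if op == "succ":
        if typeof(t[1]) == "Nat":
            return "Nat"
    elif op == "record":
        labels = [label for label, _ in t[1]]
        if len(labels) == len(set(labels)):          # all labels must be different
            return ("Record", tuple((label, typeof(s)) for label, s in t[1]))
    elif op == "field":
        T = typeof(t[1])
        if isinstance(T, tuple) and T[0] == "Record":
            for label, field_type in T[1]:
                if label == t[2]:                    # t.lk : Tk
                    return field_type
    raise IllTyped(t)

def typable(t):
    try:
        typeof(t)
        return True
    except IllTyped:
        return False

account = ("record", (("sum", ("succ", "0")), ("overdraft", "true")))
print(typeof(account))
print(typeof(("field", account, "sum")))             # Nat
```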


Ordering of Labels

- Consider { sum=succ 0 , overdraft=true } and { overdraft=true , sum=succ 0 }
  - Are they the same value?
- Consider { sum:Nat , overdraft:Bool } and { overdraft:Bool , sum:Nat }
  - Are they the same type?
- In our type system, labels are ordered
  - Similarly to tuples: {0,true} is not {true,0}
- Will this typecheck in C?
  - struct{int x;int y;} a,b; struct{int y;int x;} c;
  - a.x = 1; a.y = 2; b = a; c = a;

Lists

t ::= … | nil[T] | cons[T] t t | isnil[T] t | head[T] t | tail[T] t

- Example: cons[Bool] (isnil[Nat×Bool] nil[Nat×Bool]) (cons[Bool] false nil[Bool])
  - The value is a list of size 2: cons[Bool] true (cons[Bool] false nil[Bool]) i.e. (true false)
- In the semantics: list values
  - v ::= … | nil[T] | cons[T] v v
- In the type system: list types
  - List T, e.g. List (List Nat×Nat)

Typing Relation

$$\text{nil}[T_1] : \text{List } T_1 \qquad \dfrac{t_1 : T_1 \qquad t_2 : \text{List } T_1}{\text{cons}[T_1]\ t_1\ t_2 : \text{List } T_1} \qquad \dfrac{t_1 : \text{List } T_1}{\text{isnil}[T_1]\ t_1 : \text{Bool}}$$

$$\dfrac{t_1 : \text{List } T_1}{\text{head}[T_1]\ t_1 : T_1} \qquad \dfrac{t_1 : \text{List } T_1}{\text{tail}[T_1]\ t_1 : \text{List } T_1}$$

- Example 1: cons[Bool] (isnil[Nat×Bool] nil[Nat×Bool]) (cons[Bool] false nil[Bool])
- Example 2: cons[Bool] false true
- Example 3: isnil[Bool] nil[Nat×Bool]

Let Bindings

t ::= … | let id = t in t

- Give names to sub-expressions
  - let z=true in cons[Bool] z (cons[Bool] z nil[Bool])
- Semantics: evaluate the first expr, “bind” z to that value, and evaluate the second expr
- Use a type environment Γ (a.k.a. typing context)
  - Sequence of (name,type) pairs
  - Γ, x:T means “Γ appended with the pair (x:T)”
  - Name x should not already be bound by Γ
- Ternary typing relation: Γ ⊢ t : T
  - “Term t has type T under the bindings in Γ”
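The three list examples can be checked mechanically against the list typing rules. The sketch below assumes an encoding in which every list construct carries its type annotation as the second tuple element, e.g. ("cons", T, t1, t2), with List T written ("List", T) and Nat×Bool written ("Pair", "Nat", "Bool"). Only the constants and the list rules are included; all names are hypothetical.

```python
class IllTyped(Exception):
    """Raised when no typing rule applies to a term."""

def typeof(t):
    if t in ("true", "false"):
        return "Bool"
    if t == "0":
        return "Nat"
    op, T = t[0], t[1]                               # T is the annotation in [...]
    if op == "nil":
        return ("List", T)
    if op == "cons":
        if typeof(t[2]) == T and typeof(t[3]) == ("List", T):
            return ("List", T)
    elif op in ("isnil", "head", "tail"):
        if typeof(t[2]) == ("List", T):
            return {"isnil": "Bool", "head": T, "tail": ("List", T)}[op]
    raise IllTyped(t)

def typable(t):
    try:
        typeof(t)
        return True
    except IllTyped:
        return False

NAT_BOOL = ("Pair", "Nat", "Bool")
example1 = ("cons", "Bool", ("isnil", NAT_BOOL, ("nil", NAT_BOOL)),
            ("cons", "Bool", "false", ("nil", "Bool")))
example2 = ("cons", "Bool", "false", "true")
example3 = ("isnil", "Bool", ("nil", NAT_BOOL))

print(typeof(example1))                              # ('List', 'Bool')
print(typable(example2), typable(example3))          # False False
```

Example 2 fails because true is not a list, and Example 3 fails because the annotation on isnil does not match the element type of its argument.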

Typing Relation

$$\dfrac{\Gamma \vdash t_1 : T_1 \qquad \Gamma, x : T_1 \vdash t_2 : T_2}{\Gamma \vdash \text{let } x = t_1 \text{ in } t_2 : T_2} \qquad \dfrac{x : T \in \Gamma}{\Gamma \vdash x : T}$$

- let z=true in cons[Bool] z (cons[Bool] z nil[Bool]) : ?
  - ∅ ⊢ true : Bool
  - z:Bool ⊢ cons[Bool] z (cons[Bool] z nil[Bool]) : ?
  - z:Bool ⊢ z : Bool and z:Bool ⊢ nil[Bool] : List Bool
  - z:Bool ⊢ cons[Bool] z nil[Bool] : List Bool
  - z:Bool ⊢ cons[Bool] z (cons[Bool] z nil[Bool]) : List Bool
  - ∅ ⊢ let z=true in cons[Bool] z (cons[Bool] z nil[Bool]) : List Bool
- Note: ∅ ⊢ t : T is typically written simply as t : T

Extended Typing Relation

- Need to include Γ in all rules; e.g.

$$\Gamma \vdash \text{true} : \text{Bool} \qquad \dfrac{\Gamma \vdash t_1 : T_1 \qquad \Gamma \vdash t_2 : \text{List } T_1}{\Gamma \vdash \text{cons}[T_1]\ t_1\ t_2 : \text{List } T_1}$$

- Γ also needed for functions and function applications (function body should be evaluated under bindings for the function parameters)
  - But, we have no time for this discussion
- In this generalized type system, as before, each term has at most one type, and a well-typed term will not get stuck (safety)
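Threading the type environment through a checker might look as follows. This is a sketch under stated assumptions: Γ is modeled as a Python dict from names to types (the slides describe a sequence of pairs), a variable is ("var", name), and a let binding is ("let", name, t1, t2). Only the rules needed for the example are included, and all names are hypothetical.

```python
class IllTyped(Exception):
    """Raised when no typing rule applies to a term."""

def typeof(env, t):
    """Type of t under the bindings in env (the typing context)."""
    if t in ("true", "false"):
        return "Bool"
    if t == "0":
        return "Nat"
    op = t[0]
    if op == "var":
        if t[1] in env:                              # x:T is in the context
            return env[t[1]]
    elif op == "let":
        _, x, t1, t2 = t
        if x not in env:                             # x must not already be bound
            T1 = typeof(env, t1)
            return typeof({**env, x: T1}, t2)        # check the body under env, x:T1
    elif op == "nil":
        return ("List", t[1])
    elif op == "cons":
        T = t[1]
        if typeof(env, t[2]) == T and typeof(env, t[3]) == ("List", T):
            return ("List", T)
    raise IllTyped(t)

z = ("var", "z")
body = ("cons", "Bool", z, ("cons", "Bool", z, ("nil", "Bool")))
program = ("let", "z", "true", body)

print(typeof({}, program))                           # ('List', 'Bool')
print(typeof({"z": "Bool"}, body))                   # ('List', 'Bool')
```

Checking body under the empty environment raises IllTyped, since z is unbound there.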

Subtypes

- Subtypes play an important role in many languages (e.g. object-oriented ones)
- S is a subtype of T, written S ≤ T, if any term of type S can be safely used in any situation where a term of type T is expected
  - Principle of safe substitution
  - Subsumption rule:
    $$\dfrac{\Gamma \vdash t : S \qquad S \le T}{\Gamma \vdash t : T}$$
- Simple interpretation is that the elements of S form a subset of the elements of T
- We will define the subtype relation with the help of inference rules

Subtype Relation

- Reflexivity, transitivity, top type:
  $$S \le S \qquad \dfrac{S \le U \qquad U \le T}{S \le T} \qquad S \le \text{Top}$$
- Depth subtyping for records:
  $$\dfrac{S_1 \le T_1 \qquad S_2 \le T_2 \qquad \dots \qquad S_n \le T_n}{\{ l_1 : S_1, l_2 : S_2, \dots, l_n : S_n \} \le \{ l_1 : T_1, l_2 : T_2, \dots, l_n : T_n \}}$$
- Width subtyping for records:
  $$\{ l_1 : T_1, l_2 : T_2, \dots, l_n : T_n, l_{n+1} : T_{n+1} \} \le \{ l_1 : T_1, l_2 : T_2, \dots, l_n : T_n \}$$
- Example: {x:Nat} is the set of all records that have a field x:Nat, and possibly some other fields. {x:Nat,y:Bool} is the set of all records that have a field x:Nat, a field y:Bool, and possibly some other fields. Thus, {x:Nat,y:Bool} ≤ {x:Nat}
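The subtype rules can be folded into one recursive check. This is an algorithmic reading and an assumption of this sketch, not something stated on the slides: repeated use of the width rule together with transitivity means that the fields of the supertype must be a prefix of the fields of the subtype, and depth subtyping is applied field by field. Record types are encoded as ("Record", ((label, type), ...)) with ordered labels; is_subtype is a hypothetical name.

```python
def is_subtype(S, T):
    """Decide S <= T for base types, Top, and ordered record types."""
    if S == T or T == "Top":                         # reflexivity, top type
        return True
    if (isinstance(S, tuple) and isinstance(T, tuple)
            and S[0] == "Record" and T[0] == "Record"):
        s_fields, t_fields = S[1], T[1]
        # width: S may have extra trailing fields; depth: compare field types
        return len(s_fields) >= len(t_fields) and all(
            s_label == t_label and is_subtype(s_type, t_type)
            for (s_label, s_type), (t_label, t_type) in zip(s_fields, t_fields))
    return False

X = ("Record", (("x", "Nat"),))
XY = ("Record", (("x", "Nat"), ("y", "Bool")))
NESTED_XY = ("Record", (("r", XY),))
NESTED_X = ("Record", (("r", X),))

print(is_subtype(XY, X))                             # True: width subtyping
print(is_subtype(X, XY))                             # False
print(is_subtype(NESTED_XY, NESTED_X))               # True: depth subtyping
```

Transitivity needs no separate case here because the prefix check already covers chains of width and depth steps.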

Should the Order of Labels Matter?

$$\dfrac{\{ k_1 : S_1, \dots, k_n : S_n \} \text{ is a permutation of } \{ l_1 : T_1, \dots, l_n : T_n \}}{\{ k_1 : S_1, \dots, k_n : S_n \} \le \{ l_1 : T_1, \dots, l_n : T_n \}}$$

- The rule says that the order of labels (fields) in a record does not matter: e.g. {x:Nat,y:Bool} is a subtype of {y:Bool,x:Nat} and vice versa
- Problem: this is bad for run-time performance
  - If we fix the order of the labels, we would know, at compile time, the offset of the field with label ln, which allows efficient access for t.ln
  - But with permutation, at run time we need to “search” in memory for the actual location of ln

Functions and Subtypes

- Function types: T1 → T2
  - For a term of type T1, the result of applying the function on this term is of type T2
- Subtyping: contravariant for the parameter, covariant for the result

$$\dfrac{T_1 \le S_1 \qquad S_2 \le T_2}{S_1 \to S_2 \le T_1 \to T_2}$$

- Function f of type S1 → S2 accepts an argument of S1, so it should be OK with an argument of T1. Returns a value of S2, so f(…) can be used anywhere where T2 is expected. So, f is also of type T1 → T2
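The function rule adds one case to a recursive subtype check: the parameter types are compared in the opposite direction from the result types. In this sketch a function type T1 → T2 is encoded as ("Fun", T1, T2), records are ordered and use the prefix reading of width subtyping, and the permutation rule is deliberately left out. These encodings and the name is_subtype are assumptions.

```python
def is_subtype(S, T):
    """Decide S <= T for base types, Top, ordered records, and function types."""
    if S == T or T == "Top":
        return True
    if not (isinstance(S, tuple) and isinstance(T, tuple) and S[0] == T[0]):
        return False
    if S[0] == "Record":
        s_fields, t_fields = S[1], T[1]
        return len(s_fields) >= len(t_fields) and all(
            s_label == t_label and is_subtype(s_type, t_type)
            for (s_label, s_type), (t_label, t_type) in zip(s_fields, t_fields))
    if S[0] == "Fun":
        _, S1, S2 = S
        _, T1, T2 = T
        # contravariant in the parameter (T1 <= S1), covariant in the result (S2 <= T2)
        return is_subtype(T1, S1) and is_subtype(S2, T2)
    return False

X = ("Record", (("x", "Nat"),))
XY = ("Record", (("x", "Nat"), ("y", "Bool")))

f_type = ("Fun", X, XY)        # accepts any record with x, returns a record with x and y
wanted = ("Fun", XY, X)        # caller passes {x,y} records and only needs x back

print(is_subtype(f_type, wanted))                    # True
print(is_subtype(wanted, f_type))                    # False
```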

Tuples and Lists

- n-tuples can be thought of as a special case of records with labels 1, 2, …, n
  - Essentially, same typing rules
- Lists:
  $$\dfrac{S_1 \le T_1}{\text{List } S_1 \le \text{List } T_1}$$
- Allows the creation of heterogeneous lists: e.g. cons[{x:Nat}] {x=0} (cons[{x:Nat,y:Bool}] {x=0,y=true} nil[{x:Nat,y:Bool}])
  - For the inner expression: cons … : List {x:Nat,y:Bool}
  - Subsumption rule: give it type List {x:Nat}
  - Only then we can type the outer cons …

Casting

- (T) t in Java and C++
- Up-cast: a term is “forced” to a supertype of the type the typechecker would choose for it
  $$\dfrac{\Gamma \vdash t : T}{\Gamma \vdash (T)\ t : T}$$
  - If Γ ⊢ t : S and S ≤ T, use this and the subsumption rule to derive Γ ⊢ (T) t : T
- Down-cast: force a type that cannot be determined statically
  $$\dfrac{\Gamma \vdash t : S}{\Gamma \vdash (T)\ t : T}$$
  - The programmer says to the typechecker: “I know this will be the type; trust me”
  - “trust but verify” e.g. run-time checks in Java

Polymorphism

- Poly = many, morph = form
  - A piece of code has multiple types
- Example 1: subtype polymorphism
  - Subsumption rule: a term has multiple types
  - Typical for object-oriented languages
- Example 2: parametric polymorphism
  - E.g. f(x)=x has types Bool→Bool, Nat→Nat, …
  - Use a type parameter T and type T→T
  - Examples: generics in C++ and Java, ML-style polymorphism in functional languages
- Example 3: ad hoc polymorphism, e.g. overloading

Terminology

- Statically typed language: compile-time analyses
  - Prove the absence of certain type-related bad run-time behaviors (C, C++, Java, ML, Haskell,…)
  - Type safety: all bad behaviors of certain kinds are excluded (e.g. Java, but not C)
- Dynamically typed language: run-time checks to catch bad behaviors (e.g. Lisp, Scheme)
- Language safety: cannot “break” the fundamental abstractions (type-related and otherwise); e.g. no buffer overflows, seg faults, return address overwriting, garbage values due to type errors, etc.
  - C: unsafe; Java: safe, static+dynamic checking; Lisp: safe, dynamic checking
