Nullable Type Inference
Total Page:16
File Type:pdf, Size:1020Kb
Nullable Type Inference Michel Mauny Benoît Vaugon Unité d’Informatique et d’Ingénierie des Systèmes (U2IS) ENSTA-ParisTech France {michel.mauny,benoit.vaugon} <at> ensta-paristech.fr We present type inference algorithms for nullable types Now, the compilation of Some(expr) needs to generate a in ML-like programming languages. Starting with a sim- test, in order to use the special representation of Somen(None) ple system, presented as an algorithm, whose only inter- when expr evaluates to None, and pattern-matching against est is to introduce the formalism that we use, we replace Some=None also needs to be adjusted. unification by subtyping constraints and obtain a more in- teresting system. We state the usual properties for both systems. This is work in progress. This paper does not aim at opposing nullable types to op- tion types. Options, in the Hindley-Milner type discipline, offer not only type safety, but also precision by distinguish- 1 Nullable vs. option types ing Some(None) from None, but at the price of a memory al- location or a dynamic test for Some. On the other hand, nul- Imperative programming languages, such as C or Java deriva- lable types extend any classical type t into t?, to include NULL. tives, make abundant use of NULL either as a value for un- Such “nullable values” are easier to represent and compile known or invalid references, or as failure return values. Us- than options, but offer less precision since it makes no sense NULL if ing is rather practical, since the statement suffices for to extend further t?. Also, their static inference haven’t re- checking NULL-ity. Of course, the downside of having NULL as ceived much attention, so far. Indeed, although quite a few a possible value is that, without further support, it could acci- recent programming laguages statically check the safety of dentally be confused with a legal value, leading to execution NULL [3, 2, 11, 1], none of them really performs type infer- errors [7]. ence in the ML sense, but rather local inference, propagat- In languages using the ML type discipline, the option type ing mandatory type annotations of function parameters in- type α option = None | Some of α side the function bodies. injects regular values and the nullary data constructor None, into a single type that could be thought as a nullable type. 2 Nullable type inference The type system guarantees that options cannot be confused with regular values, and pattern-matching is used to check and extract regular values from options. In Haskell, the The purpose of this work is to study type inference of nul- “maybe” data type is heavily used for representing successes lable types, by adding them as a feature in a small functional and failures. The JaneStreet Core library [9] uses the option language. The language that we consider, given in figure 1, type to show possible failures in function types: it purposely is a classical mini-ML, extended with NULL test and creation. avoids exceptions, because their non-appearance in OCaml Section 3 starts with a naïve approach, where the types τ types hides the partiality of functions. In OCaml, None and Some are automatically inserted by the compiler for dealing c ::= () true f alse 0 1 2 ::: + ::: with optional arguments [5]. j j j j j j j j e ::= c x λx:e e e if e then e else e Using the option data type presents the disadvantage of j j j 1 2 j 1 2 3 let x = e in e allocating memory blocks for representing Some(_) values, j 1 2 NULL case e1 of NULL e2 x e3 which may refrain experienced programmers from heavily us- j j k ing options. Avoiding those memory allocations is not really difficult: if Some(v) is represented as v itself, that is, without Figure 1: The language allocating a block tagged as Some, we need a special represen- tation for None, distinct from the representation of v, for any v. Of course, when v is None itself, it then becomes impossi- (that are assigned to expressions e) are pairs (t; ν) of a usual ble to distinguish Some(None) from None. Therefore, None is type t and a “nullability” type information ν. A type (t; ?) cor- not the only special case that needs a special treatment: the responds to values that may be NULL, whereas (t; ∆) denotes whole range of Somen(None) values, for n 0 needs a spe- values that cannot be NULL. Nullability variables are written cial representation such that Somei(None) cannot≥ be confused δ. This system is mainly used to introduce the formalism that with Somej(None) when i , j. we use for writing our algorithms. For instance, unaligned addresses on 64-bits architectures, Section 4 shows a translation algorithm that encode nul- or a statically pre-allocated array of a sufficient size could do lable values with polymorphic variants. Typing the trans- the job1. lated programs with a unification-based mechanism suffers the same weakness as our naive type system. 1Although polymorphic recursion theoretically allows for unbounded depth of Somen(None) while this representation allows for representing only Section 5 presents a more sophisticated typing mechanism, a finite number of n, this limitation should never be met in practice. where unification is replaced by subtyping constraints. Submitted to ML 2014 2 TCONST TINSTVAR TINST(α) Φ τ = T(c) B Φ0 Φ τ0 = τ B Φ0 let α0 fresh Φ; Γ x : σ[α α0] x : τ B Φ0 ` ` ⊕ ` Φ; Γ c : τ B Φ0 Φ; Γ x : τ0 x : τ B Φ0 Φ; Γ x : α.σ x : τ B Φ0 ` ⊕ ` ⊕ 8 ` TINST(δ) let δ0 fresh Φ; Γ x : σ[δ δ0] x : τ B Φ0 ⊕ ` Φ; Γ x : δ.σ x : τ B Φ0 ⊕ 8 ` TLAMBDA let α ; δ ; α ; δ fresh Φ; Γ x :(α ; δ ) e :(α ; δ ) B Φ0 Φ0 τ = (α ; δ ) (α ; δ ) B Φ00 1 1 2 2 ⊕ 1 1 ` 2 2 ` 1 1 ! 2 2 Φ; Γ λx:e : τ B Φ00 ` TAPP let α, δ fresh Φ; Γ e : ((α, δ) τ); ∆) B Φ0 Φ0; Γ e :(α, δ) B Φ00 ` 1 ! ` 2 Φ; Γ e e : τ B Φ00 ` 1 2 TLET let α, δ fresh Φ; Γ e :(α, δ) B Φ0 Φ0; Γ x : gen(Φ0; Γ; (α, δ)) e : τ B Φ00 ` 1 ⊕ ` 2 Φ; Γ let x = e in e : τ B Φ00 ` 1 2 TIFTHENELSE TNULL Φ; Γ e :(bool; ∆) B Φ0 Φ0; Γ e : τ B Φ00 Φ00; Γ e : τ B Φ000 let α fresh Φ τ = (α, ?) B Φ0 ` 1 ` 2 ` 3 ` Φ; Γ if e then e else e : τ B Φ000 Φ; Γ NULL : τ B Φ0 ` 1 2 3 ` TCASE let δ fresh Φ; Γ e :(t; ?) B Φ0 Φ0; Γ e : τ B Φ00 Φ00; Γ x : δ:(t; δ) e : τ B Φ000 ` 1 ` 2 ⊕ 8 ` 3 Φ; Γ case e1 of NULL e2 x e3 : τ B Φ000 ` k Figure 3: Typing rules gen(Φ; Γ; τ) = gen0(Φ(Γ); Φ(τ)) τ ::= (t; ν) t ::= tb α τ1 τ2 µα.τ j j ! j gen0(Γ; σ) = gen0(Γ; α.σ) when α FTV(σ) α < FTV(Γ) tb ::= unit bool int ν ::= ? δ ∆ 8 2 ^ j j j j gen0(Γ; σ) = gen0(Γ; δ.σ) when δ FTV(σ) δ < FTV(Γ) 8 2 ^ gen0(Γ; σ) = σ otherwise Figure 2: The types Figure 4: Generalisation 3 A simple type system or merge (EQMERGE) variable bindings in the Φ substitution. EQNEW , installs a binding in the substitution We first present a rather simple type system, where the types (α) α = t0 Φ[α (that is, in which free occurences of are replaced by carry a “nullability information” saying whether the value of t0] Φ α ) when there is no previous binding about in . Here, an expression may be NULL or not. t0 α Φ t0 is either t or t, depending on whether occurs or not in t. The judgements of our language’s type system are of the µα. α EQNEW(δ) does the same, in a simpler context, on nullability form Φ; Γ e :(t; ν) B Φ , where Φ and Φ are substitutions, Γ 0 0 variables. is a type environment` that maps program identifiers to type EQMERGE merges a new constraint t to a previous schemes (types with a prenex universal quantification of type (α) α = 0 t occuring in . Here, the resulting substitution is and nullability variables). Such a judgement should be read α = Φ Φ0 obtained by resolving the constraint t = t . EQMERGE(δ) does as: given Φ, under assumption Γ, the expression e has type τ 0 the same for nullability variables. with substitution Φ . 0 Correctness. The operational semantics needed for stating The rules should be read as the different cases of an al- the correction of the type system is also classical. NULL cannot gorithm that, given Φ, Γ, e, and τ, computes substitution Φ 0 handle operations such as being called as a function, tested which, when applied to τ and Γ, assign the type Φ (τ) to e. In 0 as a boolean, added as an integer, etc. The typing of values, other words, under assumptions Φ (Γ), e has type Φ (τ). 0 0 procuded by correct executions, is also slightly changed: the Because we have two kinds of variables, we have two in- type (t; ?) includes NULL as well as regular values.