Qualifier Type Inference

We present the full qualifier inference system in this section. annotates the expression e with the qualifier Q. The expression Our system extends the flow-sensitive analysis of Foster et al. check(e, Q) requires the top-level qualifier of e to be at most Q. [1]. In particular, we consider pair types (and more generally We automatically insert the check expressions through a simple records) and present their corresponding type inference rules. program transformation. Specifically, we consider two types of Providing separate qualifiers for the elements of pairs is use as security critical: pointer dereferences and conditional important in our problem domain, as records ( structs) branches. To detect UBI, we insert a check(e, init) statement are used extensively in the . More importantly, before every statement where e is dereferenced or is used as pointers to records are often passed between functions and the predicate of a conditional branch. whether a field of a record is or is not initialized is independent of the other fields of the record. We present a type qualifier B. Types and Type Stores inference system to infer a qualifier (either init or uninit) for We now define the qualified types. each expression of the program. τ := Q σ A. Syntax Q := κ | init | uninit 0 0 σ := int | ref (ρ) | (C, τ) → (C , τ ) | hτ1, τ2i Our qualifier inference is performed after alias analysis. The C :=  | Alloc(C, ρ) | Assign(C, ρ: τ) alias analysis results are used to decorate aliased references with | Merge(C,C0,L) | Filter(C,L) the same abstract locations ρ. This can be the line number of η := 0 | 1 | ω an object allocation statement. In the input programs, reference creation expressions are decorated with abstract locations and The qualified types τ can have qualifiers at different levels. functions are decorated with effects (i.e., the set of abstract Q can be a qualifier variable κ or a constant qualifier init locations that they access). The abstract syntax is defined as or uninit. The flow-sensitive analysis associates a ground follows: store C to each program point that is a vector that associates abstract locations to qualified types. Thus, function types are e := x | n | λL x: t. e | e e | refρ e | !e 1 2 now extended to (C, τ) → (C0, τ 0) where C is the store that | e := e | he , e i | fst(e) | snd(e) 1 2 1 2 the function is invoked in and C0 is the store when the function | fst(e ) := e | snd(e ) := e | 1 2 1 2 returns. | assert(e, Q) | check(e, Q) Each location in a store C also has an associated linearity t := α | int | ref (ρ) | t →L t0 | ht , t i 1 2 η that can take three values: 0 for unallocated locations, L := {ρ, .., ρ} 1 for linear locations, and ω for non-linear locations. An An expression e can be a variable x, a constant integer n, a abstract location is linear if the can prove that it function λL x: t. e with argument x of type t, effect set L and corresponds to a single concrete location in every execution. An body e. The effect set L is the set of abstract locations ρ that update that changes the qualifier of a location is called a strong the function accesses. A type t is either a type variable α, an update; otherwise, it is called a weak update. Strong updates integer type int, a reference ref (ρ) (to the abstract location can be applied to only linear locations. The three linearities ρ), a function type t →L t0 (that is decorated with its effects form a lattice 0 < 1 < ω. Addition on linearities is as follows: L) or a pair type ht1, t2i. The analysis will involve a store 0 + x = x, 1 + 1 = ω, and ω + x = ω. The type inference C that maps abstract locations ρ to types. The expression system tracks the linearity of locations to allow strong updates e1 e2 is the application of function e1 to argument e2. The for only the linear locations. ρ reference creation expression ref e (decorated with the abstract Since a store C maps from each abstract location ρi to a location ρ) allocates memory with the value e. The expression type τi and a linearity ηi, we write C(ρ) as the type of ρ !e dereferences the reference e. The expression e1 := e2 in C and Clin (ρ) as the linearity of ρ in C. Store variables assigns the value of e2 to the location e1 points to. The are denoted as . We use the following store constructors to expression he1, e2i is the pair of e1 and e2. The expressions represent the store after an expression as a function of the store fst(e) and snd(e) are the first and second elements of the pair e before it. Alloc(C, ρ) returns the same store as C except for respectively. The expressions fst(e1) := e2 and snd(e1) := e2 the location ρ. Allocating ρ does not affect the types in the assign the value of e2 to the first element and second elements store; however, as ρ is allocated once more, the linearity of ρ 0 of the location e1 points to respectively. is increased by one. Merge(C,C ,L) returns the combination We use explicit qualifiers to both annotate and check the of stores C and C0; for a location ρ, if ρ ∈ L, then its type and initialization status of expressions. The expression assert(e, Q) linearity are taken from C, otherwise from C0. Filter(C,L) ρ Alloc(C, ρ0)(ρ) =C(ρ) type of in the post-store. The qualifier of the new location n1 + C (ρ) if ρ = ρ0 is initialized. The rule DEREF checks that the dereferenced Alloc(C, ρ0) (ρ) = lin lin C(ρ) otherwise expression is of a reference type ref (ρ) and retrieves the type nC(ρ) if ρ ∈ L Merge(C,C0,L)(ρ) = C(ρ0) otherwise of the value stored at the location ρ from the store. Qualifiers nC (ρ) if ρ ∈ L are checked by the single check expression described before Merge(C,C0,L) (ρ) = lin lin C0 (ρ) lin otherwise (and not when references are dereferenced). The rule ASSIGN Filter(C,L)(ρ) =C(ρ) ρ ∈ L nC (ρ) if ρ ∈ L checks that the left-hand side expression is of a reference type Filter(C,L) (ρ) = lin lin 0 otherwise and checks that the type of the right-hand side is a subtype of 0 0 0 τ where τ  τ if ρ = ρ ∧ Clin (ρ) 6= ω 0 0 the type of the value that the reference stores. It also checks Assign(C, ρ : τ)(ρ) = τ t C(ρ) if ρ = ρ ∧ Clin (ρ) = ω C(ρ) otherwise that the right-hand side can be assigned to the left-hand side 0 Assign(C, ρ : τ)lin (ρ)=Clin (ρ) considering the linearity and type of the left-hand side reference and the type of the right-hand side expression (as described in restricts the domain of C to L. Assign(C, ρ: τ) overrides C the definition of Assign above). The rule LAM type-checks the by mapping ρ to a type τ 0 such that τ  τ 0. The condition function body e in a fresh initial store  and with the parameter τ  τ 0 allows assigning a subtype τ of resulting type τ 0 to ρ. bound to a type with fresh qualifier variables. The resulting If ρ is linear then its type in Assign(C, ρ : τ) is τ 0; otherwise post-store of the function body C0 should be a subtype of its type is conservatively the least-upper bound of τ and its the post-store of the function 0. This step essentially creates previous type C(ρ). a function summary, which has been explained in the paper The type inference system generates subtyping constraints section 4.3. We use the function sp(t) to decorate a standard between stores. We define store subtyping in Figure 1. type t with fresh qualifier and store variables: sp(α) = κ α κ fresh INT REF Q  Q0 Q  Q0 sp(int) = κ int κ fresh 0 0 sp(ref (ρ)) = κ ref (ρ) κ fresh Q int  Q int Q ref (ρ)  Q ref (ρ) L 0 L 0 0 0 FUN sp(t → t ) = κ (, sp(t)) → ( , sp(t )) κ, ,  fresh 0 0 0 0 0 0 0 Q  Q τ2  τ1 τ1  τ2 C2  C1 C1  C2 sp(ht, t i) = κ hsp(t), sp(t )i κ fresh L 0 0 0 L 0 0 Q (C1, τ1) → (C1, τ1)  Q (C2, τ2) → (C2, τ2) The rule APP checks that the type of e2 is a subtype STORE 0 0 of the parameter type of e1, Further, with the condition τi  τi ηi  ηi i = 1..n Filter(C,L)  , it checks that state of the locations that 0 0 η1 ηn η1 0 ηn 0 00 {ρ1 : τ1, ..., ρn : τ1}  {ρ1 : τ1, ..., ρ1 : τn} e1 uses (captured by its effect set L) in the post-store C of e2 PAIR are compatible with the store  that the function e expects. The 0 0 0 1 Q  Q τ1  τ1 τ1  τ2 resulting store Merge(0,C00,L) joins the store C00 before the 0 0 0 0 Q hτ1, τ2i  Q hτ1, τ2i function call with the result store  of the function. Filtering Fig. 1: Store subtyping. and merging according to the effect set provides polymorphism Constraints between stores yield constraints between lineari- as functions do not affect the locations they do not use. The ties and types, which in turn yield constraints between qualifiers rule ASSERT adds a qualifier annotation to the program, and 0 and linearities. The rule INT requires a corresponding the rule CHECK checks that the top-level qualifier Q of e is subtyping relation for the qualifiers of the type int. The rule more specific or equal to the the expected qualifier Q. REF requires the same subtyping relation between qualifiers The rule PAIR type-checks the expressions e1 and e2 in and further, the equality of the two locations. The rule FUN order and results in an initialized pair type. The rule FST requires the subtyping relation between the top-level qualifiers, checks that the expression e is of a pair type and types fst(e) and contra-variance for the argument and input store and co- as the first element of the pair type. The qualifier Q of the pair variance for the return value and output store. The rule STORE type is unconstrained; qualifiers are only checked by the check requires both subtyping and stronger linearity for corresponding expressions presented above. The rule FSTASSIGN checks that locations. The rule PAIR requires subtyping between the top- the expression e1 is of a reference type ref (ρ), the post-store 00 level qualifiers, and also subtyping for corresponding elements C (after checking e1 and e2) maps the reference ρ to a of the two pair type. supertype of a pair type κ hα1, α2i, and the type τ1 of e2 is a subtype of α1. The resulting store remaps ρ to a new pair type C. Type Inference System where the first element is the type of τ1 and the second element We present the complete rules of the type inference system is unchanged. More precisely, as described in the definition of in Figure 2. The judgments are of the form Γ,C ` e: τ, C0 Assign above, the Assign store updates ρ to the new pair type that is read as in type environment Γ and store C, evaluating if ρ is linear; otherwise updates ρ to the least upper bound e yields a result of type τ and a new store C0. The rules VAR of the old and new pair types. We elide the rules for snd that and INT are standard. The rule REF creates a location and are similar to the rules for fst. The constraints generated by adds it to the store. The type τ of the expression e that is the new rules PAIR, FST and FSTASSIGN are type and store stored in the new location is constrained to be a subtype of the subtyping constraints that the previous rules generated too. VAR INT REF DEREF x ∈ dom(Γ) κ fresh Γ,C ` e: τ, C0 τ  C0(ρ) Γ,C ` e : Q ref (ρ),C0 Γ,C ` x : Γ(x),C Γ,C ` n : κ int, C Γ,C ` refρe: κ ref (ρ), Alloc(C0, ρ) Γ,C ` !e : C0(ρ),C0

ASSIGN LAM 0 0 00 00 0 0 0 0 0 Γ,C ` e1: Q ref (ρ),C Γ,C ` e2: τ, C τ  C (ρ) τ = sp(t) ,  , κ fresh Γ[x 7→ τ],  ` e : τ ,C C   00 L L 0 0 Γ,C ` e1 := e2: τ, Assign(C , ρ: τ) Γ,C ` λ x : t.e : κ(, τ) → ( , τ ),C

APP ASSERT L 0 0 0 0 00 00 0 0 0 Γ,C ` e1 : Q(, τ) → ( , τ ),C Γ,C ` e2 : τ2,C τ2  τ F ilter(C ,L)   Γ,C ` e : Q σ, C Q  Q 0 0 00 0 Γ,C ` e1 e2 : τ , Merge( ,C ,L) Γ,C ` assert(e, Q): Q σ, C

CHECK PAIR FST 0 0 0 0 0 00 0 Γ,C ` e : Q σ, C Q  Q Γ,C ` e1: τ1,C Γ,C ` e2: τ2,C Γ,C ` e: Q hτ1, τ2i,C 0 0 00 0 Γ,C ` check(e, Q): Q σ, C Γ,C ` he1, e2i: κ hτ1, τ2i,C Γ,C ` fst(e): τ1,C

FSTASSIGN 0 0 00 00 Γ,C ` e1: Q ref (ρ),C Γ,C ` e2: τ1,C κ hα1, α2i  C (ρ) τ1  α1 κ, α1, α2 fresh 00 00 Γ,C ` fst(e1) := e2: τ1, Assign(C , ρ: hτ1, snd(C (ρ))i)

Fig. 2: Type inference system.

Further, by the rule PAIR, the subtyping constraints between types such that S satisfies the constraints C. In other words, pair types are decomposed into subtyping constraints between substituting each variable in the constraints C with its mapping qualifier and simpler types that are inductively decomposed in S results in valid constraints. If there is a solution S for the into constraints between qualifiers and linearities. Thus, the constraints C then the evaluation of e cannot get stuck. The added inference rules do not increase the complexity of the evaluation of an expression can get stuck if a non-reference generated constraints. value is dereferenced, a value is assigned to a non-reference value, a value of a mismatching type is assigned to a reference . Soundness to a location of a specific type, the parameter of a function The type inference has the following soundness property. is instantiated with an argument of a mismatched type, and Consider a given expression e. Consider the set of conditions C more importantly a qualifier check or assertion fails, i.e., the generated during type inference for e in the empty environment qualifier of a value is not a subtype of the expected qualifier. and empty store i.e., the constraints generated to derive the 0 0 judgment ∅, ∅ ` e: τ, C for some type τ and store C .A REFERENCES solution S for the constraints C is a mapping from store [1] Foster, J.S., Terauchi, T. and Aiken, A., 2002, May. Flow-sensitive type variables  to concrete stores, from qualifier variables κ to qualifiers. In Proceedings of the ACM SIGPLAN 2002 Conference on concrete qualifiers, and from type variables α to concrete design and implementation (pp. 1-12).