<<

A Kind System for Typed Machine Language

Andrew W. Appel Christopher D. Richards Kedar N. Swadi

Princeton University, October 2002 {appel,richards,kswadi}@cs.princeton.edu

ABSTRACT in ways which require us to keep track of properties like One of the aims of Foundational Proof-Carrying Code (FPCC) contractiveness as well as representability. is to incorporate a completely semantic description of types Finally, to model arithmetic dataflow, we use singleton into the Proof-Carrying Code framework. FPCC describes integer types. When quantifying over such types in dataflow types as complex predicates, some of which require proper- equations, we also wish to keep track of which types are ties like contractiveness, representability,andextensionality singletons. to hold. We keep track of these properties by encoding them The previous approaches to semantics of recursive types within kinds. In this paper, we give a syntactic kinding sys- explain how to prove the contractiveness of a single-argument tem with semantic proofs. type function, and how to compose single-argument contrac- tive and nonexpansive type functions. Unfortunately, in the type systems we use in real life, type expressions often use 1. INTRODUCTION many type variables at once. That is, deep inside a type Some of the early frameworks for Proof-Carrying Code (PCC) expression such as [12] assumed the soundness of the typing rules for a particu- ×∃ × lar . Foundational Proof-Carrying Code (FPCC) Λα. rec β.α γ.(γ β) [3] reduces the size of the Trusted Computing Base by giv- there are subexpressions with several free variables. These ing semantics to types and instructions in terms of higher- are not simply compositions of one-argument type functions. order logic and arithmetic facts, and using these semantics What we need is an organizing principle for applying the in machine-checked proofs of the typing rules. semantic ideas for recursive types or mutable references to In the FPCC project at Princeton, we have a type sys- nontrivial type expressions. tem targeted towards compilation, with safety proofs, of In this paper, we present a kinding system to track these languages such as Java and ML. Among the type operators properties and ensure that all our types are well formed. in this system are the recursion operator rec, and the refer- Properties we wish to track are encoded into our kinds, the ence operators box (for immutable references) and ref (for semantics of which are hidden in their definitions. Our kind- mutable references). Recursive types are written as rec(f), ing judgments are backed by machine-checked proofs; some where f is a type function. In the semantic systems of Mac- of these proofs are complete and checked in the Twelf sys- Queen, Plotkin, Sethi [9] and Appel and McAllester [5], the tem; others we are still in the process of implementing from fold/unfold lemma to manipulate recursive types, hand-verified arguments. The syntactic presentation also allows us to achieve modularity; higher layers in our sys- rec f = f(rec f), tem which use kinds and kinding judgments are shielded is provable only if f is contractive.Thatis,f must be the from their semantics. We expect any foundational semantic composition of at least one contractive operator with other model for core ML to require addressing the properties listed contractive and nonexpansive operators. As a result, it be- above, and believe that the kinding system to be useful for comes necessary to keep track of the contractiveness and the all such models. nonexpansiveness of types to ensure well formedness. Our kinding system is restricted to first-order kinds. Since To give a model for mutable references, we use the work our current goal in the FPCC project is to be able to compile of Ahmed et al. [2]. In this model, a reference type ref(τ) core ML, it is enough for us to restrict ourselves to first-order is well formed if τ is representable.TomodelcoreML,we kinds. often require a composition of operators like ∃, ref,andrec We have type functions arising out of the uses of type operators like rec and ∃, and we use de Bruijn indices [7] with explicit substitutions [1] to manage applications of ar- guments to these type functions. Our de Bruijn numbering starts at 0. We do not use an explicit Λ anywhere; in our calculus, the of a type expression (possibly with free variables) is known from the context in which it appears, so the Λ is unnecessary.

1 Examples. a TML type is modeled as a predicate on (ρ, k, v), where ρ is a function from natural numbers to closed types. ref(int) A judgment v :k τ makes sense if τ has no free type A mutable reference to an integer. variables, and is an abbreviation for the semantic formula ∀ρ. τ(ρ, k, v). box(const(6)) An immutable pointer to a value which must be the 2.1 Contractive and nonexpansive types constant 6. To reason about recursive types using the two-way rec τF = τF (rec τF ), we must know that τF is contractive ∃(0 × 0) in its first argument. Intuitively, a F is A pair of values, both of which have the same type. In contractive if it takes more machine instructions to reach our system the cartesian product operator × is not a the bottom of a value of type F (τ)thanitdoestoreachthe primitive, but is instead defined as the intersection of bottom of a value of type τ. Constructors such as box and field types. → are contractive,1 but constructors such as offset, ∩,and∪ are not. Intuitively, a value of type τ1 ∩ τ2 is no deeper than ∩ ∩ field(0, const(3)) field(1, int) field(2, 0) a value of type τ1 or a value of type τ2. However, these op- A 3- containing a constant integer 3, an integer, erators are all nonexpansive, meaning that the composition and a free type variable 0. Because this expression has of offset or ∩ with a contractive operator is contractive.   a free type variable, it must be interpreted in some Formally, we define contractiveness as follows. Let φ k th context that gives a binding for 0. Note that not even be a type representing the k approximation to the type φ, field(i, τ)isprimitive;itisdefinedasoffset(i, box(τ)). φ = λ(j, v).∀j

2 Figure 1 shows the machine-level view of a cons and a following problematic model: nil integer instance of this type. Register r contains zero 1 → → → denoting a nil cell. Register r4 contains a nonzero value Type τ : M Λ v o. (314) denoting the cons cell. This value is interpreted as Memory M : L → v. a pointer to a two-word structure. The first word contains Allocated Set Λ : L → τ. integer 10, and the second contains a pointer to the next cell Values v : {0, 1, ....} of the list. Locations L : {0, 1, ....} The problem is that there is a circularity of definition in Registers Memory the semantic model: types are predicates on alloc-sets, and alloc-sets are predicates on types. Ahmed et al. remedy this situation by making the allocation set a map from location to type syntax, τ . τ is a tree (or G¨odel number) which r1 0 314 10 syn syn gives a syntactic description of the type. Although this re- (nil) ..... 512 moves the circularity, it requires another map from the type syntax to the semantic type. This map is the representation r ..... 4 314 function repr. Given a type-syntax encoding, repr gives the (cons) 512 9 corresponding semantic type. In this new setting we have:

0 Type syntax τsyn : {0, 1, 2, ...} Allocated Set Λ : L → τsyn. repr : τsyn → τ.

Figure 1: Representing nil and cons cells For convenience we define a predicate “representable” such that

representable def= λτ. ∃τ . repr(τ )=τ. Using low-level types, we describe the List type as syn syn

Types like ref(τ) are meaningful in this model only if there  rec (nonzero ∩ offset 0(box 1) is a syntax τsyn for the type τ. Not all types are repre- ∩ offset 1(box 0)) sentable, though, and therefore this check is necessary. On ∪ (const 0)  the other hand, most type operators do not require that their argument-type be representable. Type expressions made only from operators like box and offset are representable and where any nonzero value has the nonzero type. Note that 0, can go inside mutable references. the parameter to rec, appears deep inside the type expression Quantifiers, however, are a more complex. To have within a box operator. The rec constructor requires its ar- meaningful quantified type expressions, we may choose to gument to be contractive in order to be well formed. Due to quantify over representable types, or over all types: this wide separation of the binding of the type variable and its use in the type expression, we require a kinding system def ∀r = λF.λ(M,Λ,v).∀τ. representable(τ) ⇒ (Fτ) which can keep track of the contractive and nonexpansive ∀ def ∀ properties of type variables. a = λF.λ(M,Λ,v). τ. (Fτ) def ∃r = λF.λ(M,Λ,v).∃τ. representable(τ) ∧ (Fτ) ∃ def ∃ 2.2 Representable types a = λF.λ(M,Λ,v). τ. (Fτ)

The inclusion of references in our semantic model is challeng- The operators ∀r and ∃r quantify over representable types, ing. Ahmed, Appel, and Virga [2] give a semantic model of while ∀a and ∃a quantify over all types. references. For this model to work, it is necessary for types The semantic model of ∀r must refer to repr. Therefore, to be representable. Informally, if a type is representable the repr relation cannot itself refer to ∀r, and thus ∀r is then a value with that type can be stored in a mutable cell unrepresentable. On the other hand, the model of ∀a does in memory. We give a brief explanation of what it means not refer to repr, and this means that repr can safely refer for types to be representable. to ∀a;thatis,∀a can be representable. In the Appel-Felty model of types [4], which can model Over which types is it more useful to quantify? It turns immutable pointers and memory allocation but is too weak out we need both. To represent the abstype feature of core to model mutable references, a type is a predicate on mem- ML, we need ∃r, but to represent function closures we need ory M, the allocated set Λ, and a value v. M is a map from ∃a. To represent polymorphic functions we need ∀r. locations to values, while Λ is a finite set of “allocated” lo- We need to keep track of representability of complex type cations. But in conventional syntactic calculi with mutable expressions in an organized way. Representability is encoded references, Λ is a finite map from locations to types; that is, into kinds in our system and the kinding system is then used in addition to telling which locations are allocated, Λ tells externally to impose the representability requirement on the what type of value we may store at each location. These type expressions. syntactic calculi have a judgment M :Λsayingthateach Throughout the 1980s, the interaction of references and element of the memory has an appropriate type. polymorphism caused much trouble in the of A naive extension of the Appel-Felty model to a regime ML. Our semantic model, and the associated kinding sys- where Λ is a function from locations to types, yields the tem, provides a new explanation of this interaction.

3 3. TYPED MACHINE LANGUAGE Kinds κ ::= Ω | Ω | Ω | Ω In our FPCC project, we use Typed Machine Language 0 R C RC (TML) as an between the semantics built on higher- Types τ ::= n | |⊥ order logic and arithmetic, and the syntactic rules used | codeptr(τ) | offset (τ1,τ2) to provide type checking (and safety proofs) for programs. | box(τ) | ref(τ) That is, at the bottom we have higher-order logic (HOL), | rec(τ) which is quite general and completely undecidable. At the | τ1 ∩ τ2 | τ1 ∪ τ2 top, we have a typed assembly language (TAL), which has |∀r τ |∃r τ a syntax-directed type-checking algorithm but is necessarily |∀a τ |∃a τ not very general: the TAL is suitable for a particular pro- | const(τ) gramming language as compiled by a particular compiler | geq(n) | leq(n) | gt(n) | lt(n) for a particular target machine. Our goal is to define the | plus(τ1,τ2) | minus(τ1,τ2) semantics of TAL in terms of HOL, and to prove all the typing rules of TAL as derived lemmas in HOL. We expect Naturals n ::= {0, 1, ....} that these proofs will total about a hundred thousand lines of code, in the LF notation used by the Twelf [13] system. A hundred thousand lines of software must be well modu- Figure 2: TML Kinds and Types larized with appropriate abstraction layers, for it could never be built and maintained otherwise. Our most important ab- straction layer between HOL and TAL is Typed Machine We have used our semantic model to prove all the con- Language. ventional syntactic rules of the explicit substitution calcu- Because TAL must be syntax directed, it cannot include lus, and we have implemented a logic program (in the Twelf fully general intersection and union types, subtyping, quan- system) that reduces expressions containing substitutions. tification, and so on. TML has all the primitives necessary to Rendering F (int)asF [int · id] and substituting gives us construct the application-specific types used in TAL; there-  G = rec (nonzero ∩ offset 0(box int) fore TML subtyping cannot be syntax-directed, or even de- ∩ offset 1(box 0)) cidable. (See Figure 2 for the full repertoire of TML type  ∪ (const 0) primitives.) We will prove the primitive TML subtyping rules in HOL, by hand, and check them by machine; then Now we can apply the UNFOLD rule, we will prove the TAL rules in the TML calculus, mostly by v : F “F is contractive” hand, and check them by machine. Finally we can apply the UNFOLD TAL rules to a given compiled program fully automatically. v : F (rec F ) Although TML is intended for the construction of TAL, which results in the new typing judgment and not to reason about programs directly, in this paper ∩ we will explain TML by applying it directly to machine- r1 :(nonzero offset 0(box int) language programs. ∩ offset 1(box G)) For example, consider the ML program ∪ (const 0) datatype ’a List = Cons of ’a * ’a List | Nil After the conditional goto, where it is known that r1 is not ... zero, we have: case x : ’a List  r1 : nonzero ∩ (nonzero ∩ offset 0(box int) of Cons(h,t) => ... ∩ offset 1(box G)) | Nil => ... ∪ (const 0)  As explained in section 2.1, we encode the datatype as where nonzero is a type containing all the nonzero integers. F = rec (nonzero ∩ offset 0(box 1) Since (nonzero ∩ const 0) = ⊥, and using the distributivity

∩ offset 1(box 0)) law, we can prove  ∪ (const 0) r1 :(offset 0(box int) ∩ offset 1(box G)) where the type expression on the right is an α list. The and thus it is safe (at label ConsCase) to dereference r . program can be compiled to machine code as 1 In FPCC, the UNFOLD rule is a theorem built upon the semantics of the rec type. It requires F to satisfy a con- {r1 : F (int)} tractiveness property, which we have written informally in if r1=0 goto NilCase the presentation above. Our kinding system, described in ConsCase: r2 := mem[r1+0] the next section, captures the semantics of contractiveness ... (and other properties) in a formal way, and hides the se- NilCase: ... mantics under an abstraction layer, leaving only an easily manipulated syntax for the user of TML. The case discrimination implicit in the if-goto instruction requires that r1 belong to a . Indeed, there is 4. KINDING SYSTEM a union buried inside F (int), but first we must beta-reduce Previous sections have motivated the need for a kinding sys- (using the explicit substitution calculus) and then unfold the tem which is able to track the extensionality, representabil- rec. ity, contractiveness, and constancy properties of TML types.

4 Ω Because TML uses de Bruijn indices instead of variables, N our kinding judgment need not mention the xi explicitly. Rather, index i is assumed to have kind κi.Weobtain therefore a judgment of the form

Ω RC κ0,...,κn  τ :: κ

We call the array of kinds to the left of the turnstile Γ, and Ω Ω denote by Γ(i)theith kind in the array—the kind of the R C prospective term to be substituted for de Bruijn index i.

4.2.1 Model I Ω 0 Now we must assign semantics to the judgment Γ  τ :: κ. Let us summarize the concepts at hand and introduce a ty- Figure 3: The TML kinding hierarchy as a lattice. pographical convention. An open type τ may contain as sub- terms some number of free variables represented as de Bruijn indices. Such an open τ may be closed by applying it to an In this section we describe a hierarchy of kinds suited to that environment ρ that maps each index i to type ρ(i). Note purpose, and give these kinds and the kinding judgment a that ρ(i) is itself necessarily a closed type. We use φ to semantic model that allows us to prove the kinding rules as denote closed types such as τ(ρ)andρ(i). lemmas. Last, we describe the kinding system’s implemen- Since we associate a kind to each type variable, it is nat- tation as a logic program, and this program’s use in assisting ural to think of modeling kinds as predicates on closed types. the users of TML in the writing of proofs. Since extensionality, representability, and constancy are prop- 4.1 Kinding hierarchy erties of type expressions, this approach works well: Model I captures all the properties of interest except that of contrac- The kinding system employs a hierarchy of five kinds: Ω0, tiveness. ΩR,ΩC ,ΩRC ,andΩN . Except where noted, the reader Now in this model we render kinds as predicates on closed may assume the presence of a weakening rule where one types φ. Since the kinding judgment Γ  τ :: κ must ulti- kind implies another. mately involve the open type τ, and since const constructs

Ω0 is the most inclusive kind, requiring only that its mem- an open type, some of the following definitions must make bers respect basic properties such as extensionality judicious use of a universally quantified ρ to close the open over equivalent values and states. types. The kinds in this model are defined as follows:

ΩR implies Ω0, and additionally requires its members to def? respect representability. Ω0 = λφ. extensional(φ) def? ΩC implies Ω0, and additionally requires its members to ΩR = λφ. representable(φ) ∧ Ω0(φ) respect contractiveness. While the kind hierarchy in- def? cludes this element for completeness, in practice we ΩN = λφ. ∃n. ∀ρ. φ = const(n)(ρ) have no need of it; it does not figure into the kinding judgments. and the the kinding judgment, thus:

ΩRC implies both ΩR and ΩC .

 

   Ω is the kind of integer singleton types (those constructed def?  N Γ  τ :: κ = ∀ρ. ∀i. Γ(i) ρ(i) ⇒ κ τ(ρ) from const). It can be proved that such types are both representable and contractive, that is, that ΩN implies ΩRC . However it is a matter of taste and style whether Here, as before, ρ is an environment mapping indices to the TML calculus include the corresponding weakening closed type expressions. Hence Γ  τ :: κ says that τ belongs rule. Here we demand that expressions having kind to κ under the assumption that each index i in τ has kind ΩN be explicitly coerced to kind ΩRC , in preference to Γ(i). the weakening rule. The kinding hierarchy thus forms a lattice as per Figure 3. In the figure, the relation depicted is that of weakening. We 4.2.2 Model II show a dotted line from ΩN to ΩRC to denote that while To the detriment of the previous model, contractiveness is such a weakening rule is possible, it remains a matter of not a property of types but rather of functions from types to taste whether to include it. types (hereafter “type functions”). Thus we must reformu- late our kind calculus so that kinds are predicates on type 4.2 Semantics of the kinding judgment functions. In such a formulation, it is simple to produce a In a conventional kinding system the syntax of the kinding definition for ΩC : judgment takes the following form: def x0 :: κ0,...,xn :: κn  τ :: κ ΩC = λf. contractive(f) ∧ Ω0(f)

5 This formulation slightly complicates the definitions of Ω0, label : ΩR,andΩN . The solution is to require the type function head <- subgoal_1 <- ... <- subgoal_n . under consideration to yield a result that is extensional, rep- resentable, or constant, respectively, regardless of its argu- whereas a traditional Prolog system would use the syntax ment. Thus we write:

head :- subgoal_1 , ... , subgoal_n .  def ∀ 

Ω0 = λf. φ. extensional f(φ) Twelf permits the use of higher-order abstract syntax, but  def  we have no need of it here. ΩR = λf. ∀φ. representable f(φ) ∧ Ω0(f) The user of the TML calculus would run the logic program def ΩN = λf. ∃n. ∀φ. ∀ρ. f(φ)=const(n)(ρ) by issuing to Twelf a directive of the form For the revised formulation of the kinding judgment we re- tau = union tain the form of the previous incarnation. However, since @ (intersection the new regime raises the subjects of the kinding judgment @ nonzero from closed types to functions on closed types, we replace ρ, @ (offset @ 0 @ (box @ (var @ 1))) a map from integers to closed types i,withR,amapfrom @ (offset @ 1 @ (box @ (var @ 0)))) integers to functions on closed types. Similarly, on the right @ (const @ 0) . hand side of the implication arrow we replace τ(ρ)withthe analogous type function constructed using R. %solve witness : |-wellformed Gamma tau Kappa .

Γ  τ :: κ def=

  

 Ordinarily, tau is ground (the type expression of interest

    ∀R. ∀i. Γ(i) R(i) ⇒ κ λφ. τ λi. R(i)(φ) supplied by the user), and Gamma and Kappa are unification variables. Should the logic program succeed, Twelf instanti- We are left with the problem of how to relate each R(i) ates Gamma and Kappa. Furthermore, Twelf binds to witness to de Bruijn index ρ(i). To solve this problem, consider the the witness to the program’s success, a tree with nodes la- right hand side of the implication in the new kinding judg- beled with the names of clauses. The witness and the in- ment. To relate type function R(i)toi, we can construct a stantiated Gamma and Kappa may then be used in subsequent ρ using R in its definition, such that R(i) perturbs whatever expressions of concern to the TML user. The user of the would otherwise have been the image of i (here written as TML calculus is thus free to expend effort on interesting φ): problems, leaving tedious syntax-directed judgments to the def computer. ρ0 = λi. R(i)(φ) Naturally, we wish this ρ0 to be the environment used to 5. SEMANTIC PROOFS OF LOGIC

close τ:  def  PROGRAMS φ0 = τ λi. R(i)(φ) Using the Twelf logical framework allows us the convenience Now by λ-abstracting φ, we obtain a function on closed types to express our definitions, theorems and proof rules all in a suitable for offering to kind κ for judgment. uniform way. For example, the definition for the kind ΩR is 4.3 Kinding prover omega_R : We have implemented the inference rules for the kinding tm tml_kind = judgment (see Figure 4) as a logic program in the Twelf sys- lam [f] omega_0 @ f and tem. Each inference rule for the kinding judgment appears forall [tau] reppable @ repr @ tau imp as a clause in the logic program. For example, consider the reppable @ repr kinding rules for box and rec, which we write in notation as @ (lam [sigma] f @ (tau @ sigma)).

Γ  τ :: ΩR ΩR, Γ  τ :: ΩRC This statement in Twelf defines “omega_R”tobeofTwelf , . metalogic type tm tml_kind, and having a definition follow- Γ  box(τ)::ΩRC Γ  rec(τ)::ΩRC ing the “=” sign. Consider the kinding rule which states that In the logic program, these inference rules are realized as the union of two types of kind ΩRC is also of kind ΩRC . clauses thusly: τ1 :: ΩRC τ2 :: ΩRC |-wellformed_box : Ω ∪ RC τ1 ∪ τ2 :: ΩRC |-wellformed Gamma (box @ Tau) omega_RC <- |-wellformed Gamma Tau omega_R . Below is the corresponding semantic lemma in Twelf named “omega_union_RC”. It is written in exactly the same syntax |-wellformed_rec : as the definition above: |-wellformed Gamma (rec @ Tau) omega_RC <- omega_union_RC : |-wellformed (omega_R , Gamma) Tau omega_RC . pf (omega_RC @ T1) -> This Twelf presentation of the logic program’s clauses differs pf (omega_RC @ T2) -> from that of a traditional Prolog system’s in two salient pf (omega_0 @ (union @ T1 @ T2)) = ways. First, note that each clause is labeled with the name [p1 : pf (omega_R @ T1)] of the corresponding inference rule. We will return to this [p2 : pf (omega_R @ T2)] point below and again in section 5. Second, for clauses Twelf ... p1 ... uses the syntax ... p2 ... .

6 The Twelf metalogic type of this lemma (the expression fol- lowing the : sign) is pf (omega_RC @ T1) -> pf (omega_RC @ T2) -> pf (omega_RC @ (union @ T1 @ T2) and it tells us that given proofs of T1 and T1 having kind ΩRC , we can derive a proof of union @ T1 @ T2 having the kind ΩRC . In an automated setting, we would require the Γ(i)=κ equivalent of the Prolog rule: WF VAR Γ  i :: κ omega_RC (union(T1, T2)) :- WF  WF ⊥ Γ  :: Ω Γ ⊥:: Ω omega_RC (T1), RC RC omega_RC (T2). Γ  τ :: ΩR WF CPTR This Prolog rule, however, is not backed by any semantic Γ  codeptr τ :: Ω RC proofs. We can reuse the Twelf notation shown above to get Γ  τ :: ΩRC Γ  n :: ΩN an equivalent rule with a semantic proof: WF OFFSET Γ  offset (τ ,τ )::Ω 1 2 RC RC_union_rule :

Γ  τ :: ΩR Γ  τ :: ΩR pf (omega_RC @ (union @ T1 @ T2)) <- WF BOX WF REF pf (omega_RC @ T1) <- Γ  box τ :: ΩRC Γ  ref τ :: ΩRC pf (omega_RC @ T2) = ΩR, Γ  τ :: ΩRC [p1 : pf (omega_RC @ T1)] WF REC Γ  rec τ :: ΩRC [p2 : pf (omega_RC @ T2)] (omega_union_RC @ p1 @ p2). Γ  τ1 :: ΩRC Γ  τ2 :: ΩRC WF ∩ Γ  τ1 ∩ τ2 :: ΩRC The last line gives us a justification for this “logic program- ming rule”. As a result of this validity checking and uni- Γ  τ1 :: ΩRC Γ  τ2 :: ΩRC ∪ formity of syntax, it is now possible for us to use Twelf to  ∪ WF Γ τ1 τ2 :: ΩRC ensure the correctness of a logic program (composed of rules

ΩRC , Γ  τ :: ΩRC ΩRC , Γ  τ :: ΩRC like RC_union_rule) which automatically generates proofs of WF ∀r WF ∃r well formedness of types. Γ ∀r τ :: Ω0 Γ ∃r τ :: Ω0

Ω0, Γ  τ :: Ω0 Ω0, Γ  τ :: Ω0 WF ∀a WF ∃a 6. HIGHER-ORDER KINDS? Γ ∀a τ :: ΩR Γ ∃a τ :: ΩR 6.1 Object logic representation of kinds  WF CONST Γ const(n)::ΩN We use the Twelf logical framework for our implementation WF GEQ WF GT of FPCC, which allows specification of type constructors Γ  geq(n)::Ω Γ  gt(n)::Ω N N in our object logic. Kinds such as ΩR and ΩN are rep- WF LEQ WF LT resented by higher-order logic definitions such as omega_R Γ  leq(n)::ΩN Γ  lt(n)::ΩN and omega_N. Since these kinding predicates must be able to fit in the same slots in the kinding judgment, the (higher- Γ  τ1 :: ΩN Γ  τ2 :: ΩN WF + order-logic) type of omega_R must be the same as the type  Γ plus(τ1,τ2)::ΩN of omega_N. Indeed, both predicates have the type

Γ  τ1 :: ΩN Γ  τ2 :: ΩN → → WF − (closedtype closedtype) o. Γ  minus(τ1,τ2)::ΩN It is natural to want to extend the kind system with Γ  τ :: ΩR Γ  τ :: ΩC higher-order kinds formed by a rule κ ::= κ → κ.How- WF WEAK1 WF WEAK2 Γ  τ :: Ω0 Γ  τ :: Ω0 ever, the kinding predicates representing higher-order kinds would have to have (higher-order-logic) types such as Γ  τ :: ΩRC Γ  τ :: ΩRC WF WEAK3 WF WEAK4 → → → → Γ  τ :: ΩR Γ  τ :: ΩC (closedtype closedtype) (closedtype closedtype) o and so on. These predicates would not fit into our represen- tation framework in higher-order logic. Figure 4: Kinding (well-formedness) rules for TML On the other hand, for any bounded degree of higher-order kinds, we could lift all of our kind predicates to a particular higher-order level. Indeed, to model the Featherweight Java type system using techniques described by League et al. [8], we will require first-level kind functions. An alternate approach would be to permit different kind predicates to have different types in our underlying logical representation. Suppose we had N different types of kind predicates. Each kind of closed TML type would also be

7 represented by a differently-typed predicate in the logical [8] Christopher League, Valery Trifonov, and Zhong Shao. representation. Then we could not use a single environment Type-preserving compilation of Featherweight Java. ρ to map type variables to closed types; we would need to In Foundations of Object-Oriented Languages parameterize τ by N different ρ variables. We could not (FOOL8), London, January 2001. use a single series of de Bruijn indices; we would need N [9] David MacQueen, Gordon Plotkin, and Ravi Sethi. different sets. We could not use just one shift operator in An ideal model for recursive polymophic types. the explicit substitution calculus; we would need N 2 shift Information and Computation, 71(1/2):95–130, 1986. operators. We have decided to avoid this nightmare. [10] David B. MacQueen and Mads Tofte. A semantics for Because we have (at most) bounded higher-order kinds, higher-order functors. In Proc. European Symposium it is not clear how our system could represent the proposed on Programming (ESOP’94), pages 409–423, April higher-order module system of ML [10]. In fact, we have not 1994. even worked out how to model the Standard ML module [11] Robin Milner, Mads Tofte, Robert Harper, and David system [11]. We are compiling core ML to proof-carrying MacQueen. The Definition of Standard ML (Revised). code [6], and building foundational safety proofs. MIT Press, Cambridge, MA, 1997. [12] George Ciprian Necula. Compiling with Proofs.PhD 7. CONCLUSION thesis, School of , Carnegie Mellon Our kind system for foundational PCC allows us to keep University, Pittsburgh, PA, September 1998. track of semantic properties of types; this is crucial to en- [13] Frank Pfenning and Carsten Sch¨urmann. System suring that derived types formed from compositions of ba- description: Twelf — a meta-logical framework for sic types are sound. We believe that any type system for deductive systems. In The 16th International core ML at the machine code level, that provides sufficient Conference on Automated Deduction,Berlin,July flexibility to optimizing compilers must keep track of these 1999. Springer-Verlag. properties. While kinding rules in our system have machine-checked proofs, we also have a purely syntactic presentation of this system. This allows the complex semantic model to be hid- den from users of this system. We also have a logic program to automate proofs about well formedness of types using this kinding system. Using Twelf allows us to provide machine-checkable justification for these rules and thus prove the logic program correct.

8. REFERENCES [1] M. Abadi, L. Cardelli, P.-L. Curien, and J.-J. Levy. Explicit substitutions. In Seventeenth Annual ACM Symp. on Principles of Prog. Languages, pages 31–46. ACM Press, Jan 1990. [2] Amal Ahmed, Andrew W. Appel, and Roberto Virga. Semantics of general references by a hierarchy of G¨odel numberings. In 17th Annual IEEE Symposium on Logic in Computer Science (LICS 2002), pages 75–86, June 2002. [3] Andrew W. Appel. Foundational proof-carrying code. In Symposium on Logic in Computer Science (LICS ’01), pages 247–258. IEEE, 2001. [4] Andrew W. Appel and Amy P. Felty. A semantic model of types and machine instructions for proof-carrying code. In POPL ’00: The 27th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 243–253, New York, January 2000. ACM Press. [5] Andrew W. Appel and David McAllester. An indexed model of recursive types for foundational proof-carrying code. ACM Trans. on Programming Languages and Systems, 23(5):657–683, September 2001. [6] Juan Chen, Dinghao Wu, Hai Fang, and Andrew W. Appel. Low-level typed assembly language. in preparation, 2001. [7] N. G. deBruijn. Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation. Indag. Math., 34:381–92, 1972.

8