Step-indexed Semantic Model of the Imperative Object Calculus

Catalin Hritcu, Jan Schwinghammer, Gert Smolka

Programming Systems Lab, Saarland University, Saarbrücken

February 2007

The Big Picture

Deduction systems

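Sets of rules

(Sem-SSub) $\dfrac{\Gamma \models a : \alpha \qquad \alpha \subseteq \beta}{\Gamma \models a : \beta}$

(Sem-SRefl) $\alpha \subseteq \alpha$

(Sem-STrans) $\dfrac{\alpha \subseteq \tau \qquad \tau \subseteq \beta}{\alpha \subseteq \beta}$

(Sem-STop) $\alpha \subseteq \top$

(Sem-SBot) $\bot \subseteq \alpha$

(Sem-SObj) $\dfrac{E \subseteq D}{[m_d = \varsigma(x_d)b_d]_{d \in D} \subseteq [m_e = \varsigma(x_e)b_e]_{e \in E}}$

(Sem-SRec) $\dfrac{\forall \alpha, \beta \in \mathit{Type}.\ \alpha \subseteq \beta \Rightarrow F(\alpha) \subseteq G(\beta)}{\mu F \subseteq \mu G}$

(Sem-SUniv) $\dfrac{\beta \subseteq \alpha \qquad \forall \tau \in \mathit{Type}.\ \tau \subseteq \beta \Rightarrow F(\tau) \subseteq G(\tau)}{\forall_\alpha F \subseteq \forall_\beta G}$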

Deduction systems are sets of rules used to prove properties of systems. For example, these are some of the rules of our type system.

Deduction systems

• Type systems
  • Property: type safety - efficiently decidable
  • Soundness usually proved syntactically
  • One can also use semantic models! (details soon)
• Program logics (e.g. Hoare logic)
  • Property: correctness - undecidable
  • Soundness proved wrt. semantic model:
    • “Derivability implies validity in the model”

The properties one proves using deduction systems range from efficiently decidable ones, like type safety, to very strong properties which are undecidable, like program correctness. Program logics are such deduction systems for program correctness.

In order to prove soundness of type systems one usually employs syntactic arguments, such as subject reduction. However, for program logics one proves soundness with respect to a semantic model, which makes a clear distinction between validity and derivability.

A program logic is sound if everything that can be derived using the rules is also valid with respect to the model.
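The soundness property just described can be written compactly. In the sketch below, $\vdash$ stands for derivability in the deduction system, $\models$ for validity in the semantic model, and $\varphi$ for an arbitrary judgement; all three symbols are notation assumed here, not taken from the slides.

```latex
\[
  \text{Soundness:}\qquad \vdash \varphi \;\Longrightarrow\; \models \varphi
\]
```

The converse implication (completeness) is a separate property and is not part of the soundness claim.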

We will see that simple semantic models can also be used for type systems.

The Challenge

However, finding good semantic models in which one can reason about the behaviour of programs is challenging. This is even harder when dynamic storage of executable code is allowed. Still, this feature is present in different forms in almost all practical programming languages: pointers to functions in C/C++, callbacks in Java or general references in ML.

Classical denotational models based on partial orders are highly complex in this setting, and in the existing ones many equivalences involving state still do not hold. [Category theory, domain theory: For higher-order store, the semantic domain is defined by a mixed-variant recursive equation that one has to solve. For modeling dynamic allocation, one has to move to a possible-world model, formalized as a category of functors over CPOs.]

One possible alternative is to use simpler set-theoretic models based on the operational semantics and step-indexing.

But before we dive into this, I will first present the programming language we studied.

• Executable code on the heap (e.g. C/C++, Java, ML)
• Under this realistic assumption
  • Denotational semantic models
    • Very complex
    • Still not perfect
  • Possible alternative: “step-indexing”
    • Simpler set-theoretic models (details soon)

Imperative Object Calculus

• Simple object-oriented programming language
  • Only one primitive: objects
  • Classes and inheritance encoded
• Captures the essence of object-oriented languages
• Features dynamically allocated, higher-order store
• We also studied the functional object calculus
• A Theory of Objects [Abadi & Cardelli ‘96]

Martin Abadi

The language is called the imperative object calculus, and it is very simple. Only objects are considered as primitives. However, the calculus is expressive enough to encode all common features of practical object-oriented languages, like classes and inheritance.

The imperative object calculus captures the essence of object-oriented programming languages, the same way the lambda calculus captures the essence of functional ones.

The calculus features a dynamically allocated, higher-order store, which is a very realistic store model. As we discussed before, it is challenging to find good semantic models for such a calculus.

Actually, we have also studied the simpler functional object calculus and have a similar model for it, which I won’t present in this talk. Both the functional and the imperative object calculus are described in the book by Abadi and Cardelli, A Theory of Objects, from 1996.

Imperative Object Calculus

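\[
\begin{array}{rcll}
a, b & ::= & x & \text{variable} \\
     & \mid & [m_d = \varsigma(x_d)b_d]_{d \in D} & \text{object creation} \\
     & \mid & \mathsf{clone}\ a & \text{object cloning} \\
     & \mid & a.m & \text{method invocation} \\
     & \mid & a.m := \varsigma(x)b & \text{method update} \\
     & \mid & \lambda x.\ b \;\mid\; a\ b & \text{procedures}
\end{array}
\]

Example:

\[
[\mathit{gcd} = \varsigma(y)\lambda x.\ \lambda z.\ \mathsf{if}\ x < z\ \mathsf{then}\ y.\mathit{gcd}\ x\ (z - x)\ \mathsf{else\ if}\ z < x\ \mathsf{then}\ y.\mathit{gcd}\ (x - z)\ z\ \mathsf{else}\ x]
\]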
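To make the syntax above concrete, here is a minimal sketch of an evaluator for it in Python. It is an illustration under stated assumptions, not the semantics used in the talk: it is big-step rather than small-step, the store is a plain Python list of method closures, and numbers, binary operators and `if` are added only because the gcd example uses them. All names (`Obj`, `Inv`, `Upd`, `evaluate`, ...) are hypothetical.

```python
import operator as op

# Term constructors (tagged tuples). Names are illustrative assumptions.
def Var(x): return ("var", x)
def Obj(methods): return ("obj", methods)        # {m: (self_name, body)}
def Clone(a): return ("clone", a)
def Inv(a, m): return ("inv", a, m)              # a.m
def Upd(a, m, x, b): return ("upd", a, m, x, b)  # a.m := ς(x)b
def Lam(x, b): return ("lam", x, b)
def App(a, b): return ("app", a, b)
def Num(n): return ("num", n)                    # extension for the example
def Op(f, a, b): return ("op", f, a, b)          # extension for the example
def If(c, t, e): return ("if", c, t, e)          # extension for the example

def evaluate(term, env, store):
    """Big-step evaluation; objects are dicts from method names to locations."""
    tag = term[0]
    if tag == "var":
        return env[term[1]]
    if tag == "num":
        return term[1]
    if tag == "obj":                  # allocate one store cell per method
        obj = {}
        for m, (x, b) in term[1].items():
            store.append((x, b, env))
            obj[m] = len(store) - 1
        return obj
    if tag == "clone":                # fresh cells holding the same closures
        src = evaluate(term[1], env, store)
        obj = {}
        for m, loc in src.items():
            store.append(store[loc])
            obj[m] = len(store) - 1
        return obj
    if tag == "inv":                  # run the stored body with self bound
        obj = evaluate(term[1], env, store)
        x, b, cenv = store[obj[term[2]]]
        return evaluate(b, {**cenv, x: obj}, store)
    if tag == "upd":                  # overwrite code in the store
        _, a, m, x, b = term
        obj = evaluate(a, env, store)
        store[obj[m]] = (x, b, env)
        return obj
    if tag == "lam":
        return ("closure", term[1], term[2], env)
    if tag == "app":
        _, x, b, cenv = evaluate(term[1], env, store)
        arg = evaluate(term[2], env, store)
        return evaluate(b, {**cenv, x: arg}, store)
    if tag == "op":
        return term[1](evaluate(term[2], env, store),
                       evaluate(term[3], env, store))
    if tag == "if":
        branch = term[2] if evaluate(term[1], env, store) else term[3]
        return evaluate(branch, env, store)
    raise ValueError(f"unknown term: {tag}")

# The gcd object from the slide, with the subtractions (z - x) and (x - z).
x, z, y = Var("x"), Var("z"), Var("y")
def call_gcd(a, b): return App(App(Inv(y, "gcd"), a), b)
gcd_obj = Obj({"gcd": ("y", Lam("x", Lam("z",
    If(Op(op.lt, x, z), call_gcd(x, Op(op.sub, z, x)),
       If(Op(op.lt, z, x), call_gcd(Op(op.sub, x, z), z), x)))))})

result = evaluate(App(App(Inv(gcd_obj, "gcd"), Num(12)), Num(18)), {}, [])
print(result)  # 6
```

Because method bodies live in the store and `Upd` overwrites them, the store holds executable code, which is the higher-order store feature discussed earlier.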

Step-indexed Semantic Models

• First developed by Appel et al. - foundational PCC
  • Machine-checkable proofs of type soundness
  • For low-level languages
• Lambda calculus
  • Recursive types [Appel & McAllester, ‘01]
  • General references and polymorphism [Ahmed et al., ‘03] [Ahmed, ‘04]

Andrew Appel Amal Ahmed David McAllester

Now that we have presented the calculus, let us turn our attention to step-indexed semantic models. Such models were first developed by Andrew Appel and his collaborators in the context of foundational proof-carrying code. They wanted simpler and more modular proofs of type soundness that are easier to check automatically. And they considered low-level languages, like assembly or machine code.

However, step-indexed semantic models were also defined for lambda calculi: first for a lambda calculus with recursive types in 2001, and then successfully extended to an imperative language with general references and impredicative polymorphism.

Step-indexed Semantic Models

• Use operational semantics of untyped calculus
• Types are sets of indexed values
  “a :k α if a behaves like an element of α for k steps”
• In the presence of state
  • Values also indexed by store typings Ψ
  • Store extension relation: $(k, \Psi) \sqsubseteq (j, \Psi')$
• Types have a “meaning” - predicates over programs

These models are based only on the small-step operational semantics of an untyped calculus.

Types are interpreted as sets of indexed values. Informally, an expression has a certain type if it behaves like an element of that type for a fixed number of steps. Equivalently, we say that a is in alpha with approximation k if one cannot distinguish a from a real value of type alpha in fewer than k computation steps.

In the presence of state, the values are indexed not only by computation steps, but also by store typings. Store typings increase monotonically during computation, which is captured by a store extension relation.
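The informal reading of "a :k α" can be sketched in a few lines of Python. This is a simplified illustration in the style of the Appel and McAllester definition, under loud assumptions: the toy language (numbers, booleans, `succ`, `if`) is made up for the example, there is no store, and therefore no store typings. The names `step`, `nat` and `has_type` are hypothetical.

```python
def step(t):
    """One small step; returns None if t is irreducible (a value or stuck)."""
    tag = t[0]
    if tag == "succ":
        arg = t[1]
        if arg[0] == "num":
            return ("num", arg[1] + 1)
        nxt = step(arg)
        return None if nxt is None else ("succ", nxt)
    if tag == "if":
        _, cond, then_branch, else_branch = t
        if cond[0] == "bool":
            return then_branch if cond[1] else else_branch
        nxt = step(cond)
        return None if nxt is None else ("if", nxt, then_branch, else_branch)
    return None  # ("num", n) and ("bool", b) are values

def nat(k, v):
    """A type is a predicate on (index, irreducible term) pairs."""
    return v[0] == "num"

def has_type(e, k, ty):
    """e :k ty - if e becomes irreducible after j < k steps, the result
    must be in ty at the remaining index k - j."""
    for j in range(k):
        nxt = step(e)
        if nxt is None:
            return ty(k - j, e)
        e = nxt
    return True  # no observation possible in fewer than k steps

good = ("succ", ("succ", ("num", 0)))
# Takes two steps, then gets stuck on succ applied to a boolean.
bad = ("if", ("bool", True),
       ("if", ("bool", False), ("num", 0), ("succ", ("bool", True))),
       ("num", 0))

print(has_type(good, 10, nat))  # True
print(has_type(bad, 2, nat))    # True: the error is not visible yet
print(has_type(bad, 3, nat))    # False: the stuck term is reached
```

The `bad` term shows why the index matters: it looks like a natural number to any observer who runs it for fewer than three steps.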

As in a denotational semantics, types have a meaning: they can be viewed as predicates over programs. Type constructors are just functions over these predicates. However, our model is purely set-theoretic, and so much simpler than the models one usually has with a denotational semantics. In particular, the proofs are usually by direct induction on the index.

Types

• Object types $[m_d : \tau_d]_{d \in D}$
• Recursive types $\mu F$
• Polymorphic types $\forall F$, $\exists F$
• Subtyping $\alpha \subseteq \beta$
  • “No object calculus can fully justify its existence without some notion of subsumption” - Luca Cardelli
• Bounded quantification $\forall_\alpha F$, $\exists_\alpha F$
• Self types ~ recursive object types $\varsigma(\lambda \tau \in \mathit{Type}.\ [m_d : F_d(\tau)]_{d \in D})$

Our object calculus of course has object types, which list the return type of each method (our methods do not take arguments).

Recursive and polymorphic types are defined in the same way as in Ahmed’s model.

However, a feature which is very important to us, but was not explicit before, is subtyping. In his book Luca Cardelli states that: “No object calculus can fully justify its existence without some notion of subsumption”. In particular an object with more methods can subsume (emulate) another object with fewer methods.
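The remark that an object with more methods can subsume one with fewer can be checked on a toy model where types are literally sets. The Python sketch below is a deliberately crude assumption-laden illustration: objects are finite records mapping method names to result values, with no self parameter, no store and no step indices, and the base types `Nat` and `Bool` are tiny finite sets. The names `universe`, `denote`, `point` and `labelled_point` are hypothetical.

```python
from itertools import product

# Tiny finite base types (assumed for illustration only).
BASE = {"Nat": {0, 1}, "Bool": {"tt", "ff"}}
VALUES = [0, 1, "tt", "ff"]
METHODS = ["x", "y"]

def universe():
    """All toy objects: each method is absent (None) or returns some value."""
    objs = set()
    for results in product([None] + VALUES, repeat=len(METHODS)):
        objs.add(frozenset((m, v) for m, v in zip(METHODS, results)
                           if v is not None))
    return objs

def denote(obj_type):
    """Meaning of an object type: the set of objects that have every listed
    method, with a result in the listed type. Extra methods are allowed."""
    return {o for o in universe()
            if all(dict(o).get(m) in BASE[t] for m, t in obj_type.items())}

point = {"x": "Nat"}
labelled_point = {"x": "Nat", "y": "Bool"}

# Subtyping is plain set inclusion between the two meanings.
print(denote(labelled_point) <= denote(point))  # True
print(denote(point) <= denote(labelled_point))  # False
```

In this toy reading, the type with more methods denotes the smaller set, so the inclusion goes from the wider object type to the narrower one.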

Our model has a very natural notion of subtyping: since types are just sets, subtyping is just set inclusion. While defining subtyping this way is very simple, the interaction between subtyping and the other features of the type system is not so simple. This interaction also gives rise to new types, like bounded polymorphic types and self types (which are recursive object types with proper subtyping).

Rules and Soundness

• 27 semantic typing rules (+31 functional obj. calc.)

(Sem-Pack) $\dfrac{\exists \tau \in \mathit{Type}.\ \tau \subseteq \alpha \wedge \Gamma \models a : F(\tau)}{\Gamma \models \mathsf{pack}\ a : \exists_\alpha F}$

(Syn-Pack) $\dfrac{\Psi, \Gamma \vdash C \le A \qquad \Psi, \Gamma \vdash a[X := C] : B[X := C]}{\Psi, \Gamma \vdash \mathsf{pack}\ X \le A = C\ a : \exists X \le A.\ B}$

• Note: we only check location-free programs
• Soundness proofs
  • Foundational, i.e. elementary
  • Easily machine checkable (not checked yet)

Our type system has 27 typing rules, which we have proved sound. These rules are more “semantic” when compared to the purely syntactic rules one usually considers in type systems. However, there is a direct correspondence between the two kinds of rules.

The semantic rules are slightly simpler since we only type-check programs which are location-free. In a syntactic setting we would need to be able to type-check running programs for the subject-reduction proof, so the typing judgement would also involve a store typing.

In our case the run-time safety of well-typed terms follows directly from the definition of types. However, each semantic typing rule needs to be proved sound separately. These soundness proofs are elementary, usually by direct induction on the index. Appel and his collaborators have shown that such proofs are modular, and can be more easily encoded into an expressive logic (HOL). This allows for easily machine-checkable proofs, which is very important in their setting. For us this is not crucial, so for now we have not checked our proofs automatically.

Further Work

• Local and modular programming logic
  • ... for the imperative object calculus
  • Proving it sound wrt. semantic model

An interesting direction for further work is to create a local and modular programming logic for the imperative object calculus, and to prove it sound with respect to a semantic model.

Program Logics

[Diagram, built up over several slides: program logics placed along two axes, "not local" vs. "local" and "not modular" vs. "modular". Names in parentheses are the people pictured with each logic.]

• Not local, not modular: Hoare Logic (1967, 1969; Robert Floyd)
• Not local, modular: Specification Logic (1982; John Reynolds)
• Local, not modular: Separation Logic (2001; Peter O’Hearn, John Reynolds), Higher-order Store (2006; Bernhard Reus, Jan Schwinghammer)
• Local, modular: Sep. and Abstraction (2005; Matthew Parkinson), Hoare Type Theory (2007; Aleks Nanevski, Amal Ahmed, Greg Morrisett, Lars Birkedal), Sep. Idealized ML (2007; Neel Krishnaswami, Lars Birkedal, Jonathan Aldrich, John Reynolds)
• Object Calculus, not modular: Logic of Objects (1996, 2004, 2006; Martin Abadi, Rustan Leino, Jan Schwinghammer)
• Object Calculus, local and modular: ?

Summary

• Imperative object calculus
• Step-indexed semantic model of types
• Main contributions
  • Object types
  • Subtyping
  • Bounded quantification, self types
• One direction for further research
  • Local and modular program logic

To sum things up, we have seen a simple programming language with a very expressive type system and a realistic store model, for which constructing denotational models is really hard.

Then we described a step-indexed semantic model of types, which we constructed based on previous work by Andrew Appel, Amal Ahmed and others.

Our main contribution is the treatment of object types and subtyping, and in particular of the interaction between subtyping and the strong type system we considered. This also leads to new features like bounded quantification and self types.

Finally, we have also seen one interesting direction for further research: constructing a local and modular program logic for the imperative object calculus. There are many others.