A Theory of Indirection Via Approximation

A Theory of Indirection via Approximation Aquinas Hobor ∗† Robert Dockins† Andrew W. Appel † National University of Singapore Princeton University Princeton University [email protected] [email protected] [email protected] Abstract Consider general references in the polymorphic λ-calculus. Building semantic models that account for various kinds of indirect Here is a flawed semantic model of types for this calculus: reference has traditionally been a difficult problem. Indirect refer- value ≡ loc of address + num of N + ... ence can appear in many guises, such as heap pointers, higher-order type ≡ (memtype × value) → T (1) functions, object references, and shared-memory mutexes. memtype ≈ address * type We give a general method to construct models containing indirect reference by presenting a “theory of indirection”. Our method Values are a tagged disjoint union, with the tag loc indicating can be applied in a wide variety of settings and uses only simple, a memory address. T is some notion of truth values (e.g., the elementary mathematics. In addition to various forms of indirect propositions of the metalogic or a boolean algebra) and a natural reference, the resulting models support powerful features such as interpretation of the type A → T is the characteristic function for a impredicative quantification and equirecursion; moreover they are set of A. We write ≈ to mean “we wish we could define things this compatible with the kind of powerful substructural accounting re- way” and A*B to indicate a finite partial function from A to B. quired to model (higher-order) separation logic. In contrast to pre- The typing judgment has the form ψ ` v : τ, where ψ is vious work, our model is easy to apply to new settings and has a a memory typing (memtype), v is a value, and τ is a type; the simple axiomatization, which is complete in the sense that all mod- semantic model for the typing judgment is ψ ` v : τ ≡ τ(ψ, v). els of it are isomorphic. Our proofs are machine-checked in Coq. Memory typings are partial functions from addresses to types. The motivation for this attempt is that the type “ref τ” can use the Categories and Subject Descriptors D.3.1 [PROGRAMMING memory typing to show that the reference’s location has type τ: LANGUAGES]: Formal Definitions and Theory — Semantics; F.3.1 [LOGICS AND MEANINGS OF PROGRAMS]: Specifying ref τ ≡ λ(ψ, v). ∃a. v = loc(a) ∧ ψ(a) = τ. and Verifying and Reasoning about Programs — Logics of pro- That is, a value v has type ref τ if it is an address a, and according grams; F.4.1 [MATHEMATICAL LOGIC AND FORMAL LAN- to the memory typing ψ, the memory cell at location a has type τ. GUAGES]: Mathematical Logic — Mechanical theorem proving Unfortunately, this series of definitions is not well-founded: type contains a contravariant occurrence of memtype, which in General Terms Languages, Theory, Verification turn contains an occurrence of type. A standard diagonalization Keywords Indirection theory, Step-indexed models proves that no solution to these equations exists in set theory. A recent approach to tackle this problem is to employ stratification, yielding a well-founded definition of an appropriate seman- 1. Introduction tic model [AAV03, Ahm04, AMRV07]. A key strength of these A recurring problem in the semantics of programming languages is models is that they can express general (impredicative) quantified finding semantic models for systems with indirect reference. Indi- types even though they are stratified. Unfortunately, these models rection via pointers gives us mutable records; indirection via locks have a disturbing tendency to “leak” into any proofs utilizing them, can be used for shared storage; indirection via code pointers gives making modularization difficult, obscuring their essential features, us complex patterns of computation and recursion. Models for pro- needlessly limiting their power, and contributing to unpleasant te- gram logics for these systems need to associate invariants (or asser- dium. Also, all of these models are specialized to the problem of tions, or types) with addresses in the store; and yet invariants are mutable references, and even for the expert it can be daunting to predicates on the store. Tying this “knot” has been difficult, espe- modify the techniques to apply to other domains. cially for program logics that support abstraction (impredicativity). Hobor et al. applied these techniques to Hoare logics, devel- oping a model for a concurrent separation logic (CSL) with first- ∗ Supported by a Lee Kuan Yew Postdoctoral Fellowship. class locks and threads [HAZ08]. This model was extremely com- † Supported in part by NSF awards CNS-0627650 and CNS-0910448, and plex to construct because the substructural accounting was woven AFOSR award FA9550-09-1-0138. throughout the stratification. It was also complex to use, exposing more than fifty axioms.Even then, the axiom set was incomplete: from time to time the CSL soundness proof required the statement and proof of additional properties (which were provable from the model). Like the mutable reference models, Hobor et al.’s model Permission to make digital or hard copies of all or part of this work for personal or was a solution to a specific problem, and could not be applied to classroom use is granted without fee provided that copies are not made or distributed other problems—not even to other concurrent separation logics. for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute We present a single solution, indirection theory, that can handle to lists, requires prior specific permission and/or a fee. these domains as well as numerous others. Our model is character- POPL’10, January 17–23, 2010, Madrid, Spain. ized by just two axioms, an order of magnitude better than the sim- Copyright c 2010 ACM 978-1-60558-479-9/10/01. $10.00 plest of the previous solutions. The axioms are equational, orthog- 1 onal, and complete, and have greater expressive power and cleaner 2.1 General references in the λ-calculus modularity. The modularity enables a graceful extension for sub- Ahmed et al. constructed the first model of a type system for the structural accounting that does not need to thread the accounting polymorphic λ-calculus with general references [AAV03]. Follow- through the stratification. Moreover, the axioms provide greater in- ing (1), we want a solution to the pseudoequation sight into the model by directly exposing the approximation at the heart of the stratification technique. memtype ≈ address * ((memtype × value) → T), As we show in §2, a key observation is that many domains can which falls neatly into the pattern of pseudoequation (2) with be described by the pseudoequation: F (X) ≡ address *X K ≈ F ((K × O) → ), (2) T O ≡ value. where F (X) is a covariant functor, O is some arbitrary set of By equation (3), indirection theory constructs the approximation “other” data and K is the object that we wish to construct. The earlier cardinality argument guarantees we cannot construct K (in memtype N × (address * ((memtype × value) → T)), set theory), but indirection theory lets us approximate it: and by folding the definition of type (from eqn. 1) we reach K N × F ((K × O) → T) (3) memtype N × (address * type). Here X Y means the “small” type X is related to the “big” type This model is sufficient to define a powerful type system for the Y by two functions: squash : Y → X and unsquash : X → Y . polymorphic λ-calculus with mutable references. In §4.1, we will The squash function “packs” an element y of the big type Y into an show how to construct the types nat and ref τ. element of the small type X, applying an approximation when the structure of y is too complex to fit into X. The unsquash function 2.2 General references in von Neumann machines reverses the process, “unpacking” an element of X into an element Modeling a type system with general references for von Neumann of Y ; since Y is bigger than X, unsquash is lossless. machines was solved first by Ahmed [Ahm04], and then later in The squash and unsquash functions form a section-retraction a more sophisticated way by Appel et al. [AMRV07]. The key is pair, meaning that squash ◦ unsquash : X → X is the identity that whereas in the λ-calculus types are based on sets of values, on function and unsquash ◦ squash : Y → Y is an approximation a von Neumann machine types are based on sets of register banks. function. Thus X Y is almost an isomorphism, and informally Here is the intuition for a von Neumann machine with 32-bit integer one can read X Y as “X is approximately Y”. With equation memory addressing and m 32-bit integer registers: (3), indirection theory says that left hand side of pseudoequation m−2 (2) is approximately equal to a pair of a natural number and the rbank ≡ int32× ... ×int32 right hand side of pseudoequation (2). type ≡ (memtype × rbank) → T memtype ≈ int32 * type. Contributions. The intuition is thus very similar to the λ-calculus case. We set §2 We show numerous examples containing indirect reference and F (X) ≡ int32 *X show how they can be characterized with a covariant functor. O ≡ rbank, §3 We present the two axioms of indirection theory. and indirection theory constructs the approximation §4 We show how to apply the axioms to two of the examples. memtype N × (int32 * type). §5 We explore some implications of indirection theory. In the λ-calculus, the memory maps addresses to values, and values §6 We induce an impredicative logic from indirection theory.

A Theory of Indirection Via Approximation

Virtual Memory

Concept-Oriented Programming: References, Classes and Inheritance Revisited

Chapter 9 Pointers

Pointers Pointers – Getting the Address of a Value the Address Operator (&) Returns the Memory Address of a Variable

Encapsulation and Inheritance in Object-Oriented Programming Languages

Specification and Verification with References Bruce W

Lazy Reference Counting for Transactional Storage Systems

Fast Run-Time Type Checking of Unsafe Code

Migration of Threads Containing Pointers in Distributed Memory Systems

Hardware-Software Interface

C Pointers Indirection Creating Pointers Pointer Declarations Pointer

Chapter 6 Data Type