Behavioral Subtyping, Specification Inheritance, and Modular Reasoning

Gary T. Leavens∗ (Iowa State University, Ames, IA 50011 USA) and David A. Naumann† (Stevens Institute of Technology, Hoboken, NJ 07030 USA)

TR #06-20b, July 21, 2006, revised Aug 4, Sept. 3 2006
Department of Computer Science, 226 Atanasoff Hall, Iowa State University, Ames, Iowa 50011-1041, USA

Keywords: Behavioral subtyping, supertype abstraction, specification inheritance, modularity, specification, verification, state transformer, dynamic dispatch, invariants, Eiffel language, JML language.

2006 CR Categories: D.2.2 [Software Engineering] Design Tools and Techniques: Object-oriented design methods; D.2.3 [Software Engineering] Coding Tools and Techniques: Object-oriented programming; D.2.4 [Software Engineering] Software/Program Verification: Class invariants, correctness proofs, formal methods, programming by contract, reliability, tools, Eiffel, JML; D.2.7 [Software Engineering] Distribution, Maintenance, and Enhancement: Documentation; D.3.1 [Programming Languages] Formal Definitions and Theory: Semantics; D.3.2 [Programming Languages] Language Classifications: Object-oriented languages; D.3.3 [Programming Languages] Language Constructs and Features: classes and objects, inheritance; F.3.1 [Logics and Meanings of Programs] Specifying and Verifying and Reasoning about Programs: Assertions, invariants, logics of programs, pre- and post-conditions, specification techniques.

Copyright © 2006 by Gary T. Leavens and David A. Naumann. Submitted for publication.

∗ Supported in part by NSF grant CCF-0429567.
† Supported in part by NSF grants CCR-0208984 and CCF-0429894.

Abstract

Behavioral subtyping is an established idea that enables modular reasoning about behavioral properties of object-oriented programs. It requires that syntactic subtypes are behavioral refinements. It validates reasoning about a dynamically-dispatched method call, say E.m(), using the specification associated with the static type of the receiver expression E. For languages with references and mutable objects the idea of behavioral subtyping has not been rigorously formalized as such, the standard informal notion has inadequacies, and exact definitions are not obvious. This paper formalizes behavioral subtyping and supertype abstraction for a Java-like sequential language with classes, interfaces, exceptions, mutable heap objects, references, and recursive types. Behavioral subtyping is proved sound and semantically complete for reasoning with supertype abstraction. Specification inheritance, as used in the specification language JML, is formalized and proved to entail behavioral subtyping.

1. Introduction

In object-oriented (OO) programming, subtyping and dynamic dispatch are both useful and problematic. They are useful because supertypes can abstract away details in the specifications of their subtypes, thus allowing variations in data structures and algorithms to be handled uniformly. They are problematic for modular reasoning because a dynamically-dispatched method call such as E.m() seems to require a case analysis to deal with all possible dynamic types of E's value. The basic idea of modular reasoning, which we call supertype abstraction [22], is clear. It is a generalization of typechecking: reasoning about an invocation, say E.m(), is based on the specification associated with the static type of E, and constraints are imposed on implementations of m at all subtypes. While modular type safety conditions for dynamically-dispatched methods are well-known [12], a straightforward translation into conditions on overriding method specifications, while sound, is more restrictive than necessary. The translation also gives no help in reasoning about object invariants, which need to be strengthened in subtypes. Hence, for modular reasoning one needs a behavioral notion of subtyping.

Remarkably, there is no mathematically rigorous account of behavioral subtyping and its connection with modular reasoning about specifications and programs in conventional OO programming languages, although there has been much study [1, 2, 3, 11, 14, 15, 19, 22, 29, 30, 47] (see [21] for a survey). Some of the current understanding of behavioral subtyping is embodied in program logics [31, 40, 42, 43, 46] but is difficult to disentangle from various other complications. Some of the current understanding is embodied in languages and tools such as Eiffel [30], JML [20], ESC/Java [16], and Spec# [7, 8]. But these have unsoundnesses and incompletenesses, some by engineering design and some for lack of adequate theory and methodology. A key source of unsoundness is naive treatment of object invariants, because many important design patterns invalidate the simple hierarchical notion of encapsulation on which the standard treatment [17] is based.

On one hand, behavioral subtyping has been rigorously studied in restrictive programming models (e.g., [19, 47]). On the other hand, various embodiments have been implemented in static and runtime verification tools and logics that apply to rich specification and programming languages such as Java and JML [20] and Eiffel [30]. Our goal is to close the gap by providing a rigorous analysis on which can be based more specialized assessments and justifications of specific tools and logics. (With the ultimate aim of high assurance for verification tools, we have undertaken to machine-check our results.)

We believe the gap has remained because it was far from clear how to formalize a general theory that pertains directly to reasoning about code in languages of practical interest. The semantic intricacies of the languages, and of current methodologies for sound reasoning about invariants, heap encapsulation, locality of effects, etc., are daunting. The details of the OO language are important, because some language features, such as reflection, allow programs to make observations that can distinguish between supertype and subtype objects. The achievements closest to our aim are soundness and completeness proofs for logics of Java fragments that embody supertype abstraction in some form (e.g., [43]). But these assess the reasoning power of a proof system, rather than assessing and explicating the connection between behavioral subtyping and supertype abstraction; and they are somewhat removed from the axiomatic semantics of some widely used verifiers (which simply postulate soundness of behavioral subtyping in some form).

The key insight that led to our results is a purely semantic formulation of supertype abstraction using two denotational semantics: in one, method calls are statically dispatched. On this basis we are able to give a formal treatment in a language with many constructs of sequential OO languages, including classes, interfaces, mutable heap objects, assignment, exceptions, inheritance, visibility, reference equality, type tests, and recursive types. Even with effective definitions in hand, it was nontrivial to find the right induction hypothesis and technique to prove the main lemma.

We make no commitment to particular specification notations or reasoning system but rather formulate modular reasoning semantically in a generic way that idealizes what is found in logics and tools. Using an operationally sound compositional semantics allows us to provide a foundation that will serve as a point of reference and as a basis for assessment and further development of specification languages and verification tools.

This paper makes the following contributions.
• We give a semantic characterization of supertype abstraction, which idealizes what is found in logics and verification tools. In contrast to related work, our definition does not rely on derived notions such as substituting one object for another [19, 27, 29], nor is it tied to a proof system [31, 40, 42, 43, 46].
• We formalize behavioral subtyping in terms of refinement of observable behavior in a realistic programming model. Refinement does not need to hold between all syntactically related types but only when the subtype is a (non-abstract) class.
• In contrast to the standard view [29], we define refinement intrinsically, by quantifying over satisfying implementations. Separately, we characterize refinement in terms of relations between pre- and postconditions. Our characterization adapts previous work [13, 33, 41] that improves on the overly restrictive standard condition of postcondition implications [2, 3, 15, 29, 30]. (This also isolates the way in which characterization of completeness depends on the specification language, see Sect. 7.) An outcome of our focus

[...] subtype of T." This is often called the "Liskov Substitutability Principle" (LSP) and is a strong form of supertype abstraction. The LSP is actually too strong, because it uses the notion of "unchanged" behavior; the point of introducing subtype objects is often to change behavior in a way that is allowed by the supertype's specification. A more flexible intuition defines observations that are not allowed by this specification as "surprising," and says that behavioral "subtyping prevents surprising behavior" [19, Chapter 1].

As a formulation of supertype abstraction, the LSP is not easy to apply to imperative OO languages. It is not clear what it means to substitute one object for another: imperative programs are not referentially trans-