Efficient Subtyping Tests with PQ-Encoding
JOSEPH (YOSSI) GIL¹ and YOAV ZIBIN
Technion – Israel Institute of Technology

Given a type hierarchy, a subtyping test determines whether one type is a direct or indirect descendant of another type. Such tests are a frequent operation during the execution of object-oriented programs. The implementation challenge is in a space-efficient encoding of the type hierarchy that simultaneously permits efficient subtyping tests. We present a new scheme for encoding multiple and single inheritance hierarchies, which, in the standard benchmark hierarchies, has a smaller footprint than all previously published schemes. Our scheme is called PQ-encoding (PQE) after PQ-trees, a data structure previously used in graph theory for finding the orderings that satisfy a collection of constraints. In particular, we show that in the traditional object layout model, the extra memory requirement for single inheritance hierarchies is zero. In PQE, subtyping tests are constant time and use only two comparisons. The encoding creation time of PQE also compares favorably with previous results. It is less than a second on all standard benchmarks on a contemporary architecture, while the average time for processing a type is less than one millisecond. However, PQE is not an incremental algorithm. Other than PQ-trees, PQE employs several novel optimization techniques. These techniques are also applicable to improving the performance of other, previously published, encoding schemes.

Categories and Subject Descriptors: D.1.5 [Programming Techniques]: Object-oriented Programming; D.3.3 [Programming Languages]: Language Constructs and Features – Data types; structures; G.4 [Mathematical Software]: Algorithm design; analysis

General Terms: Algorithms, Design, Measurement, Performance, Theory

Additional Key Words and Phrases: Casting, Encoding, Hierarchy, Inheritance, Partially Ordered Sets, PQ, PQE, Subtyping, Type inclusion

1. INTRODUCTION

One of the basic operations in the runtime environment of object-oriented (OO) programs is a subtyping test. Given an object o and a type b, a subtyping test determines whether a, the runtime type of o, is a subtype of b, i.e., whether a is a direct or indirect descendant of b in the inheritance hierarchy. These subtyping tests (also known as type inclusion tests) are carried out at runtime, and are distinct from the static subtyping tests done by the language type checker.

¹ Research supported in part by the generous funding of the Israel Science Foundation, grant No. 128/02. Authors' address: Technion City, Haifa 32000, Israel. Contact the authors for patent information. A preliminary version of this paper was published in the proceedings of the 16th Annual Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA'01).

Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee. © 2001 ACM 0164-0925/99/0100-0111 $00.75

ACM Transactions on Programming Languages and Systems, Vol. ?, No. ?, January 2005, Pages 1–36.

1.1 Subtyping Tests in OO Languages

A programmer may apply a subtyping test explicitly using dedicated constructs such as JAVA's [Arnold and Gosling 1996] instanceof, and SMALLTALK's [Goldberg 1984] isKindOf: method. In addition, several other language constructs are implemented using these tests. For example, subtyping tests are implicit in the execution of type cast operations, e.g., ?= in the EIFFEL programming language [Meyer 1992] and dynamic_cast in C++ [Stroustrup 1997].
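As a minimal sketch of the two JAVA constructs just mentioned, the following program contrasts an explicit test (instanceof) with the implicit test hidden in a type cast. The classes A and B and the helper names isKindOfA and castToBSucceeds are hypothetical, introduced only for illustration; they are not taken from the paper.

```java
public class ExplicitSubtypingTests {
    // Hypothetical two-level hierarchy: B is a subtype of A.
    static class A {}
    static class B extends A {}

    // Explicit test: true when the runtime type of o is a subtype of A.
    static boolean isKindOfA(Object o) {
        return o instanceof A;
    }

    // Implicit test: the cast checks at runtime that the type of o is a
    // subtype of B, and throws ClassCastException otherwise.
    static boolean castToBSucceeds(Object o) {
        try {
            B b = (B) o;
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isKindOfA(new B()));        // prints true: B is a subtype of A
        System.out.println(castToBSucceeds(new A()));  // prints false: A is not a subtype of B
    }
}
```

In both helpers the supertype (A or B) is fixed in the source text, while the subtype is only known once the object o is inspected at runtime.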
In JAVA, for example, the lack of parametric polymorphism is a common source of such casts. When extracting an element o from a collection class, e.g., Vector, the type of o is Object. Therefore, it is typically necessary to cast o to the expected type.

Also, the covariant nature of array subtyping in JAVA renders subtyping tests necessary in assignments to elements of arrays whose dynamic type is unknown. Consider for example the following code fragment:

    void f(Object x[]) {
        x[1] = new A();
    }

It may be a bit surprising that the assignment to x[1] in f requires a subtyping test. To understand why, suppose that function f was invoked with a value of type B[] (an array of elements of type B) as the actual argument x, e.g.,

    f(new B[3]);

This invocation is correct because type B[] is a subtype of Object[] (an array of objects). However, the assignment x[1] = new A(); is legal only if type A is a subtype of type B. Otherwise, the runtime environment raises an ArrayStoreException.

Yet another construct which is implemented using subtyping tests is covariant overriding of arguments in EIFFEL. The compiler tends to insert subtyping tests in conjunction with calls to methods that use this feature in order to ensure type-safety².

Finally, we note that subtyping tests may also be part of the implementation of exception handling in JAVA, C++, and other languages. The following code excerpt is an example of a try block, followed by a catch clause and block.

    try { f(); ... } catch (B b) { ... }

Suppose that an object o is thrown from the try block, e.g., from within function f(). Then, the program should execute the catch block, but only if type B is a supertype of the thrown object's type. Thus, the catch clause can be implemented with a subtyping test. In JAVA, the subtyping test is with respect to the dynamic type of o, while in C++ it is with respect to the static type of o. However, in both cases, this test must be carried out at runtime.
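The array-store and catch-clause cases above can be sketched as a small runnable JAVA program. The method f mirrors the fragment in the text; the classes A, B and C, the helpers storeSucceeds and handledBy, and the choice of standard exception types are assumptions made for illustration only.

```java
public class ImplicitSubtypingTests {
    // Hypothetical types: B is unrelated to A, and C is a subtype of A.
    static class A {}
    static class B {}
    static class C extends A {}

    // Mirrors f from the text: the store is checked at runtime against
    // the dynamic element type of the array x.
    static void f(Object[] x) {
        x[1] = new A();
    }

    // True when A is a subtype of the dynamic element type of x.
    static boolean storeSucceeds(Object[] x) {
        try {
            f(x);
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    // Each catch clause acts as a subtyping test on the dynamic type
    // of the thrown object; the first matching clause is executed.
    static String handledBy(RuntimeException thrown) {
        try {
            throw thrown;
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException";
        } catch (RuntimeException e) {
            return "RuntimeException";
        }
    }

    public static void main(String[] args) {
        System.out.println(storeSucceeds(new Object[3])); // prints true
        System.out.println(storeSucceeds(new B[3]));      // prints false: A is not a subtype of B
        System.out.println(storeSucceeds(new C[3]));      // prints false: A is not a subtype of C
        // NumberFormatException is a subtype of IllegalArgumentException.
        System.out.println(handledBy(new NumberFormatException()));
    }
}
```

Note that the static type of x is Object[] in every call, so the compiler accepts all of them; only the runtime check distinguishes the cases.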
² Various mechanisms of static analysis have been proposed to eliminate this requirement, but none of these have been implemented.

1.2 Problem Definition

The problem we deal with is defined formally as follows. A hierarchy is a partially ordered set (T, ⪯) where T is a set of types³ and ⪯ is a reflexive, transitive and anti-symmetric subtype relation. If a and b are types, and a ⪯ b holds, we say that they are comparable, that a is a subtype of b and that b is a supertype of a. Given a hierarchy (T, ⪯), |T| = n, the subtyping problem is to build a data structure supporting queries of the sort a ⪯ b. This data structure is called an encoding of the hierarchy.

The subtyping problem has enjoyed considerable attention recently (see e.g., [Kaci et al. 1989; Agrawal et al. 1989; Caseau 1993; Habib and Nourine 1994; Capelle 1994; Fall 1995; 1996; Krall et al. 1997; Vitek et al. 1997; Habib et al. 1999; van Bommel and Beck 2000; Raynaud and Thierry 2001; Filman 2002; Palacz and Vitek 2003; Corsaro and Cytron 2003]). The challenge of implementing subtyping tests is to simultaneously optimize its four complexity measures:

(1) Space. Encoding methods associate certain data with each type. We measure the average number of bits per type, also called the encoding length. Note that we do not include in the measure the space consumed by each object. Although the space overhead per object depends on the object layout model, it is usually assumed that each object includes a pointer to a type information record.

(2) Instruction count. This is the number of machine instructions in the test code, on a certain hardware architecture. There are indications [Vitek et al. 1997] that the space consumed by the test code, which can appear many times in a program, can dominate the encoding length.
An encoding is said to be uniform⁴ if there exists an implementation of the test code in which the instruction count does not depend on the size of the hierarchy. Only uniform encodings will interest us.

(3) Test time. The time complexity of the test code is of major interest. Since the test code might contain loops, the time complexity may not be constant even in uniform encodings. Our main concern here is encodings with constant-time tests (which are always uniform). To improve timing performance, loops in tests that are not constant-time may be unrolled, giving rise to a non-constant instruction count, without violating the uniformity condition. (Bit-vector encoding, presented in Sec. 4, is an example of a uniform encoding which is not constant-time.)
In explicit subtyping tests and in type casts, the supertype b is known at compile time. Therefore, the generated subtyping test code can be specialized, by pre-computing values that depend only on b and using them in the test code. Specialization thus benefits both instruction count and test time, and may even reduce the encoding length.

(4) Encoding creation time. Another important complexity measure is the time for generating the actual encoding, which can be large. It is essential that a compiler be able to finish its computation in a reasonable time. In many cases, this is not possible. For example, the problem of finding an optimal bit-vector encoding was proved to be intractable [Habib and Nourine 1994].

³ The distinction between type, class, interface, signature, etc., as it may occur in various languages does not concern us here. We shall refer to all these collectively as types.

⁴ The term is borrowed from circuit complexity. A family of circuits for the size-dependent incarnations of a certain problem is called uniform if this family can be generated by a single Turing machine.

Heuristics of bit vector encoding [Kaci et al.