Eric Allen, David Chase, Victor Luchangco, Jan-Willem Maessen, Guy L. Steele Jr.
Sun Microsystems Laboratories, Burlington, MA 01803
.@sun.com
ABSTRACT
Programs that manipulate physical quantities typically represent these quantities as raw numbers corresponding to the quantities' measurements in particular units (e.g., a length represented as a number of meters). This approach eliminates the possibility of catching errors resulting from adding or comparing quantities expressed in different units (as in the Mars Climate Orbiter error [11]), and does not support the safe comparison and addition of quantities of the same dimension. We show how to formulate dimensions and units as classes in a nominally typed object-oriented language through the use of statically typed metaclasses. Our formulation allows both parametric and inheritance polymorphism with respect to both dimension and unit types. It also allows for integration of encapsulated measurement systems, dynamic conversion factors, declarations of scales (including nonlinear scales) with defined zeros, and nonconstant exponents on dimension types. We also show how to encapsulate most of the "magic machinery" that handles the algebraic nature of dimensions and units in a single metaclass that allows us to treat select static types as generators of a free abelian group.

Categories and Subject Descriptors
D.1.5 [Programming Techniques]: Object-Oriented Programming; D.3.1 [Programming Languages]: Formal Definitions and Theory; D.3.3 [Programming Languages]: Language Constructs and Features

General Terms
Design, Languages

OOPSLA'04, October 24-28, 2004, Vancouver, British Columbia, Canada. Copyright 2004 Sun Microsystems, Inc. All rights reserved. ACM 1-58113-712-5/03/0010.

1. INTRODUCTION
Physical units and dimensions are a commonly used computational construct in science and engineering, but there is relatively little support for them in programming languages. Instead, physical quantities are typically represented as numbers that correspond to the quantities' measurement in a particular unit (e.g., a length represented as a number of meters); the unit and dimension of the measurement are not represented in the program at all (except possibly in the names of variables used to hold these numbers). Thus, the units and dimensions cannot be used in the analysis of the program. In particular, they cannot be used to detect errors resulting from mismatched quantities. While trivial, such errors are not uncommon, and can have costly repercussions. For example, the loss of the Mars Climate Orbiter in September 1999 was ultimately traced to a failure in the software to convert between English and metric units [11]. In another example, a space shuttle rolled itself away from the Earth in response to an instruction to bounce a laser off a mountain it was told was 10,000 miles high, rather than the intended 10,000 feet [24]. Units were not programmed as part of the input data and therefore all inputs were implicitly interpreted as measurements in miles.

At least superficially, dimensions and units bear a strong similarity to types. However, most contemporary type systems, even with generic types, are insufficiently expressive to capture several properties necessary for dimension checking. For example, dimension checking at compile time requires the compiler to manipulate dimensions according to a dimensional algebra (e.g., Time × Length = Length × Time). Most previous formulations of dimension checking in programming languages have encoded dimensions as an ad hoc language feature rather than integrating them into a general type system. The few attempts at integrating dimensions into more general type systems have focused on structural type systems over functional and procedural languages with significantly different design considerations than those of mainstream object-oriented languages [20].

In this paper, we show how to integrate dimensions and units with classes in a nominally typed object-oriented language (such as the Java™ Programming Language or C#) in a way that supports static checking of units and dimensions, and with a performance cost no greater than that of boxing primitive values.[fn. 1]

[fn. 1] Boxing is a technique that enables a primitive value (e.g., 0) to be used in contexts requiring a reference value by wrapping the primitive in a new reference (new Integer(0)). It can negatively impact execution time because boxed values must be dereferenced when accessed. However, clever compiler optimizations, particularly in just-in-time compilers, can eliminate much of this overhead by specializing code with variants for unboxed values in contexts where observable behavior is unaffected.

Our formulation allows dimensions and units to be treated as ordinary class-based types, and allows both parametric and inheritance polymorphism with respect to both dimension and unit types. We describe the new language features required to support this integration in the context of the MixGen extension [1] of the Java Programming Language, where generic types are first-class. MixGen allows type-dependent operations such as casts on arbitrary generic types, and even expression of mixins (i.e., generic classes parameterized by their own superclasses).

First-class generic types play a crucial role in our formulation of dimensions. We present our language as an extension of MixGen to ground it in a sound, statically typed object-oriented language that supports first-class generic types. A formulation in the context of C++ templates, in particular, would not meet this criterion. MixGen includes many subtle features to handle pathologies that do not arise in our treatment of dimensions and units (for example, support for polymorphic recursion, hygienic method overriding, and nontrivial with clauses); we avoid discussing these features. Therefore, we expect our formulation to be accessible to readers familiar only with C++, C#, or the Java Programming Language, and with the notion of encoding mixins via generic types, as can be done with C++ templates. Also, because our real interest is in understanding the relationships of dimension and unit checking with object-oriented type systems in general, we make no attempt to maintain backward compatibility when extending MixGen.

Our main extension to MixGen is the addition of statically typed metaclasses, which we use to model values that have associated dimensional units. We also define a special metaclass abelian class that provides the algebraic properties necessary to model dimensions and units accurately. Although we strive to introduce as few features as possible, the language we end up with is quite different from the language we started with; we dub our new language MetaGen.

In Section 2, we briefly analyze some concepts important to dimension and unit checking, discuss design constraints for a system to do this checking, and explain why existing object-oriented languages do not satisfy these constraints. In Section 3, we describe several common uses of physical quantities for which we want to have static checking, and the features of MetaGen that support this ability, and we describe further extensions in Section 4. We discuss various syntactic and performance issues in Section 5, survey related work in Section 6, and conclude with future directions in Section 7.

2. ANALYSIS AND DESIGN CHALLENGES
In this section, we lay out the basic framework and terminology that we use throughout this paper. Although the view we take is not the only way to understand dimensions and units, we have found it useful for elucidating many design issues. We also discuss several pragmatic requirements that influence our design.

2.1 Basic terminology and analysis
The fundamental notion in our formulation is that of a quantity. Every quantity exists in a particular dimension[fn. 2] (e.g., Length, Time). When two quantities Q and Q′ exist in the same dimension, we can answer the question, "how much of Q does it take to equal Q′?"

[fn. 2] We adopt for now the naïve view that every quantity exists in a single, fixed dimension. In Section 3.6, we discuss some ways in which this naïve view needs to be refined.

Some quantities are designated units and are denoted with special symbols (e.g., Meter, Second). Every quantity can be denoted by a measurement in terms of any unit U in its dimension. If it takes x of U to equal Q then we denote Q with the measurement xU, and we call x the magnitude of this measurement. For example, the denotation 5 Meters is a measurement for a quantity in the dimension Length whose magnitude in terms of the unit Meter is 5.

One advantage of using units is that it allows us to compare two measurements of the same dimension algebraically. To express a quantity Q denoted by a measurement M = xU in unit U as a measurement in another unit V, we need not remeasure the same quantity in V (which is seldom practical, and often impossible). Instead, we can measure our unit U with unit V, resulting in a measurement yV; we say y is the conversion factor from U to V. Then we can form the measurement M′ of Q in unit V by substituting yV for U in M: M = xU = xyV = (xy)V. Two measurements in the same unit can be compared by comparing their magnitudes arithmetically.

Magnitudes can be taken from an arbitrary algebraic field. For now, we assume that magnitudes are always real numbers (i.e., instances of a class Real, which we intentionally leave undefined). Furthermore, we use the standard literals and mathematical operators to denote real numbers and their methods, and we do not worry about errors due to computational limitations such as imprecision and overflow. We discuss taking magnitudes from arbitrary fields in Section 4.2, and implementation issues in Section 5.2.

We can also combine quantities of distinct dimension via multiplication and division to form new quantities. Such a quantity exists in a new dimension denoted by combining the respective dimensions with combinators × and /, which satisfy certain algebraic properties (e.g., Length × Time = Time × Length). The set of dimensions is an abelian group under × [21]. We define a special dimension, called unity, to serve as the multiplicative identity for dimensions, and we denote the inverse of a dimension D by $D^{-1}$.

Units of combined measurements are combinations of units of the constituent measurements. Just as we can have several units of a given primary dimension, we can have several derived units for a given derived dimension. It is also possible to define new primitive units of combined dimensions.

2.2 Design Constraints
We want a design that accurately models our understanding of dimensions and units. In addition, we impose the following pragmatic design constraints:

1. It should be easy to convert a measurement in one unit to a measurement in another unit of the same dimension.

2. Type-dependent operations on dimensions and units, such as casting and instanceof tests, as well as the ability to print and store measurements, should be supported.

3. It should be possible to require that a quantity be denoted by a measurement in a specific unit (e.g., a length that must be measured in meters) as an additional error check, to avoid unnecessary conversions, and to manage the inaccuracies inherent in floating point calculations.

4. Programmers should be able to define new units and dimensions and to define behavior specific to that dimension (e.g., a method dilate for Time). Even if all programmers and users were willing to use the SI system, there are contexts in which units and dimension checking can be employed (currency, information, etc.) that are not part of this system.

5. We should allow for as much polymorphism as possible. In particular, we should support type parameters for dimensions and units.

any unit of D. We may also want to declare abstract methods such as shortName on dimensions, or a method devices that returns a collection of the devices used to measure quantities of a dimension. We cannot represent individual dimensions as subclasses of Dimension because the instances of particular dimensions (naturally identified as quantities in that dimension) should not be used in contexts requiring a dimension. If we model quantities as instances of their corresponding dimension class, and we model dimension classes as subclasses of Dimension, then by transitivity of subtyping, every quantity would also be an instance of Dimension, which is clearly wrong: 5 Meters is not an instance of Dimension.

We might try to break this inheritance relationship with a shallow workaround, such as treating dimensions as subclasses of Dimension and introducing a parametric class Quantity to model quantities of a particular dimension:

    abstract class Dimension {...}
    class Length extends Dimension {...}
    class Time extends Dimension {...}
    ...
    class Quantity<D extends Dimension> {...}

Then lengths would be instances of Quantity<Length> and times would be instances of Quantity<Time>.
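The transitivity problem raised in this section can be reproduced in any language with nominal subclassing. The following is a minimal sketch in Python, used here purely for illustration and not the paper's MetaGen; the names `Dimension` and `Length` mirror the paper's classes, while the `meters` field and the variable names are hypothetical. It models a dimension as a subclass of `Dimension` and a quantity as an instance of its dimension class:

```python
class Dimension:
    """Root class intended to classify dimensions such as Length and Time."""


class Length(Dimension):
    """A dimension modeled (naively) as a subclass of Dimension."""

    def __init__(self, meters: float) -> None:
        self.meters = meters  # hypothetical field holding the magnitude


# A quantity modeled as an instance of its dimension class.
five_meters = Length(5.0)

# Subtyping is transitive, so the quantity is also classified as a Dimension.
# This is the misclassification the text objects to.
quantity_is_dimension = isinstance(five_meters, Dimension)
```

Under this encoding `quantity_is_dimension` is true, so a method expecting a dimension would accept the quantity 5 Meters without complaint.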
The generalization of class relationships turns out to be the key language feature needed for integrating dimensions and units with object-oriented types. Furthermore, this generalization has applicability beyond just dimension checking. It allows us to more accurately model many mathematical structures based on set membership and subsets. Metaphorically, we can think of classes as sets. Class extension can be thought of as the subset relationship, and instanceof can be thought of as set membership. In this view, allowing classes to contain other classes is quite natural.

We now describe the core of the new semantics of MetaGen, an extension of the MixGen extension of the Java Programming Language that allows us to express the type relationships we want.

    class Measurement<Dimension D, Unit<D> U> extends D {
        private final Real _magnitude;
        ...
    }

The special kind class* is introduced to designate a class whose instances are all instance classes. Unit is defined to be of kind class* so we can enforce the property that every unit will be associated with its own type. An instantiation of D must be an instance of Dimension (and a subtype of Object). For example, we can define the following instance classes:

    Unit<Time> Minute {...}
    Unit<Length> Foot {...}
    Unit<Length> Mile {...}
    ...

Both Unit<D> and Measurement<D,U> extend their dimension D, allowing us to refer to the type D in places we expect measurements of D (e.g., we can say that a method m takes a Length and call m with new Measurement<Length,Mile>(5), denoting 5 Miles). We also get a single point of control for specifying functionality specific to a particular dimension D: the class D itself. We stipulate that a naked type parameter occurring in an extends clause or as a kind of a type parameter must be instantiated only with classes that contain a zeroary constructor and no abstract methods (ensuring there is a superconstructor to call and that a mixin will not accidentally inherit abstract methods without defining them).[fn. 3] Because Unit and Measurement both have a superclass type parameter of kind Dimension, they implicitly require that all instance classes of Dimension include a zeroary constructor.

Because the unit of a measurement is a type parameter, we need not add an extra field to hold the unit; that is, the implementation need not include an extra unit reference in each instance. As in MixGen, instantiations of type parameters of a generic type can be stored in a class table entry shared by all instances of that instantiation. Therefore, unit support is no more expensive than boxing provided that magnitudes are represented with primitive values.

3.1.1 Static members and instance members
As in MixGen, a class definition includes both static members and instance members. Each member is either a field or a method (we leave open the question of how to integrate inner classes into our metaclass semantics).

If C is an instance class of D then the instance members of D are static members of C (i.e., members of the class object C). C may override the instance methods of D by declaring static methods with matching signatures. If D includes abstract methods (via either inheritance or direct definition), those methods must be defined as static methods in C. For example, we add the following definitions to our dimension classes:

    abstract class Dimension {
        ...
        abstract String shortName();
    }

    Dimension Length {
        ...
        static String shortName() { return "L"; }
    }

    Dimension Time {
        ...
        static String shortName() { return "T"; }
    }

In MetaGen, a class is allowed to define both a static method and an instance method of the same name. However, this allowance introduces the potential for ambiguity when a method so defined is called from within an instance method of the class it is defined in. Our solution is simply to require references to static members from instance contexts to explicitly denote the receiver. The receiver of a static method call may be any legal type, including a type parameter.

Static methods of C must not conflict with final instance methods of C's kind and its superclasses. As with MixGen, static methods must not be abstract. The static members of a class define the members of that class when considered as a value. They are not members of a class's instances, nor are they inherited by subclasses. For example, we cannot write the expression Meter.shortName().

3.1.2 Constructors and instance classes
As with instance methods, the instance fields of a kind D of C are static fields of C. Because these fields are normally initialized with a call to a constructor of D, we must answer the question of how they are initialized for an instance class such as C. We allow such fields to be initialized as follows: class C is allowed a single isolated constructor call to its kind in a static block. This constructor call must occur before all other static actions in the class definition. The new keyword container is used to designate this call.

For example, we define an instance class of class Measurement that we name CircumferenceOfEarth. This class must call the constructor of Measurement from a static block:

    Measurement CircumferenceOfEarth {
        static { container(24889); }
    }

Then CircumferenceOfEarth, viewed as an instance, has a single field magnitude initialized with value 24889. Viewed as a class, it has a single static field magnitude with value 24889.

If the immediate containing class of a class C is known to have a zeroary constructor and the definition of C does not include a call to a containing constructor in a static block, then C acts as if the static block static { container(); } were inserted at the beginning of the body of C.

3.1.3 Classes as values
Henceforth, we use variable to refer to fields, local variables, and method parameters (but not type parameters).
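The relationship between a kind and its instance classes described in Section 3.1.1 has a loose analogue in Python metaclasses, which may help readers build intuition. The sketch below is only an analogy under stated assumptions: Python checks nothing statically, the `_short_name` attribute and the `meters` field are hypothetical, and `short_name` and `dim_name` are Python renderings of the paper's shortName and dimName.

```python
class Dimension(type):
    """Metaclass: its instances are classes (dimensions), not quantities."""

    def short_name(cls) -> str:
        # An instance method of the metaclass acts like a static member
        # of every class that is an instance of Dimension.
        return cls._short_name


class Length(metaclass=Dimension):
    _short_name = "L"  # hypothetical per-dimension data

    def __init__(self, meters: float) -> None:
        self.meters = meters


class Time(metaclass=Dimension):
    _short_name = "T"


def dim_name(d: Dimension) -> str:
    """Takes a class as a value, provided it is an instance of Dimension."""
    return d.short_name()


five_meters = Length(5.0)
```

Here `Length` is an instance of `Dimension`, whereas `five_meters` is an instance of `Length` and not of `Dimension`, so the transitivity problem of the subclass encoding does not arise. The method `short_name` can be called on `Time` itself but is not visible on instances of `Length`, mirroring the rule that static members are not members of a class's instances.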
The expression Time.shortName() evaluates to "T". Also, if we defined a method:

    String dimName(Dimension d) { return d.shortName(); }

then the method call dimName(Time) evaluates to "T" as well.

[fn. 3] Readers familiar with MixGen semantics may note that our restrictions on type parameters allow us to drop the use of MixGen's with clauses on type parameters.

A class can be assigned to a field or variable and passed as an argument to a method, provided that it is an instance class of the variable's type.

In general, a variable may be bound to either an instance class or an ordinary (non-class) instance. Member accesses (i.e., field accesses and method calls) to a variable bound to an instance class are interpreted as accesses to static members of the instance class. Member access to a non-class instance behaves in the usual way.

A variable may not be used in type contexts (e.g., the type of a method parameter or the type specified in a cast operation), even if it is bound to an instance class. Also, there is no way to call a constructor of a bound instance class on a variable it is bound to (we are intentionally ignoring interactions of metaclasses with reflection facilities). For example, the constructor call in the following code is not allowed:

    Length measurementMaker(Measurement m) {
        return new m(); // not allowed
    }

A type parameter of kind K can be used in all type contexts. It can also be used in any context requiring a value of class K. In particular, it can be assigned to variables of class K and instance methods of K may be called on it, just like ground instances or other variables of class K. For example, we include the following method definition in class Unit:

    class* Unit<Dimension D> extends D {
        ...
        String dimName() { return D.shortName(); }
    }

Furthermore, Minute.dimName() evaluates to "T".

With this core semantics in hand, we now discuss advanced aspects of our library definition, as well as some peripheral language features motivated by it. Many of these additional aspects are quite complex, but this complexity results precisely from our intention to provide programmers with the behavior they intuitively expect. It is often the case with language design that a semantics that correctly matches programmers' intuitive expectations necessarily includes subtleties that are seldom thought about or even noticed by most users of the language.

3.2 Unit conversion and primary units
To convert between measurements in different units of the same dimension, we must specify conversion factors between various units of that dimension. A natural place to keep this information is in the definition of a unit: each unit specifies how to convert measurements in that unit to measurements in any other defined unit (for the same dimension). Although the number of such conversion factors is quadratic in the number of units, it is not necessary to maintain so many factors explicitly: if we can convert between measurements in units A and B, and between measurements in units B and C, then we can convert between measurements in A and C via B. Thus, it is sufficient to include in the definition of every unit a single conversion factor to a primary unit of that dimension, and convert between any two commensurable units via their common primary unit.

    class* Unit<Dimension D> extends D {
        final private Real _primaryConversion;

        Unit(Real primaryConversion) {
            if (primaryConversion == 0) {
                throw new RuntimeException(
                    "Attempt to define a unit with a 0 conversion."
                );
            }
            _primaryConversion = primaryConversion;
        }

        Real getPrimaryConversion() { return _primaryConversion; }

        <Unit<D> U> Real conversionFactor() {
            return _primaryConversion / U.getPrimaryConversion();
        }

        <Unit<D> U> Measurement<D,U> inUnit() {
            return new Measurement<D,U>(conversionFactor());
        }
        ...
    }

The introduction of primary units also allows us to avoid cyclic dependencies in unit definitions.

Note that the constructor of class Unit ensures that no instances of Unit will have a _primaryConversion of 0. Because this constructor is called in static blocks of each instance class, we catch such violations when classes are loaded.

We still need to reify primary units with separate classes so we can refer to them. Given a dimension D, we want to be able to easily obtain its primary unit. We also need to ensure that each dimension has exactly one primary unit. We meet both of these requirements by defining a new class PrimaryUnit, parameterized by a dimension. Each instantiation of this class with a particular dimension D is the primary unit of D. We enforce that class PrimaryUnit has no subclasses or instances by introducing a new modifier, final*:

    final* Unit<D> PrimaryUnit<Dimension D> {
        static { container(1); }
    }

The final* keyword indicates that PrimaryUnit has no subclasses or instances. Conceptually, we view final* classes as typed terminal values; they act like ordinary values with unique types that could be represented in a more clumsy fashion with the Singleton Pattern.

This solution gives a syntactically enforced bijection between dimensions and primary units: the primary unit of dimension D is PrimaryUnit<D>. For convenience and readability, we use aliases to identify primary units with familiar units. For example,

    alias Meter = PrimaryUnit<Length>;

Using the conversion factors defined by the units, we can easily convert between measurements in different units:

    class Measurement<Dimension D, Unit<D> U> extends D {
        Real magnitude;
        ...
        <Unit<D> V> Measurement<D,V> inUnit() {
            return new Measurement<D,V> (
                magnitude * U.conversionFactor()
            );
        }

        Measurement<D,PrimaryUnit<D>> asPrimary() {
            return new Measurement<D,PrimaryUnit<D>> (
                magnitude * U.getPrimaryConversion()
            );
        }
        ...
} The methods inUnit and asPrimary require calling a static group. For notational convenience, we also parameterize method on the type parameter U. Static checking ensures abelian class with two symbols, included in parentheses, that each instance class of Unit defines conversionFactor that denote (i) the binary operator of the group and (ii) a and getPrimaryConversion. repetition operator of the group, which takes an element of the abelian group and an integer n, and yields the result of We can also name commonly used, algebraically formed di- repeated application of the binary operator to the element mensions using type aliases. For example, we use Area to n times (where, if n is negative, application to an element n 2 Length times is defined to denote the inverse of the result of appli- refer to the dimension Length and Speed to refer to Time . We now turn turn to the question of how to handle alge- cation to that element |n| times). We write an instantiation braically formed dimensions such as these. of type abelian class as follows:
3.3 Dimensional algebra abelian class (bin op,rep op ) name {...} Like quantities, dimensions and units can be combined via The repetition operator is given higher precedence than the multiplication and division operators. We might (na¨ıvely) binary operator. Because we are modeling a multiplicative try to model these operators as classes parameterized by semantics in this section, we use * as the binary operator and their operands: ^ as the repetition operator. For readability, we represent applications of ^ with superscripts. Dimension DimProd {...} Dimension DimQuot {...} We refer to the set of explicitly defined instance classes of an abelian class G as the base types of G. The set of instances This model is unsatisfactory because the algebraic structure of G is the set generated by the following rules: of dimensions requires that we equate certain terms. For ex- ample, Length×Time = Time×Length. Furthermore, the units of a given dimension can be combined as well. Combined • All base types of G are elements of G. units obey algebraic rules, and the subtyping relationships between units and their dimensions should be respected dur- • If T and U are elements of G then T *U is an element ing algebraic manipulation. of G. Another complication is that combined forms of dimensions • If T is an element of G then for all n ∈ , T n is an and units may contain integers. For example, consider the Z element of G. parameter type of a sqrt method over quantities (where we part with the Java Programming Language specification by using ^ to refer to exponentiation rather than exclusive or): Additionally, generated elements are equated according to the following rules: D sqrt(D^2 x) { ... } • T *U = U*T Notice that the parameter type contains a numeric exponent • T *(U*V ) = (T *U)*V (namely 2). We want to allow combined dimensions that can be raised to (constant) integer exponents. • T 0 = identity
3.3.1 Abelian classes • T 1 = T Rather than support algebraic manipulation in an ad hoc fashion, we introduce a more general language extension m n m+n • T *T = T that gives us the expressiveness we need: a new (special) metaclass abelian class. As with class*, a class of kind abelian class must have only instance classes. The explic- These rules imply all of the ordinary rules for manipulating itly declared instance classes of an abelian class form the integer exponents that we make use of in this paper. identity and basis of a free abelian group.4 Metaclass abelian class is parameterized by an identity element that desig- Now we can redefine class Dimension as an abelian class, with nates the instance class that serves as the identity for the the special dimension Unity as its identity element: 4An abelian group is a set A together with an operator, often denoted + : A, A → A, that satisfies the following properties: abelian class (*,^) Dimension {...} • There exists i ∈ A such that a + i = a for all a ∈ A; i Dimension Unity {...} is called the identity of A. ... Dimension Length {...} • For any a ∈ A, there exists a0 ∈ A such that a+a0 = i; 0 Dimension Time {...} a is called the inverse of a in A. ... • For all a, a0, a00 ∈ A,(a + a0) + a00 = a + (a0 + a00). • For all a, a0 ∈ A, a + a0 = a0 + a. Then in addition to Length, Time, etc., class Dimension also contains Length∗Time-2, etc. 3.3.2 Abelian constructors stract methods. For example, class Dimension includes the For instance classes B and C of an abelian class A, the class B*C abstract method shortName. Abstract methods of an abelian includes (as static fields) all instance fields in the definition class A may be defined for algebraically formed instance of A. It does not include any of the fields defined in B or classes by defining them in the abelian class itself. Methods C. 
Initialization of the fields of B*C is defined by a special defined this way must include definitions of behavior under constructor in A labeled by the binary operator of A. The combination by both the binary operator and the repeti- arguments to this constructor must be exactly two elements tion operator of A, and these definitions must be prefixed of type A, denoting the two instance classes being combined by the binary and repetition operators. The definition pre- to form the new instance. Similarly, initialization of Bn is fixed by the binary operator includes two type parameters defined by a special constructor in A labeled by the repetition of kind A. The definition prefixed by the repetition operator operator of A. This constructor takes an argument of type includes a type parameter of kind A and a final int param- A and a final int n, denoting the numeric argument to the eter. For example, abstract method shortName is defined for repetition operator. The argument n is constrained by static algebraically formed instance classes of Dimension by adding checking to be a constant (we discuss a generalization to the following method definitions to class Dimension: nonconstant exponents in Section 4.1). We refer to both of these constructors as abelian constructors. Notice that an abelian constructor is shared by all algebraically formed * String shortName() { return E.shortName() + "*" + F.shortName(); elements of A. }
If A is a generic class, we need to declare type parameters on the abelian constructors to denote the instantiations of the ^ String shortName(final int n) { type parameters for a particular instance class. For example, return E.shortName() + "^" + n; we revise our definition of class Unit to be an abelian class, } with PrimaryUnit as its identity element, and abelian constructors as follows: In the example above, does Time*Length.shortName() eval- abelian class (*,^) > Unit uate to "L*T" or "T*L"? The answer would appear to de- extends D pend on the order of the arguments. However, by the alge- { braic properties of abelian classes, Time*Length is the same final private Real _primaryConversion; as Length*Time. To allow methods such as shortName to be well-defined, we need to canonicalize all combined abelian *(Unit V, Unit W) { this(V.getPrimaryConversion().multiply( types. More importantly, canonicalization is a natural way W.getPrimaryConversion() to statically determine type equivalence. We discuss a pro- )); cedure for canonicalization in the next section. } Before any constructors are called on an algebraically formed ^(Unit V, final int n) { instance of an abelian class, the instance is canonicalized. this(V.getPrimaryConversion().power(n)); n } After canonicalization, the terms B of the canonicalized ... product are initialized from left to right via the abelian con- } structor prefixed by the repetition operator. Then combi- nations of these terms are initialized from left to right via For generic abelian classes, we impose restrictions on the re- the abelian constructor prefixed by the binary operator. spective instantiations of type parameters for the elements of a combination. 
The instantiations of a type parameter whose kind is not an abelian class must match; the type parameter of the combination is instantiated with the same type. The instantiations of a type parameter whose kind is an abelian class are combined pointwise in the new instantiation. For example, because Meter is an instance class of Unit and Second is an instance class of Unit, Meter*Second is an instance of Unit. In general, the type Unit*Unit equals the type Unit. Likewise, Unit^n equals Unit.

The identity element of a generic abelian class is also constrained: every type parameter of an identity element whose kind is an abelian class must be instantiated with the identity element of its kind. For example, the identity of Unit must be an instance of Unit because Unity is the identity of Dimension.

An algebraically formed instance of an abelian class A includes (as static methods) all instance methods defined in the definition of A. But an abelian class may also have ab-

3.4 Canonicalization of abelian type references
To statically check type equivalence between members of an abelian class G, we identify members of G and put them in canonical form so they can be compared directly.

Ignoring for the moment the complication of type parameters, elements of G can be identified syntactically: they are simply type expressions whose base types are all elements of G that are combined solely via the two operators of G. To put elements of G into canonical form, we perform the following steps:

1. Replace all occurrences of a base type T (no exponent) with $T^{1}$.
2. Distribute exponents so we are left with a product of base types that are raised to various powers.
3. Eliminate all parentheses.
4. Put base types into lexicographic order.
5. For each occurrence of base type T, combine all tokens $T^{n_i}$ into a single token $T^{\sum n_i}$.
6. Eliminate all base type tokens of exponent 0.
7. Eliminate all occurrences of the identity element of G.
8. If nothing is left but the empty product, add a single occurrence of the identity element.

For example, the (improbable) dimension

$$(\mathit{Length}^{2} * \mathit{Time}^{-2} * \mathit{Unity})^{4} * \mathit{Length}^{-8} * \mathit{Time}^{8}$$

is canonicalized as follows:

$$\begin{aligned}
&\mapsto (\mathit{Length}^{2} * \mathit{Time}^{-2} * \mathit{Unity}^{1})^{4} * \mathit{Length}^{-8} * \mathit{Time}^{8} \\
&\mapsto (\mathit{Length}^{8} * \mathit{Time}^{-8} * \mathit{Unity}^{4}) * \mathit{Length}^{-8} * \mathit{Time}^{8} \\
&\mapsto \mathit{Length}^{8} * \mathit{Time}^{-8} * \mathit{Unity}^{4} * \mathit{Length}^{-8} * \mathit{Time}^{8} \\
&\mapsto \mathit{Length}^{8} * \mathit{Length}^{-8} * \mathit{Time}^{-8} * \mathit{Time}^{8} * \mathit{Unity}^{4} \\
&\mapsto \mathit{Length}^{0} * \mathit{Time}^{0} * \mathit{Unity}^{4} \\
&\mapsto \mathit{Unity}^{4} \\
&\mapsto \text{(empty product)} \\
&\mapsto \mathit{Unity}
\end{aligned}$$

Once elements of G are in canonical form, type equivalence can be determined via syntactic equivalence.

Generalizing to allow type parameters of kind G is straightforward: In a lexical scope binding new type parameters, we identify type parameters of kind G as elements of G, just as if they were new base types of G. Elements of G can be built with these type parameters just as they can with true base types. Of course, these type parameters may be instantiated with elements of G that are not base types; we merely treat them syntactically as base types within the scope that binds them. Within that scope, canonicalized elements of G are equated if and only if they are syntactically equivalent.

References to instance classes of abelian classes that are syntactically contained in generic type instantiations are also canonicalized. For example, because PrimaryUnit is parameterized by a dimension, our solution for algebra over Dimension automatically allows us to form and compare primary units of compound dimension. Thus,

3.5 Arithmetic operations on quantities
To use an arbitrary instance of a dimension D in arithmetic contexts, we need to declare that instances of D define arithmetic operations; that is, all instances of D must include a set of arithmetic methods. To express such a restriction, we add a new class modifier to our language. To motivate this new modifier, let us reconsider our metaphor of classes as sets. We can understand conventional class extension declarations class C extends D as an assertion of the following form:

    ∀x ∈ C . x ∈ D

When the only relationship between classes is extension, this simple form of assertion is sufficient to state the most essential relationships. Once we generalize our language to include metaclasses, several new assertions suggest themselves. For example, consider a class hierarchy for biological applications with classes Species and LivingThing. We want to declare that all instances of Species are subtypes of LivingThing.

In general, we may want to make assertions of the following form, relating classes to their subclasses and instances:

    ∀x ∈ C . x ⊆ D

To express this new relation, we add an optional extends* clause to the header of a class definition. For example, we can represent the constraint we would like on instances of Species as follows:

    abstract class LivingThing {}
    abstract class Species extends* LivingThing {}
    Species Human extends LivingThing {}
    Species Lion extends LivingThing {}

Then Human and Lion would be invalid class definitions if they did not extend LivingThing. We also allow for an implements* clause that constrains instance classes to implement designated interfaces.⁵

As with instances of a class of kind class*, all instances of a class that includes either an extends* or an implements* clause in its definition must be instance classes.

We add an implements* clause to class Dimension, requiring that each instance class D of class Dimension implements an interface DimensionI that contains a set of arithmetic operations. In this way, we ensure that each instance of D will define those methods.
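Python's metaclasses give a rough run-time analogue of the extends* clause on Species. The sketch below is an illustration only: Python checks the constraint when a class definition is executed, whereas the paper's proposal checks it statically. The helper `try_define_rock` and the class `Rock` are hypothetical names introduced for the example.

```python
class LivingThing:
    """Ordinary base class that every species is required to extend."""


class Species(type):
    """Metaclass standing in for a class with an extends* LivingThing clause."""

    required_base = LivingThing

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Reject any instance class that is not a subtype of the required base.
        if not issubclass(cls, mcls.required_base):
            raise TypeError(f"{name} must extend {mcls.required_base.__name__}")
        return cls


class Human(LivingThing, metaclass=Species):
    pass


class Lion(LivingThing, metaclass=Species):
    pass


def try_define_rock():
    """Attempt to define an instance of Species that does not extend LivingThing."""
    try:
        class Rock(metaclass=Species):
            pass
    except TypeError as err:
        return str(err)
    return None
```

Here Human and Lion are accepted because each is both an instance of Species and a subclass of LivingThing, while the attempted definition of Rock is rejected. An implements* clause could be approximated the same way by checking that the new class defines the required methods.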
    PrimaryUnit

equals

    PrimaryUnit

For abelian classes with type parameters, an additional step is necessary for canonicalization: instantiations of common generic abelian classes are combined pointwise. For example,

    Unit^2*Unit

is canonicalized to:

    Unit

The signatures of the arithmetic methods of an instance of dimension D must refer to D itself:

    D add(D x);
    D subtract(D x);
    D*E multiply(E x);
    D*E^-1 divide(E x);
    U> Measurement inUnit();

The method inUnit allows us to convert an arbitrary quantity of D into a measurement in a given unit U of D so we can add it to, or subtract it from, another measurement in U.

⁵ Of the two remaining variations, ∀x ⊆ C . x ⊆ D is logically equivalent to the constraint for ordinary subtyping, and we know of no motivation in practice for stipulating constraints of the form ∀x ⊆ C . x ∈ D.

How can we put these declarations in an interface in such a way that the interface has a handle on the type of an instance class of Dimension? We need to include a type parameter on the interface to represent the type of an instance class of Dimension. The bound on this type parameter will therefore be our interface instantiated with the type parameter itself:

    interface DimensionI This> {
      U> Measurement inUnit();
      This add(This x);
      This subtract(This x);
      This*That multiply(That x);
      This*That^-1 divide(That x);
      U> Measurement inUnit();
    }

We now need to redefine Dimension to declare that each of its instance classes implements an instantiation of this interface with itself. We do that as follows:

    abelian class (*,^) Dimension This>
    implements* DimensionI {...}

Our instance classes of Dimension now need to instantiate this parameter with themselves:

    Dimension Length {...}
    Dimension Time {...}

The tight bound on parameter This ensures that each instance class of Dimension must instantiate This with itself. Therefore, it is straightforward to introduce syntactic sugar that implicitly includes a type parameter This on every class of kind class* or abelian class and on every class that includes an extends* or implements* clause. We need only reserve the identifier This as a new keyword.

notion of dimension for nonzero quantities, and then add in the notion of the zero quantity as special.

Given this special status of the zero quantity, it is natural to ask whether there are any software engineering repercussions as to how we treat it. In fact, there are [21, 27]. We would like to allow programmers to write down zero quantities without the tedium of having to add "dummy" units. For example, we want the following code to be legal:

    Length x = ;
    if (x == 0) {...}

Treating the zero quantity as polymorphic in this way is harmless: If we can statically determine that a quantity is zero, then no harm can come from adding it to, or multiplying it by, an arbitrary quantity; in the former case, the result is simply the quantity it is added to; in the latter case the result is simply the zero quantity. On the other hand, if we cannot statically determine that a quantity is zero, then static checking will prevent it from being used polymorphically.

To include a special zero quantity, we might consider restricting magnitudes of measurements to be nonzero and then adding a new typed element Zero of class Measurement:

    Measurement Zero U> {}

However, we do not want a separate Zero for each dimension and unit. Rather, we want a single polymorphic zero quantity that is an instance of all instantiations of Measurement. We express this relationship by using wildcard instantiations of type parameters of generic classes.
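The arithmetic signatures of DimensionI and the polymorphic zero can be illustrated with a small Python sketch. It is only a run-time approximation under several assumptions: dimension mismatches raise TypeError at run time rather than being rejected statically, every magnitude is taken to be expressed in the primary unit of its dimension (so inUnit is omitted), and the names Quantity, ZERO, LENGTH and TIME are hypothetical.

```python
from dataclasses import dataclass


class _Zero:
    """The single zero quantity, usable with any dimension."""

    def add(self, other):
        return other      # 0 + q is q

    def multiply(self, other):
        return self       # 0 * q is the zero quantity


ZERO = _Zero()


def _merge(a, b, sign):
    """Combine two canonical dimension tuples, adding or subtracting exponents."""
    exps = dict(a)
    for name, n in b:
        exps[name] = exps.get(name, 0) + sign * n
    return tuple(sorted((name, n) for name, n in exps.items() if n != 0))


@dataclass(frozen=True)
class Quantity:
    magnitude: float
    dims: tuple  # sorted (dimension name, exponent) pairs, no zero exponents

    def add(self, other):
        if other is ZERO:
            return self
        if self.dims != other.dims:
            raise TypeError(f"cannot add {other.dims} to {self.dims}")
        return Quantity(self.magnitude + other.magnitude, self.dims)

    def multiply(self, other):
        if other is ZERO:
            return ZERO
        return Quantity(self.magnitude * other.magnitude,
                        _merge(self.dims, other.dims, +1))

    def divide(self, other):
        return Quantity(self.magnitude / other.magnitude,
                        _merge(self.dims, other.dims, -1))


LENGTH = (("Length", 1),)
TIME = (("Time", 1),)
```

For instance, dividing `Quantity(6.0, LENGTH)` by `Quantity(2.0, TIME)` yields a quantity whose dimension is Length*Time^-1, adding a length to a time raises TypeError, and ZERO can be added to either operand without any dimension annotation.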
With this syntactic sugar, our class definitions revert almost to their earlier form:

    abelian class (*,^) Dimension
    implements* DimensionI {...}
    Dimension Length {...}
    Dimension Time {...}
    ...

3.6 Zero quantities
When we introduced the notion of dimension, we said that if two quantities Q and Q′ exist in the same dimension, we can answer the question, "how much of Q does it take to equal Q′?" If it takes x of Q to equal Q′ then we write Q′ = xQ. We now discuss a minor complication with this naïve view.

Consider the ability of two quantities to be related in this way as a relation R. Because we naturally think of dimensions as dividing quantities into disjoint sets, we might expect that R is an equivalence relation. Unfortunately, our expectation is not quite right: although R is reflexive, it is neither symmetric nor transitive. The problem lies with zero quantities: it takes 0 of any quantity to equal a zero quantity, but no amount of a zero quantity equals a nonzero quantity. Furthermore, the symmetric closure of R does not satisfy transitivity. For example, suppose Q R Q′ and Q′ R Q″. Then it may be the case that Q′ = 0Q and Q′ = 0Q″, but that Q R Q″ does not hold. Strictly speaking, we should define the

If T is a generic type, then we can instantiate T with a wildcard, written T<*>. An instance of T<*> is an instance of all instantiations of T. Normally we cannot say such things in object-oriented languages with generic types because the notions of pointwise extension of a generic type and of extension of all instantiations of a generic type are conflated (e.g., Zero extends Measurement). But it is helpful to distinguish these notions. Because the programmer has no handle on the type argument of a wildcard instantiation of a parent class, it is syntactically impossible for the definition of such a class to depend on those arguments.

Using wildcards, we define the zero quantity to be a wildcard instantiation of Measurement:

    Measurement<*,*> Zero {...}

Although adding wildcards may seem excessive just to get a polymorphic zero quantity, wildcards are useful in other contexts. For example, wildcard instantiations are similar to a standard interpretation of many type hierarchies as they are written in structurally typed languages such as ML and Haskell, where parametric types are erased after type checking and do not affect run-time behavior. For example, lists are typically modeled in ML as follows:

    'a List = Empty | 'a Cons

In this definition, there is one Empty list shared by all instantiations of List. In contrast, Generic Java without wildcard instantiations forces us to include a separate Empty for each instantiation of List [6]:

    class Empty extends List {...}

Using wildcards, we can define class Empty as:

    List<*> Empty {...}

4. EXTENSIONS
We have presented our framework in the context of a general-purpose object-oriented type system, adding only metaclasses, abelian classes, and wildcards. With just these three extensions, we are able to express the most common uses of physical quantities in computations. In particular, we can express user-defined dimensions and units, automatic conversion between units of a given dimension, static types corresponding to dimensions and units, parametric polymorphism with respect to these types, dimensional algebra, and zero quantities of polymorphic dimension. Furthermore, our formulation is general and robust enough to extend easily in many ways to support additional static checking. In this section, we discuss some of these ways.

4.1 Nonconstant exponents
In general, we prefer that exponents on abelian classes not be limited to be constants. For example, we would like to define a power function over quantities:

    /** @precondition n >= 0 */
    D^n power(D x, int n) {
      if (n == 0) return 1;
      else return x.multiply(power(x, n-1));
    }

where the value 1 is autoboxed to a value of type Unity. Employing pure recursion is of course undesirable in a math library function, but the specific implementation is irrelevant for our present purposes; what is relevant is that we are able to type check even recursive definitions.

Parameterizing dimensions by run-time integer values is a form of dependent typing. Because we cannot statically determine precisely which integer computations will result in equal values when the program is run, we must, as with all static checking, settle for a conservative approximation.

One approximation that allows us to check many practical examples and leverage the machinery we have already developed is to treat the set of ints⁶ as an abelian class! The binary operator of this class is addition; the repetition operator is multiplication. Each int constant is treated as a distinct type.

can for an ordinary abelian class. Specifically, because the set of ints is the only abelian class with the property that the second, numeric, argument to the repetition operator is itself an element of the class, and the repetition operator represents multiplication, we allow additional canonicalization for this special class.⁷ Automatic simplification of arithmetic expressions is well understood, and is incorporated in many production systems, including proof checkers such as Isabelle and symbolic math tools such as Mathematica.

To support nonconstant exponents in type expressions in other abelian classes (e.g., Dimension), we need to add one step to the canonicalization process in Section 3.4, between steps 5 and 6:

5.5. Canonicalize all exponents.

We can now type check our power method, provided the parameter n is declared to be final (allowing it to be used as a type):

    /** @precondition n >= 0 */
    D^n power(D x, final int n) {
      if (n == 0) return 1;
      else return x.multiply(power(x, n-1));
    }

4.2 Generalizing magnitudes to arbitrary fields
Note that a magnitude need not be a real number. For example, the magnitude of an impedance is often represented as a complex number.⁸

We want to generalize our treatment of class Measurement so that magnitudes may be elements of various numeric classes (e.g., Float, Double, Complex, etc.). We define a new class AlgebraicField that contains classes representing algebraic fields. We want to be able to perform arithmetic operations on any two elements of an AlgebraicField. But we also want to require that elements of an AlgebraicField representing magnitudes can be combined via arithmetic operations with elements of any other AlgebraicField representing magnitudes (e.g., we want to allow a complex magnitude to be multiplied by a real magnitude). We also want to be able to combine dimensioned quantities with arbitrary magnitudes via multiplication and division.

We can satisfy all of these requirements by extending class AlgebraicField with a class MagnitudeField and requiring every instance class of MagnitudeField to extend dimension Unity.

    class* MagnitudeField extends AlgebraicField
    extends* Unity {}
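The recursive power method of Section 4.1 can be mimicked at run time to show what the static checker has to establish. In the Python sketch below, dimensions are carried as data, so the equality of D^(n-1) * D and D^n is simply computed. In the paper's setting the same fact must be derived statically, which is why exponents need to be canonicalized. The class Quantity, the constant ONE and the helper expected_dims are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    magnitude: float
    dims: tuple  # sorted (dimension name, exponent) pairs, no zero exponents

    def multiply(self, other):
        exps = dict(self.dims)
        for name, n in other.dims:
            exps[name] = exps.get(name, 0) + n
        dims = tuple(sorted((k, v) for k, v in exps.items() if v != 0))
        return Quantity(self.magnitude * other.magnitude, dims)


# Dimensionless 1, standing in for the value 1 autoboxed to type Unity.
ONE = Quantity(1.0, ())


def power(x, n):
    """Raise x to the n-th power by recursion. Precondition: n >= 0."""
    if n == 0:
        return ONE
    return x.multiply(power(x, n - 1))


def expected_dims(x, n):
    """Dimension D^n, computed independently by scaling each exponent by n."""
    return tuple((name, e * n) for name, e in x.dims if e * n != 0)
```

For a length `x = Quantity(2.0, (("Length", 1),))`, the dimension of `power(x, n)` agrees with `expected_dims(x, n)` for every nonnegative n, including n = 0, where both reduce to the empty (dimensionless) product.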
A final int bound in a lexical scope is treated as a type parameter ranging over this abelian class. Thus, the set of valid int types consists of all expressions of int variables and constants formed via multiplication and addition. We identify and canonicalize elements of this type in the same way as explained for general abelian classes. However, special properties of the ints allow us to equate even more terms than we

⁶ We assume that autoboxing of primitives is built into the language, as it is in the Java™ 2 SDK 1.5.

⁷ Alternatively, we could define another special metaclass commutative ring and define int as an instance class of that kind. But this approach requires adding additional complexity to our language with little reward. Furthermore, even the notion of a commutative ring does not capture all of the algebraic simplification possible with the integers.

⁸ We use the term "magnitude" as it is employed in most contexts concerning dimension checking. However, this use differs from its typical use with regard to impedance, to refer to the square root of the sum of the squares of the real and imaginary parts of what we call the magnitude.

Because class Unity implements DimensionI, all instance classes of MagnitudeField must implement it as well. Thus, an element of an instance class of MagnitudeField can be added to or subtracted from any other element of an instance class of MagnitudeField. It may also be multiplied by and divided by any other quantity. When an element e of an instance class of MagnitudeField is multiplied by another element e′ of an instance class of MagnitudeField, the dimension of the result is $\mathit{Unity} * \mathit{Unity} = \mathit{Unity}$. When e is divided by e′, the dimension of the result is $\mathit{Unity} * \mathit{Unity}^{-1} = \mathit{Unity}$. For quantities of Dimension D, the dimension of a multiplication is $\mathit{Unity} * D = D$ and of a division is $\mathit{Unity} * D^{-1} = D^{-1}$.

DimensionI also includes method inUnit, but the only unit of dimension Unity is PrimaryUnit. Therefore, we define this method as follows: if it is called on an instance of Unity and the given unit to convert to is PrimaryUnit, then the method simply returns its receiver. Otherwise, it throws an exception.

We redefine existing classes Float and Double as instance classes of kind MagnitudeField and provide them with appropriate definitions of the arithmetic operations required to satisfy the extends* constraint on MagnitudeField:

    MagnitudeField Float extends Unity {...}
    MagnitudeField Double extends Unity {...}

Now we need only modify field _magnitude in class Measurement to be of type Unity. In this way, we allow the magnitude of a measurement to be any dimensionless quantity. Note that dimensionless products of measurements are both measurements and instances of Unity. A definition of a Measurement class with methods for binary operations and for unit conversion is presented in Figure 1. Class Measurement includes several polymorphic methods with type parameters that cannot be inferred from method arguments alone. As in MixGen, polymorphic methods in MetaGen employ explicit polymorphism.

We can also define new instance classes Complex, Rational, Real, Interval, etc. with kind MagnitudeField, along with appropriate subclassing relationships between them. A proper encoding of these classes and their relation to one another is beyond the scope of this paper.

a programmer to easily convert dimensionless quantities to dimensioned quantities when necessary. Because dimensionless quantities are identified with elements of class Unity, which are also used to represent magnitudes of Measurements, a programmer wishing to treat such a quantity as having a dimension other than unity can simply wrap it in a new Measurement. For example, to treat a dimensionless quotient of two Lengths l1 and l2 as an angle, we can write:

    new Measurement(l1.divide(l2))

4.4 Scales of measurement
Along with multiplicative conversions between units of the same dimension, we often want to define scales with explicit zeros that are based on a unit of measurement (e.g., the Celsius, Kelvin, and Fahrenheit scales). Scale conversions may involve more than just multiplication by conversion factors; they can involve addition and subtraction as well (e.g., the conversion of a Celsius scale reading to a Fahrenheit scale reading involves multiplication by 9/5 and addition of 32 F°, where we follow the convention that n°C means a Celsius scale reading of n but n C° means a distance of n degrees between two readings on the scale). How are we to understand scales and scale conversions in the context of our analysis? In particular, can we explain scale conversion solely in terms of manipulation of quantities?

The key to understanding the relation between quantities and scale readings is to realize that a measurement of a quantity is really a measurement of the difference between two points in a continuum: an origin (or zero point) and an external point. For example, a measurement of time is really a measurement between an origin (e.g., the point at which a stopwatch is started) and an external point (the point at which the watch is stopped). Scale readings are measurements for which the origin is fixed by the scale itself. Thus, even when two scales are applied to the same external point, the quantities represented by the readings may not be the same; the scales may have different origins. When we convert a reading R of external point M in scale S with zero point $Z_S$ to a reading R′ in another scale S′ with zero point $Z_{S'}$ we are really deducing the size of a measurement from $Z_{S'}$ to M given both a measurement from $Z_S$ to M and a measurement from $Z_{S'}$ to $Z_S$. This deduction involves a simple addition.
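The deduction just described, adding the measurement from $Z_{S'}$ to $Z_S$ to the measurement from $Z_S$ to M, can be sketched in Python. This is an illustration under stated assumptions rather than the paper's class design: each scale is described by the size of its unit and the position of its zero point, both expressed in a primary unit measured from a primary origin (kelvins from absolute zero here), and the names Scale, convert, KELVIN, CELSIUS and FAHRENHEIT are hypothetical. The Celsius zero is placed at 273.15 K. Exact fractions are used so that the arithmetic is not blurred by rounding.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Scale:
    unit: Fraction  # size of one scale unit, in primary units
    zero: Fraction  # position of the scale's zero point, in primary units


def convert(reading, src, dst):
    """Convert a reading on scale src into a reading on scale dst."""
    zero_to_zero = src.zero - dst.zero   # measurement from Z_dst to Z_src
    zero_to_point = reading * src.unit   # measurement from Z_src to M
    # The two measurements are added, then re-expressed in the unit of dst.
    return (zero_to_zero + zero_to_point) / dst.unit


KELVIN = Scale(Fraction(1), Fraction(0))
CELSIUS = Scale(Fraction(1), Fraction(27315, 100))
# One Fahrenheit degree is 5/9 of a kelvin; 32 on this scale coincides with 0 on Celsius.
FAHRENHEIT = Scale(Fraction(5, 9), CELSIUS.zero - 32 * Fraction(5, 9))
```

With these definitions, `convert(100, CELSIUS, FAHRENHEIT)` evaluates to 212 and `convert(0, CELSIUS, FAHRENHEIT)` to 32, using only an addition of two measurements followed by a unit conversion.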
4.3 Discovered dimensions
Sometimes the combination of measurements results in dimensionless quantities (i.e., quantities of dimension unity). For example, we can think of an angle as being determined by measuring two lengths (an arc and a radius) and dividing one by the other. Similarly, a relative density is obtained by dividing one density by another. Although angle and relative density are both of dimension unity, it is nonsense to compare an angle to a relative density.

Therefore, a programmer may want to view (nonzero) measurements as measurements of quantities in newly discovered primary dimensions (e.g., angle, relative density, etc.). It is often possible to measure such quantities directly (e.g., with a protractor).

We leave the decision of when to treat a dimensionless quantity in this way to the programmer. Our formulation allows

Because scales may also represent quantities in different units, unit conversions of the initial measurements may be necessary as well.

For example, a reading of a temperature in degrees Celsius is really a measurement of difference of an external "temperature point" with a reference point (0°C) in unit C°. If we write the external point as A and 0°C as Z then we can express this difference with the following equation:

$$A - Z = x\,\mathrm{C}^{\circ}$$

Now suppose we want to convert the measurement x C° to a measurement on the Fahrenheit scale. We represent 0°F as $Z_F$. Then we have:

$$Z - Z_F = 32\,\mathrm{F}^{\circ}$$

We also know the conversion factor of C° to F°:

$$\mathrm{C}^{\circ} = \tfrac{9}{5}\,\mathrm{F}^{\circ}$$

    class Measurement U> extends D {
      private final Unity _magnitude;

Figure 1: Class Measurement

To derive an equation of the form $A - Z_F = y\,\mathrm{F}^{\circ}$, we add the two difference equations and perform unit conversion:

$$\begin{aligned}
A - Z_F &= x\,\mathrm{C}^{\circ} + 32\,\mathrm{F}^{\circ} \\
&= \tfrac{9}{5}x\,\mathrm{F}^{\circ} + 32\,\mathrm{F}^{\circ} \\
&= \left(\tfrac{9}{5}x + 32\right)\mathrm{F}^{\circ}
\end{aligned}$$

yielding the standard formula. By viewing scale readings as measurements from fixed reference points, we can convert between scales while manipulating only quantities.

We can model scales and points similarly to how we model measurements and units. Just as every dimension has a primary unit (which prevents cyclic dependencies), so too it has a primary origin. Every point in a given dimension is defined with respect to the primary origin of that dimension. Just as measurements are defined with respect to a particular unit, scales are defined with respect to a unit and origin. Therefore, we define a class Point as follows:

    class Point {
      private final D _distanceToOrigin;
      Point(D distanceToOrigin) {
        _distanceToOrigin = distanceToOrigin;
      }
      D getDistanceToOrigin() {
        return _distanceToOrigin;
      }
      D distanceTo(Point p) {
        return _distanceToOrigin.subtract(p.getDistanceToOrigin());
      }
    }

Unlike Units, we do not require all points to be instance classes, so we do not define Point as having kind class*. We also do not define Point as an abelian class.

As with PrimaryUnits, we define an instance class PrimaryOrigin that is parametric in a dimension:

    final* Point PrimaryOrigin {
      static { container(0); }
    }

We can now define classes Scale and ScaleReading that are parametric in a dimension, a unit, and a point:

    class Scale U, Point P> {
      ScaleReading newReading(Point Q) {
        return new ScaleReading(Q.distanceTo(P));
      }
    }
    class ScaleReading U, Point P> {
      private final Measurement _value;

      ScaleReading(Measurement value) { _value = value; }

      Measurement getValue() { return _value; }

       V, Point Q> ScaleReading inScale() {
        return new ScaleReading (
          _value.add(P.distanceTo(Q)).inUnit()
        );
      }
    }

A ScaleReading can be converted to any other ScaleReading of the same dimension via method inScale. Class Scale is merely a factory for new ScaleReadings. We may want to alias particular instantiations of Scale with their common names, e.g.,

    alias KelvinScale = Scale>;
    final* Point ZeroCelsius {
      static {
        container ( new Measurement(273) );
      }
    }
    alias CelsiusScale =
      Scale;

4.5 Nonlinear scales
More generally, zeroed "scales" are also defined in which readings are nonlinear, e.g., the bel scale, which is a measure of the logarithm of the ratio of two measurements of Power, and the Gregorian calendar, where the length of a year depends on which year we're talking about. A particularly unusual nonlinear scale from science fiction is the warp scale from Star Trek: The Next Generation,⁹ which is asymptotic at Warp 10.

Although such "scales" appear superficially to be distinct from linear scales such as Celsius and Fahrenheit, they really are the same kind of entity. Any scale can be thought of as linear along a particular dimension. In fact, temperature itself is equal to the reciprocal of the rate of increase of entropy with respect to energy of a given system. In the case of decibels, "absolute zero" corresponds to 0dB, and it is qualitatively different from other decibel readings. Differences can be taken between two readings in the dB scale; the result is expressed in a special unit (let us call it "decibel units") that is not typically written down.¹⁰ Another example is the musical scale, which can be thought of as based on the unit "halftone". High C and middle C differ by 12 halftones. If we start from middle C, and then add 4 halftones, we get E, and if we add 8 more, we get high C. We can arbitrarily pick middle C as the 0 point, and count halftones from there. However, we can also associate a reading on the musical scale with the frequency of the note. A musical scale based on halftone is logarithmic in the frequency. But we might as easily say that the frequency scale is exponential in the halftone scale.

Because scale readings are often logarithmic with respect to other scales of more fundamental quantities (based on units that the designers of the original scale may not have been aware of), it is natural that we find readings on many scales to be restricted to proper subintervals of a field (e.g., the positive reals, the reals between 0 and 10, etc.).

⁹ The warp scale from the original Star Trek series was based on a less interesting formula.

¹⁰ Because a decibel reading is measured as the logarithm of the ratio of two quantities of Power, a decibel unit is like an angle in that it can be viewed either as dimensionless or as a unit in a discovered dimension.

5. PRAGMATICS
We now consider some pragmatic aspects of our system with respect to syntax and performance.

5.1 Syntax
Many of the syntactic constructs in our finished design are long-winded. However, this is an artifact of C++-style syntax that is easily addressed with syntactic sugar. For one thing, many of the instantiations of generic types involved instantiations of multiple type parameters for Measurement, Dimension, and Unit when instantiation of the Unit is all that is needed to infer the others. Also, we have selected long names for our classes to help communicate ideas; in a production system, we may want to use shorter names in some cases. Finally, support for operator overloading in our language would allow us to specify arithmetic operations on Measurements more concisely.

6. RELATED WORK
6.1 Dimension and unit checking
Most formulations of dimension checking in programming languages encode dimensions as an ad hoc language feature rather than integrating them into a more general type system. Also, most formulations incorporate dimension checking into functional and procedural languages rather than object-oriented languages, where the design issues are significantly different. Few formulations support parametric polymorphism over dimensions. No previous formulation presents dimension checking as a generalization of static checking for metaclasses.

Andrew Kennedy presents an extension of a core calculus for ML with support for dimension checking [20, 21]. Because ML is a structurally typed functional language rather
If we had autoboxing of numeric types and we could also overload arithmetic operations on numeric types to take a unit and return a measurement in that unit, many expressions could be written more succinctly. For example, we could replace expressions such as:

    new Measurement(5)

with 5 * Meter.

5.2 Efficiency of our representation
Our formulation of dimensioned quantities is powerful, and it allows for implementation with reasonable performance: Our representation of measurements involves only a single field for each measurement. Elements of this field are elements of classes associated with primitive values such as Float and Double. Because the magnitude field is private and final, rudimentary escape analysis should allow us to inline these field values in many cases. Thus, run-time representation of quantities should require overhead about equal to that of boxing primitive values. Additionally, various reductions in expressive power allow for even better performance, giving library designers the freedom to choose the power/performance tradeoff for their target applications. When run-time representation of quantities is not required, the Measurements themselves could be represented as primitive values. Even when run-time representation of quantities is important, it should be possible to simply pass this information as type parameters to methods that require it and represent the measurement values themselves as primitives.

All the arithmetic methods are simple nonrecursive one-line methods that are easily inlined by modern just-in-time compilers. Through static overloading, calls to the arithmetic operations are more efficient when additional information can be statically inferred. For example, when the static types of the operands are Measurements with the same Unit, no unit conversion is necessary. Additionally, field _magnitude is final; a savvy just-in-time compiler could calculate values for methods such as getPrimaryConversion on the first call for a value and cache the result for subsequent calls. If performance were still unsatisfactory, we could require all measurements to be kept in the primary unit of their dimension. Although we may lose precision for some programs, we would be able to reduce each of the four arithmetic methods to a single arithmetic operation.

than a nominally typed object-oriented language, many of the key issues that have driven our design do not apply to this language. Units are not given separate types and each dimension is associated with a single unit. Other units are encoded by the programmer as constant values defined in terms of this single unit. Because dimensions are added as an ad hoc language extension, the issues involved in encoding them via metaclasses do not arise, nor do those involved in general support for kinds whose elements satisfy the properties of an abelian group. Nevertheless, Kennedy's approach shares similarities with ours. Dimension exponents are constrained to be integers. The zero quantity is treated as having polymorphic dimension. Also, function types may be polymorphic with respect to dimensions. The problem of typing functions that are polymorphic with respect to a dimension exponent (such as power) is discussed but no solution is presented.

A significant portion of Kennedy's proposal involves the formulation of type inference over dimension types. In this respect, it builds on the work of Wand and O'Keefe [28] and Goubault [16], which present formulations of dimension inference for ML-like languages. Incorporating an inference mechanism over metaclasses and abelian classes based on these inference systems is an important direction for future work.

Kennedy extends his own work by relating it to the parametricity result of System F [22]. By proving a "dimensional invariance" principle over a core calculus based on ML, Kennedy is able to establish several surprising properties of this calculus. For example, it is possible to show that functions of the type $\forall u\,.\ \mathit{num}\ u^{2} \mapsto u$ cannot return a nonzero result for any argument value, and therefore that an approximating square root function with this type cannot be expressed in the language! It is not at all obvious how to relate this work, which is based on F-bounded parametric polymorphism over structural types, to an object-oriented language with typed metaclasses and nominal types. But it would be interesting to determine such a relationship.

Other formulations of dimension checking do not support parametric polymorphism over dimensions. Formulations of dimensions for object-oriented languages have typically encoded a fixed set of dimensions. A popular formulation of dimension checking in C++ (via the Standard Template Library) is SIUnits [5], which provides for the checking of quantities in the SI system by encoding the seven dimensions addressed by that system as integer parameters of a template class Quantity. Parametric dimensions, class relationships between units, dimensions, and quantities, and programmer extensions of this hardwired template are not supported. In addition, the underlying type system is unsafe and does not directly represent the abstraction of generic types, greatly reducing the value of static checking.

Fowler discusses a modeling of units of measurement used in the Cosmos project for the U.K. National Health Service [10]. In addition to units and quantities, Fowler includes a notion of "measurement" that includes the notion of an observation and an observed "patient". Dimensions of quantities and the relationships of algebraically formed units and dimensions are not modeled. Static checking and dimensional parametricity are not discussed.

does not include a notion of dimension in his language extension; instead, units of a common dimension are declared to be related via a counits operator. As a result, unit conversion can occur along multiple paths, adding significant complexity to an implementation.

Gehani's approach is criticized by House [18], who notes that units are not actually declared in Gehani's proposal. Instead, new units are simply used when needed by a variable declaration. As a result, it is possible to form pathological definitions where the name of a quantity is identical to the name of a (distinct) unit. House also notes that Gehani's system is not statically checkable without severely restricting the use of the UNITS(*) operator. House goes on to cite a litany of other criticisms of Gehani's proposal and then proposes his own extension of Pascal supporting dimension checking.
The formulation closest to our own is by van Delft [27], who In House’s proposal, each dimension is associated with a describes a mechanism for implementing dimensions in Java unique primary unit. Exponents on dimensions are arbi- class files as annotated classes. As with our approach, van trary rational numbers to allow the definition of methods Delft treats the zero quantity as having polymorphic dimen- such as sqrt; the syntax for polymorphic dimensions pro- sion. However, van Delft does not provide a specification for hibits the solution we have presented, where the input di- the semantics of dimension types, does not encode dimen- mension is defined to be the square of a dimension vari- sions as part of a class hierarchy, does not allow for the able. House also restricts exponents on dimensions to be definition of new methods specific to a particular dimen- constants.11 Generic type declarations are supported for sion, and does not include parametric polymorphism over records and procedures. However, procedure return types dimension types or separate types for units. As a result, the are precluded from being generic dimensions independent issues in object-oriented modeling that we have addressed of the parameter dimensions. House proposes to solve the in our proposal do not arise. Van Delft does includes a con- problem of scale conversion by allowing for “units with dif- struct dimension(exp), which evaluates to the dimension of ferent zeros”; unit conversion involves both an additive and exp. Although this feature is useful for linking dimensions a multiplicative conversion. This approach fails to recog- of method parameters (the term “dimension” can be used nize that scale readings have distinct properties from other to take an argument of arbitrary dimension), and provides quantities and should be modeled as separate entities. 
some of the benefits of parametric dimensions, this limited form of parametricity, combined with the restriction that di- Gehani also discusses the application of units of measure- mension exponents be natural numbers, prevents definition ment to databases [14]. In this context, conversion factors of a polymorphic sqrt function: can change dynamically (e.g., between monetary units), sug- gesting that the best way to store data is not with a stan- double dimension(v)^1/2 sqrt(double*dimension v) dard unit per dimension, but instead in the units in which the data was originally entered. van Delft proposes the following alternative: In the context of database manipulation, Gehani recom- double*dimension sqrt(double*dimension(return*return) v) mends the following operators: Where return is the type of the return value. This nota- tion provides an ad hoc mechanism to allow for equating • counits: specifies that two or more units are of the the return type of a method with its parameter types. But same dimension this mechanism is really just a workaround for the lack of • remove counits: removes unit relationships general support for parametricity, and it prevents expres- sion of parametric dependencies between multiple method • ccc: changes the conversion constant between two units parameters, e.g., (useful when dealing with rates of exchange, improving precision of scientific constants, etc.) boolean lessThan(D x, D y) • in: evaluates an expression in a given unit Other published formulations of dimension checking have • unitsof: returns the units of a given value been over procedural languages. Gehani contrasts the use of derived types in Ada with the use of a facility for unit checking, showing how derived types are an inadequate sub- In our formulation, where dimensions of quantities are de- stitute for a special language feature for units [15]. 
He also noted explicitly, the counits operator is unnecessary, and presents an extension of Pascal with support for units [13], in 11In fact, because Pascal syntax does not include an expo- which he allows for polymorphic use of units via a UNITS(*) nentiation operator, compound dimensions are expressed via declaration, and a UNITSOF operator. However, his discus- multiplication and division, which is expressively equivalent sion often conflates the notion of units and dimensions: he to requiring dimension exponents to be constant. the remove counits operator is nonsensical (remove counits As a result, detecting commensurability of units becomes a is apparently motivated by the need to maintain possibly central aspect of their approach. However, there is an upshot corrupted, persistent data in a database). In a program- to their focus on commensurability: their mechanism for ming language, units either belong to a dimension or they manipulating unit relationships is more general than ours: do not. Programmers may introduce and correct erroneous Rather than defining the magnitudes of all units for a given declarations of unit classes in their programs, but dynam- dimension in terms of a primary unit, each unit may be re- ically altering the dimension of a quantity is not needed. lated to any other unit. Commensurability is determined The ability to support dynamically changing conversion con- by a interesting computational trick where equations relat- stants between two units could be supported by making the ing units are represented in logarithmic form, presented as conversion factor in class Unit mutable and allowing select matrices, and put into row echelon form. Cycles and in- classes to support a ccc method. The in and unitsof oper- consistencies in the unit relationships can be determined by ator are easily supported as methods in class Measurement. straightforward inspection of the reduced matrix. 
The fact that we are able to easily incorporate Gehani’s operators into our formulation is a demonstration of the ex- Because we allow for dimension types in our approach, com- tensibility we have attained by unifying our formulation into mensurability detection is not necessary for static checking. a general object-oriented type system. However, intelligent unit conversion at run time would al- low for improved precision. For example, when converting Manner describes a syntactic extension to Pascal and Ada in feet to inches, it may be advantageous to relate them di- which unit declarations are supported [23]. Each dimension rectly rather than to relate them each to meters and then is associated with a unique primary unit and all quantities divide. An interesting direction for future work is to try of that dimension are kept in the primary unit. Polymor- to embed the row-echelon technique of Karr and Loveman phism over quantities is not supported. Dreiheller et al. ex- into the definition of method inUnit on measurements. Ide- pand upon Manner’s syntax to provide a language extension ally, the equations composing this matrix would be checked and implementation (via a preprocessor) for Pascal called statically. PHYSCAL [8]. PHYSCAL supports scale factors on units as well as input/output facilities for quantities. PHYSCAL Parametric units are not discussed in Karr and Loveman’s is limited to supporting SI units; the programmer cannot de- proposal. Raising quantities to a (possibly nonconstant) fine new units or dimensions. Scales of measurement are not power is allowed, so long as the unit of the exponent is com- handled by PHYSCAL, nor is polymorphism over program mensurate with 1. If the exponent is nonconstant, the units constructs involving dimensions. 
of the expression are assumed to be 1, breaking soundness of static checking.12 Scale measurements and input/output Baldwin proposes another extension of Pascal supporting of quantities are not addressed by their proposal. units inspired by Manner [3]: a fixed set of dimensions is added to Pascal as siblings of type REAL. Each dimension The earliest mention of dimension checking for programming is associated with a unique primary unit, and quantities of languages that we are aware of occurs in J.C. Cleaveland’s that dimension are always kept in the primary unit. Multi- discussion of “pouches” [7]. Pouches are described as a gen- plication and division of dimensioned quantities by dimen- eral way to allow programmers to organize numeric variables sionless quantities is not supported despite recognition that into natural classes or categories. It is shown how a set of such support is needed for correctness. pouches can be organized as the base elements of a freely generated abelian group, and therefore as unit annotations. Hilfinger proposed an encoding of support for dimension No polymorphism is provided over pouches, but it is men- checking in Ada in [17]. Hilfinger’s approach shares some tioned in passing that a notion of “hidden” pouches would be of the flavor of ours in that he attempts to use existing lan- useful when defining arithmetic functions that are naturally guage abstraction mechanisms to encode dimensions rather polymorphic. Exponents on generated elements of pouches than introducing an ad hoc language feature. He relies on are restricted to be constants. No subtyping is provided over the subtyping mechanism of Ada to encode dimensions as pouches. Pouches are used to describe units rather than di- subtypes of records of constant integers. Each field in this mensions, but no facility is provided for converting between record supertype corresponds to a primary dimension. Be- commensurable units. 
cause the fields in this record are fixed, the programmer cannot define new dimensions. Also, quantities of a di- mension must be represented in a primary unit. Limited 6.2 Generalized class relationships in other polymorphism is allowed because a programmer can refer to languages the record supertype of all dimensions when defining meth- The use of class relationships beyond simple class exten- ods. However, general parametric polymorphism including sion is essential for modeling many real-world ontologies [9]. dependencies between the dimensions of multiple variables Many non-statically-typed object-oriented languages, such is not supported. 12Because parametric polymorphism is not supported, this design decision is reasonable as long as all occurrences of Karr and Loveman discuss mechanically incorporating units these exponent expressions are either accompanied by a into a procedural programming language [19]. Like us, they static warning or required to be annotated by the program- are concerned not with simply adding unit checking to an ex- mer. Provided that enough information is kept to deter- isting language, but rather with understanding more general mine the actual units of these expressions at run time and issues in adding unit checking to a class of languages. Karr to check their use, exponents with nonconstant exponents and Loveman do not include a separate notion of dimension. in this proposal are akin to downcasts in the Java Program- ming Language. as Smalltalk, Python, and Self, allow for more flexible re- concisely than can be done in Smalltalk. Instead of writing lationships. In the case of Smalltalk and Python, classes classes such as ClassClass, ObjectClass, etc., we can define a are instances of metaclasses. Static members of a class are metaclass Class that is parameterized by its contained class: modeled as ordinary members of the corresponding meta- class. 
Self, on the other hand, is a prototype-based object- Class> Class C> {...} oriented language, where there are no classes at all: object instantiation consists of cloning an existing instance. The Note that the bound on type parameter C is itself Class. members of the clone can be modified and added to at will. This bound allows us to define instance classes such as Class
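As a loose point of comparison only, and not a rendering of the MetaGen declaration discussed above, standard Java's reflective type java.lang.Class<T> is likewise parameterized by the class it describes, and every Class object is itself an instance of Class. The sketch below illustrates both points in plain Java; the names ClassParamDemo, castTo, and classOfClassIsClass are hypothetical and chosen for illustration, and java.lang.Class carries no self-referential bound of the kind described in the text.

```java
// Plain Java, not MetaGen. All names other than java.lang.Class and its
// methods are hypothetical and exist only for this illustration.
final class ClassParamDemo {

    // The type parameter T ties the class token to the type it describes,
    // so the result is statically typed as T without an unchecked cast.
    static <T> T castTo(Class<T> token, Object value) {
        // Class.cast throws ClassCastException if value is non-null
        // and is not an instance of the class described by token.
        return token.cast(value);
    }

    // Reports whether the run-time class of a Class object is Class itself.
    static boolean classOfClassIsClass(Class<?> token) {
        return token.getClass() == Class.class;
    }

    public static void main(String[] args) {
        Class<String> stringClass = String.class;
        String s = castTo(stringClass, "metre");
        System.out.println(s);
        System.out.println(classOfClassIsClass(stringClass));
        System.out.println(classOfClassIsClass(Class.class));
    }
}
```

In this analogue a single generic type stands in for what would otherwise be one metaclass per class, which is the economy the passage attributes to a parameterized metaclass. Unlike a statically typed metaclass, however, Java's Class<T> cannot declare new members that become static members of its instance classes.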
We would also like to examine to what degree we can unbox dimensioned quantities in real programs to achieve performance acceptable for high-performance computation.

Finally, formalizing the MetaGen semantics over a core calculus (e.g., an extension of Core MixGen) is an important next step in helping us to better analyze its properties and interactions with other language extensions.

Acknowledgements

We are grateful to Matthias Felleisen, Sam Tobin-Hochstadt, Cheryl McCosh, Miriam Kadansky, and the anonymous referees for their comments on earlier drafts of this work.

8. REFERENCES

[1] E. Allen, J. Bannet, and R. Cartwright. A First-Class Approach to Genericity. In Proceedings of the Eighteenth Annual SIGPLAN Conference on Object-Oriented Systems, Languages, and Applications, Anaheim CA 2003.
[2] E. Allen, R. Cartwright, and B. Stoler. Efficient implementation of run-time generic types for Java. In IFIP WG2.1 Working Conference on Generic Programming, 2002.
[3] G. Baldwin. Implementation of Physical Units. SIGPLAN Notices, V. 22(8). August 1987.
[4] G. Bracha, D. Griswold. Strongtalk: Typechecking Smalltalk in a production environment. In Proceedings of the Eighth Annual SIGPLAN Conference on Object-Oriented Programming, Systems, Languages and Applications, pages 215-230, 1993.
[5] W. Brown. Applied Template Metaprogramming in SIUnits: the Library of Unit-Based Computation. In Proceedings of the Second Workshop on C++ Template Programming, October 2001. Available at http://www.oonumerics.org/tmpw01/brown.pdf
[6] R. Cartwright and G. Steele. Compatible genericity with run-time types for the Java programming language. In Proceedings of the Thirteenth Annual SIGPLAN Conference on Object-Oriented Systems, Languages, and Applications, Vancouver BC 1998.
[7] J.C. Cleaveland. Redundant Specification in Programming Languages Through Pouches. UCLA Technical Report. 1975.
[8] A. Dreiheller, M. Moerschbacher, B. Mohr. PHYSCAL: Programming Pascal with Physical Units. SIGPLAN Notices, V. 21(12), December 1986.
[9] D. Ferrucci, C. Welty. What's in an Instance? RPI Computer Science Technical Report. 1994.
[10] M. Fowler. Analysis Patterns: Reusable Object Models. Addison-Wesley 1997.
[12] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
[13] N. Gehani. Units of Measure as a Data Attribute. Computer Languages, Vol. 2. pp. 93-111. Pergamon Press, 1977. Printed in Great Britain.
[14] N. Gehani. Databases and Units of Measure. IEEE Transactions on Software Engineering. Vol. SE-8, No. 6, November 1982.
[15] N. Gehani. Ada's Derived Types and Units of Measure. Software – Practice and Experience, Vol. 15(6), 555-569. June 1985.
[16] J. Goubault. Inference d'unites physiques en ML. In P. Cointe, C. Queinnec, and B. Serpette, editors, Journees Francophones des Langages Applicatifs, Noirmoutier, p. 3-20. INRIA, Collection didactique, 1994.
[17] P. Hilfinger. An Ada Package for Dimensional Analysis. ACM Transactions on Programming Languages and Systems, Vol. 10, No. 2, April 1988. p. 189-203.
[18] R. T. House. A Proposal for an Extended Form of Type Checking of Expressions. The Computer Journal, Vol. 26, No. 4. 1983.
[19] M. Karr, D. B. Loveman III. Incorporation of units into programming languages. Communications of the ACM, 21(5):385-391, May 1978.
[20] A. Kennedy. Dimension Types. In Proceedings of the 5th European Symposium on Programming Languages and Systems. Edinburgh, U.K. 1994.
[21] A. Kennedy. Programming Languages and Dimensions. PhD Thesis. St. Catharine's College. November 1995.
[22] A. Kennedy. Relational Parametricity and Units of Measure. In Proceedings of the 24th Annual ACM Symposium on Principles of Programming Languages, Paris, France. January 1997.
[23] R. Manner. Strong Typing and Physical Units. SIGPLAN Notices V. 21(3), March 1986.
[24] P. Neumann. Risks to the public from the use of computers. ACM Software Engineering Notes, 10(3) July 1985.
[25] M. Sulzmann. The Chameleon website. http://www.comp.nus.edu.sg/~sulzmann/chameleon/
[26] M. Sulzmann, M. Odersky, M. Wehr. Type Inference with Constrained Types. Theory and Practice of Object Systems, 5(1), 1999.
[27] A. van Delft. A Java Extension with Support for Dimensions. Software – Practice and Experience, 29(7), 605-616. 1999.
[28] M. Wand, P. M. O'Keefe. Automatic dimensional inference. In J.-L. Lassez and G. Plotkin, editors, Computational Logic: Essays in Honor of Alan Robinson. MIT Press, 1991.
[29] A. Goldberg, D. Robson. Smalltalk-80: The Language. Addison-Wesley, 1989.