Recursive Structures for Standard ML
Total Page:16
File Type:pdf, Size:1020Kb
Recursive Structures for Standard ML Claudio V. Russo Microsoft Research Ltd., St George House, 1 Guildhall Street, Cambridge CB2 3NH crusso@microsoft.com ABSTRACT both small and large programs. SML’s general-purpose Core Standard ML is a statically typed programming language language supports “programming in the small” with a rich that is suited for the construction of both small and large range of types and computational constructs that includes programs. “Programming in the small” is captured by Stan- mutually recursive datatypes and functions, control con- dard ML’s Core language. “Programming in the large” structs, exceptions and references. is captured by Standard ML’s Modules language that pro- SML’s special-purpose Modules language supports “pro- vides constructs for organising related Core language defini- gramming in the large”. Constructed on top of the Core, tions into self-contained modules with descriptive interfaces. the Modules language allows sequential definitions of identi- While the Core is used to express details of algorithms and fiers denoting Core language types and terms to be packaged data structures, Modules is used to express the overall archi- together into possibly nested structures, whose components tecture of a software system. In Standard ML, modular pro- are accessed by the dot notation. Structures are transpar- grams must have a strictly hierarchical structure: the depen- ent:bydefault,therealisation (i.e. implementation) of a dency between modules can never be cyclic. In particular, type component within a structure is evident outside the definitions of mutually recursive Core types and values, that structure. Signatures are used to specify the types of struc- arise frequently in practice, can never span module bound- tures, by specifying their individual components. A type aries. This limitation compromises modular programming, component may be specified opaquely, permitting a variety forcing the programmer to merge conceptually (i.e. archi- of realisations, or transparently, by equating it with a partic- tecturally) distinct modules. We propose a practical and ular Core type ([10] uses the terminology abstract and man- simple extension of the Modules language that caters for ifest instead; we follow [8]). A structure matches asigna- cyclic dependencies between both types and terms defined ture if it provides an implementation for all of the specified in separate modules. Our design leverages existing features components, and, thanks to the subtyping relation called of the language, supports separate compilation of mutually enrichment, possibly more. A signature may be used to recursive modules and is easy to implement. opaquely constrain a matching structure. This existentially quantifies over the actual realisation of type components that have opaque specifications in the signature, effectively Categories and Subject Descriptors hiding their implementation. A functor definition defines a D.3.3 [Programming Languages]: Language Constructs polymorphic function mapping structures to structures. A and Features—Modules,packages functor may be applied to any structure that realises a sub- type of the formal argument’s type, resulting in a concrete General Terms implementation of the functor body. Despite the flexibility of the Modules type system, it does LANGUAGES, THEORY suffer from an awkward limitation. Unlike the definitions of the Core language, module bindings must have a strictly Keywords hierarchical structure: the dependency between modules can recursive modules, datatypes, Standard ML, type theory never be cyclic. In particular, although definable within the confines of a single module, definitions of mutually recursive 1. INTRODUCTION types and values can never span module boundaries. While this does not affect the expressiveness of the Core language, Standard ML [12] (henceforth SML) is a high-level pro- the restriction does compromise modular programming. The gramming language that is suited for the construction of programmer is typically left with two choices. She can either merge conceptually distinct modules into a single module, just to satisfy the type checker. Or she can resort to tricky Permission to make digital or hard copies of all or part of this work for encodings in the Core language to break the cycles between personal or classroom use is granted without fee provided that copies are modules. The first approach obscures the architecture of the not made or distributed for pro£t or commercial advantage and that copies program; the second obscures its implementation. Neither bear this notice and the full citation on the £rst page. To copy otherwise, to solution is satisfactory. republish, to post on servers or to redistribute to lists, requires prior speci£c In this article, we relax the hierarchical structure of Mod- permission and/or a fee. ICFP’01, September 3-5, 2001, Florence, Italy. ules, allowing structures to be recursive. Our extension sup- Copyright 2001 ACM 1-58113-415-0/01/0009 ...$5.00. 50 ports mutual recursion at the level of Core datatypes and components, and possibly more. A structure matches asig- (independently) values. Although different in detail, our nature containing opaque type or datatype specifications if proposal may be seen as adapting Crary, Harper and Puri’s it enriches a complete realisation of that signature. theoretical analysis of module recursion to Standard ML [2]. Core expressions e describe a simple functional language Our main contribution is to provide a practical type system extended with the projection of a value identifier from a for recursive modules in Standard ML that avoids some of structure path. Constructor applications and case expres- the typing challenges of the formalism in [2], and exploits sions are tagged with the name of the datatype (tp) that Standard ML’s subtyping on modules to provide slightly they introduce or eliminate. A constructor application takes more expressive and convenient constructs. Our extension the values of its n arguments and builds a tuple tagged with preserves the existing features of Standard ML. the constructor c, introducing a value of type tp. The typ- For presentation purposes, we formulate our extension, ing rules ensure that the constructor is fully applied. A case not for Standard ML, but for a representative toy language expression evaluates e to a constructed value of type tp and called Mini-SML. The static semantics of Mini-SML is based choosesacontinuationbasedonthetagofthisvalue.Kisa directly on that of Standard ML. Mini-SML includes the es- finite set of constructors used to index the set of alternative sential features of Standard ML Modules but, for brevity, continuations. Each alternative binds nc constructor argu- only has a simple Core language of explicitly typed, mono- ments xc,0 ···xc,nc−1 in the continuation body ec (typing morphic functions and non-parametric type and datatype will ensure that nc is the arity of c in the datatype). A case definitions. [14] treats a more realistic Standard ML-like expression need not be exhaustive, in which case evaluation Core with implicitly typed, polymorphic functions and pa- aborts by raising the built-in exception match. rameterised type definitions. Our proposal has been adapted A structure path sp is a reference to a bound structure to full Standard ML and is available in MoscowML [17]. identifier, or the projection of one of its substructures. A Section 2 introduces the syntax of Mini-SML. Sections 3 structure body b is a dependent sequence of definitions: sub- gives a motivating example to illustrate the limitations of sequent definitions may refer to previous ones. A type def- acyclic Modules. Section 4 reviews the static semantics of inition abbreviates a type. A datatype definition is like a Mini-SML. Section 5 defines our extension to recursive mod- datatype specification but generates a new(and thus dis- ules and its static and dynamic semantics. Section 6 recasts tinct) type with the corresponding set of constructors. As the first example so that its mutually dependent components in signatures, datatype replication declares a datatype that may be separately compiled. Section 7 presents a real-world is equivalent to, and thus compatible with, the datatype tp. example that uses recursive modules to implement an ad- Datatype replication is used to copy a datatype into an- vanced data structure. Section 8 assesses our contribution. other scope whilst preserving compatibility with that type. Value, recursive function and structure definitions bind term 2. THE SYNTAX OF MINI-SML identifiers to the values of expressions. A functor definition introduces a named function on structures: X is the func- Figure 1 defines the abstract syntax of Mini-SML. A type tor’s formal argument, S specifies the argument’s type, and s path tp is a projection t or sp.t of a (Core) type component is the functor’s body that may refer to X. The functor may from the context or a structure path. A core type umaybe be applied to any argument that matches S. A signature used to define a type identifier or to specify the type of a definition abbreviates a signature. A structure expression s Core value. These are just the types of a simple functional evaluates to a structure. It may be a path or an encapsulated language, extended with type paths. A signature body Bisa structure body, whose type, value and structure definitions sequential specification of a structure’s components. A type become the components of the structure. The application of component may be specified transparently, by equating it a functor evaluates its body against the value of the actual with a type, or opaquely, permitting a variety of realisations argument. A transparent constraint (s : S) restricts the (ie. implementations). The implementation of a transpar- visibility of the structure’s components to those specified in ent component is fixed (up to the realisation of any opaque the signature, which the structure must match, and reveals types that it mentions). An opaque datatype specification the actual realisations of all type components in the sig- describes an arbitrary (recursive) datatype with finite set of nature (even those with opaque specifications).