A Type System for Smalltalk

Total Page:16

File Type:pdf, Size:1020Kb

A Type System for Smalltalk A Type System for Smalltalk Justin 0. Graver University of Florida Ralph E. Johnson University of Illinois at Urbana-Champaign Abstract subclassing with subtyping [Sus81, B182, SCB’86, Str86, Mey88]. In our type system, types are based This paper describes a type system for Smalltalk that on classes (i.e. each class defines a type or a fam- is type-safe, that allows most Smalltalk programs to ily of types) but subclassing has no relationship to be type-checked, and that can be used as the basis of subtyping. This is because Smalltalk classes inherit an optimizing compiler. implementation and not specification. Our type system uses discriminated union types and signature types to describe inclusion polymor- 1 Introduction phism and describes functional polymorphism (some- times called parametric polymorphism, see [DT88]) There has been a lot of interest recently in type using bounded universal quantification. It has the systems for object-oriented programming languages following features: [CW85, DT88]. Since Smalltalk was one of the ear- liest object-oriented languages, it is not surprising a automatic case analysis of union types, that there have been several attempts to provide a type system for it [SuaBl, BI82]. Unfortunately, l effective treatment of side-effects by basing the none of the attempts have been completely successful definition of the subtype relation on equality of [Joh86]. In particular, none of the proposed type sys- parameterized types, and tems are both type-safe and capable of type-checking l type-safety, i.e. a variable’s value always con- most common Smalltalk programs. Smalltalk vio- forms to its type. lates many of the assumptions on which most object- oriented type systems are based, and a successful Because the type system uses class information, it can type system for Smalltalk is necessarily different from be used for optimization. It has been implemented as those for other languages. part of the TS (Typed Smalltalk) optimizing compiler We have designed a new type system for Smalltalk. [JGZ88]. The biggest difference between our type system and others is that most type systems for object-oriented programming languages equate classes with types and 2 Background Authors’ address, telephone, and e-mail: Smalltalk [GRSS] is. a pure object-oriented program- Department of Computer and Information Sciences, E301 CSE, Gainesville, FL 32611, (904) 392-1507, graverOcis.ti.edu ming language in that everything, from integers to Department of Computer Science, 1304 W. Springfield Ave., text windows to the execution state, is an object. In Urbana, IL 61801, (217) 244-0093, [email protected] particular, classes are objects. Since everything is an object, the only operation needed in Smalltalk is mes- sage sending. Smalltalk uses run-time type-checking. Every message send is dynamically bound to an im- plementation depending on the class of the receiver. This unified view of the universe makes Smalltalk a Permission to copy without fee all or part of this material is granted compact yet powerful language. provided that the copies are not made or distributed for direct Smalltalk is extremely extensible. Most operations commercialadvantage, the ACM copyright notice and the title of the are built out of a small set of powerful primitives. publication and its date appear, and notice is given that copying is by permission of the Asociation for Computing Machinery. To copy othcr- wise, or to republish, requires a fee and/or specific permission. 0 1990 ACM 089791-3434/9O/OOOi/O136 $1.50 136 For example, control structures are all implemented in terms of blocks, which are the Smalltalk equivalent class Change: of closures. Smalltalk’s equivalent to if-then-else is values the ifTtue:ifFalse: message, which is sent to a boolean TAtray with: self class with: self parameters object with two block arguments. If the receiver is an parameters instance of class True then the “true block” is evalu- self subclassResponsibility ated. Similarly, if the receiver is an instance of class Fake then the “false block” is evaluated. Looping is class ClassRelatedChange: implemented in a similar way using recursion. These parameters primitives can be used to implement case statements, TclassName generators, exceptions, and coroutines[Deu81]. Smalltalk has been used mostly for prototyping and class MethodChange: exploratory development. It is ideal for these pur- parameters poses because Smalltalk programs are easy to change fAtray with: className with: selector and reuse, and Smalltalk comes with a powerful pro- gramming environment and large library of generally class OtherChange: useful components. However, it has not been used parameters as much for production programming. This is partly Tself text because it is hard to use for multi-person projects, partly because it is hard to deliver application pro- grams without the large programming environment, Figure 1: Change and its subclasses. and partly because it is not as efficient as languages like C. In spite of these problems, the development and maintenance of Smalltalk programs is so easy that more and more companies are developing ap- plications with it. class is needed) [JF88]. Although conscientious prd- A type system can help solve Smalltalk’s problems. grammers remove these improprieties from finished Type information makes programs easier to under- programs, we wanted a type system that would al- stand and can be used to automatically check inter- low them, since they are often important intermedi- face consistency, which makes Smalltalk better suited ate steps in the development process. Thus, a static for multiperson projects. However, our main motiva- type system for Smalltalk must be as flexible as the tion for type-checking Smalltalk is to provide infor- traditional dynamic type-checking. mation needed by an optimizing compiler. Smalltalk In an untyped object-oriented language like methods (procedures) tend to be very small, mak- Smalltalk, classes inherit only the implementation ing aggressive inline substitution necessary to achieve of their superclasses. In contrast, most type sys- good performance. Type information is required to tems for object-oriented programming languages re- bind methods at compile-time. Although some type quire classes to inherit the specification of their super- information can be acquired by dataflow analysis of classes [BI82, Car84, CW85, SCB*86, Str86, Mey88], individual methods [CUL89], explicit type declara- not just the implementation [Sua81, Joh86]. A class tions produce better results and also make programs specification can be an explicit signature [BHJL86], more reliable and easier to understand. From this the implicit signature implied by the name of a class point of view, the type system has been quite suc- [SCB*86], or a signature combined with method pre- cessful, because the TS compiler can make Smalltalk and post-conditions, class invariants, etc. [Mey88]. programs run nearly as fast as C programs. Inheriting specification means that the type of meth- It is important that a type system for Smalltalk not ods in subclasses must be subtypes of their specifica- hurt Smalltalk’s good qualities. In particular, it must tions in superclasses. not hinder the rapid-prototyping style often used by Since Smalltalk is untyped, only implementation is Smalltalk programmers. Exploratory programmers inherited. This makes retrofitting a type system dif- try to build programs as quickly as possible, mak- ficult. Since specification inheritance is a logical or- ing localized changes instead of global reorganization. ganization, many parts of the Smalltalk inheritance They often use explicit run-time checks for particu- hierarchy conform to specification inheritance, but lar classes (evidence that code is in the wrong class) there is nothing in the language that requires or en- and create new subclasses to reuse code rather than forces this. Thus, it is common to find classes in to organize abstraction (evidence that a new abstract the Smalltalk- class library that inherit implemen- 137 tation and ignore specification. 3 Types Dictionary is a good example of a class that inher- The abstract and concrete syntax for type expressions its implementation but not specification; it is a sub- is shown in Figure 2 (abstract syntax on the left and class of Set. A Dictionary is a keyed lookup table concrete syntax on the right). In the concrete syn- that is implemented as a hash table of <key, value> tax grammar terminals are underlined, (something)* pairs. Dictionary inherits the hash table implemen- represents zero or more repetitions of the something, tation from Set, but applications using Sets would and (something)+ represents one or more repetitions behave quite differently if given Dictionaries instead. of the something. Each of the type forms will be de- scribed in detail in the following sections. Abstract classes, a commonly used Smalltalk pro- We use the abstract syntax in inference rules and gramming technique, provide other examples where the concrete syntax in examples. Type expressions classes inherit implementation but not specification. are denoted by a, t, and u. Type variables are de- Consider the method definitions shown in Figure 1. noted by p (abstract syntax) or by P (concrete_syn- These (and other) classes are used to track changes tax). Lists (tuples) of types are denoted by t = made to the system. The abstract class Change de- <ir,tz,***r t, >. The empty list is denoted by <> fines the values method in terms of the parameters and t et’is the list < t, 21, t2,. , t, >. Similar nota- method. The implementation of the values method is tion is used for lists of type variables. C denotes a inherited by the subclasses of Change. The result of class name and m denotes a message selector. sending the values message depends on the result of Strictly speaking, a type variable is not a type; it sending the parameters message, which is different for is a place holder (or representative) for a type. We each class in Figure 1.
Recommended publications
  • A Polymorphic Type System for Extensible Records and Variants
    A Polymorphic Typ e System for Extensible Records and Variants Benedict R. Gaster and Mark P. Jones Technical rep ort NOTTCS-TR-96-3, November 1996 Department of Computer Science, University of Nottingham, University Park, Nottingham NG7 2RD, England. fbrg,[email protected] Abstract b oard, and another indicating a mouse click at a par- ticular p oint on the screen: Records and variants provide exible ways to construct Event = Char + Point : datatyp es, but the restrictions imp osed by practical typ e systems can prevent them from b eing used in ex- These de nitions are adequate, but they are not par- ible ways. These limitations are often the result of con- ticularly easy to work with in practice. For example, it cerns ab out eciency or typ e inference, or of the di- is easy to confuse datatyp e comp onents when they are culty in providing accurate typ es for key op erations. accessed by their p osition within a pro duct or sum, and This pap er describ es a new typ e system that reme- programs written in this way can b e hard to maintain. dies these problems: it supp orts extensible records and To avoid these problems, many programming lan- variants, with a full complement of p olymorphic op era- guages allow the comp onents of pro ducts, and the al- tions on each; and it o ers an e ective type inference al- ternatives of sums, to b e identi ed using names drawn gorithm and a simple compilation metho d.
    [Show full text]
  • Safety Properties Liveness Properties
    Computer security Goal: prevent bad things from happening PLDI’06 Tutorial T1: Clients not paying for services Enforcing and Expressing Security Critical service unavailable with Programming Languages Confidential information leaked Important information damaged System used to violate laws (e.g., copyright) Andrew Myers Conventional security mechanisms aren’t Cornell University up to the challenge http://www.cs.cornell.edu/andru PLDI Tutorial: Enforcing and Expressing Security with Programming Languages - Andrew Myers 2 Harder & more important Language-based security In the ’70s, computing systems were isolated. Conventional security: program is black box software updates done infrequently by an experienced Encryption administrator. Firewalls you trusted the (few) programs you ran. System calls/privileged mode physical access was required. Process-level privilege and permissions-based access control crashes and outages didn’t cost billions. The Internet has changed all of this. Prevents addressing important security issues: Downloaded and mobile code we depend upon the infrastructure for everyday services you have no idea what programs do. Buffer overruns and other safety problems software is constantly updated – sometimes without your knowledge Extensible systems or consent. Application-level security policies a hacker in the Philippines is as close as your neighbor. System-level security validation everything is executable (e.g., web pages, email). Languages and compilers to the rescue! PLDI Tutorial: Enforcing
    [Show full text]
  • Create Union Variables Difference Between Unions and Structures
    Union is a user-defined data type similar to a structure in C programming. How to define a union? We use union keyword to define unions. Here's an example: union car { char name[50]; int price; }; The above code defines a user define data type union car. Create union variables When a union is defined, it creates a user-define type. However, no memory is allocated. To allocate memory for a given union type and work with it, we need to create variables. Here's how we create union variables: union car { char name[50]; int price; } car1, car2, *car3; In both cases, union variables car1, car2, and a union pointer car3 of union car type are created. How to access members of a union? We use . to access normal variables of a union. To access pointer variables, we use ->operator. In the above example, price for car1 can be accessed using car1.price price for car3 can be accessed using car3->price Difference between unions and structures Let's take an example to demonstrate the difference between unions and structures: #include <stdio.h> union unionJob { //defining a union char name[32]; float salary; int workerNo; } uJob; struct structJob { char name[32]; float salary; int workerNo; } sJob; main() { printf("size of union = %d bytes", sizeof(uJob)); printf("\nsize of structure = %d bytes", sizeof(sJob)); } Output size of union = 32 size of structure = 40 Why this difference in size of union and structure variables? The size of structure variable is 40 bytes. It's because: size of name[32] is 32 bytes size of salary is 4 bytes size of workerNo is 4 bytes However, the size of union variable is 32 bytes.
    [Show full text]
  • Cyclone: a Type-Safe Dialect of C∗
    Cyclone: A Type-Safe Dialect of C∗ Dan Grossman Michael Hicks Trevor Jim Greg Morrisett If any bug has achieved celebrity status, it is the • In C, an array of structs will be laid out contigu- buffer overflow. It made front-page news as early ously in memory, which is good for cache locality. as 1987, as the enabler of the Morris worm, the first In Java, the decision of how to lay out an array worm to spread through the Internet. In recent years, of objects is made by the compiler, and probably attacks exploiting buffer overflows have become more has indirections. frequent, and more virulent. This year, for exam- ple, the Witty worm was released to the wild less • C has data types that match hardware data than 48 hours after a buffer overflow vulnerability types and operations. Java abstracts from the was publicly announced; in 45 minutes, it infected hardware (“write once, run anywhere”). the entire world-wide population of 12,000 machines running the vulnerable programs. • C has manual memory management, whereas Notably, buffer overflows are a problem only for the Java has garbage collection. Garbage collec- C and C++ languages—Java and other “safe” lan- tion is safe and convenient, but places little con- guages have built-in protection against them. More- trol over performance in the hands of the pro- over, buffer overflows appear in C programs written grammer, and indeed encourages an allocation- by expert programmers who are security concious— intensive style. programs such as OpenSSH, Kerberos, and the com- In short, C programmers can see the costs of their mercial intrusion detection programs that were the programs simply by looking at them, and they can target of Witty.
    [Show full text]
  • [Note] C/C++ Compiler Package for RX Family (No.55-58)
    RENESAS TOOL NEWS [Note] R20TS0649EJ0100 Rev.1.00 C/C++ Compiler Package for RX Family (No.55-58) Jan. 16, 2021 Overview When using the CC-RX Compiler package, note the following points. 1. Using rmpab, rmpaw, rmpal or memchr intrinsic functions (No.55) 2. Performing the tail call optimization (No.56) 3. Using the -ip_optimize option (No.57) 4. Using multi-dimensional array (No.58) Note: The number following the note is an identification number for the note. 1. Using rmpab, rmpaw, rmpal or memchr intrinsic functions (No.55) 1.1 Applicable products CC-RX V2.00.00 to V3.02.00 1.2 Details The execution result of a program including the intrinsic function rmpab, rmpaw, rmpal, or the standard library function memchr may not be as intended. 1.3 Conditions This problem may arise if all of the conditions from (1) to (3) are met. (1) One of the following calls is made: (1-1) rmpab or __rmpab is called. (1-2) rmpaw or __rmpaw is called. (1-3) rmpal or __rmpal is called. (1-4) memchr is called. (2) One of (1-1) to (1-3) is met, and neither -optimize=0 nor -noschedule option is specified. (1-4) is met, and both -size and -avoid_cross_boundary_prefetch (Note 1) options are specified. (3) Memory area that overlaps with the memory area (Note2) read by processing (1) is written in a single function. (This includes a case where called function processing is moved into the caller function by inline expansion.) Note 1: This is an option added in V2.07.00.
    [Show full text]
  • Cs164: Introduction to Programming Languages and Compilers
    Lecture 19 Flow Analysis flow analysis in prolog; applications of flow analysis Ras Bodik Hack Your Language! CS164: Introduction to Programming Ali and Mangpo Languages and Compilers, Spring 2013 UC Berkeley 1 Today Static program analysis what it computes and how are its results used Points-to analysis analysis for understanding flow of pointer values Andersen’s algorithm approximation of programs: how and why Andersen’s algorithm in Prolog points-to analysis expressed via just four deduction rules Andersen’s algorithm via CYK parsing (optional) expressed as CYK parsing of a graph representing the pgm Static Analysis of Programs definition and motivation 3 Static program analysis Computes program properties used by a range of clients: compiler optimizers, static debuggers, security audit tools, IDEs, … Static analysis == at compile time – that is, prior to seeing the actual input ==> analysis answer must be correct for all possible inputs Sample program properties: does variable x has a constant value (for all inputs)? does foo() return a table (whenever called, on all inputs)? 4 Client 1: Program Optimization Optimize program by finding constant subexpressions Ex: replace x[i] with x[1] if we know that i is always 1. This optimizations saves the address computation. This analysis is called constant propagation i = 2 … i = i+2 … if (…) { …} … x[i] = x[i-1] 5 Client 2: Find security vulnerabilities One specific analysis: find broken sanitization In a web server program, as whether a value can flow from POST (untrusted source) to the SQL interpreter (trusted sink) without passing through the cgi.escape() sanitizer? This is taint analysis.
    [Show full text]
  • Design and Implementation of Generics for the .NET Common Language Runtime
    Design and Implementation of Generics for the .NET Common Language Runtime Andrew Kennedy Don Syme Microsoft Research, Cambridge, U.K. fakeÒÒ¸d×ÝÑeg@ÑicÖÓ×ÓfغcÓÑ Abstract cally through an interface definition language, or IDL) that is nec- essary for language interoperation. The Microsoft .NET Common Language Runtime provides a This paper describes the design and implementation of support shared type system, intermediate language and dynamic execution for parametric polymorphism in the CLR. In its initial release, the environment for the implementation and inter-operation of multiple CLR has no support for polymorphism, an omission shared by the source languages. In this paper we extend it with direct support for JVM. Of course, it is always possible to “compile away” polymor- parametric polymorphism (also known as generics), describing the phism by translation, as has been demonstrated in a number of ex- design through examples written in an extended version of the C# tensions to Java [14, 4, 6, 13, 2, 16] that require no change to the programming language, and explaining aspects of implementation JVM, and in compilers for polymorphic languages that target the by reference to a prototype extension to the runtime. JVM or CLR (MLj [3], Haskell, Eiffel, Mercury). However, such Our design is very expressive, supporting parameterized types, systems inevitably suffer drawbacks of some kind, whether through polymorphic static, instance and virtual methods, “F-bounded” source language restrictions (disallowing primitive type instanti- type parameters, instantiation at pointer and value types, polymor- ations to enable a simple erasure-based translation, as in GJ and phic recursion, and exact run-time types.
    [Show full text]
  • Coredx DDS Type System
    CoreDX DDS Type System Designing and Using Data Types with X-Types and IDL 4 2017-01-23 Table of Contents 1Introduction........................................................................................................................1 1.1Overview....................................................................................................................2 2Type Definition..................................................................................................................2 2.1Primitive Types..........................................................................................................3 2.2Collection Types.........................................................................................................4 2.2.1Enumeration Types.............................................................................................4 2.2.1.1C Language Mapping.................................................................................5 2.2.1.2C++ Language Mapping.............................................................................6 2.2.1.3C# Language Mapping...............................................................................6 2.2.1.4Java Language Mapping.............................................................................6 2.2.2BitMask Types....................................................................................................7 2.2.2.1C Language Mapping.................................................................................8 2.2.2.2C++ Language Mapping.............................................................................8
    [Show full text]
  • Multi-Return Function Call
    To appear in J. Functional Programming 1 Multi-return Function Call OLIN SHIVERS and DAVID FISHER College of Computing Georgia Institute of Technology (e-mail: fshivers,[email protected]) Abstract It is possible to extend the basic notion of “function call” to allow functions to have multiple re- turn points. This turns out to be a surprisingly useful mechanism. This article conducts a fairly wide-ranging tour of such a feature: a formal semantics for a minimal λ-calculus capturing the mechanism; motivating examples; monomorphic and parametrically polymorphic static type sys- tems; useful transformations; implementation concerns and experience with an implementation; and comparison to related mechanisms, such as exceptions, sum-types and explicit continuations. We conclude that multiple-return function call is not only a useful and expressive mechanism, at both the source-code and intermediate-representation levels, but also quite inexpensive to implement. Capsule Review Interesting new control-flow constructs don’t come along every day. Shivers and Fisher’s multi- return function call offers intriguing possibilities—but unlike delimited control operators or first- class continuations, it won’t make your head hurt or break the bank. It might even make you smile when you see the well-known tail call generalized to a “semi-tail call” and a “super-tail call.” What I enjoyed the most was the chance to reimagine several of my favorite little hacks using the new mechanism, but this unusually broad paper offers something for everyone: the language designer, the theorist, the implementor, and the programmer. 1 Introduction The purpose of this article is to explore in depth a particular programming-language mech- anism: the ability to specify multiple return points when calling a function.
    [Show full text]
  • Julia's Efficient Algorithm for Subtyping Unions and Covariant
    Julia’s Efficient Algorithm for Subtyping Unions and Covariant Tuples Benjamin Chung Northeastern University, Boston, MA, USA [email protected] Francesco Zappa Nardelli Inria of Paris, Paris, France [email protected] Jan Vitek Northeastern University, Boston, MA, USA Czech Technical University in Prague, Czech Republic [email protected] Abstract The Julia programming language supports multiple dispatch and provides a rich type annotation language to specify method applicability. When multiple methods are applicable for a given call, Julia relies on subtyping between method signatures to pick the correct method to invoke. Julia’s subtyping algorithm is surprisingly complex, and determining whether it is correct remains an open question. In this paper, we focus on one piece of this problem: the interaction between union types and covariant tuples. Previous work normalized unions inside tuples to disjunctive normal form. However, this strategy has two drawbacks: complex type signatures induce space explosion, and interference between normalization and other features of Julia’s type system. In this paper, we describe the algorithm that Julia uses to compute subtyping between tuples and unions – an algorithm that is immune to space explosion and plays well with other features of the language. We prove this algorithm correct and complete against a semantic-subtyping denotational model in Coq. 2012 ACM Subject Classification Theory of computation → Type theory Keywords and phrases Type systems, Subtyping, Union types Digital Object Identifier 10.4230/LIPIcs.ECOOP.2019.24 Category Pearl Supplement Material ECOOP 2019 Artifact Evaluation approved artifact available at https://dx.doi.org/10.4230/DARTS.5.2.8 Acknowledgements The authors thank Jiahao Chen for starting us down the path of understanding Julia, and Jeff Bezanson for coming up with Julia’s subtyping algorithm.
    [Show full text]
  • An Operational Semantics and Type Safety Proof for Multiple Inheritance in C++
    An Operational Semantics and Type Safety Proof for Multiple Inheritance in C++ Daniel Wasserrab Tobias Nipkow Gregor Snelting Frank Tip Universitat¨ Passau Technische Universitat¨ Universitat¨ Passau IBM T.J. Watson Research [email protected] Munchen¨ [email protected] Center passau.de [email protected] passau.de [email protected] Abstract pected type, or end with an exception. The semantics and proof are formalized and machine-checked using the Isabelle/HOL theorem We present an operational semantics and type safety proof for 1 multiple inheritance in C++. The semantics models the behaviour prover [15] and are available online . of method calls, field accesses, and two forms of casts in C++ class One of the main sources of complexity in C++ is a complex hierarchies exactly, and the type safety proof was formalized and form of multiple inheritance, in which a combination of shared machine-checked in Isabelle/HOL. Our semantics enables one, for (“virtual”) and repeated (“nonvirtual”) inheritance is permitted. Be- the first time, to understand the behaviour of operations on C++ cause of this complexity, the behaviour of operations on C++ class class hierarchies without referring to implementation-level artifacts hierarchies has traditionally been defined informally [29], and in such as virtual function tables. Moreover, it can—as the semantics terms of implementation-level constructs such as v-tables. We are is executable—act as a reference for compilers, and it can form only aware of a few formal treatments—and of no operational the basis for more advanced correctness proofs of, e.g., automated semantics—for C++-like languages with shared and repeated mul- program transformations.
    [Show full text]
  • Parametric Polymorphism Parametric Polymorphism
    Parametric Polymorphism Parametric Polymorphism • is a way to make a language more expressive, while still maintaining full static type-safety (every Haskell expression has a type, and types are all checked at compile-time; programs with type errors will not even compile) • using parametric polymorphism, a function or a data type can be written generically so that it can handle values identically without depending on their type • such functions and data types are called generic functions and generic datatypes Polymorphism in Haskell • Two kinds of polymorphism in Haskell – parametric and ad hoc (coming later!) • Both kinds involve type variables, standing for arbitrary types. • Easy to spot because they start with lower case letters • Usually we just use one letter on its own, e.g. a, b, c • When we use a polymorphic function we will usually do so at a specific type, called an instance. The process is called instantiation. Identity function Consider the identity function: id x = x Prelude> :t id id :: a -> a It does not do anything with the input other than returning it, therefore it places no constraints on the input's type. Prelude> :t id id :: a -> a Prelude> id 3 3 Prelude> id "hello" "hello" Prelude> id 'c' 'c' Polymorphic datatypes • The identity function is the simplest possible polymorphic function but not very interesting • Most useful polymorphic functions involve polymorphic types • Notation – write the name(s) of the type variable(s) used to the left of the = symbol Maybe data Maybe a = Nothing | Just a • a is the type variable • When we instantiate a to something, e.g.
    [Show full text]