Scala for Generic Programmers: Comparing Haskell and Scala Support for Generic Programming

Under consideration for publication in J. Functional Programming

BRUNO C. D. S. OLIVEIRA
ROSAEC Center, Seoul National University, 599 Gwanak-ro, Gwanak-gu, Seoul 151-744, South Korea
(e-mail: http://ropas.snu.ac.kr/ bruno/)

JEREMY GIBBONS
Oxford University Computing Laboratory, Wolfson Building, Parks Road, Oxford OX1 3QD, UK
(e-mail: http://www.comlab.ox.ac.uk/jeremy.gibbons/)

Abstract

Datatype-generic programming involves parametrization of programs by the shape of data, in the form of type constructors such as ‘list of’. Most approaches to datatype-generic programming are developed in pure functional programming languages such as Haskell. We argue that the functional object-oriented language Scala is in many ways a better choice. Not only does Scala provide equivalents of all the necessary functional programming features (such as parametric polymorphism, higher-order functions, higher-kinded type operations, and type- and constructor-classes), but it also provides the most useful features of object-oriented languages (such as subtyping, overriding, traditional single inheritance, and multiple inheritance in the form of traits). Common Haskell techniques for datatype-generic programming can be conveniently replicated in Scala, whereas the extra expressivity provides some important additional benefits in terms of extensibility and reuse. We illustrate this by comparing two simple approaches in Haskell, pointing out their limitations and showing how equivalent approaches in Scala address some of these limitations. Finally, we present three case studies on how to implement in Scala real datatype-generic programming approaches from the literature: Hinze’s ‘Generics for the Masses’, Lämmel and Peyton Jones’s ‘Scrap your Boilerplate with Class’, and Gibbons’s ‘Origami Programming’.
1 Introduction

Datatype-generic programming (DGP) is about writing programs that are parametrized by a datatype, such as lists or trees. This is different from parametric polymorphism, or ‘generics’ as the term is used by most object-oriented programmers: parametric polymorphism abstracts from the ‘integers’ in ‘lists of integers’, whereas DGP abstracts from the ‘lists of’.

There is a large and growing collection of techniques for writing datatype-generic programs. Much of the early research on DGP relied on special-purpose languages or language extensions such as Charity (Cockett & Fukushima, 1992), PolyP (Jansson, 2000), and Generic Haskell (Hinze & Jeuring, 2002). With time, research has shifted towards more lightweight approaches, based on language extensions such as Scrap your Boilerplate (Lämmel & Peyton Jones, 2003) and Template Haskell (Sheard & Peyton Jones, 2002); more recently, DGP techniques have been encapsulated in libraries for existing general-purpose languages, such as Generics for the Masses (Hinze, 2006) for Haskell and Adaptive Object-Oriented Programming (Lieberherr, 1996) for C++. One key advantage of the lightweight approaches is that DGP becomes more accessible to potential users, since no new tool or compiler is required in order to enjoy its benefits. Indeed, the use of libraries or simple language extensions rather than completely new languages has greatly promoted the adoption of DGP.

Despite the rather wide variety of host languages involved in the techniques listed above, the casual observer might be forgiven for concluding, from the wealth of proposals for lightweight generic programming in Haskell (Cheney & Hinze, 2002; Hinze, 2006; Lämmel & Peyton Jones, 2005; Oliveira et al., 2006; Hinze et al., 2006; Weirich, 2006; Hinze & Löh, 2007; Mitchell & Runciman, 2007; Brown & Sampson, 2009), that ‘Haskell is the programming language of choice for discriminating datatype-generic programmers’.
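The distinction drawn at the start of this section, between abstracting from the element type and abstracting from the ‘lists of’, can be sketched in Scala as follows. This is only a minimal illustration and not code from the paper: the names `ShapeVsElement`, `Functor`, `lengthOfList`, and `increment` are hypothetical, and a constructor class is just one simple way of abstracting over the shape.

```scala
object ShapeVsElement {
  // Parametric polymorphism: abstracts from the element type A,
  // but the shape (List) is fixed.
  def lengthOfList[A](xs: List[A]): Int = xs.length

  // Abstracting from the shape: F stands for a type constructor
  // such as List or Option (a constructor class; hypothetical name).
  trait Functor[F[_]] {
    def fmap[A, B](fa: F[A])(f: A => B): F[B]
  }

  object Functor {
    // Instances live in the companion object so that they are
    // found implicitly without any import.
    implicit val listFunctor: Functor[List] = new Functor[List] {
      def fmap[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
    }
    implicit val optionFunctor: Functor[Option] = new Functor[Option] {
      def fmap[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
    }
  }

  // One definition, usable at every shape F that has a Functor instance.
  def increment[F[_]](fa: F[Int])(implicit functor: Functor[F]): F[Int] =
    functor.fmap(fa)(n => n + 1)
}
```

Under these assumptions, `ShapeVsElement.increment(List(1, 2, 3))` and `ShapeVsElement.increment(Option(41))` use the same definition at two different shapes, whereas `lengthOfList` works only for lists, whatever their element type.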
Our purpose in this paper is to argue to the contrary; we believe that although Haskell is ‘a fine tool for many datatype-generic applications’, it is not necessarily the best choice. In particular, we argue that the discriminating datatype-generic programmer ought seriously to consider using Scala, a relatively recent language providing a smooth integration of the functional and object-oriented paradigms. Scala offers equivalents for most familiar features cherished by datatype-generic Haskell programmers, such as parametric polymorphism, higher-order functions, higher-kinded types, and type- and constructor-classes. (Two significant missing features are lazy evaluation and higher-ranked types.) In addition, it offers some of the most useful features of object-oriented programming languages, such as subtyping, overriding, and both single and a form of multiple inheritance (via ‘traits’).

We show not only that Haskell techniques for DGP can be conveniently replicated in Scala, but also that the extra expressivity provides important additional benefits in terms of extensibility and reuse. Specifically, intricate constructions are often needed to bend the implicit dictionaries in Haskell’s class system to DGP purposes; these convolutions could mostly be avoided if one had first-class dictionaries. Scala’s traits mechanism provides such a facility, including the ability to pass them implicitly where appropriate.

We are not the first to consider DGP in Scala: Moors et al. (2006) presented a translation into Scala of a Haskell library of ‘origami operators’ (Gibbons, 2006); we discuss this translation in depth in Section 9. And of course, it is not really surprising that Scala should turn out effectively to subsume Haskell, since its design is substantially inspired by functional programming. We are interested in finding the limitations of the general-purpose mechanisms, in order to point out their weaknesses and to promote their improvement.
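The idea of first-class dictionaries that may also be passed implicitly can be sketched as follows. This is a minimal illustration under stated assumptions rather than the encoding developed later in the paper: `FirstClassDictionaries`, `Show`, `decimalInt`, `hexInt`, and `showAll` are hypothetical names introduced only for this example.

```scala
object FirstClassDictionaries {
  // A trait plays the role of a Haskell type class:
  // each instance is a dictionary of operations.
  trait Show[A] {
    def show(a: A): String
  }

  object Show {
    // Default dictionary for Int, supplied implicitly when none is given.
    implicit val decimalInt: Show[Int] = new Show[Int] {
      def show(a: Int): String = a.toString
    }
  }

  // A second dictionary for the same type. It is an ordinary value,
  // so it can be named, stored, and passed around explicitly.
  val hexInt: Show[Int] = new Show[Int] {
    def show(a: Int): String = "0x" + Integer.toHexString(a)
  }

  // The dictionary is an implicit parameter: callers may omit it
  // or override the default by passing one explicitly.
  def showAll[A](xs: List[A])(implicit dict: Show[A]): String =
    xs.map(a => dict.show(a)).mkString("[", ", ", "]")

  val viaImplicit: String = showAll(List(10, 255))         // "[10, 255]"
  val viaExplicit: String = showAll(List(10, 255))(hexInt) // "[0xa, 0xff]"
}
```

In Haskell, a type has at most one instance of a given class, so choosing between the two renderings would typically require a wrapper type; in this sketch the alternative dictionary is simply passed as an argument.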
The aim here is not to compare particular DGP libraries, as was done by Rodriguez et al. (2008), but rather to consider the basic mechanisms used to implement those libraries. We claim that the language mechanisms traditionally used for implementing DGP libraries, namely type classes (Hall et al., 1996) and generalized algebraic datatypes (GADTs) (Peyton Jones et al., 2006), each have their limitations, and that it might be better to exploit a different mechanism incorporating the advantages of both. This paper explores Scala’s object system as such an alternative mechanism.

We feel that our main contribution is as a call to datatype-generic programmers to look beyond Haskell, and particularly to look at Scala. Not only can Scala be used to express current approaches to DGP; in some ways (in particular, with its open datatypes, inheritance, and implicits mechanism) it improves upon Haskell. Some of those advantages derive from Scala’s mixed-paradigm nature, and so do not translate back into Haskell; but others (such as case classes and anonymous case analyses, as we shall see) would fit perfectly well into Haskell. We emphasize that we are not arguing that Scala is universally superior to Haskell. Indeed, as we shall see, Scala too has some limitations. Instead, our purpose is to promote the migration of good ideas from Scala to Haskell and vice versa, so as to improve support for DGP in both languages.

As a secondary contribution, we show that Scala is more of a functional programming language than is typically appreciated. Scala tends to be seen primarily as an object-oriented language that happens to have some functional features, and so potential users feel that they have to use it in an object-oriented way. For example, Moors et al. (2006) claimed to be ‘staying as close to the original work as possible’ in their translation of the origami operators, but as we show in Section 9 they still ended up less functional than they might have done. Scala is also a functional programming language that happens to have object-oriented features; indeed, it offers the best of both worlds, and this paper serves also as a tutorial in exploiting Scala as a multi-paradigm language.

The rest of this paper is structured as follows. Section 2 sets the scene, by reviewing two straightforward approaches to DGP in Haskell (using representation datatypes and type classes respectively) and pointing out their limitations. Sections 3 and 4 introduce the basics of Scala, and those more advanced features of its type and class system on which we depend. Our contribution starts in Sections 5 and 6, which show how to implement in Scala the two approaches presented in Haskell in Section 2. After that, three case studies of the translation of existing DGP libraries into Scala are presented: Section 7 discusses an implementation of Hinze’s Generics for the Masses approach (Hinze, 2006); Section 8 shows a Scala implementation of Scrap your Boilerplate with Class (Lämmel & Peyton Jones, 2005); and Section 9 presents a more functional alternative to Moors et al.’s encoding (2006) of the Origami Programming operators (Gibbons, 2003; Gibbons, 2006) in Scala. Finally, Section 10 compares Haskell and Scala support for DGP, and briefly discusses some of the key ideas of the paper, and Section 11 concludes. Scala code for the examples is available online (Oliveira, 2009b).

2 Some Limitations of Haskell for Datatype-Generic Programming

To conduct our experiments, we consider two very simple and straightforward libraries for generic programming, using representation datatypes and type classes respectively. The purpose here is to discuss the limitations of the two mechanisms. While
