CDuce: cohabitation of parametric, subtype, and ad hoc polymorphism

J. Lopez1

1: Université Paris Diderot - Paris VII, 5 Rue Thomas Mann, 75013 Paris, France [email protected]

Abstract. In this short paper, we report on the progress of the implementation of the CDuce functional language with polymorphic types. We highlight the challenges (both theoretical and implementation-wise) of having an ML-style programming language with parametric polymorphism, semantic subtyping, and a type-case. We also highlight recent work on linking OCaml and CDuce programs (so that each language can manipulate values from the other), as well as the issues encountered in the process. This ongoing work raises several language design questions that we think may be of interest to (or answered by) the community.

1. CDuce in a nutshell

CDuce [1] is a modern XML-oriented functional language with innovative features. A compiler (written in OCaml) is available under the terms of an open-source license. CDuce is type-safe, efficient, and offers powerful constructions to work with XML documents. Its syntax is rather close to that of OCaml. Here is an example of CDuce code:

    type phone_string = [ '+'? ('0'--'9')* ]
    type email_string = [ (Char\'@')+ '@' (Char\'@')+ ]
    type contact = <contact>[ <name>[Char*] (<phone>phone_string | <email>email_string) ]
    type addressbook = <abook>[ contact* ]

    let extract_info
      ( <contact>[ <name>[Char*] <phone>phone_string ] -> ([Char*], phone_string) ;
        <contact>[ <name>[Char*] <email>email_string ] -> ([Char*], email_string) )
      <_>[ <_>n <_>s ] -> (n,s)

    let contacts_to_list (addressbook -> [ ([Char*], phone_string|email_string)* ])
      <abook>l -> (map l with c -> extract_info c)

In this example, we first define a type phone_string representing sequences of characters between '0' and '9', optionally prefixed by a character '+'. As one can see, in CDuce sequences (delimited by square brackets) can be formed of elements of different types. The type of a sequence describes its content by a regular expression over types (e.g., '+' is the singleton type containing just the value '+'). Likewise, we define a type email_string as a sequence type with a regular expression (which uses a set-theoretic type difference: Char\'@' denotes the type of all UTF-8 characters but '@'). Based on these types, we can define an XML-element type, whose tag is contact, itself containing two elements: a name element and either a phone element or an email element (as


stated by the use of the union type operator "|"). Lastly, we define an addressbook type with a tag abook and a content which is a sequence of contacts.

We then define two functions. The first one, extract_info, is overloaded. It can either take a contact element whose second child is a phone and extract the corresponding string content, or it can take a contact element whose second child is an email and then also extract the string content. Note that the output type is not [Char*] but the precise sequence type phone_string or email_string, depending on the input. The function works by pattern-matching its input against the appropriate pattern (as in OCaml, _ stands for a wild-card pattern). Lastly, the function contacts_to_list transforms an addressbook into a list of pairs of strings (the name and either the phone or the email). To do so, it uses the special map_with construct, which applies its set of pattern-matchings to every element of the matched sequence.

As we can see, CDuce features type constructors (arrows, products, sequences, XML elements), set-theoretic type combinators (union, intersection, difference), recursive and regular expression types, and singleton types (for instance '@' is the type of the character '@'). This type algebra is supported by a very powerful notion of semantic subtyping, defined as:

    t ≤ s  ⇔  ⟦t⟧ ⊆ ⟦s⟧

where ⟦_⟧ is the function interpreting every type as the set of values having that type. Subtyping is therefore defined as set inclusion between interpretations. We invite the reader to refer to [2] for a description of the advantages of a semantic approach to subtyping (as opposed to an axiomatized syntactic approach), especially in the presence of Boolean connectives in types.
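The set-inclusion reading of subtyping can be illustrated with a toy model in OCaml. This is only a sketch under strong simplifying assumptions: the universe of values is a handful of integers, types are interpreted by explicit enumeration, and all names (ty, interp, subtype) are hypothetical. The actual CDuce subtyping algorithm does not enumerate values.

```ocaml
(* Toy model of semantic subtyping over a tiny universe of integers.
   Every name below is hypothetical and unrelated to the CDuce sources. *)
module IntSet = Set.Make (Int)

type ty =
  | Single of int        (* singleton type containing exactly one value *)
  | Range of int * int   (* all integers in an inclusive interval *)
  | Union of ty * ty     (* set-theoretic union *)
  | Inter of ty * ty     (* set-theoretic intersection *)
  | Diff of ty * ty      (* set-theoretic difference *)

(* interp t plays the role of the interpretation of t as a set of values. *)
let rec interp (t : ty) : IntSet.t =
  match t with
  | Single v -> IntSet.singleton v
  | Range (lo, hi) ->
      let rec build i acc =
        if i > hi then acc else build (i + 1) (IntSet.add i acc)
      in
      build lo IntSet.empty
  | Union (a, b) -> IntSet.union (interp a) (interp b)
  | Inter (a, b) -> IntSet.inter (interp a) (interp b)
  | Diff (a, b) -> IntSet.diff (interp a) (interp b)

(* t is a subtype of s exactly when interp t is included in interp s. *)
let subtype (t : ty) (s : ty) : bool = IntSet.subset (interp t) (interp s)
```

With this model, Single 3 is a subtype of Range (0, 9), while Range (0, 9) is not a subtype of Diff (Range (0, 9), Single 3), mirroring how Char\'@' excludes a single character from Char.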
While the previous example is (we hope) enticing, it also shows the limits of the original, monomorphic CDuce:
• relating the input and output types of a function requires listing all possible "instances" (as we did with our extract_info function), that is, resorting to so-called ad hoc polymorphism;
• higher-order iterators on sequences and trees have to be hard-coded operators of the language (like our map_with) to be typed precisely. Indeed, a map function could only have the uninteresting type (Any -> Any) -> [ Any* ] -> [ Any* ], thus losing all type information.

The core calculus and meta-theory underlying the CDuce language have recently been extended with parametric polymorphism. First, semantic subtyping was extended to support subtyping in the presence of type variables; then the core calculus was extended with parametric polymorphism ([5] for the static and dynamic semantics and [6] for local type inference).

2. Polymorphic CDuce

2.1. Short example

As an example, here is the function map in polymorphic CDuce:

    let mapf (f : 'a -> 'b)(l : ['a*]) : ['b*] =
      let aux (f : 'a -> 'b)(l : ['a*])(acc : ['b*]) : ['b*] =
        match l with
        | [] -> acc
        | [el; rest] -> aux f rest (acc @ [(f el)])
      in aux f l []

While the definition of mapf is very similar to what could be written in OCaml (save for the mandatory type annotation on the function definition), the CDuce compiler automatically infers the following, very precise type for the partial application mapf extract_info:

    ([ contact* ] -> [ ([Char*], phone_string|email_string)* ])
    & ([ <contact>[ <name>String <email>email_string ]* ] -> [ ([Char*], email_string)* ])
    & ([ <contact>[ <name>String <phone>phone_string ]* ] -> [ ([Char*], phone_string)* ])
    & ([ ] -> [ ])
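For comparison, the OCaml counterpart alluded to above might look as follows. This is a sketch that mirrors the accumulator-based structure of mapf rather than the standard library's List.map; appending with @ at each step is kept only to stay close to the CDuce code.

```ocaml
(* OCaml counterpart of mapf, keeping the same accumulator-based shape.
   The type annotations are optional in OCaml; they are written out here
   only to parallel the CDuce definition. *)
let mapf (f : 'a -> 'b) (l : 'a list) : 'b list =
  let rec aux l acc =
    match l with
    | [] -> acc
    | el :: rest -> aux rest (acc @ [ f el ])
  in
  aux l []
```

OCaml assigns this function the single type ('a -> 'b) -> 'a list -> 'b list. Its type language has no intersection types, so a partial application cannot carry one arrow per input shape the way the CDuce type above does.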


2.2. Type substitutions

One challenging theoretical aspect of polymorphic CDuce is the handling of type substitutions. Concretely, if we want to apply the polymorphic identity fun ('a -> 'a) x -> x¹ to, say, 42, then the particular instance obtained by the type substitution {Int/α} (denoting the replacement of every occurrence of α by Int) must be used, that is (fun (Int -> Int) x -> x) 42. We thus have to relabel the type decorations of λ-abstractions before applying them. In implicitly typed languages, such as ML, the relabeling is meaningless (no type decoration is used in terms), while in their explicitly typed counterparts relabeling can be seen as a logically meaningful but computationally useless operation, insofar as execution takes place on type erasures (i.e., the terms obtained by erasing all type decorations). In the presence of type-case expressions, however, relabeling is necessary since the label of a λ-abstraction determines its type: testing whether an expression has type, say, Int→Int should succeed for the application of fun ('a -> 'a -> 'a) x -> (fun ('a -> 'a) _ -> x) to 42 and fail for its application to true. In practice, we have that

    match (fun ('a -> 'a -> 'a) x -> fun ('a -> 'a) _ -> x) 42 with
    | (Int -> Int) -> 0
    | _ -> 1

must reduce to

    match (fun (Int -> Int) _ -> 42) with
    | (Int -> Int) -> 0
    | _ -> 1

and thus to 0, while

    match (fun ('a -> 'a -> 'a) x -> fun ('a -> 'a) _ -> x) `true with
    | (Int -> Int) -> 0
    | _ -> 1

must reduce to

    match (fun (Bool -> Bool) _ -> `true) with
    | (Int -> Int) -> 0
    | _ -> 1

and thus to 1.
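The relabeling step can be mimicked in a small OCaml model. This is a sketch under stated assumptions, not the evaluation strategy of [5]: functions carry their type label at run time, the substitution is computed by naively matching the domain of the label against the type of the argument, and the type-case is reduced to an equality test on labels instead of a subtyping check. All names (ty, value, apply, type_case) are hypothetical.

```ocaml
(* Simplified model: function values carry a type label, and applying a
   polymorphic function relabels the abstractions it returns. *)
type ty = Int | Bool | Var of string | Arrow of ty * ty

type subst = (string * ty) list

(* Replace every type variable bound in s by its image. *)
let rec apply_subst (s : subst) (t : ty) : ty =
  match t with
  | Var a -> (match List.assoc_opt a s with Some u -> u | None -> Var a)
  | Arrow (t1, t2) -> Arrow (apply_subst s t1, apply_subst s t2)
  | Int | Bool -> t

type value =
  | VInt of int
  | VBool of bool
  | VFun of ty * (subst -> value -> value)
      (* label, and a body that receives the substitution chosen at the
         application site so that it can relabel inner abstractions *)

let type_of (v : value) : ty =
  match v with VInt _ -> Int | VBool _ -> Bool | VFun (label, _) -> label

(* Naive matching of a domain type against the type of the argument. *)
let rec match_ty (pattern : ty) (actual : ty) (s : subst) : subst =
  match (pattern, actual) with
  | Var a, t -> if List.mem_assoc a s then s else (a, t) :: s
  | Arrow (p1, p2), Arrow (a1, a2) -> match_ty p2 a2 (match_ty p1 a1 s)
  | _ -> s

let apply (f : value) (arg : value) : value =
  match f with
  | VFun (Arrow (dom, _), body) -> body (match_ty dom (type_of arg) []) arg
  | _ -> failwith "apply: not a function"

(* fun ('a -> 'a -> 'a) x -> fun ('a -> 'a) _ -> x *)
let const_fun : value =
  VFun
    ( Arrow (Var "a", Arrow (Var "a", Var "a")),
      fun s x -> VFun (apply_subst s (Arrow (Var "a", Var "a")), fun _ _ -> x) )

(* match v with | (Int -> Int) -> 0 | _ -> 1, approximated by label equality *)
let type_case (v : value) : int =
  if type_of v = Arrow (Int, Int) then 0 else 1
```

In this model, type_case (apply const_fun (VInt 42)) evaluates to 0 and type_case (apply const_fun (VBool true)) evaluates to 1, matching the two reductions above. Without the call to apply_subst, both results would carry the label 'a -> 'a and the type-case could not tell them apart.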

2.3. Interface between OCaml and CDuce

The CDuce compiler also includes an interface between OCaml and CDuce, allowing a CDuce program to use OCaml values and vice versa. Take for example this line in CDuce:

    let listmap = List.map

Here listmap is the polymorphic function map from the OCaml module List, which can now be used on CDuce values.

Note: One also has to write an OCaml header file that declares such values with their corresponding types.² In our example, we should write this line in a .mli file:

    val listmap : ('a -> 'b) -> 'a list -> 'b list

To translate a CDuce function into an OCaml function (the opposite direction goes through a similar process), the system first translates the OCaml argument into the equivalent CDuce value (for example, an OCaml int is translated into an OCaml big_int, which corresponds to the type Int in CDuce) before giving it as an argument to the CDuce function stored in the environment. Then, it translates the result into the equivalent OCaml value. However, we cannot do that for a value of type 'a. We could translate an OCaml value of this type into an equivalent CDuce value, but then how would we translate the CDuce value returned by the application back into an OCaml value of type 'a? The only way is to create a CDuce value that contains the runtime representation of the OCaml value (using Obj.repr). The system uses this value for the application and then uses Obj.magic to recover a polymorphic OCaml value from the result.

Note: In our case, the use of Obj.magic is justified by the fact that the type system of CDuce is more precise than that of OCaml, so we need this cast to make OCaml accept our output. Besides, we can guarantee that this cast is actually type-safe because the translations from OCaml to CDuce and from CDuce to OCaml, as well as the CDuce type system, are type-safe.

¹ Anonymous functions can be written in CDuce, but with an interface: fun (t -> s) x -> e.
² There is ongoing work to derive the header file automatically.
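The round trip through Obj.repr and Obj.magic described in Section 2.3 can be sketched in OCaml as follows. The type cduce_value and the functions to_cduce_poly, of_cduce_poly, and cduce_id are hypothetical stand-ins chosen for illustration; they are not the actual runtime representation or API of the CDuce compiler.

```ocaml
(* Hypothetical stand-in for the runtime representation of CDuce values. *)
type cduce_value =
  | CInt of int                (* simplification: not arbitrary precision *)
  | CSeq of cduce_value list
  | COpaque of Obj.t           (* an OCaml value of type 'a, left untouched *)

(* OCaml -> CDuce at type 'a: wrap the runtime representation as is. *)
let to_cduce_poly (x : 'a) : cduce_value = COpaque (Obj.repr x)

(* CDuce -> OCaml at type 'a: unwrap and cast back. The cast is safe only
   if the wrapped value really has the type the OCaml side expects. *)
let of_cduce_poly (v : cduce_value) : 'a =
  match v with
  | COpaque o -> Obj.magic o
  | CInt _ | CSeq _ -> invalid_arg "of_cduce_poly: not an opaque OCaml value"

(* A function living on the CDuce side; here simply the identity. *)
let cduce_id (v : cduce_value) : cduce_value = v

(* The wrapper exposed to OCaml with the polymorphic type 'a -> 'a. *)
let ocaml_id (x : 'a) : 'a = of_cduce_poly (cduce_id (to_cduce_poly x))
```

The wrapper ocaml_id returns its argument unchanged at any type, for instance ocaml_id 42 and ocaml_id "abc". The design point is that the CDuce side never inspects the contents of COpaque, which is what makes the final cast harmless in this sketch.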


2.4. Future work

Some problems encountered during the implementation are still unresolved, and many improvements are possible:
• Syntax: the OCaml way of writing type variables is not completely compatible with the original syntax of CDuce. Is there a way to improve the current solution?
• What is missing for CDuce? Abstract types, type inference, ...

References

[1] The CDuce project webpage: https://www.cduce.org/redmine/projects/cduce (for the stable monomorphic version of CDuce, see tag 0.6.0). Website and documentation: http://www.cduce.org
[2] G. Castagna, A. Frisch. A Gentle Introduction to Semantic Subtyping. In PPDP '05, Proceedings of the 7th ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming. Pages 198-199. July 2005.
[3] G. Castagna, K. Nguyen, Z. Xu. Set-theoretic Foundation of Parametric Polymorphism and Subtyping. In ICFP '11, Proceedings of the 16th ACM SIGPLAN International Conference on Functional Programming. Pages 94-106. September 2011.
[4] Z. Xu. Polymorphisme paramétrique pour le traitement de documents XML. PhD Thesis, Université Paris Diderot. 2013.
[5] G. Castagna, K. Nguyen, Z. Xu, H. Im, S. Lenglet, L. Padovani. Polymorphic Functions with Set-Theoretic Types. Part 1: Syntax, Semantics, and Evaluation. In POPL '14, 41st ACM Symposium on Principles of Programming Languages. Pages 5-17. January 2014.
[6] G. Castagna, K. Nguyen, Z. Xu, P. Abate. Polymorphic Functions with Set-Theoretic Types. Part 2: Local Type Inference and Type Reconstruction. Unpublished manuscript. November 2013. http://www.pps.univ-paris-diderot.fr/~gc/papers/polydeuces-part2.pdf
[7] C. Okasaki. Purely Functional Data Structures. Cambridge University Press, 1998.
