arXiv:2106.06278v1 [cs.PL] 11 Jun 2021 omtvt otat n h rbe asdb unions, by caused problem the and contracts motivate To o xeddnmccek to checks dynamic extend you tests. disjunc- two Boolean the of simply tions is it easy: dynamic really for are surely, unions hand, tests, other the On patterns. typed dynami- existing cally formalise they grad- where in language it typed worth ually difficult only really quite are unions are So statically. they verify because to mostly systems, type static 22]. 20, in- 17, quite 14, [10, been types has union systems, in type terested dynamic be- and interplay static the tween formalising with typ- gradual concerned the literature, why ing is This . optional absent an resent overviews Concepts: CCS literature. the of exam- survey via a contracts, and trade- intersection ple and and ex- union problems really to the inherent doesn’t illustrate offs and fact, point the We why. of inter-plain aware and keenly union while with section, contracts The difficult. higher-order deceptively on are literature intersection and union tracts of simple. part pretty dynamic actually the is that typing ver- first gradual at to appear difficult recog- may are it been types statically, ify intersection long and it’s union While that nized TypeScript. as typed gradually such of staple language a are types intersection and Union Abstract no ye,maigatype a meaning types, Union Introduction 1 section Keywords: cit[]adMP 7,ueuintpst oe h fre- the model to value the types use union to use practice [7], quent MyPy Type- and both particular, [9] In Script language. dynamic a to types static type a to either belong features grto agaei agaecnendwt describing with concerned language a con- is A language languages. figuration configuration through detour a make let’s languages Configuration 1.1 language. properties your desirable threaten and difficult, pretty actually notntl,a edcmn nti ril,a onas soon as article, this in document we as Unfortunately, of feature common a not are unions hand, other the On ttrsothwvr hti rsneo ihrodrcon- higher-order of presence in that however, out turns It no n nescincnrcsaehr,actually hard, are contracts intersection and Union nvria eBeo Aires Buenos de Universidad ; unsArs Argentina Aires, Buenos • ; otaevrfiainadvalidation and verification Software [email protected] otat,hge-re otat,uin inter- union, contracts, higher-order contracts, edr Freund Teodoro otaeadisengineering its and Software • eea n reference and General A or B r oua olwe adding when tool popular a are , A contracts ∪ null B otiigvle which values containing ( None 1] nosbecome unions [11], nPto)t rep- to Python) in → . → uvy and Surveys Language [email protected] anHamdaoui Yann ai,France Paris, of Tweag AL aho h ofiuaingnrtn agae above languages configuration-generating the of Each YAML. omdof: data formed con- JSON its illus- the and mechanisms, is as abstraction Nickel with [6] augmented core, language model, its Nickel At the motivation. use and will tration we article, this In Nickel 1.2 of natively. definition top job-matrix on such layered allow system templating a using for typi- this them systems do instantiate cally integration and Continuous once, better, infrastructure. steps much each the It’s prone. write error te- to is and instead, This maintain, infrastructure. to each hard for dious, have steps same would the ver- configuration copy Traditional different you compiler. with a or of infrastructures, sions different tests on same the run wherein jobs, are of matrix a need to typical fairly generated created. are were configurations out, Json- spelt where [4], than [6], Dhall rather Nickel [2], or Cue [5], as such net languages To con- need, the do. this of languages parts meet programming abstract traditional and re-use like re- to figuration, able you be configurations, to of want sort ally this describe docker To replicated) (possibly containers. of fleet large a you configure Kubernetes to with or need instance, computer, For a computers. of of fleet state a entire even the describe to extended been out. config- spelt explicitly, the and JSON, fully, or is TOML, uration YAML, as such configura- languages, traditional tion In application. an of configuration the n ikldt uthv tagtowr interpretation JSON. straightforward a in have is must Nickel data of Nickel constraint any design a Therefore application. an by configuration, nte xml scniuu nerto ytm:it’s systems: integration continuous is example Another have configurations DevOps, of advent the with However, iklcngrto ste vlae oa explicit an to evaluated then is configuration Nickel A • • • • functions: arrays: itoais rte as: written dictionaries, n let-definitions: and x 2, ,xn] ., . ,. x2 , [x1 fed au1, ,fed aun} valuen = fieldn ., . ,. value1 = {field1 fun let =value = id r1 .argn . arg1. e.g. nJO,wihcnte econsumed be then can which JSON, in [email protected] in ⇒ radSpiwack Arnaud exp body ai,France Paris, Tweag Conference’17, July 2017, Washington, DC, USA Teodoro Freund, Yann Hamdaoui, and Arnaud Spiwack

1.3 Contracts Unfortunately, the delayed check of contract, while essen- A useful feature of a configuration language is to provide fa- tial to ensuring that schema validation doesn’t affect per- cilities for schema validation. That is, help answer questions formance (or, indeed, is possible at all on functions), make like: does our configuration have all the required fields? does union contracts (and their less appreciated sibling, intersec- the url field indeed contains a URL? tion contracts) quite problematic. These are inherently dynamic questions, as they are all questions about the evaluated configuration. To this effect, 1.4 Contributions Nickel lets us annotate any expression with a dynamic schema Our contributions are as follows check: exp | C. There is also syntactic sugar to annotate def- • We describe the fundamental difficulties caused by pres- initions: let id | C = value in exp stands for let id = ( ence of union and intersection contracts in a language, value | C) in exp. which are kept implicit in the literature (Section 4) Let us pause for a moment and consider the following: • We survey the various trade-offs which appear in im- it is Nickel’s ambition to be able to manipulate configura- plemented languages and in the literature to work around tions like Nixpkgs. With over 50 000 packages, it is one of these difficulties (Section 5) the largest repository of software packages in existence [8]. Concretely, Nixpkgs is a dictionary mapping packages to 2 A typology of language features build recipes. That is, a massive, over-50 000-key-value-pair Union contracts are not only difficult to implement, their wide dictionary. It is absolutely out of the question to eval- unrestricted presence is incompatible with potentially desir- uate the entirety of this dictionary every time one needs to able properties of the language. In this section we present install 10 new packages: this would result in a painfully slow some of these properties; we will show how these properties experience. interact with union contracts in Sections 4 and 5. To be able to support such large dictionaries, Nickel’s dic- tionaries are lazy, that is, the values are only evaluated when 2.1 User-defined contracts explicitly required. For instance, when writing nixpkgs.hello, only the hello package gets evaluated. A strength of dynamic checking is that we can easily check But let’s consider now writing something like nixpkgs | properties which are impractical to check statically. For in- packages, to guarantee that all the packages conform to the stance that a string represents a well-formed URL, or a num- desired schema. If this were a simple Boolean test, it would ber is a valid port. have to evaluate all 50 000 package to check their validity, This same property is desirable of contracts as well, oth- hence breaking the laziness of dictionaries. Do we have to erwise we lose an important benefit of dynamic checking. choose between laziness and schema validation? Fortunately, Preferably, we want to be able to extend the universe of con- we don’t! Enter contracts [11]: dynamic checks which can tracts with user-defined predicates. be partially delayed, yet errors can be reported accurately. For instance, Figure 2 shows the definition of a contract Contracts can respect laziness of dictionaries, and they can for valid ports in Nickel syntax. User-defined contracts can be use to add schema validation to functions as well (in fact be combined with other contracts normally: Int → Port is functions were the original motivation for contracts). a contract verified by functions which, given an integer re- ThereisnoBooleanfunctionwhichcancheckthatavalue turns a valid port. has type Str → Str.Instead,acontractfor Str → Str checks This type of contracts are present in many different lan- for each call of the function whether guages, for instance, the Eiffel [16], the precursor of the Design by Contract philosophy, makes 1. the argument has type Str, otherwise the caller of the it possible to assert these kinds of expression as pre- and function is faulty post-conditions on functions and as invariants on classes[3]. 2. if so, that the returned value has type Str, otherwise The Racket programming language also has a system to the implementation of the function is faulty work with contracts, powerful enough to define flat con- Like in the case of lazy dictionaries, the checks are de- tracts, and to compose them with other kinds of dynamic layed. Contracts keep track of whether the caller or the im- checks, like higher order contracts or a lightweight take on plementation is at fault for a violation, hence it can report union and intersection contracts[1]. precise error messages. Contracts are said to blame either the caller or the implementation. Compare Figure 1a and Fig- 2.2 Referential transparency ure 1b: in Figure 1a an error is reported inside the catHosts The performance of modern programs heavily relies on the function, but catHosts is, in fact, correct, as is made clear by optimizations performed by the compiler or the interpreter. Figure 1b, where catHosts is decorated with the Str → Str Even more so for functional languages, whose execution contract, and correctly reports that the caller failed to call model is often far removed from the hardware, causing naive catHosts with a string argument. execution to exhibit unacceptable slowdowns. Union and intersection contracts are hard, actually Conference’17, July 2017, Washington, DC, USA ✞ ☎ 1 let catHosts = fun last ⇒ 2 let hosts = ["foo.com", "bar.org"] in 3 lists.fold (fun val acc ⇒ val ++ "," ++ ✞ ☎ acc) hosts last in 1 let catHosts | Str → Str = fun last ⇒ 4 2 [...] 5 let makeHost = fun server ext ⇒ ✝ ✆ 6 server ++ "." ++ ext in 7 8 catHosts (makeHost "google") ✝ ✆ error: Type error error: Blame error: contract broken by the caller. 3 | [...] "," ++ acc) hosts last in | Str → Str | ^^^ | --- expected type of the argument [...] | This expression has type Fun, [...] | but Str was expected 1 | let catHosts | Str → Str = fun last ⇒ 4 | | ^^^^^^^^^^boundhere 5 | let makeHost = fun server ext ⇒ [...] in [...] | ------6 | catHosts (makeHost "google") | evaluatedtothis | ------(2) calling = ++, 2nd argument

(a) Error reporting without contract (b) Error reporting with contract

Figure 1. Contracts improve error messages

✞ ☎ ✞ ☎ 1 let Port = contracts.fromPred (fun p ⇒ 1 let elem = fun elt ⇒ 2 num.isInt p && 0 ≤ p && p ≤ 65535) in 2 lists.any ( fun x ⇒ x == elt) in 3 80 | Port 3 ✝ ✆4 let subList = fun l1 l2 ⇒ 5 elem (lists.head l1) l2 Figure 2. A contract for valid ports 6 && subList (list.tail l1) l2 ✝ ✆ Source program ✞ ☎ One such important optimization is inlining (Figure 3). 1 let subList = fun l1 l2 ⇒ Functional programs tend to make heavy use of functions, 2 lists.any ( fun x ⇒ x == (lists.head l1)) and a function call is not a free operation: it usually involves l2 3 && subList (list.tail l1) l2 a number of low-level operations such as saving and storing ✝ ✆ registers, pushing a new stack frame and jumping to and Optimized program back from the function’s body. Inlining eliminates a func- tion call by directly substituting the function for its defini- tion at compile time (or before execution, for interpreted lan- Figure 3. Inlining guage). This is especially efficient for small functions that are called repeatedly. While inlining expands an expression by substituting a For example, take the code of Figure 5. The partial ap- definition for its value, an opposite transformation can be plication g y is recomputed each time f is called. This may beneficial when a composite expression occurs at several be costly, in particular in the presence of contracts: if the places. In this case, the same expression is wastefully recom- first argument of g must be a list with elements of a spe- puted at each occurrence. Common subexpression elimina- cific and if that precondition is enforced by a function tion (CSE) consists in storing the expression in a variable contract (say g | List Odd → Even → List Odd), the addi- that is then used in place of the original occurrences (Fig- tional cost is linear in the size of y. A sensible thing to do is ure 4), thus evaluating the expression once and for all. to factor g y out of f as in Figure 5, which is something a Beyond CSE, other optimizations such as loop-invariant let-floating transformation could indeed do (given g is pure, code motion or let-floating [18] apply the same principle of as detailed below). extracting out an invariant expression to avoid recomputing The soundness of these optimizations is tied to the valid- it (respectively across loop iterations and function calls). ity of specific program equivalences. Inlining requires that Conference’17, July 2017, Washington, DC, USA Teodoro Freund, Yann Hamdaoui, and Arnaud Spiwack ✞ ☎ ✞ ☎ let 1 let elemAtOrLast = fun index list ⇒ 1 f x = print "hi";(x+1) 2 if index > lists.length list - 1 then ✝ ✆ Effectful function 3 lists.elemAt (lists.length list - 1) list ✞ ☎ else 1 (f 1,f 1) ; let y=f1 in (y,y) 4 ✝ ✆ 5 lists.elemAt index list Invalid expansion ✝ ✆ Source program ✞ ☎ Figure 6. Counter-example to (1) in presence of side-effects 1 let elemAtOrLast = fun index list ⇒ 2 let l = lists.length list - 1 in 3 if index > l then Strikingly, we will see in Section 4 that the introduction 4 lists.elemAt l list of unions and intersections make (1) unsound even in a pure else 5 setting, preventing the kind optimization of Figure 5 to fire 6 lists.elemAt index list ✝ ✆ in general. Optimized program 3 Union & intersection Let us now consider union and intersection contracts, be- Figure 4. Common subexpression elimination fore we explain, in Section 4 how they can compromise the ✞ ☎ properties that we described in Section 2. 1 let f = fun x ⇒ g y (x + 1) ✝ ✆ 3.1 Unions Source program A A ∪ B is a type of values which are either of ✞ ☎ 1 let g' = g y in type A or oftype B: literally the union of A and B. Union types 2 let f' = fun x ⇒ g' (x + 1) are popular in gradual typed systems such as TypeScript [9] ✝ ✆ and MyPy [7]. Optimized program In gradually typed systems. The problem that these prac- tical gradual type systems are trying to solve is to capture, Figure 5. Let-floating in static types, as many programming patterns as possible from the underlying dynamically typed language(JavaScript one can replace the application of a function by its body, for TypeScript and Python for MyPy). One such pattern is which is basically V-reduction: as long as the arguments are heterogeneous collections. For instance, in TypeScript, an evaluated following the language’s strategy, this is usually array which can contain both strings and numbers would a valid transformation. However, Section 4 exposes that the have type Array. question of inlining a function with a contract attached is A probablyeven morecommonpattern is avariable which more subtle. can contain either a value of type A (say, a number) or the null A CSE-like transformation on a term " requires on the value. So much so, in fact, that MyPy defines a type other hand an equivalence of the form: alias Optional[A] for Union[A,None] (None is how Python renders the null value). " [# /G] ≃ ;4C G = # 8=" (1) Yet another application of union types is, rather than cap- turing a pattern from JavaScript or Python, to capture a pat- " [# /G] stands for the substitution of G for the term # in tern from traditional statically typed language: sum types. the term ". This equation clearly fails in presence of side- In statically typed languages, values of sum types are usu- (f 1, effects, as demonstrated in Figure 6. In that example, ally thought of as being built out of constructors. But nei- f 1) "hi" let y=f1 in (y,y) prints two times while only ther JavaScript nor Python have such constructors. So in- prints it once. stead, sums are construed as “tagged unions” (or discrimi- However, (1) does hold for pure terms, that are terms with- nated unions), that is, quite literally, the union of two types out side-effects. In a pure language, all these transforma- which contain a discriminating tag. See Figure 7 for an ex- tions are valid. In impure languages, the situation varies: ample from the TypeScript documentation: there the kind in some case a large subset of pure terms can be identified field is the tag, and its type in both alternatives is a single- (in languages with effects tracking such as PureScript) to be ton type which contains only the specified string. safely transformed. Otherwise, the compiler must stay con- servative and only apply CSE to expressions it can prove are Union contracts. In the academic gradual type literature, without side-effects (arithmetic expressions, for example). it is common to use contracts as a glue between static and Union and intersection contracts are hard, actually Conference’17, July 2017, Washington, DC, USA ✞ ☎ ✞ ☎ 1 Circle { 1 let Date = {day | Num, month | Num, year | 2 kind: "circle"; Num} in 3 radius: number; } 2 let DateWeek = {dayOfWeek | Num, week | Num, 4 year | Num} in 5 interface Square { 3 let appendDate | (List (Date ∪ Str) 6 kind: "square"; 4 ∪ List (DateWeek ∪ Str)) 7 sideLength: number; } 5 → List = 8 6 fun list ⇒ lists. list "01/01/2021" 9 type Shape = Circle | Square; 7 } ✝ ✆ ✝ ✆

Figure 7. A sum type as a Figure 8. Adding an element to a union of two arrays

defined Animal and Pet, and a variable that is compatible dynamic types. Therefore, the question of bringing union to with both types is declared, with type Animal ∩ Pet. This contracts is natural, and have indeed been studied (e.g. [14, particular application is supported, for instance, by Type- 22]). Script. Like for static types, a value which satisfies the contract A ∪ B is a value which satisfies either contract A or contract ✞ ☎ 1 let Animal = B (though in Section 5 we will see that it may be desirable 2 { species | Str, breed | Str, name | Str } to weaken this definition). in Nickel is a language built from scratch with contracts, so 3 let Pet = { owner | Str, name | Str } in it may be less clear why unions are useful. However, Nickel’s 4 let myDog | Animal ∩ Pet = ambition is to have its data model canonically interpretable 5 { species = "Canis Lupus", in common serialization formats, in particular JSON. It means 6 breed = "Australian Cattle Dog", that it is very convenient to represent optional value by 7 owner = "Anonymous Author", the null value like in JavaScript. It also means that Nickel 8 name = "Juno" } doesn’t have built-in constructors: constructors don’t have a ✝ ✆ canonical representation in JSON. So it would be quite nat- Figure 9. An animal that is also a pet ural to represent optional contracts and sum contracts as unions. Another application of intersections shows up when in- 3.2 Intersections tersecting functions: it can be used to encode overloading. For instance, take a look at Figure 10, where the function An A ∩ B is satisfied by values which sat- duplicate works both as a function to duplicate arrays, as isfy both contract A and contract B. well as a function to duplicate strings. This is particularly Intersection contracts (and types) are probably less preva- useful when using unions, since it’s a good way to express lent than union in practical type systems. However, a func- that a function can deal with different shapes of data. For tion from a union is equivalent to an intersection. That is (A instance, in the same figure, duplicate is (correctly) called ∪ B) → C ≃ (A → C) ∩ (B → C). So in a system with func- on a value of type (List Str) ∪ Str. tions and unions, intersections are already morally present (and, for that matter, in a system with functions and inter- ✞ ☎ 1 let duplicate sections, unions are morally present). Some of our examples 2 | (List Str → List Str) in Sections 4 and 5 are better expressed in terms of intersec- 3 ∩ (Str → Str) = tions, so it’s best to include them. 4 fun x ⇒ x ++ x in Figure 8 gives a concrete example of this phenomenon. 5 let text | (List Str) ∪ Str =...in The function appendDate append an element to list whose 6 duplicate text type is only known to be in union of two lists, each using ✝ ✆ a different representation. Both alternatives support falling back to a simple string for unparsed dates. Faced with these Figure 10. Duplicating an array of Strings or a String two possibilities, appendDate can only append a value which fits both types: this is precisely the intersection (Date ∪ Str ) ∩ (DateWeek ∪ Str), that is, Str. 4 Incompatibilities Yet, intersections are useful in their own right. They can However appealing union and intersection contracts may be used to combine dictionaries in the style of object-oriented be, they happen to be fundamentally incompatible with the multiple inheritance. For instance, in Figure 9 two types are desirable language features from Section 2. At least in their Conference’17, July 2017, Washington, DC, USA Teodoro Freund, Yann Hamdaoui, and Arnaud Spiwack

full-blown form: in Section 5 we will discuss pragmatic re- This behavior indicates that union contracts introduce strictions of union and intersection contracts to recover some side-effects. The result of f 5 now depends on the previous or all of the features. execution and more specifically on any prior call to f. This behavior of union contracts break the property 1 introduced 4.1 Union contracts as a side-effect in Section 2.2, that is required to perform CSE-like optimiza- In Nickel, the failure of a function contract can always be tions in all generality. The candidate example of Figure 5 in traced back to a single call. For example, take the function Section 2.2 can’t be optimized in general. f with a simple contract attached of Figure 11. The whole Figure 13 illustrates this point further. It contains an orig- program fails with a contract error blaming f because the inal program and an optimized version where the common return value of the second call f 5 violates the Positive con- subexpression f 1 has been eliminated. While equivalent in tract. The first call to f does not matter, and f 5 is a single a pure language without contracts, these two programs be- and independent witness of the contract violation. The user have differently because of unions: is pointed to this one location in practice. • The original version returns (1, "False") without fail- This single witness property can be justified as follows. ing. Apart from the error reporting part (although this is the cru- • The optimized version fails with a contract violation. cial in practice!), the current contract system of Nickel can be implemented purely as a library, requiring only a In the original version, each partial application f 1 gives fail primitive to abort the execution. In practice, applying rise to a fresh instance of the contract Bool → Num ∪ Bool a function contract to f replaces it with a f' that performs → Str. These instances are independent, and can pick a dif- the additional checks. Thus, since the core language is pure ferent component of the union to satisfy. Although f doesn’t (albeit partial, if only because fail), the failure of f' 5 must actually respect the contract, these calls are not enough to be independent of its environment and of any previous call prove so. In the optimized version, g is endowed with a sin- to f'. gle contract, that must pick one of the two components of ✞ ☎ the union. There, the two calls refer to the same union con- 1 let f | Positive → Positive tract, and shows that f does violate its initial contract. 2 = fun x ⇒ x - 7 in 3 (f 10) + (f 5) ✞ ☎ ✝ ✆ 1 let f | Num → (Bool → Num ∪ Bool → Str) 2 = fun x y ⇒ if y then x else "False" Figure 11. Simple contract violation 3 in (f 1 true , f 1 false ) ✝ ✆ Original Union contracts are different. Consider the program pre- ✞ ☎ sented in Figure 12. The same f is now given a union con- 1 let f | Num → (Bool → Num ∪ Bool → Str) f tract. is violating this contract once again, as it neither 2 = fun x y ⇒ if y then x else "False" maps positive numbers to positive numbers nor positive num- 3 let g=f1 in bers to negative numbers. 4 (g true , g false ) ✝ ✆ ✞ ☎ Optimized 1 let f | (Positive → Positive) 2 ∪ (Positive → NonPositive) 3 = fun x ⇒ x - 7 in Figure 13. Equivalent programs, with inlining or CSE ap- 4 (f 10) + (f 5) plied ✝ ✆

Figure 12. Union contract violation To sum up, the addition of union contracts introduce side- effects in a pure language. Side-effects have well-known pit- This program must fail, because f 10 is a witness of f fail- falls: ing the contract Positive → NonPositive, and f 5 is a wit- ness of f failing Positive → Positive. But, as opposed to • For the programmer, they are hard to reason about. the example from Figure 11, removing only one of the calls They prevent local reasoning. In our previous exam- makes the program succeed! Indeed, each call only unveils ples, removing or adding a function call somewhere theviolation ofonecomponentoftheunion. Inthisexample, can toggle a failure in a call at a totally different loca- a single call to f that would be the witness of the violation tion. of the whole contract doesn’t even exist: a minimum of two • For the interpreter (or compiler), side-effects inhibit are always needed. many optimizations and program transformations. Union and intersection contracts are hard, actually Conference’17, July 2017, Washington, DC, USA

4.2 Intersection with user-defined contracts is given on Figure 15. Decomposing using the shared state A natural - but naive - implementation of intersection con- approach, we end up with: tracts could be the following: to apply a contract A ∩ B, ap- (( fun x ⇒ x) | (Str → Str)[l]) | C[l] ply both contracts A and B sequentially, resulting in the naive Stateful decomposition decomposition rule of Figure 14. where l represents the shared state. At this point, apply- ing the C contract results in evaluating: M | A ∩ B ≃ (M | A) | B (( fun x ⇒ x) | (Str → Str)[l]) 0 == 0. Naive decomposition Applying a function wrapped in a Str → Str contract to (A → B) ∩ (C → D) ≃ (A ∩ C) → (B ∩ D) 0 fails negatively. This is not the expected behavior, since the Exchange law identity function does respect semantically both contracts. ✞ ☎ As opposed to built-in higher-order contracts, user-defined 1 let g | Num → Num ∩ Str → Str contracts are black-box from the interpreter’s point of view, 2 = fun x ⇒ x in and it is thus not obvious how to extend the shared state 3 g 1 approach to handle user-defined contracts. ✝ ✆ ✞ ☎ Overloaded identity 1 let c = contracts.fromPred ( fun f ⇒ 2 f 0 == 0) in let Figure 14. Naive implementation of intersection 3 g | (Str → Str) ∩ c 4 = fun x ⇒ x 5 in g 0 This intuition works for simple contracts: checking that ✝ ✆ x | Natural ∩ Odd amounts to check that x | Natural and x | Odd. Unfortunately, this doesn’t scale to higher-order Figure 15. intersection and user defined contracts contracts. The overloaded identity example of Figure 14 il- lustrates the use of an intersection to model a simple over- Once again, intersection contracts introduce side-effects loading of the identity function. If we were to apply the in the picture. What’s more, these side-effects interact with naive decomposition, the argument 1 would fail the Str → user-defined contracts in a non-trivial way, while they are Str contract and abort the execution. Perhaps the exchange an important feature for validation. rule given in Figure 14, which is a direct consequence of the naive decomposition, illustrates the issue better. It is clear 5 Pragmatic trade-offs that this exchange law isn’t the right semantics for over- Despite the difficulties of Section 4, union and intersection loading. With this law, the contract for overloaded identity contracts are still sought after. In this section we turn to of Figure 14 would always fail because no argument can sat- existing systems with union and intersection contracts in isfy Num ∩ Str. the literature and in implementations. In a higher-order intersection contract, blame is raised These systems all make trade-offs, sacrificing some fea- when: tures of union and intersection contracts to preserve lan- Faulty caller The argument fails both components. guage features. We survey and discuss those trade-offs and Faulty implementation The function fails at least one their implications. component that the argument previously satisfied. 5.1 A coinductive semantics To fix the naive implementation, the interpreter can share state between the sub-contracts, in order to decide if blame In order to give a precise definition to what values ought must be raised or not when a sub-contract fails: to satisfy union and intersection contracts, [14] give a coin- ductively defined semantics inspired by union and intersec- x | A ∩ B ≃ (x | A[l]) | B[l] tion type systems. The key innovation of [14] is recognizing Shared state is represented by the label l. This is in essence that giving a semantics to higher-order contracts requires the approach proposed by Williams, Morris, and Wadler in [22]. defining not only what values satisfy a contract, but also However, their approach has a major drawback: it is not what contexts satisfy the contract. This models the situation straightforwardly compatible with user-defined contracts (in- where context may violate a contract by calling a function troduced in Section 2.1). The issue is similar to our initial with an inappropriate argument. issue with higher-order contracts and the naive decomposi- Concretely, given a contract, [14] introduce the two sets tion: user-defined contracts may apply functions and thus ÈÉ+ and ÈÉ− of values and contexts, respectively, satisfy- make a sub-contract of the intersection fail, but this fail- ing the contract. They are defined by mutual induction and ure shouldn’t always result in raising blame. An example coinduction. Conference’17, July 2017, Washington, DC, USA Teodoro Freund, Yann Hamdaoui, and Arnaud Spiwack

This semantics has limited support for overloading. Con- can be an issue with user-defined contracts which may in- sider the example in Figure 16: it could evaluate to the pair clude costly tests. (1, 1). But the coinductive semantics of [14] rejects it as Efficiency is also affected another way: each time a func- a contract violation. This can be phrased pithily as the fact tion with a contract attached is applied, the whole context that the coinductive semantics doesn’t validate the property must be traversed to check for a compatibility property. A → B ∩ A → C≃A → (B ∩ C). The algorithmic system of [14] is, on balance, a techni- ✞ ☎ cally impressive realization of the coinductive semantics, though 1 let f = fun x y ⇒ x in it is fairly complex and probably difficult to implement effi- 2 let g = f | (Num → Num → Num) ciently. 3 ∩ (Num → Bool → Num) in 4 let h=g1 in 5.3 Monitoring properties 5 (h 1, h true ) Another realization of the coinductive semantics described ✝ ✆ in Section 5.1, is [22], which aims at simplifying the algorith- mic system proposed by [14] and described in Section 5.2. Figure 16. Intersection contracts don’t distribute A key ingredient of [22] is to disallow user-defined con- tracts. This choice gives the authors more freedom in the Note that if the last two lines of Figure 16 had read ( quest of a more uniform operational semantics. This is sen- g 1 1, g 1 true) instead, then the coinductive semantics sible trade-off in the context of à la Type- would accept the example. It implies that under the coin- Script: the problem is to match contracts with static types, ductive semantics, common-subexpression elimination (see and user-defined contracts don’t have a static type equiv- Section 2.2) is quite perilous. alent. On the other hand, the cost would probably not be worth it for a configuration language like Nickel. 5.2 A first realization As a means of proving the correctness of their simplified In Section 4, we’ve seen that different calls to a function system, the authors introduce what they call sound moni- with a union contract must share information: the behavior toring properties. Here is the sound monitoring property for of one call is influenced by the previous ones, as the function contexts of intersection contracts: must pick one component of the union to satisfy across all usages. Conversely, following the semantics of overloading, ∈È ∩ É− 85 ∈ÈÉ− ∨ ∈ÈÉ− each application of a function with an intersection contract This reads as: a context satisfies the intersection of  can select a different branch and is thus independent from and  if it satisfies at least one of the two. Morally, the s in the others. A general contract composed of nested unions, È ∩ É− should be the ones that can have their hole filled intersections and higher-order contracts appears to require with a term satisfying  ∩  without violating the  ∩  complex book-keeping in order to correctly raise blame. contract. All of this still holds true of the coinductive semantics Although sound, this interpretation is weaker than what described in Section 5.1. Nevertheless, [14] gives an algo- the coinductive semantics permits. Consider the two con- rithmic system which is complete for their coinductive se- texts presented on Figure 17. The first one is a context satis- mantics. In a remarkable technical tour de force, their algo- fying Num → Num, applying the hole to a number. Similarly, rithmic system is compatible with user-defined contract (see the second context from the same figure satisfies Bool → Section 4.2). Bool. A key aspect of the approach of [14] is to rewrite nested ✞ ☎ union and intersection contracts into a disjunctive normal 1  3 form using the De Morgan’s law A ∩ (B ∪ C)≃(A ∩ B) ∪ ✝ ✆ (A ∩ C). The goal is to be able to delay the choice of branch Num → Num context in intersections as much as possible. ✞ ☎ To implement contract verification, [14] resorts to spe- 1  true cific reduction rules for unions and intersections which per- ✝ ✆ Bool → Bool context form this rewriting on the fly. This aspect is critiqued in [22]: “the monitoring semantics for contracts of intersection and union types given by Keil and Thiemann are not uniform. Figure 17. Two different contexts in Nickel (...) If uniformity helps composition, then special cases can hinder composition.” Now, combining these two context as in Figure 18 gives A cost of this approach is that the De Morgan’s law A ∩ a context that doesn’t satisfy Num → Num nor Bool → Bool. (B ∪ C)≃(A ∩ B) ∪ (A ∩ C) duplicates contract A, which According to the sound monitoring property of intersection, will cause some contracts to be checked several times. This Figure 18 thus doesn’t satisfy Num → Num ∩ Bool → Bool. Union and intersection contracts are hard, actually Conference’17, July 2017, Washington, DC, USA ✞ ☎ ✞ ☎ 1 let f =  in 1 (define/contract united 2 (f 3, f true ) 2 (or /c (→ number? number?) ✝ ✆3 (→ even? even?)) 4 (lambda (x) x)) Figure 18. Combined context ✝ ✆

Figure 21. Rejected use of or/c with higher-order contracts The consequence is that [22] doesn’t prove their system complete for the coinductive semantics. It’s probably just an oversight in the proof: we believe their system to be indeed case->. To compensate for the fact that and/c doesn’t sup- complete for the coinductive semantics. But it does speak to port overloading, Racket provides a second intersection-like the intrinsic complexity of union and intersection contracts: combinator: case→. As for the or/c, the candidate contracts it is remarkably easy to get details wrong. This difficulty must have distinct arities to avoid ambiguity, unlike or/c the contrasts with the standard framework of higher-order con- alternative chosen when the function is called rather than tracts where satisfaction is much more straightforward. when the contract is applied to the function. An example is provided in Figure 22. The resulting possibilities are sim- 5.4 Racket ilar to static overloading, where one function can take ad- Racket is a language based on the Scheme dialect of Lisp. ditional parameters for example (e.g. as supported for Java Among established languages, Racket is probably the one methods). On the other hand, it excludes the overloading of with the most comprehensive contract system [1]. Regard- generic operations with fixed arity such as equality, compar- ing union and intersections, Racket provides the and/c and ison, arithmetic operators, and so on. or/c combinators for contract. ✞ ☎ The intersection combinator and/c corresponds to the naive 1 ( define/contract overcase interpretation of intersection described in Section 4.2: ap- 2 ( case→ (→ string? string?) plying contract (and/c A B) is like applying contract A then 3 (→ number? number? number?) 4 ) applying contract B. In particular and/c doesn’t model over- 5 ( lambda (x [y 0]) (if (number? x) loading. As in Figure 14, the example given in Figure 19 al- 6 (+ x y) number? ways fail because no argument satisfies both and 7 x))) string?. 8 ✞ ☎9 (overcase 1 2) 1 ( define/contract overload 10 2 ( and/c (→ number? number?) 11 (overcase "hello") 3 (→ string? string?)) ✝ ✆ 4 ( lambda (x) x)) ✝ ✆ Figure 22. Overloading with the case→ combinator

and/c Figure 19. and overloading In conclusion, Racket, a programming language with a large user base, avoids the difficulties of general unions and The union combinator or/c is similarly simple. It must intersections (Section 4). They make the pragmatic choice be able to decide immediately which branch holds: or/c is of a simple semantics with limited support for higher-order a simple Boolean disjunction. For instance, when higher- contracts in intersections and unions. order contracts are combined using or/c, Racket imposes that contracts must be distinguishable by their arity. Do- 6 (More) Related work ing so, there is at most one candidate that can be selected 6.1 Higher-order contracts directly. This is illustrated in Figure 20 whose program is accepted. On the other hand, the program of Figure 21 is Enforcing pre- and post-conditions at runtime is a widely rejected. established practice. In their foundational paper[11], Findler and Felleisen introduce higher-order contracts, a principled ✞ ☎ approach to run-time assertion checking that nicely sup- 1 ( define/contract united 2 ( or/c (→ number? number?) ports functions. They introduce the notion of blame, which 3 (→ string? string? string?)) is crucial to good error reporting. It became apparent later 4 ( lambda (x) x)) that their contracts are closely related to the type casts in- ✝ ✆ troduced by gradual typing, excluding blame: both [15] and [19] see the value of contracts as a safe interface between Figure 20. Accepted use of or/c with higher-order contracts typed and untyped code. In [21], the authors precisely in- troduce a system integrating gradual typing with contracts Conference’17, July 2017, Washington, DC, USA Teodoro Freund, Yann Hamdaoui, and Arnaud Spiwack

à la Findler & Felleisen. Nickel adopts a similar , there are gaps between the theoretical foundation, a proof- with both statically typed terms, dynamically typed terms, of-concept, a prototype, and the integration in an actual lan- and first-class contracts. Higher-order contracts are the ba- guage. We hope that our attempt may serve as a cautionary sis of the work [14] and [22] that this paper explorted exten- tale: for union and intersection contracts, these gaps may be sively. larger than they appear.

References 6.2 Unions and intersections in gradual typing [1] [n.d.]. Contracts - Racket Documentation. Castagna et. al [10] introduce a gradual type system based hps://docs.racket-lang.org/reference/contracts.html on so-called -theoretic types [12]. Set-theoretic types fea- [2] [n.d.]. The CUE Configuration Language. hps://cuelang.org/ ture unions, intersections, negation types together with a [3] [n.d.]. Design by Contract and Assertions - Eiffel. notion of . This work adheres to the static first hps://www.eiffel.org/doc/solutions/Design_by_Contract_and_Assertions [4] [n.d.]. The Dhall configuration language. hps://dhall-lang.org/ school (see [13]) of gradual typing: the reason for gradual [5] [n.d.]. Jsonnet - The Data Templating Language. hps://jsonnet.org/ types is to allow for a less precise type information. It fol- [6] [n.d.]. The Nickel repository on GitHub. lows that their goals and constraints are somehow differ- hps://github.com/tweag/nickel ent, resulting in the absence of first-class contracts. As for [7] [n.d.]. Optional types and the None type. any gradual type system, they do have to implement casts – hps://mypy.readthedocs.io/en/latest/kinds_of_types.html#optional-types-and-the-none-type [8] [n.d.]. Repology package tracker. that are very close to contracts – for unions and intersec- hps://repology.org/repositories/statistics/total tions, but these casts are neither visible nor available to the [9] [n.d.]. Union Types in TypeScript, TypeScript handbook. programmer. Obviously, no contracts mean no user-defined hps://www.typescriptlang.org/docs/handbook/2/everyday-types.html#union-types contracts as well. [10] Giuseppe Castagna, Victor Lanvin, Tommaso Petrucciani, and Jeremy G. Siek. 2019. Gradual Typing: A New Perspective. Proc. ACM Program. Lang. 3, POPL, Article 16 (Jan. 2019), 32 pages. hps://doi.org/10.1145/3290329 7 Conclusion [11] Robert Bruce Findler and Matthias Felleisen. 2002. Contracts for Despite the fact that union and intersection of dynamic prop- Higher-Order Functions. SIGPLAN Not. 37, 9 (Sept. 2002), 48–59. erties may at first appear like an easy task, as soon as they hps://doi.org/10.1145/583852.581484 [12] Alain Frisch, Giuseppe Castagna, and Véronique Benzaken. 2008. are combined with higher-order contracts for increased ac- Semantic subtyping: Dealing set-theoretically with function, curacy of error messages, they really aren’t. union, intersection, and negation types. J. ACM 55 (09 2008). The problem of union and intersection contracts is that hps://doi.org/10.1145/1391289.1391293 they are not orthogonal to apparently independent features [13] Michael Greenberg. 2019. The Dynamic Practice and Static of programming languages. The mere presence of intersec- Theory of Gradual Typing. In 3rd Summit on Advances in Pro- gramming Languages (SNAPL 2019) (Leibniz International Pro- tion and union contract induces computational effects, and ceedings in Informatics (LIPIcs), Vol. 136), Benjamin S. Lerner, can make it quite difficult to perform simple program opti- Rastislav Bodík, and Shriram Krishnamurthi (Eds.). Schloss Dagstuhl– mizations such as inlining. Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 6:1–6:20. Designing a language with union and intersection con- hps://doi.org/10.4230/LIPIcs.SNAPL.2019.6 tracts means making difficult choices: some features of union [14] Matthias Keil and Peter Thiemann. 2015. Blame Assignment for Higher-Order Contracts with Intersection and Union. In Proceed- and intersection contracts and of the rest of the language ings of the 20th ACM SIGPLAN International Conference on Func- must be abandoned. Various trade-offs can be made, but it is tional Programming (Vancouver, BC, Canada) (ICFP 2015). Asso- worth mentioning that implementing a system with union ciation for Computing Machinery, New York, NY, USA, 375–386. and intersection contracts appears to be pretty complex a hps://doi.org/10.1145/2784731.2784737 task when unions and intersections are fairly complete. [15] Jacob Matthews and Robert Bruce Findler. 2007. Operational Semantics for Multi-Language Programs. In Proceedings of the It’s hard not to have sympathy for the minimalist end of 34th Annual ACM SIGPLAN-SIGACT Symposium on Principles this spectrum, where, like Racket, a language only has a of Programming Languages (Nice, France) (POPL ’07). Associ- very simple notion of unions and intersections. Many ap- ation for Computing Machinery, New York, NY, USA, 3–10. plications of unions and intersections are not possible in hps://doi.org/10.1145/1190216.1190220 such a context, but the presence of union and intersection [16] Bertrand Meyer. 1987. Eiffel: programming for reusability and ex- tendibility. ACM Sigplan Notices 22, 2 (1987), 85–94. contracts doesn’t interact with the rest of the language. It’s [17] Francisco Ortin and Miguel García. 2011. Union and intersection probably more manageable. types to support both dynamic and static typing. Inform. Process. Lett. To conclude, let us make clear that we do not think union 111, 6 (2011), 278–286. hps://doi.org/10.1016/j.ipl.2010.12.006 and intersection contracts are fundamentally broken, that [18] Simon Peyton Jones, Will Partain, and André Santos. 1996. Let- they can not be implemented correctly, or that they do not Floating: Moving Bindings to Give Faster Programs. In Proceed- ings of the First ACM SIGPLAN International Conference on Func- bear any value (quite the contrary). They may still make tional Programming (Philadelphia, Pennsylvania, USA) (ICFP ’96). sense to have in a language, and some apparent difficulties Association for Computing Machinery, New York, NY, USA, 1–12. in the implementation could be lifted some day. But as often, hps://doi.org/10.1145/232627.232630 Union and intersection contracts are hard, actually Conference’17, July 2017, Washington, DC, USA

[19] Sam Tobin-Hochstadt and Matthias Felleisen. 2006. Interlanguage [21] Philip Wadler and Robert Bruce Findler. 2009. Well-Typed Programs Migration: From Scripts to Programs. In Companion to the 21st ACM Can’t Be Blamed. In Programming Languages and Systems, Giuseppe SIGPLAN Symposium on Object-Oriented Programming Systems, Lan- Castagna (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 1–16. guages, and Applications (Portland, Oregon, USA) (OOPSLA ’06). As- [22] Jack Williams, J. Garrett Morris, and Philip Wadler. 2018. The Root sociation for Computing Machinery, New York, NY, USA, 964–974. Cause of Blame: Contracts for Intersection and Union Types. Proc. hps://doi.org/10.1145/1176617.1176755 ACM Program. Lang. 2, OOPSLA, Article 134 (Oct. 2018), 29 pages. [20] Matías Toro and Éric Tanter. 2017. A Gradual Interpretation of Union hps://doi.org/10.1145/3276504 Types. 382–404. hps://doi.org/10.1007/978-3-319-66706-5_19