Union and Intersection Contracts Are Hard, Actually

Union and intersection contracts are hard, actually Teodoro Freund Yann Hamdaoui Arnaud Spiwack Universidad de Buenos Aires Tweag Tweag Buenos Aires, Argentina Paris, France Paris, France [email protected] [email protected] [email protected] Abstract the configuration of an application. In traditional configura- Union and intersection types are a staple of gradually typed tion languages, such as YAML, TOML, or JSON, the config- language such as TypeScript. While it’s long been recog- uration is fully, and explicitly, spelt out. nized that union and intersection types are difficult to ver- However, with the advent of DevOps, configurations have ify statically, it may appear at first that the dynamic part of been extended to describe the entire state of a computer, or gradual typing is actually pretty simple. even a fleet of computers. For instance, with Kubernetes you It turns out however, that in presence of higher-order con- needto configure a large fleet of (possibly replicated) docker tracts union and intersection are deceptively difficult. The containers. To describe this sort of configurations, you re- literature on higher-order contracts with union and inter- ally want to be able to re-use and abstract parts of the con- section, while keenly aware of the fact, doesn’t really ex- figuration, like traditional programming languages do. To plain why. We point and illustrate the problems and trade- meet this need, languages such as Cue [2], Dhall [4], Json- offs inherent to union and intersection contracts, via exam- net [5], or Nickel [6], where configurations are generated ple and a survey of the literature. rather than spelt out, were created. Another example is continuous integration systems: it’s CCS Concepts: • General and reference → Surveys and fairly typical to need a matrix of jobs, wherein the same tests overviews; • Software and its engineering → Language are run on different infrastructures, or with different ver- features; Software verification and validation. sions of a compiler. Traditional configuration would have Keywords: contracts, higher-order contracts, union, inter- you copy the same steps for each infrastructure. This is te- section dious, hard to maintain, and error prone. It’s much better, instead, to write the steps once, and instantiate them for 1 Introduction each infrastructure. Continuous integration systems typi- cally do this using a templating system layered on top of Union types, meaning a type A ∪ B containing values which YAML. Each of the configuration-generating languages above belong either to a type A or B, are a popular tool when adding static types to a dynamic language. In particular, both Type- allow such job-matrix definition natively. Script [9] and MyPy [7], use union types to model the fre- 1.2 Nickel quent practice to use the value null (None in Python) to rep- resent an absent optional value. This is why the gradual typ- In this article, we will use the Nickel language [6] as illus- ing literature, concerned with formalising the interplay be- tration and motivation. At its core, Nickel is the JSON data tween static and dynamic type systems, has been quite in- model, augmented with abstraction mechanisms, and its con- terested in union types [10, 14, 17, 20, 22]. formed of: On the other hand, unions are not a common feature of • dictionaries, written as: arXiv:2106.06278v1 [cs.PL] 11 Jun 2021 static type systems, mostly because they are quite difficult {field1 = value1,..., fieldn = valuen} to verify statically. So unions are really only worth it in grad- • arrays: ually typed language where they formalise existing dynami- cally typed patterns. On the other hand, surely, for dynamic [x1, x2,..., xn] tests, unions are really easy: it is simply the Boolean disjunc- • functions: tions of two tests. fun Unfortunately, as we document in this article, as soon as arg1. .argn ⇒ body you extend dynamic checksto contracts [11], unions become • and let-definitions: actually pretty difficult, and threaten desirable properties of let id = value in exp your language. A Nickel configuration is then evaluated to an explicit 1.1 Configuration languages configuration, e.g. in JSON, which can then be consumed To motivate contracts and the problem caused by unions, by an application. Therefore a design constraint of Nickel is let’s make a detour through configuration languages. A con- any Nickel data must have a straightforward interpretation figuration language is a language concerned with describing in JSON. Conference’17, July 2017, Washington, DC, USA Teodoro Freund, Yann Hamdaoui, and Arnaud Spiwack 1.3 Contracts Unfortunately, the delayed check of contract, while essen- A useful feature of a configuration language is to provide fa- tial to ensuring that schema validation doesn’t affect per- cilities for schema validation. That is, help answer questions formance (or, indeed, is possible at all on functions), make like: does our configuration have all the required fields? does union contracts (and their less appreciated sibling, intersec- the url field indeed contains a URL? tion contracts) quite problematic. These are inherently dynamic questions, as they are all questions about the evaluated configuration. To this effect, 1.4 Contributions Nickel lets us annotate any expression with a dynamic schema Our contributions are as follows check: exp | C. There is also syntactic sugar to annotate def- • We describe the fundamental difficulties caused by pres- initions: let id | C = value in exp stands for let id = ( ence of union and intersection contracts in a language, value | C) in exp. which are kept implicit in the literature (Section 4) Let us pause for a moment and consider the following: • We survey the various trade-offs which appear in im- it is Nickel’s ambition to be able to manipulate configura- plemented languages and in the literature to work around tions like Nixpkgs. With over 50 000 packages, it is one of these difficulties (Section 5) the largest repository of software packages in existence [8]. Concretely, Nixpkgs is a dictionary mapping packages to 2 A typology of language features build recipes. That is, a massive, over-50 000-key-value-pair Union contracts are not only difficult to implement, their wide dictionary. It is absolutely out of the question to eval- unrestricted presence is incompatible with potentially desir- uate the entirety of this dictionary every time one needs to able properties of the language. In this section we present install 10 new packages: this would result in a painfully slow some of these properties; we will show how these properties experience. interact with union contracts in Sections 4 and 5. To be able to support such large dictionaries, Nickel’s dictionaries are lazy, that is, the values are only evaluated when 2.1 User-defined contracts explicitly required. For instance, when writing nixpkgs.hello, only the hello package gets evaluated. A strength of dynamic checking is that we can easily check But let’s consider now writing something like nixpkgs | properties which are impractical to check statically. For in- packages, to guarantee that all the packages conform to the stance that a string represents a well-formed URL, or a num- desired schema. If this were a simple Boolean test, it would ber is a valid port. have to evaluate all 50 000 package to check their validity, This same property is desirable of contracts as well, oth- hence breaking the laziness of dictionaries. Do we have to erwise we lose an important benefit of dynamic checking. choose between laziness and schema validation? Fortunately, Preferably, we want to be able to extend the universe of con- we don’t! Enter contracts [11]: dynamic checks which can tracts with user-defined predicates. be partially delayed, yet errors can be reported accurately. For instance, Figure 2 shows the definition of a contract Contracts can respect laziness of dictionaries, and they can for valid ports in Nickel syntax. User-defined contracts can be use to add schema validation to functions as well (in fact be combined with other contracts normally: Int → Port is functions were the original motivation for contracts). a contract verified by functions which, given an integer re- ThereisnoBooleanfunctionwhichcancheckthatavalue turns a valid port. has type Str → Str.Instead,acontractfor Str → Str checks This type of contracts are present in many different lan- for each call of the function whether guages, for instance, the Eiffel programming language[16], the precursor of the Design by Contract philosophy, makes 1. the argument has type Str, otherwise the caller of the it possible to assert these kinds of expression as pre- and function is faulty post-conditions on functions and as invariants on classes[3]. 2. if so, that the returned value has type Str, otherwise The Racket programming language also has a system to the implementation of the function is faulty work with contracts, powerful enough to define flat con- Like in the case of lazy dictionaries, the checks are de- tracts, and to compose them with other kinds of dynamic layed. Contracts keep track of whether the caller or the im- checks, like higher order contracts or a lightweight take on plementation is at fault for a violation, hence it can report union and intersection contracts[1]. precise error messages. Contracts are said to blame either the caller or the implementation. Compare Figure 1a and Fig- 2.2 Referential transparency ure 1b: in Figure 1a an error is reported inside the catHosts The performance of modern programs heavily relies on the function, but catHosts is, in fact, correct, as is made clear by optimizations performed by the compiler or the interpreter. Figure 1b, where catHosts is decorated with the Str → Str Even more so for functional languages, whose execution contract, and correctly reports that the caller failed to call model is often far removed from the hardware, causing naive catHosts with a string argument.

Load more