
Linköping University | Department of Computer and Information Science Master thesis, 30 ECTS | Computer Science 2017 | LIU-IDA/LITH-EX-A--17/043--SE

Evaluating Spec

Utvärdering av Clojure Spec

Christian Luckey

Supervisor: Bernhard Thiele
Examiner: Christoph Kessler

External supervisor: Rasmus Svensson

Linköpings universitet SE–581 83 Linköping +46 13 28 10 00, www.liu.se

Upphovsrätt

Detta dokument hålls tillgängligt på Internet – eller dess framtida ersättare – under 25 år från publiceringsdatum under förutsättning att inga extraordinära omständigheter uppstår. Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner, skriva ut enstaka kopior för enskilt bruk och att använda det oförändrat för ickekommersiell forskning och för undervisning. Överföring av upphovsrätten vid en senare tidpunkt kan inte upphäva detta tillstånd. All annan användning av dokumentet kräver upphovsmannens medgivande. För att garantera äktheten, säkerheten och tillgängligheten finns lösningar av teknisk och administrativ art. Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman i den omfattning som god sed kräver vid användning av dokumentet på ovan beskrivna sätt samt skydd mot att dokumentet ändras eller presenteras i sådan form eller i sådant sammanhang som är kränkande för upphovsmannens litterära eller konstnärliga anseende eller egenart. För ytterligare information om Linköping University Electronic Press se förlagets hemsida http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Christian Luckey

Abstract

The objective of this thesis is to evaluate whether or not Clojure Spec meets the goals it sets out to meet with regards to easy data validation, performance and automatically generated tests in comparison to existing specification systems in the Clojure ecosystem.

A specification for a real-world data format was implemented in the three currently popular specification systems used in Clojure. They were then compared on their merits in terms of performance, code size and additional capabilities.

The results show that Spec shines with complex data, both in expressivity and validation performance, but has an API more complex than its competitors'. For sufficiently complex use cases, where expressing regular data structures and generative testing is desired, the time investment of learning Spec pays off; in simpler situations an assertions library like Truss can be recommended.

I want to thank my mother and father, without whom I wouldn't be here.

Contents

Abstract

Acknowledgments

Contents

List of Figures

List of Tables

1 Introduction
   1.1 The objectives of Spec
   1.2 Aim
   1.3 Questions
   1.4 Scope

2 Background
   2.1 Definitions
   2.2 Code complexity and quality
   2.3 Related work
   2.4 Introduction to Clojure
   2.5 Introduction to Spec, Schema and Truss

3 Method
   3.1 Pre-study
   3.2 Data selection
   3.3 Filtering test data
   3.4 Writing specifications
   3.5 Benchmarking
   3.6 Effort reduction
   3.7 Edge cases

4 Results
   4.1 Effort reduction
   4.2 Performance
   4.3 Edge cases
   4.4 Criteria comparison
   4.5 Generating data from Clojure Spec

5 Discussion
   5.1 Method
   5.2 Results
   5.3 Writing specifications
   5.4 In a wider context

6 Conclusion

Bibliography

A Statistical results from generating data with Spec

B Validation time broken down by keyword per system

C Validation time broken down by keyword grouped by system

List of Figures

4.1 Project file validation time summary
4.2 Validation time broken down by whether the tested data is valid or not

5.1 Validation time comparison for a boolean value in a map
5.2 Validation time for a dependency vector
5.3 Validation time for a value with multiple options

B.1 Validation time broken down by keyword for Truss
B.2 Validation time broken down by keyword for Spec
B.3 Validation time broken down by keyword for Schema
B.4 Validation time broken down by keyword for plain Clojure validation

C.1 Validation time by all systems for keys :aliases to :exclusions
C.2 Validation time by all systems for keys :filespecs to :main
C.3 Validation time by all systems for keys :managed-dependencies to :release-tasks
C.4 Validation time by all systems for keys :repl-options to :warn-on-reflection

List of Tables

3.1 Downloads from Clojars per library
3.2 Number of projects on Clojars that depend on the given library

4.1 SLOC per specification implementation
4.2 Statistical measures in ms for validation of project files
4.3 Criteria comparison: X means full, d partial and no support
4.4 Generation time in milliseconds from spec

5.1 Feature comparison: X means full, d partial and no support

1 Introduction

Clojure Spec is an upcoming standard library for the Clojure programming language [48] for specifying the functionality of programs and the nature of data. It allows the programmer to describe the structure and contents of data held in any combination of Clojure's data structures, as well as of the data given to and returned from functions and macros. It also allows the relation between the data given to and returned from a function or macro to be expressed. These specifications can then be used for data validation, higher level parsing and generative testing, as well as to provide improved documentation and error messaging. This work seeks to evaluate Clojure Spec in comparison to other competing specification systems as well as to plain Clojure code.

1.1 The objectives of Spec

Introduced in May 2016, but as of July 2017 yet to see a stable release, Spec seeks [10] to fill a number of gaps in the Clojure ecosystem by:

• Documenting functions, macros, keywords, lists, arrays, maps and sets for both programmatic and human consumption.
• Reporting errors1 on the parsing and destructuring of data.
• Providing run time data validation.
• Automatic destructuring and parsing.
• Generating property based tests.

1Error messages being hard to understand has been the prime reason given by potential users of Clojure as to why they are not currently using Clojure, according to the Clojure survey. [44] [45]


• Generating test data for these tests.

Some of the terms used above are described further in the Definitions section. The specifications themselves are available at application run time, including test time. They are not intended as mathematical proofs like those which type systems provide [10], although such applications exist2; instead the intention is to provide an environment where arbitrary validation of run time data is not only possible but easy to perform.
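To give a flavour of what such run time validation looks like, here is a minimal sketch (using the clojure.spec.alpha namespace of later alpha releases; the ::age spec is this text's own example, not from Spec itself):

```clojure
(require '[clojure.spec.alpha :as s])

;; A spec is registered under a namespaced keyword...
(s/def ::age (s/and int? pos?))

;; ...and can then be used for arbitrary run time validation.
(s/valid? ::age 30) ; => true
(s/valid? ::age -1) ; => false
```

The same registered spec can later drive parsing, documentation and test data generation, which is the reuse this chapter alludes to.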

1.2 Aim

The aim of this thesis is to evaluate whether Clojure Spec succeeds in achieving some of the goals outlined in section 1.1. We do this partly through a measurement of the time consumed in validating a real-world data set, but also through measures of code quality and a criteria based evaluation.

1.2.1 Classifying this study

Stol and Fitzgerald write in their "Holistic Overview of Software Engineering Research Strategies" [46] that terminology is a challenge in research methodology and that there is no commonly adopted taxonomy to describe it. Nevertheless, some key words as defined in their paper describe the approach of this study. This thesis describes primary, quantitative and qualitative desk research. It is a field study: an exploratory case study. The target of the study is the common ways of implementing specifications in the Clojure ecosystem.

1.3 Questions

The question asked in this paper is: To what degree and at what cost are the goals of Spec achieved? Specifically:

(1) How does the code of Spec specifications compare to equivalent code in competing systems, both in terms of plain SLOC and convenience of expression?
(2) How does Spec perform in real-world benchmarks compared to competing systems?
(3) To which degree does Spec expose issues in data or functions not found by competing systems, if at all?

The competing systems to Clojure Spec were deemed to be the existing data specification or assertion systems Schema [39] and Truss [5], as well as not using any library at all. A specification for the project declaration file of the highly popular build, project and dependency management tool Leiningen was produced using each system; the implementations were then compared to each other with regards to the research questions. A comparison was also made using the criteria of related work.

2The third party library Spectrum [43] uses Spec as the basis for a type system.


1.3.1 Formulating research questions

As Kitchenham et al. [28] write, a well-formulated research question should have three parts, with the focus on the first two.

1. A study factor, i.e. the technology. In this case Clojure Spec.
2. The population, i.e. the samples of code.
3. The outcomes.

They express how the technological abstraction should be neither too high nor too low, and that "For some questions it may be necessary to be even more precise e.g. Contract-based specifications . . . ", which is precisely what this paper seeks to study.

1.4 Scope

With so many aspects to Spec, there were many different approaches which were explored but not delved into further during the thesis project.

1. By how much do the improved3 error messages given by Spec increase the speed at which a developer is able to fix the erring code? If there are two versions of a function, one annotated with Spec and one not: which function's error message helps in correcting the incorrect usage of the function faster? Does this change between different developer groups, for example newcomers vs. experienced clojurists? Does the complexity of the function or its arguments have an impact? Preliminary research was done on the subject, a method and theory chapter was formulated, and the task of recruiting test subjects was initiated. The work was soon terminated, however, based on advice from core Clojure developers [38] who relayed that the error messaging of Spec was still subject to large changes, and that there was a good chance of any results being invalidated by a new release.

2. To which degree does Spec expose edge cases not found by previous defensive programming, if at all? If a library is annotated with Spec and the generated tests run for the library, how many new bugs are found in the library?

3Improved especially in terms of exactness. While a common Clojure error message reads

ClassCastException java.lang.String cannot be cast to clojure.lang.IFn

an error given by Spec for the same issue could read

Call to #'ns/bar did not conform to spec:
val: ("Some string") fails at: [:args :function] predicate: fn?
:clojure.spec/args (1)
:clojure.spec/failure :instrument
:clojure.spec.test/caller {:file "lekplats.clj", :line 20, :var-scope ns/fn1}


This question, while interesting, would reasonably end up either being a case study of its own or too large time-wise, considering how deeply a developer has to understand how a program works in order to correctly specify the different kinds of data flowing through it.

3. How well does Spec perform as a parser? Is it possible to express any regular language in Spec? How performant is it compared to a hand written parser? Given that Clojure Spec has yet to be performance tuned [37], but foremost that language parsing in the sense of parsing a string is rather the work of a parser such as Instaparse [27], it made sense to not delve further into this question.

4. How much specification code is needed to cover the entire input of a function? Is there a limit to how large a specification can be in order for it to stay performant? Is there a sweet spot between these two? The answer to this question, it was reasoned, would vary so widely between cases that it would be hard to formulate a scientific method yielding any result other than: "It depends."

2 Background

This chapter goes over the theoretical underpinnings of this thesis, starting with a short dictionary of terms and concepts. Following that are two short sections on how this paper fits into the world of computer science, then theory and background sections on code quality and finding edge cases, after which the reader will find a section on related work. Finally this chapter brings the reader an introduction to Clojure and the specification systems dealt with in this paper.

2.1 Definitions

This section defines field specific nomenclature that the reader may find useful when reading this paper.

• Application Programming Interface (API), used in reference to the functions exposed by a library and the arguments they take together with the values they return. • Artifact, a Maven [30] concept for a Java archive which is uniquely identified by a sequence of a group-id and a name followed by a version string. In Clojure these are expressed as [group-id/name "1.0.0"]. • Derivative, the derivative of a language L with respect to a character c is a new language that has been:

1. Filtered to only contain words beginning with the character c.
2. Had the character c cut from the beginning of every word. [7] [33]

The precise name for this type of derivation is Brzozowski derivation.


• Destructuring, to destructure means to break down some structure into smaller components. Examples can be found in the Clojure Guide on Destructuring [15]. To parse a text in a regular language into a data structure can also be seen as a kind of destructuring.
• Domain Specific Language (DSL), a language defined for the specific purpose of solving problems in one domain.
• Example based testing, testing where the tests, including the input and expected output of a function, are written by hand by a programmer.
• Keyword, a specific type in Clojure which only ever represents itself: :apple is only ever evaluated to :apple. Keywords are commonly used as keys in maps and may be used as a function that, given a map, returns the value held under itself in that map.

(:k {:k 5}) ; => 5

A Java programmer may see them as interned1 strings with a function attached.

• Macros are functions which transform a piece of code. [3] These are run at compile time as opposed to normal functions which run at run time.
• Plain Clojure, used to signify code written using only the standard libraries of Clojure.
• Predicate function, a function which takes one or more arguments and returns a boolean value. Predicate functions are usually given names ending with a question mark, such as seq? in Clojure.
• Property based test, a test generated based on specifications, as opposed to an example based test. Sometimes called generative testing. Using property based testing it is possible to find edge cases in code that the author either did not account for or that come out of mistakes [17].
• Regular expression (regex), consists of constants and operator symbols that denote sets of strings and certain operations over these sets, respectively. In this thesis the strict definition of the term regex will be used, i.e. one with no notion of recollection. [24] Nearly all modern implementations2 of regexes allow recollection with parentheses, like (.+)\1 where \1 recalls what was read in the parentheses. This is not allowed in the strict definition of regular expressions.

• Schema with a capital S is used to refer to the library formerly called Prismatic Schema, nowadays Plumatic Schema [39]. Written with a lowercase s, schema is used to refer to a specific schema for X.

1https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#intern() 2In Clojure and this thesis the Java [35] flavour of string-based regex syntax will be used. In it ‘(’ starts a capture group, ‘.’ means any non-newline character, ‘+’ means one or more of the previous character class. This thesis will deal with two syntaxes of regular expressions: The string-based expressions most programmers will find familiar and those formed by Spec’s regex functions.


• Source lines of code (SLOC), the number of lines of code that the program consists of, not counting comments.
• Spec with a capital S is used to refer to the core library Clojure Spec itself. Written with a lowercase s, spec is used to refer to a specific specification: a spec for X.
• Specification, any definition of the characteristics of data, written in code with either Spec, Schema, Truss or plain Clojure.
• Truss with a capital T is used when referring to Peter Taoussanis's library Truss [5]. Written with a lowercase t, a truss refers to a specific piece of code asserting that Y is true for X using Truss.

2.2 Code complexity and quality

A survey of the field of software metrics was performed in search of a metric beyond the obviously simplistic measure of SLOC. The two most cited sets of such metrics come from two separate authors, Maurice H. Halstead [22] and Thomas J. McCabe [31], both claiming a large following in their time.

2.2.1 Halstead complexity measures

Often referred to as Halstead's metrics, Maurice Halstead defined what he would later call "Software Science", starting in his paper [23] with the length measure, which correlated the number of operators and operands in a program with its number of bugs. Continuing in his book Elements of Software Science [22] he defines four measures:

n1 the number of distinct operators in a program,
n2 the number of distinct operands in a program,
N1 the total number of occurrences of operators in a program,
N2 the total number of occurrences of operands in a program;

and the vocabulary of a program as n = n1 + n2 and the length of the program as N = N1 + N2. He also stipulated that the length can be estimated by N̂ = n1 log2 n1 + n2 log2 n2, but only for polished programs [18]. In short this means a higher quality program has an N and N̂ which are closer to one another. But for our purposes the most important metric is that of effort:

    E = (n1 · N2 · (N1 + N2) · log2(n1 + n2)) / (2 · n2).

In Curtis, Sheppard and Milliman's study [14] relating Halstead's metrics of complexity to psychological complexity it was concluded that this metric correlates well with the speed at which a programmer is able to locate and fix a bug in some preexisting piece of code. They also observed that:

“Many small-sized programs can be grasped by the typical programmer as a cognitive gestalt. The psychological complexity of such programs is adequately represented by the volume of the program as indexed by the number of lines. When the code grows beyond a [...] or module, its complexity to the programmer is better assessed by measuring constructs other then [sic] the number of lines of code.”

In other words, metrics such as those Halstead proposes are, he argues, needed to measure the cognitive complexity of real world programs; something which is confirmed by Banker et al., who found significant correlation between software complexity and maintenance cost [6] of large projects.

Coppeick and Cheatham [12] then applied the Halstead metrics to Lisp, defining a function to be the equivalent of an operator and an argument to be the equivalent of an operand, which means that the code (+ (/ a 2) (inc a) 2) yields:

    n1 = 3 (+, / and inc)
    n2 = 2 (a and 2)
    N1 = 3 (+, / and inc)
    N2 = 4 (a, 2, a and 2)

    E = (3 · 4 · (3 + 4) · log2(3 + 2)) / (2 · 2) ≈ 49

But when reading modern literature on the subject of evaluating metrics of software complexity, namely Alain Abran's book Software Metrics and Software Metrology [1], we find it discounts Halstead's metrics as both ill defined and imprecise, to the level that different researchers following different interpretations of his metrics come to differing conclusions for the same code; perhaps precisely as Coppeick and Cheatham have done above. In other words: the results of Curtis, Sheppard and Milliman are not necessarily valid for the interpretation of Halstead's metrics that Coppeick and Cheatham apply, since Curtis et al. use a different interpretation of the metrics; hence it does not follow that we can apply Coppeick and Cheatham's interpretations and draw conclusions from them according to the results of Curtis et al. This leaves us having to repeat the trials of Curtis et al. using Coppeick and Cheatham's definitions before we could, in a scientifically honest way, proceed to use their definitions for the subject of this thesis.
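The effort formula and the worked example above can be reproduced with a small Clojure function; this is a hypothetical helper written for this text, not part of any metrics library:

```clojure
;; Halstead effort: E = (n1 * N2 * (N1 + N2) * log2(n1 + n2)) / (2 * n2)
(defn log2 [x]
  (/ (Math/log x) (Math/log 2)))

(defn halstead-effort [n1 n2 N1 N2]
  (/ (* n1 N2 (+ N1 N2) (log2 (+ n1 n2)))
     (* 2 n2)))

;; The example (+ (/ a 2) (inc a) 2): n1 = 3, n2 = 2, N1 = 3, N2 = 4.
(halstead-effort 3 2 3 4) ; ≈ 48.8, which rounds to the 49 above
```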

2.2.2 Cyclomatic complexity measures

McCabe proposes a different measure of complexity which he names cyclomatic complexity [31], a measure of how many independent paths there are through a program. There exists a Leiningen plugin named Uncomplexor [2] which calculates cyclomatic complexity for Clojure projects that use Leiningen, though it is not the result of scientific research. Without delving further into the subject we note that Stuart Halloway, author of the book Programming Clojure [21], writes that complexity in Clojure code does not come from structurally complex code but rather from using the wrong things [32]. To back up that reasoning, Abran notes that the transposition of cyclomatic complexity from graph theory into the field of software seems almost arbitrary, and that practitioners build their own interpretation of what the number resulting from applying cyclomatic complexity to a software module actually means [1]; this being the result of McCabe never explicitly defining how to interpret the relation between the cyclomatic complexity measurement and software complexity itself.


2.2.3 Conclusions

Based on the background presented in section 2.2.1 and section 2.2.2 it can be concluded that the field of code quality measurement is lackluster at best, and that this thesis is therefore better off sticking with the simple metric of SLOC, leaving it up to the reader how to interpret that number.

2.3 Related work

Nothing has thus far been written about Clojure Spec in a scientific publication. This thesis remedies that and draws upon previous works both in- and outside the scientific community.

2.3.1 On Spec

When it comes to evaluating Spec there exist non-scientific comparisons [19] [40] of Spec and Schema, but none that base their discussion on the specification of real-world data, instead making their cases based on theoretical examples. One performance evaluation of Spec compared to some of its competitors can be found on the web [49], but it is based on validating the same small amount of artificial data thousands of times. This thesis contrasts itself against these in that it takes a complex real-world data set, produces specifications for validating this data and then compares the implementations around the three research questions.

2.3.2 On benchmarking

Kraus and Kestler's "Multi-core parallelization in Clojure: a case study" [29] contains a comparison of two implementations of the same algorithm: one pre-existing implementation and one original implementation in Clojure. The algorithms are run with generated artificial data sets of varying sizes, ranging from 20 to 100 thousand samples. They visualize the execution time using box plots with whiskers, something this thesis will take after as it conveys the most important statistical measures in a manner which is easy to understand.

2.3.3 On evaluating contract systems

In Plösch's work on evaluating contract systems for Java he defines a number of criteria that can be used to describe and rank such libraries [36]. He devises four different levels of support, which the author can boil down to the following when translating them to functional rather than object-oriented programming:

BAS Basic Assertion Support3:

1. Is there support for assertions in the body of a function?

3The paper also speaks about class invariants, a concept which does not apply as there are no classes in Clojure.


2. Are there pre- and post-conditions on functions?

AAS Advanced Assertion Support:

1. Can one define a relation between the input and output of a function? 2. Does the system support assertion expressions on collections? Are these guaranteed to not return a mutated4 collection? 3. Are the assertions guaranteed to be side effect free?

SBS Support for Behavioral Subtyping:

1. Is it possible to specify contracts for interfaces?

RMA Runtime Monitoring of Assertions:

1. Are there mechanisms available for handling broken contracts? 2. Is it possible to disable these checks in production? 3. Is it possible to do some checks selectively?

Except for SBS-1, these criteria will be used to provide additional evaluation of the specification systems. SBS-1 will not be used since inheritance is not a widely used concept in the Clojure world; the lack of support for it in any contract system for Clojure should not come as a surprise to a Clojurist.
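As a concrete illustration of the RMA criteria, Spec's assertion facility can be enabled and disabled at run time. A minimal sketch, using the clojure.spec.alpha API (the ::port spec is this text's own example):

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::port (s/and int? pos?))

;; RMA-2/RMA-3: assertion checking is off by default and can be
;; toggled globally at run time.
(s/check-asserts true)

(s/assert ::port 8080)  ; => 8080, the value flows through when valid
;; (s/assert ::port -1) ; would throw an exception describing the failure
```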

2.4 Introduction to Clojure

Clojure is a modern functional Lisp on the JVM5, but also has official implementations that target the CLR6 and JS7, called Clojure CLR and ClojureScript respectively, the latter often referred to as cljs. Clojure is a modern Lisp in that it implements first class literals not only for lists but also for vectors, maps and sets; in that its core data types, called persistent data structures8, are immutable but do not require the creation of complete copies to facilitate modification, since structure common between the original and the modified data is shared; and in that it has built in primitives for concurrent execution and management of shared state.

4As long as the Clojure collections are used, which are immutable and persistent, it is impossible for any code to mutate a collection; though its functions may return a different collection than the one passed in. 5Java Virtual Machine, a virtual computer specification with multiple implementations for the execution of Java programs. 6Common Language Runtime, a virtual machine by Microsoft for execution of its .net languages. 7JavaScript, standardized in the ECMAScript language specification, implemented by a multitude of VMs including Mozilla SpiderMonkey and Google V8. 8The book Purely Functional Data Structures [34] can be recommended for further reading on the subject.


2.4.1 Numbers

Comments in Clojure begin with semicolons. Along with the usual number types Clojure also supports rational numbers as a first class data type.

1                    ; Integer
50322143214123443210 ; Arbitrary precision integer
5.1                  ; Floating point number
5E52                 ; Floating point number
5/2                  ; Ratio

2.4.2 Symbols, namespaces and keywords

A symbol in Clojure is a name or identifier which may resolve to something, perhaps a variable or a function. The symbol itself holds no value but is just an interned string that can be resolved by the Clojure compiler to a value. A symbol is written without any special notation, using alphanumeric characters as well as *, +, !, -, _, ', and ?. A namespace is similar to a module in Python or a package in Java in that they are groupings of names: named symbol tables. A symbol containing either a . or a / is said to be namespace qualified; it refers to something in a namespace other than the current one.

name-of-something ; Dashes are allowed characters in symbols.
java.lang.Double  ; Symbol referencing the fully qualified Java Double type.
lein/something    ; The symbol something in namespace lein, or a namespace
                  ; referred to as lein in the current namespace.

In order to prevent a symbol from being resolved into the value it references, it is possible to quote the symbol using a single quote.

(def thing 5)
(str thing 'thing) ; => "5thing"

In this thesis we adopt a convention common in the Clojure ecosystem, that the return value of a function call or expression is indicated with an arrow like =>. A keyword is a type in Clojure which only ever references itself, a constant commonly used for programmatic lookup in maps. These may also be qualified, but only in the Clojure fashion using /.

:key          ; The unqualified keyword key.
:spacious/key ; The keyword key in the namespace spacious.
::key         ; The keyword key in the current namespace.
::lein/key    ; The keyword key in the namespace referred to
              ; as lein in the current namespace.


2.4.3 Data structures

Clojure is a homoiconic9 language with built in literals for data structures like lists, vectors, maps and sets:

(1 2 "a")     ; List
[1 "a"]       ; Vector
{1 "a", :b 2} ; Map
#{1 2 "hat"}  ; Set

All data structures are heterogeneous10 and, as previously mentioned, are immutable but do not require copies to be made upon "modification", i.e. when a value is added, removed or updated in a new version of the same data structure.
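A brief sketch of what this means in practice (REPL-style, with return values shown as comments):

```clojure
(def v [1 2 3])

;; conj returns a new vector sharing structure with v...
(conj v 4) ; => [1 2 3 4]

;; ...while the original value is left untouched.
v          ; => [1 2 3]
```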

2.4.4 Additional literals There are boolean, string and character literals as one would expect, but also string regex11 literals and an analogue of Python’s None and Java’s null: nil. true ; Boolean truth nil ; Nothing "What a day!" ; String \c ; The character c #"ab." ; Regular expression matching ab followed by any character.

It is worth noting that any value except for false and nil is considered true in Clojure’s equivalent of if-statements12.

2.4.5 Evaluation

Clojure, being a Lisp dialect, is drastically different from languages in the C family in that it puts the parenthesis before the name of the function being called.

# Python
print(1,2)

;; Clojure
(print 1,2) ; Commas are optional in Clojure,
(print 1 2) ; thus this and the line above are equivalent.

9Homoiconicity is a term used to describe languages where the structure of the program can be expressed using the data structures of the language itself. The data structure the program is expressed in is then similar to how the program will be laid out in memory when executed. 10Heterogeneous data structures can contain data of multiple different types at the same time. 11Java string regexes are regular expressions in the wider sense of the word. 12Linguistic note: there are no statements in Clojure, only expressions.


As can be seen above, function calls are simply lists starting with a function name. It follows that one should be able to perform operations on these lists like with any other list, something enabled by Clojure’s macro system.

(+ 1 2) ; => 3

(defmacro infix [[first-operand operator second-operand]]
  (list operator first-operand second-operand))

(infix (1 + 2)) ; => 3

The above infix macro takes a single list of three elements and returns a list containing the operation in prefix notation for Clojure to evaluate. Worth noting for discussions later in this thesis is that macros are evaluated at compile13 time, as opposed to the program's run time.

2.5 Introduction to Spec, Schema and Truss

This section of the background chapter will try to explain the basic need for a contract system and how the more complex specification systems are motivated. To start off we will look at a function abs which returns the absolute value of a number. Named functions are in Clojure defined with defn, which is a combination of def and fn. The first argument of defn is the function name, the second the function’s argument list and the third the body of the function.

(defn abs [num]
  (if (pos? num)
    num
    (- num)))

(abs 1)   ; => 1
(abs -1)  ; => 1
(abs nil) ; java.lang.NullPointerException (NPE): No message

If passed a number, the function returns the expected value, but when passed something which is not a number it throws an exception. If the programmer either does not want an exception as part of the normal application flow, or simply considers an NPE to be a user-hostile error message exposing intricacies of the implementation, they may choose to check the data before attempting to perform number-specific operations on it. Depending on the situation they may either throw an exception with a more useful message or return a value symbolizing failure.

(defn abs-nil [num]
  (when (number? num)
    (if (pos? num)
      num
      (- num))))

13On the JVM there is no interpretation, ever, of Clojure code; it is always compiled before execution. With ClojureScript macros are run in a separate compilation stage before being sent to the JS VM.

(abs-nil nil) ; => nil

Instead of throwing an NPE the function abs-nil returns nil, which can flow through the application, something a programmer may see as preferable if the result of the function is non-essential and it is more important for the process as a whole to finish than for it to be exactly correct. But if the result of the function is essential and invalid arguments are exceptional, the programmer may try to provide the caller with a more useful exception instead. In Clojure there is a built-in concept of pre- and post-conditions for functions that, while they never became popular, are worth mentioning.

(defn abs-pre [num]
  {:pre (number? num)}
  (if (pos? num)
    num
    (- num)))

(abs-pre nil) ; java.lang.AssertionError "Assert failed: num"

When passing abs-pre the value nil the error message thrown is quite meagre. No information is given in the exception message about what was asserted or what value was passed as num. Though "Assert failed: num" is still an improvement over the complete lack of error message given with the NPE. This is where Truss comes in: it is for when the programmer wishes to pass on more information about the failing data in the error message of the exception.

2.5.1 Truss

Truss is an assertion library by Peter Taoussanis [5] which first and foremost packs relevant data into the exceptions thrown when incorrect data is found. It is by far the simplest of the three libraries examined in this thesis. In the below function abs-truss the assertion is placed in the body of the function, but it might as well have been placed in the pre-condition of the function.

(defn abs-truss [num]
  (if (pos? (truss/have number? num))
    num
    (- num)))

(abs-truss "hat")
;; Invariant violation in `user.clj:561`. Test form
;; `(number? num)` failed against input val `"hat"`.
;; {:dt #inst "2017-06-27T13:54:40.402-00:00",
;;  :val "hat",
;;  :ns-str "user",
;;  :val-type java.lang.String,
;;  :?line 561,
;;  :form-str "(number? num)"}


Truss’s only14 function have asserts that there should be a number? in num and returns the value of num if that is true. If that turns out to be false Truss will throw an exception much like Clojure’s pre-conditions do, but with additional information attached to help the programmer debug the problem. As seen above, Truss lets us know which piece of code deemed the data to be invalid, the invalid value itself and other relevant information such as the time of the failure and the run time type of the value. The user may also attach arbitrary data to be included with each thrown exception. Beyond allowing us to assert that num is a number?, Truss also provides some handy shorthands for expressing basic boolean logic.

(truss/have [:or string? integer?] 1)     ; => 1
(truss/have [:or string? integer?] "Tux") ; => "Tux"
(truss/have [:or string? integer?] 'sym)  ; Invariant violation in ...

(truss/have [:and integer? even?] 2) ; => 2
(truss/have [:and integer? even?] 1) ; Invariant violation in ...

It is also easy to assert that all the elements in a collection should be valid according to some predicate, i.e. that it is a homogeneous collection.

(truss/have integer? :in [1 2 3])  ; => [1 2 3]
(truss/have integer? :in [1 'sym]) ; Invariant violation in ...

Though it is notable that have only returns vectors, no matter the type of collection it is given.

(truss/have integer? :in '(1 2 3)) ; => [1 2 3]
(truss/have integer? :in #{1 2 3}) ; => [1 3 2] (yes, that really is the output)

Truss also provides a way to assert that an item should be a member of a set.

(truss/have [:el #{1 3}] 1) ; => 1
(truss/have [:el #{1 3}] 2) ; Invariant violation in ...

Beyond that it provides custom syntax for “set X is a superset of Y” and for asserting that a map has exactly, at least or at most a given set of keys, but no shorthand way to express what the values of these keys should be. Such functionality has to be added by the user through their own function or macro if desired.

2.5.2 Spec

Both Spec and Schema are usually included in a project when the programmer wishes to express more complex data structures, perhaps with nested structures or heterogeneous sequences, or when value validation in maps is desired. But this introduction will start with simple predicate validation. Clojure Spec has three entry points for validating that some data is valid according to a spec:

14Almost true: It also provides have? and have! which are slight variations on the same function.


• valid?, a predicate returning true if the value is valid to the spec, else false.
• conform, returns either the conformed value or :clojure.spec/invalid. The conformed value can be different from the original value depending on the spec. This is what provides Spec’s capabilities as a high-level parser.
• assert, returns the given value if valid to the spec, else throws an exception. The checking of assertions has to be explicitly enabled with a call to check-asserts.

Spec is capable of using any predicate function as a specification and below the built-in predicate number? is used as such.

(spec/valid? number? 5)   ; => true
(spec/valid? number? nil) ; => false

(spec/conform number? 5)   ; => 5
(spec/conform number? nil) ; => :clojure.spec/invalid
(spec/conform (spec/or :name string? :id int?) 5) ; => [:id 5]

(spec/check-asserts true)

(spec/assert number? 5) ; => 5
(spec/assert number? nil)
;; Spec assertion failed val: nil fails predicate: :clojure.spec/unknown
;; :clojure.spec/failure :assertion-failed

Though, as seen in the above demonstration, in order for Spec to be able to name the predicate that the value failed in its error messages, the specification has to be formalized in a spec.

(spec/def ::number number?)

(spec/assert ::number nil)
;; Spec assertion failed val: nil fails predicate: number?
;; :clojure.spec/failure :assertion-failed

But considering that assert is a macro and thus knows the name used by the caller of the macro in referring to the predicate function, this behaviour seems strange and a potential solution is proposed in section 5.3.6. Either way, the error messages Spec does produce can also be given as a string.
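A small illustration (of ours, not part of Spec) of why the macro could do better: macros receive their arguments as unevaluated forms, so the caller’s name for the predicate is available for quoting.

```clojure
;; A macro sees the symbol `number?` rather than the function object
;; it evaluates to, so it could embed that name in an error message.
(defmacro pred-name [pred]
  `(quote ~pred))

(pred-name number?) ; => number?
```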

(spec/explain-str ::number nil)
;; => "val: nil fails spec: :user/number predicate: number?"

The below implementation has exactly the same functionality as abs-nil.

(defn abs-nil-spec [num]
  (when (spec/valid? ::number num)
    (if (pos? num)
      num
      (- num))))

If the programmer instead wishes an exception to be thrown when invalid data is given to abs it is normally done through a function spec which is then attached to the function using clojure.spec.test/instrument.


(spec/fdef abs
  :args (spec/cat :num ::number)
  :ret ::number)

(require '[clojure.spec.test :as spec-test])
(spec-test/instrument `abs)

(abs nil)
;; Call to #'leiningen.core.spec.project/abs did not conform to spec:
;; In: [0] val: not-a-number fails spec: :user/number at: [:args :num]
;; predicate: number? :clojure.spec.alpha/args (not-a-number)
;; :clojure.spec/failure :instrument
;; :clojure.spec.test/caller {:file "form-init1202700168622381044.clj", :line 631}

While the above example may seem inane, remember that ::number can be replaced with a spec of much higher complexity, one which could describe any of the different collections Clojure provides. To start off: maps are expressed in terms of sets of keys:

(spec/valid? (spec/keys :req #{::id ::name})
             {::id 'apple ::name "Peter"}) ; => true
(spec/valid? (spec/keys :req #{::id ::name})
             {::id 'apple}) ; => false

And validation of the values held by these keys is performed when specs for individual keys are entered into the central registry with def:

(spec/def ::id integer?)

(spec/valid? (spec/keys :req #{::id ::name})
             {::id 'apple ::name "Peter"}) ; => false
(spec/valid? (spec/keys :req #{::id ::name})
             {::id 5 ::name "Peter"}) ; => true

The simplest subset of collections, excluding maps, are the homogeneous collections which are expressed using coll-of:

(spec/valid? (spec/coll-of number?) #{1 2 3})  ; => true
(spec/valid? (spec/coll-of number?) [1 2 3])   ; => true
(spec/valid? (spec/coll-of number?) '(1 2 3))  ; => true
(spec/valid? (spec/coll-of number?) '(1 'a 3)) ; => false

Spec also has a dedicated concept for tuples, i.e. sequences of fixed length:

(spec/valid? (spec/tuple double? string? symbol?)
             [1.0 "bear" 'pine]) ; => true

For more complex sequences Spec has a set of functions which it calls regex ops: cat, alt, *, +, ? and &. These can be combined to match any regular language in the strict sense of the term. For example, consider a sequence which begins with an id number followed by either one keyword and one float or two strings.


(spec/def ::db-row
  (spec/cat :id integer?
            :pairs (spec/alt :kw-float  (spec/cat :key keyword?
                                                  :value float?)
                             :name-desc (spec/cat :name string?
                                                  :description string?))))

(spec/valid? ::db-row [1 :height 5.5])   ; => true
(spec/valid? ::db-row [1 "hat" "black"]) ; => true
(spec/valid? ::db-row [1 "hat" 5.5])     ; => false

Spec’s function conform can be used to destructure this sequence into a map using the defined spec. This essentially allows for parsing the implicit semantics of syntax into maps with explicit names.

(spec/conform ::db-row [1 "hat" "black"])
;; => {:id 1, :pairs [:name-desc {:name "hat", :description "black"}]}

Spec can also generate data from a given spec, something which becomes useful when sample data is needed or for property based testing:

(require '[clojure.spec.gen :as spec-gen])
(spec-gen/generate (spec/gen ::db-row))
;; => (-342030845 "EfcznnD2apFlceNG34n5" "P")

One may also use this as a tool to reflect on whether the spec is correct or not, e.g. were negative identifiers really what we wanted or should we restrict the spec further?
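For instance, if only positive identifiers were intended, the id part could be tightened with a spec such as the hypothetical ::positive-id below (our illustration, not from the thesis):

```clojure
;; Restricting ids to positive integers; the spec name is illustrative.
(spec/def ::positive-id (spec/and integer? pos?))

(spec/valid? ::positive-id -342030845) ; => false
(spec/valid? ::positive-id 42)         ; => true
```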

2.5.3 Schema

Plumatic Schema is a specification system much like Spec and has two entry points for validating that some data is valid according to a schema: check and validate. The former returns nil if the data matches the schema or an error message in a string if the data is invalid; the latter returns the data if valid or else throws an error.

(schema/check schema/Num 5)   ; => nil
(schema/check schema/Num nil) ; => "(not (instance? java.lang.Number nil))"

(schema/validate schema/Num 5) ; => 5
(schema/validate schema/Num nil)
;; Value does not match schema: (not (instance? java.lang.Number nil))
;; {:type :schema.core/error, :schema java.lang.Number, :value nil, :error ...

One difference between Schema and its brother Spec is that the preferred leaf15 in a schema is a type declaration rather than a predicate function. In the above code the type is schema/Num, which gets translated to the host-specific type java.lang.Number. This does not preclude the usage of arbitrary predicates in schemas, though they have to be wrapped in the pred function for some of the same reasons that Spec requires predicates to be wrapped in a spec for it to be able to name them.

15A leaf node in a specification is one which has no children and is directly matched against a value.

(schema/check (schema/pred number?) 5)   ; => nil
(schema/check (schema/pred number?) nil) ; => "(not (number? nil))"

The below implementation has exactly the same functionality as abs-nil.

(defn abs-nil-schema [num]
  (when (nil? (schema/check schema/Num num))
    (if (pos? num)
      num
      (- num))))

But as with Spec the more common way of enforcing a contract on the arguments and return value of a function is through a function specification. In Schema’s case these are declared together with the function, and in order to enforce the contract the call to the function is wrapped with with-fn-validation.

(schema/defn abs-schema :- schema/Num
  [num :- schema/Num]
  (if (pos? num)
    num
    (- num)))

(schema/with-fn-validation (abs-schema "not-number"))
;; Input to abs-schema does not match schema:
;; [(named (not (instance? java.lang.Number "not-number")) num)]

It is possible to construct schemas for collections using the data DSL Schema provides. It uses the Clojure data types to represent what the schema should match. Schemas for homogeneous collections, including maps, are showcased below:

(schema/check #{schema/Num} #{1 2 3}) ; => nil
;; Sequences are expressed using array literals.
(schema/check [schema/Num] [1 2 3])  ; => nil
(schema/check [schema/Num] '(1 2 3)) ; => nil
(schema/check [schema/Num] '(1 2 "a"))
;; => [nil nil (not (instance? java.lang.Number "a"))]
(schema/check {schema/Str schema/Str} {"key" "value", "a" "b"}) ; => nil

Tuples such as a sequence of a number followed by either a string or a number can be expressed:

(schema/defschema a-row
  [(schema/one schema/Num "id")
   (schema/one (schema/cond-pre schema/Str schema/Num) "value")])

(schema/check a-row [1])       ; => [nil (not (present? "value"))]
(schema/check a-row [1 "str"]) ; => nil
(schema/check a-row [1 5.0])   ; => nil


But expressing the previously given spec ::db-row is not as easy. The closest we come using Schema’s own constructs is:

(schema/defschema db-row
  (schema/conditional
   #(keyword? (second %)) [(schema/one schema/Num "id")
                           (schema/one schema/Keyword "key")
                           (schema/one schema/Num "value")]
   #(string? (second %))  [(schema/one schema/Num "id")
                           (schema/one schema/Str "name")
                           (schema/one schema/Str "description")]))

(schema/check db-row [1 :degreees 5.5])
(schema/check db-row [1 :degreees "hat"])
;; => [nil nil (named (not (instance? java.lang.Number "hat")) "value")]

Unlike the spec for the same piece of data, the schema has to repeat the complete sequence for each permutation due to the lack of an equivalent of spec/cat; there is simply no way to express “put these two parts together to form a sequence”. The selection between the two alternatives also has to be manually specified by providing the predicates #(keyword? (second %)) and #(string? (second %)). Further observations on the differences between Spec, Schema and Truss can be found in section 5.3. Among other things we describe how the constraints of Schema make it unsuitable for describing named arguments. Finally, it is worth mentioning that it is possible to generate data from a given schema using the schema-generators library, just like with Spec.

(require '[schema-generators.generators :as schema-gen])
(schema-gen/generate db-row)
;; => [-1.0 ["V^9)(1A~v" "4c)-\")cj"]]

3 Method

In order to study Spec the author first had to figure out how it was being used in the real world and identify its competitors. Seeing that it was commonly being used for validation the author searched for and found a large set of real world Clojure data which was then specified using Spec and its competitors. Measurements of validation time were then performed as well as other forms of inspection on the specifications and the results of the validation.

3.1 Pre-study

The first goal of the pre-study was to find existing code bases using Spec. The community driven website clojure-toolbox.com provides a categorised list of more or less popular libraries and tools a developer may use when writing Clojure applications. The source code for all these projects was fetched and searched for usage of Clojure Spec; 8 projects in total were found to use it.1 The author also found a work in progress2 for the popular Leiningen build tool which specified what a project declaration can and cannot look like.

1alia, cljs-oops, core.typed, graphql-clj, java.jdbc, onyx, sablono, spandex. 2https://github.com/technomancy/leiningen/pull/2223


A dependency tree for the entirety of Clojars3 was found on crossclj.info, from which it was possible to extract a much longer list4 of projects using Clojure 1.9-alphaXX5 and specific functions of Spec.67 In addition some scripts were written to download the latest release of every artifact on Clojars,8 numbering 13995 in total, in order to search through their contents. The two main interaction points of Clojure Spec when it comes to using specifications, as opposed to defining them, are valid?, which checks whether a datum is valid according to a specification, and conform, which in addition to checking the datum’s validity destructures the datum if possible. Inspecting the use of Spec in the gathered data revealed equal use of conform and valid?. Inspecting these uses further with git blame it was found that:

1. Alia only uses Spec for testing.
2. Oops was written from the start with conform and valid?.
3. Untangled-web rewrote parts of its internals to use conform.
4. GraphQL-clj used to use valid?. The specs remain but its uses are gone.
5. java.jdbc only uses Spec for tests.
6. Onyx uses Schema and has only a toy spec. It uses Schema extensively throughout its code.
7. Sablono only uses Spec for tests.
8. Spandex only uses Spec for testing and documentation.
9. Net uses only valid? and its use was added on top of existing defensive programming.
10. Zprint uses valid? and has moved to Spec from Schema.

The conclusion was drawn that Spec is foremost used as an add-on to existing code; that, as of yet, it has only had limited use as a replacement for existing code, and that when it did replace something, that something was a Schema. In other words it made sense to centre the study itself around how Spec compares to other specification or contract systems. The ecosystem was surveyed and five such systems were found: Spec, Schema, Truss, Herbert [25] and Annotate [4]. All but Spec are delivered to end users through Clojars, which meant download numbers were publicly available; they are presented in table 3.1.

3Repository of Clojure code where anyone can contribute, at clojars.org. Like Maven central but without review process. 4systems-toolbox, spex, clj-edocu-users, mistakes-were-made, utangled-client, clj-edocu-help, net, clj-element-type, gadjett, datomic-spec, tappit, tick, deploy-static, tick, spectrum, sails-forth, ring-spec, doh, tongue, diglett, rp-query-clj, pedestal.vase, clj-jumanpp, om-html, crucible, ctim, crucible, wheel, merlion, turbovote-admin-specs, odin, pc-message, active-status, rp-json-clj, curd, rp-util-clj, swagger-spec, invariant, functional-vaadin, untangled-client, flanders, uniontypes, dspec, schpeck, macros, replique, specific. 5https://crossclj.info/ns/org.clojure/clojure/1.9.0-alpha14/project.clj.html 6Defining data specs: https://crossclj.info/fun/clojure.spec/def 7Defining function specs: https://crossclj.info/fun/clojure.spec/fdef 8https://github.com/Rovanion/all-the-clojars


Table 3.1: Downloads from Clojars per library.

name       downloads   downloads per year
Spec       not available9
Schema     1 250 727   320 699
Truss      257 800     161 125
Herbert    7 182       1 795
Annotate   2 074       1 037

The number of uses in published code on Clojars was also available on crossclj.info, which gave us an additional set of data to gauge popularity from, presented in table 3.2.

Table 3.2: Number of projects on Clojars that depend on the given library.

name       uses   uses per year
Spec       60     55
Schema     498    54
Truss      213    135
Herbert    5      1
Annotate   0      0

If the age of each project is taken into account it stands clear that there are only two real competitor libraries to Spec: Schema and Truss.

3.2 Data selection

The internet was scoured for large amounts of production data typical of what a Clojure program would deal with. The goal was to find a data source where the data was produced by humans, so as to have a large amount of variance with many different types of errors in it. Best of all would be if there existed a parser to bring the data into the data types that Clojure programs normally deal with. In the ecosystems of modern programming languages there are often repositories of liberally licensed code available; such is also the case with Clojure. The lion’s share of such code is shared in the repository named Clojars, and it was found that most of these projects had some version of the very same file located within their archives: the Leiningen project file. The contents vary from project to project; here is an example from the popular routing library Compojure [11]:

(defproject compojure "1.6.0"
  :description "A concise routing library for Ring"
  :url "https://github.com/weavejester/compojure"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.7.0"]
                 [org.clojure/tools.macro "0.1.5"]
                 [clout "2.1.2"]
                 [medley "1.0.0"]
                 [ring/ring-core "1.6.0"]
                 [ring/ring-codec "1.0.1"]]
  :plugins [[lein-codox "0.10.3"]]
  :codox {:output-path "codox"
          :metadata {:doc/format :markdown}
          :source-uri "http://github.com/weavejester/compojure/blob/{filepath}#L{line}"}
  :aliases {"test-all" ["with-profile" "default:+1.8" "test"]}
  :profiles {:dev {:jvm-opts ^:replace []
                   :dependencies [[ring/ring-mock "0.3.0"]
                                  [criterium "0.4.4"]
                                  [javax.servlet/servlet-api "2.5"]]}
             :1.8 {:dependencies [[org.clojure/clojure "1.8.0"]]}})

9Spec has been delivered alongside Clojure 1.9-alphaXX, meaning that it has been delivered with all programs using 1.9 thus far. Worth noting is that it recently was split out into its own artifact.

These files are read by the Clojure reader into Clojure data types and are very typical of Clojure programs, as they turn into maps, which is the preferred structure for Clojurists to put their data in. This comes from the Clojurists’ preference for holding explicit semantics in names instead of implicit semantics in syntax. The values held by these keys are of many different degrees of complexity and structure. Some hold simple singular values, such as :url which holds a string representing just that, a URL. Others are of intermediate but still strictly bounded complexity, such as :mailing-list which holds a map with the possible keys :name, :archive, :other-archives, :post, :subscribe and :unsubscribe, each with their own valid values, but none more complex than a vector of URLs. Then there are the complex keys such as :dependencies, which can vary in complexity from

[org.clojure/clojure "1.8.0"]

to

[log4j "1.2.15" :exclusions [[javax.mail/mail :extension "jar"]
                             [javax.jms/jms :classifier "*"]
                             com.sun.jdmk/jmxtools
                             com.sun.jmx/jmxri]
 :native-prefix ""]

or :profiles which holds as its value an entirely new project-map. With this in mind it seemed like the perfect real-world data source to base our study on.


3.3 Filtering test data

The test data was acquired using a piece of software named All the Clojars10, developed as part of the thesis work. It uses Clojars’ public API to acquire a list of every single artifact in their repository and then feeds that into Leiningen, which pulls down each artifact’s code and the code of all its dependencies, which could be located in any repository. Out of the 15089 project files collected from Clojars and Maven Central, 21 were removed since they were made for Leiningen 1, whereas this thesis project targets version 2. 121 files were removed because they could not be read by the Clojure reader, the most common cause being duplicate keys. On the other hand well over 200 files were repaired semi-manually, most of them broken due to being taken out of their original file system context. In this data there were a few top level keys that were never present: :plugin-repositories, :offline?, :implicit-hooks, :implicit-middleware, :jar-inclusions and :install-releases?. It follows that the specifications written for these keys will not be part of the evaluation.
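The duplicate-key failure mode can be reproduced directly in a REPL (an illustrative example of ours, not taken from the collected data):

```clojure
;; The Clojure reader rejects literal maps with duplicate keys,
;; which is the most common reason a project file could not be read.
(read-string "{:a 1 :a 2}")
;; => IllegalArgumentException: Duplicate key: :a
```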

3.4 Writing specifications

All measurements and conclusions thereof are drawn from the specifications written or co-written by the author of this thesis. The author of Figwheel[16] Bruce Hauman wrote a library for specifying configuration files he called strictly specking [47] which adds additional functionality on top of Clojure Spec. He then applied this library to the project Leiningen and produced a partial spec for its project files, this is the previously mentioned work in progress spec for Leiningen. The main part of the study consisted of removing strictly specking from Bruce’s partial spec, and then completing the spec. Additional specifications were then implemented using Schema, Truss and Plain Clojure for the same data. Issues encountered during the process are documented in section 5.3.

3.5 Benchmarking

All the acquired data was read into memory and each specification was used to validate both the project files in their original form and the same data sorted into per-keyword sets for the per-keyword benchmarks. The resulting timings were sorted on whether the data was valid or not. The benchmarks themselves consisted of timing each library validating project maps or sets of values against some specification. Each run was performed on the same JVM and the time measured was that of the execution of the validation function. The JVM had been warmed up by running the benchmarks twice before extracting the results of a third run. An overview of the benchmarking procedure follows:

1. Record the system time right before starting the validator function.
2. Run the validator function on a project file or on a value from one of the keys, depending on whether we are producing results for the graphs separated by keyword or not.

10https://github.com/Rovanion/all-the-clojars


3. Record the system time right after the function is completed.
4. Calculate the delta and put it together with the result of the function into the list of results.
5. Go to 1 if there is yet more data to validate; else calculate statistical results and output them to file.
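The steps above can be sketched in Clojure roughly as follows (a simplified illustration with hypothetical names, not the thesis’s actual benchmark code, which is published on GitHub):

```clojure
;; Simplified sketch of the timing loop; `validate` stands for one of
;; the validator functions and `data` for the project files or values.
(defn time-validation [validate datum]
  (let [start  (System/nanoTime)
        result (validate datum)
        delta  (- (System/nanoTime) start)]
    {:valid? (boolean result)
     :millis (/ delta 1e6)}))

(defn run-benchmark [validate data]
  (mapv (partial time-validation validate) data))
```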

When running across the original project files each validator was run across the entire data set of 14947 files. When benchmarking the validation of values separated out by keyword, each set of values was validated ten times over, except for those keys for which there were11 fewer than ten values present, in which case the values were validated 100 times to reach statistical significance. The benchmarks presented were run on a consumer-grade Intel Core i7-2640M with 4GB RAM on 32-bit Ubuntu 16.04. The source code of the benchmarking software can be found on GitHub12 along with instructions for running the benchmark.

3.6 Effort reduction

The program cloc[9] will be used to count the number of lines of code in each implementation. The source code of the different validator implementations can also be found on GitHub13.

3.7 Edge cases

Two separate inquiries into finding edge cases in data not found by previous programming were performed during the study. The first method was applied during the pre-study, where Hauman’s existing partial implementation of a spec for Leiningen project files was run across the project files from clojure-toolbox.com. We then manually inspected the reported failures and recounted our findings. The second inquiry was made during the study’s main section and is based on running the written validators across the entire data set in order to find any discrepancies there.

11https://gist.github.com/Rovanion/ec72b5092b4737763a377b7e616f6a06 12https://github.com/Rovanion/leiningen-validation-benchmark 13https://github.com/Rovanion/leiningen/tree/{spec,schema,truss,pred- validation}/leiningen-core/src/leiningen/core/{spec,schema,truss,pred_validation}

4 Results

This chapter contains answers to the three research questions of the thesis: that there is no real difference in code size, that all three libraries add overhead with different characteristics, and that Spec is not inherently better at finding edge cases in existing code.

4.1 Effort reduction

This section seeks to answer research question (1): How does the code of Spec specifications compare to equivalent code in competing systems, both in terms of plain SLOC and convenience of expression? As is seen in table 4.1 there is no major difference between length of the code using the different systems.

Table 4.1: SLOC per specification implementation.

Spec   Schema   Truss   Plain Clojure
423    401      426     413

Though one may observe the number of utility functions and macros needed to support the specifications as an indicator of the deficiencies in each library. One could also see it as the amount of code needed to unify the capabilities of all the libraries. For Spec two macros named vcat and stregex were written to express concatenation of vectors and strings which match a specific string regular expression. These were macros and not functions since most of the Spec API is also macro-based; it follows that if one wishes to manipulate data before it is passed on to a macro, that manipulator also has to be a macro.
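As an illustration, a string-regex spec helper in the spirit of stregex could be sketched roughly as follows (a hypothetical reconstruction of ours, not the thesis’s actual implementation):

```clojure
;; Hypothetical sketch: a spec matching strings against a string
;; regular expression. It must be a macro so the regex literal
;; survives unevaluated into the emitted spec form.
(defmacro stregex [re]
  `(spec/and string? (fn [s#] (re-matches ~re s#))))

(spec/valid? (stregex #"ab.") "abc") ; => true
(spec/valid? (stregex #"ab.") "ab")  ; => false
```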


For Schema a total of four functions were added:

• stregex!, in the same vein as stregex for Spec.
• key-val-seq?, emulates spec/keys* and is treated in greater detail in the Named arguments section of the Discussion.
• first-rest-cat-fn and pair-rest-cat-fn, which both emulate parts of the functionality of spec/cat.

For Truss one function and four macros were implemented. These macros were macros instead of functions since Truss uses the source code in error messages and these would become hard to understand had they referenced anonymous functions:

• key-val-seq?, the very same as for Schema.
• opt-key and req-key to emulate Schema and Spec’s validation of values held by keys.
• stregex-matches, a macro much like stregex.
• >>, a convenience macro to apply its first argument to the end of each of the lists given as the rest of its arguments and lastly return the first argument. In practice:

(macroexpand '(util/>> [1 2 3]
                       (truss/have vector)
                       (truss/have integer? :in)))
;; => (do (truss/have vector [1 2 3])
;;        (truss/have integer? :in [1 2 3])
;;        [1 2 3])

As perhaps is clear from the above text, the simpler the library the more complex the hand-written utility code had to become. Discussion around the specificities of these utility functions and macros can be found in section 5.3. Hauman’s incomplete spec for Leiningen project files, as introduced in section 3.4 and studied in the pre-study, was 804 SLOC and 80 lines of comments; though a large part of those SLOC were docstrings1. Hauman’s code did not replace any existing code but added the feature of being able to identify and suggest corrections for common errors in such files [20]. Since it replaces no code, no comparison in terms of SLOC could be drawn to a previous implementation.

4.2 Performance

This section seeks to answer research question (2): How does Spec perform compared to competing techniques? The benchmarks were run across all Leiningen project files published on Clojars or depended on by any project on Clojars, as described in the Filtering test data section of the Method. The graphs in this chapter are box plots where the coloured box represents the second quartile2, the single black line across the coloured box is the median and the whiskers represent the whole range of the data. As an example: In fig. 4.1 this means that for Spec the fastest validation of

1The documentation bundled with a variable or function, available at run and development time for the programmer to read.
2The middle 50% of the data.


[Box plot “Validation time for all data”: milliseconds on a logarithmic y-axis for Spec, Schema, Truss and Plain.]

Figure 4.1: Project file validation time summary

a project file completed in 0.0005 ms, that 25% of the validations completed within 0.03 ms, 75% within 0.14 ms and 100% within 6.6 ms, with a median of 0.07 ms. The term validation time denotes the time it takes to validate some piece of data using a specification system. In fig. 4.1 and table 4.2 this piece of data is one whole project file. In the per-keyword breakdowns presented in Appendix B, the piece of data being validated is that held under a keyword. As is obvious in fig. 4.1, Schema is by far the slowest of the three libraries, with a median validation time almost two orders of magnitude higher than its competitors. Spec has a median validation time 0.04 ms slower than Truss and 0.05 ms slower than Plain Clojure, but 1.58 ms faster than Schema. Though it may seem from fig. 4.1 that Schema is also the most stable of the three, one has to remember that the y-axis is in logarithmic scale3. By looking at the numbers in table 4.2 we can confirm that this is not the case.

Table 4.2: Statistical measures in ms for validation of project files.

min      1st quart  median   3rd quart  max       std-dev  name
0.00042  0.02618    0.06388  0.12772    1.59931   0.11643  Spec
1.48286  1.57328    1.64274  1.77755    21.27848  0.61164  Schema
0.00216  0.01287    0.02371  0.08320    41.78816  0.79916  Truss
0.00008  0.00893    0.01280  0.02000    0.49467   0.01294  Plain

Schema is in fact second only to Spec in standard deviation, with Truss trailing behind by almost a whole order of magnitude. But then again, 75% of the data is validated an order of magnitude faster by Spec and Truss than by Schema. There seem to be some specific cases that drive up the standard deviation of Truss. Breaking down the results by whether or not the test data was valid, as displayed in fig. 4.2, perhaps lays bare one of the causes.

3In order to accommodate the large variance in execution time data.


[Two box plots, “Validation time for passing data” and “Validation time for failing data”: milliseconds on a logarithmic y-axis for Spec, Schema, Truss and Plain.]

Figure 4.2: Validation time broken down by whether the tested data is valid or not.

Looking at the data in fig. 4.2 we can observe that Spec is faster at validating invalid data, while Truss is slower at the same task. The following paragraphs dive into the reasons behind this, starting with Truss, where the behaviour is reasonable given the multiple layers of asserts used in the code in order to produce useful explanations of why a piece of data failed. For example, consider the Truss verification function for the key :clean-targets in the project file:

(declare project-path)

(defn clean-targets [v]
  (truss/have [:and vector? not-empty] v)
  (truss/have [:or keyword? util/non-blank-string? project-path] :in v))

(defn project-path [kws]
  (truss/have [:and vector? #(>= (count %) 2)] kws)
  (truss/have keyword? :in kws))

The key :clean-targets is meant to hold a list of paths that should be removed when the user wants to make a clean build of the project. The value should be a vector of keywords, non-blank strings or project-paths. In practice this means that the program enters clean-targets to validate whether v is a valid clean-target, and then enters project-path because one of the entries in v was neither a keyword? nor a non-blank-string?. If it then turns out that there is a non-keyword in kws, an exception will be thrown from there, caught in clean-targets and re-thrown with further information packed in. Imagine that a discrepancy in the data is found by Truss ten steps into the validation of some deeply nested data structure, and that all these steps are asserted with truss/have. In such a situation the call stack would be recorded, parsed out into a string and added to the message of the thrown exception ten times over, resulting in an enormous final message; perhaps explaining why Truss is slower at validating incorrect data.


It is up to the developers how much granularity they want in the messages that eventually end up in their logs: do they only want to know that v did not match [:or keyword? util/non-blank-string? project-path], or do they specifically want to know that it was identified as a vector matching project-path but contained some non-keyword element? The opposite behaviour can be observed in Spec: validating incorrect data is faster than validating correct data. The reason is speculated to be related to the parsing algorithm Spec uses, parsing by derivatives, where a discrepancy from the spec means the parsing can terminate early. A complete collection of validation benchmark charts can be found in Appendix B and C. During the pre-study we assessed Leiningen, which had no formal project file verification before Hauman's implementation. This meant that measuring the difference in execution time became as simple as measuring the time it took for the verification to execute. As a test bed of Leiningen project files, all 516 projects listed on clojure-toolbox.com were downloaded4 and the improved Leiningen5 code was run on them. The result was that on average 45.6 ms (median 43, max 108, min 13) was added to the execution time, with a standard deviation of 17 ms.

4.3 Edge cases

This section seeks to answer research question (3): To which degree does Spec expose issues in data or functions not found by competing systems, if at all? When validating the full data set of all project files from Clojars, the Spec implementation did not produce results that stood out from the other implementations. The specifications matched what the author had written them to match, with the exception of minor quirks in the APIs as detailed in section 5.3.6. Running the improved version of Leiningen, with a spec for the Leiningen project files, on all projects on clojure-toolbox.com found a total of 8 true positive errors in existing project declarations. It also found 2 false positives where projects had entered their own configuration into their Leiningen project file, which is not necessarily incorrect, but is not encouraged either. The possibility of false negatives was not explored. In short, the answer to research question (3) is: no, Spec seems to hold no traits which make it stand out in this respect compared to the other systems evaluated in this paper.

4.4 Criteria comparison

As laid out in section 2.3.3 there are criteria on which the libraries may be evaluated; the results of doing so can be found in table 4.3.

4The script used: https://gist.github.com/Rovanion/12742281888626c4ad5f063b68bc0258
5The specific version can be found at: https://github.com/Rovanion/leiningen/ with commit 53be818986047aed6b490685436d5b17ae65a203


Table 4.3: Criteria comparison: X means full, d partial and – no support.

Criteria  Spec  Schema  Truss  Plain

BAS       X     X       X      X
AAS-1     X     d*      d*     d*
AAS-2     X     X       d**    –
AAS-3     –     –       –      –
RMA-1     d     d       d      d
RMA-2     X     X       X      X
RMA-3     –     –       –      –

*: Only if written in the body of the function, not separately from the function implementation.
**: The return values of (have X :in Y) calls are always vectors6 no matter the input type.

As preconditions, postconditions and asserts are part of the core Clojure language, all techniques described have BAS. Worth mentioning on AAS-3 is that all systems use normal Clojure functions in one shape or another, and there is no built-in support for, or guarantee that can be given about, side effects being forbidden. With regards to RMA-1, the handling always has to be managed by catching the thrown clojure.lang.ExceptionInfo, unless the programmer wishes the thread to crash.
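Since BAS comes from core Clojure itself, its built-in facilities can be sketched briefly; the function names below are illustrative and not taken from the thesis code:

```clojure
;; A sketch of Clojure's built-in assertion support (BAS).
;; :pre and :post conditions are checked on every call while *assert* is true.
(defn safe-div [a b]
  {:pre  [(number? a) (not (zero? b))]  ; checked before the body runs
   :post [(number? %)]}                 ; % is bound to the return value
  (/ a b))

(safe-div 10 2) ; => 5
;; (safe-div 10 0) would throw java.lang.AssertionError

;; The assert macro can be used anywhere in a body.
(defn half [x]
  (assert (even? x) "x must be even")
  (/ x 2))
```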

4.5 Generating data from Clojure Spec

Given the spec for project files, the author also benchmarked data generation from it, with 100 samples. The timings can be found in Appendix A. By far the most complex subspec to generate is that for the key :filespecs, with enormous variance, as seen in the difference between the median and the mean. Each entry in the filespecs vector has a one-in-four chance of being a function which takes a project map and returns a filespec entry. This seems to mean that Spec will internally test its generated function by giving it generated project maps and validating that the returned value is a valid filespec. In other words, multiple additional project maps are generated and thrown away for each project map being generated. To illustrate that Spec does indeed validate its generated functions, the below code sample will print the string “validating” 84 times in the process of generating one ::predicate function.

(defn valid? [& any]
  (println "validating")
  true)

(spec/fdef ::predicate
           :args (spec/cat :arg (spec/with-gen valid? #(spec/gen #{:a})))
           :ret boolean?)

(gen/generate (spec/gen ::predicate)) ; Prints "validating" 84 times.

6Bug tracker issue: https://github.com/ptaoussanis/truss/issues/9


Whether or not this behaviour is intentional is unclear at the time of writing. Some key numbers from generating 1000 project maps based on the constructed spec can be seen in table 4.4.

Table 4.4: Generation time in milliseconds from spec.

min        max           mean         median
69.095901  62110.453413  8543.627015  786.806830


5 Discussion

This chapter will discuss the method, validity and reproducibility of the report. It will then delve deeper into the finer points of writing specifications using the libraries described in this thesis.

5.1 Method

One of the largest aspects of Spec left unevaluated is its use for property-based testing. As this was not something prominently available in its competitors, although there is a complementary library available for Schema, it was left out of the evaluation; perhaps leaving an incomplete image of Spec. Overall, the author would like to claim that the reproducibility of this study is high, since the code used in its production is publicly available along with usage instructions.

5.1.1 Pre-study

The selection to focus on specification, assertion and contract systems may seem arbitrary; one could argue that Typed Clojure [13] fills some of the same gaps in Clojure code that Spec does, but in a static rather than dynamic way. The author would like to encourage investigations into Typed Clojure and its relation to Spec to build upon the results of this thesis.

5.1.2 Data selection

The selection of Leiningen project files as the data set came down to availability. It is a real-world data “type” used by a real-world application, but as Clojure's largest industries are Enterprise


Software and Financial Services [45], and these tend to keep their data to themselves, it becomes hard to get hold of representative data.

5.1.3 Filtering test data

Some project files in the data set could be automatically modified to be readable outside of their original contexts; others were unsalvageable without manual inspection. Given more time, every single Leiningen 2 project could have been modified by hand to provide a slightly larger data set.

5.1.4 Writing specifications

Perhaps the most glaring source of bias in this work is the fact that the specifications on which all measurements are made were written by the author of the study, and that all results and discussion are drawn from the experience of a single person. But there seemed to be no natural way around this issue, as perhaps no programmer writes the same functionality four times over using four different techniques as part of their typical workflow for a real-world application.

5.1.5 Benchmarking

It would be interesting to inspect the memory profiles of the different implementations and contrast them with each other as a complement to the execution time comparisons drawn in this paper. It is entirely possible that some enlightenment could be drawn from such data. One could also argue that, considering the platform, different benchmarks should be run comparing a newly started JVM to a warm JVM, as the JVM optimizes code during runtime. On a more positive note, the author would like to highlight the reliability of the benchmark results: the author finds it likely that the same results would be found by a third party using the same code.

5.1.6 Effort reduction

SLOC is about as vapid as a metric can be. One could argue that this result might as well not be mentioned due to the weakness of the measure.

5.1.7 Source criticism

Due to Clojure not being terribly popular in academic circles, some of the academic sources cited [8] [36] do not relate to Clojure directly but to Java, and some of the concepts described therein had to be translated from an object-oriented way of thinking into a functional mindset; something one may argue weakens the validity of the methods. In some cases, such as those relating to code quality, the foremost experts on Clojure are part of the industry and share their knowledge through email, chat or other forums of discussion such as GitHub, Jira and StackOverflow. During the course of this study there has been direct and indirect contact with these experts, and citations of them are part of the thesis when relevant, disregarding the fact that the form of publication is highly informal.


5.2 Results

5.2.1 Effort reduction

The size of the specifications seems to depend much more on the data being described, and on how the code is laid out, than on the specification system being used. Though, the utility section of each implementation grew more complex the simpler the supporting library became.

5.2.2 Performance

The strengths and weaknesses of the three libraries can be neatly illustrated by looking at the graphs of specific keys from the validation benchmarks. We can start by looking at the overhead of doing a runtime type check. The keyword :warn-on-reflection should only ever hold a boolean value, and the associated predicate is defined as:

(defn boolean? [x]
  (instance? Boolean x))

Verifying that the value of the optional key :warn-on-reflection is valid looks like:

(def the-map {:warn-on-reflection true})

;; Spec
(spec/def ::warn-on-reflection boolean?)
(spec/def ::project-map (spec/keys :opt-un [::warn-on-reflection]))
(spec/valid? ::project-map the-map) ; => true

;; Schema
(schema/defschema project-map
  {(schema/optional-key :warn-on-reflection) schema/Bool})
(schema/check project-map the-map) ; => nil

;; Truss
(when (contains? the-map :warn-on-reflection)
  (truss/have boolean? (get the-map :warn-on-reflection))) ; => true

;; Plain Clojure
(when (contains? the-map :warn-on-reflection)
  (boolean? (get the-map :warn-on-reflection))) ; => true

As we can see in fig. 5.1, compared to the baseline of using no library at all, Truss adds no overhead within the margin of error, while Schema and Spec add about equal amounts of overhead. As advertised, these simple assertions are where Truss shines. Let us instead direct our gaze at a reasonably complex piece of data: that held under :dependencies, whose validation times are displayed in fig. 5.2. These dependencies are a vector of a name, which can be either a string or a symbol, followed by a version string and an arbitrary number of argument-value pairs. Argument-value pairs can themselves be of varying complexity. An example of what one would typically find under :dependencies is as follows:

[[leiningen-core "2.7.2-SNAPSHOT"]
 [org.clojure/clojure "1.9.0-alpha16"]
 ["bultitude" "0.2.8"]
 [org.clojure/google-closure-library "0.0-20140226-71326067" :scope "provided"]
 [stencil "0.5.0" :exclusions [org.clojure/core.cache]]]


[Box plot “Validation comparison for key :warn-on-reflection”: milliseconds on a logarithmic y-axis for Spec, Schema, Truss and Plain.]

Figure 5.1: Validation time comparison for boolean value in a map.

[Two box plots, “Validation comparison for key :dependencies” and “Validation comparison for key :aot”: milliseconds on a logarithmic y-axis for Spec, Schema, Truss and Plain.]

Figure 5.2: Validation time for a dependency vector. Figure 5.3: Validation time for a value with multiple options.

In fig. 5.2 we can observe how both Spec and Schema are about an order of magnitude slower than Truss and the baseline predicate-based validation, though no specific underlying cause for this behaviour has been found. One interesting observation is that in any situation where the value of a key could be one of multiple alternatives, Spec closed the gap to Truss by half an order of magnitude. An example of such a key is :aot, as seen in fig. 5.3, where the value is either the keyword :all or a vector of regexes and/or symbols like [org.example.sample #"clj-webdriver\.ext\.*"].


When it comes to the difference in validation time depending on whether the data is valid or not, one may speculate that this comes from Spec being able to end its parsing of the data structure prematurely, and that these failures are much cheaper than the stack trace rollups that Truss performs. This is likely related to the derivative-based parsing model Spec uses.

5.3 Writing specifications

This section will discuss some of the interesting but somewhat subjective points surrounding the specification writing process. In table 5.1 the reader will find a summary of the features and subjects discussed in this section of the paper.

Table 5.1: Feature comparison: X means full, d partial and – no support.

Feature                               Spec  Schema  Truss  Plain

Assertions                            X     X       X      X
Function specifications               X     X       –      –
Thread safe function specifications   X     –       –      –
Named arguments in fn-specifications  X     –       –      –
Data generation from specification    X*    X       –      –
Expresses homogeneous collections     X     X       X      d**
Expresses tuples                      X     X       d***   d***
Expresses regular data structures     X     –       –      –
Closed maps and types                 –     X       d      –

*: Care has to be taken not to get stuck in an infinite loop when generating data from a recursive spec.
**: Homogeneity is easily expressed like (every? int? [1 2 'a]), but the result is only a boolean.
***: Clojure destructuring makes it easy to pick apart and validate tuples to a degree; the length has to be checked separately.
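Footnote *** can be illustrated with a short sketch of destructuring-based tuple validation in plain Clojure; the helper name is hypothetical and not part of the thesis code:

```clojure
;; Hypothetical helper illustrating footnote ***: destructuring picks a
;; tuple apart, but the length must be checked separately.
(defn name-version-pair? [tuple]
  (let [[name version] tuple]   ; destructure the two positions
    (and (= 2 (count tuple))    ; length checked explicitly
         (or (string? name) (symbol? name))
         (string? version))))

(name-version-pair? '[stencil "0.5.0"]) ; => true
(name-version-pair? '[stencil])         ; => false, too short
```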

5.3.1 Function specifications

Both Spec and Schema offer function specifications. In Spec they are declared separately and later attached to some function. In the below example the function stringify, which takes a collection and applies the function str to each element in it, is given a specification.

(defn stringify [coll]
  (apply str coll))

(spec/fdef stringify
           :args (spec/cat :collection coll?)
           :ret string?)

(spec-test/instrument `stringify) ; Enforce the spec on call.


(stringify [1 2 3]) ; => "123"
(stringify 1)       ; Exception: Call to stringify did not conform ...

The implication of function specs being separate from their functions is that they can be reused for multiple functions.

(defn log-str [coll]
  (str "Data: " coll))

(spec/def log-str `stringify)
(spec-test/instrument `log-str)

(log-str [1 2 3]) ; => "Data: [1 2 3]"
(log-str "hat")   ; Exception: Call to log-str did not conform ...

Spec is also capable of verifying whether a function itself conforms to a function spec, using property-based testing.

(spec/valid? `stringify #(str "Data:" %)) ; => true
(spec/valid? `stringify #(vec "Data:" %)) ; => false

In Schema on the other hand a function schema is declared in conjunction with the function itself:

(schema/defn stringify :- schema/Str
  [coll :- (schema/pred coll?)]
  (apply str coll))

(stringify [1 2 3]) ; => "123"
(schema/with-fn-validation (stringify 1))
;; Exception: Input to stringify does not match schema ...

Unfortunately, they cannot be detached from the function they were declared with and applied to some other function. Even though there exist facilities to extract the function schema from a function, it is only useful for documentation purposes and cannot be used for validation.

(defn log-str [coll]
  (str "Data: " coll))

(def log-str (schema/schematize-fn log-str (schema/fn-schema stringify)))

;; We can see the string representation is attached.
(schema/fn-schema log-str) ; => (=> Str (pred coll?))

;; But the below call would fail had the schema been enforced.
(schema/with-fn-validation (log-str 1)) ; => "Data: 1"

It is also well worth mentioning that Schema's function validation is not thread safe, because the switch that controls whether functions are validated is a piece of global state modified when the with-fn-validation macro is called.
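A minimal sketch of the resulting race, assuming two concurrent threads; the schema-checked function and the timings are illustrative, not from the thesis:

```clojure
;; Hypothetical illustration of the thread-safety problem described above:
;; with-fn-validation flips a single piece of global state, so validation
;; enabled in one thread is visible to all threads for the duration.
(require '[schema.core :as schema])

(schema/defn stringify-int :- schema/Str
  [x :- schema/Int]
  (str x))

;; Thread A turns validation on globally for the duration of its body.
(future
  (schema/with-fn-validation
    (Thread/sleep 100) ; hold the validation window open
    (stringify-int 1)))

;; Thread B (here the main thread) expects no validation, but if this call
;; happens to run inside A's window it throws a schema error instead of
;; returning ":oops".
(stringify-int :oops)
```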


5.3.2 Data generation

Assuming that the programmer keeps within the bounds of the DSL or API that Schema and Spec provide, both libraries come with support libraries that enable the generation of data from specifications. Since the Spec API allows the programmer to express a wider range of data structures, it also allows him or her to generate that same wider range of data. While Spec was built with data generation in mind and provides built-in facilities for limiting the amount of data generated, it is not without issues. With Spec it is possible to express an upper limit to how many elements should be generated for every1, every-kv, coll-of and map-of, but not for keys and keys* nor any of the regex operations. Perhaps the most obvious example of why one would want to impose such a limitation is a tree with an arbitrary number of branches:

(spec/def ::tree (spec/coll-of ::node))
(spec/def ::node (spec/or :children ::tree
                          :terminal any?))

(require '[clojure.spec.gen :as gen])
(gen/generate (spec/gen ::tree)) ; Stuck in an endless loop.

Upon calling gen/generate the JVM becomes unresponsive, evaluating either until it reaches a java.lang.OutOfMemoryError: GC overhead limit exceeded2 or receives a SIGKILL. With this particular spec it is possible to prevent the issue, since coll-of supports the named argument :gen-max.

(spec/def ::tree (spec/coll-of ::node :gen-max 2))
(gen/generate (spec/gen ::tree)) ; => [[[-1 1128 []]] 7 -1381834]

But if the data structure being expressed had to use the Spec regex operations, that would not be possible. An example of such a structure is the popular Hiccup syntax [26], which expresses XML in Clojure data structures:

(spec/def ::hiccup
  (spec/cat :tag keyword?
            :attributes (spec/? map?)
            :content (spec/* (spec/or :terminal string?
                                      :element ::hiccup))))

When asked whether the Clojure development team intended to implement a named argument :gen-max-opt for keys, Clojure maintainer Alex Miller explained that there were no such plans. Attaching a custom generator to a spec instead entails learning how the generative testing library test.check3 works and what it defines a generator to be.

1The macros every and every-kv behave like coll-of and map-of but do not exhaustively check every element in the collection.
2Which by default means that the JVM spends more than 98% of its time doing GC recovering less than 2% of the heap and decides to terminate the thread.
3Both Spec and Schema use test.check to provide the implementation of the data generation.


Schema on the other hand does not suffer from the same issues and manages to produce useful output even though the programmer has imposed no upper limit to the generation:

(declare tree)
(schema/defschema node (schema/cond-pre schema/Int
                                        (schema/recursive #'tree)))
(schema/defschema tree [node])

(require '[schema-generators.generators :as gen])
(gen/generate tree)
;; => [-260 [[0 [[] []] 1 [] []] [[] -1] -3 -3 [[] -2 -1 -1] [] 0 [[-1]]

5.3.3 Regular data structures

A regular structure is one which abides by the same restrictions as a regular language, only that it deals with sequences in data structures instead of strings. One facet of the Spec API that many newcomers struggle with is that there are two classes of functions in it: one class that combines using the semantics of regular expressions, and one class that, when combined, always defines new subsequences. The difference between the function pairs & and and, or alt and or, cannot be inferred from the names and must be read in the docs. For example:

;; spec/and is not a regex operation; it simply combines predicates.
(gen/sample (spec/gen (spec/and #{:a :c} #{:b :a :c})))
;; => (:c :a :c :a :a :a :a :a :a :c)

;; spec/& is a regex operation and thus describes a subsequence.
(gen/sample (spec/gen (spec/& #{:a :c} #{:b :a :c})))
;; => ([:c] [:a] [:a] [:c] [:c] [:c] [:c] [:a] [:c] [:c])

To further exemplify: the function alt expresses alternatives within nested regex specs, while or is for non-regex use and will produce a new subsequence when used inside a regex. In the below code snippet we have the regex spec ::numbers, which uses the regex operation *, meaning “zero or more”; ::numbers is then concatenated with a string using Spec's cat macro.

(spec/def ::numbers (spec/* number?))
(spec/def ::or-spec (spec/cat :num (spec/or :num ::numbers)
                              :str string?))

(spec/valid? ::numbers [1 2 3])       ; => true
(spec/valid? ::or-spec [1 2 3 "a"])   ; => false
(spec/valid? ::or-spec [[1 2 3] "a"]) ; => true

As seen above, or starts a new subsequence with its own regex context. In contrast, alt becomes part of the existing regex context.

(spec/def ::alt-spec (spec/cat :num (spec/alt :num ::numbers)
                               :str string?))

(spec/valid? ::alt-spec [1 2 3 "a"])   ; => true
(spec/valid? ::alt-spec [[1 2 3] "a"]) ; => false


Neither Schema nor Truss, on the other hand, has any concept of concatenating sequences without them matching individual sub-sequences. For Schema the user has to exit the DSL completely and leave it up to a constraining function. A sequence4 of keys and values like :a 5 :b 7 would with Schema have to be expressed like:

(def kw-int-seq
  (schema/constrained
    schema/Any ; Schema which allows for anything.
    #(and (even? (count %))
          (every? keyword? (take-nth 2 %))
          (every? integer? (take-nth 2 (rest %))))))

(schema/check kw-int-seq '(:a 1 :b 2)) ; => nil (which means valid)

which can be contrasted to the equivalent spec:

(spec/def ::kw-int-seq (spec/* (spec/cat :kw keyword?
                                         :int integer?)))

(spec/valid? ::kw-int-seq '(:a 1 :b 2)) ; => true

The type or DSL part of the schema allows anything to be in the sequence; it is then up to the given constraining anonymous function5 to check the contents of the sequence.

5.3.4 Named arguments

Perhaps due to its issues with regular data structures, Schema has no built-in way of expressing Clojure's named arguments. The lead author of Schema, Jason Wolfe, writes:

“So, we (prismatic) basically frowned on keyword rest arguments internally, prefer- ring to always pass an explicit map (not as a rest arg). I can go into the reasons if you find it interesting, but basically that + the required additional complexity means that schema does not support schemas for rest keyword arguments.” [41]

In other words, the author of this thesis had to introduce his own function to match key-value sequences where the order of the key-value pairs is of no importance. The function was named key-val-seq? and has two arities. The unary body merely checks that there is an even number of items in the key-value sequence and that the first item of each pair is a keyword. The binary body verifies that all the values in the key-value sequence match the schema given for that key in the validation map.

(defn key-val-seq?
  ([kv-seq]
   (and (even? (count kv-seq))
        (every? keyword? (take-nth 2 kv-seq))))
  ([kv-seq validation-map]
   (and (key-val-seq? kv-seq)
        (every? nil? (for [[k v] (partition 2 kv-seq)]
                       (if-let [schema (get validation-map k)]
                         (schema/check schema v)
                         :schema/invalid))))))

4As opposed to a map of keys and values. This is how named arguments are given in Clojure.
5An anonymous function is simply a function without a name. #(some %) is a shorthand for (fn [arg] (some arg)) and defines a new anonymous function with one argument.

(defn ab-arguments? [kv-seq]
  (key-val-seq? kv-seq {:a schema/Num :b schema/Num}))

(def exclusion-args (schema/constrained schema/Any ab-arguments?))

In addition, a generator has to be written and added to a map of generators passed to the generate function of schema-generators if one wishes to generate data from a schema containing a construct like this. To reiterate: Schema, expressed in its DSL, cannot match the sequence 5 :max 10 :min 1, which would represent the arguments to the function

(defn get-ints [nr & {:keys [max min]}]
  (take nr (repeatedly #(int (+ (* (rand) (- max min -1)) min)))))

At best the Schema DSL can express 5 :a :b, that is: one element which is a number followed by any number of elements of keyword type. The function key-val-seq? was then reused for the Truss and Plain Clojure specifications, though, due to the complete lack of a function specification concept, the validation has to be written in the bodies of the functions, if not in their :pre or :post conditions. This allows the normal Clojure destructuring primitives to be used; something which is not available when writing schemas.
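As a sketch of this last point, a plain Clojure variant of get-ints (hypothetically named get-ints-checked; not part of the thesis code) can destructure the named arguments directly and validate them in a :pre condition:

```clojure
;; Hypothetical variant of get-ints: plain Clojure destructures the named
;; arguments directly and validates them in a :pre condition, something
;; the Schema DSL cannot express.
(defn get-ints-checked [nr & {:keys [max min]}]
  {:pre [(integer? nr) (integer? max) (integer? min) (<= min max)]}
  (take nr (repeatedly #(int (+ (* (rand) (- max min -1)) min)))))

(get-ints-checked 3 :max 10 :min 1) ; => three integers between 1 and 10
;; (get-ints-checked 3 :min 1) throws an AssertionError, since max is nil
```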

5.3.5 Closed maps and types

Clojure Spec by design does not provide facilities to limit the keys present in a map, only ways to express that at least these keys should be present and that optionally these others may be present. This comes from an ideological stance that there is no harm in accepting or giving back too much data. By contrast, maps in Schema are closed by default:

(schema/defschema user
  {:id   schema/Int
   :name schema/Str})

(schema/check user {:id 1 :name "Leif"})    ; => nil
(schema/check user {:id 1})                 ; => {:name missing-required-key}
(schema/check user {:id 1 :name "My" :c 2}) ; => {:c disallowed-key}

But they can easily be opened up:


(schema/defschema user
  {:id            schema/Int
   :name          schema/Str
   schema/Keyword schema/Any})

(schema/check user {:id 1 :name "Leif" :c 2}) ; => nil

Truss again behaves differently from both Spec and Schema. It only comes with built-in support for checking the presence of keys and does not also validate the values held under those keys. It does, however, have syntax for specifying that a map should have a subset, superset or exactly some set of keys.

;; Exact keyset
(truss/have [:ks= #{:id :name}] {:id 5 :name "My"})      ; => {:id 5, :name "My"}
(truss/have [:ks= #{:id :name}] {:id 5 :name "My" :c 1}) ; Exception: Invariant violation in ...
(truss/have [:ks= #{:id :name}] {:id 5})                 ; Exception: Invariant violation in ...

;; Subset
(truss/have [:ks<= #{:id :name}] {:id 5})                ; => {:id 5}

;; Superset
(truss/have [:ks>= #{:id :name}] {:id 5 :name "My" :c 1})
;; => {:id 5, :name "My", :c 1}

But in order to assert that the values held under these keys fulfil some condition, additional code has to be written, like the req-key macro the author wrote:

(defmacro req-key [key predicate data]
  `(do (truss/have [:ks>= #{~key}] ~data)
       (truss/have ~predicate (get ~data ~key))
       ~data))

Which then can be used like:

(req-key :name string? {:id 5 :name "My"}) ; => {:id 5, :name "My"}
(req-key :name string? {:id 5 :name 5})
;; Exception: Invariant violation in `user:551`.
;; Test form `(string? (clojure.core/get {:id 5, :name 5} :name))`
;; failed against input val `5`.

5.3.6 Other minor points of interest

Here follows a set of subjects intended for the reader interested in the minute details of implementing a specification. Subjects such as: if keysets are to be shared between maps with Spec, they have to be manipulated and put in place using macros rather than functions. Or that int? only accepts fixed-precision integers, unlike integer?. Or that if a custom conformer⁶ is written with Spec, a custom unformer and generator also have to be written if those features are to function.

⁶A conformer is a function which takes some data and returns a transformation of it if the data is considered valid. One may see it as a building block of a high-level parser.
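To make the int?/integer? point concrete, and to sketch what a conformer looks like, consider the following REPL session. The ::str->int spec is a hypothetical example written for this illustration, not part of the thesis implementation; the spec alias refers to clojure.spec as in the rest of this chapter (renamed clojure.spec.alpha in the released Clojure 1.9).

```clojure
;; int? accepts only fixed-precision integers (Long, Integer, Short, Byte),
;; while integer? also accepts arbitrary-precision ones:
(int? 1)       ; => true
(int? 1N)      ; => false, 1N is a clojure.lang.BigInt
(integer? 1N)  ; => true

;; A conformer transforms the data while validating it. Without a paired
;; unformer and a custom generator, unform and gen will not work for it.
(spec/def ::str->int
  (spec/conformer
    (fn [s]
      (try (Long/parseLong s)
           (catch Exception _ ::spec/invalid)))))

(spec/conform ::str->int "42")  ; => 42
(spec/conform ::str->int "4x")  ; => :clojure.spec/invalid
```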

5.3.6.1 Mutually exclusive keys in maps

Clojure Spec is specifically designed to keep maps open and extensible. There is no built-in way of prohibiting a key from being present in a map, and thus no way to allow key :a to be present iff :b is not. But that does not prevent the developer from expressing such a relation using the rest of Clojure.

(defn xor? [coll a-key b-key]
  (let [a (contains? coll a-key)
        b (contains? coll b-key)]
    (or (and a (not b))
        (and b (not a)))))

(spec/def ::diameter float?)
(spec/def ::radius float?)
(spec/def ::circle
  (spec/and (spec/keys :opt-un [::radius ::diameter])
            #(xor? % :radius :diameter)))

(spec/valid? ::circle {:radius 10.0})               ; => true
(spec/valid? ::circle {:radius 10.0 :diameter 5.0}) ; => false

5.3.6.2 Schema considers nil a sequence

Clojure has a special value for “nothing”: nil. It is the analogue of Python’s None and Java’s null, but is sometimes used differently than in those languages. Given a schema for a sequence of integers:

(schema/defschema int-seq [schema/Int])

(schema/check int-seq [12]) ; => nil
(schema/check int-seq [])   ; => nil
(schema/check int-seq :a)   ; => "(not (sequential? :a))"
(schema/check int-seq nil)  ; => nil

What may seem strange is that nil is considered a valid sequence of integers. This is by design,⁷ as some core Clojure functions treat nil as a sequence. In order to make the schema behave like the equivalent Spec and Truss code, it has to be further constrained:

(schema/defschema int-seq (schema/constrained [schema/Int] seq?))

(schema/check int-seq nil) ; => "(not (seq? nil))"
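As a side note, the claim that core Clojure functions treat nil as a sequence is easy to confirm at the REPL; a brief sketch:

```clojure
;; Core sequence functions accept nil and treat it as the empty sequence:
(seq nil)      ; => nil
(first nil)    ; => nil
(rest nil)     ; => ()
(count nil)    ; => 0
(map inc nil)  ; => ()

;; But nil is not itself a seq, which is exactly what the constrained
;; schema above tests for:
(seq? nil)         ; => false
(sequential? nil)  ; => false
```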

⁷Further reading on nil as a sequence: https://github.com/plumatic/schema/issues/195


5.3.6.3 Sets cannot be used as predicates for false values

Since Spec is based around the idea of predicate functions, and sets can be used as functions, its user’s guide [42] explains how to use them to ask questions about set membership. A set will return the given argument if it is a member of the set and nil otherwise.

(#{:a :b :c false} :a)    ; => :a
(#{:a :b :c false} false) ; => false
(#{:a :b :c false} :d)    ; => nil

The problem is that Spec then interprets this return value as the answer to “is the given element a member of this set?”, which is incorrect whenever the set contains false or nil, since those members are themselves falsy. The proper way to deal with this is to instead call contains? with the set and the element as arguments.

(contains? #{:a :b :c false} false) ; => true
(contains? #{:a :b :c false} :d)    ; => false

Below is code illustrating how the Spec Guide [42] way of implementing set membership tests fails and how to fix it, along with how Schema and Truss deal with the same problem.

(spec/def ::pedantic? #{:abort :warn :ranges true false})
(spec/valid? ::pedantic? false) ; => false, unexpected result.

(spec/def ::pedantic-2? #(contains? #{:abort :warn :ranges true false} %))
(spec/valid? ::pedantic-2? false) ; => true, expected result.

(def pedantic? (schema/enum :abort :warn :ranges true false))
(schema/check pedantic? false) ; => nil, expected result.

(defn pedantic [val]
  (truss/have [:el #{:abort :warn :ranges true false}] val))
(pedantic false) ; => false, expected result.

5.3.6.4 Error messaging

The error messages given by Schema are simple and short, perhaps erring on the short side when it comes to composite schemas:

(schema/check schema/Int :a) ; => "(not (integer? :a))"

(def int-or-str (schema/cond-pre schema/Int schema/Str))
(schema/check int-or-str :a)
;; => (not (matches-some-precondition? :a))

The error messaging of Spec is wordier.


(spec/def ::int integer?)
(spec/explain ::int :a)
;; => val: :a fails spec: :ns/int predicate: integer?

(spec/def ::int-or-str (spec/or :int integer? :str string?))
(spec/explain ::int-or-str :a)
;; => val: :a fails spec: :ns/int-or-str at: [:int] predicate: integer?
;; val: :a fails spec: :ns/int-or-str at: [:str] predicate: string?

The error messages of Truss are even longer.

(truss/have string? 1)
;; Exception: Invariant violation in `user:573`. Test form `(string? 1)`
;; failed against input val `1`.
;; {:*?data* nil,
;;  :elidable? true,
;;  :dt #inst "2017-06-19T13:57:22.299-00:00",
;;  :val 1,
;;  :ns-str "leiningen.core.truss.project",
;;  :val-type java.lang.Long,
;;  :?err nil,
;;  :*assert* true,
;;  :?data nil,
;;  :?line 573,
;;  :form-str "(string? 1)"}

Truss allows the programmer to attach arbitrary data from the local environment to the error message so as to provide relevant context, a flexibility which can be used as a strategy to avoid nesting asserts to provide error messages, as the author did in his implementation.

(let [local-name {:k "a local value"}]
  (truss/have string? 1 :data local-name))

;; Exception: Invariant violation in `user:574`. Test form `(string? 1)`
;; failed against input val `1`.
;; {:*?data* nil,
;;  ...
;;  :?data {:k "a local value"},
;;  ...}

5.3.6.5 Proposed solution for Spec’s inability to report predicate names

Spec does not report the name of the predicate function in error messages when the predicate is not wrapped in a call to spec.

(spec/explain-str number? 'not-number)
;; => "val: not-number fails predicate: :clojure.spec/unknown"
(spec/explain-str (spec/spec number?) 'not-number)
;; => "val: not-number fails predicate: number?"


To understand this, we first have to understand that functions, such as number?, do not keep a list of the names by which they are referred to. This means that when the function spec/explain-str is given a function such as number? as an argument, it is not possible for it to extract the name by which the caller referred to that function. A macro, on the other hand, is given the symbols rather than the values of the arguments passed to it. In other words, a macro can know the name by which the caller of the macro refers to a function. So what happens in the latter form in the above code block is that spec records the name and stores it in the returned spec, which can then be read by explain-str at a later point. Two of the entry points to Spec’s validation functionality, valid?⁸ and conform, are functions, but the last one, assert, is a macro. This means that it is very straightforward to modify assert to report a useful name for plain predicates. All code in the remainder of this section is in the Clojure Spec namespace; the original assert is defined as:

(defmacro assert [spec x]
  (if *compile-asserts*
    `(if clojure.lang.RT/checkSpecAsserts
       (assert* ~spec ~x)
       ~x)
    x))

The modified version simply converts the argument given in the specc position into a spec if need be:

(defmacro assert' [specc x]
  (if *compile-asserts*
    `(let [spec# (if (spec? ~specc) ~specc (spec ~specc))]
       (if clojure.lang.RT/checkSpecAsserts
         (assert* (specize* spec#) ~x)
         ~x))
    x))

It is then capable of reporting the name of the failing predicate, unlike the original implementation.

(assert number? 'sym)
;; Spec assertion failed val: sym fails predicate: :clojure.spec/unknown
;; :clojure.spec/failure :assertion-failed

(assert' number? 'sym)
;; Spec assertion failed val: sym fails predicate: number?
;; :clojure.spec/failure :assertion-failed
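The mechanism assert' relies on, a macro receiving unevaluated forms rather than values, can be demonstrated in isolation. value-of and name-of are hypothetical toy definitions written for this illustration:

```clojure
;; A function only ever receives evaluated values; the function object
;; itself carries no record of the symbol the caller used:
(defn value-of [f] f)
(value-of number?)  ; => #object[clojure.core$number_QMARK_ ...]

;; A macro receives the unevaluated argument form, here the symbol itself:
(defmacro name-of [f] `'~f)
(name-of number?)   ; => number?
```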

In a similar fashion, if explain-str is turned into a macro it too could provide these improved error messages:

⁸valid? is in fact just a wrapper around conform.


(defmacro explain-str' [specc x]
  `(with-out-str
     (explain (if (spec? ~specc) ~specc (spec ~specc)) ~x)))

(explain-str number? 'sym)  ; => "val: sym fails predicate: :clojure.spec/unknown"
(explain-str' number? 'sym) ; => "val: sym fails predicate: number?"

5.4 In a wider context

This work relates to the methods one may employ to produce maintainable code faster and more reliably. It is neutral with respect to ethical concerns as such; one may employ these methods for any given person’s definition of good or evil. Hopefully this work may also provide some clues to the authors of each library as to what works well and what is still missing from the respective libraries, both in terms of API design and performance. It is also the hope of the author that this work may be extended to compare different versions of the discussed libraries and automatically update with each release, so as to spur further developments in performance, as seen in Mozilla’s arewefastyet.com, a benchmark of its and other browsers over time. If there was one question the author would like to delve further into, had he the time and funding, it would be whether or not Spec improves the error messaging of Clojure: a usability study on error message comprehension.

6 Conclusion

The author hopes that this writing has in some shape or form met the goal of evaluating Clojure Spec and positioning it in a context of competing technology. We can conclude that the median validation time of the implemented spec is always, within the margin of error, faster than the schema. The data chosen for validation is perhaps too complex for Truss, but the median validation time of the truss was still better than that of the spec. Plain Clojure code is almost always faster, but it will only provide the user with a boolean value of whether the datum is valid or not, with no further explanation. Use Spec if the data being described is complex or its property-based testing capabilities seem desirable. If, on the other hand, the data is simple and there are only a few facts at a time that need to be asserted, use Truss. The multitude of different types of structures within the Leiningen project map should cover most of what one could find in the real world, and as such the results of this study (or at least conclusions drawn from the breakdowns in Appendix C) should be generalizable to quite a wide field.


Bibliography

[1] Abran, A. 2010. Software metrics and software metrology. John Wiley & Sons.

[2] An experiment on measuring complexity of Clojure code: 2017. https://github.com/lokori/uncomplexor.

[3] Anatomy of a Macro: 2017. http://www.braveclojure.com/writing-macros/#Anatomy_of_a_Macro.

[4] Annotate - A library for adding type annotations to functions and checking those types at runtime: 2017. https://github.com/roomkey/annotate.

[5] Assertions API for Clojure/Script: 2017. https://github.com/ptaoussanis/truss.

[6] Banker, R.D. et al. 1993. Software complexity and maintenance costs. Communications of the ACM. 36, 11 (1993), 81–95.

[7] Brzozowski, J.A. 1964. Derivatives of regular expressions. Journal of the ACM (JACM). 11, 4 (1964), 481–494.

[8] Bull, J.M. et al. 2001. Benchmarking Java against C and Fortran for scientific applications. Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande (2001), 97–105.

[9] CLOC - Counts Lines of Code: 2017. https://github.com/AlDanial/cloc.

[10] clojure.spec - Rationale and Overview: 2017. https://clojure.org/about/spec.

[11] Compojure - A concise routing library for Ring/Clojure: 2017. https://github.com/weavejester/compojure.

[12] Coppick, J.C. and Cheatham, T.J. 1992. Software metrics for object-oriented systems. Proceedings of the 1992 ACM annual conference on Communications (1992), 317–322.

[13] core.typed - An optional type system for Clojure: 2017. https://github.com/clojure/core.typed.

[14] Curtis, B. et al. 1979. Third time charm: Stronger prediction of programmer performance by software complexity metrics. Proceedings of the 4th international conference on Software engineering (1979), 356–360.

[15] Destructuring in Clojure: 2017. https://clojure.org/guides/destructuring.

[16] Figwheel - hot loads it into the browser as you are coding! 2017. https://github.com/bhauman/figwheel.

[17] Fink, G. and Bishop, M. 1997. Property-based testing: a new approach to testing for assurance. ACM SIGSOFT Software Engineering Notes. 22, 4 (1997), 74–80.

[18] Fitzsimmons, A. and Love, T. 1978. A review and evaluation of software science. ACM Computing Surveys (CSUR). 10, 1 (1978), 3–18.

[19] Five Differences between clojure.spec and Schema: 2017. http://www.lispcast.com/clojure.spec-vs-schema.

[20] Good Configuration Feedback is Essential: 2017. http://rigsomelight.com/2016/05/17/good-configuration-feedback-is-essential.html.

[21] Halloway, S. 2009. Programming Clojure. Pragmatic Bookshelf.

[22] Halstead, M.H. 1977. Elements of software science. Elsevier New York.

[23] Halstead, M.H. 1972. Natural laws controlling algorithm structure? ACM SIGPLAN Notices. 7, 2 (1972), 19–26.

[24] Harrison, M.A. 1978. Introduction to formal language theory. Addison-Wesley Longman Publishing Co., Inc.

[25] Herbert - Clojure library defining a schema for edn values: 2017. https://github.com/miner/herbert.

[26] Hiccup - A fast library for rendering HTML in Clojure: 2017. https://github.com/weavejester/hiccup.

[27] Instaparse - What if context-free grammars were as easy to use as regular expressions? 2017. https://github.com/Engelberg/instaparse.

[28] Kitchenham, B.A. et al. 2004. Evidence-based software engineering. Proceedings of the 26th international conference on software engineering (2004), 273–281.

[29] Kraus, J.M. and Kestler, H.A. 2009. Multi-core parallelization in Clojure: a case study. Proceedings of the 6th European Lisp Workshop (2009), 8–17.

[30] Maven - a software project management tool: 2017. https://maven.apache.org/.

[31] McCabe, T.J. 1976. A complexity measure. IEEE Transactions on software Engineering. 4 (1976), 308–320.

[32] Measuring code complexity in Clojure - Stack Overflow: 2017. http://stackoverflow.com/a/42670540/501017.

[33] Might, M. et al. 2011. Parsing with derivatives: a functional pearl. ACM SIGPLAN Notices (2011), 189–195.

[34] Okasaki, C. 1999. Purely functional data structures. Cambridge University Press.

[35] Pattern (Java 2 Platform SE 5): 2010. http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html.

[36] Plösch, R. 2002. Evaluation of assertion support for the Java programming language. Journal of Object Technology. 1, 3 (2002), 5–17.

[37] Re: Clojure Spec, performance and workflows: 2016. https://groups.google.com/d/msg/clojure/kLLvOdtPO_k/Pwtcn9KlCQAJ.

[38] Re: Seeking participants for a study on the usability of Clojure Spec’s error messaging: 2017. https://www.reddit.com/r/Clojure/comments/5vbuu3/seeking_participants_for_a_study_on_the/de0tx10/.

[39] Schema - A Clojure(Script) library for declarative data description and validation: 2017. https://github.com/plumatic/schema.

[40] Schema & Clojure Spec for the Web Developer: 2017. http://www.metosin.fi/blog/schema-spec-web-devs/.

[41] Schema for a function with keyword arguments: 2017. https://groups.google.com/forum/#!topic/prismatic-plumbing/dfR-YyNdEpY.

[42] Spec Guide: 2017. https://clojure.org/guides/spec.

[43] Spectrum - A library for doing static analysis of Clojure code, catching clojure.spec conform errors at compile time: 2017. https://github.com/arohner/spectrum.

[44] State of Clojure 2015 Survey Results: 2016. http://blog.cognitect.com/blog/2016/1/28/state-of-clojure-2015-survey-results.

[45] State of Clojure 2016 Results and Analysis: 2017. http://blog.cognitect.com/blog/2017/1/31/state-of-clojure-2016-results.

[46] Stol, K.-J. and Fitzgerald, B. 2015. A holistic overview of software engineering research strategies. Proceedings of the Third International Workshop on Conducting Empirical Studies in Industry (2015), 47–54.

[47] Strictly Specking: 2017. https://github.com/bhauman/strictly-specking.

[48] The Clojure Website: 2017. https://clojure.org/.

[49] Validation-benchmark: 2017. http://muhuk.github.io/validation-benchmark/.

A Statistical results from generating data with Spec

The following numbers, given in milliseconds, are from benchmarking the generation of 100 samples of data from the different subkeys of project.clj using the test.check generators Clojure Spec provides. Generation is contrasted with validation, which is the main operation performed in the rest of this paper and is shown in Appendices B and C.

mean       min    median   max        std-dev    key
35.034     0.020  23.292   164.945    38.956     :aliases
2.376      0.012  0.048    17.278     3.523      :aot
0.003      0.002  0.002    0.011      0.000      :auto-clean
0.003      0.002  0.002    0.012      0.001      :bootclasspath
4.518      0.105  3.371    20.220     4.019      :certificates
65.235     0.905  60.473   232.314    47.562     :checkout-deps-shares
0.005      0.005  0.005    0.012      0.000      :checksum
59.134     0.055  53.556   160.403    41.741     :classifiers
0.003      0.002  0.002    0.020      0.001      :clean-non-project-classes
63.993     0.420  59.461   225.686    46.901     :clean-targets
0.057      0.014  0.059    0.103      0.024      :compile-path
20.033     0.838  17.696   76.161     13.624     :dependencies
4.192      0.093  3.669    12.105     2.977      :deploy-branches
4.492      0.175  4.076    25.392     3.650      :deploy-repositories
0.145      0.037  0.141    0.274      0.057      :description
0.005      0.005  0.005    0.037      0.003      :eval-in
5.843      0.180  4.857    19.548     4.099      :exclusions
5.259      0.180  4.467    40.950     5.428      :extensions
10439.266  0.152  9.661    50418.869  13750.362  :filespecs
15.456     0.052  12.640   76.839     14.930     :global-vars
0.003      0.003  0.003    0.014      0.001      :hooks
0.003      0.003  0.003    0.009      0.000      :implicit-hooks
0.003      0.003  0.003    0.022      0.001      :implicit-middleware
5.442      0.024  5.006    22.210     4.528      :injections
0.002      0.002  0.002    0.010      0.000      :install-releases?
4.717      0.075  4.337    22.686     3.573      :jar-exclusions
0.056      0.014  0.055    0.116      0.023      :jar-name
6.149      0.425  6.095    20.219     3.716      :java-agents
0.064      0.015  0.067    0.156      0.030      :java-cmd
4.428      0.181  3.708    14.250     3.146      :javac-options
4.927      0.300  4.342    15.318     3.295      :java-source-paths
5.021      0.104  4.084    17.251     3.756      :jvm-opts
0.149      0.017  0.159    0.379      0.099      :license
0.273      0.032  0.257    0.696      0.157      :licenses
0.054      0.014  0.055    0.091      0.021      :local-repo
0.546      0.026  0.533    1.521      0.381      :mailing-list
0.788      0.039  0.717    2.191      0.430      :mailing-lists
1.010      0.044  1.060    2.079      0.591      :main
17.354     0.536  16.460   64.949     10.434     :managed-dependencies
16.519     0.042  15.778   52.331     12.442     :manifest
15.680     1.098  15.159   70.472     8.733      :middleware
0.118      0.088  0.119    0.182      0.016      :min-lein-version
0.281      0.024  0.303    0.807      0.190      :mirrors
0.004      0.003  0.004    0.014      0.001      :monkeypatch-clojure-test
0.053      0.014  0.052    0.117      0.021      :native-path
0.003      0.003  0.003    0.010      0.000      :offline?
0.002      0.002  0.002    0.011      0.000      :omit-source
0.543      0.168  0.354    2.543      0.488      :parent
0.005      0.005  0.005    0.025      0.002      :pedantic?
3.963      0.163  3.144    12.268     2.892      :plugin-repositories
13.079     1.571  12.434   42.030     8.288      :plugins
5.722      0.233  1.987    24.715     6.289      :pom-addition
0.055      0.014  0.055    0.108      0.023      :pom-location
104.099    0.067  107.824  325.427    70.815     :pom-plugins
12.508     0.225  12.556   33.342     6.351      :prep-tasks
675.577    0.076  379.180  3306.051   777.520    :profiles
12.580     0.502  11.089   35.899     7.249      :release-tasks
96.274     0.032  92.987   254.056    72.608     :repl-options
4.790      0.194  4.451    14.585     3.338      :repositories
4.208      0.112  3.568    21.361     3.398      :resource-paths
0.161      0.016  0.153    0.454      0.114      :scm
0.719      0.022  0.631    2.273      0.452      :signing
4.405      0.080  4.055    16.604     3.271      :source-paths
0.051      0.014  0.045    0.119      0.024      :target-path
4.507      0.071  4.154    13.091     3.172      :test-paths
78.994     0.044  76.627   235.074    54.854     :test-selectors
4.498      0.055  4.115    12.449     3.189      :uberjar-exclusions
34.778     0.072  30.465   121.628    22.932     :uberjar-merge-with
0.060      0.013  0.061    0.353      0.037      :uberjar-name
0.005      0.005  0.005    0.019      0.001      :update
0.253      0.176  0.224    0.991      0.096      :url
0.002      0.002  0.002    0.012      0.000      :warn-on-reflection


B Validation time broken down by keyword per system

Figure B.1 displays the spread of validation time broken down on a per-keyword basis for Truss. Figures B.2–B.4 do the same for Spec, Schema and plain Clojure.

[Figure B.1: Validation time per keyword for Truss. Log-scale axis, 0.0001–10 milliseconds.]

[Figure B.2: Validation time per keyword for Spec. Log-scale axis, 0.0001–10 milliseconds.]

[Figure B.3: Validation time per keyword for Schema. Log-scale axis, 0.0001–10 milliseconds.]

[Figure B.4: Validation time per keyword for plain Clojure validation. Log-scale axis, 0.0001–10 milliseconds.]

C Validation time broken down by keyword grouped by system

Validation time broken down by keyword, but unlike Appendix B, each system is shown side by side. The keywords span Figures C.1–C.4.

[Figure C.1: Validation time per keyword of all systems (Spec, Truss, Schema, Plain Clojure) side by side, for keys :aliases to :exclusions. Log-scale axis, 0.0001–10 milliseconds.]

[Figure C.2: Validation time per keyword of all systems side by side, for keys :filespecs to :main. Log-scale axis, 0.0001–10 milliseconds.]

[Figure C.3: Validation time per keyword of all systems side by side, for keys :managed-dependencies to :release-tasks. Log-scale axis, 0.0001–10 milliseconds.]

[Figure C.4: Validation time per keyword of all systems side by side, for keys :repl-options to :warn-on-reflection. Log-scale axis, 0.0001–10 milliseconds.]