Open Pattern Matching for C++
Total Page:16
File Type:pdf, Size:1020Kb
Open Pattern Matching for C++ Yuriy Solodkyy Gabriel Dos Reis Bjarne Stroustrup Texas A&M University College Station, Texas, USA fyuriys,gdr,[email protected] Abstract 1.1 Summary Pattern matching is an abstraction mechanism that can greatly sim- We present functional-style pattern matching for C++ built as an plify source code. We present functional-style pattern matching for ISO C++11 library. Our solution: C++ implemented as a library, called Mach71. All the patterns are • is open to the introduction of new patterns into the library, while user-definable, can be stored in variables, passed among functions, not making any assumptions about existing ones. and allow the use of class hierarchies. As an example, we imple- • is type safe: inappropriate applications of patterns to subjects ment common patterns used in functional languages. are compile-time errors. Our approach to pattern matching is based on compile-time • Makes patterns first-class citizens in the language (§3.1). composition of pattern objects through concepts. This is superior • is non-intrusive, so that it can be retroactively applied to exist- (in terms of performance and expressiveness) to approaches based ing types (§3.2). on run-time composition of polymorphic pattern objects. In partic- • provides a unified syntax for various encodings of extensible ular, our solution allows mapping functional code based on pattern hierarchical datatypes in C++. matching directly into C++ and produces code that is only a few • provides an alternative interpretation of the controversial n+k percent slower than hand-optimized C++ code. patterns (in line with that of constructor patterns), leaving the The library uses an efficient type switch construct, further ex- choice of exact semantics to the user (§3.3). tending it to multiple scrutinees and general patterns. We compare • supports a limited form of views (§3.4). the performance of pattern matching to that of double dispatch and • generalizes open type switch to multiple scrutinees and enables open multi-methods in C++. patterns in case clauses (§3.5). • demonstrates that compile-time composition of patterns through Categories and Subject Descriptors D.1.5 [Programming tech- concepts is superior to run-time composition of patterns through niques]: Object-oriented Programming; D.3.3 [Programming Lan- polymorphic interfaces in terms of performance, expressive- guages]: Language Constructs and Features ness, and static type checking (§4.1). General Terms Languages, Design Our library sets a standard for the performance, extensibility, brevity, clarity, and usefulness of any language solution for pat- Keywords Pattern Matching, C++ tern matching. It provides full functionality, so we can experiment with the use of pattern matching in C++ and compare it to existing 1. Introduction alternatives. Our solution requires only current support of C++11 Pattern matching is an abstraction mechanism popularized by without any additional tool support. the functional programming community, most notably ML [12], OCaml [21], and Haskell [15], and recently adopted by several 2. Pattern Matching in C++ multi-paradigm and object-oriented programming languages such The object analyzed through pattern matching is commonly called as Scala [30], F# [7], and dialects of C++[22, 29]. The expressive the scrutinee or subject, while its static type is commonly called power of pattern matching has been cited as the number one reason the subject type. Consider for example the following definition of for choosing a functional language for a task [6, 25, 28]. factorial in Mach7: This paper presents functional-style pattern matching for C++. To allow experimentation and to be able to use production-quality int factorial(int n) f toolchains (in particular, compilers and optimizers), we imple- unsigned short m; mented our matching facilities as a C++ library. Match(n) f Case(0) return 1; 1 The library is available at http://parasol.tamu.edu/mach7/ Case(m) return m∗factorial(m−1); Case( ) throw std::invalid argument(”factorial”); g EndMatch Permission to make digital or hard copies of all or part of this work for personal or g classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation The subject n is passed as an argument to the Match statement on the first page. Copyrights for components of this work owned by others than ACM and is then analyzed through Case clauses that list various pat- must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, terns. In the first-fit strategy typically adopted by functional lan- to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. guages, the matching proceeds in sequential order while the pat- GPCE ’13, October 27–28, 2013, Indianapolis, Indiana, USA. terns guarding their respective clauses are rejected. Eventually, the Copyright © 2013 ACM 978-1-4503-2373-4/13/10. $15.00. statement guarded by the first accepted pattern is executed or the http://dx.doi.org/10.1145/2517208.2517222 control reaches the end of the Match statement. The value 0 in the first case clause is an example of a value pat- This == is an example of a binary method: an operation that re- tern. It will match only when the subject n is 0. The variable m in quires both arguments to have the same type [3]. In each of the the second case clause is an example of a variable pattern. It will case clauses, we check that both subjects are of the same dynamic bind to any value that can be represented by its type. The name in type using a constructor pattern. We then decompose both subjects the last case clause refers to the common instance of the wildcard into components and compare them for equality with the help of a pattern. Value, variable, and wildcard patterns are typically referred variable pattern and an equivalence combinator (‘+’) applied to it. to as primitive patterns. The list of primitive patterns is often ex- The use of an equivalence combinator turns a binding use of a vari- tended with a predicate pattern (e.g. as seen in Scheme [49]), which able pattern into a non-binding use of that variable’s current value allows the use of any unary predicate or nullary member-predicate as a value pattern. We chose to overload unary + because in C++ it as a pattern: e.g. Case(even) ::: (assuming bool even(int);) or turns an l-value into an r-value, which has a similar semantics here. Case([](int m) f return m^m−1; g) ::: for λ-expressions. In general, a pattern combinator is an operation on patterns The predicate pattern is a use of a predicate as a pattern and to produce a new pattern. Other typical pattern combinators, sup- should not be confused with a guard, which is a predicate attached ported by many languages, are conjunction, disjunction and nega- to a pattern that may make use of the variables bound in it. The tion combinators, which all have an intuitive Boolean interpreta- result of the guard’s evaluation will determine whether the case tion. We add a few non-standard combinators to Mach7 that reflect clause and the body associated with it will be accepted or rejected. the specifics of C++, e.g. the presence of pointers and references. Guards gives rise to guard patterns, which in Mach7 are expres- The equality operator on λ-terms demonstrates both nesting of sions of the form P j=E, where P is a pattern and E is its guard. patterns and relational matching. The variable pattern was nested Pattern matching is closely related to algebraic data types. within an equivalence pattern, which in turn was nested inside In ML and Haskell, an Algebraic Data Type is a data type each a constructor pattern. The matching was also relational because of whose values are picked from a disjoint sum of data types, we could relate the state of two subjects. Both aspects are even called variants. Each variant is a product type marked with a better demonstrated in the following well-known functional so- unique symbolic constant called a constructor. Each construc- lution to balancing red-black trees with pattern matching due to tor provides a convenient way of creating a value of its vari- Chris Okasaki [32, §3.3] implemented in Mach7: ant type as well as discriminating among variants through pat- class Tfenum colorfblack,redg col; T∗ left; K key; T∗ right;g; tern matching. In particular, given an algebraic data type D = C1(T11; :::; T1m )S⋯SCk(Tk1; :::; Tkm ) an expression of the form T∗ balance(T::color clr, T∗ left, const K& key, T∗ right) f 1 k const T::color B = T::black, R = T::red; Ci(x1; :::; xmi ) in a non-pattern-matching context is called a value constructor and refers to a value of type D created via the construc- var⟨T∗⟩ a, b, c, d; var⟨K&⟩ x, y, z; T::color col; Match(clr, left, key, right) f tor Ci and its arguments x1; :::; xmi . The same expression in the pattern-matching context is called a constructor pattern and is used Case(B, C⟨T⟩(R, C⟨T⟩(R, a, x, b), y, c), z, d) ::: to check whether the subject is of type D and was created with the Case(B, C⟨T⟩(R, a, x,C⟨T⟩(R, b, y, c)), z, d) ::: constructor Ci. If so, it matches the actual values it was constructed Case(B, a, x,C⟨T⟩(R, C⟨T⟩(R, b, y, c), z, d)) ::: with against the nested patterns xj .