Non-Linear Pattern Matching with Backtracking for Non-Free Data Types

Satoshi Egi (1) and Yuichi Nishiwaki (2)
(1) Rakuten Institute of Technology, Japan
(2) University of Tokyo, Japan

arXiv:1808.10603v2 [cs.PL] 27 May 2019

Abstract. Non-free data types are data types whose data have no canonical forms. For example, multisets are non-free data types because the multiset {a, b, b} has two other equivalent but literally different forms, {b, a, b} and {b, b, a}. Pattern matching is known to provide a handy tool set to treat such data types. Although many studies on pattern matching and implementations for practical programming languages have been proposed so far, we observe that none of these studies satisfies all the criteria of practical pattern matching, which are as follows: i) efficiency of the backtracking algorithm for non-linear patterns, ii) extensibility of the matching process, and iii) polymorphism in patterns. This paper aims to design a new pattern-matching-oriented programming language that satisfies all three criteria. The proposed language features a clean Scheme-like syntax and efficient and extensible pattern-matching semantics. This programming language is especially useful for processing complex non-free data types, which include not only multisets and sets but also graphs and symbolic mathematical expressions. We discuss the importance of our criteria of practical pattern matching and how our language design naturally arises from the criteria. The proposed language has already been implemented and open-sourced as the Egison programming language.

1 Introduction

Pattern matching is an important feature of programming languages featuring data abstraction mechanisms. Data abstraction provides users with a simple method for handling data structures that contain plenty of complex information. Using pattern matching, programs using data abstraction become concise, human-readable, and maintainable.
Most recent practical programming languages allow users to extend data abstraction, e.g., by defining new types or classes, or by introducing new abstract interfaces. Therefore, a good programming language with pattern matching should allow users to extend its pattern-matching facility akin to the extensibility of data abstraction.

Earlier pattern-matching systems assumed a one-to-one correspondence between patterns and data constructors. However, this assumption became problematic when handling data types whose data have multiple representations. To overcome this problem, Wadler proposed views [28], a pattern-matching system that broke the symmetry between patterns and data constructors. Views enabled users to pattern-match against data represented in many ways. For example, a complex number may be represented either in polar or Cartesian form, and the two forms are convertible to each other. Using views, one can pattern-match a complex number internally represented in polar form with a pattern written in Cartesian form, and vice versa, provided that mutual transformation functions are properly defined. Similarly, one can use the Cons pattern to perform pattern matching on lists with joins, where a list [1,2] can be either (Cons 1 (Cons 2 Nil)) or (Join (Cons 1 Nil) (Cons 2 Nil)), if one defines a normalization function from lists with join into a sequence of Cons.

However, views require data types to have a distinguished canonical form among many possible forms. In the case of lists with join, one can pattern-match with Cons because any list with join is canonically reducible to a list with join with the Cons constructor at the head. On the other hand, for any list with join, there is no such canonical form that has Join at the head. For example, the list [1,2] may be decomposed with Join into three pairs: [] and [1,2], [1] and [2], and [1,2] and [].
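The contrast between the Cons and Join patterns described above can be sketched in Python. This is only an illustration under our own assumptions: the function names `cons_decomposition` and `join_decompositions` are hypothetical and do not come from the paper or from any library. Cons yields at most one decomposition, while Join yields one decomposition per split point.

```python
from typing import Iterator, List, Optional, Tuple

def cons_decomposition(xs: List[int]) -> Optional[Tuple[int, List[int]]]:
    """Return the single (head, tail) decomposition, or None for an empty list."""
    if not xs:
        return None
    return xs[0], xs[1:]

def join_decompositions(xs: List[int]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield every pair (left, right) with left + right == xs (hypothetical helper)."""
    for i in range(len(xs) + 1):
        yield xs[:i], xs[i:]

print(cons_decomposition([1, 2]))
# (1, [2])
print(list(join_decompositions([1, 2])))
# [([], [1, 2]), ([1], [2]), ([1, 2], [])]
```

A list of length n has n + 1 Join decompositions, so no single one of them can serve as the canonical form that views require.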
For that reason, views do not support pattern matching of lists with join using the Join pattern.

Generally, data types without canonical forms are called non-free data types. Mathematically speaking, a non-free data type can be regarded as a quotient of a free data type by an equivalence relation. An example of a non-free data type is, of course, the list with join: it may be viewed as a non-free data type composed of a (free) binary tree equipped with an equivalence between trees with the same leaf nodes enumerated from left to right, such as (Join Nil (Cons 1 (Cons 2 Nil))) = (Join (Cons 1 Nil) (Cons 2 Nil)). Other typical examples include sets and multisets, as they are (free) lists with obvious identifications. Generally, as shown for lists with join, pattern matching on non-free data types yields multiple results (see footnote 3). For example, the multiset {1,2,3} has three decompositions by the insert pattern: insert(1,{2,3}), insert(2,{1,3}), and insert(3,{1,2}). Therefore, how to handle multiple pattern-matching results is an extremely important issue when we design a programming language that supports pattern matching for non-free data types.

Pattern guards, on the other hand, are a commonly used technique for filtering such multiple results from pattern matching. Basically, pattern guards are applied after enumerating all pattern-matching results. Therefore, substantial unnecessary enumeration often occurs before the pattern guards are applied. One simple solution is to break a large pattern into nested patterns so that pattern guards are applied as early as possible. However, this solution complicates the program and makes it hard to maintain. It is also possible to statically transform the program in a similar manner at compile time. However, this makes the compiler implementation very complex. Non-linear patterns are an alternative to pattern guards.
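The cost of applying pattern guards only after full enumeration can be sketched in Python. The sketch below is an illustration under our own assumptions, not code from the paper: `insert_decompositions`, `triples_guard_last`, and `triples_guard_early` are hypothetical names, and a multiset is modelled as a plain list. It searches a multiset for three elements x, x + 1, x + 2 and counts how many innermost candidates each strategy enumerates.

```python
from typing import Iterator, List, Tuple

def insert_decompositions(ms: List[int]) -> Iterator[Tuple[int, List[int]]]:
    """Yield (x, rest) for every way of taking one element x out of the multiset ms."""
    for i, x in enumerate(ms):
        yield x, ms[:i] + ms[i + 1:]

def triples_guard_last(ms: List[int]):
    """Enumerate every (x, y, z) first, then apply the guard to the full triple."""
    tried, hits = 0, []
    for x, r1 in insert_decompositions(ms):
        for y, r2 in insert_decompositions(r1):
            for z, _ in insert_decompositions(r2):
                tried += 1
                if y == x + 1 and z == x + 2:
                    hits.append((x, y, z))
    return hits, tried

def triples_guard_early(ms: List[int]):
    """Nested version: reject y as soon as it is bound, before enumerating z."""
    tried, hits = 0, []
    for x, r1 in insert_decompositions(ms):
        for y, r2 in insert_decompositions(r1):
            if y != x + 1:
                continue  # prune this branch early
            for z, _ in insert_decompositions(r2):
                tried += 1
                if z == x + 2:
                    hits.append((x, y, z))
    return hits, tried

print(list(insert_decompositions([1, 2, 3])))
# [(1, [2, 3]), (2, [1, 3]), (3, [1, 2])]
print(triples_guard_last([1, 2, 3, 5, 8]))   # ([(1, 2, 3)], 60)
print(triples_guard_early([1, 2, 3, 5, 8]))  # ([(1, 2, 3)], 6)
```

Both strategies find the same match, but the late guard inspects all 5 * 4 * 3 = 60 triples, whereas the nested version inspects only 6. The price is that the guard has to be split by hand across nesting levels, which is the maintenance burden noted above.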
Non-linear patterns are patterns that allow multiple occurrences of the same variable in a pattern. Compared to pattern guards, they are not only syntactically beautiful but also compiler-friendly. Non-linear patterns are easier to analyze and hence can be implemented efficiently (Sections 3.1 and 4.2). However, it is not obvious how to extend a non-linear pattern-matching system to allow users to define an algorithm to decompose non-free data types. In this paper, we introduce extensible pattern matching to remedy this issue (Sections 3.2, 4.4, and 6). Extensibility of pattern matching also enables us to define predicate patterns, which are typically implemented as a built-in feature (e.g., pattern guards) in most pattern-matching systems. Additionally, we improve the usability of pattern matching for non-free data types by introducing a syntactic generalization for the match expression, called polymorphic patterns (Sections 3.3 and 4.3). We also present a non-linear pattern-matching algorithm that is specialized for backtracking on infinite search trees and that supports pattern matching with infinitely many results while remaining efficient (Section 5).

Footnote 3: In fact, this phenomenon, in which "pattern matching against a single value yields multiple results", does not occur for free data types. It is a unique characteristic of non-free data types.

This paper aims to design a programming language that is oriented toward pattern matching for non-free data types. We summarize the above argument in the form of three criteria that must be fulfilled by a language in order to be used in practice:

1. Efficiency of the backtracking algorithm for non-linear patterns,
2. Extensibility of pattern matching, and
3. Polymorphism in patterns.

We believe that the above requirements, together called the criteria of practical pattern matching, are fundamental for languages with pattern matching.
However, none of the existing languages and studies [5,15,26,10] fulfills all of them. In the rest of the paper, we present a language that satisfies the criteria, together with comparisons with other languages, several working examples, and formal semantics. We emphasize that our proposal has already been implemented in Haskell as the Egison programming language and is open-sourced [6]. Since this paper focuses on the design of the programming language, a detailed discussion of the implementation of Egison is left for future work.

2 Related Work

In this section, we compare our study with prior work. First, we review previous studies on pattern matching in functional programming languages. Our proposal can be considered an extension of these studies.

The first non-linear pattern-matching system was the symbol manipulation system proposed by McBride [21]. This system was developed for Lisp. The paper demonstrates some examples that process symbolic mathematical expressions to show the expressive power of non-linear patterns. However, this approach does not support pattern matching with multiple results, and it only supports pattern matching against a list as a collection.

Miranda laws [27,25,24] and Wadler's views [28,22] are seminal work. These proposals provide methods to decompose data with multiple representations by explicitly declaring transformations between the representations. These are the earliest studies that allow users to customize the execution process of pattern matching. However, the pattern-matching systems in these proposals treat neither multiple pattern-matching results nor non-linear patterns. Also, these studies demand a canonical form for each representation.

Active patterns [15,23] provide a method to decompose non-free data. In active patterns, users define a match function for each pattern to specify how to decompose non-free data.
For example, insert for multisets is defined as a match function in [15]. An example of pattern matching against graphs using match functions is also shown in [16]. One limitation of active patterns is that they do not support backtracking in the pattern-matching process.
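Why backtracking matters for a user-defined match function can be sketched in Python. The sketch is a simplified model under our own assumptions and does not reproduce the API or the exact semantics of active patterns in [15]: `insert_matches`, `find_pair_committed`, and `find_pair_backtracking` are hypothetical names, and a multiset is modelled as a plain list. The task is to find an element that occurs twice in a multiset, which requires binding the same value in two successive insert decompositions.

```python
from typing import Iterator, List, Optional, Tuple

def insert_matches(ms: List[int]) -> Iterator[Tuple[int, List[int]]]:
    """Hypothetical match function for insert: yields each (element, rest) pair."""
    for i, x in enumerate(ms):
        yield x, ms[:i] + ms[i + 1:]

def find_pair_committed(ms: List[int]) -> Optional[int]:
    """Commit to the first decomposition at each step and never revisit a choice."""
    first = next(insert_matches(ms), None)
    if first is None:
        return None
    x, rest = first
    second = next(insert_matches(rest), None)
    if second is None:
        return None
    y, _ = second
    return x if x == y else None

def find_pair_backtracking(ms: List[int]) -> Optional[int]:
    """Try the next decomposition whenever an earlier choice fails."""
    for x, rest in insert_matches(ms):
        for y, _ in insert_matches(rest):
            if x == y:
                return x
    return None

print(find_pair_committed([1, 2, 2]))     # None
print(find_pair_backtracking([1, 2, 2]))  # 2
```

In this model the committed matcher picks 1 first and fails, even though the multiset contains a repeated 2, while the backtracking matcher recovers by trying the remaining decompositions.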
