Compiling Successor ML Pattern Guards
John Reppy, University of Chicago, USA
Mona Zahir, University of Chicago, USA

1 Introduction

Successor ML [SML15] is a collection of proposed language extensions to the 1997 Definition of Standard ML [MTHM97]. A number of these extensions address pattern matching by adding richer record patterns, or-patterns, and pattern guards; the last of these features is the focus of this paper.

Languages such as Haskell, OCaml, and Scala have a restricted form of conditional patterns, where the guards are part of the match or case syntax. Compiling such guards is fairly straightforward [Pet99b]. The pattern guards of Successor ML are more general, since they are a syntactic form of pattern and, thus, can be nested inside other pattern constructs. This more general version of pattern guards allows another extension, called transformation patterns,¹ to be defined as a derived form [Ros08], but it also raises some interesting implementation issues that have not been previously addressed in the literature.

As part of an ongoing effort to extend Standard ML of New Jersey with Successor ML features,² we have been prototyping a new pattern-match compiler. This paper describes the approach that we have devised to handle Successor ML's general form of pattern guards.

There are two common approaches to implementing pattern matching in lower-level code:

• Backtracking-based solutions, which have the advantage that the resulting low-level code is linear in the size of the input, but which may do redundant work [Aug85, LFM01].
• Decision-tree-based solutions, which avoid redundant tests, but can suffer exponential blowup in the size of the generated code [BM85, Pet99b, Mar08].

While these approaches have different properties, in both cases the compilation schemes typically use a pattern matrix as the central data structure and are organized around a set of rules for analyzing and transforming the pattern matrix. Our approach to compiling pattern guards extends the pattern matrix representation with guard columns and associated rules; thus, the approach presented in this paper can be applied to generate either backtracking or decision-tree implementations of the compiled match.

¹ Transformation patterns are a simple view mechanism.
² Rossberg's HaMLet S [Ros08] is the only complete implementation of Successor ML, but it does not compile patterns. Both MLton and Standard ML of New Jersey have implemented a small subset of Successor ML features, but not the pattern extensions.

2 Successor ML Pattern Guards

As mentioned above, Successor ML extends the Standard ML pattern syntax with pattern guards, which have the form "$p$ with $p' = e$." The semiformal semantics of this construct is given by the following inference rules:

\[
\frac{s_0, E, v \vdash p \Rightarrow \mathrm{FAIL}, s_1}
     {s_0, E, v \vdash p \ \mathbf{with}\ p' = e \Rightarrow \mathrm{FAIL}, s_1}
\]

\[
\frac{s_0, E, v \vdash p \Rightarrow VE_1, s_1 \qquad
      s_1, E + VE_1 \vdash e \Rightarrow v', s_2 \qquad
      s_2, E + VE_1, v' \vdash p' \Rightarrow \mathrm{FAIL}, s_3}
     {s_0, E, v \vdash p \ \mathbf{with}\ p' = e \Rightarrow \mathrm{FAIL}, s_3}
\]

\[
\frac{s_0, E, v \vdash p \Rightarrow VE_1, s_1 \qquad
      s_1, E + VE_1 \vdash e \Rightarrow v', s_2 \qquad
      s_2, E + VE_1, v' \vdash p' \Rightarrow VE_2, s_3}
     {s_0, E, v \vdash p \ \mathbf{with}\ p' = e \Rightarrow E + VE_2, s_3}
\]

where $s$ is a state, $VE$ is a value environment, and $v$ is a value. The third rule describes a successful match, where we first test if $p$ matches $v$ and then, if it does, evaluate $e$ to a value $v'$ and test if $p'$ matches $v'$. The variables bound by $p$ are in scope in $e$, and the dynamic environment is further extended by the variables bound in $p'$. Furthermore, the state is threaded through these three steps, which fixes the order of evaluation in the presence of side effects.

Successor ML also provides the form "$p$ if $e$" as syntactic sugar for "$p$ with true $= e$."

The complete semantics of Successor ML pattern matching can be found in the Definition [SML15], but its most important aspect is that patterns (and sub-patterns) are matched in left-to-right order. This order restricts the flexibility of a pattern-match compiler when pattern guards are present, since the order of any side effects that might be present must be preserved. In fact, a guard might even modify the value being matched, as illustrated in the following code snippet:

    let val x = ref 2 in
      case x of
      | (ref 1) => ...
      | (ref _ if (x := 1; false)) => ...
      | (ref 1) => ...
    end

which will match the third clause of the case expression.

3 Clause Matrices and Guard Columns

Our approach to compiling pattern guards can be used in both backtracking and decision-tree compilation schemes. In this section, we describe the common aspects of these two schemes. We assume that the translation produces an expression in which only simple patterns are used (i.e., a tuple, variable, or single constructor application).

For purposes of the presentation, we consider a simplified pattern language, with wildcards, constructors, and the derived form of pattern guards:³

\[
p ::= \_ \ \mid\ c(p_1, \ldots, p_n) \ \mid\ (p \ \mathbf{if}\ e)
\]

Most presentations of pattern-matching algorithms work on a vector of $n$ variables $\vec{x}$ and a clause matrix $P \to L$, where $P$ is an $m \times n$ pattern matrix and $L$ is an $m$-element column vector of actions. This input can be viewed as representing a case of the form

    case ($x_1, \ldots, x_n$) of
    | ($p^1_1, \ldots, p^1_n$) => $l^1$
    | $\cdots$
    | ($p^m_1, \ldots, p^m_n$) => $l^m$

(we follow the convention of using superscripts to denote row indices and subscripts to denote column indices).

We extend this representation of a clause matrix to include columns of guard expressions, where we use the syntax "$\{e\}$" to distinguish a guard from a pattern. We also extend the notation for variable vectors to include the "•" symbol in positions that correspond to guard columns. We define a trivial entry in a pattern matrix to be either a wildcard or the trivial guard $\{\mathtt{true}\}$.

Compilation of the clause matrix proceeds by case analysis on one of the columns of $P$. For simplicity, we usually pick the first column, but when compiling to a decision tree, it may be advantageous to pick a different column [SR00, Mar08]. The details of the case analysis and subsequent transformations depend on the particular compilation scheme, but we need the following transformation for both schemes.

The guard-split transformation $\mathcal{G}_j$ splits the $j$th column of a pattern matrix into two columns: a pattern column and a guard column. $\mathcal{G}_j(P \to L)$ is defined as follows:

\[
\begin{array}{l|l}
p^i_j & \text{Row } i \text{ in } \mathcal{G}_j(P \to L) \\ \hline
(p \ \mathbf{if}\ e) & p^i_1 \ \cdots \ p \ \{e\} \ \cdots \ p^i_n \to l^i \\
p & p^i_1 \ \cdots \ p \ \{\mathtt{true}\} \ \cdots \ p^i_n \to l^i
\end{array}
\]

4 Compiling Pattern Guards with Backtracking

We first consider how to extend the "Classical Backtracking Scheme" $\mathcal{C}$, as described by Le Fessant and Maranget [LFM01], to include pattern guards. The compilation scheme $\mathcal{C}$ takes two arguments, a vector of $n$ variables $\vec{x}$ and a clause matrix $P \to L$ ($P$ is $m \times n$), and is defined by analysis of the first column of $P$:

• If $n = 0$, then the result is $e^1$.
• If all the $p^i_1$ ($1 \le i \le m$) are wildcards, then the Variable Rule applies, which effectively drops $x_1$ and the first column of $P$.
• If all the $p^i_1$ ($1 \le i \le m$) are constructor patterns, then the Constructor Rule applies. In this case, we construct a simple case expression on $x_1$, where there is a clause

      $c(y_1, \ldots, y_k)$ => $\mathcal{C}(\langle y_1, \ldots, y_k, x_2, \ldots, x_n \rangle, \mathcal{S}(c, P \to L))$

  The right-hand side of the clause compiles $\mathcal{S}(c, P \to L)$, the projection of those rows in $P \to L$ that have a pattern of the form $c(q_1, \ldots, q_k)$ in the first position, where we then replace the first pattern by $q_1 \cdots q_k$. If the constructors in the first column are not exhaustive, we add a default rule that backtracks.
• Otherwise, the Mixture Rule applies, which splits the matrix into maximal blocks of rows, where each block is covered by one of the other rules. Each block is recursively compiled to an expression and the expressions are combined in series using backtracking.⁴

Note that these rules test patterns in a left-to-right order that respects the semantics in the presence of effectful pattern guards. We extend $\mathcal{C}$ with three rules related to pattern guards.

4.1 The Guard-split Rule

If any pattern $p^i_1$ in the first column has the form $(p \ \mathbf{if}\ e)$, we apply the transformation $\mathcal{G}_1$ to the clause matrix $P \to L$ and then invoke $\mathcal{C}$ on the result. Thus, the Guard-split Rule is implemented as

\[
\mathcal{C}(\langle x_1, \bullet, x_2, \ldots, x_n \rangle, \mathcal{G}_1(P \to L))
\]

We apply the Guard-split Rule eagerly (i.e., when any pattern in the first column is a pattern guard), since the transformation can expose larger blocks of rows for the other rules, which is particularly beneficial for backtracking implementations. This rule does not reorder patterns, so the left-to-right evaluation order is preserved.

4.2 The Guard-match Rule

If the first column of $P$ is a guard column, $p^1_1$ is $\{e\}$ (where $e$ is not the expression true), and $p^i_1$ is $\{\mathtt{true}\}$ for $1 < i \le m$, then we generate the code

    let val $x'$ = $e$ in $\mathcal{C}(\langle x', x_2, \ldots, x_n \rangle, \mathcal{M}(P \to L))$

where $\mathcal{M}(P \to L)$ is defined as follows:

³ Because of space limitations, we are ignoring variable bindings. These can either be handled using occurrences à la Pettersson [Pet99b] or by modifying the actions with let bindings.
⁴ Le Fessant and Maranget use a special catch/exit construct to imple-