Generating Mutually Recursive Definitions
Jeremy Yallop Oleg Kiselyov University of Cambridge, UK Tohoku University, Japan [email protected] [email protected]
Abstract .
75 PEPM ’19, January 14–15, 2019, Cascais, Portugal Jeremy Yallop and Oleg Kiselyov are eventually bound to the intended expressions, and how since we do not wish to compute 1+ 2 every time the function to ensure that generated code is well-typed. generated by c3 is applied. The challenge is inserting let Related metaprogramming systems such as LMS [Rompf bindings into a wider context rather than into the immediate 2016] and Template Haskell [Sheard and Jones 2002] which code fragment under construction. are capable of generating recursive definitions indeed use Recent versions of BER MetaOCaml have a built-in genlet gensym and ‘compiler magic’ (such as intensional analysis primitive: if e is a code value, then genlet e arranges to gen- of closures in LMS and dependency analysis in GHC to de- erate, at an appropriate place, a let expression binding e to a termine mutually recursive groups). None of them therefore variable, returning the code value with just that variable. (If catch the left-unbound variables until the code is fully gener- e is already an atomic expression, genlet e returns e as it is). ated and compiled – at which point the error in the generator For example, in the following program p12l is bound to a becomes very difficult to find (as illustrated in[Ofenbeck code expression 1+ 2 that is to be let-bound according to the et al. 2016]). context. When p12l is printed – that is, used in the top-level In practice, MetaOCaml programmers fall back on a vari- context – the let is inserted immediately: ety of workarounds, simulating mutual recursion using ordi- let p12l = genlet p12 nary recursion [Kiselyov 2013] or nested recursion [Inoue { val p12l : int code = .
76 Generating Mutually Recursive Definitions PEPM ’19, January 14–15, 2019, Cascais, Portugal
The duplicated expressions in the generated code reveal why Generating recursive definitions was deemed for a long fibnr is exponentially slow. time a difficult problem. One day, a two-liner solution A memoizing fixpoint combinator inserts a let-binding emerged, from the insight that a recursive definition for the result of each call, and maintains a mapping from previous arguments to the let-bound variables [Swadi et al. let rec g = e in body 2006] may be re-written as let mfix (f : (α → β code) → (α → β code)) (x : α): β code = let memo = ref [] in let g = let rec g = e in g in body let rec loop n = try List.assoc n !memo with Not_found → which immediately gives us genletrec: let v = genlet (f loop n) in let genletrec : ((α→β) code → α code → β code) → memo := (n,v) :: !memo; v (α→β) code = in loop x fun f → genlet .
.
77 PEPM ’19, January 14–15, 2019, Cascais, Portugal Jeremy Yallop and Oleg Kiselyov
4 Generating Mutually-Recursive and obtain the code for even m n specialized to a particular Functions value of m, say, 0 (which is just the ordinary even function): In many practical cases of generating recursive definitions mrfix (fun self (idx,m) x → one wants to produce mutually recursive definitions, such sevodf (fun idx m → self (idx,m)) idx m x) as the state machine shown in Section1. To illustrate the (Even,0) challenges brought by mutual recursion, we take a simpler { − : (int → bool) code = .< running example, contrived to be in the shape of the earlier let lv6 = Ackermann function. The example is the ‘classical’ even- let rec g1 = odd pair, but taking two non-negative integers m and n and let lv5 = returning a boolean, telling if the sum m+ n has even or odd let rec g3 x4 = if x4>0 then g1 (x4−1) else false parity, resp. in g3 in fun x → if x >0 then lv (x −1) else true in let rec even m n = if n>0 then odd m (n−1) else 2 2 5 2 g in if m>0 then odd (m−1) n else 1 lv >. true 6 and odd m n = if n>0 then even m (n−1) else The odd function (appearing under the generated name g3) if m>0 then even (m−1) n else is nested inside even (or, g1) rather than being ‘parallel’ with false it. It means odd is not accessible from the outside; if we also want to compute odd parity, we have to duplicate the At first, mutual recursion seems to pose no problem: after code. There is a deeper problem than mere code duplication: all, a group of mutually recursive functions may always be specializing to = (that is, applying the tied-knot converted to the ordinary recursive function by adding an even m n m 1 to ) generates no code. An exception is raised extra argument: the index of a particular recursive clause in sevodf (Even,1) instead, telling us that MetaOCaml detected scope extrusion: the group1: an attempt to use a variable outside the scope of its binding. type evod = Even | Odd Indeed, we have attempted to produce something like the following (identifiers are renamed for clarity): let rec evodf self idx m n = let lod0 = (∗ odd 0 n ∗) match idx with let rec od0 n = | Even → if n>0 then self Odd m (n−1) else let lev0 = (∗ even 0 n ∗) if m>0 then self Odd (m−1) n else let rec ev0 n = if n>0 then od0 (n−1) else true in ev0 in true if n>0 then lev0 (n−1) else false in od0 in | Odd → if n>0 then self Even m (n−1) else let lev1 = (∗ even 1 n ∗) if m>0 then self Even (m−1) n else let rec ev1 n = false let lod1 = (∗ odd 1 n ∗) { val evodf : (evod → int → int → bool) → let rec od1 n = if n>0 then ev1 (n−1) else lev0n in od1 in evod → int → int → bool if n>0 then lod1 (n−1) else lod0 n in ev1 To find out if the sum of 10 and 42 has even parity one writes in lev1 fix evodf Even 10+ 42. The straightforward staging gives Here, the function ev1, the specialization of even m n let rec sevodf self idx m n = to m= 1 calls od0 and od1. The latter calls ev1 and match idx with fun n → even 0 n, whose code was already generated and | Even → memoized, under the name lev0. Unfortunately, the scope of .
78 Generating Mutually Recursive Definitions PEPM ’19, January 14–15, 2019, Cascais, Portugal do we have that all generated names will be bound, and to With this new mrfix but the same sevodf from Section4 we their intended clauses? How do we maintain the MetaOCaml are able to generate the specialized even 1 n code, with four guarantee that the fully generated code is always well-typed? mutually recursive definitions. Finally, what should the interface for the generator of mu- tually recursive bindings be in the first place? After quite a Finite State Automata, reprise Recognizers of finite state automata are produced by the following generic, textbook bit of thought, it turns out that genletrec’s interface would 4 suffice. For the sake of better error detection, one would gen- generator : eralize it slightly. We add a second function, genletrec_locus, type token = A | B which marks the location where a group of recursive defi- type state = S | T | U nitions should be inserted; the generated locus_t value rep- type (α,σ) automaton = resenting the location can be passed as first argument of { finals: σ list; genletrec: trans: (σ ∗ (α ∗ σ) list) list} type locus_t val genletrec_locus : let makeau : (locus_t → α code) → α code (token, α) automaton → val genletrec : (α → (token list → bool) code) → locus_t → ((α→β) code → α code → β code) → (α→β) code α → token list code → bool code = fun {finals; trans} self state stream → The earlier genlet (and, hence genletrec) inserted the re- let accept = List.mem state finals in quested definition in the widest possible context (while en- let next token = List.assoc token (List.assoc state trans) in suring the absence of unbound variables in the generated .< match .~stream with code). With the new interface the insertion point (and hence | A :: r → .~(self (next A)) r the scope of the inserted bindings) is explicitly marked using | B :: r → .~(self (next B)) r genletrec_locus and each call to genletrec indicates which | [] → accept >. group of recursive bindings should contain the generated definition2. Correspondingly, in a call In particular, the automaton in Section1 is represented by genletrec locus (fun g x → ...) the following description the identifier for the binding (bound to g) scopes beyond let au1 = genletrec’s body (but within the scope denoted by locus). {finals = [S]; The new genletrec let us write mrfix essentially just like trans = [(S, [(A, S); (B, T)]); the simpler mfix, without the splitting of the memo table (T, [(A, S); (B, U)]); into global and local parts3: now, the definitions have the (U, [(A, T); (B, U)]);]} same scope. Then mrfix (makeau au1) S generates: let mrfix : let rec x1 y = match y with ((α → (β→γ ) code) → (α → β code →γ code)) → | A::r → x1 r (α → (β→γ ) code) = | B::r → x5 r fun f x → | [] → true genletrec_locus @@ fun locus → and x5 y = match y with let memo = ref [] in | A::r → x1 r let rec loop n = | B::r → x9 r try List.assoc n !memo | [] → false with Not_found → and x9 y = match y with genletrec locus (fun g y → | A::r → x5 r memo := (n,g) :: !memo; | B::r → x9 r f loop n y) | [] → false in loop x in x1 2 It hence becomes the programmer’s responsibility to place genletrec_locus → correctly. We are yet to explore and resolve the trade-off between automati- of the type token list bool. cally floating genlet and genletrec whose scope is to be set manually. 3Previously, genletrec relied on the trick let g = let rec g = e in g in body, 4The generator makeau is indeed polymorphic over the type of the state; which binds two different gs, one within and one outside the scope of the the dependence on the alphabet shows in the match statement. Incidentally, local let rec. Therefore, the memo table had two parts. The local part tracks MetaOCaml also has a facility to generate pattern-match clauses of statically the identifiers that are valid only while we are generating the let rec body; unknown length and content. With its help, we can make makeau fully the global part, to which we only add, collects the externally visible gs. general.
79 PEPM ’19, January 14–15, 2019, Cascais, Portugal Jeremy Yallop and Oleg Kiselyov
5 Further Extensions Here %staged is an attribute that indicates the need for We sketch some extensions to the mrfix combinator of Sec- a rewrite by a plug-in program that expands the syntax as tion4. shown above. Then ack can be written as follows 5.1 Arbitrary Bodies in let rec Expressions let%staged rec ack m n = The mrfix combinator has the following type: if m = 0 then .<.~n+ 1>. else val mrfix : ((α → (β→γ ) code) → (α → β code →γ code)) → .
let rec x1 y = match y with A::r → x1 r Since the higher-rank and higher-kinded polymorphism | B::r → x5 r found in this type cannot be expressed directly in OCaml, our | [] → true implementation uses standard encodings based on OCaml’s and x5 y = match y with A::r → x1 r polymorphic record fields and functors. | B::r → x9 r | [] → false 5.4 Polymorphic Recursion and x9 y = match y with A::r → x5 r The extensions needed to support heterogeneous recursion | B::r → x9 r (Section 5.3) are also sufficient to support polymorphic re- | [] → false cursion. For example, here is a nested data type ntree of in (x1, x9, x5) perfectly balanced trees and a polymorphic function swivel that interchanges left and right elements of ntree values: 5.2 A Syntax Extension 5This (progressively fancier and fancier) indexing is closely related to the Third-order functions such as mrfixk are not always easy to generalized arity in Plotkin and Power’s formulation of algebraic effects understand and use. The following small syntax extension [Plotkin and Power 2003]. Like them, we represent a tuple as a set of indices improves readability in many cases: plus the function that maps each index to a value. The index set does not have to have a fixed finite cardinality. Adding more structure to the set let%staged rec f p p' = e in e' of indices lets us, like it let Plotkin and Power, represent more interesting { mrfixk (fun f p p' → e) (fun f → e') collections.
80 Generating Mutually Recursive Definitions PEPM ’19, January 14–15, 2019, Cascais, Portugal
type α ntree = EmptyN Graham Hutton and Erik Meijer. 1996. Monadic Parser Combinators. Tech- | TreeN of α ∗ (α ∗ α) ntree nical Report NOTTCS-TR-96-4. Department of Computer Science, Uni- versity of Nottingham. Jun Inoue. 2014. Supercompiling with Staging. In Fourth International → → → let rec swivel : α.(α α) α ntree α ntree = Valentin Turchin Workshop on Metacomputation. fun f t → match t with Yukiyoshi Kameyama, Oleg Kiselyov, and Chung-chieh Shan. 2011. Shifting | EmptyN → EmptyN the Stage: Staging with Delimited Control. J. Funct. Program. 21, 6 (Nov. | TreeN (v,t) → TreeN (f v, swivel (fun (x, y) → (f y, f x)) t) 2011), 617–662. Oleg Kiselyov. 2012. Delimited Control in OCaml, Abstractly and Concretely. This is polymorphic-recursive because the recursive call uses Theor. Comput. Sci. 435 (June 2012), 56–76. https://doi.org/10.1016/j.tcs. swivel at a different type than the type of the definition: the 2012.02.025 passed function f acts on pairs α ∗ α, not values of type α. Oleg Kiselyov. 2013. Simplest poly-variadic fix-point combinators for mutual Generating polymorphic-recursive definitions like swivel recursion. http://okmij.org/ftp/Computation/fixed-point-combinators. html#Poly-variadic. involves indexing by a polymorphic type. Here is a suitable Oleg Kiselyov. 2014. The Design and Implementation of BER MetaOCaml. index for generating swivel: In Functional and Logic Programming (Lecture Notes in Computer Science), type swivel = { swivel: α.(α → α) → α ntree → α ntree } Michael Codish and Eijiro Sumii (Eds.), Vol. 8475. Springer International Publishing, 86–102. type _ index = Swivel : swivel index Shriram Krishnamurthi. 2006. Educational Pearl: Automata via macros. At each use of the index the polymorphic record field J. Funct. Program. 16, 3 (2006), 253–267. https://doi.org/10.1017/ can be instantiated afresh, making it possible to call the S0956796805005733 Georg Ofenbeck, Tiark Rompf, and Markus Püschel. 2016. RandIR: differ- generated function recursively at any instance of the type ential testing for embedded compilers. In Proceedings of the 7th ACM (α → α) → α ntree → α ntree. SIGPLAN Symposium on Scala, SCALA@SPLASH 2016. ACM, 21–30. https://doi.org/10.1145/2998392 Acknowledgments Gordon Plotkin and John Power. 2003. Algebraic Operations and Generic Effects. Applied Categorical Structures 11, 1 (01 Feb 2003), 69–94. https: We thank Nada Amin and Jun Inoue for helpful discussions //doi.org/10.1023/A:1023064908962 and posed challenges, and Atsushi Igarashi for hospitality. Tiark Rompf. 2016. The Essence of Multi-stage Evaluation in LMS. Springer We are grateful to anonymous reviewers for many helpful International Publishing, Cham, 318–335. https://doi.org/10.1007/ suggestions. This work was partially supported by JSPS KAK- 978-3-319-30936-1_17 ENHI Grant Number 18H03218. Tim Sheard and Simon Peyton Jones. 2002. Template Meta-programming for Haskell. SIGPLAN Not. 37, 12 (Dec. 2002), 60–75. https://doi.org/10. 1145/636517.636528 Status Kedar Swadi, Walid Taha, Oleg Kiselyov, and Emir Pašalić. 2006. A Monadic The described genletrec is prototyped6 in full using plain Approach for Avoiding Code Duplication When Staging Memoized Func- MetaOCaml as well as MetaOCaml with delimited control tions. In PEPM. 160–169. Don Syme. 2006. Initializing Mutually Referential Abstract Objects: The effects, such as those provided by Multicore OCaml [Dolan Value Recursion Challenge, In Proceedings of the ACM-SIGPLAN Work- et al. 2015] or the delimcc library [Kiselyov 2012]. We are shop on ML (2005). working at supporting it above-the-board in a forthcoming Walid Mohamed Taha. 1999. Multistage Programming: Its Theory and Appli- release of MetaOCaml. cations. Ph.D. Dissertation. Oregon Graduate Institute of Science and Technology. AAI9949870. Jeremy Yallop. 2016. Staging Generic Programming. In Proceedings of References the 2016 ACM SIGPLAN Workshop on Partial Evaluation and Program Hal Abelson, Jerry Sussman, and Julie Sussman. 1984. Structure and Inter- Manipulation (PEPM ’16). ACM, New York, NY, USA, 85–96. https: pretation of Computer Programs. MIT Press. ISBN 0-262-01077-1. //doi.org/10.1145/2847538.2847546 Stephen Dolan, Leo White, KC Sivaramakrishnan, Jeremy Yallop, and Anil Jeremy Yallop. 2017. Staged Generic Programming. Proc. ACM Program. Madhavapeddy. 2015. Effective Concurrency through Algebraic Effects. Lang. 1, ICFP, Article 29 (Aug. 2017), 29:1–29:29 pages. https://doi.org/ (September 2015). OCaml Users and Developers Workshop 2015. 10.1145/3110273 Ralf Hinze and Ross Paterson. 2003. Derivation of a Typed Functional LR Parser. 6https://github.com/yallop/metaocaml-letrec
81