ADAPTON: Composable, Demand- Driven Incremental Computation (Extended version)

Matthew A. Hammer, Khoo Yit Phang, Michael Hicks, and Jeffrey S. Foster University of Maryland, College Park, USA

Abstract • Swapping, in which the order of subcomputations is changed, Many researchers have proposed programming languages that sup- e.g., changing F(A1..A100) + F(B1..B100) to F(B1..B100) ∗ port incremental computation (IC), which allows programs to be F(A1..A100). Previous IC systems would recompute F(A1.. efficiently re-executed after a small change to the input. However, A100) or F(B1..B100) due to their reliance on a total ordering. existing implementations of such languages have two important • Switching, in which computations are toggled back and forth, drawbacks. First, recomputation is oblivious to specific demands e.g., a computation of F(A1..A50) is replaced by F(A51..A100), on the program output; that is, if a program input changes, all de- which is then replaced by F(A1..A100). Previous IC systems pendencies will be recomputed, even if an observer no longer re- would recompute the F(A1..A50) results from scratch, even if quires certain outputs. Second, programs are made incremental as that subcomputation could be reused. a unit, with little or no support for reusing results outside of their original context, e.g., when reordered. A second major problem with prior IC approaches is that they are cdd To address these problems, we present λic , a core calculus ill-suited to interactive computations, because they are inherently that applies a demand-driven semantics to incremental computa- eager. When an input value is changed, all values derived from tion, tracking changes in a hierarchical fashion in a novel demanded that input are updated. But in many interaction scenarios, users cdd computation graph. λic also formalizes an explicit separation be- may not need such updates. For example, suppose cell S!B1 on tween inner, incremental computations and outer observers. This spreadsheet S contains F(T!A1..T!A100); here S!Ai refers to cell cdd Ai S S combination ensures λic programs only recompute computations in spreadsheet . Now suppose the user hides and switches as demanded by observers, and allows inner computations to be to T to edit A1 and other cells. Then there is no need to update reused more liberally. We present ADAPTON, an OCaml library S!B1 until the user switches back to S to display it. Yet standard IC cdd recomputes dependencies on each change, regardless of demand. implementing λic . We evaluated ADAPTON on a range of bench- marks, and found that it provides reliable speedups, and in many In this paper, we introduce ADAPTON, a new IC approach re- cases dramatically outperforms state-of-the-art IC approaches. alized in a functional , that addresses the limitations discussed above. The key insight behind ADAPTON is to combine traditional IC-style reuse with a mechanism for memo- 1. Introduction izing thunks, as in lazy computation. In ADAPTON, updates to mu- Incremental computation (IC), a.k.a. self-adjusting computation, is table ref cells signal the potential need for recomputation, but such a technique for efficiently recomputing a after making a recomputation is delayed until thunks accessing cells’ dependents small change to its input. In recent years, researchers have shown are force d. Under the hood, both ref s and thunks are implemented that for certain , inputs, and classes of input changes, IC almost identically using a demanded computation graph (DCG). delivers large, even asymptotic speed-ups over full reevaluation [5, The DCG captures the partial order of which computation’s re- 6]. IC has been developed in many different settings [12, 18, 20, sults are used in which other computations. As a result, ADAP- 33], and has even been used to address open problems, e.g., in TON provides very efficient support of the sharing, swapping, and computational geometry [7]. switching patterns, since partial computations can be reused more IC systems work by recording traces of computations and then effectively. ADAPTON’s laziness avoids recomputing undemanded reusing portions of those traces as inputs change. Unfortunately, values. (Section 2 gives a high-level overview of ADAPTON.) cdd prior IC approaches have two major limitations in trace reuse. First, We formalize ADAPTON as the core calculus λic . Follow- cdd because traditional IC imposes a total ordering on traces (typically, ing Levy’s call-by-push-value calculus [24], λic includes explicit because of reliance on the Dietz-Sleator order-maintenance data thunk and force primitives, to make laziness apparent in the lan- structure [10, 16]), several straightforward kinds of reuse are im- guage, and adds ref , get, and set to model changeable state. A key cdd possible. For example, consider using IC to implement a spread- feature of λic , and of ADAPTON, is that it explicitly separates in- sheet, so that visible formulae are minimally recomputed as spread- ner computations—which may read but not write ref s—from outer sheet cells are changed, hidden, and shown. Traditional IC omits computations, which can allocate and mutate ref s and thus poten- three common reuse patterns: tially precipitate change propagation. We should note that, to our knowledge, this clear inner/outer separation is absent from previ- cdd • Sharing, in which a computation is used in different contexts, ous treatments of IC. (Section 3 presents λic .) cdd e.g., F(A1..A100) appears as a subformula in two different We formalize an incremental semantics for λic that captures computations or cells. Previous IC systems would recompute the notion of prior knowledge, which consists of the demanded the second F(A1..A100) rather than reuse the previous result. computation traces of prior computations. This semantics declar-

1 2014/1/18 atively specifies the process of reusing traces from prior knowl- indicated, all of this code runs at the outer layer, hence it is allowed edge by (locally) patching their inconsistencies. We prove that the to allocate new memory: patching process is sound in that patched results will match what p2 let n1 : cell= ref Num 1 in + (re)computation from scratch would have produced. (Section 4 let n2 : cell= ref Num 2 in presents our incremental semantics.) let n3 : cell= ref Num 3 in + 3 We have implemented ADAPTON as an OCaml library (Sec- let p1 : cell= ref Plus n1 n2 in tion 5). We compared ADAPTON’s performance against that of a let p2 : cell= ref Plus p1 n3 in ··· 1 2 traditional IC system using a range of standard subject programs Given a cell of interest, we can evaluate it using the obvious from the IC literature, e.g., map, filter, sort, etc. We created micro- interpreter with a few extra calls for reuse: benchmarks that invoke these programs with varying levels of de- mand (e.g., demand a single element vs. all elements) and with eval : c e l l U int varying change patterns (e.g., swapping and switching). eval c = thunk(inner(case getc of | Num x x → Our results show that ADAPTON greatly outperforms traditional | Plus c1 c2 force (eval c1) + force (eval c2) )) IC for lazy interactions: where traditional IC gets 2× to 20× speedups over naive recomputation, ADAPTON gets 7× to 2000× Here thunk creates⇒ a suspended computation, which is typed by the ⇒ speedups. For one program ( mergesort ), traditional IC actually U constructor, e.g., U int is equivalent to unit int . Thunks are incurs a 6.5× slowdown, whereas ADAPTON provides a 300× demanded by calling force , as in the recursive calls to eval . The speedup. ADAPTON provides similarly significant speedups on the body of the thunk is an expression wrapped in→inner , indicating switching and swapping patterns, whereas traditional IC often in- that expression occurs at the inner layer. Inner layer computations curs substantial (4× to 500×) slowdowns. ADAPTON does not per- may only read ref cells—e.g., via the call to get above—and not form as well as traditional IC when all output is demanded—it can allocate or change them. This restriction enables inner computa- be 1.5× to 3.5× slower—but still achieves a significant speedup tions to be incrementally reused. Thus, in this example, each nested over naive recomputation. call to eval is a computation that can be updated and/or its results As a more practical measure of ADAPTON’s utility, we devel- reused when ref s are modified at the outer layer. To illustrate how 2 oped the ADAPTON SpreadSheet (AS ), which uses ADAPTON as changes are made and propagated, consider the code shown at the its recomputation engine. We found that ADAPTON successfully in- top of Figure 1. Although written as a block of code, we envision crementalized the (stateless) specification for formulae evaluation; the user entering this code a line at a time at an interactive top level. a pair of simple benchmarks showed speedups of up to 20× com- On lines 1 and 2, the user calls eval to produce a thunk that, when pared to naive recomputation. On the same benchmarks classic IC forced, will compute the values of p1 and p2, respectively. Then techniques performed poorly, always resulting in a slowdown (up on lines 3 and 4, the user forces evaluation and displays the result to 100×). (Section 6 presents our experimental results.) using a function While most prior work on IC requires a total ordering of events, display :U int unit( ∗ force arg and display it ∗) which compromises reuse, Ley-Wild et al. have recently studied (In this example the user refreshes the display manually, rather than non-monotonic changes [25, 27]. Non-monotonic IC supports the having it update automatically,→ to illustrate varying demand.) The swapping pattern mentioned above, but does not support sharing computation on line 3 evaluates the formula of cell p1, which recur- or switching. Moreover, non-monotonic IC has never been imple- sively forces the (trivial) evaluation of leaf cells n1 and n2. Next, mented, and is (in our opinion) far more complicated than ADAP- line 4 computes the value of p2. Since p2 has p1 as a subexpression, TON. An increasingly popular computational paradigm related to we would like to reuse the prior computation of p1.ADAPTON ac- IC is functional-reactive programming (FRP) [14, 15, 22]. FRP pro- complishes this reuse via demanded computation graphs (DCGs). vides some reuse of computations under a changing signal, but is Sharing. In ADAPTON, DCGs operate behind the scenes by more specialized than ADAPTON (and IC in general). We have im- recording inner layer computations. Figure 1a shows the DCG after plemented an FRP library using ADAPTON; for space reasons we relegate discussion of it to the Appendix. (Section 7 compares to evaluating line 4. Lines 1 and 2 only create thunks and otherwise IC, non-monotonic IC, and FRP in detail.) do no computation, and line 3 essentially creates what is the left side of Figure 1a. We depict DCGs growing from the bottom of In sum, we believe that ADAPTON offers a compellingly sim- INNER ple, yet general approach for programming incremental, interactive the line upwards. We use a dotted line to identify opera- computation. tions performed at the outer layer that affect the inner layer graph, and color graph edges when they are touched as the result of that operation; edges are shown in blue when created or refreshed, and 2. Overview of ADAPTON pink when they are dirtied. Rather than draw outer layer edges for We illustrate how ADAPTON works using a simple example in- n1, t1, p2, etc. to their respective nodes, we write the names in the spired by a typical user interaction with a spreadsheet. ADAPTON’s nodes themselves, to avoid clutter. programming model is based on an ML-like language with explicit Each node in the DCG corresponds to a reference cell (depicted primitives for thunks and mutable state, where changes to the latter as a square) or a thunk (depicted as a circle). Edges that target ref- eventually propagate to update previous results. erence cells are produced by get operations, and edges that target Consider the following toy language for formulae in spreadsheet thunks are produced by force operations. Though not shown in the cells: figure, each thunk node records a (suspended) expression and, once forced, its valuation; each reference node records the reference ad- typecell =M formula and formula =Num of int | Plus of cell × c e l l dress and its content. Thunk nodes may have outgoing edges, where the leftmost edge was created first and the rightmost last. In the fig- Values of type cell are formula addresses, i.e., mutable references ure, the leftmost edge from t2 goes to p2 because t2 is the result containing a cell formula. Here the M constructor (for “mutable”) of eval p2, which calls get on its argument. The rightmost edge is equivalent to ref type constructor in ML. A formula consists of corresponds to the forced recursive eval call that gets n3. One key either an integer (Num) or the sum of two other cells (Plus). feature of ADAPTON is that it tries to memo-match previously com- We begin by building an initial expression tree, illustrated to puted results whenever possible. Here, the middle edge out of t2 the right of the code below. Note that, since it is not otherwise memo-matches the result of eval p1 previously computed when t1

2 2014/1/18 1 let t1 :U int = eval p1 in and eval p1). When the outer layer demands a computation on line 2 let t2 :U int = eval p2 in 6, ADAPTON performs propagation by selectively traversing dirty 3 display t1 ;( ∗ demands(eval p1) ∗) nodes and edges from the bottom upwards and left to right, re- 4 display t2 ;( ∗ memo matches(eval p1) ∗) pairing dirty graph components by reevaluating dirty thunk nodes, 5 set n1 Num 5;( ∗ mutate leaf value ∗) which in turn replace their graph components with up-to-date ver- 6 display t1 ;( ∗ does not re−eval p2 ∗) sions; in the figure, the refreshed nodes/edges are in blue. Impor- 7 set p2 Plus n3 p1;( swaps operand cells ) ← ∗ ∗ tantly, this traversal order reflects evaluation (and demand) order, 8 display t2 ;( ∗ memo matches twice ∗) allowing change propagation to lazily avoid repair to components ← not currently under demand. In this example, line 6 re-demands the Graph features n1 n2 n3 first result for t1 but since there is no demand of t2,ADAPTON thunk does not recompute its value. get get get ref Swapping. Next, on line 7, the outer layer updates p2 by swap- dirty p1 p2 ping its two subcomponents. This kind of structural change defeats eval n1 eval n2 eval n3 IC actions traditional IC reuse, but DCGs support it naturally. In the figure, get force force get force memo match we can see that the outer layer update on line 7 dirties an additional t1 t2 constructed/ node and edge. Then line 8 demands the result t2, which initiates eval p1 eval p2 refreshed force propagation to recompute the thunk eval p2. As shown in the fig- INNER ure, propagation is able to memo-match eval p1 (as in the original OUTER line 4 force computation) and eval n3, even though they occur in a different (a) Sharing order. Switching. Finally, suppose the user updates expression p1 but then changes her mind and switches it back: n1 n2 n3 n1 n2 n3 9 set p1 Num4; display t2; 10 set p1 Plus n1 n2; display t2 p1 p2 p1 p2 After line← 9 the eval p1 thunk in the graph would be updated to point to← a new thunk that evaluates Num 4, and would no longer t1 t2 t1 t2 point to the eval n1 and eval n2 thunks. However, these thunks INNER are still available for reuse, and when the user switches back on line 10, the eval p1 thunk is restored to its original state, memo- OUTER line 5 line 6 set force matching the previous eval n1 and eval n2 results. We call this (b) Dirtying and lazy change propagation pattern switching because it switches back to previously computed, but currently inactive, results; once again, traditional IC would fail to achieve reuse in this case. n1 n2 n3 n1 n2 n3 3. Core calculus cdd p1 p2 p1 p2 We formalize ADAPTON as λic , a core calculus for incremental cdd computation in a setting with lazy evaluation. λic is an exten- sion of Levy’s call-by-push-value (CBPV) calculus [23], which is t1 t2 t1 t2 a standard variant of the simply-typed lambda calculus with an ex- INNER plicit thunk primitive. It uses thunks as part of a mechanism to syn- OUTER line 7 line 8 tactically distinguish computations from values, and to make eval- set force cdd uation order syntactically explicit. λic adds reference cells to the (c) Swapping CBPV core, along with notation for specifying inner- and outer- layer computations. Figure 1: Spreadsheet example. As there exist standard translations from both call-by-value cdd (CBV) and call-by-name (CBN) into CBPV, we intend λic to be in some sense canonical, regardless of whether the host language was forced on line 3; memo-matched expressions are depicted with is lazy or eager. We give a translation from a CBV language variant a gray background. Prior IC implementations would not be able to cdd of λic in the Appendix. reuse t2’s result in this manner because they enforce a monolithic, cdd total order of events; in this case we are reusing (i.e., sharing) a 3.1 Syntax, typing and basic semantics for λic previous computation in the event history. cdd The top of Figure 2 gives the formal syntax of λic , with new cdd Dirtying and lazy change propagation Next, on line 5, the user features highlighted. Figure 3 gives λic ’s type system. As most decides to change the value of leaf n1, and on line 6, they de- of the type rules are standard we weave discussion of them into our mand and display the updated result. Since a ref cell changed, the presentation of the language. cdd memoized thunk valuations may be inconsistent, and hence ADAP- λic inherits most of its syntax from CBPV. Terms consist of TON needs to repair the computation. Figure 1b shows how this value terms (written v) and computation terms (written e), which repair and recomputation is done via the DCG. When the outer we alternatively call expressions. Types consist of value types layer mutates the ref on line 5, ADAPTON performs dirtying to (written A, B) and computation types (written C, D). Standard value mark nodes and edges that may need recomputation. As shown in types consist of those for unit values () (typed by 1), injective val- the figure, ADAPTON traverses the graph from the mutated node ues inji v (typed as a sum A + B), pair values (v1, v2) (typed as a downwards, dirtying the nodes and edges that (transitively) de- product A × B) and thunk values thunk e (typed as a suspended pend on the changed reference cell n1 (viz., the thunks for eval n1 computation U C).

3 2014/1/18 Values v ::= x | () | (v1, v2) | inji v | thunk e | a Prior knowledge K ::= ε | K, T Comps e ::= λx.e | e v | let x e1 in e2 | ret v Traces T ::= T1·T2 | t | e˜ e a | fix f .e | f | case (v, x1.e1, x2.e2) Trace events t ::= forcee˜ [T] | getv | split (v, x1.x2.e) | force← ` v | inner e | ref v | get v | set v1 v2 trm(T) trm(T1·T2) = trm(T2) e trm( forcee˜ [T])= trm(T) ← a Value types A, B ::= 1 | A + B | A × B | U C | M A trm( getv ) = ret v Comp. types C, D ::= A C | F` A trm(e˜) = e˜ Comp. layers ` ::= inner | outer Typing env. Γ ::= ε | Γ, x:A | Γ, f :C | Γ, a:A → Figure 6: Traces and prior knowledge

Store S ::= ε | S, a:v Terminal comps e˜ ::= λx.e | ret v Theorem 3.1 (Subject reduction). Suppose that Γ ` S1 wf, Γ ` 0 0 e : C, and S1 ` e S2; e˜ then there exists Γ such that Γ ` Γ , 0 0 Γ ` S2 wf, and Γ ` e˜ : C Figure 2: Values and computations: Term and type syntaxes. An analogous result⇓ for a small-step semantics, which we omit for space reasons, establishes that the type system guarantees progress. We show that the inner layer does not mutate the outer Standard computation types consist of functions (typed by ar- layer store (recall that the inner layer’s only store effect is read-only row A C, and introduced by λx.e), and value-producers (typed access via get), and always that it yields deterministic results: by connective F` A, and introduced by ret v). These two term inner forms are→ special in that they correspond to the two introduction Theorem 3.2 (Inner purity). Suppose that Γ ` e :(C) and forms for computation types, and also the two terminal computa- S1 ` e S2; e˜ then S1 = S2. tion forms, i.e., the possible results of computations. inner Other standard computation terms consist of function applica- Theorem 3.3 (Inner determinism). Suppose that Γ ` e :(C) , S ` e ⇓ S ; e˜ , and S ` e S ; e˜ then S = S and e˜ = e˜ . tion (eliminates A C), let binding (eliminates F` A), fixed point 1 2 2 1 3 3 2 3 2 3 computations (fix f .e binds f recursively in its body e), pair split- ting (eliminates A × B), case analysis (eliminates A + B), and thunk → 4. Incremental⇓ semantics⇓ forcing (eliminates U C). cdd In Figure 4, we give the incremental semantics of λic . It defines Mutable stores and computation layers. The remaining (high- the reduction to traces judgment K; S1 ` e S2; T, read as “un- cdd der prior knowledge K and store S , expression e reduces to store lighted) forms are specific to λic ; they implement mutable stores 1 S and trace T.” We refer to our traces as demanded computation and computation layers. Mutable (outer layer) stores S map ad- 2 ⇓ dresses a to values v. Addresses a are values; they introduce the traces (DCT) as they record what thunks and suspended expres- type connective M A , where A is the type of the value that they sions a computation has demanded; these are the analogue of the contain. The forms ref, get and set introduce, access and update DCG presented in Section 2. Prior knowledge is simply a list of store addresses, respectively. such traces. The first time we evaluate e we will have an empty cdd store and no prior knowledge. As e evaluates, the traces of sub- The two layers of a λic program, outer and inner, are ranged over by layer meta variable `. For informing the operational seman- computations will get added to the prior knowledge K under which tics and typing rules, layer annotations attach to force terms (viz., subsequent sub-computations are evaluated. If the outer layer mu- tates the store, this knowledge may be used to support change prop- force` v ) and the type connective for value-producing computa- agation, written K; S ` T1 prop T2 . The given rules are sound, tions (viz., F` A). A term’s layer determines how it may interact y with the store. Inner layer computations may read from the store, as but non-deterministic and non-algorithmic; a practical, determinis- per the typing rule TYE-GET, while only outer layer computations tic is given in Section 5.1. may also allocate to it and mutate its contents, as enforced by typ- 4.1 Trace structure and propagation semantics ing rules TYE-REF and TYE-SET. As per type rule TYE-INNER, inner layer computations e may be used in an outer context by ap- Prior knowledge and traces. Figure 6 defines our notions of prior plying the explicit coercion inner e ; the converse is not permitted. knowledge and traces. Prior knowledge K consists of a list of traces This rule employs the “layer coercion” function (C)`, defined in from prior reductions. Traces T consist of sequences of trace events Figure 5, to enforce layer purity in a computation. It is also used to that end in a terminal expression. Trace events t record demanded a similar purpose in rules TYE-INNER and TYE-FORCE. The TYE- computations. Traced events getv record the address a that was INNER rule employs the environment transformation |Γ|, which fil- e read and the value v to which it mapped. Traced events forcee˜ [T] ters occurrences of recursive variables f from Γ, thus making the record the thunk expression e that was forced, its terminal expres- outer layer’s recursive functions unavailable to the inner layer. sion e˜ (i.e., the final term to which e originally reduced), and the trace T that was produced during its evaluation. Thus traces are hi- λcdd Operational Semantics. The basic reduction semantics for ic erarchical: trace events themselves contain traces which are locally S ` e S ; e S proves judgments of the form 1 2 ˜, read as “under 1, consistent—there is no global ordering of all events. This allows e e computation expression reduces to terminal ˜, producing store change propagation to be more compositional, supporting, e.g., S .” The semantics can be found in Section A. 2 ⇓ the sharing, switching and swapping patterns shown in Figures 1a to 1c. DCTs are closely related to the DCGs shown and discussed λcdd 3.2 Meta theory of basic ic in Section 2; the key difference between the two, as the names cdd We show that the λic type system and the basic reduction seman- suggest, is that DCGs are graphs whereas DCTs are trees. Hence, 0 tics enjoy subject reduction. Judgments Γ ` S1 wf and Γ ` Γ used DCTs do not explicitly represent the sharing found in DCGs, but below are defined in Figure 5. are easier to define and reason about formally; a shared sub-graph

4 2014/1/18 Γ ` v : A (Under Γ, value v has value type A.) TYV-VAR TYV-INJ TYV-PAIR TYV-THUNK TYV-REF TYV-UNIT Γ(x) = A exists i in {1, 2} Γ ` v : Ai Γ ` v1 : A1 Γ ` v2 : A2 Γ ` e : C Γ(a) = A

Γ ` x : A Γ ` () : 1 Γ ` inji v : A1 + A2 Γ ` (v1, v2): A1 × A2 Γ ` thunk e : U C Γ ` a : M A

Γ ` e : C (Under Γ, expression e has computation type C.)

TYE-RET TYE-BIND TYE-VAR TYE-LAM TYE-APP ` Γ(f ) = C Γ, x:A ` e : C Γ ` v : A Γ ` e : A C Γ ` v : A Γ ` e1 : F` A Γ, x:A ` e2 :(C) ` Γ ` f : C Γ ` λx.e : A C Γ ` ret v : F` A Γ ` e v : C Γ ` let x e1 in e2 :(C) → TYE-CASE TYE-SPLIT → ← forall i in {1, 2} Γ ` v : A1 × A2 TYE-FORCE TYE-INNER TYE-FIX ` inner Γ ` v : A1 + A2 Γ, xi:Ai ` ei : C Γ, x1:A1, x2:A2 ` e : C Γ, f :C ` e : C Γ ` v : U (C) |Γ| ` e :(C) ` ` Γ ` case (v, x1.e1, x2.e2): C Γ ` split (v, x1.x2.e): C Γ ` fix f .e : C Γ ` force` v :(C) Γ ` inner e :(C)

TYE-REF TYE-GET TYE-SET Γ ` v : A Γ ` v : M A Γ ` v1 : M A Γ ` v2 : A

Γ ` ref v : Fouter M A Γ ` get v : F` A Γ ` set v1 v2 : Fouter 1

← cdd Figure 3: Typing semantics of λic

K; S1 ` e S2; T (Reduction to traces: “Under knowledge K and store S1, e reduces, yielding S2 and trace T”.) INCR-APP INCR-BIND ⇓ trm(T1) = λx.e2 K; S1 ` e1 S2; T1 trm(T1) = ret v K; S1 ` e1 S2; T1 INCR-TERM K, T1; S2 ` e2[v/x] S3; T2 K, T1; S2 ` e2[v/x] S3; T2 K; S ` e˜ S; e˜ K; S1 ` e1 v S3; T1·T2 ⇓ K; S1 ` let x e1 in e2 S3;⇓T1·T2 ⇓ ⇓ INCR-CASE INCR-SPLIT INCR-FIX ⇓ ⇓ ← ⇓ K; S1 ` ei[v/xi] S2; T K; S1 ` e[v1/x1][v2/x2] S2; T K; S1 ` e[fix f .e/f ] S2; T

K; S1 ` case (inji v, x1.e1, x2.e2) S2; T K; S1 ` split ((v1, v2), x1.x2.e) S2; T K; S1 ` fix f .e S2; T ⇓ ⇓ ⇓ INCR-FORCEPROP ⇓ e ⇓ ⇓ forcee˜ 1 [T1] ∈ K INCR-FORCE e e INCR-INNER K; S ` forcee˜ [T1] prop forcee˜ [T2] K; S1 ` e S2; T trm(T) = e˜ 1 y 2 K; S ` e S; T e e K; S1 ` force` (thunk e) S2; forcee˜ [T] K; S ` forceinner (thunk e) S; forcetrm(T2)[T2] K; S ` inner e S; T ⇓ ⇓ INCR-GET INCR-REF ⇓ ⇓ ⇓ a ∈/ dom(S) S1(a) = v INCR-SET a K; S ` ref v S, a:v; ret a K; S1 ` get a S2; getv K; S ` set a v S, a:v; ret ()

⇓ ⇓ ← ⇓ cdd Figure 4: Operational semantics of λic : Reduction (to traces), propagating incremental changes.

` (C) `2 Γ1 ` Γ2 (Under Γ1, context Γ2 is a consistent extension.) Γ ` S wf ( F`1 A) = F`2 A ` ` (A C) = A (C) EXT-CONS (Under Γ, store S is well-formed.) Γ1 ` Γ2 SWF-CONS EXT-REFL → → a fresh for Γ2 Γ ` S wf Γ ` a : M A Γ ` Γ Γ1 ` Γ2, a:A SWF-EMP Γ ` v : A Γ ` ε wf Γ ` S, a:v wf

Figure 5: Auxiliary typing judgements: Layer coercion, context extension and store typing.

5 2014/1/18 in the DCG corresponds to a a subtree in the DCT that is duplicated patch The patching judgement is written as T1{e : T2} ; T3 and read several times (one duplicate per use). trace T1 with the new mapping of e to T2 results in (“patched”) Figure 6 also defines trm(T) as the rightmost element of trace trace T3. T, i.e., its terminal element, equivalent to e˜ in the normal evaluation patch judgment. It also defines when prior knowledge is well-formed. e˜{e : T2} ; e˜ patch 0 0 patch 0 patch 0 (T1·T2){e : T3} ; T1 ·T2 when T1{e : T3} ; T1 , T2{e : T3} ; T2 Reduction to traces. Rules INCR-APP and INCR-BIND are simi- patch ( forcee [T ]) {e : T } ; forcee [T ] lar to a standard semantics, except that they use trm(T) to extract e˜ 1 2 e˜ 2 e1 patch e1 patch the lambda or return expression, respectively, and they add the trace ( forcee˜ [T1]) {e2 : T2} ; forcee˜ [T3] when T1{e2 : T2} ; T3, e1 6= e2 patch T1 from the first sub-expression’s evaluation to the prior knowledge a a ( get ){e : T2} ; get available to the second sub-expression. The traces produced from v v both are concatenated and returned from the entire computation. Conceptually, patching a DCT T simulatenously replaces all occur- Rule INCR-FORCE produces a force event; notice that the ex- rences of a forced thunk’s trace with an update-to-date version. The pression e from the thunk annotates the event, along with the trace definition is above straightforward: All the rules are congruences T and the terminal expression e˜ at its end. Rule INCR-GET sim- except for the first force rule, which performs the actual patching. ilarly produces a get event with the expected annotations. Rules It substitutes the given trace for the existing trace of the forced INCR-TERM,INCR-REF, and INCR-SET all return the expected ter- expression in question, based on the syntactic equivalence of the minal expressions. forced expression e. This means that all force events whose forced Rule INCR-FORCEPROP performs memoization of inner-layer computation is e will be updated “all at once,” simulating the shar- forces by uses change propagation to repair the memoized trace. ing pattern of DCGs. Importantly, we do not initiate change propagation at a set, and In sum, the incremental semantics defined above is a declarative thus we delay change propagation until a computation’s result it is specification for an efficient implementation. Below, we prove that actually demanded. Rule INCR-FORCEPROP non-deterministically this specification is sound, in the sense that it always yields a result chooses a prior trace of a force of the same expression e from K consistent with non-incremental evaluation. In the next section, we (that is, it chooses a memo match for the computation e) and give an efficient algorithmic account of change propagation that recursively switches to the propagating judgement described below. conforms to this specification. The prior trace to choose as the memo match is the first of two non- deterministic decisions of the incremental semantics; the second 4.2 Meta theory of incremental semantics concerns the propagating specification, below. The following theorem says that trace-based runs under empty knowledge in the incremental semantics are equivalent to runs in Propagating changes by checking and patching. The change the basic (non-incremental) semantics. propagation judgment K; S ` T1 yprop T2 updates a trace T1 to be T2 according to knowledge K and the current store S. Theorem 4.1 (Equivalence of blind evaluation). ε; S1 ` e S2; T if and only if S1 ` e S2; e˜ where e˜ = trm(T) K; S ` T1 yprop T2 (Change propagation) Next, we introduce well-formed knowledge, defined as PROP-PATCH ⇓ ⇓ K; S ` e S 0; T 0 Γ ` K wf (Under Γ, knowledge K is well formed.) 0 patch T1{e : T } ; T2 KWF-CONS PROP-CHECKS KWF-EMP √ 0 ⇓ Γ ` K wf Γ ` S1 wf ε; S1 ` e S2; T S ` T K, T ; S ` T2 yprop T3 Γ ` ε wf Γ ` K, T wf K; S ` T yprop T K; S ` T1 yprop T3 ⇓ We now state and prove that the incremental semantics enjoys In the base case (rule PROP-CHECKS), there are no changes remaining to propagate through the given trace, which is consistent subject reduction. with√ the given store, as determined by the checking judgment S ` Theorem 4.2 (Subject reduction). Suppose that Γ ` K wf, Γ ` 0 T (explained shortly). The recursive case (rule PROP-PATCH) S1 wf, Γ ` e : C, and K; S1 ` e S2; T then there exists Γ such 0 0 0 0 arbitrarily chooses an expression e and reduces it to a trace T that Γ ` Γ , Γ ` S2 wf, and Γ ` trm(T): C under the current store S. (The choice of e is the second non- deterministic decision of this semantic specification.) This new Finally, we prove that the incremental⇓ semantics is sound: when subtrace is patched into the current trace according to the patching seeded with (well-formed) prior knowledge, there exists a consis- 0 patch tent run in the basic (non-incremental) semantics. judgment T1{e : T } ; T2. The patched trace T2 is processed 0 recursively under prior knowledge expanded to include T , until Theorem 4.3 (Soundness). Suppose that Γ ` K wf and Γ ` S1 wf. the trace is ultimately made consistent. √ Then K; S1 ` e S2; T if and only if S1 ` e S2; trm(T) The checking judgement, written S ` T , ensures that a trace is consistent with a store. This theorem establishes that every incremental reduction has √ a corresponding⇓ basic reduction, and vice⇓ versa. This correspon- CHECK-TRM S ` e˜ √ √ √ dance establishes extensional consistency, i.e., the initial and final CHECK-SEQ S ` T1·T2 when S ` T1 and S ` T2 e √ √ conditions of the runs are equivalent. CHECK-FORCE S ` forcee˜ [T] when trm(T) = e˜ and S ` T a √ CHECK-GET S ` getv when S(a) = v 5. OCaml library The interesting rules are CHECK-FORCE and CHECK-GET. The first checks that the terminal expression e˜ produced by each force is We have implemented ADAPTON as an OCaml library implements consistent with the one last observed and recorded in the trace; i.e., with the basic API shown in Figure 7. The fundamental data types cdd it matches the terminal expression trm(T) of trace T. The second are aref and athunk, corresponding to M A and U C in λic , respec- rule checks that the value retrieved from an address a is consistent tively. The functions operating on aref and athunk are named after cdd with the current store. their counterparts in λic .

6 2014/1/18 type 'a aref 1 function dirty(node) val aref : 'a 'a aref 2 foreach edge ∈ node.incomingset do val get : 'a aref 'a 3 if ¬ edge.dirty then val set : a aref a unit ' → ' 4 edge.dirty − true; → type a athunk 5 dirty(edge.source); ' → → val force : 'a athunk 'a ← val thunk : (unit 'a) 'a athunk val memo : ( fn arg a) (( arg a athunk) as fn) 6 function propagate(node) ' ' → ' ' ' ' 7 foreach edge ∈ node.outgoinglist do → → 8 if edge.dirty then → → → → Figure 7: Basic ADAPTON API 9 edge.dirty − false; 10 if edge.target is athunk then 11 propagate(← edge.target); The last function, memo, solves a practical implementation is- 12 if edge.label 6= edge.target.value then sue while fixing the memoization choice left open by rule INCR- 13 node.outgoinglist − []; cdd cdd ORCE ROP λ λ F P in ic (Figure 4). In ic , memoization is based on 14 evaluate(node); syntactic equality, and occurs implicitly at force, but we cannot per- 15 return; form syntactic equality checks in OCaml. As such, the memo func- ← tion creates memoized thunk constructors, which are unary func- tions that return athunks. For example, we can define a memoized Algorithm 1: ADAPTON core pseudocode. constructor that computes fibonacci: let memo fib = memo (fun memo fib n ifn <= 1 then1 else force (memo fib (n − 1)) + force (memo fib (n − 2)));; As described in Section 2, the key to making ADAPTON efficient print int (force (memo fib 10));;( 89 ) ∗ ∗→ is to split change propagation into two phases—dirtying and propa- gation. Algorithm 1 lists the pseudocode for these two phases. The memo takes a function of two arguments (here, memo fib and n). It dirtying phase occurs when we make calls to set to update inputs returns a constructor (here, also called memo fib) that, when called to the incremental program. For each call to set, we traverse the with an argument (e.g., 10), first checks a memoization table to see if the constructor was previously called with the same argument. DCG starting from the aref backward, marking all traversed edges as “dirty” (lines 1 to 5). Note that unlike Section 2, in our imple- If not, the constructor creates an athunk that, when forced, com- putes its value by invoking the function with the constructor itself mentation only edges are dirtied; a node is implicitly dirty if any of (to make recursive calls) and the argument passed to the construc- its outgoing edges are dirty. The propagation phase occurs when we make calls to force to tor (e.g., 10); this athunk is then stored in the memoization table demand results from the incremental program. For each call to force and returned. Otherwise, the constructor returns the same athunk as on an athunk, we perform an inorder traversal starting from the before. This simple choice of returning the same athunk is equiv- alent to choosing the most recently patched occurrence of a trace athunk’s dirty outgoing edges, re-evaluating athunks as necessary. We check if we need to re-evaluate an athunk by iterating over its from the prior knowledge in rule INCR-FORCEPROP. To limit the size of memoization tables, we implement them using weak hash dirty outgoing edges in the order they were added (line 7). For each tables, relying on OCaml’s garbage collector to eventually remove dirty edge, we first clean its dirty flag (line 9). If the target node is an athunk, we recursively check if we need to re-evaluate the athunks that are no longer reachable. cdd target athunk (lines 10 to 11). Then, we compare the value of the ADAPTON does not provide an equivalent to inner in λ , since ic target aref or athunk against the label of the outgoing edge (line 12). OCaml’s type system makes it hard to enforce layer separation stat- If the value is inconsistent, we know that at least one input to the ically. In ADAPTON, an inner level computation implicitly begins athunk has changed, so we need to re-evaluate the athunk (line 14), when force is called and ends when the call returns. and need not check its remaining outgoing edges (line 15). In fact, we first remove all its outgoing edges (line 13), since some edges 5.1 ADAPTON change propagation algorithm may no longer be relevant due to the changed input; relevant edges cdd As described in Section 4, λic leaves open the choice of an ex- will be re-added when get or force is called during re-evaluation. pression e to patch in rule PROP-PATCH, allowing several possible (We store incoming edges in weak hash tables, relying on OCaml’s change propagation algorithms. ADAPTON implements an efficient garbage collector to remove irrelevant edges.) change propagation algorithm that avoids re-evaluating thunks un- Note that, in both dirtying and propagation, we only traverse an necessarily. edge if it is clean or dirty, respectively. We can do so because the ADAPTON operates on an acyclic demanded computation graph above procedures maintain an invariant that, if an edge is dirty at the (DCG), corresponding to demanded computation traces T. Simi- end of a dirtying or propagation phase, then all edges transitively lar to the visualization in Section 2, each node in the graph rep- reachable by traversing incoming edges beginning from the source resents an aref or an athunk, and each directed edge represents a node will also be dirty; dually, if an edge is clean, then all edges dependency pointing from an athunk to another aref or athunk. The transitively reachable by traversing outgoing edges beginning from DCG is initially empty at the beginning of the execution. Nodes are the target node will also be clean. This greatly improves efficiency added to the DCG whenever a new aref or athunk is created via aref, by amortizing dirtying costs across consecutive calls to set, and thunk, or a memo constructor (when memoization misses). Edges propagation cost across consecutive calls to force. are added when an athunk calls get or force; edges are labeled with By traversing the DCG inorder, ADAPTON re-evaluates incon- the value returned by that call. We maintain edges bidirectionally: sistent nodes in the same order as a standard non-incremental lazy Each node stores an ordered list of outgoing edges that is appended evaluator would force thunks. Therefore, ADAPTON avoids re- by each call to get or force, as well as an unordered set of incoming evaluating nodes unnecessarily, such as athunks that were condi- edges. This allows us to traverse the DCG from caller to callee or tionally forced due to certain inputs, but may no longer be forced vice-versa. under updated inputs.

7 2014/1/18 6. Empirical evaluation ADAPTON EagerTotalOrder speed-up over speed-up over This section evaluates ADAPTON’s performance against traditional LazyNonInc LazyNonInc pattern IC on micro benchmarks and larger example modeling a spread- input # sheet. We find that ADAPTON provides reliable speedups over naive filter 1e6 12.8 2.24 recomputation and often significantly outperforms traditional IC. map 1e6 7.80 1.53

quicksort lazy 1e5 2020 22.9 6.1 Micro-benchmark mergesort 1e5 336 0.148 filter 1e6 1.99 0.143 We ran a micro-benchmark to evaluate the effectiveness of ADAP- map 1e6 2.36 0.248 TON in handling incremental programs that are lazy, as well as fold(min) 1e6 472 0.123 those that use the swapping and switching patterns. We also eval- fold(sum) swap 1e6 501 0.128 uate it on incremental batch programs, the target of traditional IC exptree 1e6 667 10.1 where there is no repetition in the input and the entire output is updown1 4e4 22.4 0.00247 eagerly demanded. updown2 4e4 24.7 4.28 switch filter 1e6 2.04 4.11 • For lazy programs, we include standard list-manipulating bench- map 1e6 2.21 3.32 marks from the IC literature: filter, map, quicksort, and merge- fold(min) 1e6 4350 3090 sort. We run each program on a randomly generated list of fold(sum) batch 1e6 1640 4220 integers, and then demand the first item from the output list. exptree 1e6 248 746 • For the swapping pattern, we use filter, map, fold applying min Table 1: ADAPTON micro-benchmark results. and sum, and exptree, an arithmetic expression tree evaluator similar to that in Section 2. We run each program on a randomly generated list of integers (or balanced expression tree) and demand the output. Then we randomly split the input into two the input size, we run 250 pairs of cycles, e.g., alternating between lists (or pick two subtrees at the half the height), swap those removing and inserting a list item, to ensure consistent input size. parts, and then re-demand the entire output. In our initial evaluation, we observed that EagerTotalOrder spends a significant portion of time in the garbage collector (some- • For the switching pattern, we wrote two programs, updown1 times more than 50%), which has also been observed in prior and updown2, that sort a list of integers in either ascending or work [3]. To mitigate this issue, we tweak OCaml’s garbage col- descending order depending on another input. updown1 does the lector under EagerTotalOrder, increasing the minor heap size from obvious thing, sorting the input list in one direction or the other, 2MB to 16MB and major heap increment from 1MB to 32MB. whereas updown2 first sorts the input list in both directions and then returns the appropriate one. (As we will show in the Results. Table 1 summarizes the speed-up of ADAPTON EagerTo- results, the odd structure of updown2 is necessary to achieve talOrder when performing each incremental change-then-propagate a speed-up in traditional IC.) After sorting an initial list of computation over LazyNonInc. We also highlight table cells in gray integers and demanding the first output, we randomly remove to indicate whether ADAPTON or EagerTotalOrder has a higher an item, randomly insert an item, toggle the sort direction, and speed-up. then demand the first output again. We can see that ADAPTON provides a speed-up to all patterns • Finally, for batch programs, we use the same programs as the and programs. Also, ADAPTON is faster than EagerTotalOrder for swapping pattern, but instead of swapping parts of the input, the lazy, swapping, and switching patterns. These results validates we randomly insert or remove a single list item (or replace a the benefits of our approach. leaf node in an expression tree with a binary node with two leaf For the batch pattern, ADAPTON gets only about half the speed- nodes or vice-versa) before demanding the entire output. up of EagerTotalOrder. This is expected, since EagerTotalOrder is optimized for the batch pattern, whereas ADAPTON adds over- We measure the time it takes to run ADAPTON in comparison head that is unnecessary if all outputs are demanded. Interestingly, to other variants that implement the same API. First, to compare ADAPTON is faster for fold(min), since single changes are not as against prior IC work, we implemented EagerTotalOrder, which likely to affect the result of the min operation as compared to other uses the traditional, totally ordered, monolithic form of IC (in operations such as sum. particular, [2]). Second, as baseline, we compare against standard Conversely, EagerTotalOrder actually incurs a slowdown over lazy programs with LazyNonInc, which wraps lazy values and does LazyNonInc in many other cases. For lazy mergesort, EagerTo- not provide incremental semantics or memoization. In particular, talOrder performs badly due to limited memoization between each 'a aref and 'a athunk are simply records containing 'a lazy and a internal recursion in mergesort. Prior work solved this problem unique ID, set throws an exception (thus requiring the program to by asking programmers to manually modify mergesort using tech- recompute the results from scratch), and memo just calls thunk. niques such as adaptive memoization [3] or keyed allocation [17]; We compile the micro-benchmarks using OCaml 4.00.1 and run we are currently investigating alternative approaches. them on an 8-core, 2.26 GHz Intel Mac Pro with 16 GB of RAM EagerTotalOrder also incurs slowdowns for swapping and switch- running Mac OS X 10.6.8. We run 2, 4, or 8 programs in parallel, ing, except for exptree and updown2. Unlike ADAPTON, EagerTo- depending on the memory usage of the particular program. For talOrder can only memo-match about half the input on average for most programs, we choose input sizes of 1e6 items. For quicksort changes considered by swapping due to its underlying total order- and mergesort, we choose 1e5 items, and for updown1 and updown2, ing assumption, and has to recompute the rest. we choose 4e4 items, since these programs use up to 6GB of For updown1 in particular, the structure of the computation trace memory under EagerTotalOrder. We report the average of 8 runs is such that EagerTotalOrder cannot memo-match any prior compu- for each program using seeds 1–8 to initialize OCaml’s random tation at all, and has to re-sort the input list every time the flag input number generator (to generate the input data, seed hash functions, is toggled. updown2 works around this limitation by uncondition- etc.), and each run consists of 250 change-then-propagate cycles for ally sorting the input list in both directions before returning the changes that do not affect the input size. For changes that do affect appropriate one, but this effectively wastes half the computation. In

8 2014/1/18 contrast, ADAPTON is equally effective for updown1 and updown2: '!" %#" '()*+,-" It is able to memo-match the computation in updown1 regardless of )*+,-./" %!" &" .)/012,+)341(01" the flag input, and, due to laziness, incurs no cost to unconditionally 0+1234.-+563*23" sort the input list twice in updown2. %" $#"

Other experiments. In addition to the running times, we also $" $!" measured the memory consumed by ADAPTON and EagerTo- talOrder.ADAPTON uses 3–98% less memory than EagerTo- #" #"

talOrder for the lazy, swapping, and switching patterns, and it uses Speedup (over Naive) !" !" 103–120% more memory for the batch pattern. Our supplemental #" (" '#" %" &" $%" $&" %%" %&" technical report contains a more detailed table summarizing mem- Num. Sheets (10 Changes) Num. Changes (15 Sheets) ory usage as well as other results of our experiment. 2 Finally, we also ran ADAPTON on quicksort while varying the Figure 8: ADAPTON Spreadsheet (AS ) performance tests. demand size (recall that in Table 1, one element is demanded for the lazy benchmarks). As expected, the speed-up decreases as demand size increases, but ADAPTON still outperforms EagerTotalOrder number of sheets grows, the benefit of ADAPTON increases. In fact, when demanding up to 1.8% of the output for quicksort. We also ob- our measurements show that with only four sheets, the performance served that the dirtying cost also increases with demand size. This of the naive approach is overtaken by ADAPTON; the gap widens is due to the interplay between dirtying and propagation phases: As exponentially for more sheets. By contrast, EagerTotalOrder offers more output is demanded, more edges will be cleaned by the prop- no incremental benefit, and the performance is always worse than agation phase, and will have to be dirtied by the dirtying phase. the naive implementation, resulting in slowdowns that worsen as The Appendix contains more details of these observations and input sizes grow. We note that the speedups vary, depending on our elapsed-time experiments. random choices made by both scramble-one and scramble-all.1 The right plot shows that even for a fixed-sized spreadsheet, as 6.2 AS2: An experiment in stateless spreadsheet design the number of changes grows, the benefit of ADAPTON increases As a more realistic case study of ADAPTON, we developed the exponentially. As with the left plot, EagerTotalOrder again offers 2 ADAPTON SpreadSheet (AS ), which implements basic spread- no incremental benefit, and always incurs a slowdown (e.g., at the sheet functionality and uses ADAPTON to handle all change prop- right edges of each plot, we consistently measure slowdowns of agation as cells are updated. This is in contrast to conventional 100x or more). spreadsheet implementations, which have custom caching and de- In both cases (more sheets and more changes), ADAPTON offers pendence tracking logic. significant speedups over the naive stateless evaluator. These per- In AS2, spreadsheets are organized into a standard three- formance results makes sense: efficient AS2 evaluation relies criti- dimensional coordinate system of sheets, rows and columns, and cally on the ability to reuse results multiple times within a compu- each spreadsheet cell contains a formula. The language of formu- tation (the sharing pattern). While ADAPTON supports this incre- lae extends that of Section 2 with support for cell coordinates, mental pattern with ease, EagerTotalOrder fundamentally lacks the arbitrary-precision rational numbers and binary operations over ability to do so, and instead only incurs a large performance penalty them, and aggregate functions such as max, min and sum. It also for its dependence-tracking overhead. adds a command language for navigation (among sheets, rows and columns), cell mutation and display. For instance, the following 7. Related Work script simulates the interaction from Section 2: Incremental computation. The idea of memoization—improving goto A1; =1; goto A2; =2; goto A3; =3; efficiency by caching and reusing the results of pure computations— goto B1; =(A1 + A2); goto B2; =(A1 + A3); dates back to at least the late 1950’s [9, 29, 30]. Incremental com- display B1; display B2; goto A1; =5; display B1; putation (IC), which has also been studied for decades [1, 21, 28, goto B2; =(A3 + B1); display B2; 31, 32], uses memoization to avoid unnecessary recomputation The explicit state of AS2 consists simply of a mutable mapping when input changes. Early IC approaches were promising, but are of three-dimensional coordinates to cell formulae. We empirically limited to caching final results. study different implementations of AS2 using a test script that Self-adjusting computation is a recent approach to IC that uses a simulates a user loading dense spreadsheet (with s sheets, ten rows special form of memoization that caches and reuses portions of dy- and ten columns) and making a sequence of c random cell changes namic dependency graphs (DDGs) of a computation. These DDGs while observing the (entire) final sheet s: are generated from conventional-looking programs with general re- cursion and fine-grained data dependencies [4, 11]. As a result, self- scramble-all; goto s!a1; print; repeat c{scramble-one; print} adjusting computation tolerates store-based differences between the pending computation being matched and its potential matches The scramble-all command initializes the formulae of each sheet in the memo table; change-propagation repairs any inconsisten- such that sheet 1 holds random constants (uniformly in [0, 10k]), cies in the matched graph. Researchers later combined these dy- and for i > 1, sheet i consists of binary operations (drawn uni- namic graphs with a special form of memoization, making the ap- formly among {+, −, ÷, ×}) over two random coordinates in sheet proach even more efficient and broadly applicable [2]. More re- i − 1. scramble-one changes a randomly chosen cell to a random cently, researchers have studied ways to make self-adjusting pro- constant. grams easier to write and reason about [12, 13, 26] and better per- Figure 8 shows the performance of this test script. In the left forming [19, 20]. plot, the number of sheets varies, and the number of changes is ADAPTON is similar to self-adjusting computation in that it fixed at ten; in the right plot, the number of sheets is fixed at applies to a conventional-looking language and tracks dynamic fifteen, and the number of changes varies. In both plots, we show dependencies. However, as discussed in Sections 1 and 2, we the relative speedup/slowdown of ADAPTON and EagerTotalOrder to that of naive, stateless evaluation. The left plot shows that as the 1 We plot an average of eight randomized runs for each coordinate.

9 2014/1/18 make several advances over prior work in the setting of interac- [3] U. A. Acar, G. E. Blelloch, M. Blume, and K. Tangwongsan. An ex- tive, demand-driven computations. First, we formally characterize perimental analysis of self-adjusting computation. In Proceedings of the semantics of the inner and outer layers working in concert, the ACM Conference on Programming Language Design and Imple- whereas all prior work simply ignored the outer layer (which is mentation, 2006. problematic for modeling interactivity). Second, we offer a com- [4] U. A. Acar, G. E. Blelloch, and R. Harper. Adaptive functional positional model that supports several key incremental patterns— programming. ACM Transactions on Programming Languages and sharing, switching, and swapping. All prior work on self-adjusting Systems (TOPLAS), 28(6):990–1034, 2006. computation, which is based on maintaining a single totally ordered [5] U. A. Acar, A. Ihler, R. Mettu, and O. Sumer.¨ Adaptive Bayesian view of past computation, simply cannot handle these patterns. inference. In Neural Information Processing Systems (NIPS), 2007. Ley-Wild et al. have recently studied non-monotonic changes [6] U. A. Acar, G. E. Blelloch, K. Tangwongsan, and D. Turko¨ glu.˘ Ro- (viz., what we call “swapping”), giving a formal semantics and pre- bust kinetic convex hulls in 3D. In Proceedings of the 16th Annual liminary algorithmic designs [25, 27]. However, these semantics European Symposium on Algorithms, Sept. 2008. still assume a totally ordered, monolithic trace representation and [7] U. A. Acar, A. Cotter, B. Hudson, and D. Turko¨ glu.˘ Dynamic well- hence are still of limited use for interactive settings, as discussed in spaced point sets. In Symposium on Computational Geometry, 2010. Section 1. For instance, their techniques explicitly assume the ab- [8] W. W. Armstrong. Dependency structures of data base relationships. sence of sharing (they assume all function invocations have unique In IFIP Congress, pages 580–583, 1974. arguments), and they do not support laziness, which they leave for [9] R. Bellman. Dynamic Programming. Princeton Univ. Press, 1957. future work. Additionally, to our knowledge, their techniques have [10] M. A. Bender, R. Cole, E. D. Demaine, M. Farach-Colton, and J. Zito. no corresponding implementations. Two simplified algorithms for maintaining order in a list. In ESA 2002, Functional reactive programming (FRP). The chief aim of FRP pages 152–164. Springer, 2002. is to provide a declarative means of specifying interactive and/or [11] M. Carlsson. Monads for incremental computing. In International time-varying behavior [14, 15, 22]. FRP-based proposals share Conference on , pages 26–35, 2002. some commonalities with incremental computation; e.g., when an [12] Y. Chen, J. Dunfield, M. A. Hammer, and U. A. Acar. Implicit input signal is updated (due to an event like a key press, or simply self-adjusting computation for purely functional programs. In Int’l the passage of time), dependent computations are updated as well, Conference on Functional Programming (ICFP ’11), pages 129–141, and this update process may take advantage of memoization. Sept. 2011. However, FRP’s notion of incremental change is implicit, as part [13] Y. Chen, J. Dunfield, and U. A. Acar. Type-directed automatic incre- of its evaluation model, rather than an explicit mechanism one can mentalization. In ACM SIGPLAN Conference on Programming Lan- program with, as with an IC system like ADAPTON. While it may guage Design and Implementation (PLDI), Jun 2012. be possible, it is hard to imagine writing an efficient incremental [14] G. H. Cooper and S. Krishnamurthi. Embedding dynamic dataflow in a sorting algorithm using FRP. On the other hand, IC would seem call-by-value language. In In European Symposium on Programming, to be an appropriate mechanism for implementing an FRP engine. pages 294–308, 2006. As such, we have begun to experiment with an IC-based imple- [15] E. Czaplicki and S. Chong. Asynchronous functional reactive pro- mentation of FRP using ADAPTON’s abstractions, and plan further gramming for GUIs. In PLDI, 2013. explorations in future work. More details about this FRP library are [16] P. F. Dietz and D. D. Sleator. Two algorithms for maintaining order available in Appendix E. in a list. In Proceedings of the 19th ACM Symposium on Theory of Computing, pages 365–372, 1987. 8. Conclusion [17] M. Hammer and U. A. Acar. Memory management for self-adjusting computation. In International Symposium on Memory Management, Within the context of interactive, demand-driven scenerios, we pages 51–60, 2008. identify key limitations in prior work on incremental computation. [18] M. Hammer, U. A. Acar, M. Rajagopalan, and A. Ghuloum. A pro- Specically, we show that certain idiomatic patterns naturally arise posal for parallel self-adjusting computation. In DAMP ’07: Declara- that result in incremental computations being shared, switched and tive Aspects of Multicore Programming, 2007. swapped, each representing in an “edge case” that past work cannot [19] M. Hammer, G. Neis, Y. Chen, and U. A. Acar. Self-adjusting stack handle efficiently. These limitations are a direct consequence of machines. In ACM SIGPLAN Conference on Object-Oriented Pro- past works’ tacit assumption that the maintained cache enabling gramming, Systems, Languages, and Applications (OOPSLA), 2011. incremental reuse is monolithic and totally-ordered. [20] M. A. Hammer, U. A. Acar, and Y. Chen. CEAL: a C-based language To overcome these problems, we give a new, more composable for self-adjusting computation. In ACM SIGPLAN Conference on approach that naturally expresses lazy (ie., demand-driven) evalu- Programming Language Design and Implementation, 2009. ation that uses the notion of a thunk to identify reusable units of [21] A. Heydon, R. Levin, and Y. Yu. Caching function calls using precise computation. This new approach naturally supports the idioms that dependencies. In Programming Language Design and Implementa- were previously problematic. We executed this new approach both tion, pages 311–320, 2000. formally, as a core calculus that we prove is always consistent with [22] N. R. Krishnaswami and N. Benton. A semantic model for graphical full-reevaluation, as well as practically, as an OCaml library (viz., user interfaces. In M. M. T. Chakravarty, Z. Hu, and O. Danvy, editors, ADAPTON). We evaluated ADAPTON on a range of benchmarks, ICFP, pages 45–57. ACM, 2011. ISBN 978-1-4503-0865-6. showing that it provides reliable speedups, and in many cases dra- [23] P. Levy. Call-by-push-value: A subsuming paradigm. Typed Lambda matically outperforms prior state-of-the-art IC approaches. Calculi and Applications, pages 644–644, 1999. [24] P. B. Levy. Call-by-push-value: A Functional/imperative Synthesis, References volume 2. Springer, 2003. [1] M. Abadi, B. W. Lampson, and J.-J. Levy.´ Analysis and caching of de- [25] R. Ley-Wild. Programmable Self-Adjusting Computation. PhD the- pendencies. In International Conference on Functional Programming, sis, Department, Carnegie Mellon University, Oct. pages 83–91, 1996. 2010. [2] U. A. Acar, G. E. Blelloch, M. Blume, R. Harper, and K. Tang- [26] R. Ley-Wild, U. A. Acar, and M. Fluet. A cost semantics for self- wongsan. A library for self-adjusting computation. Electronic Notes adjusting computation. In Proceedings of the 26th Annual ACM in Theoretical Computer Science, 148(2), 2006. Symposium on Principles of Programming Languages, 2009.

10 2014/1/18 S ` e S ; e ( ) sched 1 2 ˜ Basic reduction S; e ` T ; p (Schedule: Under S and e, p is next in T.) EVAL-APP CHED SEQ ⇓ S1 ` e1 S2; λx.e2 S - 2 EVAL-TERM S ` e [v/x] S ; e˜ sched 2 2 3 SCHED-SEQ1 S; e ` T1 ; ◦ S ` e˜ S; e˜ S1 ` e1⇓v S3; e˜ SCHED-TERM sched sched S; e1 ` T1 ; e2 S; e ` T2 ; p ⇓ EVAL-BIND sched sched sched ⇓ ⇓ S; e ` e˜ ; ◦ S; e1 ` T1·T2 ; e2 S; e ` T1·T2 ; p S1 ` e1 S2; ret v EVAL-CASE S ` e [v/x] S ; e˜ S ` ei[v/xi] S ; e˜ 2 2 3 1 2 SCHED-FORCEOK S1 ` let x ⇓e1 in e2 S3; e˜ S1 ` case (inji v, x1.e1, x2.e2) S2; e˜ trm(T) = e˜ SCHED-PATCHFORCE ⇓ ⇓ sched EVAL-SPLIT EVAL-FIX trm(T) 6= e˜ S; e2 ` T ; p ← ⇓ ⇓ S1 ` e[v1/x1][v2/x2] S2; e˜ S1 ` e[fix f .e/f ] S2; e˜ e2 sched e2 sched S; e1 ` forcee˜ [T] ; e1 S; e1 ` forcee˜ [T] ; p S1 ` split ((v1, v2), x1.x2.e) S2; e˜ S1 ` fix f .e S2; e˜ ⇓ ⇓ SCHED-PATCHGET SCHED-GETOK EVAL-FORCE ⇓ EVAL-INNER ⇓ S(a) 6= v S(a) = v S1 ` e S2; e˜ S ` e S; e˜ a sched a sched S1 ` force` (thunk e) S2; e˜ S ` inner e S; e˜ S; e ` getv ; e1 S; e ` getv ; ◦ ⇓ ⇓ EVAL-REF EVAL-GET ⇓ ⇓ a ∈/ dom(S) S1(a) = v K; S ` T1 yalg T2 (Algorithm: determinisitically patches T1 into T2.) S ` ref v S, a:v; ret a S ` get a S ; ret v 1 2 ALG-PATCHSTEP sched EVAL-SET S; e1 ` T1 ; e2 ⇓ ⇓ 0 K; S ` e2 S; T S ` set a v S, a:v; ret () patch ( forcee1 [T ]) {e : T 0} ; T e˜ 1 1 2 2 ALG-DONE 0 ⇓ ← ⇓ sched K, T ; S ` T2 yalg T3 cdd S; e ` T ; ◦ Figure 9: Basic reduction semantics of λic (i.e., non-incremental e1 K; S ` force [T1] alg T3 evaluation). K; S ` T yalg T e˜ 1 y

[27] R. Ley-Wild, U. A. Acar, and G. E. Blelloch. Non-monotonic self- Figure 10: A deterministic schedule for patching. adjusting computation. In ESOP, pages 476–496, 2012. [28] Y. A. Liu and T. Teitelbaum. Systematic derivation of incremental programs. Sci. Comput. Program., 24(1):1–39, 1995. sources of nondeterminism in the formal semantics (i.e., it deter- [29] J. McCarthy. A basis for a mathematical theory of computation. In ministically schedules patching order, while still leaving open the P. Braffort and D. Hirschberg, editors, Computer Programming and choice of which prior trace to patch). Formal Systems, pages 33–70. North-Holland, Amsterdam, 1963. In general, some orders are preferable to others: Some patching [30] D. Michie. “Memo” functions and machine learning. Nature, 218: steps may fix inconsistencies in sub-traces that ultimately are not 19–22, 1968. relevant under the current store. The algorithmic semantics given [31] W. Pugh and T. Teitelbaum. Incremental computation via function here addresses this problem by giving a deterministic order to caching. In Principles of Programming Languages, pages 315–328, patching, such that no unnecessary patching steps occur. The other 1989. source of non-determinism in the semantics arises from rule INCR- [32] G. Ramalingam and T. Reps. A categorized bibliography on incre- FORCEPROP. The specification allows an arbitrary choice of trace mental computation. In Principles of Programming Languages, pages to patch from prior knowledge; in general, many instances of a 502–510, 1993. forced expression e may exist there, and it is difficult to know, a [33] A. Shankar and R. Bodik. DITTO: Automatic incrementalization of priori, which trace will be most efficient to patch under the current data structure invariant checks (in Java). In Programming Language store. We address this problem in our implementation, given in Design and Implementation, 2007. Section 5. Figure 10 defines K; S ` T1 yalg T2 , the algorithmic propa- A. Basic reduction semantics gation judgment, which employs an explicit, deterministic patch- ing order. This order is determined by the scheduling judgment The judgement in Figure 9 gives the basic reduction semantics of sched λcdd (i.e., non-incremental evaluation). The judgement is written S; e ` T ; p, where e is the nearest enclosing thunk, and p is ic either some thunk e 0 or else is ◦ if trace T is consistent. as S1 ` e S2; e˜, and can be read as “under S1, computation expression e reduces in n steps to terminal e˜, producing store S2”. optional thunk p ::= ◦ | e ⇓ The scheduling judgement chooses p based on the inconsisten- B. Scheduled patching cies of the trace T and store S. Rule SCHED-TERM is the base case; The incremental semantics given in Section 4 is sound, but not rule SCHED-SEQ1 and rule SCHED-SEQ2 handle the sequencing yet an algorithm. Meanwhile, the change propagation algorithm in cases where there is a thunk to patch is in the left sub-trace, or not, Section 5.1 provides a deterministic algorithm, but it is not imme- respectively. For forces and gets, the respective rules check that the diately clear how to relate it to the (non-deterministic) formal se- force or get event is consistent. If so, rules SCHED-FORCEOK and mantics. In service of relating ADAPTON to the formal semantics, SCHED-GETOK recursively check the trace enclosed within those this section provides a determinisitic variant of the formal seman- events. rule SCHED-FORCEOK additionally places the thunk asso- tics that is faithful to ADAPTON, which resolves one of the two ciated with the enclosed trace as e to the left of the turnstile, since

11 2014/1/18 that is the nearest enclosing thunk for that trace. Otherwise, if the CBV big-step semantics. Figure 12 gives the judgement for big- force or get event is inconsistent, rules SCHED-PATCHFORCE and step evaluation of CBV terms under a store S˙. SCHED-PATCHGET schedule the nearest enclosing thunk to patch. CBV type-directed translation. Figure 13 gives the judgement for Algorithmic change propagation K; S ` T1 yalg T2 closely translating CBV types into corresponding CBPV types. Figure 14 resembles the nondeterministic specification K; S ` T T 1 yprop 2 gives the judgement for translating CBV terms into corresponding from Section 4. It gives rules for successively scheduling a thunk CBPV value terms. Figure 15 gives the judgement for translating T T and patching the associated trace 1 into 2. We note that the root CBV terms into corresponding CBPV value terms. Figure 16 gives of the incoming and outgoing trace always consists of a force event the judgement for translating CBV stores into corresponding CBPV (i.e., forcee [T ] ); this invariant is preserved by the propagation e˜ 1 1 stores. algorithm, and its use in rule INCR-FORCEPROP, which is the only evaluation rule that uses this judgement as a premise. The Meta theory. We show several simple results: rule ALG-PATCHSTEP does all the work of the algorithm: it sched- Theorem C.1 (CBV subject reduction). ules the next thunk e to patch, recomputes this thunk yielding a 2 Suppose that: new trace T 0, patches in the new trace for occurrences of e in the e • G ` S˙ current trace ( forcee˜ 1 [T1] ), and repeats this process, which termi- 1 1 nates with rule ALG-DONE. The premise to rule ALG-DONE is • G1 `` M : τ justified by the following lemma, which relates this base case to • S˙1 ` M S˙2; V the specification semantics: then there exists G2 such that G1 ` G2 and Lemma B.1 (No scheduled thunk implies check). ⇓ √ • G2 ` S˙2 S; e ` T sched; ◦ implies S ` T • G2 `` V : τ The result below says that our algorithmic semantics is sound Theorem C.2 (CBV typing implies CBPV translation). with respect to the specification semantics. comp • If G ` M : τ then G ` M : τ ; Γ ` e : C Theorem B.2 (Algorithmic semantics is sound). ` ` • val If K; S ` T1 yalg T2 then K; S ` T1 yprop T2 If G `` V : τ then G ` V : τ ; (Γ ` v : A) Theorem C.3 (CBPV translation and CBV reduction commute). C. From CBV into CBPV Suppose that: cdd This section presents a call-by-value (CBV) variant of λic that • G ` S˙1 ; Γ ` S1 shares a close correspondance to the CBPV variant presented in comp • G `` M1 : τ ; Γ ` e : C the main body of the paper. In particular, we present syntax, typing • S˙ ` M S˙ ; V and reduction rules in a CBV style, which are otherwise analogous 1 1 2 to the CBPV presented in Section 3. We connect the CBV variant then there exists extended contexts Γ 0 and G 0 with G ` G 0 and cdd 0 presented here to the CBPV variant of λic via a type-directed Γ ` Γ such that:⇓ translation. We show that this translation is preserved by the basic, 0 0 • G ` S˙2 ; Γ ` S2 non-incremental reduction semantics of both variants: If a CBV comp • 0 0 term M translates to a CBPV e, and if M reduces to a value V, G `` V : τ ; Γ ` e˜ : C then e reduces to a CBPV terminal e˜ to which value V translates. • S1 ` e S2; e˜ CBV syntax. Unlike CBPV syntax, CBV syntax does not syntac- D. Empirical⇓ evaluation in detail tically separate values and expressions; rather, values are a syntac- tic subclass of expressions. Second, CBV treats lambdas as val- D.1 Micro-benchmark ues. Finally, since the CBV type syntax is missing an analogue This section provide more details about the micro-benchmark de- cdd to F` A, which carries a layer annotation in λic , we instead af- scribed in Section 6.1. In addition to LazyNonInc, we also com- ` pared against a second baseline, EagerNonInc, which implements τ fix this layer annotation to the type connectives arrow ( 1 standard eager evaluation without incremental semantics or memo- ` ` τ2), thunk (U τ1) and reference cells ( M τ1 ). The other differ- ization. It is similar to LazyNonInc, except that 'a aref and 'a athunk ences to the syntax and typing rules reflect these key distinctions:→ contains 'a rather than 'a lazy. For ADAPTON and EagerTotalOrder, we also measured the time it takes to perform a computation from CBV values V ::= () | λx.M | inji V | (V1,V2) | thunk M | a scratch, i.e., when there is no prior knowledge and the output is demanded for the first time. Lastly, we also measured the memory CBV terms M ::= V | x | M1 M2 | let x M1 in M2 usage at the end the from-scratch computation, or after completing | fix f .M | inji M | case (M, x1.M1, x2.M2) | (M ,M ) | fst M | snd M | force M 250 or 500 change-then-propagate cycles. 1 2 ← ` | inner M | ref M | get M | set M1 M2 D.2 Micro-benchmark results ` Table 2 shows our results. For EagerNonInc and LazyNonInc, we re- τ ::= 1 | τ + τ | τ × τ | τ ←τ | U ` τ CBV types 1 2 1 2 1 2 port the wall-clock time that it takes to run the program. For ADAP- ` | M τ TON and EagerTotalOrder, we report overhead, the time it takes CBV typing env. G ::= ε | G, x:τ | G, f :τ | G, a:τ → to run the from-scratch computation relative to EagerNonInc and LazyNonInc, and speed-up, the time it takes to run each incremen- CBV store S˙ ::= ε | S,˙ a:M tal change-then-propagate cycle, also relative to EagerNonInc and LazyNonInc. For memory usage, we report the maximum heap size CBV typing. Figure 11 gives the judgement for typing CBV terms as reported by OCaml’s garbage collector. We also highlight table under a typing context G and layer `. cells in gray to indicate whether ADAPTON or EagerTotalOrder has

12 2014/1/18 ilo nu tmnest epoesd n iial for similarly and for effective processed, quite be also to is It needs item input million than faster times for check sanity a EagerTotalOrder A Also, programs. and incre- an performing when memory less computation. mental uses or speed-up higher a h aesince case the EagerTotalOrder uaina l,adhst esr h nu iteeytm h flag the time every list input the re-sort toggled. to is has input and all, at putation that such is rest. the recompute to has and assumption, ing by considered changes talOrder ing opportunities. memoization better dencies for approaches modify allocation manually keyed to programmers asking mergesort by problem this recur- solved internal each between for memoization in limited sion to due For badly cases. other many A min Interestingly, cases. fold(min) some in for Nonetheless, significant demanded. are A outputs programs in all all phase as cost propagation unnecessary a (as an just computation than incremental rather EagerTotalOrder 5.1), an Section perform in to described phase propagation a A because and mergesort fold(sum) fold(sum) fold(min) fold(min) quicksort updown2 updown1 xetfor except , o the For A that see can We For EagerTotalOrder Conversely, exptree exptree mergesort prto scmae oohroeain uhas such operations other to compared as operation eedoverhead: Legend filter filter filter map map map updown1 mergesort 8 rmdtbsscnb sdt ytmtclyidentify systematically to used be can databases from [8] a nymm-ac bu afteipto vrg for average on input the half about memo-match only can ic hne r o slkl oafc h euto the of result the affect to likely as not are changes since , ouetcnqe uhas such techniques use to batch EagerTotalOrder swap lazy pattern fmr upteeet eedmne. ro work Prior demanded.) were elements output more if switch batch EagerTotalOrder 4e4 4e4 1e6 1e6 1e6 1e6 1e6 1e6 1e6 1e6 1e6 1e6 1e5 1e5 1e6 1e6 λ DAPTON exptree DAPTON a bu aftesed eepce hst be to this expected We speed. the half about —at o the For . ic cdd npriua,tesrcueo h optto trace computation the of structure the particular, in EagerTotalOrder o the for updown2 input # SmlryA (Similarly . 1] eaecretyivsiaigalternative investigating currently are We [17]. atr,A pattern, EagerNonInc lazy npriua,w eiv that believe we particular, in , loicr lwon for slowdowns incurs also 1 all all 1 demand # DAPTON DAPTON ent htA that note we , .0 19.3 15.2 0.409 0.198 153 0.145 155 153 0.868 0.143 35.0 0.793 184 0.863 EagerNonInc and swapping iemem time lazy *NonInc lazy .7255 240 1.97 232 1.92 1.42 255 240 2.44 232 2.43 155 1.79 49.2 1.05 1.04 184 1.35 s (MB) (s) rvdsasedu htcnsilb quite be still can that speed-up a provides batch novsadryn hs nadto to addition in phase dirtying a involves ok rudti iiainb uncon- by limitation this around works quicksort updown2 , DAPTON mergesort swapping antmm-ac n ro com- prior any memo-match cannot atr,tedryn hs becomes phase dirtying the pattern, rvdsasedu oalpatterns all to speed-up a provides ie/fo-cac iespeed-up: time from-scratch / time sfse n ssls eoythan memory less uses and faster is sotmzdfrthe for optimized is for DAPTON u oisudryn oa order- total underlying its to due culyicr lwon in slowdowns incurs actually .006596.7 0.00000685 map nieA Unlike . .001 96.7 0.0000116 and dpiememoization adaptive osntproma elas well as perform not does , LazyNonInc iemem time and , s (MB) (s) .368.63 0.0326 8.63 0.0328 18.6 0.0741 DAPTON EagerTotalOrder .0 152 0.307 232 157 0.894 152 0.502 0.308 157 0.629 50.8 0.346 mergesort ic nyoeoto a of out one only since .1180 179 1.11 1.04 180 179 1.48 232 1.43 1.20 loicr slowdown a incurs also DAPTON switching swapping al :A 2: Table DAPTON ucinldepen- functional soe million a over is . batch .008 .1264 3.61 0.0000184 sfse for faster is atrs As patterns. sum .00526 264 2.63 0.000035 ae Lazy Eager , and vredmem overhead performs EagerTo- DAPTON from-scratch . pattern, .42. 73.4 21.9 73.4 1.74 21.8 3.60 1630 34.0 1610 19.1 934 36.3 19.6 913 10.5 6.63 20.7 12.0 1630 30.5 1610 18.4 934 31.2 18.3 915 10.8 7.22 380 20.7 12.4 161 17.2 5.72 23.8 2.23 9 7 1780 278 590 1780 285 613 switch- 3 or [3] filter . ir-ecmr results. micro-benchmark *NonInc 13 (MB) A DAPTON hra h pe-pof speed-up the A whereas of speed-up costs. the relative their a understand and to phase updated each in is spent time mea- list the additionally sured input inconsistencies—we repair the to when phase propagation phase dirtying DCG—a the fteotu,tu,teei tl navnaet sn A using to advantage as an much still as is not there if thus, output, the of A of both speed-up than the However, output. ad- each the take A to list). cost output than minor the a from is element there ditional (though size demand across of speed-up the 5% measured NonInc to also element we one comparison, from EagerTotalOrder For size output. demand the the vary of and baseline the as the to quan- lar We increases. for size effect this demand tified as significant more becomes ally demand varying over performances Evaluating D.3 list input the sort unconditionally in to twice cost no almost incurs laziness, in computation A contrast, In for memory. effective much as computation returning twice the before uses half wastes and directions effectively this both but in one, appropriate list the input the sorting ditionally hnpoaain ic tde o efr n optto on computation any for costly perform costly less less not significantly is is does dirtying dirtying it thunks; Also, since to output. has propagation, the computation than compute more to as size, performed demand be with A increases time of tion phases propagation and ing output. the of The 4% than D.2. more Appendix demanding in A explained that as shows slowdown, omit plot a We incurs it 17b. because Figure in shown A ie/iceetltm e:OalsG aiu epsize heap maximum GC OCaml’s mem: time incremental / time DAPTON 1540000 h eut r hw nFgr 7 nFgr 7 esethat see we 17a Figure In 17. Figure in shown are results The iue 7 n 7 hw o uhtm sseti h dirty- the in spent is time much how shows 17d and 17c Figures for speed-up The 951000 ae Lazy Eager 21600 pe-pmem speed-up EagerTotalOrder 7014 1630 1640 2700 7400 1010 3.75 3.43 1600 2.21 1580 3.31 2.04 3.39 309 135 315 888 872 1780 248 116 incremental ic A Since . updown2 lazy EagerNonInc 301610 4350 2020 efrsbs o ml eadsz,btiscs gradu- cost its but size, demand small for best performs 24.7 22.4 2.36 1.99 7.80 12.8 667 501 472 336 updown1 atr ecie nScin61 euse We 6.1. Section in described pattern updown1 DAPTON (MB) swl stesedu of speed-up the as well as 1780 1640 1620 1600 1580 . EagerTotalOrder DAPTON 119 121 395 162 264 264 DAPTON quicksort hndmnigmr hnaot18 of 1.8% about than more demanding when and mergesort ae Lazy Eager and 29951930 987 915 429 72.9 1440 70.9 17.0 1420 806 17.5 773 11.7 9.60 19.1 9.49 7.35 1440 11.0 15.2 1410 806 15.5 773 11.0 9.22 19.1 9.08 1390 7.35 60.3 2610 570 11.5 20.1 773 773 53.3 1680000 8.54 953000 12.8 2 061480 60.6 128 1480 60.1 129 otistodfeetpae htupdate that phases different two contains eadeso h a nu,ad u to due and, input, flag the of regardless vredmem overhead updown2 EagerTotalOrder eoe lwrthan slower becomes LazyNonInc from-scratch and erae sdmn ieincreases, size demand as decreases mergesort slwrbtsilsgicn,as significant, still but lower is DAPTON . EagerTotalOrder ti bet eomththe memo-match to able is It : EagerTotalOrder (MB) vnwe eadn 5% demanding when even DAPTON DAPTON sepce,propaga- expected, As . LazyNonInc sn rcdr simi- procedure a using 00015 773 773 1.53 2.24 301000 166000 srltvl constant relatively is ae Lazy Eager .1 .04 3710 0.00247 0.015 6090 0.128 6070 0.123 2900 0.228 0.248 2710 0.226 0.143 0.394 0.247 4900 0.148 0.443 mergesort 203090 6970 5260 3742 2120 4.28 1810 53.7 10.1 4.78 4.97 6.84 347 2680 22.9 245 DAPTON pe-pmem speed-up LazyNonInc incremental eoe slower becomes ean greater remains rmti plot this from EagerNonInc 4220 3.32 4.11 over 746 eaieto relative sequally is DAPTON 2014/1/18 (MB) Eager- 1480 1440 1440 1540 1410 when , 500 20 EagerTotalOrder (incr.) Adapton (incr.) 1 module type Behavior = sig Adapton (incr.) LazyNonInc 2 type ’a behavior 400 LazyNonInc 15

300 4 val const : ’a ’a behavior 10 5 val app : ( ’a ’b) behavior ’a behavior ’b behavior 200 → 6 val memo app : ( ’a ’b) behavior ’a behavior 5 → → → 100 ’b behavior

speed-up over EagerNonInc speed-up over EagerNonInc → → 0 0 8 val merge : ’a behavior ’a behavior ’a 0% 1% 2% 3% 4% 5% 0% 1% 2% 3% 4% 5% → behavior demand size demand size 9 val ifb : bool behavior ’a behavior ’a → → (a) quicksort speed-up (b) mergesort speed-up behavior ’a behavior 14 600 propagate propagate → → 12 11 val lift : (’a ’b) ’a behavior ’b behavior dirty 500 dirty → 10 12 val lift2 : (’a ’b ’c) ’a behavior ’b 400 behavior ’c behavior 8 → → → 13 val lift3 : (’a ’b ’c ’d) ’a behavior 300 → → → → 6 ’b behavior ’c behavior ’d behavior

time (ms) time (ms) → 200 14 val lift4 : (’a ’b ’c ’d ’e) ’a 4 → → → → behavior ’b behavior ’c behavior ’d 100 → → → 2 behavior ’e behavior → → → → → 0 0 0% 1% 2% 3% 4% 5% 0% 1% 2% 3% 4% 5% → → → 16 val memo lift : ( ’a ’b) ’a behavior ’b demand size demand size → behavior (c) quicksort dirty/patch time (d) mergesort dirty/patch time 17 val memo lift2 : ( ’a ’b ’c) ’a behavior → → → ’b behavior ’c behavior Figure 17: Speed-up and dirtying/patching time over varying de- 18 val memo lift3 : ( ’a ’b ’c ’d) ’a → → → → mand sizes for input size of 100,000 behavior ’b behavior ’c behavior ’d → behavior → → → → 19 val memo lift4 : ( ’a ’b ’c ’d ’e) ’a → → → propagation as more thunks are re-evaluated than in quicksort. Inter- behavior ’b behavior ’c behavior ’d behavior ’e behavior estingly, however, dirtying time increases with demand size. This is → → → → → due to the interaction between the amortization of the dirtying and → → → 21 val time : unit time behavior propagation phases described at the end of Section 5.1. For a set of → 22 val seconds : unit time behavior input changes, each consecutive change has to dirty fewer edges as → more edges become dirty in the graph. However, as demand size in- 24 val curr : ’a behavior ’a → creases, more dirty edges will be cleaned by propagation, resulting 25 val id : ’a behavior int 26 end in more dirtying work for the next set of input changes. → → E. ADAPTIME FRP Library Figure 18: Interface showing which FRP combinators are supported We have used ADAPTON to implement an FRP library, ADAPTIME, by ADAPTIME. that is inspired by FrTime [14]. In FRP, behaviors (aka streams) are composable, time-varying values that automatically update as time marches forward. FRP libraries need to track dependencies points). This limitation is due to the fact that these combinators between related behaviors to efficiently update values; in ADAP- have past dependence so side effects are needed to store the previ- TIME, all this heavy lifting is handled by ADAPTON. ous value. For memo matching to work properly, ADAPTON cur- We define a behavior as a pair containing a thunk of the behav- rently requires thunks to be pure. We plan to investigate how to ior’s value and the current time. mark certain thunks as eager so that we can support side effects efficiently. type ’a behavior = ’a thunk ∗ time thunk This approach simply constructs the FRP combinators by compos- ing together calls to the ADAPTON API such that behaviors are only updated when demanded. ADAPTIME uses the time thunk to keep track of when the latest event for a behavior fired. Here we describe some of the core FRP combinators that are available in ADAPTIME. The simplest combinator is const which takes a static value, and returns a behavior wrapping that value. The app combinator takes a functional behavior (i.e., of type (’ a ’b ) behavior ) and an argument (of type ’a behavior ) and produces a behavioral result (of type ’b behavior ). By using ADAPTON→, we can introduce a new combinator, memo app. This combinator is similar to app, except it memoizes the lambda and the argument, so it can be used to optimize FRP programs. Figure 18 shows the full interface with all of the combinators that ADAPTIME supports. A limitation of ADAPTIME is that we were unable to implement the combinators prev (which retains prior values), filter (which ignores irrelevant values), and fix (which permits time-indexed fix-

14 2014/1/18 G ` S˙ (Under G, the store S˙ is well-typed) CBVTYS-ADDR G ` S˙ CBVTYS-EMP ` G(a) = M τG `` M : τ G ` ε G ` S,˙ a:M

G `` M : τ (Under G at `, the term M has type τ)

CBVTY-ABS CBVTY-APP CBVTY-VAR G, x:τ1 `` M : τ2 CBVTY-UNIT 2 ` G(x) = τ G `` M1 : τ1 τ2 G `` M2 : τ1 `2 G `` x : τ G `` () : 1 G ``1 λx.M : τ1 τ2 G `` M1 M2 : τ2 → CBVTY-CASE → G `` M : τ1 + τ2 CBVTY-LET CBVTY-FIX CBVTY-INJ G, x1:τ1 `` M1 : τ3 G `` M1 : τ1 G, x:τ1 `` M2 : τ2 G, f :τ `` M : τ exists i in {1, 2} G `` M : τi G, x2:τ2 `` M2 : τ3

G `` let x M1 in M2 : τ2 G `` fix f .M : τ G `` inji M : τ1 + τ2 G `` case (M, x1.M1, x2.M2): τ3

CBVTY-THUNK CBVTY-FORCE CBVTY-PAIR ← CBVTY-FST CBVTY-SND ` G ``2 M : τ G `` M : U τ G `` M1 : τ1 G `` M2 : τ2 G `` M : τ1 × τ2 G `` M : τ1 × τ2 `2 G `` force M : τ G `` (M1,M2): τ1 × τ2 G `` fst M : τ1 G `` snd M : τ2 G ``1 thunk M : U τ `

CBVTY-SET G ` M : M`2 τ CBVTY-ADDR CBVTY-REF CBVTY-GET outer 1 CBVTY-INNER ` 2 ` `2 |G| `inner M : τ G(a) = M τ G `` M : τ G `` M : M τ G `outer M2 : U τ `2 ` G `outer inner M : τ G ``1 a : M τ G `outer ref M : M τ G `` get M : τ G `outer set M1 M2 : 1

← Figure 11: CBV typing semantics.

S˙1 ` M S˙2; V (Under S˙1, the term M reduces, yielding store S˙2 and value V) CBVEVAL-APP ⇓ S˙1 ` M1 S˙2; λx.M CBVEVAL-LET S˙3 ` M2 S˙4; V2 S˙1 ` M1 S˙2; V1 CBVEVAL-FIX CBVEVAL-VAL ˙ ˙ ˙ ˙ ˙ ˙ S5 ` M[V2⇓/x] S6; V3 S2 ` M2[V1/x] S3; V2 S1 ` M[fix f .M/f ] S2; V S˙ ` V S˙; V S˙1 ` M1 M⇓2 S˙7; V3 S˙1 ` let x M1⇓in M2 S˙3; V2 S˙1 ` fix f .M S˙2; V ⇓ ⇓ ⇓ CBVEVAL-CASE ⇓ ⇓ ← ⇓ ⇓ exists i in {1, 2} CBVEVAL-PAIR ˙ ˙ ˙ ˙ CBVEVAL-INJ S1 ` M S2; inji V1 S1 ` M1 S2; V1 CBVEVAL-FST S˙1 ` M S˙2; V S˙2 ` Mi[V1/xi] S˙3; V2 S˙2 ` M2 S˙3; V2 S˙1 ` M S˙2;(V1,V2) ˙ ˙ ˙ ˙ ˙ ˙ ˙ ˙ S1 ` inji M S2; inji V S1 ` case (M, x1.M⇓ 1, x2.M2) S3; V2 S1 ` (M1,M2)⇓ S3;(V1,V2) S1 ` fst M S2; V1 ⇓ ⇓ ⇓ ⇓ CBVEVAL-FORCE ⇓ ⇓ ⇓ 0 ⇓ S˙1 ` M S˙2; thunk M CBVEVAL-SND CBVEVAL-INNER 0 CBVEVAL-REF S˙2 ` M S˙3; V S˙1 ` M S˙2;(V1,V2) S˙1 ` M S˙2; V a ∈/ dom(S˙) S˙1 ` force⇓ M S˙3; V S˙1 ` snd M S˙2; V2 S˙1 ` inner M S˙2; V ` S˙ ` ref M S,˙ a:M; a ⇓ ⇓ ⇓ CBVEVAL-GET CBVEVAL-SET ⇓ ⇓ ⇓ ⇓ S˙1 ` M S˙2; a S˙1 ` M1 S˙2; a 0 0 S˙2(a) = M S˙2 ` M S˙3; V S˙2 ` M2 S˙3; thunk M

S˙1 ` get M⇓ S˙2; V S˙1 ` set M1 M⇓2 S˙3, a:M; () ⇓ ⇓ ⇓ ← ⇓ Figure 12: CBV big-step evaluation semantics.

15 2014/1/18 G ctxt; Γ (The CBV typing context G translates to the CBPV typing context Γ) CBV-TRCTXT-VARVAL CBV-TRCTXT-VARFIX CBV-TRCTXT-ADDR comp ctxt val CBV-TRCTXT-EMP G ctxt; Γ τ ;val A G ctxt; Γ τ ; C G ; Γ τ ; A ε ctxt; ε G, x:τ ctxt; Γ, x:A G, f :τ ctxt; Γ, f :C G, a: M` τ ctxt; Γ, a:A

τ ;val A (The CBV type τ translates to the CBPV value type A) CBV-TRVALTY-THUNK CBV-TRVALTY-REF CBV-TRVALTY-SUM CBV-TRVALTY-PROD comp CBV-TRVALTY-UNIT ` val val val val val τ ; (C) τ ; A τ1 ; A1 τ2 ; A2 τ1 ; A1 τ2 ; A2 val ` val ` val val val 1 ; 1 U τ ; U C M τ ; M A τ1 + τ2 ; A1 + A2 τ1 × τ2 ; A1 × A2

CBV-TRVALTY-ARROW ` comp ` τ1 τ2 ; (C) ` τ τ ;val U (C)` 1 → 2

comp τ ; C → (The CBV type τ translates to the CBPV computation type C) CBV-TRCOMPTY-ARROW val comp ` CBV-TRCOMPTY-FREE τ ; A τ ; (C) val 1 2 τ ; A ` comp ` comp τ1 τ2 ; A (C) τ ; F` A

→ → Figure 13: CBV types into CBPV types.

G ` M : τ ;val (Γ ` v : A) (Under G, the CBV term M translates to CBPV value term v)

CBV-TRVALTM-INJ1 val forall j in {1, 2}, τj ; Aj CBV-TRVALTM-VARVAL exists i in {1, 2} val CBV-TRVALTM-UNIT val τ ; A G ` M : τi ; (Γ ` v : Ai) val val val G ` x : τ ; (Γ ` x : A) G ` () : 1 ; (Γ ` () : 1) G ` inji M : τ1 + τ2 ; (Γ ` inji v : A1 + A2)

CBV-TRVALTM-PAIR 1, 2 CBV-TRVALTM-ABS forall i in { } comp val G, x:τ1 `` M : τ2 ; Γ, x:A ` e : C G ` M1 : τi ; (Γ ` v : Ai) val ` val G ` (M1,M2): τ1 × τ2 ; (Γ ` inj1 v : A1 × A2) G ` λx.M : τ1 τ2 ; (Γ ` thunk λx.e : U (A C))

CBV-TRVALTM-ADDR → comp → CBV-TRVALTM-THUNK τ ; C comp ` G `` M : τ ; Γ ` e : C G(a) = M τΓ (a) = A ` val ` val G ` thunk M : U τ ; (Γ ` thunk e : U C) G ` a : M τ ; (Γ ` a : M A )

Figure 14: CBV terms as CBPV value terms.

16 2014/1/18 comp G `` M : τ ; Γ ` e : C (Under G and `, the CBV term M translates to CBPV computation term e) CBV-TRCOMPTM-APP comp CBV-TRCOMPTM-ABS G `` M2 : τ1 ; Γ ` e2 : F` A comp G, x:τ1 `` M : τ2 ; Γ, x:A ` e : C ` comp G `` M1 : τ1 τ2 ; Γ ` e1 : A C ` comp comp G `` λx.M : τ1 τ2 ; Γ ` λx.e : A C G `` M1 M2 : τ2 ; Γ ` let x e2 in e1 x : C → → CBV-TRCOMPTM-LET comp → → ← G `` M1 : τ1 ; Γ ` e1 : F` A CBV-TRCOMPTM-RET CBV-TRCOMPTM-FIX comp val comp G, x:τ1 `` M2 : τ2 ; Γ, x:A ` e2 : C G ` V : τ ; (Γ ` v : A) G, f :τ `` M : τ ; Γ, f :C ` e : C comp comp comp G `` let x M1 in M2 : τ2 ; Γ ` let x e1 in e2 : C G `` V : τ ; Γ ` ret v : F` A G `` fix f .M : τ ; Γ ` fix f .e : C

CBV-TRCOMPTM-INJ ← ← val CBV-TRCOMPTM-VARFIX forall j in {1, 2}, τj ; Aj exists i in {1, 2} comp comp τ ; C G `` M : τ ; Γ ` e : F` Ai comp comp G `` f : τ ; Γ ` f : C G `` inji M : τ1 + τ2 ; Γ ` let x e in ret inji x : F` (A1 + A2)

CBV-TRCOMPTM-CASE forall i in {1, 2} ← comp G `` M : τ ; Γ ` e : F` (A1 + A2) comp G, x:τi `` M : τ3 ; Γ, x:Ai ` ei : C comp G `` case (M, x1.M1, x2.M2): τ3 ; Γ ` let x e in case (x, x1.e1, x2.e2): C

CBV-TRCOMPTM-PAIR forall i in {1, 2} ← comp G `` Mi : τi ; Γ ` ei : F` Ai comp G `` (M1,M2): τ1 × τ2 ; Γ ` let x1 e1 in let x2 e2 in ret (x1, x2): F` (A1 × A2)

CBV-TRCOMPTM-FST CBV-TRCOMPTM-SND comp ← ← comp G `` M : τ1 × τ2 ; Γ ` e : F` (A1 × A2) G `` M : τ1 × τ2 ; Γ ` e : F` (A1 × A2) comp comp G `` fst M : τ1 ; Γ ` let x e in split (x, y1.y2.ret y1): F` A1 G `` snd M : τ2 ; Γ ` let x e in split (x, y1.y2.ret y2): F` A2

CBV-TRCOMPTM-THUNK← CBV-TRCOMPTM-FORCE ← comp ` comp G ``2 M : τ ; Γ ` e : C G `` M : U τ ; Γ ` e : F` U C ` comp comp G M : 2 τ Γ ( ): G `` force M : τ ; Γ ` let x e in force` x : C ``1 thunk U ; ` ret thunk e F`1 U C `

CBV-TRCOMPTM-REF CBV-TRCOMPTM-GET ← comp ` comp G `` M : τ ; Γ ` e : Fouter A G `` M : M τ ; Γ ` e : F` ( M A ) comp comp G `outer ref M : 1 ; Γ ` let x e in ref v : Fouter 1 G `` get M : τ ; Γ ` let x e in get x : F` A

CBV-TRCOMPTM-SET ← comp ← G `outer M2 : τ ; Γ ` e1 : Fouter A ` comp G `outer M1 : M τ ; Γ ` e2 : Fouter ( M A ) comp G `outer set M1 M2 : 1 ; Γ ` let x1 e1 in let x2 e2 in set x1 x2 : Fouter 1

← ← ← ← Figure 15: CBV terms as CBPV computation terms.

G ` S˙ ; Γ ` S (Under G, the CBV store S˙ translates to CBPV store S) CBV-TRSTORE-CONS G ` S˙ ; Γ ` S val CBV-TRSTORE-EMP G(a) = M` τG ` V : τ ; (Γ ` v : A) G ` ε ; Γ ` ε G ` S,˙ a:V ; Γ ` S, a:v

Figure 16: CBV stores as CBPV stores.

17 2014/1/18