Adapton: Composable, Demand-Driven Incremental Computation

CS-TR-5027 — July 12, 2013

Matthew A. Hammer, Khoo Yit Phang, Michael Hicks, and Jeffrey S. Foster
University of Maryland, College Park, USA

Abstract

Many researchers have proposed programming languages that support incremental computation (IC), which allows programs to be efficiently re-executed after a small change to the input. However, existing implementations of such languages have two important drawbacks. First, recomputation is oblivious to specific demands on the program output; that is, if a program input changes, all dependencies will be recomputed, even if an observer no longer requires certain outputs. Second, programs are made incremental as a unit, with little or no support for reusing results outside of their original context, e.g., when reordered. To address these problems, we present λ_ic^cdd, a core calculus that applies a demand-driven semantics to incremental computation, tracking changes in a hierarchical fashion in a novel demanded computation graph. λ_ic^cdd also formalizes an explicit separation between inner, incremental computations and outer observers. This combination ensures λ_ic^cdd programs only recompute computations as demanded by observers, and allows inner computations to be composed more freely. We describe an algorithm for implementing λ_ic^cdd efficiently, and we present ADAPTON, a library for writing λ_ic^cdd-style programs in OCaml. We evaluated ADAPTON on a range of benchmarks, and found that it provides reliable speedups, and in many cases dramatically outperforms prior state-of-the-art IC approaches.

1. Introduction

Incremental computation (IC), also sometimes referred to as self-adjusting computation, is a technique for efficiently recomputing a computation after making a small change to its input. A good application of IC is a spreadsheet. A user enters a column of data I, defines a function F over it (e.g., sorting), and stores the result in another column O. Later, when the user changes I (e.g., by inserting a cell), the spreadsheet will update O. Rather than re-sort the entire column, we could use IC to perform change propagation, which performs only an incremental amount of work to bring O up to date. For certain algorithms (even involved ones [5, 6]), certain inputs, and certain classes of input changes, IC delivers large, even asymptotic speed-ups over full reevaluation. IC has been developed in many different settings [12, 17, 19, 31], and has even been used to address open problems, e.g., in computational geometry [7].

Unfortunately, existing approaches to IC do not perform well when interactions with a program are unpredictable. To see the problem, we give an example, but first we introduce some terminology. IC systems stratify a computation into two distinct layers. The inner layer performs a computation whose inputs may later change. Under the hood, a trace of its dynamic dependencies is implicitly recorded and maintained (for efficiency, the trace may be represented as a graph). The outer layer actually changes these inputs and decides what to do with the (automatically updated) inner-layer outputs, i.e., in the context of the broader application. The problem arises when the outer layer would like to orchestrate inner-layer computations based on dynamic information.

To see the issue, suppose there are two inner-layer computations, F(I) and G(I), and the application only ever displays the results of one or the other. For example, perhaps F(I) is on one spreadsheet pane, while G(I) is on another, and a flag P determines which pane is currently visible. There are two ways we could structure this computation. Option (A) is to define F(I) and G(I) as two inner-layer computations, and make the decision about what to display entirely at the outer layer. In this case, when the outer layer changes I, change propagation will update both F(I) and G(I), even though only one of them is actually displayed. Option (B) is to create one inner-layer computation that performs either F(I) or G(I) based on the flag P, now also a changeable input. When I is updated, one of F(I) or G(I) is updated as usual. But when P is toggled, the prior work computing one of F(I) or G(I) is discarded. Thus, under many potential scenarios there is a lost opportunity to reuse work; e.g., if the user displays F(I), toggles to G(I), and then toggles back to F(I), the last will be recomputed from scratch. The underlying issue derives from the use of the Dietz-Sleator order maintenance data structure to represent the trace [10, 15]. This approach requires a totally ordered, monolithic view of inner-layer computations, as change propagation updates a trace to look just as it would had the computation been performed for the first time.
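The difference between the two options can be made concrete with a small amount of code. The following is a minimal OCaml sketch of our own (not from the paper): plain closures stand in for ADAPTON-style thunks (cf. Figure 12 in Section 6), so it captures only the demand structure, not incremental reuse, and the names input, flag, f, and g are hypothetical.

    (* Minimal sketch: plain closures stand in for ADAPTON-style thunks
       (cf. Figure 12); this shows only the demand structure of Options
       (A) and (B), not incremental reuse. All names are hypothetical. *)
    type 'a thunk = unit -> 'a
    let force (t : 'a thunk) : 'a = t ()

    let input = ref 10                   (* changeable input I *)
    let flag  = ref true                 (* changeable flag  P *)
    let f i = i * 2                      (* stands in for F(I) *)
    let g i = i + 100                    (* stands in for G(I) *)

    (* Option (A): F(I) and G(I) are separate inner computations; the
       outer layer chooses which one to demand. *)
    let fa : int thunk = fun () -> f !input
    let ga : int thunk = fun () -> g !input
    let display_a () = print_int (force (if !flag then fa else ga))

    (* Option (B): one inner computation branches on P internally, so
       toggling P discards the other branch's prior work. *)
    let b : int thunk = fun () -> if !flag then f !input else g !input
    let display_b () = print_int (force b)

Under Option (A), a demand-driven system need only repair whichever of fa or ga the outer layer actually forces; under Option (B), the single thunk b must be re-run whenever flag changes.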
This monolithic view also conspires to prevent other useful compositions of inner- and outer-layer computations. A slight variation of the above scenario computes X = F(I) unconditionally, and then, depending on the flag P, conditionally computes G(X). For technical reasons again related to Dietz-Sleator, Option (A), putting the two function calls in separate inner-layer computations with the outer layer connecting them by a conditional on P, is not even permitted. Once again, this is dissatisfying because putting both in the same inner-layer computation results in each change to P discarding work that might be fruitfully reused.

In this paper, we propose a new way of implementing IC that we call Composable, Demand-driven Incremental Computation (CD2IC), which addresses these problems toward the goal of efficiently handling interactive incremental computations. CD2IC's centerpiece is a change propagation algorithm that takes advantage of lazy evaluation. Lazy evaluation is a highly compositional (and highly programmable) technique for expressing computational demand as a first-class concern: It allows programmers to delay computations in a suspended form (as "thunks") until they are demanded ("forced") by some outside observer. Just as lazy evaluation does not compute thunks that are not forced, our demand-driven change propagation (D2CP) algorithm performs no work until forced to; it even avoids recomputing results that were previously forced until they are forced again. As such, we can naturally employ Option (A) for the first example above, and change propagation will only take place for the demanded computation.

To implement D2CP we use a novel form of execution trace we call the demanded computation trace (DCT), which in practice we represent as a graph (the DCG). Traced events are demanded computations, i.e., which thunks have been forced and which input (reference) cells have been read. Each force event references a sub-trace of the events produced by running its corresponding thunk. When an input changes, these events will become inconsistent, but no changes are propagated immediately.
Rather, when a thunk e is forced, the CD2IC engine sees if it has been executed before, and attempts to reuse its result after making it consistent (via change propagation), if needed. Focusing change propagation on thunks makes prior computations more composable and reusable. For example, repeated executions of the same thunk will be reused, as with standard memoization, even within the same execution. Moreover, because trace elements are self-contained, and not globally ordered, each can be freely reordered and composed. For example, we can do things like map a function over a linked list, swap the front half and the back half of the list, and change propagation will update the result in a constant amount of work, rather than recompute (at least) half of the list. Because our representation is not monolithic, we can also naturally intersperse inner- and outer-layer computations, e.g., to be able to employ Option (A) in the second example above.

We make several contributions in this paper. First, we formalize CD2IC as the core calculus λ_ic^cdd (Section 3), which has two key features. Following Levy's call-by-push-value calculus [22], λ_ic^cdd includes explicit thunk and force primitives, to make laziness apparent in the language. In addition, λ_ic^cdd defines a notion of mutable store, employing a simple type system to enforce its correct usage for IC: inner-layer computations may only read the store, and thus are locally pure, while outer-layer computations may also update it.

We formalize D2CP as an incremental semantics for λ_ic^cdd (Section 4). The semantics formalizes the notion of prior knowledge, which consists of the demanded computation traces of prior computations. This semantics declaratively specifies the process of reusing traces from prior knowledge by (locally) repairing, or patching, their inconsistencies. We prove that the patching process is sound in that patched results will match what (re)computation from scratch would have produced.

[Figure 1: Sharing: Traces generated for lines 3 and 4. Only the caption and legend survive extraction; the figure depicts reference cells (squares) l1, l2, l3, p1, p2 and thunks (circles) for eval of each cell, with get edges to cells and force edges to thunks, the INNER/OUTER layer boundary, and the memo-matched subgraph of eval p1 shared between the traces for lines 3 and 4.]
We also give an algorithmic presentation of patching (Section 5), which makes the order of patching steps deterministic. We prove that the algorithm is sound with respect to the declarative semantics.

We have implemented CD2IC in ADAPTON, an OCaml library for incremental computation (Section 6). ADAPTON provides a simple API for writing incremental programs and uses an efficient bidirectional graph-based data structure to realize the DCG and the D2CP algorithm. Section 7 presents an empirical evaluation showing ADAPTON performs well for a number of interesting patterns that arise in applications of interactive incremental computations. For some patterns, ADAPTON is far superior to prior implementations of incremental computation, to which we compare our approach in Section 8. ADAPTON is publicly available.

2. Overview

This section illustrates our approach to composable, demand-driven incremental computation using a simple example, inspired by the idea of a user interacting with cells in a spreadsheet. Our programming model is based on an ML-like language with explicit primitives for thunks and mutable state, where changes to the latter may (eventually) propagate to update previously computed results. As usual, thunks are suspended computations, treated as values. We use the type connective U for typing thunks, whose introduction and elimination forms, respectively, correspond to the thunk and force keywords, illustrated below.

In addition, we have an outer layer that may create special reference cells for expressing incremental change; these mutable cells combine the features of ordinary reference cells and thunks. The reference cells are created, accessed and updated by the primitives ref, get and set, respectively, and typed by the M connective. Inner-layer computations use get to access mutable state; the outer layer uses ref and set to create and mutate this state.

Now suppose that we have the following (toy) language for the formulae in spreadsheet cells:

    type cell = M formula
    and formula = Leaf of int | Plus of cell × cell

Values of type cell are formula addresses, i.e., mutable references containing a cell formula. A formula either consists of a literal value (Leaf), or the sum of two other cell values (Plus). At the outer layer, we build an initial expression tree as follows (shown at the upper right of Figure 1):

    let l1 : cell = ref (Leaf 1) in
    let l2 : cell = ref (Leaf 2) in
    let l3 : cell = ref (Leaf 3) in
    let p1 : cell = ref (Plus l1 l2) in
    let p2 : cell = ref (Plus p1 l3) in ···

Given a cell of interest, we can evaluate it as follows:

    eval : cell → int
    eval c = force thunk(
      case (get c) of
      | Leaf x ⇒ x
      | Plus c1 c2 ⇒ (eval c1) + (eval c2)
      (∗ end thunk ∗))

This code corresponds to the obvious interpreter modulo the use of force, thunk and get. As we explain below, the role of these primitives here is not laziness (indeed, the introduced thunk is immediately forced); rather, the evaluator uses thunks to demarcate reusable work in future computations, to avoid its reevaluation. (Of course, thunks can be used for lazy computation as usual; we just do not use them in this way here.)

Now suppose we have a function display, whose behavior is to demand that a given reference cell be computed, and to display the result of this (integer-valued) computation to the user:

    display : M int → unit
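For readers who want to run the example, here is a direct OCaml transcription of the toy evaluator. It is a sketch of our own: ordinary refs stand in for the paper's M-typed cells, so mutation works but there is no memoization or change propagation, and every display recomputes from scratch.

    (* OCaml transcription of the toy evaluator; ordinary refs stand in
       for M-typed cells, so this has the same demand structure but no
       incremental reuse. *)
    type cell = formula ref
    and formula = Leaf of int | Plus of cell * cell

    let rec eval (c : cell) : int =
      match !c with
      | Leaf x -> x
      | Plus (c1, c2) -> eval c1 + eval c2

    let display (c : cell) : unit = Printf.printf "%d\n" (eval c)

    let () =
      let l1 = ref (Leaf 1) in
      let l2 = ref (Leaf 2) in
      let l3 = ref (Leaf 3) in
      let p1 = ref (Plus (l1, l2)) in
      let p2 = ref (Plus (p1, l3)) in
      display p1;                 (* prints 3 *)
      display p2;                 (* prints 6 *)
      l1 := Leaf 5;               (* mutate a leaf, as in line 5 of the script below *)
      display p1                  (* prints 7, recomputed from scratch here *)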
Finally, suppose that the user performs the following actions:

    1 let r1 : M int = ref (inner (eval p1)) in
    2 let r2 : M int = ref (inner (eval p2)) in
    3 display r1;                    (∗ demands (eval p1) ∗)
    4 display r2;                    (∗ memo matches (eval p1) ∗)
    5 set l1 ← thunk (Leaf 5);       (∗ mutate leaf value ∗)
    6 display r1;                    (∗ does not re-eval p2 ∗)
    7 set p2 ← thunk (Plus l3 p1);   (∗ swaps operand cells ∗)
    8 display r2;                    (∗ memo matches twice ∗)

Though shown as a straight-line script above, we imagine that the user issues commands to update cells and display results interactively. In line 1, the user creates a suspended evaluation of the formula of cell p1. Due to the inner keyword, this evaluation, when forced, will occur at the inner layer, and will thus have the opportunity to benefit from incrementality. That the computation is not performed eagerly illustrates how ref is non-standard in our language: The contents of reference cells are not necessarily values, and generally consist of suspended computations.¹ Line 2 is analogous to line 1: It creates a suspended, inner-layer computation that evaluates cell p2.

¹ This design choice is not fundamental; rather, it simply brings thunks and reference cells into a close correspondence, in terms of their typing and evaluation semantics (Section 3).
The computation in line 3 evaluates the formula of cell p1, dirty nodes and edges in the opposite direction, from the bottom which recursively forces the (trivial) evaluation of leaf cells l1 upwards; it does not traverse clean edges or dirty edges that are not and l2 . As we explain below, the computation in line 4 benefits reachable from the demanded node. This is depicted on the right from memoization: Since cell p2 has a formula that contains cell p1, hand side of the figure by coloring the traversed edges in blue. No- it can simply reuse the result computed in the line 3. In line 5, the tably, neither the thunk eval p2 nor its dependencies are processed user decides to change the value of leaf l1 , and (in line 6), they because they have not been demanded. We generally refer to this demand and display the updated result. pattern as switching (of demand). This sort of demand-driven prop- Demanded computation graphs. Behind the scenes, supporting agation is not implemented by prior work on IC. incremental reuse relies on maintaining special dynamic records of In line 7, as depicted in Figure 3, the outer layer updates p2, inner layer computations. We call these dynamic records demanded which consequently dirties an additional node and edge. Line 8 de- mands the result r2 be redisplayed, which initiates another prop- 1 This design choice is not fundemental; rather, it simply brings thunks and agate phase that recomputes the thunk eval p2, but, as shown by reference cells into a close correspondance, in terms of their typing and the gray highlights in the figure, is able to memo-match two sub- evaluation semantics (Section 3). components, i.e., the graphs of eval p1 (as in the original com- putation) and eval l3 . Following memo-matching, these matched Values v ::= x | () | (v1, v2) | inji v | thunk e | a graph components swap their order, relative to the original compu- Comps e ::= λx.e | e v | let x e1 in e2 | ret v tation. The reuse here is permitted to swap memo-matched compo- | fix f .e | f | case (v, x1.e1, x2.e2) nents as needed since, unlike past work on incremental computa- | split (v, x1.x2.e) | force` v | inner e 2 ← tion, CD IC does not enforce a globally total ordering of compo- | ref e | get v | set v1 v2 nents. We generally refer to this pattern as swapping. The next three sections formalize CD2IC, and the following two ← describe our implementation and evaluation. Value types A, B ::= 1 | A + B | A × B | U C | M C Comp. types C, D ::= A C | F` A Comp. layers ` ::= inner | outer 3. Core calculus Typing env. Γ ::= ε | Γ, x : A | Γ, f : C | Γ, a : C cdd → This section presents λic , a core calculus for incremental com- cdd putation in a setting with lazy evaluation. λic is an extension of Store S ::= ε | S, a:e Levy’s call-by-push-value (CBPV) calculus [22], which is a stan- Terminal comps e˜ ::= λx.e | ret v dard variant of the simply-typed lambda calculus with an explicit thunk primitive. It uses thunks as part of a mechanism to syn- tactically distinguish computations from values, and make evalu- Figure 4: Values and computations: Term and type syntaxes. 2 cdd ation order syntactically explicit. λic adds reference cells to the CBPV core, along with notation for specifying inner- and outer- layer computations—inner layer computations may only read ref- and update store addresses, respectively. 
It is somewhat unusual for erence cells, while outer layer computations may change them and stores to contain computations rather than values, but doing so cre- thus potentially precipitate change propagation. ates pleasing symmetry between references and thunks, which we As there exist standard translations from both call-by-value can see from the typing and operational semantics (though mapping cdd addresses to values would create no difficulties). (CBV) and call-by-name (CBN) into CBPV, we intend λic to be cdd in some sense canonical, regardless of whether the host language The two layers of a λic program, outer and inner, are ranged is lazy or eager. We give a translation from a CBV language variant over by layer meta variable `. For informing the operational seman- cdd tics and typing rules, layer annotations attach to force terms (viz., of λic in the Appendix. force` v ) and the type connective for value-producing computa- cdd 3.1 Syntax, typing and basic semantics for λic tions (viz., F` A). A term’s layer determines how it may interact with the store. Inner layer computations may read from the store, Figure 4 gives formal syntax of λcdd, with new features highlighted. ic as per the typing rule TYE-GET, while only outer layer computa- λcdd Figure 5 gives ic ’s type system. Figure 6 gives its basic evaluation tions may also allocate to it and mutate its contents, as enforced relation as a big-step semantics, which we refer to as basic-λcdd . ic by typing rules TYE-REF and TYE-SET. As per type rule TYE- In this semantics, we capture only non-incremental behavior; we INNER, inner layer computations e may be used in an outer con- formalize the incremental semantics later (Section 4) and use the cdd text by applying the explicit coercion inner e ; the converse is not basic-λic semantics as its formal specification. permitted. This rule employs the “layer coercion” auxiliary (total) cdd function (C)` over computation types C to enforce layer purity in Standard CBPV elements. λic inherits most of its syntax from CBPV. Terms consist of value terms (written v) and computation a computation; it is defined in Figure 7. It is also used to similar terms (written e), which we alternatively call expressions. Types purpose in rules TYE-INNER and TYE-FORCE. The TYE-INNER consist of value types (written A, B) and computation types (written rule employs the environment transformation |Γ|, which filters oc- C, D). Standard value types consist of those for unit values () currences of recursive variables f from Γ, thus making the outer layer’s recursive functions unavailable to the inner layer. (typed by 1), injective values inji v (typed as a sum A + B), pair values (v , v ) (typed as a product A × B) and thunk values thunk e 1 2 3.2 Meta theory of basic λcdd (typed as a suspended computation U C). ic cdd Standard computation types consist of functions (typed by ar- We show that the λic type system and the basic reduction seman- 0 row A C, and introduced by λx.e), and value-producers (typed tics enjoy subject reduction. Judgments Γ ` S1 ok and Γ ` Γ used by connective F` A, and introduced by ret v). These two term below are defined in Figure 7. forms are special in that they correspond to the two introduction → Theorem 3.1 (Subject reduction). 
3.2 Meta theory of basic λ_ic^cdd

We show that the λ_ic^cdd type system and the basic reduction semantics enjoy subject reduction. The judgments Γ ⊢ S ok and Γ ⊢ Γ′ used below are defined in Figure 7.

Theorem 3.1 (Subject reduction). Suppose that Γ ⊢ S1 ok, Γ ⊢ e : C, and S1 ⊢ e ⇓ⁿ S2; e˜. Then there exists Γ′ such that Γ ⊢ Γ′, Γ′ ⊢ S2 ok, and Γ′ ⊢ e˜ : C.

An analogous result for a small-step semantics, which we omit for space reasons, establishes that the type system guarantees progress. We also show that the inner layer does not mutate the outer-layer store (recall that the inner layer's only store effect is read-only access via get), and that it always yields deterministic results:

Theorem 3.2 (Inner purity). Suppose that Γ ⊢ e : (C)^inner and S1 ⊢ e ⇓ⁿ S2; e˜. Then S1 = S2.

Theorem 3.3 (Inner determinism). Suppose that Γ ⊢ e : (C)^inner, S1 ⊢ e ⇓ⁿ S2; e˜2, and S1 ⊢ e ⇓ⁿ S3; e˜3. Then S2 = S3 and e˜2 = e˜3.

Γ ⊢ v : A  (Under Γ, value v has value type A.)

    TYV-VAR:   Γ(x) = A  ⟹  Γ ⊢ x : A
    TYV-UNIT:  Γ ⊢ () : 1
    TYV-INJ:   Γ ⊢ v : A_i, for some i in {1, 2}  ⟹  Γ ⊢ inj_i v : A1 + A2
    TYV-PAIR:  Γ ⊢ v1 : A1 and Γ ⊢ v2 : A2  ⟹  Γ ⊢ (v1, v2) : A1 × A2
    TYV-THUNK: Γ ⊢ e : C  ⟹  Γ ⊢ thunk e : U C
    TYV-MOD:   Γ(a) = C  ⟹  Γ ⊢ a : M C

Γ ⊢ e : C  (Under Γ, expression e has computation type C.)

    TYE-VAR:   Γ(f) = C  ⟹  Γ ⊢ f : C
    TYE-LAM:   Γ, x : A ⊢ e : C  ⟹  Γ ⊢ λx.e : A → C
    TYE-RET:   Γ ⊢ v : A  ⟹  Γ ⊢ ret v : F_ℓ A
    TYE-APP:   Γ ⊢ e : A → C and Γ ⊢ v : A  ⟹  Γ ⊢ e v : C
    TYE-BIND:  Γ ⊢ e1 : F_ℓ A and Γ, x : A ⊢ e2 : (C)^ℓ  ⟹  Γ ⊢ let x ← e1 in e2 : (C)^ℓ
    TYE-CASE:  Γ ⊢ v : A1 + A2 and Γ, x_i : A_i ⊢ e_i : C, for all i in {1, 2}  ⟹  Γ ⊢ case (v, x1.e1, x2.e2) : C
    TYE-SPLIT: Γ ⊢ v : A1 × A2 and Γ, x1 : A1, x2 : A2 ⊢ e : C  ⟹  Γ ⊢ split (v, x1.x2.e) : C
    TYE-FIX:   Γ, f : C ⊢ e : C  ⟹  Γ ⊢ fix f.e : C
    TYE-FORCE: Γ ⊢ v : U (C)^ℓ  ⟹  Γ ⊢ force_ℓ v : (C)^ℓ
    TYE-INNER: |Γ| ⊢ e : (C)^inner  ⟹  Γ ⊢ inner e : (C)^outer
    TYE-REF:   Γ ⊢ e : C  ⟹  Γ ⊢ ref e : F_outer (M C)
    TYE-GET:   Γ ⊢ v : M C  ⟹  Γ ⊢ get v : C
    TYE-SET:   Γ ⊢ v1 : M C and Γ ⊢ v2 : U C  ⟹  Γ ⊢ set v1 ← v2 : F_outer 1

Figure 5: Typing semantics of λ_ic^cdd.

S1 ⊢ e ⇓ⁿ S2; e˜  (Basic reduction)

    EVAL-TERM:  S ⊢ e˜ ⇓⁰ S; e˜
    EVAL-APP:   S1 ⊢ e1 ⇓ⁿ S2; λx.e2 and S2 ⊢ e2[v/x] ⇓ᵐ S3; e˜  ⟹  S1 ⊢ e1 v ⇓ⁿ⁺ᵐ S3; e˜
    EVAL-BIND:  S1 ⊢ e1 ⇓ⁿ S2; ret v and S2 ⊢ e2[v/x] ⇓ᵐ S3; e˜  ⟹  S1 ⊢ let x ← e1 in e2 ⇓ⁿ⁺ᵐ S3; e˜
    EVAL-CASE:  S1 ⊢ e_i[v/x_i] ⇓ⁿ S2; e˜, for some i in {1, 2}  ⟹  S1 ⊢ case (inj_i v, x1.e1, x2.e2) ⇓ⁿ S2; e˜
    EVAL-SPLIT: S1 ⊢ e[v1/x1][v2/x2] ⇓ⁿ S2; e˜  ⟹  S1 ⊢ split ((v1, v2), x1.x2.e) ⇓ⁿ S2; e˜
    EVAL-FIX:   S1 ⊢ e[fix f.e / f] ⇓ⁿ S2; e˜  ⟹  S1 ⊢ fix f.e ⇓ⁿ S2; e˜
    EVAL-FORCE: S1 ⊢ e ⇓ⁿ S2; e˜  ⟹  S1 ⊢ force_ℓ (thunk e) ⇓ⁿ⁺¹ S2; e˜
    EVAL-INNER: S ⊢ e ⇓ⁿ S; e˜  ⟹  S ⊢ inner e ⇓ⁿ S; e˜
    EVAL-REF:   a ∉ dom(S)  ⟹  S ⊢ ref e ⇓⁰ (S, a:e); ret a
    EVAL-GET:   S1(a) = e and S1 ⊢ e ⇓ⁿ S2; e˜  ⟹  S1 ⊢ get a ⇓ⁿ⁺¹ S2; e˜
    EVAL-SET:   S ⊢ set a ← thunk e ⇓⁰ (S, a:e); ret ()

Figure 6: Basic reduction semantics of λ_ic^cdd (i.e., non-incremental evaluation).

Layer coercion (C)^ℓ:

    (F_ℓ1 A)^ℓ2 = F_ℓ2 A
    (A → C)^ℓ  = A → (C)^ℓ

Γ1 ⊢ Γ2  (Under Γ1, context Γ2 is a consistent extension.)

    EXT-REFL: Γ ⊢ Γ
    EXT-CONS: Γ1 ⊢ Γ2 and a fresh for Γ2  ⟹  Γ1 ⊢ (Γ2, a : C)

Γ ⊢ S ok  (Under Γ, store S types okay.)

    SOK-EMP:  Γ ⊢ ε ok
    SOK-CONS: Γ ⊢ S ok and Γ ⊢ a : M C and Γ ⊢ e : C  ⟹  Γ ⊢ (S, a:e) ok

Figure 7: Auxiliary typing judgements: Layer coercion, context extension and store typing.
4. Incremental semantics

In Figure 9, we give the incremental semantics of λ_ic^cdd. It defines the reduction-to-traces judgment K; S1 ⊢ e ⇓ⁿ S2; T, which is the analogue of normal evaluation S1 ⊢ e ⇓ⁿ S2; e˜ from Figure 6. The reduction-to-traces judgment says that, under prior knowledge K and store S1, expression e reduces to store S2 and trace T. We refer to our traces as demanded computation traces (DCTs), as they record what thunks and suspended expressions a computation has demanded. Prior knowledge is simply a list of such traces. The first time we evaluate e we will have an empty store and no prior knowledge. As e evaluates, the traces of sub-computations get added to the prior knowledge K under which subsequent sub-computations are evaluated. If the outer layer mutates the store, this knowledge may be used to support demand-driven change propagation (D2CP), written K; S ⊢ T1 ⇝ⁿ_prop T2. These judgments are instrumented with analytical costs (denoted by n, m and variants) to count the steps performed, as in the basic reduction semantics.

4.1 Trace structure and propagation semantics

We begin by giving the syntax and intuition behind our notions of prior knowledge and traces, and then describe the semantics of change propagation.

Prior knowledge and traces. Figure 8 defines our notions of prior knowledge and traces. Prior knowledge (written K) consists of a list of traces from prior reductions. Traces (written T) consist of sequences of trace events that end in a terminal expression. Trace events (written t) record demanded computations. Traced force events, written force^e_e˜[T], record the thunk expression e that was forced, its terminal expression e˜ (i.e., the final term to which e originally reduced), and the trace T that was produced during its evaluation. Traced get events, written get^a_e[T], record the address a that was read, the expression e to which it mapped, and the trace T that was produced when e was subsequently evaluated. Thus traces are hierarchical: trace events themselves contain traces which are locally consistent; there is no global ordering of all events. This allows change propagation to be more compositional, supporting, e.g., the sharing, switching and swapping patterns shown in Figures 1 to 3, respectively.

Figure 8 also defines trm(T) as the rightmost element of trace T, i.e., its terminal element, equivalent to e˜ in the normal evaluation judgment. It also defines when prior knowledge is well formed.

    Prior knowledge  K ::= ε | K, T
    Traces           T ::= T1·T2 | t | e˜
    Trace events     t ::= force^e_e˜[T] | get^a_e[T]

    trm(T1·T2)         = trm(T2)
    trm(force^e_e˜[T]) = trm(T)
    trm(get^a_e[T])    = trm(T)
    trm(e˜)            = e˜

Γ ⊢ K wf  (Under Γ, knowledge K is well formed.)

    KWF-EMP:  Γ ⊢ ε wf
    KWF-CONS: Γ ⊢ K wf and Γ ⊢ S1 ok and ε; S1 ⊢ e ⇓ⁿ S2; T  ⟹  Γ ⊢ (K, T) wf

Figure 8: Traces and prior knowledge.
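As a concrete instance (our own rendering, using the Section 2 example), the trace of forcing eval p1 when l1 = Leaf 1, l2 = Leaf 2 and p1 = Plus l1 l2 nests as follows, with eval's case expression elided for readability; the LaTeX block below uses the event notation just defined:

    \[
    \mathsf{force}^{\,\mathit{eval}\,p_1}_{\;\mathsf{ret}\,3}\Big[\,
      \mathsf{get}^{\,p_1}_{\;\mathit{Plus}\,l_1\,l_2}\big[\,
        \mathsf{force}^{\,\mathit{eval}\,l_1}_{\;\mathsf{ret}\,1}\!\big[
          \mathsf{get}^{\,l_1}_{\;\mathit{Leaf}\,1}[\,\mathsf{ret}\,1\,]\big]
        \cdot
        \mathsf{force}^{\,\mathit{eval}\,l_2}_{\;\mathsf{ret}\,2}\!\big[
          \mathsf{get}^{\,l_2}_{\;\mathit{Leaf}\,2}[\,\mathsf{ret}\,2\,]\big]
        \cdot \mathsf{ret}\,3\,\big]\,\Big]
    \]

After the outer layer performs set l1 ← thunk (Leaf 5), the innermost get event for l1 no longer agrees with the store, so checking fails exactly there, and patching replaces only the enclosing force sub-traces.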
The given rules are sound, but non-deterministic and non-algorithmic; a deterministic algorithm is given in Section 5.

Reduction to traces. Most of the rules of the reduction-to-traces judgment at the top of Figure 9 are adaptations of the basic reduction semantics (Figure 6). Rules INCR-APP and INCR-BIND are similar to their basic counterparts, except that they use trm(T) to extract the lambda or return expression, respectively, and they add the trace T1 from the first sub-expression's evaluation to the prior knowledge available to the second sub-expression. The traces produced from both are concatenated and returned from the entire computation.

Rule INCR-FORCE produces a force event; notice that the expression e from the thunk annotates the event, along with the trace T and the terminal expression e˜ at its end. Rule INCR-GET similarly produces a get event with the expected annotations. Rules INCR-TERM, INCR-REF, and INCR-SET all return the expected terminal expressions.

Change propagation is initiated in rule INCR-FORCEPROP at an inner-layer force; importantly, we do not initiate change propagation at a set, and thus we delay change propagation until a computation's result is actually demanded. Rule INCR-FORCEPROP non-deterministically chooses a prior trace of a force of the same expression e from K and recursively switches to the propagating judgement described below. The prior trace to choose is the first of two non-deterministic decisions of the incremental semantics; the second concerns the propagating specification, below.

Propagating changes by checking and patching. The change propagation judgment K; S ⊢ T1 ⇝ⁿ_prop T2 updates a trace T1 to be T2 according to knowledge K and the current store S. In the base case (rule PROP-CHECKS), there are no changes remaining to propagate through the given trace, which is consistent with the given store, as determined by the checking judgment S ⊢ T √ (explained shortly). The recursive case (rule PROP-PATCH) arbitrarily chooses an expression e and reduces it to a trace T′. (This is the second non-deterministic decision of this semantic specification.) This new subtrace is patched into the current trace according to the patching judgment T1{e : T′} ↪patch T2. The patched trace T2 is processed recursively under prior knowledge expanded to include T′, until the trace is ultimately made consistent.

The checking judgement ensures that a trace is consistent. The interesting rules are CHECK-FORCE and CHECK-GET. The first checks that the terminal expression e˜ produced by each force is consistent with the one last observed and recorded in the trace; i.e., it matches the terminal expression trm(T) of trace T. The second rule checks that the expression gotten from an address a is consistent with the current store.

The patching judgement is straightforward. All the rules are congruences except for rule PATCH-UPDATE, which actually performs the patching. It substitutes the given trace for the existing trace of the forced expression in question, based on the syntactic equivalence of the forced expression e.

4.2 Meta theory of incremental semantics

The following theorem says that trace-based runs under empty knowledge in the incremental semantics are equivalent to runs in the basic (non-incremental) semantics.

Theorem 4.1 (Equivalence of blind evaluation). ε; S1 ⊢ e ⇓ⁿ S2; T if and only if S1 ⊢ e ⇓ⁿ S2; e˜, where e˜ = trm(T).

We prove that the incremental semantics enjoys subject reduction.

Theorem 4.2 (Subject reduction). Suppose that Γ ⊢ K wf, Γ ⊢ S1 ok, Γ ⊢ e : C, and K; S1 ⊢ e ⇓ⁿ S2; T. Then there exists Γ′ such that Γ ⊢ Γ′, Γ′ ⊢ S2 ok, and Γ′ ⊢ trm(T) : C.

Finally, we prove that the incremental semantics is sound: when seeded with (well-formed) prior knowledge, there exists a consistent run in the basic (non-incremental) semantics.
K; S1 ⊢ e ⇓ⁿ S2; T  (Reduction to traces: "Under knowledge K and store S1, e reduces in n steps, yielding S2 and trace T.")

    INCR-TERM:  K; S ⊢ e˜ ⇓⁰ S; e˜
    INCR-APP:   K; S1 ⊢ e1 ⇓ⁿ S2; T1 and trm(T1) = λx.e2 and (K, T1); S2 ⊢ e2[v/x] ⇓ᵐ S3; T2  ⟹  K; S1 ⊢ e1 v ⇓ⁿ⁺ᵐ S3; T1·T2
    INCR-BIND:  K; S1 ⊢ e1 ⇓ⁿ S2; T1 and trm(T1) = ret v and (K, T1); S2 ⊢ e2[v/x] ⇓ᵐ S3; T2  ⟹  K; S1 ⊢ let x ← e1 in e2 ⇓ⁿ⁺ᵐ S3; T1·T2
    INCR-CASE:  K; S1 ⊢ e_i[v/x_i] ⇓ⁿ S2; T, for some i in {1, 2}  ⟹  K; S1 ⊢ case (inj_i v, x1.e1, x2.e2) ⇓ⁿ S2; T
    INCR-SPLIT: K; S1 ⊢ e[v1/x1][v2/x2] ⇓ⁿ S2; T  ⟹  K; S1 ⊢ split ((v1, v2), x1.x2.e) ⇓ⁿ S2; T
    INCR-FIX:   K; S1 ⊢ e[fix f.e / f] ⇓ⁿ S2; T  ⟹  K; S1 ⊢ fix f.e ⇓ⁿ S2; T
    INCR-FORCE: K; S1 ⊢ e ⇓ⁿ S2; T and trm(T) = e˜  ⟹  K; S1 ⊢ force_ℓ (thunk e) ⇓ⁿ⁺¹ S2; force^e_e˜[T]
    INCR-FORCEPROP: force^e_e˜1[T1] ∈ K and K; S ⊢ force^e_e˜1[T1] ⇝ⁿ_prop force^e_e˜2[T2]  ⟹  K; S ⊢ force_inner (thunk e) ⇓ⁿ⁺¹ S; force^e_trm(T2)[T2]
    INCR-INNER: K; S ⊢ e ⇓ⁿ S; T  ⟹  K; S ⊢ inner e ⇓ⁿ S; T
    INCR-REF:   a ∉ dom(S)  ⟹  K; S ⊢ ref e ⇓⁰ (S, a:e); ret a
    INCR-GET:   S1(a) = e and K; S1 ⊢ e ⇓ⁿ S2; T  ⟹  K; S1 ⊢ get a ⇓ⁿ⁺¹ S2; get^a_e[T]
    INCR-SET:   K; S ⊢ set a ← thunk e ⇓⁰ (S, a:e); ret ()

K; S ⊢ T1 ⇝ⁿ_prop T2  (Change propagation: "Under K, changes in store S propagate through trace T1 in n steps, yielding trace T2.")

    PROP-CHECKS: S ⊢ T √  ⟹  K; S ⊢ T ⇝⁰_prop T
    PROP-PATCH:  K; S ⊢ e ⇓ⁿ S′; T′ and T1{e : T′} ↪patch T2 and (K, T′); S ⊢ T2 ⇝ᵐ_prop T3  ⟹  K; S ⊢ T1 ⇝ⁿ⁺ᵐ_prop T3

Figure 9: Incremental semantics of λ_ic^cdd: Reduction (to traces), propagating changes.

S ⊢ T √  (Checking: "Under store S, trace T checks as consistent.")

    CHECK-TERM:  S ⊢ e˜ √
    CHECK-SEQ:   S ⊢ T1 √ and S ⊢ T2 √  ⟹  S ⊢ T1·T2 √
    CHECK-FORCE: trm(T) = e˜ and S ⊢ T √  ⟹  S ⊢ force^e_e˜[T] √
    CHECK-GET:   S(a) = e and S ⊢ T √  ⟹  S ⊢ get^a_e[T] √

T1{e : T2} ↪patch T3  (Patching: "Patching trace T1 at forces of e with replacement trace T2 yields trace T3.")

    PATCH-TERM:   e˜{e : T2} ↪patch e˜
    PATCH-SEQ:    T1{e : T3} ↪patch T1′ and T2{e : T3} ↪patch T2′  ⟹  (T1·T2){e : T3} ↪patch T1′·T2′
    PATCH-UPDATE: (force^e_e˜[T1]){e : T2} ↪patch force^e_e˜[T2]
    PATCH-FORCE:  e1 ≠ e2 and T1{e2 : T2} ↪patch T3  ⟹  (force^e1_e˜[T1]){e2 : T2} ↪patch force^e1_e˜[T3]
    PATCH-GET:    T1{e2 : T2} ↪patch T1′  ⟹  (get^a_e1[T1]){e2 : T2} ↪patch get^a_e1[T1′]

Figure 10: Incremental semantics of λ_ic^cdd: Patching traces with substituted sub-traces, and checking traces for consistency.

Theorem 4.3 (Soundness). Suppose that Γ ⊢ K wf and Γ ⊢ S1 ok. Then K; S1 ⊢ e ⇓ⁿ S2; T if and only if S1 ⊢ e ⇓ᵐ S2; trm(T).

This theorem establishes that every incremental reduction has a corresponding basic reduction, and vice versa. This correspondence establishes extensional consistency, i.e., the initial and final conditions of the runs are equivalent.

5. Algorithmic patching

The incremental semantics given in the previous section is sound, but not yet an algorithm. In this section, we address one of the two sources of nondeterminism in the semantics: the arbitrary choices for ordering patching steps in rule PROP-PATCH. In general, some orders are preferable to others: some patching steps may fix inconsistencies in sub-traces that ultimately are not relevant under the current store. The algorithmic semantics given here addresses this problem by giving a deterministic order to patching, such that no unnecessary patching steps occur. The other source of nondeterminism arises from rule INCR-FORCEPROP: the specification allows an arbitrary choice of trace to patch from prior knowledge, and in general many instances of a forced expression e may exist there, making it difficult to know, a priori, which trace will be most efficient to patch under the current store. We address this problem in our implementation, given in Section 6.

Figure 11 defines K; S ⊢ T1 ⇝ⁿ_alg T2, the algorithmic propagation judgment, which employs an explicit, deterministic patching order. This order is determined by the scheduling judgment S; e ⊢ T ↪sched p, where e is the nearest enclosing thunk, and p is either some thunk e′ to patch next, or else ◦ if trace T is consistent.

S; e ⊢ T ↪sched p  (Scheduling: "Under S and e, p is the next thunk to patch in T.")

    Optional thunk  p ::= ◦ | e

    SCHED-TERM:       S; e ⊢ e˜ ↪sched ◦
    SCHED-SEQ1:       S; e1 ⊢ T1 ↪sched e2  ⟹  S; e1 ⊢ T1·T2 ↪sched e2
    SCHED-SEQ2:       S; e ⊢ T1 ↪sched ◦ and S; e ⊢ T2 ↪sched p  ⟹  S; e ⊢ T1·T2 ↪sched p
    SCHED-FORCEOK:    trm(T) = e˜ and S; e2 ⊢ T ↪sched p  ⟹  S; e1 ⊢ force^e2_e˜[T] ↪sched p
    SCHED-PATCHFORCE: trm(T) ≠ e˜  ⟹  S; e1 ⊢ force^e2_e˜[T] ↪sched e1
    SCHED-GETOK:      S(a) = e2 and S; e1 ⊢ T ↪sched p  ⟹  S; e1 ⊢ get^a_e2[T] ↪sched p
    SCHED-PATCHGET:   S(a) ≠ e2  ⟹  S; e1 ⊢ get^a_e2[T] ↪sched e1

K; S ⊢ T1 ⇝ⁿ_alg T2  (Algorithm: deterministically patches T1 into T2.)

    ALG-PATCHSTEP: S; e1 ⊢ T1 ↪sched e2 and K; S ⊢ e2 ⇓ⁿ S; T′ and (force^e1_e˜[T1]){e2 : T′} ↪patch T2 and (K, T′); S ⊢ T2 ⇝ᵐ_alg T3  ⟹  K; S ⊢ force^e1_e˜[T1] ⇝ⁿ⁺ᵐ_alg T3
    ALG-DONE:      S; e ⊢ T ↪sched ◦  ⟹  K; S ⊢ T ⇝⁰_alg T

Figure 11: A deterministic algorithm for patching.

The scheduling judgement chooses p based on the inconsistencies of the trace T and store S. Rule SCHED-TERM is the base case; rules SCHED-SEQ1 and SCHED-SEQ2 handle the sequencing cases where the thunk to patch is in the left sub-trace, or not, respectively. For forces and gets, the respective rules check that the force or get event is consistent. If so, rules SCHED-FORCEOK and SCHED-GETOK recursively check the trace enclosed within those events; rule SCHED-FORCEOK additionally places the thunk associated with the enclosed trace as e to the left of the turnstile, since that is the nearest enclosing thunk for that trace. Otherwise, if the force or get event is inconsistent, rules SCHED-PATCHFORCE and SCHED-PATCHGET schedule the nearest enclosing thunk to patch.

Algorithmic change propagation K; S ⊢ T1 ⇝ⁿ_alg T2 closely resembles the nondeterministic specification K; S ⊢ T1 ⇝ⁿ_prop T2 from Section 4. It gives rules for successively scheduling a thunk and patching the associated trace T1 into T2. We note that the root of the incoming and outgoing trace always consists of a force event (i.e., force^e1_e˜[T1]); this invariant is preserved by the propagation algorithm and by its use in rule INCR-FORCEPROP, which is the only evaluation rule that uses this judgement as a premise. Rule ALG-PATCHSTEP does all the work of the algorithm: it schedules the next thunk e2 to patch, recomputes this thunk yielding a new trace T′, patches the new trace in at occurrences of e2 in the current trace force^e1_e˜[T1], and repeats this process, which terminates with rule ALG-DONE. The premise of rule ALG-DONE is justified by the following lemma, which relates this base case to the specification semantics:

Lemma 5.1 (No scheduled thunk implies check). S; e ⊢ T ↪sched ◦ implies S ⊢ T √.

The result below says that our algorithmic semantics is sound with respect to the specification semantics.

Theorem 5.2 (Algorithmic semantics is sound). If K; S ⊢ T1 ⇝ⁿ_alg T2 then K; S ⊢ T1 ⇝ⁿ_prop T2.
6. ADAPTON: An OCaml Library for Incremental Computation

We have developed an implementation of CD2IC in an OCaml library named ADAPTON. Incremental programs are written using the LazyBidirectional module, which provides the simple API shown in Figure 12:

    type 'a thunk
    val force      : 'a thunk -> 'a
    val update     : 'a thunk -> 'a -> unit
    val make_const : 'a -> 'a thunk
    val make_thunk : (unit -> 'a) -> 'a thunk
    val memo       : (module HashedType with type t = 'arg)
                     -> ('fn -> 'arg -> 'a) -> (('arg -> 'a thunk) as 'fn)

Figure 12: Basic ADAPTON API.

The fundamental data type is thunk, which subsumes the roles of both references M C and thunks U C in λ_ic^cdd. The force function retrieves the value of a thunk, computing or updating its value as necessary, similar to get or force in λ_ic^cdd. The update function replaces the value in a thunk, like set in λ_ic^cdd. The make_const and make_thunk functions construct thunks from a value or a nullary function, respectively. Finally, the memo function (whose use is discussed in more detail shortly) creates a memoized constructor from a unary function, given a HashedType module that describes the type of the argument (used internally to create memoization hash tables). The thunks constructed by make_thunk or by constructors created by memo do not contain a value at first. A thunk's value is computed by calling the given function when it is forced, and then the resulting value is cached in the thunk. These constructors are the counterparts to thunk in λ_ic^cdd.

We find it convenient to unify references and thunks in ADAPTON, since get and force, as well as ref and thunk, are symmetrical operations with the same extensional semantics. Furthermore, OCaml's type system does not allow us to easily enforce layer separation statically. Thus, we chose this unified API to make it simpler to program with ADAPTON. In ADAPTON, an inner-level computation begins when force is called and ends when the call returns.

There is one semantic difference between λ_ic^cdd and LazyBidirectional: in λ_ic^cdd, memoization occurs at force, whereas in LazyBidirectional, memoization occurs when calling a memoized constructor created by memo. As an OCaml library, ADAPTON cannot compare two expressions for syntactic equality after variable substitution, unlike λ_ic^cdd. Instead, we use memo to manually identify the free variables of an expression as function arguments, and use the values of those arguments for memoization; i.e., we represent thunk expressions as functions, and free variables of the expression as function arguments. When a constructor created by memo is called, we first check a memoization table to see if the constructor was previously called with the same argument. If so, we return the same thunk as before; otherwise, we create a new thunk, store it in the memoization table, and return it. This design choice is equivalent to deterministically choosing the most recently patched occurrence of a trace from the prior knowledge in rule INCR-FORCEPROP of the incremental semantics. To limit the size of memoization tables, we implement them using weak hash tables, relying on OCaml's garbage collector to eventually remove thunks that are no longer reachable.

In addition to LazyBidirectional, ADAPTON also provides EagerTotalOrder, which implements a totally ordered, monolithic form of incremental computation as described in prior work (in particular, [3]). There are also two modules, EagerNonInc and LazyNonInc, which are simply wrappers around regular and lazy values, respectively, that do not provide incremental semantics or memoization. All four modules implement the same API in Figure 12 to make them easily interchangeable; thus it is straightforward to compare the same program under different evaluation and incremental semantics using ADAPTON.
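The following usage sketch is ours, not verbatim library code: the module path Adapton.LazyBidirectional and the exact memo calling convention are assumptions based on Figure 12 and the description above.

    (* Usage sketch of the Figure 12 API. The module path and the memo
       calling convention below are assumptions, not verbatim library code. *)
    module A = Adapton.LazyBidirectional   (* hypothetical module path *)

    let l1  = A.make_const 1
    let l2  = A.make_const 2
    let sum = A.make_thunk (fun () -> A.force l1 + A.force l2)

    let () =
      Printf.printf "%d\n" (A.force sum);  (* 3: computed, then cached     *)
      A.update l1 5;                       (* dirties sum's incoming path  *)
      Printf.printf "%d\n" (A.force sum)   (* 7: repaired on demand        *)

    (* A memoized constructor: per the memo signature, the body receives
       the constructor itself, enabling recursive memoization. *)
    module IntKey = struct
      type t = int
      let equal = (=)
      let hash = Hashtbl.hash
    end

    let fib : int -> int A.thunk =
      A.memo (module IntKey)
        (fun fib n ->
           if n < 2 then n
           else A.force (fib (n - 1)) + A.force (fib (n - 2)))

Calling fib 10 twice returns the same thunk, and updating any thunk it depends on dirties only the affected portion of the DCG.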
6.1 LazyBidirectional Implementation

The LazyBidirectional module implements λ_ic^cdd using an efficient graph-based representation to realize the algorithmic patching semantics in Section 5. The core underlying data structure of LazyBidirectional is the acyclic demanded computation graph (DCG), corresponding to traces T. Similar to the visualization in Section 2, each node in the graph represents a thunk, and each directed edge represents a dependency pointing from the thunk calling force to the forced thunk.

The graph is initially empty at the beginning of the execution of an incremental program, and is built and maintained dynamically as the program executes. Nodes are added when make_const or make_thunk is called. Nodes are also added when a memo constructor is called and a new thunk is created, i.e., when memoization misses. Edges are added when force is called, and are labeled with the value returned by that call. We maintain edges bidirectionally: each node stores both an ordered list of outgoing edges that is appended by each call to force, and an unordered set of incoming edges. This allows us to traverse the graph from caller to callee or vice versa.

As described in Section 2, LazyBidirectional takes advantage of the bidirectional nature of the graph in two phases. The dirtying phase occurs when we update the inputs to the incremental program, i.e., when we make (consecutive) calls to update. In this phase, we record calls to update in the graph by starting from the updated thunk and traversing incoming edges backward through calling thunks, up to thunks with no incoming edges, marking all traversed edges as "dirty." These dirty edges indicate intermediate values in the computation that may potentially change because of the updated inputs, i.e., they induce a sub-graph of thunks that may need to be re-evaluated.

The propagate phase performs D2CP, beginning with a call to force on a thunk that contains dirty outgoing edges, i.e., that may be potentially affected by an updated input. Then, we perform patching using a generalized inorder traversal of dirty outgoing edges, re-evaluating thunks if necessary. We check whether we need to re-evaluate the forced thunk by traversing its dirty outgoing edges in the order they were added: for each edge, we first clean its dirty flag, then check if the target thunk contains dirty outgoing edges. If so, we recursively check whether we need to re-evaluate the target thunk; otherwise, we compare the value of the target thunk against the label of the outgoing edge. If the value is inconsistent, then we know that at least one of its inputs has changed, so we skip its remaining outgoing edges and immediately re-evaluate the thunk. Before doing so, we first remove all its outgoing edges, since some of the edges may no longer be relevant due to the changed input; relevant edges will be re-added when the re-evaluation of the thunk calls force. (We store incoming edges in a weak hash table, relying on OCaml's garbage collector to remove irrelevant edges.) If all the values of outgoing edges are consistent, we need not re-evaluate the thunk, since no inputs have changed.

The above procedure checks and re-evaluates thunks in the same order as described in Section 5, but in an optimized manner. First, the above procedure immediately re-evaluates thunks as necessary while traversing the graph, whereas the formalism schedules thunks to patch by repeatedly restarting the traversal from the initially forced thunk. Second, the above procedure only traverses the sub-graph induced by dirty edges, which can be much smaller than the entire graph if the number of updated inputs is small.
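The following is an illustrative sketch of these structures and of the backward dirtying walk just described; it is our own simplification (monomorphic, int-valued thunks) and not ADAPTON's actual internals.

    (* Illustrative sketch, not ADAPTON's internals: a DCG over
       int-valued thunks, with bidirectional edges and dirty flags. *)
    type node = {
      thunk            : unit -> int;  (* suspended computation          *)
      mutable value    : int option;   (* cached result, once forced     *)
      mutable outgoing : edge list;    (* ordered: forces we performed   *)
      mutable incoming : edge list;    (* unordered: thunks that force us*)
    }
    and edge = {
      src : node;                      (* caller (the forcing thunk)     *)
      tgt : node;                      (* callee (the forced thunk)      *)
      mutable label : int;             (* value observed at this force   *)
      mutable dirty : bool;
    }

    (* Dirtying phase: walk incoming edges backward from an updated
       node, marking edges dirty, and stop at already-dirty edges;
       by the invariant discussed below, everything behind an
       already-dirty edge is dirty as well. *)
    let rec dirty (n : node) : unit =
      List.iter
        (fun e ->
           if not e.dirty then begin
             e.dirty <- true;
             dirty e.src
           end)
        n.incoming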
One possible concern is the cost of the dirtying phase. However, we observe that the above procedure maintains an invariant: if an edge is dirty at the end of a dirtying or propagate phase, then all edges transitively reachable by traversing incoming edges, beginning from the source thunk, will also be dirty. Thus, we need not continue the dirtying phase past dirty edges, in effect amortizing the dirtying cost across consecutive calls to update. Dually, if an edge is clean, then all edges transitively reachable by outgoing edges, beginning from the target thunk, will also be clean, which amortizes change propagation cost across consecutive calls to force.

7. Empirical Evaluation

We ran micro-benchmarks to evaluate the effectiveness of λ_ic^cdd in handling several different interactive computing patterns:

• lazy, which demands only a small portion of the output (just one item), and makes only simple changes to the input (e.g., adding or removing a list item or a tree node);
• batch, which demands the entire output, and makes only simple changes as with lazy;
• swap, which also demands the entire output, but changes the input by swapping two portions of the input (e.g., swapping two halves of a list, as sketched after this list, or two subtrees);
• and switch, which chooses between two computations to apply to a main input based on another flag input, then demands a small portion of the output (just one item), and toggles the flag input while making simple changes to the main input.

The swap and switch patterns are among the patterns described in Section 2 that motivated this work, whereas prior work addressed only the batch pattern.
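As a concrete illustration of the swap change pattern, here is a small OCaml sketch of our own on plain, non-incremental lists (the benchmarks perform the analogous change through ADAPTON cells):

    (* Sketch of the swap change pattern: exchange the two halves of a
       list. The benchmarked versions make this change through ADAPTON
       cells, so change propagation can reuse both halves' computations. *)
    let swap_halves (xs : 'a list) : 'a list =
      let n = List.length xs / 2 in
      let rec split i ys =
        if i = 0 then ([], ys)
        else match ys with
          | [] -> ([], [])
          | y :: tl ->
            let front, back = split (i - 1) tl in
            (y :: front, back)
      in
      let front, back = split n xs in
      back @ front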
appropriate the di- returns then one whereas directions, in both other, list the input or the input: rection sorts another that of implementation value the straightforward on depending order ing The ottewl-lc ieadmxmmha ie srpre by reported as size, heap maximum and time wall-clock the port For results. our shows 1 Table Results Micro-benchmark 7.1 heap major and 16MB for of OCaml’s 32MB size of tweak heap re- increment we minor been a issue, use has this to mitigate that collector To issue garbage [4]. well This work collector, programs. prior garbage some in the for ported in time the time half of over portion significant a spends mergesort fold(sum) fold(sum) fold(min) fold(min) quicksort updown2 updown1 eipeetec ir-ecmr ywiigsvrlincre- several writing by micro-benchmark each implement We noriiileauto,w bevdthat observed we evaluation, initial our In run and 4.00.1 OCaml using micro-benchmarks the compile We and exptree exptree mergesort and , eedoverhead: Legend lazy ial,for Finally, . filter filter filter map map map map ohrtrigals otdi ihracnigo descend- or ascending either in sorted list a returning both , map sum , otn nec direction). each in sorting , , swap quicksort —and swl stetoohrls programs— list other two the as well as swap ecos e tm,adfor and items, 1e5 choose we , switch batch lazy pattern 4e4 4e4 1e6 1e6 1e6 1e6 1e6 1e6 1e6 1e6 1e6 1e6 1e5 1e5 1e6 1e6 and , DAPTON input # switch batch exptree and ,

7.1 Micro-benchmark Results

Table 1 shows our results. For the *NonInc modules, we report the wall-clock time and maximum heap size, as reported by OCaml's garbage collector, to run each program from scratch. For LazyBidirectional and EagerTotalOrder, instead of reporting wall-clock time, we report the overhead of the initial from-scratch run and the speed-up of an incremental change-then-propagate cycle, both relative to EagerNonInc, along with the maximum heap size.

[Table 1: ADAPTON micro-benchmark results. The table's layout does not survive extraction. For each pattern (lazy, batch, swap, switch) it lists each program (map, filter, quicksort, mergesort, fold(min), fold(sum), exptree, updown1, updown2) with its input size (1e5, 4e4, or 1e6 items) and demand size (one item or all), the from-scratch time (s) and memory (MB) of EagerNonInc and LazyNonInc, and the overhead, incremental speed-up, and memory (MB) of LazyBidirectional and EagerTotalOrder. Legend: overhead = incremental module's from-scratch time / from-scratch time; speed-up = from-scratch time / incremental time; mem = OCaml GC's maximum heap size. Cells are highlighted in gray to indicate whether LazyBidirectional or EagerTotalOrder has the higher speed-up or uses less memory when performing an incremental computation.]

We can see that LazyBidirectional provides a speed-up to all patterns and programs. Under the lazy pattern, the speed-up for the fold programs is over a million times, since only one out of a million input items needs to be processed, and lazy is quite effective for quicksort as well, which need only sort the demanded portion of the list.

In contrast, EagerTotalOrder is much better behaved for the batch pattern, for which it is optimized, but it actually incurs slowdowns in several cases, e.g., for quicksort under lazy and for mergesort under all patterns. The slowdown for mergesort is due to limited memoization between each internal recursion in mergesort. Prior work required programmers to manually modify mergesort to use techniques such as adaptive memoization [1] or keyed allocation [16] to improve its incremental performance. We are currently looking into an alternative technique that employs the concept of functional dependencies [8] from databases to systematically identify better memoization opportunities.

Interestingly, EagerTotalOrder is faster than LazyBidirectional for operations such as map and filter under the batch pattern. This is because LazyBidirectional involves a dirtying phase in addition to a change propagation phase when performing an incremental computation (as described in Section 6.1); when all outputs are demanded, the dirtying phase becomes just an unnecessary cost rather than a change propagation saving. Nonetheless, LazyBidirectional still provides a speed-up for these programs, and its advantage can still be quite significant in the other cases.

LazyBidirectional also has a higher speed-up or uses less memory than EagerTotalOrder under swap and switch, except for exptree and updown2. Due to its underlying total ordering assumption, EagerTotalOrder can only memo-match about half the input on average for the changes considered by swap; it has to recompute the rest. For updown1 in particular, the structure of the computation trace is such that EagerTotalOrder cannot memo-match any prior computation at all, and has to re-sort the input list every time the flag input is toggled. updown2 works around this limitation by unconditionally sorting the input list in both directions before returning the appropriate one, but this effectively wastes half the computation and uses twice as much memory. In contrast, LazyBidirectional is equally effective for updown1 and updown2: it is able to memo-match the computation in updown1 regardless of the flag input and, due to laziness, incurs almost no cost to unconditionally sort the input list twice in updown2.

7.2 Evaluating performance over varying demand

LazyBidirectional performs best for small demand sizes, but the cost of D2CP gradually becomes more significant as demand size increases. We quantify this effect for quicksort and mergesort using the same procedure as for the lazy pattern in the previous section, measuring the impact of increasing demand. We use EagerNonInc as the baseline and vary the demand size from one element to 5% of the output. As Figure 13a shows, for quicksort, LazyBidirectional becomes
incremental its improve to [16] 1780 1640 1620 1600 1580 sfse for faster is and fmr upteeet eedmne.This demanded. were elements output more if 119 121 395 162 264 264 sum a bu aftesed eepce hst be to this expected We speed. the half about —at LazyBidirectional EagerTotalOrder quicksort smc etrbhvdthan behaved better much is o the for EagerTotalOrder overhead . ucinldependencies functional ae Lazy Eager rvdsasedu oalpten n pro- and patterns all to speed-up a provides LazyBidirectional 29951930 987 915 429 72.9 1440 70.9 17.0 1420 806 17.5 773 11.7 9.60 19.1 9.49 7.35 1440 11.0 15.2 1410 806 15.5 773 11.0 9.22 19.1 9.08 1390 7.35 60.3 2610 570 11.5 20.1 773 773 53.3 1680000 8.54 953000 12.8 LazyBidirectional for 2 061480 60.6 128 1480 60.1 129 vredmem overhead from-scratch lazy map and novsadryn hs naddition in phase dirtying a involves fold(min) LazyBidirectional min h iei ae ornteinitial the run to takes it time the , sotmzdfrthe for optimized is nthe In . a ihrsedu rue less uses or speed-up higher a has , swap ic nyoeoto million a of out one only since EagerTotalOrder mergesort prto scmae oother to compared as operation sfse n ssls memory less uses and faster is culyicr lwon in slowdowns incurs actually EagerNonInc EagerTotalOrder dpiememoization adaptive (MB) nta frprigwall- reporting of instead , EagerNonInc and , ic hne r o as not are changes since , batch soe ilo times million a over is osntproma well as perform not does oethat Note . switch 00015 773 773 1.53 2.24 301000 166000 ae Lazy Eager incremental .1 .04 3710 0.00247 0.015 6090 0.128 6070 0.123 2900 0.228 0.248 2710 0.226 0.143 0.394 0.247 4900 0.148 0.443 atr,tedirtying the pattern, 203090 6970 5260 3742 2120 4.28 1810 53.7 10.1 4.78 4.97 6.84 347 2680 22.9 245 pe-pmem speed-up 8 rmdatabases from [8] rvdsaspeed- a provides incremental and EagerTotalOrder n lounder also and , atrs which patterns, filter and batch LazyNonInc 4220 3.32 4.11 746 ti also is It . mergesort LazyNon- λ change- ic cdd pattern, (MB) 4 or [4] 1480 1440 1440 1540 1410 Lazy- that , slower than EagerTotalOrder when demanding more than about 500 20 EagerTotalOrder (incr.) LazyBidirectional (incr.) 1.8% of the output. However, the speed-up of LazyBidirectional re- LazyBidirectional (incr.) LazyNonInc EagerNonInc LazyNonInc 400 mains greater than both and even when LazyNonInc 15 demanding 5% of the output, thus, there is still an advantage to 300 using LazyBidirectional, if not as much as EagerTotalOrder. 10 The speed-up for mergesort is lower but still significant, as 200 shown in Figure 13b. We omit EagerTotalOrder from this plot 5 because it incurs a slowdown, as explained in Section 7.1. The 100 plot shows that LazyBidirectional becomes slower than LazyNonInc speed-up over EagerNonInc speed-up over EagerNonInc 0 0 when demanding more than 4% of the output. 0% 1% 2% 3% 4% 5% 0% 1% 2% 3% 4% 5% Figures 13c and 13d shows how much time is spent in the demand size demand size dirtying and propagation phases of LazyBidirectional. As expected, (a) quicksort speed-up (b) mergesort speed-up propagation time increases with demand size, as more computation has to be performed to compute the output. Also, dirtying is less 14 600 propagate propagate costly than propagation, since it does not perform any computation 12 dirty 500 dirty on thunks; dirtying is significantly less costly for mergesort relative 10 400 to propagation as more thunks are re-evaluated than in quicksort. 8 Interestingly, however, dirtying time increases with demand size. 
7.2 Evaluating performance over varying demand

LazyBidirectional performs best for small demand sizes, but the cost of D2CP gradually becomes more significant as demand size increases. We quantify this effect for quicksort and mergesort using the same procedure as for the lazy pattern in the previous section, measuring the impact of increasing demand. We use EagerNonInc as the baseline and vary the demand size from one element to 5% of the output. For comparison, we also measure the speed-up of EagerTotalOrder as well as the speed-up of LazyNonInc over EagerNonInc. Since LazyBidirectional contains two different phases that update the DCG—a dirtying phase when the input list is updated and a propagating phase to repair inconsistencies—we additionally measure the time spent in each phase to understand their relative costs.

The results are shown in Figure 13. In Figure 13a we see that the speed-up of LazyBidirectional decreases as demand size increases, whereas the speed-up of EagerTotalOrder is relatively constant across demand sizes (though there is a minor cost to take each additional element from the output list). LazyBidirectional becomes slower than EagerTotalOrder when demanding more than about 1.8% of the output. However, the speed-up of LazyBidirectional remains greater than that of both LazyNonInc and EagerNonInc even when demanding 5% of the output; thus, there is still an advantage to using LazyBidirectional, if not as great as with EagerTotalOrder.

[Figure 13: Speed-up and dirtying/patching time over varying demand sizes for input size of 100,000. Panels: (a) quicksort speed-up and (b) mergesort speed-up, both over EagerNonInc; (c) quicksort and (d) mergesort dirty/patch time (ms), split into dirtying and propagation. The x-axis of each panel is demand size, from 0% to 5%.]

The speed-up for mergesort is lower but still significant, as shown in Figure 13b. We omit EagerTotalOrder from this plot because it incurs a slowdown, as explained in Section 7.1. The plot shows that LazyBidirectional becomes slower than LazyNonInc when demanding more than 4% of the output.

Figures 13c and 13d show how much time is spent in the dirtying and propagation phases of LazyBidirectional. As expected, propagation time increases with demand size, as more computation has to be performed to compute the output. Also, dirtying is less costly than propagation, since it does not perform any computation on thunks; dirtying is significantly less costly relative to propagation for mergesort, as more thunks are re-evaluated there than in quicksort. Interestingly, however, dirtying time increases with demand size. This is due to the interaction between the amortization of the dirtying and propagation phases described at the end of Section 6.1. For a set of input changes, each consecutive change has to dirty fewer edges, as more edges become dirty in the graph. However, as demand size increases, more dirty edges will be cleaned by propagation, resulting in more dirtying work for the next set of input changes.
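The two phases can be pictured with a minimal sketch of a dirtying traversal over graph nodes; this illustrates the discipline described in Section 6.1, not ADAPTON's actual data structures, and all names below are ours:

  type node = {
    mutable dirty : bool;
    mutable observers : node list;   (* nodes whose results depend on this one *)
    recompute : unit -> unit;        (* how to re-evaluate this node's thunk *)
  }

  let rec dirty_node (n : node) =
    if not n.dirty then begin
      n.dirty <- true;                   (* mark once... *)
      List.iter dirty_node n.observers   (* ...and cut off at already-dirty nodes *)
    end

  let clean_node (n : node) =            (* run during demand-driven propagation *)
    if n.dirty then begin
      n.recompute ();                    (* a real implementation would first clean dependencies *)
      n.dirty <- false
    end

Because dirty_node stops at nodes that are already dirty, consecutive input changes dirty fewer and fewer edges, which is exactly the amortization that increased demand (and hence increased cleaning) undoes.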
8. Related Work

Incremental computation via memoization. Memoization, also called function caching [1, 20, 26, 29], improves the efficiency of any purely functional program wherein functions are applied repeatedly to equivalent arguments. In fact, this idea dates back to at least the late 1950's [9, 27, 28]. Self-adjusting computation, discussed below, uses a special form of memoization that caches and reuses portions of the dynamic dependency graphs of a computation, as opposed to simply caching their final results.

Our memoization technique is related to that of self-adjusting computation, in that CD2IC uses memoization to cache dependency graphs. As in self-adjusting computation, and unlike earlier purely-functional memoization approaches, CD2IC tolerates store-based differences between the pending computation being matched and its potential matches in the memo table; change propagation repairs any inconsistencies in the matched graph.
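For concreteness, classic function caching can be packaged as a memoizing fixpoint combinator; the sketch below is in the spirit of the cited work, and the name memo_fix is ours:

  let memo_fix (f : (int -> int) -> int -> int) : int -> int =
    let table : (int, int) Hashtbl.t = Hashtbl.create 16 in
    let rec g x =
      match Hashtbl.find_opt table x with
      | Some y -> y                  (* cache hit: reuse the prior result *)
      | None ->
          let y = f g x in           (* compute, memoizing recursive calls too *)
          Hashtbl.add table x y;
          y
    in
    g

  (* usage: the exponential recursion collapses to linear work *)
  let fib = memo_fix (fun fib n -> if n < 2 then n else fib (n - 1) + fib (n - 2))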
Incremental computation via dependence graphs. Incremental computation has been studied by researchers for decades; we refer the reader to a categorized bibliography of early work [30]. Most techniques maintain some representation of data dependencies as graphs. Self-adjusting computation adapts the dependence graphs of earlier techniques, introducing dynamic dependence graphs (DDGs), which are generated from conventional-looking programs with general recursion and fine-grained data dependencies [2, 11]. Later, researchers combined these dynamic graphs with a special form of memoization, making the approach even more efficient and broadly applicable [3]. More recently, researchers have studied ways to make self-adjusting programs easier to write and reason about [12, 13, 24], as well as more performant, empirically [18, 19].

As discussed in Sections 1 and 2, we make several advances over prior work in the setting of interactive, demand-driven computations. First, we formally characterize the semantics of the inner and outer layers working in concert, whereas all prior work simply ignored the outer layer (which is problematic for modeling interactivity). Second, we offer a compositional model that supports several key incremental patterns handled poorly or not at all in prior work. These patterns consist of the following: sharing, where distinct inner computations share dependency graph components; switching, where outer layer demand drives what computations are incrementally updated; and swapping, where the inputs and/or computation steps interchange their position, relative to some prior demand. Past work based on maintaining a single totally-ordered view of past computation (e.g., all work on self-adjusting computation) simply cannot handle these patterns; the sketch below illustrates the first of them.
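As a small illustration of the sharing pattern, two demanded views can reuse one cached subcomputation. The sketch below uses OCaml's built-in Lazy module in place of ADAPTON thunks and elides the dependency-graph machinery; all names are ours:

  let shared = lazy (List.init 1_000_000 (fun i -> i * i) |> List.fold_left (+) 0)
  let pane_a = lazy (Lazy.force shared + 1)   (* one pane's view of the data *)
  let pane_b = lazy (Lazy.force shared * 2)   (* another pane's view *)

  let () =
    ignore (Lazy.force pane_a);   (* computes shared once *)
    ignore (Lazy.force pane_b)    (* reuses shared's cached result *)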
Ley-Wild et al. have recently studied non-monotonic changes (viz., what we call "swapping") and lazy evaluation, giving a formal semantics and algorithmic designs [23, 25]. However, these semantics still assume a totally-ordered, monolithic trace representation and hence are still of limited use for interactive settings, as discussed in Section 1. To our knowledge, these extensions have no corresponding implementations.

Functional reactive programming. The chief aim of FRP is to provide a declarative means of specifying interactive and/or time-varying behavior. Some FRP-based proposals share some commonalities with incremental computation (e.g., [14, 21]). By virtue of its declarative nature, FRP makes incremental change implicit, rather than explicit: it hides mutation and incremental change beneath abstractions for streams, which capture time-varying data. By contrast, CD2IC explicitly exposes the inner/outer dichotomy, and allows the outer layer to perform arbitrary store mutations. Unlike incremental computation broadly, FRP is not chiefly concerned with asymptotic trends or fine-grained incremental dependencies. Interesting future work may involve investigating how FRP applications can benefit from CD2IC techniques, and how CD2IC can benefit from the higher-level abstractions proposed by researchers studying FRP.

9. Conclusion

Within the context of interactive, demand-driven scenarios, we identify key limitations in prior work on incremental computation. Specifically, we show that certain idiomatic patterns naturally arise that result in incremental computations being shared, switched, and swapped, each representing an "edge case" that past work cannot handle efficiently. These limitations are a direct consequence of past works' tacit assumption that the maintained cache enabling incremental reuse is monolithic and totally ordered.

To overcome these problems, we give a new, more composable approach that naturally expresses lazy (i.e., demand-driven) evaluation and uses the notion of a thunk to identify reusable units of computation. This new approach naturally supports the idioms that were previously problematic. We realized this new approach both formally, as a core calculus that we prove is always consistent with full re-evaluation, and practically, as an OCaml library (viz., ADAPTON). We evaluated ADAPTON on a range of benchmarks, showing that it provides reliable speedups, and in many cases dramatically outperforms prior state-of-the-art IC approaches.

Acknowledgments

This research was supported in part by NSF CCF-0915978.

References

[1] M. Abadi, B. W. Lampson, and J.-J. Lévy. Analysis and caching of dependencies. In International Conference on Functional Programming, pages 83–91, 1996.
[2] U. A. Acar, G. E. Blelloch, and R. Harper. Adaptive functional programming. In Proceedings of the 29th Annual ACM Symposium on Principles of Programming Languages, pages 247–259, 2002.
[3] U. A. Acar, G. E. Blelloch, M. Blume, R. Harper, and K. Tangwongsan. A library for self-adjusting computation. Electronic Notes in Theoretical Computer Science, 148(2), 2006.
[4] U. A. Acar, G. E. Blelloch, M. Blume, and K. Tangwongsan. An experimental analysis of self-adjusting computation. In Proceedings of the ACM Conference on Programming Language Design and Implementation, 2006.
[5] U. A. Acar, A. Ihler, R. Mettu, and Ö. Sümer. Adaptive Bayesian inference. In Neural Information Processing Systems (NIPS), 2007.
[6] U. A. Acar, G. E. Blelloch, K. Tangwongsan, and D. Türkoğlu. Robust kinetic convex hulls in 3D. In Proceedings of the 16th Annual European Symposium on Algorithms, Sept. 2008.
[7] U. A. Acar, A. Cotter, B. Hudson, and D. Türkoğlu. Dynamic well-spaced point sets. In Symposium on Computational Geometry, 2010.
[8] W. W. Armstrong. Dependency structures of data base relationships. In IFIP Congress, pages 580–583, 1974.
[9] R. Bellman. Dynamic Programming. Princeton Univ. Press, 1957.
[10] M. A. Bender, R. Cole, E. D. Demaine, M. Farach-Colton, and J. Zito. Two simplified algorithms for maintaining order in a list. In Lecture Notes in Computer Science, pages 152–164, 2002.
[11] M. Carlsson. Monads for incremental computing. In International Conference on Functional Programming, pages 26–35, 2002.
[12] Y. Chen, J. Dunfield, M. A. Hammer, and U. A. Acar. Implicit self-adjusting computation for purely functional programs. In Int'l Conference on Functional Programming (ICFP '11), pages 129–141, Sept. 2011.
[13] Y. Chen, J. Dunfield, and U. A. Acar. Type-directed automatic incrementalization. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2012. To appear.
[14] G. H. Cooper and S. Krishnamurthi. Embedding dynamic dataflow in a call-by-value language. In European Symposium on Programming, pages 294–308, 2006.
[15] P. F. Dietz and D. D. Sleator. Two algorithms for maintaining order in a list. In Proceedings of the 19th ACM Symposium on Theory of Computing, pages 365–372, 1987.
[16] M. Hammer and U. A. Acar. Memory management for self-adjusting computation. In International Symposium on Memory Management, pages 51–60, 2008.
[17] M. Hammer, U. A. Acar, M. Rajagopalan, and A. Ghuloum. A proposal for parallel self-adjusting computation. In DAMP '07: Declarative Aspects of Multicore Programming, 2007.
[18] M. Hammer, G. Neis, Y. Chen, and U. A. Acar. Self-adjusting stack machines. In ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2011.
[19] M. A. Hammer, U. A. Acar, and Y. Chen. CEAL: a C-based language for self-adjusting computation. In ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009.
[20] A. Heydon, R. Levin, and Y. Yu. Caching function calls using precise dependencies. In Programming Language Design and Implementation, pages 311–320, 2000.
[21] N. R. Krishnaswami and N. Benton. A semantic model for graphical user interfaces. In M. M. T. Chakravarty, Z. Hu, and O. Danvy, editors, ICFP, pages 45–57. ACM, 2011. ISBN 978-1-4503-0865-6.
[22] P. Levy. Call-by-push-value: A subsuming paradigm. In Typed Lambda Calculi and Applications, pages 644–644, 1999.
[23] R. Ley-Wild. Programmable Self-Adjusting Computation. PhD thesis, Computer Science Department, Carnegie Mellon University, Oct. 2010.
[24] R. Ley-Wild, U. A. Acar, and M. Fluet. A cost semantics for self-adjusting computation. In Proceedings of the 36th Annual ACM Symposium on Principles of Programming Languages, 2009.
[25] R. Ley-Wild, U. A. Acar, and G. E. Blelloch. Non-monotonic self-adjusting computation. In ESOP, pages 476–496, 2012.
[26] Y. A. Liu and T. Teitelbaum. Systematic derivation of incremental programs. Sci. Comput. Program., 24(1):1–39, 1995.
[27] J. McCarthy. A basis for a mathematical theory of computation. In P. Braffort and D. Hirschberg, editors, Computer Programming and Formal Systems, pages 33–70. North-Holland, Amsterdam, 1963.
[28] D. Michie. "Memo" functions and machine learning. Nature, 218:19–22, 1968.
[29] W. Pugh and T. Teitelbaum. Incremental computation via function caching. In Principles of Programming Languages, pages 315–328, 1989.
[30] G. Ramalingam and T. Reps. A categorized bibliography on incremental computation. In Principles of Programming Languages, pages 502–510, 1993.
[31] A. Shankar and R. Bodik. DITTO: Automatic incrementalization of data structure invariant checks (in Java). In Programming Language Design and Implementation, 2007.

A. From CBV into CBPV

This section presents a call-by-value (CBV) variant of λ_ic^cdd that shares a close correspondence to the CBPV variant presented in the main body of the paper. In particular, we present syntax, typing, and reduction rules in a CBV style, which are otherwise analogous to the CBPV presented in Section 3. We connect the CBV variant presented here to the CBPV variant of λ_ic^cdd via a type-directed translation. We show that this translation is preserved by the basic, non-incremental reduction semantics of both variants: if a CBV term M translates to a CBPV term e, and if M reduces to a value V, then e reduces to a CBPV terminal ẽ to which the value V translates.

CBV syntax. Unlike CBPV syntax, CBV syntax does not syntactically separate values and expressions; rather, values are a syntactic subclass of expressions. Second, CBV treats lambdas as values. Finally, since the CBV type syntax is missing an analogue to F_ℓ A, which carries a layer annotation in λ_ic^cdd, we instead affix this layer annotation to the type connectives: arrow (τ1 →^ℓ τ2), thunk (U^ℓ τ1), and reference cells (M^ℓ τ1). The other differences to the syntax and typing rules reflect these key distinctions:

CBV values V ::= () | λx.M | inj_i V | (V1, V2) | thunk M | a

CBV terms M ::= V | x | M1 M2 | let x ← M1 in M2 | fix f.M | inj_i M | case (M, x1.M1, x2.M2) | (M1, M2) | fst M | snd M | force M | inner M | ref M | get M | set M1 ← M2

CBV types τ ::= 1 | τ1 + τ2 | τ1 × τ2 | τ1 →^ℓ τ2 | U^ℓ τ | M^ℓ τ

CBV typing env. G ::= ε | G, x : τ | G, f : τ | G, a : M^ℓ τ

CBV store Ṡ ::= ε | Ṡ, a:M

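For readers who prefer code, the CBV term grammar above transcribes directly into an OCaml datatype; this transcription is ours and elides the layer annotations carried by types:

  type term =
    | Unit                                             (* ()            *)
    | Var of string                                    (* x             *)
    | Lam of string * term                             (* λx.M          *)
    | App of term * term                               (* M1 M2         *)
    | Let of string * term * term                      (* let x ← M1 in M2 *)
    | Fix of string * term                             (* fix f.M       *)
    | Inj of int * term                                (* inj_i M       *)
    | Case of term * (string * term) * (string * term) (* case (M, x1.M1, x2.M2) *)
    | Pair of term * term                              (* (M1, M2)      *)
    | Fst of term | Snd of term                        (* fst M, snd M  *)
    | Thunk of term | Force of term                    (* thunk M, force M *)
    | Inner of term                                    (* inner M       *)
    | Ref of term | Get of term | Set of term * term   (* ref, get, set *)
    | Addr of int                                      (* a             *)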
CBV typing. Figure 14 gives the judgement for typing CBV terms under a typing context G and layer ℓ.

CBV big-step semantics. Figure 15 gives the judgement for big-step evaluation of CBV terms under a store Ṡ.

CBV type-directed translation. Figure 16 gives the judgements for translating CBV types into corresponding CBPV types. Figure 17 gives the judgement for translating CBV terms into corresponding CBPV value terms. Figure 18 gives the judgement for translating CBV terms into corresponding CBPV computation terms. Figure 19 gives the judgement for translating CBV stores into corresponding CBPV stores.

Meta theory. We show several simple results:

Theorem A.1 (CBV subject reduction). Suppose that:
• G1 ⊢ Ṡ1
• G1 ⊢_ℓ M : τ
• Ṡ1 ⊢ M ⇓ Ṡ2; V
Then there exists G2 such that G1 ⊆ G2 and:
• G2 ⊢ Ṡ2
• G2 ⊢_ℓ V : τ

Theorem A.2 (CBV typing implies CBPV translation).
• If G ⊢_ℓ M : τ then G ⊢_ℓ M : τ ↝^comp (Γ ⊢ e : C).
• If G ⊢_ℓ V : τ then G ⊢ V : τ ↝^val (Γ ⊢ v : A).

Theorem A.3 (CBPV translation and CBV reduction commute). Suppose that:
• G ⊢ Ṡ1 ↝ Γ ⊢ S1
• G ⊢_ℓ M1 : τ ↝^comp (Γ ⊢ e : C)
• Ṡ1 ⊢ M1 ⇓ Ṡ2; V
Then there exist extended contexts Γ′ and G′, with G ⊆ G′ and Γ ⊆ Γ′, such that:
• G′ ⊢ Ṡ2 ↝ Γ′ ⊢ S2
• G′ ⊢_ℓ V : τ ↝^comp (Γ′ ⊢ ẽ : C)
• S1 ⊢ e ⇓^n S2; ẽ

G ⊢ Ṡ (Under G, the store Ṡ is well-typed)

CBVTYS-EMP: G ⊢ ε.
CBVTYS-ADDR: if G ⊢ Ṡ, G(a) = M^ℓ τ, and G ⊢_ℓ M : τ, then G ⊢ Ṡ, a:M.

G ⊢_ℓ M : τ (Under G at layer ℓ, the term M has type τ)

[Rules CBVTY-VAR, CBVTY-UNIT, CBVTY-ABS, CBVTY-APP, CBVTY-LET, CBVTY-FIX, CBVTY-INJ, CBVTY-CASE, CBVTY-THUNK, CBVTY-FORCE, CBVTY-PAIR, CBVTY-FST, CBVTY-SND, CBVTY-ADDR, CBVTY-REF, CBVTY-GET, and CBVTY-SET; the two-dimensional rule layout is not recoverable from this copy. Notably, lambdas, thunks, and reference cells carry layer annotations; CBVTY-REF and CBVTY-SET restrict ref and set to the outer layer; and CBVTY-INNER moves from the outer layer to the inner layer, with premise |G| ⊢_inner M : τ and conclusion G ⊢_outer inner M : τ.]

Figure 14: CBV typing semantics.

Ṡ1 ⊢ M ⇓ Ṡ2; V (Under Ṡ1, the term M reduces, yielding store Ṡ2 and value V)

[Rules CBVEVAL-VAL, CBVEVAL-APP, CBVEVAL-LET, CBVEVAL-FIX, CBVEVAL-INJ, CBVEVAL-CASE, CBVEVAL-PAIR, CBVEVAL-FST, CBVEVAL-SND, CBVEVAL-FORCE, CBVEVAL-INNER, CBVEVAL-REF, CBVEVAL-GET, and CBVEVAL-SET; the two-dimensional rule layout is not recoverable from this copy. Notably, CBVEVAL-REF allocates a fresh address a ∉ dom(Ṡ); CBVEVAL-GET evaluates the term stored at an address (Ṡ2(a) = M′ and Ṡ2 ⊢ M′ ⇓ Ṡ3; V); and CBVEVAL-SET evaluates M2 to thunk M′ and updates the store with a:M′, returning ().]

Figure 15: CBV big-step evaluation semantics.
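To make Figure 15 concrete, here is a toy interpreter for a fragment of the CBV language (unit, variables, lambdas, application, let, thunks, and references), following the store-passing style of the rules. It is a sketch under simplifying assumptions: substitution is not capture-avoiding, the store is an association list, fix/sums/pairs/inner are omitted, and all names are ours. It reuses the term datatype sketched above.

  let rec subst (x : string) (v : term) (m : term) : term =
    match m with
    | Unit | Addr _ -> m
    | Var y -> if y = x then v else m
    | Lam (y, b) -> if y = x then m else Lam (y, subst x v b)
    | App (m1, m2) -> App (subst x v m1, subst x v m2)
    | Let (y, m1, m2) ->
        Let (y, subst x v m1, if y = x then m2 else subst x v m2)
    | Thunk b -> Thunk (subst x v b)
    | Force b -> Force (subst x v b)
    | Ref b -> Ref (subst x v b)
    | Get b -> Get (subst x v b)
    | Set (m1, m2) -> Set (subst x v m1, subst x v m2)
    | _ -> failwith "term form not handled in this sketch"

  let rec eval (s : (int * term) list) (m : term) : (int * term) list * term =
    match m with
    | Unit | Lam _ | Thunk _ | Addr _ -> (s, m)       (* CBVEVAL-VAL: values *)
    | Var x -> failwith ("unbound variable: " ^ x)
    | App (m1, m2) ->                                  (* CBVEVAL-APP *)
        let s, f = eval s m1 in
        let s, a = eval s m2 in
        (match f with
         | Lam (x, body) -> eval s (subst x a body)
         | _ -> failwith "application of a non-function")
    | Let (x, m1, m2) ->                               (* CBVEVAL-LET *)
        let s, v = eval s m1 in
        eval s (subst x v m2)
    | Force m' ->                                      (* CBVEVAL-FORCE *)
        (match eval s m' with
         | s, Thunk body -> eval s body
         | _ -> failwith "force of a non-thunk")
    | Ref m' ->                                        (* CBVEVAL-REF *)
        let s, v = eval s m' in
        let a = List.length s in                       (* fresh address *)
        ((a, v) :: s, Addr a)
    | Get m' ->                                        (* CBVEVAL-GET *)
        (match eval s m' with
         | s, Addr a -> eval s (List.assoc a s)        (* re-evaluate contents *)
         | _ -> failwith "get of a non-address")
    | Set (m1, m2) ->                                  (* CBVEVAL-SET *)
        let s, a = eval s m1 in
        let s, t = eval s m2 in
        (match a, t with
         | Addr a, Thunk body -> ((a, body) :: s, Unit) (* newest binding wins *)
         | _ -> failwith "set needs an address and a thunk")
    | _ -> failwith "term form not handled in this sketch"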

G ctxt ↝ Γ (The CBV typing context G translates to the CBPV typing context Γ)

[Rules CBV-TRCTXT-EMP, CBV-TRCTXT-VARVAL, CBV-TRCTXT-VARFIX, and CBV-TRCTXT-ADDR: the empty context translates to the empty context; value variables translate at value type (τ ↝^val A); fix-bound variables translate at computation type (τ ↝^comp C); and addresses a : M^ℓ τ translate at type (C)^ℓ.]

τ ↝^val A (The CBV type τ translates to the CBPV value type A)

[Rules CBV-TRVALTY-UNIT, CBV-TRVALTY-THUNK, CBV-TRVALTY-REF, CBV-TRVALTY-SUM, CBV-TRVALTY-PROD, and CBV-TRVALTY-ARROW: 1 ↝^val 1; sums and products translate componentwise; U^ℓ τ ↝^val U (C)^ℓ and M^ℓ τ ↝^val M (C)^ℓ where τ ↝^comp C; and τ1 →^ℓ τ2 ↝^val U (C)^ℓ where τ1 →^ℓ τ2 ↝^comp (C)^ℓ.]

τ ↝^comp C (The CBV type τ translates to the CBPV computation type C)

[Rules CBV-TRCOMPTY-ARROW and CBV-TRCOMPTY-FREE: τ1 →^ℓ τ2 ↝^comp A → (C)^ℓ where τ1 ↝^val A and τ2 ↝^comp (C)^ℓ; otherwise τ ↝^comp F_ℓ A where τ ↝^val A.]

Figure 16: CBV types into CBPV types.
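The type translation of Figure 16 can likewise be transcribed as two mutually recursive functions; the sketch below is our simplified reading of the rules, in which layers are passed explicitly and all names are ours:

  type layer = Inner | Outer

  type cbv_ty =
    | TUnit
    | TSum of cbv_ty * cbv_ty
    | TProd of cbv_ty * cbv_ty
    | TArr of cbv_ty * layer * cbv_ty   (* τ1 →^ℓ τ2 *)
    | TThunk of layer * cbv_ty          (* U^ℓ τ *)
    | TRef of layer * cbv_ty            (* M^ℓ τ *)

  type val_ty =                         (* CBPV value types A *)
    | AUnit
    | ASum of val_ty * val_ty
    | AProd of val_ty * val_ty
    | AThunk of layer * comp_ty         (* U (C)^ℓ *)
    | ARef of layer * comp_ty           (* M (C)^ℓ *)
  and comp_ty =                         (* CBPV computation types C *)
    | CArr of val_ty * comp_ty          (* A → C *)
    | CF of layer * val_ty              (* F_ℓ A *)

  let rec tr_val (t : cbv_ty) : val_ty =
    match t with
    | TUnit -> AUnit
    | TSum (t1, t2) -> ASum (tr_val t1, tr_val t2)
    | TProd (t1, t2) -> AProd (tr_val t1, tr_val t2)
    | TArr (_, l, _) as arr -> AThunk (l, tr_comp l arr)   (* CBV-TRVALTY-ARROW *)
    | TThunk (l, t') -> AThunk (l, tr_comp l t')           (* CBV-TRVALTY-THUNK *)
    | TRef (l, t') -> ARef (l, tr_comp l t')               (* CBV-TRVALTY-REF *)

  and tr_comp (l : layer) (t : cbv_ty) : comp_ty =
    match t with
    | TArr (t1, l', t2) -> CArr (tr_val t1, tr_comp l' t2) (* CBV-TRCOMPTY-ARROW *)
    | _ -> CF (l, tr_val t)                                 (* CBV-TRCOMPTY-FREE *)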

G ⊢ M : τ ↝^val (Γ ⊢ v : A) (Under G, the CBV term M translates to the CBPV value term v)

[Rules CBV-TRVALTM-VARVAL, CBV-TRVALTM-UNIT, CBV-TRVALTM-INJ, CBV-TRVALTM-PAIR, CBV-TRVALTM-ABS, CBV-TRVALTM-THUNK, and CBV-TRVALTM-ADDR; the two-dimensional rule layout is not recoverable from this copy. Variables, unit, injections, and pairs translate componentwise; λx.M translates to thunk λx.e, of type U (A → C), where M translates to the computation e; thunk M translates to thunk e; and addresses translate to addresses.]

Figure 17: CBV terms as CBPV value terms.

G ⊢_ℓ M : τ ↝^comp (Γ ⊢ e : C) (Under G and ℓ, the CBV term M translates to the CBPV computation term e)

[Rules CBV-TRCOMPTM-RET, CBV-TRCOMPTM-VARFIX, CBV-TRCOMPTM-ABS, CBV-TRCOMPTM-APP, CBV-TRCOMPTM-LET, CBV-TRCOMPTM-FIX, CBV-TRCOMPTM-INJ, CBV-TRCOMPTM-CASE, CBV-TRCOMPTM-PAIR, CBV-TRCOMPTM-FST, CBV-TRCOMPTM-SND, CBV-TRCOMPTM-THUNK, CBV-TRCOMPTM-FORCE, CBV-TRCOMPTM-REF, CBV-TRCOMPTM-GET, and CBV-TRCOMPTM-SET; the two-dimensional rule layout is not recoverable from this copy. CBV values translate via ret: if V ↝^val v then V ↝^comp ret v : F_ℓ A. Elimination forms sequence their subterms with let; for example, CBV-TRCOMPTM-APP translates M1 M2 to let x ← e2 in e1 x, and CBV-TRCOMPTM-FORCE translates force M to let x ← e in force_ℓ x. CBV-TRCOMPTM-REF and CBV-TRCOMPTM-SET translate at the outer layer, producing computations of type F_outer.]

Figure 18: CBV terms as CBPV computation terms.
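For intuition, the term translation can be rendered for a small fragment. The sketch below follows the standard call-by-value reading of CBPV in Levy's sense [22], which Figure 18 refines with layer annotations; the AST constructor names and gensym are ours:

  type cbv =
    | Var of string
    | Lam of string * cbv
    | App of cbv * cbv

  type v =                              (* CBPV value terms *)
    | VVar of string
    | VThunk of c
  and c =                               (* CBPV computation terms *)
    | CLam of string * c                (* λx.e *)
    | CApp of c * v                     (* e v *)
    | CForce of v                       (* force v *)
    | CRet of v                         (* ret v *)
    | CLet of string * c * c            (* let x ← e1 in e2 *)

  let gensym =
    let n = ref 0 in
    fun () -> incr n; Printf.sprintf "_x%d" !n

  (* every CBV term becomes a CBPV computation; lambdas become thunked
     computations returned at the value level *)
  let rec comp (m : cbv) : c =
    match m with
    | Var x -> CRet (VVar x)
    | Lam (x, body) -> CRet (VThunk (CLam (x, comp body)))
    | App (m1, m2) ->
        let f = gensym () and a = gensym () in
        (* let f ← [[m1]] in let a ← [[m2]] in (force f) a *)
        CLet (f, comp m1, CLet (a, comp m2, CApp (CForce (VVar f), VVar a)))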

G ⊢ Ṡ ↝ Γ ⊢ S (Under G, the CBV store Ṡ translates to the CBPV store S)

CBV-TRSTORE-EMP: G ⊢ ε ↝ Γ ⊢ ε.
CBV-TRSTORE-CONS: if G ⊢ Ṡ ↝ Γ ⊢ S, G(a) = M^ℓ τ, and G ⊢_ℓ M : τ ↝^comp (Γ ⊢ e : C), then G ⊢ Ṡ, a:M ↝ Γ ⊢ S, a:e.

Figure 19: CBV stores as CBPV stores.