Reactive Async: Expressive Deterministic Concurrency

Philipp Haller, Simon Geries (KTH Royal Institute of Technology, Sweden) [email protected]
Michael Eichberg, Guido Salvaneschi (TU Darmstadt, Germany) {eichberg, salvaneschi}@cs.tu-darmstadt.de

Abstract

Concurrent programming is infamous for its difficulty. An important source of difficulty is non-determinism, stemming from unpredictable interleavings of concurrent activities. Futures and promises are widely-used abstractions that help designing deterministic concurrent programs, although this property cannot be guaranteed statically in mainstream programming languages. Deterministic-by-construction concurrent programming models avoid this issue, but they typically restrict expressiveness in important ways.

This paper introduces a concurrent programming model, Reactive Async, which decouples concurrent computations using so-called cells, shared locations which generalize futures as well as recent deterministic abstractions such as LVars. Compared to previously proposed programming models, Reactive Async provides (a) a fallback mechanism for the case where no computation ever computes the value of a given cell, and (b) explicit and optimized handling of cyclic dependencies. We present a complete implementation of the Reactive Async programming model as a library in Scala. Finally, the paper reports on a case study applying Reactive Async to static analyses of JVM bytecode based on the Opal framework.

Categories and Subject Descriptors D.3.2 [Programming Languages]: Language Classifications – Concurrent, distributed, and parallel languages

Keywords asynchronous programming, concurrent programming, deterministic concurrency, static analysis, Scala

1. Introduction

Developing correct concurrent programs is a well-known challenge. Concurrency can create race conditions that may corrupt the state of the program. Also, concurrent programs are hard to reason about, because multiple flows of execution can generate non-deterministic results due to their unpredictable interleaving. Traditionally, developers address these problems by protecting state from concurrent access via synchronisation. Yet, concurrent programming remains an art: insufficient synchronisation leads to unsound programs, but synchronising too much does not exploit hardware capabilities effectively for parallel execution.

Over the years researchers have proposed concurrency models that attempt to overcome these issues. For example, the actor model [13] encapsulates state into actors which communicate via asynchronous messages – a solution that avoids shared state and hence (low-level) race conditions. Yet, the outcome of the program is potentially non-deterministic because it depends on how actor execution, message sending, and message receiving interleave. Such issues are shared by a number of popular concurrency models, including STM [12] and futures/promises [17].

Recently, researchers designed concurrency models, like LVars [15], FlowPools [23] and Isolation Types [6], that result in deterministic executions. The idea behind these models is to restrict expressivity to capture only computations that can be defined in a deterministic way.

However, the limitations imposed by the deterministic concurrency models proposed so far are severe. They push to the programmer the burden of merging the results of concurrent computations, they do not support cyclic dependencies – which are common in a number of fixed-point algorithms, e.g., in static analysis – and they do not support dynamic dependencies among concurrent computations. Adding and removing dependencies dynamically is critical for certain applications where concurrent abstractions are created on a very large scale and must be removed when they are no longer required.

In this paper, we present Reactive Async, a programming model that allows developers to write deterministic concurrent code in a composable way, supports dynamic cyclic dependencies, and supports a fallback mechanism to resolve computations that have neither reached final values nor have pending dependencies. Our evaluation shows that Reactive Async scales well on multicore architectures, is efficient, and can express complex computations that are not supported by existing deterministic concurrency models in real-world scenarios.

This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version was published in SCALA'16, October 30–31, 2016, Amsterdam, Netherlands. (c) 2016 ACM. 978-1-4503-4648-1/16/10... http://dx.doi.org/10.1145/2998392.2998396

type ConcurrentSet[T] = TrieMap[T, Int]

def traverse(v: Vertex,
    userIDs: ConcurrentSet[Int]): Unit = {
  if (isInteresting(v))
    userIDs.put(v.userID, 0)
  for {n <- v.neighbors; if (!visited(n))} {
    Future {
      traverse(n, userIDs)
    }
  }
}

Figure 1. Concurrent traversal and collection of IDs

 1  val selectedUsers: List[Int] = ...
 2  val entry: Vertex = ...
 3  val userIDs: ConcurrentSet[Int] =
 4    TrieMap.empty[Int, Int]
 5  // traverse graph starting from ‘entry‘
 6  // collect users in ‘userIDs‘
 7  traverse(entry, userIDs)
 8  // analyze selected users
 9  for (user <- selectedUsers) {
10    val isInteresting = userIDs.contains(user)
11    ...
12  }

Figure 2. Traversal, collection, and analysis of social graph

In summary, the contributions of this paper are:

• We present Reactive Async, a deterministic concurrency model that supports cyclic dependencies, dynamic dependencies, and fallbacks.

• We identify patterns from real-world large-scale use cases where existing approaches fall short. We show that Reactive Async can capture such complex computations significantly more concisely than the original implementation.

• We provide an efficient implementation of Reactive Async in Scala (see https://github.com/phaller/reactive-async) that in our benchmarks exhibits scalability comparable to the state-of-the-art Scala futures [11]. In real-world case studies, the performance of Reactive Async achieves a speedup between 1.75x and 10x, and scalability over multicore architectures is improved.

The paper is structured as follows. Section 2 provides an overview of our programming model. Section 3 describes relevant implementation details. Section 4 presents our case studies. Section 5 provides a performance evaluation. Section 6 reviews related work, and Section 7 concludes.

2. Overview

In this section, we introduce an example of concurrent programming, show the issues that arise using traditional solutions, and explain how Reactive Async can address them. Next, we present in detail the features of Reactive Async.

2.1 Motivating Example

Consider the example of a social network modeled as a graph where vertices represent users of the social network and edges represent connections between users (e.g., a "follows" relation). Suppose an application traverses a given social graph and collects the IDs of users satisfying a predicate ("interesting" users) in a set. In order to leverage multicores, the traversal and collection should be performed concurrently. Therefore, user IDs should be collected in a thread-safe (concurrent) set. Figure 1 shows a simple concurrent breadth-first graph traversal for such a purpose. Note that we use a concurrent TrieMap [22] as the implementation of a concurrent set, where elements of the set are represented as keys of the TrieMap. Now, suppose that in addition to collecting interesting users, we would also like to analyze selected users in more detail, as shown in Figure 2. This approach suffers from two major issues:

1. Since the graph traversal is concurrent (spawning futures for the traversal of neighbors), the invocation of contains (line 10 in Figure 2) returns a non-deterministic result. A selected user may be "interesting", but the concurrent traversal may – at the point when contains is invoked – not have added the corresponding ID to the userIDs set.

2. Furthermore, without examining the implementation details of the traverse function, it is not clear whether the concurrent traversal yields a deterministic outcome in the first place (e.g., if user IDs are not only added to the userIDs set but may also be removed in certain cases).

These issues call for a programming model that leads to a deterministic result while still preserving the efficiency of concurrent computation.

2.2 Reactive Async

Using Reactive Async, both issues can be addressed in a straightforward way. First, instead of using a concurrent set, user IDs are maintained in a so-called cell. Computations performed on cells are made deterministic by requiring cells to contain values taken from a lattice. This enables concurrent, monotonic updates of the value of a cell while preserving the determinism of the (concurrent) computation. Cell updates are performed using cell completers.

Figure 3 shows the code of the traverse function, but this time using a cell completer for updating userIDs. In order for this cell completer to be valid, we must guarantee that the elements of type Set[Int] form a lattice. This is done by providing the (implicit) instance IntSetLattice of the type class Lattice for type Set[Int] (lines 1–6). Line 11 shows a monotonic update with Set(v.userID) which updates the value of userIDs with the join of the cell's current value and Set(v.userID), effectively adding v.userID to the set of collected users.
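The Lattice type class itself is only exercised, not defined, in Figure 3; for illustration, a minimal shape that is consistent with the IntSetLattice instance used there could look as follows (a sketch only – the trait in the Reactive Async library may differ in details such as an additional key type parameter):

trait Lattice[V] {
  def empty: V                    // bottom element; the initial value of a cell
  def join(left: V, right: V): V  // least upper bound of two values
}

Because join is associative, commutative, and idempotent, the order in which concurrent monotonic updates are applied does not affect the final value of a cell.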

implicit object IntSetLattice
    extends Lattice[Set[Int]] {
  val empty = Set()
  def join(left: Set[Int], right: Set[Int]) =
    left ++ right
}

def traverse(v: Vertex,
    userIDs: CellCompleter[Set[Int]]): Unit = {
  if (isInteresting(v))
    userIDs.putNext(Set(v.userID))
  for {n <- v.neighbors; if (!visited(n))} {
    Future {
      traverse(n, userIDs)
    }
  }
}

Figure 3. Concurrent traversal and collection of IDs using a cell completer

val selectedUsers: List[Int] = ...
val entry: Vertex = ...
val pool = new HandlerPool
val userIDs = CellCompleter[Set[Int]](pool)
// traverse graph starting from ‘entry‘
// collect users in ‘userIDs‘
traverse(entry, userIDs)
pool.onQuiescent {
  // analyze selected users
  for (user <- selectedUsers) {
    val isInteresting = userIDs.contains(user)
    ...
  }
}

Figure 4. Traversal, collection, and analysis of social graph using quiescence

Importantly, during a concurrent traversal, multiple monotonic updates (like the one on line 11) would be applied concurrently. However, the final value of the userIDs cell would be unaffected by the order of these updates.

Finally, in order to safely perform further analyses of interesting users in a deterministic way (effectively avoiding a data race), Reactive Async supports the concept of quiescence. Essentially, a concurrent computation on cells reaches quiescence whenever it is guaranteed that none of the cells are updated anymore. Therefore, the values of cells are stable as soon as quiescence is reached, and read accesses return deterministic results. To revisit the example, Figure 4 shows how to make the previously non-deterministic analysis deterministic. Here, the analysis is only performed when the traversal has reached quiescence. This is done by registering an onQuiescent handler on line 8. Therefore, the result returned by the invocation of contains on line 11 is now deterministic, since the userIDs cell is guaranteed not to be updated any more by the traversal.

Note that it would always be possible, of course, to utilize other concurrency abstractions in Scala to ensure the termination of the graph traversal (such as futures [11]). However, in this case the programmer would be on her own to ensure that the concurrent program is deterministic (or at least data-race free). The goal of Reactive Async is to provide a single programming model that integrates mechanisms such as quiescence in a way that allows writing concurrent programs that are deterministic by construction. In the following we dive into more details of the programming model.

Core Abstractions   As mentioned earlier, the programming model of Reactive Async is based on two core abstractions: cells and cell completers. The two abstractions are related in a way similar to the way Scala's futures and promises [11] are related. To recall, a Scala future of type Future[V] may asynchronously be completed successfully with a value of type V; or it may asynchronously be "failed" with an exception. A Scala promise of type Promise[V] allows completing its associated future at most once. An attempt to complete an already completed promise throws an exception.

Similar to a future of type Future[V], a cell of type Cell[K, V] may asynchronously be completed with a value of type V – the K type parameter is explained below. Cell[K, V] provides a read-only interface to access its value(s). Similar to a promise of type Promise[V], a cell completer of type CellCompleter[K, V] allows completing its associated cell. There are two main differences between futures/promises and cells:

• Cell completers allow multiple monotonic writes. The written values must be elements of a lattice. In contrast, promises allow at most one write.

• Cells allow resolving cyclic dependencies. In contrast, dependencies between futures must be acyclic in order to avoid deadlocks.

The distinction between cells and cell completers is motivated by the same software engineering considerations behind the futures/promises model. With this distinction, libraries can manipulate cell completers, hiding state updates from the client while the concurrent computation is in progress. In contrast, the client of the library receives a cell that can only be used to access the final result as soon as it is available.

2.3 Declarative Dependencies

Reactive Async supports the declaration of dependencies between cells, such that a monotonic update of one cell triggers a monotonic update of another cell. In addition, triggered updates are conditional on the update of the original cell.

For a simple example, consider two cells c1 and c2 holding integer values taken from the lattice of integers where the join operator computes the maximum of the two given integers (i.e., join(x, y) = max(x, y)). Suppose we would like to express the fact that whenever c1 is (monotonically) updated with a value v which is a multiple of 10, c2 should be updated with v; else, c2 should not be updated. This means that if c1 is updated sequentially with values 1, 5, 10, 11, 21, 30, 35, 40, then c2 is updated with values 10, 30, 40.
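The integer lattice used in this example is not spelled out in the paper; under the Lattice shape sketched earlier, a plausible instance would be:

implicit object IntMaxLattice extends Lattice[Int] {
  val empty = 0                              // assumes non-negative values, as in the example
  def join(x: Int, y: Int) = math.max(x, y)  // join is the maximum of the two values
}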

This dependency can be expressed declaratively as follows:

c2.whenNext(c1,
  x => if (x % 10 == 0) WhenNext else FalsePred,
  None)

The above snippet registers a dependency of c2 on c1. Whenever c1 receives an update with value v (via c1.putNext(v)), it first applies the function which is the second argument of whenNext to v to decide whether c2 should be updated as well, and, if so, how; the function has type Int => WhenNextPredicate. The WhenNextPredicate type is defined as follows:

sealed trait WhenNextPredicate
case object WhenNext extends WhenNextPredicate
case object WhenNextComplete extends WhenNextPredicate
case object FalsePred extends WhenNextPredicate

A WhenNext result means that the dependent cell should receive an update (with the semantics of putNext); the update value is either the value received by the upstream cell if the third argument of whenNext is None, or w if the third argument of whenNext is Some(w). A WhenNextComplete result means that the dependent cell should receive an update, and this update should be final. Final completions are discussed in detail below. A FalsePred result means that the dependent cell should not receive an update.

In the above example, the second and third arguments of whenNext express the following update logic: c2 is only updated if c1 is updated with a multiple of 10, and in this case c2 is updated with the same value as c1.

2.4 Cyclic Dependencies

Using whenNext it is possible to create cyclic dependencies between cells. For a real-world example, consider the static analysis of JVM bytecode, and more specifically immutability analysis (this case study is discussed in more detail in Section 4). Roughly, immutability analysis attempts to answer the question whether a given class declaration defines a class whose instances can never be mutated after construction. The analysis can be implemented using Reactive Async by creating a cell for each analyzed class. The cell holds values taken from an "immutability lattice" with elements ⊥ (the initial value of the cell), ConditionallyImmutable, Mutable, and Immutable. The value ConditionallyImmutable expresses the fact that the corresponding class C depends on potentially mutable types (e.g., generic class List[T]) while C itself does not introduce mutability (e.g., reassignable fields). The introduction of dependencies is straightforward: the cell of a class C depends on the cell of a class D if C "uses" D in some way, such as by declaring a field of type D (the precise details of a real-world immutability analysis are irrelevant and therefore omitted; Section 4.2 discusses more details). The general approach taken by our immutability analysis is to assume that a class is immutable unless the analysis finds contradicting evidence.

class X {
  private final Y y;
  X() {
    y = new Y(this);
  }
}
class Y {
  private final X x;
  Y(X x) {
    this.x = x;
  }
}

Figure 5. Immutability analysis with cyclic dependencies

The example in Figure 5 shows how to produce cyclic dependencies in this context. Note that the code shown in Figure 5 is essentially a Java rendition of the equivalent JVM bytecode; in particular, scalac rejects the naive Scala equivalent. (Patterns such as the one shown here are relevant in practice, given occurrences in widely-used software like JDK version 8.) For the shown example, our analysis (a) assigns value ConditionallyImmutable to each cell, given that each class does not introduce mutable elements itself, and (b) registers the following two dependencies:

cellForX.whenNext(cellForY,
  x => if (x == Mutable) WhenNext else FalsePred,
  Some(Mutable))
...
cellForY.whenNext(cellForX,
  x => if (x == Mutable) WhenNext else FalsePred,
  Some(Mutable))

Clearly, the whenNext invocations create a dependency cycle. As a result, the immutability analysis so far fails to determine the correct result value, namely, Immutable, for the cells.

Dependency Resolution   Reactive Async supports the resolution of cyclic dependencies as follows.

First, the runtime system identifies points in the program execution when it can detect dependencies between cells which can never be updated again. A state in the program execution that is suitable for this is a state where it is guaranteed that none of the cells will be updated again. Clearly, this is the case when all threads capable of performing updates are idle and there are no outstanding tasks (performing updates) to be executed by these threads. When a thread pool reaches this state, it is said to be quiescent. Thus, to be able to detect quiescence of all update-performing threads, it is sufficient to ensure that all cell updates are performed by tasks executed on a thread pool. In turn, when this thread pool becomes quiescent, it is guaranteed that none of the cells will be updated again, unless a new task is submitted to the thread pool from a non-pool thread.

trait Key[V] {
  def resolve[K <: Key[V]](cells: Seq[Cell[K, V]]):
    Seq[(Cell[K, V], V)]
  def fallback[K <: Key[V]](cells: Seq[Cell[K, V]]):
    Seq[(Cell[K, V], V)]
}

Figure 6. The Key trait for specifying resolution strategies

sealed abstract class Try[+T] { ... }
final case class Failure[+T](exception: Throwable)
  extends Try[T] { ... }
final case class Success[+T](value: T)
  extends Try[T] { ... }

Figure 7. Type Try[T]

Second, we need to distinguish between final and non-final values of cells. In the above example, each cell initially receives value ConditionallyImmutable. If this value would not be expected to change any more, then cycle resolution would not be necessary, since ConditionallyImmutable would be a satisfactory final result value. However, in the case of the analysis, a value should only be considered "final" if no other value would be allowable (otherwise, the analysis could return incorrect results, as explained above). Clearly, in the above example, another value, namely Immutable, is allowable, so the ConditionallyImmutable values should not be considered final. Non-final cell values then enable Reactive Async to detect cycles: cells with non-final values need an extra resolution step when the underlying thread pool reaches quiescence, since there are no outstanding updates that would complete such cells with a final value. Thus, when the thread pool becomes quiescent, all cells with non-final values are checked to see if they form part of a dependency cycle. Dependency cycles, more precisely closed strongly-connected components (cSCCs; a closed SCC is an SCC where each node belonging to it is only connected with nodes in the SCC, i.e., there are no outgoing dependencies), are determined, and then a custom resolution strategy is applied to all cells of a cSCC. Resolution strategies are specified using keys, another abstraction that, like lattices, forms part of the "configuration" of a cell.

Keys   A key specifies the resolution strategy that is to be used for a group of cells. Essentially, a concrete key implements methods for resolving cyclic dependencies. Figure 6 shows the Key trait with its two methods for specifying resolution strategies. The first method, resolve, takes an entire cSCC as an argument (as a Scala sequence), and returns a sequence of pairs where each pair contains a cell as well as the value with which the cell should be resolved. (The actual resolution is done by the runtime system, which schedules appropriate cell updates.)

Fallbacks   Interestingly, the mechanism used by Reactive Async to detect unresolved, cyclic dependencies enables fallbacks, which increase the expressiveness of the programming model further. In the case where, in a state of quiescence, there are cells without final values and without dependencies on other cells, dependency resolution as explained above is not applicable. To handle also this case, keys allow specifying fallback values for cells affected in such a way. A key specifies fallback values by implementing the fallback method shown in Figure 6. The method receives all cells that have neither final values nor dependencies, and returns a sequence of pairs where each pair contains a cell and a value with which the cell should be resolved. This may subsequently lead to the completion of those cells which have outgoing dependencies, but which do not take part in cyclic dependencies.
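Concrete keys are deferred to the case studies (Section 4). Purely for illustration, a hypothetical default key that resolves every cell of a cycle with its current intermediate value and falls back to the lattice's bottom element might be sketched as follows; the getResult accessor and the constructor shape are assumptions, not part of the API shown in Figure 6:

class DefaultKey[V](implicit lattice: Lattice[V]) extends Key[V] {
  // Cycle resolution: complete each cell of the cSCC with its current value.
  def resolve[K <: Key[V]](cells: Seq[Cell[K, V]]): Seq[(Cell[K, V], V)] =
    cells.map(c => (c, c.getResult()))   // getResult() is assumed to read the intermediate value
  // Fallback: cells with neither final values nor dependencies get the bottom element.
  def fallback[K <: Key[V]](cells: Seq[Cell[K, V]]): Seq[(Cell[K, V], V)] =
    cells.map(c => (c, lattice.empty))
}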
3. Implementation

The implementation of Reactive Async consists of two main components: the cell and the handler pool. The former implements the Cell and CellCompleter traits presented in Section 2. The latter implements (a) concurrent task execution, (b) quiescence detection, and (c) dependency resolution.

Cell   Similar to a promise, a cell can be in one of two states: (a) completed with a final result, or (b) not completed. The state of a completed cell of type Cell[K,V] is simply a value of type Try[V] (part of Scala's standard library in package scala.util), shown in Figure 7. A value of type Failure[V] is reserved for the case where a cell is completed using an expression which throws an exception. A value of type Success[V] indicates a successful completion with a value of type V.

A cell that has not been completed maintains an intermediate result as well as a set of dependencies. For a cell of type Cell[K,V], the intermediate result is an element of a lattice implemented by a type class instance of type Lattice[V]. There are two kinds of dependencies: onNext dependencies and onComplete dependencies.

A cell c1 with an onNext dependency on a cell c2 may be updated whenever c2 is updated using putNext (onComplete dependencies are analogous). Recall the example from Section 2.3 (note that c1 and c2 are swapped):

c1.whenNext(c2,
  x => if (x % 10 == 0) WhenNext else FalsePred,
  None)

The above whenNext invocation dynamically introduces an onNext dependency of cell c1 on cell c2. Whenever c2 receives a monotonic update, this dependency triggers an evaluation of the predicate provided in the whenNext invocation, by submitting a corresponding callback for execution on the associated handler pool. c1's intermediate result is then updated according to the evaluation result (WhenNext or FalsePred). To keep the overhead of triggering such an update low, the corresponding callbacks are maintained as task objects as part of their dependencies.
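As a usage-level summary of the write operations referred to throughout the paper, the following sketch contrasts monotonic and final writes; the setup mirrors Figure 4, while the cell accessor name is an assumption:

val pool = new HandlerPool
val completer = CellCompleter[Set[Int]](pool)

completer.putNext(Set(1))      // intermediate result becomes join(empty, Set(1)) = Set(1)
completer.putNext(Set(2))      // intermediate result becomes Set(1, 2); the order of putNext calls does not matter
completer.putFinal(Set(1, 2))  // completes the cell; afterwards no further updates are possible

val cell = completer.cell      // read-only Cell handed to clients (accessor name assumed)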

It is important to delete dependencies as soon as they are no longer needed. This is particularly important for cases where cells are kept accessible for longer periods of time. For example, in our case studies (see Section 4) cells are used to represent code entities for the purpose of static analysis. Here, code entities and their cells live "forever." As a result, it is crucial to free up objects implementing dependencies as early as possible to reduce the risk of out-of-memory errors.

The key insight for effectively removing unneeded dependencies is the fact that when a cell is completed with a final result (e.g., using putFinal), all its onNext and onComplete dependencies can be removed, since further updates are no longer possible. However, it is important that this removal is efficient, since each invocation of putFinal triggers a removal. This is done by having each dependency maintain references to both the source and the target cell, as well as by storing callbacks in hash tables indexed by cell.

Handler Pool   The second main component of the implementation of Reactive Async is the handler pool. A handler pool is a thread pool with extensions for quiescence detection and dependency resolution. A handler pool allows registering callbacks which are executed asynchronously whenever the handler pool reaches quiescence:

def onQuiescent(handler: () => Unit): Unit

Quiescence detection is implemented using an atomically-updated reference to an instance of the following class:

class PoolState(
  val handlers: List[() => Unit] = List(),
  val submittedTasks: Int = 0)

Note that PoolState is not a case class, since an atomic reference of type AtomicReference[PoolState] requires PoolState to implement by-reference equality.

Cyclic dependencies are resolved when the handler pool is quiescent. Dependency resolution first determines all closed strongly-connected components (cSCCs) of incomplete cells. For each cSCC, the resolve method of the corresponding Key (see Figure 6) is invoked to determine the values that the affected cells are completed with.

4. Case Studies

Our case studies aim at evaluating the following research questions:

• Is our approach sufficiently expressive to implement complex computations that require managing cyclic dependencies?

• Is the performance of our approach competitive compared to a hand-written implementation that achieves the same expressivity at a lower level of abstraction?

We performed two case studies to evaluate and demonstrate the benefits of using Reactive Async for the implementation of static analyses. The first case study is the implementation of a basic analysis for identifying pure methods (Purity Analysis). The second one determines the mutability of instances of some class (Immutability Analysis). Both are implemented using the static analysis framework OPAL [9] (see http://www.opal-project.de) and both have functionally equivalent implementations which use OPAL's Fixed-Point Computations Framework (called FPCF in the following). FPCF provides an API that offers features similar to Reactive Async, but which is built around traditional locks and monitors. Furthermore, FPCF ensures that all user-defined computations w.r.t. a specific value – captured by a cell in Reactive Async – are never executed in parallel. For example, if – in case of Reactive Async – the value of a specific cell depends on the values of multiple other cells, then the user predicates which are registered using whenComplete may be executed in parallel. In case of FPCF, all computations related to a value are executed sequentially.

For the evaluation we always analyzed the complete Oracle JDK 8 update 45, on a Mac Pro running Mac OS X (10.11.6) with a 3 GHz, 8-core (16 hyperthreads) Intel Xeon E5 and 32 GB RAM; the JVM was given 24 GB of heap memory. The performance data is visualized using box plots where the whiskers represent the min/max values (Figures 8 and 9).

4.1 Purity Analysis

Here, we consider those methods as being pure that – during one instantiation of the program – always return the same value given the same input values. The implemented analysis rates those methods as being pure which only perform basic arithmetic operations and non-virtual method calls of pure methods. If the method performs any of the following operations, then the method is considered to be impure: array-related operations, virtual method calls, calls of native methods, synchronization, or instance-based field accesses. Hence, this analysis is not a full-blown purity analysis (e.g., as presented in [10, 24]), but it captures the essential part: a method's purity depends on the purity of other methods, even if the methods are in a cyclic calling dependency.

The first step of the analysis is to associate each method with one cell (completer) to store the purity information. After that, the analysis runs in parallel for all methods and tries to determine for each method whether it is pure or impure by analyzing the method's body. For example, if the analysis only sees arithmetic and control-flow instructions – such as in Math.abs(Int,Int) – it immediately completes the method's cell using putFinal(Pure). If the analysis, on the other hand, sees, e.g., an array access, it instead completes the method's cell using putFinal(Impure).
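The analysis code itself is not listed in the paper; a minimal sketch of the per-method setup described above, with the lattice ordering and all names beyond Pure and Impure being assumptions, could look like this:

// Hypothetical purity lattice: the bottom element is the "don't know yet" state.
sealed trait Purity
case object UnknownPurity extends Purity
case object Pure extends Purity
case object Impure extends Purity

implicit object PurityLattice extends Lattice[Purity] {
  val empty: Purity = UnknownPurity
  def join(a: Purity, b: Purity): Purity =
    if (a == Impure || b == Impure) Impure   // impurity dominates (assumed ordering)
    else if (a == Pure || b == Pure) Pure
    else UnknownPurity
}

// One cell completer per analyzed method (Method stands for OPAL's method representation).
def purityCells(methods: Seq[Method], pool: HandlerPool): Map[Method, CellCompleter[Purity]] =
  methods.map(m => m -> CellCompleter[Purity](pool)).toMap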

In case the analysis encounters a non-virtual method call, the analysis may still be able to subsequently determine that the method is impure, but it can no longer determine whether it is pure without taking the purity of the called method into consideration. To do so, a handler is registered using the called method's cell completer:

callerCell.whenComplete(targetCell,
  _ == Impure,
  Some(Impure)) // complete this cell using Impure

This handler will then complete the cell of the calling method correspondingly if the called method's cell is completed with Impure; otherwise it does nothing. Hence, for methods which call other methods, the analysis will never immediately complete the calling method's cell. Instead, it will create a dependency on the target method's cell. This design, which immediately propagates impurity, ensures that we are guaranteed to reach quiescence. Later, when all identified computations have finished and the state of quiescence is reached, the framework will then use the fallback value to complete the cells which are in a cyclic dependency or which have no more dependencies.

Overall, roughly 29% of the lines of the FPCF-based implementation are directly related to the interaction with the FPCF framework. In the case of Reactive Async, 9 out of 113 lines of code (about 8%) are related to querying and completing (other) cells.

Figure 8 shows the time needed to analyze all methods of the JDK (270,453 methods) using different numbers of threads. Using more threads initially improves the performance, but in both cases using more than 16 threads only decreases it (not shown). Overall, when comparing the best settings of the two implementations, Reactive Async is about 1.75 times faster than FPCF.

Figure 8. Runtime of the purity analysis for the entire JDK 8 given a fixed number of threads.

4.2 Immutability Analysis

This analysis determines the immutability of instances of a class by analyzing the immutability of its instance fields. The analysis distinguishes the three cases explained next. Immutable means that the instances of the class as well as all directly and indirectly referenced objects are immutable. Conditionally immutable is used for those objects whose fields are final, but where the transitive closure might reference objects that can be updated; typically, immutable collection classes are only conditionally immutable. All classes that are neither immutable nor conditionally immutable are mutable.

As in the previous case, this analysis is also counter-example driven. If the class defines non-final fields, then the class is considered to be mutable and the analysis of the class finishes. If all fields are final, the fields' types are further analyzed. In case of fields with primitive types, the respective fields are treated as immutable. In case of a reference-typed field, the declared type's immutability is used as the field's immutability. If any field is considered as being mutable, then the class is mutable. Similarly, if a field is only conditionally immutable, the class is also conditionally immutable. Naturally, the immutability of a type – as required when analyzing the types of fields – is always the join of the immutability ratings of the objects of all subtypes. Additionally, the immutability of instances of a class is constrained by the immutability of the instances of the superclass. E.g., if the instances of a superclass are mutable, so are the instances of all its subclasses.

The implementation of the analysis creates two cells (completers) for each class: one to store the information about the immutability of the objects of the class (OI) and one to store the immutability of the type (TI), which abstracts over the immutability of all subtypes. After the creation of all cells, the analysis iterates over all class files in parallel and tries to determine the immutability of each class's instances by analyzing the fields as described. If a non-final instance field is found, the OI cell is immediately completed using mutable; in case of a reference-typed field, a callback is registered with the corresponding cell, which sets the cell to mutable or conditionally immutable accordingly. To complete a TI cell, a callback is registered with the cells of all subtypes, unless the type is a leaf type; in that case the TI cell is completed immediately. Hence, as in the previous case, the analysis itself is very simple, but the complete infrastructure of the previous example is used. Compared to the previous analysis, however, the number of dependencies is very high.

The Reactive Async based implementation is basically one analysis which is about 294 lines long; the FPCF-based implementation is split into two analyses and is 253 + 171 = 424 lines long. In both cases basically all code directly interacts with the respective framework, since the analyses are just simple loops over all fields. As can be seen in Figure 9, Reactive Async is about 10x faster than FPCF, which is due to the high amount of locking that needs to be done by FPCF.

Figure 9. Runtime of the immutability analysis for the entire JDK 8 given a fixed number of threads.

4.3 Conclusion

As both studies have shown, using Reactive Async makes it possible to express core aspects of static analyses in a much more concise manner – at least when compared to a corresponding classical programming model which uses an API built around traditional locks. It has also been shown that explicit support for resolving cyclic computations significantly benefits the corresponding computations; in both cases, no code was written to handle cycles. Using other state-of-the-art deterministic concurrent programming models, the developer would have been required to handle this situation explicitly. W.r.t. performance, the case studies have shown that Reactive Async scales very well and enables efficient use of multicore CPUs.

5. Benchmarks

We also evaluate the performance and scalability of Reactive Async using two micro benchmarks, Parallel Sum and Monte Carlo Net Present Value, which we describe below. For each benchmark we compare three different implementations: a sequential implementation, a concurrent implementation based on Scala's futures and promises, and a concurrent implementation based on Reactive Async. All performance measurements are based on the same hardware and software setup as the case studies (see Section 4).

Parallel Sum   This benchmark computes the sum of a large collection of random integers. To expose sufficient parallelism, we use a collection of size 2^24 = 16777216. We use Scala's Vector[Int], which is backed by a bit-mapped vector trie with a branching factor of 32. Furthermore, to enable efficient splitting of the collection [...].

Figure 10. Runtime of the Parallel Sum micro benchmark in relation to the number of used threads.

Figure 10 shows the runtime of our implementations in relation to the number of used threads. The performance of Reactive Async is virtually identical to Scala's futures in this benchmark. The speed-up compared to the sequential implementation is about 1.91x for 2 threads (futures: 2.03x), about 6.97x for 8 threads (futures: 6.62x), and about 7.37x for 16 threads (futures: 7.37x).

Monte Carlo Net Present Value   This benchmark computes the Net Present Value (see https://en.wikipedia.org/wiki/Net_present_value) while allowing for uncertainty in the discount rate and the cash flows by using a Monte Carlo simulation. Our implementation is a Scala port of an open-source Java implementation by Alan Hohn (https://github.com/AlanHohn/monte-carlo-npv), which is based on Doug Lea's fork/join framework [16].

Figure 11. Runtime of the Monte Carlo Net Present Value benchmark in relation to the number of used threads.

Figure 11 shows the runtime of our implementations in relation to the number of used threads. Overall, the observed performance trends are similar to the Parallel Sum benchmark. However, we can see that Scala's futures are slightly more efficient. For 14–16 threads, futures are between 23% and 36% faster than Reactive Async. Two main reasons are likely: first, the Monte Carlo Net Present Value benchmark requires more synchronization than the Parallel Sum benchmark; as a result, the more complex state updates within cells become observable. Second, Scala's futures have undergone performance tuning by industry experts, whereas the same level of optimization effort has not been applied to the current implementation of Reactive Async.

Summary   The presented micro benchmarks show that Reactive Async exhibits a scalability competitive with Scala's futures, a finely-tuned, state-of-the-art implementation of futures on the JVM. In terms of runtime, Scala's futures are up to 36% faster than Reactive Async in our micro benchmarks. We expect this percentage to decrease with an effort to carefully optimize the implementation of Reactive Async.

6. Related Work

Programming Models for Concurrency Control   Over the years researchers have proposed a number of advanced concurrency control mechanisms. Software Transactional Memories (STM) [12] handle the problem of concurrent access to shared data introducing abstractions that define the boundaries of a transaction. Transactions abort in case of a conflict (optimistic control) or check not to introduce conflicts in advance (pessimistic control). Actors [13] encapsulate state and control, reacting to asynchronous messages. Since actors do not share memory, race conditions are not possible by construction. Imperative languages pose the additional challenge that races can arise on messages. The Async/Finish model [7] provides high-level, lexically-scoped constructs for creating parallel tasks and waiting for their termination. This approach simplifies static analysis and enables compiler optimizations. Finally, additional models have been derived via the combination of other models. For example, Habanero-Scala (HS) [14] is a Scala library that supports a hybrid programming model, combining the Async/Finish model and the Actor model. In contrast to our approach, none of these solutions guarantee determinism.

Deterministic Concurrent Models   Isolation types [6] are a mechanism for parallel execution of application tasks. Isolation types are used for data shared among tasks. Tasks fork and join isolated revisions: tasks read and modify their own private copy of the shared data. Copies are created and merged automatically, and conflicts are solved deterministically, in the way declared by each isolation type.

Programming models for deterministic concurrency guarantee the equivalence to a sequential execution [3, 5, 21, 25]. They restrict the tasks to avoid conflicts that cannot be automatically resolved via the type system, or at runtime via blocking (pessimistic) or aborts and retry (optimistic).

FlowPools [23] provide a deterministic concurrent dataflow abstraction for multisets, with an efficient lock-free implementation. Reactive Async generalizes FlowPools by supporting arbitrary lattice-based data, including but not limited to multisets. Furthermore, Reactive Async supports the resolution of cyclic dataflow dependencies, whereas FlowPools are restricted to acyclic dataflow graphs.

LVars [15] is a deterministic-by-construction parallel programming model based on shared state similar to cells: writes take the least upper bound of the old and new values with respect to the lattice to guarantee determinism. Similar to putFinal, LVars can be frozen after a write, to ensure that no further values can be added. In contrast to our approach, LVars do not support cyclic dependencies and do not address the problem of fallback computations.

In distributed systems, monotonicity has been used to guarantee eventual consistency. Bloom [1] allows programming distributed systems that are eventually consistent based on monotonically updated data collections.

DPJ [4] introduces an effect system for deterministic concurrency in Java. The compiler checks that memory regions are linear, avoiding concurrent accesses by multiple writers. In contrast to our approach and solutions like LVars, which are based on monotonic writes, DPJ achieves determinism by restricting when reads and writes can happen.

Reactive Programming   Reactive programming [2] is a programming paradigm that aims at supporting reactive applications with dedicated language abstractions (e.g., event streams, signals). Languages like Elm [8], Scala.react [18], and ReactiveX/Rx [19] support asynchronous execution of reactive computations. Similar to cells, reactive abstractions support the composition of asynchronous computations. However, while some of the existing reactive programming languages support dynamic dependencies [20], they do not provide determinism nor support cycles in the dependencies between reactive values.

7. Conclusion

Restricted expressivity has been a long-standing challenge for deterministic concurrency. We propose a concurrent programming model, Reactive Async, that overcomes several limitations of existing solutions. It allows developers to write composable, deterministic concurrent code supporting both cyclic and dynamic dependencies. Case studies in the domain of static analysis show that Reactive Async can easily tackle problems where existing solutions are not effective due to a lack of expressivity. Performance evaluations on both case studies and benchmarks show that our implementation allows writing faster applications compared to hand-written concurrent code, while retaining the scalability of state-of-the-art implementations of futures.

Acknowledgments

This work is partially supported by the European Research Council, grant No. 321217.

References

[1] P. Alvaro, N. Conway, J. M. Hellerstein, and W. R. Marczak. Consistency analysis in Bloom: A CALM and collected approach. In CIDR, pages 249–260, 2011.
[2] E. Bainomugisha, A. L. Carreton, T. v. Cutsem, S. Mostinckx, and W. d. Meuter. A survey on reactive programming. ACM Comput. Surv., 45(4):52:1–52:34, Aug. 2013.
[3] E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe multithreaded programming for C/C++. In OOPSLA, pages 81–96, 2009.
[4] R. L. Bocchino, Jr., V. S. Adve, S. V. Adve, and M. Snir. Parallel programming must be deterministic by default. In HotPar, page 4, 2009.
[5] R. L. Bocchino, Jr., V. S. Adve, D. Dig, S. V. Adve, S. Heumann, R. Komuravelli, J. Overbey, P. Simmons, H. Sung, and M. Vakilian. A type and effect system for Deterministic Parallel Java. In OOPSLA, pages 97–116, 2009.
[6] S. Burckhardt, A. Baldassin, and D. Leijen. Concurrent programming with revisions and isolation types. In OOPSLA, pages 691–707, 2010.
[7] P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. X10: An object-oriented approach to non-uniform cluster computing. In OOPSLA, pages 519–538, 2005.
[8] E. Czaplicki and S. Chong. Asynchronous functional reactive programming for GUIs. In PLDI, pages 411–422, 2013.
[9] M. Eichberg and B. Hermann. A software product line for static analyses: The OPAL framework. In SOAP, pages 1–6, 2014.
[10] M. Finifter, A. Mettler, N. Sastry, and D. Wagner. Verifiable functional purity in Java. In CCS, pages 161–174, 2008.
[11] P. Haller, A. Prokopec, H. Miller, V. Klang, R. Kuhn, and V. Jovanovic. Futures and promises. http://docs.scala-lang.org/overviews/core/futures.html, 2012.
[12] T. Harris, A. Cristal, O. S. Unsal, E. Ayguade, F. Gagliardi, B. Smith, and M. Valero. Transactional memory: An overview. IEEE Micro, 27(3):8–29, May 2007.
[13] C. Hewitt. Viewing control structures as patterns of passing messages. Artificial Intelligence, 8(3):323–364, June 1977.
[14] S. M. Imam and V. Sarkar. Integrating task parallelism with actors. In OOPSLA, pages 753–772, 2012.
[15] L. Kuper, A. Turon, N. R. Krishnaswami, and R. R. Newton. Freeze after writing: Quasi-deterministic parallel programming with LVars. In POPL, pages 257–270, 2014.
[16] D. Lea. A Java fork/join framework. In Java Grande, pages 36–43, 2000.
[17] B. Liskov and L. Shrira. Promises: Linguistic support for efficient asynchronous procedure calls in distributed systems. In PLDI, pages 260–267, 1988.
[18] I. Maier and M. Odersky. Higher-order reactive programming with incremental lists. In ECOOP, pages 707–731, 2013.
[19] E. Meijer. Your mouse is a database. Commun. ACM, 55(5):66–73, 2012.
[20] L. A. Meyerovich, A. Guha, J. Baskin, G. H. Cooper, M. Greenberg, A. Bromfield, and S. Krishnamurthi. Flapjax: A programming language for Ajax applications. In OOPSLA, pages 1–20, 2009.
[21] P. Pratikakis, J. Spacco, and M. Hicks. Transparent proxies for Java futures. In OOPSLA, pages 206–223, 2004.
[22] A. Prokopec, N. G. Bronson, P. Bagwell, and M. Odersky. Concurrent tries with efficient non-blocking snapshots. In PPoPP, pages 151–160, 2012.
[23] A. Prokopec, H. Miller, T. Schlatter, P. Haller, and M. Odersky. FlowPools: A lock-free deterministic concurrent dataflow abstraction. In LCPC, pages 158–173, 2012.
[24] A. Salcianu and M. C. Rinard. Purity and side effect analysis for Java programs. In VMCAI, pages 199–215, 2005.
[25] A. Welc, S. Jagannathan, and A. Hosking. Safe futures for Java. In OOPSLA, pages 439–453, 2005.
