The Multithreading System: Programming Model, Execution Model, and Runtime System

Stéphane Zuckerman

Computer Architecture & Parallel Systems Laboratory
Electrical & Computer Engineering Dept., University of Delaware
140 Evans Hall, Newark, DE 19716, USA
[email protected]

CPEG 852 — Spring 2015 Advanced Topics in Computing Systems

Outline

1. Introduction
   - High-Level Introduction
2. The Cilk Programming Model
   - Cilk's Programming Model: Overview
   - Cilk's Programming Model: The Cilk Computation Graph
3. The Cilk Execution Model
   - Cilk's Execution Model — Overview
   - Cilk's Execution Model — Threading Model
   - Cilk's Execution Model — Synchronization Model
   - Cilk's Execution Model — Memory Model
4. Examples of Cilk Programs
   - Fibonacci Computation
   - DAXPY Computation
5. Bibliography


Introduction — Multithreading with Cilk

What the Website Says
Cilk (pronounced "silk") is a linguistic and runtime technology for algorithmic multithreaded programming developed at MIT. The philosophy behind Cilk is that a programmer should concentrate on structuring her or his program to expose parallelism and exploit locality, leaving Cilk's runtime system with the responsibility of scheduling the computation to run efficiently on a given platform.

Introduction — Major Properties of Cilk

Cilk Features
- Language support for parallelism
  - Goal: provide simple keywords to express parallel algorithms
- A lightweight runtime system with provable properties:
  - Provably time-efficient task scheduler
    - No hint to be provided by the user
    - Relies on a specific work-stealing algorithm
  - Provably space-efficient parallel execution on P processors
    - Memory requirements increase only by a constant factor
- A weak memory consistency model: DAG consistency
- Relies heavily on recursion and the divide-and-conquer methodology

See the reference list for the programming model (Frigo, C. E. Leiserson, and Randall 1998; C. Leiserson and Plaat 1998; Frigo 2007), the runtime properties (Blumofe, Joerg, et al. 1995; Blumofe and C. E. Leiserson 1998; Blumofe and C. E. Leiserson 1999), and the Cilk memory model (Blumofe et al. 1996b; Blumofe et al. 1996a).
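These guarantees can be stated quantitatively (quoting the cited scheduling papers): for a computation with total work T1, critical-path length T∞, and serial space requirement S1, the randomized work-stealing scheduler runs it on P processors in expected time TP ≤ T1/P + O(T∞), using at most SP ≤ S1 · P space. Both T1 and T∞ are defined in the computation-graph slides that follow.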


Cilk's Programming Model – Overview

The Evolution of Cilk
- Cilk-1, the first implementation of Cilk:
  - Only proposed a runtime API
    - Forced users to marshal arguments in continuations by hand
    - Very tedious
- Other Cilk implementations improved on internal workings (e.g., Cilk-3 added language support for parallelism)
- Cilk-5 proposes to extend the ANSI C (C89) language with specific keywords to enable parallelism.

Keywords
- cilk: must prefix the definition of functions that will spawn Cilk threads at run time.
- spawn: creates a new Cilk thread.
- sync: waits for all child threads of the current Cilk thread to finish, then resumes execution.
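To see the three keywords together before the full examples later in the deck, here is a minimal sketch (a hypothetical array-sum routine, not one of the slides' programs):

cilk int sum(const int *a, int n)
{
    int left, right;

    if (n <= 1) /* base case: zero or one element */
        return (n == 1) ? a[0] : 0;

    left  = spawn sum(a, n / 2);             /* first child thread  */
    right = spawn sum(a + n / 2, n - n / 2); /* second child thread */

    sync; /* left and right may only be read after the sync */

    return left + right;
}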

The Cilk Computation Graph

Definition
The Cilk computation graph is a directed acyclic graph (DAG) G = (V, E), where:
- Vertices v ∈ V symbolize some computation that takes place when they are reached
- Edges e ∈ E symbolize a dependence between two tasks

The Cilk computation DAG is not a dataflow graph!
- In dataflow, graphs are formed by the producer-consumer relationships between actors
  - Dataflow actors can fire when their data dependences are satisfied
- In Cilk, graphs are formed by task spawning and joining
  - New branches appear when new tasks are spawned
  - Branches merge into one when tasks join together at a sync point
- How data is produced and consumed, and how memory is accessed, is not modeled in the computation graph


Why Use a Computation Graph?

Computation Graph = Parallel Program Abstraction
- Roughly speaking, a computation graph is an execution trace
- It abstracts away what the program is doing
- It helps focus on how the program is behaving
- It helps reason about where parallelism is created/available

Limitations
- Roughly speaking, a computation graph is an execution trace
- Depending on the program inputs, the DAG may change its shape
  - e.g., if control flow decides when more work is created
- It is possible to work around this limitation:
  - Detect all basic blocks in all functions involved in the program
  - "Empty out" the code, keeping only the parallel and control-flow constructs
  - Generate variants to take all control-flow paths in the program
  ⇒ Tedious!


Tools to Analyze Computation Graphs

- Parallel execution time: TP, i.e., the time it took to run on P processors
- Total work: T1 (execution time as if everything were run on a single processor)
- Serial execution time: Tseq (i.e., no scheduling overhead to take into account)
- Critical path length: T∞ (also called the span)
  - The longest path in the DAG, which must be executed "serially"
  - There can be several such longest paths
  - Every other path in the DAG is shorter (in terms of number of tasks)
  - Evaluating the "breadth" of the computation DAG gives the theoretical maximum for parallelism
- Lower bounds: TP ≥ T1/P and TP ≥ T∞
- Speedup: S = T1/TP
- Maximum parallelism: T1/T∞, i.e., "the average amount of work per step along the span"
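To make these quantities concrete, the following standalone C sketch (not from the slides; the node costs and the DAG are invented, and the traversal assumes a spawn tree where each task has a single parent) computes T1 and T∞ by depth-first traversal:

#include <stdio.h>

#define MAX_SUCC 2

struct node {
    int cost;                    /* work performed by this task */
    int nsucc;                   /* number of child tasks       */
    struct node *succ[MAX_SUCC]; /* child tasks                 */
};

/* T1: total work, the sum of all task costs. */
static int work(const struct node *n)
{
    int w = n->cost;
    for (int i = 0; i < n->nsucc; ++i)
        w += work(n->succ[i]);
    return w;
}

/* T_inf: the span, i.e., the cost of the longest root-to-leaf path. */
static int span(const struct node *n)
{
    int longest = 0;
    for (int i = 0; i < n->nsucc; ++i) {
        int s = span(n->succ[i]);
        if (s > longest)
            longest = s;
    }
    return n->cost + longest;
}

int main(void)
{
    /* A small invented spawn tree: root spawns a and b; b spawns c and d. */
    struct node c = { 1, 0, { 0 } };
    struct node d = { 3, 0, { 0 } };
    struct node a = { 2, 0, { 0 } };
    struct node b = { 1, 2, { &c, &d } };
    struct node root = { 1, 2, { &a, &b } };

    printf("T1 = %d, Tinf = %d, parallelism = %.2f\n",
           work(&root), span(&root),
           (double)work(&root) / span(&root));
    return 0;
}

For the tree built in main, T1 = 8 and T∞ = 5 (the path root → b → d), so the maximum parallelism T1/T∞ is 1.6.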

Cilk Computation Graph: Example

[Figure: an example Cilk computation graph]
[Figure: the same graph, highlighting the total work T1]
[Figure: the same graph, highlighting the critical path T∞]


Cilk PXM — Overview

[Figure: overview diagram of Cilk's program execution model (PXM)]

Cilk PXM — Threading Model

Thread Creation, Execution, & Management
1. Any function marked as parallel (a cilk function) can issue a new thread.
2. When a Cilk thread is spawned, it starts executing immediately.
3. The parent thread is put on a waiting deque (double-ended queue).
4. If its own deque is empty, a processor randomly steals work from another processor's deque.
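A schematic worker loop for this policy might look as follows (a sketch under simplifying assumptions; the deque_*(), run(), and random_victim() helpers are hypothetical, and the real Cilk-5 scheduler uses the lock-free THE protocol rather than opaque deque calls):

typedef struct task  task_t;
typedef struct deque deque_t;

extern task_t *deque_pop_bottom(deque_t *d); /* owner works at the bottom */
extern task_t *deque_steal_top(deque_t *d);  /* thieves steal at the top  */
extern void    run(task_t *t);               /* may spawn, pushing new tasks */
extern int     random_victim(int self, int nworkers);

void worker_loop(deque_t **deques, int self, int nworkers)
{
    for (;;) {
        /* 1. Exhaust our own deque (depth-first, close to the serial order). */
        task_t *t = deque_pop_bottom(deques[self]);
        if (t != 0) {
            run(t);
            continue;
        }
        /* 2. Deque empty: pick a victim uniformly at random and try to
         *    steal the oldest task from the top of its deque. */
        int victim = random_victim(self, nworkers);
        t = deque_steal_top(deques[victim]);
        if (t != 0)
            run(t);
        /* On failure, loop around and try another random victim. */
    }
}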

Cilk's Threading Model

[Figure sequence: step-by-step illustration of Cilk's threading model (spawns, deque operations, and steals), one slide per step]

Cilk PXM — Synchronization Model

Overview (Frigo, C. E. Leiserson, and Randall 1998)
From a programming model point of view:
- A single keyword: sync
- A synchronization point for all Cilk threads spawned in the same hierarchy

From a PXM point of view:
- The "owner" of the sync object waits for its descendant Cilk threads to join.
- The sync object implements a counter
  - Tracks how many outstanding Cilk threads still need to join
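One plausible way to realize such a counter is sketched below (the type and function names are invented for illustration; this is not the actual Cilk-5 runtime code):

#include <stdatomic.h>

typedef struct {
    atomic_int outstanding; /* children spawned but not yet joined */
} sync_counter_t;

static void on_spawn(sync_counter_t *s)
{
    atomic_fetch_add(&s->outstanding, 1);
}

/* Called when a child returns: reports whether it was the last one,
 * in which case the waiting parent may be resumed. */
static int on_child_done(sync_counter_t *s)
{
    return atomic_fetch_sub(&s->outstanding, 1) == 1;
}

/* sync succeeds once no children remain outstanding. */
static int may_pass_sync(sync_counter_t *s)
{
    return atomic_load(&s->outstanding) == 0;
}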

[Figure: illustration of the synchronization model (children joining at a sync point)]

Cilk PXM — Memory Model

Introduction to DAG Consistency (Blumofe et al. 1996b)
- Designed for distributed systems
- Performance is on par with statically scheduled parallel tasks
- Demonstrates there is a relationship between the page faults occurring in a serial execution and those of its parallel equivalent

Cilk PXM — Memory Model

DAG Consistency: Definition (Blumofe et al. 1996b)
The shared memory M of a multithreaded computation G = (V, E) is dag-consistent if the following two conditions hold:
1. Whenever any thread i ∈ V reads any object m ∈ M, it receives a value v tagged with some thread j ∈ V such that j writes v to m and i ⊀ j.
2. For any three threads i, j, k ∈ V satisfying i ≺ j ≺ k, if j writes some object m ∈ M and k reads m, then the value received by k is not tagged with i.
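As a concrete reading of the definition (an illustration, not from the slides): suppose thread i writes 1 to m, a successor j of i writes 2 to m, and a successor k of j reads m. Condition 2 guarantees that k can never observe i's overwritten value 1, although it may legitimately observe a write from some other thread unordered with j and k. Condition 1, for its part, forbids any thread from reading a value written by one of its successors, i.e., reading "from the future".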

Cilk PXM — Memory Model

DAG Consistency: High-Level View
- Each processor in the system has a local cache
- Each processor in the system has access to a global backing store
- An object may be located in the backing store as well as in any processor's cache
- A processor can read or write an object only if it is present in its cache
- Each object cached in a processor is associated with a dirty bit
- There are three basic memory operations:
  - fetch: copies an object from the backing store to a processor's cache and marks it clean
  - reconcile: copies an object from a processor's cache to the backing store and marks it clean
  - flush: removes a clean object from a processor's cache

Cilk PXM — Memory Model

DAG Consistency: BACKER Algorithm Overview
- All read/write operations from a processor happen on a cached copy of the object.
  ⇒ If the object does not exist in the cache, it is fetched into the cache first.
- If the operation is a write, the object's dirty bit is set.
- If a processor must load a new object but its cache is full:
  - A clean object can be removed by flushing it
  - A dirty object must first be reconciled with the backing store, then flushed
- If a thread j succeeds a thread i (i.e., i → j in the computation DAG):
  - If i and j execute on the same processor, there is nothing to do. Otherwise:
  - Assuming i and j execute on Pi and Pj:
    - Pi reconciles all of i's writes after i finishes, but before j starts executing, and
    - All of Pj's cache is reconciled before j starts executing.
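A minimal data-structure sketch of the three primitives, on a single cached object (names and layout are invented for illustration; the real BACKER implementation manages cached pages over distributed memory):

#include <string.h>

#define OBJ_SIZE 64 /* invented object granularity */

struct cached_object {
    unsigned char data[OBJ_SIZE];
    int valid; /* is the object present in this cache?      */
    int dirty; /* has it been written since last reconcile? */
};

/* fetch: copy the object from the backing store into the cache, mark clean. */
static void fetch(struct cached_object *c, const unsigned char *store)
{
    memcpy(c->data, store, OBJ_SIZE);
    c->valid = 1;
    c->dirty = 0;
}

/* reconcile: copy a dirty cached object back to the backing store, mark clean. */
static void reconcile(struct cached_object *c, unsigned char *store)
{
    if (c->valid && c->dirty) {
        memcpy(store, c->data, OBJ_SIZE);
        c->dirty = 0;
    }
}

/* flush: evict a clean object; a dirty one must be reconciled first. */
static void flush(struct cached_object *c)
{
    if (c->valid && !c->dirty)
        c->valid = 0;
}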



Fibonacci Numbers

Definition: Fibonacci Sequence
U0 = 0, U1 = 1, and ∀n ≥ 2: Un = Un−1 + Un−2

int fib(int n)
{
    int n1, n2;

    if (n < 2) /* cutoff value -- base case */
        return n;

    n1 = fib(n - 1);
    n2 = fib(n - 2);

    return n1 + n2; /* done -- we can return the sum */
}

Figure: Naïve Fibonacci Number Computation

Fibonacci: Example of Sequential Execution

[Figure sequence: step-by-step illustration of a sequential execution of fib, one slide per step]


Fibonacci Numbers

cilk int fib(int n)
{
    int n1, n2;

    if (n < 2) /* cutoff value -- base case */
        return n;

    n1 = spawn fib(n - 1);
    n2 = spawn fib(n - 2);

    /* wait for all children threads to finish */
    sync;

    return n1 + n2; /* done -- we can return the sum */
}

Figure: Naïve Parallel Fibonacci Number Computation
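For completeness, a plausible Cilk-5 driver for this function (a sketch, not from the slides; in Cilk-5 the entry point is itself a cilk procedure, and the spawned result may only be read after the sync):

#include <stdio.h>
#include <stdlib.h>

cilk int main(int argc, char *argv[])
{
    int n = (argc > 1) ? atoi(argv[1]) : 10;
    int result;

    result = spawn fib(n);
    sync; /* result is valid only after the sync */

    printf("fib(%d) = %d\n", n, result);
    return 0;
}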

Fibonacci: Example of Parallel Execution

[Figure sequence: step-by-step illustration of a parallel execution of fib across several workers, one slide per step]

Fibonacci: Another Example of Parallel Execution

[Figure sequence: a second step-by-step parallel execution of fib, with a different interleaving, one slide per step]

Zuckerman et al. Cilk 111 / 118 void DAXPY(double* Y, const double* X, const double alpha, const size_t N) { for (size_t i = 0; i < N; ++i) { Y[i] += alpha * X[i]; } }

Figure : Na¨ıve DAXPY computation

DAXPY

Definition: Double-precision Alpha-times-X-Plus-Y ~ n ~ n ~ ~ ~ ∀α ∈ R, X ∈ R , Y ∈ R : Y = α.X + Y


DAXPY

cilk void DAXPY(double *Y, const double *X, const double alpha, const size_t N)
{
    if (N < CUTOFF_VALUE) {
        for (size_t i = 0; i < N; ++i) {
            Y[i] += alpha * X[i];
        }
    } else {
        spawn DAXPY(Y, X, alpha, N / 2);
        spawn DAXPY(Y + N / 2, X + N / 2, alpha, N - N / 2);
        sync;
    }
}

Figure: Naïve Parallel DAXPY Computation
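The slide leaves CUTOFF_VALUE undefined; it bounds the task granularity, since below some vector size the loop body is cheaper than the spawn overhead it would have to amortize. A hypothetical, machine-dependent definition might be:

#define CUTOFF_VALUE 4096 /* invented grain size: tune per machine */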


References I

Blumofe, Robert D., Christopher F. Joerg, et al. (1995). "Cilk: An Efficient Multithreaded Runtime System". In: SIGPLAN Not. 30.8, pp. 207–216. ISSN: 0362-1340. DOI: 10.1145/209937.209958. URL: http://doi.acm.org/10.1145/209937.209958.

Blumofe, Robert D. and Charles E. Leiserson (1998). "Space-Efficient Scheduling of Multithreaded Computations". In: SIAM Journal on Computing 27.1, pp. 202–229. DOI: 10.1137/S0097539793259471. URL: http://dx.doi.org/10.1137/S0097539793259471.

– (1999). "Scheduling Multithreaded Computations by Work Stealing". In: J. ACM 46.5, pp. 720–748. ISSN: 0004-5411. DOI: 10.1145/324133.324234. URL: http://doi.acm.org/10.1145/324133.324234.

References II

Blumofe, Robert D. et al. (1996a). "An Analysis of Dag-consistent Distributed Shared-memory Algorithms". In: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures. SPAA '96. Padua, Italy: ACM, pp. 297–308. ISBN: 0-89791-809-6. DOI: 10.1145/237502.237574. URL: http://doi.acm.org/10.1145/237502.237574.

– (1996b). "Dag-Consistent Distributed Shared Memory". In: Proceedings of the 10th International Parallel Processing Symposium. IPPS '96. Washington, DC, USA: IEEE Computer Society, pp. 132–141. ISBN: 0-8186-7255-2. URL: http://dl.acm.org/citation.cfm?id=645606.661333.

References III

Frigo, Matteo (2007). "Multithreaded Programming in Cilk". In: Proceedings of the 2007 International Workshop on Parallel Symbolic Computation. PASCO '07. London, Ontario, Canada: ACM, pp. 13–14. ISBN: 978-1-59593-741-4. DOI: 10.1145/1278177.1278181. URL: http://doi.acm.org/10.1145/1278177.1278181.

Frigo, Matteo, Charles E. Leiserson, and Keith H. Randall (1998). "The Implementation of the Cilk-5 Multithreaded Language". In: SIGPLAN Not. 33.5, pp. 212–223. ISSN: 0362-1340. DOI: 10.1145/277652.277725. URL: http://doi.acm.org/10.1145/277652.277725.

Leiserson, Charles and Aske Plaat (1998). "Programming parallel applications in Cilk". In: SINEWS: SIAM News 31.
