
The Cilk Multithreading System: Programming Model, Execution Model, and Runtime System

Stéphane Zuckerman
Computer Architecture & Parallel Systems Laboratory
Electrical & Computer Engineering Dept.
University of Delaware
140 Evans Hall, Newark, DE 19716, USA

CPEG 852, Spring 2015
Advanced Topics in Computing Systems

Zuckerman et al. Cilk 1 / 118

Outline

1 Introduction
    High-Level Introduction
2 The Cilk Programming Model
    Cilk’s Programming Model: Overview
    Cilk’s Programming Model: The Cilk Computation Graph
3 The Cilk Execution Model
    Cilk’s Execution Model: Overview
    Cilk’s Execution Model: Threading Model
    Cilk’s Execution Model: Synchronization Model
    Cilk’s Execution Model: Memory Model
4 Examples of Cilk Programs
    Fibonacci Computation
    DAXPY Computation
5 Bibliography

Introduction: Multithreading with Cilk

What the Website Says

Cilk (pronounced “silk”) is a linguistic and runtime technology for algorithmic multithreaded programming developed at MIT. The philosophy behind Cilk is that a programmer should concentrate on structuring her or his program to expose parallelism and exploit locality, leaving Cilk’s runtime system with the responsibility of scheduling the computation to run efficiently on a given platform.

Introduction: Major Properties of Cilk

Cilk Features

- Language support for parallelism
  - Goal: provide simple keywords to express parallel algorithms
- A lightweight runtime system with provable properties:
  - Provably time-efficient task scheduler
    - No hints need to be provided by the user
    - Relies on a specific work-stealing algorithm
  - Provably space-efficient parallel execution on P processors
    - Memory requirements grow by at most a factor of P compared with the serial execution
- A weak memory consistency model: DAG consistency
- Relies heavily on recursion and the divide-and-conquer methodology

See the reference list for the programming model (Frigo, C. E. Leiserson, and Randall 1998; C. Leiserson and Plaat 1998; Frigo 2007), the runtime properties (Blumofe, Joerg, et al. 1995; Blumofe and C. E. Leiserson 1998; Blumofe and C. E. Leiserson 1999), and the Cilk memory model (Blumofe et al. 1996b; Blumofe et al. 1996a).

Cilk’s Programming Model: Overview

The Evolution of Cilk

- Cilk-1, the first implementation of Cilk
  - Only provided a runtime API
    - Forced users to marshal arguments into continuations by hand
    - Very tedious
- Other Cilk implementations improved on the internal workings (e.g., Cilk-3 added language support for parallelism)
- Cilk-5 extends the ANSI C (C89) language with specific keywords to enable parallelism.

Keywords

- cilk: must prefix the definition of functions that will spawn Cilk threads at run time.
- spawn: creates a new Cilk thread.
- sync: waits for all child threads of the current Cilk thread to finish, then resumes execution.

The Cilk Computation Graph

Definition

- It is a directed acyclic graph (DAG) $G = (V, E)$, where:
  - Vertices $v \in V$ represent the computation that takes place when they are reached
  - Edges $e \in E$ represent a dependence between two tasks
- The Cilk computation DAG is not a dataflow graph!
  - In dataflow, graphs are formed by the producer-consumer relationship between actors
    - Dataflow actors can fire when their data dependences are satisfied
  - In Cilk, graphs are formed by task spawning and joining
    - New branches appear when new tasks are spawned
    - Branches merge into one when tasks join together at a sync point
- How data is produced and consumed, and how memory is accessed, is not modeled in the computation graph

Why Use a Computation Graph?

Computation Graph = Parallel Program Abstraction

- Roughly speaking, a computation graph is an execution trace
- It abstracts away what the program is doing
- It helps focus on how the program is behaving
- It helps reason about where parallelism is created or available

Limitations

- Roughly speaking, a computation graph is an execution trace
- Depending on the program inputs, the DAG may change its shape
  - e.g., if control flow decides whether more work is created
- It is possible to work around this limitation:
  - Detect all basic blocks in all functions involved in the program
  - “Empty out” the code, keeping only the parallel and control-flow constructs
  - Generate variants to take all control-flow paths in the program
  ⇒ Tedious!
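
To make the input dependence concrete, here is a minimal Python sketch (not Cilk code) that builds a simplified spawn/sync graph for a recursive Fibonacci computation. The function name `build_fib_dag` is hypothetical, and the graph is deliberately coarse: it assumes that a call which recurses contributes one vertex for its spawning part and one for its post-sync part, while a base-case call is a single vertex.

```python
import itertools


def build_fib_dag(n):
    """Return (vertex_count, edges) for a simplified spawn/sync DAG of fib(n).

    Assumption of this sketch: a call that recurses is modeled by two
    vertices (before its spawns, after its sync); a base case is one vertex.
    """
    ids = itertools.count()
    edges = []

    def visit(k):
        entry = next(ids)
        if k < 2:
            return entry, entry              # leaf: no spawn, no sync
        exit_ = next(ids)                    # vertex reached after the sync
        for child in (k - 1, k - 2):
            c_entry, c_exit = visit(child)
            edges.append((entry, c_entry))   # spawn: a new branch appears
            edges.append((c_exit, exit_))    # join: the branch merges at the sync
        return entry, exit_

    visit(n)
    return next(ids), edges                  # ids issued so far = vertex count


for n in (2, 3, 4):
    vertices, edges = build_fib_dag(n)
    print(n, vertices, len(edges))
```

The same source program yields a different graph for every value of n, which is why a single drawn DAG only describes one execution.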

Tools to Analyze Computation Graphs

- Parallel execution time: $T_P$, i.e., the time it takes to run on $P$ processors
- Total work: $T_1$ (execution time as if everything ran on a single processor)
- Serial execution time: $T_{seq}$ (i.e., with no scheduling to take into account in the execution time)
- Critical path length: $T_\infty$ (also called span)
  - Longest path in the DAG, which has to be executed “serially”
  - There can be several critical paths of length $T_\infty$
  - No other path in the DAG is longer (in terms of number of tasks)
  - Evaluating the “breadth” of the computation DAG gives the theoretical maximum for parallelism
- Lower and upper bounds:
  - $T_P \ge T_1 / P$
  - $T_P \ge T_\infty$
  - Speedup: $S = T_1 / T_P$
  - Max parallelism: $T_1 / T_\infty$, i.e., “the average amount of work per step along the span”

[Figures: Cilk computation graph example; the same example annotated with the work $T_1$; the same example annotated with the critical path $T_\infty$]
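
The metrics above can be computed mechanically once a DAG is written down. The following Python sketch is only an illustration: the function `work_and_span`, the vertex names, and the per-vertex costs are all hypothetical. It sums the vertex costs to obtain the work and takes the most expensive path to obtain the span; with unit costs this reduces to counting tasks.

```python
from functools import lru_cache


def work_and_span(cost, succ):
    """Return (T_1, T_inf) for a DAG given vertex costs and successor lists."""
    work = sum(cost.values())                # T_1: every vertex runs once

    @lru_cache(maxsize=None)
    def longest_from(v):
        # Cost of the most expensive path that starts at vertex v.
        tails = [longest_from(w) for w in succ.get(v, ())]
        return cost[v] + max(tails, default=0)

    span = max(longest_from(v) for v in cost)    # T_inf: critical path length
    return work, span


# Hypothetical fork-join graph: one start vertex spawns four branches of
# cost 2 that all join at a single end vertex.
cost = {"start": 1, "b1": 2, "b2": 2, "b3": 2, "b4": 2, "end": 1}
succ = {"start": ["b1", "b2", "b3", "b4"],
        "b1": ["end"], "b2": ["end"], "b3": ["end"], "b4": ["end"]}

t1, t_inf = work_and_span(cost, succ)
print("work T_1 =", t1, "span T_inf =", t_inf, "parallelism =", t1 / t_inf)
for p in (1, 2, 4, 8):
    # Both lower bounds must hold, so T_P is at least the larger of the two.
    print("P =", p, "T_P >=", max(t1 / p, t_inf))
```

For this graph the work is 10 and the span is 4, so adding processors beyond the parallelism of 2.5 cannot push the running time below 4.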

Cilk PXM: Overview

[Figure: Cilk PXM overview]

Cilk PXM: Threading Model

Thread Creation, Execution, & Management

1. Any function marked as parallel (a cilk function) can spawn a new thread.
2. When a Cilk thread is spawned, it starts executing immediately.
3. The parent thread is put on a waiting deque (double-ended queue).
4. A processor whose deque is empty steals work from the deque of another, randomly chosen processor.

[Figures: Cilk’s threading model, three slides]

Cilk PXM: Synchronization Model

Overview (Frigo, C. E. Leiserson, and Randall 1998)

- From a programming model point of view:
  - A single keyword: sync
  - Synchronization point for all Cilk threads spawned in the same hierarchy
- From a PXM point of view:
  - The “owner” of the sync object waits for its descendant Cilk threads to join.
  - The sync object implements a counter
    - It tracks how many outstanding Cilk threads still need to join

[Figure: Cilk PXM synchronization model]

Cilk PXM: Memory Model

Introduction to DAG Consistency (Blumofe et al. 1996b)

- Designed for distributed shared memory systems
- Performance is on par with statically scheduled parallel tasks
- Demonstrates that there is a relationship between the page faults occurring in a serial execution and those occurring in its parallel equivalent
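
Before going further into the memory model, the threading and synchronization rules described earlier can be pictured with a small single-threaded Python sketch. It is not the Cilk runtime: the class names `Worker` and `JoinCounter` are hypothetical, victims are chosen only among processors that have work (a simplification), and letting the thief take the oldest frame while the owner takes the newest is an assumption that goes beyond what the slides state.

```python
import random
from collections import deque


class JoinCounter:
    """Stand-in for the sync object: counts children that have not joined."""

    def __init__(self):
        self.outstanding = 0

    def spawn(self):
        self.outstanding += 1

    def child_joined(self):
        self.outstanding -= 1

    def can_pass_sync(self):
        return self.outstanding == 0


class Worker:
    """One processor with its own deque of waiting parent frames."""

    def __init__(self, name):
        self.name = name
        self.frames = deque()

    def push_parent(self, frame):
        self.frames.append(frame)            # newest frame goes to the bottom

    def resume_parent(self):
        return self.frames.pop() if self.frames else None    # owner: newest

    def steal(self, workers, rng=random):
        victims = [w for w in workers if w is not self and w.frames]
        if not victims:
            return None
        victim = rng.choice(victims)         # victim picked at random
        return victim.frames.popleft()       # thief: oldest frame (assumption)


w0, w1 = Worker("P0"), Worker("P1")
for frame in ("main", "fib(4)", "fib(3)"):
    w0.push_parent(frame)                    # parent waits while its child runs

stolen = w1.steal([w0, w1])                  # P1 is idle, so it steals
resumed = w0.resume_parent()                 # P0 returns to its newest parent
print(stolen, resumed)

sync = JoinCounter()
sync.spawn()
sync.spawn()                                 # the parent spawned two children
sync.child_joined()
blocked = not sync.can_pass_sync()           # one child is still outstanding
sync.child_joined()
print(blocked, sync.can_pass_sync())
```

The counter only records how many children are outstanding; what the waiting processor does in the meantime is a scheduling decision that this sketch leaves out.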

DAG Consistency: Definition (Blumofe et al. 1996b)

The shared memory $M$ of a multithreaded computation $G = (V, E)$ is dag-consistent if the following two conditions hold, where $i \prec j$ means that there is a path from thread $i$ to thread $j$ in $G$:

1. Whenever any thread $i \in V$ reads any object $m \in M$, it receives a value $v$ tagged with some thread $j \in V$ such that $j$ writes $v$ to $m$ and $i \nprec j$.
2. For any three threads $i, j, k \in V$ satisfying $i \prec j \prec k$, if $j$ writes some object $m \in M$ and $k$ reads $m$, then the value received by $k$ is not tagged with $i$.

DAG Consistency: High-Level View

- Each processor in the system has a local cache
- Each processor in the system has access to a global store memory (the backing store)
- An object may be located in the backing store as well as in any processor’s cache
- A processor can read or write an object only if it is present in its cache
- Each object cached in a processor is associated with a dirty bit
- There are three basic memory operations:
  - fetch: copies an object from the backing store to a processor’s cache and marks it clean
  - reconcile: copies an object from a processor’s cache to the backing store and marks it clean
  - flush: removes a clean object from a processor’s cache
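
The three operations above can be mimicked with a toy model. The Python sketch below is an illustration under stated assumptions, not the actual coherence algorithm: the class `ProcessorCache`, its method names for reading and writing, and the use of a plain dictionary as the backing store are all hypothetical.

```python
class ProcessorCache:
    """Toy model of one processor's cache in front of a shared backing store."""

    def __init__(self, store):
        self.store = store      # backing store: object name -> value
        self.lines = {}         # cached objects: name -> [value, dirty bit]

    def fetch(self, name):
        # Copy the object from the backing store and mark it clean.
        self.lines[name] = [self.store[name], False]

    def read(self, name):
        return self.lines[name][0]      # KeyError if the object is not cached

    def write(self, name, value):
        line = self.lines[name]         # KeyError if the object is not cached
        line[0], line[1] = value, True  # a written object becomes dirty

    def reconcile(self, name):
        # Copy the object back to the backing store and mark it clean.
        line = self.lines[name]
        self.store[name] = line[0]
        line[1] = False

    def flush(self, name):
        if self.lines[name][1]:
            raise ValueError("only a clean object can be flushed")
        del self.lines[name]


store = {"x": 0}
p0, p1 = ProcessorCache(store), ProcessorCache(store)
p0.fetch("x")
p1.fetch("x")
p0.write("x", 42)
stale = p1.read("x")        # P1 still reads its own cached copy
p0.reconcile("x")           # P0 publishes its write to the backing store
p1.flush("x")
p1.fetch("x")               # P1 drops its copy and fetches a fresh one
fresh = p1.read("x")
print(stale, fresh)
```

The example shows why a weak model needs explicit points at which caches are reconciled and refreshed: until P0 reconciles and P1 fetches again, the two processors legitimately see different values of the same object.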