The Cilk System for Parallel Multithreaded Computing

by

Christopher F. Joerg

Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

January

Massachusetts Institute of Technology. All rights reserved.

Author
Department of Electrical Engineering and Computer Science
January

Certified by
Charles Leiserson
Professor
Thesis Supervisor

Accepted by
Frederic R. Morgenthaler
Chairman, Departmental Committee on Graduate Students

The Cilk System for Parallel Multithreaded Computing

by

Christopher F. Joerg

Submitted to the Department of Electrical Engineering and Computer Science
in January, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy

Abstract

Although cost-effective parallel machines are now commercially available, the widespread use of parallel processing is still being held back, due mainly to the troublesome nature of parallel programming. In particular, it is still difficult to build efficient implementations of parallel applications whose communication patterns are either highly irregular or dependent upon dynamic information. Multithreading has become an increasingly popular way to implement these dynamic, asynchronous, concurrent programs. Cilk (pronounced "silk") is our C-based multithreaded computing system that provides provably good performance guarantees. This thesis describes the evolution of the Cilk language and runtime system, and it describes applications which affected the evolution of the system.

Using Cilk, programmers are able to express their applications either by writing multithreaded code in a continuation-passing style, or by writing code using normal call/return semantics and specifying which calls can be performed in parallel. The Cilk runtime system takes complete control of the scheduling, load balancing, and communication needed to execute the program, thereby insulating the programmer from these details. The programmer can rest assured that his program will be executed efficiently, since the Cilk scheduler provably achieves time, space, and communication bounds all within a constant factor of optimal. For distributed-memory environments, we have implemented a software shared-memory system for Cilk. We have defined a dag-consistent memory model, which is a lock-free consistency model well suited to the needs of a multithreaded program. Because dag consistency is a weak consistency model, we have been able to implement coherence efficiently in software.

The most complex application written in Cilk is the Socrates computer chess program. Socrates is a large, nondeterministic, challenging application whose complex control dependencies make it inexpressible in many other parallel programming systems. Running on an Intel Paragon, Socrates finished second in the World Computer Chess Championship.

Currently, versions of Cilk run on the Thinking Machines CM-5, the Intel Paragon, various SMPs, and on networks of workstations. The same Cilk program will run on all of these platforms with little, if any, modification. Applications written in Cilk include protein folding, graphic rendering, backtrack search, and computer chess.

Thesis Supervisor: Charles E. Leiserson
Title: Professor

Acknowledgments

I am especially grateful to my thesis supervisor, Professor Charles Leiserson, who has led the Cilk project. I still remember the day he came to my office and recruited me. He explained how he realized I had other work to do, but he wanted to know if I would like to help out part time on using the PCM system I had worked on to implement a chess program. It sounded like an interesting project, so I agreed, but only after making it clear that I could only work part time because I had my thesis project to work on. Well, part time became full time, and at times full time became much more than that. Eventually the chess program was completed and the chess tournament came and went, yet I still kept working on the PCM system, which was now turning into Cilk. Ultimately, I realized that I should give up on my other project and make Cilk my thesis instead. Charles is a wonderful supervisor, and under his leadership the Cilk project has achieved more than I ever expected. Charles' influence can also be seen in this writeup itself. He has helped me turn this thesis into a relatively coherent document, and he has also pointed out some of my more malodorous grammatical constructions.

The Cilk project has been a team effort, and I am indebted to all the people who have contributed in some way to the Cilk system: Bobby Blumofe, Feng Ming Dong, Matteo Frigo, Shail Aditya Gupta, Michael Halbherr, Charles Leiserson, Bradley Kuszmaul, Rob Miller, Keith Randall, Rolf Riesen, Andy Shaw, Richard Tauriello, and Yuli Zhou. Their contributions are noted throughout this document.

I thank, along with the members of the Cilk team, the past and present members of the Computation Structures Group. These friends have made MIT both a challenging and a fun place to be. In particular, I should thank Michael Halbherr. He not only began the work that led to the PCM system, but he tried many times to convince me to switch my thesis to this system. It took a while, but I finally realized he was right.

I am also indebted to Don Dailey and Larry Kaufman, both formerly of Heuristic Software. They wrote the serial Socrates program on which our parallel chess program is based. In addition, Don and I spent many long nights debugging, testing, and improving (or at least trying to improve) Socrates. Most of this time we even had fun.

Professor Arvind, Dr. Andy Boughton, and Dr. Greg Papadopoulos also deserve many thanks. They provided me the freedom, encouragement, and support to work on a wide range of exciting projects throughout my years at MIT.

I am also grateful to my parents and my family. Their love and support has always been important to me.

Last but not least, I thank Constance Jeffery. Whether we were together, apart, or off on one of our many trips ranging from Anchorage to Zurich, her continuing friendship over the past decade has made these years enjoyable and memorable.

The research described in this document was supported in part by the Advanced Research Projects Agency of the Department of Defense. This work was also supported by the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, which, under an NCSA grant, provided us access to their CM-5 for the chess tournament, and by Sandia National Laboratories, which provided us access to their Intel Paragon for the tournament.

Contents

1 Introduction
    Life Before Cilk
    The Evolution of Cilk

2 The PCM System
    Introduction
    The Parallel Continuation Machine
        Elements of the PCM
        The Specification Language
        Executing a PCM Program
        Tail Calls
        Passing Vectors in Closures
    Scheduling PCM Threads on a Multiprocessor
    Two Case Studies
        Ray Tracing
        Protein Folding
    Conclusions

3 Cilk-1: A Provably Good Runtime System
    Cilk Overview
    Cilk Programming Environment and Implementation
    Cilk's Work-Stealing Scheduler
    Performance of Cilk Applications
    Modeling Performance
    Theoretical Analysis of the Cilk Scheduler
    Conclusion

4 The Socrates Parallel Chess Program
    Introduction
    Parallel Game Tree Search
        Negamax Search Without Pruning
        Alpha-Beta Pruning
        Scout Search
        Jamboree Search
    Using Cilk for Chess Search
        Migration Threads
        Aborting Computations
        Priority Threads
        Steal Ordering
        Level Waiting
    Other Chess Mechanisms
        Transposition Table
        Repeated Moves
        Debugging
    Performance of Jamboree Search
    History of Socrates

5 Cilk-2: Programming at a Higher Level
    A Cilk Example: Knary
    The Cilk-2 System
    Knary Revisited
    Conclusion

6 Cilk-3: Shared Memory for Cilk
    Introduction
    Dag Consistency
    Maintaining Dag Consistency
    Implementation
    An Analysis of Page Faults
    Conclusion

7 Cilk-4: Supporting Speculative Computations
    The Cilk-4 Language
    A Cilk Example: Chess
    Conclusions

8 Conclusions
    Summary
    Future Work
    Concluding Remarks

A Protein Folding Optimizations

List of Figures

Maximizing Communication Locality
Elements of the PCM Model
PCM Program to Compute Fibonacci
Execution of a PCM Program
Anatomy of a PCM Thread
Kernel of Sequential Ray Tracer
Kernel of Parallel Ray Tracer
Traced Picture and Work Histogram
A Folded Polymer
Kernel of Sequential Protein Folding
Kernel of Parallel Protein Folding
The Cilk Model of Multithreaded Computation
The Closure Data Structure
Fibonacci in Cilk
Estimating Critical Path Length Using Time-Stamping
Normalized knary Speedup
Normalized Socrates Speedup
Algorithm negamax
Pruning in a Chess Tree
Algorithm absearch
Algorithm scout
Algorithm jamboree
The Dataflow Graph for Jamboree Search
Cilk Dag for Socrates Search Algorithm
Use of Migration Threads
Modified Ready Queue
Selected Chess Positions
Cilk Code for Knary
Fibonacci in Cilk
Cilk Code for Knary
Recursive Matrix Multiply
Matrix Multiply in Cilk
Matrix Multiply Dag
Matrix Multiply Performance
A Kernel Tree
A Cactus Stack
Histogram of Cache Warmup Costs
Page Fault Statistics
Cilk Dag for Socrates Search Algorithm
Cilk Code for Test Search
Cilk Code for Value Search
Partial Paths on a Lattice

List of Tables

Evolution of the Cilk System
Ray Tracing Result Overview
Protein Folding Run Times
Performance of Cilk Applications

Chapter 1

Introduction

Researchers have long worked to bring parallel hardware and software into widespread use. Recently, there has been progress on the hardware front. Serial microprocessors have been used as cost-effective building blocks for medium and large scale parallel machines. Now, many high-volume serial processors contain hooks, such as snoopy buses [KEW], for implementing multiprocessor systems. These hooks make it quite simple and cheap for commercial computer manufacturers to build inexpensive, entry-level multiprocessor machines. This trend towards including multiprocessor support in standard microprocessors occurred first with processors used in workstations (e.g., MIPS [RMWV], Sparc [Sun], PowerPC [Mot]) and more recently with processors for PCs (e.g., Intel's Pentium [Gwe]). As with any other commodity, as parallel machines drop in price they become cost-effective in new areas, leading to parallel machines being installed at more and more sites. If this trend wasn't enough, high-speed networks and lower-overhead software are threatening to turn every LAN into a potential parallel machine [ACP]. We may finally be witnessing the move of parallel machines into the mainstream.

Although building parallel computers has become easier, programming parallel computers can still be quite difficult. To see that parallel programming has not moved into the mainstream, just take note of the way in which existing small-scale SMPs are being used. This usage pattern is hard to document quantitatively, but as an example, Open Computing estimates that, of all the high-end multiprocessor PCs sold, most are used as file and print servers [EL]. It seems that most small-scale SMPs are destined to spend their lives as throughput machines, never to run a single parallel job. We should not be too negative, however, since clearly some progress has been made on the software front. Unfortunately, much of this progress has been in programming languages that are suited to static programs. A static program is one whose control behavior is relatively data-independent, so the computation can be fairly well mapped out before the computation begins. Typical of static programs are the large scientific and numeric codes that were the raison d'être of supercomputers and of early, expensive parallel machines. Naturally, much of the early research in parallel programming was directed towards programming these applications. Even today, the suites commonly used to benchmark parallel machines (e.g., Perfect [BCK], NAS [BBB], and Linpack [DMBS]) are representative of such programs. Less progress has been made for easing the task of writing parallel programs for dynamic applications. These programs are ones where the execution of the program is heavily influenced by the data input to the program and by the data computed by the program. These applications include simulators, graphic packages, and optimization packages. There has been less work on building systems suited for dynamic applications such as these, yet these are exactly the applications that new users of cheap parallel machines will want to run. The goal of the work presented in this thesis is to partially address this inadequacy by building a system that allows a programmer to easily and efficiently implement certain types of dynamic parallel applications.

To reach this goal, the Cilk team at MIT's Laboratory for Computer Science has designed Cilk (pronounced "silk"), a C-based runtime system for multithreaded parallel programming. Using Cilk, programmers are able to express their applications either by writing multithreaded code in a continuation-passing style, or by writing code using normal call/return semantics and specifying which calls can be performed in parallel. In the latter case, a type-checking preprocessor automatically breaks up the code into a multithreaded program. The Cilk runtime system takes complete control of the scheduling, load balancing, and communication needed to execute the multithreaded program, thereby completely insulating the programmer from these details. The programmer can rest assured that his program will be executed efficiently, since the Cilk scheduler provably achieves time, space, and communication bounds all within a constant factor of optimal. The Cilk system reports the "work" and "critical path" of a Cilk computation. Armed with these parameters, the user can understand and accurately predict the performance of a program. For distributed-memory environments, we have implemented a software shared-memory system for Cilk. We have defined a dag-consistent memory model, which is a lock-free consistency model well suited to the needs of a multithreaded program. Because dag consistency is a relaxed consistency model, we were able to implement coherence efficiently in software. Currently, versions of Cilk run on the Thinking Machines CM-5 [Thi], the Intel Paragon [Int], various SMPs, and on networks of workstations [Blu]. The same Cilk program will run on all of these platforms with few, if any, modifications. Applications written in Cilk include protein folding, graphic rendering, backtrack search, and the Socrates chess program, which won second prize in the World Computer Chess Championship.

I could now describe how, having decided upon all the nice features Cilk should have, we went off and designed and built a system that contained those features. It would make a nice story. It just wouldn't be a true story. When we first started this project, we had little idea what the final system would look like. In fact, had we sat down and set as our goal to build a system that looks like Cilk does today, we probably would have decided against pursuing the project, and instead worked on a different project, one which had a better chance of being done in a reasonable time frame.

Instead, the story of Cilk is one of incremental improvement. We had some goals and some ideas on how to reach those goals. Throughout our work on Cilk, we used applications to help drive the development process. Applications were useful both in pointing out bugs and weaknesses in the system, as well as in helping us decide where to focus our energies next. We also tried to keep a firm theoretical footing so that we could truly understand the performance of the system.

We began by building a simple multithreaded runtime system, and then we stepped back and asked ourselves what were the biggest deficiencies with this system and what could be done to improve them. We then chose one of the problems that we thought we could remedy and went off and focused on it, being careful not to reintroduce the problems we had previously solved. Once this deficiency was addressed, we repeated the process and again stepped back, examined the system, and looked for the next target area. This picture of the development of Cilk is somewhat simplistic, since in actuality we were always thinking about the bigger picture, and at times we were working in several directions at once, but in general the paradigm of incremental improvement is a good model for the evolution of Cilk.

In the rest of this chapter, we give an overview of the development of the Cilk system. In Section 1.1, we briefly look at some other parallel programming systems and summarize what desirable characteristics we think a programming system should have. Then, in Section 1.2, we describe the evolution of Cilk. We begin with PCM, a simple multithreaded runtime system based on continuation-passing threads. The PCM system evolved into the Cilk-1 system with the addition of a provably good scheduler, which allowed us to provide performance guarantees. Next, the Cilk-2 system extended Cilk by allowing the user to program using call/return semantics instead of continuation passing. Then, in Cilk-3, we added shared-memory support in order to extend the range of applications we could implement. Lastly, we added high-level support for speculative computations via inlets and aborts.

1.1 Life Before Cilk

In April, a Thinking Machines CM-5 was installed at MIT. We were, of course, eager to begin using this machine in our research. We wanted to use this machine to experiment with parallel algorithms, and we wanted a programming environment in which to do this. Preferably, we wanted an environment where we could focus on our application, not on the low-level protocols necessary to implement the application. At the same time, we wanted our application to run efficiently. In short, we wanted the best of both worlds. Before describing the system we designed, we will first take a look at what other parallel programming paradigms were available.

Data Parallel

One of the most successful parallel programming models is the data-parallel programming paradigm [HS]. This model is useful for taking advantage of the large amounts of data parallelism that is available in many scientific/numeric applications. This parallelism is exploited by performing the same operation on a large amount of data distributed across the processors of the machine. Data-parallel languages such as CM Fortran [Thia], C* [Thi], and *Lisp [Thib], all of which were available on the CM-5, are similar to sequential languages. The main difference is that certain data types are defined to be parallel. Parallel data values consist of a collection of standard (scalar) data values. These languages contain predefined operations on parallel variables that either operate on the parallel variable elementwise (e.g., negating every element) or operate on the parallel value as a whole (e.g., summing all elements of the parallel variable).
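To make the two kinds of operations concrete, here is a small, purely illustrative sketch in plain C (the function and array names are mine, not from any data-parallel language): each loop body is what a single data-parallel primitive would perform, with the runtime system spreading the elements, and any required communication, across the processors.

    #include <stddef.h>

    /* Elementwise operation: negate every element of a "parallel value".
     * In a data-parallel language this is one primitive; the runtime runs
     * the loop on each processor's slice of the data.                      */
    void negate_all(double *a, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            a[i] = -a[i];
    }

    /* Operation on the value as a whole: a reduction that sums all the
     * elements, which requires combining partial results across processors. */
    double sum_all(const double *a, size_t n)
    {
        double total = 0.0;
        for (size_t i = 0; i < n; i++)
            total += a[i];
        return total;
    }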

The data-parallel programming model has two main virtues that have led to its success. The first virtue of this model is that data-parallel codes are fairly easy to write and debug [HS]. Just as in a serial program, the programmer sees a sequential flow of control. The values making up a parallel value are automatically spread across the machine, although typically the programmer does have the option of influencing how data is placed. Any synchronization or communication that is needed to perform an operation on a parallel value is automatically added by the runtime system. The second virtue of this model is that it is easy for a programmer to understand the performance of a program. Given the size of a parallel value to be operated on, the execution time for an operation is fairly predictable. Since the execution of each operation is independent of the others, the execution time for the program as a whole is predictable as well. A careful programmer can therefore write a program and be confident that the program's performance will scale as the machine size grows. Blelloch has taken this a step further in NESL [Ble], where every built-in function has two complexity measures which a programmer can use to derive the asymptotic running time of his program.

Although the data-parallel paradigm is quite popular, it has two significant drawbacks. The first is the limited range of applications for which data parallel is well suited. Applications with data parallelism tend to be static in nature: the control flow of a data-parallel program is mostly data independent, and the program's data layout and load balancing can be done at compile time. Many applications are more dynamic in nature and do not have these characteristics. To run in parallel, these dynamic applications need to exploit control parallelism by performing independent operations at the same time. These applications, which may be as simple as recursively computing Fibonacci numbers or as complex as computer chess, are nearly impossible to express in data-parallel languages. The second drawback of this model is that data-parallel programs tend to be inefficient. Even when a data-parallel program gets a good speedup, if one scales the program down to one processor and compares it to a sequential program, the performance may be disappointing. This phenomenon occurs because the data-parallel paradigm is not always a good model for taking full advantage of the sequential processors that make up most of today's parallel machines. This deficiency is particularly acute for languages such as CM Fortran, where the code generated uses "virtual processors." A virtual-processor mechanism allows the same code to run on a machine of any size, but it adds significant inefficiencies [HKT].

Message Passing

Another common paradigm for writing parallel programs is message passing. Message-passing models present the programmer with one thread of control in each processor, and these processors communicate by sending messages. This model is a good representation of the actual implementation of current parallel machines. Since this model is so close to the hardware, a good programmer is able to write efficient codes, just as a good assembly-language programmer is able to write assembly-language code that is more efficient than code written in a high-level language. The drawback of this model is the same as the drawback of programming in assembly language: writing a large program at such a low level can be overwhelming. The user must answer all the low-level questions himself, namely questions such as how to partition the program's data, when to perform communication, and how to load balance the computation. Not only must the user make all these decisions, but he must then write all the protocols necessary to carry them out. For most nontrivial programs, the user spends more time writing protocols than writing the actual application. I believe we should aspire to a higher level of parallel programming.

There are three strategies for message-passing programming.

The simplest message-passing models are blocking. The sending processor issues a send request and the receiving processor issues a receive request. Whichever processor issues its request first blocks and sits idle until the other processor issues its command. At that point, communication begins. Only after communication completes can the processors continue executing. It can be difficult to program well in this model because inefficiencies occur unless both of the processors involved in a communication issue their requests at the same time. Moreover, this style of programming is prone to deadlock.

To make programming simpler, many systems implement a second type of message passing: asynchronous message passing. In this model, when a processor performs a send, the send executes immediately, regardless of whether or not a corresponding receive has been issued, and the sending processor can continue executing. The system uses buffers, often on both the sending and receiving side, to hold the message until it is requested by the receiver. Asynchronous message passing eases the programmer's job, but adds significant overhead to each communication due to the copying and buffering that the system invisibly performs.

Active Messages [vECGS], the third strategy for message passing, reduces this overhead by providing asynchronous message passing without the automatic buffering. An active message contains a header which points to a handler, which is a piece of user code that specifies what to do with the data in the message. The user can specify many handlers, typically one for each message type. When a message arrives, rather than having a generic, system-defined routine handle the message (which will typically copy the message into a buffer), the system instead executes this user-defined handler to process the arrived message. Active Messages eases the task of writing message-passing codes because it allows a programmer to write programs using low-overhead asynchronous message passing, and because the paradigm of having the message itself "know" how it should be handled turns out to be quite useful in practice.
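The dispatch idea can be sketched in a few lines of C. This is only a serial illustration of the mechanism just described, not the interface defined in [vECGS]: the message itself carries a pointer to a user-written handler, and "delivery" simply invokes that handler on the message instead of buffering it.

    #include <stdio.h>

    /* A message carries a pointer to its handler plus a small payload.
     * On a real machine the struct would travel over the network; here
     * delivery is just a function call, so the sketch runs serially.     */
    typedef struct am_message {
        void (*handler)(struct am_message *m);   /* user-supplied handler */
        int arg0, arg1;                          /* small inline payload  */
    } am_message;

    /* The receiver never buffers: it invokes the handler named in the
     * message on the message itself.                                     */
    static void am_deliver(am_message *m) { m->handler(m); }

    /* Two example handlers, typically one per message type.              */
    static void handle_sum(am_message *m)   { printf("sum = %d\n", m->arg0 + m->arg1); }
    static void handle_print(am_message *m) { printf("value = %d\n", m->arg0); }

    int main(void)
    {
        am_message a = { handle_sum, 3, 4 };
        am_message b = { handle_print, 42, 0 };
        am_deliver(&a);    /* runs handle_sum on "arrival"   */
        am_deliver(&b);    /* runs handle_print on "arrival" */
        return 0;
    }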

Although Active Messages simplifies the task of writing an efficient message-passing program, it is still just a message-passing paradigm, and is therefore too low-level to be a general parallel programming language. We should point out that the authors of the Active Messages paper themselves state that Active Messages was not designed as a new parallel programming paradigm, but rather as a primitive communication mechanism with which other paradigms could be implemented. In fact, we use Active Messages in just this way: the Cilk system is implemented on top of Active Messages.

The Split-C [CDG] parallel programming language is an attempt to merge some of the features of the data-parallel and message-passing paradigms. As in the message-passing paradigm, Split-C exposes to the user one thread of control for each processor. Unlike the message-passing model, however, Split-C provides a range of somewhat higher-level primitives. Split-C has a global address space and provides the programmer with a variety of operations which operate on global data. How global data is distributed among the processors is totally up to the user. Naturally, Split-C can use the global-memory primitives to implement most message-passing algorithms.

Split-C can also express many of the applications that can be expressed in data-parallel languages. As in data-parallel languages, Split-C contains operations which act on an entire global data structure as a whole (e.g., summation). These system-supplied functions are not really necessary, however, since these operations could easily be built by the user out of the memory-access primitives. In Split-C, the user can write data-parallel programs by directly specifying what computation each processor should perform on its portion of the data. Using this method, the user can write more efficient programs than can be written in a standard data-parallel language. For example, given a sequence of operations on parallel data, a data-parallel language typically performs each operation on all the data before moving on to the next operation. But it is often more efficient to do a series of operations on one slice of the data before moving on to the next slice and repeating the same operations. Whereas Split-C gives the programmer this control, data-parallel languages in general do not. A Split-C programmer can use such techniques to exploit locality and to write codes that are optimized for the sequential processors that make up the parallel machine.
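The locality argument can be made concrete with a small sketch (illustrative only; this is ordinary C, not Split-C syntax). The first routine applies each operation to the whole array before starting the next, as a data-parallel language typically would; the second applies both operations to one cache-sized slice at a time, which is the kind of restructuring a Split-C programmer can express directly.

    #define N      4096
    #define SLICE  256   /* slice sized to stay resident in cache (illustrative) */

    /* Operation at a time over the whole array: b is streamed through the
     * cache twice, once per operation.                                     */
    void scale_then_square_whole(double *b, double s)
    {
        for (int i = 0; i < N; i++) b[i] *= s;       /* first operation   */
        for (int i = 0; i < N; i++) b[i] *= b[i];    /* second operation  */
    }

    /* Slice at a time: both operations are applied to each slice while it
     * is still in cache, so each element is only brought in once.          */
    void scale_then_square_sliced(double *b, double s)
    {
        for (int base = 0; base < N; base += SLICE)
            for (int i = base; i < base + SLICE; i++) {
                b[i] *= s;
                b[i] *= b[i];
            }
    }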

Split-C provides a programmer with more power and flexibility than typical data-parallel or message-passing languages, but it still has the drawback that it is best suited mainly for static programs. As with message-passing languages, Split-C allows dynamic programs to be written, but this requires the programmer to write at a lower level, and thus the user is back to programming protocols, not the application. In all the models we have seen so far, the user decides, either at compile time or early in the execution of the program, where all his data is going to reside and how the computation is going to be spread amongst the processors. None of these systems deal well with dynamic programs where it is not known in advance how the computation will unfold and what the data will look like. None of these systems are able to take advantage of control parallelism, which a system must be able to do in order to execute dynamic programs. In order to execute such programs, we need a very different system.

Multithreading

In order to execute unstructured programs, we need a system that can take advantage of control parallelism. Data-parallel models present the user with a single thread of control. Models based on message passing increase this to one thread of control per processor. To take full advantage of control parallelism, we must virtualize the number of threads of control so that whenever the program discovers several independent tasks, those tasks can be executed in parallel, each with its own thread of control. When using such a multithreaded programming model, the runtime system must schedule these tasks and dynamically spread them across the machine in order to load balance the computation.

The most ambitious of the multithreaded languages are the implicitly parallel languages such as Id [Nik]. In these languages, the programmer expresses his algorithm at a high level without any mention of parallelism. Then a sophisticated compiler automatically breaks the program up into a fine-grained multithreaded program. In this model, every memory reference and every interprocedural communication is a potential nonlocal, long-latency operation, which leads to small thread lengths and frequent communication. Executing efficiently under these conditions requires a platform with cheap thread creation and scheduling, as well as a high-bandwidth, low-overhead communication infrastructure. There are several machines which have been designed with these characteristics in mind, such as HEP [Smi], Tera [AAC], and dataflow machines such as Monsoon [PC] and the EM-4 [SKY]. Most existing machines do not have these characteristics, however. As analysis techniques improve, compilers are becoming better able to exploit locality in these programs and to increase the thread lengths. These improvements may eventually allow these programs to run on traditional machines efficiently, but at present, implicitly parallel programs running on traditional machines incur significant overheads.

More common are the explicit multithreaded languages. In these systems, the user must explicitly specify what can be done in parallel. There are a wide range of such multithreaded systems [CRRH, CGH, CAL, CD, CSS, EAL, FLA, Hal, HWW, Kal, KC, KHM, Nik, TBK]. These systems provide the programmer with a means to create, synchronize, and schedule threads. In order to reduce the overhead of the program, thread creation, synchronization, and scheduling are typically done by user-level runtime system code, without the involvement of the native operating system. Since the user can cheaply and dynamically spawn off tasks as they arise, and then let the runtime system take care of all the details of executing these tasks, these systems make it easy for the user to take full advantage of the control parallelism inherent in many programs.

These systems differ in how the user specifies threads, in what support is provided for shared objects, and in how the scheduling and load balancing of the computation takes place. One thing these multithreaded systems all have in common is that none of them provide performance guarantees for execution time, space, or communication, as some of the data-parallel languages do.

As mentioned earlier, the main goal of this work is to make it easier for a programmer to efficiently implement dynamic parallel algorithms. In this section, we have described desirable features of other systems. Let us now summarize what characteristics we would like our system to have:

- Minimize the gap between applications and languages. A programmer should focus on his application, not on protocols. Details that do not have to do with the application should be hidden from the programmer.

- Provide predictable performance. There should be no surprises. A programmer should have a good idea of how his application will perform, and how it will scale, before he even executes it.

- Execute efficiently. The system should not add too much overhead to the execution of a user's program. The performance of the parallel code when run on one processor should be comparable to the best serial code when run on the same processor.

- Scale well. When possible, increasing the number of processors used by a program should improve the performance proportionally.

- Be portable. Our system should be portable to a variety of machines, from serial machines, to small-scale SMPs, to networks of workstations, to large-scale message-passing machines.

- Leverage existing codes. We would like to convert a serial program to a parallel program with the least effort possible. We should therefore be able to include standard serial code in our parallel program, so that only the parts to be parallelized need to be rewritten.

- Be expressive. We should be able to implement a wide variety of applications in our system.

1.2 The Evolution of Cilk

This thesis describes the evolution of the Cilk system, which is outlined in Table 1.1. This table shows the various versions of Cilk and the key features of each version. In this thesis we focus on the key ideas of each version of Cilk; we do not attempt to describe all the details needed to write an application in Cilk. Those interested in using the system are referred to the Cilk Reference Manual [BFJ].

    System    Novel Features
    PCM       Basic multithreaded system
    Cilk-1    Provably good scheduler
    Cilk-2    Call/return semantics
    Cilk-3    Shared memory
    Cilk-4    Inlets, aborts

    Table 1.1: Evolution of the Cilk System

This work began with a basic multithreaded programming system called the Parallel Continuation Machine, or PCM for short. The intent of this system was to provide a simple system with which a user could write efficient multithreaded programs. This system provides the user with the basic primitives for creating threads and specifying how the threads communicate and synchronize. The PCM system hides from the user the details of scheduling and executing these threads, thus simplifying the task of writing explicit multithreaded applications. The user writes his multithreaded application as a set of threads wired together in continuation-passing style. Each thread is a nonblocking piece of code that may contain calls to standard C functions. After performing some computation, a thread sends its result to another thread, potentially enabling that thread to begin work on the rest of the computation.

A simple preprocessor takes the user's code, which consists of definitions of threads and C functions, and converts it into standard C code with calls to PCM runtime primitives. The system represents a thread using a data structure called a closure, which contains a description of a thread and all its arguments. A closure is a self-contained unit containing all the information needed to execute an instance of a user's thread, and therefore the computation described by a closure is free to be executed on any processor. The runtime system uses a randomized work-stealing scheduler [BS, Hal, BL] to schedule and load balance the computation. A processor typically works locally, mimicking the serial execution order. When a processor runs out of work, it chooses a processor at random and steals a ready closure from the chosen processor. This work-stealing scheduling strategy tends to provide good load balancing without requiring excessive communication [KZ, RSAU, ZO]. We wrote several applications in PCM, including a ray tracer based on the serial POV-Ray program [POV] and a protein-folding code [PJGT], which is still being used to investigate various models of protein formation. The PCM system performed well on these applications, achieving nearly linear speedup without adding significant overhead. This initial system showed us that we could easily build a powerful multithreaded system. The PCM system is described in Chapter 2. Although little actual code from the PCM system remains in the current Cilk system, many of the concepts used in PCM have persisted throughout the various Cilk systems.
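The scheduling policy just described can be sketched as follows. This is a serial simulation written purely for illustration (integers stand in for ready closures, and a round-robin loop stands in for real parallelism); it is not the PCM scheduler itself, but it shows the policy: work from the local end of your own queue, and when it runs dry, steal the oldest work from a randomly chosen victim.

    #include <stdio.h>
    #include <stdlib.h>

    #define NPROC 4
    #define QSIZE 64

    /* Each "processor" owns a deque of work items (stand-ins for closures).
     * Local work is pushed and popped at the tail; thieves take from the head. */
    typedef struct { int item[QSIZE]; int head, tail; } deque;
    static deque q[NPROC];

    static void push_local(int p, int w) { q[p].item[q[p].tail++] = w; }

    static int pop_local(int p, int *out)            /* LIFO at the local end */
    {
        if (q[p].tail == q[p].head) return 0;
        *out = q[p].item[--q[p].tail];
        return 1;
    }

    static int steal(int thief, int *out)            /* FIFO at the steal end */
    {
        int victim = rand() % NPROC;
        if (victim == thief || q[victim].tail == q[victim].head) return 0;
        *out = q[victim].item[q[victim].head++];
        return 1;
    }

    int main(void)
    {
        for (int i = 0; i < 20; i++) push_local(0, i);   /* all work starts on proc 0 */

        int done = 0;
        while (done < 20) {
            for (int p = 0; p < NPROC; p++) {            /* round-robin "parallelism" */
                int w;
                if (pop_local(p, &w) || steal(p, &w)) {
                    printf("proc %d runs item %d\n", p, w);
                    done++;
                }
            }
        }
        return 0;
    }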

Having shown that an efficient multithreaded system was buildable, we then focused on providing a more rigorous foundation for the system. Although the applications we coded in the original PCM system achieved nearly linear speedups, it was apparent that not all programs could be executed with such good results. Our desire was to have a system which achieved good performance over as wide a range of programs as possible. We also wanted to have a system which made it clear what properties a program must have in order to achieve good performance. The Cilk-1 system meets these goals.

The Cilk-1 system is an enhanced version of PCM which gives the user predictable and provably good performance. The theoretical work of Blumofe and Leiserson [BL] presented a work-stealing scheduling algorithm which, for a class of well-structured programs, is provably efficient. By adding some structure to our programs and making a change to our scheduler, we were able to extend the proofs in [Blu] to cover our scheduler as well. With these changes, Cilk's work-stealing scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The two parameters which predict how well a program will perform are the "work" and "critical path" of the program. The work of a program is the time it would take one processor to execute the program, and the critical path is the time it would take an infinite number of processors to execute the program. We extended our system to measure the work and critical path when a user runs a program. With these two measures, a user is able to understand why the program performed as it did, and the user is also able to predict the performance of the program on machines of different sizes. We describe the Cilk-1 system in Chapter 3.
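To make the prediction concrete: writing T_1 for the work and T_\infty for the critical path, the running time on P processors can be modeled, up to constants that the system measures empirically (the form below is only a sketch of the standard work/critical-path bound), as

    T_P \;\approx\; \frac{T_1}{P} + c_\infty \cdot T_\infty ,

so speedup is nearly linear as long as the average available parallelism T_1 / T_\infty is much larger than the number of processors P, and the critical-path term dominates once P grows beyond that ratio.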

It was while designing the Cilk-1 system that we wrote our largest application, the Socrates chess program. Up to this point, we had no application that made full use of the complicated control structure allowed by our system. In order to showcase the power of our system, and to point out any improvements that the system needed, we wanted to build a challenging, dynamic application that could not easily be implemented in other parallel programming paradigms. Computer chess is such an application. Our chess program uses large global data structures, is nondeterministic, and performs speculative computations, some of which are aborted. This work was in part a natural follow-on of StarTech [Kus], a parallel chess program designed by Bradley Kuszmaul, which had the scheduler and search algorithm intertwined. We wanted to show that the scheduler and search algorithm could be separated, thereby greatly simplifying the programmer's job, without sacrificing performance. Our work on Socrates led to several enhancements to the runtime system, which were included in the Cilk-1 system. Despite the use of low-level Cilk features that void Cilk's performance guarantees, Socrates still achieves efficient, predictable performance over a range of machines. Socrates remains our flagship program and continues to drive the development of Cilk. Chapter 4 describes the Socrates program and how it influenced the Cilk system.

Although the continuation-passing style required by Cilk-1 allowed a wide range of programs to be expressed, writing programs in this style was quite error prone and tedious. In Cilk-2, our next major release of the system, we focused on making the system easier to program. As a stepping stone towards this goal, we first introduced a type-checking preprocessor [Mil]. Previously, Cilk programs were converted to C via a standard, but simple, macro preprocessor. This preprocessor limited the constructs we could use in the language, occasionally forcing us to expose to the programmer details we would rather have kept hidden. Introducing the type-checking preprocessor allowed us to hide some of these details, thus cleaning up the language. More importantly, the new preprocessor can deduce semantic information about the source Cilk program, thereby allowing us to perform transformations we could not consider previously. One alternative to the type-checking preprocessor would have been to build a full-fledged compiler. A compiler would have allowed us to do everything the preprocessor could do, and more. Building a compiler is a significant undertaking, however. This option would have required resources and time that we could not afford, and it would have made the system less portable as well.

The power of our type-checking preprocessor allowed the programmer to write some codes using traditional call/return semantics, thus making Cilk programs substantially easier to write. With this change, users can write a parallel Cilk program without dealing with threads or continuation passing. A program previously written as many Cilk threads, tediously wired together by the programmer, could now be expressed as a single Cilk procedure. A single Cilk procedure can spawn off child procedures, suspend until all the children complete, and then continue executing, perhaps spawning off children again. The new preprocessor automatically breaks these Cilk procedures into several threads which can then be executed using the basic Cilk runtime system. This Cilk-2 style of programming is somewhat more restrictive than that allowed by other multithreaded languages, but we find it simple for the programmer and sufficient for almost all algorithms we have tried to write. This release, which was a major step towards making Cilk suitable for widespread use, is described in Chapter 5.
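To give the flavor of this call/return style, here is the doubly recursive Fibonacci computation written as a single Cilk procedure. The keywords follow the syntax of later Cilk dialects; the exact surface syntax of this release, as defined in the chapter, may differ slightly.

    /* Sketch of a call/return-style Cilk procedure.  The preprocessor breaks
     * this single procedure into the underlying continuation-passing threads. */
    cilk int fib(int n)
    {
        if (n < 2)
            return n;
        else {
            int x, y;
            x = spawn fib(n - 1);   /* children may execute in parallel          */
            y = spawn fib(n - 2);
            sync;                   /* suspend until both children have returned */
            return x + y;
        }
    }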

The improvements made for the Cilk-2 system made Cilk programs easier to write, but they did not increase the range of programs which could be written. One of the biggest drawbacks of the Cilk systems described so far is that it is difficult to write applications where significant amounts of data need to be shared throughout the computation. With the Cilk-2 release, for those applications for which Cilk was well suited, it was now fairly easy to write a Cilk program and get good, predictable performance. The applications for which Cilk was well suited, however, were somewhat limited, mainly consisting of applications which could be expressed using a tree-like algorithm where the nodes of the tree were fairly independent. The different parts of the computation must be fairly independent of each other because in Cilk the only way to share data throughout the computation is to explicitly pass the data from procedure to procedure. For any reasonably large data set, a programmer would be forced to go outside of Cilk and implement a shared data structure, probably by taking advantage of the low-level features supported by the particular platform on which the code was being developed. To solve this problem and increase the range of programs which could be expressed in Cilk, the aspect we focused on for the next release of Cilk was adding shared-memory support.

The Cilk-3 release includes a shared-memory system implemented totally in software. Rather than attempting to build a shared-memory system that can solve all problems, we focused on building one that would be sufficient for the types of problems that are naturally expressed in a multithreaded programming environment such as Cilk. Instead of using one of the consistency models derived from sequential consistency, we used our own relaxed consistency model. Our model, which we call dag consistency, is a lock-free consistency model which, rather than forcing a total order on global-memory operations, instead ensures only that the constraints of the dag are enforced. Because dag consistency is a relaxed consistency model, we were able to implement coherence in software efficiently for Cilk. With this shared-memory system, we are able to express applications such as matrix multiply and Barnes-Hut, which make use of global data structures. The definition of dag consistency and our implementation of it for Cilk are described in Chapter 6.
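As an illustration of the kind of code this enables, the divide-and-conquer matrix multiplication used in the chapter can be sketched in plain C as below (my own serial rendering, with BASE chosen arbitrarily and n assumed to be a power of two). In the shared-memory Cilk version, the recursive calls are spawned in parallel and the matrices live in dag-consistent shared memory.

    /* C += A * B for n-by-n blocks stored row-major with leading dimension ld.
     * Each matrix is split into four quadrants and the eight quadrant products
     * are computed recursively; a parallel version spawns these calls.        */
    #define BASE 16

    static void matmul_add(double *C, const double *A, const double *B,
                           int n, int ld)
    {
        if (n <= BASE) {                      /* small blocks: plain loops */
            for (int i = 0; i < n; i++)
                for (int k = 0; k < n; k++)
                    for (int j = 0; j < n; j++)
                        C[i*ld + j] += A[i*ld + k] * B[k*ld + j];
            return;
        }
        int h = n / 2;                        /* assume n is a power of two */
        const double *A11 = A, *A12 = A + h, *A21 = A + h*ld, *A22 = A + h*ld + h;
        const double *B11 = B, *B12 = B + h, *B21 = B + h*ld, *B22 = B + h*ld + h;
        double       *C11 = C, *C12 = C + h, *C21 = C + h*ld, *C22 = C + h*ld + h;

        matmul_add(C11, A11, B11, h, ld);  matmul_add(C11, A12, B21, h, ld);
        matmul_add(C12, A11, B12, h, ld);  matmul_add(C12, A12, B22, h, ld);
        matmul_add(C21, A21, B11, h, ld);  matmul_add(C21, A22, B21, h, ld);
        matmul_add(C22, A21, B12, h, ld);  matmul_add(C22, A22, B22, h, ld);
    }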

Cilk-4, the last release of the Cilk system that I will describe in this thesis, is intended to remedy a deficiency in the Cilk-2 language. When we designed Cilk-2 and added support for procedures with call/return semantics, we were able to rewrite almost all existing programs using the new, simpler syntax. The only application which could not be expressed using the Cilk-2 syntax was Socrates, due in large part to the complex control structure of the parallel search algorithm. Specifically, Socrates generates speculative work which may sometimes be killed off. Therefore, unlike all other programs we have written, the amount of work performed by a run of the chess program depends on the order in which the user's threads are executed. The Cilk-2 syntax does not give the user enough control over the execution of his code to write an efficient speculative algorithm, so the chess code is still written with Cilk-1-style syntax.

For the Cilk-4 release, which is currently under development, we have proposed an extension to Cilk that should allow the chess program to be written without resorting to any of the lower-level Cilk-1 constructs. The extension allows the programmer to specify a piece of code, called an inlet, when spawning a child. This inlet code is run immediately after the child finishes. The search routine in Socrates will use inlets to receive the result of one search and, depending on the result, the search routine may spawn off a more precise search, may update the parameters for other searches, or perhaps may abort a group of searches. The proposed extensions also support an abort primitive, which will allow a procedure to abort all of its spawned children. Currently, Socrates must implement the abort mechanism as user-level code, which is quite tedious; this new feature will allow this code to be removed from the user program. An overview of the proposed changes is given in Chapter 7.
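The flavor of the proposal can be sketched as follows. This is schematic only: the inlet and abort keywords are shown in the style of later Cilk dialects rather than the exact syntax defined in the chapter, and position, num_moves, and child_position are hypothetical helpers. The inlet receives each child's score as soon as that child returns, and once the score is good enough it aborts the siblings that are still running.

    /* Schematic speculative search with an inlet and an abort.                */
    cilk int search(position *p, int depth, int cutoff)
    {
        int best = -1000000;                 /* effectively minus infinity      */

        inlet void catch_result(int score)
        {
            if (score > best)
                best = score;
            if (best >= cutoff)
                abort;                       /* kill remaining speculative children */
        }

        for (int m = 0; m < num_moves(p); m++)
            catch_result(spawn search(child_position(p, m), depth - 1, cutoff));
        sync;

        return best;
    }

A real game-tree search would, of course, negate scores and bounds between levels; the point here is only the control structure: inlets run as children complete, and abort discards work that is no longer needed.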

The design of the Cilk system is an ongoing project. The final chapter of this thesis describes some of the improvements we would still like to make to the system and gives some concluding remarks.

History of Cilk

We conclude this section with a description of how work on Cilk has progressed over the last two and a half years. The preceding paragraphs gave an overview of the technical evolution of Cilk. The following gives a more historical view of how the Cilk project came about.

The original PCM system grew out of work begun by Michael Halbherr. Michael was doing research on parallel I/O, and in order to perform some experiments, he implemented a simple system for executing parallel programs. This parallel runtime system became interesting in its own right, and before long Yuli Zhou and I began working on this system as well. All efforts on parallel I/O were soon forgotten as we focused solely on the parallel runtime system. This multithreaded runtime system eventually became the PCM system.

While we were improving the PCM system and writing applications that used it, we began interacting with Professor Charles Leiserson and his students, who were working on several related projects. The first of these related projects was theoretical work by Robert Blumofe and Charles Leiserson on scheduling multithreaded applications. The second was StarTech, a parallel chess program built by Bradley C. Kuszmaul, another of Leiserson's students.

In April, Charles suggested that we join forces, beginning by implementing a parallel chess program in PCM. Unlike StarTech, which intertwined the search code and the scheduler, this new program would build the chess search completely on top of the general-purpose PCM runtime system. In May we obtained a serial chess program from Heuristic Software and began porting it to our system. During June, Don Dailey, then of Heuristic Software, joined us at MIT to work on the program's chess knowledge, and at the end of June we entered the program in the ACM International Computer Chess Championship, where, running on a CM-5, we finished third.

After the chess tournament, Keith Randall joined the Cilk team, and we made the changes necessary to incorporate some of the theoretical results into the system. The resulting provably good system was renamed Cilk. Then, in the fall, we recruited several undergraduates (Greg Hudson, Rob Miller, Richard P. Tauriello, Daricha Techopitayakul, and John Yu) to program in Cilk. This experience helped us learn more about programming in Cilk, and two of these students, Rob and Richard, eventually wound up contributing to the Cilk system itself. Late that year, we focused on making Cilk easier to use and began designing the Cilk-2 system. Also during this period, Robert Blumofe, with the help of two undergraduates, Phil Lisiecki and Howard Lu, began working on the fault-tolerant version of Cilk for networks of workstations.

The following year, Cilk progressed on many fronts. In January, Matteo Frigo, who had recently joined the Cilk team, completed reworking much of the Cilk code to make it more portable. Rolf Riesen of Sandia National Laboratories later ported this reworked version of Cilk to the Intel Paragon, and we ported the chess program to the Paragon as well. Don Dailey again joined us to work on the chess aspects of Socrates, and in May we ran on an Intel Paragon in the World Computer Chess Championship, finishing second.

In June, the Cilk implementation, which had been fairly stable for several months, was officially released. Early in the year, we had begun working on shared-memory support for Cilk. The implementation of shared memory was fairly stable by September, and this implementation is the one that we describe in this thesis. Lastly, during the second half of the year, we worked out the design of inlets and aborts for the Cilk-4 system. This system is currently being implemented.

Chapter 2

The PCM System

This chapter describes the PCM system. PCM, the precursor to Cilk, is a simple multithreaded runtime system based on continuation-passing threads. This system was our initial attempt to produce a system with which a user could write efficient multithreaded programs. The PCM system grew out of a system initially designed by Michael Halbherr for research on parallel I/O. Michael Halbherr, Yuli Zhou, and I implemented the PCM runtime system. Yuli Zhou implemented the preprocessor used by the PCM system. The protein-folding code described in Section 2.4 was written by myself, based on discussions with Vijay Pande of the Center for Materials Science and Engineering at MIT.

2.1 Introduction

This chapter presents the Parallel Continuation Machine (PCM), a parallel runtime system designed to efficiently execute dynamic multithreaded programs on today's message-passing architectures. We will first concentrate on explaining the key ideas underlying the implementation, and then demonstrate how they give rise to extremely efficient parallel programs via two real-world examples. (Much of the work described in this section was reported on by Michael Halbherr, Yuli Zhou, and myself in an earlier paper [HZJ].)

Parallel programs can be classified along several dimensions, such as grain size, communication regularity, and whether the execution depends on runtime data. We believe that existing programming models, such as data-parallel programming and explicit message passing, have been successful in addressing the needs of programs with simple, static communication patterns. For these programs, it is usually possible to carefully orchestrate communication and computation to statically optimize the overall performance.

On the other hand, it proves far more difficult to find static solutions leading to high machine utilizations for parallel applications whose communication patterns are either highly irregular or dependent on dynamic information. In this work, we are mostly interested in investigating the needs and characteristics of these classes of programs, which must rely on runtime mechanisms to enable efficient solutions.

Multithreaded computation models have typically been proposed as a general solution to exploit dynamic, unstructured parallelism. In such a model, dynamically created instances of sequential threads of execution cooperate in solving the problem at hand. To efficiently execute such an application, it is necessary to have efficient runtime thread placement and scheduling techniques. Although finding the optimal thread placement is known to be an NP-hard problem [GJ], it is possible to implement schedulers based on simple heuristics that achieve good machine utilizations at reasonable cost. These heuristics usually work well for a broad class of applications, making it possible to implement the scheduling and placement task as a fairly generic service that resides at the core of the runtime system.

Several research machines, such as HEP [Smi], the Monsoon dataflow system [PC], and the forthcoming Tera machine [AAC], have been designed expressly to support multithreaded computations. These machines provide highly integrated, low-overhead message interfaces, as well as hardware support for scheduling and synchronization. Disregarding the debate of whether such machines are commercially or technically viable, the problem of programming most of the current parallel machines, which have no special hardware support for multithreading, still remains. The programming challenge, in view of the above difficulties, is to minimize network communication and to provide longer sequential threads to offset the runtime scheduling and synchronization overhead.

The static set of sequential threads making up the multithreaded program can either be generated implicitly by a sophisticated compiler or explicitly by the programmer. Programming languages advocating the implicit style, such as Id [Nik] and Sisal [MSA], usually take a high-level functional description of the actual problem, extract the available parallelism from the declaration, and partition it into sequential threads. While implicit programming languages simplify the programming task for some problems, other problems are difficult to efficiently express in a functional style. In addition, some of these systems fail to produce threads long enough to adequately amortize the overhead introduced by a dynamic execution model.

To help reduce network communication, the execution of threads must exhibit communication locality. This requirement precludes scheduling policies such as round-robin or random placement, but favors solutions such as work stealing, where all threads are created locally per default, but may later migrate to other nodes upon demand.

The PCM model presented in this chapter is aimed at solving the aforementioned problems. The intended target architectures are simple message-passing machines which support the implementation of low-overhead communication layers such as Active Messages [vECGS]. We do not assume any additional hardware support.

We have provided C language extensions where threads can be specified along with conventional sequential C code. A program consists simply of a collection of threads, which are pieces of sequential code which are guaranteed to terminate once they have been scheduled. Threads represent the basic scheduling units that can be executed on any processor. By exposing threads in this way, we can experiment with various static and dynamic scheduling policies to optimize the overall machine utilization. In particular, we have found two strategies which can have a great effect on the performance of a program:

- The scheduler uses work stealing as its default policy. This strategy creates patterns of locality in which threads can pass arguments through local memory most of the time, rather than across the network, thereby greatly reducing the communication frequency.

- A thread can be made to directly transfer control to another thread; this mechanism is similar to the control-transfer mechanism used to implement tail recursion in serial codes. This mechanism bypasses dynamic scheduling entirely, thus avoiding all of its associated costs. (A small sketch of this direct-transfer idea follows this list.)

We show the effect of these optimizations in Figure 2.1. As this diagram suggests, there are three communication levels, namely register communication, memory communication, and network communication. Preferably, we would like to transfer data through registers as much as possible, but the very nature of a dynamic execution model will force us to resort to memory communication, or, even worse, to network communication. Note that an application increases the working set whenever it exposes additional parallelism, making it impossible to keep the entire working set in registers. The implementation goal will be to provide an optimal compromise between increasing the sequentiality of the application, to increase its locality, and exposing enough parallelism to enable dynamic load balancing. The work-stealing scheduler will then attempt to minimize the work-migration frequency, thereby minimizing network communication.

[Figure 2.1: Maximizing Communication Locality — register, memory, and network communication among four processors.]

The rest of this chapter is structured as follows. Section introduces the PCM thread specification language and presents its components. Section introduces a cost model intended to clarify the costs involved in dynamic execution. Section contains in-depth explanations of two applications implemented with the PCM package. Section concludes the chapter.

The Parallel Continuation Machine

The parallel continuation machine implements an SPMD programming model, where all processors keep their own local copies of the entire code. The execution itself is completely asynchronous, meaning that each node may execute entirely different pieces of the program.

Figure: Elements of the PCM model. The runtime structures comprise the ready queue, which contains pointers to all full closures; closure memory, which holds closures with their thread code pointers, join counters, argument slots, and continuations; and program memory, which holds the thread code.

Elements of the PCM

A PCM program consists of a collection of threads that cooperatively solve a single problem. Statically, a thread identifies nothing more than a sequence of instructions written in the machine language of the processor. At runtime, an application can create arbitrary numbers of dynamic instances of a static thread, each with its own set of arguments.

The PCM thread specification language, which is explained in Section, allows the programmer to define threads and to specify how threads communicate with each other. All machine-specific execution details, such as dynamic load balancing or the mechanism required to enable transparent inter-thread communication, are part of the PCM runtime system and do not have to be specified by the user. By taking care of these details, the system simplifies the job of writing explicit multithreaded applications without sacrificing the programmer's power for experimentation.

The key elements of PCM's execution environment are illustrated in Figure. New dynamic instances of threads can be created by making closures. Closures form the contract between the application code and the runtime system. Closures contain a pointer to the thread code and all the arguments needed to execute the code. The thread code may include calls to arbitrary C procedures. A thread is ready to execute when its closure is full, in other words, when the closure has all its arguments. Full closures are passed to the scheduler, which keeps them in a ready queue and will later schedule them for the encapsulated threads to run. All threads in PCM are nonblocking, which means that once a thread begins execution, it can run to completion without suspending.

In order to enable points of synchronization between threads, a closure can be created with some of the arguments missing. These arguments will be filled in later with a value sent by another thread. A thread supplying an argument to a non-ready closure must obtain a reference to where the argument is to be sent. Such references will be called continuations; a continuation is just a pointer to a closure plus an integer offset into the closure. Note that we are slightly abusing the term continuation, which in sequential computations is used to refer to the rest of the computation as seen from a particular program control point. In parallel programs there is usually no such thing as the rest of the computation from a single control point, but the idea is the same.

In order to detect when a closure becomes full, every closure has an additional slot containing a join counter that indicates the number of missing arguments the closure has. The join counter is initialized with an integer equal to the number of missing arguments at closure creation time and is decremented each time the closure receives an argument. A closure is given to the scheduler when the join counter reaches zero.
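To make the preceding description concrete, here is one plausible C rendering of the closure and continuation structures and of the join-counter protocol. All names and the fixed slot count are assumptions made for exposition, not the actual PCM definitions.

/* Illustrative sketch only: one way to represent PCM closures and
 * continuations in C.  Field names and the fixed slot count are
 * assumptions, not the actual PCM data structures.                     */

typedef struct closure {
    void (*thread)(struct closure *); /* code pointer for the thread       */
    int join_count;                   /* number of still-missing arguments */
    int args[8];                      /* argument slots (fixed size here)  */
} closure_t;

typedef struct {
    closure_t *clos;                  /* target closure                    */
    int offset;                       /* which argument slot to fill       */
} cont_t;

/* Filling a missing argument: store the value, decrement the join
 * counter, and hand the closure to the scheduler once it becomes full. */
static void send_argument_sketch(cont_t c, int value)
{
    c.clos->args[c.offset] = value;
    if (--c.clos->join_count == 0)
        /* post(c.clos) would go here: the closure is now ready */;
}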

The Thread Specification Language

The thread specification language, called Threaded-C, was implemented for the PCM system as an extension to C. In this language a program consists of C functions, written just as in standard C, and threads, which are marked by the specifier thread. A preprocessor expands the threads into C functions while copying the rest of the C code literally. The resulting C program can then be compiled and linked with the PCM runtime system.

Runtime primitives are called by threads to create closures, send arguments, and transfer closures to the scheduler:

make_closure (Thread, arg1, ..., argn) allocates a closure with n argument slots plus two additional slots and returns a pointer to the closure. The two additional slots are reserved for the code pointer, which is initialized to Thread, and the join counter. Closures can be created without specifying all the arguments, in which case each missing argument is indicated by a placeholder. The join counter is implicitly initialized to the number of missing arguments.

post (k) hands the closure k over to the scheduler. Only a full closure can be posted inside the thread that created it; closures created with empty slots will be posted later by a send_argument when the join counter reaches zero.

send_argument (c, v) sends the value v to continuation c and decrements the join counter of the target closure. The closure is posted if the join counter becomes zero. Continuations have the type Cont and contain two fields: a pointer to a closure and an integer offset within the closure. A new continuation can be constructed, for example, using the expression cont{k, sum:x}, where k is a closure pointer and sum:x is a symbolic reference to the offset for argument x of a sum thread.

An example PCM program to compute the Fibonacci function is shown in Figure. A sequential fib function makes two recursive calls to fib and then sums the results together. Our parallel version first creates a sum closure, which will receive two results and add them together. It then creates and posts two full fib_main closures for the two recursive calls of fib. It gives each of these closures a continuation pointing to a slot in the sum closure where the result should be sent.

When the PCM preprocessor is run on the fib code, the preprocessor expands each of these threads into a C function that takes a single argument, namely a closure structure. The preprocessor inserts code into the function to fetch all of the user's arguments from the closure before starting execution of the code specified by the user.
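As an illustration, the general-entry C function that the preprocessor might emit for the sum thread could look roughly like the following. The closure type and accessor helpers are assumptions made for exposition, not the actual generated code; send_argument and Cont are the PCM primitives described above.

/* Sketch of the general-entry function the preprocessor might emit for
 * the sum thread; pcm_closure and the accessors are assumed names.     */

typedef struct pcm_closure pcm_closure;              /* opaque closure handle */
extern Cont get_cont_arg(pcm_closure *c, int slot);  /* assumed accessors     */
extern int  get_int_arg (pcm_closure *c, int slot);
extern void free_closure(pcm_closure *c);

void sum_general(pcm_closure *c)
{
    /* Fetch the user's arguments out of the closure's argument slots. */
    Cont parent = get_cont_arg(c, 0);
    int  x      = get_int_arg (c, 1);
    int  y      = get_int_arg (c, 2);
    free_closure(c);

    /* Body as written by the user. */
    send_argument(parent, x + y);
}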

Executing a PCM Program

Figure illustrates the sequence of events when the Fibonacci program runs on a single processor. When there are multiple processors, the only difference is that full closures may be migrated to ensure a balanced load across all available processors. These scheduling issues will be discussed in more detail in Section.

The execution of a PCM program can be divided into three phases: initialization, computation, and termination. The first and third phases are usually very short, with the computation phase constituting the bulk of the overall execution.

During the initialization phase, shown in frame A of Figure, the program creates two closures. One closure specifies the start of the computation, in this example a ready instance of fib_main, and the other closure specifies the actions to be taken when the computation ends, in this example a non-ready instance of a special thread called top.

thread sum (Cont parent, int x, int y) {
  send_argument(parent, x+y);
}

thread fib_main (Cont parent, int n) {
  if (n < 2) send_argument(parent, n);
  else {
    closure k, s1, s2;
    k  = make_closure(sum, parent, _, _);
    s1 = make_closure(fib_main, cont{k, sum:x}, n-1);
    s2 = make_closure(fib_main, cont{k, sum:y}, n-2);
    post(s1);
    post(s2);
  }
}

Figure: A PCM program to compute Fibonacci.

This top thread is supplied by the runtime system, and its responsibilities are to terminate the computation and to print the results. The top closure can be instructed to expect any number of results and must be the last closure to execute.

During the computation phase, the scheduler enters a perpetual loop. It pops a full closure from the ready queue and then calls the thread function (such as fib_main or sum) specified in the closure, with the closure pointer as its only argument. Frames B through H of Figure show snapshots of the machine state, one after each closure has executed. For example, in frame B the thread fib_main with argument 3 has just terminated. It created three closures: two full ones for fib_main, with arguments 2 and 1 respectively, and a closure for sum waiting for two arguments. The full closures were immediately posted. These full closures contain continuations which point to the slots in the sum closure where they will send their results. Similarly, the sum closure contains a continuation pointing to the top closure, which is where its result should be sent.

During the termination phase, shown in frame I, the top thread is run. It will print the result of the computation and then cause the scheduler to exit the loop, thereby terminating the computation. For multiprocessor computations, after the top thread executes on a processor, that processor signals the schedulers on other processors to exit their work loops.

Figure: Snapshots of the state of the PCM system after each thread completion.

Tail Calls

As a performance optimization, a thread may directly call other threads via the following runtime primitive:

tail_call (Thread, arg1, ..., argn) executes the thread Thread without the overhead of creating a closure and calling the scheduler. This primitive must be invoked as the last action of a thread and must be called with no missing arguments.

A tail_call represents a more efficient invocation of a thread, avoiding the dynamic execution overheads incurred otherwise. In the example shown in Figure, the thread fib_main creates two full closures, s1 and s2, packs the arguments into the closures, and then posts both closures before releasing control and returning to the scheduler. After receiving control, the immediate action of the scheduler will be to pop the s2 closure and to call its thread function fib_main, which will unpack the s2 closure prior to doing actual work. We can avoid this costly detour through the scheduler by rewriting the else clause of fib_main to use a tail_call, as follows:

{ closure k, s;
  k = make_closure(sum, parent, _, _);
  s = make_closure(fib_main, cont{k, sum:x}, n-1);
  post(s);
  tail_call(fib_main, cont{k, sum:y}, n-2);
}

To implement this mechanism, the thread preprocessor actually expands a thread into two C functions: a general entry version, which is what we described above, and a fast entry version, which receives all arguments directly. A tail_call is thus converted into a standard C function call to the fast entry version. The actual performance improvements obtained with the tail_call mechanism can be quite impressive, especially for fine-grained applications such as the Fibonacci example, where the performance improved by almost twenty-five percent. The effects of using the tail_call mechanism show up in Figure as the register communication between threads.

(The name of the tail_call primitive was later changed to just call. The primitive tail_call was then restricted to the case of recursive calls, in which case the preprocessor is able to implement the recursive call simply by inserting a jump to the beginning of the function.)
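A minimal sketch of how the general-entry/fast-entry split can be used at a call site follows; the function names are assumptions for exposition, not the identifiers the preprocessor actually generates.

/* Illustrative sketch of the two entry points generated for fib_main.  */

void fib_main_fast(Cont parent, int n);   /* fast entry: args passed directly */
void fib_main_general(void *closure);     /* general entry: unpacks a closure */

void example_caller(Cont k2, int n)
{
    /* A source-level  tail_call(fib_main, k2, n-2)  is translated into a
     * plain C call of the fast-entry version, so no closure is allocated
     * and the scheduler is never entered.                               */
    fib_main_fast(k2, n - 2);
}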

Passing Vectors in Closures

An additional mechanism provided by the thread language allows vectors to be passed in closures. One of these vectors may even be of arbitrary length. These vectors are passed by value and can be referenced within a thread like any other local variable. The one vector argument which is allowed to be of arbitrary length must be specified as the last argument, to make sure it is packed into the tail of the closure. The declaration

thread foo (type1 vect1[], type2 vect2[])

declares vector arguments vect1 and vect2. It is the responsibility of the creator of the closure to initialize the vectors. For example, the expression

make_closure (foo, vect1, vect2, size)

creates a closure for foo, dynamically defining vect2 to consist of size entries. In addition, vect1 and vect2 will be declared to be pointers initialized to the zeroth word of the corresponding vector arguments. These pointers must be used subsequently to initialize the vectors, which would otherwise be left empty. If we use a placeholder instead of a vector name when allocating a closure, then the vector is intentionally left empty and its size will be added to the initial join count.
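The reason the arbitrary-length vector must come last is easiest to see from the closure layout: the fixed-size slots can be addressed at constant offsets, and the one variable-length vector can then occupy the remaining tail of the allocation. The C structure below is purely an illustration of this layout, not the actual PCM closure definition.

/* Illustrative layout: why the arbitrary-length vector sits at the tail. */
typedef struct {
    void (*thread)(void *);   /* code pointer                            */
    int  join_count;          /* number of missing arguments             */
    int  fixed_args[4];       /* ordinary argument slots (count assumed) */
    int  tail_vector[];       /* flexible array member: the one          */
                              /* arbitrary-length vector argument        */
} vclosure_sketch;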

Scheduling PCM Threads on a Multiprocessor

This section introduces a cost model for PCM in order to motivate the work-stealing scheduling system that we implemented for PCM on the CM-5. The CM-5 is a parallel computer consisting of SPARC processors wired together by a fat-tree interconnection network [LAD]. All communication mechanisms required to implement the work stealer and the inter-thread communication have been built using a version of Active Messages [BB].

As Figure illustrates, we equate useful computation with what a sequential program would have to do, and classify everything else, such as communication, synchronization, and dynamic scheduling, as additional overhead. The goal of this classification is to study the factors that determine the efficiency of a parallel computation with respect to its sequential counterpart.

Figure: Anatomy of a PCM thread. The overhead components are make closure, reception of local arguments, reception of global arguments, post closure, and schedule closure; the remainder of the thread's lifetime is the actual work of running the thread.

To simplify the analysis, we will ignore any idle time and assume that each processing element is executing either a thread or one of the overhead tasks depicted in Figure. Under this assumption we can reduce the analysis to that of an average thread. The corresponding efficiency, defined as the fraction of the overall execution time actually spent executing the useful computation, can then be computed.

There are three important ratios needed for the computation, reflecting the effects of tail recursion, global send_arguments, and closure migration on the overall efficiency: α equals the fraction of threads not invoked by the tail-call mechanism, β equals the fraction of arguments that have to be sent across the interconnection network, and γ equals the fraction of closures that migrate from one processing element to another. R_t defines the average run length of a PCM thread, and k defines the thread's arity.

The overhead in executing a thread can be broken into the following pieces:

Make Closure (M_c): At thread creation time, a closure must be allocated and the thread pointer and join counter must be initialized, at a cost of M_c cycles.

Local Send Argument (S_l): A local send_argument is fairly cheap and reduces to a simple memory-to-memory transfer plus an additional check to see whether the closure has become ready for execution, at a cost of S_l cycles.

Global Send Argument (S_l + T_a): For arguments that must be sent across the network, we have to add an additional overhead of T_a cycles to the constant cost S_l to account for the transfer costs on the sending and receiving sides.

Post Closure (P_c + γT_c): After receiving all of its arguments, a closure is posted and becomes subject to dynamic scheduling, at a cost of P_c cycles. If the closure is migrated to a remote processor, additional transfer costs of T_c cycles (for a closure consisting of eight words) need to be charged in addition to the constant cost P_c.

Schedule Closure (S_c): The cost of transferring control to the thread at the beginning of its execution, and back to the scheduler after its termination, is S_c cycles.

With these definitions we can define the efficiency of a thread as

    ε = R_t / (R_t + α(M_c + k(S_l + β·T_a) + P_c + γ·T_c + S_c)).

With the communication costs T_a and T_c ranging in the hundreds of cycles, it becomes imperative to reduce both β and γ in order to avoid disappointing efficiencies. α, on the other hand, cannot be reduced to arbitrarily small values; thus the only remaining alternative to amortize the non-transfer-related overhead is to increase the thread run length R_t.

We can simplify the equation by assuming a typical value for the thread arity k and by assuming that the tail-call optimization is not used, i.e. α = 1. We also make the reasonable assumption that β = γ, which just says that the percentage of send_arguments which are non-local is the same as the percentage of closures which are migrated. We can then transform the equation into

    ε = R_t / (R_t + (M_c + k·S_l + P_c + S_c) + γ(k·T_a + T_c)).

This shows that the efficiency will depend on R_t, the average run length, and on γ, the percentage of closures migrated. The scheduler can affect only γ, so an important job of the scheduler is to minimize γ.
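To get a feel for how the simplified formula behaves, the short program below evaluates the efficiency for a range of migration fractions γ. The cycle counts are purely illustrative assumptions; the constants actually measured for PCM are not reproduced here.

#include <stdio.h>

/* Illustrative cost values in cycles (assumptions, not PCM measurements). */
#define M_C   25.0   /* make closure                */
#define S_L   10.0   /* local send_argument         */
#define P_C   15.0   /* post closure                */
#define S_C   30.0   /* schedule closure            */
#define T_A  300.0   /* network transfer, argument  */
#define T_C  500.0   /* network transfer, closure   */
#define K      2     /* thread arity                */

int main(void)
{
    double Rt = 1000.0;                      /* average thread run length */
    for (double gamma = 0.0; gamma <= 0.105; gamma += 0.01) {
        double overhead = (M_C + K * S_L + P_C + S_C)
                        + gamma * (K * T_A + T_C);
        printf("gamma = %.2f  efficiency = %.3f\n",
               gamma, Rt / (Rt + overhead));
    }
    return 0;
}

Even with these made-up numbers, the trend the text describes is visible: the efficiency is governed by the run length R_t and degrades as the migration fraction γ grows.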

Implementation

To achieve minimal values for γ, we have adopted a lazy scheduling policy known as work stealing. In such a system, each processor maintains a local queue of full closures called the ready queue. When a closure becomes full, it is posted to its local ready queue. A processor works out of its local ready queue for as long as there are closures in it. When a processor runs out of local work, it sends a steal request to a randomly chosen processor. If the processor receiving this steal request has any closures in its ready queue, it migrates one of the closures to the requesting processor.

In our current implementation, a computation executes locally using a depth-first scheduling policy. This heuristic, which mimics serial execution order, can be expected to result in lower resource requirements for most computations than a breadth-first policy would. Steal requests, on the other hand, will always be served using a breadth-first policy (see Figure). Such a steal policy can be expected to result in significantly reduced steal frequencies. For example, both examples considered in the remainder of this chapter typically migrated less than one percent of all dynamically created closures. The Cilk system described in Chapter addresses these performance issues more concretely.
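The following pseudocode-style C sketch summarizes the policy just described: depth-first execution out of the local ready queue and breadth-first stealing from a random victim. All types and helper functions here are placeholders for exposition, not the actual CM-5 implementation.

/* Sketch of the PCM scheduling policy; helper names are assumed.        */

typedef struct closure closure;           /* opaque full closure          */

extern closure *deque_pop_newest(void);   /* local work: newest closure   */
extern closure *deque_pop_oldest(void);   /* victim side: oldest closure  */
extern int      random_processor(void);
extern void     send_steal_request(int victim);
extern closure *wait_for_steal_reply(void);   /* NULL if victim had no work */
extern void     run_thread(closure *c);
extern void     migrate_closure(int thief, closure *c);
extern void     reply_no_work(int thief);

void scheduler_loop(void)
{
    for (;;) {
        closure *c = deque_pop_newest();        /* depth-first locally    */
        if (c == NULL) {                        /* out of local work      */
            send_steal_request(random_processor());
            c = wait_for_steal_reply();
            if (c == NULL) continue;            /* try another victim     */
        }
        run_thread(c);                          /* may post new closures  */
    }
}

void handle_steal_request(int thief)            /* runs on the victim     */
{
    closure *c = deque_pop_oldest();            /* breadth-first steal    */
    if (c != NULL) migrate_closure(thief, c);
    else           reply_no_work(thief);
}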

Two Case Studies

In the following sections we present two applications implemented with the PCM thread package. We use these applications to document the efficiency of the PCM system. The first application is ray tracing. In this application, the input is a description of the objects and lighting in a scene, and the program must produce a high-quality image of that scene as seen from a specified point in three-space. The second application is protein folding. In this application, a sequence of monomers (i.e., an unfolded protein) is input, and the program computes some or all possible foldings of that protein, given certain assumptions about which foldings are legal. All performance experiments described below were performed on the CM-5.

Ray Tracing

The parallel ray tracer presented here is an ideal example to illustrate the virtues of the PCM thread package. First, the task of tracing a complicated picture of reasonable size requires enough computation to justify the use of a powerful parallel processor. Second, the variance in the number of processor cycles required to trace individual rays necessitates the use of a dynamic load-balancing scheme to guarantee acceptable utilization. Third, we can break the ray-tracing computation into threads and obtain threads with grain sizes coarse enough to offset the overhead introduced by a dynamic execution model.

There are several algorithms that can be used to implement a ray tracer.

void Trace() {
  int x, y;
  for (y = First_Line; y <= Last_Line; y++)
    for (x = First_Column; x <= Last_Column; x++) {
      pixel = calculate_intersections(x, y);
      write_pixel(x, y, pixel);
    }
}

Figure: Kernel of the sequential ray tracer.

The simplest of all ray-tracing algorithms intersects a ray with every object surface and displays the object whose intersection is closest to the position of the observer. This algorithm is known as exhaustive ray tracing, since it calculates all possible ray-surface intersections. The ray tracer we used improves upon this basic algorithm. It uses a bounding volume which requires relatively simple intersection calculations, such as a sphere, to enclose more complex objects. If a ray does not pierce the bounding volume, then all the objects contained within can be eliminated from consideration. This reduction substantially reduces the average cost of ray-surface calculations. This technique is further improved by arranging bounding volumes into a tree hierarchy. In such a scheme, a number of bounding volumes could themselves be enclosed within an even larger bounding volume. If a ray does not intersect a given bounding volume, then all the objects in that volume and in all child volumes can be eliminated.

This algorithmic improvement reduces the linear cost of exhaustive ray tracing to one which is logarithmic in the number of objects. However, this optimization also creates a large variation in the time needed to trace a ray, making it difficult to find a static work distribution that sufficiently balances the available work.

Ray Tracer Parallelization

To show the power of our thread package as a tool to retarget existing sequential programs for parallel processors, we took the serial POV-Ray package [POV], which implements the optimized ray-tracing method described above, and rewrote its kernel with our thread language. The serial POV-Ray program is quite large; the C source files consist of many thousands of lines. Fortunately, we did not have to modify or even understand much of the code.

thread Join (Cont parent_join, int s1, int s2, int s3, int s4) {
  send_argument(parent_join, SIGNAL);
}

thread Trace (Cont parent_join, int sx, int ex, int sy, int ey) {
  if ((sx == ex) && (sy == ey)) {
    pixel = calculate_intersections(sx, sy);
    write_pixel(sx, sy, pixel);
    send_argument(parent_join, SIGNAL);
  }
  else {
    closure k, p1, p2, p3, p4;
    int xo = (ex - sx) / 2;
    int yo = (ey - sy) / 2;
    k  = make_closure(Join, parent_join, _, _, _, _);
    p1 = make_closure(Trace, cont{k, Join:s1}, sx, sx+xo, sy, sy+yo);
    p2 = make_closure(Trace, cont{k, Join:s2}, sx, sx+xo, sy+yo+1, ey);
    p3 = make_closure(Trace, cont{k, Join:s3}, sx+xo+1, ex, sy, sy+yo);
    p4 = make_closure(Trace, cont{k, Join:s4}, sx+xo+1, ex, sy+yo+1, ey);
    post(p1); post(p2); post(p3); post(p4);
  }
}

Figure: The kernel of the PCM ray-tracing code. For simplicity, this code assumes that the picture is of size 2^n by 2^n. The actual code does not assume this and is only slightly more complex.

A simplified version of the original sequential kernel can be seen in Figure. This function just walks through all the pixels and calls calculate_intersections on each of them to determine the value of that pixel. This function is the portion of the code that we rewrote using PCM.

The threaded version of this procedure, which is shown in Figure, computes the value of a pixel using the exact same function calculate_intersections that the sequential version used. The Trace thread traces a sub-window of the original picture, as specified by the four coordinates in its argument list. The Trace thread accomplishes this task by recursively splitting its sub-window into smaller sub-windows until the size of a newly created window reaches the size of a single pixel, at which point it calls the sequential function calculate_intersections to calculate the value of that pixel.

Figure: The picture on the left shows the ray-traced image used in our experiments. The histogram on the right shows how much computation was needed for each section of the picture. Brighter points represent higher workloads; darker points represent lighter workloads.

Ray Tracer Results

To test our multithreaded implementation, we traced pictures of different complexities on various machine sizes. We adjusted the picture size so that enough parallelism would be generated to justify the use of the largest machine configuration used during our test runs.

We first compared the uniprocessor timings of the multithreaded code with those of the original sequential code, both running on the same CM-5 processing node. The results showed no measurable difference between the sequential and the multithreaded timings. To see why there was little difference, we need to look at the thread granularity. To trace the picture shown in Figure, a large number of threads are created over the course of the run, resulting in an average running time per thread on the order of milliseconds. Compared to this long thread run length, the average per-thread overhead is negligible.

As pointed out at the beginning of this section, the time required to trace an individual ray can vary significantly. To show this uneven work requirement, we calculated a work histogram for our example. The left part of Figure shows the traced picture, and the right part of the figure shows the work histogram. The work histogram shows how certain areas of the picture, such as the eye, contain much more work than other areas, such as the sky.

Table: Execution times for the static and PCM versions of the ray tracer on various numbers of nodes. For the static version the rays are evenly distributed, so we show the distribution of work (minimum and maximum processor time); for the PCM version the work is evenly distributed, so we show the distribution of rays (execution time, and the maximum and minimum number of rays traced per processor), along with the improvement of the PCM version over the static version.

To show how this uneven work distribution can affect execution time, we compared the speedup behavior of our multithreaded ray tracer with that of an implementation using a static load-balancing scheme. This comparison is shown in the first four columns of Table. The static algorithm employs a simple work distribution that assigns exactly the same number of rays to each node. For the static case we list two timings: the execution time of the fastest processor and the execution time of the slowest processor. We can see that even with just two nodes, the slower processor requires more compute cycles than the faster processor. Even worse, this gap widens as we increase the number of processors: when run on the largest number of nodes, the slowest processor takes several times as long as the fastest processor. The multithreaded ray tracer, on the other hand, distributes the points such that each processor performs almost the same amount of computation. Not only does the PCM code perform better than the static solution for all machine configurations, it even achieves perfect linear speedup. The improvement over the static program is shown in the last column of the table, measured as PCM time divided by static time. On the largest configuration, the PCM program executes in only a fraction of the time that the statically distributed program takes. We measured the range of the number of pixels traced per processor when run under PCM, and we have included this data in Table. These numbers reflect the effect of the dynamic load balancer. As expected, the difference between the maximum and minimum number of pixels traced per node was significant, and the percentage difference increases as we move to larger machine configurations.

Protein Folding

A second application that we implemented was protein folding. The reasons for choosing this application were similar to the reasons for choosing ray tracing. The first is that the problems are large enough to warrant the use of parallelism: a common problem size takes over six hours when run sequentially, and we wanted to run a series of problems. The second reason is that initial attempts to parallelize the program did not make efficient use of the machine. These attempts statically broke the computation into subcomputations, but the subcomputations were too coarse and their run times too variable to keep all processors busy. An implementation using PCM avoids this problem.

The work on this problem was done in conjunction with Pande, Yu, Grosberg, and Tanaka of the Center for Material Sciences and Engineering at MIT. In their work [PYGT], Pande, Yu, Grosberg, and Tanaka use the lattice model [SG] to model protein folding. In this model a protein is described as a chain of monomers, and it is assumed that in a folded protein each monomer will sit on a point of a three-dimensional lattice. Each possible folding of the polymer can then be described as some path along the set of lattice points. Figure shows a folded polymer, with each shade representing a different type of monomer. The model assumes the polymer will take on the most compact possible paths, so it is only concerned with paths that completely fill some cube. In a folded polymer, a pair of monomers will exert some attractive or repulsive force on one another. This force depends on the types of the two monomers and their distance. The energy of a folded polymer can be modeled as the sum of the forces between all pairs of monomers, or between all neighboring monomers. Of course, this energy value depends greatly on the way in which the polymer is folded. A typical computation consists of considering all possible foldings of a given polymer and computing a histogram of the energy values. We implemented this algorithm in PCM based on the problem description given to us by Pande.

Figure: A folded polymer.

For the rest of this section we will be concerned mainly with the implementation of this problem using PCM, focusing on the routine that enumerates all possible paths. More details on the algorithms used and the results obtained with this program are given in [PJGT]. At its heart, this program is a search program that finds all possible unique paths that visit each node of the cube exactly once. The algorithm works by incrementally building up paths through the cube until complete paths are reached. The function Count_Entries performs the core of the search. An outline of the sequential code for this function is given in Figure.

The first argument to this function is a STATE structure. This structure defines the partial path that has been constructed so far. The contents of this structure depend on the particular calculation being performed, but it typically contains information describing which points are occupied, the type of monomer at each occupied point, and other data used to increase the efficiency of the search. The STATE structure is typically on the order of a hundred bytes in size. The second argument to the function is point, the lattice point to be added to the partial path. The function returns an integer, namely the number of paths found. The function first adds point to the partial path. If this completes the path, the function performs some calculation, typically updating a result histogram with the energy value of this new path, and then returns. Otherwise it calls itself recursively for each empty neighbor of point. At the end it sums up the number of complete paths found and returns this total. We start the search by calling Count_Entries repeatedly on a set of starting paths. These starting paths are precomputed and are chosen to prevent the consideration of paths related by symmetry. Typically we run the program on several polymers at a time. Each time we find a complete path, we calculate several energy values, one for each input polymer. This amortizes the time spent searching over several polymers.

int Count_Entries (struct STATE *orig_st, int point) {
  int i, sum, next_neighbor;
  struct STATE st_struct;                 /* local copy of state struct */
  struct STATE *state = &st_struct;
  memcpy(state, orig_st, STATE_SIZE);

  add_point_to_path(point, state);        /* add point to path */

  /* If we found a complete path, update the result histogram
     and return the number of paths found. */
  if (complete_path(state)) {
    update_result(state);
    return 1;
  }

  /* Otherwise, call Count_Entries recursively on each neighbor. */
  sum = 0;
  for (i = 0; i < num_neighbors; i++) {
    next_neighbor = neighbor(i);
    if (!occupied(next_neighbor, state))
      sum += Count_Entries(state, next_neighbor);
  }
  return sum;
}

Figure: Kernel of the sequential protein folding code.

We improved Pande's original algorithm by adding checks which prevent the search from considering certain paths which cannot lead to a complete path. These changes, which are described in Appendix A, provided a performance improvement of orders of magnitude on the problem sizes we have run.

The number of walks on a lattice increases exponentially with the size of the lattice, so significant speedups were needed in order to gain the necessary computational speed to calculate the number of walks on larger sublattices. An earlier attempt to parallelize this algorithm was made without using PCM. In that code, each starting path was statically assigned to a node; each node then executed the sequential code for its subset of the starting paths. The number of complete paths reachable from different starting paths can differ by many orders of magnitude. Therefore the work was not evenly divided between the nodes, and the speedups obtained by this program were disappointing.

thread Count_Entries (Cont parent, int point, char st_vec[STATE_SIZE]) {
  /* declarations of i, k, num_ntv, nbrs_to_visit, etc. omitted */
  struct STATE *state = (struct STATE *) st_vec;

  add_point_to_path(point, state);
  if (complete_path(state)) {
    update_result_histogram(state);
    send_argument(parent, 1);
    return;
  }

  /* Determine the neighbor nodes to be visited:
       num_ntv          = number of neighbors to visit
       nbrs_to_visit[i] = ith neighbor to visit               */
  num_ntv = f(state, point);

  /* Case 1: no paths to search, so return no paths found.    */
  if (num_ntv == 0) send_argument(parent, 0);

  /* Case 2: exactly one neighbor to visit, so only try that one. */
  else if (num_ntv == 1)
    tail_call(Count_Entries, parent, nbrs_to_visit[0], state, STATE_SIZE);

  else {
    /* General case: n neighbors to try, n > 1.
       Create a closure to sum the results of all subcomputations,
       post num_ntv - 1 threads, and perform a tail call for the last. */
    sum_closure = make_closure(sum, parent, num_ntv, _:num_ntv);
    for (i = 0; i < num_ntv - 1; i++) {
      next_neighbor = nbrs_to_visit[i];
      k = make_closure(Count_Entries, cont{sum_closure, sum:val[i]},
                       next_neighbor, new_st, STATE_SIZE);
      memcpy(new_st, state, STATE_SIZE);
      post(k);
    }

    /* Perform a tail call for the final neighbor. */
    next_neighbor = nbrs_to_visit[num_ntv - 1];
    new_parent = cont{sum_closure, sum:val[num_ntv - 1]};
    tail_call(Count_Entries, new_parent, next_neighbor, state, STATE_SIZE);
  }
}

Figure: Kernel of the parallel protein folding code.

Protein Folding Parallelization

To get a more efficient parallelization, the computation needed to be broken into finer grains. PCM was ideal for this task. The procedure that makes use of the PCM primitives is the Count_Entries procedure. A skeleton of the code for this procedure is given in Figure. The major difference between the PCM code and the serial code is that rather than making recursive calls to Count_Entries, the code instead creates and posts closures to execute the calls of Count_Entries. In addition, a sum closure is created which will receive the results of all the child threads and sum them together. The unusual syntax in the call to make_closure (i.e., the _:num_ntv argument) signifies that a specified number of empty slots (here num_ntv) should be left in the closure. These slots will be filled in later with the results of the subcomputations. Most of the recursive calls are made by making and posting closures.

There are also other differences between the serial and parallel codes. Most of these differences were introduced for performance reasons rather than correctness reasons. The first difference is that this version determines in advance the number of neighbors that will be visited. If there is just one neighbor that needs to be visited, then exactly one recursive call needs to be made. We do this by making use of a tail_call. In this instance the tail_call eliminates two overheads: first, the posting and scheduling of the closure, and second, the copying of the state argument into the new closure. Also, when more than one recursive call is needed, the final call makes use of a tail_call for the same reasons given above.

When this code was first written, the tail_call primitive did not exist. Originally we wrote the code so that all the recursive calls to Count_Entries were implemented by creating and posting closures. Then we modified the code by hand to use the C goto statement to implement the final recursive call to Count_Entries. This improved the efficiency of the code by reducing the number of closures that had to be initialized, scheduled, and executed. Adding the gotos by hand was fairly straightforward, but it made the code ugly and harder to read. In order to get these performance improvements without requiring the user to program with gotos directly, we introduced the tail_call primitive into the PCM language. Also notice that the state structure is passed around by treating it as an array. Passing structures in this way is somewhat cumbersome, so the language was later modified to allow structures to be passed to threads.

Table: Execution times for the protein folding code. The first column gives the number of processors (including a serial run); the second column gives run times for the smaller cube traced with several input polymers; the third column gives run times for the larger cube traced with a single input polymer.

Protein Folding Results

Many variations of this program have been run on a range of problem and machine sizes. Results for two problem sizes are shown in Table. The second column shows the run times for runs on the smaller cube. This experiment was run with a typical input size of several polymers, which means that for each complete path found, energy calculations are performed for each of the input polymers. The next column shows runs on a larger cube which has over a million paths. For this run, just one input polymer was specified. In each column the run time for the sequential code is given, followed by the run times for various parallel machine sizes.

The first observation is that the overhead added by the PCM model is fairly small. We measured the efficiency of the program by dividing the runtime of the serial program by the runtime of the parallel program when run on one processor. The efficiencies for both these runs were high. The overheads for this program, although still fairly small, are larger than for the ray tracer example, because the threads in the protein folding code are much shorter than in the ray tracer.

The second observation is that the speedups for both these problems are quite good. We define speedup as the runtime of the parallel program on one processor divided by the runtime of the parallel program run on n processors. The smaller problem achieves good speedups on the larger machine configurations, while the larger problem achieves linear speedups up to the largest machine on which we ran the code.

With this program we were able to enumerate all of the paths on the larger lattice. This computation was performed in several pieces on different partitions of various sizes, taking in total the equivalent of many node-hours on the CM-5.

Conclusions

The performance of any parallel program must scale over the performance of the best sequential program to be truly practical. Because of the high costs of dynamic scheduling and network communication in current message-passing architectures, this goal becomes a serious challenge when programming applications with unstructured parallelism.

As the outcome of experimenting with PCM, we identified two scheduling policies of general use which increase the efficiency of parallel applications run under a dynamic execution model. First, a work-stealing scheduling policy enables almost-all-local computation, resulting in linear and near-linear speedups for the ray-tracing and protein-folding examples. Second, the tail-call mechanism gives the programmer the flexibility to glue short threads into longer ones. Tail calls are especially important for very fine-grained computations such as the Fibonacci example. The parallel continuation-passing model presented in this chapter incorporates these two mechanisms.

The PCM model can either serve as a compilation target for a higher-level language, or it can be used directly in conjunction with a sequential language such as C. In the latter case it comes as a simple extension providing the essential structures needed to synchronize computational threads and to optimize scheduling decisions. Although it could be argued that PCM is difficult to program because of its explicit continuation-passing style, we found it often the case that a program has just a small kernel that needs to be parallelized, leaving the rest of the program in its original sequential form.

Although PCM was successfully used for parallelizing several applications, we discovered that there were still many ways to improve this system. In particular, more work was needed on making the system easier to program. In addition to having to program in continuation-passing style, a programmer had to construct runtime system primitives such as closures and continuations. These details should really have been hidden from the programmer. A second area that needed to be explored further was global data structures. PCM provides no support for global structures, yet many parallel applications need to make use of them. Support for such structures was needed in order to increase the range of programs which could be written. Lastly, we needed to gain a better understanding of the work-stealing scheduler. Both of the examples presented in this chapter achieved nearly linear speedups. We wanted to know if all similar applications would also achieve these results, or if we just got lucky. Clearly not all programs can be executed with linear speedup. We also wanted to understand what properties a program must have in order to achieve linear speedups. All of these areas are addressed in future chapters.

Chapter

Cilk: A Provably Good Runtime System

In the previous chapter we described the PCM system for multithreaded programming. With the PCM system, a programmer writes his multithreaded program in a continuation-passing style by defining a group of threads and specifying how they communicate. The system then takes care of all the details of executing the program on the underlying parallel hardware. The system encapsulates the user's threads into closures so that they can be freely migrated between nodes. A work-stealing scheduler is used to schedule the execution of the threads and to balance the work load among the processors. For the applications we implemented, this system executes our code with little overhead and achieves good speedups.

Although the scheduler in PCM seems to perform well in practice, as with other runtime systems [ABLL, CRRH, CGH, CAL, CD, CSS, FLA, Hal, HWW, JP, Kal, KC, KHM, Nik, Nik, RSL, TBK, VR], the PCM system does not provide users with any guarantees of application performance. When a user writes a program, there is no way for him to know for sure what the performance of the code will be. If his code performs poorly, the user has no way of knowing why it performed that way, or even whether the poor performance is due to the program itself or to the runtime system. To address this problem, we incorporated a provably good scheduler into the PCM system and renamed the system Cilk. We also added additional structure to the language and cleaned up the language a bit.

(The work described in this chapter was previously reported on in a paper [BJK+] by Robert Blumofe, Bradley Kuszmaul, Charles Leiserson, Keith Randall, Yuli Zhou, and myself.)

With these changes, Cilk's work-stealing scheduler achieves space, time, and communication bounds all within a constant factor of optimal. Moreover, the system gives the user an algorithmic model of application performance based on the measures of work and critical path. This chapter describes the Cilk system and demonstrates the efficiency of the Cilk scheduler both empirically and analytically.

This chapter represents joint work by several people. The system described in this chapter was designed and implemented by the Cilk team, which was led by Prof. Charles Leiserson and consisted of Robert Blumofe, Bradley Kuszmaul, Keith Randall, Yuli Zhou, and myself. Much of the theoretical work reported in Section is based on work by Charles Leiserson and Robert Blumofe.

Cilk Overview

A Cilk multithreaded computation can be viewed as a directed acyclic graph (dag) that unfolds dynamically, as is shown schematically in Figure. Unlike PCM, in which there were no constraints on the dag, Cilk views the dag as having some structure. A Cilk program consists of a collection of Cilk procedures, each of which is broken into a sequence of threads, which form the vertices of the dag. Each thread is a nonblocking C function, which means that once it has been invoked, it can run to completion without waiting or suspending. As one of the threads from a Cilk procedure runs, it can spawn a child thread, which begins a new child procedure. In the figure, downward edges connect threads and their procedures with the children they have spawned. A spawn is like a call, except that the calling thread may execute concurrently with its child, possibly spawning additional children. Since threads cannot block in the Cilk model, a thread cannot spawn children and then wait for values to be returned. Rather, the thread must additionally spawn a successor thread to receive the children's return values when they are produced. A thread and its successors are considered to be parts of the same Cilk procedure. In the figure, sequences of successor threads that form Cilk procedures are connected by horizontal edges. Return values, and other values sent from one thread to another, induce data dependencies among the threads, where a thread receiving a value cannot begin until another thread sends the value. Data dependencies are shown as upward curved edges in the figure. Thus, a Cilk computation unfolds as a spawn tree composed of procedures and the spawn edges that connect them to their children, but the execution is constrained to follow the precedence relation determined by the dag of threads.

Figure: The Cilk model of multithreaded computation. Threads are shown as circles, which are grouped into procedures. Each downward edge corresponds to a spawn of a child, each horizontal edge corresponds to a spawn of a successor, and each upward curved edge corresponds to a data dependency. The numbers in the figure indicate the levels of procedures in the spawn tree.

The execution time of any Cilk program on a parallel computer with P processors is constrained by two parameters of the computation: the work and the critical path. The work, denoted T_1, is the time used by a one-processor execution of the program, which corresponds to the sum of the execution times of all the threads. The critical path length, denoted T_∞, is the total amount of time required by an infinite-processor execution, which corresponds to the largest sum of thread execution times along any path. With P processors, the execution time cannot be less than T_1/P or less than T_∞. The Cilk scheduler uses work stealing [BL, BS, FMM, FM, FLA, Hal, KZ, KHM, Kus, Nik, VR] to achieve execution time very near to the sum of these two measures. Off-line techniques for computing such efficient schedules have been known for a long time [Bre, Gra, Gra], but this efficiency has been difficult to achieve on-line in a distributed environment while simultaneously using small amounts of space and communication.
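Stated compactly, and anticipating the analysis later in this chapter (which proves a bound of this form for fully strict programs), these constraints and the guarantee the scheduler aims for can be written as

    T_P ≥ max(T_1 / P, T_∞),    and    T_P = O(T_1 / P + T_∞)

for the Cilk work-stealing scheduler, where T_P denotes the execution time on P processors.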

In this chapter we demonstrate the efficiency of the Cilk scheduler both empirically and analytically. Empirically, we have been able to document that Cilk works well for dynamic, asynchronous, tree-like, MIMD-style computations. To date, the applications we have programmed include protein folding, graphic rendering, backtrack search, and the Socrates chess program, which won second prize in the World Computer Chess Championship. Many of these applications pose problems for more traditional parallel environments, such as message passing [Sun] and data parallel [Ble, HS], because of the unpredictability of the dynamic workloads on processors. Analytically, we prove that for fully strict (well-structured) programs, Cilk's work-stealing scheduler achieves execution space, time, and communication bounds all within a constant factor of optimal.

The Cilk language is an extension to C that provides an abstraction of threads in explicit continuation-passing style. A Cilk program is preprocessed to C and then linked with a runtime library to run on the Connection Machine CM-5 MPP, the Intel Paragon MPP, or the Silicon Graphics Power Challenge SMP. In addition, Blumofe has designed a fault-tolerant version of Cilk, called Cilk-NOW [Blu, BP], which runs on a network of workstations. In this chapter we focus on the Connection Machine CM-5 implementation of Cilk. The Cilk scheduler on the CM-5 is a compact C program, and it performs communication among processors using the Strata [BB] active-message library.

The remainder of this chapter is organized as follows. Section describes Cilk's runtime data structures and the C language extensions that are used for programming. Section describes the work-stealing scheduler. Section documents the performance of several Cilk applications. Section shows how the work and critical path of a Cilk computation can be used to model performance. Section shows analytically that the scheduler works well. Finally, Section offers some concluding remarks and describes our plans for the future.

Cilk Programming Environment and Implementation

In this section we describe the C-language extensions that we have developed to ease the task of coding Cilk programs. We also explain the basic runtime data structures that Cilk uses. The Cilk language extensions are basically a cleaned-up version of the extensions in PCM. They hide more of the implementation details than the original PCM did, and they also allow the programmer to place more structure on the dag, so that the threads of the computation can be treated as being grouped into procedures, as was shown in Figure.

In the Cilk language, a thread T is defined in a manner similar to a C function definition:

thread T (arg-decls ...) { stmts }

The Cilk preprocessor translates T into a C function of one argument and void return type. The one argument is a pointer to a closure data structure, illustrated in Figure, which holds the arguments for T. A closure includes a pointer to the C function for T, a slot for each of the specified arguments, and a join counter indicating the number of missing arguments that need to be supplied before T is ready to run. A closure is ready if it has obtained all of its arguments, and it is waiting if some arguments are missing. To run a ready closure, the Cilk scheduler invokes the thread as a procedure, using the closure itself as its sole argument. Within the code for the thread, the arguments are copied out of the closure data structure into local variables. The closure is allocated from a simple runtime heap when it is created, and it is returned to the heap when the thread terminates.

Figure: The closure data structure. The figure shows a waiting closure, whose join counter is nonzero and which has an empty argument slot, and a ready closure, whose join counter is zero and whose arguments are all present.

The Cilk language supports a data type called a continuation, which is specified by the type modifier keyword cont. A continuation is essentially a global reference to an empty argument slot of a closure, implemented as a compound data structure containing a pointer to a closure and an offset that designates one of the closure's argument slots. Continuations can be created and passed among threads, which enables threads to communicate and synchronize with each other. Continuations are typed with the C data type of the slot in the closure.

At runtime, a thread can spawn a child thread by creating a closure for the child. Spawning is specified in the Cilk language as follows:

spawn T (args ...)

This statement creates a child closure, fills in all available arguments, and initializes the join counter to the number of missing arguments. Available arguments are specified as in C. To specify a missing argument, the user specifies a continuation variable of type cont, preceded by a question mark. For example, if the second argument of a spawned thread is ?k, then Cilk sets the variable k to a continuation that refers to the second argument slot of the created closure. If the closure is ready, that is, it has no missing arguments, then spawn causes the closure to be immediately posted to the scheduler for execution. In typical applications, child closures are spawned with no missing arguments.

To create a successor thread, a thread executes the following statement:

spawn_next T (args ...)

This statement is semantically identical to spawn, but it informs the scheduler that the new closure should be treated as a successor, as opposed to a child. Successor closures are usually created with some missing arguments, which are filled in by values produced by the children.

A Cilk procedure does not ever return values in the normal way to a parent procedure. Instead, the programmer must code the parent procedure as two threads. The first thread spawns the child procedure, passing it a continuation pointing to the successor thread's closure. The child sends its return value explicitly as an argument to the waiting successor. This strategy of communicating between threads is called explicit continuation passing. Cilk provides primitives of the following form to send values from one closure to another:

send_argument (k, value)

This statement sends the value value to the argument slot of a waiting closure specified by the continuation k. The types of the continuation and the value must be compatible. The join counter of the waiting closure is decremented, and if it becomes zero, then the closure is ready and is posted to the scheduler.

Figure shows the familiar recursive Fibonacci procedure written in Cilk. It consists of two threads, fib and its successor sum. Reflecting the explicit continuation-passing style

thread fib (cont int k, int n)
{ if (n < 2)
    send_argument (k, n);
  else
  { cont int x, y;
    spawn_next sum (k, ?x, ?y);
    spawn fib (x, n-1);
    spawn fib (y, n-2);
  }
}

thread sum (cont int k, int x, int y)
{ send_argument (k, x+y);
}

Figure: A Cilk procedure, consisting of two threads, to compute the nth Fibonacci number.

that Cilk supports, the first argument to each thread is the continuation specifying where the return value should be placed.

When the fib function is invoked, it first checks to see if the boundary case has been reached, in which case it uses send_argument to return the value of n to the slot specified by continuation k. Otherwise, it spawns the successor thread sum, as well as two children to compute the two subcases. Each of these two children is given a continuation specifying to which argument in the sum thread it should send its result. The sum thread simply adds the two arguments when they arrive and sends this result to the slot designated by continuation k.

This code is similar to the PCM code for Fibonacci. The main difference is that the Cilk version is written at a slightly higher level, since the Cilk system has abstracted away all the details about how threads are implemented. In the PCM version, the user had to deal directly with the closures that the runtime system uses to represent threads.

Although writing in explicit continuation-passing style is somewhat onerous for the programmer, the decision to break procedures into separate nonblocking threads simplifies the Cilk runtime system. Each Cilk thread leaves the C runtime stack empty when it completes. Thus, Cilk can run on top of a vanilla C runtime system. A common alternative [Hal, KC, MKH, Nik] is to support a programming style in which a thread suspends whenever it discovers that required values have not yet been computed, resuming when the values become available. When a thread suspends, however, it may leave temporary values on the runtime stack which must be saved, or each thread must have its own runtime stack. Consequently, this alternative strategy requires changes to the runtime system that depend on the C calling stack layout and register usage conventions. Another advantage of Cilk's strategy is that it allows multiple children to be spawned from a single nonblocking thread, which saves on context switching. In Cilk, r children can be spawned and executed with only r + 1 context switches, whereas the alternative of suspending whenever a thread is spawned causes 2r context switches. Since our primary interest is in understanding how to build efficient multithreaded runtime systems, but without redesigning the basic C runtime system, we chose the alternative of burdening the programmer with a requirement which is perhaps less elegant linguistically, but which yields a simple and portable runtime implementation.

Cilk supp orts a variety of features that give the programmer greater control over

runtime p erformance For example when the last action of a thread is to spawn a ready

thread the programmer can use the keyword call instead of spawn Using call pro duces

a tail call which runs the new thread immediately without invoking the scheduler Cilk

also allows arrays and subarrays to b e passed by value as arguments to closures Other

features include various abilities to override the schedulers decisions including on which

pro cessor a thread should b e placed and how to pack and unpack data when a closure is

migrated from one pro cessor to another

Cilk can also automatically compute the critical path length and total work of a

computation As we will see later these values are useful to a programmer trying to

understand the p erformance of his program The computation of the critical path is done

by a system of timestamping as shown in Figure

Cilks WorkStealing Scheduler

Cilks scheduler uses the technique of workstealing BL BS FMM FM FLA

Hal KZ KHM Kus Nik VR in which a pro cessor the thief who runs out

of work selects another pro cessor the victim from whom to steal work and then steals the

shallowest ready thread in the victims spawn tree Cilks strategy for selecting the victim

pro cessor is to have the thief choose the victim at random BL KZ RSAU

At runtime each pro cessor maintains a lo cal ready queue to hold ready closures Each

d t d t

d d maxt t

Figure The time at which an instruction in a dataow graph is executed in a p erfect

innitepro cessor schedule can b e computed by timestamping the tokens In addition to

the normal datavalue of a token d d and d d resp ectively in the gure the token

includes a timestamp t t and max t t resp ectively The timestamp on the

outgoing token is computed as a function of the timestamps of the incoming tokens and

the time to execute the instruction

closure has an asso ciated level which corresp onds to the number of spawns but not

spawn nexts on the path from the ro ot of the spawn tree The ready queue is an ar

ray in which the Lth element contains a of all ready closures having level L

Cilk b egins executing the user program by initializing all ready queues to b e empty

placing the ro ot thread into the level list of Pro cessor s queue and then starting a

scheduling lo op on each pro cessor Within a scheduling lo op a pro cessor rst checks to see

whether its ready queue is empty If it is the pro cessor commences work stealing which

will b e describ ed shortly Otherwise the pro cessor p erforms the following steps

Remove the thread at the head of the list of the deep est nonempty level in the ready

queue

Extract the thread from the closure and invoke it

As a thread executes it may spawn or send arguments to other threads When the thread

terminates control returns to the scheduling lo op

When a thread at level L spawns a child thread T the scheduler executes the following

op erations

Allo cate and initialize a closure for T

Copy the available arguments into the closure initialize any continuations to p oint to

missing arguments and initialize the join counter to the number of missing arguments

Lab el the closure with level L

If there are no missing arguments p ost the closure to the ready queue by inserting it

at the head of the levelL list

next is similar except that the closure is lab eled with level L and if it Execution of spawn

is ready p osted to the levelL list

A pro cessor that executes send argumentk value p erforms the following steps

Find the closure and argument slot referenced by the continuation k

Place value in the argument slot and decrement the join counter of the closure

If the join counter go es to zero p ost the closure to the ready queue at the appropriate

level

When the continuation k refers to a closure on a remote pro cessor network communication

argument function sends a message to the ensues The pro cessor that initiated the send

remote pro cessor to p erform the op erations The only subtlety o ccurs in step If the

closure must b e p osted it is p osted to the ready queue of the initiating pro cessor rather

than to that of the remote pro cessor This p olicy is necessary for the scheduler to b e

provably go o d so migrating a closure for this reason is called a provably good steal As a

practical matter we have also had success with p osting the closure to the remote pro cessors

queue which can sometimes save a few p ercent in overhead

If the scheduler attempts to remove a thread from an empty ready queue the pro cessor

b ecomes a thief and commences work stealing as follows

Select a victim pro cessor uniformly at random

If the victims ready queue is empty go to step

If the victims ready queue is nonempty extract a thread from the head of the list in

the shallowest nonempty level of the ready queue and invoke it

Work stealing is implemented with a simple requestreply communication proto col b etween

the thief and victim

Why steal work from the shallowest level of the ready queue The reason is two

fold First we would like to steal large amounts of work and shallow closures are likely

to execute for longer than deep ones Stealing large amounts of work tends to lower the

communication cost of the program b ecause fewer steals are necessary Second the closures

at the shallowest level of the ready queue are also the ones that are shallowest in the dag a

key fact used in Section Consequently if pro cessors are idle the work they steal tends

to make progress along the critical path

Performance of Cilk Applications

This section presents several applications that we have used to b enchmark the Cilk sched

uler We also present empirical evidence from exp eriments run on a CM to do cument the

eciency of our workstealing scheduler The CM is a massively parallel computer based

on MHz SPARC pro cessors with a fattree interconnection network LAD

The applications are describ ed b elow

fibn is the same as was presented in Section except that the second recursive

spawn is replaced by a tail call that avoids the scheduler This program is a go o d

measure of Cilk overhead b ecause the thread length is so small

queensN is a backtrack search program that solves the problem of placing N queens

on a N N chessboard so that no two queens attack each other The Cilk program

is based on serial co de by R Sargent of the MIT Media Lab oratory Thread length

was enhanced by serializing the b ottom levels of the search tree

pfoldxyz is a proteinfolding program PJGT written in conjunction with V

Pande of MITs Center for Material Sciences and Engineering This program was

describ ed in more detail in Section This program nds hamiltonian paths in a

threedimensional grid of size x y z It was the rst program to enumerate all

hamiltonian paths in a grid For this b enchmark we timed the enumeration

of all paths starting with a certain sequence

rayxy is a parallel program for graphics rendering based on the serial POVRay

program which uses a raytracing algorithm This program was describ ed in Sec

tion The core of POVRay is a simple doubly nested lo op that iterates over each

pixel in a twodimensional image of size x y For ray we converted the nested lo ops

into a ary divideandconquer control structure using spawns Our measurements

do not include the approximately seconds of startup time required to read and

1

Initially the serial POVRay program was ab out p ercent slower than the Cilk version running on

one pro cessor The reason was that the divideandconquer decomp osition p erformed by the Cilk co de

provides b etter lo cality than the doubly nested lo op of the serial co de Mo difying the serial co de to imitate

the Cilk decomp osition improved its p erformance Timings for the improved version are given in the table

fib queens pfold ray knary knary Socrates Socrates

pro c pro c

application parameters

T

ser ial

T

1

T T

1

ser ial

T

1

T T

1

1

threads

thread length s s s s s s s s

pro cessor exp eriments

T

P

T P T

1

1

T T

1

P

T P  T

1

P

spacepro c

requestspro c

stealspro c

pro cessor exp eriments

T

P

T P T

1

1

T T

1

P

T P  T

1

P

spacepro c

requestspro c

stealspro c

Table Performance of Cilk on various applications All times are in seconds except

where noted

pro cess the scene description le

knarynkr is a synthetic b enchmark whose parameters can b e set to pro duce a

variety of values for work and critical path It generates a tree of branching factor k

and depth n in which the rst r children at every level are executed serially and the

remainder are executed in parallel At each no de of the tree the program runs an

empty for lo op for iterations

So crates is a parallel chess program that uses the Jamboree search algorithm JK

Kus to parallelize a minmax tree search We give p erformance numbers for the

search of a p osition to depth The work of the algorithm varies with the number

of pro cessors b ecause it do es sp eculative work that may b e ab orted during runtime

For this reason we give complete data sets for the two machine congurations This

application is describ ed in more detail in Chapter

Table shows typical p erformance measures for these Cilk applications Each col

umn presents data from a single run of a b enchmark application We adopt the following

notations which are used in the table For each application we have an ecient serial

C implementation compiled using gcc O whose measured runtime is denoted T

ser ial

The work T is the measured execution time for the Cilk program running on a single

no de of the CM The critical path length T of the Cilk computation is measured by

1

timestamping each thread and do es not include scheduling or communication costs The

measured P pro cessor execution time of the Cilk program running on the CM is given

by T which includes all scheduling and communication costs The row lab eled threads

P

indicates the number of threads executed and thread length is the average thread length

work divided by the number of threads

Certain derived parameters are also displayed in the table The ratio T T is the

ser ial

eciency of the Cilk program relative to the C program The ratio T T is the average

1

parallelism The value T P T is a simple mo del of the runtime which will b e discussed

1

in the next section The speedup is T T and the parallel eciency is T P T The

P P

row lab eled spacepro c indicates the maximum number of closures allo cated at any time

on any pro cessor The row lab eled requestspro c indicates the average number of steal

requests made by a pro cessor during the execution and stealspro c gives the average

number of closures actually stolen

The data listed for So crates diers slightly from the data listed for the rest of the pro

grams Since So crates p erforms sp eculative computations the amount of work p erformed

by this program on a given input will vary as the machine size changes For this reason the

data for So crates is listed in two columns one column gives the data for a pro cessor

run the other for a pro cessor run Since the work varies with the machine size for T

instead of giving the execution time on one pro cessor we give the total work p erformed

when run on the appropriate machine size This approximates what T would b e if the

program on one pro cessor executed the same threads that the n pro cessor version did

The data in Table shows two imp ortant relationships one b etween eciency and

thread length and another b etween sp eedup and average parallelism

Considering the relationship b etween eciency T T and thread length we see

ser ial

that for programs with mo derately long threads the Cilk scheduler induces very little

overhead The queens pfold ray and knary programs have threads with average length

greater than microseconds and have eciency greater than p ercent On the other

hand the fib program has low eciency b ecause the threads are so short fib do es almost

argument nothing b esides spawn and send

Despite its long threads the So crates program shows low eciency b ecause its parallel

Jamboree search algorithm Kus is based on sp eculatively searching subtrees that are not

searched by a serial algorithm Consequently as we increase the number of pro cessors the

program executes more threads and hence do es more work For example the pro cessor

execution did seconds of work whereas the pro cessor execution did only seconds

of work Both of these executions did considerably more work than the serial programs

seconds of work Thus although we observe low eciency it is due to the parallel

algorithm and not to Cilk overhead

Lo oking at the sp eedup T T measured on and pro cessors we see that when

P

the average parallelism T T is large compared with the number P of pro cessors Cilk

1

programs achieve nearly p erfect linear sp eedup but when the average parallelism is small

the sp eedup is much less The fib queens pfold and ray programs for example have in

excess of fold parallelism and achieve more than p ercent of p erfect linear sp eedup on

pro cessors and more than p ercent of p erfect linear sp eedup on pro cessors The

So crates program exhibits somewhat less parallelism and also somewhat less sp eedup On

pro cessors the So crates program has fold parallelism yielding p ercent of p erfect

linear sp eedup while on pro cessors it has fold parallelism yielding p ercent of

p erfect linear sp eedup With even less parallelism as exhibited in the knary b enchmarks

less sp eedup is obtained For example the knary b enchmark exhibits only fold

parallelism and it realizes barely more than fold sp eedup on pro cessors less than

p ercent of p erfect linear sp eedup With fold parallelism knary achieves

fold sp eedup on pro cessors p ercent of p erfect linear sp eedup but only fold

sp eedup on pro cessors p ercent of p erfect linear sp eedup

Although these sp eedup measures reect the Cilk schedulers ability to exploit paral

lelism to obtain application sp eedup we must factor in the eciency of the Cilk program

compared with the serial C program Sp ecically the application sp eedup T T is the

P

ser ial

pro duct of eciency T T and sp eedup T T For example applications such as fib

P

ser ial

and So crates with low eciency generate corresp ondingly low application sp eedup The

So crates program with eciency and sp eedup on pro cessors exhibits

application sp eedup of For the purp ose of p erformance prediction

2

In fact the ray program achieves sup erlinear sp eedup even when comparing to the ecient serial im

plementation We susp ect that cache eects cause this phenomenon

we prefer to decouple the eciency of the application from the eciency of the scheduler

We should p oint out that for this test we chose a chess p osition and searched it to a depth

that could b e run in a reasonable ab out of time on a serial machine Under tournament time

controls we would do a deep er search which would would increase the available parallelism

and thereby improve parallel p erformance

Lo oking more carefully at the cost of a spawn in Cilk we nd that it takes a xed

overhead of ab out cycles to allo cate and initialize a closure plus ab out cycles for each

word argument In comparison a C function call on a CM pro cessor takes cycles of

xed overhead assuming no register window overow plus cycle for each word argument

assuming all arguments are transferred in registers Thus a spawn in Cilk is roughly an

order of magnitude more exp ensive than a C function call This Cilk overhead is quite

argument apparent in the fib program which do es almost nothing b esides spawn and send

Based on fibs measured eciency of we can conclude that the aggregate average

argument in Cilk is b etween and times the cost of a function cost of a spawnsend

callreturn in C

Ecient execution of programs with short threads requires a lowoverhead spawn op

eration As can b e observed from Table the vast ma jority of threads execute on the

same pro cessor on which they are spawned For example the fib program executed over

million threads but migrated only p er pro cessor when run with pro ces

sors Taking advantage of this prop erty other researchers KC MKH have developed

techniques for implementing spawns such that when the child thread executes on the same

pro cessor as its parent the cost of the spawn op eration is roughly equal the cost of a C

function call We hop e to incorp orate such techniques into future implementations of Cilk

Finally we make two observations ab out the space and communication measures in Ta

ble

Lo oking at the spacepro c rows we observe that the space p er pro cessor is generally

quite small and do es not grow with the number of pro cessors For example So crates on

pro cessors executes over million threads yet no pro cessor ever has more than

allo cated closures On pro cessors the number of executed threads nearly doubles to

over million but the space p er pro cessors barely changes In Section we show formally

that for Cilk programs the space p er pro cessor do es not grow as we add pro cessors

Lo oking at the requestspro c and stealspro c rows in Table we observe that

the amount of communication grows with the critical path but do es not grow with the

work For example fib queens pfold and ray all have critical paths under a tenth of

a second long and p erform fewer than requests and steals p er pro cessor whereas

knary and So crates have critical paths more than seconds long and p erform

more than requests and steals p er pro cessor The table do es not show any

clear correlation b etween work and either requests or steals For example ray do es more

than twice as much work as knary yet it p erforms two orders of magnitude fewer

requests In Section we show that for fully strict Cilk programs the communication

p er pro cessor grows linearly with the critical path length and do es not grow as function of

the work

Mo deling Performance

In this section we further do cument the eectiveness of the Cilk scheduler by showing

empirically that it schedules applications in a nearoptimal fashion Sp ecically we use the

knary synthetic b enchmark to show that the runtime of an application on P pro cessors can

b e accurately mo deled as T T P c T where c This result shows that

P 1 1 1

we obtain nearly p erfect linear sp eedup when the critical path is short compared with the

average amount of work p er pro cessor We also show that a mo del of this kind is accurate

even for So crates which is our most complex application programmed to date and which

do es not ob ey all the assumptions assumed by the theoretical analyses in Section

A go o d scheduler should run an application with T work in T P time on P pro cessors

Such perfect linear speedup cannot b e obtained whenever T T P since we always have

1

T T or more generally T max fT P T g The critical path T is the stronger

P 1 P 1 1

lower b ound on T whenever P exceeds the average parallelism T T and T P is the

P 1

stronger b ound otherwise A go o d scheduler should meet each of these b ounds as closely

as p ossible

In order to investigate how well the Cilk scheduler meets these two lower b ounds we

used our knary b enchmark describ ed in Section which can exhibit a range of values

for work and critical path

Figure shows the outcome of many exp eriments of running knary with various values

for k n r and P The gure plots the sp eedup T T for each run against the machine size

P

P for that run In order to compare the outcomes for runs with dierent parameters we

have normalized the data by dividing the plotted values by the average parallelism T T

1

Thus the horizontal p osition of each datum is P T T and the vertical p osition of each

1

datum is T T T T T T Consequently on the horizontal axis the normalized

P 1 1 P

machinesize is when the average available parallelism is equal to the machine size On

the vertical axis the normalized sp eedup is when the runtime equals the critical path

and it is when the runtime is times the critical path We can draw the two lower

b ounds on time as upp er b ounds on sp eedup The horizontal line at is the upp er b ound

on sp eedup obtained from the critical path and the degree line is the upp er b ound on

sp eedup obtained from the work p er pro cessor As can b e seen from the gure on the

knary runs for which the average parallelism exceeds the number of pro cessors normalized

machine size the Cilk scheduler obtains nearly p erfect linear sp eedup In the region

where the number of pro cessors is large compared to the average parallelism normalized

machine size the data is more scattered but the sp eedup is always within a factor of

of the criticalpath upp er b ound

The theoretical results from Section show that the exp ected running time of an

application on P pro cessors is T O T P T Thus it makes sense to try to t

P 1

the data to a curve of the form T c T P c T A leastsquares t to the data

P 1 1

to minimize the relative error yields c and c with

1

p ercent condence The R correlation co ecient of the t is and the mean

relative error is p ercent The curve t is shown in Figure which also plots the

simpler curves T T P T and T T P T for comparison As can b e seen from

P 1 P 1

the gure little is lost in the linear sp eedup range of the curve by assuming that c

Indeed a t to T T P c T yields c with R and a

P 1 1 1

mean relative error of p ercent which is in some ways b etter than the t that includes

a c term The R measure is a little worse but the mean relative error is much b etter

It makes sense that the data p oints b ecome more scattered when P is close to or exceeds

the average parallelism In this range the amount of time sp ent in work stealing b ecomes

a signicant fraction of the overall execution time The real measure of the quality of a

scheduler is how much larger T T must b e than P b efore T shows substantial inuence

1 P

from the critical path One can see from Figure that if the average parallelism exceeds

P by a factor of the critical path has almost no impact on the running time

Critical Path Bound

Bound

Sp eedup

Sp eedup

Linear

Measured Value

Mo del  T P  T

1 1

Normalized

Mo del  T P  T

1 1

Curve Fit  T P  T

1 1

Normalized Machine Size

Figure Normalized sp eedups for the knary synthetic b enchmark using from to

pro cessors The horizontal axis is P and the vertical axis is the sp eedup T T but each

P

data p oint has b een normalized by dividing the these parameters by T T

1

To conrm our simple mo del of the Cilk schedulers p erformance on a real application

we ran So crates on a variety of chess p ositions Figure shows the results of our study

which conrm the results from the knary synthetic b enchmarks The curve shown is the

b est t to T c T P c T where c and c

P 1 1 1

with p ercent condence The R correlation co ecient of the t is and the mean

relative error is p ercent

Indeed as some of us were developing and tuning heuristics to increase the p erformance

of So crates we used work and critical path as our measures of progress This metho dology

let us avoid b eing trapp ed by the following interesting anomaly We made an improvement

that sp ed up the program on pro cessors From our measurements however we discovered

that it was faster only b ecause it saved on work at the exp ense of a much longer critical

path Using the simple mo del T T P T we concluded that on a pro cessor

P 1

machine which was our platform for tournaments the improvement would a loss of

p erformance a fact that we later veried Measuring work and critical path enabled us to

use exp eriments on a pro cessor machine to improve our program for the pro cessor

Critical Path Bound

Bound

Sp eedup Linear

Sp eedup

Measured Value

Mo del  T P  T

1 1

Normalized

Mo del  T P  T

1 1

Curve Fit  T P  T

1 1

Normalized Machine Size

Figure Normalized sp eedups for the So crates chess program

machine but without using the pro cessor machine on which computer time was scarce

Theoretical Analysis of the Cilk Scheduler

In this section we use algorithmic analysis techniques to prove that for the class of fully

strict Cilk programs Cilks workstealing scheduling algorithm is ecient with resp ect

to space time and communication A ful ly strict program is one for which each thread

sends arguments only to its parents successor threads In the analysis and b ounds of this

section we further assume that each thread spawns at most one successor thread For this

class of programs we prove the following three b ounds on space time and communication

Space The space used by a P pro cessor execution is b ounded by S S P where S

P

denotes the space used by the serial execution of the Cilk program This b ound is

existentially optimal to within a constant factor BL

Time With P pro cessors the exp ected execution time including scheduling overhead is

b ounded by T O T P T Since b oth T P and T are lower b ounds for

P 1 1

any P pro cessor execution our exp ected time b ound is within a constant factor of

optimal

Communication The exp ected number of bytes communicated during a P pro cessor ex

ecution is O T PS where S denotes the largest size of any closure This

1 max max

b ound is existentially optimal to within a constant factor WK

The exp ectedtime b ound and the exp ectedcommunication b ound can b e converted into

highprobability b ounds at the cost of only a small additive term in b oth cases Full pro ofs

of these b ounds using generalizations of the techniques developed in BL can b e found

in Blu We defer complete pro ofs and give outlines here

The space b ound can b e obtained from a busyleaves prop erty that characterizes the

allo cated closures at all times during the execution In order to state this prop erty simply

we rst dene some terms We say that two or more closures are siblings if they were

nexts of spawned by the same parent or if they are successors by one or more spawn

closures spawned by the same parent Sibling closures can b e ordered by age the rst child

spawned is older than the second and so on At any given time during the execution we

say that a closure is a leaf if it has no allo cated children and we say that a leaf closure is a

primary leaf if in addition it has no younger siblings allo cated The busyleaves property

states that every primaryleaf closure has a pro cessor working on it

Lemma Cilks scheduler maintains the busyleaves property

Proof Consider the three p ossible ways that a primaryleaf closure can b e created First

when a thread spawns children the youngest of these children is a primary leaf Second

when a thread completes and its closure is freed if that closure has an older sibling and that

sibling has no children then the oldersibling closure b ecomes a primary leaf Finally when

a thread completes and its closure is freed if that closure has no allo cated siblings then the

youngest closure of its parents successor threads is a primary leaf The induction follows

by observing that in all three of these cases Cilks scheduler guarantees that a pro cessor

works on the new primary leaf In the third case we use the imp ortant fact that a newly

activated closure is p osted on the pro cessor that activated it and not on the pro cessor on

which it was residing

Theorem For any ful ly strict Cilk program if S is the space used to execute the program

on processor then with any number P of processors Cilks workstealing scheduler uses

at most S P space

Proof We shall obtain the space b ound S S P by assigning every allo cated closure

P

to a primary leaf such that the total space of all closures assigned to a given primary leaf

is at most S Since Lemma guarantees that all primary leaves are busy at most P

primaryleaf closures can b e allo cated and hence the total amount of space is at most S P

The assignment of allo cated closures to primary leaves is made as follows If the closure

is a primary leaf it is assigned to itself Otherwise if the closure has any allo cated children

then it is assigned to the same primary leaf as its youngest child If the closure is a leaf

but has some younger siblings then the closure is assigned to the same primary leaf as its

youngest sibling In this recursive fashion we assign every allo cated closure to a primary

leaf Now we consider the set of closures assigned to a given primary leaf The total space

of these closures is at most S b ecause this set of closures is a subset of the closures that

are allo cated during a pro cessor execution when the pro cessor is executing this primary

leaf which completes the pro of

We are now ready to analyze execution time Our strategy is to mimic the theorems

of BL for a more restricted mo del of multithreaded computation As in BL the

b ounds assume a communication mo del in which messages are delayed only by contention

at destination pro cessors but no assumptions are made ab out the order in which contending

messages are delivered LAB For technical reasons in our analysis of execution time

the critical path is calculated assuming that all threads spawned by a parent thread are

spawned at the end of the parent thread

In our analysis of execution time we use an accounting argument At each time step

each of the P pro cessors places a dollar in one of three buckets according to its actions at

that step If the pro cessor executes an instruction of a thread at the step it places its dollar

into the Work bucket If the pro cessor initiates a steal attempt it places its dollar into

the Steal bucket Finally if the pro cessor merely waits for a steal request that is delayed

by contention then it places its dollar into the Wait bucket We shall derive the running

time b ound by upp er b ounding the dollars in each bucket at the end of the computation

summing these values and then dividing by P the total number of dollars put into buckets

on each step

Lemma When the execution of a ful ly strict Cilk computation with work T ends the

Work bucket contains T dol lars

Proof The computation contains a total of T instructions

Lemma When the execution of a ful ly strict Cilk computation ends the expected number

of dol lars in the Wait bucket is less than the number of dol lars in the Steal bucket

Proof Lemma of BL shows that if P pro cessors make M random steal requests

during the course of a computation where requests with the same destination are serially

queued at the destination then the exp ected total delay is less than M

Lemma When the P processor execution of a ful ly strict Cilk computation with critical

path length T and for which each thread has at most one successor ends the expected

1

number of dol lars in the Steal bucket is O PT

1

Proof sketch The pro of follows the delaysequence argument of BL but with some

dierences that we shall p oint out Full details can b e found in Blu

At any given time during the execution we say that a thread is critical if it has not yet

b een executed but all of its predecessors in the dag have b een executed For this argument

the dag must b e augmented with ghost threads and additional edges to represent implicit

dep endencies imp osed by the Cilk scheduler We dene a delay sequence to b e a pair P s

such that P is a path of threads in the augmented dag and s is a p ositive integer We say

that a delay sequence P s occurs in an execution if at least s steal attempts are initiated

while some thread of P is critical

The next step of the pro of is to show that if at least s steal attempts o ccur during an

execution where s is suciently large then some delay sequence P s must o ccur That

is there must b e some path P in the dag such that each of the s steal attempts o ccurs

while some thread of P is critical We do not give the construction here but rather refer

the reader to Blu BL for directly analogous arguments

The last step of the pro of is to show that a delay sequence with s PT is unlikely

1

to o ccur The key to this step is a lemma which describ es the structure of threads the

pro cessors ready p o ols This structural lemma implies that if a thread is critical it is the

next thread to b e stolen from the p o ol in which it resides Intuitively after P steal attempts

we exp ect one of these attempts to have targeted the pro cessor in which the critical thread

of interest resides In this case the critical thread will b e stolen and executed unless of

course it has already b een executed by the lo cal pro cessor Thus after PT steal attempts

1

we exp ect all threads on P to have b een executed The delaysequence argument formalizes

this intuition Thus the exp ected number s of dollars in the Steal bucket is at most

O PT

1

Theorem Consider any ful ly strict Cilk computation with work T and criticalpath

length T such that every thread spawns at most one successor With any number P of pro

1

cessors Cilks workstealing scheduler runs the computation in expected time O T P T

1

Proof We sum the dollars in the three buckets and divide by P By Lemma the Work

bucket contains T dollars By Lemma the Wait bucket contains at most a constant times

the number of dollars in the Steal bucket and Lemma implies that the total number of

dollars in b oth buckets is O PT Thus the sum of the dollars is T O PT and the

1 1

b ound on execution time is obtained by dividing by P

In fact it can b e shown using the techniques of BL that for any with proba

bility at least the execution time on P pro cessors is O T P T lg P lg

1

Theorem Consider any ful ly strict Cilk computation with work T and criticalpath

length T such that every thread spawns at most one successor For any number P of

1

processors the total number of bytes communicated by Cilks workstealing scheduler has

expectation O PT S where S is the size in bytes of the largest closure in the com

1 max max

putation

Proof The pro of follows directly from Lemma All communication costs can b e asso ci

ated with steals or steal requests and at most O S bytes are communicated for each

max

successful steal

In fact for any the probability is at least that the total communication

incurred is O P T lg S

1 max

The analysis and b ounds we have derived apply to fully strict programs in the case when

each thread spawns at most one successor In Blu the theorems ab ove are generalized

to handle situations where a thread can spawn more than one successor

Conclusion

To pro duce highp erformance parallel applications programmers often fo cus on communi

cation costs and execution time quantities that are dep endent on sp ecic machine cong

urations We argue that a programmer should think instead ab out work and critical path

abstractions that can b e used to characterize the p erformance of an algorithm indep endent

of the machine conguration Cilk provides a programming mo del in which work and

critical path are observable quantities and it delivers guaranteed p erformance as a function

of these quantities Work and critical path have b een used in the theory community for

years to analyze parallel algorithms KR Blello ch Ble has developed a p erformance

mo del for dataparallel computations based on these same two abstract measures He cites

many advantages of such a mo del over machinebased mo dels Cilk provides a similar

p erformance mo del for the domain of asynchronous multithreaded computation

Although Cilk oers p erformance guarantees its capabilities are somewhat limited

Programmers nd its explicit continuationpassing style to b e onerous The higher level

primitives describ ed in Chapter address this concern Cilk is go o d at expressing and

executing dynamic asynchronous treelike MIMD computations but it is not ideal for

more traditional parallel applications that can b e programmed eectively in for example

a messagepassing dataparallel or singlethreaded sharedmemory style To partially

address this inadequacy we have added dagconsistent shared memory to the Cilk system

which allows programs to op erate on shared memory without costly communication or

hardware supp ort This addition is describ ed in Chapter

Chapter

The ?So crates Parallel Chess

Program

The Cilk system describ ed in the previous chapter is a p owerful system which allows a pro

grammer to write complicated multithreaded programs and have them executed eciently

The system was intended to have the exibility to express programs with fairly complex

control structures Up to this p oint however we had only written applications with rel

atively simple control structures For most of our programs the Cilk p ortion consisted

mainly of one Cilk thread which was recursively called many times We therefore wanted

to write a large challenging application with complicated control dep endencies preferably

one which would b e dicult to express in other parallel programming paradigms Such

a program would not only showcase the p ower of Cilk but would also stress the runtime

system and the language thereby identifying any of weaknesses Cilk may have

We chose computer chess b ecause it met all these criteria and b ecause it was an in

teresting application in its own right Also since we would test the program in actual

comp etitions against b oth humans and computers we would b e forced to implement the

b est algorithms not just whatever happ ened to b e easiest to implement in our system

Our chess program So crates uses the Jamboree Kus algorithm to p erform a parallel

gametree search This search algorithm has a complex control structure which is nondeter

ministic and p erforms sp eculative computations some of which need to b e killed o b efore

completing In order to obtain go o d p erformance during this search we use several mecha

Part of this work was rep orted on by Kuszmaul and myself in an earlier article JK

nisms not directly provided by Cilk such as ab orting computations and directly accessing

the active message layer to implement a global transp osition table distributed across the

pro cessors The initial version of So crates was implemented during the time PCM was

evolving into the Cilk system and our work on this program led to several mo dications to

the Cilk system which gave the user more control over the execution of his program

Many p eople contributed to the So crates chess program Rob ert Blumofe Don Dailey

Michael Halbherr Larry Kaufman Bradley Kuszmaul Charles Leiserson and I contributed

to the So crates co de itself Don Dailey and Larry Kaufman then of Heuristic Software

Inc wrote the serial So crates program on which So crates is based and Don Dailey twice

joined us at MIT to improve the chess knowledge in the program I initially implemented

the search algorithm in Cilk and then to ok a lead role in tuning and testing all asp ects

of the system including the search co de the chess co de and the runtime system At

various times Bradley Kuszmaul and I to ok the lead in bringing the whole system together

Bradley Kuszmaul also invented the Jamboree search algorithm which was rst used in his

StarTech program Rob ert Blumofe implemented the transp osition table Michael Halbherr

originally implemented the ab ort co de In addition Rob ert Blumofe Matteo Frigo Michael

Halbherr Bradley Kuszmaul Charles Leiserson Keith Randall Rolf Riesen Yuli Zhou and

I contributed to the various PCM and Cilk runtime systems on which So crates runs

Introduction

Computer chess provides a go o d testb ed for understanding dynamic MIMDstyle computa

tions The parallelism in computer chess is derived from a dynamic expansion of a highly

irregular gametree which makes computer chess dicult to express for example as a

dataparallel program To investigate how to program this sort of dynamic MIMDstyle

application we engineered a parallel chess program called So crates pronounced

So crates The program based on Heuristic Softwares serial So crates program has an

informally estimated rating of over USCF So crates running on the no de CM at

the National Center for Sup ercomputing Applications NCSA at the University of Illinois

tied for third place in the ACM International Computer Chess Championship held at

the end of June in Cap e May New Jersey Cilk and So crates were later p orted to

the Intel Paragon in March and running on Sandia National Lab oratories no de

Paragon So crates nished second in the World Computer Chess Championship

So crates is in part a continuation of earlier work p erformed here on the StarTech Kus

chess program StarTech was based on Hans Berliners serial HitechBE program Al

though So crates and StarTech are based on dierent serial programs and do not share any

co de So crates b orrowed techniques originally developed for StarTech such as the basic

search algorithm A ma jor dierence b etween the two is that in StarTech the chess and

scheduling algorithms were all wrapp ed together in a single piece of co de The work on

So crates was intended in part to show that the chess program could b e separated from the

problems of scheduling and load balancing and still execute eciently So crates uses Cilk

to address the scheduling problem allowing the chess co de to fo cus on only those issues

which are unique to a chess program

It was not clear from the outset how to predict the p erformance of a parallel chess pro

gram Chess programs search a dynamically generated tree and obtain their parallelism

from that tree Dierent branches of the tree have vastly dierent amounts of total work

and available parallelism So crates uses large global data structures p erforms sp eculative

computations and is nondeterministic But we wanted predictable p erformance For exam

ple if one develops a program on a small machine one would like to b e able to instrument

the program and predict how fast it will run on a big machine How can predictable p erfor

mance b e salvaged from a program with these characteristics We showed in the previous

chapter that under certain assumptions the run time of a Cilk program can b e predicted

from the total work W and the critical path length C But chess violates these assumptions

so it was not clear how well the scheduler would p erform

For most algorithms the values of W and C dep end on the and

not on the scheduler But for sp eculative computations such as our gametree search

algorithm the values of W and C are partially dep endent on scheduling decisions made by

the scheduler Our work on So crates led to several mo dications to the Cilk system which

gave the user additional control over these scheduling decisions

This chapter explains how we implemented So crates in Cilk such that we achieved

predictable ecient p erformance Section describ es the Jamboree gametree search

algorithm and presents some analytical results describing the p erformance of Jamboree

search The mo dications made to Cilk in order to run the chess program are describ ed

in Section In Section we outline several other mechanisms that were needed to

implement a parallel chess program Section describ es the p erformance of the So crates

program and shows that a chess program can execute eciently when the scheduler is

indep endent of the chess co de We make some concluding remarks in Section

Parallel Game Tree Search

The So crates chess program uses an ecient parallel gametree search algorithm called

Jamboree search Kus In this section we explain Jamboree search starting with the

basics of negamax search and serial search and present some analytical p erformance

results for the algorithm

The basic idea b ehind Jamboree search is to do the following op erations on a p osition

in the game tree that has k children

The value of the rst child of the p osition is determined by a recursive call to the

search algorithm

Then in parallel all of the remaining k children are tested to verify that they are

not b etter alternatives than the rst child

Each child that turns out to b e b etter than the rst child is searched in turn to

determine which is the b est

If the move ordering is b estrst ie the rst move considered is always b etter than the

other moves then all of the tests succeed and the p osition is evaluated quickly and e

ciently We exp ect that the tests will usually succeed b ecause the move ordering is often

b estrst due the the application of several chessspecic moveordering heuristics

Negamax Search Without Pruning

Before delving into the details of the Jamboree algorithm let us review the basic search

algorithms that are applicable to computer chess Readers who are familiar with the serial

game tree search algorithms may wish to skip directly ahead to the description of the

Jamboree algorithm in Section Most chess programs use some variant of negamax

tree search to evaluate a chess p osition The goal of the negamax tree search is to compute

the value of p osition p in a tree T ro oted at p osition p The value of p is dened according

p

N Dene negamaxp as

evaln N If n is a leaf then return static

N Let c the children of n and

N b

N For i from b elow jc j do

N Let s negamaxc Recursive Search

i

N if s b then set b s New best score

N enddo

N return b

Figure Algorithm negamax

to the negamax formula

8

>

<

evalp if p is a leaf in T and static

p

v

p

>

:

maxfv c a child of p in T g if p is not a leaf

c p

The negamax formula states that the b est move for player A is the move that gives player

B who plays the b est move from B s p oint of view the worst option If there are no moves

then we use a static evaluation function Of course no chess program searches the entire

game tree Instead some limited game tree is searched using an imp erfect static evaluation

function Thus we have formalized the chess knowledge as T which tells us what tree to

p

eval which tells us how to evaluate a leaf p osition search and static

The naive Algorithm negamax shown in Figure computes the negamax value v of

p

p osition p by searching the entire tree ro oted at p It is easy to make Algorithm negamax

into a parallel algorithm b ecause there are no dep endencies b etween iterations of the for

lo op of Line N One simply changes the for lo op into a parallel lo op But negamax is

not a ecient serial search algorithm and thus it makes little sense to parallelize it

AlphaBeta Pruning

The most ecient serial algorithms for gametree search all avoid searching the entire tree

by proving that certain subtrees need not b e examined In this section we review the

serial search algorithm in preparation for the explanation of how the Jamboree parallel

search algorithm works

An example of how pruning can reduce the size of a game tree that is searched can b e

seen in the chess p osition of Figure Supp ose White has determined that it can win

ZZZj

ZZZZ

ZZZZ

ZZZZ

ZZZZ

ZZZZ

ZZOPl

ZZZJ

Figure White to move and win In this p osition White need not consider all of Blacks

alternatives to Kf since almost any move Black makes will keep the queen a worse

outcome than just taking the queen with Kh

A Dene absearchn as

A If n is a leaf then return static evaln

A Let c the children of n and

A b

A For i from b elow jc j do

A Let s absearchc

i

A If s then return s Fail High

A If s then set s Raise

A If s b then set b s

A enddo

A return b

Figure Algorithm absearch

Blacks queen with Kh Whites other legal move Kf fails to capture the queen

White do es not need to consider every p ossible way for Blacks queen to escap e Any one of

a number of p ossibilities suces Thus White can stop thinking ab out the move without

having exhaustively searched all of Blacks options

The idea of pruning subtrees that do not need to b e searched is embo died in the serial

search algorithm KM which computes the negamax score for a no de without actually

lo oking at the entire search tree The algorithm is expressed as a recursive subroutine with

two new parameters and If the value of any child when negated is as great as

then the value of the parent is no less than and we say that the parent fails high If the

values of all of the children when negated are less than or equal to then the value of

the parent is no greater than and we say that the parent fails low

Pro cedure absearch is shown in Figure When Pro cedure absearch is called the

parameters and are chosen so that if the value of a no de is not greater than and less

than then we know that the value of the no de cannot aect the negamax value of the

ro ot of the entire search tree After the score is returned from the subsearch on Line A

the algorithm on Line A checks to see if the negated score is as great as If so we

know that the value of the no de is at least as great as and we can skip searching the

remaining children the no de has failed high Just b ecause one of the children has a negated

score less than however do es not mean that some other child might not b e within the

window The algorithm can only fail low after considering all of the children

The algorithm can substantially reduce the size of the tree searched The

algorithm works b est if the b est moves are considered rst b ecause if any move can make

the p osition fail high then certainly the b est move can make the p osition fail high Knuth

and Mo ore KM show that for searches of a uniform b estordered tree of height H and

p

H

H

D leaves instead of D leaves degree D the algorithm searches only O

For any k b efore searching the k st child the algorithm obtains the

value of the k th child and p ossibly uses that value to adjust or return immediately This

dep endency b etween nishing the k th child and starting the k st child completely

serializes the search algorithm

Scout Search

For a parallel chess program we need an algorithm that b oth eectively prunes the tree

and can b e parallelized We started with a variant on serial search called Scout search

and mo died it to b e a parallel algorithm This section explains the Scout search algorithm

Figure shows the serial Scout search algorithm which is due to J Pearl Pea

Pro cedure scout is similar to Pro cedure absearch except that when considering any child

that is not the rst child a test is rst p erformed to determine if the child is no b etter a

move than the b est move seen so far If the child is no b etter the test is said to succeed If

the child is determined to b e b etter than the b est move so far the test is said to fail and

the child is searched again valued to determine its true value

The Scout algorithm p erforms tests on p ositions to see if they are greater than or less

than a given value A test is p erformed by using an emptywindow search on a p osition

For integer scores one uses the values and as the parameters of the recursive

1

R Finkel and J Fishburn showed that if the serialization implied by pruning is ignored by a parallel

p

program then it will achieve only P sp eedup on P pro cessors FF

S Dene scoutn as

evaln S If n is a leaf then return static

S Let c the children of n and

S b scoutc

S The rst childs valuation may cause this to fail high

S If b then return b

S If b then set b

S For i from b elow jc j do the rest of the children

S Let s scoutc Test

i

S If s b then set b s

S If s then return s Fail High

S If s then Test failed

S Set s scoutc Research for value

i

S If s then return s Fail High

S If s then set s

S If s b then set b s

S enddo

S return b

Figure Algorithm scout

search as shown on Line S A child is tested to see if it is worse than the b est move so

far and if the test fails on Line S ie the move lo oks like it might b e b etter than the

b est move seen so far then the child is valued on Line S using a nonempty window

to determine its true value

If it happ ens to b e the case that then Line S never executes b ecause

s implies s which causes the return on Line S to execute Consequently the

same co de for Algorithm scout can b e used for the testing and for the valuing of a p osition

Line S which raises the b est score seen so far according to the value returned by a

test is necessary to insure that if the test fails low ie if the test succeeds then the value

returned is an upp er b ound to the score If a test were to return a score that is not a prop er

b ound to its parent then the parent might return immediately with the wrong answer when

the parent p erforms the check of the returned score against on Line S

A test is typically cheaper to execute than a valuation b ecause the window is

smaller which means that more of the tree is likely to b e pruned If the test succeeds then

algorithm scout has saved some work b ecause testing a no de is cheaper than nding its

exact value If the test fails then scout searches the no de twice and has squandered some

work Algorithm scout b ets that the tests will succeed often enough to outweigh the extra

J Dene jamboreen as

evaln J If n is a leaf then return static

J Let c the children of n and

J b jamboreec

J If b then return b

J If b then set b

J In Parallel For i from b elow jc j do

J Let s jamboreec

i

J If s b then set b s

J If s then ab ortandreturn s

J If s then

J Wait for the completion of all previous iterations

J of the parallel lo op

J Set s jamboreec Research for value

i

J If s then ab ortandreturn s

J If s then set s

J If s b then set b s

J Note the completion of the ith iteration of the parallel lo op

J enddo

J return b

Figure Algorithm jamboree

cost of any no des that must b e searched twice and empirical evidence Pea justify its

dominance as the search algorithm of choice in mo dern serial chessplaying programs

Jamboree Search

The Jamboree algorithm shown in Figure is a parallelized version of the Scout search

algorithm The idea is that all of the testing of the children is done in parallel and any tests

that fail are sequentially valued A parallel lo op construct in which all of the iterations

of a lo op run concurrently app ears on Line J Some synchronization b etween various

iterations of the lo op app ears on Lines J and J We sequentialize the fullwindow

searches for values b ecause while we are willing to take a chance that an empty window

search will b e squandered work we are not willing to take the chance that a fullwindow

search which do es not prune very much will b e squandered work Such a squandered

fullwindow search could lead us to search the entire tree which is much larger than the

pruned tree we want to search

The abortandreturn statements that app ear on Lines J and J return a value from

Pro cedure jamboree and ab ort any of the children that are still running Such an ab ort is

needed when the pro cedure has found a value that can b e returned in which case there is no

advantage to allowing the pro cedure and its children to continue to run using up pro cessor

and memory resources The ab ort causes any children that are running in parallel to ab ort

their children recursively which has the eect of deallo cating the entire subtree

The actual search algorithm used in So crates also includes some forward pruning heuris

tics that prune a deep search based on a shallow preliminary search The idea is that if the

shallow search lo oks really bad then most of the time a deep search will not change the

outcome Forward pruning techniques have lately b een shown to b e extremely p owerful

allowing programs running on single pro cessors to b eat some of the b est humans at chess

The serial So crates program uses such a scheme and so do es So crates In the So crates

version of Jamboree search we rst p erform the preliminary search then we search the rst

child then we test the remaining children in parallel and research the failed tests serially

Parallel search of gametrees is dicult b ecause the most ecient algorithms for game

tree search are inherently serial We obtain parallelism by p erforming the tests in parallel

but those tests may not all b e necessary in a serial execution order In order to get any

parallelism we must take the risk of p erforming extra work that a go o d serial program

would avoid

The figure below shows the Jamboree search code transformed into a dataflow graph. This graph clearly shows that all the tests can be executed in parallel, but that only one of the value searches can be executed at a time. This graph was also a useful reference in expressing the Jamboree search code as a Cilk program, since it gives an idea of how the procedure can be broken up into separate threads.

Using Cilk for Chess Search

In the following two sections, we describe the implementation of Socrates using Cilk. These sections are an interesting case study in implementing a large, multithreaded, speculative application. As mentioned in the introduction, Socrates is a parallelization of a serial chess program. Much of the code, including the static evaluator, is identical in the parallel and the serial versions and is not discussed here. Instead, we focus on the portions of the code which were written specifically for the parallel version.


Figure: The dataflow graph for Jamboree search. First the first child is searched to determine its value, then the rest of the children are tested in parallel to try to prove that they are worse choices than the first child, and then each of the children that fail their respective tests are sequentially researched. Compare this description of the Jamboree algorithm to the pseudocode for Algorithm jamboree given above.

The actual Cilk search code is too large to include here, so instead we show the dag of threads potentially created during a search. This dag is shown in the figure below. Note that the ovals labeled TEST_i and VAL_i are recursive calls of the search algorithm which perform either a limited search (a test) or a full search (a value search), so these ovals correspond to entire subdags, not just individual threads.

the dataow graph for Jamboree search which was shown in Figure The threads lab eled

with the diamond symbol are the threads called diamond so named b ecause they p erform

the function that diamond no des did in the dataow graph of Figure The only dierence

b etween the algorithm shown here and that implemented by the earlier dataow graph is

that the So crates search algorithm rst makes a recursive call to the search algorithm to

p erform a null move search This search is a reduced depth search that is part of the

forward pruning algorithm which decides if the side to move has an advantage large enough

that the search can probably b e safely ended here This recursive call is shown by the oval

lab eled VAL in Figure Most of the other dierences b etween the dataow graph and

nul

the Cilk dag are due to one dataflow node being expanded into several Cilk threads. For example, the dataflow graph just has a single node TEST_i for the test of the ith child. In Cilk, this node is broken into several threads, so that immediately after the test completes, the test check thread can run to check if we are able to abort the rest of the search. These details were ignored in the earlier dataflow graph.

Figure: The dag of Cilk threads created by Socrates when performing a value search. The ovals labeled VAL_i, TEST_i, and VAL_NUL correspond to recursive calls of the search algorithm; the other ovals (value setup, test setup, test check, value check, and finish threads, among others) correspond to individual threads. The edges indicate thread creation, data dependencies, argument merges, and the start of abort computations.

The rest of this section focuses on those aspects of Socrates related to efficiently implementing the parallel search algorithm. The following section focuses on parallelizing other aspects of Socrates. Our work on implementing Socrates's search algorithm led to many modifications to the scheduler. The changes we made include implementing migration handlers, implementing priority threads, aborting computations that are in progress, changing the order in which threads are stolen, and adding level waiting. The first version of Socrates was written when we were still using the original PCM system. Many of the improvements to the runtime system that are described in this section were originally added for Socrates and were later included in the Cilk system.

Migration Threads

We use a large, variable-sized data structure to describe the state of a chess board. In the serial code, we pass around pointers to this state structure and copy it only when necessary. In the parallel code, we cannot just blindly pass pointers between threads, because if the thread is migrated, the pointer will no longer be valid. A naive solution is to copy the state structure into every thread, but this adds a significant overhead to the parallel code. This overhead is especially distasteful when you realize that only a small fraction of threads are actually migrated, so most of the copying would be wasted effort.

To solve this problem, we use migration threads. Any thread can have a migration thread associated with it. When the scheduler tries to migrate a thread that has an associated migration thread, the scheduler first calls the migration thread. This migration thread will return a new closure, which is migrated instead.

Using this mechanism, we are able to pass pointers to state structures between threads. Any thread that is passed a state pointer is also given a migration thread, which can copy the state into the closure if the thread is stolen. When this closure is stolen, the migration thread is run, and the migration thread creates a copy of the closure with the state copied into it. This closure is then migrated to the stealing processor. Once the closure arrives, the stolen thread can then be called with a pointer to the copied state structure.


Figure: This shows how migration threads can be used to insert user code before and after any communication event. Part (A) shows the case where migration does not occur. Part (B) shows the case where migration does occur.

This allows the overhead of copying the state to be paid only when it is actually necessary.
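As a concrete illustration, the sketch below shows roughly what a migration thread for the chess state might look like. The closure layout and field names are hypothetical and introduced only for illustration; they are not the actual Cilk or Socrates interfaces.

    /* Hedged sketch, not the real runtime interface.  A thread that normally
     * receives a pointer to the (large) board state also carries a migration
     * thread.  Just before a steal, the scheduler runs the migration thread,
     * which builds a replacement closure containing a private copy of the
     * state, so copying happens only on an actual steal. */

    #include <stdlib.h>

    typedef struct ChessState { char board[64]; /* ...plus history, etc. */ } ChessState;

    typedef struct Closure {
        void (*run)(struct Closure *);                   /* thread to execute */
        struct Closure *(*migrate)(struct Closure *);    /* optional migration thread */
        ChessState *state;                               /* pointer, valid only locally */
        ChessState  state_copy;                          /* used only if stolen */
    } Closure;

    /* Migration thread: run on the victim processor just before a steal.
     * (Error handling omitted in this sketch.) */
    static Closure *migrate_with_state(Closure *c)
    {
        Closure *copy = malloc(sizeof *copy);
        *copy = *c;                       /* start from the original closure */
        copy->state_copy = *c->state;     /* copy the big state into the closure */
        copy->state = &copy->state_copy;  /* stolen thread sees its local copy */
        return copy;                      /* this closure is migrated instead */
    }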

Using a migration thread to pass a state structure only works when the stolen thread reads the state structure but does not write it. Many threads, however, such as the diamond and value check threads, may update the state structure as well as read it. For these threads, just migrating a copy of the structure when stolen would not be sufficient. Instead, these threads are specified to be local threads, which means these threads are not allowed to be stolen. By making these threads local, we can guarantee they update the original state structure. For simplicity, many of the threads in Socrates were made local, and only those that start up a test of a child position are stealable.

Migration threads give the user more flexibility than just running a thread on the victim processor before a thread is migrated. Migration threads can be used to run user code before and after every communication, as is shown in the figure above. Part (A) shows the common case: thread T is not stolen, and only thread T is run. Part (B) shows the case where thread T is stolen. In this case, the system executes the migration thread m1 on the victim processor. This thread can change the continuation passed to T so that, instead of pointing to T's successor, the continuation points to a new closure which will run thread m2. The closure for thread m2 is given the original continuation, so that thread m2 can pass the result from T onto T's successor. The migration thread must return a closure to be migrated; this closure can specify that thread m3, rather than T, be run on the thief processor. When thread m3 executes, it can then run some user code before spawning thread T. As was done on the sending side, m3 can also splice another thread, m4, into the path from T to its successor. Although we have described this process as if T were a thread, the same thing can be done where T is an entire subcomputation. This flexibility to run user code at any of these four points was used in Socrates to implement the abort mechanism described in the next section.

Aborting Computations

In order to implement the Jamboree search algorithm, we must be able to abort a computation. When searching a node, if one of its children exceeds β, then the node fails high, and the search of the node can be ended without searching the rest of the children. If searches of any other children are already in progress, then those searches should be aborted. The Cilk system has no built-in mechanism for aborting a computation, so this had to be added as user code. Our goal in designing the abort mechanism was to keep it as self-contained as possible and to minimize changes to the rest of the code. We wrote the abort mechanism entirely in Cilk code so that it would be able to port easily. Eventually, we would like to add support for such a mechanism to Cilk itself.

In order to abort a computation, we must first be able to find all of the threads that are working on this computation. To implement this, we use abort tables to link together all the threads working on a computation. When a computation, say A, needs to create several children, it first creates an abort table containing an entry for each child of the computation. Each entry in the table keeps track of the status of the children. If a child of A, say B, itself spawns off children, then the entry for B is updated to contain a pointer to the abort table that B creates. Once B and all its children have completed, B's table is deallocated and the entry for B is updated. If a child of A, say C, is stolen, A's entry for C is updated to point to C on its new processor. With this mechanism in place, the abort code is able to find all the descendants of any computation. When performing an abort, the abort code does not actually destroy any threads; instead, it merely makes a mark in each affected thread's abort table. When a user's thread runs, its first action should be to check to see if it has been aborted, and if so, skip the rest of its computation. This check allows the user's code to do any cleaning up that may be necessary. For example, the code may need to free some data structures.

The abort mechanism provides functions to create, update, and deallocate the abort structures, to check if a thread is aborted, and to start an abort. These mechanisms are implemented independently of the search code. By using these functions and passing around a few pointers to abort tables, the search code was modified to include aborting without too many changes.
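The sketch below illustrates one plausible shape for these structures and the check-on-entry pattern described above. All of the type and function names here (AbortTable, AbortEntry, start_abort, abort_requested) are hypothetical; the real Socrates code is not shown in the text.

    /* Hedged sketch of abort tables, not the actual Socrates implementation. */

    typedef struct AbortTable AbortTable;

    typedef struct AbortEntry {
        int         done;       /* child has completed */
        int         stolen;     /* child now lives on another processor */
        AbortTable *subtable;   /* abort table the child created, if any */
    } AbortEntry;

    struct AbortTable {
        int         aborted;    /* set when an abort reaches this computation */
        int         nchildren;
        AbortEntry *entries;    /* one entry per spawned child */
    };

    /* Recursively mark a computation and all its known descendants as aborted. */
    static void start_abort(AbortTable *t)
    {
        int i;
        t->aborted = 1;
        for (i = 0; i < t->nchildren; i++)
            if (t->entries[i].subtable && !t->entries[i].done)
                start_abort(t->entries[i].subtable);
        /* Entries for stolen children would instead send an abort message to
         * the processor that now holds them (not shown). */
    }

    /* Every user thread begins by checking its table and skipping its work if
     * the computation has been aborted, doing any necessary cleanup first. */
    static int abort_requested(const AbortTable *t)
    {
        return t != NULL && t->aborted;
    }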

One difficulty encountered in implementing the abort tables was in keeping the tables correct when a computation migrates. Our implementation uses migration threads, as shown in the previous section, to keep track of the state of the computation. When a computation is stolen, an abort table is allocated on the stealer's side, and the existing abort table is modified to point to it. The difficulty arises because, at the time a computation is stolen, there is not yet an abort table on the stealer's side to point to. This abort table is not allocated until after the thread begins to run. So instead, we create a unique identifier (UID) for each stolen computation and store that into the abort table. Then, on the stealer's side, we have a hash table to map the UID into a pointer to the abort table. The protocol for accessing the hash table is quite tricky, since there are many cases which require special handling. For example, the network of the CM-5 can reorder messages; therefore, we have to handle the case where a message to abort a computation arrives before the thread that will allocate the hash table entry and abort table for that computation. Unfortunately, we did not consider all such possibilities before beginning the design, so getting this mechanism working correctly took longer than anticipated.
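One way to handle the reordering case just described is sketched below: the thief keeps a small map from UID to abort-table pointer, and an abort message that arrives before the table exists simply records a pending abort under that UID. The names and the map representation are assumptions introduced for illustration; they only show the protocol issue, not the code actually used.

    /* Hedged sketch of the UID -> abort-table mapping on the thief processor. */

    #include <stdlib.h>

    typedef struct AbortTable AbortTable;   /* as in the earlier sketch */
    void start_abort(AbortTable *t);        /* as in the earlier sketch */

    #define UID_BUCKETS 1024

    typedef struct UidSlot {
        unsigned long   uid;
        AbortTable     *table;          /* NULL until the stolen thread allocates it */
        int             abort_pending;  /* abort arrived before the table existed */
        struct UidSlot *next;
    } UidSlot;

    static UidSlot *uid_map[UID_BUCKETS];

    static UidSlot *uid_find_or_insert(unsigned long uid)
    {
        UidSlot **head = &uid_map[uid % UID_BUCKETS], *s;
        for (s = *head; s != NULL; s = s->next)
            if (s->uid == uid)
                return s;
        s = calloc(1, sizeof *s);
        s->uid = uid;
        s->next = *head;
        *head = s;
        return s;
    }

    /* Handler for an incoming abort message; it may run before the stolen
     * computation has allocated and registered its abort table. */
    static void handle_abort_message(unsigned long uid)
    {
        UidSlot *s = uid_find_or_insert(uid);
        if (s->table != NULL)
            start_abort(s->table);
        else
            s->abort_pending = 1;   /* noticed as soon as the table is registered */
    }

    /* Called by the stolen thread once it has allocated its abort table. */
    static void register_abort_table(unsigned long uid, AbortTable *t)
    {
        UidSlot *s = uid_find_or_insert(uid);
        s->table = t;
        if (s->abort_pending)
            start_abort(t);
    }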

Our decision to make the abort mechanism as self-contained as possible turned out to be a very good one. The abort code was written back in the early days, when the runtime system was still called PCM. Since becoming Cilk, much of the runtime system was completely rewritten, and many low-level data structures were changed. Yet the abort code continued to work. The only changes we had to make were some minor syntax changes when we changed the language. When we ported Cilk and Socrates to the Paragon, the abort code did not require a single change. Given the difficulty we had getting the original code to work on the CM-5, and given that we did not have access to a debugger on the Paragon, we were quite fortunate to have implemented this code such that it truly was independent of the rest of the system.

Priority Threads

Another change to the Cilk system that was inspired by Socrates was the addition of priority threads. In the search algorithm, there are certain threads that we would like to run as soon as possible, namely the test check and value check threads. These are short threads that receive the result of a search of a child and incorporate that result into the current computation. These threads may update α, so we would like this update to occur right away, so that future searches can use this updated information. Also, these threads may cause a search to be aborted, and we certainly want this to happen right away. In the original system, such a thread T would simply be posted to the bottom of the ready queue, just like every other thread. This thread would then be next in line to execute, but if some other thread was posted before thread T was scheduled, then the execution of T would be indefinitely delayed.

To solve this problem, we added priority threads, which are threads that are posted to a single-level priority queue rather than the standard ready queue. Whenever there is a closure in the priority queue, the closure at the front of the priority queue will be executed next, before any closure in the standard ready queue. This allows the user to specify threads that should be run as soon as possible.
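A minimal sketch of this dispatch rule appears below, assuming hypothetical queue types and pop functions; it only illustrates that the priority queue is always drained before the ordinary ready queue.

    /* Hedged sketch of the dispatch rule for priority threads.  Queue, Closure,
     * the pop functions, and run_closure are assumed names for illustration. */

    typedef struct Closure Closure;
    typedef struct Queue   Queue;

    Closure *queue_pop_front(Queue *q);     /* returns NULL if the queue is empty */
    Closure *queue_pop_bottom(Queue *q);
    void     run_closure(Closure *c);

    static void schedule_next(Queue *priority_q, Queue *ready_q)
    {
        /* Any closure in the priority queue runs before anything in the
         * standard ready queue. */
        Closure *c = queue_pop_front(priority_q);
        if (c == NULL)
            c = queue_pop_bottom(ready_q);   /* local execution is LIFO */
        if (c != NULL)
            run_closure(c);
    }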

It is interesting to note that, with the provably good scheduler, threads like test check and value check must be made priority threads only because Socrates uses nonstealable threads, and nonstealable threads break the guarantees provided by the scheduler. The only time a thread such as test check or value check can be enabled but not executed next is if a send_argument from another node is involved. For example, when a test completes, the only thread it spawns or enables is its test check thread. When the test is done, the test check is placed at the bottom of the ready queue and will normally be the next thread executed locally. Only if, in the meantime, a send_argument arrives from another node and enables some unrelated thread could the test check not be the next to execute. This situation can only happen when the thread enabled by the send_argument is not stealable, because otherwise the thread the send_argument enabled would have been stolen away by a provably good steal. Similarly, if the test is stolen away, it enables the test check thread via a nonlocal send_argument. If the test check thread were not marked nonstealable, it would immediately be stolen back and executed on the remote node.

Priority threads are also used in the implementation of the abort mechanism. As would be expected, they are used to ensure that threads aborting the search are not delayed by threads performing a search.

Steal Ordering

In the original PCM runtime system, the thread queue consists of a single double-ended queue. Newly enabled threads are placed at the bottom of the queue, and the local processor takes work out of the bottom as well (i.e., LIFO). When stealing occurs, threads are stolen from the top of the queue (i.e., FIFO). For a tree-shaped computation, the LIFO scheduling allows the computation to proceed locally in a depth-first ordering, thus giving us the same execution order a sequential program would have. When stealing occurs, however, the FIFO steal ordering causes a thread near the top of the tree to be stolen, so a large piece of work is migrated, thus minimizing stealing. Since Jamboree search is a tree-shaped computation, this mechanism works reasonably well.

With this scheduling mechanism, the order in which children are executed depends on whether or not a child is stolen. For most computations this execution order does not matter, but for Jamboree search it does. Execution order has an effect because, if one child causes a search to fail high, the rest of the children do not need to be examined. Our program orders the children such that, in the common case where no children are stolen, the children believed to be the best moves are searched first. This order is likely to minimize the total work W, since the best moves are the most likely to beat β, and once we beat β we fail high and do not need to search any more children. The problem is that when stealing occurs, we steal the child least likely to cause us to fail high.

We would like to steal from the top of the tree, but still steal the child that is most likely to fail high. Such would be the case if the scheduler had the following natural property: if a thread spawns off n children, those children should be executed in the same order regardless of where the children are executed. To add this property, we had to modify the scheduler by adding the concept of levels.


Figure: This figure shows how the ready queue is implemented. Part (A) shows the original ready queue, which was just a double-ended queue. Part (B) shows the modified, levelized ready queue, which is a queue of FIFOs.

Each thread in the queue is assigned a level, and threads at the same level are executed in a fixed order regardless of whether they are stolen or executed locally. Between levels, however, scheduling is done as before: we execute locally at the deepest (newest) level and steal from the shallowest (oldest) level. The search code then marks all the children of a computation as being one level deeper than the level at which the computation is currently executing. This strategy gives us exactly the ordering of threads that we want. The figure above shows a levelized ready queue. Each level is implemented as a first-in-first-out queue, guaranteeing that closures at the same level are executed in the same order regardless of whether the closures are executed locally or are stolen. Adding this to Socrates reduced the amount of work performed for searching a position and seemed to give a noticeable speedup. This idea seemed important enough that we incorporated this mechanism into the Cilk scheduler.
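The following sketch shows one way such a levelized ready queue could be organized: a stack of per-level FIFOs, with local execution taking from the deepest level and steals taking from the shallowest. The structure and names are assumptions for illustration, not the Cilk scheduler's actual data structures; initialization and level compaction are omitted.

    /* Hedged sketch of a levelized ready queue: an array of FIFOs, one per level. */

    #define MAX_LEVELS 256

    typedef struct Closure Closure;
    typedef struct Fifo Fifo;

    void     fifo_push(Fifo *f, Closure *c);   /* enqueue at the tail */
    Closure *fifo_pop(Fifo *f);                /* dequeue from the head; NULL if empty */
    int      fifo_empty(const Fifo *f);

    typedef struct {
        Fifo *levels[MAX_LEVELS];   /* levels[0] is shallowest (oldest) */
        int   shallowest, deepest;  /* range of levels currently in use */
    } LevelQueue;

    /* Posting: children are placed one level deeper than their parent,
     * in program order, so every level has a fixed execution order. */
    static void post_closure(LevelQueue *q, Closure *c, int level)
    {
        fifo_push(q->levels[level], c);
        if (level > q->deepest) q->deepest = level;
    }

    /* Local execution takes from the deepest (newest) level... */
    static Closure *pop_local(LevelQueue *q)
    {
        int l;
        for (l = q->deepest; l >= q->shallowest; l--)
            if (!fifo_empty(q->levels[l]))
                return fifo_pop(q->levels[l]);
        return NULL;
    }

    /* ...while a thief takes from the shallowest (oldest) level, so the same
     * FIFO order is seen whether a closure runs locally or is stolen. */
    static Closure *pop_steal(LevelQueue *q)
    {
        int l;
        for (l = q->shallowest; l <= q->deepest; l++)
            if (!fifo_empty(q->levels[l]))
                return fifo_pop(q->levels[l]);
        return NULL;
    }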

Level Waiting

The final change we made to the scheduler was a further attempt to reduce the extra work being performed by the parallel version. When a processor is searching a board position P, it spawns off a bunch of children to test. If a processor runs out of children to work on while other children are still being worked on elsewhere, then that processor steals another closure and begins working on the stolen closure.

Consider the case where one or more of the children is stolen and the processor finishes the rest of the tests before the test of the stolen child completes. The processor may then be out of work to do. (The processor is likely to be out of work because none of the children at this level would have been stolen if there were any work earlier in the queue.) This processor then steals some closure from another processor and begins searching its board position, call it Q. Eventually the test of the stolen child completes. When this result comes back, it restarts the computation on position P and preempts Q. (Or, depending on the level of each search, Q may preempt P.) We are now in a position where Q, no matter how little work it has, cannot complete until the arbitrarily long computation of P completes. Meanwhile, the computation which spawned Q continues. It may eventually block waiting for Q and thereby artificially lengthen the critical path, or it may be able to continue, but will search using looser bounds than if Q had completed, and will thereby increase the total work.

To avoid this stalled work, we further modified the scheduler. We added level waiting, a feature which makes use of the same levels that were described in the previous subsection for optimizing the steal ordering. When a computation spawns children, all the subcomputations are placed at the same level. The level-waiting mechanism simply requires all of these subcomputations to have completed before we may begin any work at a shallower level, or before we can steal. This strategy prevents us from starting and then preempting an unrelated search. Implementing this change seemed to give us a modest speedup.

We made this change when we wrote the first version of the chess code, running on the CM-5 under PCM. This version did not have the provably good scheduler that is a part of Cilk. This modification is actually closely related to the busy-leaves property described earlier. At the beginning of this section, we described the situation we were concerned with, namely a case where a search of position Q was preempted in order to continue working on the search of an unrelated position P. In this example, the search of Q was effectively a nonbusy leaf. Using the provably good scheduler, such a nonbusy leaf cannot occur. When the search of P is reenabled, the processor that enables the search of P would steal that search and continue working on it, so P would not interfere with the search of Q. Since the situation that level waiting prevents cannot occur with a provably good scheduler, the level-waiting change was removed from the scheduler when we made the scheduler provably good. The level-waiting modification was used during the tournament, but not afterwards.

Removing this change probably slightly hurt the performance of future versions of Socrates. In order to obtain the guarantees provided by the provably good scheduler, a program must obey certain constraints. One of these is that the scheduler must be able to migrate any thread to any processor. But, as mentioned earlier, many threads are marked nonstealable, which forces them to stay on the same processor on which they were created. Since not all threads can be migrated, the busy-leaves property no longer holds, and therefore the situation described above can still occur. Eventually, we expect to rewrite the Socrates search code such that the performance guarantees of the scheduler do apply. Such a rewrite will put this concern to rest.

Other Chess Mechanisms

The previous section described issues that arose in getting the search routines to run in Cilk, many of which led to changes in the Cilk system itself. This section describes other aspects of the serial code that had to be modified to run in a parallel system. These aspects include the transposition table, detecting repeated moves, and debugging support.

Transposition Table

Most serial chess programs include a transposition table, which is basically a hash table of previously evaluated positions. After a position is searched, we create or update a hash entry for this position. The information stored in this entry includes a depth, a score, a move, and a check key. The depth tells us how deep a search was done, the score tells us the value of the position when searched to that depth, the move tells us what move achieves this score, and the check key is used for differentiating between two positions which hash to the same entry. Each position has a hash key. Part of this key is used to decide what hash entry should be used for the key, and part of the key is stored in the hash entry as the check key, to distinguish between the many positions which may hash into the same entry.
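A minimal sketch of such an entry, and of splitting the hash key into an index plus a check key, is given below. The field widths, names, and table size are assumptions, not the actual Socrates layout.

    /* Hedged sketch of a transposition-table entry and of splitting a position's
     * hash key into a table index plus a check key.  Widths are illustrative. */

    #include <stdint.h>

    #define TABLE_SIZE (1u << 20)           /* number of entries (assumed) */

    typedef struct {
        uint32_t check_key;   /* upper bits of the hash key, to detect collisions */
        int16_t  score;       /* value of the position at the stored depth */
        int8_t   depth;       /* depth of the search that produced the score */
        uint8_t  best_move;   /* encoded best move found by that search */
    } TTEntry;

    static TTEntry table[TABLE_SIZE];

    static TTEntry *tt_slot(uint64_t hash_key)
    {
        /* Low bits choose the entry; high bits are remembered as the check key. */
        return &table[hash_key & (TABLE_SIZE - 1)];
    }

    static int tt_matches(const TTEntry *e, uint64_t hash_key)
    {
        return e->check_key == (uint32_t)(hash_key >> 32);
    }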

Before searching a position, we first check to see if it is already present in the transposition table with a depth greater than or equal to the depth to which we need to search. If so, then we have a score for this position, and we need not search further. Much of the time, when we find a position in the transposition table, the depth is not sufficient for the current search. But even in this case the table is still useful, because it tells us the best move found by a shallower search, and often the best move at a shallower depth is still the best move when searched to a deeper depth. By using the move stored in the hash table entry as our predicted best move, we increase our chances of accurately predicting the best move, which, as we saw earlier, greatly reduces the work and critical path of the computation.
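The probe logic just described might look roughly like the following, reusing the hypothetical entry layout from the previous sketch; again, this is an illustration rather than the code used in Socrates.

    /* Hedged sketch: probing the table before a search.  Reuses the TTEntry,
     * tt_slot, and tt_matches names assumed in the previous sketch. */

    #include <stdint.h>

    typedef struct {
        int found_score;   /* nonzero: score is valid for this depth; skip the search */
        int score;
        int best_move;     /* 0 if no move hint is available */
    } TTProbe;

    static TTProbe tt_probe(uint64_t hash_key, int depth)
    {
        TTProbe r = { 0, 0, 0 };
        TTEntry *e = tt_slot(hash_key);

        if (!tt_matches(e, hash_key))
            return r;                      /* a different position lives here */

        if (e->depth >= depth) {           /* deep enough: the stored score is usable */
            r.found_score = 1;
            r.score = e->score;
        }
        r.best_move = e->best_move;        /* always useful as a move-ordering hint */
        return r;
    }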

For Socrates, we implemented a distributed transposition table in which entries were hashed across all the processor memories. When a thread begins a search of a position, the first thing it typically does is to look up that position in the transposition table. We had a choice between implementing a blocking or a nonblocking interface to the table. In a blocking implementation, the thread performing the lookup sends off a lookup request to the appropriate processor and busy-waits until the response arrives, at which point the thread can continue. The obvious disadvantage of blocking is that we waste time busy-waiting. In a nonblocking implementation, we break this thread into several threads. When the time comes to do a lookup, a thread is posted on the processor that holds the entry. This thread performs the lookup and sends the result back to the original processor, which enables a thread that continues the search. This implementation has the advantage that no time is spent busy-waiting during a table lookup. But it has one big disadvantage, namely that it may lead to many searches taking place on the same processor concurrently. Intermixing two or more searches on the same processor can cause both the work and the critical path to increase. To avoid these increases, we would have had to modify the scheduler to keep the two computations separate. To avoid the complexity involved in such a modification, we chose to implement a blocking transposition table.
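A rough shape for the blocking lookup is sketched below. The message-sending and polling calls (send_lookup_request, poll_network, key_owner) are hypothetical placeholders, not the API of the active-message library that was actually used; the point is only that the requesting thread keeps polling the network while it waits, so incoming messages are still serviced.

    /* Hedged sketch of a blocking remote lookup with network polling while
     * busy-waiting.  The communication calls are assumed names. */

    #include <stdint.h>

    typedef struct {
        volatile int done;     /* set by the reply handler */
        int          score;
        int          depth;
        int          best_move;
    } LookupReply;

    void send_lookup_request(int owner, uint64_t key, LookupReply *reply); /* assumed */
    void poll_network(void);                                               /* assumed */
    int  key_owner(uint64_t key);                                          /* assumed */

    static LookupReply blocking_tt_lookup(uint64_t key)
    {
        LookupReply reply;
        reply.done = 0;

        send_lookup_request(key_owner(key), key, &reply);

        /* Busy-wait, but keep polling so arriving messages (including other
         * processors' lookup requests) are handled while we wait. */
        while (!reply.done)
            poll_network();

        return reply;
    }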

Since there is no way to implement this blocking mechanism using Cilk primitives, we dropped to a lower level and used the Strata active-message library [BB]. We designed the transposition table such that all accesses are atomic. For example, when a value is to be put into the table, the information about the position is sent to the processor where the entry resides, and that processor updates the entry as required. Alternatively, we could have implemented a nonatomic update by performing a remote read of the entry, modifying the entry, and then doing a remote write. Nonatomic updates would have required more messages, and would have required us to lock the entry while the update was in progress, or risk losing some information if two update operations overlapped.

To determine how much the busy-waiting hurts us, we instrumented our code to measure the time spent busy-waiting. Our experiments showed us that the mean time to do a complete lookup worked out to only a small percentage of the execution time. Not all this time is wasted, however: while busy-waiting we poll the network, so we may spend part of this time responding to arriving messages. But our analysis gives us an upper bound on the cost of busy-waiting.

When we ported the code to the Paragon, this lookup time increased, since the Paragon has a larger overhead for using the network. To reduce the impact of doing a global lookup, we do other work while waiting for a lookup to return. In particular, after starting a lookup, we perform a static evaluation of the position. This work may be wasted: for example, the lookup may find a valid score, in which case the search is complete and the static evaluation of the position is not needed. But lookups find valid scores only a small fraction of the time, and so usually this work is not wasted.

The last aspect of the transposition table we examine is subsumptions. The issue is: what, if anything, do we do if two independent searches are concurrently searching the same position, i.e., one search subsumes the other? For example, one processor begins a search of position B, and before it completes and writes its result into the hash table, another processor begins another search of position B. In this situation, part of the search is being duplicated. In serial code these searches are performed sequentially, so this problem does not occur.

We considered trying to avoid this overhead in the following manner. When a search begins, if the transposition table lookup fails, an entry is created for that position and it is marked as "search in progress." Then, if another lookup occurs on this position, we know that a search is already being done. We would then have the option of waiting for the earlier search to complete.

We chose not to implement this mechanism, in part because implementing it would have been somewhat complicated. Moreover, it raised several issues of which we had no clear understanding. For example, when we are about to abort a search, is it necessary to first check to see if anyone else is waiting for the results of this search? And if someone is waiting, do we still abort the search? Another unanswered question involves how to decide when to wait. If a position is already being searched to a greater depth than the depth to which we want to search it, do we wait for the deeper search? If we don't wait, we are doing extra work, and if we do wait, we may wait much longer than if we had just done the search ourselves. In order to estimate how much duplicate work was being performed, we instrumented our program in the following way. Each time we completed a search and were about to write the hash table entry, we first did a hash table lookup to see if we would get a hit if we began the search now. If so, then someone else must have completed a search of this node during the time since we began the search. We found that this occurred only a small fraction of the time. This led us to believe that subsumptions were not causing us to waste a significant amount of work. Furthermore, we had implemented a similar mechanism for the earlier StarTech program, and it sometimes sped the program up and sometimes slowed it down. Consequently, we decided not to implement this mechanism for Socrates.

Repeated Moves

To fully describe a position in a chess game, we need more than just a description of where each piece is on the board. Some history is needed as well. For example, we need to know whether or not the king has moved. If it has, then we cannot castle, even if the king moves back to its original position. This sort of information can easily be stored in a few bits in the state, so maintaining it causes no difficulty.

Other required history cannot be stored so easily. In chess, if the same position is repeated three times, then the game is a draw. Similarly, if 50 moves are made by each player without an irreversible move being made, the game is a draw. (An irreversible move is one which cannot be undone, that is, one which captures a piece or moves a pawn.) To handle these cases, we need to keep track of all moves since the last irreversible move. Once an irreversible move is made, no earlier position can be repeated. We keep track of these moves by adding an array of positions to our state structure. This array contains all the positions, represented by their hash keys, since the last irreversible move.

This array greatly increases the size of the state structure. For a serial program, the size of the state may not be significant, since the code can just modify and unmodify the same state structure. For parallel code, however, it is often necessary to make copies of the state, and so a large state can slow down the program. To reduce the overhead of copying the state structure, we copy only the meaningful part of the repeated-position array. Since the average length of this portion of the list is quite small, this copying adds very little overhead.
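The copying optimization amounts to copying only the occupied prefix of the repeated-position array, as in the sketch below; the structure layout and sizes are assumptions used only for illustration.

    /* Hedged sketch: copy only the meaningful prefix of the repeated-position
     * array when duplicating the chess state.  Layout and sizes are assumed. */

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define MAX_REVERSIBLE 100   /* positions since the last irreversible move */

    typedef struct {
        /* ... piece placement, castling rights, side to move, etc. ... */
        int      num_repeats;                /* occupied prefix length */
        uint64_t repeats[MAX_REVERSIBLE];    /* hash keys of prior positions */
    } ChessState;

    static void copy_state(ChessState *dst, const ChessState *src)
    {
        /* Copy the fixed-size part of the structure up to the repeat array,
         * then copy only the occupied prefix of the array itself. */
        memcpy(dst, src, offsetof(ChessState, repeats));
        memcpy(dst->repeats, src->repeats,
               (size_t)src->num_repeats * sizeof src->repeats[0]);
    }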

Debugging

One difficulty in writing a parallel application of this complexity is debugging. Debugging is especially difficult for nondeterministic applications such as Socrates. In order to make it easier to debug our code, we make liberal use of assert statements. When debugging is turned on, assert statements ensure that conditions the programmer expected to be true are in fact true. Not only does this methodology cause bugs to be detected sooner, it also helps pinpoint the cause of the bug.

Initially, one of our biggest problems was making sure that the parallel version was working correctly. Most of the code is shared between the parallel and serial versions of the program. Part of the code, in particular the search algorithm, is not. The search algorithm includes not only the basic Jamboree search as described earlier, but also many chess-specific heuristics with which we are often experimenting. We were often modifying both the parallel and the serial search algorithms, and keeping them consistent was quite error prone. To test if the versions are equivalent, simply running the parallel and serial versions of the code and comparing the results is inadequate, since if the parallel version was only slightly different from the serial version, the two versions would still usually produce the exact same answers. One method we occasionally used in order to test whether both versions were identical was to run the parallel code on one processor, run the serial code, and make sure they both searched exactly the same number of positions. Unfortunately, we did not always do this check often enough, and at one point so many minor variations had crept in that we wound up spending almost a week trying to make both versions consistent again.

One of the most useful assertions we added was to check, at every node of the tree, that the results of the parallel code were equivalent to the results of the serial code. In the debugging version of the code, after the search of a position was complete, we call the serial code on the same position and compare the results. We turned the hash table off, since otherwise the serial code simply finds the result in the hash table. Since the program is nondeterministic, we do not require that the results are identical. Instead, we ensure that the returned scores are either both below alpha, both above beta, or identical if between alpha and beta.


Figure: The chess positions used in this chapter. Below each position is shown Kaufman's correct move for that position. All positions are White to move, except for Position (f).

Due to the large amount of duplicated searching, the resulting code was extremely slow. But this version was used only for debugging, and was an easy way to detect any differences between the serial and parallel searches and to pinpoint exactly where the differences lay. After we started using this check, keeping both versions identical became much easier. We think this debugging strategy is applicable to many parallel programs, and not just chess.
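In outline, the per-node check works as sketched below; the serial_search call and the score handling are hypothetical names for illustration, and the real code of course also disables the transposition table while checking.

    /* Hedged sketch of the per-node debugging check: after a parallel search of
     * a position completes, re-search it serially and require the two scores to
     * agree up to the (alpha, beta) window. */

    #include <assert.h>

    typedef struct Position Position;
    int serial_search(Position *p, int alpha, int beta, int depth);   /* assumed */

    static void check_against_serial(Position *p, int alpha, int beta,
                                     int depth, int parallel_score)
    {
        int serial_score = serial_search(p, alpha, beta, depth);

        if (parallel_score <= alpha)            /* fail low: both must fail low */
            assert(serial_score <= alpha);
        else if (parallel_score >= beta)        /* fail high: both must fail high */
            assert(serial_score >= beta);
        else                                    /* inside the window: must match */
            assert(serial_score == parallel_score);
    }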

Performance of Jamboree Search

Jamboree search is difficult to analyze for arbitrary game trees, because it is difficult to characterize the tree itself, and the tree that is actually searched can depend on how the work is scheduled. Unlike many other applications, the shape of the tree traversed by Jamboree search can be affected by the order of the execution of the work, sometimes increasing the work and sometimes decreasing it. Thus, measurements of critical path length and work on a particular run may be different than the measurements taken on another run, because the trees themselves are different. Although it is not clear precisely what critical path and work mean for the speculative Jamboree search, we have found that we can still use the measured critical path length and total work to tune the program.

Our strategy is to measure the critical path and the work on a particular run, and to try to predict the performance from those measurements. We measured the program on the eight problems shown in the figure of chess positions given earlier. These problems were provided by Socrates team member L. Kaufman, who is an International Master. For each problem, the program was run to various depths, up to those that allowed the program to solve the problem by getting the correct answer as identified by Kaufman. We also measured the program running on a variety of different machine sizes. Then we performed a curve fit of the data to a performance model of the form

    T_predicted  =  c_1 · (W / P)  +  c_∞ · T_∞ ,

where W is the total work, T_∞ is the critical path length, P is the number of processors, and c_1 and c_∞ are the fitted coefficients.

As described in the previous chapter, we found that the performance is accurately modeled by this form: the fit has a high correlation coefficient and a small mean relative error, and both fitted coefficients are low. To us, these tight error bounds were quite amazing, because chess is a very demanding application. Clearly, there are times during the Jamboree search algorithm when not much parallelism exists. The low coefficients indicate that the program quickly finishes the available work during the times of low parallelism, and when there is much parallelism the program efficiently balances the workload.
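As a worked illustration of how such a model is used to extrapolate, the small program below evaluates the model for a range of machine sizes; the particular coefficient values and measurements used are made-up placeholders, not the fitted values from the thesis.

    /* Hedged sketch: evaluating the two-term performance model
     *   T_predicted = c1 * (W / P) + cinf * Tinf
     * for different machine sizes P.  The numbers in main() are hypothetical. */

    #include <stdio.h>

    static double predicted_time(double work, double critical_path,
                                 double p, double c1, double cinf)
    {
        return c1 * (work / p) + cinf * critical_path;
    }

    int main(void)
    {
        double work = 1.0e12, critical_path = 1.0e8;   /* example measurements */
        double c1 = 1.1, cinf = 1.5;                   /* hypothetical coefficients */
        int p;

        for (p = 32; p <= 2048; p *= 2)
            printf("P = %4d  predicted T = %.3e\n",
                   p, predicted_time(work, critical_path, (double)p, c1, cinf));
        return 0;
    }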

We found that the work increases by a modest factor as the number of processors increases, and that the critical path length is fairly stable as the number of processors increases. Most of the difficulty of predicting the performance of the chess program comes from the fact that the amount of work is variable. When the program is run on large machines, the processors end up expanding subtrees that are pruned in the serial code. We found that the better the move ordering, the lower the critical path and the less total work is performed. Thus, the move-ordering heuristics of a chess program, which are important for serial programs because they reduce the work, are doubly important for our parallel algorithm because they also decrease the critical path length.

We also found that the critical path does not limit the speedup for our test problems, or for the program running under tournament conditions. By using critical path to understand the parallelism of our algorithm, we are able to make good tradeoffs in our algorithm design. Without such a methodology, it can be very difficult to do algorithm design. For example, Feldmann, Monien, and Mysliwietz found themselves changing their Zugzwang chess program to increase the parallelism without really having a good way to measure their changes [FMM]. They express concern that, by serially searching the first child before starting the other children, they may have reduced the available parallelism. Our technique allows us to state that there is sufficient parallelism to keep thousands of processors busy without changing the algorithm. We can conclude that we should try to reduce the total amount of work done by the program, even if it reduces the available parallelism slightly.

Being able to measure the work and critical path of a run was instrumental to our ability to tune the program. We experimented with some techniques to improve the work efficiency, and found several techniques that improve the work efficiency at the expense of increasing the critical path length. For example, on StarTech we considered an algorithm change that would value the first two children before starting the parallel tests of all the remaining children. The idea is that, by valuing more children before spawning the parallel tests, it becomes more likely that we will be able to prune some of the remaining children. When we measured the runtime on a small machine, the program ran faster, but on a big machine the runtime actually got worse. To understand why, we looked at the work and critical path length. We found that this variant of Jamboree search actually does decrease the total work, but it increases the critical path length, so that there is not enough available parallelism to keep a big machine busy. By looking at both the critical path length and the total work, we were able to extrapolate the performance on the big machine from the performance on the little machine. Consequently, we avoided introducing modifications that would hurt us in tournament conditions.

History of Socrates

We conclude this chapter by giving a brief history of the Socrates program.

We began work on this program in May of 1994. Don Dailey and Larry Kaufman of Heuristic Software provided us with a version of Socrates, their serial chess program. During May and June we parallelized the program using Cilk, focusing mainly on the search algorithm and the transposition table. During June, Dailey visited MIT to help tune the program, but we spent most of June simply getting the parallel version of the program to work correctly. In late June we entered Socrates in the 1994 ACM International Computer Chess Championship in Cape May, New Jersey. We ran the program on the 512-node CM-5 at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. Despite the fact that we had begun working on the program less than two months earlier, the program ran reliably and finished in third place.

The chess program then sat pretty much untouched for the next several months. In March 1995, Don Dailey again joined us to work on the chess portion of the code, and we started preparing for the May 1995 tournament. The tournament was the World Computer Chess Championship, held in Hong Kong. The World Computer Chess Championship is held every three years and, unlike the 1994 tournament, which fielded mostly teams from the US, this tournament attracts the best computer chess systems from around the world. We had our work cut out for us.

For this tournament, we were able to get access to the 1824-node Intel Paragon [Int] at Sandia National Labs. We began by porting Cilk to the Paragon, with the help of Rolf Riesen of Sandia. The port of Cilk was fairly easy, although in the process we exposed several bugs in the Paragon's SUNMOS operating system. Parts of the original Socrates code, in particular the transposition table, made direct use of the CM-5 communication hardware. These portions of the code had to be rewritten. However, most of the code, including the abort mechanism, worked without modification, and in short order we had a working chess program.

Our first test of the program came in late March when, at the Maryland Theory Day, we played International Grandmaster Gennady Sagalchik, one of the highest-rated players in the US. The game was played under standard tournament time controls. Sagalchik drew White and took the early lead in the game. Socrates played well, came back, and eventually gained an advantage. Eventually Socrates promoted its pawn to a queen, and Sagalchik lost shortly thereafter. After this game, an informal speed game was played, and Socrates won again.

Although Socrates won the game, we did have a serious problem during it: the machine crashed several times during the early part of the match. Fortunately, Sagalchik graciously allowed us to restart each time without losing any time on our clock, so the crashes did not significantly hurt our performance in the game, although the stops probably did hurt his concentration. At least one of the crashes was caused by bad hardware. The rest appeared to be software problems due to incorrect settings of some OS parameters, and changing some of these parameters seemed to solve the problem.

Two months later, we competed in the World Computer Chess Championship. In a surprising pairing, we played IBM's Deep Blue Prototype, the heavy favorite, in the first round of this five-round tournament. Deep Blue was White and took the early lead. Socrates held on for a while, but eventually succumbed to Deep Blue. At this point, we figured we needed to win all four of our remaining games in order to have a chance for second place. We won the next three games fairly easily, defeating the programs Dark Thought, Lchess, and Rebel.

Going into the final round, there were four programs left with a chance to win. Deep Blue was the leader, with Socrates and Fritz a half point behind, and Hitech a further half point behind. Fritz was running on a standard Pentium, while the other three had much more powerful hardware: Hitech uses special-purpose chess hardware, while Deep Blue has parallel special-purpose chess hardware. In a very surprising upset, Fritz quickly defeated Deep Blue. According to D. Beal, in a description of the tournament in [Bea], Deep Blue's opening book ended one move too soon, and their first move out of the book was a terrible blunder. Given Deep Blue's loss, a win in our game against Hitech would leave us tied for first. As seemed to happen in many of our games, we got off to a bad start, and Hitech had the early advantage. For a while it looked pretty grim. But we were outsearching Hitech by several ply, and eventually this advantage began to show, as Socrates's evaluation of our position started to improve slightly on each move. Eventually we gained a big advantage, and Hitech resigned.

This left us in a tie for first place. Shortly after our Hitech game ended that evening, we began a tiebreak game against Fritz, which lasted into the early morning. We drew White for this playoff game, but again got off to a poor start, and after the opening Fritz had a distinct advantage. For a large number of moves, Fritz's advantage stayed fairly constant, with an unusually large number of each side's moves being predicted by the opponent. However, as the endgame approached, Fritz began to take advantage of its edge and was able to start pushing its pawn towards promotion. After it was apparent to both programs that the pawn could not be stopped, we resigned.

Although we lost the playoff, we did finish a respectable second. Our program ran reliably throughout the tournament, with the only crashes being due to memory ECC errors. One area for improvement to Socrates that this tournament pointed out is our opening play, as we fell behind early in several of our games.

The Socrates program has been an exciting program to work on, and it has met the goals we had in mind when we first decided to work on a computer chess program. Socrates has allowed us to showcase the performance, stability, expressibility, and portability of the Cilk system. In addition, Socrates has helped us improve the Cilk system by pointing out several useful modifications to the system. The Socrates program also pointed out to us that programming in continuation-passing style was not as easy as we had first thought. The pseudocode for the Jamboree search algorithm takes one procedure and only twenty lines. The Cilk code, however, utilizes over a dozen different threads to implement the search code, increasing to almost two dozen if the code for aborting computations is included as well. Our experience with Socrates helped bring the issue of programmability to the forefront. The next chapter, as well as a later chapter, deals with this issue directly.

Chapter

Cilk Programming at a Higher Level

In this brief chapter, we look at some modifications to the Cilk language and runtime system intended to make Cilk programs easier to write.

The PCM and Cilk systems were successful in part because they took the base message-passing system and allowed the user to program at a higher level. The user can write his program in terms of threads, and the system takes care of all the details and protocols needed to execute a multithreaded program on the underlying machine architecture. The Cilk style of coding, where the user explicitly creates and wires together threads, offers the programmer much flexibility. But this flexibility comes at a price, namely the difficulty of writing such codes. Users must write their codes in an explicit continuation-passing style, and a single sequential C procedure may need to be broken into many separate threads to express it in Cilk. Although some find this a natural way to program, many find it confusing. Even when the code is straightforward, it is still often tedious to write and read such codes.

This chapter describes an extended version of Cilk that adds simple language constructs allowing users to write codes at a higher level. To make writing Cilk programs easier, we wanted to relieve the programmer of the task of writing a program in continuation-passing style. Just as PCM raises the level of programming and hides the details of the lower-level message-passing system, we wanted to raise the level further and hide the details of the thread-based runtime system. Of course, we also wanted to do this without destroying the performance guarantees that Cilk provides. We attempted to choose a set of higher-level primitives that would allow a wide range of programs to be expressed. We appear to have been successful, as these language extensions have been sufficient for us to rewrite most of our Cilk applications in the new, higher-level style. The only exception is the chess program, which, due to its speculative nature, cannot be expressed in the higher-level style without resorting to the lower-level Cilk mechanisms. The extended language includes all the original Cilk mechanisms, so chess, as well as any other Cilk code, can still be expressed in the new system. However, when we talk about code written in the higher-level style, we mean code written using only the new higher-level primitives. So we say that chess cannot be written in the higher-level style, even though it can be written in the new system. In a later chapter, we examine further enhancements that allow speculative applications such as chess to be written at a higher level.

Although we wanted to raise the level of programming so that users need not deal with threads, we still wanted to have an explicitly parallel language, where the user must explicitly specify what can be done in parallel. We did not want to design an implicitly parallel language, where the system or compiler would try to deduce what can be run in parallel. The Cilk system may be a good target language for a parallelizing compiler, but doing a good job at building such a compiler is a much more difficult task than we were planning to undertake. Another part of the reason we did not want an implicitly parallel language was because we believe that, to write an efficient parallel program, the user must think about parallel algorithms. Having to explicitly specify what can be done in parallel is one way to force a user to think this way. Even when using an implicitly parallel system, a programmer must still have a good understanding of which portions of his code will run in parallel. If we had tried to build an implicitly parallel system and did only a halfway decent job, then it might often be unclear to the programmer what would be executed in parallel. This would result in a system much harder to program than an explicitly parallel system.

In this chapter, we first look at a simple algorithm expressed in the original Cilk language in order to highlight some of the difficulties of programming in that style. Then we describe the new language constructs and their implementation. We then revisit the example algorithm and see how the extensions make the example algorithm much easier to express.

The system described in this chapter represents joint work with other members of the Cilk team: Robert Blumofe, Matteo Frigo, Bradley Kuszmaul, Charles Leiserson, Rob Miller, Keith Randall, and Yuli Zhou. I was involved in making the design decisions described in this chapter, but much of the implementation was done by others. Rob Miller implemented the Cilk-to-C preprocessor, and most of the runtime system modifications were made by Matteo Frigo and Keith Randall.

A Cilk Example: Knary

To highlight some of the difficulties of programming in Cilk, consider the knary program that was introduced in an earlier chapter. Knary was introduced as a synthetic benchmark whose parameters could be set to produce a variety of values for work and critical path. knary(n,k,r) generates a tree of branching factor k and depth n in which the first r children at every level are executed sequentially and the remainder are executed in parallel. At each node of the tree, the program runs an empty for loop for a fixed number of iterations, simulating the work that would be done in an actual program. The knary program counts the number of leaves of this tree, so the program is, in effect, a complicated way to compute a power of k. There are faster ways to compute powers of k, of course, but we are interested in this program because of its control structure. At a high level, the control structure of this program is reminiscent of the chess program, where, when searching a position, we first do some recursive searches one at a time and then spawn off a bunch of searches in parallel.

The description of this program is quite simple, and we would hope that its implementation would be also. This program could be written in many ways, but we have chosen to present the code as it was originally written for the performance tests presented in an earlier chapter and first reported in [BJK]. As such, this code does not necessarily represent the best way of implementing knary, but it shows how an experienced Cilk programmer quickly coded up this application.

The figure below, which is split into two parts, shows the knary code. Four different threads were used in this version of knary. The first is the knary thread, which performs some imitation work and then spawns off knary_serial, which performs NumSerial serial subcalls, and knary_parallel, which performs NumParallel parallel subcalls. Notice that, as is common in continuation-passing style code, knary performs its two spawns in the reverse of the order in which the threads will execute. The routine knary_parallel, which is the second to be executed, must be spawned first, so that when knary_serial is spawned it can be told where to pass its results. The thread knary_serial calls itself repeatedly in order to spawn off the next serialized call and to sum the results from the previous call. The thread knary_parallel enters a loop to spawn off the parallel calls of knary and chains the results of these calls through sum in order to sum the results.

It is not important for the reader to understand all the details of this implementation. What is important to notice is that, although the description of knary is quite simple, the implementation is clearly not. We should further point out that this implementation cheats slightly, in that it takes advantage of the fact that knary is called as a stand-alone program and not as part of a larger program. This implementation sets the values of NumSerial and NumParallel in the main routine (not shown), which executes on all processors when the application begins. The program should really be written so that these values are passed as arguments to the knary thread. Although passing NumSerial and NumParallel could easily be done, these values would also need to be passed through the knary_serial and knary_parallel threads, further increasing the argument count of these threads and making the code even harder to read.

Part of the reason that Cilk code can appear so convoluted is that threads are not easily composable. Consider what happens to the knary code if we want to perform real work at the beginning of each knary thread, instead of just having imitation work. Assume there is some function work that performs some work and returns a value indicating whether this call of knary should return immediately or should continue as usual. Again, this control is similar to part of the chess search algorithm. If work is a standard C procedure, this change is easy to make: we just replace the empty loop in knary with a call to work, followed by a test to see if we should perform a send_argument and return. If work is not a C procedure but is a Cilk thread, then this change is more complex. The knary thread must now be broken into two threads. The first thread does a spawn_next of the second and then spawns the work thread. The second thread receives the result of the work thread and, depending on this result, either returns a value or continues as the original knary thread does.

    /* knary(n, k, r):
     * This code is a recursive program, similar to fib, that spawns a tree
     * of depth n and branching factor k and counts all the leaves
     * (i.e., it computes a power of k).
     * The first r of the subtrees are spawned serially;
     * the rest of the subtrees are spawned in parallel.
     * These two values are set on all processors by the main routine (not
     * shown) according to the program inputs. */
    int NumSerial;       /* set to r     */
    int NumParallel;     /* set to k - r */

    /* Computes knary of depth depth. */
    thread knary (cont k, int depth)
    {
        cont serial_result;
        /* Do some work in each thread (imitation work: an empty loop). */
        int i;
        int dummy;
        for (i = 0; i < ExtraWork; i++) dummy = i;

        if (depth < 2)
            SendWordArgument(k, 1);
        else {
            spawn_next knary_parallel(k, depth, ?serial_result, NumParallel);
            spawn_next knary_serial(serial_result, depth, 0, 0, NumSerial);
        }
    }

    thread sum (cont k, int x, int y)
    {
        SendWordArgument(k, x + y);
    }

    /* (continued on next page) */

Figure: Cilk-1 code for knary.

    /* Calls knary num_serial_left times.
     * Each subcall is spawned serially.
     * Returns the sum of all the calls. */
    thread knary_serial (cont k, int depth, int result1, int result2,
                         int num_serial_left)
    {
        int result = result1 + result2;
        if (num_serial_left == 0)
            SendWordArgument(k, result);
        else {
            cont child_result;
            spawn_next knary_serial(k, depth, ?child_result, result,
                                    num_serial_left - 1);
            spawn knary(child_result, depth - 1);
        }
    }

    /* Calls knary num_left times.
     * All calls are spawned in parallel.
     * Returns the sum of all the calls.
     * Rather than spawning off num_left sums, we could use a single thread
     * to sum all the results.  Instead, this method was chosen because
     * it performs the same number of spawns as knary_serial. */
    thread knary_parallel (cont k, int depth, int result_from_ser, int num_left)
    {
        int i;
        cont cleft, cright;
        if (num_left == 0)
            SendWordArgument(k, result_from_ser);
        else {
            for (i = 1; i < num_left; i++) {
                spawn_next sum(k, ?cleft, ?cright);
                spawn knary(cleft, depth - 1);
                k = cright;
            }
            spawn_next sum(k, ?cleft, result_from_ser);
            spawn knary(cleft, depth - 1);
        }
    }

Figure (continued): Cilk-1 code for knary.

The Cilk-2 System

The goal when designing Cilk-2 was to allow the user to program in the traditional call/return style while still using the same efficient runtime system of threads wired together in a continuation-passing style. The natural way to implement such a system is to specify a set of higher-level parallel constructs which a preprocessor can translate into Cilk-1-style threaded code. These higher-level constructs need to be powerful enough to express most existing programs without resorting to writing threaded code. At the same time, these constructs need to have a relatively straightforward transformation into threaded code.

In order to perform such a translation, we needed a more sophisticated preprocessor than the simple macro preprocessor originally used with Cilk. What we implemented was a new type-checking preprocessor for Cilk. This preprocessor was implemented by Rob Miller and is described in more detail in [Mil]. The preprocessor is based on C-to-C, a tool which parses a C program and turns it into an abstract syntax tree (AST). C-to-C then performs type checking and dataflow analysis on this AST, and then uses the AST to regenerate a C program. C-to-C was extended to create Cilk-to-C. Cilk-to-C parses a program written in Cilk and creates an extended Cilk AST which describes the program. (A Cilk AST can include constructs not allowed in a C AST.) Cilk-to-C then performs type checking and other analysis on this Cilk AST, annotating the nodes of the tree with extra information. Cilk-to-C then transforms the Cilk AST into a pure C AST, removing any Cilk-specific constructs by replacing them with calls to Cilk runtime-system primitives. After this phase, the AST can be used to generate a C program.

An alternative to building a preprocessor was to implement a full-blown Cilk compiler. We chose the former option because building our own compiler would have been a much larger task and would have resulted in a system that was less portable. The Cilk-to-C preprocessor allows us to do much of what we would want a compiler to do, without the drawbacks of building a compiler.

The first version of Cilk-to-C was written for the Cilk-1 language, and it replaced the original macro preprocessor. This new type-checking preprocessor benefits the Cilk programmer in two ways. First, Cilk-to-C detects errors, particularly type errors, that the original preprocessor cannot detect. With the original preprocessor, some of these errors are detected by the C compiler, so the error messages refer to the processed code instead of to the user's source code. Since the user may be unfamiliar with the processed code, the error may be difficult for the user to understand. Other type errors were not detected at all by the original Cilk system and would lead to an incorrect program. Since Cilk-to-C performs type checking, not only is it able to detect these errors, but it can point to the location in the original source code where the errors occurred, thus making the errors much easier to fix. The other benefit of a type-checking preprocessor for Cilk is that it makes the language somewhat simpler. The runtime system has several primitives (SendWordArgument, SendCharArgument, etc.) for sending arguments to closures. Previously, the user had to choose one of several such functions, depending on the type of argument that was being sent. The Cilk language now has only one such function, namely send_argument, which is used for sending arguments of any type to other closures. The type-checking preprocessor knows the type of the argument being sent and automatically generates a call to the correct runtime primitive.
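The actual mechanism is the Cilk-to-C preprocessor itself, but, as a rough analogy only, the type-directed selection it performs resembles a C11 _Generic dispatch; the lower-case primitive names below are hypothetical stand-ins for the runtime's send routines.

    #include <stdio.h>

    /* Hypothetical stand-ins for the typed runtime primitives. */
    static void send_word_argument(void *k, int v)     { (void)k; printf("word %d\n", v); }
    static void send_double_argument(void *k, double v){ (void)k; printf("double %g\n", v); }

    /* One user-visible name; the "preprocessor" picks the right primitive
       from the argument's type. */
    #define send_argument(k, v)                          \
        _Generic((v), int:    send_word_argument,        \
                      double: send_double_argument)((k), (v))

    int main(void) {
        send_argument(NULL, 42);     /* lowered to the word primitive   */
        send_argument(NULL, 3.5);    /* lowered to the double primitive */
        return 0;
    }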

Once we had a preprocessor powerful enough to deal with higher-level constructs, we had to decide just what higher-level constructs we wanted. While writing applications, we noticed common paradigms that appeared in many Cilk codes. The most common paradigm was spawning off several child threads and creating a successor thread to receive the values computed by these children. It was this paradigm that we chose to support in Cilk-2. An example of the Cilk-2 language can be seen in the figure below, which shows the code for fib, which recursively computes the nth Fibonacci number. The Cilk-2 code for fib is very similar to the C code for fib, containing only a few additional constructs. The first addition is the keyword cilk before the procedure definition. This keyword indicates that the following procedure is a Cilk procedure, not a standard C procedure. The next addition is that the two subcalls to the fib procedure are preceded by the keyword spawn, which indicates that these procedures can be executed in parallel. The final addition is the sync statement. The sync statement indicates that the procedure is to be suspended at the sync point until all spawned children have completed. Consequently, the code following the sync can safely use the variables x and y, which receive their values from the spawned children.

We considered several choices for how to express synchronization in Cilk-2 before deciding on this one. The other options we considered incorporated mechanisms which gave the user more control over the synchronization process. In particular, we considered a mechanism essentially the same as join variables in Cid [Nik].

    cilk int fib (int n)
    {
        if (n < 2)
            return n;
        else {
            int x, y;
            x = spawn fib(n - 1);
            y = spawn fib(n - 2);
            sync;
            return (x + y);
        }
    }

Figure: A Cilk-2 procedure to compute the nth Fibonacci number.
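For comparison, an ordinary C version of the same procedure is shown below; the Cilk code in the figure differs from it only by the cilk keyword, the two spawn keywords, and the sync statement.

    #include <stdio.h>

    /* Plain C fib, for comparison with the Cilk-2 procedure above. */
    static int fib(int n) {
        if (n < 2) return n;
        int x = fib(n - 1);
        int y = fib(n - 2);
        return x + y;
    }

    int main(void) { printf("%d\n", fib(10)); return 0; }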

In this proposal, each spawn would have a join counter associated with it. At a sync point, the user would need to specify a join counter to wait on. Waiting on a join counter would indicate that the procedure should suspend until all children spawned with that join counter have completed. Another proposal required the user to explicitly specify, at each sync point, precisely which variables he wanted to wait for.

All these other proposals had the advantage that they gave the user more control over synchronization, but in the end we decided we did not want to give the user that control. In most cases this extra control is unneeded and just makes the code messier. With the exception of chess, all the Cilk programs we have written can be expressed in Cilk-2 style. If we had chosen one of these other proposals, all of our programs would have had to deal with the extra details exposed by that proposal, but only the chess program would have benefited from it. And even these proposals were not sufficient to completely remove the need for the chess program to resort to the lower-level Cilk-1 mechanisms.

Also, with the mechanism we chose, all programs written in Cilk-2 meet the criteria for Cilk's performance guarantees, as described earlier in this thesis. Some of the other proposed mechanisms did not. The last advantage of the chosen mechanism is that, since it gives the programmer less control, it is simpler to implement. Simplicity of implementation is not why we chose this path, but it is a nice feature anyway.

Before concluding this section, we give a brief overview of the implementation of Cilk-2. We focus on what constructs Cilk-2 code is translated into; see [Mil] for details on how this transformation is performed.

The idea for transforming Cilk-2-style code into threaded code is fairly straightforward. Since the runtime system deals with threads, not procedures, a Cilk-2 procedure must be broken up into a set of threads. The sync points form the boundaries for these implicitly defined threads. In effect, we treat a sync like a spawn_next, where the thread being spawned is the rest of the procedure after the sync.

One could try to translate Cilk-2 code directly into Cilk-1-style code. Using this translation, the fib code in the figure above would be transformed into code almost identical to the fib code shown in the Cilk-1 chapter. Although this translation would work fine for fib, in general it would be more difficult. In Cilk-1 code, when spawning a child, that child must be passed a continuation which points to the rest of the procedure; therefore the spawn_next of the rest of the procedure must occur before any children are spawned. When translating Cilk-2 code, having to perform the spawn_next first is troublesome, because a sync may be nested within a conditional or a loop. Therefore, when executing a procedure, until we actually reach a sync we may not know which sync we will reach. When a child is spawned, not knowing which sync will be reached makes it difficult to create the continuation that must be passed to that child.

To solve the problem of not being able to create a continuation to pass to a child, Cilk-2 is translated such that all the threads that form a procedure share the same closure. This closure is sometimes called a frame. When a thread is spawned, the current frame is used as the frame to which the thread should send its result. When a sync is reached, all the variables that are needed after the sync are stored into the frame. When the next thread of the procedure is executed, the first part of that thread has code to read the needed variables back out of the frame. Using only one frame has efficiency advantages as well. Only one frame needs to be allocated and initialized per procedure, rather than one frame for every thread of a procedure. Also, there may be variables which are written into the frame once and then used by several threads of that procedure. If separate closures were used for every thread, then these variables would have to be copied from closure to closure.
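A much-simplified sketch of this translation (not the actual Cilk-to-C output) is shown below: the threads of a procedure share one frame structure, live variables are saved into it before a sync, and the thread that runs after the sync reads them back out.

    #include <stdio.h>

    /* One shared frame per procedure (simplified; names are illustrative). */
    typedef struct {
        int n;       /* procedure argument                      */
        int x, y;    /* results that the children send to the frame */
    } fib_frame;

    /* First thread: the code before the sync saves live variables. */
    static void fib_before_sync(fib_frame *f, int n) {
        f->n = n;
        f->x = 0;    /* placeholders until the children complete */
        f->y = 0;
    }

    /* Second thread: the code after the sync reads them back out. */
    static int fib_after_sync(const fib_frame *f) {
        return f->x + f->y;
    }

    int main(void) {
        fib_frame f;
        fib_before_sync(&f, 10);
        f.x = 34; f.y = 21;          /* stands in for the children's sends */
        printf("%d\n", fib_after_sync(&f));
        return 0;
    }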

Knary Revisited

Let us now go back to the knary example and see how it can be expressed in Cilk-2. The figure below shows the knary code as written in Cilk-2.

    /* knary(n, k, r) */
    cilk int knary (int depth, int numserial, int numparallel)
    {
        /* Do some work in each thread. */
        int i;
        int dummy;
        for (i = 0; i < ExtraWork; i++) dummy = i;

        if (depth < 2)
            return 1;
        else {
            int results[MAXPARALLEL];
            int childresult;
            int answer = 0;

            /* Perform the serial recursive calls, summing in the result. */
            for (i = 0; i < numserial; i++) {
                childresult = spawn knary(depth - 1, numserial, numparallel);
                sync;
                answer += childresult;
            }

            /* Spawn off the parallel calls. */
            for (i = 0; i < numparallel; i++)
                results[i] = spawn knary(depth - 1, numserial, numparallel);
            sync;

            /* Sum the results. */
            for (i = 0; i < numparallel; i++)
                answer += results[i];

            return answer;
        }
    }

Figure: Cilk-2 code for knary.

Instead of the four threads used by the Cilk-1 code, the Cilk-2 code requires just one procedure. This procedure contains a loop to spawn off the first numserial sequential children, followed by a loop to spawn off the numparallel parallel children. The main difference between these two loops is that the loop for the sequential children has a sync statement inside the loop, while the loop for the parallel children has a sync statement only after the loop.

Composing Cilk procedures in Cilk-2 is much easier than composing threads in Cilk-1. Earlier we gave the example of making the following modification to knary: instead of performing make-work, knary was supposed to call a function work that performs some work and may return a value STOP, indicating that this call of knary should return immediately instead of continuing as usual. In Cilk-1 this change is easy to implement if work is a C function, but it requires more effort if work is written as a Cilk thread. In Cilk-2 this code is easy to write even if work is a Cilk procedure. We just add the following code to the beginning of knary:

    i = spawn work();
    sync;
    if (i == STOP) return 0;

This code is almost the same as what would be required if work were a C procedure. The only difference is the addition of spawn and sync.

Conclusion

The Cilk-2 language allows the programmer to write his application at a higher level, using the familiar call/return semantics. Our type-checking preprocessor then automatically breaks up the user's code into a multithreaded program which can be executed with our work-stealing scheduler, thereby providing the user with the time, space, and communication performance guarantees described earlier. Although the Cilk-2 language is more restrictive than Cilk-1, we were able to rewrite almost all existing Cilk-1 programs using the new, simpler Cilk-2 syntax.

The only application that cannot be expressed using the Cilk-2 syntax is Socrates. We cannot express Socrates because it performs speculative computations and because some of these speculative computations are aborted. For most algorithms, the order in which the user's threads are executed does not affect the amount of work performed by the program. But for speculative algorithms, the amount of work the program performs can depend greatly on the order in which threads are executed. Cilk-1 contains features, such as priority threads, which are instrumental in implementing speculative searches efficiently. These features are not reflected in the higher-level Cilk-2 constructs, so algorithms that used these features cannot be efficiently written using only the higher-level Cilk-2 constructs. Another reason Socrates cannot be written entirely with the higher-level Cilk-2 constructs is that Socrates sometimes kills off previously spawned threads. The Socrates code is able to abort existing threads only because Cilk-1 contains sufficient low-level primitives, such as migration and nonstealable threads, to allow the user's code to keep track of all the spawned threads. Again, these low-level features are not available via the higher-level Cilk-2 constructs. In a later chapter we will describe some further high-level extensions to Cilk which will allow the chess program to be written without resorting to the lower-level Cilk-1 primitives.

Chapter

Cilk-3: Shared Memory for Cilk

One of the biggest shortcomings of the Cilk systems described so far is that it is difficult to write applications in which significant amounts of data are shared throughout the computation. The improvements to the Cilk system described earlier make it much easier to express algorithms in Cilk, but they do nothing to increase the range of applications that can be expressed. In Cilk, all data that is shared by several threads must be explicitly passed among those threads. Consequently, writing efficient programs that deal with large structures is difficult. The Cilk applications we have described so far mostly consist of tree-like algorithms in which very little data needs to be shared between different subtrees. Those applications that do need to share data throughout the computation do so only by going outside of Cilk. For example, the global transposition table in the Socrates program was implemented directly with active messages.

The shared-memory system described in this chapter is an attempt to alleviate this problem. With this system we can implement data-intensive applications such as matrix multiply, LU-decomposition, and Barnes-Hut. Rather than attempting to build a shared-memory system that can solve all problems, we focused on building one that would be sufficient for the types of problems that are naturally expressed in a multithreaded programming environment such as Cilk. Instead of using one of the consistency models derived from sequential consistency, we use our own relaxed consistency model. Our model, which we call dag consistency, is a lock-free consistency model which, rather than forcing a total order on global memory operations, instead ensures only that the constraints specified by the computation dag are enforced. Because dag consistency is a relaxed consistency model, we have been able to implement coherence efficiently in software for Cilk. We dubbed the Cilk system with dag-consistent shared memory Cilk-3. (Some of the work described in this chapter will be reported on in a paper by Robert Blumofe, Matteo Frigo, Charles Leiserson, Keith Randall, and myself [BFJ+].)

This Cilk shared-memory system was designed by members of the Cilk team: Robert Blumofe, Matteo Frigo, Charles Leiserson, Keith Randall, and myself. Matteo Frigo, Keith Randall, and I implemented the CM-5 version of the system. Robert Blumofe is implementing a version of this system for networks of workstations. Charles Leiserson and I devised the correctness proof given later in this chapter.

Introduction

Architects of shared memory for parallel computers have traditionally attempted to support Lamport's model of sequential consistency [Lam], which states:

    A system is sequentially consistent if the result of any execution is the same
    as if the operations of all the processors were executed in some sequential order,
    and the operations of each individual processor appear in this sequence in the
    order specified by its program.

Unfortunately, designers have found Lamport's model difficult to implement efficiently, and hence relaxed models of shared-memory consistency have been developed [DSB, GS, GLL] that compromise on semantics for a faster implementation. By and large, all of these consistency models have had one thing in common: they are processor centric, in the sense that they define consistency in terms of actions performed by physical processors. In this chapter we introduce dag consistency, a lock-free, processor-independent consistency model which can be implemented efficiently for multithreaded programming environments such as Cilk.

To illustrate the concepts behind dag consistency, consider the problem of parallel matrix multiplication. One way to program matrix multiplication is to use the recursive divide-and-conquer algorithm shown in the figure below. To multiply one n × n matrix by another, we divide each matrix into four n/2 × n/2 submatrices, recursively compute some products of these submatrices, and then add the results together. This algorithm lends itself to a parallel implementation, because each of the eight recursive multiplications is independent and can be executed in parallel.

    A × B = R, where

    [ C  D ]     [ G  H ]     [ CG  CH ]     [ DI  DJ ]
    [ E  F ]  ×  [ I  J ]  =  [ EG  EH ]  +  [ FI  FJ ]

    (here CG denotes the product of blocks C and G, and so on).

Figure: Recursive decomposition of matrix multiplication. The multiplication of n × n matrices requires eight multiplications of n/2 × n/2 matrices, followed by one addition of n × n matrices.

The figure below shows Cilk code for a blocked implementation of recursive matrix multiplication, in which the square input matrices A and B and the output matrix R are stored as a collection of square submatrices, called blocks. All three matrices are stored in shared memory. The Cilk procedure matrixmul takes pointers to the first block in each matrix as input, as well as a variable nb denoting the number of blocks in any row or column of the matrices. From the pointer to the first block of a matrix and the value of nb, the location of any other block in the matrix can be quickly computed. As matrixmul executes, values are stored into R, as well as into a temporary matrix tmp.

The procedure matrixmul operates as follows. It first checks to see if the matrices to be multiplied consist of a single block, in which case a call is made to a serial routine multiply_block (not shown) to perform the multiplication. Otherwise, it allocates some page-aligned temporary storage in shared memory for the results, computes pointers to the submatrices of A and B, and computes pointers to the submatrices of R and the temporary matrix tmp. At this point, the divide step of the divide-and-conquer paradigm is complete, and we begin on the conquer step. The procedure recursively computes the eight required submatrix multiplications, storing the results in the disjoint submatrices of R and tmp. The recursion is made to execute in parallel by using the spawn keyword, which is similar to a C function call except that the caller can continue to execute even if the callee has not yet returned. The sync statement that follows causes the procedure to suspend until all the spawned procedures have finished. Then a parallel addition is spawned, in which the matrix tmp is added into R. The procedure matrixadd is itself implemented in a recursive parallel divide-and-conquer fashion, and its code is not shown. The final sync ensures that the addition completes before matrixmul returns.

    cilk void matrixmul(long nb, shared block *A, shared block *B,
                        shared block *R)
    {
        if (nb == 1)
            multiply_block(A, B, R);
        else {
            shared block *C, *D, *E, *F, *G, *H, *I, *J;
            shared block *CG, *CH, *EG, *EH, *DI, *DJ, *FI, *FJ;
            shared pagealigned block tmp[nb * nb];

            /* get pointers to parts of original matrices */
            partition_matrix(nb, A, &C, &D, &E, &F);
            partition_matrix(nb, B, &G, &H, &I, &J);
            /* get pointers to places to put results */
            partition_matrix(nb, R, &CG, &CH, &EG, &EH);
            partition_matrix(nb, tmp, &DI, &DJ, &FI, &FJ);

            /* do multiplication subproblems */
            spawn matrixmul(nb/2, C, G, CG);
            spawn matrixmul(nb/2, C, H, CH);
            spawn matrixmul(nb/2, E, H, EH);
            spawn matrixmul(nb/2, E, G, EG);
            spawn matrixmul(nb/2, D, I, DI);
            spawn matrixmul(nb/2, D, J, DJ);
            spawn matrixmul(nb/2, F, J, FJ);
            spawn matrixmul(nb/2, F, I, FI);
            sync;

            /* add results together into R */
            spawn matrixadd(nb, tmp, R);
            sync;
        }
        return;
    }

Figure: Cilk code for recursive blocked matrix multiplication. The call multiply_block(A, B, R) performs a serial multiplication of blocks A and B and places the result in block R. The side length of each of the three matrices is nb blocks. The procedure matrixadd is implemented by a straightforward parallel divide-and-conquer algorithm.

Figure: Dag of blocked matrix multiplication, showing the three threads X, Y, and Z of a matrixmul procedure, the eight spawned submultiplications M1 through M8, and the spawned addition S. Each circle represents a thread of the computation. Threads are linked by downward spawn edges, horizontal continue edges, and upward return edges. Some edges were omitted for clarity.
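The procedure matrixadd itself is not shown in the figure. The following serial C sketch (an illustration only, assuming a simple row-major array of blocks) indicates one way the divide-and-conquer addition could be organized; in the Cilk version, the four quadrant additions would be spawned and followed by a sync.

    #include <stdio.h>

    #define BLOCK 4                      /* illustrative block side length */
    typedef double block[BLOCK][BLOCK];

    /* Add one block of T into the corresponding block of R. */
    static void add_block(block *t, block *r) {
        for (int i = 0; i < BLOCK; i++)
            for (int j = 0; j < BLOCK; j++)
                (*r)[i][j] += (*t)[i][j];
    }

    /* R += T over an n-by-n quadrant of an nb-by-nb array of blocks;
       (row, col) locates the quadrant's first block. */
    static void matrixadd(long nb, block *T, block *R,
                          long row, long col, long n) {
        if (n == 1) {
            add_block(T + row * nb + col, R + row * nb + col);
        } else {
            long h = n / 2;
            matrixadd(nb, T, R, row,     col,     h);   /* spawn in Cilk */
            matrixadd(nb, T, R, row,     col + h, h);
            matrixadd(nb, T, R, row + h, col,     h);
            matrixadd(nb, T, R, row + h, col + h, h);
            /* sync in Cilk */
        }
    }

    int main(void) {
        static block T[4], R[4];
        T[0][0][0] = 1.0; R[0][0][0] = 2.0;
        matrixadd(2, T, R, 0, 0, 2);
        printf("%g\n", R[0][0][0]);      /* prints 3 */
        return 0;
    }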

Like any Cilk multithreaded computation, the parallel instruction stream of matrixmul can be viewed as a directed acyclic graph (dag) of threads organized into a tree of procedures. The figure above illustrates the structure of the dag. Each vertex corresponds to a thread of the computation, which in Cilk is a nonblocking sequence of instructions. A procedure is a sequence of threads that share the same frame, or activation record. For example, the two syncs break the procedure matrixmul into three threads X, Y, and Z, which correspond, respectively, to the partitioning and spawning of the subproblems M1 through M8, the spawning of the addition S, and the return. The dag of a Cilk computation contains three kinds of edges. A spawn edge connects a thread with its spawned child. A continue edge connects a thread with its successor in the same procedure. A return edge reflects the synchronization that occurs when a child completes and notifies the thread in its parent procedure that is waiting for its return. Thus, a Cilk computation unfolds as a spawn tree composed of procedures and the spawn edges that connect them to their children, but the execution is constrained to follow the precedence relation determined by the dag of threads.

What kind of memory consistency is necessary to support a shared-memory program such as matrixmul? Certainly, sequential consistency can guarantee the correctness of the program, but a closer look at the precedence relation given by the dag reveals that a much weaker consistency model suffices. Specifically, the recursively spawned children M1, M2, ..., M8 need not have the same view of shared memory, because the portion of shared memory that each writes is neither read nor written by the others. On the other hand, the parallel addition of tmp into R by the computation S requires S to have a view in which all of the writes to shared memory by M1, M2, ..., M8 have completed.

The basic idea behind dag consistency is that each thread sees values that are consistent with some serial execution order of the dag, but two different threads may see different serial orders. Thus, the writes performed by a thread are seen by its successors, but threads that are incomparable in the dag may or may not see each other's writes. In matrixmul, the computation S sees the writes of M1, M2, ..., M8, because all the threads of S are successors of M1, M2, ..., M8, but since the Mi are incomparable, they cannot depend on seeing each other's writes. Dag consistency is similar to location consistency [GS], but it is defined in terms of the dag of a user's multithreaded computation rather than in terms of processors and synchronization points. We shall present a formal model of dag consistency later in this chapter.

The current Cilk mechanisms to support dag-consistent distributed shared memory on the Connection Machine CM-5 are implemented in software. Nevertheless, codes such as matrixmul run with good efficiency, as we shall shortly document. Like many software distributed-shared-memory implementations, the Cilk implementation is page based, to amortize the cost of remote reads and writes over many references. Whenever a thread accesses a page that is not resident in the local page cache of a processor, a page fault occurs, and a protocol ensues that brings the required page into the page cache. Although memory locations are grouped into pages, Cilk maintains dag consistency at the granularity of individual 32-bit words.

Cilk also supports stack allocation of distributed shared memory. The declaration of tmp in matrixmul causes the shared-memory stack pointer to be incremented by nb × nb blocks. When running in parallel, however, a simple serial stack is insufficient. Instead, Cilk provides a distributed cactus stack [HD, Mos, Ste] that mimics a serial stack in such a way that, during execution, every thread can access all the variables allocated by its parents and can address a variable directly by its depth in the stack. Despite the fact that the stack is distributed, allocation can be performed locally with no interprocessor communication. Cactus-stack memory is deallocated automatically by the Cilk runtime system when a spawned procedure returns.
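The following C fragment (an illustration only, not Cilk's allocator) conveys the cactus-stack idea: each spawned procedure's segment links to its parent's segment, a variable is addressed by its depth in the stack, and allocation touches only local state.

    #include <stdlib.h>
    #include <string.h>

    /* Each procedure gets a segment that links to its parent's segment, so
       a child can reach everything its ancestors allocated while sibling
       subtrees remain independent. */
    typedef struct segment {
        struct segment *parent;
        size_t depth_base;        /* stack depth at which this segment starts */
        char data[256];
    } segment;

    static segment *push_segment(segment *parent, size_t size_used_by_parent) {
        segment *s = malloc(sizeof *s);
        s->parent = parent;
        s->depth_base = parent ? parent->depth_base + size_used_by_parent : 0;
        return s;
    }

    /* Address a variable directly by its depth in the stack, walking up. */
    static char *lookup(segment *s, size_t depth) {
        while (s && depth < s->depth_base) s = s->parent;
        return s ? s->data + (depth - s->depth_base) : NULL;
    }

    int main(void) {
        segment *root  = push_segment(NULL, 0);
        segment *child = push_segment(root, 64);   /* root used 64 bytes   */
        strcpy(root->data, "shared");              /* visible to the child */
        return lookup(child, 0) == root->data ? 0 : 1;
    }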

Before discussing how maintenance of dag consistency affects the performance of Cilk programs, let us first review the performance of Cilk programs that do not use shared memory. Any multithreaded program can be measured in terms of the work and critical-path length of its computation dag. The work, denoted T_1, is the time used by a one-processor execution of the program, which corresponds to the sum of the execution times of all the threads. The critical-path length, denoted T_∞, is the total amount of time required by an infinite-processor execution, which corresponds to the largest sum of thread execution times along any path. With P processors, the execution time cannot be less than T_1/P or less than T_∞, and Cilk's work-stealing scheduler provably achieves O(T_1/P + T_∞) time with high probability on fully strict multithreaded computations. For example, the work to multiply two n × n matrices using matrixmul is Θ(n³), and the critical path of this algorithm is Θ(lg² n). If no shared-memory page faults were taken, therefore, the entire algorithm would run in O(n³/P + lg² n) time with high probability.

In order to model performance accurately, however, we must account for the effects of shared-memory page faults. We show that if F_1 is the number of page faults for a multithreaded program running on one processor with a page cache of size C, then the total number F_P of page faults taken by P processors, each with a page cache of size C, is F_P ≤ F_1 + 2Cs, where s is the number of steals during the execution.
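For reference, the bounds just described can be collected into a single display (a summary only; the constants shown follow the reconstruction above):

    \[
      T_P \;\ge\; \max\!\left(\tfrac{T_1}{P},\, T_\infty\right),
      \qquad
      T_P \;=\; O\!\left(\tfrac{T_1}{P} + T_\infty\right)\ \text{with high probability},
      \qquad
      F_P \;\le\; F_1 + 2Cs .
    \]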

A graph of the performance of matrixmul on the Connection Machine CM-5 is shown in the figure below. Three curves are shown: the lower curve is the performance of the matrixmul code shown above on a 1024 × 1024 multiply, and the upper two curves are the performance of a variant that uses no temporary storage and has a longer critical path. The dag-consistent shared-memory code sustains its per-processor megaflops rate as long as the work per processor is large. This performance compares reasonably well with that of other matrix multiplication codes on the CM-5. For example, an implementation coded in Split-C [CDG+] attains a somewhat higher per-processor rate using a static data layout, a static thread schedule, and an optimized assembly-language inner loop. In contrast, Cilk's dag-consistent shared memory is mapped across the processors dynamically, and the Cilk threads performing the computation are scheduled dynamically at runtime. We believe that the overhead in our current implementation can be reduced further, but that in any case it is a reasonable price to pay for ease of programming and dynamic scheduling.

Figure: Megaflops per processor versus the number of processors (4 to 64) for several matrix multiply runs on the Connection Machine CM-5. The shared-memory cache on each processor is held at the same size in all runs. The lower curve is for the matrixmul code shown above on a 1024 × 1024 multiply, and the upper two curves (1024 × 1024 and 4096 × 4096) are for an optimized version that uses no temporary storage.

We have implemented irregular applications that employ Cilk's dag-consistent shared memory, including a port of a Barnes-Hut N-body simulation [BH] and an implementation of Strassen's algorithm [Str] for matrix multiplication. These irregular applications provide a good test of Cilk's ability to schedule computations dynamically. The speedup we achieve on a large N-body simulation is competitive with other software implementations of distributed shared memory [JKW], and Strassen's algorithm runs as fast as matrixmul for large matrices; we coded it in Cilk in a few hours.

The remainder of this chapter is organized as follows. The next section gives a formal definition of dag consistency. We then describe an abstract algorithm for maintaining dag consistency and give a proof of its correctness, describe an implementation of this algorithm for Cilk on the Connection Machine CM-5 (including our cactus-stack memory allocator), and analyze the number of faults taken by multithreaded programs, both theoretically and empirically. Finally, we compare dag consistency with some related consistency models and offer some ideas for the future.

Dag Consistency

In this section we formally define dag consistency in terms of the dag that represents a multithreaded computation. We give conditions under which dag-consistent multithreaded programs are deterministic, and we discuss how nondeterminism can arise. Finally, we investigate anomalies in atomicity that can occur when the size of the concrete objects supported by the shared-memory system is different from that of the abstract objects that the programmer manipulates.

We rst introduce some terminology Let G V E b e the dag of a multithreaded

computation For i j V if a path of nonzero length from thread i to thread j exists in

G we say that i strictly precedes j which we write i j For any thread i V the

set f j V j ig is the set of predecessors of i and the set f j V i j g is the set of

successors of i We say that two threads i j V with i j are incomparable if we have

i j and j i

Shared memory consists of a set M of objects, each containing a value field that threads can read and write. To track which thread is responsible for an object's value, we imagine that each value has a tag, which the write operation sets to the name of the thread performing the write. When a thread performs a read on an object, it receives some value, but the particular value it receives depends upon the consistency model. We assume, without loss of generality, that each thread performs at most one read or write. We also make the simplifying assumption that all objects contain some initial value tagged with a fictitious thread that precedes all other threads. This assumption saves us the trouble of specifying what happens if a thread reads an object not written by any real predecessor.

Informally, we want to define dag consistency such that a read can see a write only if there is some serial execution order in which the read sees that write. Unlike sequential consistency, however, dag consistency allows different reads to return values that are based on different serial orders, as long as the values returned are consistent with the precedence relations given by the dag. In addition to stating what values might be seen by a read, the following formal definition of dag consistency focuses on what values a read cannot see.

Definition. The shared memory M of a multithreaded computation G = (V, E) is dag consistent if the following conditions hold:

1. If any thread i ∈ V reads any object m ∈ M, it receives a value v tagged with some thread j ∈ V such that j writes v to m and we have i ⊀ j.

2. For any three threads i, j, k ∈ V such that i ≺ j ≺ k holds, if j writes some object m ∈ M and k reads m, then the value received by k is not tagged with i.

The first part of this definition says that if a thread i reads an object, it receives the value written to that object by some thread j, where j must not be a successor of i. Since most computer systems are not prescient, this part of the definition is easy to implement. The second part of the definition is more subtle, because it says what the value cannot be rather than what the value can be. It ensures that a writer masks writes by any of its predecessors from all of its successors. This property is implicit in ordinary serial execution: whenever a write to an object occurs, previous values of the object are thenceforth forever hidden.

The definition of dag consistency allows nondeterminism, which we view as a generally undesirable property of a parallel program, but it is relatively easy to write programs that are guaranteed to be deterministic. Nondeterminism arises when a write to an object occurs that is incomparable with another read or write to the same object. For example, if a read and a write to the same object are incomparable, then the read may or may not receive the value of the write. Similarly, if two writes are incomparable and a read exists that succeeds them both with no other intervening writes, the read may receive the value of either write. To avoid nondeterminism, we require that no write to an object occur that is incomparable with another read or write to the same object. If no two writes to the same object are incomparable, then all writes to the object must lie on a single path in the dag. Moreover, all writes and any one given read must also lie on a single path. Consequently, by the definition of dag consistency, every read of an object sees exactly one write to that object. Since a write of one object has no bearing on a read of a different object, the execution is deterministic.

Nondeterminism is not the only problem that can arise in dag-consistent programs. As with most consistency models, dag consistency can suffer from atomicity anomalies when the concrete objects supported by the shared-memory system are larger than the abstract objects that the programmer is reading and writing. For example, suppose that system objects are 8 bytes long, but the programmer is treating each system object as two 4-byte abstract objects. Two incomparable threads may each perform an update on a different abstract object, expecting both writes to be visible to a common successor. But if these 4-byte values are packed into the same 8-byte concrete object, then these writes are really incomparable writes to the same 8-byte object. Consequently, one of the writes may nondeterministically mask the other, and the update to one of the 4-byte values may be lost. Fortunately, this problem can easily be avoided by not packing together abstract objects that might be updated by incomparable threads.
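The following single-threaded C simulation (an illustration only, using the 8-byte-concrete/4-byte-abstract sizes of the example above) shows how one of the two packed updates can be masked when the write-backs are incomparable.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        uint32_t shared[2] = { 0, 0 };           /* one 8-byte concrete object  */
        uint32_t copy_a[2], copy_b[2];

        memcpy(copy_a, shared, sizeof shared);   /* thread A caches the object  */
        memcpy(copy_b, shared, sizeof shared);   /* thread B caches the object  */
        copy_a[0] = 1;                           /* A updates abstract object 0 */
        copy_b[1] = 2;                           /* B updates abstract object 1 */

        memcpy(shared, copy_a, sizeof shared);   /* A's write-back              */
        memcpy(shared, copy_b, sizeof shared);   /* B's write-back masks A's    */

        printf("%u %u\n", shared[0], shared[1]); /* prints "0 2": A's update lost */
        return 0;
    }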

Atomicity anomalies can also occur when the programmer's abstract object is larger than the system's concrete object. For example, suppose the system supports 4-byte concrete objects, but the programmer needs an 8-byte object. If two incomparable threads each write the entire 8-byte object, the programmer might expect an 8-byte read of the structure by a common successor to receive one of the two 8-byte values written. The 8-byte read may nondeterministically receive 4 bytes of one value and 4 bytes of the other value, however, since the 8-byte read is really two 4-byte reads, and the consistency of the two halves is maintained separately. Fortunately, this problem can occur only if the abstract program is nondeterministic, that is, if the program is nondeterministic even when the abstract and concrete objects are the same size. When writing deterministic programs, which we advocate as good parallel programming practice, the programmer need not worry about this atomicity problem.

Maintaining Dag Consistency

In this section we show how dag consistency can be maintained during the execution of a multithreaded computation. We focus on the class of well-structured computations that, as we showed earlier, can be scheduled efficiently. We give an algorithm that maintains dag-consistent shared memory for well-structured computations executing in a distributed environment, and we prove that it is correct. Our implementation of the algorithm is described later in this chapter.

Our dag-consistency algorithm depends on properties of multithreaded dags. Recall that a procedure consists of a sequence of threads connected by continue edges. Spawn edges go from a thread in one procedure to the first thread in a child procedure, thereby structuring the procedures into a spawn tree. Alternatively, we can view the spawn and continue edges as structuring the threads into a spawn-continue tree. If the computation is fully strict, every data-dependency edge entering a procedure comes from the final thread of a child procedure. Fully strict computations can be scheduled efficiently because they are well structured. We shall exploit this property to design an efficient dag-consistency algorithm.

We shall present the dag-consistency algorithm using the nomenclature of Cilk's work-stealing scheduler, in which an idle processor obtains work by stealing a thread from a busy processor. After stealing a thread, a processor executes the thread, which may cause other threads to be created. The processor executes the created threads in depth-first order, mimicking ordinary serial depth-first execution. If another processor requests work from the processor, the thread that is closest to the root and ready to execute is stolen away. We call the subtree of the spawn-continue tree that is rooted at a stolen thread a subcomputation. We shall be particularly interested in the kernel of a subcomputation, which we define to be the stolen root thread of the subcomputation together with all threads reachable by spawn and continue edges without passing through another stolen thread. Thus, as shown in the figure below, the scheduling algorithm partitions the spawn-continue tree into a tree of kernels, each of which consists of all threads that execute on the same processor from the time that a subcomputation's root thread is stolen to the time that the processor goes idle.
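The following serial C fragment (an illustration only, with no synchronization and not the actual Cilk scheduler) conveys the discipline just described: the owning processor works depth-first at one end of its pool of ready threads, while a thief would take the thread closest to the root from the other end.

    #include <stdio.h>

    #define CAP 64

    typedef struct { int items[CAP]; int top, bottom; } deque;

    static void push_bottom(deque *d, int t) { d->items[d->bottom++] = t; }
    static int  pop_bottom (deque *d) { return d->bottom > d->top ? d->items[--d->bottom] : -1; }
    static int  steal_top  (deque *d) { return d->bottom > d->top ? d->items[d->top++]    : -1; }

    int main(void) {
        deque d = { {0}, 0, 0 };
        push_bottom(&d, 1);            /* shallow thread, near the root */
        push_bottom(&d, 2);
        push_bottom(&d, 3);            /* deepest thread                */
        printf("owner works on %d\n", pop_bottom(&d));   /* 3: depth-first  */
        printf("thief steals   %d\n", steal_top(&d));    /* 1: nearest root */
        return 0;
    }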

In order to make our dag-consistency algorithm simple, we rely on a property of Cilk's runtime system: during the execution of a fully strict computation, a thread can be stolen only if it belongs to a procedure whose previously spawned children have all completed. Since a procedure with outstanding children cannot be moved, Cilk's bookkeeping of the relationship between the procedure and its children is straightforward, because a child's parent never moves. The dag-consistency algorithm also exploits this property, but for a different reason. Specifically, it relies on the fact that a data-dependency edge leaving a thread always goes to another thread in the same subcomputation kernel or to a thread in the parent kernel. If it were possible to steal a thread belonging to a procedure with spawned children outstanding, then the parent of a thread might belong to a kernel which is far away in the kernel tree.

The dag-consistency algorithm takes advantage of locality in the kernel tree by maintaining coherence information on a kernel basis rather than on a thread basis.

Figure: A spawn-continue tree partitioned into a kernel tree. The data-dependency edges of the dag are also shown, faintly. The spawn-continue tree is partitioned into four kernels, and the spawn edges beginning each new kernel are highlighted.

Whenever a processor steals a thread, it creates a cache on that processor to store shared-memory objects used by the threads that will be part of the newly created kernel. All reads and writes of an object performed by a thread are performed on the version of the object in the cache of that thread's kernel. In order to propagate changed values correctly, the cache maintains a dirty bit for each object that tells whether the object has been modified since it was brought into the cache. An object is clean if it has not been modified, and dirty otherwise. An object may reside in any number of caches at a time, and the value of the object may differ from cache to cache. For simplicity, we assume that there is a fictitious initial thread that precedes all other threads. This initial thread is the only thread in its kernel, and it maintains a cache that always contains every object.

Two operations are used by the dag-consistency algorithm to move objects between caches: fetch and reconcile. If a cache A for a kernel contains an object, a read or write to the object by a thread in the kernel operates directly on the cached object, without any object movement. If cache A does not contain the object when the read or write occurs, the dag-consistency algorithm must perform a fetch to bring the object into A from another cache B. The fetch operation simply copies the object from cache B into cache A and marks the new version in cache A as clean. The reconcile operation is used to remove an object from a cache, typically when a subcomputation completes or when room must be made in the cache for a different object. If cache A contains an object, it can reconcile the object to another cache B only if B also contains the object. The reconcile operation first checks to see whether A's version of the object is clean or dirty. If A's version is clean, then the reconcile operation simply removes the object from cache A. If A's version is dirty, however, then the reconcile operation marks B's version as dirty, updates it to have the same value as A's, and then removes the object from cache A.

The dag-consistency algorithm Dagger operates on the shared memory M of a multithreaded computation as follows:

1. Whenever a thread is stolen, its newly created kernel is given an empty cache.

2. Whenever a thread i in kernel X accesses an object m ∈ M that is not resident in X's cache, m is fetched into X's cache from kernel Y's cache, where Y is the least ancestor of X in the kernel tree such that Y's cache contains m.

3. Whenever a kernel X enables a data-dependency edge to a different kernel (its parent in the kernel tree, since the computation is fully strict), every object m in X's cache is first reconciled to kernel Y's cache, where Y is the least ancestor of X in the kernel tree such that Y's cache contains m.
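The following C sketch (an illustration only, for a single object and with hypothetical names) captures the cache state and the fetch and reconcile operations that Dagger relies on.

    #include <stdbool.h>

    /* Per-kernel cache state for one object. */
    typedef struct cache_entry {
        bool present;
        bool dirty;
        long value;
    } cache_entry;

    /* Fetch: copy the object from an ancestor's cache and mark it clean. */
    static void fetch(cache_entry *child, const cache_entry *ancestor) {
        child->value   = ancestor->value;
        child->dirty   = false;
        child->present = true;
    }

    /* Reconcile: push a dirty value back to the ancestor, then drop the copy. */
    static void reconcile(cache_entry *child, cache_entry *ancestor) {
        if (child->dirty) {
            ancestor->value = child->value;
            ancestor->dirty = true;
        }
        child->present = false;
        child->dirty   = false;
    }

    int main(void) {
        cache_entry root  = { true, false, 0 };
        cache_entry child = { false, false, 0 };
        fetch(&child, &root);
        child.value = 42; child.dirty = true;   /* a write in the child kernel */
        reconcile(&child, &root);               /* root now sees 42, dirty     */
        return root.value == 42 ? 0 : 1;
    }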

Proof of Correctness

Before we can prove that Dagger maintains dag consistency, we need to formalize the effects that Dagger has on the user computation. It suffices to focus on an arbitrary object m, since dag consistency is defined for each object in M separately. Our strategy is to embed the actions of Dagger on m into the dag containing the user actions. As before, we assume for simplicity that each thread performs only one action. We specify the important state variables of the system and show how each action affects them.

The important actions performed by the user's code are read, write, and sync. When a thread i performs a read, the value returned by the read is the value of m currently in the cache of thread i's kernel. A write thread updates the value of m in the cache. A sync thread is one which has more than one incoming edge, namely one continue edge and one or more data-dependency edges. A sync thread that is part of procedure P executes only after all the outstanding children of procedure P have completed.

In addition to the actions performed by the user's code, we shall also be concerned with the actions fetch, reconcile, and steal performed by Dagger, which we can view as special threads that are inserted by Dagger into the computation dag. These threads are inserted to satisfy certain semantic properties. Before a user thread can read or write m when it is not in the cache, a fetch thread is inserted prior to the access. Whenever a data-dependency edge goes from a child kernel to an ancestor, a reconcile thread is added immediately preceding the data-dependency edge; since the dag is fully strict, this reconcile thread is the final thread executed by the child kernel. Lastly, a steal thread is inserted as the first thread of a kernel whenever a steal occurs. In most cases, when a thread is stolen, the steal thread is inserted just before the thread that was stolen; however, when a sync thread is stolen, the steal thread is inserted immediately after the sync thread.

We can view each kernel X as keeping track of the following state variables:

    val(X):   the value of m in X's cache, or NIL if m is not in X's cache;
    tag(X):   the tag associated with val(X), or NIL if m is not in X's cache;
    dirty(X): TRUE if m is dirty, and FALSE otherwise;
    H(X):     the set of hidden threads.

The variables tag(X) and H(X) are for purposes of the proof only; in practice, only the value and the dirty bit actually need to be maintained. The hidden set H(X) is maintained as a set of threads whose writes to m can no longer be seen by threads in X, whether m is in X's cache or not, but H(X) may also contain threads which do not write to m. The variable tag(X) is used to identify which thread performed the write that stored the value val(X).

We have now embedded the op eration of Dagger on a user computation into a multi

threaded computation G V E in which each thread i V p erforms at most one simple

action We shall next examine how the dag aects the caches of the various sub computation

kernels as it is executed First however we dene some helpful notations For any thread

i V we dene K i to b e the kernel to which i b elongs The state variables dynamically

change and so when we need to b e precise ab out the value of a state variable at dierent

times we sup erscript the state variable with a parenthesized thread name to indicate the

i

state variables value immediately after the sp ecied thread executes For example H X

i

denotes H X immediately after thread i executes We will also use H X to denote

H X immediately b efore thread i executes At any p oint in time we dene la X to b e

the least ancestor of kernel X for which val X nil If X must fetch or reconcile la X

is the kernel that Dagger causes X to fetch from or reconcile to

We are now ready to describe how the actions performed by a thread affect the state variables of a multithreaded computation G = (V, E). For each action performed by a thread i ∈ V, we shall show how the value, tag, dirty bit, and hidden sets are updated in i's kernel, as well as in any other kernels affected. All variables not explicitly mentioned in the following pseudocode remain unchanged. Although in practice many threads execute concurrently, we shall assume that only one action occurs at a time.

The following pseudocode describes the effects of some thread i ∈ V, where X = K(i) is i's kernel. Depending on the action performed by i, we execute one of the following cases:

    read:
        No change to state variables.

    write(v):
        val(X) ← v
        tag(X) ← i
        dirty(X) ← TRUE
        H(X) ← H(X) ∪ {j ∈ V : j ≺ i}

    sync:
        H(X) ← H(X) ∪ H(K(j)) for all j ∈ V such that (j, i) ∈ E

    fetch:
        val(X) ← val(la(X))
        tag(X) ← tag(la(X))
        dirty(X) ← FALSE
        H(X) ← H(X) ∪ H(la(X))

    reconcile:
        If dirty(X) = FALSE, do nothing; else:
            val(la(X)) ← val(X)
            tag(la(X)) ← tag(X)
            dirty(la(X)) ← TRUE
            H(la(X)) ← H(la(X)) ∪ H(X)
            val(X) ← NIL
            tag(X) ← NIL
            dirty(X) ← FALSE

    steal:
        H(X) ← H(K(j)), where (j, i) ∈ E is the unique incoming edge to i
        val(X) ← NIL
        tag(X) ← NIL
        dirty(X) ← FALSE

In order to prove that Dagger maintains dag consistency, we must first understand the ramifications of a dirty bit. The next lemma proves that, during the time that a given tag i appears dirty in a kernel Y, all successors of thread i belong to the subtree rooted at Y.

Lemma. Suppose Dagger maintains the shared-memory object m of a multithreaded computation G = (V, E). Then, at any time after the execution of a thread j ∈ V, if tag(Y) = i, dirty(Y) = TRUE, and i ≺ j, then Y is a (not necessarily proper) ancestor of K(j) in the kernel tree.

Proof. To prove this lemma, we first examine the constrained manner in which dirty objects are manipulated. We shall say that a kernel Y has a dirty tag i if tag(Y) = i and dirty(Y) = TRUE. A dirty tag i is created by a write by the thread i. If a kernel Y with a dirty tag i reconciles to an ancestor kernel Z, then the dirty tag i moves from Y to Z. When a reconcile to a kernel with dirty tag i is performed, the dirty tag i disappears. In short, a dirty tag i can be created by a write thread i, it can move up the kernel tree, and it can disappear.

To prove the lemma, it suffices to prove the following claim: if dirty tag i exists immediately after executing a thread j, where i ≺ j, then j belongs to the subtree rooted at the kernel Y containing dirty tag i. The claim implies the lemma because dirty tags can only move up the kernel tree. Consequently, if immediately after j executes it belongs to the subtree rooted at a kernel Y with dirty tag i, then j must belong to the subtree rooted at whatever kernel contains the dirty tag i for as long as the dirty tag i exists, which proves the lemma.

We prove the claim using induction on the length of the longest path from i to j. The base case is a path of zero length, in which case i = j. The base case is true, since after i executes, the dirty tag i is in the kernel Y = K(i). For the induction step, we shall show that if the claim holds for all threads n edges away from i, then it holds for all threads n + 1 edges away.

To prove the induction step, suppose that thread j is n + 1 edges away from i. Then an edge (k, j) ∈ E must exist where k is exactly n edges away from i. If K(k) = K(j), then the claim holds trivially, because k and j belong to the same kernel. Thus we can assume that K(k) ≠ K(j), and consider two cases based on the type of the edge from k to j. For the first case, suppose that (k, j) is a spawn or continue edge. Then j must be a steal thread, and thus K(j) is a child of K(k). Consequently, since k belongs to the subtree rooted at the kernel Y containing dirty tag i, it follows that j must also belong to the subtree rooted at Y. For the second case, suppose (k, j) is a data-dependency edge. Then, since any data-dependency edge between threads in different kernels always goes to a thread in the parent kernel, K(j) must be the parent of K(k), which implies that k is the last thread to execute in K(k) and k is a reconcile thread. Consequently, K(k) ≠ Y, because Y's dirty bit is TRUE and the reconcile sets K(k)'s dirty bit to FALSE. Since, by induction, K(k) is in the subtree rooted at Y and K(k) ≠ Y, the parent of K(k), namely K(j), must be in the subtree rooted at Y, which proves the claim and the lemma.

The next lemma establishes two monotonicity properties of hidden sets. The first property says that the hidden set of any kernel monotonically increases with time as actions are performed. The second property says that the hidden set of a thread increases monotonically along any path in the dag, where by the hidden set of a thread we mean the hidden set of that thread's kernel immediately after the thread executes.

Lemma. Suppose Dagger maintains the shared-memory object m of a multithreaded computation G = (V, E). Then, for all threads i, j ∈ V such that i ≺ j, and for every kernel Y, we have

1. H^(i)(Y) ⊆ H^(j)(Y), and
2. H^(i)(K(i)) ⊆ H^(j)(K(j)).

Proof. Property 1 follows directly by induction on the actions in G, because no action removes elements from a kernel's hidden set. We prove Property 2 by showing that it holds for every edge (i, j) ∈ E, which implies that it holds whenever i ≺ j. If i and j are in the same kernel, Property 1 implies the property. Otherwise, i and j are in different kernels, and we examine two cases, depending on the type of edge (i, j). If (i, j) is a continue or spawn edge, then j performs a steal action, and thus H(j) ⊇ H(i), and the property holds. If (i, j) is a data-dependency edge, then j performs a sync action, and thus H(j) ⊇ H(i), and the property holds.

We now prove three invariants on hidden sets. The first says that, for any kernel Y, the hidden set of the kernel that Y would fetch from includes everything in Y's hidden set; this invariant ensures that when Y fetches the object m, the value received can be seen by Y. The second invariant says that if m is dirty in a cache, only descendant kernels can have the thread that wrote the value in their hidden sets. The third invariant says that the value in a kernel's cache was not written by a thread in its hidden set; in other words, the hidden set of a kernel does indeed represent those threads whose writes the kernel cannot see.

Lemma. Suppose Dagger maintains the shared-memory object m of a multithreaded computation G = (V, E). Then the following statements are invariant during the execution of G, for all kernels Y and Z:

1. tag(Y) = NIL implies H(Y) ⊆ H(la(Y)).
2. dirty(Y) = TRUE and tag(Y) ∈ H(Z) imply that Z is a descendant of Y.
3. tag(Y) ∉ H(Y).

Proof. We prove these invariants by induction on the actions.

To prove Invariant 1, observe that all hidden sets are initially empty; therefore Invariant 1 holds initially, providing the base case. We now examine each action and show that if the invariant is true before the action, then it is true afterwards. Note that the invariant we are proving mentions two kernels, Y and la(Y). For each kernel X that has its state variables changed by an action, we will have to show that the invariant holds in two cases: (1) the invariant must hold for X and its least ancestor la(X), and (2) the invariant must hold for any descendant Z of X, where, either before or after the operation, X = la(Z).

For a read thread, the invariant holds trivially, since no state variables change.

For a write performed by thread i in kernel X, the state variables of kernel X are modified, so we must show that the invariant is maintained on kernel X for both cases mentioned above. For case (1), the invariant holds trivially, since tag(X) = i ≠ NIL. For case (2), we note that the write operation cannot affect which kernel is the least ancestor of another, so X is the least ancestor of a kernel Z after the write if and only if X was the least ancestor of Z before the write. We must show that if a kernel Z exists such that X is the least ancestor of Z and tag(Z) = NIL, then H(Z) ⊆ H(X). If such a kernel Z exists, then by induction the invariant held before the operation, namely H(Z) ⊆ H(X) immediately before i executed. Since the write operation can only increase H(X), the invariant holds after the write operation as well.

For a fetch performed by thread i in kernel X, we must show that the invariant holds for the modified kernel X. For case (1), the invariant holds trivially, since tag(X) ≠ NIL. For case (2), we must show that if there exists some kernel Z such that X is the least ancestor of Z and tag(Z) = NIL, then H(Z) ⊆ H(X). If such a kernel Z exists, then before the fetch operation la(X) was the least ancestor of Z. By induction, H(Z) ⊆ H(la(Z)) = H(la(X)), and by the definition of the fetch action, H(la(X)) ⊆ H(X) after the fetch; therefore H(Z) ⊆ H(X), which maintains the invariant.

For a reconcile performed by thread i in kernel X to kernel Y = la(X), we must show that the invariant holds both for kernel X and for kernel Y = la(X). We deal with kernel X first. For case 1, by the definition of the reconcile action, H(Y) becomes H(Y) ∪ H(X), so H(X) ⊆ H(Y), and the invariant holds. For case 2, we must show that if there were some kernel Z with tag(Z) = nil which had X as its least ancestor, then after the reconcile H(Z) ⊆ H(la(Z)). Before the reconcile, X was the least ancestor of Z, so by induction H(Z) ⊆ H(X). After the reconcile, Y = la(Z), and from the definition of reconcile it follows that H(Y) ⊇ H(X). Therefore H(Z) ⊆ H(Y), and the invariant holds.

Now we show that the invariant is maintained for kernel Y = la(X). For case 1, the invariant holds trivially, since tag(X) ≠ nil, and hence after the reconcile tag(Y) ≠ nil. For case 2, we must show that if there exists some kernel Z such that Y is the least ancestor of Z and tag(Z) = nil, then H(Z) ⊆ H(Y). We have already dealt with the case where, before the reconcile, the least ancestor of Z was X; all that remains is the case where, before the reconcile, the least ancestor of Z was Y. By induction, H(Z) ⊆ H(Y) before the reconcile, and since the reconcile can only increase H(Y), the invariant is still true after the reconcile.

For a steal performed as the first thread i of kernel X, we must show that the invariant holds for kernel X. For case 1, since val(X) = nil, we must show that H(X) ⊆ H(la(X)). The definition of the steal action states that H(X) = H(K(j)), where (j, i) ∈ E is the unique incoming edge to i. If val(K(j)) ≠ nil, then K(j) is the least ancestor of X, and the invariant holds, since H(X) ⊆ H(la(X)). If val(K(j)) = nil, then la(X) = la(K(j)), and by induction H(K(j)) ⊆ H(la(K(j))). Therefore H(X) = H(K(j)) ⊆ H(la(X)), and the invariant holds. For case 2, the proof is trivial, since X has no descendants.

For a sync performed by thread i in kernel X, we must show that the invariant holds for kernel X.

The sync thread i has one incoming continue edge from some thread k and one or more incoming data-dependency edges. We have X = K(i) = K(k), because only steal threads can have an incoming continue edge from another kernel. Also, the data-dependency edges either come from threads in the same kernel or come from threads in kernels whose val is nil. For case 1, we must show that if val(X) = nil, then H(X) ⊆ H(la(X)). We assume val(X) = nil; otherwise the proof is trivial. The definition of sync states that H(X) becomes H(X) ∪ H(K(j)) for all j ∈ V such that (j, i) ∈ E. Let us look at each set unioned to form the new H(X) and show that all of these sets are subsets of H(la(X)). For H(X) itself, we have by induction H(X) ⊆ H(la(X)). The rest of the sets unioned in each correspond to an incoming edge. For the incoming continue edge (k, i), we have K(k) = X, so this just unions in H(X) again. For each incoming data-dependency edge (j, i), we have either K(j) = K(i) = X, in which case we just union in H(X) yet again, or we have K(j) ≠ X. In this second case, we have val(K(j)) = nil and la(K(j)) = la(X), and by induction H(K(j)) ⊆ H(la(K(j))). Therefore H(K(j)) ⊆ H(la(X)), which shows that H(X) ⊆ H(la(X)). Finally, for case 2, we note that the sync operation only changes the hidden set, so X is the least ancestor of a kernel Z after the sync if and only if X was the least ancestor of Z before the sync. By induction, the invariant holds before the operation. Since the sync operation can only increase H(X), the invariant holds after the sync operation as well.

To prove Invariant 2, we again use induction on the actions. Informally, Invariant 2 states that if kernel X has a dirty tag i, then i can appear only in hidden sets which are in the subtree rooted at X. All hidden sets are initially empty, so the base case of the proof is trivial. We now need to examine each action and show that if Invariant 2 is true before the action, then it is true afterwards. As with the previous invariant, there are two kernels in the invariant, so there are two cases we must consider. Case 1 occurs when we modify the variables of a kernel X and we set dirty(X) to true or modify tag(X). In this case, we must show that for all Z such that tag(X) ∈ H(Z), Z is a descendant of X. Case 2 occurs when we modify the hidden set of a kernel X. In this case, we must show that the threads added to H(X) do not cause the invariant to be invalidated. In particular, we must show that the threads added to H(X) do not appear dirty in a kernel which is a proper descendant of X.

For a read, the invariant holds trivially, since there are no changes.

For a write performed by a thread i in kernel X, we modify both the hidden set and the tag, so we must consider both cases. For case 1, since the write sets dirty(X) to true and modifies tag(X), we must show that for all Z such that tag(X) ∈ H(Z), Z is a descendant of X. First, notice that only the write operation adds threads to a hidden set that are not already in some other hidden set, and the write operation adds only threads that are predecessors of the write thread. This implies that a thread j cannot appear in any hidden set before thread j executes. Therefore, when write thread i executes, i does not appear in any hidden set, and so there can be no Z such that i = tag(X) ∈ H(Z).

For case 2, since we add to H(X), we must show that the threads added to H(X) do not appear dirty in a kernel which is a proper descendant of X. We prove this by contradiction. Assume that an added thread j does appear dirty in a kernel Y which is a proper descendant of X. Then we have dirty(Y) = true and tag(Y) = j with j ≺ i. By an earlier lemma, Y is an ancestor of X; but by our assumption, Y is a proper descendant of X. This contradiction completes the proof for case 2.

For a fetch by a thread in kernel X, we need not consider case 1, since dirty(X) is set to false. For case 2, we first make the observation that adding a thread j to the hidden set of any kernel Y cannot invalidate the invariant if j is already in the hidden set of an ancestor of Y. This observation is true either because no dirty tag j exists, in which case j can be added to any hidden set without breaking the invariant, or because dirty tag j exists in some kernel Z and Y is a descendant of Z, in which case j can be added to any descendant of Y, because such a descendant is also a descendant of Z. Since every thread added to H(X) by the fetch operation is already in the hidden set of an ancestor of X, adding these threads preserves the invariant.

For a reconcile performed by thread i in kernel X to kernel Y = la(X), we need consider cases 1 and 2 only for kernel Y; for kernel X there are no cases to consider, since H(X) is unmodified and tag(X) is set to nil. For case 1 of Y, we must show that tag(Y) appears only in hidden sets of descendants of Y. By induction, before the reconcile the old tag(X), which becomes the new tag(Y), appeared only in hidden sets of descendants of X. Since all descendants of X are also descendants of Y, tag(Y) appears only in hidden sets of descendants of Y.

For case 2 of Y, we must show that the threads added to H(Y) do not appear as a dirty tag in a kernel which is a proper descendant of Y. The only threads added to H(Y) are those in H(X). Consider the state before the reconcile. We know by induction that if one of these added threads j appeared as a dirty tag in some kernel Z, then X was a descendant of Z. Since Y is the least ancestor of X, Y must also be a (not necessarily proper) descendant of Z. Therefore, the threads added to H(Y) can appear as a dirty tag only in a kernel which is an ancestor of Y, and hence not in a proper descendant of Y; thus the invariant is maintained.

For a steal performed as the first thread of kernel X, we need not consider case 1, since tag(X) is set to nil. For case 2, we notice that all threads added to H(X) are taken from an ancestor; therefore, as shown for the fetch operation, the invariant is maintained.

For a sync performed by thread i in kernel X, we must show that the invariant holds for case 2. All threads added to H(X) are from the hidden set of a child of X. To show that these added threads do not break the invariant, we use a simplified version of the argument made for reconcile. By induction, if Y, a child of X, has a thread j in its hidden set that appears as a dirty tag in kernel Z, then Z must be an ancestor of Y. Since tag(Y) = nil, Z must be a proper ancestor of Y. Since X is the parent of Y, Z must also be an ancestor of X. Therefore, the threads added to H(X) can appear as a dirty tag only in a kernel which is an ancestor of X; thus the invariant is maintained.

To prove Invariant 3, the final invariant, observe that all hidden sets are initially empty. Consequently, tag(Y) ∉ H(Y) before the computation begins, which provides the base case of the induction. We now examine each action and show that if the invariant is true before the action, then it is true afterwards.

For a read, the invariant holds trivially, since no state variables change.

For a write performed by thread i in kernel X, thread i is not added to the hidden set H(X), and so we need only show that i is not in H(X) before the write. Observe that the only time a thread j that is not in any hidden set is added to some hidden set is when a successor of j performs a write. Before i executes, none of its successors have executed, and thus i belongs to no hidden set. In particular, we have i ∉ H(X).

For a fetch performed by a thread in kernel X, to show that the invariant holds after the thread executes, we must show that tag(X) ∉ H(X) and tag(X) ∉ H(la(X)), where tag(X) denotes the newly fetched tag. The second is true by induction, which implies that the first is true, because Invariant 1 guarantees H(X) ⊆ H(la(X)).

For a reconcile performed by a thread in kernel X, we must show that the invariant is maintained for both of the modified kernels, X and Y = la(X). The invariant holds trivially for kernel X, since tag(X) is set to nil. To show it is true for kernel Y, we must show that tag(X) ∉ H(X) and tag(X) ∉ H(Y). The first holds by induction, and the second follows from Invariant 2.

For a steal of a thread now in kernel X, the invariant holds trivially, since tag(X) is set to nil.

For a sync by thread i in kernel X, the sync thread i has one incoming continue edge from some thread k and one or more incoming data-dependency edges. We have X = K(i) = K(k), because only steal threads can have an incoming continue edge from another kernel. If tag(X) = nil, then the invariant is trivially maintained. Otherwise, we must show that none of the hidden sets that are unioned to form H(X) contain tag(X). Initially, by induction, tag(X) ∉ H(X), and so we must show that for each incoming data-dependency edge (j, i) ∈ E, we have tag(X) ∉ H(K(j)). If thread j also belongs to kernel X, then trivially tag(X) ∉ H(K(j)) = H(X). If K(j) = Y ≠ X, however, then because any data-dependency edge between threads in different kernels always goes to a thread in the parent kernel, Y is a child of X in the kernel tree. Moreover, Y has completed, and the last thread of Y was a reconcile. Therefore we have tag(Y) = nil which, together with the assumption that tag(X) ≠ nil, implies that X = la(Y). Consequently, by Invariant 1, we have H(Y) ⊆ H(X). By induction, we know that tag(X) ∉ H(X) before i executes, and therefore tag(X) ∉ H(Y).

We are now ready to prove the correctness of Dagger.

Theorem: If the shared memory M of a multithreaded computation G = (V, E) is maintained using Dagger, then M is dag consistent.

Proof: We must show that both parts of the definition of dag consistency hold. The first part holds trivially. To prove that the second part holds, consider three threads i, j, k ∈ V, where i ≺ j ≺ k. Since j performs a write and i ≺ j, it follows from the pseudocode for the write action that i ∈ H^j(K(j)). Moreover, since j ≺ k, Property 2 of the first lemma above implies that H^j(K(j)) ⊆ H^k(K(k)). Thus we have i ∈ H^k(K(k)), which together with Invariant 3 implies that tag(K(k)) ≠ i.

The algorithm Dagger depends heavily on the fact that computations are fully strict, but the basic ideas in Dagger can be extended to nonstrict computations. The idea is that whenever a data-dependency edge goes from a sending kernel X to a receiving kernel Y, we first find the least common ancestor Z of X and Y in the kernel tree. Then we walk up the tree from X, reconciling at each kernel along the way, until Z is reached. In effect, this action unions X's hidden set into Z's. Lastly, we walk up the tree from Y to Z, reconciling at each kernel along the way. At this point, la(X) is either Z or an ancestor of Z, and thus when Y next fetches, it obtains a hidden set that includes X's, thus ensuring that the second part of the definition of dag consistency is met. In all other respects this algorithm is the same as the Dagger algorithm, and when the dag is fully strict, this procedure reduces to the basic Dagger algorithm, since Y = Z. The Dagger algorithm is simpler to implement, however.
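To make this extension concrete, the sketch below shows the reconciliation walk for a single cross edge. It is only an illustration: the Kernel structure and the helper routines lca and reconcile_to_parent are assumptions made for this sketch, not the actual Cilk runtime interface.

  typedef struct kernel Kernel;
  struct kernel {
      Kernel *parent;                 /* parent in the kernel tree */
      /* per-kernel Dagger state (val, tag, dirty, hidden set) omitted */
  };

  Kernel *lca(Kernel *x, Kernel *y);    /* least common ancestor (assumed) */
  void reconcile_to_parent(Kernel *k);  /* one Dagger reconcile step (assumed) */

  /* Handle a data-dependency edge from sending kernel X to receiving kernel Y
     in a nonstrict computation: reconcile up both paths to the least common
     ancestor Z, so that Y's next fetch sees a hidden set that includes X's. */
  void handle_cross_edge(Kernel *X, Kernel *Y)
  {
      Kernel *Z = lca(X, Y);
      Kernel *k;

      for (k = X; k != Z; k = k->parent)
          reconcile_to_parent(k);       /* unions X's hidden set into Z's */
      for (k = Y; k != Z; k = k->parent)
          reconcile_to_parent(k);
  }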

In this section, we have proven that the Dagger algorithm maintains dag consistency. See [BFJ] for another algorithm we have implemented which also maintains dag consistency.

Implementation

This section describes our implementation of dag-consistent shared memory for the Cilk multithreaded runtime system running on the Connection Machine CM-5 parallel supercomputer [LAD]. We first describe the Cilk language extensions for supporting shared-memory objects and the diff mechanism [KCDZ] for managing dirty bits. We then describe the distributed cactus-stack [HD, Mos, Ste] memory allocator which the system uses to allocate shared-memory objects. Finally, we describe the mechanisms used by the runtime system to maintain dag consistency.

The Cilk system on the CM-5 supports concrete shared-memory objects of machine-word size (32-bit words on the CM-5's SPARC processors). All consistency operations are logically performed on a per-word basis. If we were to allow every word to be fetched and reconciled independently, however, the system would be terribly inefficient. Since extra fetches and reconciles do not adversely affect the consistency algorithm, we implemented the familiar strategy of grouping objects into pages [HP], each of which is fetched or reconciled as a unit. Assuming that spatial locality exists when objects are accessed, grouping objects helps amortize the fetch and reconcile overhead.

Unfortunately, the CM-5 operating system does not support handling of page faults by the user, and so we were forced to implement shared memory in a relatively expensive fashion. Specifically, in our CM-5 implementation, shared memory is kept separate from the other user memory, and special operations are required to operate on it. Most painfully, testing for page faults occurs explicitly in software rather than implicitly in hardware. Our Cilk-to-C type-checking preprocessor [Mil] alleviates some of the discomfort, but a transparent solution that uses hardware support for paging would be much preferable. A minor advantage of the software approach we use, however, is that we can support full 64-bit addressing of shared memory on the 32-bit SPARC processors of the CM-5 system.

Cilk's language support makes it easy to express operations on shared memory. The user can declare shared pointers and can operate on these pointers with normal C operations such as pointer arithmetic and dereferencing. The type-checking preprocessor automatically generates code to perform these operations. The user can also declare shared arrays, which are allocated and deallocated automatically by the system. As an optimization, we also provide "register shared" pointers, which are a version of shared pointers optimized for multiple accesses to the same page. In our CM-5 system, a register shared pointer dereference is only a few cycles slower than an ordinary C pointer dereference when it performs multiple accesses within a single page. Finally, Cilk provides a loophole mechanism to convert shared pointers to C pointers, which allows direct, fast operations on pages but requires the user to keep the pointer within a single page. We hope to port Cilk in the near future to an architecture and operating system that allow user-level code to handle page faults. In such a system, no difference need exist between shared objects and their C equivalents, and operations on shared memory can be implemented transparently with no per-access overhead.
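As a rough illustration of how such code looks to the programmer, the fragment below scales a shared array in place. The spelling of the shared qualifier and the procedure itself are assumptions made for this sketch rather than the literal Cilk syntax.

  /* Scale n consecutive shared words by the constant c (illustrative only). */
  cilk void scale(shared double *v, int n, double c)
  {
      int i;
      shared double *p = v;       /* shared pointers support ordinary C
                                     pointer arithmetic and dereferencing */
      for (i = 0; i < n; i++, p++)
          *p = c * (*p);          /* each access goes through the page cache */
  }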

An important implementation issue that we faced with the software implementation of dag-consistent shared memory on the CM-5 was how to keep track of which objects on a page have been written. Rather than using dirty bits explicitly, as the Dagger algorithm described earlier would suggest, Cilk uses a diff mechanism, as is used in the TreadMarks system [KCDZ]. The diff mechanism computes the dirty bit for an object by comparing that object's value with its value in a copy made at fetch time. Our implementation makes this copy only for pages loaded in read/write mode, thereby avoiding the overhead of copying for read-only pages. The diff mechanism imposes extra overhead on each reconcile, but it allows the user to manipulate a page using an ordinary C pointer that incurs no runtime-system overhead [ZSB].
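A minimal sketch of the diff idea follows. The page size, the twin-copy argument, and the routine name are assumptions made for this illustration, not the actual Cilk runtime code.

  #define WORDS_PER_PAGE 1024     /* illustrative page size, in words */

  /* Compare a locally modified page against the twin copy made at fetch time,
     and write back only the words that the comparison shows were modified. */
  void reconcile_page(unsigned int *page, unsigned int *twin,
                      unsigned int *ancestor_copy)
  {
      int i;
      for (i = 0; i < WORDS_PER_PAGE; i++)
          if (page[i] != twin[i])           /* implicit per-word dirty bit */
              ancestor_copy[i] = page[i];   /* reconcile only modified words */
  }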

We needed to support the detection of writes in software, because the CM-5 provides no direct hardware support to maintain dirty bits explicitly at the granularity of words. We rejected out of hand the unpleasant alternative of requiring the user to maintain his own dirty bits. Since we have limited compiler support, and since we wish to call existing C code from Cilk procedures, we determined that it would be too difficult to modify our Cilk-to-C type-checking preprocessor to automate the maintenance of explicit dirty bits. Likewise, we felt that the strategy of using binary rewriting to detect writes in software [BZS] would entail too much effort. We finally settled on the diff mechanism for its simplicity.

Some means of allocating memory must be provided in any useful implementation of shared memory. We considered implementing general heap storage in the style of C's malloc and free, but most of our immediate applications required only stack-like allocation for temporary variables and the like. Since Cilk procedures operate in a parallel, tree-like fashion, however, we needed some kind of parallel stack. We settled on implementing a cactus-stack [HD, Mos, Ste] allocator.

From the point of view of a single Cilk procedure, a cactus stack behaves much like an ordinary stack. The procedure can allocate and free memory by incrementing and decrementing a stack pointer. The procedure views the stack as a linearly addressed space extending back from its own stack frame to the frame of its parent and continuing to more distant ancestors.

The stack becomes a cactus stack when multiple procedures execute in parallel, each with its own view of the stack that corresponds to its call history, as shown in the following figure.

Figure: A cactus stack. Procedure P1 is stolen from subcomputation S1 to start subcomputation S2, and then procedure P2 is stolen from S2 to start subcomputation S3. Each subcomputation sees its own stack allocations and the stack allocated by its ancestors. The stack grows downwards. The left side of the picture shows how the stack grows like a tree, resembling a cactus. The right side shows the stack as seen by each of the three subcomputations. In this example, stack segment A is shared by all subcomputations, stack segment C is shared by subcomputations S2 and S3, and the other segments are private.

In the figure, subcomputation S1 allocates some memory A before procedure P1 is spawned. Subcomputation S1 then continues to allocate more memory B. When procedure P1 is stolen and becomes the root of subcomputation S2, a new branch of the stack is started, so that subsequent allocations performed by S2 do not interfere with the stack being used by S1. The stacks as seen by S1 and S2 are independent below the steal point, but they are identical above the steal point. Similarly, when procedure P2 is stolen from S2 to start subcomputation S3, the cactus stack branches again.

Cactus-stack allocation mirrors the advantages of an ordinary procedure stack. Any object on the stack that is viewable by a procedure has a simple address: its offset from the base of the stack. Procedure-local variables and arrays can be allocated and deallocated automatically by the runtime system in a natural fashion, as was shown in the matrix multiplication example earlier. Allocation can be performed completely locally, without communication, by simply incrementing a local pointer, although communication may be required when an out-of-cache stack page is actually referenced. Separate branches of the cactus stack are insulated from each other, allowing two subcomputations to allocate and free objects independently, even though objects may be allocated with the same address. Procedures can reference common data through the shared portion of their stack address space.
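The pointer arithmetic behind this style of allocation is illustrated by the sketch below. The structure and routine names are assumptions for this sketch; in Cilk itself, such allocations are expressed at the language level rather than through explicit calls.

  typedef struct {
      unsigned long base;      /* bottom of this subcomputation's stack view */
      unsigned long sp;        /* current top of stack, as an offset from base */
      unsigned long limit;     /* current stack limit */
  } StackView;

  /* Allocate size bytes on the cactus stack: purely local pointer arithmetic.
     Storage for a page is materialized only when the page is first touched. */
  unsigned long stack_alloc(StackView *v, unsigned long size)
  {
      unsigned long addr = v->sp;
      v->sp += size;
      if (v->sp > v->limit)
          v->limit = v->sp;    /* raise the stack limit; no storage assigned yet */
      return addr;             /* a simple address: an offset from the stack base */
  }

  /* Freeing simply decrements the stack pointer. */
  void stack_free(StackView *v, unsigned long size)
  {
      v->sp -= size;
  }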

Cactus stacks have many of the same limitations as ordinary procedure stacks [Mos]. For instance, a child thread cannot return to its parent a pointer to an object that it has created. Similarly, sibling procedures cannot share storage that they create on the stack. Just as with a procedure stack, pointers to objects allocated on the cactus stack can safely be passed only to procedures below the allocation point in the call tree. Heap storage offers a way of alleviating some of these limitations, and we intend to provide a heap allocator in a future version of Cilk, but the cactus stack provides simple and efficient support for the allocation of procedure-local variables and arrays.

The CM-5 implementation of Cilk supports shared memory by combining the Dagger algorithm for maintaining consistency with a cactus-stack allocator. Two regions are allocated within the primary memory of each processor. The first is the page cache, which contains local copies of pages for the threads running on the processor, and the second is the backing store, which is a distributed repository for pages that, for one reason or another, were forced out of the processor caches. In addition, various data structures for memory management are kept on each processor.

Earlier, we assumed that an initial thread exists whose cache contains every object. In the CM-5 implementation of Cilk, the backing store serves as the cache of this fictitious initial thread. Our implementation of Dagger accesses the backing store as if it were an ordinary cache, but unlike an ordinary cache, which can discard objects when it runs out of room, the backing store never discards an object. Consequently, the size of the backing store determines how large a shared-memory application one can run. On the CM-5, the backing store is implemented in a distributed fashion by allocating a large fraction of each processor's memory to this function. When space for a new backing-store page is needed, it is requested from a processor chosen uniformly at random. This policy ensures that the backing store is spread evenly across the processors' memory. Consequently, we can run applications that use up to about half of the total available primary memory on all processors. In other systems, it might be reasonable to place the backing store on disk, a la traditional virtual memory.

The page manager for the shared-memory system keeps information on each active and suspended kernel. Specifically, it maintains the base and limit of the portion of the cactus stack that has been allocated by the threads of the kernel. It also uses a hash table to keep track of which pages are currently in the local page cache, which pages it has allocated but which do not yet exist, and the backing-store addresses for any pages that it has allocated and for which a copy exists in the backing store. The page manager also keeps track of user references to each page in the local cache so that it can perform LRU page replacement when the cache becomes full.

When a page is allocated on the stack, the stack limit is increased, but no storage is assigned for the page until it is actually used. Thus allocation is an extremely cheap operation, as it is in sequential stack-based languages such as C. When the page is accessed for the first time, storage is allocated for it in the cache of the accessing processor. When a subcomputation completes, all of the pages in the cache whose addresses lie beyond the local stack limit are discarded, since those were allocated by completed procedures in the kernel of the terminating subcomputation. In addition, if any of these pages have been written to the backing store, they are removed, and the space for them in the backing store is freed. The pages in the cache whose addresses indicate that they were allocated by ancestors are reconciled with their least ancestor in accordance with the Dagger algorithm. If no ancestor has a copy of a given page, then the page is written to a random location in the backing store, and the ancestor that allocated the page keeps track of the backing-store location.
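The following sketch summarizes this completion-time sweep. The Page and Kernel structures and the helper routines are assumptions made for illustration; they are not the actual CM-5 runtime data structures.

  typedef struct page Page;
  struct page {
      unsigned long addr;        /* cactus-stack address of the page */
      Page *next;                /* next page in this kernel's cache */
  };

  typedef struct {
      Page *cache_pages;         /* pages currently held in the local cache */
      unsigned long stack_limit; /* local stack limit of the completed kernel */
  } Kernel;

  void discard_page(Kernel *k, Page *pg);                  /* also frees any backing-store copy */
  void reconcile_with_least_ancestor(Kernel *k, Page *pg); /* Dagger reconcile */

  /* When a subcomputation completes, discard the pages allocated by its own
     completed procedures and reconcile the rest with the least ancestor. */
  void sweep_cache_on_completion(Kernel *k)
  {
      Page *pg, *next;
      for (pg = k->cache_pages; pg != NULL; pg = next) {
          next = pg->next;
          if (pg->addr >= k->stack_limit)
              discard_page(k, pg);                   /* beyond the local stack limit */
          else
              reconcile_with_least_ancestor(k, pg);  /* allocated by an ancestor */
      }
  }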

Most of the actions on shared memory described in the Dagger algorithm can be implemented straightforwardly. The only action of significant complexity occurs when a subcomputation kernel needs to find its least ancestor holding a particular page. On the CM-5, we call the process of finding this ancestor "tree climbing," because we climb up the tree of kernels until we find the page in question. We also considered a directory-based algorithm, but it would have been more complex to implement, and so for our first implementation we opted for the simpler strategy.

Tree climbing for fetching and for reconciling are similar, and so we shall describe here only the steps taken by our implementation when a thread fetches an object on a page on its stack. When a page reference occurs, the page manager within the processor takes one of four actions:

1. If the page is in the cache, the user's action is performed on the cached copy directly.

2. If the page is not in the cache and its stack address is beyond the current stack limit, then the page manager signals an error, since the page is not allocated.

3. If the page is not in the cache and its stack address indicates that this thread's kernel was responsible for allocating the page, then the page manager goes directly to the backing store to fetch it, since none of the kernel's ancestors hold a copy. This action may cause another page to be ejected from the cache, which may itself cause tree climbing for reconciliation.

4. If the page is not in the cache and its stack address indicates that one of the ancestors of the thread's kernel allocated the page, then the page manager sends a message to the kernel's parent to fetch the page for the faulting thread recursively.

This last action keeps climbing the kernel tree until one of two events occurs. If the page is found in an ancestor's cache, then we fetch the page out of that cache. If we reach the ancestor that allocated the page and its cache does not currently have the page, however, we fetch the page directly from the backing store. Once the page has been obtained, we add it to the kernel's cache and return control to the user thread to make use of it.
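The decision logic for servicing a page reference can be summarized by the sketch below. The helper routines and the representation of a kernel's stack region as the interval [kernel_base, stack_limit) are assumptions made for this illustration, not the actual page-manager code.

  int  page_in_local_cache(unsigned long page_addr);
  void perform_user_access(unsigned long page_addr);
  void signal_error(const char *msg);
  void fetch_from_backing_store(unsigned long page_addr);
  void ask_parent_to_fetch(unsigned long page_addr);

  /* Service a reference to page_addr made by a thread whose kernel allocated
     the cactus-stack region [kernel_base, stack_limit). */
  void handle_page_reference(unsigned long page_addr,
                             unsigned long kernel_base,
                             unsigned long stack_limit)
  {
      if (page_in_local_cache(page_addr))
          perform_user_access(page_addr);        /* case 1: hit in the cache */
      else if (page_addr >= stack_limit)
          signal_error("page not allocated");    /* case 2: beyond the stack limit */
      else if (page_addr >= kernel_base)
          fetch_from_backing_store(page_addr);   /* case 3: this kernel allocated it */
      else
          ask_parent_to_fetch(page_addr);        /* case 4: climb the kernel tree */
  }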

An Analysis of Page Faults

In this section, we analyze the number F_P of page faults that a well-structured computation incurs when run on P processors using Cilk's randomized work-stealing scheduler and the implementation of dag-consistent shared memory described in the previous section. We prove that F_P can be related to the number F_1 of page faults taken by a one-processor execution by the formula F_P ≤ F_1 + 2Cs, where C is the size of each processor's cache in pages and s is the total number of steals executed by Cilk's provably good work-stealing scheduler. The 2Cs term represents faults due to warming up the processors' caches, and we present empirical evidence that this overhead is actually much smaller in practice than the theoretical bound.

We begin with a theorem that bounds the number of page faults of a Cilk application. The theorem assumes that the application is well-structured in the sense described earlier. The proof takes advantage of properties of the least-recently-used (LRU) page-replacement scheme used by Cilk.

Theorem: Let F_P be the number of page faults of a well-structured Cilk computation when run on P processors, and let C be the size of each processor's cache in pages. Then we have F_P ≤ F_1 + 2Cs, where s is the total number of steals that occur during Cilk's execution of the computation.

Proof: The proof is by induction on the number s of steals. For the base case, observe that if no steals occur, then the application runs entirely on one processor, and thus it faults F_1 times by definition. For the inductive case, consider an execution E of the computation that has s steals. Choose any subcomputation T from which no processor steals during the execution E, and which hence forms a leaf in the kernel tree. Construct a new execution E' of the computation which is identical to E, except that T is never stolen. Since E' has only s - 1 steals, we know that it has at most F_1 + 2C(s - 1) page faults by the inductive hypothesis.

To relate the number of page faults during execution E to the number during execution E', we examine cache behavior under LRU replacement. Consider two processors that execute simultaneously and in lock step a block of code using two different starting cache states, where each processor's cache has C pages. The main property of LRU that we exploit is that the number of page faults in the two executions can differ by at most C page faults. This property follows from the observation that, no matter what the starting cache states may be, after one of the two executions takes C page faults, the states of the two caches must be identical. Indeed, at the point when one execution has just taken its Cth page fault, each cache contains exactly the last C distinct pages referenced [JD].

We now use this property of LRU to count the number of page faults of the execution E. The fault behavior of E is the same as the fault behavior of E', except for the subcomputation T and its parent, call it U, in the kernel tree. The only difference between the two executions is that the starting cache state of T and the starting cache state of the section of U after T are different. Therefore, execution E makes at most 2C more page faults than execution E', and thus execution E has at most F_1 + 2C(s - 1) + 2C = F_1 + 2Cs page faults.
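To get a feel for the magnitude of this bound, consider a purely illustrative setting (these numbers are assumptions, not measurements from our experiments): if each cache holds C = 512 pages and an execution performs s = 100 successful steals, the theorem allows at most 2Cs = 2 x 512 x 100 = 102,400 extra page faults beyond F_1. As the measurements below show, the number of extra faults actually observed is far smaller than this worst case.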

The theorem says that the total number of faults on P processors is at most the total number of faults on one processor plus an overhead term. The overhead arises whenever a steal occurs, because in the worst case the caches of both the thieving processor and its victim contain no pages in common compared to the situation when the steal did not occur. Thus, they must be warmed up until the caches synchronize with the cache of a serial execution.

Figure: Histogram of the cache warm-up fraction (F_P - F_1)/(Cs) for a variety of applications, cache sizes, processor counts, and problem sizes. The horizontal axis gives the cache warm-up fraction in percent, binned from below 0.5% to above 3%; the vertical axis shows the number of experiments with a cache warm-up fraction in the shown range.

To measure the warm-up overhead, we counted the number of page faults taken by several applications (including matrixmul, a parallel version of Strassen's algorithm [Str], and a parallel version of a Barnes-Hut N-body code [BH]) for various choices of cache, processor, and problem size. For each run, we measured the cache warm-up fraction (F_P - F_1)/(Cs), which represents the fraction of the cache that needs to be warmed up on each steal. We know from the theorem above that the cache warm-up fraction is at most 2. Our experiments indicate that the cache warm-up fraction is in fact typically only a few percent, as can be seen from the histogram above, which covers experimental runs of the above applications over a range of processor counts and cache sizes. The next figure shows a particular example, the page faults during a large matrix multiplication, from which it can be seen that the constant overhead multiplying the number s of steals is in practice far smaller than 2C.

The reason why the cache warm-up costs are so low can be explained by examining the distribution of stolen problem sizes. We performed an experiment that recorded the size of each subproblem stolen, and we noticed that most of the tasks stolen during an execution were quite small. In fact, only a small fraction of the stolen problems were "large," where a large subproblem is defined to be one which takes C or more pages to execute; the rest of the stolen tasks were small. Therefore, most of the stolen subcomputations never perform C page faults before terminating. Thus the bound F_P ≤ F_1 + 2Cs derived in the theorem above is a very conservative bound, and in practice we observe only a small fraction of the extra 2Cs faults that it allows.

Figure: Page faults versus the number of processors for a matrix multiplication with a megabyte-sized cache. We show the number of successful steals s, the number of page faults F_P, our upper bound F_1 + 2Cs on the number of page faults, and our approximation F_1 + cs, where c is a small constant.

Conclusion

Many other researchers have investigated distributed shared memory. To conclude, we briefly discuss related work and offer some ideas for the future.

The notion that independent tasks may have incoherent views of each other's memory is not new to Cilk. The BLAZE [MR] language incorporated a memory semantics similar to that of dag consistency into a PASCAL-like language. The Myrias [BBZ] computer was designed to support a relaxed memory semantics similar to dag consistency, with many of the mechanisms implemented in hardware. Loosely Coherent Memory [LRV] allows for a range of consistency protocols and uses compiler support to direct their use. Compared with these systems, Cilk provides a multithreaded programming model based on directed acyclic graphs, which leads to a more flexible linguistic expression of operations on shared memory.

Cilk's implementation of dag consistency borrows heavily from the experiences of previous implementations of distributed shared memory. Like Ivy [LH] and others [CBZ, FLA, KCDZ], Cilk's implementation uses fixed-sized pages to cut down on the overhead of managing shared objects. In contrast, systems that use cache lines [CA, KOH, RLW] require some degree of hardware support [SFL] to manage shared memory efficiently at the granularity of cache lines. As another alternative, systems that use arbitrary-sized objects or regions [CAL, JKW, TBK] require either an object-oriented programming model or explicit user management of objects.

As we have gained experience programming with dag consistency, we have encountered some deficiencies of dag consistency that tend to make certain programming idioms inefficient. For example, consider a serial program that calls two procedures, each of which increments a variable by a certain amount. To parallelize this program using dag consistency, one cannot merely spawn the two procedures in parallel, because the update of one may be lost. Instead, a copy of the variable must be made for one of the procedures, and when they both return, the parent must add the value of the copy into the original variable. This extra copying can be very expensive when it occurs in the inner loop of a program. A big advantage of direct hardware support for Lamport's model of sequential consistency [Lam] is that no copying of temporaries need occur. We are currently investigating how this kind of problem can be solved efficiently in Cilk without direct hardware support.
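To make the idiom concrete, here is a hedged sketch of the workaround described above. The shared qualifier and the bump procedure are assumptions made for this illustration, not actual Cilk library code.

  cilk void bump(shared int *x, int amount)   /* child: increment a counter */
  {
      *x += amount;
  }

  /* Run two increments of the same counter in parallel under dag consistency:
     one child updates the original, the other updates a separate copy, and the
     parent adds the copy back into the original after the sync.  Both counter
     and copy are assumed to live in dag-consistent shared memory. */
  cilk void add_both(shared int *counter, shared int *copy)
  {
      *copy = 0;
      spawn bump(counter, 5);
      spawn bump(copy, 7);
      sync;
      *counter += *copy;          /* neither update is lost */
  }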

The idea of dag-consistent shared memory can be extended to the domain of file I/O to allow multiple threads to read and write the same file in parallel. We anticipate that it should be possible to memory-map files and use our existing dag-consistency mechanisms to provide a parallel, asynchronous I/O capability for Cilk. We are currently investigating how to incorporate file I/O into our system.

We are also currently working on porting dag-consistent shared memory to our Cilk-NOW [Blu] adaptively parallel, fault-tolerant network-of-workstations system. We are using operating-system hooks to make the use of shared memory transparent to the user. We expect that the well-structured nature of Cilk computations will allow subcomputations to maintain coherent views of shared memory efficiently, even in the presence of processor faults.

Chapter

Cilk: Supporting Speculative Computations

The Cilk system described in this chapter, which is still in the process of being implemented, is intended to remedy a deficiency in the current Cilk language. When we designed Cilk and added support for procedures with call/return semantics, we were able to rewrite almost all existing programs using the new, simpler Cilk syntax. The only existing application which cannot be expressed in that style is Socrates. We cannot express Socrates using the higher-level Cilk constructs in large part because the control structure of the parallel search algorithm used in Socrates is fairly complex and includes speculative computations which may be killed off. For most algorithms, the order in which the user's threads are executed affects neither the number of threads created nor the work performed by those threads. But for speculative algorithms like the Jamboree search algorithm at the heart of Socrates, the number of threads executed and the amount of work those threads perform can depend greatly on the order in which threads are executed. The high-level Cilk constructs do not give the user enough control over the execution of his code to write an efficient speculative algorithm, so the chess code continues to use the lower-level Cilk syntax.

Although we focus on Socrates, chess is not the only speculative algorithm one might want to write in Cilk; there are many others. Any sort of search algorithm where only some solutions are wanted can be naturally cast as a speculative algorithm. For example, consider the protein-folding code described in an earlier chapter. A useful modification to that algorithm would be, rather than finding all possible foldings of a polymer, instead finding just one folding that has an energy value less than some threshold.

In this chapter, we first describe some proposed extensions that will allow speculative algorithms to be written. We then show how these extensions could be used to implement Socrates.

This chapter represents ongoing joint work by the members of the Cilk team: Robert Blumofe, Feng Ming Dong, Matteo Frigo, Bradley Kuszmaul, Charles Leiserson, Richard Tauriello, Keith Randall, and myself. Feng Ming Dong has modified the Cilk-to-C preprocessor to accept the new Cilk language, and Richard Tauriello has begun implementation of the runtime system changes needed for it.

The Cilk Language

We have proposed two extensions to the Cilk language that will enable us to express Socrates, as well as other speculative algorithms, without resorting to the lower-level Cilk syntax. The first extension is to allow the user to specify a restricted piece of code, called an inlet [CSS], that is to be executed as soon as a spawned child returns. In chess, an inlet can be used to check the result of a test of a position and perform some action based on that result. The second extension is to allow the user to abort all the children spawned by a procedure. In a speculative program, computations are spawned off whose results may not be needed. When it is determined that certain results are not needed, the computation computing those results should be aborted. Currently, the chess program contains user-level code for aborting children. This code was fairly difficult to implement correctly, required detailed knowledge of the runtime system, and worked only with the lower-level Cilk syntax. Adding an abort primitive to the runtime system greatly simplifies the writing of speculative computations.

The new Cilk language allows the user to specify inlets to receive the results of child procedures. We want inlets to run shortly after the child completes and to incorporate the returned results into the parent computation. Therefore, inlets must be able to read and write variables local to the parent procedure. In order to be able to name the variables in the parent procedure, the user defines inlets within the parent procedure. Since the parent procedure, as well as the inlets, may read and write the same variables, the system guarantees that inlets from the same parent procedure do not execute concurrently with each other or with the parent procedure. Inlets can contain arbitrary Cilk code, with just one restriction, namely that inlets are not allowed to execute a sync. Inlets can spawn off additional procedures, however. These spawns are treated just like spawns by the parent procedure itself: the parent procedure does not proceed beyond a sync while any of these spawns are outstanding.

The syntax for spawning an inlet is

    inlet_spawn I(args)

where I is the name of an inlet and each argument can be either a standard argument or a spawn expression. When a spawn expression is used as the ith argument of an inlet, the spawned child is spawned off in the usual manner. The type of the inlet's ith argument must match the type of the value returned by the spawned child. When the spawned child completes, its return value is passed to the inlet as its ith argument. Once an inlet has all its arguments, the inlet code can be executed.
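As a small illustration of these rules, the hypothetical procedure below uses an inlet to accumulate the values returned by its spawned children. The tree type and the procedure are invented for this example, which uses the proposed syntax; as noted later, the final syntax may differ.

  typedef struct node {
      int value;
      struct node *left, *right;
  } NODE;

  cilk int tree_sum(NODE *t)
  {
      int total;

      inlet void add_in(int partial)     /* runs shortly after a child returns */
      {
          total += partial;              /* inlets may read and write the
                                            parent's local variables */
      }

      if (t == NULL) return 0;
      total = t->value;
      inlet_spawn add_in(spawn tree_sum(t->left));
      inlet_spawn add_in(spawn tree_sum(t->right));
      sync;                              /* both children and their inlets have
                                            completed once the sync is reached */
      return total;
  }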

Inlets can be implemented without any major changes to our runtime system. Each time a procedure P spawns an inlet I which receives the result of zero or more child procedures, a new closure is created for the inlet thread I. This inlet thread is spliced into the computation dag between P and the children: the children send their results to I rather than to the parent P. The inlet thread I is treated as a high-priority thread, so that once the results from all the children arrive, the inlet I executes shortly thereafter. The result of the inlet thread is then sent to P, where it is treated the same as any other arriving argument.

Ensuring that two inlets from the same parent procedure cannot execute concurrently is not difficult to do. Since inlets can access the variables in the frame of a procedure, the system is constrained to execute an inlet on the same processor that contains the frame of the parent procedure. Since frames are never moved when there are outstanding spawns, all inlets are therefore executed on the same processor that executed their parent procedure. This property allows us to easily make the guarantee that no two inlets execute concurrently.

When implementing inlets, we must also be careful not to break the performance guarantees provided by our scheduler. The subtlety occurs in regard to the provably good steals described earlier. Remember that when a stolen thread sends a value that enables a second thread, the enabled thread is posted to the ready queue of the sending processor, not to the processor on which it originally resided. This policy is necessary for the scheduler to be provably good. What happens when a stolen thread on one processor sends a value which enables an inlet whose parent frame resides on another processor? According to the above policy, the inlet should be posted to and executed on the sending processor. But, as we have seen in the preceding paragraph, inlets cannot be stolen. To solve this problem, we treat the inlet as if it were executing on the sending processor. Therefore, any computation enabled by the execution of the inlet must be migrated to the sending processor. Computations which may be enabled by the inlet include any procedures that the inlet spawns, as well as the parent procedure itself if the value being returned is the last value the parent is waiting for. By migrating these computations to the sending processor, and by having the sending processor not begin work stealing until it finds out whether any computations will be migrated, we maintain the performance guarantee.

The second extension to the Cilk language provides two new primitives, abort_and_continue and abort_and_return. When one of these primitives is called, either from within a procedure P or, more commonly, from within an inlet created by procedure P, the system terminates any outstanding children that procedure P may have. This implies terminating not just the children of P, but also all descendants of those children. Just how graceful this termination will be is still to be determined. Currently, we expect that we will not halt any executing threads, but will simply prevent new threads from beginning. We also considered allowing the user to specify a piece of code, similar to an inlet, that would be executed when a procedure is aborted. This would allow the user to clean up, perhaps, for example, by releasing some piece of storage the procedure had allocated. We decided not to implement this option, since we saw no immediate need for it, but we may add it if a need arises. The two new primitives differ in where execution continues after the abort completes. When the abort_and_continue primitive is called, the procedure continues at the next sync statement once all children have been aborted, while when the abort_and_return primitive is called, the procedure returns once the abort completes.

The abort mechanism will be implemented similarly to the abort mechanism in the chess code, as described earlier. To implement aborts, we will augment the runtime system to keep track of the status of all spawned children. The status must contain enough information to find all children, even those that have been stolen by another processor. In essence, this information creates a tree of all existing procedures. When an abort occurs inside procedure P, we can walk the subtree rooted at P to find all descendants of P. As we walk this tree, we set a flag in each procedure indicating that the procedure is to be aborted. At the beginning of each procedure, the preprocessor adds a check of the abort flag, and if this flag is set, the procedure returns without executing the user's code. Similar checks are performed each time a procedure restarts after a sync. Since a spawned-but-aborted procedure still executes, the dag of the aborted computation is cleaned up automatically. This method of aborting also allows the runtime system to wait until the abort is complete before returning from the abort primitive.
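The following sketch shows the kind of bookkeeping walk this implies. The procedure-record structure and its fields are assumptions made for illustration and are not the actual runtime data structures.

  typedef struct proc_record {
      int abort_flag;                 /* checked at procedure entry and after syncs */
      struct proc_record *child;      /* first outstanding child */
      struct proc_record *sibling;    /* next child of the same parent */
  } ProcRecord;

  /* Mark every descendant of p as aborted by walking the subtree rooted at p.
     An aborted procedure still runs, but returns immediately without executing
     the user's code, so the dag of the aborted computation is cleaned up
     automatically. */
  void abort_descendants(ProcRecord *p)
  {
      ProcRecord *c;
      for (c = p->child; c != NULL; c = c->sibling) {
          c->abort_flag = 1;
          abort_descendants(c);
      }
  }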

One unanswered question about the implementation of aborts is how much overhead the abort mechanism will add to the execution of a program. Inlets have the nice property that they cause no overhead except where they are used, but this property is not true of aborts. Even if a procedure P does not itself perform any aborts, one of P's ancestors could do an abort, in which case P and all its descendants need to be aborted. Therefore, the system must keep track of the information needed to do an abort for all procedures. Although we do not expect the overhead of keeping this information to be large, this overhead seems wasteful, especially since most programs do not use aborts. To eliminate this overhead, the current proposal is to have a compile-time flag that informs the preprocessor whether abort information should be kept for all procedures or for none of them. In this way, programs that do not use the abort mechanism will not pay any overhead.

As mentioned earlier, the modifications discussed here have not yet been implemented. The current status is that most of the design decisions have been made, we have decided how to implement the changes, and implementation has begun. Although we have settled on a syntax, the syntax always seems open to change, so the final syntax will likely differ from what is presented here. However, even if the details of the syntax change, I do not expect the power and expressibility of the final version to differ significantly from the system presented here. Currently, Feng Ming Dong is working on modifying the type-checking preprocessor to support the new syntax, and Richard Tauriello is making the modifications to the runtime system to support inlets and aborts. Richard Tauriello will report on the runtime-system modifications in his master's thesis.

A Cilk Example: Chess

We now have the primitives needed to write the chess code in the extended Cilk language, but before examining this code, let us first briefly review the requirements of the search algorithm used in Socrates. The inputs to the search algorithm include a chess position, a depth d, and a range of interest specified by the bounds α and β. If the exact value v of the position when searched to depth d lies between α and β, then the exact value of the position should be returned. But if v is at most α, then the search need only return some value v' that is an upper bound on v. Similarly, if v is at least β, then the search need only return some value v' that is a lower bound on v. In the case where β = α + 1, the search reduces to a test of whether v ≤ α or v ≥ β. We call a search a test search if β = α + 1, and otherwise it is called a full value search. A figure below replicates the Cilk dag shown in an earlier chapter, which describes the control flow of a full value search.

The search code is broken into two parts: the first code figure below shows the Cilk code for a test search, and the second shows the code for a full value search. In this code, we focus on the control flow needed to perform a search and ignore details that do not affect the control flow. In the code shown, each search routine is passed two items: a state structure, which completely describes the current position and the search to be done, and a move specifying which move is to be applied to the current position. The code performs the search and returns a score for the new position.

The code for the test search is the simpler of the two. Initially, we define the inlet check_test_result, which is described later. Then we begin by applying the move to the state structure, and if we have searched deep enough, we immediately return a value for the position. Otherwise, we determine whether a null-move search should be performed, and if so, we spawn it off and wait for the result. We then check the returned score, and if the score is greater than β, we return immediately. Returning with a score greater than β is called failing high. Otherwise, we search the first child, wait for it to complete, and again check to see if we can fail high. Up to this point, the search code can be written entirely in the existing high-level Cilk style.

It is the next part of the code where the new constructs are used. Typically, if a search is going to fail high, it will do so either in the null-move search or during the search of the first child. Therefore, after doing these two searches, it is reasonable to do all the remaining searches in parallel.

Figure: This dag shows the Cilk threads created by Socrates when performing a value search. The circles labeled VAL, VAL_NUL, and TEST_1 through TEST_n correspond to recursive calls of the search algorithm, which may themselves start further recursive computations; the Setup, Test, Check, Value, and Finish threads implement the surrounding control flow. Edges in the figure indicate data dependencies, thread creation, argument merging, and the aborting of computations.

cilk int test(STATE *s, MOVE move)
{
  int child_score;                /* result from search of a child */
  int best_score = -INF;          /* best score found so far; -INF is a score
                                     lower than any real evaluation */
  int j;
  MOVE move_list[MAX_NUM_MOVES];  /* the moves to be tried */

  /* define the inlet to be run when a test completes */
  inlet void check_test_result(int score)
  {
    if (score >= s->beta)
      abort_and_return(score);
    best_score = MAX(best_score, score);
  }

  apply_move(s, move);            /* make the move */
  if (s->depth == 0)              /* If we reached the bottom of the tree, */
    return(evaluate(s));          /* compute and return the score */

  /* Try a null move search */
  if (try_null_move_search_p(s)) {
    child_score = spawn test(s, NUL_MOVE);
    sync;
    /* If score from null move beats beta, we are done */
    if (child_score >= s->beta) return(child_score);
  }

  generate_moves(move_list);      /* Generate the moves to be tried */

  /* Search the first child */
  child_score = spawn test(s, move_list[0]);
  sync;
  /* If score from first move beats beta, we are done */
  if (child_score >= s->beta) return(child_score);
  best_score = MAX(best_score, child_score);

  /* spawn a test for each remaining move */
  for (j = 1; j < s->num_children; j++)
    inlet_call check_test_result(spawn test(s, move_list[j]));
  sync;

  return(best_score);
}

Figure: Cilk code for test search.

If any one of these searches does fail high, we can return without completing any of the remaining searches. In order to return immediately, inlets and aborts are needed. We define a simple inlet called check_test_result to receive the result of the search of a child. This inlet checks to see if the child's score beats β, and if so, it calls the new primitive abort_and_return, which kills off any computations spawned by this search procedure and returns. This inlet also updates best_score, which keeps track of the best score found so far. The remainder of the code for a test search spawns off searches of the rest of the possible moves, while specifying that the result of each search should be passed to a check_test_result inlet. After spawning all the children, the code performs a sync. If the sync is reached, then no test failed high, so we simply return best_score, the value of the best move that we found.

The code for a full value search is split into two parts, shown in the two code figures below. The code begins by defining the needed inlet, which will be described later. After defining the inlet, the rest of the code, which appears in the second part of the figure, is very similar to the code for a test search. Initially it is identical: we return if the position has been searched deep enough, if a null-move search fails high, or if the search of the first child fails high. At this point, the Jamboree search algorithm requires us to perform test searches on all the remaining moves in parallel, to see if they could possibly beat the best score. If the test of any move fails high, then we need to do a full value search for that move. These full value searches should be done one at a time and in order. To perform the search in this way, the code spawns off all the tests in parallel, just as in the code for a test search. The difference is in the inlet that is run when the tests return: this inlet needs to make sure the full value searches are spawned off only if needed, and only in serial order.

In order to control the spawning of the value searches, we store information about the status of the child searches in the parent's frame. We create an array status[] whose ith element gives the search status for the ith move. Initially, all the entries are set to DOING_TEST, indicating that a test search has been spawned off. The other possible states are DO_VAL, which indicates that a full value search is needed; DOING_VAL, which indicates that a full value search has been spawned off; and DONE, which indicates that no more searches are needed for this move. As an optimization, we also keep track of a variable next_child, which indicates the next move for which a full value search could be started. A move can be the next to be fully searched only if, for all of the earlier moves, the full value searches of those moves have either been completed or were not necessary at all.

To implement the Jamboree search algorithm, the code uses the inlet child_done, which receives the result of the search of a child. The same inlet is used to receive results from test searches and from value searches.

This inlet first checks to see if the returned result beats β. If so, the search should fail high, and the abort_and_return primitive is used to end the search immediately. Otherwise, the inlet continues. The inlet next updates the status information for this child. When a test search has just completed, performing this update requires checking to see if the returned result was greater than α. If so, a value search is needed, and so the status is set to DO_VAL; otherwise, no value search is needed, and so the status is set to DONE. When a value search has just completed, no further searching is needed for this child, so the status of the child is set to DONE. The last action of the inlet is to spawn off a value search, if appropriate. This action is performed by walking through the moves, beginning with next_child, until either a value search is spawned off or until we discover that no value search should be started. If the status of the move being considered is DOING_TEST, then no value search is spawned off, because we must wait until the test of that move completes. If the status of the move being considered is DO_VAL, then we spawn off the value search for that move, specifying, of course, child_done as the inlet to be run when the search completes. If the status of the move being considered is DOING_VAL, then no value search is spawned off, because there is already a value search in progress. Otherwise, the status of the move being considered is DONE, so no value search is needed for it. In this case, we set next_child to be the next move and loop.

There is one detail that may affect performance which was handled more efficiently in the original Cilk version than in the new version shown here. When we perform a test search, this version tests against the value of α that the parent had when the search was spawned. Occasionally, between the time the test search was spawned and the time execution of that test begins, the parent's value of α may increase. The search algorithm is more efficient if we test against the parent's current value of α rather than the earlier value. Since the original Cilk version passes around pointers to state structures, test searches in that version have access to the parent's current value of α, and the current value is in fact used by the test search. The version shown here does not use the latest value of α, because it has no access to that value. If we modified the code to store the state structures in shared memory and then passed around pointers to those structures, then this version could use the latest value of α as well.

cilk int value(STATE s, MOVE move)
{
    /* Remember that only one value search should be in progress at a time,
     * and that all value searches must be done in order.
     *
     * The variables:
     *   status[i]   tracks the status of move i.  It is one of:
     *                 DOING_TEST  initial test not complete
     *                 DO_VAL      full value search needed, not yet started
     *                 DOING_VAL   full value search in progress
     *                 DONE        all testing of this child complete
     *   next_child  the next child for which a value search could be begun
     */
    int status[MAX_NUM_MOVES], next_child;
    int best_score;
    int child_score, test_alpha, i, j;       /* remaining locals used below */
    MOVE move_list[MAX_NUM_MOVES];

    inlet void child_done(int child_score, int child_index, int srch_type)
    {
        if (child_score >= s.beta) abort_and_return(child_score);
        best_score = MAX(best_score, child_score);
        s.alpha = MAX(child_score, s.alpha);

        if (srch_type == TEST_SRCH) {
            /* A test search completed.  Set status based on the test result. */
            if (child_score > test_alpha)
                status[child_index] = DO_VAL;
            else
                status[child_index] = DONE;
        } else {
            /* A value search completed.  This child is finished. */
            status[child_index] = DONE;
        }

        /* See if we need to start a value search. */
        while (next_child < s.num_children) {
            if (status[next_child] == DOING_TEST) break;
            else if (status[next_child] == DOING_VAL) break;
            else if (status[next_child] == DO_VAL) {
                status[next_child] = DOING_VAL;  /* mark it so it is spawned only once */
                inlet_call child_done(spawn value(s, move_list[next_child]),
                                      next_child, FALSE);
                break;
            }
            else next_child++;                   /* DONE: consider the next move */
        }
    }

(continued on next page)

Figure: Cilk code for value search.

Full Value Search, Continued

    apply_move(s, move);                     /* make the move */

    if (s.depth == 0) {                      /* If we reached the bottom of the tree, */
        best_score = evaluate(s);            /* compute and return the score.         */
        return(best_score);
    }

    best_score = -INF;                       /* -INF (assumed constant): start below any attainable score */

    /* Try a null move search. */
    if (try_null_move_search_p(s)) {
        child_score = spawn value(s, NUL_MOVE);
        sync;
        /* If the score from the null move beats beta, we are done. */
        if (child_score >= s.beta) return(child_score);
    }

    generate_moves(move_list);               /* Generate the moves to be tried. */

    /* Search the first child. */
    child_score = spawn value(s, move_list[0]);
    sync;
    /* If the score from the first move beats beta, we are done. */
    if (child_score >= s.beta) return(child_score);
    s.alpha = MAX(s.alpha, child_score);
    best_score = MAX(best_score, child_score);

    next_child = 1;
    for (j = 1; j < s.num_children; j++) status[j] = DOING_TEST;

    /* Spawn off the tests. */
    test_alpha = s.alpha;
    for (i = 1; i < s.num_children; i++)
        inlet_call child_done(spawn test(s, move_list[i]), i, TEST_SRCH);
    sync;

    /* We reach here only if no child beat beta. */
    return(best_score);
}

Figure (continued): Cilk code for value search.

and then passed around pointers to the state structure, then the Cilk version shown above could use the latest value of alpha as well.

Conclusions

Using the new Cilk primitives, we are now able to implement the Jamboree search algorithm in under three pages of code. This is much simpler than the two dozen threads which are needed to implement the search code in the lower-level Cilk syntax. It will be interesting to see how the performance of this Cilk implementation compares with the performance of the lower-level Cilk implementation. We expect that the performance of the two will be similar. However, even though this Cilk version is written at a higher level, it is possible it will perform better. The lower-level Cilk version used low-level features, such as nonstealable threads, that interfere with the operation of the provably good scheduler by causing the busy-leaves property not to hold. By eliminating the use of these features, the new Cilk version may actually improve the efficiency of the search.

The additions described in this chapter are useful for more than just chess. One natural application of Cilk is backtrack search where only one solution is needed. In Cilk it is easy to write a backtrack search routine that finds all possible solutions to a problem, but often only one solution is needed, and Cilk-style codes have no way to stop the search once the first good solution is found. With the Cilk additions it is easy to modify a search program to stop after finding one solution: just associate with each spawn an inlet that performs an abort and return if the spawned child has found a solution.
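As a rough illustration, the following sketch uses the same spawn/sync/inlet/abort primitives as the chess code above to stop a backtrack search after the first solution. The types and helper functions (POSITION, solution, record_solution, num_choices, extend) are hypothetical stand-ins introduced only for this sketch, not part of any Cilk library.

cilk int search(POSITION p)
{
    int found = 0;          /* becomes 1 once any child reports a solution */
    int i;

    /* Inlet run as each spawned child returns: if that child found a
     * solution, abort all outstanding sibling searches and return at once. */
    inlet void child_done(int child_found)
    {
        if (child_found) {
            found = 1;
            abort_and_return(1);
        }
    }

    if (solution(p)) {      /* hypothetical test: is p a complete solution? */
        record_solution(p); /* hypothetical: remember the solution found    */
        return 1;
    }

    /* Spawn a search of every extension of p; the first success aborts the rest. */
    for (i = 0; i < num_choices(p); i++)
        inlet_call child_done(spawn search(extend(p, i)));
    sync;

    return found;           /* reached only if no child found a solution */
}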

Chapter

Conclusions

This chapter summarizes some of the features of the Cilk system and then describ es some

areas for future work

Summary

We think the Cilk system has achieved its goal of allowing a programmer to easily and efficiently implement a wide range of asynchronous, dynamic, parallel algorithms. At the beginning of this document we listed a number of characteristics that a good parallel programming system should have. This section revisits this list and examines how Cilk stands up.

Minimize the gap between applications and languages: The Cilk system allows a programmer to focus on his application, not on the low-level protocols needed to implement a parallel algorithm. The Cilk system minimizes this gap by raising the level of programming. The Cilk system hides from the programmer most low-level details, such as thread encapsulation, thread scheduling, and load balancing. The Cilk system goes further and hides the continuation-passing nature of the runtime system from the user, allowing the user to write code in a style similar to serial code. Cilk further minimizes this gap by allowing the user to easily implement efficient speculative computations.

Provide predictable performance: In Chapter we showed that, with P processors, the expected execution time of a Cilk computation, including scheduling overhead, is bounded by T_P = O(T_1/P + T_∞). With this knowledge a programmer is able to predict the performance of his program before it is even executed, by estimating T_1 and T_∞. Alternatively, a programmer can run his program once and use the reported values of T_1 and T_∞ to accurately predict how his program will perform on other machine sizes.
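As a hedged illustration with invented numbers (they are not measurements from this thesis): suppose a run reports a total work of T_1 = 2000 seconds and a critical-path length of T_∞ = 2 seconds. Ignoring the constant hidden in the O-notation, the model predicts roughly

    T_32  ≈ 2000/32  + 2 ≈ 64.5 seconds
    T_256 ≈ 2000/256 + 2 ≈  9.8 seconds

so near-linear speedup can be expected until P approaches the average parallelism T_1/T_∞ = 1000.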

Execute efficiently: There is not a large overhead for executing a program with the Cilk system. We have shown empirically in Section that for most applications the execution time of a Cilk program on one processor is comparable to the execution time of a serial code when run on the same processor.

Scale well: We have shown empirically in Section that we can accurately model the execution time of a program as T_P ≈ T_1/P + c_∞ T_∞, where c_∞ is a small value for the knary example. Since the constant in front of the T_1/P term is one, we obtain nearly perfect linear speedup when the available parallelism is large compared to the number of processors.

Portable: Cilk has been ported to a wide range of systems. It runs on various serial machines under Unix and Linux, on symmetric multiprocessors (e.g., Sun, SGI), and on massively parallel processors (e.g., the CM-5 and the Paragon). In addition, Blumofe has implemented a version of Cilk which runs on networks of workstations [Blu].

Leverage existing codes: Since Cilk can call standard C functions, much of the existing serial code can often be used when porting an application to Cilk. This was especially important in porting the Socrates chess program and the POV-Ray ray tracer to Cilk. Both of these are large applications, and when they were ported to Cilk, most of the code for these applications was able to be reused without modification.

Be expressive: Although there are many applications that cannot be easily expressed in Cilk, there is a wide range of applications which can. And many of the applications that can be expressed in Cilk cannot be easily expressed in other parallel languages. The Socrates program is an example of one such complex program. Although this program was time-consuming to write and debug in Cilk, the Cilk additions described in Chapter help make this program easier to express in Cilk. In addition, the shared-memory system described in Chapter significantly increases the expressibility of the language by allowing large amounts of data to be easily shared throughout the computation.

Future Work

The Cilk system described in this thesis is quite useful, as it allows a wide range of programs to be easily expressed while still achieving good performance. But, as we said earlier, the story of Cilk is one of incremental improvement. There are still areas we think we can improve, and so the story of Cilk is not yet over. We conclude this thesis by describing some of the improvements to the system that we have considered.

One improvement we would like to make is to build a shared-memory system which does not destroy the performance guarantees of Chapter. With the current shared-memory system we are able to bound the number of page faults a program makes, but we have been unable to provide a tight theoretical bound on the execution time of a shared-memory Cilk program. Experiments are currently under way with a new and simpler implementation of dag-consistent shared memory about which we hope to be able to prove tighter bounds. This dag-consistent shared-memory implementation does not perform the tree-walk operation described in Chapter; instead, a kernel always goes to the backing store to access a page. This change makes theoretical analysis easier, since it eliminates the need to analyze the execution time of performing the tree walk. In practice, on machines such as the CM-5, where sending a short message is inexpensive, we expect the performance of this new system to be similar to the performance of the tree-walk algorithm. For machines where the message overhead is larger, we expect this implementation to be more efficient, since eliminating the tree walk reduces the number of protocol messages that are needed.

Another improvement we would like to make is to allow certain shared-memory applications to run more efficiently by reducing the amount of data movement necessary. Shared memory lets us move data to the computation, but, as is well known, moving the computation to the data is often more efficient. Currently the Cilk system has no way of moving the computation to the data, so a shared-memory Cilk program often needs to perform more data movement than other systems need to perform. As an example, consider performing many iterations of array relaxation. In a data-parallel program the array would be distributed across the machine, with each processor having its own section, and on each iteration only the elements at the edge of a processor's section would need to be communicated. In a naive Cilk program, on each iteration a new, random tree is built to spread the computation among the processors. Each processor would typically get a different portion of the array each iteration. The entire array would be communicated twice per cycle: first from the backing store to the processor performing the computation, and then back again to the backing store.

We have considered two methods by which we could more closely associate a computation with its data, thereby reducing the amount of communication a shared-memory program requires.

The simpler of the two methods is based on augmenting the existing shared-memory system to try to increase reuse of data. The idea is to try to keep track of how the computation was spread out on the previous iteration and, where possible, try to repeat that computation tree on the next iteration, so that a processor would tend to work on the same data from iteration to iteration. To implement this we would have to figure out how to regrow the computation tree the same way from iteration to iteration. Presumably the user would specify when the system should try to do this. Also, the current shared-memory system cannot take advantage of potential data reuse between iterations. When a new kernel begins, which happens every iteration, a processor will not use any of the data currently in its cache, because it does not know if that data is up to date. We would need to modify the shared-memory system so that when an old copy of a needed page is in a processor's cache, the processor is able to check to see if the page is current before it requests a new copy. To take full advantage of this method we would also need to modify the system so that updated pages do not need to be sent to the backing store after every iteration. Instead, we would want to tell the backing store where the updated page is and leave the updated page in the cache, even after the kernel that updated the page finishes.

A second method to decrease communication in programs using lots of data is to use what we call persistent closures. The idea here is to try to mimic the way that the data-parallel model works. We would bind to a processor a thread which performs part of the computation and allow that thread to be executed repeatedly. For the array relaxation example, we would bind to each processor a thread to perform the relaxation on part of the array, and that thread would be executed once each iteration. The portion of the array used by that thread would be bound to the processor as well, and we would provide the user with a way to specify what data needs to be communicated on each iteration. We have performed some simple experiments using this idea and have seen some promising results. We implemented an array relaxation example using low-level Cilk features, and this program outperformed a similar array relaxation program we wrote in a data-parallel language. Although we can implement this mechanism using low-level Cilk features, we do not know how to add such a mechanism to Cilk at a reasonably high level. Also, we have no idea how to integrate such a feature into Cilk in a way that gives us any performance guarantees.

We have no current plans to add either of these methods to Cilk, since we have some questions about the implementation of each of them. We do expect to eventually address the issue of reducing data movement in Cilk, but whether it will be via one of these two methods or via something completely different is not yet known.

A final improvement that we are considering is to implement a stack-based execution model using lazy task creation [MKH]. This modification is one we think we understand, and we expect to implement it in the near future. Switching Cilk to a stack-based execution model would provide two benefits. First, it would lower the overhead of spawning new tasks. Second, under a stack-based model the execution order of Cilk programs would more closely mimic the execution order of serial programs.

A stack-based model differs from the current model in what happens when a spawn is encountered. In the current system, when a spawn is reached, the state needed to execute the spawned child is packaged up and put aside, and the parent procedure immediately continues execution after the spawn. This is the opposite of the execution order of function calls in most languages: typically, when a function call is made, the parent function suspends until the called function completes. Under a stack-based model, when a spawn is reached, the execution order mimics the execution order of a serial program by suspending the parent and beginning execution of the child. However, enough state about the parent is kept around so that if a steal request arrives, the parent can be packaged up and stolen.

This stack-based technique makes the overhead of a spawn comparable to the overhead of a procedure call. The only difference is that when spawning, some extra information may be kept around so that the parent is able to be stolen. When a parent is stolen, the stack-based model has the additional cost of packaging up the parent. This cost should be similar to, and probably slightly greater than, the cost of performing a spawn in the current system. Since steals are rare compared to spawns, this technique significantly reduces the overhead of a Cilk program.
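The following toy, single-threaded C sketch (ordinary C, not Cilk, and not the scheduler itself) is meant only to illustrate the difference in execution order just described: under the current model the parent records each spawn and keeps going, so the children run later, while under a stack-based model each child runs immediately, as in a normal call, with only a record of the parent kept in case of a steal.

#include <stdio.h>

static void child(int i) { printf("  child %d runs\n", i); }

/* Current (parent-first) model: each spawn merely records the child,
 * and the parent keeps going; the deferred children run later. */
static void parent_first(void)
{
    int deferred[2];
    int n = 0;

    printf("parent-first order:\n");
    deferred[n++] = 1;                      /* "spawn" child 1 */
    printf("  parent continues past spawn 1\n");
    deferred[n++] = 2;                      /* "spawn" child 2 */
    printf("  parent continues past spawn 2\n");
    for (int i = 0; i < n; i++)             /* the deferred children run here */
        child(deferred[i]);
}

/* Stack-based (child-first) model: the parent suspends at each spawn and
 * the child runs at once, as in a serial program; only a small record of
 * the parent's continuation would be kept for a potential steal. */
static void child_first(void)
{
    printf("child-first order:\n");
    child(1);                               /* "spawn" child 1 runs immediately */
    printf("  parent resumes after spawn 1\n");
    child(2);
    printf("  parent resumes after spawn 2\n");
}

int main(void)
{
    parent_first();
    child_first();
    return 0;
}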

Concluding Remarks

This thesis began by pointing out that recently parallel hardware has been advancing faster than parallel software. Parallel machines are becoming commonplace, but they are typically used for executing many independent jobs, since writing a true parallel program is still a difficult task. Cilk alone is not the solution to the parallel software problem; probably no one system is. But Cilk has the potential to become part of the solution and to help spread the use of parallel programming. By allowing a programmer to easily implement efficient asynchronous parallel algorithms, Cilk can become one more entry in a programmer's arsenal of tools for attacking parallel programming problems.

Appendix A

Protein Folding Optimizations

This appendix describes the algorithmic optimizations that we made to the original protein folding code. These changes provided a speedup of several orders of magnitude on various problem sizes. This appendix assumes the reader is familiar with the application as described in Section.

In Section we described the changes made to the protein folding code to express it in Cilk. In addition to these changes, we made other changes to the original serial code which significantly improved the performance of the protein folding application. These changes make use of simple checks to see if we can end the search down a branch early. When searching through the cube, creating a partial path, it is easy to create a partial path from which no Hamiltonian path can be created. As a simple example, let us consider paths on a two-dimensional grid. A short path which includes the three points nearest the corner, but not the corner itself, can never be extended into a Hamiltonian path. But since much of the grid remains unvisited, a naive algorithm would perform a significant amount of searching before giving up on this partial path. By adding some simple checks to the search algorithm, we can reduce much of this wasted search. We do this as follows. For each point in the cube we keep track of how many unvisited neighbors it has. A point with no unvisited neighbors, such as the corner point in the above example, can never be reached. Since a Hamiltonian path must reach every point exactly once, if a point with no unvisited neighbors exists, then no Hamiltonian path is possible. Path (a) of Figure A shows a 2-D grid in which one point has no unvisited neighbors. If at some point in the search a point with no unvisited neighbors is created, then there is no way to produce a Hamiltonian path from the current partial path, so the search of the current partial path can be ended. Also notice that if a point with only one unvisited neighbor exists, then no path can be created that continues through that point; therefore, any possible Hamiltonian path must end at that point. Therefore, if two such points ever exist, then no Hamiltonian path is possible. Path (b) of Figure A shows an example where two points each have only one unvisited neighbor, and so no path is possible. So at each step the algorithm checks to see if two such points have been created, and if so, the search of the current partial path is ended. These checks speed up the program by a large factor on the two smaller cube sizes. We could not compute the exact speedup for the largest cube, because it would take too long to run this size without the improvements; however, we estimate that the improvements provide an even larger speedup on this problem size.
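The following is a minimal C sketch, not the thesis code, of the two pruning rules just described. It assumes the search keeps, for every lattice point p, a flag visited[p] and a count unvisited_nbrs[p] of p's neighbors not yet on the partial path, updating both as moves are made and unmade; the lattice size, indexing scheme, and the treatment of points next to the current end of the path (which can still be entered directly) are assumptions of this sketch.

#include <stdbool.h>
#include <stdlib.h>

#define N        4              /* assumed edge length of the cube           */
#define NPOINTS  (N * N * N)    /* points are indexed p = (i*N + j)*N + k    */

/* True if lattice points p and q are nearest neighbors. */
static bool adjacent(int p, int q)
{
    int di = abs(p / (N * N) - q / (N * N));
    int dj = abs((p / N) % N - (q / N) % N);
    int dk = abs(p % N - q % N);
    return di + dj + dk == 1;
}

/* Returns true if the partial path ending at point `end' might still extend
 * to a Hamiltonian path, using the two checks described above:
 *   - an unvisited point with no way in can never be reached;
 *   - at most one unvisited point may be a forced endpoint.
 * A neighbor that is the current end of the path still counts as a way in,
 * since the next move leaves from it. */
static bool may_extend(int end,
                       const bool visited[NPOINTS],
                       const int unvisited_nbrs[NPOINTS])
{
    int forced_endpoints = 0;

    for (int p = 0; p < NPOINTS; p++) {
        if (visited[p])
            continue;
        int ways_in = unvisited_nbrs[p] + (adjacent(p, end) ? 1 : 0);
        if (ways_in == 0)
            return false;                       /* unreachable point: prune      */
        if (ways_in == 1 && ++forced_endpoints > 1)
            return false;                       /* two forced endpoints: prune   */
    }
    return true;
}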

A further improvement can be made by noting that the set of points at which a Hamiltonian path can end is partially determined by the point at which the path begins. To take advantage of this, we give each point a parity: a point at position (i, j, k) is given parity (i + j + k) mod 2. Note that each time a point is added to the end of the partial path, the parity of the point at the end of the path changes. This change occurs because exactly one of the indices of the new end point differs, by exactly one, from the previous end point. Therefore, if we know the starting point of a path, and we know how long the path is, we can compute what the parity of the endpoint is. So for any partial path, since we know the parity of its starting point, we can easily compute what the parity must be for an endpoint of any Hamiltonian path beginning with that partial path. We have seen that when the search creates a point with only one unvisited neighbor, that point must be the end of any Hamiltonian path. As described earlier, our search code detects when a point with only one unvisited neighbor is created. When we create such a point, we also check to see if its parity is the predicted parity for the endpoint. If not, we stop the search of the current partial path. Path (c) of Figure A shows an example of this check for a 2-D grid. In this figure each point is labeled with its parity. A Hamiltonian path on this grid has an odd number of points, so it must begin and end on points of the same parity, in this case odd. In this example, the point with only one unvisited neighbor must be the final point, but since it has even parity, no Hamiltonian path is possible.
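A minimal sketch of the parity bookkeeping described above; the function names are assumptions made for the illustration, and NPOINTS is the total number of lattice points, as in the previous sketch.

/* Parity of the lattice point at position (i, j, k). */
static int parity(int i, int j, int k)
{
    return (i + j + k) % 2;
}

/* Parity that the final point of any Hamiltonian path must have: each of the
 * npoints - 1 moves flips the endpoint's parity, starting from start_parity. */
static int required_end_parity(int start_parity, int npoints)
{
    return (start_parity + npoints - 1) % 2;
}

When the search creates a forced endpoint, the branch can be abandoned if that point's parity differs from required_end_parity(start_parity, NPOINTS).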


Figure A: Each figure shows a partial path on a 2-D lattice. None of these partial paths can result in a Hamiltonian path: Path (a), because it has a point with no unvisited neighbors; Path (b), because it has two points with only one unvisited neighbor; and Path (c), because a Hamiltonian path would have to end at a point with the wrong parity.

Bibliography

[AAC] Gail Alverson, Robert Alverson, David Callahan, Brian Koblenz, Allan Porterfield, and Burton Smith. Exploiting heterogeneous parallelism on a multithreaded multiprocessor. In Proceedings of the ACM International Conference on Supercomputing, Washington, DC, July.

[ABLL] Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, and Henry M. Levy. Scheduler activations: Effective kernel support for the user-level management of parallelism. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, Pacific Grove, California, October.

[ACP] Thomas E. Anderson, David E. Culler, and David A. Patterson. A case for NOW (networks of workstations). IEEE Micro, February.

[BB] Eric A. Brewer and Robert Blumofe. Strata: A multi-layer communications library. In Proceedings of the MIT Student Workshop on Scalable Computing, July.

[BBB] D. Bailey, E. Barszcz, J. Barton, D. Browning, et al. The NAS parallel benchmarks. Technical Report RNR, NASA Ames Research Center, March.

[BBZ] Monica Beltrametti, Kenneth Bobey, and John R. Zorbas. The control mechanism for the Myrias parallel computer system. Computer Architecture News, September.

[BCK] M. Berry, D. Chen, P. Koss, D. Kuck, et al. The Perfect Club benchmarks: Effective performance evaluation of supercomputers. International Journal of Supercomputer Applications.

[BE] Hans Berliner and Carl Ebeling. Pattern knowledge and search: The SUPREM architecture. Artificial Intelligence, March.

[Bea] D. Beal. Round-by-round. ICCA Journal.

[BFJ] Robert D. Blumofe, Matteo Frigo, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Rob Miller, Keith H. Randall, and Yuli Zhou. Cilk Reference Manual. MIT Laboratory for Computer Science, Technology Square, Cambridge, Massachusetts, June. Available via ftp://theory.lcs.mit.edu/pub/cilk/manual.ps.Z.

[BFJ] Robert D. Blumofe, Matteo Frigo, Christopher F. Joerg, Charles E. Leiserson, and Keith H. Randall. Dag-consistent distributed shared memory. In Proceedings of the International Parallel Processing Symposium, Honolulu, Hawaii, April.

[BH] J. E. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature.

[BJK] Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: An efficient multithreaded runtime system. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Santa Barbara, California, July.

[BL] Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work stealing. In Proceedings of the Annual Symposium on Foundations of Computer Science, Santa Fe, New Mexico, November.

[Ble] Guy E. Blelloch. Programming parallel algorithms. In Proceedings of the Dartmouth Institute for Advanced Graduate Studies (DAGS) Symposium on Parallel Computation, Hanover, New Hampshire, June.

[Ble] Guy E. Blelloch. NESL: A nested data-parallel language. Technical Report CMU-CS, School of Computer Science, Carnegie Mellon University, April.

[Blu] Robert D. Blumofe. Executing Multithreaded Programs Efficiently. PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, September.

[BP] Robert D. Blumofe and David S. Park. Scheduling large-scale parallel computations on networks of workstations. In Proceedings of the Third International Symposium on High Performance Distributed Computing, San Francisco, California, August.

[Bre] Richard P. Brent. The parallel evaluation of general arithmetic expressions. Journal of the ACM, April.

[BS] F. Warren Burton and M. Ronan Sleep. Executing functional programs on a virtual tree of processors. In Proceedings of the Conference on Functional Programming Languages and Computer Architecture, Portsmouth, New Hampshire, October.

[BZS] Brian N. Bershad, Matthew J. Zekauskas, and Wayne A. Sawdon. The Midway distributed shared memory system. In Digest of Papers from the Thirty-Eighth IEEE Computer Society International Conference (Spring COMPCON), San Francisco, California, February.

[CA] David Chaiken and Anant Agarwal. Software-extended coherent shared memory: Performance and cost. In Proceedings of the Annual International Symposium on Computer Architecture, Chicago, Illinois, April.

[CAL] Jeffrey S. Chase, Franz G. Amador, Edward D. Lazowska, Henry M. Levy, and Richard J. Littlefield. The Amber system: Parallel programming on a network of multiprocessors. In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, Litchfield Park, Arizona, December.

[CBZ] John B. Carter, John K. Bennett, and Willy Zwaenepoel. Implementation and performance of Munin. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, Pacific Grove, California, October.

[CD] Eric C. Cooper and Richard P. Draves. C threads. Technical Report CMU-CS, School of Computer Science, Carnegie Mellon University, June.

[CDG] David E. Culler, Andrea Dusseau, Seth Copen Goldstein, Arvind Krishnamurthy, Steven Lumetta, Thorsten von Eicken, and Katherine Yelick. Parallel programming in Split-C. In Supercomputing, Portland, Oregon, November.

[CGH] Rohit Chandra, Anoop Gupta, and John L. Hennessy. COOL: An object-based language for parallel programming. IEEE Computer, August.

[CRRH] Martin C. Carlisle, Anne Rogers, John H. Reppy, and Laurie J. Hendren. Early experiences with Olden. In Proceedings of the Sixth Annual Workshop on Languages and Compilers for Parallel Computing, Portland, Oregon, August.

[CSS] David E. Culler, Anurag Sah, Klaus Erik Schauser, Thorsten von Eicken, and John Wawrzynek. Fine-grain parallelism with minimal hardware support: A compiler-controlled threaded abstract machine. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, Santa Clara, California, April.

[DMBS] J. J. Dongarra, C. B. Moler, J. R. Bunch, and G. W. Stewart. LINPACK Users' Guide. SIAM, Philadelphia.

[DSB] Michel Dubois, Christoph Scheurich, and Faye Briggs. Memory access buffering in multiprocessors. In Proceedings of the Annual International Symposium on Computer Architecture, June.

[EAL] Dawson R. Engler, Gregory R. Andrews, and David K. Lowenthal. Filaments: Efficient support for fine-grain parallelism. Technical Report, The University of Arizona.

[EL] Natalie Engler and David Linthicum. Not just a PC on steroids. Open Computing, April.

[FF] Raphael A. Finkel and John P. Fishburn. Parallelism in alpha-beta search. Artificial Intelligence, September.

[FLA] Vincent W. Freeh, David K. Lowenthal, and Gregory R. Andrews. Distributed Filaments: Efficient fine-grain parallelism on a cluster of workstations. In Proceedings of the First Symposium on Operating Systems Design and Implementation, Monterey, California, November.

[FM] Raphael Finkel and Udi Manber. DIB: A distributed implementation of backtracking. ACM Transactions on Programming Languages and Systems, April.

[FMM] R. Feldmann, P. Mysliwietz, and B. Monien. Game tree search on a massively parallel system. In Advances in Computer Chess.

[FMM] Rainer Feldmann, Peter Mysliwietz, and Burkhard Monien. Studying overheads in massively parallel min/max-tree evaluation. In Proceedings of the Sixth Annual ACM Symposium on Parallel Algorithms and Architectures, Cape May, New Jersey, June.

[GJ] Michael R. Garey and David S. Johnson. Computers and Intractability. W. H. Freeman and Company.

[GLL] Kourosh Gharachorloo, Daniel Lenoski, James Laudon, Phillip Gibbons, Anoop Gupta, and John Hennessy. Memory consistency and event ordering in scalable shared-memory multiprocessors. In Proceedings of the Annual International Symposium on Computer Architecture, Seattle, Washington, June.

[Gra] R. L. Graham. Bounds for certain multiprocessing anomalies. The Bell System Technical Journal, November.

[Gra] R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, March.

[GS] Guang R. Gao and Vivek Sarkar. Location consistency: Stepping beyond the barriers of memory coherence and serializability. Technical Report, McGill University School of Computer Science, Advanced Compilers, Architectures and Parallel Systems (ACAPS) Laboratory, December.

[Gwe] Linley Gwennap. Intel extends Pentium families. Microprocessor Report, March.

[Hal] Robert H. Halstead, Jr. Implementation of Multilisp: Lisp on a multiprocessor. In Conference Record of the ACM Symposium on Lisp and Functional Programming, Austin, Texas, August.

[Hal] Robert H. Halstead, Jr. Multilisp: A language for concurrent symbolic computation. ACM Transactions on Programming Languages and Systems, October.

[HD] E. A. Hauck and B. A. Dent. Burroughs' B6500/B7500 stack mechanism. In Proceedings of the AFIPS Spring Joint Computer Conference.

[HKT] Seema Hiranandani, Ken Kennedy, and Chau-Wen Tseng. Preliminary experiences with the Fortran D compiler. In Supercomputing, Portland, Oregon, November.

[HP] John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, CA.

[HS] W. Hillis and G. Steele. Data parallel algorithms. Communications of the ACM, December.

[HWW] Wilson C. Hsieh, Paul Wang, and William E. Weihl. Computation migration: Enhancing locality for distributed-memory parallel systems. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), San Diego, California, May.

[HZJ] Michael Halbherr, Yuli Zhou, and Chris F. Joerg. MIMD-style parallel programming with continuation-passing threads. In Proceedings of the International Workshop on Massive Parallelism: Hardware, Software, and Applications, Capri, Italy, September. A longer version appeared as MIT Laboratory for Computer Science Computation Structures Group Memo.

[Int] Intel Supercomputer Systems Division, Beaverton, Oregon. Paragon User's Guide, June.

[JD] Edward G. Coffman, Jr. and Peter J. Denning. Operating Systems Theory. Prentice-Hall, Inc., Englewood Cliffs, NJ.

[JK] Chris Joerg and Bradley C. Kuszmaul. Massively parallel chess. In Proceedings of the Third DIMACS Parallel Implementation Challenge, Rutgers University, New Jersey, October. Available as ftp://theory.lcs.mit.edu/pub/cilk/dimacs.ps.Z.

[JKW] Kirk L. Johnson, M. Frans Kaashoek, and Deborah A. Wallach. CRL: High-performance all-software distributed shared memory. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, Copper Mountain Resort, Colorado, December.

[JP] Suresh Jagannathan and Jim Philbin. A customizable substrate for concurrent languages. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, San Francisco, California, June.

[Kal] L. V. Kale. The Chare kernel parallel programming system. In Proceedings of the International Conference on Parallel Processing, Volume II: Software, August.

[KC] Vijay Karamcheti and Andrew Chien. Concert: Efficient runtime support for concurrent object-oriented programming languages on stock hardware. In Supercomputing, Portland, Oregon, November.

[KCDZ] Pete Keleher, Alan L. Cox, Sandhya Dwarkadas, and Willy Zwaenepoel. TreadMarks: Distributed shared memory on standard workstations and operating systems. In USENIX Winter Conference Proceedings, San Francisco, California, January.

[KEW] R. H. Katz, S. J. Eggers, D. A. Wood, C. L. Perkins, and R. G. Sheldon. Implementing a cache consistency protocol. In Proceedings of the Annual International Symposium on Computer Architecture.

[KHM] David A. Kranz, Robert H. Halstead, Jr., and Eric Mohr. Mul-T: A high-performance parallel Lisp. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, Portland, Oregon, June.

[KM] Donald E. Knuth and Ronald W. Moore. An analysis of alpha-beta pruning. Artificial Intelligence, Winter.

[KOH] Jeffrey Kuskin, David Ofelt, Mark Heinrich, John Heinlein, Richard Simoni, Kourosh Gharachorloo, John Chapin, David Nakahira, Joel Baxter, Mark Horowitz, Anoop Gupta, Mendel Rosenblum, and John Hennessy. The Stanford Flash multiprocessor. In Proceedings of the Annual International Symposium on Computer Architecture, Chicago, Illinois, April.

[KR] Richard M. Karp and Vijaya Ramachandran. Parallel algorithms for shared-memory machines. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity. MIT Press, Cambridge, Massachusetts.

[Kus] Bradley C. Kuszmaul. Synchronized MIMD Computing. PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, May. Available as MIT Laboratory for Computer Science Technical Report MIT/LCS/TR or ftp://theory.lcs.mit.edu/pub/bradley/phd.ps.Z.

[KZ] Richard M. Karp and Yanjun Zhang. Randomized parallel algorithms for backtrack search and branch-and-bound computation. Journal of the ACM, July.

[LAB] Pangfeng Liu, William Aiello, and Sandeep Bhatt. An atomic model for message-passing. In Proceedings of the Fifth Annual ACM Symposium on Parallel Algorithms and Architectures, Velen, Germany, June.

[LAD] Charles E. Leiserson, Zahi S. Abuhamdeh, David C. Douglas, Carl R. Feynman, Mahesh N. Ganmukhi, Jeffrey V. Hill, W. Daniel Hillis, Bradley C. Kuszmaul, Margaret A. St. Pierre, David S. Wells, Monica C. Wong, Shaw-Wen Yang, and Robert Zak. The network architecture of the Connection Machine CM-5. In Proceedings of the Fourth Annual ACM Symposium on Parallel Algorithms and Architectures, San Diego, California, June.

[Lam] Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers, September.

[LH] Kai Li and Paul Hudak. Memory coherence in shared virtual memory systems. ACM Transactions on Computer Systems, November.

[LRV] James R. Larus, Brad Richards, and Guhan Viswanathan. LCM: Memory system support for parallel language implementation. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, October.

[Mil] Robert C. Miller. A type-checking preprocessor for Cilk, a multithreaded C language. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, May.

[MKH] Eric Mohr, David A. Kranz, and Robert H. Halstead, Jr. Lazy task creation: A technique for increasing the granularity of parallel programs. IEEE Transactions on Parallel and Distributed Systems, July.

[Mos] Joel Moses. The function of FUNCTION in LISP, or why the FUNARG problem should be called the environment problem. AI Memo, MIT Artificial Intelligence Laboratory, June.

[Mot] Motorola. PowerPC User's Manual.

[MR] Piyush Mehrotra and Jon Van Rosendale. The BLAZE language: A parallel language for scientific programming. Parallel Computing.

[MSA] J. R. McGraw, S. K. Skedzielewski, S. J. Allan, R. R. Oldehoeft, J. Glauert, C. Kirkham, W. Noyce, and R. Thomas. SISAL: Streams and iteration in a single assignment language, reference manual. Technical report, Lawrence Livermore National Laboratories, Livermore, CA, March.

[MWV] Sunil Mirapuri, Michael Woodacre, and Nader Vasseghi. The MIPS R4000 processor. IEEE Micro, April.

[Nik] R. S. Nikhil. ID language reference manual. Computation Structures Group Memo, Massachusetts Institute of Technology, Technology Square, Cambridge, Massachusetts, July.

[Nik] Rishiyur S. Nikhil. A multithreaded implementation of Id using P-RISC graphs. In Proceedings of the Sixth Annual Workshop on Languages and Compilers for Parallel Computing, Lecture Notes in Computer Science, Portland, Oregon, August. Springer-Verlag.

[Nik] Rishiyur S. Nikhil. Cid: A parallel, shared-memory C for distributed-memory machines. In Proceedings of the Seventh Annual Workshop on Languages and Compilers for Parallel Computing, August.

[PC] Gregory M. Papadopoulos and David E. Culler. Monsoon: An explicit token-store architecture. In Proceedings of the Annual International Symposium on Computer Architecture, Seattle, Washington, May. Also MIT Laboratory for Computer Science Computation Structures Group Memo.

[Pea] Judea Pearl. Asymptotic properties of minimax trees and game-searching procedures. Artificial Intelligence, September.

[PJGT] Vijay S. Pande, Christopher F. Joerg, Alexander Yu Grosberg, and Toyoichi Tanaka. Enumerations of the Hamiltonian walks on a cubic sublattice. Journal of Physics A.

[POV] POV-Ray Team. Persistence of Vision Ray Tracer (POV-Ray) User's Documentation.

[PYGT] Vijay Pande, Alexander Yu Grosberg, and Toyoichi Tanaka. Thermodynamic procedure to construct heteropolymers that can be renatured to recognize a given target molecule. Proceedings of the National Academy of Sciences, USA.

[RLW] Steven K. Reinhardt, James R. Larus, and David A. Wood. Tempest and Typhoon: User-level shared memory. In Proceedings of the Annual International Symposium on Computer Architecture, Chicago, Illinois, April.

[RSAU] Larry Rudolph, Miriam Slivkin-Allalouf, and Eli Upfal. A simple load balancing scheme for task allocation in parallel machines. In Proceedings of the Third Annual ACM Symposium on Parallel Algorithms and Architectures, Hilton Head, South Carolina, July.

[RSL] Martin C. Rinard, Daniel J. Scales, and Monica S. Lam. Jade: A high-level, machine-independent language for parallel programming. Computer, June.

[SFL] Ioannis Schoinas, Babak Falsafi, Alvin R. Lebeck, Steven K. Reinhardt, James R. Larus, and David A. Wood. Fine-grain access control for distributed shared memory. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, October.

[SG] E. Shakhnovich and A. Gutin. J. Chem. Phys.

[SKY] S. Sakai, Y. Kodama, and Y. Yamaguchi. Prototype implementation of a highly parallel dataflow machine EM-4. In Proceedings of the International Parallel Processing Symposium, May.

[Smi] Burton J. Smith. A pipelined, shared resource MIMD computer. In Proceedings of the International Conference on Parallel Processing.

[Ste] Per Stenstrom. VLSI support for a cactus stack oriented memory organization. In Proceedings of the Twenty-First Annual Hawaii International Conference on System Sciences, January.

[Str] Volker Strassen. Gaussian elimination is not optimal. Numerische Mathematik.

[Sun] Sun Microsystems, Inc. SPARC Architecture Manual, January.

[Sun] V. S. Sunderam. PVM: A framework for parallel distributed computing. Concurrency: Practice and Experience, December.

[TBK] Andrew S. Tanenbaum, Henri E. Bal, and M. Frans Kaashoek. Programming a distributed system using shared objects. In Proceedings of the Second International Symposium on High Performance Distributed Computing, Spokane, Washington, July.

[Thia] Thinking Machines Corporation, Cambridge, Massachusetts. Getting Started in CM Fortran, November.

[Thib] Thinking Machines Corporation, Cambridge, Massachusetts. Getting Started in Lisp, June.

[Thi] Thinking Machines Corporation, Cambridge, Massachusetts. CM-5 Technical Summary, January.

[Thi] Thinking Machines Corporation, Cambridge, Massachusetts. Getting Started in C, May.

[vECGS] Thorsten von Eicken, David E. Culler, Seth Copen Goldstein, and Klaus Erik Schauser. Active Messages: A mechanism for integrated communication and computation. In Proceedings of the Annual International Symposium on Computer Architecture, Gold Coast, Australia, May.

[VR] Mark T. Vandevoorde and Eric S. Roberts. WorkCrews: An abstraction for controlling parallelism. International Journal of Parallel Programming, August.

[WK] I-Chen Wu and H. T. Kung. Communication complexity for parallel divide-and-conquer. In Proceedings of the Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, October.

Y. Zhang and A. Ortynski. The efficiency of randomized parallel backtrack search. In Proceedings of the IEEE Symposium on Parallel and Distributed Processing, Dallas, Texas, October.

[ZSB] Matthew J. Zekauskas, Wayne A. Sawdon, and Brian N. Bershad. Software write detection for a distributed shared memory. In Proceedings of the First Symposium on Operating Systems Design and Implementation, Monterey, California, November.