The Architecture and Programming
of a Fine-Grain Multicomputer

Thesis by

Jakov N. Seizovic

In Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy

California Institute of Technology
Pasadena, California

Submitted August

Caltech-CS-TR

Copyright
Jakov N. Seizovic
All Rights Reserved

Acknowledgments

Many thanks to all the great teachers I've had. In particular: To Chuck Seitz, my research advisor, for knowing when to be a friend and when a tough guy, for his willingness to be both a student and a teacher, for pointing out the obvious, and for insisting on simplicity. To Jan van de Snepscheut, Mani Chandy, and Eric van de Velde, members of my thesis-defense committee, for teaching me through their example that one must never stop learning. To Milenko Cvetinovic, for giving me all the right books to get excited about. To Ilija Stojanovic, Dejan Zivkovic, and Slobodan Cuk, for believing in me before I did.

Many more thanks to my fellow students. To Nan Boden, for her dedicated support and critique, depending on what I needed on any particular day. To Wen-King Su, for the long hours we spent working together, and for sharing his insights into the wonderful world of programming. To the late Mike Pertel, for teaching me about stamina in times good and bad. To Bill Athas, Don Speck, and Craig Steele, for helping me keep both feet on the ground. To Tony Lee, for being the best officemate there is.

Thanks also go to my research sponsors. To the Advanced Research Projects Agency, whose program managers repeatedly demonstrated a remarkable ability to balance researchers' vanities against the need for interproject cooperation. To IBM, for their orientation towards the future, embodied in part in numerous student-support programs.

To the entire Caltech Computer Science Department staff, and to Arlene DesJardins in particular, for humming busily in the background of my cocoon.

And to my family and friends, who made sure that it wasn't all work and no play.

To Goga

Abstract

The research presented in this thesis was conducted in the context of the Mosaic C, an experimental fine-grain multicomputer. The objective of the Mosaic experiment was to develop a concurrent system with maximum performance per unit cost, while still retaining a general-purpose application span. A stipulation of the Mosaic project was that the complexity of a Mosaic node be limited by the silicon complexity available on a single VLSI chip.

The two most important, original results reported in the thesis are:

1. The design and implementation of C+-, a concurrent, object-oriented programming system.

Syntactically, C+- is an extension of C++. The concurrent semantics of C+- are contained within the process concept. A C+- process is analogous to a C++ object, but it is also an autonomous computing agent and a unit of potential concurrency. Atomic, single-process updates that can be individually enabled and disabled are the execution units of the concurrent computation. The limited set of primitives that C+- provides is shown to be sufficient to express a variety of concurrent-programming problems concisely and efficiently.

An important design requirement for C+- was that efficient implementations should exist on a variety of concurrent architectures, and, in particular, on the simple and inexpensive hardware of the Mosaic node. The Mosaic runtime system was written entirely in C+-.

2. Pipeline synchronization, a novel, generally applicable technique for hardware synchronization.

This technique is a simple, low-cost, high-bandwidth, high-reliability solution to interfaces between synchronous and asynchronous systems, or between synchronous systems operating from different clocks. The technique can sustain the full communication bandwidth and achieve an arbitrarily low, nonzero probability of synchronization failure, P_f, with the price in both latency and chip area being O(log(1/P_f)).

Pipeline synchronization has been successfully applied to the high-performance inter-computer communication in Mosaic node ensembles.

Contents

Introduction
    Concurrency and VLSI
    Concurrent Architectures
    Concurrent Programming
        Shared-Memory Programming
        Explicit Message Passing
        Architecture-Independent Programming
    The Reactive-Process Programming Model
    The Mosaic C Project
    Overview of the Thesis

C+-
    Introduction
        Object-Oriented Programming vs. Concurrency
        Concurrent Object-Oriented Languages
    The Process Concept
    Managing Concurrency
        Remote Procedure Call
        Call Forwarding
        Fork-Join
        Semaphores
        Monitors
        Recursion
        Message Passing
        Single-Assignment Variables
        Process Aggregates
        Summary
    Managing Program Complexity
        Class Inheritance
        Virtual Functions
        Process Layering
        Process Libraries
        Data Exchange
        Putting It All Together

Implementation Issues
    The Runtime-System Framework
        Process Creation
        Runtime Services
        Process Dispatch
        The pointer_t and the entry_t Types
        Process State
        Process Migration
        Invoking Atomic Actions
        Active-Passive
        Remote Procedure Call
    From C+- to C++
        Parsing
        Code Generation
        Code Splitting

The Mosaic C
    Multicomputer Architecture
    The Mosaic Node
        The Mosaic Router
        The Dynamic RAM
        The Processor and the Network Interface
    Software Overhead of Communications

Pipeline Synchronization
    Introduction
        Problem Specification
        Existing Solutions
    Pipeline Synchronization
        The Mutual-Exclusion Element
        Two-Phase-Protocol FIFO
        Pipeline Synchronizer
        Correctness Proof
        Variations on the Theme
    A CMOS Implementation
    Conclusions

Conclusions
    Comparison with Related Work
        Medium-Grain Multicomputers
        Fine-Grain Multicomputers
        Multiprocessors
    Summary

A  Example Products of C+- Compilation

Bibliography

Chapter 1

Introduction

Concurrency and VLSI

Progress in microelectronics technology during the past four decades has been remarkable by any measure. Three major factors contributed to this progress: (1) a rapid and steady pace of improvements in processing technology to produce ever smaller, faster, and lower-power devices; (2) the development of design methodologies and tools to manage design complexity; and (3) the exploitation of concurrency. The first two factors are readily understood. The importance of concurrency to the performance/cost ratio of VLSI systems can be understood from results of VLSI-complexity theory, and has been demonstrated repeatedly in practice.

Special-purpose computing engines were the first to employ concurrent solutions, and continue to do so, highly successfully, to this day. Although various forms of concurrency (instruction-level parallelism, pipelining, vectorization) are exploited regularly in general-purpose computing engines, applying concurrent solutions to general-purpose computing at the application level has been slower in gaining ground.

A considerable effort has been made to exploit the concurrency that is implicit in sequential programs. This effort has been successful in discovering and utilizing modest degrees of concurrency, but is now regarded almost universally as having approached its limits. Applications with explicitly concurrent formulations are the driving force for a range of concurrent architectures, some of which are discussed in the following section.

Concurrent Architectures

Most of today's concurrent computers are representatives of one of the following three architectures:

- Computers with a single instruction stream and multiple data streams (SIMD).
- Two variants of computers with multiple instruction streams and multiple data streams (MIMD):
    - multiprocessors, which have one global address space, and
    - multicomputers, which have multiple local address spaces.

Early concurrent-computer implementations closely followed this classification: SIMD computers employed multiple computing units to which instructions were broadcast; multiprocessors utilized buses and/or switches to connect multiple processors to the global memory; multicomputers featured independent processor-memory pairs interacting through a message-passing network.

The differences between the more recent representatives of these three architectures are blurred. When observed from a point that is sufficiently close to the hardware, or from a point that is sufficiently far away from the hardware, these three architectures are remarkably similar. Each consists of a communication network connecting a collection of computing nodes. Each node consists of one to several instruction-interpreting processors, a local memory, and a network interface. All three architectures support some concept of processes: computing agents that execute concurrently, and that can communicate data and synchronize activities with each other. What were once architectural distinctions became differences in programming style: data-parallel, shared-memory, and message-passing (Chapter 2) programming abstractions. Depending on the emphasis on support for one of these abstractions, additional architectural support is provided: for global synchronization, for efficient non-local memory access, or for low-latency, low-context-switch-overhead message handling.

In this thesis, we shall focus principally on the multicomputer variety of MIMD computers, but shall indicate also how our results apply to multiprocessors. Multiprocessors provide hardware support for a global address space, and implement interprocess communication through shared-memory access, whereas multicomputers provide a generic message-exchange mechanism, and implement shared-memory access through interprocess communication.

Given the similarity between these two architectures, one might expect that, given a typical problem-set load of the computer, it would be easy to test and decide which architecture and which machine should be employed. Yet the reality of concurrent computers and concurrent-programming systems is somber: Programs are most often written in notations tailor-made for a particular computer architecture, sometimes even for a particular machine. The cost-effectiveness of program execution on concurrent machines is their main advantage over their sequential counterparts. Striving to maximize this cost-effectiveness, however, emphasizes the concurrent computers' main disadvantage: the complexity of writing programs.

Concurrent Programming

There are two, typically conflicting, driving forces shaping the developments in concurrent programming: increasing efficiency and increasing expressivity.

The efficiency-conscious programming systems are typically the products of design teams also involved with the design of concurrent machines, and often reflect the underlying architecture. Shared-memory programming and explicit message-passing programming are representatives of this class.

The expressivity-conscious programming systems are often produced by the frustrated users of the products of the former groups, and are typically architecture-independent.

Shared-Memory Programming

The first developments in concurrent programming were motivated by the advent of multiprogramming and multiuser operating systems. It should not, therefore, be surprising that the first concurrent-programming systems supported concurrent processes that communicated and synchronized through the memory of the machine on which they were executing. The development of the Parallel RAM (PRAM) model, a theoretical framework on which much of the work in concurrent algorithms is based, also promoted the popularity of this programming style, which is still the predominant form of concurrent programming.

From the early stages on, shared-memory programming has been plagued by various incarnations of the mutual-exclusion problem. This problem is due primarily to the discrepancy in access granularity between the data structures and the memory units used to represent these data structures. A number of remedies were introduced: atomic test-and-set and/or fetch-and-add instructions, and semaphores. One of the most significant efforts was the work of Per Brinch Hansen on Concurrent Pascal, and the development of monitors. Monitors encapsulate data with the mutually exclusive operations defined on the data, in well-defined and runtime-system-managed units. This work forms a foundation on which many of the recent developments in object-oriented concurrent programming are based, including the programming system described in this thesis.

Explicit Message Passing

Communication and synchronization through explicit message passing is a programming paradigm whose roots are as old as computers themselves, stemming from the need for inter-computer information exchange. This programming paradigm was adopted and adapted for programming multicomputers.

Starting with the Cosmic Cube and its commercial descendents, the mainstream representatives of the multicomputer architecture employ off-the-shelf processor, memory, and compiler technology. Programming systems for these machines are based on a variety of sequential programming languages for specifying individual process behavior, wherein communication and synchronization between processes is achieved through a set of routines.

There are two problems that are the curse of this programming style. First, although modular organization of data structures can be achieved within a process, this modularity does not extend readily to collections of processes. Second, the off-the-shelf technology often brought the off-the-shelf notion of process granularity: heavyweight processes impose an unacceptably high software overhead on process communication and synchronization.

Architecture-Independent Programming

A number of programming models and notations have been devised to provide a uniform view of concurrent computers to the programmer, and to map computations onto either of the architectures described above. The advantages that these programming systems offer in reducing programming effort are remarkable; preserving the cost-effectiveness of concurrent computers running such programs, however, has yet to be demonstrated. The assembly programming of conventional, sequential computers has been all but eliminated by higher-level notations, through large improvements in program-writing efficiency with small degradations of program-execution efficiency. The same has yet to happen to tailor-made concurrent-programming notations.

Functional Programming and Dataflow

In its pure form, functional programming provides a method for defining functions in terms of other, more-primitive functions. The value of a function is determined only by the value of its arguments, and is not history-sensitive. Since there are no side effects, functional-programming notations are implicitly concurrent, and subexpressions, including function arguments, can be evaluated independently of each other.

The introduction of side effects into functional-programming notations enables them to model history-sensitive behavior, but it also opens them up to the full set of problems associated with imperative-programming notations. Extending pure functional programming with single-assignment variables and streams, as introduced by dataflow researchers, represents an important intermediate point. This extension relaxes the no-side-effects requirement into the monotonicity requirement: A variable starts up uninitialized, and an assignment binds it to a value; multiple assignments are disallowed. A stream consists of a possibly infinite sequence of variables that can only be read and appended. Single-assignment variables are also used extensively for communication and synchronization in compositional programming and in concurrent logic programming, described next.

Concurrent Logic Programming

The programming model typically associated with sequential logic programming is that of proving an existentially quantified statement, given a program that consists of a set of axioms. Implementations of this model involve backtracking, which could, in principle, be replaced by concurrent examination of all the alternatives. However, for efficiency reasons, and because of the need to better model input/output behavior, concurrent logic programming makes a significant departure from this model: There is no backtracking; once a non-deterministic choice is made, no alternatives are examined.

A concurrent logic program consists of a set of guarded clauses, and each clause represents a recursive specification of process structures. To program in a concurrent-logic-programming notation is to specify tasks as unordered, concurrent sets of subtasks. Tasks communicate and synchronize with each other by binding single-assignment variables, and by waiting for variables to become bound.

Restrictions on the expressivity of clause guards to improve efficiency lead to a family of flat concurrent-logic notations. A minimalist approach to concurrent logic programming by Ian Foster and Stephen Taylor resulted in Strand, a streamlined and efficient concurrent-programming system, without giving up much of the expressive power.

UNITY

UNITY, developed by K. Mani Chandy and Jayadev Misra, is a computation model and a programming notation with an associated proof methodology. A UNITY program consists of a set of guarded, multiple assignments. These assignments are executed in arbitrary order. The focus of programming in UNITY is on what (i.e., on data transformations), as opposed to when. A particular execution order can be enforced only through data dependencies. A computation terminates when it reaches a fixed point, i.e., when no assignment in the program modifies any variables.

Interesting related research has been reported by Craig S. Steele. In this work, a programming model and a corresponding notation are developed in which program actions are associated with data objects through a programmer-specified triggering mechanism. An efficient multicomputer implementation of this UNITY-like programming system is demonstrated.

Actors

The Actors model of computation was first proposed by Carl Hewitt and Henry Baker, and was later formalized by William D. Clinger and Gul Agha. In this model, the unit of concurrent computation is an actor: an independent computing agent that is activated in response to messages sent to it. Each actor has a unique address, an associated message queue, and a specified behavior. In response to a message, an actor can send messages, create new actors, and become a new actor by specifying its replacement behavior.

Because of its simplicity, potential efficiency, and straightforward implementation on distributed architectures, the Actors model is the basis for numerous concurrent-programming systems. The reactive-process programming model described next, and its associated notation described in Chapter 2, are based in part on the Actors model of computation.

The Reactive-Process Programming Model

The reactive-process programming model is a variant of the Actors programming model. Computation in this model is performed by a set of processes: independent computing agents. A process is normally at rest, and starts executing in response to a message, including the initial, creation message. In the course of its execution, a process can send messages, create new processes, and modify its state, including self-termination. Message order is preserved for each pair of processes in direct communication. Each message is marked with a tag that specifies which of the process's compile-time-fixed set of entry points should be invoked. Each entry point runs to completion, and is therefore an atomic update of its process's state. A process can affect the order of execution of its entry points by enabling and disabling them selectively at run time; all entry points are initially enabled. A message tagged for a disabled entry point is delivered when and if that entry point is active again.

This model is extended to include the remote procedure call (RPC). An entry point of a process can be specified to return a value to the message sender. When a message is sent and tagged for such an entry point, the sender is suspended until the message with the returned value arrives.
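As a preview, a reactive process might be sketched as follows in the C+- notation introduced in Chapter 2. This is a hedged illustration, not taken from the thesis text; the names cell, put, and get are illustrative.

    processdef cell {
    private:
        int data;
    public:
        atomic cell()          { passive get; }           // after creation, the process is
                                                          // at rest, with get disabled
        atomic void put(int i) { data = i; active get; }  // entry point invoked by a tagged
                                                          // message; re-enables get
        atomic int get()       { return data; }           // RPC-style entry point: the sender
                                                          // is suspended until the value returns
    };

Each invocation of put or get is queued and serviced as an atomic, run-to-completion update of the process's state.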

Background

The reactive-process programming model is a result of the work in our research group over the last decade. Interestingly, a comparison with the early work of C. R. Lang on a concurrent version of Simula reveals that our group's ideas seem to have come almost full circle. The ideas of C. R. Lang, and the preceding work on which they built, were far-sighted and out of sync with the multicomputer technology of their time. In retrospect, it is as if much of what our research group has been doing was tracking and driving the necessary communication, processor, memory, and compiler technology to approach this target.

Starting with the development of the Cosmic Cube, our group embraced the explicit message-passing programming style. The design of an experimental fine-grain multicomputer, the Mosaic C, and the similarity of our approach to the Actors model of computation provided additional motivation; this effort culminated with the work of W. J. Dally on Concurrent Smalltalk, of W. C. Athas and N. J. Boden on Cantor, a minimalist, Actor-based notation, and of W.-K. Su on Reactive-C and distributed event-driven simulation. The work on the Cosmic Environment and the Reactive Kernel shifted our focus from organizing computations around processes to organizing computations around messages, and reactivity became an essential part of the programming model.

The Mosaic C Project

The work described in this thesis attempts to make a contribution to the unstately condition of concurrent programming today. Our work is based on the following principles:

- Concurrency must come cheap: Concurrent machines must be extensible in small and inexpensive chunks.
- Concurrency must come cheap: Only those high-level programming constructs for concurrency that can be implemented efficiently by the small and inexpensive hardware can be used.
- Concurrency must come cheap: Expressing concurrency in programs must be simple.

Starting from these simple and restrictive requirements, our research group has been conducting a computer-architecture experiment: to design a concurrent-computing system with as high a performance/cost ratio as possible, while still retaining the (not so well-defined) general-purpose application span. In the course of this experiment, we have built a testbed consisting of a fine-grain multicomputer, the Mosaic C; a concurrent-programming notation, C+-; and a distributed runtime system, MADRE.

What computer architecture is all about is bridging the gap between a programming model and a technology. Designers of all of today's computers divide this complex spanning task into a set of smaller, more manageable subtasks. A particular choice of subtasks is illustrated in the figure below. Well-defined anchor points simplify the implementation of the subtasks; however, if those anchor points are too numerous and/or too rigid, the design space may become severely restricted.

Figure: Computer architecture. The spanning task from applications down to the physical system: applications; programming model; programming abstractions; programming notation; programs; compiler; code; runtime system; tasks; machine; computation and communication; technology; physical system.

For the Mosaic experiment, the fairly rigid anchor points have been:

- the scalable CMOS VLSI technology, because that was the best technology that we both had access to and understood well, and
- the reactive-process programming model, because we believed we could implement it efficiently.

For each of the remaining, intermediate points, we have been able to identify opportunities for what appeared to be significant improvements over existing concurrent-computing systems. In particular:

- The programming notation was to be a derivative of a widely accepted object-oriented programming notation, trying to leverage off of the advances that object-oriented programming brought to sequential-programming systems.
- The compiler technology was mature enough that we believed a compiler could perform much of the checking traditionally done at run time, if at all.
- The runtime system was to be fully distributed, and to utilize the distributed queue described in prior work.
- The machine was to benefit from the results in router technology, dRAM technology, synchronization methods (Chapter 5), packaging technology, and self-testing possibilities.

Our computer-architecture experiment culminated in designing and building the Mosaic concurrent-computing system. Parts of the Mosaic system are described in companion publications and in this thesis.

In as closely coupled a research team as ours, it is difficult to properly grant every bit of credit, but, to the first order, the following is the list of contributions to the Mosaic project. Most of the work has been done by five people: Nanette J. Boden, Charles L. Seitz, Don Speck, Wen-King Su, and the author of this thesis. Nan has been involved principally with the runtime system, but was also invaluable to the development of the programming model and of the notation, and was the most understanding and thorough tester of the compiler and of the machine. Chuck did most of the overall machine architecture, the processor architecture, the system integration and packaging, and the router design. Chuck also had important contributions to every other aspect of the project, and the commitment to see it all the way through. Don designed the dRAM and ROM, and taught all of us the value of patience and thoroughness. Wen-King designed most of the processor and router and the workstation interfaces, and worked on all aspects of verification and packaging. Wen was also a principal contributor to the programming model, wrote a low-level (but works-reliably-the-very-first-time) programming system, and retargeted the GNU C and C++ compiler to the Mosaic. The author of the thesis feels he deserves credit only for saying, "Why don't we get this thing finished?" and for then going off to fill in the missing links: routing, network interface, chip-level system integration and verification, the programming notation, and the compiler. Numerous other people have been associated with the Mosaic project, most notably William C. Athas, who contributed to the programming model and to the processor architecture, and Michael J. Pertel, who contributed to the choice of the routing network and to its performance evaluation.

Overview of the Thesis

In Chapter 2, we introduce C+-, a concurrent, object-oriented notation based on C++. Chapter 3 defines the C+- runtime-system interface, illustrates how C+- can be customized, and explains the C+- compilation process. In Chapter 4, a brief description of the architecture of the Mosaic multicomputer is presented, with emphasis on architectural support for C+-. Chapter 5 presents a novel, generally applicable synchronization technique, along with a proof of its correctness and its application to the Mosaic. In Chapter 6, we compare the results of our work with the related research, and suggest possible future research directions.

Chapter 2

C+-

Introduction

Object-Oriented Programming vs. Concurrency

Programming notations that support object-oriented programming techniques are the notations of choice for a rapidly growing number of complex applications. Indeed, not since the introduction of structured programming has there been such a degree of unanimity in the programming community. This unanimity is even more remarkable considering that, just as was the case with structured programming, the power of object-oriented techniques is difficult to convey to readers through short example programs in books or articles. When observed in isolation, none of these techniques is new or revolutionary. It is only when one approaches a large-scale programming task armed with the full set of techniques that their power becomes evident.

Structured-programming techniques advocate structuring of program control flow in a top-down, compositional fashion. Object-oriented programming techniques promote data organization in a bottom-up, standard-parts fashion. Both paradigms emphasize modularity, but whereas the former focuses principally on modularity of control structures, the latter does a better job of encapsulating data structures with the operations defined on these structures.

Object-oriented programming came about through attempts to make large sequential programs more manageable. Techniques such as data encapsulation and access protection, inheritance, and guaranteed initialization all emerge from the goal of helping programmers help themselves.

In our view, much of what the techniques of object-oriented programming are really helping to manage is concurrency. Events are concurrent if they are unordered, i.e., if they can occur in any order, or in parallel. Mutual exclusion is an example of an issue most often associated with concurrent programming, but the problems that result from a disregard for mutual exclusion also occur regularly in large sequential programs: With uncontrolled access to global variables, it is impossible to keep track of all of the places in the code where a certain variable is accessed, and of all the invocations of such code. Nondeterministic execution is another issue most often associated with concurrent programming. For a fixed set of inputs, the execution of a sequential program will always result in the same ordering of state changes; yet, with side effects on global variables, it is often far from obvious what all the inputs to a program are.

Whereas sequential programming brings out the worst in us only in the large, concurrent programming will do that already in the small. It should not be surprising, then, that, in the hope of reaping some of the benefits that object-oriented techniques brought to sequential programming, we are witnessing a proliferation of programming systems trying to amend a particular object-oriented notation with concurrent semantics.

Concurrent Object-Oriented Languages

Figure: Design trade-offs for concurrent-programming systems (a three-way tension among safety, expressivity, and efficiency).

The three-way design trade-offs illustrated in the figure above are typical of the design of any programming system, not only those attempting to harness concurrency. However, all three requirements are more pronounced, and the balance more difficult to achieve, for a concurrent-programming system:

- Efficiency: One of the major reasons to employ concurrent solutions in the first place is to get more performance, and programming-system overheads are less likely to be tolerated by users.
- Expressivity: Moving from a single to many threads of control, and the requirement that threads can communicate and synchronize their activities, place additional demands on expressivity.
- Safety: In addition to mutual exclusion and possible nondeterminism, mentioned in the previous section, issues such as deadlock and livelock have to be dealt with. Simple semantics that aid correctness proofs are essential.

It is likely that some readers will find what we consider a balanced design to be biased in favor of efficiency, then expressivity, and then safety. Our argument about the increased importance of efficiency in a concurrent-programming environment is sometimes disputed on the grounds that, because concurrent systems offer better performance/cost than their sequential counterparts, one can afford more inefficiencies at the operating/runtime-system level. The consequence of this view on concurrent architectures is that machines with pathetic process-creation and communication overheads are being designed and built. The major goals of the work described in this thesis are to show that this pitfall can be avoided, and to demonstrate that fine-grain concurrency can be efficiently exploited.

Extensions of C++

C++ is an object-oriented notation that is in widespread use, due to its efficiency, availability, and upward compatibility with C. C++ is the starting point for numerous programming systems that attempt to amend C++ with concurrent semantics, including the system described in this thesis.

A comparison of our work to related concurrent-programming systems can be found in Chapter 6.

C+-

C+- is the result of an experiment to express reactive-process concurrent programs (introduced in Chapter 1) in an object-oriented programming notation. Although C+- is an extension of C++, the objective of the C+- project has not been to be able to execute arbitrary C++ programs efficiently on the Mosaic. The emphasis of C+- is on providing efficient support for the simple abstractions fundamental to the reactive-process computational model: process creation and communication. C+- strives not to impose higher-level policies on synchronization, communication protocols, or process placement. (Footnote: C+- is not a superset of C++, because it imposes restrictions on global variables, as discussed in the section on managing concurrency.)

Although the C+- programming system is portable across a wide range of architectures, the Mosaic has been both the driving force and the reality test behind this effort. Design decisions have consistently been made to avoid compromising the performance of C+- programs on the Mosaic. Higher-level programming systems may be layered on top of C+-, but C+- is intended to serve as the Mosaic's lowest-level, workhorse programming system, suitable both for operating-system and application programming.

The remaining sections of this chapter are devoted to teaching the reader about C+-. Familiarity with the basic concepts of object-oriented programming, and of C++ in particular, is assumed: classes, inheritance, access rules, operator overloading. Keywords are underlined in the programming examples. Although an effort has been made to steer clear of the idiosyncrasies of C++, some of them were essential, and they are explained as they are encountered. The reader is cautioned, however, that C+- is by no measure a minimalist, toy-example-writing notation; some of the more advanced examples are likely to present difficulties to those not familiar with C++. Our hope is that this difficulty is the result of C+-'s completeness, rather than of poor design choices.

The Process Concept

The C++ object concept is carried over intact to C+-: A class is a user-defined type; an object created according to a class definition is a collection of data items, a set of operations defined on them, and a set of access rules (see the class definition below). Class member functions have the usual sequential semantics.

    class C {
    private:
        int data;
    public:
        C()               { data = 0; }     // initialization
        void write(int i) { data = i; }     // update
        int read()        { return data; }  // retrieve
    };

Program: A Class Definition

The process concept is the only extension that C+- introduces to C++. The processdef keyword parallels the class keyword syntactically (see the process definition below). Access rules are associated with data members and functions of a process definition, and process definitions can be derived from other process definitions (see the section on managing program complexity). However, a process created according to a process definition is more than a collection of data items.

Specification 1: A process is an independent computing agent and a unit of potential concurrency. Its public interface consists of a set of atomic actions.

At creation time, the process constructor is executed, if it is defined. (Footnote: A process constructor is an atomic action with the same name as that of the process definition. The constructor may not return any value.) After the constructor completes, the process is at rest. The invocation of an atomic action of a C+- process is decoupled from its execution: Conceptually, there is an infinite queue of incoming requests for each process; the invocation of an atomic action places a request into this queue. Process execution consists of servicing these requests, with each atomic action running to completion.

    processdef P {
    private:
        int data;
    public:
        atomic P()               { data = 0; }     // initialization
        atomic void write(int i) { data = i; }     // update
        atomic int read()        { return data; }  // retrieve
    };

Program: A Process Definition

Creating a process is no different from creating an object (see the program below). In most cases, processes are created dynamically (pp = new P), and persist until they are explicitly destroyed (delete pp). One can also create a temporary process as a local variable, just as with any other type (P p). This temporary process is destroyed implicitly when execution leaves its scope.

    int i;              // declaring an integer
    P *pp;              // declaring a process pointer

    pp = new P;         // creating a persistent process
    i = pp->read();     // retrieving a value
    pp->write(i);       // updating
    delete pp;          // explicitly destroying the persistent process

    {
        P p;            // declaring a temporary process
        i = p.read();   // retrieving a value
        p.write(i);     // updating
    }                   // implicitly destroying the temporary process

Program: Programming with Processes

A C+- computation is initiated by a runtime system that, concurrently with the initialization of global processes, creates an instance of root (shown below), the constructor of which is defined by the user.

    processdef root {
    public:
        atomic root(int argc, char *argv[]);
    };

Program: The root Process

Specification 2: A process can affect the order of execution of its atomic actions by enabling and disabling them selectively at run time. All atomic actions are initially enabled; execution of a disabled action is postponed until the action is enabled again.

For example, let us assume that the rules for accessing a process of type P are such that it may be updated only once; every subsequent write request should be tagged as an error. Furthermore, all read requests occurring before the first write should be serviced only after the first update occurs. The process definition for this version of P is listed in the program "Enabling and Disabling Atomic Actions" below.

Processes communicate and synchronize with each other through atomic actions. Thus far, we have discussed only the behavior of processes as servers: how they deal with incoming requests (invocations of their atomic actions). We shall now define the behavior of processes as clients: how they request services from other processes.

Specification 3: When invoking an atomic action that does not return a value (returns a void), or if the returned value is not used, the caller continues execution independently of the callee. The order of invocations is preserved for each pair of processes in direct communication. If the value returned by an atomic action is used, the caller may be suspended until the returned value is available.

Invoking an atomic action that returns a value does not, in itself, imply that the requesting process will be suspended until the requested value is available. It is only when this value is used that a thread of activity must be suspended. For example, the program "Divide and Conquer" below uses a divide-and-conquer approach to compute the nth Fibonacci number. Both subcomputations are initiated, and the process will suspend only if it attempts to add the two partial results before they are available.

It is sometimes desirable to enforce a sequential order of execution of subcomputations. In such cases, the C+- await construct should be used. For example, in the divide-and-conquer process below, return await(f1.compute(n-1)) + f2.compute(n-2); ensures that the first subcomputation is complete before the second one is initiated.

    processdef P {
    private:
        int initialized;
        int data;
    public:
        atomic P();
        atomic void write(int);
        atomic int read();
    };

    atomic P::P() {
        initialized = 0;
        passive read;
    }

    atomic void P::write(int i) {
        if (initialized) {
            report_error();
        } else {
            data = i;
            initialized = 1;
            active read;
        }
    }

    atomic int P::read() {
        return data;
    }

Program: Enabling and Disabling Atomic Actions

    processdef fib {
    public:
        atomic int compute(int n) {
            switch (n) {
              case 0:  return 0;
              case 1:  return 1;
              default: {
                  fib f1, f2;
                  return f1.compute(n-1) + f2.compute(n-2);
              }
            }
        }
    };

Program: Divide and Conquer

Programming systems differ considerably in what constitutes use of unresolved variables, also called futures. The most aggressive systems allow futures to be exchanged between processes, and suspend a thread only when a value is needed for a hardware-implemented expression evaluation. Support for futures is the central issue for numerous concurrent-programming systems; C+- is not one of these systems, and is not very aggressive in trying to discover and utilize this type of concurrency. In C+-, assigning an unresolved value to any programmer-defined variable constitutes use of that future, and will cause the thread to be suspended. C+- guarantees only that a thread will not be suspended unnecessarily within an expression evaluation. C+- semantics allow any additional compiler/runtime-system optimization, but only within the body of a function or an atomic action. Unresolved variables must be resolved before they can be passed as arguments.

The reason for C+-'s non-aggressive utilization of futures is that we want to encourage a programming style in which concurrent behavior is generated explicitly, as opposed to trying to utilize the concurrency that is implicit in a sequential formulation. Synchronization on an unresolved future is inherently more expensive than, for example, synchronization using the active/passive semantics, because the process state that must be saved when blocking on a future is much larger. For notations that have stack-based implementations of the regular function-call abstraction, such as C+-, this state includes the stack.
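The following hedged sketch, not taken from the thesis, makes these rules concrete; probe is an illustrative name, and server and request are borrowed from the remote-procedure-call example later in this chapter.

    processdef probe {
    public:
        atomic void run(server *s) {
            int a = s->request(1);      // assigning an unresolved value is a use:
                                        // the thread is suspended here until the
                                        // reply arrives
            int b = s->request(2) + s->request(3);
                                        // both requests are issued before the
                                        // addition; the thread is suspended only
                                        // when the sum itself is needed
        }
    };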

Managing Concurrency

All concurrency-related issues in the C+- programming system are encapsulated into the process concept. The following syntactic restrictions enforce this requirement:

- Only atomic actions can be public members of a process definition. (Footnote: C++ static member functions can also be public members of a process definition, since their semantics do not allow them to access process members anyway.)
- Only values, process pointers, and process references can be arguments to atomic actions. (Footnote: The difference between pointers and references is a subtle idiosyncrasy of C++; for the purposes of this thesis, the two can be considered equivalent.)
- Processes are the only global variables allowed. (Footnote: This includes both global and static C++ variables, i.e., all variables with file scope.)
- Process definitions can have no friends. (Footnote: The friend construct in C++ allows non-member functions to have full access to private class members.)

As specified in the previous section, a process is a unit of potential concurrency. Processes communicate and synchronize with each other through atomic actions. The remainder of this section will be devoted to examples illustrating how some of the well-known concurrent-programming paradigms can be implemented in terms of C+- processes.

Remote Procedure Call

The remote procedure call (RPC) is a common form of interaction between threads of activity. As illustrated in the program and figure below, a client requests a service from a server, and suspends its execution until the request has been attended to. The semantics of the RPC are identical to those of an ordinary procedure call. The implementations of the two types of procedure calls, however, are typically different, because the client and the server may be operating in different address spaces. A better name for the RPC might be "interprocess procedure call."

    processdef server {
    public:
        atomic int request(int);
    };

    processdef client {
    public:
        atomic client(server *s) {
            int i = s->request(0);
        }
    };

Program: Remote Procedure Call

During a remote procedure call, the calling process is nominally suspended until the returned value is available, so no concurrency is introduced. However, as discussed in the previous section, with the use of futures, the semantics of the RPC can be extended so that several requests can be issued concurrently, and the calling process is suspended until all the requests have been serviced (as in the divide-and-conquer program and the figure below).

Figure: Remote Procedure Call (a time/place diagram: the client is suspended while the server services its request).

Figure: Divide and Conquer (a time/place diagram of fib processes issuing concurrent requests).

Call Forwarding

Call forwarding is a paradigm associated with message-based, object-oriented programming systems, and is similar to tail recursion. As an example, consider the sequential search of a singly-linked list of dictionary processes in the program below. When the value returned from an atomic action is itself obtained by an atomic-action invocation, the programmer may choose to use the forward statement instead. With the return statement, a request is issued, the process is suspended until the value is available, and then a reply is sent to the calling process. The effect of call forwarding is to defer the servicing of the request to another process. Two sequential-search examples, one using the return and the other the forward statement, are illustrated in parts (a) and (b), respectively, of the figure below. In addition to reducing the number of replies, call forwarding enables the list of processes that form a dictionary to process multiple requests in a pipelined fashion: At any point in time, each search request is being worked on by at most one dictionary process.

    processdef dict {
    private:
        dict *next;
        int index;
        int data;
    public:
        atomic int find(int i) {
            if (i == index)
                return data;
            else
                return next->find(i);   // can be replaced by:
                                        //   forward next->find(i);
        }
    };

Program: A Sequential Search

Figure: A Sequential Search with RPC (a), and with Call Forwarding (b) (time/place diagrams of client requests traversing a chain of dict processes).

Fork-Join

The remote-procedure-call mechanism, with the limited support for futures provided by C+-, offers a convenient and easy-to-understand programming paradigm for an important class of problems. A more flexible, fork-join mechanism for process synchronization in C+- is offered through the combination of non-suspending atomic-action invocation and active/passive semantics.

There are two paradigms that C+- programmers can use to generate concurrent activities:

- Creating new processes, whether persistent or temporary. The parent process continues execution independently of the child. (Footnote: When a pointer to a newly created process is used in a subsequent computation, this may or may not require suspending the parent, depending on the implementation. However, the parent continues execution concurrently with the child's constructor.)
- Upon invoking an atomic action that does not return a value, or when the returned value is not used, the caller continues executing without waiting for the callee.

The synchronization barriers can be expressed using active/passive semantics, as in the sketch below.
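For example, an n-way join, a barrier on which one client waits for n workers, might be sketched as follows. This is assumed code, not taken from the thesis; join, done, and wait are illustrative names.

    processdef join {
    private:
        int remaining;                      // workers that have not yet checked in
    public:
        atomic join(int n)  { remaining = n; passive wait; }
        atomic void done()  { if (--remaining == 0) active wait; }
        atomic int  wait()  { return 0; }   // serviced only after the last done(); a
                                            // caller that uses the returned value is
                                            // suspended until then
    };

A client forks the n workers, hands each of them a pointer to the join process, and then uses the value of wait() to suspend until all of them have invoked done().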

Suppose that an FFT computation is implemented as illustrated in the figure below. The expressions along the edges of the graph are coefficients. Multiple inputs to a node imply addition, and multiple outputs imply replication of the result.

Figure: An 8-Point FFT Computation (the W_N expressions along the edges of the graph are the coefficients).

A concurrent program for N-point FFT computation could employ N processes, and compute the result in O(log N) steps. Each step would consist of getting two requests along the input edges, adding the two input values, multiplying by the coefficient, and producing two output values.

A version of this program could similarly employ N log N processes in a pipeline regime, achieving the same O(log N) latency, but a new result would be computed on every step.

In either approach, though, a process (circled in the figure) must get one data item along each of its input edges to be able to compute and emit one data item along each of its output edges. A process that might be used as part of the FFT-computation pipeline is listed below.

    processdef fft {
    private:
        Complex W, first;
        fft *outup, *outdn;

        void output(Complex in) {
            Complex result = (first + in) * W;
            outup->up(result);
            outdn->dn(result);
        }
    public:
        atomic fft(fft *u, fft *d, Complex r) {
            W = r;
            outup = u;
            outdn = d;
        }
        atomic void up(Complex in) {
            if (passive(dn)) {      // upon receiving both requests,
                                    //   produce the output
                active dn;
                output(in);
            } else {                // if you only have one request,
                                    //   await the second one
                passive up;
                first = in;
            }
        }
        atomic void dn(Complex in) {
            if (passive(up)) {      // upon receiving both requests,
                                    //   produce the output
                active up;
                output(in);
            } else {                // if you only have one request,
                                    //   await the second one
                passive dn;
                first = in;
            }
        }
    };

Program: An FFT-Computing Process

Semaphores

First introduced by E. W. Dijkstra, semaphores are low-level primitives for process synchronization. A semaphore is typically used to control access to a shared data structure, with an N-ary semaphore allowing access to at most N processes at any point in time. Two operations are defined on semaphores: acquire and release. In general, an implementation of an N-ary semaphore must guarantee that the number of acquire operations minus the number of release operations is at most N and at least 0. A C+- implementation of an N-ary semaphore is presented below.

    processdef semaphore {
    private:
        int count;                      // number of processes inside
                                        //   the critical section
        int max;                        // the maximum number allowed
    public:
        atomic semaphore(int N) {       // initially, there are no
                                        //   processes inside the critical
            max = N;                    //   section
            count = 0;
            passive release;
        }
        atomic int acquire() {
            count++;                    // one more inside
            active release;             // at least one can release
            if (count == max)           // if the maximum is reached,
                passive acquire;        //   no one can get in
            return 0;
        }
        atomic int release() {
            count--;                    // one less inside
            active acquire;             // at least one can acquire
            if (count == 0)             // no one is in, so
                passive release;        //   no one can exit
            return 0;
        }
    };

Program: N-ary Semaphore

An often-used special case, for N = 1, the binary semaphore, is illustrated below.

    processdef semaphore {
    public:
        atomic semaphore() {
            passive release;
        }
        atomic int acquire() {
            active release;
            passive acquire;
            return 0;
        }
        atomic int release() {
            active acquire;
            passive release;
            return 0;
        }
    };

Program: Binary Semaphore
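A hedged usage sketch, not taken from the thesis (worker is an illustrative name), shows how a client process would use either semaphore:

    processdef worker {
    public:
        atomic void run(semaphore *s) {
            int granted = s->acquire();     // assigning the returned value suspends the
                                            //   caller until the acquire request is serviced
            // ... critical section ...
            s->release();                   // the returned value is not used, so the
                                            //   caller continues without waiting
        }
    };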

Monitors

Of all of the concurrent-programming paradigms, the semantics of C+- processes are closest to those of monitors. Just as with monitors, C+- processes encapsulate a set of data items, and offer mutually exclusive access to a set of routines operating on this data. C+- processes also share some of the problems associated with monitors, as both are non-reentrant. The invocation of an atomic action of a C+- process is, unlike an invocation of a monitor function, decoupled from its execution: Conceptually, there is an infinite buffer of incoming requests for each process. This decoupling enables processes to be active computing agents, able to affect the order of execution of their atomic actions.

Recursion

In the examples shown so far, the requirement that all the public member functions of a process be atomic actions has been helpful in expressing interactions between concurrent threads of activity. From the point of view of C+- programmers, the most significant repercussion of the atomicity of interprocess activities is that, since at most one execution thread can be associated with a process, atomic actions that return values are not reentrant. For example, in the program below, the private member function fac has ordinary, sequential, reentrant semantics. However, the public member function FAC must be an atomic action. An invocation of FAC will therefore result in deadlock.

    processdef bad {
    private:
        int fac(int n) {
            if (n == 0)
                return 1;
            else
                return n * fac(n-1);    // OK: functions are reentrant
        }
    public:
        atomic int FAC(int n) {
            if (n == 0)
                return 1;
            else
                return n * FAC(n-1);    // ERROR: atomic actions are
                                        //   not reentrant
        }
        atomic int Fac(int n) {
            return fac(n);              // OK: atomic-action interface
                                        //   to a function
        }
    };

Program: Recursive Functions and Non-Recursive Atomic Actions

In the world of non-reentrant atomic actions, processes are the medium used to express recursive behavior (see the program "Recursive Processes" below).

Message Passing

Invoking an atomic action of a process is equivalent to wrapping up the argument list and sending it in a message. According to Specification 3, the atomic-action invocation does not imply blocking (waiting for the reply does), so it is equivalent to a non-blocking message send.

Message receiving has two forms:

- explicit, associated with the behavior of processes as clients, which receive a value that is returned from a call to an atomic action, and
- implicit, associated with the behavior of processes as servers, which receive an argument list as part of a request to execute an atomic action.

    processdef fac {
    private:
        int output;
    public:
        atomic fac(int input) {
            if (input == 0)
                output = 1;
            else {
                fac child(input-1);
                output = input * child.result();
            }
        }
        atomic int result() {
            return output;
        }
    };

or:

    processdef fac {
    private:
        int input;
        fac *parent;
    public:
        atomic fac(int i, fac *p) {
            if (i == 0) {
                p->result(1);
                delete this;
            } else {
                input = i;
                parent = p;
                new fac(i-1, this);
            }
        }
        atomic void result(int r) {
            parent->result(input * r);
            delete this;
        }
    };

Program: Recursive Processes

The two forms of receive, explicit and implicit, cover the two extremes of the spectrum of possible mechanisms for message discretion: The explicit receive accepts only a particular message from a particular process; the implicit receive accepts any message from any process. The active/passive semantics provide a more general, selective-receive mechanism: Atomic actions of a process represent incoming communication channels, and the process can, at run time, select the communication channels over which it is ready to accept a message.
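As a hedged illustration of this selective-receive mechanism (assumed code, not from the thesis; merge, a, b, and consume are illustrative names), the following process accepts messages over its two channels strictly alternately, starting with a:

    processdef merge {
    private:
        void consume(int x) { /* handle the accepted message */ }
    public:
        atomic merge()       { passive b; }                        // accept a first
        atomic void a(int x) { passive a; active b; consume(x); }
        atomic void b(int x) { passive b; active a; consume(x); }
    };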

Single-Assignment Variables

Single-assignment variables are a safe form of futures. Requesting a read access on an uninitialized single-assignment variable causes the requesting process to be suspended until the variable is assigned to. Since there can be at most one assignment to a single-assignment variable, these variables can be effectively cached. Processes of type P, as defined in the program "Enabling and Disabling Atomic Actions" above, are an example of a possible C+- implementation of single-assignment variables.
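A hedged usage sketch, not taken from the thesis (producer, consumer, and the value 42 are illustrative), of the single-assignment process P defined above:

    processdef producer {
    public:
        atomic producer(P *v) { v->write(42); }      // the one permitted assignment
    };

    processdef consumer {
    public:
        atomic consumer(P *v) { int x = v->read(); } // suspended until the write has
                                                     //   occurred; the value can then be
                                                     //   cached, since it can never change
    };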

Process Aggregates

Thus far, we have described processes as independent entities, and have emphasized the code-execution aspects of processes. In this section, we shall show how processes can be treated as instances of a restricted data form: one that can be accessed only through a set of mutually exclusive atomic actions.

As illustrated in the program below, C+- programmers can treat processes as variables of any other type. Whether a process is a local variable, a member of an object or of another process, an element of an array, or used in any other way in which a variable can be used in C++, the process semantics are the same. (Footnote: Process assignment is an atomic-action invocation, equivalent to issuing a request to the source process to send a copy of itself to the destination process. Passing processes as arguments is a form of assignment.) According to the syntactic restrictions described earlier, the only operations allowed on a process are to take its address and to access its public members, all of which are atomic actions. The various process usages determine only when a process is created and when it is destroyed. For non-process data types, variable usage also implies what the memory layout is. When accessing processes, one cannot assume, for example, that a process declared as a local variable resides on the stack, nor can one assume that a process that is a member of a class is placed in memory next to the other data members. We shall discuss later how programmers can affect the process-placement strategy.

The semantics of C+- are defined such that efficient implementations exist for both mainstream variants of MIMD computers: multiprocessors, which have one global address space, and multicomputers, which have multiple local address spaces. In C+-, regardless of the underlying architecture, a pointer to a process can be dereferenced globally, since it contains sufficient information to uniquely identify the process it points to.

    processdef P { /* ... */ };

    class C {               // an object of class C contains:
    public:
        P p;                //   a process,
        P *pp;              //   and a process pointer
    };

    P p1, p2;               // declare two processes
    p1 = p2;                // process assignment
    P pa[8];                // declare a process array

Program: Treating Processes As Data

An important advantage that multiprocessors have over multicomputers is that they can employ most of the data-layout strategies developed for sequential computers. There are additional performance considerations guiding the design decisions on the data layout, as discussed in the literature. If, for the time being, we neglect such performance considerations, a vector of C+- processes could, on a multiprocessor, be laid out in memory in the same way as a vector of elements of any simple data type: Elements with successive indices would reside at memory addresses that differ by a stride equal to the size of the process. This approach would allow the programmer to compute the address of any process in the vector, given the address of any other process in the same vector and the two corresponding indices.

On a multicomputer, using the above layout strategy for vectors of processes is unacceptable for two reasons: first, the address space of a multicomputer is contiguous only within each multicomputer node, so the maximum size of a process vector would be limited by the size of node memory; and second, although the computation model allows elements of a process vector to operate concurrently, that concurrency could not be used to a performance advantage, because the elements would all reside on the same node.

This example is but an instance of a more general problem of naming constituent elements of distributed objects. There are two issues that are central to the solution of this problem. The first issue is that there should exist a single name (address) of a distributed object, and a way of addressing constituents given this name. The second issue is that the programmer should be able to compute on references, not just store them at process-creation time and fetch them when they need to be used.

A simple solution that takes only the first issue into account could employ an address-manager process. The manager's address would represent the address of the distributed process as a whole. All the requests would be directed to this process, and then forwarded to appropriate constituent processes. This solution obviously introduces an access bottleneck, but may be acceptable for element processes that exhibit a large ratio of computation to communication.

We consider this problem to b e to o imp ortant to b e left to ad hoc approaches

particularly for such oftenused paradigms as arrays of pro cesses Accordingly

C oers a runtimesystemsupp orted mechanism for address managementthat

preserves the C addresscomputation semantics

The example in the program below shows that the creation of a process array consists of two stages.

processdef P {...};

P *p = new P[N];                        // is equivalent to:

P *p = (P *)unique_CPM(N, sizeof(P));
for (int i = 0; i < N; i++)
    new (&p[i]) P;

Program: Creating a Vector of Processes

First, a set of unique references is allocated, by invoking the unique_CPM function with arguments specifying how many references are required and what the stride between adjacent references should be. This function returns a pointer of the generic process-pointer type, pointer_t (analogous to void* in C++). Next, the actual process creation is requested, specifying that each new element process be placed in such a manner that it can be located through the given pointer. A description of the various flavors of process creation is presented later in this chapter; a set of algorithms that provide efficient support for process placement and lookup is described in the literature.
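To make the idea of globally computable references concrete, here is an illustrative C++ sketch only: the modular placement rule, the field layout, and the names below are assumptions of this sketch, not the algorithms used by the C+- runtime system.

#include <cstdint>

struct ref_t {                    // stand-in for pointer_t
    std::uint32_t value;          // base + index*stride, computable by the user
};

struct placement_t {              // what the runtime resolves a reference to
    unsigned      node;
    std::uint32_t local_addr;
};

// Resolve a reference from a block of N references allocated with a fixed
// stride, assuming element i was placed on node (i mod nodes).
placement_t resolve(ref_t base, std::uint32_t stride, unsigned nodes, ref_t r)
{
    std::uint32_t index = (r.value - base.value) / stride;   // recover the element index
    return { index % nodes,                                  // node that holds element index
             (index / nodes) * stride };                     // slot within that node
}

The essential property preserved is that the user may still do ordinary reference arithmetic (base plus index times stride), while the runtime system retains the freedom to scatter the elements across nodes.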

Summary

The programming examples in this chapter illustrate that a small set of mechanisms supported by C+- is sufficient to express a variety of concurrent-programming paradigms. This set consists of process creation, asynchronous request, synchronous request (remote procedure call), and selective servicing of requests (the active/passive mechanism). In the next chapter we shall present an implementation framework for this set of mechanisms.

Managing Program Complexity

In the introductory section of this chapter, we discussed how object-oriented programming techniques came about through efforts to aid programmers in managing program complexity. All of the object-oriented techniques supported by C++ are extended to managing processes in C+-. The interested reader may consult the wealth of available literature on C++. In the remainder of this section, for completeness, we shall mention briefly two of those techniques: inheritance and virtual functions. We shall then discuss the techniques that are specific to C+- and concurrent programming: process layering, process libraries, and customizing of the data exchange.

Class Inheritance

Class inheritance is the C++ mechanism that enables user-defined types to be derived from more basic types, inheriting data members and functions from the base type, possibly adding new ones and/or overriding old ones. Access rights are associated with each class member. For example, in the program below, private members of the base class shape can be accessed only by member functions of shape; protected members of shape can, in addition, be accessed by member functions of any class derived from shape (for example, circle); and public members of shape can be accessed by any piece of code anywhere in the program. The class circle is derived from class shape by adding a data member, radius, and a member function, modify_radius, and by overriding the member function draw.

A typical memory layout for the two classes is shown in the figure that follows the program. (This neglects, for the time being, such C++ features as multiple inheritance and virtual functions.) The point to be remembered is that C++ class inheritance is a compile-time, rather than a run-time, mechanism.

class shape {
 private:
  int origin;
  void modify_origin();
 protected:
  int color;
  void modify_color();
 public:
  void draw();
};

class circle : shape {
 private:
  int radius;
 public:
  void modify_radius();
  void draw();
};

Program: Class Inheritance

Figure: Class Inheritance vs. Memory Layout (every circle object contains an embedded shape part, origin and color, followed by its own radius)

Every instance of class circle contains a part corresponding to an instance of class shape; it is the definition of class shape that is shared, not any particular instance of it.

The C++ class-inheritance mechanism is mimicked by process definitions in C+-: they, too, can be specified through their similarities with, and differences from, previously defined process definitions.

Virtual Functions

The virtual-function mechanism supported by C++ enables programmers to separate the design of member-function interfaces from the design of the member functions themselves.

For example, in the preceding program, given a shape *sp and a circle *cp, the invocations sp->draw() and cp->draw() will result in calling shape::draw() and circle::draw(), respectively. The compiler decides which call to generate based on the type of the pointer through which the function has been called.

Had the two draw functions been virtual, the invocation sp->draw() could have invoked either of the two functions, depending on what the pointer sp pointed to. In this case, the compiler generates an indirect call through the class-specific table.
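For readers less familiar with C++, the following self-contained fragment contrasts the two dispatch rules just described; the member name vdraw is ours, added only so that both forms can appear side by side.

#include <cstdio>

struct shape {
    void draw()          { std::puts("shape::draw"); }   // non-virtual: chosen by the pointer type
    virtual void vdraw() { std::puts("shape::vdraw"); }  // virtual: chosen by the object pointed to
};

struct circle : shape {
    void draw()           { std::puts("circle::draw"); }
    void vdraw() override { std::puts("circle::vdraw"); }
};

int main()
{
    circle c;
    shape *sp = &c;
    sp->draw();    // prints "shape::draw"   -- compile-time choice
    sp->vdraw();   // prints "circle::vdraw" -- indirect call through the class-specific table
}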

Process Layering

The standard C+- inheritance mechanism allows one to describe process definitions hierarchically. However, once a process is created, it is an independent entity: the hierarchy is reflected in its structure, not in its relationship with other processes.

There are important applications where, in addition to hierarchy in structure, it is useful to have run-time-exercised hierarchy in control. For example, in operating or runtime systems, user processes are created and managed by system processes. In simulators, processes that model the behavior of physical elements are managed by time- or event-driven schedulers.

The mechanism that C+- uses to support such applications is process layering, also called dynamic process inheritance. As illustrated in the program and figure below, every instance of processdef gate is managed by an instance of processdef scheduler.

processdef scheduler {
 private:
  int time;
  ...
};

processdef gate : dynamic scheduler {
 protected:
  gate *output;
  ...
};

processdef two_input_gate : gate {
 private:
  int state;
 public:
  atomic void input1(int);
  atomic void input2(int);
};

Program: Process Layering

Figure: Process Layering vs. Memory Layout (the scheduler instance, with its data member time, is a separate process; a two_input_gate instance contains only the gate part, the output reference, and its own state)

The details of process layering will be discussed later, in the description of the C+- runtime-system interface. The relationship between the manager process and the managed process is established at the creation time of the managed process. The manager provides a set of services to all processes that it manages, with the same access protection that is offered through the class-inheritance mechanism. The manager decides when an atomic action of any of the processes managed by it is executed (as opposed to invoked), while conforming to the definitions of process behavior specified earlier.

Process Libraries

Libraries of C+- processes can be organized in the same way as libraries of data structures in C++. In most cases, the remote procedure calls to atomic actions of processes form a suitable interface, and these calls replace the class member-function interfaces. In these cases, it is sufficient that programs include header files that contain interface-process definitions.

There are cases, however, in which imposing the RPC interface would overly serialize computations that are otherwise concurrent. For example, a process library might initialize a set of processes for FFT computation, as illustrated later in this chapter, employing several input and several output data streams. A stream of input values can be represented by a sequence of non-blocking atomic-action invocations. If a stream of output values were represented as a sequence of replies obtained through the RPC mechanism, just as in the earlier sequential-search example, the computation could not be pipelined. However, unlike in that search example, this problem could not be resolved with call forwarding.

The mechanism typically used for C+- libraries with multiple input and output streams is as follows: an input stream is represented by a sequence of non-blocking atomic-action invocations of an input-interface process; an output stream is similarly a sequence of non-blocking atomic-action invocations of a process provided by the library user. In this arrangement, the library-user process must be derived from the output-interface process of the library it uses. When a process uses multiple libraries, multiple inheritance is employed to derive such a process from all of the output-interface processes from which it requires results.
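The following is a structural sketch of that arrangement, written in plain C++ classes rather than C+- processdefs; all names here are ours. The library exposes an input-interface type to which the user streams values, and an output-interface type from which the user process must derive so that the library can stream results back through non-blocking invocations.

struct Complex { double re, im; };

struct fft_input_interface {            // provided by the library
    virtual void in(Complex) = 0;       // a non-blocking atomic action in C+-
    virtual ~fft_input_interface() = default;
};

struct fft_output_interface {           // the user derives from this
    virtual void out(Complex) = 0;      // invoked by the library for each result
    virtual ~fft_output_interface() = default;
};

struct filter_output_interface {        // a second library's output interface
    virtual void filtered(Complex) = 0;
    virtual ~filter_output_interface() = default;
};

// A user process that consumes results from both libraries derives, via
// multiple inheritance, from both output-interface types.
struct consumer : fft_output_interface, filter_output_interface {
    void out(Complex c) override      { (void)c; /* handle FFT result */ }
    void filtered(Complex c) override { (void)c; /* handle filter result */ }
};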

Data Exchange

The designers of C++ made a commendable effort to provide an overloading mechanism that enables programmers to pass arguments by value, even when these arguments are arbitrarily complicated linked data structures. This mechanism is not sufficient for concurrent-programming systems, which must take into account some additional considerations: on multicomputers, object pointers have local meaning; also, concurrent computers may be heterogeneous ensembles, comprised of machines with different data layout, alignment, size, or representation.

C+- addresses all of these potential problems at the interprocess-communication level (invocations of atomic actions), with mechanisms that are described in the remainder of this section. The communication specifications are declarative, as opposed to imperative: the programmer specifies what special actions should be taken when a data item of a certain type is communicated; the compiler guarantees that the actions thus specified will be invoked on every occurrence of communication.

Communicating Arbitrarily Complex Data Structures by Value

One of the premises of fine-grain concurrent programming is that large data structures are implemented in terms of many small cooperating processes, so it is tempting to claim that process pointers that can be globally dereferenced are all that programmers might possibly want. However, an important use for pointers in C++ is for data structures that are only partially specified at compile time: linked data structures and arrays of variable size. If proper support and clean semantics for this feature were not offered, users would have resorted to ad hoc solutions.

The mechanism supported by C+- enables the programmer to specify what extra actions should be taken when communicating an object of some class by value. In its most common form, it amounts to flattening the linked data structure before sending, and relinking it upon receiving. As will be illustrated later, variants of this mechanism can also be used to express more intricate, but sometimes much more efficient, communication protocols.

Suppose that the data type of choice is a singly-linked list of elements of type list, each of which contains a pointer to the next element in the list, a pointer to a vector of integers, and a field specifying the size of the integer vector. The figure below illustrates what is required to pass a data item of type list by value. Part (a) shows a data item scattered around in memory. Part (b) shows the flattened data structure, with the dashed parts corresponding to other arguments that may be sent in the same communication. If the concurrent computer at hand is a shared-memory multiprocessor, and if the flattened argument list is in the shared address space, the task is completed. Now suppose that passing arguments moves them from one address space to another, as typically happens on a multicomputer. When the message that encapsulates the argument list is received, all the pointers are off by a constant, as in (c), and have to be relinked, as in (d).

Figure: Flattening Linked Data Structures ((a) the structure scattered in memory; (b) the flattened argument list; (c) the received copy, with pointers off by a constant; (d) the relinked structure)

The program below is the specification of the flattening and relinking tasks. The operator space() computes how much extra space is needed in the argument list when an instance of list is passed as an argument to an atomic action. The operator send() specifies that, in addition to this instance of list, a vector of integers and the remaining part of the list should be passed along. The operator recv() requests that the vector of integers (data) and the rest of the list (next) be relinked in place on the receive side.

This special handling will be invoked not only for instances of list, but also for all objects derived from list, and for all objects that contain instances of list as members. C+- data-structure libraries can accordingly be built in a way that allows library users to be indifferent about the details of the implementation.

This example illustrates how arbitrarily complex linked data structures can be passed by value. However, to avoid copying, and when sharing of data structures between processes is needed, structures must consist of linked processes, not of linked objects.


class list {
 private:
  int size;               // number of integers data points to
  int *data;
  list *next;             // a pointer to the next of kin
 public:
  size_t operator space()
  {
    size_t s = space(data, size);        // space for size integers
    if (next) s += space(next);          // space for the rest
    return s;                            //   of the list
  }
  void *operator send(void *v)
  {
    v = send(v, data, size);             // send size integers
    if (next) v = send(v, next);         // send the rest
    return v;                            //   of the list
  }
  void operator recv()
  {
    recv(data);                          // relink int*
    if (next) recv(next);                // relink the rest
  }                                      //   of the list
};

Program: Passing Linked Data Structures by Value

Communicating Across Heterogeneous Machine Boundaries

The C+- compiler assembles all messages (argument lists to atomic actions) and initiates all instances of communication (invocations of atomic actions). This information enables the compiler to handle the size and alignment of the basic data types (integers, floating-point numbers, etc.) for a programmer-specified set of machines that may be involved in direct communication. (The complete list of C+- basic data types is: char, short, int, long, float, double, long double, signed char, unsigned char, unsigned short, unsigned int, unsigned long, void*, entry_t, and pointer_t.)

The example in the program below specifies that, in addition to the local machine type, communication may be established with machines of types I and Sparc (arbitrary, user-specified names). The entries within each machine description correspond to the data size and alignment, measured in units of size equal to the minimum-addressable memory unit on the machine running this program, and to any special treatment that may be required for a particular basic data type.


machine I {
  char   <size> <alignment>;
  short  <size> <alignment>;
  int    <size> <alignment>;
  long   <size> <alignment>;
};

machine Sparc {
  char   <size> <alignment>;
  short  <size> <alignment>  send_lib  recv_lib;
  int    <size> <alignment>;
  long   <size> <alignment>;
};

Program: Machine Descriptions

For example, the entry for short integers on a machine of type Sparc specifies their size and the divisibility requirement for the addresses on which they must be positioned. When sending a short integer to a process residing on a machine of type Sparc, the data item has to be converted using the user-supplied and user-named function send_lib; when receiving a short integer from such a process, the data item has to be converted using the function recv_lib.

The compiler implicitly generates the type machine_t, defined as

enum machine_t { local_CPM, I, Sparc };

and the user is obliged to define the function

machine_t machine_CPM(pointer_t);

that maps process pointers into machine types.
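As a minimal sketch of this obligation, one possible machine_CPM could look as follows; the typedef, the node_of helper, and the per-node table are assumptions of this sketch, not parts of the C+- runtime system.

typedef unsigned long pointer_t;        // stand-in for the runtime-defined type

enum machine_t { local_CPM, I, Sparc };

extern int       node_of(pointer_t);    // assumed: extracts the node number from a reference
extern machine_t machine_table[];       // assumed: machine type of each node, filled at startup

machine_t machine_CPM(pointer_t p)
{
    return machine_table[node_of(p)];   // map the process's node to its machine type
}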

Putting It All Together

The examples of C+- programs shown so far were chosen to illustrate programming techniques. We have deliberately chosen clarity over completeness, and, indeed, some of these examples require the addition of forward declarations to be accepted by the compiler.

In this section we shall show an example of a complete program that computes the N-point FFT. Our concurrent program will



closely match this data-dependency graph, with one addition: we shall introduce a column of nodes whose purpose is to rearrange the input values from the standard linear ordering of indices to the bit-reversed ordering required at the input of the FFT-computing graph. The figure below shows the modified graph, with circled parts corresponding to the subcomputations performed by individual processes.
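For concreteness, the following is a sketch of the bit-reversing index permutation used by that input column, assuming N is a power of two; its signature follows the bit_reverse(int N, int i) declaration that appears in the library header later in this section.

int bit_reverse(int N, int i)
{
    int r = 0;
    for (int bits = N >> 1; bits > 0; bits >>= 1) {   // one step per address bit of an N-point FFT
        r = (r << 1) | (i & 1);                       // shift the low bit of i into r
        i >>= 1;
    }
    return r;
}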

Figure: An N-point FFT computation, with the processes circled (the butterfly graph connecting the inputs x to the outputs X through twiddle factors W_N)

Typically, writing C+- programs consists of four stages:

1. choosing a concurrent algorithm;
2. designing an input/output interface;
3. designing the process hierarchy; and
4. describing the process behavior.

We shall organize the program as a library package. The figure below illustrates the user-level view of this library. Input values are to be sent to processes of type fft, and output values will be delivered to processes of the same type. For an N-point FFT computation there are N input and N output processes, all of which have to be derived from fft. The set of pointers to the N input processes could be represented in a variety of ways, but it is often most intuitive to represent these processes as members of a process vector, as described earlier in this chapter. The same is true for the set of pointers to the N output processes.

The first program below is the header file that the user must include to access the library; a user program might look like the example that follows it. Since the library sends the output values to the vector of fft processes, the consumer processes are derived from fft, and have to be created using the distributed-process mechanism. The producer processes, on the other hand, don't have to be elements of any vector, unless some other part of the user code needs to treat them so.

Figure: User View of the FFT Library (N producer processes feed the inputs of the FFT graph; N consumer processes, derived from fft, receive the outputs)



// fft.h
#include "c+-.h"                     // The runtime-system header file
#include "Complex.h"                 // The complex-arithmetic package

processdef fft : public CPM {        // The runtime system requires that
                                     //   every process be derived from CPM
 public:
  atomic virtual void connect(fft*) = 0;  // The "= 0" syntax in C+- denotes
  atomic virtual void in(Complex)   = 0;  //   that this is the specification
                                          //   of an interface, leaving it to
                                          //   the derived processes to specify
                                          //   how the requests are serviced
};

processdef fft_graph : public CPM {  // This process represents the
                                     //   whole graph
 private:
  fft *inputs;                       // The pointer to the first input
  int order;                         // Size of the FFT graph
 public:
  atomic fft_graph(int, fft*);       // Creating the fft process graph
  atomic ~fft_graph();               // Deleting the fft process graph
  atomic fft *input(int);            // Finding out the address of a
                                     //   particular input
};

Complex W(int N, int i);             // A function that computes
                                     //   complex roots of unity
int bit_reverse(int N, int i);       // A bit-reversing function

Program: The FFT-Library Header File


#include "fft.h"

processdef consumer : public fft {
 public:
  atomic virtual void in(Complex);         // Do something with the result
};

processdef producer : public CPM {
 public:
  atomic producer(fft*);                   // Produce input values
};

const int N = ...;

root::root(int argc, char *argv[])
{
  fft *outputs = new consumer[N];          // Create the vector of consumers
  fft_graph *g = new fft_graph(N, outputs);
                                           // Create the computation graph
  fft *inputs = g->input(0);               // Get the reference to the inputs
  for (int i = 0; i < N; i++)              // Create N producers
    new producer(&inputs[i]);
}

Program: An Example of FFT-Library Usage


The figure below shows the process-specification hierarchy that we chose to implement, and the two program parts that follow it specify this hierarchy.

Figure: Process-Specification Hierarchy (fft at the root; relay derived from fft; fork and join derived from relay; mult_fork derived from fork; join_fork derived from join; join_mult_fork derived from join_fork)

The fft process definition is just an interface specification and does not describe any computation. The remaining process definitions specify that the process activity consists of four distinct stages:

1. establishing a connection, i.e., obtaining output references;
2. getting one or two input values;
3. computing the result, which may involve an addition and a multiplication; and
4. outputting one or two output values.

The common parts of the code are shared between different process definitions through the process-inheritance mechanism. Using multiple inheritance, whereby process definitions can be derived from more than one process definition, would have resulted in better code reuse. Nevertheless, we felt that, in the examples in this thesis, multiple inheritance would not have contributed to the readers' understanding of C+-.


// fft.h (continued)
#include "fft.h"

processdef relay : public fft {
 protected:
  fft *out;                            // Output reference
  Complex result;                      // The result
  virtual void compute(Complex);       // How to compute the result
  virtual void output();               // How to generate the output
 public:
  atomic virtual void in(Complex);
  atomic virtual void connect(fft*);
  atomic relay() { passive(in); }
};

processdef join;

processdef fork : public relay {
 protected:
  join *out1;                          // Fork adds an output reference
  virtual void output();               //   and produces two output values
 public:
  atomic virtual void connect(fft*, join*);
};

processdef mult_fork : public fork {   // Mult_fork also needs to multiply,
 protected:
  Complex W;                           //   so here is the multiplicand,
  virtual void compute(Complex);       //   and how to compute;
  virtual void output();               //   it must generate the output
 public:
  atomic virtual void connect(fft*, join*, Complex);
};

Program: Process Hierarchy for the FFT Computation (Part 1)


// fft.h (continued)
#include "fft.h"

processdef join : public relay {       // Join has two distinct inputs
 protected:
  virtual void compute(Complex);       // How to compute the result
 public:
  atomic virtual void in1(Complex);
  atomic virtual void in2(Complex);
  atomic join() { passive(in1); passive(in2); }
};

processdef join_fork : public join {   // The same modifications
 protected:                            //   as from relay to fork
  join *out1;
  virtual void output();
 public:
  atomic virtual void connect(fft*, join*);
};

processdef join_mult_fork : public join_fork {   // The same modifications
 protected:                                      //   as from fork to mult_fork
  Complex W;
  virtual void compute(Complex);
  virtual void output();
 public:
  atomic virtual void connect(fft*, join*, Complex);
};

Program: Process Hierarchy for the FFT Computation (Part 2)

The behavior of the various process types is specified in the three program parts that follow.

// fft.cpm
#include "fft.h"

atomic void relay::connect(fft *f)
{
    out = f;
    active();                          // make all atomic functions active
}

atomic void fork::connect(fft *f, join *j)
{
    out = f;
    out1 = j;
    active();
}

atomic void mult_fork::connect(fft *f, join *j, Complex c)
{
    out = f;
    out1 = j;
    W = c;
    active();
}

atomic void join_fork::connect(fft *f, join *j)
{
    out = f;
    out1 = j;
    active();
}

atomic void join_mult_fork::connect(fft *f, join *j, Complex c)
{
    out = f;
    out1 = j;
    W = c;
    active();
}

Program: The FFT Computation (Part 1)


// fft.cpm (continued)
#include "fft.h"

atomic void relay::in(Complex c)
{
    compute(c);
    output();
}

void relay::compute(Complex c)
{
    result = c;
}

void mult_fork::compute(Complex c)
{
    result = W * c;
}

void relay::output()
{
    out->in(result);
}

void fork::output()
{
    out->in(result);
    out1->in1(result);
}

void mult_fork::output()
{
    out->in(result);
    out1->in2(result);
}

Program: The FFT Computation (Part 2)


// fft.cpm (continued)
#include "fft.h"

atomic void join::in1(Complex c)
{
    if (passive(in2)) {
        compute(c); output(); active(in2);
    } else {
        result = c; passive(in1);
    }
}

atomic void join::in2(Complex c)
{
    if (passive(in1)) {
        compute(c); output(); active(in1);
    } else {
        result = c; passive(in2);
    }
}

void join::compute(Complex c)
{
    result += c;
}

void join_mult_fork::compute(Complex c)
{
    result = (result + c) * W;
}

void join_fork::output()
{
    out->in(result);
    out1->in1(result);
}

void join_mult_fork::output()
{
    out->in(result);
    out1->in2(result);
}

Program: The FFT Computation (Part 3)


Finally, the three program parts that follow contain the code used to build the N-point-FFT process graph. Depending on how time-critical this creation task is, solutions range from entirely sequential, taking O(N log N) steps, to maximally concurrent, taking just O(log N) steps. Our solution follows an intermediate approach, in which the process creation is concurrent and takes O(log N) steps, whereas passing references around is sequential for each process column and takes O(N) steps.

// fft.h (continued)
#include "fft.h"

processdef build_top_fft : public CPM {
 public:
  atomic build_top_fft(int, join*, int, int, fft*);
};

processdef build_btm_fft : public CPM {
 public:
  atomic build_btm_fft(int, join*, int, int, fft*);
};

Program: Building the FFT Graph (Part 1)


// fft.cpm (continued)
#include "fft.h"

fft_graph::fft_graph(int N, fft *outs)
{
    order = N;
    inputs = new relay[N];
    if (N > 1) {
        join *j = new join[N];
        new build_top_fft(N, j, 0, N/2, inputs);
        new build_btm_fft(N, j, N/2, N, inputs);
        for (int i = 0; i < N; i++)
            j[i].connect(&outs[i]);
    } else {
        inputs->connect(outs);
    }
}

Program: Building the FFT Graph (Part 2)


// fft.cpm (continued)
#include "fft.h"

build_top_fft::build_top_fft(int N, join *outs, int from, int to, fft *inputs)
{
    int n = to - from;
    if (n > 1) {
        join_fork *f = new join_fork[n];
        new build_top_fft(N, f, from, from + n/2, inputs);
        new build_btm_fft(N, f, from + n/2, from + n, inputs);
        for (int i = 0; i < n; i++)
            f[i].connect(&outs[i], &outs[n+i]);
    } else {
        fork *f = new fork;
        f->connect(&outs[0], &outs[1]);
        inputs[bit_reverse(N, from)].connect(f);
    }
}

build_btm_fft::build_btm_fft(int N, join *outs, int from, int to, fft *inputs)
{
    int n = to - from;
    if (n > 1) {
        join_mult_fork *f = new join_mult_fork[n];
        new build_top_fft(N, f, from, from + n/2, inputs);
        new build_btm_fft(N, f, from + n/2, from + n, inputs);
        for (int i = 0; i < n; i++)
            f[i].connect(&outs[n+i], &outs[i], W(N, from+i));
    } else {
        mult_fork *f = new mult_fork;
        f->connect(&outs[1], &outs[0], W(N, from));
        inputs[bit_reverse(N, from)].connect(f);
    }
}

Program: Building the FFT Graph (Part 3)


Chapter: Implementation Issues

There are two major components to the C+- programming system: the translator from C+- to C++, and the C+- runtime system. This programming system is currently supported on the Mosaic and on all systems that support the Cosmic Environment/Reactive Kernel (CE/RK) message-passing primitives, which includes sequential computers, networks of workstations, and a variety of commercial multicomputers and multiprocessors.

The translator is written in C++ and is both compile-machine- and target-machine-independent. Most of the runtime-system code is portable as well, with the exception of a small set of C library functions that are illustrated later in this chapter.

The Runtime-System Framework

The relationship between the C+- programming notation and the C+- runtime systems is symbiotic: programs written in C+- require runtime-system support; C+- runtime systems are typically written in C+-.

Although most of the runtime-system code is portable, the resource-allocation requirements on various machines are quite different. Given a sufficiently large node memory, the amount of runtime-system support that C+- programs require is minimal: the runtime systems for C+- implementations on computers with workstation-size nodes typically consist of less than a thousand lines of C+- code.

The Mosaic fine-grain multicomputer consists of nodes with severely restricted memory resources; hence, the runtime system for the Mosaic employs much more sophisticated runtime mechanisms. Various configurations of MADRE (the MosAic Distributed Runtime systEm) range from two to ten thousand lines of C+- code. MADRE was written by Nanette J. Boden, and its design and the distributed algorithms it employs are described in detail in her PhD thesis. This work demonstrates that the complexity of runtime systems for fine-grain multicomputers need not result in large penalties in speed, nor does it imply large chunks of node-resident code that reduce the available node memory even further: MADRE is itself a concurrent program that employs distributed solutions to manage distributed resources.

The mutual dependence of the C+- programming notation and the C+- runtime systems is only apparent. In fact, the runtime system is just a pre-written part of any user program, a part that includes an interface to the resource-allocation and communication capabilities of the machine it is running on. The C+- programming model and programming notation supply only the framework for implementing process management and data communication, striving not to restrict the spectrum of possible runtime-system implementations. The remainder of this section describes this framework. Since the primary target for executing C+- programs is the Mosaic, the names and default semantics of the functions that we use correspond to message-passing communication primitives. This does not, however, imply that these primitives are the only ones that can be used; shared-memory communication primitives, for example, are equally suitable for implementing the necessary low-level routines.

Process Creation

An example of how process creation may be implemented in C+- is given in the program below. In general, process creation consists of the following three stages:

1. Choosing a manager, by invoking the manager_CPM function corresponding to the type of the process being created. This function must return a pointer to the process that will be asked to instantiate the new process. It is possible to define multiple versions of this function, some of which may take arguments; for example, different versions may correspond to different process-placement strategies.

2. Requesting the creation from the chosen manager, by invoking the manager's create_CPM atomic action. The two arguments correspond to the size of the process and the address of the constructor to be invoked. If the constructor takes arguments, those are passed as well. Various flavors of process creation can coexist in the system, with one of them selected at creation time.

3. Instantiating the process is done by a manager process, not necessarily the one originally chosen: the creation can be delegated to other potential manager processes, and is eventually done in the consenting manager's address space.

(The manager_CPM function must be declared static, which is a C++ feature that makes a member function generic: associated with a certain class definition, not with any particular instance of that class. The size_t is a C++-defined integer type that can represent the size of the largest possible object or process; the entry_t type is introduced by C+- and will be described later in this chapter.)


processdef Manager {
 public:
  atomic P *create_CPM(size_t, entry_t, ...);
  ...
};

processdef P : dynamic Manager {
 public:
  static Manager *manager_CPM();
  atomic P();
  atomic P(int);
  ...
};

new P;           // is equivalent to:
P::manager_CPM()->create_CPM(sizeof(P), P::P);

int i;
new P(i);        // is equivalent to:
P::manager_CPM()->create_CPM(sizeof(P), P::P(int), i);

Program: Process Creation

Runtime Services

All of the protected and public members of a manager can be accessed by the processes it manages. This access is handled transparently by the compiler: the programmer need not be concerned whether some service is provided through regular inheritance or through dynamic inheritance, with the latter requiring one or more levels of indirection (see the program below).

Process Dispatch

A problem that emerges in the design of all operating and runtime systems is that of specifying an interface for invoking user programs. This task is typically done in an ad hoc way. For example, user programs written in C and run under UNIX must have a function called main, which is the user-code entry point. However, this approach does not enable the operating-system code to merely call this function, since the address of main is not known at operating-system linking time. The typical solution is to require that main always be at the same address, or to find its address at loading time.

Every C+- process has a fixed number of entry points, corresponding to its atomic actions, each of which could take different numbers and types of arguments


processdef Manager {               // runtime-system code
 protected:
  int i;
  void f();
  ...
};

processdef P : dynamic Manager {   // user code
 private:
  int j;
  void g();
 public:
  atomic P()
  {
    j = 0;                         // accessing local data
    i = 0;                         // accessing manager's data
    g();                           // calling local function
    f();                           // calling manager's function
  }
};

Program: Accessing Runtime Services

and return values of different types. If the runtime system itself is to be expressed in C+-, there must be a way of dispatching to any atomic action of any process, or of any process in some predefined set. In the remainder of this section, we describe the C+- atomic-action dispatch mechanism.

As illustrated in the figure below, every process P is a node of a process tree, with its path toward the root of the tree leading through its manager M, its manager's manager MM, etc. Several such trees may coexist on each physical node. Every processdef M that could be used as a dynamic base for some process definition (which means that an instance of M could be a manager of some process) must have a special atomic action defined, atomic M(entry_t, ...), called the dispatcher. A generic dispatcher, atomic (entry_t, ...), also has to be defined; its job is to dispatch to the root processes of process trees.

The entry_t is a type introduced by the compiler, corresponding to any and all types of entry points of processes that could be defined with M as their dynamic base. A variable of this type can be used like a regular C++ member-function pointer, with one important distinction: one need not know the interfacing details of all atomic actions that a variable of type entry_t may be used to invoke. How arguments are passed to anonymous atomic actions is discussed at the end of this section; how values are returned from atomic actions is presented later in this chapter.


atomic (entry_t, ...);                                     // generic dispatcher
processdef MM              { atomic MM(entry_t, ...); ... };
processdef M  : dynamic MM { atomic M(entry_t, ...); ... };
processdef P  : dynamic M  { atomic int f(int, char); ... };

Figure: Process Dispatch

Specification: An execution of an atomic action of a process can be requested only from the body of its manager's dispatcher atomic action.

For the process hierarchy in the figure above, this specification means that the execution of an atomic action of processdef P, say P::f, consists of executing the generic dispatcher, which calls MM::MM, which calls M::M, which calls P::f. It is this layered execution that enables managers to manage other processes. The semantics of atomic-action executions can be changed by modifying the runtime-system code. As stated in The Annotated C++ Reference Manual, this opens vast opportunities for generalization and language extension in the general area of "What is a function, and how can I call it?" This feature could strike the reader as intolerably underspecified, and as inviting hacking and abuse. However, the safety properties of this mechanism are not as weak as they may appear to be: the runtime-system-specified mechanisms cannot be changed by users; the manager always gets to run before dispatching to the managed process. We have come to believe that support for some mechanism of this kind is essential for a notation that is intended for expressing operating and/or runtime systems.

Another way of thinking about this layered dispatch mechanism is that every process provides a set of services (its atomic actions), and an escape mechanism to which it can defer the execution if it cannot handle the requested service itself.
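The following toy C++ rendering of the layered-dispatch idea may help fix the picture; plain functions stand in for dispatcher atomic actions, and all names and the argument record are ours, not the C+- runtime interface. Each layer gets to run, and may account, schedule, or defer, before handing control to the entry point it was asked to reach.

#include <cstdio>

using entry_fn = void (*)(void *args);

void user_entry(void *)            { std::puts("P::f runs"); }

void manager_dispatch(entry_fn e, void *args)
{
    std::puts("M runs first (scheduling, accounting, ...)");
    e(args);                       // only the manager starts the managed action
}

void generic_dispatch(entry_fn e, void *args)
{
    std::puts("root dispatcher runs");
    manager_dispatch(e, args);     // descend one level of the process tree
}

int main()
{
    generic_dispatch(user_entry, nullptr);
}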

Arguments to Atomic Actions

The memory layout of the arguments to atomic actions is the same as that for regular functions in C++, with additional arguments being passed to the dispatcher actions of the manager processes (see the figure below).


Figure: Atomic Actions' Argument Layout (the arguments to MM::MM, followed by the arguments to M::M, followed by the arguments to P::f -- an int and a char)

These additional arguments are, by default, generated by the compiler; but, as discussed later in this chapter, this default behavior can be replaced by one defined by the programmer.

An additional feature is that the arguments are assumed to be members of the compiler-introduced structure args_t, and can be accessed as a unit through a pointer variable, args_t *args, similar to the this variable in C++.
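A minimal sketch of accessing the argument record as a unit, in the spirit of the args_t convention; the struct layout shown is an assumption for a particular atomic action, not a fixed format.

struct args_t {                     // compiler-introduced record of P::f's arguments
    int  i;
    char c;
};

int sum_of_arguments(args_t *args)
{
    return args->i + args->c;       // the record can be inspected, copied, or forwarded whole
}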

The pointer_t and entry_t Types

In the programming examples we made use of the pointer_t and entry_t types, always referring to them as introduced by the compiler. These two types are actually defined by the runtime system, in a file that has to be included by every C+- program (c+-.h). The C+- translator makes the structure of every process pointer the same as that of pointer_t, and the structure of every pointer to a member of a process the same as that of entry_t.

Process State

As discussed in the previous sections, the state of a C+- process consists of:

- its data members,
- the active/passive set, and
- a pointer to the manager process.

What are the semantics of process assignment in the context of processes with the state defined above? The default C+- semantics for process assignment are bitwise copying of the data members and of the representation of the active/passive set; the pointer to the manager process is left untouched. The example in the program below shows process assignment as equivalent to sending a request to the source process to send a copy of itself to the specified destination process.


processdef P {
 private:
  int i;
 public:
  atomic int copy_CPM(P *pp)
  {
      pp->copy_CPM(*this);           // forward
  }
  atomic int copy_CPM(P p)
  {
      *this = p;
      return 0;
  }
};

P p1, p2;

p1 = p2;                             // is equivalent to:
await p2.copy_CPM(&p1);

Program: Process Assignment

Process Migration

No notion of process migration is supported directly in C+-: a process pointer typically contains an absolute address of a piece of memory representing the state of a process. However, the example in the preceding program shows how simple it is to copy the state of a process. Furthermore, with the ability of the runtime system to define the structure of process pointers, the runtime-system framework described in this chapter was sufficient to implement distributed processes. The support for distributed processes requires the same indirection mechanism that might be used for process migration. The work reported in the literature is a first step towards a thorough examination of the issues involved in process migration; the results presented in that work establish conditions under which, for example, process state can be shipped to where the atomic-action code is located, just as readily as code can be cached where the process state is located.

Invoking Atomic Actions

As illustrated in the program at the end of this section, an atomic-action invocation consists of three stages:

Introductory Stage: Upon calling, the operator space() is invoked to determine the size of the argument list, and the operator head() is invoked to build the dispatcher list. Given a data type TYPE and a process type PROCESS, the default operator semantics are as follows:

size_t operator space(TYPE t)
{
    return sizeof(t);
}

static void *PROCESS::operator head(void *v, pointer_t p, entry_t e, size_t s)
{
    return operator send(v, e);
}

Main Stage: For each element in the argument list, the operator send() is invoked. The default operator semantics are a bitwise copy:

void *operator send(void *v, TYPE t)
{
    TYPE *tp = (TYPE *)v;
    *tp = t;
    return tp + 1;
}

Final Stage: The operator tail() is invoked, with no-op default semantics:

static void *PROCESS::operator tail(void *v, pointer_t, entry_t, size_t)
{
    return v;
}

At the time of atomic-action execution, operator recv() is invoked for each element in the argument list. The default semantics for this operator are a no-op:


void operator recv(TYPE &t) {}

Program: Default operator recv()

The set of operators described above provides runtime-system programmers with a powerful tool that they can use to define how process communication is actually implemented in terms of lower-level routines. The same set of operators is available to users. An example of an application that might benefit significantly from the ability to exercise total control is a program that implements communication-network protocols. The general usability of the above mechanism, however, is highly questionable: once the compiler relinquishes control over data layout to a naive user, obscure problems abound. For the great majority of applications, the efficiency of the data-exchange mechanisms described earlier is sufficient.


processdef MM {
 public:
  atomic MM(entry_t, ...);
};

processdef M : dynamic MM {
 public:
  atomic M(entry_t, ...);
};

processdef P : dynamic M {
 public:
  atomic void f(int, char);
};

P *p;
int i;
char c;

p->f(i, c);        // atomic-action invocation, is equivalent to:

size_t size = operator space(MM::MM)     // assuming there are
            + operator space(M::M)       //   no alignment problems
            + operator space(P::f)
            + operator space(i)
            + operator space(c);
void *b, *v;
pointer_t pp = p;
b = v = operator head(pp, MM::MM, size);
v = MM::operator head(v, pp, M::M, size);
v = M::operator head(v, pp, P::f, size);
v = operator send(v, i);
v = operator send(v, c);
v = M::operator tail(v, pp, P::f, size);
v = MM::operator tail(v, pp, M::M, size);
operator tail(b, v, pp, MM::MM, size);

Program: Atomic-Action Invocation


Active/Passive

The active/passive mechanism, because of its simplicity and efficiency, is the C+- synchronization mechanism of choice. The runtime-system interface for this mechanism is presented in the program below. If a different synchronization mechanism is required, it can be implemented following the same approach.

processdef P {
 public:
  atomic void f();
  atomic int g();
};

atomic void P::f()
{
    active(f);         // is equivalent to:
    P::active_CPM(&P::f);

    passive(g);        // is equivalent to:
    P::passive_CPM(&P::g);
}

Program: Active/Passive Implementation

Remote Procedure Call

When invoking an atomic action that returns a value, the sequence of events is identical to that described above, except that an extra argument is passed. This extra argument is the pointer to the currently running process, the process that expects the reply. This pointer is obtained by calling the runtime-system-defined function current_CPM. A NULL extra argument implies that the returned value is not required.

Values Returned from Atomic Actions

Inside an atomic action, the extra argument is called reply_CPM. As illustrated in the program below, returning a value from an atomic action is equivalent to invoking the return_CPM atomic action of the process pointed to by the reply_CPM pointer. (Note that it was not possible to use the this variable, because a process might be suspended while executing a non-member function.)



processdef P {
 public:
  atomic int f();
};

atomic int P::f()
{
    return 0;          // is equivalent to:
    if (reply_CPM)
        reply_CPM->return_CPM(0);
    return 0;
}

Program: Atomic Actions Returning Values

Suspending a Process

Whenever a returned value is expected from an atomic action, the compiler introduces a placeholder for that value, and the runtime system is passed a pointer to this placeholder through the wait_CPM(void*) function. Multiple placeholders can be active at any time. When the process attempts to access the placeholder and finds it uninitialized, it suspends itself by invoking the suspend_CPM function.
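A sketch of this placeholder discipline, written in ordinary C++ in place of the compiler-generated code; wait_here and suspend_here are hypothetical stand-ins for wait_CPM and suspend_CPM, stubbed out here only so the fragment is self-contained.

#include <optional>

template <typename T>
struct placeholder {                    // compiler-introduced slot for a reply
    std::optional<T> value;             // empty until the reply arrives
};

// Stubs standing in for the runtime's wait_CPM and suspend_CPM; their real
// counterparts register the slot and deschedule the process.
inline void wait_here(void *) {}
inline void suspend_here()    {}

template <typename T>
T await_value(placeholder<T> &slot)
{
    wait_here(&slot);                   // hand the runtime a pointer to the placeholder
    while (!slot.value)                 // found uninitialized: suspend ourselves
        suspend_here();                 //   until the runtime fills it in and resumes us
    return *slot.value;
}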

From C+- to C++

There are a number of reasons for translating from C+- to C++, instead of compiling from C+- directly to Mosaic code. First, this was a faster way to build a running system. Second, the wide availability of C++ guaranteed machine independence. Third, we had good experience in retargeting the Gnu C compiler to produce excellent code for the Mosaic processor. And fourth, since C+- is syntactically so similar to C++, C++ debugging tools and other programming-support tools can be used with few or no modifications. One disadvantage of the translation approach is that the compile time increases, because programs must be parsed twice. A possible disadvantage is that some optimization opportunities may be lost when using C++ as an intermediate target notation; however, we have identified no such lost opportunities so far.

Parsing

The translator is a C++ program built within the framework of a Bison-produced parser. Practically every person who has ever worked on a project that involved parsing of C++ has already expressed their distaste that C++ syntax cannot be described by an LALR grammar. Nevertheless, we feel that our own distaste should be on record, too. We acknowledge that it is not the compiler writer, but the language user, who should be the ultimate judge of the value and style of a programming notation. However, if syntactic issues are subtle enough to be difficult for a compiler, what hope does a user have of not making obscure mistakes when writing programs using that syntax? Fortunately, beginners tend to use a small set of basic language constructs, whereas experienced users tend to develop their own programming style from a subset of the rich C++ offering. In our experience, the complexity of handling the few special cases in parsing C++ is comparable to the complexity of all of the remaining issues of translating C+- into C++. Suffice it to say that we are looking forward to the ANSI standard for C++ syntax.

In our implementation of the translator, each grammar rule corresponds to a class definition. For example, given the grammar rule in the program below, three class definitions have to be written, as shown in the program that follows it.

expression:
    assignment-expression
  | expression , assignment-expression
  ;

Program: An Example of a Grammar Rule

Parsing a C+- program generates a parse tree that consists of nodes that are instances of classes such as those illustrated below. We developed a program that, given an input grammar such as the one illustrated above, generates the default class definitions, the code that builds the parse tree, and the default definitions of the output functions. The resulting program code is a parsing specification for Bison, which can be used to produce a default parser. When a source program is fed to this default parser, the parser builds the parse tree; it then invokes the output function at the topmost level of the tree, thereby causing the entire source program to be produced as the output. This default behavior can be modified by defining additional elements of class definitions, by specifying extra actions to be taken while building the parse tree, and by providing customized versions of the output routine for any class definition.


class expression {
 public:
  virtual void output();
};

class expression_1 : public expression {
  assignment_expression *member_1;
 public:
  void output()
  {
    member_1->output();
  }
};

class expression_2 : public expression {
  expression *member_1;
  assignment_expression *member_2;
 public:
  void output()
  {
    member_1->output();
    member_2->output();
  }
};

Program: A Part of the Definition of the Parse Tree

This simple tool for developing programs for source-to-source transformation, itself a program of less than two thousand lines of C++ code, has been crucial to our ability to experiment with numerous versions of the C+- syntax. The tool generates about two-thirds of the C++ code of the complete C+- translator.

Code Generation

Once the hurdle of parsing C+- is overcome, the translation from C+- to C++ is a fairly simple task; the description of the runtime-system framework earlier in this chapter also specifies this translation task. Since the process concept is the only extension that C+- introduces to C++, the focus of the translator is on keeping track of processes and of the various other process-related types. The translator considers each segment of a source program to be a type transformation. For example, a process-pointer type, when dereferenced, is transformed into a process type, and a function call transforms a list of argument types into the type of the returned value. Since the translator keeps track of all of the type transformations in a program text, operations on processes are detected, and the replacement code, as illustrated earlier, is generated.

Code Splitting

In addition to the transformations described above, there is one more requirement on the translator. Since the Mosaic, a machine with limited node-memory resources, is the most important target machine for executing C+- programs, the C+- translator must provide support for code splitting. Pieces of code are cached in each node by the runtime system, and invoked through the indirect-function-call mechanism. A design decision had to be made on what the code-splitting target should be.

The default object-code unit provided by the regular C++ compilers is the piece of code produced by the compilation of one source file. We considered this default setup to be unacceptable: programmers would have to organize their code according to the code-splitting policy, rather than according to the programming-abstraction requirements of the application. This setup would unavoidably lead to a loss of portability, whereby the source code would have to be rearranged and split into smaller pieces when moving to a machine with less node memory.

Given that the default code-splitting policy was deemed unusable, we identified three well-defined code-splitting targets. These three targets, with increasing granularity, are to split the code so that each piece corresponds to:

1. an atomic action of a process;
2. a function and/or an atomic action of a process; or
3. a block of code within a function with strictly sequential execution (no conditional execution).

The next-higher-granularity target would be equivalent to turning the runtime system into a pseudo-code interpreter.

If the block of code with strictly sequential execution is the code-splitting target, only code that is certain to be executed is ever brought to the code cache; however, this implies more frequent code-cache updates.

If the code corresponding to a function or an atomic action is the code-splitting target, there is no unnecessary code duplication, as every named piece of code is a stand-alone unit; in this case, an indirect-call overhead has to be paid for each function call.

Even though each of these options could be supported by the C+- translator, we decided to split the code into pieces that correspond to atomic actions of processes. This was the least-complicated and the best-understood approach,


and it still allowed us to provide an experimental testbed that can be used to determine the effect of code-splitting granularity on machine performance. The code of a function is linked with every atomic action that invokes it. Some of the runtime-system services, such as sending messages and creating new processes, are accessed by virtually every user process, and replicating that code would be equivalent to including a large fraction of the runtime system in the code of each user-process atomic action. Access to these services is through the indirect-function-call mechanism, but its specification is left entirely to the runtime-system implementation. We consider this an acceptable compromise, particularly because any efficient code-caching policy must distinguish such often-used code anyway.
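The indirect-function-call discipline that code splitting relies on can be sketched as follows; the table layout and the fetch_code helper are assumptions of this sketch, not MADRE's design.

#include <cstddef>

using code_ptr = void (*)(void *args);

struct code_cache_entry {
    int      id;         // which atomic action this slot currently holds
    code_ptr entry;      // nullptr when the code is not resident
};

constexpr std::size_t CACHE_SLOTS = 64;
code_cache_entry cache[CACHE_SLOTS];

code_ptr fetch_code(int id);                       // assumed: brings the code into node memory

void invoke(int id, void *args)
{
    code_cache_entry &slot = cache[id % CACHE_SLOTS];
    if (slot.id != id || slot.entry == nullptr)    // miss: load and remember the code
        slot = { id, fetch_code(id) };
    slot.entry(args);                              // the indirect call itself
}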


Chapter: The Mosaic C Multicomputer Architecture

In its evolution away from the sequential computer organization, the multicomputer variety of MIMD concurrent computers has, from the very beginning, acknowledged the importance of locality in VLSI systems. A multicomputer consists of a collection of computing nodes connected by a communication network (see the figure below). Each node is typically a sequential computer, with its own processor and memory.

Figure: Multicomputer Architecture (computing nodes connected by a communication network)

These computers communicate data and synchronize their activities by exchanging messages through the communication network. Distinct mechanisms are used for processor-memory and inter-processor communication: the first is optimized for minimum latency, the second for maximum bandwidth. Unlike interprocess-communication latency, which can be covered up by excess concurrency with multiple processes per node, there is no way to compensate for a lack of communication bandwidth. In multicomputers, the components involved in the latency-sensitive communication, processor and memory, are placed physically close, within the node. The use of these distinct communication mechanisms is why the multicomputer architecture has proven to be so cost-effective and scalable. Even though the question of the preferred


programming model is the subject of much dispute, the majority of contemporary concurrent computers, regardless of their primary programming model, are built either as pure multicomputers or as multicomputers with some additional hardware support.

The communication bandwidth and latency are important figures of merit of a multicomputer. The bandwidth is limited by the communication network itself, or by the nodes' network interface. The latency consists of two components: the network latency, the time required for a message to traverse the network; and the software overhead, the time that the processor takes to launch a message into the network and to accept a message arriving from the network. Since the communication network typically operates concurrently with the computing nodes, the network component of the message latency can be covered up with excess concurrency; the software overhead cannot. What is more, the software overhead consumes processor cycles that might have been devoted to the user's computation.

Background

The most common representatives of multicomputers, and arguably the most powerful multicomputers in existence, are computer networks. However, even when the logistics associated with using such collections of computers are taken care of, these systems are cost-effective today only for loosely coupled concurrent computations. This ineffectiveness is due to the inadequate communication bandwidth of existing networks, and to the software overheads of the concurrent-programming systems that access these communication capabilities.

The history of multicomputer-design efforts is the history of attempts to increase the available communication bandwidth and to reduce the communication latency, thereby enlarging the application span to include more tightly coupled concurrent computations. While some of these attempts have been successful, we believe that most of them have been half-hearted, which has contributed to the widespread belief that multicomputers are harder to program than multiprocessors. One aim of this thesis is to make the case that the programming-model issue can be separated from the machine-architecture issue.

In an earlier chapter we described a programming model with support for both shared-variable and message-passing programming paradigms. In the remainder of this chapter, we shall present the architecture of a multicomputer, the Mosaic. This architecture, although very simple, is a platform on which an efficient implementation of this programming model can be built.


The Mosaic Node

Most contemporary multicomputers adopt a node complexity that requires a circuit board per node (medium-grain nodes). A stipulation of the Mosaic project was that the complexity of a Mosaic node be determined by the silicon complexity available on a single chip with reasonable yield. This requirement was motivated primarily by the large disparity in performance and density between on-chip and inter-chip interconnection technology. Nodes of even finer granularity are feasible, but, at the present state of fabrication technology, smaller nodes quickly reach a point where they are too small to hold even their own program code. An additional consideration was that we perceived single-chip-node multicomputers to represent a not-sufficiently-explored point in the design space of multicomputers. The ultimate motivation for single-chip nodes came from the realization that fine-grain multicomputers could have a much larger application span than their medium-grain counterparts: they can deliver cost-effective computing in small embedded configurations as well as in large ensembles.

The organization of the Mosaic node, as shown in the figure below, is centered around the memory bus.

Figure: The Mosaic Node (processor, network interface, and router, with the dRAM and bootstrap ROM on the memory bus)

This bus connects the dynamic RAM (dRAM) and the bootstrap ROM on one side with the instruction-interpreting processor and the communication-network interface on the other side. The node router, although logically this node's part of the communication network, is part of the Mosaic chip. A plot of the layout of the Mosaic chip is shown in the figure below. A more complete description of the Mosaic node and of other Mosaic assemblies can be found elsewhere.


Figure: The Mosaic C Chip (SCMOS technology)

In this thesis, we shall focus on those architectural issues that are fundamental to our programming model, and, in particular, on achieving low hardware and software communication overhead.

The Mosaic Router

The communication network of the Mosaic multicomputer is a two-dimensional, bidirectional mesh. The analysis of the network performance, and the arguments for employing this particular network, can be found in the literature.

Each communication channel is an asynchronous link. The router provides packet communication and deadlock-free, dimension-order, cut-through routing. A detailed description of the Mosaic router is provided elsewhere.

The largest configuration supported with the current version of the router is bounded; larger ensembles can be supported, but messages would have to be relayed in software when the distance traveled along any dimension exceeds that bound. From the point of view of the rest of the computing node, the network is a bidirectional communication link with all other nodes. In a non-congested network, this communication link provides the full channel bandwidth in and out of the node, with a per-hop communication-establishing latency on the order of nanoseconds.

The Dynamic RAM

The Mosaic dRAM is by far the largest part of the Mosaic chip, and the most precious resource of a Mosaic node. The dRAM has been described in detail in [ ]. From the standpoint of the rest of the node, this memory is a single-clock-cycle dynamic RAM operating in a pipelined mode (see the accompanying figure). Memory access is allocated on a per-clock-cycle basis to one of the four independent address sources competing for it: the processor, the send and receive parts of the network interface, and the memory-refresh mechanism.

Figure: Memory-Access Pipeline (request, arbitrate, drive address, memory access, and post stages overlapped across successive clock cycles).

The Pro cessor and the Network Interface

The Mosaic processor is a microprogrammed engine with one-, two-, and three-word instructions, and with an average of approximately three clock cycles per instruction. The network interface is a direct-memory-access device that transfers data and performs reliable synchronization between the asynchronous router and the synchronous memory (see the chapter on pipeline synchronization).

The most prominent feature of the Mosaic node architecture, the one that makes it particularly appropriate for a multicomputer node, is the interaction between the processor and the network interface. Although low-latency handling of messages was imperative for the Mosaic, the message-handling capabilities had to be sufficiently general to allow experiments with different message-handling strategies. These two often contradictory goals were achieved in part through the two-context processor architecture, and in part by providing a set of highly efficient, low-level message primitives.

Two-Context Architecture

The most unusual feature of the processor is its two-context architecture. Each context has its own status register and eight general registers; there are eight additional general registers that are shared between the two contexts.

Typically, one context is used for running programs and the other for message handling. In this regime of operation, the message-handling context can be thought of as an extension of the network interface.

A context switch is performed only between instructions, so it may be postponed for several cycles. However, once initiated, a context switch takes zero time, since no processor state needs to be saved.

Messages and Interrupts

The message-handling and interrupt-handling operations are centered around a small set of dedicated registers. All of these registers can be accessed both by the processor and by the network interface. For example:

To send a message, the processor must specify where the message is located in memory and which node to send it to. This send operation is performed by writing into the following three registers:

- Message Send Pointer (MSP), which points to the location of the first word of the message;
- Message Send Limit (MSL), which points to the location of the last word of the message; and
- Destination Register (DXDY), which contains the address of the destination node, encoded as the relative distance in the X and Y dimensions of the mesh network.

Writing into DXDY triggers the network interface. The network interface then starts transferring the data from the memory, increments MSP, and continues until MSP exceeds MSL.
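As an illustration, here is a minimal C sketch of the send sequence. The register names MSP, MSL, and DXDY are those described above, but the memory-mapped declarations and the send_message helper are assumptions made for the example; the actual Mosaic code reaches these registers through compiler-specific extensions.

    /* Hypothetical memory-mapped views of the dedicated send registers. */
    extern volatile unsigned short MSP;    /* Message Send Pointer        */
    extern volatile unsigned short MSL;    /* Message Send Limit          */
    extern volatile unsigned short DXDY;   /* Destination Register        */

    /* Launch a message occupying the words [first_addr, last_addr].
       The dxdy argument is the pre-encoded relative destination address;
       writing it last triggers the network interface, which then copies
       words from memory, incrementing MSP until it exceeds MSL. */
    static void send_message(unsigned short first_addr,
                             unsigned short last_addr,
                             unsigned short dxdy)
    {
        MSP  = first_addr;
        MSL  = last_addr;
        DXDY = dxdy;
    }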

The message-receive operation is initiated by the network interface. The processor must specify where the message is to be written by setting:

- Message Receive Pointer (MRP), which points to the first available memory location; and
- Message Receive Limit (MRL), which points to the last available memory location.

The network interface generates three distinct interrupts, corresponding to the following conditions: when a complete message has been sent, when a complete message has been received, and when the receive buffer has been exhausted. The interrupts are handled by accessing two additional registers:

- Interrupt Status Register (ISR), which contains three bits that correspond to the three sources of interrupts. These bits are set by the network interface and cleared by the processor.
- Interrupt Mask Register (IMR), which contains three bits that correspond to the three interrupt sources. Modifying these bits enables or disables the corresponding interrupts.

The program excerpt below shows code that the run-time system might use as an interrupt-dispatch routine. All of these registers can be accessed directly from C through a feature provided by the GNU C compiler.

Software Overhead of Communications

As discussed earlier, the software overhead associated with the message send and receive operations is the communication bottleneck of most programming systems for multicomputers. The software overhead for C programs running on the Mosaic under the MADRE run-time system is analyzed in this section. The overhead shown represents a typical case, measured in Mosaic assembly instructions. The complexity of Mosaic assembly instructions is comparable to that of a typical load/store RISC processor, but the number of clock cycles per instruction is approximately three. The communication support of the Mosaic processor is minimal (see above), and the incorporation of such support into a typical RISC processor core is arguably relatively simple. Experiments with compiling the same code for contemporary RISC processors exhibit comparable instruction counts.

We want to emphasize that all the numbers were obtained from compiled code: our translator (which emits C) together with the GNU C compiler targeted for the Mosaic. Both the user programs and the run-time-system code were written in C, and the only programmer-specified optimization was the use of inline functions for critical run-time-system code. Excerpts from the source code and the produced assembly code are presented in Appendix A. In our experience, the compiled code was typically only ten percent less efficient than the best hand-coded assembly we could produce, which is not nearly enough to justify such an effort.

    void (*interrupt_table[])() = {     /* a jump table, filled with the      */
        software_int,                   /* names of routines that correspond  */
        recv_int,                       /* to various interrupt conditions    */
        send_int,
        send_recv_int,
        buff_int,
        buff_recv_int,
        buff_send_int,
        buff_send_recv_int
    };

    while (1) {
        int pending = ISR;              /* get the pending interrupts         */
        (*interrupt_table[pending])();  /* dispatch to an interrupt handler   */
        ISR = pending;                  /* writing back acknowledges all the
                                           interrupts that have just been
                                           serviced                           */
        asm("PUNT");                    /* assembly instruction to return to
                                           the other context; when the next
                                           interrupt arrives, we shall start
                                           here                               */
    }

Program: An Interrupt-Dispatch Routine

Message Sending

The accompanying figure illustrates the activity of two processes placed on two different nodes in direct communication. The producer process sends an infinite stream of empty messages to the consumer process. The send overhead in the typical case consists of [ ] Mosaic instructions.

A major portion of the send overhead is the instructions that allocate space in, and update, the send queue. Since the send queue is guaranteed to be used in FIFO order, it is implemented as a simple circular buffer. One way of reducing the overhead associated with the send queue would be to have the processor write the message contents directly into the network instead of into the memory. However, this approach would introduce too strong a coupling between the processor and the communication network. For example, an interrupt during the message-send operation would block the network; similarly, a blocked network would prevent the processor from doing other useful work, or would result in more frequent context switching.

Figure: Components of the Software Overhead of a Communication (per-component instruction counts for the producer and consumer sides: register updates, send-queue allocation, header formation, DXDY computation, send-queue check, message-send request, send-queue append, interrupt dispatch and return, send-queue update, network latency, receive-queue append, receive-queue check, code lookup, active/passive check, function dispatch, user atomic action, and receive-queue update).

Another big contributor to the software overhead on the sending side is the DXDY conversion. The routing hardware requires the first word of any message to consist of two bytes of sign-and-magnitude-encoded distance in the x and y routing dimensions. Extending the Mosaic instruction set with an instruction that would compute this relative distance would reduce the sender overhead at a negligible price in chip area. In general, the approach adopted by the Mosaic design team has been minimalist, avoiding such special instructions. However, the discrepancy between the byte-size, sign-and-magnitude encoding required by the router and the word-size, two's-complement arithmetic capabilities of the Mosaic processor must be regarded as a design oversight.
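As an illustration of the conversion discussed above, the following C sketch packs the signed relative distances into two sign-and-magnitude bytes; the exact bit layout of the header byte pair is an assumption for the example, not the documented Mosaic format.

    /* Pack signed relative distances dx and dy into two sign-and-magnitude
       bytes forming the first word of a message.  Bit 7 of each byte is
       assumed to hold the sign and bits 6..0 the magnitude. */
    static unsigned short encode_dxdy(int dest_x, int dest_y,
                                      int my_x, int my_y)
    {
        int dx = dest_x - my_x;
        int dy = dest_y - my_y;
        unsigned char bx = (dx < 0) ? (unsigned char)(0x80 | (-dx & 0x7f))
                                    : (unsigned char)(dx & 0x7f);
        unsigned char by = (dy < 0) ? (unsigned char)(0x80 | (-dy & 0x7f))
                                    : (unsigned char)(dy & 0x7f);
        return (unsigned short)((by << 8) | bx);
    }

On a word-oriented, two's-complement processor, each of these tests, masks, and shifts costs instructions, which is precisely the overhead that a dedicated instruction would eliminate.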

The message header in this example consists of the size of the message, the node number and address of the destination process, and the identifier of the atomic action to be invoked.

The four instructions designated as miscellaneous register updates could not be associated with any particular message-handling operation; these instructions are the necessary glue associated with register allocation between the message-sending components.

The code could be optimized for the case of short messages and a lightly loaded network by checking whether the message has already been launched into the network, thereby avoiding the interrupt dispatch and return.

Message Receiving

The typical receive overhead totals [ ] Mosaic instructions, about a quarter of which is spent in receive-queue management. Unlike the send queue, consumption of the receive queue depends on the user programs. Messages can be consumed out of order, either when using the active/passive mechanism or when suspending while waiting for an RPC reply. In our experience, programming models and notations that do not allow such message discretion merely dump the burden of buffer management on the programmer. For programs with regular communication patterns, this requirement is not too demanding. However, for highly dynamic communication patterns, buffer management becomes a significant part of the programming effort. The problem is exacerbated on fine-grain multicomputers, where local node resources may not be sufficient to absorb receive-queue fluctuations. One may be tempted to deal with receive-queue overflow by blocking the incoming message traffic, which will eventually block the message source. Such an approach introduces negative feedback that equalizes the communication rates of the producer and the consumer. Unfortunately, it also violates the consumption assumption that routing networks typically require to guarantee freedom from deadlock. The approach that the MADRE run-time system takes in dealing with receive-queue overflow is to export messages to other nodes and retrieve them later; for a detailed description, see [ ].

Once the decision is made that messages can be consumed out of order, the run-time system must provide a general implementation of memory allocation. The performance of receive-queue management depends on the communication pattern of the application at hand. We have experimented with two approaches to receive-queue memory allocation:

- Optimistic: designed for minimum latency, and optimized for the case when the majority of messages are consumed in order. A circular buffer can be used, and the general memory allocator need only be invoked to allocate space into which to copy the messages not consumed in order. The software overheads of the figure above were obtained with this approach (a code sketch follows this list).

- Pessimistic: designed for maximum throughput, and optimized for the case when there is enough irregularity in message consumption that the copying costs of the optimistic approach outweigh the advantages of simple memory allocation. In this approach, implemented in MADRE, the header of a message is received into a small dedicated buffer, and upon the buffer-full interrupt a buffer of the correct size is allocated for the rest of the message. This approach adds approximately twenty instructions to the receive overhead in the typical case, but messages are never copied between memory buffers.
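The following C sketch illustrates the optimistic policy mentioned in the list above, assuming a hypothetical fixed-size ring and the standard allocator; the real MADRE data structures are more involved, and wraparound handling is omitted.

    #include <stdlib.h>
    #include <string.h>

    #define RQ_WORDS 1024                   /* hypothetical ring capacity     */

    static unsigned short rq[RQ_WORDS];     /* circular receive queue, filled */
    static unsigned rq_head;                /* by the network interface       */

    /* Advance past the message of 'len' words at the head of the ring.
       A message consumed in order is handed to the user directly from the
       ring; a message that must be retained for later (out-of-order
       consumption) is copied out through the general allocator so that the
       ring head can keep advancing. */
    static unsigned short *advance(unsigned len, int consumed_in_order)
    {
        unsigned short *msg = &rq[rq_head];
        if (!consumed_in_order) {
            unsigned short *copy = malloc(len * sizeof *copy);
            memcpy(copy, msg, len * sizeof *copy);
            msg = copy;
        }
        rq_head = (rq_head + len) % RQ_WORDS;
        return msg;
    }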

The code-lookup overhead depends on the algorithm used; the ten instructions shown represent the overhead typical of a successful lookup operation. If the code is not present, it is located and brought from another node.

Chapter

Pipeline Synchronization

Introduction

The design methods and tools for VLSI are oriented almost exclusively towards the design of clocked, synchronous systems. Except for a few notable examples [ ], contemporary processor and memory technology is designed and used exclusively within the synchronous framework. The one area in which the asynchronous design style has been successful in upsetting the dominance of synchronous circuitry is high-performance data communication and routing. Even though there are mechanisms for minimizing clock skew [ ], the performance penalty for maintaining clock coherency in physically large systems is prohibitive.

One of the premises of the Mosaic project was that the machine be, at least in principle, arbitrarily extensible, so the communication network of the Mosaic node is asynchronous. The rest of the Mosaic node, however, is a synchronous design. The Mosaic network interface performs data transfer between the synchronous memory bus and an asynchronous communication link, each with a bandwidth of [ ] MB/s. A large Mosaic ensemble with [ ] nodes has a worst case of [ ] synchronization events per second, or an immense number of synchronization events per year. Just to reduce the rate of synchronization failure to once per year, the best synchronizers we know how to build in the CMOS technology used require about half of the available clock period. We were clearly very close to a point where small, unexpected process variations could result in nasty surprises. To deal with this difficult synchronization problem, in which the synchronization and data rates are similar, we developed a technique that can sustain the full bandwidth and achieve an arbitrarily low, nonzero probability of failure $P_f$, with a price in both latency and chip area of $O(\log\frac{1}{P_f})$.

Problem Specification

Given a required rate of data transfer of $E$ events per second between an asynchronous and a synchronous system, with each event delivering $W$ bits of information, design an interface that will guarantee that the probability of synchronization failure is less than a given $P_f$.

The assumption is that the flow control is implemented as either a two-phase or a four-phase signaling protocol with bundled data [ ].

Existing Solutions

The standard approach to interfacing asynchronous and synchronous systems is to use a synchronizer at each synchronous-system input. A synchronizer is a circuit that attempts to solve one of the following two equivalent decision problems characteristic of digital systems: given an input signal and a time reference, decide whether the input signal makes a transition before or after the reference; or, given an input signal and a voltage reference, decide whether the input voltage is higher or lower than the reference. As shown by theoretical work [ ] and by a wealth of experimental evidence [ ], any system attempting to solve one of these two problems is of limited reliability. In addition to the two stable states corresponding to the two decision outcomes, the system has a metastable state. One cannot put a bound on how long the system may require to exit this metastable state. However, a number of simple synchronizer implementations have been demonstrated that can guarantee that the probability that the metastable state lasts longer than $t_m$ decreases exponentially with $t_m$:
\[
P_f = e^{-t_m/\tau},
\]
where $\tau$ is a characteristic of the implementation. Therefore, to achieve a sufficiently small probability of synchronization failure for a single asynchronous input, all that is required is to allow a sufficiently long time for the synchronizer to exit the metastable state.

Let us apply this single-synchronizer approach to the problem specified above (see the accompanying figure). We shall use a synchronizer that compares the arrival time of its asynchronous input with the time reference defined by the down-going edge of its clock input. When the asynchronous input changes state within $T_W$ around the down-going edge of the clock input, the synchronizer will enter the metastable state, with the exit probability determined by the equation above. Regardless of the signaling protocol used, the synchronous system with clock period $T$ must satisfy $E \le 1/T$ to be able to sustain the required throughput of $E$ events per second. From now on, we shall assume that $E$ is equal to the maximum attainable throughput, $1/T$. We shall also assume that the implementation of the signaling protocol on the synchronous and asynchronous sides imposes a total overhead of $T_{oh}$ per transferred data item. This assumption implies that we can allow at most $T - T_{oh}$ for the synchronizer to exit the metastable state, and that there is a lower bound on the probability of synchronization failure that this simple approach can achieve:
\[
P_f \ge e^{-(T - T_{oh})/\tau}.
\]

Figure: Interfacing synchronous and asynchronous systems.
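For a feel for this bound, consider a purely illustrative numeric example; the values are hypothetical, not Mosaic measurements. With a clock period $T = 30$ ns, a protocol overhead $T_{oh} = 25$ ns, and $\tau = 0.5$ ns,
\[
P_f \ge e^{-(T - T_{oh})/\tau} = e^{-10} \approx 4.5\times 10^{-5},
\]
so at millions of transfers per second one would expect hundreds of synchronization failures per second, which motivates the search for a better approach.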

Widening the Data Path

One possible approach to improving on this simple solution is to change the data representation: instead of transferring $W$ bits every $1/E$ seconds, we can transfer $kW$ bits every $k/E$ seconds, in order to allow $k$ times as much time for synchronization (see the accompanying figure). A simple variant of this solution requires that all communications consist of multiples of $k$ data units. A less restrictive solution is equivalent to the solution presented next.

Figure: Widening the data path.

Deriving Signals With Less Than E Events per Second

An alternative approach is to change the control representation: instead of using a request signal that changes state for every data item transferred, we can derive a request signal that denotes that there are at least $k$ data items to be transferred (see the accompanying figure).

Figure: Deriving signals with less than E events per second (an asynchronous FIFO whose request to the synchronous side is raised only when k cells are full).

Stretchable Clocks

The solution illustrated in this section does not, strictly speaking, conform to the problem specification, but it is presented here for completeness. This solution achieves a $P_f$ of exactly zero, but it does not maintain the required bandwidth. The synchronizer must be able to detect that it is in the metastable condition, and it stretches the clock cycle of the synchronous system until the metastability has been resolved [ ]. Instead of synchronizing the asynchronous input to the clock, the clock is synchronized to the asynchronous input. When there is more than one asynchronous input, the clock must be stretched until all the synchronizers have exited the metastable state.

Figure: Stretchable clock.

Pipeline Synchronization

A common denominator of all the solutions presented above is that they treat synchronization as a one-shot process: signal events (transitions) are either asynchronous or synchronous. In this section, we shall characterize a signal $S$ by the probability distribution of its signal-event arrival time, $p_S(t)$, with respect to time as measured at the synchronous system to which $S$ is an input. In some cases we shall be interested only in the arrival times of positive or negative edges, and shall use the symbols $p_{S\uparrow}(t)$ and $p_{S\downarrow}(t)$, respectively. The two graphs in the accompanying figure represent two typical cases of $p_S(t)$, one for a synchronous and one for an asynchronous signal. The parameter $T$ is equal to the clock period of the synchronous system, and $p_S(t)$ is a periodic function with period $T$:
\[
\int_t^{t+T} p_S(u)\,du = 1 .
\]

Figure: Probability distribution of signal-event arrival time (a synchronous and an asynchronous case).

What makes a signal synchronous with respect to some clock is that events on that signal satisfy some setup and/or hold time with respect to the clock. Relating to the figure above, we use the following definition.

Definition: Signal $S$ is synchronous with respect to some clock if there exists a nonempty time segment $(t_s, t_h)$ in which $p_S(t) = 0$.

The usual assumption for asynchronous signals is that each arrival time is equally probable, or that we have no knowledge of the probability distribution. We define the asynchronous signals as all non-synchronous ones.

Definition: Signal $S$ is asynchronous with respect to some clock if there is no nonempty time segment in which $p_S(t) = 0$.

The probability distribution for a signal contains more information than is often necessary, so we shall also introduce a simpler metric to characterize how asynchronous a signal is.

Definition: For a signal $S$ and a given time window $T_W$ ($T_W \le T$), the asynchronicity is defined as
\[
A_S(T_W) = \min_t \int_t^{t+T_W} p_S(u)\,du .
\]

The intuitive meaning of asynchronicity is that, given a signal $S$ and a synchronous sampling device $D$ with setup time $T_{setup}$ and hold time $T_{hold}$, $A_S(T_{setup} + T_{hold})$ is the lowest probability of metastable behavior that can be achieved when sampling $S$ with the device $D$. For example:

- For synchronous signals, according to the definition above we can find a $T_{max}$ such that $T_W \le T_{max} \Rightarrow A_S(T_W) = 0$. Therefore, if we can build a sampling device with $T_{setup} + T_{hold} \le T_{max}$, it can sample $S$ without ever exhibiting metastable behavior.

- For asynchronous signals with a uniform probability distribution, $A_S(T_W) = T_W/T$. From the definition it is easy to show that this is the worst case for asynchronicity (a brief justification follows this list).

- An asynchronicity of 1 corresponds to a hypothetical, malicious asynchronous signal: no matter how we position the sampling window, and no matter how small we make it, the probability distribution will be a unit-size delta function positioned within our chosen time window. This case has to be assumed when interfacing to a signal of unknown probability distribution.
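A brief justification of the worst-case claim, added here as a sketch: averaging the window integral over all window positions within one period gives exactly $T_W/T$, so the minimum over positions can never exceed it,
\[
\frac{1}{T}\int_0^{T}\!\left(\int_t^{t+T_W} p_S(u)\,du\right) dt = \frac{T_W}{T}
\quad\Longrightarrow\quad
A_S(T_W) \le \frac{T_W}{T},
\]
with equality when $p_S$ is uniform.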

In the remainder of this chapter, we shall show how to build circuits that transform the arrival-time probability distribution of signals in a way that reduces their asynchronicity, and how to use these circuits to build pipeline synchronizers.

The Mutual-Exclusion Element

The mutual-exclusion (ME) element is a variant of a synchronizer. The ME element compares the signal-arrival times at its two asynchronous request inputs and generates mutually exclusive acknowledge signals. If the ME element enters the metastable state, no acknowledge is granted until the ME element exits this state. The accompanying figure shows the symbol we use for the mutual-exclusion element, and a CMOS implementation that was first introduced by Seitz [ ].

Let $t^{j}_{S\uparrow}$ and $t^{j}_{S\downarrow}$ denote the time of the $j$-th up-going edge and the time of the $j$-th down-going edge, respectively, of a signal $S$, and let $\delta_{E_1 \to E_2}$ denote a causal delay from one event (signal edge) $E_1$ to another, $E_2$. The requirements that any implementation of the ME element must satisfy are then that each acknowledge edge on $A_i$ follows the corresponding request edge on $R_i$ by at least the associated delay, and that the acknowledges $A_1$ and $A_2$ are mutually exclusive, i.e., never asserted at the same time.

Figure: Mutual-exclusion element (symbol and CMOS implementation).

The delays $\delta_{R_1\uparrow \to A_1\uparrow}$ and $\delta_{R_2\uparrow \to A_2\uparrow}$ depend on the implementation and are typically approximately equal. Without loss of generality, we shall assume that
\[
\delta_{R_1\uparrow \to A_1\uparrow} = \delta_{R_2\uparrow \to A_2\uparrow} = \delta_{RA} .
\]

The environment of the ME element is obliged to behave according to the following specification: a request $R_i$ may be withdrawn only after the corresponding acknowledge $A_i$ has been granted, and a new request may be issued on an input only after the previous handshake on that input has completed.

Let us now present an example illustrating how an ME element can be used to reduce the asynchronicity of signals. We shall examine what happens when one of the request inputs of the ME element is connected to a periodic signal, and the other input to an asynchronous signal with a uniform probability distribution of up-going edges (see the accompanying figure).

If an asynchronous event arrives while there is no contest in the ME element, the acknowledge will be granted after $\delta_{RA}$. If the arrival time falls within the period during which the periodic request holds the ME element, the acknowledge will be postponed until that request is withdrawn again. The behavior in this idealized case, which does not take metastability into account, is what we would like synchronizers to achieve. The delta function in the figure corresponds to all the requests that occur during the contested period and are acknowledged together when the periodic request is withdrawn; the area of the delta function is such that the normalization condition above holds.

Figure: ME element as a synchronizer (acknowledge-arrival distributions for an ideal and for a real synchronizer).

As discussed above, any implementation of the ME element will exhibit metastable behavior if an input event occurs within some narrow time window $T_W$

around the time when the clock-connected request input makes its transition from low to high. The probability that the ME element will remain in the metastable state longer than $t_m$ decays exponentially with $t_m$, as in the synchronizer equation above. Upon exiting the metastable state, the ME element generates an acknowledge on one or the other of its outputs. The ME element therefore transforms an input signal of asynchronicity $A_R(T_W) = T_W/T$ into an output signal of much lower asynchronicity; for a typical implementation, in which $T_W \ll \tau \ll T_{high}$,
\[
A_A(T_W) \approx \frac{T_W}{T}\, e^{-T_{high}/\tau},
\]
so the ME element reduces the asynchronicity of the input signal by a factor that is exponential in the time allowed for synchronization.

Since the other request input is connected to a clock, the metastable state cannot last longer than $T_{high}$, but the acknowledge output is still asynchronous according to the definition above. The ME element attempts to do the synchronization in the allotted time $T_{high}$, and if it does not succeed, it is forced out of the metastable state. We trade one uncertainty, whether the input made a transition, for another, whether the ME element is in a metastable state. Connecting a request input to a clock violates the requirements that the ME element imposes on its environment. If the ME element is leaving the metastable state and generating the acknowledge of the asynchronous request just as the clock makes its down-going transition, this violation can result in a glitch on the clock-side acknowledge output. Therefore, the clock-side acknowledge must not be used as an input to any circuit, and need not even be generated. As long as the asynchronous request does not violate the ME-element requirements, there will be no spurious signals on its acknowledge output.

Synchronizers

We shall postulate the existence of three types of synchronizers, as shown in the accompanying figure, depending on whether they synchronize only the up-going edges, only the down-going edges, or both edges of the input signal. CMOS implementations of these synchronizers are presented later in this chapter.

Figure: Synchronizers.

These synchronizers will be characterized by the time window $T_W$ centered around the down-going edge of the clock. Unless otherwise specified, the phrase "coincident with the down-going edge of some clock" will mean "within $T_W$ around the down-going edge of that clock."

If the input-event arrival time is of uniform distribution, where the input event is an up-going edge, a down-going edge, or a signal transition, the output-event distribution is as shown in the accompanying figure, with $\delta_S = \delta_{RA}$. The input/output behavior of each synchronizer, clocked with a periodic signal of period $T = T_{low} + T_{high}$, can be modeled as a nondeterministic, variable delay $\delta_V$, where
\[
\delta_S \le \delta_V \le T_{low} + \delta_S .
\]

Two-Phase-Protocol FIFO

The accompanying figure shows the symbol we use to represent a two-phase-protocol FIFO element. A number of different implementations of this FIFO cell can be found in [ ]. For the purposes of this thesis, we shall not concern ourselves with the details of any particular implementation. To be a valid implementation, the circuit has to behave according to the following behavioral specification:
\[
*[\, R_i ;\; A_i , R_o ;\; A_o \,]
\]
The operational description of the above specification is to repeat forever the following sequence of actions: wait for an event (a signal transition) on $R_i$; generate concurrent events on $A_i$ and $R_o$; and wait for an event on $A_o$.

Figure: Synchronizer input/output specification.

Figure: Two-phase-protocol FIFO element.

Each signal's events are numbered from zero. A formal definition corresponding to the specification above is as follows:
\[
t^{j}_{A_i} = \begin{cases} t^{j}_{R_i} + \delta_{R_i \to A_i}, & j = 0\\[2pt] \max\!\left(t^{j}_{R_i} + \delta_{R_i \to A_i},\; t^{j-1}_{A_o} + \delta_{A_o \to A_i}\right), & j > 0 \end{cases}
\]
\[
t^{j}_{R_o} = \begin{cases} t^{j}_{R_i} + \delta_{R_i \to R_o}, & j = 0\\[2pt] \max\!\left(t^{j}_{R_i} + \delta_{R_i \to R_o},\; t^{j-1}_{A_o} + \delta_{A_o \to R_o}\right), & j > 0 \end{cases}
\]
The environment is obliged to behave according to the following specification: the next input request may be produced only after the previous one has been acknowledged, $t^{j}_{R_i} > t^{j-1}_{A_i}$ for $j > 0$, and an output request may be acknowledged only after it has been issued, $t^{j}_{A_o} > t^{j}_{R_o}$.

The specification does not mention the data-handling requirements, because they do not affect the synchronization issues. Since there is a well-defined time relationship between the data and the control signals, the only source of uncertainty is the time when the input signals $R_i$ and $A_o$ make transitions.

The accompanying figure shows the dependency graph for a two-phase-protocol FIFO element. Nodes of the graph represent signal events, and edges represent the dependencies between the events. The values associated with solid edges are delays characteristic of a particular FIFO implementation. The dashed edges correspond to event dependencies that are maintained by the environment; the only property that can be assumed about the delays associated with these dependencies is that they are positive.

Figure: The dependency graph of a two-phase-protocol FIFO element.

Let $p_{R_i}(t)$ and $p_{A_o}(t)$ represent the probability distributions of the signal-event arrival times for the two inputs of a two-phase-protocol FIFO element, and for the moment disregard the fact that these two distributions are not independent. The upper bound for the output-event probability distributions is
\[
p_{R_o}(t) \le p_{R_i}(t - \delta_{R_i \to R_o}) + p_{A_o}(t - \delta_{A_o \to R_o}),
\]
\[
p_{A_i}(t) \le p_{R_i}(t - \delta_{R_i \to A_i}) + p_{A_o}(t - \delta_{A_o \to A_i}).
\]

Pipeline Synchronizer

The block diagram of a pipeline synchronizer that can be used to interface an asynchronous input data stream to a synchronous system is shown in the accompanying figure. $\phi_1$ and $\phi_2$ are two-phase, nonoverlapping clock signals of the kind often used for the internal clocking of CMOS chips. For simplicity, we shall assume that the two phases are symmetric, each occupying half of the clock period $T$; one can show that this simplification does not qualitatively affect the results that we shall present in this section. The FIFO elements are the two-phase-protocol asynchronous FIFO elements described above. The synchronizers are the symmetric elements described elsewhere in this chapter. The wires connecting the two have zero delay; the wire delay is absorbed into the FIFO elements.

Figure: Asynchronous-input, synchronous-output pipeline synchronizer.

Figure: Two-phase nonoverlapping clocks.

The principal claim is that the data-stream synchronization can be done in stages, along with the data flow. In the following section we shall prove that the probability of synchronization failure at the synchronous end of the structure in the figure decreases exponentially with the number of stages $k$:
\[
P_f^{(k)} = P_f^{(0)}\, e^{-k\left(\frac{T}{2} - T_{oh}\right)/\tau},
\]
where $T_{oh}$ is the implementation-dependent overhead. Therefore, to achieve the desired $P_f$ at the synchronous end, one needs at least
\[
k = \frac{\tau}{\frac{T}{2} - T_{oh}}\,\log\frac{P_f^{(0)}}{P_f}
\]
stages. The area complexity of this solution is $O(\log\frac{1}{P_f})$. Both theoretical work [ ] and experimental evidence [ ] show that it is not possible to synchronize asynchronous signals with a latency less than $O(\log\frac{1}{P_f})$, so this solution is latency-optimal.
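As a rough illustration of this stage count, here is a small C program that evaluates the formula; the parameter values in the example are hypothetical, not measured Mosaic values.

    #include <math.h>
    #include <stdio.h>

    /* Number of pipeline-synchronizer stages needed so that an initial
       failure probability pf0 is attenuated to at most pf_target, when each
       stage contributes (T/2 - Toh) of settling time and the synchronizer
       time constant is tau. */
    static int stages_needed(double pf0, double pf_target,
                             double T, double Toh, double tau)
    {
        double per_stage = (T / 2.0 - Toh) / tau;  /* exponent gained per stage */
        return (int)ceil(log(pf0 / pf_target) / per_stage);
    }

    int main(void)
    {
        /* Hypothetical numbers: 30 ns clock, 5 ns overhead, tau = 0.5 ns. */
        printf("%d stages\n",
               stages_needed(1.0, 1e-20, 30e-9, 5e-9, 0.5e-9));
        return 0;
    }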

It is often the case that the output sections of asynchronous data sources themselves consist of a series of FIFO elements. In these cases it is possible to insert the synchronizer elements into the output section and to perform part of, or even the entire, synchronization there. The latency and area will still be $O(\log\frac{1}{P_f})$, but they may be achieved with a smaller $T_{oh}$.

Correctness Proof

We shall first show that the structure in the figure behaves as an asynchronous FIFO. Then we shall find implementation requirements for the FIFO elements and synchronizers that are sufficient to guarantee that the exponential bound above holds and that the maximum throughput can be sustained. Finally, we shall find the upper bound for the latency.

Proper Functional Behavior

As discussed above, the input/output behavior of a synchronizer whose clock input is connected to a periodic signal is equivalent to that of a variable delay. The introduction of a bounded, albeit variable, delay on any signal of a speed-independent implementation of the FIFO element does not affect the correctness of the circuit operation.

Probability of Metastability Failure

The accompanying figure shows a two-stage segment of the pipeline synchronizer described above.

Figure: A segment of a pipeline-synchronizer chain.

The $j$-th event on $R_o^i$ can occur only at time
\[
t^{j}_{R_o^i} = t^{j}_{R_i^i} + \delta_{R_i \to R_o},
\]
or at time
\[
t^{j}_{R_o^i} = t^{j-1}_{A_o^i} + \delta_{A_o \to R_o}.
\]
This event can cause metastable behavior of the $(i{+}1)$-st synchronizer only if it occurs coincident with the down-going transition of $\phi_{(i+1) \bmod 2}$. Therefore,
\[
P_f^{\,i+1} \le P_f^{\,i+1}(R_i) + P_f^{\,i+1}(A_o),
\]
where the first term of the sum corresponds to the probability that an event on $R_o^i$ caused by $R_i^i$ occurs coincident with the end of $\phi_{(i+1) \bmod 2}$, and the second term to the probability that an event on $R_o^i$ caused by $A_o^i$ does.

We shall first examine the case corresponding to $P_f^{\,i+1}(R_i)$. If the implementation of the FIFO element satisfies the requirement
\[
\delta_S + \delta_{R_i \to R_o} \le \frac{T}{2},
\]
the $i$-th synchronizer will guarantee that this case can occur only if it was in a metastable state for at least
\[
t_m = \frac{T}{2} - \delta_S - \delta_{R_i \to R_o}.
\]
Therefore,
\[
P_f^{\,i+1}(R_i) \le P_f^{\,i}\, e^{-\left(\frac{T}{2} - \delta_S - \delta_{R_i \to R_o}\right)/\tau}.
\]
If we could show that $P_f^{\,i+1}(A_o) = 0$, then
\[
P_f^{(k)} \le P_f^{(0)}\, e^{-k\left(\frac{T}{2} - T_{oh}\right)/\tau}, \qquad T_{oh} = \delta_S + \delta_{R_i \to R_o}.
\]
However, we shall show that the $P_f(A_o) = 0$ requirement is overly restrictive. In the remaining part of this section, we shall devise a set of criteria that are sufficient to guarantee that, even if a synchronizer enters the metastable regime because of an event on the $A_o$ input of a FIFO element, the properties of the FIFO chain ensure that this particular metastability results in benign behavior at the synchronous end.

To prove this, we shall first define first-event metastability (FEM) and second-event metastability (SEM), corresponding to two particular propagation modes of the FIFO in the figure. Next, we shall show that an $A_o$ event can only cause SEM, and that SEM itself can only cause SEM. Finally, we shall show that SEM is benign at the synchronous end.

Let us observe one synchronizer element S from the figure, with its clock input connected to a clock phase $\phi$. S will exhibit metastable behavior only if its input event is coincident with the down-going edge of $\phi$.

Definition: When the input of a synchronizer element S clocked with $\phi$ changes state coincident with an arbitrary $j$-th down-going edge of $\phi$, and there were no prior input events between the $(j{-}1)$-st and the $j$-th down-going edges of $\phi$, we shall say that S has entered first-event metastability.

Definition: When the input of a synchronizer element S clocked with $\phi$ changes state coincident with an arbitrary $j$-th down-going edge of $\phi$, and there was at least one prior input event between the $(j{-}1)$-st and the $j$-th down-going edges of $\phi$, we shall say that S has entered second-event metastability.

Let us now pick up the thread of the proof again by going back to the equation above and analyzing what happens when the $j$-th event on $R_o^i$ is caused by the $(j{-}1)$-st event on $A_o^i$, i.e., when
\[
t^{j}_{R_o^i} = t^{j-1}_{A_o^i} + \delta_{A_o \to R_o} .
\]
For a typical FIFO implementation, $\delta_{R_i \to R_o} \approx \delta_{R_i \to A_i}$ and $\delta_{A_o \to R_o} \approx \delta_{A_o \to A_i}$, so we shall assume that the $(j{-}1)$-st events on $A_i^i$ and $R_o^i$ coincide. This simplification does not qualitatively change the results we shall obtain; it only improves the readability of our arguments.

If the metastability at the $i$-th synchronizer is a result of the $(j{-}1)$-st transition on $A_o^i$, it can in turn cause metastability in the $(i{+}1)$-st synchronizer. However, we can ensure that the latter is always SEM if the $j$-th event on $R_o^i$ follows the $(j{-}1)$-st event on $R_o^i$ within half a clock period. From this condition and the equations above, we can derive the following implementation requirement:
\[
\delta_{A_o \to R_o} \le \frac{T}{2} .
\]

We shall focus next on showing that SEM can only cause SEM. Suppose that the $j$-th event on $R_o^i$ is SEM. Then, according to the definition, a prior event occurred on the input of the $(i{+}1)$-st synchronizer during the preceding clock period, which bounds the spacing between the $(j{-}1)$-st and the $j$-th events on $R_o^i$ by one clock period $T$. Propagating this bound through the synchronizer and FIFO delays, together with the requirement $\delta_{A_o \to R_o} \le T/2$, shows that the corresponding events on $R_o^{i+1}$ are spaced by no more than $T$ as well. Therefore, SEM at the $i$-th synchronizer can only cause SEM at the $(i{+}1)$-st synchronizer.

(The distinction between first-event and second-event metastability has no physical basis; it is defined only for the purposes of the proof.)

Finally, with reference to the accompanying figure, it should be obvious to the reader that one can design a synchronous circuit whose correct operation is not affected by SEM. The required throughput is equal to one event per clock cycle, so the synchronous circuit must, after the first event within a clock cycle is observed (an event that is guaranteed by the definition not to cause synchronization failure), acknowledge that first event and disregard any input change until the following clock cycle.

Figure: Second-event metastability.

Sustaining the Throughput

The maximum data throughput of the pipeline synchronizer in the figure is throttled by the synchronous side and is equal to one data item per clock cycle. In the remainder of this section, we shall find implementation requirements that are sufficient to guarantee that the pipeline synchronizer can sustain that throughput.

Let us first consider an infinitely long pipeline-synchronizer chain and assume that it is in the steady state. The accompanying figure illustrates this condition, with the arcs between events describing causal dependencies. For all even-numbered FIFO cells, the events on $R_i$ occur simultaneously, a fixed delay after $\phi_1$ becomes high; for all odd-numbered FIFO cells, the events on $R_i$ occur simultaneously, a fixed delay after $\phi_2$ becomes high. In this regime of operation, no synchronizer enters the metastable state, and the steady-state event times within each half-period are determined by the synchronizer delay $\delta_S$ and by the FIFO delays $\delta_{R_i \to A_i}$, $\delta_{A_o \to A_i}$, $\delta_{R_i \to R_o}$, and $\delta_{A_o \to R_o}$.

Figure: Steady-state operation of the pipeline synchronizer.

For the steady state in the figure to be possible, these event times must be less than the half-period. The following conditions on the implementation of the synchronizer and of the FIFO element are therefore sufficient to guarantee that the infinite chain of pipeline synchronizers can sustain the maximum throughput while operating in this particular regime:
\[
\delta_{A_o \to A_i} \le \frac{T}{2}, \qquad
\delta_S + \delta_{R_i \to R_o} \le \frac{T}{2}, \qquad
\delta_S + \delta_{R_i \to A_i} + \delta_{A_o \to R_o} \le \frac{T}{2}.
\]

To examine the case of a finite-length pipeline synchronizer, we shall use an analogy with transmission lines. A piece of transmission line that is terminated with its characteristic impedance behaves as if it were of infinite length. The pipeline synchronizer, similarly, has to be terminated with circuitry that satisfies the above conditions if the full throughput is to be maintained.

The requirement at the synchronous end is
\[
\delta_{RA} + \delta_{A_o \to R_o} \le \frac{T}{2},
\]
where $\delta_{RA}$ is the delay from the time when one data transfer is requested until it is acknowledged. In the absence of metastability at the synchronous end, this condition is trivially achieved.

At the asynchronous end, to terminate the pipeline synchronizer properly and maintain the bandwidth, the outside circuitry has to satisfy the condition
\[
\delta_S + \delta_{R_i \to A_i} + \delta_{AR} \le \frac{T}{2},
\]
where $\delta_{AR}$ is the delay from the time when one data transfer is acknowledged until the time when the next data transfer is requested. Of course, the definition of asynchronous signals prevents us from imposing such requirements. What we shall prove in the following section is that, with a properly terminated sink, if a particular asynchronous data transfer at the source is initiated at any point in time, and if from then on this condition is satisfied, the pipeline synchronizer will, with bounded latency, enter the steady state illustrated in the figure.

Latency

Let us observe the behavior of a pipeline synchronizer of length $k$ with a properly terminated sink, as specified above. We shall assume that the synchronizer starts in a state equivalent to the state at $t = 0$, that is, with all FIFO cells empty. Starting in this initial state, we shall number each signal's transitions from zero.

If the first asynchronous request occurs while $\phi_1$ is low, between a down-going edge and the next up-going edge of $\phi_1$, the first event at $R_i$ will occur a synchronizer delay after $\phi_1$ goes high again. Given that the throughput-sustaining conditions above are satisfied, the first input transitions of successive FIFO stages then occur half a clock period apart, each a synchronizer delay after its own clock phase goes high. Using double induction, it is simple to show that, if the throughput-sustaining conditions are satisfied, every subsequent transition occurs at the latest at the time corresponding to the steady state illustrated in the figure. The latency of the pipeline synchronizer is in this case $T_l = kT/2$ plus a fixed overhead that does not depend on $k$.

Let us now assume that the first asynchronous request occurs while $\phi_1$ is high, at some time between an up-going edge and the following down-going edge of $\phi_1$. For the $i$-th pipeline-synchronizer element, let $\Delta_i^{j}$ denote how far from the steady state the $j$-th transition of its $R_i$ input is, and let $\Delta_k^{j}$ denote how far from the steady state the $j$-th transition of the synchronous-system input $R_o^{k-1}$ is. With the throughput-sustaining conditions satisfied, the first transitions on the inputs of the FIFO elements and on the input of the synchronous system are displaced from the steady state by amounts that can only shrink from stage to stage: since $\delta_S + \delta_{R_i \to R_o} \le T/2$, the first output transition of each subsequent stage is closer to the steady-state condition. Using double induction, one can show that the displacements of all subsequent transitions also shrink, stage by stage and transition by transition, provided that the following conditions are satisfied:
\[
\delta_{AR} \le \frac{T}{2}, \qquad
\delta_S + \delta_{R_i \to R_o} \le \frac{T}{2}, \qquad
\delta_{A_o \to R_o} \le \frac{T}{2}, \qquad
\delta_{RA} + \delta_{A_o \to R_o} \le \frac{T}{2}.
\]
Under these conditions, successive transitions on $R_o^{k-1}$, the input of the synchronous circuit, always occur less than half a period after the successive clock edges, and the latency is again $T_l = kT/2$ plus a fixed overhead.

In this subsection we shall find the bound on latency in the case of metastability caused by the input asynchronous event. Since the synchronous side throttles the throughput, and since we cannot assume correct operation of the synchronous system under synchronization failure, this part of the proof considers only the case in which the metastability does not reach the synchronous end.

If the first asynchronous request occurs coincident with a down-going edge of $\phi_1$, the first synchronizer will enter the metastable state; depending on when it exits this state, we shall distinguish between the three cases described next.

(a) If the metastable state lasts for $t_m$ with $t_m \le T/2 - \delta_S - \delta_{R_i \to R_o}$, the first event on $R_o^0$ still falls within the current half-period, and all subsequent transitions follow the steady-state pattern analyzed above.

(b) If the metastable state lasts longer than that bound, the first event is in effect delayed by one half-period, and the earlier analysis holds with the transition numbering shifted by one clock edge; the latency is again $kT/2$ plus a fixed overhead.

(c) Finally, if the metastable state lasts for exactly the boundary duration, it can cause metastability in the subsequent synchronizer. Let us assume that the first $m$ synchronizers enter the metastable state as a result of their first input transition, and that the remaining $k - m$ synchronizers do not. Then the first output transitions of the first $m$ synchronizers are each delayed by one half-period relative to the steady state, and after the first transition they follow the steady-state pattern; the remaining $k - m$ synchronizers behave as analyzed in (a) or (b). The latency is again $kT/2$ plus a fixed overhead.

Conclusion

In this section we have found a set of requirements that are sufficient to guarantee that a particular implementation of a synchronizer and of a FIFO element can be used for pipeline synchronization. The union of all of these requirements is:
\[
\delta_S + \delta_{R_i \to R_o} \le \frac{T}{2}, \qquad
\delta_{A_o \to R_o} \le \frac{T}{2}, \qquad
\delta_{AR} \le \frac{T}{2}, \qquad
\delta_{A_o \to A_i} \le \frac{T}{2},
\]
\[
\delta_S + \delta_{R_i \to A_i} + \delta_{A_o \to R_o} \le \frac{T}{2}, \qquad
\delta_S + \delta_{R_i \to A_i} + \delta_{AR} \le \frac{T}{2}, \qquad
\delta_{RA} + \delta_{A_o \to R_o} \le \frac{T}{2}.
\]

Variations On the Theme

Above, we showed how pipeline synchronization can be used to interface input data streams that follow the two-phase asynchronous protocol to synchronous systems, along with a proof of its correct operation. In this section we shall show that the same technique is applicable to interfacing to output streams that follow the two-phase asynchronous protocol, as well as to the synchronization of input and output streams that operate under the four-phase asynchronous protocol. The same proof techniques can be applied, so only the implementations will be presented.

Reversing the Direction of Data Flow

A block diagram of a pipeline synchronizer that can be used to interface an asynchronous output data stream to a synchronous system is shown in the accompanying figure.

Figure: Synchronous-input, asynchronous-output pipeline synchronizer.

The implementation of the synchronizer and of the FIFO elements must satisfy requirements that are the duals of those listed above, with the roles of the request and acknowledge signals interchanged, and with the environment turnaround delays $\delta_{RA}$ and $\delta_{AR}$ likewise interchanged: each sum of a synchronizer delay and the relevant FIFO delays along an acknowledge path, and each environment turnaround delay, must not exceed $T/2$.

Using Four-Phase-Protocol FIFO Elements

When using the four-phase signaling protocol, the spectrum of valid implementations of FIFO elements is much larger; for an exhaustive study, see [ ]. One possible implementation performs, in sequence, the complete four-phase handshake on the input port followed by the complete four-phase handshake on the output port, with the dependency graph as shown in the accompanying figure.

Figure: The dependency graph of one form of four-phase-protocol FIFO element.

When using the four-phase protocol, there are two transitions for every data item transferred, and we shall pick only one of the edges, either the up-going or the down-going one, to represent an event. Let us assume that we have made the choice such that one designated edge of each of $R_i$, $A_i$, $R_o$, and $A_o$ represents an event, i.e., its arrival time is uncertain. (In the case of this particular, arbitrary choice, when FIFO elements with the behavior specified above form a FIFO chain, there is no uncertainty about the arrival time of the complementary edges: each complementary edge follows a designated edge by a fixed implementation delay.) Then the pipeline-synchronization chain that we have used for

synchronization of the two-phase protocol in the figures above can be used with the four-phase protocol when modified in the following way: the FIFO elements are replaced with the four-phase-protocol version, and the symmetric synchronizers are replaced with an asymmetric version that synchronizes only the edges that have been chosen to represent signal events.

The requirements derived for the two-phase case still apply, with each two-phase delay replaced by the sum of the four-phase delays along the corresponding path of the dependency graph.

A CMOS Implementation

All four versions of pipeline synchronizers described in this chapter, with the two- and four-phase protocols and with synchronous and asynchronous input streams, have been implemented and used in the Mosaic over the past three years. In a set of experiments reported by Cohen et al. [ ], a local-area network was implemented using Mosaic components and was used to transmit and receive more than [ ] bits without a single error.

Earlier in this chapter we showed the ME element that we use to build asymmetric synchronizers, which synchronize only the up-going or only the down-going transitions. To build a symmetric synchronizer, we utilize two ME elements in the circuit illustrated in the accompanying figure: one ME element synchronizes the up-going transitions of the input and the other the down-going transitions, and the surrounding circuitry merges their acknowledges into the synchronized output.

Figure: A symmetric synchronizer (two ME elements and the surrounding merge circuitry).

The symmetric-synchronizer and the four-phase-protocol FIFO elements were designed using a prerelease version of the asynchronous-design tools described in [ ].

Conclusions

Even though the motivation for pipeline synchronization came from a specific problem, interfacing the Mosaic's asynchronous router and synchronous memory, the applications of this technique are much broader. Pipeline synchronization is a simple, low-cost, high-bandwidth, high-reliability solution to interfaces between synchronous and asynchronous systems, or between synchronous systems operating from different clocks, where the data rate is too high for the single-synchronizer approach.

The power and simplicity of this technique allow the designer to break away from the traditional divisions in the design of asynchronous and synchronous circuits: clocked systems deal with synchronous events, self-timed systems deal with asynchronous events, and interfacing between the two is done with synchronizers of limited reliability. With pipeline synchronization, the designer can achieve arbitrarily low failure rates in exchange for latency, rather than for a reduction in bandwidth.

In developing pipeline synchronization, instead of reasoning about signal events, we reasoned about the probability distributions of signal-event arrival times, and we introduced a metric to characterize how asynchronous a signal is. Along the way, we used some simple yet unconventional techniques: we can partially synchronize a signal event, use it to perform some work using self-timed circuits, and then repeat the process until the desired reliability is achieved.

The probability model that we used was crucial to our understanding of synchronization techniques and to our ability to devise a novel approach to data-stream synchronization. However, we were not successful in using this model to prove all the circuit properties that interested us; hence, we had to resort to rather elaborate proof techniques. This is a topic that we believe deserves additional investigation.

Chapter

Conclusions

Comparison With Related Work

A variety of concurrent machines have been built in an effort to apply concurrent approaches to general-purpose computing at the application level. In this section we shall compare the Mosaic architecture and the C programming system with other contemporary architectures and programming systems. At the architecture level, we shall focus on communication bandwidth and latency, and on the complexity of the implementation. At the programming level, we shall discuss the relative importance of efficiency, expressivity, and safety, compared with C.

Medium-Grain Multicomputers

The antecedent of all modern-day multicomputers is the Cosmic Cube [ ]. A number of commercial developments followed this effort, with similar multicomputers manufactured by Intel, nCUBE, and Ametek. With the exception of nCUBE's custom designs of an integrated processor and network interface, these machines employed off-the-shelf processor, memory, and compiler technology.

The raw hardware performance of these machines is quite impressive in terms of peak instruction rates and, in some cases, in terms of available communication bandwidth. The nodes are of a complexity comparable to that of workstations, with hardware floating-point capabilities, top-of-the-line single-chip processors, and megabytes of node memory.

Standard programming systems for these machines are based on sequential programming notations for specifying individual process behavior, with library routines for interprocess communication and synchronization. The expressive power of these programming systems is comparable to that of C, but because the programs that specify individual process behavior are, unlike in C, lexically separate, compilers cannot perform many of the safety-improving checks typical of object-oriented programming that C performs, including, for example, type matching of messages between the sender and the receiver. Since the message scope in these systems is not well defined, it is difficult, if not impossible, for the compiler to assist the programmer in the message passing of run-time-specified data structures or in message exchange in heterogeneous environments. An important difference in programming emphasis is due to C's blurring of the distinction between processes as computing agents and processes as data, deliberately implying that process creation is inexpensive.

Standard programming systems for commercial multicomputers regularly fall short of utilizing the hardware capabilities to their fullest, mainly because of the large software overhead of communications, typically on the order of a thousand processor instructions. Recent work on Active Messages [ ] clearly demonstrates how the incompatibility of the hardware mechanisms with the programming model results in large software overheads. The Mosaic design team's approach to reducing the software overhead anticipated that advocated by the research on Active Messages: a relatively small amount of hardware support is devoted to message handling, allowing software optimizations of special cases. The message-handling layer of the MADRE run-time system is triggered by message reception, and message handlers run to completion. Two contexts, one for message handling and the other for regular computation, are used to eliminate the context-switch overhead. Where Active Messages diverges from our approach is in expecting the user to handle the message-buffer management, something we consider too large a burden except for applications with highly regular communication patterns. For the Mosaic, a machine with scarce node-memory resources and with so much potential concurrency as to necessitate run-time-system-managed process placement, message-buffer management is particularly demanding.

Fine-Grain Multicomputers

In this section, we shall compare the Mosaic architecture to two representatives of fine-grain multicomputers: the Transputer, a well-established commercial family of chips manufactured by INMOS, and the J-Machine, developed by the research team led by William J. Dally at MIT. Just as with the Mosaic, both the Transputer and the J-Machine have been developed in conjunction with, and influenced by, their respective programming systems.

Transputer and Occam

Occam and its subsequent variants are based on the work of C. A. R. Hoare on Communicating Sequential Processes (CSP). Occam is an explicit-message-passing notation, with processes interacting through synchronous communication channels. In Occam programs, all processes and all channels must be known at compile time, and Occam compilers use this information to completely eliminate the need for runtime resource management. This requirement, together with typed


channels, makes Occam a very safe notation, to the point that there are rules for semantics-preserving program transformations. However, Occam does not support dynamic process creation, and it supports only a limited form of iteration and a non-recursive procedure call.

The Transputer approach is to offer prepackaged computing solutions in hardware, including the Occam process model and floating-point arithmetic. The presence of floating-point hardware on chip, unlike in the Mosaic or the J-Machine, is a consequence of INMOS being driven by its markets, whereas the other two projects are testbeds for concurrent-programming experiments. The Transputer communication mechanism is limited to nearest-neighbor communication. It requires either software-assisted store-and-forward routing or additional external communication components to provide a general communication capability. The newest-generation Transputer improves the routing generality and performance through the use of virtual channels.

When used as intended, for running Occam programs, the Transputer provides an excellent execution platform. However, its limited communication capabilities and the hardwired process notion typically present difficulties for the implementation of non-Occam-like concurrent-programming systems.

J-Machine and CST

Concurrent Smalltalk (CST) is a concurrent, object-oriented programming notation and, in spite of a vastly different syntax, it is in many important aspects similar to C+-. The main semantic difference between C+- and CST is that, whereas the state of a C+- process can be accessed only through a set of mutually exclusive atomic actions, the methods of a CST object can access the object concurrently. The atomicity of CST methods must be managed explicitly, using locks. The primary mechanism for generating concurrency in CST is issuing concurrent remote procedure calls and synchronizing through futures. Although this mechanism is supported in C+-, the primary mechanisms for generating concurrency in C+- are process creation and non-blocking message sends. Some recently reported results on the performance evaluation of the J-Machine have been obtained with programs written in the J language, an extension of C with a small number of constructs for communication and synchronization, not unlike the C+- approach.
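As a rough illustration of what non-blocking sends buy, here is a sketch (ours, not taken from the thesis) in the processdef/atomic notation that appears in Appendix A; the process type Worker and its entry name are illustrative only.

    processdef Worker {                  // illustrative process type
    public:
        atomic void compute(int job);    // an atomic action of the process
    };

    void
    fan_out(Worker **w, int n)
    {
        for (int i = 0; i < n; i++)
            w[i]->compute(i);            // each invocation is a non-blocking
    }                                    //   message send; the loop never waits,
                                         //   so all n actions may run concurrently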

Unlike the Transputer, the Message-Driven Processor of the J-Machine does not impose its preferred process model on the user. The emphasis of the J-Machine is on implementing in hardware the instruction sequences that CST programs use often: a simple hashing policy with a two-way set-associative cache for name translation, detecting access to uninitialized variables for synchronization, scheduling off of the message-receive queue, injecting data into the routing network, and computing the relative distance of the message destination.


Because there are fewer preconceived notions built into the hardware than with the Transputer, the J-Machine more readily accommodates programming systems for which it had not originally been designed.

The emphasis of the Mosaic is on providing high-performance routing and memory, with a seamless interface between the two. One can argue that a C+- implementation on the Mosaic would benefit from most, if not all, of the hardware-supported communication and synchronization mechanisms of the J-Machine. However, the reduction in the number of cycles and in the size of the code has to be compared against the additional price in chip area and the loss of flexibility. Given the large discrepancy in access time between on-chip and off-chip memory, the advantage of nominally time-saving but area-expensive mechanisms is less than obvious. The importance of preserving flexibility to as late a design stage as possible cannot be overstated. For example, the xlate instruction on the J-Machine implements a simple hashing policy and a two-way set-associative cache, and executes in three cycles. An equivalent of this instruction takes a short sequence of Mosaic instructions, but it can be modified trivially to work with different hash functions and a different cache structure and/or size.
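For readers unfamiliar with the mechanism, the following is a sketch (ours, not the Mosaic runtime's code) of what such a software name-translation cache might look like; the table size, hash function, and field names are arbitrary, which is precisely the flexibility argued for above.

    struct XlateEntry { bool valid; unsigned key; void *value; };

    enum { XLATE_SETS = 64 };                      // arbitrary; trivially changed
    static XlateEntry xlate_cache[XLATE_SETS][2];  // two ways per set

    extern void *xlate_miss(unsigned key);         // hypothetical slow path

    inline void *
    xlate(unsigned key)
    {
        XlateEntry *set = xlate_cache[key % XLATE_SETS];   // simple hashing policy
        if (set[0].valid && set[0].key == key)             // way 0
            return set[0].value;
        if (set[1].valid && set[1].key == key)             // way 1
            return set[1].value;
        return xlate_miss(key);                            // miss: consult the full table
    }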

The J-Machine and the Mosaic projects share many of the same underlying principles and motivations, many of which trace back to the time when William J. Dally was a PhD student in our research group; the rest is due to our relatively frequent mutual progress updates. The necessity to limit the scopes of the respective projects made the Mosaic team focus on efforts that were more aggressive technologically (single-chip nodes with internal dRAM, pipeline synchronization, advanced packaging), whereas the focus of the J-Machine team's efforts was primarily on mechanisms (synchronization, name translation, scheduling).

Multiprocessors

As discussed in earlier sections, the shared-memory programming model is a predominant concurrent-programming paradigm, mostly because many programmers find it to be a natural extension of the sequential programming model. Traditional shared-memory programming systems extend sequential notations with process creation and synchronization primitives; data communication is done through shared-memory access. Some recently developed shared-memory programming systems, such as COOL, move towards concurrent object-oriented programming similar to C+-, although for a different set of reasons. While the goal of C+- is to increase the expressivity of explicit message-passing notations by providing a global name space, COOL offers monitor-like data encapsulation as a safe alternative to the all-powerful access of a shared-memory word.

C+- programs often look remarkably similar to programs written in notations designed with multiprocessors as their primary targets. Let us compare the


runtime behavior of concurrent object-oriented programs on a cache-coherent multiprocessor and on the Mosaic.

Assume that a concurrent computation consists of a partially ordered set of actions; each action modifies the state of an object o by executing a function f with an argument list a. One can write

    o.f(a);

Assume also that the state of the object, the function code, and the argument list are each stored in their respective pieces of memory somewhere in the concurrent machine.
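As a small illustration (ours, not the thesis's), the three pieces named above can be thought of as one action record; in the Mosaic runtime system, the RTSmsg header of Appendix A plays this role.

    struct Object;                           // opaque object state

    struct Action {                          // illustrative action record
        Object *o;                           // the object whose state is modified
        void  (*f)(Object *, void *);        // the function to execute
        void   *a;                           // its argument list
    };

    inline void
    perform(const Action &act)
    {
        act.f(act.o, act.a);                 // o.f(a)
    }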

A typical multiprocessor is processor-centered. A processor is assigned to perform this action, possibly by directly accessing the task queue of the assigned processor, or by appending to the global task queue responsible for scheduling the entire machine. Once the action is scheduled, using that processor's cache-update mechanism, the pieces of object state, function code, and argument list are requested and brought to the processor. After the required computations are performed, the object's state is eventually updated in the memory through the cache-coherence protocol. Most communications are demand-driven, initiated by the cache-update requests, with the exception of appending to the task queues.

The Mosaic is memory-centered. There is only one processor that can access the object's state: the one controlling the memory where the object's state is stored. The argument list will be put into the same local memory by a message. The function code can, in principle, be sent along with the argument list, but because the code is read-only, it can be effectively software-cached in the same memory as the object it operates on. The communication is, therefore, mostly supply-driven, with the exception of updating the software code cache.

Due to the inherent complexity of coherent data caching, a typical multiprocessor employs as much as two orders of magnitude fewer nodes than a Mosaic multicomputer of the same price. On the other hand, the higher up-front node cost enables one to incorporate a more powerful processor. The emphasis of the programming effort is therefore radically different: whereas the effort in programming a multiprocessor will typically be in keeping the expensive nodes busy with better load-balancing strategies, the effort in programming the Mosaic is in generating as much concurrency as possible.

For problems with limited concurrency, a multiprocessor clearly holds the performance edge; for problems with abundant concurrency, the Mosaic is the machine of choice. For applications that belong to neither of these two extremes, the performance will depend on how effective the coherent data caching is. If the shared data is mostly of the single-writer, multiple-readers variety, including cases in which who the writer is changes relatively infrequently, coherent data caching is effective. For highly contested, multiple-writers, multiple-readers shared data, numerous updating and invalidating requests render caching useless.


Summary

A computer-architecture experiment is a complex task indeed. To make such an experiment a successful one is to achieve perfection in a balancing act. Many an experiment has failed to fulfill its promise due to less-than-perfect solutions for such mundane details as a few bits' worth of addressing space, I/O capabilities, mechanical assemblies, etc.

One of the great pitfalls, so forcefully exposed by the developers of the RISC processor architecture, is to restrict oneself into a well-defined box, no matter how nice the box looked from the inside. It was only after processor architects had poked into the compiler designers' turf that the possibility for dramatic improvement in processor performance was revealed.

Accordingly, the single most important design requirement, the requirement that turned into the cornerstone of our testbed, is that we must not assume that we understand all the possible ways in which the system will be used. As best as we could, we tried to provide mechanisms, not policies. This effort should be obvious from the simplicity of our hardware communication primitives, and should be even more obvious to readers who had the stamina to follow through the examples describing the runtime-system interfaces. We made sure that the default operations of our software and hardware systems were as simple and intuitive as possible, and strived not to hide anything from an inquisitive user. The resulting non-assuming architecture of the Mosaic is suitable for the implementation of a number of concurrent-programming systems, as well as for special-purpose embedded applications.

We have started with the one-chip-computer stipulation and tried to push the application-span envelope as far as we could. The performance potential of fine-grain concurrent computers has long been recognized. What is lacking is a powerful programming system to exploit that performance potential, not for a handful of carefully crafted applications, but for a much larger application span. C+- approaches this problem from the bottom up: programming paradigms that can be implemented efficiently on simple and inexpensive hardware, namely atomic updates that can be enabled or disabled, are mapped into a simple extension of a popular object-oriented notation.

What if we had an arbitrary amount of processing power dispersed around the memory? What if copying a memory segment from the local memory into a remote memory was actually faster than the local copy operation? What if we sent the code and the argument list to where the data is, instead of having all three of those join at the all-important processor? Or perhaps we could send code and data to where the argument list is? How much caching can we efficiently do in software? What if concurrency was so abundant that we did not have to worry about the utilization of individual processors at all, but rather about how to extract more concurrency from an application?


We do not know the answers to any of these questions today. What we have provided is a testbed that could help in answering some of these questions in a quantitative way, and some results have already been established.

Appendix A

Example Products of C+- Compilation

In the interest of making it possible for other researchers to understand the structure of the software overhead of communications reported earlier in this thesis, we shall present a few simple examples of the Mosaic assembly code produced by translation from C+- to C++, followed by compilation using the GNU C++ compiler targeted for the Mosaic processor.

The Mosaic register set is described earlier in this thesis. The assembly instructions of interest are:

op src,dst, where op is one of the standard arithmetic instructions (add, subtract, ...), src and dst are general-purpose registers, and the semantics are dst = dst op src.

mov src,dst, with the semantics of dst = src, where src and dst can be general-purpose registers, special registers, or memory operands with addresses determined by: a register; a register, post-incremented; a pre-decremented register; a register plus a constant; or a constant.

jmp dst: jump to the destination dst, determined by one of the addressing modes described above.

call dst: actually two instructions, one to save the return address on the stack and another to jump, as above.


jcc label, where cc is a condition under which to jump to the address specified by the label.

The program below defines the basic data structures used by the runtime system.

typedef unsigned nodet;        // multicomputer node identifier

struct pointert {              // a process pointer consists of:
    nodet  node;               //   a node number
    void  *ptr;                //   and an address
};

typedef unsigned entryt;       // a unique identifier of an atomic action

struct RTSmsg {                // a message
    RTSmsg   *next;            // queuing information
    int       size;
    pointert  ptr;             // message header
    entryt    e;
};

Program: Runtime-System Data Structures
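The following usage sketch is ours, not the thesis's: it shows how a complete message for an atomic action with a single int argument can be laid out as an RTSmsg header followed by the argument bytes; the function name buildmessage and the size accounting are illustrative assumptions.

    char *
    buildmessage(char *buf, pointert dest, entryt entry, int arg)
    {
        RTSmsg *m = (RTSmsg *) buf;
        m->next = 0;                              // queuing link, used only locally
        m->size = sizeof(RTSmsg) + sizeof(int);   // header plus one int argument (assumed convention)
        m->ptr  = dest;                           // destination process pointer
        m->e    = entry;                          // which atomic action to invoke
        *(int *) (buf + sizeof(RTSmsg)) = arg;    // the argument list follows the header
        return buf + m->size;                     // first free byte after the message
    }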


The code in the program below allows one to access the machine registers from the source code. Note that the pointers used in managing the send and the receive queues are kept in general registers that are shared between the two contexts.

register void   *MSP    asm ("msp");    // message send pointer
register void   *MSL    asm ("msl");    // message send limit
register void   *MRP    asm ("mrp");    // message receive pointer
register void   *MRL    asm ("mrl");    // message receive limit
register int     DXDY   asm ("dxdy");   // destination register
register int     IMR    asm ("imr");    // interrupt mask register
register int     ISR    asm ("isr");    // interrupt status register
register RTSmsg *SQHEAD asm ("r");      // send queue head
register RTSmsg *SQTAIL asm ("r");      // send queue tail
register void   *SQEND  asm ("r");      // end of the send buffer
register RTSmsg *RQHEAD asm ("r");      // receive queue head
register RTSmsg *RQTAIL asm ("r");      // receive queue tail
register void   *RQEND  asm ("r");      // end of the receive buffer

Program: Accessing Machine Registers from the Source Code

As shown in the program below, the routing-word (dx, dy) computation uses two tables. If the storage cost is prohibitive, it can be done with no extra storage in approximately twice the time.

extern int dxtable[];                   // per-coordinate routing tables
extern int dytable[];

inline
int
dxdyconversion(nodet dest)
{
    unsigned dx = dest >> 8,            // x part of the node number (assumed layout)
             dy = dest & 0xFF;          // y part of the node number
    return dxtable[dx] | dytable[dy];   // combine the two table entries into the routing word
}

Program: The (dx, dy) Computation
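The table-free variant mentioned above might look roughly like the sketch below; this is ours, not the Mosaic code, and MYX, MYY, the node-number layout, and the offset encoding are all assumptions made only for illustration.

    extern int MYX, MYY;                         // this node's coordinates (hypothetical)

    inline int
    encode_offset(int d) { return d & 0xFF; }    // placeholder encoding of a signed offset

    inline int
    dxdyconversion_notables(nodet dest)
    {
        int dx = (int) (dest >> 8)   - MYX;      // compute, rather than look up,
        int dy = (int) (dest & 0xFF) - MYY;      //   the relative offsets
        return (encode_offset(dx) << 8) | encode_offset(dy);
    }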


As described earlier, a message-sending operation consists of calling the head operator, building the argument list, and calling the tail operator. The program below presents an example definition of these two operators.

inline
void *
sendqueuealloc(int size)
{
    if ((void *) ((char *) SQTAIL + size) >= SQEND)
        return sendqueuewraparound(size);
    else
        return (void *) SQTAIL;
}

inline
void *
operator_head(pointert p, entryt e, int size)
{
    RTSmsg *v = (RTSmsg *) sendqueuealloc(size);
    v->size = size;                       // build message header
    v->ptr  = p;
    v->e    = e;
    return (void *) v;
}

inline
void
operator_tail(void *begin, void *end, pointert p, entryt e, int size)
{
    if (SQHEAD == SQTAIL) {               // send queue empty: send directly
        int dxdy = dxdyconversion(p.node);
        IMR  = ...;                       // disable interrupts
        MSP  = (void *) &SQTAIL->size;    // send the message
        MSL  = end;
        DXDY = dxdy;
        SQTAIL = SQTAIL->next = (RTSmsg *) end;   // update the send queue
        SQTAIL->next = 0;
        IMR  = ...;                       // enable interrupts
    } else
        sendqueueappend(begin, end, p, e, size);
}

Program: Building the Message
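To connect the pieces, here is a usage sketch (ours, and only a sketch: the actual translated code, argument layout, and size accounting may differ) of roughly what a translated p->foo(x) message send expands to in terms of the two operators above.

    inline void
    send_foo(pointert p, entryt foo_entry, int x)
    {
        int   size  = sizeof(RTSmsg) + sizeof(int);            // assumed size accounting
        char *begin = (char *) operator_head(p, foo_entry, size);
        *(int *) (begin + sizeof(RTSmsg)) = x;                  // build the argument list
        operator_tail(begin, begin + size, p, foo_entry, size);
    }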


By now, we have all the code we need to compile a message-sending example. Let the program below be the user code.

class C {                      // a user-defined class
public:
    void foo();
};

void                           // and its member function
C::foo() {}

void
f(C *c)
{
    while (1)                  // an endless sequence of
        c->foo();              //   member-function invocations
}

processdef P {                 // a user-defined process
public:
    atomic void foo();
};

atomic
void                           // and its atomic action
P::foo() {}

void
f(P *p)
{
    while (1)                  // an endless sequence of
        p->foo();              //   message-send operations
}

Program: A Message-Send Example


The two programs below are the result of translation of the user code to C++, followed by compilation to Mosaic assembly code.


globl fFPC fchar

fFPC

FUNC PROLOGUE BGN

mov bpsp save frame ptr

mov spbp new frame ptr

Save regs used

mov rsp

FUNC PROLOGUE END

mov bpr

L

mov rsp

call fooC

inc spsp

jmp L

FUNC EPILOGUE BGN

Restore regs used

mov spr

mov bpsp restore stack ptr

mov spbp restore frame ptr

rtn

FUNC EPILOGUE END

globl fFGPcpmptr fP

fFGPcpmptr

FUNC PROLOGUE BGN

mov bpsp save frame ptr

mov spbp new frame ptr

Save regs used

mov rsp

mov rsp

mov rsp

mov rsp

FUNC PROLOGUE END

mov r

L

mov bpr

mov bpr

mov rr

add r

cmp rr

jltu L

mov rsp

Program: A Message-Send Example, Compiled Code (Part 1)


call sendqueuewraparoundFi

inc spsp

jmp L

L

inc rr

L

mov r

mov rr

mov rr

mov r

mov rr

mov bpr

cmp rr

jne L

rnr rr lsr by

rnr rr

and r

and r

mov rdxtabler

mov rdytabler

mov imr

inc rr

mov rmsp

dec rr

mov rmsl

or rr

mov rdxdy

mov rr

mov rr

mov r

mov imr

jmp L

L

mov rsp

call sendqueueappendFPv

inc spsp

jmp L

FUNC EPILOGUE BGN

Restore regs used

mov spr

mov spr

mov spr

mov spr

mov bpsp restore stack ptr

mov spbp restore frame ptr

rtn

FUNC EPILOGUE END

Program: A Message-Send Example, Compiled Code (Part 2)


The program below can be used for user-level dispatch; the two programs that follow it are its compiled version.

struct RTScode {               // a code piece
    int offset;                // info on how to find the
    int mask;                  //   active/passive bit
                               //   within the process state
    int code[];                // the code itself
};

typedef void (*FP)(void *, void *);    // a generic atomic-action pointer

inline
atomic
void
dispatch(int size, pointert p, entryt e)       // generic dispatcher
{
    RTScode *c  = lookupcode(e);       // find the code piece
    void *ps    = p.ptr;               // the process pointer
    FP    f     = (FP) c->code;        // the code pointer
    void *a     = args;                // the arguments pointer
    int  offset = c->offset;
    int  mask   = c->mask;
    if (*(int *) ((char *) ps + offset) & mask)    // check the active/passive bit
        f(ps, a);                      // dispatch to user code
    else
        messagerefused(args);
}

void
userdispatch()                         // this code runs continuously
{                                      //   in the user context
    while (1) {
        while (!RQHEAD->next)
            ;                          // polling the receive queue
        dispatch(RQHEAD->size, RQHEAD->ptr, RQHEAD->e);   // call the generic dispatcher
        RQHEAD = RQHEAD->next;         // updating the receive queue
    }
}

Program: User-Level Dispatch


globl userdispatchFv

userdispatchFv

FUNC PROLOGUE BGN

mov bpsp save frame ptr

mov spbp new frame ptr

Save regs used

mov rsp

mov rsp

mov rsp

mov rsp

mov rsp

FUNC PROLOGUE END

Program: User-Level Dispatch, Compiled Code (Part 1)


L

L

mov rr

cmp r

jeq L

mov rsp

call lookupcodeFUi

mov rr

mov rr

add r

mov rr

add r

mov rr

mov rr

mov rr

add rr

mov rr

mov rr

and rr

inc spsp

cmp r

jeq L

mov rsp

mov rsp

call r

add sp

jmp L

L

mov rsp

call messagerefusedFPRTSmsg

inc spsp

L

mov rr

jmp L

FUNC EPILOGUE BGN

Restore regs used

mov spr

mov spr

mov spr

mov spr

mov spr

mov bpsp restore stack ptr

mov spbp restore frame ptr

rtn

FUNC EPILOGUE END

Program: User-Level Dispatch, Compiled Code (Part 2)


The program below is the interrupt-dispatch loop, and the two programs after it are the compiled code.


void softwareint(), recvint(), sendint(), sendrecvint(),      // handler declarations
     buffint(), buffrecvint(), buffsendint(), buffsendrecvint();

void (*interrupttable[])() = {         // a jump table filled with the
                                       //   names of routines that correspond
    softwareint,                       //   to various interrupt conditions
    recvint,
    sendint,
    sendrecvint,
    buffint,
    buffrecvint,
    buffsendint,
    buffsendrecvint
};

void
interruptdispatch()
{
    while (1) {
        int pending = ISR;                   // get the pending interrupts
        (*interrupttable[pending])();        // dispatch to an interrupt handler
        ISR = pending;                       // writing back acknowledges all the
                                             //   interrupts that have just been
                                             //   serviced
        asm("PUNT");                         // assembly instruction to return to
                                             //   the other context;
                                             //   when the next interrupt arrives,
                                             //   we shall start here
    }
}

void
recvint()                              // on receive interrupt,
{                                      //   the newly arrived message is
    RTSmsg *m = (RTSmsg *) MRP;        //   linked into the receive queue
    m->next = 0;
    RQTAIL = RQTAIL->next = m;
    MRP = (char *) m + m->size;
}

void
sendint()                              // on send interrupt,
{                                      //   the send queue is updated and
    SQHEAD = SQHEAD->next;             //   checked for additional messages
    if (SQHEAD->next)
        sendnextmessage();
}

Program: Programming the Interrupt-Driven Context


data

globl interrupttable

interrupttable

word softwareintFv

word recvintFv

word sendintFv

word sendrecvintFv

word buffintFv

word buffrecvintFv

word buffsendintFv

word buffsendrecvintFv

Program: Programming the Interrupt-Driven Context, Compiled Code (Part 1)


text

globl interruptdispatchFv

interruptdispatchFv

FUNC PROLOGUE BGN

mov bpsp save frame ptr

mov spbp new frame ptr

Save regs used

mov rsp

FUNC PROLOGUE END

L

mov isrr

call rinterrupttable

mov risr

APP

PUNT

NOAPP

jmp L

Restore regs used

mov spr

mov bpsp restore stack ptr

mov spbp restore frame ptr

rtn

FUNC EPILOGUE END

globl recvintFv

recvintFv

mov mrpr

mov r

mov rr

mov rr

inc rr

mov rmrp

rtn

globl sendintFv

sendintFv

mov rr

mov rr

cmp r

jeq L

call sendnextmessageFv

rtn

Program: Programming the Interrupt-Driven Context, Compiled Code (Part 2)


Bibliography

Anant Agarwal, David Chaiken, Godfrey D'Souza, Kirk Johnson, David Kranz, John Kubiatowicz, Kiyoshi Kurihara, Beng-Hong Lim, Gino Maa, Dan Nussbaum, Mike Parkin, Donald Yeung. The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor. MIT/LCS/TM, MIT, November.

Alok Aggarwal, Ashok K. Chandra, Marc Snir. On Communication Latency in PRAM Computations. In Proceedings of the ACM Symposium on Parallel Algorithms and Architectures.

Gul Agha. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press.

William C. Athas. Fine-Grain Concurrent Computation. Caltech Computer Science Technical Report, PhD thesis.

William C. Athas and Charles L. Seitz. Multicomputers: Message-Passing Concurrent Computers. IEEE Computer, August.

J. Backus. Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs. CACM, August.

H. B. Bakoglu. Circuits, Interconnections, and Packaging for VLSI. Addison-Wesley.

Gordon Bell. Ultracomputers: A Teraflop Before Its Time. CACM, August.

Nanette J. Boden. A Study of Fine-Grain Programming Using Cantor. Caltech-CS-TR.

Nanette J. Boden. Runtime Systems for Fine-Grain Multicomputers. Caltech-CS-TR.


Robert W. Brodersen. Evolution of VLSI Signal-Processing Circuits. In Advanced Research in VLSI: Proceedings of the Decennial Caltech Conference, edited by Charles L. Seitz, MIT Press.

Henry Burkhardt III, Steven Frank, Bruce Knobe, and James Rothnie. Overview of the KSR1 Computer System. Technical Report KSR-TR, Kendall Square Research, Boston, February.

Alan Burns. Programming in occam 2. Addison-Wesley.

Steve M. Burns. Performance Analysis and Optimization of Asynchronous Circuits. Caltech-CS-TR.

David Chaiken, John Kubiatowicz, Anant Agarwal. LimitLESS Directories: A Scalable Scheme. MIT/LCS/TM, MIT, June.

Thomas J. Chaney and Charles E. Molnar. Anomalous Behavior of Synchronizer and Arbiter Circuits. IEEE Transactions on Computers, April.

Andrew A. Chien. Concurrent Aggregates. MIT Press.

Rohit Chandra, Anoop Gupta, John L. Hennessy. Integrating Concurrency and Data Abstraction in a Parallel Programming Language. Stanford University Technical Report No. CSL-TR, February.

K. Mani Chandy, Carl Kesselman. Compositional C++: Compositional Parallel Programming. Caltech-CS-TR.

K. Mani Chandy, Jayadev Misra. Parallel Program Design: A Foundation. Addison-Wesley.

K. Mani Chandy, Stephen Taylor. A Primer for Program Composition Notation. Caltech-CS-TR.

William Douglas Clinger. Foundations of Actor Semantics. MIT AI Lab Technical Report AI-TR, May.

D. Cohen, G. Finn, R. Felderman, A. DeSchon. ATOMIC: A High-Speed, Low-Cost Local Area Network. USC/ISI Technical Report ISI/RR, September.

W. Crowther, J. Goodhue, E. Starr, R. Thomas, W. Milliken, and T. Blackadar. Performance Measurements on a 128-Node Butterfly Parallel Processor. In Proceedings of the International Conference on Parallel Processing.


Ole-Johan Dahl, Edsger W. Dijkstra, and Charles A. R. Hoare. Structured Programming. Academic Press.

William J. Dally. A VLSI Architecture for Concurrent Data Structures. Kluwer Academic Publishers, Norwell, MA.

William J. Dally and Charles L. Seitz. Deadlock-Free Message Routing in Multiprocessor Interconnection Networks. IEEE Transactions on Computers, May.

Edsger W. Dijkstra. Cooperating Sequential Processes. In Programming Languages, edited by F. Genuys, Academic Press.

Edsger W. Dijkstra. A Discipline of Programming. Prentice-Hall.

Charles Donnelly and Richard Stallman. BISON: The YACC-compatible Parser Generator. The Free Software Foundation, June.

Thorsten von Eicken, David E. Culler, Seth Copen Goldstein, Klaus Erik Schauser. Active Messages: a Mechanism for Integrated Communication and Computation. University of California, Berkeley, Report No. UCB/CSD, March.

Margaret A. Ellis and Bjarne Stroustrup. The Annotated C++ Reference Manual. Addison-Wesley.

M. J. Flynn. Very High-Speed Computing Systems. Proceedings of the IEEE, December.

Ian Foster and Stephen Taylor. STRAND: New Concepts in Parallel Programming. Prentice Hall.

A. Gottlieb et al. The NYU Ultracomputer: Designing an MIMD Shared-Memory Parallel Computer. IEEE Transactions on Computers, February.

Per Brinch Hansen. Structured Multiprogramming. CACM, July.

John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers.

Carl Hewitt. Viewing Control Structures as Patterns of Passing Messages. In Artificial Intelligence: An MIT Perspective, edited by Winston and Brown, MIT Press.


Carl Hewitt and Henry Baker. Laws for Communicating Parallel Processes. IFIP, Toronto, August.

W. Daniel Hillis. The Connection Machine. MIT Press.

C. A. R. Hoare. Monitors: An Operating System Structuring Concept. CACM, October.

C. A. R. Hoare. Communicating Sequential Processes. CACM, August.

Waldemar Horwat. Concurrent Smalltalk on the Message-Driven Processor. MS Thesis, MIT Department of Electrical Engineering and Computer Science, May.

Waldemar Horwat, Andrew A. Chien, William J. Dally. Experience with CST: Programming and Implementation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation.

INMOS Limited. Transputer Reference Manual. Prentice Hall.

Kirk Johnson, Anant Agarwal. The Impact of Communication Locality on Large-Scale Multiprocessor Performance. MIT/LCS/TM, MIT, February.

David Kranz, Kirk Johnson, Anant Agarwal, John Kubiatowicz, and Beng-Hong Lim. Integrating Message-Passing and Shared-Memory: Early Experience. MIT/LCS/TM, MIT, October.

Clyde P. Kruskal, Marc Snir. Cost-Bandwidth Tradeoffs for Communication Networks. In Proceedings of the ACM Symposium on Parallel Algorithms and Architectures.

David J. Kuck et al. Measurements of Parallelism in Ordinary FORTRAN Programs. IEEE Computer, January.

H. T. Kung and Charles E. Leiserson. Algorithms for VLSI Processor Arrays. Chapter in Introduction to VLSI Systems by Carver Mead and Lynn Conway, Addison-Wesley.

Charles R. Lang, Jr. The Extension of Object-Oriented Languages to a Homogeneous, Concurrent Architecture. Caltech Technical Report TR.

K. Rustan M. Leino. Multicomputer Programming with Modula-D. Caltech-CS-TR.


Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, John Hennessy, Mark Horowitz, and Monica Lam. Design of the Stanford DASH Multiprocessor. Stanford University Technical Report No. CSL-TR, December.

Sigurd L. Lillevik. The Touchstone Gigaflop DELTA Prototype. IEEE, March.

Johan J. Lukkien and Jan L. A. van de Snepscheut. A Tutorial Introduction to Mosaic Pascal. Caltech-CS-TR.

L. R. Marino. The Effect of Asynchronous Inputs on Sequential Network Reliability. IEEE Transactions on Computers, November.

Alain J. Martin. Programming in VLSI: From Communicating Processes to Delay-Insensitive Circuits. Chapter one in Developments in Concurrency and Communication, edited by C. A. R. Hoare, Addison-Wesley.

Alain J. Martin, Drazen Borkovic, Marcel van der Goot, Tony Lee, Jose Tierno. CAST: Caltech Asynchronous Synthesis Tools, The First Release. Caltech-CS-TR.

Alain J. Martin, Steven M. Burns, T. K. Lee, Drazen Borkovic, Pieter J. Hazewindus. The Design of an Asynchronous Microprocessor. In Advanced Research in VLSI: Proceedings of the Decennial Caltech Conference, edited by Charles L. Seitz, MIT Press.

Daniel Maskit, Stephen Taylor. Experiences in Programming the J-Machine. Caltech-CS-TR.

David May. Occam and the Transputer. Chapter two in Developments in Concurrency and Communication, edited by C. A. R. Hoare, Addison-Wesley.

Carver Mead and Lynn Conway. Introduction to VLSI Systems. Addison-Wesley.

Michael Noakes, William J. Dally. System Design of the J-Machine. In Proceedings of the Sixth MIT Conference on Advanced Research in VLSI, MIT Press.

Michael D. Noakes, Deborah A. Wallach, and William J. Dally. The J-Machine Multicomputer: An Architectural Evaluation. In Proceedings of the 20th Annual International Symposium on Computer Architecture, San Diego, California, May.


Alan V. Oppenheim and Ronald W. Schafer. Signal Processing. Prentice-Hall.

Miroslav Pechoucek. Anomalous Response Times of Input Synchronizers. IEEE Transactions on Computers, February.

Michael J. Pertel. A Simple Network Simulator. Caltech-CS-TR.

Michael J. Pertel. A Critique of Adaptive Routing. Caltech-CS-TR.

G. F. Pfister et al. The IBM Research Parallel Processor (RP3): Introduction and Architecture. In Proceedings of the International Conference on Parallel Processing.

John R. Rose, Guy L. Steele Jr. C*: An Extended Language for Data Parallel Programming. Thinking Machines Technical Report PL, March.

Fred Rosenberger and Thomas J. Chaney. Flip-Flop Resolving Time Test Circuit. IEEE Journal of Solid-State Circuits, August.

Fred U. Rosenberger, Charles E. Molnar, Thomas J. Chaney, and Ting-Pien Fang. Q-Modules: Internally Clocked Delay-Insensitive Modules. IEEE Transactions on Computers, September.

Charles L. Seitz. Ideas About Arbiters. LAMBDA, First Quarter.

Charles L. Seitz. System Timing. Chapter seven in Introduction to VLSI Systems by Carver Mead and Lynn Conway, Addison-Wesley.

Charles L. Seitz. Concurrent VLSI Architectures. IEEE Transactions on Computers, December.

Charles L. Seitz. The Cosmic Cube. CACM, January.

Charles L. Seitz. Multicomputers. Chapter five in Developments in Concurrency and Communication, edited by C. A. R. Hoare, Addison-Wesley.

Charles L. Seitz, W. C. Athas, C. M. Flaig, A. J. Martin, J. Seizovic, C. S. Steele, and W.-K. Su. The Architecture and Programming of the Ametek Series 2010 Multicomputer. In The Third Conference on Hypercube Concurrent Computers and Applications, Pasadena, California.

Charles L. Seitz, Nanette J. Boden, Jakov Seizovic, and Wen-King Su. The Design of the Caltech Mosaic C Multicomputer. In Research on Integrated Systems: Proceedings of the Symposium, edited by Gaetano Borriello and Carl Ebeling, MIT Press.


Charles L. Seitz, Jakov Seizovic, Wen-King Su. The C Programmer's Abbreviated Guide to Multicomputer Programming. Caltech-CS-TR.

Charles L. Seitz and Wen-King Su. A Family of Routing and Communication Chips Based on the Mosaic. In Research on Integrated Systems: Proceedings of the Symposium, edited by Gaetano Borriello and Carl Ebeling, MIT Press.

Jakov Seizovic. The Reactive Kernel. Caltech-CS-TR.

Ehud Shapiro, editor. Concurrent Prolog: Collected Papers. MIT Press.

Ray Simar. Personal Communication.

Jaswinder Pal Singh, Wolf-Dietrich Weber, Anoop Gupta. SPLASH: Stanford Parallel Applications for Shared-Memory. Stanford University Technical Report No. CSL-TR, June.

K. Stuart Smith, Arunodaya Chatterjee. A C Environment for Distributed Application Execution. MCC Technical Report ACT-ESP.

Don Speck. Fast, Scalable CMOS dRAM. In Advanced Research in VLSI, UC Santa Cruz, MIT Press.

E. Spertus. Execution of Dataflow Programs on General-Purpose Hardware. MS Thesis, MIT Department of Electrical Engineering and Computer Science, August.

Craig S. Steele. Affinity: A Concurrent Programming System for Multicomputers. Caltech-CS-TR.

Leon Sterling, Ehud Shapiro. The Art of Prolog: Advanced Programming Techniques. MIT Press.

Wen-King Su. Reactive-Process Programming and Distributed Discrete-Event Simulation. Caltech-CS-TR.

Ivan E. Sutherland. Micropipelines. Turing Award Lecture, CACM, June.

Stephen Taylor. Parallel Logic Programming Techniques. Prentice Hall.

Michael D. Tiemann. User's Guide to GNU C++. The Free Software Foundation.

Manu Thapar. Cache Coherence for Scalable Shared Memory Multiprocessors. Stanford University Technical Report No. CSL-TR, May.


Thinking Machines Corporation. The Connection Machine CM Technical Summary. Thinking Machines Corporation, January.

Clark D. Thompson. Area-Time Complexity for VLSI. In Caltech Conference on Very Large Scale Integration, edited by Charles L. Seitz.

Harry J. M. Veendrick. The Behavior of Flip-Flops Used as Synchronizers and Prediction of Their Failure Rate. IEEE Journal of Solid-State Circuits, April.

Akinori Yonezawa and Mario Tokoro, editors. Object-Oriented Concurrent Programming. MIT Press.