
ParalleX: A Study of a New Parallel Computation Model

Guang R. Gao1, Thomas Sterling2,3, Rick Stevens4, Mark Hereld4, and Weirong Zhu1

1 Department of Electrical and Computer Engineering, University of Delaware
{ggao,weirong}@capsl.udel.edu

2 Center for Advanced Computing Research, California Institute of Technology
[email protected]

3 Department of Computer Science, Louisiana State University
[email protected]

4 Mathematics and Computer Science Division, Argonne National Laboratory
{stevens,hereld}@mcs.anl.gov

Abstract

This paper proposes the study of a new computation model that attempts to address the underlying sources of performance degradation (e.g. latency, overhead, and starvation) and the difficulties of programmer productivity (e.g. explicit locality management and scheduling, performance tuning, fragmented memory, and synchronous global barriers) in order to dramatically enhance the broad effectiveness of parallel processing for high end computing. In this paper, we present the progress of our research on a parallel programming and execution model - namely, ParalleX. We describe the functional elements of ParalleX, one such model being explored as part of this project. We also report our progress on the development and study of a subset of ParalleX - LITL-X - at the University of Delaware. We then present a novel architecture model - Gilgamesh II - as a ParalleX processing architecture. A design point study of Gilgamesh II and the architecture concept strategy are presented.

1 Introduction

Historically, as technology has advanced, computer architecture has changed to exploit the new opportunities offered and to compensate for the exposed weaknesses. Alternative strategies for organizing the computation, as well as the architectural structures and system software, have been devised to provide the governing semantic principles upon which such architectures and programming models are based. For more than a decade, the dominant model of computation has been the communicating sequential processes model, or more commonly the "message passing model", represented by various implementations of MPI (e.g. MPICH-2, OpenMPI) and adopted because of its applicability to distributed-memory MPPs and commodity clusters. Other models in use are multiple threads (e.g. OpenMP), vector, and SIMD, but message passing is dominant. However, as semiconductor technology has continued to evolve, both new opportunities and new challenges have emerged, demanding corresponding improvements to architecture. Most pronounced is the move to multicore components and the re-emergence of heterogeneous computing elements such as GPUs and ClearSpeed SIMD attached processors [2]. The IBM Cell architecture embodies both heterogeneous and multicore elements [6]. These exemplify, but do not fully reflect, the increasing need for new programming methods and architecture structures to continue to benefit fully from Moore's Law. An important objective of this research project has been the exploration and development of a possible new execution model that will more readily employ the potential capabilities of near term semiconductor technologies while easing the programmer burden. Such an execution model could lead to new programming models and languages, new parallel architectures, and new supporting system software.

In this paper, we present the progress of our research on a parallel programming and execution model - namely, ParalleX. We describe the functional elements of ParalleX, one such model being explored as part of this project. The principal elements of ParalleX are discussed, including locality, global name space, multithreading, parcels, local control objects, percolation, echo, and parallel processes. We also report our progress on the development and study of a subset of ParalleX - LITL-X - at the University of Delaware. We then present a novel architecture model - Gilgamesh II - as a ParalleX processing architecture. A design point study of Gilgamesh II is presented, together with the architecture concept strategy.
2 Parallel Programming Execution Model: ParalleX and LITL-X

This section discusses the specific requirements for a new execution model and describes the functional elements of ParalleX, one such model being explored by Sterling's group at LSU as part of this project. We also describe LITL-X, a subset of ParalleX that is being developed and studied by Gao's group at Delaware.

2.1 Requirements for a New Model of Computation

A new model of computation is required to meet the challenges of emerging technologies. Such a model must serve as a discipline to govern future scalable system architectures, programming methods, and runtime and operating system software. Among its most important requirements is to address the dominant factors impeding significant improvements to efficiency. Among these are latency, overhead, starvation, resource contention, and programmability. Latency is the time delay, measured in clock cycles, required to access remote data or services such as local main memory or remote nodes. Overhead is the critical path work required to manage parallel physical resources and concurrent abstract tasks. Overhead can determine the scalability of a system and the minimum granularity of program tasks that can be effectively exploited. Starvation is the lack of work, and therefore the idle cycles experienced by an execution site (e.g. a processor core), caused either by inadequate program parallelism or by poor load balancing. Contention for shared resources causes delays while one requesting execution site is blocked by another accessing the same needed resource. These can include shared communication channels, common memory banks, or other mutually accessible resources. Programmability relates to programmer productivity. Systems that require direct and explicit resource management by the programmer to achieve optimized performance can be much more difficult to use than machines with mechanisms more naturally suited to exploiting the available parallelism, thus reducing programmer intervention. All of these efficiency factors can benefit from improved programming models and parallel architecture, requiring a model of computation that incorporates the means, and facilitates the methods, to make effective use of the critical resources.
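As a rough illustration of how overhead bounds exploitable task granularity (a simple model of our own, not a result from this project): if a task performs g cycles of useful work and incurs o cycles of overhead on the critical path, the efficiency of an execution site is at best

\[ e = \frac{g}{g + o}, \qquad e \ge e_{\min} \;\Longrightarrow\; g \ge o \cdot \frac{e_{\min}}{1 - e_{\min}}, \]

so sustaining, say, 90% efficiency requires tasks at least nine times larger than the per-task overhead; lowering the overhead directly lowers the minimum granularity a system can exploit.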
Other demands on the execution model include such practical concerns as power consumption, reliability, and the size of large scale machines. While these would appear to be properties of the bulk hardware, the manner of its use is dictated by the execution model, and for a given sustained performance they can therefore be significantly affected by it. A new computational model is also needed to move towards a new balance of resources. To devise structures and coordinate computing around these realistic precepts demands a model of computing quite distinct from those conventionally employed by the mainstream today. This will enable the exploitation of the true performance opportunities yielded by technology and support architecture advances.

The following are considered essential characteristics of a future model of computation:

• System wide global name spaces both for data and active tasks, with efficient address translation and communication routing in the presence of dynamic object distribution,

• Rich parallelism semantics and granularity to exploit a diversity of forms, making available the million to billion way parallelism that will be required of near nanoscale systems by the end of the next decade,

• Direct support for lightweight processing of irregular, time-varying sparse data structure parallelism, such as that for trees (N-body codes), directed graphs (adaptive mesh refinement, semantic nets), and particle in cell (magnetohydrodynamics),

• Intrinsic mechanisms for automatic latency hiding, to mitigate this major source of performance degradation due to blocking on completion of remote actions and the ensuing idle times experienced (see the sketch at the end of this subsection),

• Incorporation of low overhead mechanisms for managing global system parallelism, including synchronization, scheduling, data movement, and load balancing, and

• Affinity semantics to establish relationships that would lead to locality opportunities through both compile time and runtime techniques.

Addressing these challenges through new strategies and structures of computation would remove the reliance on manual, painstaking, and error prone direct user intervention, including direct control of hardware mechanisms, direct management and allocation of hardware resources, and direct choreographing of physical data and task locality. As a consequence, efficiency and performance scaling could be dramatically higher, systems smaller and lower power, and programming dramatically easier. Towards this end, ParalleX is being devised to provide an alternative framework to conventional methods, and it currently addresses many (but not all) of these requirements.
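As a minimal sketch of the latency hiding characteristic called for above (our own illustration in present-day C++, not part of ParalleX itself; fetch_remote_value is a hypothetical stand-in): the requesting site issues a remote access asynchronously and continues with independent work instead of idling until the value arrives.

#include <chrono>
#include <future>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

// Stand-in for a remote access with significant latency (hypothetical).
double fetch_remote_value(int node_id) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50)); // model latency
    return static_cast<double>(node_id);
}

int main() {
    std::vector<double> local_data(1000000, 1.0);
    // Issue the remote request; do not block on it yet.
    std::future<double> remote =
        std::async(std::launch::async, fetch_remote_value, 42);
    // Overlap: useful local work proceeds while the request is in flight.
    double local_sum = std::accumulate(local_data.begin(), local_data.end(), 0.0);
    // Wait only now, ideally after the latency has been hidden by the work above.
    std::cout << local_sum + remote.get() << "\n";
}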

2.2 Principal Elements of ParalleX

An execution model is neither a programming model (although it influences it) nor a virtual machine (although it is often treated as such) but rather a set of guiding principles that organize the computation and govern the relationships between the vertical layers of the system stack (i.e. application, language, compiler, runtime system, operating system, system architecture, core architecture, and hardware mechanisms). An execution model specifies referents, their interrelationships, and the actions that can be performed on them. In so doing, it defines the semantics of state objects, functions, parallel flow control, and distributed interactions. Curiously, an execution model intentionally leaves some implementation policies unspecified while defining their boundary conditions. Technology, structure, and mechanism are all left open to permit multiple realizations of the model depending on engineering constraints. Even policies of scheduling and dispatch are incomplete, to enable optimizations through disparate means. Such a model should have efficient realizations on a range of system classes, from conventional MPPs to unique new designs of truly parallel computer architectures. Such a model enables reasoning about the decision chain through the vertical striation for resource scheduling and the performing of actions. It influences design decisions for programming languages and computer architecture as well as the intervening system software. ParalleX is an experimental execution model, synthesizing a number of important concepts with prior art in a unique semantic ensemble to address many of the critical challenges expressed above.

ParalleX is an asynchronous parallel computing model with a partitioned global address space. It exploits a split-phase multithreaded transaction distributed computing methodology that decouples computation and communication for overlapping, and it moves the work to the data when this is preferable to just moving the data to the work, as is conventionally done. The message driven paradigm, combined with multithreading and an advanced prestaging technique referred to as "percolation", provides intrinsic latency hiding at multiple levels within the system. Lightweight synchronization constructs, when implemented efficiently, deliver a diversity of forms and scales of parallelism. The global name space permits efficient direct manipulation of data and task objects for dynamic control of abstract and physical resources as well as ease of user programming. The combination of message driven multithreaded execution with lightweight synchronization offers a powerful semantic execution framework and an alternative to the conventional communicating sequential processes method that has dominated much of the last two decades. The basic semantic mechanisms of ParalleX are briefly described below:

Locality: Like many models dealing with distributed computing, ParalleX recognizes a local physical domain. In the case of ParalleX, it is the locus of resources that can be guaranteed to operate synchronously and for which hardware can guarantee compound atomic operations on local data elements. While a typical example may be a conventional MPP node, it could just as easily be a small part of a single chip, as suggested by the Gilgamesh II architecture discussed in the next section. Within a locality, all functionality is bounded in space and time, permitting scheduling strategies to be applied and certain pathological cases like race conditions to be precluded.

Global name space: Similar to the emerging class of PGAS programming models (e.g. CAF [11], Titanium [14], UPC [1]) and certain parallel architectures (e.g. Cray T3E, SGI Origin), ParalleX allows any first class object to be remotely identified efficiently through a hierarchical naming structure. In ParalleX, actions as well as data are first class entities, permitting direct control of the computation by the computation. Also, hardware resources have their own (typed) names and can therefore be referenced to a limited degree by the software, again to support direct manipulation of the relationship between abstract and physical resources, both by the runtime system and, when supported, by the hardware architecture.
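A minimal sketch of how such a global name space might be resolved (our own illustration; GlobalId, Kind, NameDirectory, and resolve are hypothetical names, not ParalleX definitions): every first class entity, whether datum, action, or hardware resource, carries a typed global identifier that a per-locality directory maps to its current location, with misses forwarded toward the owning locality under dynamic object migration.

#include <cstdint>
#include <unordered_map>

// A typed global identifier: data, actions, and hardware resources
// are all first class and nameable (illustrative encoding).
enum class Kind : uint8_t { Datum, Action, HardwareResource };

struct GlobalId {
    Kind     kind;      // what sort of entity this name denotes
    uint32_t locality;  // hint: the locality that created the object
    uint64_t serial;    // unique within the creating locality
};

struct DirectoryEntry {
    uint32_t current_locality;  // may differ from the hint after migration
    void*    local_address;     // valid only on current_locality
};

// Per-locality fragment of the hierarchical name space.
class NameDirectory {
    std::unordered_map<uint64_t, DirectoryEntry> entries_;
public:
    // Resolve a name: a hit yields the local address; a miss means the
    // object lives elsewhere and the request must be forwarded
    // (e.g. as a parcel) toward id.locality.
    bool resolve(const GlobalId& id, DirectoryEntry& out) const {
        auto it = entries_.find(id.serial);
        if (it == entries_.end()) return false;
        out = it->second;
        return true;
    }
    void insert(const GlobalId& id, DirectoryEntry e) { entries_[id.serial] = e; }
};

int main() {
    NameDirectory dir;
    GlobalId id{Kind::Datum, 3, 17};
    dir.insert(id, DirectoryEntry{3, nullptr});
    DirectoryEntry e;
    return dir.resolve(id, e) ? 0 : 1;   // resolves locally on locality 3
}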
Multithreaded: A thread is a partially ordered set of interrelated basic operations. In ParalleX, a thread is ephemeral and serves a single locality. More than one thread may be active concurrently within the domain of a given locality. Threads can be near fine grain and are not limited to an SPMD model with the same thread operating on different localities. The internal operations of a thread are represented in a static dataflow ordering but may be mapped into a single sequential instruction stream by the compiler back end. Threads can suspend or terminate when a remote access is required. If suspending, a local control object (see below) is created from the thread's state. If terminating, a parcel (see below) is constructed and dispatched to the remote destination data, where a new thread is invoked, thus moving the work, in essence, to the data.

Parcels: ParalleX employs a message driven paradigm for asynchronous distributed operation. A parcel includes the destination virtual address of a remote target object and an action specifier defining a task to be applied to that object. Additional argument values can be carried by the parcel to move prior state to the site of the invoked thread execution. Parcels differ from other such constructs, such as active messages, in that a parcel also carries a "continuation specifier" that defines what happens after the specified action is completed. This allows the locus of control to migrate across the distributed system. Message-driven computing through parcels allows physical resources (execution localities) to operate via a work queue model. It largely circumvents idle cycles due to blocking on remote access delays.
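The following sketch (our own, in C++; the field and function names are hypothetical rather than ParalleX definitions) shows the shape of a parcel and the work queue discipline it implies: a locality repeatedly dequeues parcels and applies the specified action to the named object, and the continuation specifier determines where control flows next.

#include <cstdint>
#include <deque>
#include <functional>
#include <iostream>
#include <unordered_map>
#include <vector>

// Illustrative parcel: destination object, action specifier, argument
// values, and a continuation specifier (field names are our own).
struct Parcel {
    uint64_t              target;        // global virtual address of the target
    uint32_t              action;        // task to be applied to that object
    std::vector<uint64_t> arguments;     // prior state moved to the invocation site
    uint32_t              continuation;  // action dispatched after completion (0 = none)
};

using Action = std::function<void(uint64_t, const std::vector<uint64_t>&)>;

// A locality operates as a work queue server: it dequeues parcels, runs
// the specified action as a local thread, then lets the continuation
// migrate the locus of control onward as a further parcel.
void serve(std::deque<Parcel> queue,
           const std::unordered_map<uint32_t, Action>& actions) {
    while (!queue.empty()) {
        Parcel p = queue.front();
        queue.pop_front();
        actions.at(p.action)(p.target, p.arguments);   // invoke the local thread
        if (p.continuation != 0)                       // follow the continuation
            queue.push_back(Parcel{p.target, p.continuation, {}, 0});
    }
}

int main() {
    std::unordered_map<uint32_t, Action> actions = {
        {1, [](uint64_t t, const std::vector<uint64_t>&) {
                std::cout << "update object " << t << "\n"; }},
        {2, [](uint64_t t, const std::vector<uint64_t>&) {
                std::cout << "notify consumers of " << t << "\n"; }},
    };
    // One incoming parcel: apply action 1 to object 7, then continue with 2.
    serve({Parcel{7, 1, {42}, 2}}, actions);
}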

Local Control Objects (LCO): A rich set of synchronization primitives is provided to facilitate lightweight control and to exploit a diversity of parallelism. LCOs eliminate most uses of global barriers, greatly freeing the dynamic adaptive flexibility of parallel processing and relaxing the over constraining operation imposed by barriers. Dataflow synchronization, futures, and metathreads are examples of the kinds of LCOs that can be employed. They can stand alone or be incorporated in larger data structures to support coordinated multiple access to shared global data structures, such as directed graphs for knowledge management. Dataflow constructs allow true asynchronous value oriented flow control determined at compile time. Futures permit anonymous producer-consumer computing. "Depleted threads" (our term) provide a kind of temporary state storage for suspended threads.
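A minimal sketch of one such LCO, in the spirit of dataflow synchronization (our own illustration; the class name and interface are hypothetical, not ParalleX API): a trigger counts arriving operands and releases its dependent computation exactly when the last one lands, so no barrier and no polling are involved.

#include <atomic>
#include <cstddef>
#include <functional>
#include <iostream>
#include <vector>

// Dataflow-style LCO: fires its continuation once all inputs have arrived.
class DataflowTrigger {
    std::vector<double>      slots_;    // operand landing sites
    std::atomic<std::size_t> pending_;  // inputs still outstanding
    std::function<void(const std::vector<double>&)> continuation_;
public:
    DataflowTrigger(std::size_t arity,
                    std::function<void(const std::vector<double>&)> k)
        : slots_(arity), pending_(arity), continuation_(std::move(k)) {}

    // Producers deposit operands asynchronously, in any order; the final
    // arrival releases the suspended computation (no barrier, no polling).
    void deposit(std::size_t slot, double value) {
        slots_[slot] = value;
        if (pending_.fetch_sub(1) == 1)   // this was the last operand
            continuation_(slots_);
    }
};

int main() {
    DataflowTrigger sum2(2, [](const std::vector<double>& v) {
        std::cout << "fired: " << v[0] + v[1] << "\n";
    });
    sum2.deposit(1, 2.5);   // arrives first
    sum2.deposit(0, 1.5);   // second arrival fires the continuation
}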
Percolation: ParalleX provides a mechanism for moving work (both state and task descriptions) to unused parts of the system through a mechanism referred to (by us) as "percolation", which was also devised as a latency hiding mechanism. For a precious resource, overhead and latency can greatly degrade system efficiency. Percolation (developed by Sterling and Gao as part of the HTMT project) is a workflow strategy that employs ancillary mechanisms to prestage data and tasks in high speed memory near the high cost compute elements when a task is to be performed. It is a variation of parcels, but used with hardware as the target rather than abstract data objects. Prefetching is also a form of prestaging, but it is performed by the compute element itself, thus imposing the overhead burden, and possibly the impact of latency, on that element as well.

Echo: ParalleX does not assume cache coherence outside of the domain of the locality, even though it has a global name space. When a writable variable is to be used by many separate execution points during the same temporal interval, ParalleX may assert a copy semantics called (again by us) "echo". This construct identifies the tree of equivalent locations, all of which are to be operated upon as if they held a single value. It is inspired by location consistency (developed by Gao), although it employs a different model. Echo is a split phase operation. Using it requires that a thread defer committing side effects until it gets an acknowledgement that the value it used is the current one. This permits overlap between coherency verification and continued computation with the latest known value, thus reducing the apparent latency and increasing the available parallelism.

Parallel Processes: ParalleX differs from conventional distributed computing languages in that the notion of parallel processes is not just that there may be multiple processes being performed concurrently, but rather that each process may have many parts, either subprocesses or threads, running concurrently (or in parallel) as well, and distributed across many execution sites. Parallel processes can be object oriented in that, once instantiated, they can have additional messages incident upon them, invoking methods to create new instances in the form of threads (single locality) or processes (multiple localities).

2.3 LITL-X: A Subset of ParalleX on a Cellular Parallel Computer Architecture

As a part of our effort in studying the principles outlined for ParalleX in the previous section, we have moved on to studying LITL-X, a subset of ParalleX, through a prototype design and implementation effort at the University of Delaware. To be more concrete, we are working on a prototype programming API, LITL-X (pronounced "little-X", previously under the nickname Latency Intrinsic-Tolerant Language), which provides application programmers with a powerful set of semantic constructs to organize parallel computation in a way that hides/manages latency and limits the effects of overhead. This is quite different from locality management, although the intent of both strategies is to minimize the effect of latency on the efficiency of computation. Locality management attempts to avoid latency events by aggregating data for local computation and reducing large message communications. Latency management attempts to hide latency by overlapping communications with computation. A number of experimental languages exhibiting these attributes, at least to some degree, have been developed (e.g. Split-C [9], EARTH-C [7], SISAL [10], and UPC [1]).
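To make the percolation idea concrete (a sketch of our own; Task, FastBuffer, prestage_and_run, and the staging layout are hypothetical, not LITL-X API): an ancillary agent copies a task's code block and operands into fast memory near the compute element first, and only then enqueues the task, so the precious resource itself never waits on remote accesses.

#include <cstddef>
#include <deque>
#include <iostream>
#include <vector>

// Illustrative unit of percolation: a task plus the data it will touch.
struct Task {
    void (*entry)(const double*, std::size_t);  // code run on the compute element
    std::vector<double> remote_operands;        // operands in slow/remote memory
};

struct FastBuffer {                             // stand-in for high speed memory
    std::vector<double> data;
};

// The ancillary agent (not the compute element) pays the staging overhead.
FastBuffer copy_to_fast_memory(const std::vector<double>& src) {
    FastBuffer b;
    b.data = src;                               // model of the slow -> fast copy
    return b;
}

// Only fully prestaged work reaches the compute element's ready queue,
// so it runs at full speed with all operands already nearby.
struct Ready { void (*entry)(const double*, std::size_t); FastBuffer operands; };

void prestage_and_run(std::deque<Task> pending) {
    std::deque<Ready> ready;
    for (auto& t : pending)                     // percolate: stage data and task
        ready.push_back(Ready{t.entry, copy_to_fast_memory(t.remote_operands)});
    for (auto& r : ready)                       // execute without remote waits
        r.entry(r.operands.data.data(), r.operands.data.size());
}

int main() {
    prestage_and_run({Task{
        [](const double* d, std::size_t n) {
            double s = 0.0;
            for (std::size_t i = 0; i < n; ++i) s += d[i];
            std::cout << "task saw sum " << s << "\n";
        },
        {1.0, 2.0, 3.0}}});
}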

A version of LITL-X will be developed by extending TNT, a coarse-grain thread layer described elsewhere [3]. Under LITL-X, we are adding the following classes of parallel constructs to TNT for latency tolerance and for managing data movement overhead:

• Extending the thread model with the ability to launch and manage asynchronous calls, a feature that has been studied in the past at Delaware (under the EARTH model [13]) and elsewhere, such as in Cilk [4].

• Percolation [8] of program instruction blocks and data at the site of the intended computation, to eliminate waiting for remote accesses, which are determined at run time prior to actual block execution.

• Synchronization constructs for data-flow style operations, leveraging our past studies on EARTH [13].

• Atomic sections [12], a parallel programming construct that can simplify the use of fine-grained synchronization while delivering scalable parallelism by using a weak memory consistency model, such as location consistency [5] (see the sketch below).

LITL-X is not intended as a final programming language for end users, but rather as a logical testbed to prototype a set of promising concepts and to test their impact on system performance and efficiency.
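As a sketch of what an atomic section could look like to the programmer (our own illustration in C++ rather than LITL-X syntax; atomic_section is a hypothetical helper, here realized with an ordinary lock standing in for a weak consistency protocol):

#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: runs `body` as an atomic section. A real LITL-X
// implementation could rely on a weak memory consistency model such as
// location consistency [5]; a mutex models that machinery here.
template <typename Body>
void atomic_section(std::mutex& region_lock, Body body) {
    std::lock_guard<std::mutex> guard(region_lock);
    body();
}

int main() {
    std::mutex m;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([&] {
            for (int i = 0; i < 100000; ++i)
                atomic_section(m, [&] { ++counter; });  // fine-grained update
        });
    for (auto& w : workers) w.join();
    std::cout << counter << "\n";                        // prints 400000
}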
3 Gilgamesh II: Towards a New ParalleX Processing Architecture

While ParalleX may be able to support computing on conventional platforms, with superior operation for some classes of problems, one of its strengths is that it suggests innovative parallel computer architectures that may make exceptionally good use of future semiconductor technology through the implementation of execution elements designed explicitly for low power and parallel cooperative execution. As part of this and other ongoing projects, a point design, Gilgamesh II, is being devised and evaluated, in part, to validate the ParalleX execution model. Here the basic concept of the Gilgamesh parallel architecture point design is described.

3.1 Design Point Technology

It is unlikely that any truly innovative ideas will find their way into the supercomputers of the next few years, for the simple reason that the design cycle of such systems can be as long as 7 years, and a major vendor will routinely have two design generations underway and overlapping at the same time. For this reason, a longer timeline is assumed and a technology target date of 2020 is selected.

3.2 Architecture Concept Strategy

The concepts of the Gilgamesh II architecture reflect the requirements described above and the ParalleX model of computation. It strives to align supercomputer architecture with the cost imperatives implicit in the technology evolution. ALUs should become ubiquitous and be operated for high availability rather than high utilization, in support of other more precious resources. Memory access latency should be minimized and then hidden to achieve the highest memory throughput possible for a given application. System performance is likely to be limited by memory capacity and by system wide communication bandwidth and latency. The architecture should address these key challenges.

ParalleX is the execution model around which the Gilgamesh architecture design point is derived. The architecture incorporates hardware structures to support most of the mechanisms described before, to minimize overhead, maximize parallelism, and provide scalability while bounding power consumption. It uses the message-driven split-phase multithreaded transaction processing paradigm for latency hiding and for exploiting near fine grain parallelism through the use of in-memory synchronization objects. As shown in Figure 1, the architecture is heterogeneous, with two computing structures designed to operate best at the two modalities of operation determined by the degree of temporal locality. At high temporal locality (where cache hit rates would be highest on conventional processors), a streaming architecture based on dataflow control concentrates many ALUs, interconnected via local registers and 4-way multiplexers, to provide a high operation count for modest data rates. At low (or no) temporal locality (where cache hit rates would be very poor), an advanced processor-in-memory architecture called "MIND" has been developed to provide short latencies and very high memory bandwidth with in-memory threads. Like ParalleX, the architecture provides hardware support, including address translation, for a global name space without cache coherence and in support of the echo copy semantics. A single building block element is used to build up this highly parallel system. A peak performance in excess of 1 Exaflops is achievable with 100K chips. Each Gilgamesh chip is a heterogeneous multicore subsystem with a dataflow accelerator and 16 PIM modules, each with 32 MIND nodes. Each chip is capable of approximately 10 Teraflops, although the theoretical peak is substantially higher. While the main memory of the system is provided by the MIND modules, a DRAM backing store referred to (by us) as the "Penultimate Store" is included on an additional 100K chips, for a total memory capacity of 4 Petabytes. The system is assumed to be connected by the innovative Data Vortex network (invented by Coke Reed, Interactic Holdings). This design point is under investigation as one possible future architecture exploiting the advanced methods provided by the ParalleX execution model.
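These figures are mutually consistent (a simple check of our own, not stated in the original): with $10^5$ compute chips at roughly 10 Teraflops each,

\[ 10^5 \ \text{chips} \times 10^{13}\ \text{flops/chip} = 10^{18}\ \text{flops} = 1\ \text{Exaflops}, \]

and the 4 Petabyte Penultimate Store spread over an additional $10^5$ DRAM chips corresponds to $4 \times 10^{15} / 10^5 = 4 \times 10^{10}$ bytes, i.e. about 40 Gigabytes per backing store chip.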

4 Acknowledgments

We acknowledge support from the National Science Foundation (CNS-0509332) and the U.S. Department of Energy under Contract DE-AC02-06CH11357, as well as from IBM, ETI, and other government sponsors. We would also like to acknowledge the other members of the CAPSL group, who provide a stimulating environment for scientific discussions and collaborations.

Figure 1. Gilgamesh II: A New ParalleX Processing Architecture

References

[1] W. W. Carlson, J. M. Draper, D. E. Culler, K. Yelick, E. Brooks, and K. Warren. Introduction to UPC and language specification. CCS-TR-99-157, May 13, 1999.

[2] ClearSpeed. ClearSpeed CSX600. http://www.clearspeed.com/.

[3] J. del Cuvillo, W. Zhu, Z. Hu, and G. R. Gao. TiNy Threads: A thread virtual machine for the Cyclops64 cellular architecture. In Fifth Workshop on Massively Parallel Processing, Denver, Colorado, USA, April 2005.

[4] M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation, 1998.

[5] G. R. Gao and V. Sarkar. Location consistency, a new memory model and cache consistency protocol. IEEE Transactions on Computers, 49(8):798-813, August 2000.

[6] M. Gschwind, H. P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and T. Yamazaki. Synergistic processing in Cell's multicore architecture. IEEE Micro, 26(2):10-24, 2006.

[7] L. J. Hendren, X. Tang, Y. Zhu, G. R. Gao, X. Xue, H. Cai, and P. Ouellet. Compiling C for the EARTH multithreaded architecture. In Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT '96), pages 12-23, Boston, Massachusetts, October 1996.

[8] A. Jacquet, V. Janot, R. Govindarajan, C. Leung, G. Gao, and T. Sterling. Executable performance model and evaluation of high performance architectures with percolation. Technical Report 43, Newark, DE, Nov. 2002.

[9] A. Krishnamurthy, D. Culler, A. Dusseau, S. Goldstein, S. Lumetta, T. von Eicken, and K. Yelick. Parallel programming in Split-C. In Supercomputing '93 Proceedings, pages 262-273. IEEE Computer Society Press, November 1993.

[10] J. McGraw, S. Skedzielewski, S. Allan, O. Oldehoeft, J. Glauert, C. Kirkham, B. Noyce, and R. Thomas. SISAL: Streams and iteration in a single assignment language, language reference manual version 1.2. Lawrence Livermore National Laboratory, Mar. 1985.

[11] J. Reid. Co-array Fortran for full and sparse matrices. Lecture Notes in Computer Science, 2367, 2002.

[12] V. Sarkar and G. R. Gao. Analyzable atomic sections: Integrating fine-grained synchronization and weak consistency models for scalable parallelism. CAPSL Technical Memo 52, Feb. 9, 2004.

[13] K. B. Theobald. EARTH: An Efficient Architecture for Running Threads. PhD thesis, May 1999.

[14] K. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A. Krishnamurthy, P. Hilfinger, S. Graham, D. Gay, P. Colella, et al. Titanium: A high-performance Java dialect. Concurrency: Practice and Experience, 10(11-13):825-836, 1998.
