Future Generation Supercomputers II: A Paradigm for Cluster Architecture
N. Venkateswaran§, Deepak Srinivasan†, Madhavan Manivannan†, TP Ramnath Sai Sagar†, Shyamsundar Gopalakrishnan†, Vinoth Krishnan Elangovan†, Arvind M†, Prem Kumar Ramesh, Karthik Ganesan, Viswanath Krishnamurthy, Sivaramakrishnan

§ N. Venkateswaran, Founder Director, WAran Research FoundaTion (WARFT), Chennai, India. Email: [email protected]
† WARFT Research Trainee, 2005-2007
− Former WARFT Research Trainee; current status given at end of this paper

ACM SIGARCH Computer Architecture News, Vol. 35, No. 5, December 2007

Abstract

In part-I, a novel multi-core node architecture was proposed which, when employed in a cluster environment, would be capable of tackling the computational complexity associated with a wide class of applications. Furthermore, it was discussed that by appropriately scaling the architectural specifications, Teraops computing power could be achieved at the node level. In order to harness the computational power of such a node, we have developed an efficient application execution model with a competent cluster architectural backbone. In this paper we present the novel cluster paradigm, dealing with operating system design, parallel programming model and cluster interconnection network. Our approach in developing the competent cluster design revolves around an execution model to aid the execution of multiple applications simultaneously on all partitions of the cluster, leading to cost sharing across applications. This would be a major initiative towards achieving Cost-Effective Supercomputing.

1. Introduction

High performance monolithic clusters, having good performance and scalability, are becoming increasingly popular in the research community for their ability to cater to specific application requirements. The level of performance is characterized by the node architecture, network topology, compiler, parallel programming paradigm and operating system. Making better design choices would improve the execution time of large scale applications, which are currently predicted to be in Teraflop years. In this paper, we discuss the impact of these design choices on the application's performance and provide insights into a supercomputing model which would cater to the demands of the next generation grand challenge applications.

Performance analysis carried out by Adolfy et al. [1] on Blue Gene/L, Red Storm, and ASC Purple clearly shows that these machines, although significantly diverse along the aforementioned design parameters, offer good performance during "Grand Challenge Application" execution. But future generation applications might require close coupling of previously independent application models, as highlighted in NASA's report on Earth Science Vision 2030 [2], which involves simulations on coupled climate models, such as ocean, atmosphere, biosphere and solid earth. These kinds of hybrid applications call for simultaneous execution of the component applications, since the execution of different applications on separate clusters may not be prudent in the context of constraints laid down by cluster performance and operational cost.

There is hence a need to develop an execution model for cost effective supercomputing which will envisage simultaneous execution of multiple applications on all partitions of a single cluster (without sacrificing the performance of individual applications), unlike the current models in which different applications are executed in independent partitions of the cluster. Future supercomputing models should also address critical design aspects like reliability, fault tolerance and low power issues, which are increasingly becoming important design criteria. This paper, along with part-I [3], conjointly proposes a supercomputing model which is expected to offer a superior performance/cost ratio and deal with the rigors posed by the computational requirements of hybrid applications (composed of interdependent applications).

This execution model introduces new challenges in the cluster architecture and operating system design for handling the increased mapping complexity and tracking mechanisms during the execution of multiple applications. Also, the programming paradigm adopted should help exploit both node and cluster architecture characteristics and ease the programming complexities involved in application development. However, the support for execution of such diverse workloads encountered during simultaneous multiple application execution lies in the efficient design of the node architecture. In paper-I [3], we discuss the capability of MIP-based (Memory In Processor) heterogeneous multi-core node architectures to handle SMAG (Simultaneous Multiple AlGorithms) execution, aiding the proposed execution model by running traces of multiple applications in the same node.

The paper is organized into 4 sections. Section 2 discusses the scope for improvement in the design features of current generation clusters in order to meet the requirements of performance greedy hybrid applications, also taking into consideration the operating cost factor. Section 3 highlights a cluster model that incorporates all the architectural concepts proposed in section 2 and investigates its potential for cost effective execution of multiple applications. Section 4 addresses the ramifications of this model on performance, resource utilization profile and their influence on the performance/cost relation.

2. Design Characteristics of High Performance Clusters

Performance modeling has come a long way in helping researchers characterize cluster design to achieve expected performance. Different methodologies have evolved to accurately compare, analyze and predict the performance of various designs, and features such as the node and cluster architecture, operating system, and programming paradigm have been identified as playing dominant roles [4]. We discuss these design issues in high performance clusters and propose new directions for evolving a cluster model to meet the requirements of future generation applications.

2.1 Cluster Interconnection Network

The type of interconnect and the topology adopted affect the overall performance of the communication network. Conventional networks use wired network topologies supported by different technologies for implementing large scale HPC (High Performance Cluster) designs. The most popular choices for network interconnects are Fast Ethernet, Gigabit Ethernet, Myricom Myrinet, and InfiniBand.

The communication pattern of massive applications varies dynamically during execution, and each pattern can be served better by employing a particular interconnection topology. If it were possible to dynamically reconfigure the cluster topology to suit the communication requirements of the instant, it would greatly boost the performance of application execution on the cluster. Although many of the currently employed networks have been successful in satisfying the high bandwidth requirements, they are unable to meet the overwhelming degree of scalability required by hybrid application execution and the

But in the context of the proposed execution model, a new OS paradigm is required for handling the complexities associated with parallel mapping and data tracking of the huge amount of data associated with the different applications. In this scenario, the reliability of the operating system is of paramount importance as the integrity of IO data sequencing is critical, particularly when dealing with million node clusters. Thus the capability of the cluster to stomach the complexities involved in multiple applications' execution lies in an efficient OS design.

2.3 Parallel Programming Paradigm and Compiler

Current parallel programming languages are categorized into data parallel languages, explicit communication models and functional languages. These parallel languages stress either data parallelism, as in NESL and C* [5, 6], or the communication model, where the user is completely responsible for the creation and management of processes and their interactions with one another, as in Emerald or COOL [7]. No single language has been developed which can handle both data parallelism and the communication model efficiently.

With increasing complexity of the application, the programming model needs to be simple and expressive and also allow programmers to represent complex logic efficiently. For this, the Parallel Programming Language (PPL) model should be a simple and portable object-based form, so that it can be understood, modified and debugged more easily than its sequential counterpart. Such a PPL should have constructs capable of exploiting the level of parallelism inherently present in the application and matching the underlying architecture (ISA of the node architecture). A new PPL model for the MIP SCOC cluster incorporating all the above features will be discussed in section 3.6.

3. MODEL FOR NEXT GENERATION SUPERCOMPUTERS

In order to create a design space for supercomputers, the focus should also be on aspects like power, performance, cost and their related tradeoffs. In this section we present a conceptual model (fig. 1) for cluster design taking into consideration all the design issues discussed in section 2. The cluster model comprises MIP-paradigm based het-