Cluster Computing at a Glance
Chapter 1, by M. Baker and R. Buyya

High Performance Cluster Computing: Architectures and Systems
Book Editor: Rajkumar Buyya
Slides Prepared by: Hai Jin
Internet and Cluster Computing Center
http://www.buyya.com/cluster/

Chapter outline:
- Introduction
- Scalable Parallel Computer Architectures
- Towards Low Cost Parallel Computing and Motivations
- Windows of Opportunity
- A Cluster Computer and its Architecture
- Clusters Classifications
- Commodity Components for Clusters
- Network Services/Communications SW
- Cluster Middleware and Single System Image
- Resource Management and Scheduling (RMS)
- Programming Environments and Tools
- Cluster Applications
- Representative Cluster Systems
- Cluster of SMPs (CLUMPS)
- Summary and Conclusions

Introduction
- Need more computing power
- Improve the operating speed of processors & other components
  - constrained by the speed of light, thermodynamic laws, & the high financial costs of processor fabrication
- Connect multiple processors together & coordinate their computational efforts
  - parallel computers
  - allow the sharing of a computational task among multiple processors

How to Run Applications Faster?
- There are 3 ways to improve performance:
  - Work Harder
  - Work Smarter
  - Get Help
- Computer analogy
  - Work Harder: using faster hardware
  - Work Smarter: using optimized algorithms and techniques to solve computational tasks
  - Get Help: using multiple computers to solve a particular task


Era of Computing
- Rapid technical advances
  - the recent advances in VLSI technology
  - software technology: OS, PL, development methodologies, & tools
  - grand challenge applications have become the main driving force
- Parallel computing
  - one of the best ways to overcome the speed bottleneck of a single processor
  - good price/performance ratio of a small cluster-based parallel computer

Two Eras of Computing
(figure: the sequential and parallel computing eras)


Scalable Parallel Computer Architectures
- Taxonomy
  - based on how processors, memory & interconnect are laid out
  - Massively Parallel Processors (MPP)
  - Symmetric Multiprocessors (SMP)
  - Cache-Coherent Nonuniform Memory Access (CC-NUMA)
  - Distributed Systems
  - Clusters

Scalable Parallel Computer Architectures
- MPP
  - A large parallel processing system with a shared-nothing architecture
  - Consists of several hundred nodes with a high-speed interconnection network/switch
  - Each node consists of a main memory & one or more processors
  - Runs a separate copy of the OS
- SMP
  - 2-64 processors today
  - Shared-everything architecture
  - All processors share all the global resources available
  - A single copy of the OS runs on these systems


Scalable Parallel Computer Architectures
- CC-NUMA
  - a scalable multiprocessor system having a cache-coherent nonuniform memory access architecture
  - every processor has a global view of all of the memory
- Distributed systems
  - considered conventional networks of independent computers
  - have multiple system images, as each node runs its own OS
  - the individual machines could be combinations of MPPs, SMPs, clusters, & individual computers
- Clusters
  - a collection of workstations or PCs that are interconnected by a high-speed network
  - work as an integrated collection of resources
  - have a single system image spanning all its nodes

Key Characteristics of Scalable Parallel Computers
(comparison table)

Towards Low Cost Parallel Computing
- Parallel processing
  - linking together 2 or more computers to jointly solve some computational problem
  - since the early 1990s, an increasing trend to move away from expensive and specialized proprietary parallel supercomputers towards networks of workstations
  - the rapid improvement in the availability of commodity high performance components for workstations and networks
- Low-cost commodity supercomputing
  - from specialized traditional supercomputing platforms to cheaper, general purpose systems consisting of loosely coupled components built up from single or multiprocessor PCs or workstations
  - need for standardization of many of the tools and utilities used by parallel applications (e.g., MPI, HPF)

Motivations of using NOW over Specialized Parallel Computers
- Individual workstations are becoming increasingly powerful
- Communication bandwidth between workstations is increasing and latency is decreasing
- Workstation clusters are easier to integrate into existing networks
- Typical low user utilization of personal workstations
- Development tools for workstations are more mature
- Workstation clusters are a cheap and readily available alternative to specialized parallel computers
- Clusters can be easily grown

Trend
- Workstations with UNIX for science & industry vs PC-based machines for administrative work & word processing
- A rapid convergence in processor performance and kernel-level functionality of UNIX workstations and PC-based machines

Windows of Opportunities
- Parallel Processing
  - Use multiple processors to build MPP/DSM-like systems for parallel computing
- Network RAM
  - Use memory associated with each workstation as an aggregate DRAM cache
- Software RAID
  - Redundant array of inexpensive disks
  - Use the arrays of workstation disks to provide cheap, highly available, & scalable file storage
  - Possible to provide parallel I/O support to applications
- Multipath Communication
  - Use multiple networks for parallel data transfer between nodes

Cluster Computer and its Architecture
- A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource
- A node
  - a single or multiprocessor system with memory, I/O facilities, & OS
- A cluster
  - generally 2 or more computers (nodes) connected together
  - in a single cabinet, or physically separated & connected via a LAN
  - appears as a single system to users and applications
  - provides a cost-effective way to gain features and benefits

Cluster Computer Architecture
(figure)


Prominent Components of Cluster Computers (I)
- Multiple High Performance Computers
  - PCs
  - Workstations
  - SMPs (CLUMPS)
  - Distributed HPC Systems leading to Metacomputing

(figure slide: a head node connected through a switch to client nodes)


Prominent Components of Cluster Computers (II)
- High Performance Networks/Switches
  - Ethernet (10 Mbps), Fast Ethernet (100 Mbps)
  - InfiniBand (1-8 Gbps)
  - Gigabit Ethernet (1 Gbps)
  - SCI (Dolphin - MPI - 12 microsecond latency)
  - ATM
  - Myrinet (1.2 Gbps)
  - Digital Memory Channel
  - FDDI

Prominent Components of Cluster Computers (III)
- State of the art Operating Systems
  - Linux (Beowulf)
  - Microsoft NT (Illinois HPVM)
  - Sun Solaris (Berkeley NOW)
  - IBM AIX (IBM SP2)
  - HP UX (Illinois - PANDA)
  - Mach (microkernel-based OS) (CMU)
  - Cluster Operating Systems (Solaris MC, SCO Unixware, MOSIX (academic project))
  - OS gluing layers (Berkeley GLUnix)

Prominent Components of Cluster Computers (IV)
- Network Interface Card
  - Myrinet has NIC
  - InfiniBand (HBA)
  - User-level access support

Prominent Components of Cluster Computers (V)
- Fast Communication Protocols and Services
  - Active Messages (Berkeley)
  - Fast Messages (Illinois)
  - U-net (Cornell)
  - XTP (Virginia)


Prominent Components of Cluster Computers (VI)
- Cluster Middleware
  - Single System Image (SSI)
  - System Availability (SA) Infrastructure
  - Hardware
    - DEC Memory Channel, DSM (Alewife, DASH), SMP techniques
  - Kernel/Gluing Layers
    - Solaris MC, Unixware, GLUnix
  - Applications and Subsystems
    - Applications (system management and electronic forms)
    - Runtime systems (software DSM, PFS, etc.)
    - Resource management and scheduling software (RMS)
      - CODINE, LSF, PBS, NQS, etc.

Prominent Components of Cluster Computers (VII)
- Parallel Programming Environments and Tools
  - Threads (PCs, SMPs, NOW, ...)
    - POSIX Threads
    - Java Threads
  - MPI
    - Linux, NT, on many supercomputers
  - PVM
  - Software DSMs (Shmem)
  - Compilers
    - C/C++/Java
    - Parallel programming with C++ (MIT Press book)
  - RAD (rapid application development tools)
    - GUI based tools for PP modeling
  - Debuggers
  - Performance Analysis Tools
  - Visualization Tools

Prominent Components of Cluster Computers (VIII)
- Applications
  - Sequential
  - Parallel / Distributed (cluster-aware applications)
    - Grand Challenge applications
      - Weather Forecasting
      - Quantum Chemistry
      - Molecular Biology Modeling
      - Engineering Analysis (CAD/CAM)
      - ...
    - PDBs, web servers, data-mining

Key Operational Benefits of Clustering
- High Performance
- Expandability and Scalability
- High Throughput
- High Availability


Clusters Classification (I)
- Application Target
  - High Performance (HP) Clusters
    - Grand Challenge applications
  - High Availability (HA) Clusters
    - Mission critical applications

Clusters Classification (II)
- Node Ownership
  - Dedicated clusters
  - Non-dedicated clusters
    - Adaptive parallel computing
    - Communal multiprocessing


Clusters Classification (III)
- Node Operating System
  - Linux Clusters (e.g., Beowulf)
  - Solaris Clusters (e.g., Berkeley NOW)
  - NT Clusters (e.g., HPVM)
  - AIX Clusters (e.g., IBM SP2)
  - SCO/Compaq Clusters (Unixware)
  - Digital VMS Clusters
  - HP-UX Clusters
  - Microsoft Wolfpack Clusters

Clusters Classification (IV)
- Node Hardware
  - Clusters of PCs (CoPs)
  - Piles of PCs (PoPs)
  - Clusters of Workstations (COWs)
  - Clusters of SMPs (CLUMPs)


Clusters Classification (V)
- Levels of Clustering
  - Group Clusters (#nodes: 2-99)
    - Nodes are connected by a SAN like Myrinet
  - Departmental Clusters (#nodes: 10s to 100s)
  - Organizational Clusters (#nodes: many 100s)
  - National Metacomputers (WAN/Internet-based)
  - International Metacomputers (Internet-based, #nodes: 1000s to many millions)
    - Metacomputing
    - Web-based Computing
    - Agent Based Computing
    - Java plays a major role in web and agent based computing

Clusters Classification (VI)
- Node Configuration
  - Homogeneous Clusters
    - All nodes will have similar architectures and run the same OSs
  - Heterogeneous Clusters
    - All nodes will have different architectures and run different OSs

Commodity Components for Clusters (I)
- Processors
  - Intel processors
    - Pentium Pro and Pentium Xeon
    - AMD x86, Cyrix x86, etc.
  - Digital Alpha
    - Alpha 21364 processor integrates processing, memory controller, and network interface into a single chip
  - IBM PowerPC
  - Sun SPARC
  - SGI MIPS
  - HP PA
  - Berkeley Intelligent RAM (IRAM) integrates processor and DRAM onto a single chip

Commodity Components for Clusters (II)
- Memory and Cache
  - Standard Industry Memory Module (SIMM)
  - Extended Data Out (EDO)
    - Allows the next access to begin while the previous data is still being read
  - Fast page mode
    - Allows multiple adjacent accesses to be made more efficiently
  - Access to DRAM is extremely slow compared to the speed of the processor
    - the very fast memory used for cache is expensive, & cache control circuitry becomes more complex as the size of the cache grows
  - Within Pentium-based machines, it is not uncommon to have a 64-bit wide memory bus as well as a chip set that supports 2 Mbytes of external cache

Commodity Components for Clusters (III)
- Disk and I/O
  - Overall improvement in disk access time has been less than 10% per year
  - Amdahl's law
    - Speed-up obtained from faster processors is limited by the slowest system component
  - Parallel I/O
    - Carry out I/O operations in parallel, supported by a parallel file system based on hardware or software RAID

Commodity Components for Clusters (IV)
- System Bus
  - ISA bus (AT bus)
    - Clocked at 5 MHz and 8 bits wide
    - Clocked at 13 MHz and 16 bits wide
  - VESA bus
    - 32-bit bus matched to the system's clock speed
  - PCI bus
    - 133 Mbytes/s transfer rate (derived below)
    - Adopted both in Pentium-based PCs and non-Intel platforms (e.g., Digital Alpha Server)
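For reference, the 133 Mbytes/s PCI figure is simply the product of the bus width and clock of the original 32-bit, 33.3 MHz PCI specification:

$32\ \mathrm{bits} \times 33.3\ \mathrm{MHz} = 4\ \mathrm{bytes} \times 33.3 \times 10^{6}\ \mathrm{s^{-1}} \approx 133\ \mathrm{Mbytes/s}$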

Commodity Components for Clusters (V)
- Cluster Interconnects
  - Communicate over high-speed networks using a standard networking protocol such as TCP/IP or a low-level protocol such as AM
  - Standard Ethernet
    - 10 Mbps
    - cheap, easy way to provide file and printer sharing
    - bandwidth & latency are not balanced with the computational power
  - Ethernet, Fast Ethernet, and Gigabit Ethernet
    - Fast Ethernet: 100 Mbps
    - Gigabit Ethernet
      - preserves Ethernet's simplicity
      - delivers a very high bandwidth to aggregate multiple Fast Ethernet segments

Commodity Components for Clusters (VI)
- Cluster Interconnects
  - Asynchronous Transfer Mode (ATM)
    - Switched virtual-circuit technology
    - Cell (small fixed-size data packet)
    - uses optical fiber - expensive upgrade
    - telephone style cables (CAT-3) & better quality cable (CAT-5)
  - Scalable Coherent Interface (SCI)
    - IEEE 1596-1992 standard aimed at providing a low-latency distributed shared memory across a cluster
    - Point-to-point architecture with directory-based cache coherence
      - reduces the delay of interprocessor communication
      - eliminates the need for runtime layers of software protocol-paradigm translation
    - less than 12 usec zero message-length latency on Sun SPARC
    - Designed to support distributed multiprocessing with high bandwidth and low latency
    - SCI cards for SPARC's SBus and PCI-based SCI cards from Dolphin
    - Scalability constrained by the current generation of switches & relatively expensive components

Commodity Components for Clusters (VII)
- Cluster Interconnects
  - Myrinet
    - 1.28 Gbps full-duplex interconnection network
    - Uses low-latency cut-through routing switches, which are able to offer fault tolerance by automatic mapping of the network configuration
    - Supports both Linux & NT
    - Advantages
      - Very low latency (5 µs, one-way point-to-point)
      - Very high throughput
      - Programmable on-board processor for greater flexibility
    - Disadvantages
      - Expensive: $1500 per host
      - Complicated scaling: switches with more than 16 ports are unavailable

Commodity Components for Clusters (VIII)
- Operating Systems
  - 2 fundamental services for users
    - make the computer hardware easier to use
      - create a virtual machine that differs markedly from the real machine
    - share hardware resources among users
      - Processor - multitasking
  - The new concept in OS services
    - support multiple threads of control in a process itself
      - parallelism within a process
      - multithreading
    - POSIX thread interface is a standard programming environment
  - Trend
    - Modularity - MS Windows, IBM OS/2
    - Microkernel - provides only essential OS services
      - high-level abstraction of OS portability

Commodity Components for Clusters (IX)
- Operating Systems
  - Solaris
    - UNIX-based multithreading and multiuser OS
    - supports Intel x86 & SPARC-based platforms
    - Real-time scheduling feature critical for multimedia applications
    - Supports two kinds of threads
      - Light Weight Processes (LWPs)
      - User-level threads
    - Supports both BSD and several non-BSD file systems
      - CacheFS
      - AutoClient
      - TmpFS: uses main memory to contain a file system
      - Proc file system
      - Volume file system
    - Supports distributed computing & is able to store & retrieve distributed information
      - OpenWindows allows applications to be run on remote systems

Commodity Components for Clusters (X)
- Operating Systems
  - Linux
    - UNIX-like OS
    - Runs on cheap x86 platforms, yet offers the power and flexibility of UNIX
    - Readily available on the Internet and can be downloaded without cost
    - Easy to fix bugs and improve system performance
    - Users can develop or fine-tune hardware drivers, which can easily be made available to other users
    - Features such as preemptive multitasking, demand-paged virtual memory, multiuser and multiprocessor support


Commodity Components for Clusters (XI)
- Operating Systems
  - NT (New Technology)
    - Preemptive, multitasking, multiuser, 32-bit OS
    - Object-based security model and a special file system (NTFS) that allows permissions to be set on a file and directory basis
    - Supports multiple CPUs and provides multitasking using symmetrical multiprocessing
    - Supports different CPUs and multiprocessor machines with threads
    - Has the network protocols & services integrated with the base OS
      - several built-in networking protocols (IPX/SPX, TCP/IP, NetBEUI) & APIs (NetBIOS, DCE RPC, Windows Sockets (Winsock))

Windows NT 4.0 Architecture
(figure)


Network Services/Communication SW
- Communication infrastructure support protocols for
  - Bulk-data transport
  - Streaming data
  - Group communications
- Communication services provide the cluster with important QoS parameters
  - Latency
  - Bandwidth
  - Reliability
  - Fault-tolerance
  - Jitter control
- Network services are designed as a hierarchical stack of protocols with relatively low-level communication APIs, providing the means to implement a wide range of communication methodologies
  - RPC
  - DSM
  - Stream-based and message passing interfaces (e.g., MPI, PVM); see the TCP sketch below

Cluster Middleware & SSI
- SSI
  - Supported by a middleware layer that resides between the OS and the user-level environment
- Middleware consists of essentially 2 sublayers of SW infrastructure
  - SSI infrastructure
    - Glues together OSs on all nodes to offer unified access to system resources
  - System availability infrastructure
    - Enables cluster services such as checkpointing, automatic failover, recovery from failure, & fault-tolerant support among all nodes of the cluster
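As an illustration of the stream-based option listed under Network Services/Communication SW above, here is a minimal TCP client sketch in C; the peer address and port are placeholders and error handling is reduced to the essentials. TCP provides the reliable, ordered byte stream on which higher-level cluster communication layers are often built.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* Placeholder peer: another cluster node's address and service port. */
    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port = htons(5000);
    inet_pton(AF_INET, "192.168.1.2", &peer.sin_addr);

    int fd = socket(AF_INET, SOCK_STREAM, 0);        /* SOCK_STREAM = TCP */
    if (fd < 0 || connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        return 1;
    }
    const char *msg = "hello from node A\n";
    write(fd, msg, strlen(msg));                      /* reliable, ordered bytes */
    close(fd);
    return 0;
}
```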

What is Single System Image (SSI)?
- A single system image is the illusion, created by software or hardware, that presents a collection of resources as one, more powerful resource
- SSI makes the cluster appear like a single machine to the user, to applications, and to the network
- A cluster without a SSI is not a cluster

Single System Image Boundaries
- Every SSI has a boundary
- SSI support can exist at different levels within a system, one able to be built on another

SSI Boundaries - an application's SSI boundary
(figure: a batch system's SSI boundary; source: In Search of Clusters)

SSI Levels/Layers
- Application and Subsystem Level
- Operating System Kernel Level
- Hardware Level

SSI at Operating System Kernel (Underware) or Gluing Layer

Level             | Examples                                           | Boundary                                                | Importance
Kernel/OS layer   | Solaris MC, Unixware, MOSIX, Sprite, Amoeba/GLUnix | each name space: files, processes, pipes, devices, etc. | kernel support for applications, adm subsystems
Kernel interfaces | UNIX (Sun) vnode, Locus (IBM) vproc                | type of kernel objects: files, processes, etc.          | modularizes SSI code within kernel
Virtual memory    | none supporting operating system kernel            | each distributed virtual memory space                   | may simplify implementation of kernel objects
Microkernel       | Mach, PARAS, Chorus, OSF/1AD, Amoeba               | each service outside the microkernel                    | implicit SSI for all system services

(source: In Search of Clusters)

SSI at Hardware Layer

Level          | Examples            | Boundary                    | Importance
Memory         | SCI, DASH           | memory space                | better communication and synchronization
Memory and I/O | SCI, SMP techniques | memory and I/O device space | lower overhead cluster I/O

(source: In Search of Clusters)

SSI at Application and Subsystem Layer (Middleware)

Level       | Examples                                       | Boundary                                              | Importance
Application | cluster batch system, system management        | an application                                        | what a user wants
Subsystem   | distributed DB, OSF DME, Lotus Notes, MPI, PVM | a subsystem                                           | SSI for all applications of the subsystem
File system | Sun NFS, OSF DFS, NetWare, and so on           | shared portion of the file system                     | implicitly supports many applications and subsystems
Toolkit     | OSF DCE, Sun ONC+, Apollo Domain               | explicit toolkit facilities: user, service name, time | best level of support for heterogeneous system

(source: In Search of Clusters)

Single System Image Benefits
- Provides a simple, straightforward view of all system resources and activities, from any node of the cluster
- Frees the end user from having to know where an application will run
- Frees the operator from having to know where a resource is located
- Lets the user work with familiar interfaces and commands, and allows administrators to manage the entire cluster as a single entity
- Reduces the risk of operator errors, with the result that end users see improved reliability and higher availability of the system

Single System Image Benefits (Cont'd)
- Allows centralized/decentralized system management and control, avoiding the need for skilled administrators for system administration
- Presents multiple, cooperating components of an application to the administrator as a single application
- Greatly simplifies system management
- Provides location-independent message communication
- Helps track the locations of all resources so that there is no longer any need for system operators to be concerned with their physical location
- Provides transparent process migration and load balancing across nodes
- Improved system response time and performance

Middleware Design Goals
- Complete Transparency in Resource Management
  - Allow users to use a cluster easily without knowledge of the underlying system architecture
  - The user is provided with the view of a globalized file system, processes, and network
- Scalable Performance
  - Clusters can easily be expanded, and their performance should scale as well
  - To extract the maximum performance, the SSI service must support load balancing & parallelism by distributing workload evenly among nodes
- Enhanced Availability
  - Middleware services must be highly available at all times
  - At any time, a point of failure should be recoverable without affecting a user's application
    - Employ checkpointing & fault tolerance technologies
    - Handle consistency of data when replicated


SSI Support Services
- Single Entry Point
  - telnet cluster.myinstitute.edu
  - telnet node1.cluster.myinstitute.edu
- Single File Hierarchy: xFS, AFS, Solaris MC Proxy
- Single Management and Control Point: management from a single GUI
- Single Virtual Networking
- Single Memory Space: Network RAM / DSM
- Single Job Management: GLUnix, Codine, LSF
- Single User Interface: like a workstation/PC windowing environment (CDE in Solaris/NT); may use Web technology

Availability Support Functions
- Single I/O Space (SIOS)
  - any node can access any peripheral or disk device without knowledge of its physical location
- Single Process Space (SPS)
  - Any process on any node can create processes with cluster-wide process IDs, and they communicate through signals, pipes, etc., as if they were on a single node
- Checkpointing and Process Migration
  - Saves the process state and intermediate results in memory or to disk to support rollback recovery when a node fails
  - Process migration for dynamic load balancing among the cluster nodes


Resource Management and Scheduling (RMS)
- RMS is the act of distributing applications among computers to maximize their throughput
- Enables the effective and efficient utilization of the resources available
- Software components
  - Resource manager
    - Locating and allocating computational resources, authentication, process creation and migration
  - Resource scheduler
    - Queueing applications, resource location and assignment
- Reasons for using RMS
  - Provide an increased, and reliable, throughput of user applications on the systems
  - Load balancing
  - Utilizing spare CPU cycles
  - Providing fault-tolerant systems
  - Manage access to powerful systems, etc.
- Basic architecture of RMS: client-server system

Services provided by RMS
- Process Migration
  - Computational resource has become too heavily loaded
  - Fault tolerance concern
- Checkpointing
- Scavenging Idle Cycles
  - 70% to 90% of the time most workstations are idle
- Fault Tolerance
- Minimization of Impact on Users
- Load Balancing
- Multiple Application Queues

Some Popular Resource Management Systems

Commercial Systems
Project | URL
LSF     | http://www.platform.com/
CODINE  | http://www.genias.de/products/codine/tech_desc.html
Easy-LL | http://www.tc.cornell.edu/UserDoc/SP/LL12/Easy/
NQE     | http://www.cray.com/products/software/nqe/

Public Domain Systems
Project | URL
CONDOR  | http://www.cs.wisc.edu/condor/
GNQS    | http://www.gnqs.org/
DQS     | http://www.scri.fsu.edu/~pasko/dqs.html
PRM     | http://gost.isi.edu/gost-group/products/prm/
PBS     | http://pbs.mrj.com/

Programming Environments and Tools (I)
- Threads (PCs, SMPs, NOW, ...)
  - In multiprocessor systems
    - Used to simultaneously utilize all the available processors
  - In uniprocessor systems
    - Used to utilize the system resources effectively
  - Multithreaded applications offer quicker response to user input and run faster
  - Potentially portable, as there exists an IEEE standard for the POSIX threads interface (pthreads); see the sketch below
  - Extensively used in developing both application and system software
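A minimal POSIX threads sketch in C, illustrating the pthreads interface mentioned above; the thread count and the work each thread does are arbitrary, chosen only for illustration.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

/* Each worker computes a partial sum over its own slice of the index range. */
static void *worker(void *arg) {
    long id = (long)arg;
    long sum = 0;
    for (long i = id * 1000; i < (id + 1) * 1000; i++)
        sum += i;
    printf("thread %ld: partial sum = %ld\n", id, sum);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);   /* wait for all workers to finish */
    return 0;
}
```

On a uniprocessor the threads simply interleave; on an SMP node they can run on all available processors, which is exactly the distinction the slide draws.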

Programming Environments and Tools (II)
- Distributed Shared Memory (DSM) Systems
  - Message-passing
    - the most efficient, widely used, programming paradigm on distributed memory systems
    - complex & difficult to program
  - Shared memory systems
    - offer a simple and general programming model
    - but suffer from scalability limits
  - DSM on distributed memory systems
    - an alternative, cost-effective solution
    - Software DSM
      - Usually built as a separate layer on top of the communication interface
      - Takes full advantage of the application characteristics: virtual pages, objects, & language types are units of sharing
      - TreadMarks, Linda
    - Hardware DSM
      - Better performance, no burden on user & SW layers, fine granularity of sharing, extensions of the cache coherence scheme, & increased HW complexity
      - DASH, Merlin

Programming Environments and Tools (III)
- Message Passing Systems (MPI and PVM)
  - Allow efficient parallel programs to be written for distributed memory systems
  - 2 most popular high-level message-passing systems: PVM & MPI
  - PVM
    - both an environment & a message-passing library
  - MPI
    - a message passing specification, designed to be a standard for distributed memory parallel computing using explicit message passing
    - attempts to establish a practical, portable, efficient, & flexible standard for message passing
    - generally, application developers prefer MPI, as it is fast becoming the de facto standard for message passing (a minimal example follows below)
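A minimal MPI sketch in C showing the explicit message passing style the slide refers to: rank 0 sends one integer to rank 1 over the default communicator. This is generic MPI, not tied to any particular cluster described in the chapter.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* my process id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */

    if (size >= 2) {
        if (rank == 0) {
            value = 42;                     /* rank 0 sends a value to rank 1 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 of %d received %d\n", size, value);
        }
    }
    MPI_Finalize();
    return 0;
}
```

Typically launched with something like `mpirun -np 4 ./a.out`, one process per cluster node or CPU.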


Programming Environments and Tools (IV)
- Parallel Debuggers and Profilers
  - Debuggers
    - Very limited
    - HPDF (High Performance Debugging Forum), formed as a Parallel Tools Consortium project in 1996
      - Developed the HPD Version specification, which defines the functionality, semantics, and syntax for a command-line parallel debugger
    - TotalView
      - A commercial product from Dolphin Interconnect Solutions
      - The only widely available GUI-based parallel debugger that supports multiple HPC platforms
      - Only used in homogeneous environments, where each process of the parallel application being debugged must be running under the same version of the OS

Functionality of Parallel Debugger
- Managing multiple processes and multiple threads within a process
- Displaying each process in its own window
- Displaying source code, stack trace, and stack frame for one or more processes
- Diving into objects, subroutines, and functions
- Setting both source-level and machine-level breakpoints
- Sharing breakpoints between groups of processes
- Defining watch and evaluation points
- Displaying arrays and their slices
- Manipulating code variables and constants

Programming Environments and Tools (V)
- Performance Analysis Tools
  - Help a programmer understand the performance characteristics of an application
  - Analyze & locate parts of an application that exhibit poor performance and create program bottlenecks
  - Major components
    - A means of inserting instrumentation calls to the performance monitoring routines into the user's application
    - A run-time performance library that consists of a set of monitoring routines
    - A set of tools for processing and displaying the performance data
  - Issue with performance monitoring tools
    - Intrusiveness of the tracing calls and their impact on the application performance
    - Instrumentation affects the performance characteristics of the parallel application and thus provides a false view of its performance behavior

Performance Analysis and Visualization Tools

Tool    | Supports                                                  | URL
AIMS    | Instrumentation, monitoring library, analysis             | http://science.nas.nasa.gov/Software/AIMS
MPE     | Logging library and snapshot performance visualization    | http://www.mcs.anl.gov/mpi/mpich
Pablo   | Monitoring library and analysis                           | http://www-pablo.cs.uiuc.edu/Projects/Pablo/
Paradyn | Dynamic instrumentation, running analysis                 | http://www.cs.wisc.edu/paradyn
SvPablo | Integrated instrumentor, monitoring library and analysis  | http://www-pablo.cs.uiuc.edu/Projects/Pablo/
Vampir  | Monitoring library, performance visualization             | http://www.pallas.de/pages/vampir.htm
Dimemas | Performance prediction for message passing programs       | http://www.pallas.com/pages/dimemas.htm
Paraver | Program visualization and analysis                        | http://www.cepba.upc.es/paraver

Programming Environments and Tools (VI)
- Cluster Administration Tools
  - Berkeley NOW
    - Gathers & stores data in a relational DB
    - Uses a Java applet to allow users to monitor a system
  - SMILE (Scalable Multicomputer Implementation using Low-cost Equipment)
    - Called K-CAP
    - Consists of compute nodes, a management node, & a client that can control and monitor the cluster
    - K-CAP uses a Java applet to connect to the management node through a predefined URL address in the cluster
  - PARMON
    - A comprehensive environment for monitoring large clusters
    - Uses client-server techniques to provide transparent access to all nodes to be monitored
    - parmon-server & parmon-client

Need of more Computing Power: Grand Challenge Applications
- Solving technology problems using computer modeling, simulation and analysis
- Example areas (figure): Geographic Information Systems, Life Sciences, Aerospace, CAD/CAM, Digital Biology, Military Applications

Representative Cluster Systems (I)
- The Berkeley Network of Workstations (NOW) Project
  - Demonstrates building of a large-scale parallel computer system using mass-produced commercial workstations & the latest commodity switch-based network components
  - Interprocess communication
    - Active Messages (AM)
      - basic communication primitives in Berkeley NOW
      - a simplified remote procedure call that can be implemented efficiently on a wide range of hardware (an illustrative sketch follows below)
  - Global Layer Unix (GLUnix)
    - An OS layer designed to provide transparent remote execution, support for interactive parallel & sequential jobs, load balancing, & backward compatibility for existing application binaries
    - Aims to provide a cluster-wide namespace and uses Network PIDs (NPIDs) and Virtual Node Numbers (VNNs)

Architecture of NOW System
(figure)
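To make the active-message idea concrete, here is a small illustrative sketch in C. This is not the actual Berkeley AM API; the message layout, handler table, and function names are invented for illustration. The point it shows is that each incoming message carries the identity of a handler that runs immediately on arrival, so there is no separate receive-and-dispatch step in the critical path.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-the-wire message: a handler index plus a small payload. */
typedef struct {
    uint32_t handler_id;   /* which handler to run on the receiving node */
    uint32_t arg;          /* small inline argument */
} am_msg_t;

typedef void (*am_handler_t)(uint32_t arg);

static void incr_counter(uint32_t arg) { printf("counter += %u\n", arg); }
static void send_ack(uint32_t arg)     { printf("ack for request %u\n", arg); }

/* Handler table: the sender ships only an index, never a code address. */
static am_handler_t handlers[] = { incr_counter, send_ack };

/* Called by the (hypothetical) network layer when a message arrives:
 * the named handler runs directly, with no intermediate buffering. */
static void am_deliver(const am_msg_t *m) {
    handlers[m->handler_id](m->arg);
}

int main(void) {
    am_msg_t m = { 0, 5 };   /* pretend this just arrived from a remote node */
    am_deliver(&m);
    return 0;
}
```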

Representative Cluster Systems (II)
- The Berkeley Network of Workstations (NOW) Project
  - Network RAM
    - Allows free resources on idle machines to be utilized as a paging device for busy machines
    - Serverless
      - any machine can be a server when it is idle, or a client when it needs more memory than is physically available
  - xFS: Serverless Network File System
    - A serverless, distributed file system, which attempts to provide low-latency, high-bandwidth access to file system data by distributing the functionality of the server among the clients
    - The function of locating data in xFS is distributed by having each client responsible for servicing requests on a subset of the files
    - File data is striped across multiple clients to provide high bandwidth

Representative Cluster Systems (III)
- The High Performance Virtual Machine (HPVM) Project
  - Delivers supercomputer performance on a low-cost COTS system
  - Hides the complexities of a distributed system behind a clean interface
  - Challenges addressed by HPVM
    - Delivering high performance communication to standard, high-level APIs
    - Coordinating scheduling and resource management
    - Managing heterogeneity

HPVM Layered Architecture
(figure)

Representative Cluster Systems (IV)
- The High Performance Virtual Machine (HPVM) Project
  - Fast Messages (FM)
    - A high-bandwidth & low-latency communication protocol, based on Berkeley AM
    - Contains functions for sending long and short messages & for extracting messages from the network
    - Guarantees and controls the memory hierarchy
    - Guarantees reliable and ordered packet delivery as well as control over the scheduling of communication work
    - Originally developed on a Cray T3D & a cluster of SPARCstations connected by Myrinet hardware
    - Low-level software interface that delivers hardware communication performance
    - Higher-level interface layers offer greater functionality, application portability, and ease of use

Representative Cluster Systems (V)
- The Beowulf Project
  - Investigates the potential of PC clusters for performing computational tasks
  - Refers to a Pile-of-PCs (PoPC) to describe a loose ensemble or cluster of PCs
  - Emphasizes the use of mass-market commodity components, dedicated processors, and a private communication network
  - Achieves the best overall system cost/performance ratio for the cluster

Representative Cluster Systems (VI)
- The Beowulf Project
  - System Software
    - Grendel
      - the collection of software tools
      - resource management & support for distributed applications
    - Communication
      - through TCP/IP over Ethernet internal to the cluster
      - employs multiple Ethernet networks in parallel to satisfy the internal data transfer bandwidth required
      - achieved by 'channel bonding' techniques
    - Extends the Linux kernel to allow a loose ensemble of nodes to participate in a number of global namespaces
    - Two Global Process ID (GPID) schemes
      - Independent of external libraries
      - GPID-PVM, compatible with the PVM Task ID format & using PVM as its signal transport

Representative Cluster Systems (VII)
- Solaris MC: A High Performance Operating System for Clusters
  - A distributed OS for a multicomputer, a cluster of computing nodes connected by a high-speed interconnect
  - Provides a single system image, making the cluster appear like a single machine to the user, to applications, and to the network
  - Built as a globalization layer on top of the existing Solaris kernel
  - Interesting features
    - extends the existing Solaris OS
    - preserves the existing Solaris ABI/API compliance
    - provides support for high availability
    - uses C++, IDL, CORBA in the kernel
    - leverages Spring technology

Solaris MC Architecture
(figure)

Representative Cluster Systems (VIII)
- Solaris MC: A High Performance Operating System for Clusters
  - Uses an object-oriented framework for communication between nodes
    - Based on CORBA
    - Provides remote object method invocations
    - Provides object reference counting
    - Supports multiple object handlers
  - Single system image features
    - Global file system
      - Distributed file system, called ProXy File System (PXFS), provides a globalized file system without the need to modify the existing file system
    - Globalized process management
    - Globalized network and I/O

Cluster System Comparison Matrix

Project      | Platform                           | Communications                | OS                                                        | Other
Beowulf      | PCs                                | Multiple Ethernet with TCP/IP | Linux and Grendel                                         | MPI/PVM, Sockets and HPF
Berkeley NOW | Solaris-based PCs and workstations | Myrinet and Active Messages   | Solaris + GLUnix + xFS                                    | AM, PVM, MPI, HPF, Split-C
HPVM         | PCs                                | Myrinet with Fast Messages    | NT or Linux connection and global resource manager + LSF  | Java-fronted, FM, Sockets, Global Arrays, SHMEM and MPI
Solaris MC   | Solaris-based PCs and workstations | Solaris-supported             | Solaris + Globalization layer                             | C++ and CORBA

Cluster of SMPs (CLUMPS)
- Clusters of multiprocessors (CLUMPS)
  - To be the supercomputers of the future
  - Multiple SMPs with several network interfaces can be connected using high performance networks
  - 2 advantages
    - Benefit from the high performance, easy-to-use-and-program SMP systems with a small number of CPUs
    - Clusters can be set up with moderate effort, resulting in easier administration and better support for data locality inside a node

Hardware and Software Trends
- Network performance increase of tenfold using 100BaseT Ethernet with full duplex support
- The availability of switched network circuits, including full crossbar switches for proprietary network technologies such as Myrinet
- Workstation performance has improved significantly
  - Improvement of microprocessor performance has led to the availability of desktop PCs with the performance of low-end workstations, at significantly lower cost
- Performance gap between supercomputers and commodity-based clusters is closing rapidly
- Parallel supercomputers are now equipped with COTS components, especially microprocessors
- Increasing usage of SMP nodes with two to four processors
- The average number of transistors on a chip is growing by about 40% per annum
- The clock frequency growth rate is about 30% per annum

Advantages of using COTS-based Cluster Systems
- Price/performance when compared with a dedicated parallel supercomputer
- Incremental growth that often matches yearly funding patterns
- The provision of a multipurpose system

Technology Trend
(figure)


Computing Platforms Evolution: Breaking Administrative Barriers
(figure: performance vs. administrative barriers - individual, group, department, campus, state, national, globe, inter-planet, universe - as platforms evolve from the local desktop (single processor) through SMPs or supercomputers and enterprise clusters to global and inter-planet clusters/grids)

What are key attributes in HPC?
- Computation Time
- Communication Time
