PlayStation 3 Cluster Computing

Benjamin Wheeler
Zachary Sigmund

Outline

● Cluster Computing Overview
  ○ Historical background
  ○ Theory/Concepts
  ○ Characteristics of Clusters
  ○ Interconnects
  ○ Communication
  ○ Cluster Management

Outline

● PlayStation 3 Clusters
  ○ Cell Microprocessor
  ○ Programming Models
  ○ Constructing a PlayStation 3 Cluster
  ○ Existing PlayStation 3 Clusters
  ○ Applications
  ○ Limitations

Cluster Computing Overview: Historical Background

● Computer cluster
  ○ set of computers that work together to solve a single problem
● When was the first computer cluster created?
  ○ depends on the definition of "work together"
  ○ First commodity-network based computer cluster, ARPANET, created in 1969
    ■ Later became the Internet
  ○ First commercial clustering products
    ■ ARCnet, developed by Datapoint in 1977
    ■ VAXcluster, developed by DEC in 1984

Cluster Computing Overview: Theory/Concepts

● Clusters arose as a viable answer to the demand for more computing power and better reliability, made practical by
  ○ readily available computing nodes
  ○ fast local area networks
● Clusters rely on a centralized management approach
  ○ nodes act as shared servers
  ○ different from distributed approaches such as peer-to-peer or grid computing

Cluster Computing Overview: Theory/Concepts

● A simple approach for building a cluster is to construct a Beowulf cluster
  ○ connect inexpensive personal computers to create a high-performance system
  ○ all nodes are identical, commodity-grade computers
● A more complex approach is to combine different types of nodes (a heterogeneous cluster)
  ○ highly reactive systems that interact with other environments
  ○ take advantage of specialized resources to improve performance
  ○ heterogeneous clusters are still experimental

Cluster Computing Overview: Characteristics of Clusters

● Low cost
  ○ can be created using commercial off-the-shelf nodes
● Elasticity
  ○ hardware can be added and removed as needed
● Loosely coupled
  ○ nodes share resources (e.g. a common home directory)
● Nodes must trust each other
  ○ an SSH password prompt would require manual startup on each machine, so nodes need password-less login to one another
● Message passing software installed
  ○ nodes must be able to communicate to run parallel programs correctly

Cluster Computing Overview: Interconnects

● Gigabit Ethernet
  ○ Low cost
  ○ Bandwidth: < 100 MB/s, Latency: < 100 µs
● InfiniBand
  ○ High performance
  ○ Bandwidth: 850 MB/s, Latency: < 7 µs
● Myrinet
  ○ High performance
  ○ Bandwidth: 230 MB/s, Latency: 10 µs
● Scalable Coherent Interface (SCI)
  ○ High performance
  ○ Bandwidth: < 320 MB/s, Latency: 1-2 µs

Cluster Computing Overview: Communication
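
Looking back at the interconnect figures (Gigabit Ethernet through SCI), one rough way to compare them is the common first-order cost model, time = latency + size / bandwidth. The sketch below is only an illustration: the model itself, the function name `transfer_time_us`, and the choice to plug in each quoted limit as if it were an exact value (2 µs for SCI's 1-2 µs range) are assumptions, not something the slides state.

```python
# Figures quoted on the Interconnects slide, used at their stated limits:
# (bandwidth in MB/s, latency in microseconds).
INTERCONNECTS = {
    "Gigabit Ethernet": (100.0, 100.0),
    "InfiniBand": (850.0, 7.0),
    "Myrinet": (230.0, 10.0),
    "SCI": (320.0, 2.0),
}


def transfer_time_us(size_bytes, bandwidth_mb_s, latency_us):
    """Estimate one-way message time as latency + size / bandwidth (assumed model)."""
    # 1 MB/s is 1 byte per microsecond when 1 MB is taken as 10**6 bytes.
    return latency_us + size_bytes / bandwidth_mb_s


for name, (bandwidth, latency) in INTERCONNECTS.items():
    small = transfer_time_us(1_000, bandwidth, latency)
    large = transfer_time_us(10_000_000, bandwidth, latency)
    print(f"{name:17s} 1 kB: {small:8.1f} us   10 MB: {large:10.1f} us")
```

Under this model, small messages are dominated by latency and large ones by bandwidth, which is why both numbers appear for every interconnect.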

● Communication between nodes is achieved through the use of message passing
  ○ Parallel Virtual Machine (PVM)
    ■ set of software libraries that provides an environment for message passing, task and resource management, and fault notification
  ○ Message Passing Interface (MPI)
    ■ has emerged as the "standard" for message passing
    ■ a specification rather than a set of libraries
    ■ implemented in systems such as MPICH and Open MPI

Cluster Computing Overview: Cluster Management

● Task scheduling
  ○ can be centralized or decentralized
  ○ schedule jobs as resources become available
  ○ Examples: Portable Batch System, Condor, SLURM
● Node failure management
  ○ When a node fails, the rest of the system must remain operational
    ■ power fencing: use a power controller to turn the failed node off
    ■ resource fencing: revoke access to resources without powering off the node

PlayStation 3 Computer Clusters: Cell Microprocessor Background
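
For the task-scheduling bullet on the Cluster Management slide ("schedule jobs as resources become available"), a minimal sketch of a centralized scheduler follows. It is a toy first-come, first-served simulation, not how Portable Batch System, Condor, or SLURM work internally; the function name `schedule` and the job durations are hypothetical.

```python
import heapq


def schedule(job_durations, node_count):
    """Centralized FIFO scheduler: give each job to whichever node frees up first."""
    # Heap of (time the node becomes free, node id); every node starts idle at t = 0.
    free_at = [(0.0, node) for node in range(node_count)]
    heapq.heapify(free_at)
    assignments = []
    for job, duration in enumerate(job_durations):
        start, node = heapq.heappop(free_at)  # earliest available node
        finish = start + duration
        assignments.append((job, node, start, finish))
        heapq.heappush(free_at, (finish, node))
    makespan = max((finish for _, _, _, finish in assignments), default=0.0)
    return assignments, makespan


assignments, makespan = schedule([4, 2, 3, 1, 5], node_count=2)
for job, node, start, finish in assignments:
    print(f"job {job} -> node {node}: t={start} to t={finish}")
print("makespan:", makespan)
```

A decentralized scheduler would replace the single heap with per-node decisions; this sketch covers only the centralized case.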

● Developed by "STI" - Sony, Toshiba, and IBM
● Designed to bridge the gap between conventional desktop processors and more specialized high-performance processors
● Intended for use in HD displays and recording equipment
● Also well-suited to digital imaging and physical simulation

PlayStation 3 Computer Clusters: Cell Microprocessor

● PPE (Power Processor Element)
  ○ Controller for SPEs
● 8x SPEs (Synergistic Processing Elements)
  ○ Designed for vectorized floating point code
  ○ Only 6 available to applications on the PS3

PlayStation 3 Computer Clusters: Cell Microprocessor
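
To put a number on "designed for vectorized floating point code" with 6 usable SPEs, here is a back-of-the-envelope estimate of theoretical single-precision peak. The clock rate, SIMD width, and the one-multiply-add-per-cycle issue rate are assumptions brought in for the calculation; none of them appear on the slides, and sustained performance will be lower.

```python
CLOCK_GHZ = 3.2      # assumed Cell clock rate in the PS3 (not stated on the slides)
SIMD_LANES = 4       # assumed: four 32-bit floats per 128-bit SPE register
FLOPS_PER_FMA = 2    # assumed: a fused multiply-add counts as two operations
USABLE_SPES = 6      # from the slide: only 6 of the 8 SPEs are available


def peak_gflops(spes, clock_ghz=CLOCK_GHZ):
    """Theoretical single-precision peak, assuming one SIMD multiply-add per SPE per cycle."""
    return spes * clock_ghz * SIMD_LANES * FLOPS_PER_FMA


print(f"per SPE: {peak_gflops(1):.1f} GFLOPS")
print(f"{USABLE_SPES} SPEs: {peak_gflops(USABLE_SPES):.1f} GFLOPS")
```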

● EIB (Element Interconnect Bus)
  ○ 16-byte circular ring
  ○ Theoretical peak of 204.8 GB/s
● FlexIO
  ○ Gigabit Ethernet

PlayStation 3 Computer Clusters: Cell Microprocessor Optimization

● Offload work to the SPEs
  ○ PPE best used as control processor
● Minimize synchronization events
  ○ Static partitioning
● Be wary of data type differences
  ○ 32-bit vs 64-bit longs/pointers
● Use an appropriate compiler
  ○ Microcoded opcodes cause hardware stalls

PlayStation 3 Computer Clusters: Programming Models - Taxonomy

● Functional offload model
  ○ main application executes on PPE
  ○ offload performance-critical computations to SPE
● Device extension model
  ○ SPE provides services normally offered by a device
● Computational acceleration model
  ○ SPE-centric
  ○ PPE acts as a system service facility

PlayStation 3 Computer Clusters: Programming Models - Taxonomy
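
The functional offload model (main application on the PPE, performance-critical kernels handed to the SPEs) can be sketched structurally in ordinary Python. This is an analogy only: a thread pool stands in for the SPEs, the names `dot_chunk` and `offloaded_dot` are hypothetical, and nothing here uses a Cell SDK or gains real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def dot_chunk(pair):
    """Performance-critical kernel that the main program hands off (the 'SPE' role)."""
    xs, ys = pair
    return sum(x * y for x, y in zip(xs, ys))


def offloaded_dot(a, b, workers=6):
    """Main application (the 'PPE' role): partition the data, offload, combine results."""
    # Static partitioning: one contiguous slice per worker.
    step = max(1, -(-len(a) // workers))  # ceiling division
    chunks = [(a[i:i + step], b[i:i + step]) for i in range(0, len(a), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(dot_chunk, chunks))


a = list(range(12))
b = [2] * 12
print(offloaded_dot(a, b))
```

The controlling code does no arithmetic of its own beyond the final sum, which mirrors the advice to use the PPE mainly as a control processor.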

● Streaming model
  ○ SPEs arranged in a pipeline
  ○ Each SPE performs a different computation on received data
● Shared-memory multiprocessor model
  ○ DMA to shared memory
  ○ PPE and SPE in same address space
● Asymmetric runtime model
  ○ extremely flexible
  ○ an alternative to full preemptive task switching is needed (too costly)

PlayStation 3 Computer Clusters: Programming Models
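
The streaming model, in which each SPE applies a different computation to the data it receives and passes the result on, has the same shape as a chain of generators. The sketch below is a single-process illustration under that analogy; the three stages and their parameters are hypothetical and were chosen only to make the data flow visible.

```python
def scale(stream, factor):
    """Stage 1: multiply every sample by a constant."""
    for x in stream:
        yield x * factor


def offset(stream, delta):
    """Stage 2: add a constant to every sample."""
    for x in stream:
        yield x + delta


def clamp(stream, low, high):
    """Stage 3: limit every sample to the range [low, high]."""
    for x in stream:
        yield max(low, min(high, x))


def run_pipeline(samples):
    # Chain the stages so each one consumes the previous stage's output.
    return list(clamp(offset(scale(samples, 2), -3), 0, 10))


print(run_pipeline([0, 1, 4, 8]))  # prints [0, 0, 5, 10]
```

On real hardware the stages would run concurrently on separate SPEs; generators only reproduce the ordering of the computations, not the overlap.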

● CorePy
  ○ provides an API for creating SPU and PPU programs using Python
  ○ BLASTp bioinformatics algorithm
● RapidMind
  ○ provides extensions to common languages such as C and C++
  ○ RTT RealTrace

PlayStation 3 Computer Clusters: Programming Models

● MPI MicroTask
  ○ optimize scheduling of computation and communication
  ○ microtasks are virtual SPEs that communicate using MPI
● Mercury Multi-Core Framework
  ○ focuses on exploiting
  ○ PPE takes role of manager while SPEs act as workers

PlayStation 3 Computer Clusters: Constructing a PlayStation 3 Cluster

● Use the "Other OS" feature
  ○ This was removed in a firmware update
● Install Fedora Linux
  ○ The hard drive will need to be partitioned
● Install Open MPI
● Network the PS3s
  ○ This uses a standard Ethernet network
● Create an NFS share
  ○ This is how the nodes share files
● Run

PlayStation 3 Computer Clusters: Existing PlayStation 3 Clusters

● Air Force
  ○ 1700 PS3s
  ○ 500 TeraFLOPS
● North Carolina State
  ○ 8 PS3s
● University of Massachusetts
  ○ 16 PS3s
  ○ "Gravity Grid"
● Folding@home
● Single PS3
  ○ Bruteforce MD5

PlayStation 3 Computer Clusters: Applications

● Fluid flow simulation
● Dense linear algebra
  ○ Matrix multiplication (SUMMA algorithm)
● FFTs
● Variational optical flow
● Elliptic curve cryptography

PlayStation 3 Computer Clusters: Applications - "Gravity Grid"
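
For the matrix multiplication entry in the applications list, the structure of SUMMA can be shown with a small single-process simulation. This is a sketch under simplifying assumptions, not a cluster implementation: the "processes" are just indices into a grid of blocks, the row and column broadcasts are modelled by reading the owning block directly, one full block is used per step, and the matrix size must divide evenly by the grid size. All function names are hypothetical.

```python
def matmul(x, y):
    """Plain triple-loop product of two small dense matrices (lists of rows)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def add_into(c, d):
    """Accumulate d into c element by element."""
    for row_c, row_d in zip(c, d):
        for j, value in enumerate(row_d):
            row_c[j] += value


def split_blocks(m, grid):
    """Cut a square matrix into grid x grid blocks; blocks[i][j] belongs to process (i, j)."""
    b = len(m) // grid
    return [[[row[j * b:(j + 1) * b] for row in m[i * b:(i + 1) * b]]
             for j in range(grid)] for i in range(grid)]


def summa(a, b, grid):
    """Simulate SUMMA on a grid x grid process mesh."""
    bs = len(a) // grid
    a_blk, b_blk = split_blocks(a, grid), split_blocks(b, grid)
    c_blk = [[[[0] * bs for _ in range(bs)] for _ in range(grid)] for _ in range(grid)]
    for k in range(grid):
        for i in range(grid):
            for j in range(grid):
                # Process (i, j) receives A[i][k] from its row broadcast and
                # B[k][j] from its column broadcast, then updates its local C block.
                add_into(c_blk[i][j], matmul(a_blk[i][k], b_blk[k][j]))
    # Reassemble the distributed C blocks into one matrix.
    return [sum((c_blk[i][j][r] for j in range(grid)), [])
            for i in range(grid) for r in range(bs)]


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
print(summa(a, b, grid=2) == matmul(a, b))
```

In a real run each `(i, j)` iteration would execute on its own node, with the broadcasts carried by MPI over the cluster network.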

● 16 PS3s
● Binary black hole coalescence
● Gravitational waves produced by merger of two black holes
● Wave-equation solver
● Performance comparable to nearly 100 Xeon processor cores

PlayStation 3 Computer Clusters: Applications - Folding@home

● Simulates protein folding
● Used for medical research
  ○ Alzheimer's
  ○ Huntington's
  ○ Cancer
  ○ Drug design

PlayStation 3 Computer Clusters: Limitations

● Virtualization layer
  ○ Hardware only accessible through hypervisor
● Main memory access rate
  ○ 24 FP operations per memory transfer
● Network interconnect speed
  ○ Gigabit Ethernet is a bottleneck compared to Cell
● Main memory size
  ○ 256 MB of main memory is relatively little
● Double-precision (DP) performance
  ○ DP is significantly slower than single precision (SP)
● Programming paradigm
  ○ Writing efficient, fast code for Cell can be difficult

Conclusions

● Computer clusters provide low-cost, modular solutions to complex problems
● The PlayStation 3 is a good choice for a potential node in a cluster due to its low cost and the advantages of the Cell microprocessor
● PlayStation 3 computer clusters are good solutions for a variety of applications
● The limitations of the PlayStation 3 ultimately mean that for any application, there is probably a better solution