Building a Distributed Supercomputer Using Cell Processors
PS3GRID.NET: Building a distributed supercomputer using Cell processors
Gianni De Fabritiis
Computational Biochemistry and Biophysics Laboratory (GRIB), Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB)
[email protected]

Molecular simulation (MS) methods will mature in the next few years to the point where they can simulate the time scales of biological interest, drastically changing their importance in biological discovery.

Gianni De Fabritiis, University Pompeu Fabra

Why not now?
- MD: Newton's equations
- Characteristic scales
  - Timescale: 10^-15 s
  - Lengthscale: 10^-10 m
- Application scales (biology)
  - Micro- to milliseconds
  - 10-100 nm
- Current resolution:
  - Nanoseconds
  - 10 nm
[Figure: helical dimer (PDB: 1JNO); lipids not shown]

Where are we going?
[Figure; source: NVIDIA]

Accelerator processors: A new challenging era for computational science
We are at a critical point in the evolution of microprocessors: single core => multi-core
- Every CPU maker (IBM, Intel, AMD, NVIDIA) is working on heavily multi-core chips handling hundreds of threads.
- A different memory architecture is needed => IBM Cell processor, GPUs
[Figure labels: Data (memory); Memory bottleneck; Computation (CPUs)]

Cell Processor
- 9 cores (1 PPE + 8 SPEs)
- Only $80 per chip
- 230 Gflops in single precision
- The 8 SPE cores are specialized for number crunching with local memory access.
- Altivec-like, memory alignment, async DMA, mailboxes, signals, memory mapped (16 SPEs in shared mem)

Sony-Toshiba-IBM Cell Processor
- Available: IBM blades (Roadrunner, 16,000 Cell) and PlayStation3
- Software barriers:
  - Codes do not automatically run faster
  - Vector programming techniques are required
  - Memory alignment issues
  - Different memory model

Cell MD for molecular simulations
- Simple recompilation runs slower than on a standard processor
- With the 8 SPEs, performance is 19 times faster (30 Gflops sustained)
- In a real code like this one, it is possible to issue 1.4 instructions per clock (theoretical maximum of 2).
G. De Fabritiis, Performance of the Cell processor for biomolecular simulations, Comp. Phys. Commun. 176, 670 (2007).

Performance on PS3s
[Figure]

Speed-up
- The speed-up compared to a 2 GHz Opteron is currently 19 times, and it scales even at smaller problem sizes. This system would run on at least 6 Cell processors, for a 120-CPU equivalent.

Steered molecular dynamics protocol for potential of mean force calculations

Free energy along a reaction coordinate (PMF)
- Def: reaction coordinate (e.g. z)
- Density (equilibrium quantity)
- Free energy difference along a reaction coordinate (potential of mean force)

Non-equilibrium molecular dynamics
- K+ (or water) needs to displace the water file in the channel
- We pull the ion using a harmonic restraint potential
- Pulling speed v of the order of Å/ns
- Z = z_0 + v t
- V(z,t) = (k/2) (z - z_0 - v t)^2

Ion permeation
- We are running simulations of K+ ions crossing the channel, using the Crooks formula to recover the equilibrium free energy from non-equilibrium MD.

Typical tasks to run this protocol before ps3grid.net
- [get cpu hours] (annoying)
- Prepare inputs (expensive)
- Upload inputs
- Submit 40-50 runs, each on 32 cores
- On MN waiting time is very low
- Monitor runs
- Download results
- Consumed 30,000 CPU hours

PS3GRID: Building a distributed supercomputer using the PlayStation3

Berkeley Open Infrastructure for Network Computing (BOINC)
- SETI@home

PS3GRID goal
- We are creating a machinery for medium-throughput free energy calculations, including molecule translocation and binding affinity calculations on a relatively large number of molecular targets. The machinery comprises:
  - High performance molecular simulation codes, CellMD (19x) [end 2006]
  - Computational protocols
  - Hardware, Cell processors
  - Work distribution server
  - Analysis tools
  - Web-based submission and control tools

www.ps3grid.net

Delivery: PS3GRID LIVE PENDRIVE

PS3GRID: Server side
- BOINC based
  - Many years of development
  - Very stable and reliable
  - Large user base
- Cell MD modified for PS3GRID
- New submission models implemented to adapt to MD simulations (workflows, etc)
[Diagram: scientists submit input files and analyze results; BOINC server and database; each PS3 will 1) request jobs, 2) compute, 3) return results]

PS3GRID: Execution models
[Diagram labels: Workflow; Task farming]
- Each line is an execution unit in a user machine
- The standard execution model is task farming
- The advanced execution model in PS3GRID is a workflow of runs for longer simulations. Other execution modes, like graphs, are possible.

Ps3grid => a grid of clusters
- Ps3grid innovated the way a BOINC project runs. Before, BOINC was used for high-throughput, fine-grained jobs [thousands of light tasks in parallel].
Full-atom molecular simulations (MS) required at least some interconnect between nodes, because a typical job needs to run on 32-128 cores. Accelerated hardware like Cell and GPUs allowed us to move to a single machine, so the BOINC approach becomes viable for MS as well.

A numerical experiment
- Previous pulling experiments consisted of 20-40 runs, each on 32 cores. Now (ps3grid.net), we run 200-400 runs for each experiment:
  - Prepare inputs
  - Submit them
  - Copy out of server for analysis
- A shift in mentality: computation became a commodity good

Some numbers (Jan 08)
- So far, in the last 6 months: 5000 trajectories, over 4 microseconds of molecular simulations, over 200 years of CPU time. Each trajectory takes 24 hours on a Cell, approx. 16 days on a single PC core.
- Daily output of over 100 trajectories
- We receive over 5 GB of data per day
- BOINC is very reliable; server-caused downtime is very low (a few days/year).

Avoiding the human bottleneck: PS3GRID science portal

Conclusion
- The Cell processor is a valuable alternative to standard processors. It is likely that this and GPU technologies will become dominant in computational science very soon.
- FR-SMD is effective in recovering the PMF for Gramicidin A
  - Simple protocol
  - Little human intervention
  - Possible semi-automation for release to third parties
- PS3GRID forces a conceptual shift on us: computational resources become a commodity good

Acknowledgments
- G. Giupponi (UPF, Barcelona)
- M. Harvey (Imperial College, London)
- J. Villà-Freixa (IMIM, Barcelona)
- Universitat Pompeu Fabra, Barcelona, Spain
- Ramon y Cajal program
[email protected]
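
Note on the pulling protocol: the non-equilibrium molecular dynamics slide gives the moving harmonic restraint V(z,t) = (k/2)(z - z_0 - v t)^2 with centre Z = z_0 + v t. The following is a minimal Python sketch of that potential, the force it exerts on the pulled ion, and the external work accumulated along one trajectory. The function names, the left-endpoint sum, and all numeric values are illustrative assumptions and are not taken from CellMD or from the talk; the work definition W = sum of (dV/dt) dt at fixed z is the one commonly used for steered MD.

```python
# Sketch of the moving harmonic restraint from the steered-MD slide:
#   Z(t)    = z0 + v*t                    (restraint centre)
#   V(z, t) = (k/2) * (z - z0 - v*t)**2
# All names and numeric values are illustrative assumptions.

def restraint_center(z0, v, t):
    """Position of the restraint centre at time t."""
    return z0 + v * t


def restraint_energy(z, t, k, z0, v):
    """Harmonic restraint energy V(z, t)."""
    d = z - restraint_center(z0, v, t)
    return 0.5 * k * d * d


def restraint_force(z, t, k, z0, v):
    """Force on the pulled ion, F = -dV/dz."""
    return -k * (z - restraint_center(z0, v, t))


def pulling_work(positions, dt, k, z0, v):
    """External work along one trajectory (left-endpoint sum).

    At fixed z, dV/dt = -k * v * (z - z0 - v*t).
    positions[i] is the ion coordinate at time i*dt.
    """
    work = 0.0
    for i, z in enumerate(positions):
        t = i * dt
        work += -k * v * (z - restraint_center(z0, v, t)) * dt
    return work


if __name__ == "__main__":
    k, z0, v, dt = 2.0, 0.0, 1.0, 0.1   # arbitrary units, not from the talk
    stuck = [z0] * 10                    # an ion that never moves
    print("work on a stuck ion:", pulling_work(stuck, dt, k, z0, v))
```

An ion that follows the restraint centre exactly accumulates no work, while an ion that lags behind accumulates positive work. Work values of this kind, collected over many forward and reverse pulls, are the input to a Crooks-type estimate of the free energy.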
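
Note on the "Some numbers" slide: the quoted figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated on the slide; because the slide says "over" for most of them, the results are rough lower bounds. The variable names, the 365-day year, and the decimal gigabyte are assumptions made for the illustration.

```python
# Back-of-the-envelope check of the "Some numbers" slide.
# Inputs are the figures quoted in the talk; names and unit
# conventions (365-day year, 1 GB = 1000 MB) are assumptions.

TRAJECTORIES = 5000            # trajectories in the last 6 months
SIMULATED_US = 4.0             # microseconds of simulation in total
CELL_DAYS_PER_TRAJ = 1.0       # 24 hours per trajectory on a Cell
PC_CORE_DAYS_PER_TRAJ = 16.0   # approx. 16 days on a single PC core
DAILY_TRAJECTORIES = 100       # daily output
DAILY_DATA_GB = 5.0            # data received per day


def summarize():
    """Derive secondary quantities from the quoted figures."""
    return {
        # total single-core CPU time, in years
        "cpu_years": TRAJECTORIES * PC_CORE_DAYS_PER_TRAJ / 365.0,
        # average simulated time per trajectory, in nanoseconds
        "ns_per_trajectory": SIMULATED_US * 1000.0 / TRAJECTORIES,
        # speed of one Cell relative to one PC core
        "cell_vs_pc_core": PC_CORE_DAYS_PER_TRAJ / CELL_DAYS_PER_TRAJ,
        # PC cores needed to sustain the same daily output
        "equivalent_pc_cores": DAILY_TRAJECTORIES * PC_CORE_DAYS_PER_TRAJ,
        # average data returned per trajectory, in megabytes
        "mb_per_trajectory": DAILY_DATA_GB * 1000.0 / DAILY_TRAJECTORIES,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions, 5000 trajectories at about 16 core-days each come to a little over 200 CPU-years, which agrees with the figure on the slide, and the 16x ratio per trajectory is of the same order as the 19x speed-up quoted for CellMD.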