Jaguar: The World's Most Powerful Computer System – An Update

Arthur S. Bland, Wayne Joubert, Ricky A. Kendall, Douglas B. Kothe, James H. Rogers, Galen M. Shipman
Oak Ridge National Laboratory

ABSTRACT: At the SC'09 conference in November 2009, Jaguar was crowned as the world's fastest computer by the web site www.Top500.org. In this paper, we will describe Jaguar, present results from a number of benchmarks and applications, and discuss future computing in the Oak Ridge Leadership Computing Facility. This paper updates the CUG 2009 paper that described Jaguar with new information about the upgrade from 4-core to 6-core processors and ORNL's plans for exaflops computing by 2018.

KEYWORDS: Jaguar, XT4, XT5, Lustre, Exascale, OLCF, Top500, Istanbul

1. Introduction

In 2004, the National Center for Computational Sciences (NCCS) at the U.S. Department of Energy's Oak Ridge National Laboratory (ORNL) partnered with Cray Inc., Argonne National Laboratory, Pacific Northwest National Laboratory, and others to propose a new national user facility for leadership computing [1]. Having won the competition, the new Oak Ridge Leadership Computing Facility (OLCF) has delivered a series of increasingly powerful computer systems to the science community based on Cray's X1 and XT product lines [2]. The series of machines includes a 6.4 trillion floating-point operations per second (TF) Cray X1 in 2004, an upgrade of that system to an 18.5 TF X1e in 2005, a 26 TF single-core Cray XT3 [3] in 2005, an upgrade of that system to a 54 TF dual-core system in 2006, and the addition of a 65 TF XT4 [4] in 2006 that was combined with the XT3 in 2007 to make Jaguar XT4, a 119 TF system. In 2008 the system was further upgraded to quad-core processors, increasing performance to 263 TF with 62 terabytes (TB) of system memory. This ten-fold increase in Jaguar's computing power and memory from February 2005 through April 2008 provided a productive, scalable computing system for the development and execution of the most demanding science applications.

On July 30, 2008, the OLCF took delivery of the first 16 of 200 cabinets of an XT5 system providing an additional 1,375 TF, 300 TB of system memory, and over 10,000 TB of total disk space to the OLCF. The final cabinets were delivered on September 17, 2008. Twelve days later, on September 29th, this incredibly large and complex system, known as Jaguar XT5, ran a full-system benchmark application that took two and one-half hours to complete. The system completed acceptance testing in December 2008 and was transitioned to operations shortly thereafter. In late July 2009, a phased upgrade of the Jaguar XT5 system from 2.3 GHz 4-core Barcelona AMD processors to 2.6 GHz 6-core Istanbul AMD processors began. This phased approach provided users with a substantial fraction of the Jaguar XT5 platform throughout the upgrade period, minimizing the impact on our users. Completion of this upgrade on November 7th increased the performance of Jaguar XT5 to 2,332 TF, with improved memory bandwidth due to enhancements in the Istanbul processor's memory controller. Today, Jaguar XT5 is running a broad range of time-critical applications of national importance in such fields as energy assurance, climate modeling, superconducting materials, bio-energy, chemistry, combustion, astrophysics, and nuclear physics.

[Figure 1: OLCF Petascale Roadmap]

The successful deployment of the Jaguar XT5 platform represents a landmark in the OLCF's roadmap to petascale computing. This roadmap, illustrated in Figure 1, was based on phased scaling steps of system upgrades and new system acquisitions, allowing a concurrent phased approach to facility upgrades, system software development, and dramatic application scalability improvements over an acceptable time horizon.

Having successfully transitioned from sustained teraflop to sustained petaflop computing, the OLCF has laid out a similar roadmap to transition to exaflop computing, as illustrated in Figure 2. The expectation is that systems on the way to exascale will be composed of a relatively modest number of CPU cores optimized for single-thread performance coupled with a large number of CPU cores optimized for multi-threaded performance and low power consumption. Images in Figure 2 are placeholders and are meant to illustrate that systems could be quite different at each stage of the roadmap.

[Figure 2: OLCF Exascale Roadmap]

2. Scaling Applications and Supporting Infrastructure

Successful execution of our 5-year roadmap to sustained petascale computing required focused efforts in improving application scalability, facility upgrades, and scaling out our supporting infrastructure. In 2005 our initial installation of the Jaguar platform was a 3,748-core XT3; subsequent upgrades and additions have resulted in a nearly 60-fold increase in the number of CPU cores in the current Jaguar system (224,256 cores). Scaling applications and infrastructure over this period, while challenging, has been critical to the success of the OLCF.

2.1. Improving Application Scalability

Recognizing that improving application scalability would require a concerted effort between computational scientists with domain expertise and the science teams, and that a gap in computational science expertise existed in many of these science teams, the OLCF created the Scientific Computing group, whose members act as liaisons to science teams, helping them improve application scalability, performance, and scientific productivity. To complement these efforts, Cray established a Supercomputer Center of Excellence (COE) at the OLCF in 2005, providing expertise in performance optimization and scalability on Cray platforms. The liaison model coupled with the Cray COE has proven successful in improving application performance and scalability. As an example, beginning in September 2009, the OLCF worked closely with a number of key application teams to demonstrate application scalability on the Jaguar XT5 platform. Work on a scalable method for ab initio computation of free energies in nanoscale systems using the locally self-consistent multiple scattering method (LSMS) [5] achieved sustained performance of 1.836 petaflops on 223,232 CPU cores, claiming the 2009 Gordon Bell Award at SC2009. Using the NWChem application, a study [6] of CCSD(T) calculations of the binding energies of H₂O achieved sustained performance of 1.4 petaflops on 224,196 CPU cores and was a 2009 Gordon Bell Award finalist at SC2009. This result was particularly notable from a system and software architecture perspective, as NWChem's communication model is based on the Global Arrays [7] framework rather than MPI [8], the dominant communication model in large-scale HPC.
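To make the distinction between the two models concrete, the following minimal C sketch (an illustration added here, not code from NWChem or the study cited above) contrasts a one-sided fetch of a patch of a distributed array in the Global Arrays style with an explicit two-sided MPI send/receive. The Global Arrays calls follow the library's documented C interface as best we can reconstruct it; details such as the MA_init scratch sizes and the array dimensions are placeholder assumptions.

/* Illustrative sketch only: one-sided Global Arrays access vs. two-sided MPI.
 * GA calls follow the documented C API (ga.h / macdecls.h); sizes and the
 * MA_init limits are placeholder assumptions. */
#include <mpi.h>
#include "ga.h"
#include "macdecls.h"

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    GA_Initialize();
    MA_init(C_DBL, 1000000, 1000000);      /* local scratch memory for GA */

    /* Global Arrays model: a logically shared 2-D array is distributed over
     * all ranks; any rank may read any patch without the owner's
     * participation. */
    int dims[2]  = {1024, 1024};
    int chunk[2] = {-1, -1};               /* let GA choose the distribution */
    int g_a = NGA_Create(C_DBL, 2, dims, "work", chunk);
    double one = 1.0;
    GA_Fill(g_a, &one);
    GA_Sync();

    double patch[4][4];
    int lo[2] = {0, 0}, hi[2] = {3, 3}, ld[1] = {4};
    NGA_Get(g_a, lo, hi, patch, ld);       /* one-sided fetch of a 4x4 patch */

    /* MPI model: the owner of the data must post a matching send for every
     * receive, so both sides participate in every transfer. */
    int rank, nproc;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    double buf[16] = {0.0};
    if (nproc > 1) {
        if (rank == 1)
            MPI_Send(buf, 16, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        else if (rank == 0)
            MPI_Recv(buf, 16, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    GA_Destroy(g_a);
    GA_Terminate();
    MPI_Finalize();
    return 0;
}

The point of the contrast is that the one-sided model lets an algorithm such as CCSD(T) pull remote blocks of a globally addressed array on demand, whereas the two-sided model requires the data owner to be programmed to anticipate and service every request.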
2.2. Scaling Supporting Infrastructure

Scaling Jaguar from a 26 TF system in 2005 to a 2.332 PF system in 2009 required both upgrades to our 40,000 square foot facility and scaling out of our supporting infrastructure. Power requirements necessitated facility upgrades from 8 MW in 2005 to 14 MW in 2009. Power efficiency improvements, such as transitioning from a traditional 208-volt power distribution system to a 480-volt power distribution system, resulted in energy savings estimated at nearly $500,000 while dramatically decreasing the cost of our power distribution installation. Cooling infrastructure required dramatic improvements in efficiency and capacity. Over this four-year period the OLCF's cooling capacity grew from 3,600 tons to 6,600 tons. To cool the Jaguar XT5 system efficiently, a new cooling system was designed for the XT5 platform using R-134a refrigerant within the XT5 cabinet and chilled-water heat exchangers to remove the heat. This system requires less than 0.2 watts of power for cooling for every 1 watt of power used for computing, substantially better than the industry average ratio of 0.8:1.

In addition to facility enhancements, the OLCF required dramatic scalability and capacity improvements in its supporting infrastructure, particularly in the areas of parallel I/O and archival storage. Scaling from 8 GB/sec of peak I/O performance and 38.5 TB of capacity on Jaguar XT3 in 2005 to over 240 GB/sec and 10 PB of capacity on Jaguar XT5 in 2009 required improvements in parallel I/O software [9], storage systems, and system area network technologies. Archival storage requirements have grown dramatically, from less than 500 TB of data stored in 2005 to over 11 PB as of this writing, requiring improvements in archival storage software [10] and tape archive infrastructure.
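The efficiency and growth figures quoted above can be put on a common footing with a little arithmetic. The short C program below is a worked example added here for illustration: only the input constants are taken from the text (the 0.2:1 and 0.8:1 cooling ratios and the I/O, disk, and archive figures); the ratios it prints are simple derived quantities, not measurements reported by the OLCF, and it assumes decimal storage units (1 PB = 1000 TB).

/* Worked example derived from the figures quoted in Section 2.2.
 * Inputs come from the text; the printed ratios are simple arithmetic. */
#include <stdio.h>

int main(void)
{
    /* Cooling overhead: watts of cooling per watt of compute. */
    const double olcf_cooling  = 0.2;    /* Jaguar XT5 cooling design */
    const double industry_cool = 0.8;    /* quoted industry average   */

    /* Total facility watts needed to deliver one watt of compute. */
    printf("OLCF:     %.2f W total per compute W\n", 1.0 + olcf_cooling);
    printf("Industry: %.2f W total per compute W\n", 1.0 + industry_cool);
    printf("Reduction in total power per compute W: %.0f%%\n",
           100.0 * (1.0 - (1.0 + olcf_cooling) / (1.0 + industry_cool)));

    /* Growth of the supporting I/O and archival infrastructure, 2005-2009
     * (decimal units assumed: 1 PB = 1000 TB). */
    printf("Peak I/O bandwidth: %.0fx  (8 -> 240 GB/s)\n", 240.0 / 8.0);
    printf("Disk capacity:      %.0fx  (38.5 TB -> 10 PB)\n",
           10.0 * 1000.0 / 38.5);
    printf("Archive volume:     %.0fx  (0.5 -> 11 PB)\n", 11.0 / 0.5);
    return 0;
}

Under these assumptions, the cooling design reduces total facility power per watt of compute by roughly one third, while peak I/O bandwidth grew about 30-fold and disk capacity by more than two orders of magnitude over the same period.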
3. System Architecture

Jaguar is a descendant of the Red Storm [11] computer system developed by Sandia National Laboratories with Cray and first installed at Sandia in 2004. Jaguar is a massively parallel, distributed memory system composed of a 2,332 TF Cray XT5, a 263 TF Cray XT4, a 10,000 TB external file system known as Spider, and a cluster of external login, compile, and job submission nodes. […] core Xeon processors, connected to 48 Data Direct Networks S2A9900 [13] disk subsystems. The login cluster is composed of eight quad-socket servers with quad-core AMD Opteron processors and 64 GB of memory per node, running SUSE Linux.

3.1. Node Architecture

The XT5 compute nodes are general-purpose symmetric multi-processor, shared memory building blocks designed specifically to support the needs of a broad range of applications. Each node has two AMD Opteron 2435 “Istanbul” six-core processors running at 2.6 GHz and connected to each other through a pair of HyperTransport [14] connections. Each of the Opteron processors has 6 MB of level 3 cache shared among the six cores and a DDR2 memory controller connected to a pair of 4 GB DDR2-800 memory modules. The HyperTransport connections between the processors provide a cache coherent shared memory node with twelve cores, 16 GB of memory, and 25.6 GB per second of memory bandwidth.
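The 16 GB and 25.6 GB/s node figures follow directly from the configuration described above if one assumes that the pair of DDR2-800 DIMMs on each socket populates the two channels of the on-chip memory controller. The short C sketch below simply reproduces that arithmetic; the dual-channel assumption is ours, the remaining inputs come from the text.

/* Worked example: per-node memory capacity and peak memory bandwidth for
 * the XT5 node described in Section 3.1.  The channel count per socket is
 * an assumption (two DDR2 channels per Opteron memory controller); the
 * other inputs are taken from the text. */
#include <stdio.h>

int main(void)
{
    const int    sockets_per_node   = 2;
    const int    dimms_per_socket   = 2;      /* "a pair of 4 GB modules"  */
    const double gb_per_dimm        = 4.0;
    const int    channels_per_sock  = 2;      /* assumed dual-channel DDR2 */
    const double mt_per_sec         = 800e6;  /* DDR2-800: 800 MT/s        */
    const double bytes_per_transfer = 8.0;    /* 64-bit channel            */

    double node_mem_gb = sockets_per_node * dimms_per_socket * gb_per_dimm;
    double chan_gbs    = mt_per_sec * bytes_per_transfer / 1e9;
    double node_gbs    = chan_gbs * channels_per_sock * sockets_per_node;

    printf("Node memory:           %.0f GB\n", node_mem_gb);   /* 16   */
    printf("Per-channel bandwidth: %.1f GB/s\n", chan_gbs);    /* 6.4  */
    printf("Node peak bandwidth:   %.1f GB/s\n", node_gbs);    /* 25.6 */
    return 0;
}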