Bringing Up Red Storm: Lessons to Remember


Bringing Up Red Storm: Lessons to Remember

Robert A. Ballance and John P. Noe
Sandia National Laboratories, Albuquerque, New Mexico
May 4, 02007
{[email protected], [email protected]}
SAND 2007-2904C

Abstract

Capability computer systems are designed, developed, and operated to provide the computational power to investigate and solve some of the most difficult and compelling problems facing humanity. These unique resources can be extremely helpful in scientific endeavors, but present exciting challenges for the operators and customers who utilize their power. While many of the difficulties in getting capability jobs run are inherent in the job itself, others intrude by way of operational issues. Consider Red Storm: in Jumbo mode, its largest configuration, it provides 13,280 dual-core processors; over 30,000 links; 3,700 embedded Linux sensors; 88 RAID controllers; 50 10GigE network connections; plus cooling, power, and environmental sensors. For each component, the only definitely known state is "down." This overview looks at capability computing from the aspect of operations, but draws many conclusions about the effects of system design on the effectiveness of system operation.

Overview

Sandia National Laboratories has been building large systems for 15 years now. Prior to that, the Labs had been a purchaser and operator of systems like the Paragon, Thinking Machines CM-2, and Cray vector systems. During that time, Sandians have designed, deployed, and operated the first Teraflop system (ASCI Red), several of the first large commodity clusters (CPLANT), and this week's #2 system, Red Storm. The first author joined Sandia several years ago, after working with the University of New Mexico and NCSA in standing up Los Lobos, one of the first Linux clusters to break the Top 100. (Sandia's CPLANT was another; HPTI built the third.) Today, Sandia also operates one of the largest Linux clusters in the world, Thunderbird with 8,960 processors.

Figure 1 maps several of many sets of influences on our thinking and on the subject of today's talk. Solid lines indicate direct lines of descendants, from the MPPs like the nCube and Paragon through ASCI Red to the Cray XT4 line, and from the first Beowulf clusters through commercial offerings like Los Lobos to Thunderbird. Dashed lines indicate influences.

Operating systems also play a key role, with the red lines tracing the use of small, lightweight kernels, and the blue lines following variants of Linux. In 02007, Cray is working to blend the red and blue lines, which might bring us to something like Cray Purple, but that name has already been taken!

The lower tier of Figure 1 reminds us that many systems have been designed, fewer have been built, and even fewer have become ongoing commercial successes!

The systems named, of course, are only a handful of those involved in a great conversation. The topic concerns Exaflops for science and engineering: how to achieve them, how to deliver them, how to apply them, and how to fund them. In 02007, the next clear step is to a Petaflop, and from a theoretical Petaflop to a sustained, production, delivered, multi-petaflop system. There is no doubt that any of several vendors can construct a system with a PetaOp peak. What we've found, however, is that getting from the PetaOp peak to the sustained, production multi-petaOp system will take sustained effort from a broad-based team having diverse talents.

[Figure 1: Systems from a Sandian's Perspective. The lineage diagram includes BeoWulf, nCube, nCube-2, Los Lobos, Thunderbird, ICC/NWCC, CPLANT, Paragon, ASCI Red, Red Storm, CM-2, BlueGene, SP, Cray Vector, and T3, with a lower tier of Whirlwind, Stretch, BBN, ETA, Alliant, Elexsi, MasPar, Convex, Kendall Square, CDC, Cray 3, and others.]

[Figure 2: ASCI Red, Installed 01997]
Isn't it interesting that we don't have a unit for sustained human effort? We need a unit that would signify honest attempts and would harmonize with Petaflops and Exabytes, something like Giga-tries or Peta-d'oh's. One of the motivations for this paper, and one of the missions for organizations like CUG and LCI, is to reduce the number of wasted human cycles in building ever-more-complicated systems.

But, lest we get too optimistic, let's consider the following four very different pictures. Figure 2 shows ASCI Red. Red was notable for several reasons. It successfully deployed a Lightweight Kernel on compute nodes called Cougar. Cougar descended from SUNMOS — the Sandia/UNM OS — that ran on the nCube, and which became PUMA on the Paragon. Red also had an integrated service (RAS) network that provided for system health and maintenance checks for all nodes. The RAS design on Red influenced both CPLANT and Red Storm. Finally, Red provided a simple mechanism to switch compute nodes from Unclassified to Classified processing and back again. This red/black switching procedure allows Sandia to deploy capability computing on either network by sharing the compute partitions. The cost is relatively low: additional service and I/O nodes, and a mechanism for the switch.

The duplicated I/O partitions have an additional benefit to operations, as well as a cost. The benefit is that during major upgrades, you can keep a portion of the system in production while other portions are being worked. Add to terms like scalability, reliability, and operability another: versatility. The cost, of course, is the overhead of maintaining two installations. In that sense, Red Storm is like having twins.

Figure 3 shows the end of ASCI Red. Red was #1 on the Top 500 list for 3.5 years; 7 lists! Red was decommissioned last year, not because it wasn't useful, but because we could no longer afford to keep it running. When it was turned off, there were still jobs on the mesh and more jobs in the queue. ASCI Red served production cycles for over 9 years from its initial operation until its retirement. That's a long time in supercomputer years! Of course there were upgrades along the way which took advantage of the rapid pace of technology maturation. Processors, memory, network interconnect, and disk storage were all upgraded.

[Figure 3: ASCI Red, 02006]

The Clock of the Long Now

"I want to build a clock that ticks once a year. The century hand advances once every one hundred years, and the cuckoo comes out on the millennium. I want the cuckoo to come out every millennium for the next 10,000 years. If I hurry I should finish the clock in time to see the cuckoo come out for the first time." Danny Hillis, [8]

Figure 4 shows a prototype of the Clock of the Long Now — an attempt to build a computer, and a very engineered artifact — that will continue to deliver cycles and be both maintainable and understandable for 10,000 years [4]. The cuckoo did come out the first time, seven years ago, and the clock struck twice: once for each millennium c.e. You might also be interested in Brian Eno's study of the bells for the clock; they are based on a 10-digit permutation so that each day the clock will sound a unique bell sequence! Of course, that sequence will eventually wrap around, in a bit over 10,000 years.

[Figure 4: Clock of the Long Now]
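The bell-sequence arithmetic is easy to check. The following sketch is not from the paper; it is a minimal Python illustration that counts the permutations of 10 bells and converts one-permutation-per-day into years, landing at roughly ten millennia before the sequence repeats.

```python
from math import factorial

BELLS = 10  # one permutation of the 10 bells is rung each day

unique_sequences = factorial(BELLS)             # 10! = 3,628,800 distinct orderings
years_until_repeat = unique_sequences / 365.25  # daily sequences expressed in years

print(f"distinct daily sequences: {unique_sequences:,}")
print(f"years before the cycle repeats: {years_until_repeat:,.0f}")
```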
The Tacoma Narrows Bridge

Figure 5 shows the Tacoma Narrows Bridge, or "Galloping Gertie". Bridge builders have had a history of building new structures that creatively used new materials and new designs that pushed the envelope of bridge engineering. Just as regularly, on a well-documented 30-year cycle, bridges were built and collapsed. This cycle was originally noticed by Sibley, and is documented in Henry Petroski's "Engineers of Dreams" [1].

The amazing thing about the Tacoma Narrows Bridge is not that it was built, or that it fell down, but that the original architecture was carefully studied and engineered, and then built even when there were open questions about its stability. The width-to-span ratio of the deck was 1:76, while the next most comparable bridge (the Golden Gate) at the time had a width-to-length ratio of 1:46.5. Tacoma Narrows was truly a "stretch goal".

[Figure 5: Tacoma Narrows Bridge]

Bridges have a lot in common with supercomputers; they cost a lot, take a long time to build, carry a lot of traffic, and are only missed when they are gone. Like bridges, building a supercomputer often starts with a need and a vision. From those seeds begin lengthy rounds of funding negotiations, politics, further negotiations, and redesigns. Stretch goals and new technologies often determine the design of the system. Finally, if the team is lucky, the system is built and jobs start running.

Computer engineers tend not to lose the same number of workmen to death, accident, or caisson disease as the bridge builders, but we may lose the same number of systems in the end.

                   Red Storm            Thunderbird
    Stance         Red + Black          Black
    Service Nodes  320 + 320
    Compute Nodes  12,960               4,480
    Compute Cores  25,920               8,960
    Segments       3360/6240/3360
    Interconnect   SeaStar Mesh         Infiniband
    Disk           170TB + 170TB        46TB + 420TB shared
    OS             Linux + Catamount    Linux

Figure 6: Red Storm 02006

Seek First to Emulate

• Learn from the past
• Simulate the future

Learn from the past

This first principle is a restatement of Petroski's arguments for the use of historical precedent in the design of new structures. It was phrased succinctly in a quote by John Gall: "A complex system that works is invariably found to have evolved from a simple system that worked."
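A quick arithmetic check ties the Figure 6 entries back to the red/black discussion above. The sketch below is not from the paper; it simply assumes that the two outer segments are fixed to the red and black networks and that the center segment is the portion that swings between them, which is how the switching description reads.

```python
# Red Storm compute-node segments from Figure 6
# (assumed roles: red-fixed end, switchable center, black-fixed end)
segments = {"red_fixed": 3360, "center_switchable": 6240, "black_fixed": 3360}

total_nodes = sum(segments.values())
print(total_nodes)       # 12,960 compute nodes, matching the table
print(total_nodes * 2)   # 25,920 cores with one dual-core Opteron per node

# Capacity available to one network when the center section is switched its way
max_one_side = segments["red_fixed"] + segments["center_switchable"]
print(max_one_side)      # 9,600 compute nodes for whichever side holds the center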
Recommended publications
  • Red Storm Infrastructure at Sandia National Laboratories
    Red Storm Infrastructure. Robert A. Ballance, John P. Noe, Milton J. Clauser, Martha J. Ernest, Barbara J. Jennings, David S. Logsted, David J. Martinez, John H. Naegle, and Leonard Stans. Sandia National Laboratories, P.O. Box 5800, Albuquerque, NM 87185-0807. May 13, 2005.

    Abstract: Large computing systems, like Sandia National Laboratories' Red Storm, are not deployed in isolation. The computer, its infrastructure support, and the operations of both have to be designed to support the mission of the organization. This talk and paper will describe the infrastructure and operational requirements of Red Storm along with a discussion of how Sandia is meeting those requirements. It will include discussions of the facilities that surround and support Red Storm: shelter, power, cooling, security, networking, interactions with other Sandia platforms and capabilities, operations support, and user services.

    1 Introduction. Stewart Brand, in How Buildings Learn [Bra95], has documented many of the ways in which buildings — often thought to be static structures — evolve over time to meet the needs of their occupants and the constraints of their environments. Machines learn in the same way. This is not the learning of "Machine learning" in Artificial Intelligence; it is the very human result of making a machine smarter as it operates within its distinct environment. Deploying a large computational resource teaches many lessons, including the need to meet the expecta…

    …mary components: 1. The physical setting, such as buildings, power, and cooling; 2. The computational setting, which includes the related computing systems that support ASC efforts on Red Storm; 3. The networking setting that ties all of these together; and 4. …
  • Merrimac – High-Performance and Highly-Efficient Scientific Computing with Streams
    MERRIMAC – HIGH-PERFORMANCE AND HIGHLY-EFFICIENT SCIENTIFIC COMPUTING WITH STREAMS. A dissertation submitted to the Department of Electrical Engineering and the Committee on Graduate Studies of Stanford University in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Mattan Erez, May 2007. © Copyright by Mattan Erez 2007. All Rights Reserved.

    I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (William J. Dally) Principal Adviser; (Patrick M. Hanrahan); (Mendel Rosenblum). Approved for the University Committee on Graduate Studies.

    Abstract: Advances in VLSI technology have made the raw ingredients for computation plentiful. Large numbers of fast functional units and large amounts of memory and bandwidth can be made efficient in terms of chip area, cost, and energy; however, high-performance computers realize only a small fraction of VLSI's potential. This dissertation describes the Merrimac streaming supercomputer architecture and system. Merrimac has an integrated view of the applications, software system, compiler, and architecture. We will show how this approach leads to over an order of magnitude gains in performance per unit cost, unit power, and unit floor-space for scientific applications when compared to common scientific computers designed around clusters of commodity general-purpose processors.
  • Developing Integrated Data Services for Cray Systems with a Gemini Interconnect
    Developing Integrated Data Services for Cray Systems with a Gemini Interconnect. Ron A. Oldfield, Todd Kordenbrock, Jay Lofstead. May 1, 2012.

    Abstract: Over the past several years, there has been increasing interest in injecting a layer of compute resources between a high-performance computing application and the end storage devices. For some projects, the objective is to present the parallel file system with a reduced set of clients, making it easier for file-system vendors to support extreme-scale systems. In other cases, the objective is to use these resources as staging areas to aggregate data or cache bursts of I/O operations. Still others use these staging areas for in-situ analysis on data in-transit between the application and the storage system. To simplify our discussion, we adopt the general term Integrated Data Services to represent these use-cases. This paper describes how we provide user-level, integrated data services for Cray systems that use the Gemini Interconnect. In particular, we describe our implementation and performance results on the Cray XE6, Cielo, at Los Alamos National Laboratory.

    1 Introduction. In our quest toward exascale systems and applications, one topic that is frequently discussed is the need for more flexible execution models. Current models for capability-class high-performance computing (HPC) systems are essentially static, requiring …

    …den on the file system and potentially improve overall efficiency of the application workflow. Current work to enable these coupling and workflow scenarios is focused on the data issues: resolving resolution and mesh mismatches and time-scale mismatches, and making data available through data staging techniques [20, 34, 21, 11, 2, 40, 10].
  • Jaguar and Kraken - The World's Most Powerful Computer Systems
    Jaguar and Kraken - The World's Most Powerful Computer Systems. Arthur Bland. Cray Users' Group 2010 Meeting, Edinburgh, UK, May 25, 2010.

    Abstract & Outline: At the SC'09 conference in November 2009, Jaguar and Kraken, both located at ORNL, were crowned as the world's fastest computers (#1 & #3) by the web site www.Top500.org. In this paper, we will describe the systems, present results from a number of benchmarks and applications, and talk about future computing in the Oak Ridge Leadership Computing Facility. Outline: Cray computer systems at ORNL; system architecture; awards and results; science results; exascale roadmap.

    Jaguar PF: World's most powerful computer, designed for science from the ground up; based on the Sandia- and Cray-designed Red Storm system. Peak performance 2.332 PF; system memory 300 TB; disk space 10 PB; disk bandwidth 240+ GB/s; compute nodes 18,688; AMD "Istanbul" sockets 37,376; size 4,600 square feet; 200 cabinets (8 rows of 25 cabinets).

    Kraken: World's most powerful academic computer. Peak performance 1.03 petaflops; system memory 129 TB; disk space 3.3 PB; disk bandwidth 30 GB/s; compute nodes 8,256; AMD "Istanbul" sockets 16,512; size 2,100 square feet; 88 cabinets (4 rows of 22).

    Climate Modeling Research System: part of a research collaboration in climate science between ORNL and NOAA (National Oceanic and Atmospheric Administration). Phased system delivery: CMRS.1 (June 2010), 260 TF; CMRS.2 (June 2011), 720 TF. Total system memory: 248 TB DDR3-1333. File systems …
  • Red Storm IO Performance Analysis James H
    Red Storm IO Performance Analysis. James H. Laros III, Lee Ward, Ruth Klundt, Sue Kelly, James L. Tomkins, Brian R. Kellogg. Sandia National Laboratories, 1515 Eubank SE, Albuquerque NM, 87123-1319; Hewlett-Packard, 3000 Hanover Street, Palo Alto, CA 94304-1185.

    Abstract— This paper will summarize an IO performance analysis effort performed on Sandia National Laboratories' Red Storm platform. Our goal was to examine the IO system performance and identify problems or bottlenecks in any aspect of the IO sub-system. Our process examined the entire IO path, from application to disk, both in segments and as a whole. Our final analysis was performed at scale, employing parallel IO access methods typically used in High Performance Computing applications.

    Index Terms— Red Storm, Lustre, CFS, Data Direct Networks, Parallel File-Systems

    I. Introduction. IO is a critical component of capability computing. At Sandia Labs, high value is placed on balanced platforms. … hardware or software, potentially affects how the entire system performs. We consider the testing described in this paper to be second-phase testing. Initial testing was accomplished to establish baselines, develop test harnesses, and establish test parameters that would provide meaningful information within the limits imposed, such as time constraints. All testing presented in this paper was accomplished during system preventative maintenance periods on the production Red Storm platform, on production file-systems. Testing time on heavily utilized production platforms such as Red Storm is precious. Detailed results of previous, current, and future testing are posted online [2].
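    The baseline measurements the excerpt mentions can be illustrated with a toy example. The sketch below is not from the paper; it is a minimal, single-node Python timing of a synced sequential write, the simplest kind of number one would collect before moving on to parallel IO at scale. The file name and sizes are arbitrary choices for illustration.

    ```python
    import os
    import time

    PATH = "io_baseline.dat"         # hypothetical scratch file
    CHUNK = 4 * 1024 * 1024          # 4 MiB per write
    TOTAL = 256 * 1024 * 1024        # 256 MiB in total

    buf = os.urandom(CHUNK)
    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    start = time.perf_counter()
    written = 0
    while written < TOTAL:
        written += os.write(fd, buf)
    os.fsync(fd)                     # include flush time, not just page-cache writes
    elapsed = time.perf_counter() - start
    os.close(fd)
    os.remove(PATH)

    print(f"wrote {written / 2**20:.0f} MiB in {elapsed:.2f} s "
          f"({written / 2**20 / elapsed:.1f} MiB/s)")
    ```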
  • Taking the Lead in HPC
    Taking the Lead in HPC. Cray X1, Cray XD1, Cray Update. SuperComputing 2004.

    Safe Harbor Statement: The statements set forth in this presentation include forward-looking statements that involve risk and uncertainties. The Company wishes to caution that a number of factors could cause actual results to differ materially from those in the forward-looking statements. These and other factors which could cause actual results to differ materially from those in the forward-looking statements are discussed in the Company's filings with the Securities and Exchange Commission.

    Agenda: Introduction & Overview – Jim Rottsolk; Sales Strategy – Peter Ungaro; Purpose Built Systems – Ly Pham; Future Directions – Burton Smith; Closing Comments – Jim Rottsolk.

    A Long and Proud History. NASDAQ: CRAY. Headquarters: Seattle, WA. Marketplace: High Performance Computing. Presence: systems in over 30 countries. Employees: over 800. Building systems engineered for performance. Seymour Cray founded Cray Research in 1972, the father of supercomputing. Timeline: Cray-1 System (1976), Cray X-MP System (1982), Cray Y-MP System (1988), Cray T3E System (1996), Cray X1; milestones cited include the first supercomputer, the first multiprocessor supercomputer, the first GFLOP computer, the first TFLOP computer, the world's most successful MPP, and 8 of the top 10 fastest computers in the world.

    2002 to 2003 – Successful Introduction of X1. Market share growth exceeded all expectations. 2001 market: $800M; 2002 market: about $1B.
  • Performance Impact of the Red Storm Upgrade
    Performance Impact of the Red Storm Upgrade. Ron Brightwell, Keith D. Underwood, and Courtenay Vaughan. Sandia National Laboratories, P.O. Box 5800, MS-1110, Albuquerque, NM 87185-1110. {rbbrigh, kdunder, ctvaugh}@sandia.gov

    Abstract— The Cray Red Storm system at Sandia National Laboratories recently completed an upgrade of the processor and network hardware. Single-core 2.0 GHz AMD Opteron processors were replaced with dual-core 2.4 GHz AMD Opterons, while the network interface hardware was upgraded from a sustained rate of 1.1 GB/s to 2.0 GB/s. This paper provides an analysis of the impact of this upgrade on the performance of several applications and micro-benchmarks. We show scaling results for applications out to thousands of processors and include an analysis of the impact of using dual-core processors on this system.

    I. Introduction. The emergence of commodity multi-core processors has created several significant challenges for the high-performance computing … processors. Several large XT3 systems have been deployed and have demonstrated excellent performance and scalability on a wide variety of applications and workloads [1], [2]. Since the processor building block for the XT3 system is the AMD Opteron, existing single-core systems can be upgraded to dual-core simply by changing the processor. The Red Storm system at Sandia, which is the predecessor of the Cray XT3 product line, recently underwent the first stage of such an upgrade on more than three thousand of its nearly thirteen thousand nodes. In order to help maintain system balance, the Red Storm network on these nodes was also upgraded from SeaStar version 1.2 to 2.1.
  • Tour De Hpcycles
    Tour de HPCycles. Wu Feng ([email protected]), Los Alamos National Laboratory; Allan Snavely ([email protected]), San Diego Supercomputer Center.

    Abstract: In honor of Lance Armstrong's seven consecutive Tour de France cycling victories, we present Tour de HPCycles. While the Tour de France may be known only for the yellow jersey, it also awards a number of other jerseys for cycling excellence. The goal of this panel is to delineate the "winners" of the corresponding jerseys in HPC. Specifically, each panelist will be asked to award each jersey to a specific supercomputer or vendor, and then, to justify their choices.

    The Jerseys:
    • Green Jersey (a.k.a. Sprinters Jersey): fastest consistently in miles/hour.
    • Polka Dot Jersey (a.k.a. Climbers Jersey): ability to tackle difficult terrain while sustaining as much of peak performance as possible.
    • White Jersey (a.k.a. Young Rider Jersey): best "under 25 year-old" rider with the lowest total cycling time.
    • Red Number (Most Combative): most aggressive and attacking rider.
    • Team Jersey: best overall team.
    • Yellow Jersey (a.k.a. Overall Jersey): best overall supercomputer.

    Panelists:
    • David Bailey, LBNL – Chief Technologist; IEEE Sidney Fernbach Award.
    • John (Jay) Boisseau, TACC @ UT-Austin – Director; 2003 HPCwire Top People to Watch List.
    • Bob Ciotti, NASA Ames – Lead for Terascale Systems Group; Columbia.
    • Candace Culhane, NSA – Program Manager for HPC Research; HECURA Chair.
    • Douglass Post, DoD HPCMO & CMU SEI – Chief Scientist; Fellow of APS.

    Ground Rules for Panelists: Each panelist gets SEVEN minutes to present his position (or solution).
  • Introduction to the Oak Ridge Leadership Computing Facility for CSGF Fellows Bronson Messer
    Introduction to the Oak Ridge Leadership Computing Facility for CSGF Fellows. Bronson Messer, Acting Group Leader, Scientific Computing Group, National Center for Computational Sciences; Theoretical Astrophysics Group, Oak Ridge National Laboratory; Department of Physics & Astronomy, University of Tennessee.

    Outline: The OLCF: history, organization, and what we do. The upgrade to Titan: Interlagos processors with GPUs, Gemini interconnect, software, etc. The CSGF Director's Discretionary Program. Questions and discussion.

    ORNL has a long history in High Performance Computing and has had 20 systems on the lists. Timeline: 1954 ORACLE; 1969 IBM 360/91; 1985 Cray X-MP; 1992-1995 Intel Paragons; 1996-2002 IBM Power 2/3/4; 2003-2005 Cray X1/X1E; 2007 IBM Blue Gene/P.

    Today, we have the world's most powerful computing facility. Jaguar, the Dept. of Energy's most powerful computer (#2): peak performance 2.33 PF/s, memory 300 TB, disk bandwidth > 240 GB/s, 5,000 square feet, power 7 MW. Kraken, the National Science Foundation's most powerful computer (#8): peak performance 1.03 PF/s, memory 132 TB, disk bandwidth > 50 GB/s, 2,300 square feet, power 3 MW. NOAA Gaea, the National Oceanic and Atmospheric Administration's most powerful computer (#32): peak performance 1.1 PF/s, memory 248 TB, disk bandwidth 104 GB/s, 1,600 square feet, power 2.2 MW.

    We have increased system performance by 1,000 times since 2004. Hardware scaled from single-core through dual-core to quad-core and dual-socket, 12-core SMP nodes. Scaling applications and system software is the biggest challenge. NNSA and DoD have funded …
  • Towards Efficient Supercomputing: a Quest for the Right Metric
    Towards Efficient Supercomputing: A Quest for the Right Metric. Chung-Hsing Hsu, Wu-chun Feng, Jeremy S. Archuleta. Los Alamos National Laboratory. {chunghsu, feng, jsarch}@lanl.gov

    Abstract: Over the past decade, we have been building less and less efficient supercomputers, resulting in the construction of substantially larger machine rooms and even new buildings. In addition, because of the thermal power envelope of these supercomputers, a small fortune must be spent to cool them. These infrastructure costs, coupled with the additional costs of administering and maintaining such (unreliable) supercomputers, dramatically increase their total cost of ownership. As a result, there has been substantial interest in recent years to produce more reliable and more efficient supercomputers that are easy to maintain and use. But how does one quantify efficient supercomputing? That is, what metric should be used to evaluate how efficiently a supercomputer delivers answers? We argue that existing efficiency metrics such as the …

    … Awards for Performance and Price/Performance take center stage. Unfortunately, this focus on performance (and to a lesser degree, price/performance, where price only captures the acquisition cost) has led the HPC community to build less and less efficient supercomputers with lower reliability, reduced availability, lower productivity, and significantly higher total cost of ownership (TCO). For example, while the performance of our n-body code has increased by 2000-fold since the Cray C90 of the early 1990s, the performance per watt has only increased 300-fold and the performance per square foot by only 65-fold. This trend in inefficiency has resulted in the construction of substantially larger and more complex machine rooms and even new buildings. Further, because of the thermal envelope of such supercomputers, a small fortune must be spent to simply cool them, e.g., $6M/year at Lawrence Livermore National Laboratory (LLNL) [10].
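    The 2000-fold versus 300-fold versus 65-fold comparison implies how much more power and floor space the newer machine needs for the same relative speedup. The short sketch below is not from the paper; it is just a back-of-the-envelope check in Python that makes the implied growth explicit.

    ```python
    perf_gain = 2000          # n-body performance relative to the early-1990s Cray C90
    perf_per_watt_gain = 300  # efficiency improvements reported in the excerpt
    perf_per_sqft_gain = 65

    # If performance rose 2000x but performance per watt rose only 300x,
    # total power draw must have grown by the ratio of the two; likewise for space.
    power_growth = perf_gain / perf_per_watt_gain   # about 6.7x more power
    space_growth = perf_gain / perf_per_sqft_gain   # about 31x more floor space

    print(f"power draw grew by roughly {power_growth:.1f}x")
    print(f"floor space grew by roughly {space_growth:.1f}x")
    ```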
  • Leadership Computing Directions at Oak Ridge National Laboratory: Navigating the Transition to Heterogeneous Architectures
    Leadership Computing Directions at Oak Ridge National Laboratory: Navigating the Transition to Heterogeneous Architectures. International Symposium on Computing in Atmospheric Sciences, September 8-12, 2013. James J. Hack, Director, National Center for Computational Sciences, Oak Ridge National Laboratory; Arthur S. "Buddy" Bland, OLCF Project Director; Bronson Messer, OLCF Scientific Computing; James Rogers, NCCS Director of Operations; Jack Wells, NCCS Director of Science.

    U.S. Department of Energy strategic priorities. Innovation: investing in science, discovery, and innovation to provide solutions to pressing energy challenges. Energy: providing clean, secure energy and promoting economic prosperity through energy efficiency and domestic forms of energy. Security: safeguarding nuclear and radiological materials, advancing responsible legacy cleanup, and maintaining nuclear deterrence. Incremental changes to existing technologies cannot meet these challenges: transformational advances in energy technologies are needed; transformational adaptation strategies will need to be implemented; transformational changes are needed to tools that allow predictive explorations of paths forward.

    Energy is our defining challenge. An example of the virtualization challenge, and a great example of success (New York Magazine): "Big storm early next week (Oct 29) with wind and rain??? A number of computer models today were hinting at that, suggesting a tropical system now in the Caribbean may merge with a cold front in that time table, close to the East Coast. At the same time, 8 days is an eternity in storm forecasting, so plenty of time to watch this. Consider this an early 'heads-up' that we'll have an interesting weather feature to monitor this week."
  • Leadership Computing at the National Center for Computational Science: Transitioning to Heterogeneous Architectures
    Leadership Computing at the National Center for Computational Science: Transitioning to Heterogeneous Architectures. 15th ECMWF Workshop on High Performance Computing in Meteorology, 4 October 2012. James J. Hack, Director, National Center for Computational Sciences, Oak Ridge National Laboratory; Arthur S. "Buddy" Bland, OLCF Project Director; Bronson Messer, OLCF Scientific Computing; James Rogers, NCCS Director of Operations; Jack Wells, NCCS Director of Science.

    U.S. Department of Energy strategic priorities. Innovation: investing in science, discovery, and innovation to provide solutions to pressing energy challenges. Energy: providing clean, secure energy and promoting economic prosperity through energy efficiency and domestic forms of energy. Security: safeguarding nuclear and radiological materials, advancing responsible legacy cleanup, and maintaining nuclear deterrence.

    Energy is the defining challenge of our time. The major drivers: climate change, national security, economic competitiveness, and quality of life. Global energy consumption will increase 50% by 2030. Incremental changes to existing technologies cannot meet this challenge: transformational advances in energy technologies are needed; transformational adaptation strategies will need to be implemented; transformational changes to tools enabling virtualization of strategies.

    ORNL has a long history in High Performance Computing. 2007: IBM Blue …