NERSC

National Energy Research Scientific Computing Center

2007 Annual Report


Ernest Orlando Lawrence Berkeley National Laboratory 1 Cyclotron Road, Berkeley, CA 94720-8148

This work was supported by the Director, Office of Science, Office of Advanced Scientific Computing Research of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

LBNL-1143E, October 2008

Table of Contents

THE YEAR IN PERSPECTIVE

RESEARCH NEWS
    Quantum Secrets of Photosynthesis Revealed
        Computational models guide the design of the experiment and help interpret the results
    Bridging the Gap between Climate and Weather
        A century's worth of reconstructed weather data will provide a better baseline for climate change studies
    A Perfect Sandwich
        Scientists discover why the right amount of moisture in the membrane plays a key role in fuel cell efficiency
    The Fusion Two-Step
        Simulations elucidate the physics of fast ignition
    Spontaneous Superlattice
        Ab initio calculations and modeling contribute to the discovery of a new way to fabricate striped nanorods
    Igniting a Stellar Explosion
        Flash Center achieves the first 3D simulation of the spontaneous detonation of a white dwarf star
    Science for Humanity
        NERSC users share Nobel Peace Prize, among other honors

THE NERSC CENTER
    Kathy Yelick Is Named NERSC Director
    NERSC Gets High Marks in Operational Assessment Review
    Franklin Passes Rigorous Acceptance Test
    Early Results Demonstrate Franklin's Capabilities
    Climate Models Produce Finest Details
    Weather Forecast Model Sets Speed Record
    Large Scale Reimbursement Program Improves Code Performance
    Seaborg Is Retired after Seven Years of Service
    NERSC's Mass Storage Stays Ahead of the Curve
    Improving Access for Users
    Overcoming Obstacles to Connect with KamLAND
    Syncing Up with the Open Science Grid
    Software Innovations for Science
    Sunfall: A Collaborative Visual Analytics System
    IPM to Be Deployed at NSF Supercomputer Centers
    International Leadership and Partnerships

ENVISIONING THE EXASCALE
    NERSC Computing
    NERSC Power
    Reducing Waste in Computing
    A Tightly Coupled Hardware/Software Co-Design Process
    An Ultrascale Application: Ultra-High-Resolution Climate Change Simulation
    Related Research
    NERSC Data
    Easy Access to Data Accelerates Science
    NERSC Data Program Elements
    NERSC Data Storage
    NERSC Data Production Infrastructure
    NERSC Data Tools
    Focused Data Projects

Appendix A: NERSC Policy Board
Appendix B: NERSC Client Statistics
Appendix C: NERSC Users Group Executive Committee
Appendix D: Office of Advanced Scientific Computing Research
Appendix E: Advanced Scientific Computing Advisory Committee
Appendix F: Acronyms and Abbreviations

The Year in Perspective

As a computer scientist, I have always been interested in making computer systems more efficient and easier to use through better architectures, programming languages, algorithms, and tools that connect the hardware to the applications. So when Berkeley Lab offered me the position of NERSC Division Director beginning in January 2008, I saw it as an opportunity to help scientists make new discoveries in domains ranging from basic scientific understanding, such as the origins of the universe, to some of the most critical issues facing the world today, including climate modeling and the development of new sources of energy.

The computing industry is facing its greatest challenge ever, as it shifts from single-core to multicore processing chips, driven by power density limits and the recognition that hidden forms of instruction-level parallelism have been tapped out. This change was apparent at NERSC in 2007 as a major computational system, a Cray XT4 built with dual-core AMD chips, was introduced. The deployment of this system, known as Franklin, was a major milestone in NERSC's history — as the first major system based on multicore technology, it sustained the performance increases that the community has come to expect, increasing the computational power available to NERSC users by a factor of six.

With close to 20,000 cores, Franklin has a theoretical peak speed of over 100 teraflops (100 trillion floating point operations per second). In the installation, testing, and acceptance of Franklin, NERSC staff demonstrated their expertise in standing up large systems that provide excellent performance across a diverse set of applications. With nearly 3000 users and 500 application codes from across the DOE science disciplines, the NERSC workload is one of the most challenging workloads supported by any center. To support this user base, NERSC and Cray developed a plan to test Cray's CLE operating system, an ultra-lightweight version of Linux, and Franklin became the first Cray XT4 system to run CLE in production.

The availability of Franklin released a pent-up demand for large-scale computing as users quickly adapted their codes to run on the multicore-based machine with CLE. Within a week of its formal acceptance in October 2007, Franklin was 80–95% utilized, and the users, on average, consumed five times more compute time on Franklin than they had initially been allocated for 2007 — fourteen times more for the largest users. They used this opportunity to scale their codes to new levels, to experiment with new algorithms, and to produce new scientific results.

The real impact of the Franklin system is measured by the science produced by NERSC users. One dramatic example was the project "Structure and Reactions of Hadrons and Nuclei," led by James Vary of Iowa State University, which investigates longstanding problems in nuclear physics such as the nature of the strong interactions and the origins of the spin-orbit force. These researchers originally had an allocation of only 200,000 hours, but were able to use 4 million hours on Franklin for their realistic ab initio calculations of nucleon–nucleon interactions of oxygen-16. By increasing the scaling of their calculations from 2000 to 12,000 compute cores and diagonalizing a matrix of dimension 1 billion, they achieved the most accurate calculations to date on this size nucleus. These results can be used to parameterize new density functionals for nuclear structure simulations.

Another outstanding achievement this year was the "20th Century Reanalysis" INCITE project, led by Gil Compo of the University of Colorado and the NOAA Earth System Research Lab, which is using an algorithm called an Ensemble Kalman Filter to reconstruct missing climate data from 1892 to the present. Compo's team has successfully reproduced historical weather phenomena like the 1922 Knickerbocker Storm, and the comprehensive three-dimensional database they are producing will be used to validate climate and weather models. With a 3.1-million-hour allocation and what they described as "fabulous support" from NERSC consultants, the researchers ran their code on all four of NERSC's large-scale computing systems, switched to a higher-resolution algorithm when they moved to the Cray XT4, and parallelized virtually their entire workflow.

The year 2007 represents both a beginning and an end in the history of NERSC's major computational systems, as the IBM SP RS/6000 system, Seaborg, was in its seventh and final year of production use. Over the course of its lifetime, Seaborg provided over 250 million CPU hours to the users and resulted in an estimated 7000 published scientific results in astrophysics, climate research, fusion energy, chemistry, and other disciplines. One of the last breakthroughs enabled by Seaborg was the first spontaneous detonation of a white dwarf star into a supernova in a three-dimensional simulation, achieved by Don Lamb and a team at the University of Chicago Flash Center using resources at NERSC and Lawrence Livermore National Laboratory. NERSC HPC consultants helped get the Flash Center team's 512-processor job up and running on short notice to help them meet a hard deadline and a longstanding scientific goal.

Despite the availability of the new Franklin system, there is still a huge unmet demand from users for more compute time. We will upgrade Franklin from dual cores to quad cores during the second half of 2008 and are beginning a project for the procurement of the next major computational system at NERSC.

As we look to the future, two critical issues arise for NERSC and the users it serves: the growing energy requirement of large-scale systems, which could dwarf the cost of hardware purchases if it goes unchecked, and the virtual tsunami of scientific data arising from simulations as well as experimental and measurement devices. In the discussion of NERSC's transition to the exascale in the last section of this annual report, we describe these challenges in more detail along with two longer-term goals we will be pursuing to address them.

The first goal comes from a holistic look at system design, including hardware, algorithms, software, and applications; it involves leveraging low-power embedded processing technology, massively multicore compute nodes to reduce power, and scalable algorithms. As a demonstration, we are focusing on a system design driven by global climate modeling with the goal of demonstrating an affordable system for kilometer-scale modeling.

The second goal will look at the growing data requirements of DOE science areas, and the spectrum of storage, communication, and computing technology needed to preserve, manage, and analyze the data. Just as web search engines have revolutionized nearly every aspect of our lives, we believe that better access to and ability to search and manipulate scientific data sets will revolutionize science in the next decade.

I am looking forward to working with NERSC's dedicated staff, Associate Lab Director (and former NERSC Director) Horst Simon, and the diverse community of researchers we support to continue finding innovative solutions to these and other challenging scientific problems.

Katherine Yelick
NERSC Division Director

Quantum Secrets of Photosynthesis Revealed

Computational models guide the design of the experiment and help interpret the results

Through photosynthesis, green plants and cyanobacteria are able to transfer sunlight energy to molecular reaction centers for conversion into chemical energy with nearly 100-percent efficiency. Speed is the key — the transfer of the solar energy takes place almost instantaneously, so little energy is wasted as heat. How photosynthesis achieves this near-instantaneous energy transfer is a longstanding mystery that may have finally been solved.

A study led by researchers with Berkeley Lab and the University of California (UC) at Berkeley reports that the answer lies in quantum mechanical effects. Results of the study were presented in the April 12, 2007 issue of the journal Nature.1

"We have obtained the first direct evidence that remarkably long-lived wavelike electronic quantum coherence plays an important part in energy transfer processes during photosynthesis," said Graham Fleming, the principal investigator for the study. "This wavelike characteristic can explain the extreme efficiency of the energy transfer because it enables the system to simultaneously sample all the potential energy pathways and choose the most efficient one."

Fleming, a former Deputy Director of Berkeley Lab, is a researcher in the Lab's Physical Biosciences Division, a professor of chemistry at UC Berkeley, and an internationally acclaimed leader in spectroscopic studies of the photosynthetic process. Co-authoring the Nature paper were Gregory Engel, who was first author, Tessa Calhoun, Elizabeth Read, Tae-Kyu Ahn, Tomáš Mančal, and Yuan-Chung Cheng, all of whom held joint appointments with Berkeley Lab's Physical Biosciences Division and the UC Berkeley Chemistry Department at the time of the study, plus Robert Blankenship, from Washington University in St. Louis.

Project: Simulations of Nonlinear Optical Spectra and Energy Transfer Dynamics of Photosynthetic Light-Harvesting Complexes
PI: Graham Fleming, University of California, Berkeley, and Lawrence Berkeley National Laboratory
Senior investigators: Elizabeth Read and Yuan-Chung Cheng, University of California, Berkeley, and Lawrence Berkeley National Laboratory
Funding: BES, MIBRS, KRFG

In the paper, Fleming and his collaborators report the detection of "quantum beating" signals, coherent electronic oscillations in both donor and acceptor molecules, generated by light-induced energy excitations, like the ripples formed when stones are tossed into a pond (Figure 1). Electronic spectroscopy measurements made on a femtosecond (millionths of a billionth of a second) time scale showed the oscillations meeting and interfering constructively, forming wavelike motions of energy (superposition states) that can explore all potential energy pathways simultaneously and reversibly, meaning they can retreat from wrong pathways with no penalty.

Figure 1. Sunlight absorbed by bacteriochlorophyll (green) within the FMO protein (gray) generates a wavelike motion of excitation energy whose quantum mechanical properties can be mapped through the use of two-dimensional electronic spectroscopy. (Image courtesy of Greg Engel)

This finding contradicts the classical description of the photosynthetic energy transfer process as one in which excitation energy hops from light-capturing pigment molecules to reaction center molecules step-by-step down the molecular energy ladder.

"The classical hopping description of the energy transfer process is both inadequate and inaccurate," said Fleming. "It gives the wrong picture of how the process actually works, and misses a crucial aspect of the reason for the wonderful efficiency."

Following the flow of energy

The photosynthetic technique for transferring energy from one molecular system to another should make any short-list of Mother Nature's spectacular accomplishments. If we can learn enough to emulate this process, we might be able to create artificial versions of photosynthesis that would help us effectively tap into the sun as a clean, efficient, sustainable and carbon-neutral source of energy.

Towards this end, Fleming and his research group have developed a technique called two-dimensional electronic spectroscopy that enables them to follow the flow of light-induced excitation energy through molecular complexes with femtosecond temporal resolution. The technique involves sequentially flashing a sample with femtosecond pulses of light from three laser beams. A fourth beam is used as a local oscillator to amplify and detect the resulting spectroscopic signals as the excitation energy from the laser lights is transferred from one molecule to the next. (The excitation energy changes the way each molecule absorbs and emits light.)

Fleming has compared 2D electronic spectroscopy to the technique used in the early super-heterodyne radios, where an incoming high-frequency radio signal was converted by an oscillator to a lower frequency for more controllable amplification and better reception. In the case of 2D electronic spectroscopy, scientists can track the transfer of energy between molecules that are coupled (connected) through their electronic and vibrational states in any photoactive system, macromolecular assembly, or nanostructure.

Fleming and his group first described 2D electronic spectroscopy in a 2005 Nature paper, when they used the technique to observe electronic couplings in the Fenna-Matthews-Olson (FMO) photosynthetic light-harvesting protein, a molecular complex in green sulfur bacteria.2

1 G. S. Engel, T. R. Calhoun, E. L. Read, T.-K. Ahn, T. Mančal, Y.-C. Cheng, R. E. Blankenship, and G. R. Fleming, "Evidence for wavelike energy transfer through quantum coherence in photosynthetic systems," Nature 446, 782 (2007).

2 T. Brixner, J. Stenger, H. M. Vaswani, M. Cho, R. E. Blankenship, and G. R. Fleming, "Two-dimensional spectroscopy of electronic couplings in photosynthesis," Nature 434, 625 (2005).

Figure 2. Two-dimensional electronic spectroscopy enables scientists to follow the flow of light-induced excitation energy through molecular complexes with femtosecond temporal resolution. In this 2D electronic spectrum, the amplitude of the quantum beating signal for exciton 1 is plotted against population time. The black line covers the exciton 1 peak amplitude. The experimental data's agreement with the computational simulation is shown on the right.

Said Engel, "The 2005 paper was the first biological application of this technique; now we have used 2D electronic spectroscopy to discover a new phenomenon in photosynthetic systems. While the possibility that photosynthetic energy transfer might involve quantum oscillations was first suggested more than 70 years ago, the wavelike motion of excitation energy had never been observed until now."

As in the 2005 paper, the FMO protein was again the target. FMO is considered a model system for studying photosynthetic energy transfer because it consists of only seven pigment molecules, and its chemistry has been well characterized.

"To observe the quantum beats, 2D spectra were taken at 33 population times, ranging from 0 to 660 femtoseconds," said Engel. "In these spectra, the lowest-energy exciton (a bound electron-hole pair formed when an incoming photon boosts an electron out of the valence energy band into the conduction band) gives rise to a diagonal peak near 825 nanometers that clearly oscillates (Figure 2). The associated cross-peak amplitude also appears to oscillate. Surprisingly, this quantum beating lasted the entire 660 femtoseconds."

Engel said the duration of the quantum beating signals was unexpected because the general scientific assumption had been that the electronic coherences responsible for such oscillations are rapidly destroyed.

"For this reason, the transfer of electronic coherence between excitons during relaxation has usually been ignored," Engel said. "By demonstrating that the energy transfer process does involve electronic coherence and that this coherence is much stronger than we would ever have expected, we have shown that the process can be much more efficient than the classical view could explain. However, we still don't know to what degree photosynthesis benefits from these quantum effects."

Simulations provide a preview

"Computational modeling and simulation play a critical role in this kind of research," Fleming said. "Nobody has ever done experiments like this before, so simulations are essential to tell us what the information content of the measurement actually is. I often say to students, it's normal for theory to come in after you've done an experiment, but in our case, the model actually guides the design of the experiment. It tells us what to look for, how to measure it, and how to know what we're looking at."

Yuan-Chung Cheng explained further: "At first we didn't understand why the signal oscillates. To figure that out, we had to run a simulation on a smaller model, which clearly showed that the oscillations correlate with the quantum coherence in the system. It actually represents the energy transfer back and forth in those molecules. So when we figured that out, we could then carry out more large-scale simulations."

"It's not just a single system that we have to simulate," Cheng continued, "because in those protein systems there is an intrinsic static disorder, so each individual protein complex is slightly different from the others."

"So we have to repeat the calculation thousands of times to get the averaged behavior of the real set of molecules," Fleming pointed out.

Elizabeth Read added, "Once we have a decent model of what this protein is doing, that enables us to estimate what the frequencies of oscillation will be, and then that tells us how many time points we need to measure in order to be able to observe the quantum coherence, and so it all goes into the design of the experiments. Without these models we would have no basis for doing that."

This kind of modeling requires the capacity of a massively parallel computer. Calculating a 2D spectrum for just one set of initial conditions can take several hours because of the large number of molecules and states involved. The results from thousands of independent calculations with different initial parameters must be averaged, requiring up to 20,000 processor-hours for a complete 2D spectrum. The research team used Jacquard, NERSC's 712-processor Opteron cluster, and Bassi, the 888-processor IBM p575 POWER5 system, and were granted an increase in their disk quota to accommodate all the data.

In addition to the 2D spectrum modeling, the Fleming group also performs quantum dynamics simulations to elucidate the energy transfer pathways inside the network of photosynthetic pigment-protein complexes, and to study the effects of coherence transfer on the energy transfer dynamics. These simulations can take up to 1000 processor-hours per job, depending on the number of molecules being modeled. The researchers' goal is to understand the "design rules" that enable the extremely efficient energy transfer in photosynthesis.

One of the next steps for the group will be to look at the effects of temperature changes on the photosynthetic energy transfer process. They will also be looking at broader bandwidths of energy using different colors of light pulses to map out everything that is going on, not just energy transfer. And they plan to begin studying light-harvesting complex 2 (LHC2), which is the most abundant pigment-protein complex in green plants (Figure 3).

Figure 3. Half of the green visible on Earth is the molecule LHC2, which is the next subject of the Fleming group's research. (Composite image created by Reto Stöckli, Nazmi El Saleous, and Marit Jentoft-Nilsen, NASA GSFC.)

"The contribution of green sulfur bacteria to the entire world's energy supply is probably not all that large," Fleming said, "but half the chlorophyll in the world lives in LHC2. It's the single most important light-harvesting protein on earth. We're planning experiments that are somewhat harder to do and even more complicated to interpret because there are 14 chlorophylls of two different kinds in this protein."

Ultimately, the idea is to gain a much better understanding of how Nature not only transfers energy from one molecular system to another, but is also able to convert it into useful forms.

"Nature has had about 2.7 billion years to perfect photosynthesis, so there are huge lessons that remain for us to learn," Engel said. "The results we're reporting in this latest paper, however, at least give us a new way to think about the design of future artificial photosynthesis systems."

This article written by: Lynn Yarris and John Hules (Berkeley Lab).
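The averaging over thousands of slightly different protein complexes described above can be illustrated with a small, self-contained sketch. The code below is a hypothetical toy model written for illustration, not the Fleming group's production code: it builds a seven-site excitonic Hamiltonian with randomly disordered site energies, diagonalizes it for each realization, and averages the results. All numerical values (site energies, couplings, disorder width, number of realizations) are invented.

# Toy illustration of averaging many independent calculations over static
# disorder. The Hamiltonian values and disorder width are placeholders,
# not the Fleming group's actual FMO parameters.
import numpy as np

N_SITES = 7            # FMO-like complex with seven pigment sites
N_REALIZATIONS = 5000  # each realization is an independent calculation
SIGMA = 80.0           # assumed static disorder width (cm^-1)

rng = np.random.default_rng(0)

# Placeholder mean site energies and nearest-neighbor couplings (cm^-1).
mean_energies = np.linspace(12100.0, 12600.0, N_SITES)
coupling = -50.0 * (np.eye(N_SITES, k=1) + np.eye(N_SITES, k=-1))

exciton_energies = np.zeros((N_REALIZATIONS, N_SITES))
for i in range(N_REALIZATIONS):
    # Each protein complex is slightly different: draw disordered site energies.
    site_energies = mean_energies + rng.normal(0.0, SIGMA, N_SITES)
    hamiltonian = np.diag(site_energies) + coupling
    # Diagonalize to get the exciton energies for this realization.
    exciton_energies[i] = np.linalg.eigvalsh(hamiltonian)

# The ensemble average corresponds to what the measurement sees.
print("disorder-averaged exciton energies (cm^-1):",
      exciton_energies.mean(axis=0).round(1))

In the real calculations each realization is far more expensive (a full 2D spectrum rather than a single small diagonalization), which is why thousands of independent jobs add up to tens of thousands of processor-hours.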

Bridging the Gap between Climate and Weather

Figure 1. Historic weather map for 8 a.m. on January 28, 1922, the day the deadly Knickerbocker Storm hit Washington, D.C. (see sidebar). (NOAA Central Library Data Imaging Project)

A century’s worth of reconstructed weather data will provide a better baseline for climate change studies

The distinction between climate and weather was expressed most succinctly by science fiction writer Robert A. Heinlein: "Climate is what you expect; weather is what you get." But as global warming produces more noticeable changes on a planetary scale, how do we even know what to expect in a particular region?

Climate change studies are increasingly focused on understanding and predicting regional changes of daily weather statistics. But to predict the next century's statistical trends with confidence, researchers have to demonstrate that their forecasting tools can successfully recreate the conditions of the past century. That requires a detailed set of historical atmospheric circulation data — not just monthly averages, but statistics for at least every six hours, so that phenomena like severe storms can be analyzed.

Although there is scant atmospheric data from weather balloons and none from satellites for the first half of the 20th century, there is an enormous amount of observational data collected at the Earth's surface by a variety of sources, from meteorologists and military personnel to volunteer observers and ships' crews. Until recently, this two-dimensional data was widely available only on hand-drawn weather maps (Figure 1). Despite many errors, these maps are indispensable to researchers, and extensive efforts are being made to put these maps into a digital format and make them available on the Web.

Now, using the latest data integration and atmospheric modeling tools and a 2007 INCITE award of 2 million supercomputing hours at NERSC, scientists from the NOAA Earth System Research Lab and the Cooperative Institute for Research in Environmental Sciences (CIRES) are building the first complete database of three-dimensional global weather maps of the 20th century.

Project: The 20th Century Reanalysis Project
PI: Gil Compo, University of Colorado/CIRES/Climate Diagnostics Center and NOAA Earth System Research Lab
Senior investigators: Jeffrey Whitaker, NOAA Earth System Research Lab; Prashant Sardeshmukh, University of Colorado/CIRES
Funding: INCITE, CIRES, NOAA

Called the 20th Century Reanalysis Project, the new dataset will double the number of years for which a complete record of three-dimensional atmospheric climate data is available, extending the usable digital dataset from 1948 back to 1892. The team expects to complete the dataset within two years, including observations currently being digitized around the world. The final maps will depict weather conditions every six hours from the Earth's surface to the level of the jet stream (about 11 km or 36,000 ft high), and will allow researchers to compare the patterns, magnitudes, means, and extremes of recent and projected climate changes with past changes.

"We expect the reanalysis of a century's worth of data will enable climate researchers to better address issues such as the range of natural variability of extreme events including floods, droughts, hurricanes, extratropical cyclones, and cold waves," said principal investigator Gil Compo of CIRES. Other team members are Jeff Whitaker of the NOAA Earth System Research Lab and Prashant Sardeshmukh, also of CIRES, a joint institute of NOAA and the University of Colorado.

"Climate change may alter a region's weather and its dominant weather patterns," Compo said. "We need to know if we can understand and simulate the variations in weather and weather patterns over the past 100 years to have confidence in our projections of changes in the future. The alternative — to wait for another 50 years of observations — is less appealing."

From two to three dimensions

Compo, Whitaker, and Sardeshmukh have discovered that using only surface air pressure data, it is possible to recreate a snapshot of other variables, such as winds and temperatures, throughout the troposphere, from the ground or sea level to the jet stream.1 This discovery makes it possible to extend two-dimensional weather maps into three dimensions. "This was a bit unexpected," Compo said, "but it means that we can use the surface pressure measurements to get a very good picture of the weather back to the 19th century."

The computer code used to combine the data and reconstruct the third dimension has two components. The forecast model is the atmospheric component of the Climate Forecast System, which is used by the National Weather Service's National Centers for Environmental Prediction (NCEP) to make operational climate forecasts. The data assimilation component is the Ensemble Kalman Filter.

Data assimilation is the process by which raw data such as temperature and atmospheric pressure observations are incorporated into the physics-based equations that make up numerical weather models. This process provides the initial values used in the equations to predict how atmospheric conditions will evolve. Data assimilation takes place in a series of analysis cycles. In each analysis cycle, observational data is combined with the forecast results from the mathematical model to produce the best estimate of the current state of the system, balancing the uncertainty in the data and in the forecast. The model then advances several hours, and the results become the forecast for the next analysis cycle.

The Ensemble Kalman Filter is one of the most sophisticated tools available for data assimilation. Generically, a Kalman filter is a recursive algorithm that estimates the state of a dynamic system from a series of incomplete and noisy measurements. Kalman filters are used in a wide range of engineering applications, from radar to computer vision to aircraft and spacecraft navigation. Perhaps the most commonly used type of Kalman filter is the phase-locked loop, which enables radios, video equipment, and other communications devices to recover a signal from a noisy communication channel. Kalman filtering has only recently been applied to weather and climate applications, but the initial results have been so good that the Meteorological Service of Canada has incorporated it into their forecasting code.

The 20th Century Reanalysis Project uses the Ensemble Kalman Filter to remove errors in the observations and to fill in the blanks where information is missing, creating a complete weather map of the troposphere.

1 G. P. Compo, J. S. Whitaker, and P. D. Sardeshmukh, "Feasibility of a 100-year reanalysis using only surface pressure data," Bulletin of the American Meteorological Society 87, 175 (2006).

Recreating the Knickerbocker Storm of 1922

One of the deadliest snowstorms in U.S. history was the Knickerbocker Storm, a slow-moving blizzard that occurred on January 27–29, 1922 in the upper South and Middle Atlantic states. This storm was named after the collapse of the Knickerbocker Theater in Washington, D.C. shortly after 9 p.m. on January 28. The movie theater's flat roof collapsed under the weight of 28 inches of wet snow, bringing down the balcony and a portion of the brick wall and killing 98 people, including a Congressman.

An arctic air mass had been in place across the Northeast for several days before the storm, and Washington had been below freezing since the afternoon of January 23. The storm formed over Florida on January 26 and took three days to move up the Eastern Seaboard. Snow reached Washington by noon on January 28 and continued into the morning of January 29. Winds gusting up to 50 mph created blizzard conditions, and heavy drifting blocked roads for days. Railroad lines between Philadelphia and Washington were covered by at least 36 inches of snow, with drifts as high as 16 feet.

Figure 2 presents data from the 20th Century Reanalysis Project's three-dimensional reanalysis of conditions at 7 p.m. on January 28, 1922. With data like this available for the entire 20th century, climate researchers hope to improve their models so that they can more confidently predict regional weather trends for the future.

Figure 2. Reanalysis of conditions at 7 p.m. on January 28, 1922. (A) Sea level pressure (SLP) measured in hectopascals (hPa): contours show the ensemble mean SLP, with 1000 and 1010 hPa contours thickened; colors show the range of uncertainty; red dots indicate observation locations. (B) Height of 500 hPa pressure in meters: contours show the ensemble mean height, with the 5600 m contour thickened; colors show the range of uncertainty. (C) Ensemble mean precipitation accumulated over 6 hours, in millimeters. (D) Ensemble mean temperature (Kelvin) at 2 meters, with the 273 K (0°C) contour thickened. (J. Whitaker, NOAA Earth System Research Lab)

Rather than making a single estimate of atmospheric conditions at each time step, the Ensemble Kalman Filter reduces the uncertainty by covering a wide range — it produces 56 estimated weather maps (the "ensemble"), each slightly different from the others. The mean of the ensemble is the best estimate, and the variance within the ensemble indicates the degree of uncertainty, with less variance indicating higher certainty. The filter blends the forecasts with the observations, giving more weight to the observations when they are high quality, or to the forecasts when the observations are noisy. The NCEP forecasting system then takes the blended 56 weather maps and runs them forward six hours to produce the next forecast. Processing one month of global weather data takes about a day of computing, with each map running on its own processor. The Kalman filter is flexible enough to change continuously, adapting to the location and number of observations as well as meteorological conditions, thus enabling the model to correct itself in each analysis cycle.

"What we have shown is that the map for the entire troposphere is very good, even though we have only used the surface pressure observations," Compo said. He estimates that the error for the 3D weather maps will be comparable to the error of modern two- to three-day weather forecasts.
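A minimal sketch may help make the blending step described above concrete. The code below implements a generic, textbook-style ensemble Kalman filter analysis step (the perturbed-observation variant) in NumPy; it is not the 20th Century Reanalysis code, and the state size, observation operator H, error variances, and the 56-member example setup are assumptions chosen only to mirror the description in the text.

# Minimal sketch of one ensemble Kalman filter analysis step: blend an
# ensemble of forecasts with noisy observations, weighting each by its
# uncertainty. Generic textbook formulation, not the reanalysis project's
# code; dimensions and error levels are illustrative only.
import numpy as np

def enkf_analysis(forecasts, obs, H, obs_var, rng):
    """forecasts: (n_members, n_state); obs: (n_obs,); H: (n_obs, n_state)."""
    n_members = forecasts.shape[0]
    # Ensemble mean and anomalies describe the forecast and its uncertainty.
    mean = forecasts.mean(axis=0)
    anomalies = forecasts - mean
    # Forecast covariance estimated from the ensemble spread.
    P = anomalies.T @ anomalies / (n_members - 1)
    R = obs_var * np.eye(len(obs))
    # Kalman gain: more weight to observations when R is small, to the
    # forecast when the observations are noisy.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    # Perturbed-observation update: each member sees slightly different obs.
    analysis = np.empty_like(forecasts)
    for m in range(n_members):
        perturbed_obs = obs + rng.normal(0.0, np.sqrt(obs_var), len(obs))
        analysis[m] = forecasts[m] + K @ (perturbed_obs - H @ forecasts[m])
    return analysis

# Example: 56 members, a 100-variable state, 20 surface-pressure-like obs.
rng = np.random.default_rng(1)
truth = rng.normal(size=100)
members = truth + rng.normal(scale=2.0, size=(56, 100))  # spread-out forecasts
H = np.eye(20, 100)                                      # observe first 20 variables
obs = H @ truth + rng.normal(scale=0.5, size=20)
updated = enkf_analysis(members, obs, H, obs_var=0.25, rng=rng)
print("ensemble spread before:", members.std(axis=0)[:20].mean().round(2),
      "after:", updated.std(axis=0)[:20].mean().round(2))

In the production system each ensemble member is a full atmospheric state rather than a small vector, and the forecast step between analyses is the NCEP model run described above.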

Reanalysis: Reconstructing complete climate data

Traditionally the Earth's climate has been studied by statistical analysis of weather data such as temperature, wind direction and speed, and precipitation, with the results expressed in terms of long-term averages and variability. But statistical summaries by themselves are inadequate for studies of climate changes; for one thing, many important atmospheric events happen too quickly to be captured in the averages. The ideal historical data set would provide continuous, three-dimensional weather data for the entire globe, collected using consistent methods for a period of at least a century, and more if possible. In reality, weather records are incomplete both spatially and temporally, skewed by changing methods of collecting data, and sprinkled with inaccuracies.

Reanalysis is a technique for reconstructing complete, continuous, and physically consistent long-term climate data. It integrates quality-controlled data obtained from disparate observing systems, then feeds these data into a numerical weather forecasting model to produce short-term forecasts. The output from these forecasts fills in the gaps in the recorded observations both in time and space, resulting in high-resolution, three-dimensional data sets.

Over the past decade, reanalysis data sets have been used in a wide range of climate applications and have provided a more detailed and comprehensive understanding of the dynamics of the Earth's atmosphere, especially over regions where the data are sparse, such as the poles and the Southern oceans. Reanalysis has also alleviated the impacts of changing observation systems and reduced the uncertainty of climate modeling by providing consistent and reliable data sets for the development and validation of models.

The reanalysis team ran their code on all four of NERSC's large-scale computing systems — Bassi, Jacquard, Seaborg, and Franklin — and switched to a higher-resolution algorithm when they moved to Franklin. "We got fabulous support from the consultants," Compo said, "especially Helen He and David Turner, on porting code, debugging, disk quota increases, using the HPSS, and special software requests." They parallelized virtually their entire workflow on the Franklin architecture via job bundling, writing compute-node shell scripts, and using MPI sub-communicators to increase the concurrency of the analysis code.

Filling in and correcting the historical record

With the 2007 INCITE allocation, the researchers reconstructed weather maps for the years 1918 to 1949. In 2008, they plan to extend the dataset back to 1892 and forward to 2007, spanning the 20th century. In the future, they hope to run the model at higher resolution on more powerful computers, and perhaps extend the global dataset back to 1850.

One of the first results of the INCITE award is that more historical data are being made available to the international research community. This project will provide climate modelers with surface pressure observations never before released from Australia, Canada, Croatia, the United States, Hong Kong, Italy, Spain, and 11 West African nations. When the researchers see gaps in the data, they contact the country's weather service for more information, and the prospect of contributing to a global database has motivated some countries to increase the quality and quantity of their observational data.

The team also aims to reduce inconsistencies in the atmospheric climate record, which stem from differences in how and where atmospheric conditions are observed. Until the 1940s, for example, weather and climate observations were mainly taken from the Earth's surface. Later, weather balloons were added. Since the 1970s, extensive satellite observations have become the norm. Discrepancies in data resulting from these different observing platforms have caused otherwise similar climate datasets to perform poorly in determining the variability of storm tracks or of tropical and Antarctic climate trends. In some cases, flawed datasets have produced spurious long-term trends.

The new 3D atmospheric dataset will provide missing information about the conditions in which early-century extreme climate events occurred, such as the Dust Bowl of the 1930s and the Arctic warming of the 1920s to 1940s. It will also help to explain climate variations that may have misinformed early-century policy decisions, such as the prolonged wet period in central North America that led to overestimates of expected future precipitation and over-allocation of water resources in the Colorado River basin.

But the most important use of weather data from the past will be the validation of climate model simulations and projections into the future. "This dataset will provide an important validation check on the climate models being used to make 21st century climate projections in the recently released Fourth Assessment Report of the Intergovernmental Panel on Climate Change," Compo said. "Our dataset will also help improve the climate models that will contribute to the IPCC's Fifth Assessment Report."

This article written by: John Hules (Berkeley Lab).
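As an aside on the parallelization strategy mentioned above, the sketch below shows the general sub-communicator idea in mpi4py: MPI_COMM_WORLD is split into independent groups so that many ensemble members or analysis tasks run concurrently inside a single batch job. This is a generic illustration, not the reanalysis team's actual compute-node scripts; the group count and the per-member analysis function are placeholders.

# Minimal mpi4py sketch of the sub-communicator idea: split MPI_COMM_WORLD
# into independent groups so that many ensemble members (or analysis tasks)
# run concurrently within one job. Generic illustration only; it is not the
# reanalysis team's actual workflow scripts.
from mpi4py import MPI

N_GROUPS = 56  # e.g., one group per ensemble member (assumed value)

world = MPI.COMM_WORLD
# Ranks with the same color end up in the same sub-communicator.
color = world.Get_rank() % N_GROUPS
subcomm = world.Split(color=color, key=world.Get_rank())

def run_member_analysis(member_id, comm):
    # Placeholder for the per-member analysis; a real code would read that
    # member's fields and compute using the group's collectives.
    local = comm.Get_rank() + member_id
    return comm.allreduce(local, op=MPI.SUM)

result = run_member_analysis(color, subcomm)
if subcomm.Get_rank() == 0:
    print(f"member {color}: ran on {subcomm.Get_size()} ranks, result={result}")

subcomm.Free()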

A Perfect Sandwich

Figure 1. Schematic diagram of a polymer electrolyte membrane fuel cell. (Labels: backing layers, hydrogen flow field, hydrogen gas, air/oxygen, anode, PEM, cathode, water, unused hydrogen gas.)

Scientists discover why the right amount of moisture in the membrane plays a key role in fuel cell efficiency

What makes a perfect sandwich? Besides good bread and a tasty combination of fillings and condiments, you need the right amount of moisture to convey the flavor in your mouth. If the sandwich is too dry, it may seem less flavorful, and if it is too soggy, the flavor may seem watered down.

The art of sandwich making may be far removed from the science and technology of hydrogen fuel cells, but in both cases, the amount of moisture in the sandwich is important. In a polymer electrolyte membrane (PEM) fuel cell, the electrolyte membrane is sandwiched between an anode (negative electrode) and a cathode (positive electrode), as shown in Figure 1. After the catalyst in the anode splits the hydrogen fuel into protons and electrons, the PEM transports the protons to the cathode, allowing the separated electrons to flow along an external circuit as an electric current. But the PEM needs the right amount of moisture for efficient proton transport — with too much or too little water, power output will drop.

A fundamental understanding of the relationship between membrane nanostructure and the dynamics of water molecules is needed for the development of efficient, reliable, and cost-effective membranes to advance PEM fuel cell technology. The structure and dynamics of the polymer membranes under different levels of hydration cannot be directly observed in experiments, but they can be modeled in molecular dynamics simulations, as shown in a series of three papers published in the Journal of Physical Chemistry B by Ram Devanathan, Arun Venkatnathan, and Michel Dupuis of Pacific Northwest National Laboratory (PNNL).1

"Experimental studies are inadequate to understand proton dynamics, because it occurs below nanoscale," said Devanathan. "This is where NERSC's computing power becomes indispensable. By using advanced computer models, we are getting a grasp of the complex processes at the molecular level in polymer membranes." The simulations for these three papers were run on Jacquard and Bassi.

Project: Charge Transfer, Transport, and Reactivity in Complex Environments
PI: Michel Dupuis, Pacific Northwest National Laboratory
Senior investigators: Ram Devanathan and Arun Venkatnathan, Pacific Northwest National Laboratory
Funding: BES
Computing resources: NERSC, MSCF/EMSL

1 A. Venkatnathan, R. Devanathan, and M. Dupuis, "Atomistic simulations of hydrated Nafion and temperature effects on hydronium ion mobility," J. Phys. Chem. B 111, 7234 (2007). R. Devanathan, A. Venkatnathan, and M. Dupuis, "Atomistic simulation of Nafion membrane: 1. Effect of hydration on membrane nanostructure," J. Phys. Chem. B 111, 8069 (2007). R. Devanathan, A. Venkatnathan, and M. Dupuis, "Atomistic simulation of Nafion membrane. 2. Dynamics of water molecules and hydronium ions," J. Phys. Chem. B 111, 13006 (2007).

The research is part of President Bush's Hydrogen Fuel Initiative, which aims to develop commercially viable hydrogen fuel cells. Using this clean and efficient technology would help to reduce the world's reliance on fossil fuels and lessen greenhouse gas emissions.

The PNNL researchers' three Journal of Physical Chemistry B papers all studied a polymer membrane manufactured by DuPont called Nafion, which has been the subject of numerous experiments and is considered a good starting point for the development of next-generation polymer electrolytes.

"Nafion 117 has excellent proton conductivity and good chemical and mechanical stability, but the atomic-level details of its structure at various degrees of hydration are not well characterized or understood," the authors wrote in their first paper, which was featured on the cover of the journal's June 28, 2007 issue (Figure 2).

Figure 2. The June 28, 2007 cover of the Journal of Physical Chemistry B showed snapshots of ionized Nafion and hydronium ions at various degrees of hydration: λ = 3.5 (a), λ = 6 (b), λ = 11 (c), and λ = 16 (d) at 350 K. The black area corresponds to the polymer backbone that is not shown. The pendant side chain (green), sulfonate (yellow and red), hydronium ions (red and white), and water molecules (steel gray) show the structural changes associated with changes in hydration.

Hydration and temperature

In this paper the researchers set out to create simulations that examined the impact of hydration and temperature on the positively charged hydrated protons and water molecules. Understanding these dynamics could lead to polymer membranes that are better engineered for transporting protons while controlling electrode flooding by the water molecules. One of the goals is to develop PEM membranes that need little water.

Using classical molecular dynamics simulations, the research team investigated the impact of four levels of hydration and two different temperatures. The scientists calculated structural properties such as radial distribution functions and coordination numbers, and dynamical properties such as diffusion coefficients of hydronium ions (H3O+) and water molecules.

The results of their calculations showed that protons and water molecules are bound to sulfonate groups in the membrane at low hydration levels. As the hydration level increases, the water molecules become free and form a network along which protons can hop (Figure 2). This leads to a dramatic increase in proton conductivity. Temperature was found to have a significant effect on the absolute value of the diffusion coefficients for both water and hydronium ions. These findings have helped in interpreting experimental results that indicate a major structural change taking place in the membrane with increasing hydration.

In the second paper, the authors used all-atom molecular dynamics simulations to systematically examine eight different levels of membrane hydration to closely mirror two experimental studies. They also simulated bulk water to develop a novel criterion to identify free water in Nafion. This enabled them to quantify the fraction of free, weakly bound, and bound water molecules in the membrane as a function of hydration.
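Diffusion coefficients such as those discussed above are commonly extracted from a molecular dynamics trajectory with the Einstein relation, in which the mean squared displacement grows as 6Dt in three dimensions. The sketch below shows that standard analysis applied to synthetic random-walk positions; it is a generic illustration, not the PNNL analysis workflow, and the trajectory length, time step, and displacement scale are invented values.

# Sketch of estimating a diffusion coefficient from an MD trajectory via the
# Einstein relation, MSD(t) ~ 6 D t in 3D. The trajectory is a synthetic
# random walk standing in for water/hydronium positions; this is a generic
# illustration, not the PNNL analysis code.
import numpy as np

rng = np.random.default_rng(2)
n_frames, n_molecules, dt_ps = 2000, 100, 0.5   # assumed trajectory parameters

# Synthetic unwrapped positions (nm): cumulative random displacements.
positions = np.cumsum(rng.normal(scale=0.05, size=(n_frames, n_molecules, 3)), axis=0)

# Mean squared displacement relative to the first frame, averaged over molecules.
disp = positions - positions[0]
msd = (disp ** 2).sum(axis=2).mean(axis=1)          # nm^2 for each frame
time_ps = np.arange(n_frames) * dt_ps

# Fit the linear regime (skip the early part) and apply MSD = 6 D t.
fit_slice = slice(n_frames // 4, None)
slope, _ = np.polyfit(time_ps[fit_slice], msd[fit_slice], 1)   # nm^2 / ps
D_cm2_per_s = slope / 6.0 * 1e-14 / 1e-12                      # nm^2/ps -> cm^2/s
print(f"estimated diffusion coefficient: {D_cm2_per_s:.2e} cm^2/s")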

The researchers found that at low hydration levels, strong binding of hydronium ions to sulfonate groups prevents transport of protons. Multiple sulfonate groups surrounding the hydronium ions in bridging configurations hinder the hydration of the hydronium and the structural diffusion of protons (Figure 3). As the hydration level increases, the water molecules mediate the interaction between hydronium ions and sulfonate groups, moving them farther apart. These results provide atomic-level insights into structural changes observed in Nafion by infrared spectroscopy.

Structure and dynamics

In the third report, Devanathan, Venkatnathan, and Dupuis computed the dynamical properties of water molecules and hydronium ions in Nafion and related them to the structural changes reported previously. They confirmed other researchers' finding that the behavior of water molecules within nanoscale pores and channels of PEMs, especially at low hydration levels, is remarkably different from that of molecules in bulk water.

At low hydration, fewer than 20% of the water molecules are free (bulklike). With increasing hydration, the diffusion coefficients of hydronium ions and water molecules increase, and the mean residence time of water molecules around sulfonate groups decreases. These results provide a molecular-level explanation for the proton and water dynamics observed in neutron scattering experiments.

Because the structure and dynamics of the membrane under different levels of hydration cannot be directly observed in experiments, there is no universally accepted model of the structure of Nafion. This research makes a significant step toward that goal and toward the development of the next generation of PEMs.

Characteristics of the ideal PEM include high proton conductivity at low hydration levels; thermal, mechanical, and chemical stability; durability under prolonged operation; and low cost. None of the existing membranes meet all these requirements, and developing new membranes requires a molecular-level understanding of membrane chemistry and nanostructure. Molecular dynamics simulations like these, together with experiments, are laying the foundation for future breakthroughs in fuel cells.

This article written by: John Hules and Ucilia Wang, Berkeley Lab.

Figure 3. Orthographic projection (~42 Å × 30 Å) of hydrated Nafion for the following λ values: (a) 3; (b) 5; (c) 7; (d) 9; (e) 11; and (f) 13.5. Water molecules, hydronium ions, sulfonate groups, and the rest of the membrane are represented in blue, red, yellow, and gray, respectively.

The Fusion Two-Step

Simulations elucidate the physics of fast ignition

Project: Three-Dimensional Particle-in-Cell Simulations for Fast Ignition
PI: Chuang Ren, University of Rochester, Laboratory for Laser Energetics
Senior investigators: Warren Mori and John Tonge, University of California, Los Angeles
Funding: INCITE, FES
Computing Resources: NERSC, UCLA

To a dance aficionado, the term two-step may refer to the ballroom dance that evolved into the foxtrot, or to country/western dances like the Texas two-step and the Cajun two-step. But in the realm of alternative energy sources, one of the hottest new trends is the two-step fast ignition concept for inertial confinement fusion (ICF).

ICF is the process of initiating a nuclear fusion reaction by heating and compressing a fuel target, usually a pellet of deuterium-tritium (DT) ice. If a 10 milligram DT fuel pellet was completely consumed by fusion, it would release energy equivalent to more than half a barrel of oil.

Until recently, the most common approach to ICF has been hot-spot ignition, in which the fuel pellet is compressed and heated in one step by a multi-beam laser. This is the concept around which the National Ignition Facility (NIF), scheduled to be completed in 2009 at Lawrence Livermore National Laboratory, was designed.

In the newer fast ignition concept, compression and ignition are separated into two steps: first, a compression laser compresses a spherical shell of DT ice to high density at low temperature, without a central hot spot; then a second very high-intensity laser delivers an extremely short pulse of energy that ignites the compressed fuel.

The two ignition concepts are sometimes compared to a diesel engine (pure compression) and a gasoline engine (where the fast ignitor is equivalent to a spark plug). Compared to hot-spot ignition, fast ignition promises much higher gain (the ratio of energy output to energy input) for the same driver energy, possible reduction of the driver energy necessary to achieve ignition, and less stringent compression symmetry requirements.

The ignition step is the least understood aspect of fast ignition and is the subject of the INCITE project "Three-Dimensional Particle-in-Cell Simulations for Fast Ignition," led by Chuang Ren, Assistant Professor of Mechanical Engineering and Physics at the University of Rochester, and Warren Mori, Professor of Physics and Electrical Engineering at UCLA.
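A rough back-of-the-envelope check of the half-barrel comparison above, assuming complete burn of the 10 mg pellet, the standard 17.6 MeV released per D-T reaction, and roughly 6.1 GJ per barrel of oil equivalent (these constants are textbook values, not figures from the article):

# Back-of-the-envelope check of the "half a barrel of oil" comparison,
# assuming complete burn of a 10 mg DT pellet. Constants are standard values.
MEV_TO_J = 1.602e-13               # joules per MeV
E_PER_REACTION = 17.6              # MeV released per D + T -> He-4 + n reaction
AMU_TO_KG = 1.661e-27
REACTANT_MASS = 5.03 * AMU_TO_KG   # one D (2.014 u) plus one T (3.016 u)

pellet_mass_kg = 10e-6             # 10 milligrams
n_reactions = pellet_mass_kg / REACTANT_MASS
energy_j = n_reactions * E_PER_REACTION * MEV_TO_J

BARREL_OIL_J = 6.1e9               # ~6.1 GJ per barrel of oil equivalent
print(f"energy released: {energy_j / 1e9:.1f} GJ "
      f"= {energy_j / BARREL_OIL_J:.2f} barrels of oil")

The result is about 3.4 GJ, a bit more than half a barrel, consistent with the statement above.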

The hole-boring scheme

Since the ignition laser cannot directly reach the dense core region, the laser energy needs to be converted into an energetic (super-hot) electron beam that can penetrate to the core and deposit its energy there. The electron beam needs to be generated as close to the core as possible to reduce energy loss along its path to the core. One way to do that is the hole-boring scheme, in which the ignition laser pulse is preceded by a channeling laser pulse to create a channel through the underdense corona and into a critical density surface, beyond which it cannot penetrate (Figure 1). The ignition pulse is then sent in tandem to reach the critical surface and may continue to push forward into the overdense plasma, in the meantime heating the plasma to generate the energetic electron beam. This beam will penetrate through the dense plasma and deposit the energy in the core, heating a small area to a high-enough temperature to ignite the fusion reaction.

Figure 1. A sketch of the hole-boring scheme for fast ignition.

"Compressing the fuel to high density is relatively easy to achieve, but high temperature is more challenging," Ren said. "You need to convert the laser energy into electron energy, and these electrons need to be collimated [focused], because if they spread to a large area, you need much more energy to heat a large area. You need to focus the laser down to a very small spot, say a 20 micron radius, and make the electrons go forward, not just spread."

Ren continued: "The ignition laser has to heat the pellet in a very short time, 10 to 20 picoseconds, so its intensity is a lot higher than the compression laser, which works on the time scale of 1 nanosecond and has intensity of 10^14 watts per square centimeter. But the ignition laser will need intensity of 10^19 to 10^20 watts per square centimeter. So we understand less about the interactions between the more intense ignition beam and a plasma. Our computation is about this process."

Particle-in-cell (PIC) methods provide the best available simulation tool to understand the highly nonlinear and kinetic physics in the ignition phase. Ren's project covers almost all the physics in the ignition phase with the goal of answering the following questions:

1. Can a clean channel be created by a channeling pulse so that the ignition pulse can arrive at the critical surface without significant energy loss?
2. What are the amount and spectrum of the laser-generated energetic electrons?
3. What is the energetic electron transport process beyond the laser-plasma interface in a plasma with densities up to 10^23 cm^-3?

Millimeter-scale simulations

Most of the previous channeling experiments and simulations were done in 100 micrometer-scale plasmas, but the underdense region of an actual fast ignition target is 10 times longer. "If you extend the experiment 10 times longer in both size and time, you see new phenomena," Ren said.

Using the OSIRIS code, which can run on over 1000 processors with more than 80% efficiency, Ren's team ran the first 2D simulations of channeling at the millimeter scale.1 These simulations, which ran on Seaborg and Bassi, employed up to 8 × 10^7 grids, 10^9 particles, and 10^6 time steps. NERSC's User Services Group increased the researchers' disk quota and queue priority to accommodate the scale of these calculations. The Analytics Team also assisted by reducing network latency for remote performance of Xlib-based applications.
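For readers unfamiliar with the method, the particle-in-cell cycle mentioned above (deposit charge on a grid, solve for the field, interpolate the field back to the particles, and push them) can be written in a deliberately minimal one-dimensional electrostatic form. The sketch below is a generic teaching example, not the OSIRIS code, and it omits the lasers, relativistic effects, and 2D/3D geometry of the actual simulations; every numerical parameter is arbitrary.

# A minimal 1D electrostatic particle-in-cell cycle in normalized units
# (two-stream-style setup). Generic teaching sketch of the PIC method,
# not the OSIRIS code.
import numpy as np

ng, L, n_part, dt, steps = 64, 2 * np.pi, 10000, 0.1, 200
dx = L / ng
q_over_m = -1.0                      # electrons, normalized charge/mass
weight = L / n_part                  # charge carried by each macro-particle

rng = np.random.default_rng(3)
x = rng.uniform(0.0, L, n_part)      # particle positions
v = rng.choice([-1.0, 1.0], n_part) + 0.01 * rng.normal(size=n_part)

k = 2 * np.pi * np.fft.rfftfreq(ng, d=dx)   # wavenumbers for the Poisson solve

for step in range(steps):
    # 1) Deposit charge on the grid (nearest grid point), plus neutralizing ions.
    cells = (x / dx).astype(int) % ng
    rho = -np.bincount(cells, minlength=ng) * weight / dx + 1.0
    # 2) Solve Poisson's equation d^2(phi)/dx^2 = -rho with an FFT.
    rho_k = np.fft.rfft(rho)
    phi_k = np.zeros_like(rho_k)
    phi_k[1:] = rho_k[1:] / k[1:] ** 2
    E_grid = -np.gradient(np.fft.irfft(phi_k, n=ng), dx)
    # 3) Gather the field at particle positions, then advance velocities and
    #    positions (leapfrog-style update with periodic boundaries).
    E_part = E_grid[cells]
    v += q_over_m * E_part * dt
    x = (x + v * dt) % L

print("final field energy:", 0.5 * np.sum(E_grid ** 2) * dx)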

1 G. Li, R. Yan, C. Ren, T.-L. Wang, J. Tonge, and W. B. Mori, "Laser channeling in millimeter-scale underdense plasmas of fast-ignition targets," Physical Review Letters 100, 125002 (2008).

The results of the simulations showed important new details of the channeling process, including plasma buildup in front of the laser, laser hosing (an undulating motion like water coming out of a garden hose), and channel bifurcation and self-correction (Figure 2).

Figure 2. Simulation of laser channeling through an underdense plasma. (a) Ion density at t = 0.8 ps showing micro channels formed; (b) laser E-field showing laser hosing and (c) ion density showing channel bifurcation at t = 3.4 ps; (d) ion density at t = 7.2 ps showing channel self-correction.

The simulations demonstrated electron heating to relativistic temperatures, a channeling speed much less than the linear group velocity of the laser, and increased transmission of an ignition pulse in a preformed channel.

The simulation results also shed light on the question of how to save energy during the channeling process. "You want to spend as little energy as possible on creating the channel and save it for ignition," Ren said, "so what kind of laser intensity do you use for the channeling? High intensity will create a channel more quickly, but you may spend more energy. You can also use a low intensity laser but it takes longer, so it was not clear before how to minimize that. We showed that a lower intensity laser takes more time to produce the channel but does it effectively using less energy than a high intensity laser. This result will provide some guidance for designing experiments."

The group's 2D simulations of hot electron generation and transport were the largest ever in target size and the first with isolated targets. These simulations employed 5 × 10^8 grids, 10^9 particles, and 10^5 time steps. The results showed that the temperature of electrons emitted at high laser intensities is only half of that predicted by an empirical formula used in many fast ignition feasibility studies. The simulations also found that the laser absorption rate increases with the laser intensity. "Higher-intensity lasers are desirable since they bore a deeper hole and deliver their energy to a smaller area, creating a hot spot of higher temperature," Ren explained. The combination of these effects means that ultra-high-intensity lasers can produce an electron flux with a majority of electrons in the usable energy range.

"But there are important effects that cannot be simulated in two dimensions," Ren pointed out. "Simply scaling up our 2D simulations to 3D would require more than a 4000-fold computation increase. That is not feasible even on the largest computers available. So we will combine 3D simulations at reduced scales with full-scale 2D results and theory to figure out what happens."

Ren is a researcher at the University of Rochester's Laboratory for Laser Energetics, where the OMEGA EP laser system, the world's leading system for fast ignition experiments, is scheduled to be completed in April 2008. He also collaborates with other investigators in the DOE's Fusion Science Center for Extreme States of Matter and Fast Ignition Physics, which coordinates research in all aspects of fast ignition. Fast ignition is also going to be tested at the FIREX facility in Japan and the Z machine at Sandia National Laboratories. The National Ignition Facility could be adapted for full-scale fast ignition experiments, and the proposed HiPER facility in Europe is being designed for just that purpose. All of these experiments will benefit from the insights gained in Ren's simulations, which gives his work a sense of urgency.

"This research will help toward the realization of fusion as a controllable energy source, and can help solve the energy crisis facing the world today," he concluded.

This article written by: John Hules, Berkeley Lab.

Spontaneous Superlattice

Ab initio calculations and modeling contribute to the discovery of a new way to fabricate striped nanorods

Superlatticed or “striped” nanorods — crystalline materials only a few mol- ecules in thickness and made up of two or more semiconductors — are highly valued for their potential to serve in a variety of nanodevices, including transis- tors, biochemical sensors, and light-emitting diodes (LEDs). Until now the po- tential of superlatticed nanorods has been limited by the relatively expensive and exacting process required to make them. That paradigm may be shifting. A team of researchers with Lawrence Berkeley National Laboratory (Berke- ley Lab) and the University of California (UC) at Berkeley, has found a way to make striped nanorods in a colloid — a suspension of particles in solution. Previously, striped nanorods were made through epitaxial processes, in which the rods were attached to or embedded within a solid medium. “We have demonstrated the application of strain engineering in a colloidal quantum-dot system by introducing a method that spontaneously creates a regularly spaced arrangement of quantum dots within a colloidal quantum rod,” said chemist Paul Alivisatos, who led this research. “A linear array of quantum dots within a nanorod effectively creates a one-dimensional superlattice, or striped nanorod.” Alivisatos, an internationally recognized authority on colloidal nanocrystal research, is the Director of the Materials Sciences Division and Associate Lab- oratory Director for Physical Sciences at Berkeley Lab, and is the Larry and Diane Bock Professor of Nanotechnology at UC Berkeley. Collaborators on this project, which culminated in a paper published in the journal Science1, were Richard Robinson of Berkeley Lab’s Materials Sciences Division (lead au- thor), Denis Demchenko and Lin-Wang Wang of Berkeley Lab’s Computational Research Division; and Bryce Sadtler and Can Erdonmez, of the UC Berkeley Project: Large Scale Nanostructure Department of Chemistry. Electronic Structure Calculations PI: Lin-Wang Wang, Lawrence Berkeley National Laboratory One-dimensional fabrication Senior investigators: Byounghak Today’s electronics industry is built on two-dimensional semiconductor Lee, Joshua Schrier, Denis materials that feature carefully controlled doping and interfaces. Tomorrow’s Demchenko, Nenad Vukmirovic, Sefa Dag, Lawrence Berkeley National Laboratory Funding: BES, ASCR 1 R. D. Robinson, B. Sadtler, D. O. Demchenko, C. K. Erdonmez, L. W. Wang, and A. P. Alivisatos, “Spontaneous superlattice formation in nanorods through partial cation exchange,” Science 317, 355 (2007). 26 National Energy Research Scientific Computing Center 2007 Annual Report

industry will be built on one-dimen- striped nanorods opens up the pos- cation exchange with free-standing sional materials, in which controlled sibility of using them in biological quantum dots of the semiconductor

doping and interfaces are achieved labeling, and in solution-processed silver sulfide (Ag2S) (Figure 1). through superlatticed structures. LEDs and solar cells.” “We found that a linear arrange- Formed from alternating layers of Previous research by Alivisatos ment of regularly spaced silver sulfide semiconductor materials with wide and his group had shown that the contained within a cadmium-sulfide and narrow band gaps, superlat- exchange of cations could be used to nanorod forms spontaneously at a ticed structures, such as striped vary the proportion of two semicon- cation exchange rate of approxi- nanorods, can display not only out- ductors within a single nanocrystal mately 36 percent,” said Alivisatos. standing electronic properties, but without changing the crystal’s size “The resulting striped nanorods dis- photonic properties as well. and shape, so long as the crystal’s play properties expected of an epi- “A target of colloidal nanocrystal minimum dimension exceeded four taxially prepared array of silver-sulfide research has been to create super- nanometers. This led the group to quantum dots separated by confin- latticed structures while leveraging investigate the possibility of using a ing regions of cadmium sulfide. This the advantages of solution-phase partial exchange of cations between includes the ability to emit near- fabrication, such as low-cost syn- two semiconductors in a colloid to infrared light, which opens up poten- thesis and compatibility in disparate form a superlattice. Working with tial applications such as nanometer- environments,” Alivisatos said. “A previously formed cadmium-sulfide scale optoelectronic devices.” colloidal approach to making (CdS) nanorods, they engineered a

A B Strain engineering

One of the key difference be- tween quantum dots epitaxially grown on a substrate and freestanding colloidal quantum dots is the pres- ence of strain. The use of tempera- ture, pressure, and other forms of stress to place a strain on material structures that can alter certain properties is called “strain engineer-

20 nm 20 nm ing.” This technique is used to en- hance the performance of today’s C electronic devices, and has recently been used to spatially pattern epi- taxially grown striped nanorods. However, strain engineering in 60 epitaxially produced striped nanorods requires clever tricks,

Counts 20 whereas Demchenko and Wang

10 nm 016 32 discovered — through ab initio cal- Separation (nm) culations of the interfacial energy Figure 1. In these transmission electron microscope images of superlatticed or striped and computer modeling of strain nanorods formed through partial cation exchange, (A) shows the original cadmium-sulfide energies — that naturally occurring nanorods; (B and C) show cadmium-sulfide nanorods striped with silver-sulfide. The inset is a strain in the colloidal process would histogram showing the pattern spacing of the silver-sulfide stripes. be the driving force that induced National Energy Research Scientific Computing Center 27 2007 Annual Report

the valence force field (VFF) A 1 2 3 4 method, which is an atomistic bond stretching and bending model. The S 2 Ag researchers received assistance [100] with code installation and testing CdS Ag2S DE2(x1.5) from Zhengji Zhao, a materials sci- 3(x1) CdS ence and chemistry specialist in [001] 4(x7) NERSC’s User Services Group. “This project has involved tight coordination between computer

Intensity (a.u.) simulations and experiment, and 2,3,4 1(x1) the results obtained here would not S = yellow have been possible to achieve with- Cd = 0range Ag = gray 440 480 520 900 1200 1500 out the contributions of our compu- Wavelength (nm) tational scientists, Demchenko and B Stain C Wang,” Alivisatos said. “It is an- (% bond length) 164 +2.0 other clear example where we see +1.6 162 +1.2 that theoretical simulations are not 160 +0.8 14 16 18 +0.4 just being used to explain materials 158 0.0 growth after the fact, but are now –0.4 156 –0.8 an integral part of the materials de- Elastic Energy (eV) 154 –1.2 4 6 8 10 12 14 16 18 20 –1.6 sign and creation process from the –2.0 Seg. Separation (nm) very start.” Even though the colloidal Figure 2. Theoretical modeling and experimental optical characterization. (A) Cubic-cutout rep- striped nanorods form sponta- resentation of cells used for ab initio energy calculations. A distorted monoclinic Ag2S (100) neously, Alivisatos said it should be plane connects with the wurtzite CdS (001) plane. (B) Elastic energy of the rod as a function of segment separation (center-to-center). (C) Z-axis strain for the case of two mismatched seg- possible to control their superlat- ments at a center-to-center separation distance of 14.1 nm (top) and 12.1 nm (bottom). The ticed pattern — hence their proper- elastic interaction between segments is greatly reduced for separations >12.1 nm. Arrows ties — by adjusting the length, show the placement of mismatched segments. (D) Visible and (E) near-infrared photolumines- width, composition, etc., of the cence spectra at λ = 400- and 550-nm excitation, respectively. Coupling between the CdS and original nanocrystals. However, Ag2S is evident by the complete quenching of the visible photoluminescence (D) in the het- much more work remains to be erostructures. The shift in near-infrared photoluminescence (E) is due to quantum confinement done before the colloidal method of of the Ag2S. fabricating striped nanorods can the spontaneous formation of the Energy (PEtot) codes, utilizing the match some of the “spectacular re- superlattice structures (Figure 2). local density approximation and sults” that have been obtained from This is the first time that the elastic generalized gradient approximation epitaxial fabrication. energy has been shown to be re- to the density functional theory. These “For now, the value of our work sponsible for pattern formation in a techniques were used to estimate lies in the unification of concepts colloidal nanostructure. stability of various Ag2S phases, find between epitaxial and colloidal fab- Demchenko and Wang per- the optimal geometry for the epitaxial rication methods,” he said. } formed the ab initio calculations of attachment, calculate the formation the electronic structure of Ag2S and energies of the CdS–Ag2S interfaces, This article written by: CdS on Seaborg and Bassi using and calculate the corresponding Lynn Yarris and John the Vienna Ab-Initio Simulation band alignment. Elastic energies Hules (Berkeley Lab). Package (VASP) and Parallel Total and strains were estimated using Igniting a Stellar Explosion National Energy Research Scientific Computing Center 29 2007 Annual Report

Flash Center achieves the first 3D simulation of the spontaneous detonation of a white dwarf star

University of Chicago scientists demonstrated how to incinerate a white dwarf star in unprecedented detail at the “Paths to Exploding Stars” confer- ence on March 22, 2007, in Santa Barbara, Calif. White dwarf stars pack one and a half times the mass of the sun into an object the size of Earth. When they burn out, the ensuing explosion produces a type of supernova that astrophysicists believe manufactures most of the iron in the universe. These type Ia supernovas, as they are called, may also help illumi- nate the mystery of dark energy, an unknown force that dominates the universe. “That will only be possible if we can gain a much better understanding of the way in which these stars explode,” said Don Lamb, Director of the Univer- sity of Chicago’s Center for Astrophysical Thermonuclear Flashes. Scientists for years have attempted to blow up a white dwarf star by writ- ing the laws of physics into computer software and then testing it in simula- tions. At first the detonations would only occur if inserted manually into the programs. Then the Flash team naturally detonated white dwarf stars in sim- plified, two-dimensional tests, but “there were claims made that it wouldn’t work in 3D,” Lamb said. But in January 2007, the Flash Center team for the first time naturally deto- nated a white dwarf in a more realistic three-dimensional simulation. The sim- ulation confirmed what the team already suspected from previous tests: that the stars detonate in a supersonic process resembling diesel-engine combus- tion, which they call gravitationally confined detonation.1 Unlike a gasoline engine, in which a spark ignites the fuel, it is compres- sion that triggers ignition in a diesel engine. “You don’t want supersonic burn- Project: Validation Study of Funda- ing in a car engine, but the triggering is similar,” said Dean Townsley, a mental Properties of Type Ia Super- Research Associate at the Joint Institute for Nuclear Astrophysics at Chicago. novae Models The temperature attained by a detonating white dwarf star makes the PI: Don Lamb, University of Chicago 10,000-degree surface of the sun seem like a cold winter day in Chicago by Center for Astrophysical Thermonu- comparison. “In nuclear explosions, you deal with temperatures on the order clear Flashes of a billion degrees,” said Flash Center Research Associate Cal Jordan. The new 3D white dwarf simulation shows the formation of a flame bubble Senior investigators: Robert Fisher, near the center of the star. The bubble, initially measuring approximately 10 Anshu Dubey, Jim Truran, Carlo miles in diameter, rises more than 1,200 miles to the surface of the star in one Graziani, Dean Townsley, Cal Jordan, Casey Meakin, University of Chicago Funding: INCITE, ASC, NSF 1 G. C. Jordan IV, R. T. Fisher, D. M. Townsley, A. C. Calder, C. Graziani, S. Asida, D. Q. Lamb, and J. Computing Resources: NERSC, LLNL W. Truran, “Three-dimensional simulations of the deflagration phase of the gravitationally confined detonation model of Type Ia supernovae,” Astrophysical Journal 681, 1448–1457 (2008). 30 National Energy Research Scientific Computing Center 2007 Annual Report

Figure 1. Three phases of the gravitationally confined detonation mechanism. The images show the flame surface (orange) and the star surface (blue) (a) at 0.5 s, soon after the bubble becomes unstable and develops into a mushroom shape, (b) at 1.0 s, as the bubble breaks through the surface of the star, and (c) at 1.7 s, shortly before the hot ash from the bubble collides at the opposite point on the surface of the star. These im- ages are generated from volume-renderings of the flame surface and the density.

second. In another second, the would never have been able to do like being a doctor on call 24/7.” flame crashes into itself on the op- the simulations.” But the scientific payoff for log- posite end of the star, triggering a Katie Antypas, an HPC consult- ging these long, stressful hours is detonation. “It seems that the dy- ant in NERSC’s User Services potentially huge. Astrophysicists namics of the collision is what cre- Group, worked closely with Lamb value type Ia supernovas because ates a localized compression region to run the simulations on Seaborg. they all seem to explode with ap- where the detonation will manifest,” “With help and input from many proximately the same intensity. Cal- Townsley said. Figure 1 shows people at NERSC, from setting up ibrating these explosions according stages of the flame erupting and accounts, allocating terabytes of to their distance reveals how fast enveloping the star, while Figure 2 disk space, granting file sharing the universe has been expanding at shows the entire process from permissions to analyzing output various times during its long history. flame formation to detonation. from failed runs, we were able to In the late 1990s, supernova get the Flash team’s 512-processor measurements revealed that the ex- job up and running on short notice pansion of the universe is accelerat- Extreme computing to help them meet a hard deadline,” ing. Not knowing what force was Antypas said. working against gravity to cause This process plays out in no more The simulations are so demand- this expansion, scientists began than three seconds, but the simula- ing — the Flash team calls it “ex- calling it “dark energy.” The Flash tions take considerably longer. The treme computing” — that they Center simulations may help astro- Flash Center team ran its massive monopolize powerful computers of physicists make better calibrations simulation on two powerful super- the U.S. Department of Energy dur- to adjust for the minor variation that computers at Lawrence Livermore ing the allocated time. To ensure they believe occurs from one super- National Laboratory and at NERSC. that these computers are used to nova to the next. Just one of the jobs ran for 75 their maximum potential, the Flash “To make extremely precise hours on 768 computer processors, team stands on alert to rapidly cor- statements about the nature of dark for a total of 58,000 hours. rect any glitches that may arise. energy and cosmological expan- “I cannot say enough about the “We have it set up so that if sion, you have to be able to under- support we received from the high- something goes wrong, text mes- stand the nature of that variation,” performance computing teams at sages are sent out instantaneously Fisher said. Lawrence Livermore and NERSC,” to everyone,” said Flash Center Re- Telescopic images of the two Lamb said. “Without their help, we search Scientist Robert Fisher. “It’s supernovas closest to Earth seem National Energy Research Scientific Computing Center 31 2007 Annual Report

Figure 2. This series of images shows a two-dimensional slice through the center of an exploding white dwarf star. The lines that form the rings are contours that mark dif- ferences in density. The gray tones represent fuel and ash that is enveloping the star. These images were produced in the first three-dimensional computer simulation in which a white dwarf exploded naturally. In previous 3D simulations, the detonation had to be inserted manually. match the Flash team’s findings. programs: the Advanced Simulation The images of both supernovas and Computing program, which has show a sphere with a cap blown off provided funding and computer the end. time to the Flash Center for nearly a “In our model, we have a rising decade, and INCITE (Innovative and bubble that pops out of the top. It’s Novel Computation Impact on The- very suggestive,” Jordan said. ory and Experiment) of the Office of This article written by: Steve Koppes, Support for these simulations Science, which has provided com- University of Chicago; Ucilia Wang and John Hules, Berkeley Lab. was provided by two separate DOE puter time. } Science for Humanity

NERSC users share Nobel Peace Prize, among other honors

More than 20 NERSC users were contributing authors of the United Na- tions Intergovernmental Panel on Climate Change (IPCC) Fourth As- sessment Report (AR4), which was published in early 2007. Later in the year, the IPCC — a group of more than 2000 scientists and pol- icy experts — shared the 2007 Nobel Peace Prize with former Vice President Al Gore “for their efforts to build up and disseminate greater knowledge about man-made climate change, and to lay the foundations for the measures that are needed to counteract such change,” according to the Nobel announcement. Supercomputers at NERSC and the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory provided more than half of the simulation data for the joint Department of Energy and National Science Foundation data contribution to AR4. “Access to DOE leadership-class, high-performance computing assets at NERSC and ORNL significantly improved model simulations,” said atmospheric scientist Lawrence Buja of the National Center for Atmospheric Research (NCAR), an NSF center. “These computers made it possible to run more realistic physical processes at higher resolutions, with more ensemble members, and longer historical validation simulations. We simply couldn’t have done this without the strong DOE/NSF interagency partnership.” At NERSC, climate runs for the IPCC began in the late 1990s with the Parallel Climate Model (PCM). Results from these runs were stored in the PCM database at NERSC, the first truly public database for distributing climate data. “It is fair to say that without the PCM runs, made largely at NERSC in the late 1990s through 2002, the U.S. modeling effort would have not been the major factor it is in the IPCC report,” said Michael Wehner, who managed the PCM database at the time. Since 2002, running PCM and the newer Community Climate System Model (CCSM), the IPCC project has used nearly 10 million processor hours at NERSC. Warren Washington of NCAR is principal investigator of the climate change simulation project at NERSC, working with senior investigators Jerry Meehl and Lawrence Buja. National Energy Research Scientific Computing Center 33 2007 Annual Report

Other current NERSC users who contributed to the A number of other awards and honors were bestowed IPCC AR4 report include: on NERSC users in 2007: Krishna Achutarao, Lawrence Livermore National Member of the National Academy of Sciences Laboratory American Chemical Society Ahmed Zewail Award in Natalia Andronova, University of Michigan Ultrafast Science and Technology Graham Fleming, University of California, Berkeley and Julie Arblaster, National Center for Atmospheric Research Lawrence Berkeley National Laboratory and Bureau of Meteorology Research Centre, Australia Fellow of the American Academy of Arts & Sciences William Collins, Lawrence Berkeley National Laboratory Saul Perlmutter, University of California, Berkeley and Curt Covey, Lawrence Livermore National Laboratory Lawrence Berkeley National Laboratory Chris Forest, Massachusetts Institute of Technology Gruber Cosmology Prize Saul Perlmutter, Greg Aldering, Alex Kim, and Peter Nu- Inez Fung, University of California, Berkeley and Lawrence gent (Supernova Cosmology Project), Lawrence Berke- Berkeley National Laboratory ley National Laboratory Nathan Gillett, University of East Anglia Fellows of the American Association for the Advance- Jonathan Gregory, University of Reading and Hadley Cen- ment of Science tre for Climate Prediction and Research James Chelikowsky, University of Texas, Austin William Gutowski, Iowa State University Peter Cummings, Vanderbilt University Fritz Prinz, Stanford University Aixue Hu, National Center for Atmospheric Research Fellows of the American Physical Society Hugo Lambert, University of California, Berkeley Michael Borland, Argonne National Laboratory Eric Leuliette, University of Colorado, Boulder Giorgio Gratta, Stanford University Ruby Leung, Pacific Northwest National Laboratory and Stephen Gray, Argonne National Laboratory National Oceanic and Atmospheric Administration Edward Seidel, Louisiana State University U.S. Department of Energy Ernest Orlando Lawrence Michael Mastrandrea, Stanford University Award Carl Mears, Remote Sensing Systems Paul Alivisatos, University of California, Berkeley and Surabi Menon, Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Joyce Penner, University of Michigan IEEE Computer Society Sidney Fernbach Award David Keyes, Columbia University Thomas Phillips, Lawrence Livermore National Laboratory Welch Award in Chemistry David Randall, Colorado State University William Miller, University of California, Berkeley and Haiyan Teng, National Center for Atmospheric Research Lawrence Berkeley National Laboratory Minghuai Wang, University of Michigan American Meteorological Society Charles Franklin Brooks Award Li Xu, University of Michigan Warren Washington, National Center for Atmospheric Research American Physical Society Nicholas Metropolis Award for Outstanding Doctoral Thesis Work in Computa- tional Physics Chengkun Huang, University of California, Los Angeles The NERSC Center National Energy Research Scientific Computing Center 35 2007 Annual Report

Kathy Yelick Is Named NERSC Director

Kathy Yelick, a professor of computer science at the University of California at Berkeley and an internationally recognized expert in developing methods to advance the use of supercomputers, was named director of NERSC in Octo- ber 2007 and assumed her new duties in January 2008. Yelick has received a number of research and teaching awards and is the author or co-author of two books and more than 75 refereed technical papers. She earned her Ph.D. in computer science from MIT and has been a professor at UC Berkeley since 1991 with a joint research appointment at Berkeley Lab since 1996. “We are truly delighted to have Kathy serve as the next director of NERSC, and only the fifth director since the center was established in 1974,” said Berkeley Lab Director Steven Chu. “Her experience and expertise in advancing the state of high performance computing make her the perfect choice to maintain NERSC’s leadership position among the world’s supercomputing centers.” Yelick, who has been head of the Future Technologies Group at Berkeley Lab since 2005, succeeds Horst Simon as head of NERSC. Simon, who has led NERSC since 1996, will continue to serve as Berkeley Lab’s Associate Di- rector for Computing Sciences and Director of the Computational Research Division. “When Horst Simon announced that he wanted to relinquish the leadership of NERSC, we knew he would be a tough act to follow,” said Michael Strayer, head of DOE’s Office of Advanced Scientific Computing Research, which funds NERSC. “But with the selection of Kathy Yelick as the next director, I believe that NERSC will continue to build upon its success in advancing scientific discovery through computation. We are extremely happy to have her take on this role.” In 2006, Yelick was named one of 16 “People to Watch in 2006” by the newsletter HPCwire. The editors noted that “Her multi-faceted research goal is to develop techniques for obtaining high performance on a wide range of computational platforms, all while easing the programming effort required to achieve high performance. Her current work has shown that global address space languages like UPC and Titanium offer serious opportunities in both 36 National Energy Research Scientific Computing Center 2007 Annual Report

productivity and performance, and NERSC Gets High Marks in that these languages can be ubiq- uitous on parallel machines without Operational Assessment Review excessive investments in compiler technology.” In addition to high performance languages, Yelick has worked on parallel algorithms, numerical li- braries, computer architecture, communication libraries, and I/O systems. Her work on numerical li- braries includes self-tuning libraries which automatically adapt the code to machine properties. She is also a consumer of parallel systems, having worked directly with inter- disciplinary teams on application scaling, and her own applications work includes parallelization of a NERSC General Manager Bill Kramer computational fluid dynamics model for blood flow in the heart. NERSC received praises from cient,” said Bill Kramer, NERSC’s She is involved in a National Re- its first ever Operational Assess- General Manager. “We exceeded search Council study investigating ment Review by a committee of the metrics for the review.” the impact of the multicore revolu- scientists, national research facility The committee evaluated each tion across computing domains, leaders, and managers at the DOE supercomputing center’s business and was a co-author of a Berkeley Office of Science. plan, financial management, inno- study on this subject known as the The review, conducted on Au- vation, scientific achievements, and “Berkeley View.”1 Yelick speaks gust 28, 2007, fulfilled a require- customer satisfaction. The inaugu- extensively on her research, with ment by the Office of Management ral review also serves as a baseline over 15 invited talks and keynote and Budget to evaluate national re- for future assessments. speeches over the past three years. search facilities for capital planning Reviewers emphasized the im- “After working on projects purposes. NERSC, Oak Ridge Na- portance for each center to seek aimed at making HPC systems tional Laboratory, and the Molecu- feedback from its users. Each cen- easier to use, I’m looking forward lar Science Computing Facility, part ter conducts its own user survey; at to helping NERSC’s scientific users of the Environmental Molecular Sci- NERSC the survey participation make the most effective use of our ences Laboratory at the Pacific hovers around 10 percent of all au- resources,” Yelick said. “NERSC Northwest National Laboratory, un- thorized users. The survey yields a has a strong track record in provid- derwent reviews in 2007. Argonne valuable assessment of the software, ing critical computing support for a National Laboratory will be re- hardware, and service offerings at number of scientific breakthroughs viewed in 2008. NERSC. and building on those successes “The committee feels that we In fact, researchers gave NERSC makes this an exciting opportunity.” are doing a good job and that our high marks in the 2006 survey, in operations are mature and effi- which the respondents gave an av-

1 K. Asanovic et al., “The Landscape of Parallel Computing Research: A View from Berkeley,” University of California at Berkeley Technical Report No. UCB/EECS-2006-183, http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html. National Energy Research Scientific Computing Center 37 2007 Annual Report

erage score of 6.3 — on a scale of recognized American scientist, Ben- physics, fusion, climate change pre- 1 to 7 with 7 being “very satisfied” jamin Franklin, the Cray XT4 system diction, combustion, energy and biol- — for the question about their over- enables researchers to tackle the ogy. Franklin will enable researchers all satisfaction with using NERSC most challenging problems in sci- at Berkeley Lab to address such resources. ence by conducting more frequent problems as developing better The hard work by NERSC staff and increasingly detailed simula- models of the Earth’s climate and paid off when the committee gave tions and analyses of massive sets using it to predict the impact of car- the center kudos for its services of data. bon dioxide emissions and global and management. “With Franklin, we are increasing warming. The powerful system will “I would like to send along my the computational power available also allow researchers to explore thanks and congratulations to you to our 2,900 NERSC users by a fac- clean energy technologies and vali- and your team for such a fine job tor of six, providing them with ac- date theories that attempt to un- that you did on the Operational As- cess to one of the world’s fastest cover evidence that explains the sessment,” wrote Vince Dattoria, a supercomputers dedicated to open origin of the universe. review committee member from the scientific research,” said Michael “NERSC’s new Cray XT4 system Advanced Scientific Computing Re- Strayer, associate director of DOE’s has demonstrated that it can deliver search Program Office within the Office of Advanced Scientific Com- a high sustained performance on a Office of Science, in an email to puting Research, which funds demanding scientific workload in a Kramer. NERSC. “We have high expecta- rigorous production environment “The professionalism and orga- tions that NERSC’s proven track while at the same time permitting nizational maturity was evident in record of scientific productivity will users to explore scaling to 20,000 the presentations and responses to provide many new discoveries and cores,” said Horst Simon, Berkeley the impromptu questions from the understandings.” Lab’s Associate Laboratory Director reviewers. This is especially note- The highly scalable Cray XT4 for Computing Sciences. “NERSC worthy since this was a baseline re- system is capable of running appli- is proud to accept one of the view with no precedents to draw cations across a wide range of sci- largest ‘broad impact science’ su- from for guidance. Well done,” Dat- entific disciplines, including astro- percomputers in the world for its toria added.

Franklin Passes Rigorous Acceptance Test

On November 1, 2007, NERSC and Cray Inc. announced the suc- cessful completion of the accept- ance test of one of the world’s largest supercomputers. The pow- erful Cray XT4 system contains nearly 20,000 processor cores and has a top processing speed of more than 100 teraflops. This supercomputer is being used to advance a broad range of scientific research. Named “Franklin” Franklin was the first production XT4 to run the Cray Linux Environment (CLE), an ultra-light- in honor of the first internationally weight version of the standard Linux operating system. CLE is now standard on the XT4. 38 National Energy Research Scientific Computing Center 2007 Annual Report

demanding user community.” lightweight version of the standard launch in 2008. A joint U.S.-Euro- Franklin has a theoretical peak Linux operating system. CLE pean project, Planck will use 74 speed in excess of 100 teraflops makes the system easier to use, detectors to measure the cosmic (100 trillion floating point operations allowing users to more easily port microwave background, the resid- per second). The system contains their scientific applications from ual radiation from the actual Big 9,672 AMD dual-core Opteron 2.6 other architectures. During exten- Bang. Last scattered some 400,000 GHz processors with 39 terabytes sive testing, about 300 different years after the Big Bang, it provides of memory. In assessing proposed features and functions were tested the earliest possible image of the systems, the Cray XT4 scalable ar- and validated, making CLE more universe, including encoded signa- chitecture promised to deliver high reliable with the same or better per- tures of the fundamental parame- sustained performance, which is formance than previous XT operat- ters of all matter. critical to NERSC’s 24x7 operation ing systems. CLE is now standard “I am delighted to report that we to meet users’ supercomputing de- on the XT4. have just successfully created a map mands. As part of an extensive testing of the entire Planck Full Focal Plane “We are very excited to see one program, a number of NERSC one-year simulation,” Borrill said. of the largest supercomputers in users were given early access to “This is the first time that so many the world opened up to the expan- Franklin to ensure that the system data samples — three terabytes of sive user community at NERSC,” could handle the most demanding data in 50,000 files, representing all said Peter Ungaro, president and scientific applications. the information collected by Planck CEO of Cray. “The Cray XT4 system “I am extremely impressed with during one mission year — have will provide the computational Franklin,” said Robert Harkness, an been analyzed simultaneously, a power to enable researchers who astrophysicist at the San Diego Su- primary goal of our group’s early compute at NERSC to efficiently percomputer Center working on a Franklin efforts.” Running his code tackle some of the most important project aimed at precisely measur- on 16,384 processor cores, Borrill problems we face today. With high ing the cosmological parameters was able to complete the run in just sustained performance, scalability that describe the shape, matter- 45 minutes. and upgradeability to petaflops ca- energy contents, and expansion Franklin would not have passed pacity as its key attributes, the Cray history of the universe. Running an the acceptance test without the XT4 supercomputer will help enable application called ENZO, the proj- hard work put in by the NERSC major advances in a number of sci- ect seeks to increase our under- staff to troubleshoot hardware and entific fields now and in the future.” standing of the dark energy and software problems that are typical NERSC General Manager Bill dark matter thought to make up of getting a supercomputer to work Kramer said, “I want to sincerely more than nine-tenths of the uni- properly, especially for demanding, thank all the staff — NERSC and verse. “We have run the largest in- large-scale scientific research. 
Cray — who worked so hard to stances of ENZO ever, anywhere, Working closely with early users make Franklin the first production and found that the performance and using their feedback to resolve Cray Linux Environment system, and scaling on Franklin are both any issues was key to prepare and the world’s largest Cray XT4.” strong and better than other com- Franklin for the test. “One of the During negotiations to procure puter platforms we have used at biggest issues was to get Franklin’s the system, NERSC and Cray other computing centers.” reliability up to the levels that mapped out a plan to install the Another project, led by Julian NERSC users expect,” said Cray Linux Environment (CLE) on Borrill of Berkeley Lab, leveraged Jonathan Carter, head of the User each of Franklin’s 9,672 nodes. As Franklin’s computing power to pre- Services Group at NERSC. “Part of a result of this partnership, NERSC pare for analyzing the massive this was handled by soliciting early became the first center with a pro- amounts of data to be sent back to user feedback and reporting prob- duction XT4 running CLE, an ultra- Earth by the Planck satellite, set for lems to Cray quickly. Another part National Energy Research Scientific Computing Center 39 2007 Annual Report

was keeping the composite metrics (GFDL) proposed a set of experi- like job failure rates, SSP, and ESP ments using climate models with high on the list of things Cray resolutions many times higher than needed to pay attention to.” those in the standard models, such Helen He, a member of the User as those used by the Intergovern- Services Group, added: “The early mental Panel on Climate Change users got a lot of useful work done. (IPCC). The high-resolution models Many were able to run high-concur- offer not only a closer look at physi- rency jobs to tackle much larger cal elements of the climate, such as problems and model resolutions tropical storms, but they also en- that were impossible before.” able researchers to conduct a more in-depth analysis of climate change Prabhat as higher-resolution phenomena in Early Results Demonstrate the ocean and atmosphere are re- Franklin’s Capabilities solved. For years, scientists worldwide Two teams of researchers push- have relied on simulations with res- ing the limits of climate and weather olutions in the 100-kilometer range simulations achieved noteworthy for studying forces that shape the results running their codes on oceans and the atmosphere. But Franklin. The climate scientists suc- the resolution is not high enough to cessfully ran experimental simula- model details such as ocean vor- tions with resolutions much higher tices and clouds, phenomena that than in widely used codes. And an- are critical for understanding re- other team of researchers set a gional climate variations. Developing Richard Gerber speed performance record for a a climate model is a computationally U.S. weather model, running on intensive task, and getting enough animations to illustrate the results. 12,090 of Franklin’s processors. time on powerful supercomputers Prabhat from the Analytics Team The accomplishments of these two has always been a challenge. created a series of visual renderings projects are described below. GFDL scientists, located in of data that included sea surface Princeton, New Jersey, had devel- temperatures, salinity, and clouds oped models capable of modeling and precipitation in different parts Climate Models Produce the global atmosphere at resolu- of the world. Finest Details tions down to 5 km, and the ocean “We are able to increase our A team of climate researchers at resolutions between 10 km and models’ resolutions because of our who obtained early access to 20 km. They also have designed ex- access to the NERSC machine,” said Franklin said the powerful system periments which generated 1 to 4 Venkatramani Balaji, head of the produced simulations that offered TB of data for every year of simula- Modeling Services Group at GFDL. high-resolution details of oceanic tion. NERSC provided GFDL with “One of the results is we can see and atmospheric phenomena, re- the computational resources for this category 4 or 5 hurricanes in a 20-km sults that were difficult to obtain challenge by setting aside over model, and they are what we would from other supercomputers before. 800,000 CPU hours on Franklin. 
expect to see in the real world.” At the DOE’s behest, scientists In addition to carrying out suc- Senior software developer from the National Oceanic and At- cessful runs on Franklin, GFDL also Christopher Kerr at GFDL and other mospheric Administration’s Geo- received strong support from members of Balaji’s team were re- physical Fluid Dynamics Laboratory NERSC’s Analytics Team in using sponsible for enabling the software 40 National Energy Research Scientific Computing Center 2007 Annual Report

AB

Visualizations of high-resolution climate datasets: (a) sea surface temperatures for the North Atlantic Gulf Stream; (b) salinity of the Southern Ocean; (c) hurricanes forming in the Atlantic Ocean. (Datasets provided by Chris Kerr, NOAA/GFDL; visualizations by Prabhat, NERSC/LBNL.)

C

infrastructure to perform these sci- to discovery by developing exten- in order to create visualizations that entific experiments. Richard Gerber, sions to the software that elimi- focus on the most interesting and a NERSC consultant, resolved sys- nated costly data format conversion significant phenomena (e.g., the tem-related issues so that the ex- barriers, namely extra computation, formation of tropical storms and periments could be performed on extra data storage, and more man- ocean eddies). The visualizations Franklin. ual processing steps. As a result, it present the simulation data in an “The results of the visualization was possible to load the simulation accessible format using conventions collaboration have been outstand- output generated on Franklin di- familiar to the climate modeling ing,” Balaji added. rectly into VisIt for visual data ex- community. The Analytics Team has The NERSC Analytics Team ploration and analysis. also installed VisIt at GFDL to allow used the VisIt visualization and Developing the visualizations scientists to use the new visualiza- analysis package to create images was a collaborative effort. The Ana- tion capability on a day-to-day basis, and movies for climate scientists at lytics Team had extensive interac- creating large-scale visualization of GFDL. The team accelerated time tions with GFDL staff and scientists climate data that were impossible National Energy Research Scientific Computing Center 41 2007 Annual Report

with conventional visualization tools. ics on a sphere. The researchers military, and commercial forecasters The climate modeling project, have done test runs using the as well as for highly detailed weather called the Coupled High-Resolution cubed sphere and found that the and climate research in hundreds of Modeling of the Earth System highly scalable methodology would universities and institutions worldwide. (CHiMES), began as a collaboration enable them to carry out simula- Scientists from the National between NOAA/GFDL and DOE. tions with a 5-km resolution. Center for Atmospheric Research The research uses comprehensive “We can go a lot further on this (NCAR), the San Diego Supercom- Earth system models and historical model,” Balaji said. “If the coupled puter Center (SDSC) at UC San data to examine how climate has model simulations done at NERSC Diego, Lawrence Livermore National changed over time and what exter- represent today’s leading edge, this Laboratory (LLNL), and IBM Watson nal forces will likely influence the model is already showing what will Research Center made up the re- climate in the future. The models be possible when the next generation search team that carried out the are based on the Flexible Modeling of hardware becomes available.” weather simulations on several high System (FMS) developed by GFDL. performance computers, setting na- FMS is a powerful computational Weather Forecast Model Sets tional records not only in perform- infrastructure for constructing cou- Speed Record ance but also in size and fidelity of pled climate models on high-end A team of researchers set a per- computer weather simulations. scalable computer architecture. formance record by running the The team’s efforts open the way The CHiMES project seeks to Weather Research and Forecast for simulations of greatly enhanced understand how the overall climate (WRF) code on 12,090 of Franklin’s resolution and size, which will serve responds to high-resolution phe- processors at a speed of 8.8 tera- as a key benchmark for improving nomena such as ocean eddies, as flop/s — the fastest performance of both operational forecasts and well as how fine-scale events such a weather or climate-related appli- basic understanding of weather and as tropical storms respond to climate cation on a U.S. supercomputer. climate prediction. change. To answer these questions, WRF is widely used for continuous The scientific value of the re- the project has been divided into weather forecasting by government, search goes hand-in-hand with the two parts: one is to study the cli- mate’s predictability over decades or longer using high-resolution cou- pled models; the second part is to study the correlations between tropical storms and climate change, a hot topic in the research world. Work by GFDL researchers on this subject appeared in over 35 publi- cations in scientific peer-reviewed journals last year. For the hurricane research, CHiMES uses an atmospheric model based on the cubed sphere grid developed by lead scientist S. J. Lin at GFDL. This projection of a grid over the surface of the Earth is a more scalable basis than latitudes A new speed record for weather and climate codes — 8.8 teraflop/s — was set when the and longitudes for solving the equa- Weather Research and Forecast (WRF) model was run on 12,090 processors of Franklin. WRF tions of computational fluid dynam- is widely used for routine weather forecasting as well as research. 42 National Energy Research Scientific Computing Center 2007 Annual Report

computational achievements. The collaborating with SDSC computer for high performance computing, “non-hydrostatic” WRF weather scientists to introduce efficiencies networking, storage, and analysis, code is designed for greater realism into the code, we were able to scale where it was a finalist in the presti- by including more of the physics of the model to run in parallel on more gious Gordon Bell Prize competition weather and capturing much finer than 15,000 processors, which has- in high performance computing. detail than simpler models tradition- n’t been done with this size prob- In preparing for the ground- ally used for global scale weather lem before, achieving a sustained breaking runs on the Stony Brook- prediction. Running this realistic 3.4 teraflop/s.” Brookhaven and NERSC systems, model using an unprecedented Added John Michalakes, lead the extensive problem solving re- number of computer processors architect of the WRF code, “To solve quired to achieve these results was and simulation size enabled re- a problem of this size, we also had made possible by running the WRF searchers to capture key features of to work through issues of parallel code on the Blue Gene system at the atmosphere never before repre- input and output of the enormous DOE’s Lawrence Livermore National sented in simulations covering such amount of data required to produce Laboratory, the fastest supercom- a large part of the Earth’s atmos- a scientifically meaningful result. puter on the Top500 list, and the phere. This is an important step to- The input data to initialize the run large Blue Gene system at the IBM wards understanding weather was more than 200 gigabytes, and Watson Research Center. predictability at high resolution. the code generates 40 gigabytes Tuning and testing were also “The scientific challenge we’re each time it writes output data.” carried out at the National Center addressing is the question in nu- With this power the researchers for Computational Sciences at Oak merical weather prediction of how were able to create “virtual weather” Ridge National laboratory and on to take advantage of coming peta- on a detailed 5-kilometer horizontal SDSC’s Blue Gene system, a re- scale computing power,” said grid covering one hemisphere of the source in the National Science weather scientist Josh Hacker of globe, with 100 vertical levels, for a Foundation-supported TeraGrid, an NCAR. “There are surprisingly com- total of some two billion cells — 32 open scientific discovery infrastruc- plex questions about how to har- times larger and requiring 80 times ture combining leadership class re- ness the higher resolution offered more computational power than sources at nine partner sites. In by petascale systems to best im- previous simulation models using these ongoing collaborations, the prove the final quality of weather the WRF code. team anticipates further record- predictions.” Petascale computing “The calculation, which is lim- setting results. refers to next generation supercom- ited by memory bandwidth and in- Team members include John puters able to compute at a peta- terprocessor communication, is Michalakes, Josh Hacker, and Rich flop/s (1015 calculations per representative of many other scien- Loft of NCAR; Michael McCracken, second), equivalent to around tific computations,” said Allan Allan Snavely, and Nick Wright of 200,000 typical laptops. 
Snavely, director of the Perform- SDSC; Tom Spelce and Brent The team also set a record for ance Modeling and Characteriza- Gorda of Lawrence Livermore; and parallelism, running on 15,360 tion (PMaC) lab at SDSC, whose Robert Walkup of IBM. processors of the 103 peak tera- group helped tune the model to run Story courtesy of SDSC. flop/s IBM Blue Gene/L supercom- at these unprecedented scales. puter at Brookhaven National “This means that what we learn in Laboratory, jointly operated by these large simulations will not only Large Scale Reimburse- Brookhaven and Stony Brook Uni- improve weather forecasts, but help ment Program Improves versity. a number of other applications as Codes’ Performance “We ran this important weather they enter the petascale realm.” model at unprecedented computa- The work was presented at Before Franklin’s acceptance tional scale,” added Hacker. “By SC07, the international conference testing was completed, NERSC’s National Energy Research Scientific Computing Center 43 2007 Annual Report

large scale reimbursement program required researchers to run 1024- to we had previously, since non-zero provided around nine million com- 1500-processor jobs on Seaborg, exit codes force IPM to throw away puting hours on Seaborg, NERSC’s depending on whether they have all output. But the NERSC staff was IBM SP RS/6000 system, to 21 participated in the program in the helpful and understanding of IPM projects that took advantage of the past. Each project could get a max- issues, never letting missing IPM opportunity to scale their runs and imum reimbursement of 500,000 output interfere with reimburse- improve their codes in preparation hours. Scientists also had to use ment. Where available, the IPM out- for using Franklin when it entered the Integrated Performance Moni- put supplied interesting profiling production. toring (IPM) software to gather per- information,” said Carlo Graziani, a The incentive attracted projects formance information about each run. researcher on Lamb’s team. from a variety of scientific disci- “The quantum Monte Carlo Other scientists whose projects plines, including astrophysics, life methods developed in the Lester were among the top ten recipients sciences, fusion, chemistry, and cli- group are naturally amenable to of reimbursed hours were George mate research. Scientists said the parallel computing,” said Brian Vahala (fusion plasmas), Doug program enabled them to pinpoint Austin, a researcher in a group led Toussaint (quantum chromodynam- and resolve issues before running by William A. Lester, Jr., a UC ics), Stephen Gray (nanoscale elec- jobs on Franklin. Berkeley chemistry professor and trodynamics), Cameron Geddes “We have participated in the Berkeley Lab researcher. “Histori- (accelerator physics), Wei-li Lee (fu- scaling reimbursement program in cally, our production jobs have run sion plasmas), and Paola Cessi (cli- order to tackle the problem of testing on several hundred processors with mate research). the predictive power of new empiri- near perfect parallel efficiency. The cal force fields for biomolecular sim- advent of near-petascale computers ulation,” said Nicolas Lux Fawzi, a such as Franklin will bring jobs with Seaborg Is Retired after UC Berkeley scientist on a research thousands of processors into the Seven Years of Service team led by Teresa Head-Gordon at norm. In this regime, subtle changes Berkeley Lab. “Our system of interest to our mode of parallel communica- With Franklin up and running, is the Aβ peptide and various sub- tion have dramatic effects that were Seaborg, the IBM supercomputer peptides which are associated in unnoticeable at previous scales: that has tackled some of the most the formation of amyloid plaques in communication time increased from challenging problems in astro- Alzheimer’s disease. We have used 2 percent to almost 50 percent as physics, climate research, fusion the CPU time in the program to the number of processors increased energy, chemistry and other scien- demonstrate that we can run large from 512 to 2048. The reimburse- tific areas, was retired in January parallel simulations on 1024 proces- ment program has been essential to 2008 after seven years of serving sors using the replica exchange our exploration and resolution of NERSC users. technique to generate the complete these issues.” Since it was opened for produc- equilibrated ensemble for our sys- Don Lamb, a University of tion use in August 2001, Seaborg tem at a range of temperatures. 
Chicago scientist who leads a su- has provided a little over 250 million “We have very much enjoyed pernovae research project (see page CPU hours to some 3,000 scientific the chance to work with NERSC as 28, also was an active participant in users. Included in these numbers part of the scaling program,” Fawzi the reimbursement program. Lamb’s are 26.5 million CPU hours for 22 added. “We’ve received some ex- research team members said the projects from the Innovative and cellent input from the NERSC staff IPM was not easy to use initially, Novel Computational Impact on on how to evaluate and improve the but they overcame those issues with Theory and Experiment (INCITE) scaling of the code.” the support of the NERSC staff. program, which was created by the The reimbursement program “We had to pay more attention DOE Office of Science to support was open to all NERSC users and to exit codes issued by FLASH than large-scale, high-impact projects. 44 National Energy Research Scientific Computing Center 2007 Annual Report

COMPUTING ON SEABORG

Total CPU CPU Hours Used CPU Hours Used by No. of INCITE No. of Total No. of Active Hours² by Scientists INCITE Projects Projects Active Users³ Science Users 20071 32,652,500 32,537,200 4,283,200 7 1,048 980

2006 51,904,400 51,554,100 3,992,500 3 1,991 1,868 2005 50,180,000 49,809,100 7,281,800 3 1,903 1,749 2004 56,821,900 56,086,500 6,670,800 3 1,458 1,354 2003 39,913,300 38,088,300 2,502,400 3 1,224 1,115 2002 22,649,400 22,279,200 1,785,700 3 897 831 Total 256,696,400 252,787,500 26,516,300 22 3,165 2,996

1 Data reflect usage through mid-September, 2007

2 Total CPU hours includes time used by researchers, NERSC staff and HPC vendors.

3 Active users include researchers, NERSC staff and HPC vendors.

The number of scientific papers percomputer was installed in Janu- puter for large-scale projects from resulting from computations on ary 2001 and was ranked No. 2 on the SciDAC and INCITE programs. Seaborg is estimated to be over the June 2001 TOP500 list, a semi- “From the day it went online, 7000. annual ranking of the world’s most Seaborg has been the scientific At the time of its purchase, the powerful supercomputers. Seaborg workhorse for many of the most im- IBM SP RS/6000 represented the ended its career at No. 331 on the portant computational science proj- largest single procurement in November 2007 list. ects undertaken within the Office of Berkeley Lab’s history, costing Seaborg underwent a significant Science,” said Michael Strayer, As- $33 million for the system and a upgrade in 2003, increasing the num- sociate Director of Advanced Scien- five-year contract. Named after the ber of its processors from the initial tific Computing Research within the Nobel prize winner and Berkeley 2,528 to 6,656. The upgrade, costing Office of Science. “Now the system Lab chemist Glenn Seaborg, the su- $30 million, readied the supercom- is ready to retire at a ripe old age of

In seven years of service, Seaborg provided over 250 million CPU hours to 3,000 scientific users. Its TOP500 ranking went from No. 2 in June 2001 to No. 331 in November 2007. National Energy Research Scientific Computing Center 45 2007 Annual Report

seven, many researchers will re- 300 MB/sec, and aggregate I/O analyze a variety of metrics, such as member Seaborg with appreciation.” reaches 750 MB/sec. hardware and software failures and Seaborg was shut down for the • Security of the data archive is tape drive utilization, and apply the last time on January 11, 2008, and maintained by NERSC’s access lessons learned to improve the pro- was disassembled and removed for restrictions as well as ongoing duction storage systems in ways that recycling within a week. monitoring, logging, and audit- directly affect end-user experience. ing. In addition, Bill Kramer, Jason Hick, Two things that make storage at and Akbar Mokhtarani are collabo- NERSC’s Mass Storage NERSC unique are applied storage rators in the SciDAC Petascale Data Stays Ahead of the Curve research and operational efficiency. Storage Institute (PDSI). NERSC’s NERSC’s Mass Storage Group focus in PDSI is on gathering and Advances in storage technology actively researches storage reliability reporting data on storage and file may not be as dramatic as advances and performance. They collect and system reliability, and working on in high performance computers — there is no TOP500 list for data HPSS Capacity Media/Drive Planning 50000 5 centers — but they are just as es- 45000 sential to the success of a scientific 40000 4 computing center. NERSC’s Mass 35000 Storage Group stays ahead of the 30000 3 curve by focusing on reliability, 25000 $M Terabytes scalability, availability, performance, 20000 2 15000 and security: 10000 1 • Reliability is the most important 5000 goal, and NERSC has a track 0 0 record of successful data preservation going back Oct 98Apr 99Oct 99Apr 00Oct 00Apr 01Oct 01Apr 02Oct 02Apr 03Oct 03Apr 04Oct 04Apr 05Oct 05Apr 06Oct 06Apr 07Oct 07Apr 08Oct 08Apr 09Oct 09Apr 10Oct 10Apr 11Oct 11 Month decades. • Scalability involves careful plan- Data stored Adjusted max capacity Theoretical max capacity ning and continuous upgrading Cost of Media (20 GB tape) Cost of Media (200 GB tape) Cost of Media (500 GB tape) of storage systems to anticipate This planning tool demonstrates actual and future storage growth, cost, and planned technol- the output of high-end comput- ogy upgrades. The light blue curve represents data retained in the storage system; the red line ers and data-intensive experi- represents theoretical maximum capacity of the storage system; the dark blue curve repre- ments. NERSC’s cumulative sents theoretical capacity adjusted by the media density of tapes already in the storage sys- tem. The green curves represent the cost of placing all the current data, represented by the data storage has grown expo- light blue curve, on a given capacity of media; for example, it would cost about $1 million to nentially and crossed into the store 5000 TB of data in April 2008 entirely on T10000A media (500 GB tape). The green petascale in 2005. curves demonstrate that continual upgrades in tape media capacity are essential to keeping • Availability and performance are costs of the storage system under control in the face of continual increased storage demand. crucial metrics for user produc- Another important consideration is the number of tape silos required to retain a given amount tivity. NERSC’s mass storage of data. Introducing new silos into an existing storage system is non-trivial. The primary indi- cator of needing a new silo is the number of tape media slots required. 
As the adjusted max systems achieved an overall capacity, the dark blue curve, approaches the amount of data stored, the light blue curve, the availability rate of 98.4% for FY more likely a new silo is required to handle storage growth with existing numbers and types of 2007. Single-client I/O can top media in the system. 46 National Energy Research Scientific Computing Center 2007 Annual Report

former zinc mine near Kamioka in Storage Technology and Capacity Upgrades western Japan. With almost 1400 Technology Year Data Cartridge Total Storage Total Storage citations, the first report on neutrino Deployed Rate Capacity Capacity Capacity (uncompressed) (compressed) oscillations from KamLAND is now one of the top-cited publications in T9840A 1999 10 MB/s 20 GB 0.88 PB 1.3 PB nuclear and particle physics.2 This T9940A 2002 30 MB/s 60 GB 2.60 PB 3.9 PB study ruled out all but one solution to the “solar neutrino problem,” T9940B 2003 30 MB/s 200 GB 8.80 PB 13.2 PB while advancing the art of neutrino T10000A 2007 120 MB/s 500 GB 22.00 PB 33.0 PB detection from surveys to precision measurements. KamLAND is also I/O benchmarking and characteriza- compressed data. As a result, the only experiment in the world tion of selected petascale applica- NERSC’s 44,000-tape cartridge ca- that can detect geologically pro- tions. And as a founding member of pacity is now 22 PB uncompressed duced neutrinos, which provide the High Performance Storage Sys- and 33 PB with compression. (The valuable information about heat tem (HPSS) collaboration, NERSC table above shows storage capacity generated by radioactive decay in continues to contribute to ongoing improvements over the past the Earth’s interior. HPSS development. decade.) This year NERSC also im- The experiment is now entering The operational efficiency of plemented a 25% bandwidth im- a new phase: the real-time detection mass storage at NERSC is evident provement in the disk array. of 7Be solar neutrinos, which are from the perspective of the past emitted in the solar fusion processes. decade. Total data grew from a few “The 7Be neutrino flux is currently terabytes in 1998 to several peta- Improving Access the least constrained quantity in the bytes in 2007 using multiple stor- for Users Standard Solar Model, the model age technologies. Single transfer that describes the processes in the bandwidth improved from a peak of Providing users with easy access Sun,” said Stuart Freedman, PI of 1 MB/sec in 1998 to nearly 500 to large-scale data, computing, and the “Data Analysis and Simulations MB/sec in 2007. This growth was storage resources is one of NERSC’s for KamLAND” project at NERSC. achieved using the same physical key goals. Accomplishments in “A 5% 7Be neutrino measure- footprint in the machine room and 2007 included the establishment of ment is essential in order to better with an essentially flat storage a high-bandwidth network connec- understand the fusion processes in budget. This high efficiency requires tion between the KamLAND experi- the Sun and to see if the Sun is in a careful planning and nearly continu- ment in Japan and NERSC’s PDSF steady state,” Freedman explained. ous phased technology upgrades. and HPSS systems, and making “The neutrinos emerging from the The major technology upgrade most of NERSC’s resources avail- Sun were produced just seconds for 2007 was the installation of new able on the Open Science Grid. earlier in the Sun’s interior, while the Titanium 10000A tape drives, which photons that we see reflect the state provide two and a half times more Overcoming Obstacles to Con- of the Sun around 40,000 years capacity and four times the perform- nect with KamLAND ago. The eventual comparison of ance of the previous tape drives. 
KamLAND (Kamioka Liquid the neutrino-inferred luminosity to Each cartridge can hold 500 GB of Scintillator Anti-Neutrino Detector) the presently visible photon lumi- uncompressed data or 750 GB of is a 1000-ton detector located in a nosity will allow us to determine

2 K. Eguchi et al. (KamLAND Collaboration), “First Results from KamLAND: Evidence for Reactor Antineutrino Disappearance,” Phys. Rev. Lett. 90, 021802 (2003). National Energy Research Scientific Computing Center 47 2007 Annual Report

Damian Hazen

KamLAND is the largest low-energy anti-neutrino detector ever built. Harvard Holmes how steady the energy production the most efficient method, but the mechanisms in the Sun are.” only one possible at the time, given The KamLAND project is the KamLAND’s limited network band- second-biggest user of storage re- width. But when the experimental site sources at NERSC, with 420 TB of was upgraded from a 1 Mbps to a data accumulated over the past five 1 Gbps network connection, Damian years, and is one of the largest Hazen, Harvard Holmes, and Wayne users of NERSC’s PDSF cluster. Hurlbert of the NERSC Mass Storage NERSC is the sole U.S. site for Group undertook the formidable computing in the KamLAND collab- task of overcoming the remaining oration. The KamLAND detector is barriers: time zone challenges, lan- currently recording experimental guage barriers (the network providers Wayne Hurlbert data at a rate of 250 GB per day, only speak Japanese), security barri- 365 days per year (~80 TB a year), ers, and performance tuning. The line a number of data monitoring but by mid-2008 that will increase result is that regular, reliable, and ef- tasks and has been very beneficial to 400 GB per day to accommodate ficient data transfers are now occur- for remote access to the experi- the 7Be neutrino measurements. ring between Kamioka and NERSC. ment,” Freedman said. “We are Until recently, data transfer from “The high bandwidth network currently also using this connection KamLAND to NERSC was accom- connection between KamLAND and to copy all the experimental data plished by shipping tapes — hardly NERSC has allowed us to stream- to the mass storage system at 48 National Energy Research Scientific Computing Center 2007 Annual Report

ture for scientific research. The Software Innovations OSG software stack provides dis- for Science tributed mechanisms for moving data and submitting large parallel NERSC’s focus on user produc- jobs to computing resources on the tivity sometimes leads to the devel- OSG grid at universities, national opment of new scientific software labs, and computing centers in tools. Two recent examples are North and South America, Europe, Sunfall, a collaborative visual ana- and Asia. Researchers from many lytics system that eliminated 90% fields, including astrophysics, bioin- of the human labor from a super- formatics, computer science, med- nova search and follow-up work- Shreyas Cholia ical imaging, nanotechnology, and flow; and Integrated Performance physics use the OSG infrastructure Monitoring (IPM), which has been to advance their research. used to analyze application per- The OSG will save valuable time formance at NERSC for several and reduce headaches for scien- years and has just been funded for tists who carry out their research at deployment at all NSF supercom- several computing facilities. Instead puting centers. of dealing with different authentica- tion processes and software at Sunfall: A Collaborative Visual each site, the scientists can go Analytics System through the OSG to manage their Computational and experimen- computing jobs, data, and workflow tal sciences are producing and across various sites using a uniform collecting ever larger and more complex datasets, often in multi- Jeff Porter interface. Making NERSC supercomputers institution projects. Managing, navi- available over the OSG has been gating, and gaining scientific insight a priority over the past year for from such data can be challenging, NERSC. We can achieve a maxi- Shreyas Cholia and Jeff Porter of particularly when using software mum performance of about 8 MB/s, the Open Software and Program- tools that were not specifically de- which is more than adequate.” ming Group. Bill Kramer, NERSC’s signed for collaborative scientific General Manager, chairs the OSG applications. To address this prob- Syncing Up with the Open Council. lem for observational astrophysics, Science Grid Connecting NERSC systems to Cecilia Aragon of the NERSC Ana- More NERSC users can now the OSG requires bridging different lytics Team and Berkeley Lab Visu- launch and manage their work at pieces of local and distributed soft- alization Group led a group of multiple computing sites by going ware and performing validation computer scientists and astrophysi- through the Open Science Grid tests to make sure everything runs cists to develop Sunfall, a collabo- (OSG). NERSC’s Bassi, Jacquard, without any glitches. As new tools rative visual analytics system for DaVinci, PDSF, and HPSS systems are added, NERSC solicits feed- supernova discovery and data ex- are all available on the OSG, and back from selected users who run ploration. Franklin will be joining them soon. the tools over a testbed. NERSC Astrophysics lends itself to a vi- The Open Science Grid, funded has set aside computing time sual analytics approach because by the DOE Office of Science and specifically for projects carried out much astronomical data, including the National Science Foundation, is over the OSG, as part of an effort to images and spectra, is inherently vi- a distributed computing infrastruc- attract new research and users. sual. Sunfall (Supernova Factory National Energy Research Scientific Computing Center 49 2007 Annual Report

scientists now have new data ex- days a week for a period of seven ploration and analysis capabilities months, making the operations very that had previously been too time- dynamic, and there is no way to consuming to attempt. take a break to catch up. Before the The SNfactory is an interna- Sunfall software was developed to tional collaboration among Berkeley integrate this process, it could be Lab, Yale University, and three re- an overwhelming job simply to per- search centers in France: Centre de form the necessary work, but in ad- Recherche Astrophysique de Lyon, dition it also was difficult to keep Institut de Physique Nucléaire de track of whether what had been done Lyon, and Laboratoire de Physique was sufficient, much less optimal.”

Cecilia Aragon Nucléaire et de Hautes Energies. Processing and analyzing these SNfactory’s mission is to create a data became a great challenge for Assembly Line) was developed for large database of Type Ia super- the researchers, whose responsibili- the Nearby Supernova Factory novae, which are known for their ties require them to sift through vast (SNfactory),3 an international astro- extraordinary and uniform bright- amounts of image data to search physics experiment and the largest ness and their role in breakthrough for Type Ia supernovae, and then data volume supernova search cur- research showing that the uni- follow up with spectral and photo- rently in operation. Sunfall is the verse’s expansion is accelerating. metric observations of each super- first visual analytics system in pro- The data will help researchers to nova. Aragon and fellow Sunfall duction use at a major astrophysics understand the mysterious dark en- researchers created the system by project, and it won the Best Poster ergy that propels the expansion. modifying existing software and de- Award at the 2007 IEEE Symposium The first goal in designing Sun- veloping new algorithms. on Visual Analytics Science and fall was to tackle the growing Sunfall provides the software Technology (VAST).4 amount of wide-field image data tools for extracting the Type Ia su- Sunfall utilizes novel interactive from the Palomar Observatory in pernovae from the raw images, visualization and analysis tech- San Diego. SNfactory receives 50 using statistical algorithms that re- niques to facilitate deeper scientific to 80 GB of image data each night, duced the number of false-positive insight into complex, noisy, high- and its researchers must process supernovae candidates by a factor dimensional, high-volume, time- and examine these data within 12 of 10. Instead of reviewing 1000 critical data. The system combines to 24 hours in order to get the most selected images each day, the re- novel image processing algorithms, data from these rare stellar explo- searchers now only have to examine statistical analysis, and machine sions. These supernovae only occur 100 images. learning with highly interactive vi- a few times per millennium in a typi- The software provides a visual sual interfaces to enable collabora- cal galaxy and remain bright enough display of three-dimensional astro- tive, user-driven scientific exploration for detection only for a few weeks. nomical data in an easy-to-read, of supernova image and spectral “To maximize the scientific re- two-dimensional format. Another data. The development of Sunfall turn, this has to be a very accurate, visual display offers the signal led to a 90% labor savings in areas efficient, and traceable process,” strength and other information of the SNfactory supernova search said Greg Aldering, leader of the about each spectrum, making it and follow-up workflow; and project SNfactory. “The data come in seven easy for researchers to analyze the

3 http://snfactory.lbl.gov/ 4 C. Aragon, S. Bailey, S. Poon, K. Runge, and R. Thomas, “Sunfall: A Collaborative Visual Analytics System for Astrophysics,” IEEE Visual Analytics Sci- ence and Technology Conference, Sacramento, CA, Oct. 30–Nov. 1, 2007; http://vis.lbl.gov/Publications/2007/Sunfall_VAST07.pdf (abstract), http://vis.lbl.gov/Publications/2007/Sunfall_VAST07_poster.pdf (poster). 50 National Energy Research Scientific Computing Center 2007 Annual Report

Sunfall’s SNwarehouse Data Taking window. The observer can follow the targets on the sky visualization; take notes on the success or failure of each observation, telescope status, and weather conditions; and reschedule targets if necessary.

information. Sunfall also closely mechanism is critical given that analysis of our follow-up spec- monitors and detects any problems SNfactory researchers are located troscopy,” Aldering said. that crop up while NERSC super- in different countries and time Developing Sunfall was no easy computers process the data — zones. By using Sunfall’s Super- task. Aragon and other members of there are visual displays of job nova Warehouse (SNwarehouse) the interdisciplinary team met fre- queues, completion times, and tool, scientists can easily access, quently, often daily, to discuss various other information. modify, annotate, and schedule fol- proposed designs and implementa- A Sunfall package called Data low-up observations of the data. tions, report progress, and ask for Forklift automates data transfers “The new capabilities of feedback on the developing system. among different types of systems, SNwarehouse produced an imme- Ideas for technical solutions often databases, and formats. Having a diate transformation, allowing us to came during regular weekly science reliable and secure data transfer shift our focus onto the science meetings in which SNfactory re- National Energy Research Scientific Computing Center 51 2007 Annual Report

searchers discussed their work. Since then, the software has won In addition to Aragon, other fans beyond NERSC, including the Sunfall project members included San Diego Supercomputer Center, Stephen J. Bailey, formerly in the Center for Computation and Berkeley Lab’s Physics Division, Technology at Louisiana State Uni- Sarah Poon and Karl Runge of the versity, the Swiss National Super- Space Sciences Lab at UC Berke- computing Center, and the Army ley, and Rollin Thomas, who worked Research Laboratory. for SNfactory and recently joined IPM overcomes shortcomings Berkeley Lab’s Computational Cos- exhibited by other performance mology Center. analysis software, Skinner said. For

“This project showed that an example, IPM has low overhead David Skinner interdisciplinary team can have suc- and requires no source code modi- cess solving high-data-volume sci- fications, making it easy for re- profiling activities; others are more ence needs,” Aragon said. “Now we searchers to use. Its fixed memory passive, operating in the background. have experience in solving a practi- footprint also ensures that running Some scale to thousands of tasks cal problem that can be applied in the software will not negatively im- and some do not,” said Skinner. other scientific domains.” pact the applications being profiled. The comparisons enabled the “An understandable application researchers to convince NSF to de- IPM to Be Deployed at NSF performance profile is something ploy IPM in all of its supercomputer Supercomputer Centers that all researchers using parallel centers. NSF has awarded $1.58 The National Science Foundation computing resources should expect million for the project, which is sched- (NSF) in 2007 approved a proposal in situ via a simple flip of a switch. It uled to take place over three years. that will deploy a nimble perform- should not require additional effort of Part of the project will focus on ance evaluation tool, developed by changing their code,” Skinner said. expanding IPM’s capabilities, such NERSC’s David Skinner, on all The NSF proposal reviewed as broadening the scope of what is major NSF supercomputers. real-life cases of DOE and NSF su- profiled and improving data analy- The software, Integrated Per- percomputer centers using various sis. The software is available under formance Monitoring (IPM),5 ana- performance monitoring tools. The an open source software license lyzes the performance of HPC principal investigators for the proj- and can run on major supercom- applications and identifies load bal- ect are San Diego Supercomputer puter architectures today: IBM, Linux ance and communication problems Center staff members Allan Snavely clusters, Altix, Cray X1, Cray XT4, that prevent them from running and Nick Wright and NERSC Direc- NEC SX6, and the Earth Simulator. smoothly and achieving high per- tor Kathy Yelick. formance. IPM is easy to deploy “Some means of doing perform- and use in systems with thousands ance analyses are quite invasive and International Leadership or tens of thousands of processors, disturb the application one is trying and Partnerships making it a good tool for petascale to study; others are more lightweight computing. but don’t provide adequate informa- Every year, visitors from scien- Skinner, leader of NERSC’s tion to researchers to improve the ir tific computing centers around the Open Software and Programming codes. Some require all users of a world visit NERSC to exchange Group, developed IPM in 2005. system to actively participate in the ideas and see firsthand how NERSC

5 5 http://ipm-hpc.sourceforge.net/ 52 National Energy Research Scientific Computing Center 2007 Annual Report

ties. And one exchange of visits re- sulted in an ongoing staff exchange program. In May 2007, NERSC hosted four visitors from CSCS (Swiss Na- tional Supercomputing Centre) for a series of discussions about sys- tems and facilities. Howard Walter, head of NERSC’s Systems Depart- ment, paid a return visit to CSCS in Manno, Switzerland in January 2008, sharing NERSC’s expertise in designing and building energy- efficient computing facilities. And on that occasion CSCS and NERSC Horst Simon (left), head of Computing Sciences at Berkeley Lab, gave a tour of the NERSC fa- signed a memorandum of under- cility to a delegation from RIKEN, Japan’s premier science and technology research institution, standing for a staff exchange pro- in August 2007. gram between the two centers. The agreement gives more formal structure to already existing ties be- tween the two centers. Berkeley Lab Associate Director for Computing Sciences Horst Simon is a member of the CSCS advisory board. Both centers also share a common tech- nological focus, having selected Cray XT supercomputers as their primary systems after thorough re- views of various systems. “While many of us at NERSC are in frequent contact with our col- leagues at other supercomputing centers in the U.S., we see this agreement as a means to broaden NERSC General Manager Bill Kramer (left) led a tour of NERSC’s computer room for visitors our outreach and perspectives,” from the Swiss National Supercomputing Centre, including COO Dominik Ulmer (center). said NERSC Director Kathy Yelick. operates. For example, Thomas machine room at the end of his “Our informal discussions have al- Lippert, director of the Central Insti- visit, he said, “NERSC is a model of ready yielded valuable insights. tute for Applied Mathematics at Re- how high-performance computing With a more formalized structure, search Center Jülich in Germany should be done.” we expect these exchanges to be and head of the John von Neumann Other visitors during the year even more productive.” Institute for Computing, spent a day came from Germany, Japan, Korea, The two centers also play simi- with NERSC staff on March 6, 2007 Saudi Arabia, Sweden, and Switzer- lar roles in their national research to discuss performance and bench- land. One of the most frequent top- communities: CSCS is the largest marking of scientific computing ics of conversion was energy-efficient supercomputing center in Switzer- systems. After touring NERSC’s computer architectures and facili- land and is managed by the Swiss National Energy Research Scientific Computing Center 53 2007 Annual Report

Federal Institute of Technology in same primary goal of advancing the Under the agreement, staff ex- Zurich. NERSC is the U.S. Depart- scientific research of our users,” changes will be arranged based on ment of Energy’s flagship facility for said CSCS COO Dominik Ulmer. specific projects of mutual interest. computational science, serving “We believe each center has a lot of Each center will continue to pay the 2,900 users at national laboratories expertise to share, and we are look- salary and expenses of staff partici- and universities around the country. ing forward to working together on pating in the exchanges. According to “Not only do our two centers new HPC technologies that will allow the agreement, the goal is “sharing share organizational and operational us to further enhance the support and furthering the scientific and tech- similarities, but we both have the and services we offer our users.” nical know-how of both institutions.” Envisioning the Exascale

Spatial (x*y*z*) resolution

0.2° 10000 1 km 22 km

1000 (AMR)

1

Ensemble National Energy Research Scientific Computing Center 55 2007 Annual Report

In April, May, and June 2007, three town hall meetings were held at Lawrence Simulation complexity ESM + global cloud Berkeley, Oak Ridge, and Argonne national laboratories to collect community resolving model input on the prospects of a proposed new DOE initiative entitled Simulation and Modeling at the Exascale for Energy and the Environment, or E3 for short. ? About 450 researchers from universities, national laboratories, and U.S. com- panies discussed the potential benefits of advanced computing at the exascale Earth system model (15 components) (1018 operations per second) on global challenge problems in the areas of en- Timescale (years*timestep) ergy, the environment, and basic science. The findings of the meetings were 100 ? summarized in a document that quickly became known as the E3 Report.1 Climate model (5 components) 1000 yr The E3 Report stated that exascale computer systems are expected to be ° 70 1.4 1000 yr/ technologically feasible within the next 15 years, but that they face significant 160 km 3 min 1000 yr/ challenges. One of the challenges that is receiving a great deal of attention 400 20 min throughout the HPC community is power efficiency. An exaflops system that 2005 requires less than 20 MW sustained power consumption is “perhaps achiev- Terascale 5 ? able,” according to the E3 findings, but based on straightforward scaling of 2010 2 10 existing technology, estimates are roughly an order of magnitude higher. Petascale ? When the cost of running and cooling a supercomputer grows to exceed its pro- 50 2015 Data assimilation/ curement cost (which is already happening at major data centers), the economic 20 Exascale initial value forecasts viability of the project may come into question. 1000 e size A task force of staff members from NERSC and Berkeley Lab’s Computa- tional Research Division held a series of internal meetings in 2007 to develop a strategic vision for NERSC’s transition to the exascale.3 Envisioning NERSC Figure 1. Investment of exascale and petascale computational resources in sev- as the Keystone Facility for the DOE Office of Science, the task force ad- eral aspects of a simulation: spatial resolu- dressed three broad topics: computing, power, and data. tion, simulation complexity, ensemble size, The discussions about NERSC at the exascale were informed both by the etc. Each red pentagon represents a bal- E3 Report and by a series of meetings that had taken place in Berkeley over anced investment at a compute scale. the previous two years, in which a multidisciplinary group of researchers (in- (Image from E3 Report.) cluding Kathy Yelick, John Shalf, Parry Husbands, Bill Kramer, and Lenny Oliker) collaborated with faculty and students on campus to explore the impli- cations of the recent switch to multicore processors throughout the computing industry. This collaborative exploration led to the publication of “The Land-

1 H. D. Simon, T. Zacharia, R. Stevens, et al., “Modeling and Simulation at the Exascale for Energy and the Environment” (“E3 Report”), Department of Energy Technical Report (2007); http://www.sc.doe.gov/ascr/ProgramDocuments/TownHall.pdf. 2 E3 Report, p. viii. 3 Participants included Deborah Agarwal (CRD), Michael Banda, E. Wes Bethel, John Hules, William Kramer, Juan Meza (CRD), Leonid Oliker, John Shalf, Horst Simon, David Skinner, Francesca Verdier, Howard Walter, Michael Wehner (CRD), and Katherine Yelick. 56 National Energy Research Scientific Computing Center 2007 Annual Report

scape of Parallel Computing Re- the small, low-power processors in NERSC alone, NERSC combined search: A View from Berkeley”4 and consumer electronics products such with the other DOE centers, or even to the establishment of UC Berke- as cell phones, PDAs, and MP3 an augmented NERSC with other ley’s Parallel Computing Laboratory players — and high performance DOE centers. (Par Lab),5 which received major computing would have more in NERSC and the Office of Sci- funding from Intel, Microsoft and common in the future than they did ence’s Leadership Computing Facil- California’s UC Discovery program. in the past. ities will be linked by ESnet in a fully The ParLab project involves Berke- The anticipation of a paradigm integrated computational science ley Lab faculty scientists Krste shift in high performance computing environment. NERSC’s role as the Asanovic, Jim Demmel, John set the context for the discussions Keystone Facility in this environment Wawrzynek, Kathy Yelick, and the of NERSC’s transition to the exascale. will be to provide exceptional quality principal investigator of the project, The path forward envisioned in those of service and David Patterson. discussions is summarized below. • high-impact computing (defined The “Berkeley View” report re- below) ceived widespread attention.6 It as- • broad-impact computing for the serted that an evolutionary approach NERSC Computing diversity of DOE mission sci- to parallel hardware and software ence would face diminishing returns at The need for an increase in com- • efficient and transparent access 16 cores and beyond, when the in- putational resources is well docu- to and management of simula- creased difficulty of parallel pro- mented in the E3 Report, the DOE tion and experimental data gramming would not be rewarded by Greenbook,8 and the SCaLeS Re- • integrated data analysis tools a commensurate improvement in port.9 The scientific requirements go and platforms performance. “We concluded that beyond the traditional Office of Sci- • integrated support for SciDAC sneaking up on the problem of par- ence work that NERSC has sup- and other science community- allelism via multicore solutions was ported for the past three decades, developed tools likely to fail and we desperately need adding untouched areas of life sci- • outreach to new HPC user com- a new solution for parallel hardware ence, energy resources (nuclear, munities. and software,” the authors stated.7 biofuels, and renewable), energy NERSC will continue to support Taking aim at thousands of proces- efficiency, climate management, both high-impact and broad-impact sors per chip (“manycore”), the report nanotechnology, and knowledge science workloads. High-impact work proposed seven critical questions discovery. Simulations will grow in is defined as ultrascale workflows for parallel computing research and complexity, spatial resolution, or applications that require 20– suggested directions in which the timescales, ensemble sizes, and 100% of the largest resources at solutions might be found. One of data assimilation (Figure 1). The any given time. Broad-impact work the more provocative suggestions computational needs are far beyond is science that runs at scale, with was that embedded computing — what can be supplied today by high throughput, using 1–20% of

4 Krste Asanovíc, Rastislav Bodik, Bryan Catanzaro, Joseph Gebis, Parry Husbands, Kurt Keutzer, David Patterson, William Plishker, John Shalf, Samuel Williams, and Katherine Yelick, “The Landscape of Parallel Computing Research: A View from Berkeley,” University of California at Berkeley Technical Re- port No. UCB/EECS-2006-183, http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html. 5 http://parlab.eecs.berkeley.edu/ 6 For example, John Shalf’s talk “Overturning the Conventional Wisdom for the Multicore Era: Everything You Know Is Wrong” was voted one of the most popular at the 2007 International Supercomputing Conference in Germany. 7 “Berkeley View,” p. 3. 8 S. C. Jardin, ed., “DOE Greenbook: Needs and Directions in High Performance Computing for the Office of Science. A Report from the NERSC User Group.” PPPL-4090/LBNL-58927, June 2005; http://www.nersc.gov/news/greenbook/2005greenbook.pdf. 9 David E. Keyes, Phillip Colella, Thom H. Dunning, Jr., and William D. Gropp, eds., A Science-Based Case for Large-Scale Simulation (“The SCaLeS Re- port”), Washington, D.C.: DOE Office of Science, Vol. 1, July 30, 2003; Vol. 2, September 19, 2004; http://www.pnl.gov/scales/. National Energy Research Scientific Computing Center 57 2007 Annual Report

Table 1: Characteristics of scientific discipline codes

Multi- Dense Sparse Spectral N-body Structured Unstructured Data physics, linear linear methods methods grids grids Intensive multi-scale algebra algebra (FFTs) (N-Body) (S-Grids) (U-Grids) (Map (DLA) (SLA) (SM-FFT) (PIC) Reduce)

Nanoscience X X X X X X

Chemistry X X X X X

Fusion X X X X X X

Combustion X X X X X

Astrophysics X X X X X X X X

Biology X X X X

Nuclear X X X X System General-purpose High-speed High perform- High intercon- High-speed High perform- Irregular data and con- High storage balance balanced system CPU, high ance random nect bisection CPU, fast ance unit and trol flow, high perform- and network flop/s rate access memory bandwidth random access strided memory ance random access bandwidth/ low implications memory access memory latency the resources at a given time. balanced for just one computational NERSC’s general large-scale NERSC expects to support be- approach cannot fully serve a disci- systems will provide an appropriate tween 10 and 20 high-impact sci- pline area alone. relationship between sustained per- ence projects and from 200 to 250 Since NERSC will support a formance, usable memory, and us- broad-impact projects. In addition diverse workload as indicated in able disk space. Regardless of to broad-impact and high-impact Table 1, resources can be provi- whether the general-purpose or the science, NERSC will be posed to sioned in one of two ways. First, specialized system approach is support three to five INCITE-like provide general-purpose systems taken, a key system architecture “breakthrough science” projects a that are optimized to do well with component that will make scientists year. These projects receive pre- the entire or a large segment of the succeed will be having a high per- ferred processing and services from workload. Second, provide a small formance, parallel, facility-wide file NERSC in order to meet their mile- number of specialized systems that system tightly integrated with NERSC stones. are very efficient at particular algo- large-scale systems. Balanced sys- The system architecture will rithms and in aggregate support the tem architectures also include: focus on the computational needs entire diverse workload. NERSC is • high performance local-area of the user base by using multiple open to either approach or one that network HPC systems, with at least two balances a portfolio of general and • wide-area network interfaces large-scale systems in place at a special systems that overall pro- matching or exceeding ESnet time. While it is possible to create vides DOE with a very efficient facil- backbone speeds specialized system solutions for a ity for its entire problem space. In • archival storage with single algorithmic approach, Table 1 either case, a key requirement is ° large-scale near line data shows it is not feasible to segregate easing the burden on scientists to repository system balance by discipline, since move data between systems. In all ° online data cache different workflow steps and/or ap- likelihood, NERSC will have some • data focused systems proaches within a discipline require large general-purpose systems ° community data services: different system balance. Often and a few specialized systems for “Google for Science” these different algorithms exist in breakthrough science areas or spe- (described below, page 66) the same codes. Likewise, a system cific algorithmic needs. ° visualization and analysis 58 National Energy Research Scientific Computing Center 2007 Annual Report

• servers and specialized systems The new design constraint for simulation as one of the key driver ° Web services, NERSC Infor- processing elements is electrical applications to develop the hard- mation Management System power. While Moore’s Law is alive ware/software co-design methodol- (NIM), etc. and well, smaller transistors are no ogy. This hardware/software co- • Infrastructure special-arrange- longer resulting in faster chips that design process can dramatically ment systems consume less energy. Traditional accelerate the development cycle ° PDSF, Planck cluster methods for extracting more per- for exascale systems while de- • cyber security formance per processor have been creasing the power requirements. • advanced concept systems. well mined. The only way to im- NERSC will remain open to and prove performance now is to put Reducing Waste in Computing will encourage deploying systems more cores on a chip. In fact, it is The low-power embedded com- from multiple vendors, with major now the number of cores per chip puting market — including consumer systems arriving at three-year inter- that is doubling every 18 months in- electronics products such as cell vals. At least two systems will be on stead of clock frequency doubling phones, PDAs, and MP3 players — the floor at any given time, one pro- as in the past. has been the driver for CPU innova- viding a stable platform while the Consequently, the path towards tion in recent years. The processors next-generation system is brought realizing exascale computing de- in these products are optimized for into production. pends on riding a wave of exponen- low power (to lengthen battery life), tially increasing system concurrency. low cost, and high computational This is leading to reconsideration of efficiency. NERSC Power interconnect design, memory bal- According to Mark Horowitz, ance, and I/O system design. The Professor of Electrical Engineering The electrical power demands entire software infrastructure is built and Computer Science at Stanford of ultrascale computers threaten to on assumptions that are no longer University and co-founder of Ram- limit the future growth of computa- true. The shift to multicore and bus Inc., “Years of research in low- tional science. For decades, the no- manycore processors will have dra- power embedded computing have tion of computer performance has matic consequences for the design shown only one design technique to been synonymous with raw speed of future HPC applications and al- reduce power: reduce waste.” The as measured in flop/s. This isolated gorithms. sources of waste in current HPC focus has led to supercomputers To reach exascale computing systems include wasted transistors that consume egregious amounts of cost-effectively, NERSC proposes to (surface area), wasted computation electrical power. Other performance radically change the relationship be- (useless work, speculation, stalls), metrics have been largely ignored, tween machines and applications by wasted bandwidth (data move- e.g., power efficiency, space effi- developing a tightly coupled hard- ment), and optimizing chip design ciency, reliability, availability, and ware/software co-design process. for serial performance, which in- usability. As a consequence, the We will directly engage the Office of creases the complexity (and power total cost of ownership of a super- Science applications community in waste) of the design. 
computer has increased extraordi- this cooperative process of develop- Efficient designs must be spe- narily. With the cost of power ing models to achieve an aggressive cific to application and/or algorithm possibly exceeding the procure- goal of 100 times the computational classes, as suggested by a study ment costs of exaflop systems, the efficiency and 100 times the capabil- that examined the dual-core AMD current approach to building super- ity of the mainstream HPC approach processor on Cray XT3 and XT4 computers is not sustainable with- to hardware/software design. We systems to assess the current state out dramatic increases in funds to propose to use global cloud system of system balance and to determine operate the systems. resolving models for climate change when to invest more resources to National Energy Research Scientific Computing Center 59 2007 Annual Report

Distribution of Time Spent in Application in Dual-Core Opteron/XT4 System excellent driver for understanding 100 how processors can be designed to optimize for efficient parallel execu- tion rather than serial execution. 80 OTHER The “Berkeley View” report con- cludes that parallelism is an energy-

60 efficient way to achieve performance. A system with many simple cores FLOPS offers higher performance per unit 40 area for parallel codes than a com-

PERCENT TIME SPENT parable design employing smaller numbers of complex cores. Lower 20 complexity makes a chip more eco- nomical to design and produce, and

0 smaller processing elements pro- CAM MILC GTC GAMESS PARATEC PMEMD MadBench CONTENTION MEMORY vide an economical way to improve APPLICATION defect tolerance by providing many Figure 2. A breakdown of where time was spent in a subset of the NERSC SSP application codes redundant cores that can be turned suggests that different applications have different requirements for computational efficiency. off if there are defects. Figure 3 shows that moving to a improve memory bandwidth.10 The bottlenecks in existing CPUs, and simpler core design results in mod- study used the NERSC SSP bench- that different applications have dif- estly lower clock frequencies, but mark, which is a diverse array of ferent balance requirements. has enormous benefits in chip sur- full-scale applications that represent A core designed to a specific face area and power consumption. a significant fraction of the NERSC set of application resource require- Even if it is assumed that the sim- workload. A breakdown of time spent ments can get 10 to 100 times better pler core will offer only one-third the in various components of the codes performance per watt, as shown by computational efficiency of the (Figure 2) shows that surprisingly studies from Stanford University11 and more complex out-of-order cores, a little time could be attributed to from Tensilica, Inc.12 Figure 3 illus- manycore design (hundreds to memory contention corresponding trates this potential by showing the thousands of simple cores) could to basic memory bandwidth limita- area and performance differences still provide an order of magnitude tions. The largest fraction of time between general purpose, embed- more power efficiency for an equiv- (the “other” category) is attributed to ded (used in BlueGene/P), and ap- alent sustained performance. As the either latency stalls or integer/ad- plication-tailored cores. The figure figure illustrates, even with the dress arithmetic. Theoretically, these shows how much area and power smaller cores operating at one-third applications should all be memory- desktop processors waste because to one-tenth the efficiency of the bandwidth bound, but instead the they are optimized for serial code. largest chip, 100 times more cores study shows that most are con- The DOE applications, because they can still be packed onto a chip and strained by other microarchitectural are already highly parallel, are an consume one-twentieth the power.

10 J. Carter, H. He, J. Shalf, E. Strohmaier, H. Shan, and H. Wasserman, “The Performance Effect of Multi-Core on Scientific Applications,” Cray User Group (CUG2007), Seattle, Washington, May 7–10, 2007. 11 M. Horowitz, E. Alon, D. Patil, S. Naffziger, R. Kumar, and K. Bernstein, “Scaling, Power, and the Future of CMOS,” IEEE International Electron Devices Meeting, December 2005. 12 Chris Rowen, “Application-Specific Supercomputing: New Building Blocks Enable New Systems Efficiency,” SIAM Conference on Computational Science and Engineering, February 19, 2007, Costa Mesa, California. 60 National Energy Research Scientific Computing Center 2007 Annual Report

TensilicaDP testing and evaluation by the appli- PPC450 cation scientists. A hardware/soft- Intel Core2 Power5 ware co-design approach could dramatically accelerate this process. FXU ISU FPU Power5 (server) • 389 mm2 For years, NERSC has engaged • 120 W @ 1900 MHz in a cooperative effort with hard- IDU Intel Core2 sc (laptop) ware designers, which we call Sci- LSU • 130 mm2 ence-Driven System Architecture, IFU • 15 W @ 1000 MHz which involves engaging application PowerPC450 (BlueGene/P) scientists in the early parts of the • 8 mm2 • 3 W @ 850 MHz hardware design process for future- Tensilica DP (cell phones) generation supercomputing sys- 2 13 L2 L2 L2 • 0.8 mm tems. This approach is consistent • 0.09 W @ 650 MHz with the recommendations of the High-End Computing Revitalization Task Force (HECRTF)14 and the Na- tional Research Council report Get- ting Up to Speed: The Future of L3 Directory/Control MC Supercomputing,15 both of which

Figure 3. Relative size and power dissipation of different CPU core architectures. Simpler recommend partnerships with ven- processor cores require far less surface area and power with only a modest drop in clock fre- dors in the early stages of the prod- quency. Even if measured by sustained performance on applications, the power efficiency and uct development process. performance per unit area is significantly better when using the simpler cores. NERSC proposes to focus this cooperative effort toward a new de- Effective performance per watt is the length of the feedback loop on sign paradigm: application-driven the critical metric. system designs. Due to the high HPC. This approach involves identi- This design approach raises the design investment cost, the vendor fying high-impact exascale scien- challenges of creating ultrascale must make compromises in the tific applications, tailoring the parallel applications. system design to accommodate a system architecture to the applica- wide variety of applications. The tion resource requirements, and co- A Tightly Coupled Hardware/ application scientists cannot pro- designing algorithms and software Software Co-Design Process vide performance feedback to the together with the semi-custom If the HPC community emulated vendor until hardware is released hardware. the embedded computing industry, for testing and evaluation. This This co-design process would we could potentially reduce not only multi-year cycle is a source of sig- be impossible using the typical power requirements but also design nificant inefficiencies for scientific multi-year hardware lead times for costs and time to market. A key productivity, because it can take complex, serial-optimized chips. limiting factor in the market-driven years for each new iteration of However, a typical embedded approach to HPC procurements is hardware to become available for processor vendor may generate up

13 H. D. Simon, W. T. Kramer, W. Saphir, J. Shalf, D. Bailey, L. Oliker, M. J. Banda, C. W. McCurdy, J. Hules, A. Canning, M. Day, P. Colella, D. Serafini, M. F. Wehner, and P. Nugent, “Science-Driven System Architecture: A New Process for Leadership Class Computing,” Journal of the Earth Simulator, Volume 2, March 2005, pp. 2–10; Lawrence Berkeley National Laboratory technical report LBNL-56545; http://repositories.cdlib.org/lbnl/LBNL-56545. 14 Federal Plan for High-End Computing: Report of the High-End Computing Revitalization Task Force (HECRTF), Washington, D.C.: National Coordination Office for Information Technology Research and Development, May 10, 2004. 15 National Research Council Committee on the Future of Supercomputing, Getting Up to Speed: The Future of Supercomputing, S. L. Graham, M. Snir, and C. A. Patterson, eds. (Washington, DC: National Academies Press, 2004). National Energy Research Scientific Computing Center 61 2007 Annual Report

to 200 unique designs every year for faster than conventional software- tudes are inadequately represented simple, specialized chips. In order based cycle-accurate simulators, at current resolutions. Current- to keep up with the demand for semi- hardware/software co-design is generation climate models can be customized designs, leading em- now feasible. extended to about a 20 km horizon- bedded design houses such as IBM The software side of the co- tal resolution in the atmospheric Microelectronics, Altera, and Tensil- design process will be supported by component without major reformu- ica have evolved sophisticated auto-tuning tools for code genera- lation, but at finer resolutions, the toolsets to accelerate the design tion that are being developed by the treatment of cumulus cloud proces- process through semi-automated SciDAC Center for Scalable Appli- ses breaks down. Fortunately an- synthesis of custom processor de- cation Development Software,17 led other alternative presents itself at the signs. NERSC proposes to leverage by John Mellor-Crummey of Rice 1 km scale, where cloud systems the expertise of this technology University. (but not individual clouds) can be sector by collaborating with Mark resolved. Current technologies for Horowitz of Stanford University and An Ultrascale Application: regional modeling at this scale are Rambus, Inc., and Chris Rowen of Ultra-High-Resolution Climate extendable to global models, but the Tensilica, Inc. Change Simulation computational platform will need to NERSC’s co-design process NERSC’s hardware/software co- achieve around 10 petaflop/s will utilize the Berkeley Research design methodology is broadly ap- (Pflop/s) sustained.18 A detailed ex- Accelerator for Multiple Processors plicable and could be applied to a trapolation of the resource require- (RAMP),16 an FPGA emulation plat- number of Office of Science disci- ments of a current-generation form that makes the hardware con- plines that could effectively utilize atmospheric model showed that it figuration available for evaluation ultrascale resources, resulting in dif- is unlikely that multicore chip tech- while the actual hardware is still on ferent hardware configurations nology will achieve this goal in the the drawing board. Making use of within a consistent framework. As a next two decades within practical large field programmable gate ar- pilot project that could result in hardware or power budgets. An en- rays (FPGAs), RAMP looks like the breakthroughs in both the domain ergy-efficient hardware architecture real hardware to software develop- science and computer science, capable of achieving the aggressive ers, who can efficiently test their NERSC proposes to use climate requirements of the kilometer-scale target application software on vary- change simulation to illustrate this model could employ 20 million much ing hardware configurations. The power-efficient computing ap- simpler cores using existing 90 nm flexibility of RAMP allows rapid proach, resulting in a synergy of re- technology. changes in the details of the hard- ducing the carbon footprint needed Actual development of a 1 km ware configuration (e.g., the num- to predict climate change with un- cloud system resolving global model ber of processors, number of precedented accuracy. is a significant multi-year effort. 
The floating point units per processor, A major source of errors in cli- SciDAC project “Design and Testing size and speed of caches, prefetch- mate models is poor cloud simula- of a Global Cloud Resolving Model”19 ing schemes, speed of memory, tion. The deep convective processes is leading the way with a grid-cell etc.). Since RAMP allows these ex- responsible for moisture transport spacing of approximately 3 km on a plorations at speeds 1,000 times from near-surface to higher alti- highly uniform icosahedral grid. This

16 S. Wee, J. Casper, N. Njoroge, Y. Teslyar, D. Ge, C. Kozyrakis, and K. Olukotun, “A Practical FPGA-based Framework for Novel CMP Research,” Pro- ceedings of the 15th ACM SIGDA Intl. Symposium on Field Programmable Gate Arrays, Monterey, CA, February 2007. See also the RAMP homepage, http://ramp.eecs.berkeley.edu/. 17 http://cscads.rice.edu/ 18 Michael Wehner, Leonid Oliker, and John Shalf, “Towards Ultra-High Resolution Models of Climate and Weather,” International Journal of High Perform- ance Computing Applications (in press). 19 http://kiwi.atmos.colostate.edu/gcrm/ 62 National Energy Research Scientific Computing Center 2007 Annual Report

Research, a unit of an investment General Purpose Application Driven Special Purpose Single Purpose firm, focuses on the development of new algorithms and specialized su- percomputer architectures for ultra- Cray XT3 Green Flash Blue Gene D.E. Shaw MD Grape fast biomolecular simulations of scientific and pharmaceutical prob- lems. The D. E. Shaw system will use Figure 4. The customization continuum of computer architectures. fully programmable cores with full- model would be capable of simulating could be achieved with 20 million custom co-processors to achieve effi- the circulations associated with large processors, modest vertical paral- ciency, and will simulate 100 to 1000 convective clouds. Although the lelization, a modest 0.5 gigaflop/s times longer timescales than any procedure is conceptually straight- per processor, and 5 MB memory existing HPC system. While the pro- forward, there is little sense of ur- per processor. grammability of the D. E. Shaw sys- gency for a 1 km global model given An application-driven architec- tem will broaden its application reach, the assumption that the computer ture does not necessitate a special- it will still be narrower than NERSC’s technology is not arriving anytime purpose machine, nor does it require proposed Green Flash. soon. However, this model could run exotic technology. As Figure 4 shows IBM’s Blue Gene is the best ex- much sooner than expected on with several examples, there is a ample to date of the kind of appli- massively concurrent architectures customization continuum from cation-driven architecture based on composed of power-efficient em- general-purpose to single-purpose an embedded processor core that bedded cores. computers, and indeed the Blue NERSC envisions for Green Flash. NERSC proposes a focused Gene line of systems was started Designed around a protein folding program to design a computing with a very narrow application tar- application, Blue Gene, over several platform and the climate model in get in mind. generations, has proved to be use- tandem. This research project has At the single-purpose, fully cus- ful for a growing list of applications, been named “Green Flash.” The tom extreme is MD-Grape, a com- including hydrodynamics, quantum computer system would employ puter at RIKEN in Japan. MD-Grape chemistry, molecular dynamics, cli- power-efficient cores specifically was designed for molecular dynam- mate modeling, and financial mod- tailored to meet the requirements of ics simulations (the name stands for eling. this ultrascale climate code. The “Molecular Dynamics — Greatly Like Blue Gene, NERSC’s ultra- equations of motion (rather than the Reduced Array of Processor Ele- high-resolution Green Flash would physics) dominate the requirements ments”) and has a custom ASIC have a semicustom design. The of an atmospheric model at 1 km chip design. It achieves 1 Pflop/s core architecture would be highly resolution because the Courant sta- performance for its target application programmable using C, C++, or bility condition requires smaller time using 200 kilowatts of power, and Fortran. Its 100x improvement in steps. To be useful, the model must cost $8.6M from concept to imple- power efficiency would be modest run at least 1,000 times faster than mentation (including labor). Although when compared with the demon- real time, calculating values for MD-Grape was custom-designed for strated capability of more specialized about 2 billion icosahedral points. molecular dynamics, it has proven approaches. 
This approach would At this rate, millennium-scale con- useful for several other applications, solve an exascale problem without trol runs could be completed in a including astrophysical N-body sim- building an exaflop/s machine. year, and century-scale transient ulations. RAMP will be used as a testbed runs could be done in a month. The An example of a semicustom to design the system architecture in computational platform perform- design with some custom elements the context of climate model algo- ance would need to reach around is the D. E. Shaw system, expected rithms. The software implementation 10 Pflop/s sustained. This goal to be completed in 2008. D. E. Shaw will be tailored to take advantage of National Energy Research Scientific Computing Center 63 2007 Annual Report

Michael Wehner Lenny Oliker John Shalf hard design limits or features of the evolving hardware implementations. NERSC’s partners in the climate community will be David Randall of Colorado State University (leader of the SciDAC Global Cloud Resolving Model project), and Michael Wehner and Bill Collins of Berkeley Lab. In addition to enabling a break- through in cloud-resolving climate simulation, NERSC’s power-efficient, application-driven design method- Jonathan Carter Erich Strohmaier ology will have an impact on the broader DOE scientific workload. Conference on Computational Sci- chitectures and how efficiently Our hardware/software co-design ence and Engineering, it has at- those systems can perform on approach is geared for a class of tracted international attention.20,21,22 challenging scientific codes. codes, not just for a single code in- Berkeley Lab is currently funding The project, “Enhancing the Ef- stantiation. This methodology is development of a prototype, with fectiveness of Manycore Chip broadly applicable and could be ex- Michael Wehner, Lenny Oliker, and Technologies for High-End tended to other scientific disciplines. John Shalf leading the effort. Computing,” also includes col- Blue Gene was originally targeted at laborators Lenny Oliker and chemistry and bioinformatics appli- Related Research John Shalf. They will identify cations, resulting in a power-effi- Berkeley Lab is also funding two candidate algorithms that map cient architecture, but its application other projects on energy-efficient com- well to multicore technologies, has been broader than the original puting led by NERSC researchers: and document the steps needed target. NERSC expects a similar re- • Jonathan Carter, head of the to re-engineer programs to take sult from the Green Flash. User Services Group, is leading advantage of these architectures. Since the Green Flash concept a project to explore a wide They will also try to identify de- was unveiled at the 2007 SIAM range of multicore computer ar- sign elements in multicore chips

20 Ashlee Vance, "Geeks fight the smelter with embedded processor-based box," The Register, February 2, 2008, http://www.theregister.co.uk/2008/02/02/horst_simon_cloud_computer/.
21 Michael Feldman, "A Modest Proposal for Petascale Computing," HPCwire, February 8, 2008, http://www.hpcwire.com/hpc/2112632.html.
22 Economist.com, "Cool it!" March 4, 2008, http://www.economist.com/displaystory.cfm?story_id=10795585.

NERSC Data

Virtually all branches of science base hypothesis testing on some form of data analysis. Scientific disciplines vary in how they produce data (via observation or simulation), in how they manage data (storage, retrieval, archiving, indexing, summaries, sharing across the science team), and in how they analyze data and communicate results. But it is widely agreed that one of the primary bottlenecks in modern science is managing and discovering knowledge in light of the tsunami of data resulting from increasing computational capacity and the increasing fidelity of scientific observational instruments.23 And as data become too large to move, we are evolving towards a model where data-intensive services are centrally located.24

The proposed NERSC Data effort will offer a diverse set of activities to meet this demand, including but not limited to:
• community-oriented data repositories
• browsing, exploration, and analysis capabilities that operate on the centrally located community repositories
• providing and maintaining the centrally located hardware and software infrastructure that enables these capabilities.

A key element of the NERSC long-term strategy is production-quality data management and analytics with sufficient resources to meet science needs. The potential impact to DOE in long-term cost savings and scientific opportunity is profound.

For example, NERSC already serves as the data repository for two international nuclear physics collaborations, KamLAND (page 46 above) and STAR, maintaining hundreds of terabytes of data on scalable storage platforms that are managed by professional systems engineers. Centralized data storage frees scientists in both projects from spending time on system administration, allowing them to focus on the science.

There are many existing or imminent projects that could benefit from this kind of data platform, including ESG, LHC, ITER, JDEM/SNAP, Planck, the SciDAC Computational Astrophysics Consortium, and JGI. Providing community-oriented data repositories for a large number of projects, along with advanced analytics tools that help extract meaning from the data, is outside the scope of the current NERSC program, but would allow NERSC to more effectively fulfill its mission of enabling scientific discovery.

23 Richard P. Mount, ed., The Office of Science Data-Management Challenge: Report from the DOE Office of Science Data-Management Workshops, March–May 2004; http://www.sc.doe.gov/ascr/ProgramDocuments/Final-report-v26.pdf.
24 Gordon Bell, Jim Gray, and Alex Szalay, "Petascale Computational Systems," IEEE Computer 39(1), January 2006.

Easy Access to Data Accelerates Science

The value of accessing massive datasets with powerful analytic tools was illustrated in the 2005 National Institute of Standards and Technology (NIST) Open Machine Translation Evaluation, which involved academic, government, and commercial participants from all over the world. Although it was Google's first time competing, their translation system achieved the highest scores in both Arabic- and Chinese-to-English translation, outperforming sophisticated rules-based systems developed by expert linguists.25 Google used statistical learning techniques to build its translation models, feeding the machines billions of words of text, including matching pairs of human-translated documents.26 In this case, Google, with more data, beat others with more expertise.

Similar results can be expected from applying advanced analytics tools to massive scientific datasets. Indeed, several projects at NERSC and Berkeley Lab's Computational Research Division (CRD) illustrate the growing need for integrated, production-quality data management and analytics. One such project is the cosmic microwave background data analysis for the Planck satellite mission. The satellite is scheduled to be launched in 2008, but the data production pipeline is already in place at NERSC. Access to both raw and processed data will be provided through a web portal for a remote community of thousands of users.

Another example of production analytics is Sunfall, the collaborative visual analytics and data exploration system created with the Nearby Supernova Factory, as described on page 48. The development of Sunfall led to a 90% labor savings in areas of the SNfactory supernova search and follow-up workflow, and project scientists now have new data exploration and analysis capabilities that had previously been too time-consuming to attempt.

A web application that combines data and functionality from more than one source is the Berkeley Water Center's (BWC's) Scientific Data Server,27 which integrates data from several hundred FLUXNET environmental observatories worldwide to improve the understanding of carbon fluxes and carbon-climate interactions. Using Microsoft SharePoint collaboration tools and an integration with MS Virtual Earth, the BWC server offers 921 site years of data (150 variables at a 30-minute data rate) available for direct download into MS Excel.

These examples are initial illustrations of how the data needs of the scientific community are changing. These changes can be summarized as follows:
• Increasing size of data sets from experimental systems (satellites, detectors, etc.), in addition to the simulation data that has always grown with machine size and speed.
• Growing size, geographic distribution, and diversity of communities that share a common data repository; each community may be made up of scientists with different specializations studying different features of the shared data set.
• Increased use of and reliance on production-quality information management, workflow, and analytics software infrastructure.

NERSC Data Program Elements

To anticipate and meet the changing needs of its user communities, the new NERSC Data program will include:
• The next-generation mass storage system
• Production infrastructure for data
o Hardware: computational platforms
o Software for data management, analysis/analytics, and interfaces between integrated data components
• Development or adaptation of reusable, broad-impact tools
o Analogous to Google Earth or Microsoft SharePoint
o Hosting and adapting SciDAC tools for the science community
• Focused data projects
o Consulting expertise in scientific data management, analytics, visualization, workflow management, etc.

NERSC Data Storage

NERSC is a founding development partner in the High Performance Storage System (HPSS) project,28 software that manages petabytes of data on disk and robotic tape libraries. While HPSS has proven to be an invaluable mass storage platform, it is now about 15 years old and likely will not evolve to meet future science needs.

25 NIST 2005 Machine Translation Evaluation Official Results, August 1, 2005, http://www.nist.gov/speech/tests/mt/2005/doc/mt05eval_official_results_release_20050801_v3.html.
26 Bill Softky, "How Google translates without understanding," The Register, May 15, 2007, http://www.theregister.co.uk/2007/05/15/google_translation/print.html.
27 http://bwc.berkeley.edu/
28 http://www.hpss-collaboration.org/hpss/index.jsp

The current ways of storing data in global filesystems and archival storage systems will probably not scale to exascale.

Past experience has shown that commercial storage products designed for a mass market will not meet the needs of open science. The open science community needs to initiate the collaborative development of what might be called EXA-HPSS — a next-generation mass storage system. This system must be energy efficient, scalable, closely integrated with parallel filesystems and online data, and designed for the requirements of new data profiles (e.g., the increasing importance of metadata). Archival storage needs to go beyond file-based access to support a broader set of data storage and retrieval operations and more user-friendly functionality. With decades of experience serving a large community of science users, NERSC is in the best position to lead the specification, design, and research effort for a next-generation mass storage system, and to participate in R&D of an interface to support efficient use of the system.

Berkeley Lab's Scientific Data Management Group is already initiating research into an "Energy-Smart Disk-Based Mass Storage System," envisioned as an energy-efficient, low-latency, scalable mass storage system with a three-level hierarchy (compared to HPSS's two-level hierarchy). Today's storage systems in data centers use thousands of continuously spinning disk drives. These disk drives and the necessary cooling components use a substantial fraction of the total energy consumed by the data center. This project is exploring new configurations that divide the disks into active and passive groups. The active group contains continuously spinning disks that act as a cache for the most frequently accessed data. The disks in the passive group would power down after a period of inactivity. This is a prime example of the kind of research needed to develop the next generation of storage technology.
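A minimal sketch of the active/passive idea, in Python, is shown below. The cache policy, capacity, and idle timeout are invented for illustration only; the actual Energy-Smart design is an open research question and is not specified in this report.

```python
import time
from collections import OrderedDict

IDLE_TIMEOUT_S = 300     # assumed: spin a passive disk down after 5 minutes without access
ACTIVE_CAPACITY = 1000   # assumed: number of data blocks the always-spinning tier can hold

class PassiveDisk:
    """A disk that may be powered down when it has been idle for a while."""
    def __init__(self, name):
        self.name = name
        self.spinning = True
        self.last_access = time.monotonic()

    def read(self, block_id):
        if not self.spinning:
            self.spinning = True            # spin-up penalty is paid only on a cache miss
        self.last_access = time.monotonic()
        return f"data({block_id})"

class EnergySmartStore:
    """Toy model of the active/passive split: hot blocks are cached on the active
    (continuously spinning) tier; cold blocks stay on passive disks that can power down."""
    def __init__(self, passive_disks):
        self.cache = OrderedDict()          # block_id -> data held on the active tier
        self.passive_disks = passive_disks

    def read(self, block_id, home_disk):
        if block_id in self.cache:          # hit: served without waking any passive disk
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        data = home_disk.read(block_id)     # miss: wake the owning passive disk if needed
        self.cache[block_id] = data
        if len(self.cache) > ACTIVE_CAPACITY:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data

    def power_down_idle(self):
        now = time.monotonic()
        for disk in self.passive_disks:
            if disk.spinning and now - disk.last_access > IDLE_TIMEOUT_S:
                disk.spinning = False       # the energy saving comes from these idle disks
```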
NERSC Data Production Infrastructure

The NERSC Data production infrastructure will consist of computational platforms for high-capacity and high-throughput interactive analytics, high-capacity and energy-efficient mass storage, high performance intra- and inter-networking capability, and a robust collection of software tools for realizing production analytics solutions. Software tools will include applications and libraries for data management, analysis, visualization, and exploration, as well as applications and libraries enabling scientific community access, e.g., web portal infrastructure, a new data archive interface, etc.

A good analogy for this infrastructure is Google, where a significant investment in computational and software infrastructure enables the retrieval of the data most relevant to a query from a variety of sources and presents it quickly in an easily comprehensible, usable, and navigable form. NERSC's long-term vision is to provide this type of on-demand capability ("Google for Science") to our users and stakeholders. The resulting solutions span a diverse range: community-centric data repositories and analysis, portal-based interfaces to data and computation, and high performance, production-quality visual analytics pipelines, workflows, and systems.

The role of the new NERSC Data program is to provide hardware infrastructure commensurate with need, which includes both sufficient capacity (absolute number of CPUs, memory, storage, network bandwidth, etc.) and capability (e.g., GPUs, large memory footprint nodes, etc.).

NERSC Data Tools

"Google for Science" may become the next paradigm for scientific analytics if one considers the powerful capabilities that Google and similar search engines put in the hands of anyone with Internet access. For example, in response to a user query about a given location (address, intersection, landmark, or latitude and longitude), Google Maps and Google Earth can access a plethora of satellite, aerial, and surface photos, map images, and textual information on roads, buildings, businesses, landmarks, and geography, then present that information in the desired format — text, map tiles, pictures, or a combination. Users can zoom in or out of images, change the directional orientation, change the angle of aerial photos, or move in any direction, all through an easy-to-use web interface. Providing the ability to access, navigate, and manipulate scientific data this easily is NERSC's vision of "Google for Science."

One of the key tools developed at Google that makes these capabilities possible is MapReduce, a programming model and an associated implementation for generating and processing large data sets.29 MapReduce is used to regenerate Google's index of the World Wide Web as well as to perform a wide variety of analytic tasks, including grep, clustering, data mining, and machine learning — more than ten thousand applications to date. The basic steps in MapReduce are:
• read a large quantity of data
• map the data: extract interesting items
• shuffle and sort
• reduce: aggregate and transform the selected data
• write the results.

MapReduce has a number of features that suggest it could be used very productively in scientific analytics. Its functions can be applied to numeric, image, or text data, e.g., simulations, telescopic images, or genomic data. Its simple, extensible interface allows for domain-specific analysis and leverages domain-independent infrastructure. It makes efficient use of wide area bandwidth by shipping functions to the raw data and returning filtered information. And it hides messy details, such as parallelization, load balancing, and machine failures, in the MapReduce runtime library, allowing programmers who have no experience with distributed or parallel systems to exploit large amounts of resources easily.
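To make the read–map–shuffle–reduce pattern concrete, here is a minimal, single-process sketch in Python. The toy records and the per-region statistics are invented for illustration; this is not NERSC or Google code, and a real MapReduce runtime would distribute the map and reduce calls across many nodes while handling the partitioning, load balancing, and failure recovery described above.

```python
from collections import defaultdict

# Read: a toy dataset of (sky_region, magnitude) records standing in for a large input.
records = [("ngc1365", 14.2), ("m101", 15.1), ("ngc1365", 13.8), ("m101", 16.0)]

def map_phase(record):
    """Map: extract the interesting item as a (key, value) pair."""
    region, magnitude = record
    yield region, magnitude

def shuffle_and_sort(pairs):
    """Shuffle and sort: group all values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    """Reduce: aggregate the grouped values (here, a count and a mean magnitude)."""
    return key, len(values), sum(values) / len(values)

# read -> map -> shuffle/sort -> reduce -> write
mapped = (pair for record in records for pair in map_phase(record))
for key, values in shuffle_and_sort(mapped):
    print(reduce_phase(key, values))        # write: one result line per key
```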
A key element of the NERSC Data program will be to develop or adapt reusable, broad-impact open tools such as MapReduce, Microsoft SharePoint, or other software that can simplify production analytics, allowing researchers to focus on scientific discovery rather than the detailed operation of analytics tools.

Hosting and adapting SciDAC-developed tools for the science community will be an essential part of this effort. DOE's investment in data management technologies focuses on infrastructure for data storage and I/O (PDSI, HPSS), indexing and searching (SDM Center), workflow management (SDM Center), and location-transparent access to distributed data (SDM Center and Open Science Grid). Such infrastructure generally consists of software ranging from standalone executables to libraries of callable methods and routines that implement focused capabilities. A gap in DOE's program, and consequently a long-term challenge, is having dedicated professional staff at production computing facilities responsible for the ultimate production deployment of such new capabilities. Given the demands of production use, there will necessarily be periods of evaluation where new technologies are subject to beta testing, including scrutiny by cybersecurity experts. This activity of the NERSC Data program — tool evaluation, testing, and feedback — will fill a gap in DOE's current computational programs.

Focused Data Projects

The NERSC Data program will provide integrated, production-quality analytics pipelines for experimental and computational science projects, working directly with science stakeholders to design and deploy production analytics capabilities for science communities. This unique capability is targeted at science projects that have a significant need for high performance, production-quality data management, processing, and analysis capabilities (e.g., ESG, JDEM/SNAP, etc.). The scope of these production analytics capabilities is diverse and will be driven by data-intensive science needs. This approach, where the primary focus is on the needs of individual science communities, provides opportunities for major breakthroughs in both the domain science and computer science, with the additional benefit of spinning off generally applicable technologies for broader use. This service will be allocated through an INCITE-like process to take advantage of the NERSC staff's expertise in consulting, analytics, and technology evaluation, testing, integration, and hardening.

29 Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Proc. OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004; http://labs.google.com/papers/mapreduce-osdi04.pdf.

NERSC anticipates working in the following areas:
• Data formats and models. High performance, parallel data I/O libraries will optimize data storage, retrieval, and exchange on NERSC and other platforms. The NERSC Data team will evaluate these technologies, participate in R&D efforts to create an improved technology, and directly consult with science projects to deploy these technologies.
• Community-wide data access. Science projects need straightforward, unfettered, yet authenticated and authorized access to their community data regardless of location across multiple sites. Access to data could potentially be based on files, like the current approach familiar to users of HPSS and other typical filesystems; or based on "objects," where a "data gather" operation is performed by an agent on the user's behalf and the result is later made available to the user.
• Data filtering and processing. Often raw data must undergo additional processing (e.g., gap filling, filtering, etc.) before being ready for downstream use by consumers. NERSC Data staff will contribute in several areas, such as knowledge of the best algorithms for filtering and processing and their deployment on parallel machines, and assistance in deploying these algorithms in scientific workflows.
• Data exploration. Science users want to be able to quickly and easily explore their data, either with a traditional application (run on NERSC resources or at the user's location) that reads files and displays results, or through a web-based application that interfaces to back-end infrastructure at NERSC to access and process data, then displays results through the web interfaces. The intersection between filtering and exploration can be based on queries, which return subsets of data that meet certain criteria (a minimal query-style sketch follows this list). In the commercial world, such systems are typically implemented using a relational database management system and SQL queries. A body of work from DOE's Scientific Data Management SciDAC program shows that commercial RDBMS systems are not adequate to meet the needs of large, data-intensive science activities.30
• Data analysis. These activities include generating statistical summaries and analysis, supervised and unsupervised classification and clustering, curve fitting, and so forth. While many contemporary applications provide integrated data analysis capabilities, some science projects will want to run standalone analysis tools on data collections offline as part of a workflow to produce derived data for later analysis.
• Workflow management. The goal of workflow management is to automate specific sets of tasks that are repeated many times, and thus to simplify execution and avoid the typical human errors that occur when repetitive tasks are performed.
• Interfaces and usability. Recent production analytics workflows like Sunfall show the dramatic increase in scientific productivity that results from careful attention to the combination of highly capable analytics software and highly effective interfaces to those software tools. One primary NERSC Data objective is to increase scientific productivity for data-intensive activities through well designed and engineered interfaces.
• Data visualization. The role of visualization and visual data analysis in the scientific process is well established.
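As a minimal illustration of the query-driven subsetting mentioned in the Data exploration item, the Python sketch below applies a boolean-mask "query" to a hypothetical table of detector events. The column names and cut values are invented for the example and do not represent any particular NERSC dataset or the SDM tools cited above.

```python
import numpy as np

# Hypothetical event table: one million rows with energy (GeV) and arrival time (s) columns.
rng = np.random.default_rng(seed=1)
events = {
    "energy": rng.exponential(scale=2.0, size=1_000_000),
    "time": rng.uniform(0.0, 3600.0, size=1_000_000),
}

# A "query" expressed as a boolean mask over the columns: only rows meeting
# both criteria are kept, i.e., the subset of the data satisfying the query.
selection = (events["energy"] > 5.0) & (events["time"] < 600.0)
subset = {name: column[selection] for name, column in events.items()}

print(f"selected {selection.sum()} of {selection.size} events")
```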

30 K. Wu, W.-M. Zhang, V. Perevoztchikov, J. Laurent, and A. Shoshani, "Grid Collector: Using an Event Catalog to Speed Up User Analysis in a Distributed Environment," presented at Computing in High Energy and Nuclear Physics (CHEP) 2004, Interlaken, Switzerland, September 2004; http://www.osti.gov/bridge/servlets/purl/882078-E3rSLU/882078.PDF.

NERSC Data staff will be a bridge between programs in production computing and data storage and complementary applied research efforts in visualization, data management, ultra-efficient platforms, networking, distributed systems, and networking middleware. This bridge will complete the cycle between research and development and production deployment in computing centers, with active participation in areas of emerging architectures and novel algorithms.

With many decades of experience serving a large community of science users, NERSC is in the best position to deliver production-quality data management and knowledge discovery infrastructure to the DOE science community. The NERSC Data program will expand the scope of the NERSC mission to include capabilities that are responsive to the data needs of DOE science, needs that are inseparable from the computational requirements.

Appendix A NERSC Policy Board

Daniel A. Reed (Chair) Microsoft Corporation

David Dean (ex officio, NERSC Users Group Chair) Oak Ridge National Laboratory

Robert J. Goldston Princeton Plasma Physics Laboratory

Tony Hey Microsoft Corporation

Sidney Karin University of California, San Diego

Pier Oddone Fermi National Accelerator Laboratory

Tetsuya Sato Earth Simulator Center/Japan Marine Science and Technology Center

Stephen L. Squires Hewlett-Packard Laboratories

Appendix B NERSC Client Statistics

In support of the DOE Office of Science's mission, the NERSC Center served 3,113 scientists throughout the United States in 2007. These researchers work in DOE laboratories, universities, industry, and other Federal agencies. Figure 1 shows the proportion of NERSC usage by each type of institution, while Figures 2 and 3 show laboratory, university, and other organizations that used large allocations of computer time. Computational science conducted at NERSC covers the entire range of scientific disciplines, but is focused on research that supports the DOE's mission and scientific goals, as shown in Figure 4. More than 1,500 scientific publications in 2007 were based entirely or in part on calculations done at NERSC; a list is available at http://www.nersc.gov/news/reports/ERCAPpubs07.php.

Figure 1. NERSC MPP usage by institution type, 2007: Universities 63%, DOE Labs 32%, Industries 3%, Other Government Labs 2%.

Figure 2. DOE and other Federal laboratory usage at NERSC, 2007 (MPP hours): Lawrence Berkeley 9,646,216; Princeton Plasma Physics 4,804,108; Oak Ridge 4,655,828; Argonne 2,461,828; Lawrence Livermore 2,231,002; Pacific Northwest 1,791,287; National Renewable Energy 1,387,030; National Center for Atmospheric Research 1,387,030; Ames 481,656; Thomas Jefferson National Accelerator Facility 444,805; Stanford Linear Accelerator Center 405,582; Army Corps of Engineers 373,096; Brookhaven 354,582; Los Alamos 243,507; Sandia 221,434; Others (8) 249,807.

Figure 3. Academic and private laboratory usage at NERSC, 2007 (MPP hours): University of Arizona 2,388,877; Massachusetts Institute of Technology 1,878,131; University of California, Santa Cruz 1,842,348; Auburn University 1,598,937; University of California, Berkeley 1,293,207; University of Kentucky 1,101,157; General Atomics 953,941; University of 933,076; Colorado State University 651,197; University of Wisconsin, Madison 634,250; New York University 622,051; University of California, Los Angeles 511,868; Science Applications International 476,512; University of New Hampshire 364,135; Harvard University 273,281; University of Washington 257,947; Georgia Institute of Technology 255,028; University of Texas, Austin 240,899; George Washington University 239,586; University of Colorado 227,491; 38 Others 3,224,218.

Figure 4. NERSC usage by scientific discipline, 2007: Fusion Energy 24.2%, Materials Sciences 15.9%, Chemistry 15.4%, Climate Science 9.9%, Astrophysics 7.8%, Accelerator Physics 6.9%, Lattice QCD 6.8%, Life Sciences 5.5%, Nuclear Physics 3.7%, Geosciences 1.5%, Mathematics 1.2%, Computer Science 0.4%, Engineering 0.3%, Environmental Sciences 0.3%, High Energy Physics 0.1%.

Appendix C NERSC Users Group Executive Committee

Office of Advanced Scientific Computing Research
Kirk Cameron, Virginia Polytechnic Institute and State University
Mike Lijewski, Lawrence Berkeley National Laboratory
Ravi Samtaney, Princeton Plasma Physics Laboratory

Office of Basic Energy Sciences
Bas Braams, Emory University
Eric Bylaska, Pacific Northwest National Laboratory
Thomas Miller, University of California, Berkeley

Office of Biological and Environmental Research
David Beck, University of Washington
Brian Hingerty, Oak Ridge National Laboratory
Adrianne Middleton, National Center for Atmospheric Research

Office of Fusion Energy Sciences
Andris Dimits, Lawrence Livermore National Laboratory
Stephane Ethier (Vice Chair), Princeton Plasma Physics Laboratory
Alex Friedman,* Lawrence Livermore and Lawrence Berkeley National Laboratories
Jean-Luc Vay,** Lawrence Berkeley National Laboratory

Office of High Energy Physics
Olga Barranikova,* University of Illinois at Chicago
Julian Borrill,** Lawrence Berkeley National Laboratory
Cameron Geddes,** Lawrence Berkeley National Laboratory
Warren Mori,* University of California, Los Angeles
Frank Tsung, University of California, Los Angeles

Office of Nuclear Physics
David Bruhwiler,** Tech-X Corporation
David Dean (Chair),* Oak Ridge National Laboratory
Patrick Decowski,* Lawrence Berkeley National Laboratory
Peter Messmer,** Tech-X Corporation
James Vary, Iowa State University

Members at Large
Yuen-Dat Chan,* Lawrence Berkeley National Laboratory
Angus Macnab,** Woodruff Scientific, LLC
Ned Patton,** National Center for Atmospheric Research
Gerald Potter, Lawrence Livermore National Laboratory
Douglas Swesty,* State University of New York at Stony Brook
Xingfu Wu, Texas A&M University

* Outgoing members
** Incoming members

Appendix D Office of Advanced Scientific Computing Research

The primary mission of the Advanced Scientific Computing Research (ASCR) program is to discover, develop, and deploy the computational and networking tools that enable researchers in the scientific disciplines to analyze, model, simulate, and predict complex phenomena important to the Department of Energy. To accomplish this mission, the program fosters and supports fundamental research in advanced scientific computing—applied mathematics, computer science, and networking—and operates supercomputer, networking, and related facilities. In fulfilling this primary mission, the ASCR program supports the Office of Science Strategic Plan's goal of providing extraordinary tools for extraordinary science as well as building the foundation for the research in support of the other goals of the strategic plan. In the course of accomplishing this mission, the research programs of ASCR have played a critical role in the evolution of high performance computing and networks. Berkeley Lab thanks the program managers with direct responsibility for the NERSC program and the research projects described in this report:

Michael R. Strayer, Associate Director, ASCR

Facilities Division and Computational Science Research and Partnerships (SciDAC) Division:
Michael R. Strayer, Acting Division Director
Fred Johnson, Acting Division Director
Melea Baker, Administrative Specialist
Daniel Hitchcock, Senior Advisor
Barbara Helland, Senior Advisor
Walter Polansky, Senior Scientific Advisor
Sally McPherson, Program Support Specialist
Amy Clark, Program Support Specialist
Julie Scott, Financial Management Specialist
Teresa Beachley, Program Support Assistant
Vincent Dattoria, General Engineer
Robert Lindsay, Computer Scientist
Yukiko Sekine, Computer Scientist
George Seweryniak, Computer Scientist
Osni Marques, Computer Scientist
Christine Chalk, Physical Scientist
Lali Chatterjee, Physical Scientist
Thomas Ndousse-Fetter, Program Manager
Steven Lee, Mathematician, Detailee
Sandy Landsberg, Mathematician
Bill Spotz, Mathematician, IPA
Susan Turnbull, Detailee
Betsy Riley, Detailee

The Advanced Scientific Computing Advisory Committee (ASCAC) provides valuable, independent advice to the Department of Energy on a variety of complex scientific and technical issues related to its Advanced Scientific Computing Research program. ASCAC's recommendations include advice on long-range plans, priorities, and strategies to address more effectively the scientific aspects of advanced scientific computing including the relationship of advanced scientific computing to other scientific disciplines, and maintaining appropriate balance among elements of the program. The Committee formally reports to the Director, Office of Science. The Committee primarily includes representatives of universities, national laboratories, and industries involved in advanced computing research. Particular attention is paid to obtaining a diverse membership with a balance among scientific disciplines, institutions, and geographic regions.

Jill P. Dahlburg, Chair, Naval Research Laboratory
James J. Hack, Oak Ridge National Laboratory

Robert G. Voigt, Co-Chair, College of William and Mary
Thomas A. Manteuffel, University of Colorado at Boulder

F. Ronald Bailey, NASA Ames Research Center (retired)
Horst D. Simon, Lawrence Berkeley National Laboratory

Gordon Bell, Microsoft Bay Area Research Center
Ellen B. Stechel, Sandia National Laboratories
Marsha Berger, Courant Institute of Mathematical Sciences
Rick L. Stevens, Argonne National Laboratory
David J. Galas, Battelle Memorial Institute
Virginia Torczon, College of William and Mary
Roscoe C. Giles, Boston University
Thomas Zacharia, Oak Ridge National Laboratory

Appendix F Acronyms and Abbreviations

ASC  Advanced Simulation and Computing (DOE)
ASCR  Office of Advanced Scientific Computing Research (DOE)
ASIC  Application-specific integrated circuit
BER  Office of Biological and Environmental Research (DOE)
BES  Office of Basic Energy Sciences (DOE)
BWC  Berkeley Water Center
CCSM  Community Climate System Model
CHiMES  Coupled High-Resolution Modeling of the Earth System
CIRES  Cooperative Institute for Research in Environmental Sciences
CLE  Cray Linux Environment
CPU  Central processing unit
CRD  Computational Research Division, Lawrence Berkeley National Laboratory
CSCS  Swiss National Supercomputing Centre
DOE  U.S. Department of Energy
DT  Deuterium-tritium
ESG  Earth Systems Grid
FES  Office of Fusion Energy Sciences (DOE)
FMO  Fenna-Matthews-Olson (photosynthetic protein)
FPGA  Field programmable gate array
GB  Gigabyte
GFDL  Geophysical Fluid Dynamics Laboratory (NOAA)
GPU  Graphics processing unit
HECRTF  High-End Computing Revitalization Task Force
HEP  Office of High Energy Physics (DOE)
hPa  Hectopascals
HPC  High performance computing
HPSS  High Performance Storage System
ICF  Inertial confinement fusion
IEEE  Institute of Electrical and Electronics Engineers
INCITE  Innovative and Novel Computational Impact on Theory and Experiment (DOE)
I/O  Input/output
IPCC  Intergovernmental Panel on Climate Change
IPM  Integrated Performance Monitoring
ITER  A multinational tokamak experiment to be built in France (Latin for "the way")
JDEM  Joint Dark Energy Mission
JGI  Joint Genome Institute (DOE)
KamLAND  Kamioka Liquid Scintillator Anti-Neutrino Detector
KRFG  Korea Research Foundation Grant
LBNL  Lawrence Berkeley National Laboratory
LED  Light-emitting diode
LHC  Large Hadron Collider
LHC2  Light-harvesting complex 2
LLNL  Lawrence Livermore National Laboratory
MB  Megabyte
MIBRS  Miller Institute for Basic Research in Science
MSCF/EMSL  Molecular Science Computing Facility at the Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory
MW  Megawatt
NCAR  National Center for Atmospheric Research
NCCS  National Center for Computational Sciences at Oak Ridge National Laboratory
NCEP  National Centers for Environmental Prediction
NERSC  National Energy Research Scientific Computing Center
NIF  National Ignition Facility
NIM  NERSC Information Management system
NIST  National Institute of Standards and Technology
NOAA  National Oceanographic and Atmospheric Administration
NP  Office of Nuclear Physics (DOE)
NSF  National Science Foundation
ORNL  Oak Ridge National Laboratory
OSG  Open Science Grid
PB  Petabyte
PCM  Parallel Climate Model
PDA  Personal digital assistant
PDSF  Parallel Distributed Systems Facility (NERSC)
PDSI  Petascale Data Storage Institute (SciDAC)
PEM  Polymer electrolyte membrane
Petaflops  Quadrillions of floating point operations per second
PEtot  Parallel Total Energy code
Pflops  Petaflops
PI  Principal investigator
PIC  Particle-in-cell
PNNL  Pacific Northwest National Laboratory
RAMP  Research Accelerator for Multiple Processors
RDBMS  Relational database management system
SC  Office of Science (DOE)
SciDAC  Scientific Discovery through Advanced Computing (DOE)
SDM  Scientific Data Management Center (SciDAC)
SDSC  San Diego Supercomputer Center
SIAM  Society for Industrial and Applied Mathematics
SLP  Sea level pressure
SNAP  SuperNova Acceleration Probe
Teraflops  Trillions of floating point operations per second
UC  University of California
UCLA  University of California, Los Angeles
UPC  Unified Parallel C
VASP  Vienna Ab-Initio Simulation Package
VFF  Valence force field
WRF  Weather Research and Forecast

For more information about NERSC, contact:
Jon Bashor
NERSC Communications
Berkeley Lab, MS 50B4230
1 Cyclotron Road
Berkeley, CA 94720-8148
email: [email protected]
phone: (510) 486-5849
fax: (510) 486-4300

NERSC's web site: http://www.nersc.gov/

Published by the Berkeley Lab Creative Services Office in collaboration with NERSC researchers and staff. JO15600

Editor and principal writer: John Hules. Contributing writers: Jon Bashor, Ucilia Wang, Lynn Yarris, and Paul Preuss (Berkeley Lab); Warren Froelich and Jan Zverina (San Diego Supercomputer Center).

DISCLAIMER

This document was prepared as an account of work sponsored by the United States Government. While this document is believed to contain correct information, neither the United States Government nor any agency thereof, nor The Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or The Regents of the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof, or The Regents of the University of California.

Ernest Orlando Lawrence Berkeley National Laboratory is an equal opportunity employer.