NERSC 2010 Annual Report

National Energy Research Scientific Computing Center

2010 Annual Report

Ernest Orlando Lawrence Berkeley National Laboratory 1 Cyclotron Road, Berkeley, CA 94720-8148

This work was supported by the Director, Office of Science, Office of Advanced Scientific Computing Research of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.


Table of Contents

1 The Year in Perspective

5 Research News

6 Overlooked Alternatives: New research restores photoisomerization and thermoelectrics to the roster of promising alternative energy sources

12 A Goldilocks Catalyst: Calculations show why a nanocluster may be “just right” for recycling carbon dioxide by converting it to methanol

16 Down with Carbon Dioxide: Detailed computational models help predict the performance of underground carbon sequestration projects

22 Touching Chaos: Simulations reveal mechanism behind a plasma instability that can damage fusion reactors

26 Winds of Climate Change: Cyclones in the warm Pliocene era may have implications for future climate

32 3D Is the Key: Simulated supernova explosions happen easier and faster when the third dimension is added

36 Surface Mineral: Innovative simulations reveal that Earth’s silica is predominantly superficial

40 Soggy Origami: Nanotech research on graphene is blossoming at NERSC

44 An Antimatter Hypernucleus: Discovery opens the door to new dimensions of antimatter

48 Dynameomics: A new database helps scientists discover how proteins do the work of life

52 NISE Program Encourages Innovative Research

58 NERSC Users’ Awards and Honors

61 The NERSC Center

62 Hopper Powers Petascale Science

63 Other New Systems and Upgrades
63 Carver and Magellan Clusters
63 Euclid Analytics Server and Dirac GPU Testbed
64 NERSC Global Filesystem and HPSS Tape Archive Upgrades

65 Innovations to Increase NERSC’s Energy Efficiency
65 Carver Installation Saves Energy, Space, Water
65 Monitoring Energy Across the Center to Improve Efficiency

66 Improving HPC System Usability with External Services

68 Providing New Computational Models
68 Magellan Cloud Computing Testbed and Evaluation
69 Developing Best Practices in Multicore Programming
70 Computational Science and Engineering Petascale Initiative
71 GPU Testbed and Evaluation

72 Improving User Productivity
73 Expanding the Science Gateway Infrastructure
73 Speeding Up Turnaround Time for the Palomar Transient Factory
74 NERSC and JGI Join Forces for Genomics Computing
76 Improving VisIt Performance for AMR Applications
76 Optimizing HPSS Transfers
77 Uncovering Job Failures at Scale
77 Improving Network Transfer Rates
78 Software Support
78 Maintaining User Satisfaction

79 Leadership Changes

80 Outreach and Training

82 Research and Development by NERSC Staff

89 Appendices

89 Appendix A: NERSC Policy Board
90 Appendix B: NERSC Client Statistics
92 Appendix C: NERSC Users Group Executive Committee
93 Appendix D: Office of Advanced Scientific Computing Research
95 Appendix E: Advanced Scientific Computing Advisory Committee
96 Appendix F: Acronyms and Abbreviations

The Year in Perspective

Even while NERSC is settling into the petascale era, we are preparing for the exascale era.

For both the NERSC user community and the center staff, 2010 was truly a transformational year, with an unprecedented number of new systems and upgrades, highlighted by NERSC’s entry into petascale computing. As the primary computing center for the Department of Energy’s Office of Science (DOE SC), our mission is to anticipate and support the computational science needs—in both systems and services—of DOE SC’s six constituent offices. The science we support ranges from basic to mission-focused, from fundamental breakthroughs to practical applications. DOE’s science mission currently includes a renewed emphasis on new energy sources, energy production and storage, and understanding the impacts of energy use on both regional and global climate systems, along with mitigation of those impacts through techniques such as carbon sequestration. A number of major improvements during the past year helped NERSC provide the critical support to meet these research needs.

The most notable improvement was the two-phase installation of Hopper, our newest supercomputer. Hopper Phase 1, a Cray XT5, was installed in October 2009 and entered production use in March 2010. Even with dedicated testing and maintenance times, utilization of Hopper-1 from December 15 to March 1 reached 90%, and significant scientific results were produced even during the testing period.

Hopper Phase 2, a petascale Cray XE6, was fully installed in September 2010 and opened for early use and testing late in the year. NERSC users responded in their usual fashion to this opportunity to run their codes at higher scale while at the same time helping us ensure the machine could meet the growing demands of our user community. Hopper’s potential was evidenced by the 1.05 petaflop/s performance the system achieved running Linpack, making Hopper No. 5 on the November 2010 edition of the TOP500 list of the world’s fastest computers. More importantly, the machine performs representative DOE applications even faster than anticipated due to the low-overhead Cray Gemini network and the fast shared memory communication within the AMD compute nodes.

Hopper represents an important step in a long-term trend of massive growth in on-chip parallelism and a shift away from a pure message passing (MPI) programming model. Hopper has 24-core nodes, in contrast to the dual-core nodes in the original Franklin system, a Cray XT4 system installed only three years earlier.

While the number of cores per node grew by an order of magnitude, the memory per node did not. As a result, it is important to share data structures as much as possible within nodes using some kind of shared memory programming. NERSC is taking a proactive role in evaluating new models and guiding the transition of the user community. In collaboration with Cray, Berkeley Lab’s Computational Research Division, and representative users, NERSC has established a Center of Excellence for Multicore Programming to research programming models for Hopper and beyond. MPI alone can exhaust the available memory on Hopper, but initial research shows that a hybrid MPI + OpenMP model saves significant memory and achieves faster runtime for some applications.
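The memory argument for the hybrid model can be illustrated with a small sketch. This is illustrative code, not NERSC’s or the Center of Excellence’s; the table size, the build command, and the run configuration are assumptions chosen only for the example. The idea is that one MPI rank per node owns a single copy of a large read-only array and the OpenMP threads on that node share it, so message passing is kept between nodes while per-core data duplication is avoided.

/* Illustrative hybrid MPI + OpenMP sketch (not NERSC code).
 * Build with an MPI compiler wrapper plus the OpenMP flag, for example:
 *   mpicc -fopenmp hybrid.c -o hybrid   (wrapper names and flags vary by system)
 * Run with one MPI rank per node and OMP_NUM_THREADS set to the cores per node. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE (1 << 24)   /* ~16 million doubles: one copy per rank, i.e., per node */

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Request an MPI library that tolerates threaded callers. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* The large data structure is allocated once per rank, not once per core. */
    double *table = malloc(TABLE_SIZE * sizeof(double));
    for (long i = 0; i < TABLE_SIZE; i++)
        table[i] = (double)i;

    /* All cores on the node work on the shared table through OpenMP threads. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (long i = 0; i < TABLE_SIZE; i++)
        local += table[i] * 1.0e-9;

    /* Only the combination across nodes goes through MPI. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d result=%f\n",
               nranks, omp_get_max_threads(), global);

    free(table);
    MPI_Finalize();
    return 0;
}

Launched with one rank per 24-core node, this layout keeps one copy of the table per node instead of 24, which is the kind of memory saving, on top of any speedup, that the hybrid experiments described above are after.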

Other new systems and upgrades in 2010 included installation of the Carver and Magellan clusters, the Euclid analytics server, and the Dirac GPU testbed; and upgrades of the NERSC Global Filesystem, the tape archive, and the power supply to the Oakland Scientific Facility. All of these systems and upgrades are described in the NERSC Center section of this report. But it’s worth noting here that these installations required major efforts not only from the Systems Department staff, who planned and coordinated them, but also from the Services Department staff, who are responsible for software installation and user training. The challenge in deploying innovative, maximum-scale systems is always in making them available, usable, and failure tolerant. With their cross-group teamwork, resourcefulness, and commitment, NERSC’s staff excels at meeting those challenges.

In the broader computing community, interest in and use of cloud computing grew significantly. NERSC used the Magellan system as a testbed to evaluate the viability of cloud computing to meet a portion of DOE’s scientific computing demands for small-scale parallel jobs. The project generated a great deal of interest in the community, judging by the number of media inquiries, speaking invitations, and industry attention Magellan has generated. As part of a joint effort with Argonne National Laboratory and the Computational Research Division, NERSC is using Magellan to evaluate the costs, benefits, capabilities, and downsides of cloud computing. All of the cost advantages of centralization exist for scientific users as well as businesses using commercial clouds, but science applications require different types of systems and scheduling. The LBNL research showed that even modest scale scientific workloads slow down significantly—as much as a factor of 50—on standard cloud configurations, and the slowdowns are worse with more communication-intensive problems and as the parallelism within a job scales up. Even ignoring these performance concerns, the cost of commercial clouds is prohibitive—up to 20 cents per core hour for clouds configured with high speed networking and parallel job schedulers— compared to a total NERSC cost of around 2 cents per core hour, including the cost of NERSC’s scientific consulting services. Still, the cloud computing analysis also revealed some important advantages of the commercial setting in the ability to run tailored software stacks and to obtain computing on demand with very low wait times.
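To make the cost difference concrete, here is a rough, illustrative calculation; the one-million-core-hour workload is an assumed example, not a figure from the Magellan study:

\[
10^{6}\ \text{core-hours} \times \$0.02/\text{core-hour} = \$20{,}000
\qquad\text{versus}\qquad
10^{6}\ \text{core-hours} \times \$0.20/\text{core-hour} = \$200{,}000 .
\]

And if a communication-intensive job also runs many times slower in a standard cloud configuration, the core-hours themselves grow, multiplying that tenfold gap still further.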

Magellan also captured the attention of DOE’s Joint Genome Institute (JGI), located 20 miles away in Walnut Creek, California. In March 2010, JGI became an early user of Magellan when the genomics center had a sudden need for increased computing resources. In less than three days, NERSC and JGI staff provisioned and configured a 120-node cluster in Magellan to match the computing environment available on JGI’s local compute clusters, where the data resides. At the same time, staff at both centers collaborated with ESnet network engineers to deploy—in just 24 hours—a dedicated 9 Gbps virtual circuit between Magellan and JGI over ESnet’s Science Data Network (SDN). In April, NERSC and JGI formalized their partnership, with NERSC managing JGI’s six-person systems staff to integrate, operate, and support systems at both NERSC and JGI. This strategy gives JGI researchers around the world increased computational capacity without any change to their software or workflow.

NERSC’s focus on scientific productivity led us to expand the number of Science Gateways, which make massive scientific data sets and high performance computing resources available through the web. The NERSC Web Toolkit (NEWT) enables anyone familiar with HTML and Javascript to develop a Science Gateway. The current roster of active gateways includes the DeepSky astronomical image database, the Gauge Connection for lattice QCD data, the Coherent X-Ray Imaging Data Bank, the 20th Century Reanalysis Project climate database, the Daya Bay Neutrino Experiment, and the Earth System Grid climate data gateway. Science Gateways are extending the impact of computational science to new scientific domains, such as experimental communities who did not previously have easy access to HPC resources and data. This kind of access also helps make scientific conclusions more robust by making it easier to replicate analyses and reproduce results.

Even while NERSC is settling into the petascale era, we are preparing for the exascale era, with the understanding that exascale is all about energy-efficient computing. An exaflop computer built with today’s standard technology would require 3 gigawatts of electricity; and even accounting for Moore’s Law increases in chip density, an exaflop computer would still need over 100 megawatts of power, which is simply not affordable. Twenty megawatts is a more affordable target, but achieving that goal will require major innovations in computer architecture to design for low power, which will change the algorithms and programming techniques needed for exascale systems.
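Stated per operation, and as a back-of-the-envelope restatement of the numbers above rather than a figure from the report, the targets are:

\[
\frac{3\ \text{GW}}{10^{18}\ \text{flop/s}} = 3\ \text{nJ/flop}, \qquad
\frac{100\ \text{MW}}{10^{18}\ \text{flop/s}} = 100\ \text{pJ/flop}, \qquad
\frac{20\ \text{MW}}{10^{18}\ \text{flop/s}} = 20\ \text{pJ/flop} .
\]

Reaching the 20-megawatt goal therefore means roughly a 150-fold improvement in energy per operation over today’s technology, and about a 5-fold improvement beyond what density scaling alone is projected to deliver.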

NERSC’s research into improving performance on advanced computing architectures includes the Dirac GPU testbed— another collaboration with Berkeley Lab’s Computational Research Division. Each of Dirac’s 48 nodes is composed of two CPUs and one GPU. GPUs can offer energy-efficient and cost-effective performance boosts to traditional processors. The question that Dirac will help to answer is whether GPUs can do this for a broad scientific workload or for a more limited class of computations. About 100 users are running codes on Dirac, and it seems to be particularly popular with postdocs in the SciDAC-e program, which is improving and developing new scientific algorithms and software for energy-related science.

Several innovations to increase NERSC’s energy efficiency were implemented in 2010. NERSC staff configured the installation of the IBM iDataplex system Carver (of which Dirac is a sub-cluster) in a manner so efficient that it reduces cooling costs by 50 percent, and in some places the cluster actually cools the air around it. NERSC has also instrumented its machine room with state-of-the-art wireless monitoring technology that gathers information on variables important to machine room operation, including air temperature, pressure, and humidity. Responding quickly to accurate machine room data not only enhances cooling efficiency, it helps assure the reliability of center systems.

All of these activities enhanced NERSC’s focus of advancing scientific discovery. Over the course of the past year, our user community continued to grow, and NERSC supported over 3,200 users at universities, national labs, and other research organizations around the nation. With all the advanced technologies we deploy, it is still people who make the difference. I am grateful to our DOE SC sponsors for their continued endorsements, to our users for the science they produce using NERSC resources, and to the NERSC staff who make NERSC an excellent high performance computing center.

Katherine Yelick NERSC Division Director

Research News

With 4,294 users in 2010 from universities, national laboratories, and industry, NERSC supports the largest and most diverse research community of any computing facility within the DOE complex. In their annual allocation renewals, users reported 1,864 refereed publications (accepted or submitted) enabled at least in part by NERSC resources in 2010. These scientific accomplishments are too numerous to cover in this report, but several representative examples are discussed in this section.

Research related to energy and climate included developing a rechargeable heat battery, turning waste heat into electricity, converting carbon dioxide into fuel, resolving uncertainties about carbon sequestration, explaining plasma instability in fusion experiments, and looking to earth’s distant past to predict our future climate.

Other research accomplishments included new insight into the ignition of core-collapse supernovae, a surprising discovery about the composition of Earth’s lower mantle, controlling the shape and conductivity of graphene nanostructures, exploring a new region of strange antimatter, and developing an important community resource for biological research—a database of protein folding pathways.

Early results from a few NERSC Initiative for Scientific Exploration (NISE) projects are also included below, along with a list of awards earned by NERSC users.

Overlooked Alternatives

New research restores photoisomerization and thermoelectrics to the roster of promising alternative energy sources

Alternative energy research might sound like a search for new ideas, but sometimes it involves putting a new spin on old ideas that have been abandoned or neglected. That’s the case with two recent research projects of Jeffrey Grossman, an associate professor in the Department of Materials Science and Engineering at the Massachusetts Institute of Technology (MIT). Both projects involved collaborations with experimental groups at the University of California at Berkeley, where Grossman led a computational nanoscience research group before joining MIT in 2009, and Lawrence Berkeley National Laboratory.

“The focus of my research group at MIT is on the application and development of cutting-edge simulation tools to understand, predict, and design novel materials with applications in energy conversion, energy storage, thermal transport, surface phenomena, and synthesis,” Grossman says.

His recent collaborations involved two methods of energy conversion and storage that most people have never heard of:

1. Photoisomerization, a thermochemical approach to capturing and storing the sun’s energy which could be used to create a rechargeable heat battery.

2. Thermoelectrics, the conversion of temperature differences into an electric current, which could turn vast quantities of waste heat into usable energy.

A Rechargeable Heat Battery

Project: Quantum Simulations of Nanoscale Solar Cells
PI: Jeffrey Grossman, Massachusetts Institute of Technology
Senior Investigators: Joo-Hyoung Lee and Vardha Srinivasan, MIT, and Yosuke Kanai, LLNL
Funding: BES, NSF, LBNL, MIT, UCB
Computing Resources: NERSC, Teragrid, LLNL

Broadly speaking, there have been two approaches to capturing the sun’s energy: photovoltaics, which turn the sunlight into electricity, or solar-thermal systems, which concentrate the sun’s heat and use it to boil water to turn a turbine, or use the heat directly for hot water or home heating. But there is another approach whose potential was seen decades ago, but which was sidelined because nobody found a way to harness it in a practical and economical way.

This is the thermochemical approach, in which solar energy is captured in the configuration of certain molecules which can then release the energy on demand to produce usable heat. And unlike conventional solar-thermal systems, which require very effective insulation and even then gradually let the heat leak away, the heat-storing chemicals could remain stable for years.

Researchers explored this type of solar thermal fuel in the 1970s, but there were big challenges: Nobody could find a chemical that could reliably and reversibly switch between two states, absorbing sunlight to go into one state and then releasing heat when it reverted to the first state. Such a compound was finally discovered in 1996 by researchers from the University of California at Berkeley and Lawrence Berkeley National Laboratory; but the compound, called fulvalene diruthenium (tetracarbonyl), included ruthenium, a rare and expensive element, so it was impractical for widespread energy storage. Moreover, no one understood how fulvalene diruthenium worked, which hindered efforts to find a cheaper variant.

Recently Peter Vollhardt of UC Berkeley and Berkeley Lab’s Chemical Sciences division—the discoverer of the molecule—has begun construction of a working device as proof of principle. However, a basic understanding of what makes this particular molecule special was still missing. In order to tackle this question, Vollhardt teamed up with Grossman, who did quantum mechanical calculations at NERSC to find out exactly how the molecule accomplishes its energy storage and release.

Specifically, the theorists used the Quantum ESPRESSO code to perform density functional theory (DFT) calculations on 256 processors of the Franklin Cray XT4 supercomputer at NERSC, with additional computation at Lawrence Livermore National Laboratory.

The calculations at NERSC showed that the fulvalene diruthenium molecule, when it absorbs sunlight, undergoes a structural transformation called photoisomerization, putting the molecule into a higher-energy or charged state where it can remain stable indefinitely. Then, triggered by a small addition of heat or a catalyst, it snaps back to its original shape, releasing heat in the process (Figure 1). But the team found that the process is a bit more complicated than that.

Figure 1. A molecule of fulvalene diruthenium, seen in diagram, changes its configuration when it absorbs heat, and later releases heat when it snaps back to its original shape. This series shows the reaction pathway from the high-energy state back to the original state. Image: J. Grossman.

“It turns out there’s an intermediate step that plays a major role,” said Grossman. In this intermediate step, the molecule forms a semistable configuration partway between the two previously known states (Figure 2). “That was unexpected,” he said. The two-step process helps explain why the molecule is so stable, why the process is easily reversible, and also why substituting other elements for ruthenium has not worked so far.

Figure 2. DFT calculations revealed an unexpected second barrier along the reaction pathway from the charged state to the original state. The two-step process helps explain the molecule’s stability, and the relative barrier heights play a crucial role in its functionality.

In effect, explained Grossman, this makes it possible to produce a “rechargeable heat battery” that can repeatedly store and release heat gathered from sunlight or other sources. In principle, Grossman said, a fuel made from fulvalene diruthenium, when its stored heat is released, “can get as hot as 200 degrees C, plenty hot enough to heat your home, or even to run an engine to produce electricity.”

Compared to other approaches to solar energy, he said, “it takes many of the advantages of solar-thermal energy, but stores the heat in the form of a fuel. It’s reversible, and it’s stable over a long term. You can use it where you want, on demand. You could put the fuel in the sun, charge it up, then use the heat, and place the same fuel back in the sun to recharge.”

In addition to Grossman, the theoretical work was carried out by Yosuke Kanai of Lawrence Livermore National Laboratory and Varadharajan Srinivasan of MIT, and supported by complementary experiments by Steven Meier and Vollhardt of UC Berkeley. Their report on the work, which was funded in part by the National Science Foundation, by the Sustainable Products and Solutions Program at UC Berkeley, and by an MIT Energy Initiative seed grant, was published on October 20, 2010, in the journal Angewandte Chemie International Edition.1

The problem of ruthenium’s rarity and cost still remains as “a deal-breaker,” Grossman said, but now that the fundamental mechanism of how the molecule works is understood, it should be easier to find other materials that exhibit the same behavior. This molecule “is the wrong material, but it shows it can be done,” he said.

The next step, he said, is to use a combination of simulation, chemical intuition, and databases of tens of millions of known molecules to look for other candidates that have structural similarities and might exhibit the same behavior. “It’s my firm belief that as we understand what makes this material tick, we’ll find that there will be other materials” that will work the same way, Grossman said.

Grossman plans to collaborate with Daniel Nocera, a professor of energy and chemistry at MIT, to tackle such questions, applying the principles learned from this analysis in order to design new, inexpensive materials that exhibit this same reversible process. The tight coupling between computational materials design and experimental synthesis and validation, he said, should further accelerate the discovery of promising new candidate solar thermal fuels.

Soon after the Angewandte Chemie paper was published, Roman Boulatov, assistant professor of chemistry at the University of Illinois at Urbana-Champaign, said of this research that “its greatest accomplishment is to overcome significant challenges in quantum-chemical modeling of the reaction,” thus enabling the design of new types of molecules that could be used for energy storage. But he added that other challenges remain: “Two other critical questions would have to be solved by other means, however. One, how easy is it to synthesize the best candidates? Second, what is a possible catalyst to trigger the release of the stored energy?”

1 Y. Kanai, V. Srinivasan, S. K. Meier, K. P. C. Vollhardt, and J. C. Grossman, “Mechanism of thermal reversal of the (fulvalene)tetracarbonyldiruthenium photoisomerization: Toward molecular solar–thermal energy storage,” Angewandte Chemie Int. Ed. 49, 8926 (2010).

Recent work in Vollhardt’s laboratory has provided answers to both questions. Vollhardt and other Berkeley Lab experimentalists, including Rachel Segalman and Arun Majumdar (now director of the U.S. Department of Energy’s Advanced Research Projects Agency-Energy, or ARPA-E), have successfully constructed a microscale working device of fulvalene diruthenium and have applied for a U.S. patent. Using both theoretical and experimental results, they are working to maximize the device’s efficiency and demonstrate that thermochemical energy storage is workable in a real system.

Turning Waste Heat to Electricity

Automobiles, industrial facilities, and power plants all produce waste heat, and a lot of it. In a coal-fired power plant, for example, as much as 60 percent of the coal’s energy disappears into thin air. One estimate places the worldwide waste heat recovery market at one trillion dollars, with the potential to offset as much as 500 million metric tons of carbon per year.

One solution to waste heat has been an expensive boiler-turbine system; but for many companies and utilities, it does not recover enough waste heat to make economic sense. Bismuth telluride, a semiconductor material, has shown promise by employing a thermoelectric principle called the Seebeck effect to convert heat into an electric current, but it’s also problematic: toxic, scarce, and expensive, with limited efficiency.

What’s the solution? Junqiao Wu, a materials scientist at UC Berkeley and Berkeley Lab, believes the answer lies in finding a new material with spectacular thermoelectric properties that can efficiently and economically convert heat into electricity. (Thermoelectrics is the conversion of temperature differences in a solid—that is, differences in the amplitude of atomic vibration—into an electric current.)

Computations performed on NERSC’s Cray XT4, Franklin, by Wu’s collaborators Joo-Hyoung Lee and Grossman of MIT, showed that the introduction of oxygen impurities into a unique class of semiconductors known as highly mismatched alloys (HMAs) can substantially enhance the thermoelectric performance of these materials without the usual degradation in electric conductivity. The team published a paper on these results in the journal Physical Review Letters.2

2 Joo-Hyoung Lee, Junqiao Wu, and Jeffrey C. Grossman, “Enhancing the thermoelectric power factor with highly mismatched isoelectronic doping,” Physical Review Letters 104, 016602 (2010).

“We are predicting a range of inexpensive, abundant, non-toxic materials in which the band structure can be widely tuned for maximal thermoelectric efficiency,” says Wu. “Specifically, we’ve shown that the hybridization of electronic wave functions of alloy constituents in HMAs makes it possible to enhance thermopower without much reduction of electric conductivity, which is not the case for conventional thermoelectric materials.”

In 1821, the physicist Thomas Johann Seebeck observed that a temperature difference between two ends of a metal bar created an electrical current in between, with the voltage being directly proportional to the temperature difference. This phenomenon became known as the Seebeck thermoelectric effect, and it holds great promise for capturing and converting into electricity some of the vast amounts of heat now being lost in the turbine-driven production of electrical power. For this lost heat to be reclaimed, however, thermoelectric efficiency must be significantly improved.

“Good thermoelectric materials should have high thermopower, high electric conductivity, and low thermal conductivity,” says Wu. “Enhancement in thermoelectric performance can be achieved by reducing thermal conductivity through nanostructuring. However, increasing performance by increasing thermopower has proven difficult because an increase in thermopower has typically come at the cost of a decrease in electric conductivity.”

Wu knew from earlier research that a good thermoelectric material needed to have a certain type of density of states, which is a mathematical description of distribution of energy levels within a semiconductor. “If the density of states is flat or gradual, the material won’t have very good thermoelectric properties,” Wu says. “You need it to be spiky or peaky.”

Wu also knew that a specialized type of semiconductor called highly mismatched alloys (HMAs) could be very peaky because of their hybridization, the result of forcing together two materials that don’t want to mix atomically, akin to mixing water and oil. HMAs are formed from alloys that are highly mismatched in terms of electronegativity, which is a measurement of their ability to attract electrons. Their properties can be dramatically altered with only a small amount of doping. Wu hypothesized that mixing two semiconductors, zinc selenide and zinc oxide, into an HMA would produce a peaky density of states.

In their theoretical work, Lee, Wu, and Grossman discovered that this type of electronic structure engineering can be greatly beneficial for thermoelectricity. Working with the semiconductor zinc selenide, they simulated the introduction of two dilute concentrations of oxygen atoms (3.125 and 6.25 percent respectively) to create model HMAs (Figure 3). In both cases, the oxygen impurities were shown to induce peaks in the electronic density of states above the conduction band minimum. It was also shown that charge densities near the density of state peaks were substantially attracted toward the highly electronegative oxygen atoms (Figure 4).

Figure 3. Contour plots showing electronic density of states in HMAs created from zinc selenide by the addition of (a) 3.125 percent oxygen atoms, and (b) 6.25 percent oxygen. The zinc and selenium atoms are shown in light blue and orange. Oxygen atoms (dark blue) are surrounded by high electronic density regions. Image: J.-H. Lee, J. Wu, and J. Grossman.

Figure 4. The projected density of states (DOS) of ZnSe1–xOx with x = 3.125%. Image: J.-H. Lee, J. Wu, and J. Grossman.

The researchers found that for each of the simulation scenarios, the impurity-induced peaks in the electronic density of states resulted in a sharp increase of both thermopower and electric conductivity compared to oxygen-free zinc selenide. The increases were by factors of 30 and 180 respectively.

“Furthermore, this effect is found to be absent when the impurity electronegativity matches the host that it substitutes,” Wu says. “These results suggest that highly electronegativity-mismatched alloys can be designed for high performance thermoelectric applications.” Wu and his research group are now working to actually synthesize HMAs for physical testing in the laboratory.

In addition to capturing energy that is now being wasted, Wu believes that HMA-based thermoelectrics can also be used for solid state cooling, in which a thermoelectric device is used to cool other devices or materials. “Thermoelectric coolers have advantages over conventional refrigeration technology in that they have no moving parts, need little maintenance, and work at a much smaller spatial scale,” Wu says.

A Goldilocks Catalyst

Calculations show why a nanocluster may be “just right” for recycling carbon dioxide by converting it to methanol

Carbon dioxide (CO2) emissions from fossil fuel combustion are major contributors to global warming. Since CO2 comes from fuel, why can’t we recycle it back into fuel rather than releasing it into the atmosphere? An economical way to synthesize liquid fuel from CO2 would help mitigate global warming while reducing our need to burn fossil fuels. Catalysts that induce chemical reactions could be the key to recycling carbon dioxide, but they need to have just the right activity level—too much, and the wrong chemicals are produced; too little, and they don’t do their job at all.

Recycling CO2 into alcohols such as methanol (CH3OH) or ethanol (C2H5OH) by adding hydrogen would be sustainable if the hydrogen were produced by a renewable technology such as solar-thermal water splitting, which separates the hydrogen and oxygen in water. But synthesizing fuel by CO2 hydrogenation requires a catalyst, and the most successful catalysts so far have been rhodium compounds. With a spot price averaging more than $4000 an ounce over the last five years, rhodium is too expensive to be practical on a large scale.

Molybdenum (“moly”) sulfide is a more affordable catalyst, and experiments have identified it as a possible candidate for CO2 hydrogenation; but bulk moly disulfide (MoS2) only produces methane. Other metals can be added to the catalyst to produce alcohols, but even then, the output is too low to make the process commercially viable.

Nanoparticles, however, often have higher catalytic activity than bulk chemicals. Designing moly sulfide nanoparticles optimized for alcohol synthesis would depend on finding answers to basic questions about the chemical reaction mechanisms.

Project: Computational Studies of Supported Nanoparticles in Heterogeneous Catalysis
PI: Ping Liu, Brookhaven National Laboratory
Senior Investigators: YongMan Choi, BNL
Funding: BES, BNL
Computing Resources: NERSC, BNL

Many of those questions were answered in a paper featured on the cover of the March 25, 2010 special issue of the Journal of Physical Chemistry A devoted to “Green Chemistry in Energy Production” (Figure 5). In this paper, Brookhaven National Laboratory (BNL) chemist Ping Liu, with collaborators YongMan Choi, Yixiong Yang, and Michael G. White from BNL and the State University of New York (SUNY) Stony Brook, used density functional theory (DFT) calculations conducted at BNL and NERSC to investigate methanol synthesis using a moly sulfide nanocluster (Mo6S8) as the catalyst instead of the bulk moly sulfide powder used in previous experiments.3

Designing moly sulfide nanoparticles optimized for alcohol synthesis depends on finding answers to basic questions about the chemical reaction mechanisms.

“This study might seem like pure research, but it has significant implications for real catalysis,” Liu says. “If you can recycle CO2 and produce some useful fuels, that would be very significant. And that’s what DOE has always asked people to do, to find some new alternatives for future fuel that will eliminate the environmental footprint.”

The reason bulk moly disulfide only produces methane (CH4) is, ironically, that it is too active a catalyst. “Catalysts are very tricky materials,” Liu says. “You don’t want it too active, which will break the CO bond and only form methane, an undesired product for liquid fuel production. If the catalyst is too inert, it can’t dissociate CO2 or H2, and do the trick. So a good catalyst should have the right activity to do this catalysis.”

From DFT calculations run using the VASP code on 16 to 32 cores of NERSC’s Franklin system, Liu and colleagues discovered that the Mo6S8 nanocluster has catalytic properties very different from bulk MoS2. This is not surprising, given the different geometries of the molecules: the cluster has a more complex, cagelike structure (Figure 6). The cluster is not as reactive as the bulk—it cannot break the carbon–oxygen bond in CO2—and therefore it cannot produce methane. But, using a completely different reaction pathway (Figure 7), the cluster does have the ability to produce methanol.

“In this case, the nanocluster is not as active as the bulk in terms of bonding; but it has higher activity, or you could say higher selectivity, to synthesize methanol from CO2 and H2,” Liu explains.

Figure 7 shows the details of the reaction pathway from carbon dioxide and hydrogen to methanol and water, as catalyzed by the Mo6S8 cluster, including the intermediate states, energetics, and geometries of the transition states—all of which were revealed for the first time by this research.

This detailed understanding is important for optimizing the catalyst’s efficiency. For example, lowering the energy barrier in the CO to HCO reaction would allow increased production of methanol. Adding a C–C bond could result in synthesis of ethanol, which is usable in many existing engines. Both of these improvements might be achieved by adding another metal to the cluster, a possibility that Liu and her theorist colleagues are currently working on in collaboration with Mike White, an experimentalist.

The goal of catalyst research is to find a “Goldilocks” catalyst—one that does neither too much nor too little, but is “just right” for a particular application. By revealing why a specific catalyst enables a specific reaction, studies like this one provide the basic understanding needed to design the catalysts we need for sustainable energy production.

Figure 5. The top left, bottom left, and top right images illustrate aspects of methanol synthesis on a moly sulfide nanocluster. (The other two images are from different studies.) Image: © 2010 American Chemical Society.

Figure 6. Optimized edge surface of (a) MoS2 and (b) Mo6S8 cluster (Mo = cyan, S = yellow). Image: Liu et al., 2010.

3 Ping Liu, YongMan Choi, Yixiong Yang, and Michael G. White, “Methanol synthesis from H2 and CO2 on a Mo6S8 cluster: A density functional study,” J. Phys. Chem. A 114, 3888 (2010).


Figure 7. Optimized potential energy diagram for methanol synthesis from CO2 and H2 on a Mo6S8 cluster. Thin bars represent the reactants, products, and intermediates; thick bars stand for the transition states. The geometries of the transition states involved in the reaction are also included (Mo = cyan, S = yellow, C = gray, O = red, H = white). Image: Liu et al., 2010.

Down with Carbon Dioxide

Detailed computational models help predict the performance of underground carbon sequestration projects

Despite progress in clean energy, Americans will continue to rely on fossil fuels for years to come. In fact, coal-, oil- and natural gas-fired power plants will generate 69 percent of U.S. electricity as late as 2035, according to the U.S. Energy Information Administration.

Such sobering projections have sparked a wide range of proposals for keeping the resulting carbon dioxide (CO2) out of the atmosphere, where it traps heat and contributes to global warming. Berkeley Lab scientists are using computer simulations to evaluate one promising idea: Pump it into saltwater reservoirs deep underground.

Underground, or geologic, carbon sequestration “will be a key tool in reducing atmospheric CO2,” says George Pau, a Luis W. Alvarez Postdoctoral Fellow with Berkeley Lab’s Center for Computational Sciences and Engineering (CCSE). “By providing better characterizations of the processes involved, we can more accurately predict the performance of carbon sequestration projects, including the storage capacity and long-term security of a potential site.”

Taking advantage of NERSC’s massively parallel computing capacity, CCSE researchers led by John Bell have created the most detailed models yet of the mixing processes that occur at the interface of the sequestered carbon dioxide and brine.4 These simulations—including the first-ever three-dimensional ones—will help scientists better predict the success of this kind of sequestration project.

Selecting a Site

Project: Simulation and Analysis of Reacting Flows
PI: John B. Bell, Lawrence Berkeley National Laboratory
Senior Investigators: Ann S. Almgren, Marcus S. Day, LBNL
Funding: ASCR, BES
Computing Resources: NERSC, LBNL

4 G. S. H. Pau, J. B. Bell, K. Pruess, A. S. Almgren, M. J. Lijewski, and K. Zhang, “High resolution simulation and characterization of density-driven flow in CO2 storage in saline aquifers,” Advances in Water Resources 33, 443 (2010).

Carbon sequestration ideas run a wide gamut, from mixing CO2 into cement to pumping it into the deepest ocean waters. However, geologic sequestration projects are promising enough that some are already under way (Figure 8). In this type of sequestration, CO2 from natural gas-, oil-, or coal-fired power plants is pressure-injected deep into stable geologic formations.

Figure 8. Geologic sequestration in saline aquifers (3) is shown in this illustration alongside other geologic sequestration ideas: (1) depleted oil and gas reservoirs; (2) use of CO2 in enhanced oil recovery; (3) deep unused saline water-saturated reservoir rocks; (4) deep unmineable coal seams; (5) use of CO2 in enhanced coal bed methane recovery; (6) other suggested options (basalts, oil shales, cavities). Image: Australian Cooperative Research Centre for Greenhouse Gas Technologies (CO2CRC).

“The natural first picks [for these sites] are depleted oil and gas reservoirs,” says Karsten Pruess, a hydrologist with Berkeley Lab’s Earth Sciences Division and a co-author of the study.

However, the problem with this “put-it-back-where-it-comes-from” notion is that combustion generates about three times more CO2, by volume, than the fossil fuels burned. “So, even if all these old reservoirs were good targets—and not all are—the capacity available isn’t nearly enough,” Pruess concludes.

Saline aquifers offer a promising alternative. The brine that permeates these porous rock formations is unusable for crops or drinking, but given enough time, it can lock away some CO2 in the form of a weak acid. Meanwhile, the solid-rock lid that prevents saline from welling up to the surface (the caprock) also traps injected CO2 underground.


According to Pruess, when CO2 is first injected into an aquifer, it forms a separate layer, a bubble of sorts, above the saline. Over time, some CO2 will diffuse into the brine, causing the brine density to increase slightly. As more of the CO2 diffuses into the water, the resulting layer of carbonic acid (now less buoyant than the water below it) sinks. This churns up fresh brine, and a convective process kicks in which dramatically increases the rate of CO2 uptake, compared to diffusion alone (Figures 9 and 10). This process can take hundreds, even thousands of years, but scientists haven’t fully quantified the factors that influence and trigger convection.

“The processes by which CO2 dissolves into the brine are of serious consequence,” says Pau. “If you can determine the rate this will occur, then you can more accurately determine how much CO2 can be stored in a given aquifer.”

Because scientists can’t wait hundreds of years for answers, they are using NERSC supercomputers to run mathematical models that simulate the processes involved.

3D simulations of carbon sequestration are a first step toward predicting how much CO2 an aquifer can hold.

Geological models that break the problem into fixed-size pieces cannot reveal phenomena needed to understand sequestration because they have either insufficient detail or require too much computation to be practical. “About 100 meters resolution is the best we can do,” Pruess says. Because convection effects start at scales smaller than a millimeter, they are typically invisible to these geological models.

Cracking the Sequestration Code

That’s why Berkeley Lab’s CCSE and Pruess collaborated to develop a new simulation code. The code combines a computing technique called adaptive mesh refinement (AMR) with high performance parallel computing to create high-resolution simulations. The team’s simulations were performed using 2 million processor-hours and running on up to 2,048 cores simultaneously on NERSC’s Cray XT4 system, Franklin.

Called Porous Media AMR, or PMAMR, the code concentrates computing power on more active areas of a simulation by breaking them into finer parts (higher resolution) while calculating less active, less critical portions in coarser parts (lower resolution). The size of the chunks, and thus the resolution, automatically shifts as the model changes, ensuring that the most active, critical processes are also the most highly resolved. This ability to automatically shift focus makes AMR a powerful tool for modeling phenomena that start small and grow over time, says Pruess.

The first PMAMR models were of modest size, on the scale of meters. But the eventual goal is to be able to use the characteristics of an aquifer—such as the size of the pores in the sandstone, the amount of salts in the water, or the composition of the reservoir rocks—to model a given site and predict how much CO2 it can accommodate.
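The refinement step at the heart of the adaptive approach described above can be sketched in a deliberately simplified form. This is not the PMAMR code; the grid size, the CO2 concentration field, and the refinement threshold are assumptions made only for illustration. The sketch flags coarse cells where the concentration changes steeply between neighbors, which is where a real AMR framework would lay down finer, nested grids.

/* Simplified illustration of AMR-style refinement flagging (not the PMAMR code).
 * Compile with a C compiler and the math library, e.g.:  cc amr_sketch.c -lm  */
#include <math.h>
#include <stdio.h>

#define NX 64
#define NY 64

/* Flag cells whose CO2 concentration differs sharply from a neighbor's. */
void flag_for_refinement(double conc[NX][NY], int flag[NX][NY], double threshold)
{
    for (int i = 0; i < NX; i++) {
        for (int j = 0; j < NY; j++) {
            double grad = 0.0;
            if (i > 0)      grad = fmax(grad, fabs(conc[i][j] - conc[i-1][j]));
            if (i < NX - 1) grad = fmax(grad, fabs(conc[i][j] - conc[i+1][j]));
            if (j > 0)      grad = fmax(grad, fabs(conc[i][j] - conc[i][j-1]));
            if (j < NY - 1) grad = fmax(grad, fabs(conc[i][j] - conc[i][j+1]));
            /* Steep interfaces (e.g., sinking fingers of CO2-rich brine) get
             * fine resolution; smooth, quiescent regions stay coarse. */
            flag[i][j] = (grad > threshold);
        }
    }
}

int main(void)
{
    static double conc[NX][NY];
    static int flag[NX][NY];

    /* Toy initial condition: a CO2-rich layer at the top of the domain. */
    for (int i = 0; i < NX; i++)
        for (int j = 0; j < NY; j++)
            conc[i][j] = (j > NY - 8) ? 1.0 : 0.0;

    flag_for_refinement(conc, flag, 0.5);

    int refined = 0;
    for (int i = 0; i < NX; i++)
        for (int j = 0; j < NY; j++)
            refined += flag[i][j];
    printf("cells flagged for refinement: %d of %d\n", refined, NX * NY);
    return 0;
}

In a production framework this flagging would be repeated as the solution evolves, so the fine grids follow the growing convective fingers while the rest of the aquifer stays coarse, which is what keeps the cost of high-resolution runs manageable.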

Figure 9. As CO2 diffuses into the brine, the top, more concentrated layers (indicated in red) begin to sink, triggering a convective process that churns up fresh brine and speeds the salt water’s CO2 uptake. The arrows in the frame at the right indicate the direction of flow. Understanding convection onset is critical to understanding how much CO2 a given aquifer can store. Image: George Pau.

PMAMR simulations might also be used to help geologists more accurately track and predict the migration of hazardous wastes underground or to recover more oil from existing wells. And the team’s three-dimensional simulations could help scientists uncover details not apparent in two-dimensional models.

The next challenge the team faces involves extending and integrating their models with large-scale models. “If you look how much CO2 is made by a large coal-fired power plant, and you want to inject all that underground for 30 years, you end up with a CO2 plume on the order of 10 km in dimension,” says Pruess. This free CO2 will continue to migrate outward from the injection site and, if the aquifer has a slope, could eventually reach drinking water aquifers or escape at outcrops hundreds of kilometers away. “We are very interested in the long-term fate of this thing,” says Pruess.


Figure 10. These three-dimensional simulation snapshots show the interface between sequestered CO2 and brine over time. The sequence, starting at the upper left frame and proceeding counterclockwise, shows a dramatic evolution in the size of the convection cells with time. Image: George Pau.

Touching Chaos

Simulations reveal mechanism behind a plasma instability that can damage fusion reactors

If humans could harness nuclear fusion, the process that powers stars like our Sun, the world could have an inexhaustible energy source. In theory, scientists could produce a steady stream of fusion energy on Earth by heating up two types of hydrogen atoms—deuterium and tritium—to more than 100 million degrees centigrade until they become a gaseous stew of electrically charged particles, called plasma; then using powerful magnets to confine these particles until they fuse together, releasing energy in the process.

Although researchers have achieved magnetic fusion in experiments, they still do not understand the behavior of plasma well enough to generate a sustainable flow of energy. They hope the international ITER experiment, currently under construction in southern France, will provide the necessary insights to make fusion an economically viable alternative energy source. ITER, an industrial-scale reactor, will attempt to create 500 megawatts of fusion power for several-minute stretches. Much current research is focused on answering fundamental questions that will allow engineers to optimize ITER’s design.

The DOE’s Scientific Discovery through Advanced Computing (SciDAC) Center for Extended Magnetohydrodynamic (MHD) Modeling, led by Steve Jardin of the Princeton Plasma Physics Laboratory, has played a vital role in developing computer codes for fusion modeling. Recently, the collaboration created an extension to the Multilevel 3D (M3D) computer code that allows researchers to simulate what happens when charged particles are ejected from a hot plasma stew and splatter on the walls of a tokamak device, the doughnut-shaped “pot” used to magnetically contain plasmas. According to Linda Sugiyama of the Massachusetts Institute of Technology, who led the code extension development, these simulations are vital for ensuring the safety and success of upcoming magnetic confinement fusion experiments like ITER.

Project: 3D Extended MHD Simulation of Fusion Plasmas
PI: Stephen Jardin, Princeton Plasma Physics Laboratory
Senior Investigators: Joshua Breslau, Jin Chen, Guoyong Fu, Wonchull Park, PPPL; Henry R. Strauss, New York University; Linda Sugiyama, Massachusetts Institute of Technology
Funding: FES, SciDAC
Computing Resources: NERSC

Modeling Fusion to Ensure Safety

Because hot plasma is extremely dangerous, it is imperative that researchers understand plasma instability so that they can properly confine it. For fusion to occur, charged particles must be heated to more than 100 million degrees centigrade. At these searing temperatures, the material is too hot to contain with most earthly materials. In tokamaks, the plasma is shaped into a torus, or doughnut shape, and held in place within a vacuum by powerful magnetic fields, so that the plasma does not touch the surface of the containment vessel.

Tokamak plasmas typically have one or two magnetic “X-points”—so called because the magnetic field lines form an X shape—at the top or bottom of the plasma surface. X-points help promote efficient fusion: by channeling stray particles along the magnetic field lines to divertors designed to handle them, the X-points help control the plasma shape and prevent instabilities.

Until they don’t.

For three decades, tokamak researchers have observed periodic disruptive instabilities, called edge localized modes (ELMs), on the edge of the plasma. ELMs allow blobs of plasma to break out of the magnetic confinement through the protective vacuum and splatter onto the tokamak wall before the plasma stabilizes and resumes its original shape. The number of unstable pulses affects how much plasma is thrown onto the walls, which in turn determines the extent of wall damage.

The origins and dynamics of ELMs—or of stable plasma edges, for that matter—have long resisted theoretical explanation. A breakthrough came when Sugiyama and a colleague, Henry Strauss of the Courant Institute of Mathematical Sciences at New York University, used the M3D extension on NERSC computers to simulate the development of the magnetic fields on a tokamak plasma surface. They reported their results in an article featured on the cover of the June 2010 issue of Physics of Plasmas (Figure 11).5

5 L. E. Sugiyama and H. R. Strauss, “Magnetic X-points, edge localized modes, and stochasticity,” Physics of Plasmas 17, 062505 (2010).

Figure 11. This ELM research was featured on the cover of the June 2010 issue of Physics of Plasmas. Image: ©2010, American Institute of Physics.

“Studies of nonlinear dynamics show that if you have X-points on the plasma boundary, which is standard for high temperature fusion plasmas, particles can fly out and hit the walls. But no one had ever seen what actually happens inside the plasma if you let it go,” says Sugiyama. “This is the first time that we have seen a stochastic magnetic tangle, a structure well known from Hamiltonian chaos theory, generated by the plasma itself. Its existence also means that we will have to rethink some of our basic ideas about confined plasmas.”

What Sugiyama and Strauss discovered is that ELMs constitute a new class of nonlinear plasma instability involving the mutual influence of growing plasma structures and the magnetic field. A minor instability anywhere on a plasma boundary that possesses an X-point can perturb the magnetic field, causing it to become chaotic. As the plasma instability grows, magnetic chaos increases, allowing the plasma bulges to penetrate deep into the plasma and also to break out of the boundary surface, resulting in a loss of plasma (Figures 12 and 13). This demonstration that the plasma motion can couple to a chaotic magnetic field may help explain the large range of ELM and ELM-free edge behaviors in fusion plasmas, as well as clarifying some properties of magnetic confinement.

Sugiyama notes that the primary causes of plasma instability vary in the different plasma regions of the tokamak, including the core, edge, and surrounding vacuum. To understand the most dangerous of these instabilities, computational physicists model each plasma region on short time and space scales. The plasma regions are strongly coupled through the magnetic and electric fields, and researchers must carry out an integrated simulation that shows how all the components interact in the big picture, over long time scales.

“Modeling plasma instabilities is computationally challenging because these processes occur on widely differing scales and have unique complexities,” says Sugiyama. “This phenomenon couldn’t be investigated without computers. Compared to other research areas, these simulations are not computationally large. The beauty of using NERSC is that I can run my medium-size jobs for a long time until I generate all the time steps I need to see the entire process accurately.”

She notes that the longest job ran on 360 processors for 300 hours on NERSC’s Cray XT4 Franklin system. However, she also ran numerous other jobs on the facility’s Franklin and DaVinci systems using anywhere from 432 to 768 processors for about 200 CPU hours.

“I greatly appreciate the NERSC policy of supporting the work required to scale up from small jobs to large jobs, such as generating input files and visualizing the results,” says Sugiyama. “The center’s user consultants and analytics group were crucial to getting these results.”

Her ongoing research has two major goals. One is to verify the simulation model against experimental observations. “This is proving surprisingly difficult,” Sugiyama says, “primarily because the instability is extremely fast compared to most experimental measurements—changes occur over a few microseconds—and it is very difficult to measure most things occurring inside a hot dense plasma, since material probes are destroyed immediately. We have good contacts with all the major U.S. experiments and some international ones, but have been waiting for good data.”

Her second goal is to characterize the ELM threshold in a predictive fashion for future fusion plasmas, including new experiments such as ITER. This involves studying small ELMs and ELM-stable plasmas, where there are other, smaller MHD instabilities. Demonstrating the predictive value of this ELM model, besides being a significant theoretical achievement, will help bring practical fusion energy one step closer to realization.

“This is the first time that we have seen a stochastic magnetic tangle … generated by the plasma itself.”

Figure 12. Temperature surface near the plasma edge shows helical perturbation aligned with the magnetic field. Image: L. E. Sugiyama and the M3D Team.

Figure 13. This image shows a simulation of the initial stage of a large edge instability in the DIII-D tokamak. The top row shows contours of the plasma temperature in a cross section of the torus with its central axis to the left. The bottom row shows corresponding density. The vacuum region between plasma and wall is grey; plasma expands rapidly into this region and hits the outer wall, then gradually subsides back to its original shape. Image: L. E. Sugiyama and H. R. Strauss.

Winds of Climate Change

Cyclones in the warm Pliocene era may have implications for future climate

Scientists searching for clues to Earth’s future climate are turning to its dim past, the Pliocene epoch. “We’re interested in the Pliocene because it’s the closest analog we have in the past for what could happen in our future,” says Chris Brierley, a Yale climate researcher computing at NERSC.

Brierley and Yale colleague Alexey Fedorov are using simulations to help unravel a mystery that has bedeviled climatologists for years: Why was the Pliocene—under conditions similar to today—so much hotter? Their most recent work suggests that tropical cyclones (also called hurricanes or typhoons) may have played a crucial role.6

Working with Kerry Emanuel of MIT, the Yale researchers’ computer simulations revealed twice as many tropical cyclones during the Pliocene; storms that lasted two to three days longer on average (compared to today); and, unlike today, cyclones occurred across the entire tropical Pacific Ocean (Figure 14). This wide-ranging distribution of storms disrupted a key ocean circulation pattern, warming seas and creating atmospheric conditions that encouraged more cyclones.

This positive feedback loop, they posit, is key to explaining the Pliocene’s higher temperatures and persistent El Niño-like conditions. Their findings, featured on the cover of the journal Nature (Figure 15), could have implications for the planet’s future as global temperatures continue to rise due to climate change.

What’s the Pliocene Got to Do with It?

The Pliocene Epoch (the period between 5 and 3 million years ago) is especially interesting to climatologists because conditions then were similar to those of today’s Earth, including the size and position of land masses, the levels of sunlight, and, crucially, atmospheric concentrations of the greenhouse gas carbon dioxide (CO2). “In geological terms, it was basically the same,” Brierley says.

Project: Tropical Cyclones and the Shallow Pacific Overturning Circulation

PI: Alexey Fedorov, Yale University

Senior Investigator: Christopher Brierley, Yale University

Funding: BER, NSF, David and Lucile Packard Foundation

Computing Resources: NERSC

6 Alexey V. Fedorov, Christopher M. Brierley, and Kerry Emanuel, “Tropical cyclones and permanent El Niño in the early Pliocene epoch,” Nature 463, 1066 (2010).

Yet the Pliocene was puzzlingly hotter. Global mean temperatures ranged up to 4º C (about 8º F) higher than today; sea levels were 25 meters higher; and sea ice was all but absent. The poles heated up, yet temperatures in the tropics did not exceed modern-day levels. And the planet was locked into a continuous El Niño-like state, as opposed to the on/off oscillation experienced today.

Fedorov and Brierley suspected cyclones played a role. “We knew from a previous study that additional vertical mixing [in the ocean] can cause warming in the Niño region” of the eastern tropical Pacific, says Brierley. “Hurricanes were one of the few sources of mixing that we could envisage changing,” he explains.

Cyclones Stir the Pot

To test their idea, the team first simulated atmospheric flows for modern Earth and for the Pliocene (see sidebar). Because most other factors were basically the same, the main difference between the two was the Pliocene’s higher sea surface temperatures.

They then randomly seeded vortices and tracked the results. “We found there were twice as many tropical cyclones during this period [versus modern times], and that they lasted longer,” two to three days longer on average, says Brierley.

Hurricane paths shifted, too (Figure 16). “Rather than starting in the far western Pacific, hitting Japan, then heading back out into the northern Pacific, they were all over the place,” striking all along two bands north and south of the equator, says Brierley.

The next step was to integrate the extra ocean-churning action of all those hurricanes into a more complete climate model. To do that, the team applied an idealized pattern of hurricane ocean-mixing to a fully coupled climate model, one that includes ocean, sea ice, land cover, and atmosphere.

Today, cold water originating off the coasts of California and Chile skirts the region of tropical cyclone activity on its way to the Equator, where it surfaces in a “cold tongue” that stretches west off the coast of South America. El Niño events, which affect precipitation and temperature patterns worldwide, are characterized by an uptick in the temperature of the cold tongue. It only took about 50 model years for the cold tongue to begin warming in the researchers’ Pliocene model. After 200 years, El Niño-like warming was persistent (Figure 17).

Brierley, Fedorov, and Emanuel suggest tropical cyclones were to blame, at least in part. More numerous and more widely distributed across the globe, the cyclones disrupted the cold tongue, raising ocean temperatures and leading to changes in the atmosphere that favored the formation of even more tropical storms. The cycle repeated, helping to sustain the Pliocene’s higher overall temperatures.

“We’re suggesting that one of the reasons for the differences [between the Pliocene and today] is this feedback loop that isn’t operating in our present climate,” says Brierley.

Figure 14. Simulated hurricane tracks under modern conditions (left) and Pliocene (right) overlay sea surface temperatures created by a fully coupled climate model. Images: Chris M. Brierley.

Figure 15. These simulations were featured on the cover of the 25 February 2010 issue of Nature. Image ©2010 Macmillan Publishers Limited.

Figure 16. Simulated hurricane tracks in (a) modern climate and (b) early Pliocene climate illustrate a two-year subsample from 10,000 simulated tropical cyclones. Colors indicate strength from tropical depression (blue) to category-5 hurricane (red). Images: Chris M. Brierley.

Powering Climate Models

Calculating the complex, multifaceted interactions of land, sea, atmosphere, and ice requires specialized and massive computing power and storage. NERSC’s Cray XT4 Franklin system and the NERSC High Performance Storage System (HPSS) both played important roles in Fedorov and Brierley’s investigation.

For this investigation, the team used 125,000 processor hours on Franklin to run fully coupled climate models. “The current resources that we have locally achieve a rate of about three model years per day, whereas Franklin can achieve around 25 years per day,” says Brierley. Models run on Franklin in just a month would have taken six months or more locally. “It’s not just the number and speed of individual processors, but the speed at which they can pass information” that makes it feasible to run simulations spanning hundreds of model years, Brierley says.
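As a rough, back-of-the-envelope illustration of what those throughput figures mean in practice (the 3 and 25 model-years-per-day rates are Brierley’s numbers; the 200-year run length below is only an assumed example, not a quoted experiment):

    #include <stdio.h>

    int main(void)
    {
        const double local_rate    =  3.0;   /* model years per day, local cluster (quoted) */
        const double franklin_rate = 25.0;   /* model years per day, Franklin (quoted)      */
        const double run_length    = 200.0;  /* assumed simulation length in model years    */

        printf("Local cluster: %.0f days of wall-clock time\n", run_length / local_rate);
        printf("Franklin:      %.0f days of wall-clock time\n", run_length / franklin_rate);
        return 0;
    }

Under these assumptions a 200-model-year experiment drops from roughly 67 days of wall-clock time to about 8, the difference between a handful of experiments per year and several per month.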

The team was also able to get their models up and running quickly because of Franklin’s pre-optimized, pre-installed climate modeling code. “I only had to set up the problem to do the science, rather than dealing with all the trouble of getting the code running first,” says Brierley.

Because of NERSC’s mass storage facility, HPSS, “I was able to store the vast amounts of data generated and then only take the fields I need at the time to a local machine to run visualization software,” Brierley says. Safely stored, the datasets can be used again in the future rather than forcing scientists to repeat computationally expensive runs.

Will History Repeat?

The researchers don’t know if modern-day climate will eventually develop the same feedback loop, because they aren’t sure what set it off in the first place. “There must have been a trigger, some sort of tipping point, but we don’t yet know what that may have been,” Brierley explains.

Fedorov cautions against drawing direct comparisons between our future climate and this past epoch. The current scientific consensus projects that the number of intense hurricanes will increase under global warming, but the overall number will actually fall, he says. “However, unless we understand the causes of these differences, we will not be sure whether our projections are correct,” explains Fedorov. “Changes in the frequency and distribution of these storms could be a significant component of future climate conditions.”

For their next step, the team is refining and rerunning their models at NERSC. “We’ve done all this assuming that you can average the behavior of hurricanes into one annual mean signal,” says Brierley. The team is working on more realistic hurricane mixing, “where you have a very strong forcing for a short amount of time followed by a long quiescence,” says Brierley. “We need to know if this makes a significant difference in our models.”

Figure 17. Sea surface temperature changes in the tropical Pacific as simulated by the coupled model are shown here. The top frame is a baseline of temperatures using pre-industrial CO2 levels. The center frame shows the amount of warming (over baseline) under Pliocene CO2 levels with the water-churning effects of intense hurricane activity. The bottom frame shows the effects of higher CO2 levels alone. After 200 years of calculations, the Pliocene scenario (center) shows strong warming in the eastern equatorial Pacific, the signature of El Niño-like conditions. (Panel titles: a, Preindustrial; b, Increasing CO2 and subtropical mixing; c, Increasing CO2 only.) Images: Chris M. Brierley.

Tropical cyclones disrupted the “cold tongue” in the Pacific Ocean, raising ocean temperatures and leading to changes in the atmosphere that favored the formation of even more tropical storms.

3D Is the Key

Simulated supernova explosions happen easier and faster when third dimension is added

For scientists, supernovae are true superstars—massive explosions of huge, dying stars that shine light on the shape and fate of the universe. Recorded observations of supernovae stretch back thousands of years, but only in the past 50 years have researchers been able to attempt to understand what’s really happening inside a supernova via computer modeling. These simulations, even when done in only one or two dimensions, can lead to new information and help address longstanding problems in astrophysics.

Now researchers have found a new way to make three-dimensional computer simulations of “core collapse” supernova explosions (one of two major types). Writing in the Sept. 1, 2010 issue of the Astrophysical Journal, Jason Nordhaus and Adam Burrows of Princeton University, and Ann Almgren and John Bell of Lawrence Berkeley National Laboratory report that the new simulations are beginning to match the massive blowouts astronomers have witnessed when gigantic stars die.7

The new simulations are based on the idea that the collapsing star is not spherical but distinctly asymmetrical and affected by a host of instabilities in the volatile mix surrounding its core (Figure 18). The simulations were made possible by bigger and faster supercomputers and a new code that can take advantage of their power.

Project: SciDAC Computational Astrophysics Consortium

The researchers performed these 3D simulations with approximately 4 million computer processor hours on 8,000 to 16,000 cores of NERSC’s Cray XT4 Franklin system, using a new multidimensional radiation/hydrodynamic code called CASTRO, the development of which was led by Almgren and Bell of Berkeley Lab’s Center for Computational Sciences and Engineering.

In the past, simulated explosions represented in one and two dimensions often stalled, leading scientists to conclude that their understanding of the physics was incorrect or incomplete.

PI: Stan Woosley, University of California, Santa Cruz

Senior Investigators: Adam Burrows, Princeton University; Peter Nugent, John Bell, and Ann Almgren, Lawrence Berkeley National Laboratory; Michael Zingale, State University of New York, Stony Brook; Louis Howell and Robert Hoffman, Lawrence Livermore National Laboratory; Alex Heger, University of Minnesota; Daniel Kasen, University of California, Santa Cruz

Funding: HEP, SciDAC, NSF

Computing Resources: NERSC, TIGRESS, NICS, TACC, ALCF

7 J. Nordhaus, A. Burrows, A. Almgren, and J. Bell, “Dimension as a key to the neutrino mechanism of core-collapse supernova explosions,” Astrophysical Journal 720, 694 (2010).

Using the new CASTRO code on supercomputers many times more powerful than their predecessors, this team was able to create supernova simulations in 3D that were ~50% easier to explode than in 1D, and that exploded significantly faster than in 2D. This dimensional effect is much larger than that of a variety of other physical effects such as nuclear burning, inelastic scattering, and general relativity.

“It may well prove to be the case that the fundamental impediment to progress in supernova theory over the last few decades has not been lack of physical detail, but lack of access to codes and computers with which to properly simulate the collapse phenomenon in 3D,” the team wrote. “This could explain the agonizingly slow march since the 1960s toward demonstrating a robust mechanism of explosion.”

Burrows, a professor of astrophysical sciences at Princeton, commented: “I think this is a big jump in our understanding of how these things can explode. In principle, if you could go inside the supernovae to their centers, this is what you might see.”

Figure 18. Isoentropy surface of the explosion of a massive star. The snapshot is taken ~50 milliseconds after the explosion. The multi-dimensional turbulent character of the blast is manifest. Image: J. Nordhaus, A. Burrows, A. Almgren, and J. Bell.

Birth of a Supernova

Supernovae are the primary source of heavy elements in the cosmos. Most result from the death of single stars much more massive than the sun.

As a star ages, it exhausts the hydrogen and helium fuel at its core. With still enough mass and pressure to fuse carbon and produce other heavier elements, it gradually becomes layered like an onion with the bulkiest tiers at its center. Once its core exceeds a certain mass, it begins to implode. In the squeeze, the core heats up and grows even denser.

“Imagine taking something as massive as the sun, then compacting it to something the size of the Earth,” Burrows said. “Then imagine that collapsing to something the size of Princeton.”

At some point, the implosion reverses. Astrophysicists call it “the bounce.” The core material stiffens up, acting like what Burrows calls a “spherical piston,” emitting a shock wave of energy. Neutrinos, which are nearly inert particles, are emitted too. The shock wave and the neutrinos are invisible.

Then there is a massive and very visible explosion, and the star’s outer layers are ejected into space (Figure 19). This stage is what observers see as the supernova. What’s left behind is an ultra-dense object called a neutron star. Sometimes, when an ultramassive star dies, a black hole is created instead.

What comes next is even more mysterious. Scientists have a sense of the steps leading to the explosion, but there is no consensus about the mechanism of the bounce. Part of the difficulty is that no one can see what is happening on the inside of a star. During this phase, the star looks undisturbed. Then, suddenly, a blast wave erupts on the surface. Scientists don’t know what occurs to make the central region of the star instantly unstable. The emission of neutrinos is believed to be related, but no one is sure how or why.

“We don’t know what the mechanism of explosion is,” Burrows said. “As a theorist who wants to get to root causes, this is a natural problem to explore.”

Multiple Scientific Approaches

The scientific visualization employed by the research team is an interdisciplinary effort combining astrophysics, applied mathematics, and computer science. The endeavor produces a representation, through computer-generated images, of three-dimensional phenomena. In general, researchers employ visualization techniques with the aim of making realistic renderings of quantitative information including surfaces, volumes, and light sources. Time is often an important component too, allowing researchers to create movies showing a simulated process in motion.

To do their work, Burrows and his colleagues came up with mathematical values representing the energetic behaviors of stars by using mathematical representations of fluids in motion—the same partial differential equations solved by geophysicists for climate modeling and weather forecasting. To solve these complex equations and simulate what happens inside a dying star, the team used the CASTRO code, which took into account factors that changed over time, including fluid density, temperature, pressure, gravitational acceleration, and velocity.
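The article does not write out these equations. As a schematic illustration only, the simplest member of this family of fluid equations is the compressible Euler system with gravity; CASTRO’s full radiation-hydrodynamics formulation adds radiation, reaction, and other source terms not shown here:

    \frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0

    \frac{\partial (\rho \mathbf{u})}{\partial t} + \nabla \cdot (\rho \mathbf{u} \otimes \mathbf{u}) + \nabla p = \rho \mathbf{g}

    \frac{\partial (\rho E)}{\partial t} + \nabla \cdot \bigl[ (\rho E + p)\,\mathbf{u} \bigr] = \rho\, \mathbf{u} \cdot \mathbf{g}

Here \rho is the fluid density, \mathbf{u} the velocity, p the pressure, E the total specific energy, and \mathbf{g} the gravitational acceleration, which are the same time-dependent quantities listed in the text above.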

The calculations took months to process on supercomputers at NERSC, Princeton, and other centers. To help them visualize these results, the team relied on Hank Childs of the NERSC Analytics Team. Nordhaus noted that Childs played an important role in helping the team create 3D renderings of these supernovae.

“The Franklin supercomputer at NERSC gives us the most bang for the buck,” said Nordhaus. “This system not only has enough cores to run our 3D simulations, but our code scales extremely well on this platform. Having the code developers, John Bell and Ann Almgren, nearby to help us scale our simulations on the supercomputer is an added bonus.”

The simulations are not ends unto themselves, Burrows noted. Part of the learning process is viewing the simulations and connecting them to real observations. In this case, the most recent simulations are uncannily similar to the explosive behavior of stars in their death throes witnessed by scientists. In addition, scientists often learn from simulations and see behaviors they had not expected.

“Visualization is crucial,” Burrows said. “Otherwise, all you have is merely a jumble of numbers. Visualization via stills and movies conjures the entire phenomenon and brings home what has happened. It also allows one to diagnose the dynamics, so that the event is not only visualized, but understood.”

Figure 19. Evolutionary sequence of blast morphology for four different times after the bounce (0.156, 0.201, 0.289, and 0.422 seconds). The scale is more than 2000 km on a side. Image: J. Nordhaus, A. Burrows, A. Almgren, and J. Bell.

Surface Mineral

Innovative simulations reveal that Earth’s silica is predominantly superficial

Silica is one of the most common minerals on Earth. Not only does it make up two-thirds of our planet’s crust, it is also used to create a variety of materials from glass to ceramics, computer chips, and fiber optic cables. Yet new quantum mechanics results generated by a team of physicists from Ohio State University (OSU) and elsewhere show that this mineral only populates our planet superficially—in other words, silica is relatively uncommon deep within the Earth.

Using several of the largest supercomputers in the nation, including NERSC’s Cray XT4 Franklin system, the team simulated the behavior of silica in high- temperature, high-pressure environments that are particularly difficult to study in a lab. These details may one day help scientists predict complex geological processes like earthquakes and volcanic eruptions. Their results were published in the May 25, 2010 edition of the Proceedings of the National Academy of Sciences (PNAS).8

“Silica is one of the simplest and most common minerals, but we still don’t understand everything about it. A better understanding of silica on a quantum- mechanical level would be useful to earth science, and potentially to industry as well,” says Kevin Driver, an OSU graduate student who was corresponding author of the paper. “Silica adopts many different structural forms at different temperatures and pressures, not all of which are easy to study in the lab.”

Over the past century, seismology and high-pressure laboratory experiments have revealed a great deal about the general structure and composition of the Earth. For example, such work has shown that the planet’s interior structure exists in three layers called the crust, mantle, and core (Figure 20). The outer two layers—the mantle and the crust—are largely made up of silicates, minerals containing silicon and oxygen. Still, the detailed structure and composition of the deepest parts of the mantle remain unclear. These details are important for complex geodynamical modeling that may one day predict large-scale events, such as earthquakes and volcanic eruptions.

Project: Modeling Dynamically and Spatially Complex Materials

PI: John Wilkins, Ohio State University

Funding: BES, NSF

Computing Resources: NERSC, NCAR, TeraGrid, NCSA, OSC, CCNI

8 K. P. Driver, R. E. Cohen, Zhigang Wu, B. Militzer, P. López Ríos, M. D. Towler, R. J. Needs, and J. W. Wilkins, “Quantum Monte Carlo computations of phase stability, equations of state, and elasticity of high-pressure silica,” PNAS 107, 9519 (2010).

Figure 20. Cross-section of the Earth, showing the crust, the mantle, the liquid outer core, and the solid inner core.

Driver notes that even the role that the simplest silicate—silica—plays in the Earth’s mantle is not well understood. “Say you’re standing on a beach, looking out over the ocean. The sand under your feet is made of quartz, a form of silica containing one silicon atom surrounded by four oxygen atoms. But in millions of years, as the oceanic plate below becomes subducted and sinks beneath the Earth’s crust, the structure of the silica changes dramatically,” he says.

As pressure increases with depth, the silica molecules crowd closer together, and the silicon atoms start coming into contact with more oxygen atoms from neighboring molecules. Several structural transitions occur, with low-pressure forms surrounded by four oxygen atoms and higher-pressure forms surrounded by six (Figure 21). With even more pressure, the structure collapses into a very dense form of the mineral, which scientists call the alpha-lead dioxide form.

Driver notes that it is this form of silica that likely resides deep within the earth, in the lower part of the mantle, just above the planet’s core. When scientists try to interpret seismic signals from that depth, they have no direct way of knowing what form of silica they are dealing with. They must rely on high-pressure experiments and computer simulations to constrain the possibilities. Driver and colleagues use a particularly high-accuracy, quantum mechanical simulation method to study the thermodynamic and elastic properties of different silica forms, and then compare the results to the seismic data.

In PNAS, Driver, his advisor John Wilkins, and their coauthors describe how they used a quantum mechanical method to design computer algorithms that would simulate the silica structures. When they did, they found that the behavior of the dense, alpha-lead dioxide form of silica did not match up with any global seismic signal detected in the lower mantle. This result indicates that the lower mantle is relatively devoid of silica, except perhaps in localized areas where oceanic plates have subducted.

“As you might imagine, experiments performed at pressures near those of the Earth’s core can be very challenging. By using highly accurate quantum mechanical simulations, we can offer reliable insight that goes beyond the scope of the laboratory,” says Driver.

“This work demonstrates … that the quantum Monte Carlo method can compute nearly every property of a mineral over a wide range of pressures and temperatures.”

A New Application of Quantum Monte Carlo

The team’s work was one of the first to show that quantum Monte Carlo (QMC) methods could be used to study complex minerals. QMC does not rely on density functional calculations, in which approximations can produce inaccurate results in these kinds of studies; instead, QMC explicitly treats the electrons and their interactions via a stochastic solution of Schrödinger’s equation. Although QMC algorithms have been used for over half a century to simulate atoms, molecules, gases, and simple solids, Driver notes that applying them to a complex solid like silica was simply too labor- and computer-intensive until recently.
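The article does not give the working equations. As one hedged illustration of what a “stochastic solution of Schrödinger’s equation” looks like, the variational flavor of QMC estimates the energy of a trial many-electron wavefunction \Psi_T by averaging a local energy over randomly sampled electron configurations:

    E \;\approx\; \frac{1}{M} \sum_{i=1}^{M} E_L(\mathbf{R}_i),
    \qquad E_L(\mathbf{R}) = \frac{\hat{H}\,\Psi_T(\mathbf{R})}{\Psi_T(\mathbf{R})},
    \qquad \mathbf{R}_i \sim \left| \Psi_T(\mathbf{R}) \right|^{2}

Each \mathbf{R}_i is a complete set of electron coordinates. Production studies such as this one typically go further (diffusion Monte Carlo, for example, refines the sampled distribution toward the true ground state), but stochastic sampling of explicit electron configurations is the common ingredient that distinguishes QMC from density functional approaches.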
“In total, we used the equivalent of six million CPU hours to model four different states of silica. Three million of those CPU hours involved using NERSC’s Franklin system to calculate a shear elastic constant for silica with the QMC method. This is the first time it had ever been done,” said Driver. The Franklin results allowed him to measure how silica deforms at different temperatures and pressures.

“From computing hardware to consulting staff, the resources at NERSC are really excellent,” says Driver. “The size and speed of the center’s machines is something that I don’t normally have access to at other places.”

Wilkins comments, “This work demonstrates both the superb contributions a single graduate student can make, and that the quantum Monte Carlo method can compute nearly every property of a mineral over a wide range of pressures and temperatures. The study will stimulate a broader use of quantum Monte Carlo worldwide to address vital problems.”

Wilkins and his colleagues expect that quantum Monte Carlo will be used more often in materials science in the future, as the next generation of computers goes online.

Coauthors on the paper included Ronald Cohen of the Carnegie Institution of Washington; Zhigang Wu of the Colorado School of Mines; Burkhard Militzer of the University of California, Berkeley; and Pablo López Ríos, Michael Towler, and Richard Needs of the University of Cambridge.

Figure 21. The structure of a six-fold coordinated silica phase called stishovite, a prototype for many more complex mantle minerals. Stishovite is commonly created in diamond anvil cells and often found naturally where meteorites have slammed into earth. Driver and co-authors computed the shear elastic constant softening of stishovite under pressure with quantum Monte Carlo using over 3 million CPU hours on Franklin. Image: Kevin Driver.

Soggy Origami

Nanotech research on graphene is blossoming at NERSC

In a 1959 lecture titled “There’s Plenty of Room at the Bottom,” physicist and future Nobel laureate Richard Feynman predicted that humans would create increasingly smaller and more powerful machines by tapping into the quantum physics that drives atoms and molecules. Nearly 50 years later this prediction is realized in the field of materials nanoscience, where some researchers are designing materials atom by atom to build devices with unprecedented capabilities.

In fact, assistant professor Petr Král and his team of graduate students from the University of Illinois at Chicago (UIC) are making headway in this arena by using supercomputers at NERSC to investigate the molecular and atomic properties of graphene—a single-atom layer of carbon, densely packed in a honeycomb pattern. Discovered in 2004, graphene is the most conductive material known to humankind and could be used to make the next generation of transistors, computer memory chips, and high-capacity batteries.

Manipulating the “Miracle Material” with Water

Electrons blaze through graphene faster than in any other conductor, making it ideal for use in a variety of high-tech devices. With 200 times the breaking strength of steel, graphene is also stronger and stiffer than a diamond. The application potential for this material is so promising that the two scientists who discovered it, Andre Geim and Konstantin Novoselov, won the 2010 Nobel Prize in Physics.

But before this “miracle material” can be widely implemented in new technologies, researchers must figure out how to control its shape. That’s exactly what Král’s research team did. In an article published in Nano Letters9 and highlighted in Nature’s “News and Views” section,10 Král and his former UIC graduate students Niladri Patra and Boyang Wang reported that nanodroplets of water could be used to shape sheets of graphene.

Project: Multiscale Modeling of Nanofluidic Systems

PI: Petr Král, University of Illinois at Chicago

Funding: BES, NSF

Computing Resources: NERSC

9 Niladri Patra, Boyang Wang, and Petr Král, “Nanodroplet activated and guided folding of graphene nanostructures,” Nano Letters 9, 3766 (2009).

10 Vincent H. Crespi, “Soggy origami,” Nature 462, 858 (2009).

Using the scalable NAMD code to study molecular dynamics and SIESTA to calculate electron structure on NERSC’s Franklin system, they found that nanodroplets of water remove the energy barrier that prevents graphene from folding.

Figure 22. A droplet of water (pink) causes the “petals” of a flower-shaped graphene ribbon (green) about 13 nanometers across to fold up around it. Image: N. Patra, B. Wang, and P. Král.

Because atoms exposed on the surface of materials tend to be less stable than those buried within, placing a water droplet on graphene—which is just a single layer of carbon atoms—will cause the material to deform so that the number of exposed atoms is minimized and the system’s energy is reduced. Although spontaneous folding is not a characteristic unique to wet graphene, the computational models also showed that researchers could control how the material deforms. Being able to easily shape graphene is essential to its application in future technologies.
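As a rough continuum sketch of the energy argument above (the published work relies on atomistic NAMD simulations, not on this formula), a fold or wrap becomes favorable when the adhesion energy gained outweighs the elastic cost of bending the sheet:

    \Delta E \;\approx\; \int \frac{\kappa}{2}\, C^{2}\, dA
      \;-\; \gamma_{gw}\, \Delta A_{\text{graphene-water}}
      \;-\; \gamma_{gg}\, \Delta A_{\text{graphene-graphene}} \;<\; 0

Here \kappa is the bending rigidity of graphene, C the local curvature, and the \gamma terms are adhesion energies per unit area of newly formed graphene-water and graphene-graphene contact. In this picture the nanodroplet supplies the graphene-water term that tips the balance, which is one way to read the statement that water removes the barrier to folding.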

According to Král, a graphene sheet will naturally wrap around a water droplet. By positioning nanodroplets of water on graphene with guided microscopic needles, scientists can make the carbon sheet roll, bend, and slide; the material can also be manipulated into a variety of complex forms like capsules, sandwiches, knots, and rings, to serve as the foundation for nanodevices.

The end product depends on the graphene’s initial shape and the water droplet’s diameter. For example, placing a droplet of water in the center of the “petals” of a flower-shaped graphene ribbon causes the “flower” to close (Figure 22).

Figure 23. Placing a droplet of water on one end of a graphene ribbon causes the free end to wrap around it, thus folding the ribbon in half. Image: N. Patra, B. Wang, and P. Král.

Figure 24. Placing a graphene nanoribbon on a nanotube causes it to fold on the nanotube surface, even in the presence of nonpolar solution. Image: N. Patra, Y. Song, and P. Král.

Controlling graphene’s shape and conductivity is essential to its application in future technologies.

11 N. Patra, Y. Song, and P. Král, “Self-assembly of graphene nanostructures on nanotubes,” ACS Nano (in press).

A droplet placed on one end of a graphene ribbon causes the free end to wrap around it, thus folding the ribbon in half (Figure 23). Another study, which will appear in a forthcoming issue of ACS Nano, shows that placing a graphene ribbon on a nanotube causes the ribbon to wrap itself around the tube, even in solvents (Figure 24).11

“Until these results it wasn’t thought that we could controllably fold these structures,” says Král. “Now, thanks to NERSC, we know how to shape graphene using weak forces between nanodroplets of water.”

Manipulating the Conductivity of Graphene

In addition to shaping graphene, Král and another UIC graduate student, Artem Baskin, used large-scale density functional theory ab initio calculations at NERSC to confirm a belief long held by scientists—that they could manipulate the conductivity of graphene by strategically inserting holes into it.12 The simulations revealed that the arrangement of these pores and their size could be used to control graphene’s conductivity; and changing the distance between these pores can cause the material to switch from metal to semiconducting porous graphene (Figure 25).

In this project they explored the atomic characteristics of porous graphene-based materials, like nanoribbons, superlattices, and nanotubes. Based on their results, Baskin and Král propose a phenomenological model that can be used to predict the band structure features of porous nanocarbons without performing the time-consuming quantum mechanical calculations. The duo successfully tested this model on a number of systems and recently submitted a paper with this recommendation to Physical Review Letters.

The team used NERSC resources for algorithm testing as well as modeling. “NERSC resources have been indispensable for us,” says Král. “Our projects require long multiprocessor calculations, and access to the center’s computation time was very important for progressing our research in a timely manner.”

Figure 25. These graphs show the energies of electronic states as a function of their momentum (k). When the curves touch each other in the center, the graphene is a metal; otherwise it is a semiconductor. The graphs on the right side show densities of states. AGNR is armchair graphene nanoribbon and ZGNR is a zigzag. Band structure is shown for: (a) pristine 11-AGNR, (b) 11-AGNR with centered pore, (c) 11-AGNR with shifted pore, (e) pristine 10-ZGNR, (f) 10-ZGNR with centered pore, (g) 10-ZGNR with shifted pore. The notation 11- or 10- here means the width of the ribbons, namely, 11-AGNR and 10-ZGNR are nanoribbons of 5 honeycomb carbon rings width, as shown in the insets of (d) and (h). Image: A. Baskin and P. Král.

12 A. Baskin and P. Král, “Electronic structures of porous nanocarbons,” Physical Review Letters (submitted).

An Antimatter Hypernucleus

Discovery opens the door to new dimensions of antimatter

The strangest form of antimatter ever seen has been discovered by the STAR experiment at the Relativistic Heavy Ion Collider (RHIC) at DOE’s Brookhaven National Laboratory in New York.13 The new particle goes by the jawbreaking name “anti-hyper-triton.”

Translated, that means a nucleus of antihydrogen containing one antiproton and one antineutron—plus one heavy relative of the antineutron, an antilambda hyperon. It’s that massive antilambda that makes the newly discovered antinucleus “hyper.”

“STAR is the only experiment that could have found an antimatter hypernucleus,” says Nu Xu of Berkeley Lab’s Nuclear Science Division, the spokesperson for the STAR experiment. “We’ve been looking for them ever since RHIC began operations. The discovery opens the door on new dimensions of antimatter, which will help astrophysicists trace back the story of matter to the very first millionths of a second after the big bang.”

RHIC was designed to collide heavy nuclei like gold together at very high energies. The fireballs that blossom in the collisions are so hot and dense that the protons and neutrons in the nuclei are shattered and there is massive production of partons (Figure 26).

Project: STAR Detector Simulations and Data Analysis

For a brief instant there exists a quark-gluon plasma that models the first moments of the early universe, in which quarks—and gluons, which strongly bind quarks together at lower temperatures—are suddenly free to move independently. But as the quark-gluon plasma cools, its constituents recombine in a variety of ways.

The commonest quarks, which are also the lightest, are the ups and downs; they make up the protons and neutrons in the nuclei of all the familiar ordinary atoms around us. Heavier particles also condense out of the quark-gluon plasma, including the lambda, made of an up quark, a down quark, and a massive strange quark.

PI: Grazyna Odyniec, Lawrence Berkeley National Laboratory

Senior Investigators: Nu Xu, LBNL; James Dunlop, Brookhaven National Laboratory; Olga Barannikova, University of Illinois, Chicago; Bernd Surrow, Massachusetts Institute of Technology

Funding: NP, HEP, NSF, Sloan Foundation, CNRS/IN2P3, STFC, EPSRC, FAPESP–CNPq, MESRF, NNSFC, CAS, MoST, MoE, GA, MSMT, FOM, NWO, DAE, DST, CSIR, PMSHE, KRF, MSES, RMST, ROSATOM

Computing Resources: NERSC, RCF, OSG

13 The STAR Collaboration, “Observation of an antimatter hypernucleus,” Science 328, 58 (2010).

Figure 26. In a single collision of gold nuclei at RHIC, many hundreds of particles are emitted, most created from the quantum vacuum via the conversion of energy into mass in accordance with Einstein’s famous equation E = mc2. The particles leave telltale tracks in the STAR detector (shown here from the end and side). Scientists analyzed about a hundred million collisions to spot the new antinuclei, identified via their characteristic decay into a light isotope of antihelium and a positive pi-meson. Altogether, 70 examples of the new antinucleus were found. Image: STAR Collaboration.

The New Region of Strange Antimatter

The standard Periodic Table of Elements is arranged according to the number of protons, which determine each element’s chemical properties. Physicists use a more complex, three-dimensional chart to also convey information on the number of neutrons, which may change in different isotopes of the same element, and a quantum number known as “strangeness,” which depends on the presence of strange quarks (Figure 27). Nuclei containing one or more strange quarks are called hypernuclei.

For all ordinary matter, with no strange quarks, the strangeness value is zero and the chart is flat. Hypernuclei appear above the plane of the chart. The new discovery of strange antimatter with an antistrange quark (an antihypernucleus) marks the first entry below the plane.

“Ordinary” hypernuclei, in which a neutron or two are replaced by lambdas or other hyperons, have been made in particle accelerators since the 1950s—not long after the first lambda particles were identified in cosmic ray debris in 1947.

Quarks and antiquarks exist in equal numbers in the quark-gluon plasma, so the particles that condense from a cooling quark-gluon plasma come not only in ordinary-matter forms, but in their oppositely charged antimatter forms as well.

Antiprotons and antineutrons were discovered at Berkeley Lab’s Bevatron in the 1950s. A few antihydrogen atoms with a single antiproton for a nucleus have even been made, but no heavier forms of antihydrogen. A hydrogen nucleus with one proton and two neutrons is called a triton. The STAR discovery is the first time anyone has observed an antihypertriton.

What made the discovery possible was the central element in the STAR experiment, its large time projection chamber (TPC). The TPC concept was invented at Berkeley Lab by David Nygren; STAR’s TPC was built there and shipped to Brookhaven aboard a giant C-5A cargo plane.

“What a time projection chamber does is allow three-dimensional imaging of the tracks of every particle and its decay products in a single gold-gold collision,” Nu Xu explains. “The tracks are curved by a uniform magnetic field, which allows their original mass and energy to be determined by following them back to their origin. We can reconstruct the ‘topology of decay’ of a chosen event and identify every particle that it produces.”

So far the STAR collaboration has identified about 70 of the strange new antihypertritons, along with about 160 ordinary hypertritons, from 100 million collisions. As more data is collected, the STAR researchers expect to find even heavier hypernuclei and antimatter hypernuclei, including alpha particles, which are the nuclei of helium.

Data Simulation, Storage, and Analysis

To identify this antihypertriton, physicists used supercomputers at NERSC and other research centers to painstakingly sift through the debris of some 100 million collisions. The team also used NERSC’s PDSF system to simulate detector response. These results allowed them to see that all of the charged particles within the collision debris left their mark by ionizing the gas inside RHIC’s time projection chamber, while the antihypertritons revealed themselves through a unique decay signature—the two tracks left by a charged pion and an antihelium-3 nucleus, the latter being heavy and so losing energy rapidly with distance in the gas.

“These simulations were vital to helping us optimize search conditions such as topology of the decay configuration,” says Zhangbu Xu, a physicist at Brookhaven who is one of the STAR collaborators. “By embedding imaginary antimatter in a real collision and optimizing the simulations for the best selection conditions, we were able to find a majority of those embedded particles.”

As a STAR Tier 1 site, the PDSF stores a portion of the raw data from the experiment and all of the analysis-ready data files. The STAR databases at Brookhaven are mirrored to PDSF to provide direct access for PDSF users. STAR currently maintains around 800 TB of data in NERSC’s HPSS archive.

Figure 27. The diagram above is known as the 3D chart of the nuclides. The familiar Periodic Table arranges the elements according to their atomic number, Z, which determines the chemical properties of each element. Physicists are also concerned with the N axis, which gives the number of neutrons in the nucleus. The third axis represents strangeness, S, which is zero for all naturally occurring matter, but could be non-zero in the core of collapsed stars. Antinuclei lie at negative Z and N in the above chart, and the newly discovered antinucleus (red) now extends the 3D chart into the new region of strange antimatter. Image: STAR Collaboration.

Great Expectations

The STAR detector will soon be upgraded, and as the prospects for determining precise ratios for the production of various species of strange hypernuclei and their antimatter counterparts increase, so does the expectation that STAR will be able to shed new light on fundamental questions of physics, astrophysics, and cosmology, including the composition of collapsed stars and the ratio of matter and antimatter in the early universe.

“The strangeness value could be nonzero in the core of collapsed stars,” says Jinhui Chen of the STAR collaboration, a postdoctoral researcher at Kent State University and staff scientist at the Shanghai Institute of Applied Physics, “so the present measurements at RHIC will help us distinguish between models that describe these exotic states of matter.”

“Understanding precisely how and why there’s a predominance of matter over antimatter remains a major unsolved problem of physics,” says Zhangbu Xu. “A solution will require measurements of subtle deviations from perfect symmetry between matter and antimatter, and there are good prospects for future antimatter measurements at RHIC to address this key issue.”

Why there is more matter than antimatter in the universe is one of the major unsolved problems in science. Solving it could tell us why human beings, indeed why anything at all, exists today.

Subtle deviations from perfect symmetry will help explain why there is more matter than antimatter.

A new database helps scientists discover how proteins do the work of life

All organisms from the largest mammals to the smallest microbes are composed of proteins—long, non-branching chains of amino acids. In the human body, proteins play both structural and functional roles, making up muscle tissue and organs, facilitating bone growth, and regulating hormones, in addition to various other tasks.

Scientists know that the key to a protein’s biological function depends on the unique, stable, and precisely ordered three-dimensional shape that it assumes just microseconds after the amino acid chain forms. Biologists call this the “native state,” and suspect that incorrectly folded proteins are responsible for illnesses like Alzheimer’s disease, cystic fibrosis, and numerous cancers. Over the years, NMR spectroscopy and X-ray crystallography experiments have provided a lot of information about the native state structure of most protein folds, but the scientific rules that determine their functions and cause them to fold remain a mystery. Researchers expect that a better understanding of these rules will pave the way for breakthroughs in medical applications, drug design, and biophysics.

This is where Valerie Daggett, professor of Bioengineering at the University of Washington, Seattle, believes supercomputers can help. Over the last six years, Daggett used a total of 12.4 million processor hours at NERSC to perform molecular dynameomics simulations. This technique combines molecular dynamics (the study of molecular and atomic motions) with proteomics (the study of protein structure and function) to characterize the folding and unfolding pathways of proteins. The culmination of this work has been incorporated into a public database, http://www.dynameomics.org/.

“We already have several examples of dynamic behavior linked to function that was discovered through molecular dynamic simulations and not at all evident in the static average experimental structures,” Daggett says in a June 2010 Nature Methods article.14

Project: Molecular Dynameomics

PI: Valerie Daggett, University of Washington, Seattle

Funding: BER, NIH, Microsoft

Computing Resources: NERSC

14 Allison Doerr, “A database of dynamics,” Nature Methods 7, 426 (2010).

Unraveling Proteins with Computer Simulations

Although the Protein Data Bank (PDB) of experimentally derived protein structures has been vital to many important scientific discoveries, these data do not tell the whole story of protein function. These structures are simply static snapshots of proteins in their native, or folded, state. In reality, proteins are constantly moving, and these motions are critical to their function. This is why Daggett’s team wanted to create a complementary database of molecular dynamics structures for representatives of all protein folds, including their unfolding pathways. The Dynameomics Data Warehouse provides an organizing framework, a repository, and a variety of access interfaces for simulation and analysis data (Figure 28).

“With the simulation data stored in an easily queryable structured repository that can be linked to other sources of biological and experimental data, current scientific questions can be addressed in ways that were previously impossible or extremely cumbersome,” writes Marc van der Kamp, a postdoctoral researcher in Daggett’s group, along with a dozen co-authors in a paper featured on the cover of the April 14, 2010 issue of Structure (Figure 29).2

A protein will easily unfold, or denature, when enough heat is applied. This unfolding process follows the same transitions and intermediate structures as folding, except in reverse. To simulate this process computationally, Daggett’s team uses the “in lucem Molecular Dynamics” (ilmm) code, which was developed by David Beck, Darwin Alonso, and Daggett, to solve Newton’s equations of motion for every atom in the protein molecule and surrounding solvent. These simulations map the conformational changes of a protein with atomic-level detail, which can then be tested with experiments.
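Written out schematically (the specific force field used in ilmm is not reproduced here), the equations referred to are Newton’s second law applied to each of the N atoms in the protein and solvent, with forces obtained from a potential energy function U:

    m_i \, \frac{d^{2}\mathbf{r}_i}{dt^{2}} \;=\; -\nabla_{\mathbf{r}_i}\, U(\mathbf{r}_1, \ldots, \mathbf{r}_N),
    \qquad i = 1, \ldots, N

Integrating these equations numerically with femtosecond-scale time steps builds up the nanosecond-scale trajectories described in the next paragraph.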

To determine the trajectories of protein unfolding, Daggett’s team simulates each protein at least six times—once to explore the molecular dynamics of a native state for at least 51 nanoseconds at a temperature of 298 K (77° F), and five more times to study the dynamics of the protein at 498 K (437° F), with two simulations of at least 51 ns and three short simulations of at least 2 ns each.

Daggett’s group applies a careful strategy to select diverse protein folds to analyze. Using three existing protein fold classification systems—SCOP, CATH, and Dali—they identify domains for proteins available in the PDB and cluster them into “metafolds.”3

Figure 28. Overview of the generation and use of molecular dynameomics simulation data, organized through the Dynameomics Data Warehouse. The workflow runs from target selection (fold families, SNPs) through preparation (minimization, adding hydrogens, solvation) to native-state (298 K, 310 K) and denaturing (498 K) simulations, followed by structural data mining and analysis. The repository includes a Structural Library of Intrinsic Residue Propensities (SLIRP). Image: Daggett Group.

2 M. W. van der Kamp, R. D. Schaeffer, A. L. Jonsson, A. D. Scouras, A. M. Simms, A. D. Toofanny, N. C. Benson, P. C. Anderson, E. D. Merkley, S. Rysavy, D. Bromley, D. A. C. Beck, and V. Daggett, “Dynameomics: A comprehensive database of protein dynamics,” Structure 18, 423 (2010).

3 R. Dustin Schaeffer, Amanda L. Jonsson, Andrew M. Simms, and Valerie Daggett, “Generation of a consensus protein domain dictionary,” Bioinformatics 27, 46 (2011).

The team has simulated at least one member from each protein metafold. Although experimental dynamic information is not available for most of their targets, Daggett notes that the simulations do compare well where experimental data are available.

Daggett began running these simulations at NERSC in 2005, when she received a Department of Energy INCITE allocation of 2 million processor hours on the facility’s IBM SP “Seaborg” system. In that first year, she modeled the folding pathways of 151 proteins in approximately 906 simulations. Subsequently her team received additional allocations from DOE’s Office of Biological and Environmental Research, as well as NERSC discretionary time, to continue their successful efforts. As of 2010, the group has completed over 11,000 individual molecular dynamics simulations on more than 2,000 proteins from 218 different organisms.

Their complete database of simulations contains over 340 microseconds of native and unfolding simulations, stored as more than 10^8 individual structures, which is four orders of magnitude larger than the PDB.

“This is a data-mining endeavor to identify similarities and differences between native and unfolded states of proteins across all secondary and tertiary structure types and sequences,” says Daggett. “It is necessary to successfully predict native states of proteins, in order to translate the current deluge of genomic information into a form appropriate for better functional identification of proteins and for drug design.”

Traditional drug design strategies tend to focus on native-state structure alone. But the Dynameomics Data Warehouse allows researchers to systematically search for transient conformations that would allow the binding of drugs or chemical chaperones. These molecules could then stabilize the protein structure without interfering with function. This kind of information is readily available through simulation but not through standard experimental techniques.

Some of the proteins in the Dynameomics Data Warehouse include single-nucleotide polymorphism-containing mutant human proteins implicated in disease. In some cases, the computer simulations provided valuable insights into why certain mutations caused structural disruptions at the active site. The database also includes proteins from thermophilic organisms, which thrive in scorching environments. By comparing the native state and unfolding dynamics of thermophiles to their mesophilic counterparts, which thrive in moderate temperatures, researchers hope to discover the mechanisms that make thermophiles so stable at high temperatures.

“There are approximately 1,695 known non-redundant folds, of which 807 are self-contained autonomous folds, and we have simulated representatives for all 807,” says Daggett. “We couldn’t have come this far without access to DOE resources at NERSC, and with their continued support we will simulate more targets for well-populated metafolds to explore the sequence determinants of dynamics and folding.”

Figure 29. The plane of structures on the bottom shows sample fold-representatives that were simulated. One example of a simulated protein, Protein-tyrosine phosphatase 1B (PDB code 2HNP), is highlighted above. Cover image: Marc W. van der Kamp, Steven Rysavy, and Dennis Bromley. ©2010, Elsevier Ltd.

Dynameomics Data Suggests New Treatment Strategy for Amyloid Diseases

One of the most exciting applications of Dynameomics involves amyloids, the insoluble, fibrous protein deposits that play a role in Alzheimer’s disease, type 2 diabetes, Parkinson’s disease, heart disease, rheumatoid arthritis, and many other illnesses. The Daggett group has mined the simulations of amyloid-producing proteins and found a common structural feature in the intermediate stages between normal and toxic forms. Compounds with physical properties complementary to these structural features were designed on computers and synthesized.

Experiments showed that these custom-designed compounds inhibit amyloid formation in three different protein systems, including those responsible for (1) Alzheimer’s disease, (2) Creutzfeldt-Jakob disease and mad cow disease, and (3) systemic amyloid disease and heart disease. The experimental compounds preferentially bind the toxic forms of these proteins. These results suggest it may be possible to design drugs that can treat as many as 25 amyloid diseases, to design diagnostic tools for amyloid diseases in humans, and to design tests to screen our blood and food supplies. A patent application for this technology is in preparation, and further experiments are being planned.

NISE Program Encourages Innovative Research

The NERSC Initiative for Scientific Exploration (NISE) program provides special allocations of computer time for exploring new areas of research, such as:

• A new research area not covered by the existing ERCAP proposal: this could be a tangential research project or a tightly coupled supplemental research initiative.

• New programming techniques that take advantage of multicore compute nodes by using OpenMP, Threads, UPC or CAF: this could include modifying existing codes, creating new applications, or testing the performance and scalability of multicore programming techniques (a brief illustrative sketch follows this list).

• Developing new algorithms that increase researchers’ scientific productivity, for example by running simulations at a higher scale or by incorporating new physics.
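As a minimal, hypothetical illustration of the OpenMP route mentioned in the second item above (this sketch is not taken from any NERSC user code), a single directive lets the iterations of an existing loop run across the cores of a multicore node:

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    /* Hypothetical sketch: parallelize an existing array-summation loop over
       the cores of one node with an OpenMP reduction. */
    int main(void)
    {
        static double energy[N];
        for (int i = 0; i < N; i++)
            energy[i] = 1.0e-6 * i;          /* stand-in data */

        double total = 0.0;
        #pragma omp parallel for reduction(+:total)
        for (int i = 0; i < N; i++)
            total += energy[i];              /* iterations split across threads */

        printf("threads available: %d, total = %g\n",
               omp_get_max_threads(), total);
        return 0;
    }

Built with an OpenMP-enabled compiler (for example, cc -fopenmp; flags vary by system), the same source runs serially or in parallel depending on the compile flags, which is one reason incremental OpenMP annotation is attractive for existing codes.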

Twenty-seven projects received NISE awards in 2010 (see sidebar). Although many NISE projects are exploratory, some have already achieved notable scientific results. A few of those are described briefly on the following pages.

Modeling the Energy Flow in the Ozone Recombination Reaction

The ozone-oxygen cycle is the process by which ozone (O3) and oxygen (O2) are continuously converted back and forth from one to the other in Earth’s stratosphere. This chemical process converts solar ultraviolet radiation into heat and greatly reduces the amount of harmful UV radiation that reaches the lower atmosphere. The NISE project “Modeling the Energy Flow in the Ozone Recombination Reaction,” led by Dmitri Babikov of Marquette University, aims to resolve the remaining mysteries of what determines the isotopic composition of atmospheric ozone and to investigate unusual isotope effects in other atmospheric species. These effects are poorly understood, mainly due to computational difficulties associated with the quantum dynamics treatment of polyatomic systems.

Using a mixed quantum-classical theory of collisional energy transfer,17,18 Ivanov and Babikov have examined the mechanism by which the van der Waals states of ozone contribute to the ozone-forming recombination reaction (Figure 30). This method of treating the dynamics of the energy transfer allows the incorporation of quantum zero-point energy, scattering resonances, and symmetry—the features essential for understanding the isotope effect.

Explaining the anomalous isotope effects in O3, NO2, and CO2 will significantly improve our understanding of their production, chemistry, lifetime, and loss in the atmosphere. That knowledge will help to identify and remove pollution sources as well as monitor the ozone hole, with the possible impact of enhancing the security of all life on the planet. It will allow the isotopic composition of oxygen to be used as a reliable probe of its source and history and provide information for studying atmospheric chemistry, global climate change, atmospheres of

Figure 30. Energy transfer functions expressed as the differential (over energy) cross sections. Three frames correspond to three initial states (labeled #51, #53, and #54). Transition and stabilization cross sections are shown as black circles and red dots, respectively. Presence of the van der Waals states in the range −150

17 M. V. Ivanov and D. Babikov, "Mixed quantum-classical theory for the collisional energy transfer and the rovibrational energy flow: Application to ozone stabilization," J. Chem. Phys. (in press).
18 M. V. Ivanov and D. Babikov, "Collisional stabilization of van der Waals states of ozone," submitted to J. Chem. Phys.

... at the high intensities required for these applications. The project "First Principles Modeling of Charged-Defect-Assisted Loss Mechanisms in Nitride Light Emitters," led by Emmanouil Kioupakis of the University of California, Santa Barbara, is investigating why nitride light-emitting devices lose their efficiency at high intensities and suggesting ways to fix this problem.

Several mechanisms have been blamed for this efficiency reduction, including carrier leakage, recombination at dislocations, and Auger recombination, but the underlying microscopic mechanism remains an issue of intense debate. Using atomistic first-principles calculations, Kioupakis and his collaborators have shown that charged-defect scattering is not the major mechanism that contributes to the efficiency degradation. Instead, the efficiency droop is caused by indirect Auger recombination, mediated by electron-phonon coupling and alloy scattering (Figure 31).19, 20, 21 Due to the participation of phonons in the Auger process, increasing temperature contributes to the performance degradation. Moreover, the indirect Auger coefficients increase as a function of indium mole fraction and contribute to the efficiency reduction for devices operating at longer wavelengths (the "green gap" problem). By identifying the origin of the droop, these results provide a guide to addressing the efficiency issues in nitride LEDs and to engineering highly efficient solid-state lighting.

Figure 31. The phonon- and charged-defect-assisted absorption cross-section spectra by free electrons and holes of gallium nitride at room temperature are shown in (a) and (b), respectively. The direct absorption spectrum is plotted in (c) for holes and (d) for electrons. The four lines correspond to absorption by electrons or holes, for light polarized either parallel or perpendicular to the c axis. Image: E. Kioupakis et al., ref. 19.

Electron Transport for Electronic, Spintronic, and Photovoltaic Devices

With the increasing ability to lithographically create atomic-scale devices, a full-scale quantum transport simulation is becoming necessary for efficiently designing nanoscale electronic, spintronic, and photovoltaic devices. At the nanoscale, material characteristics change significantly from the conventional due to pronounced effects of geometry and extreme quantum confinement in various dimensions. On the other hand, a full quantum mechanical simulation of the transport phenomena in nanostructured materials, with minimum approximations, is extremely difficult computationally.

The NISE project "Electron Transport in the Presence of Lattice Vibrations for Electronic, Spintronic, and Photovoltaic Devices," led by Sayeef Salahuddin of the University of California, Berkeley, aims to massively parallelize the NEST code for solving quantum transport problems, including spin, using the nonequilibrium Green's function formalism. NEST can be used to simulate transport in a wide variety of nanostructures where quantum effects are of significant importance, for example, nanoscale transistors and memory devices, solar cells, and molecular electronics.

Perfect scaling up to 8,192 processor cores has been achieved. This massive parallelization has enabled the researchers to solve problems that had been deemed intractable before. This resulted in multiple publications in leading journals, including a cover story in the July 19, 2010 issue of Applied Physics Letters (Figure 32).22

19 E. Kioupakis, P. Rinke, A. Schleiffe, F. Bechstedt, and C. G. Van de Walle, "Free-carrier absorption in nitrides from first principles," Phys. Rev. B 81, 241201(R) (2010).
20 E. Kioupakis, P. Rinke, and C. G. Van de Walle, "Determination of internal loss in nitride lasers from first principles," Appl. Phys. Express 3, 082101 (2010).
21 E. Kioupakis, P. Rinke, K. T. Delaney, and C. G. Van de Walle, "Indirect Auger recombination as a cause of efficiency droop in nitride LEDs," submitted (2011).
22 Youngki Yoon and Sayeef Salahuddin, "Barrier-free tunneling in a carbon heterojunction transistor," Applied Physics Letters 97, 033102 (2010).

Figure 32. The illustration shows the “local density of states” (the quantum mechanically viable states) in the carbon heterojunction transistor under a gate bias when the transistor is turning ON. The black regions denote the band gap. The yellow regions show where a Klein tunneling is taking place, thereby creating the unique situation that increases energy efficiency in this transistor beyond classical limits. The inset shows the structure of the device and current-voltage characteristics. Image: ©2011, American Institute of Physics.

The starting point for that study was an experimental demonstration that a graphene nanoribbon (GNR) can be obtained by unzipping a carbon nanotube (CNT). This makes it possible to fabricate all-carbon heterostructures that have a unique interface between a CNT and a GNR. By performing a self-consistent nonequilibrium Green's function based calculation on an atomistically defined structure, Salahuddin and collaborator Youngki Yoon demonstrated that such a heterojunction may be utilized to obtain a unique transistor operation. They showed that such a transistor may reduce energy dissipation below the classical limit while not compromising speed, thus providing an alternative route toward ultralow-power, high-performance carbon-heterostructure electronics. This means significantly lower need for electrical power for plugged-in electronics (currently estimated to be 3% of total national electricity needs), as well as a significant increase in battery life for all portable devices.

NERSC Initiative for Scientific Exploration (NISE) 2010 Awards

Project: Modeling the Energy Flow in the Ozone Recombination Reaction. PI: Dmitri Babikov, Marquette University. NISE award: 260,000 hours. Research areas/applications: Atmospheric chemistry, chemical physics, climate science, geochemistry.

Project: Bridging the Gaps Between Fluid and Kinetic Magnetic Reconnection Simulations in Large Systems. PI: Amitava Bhattacharjee, University of New Hampshire. NISE award: 1,000,000 hours. Research areas/applications: Plasma physics, astrophysics, fusion energy.

Project: Decadal Predictability in CCSM4. PIs: Grant Branstator and Haiyan Teng, National Center for Atmospheric Research. NISE award: 1,600,000 hours. Research areas/applications: Climate science.

Project: 3-D Radiation/Hydrodynamic Modeling on Parallel Architectures. PI: Adam Burrows, Princeton University. NISE award: 5,000,000 hours. Research areas/applications: Astrophysics.

Project: Dependence of Secondary Islands in Magnetic Reconnection on Dissipation Model. PI: Paul Cassak, West Virginia University. NISE award: 200,000 hours. Research areas/applications: Plasma physics, astrophysics, fusion energy.

Project: Simulation of Elastic Properties and Deformation Behavior of Lightweight Protection Materials under High Pressure. PI: Wai-Yim Ching, University of Missouri, Kansas City. NISE award: 1,725,000 hours. Research areas/applications: Materials science, protective materials.

Project: Molecular Mechanisms of the Enzymatic Decomposition of Cellulose Microfibrils. PI: Jhih-Wei Chu, University of California, Berkeley. NISE award: 2,000,000 hours. Research areas/applications: Chemistry, biofuels.

Project: Developing an Ocean-Atmosphere Reanalysis for Climate Applications (OARCA) for the Period 1850 to Present. PI: Gil Compo, University of Colorado, Boulder. NISE award: 2,000,000 hours. Research areas/applications: Climate science.

Project: Surface Input Reanalysis for Climate Applications 1850–2011. PI: Gil Compo, University of Colorado at Boulder. NISE award: 1,000,000 hours. Research areas/applications: Climate science.

NISE 2010 Awards (continued)

Project: Pushing the Limits of the GW/BSE Method for Excited-State Properties of Molecules and Nanostructures. PI: Jack Deslippe, University of California, Berkeley. NISE award: 1,000,000 hours. Research areas/applications: Materials science, protective materials.

Project: Reaction Pathways in Methanol Steam Reformation. PI: Hua Guo, University of New Mexico. NISE award: 200,000 hours. Research areas/applications: Chemistry, hydrogen fuel systems.

Project: Warm Dense Matter Simulations Using the ALE-AMR Code. PI: Enrique Henestroza, Lawrence Berkeley National Laboratory. NISE award: 450,000 hours. Research areas/applications: Plasma physics, fusion energy.

Project: Modeling the Dynamics of Catalysts with the Adaptive Kinetic Monte Carlo Method. PI: Graeme Henkelman, University of Texas, Austin. NISE award: 1,700,000 hours. Research areas/applications: Chemistry, alternative energy.

Project: First-Principles Study of the Role of Native Defects in the Kinetics of Hydrogen Storage Materials. PI: Khang Hoang, University of California, Santa Barbara. NISE award: 600,000 hours. Research areas/applications: Materials science, hydrogen fuel.

Project: ITER Rapid Shutdown Simulation. PI: Valerie Izzo, General Atomics. NISE award: 1,200,000 hours. Research areas/applications: Fusion energy.

Project: First-Principles Modeling of Charged-Defect-Assisted Loss Mechanisms in Nitride Light Emitters. PI: Emmanouil Kioupakis, University of California, Santa Barbara. NISE award: 900,000 hours. Research areas/applications: Materials science, energy-efficient lighting.

Project: Thermodynamic, Transport, and Structural Analysis of Geosilicates using the SIESTA First Principles Molecular Dynamics Software. PI: Ben Martin, University of California, Santa Barbara. NISE award: 110,000 hours. Research areas/applications: Geosciences.

Project: Computational Prediction of Protein-Protein Interactions. PI: Harley McAdams, Stanford University. NISE award: 1,880,000 hours. Research areas/applications: Biological science.

Project: Light Propagation in Nanophotonics Structures Designed for High-Brightness Photocathodes. PIs: Karoly Nemeth and Katherine Harkay, Argonne National Laboratory. NISE award: 1,300,000 hours. Research areas/applications: Materials science, high-brightness electron sources.

Project: International Linear Collider (ILC) Damping Ring Beam Instabilities. PI: Mauro Pivi, Stanford Linear Accelerator Center. NISE award: 250,000 hours. Research areas/applications: Accelerator physics.

Project: Thermonuclear Explosions from White Dwarf Mergers. PI: Tomasz Plewa, Florida State University. NISE award: 500,000 hours. Research areas/applications: Astrophysics.

Project: Beam Delivery System Optimization of Next-Generation X-Ray Light Sources. PIs: Ji Qiang, Robert Ryne, and Xiaoye Li, Lawrence Berkeley National Laboratory. NISE award: 1,000,000 hours. Research areas/applications: Accelerator physics.

Project: Electron Transport in the Presence of Lattice Vibrations for Electronic, Spintronic, and Photovoltaic Devices. PI: Sayeef Salahuddin, University of California, Berkeley. NISE award: 600,000 hours. Research areas/applications: Materials science, ultra-low-power computing.

Project: High Resolution Climate Simulations with CCSM. PIs: Warren Washington, Jerry Meehl, and Tom Bettge, National Center for Atmospheric Research. NISE award: 3,525,000 hours. Research areas/applications: Climate science.

Project: Abrupt Climate Change and the Atlantic Meridional Overturning Circulation. PI: Peter Winsor, University of Alaska, Fairbanks. NISE award: 2,000,000 hours. Research areas/applications: Climate science.

Project: Modeling Plasma Surface Interactions for Materials under Extreme Environments. PI: Brian Wirth, University of California, Berkeley. NISE award: 500,000 hours. Research areas/applications: Fusion energy.

Project: Models for the Explosion of Type Ia Supernovae. PI: Stan Woosley, University of California, Santa Cruz. NISE award: 2,500,000 hours. Research areas/applications: Astrophysics.

NERSC Users' Awards and Honors

Every year a significant number of NERSC users are honored for their scientific discoveries and achievements. Listed below are some of the most prominent awards given to NERSC users in 2010.

National Medal of Science: Warren Washington, National Center for Atmospheric Research

Fellows of the American Academy of Arts and Sciences: Andrea Louise Bertozzi, University of California, Los Angeles; Adam Seth Burrows, Princeton University; Gary A. Glatzmaier, University of California, Santa Cruz

Member of the National Academy of Sciences: Gary A. Glatzmaier, University of California, Santa Cruz

Fellows of the American Association for the Advancement of Science: Liem Dang, Pacific Northwest National Laboratory; Alenka Luzar, Virginia Commonwealth University; Gregory K. Schenter, Pacific Northwest National Laboratory; Yousef Saad, University of Minnesota; Chris G. Van de Walle, University of California, Santa Barbara

Fellows of the American Physical Society (APS): Mark Asta, University of California, Davis; David Bruhwiler, Tech-X Corporation; Liem Dang, Pacific Northwest National Laboratory; Francois Gygi, University of California, Davis; Julius Jellinek, Argonne National Laboratory; En Ma, Johns Hopkins University; Barrett Rogers, Dartmouth College; Philip Snyder, General Atomics; Ramona Vogt, Lawrence Livermore National Laboratory; Fuqiang Wang, Purdue University; Martin White, University of California, Berkeley and Lawrence Berkeley National Laboratory; Zhangbu Xu, Brookhaven National Laboratory

APS James Clerk Maxwell Prize for Plasma Physics: James Drake, University of Maryland

APS John Dawson Award for Excellence in Plasma Physics Research: Eric Esarey, Lawrence Berkeley National Laboratory; Cameron Geddes, Lawrence Berkeley National Laboratory; Carl Schroeder, Lawrence Berkeley National Laboratory

American Chemical Society (ACS) Ahmed Zewail Prize in Molecular Sciences: William H. Miller, University of California, Berkeley and Lawrence Berkeley National Laboratory

Peter Debye Award in Physical Chemistry: George C. Schatz, Northwestern University

Fellow of the Materials Research Society: Chris Van de Walle, University of California, Santa Barbara

IEEE Sidney Fernbach Award: James Demmel, University of California, Berkeley and Lawrence Berkeley National Laboratory

Fellows of the Society for Industrial and Applied Mathematics (SIAM): Andrea Louise Bertozzi, University of California, Los Angeles; Yousef Saad, University of Minnesota

SIAM Junior Scientist Prize: Kamesh Madduri, Lawrence Berkeley National Laboratory

Academician of Academia Sinica: Inez Yau-Sheung Fung, University of California, Berkeley and Lawrence Berkeley National Laboratory

Presidential Early Career Award for Scientists and Engineers (PECASE): Cecilia R. Aragon, Lawrence Berkeley National Laboratory; Joshua A. Breslau, Princeton Plasma Physics Laboratory

International Council for Industrial and Applied Mathematics (ICIAM) Pioneer Prize: James Demmel, University of California, Berkeley and Lawrence Berkeley National Laboratory

U.S. Air Force Young Investigator Research Program: Per-Olof Persson, University of California, Berkeley and Lawrence Berkeley National Laboratory

Gordon Bell Prize: Aparna Chandramowlishwaran, Lawrence Berkeley National Laboratory; Logan Moon, Georgia Institute of Technology

NASA Public Service Group Achievement Award: Julian Borrill, Lawrence Berkeley National Laboratory; Christopher Cantalupo, Lawrence Berkeley National Laboratory; Theodore Kisner, Lawrence Berkeley National Laboratory

Association for Computing Machinery (ACM) Distinguished Scientists: Wu-chun Feng, Virginia Polytechnic Institute and State University; Kesheng (John) Wu, Lawrence Berkeley National Laboratory

The NERSC Center

NERSC is the primary computing center for the DOE Office of Science, serving approximately 4,000 users, hosting 400 projects, and running 700 codes for a wide variety of scientific disciplines. NERSC has a tradition of providing systems and services that maximize the scientific productivity of its user community. NERSC takes pride in its reputation for the expertise of its staff and the high quality of services delivered to its users. To maintain its effectiveness, NERSC proactively addresses new challenges in partnership with the larger high performance computing (HPC) community.

The year 2010 was a transformational year for NERSC, with an unprecedented number of new systems and upgrades, including the acceptance of Hopper Phase 1; the installation of Hopper Phase 2, the Carver and Magellan clusters, the Euclid analytics server, and the Dirac GPU testbed; upgrades of the NERSC Global Filesystem and the tape archive; and a major upgrade to the Oakland Scientific Facility power supply. We also made significant improvements in the Center’s energy efficiency.

At the same time, NERSC staff were developing innovative ways to improve the usability of HPC systems, to increase user productivity, and to explore new computational models that can make the most effective use of emerging computer architectures.

The following pages describe some of the ways NERSC is meeting current challenges and preparing for the future.

Hopper Powers Petascale Science

The installation of NERSC's newest supercomputer, Hopper (Figure 1), marks a milestone: it is NERSC's first petascale computer, posting a performance of 1.05 petaflops (quadrillions of calculations per second) running the Linpack benchmark. That makes Hopper, a 153,216-processor-core Cray XE6 system, the fifth most powerful supercomputer in the world and the second most powerful in the United States, according to the November 2010 edition of the TOP500 list, the definitive ranking of the world's top computers.

NERSC selected the Cray system in an open competition, in which vendors ran NERSC application benchmarks from several scientific domains that use a variety of algorithmic methods. The Cray system showed the best application performance per dollar and per megawatt of power. It features external login nodes and an external filesystem for increased functionality and availability.

Hopper was installed in two phases. The first phase, a 5,344-core Cray XT5, was installed in October 2009 and entered production use in March 2010. Phase 1 helped NERSC staff optimize the external node architecture. "Working out the kinks in Phase 1 will ensure a more risk-free deployment when Phase 2 arrives," said Jonathan Carter, who led the Hopper system procurement as head of NERSC's User Services Group. "Before accepting the Phase 1 Hopper system, we encouraged all 300 science projects computing at NERSC to use the system during the pre-production period to see whether it could withstand the gamut of scientific demands that we typically see."

The new external login nodes on Hopper offer users a more responsive environment. Compared to NERSC's Cray XT4 system, called Franklin, the external login nodes have more memory and, in aggregate, have more computing power. This allows users to compile applications faster and run small post-processing or visualization jobs directly on the login nodes without interference from other users. Even with dedicated testing and maintenance times, utilization of Hopper-1 from December 15 to March 1 reached 90%, and significant scientific results were produced even during the testing period, including high-resolution simulations of magnetic reconnection, and simulations of peptide aggregation into amyloid fibrils.

Hopper Phase 2, the Cray XE6, was fully installed in September 2010 and opened for early use and testing late in the year. It has 6,384 nodes, with two 12-core AMD "Magny-Cours" chips per node, and features Cray's high-bandwidth, low-latency Gemini interconnect. With 168 GB/sec routing capacity and internode latency on the order of 1 microsecond, Gemini supports millions of MPI messages per second, as well as OpenMP, Shmem, UPC, and Co-Array Fortran. The innovative ECOphlex liquid cooling technology increases Hopper's energy efficiency.

Because Hopper has 2 PB of disk space and 70 GB/sec of bandwidth on the external filesystem, users with extreme data demands are seeing fewer bottlenecks when they move their data in and out of the machine. Additionally, the availability of dynamically loaded libraries enables even more applications to run on the system and adds support for popular frameworks like Python. This feature helps ensure that the system is optimized for scientific productivity.

Figure 1. NERSC’s newest supercomputer is named after American computer scientist Grace Murray Hopper, a pioneer in the field of software development and programming languages who created the first compiler. She was a champion for increasing the usability of computers, understanding that their power and reach would be limited unless they were made more user-friendly. Cabinet façade design by Caitlin Youngquist, Berkeley Lab Creative Services Office. Grace Hopper photo courtesy of the Hagley Museum and Library, Wilmington, Delaware.

Other New Systems and Upgrades

Carver and Magellan Clusters

The Carver and Magellan clusters (Figure 2), built on liquid-cooled IBM iDataPlex technology, were installed in March 2010. Carver, named in honor of American scientist and inventor George Washington Carver, entered production use in May, replacing NERSC's Opteron Jacquard cluster and IBM Power5 Bassi system, which were both decommissioned at the end of April.

Carver contains 800 Intel Nehalem quad-core processors, or 3,200 cores. With a theoretical peak performance of 34 teraflops, Carver has 3.5 times the capacity of Jacquard and Bassi combined, yet uses less space and power. The system's 400 compute nodes are interconnected by the latest 4X QDR InfiniBand technology, meaning that 32 GB/s of point-to-point bandwidth is available for high performance message passing and I/O. Carver is one of the first platforms of this scale to employ four Voltaire switches, allowing information to quickly transfer between the machine and its external filesystem.

NERSC staff have configured Carver's batch queue and scratch storage to handle a range of large and small jobs, making Carver NERSC's most popular system. A queue that allows use of up to 32 processors now has the option to run for 168 hours, or seven straight days.

Early scientific successes on Carver included calculations of the atomic interactions of titanium, other transition metals, and their alloys; and computational modeling of data from x-ray crystallography and small-angle x-ray scattering that reveals how protein atoms interact inside of a cell.

The Magellan cluster, with hardware identical to Carver's but different software, is a testbed system and will be discussed below on page 68.

Figure 2. Top view of the Carver and Magellan clusters showing trays of InfiniBand cables. Photo: Roy Kaltschmidt, Berkeley Lab Public Affairs.

Euclid Analytics Server and Dirac GPU Testbed

The Euclid analytics server and Dirac GPU testbed were installed in May 2010. Euclid, named in honor of the ancient Greek mathematician, is a Sun Microsystems Sunfire x4640 SMP. Its single node contains eight six-core Opteron 2.6 GHz processors, with all 48 cores sharing the same 512 GB of memory. The system's theoretical peak performance is 499.2 Gflop/s. Euclid supports a wide variety of data analysis and visualization applications.

Dirac, a 50-node hybrid CPU/GPU cluster with Intel Nehalem and NVIDIA Fermi chips, is an experimental system dedicated to exploring the performance of scientific applications on a GPU (graphics processing unit) architecture. Research conducted on Dirac is discussed on page 71.

NERSC Global Filesystem and HPSS Tape Archive Upgrades

"Historically, each new computing platform came with its own distinct storage hardware and software, creating isolated storage 'islands,'" wrote NERSC Storage Systems Group lead Jason Hick in HPC Source, a supplement to Scientific Computing magazine. "Users typically had different filesystem quotas on each machine and were responsible for keeping track of their data, from determining where the data was to transferring files between machines."1

In recent years, however, NERSC began moving from this "island" model toward global storage resources. The center was among the first to implement a shared filesystem, called the NERSC Global Filesystem (NGF), which can be accessed from any of NERSC's major computing platforms. This service facilitates file sharing between platforms, as well as file sharing among NERSC users working on a common project. This year, to allow users to access nearly all their data regardless of the system they log into, NGF was expanded from project directories to include scratch and home directories as well. (Project directories provide a large-capacity file storage resource for groups of users; home directories are used for permanent storage of source code and other relatively small files; and scratch directories are used for temporary storage of large files.) With these tools, users spend less time moving data between systems and more time using it.

Another aspect of NERSC's storage strategy is a multi-year effort to improve the efficiency of the High Performance Storage System (HPSS) tape archive by upgrading to higher-capacity media. In just the past year, the number of tapes has been reduced from 43,000 to 28,000 (Figure 3), while still providing capacity for the continuing exponential growth in data (Figure 4). Disk cache capacity and bandwidth have been doubled, and NERSC has invested in HPSS client software improvements to accommodate future increases.

Figure 3. Number of HPSS tapes, July 2009 to August 2010.

Figure 4. Cumulative storage by month and system, January 2000 to August 2010, in terabytes.

1 Jason Hick, "Big-picture storage approach," HPC Source, Summer 2010, pp. 5–6.

Innovations to Increase NERSC's Energy Efficiency

In July 2010, NERSC scheduled a four-day, center-wide outage for installation of a 3 megawatt power supply upgrade to the computer room. The total 9 megawatt capacity was needed to accommodate Hopper Phase 2 along with existing and future systems.

But perhaps more important than the increased power supply were NERSC's innovative solutions to improve energy efficiency, which has become a major issue for high performance computing and data centers. The energy consumed by data center servers and related infrastructure equipment in the United States and worldwide doubled between 2000 and 2005 and is continuing to grow. A 2007 Environmental Protection Agency report to Congress estimated that in 2006, data centers consumed about 1.5 percent of total U.S. electricity consumption, equivalent to 5.8 million average households, and that federal servers and data centers accounted for 10 percent of that figure.

Cost and carbon footprint are critical concerns, but performance is suffering, too. In an interview in HPC Source, NERSC Director Kathy Yelick said, "The heat, even at the chip level, is limiting processor performance. So, even from the innards of a computer system, we are worried about energy efficiency and getting the most computing with the least energy."2

Two innovative projects to increase energy efficiency at NERSC are attracting attention throughout the HPC community: the innovative configuration and cooling system for the Carver cluster, and the extensive energy monitoring system now operating in the NERSC machine room.

Carver Installation Saves Energy, Space, Water

NERSC staff configured the IBM iDataplex system Carver in a manner so efficient that, in some places, the cluster actually cools the air around it. By setting row pitch at 5 ft, rather than the standard 6 ft, and reusing the 65° F water that comes out of the Franklin Cray XT4 to cool the system, the team was able to reduce cooling costs by 50 percent and to use 30 percent less floor space. NERSC collaborated with vendor Vette Corporation to custom-design a cooling distribution unit (CDU) for the installation.

By narrowing row pitch to 5 ft from cabinet-front to cabinet-front (a gap of only 30 inches between rows), the Carver team was able to use only one "cold row" to feed the entire installation. (A standard installation would require cooling vents along every other row.) At such a narrow pitch, the IBM iDataplex's transverse cooling architecture recirculates cool air from the previous row while the air is still at a low temperature. When the air passes through Carver's water-cooled back doors (the cool water coming from Franklin), it is further cooled and passed to the next row. When the air finally exits the last row of Carver, it can be several degrees cooler than when it entered.

Using a design concept from NERSC, Vette Corporation custom-designed a CDU that reroutes used water coming out of Franklin back into Carver. Water from the building's chillers enters Franklin's cooling system at 52° F. The water exits the system at about 65° F. Sending the water back to be chilled at such low temperatures is inefficient. Instead, the custom-designed CDU increases efficiency by rerouting the water through Carver to pick up more heat, sending water back to the chillers at 72° F and enabling one CDU to cool 250 kW of computing equipment.

A case study published by IBM, titled "NERSC creates an ultra-efficient supercomputer," observed:

The sheer size of the rear doors, which measure four feet by seven feet, means that the air can move more slowly through them and has more time to exchange heat with the water. This in turn enables slower fans, dramatically reducing the energy wasted as noise, and making the Carver cluster the quietest place in the machine room.

NERSC has shared these innovations with the wider HPC community and with IBM, which now uses NERSC techniques in new installations; and Vette is working to turn this custom design into a commercial product.

2 Mike May, "Sensing the future of greener data centers," HPC Source, Autumn 2010, pp. 6-8.

Monitoring Energy Across the Center to Improve Efficiency

NERSC has instrumented its machine room with state-of-the-art wireless monitoring technology from SynapSense to study and optimize energy use. The center has installed 992 sensors, which gather information on variables important to machine room operation, including air temperature, pressure, and humidity. This data is collected by NERSC's SynapSense system software, which generates heat, pressure, and humidity maps (Figure 5).

"Just a glance at one of these maps tells you a lot," says Jim Craw, NERSC's newly appointed Risks and Energy Manager. "If you start developing hot or cold spots, you're getting an unambiguous visual cue that you need to investigate and/or take corrective action."

The system can also calculate the center's power usage effectiveness, an important metric that compares the total energy delivered to a data center with that consumed by its computational equipment.

Responding quickly to accurate machine room data not only enhances cooling efficiency, it helps assure the reliability of center systems. For example, after cabinets of the decommissioned Bassi system were shut down and removed, cold air pockets developed near Franklin's air handling units. The SynapSense system alerted administrators, who were able to quickly adjust fans and chillers in response to these changes in airflow and room temperatures before Franklin was negatively affected. The monitoring system also revealed changes in machine room airflow following the power upgrade. Repositioning some floor tiles and partitions corrected the airflow and rebalanced air temperatures.

In addition to the SynapSense sensors, NERSC's two Cray systems have their own internal sensors; and sensor readings from two different chiller plants and air handlers come in on their own separate systems as well. NERSC staff created several applications to bring the Cray and SynapSense data together, and work continues on integrating the others.

Future energy efficiency projects include controlling water flow and fan speeds within the air handling units based on heat load, and placing curtains or "top hats" around systems to optimize the air flows in their vicinity.
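The power usage effectiveness metric mentioned above is simply the ratio of the total energy delivered to the data center to the energy consumed by the computing equipment itself; a value of 1.0 would mean no cooling or distribution overhead at all. The numbers in this minimal sketch are made up for illustration and are not NERSC measurements:

    /* pue.c -- power usage effectiveness, with purely illustrative numbers. */
    #include <stdio.h>

    int main(void)
    {
        double total_facility_kw = 4500.0;  /* assumed: total power delivered to the room */
        double it_equipment_kw   = 3400.0;  /* assumed: power drawn by the computing systems */
        printf("PUE = %.2f\n", total_facility_kw / it_equipment_kw);  /* about 1.32 here */
        return 0;
    }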

Figure 5. NERSC's energy monitoring system generates heat maps that can be used to look for areas where efficiency can be improved.

Improving HPC System Usability with External Services

Feedback from NERSC's Franklin users has led NERSC and Cray to jointly develop configuration improvements for both Franklin and Hopper that improve their usability and availability. These improvements, called external services, are now available on all Cray XE6 systems, and include external login nodes, reconfigurable head nodes (also called machine-oriented miniserver, or MOM, nodes), and an external global filesystem.

While NERSC users have made great use of the Franklin Cray XT4 system, the large number and variety of users, projects, applications, and workflows running at NERSC exposed several issues tied to the fact that almost all user-level serial processing (e.g., compiling, running analysis, pre- or post-processing, file transfer to or from other systems) is confined to a small number of shared service nodes. These nodes have limited memory (8 GB), no swap disk, and older processor technology. NERSC user applications had often run the Franklin service nodes out of memory, primarily due to long and involved application compilation, analytics applications, and file transfers. In the Franklin configuration, it was hard for NERSC to increase the number of login nodes as demand for this service increased.

During the contract negotiation for the Hopper system, NERSC staff worked with Cray to design, configure, and support eight external login nodes, located outside of the Hopper Gemini interconnect; they have more memory (128 GB), more cores (16), and more disk than the original Franklin login nodes. One external login node was also added to Franklin itself.

Head nodes are the nodes that launch parallel jobs and run the serial parts of these jobs. Initially on Franklin this function ran on the login nodes, and large interactive jobs could cause congestion and batch launch failures. On Franklin the solution was to divide the servers into a login pool and a head node pool. On Hopper, the hardware for the head nodes is the same as the hardware for the compute nodes, allowing compute nodes to be reconfigured into head nodes and vice versa at boot time. NERSC has initially deployed a higher ratio of head nodes to compute nodes on Hopper than on Franklin. This configuration provides better support for complex workflows. As a result, NERSC's "power login" users now have the capabilities they need, and they are no longer disruptive to other users.

In addition to external login nodes, login node balancers have improved interactive access to NERSC systems. Assigning login nodes to incoming interactive sessions in the traditional round-robin fashion can lead to congestion, or worse, the perception that an entire system is down when the user lands on a bad login node or one that has been taken offline. Load balancers now intercept incoming sessions and distribute them to the least-loaded active node. Also, staff can now take down nodes at will for maintenance, and the load balancer automatically removes out-of-service nodes from the rotation and redirects users attempting to connect to a cached (but offline) node to an active one. As a result, users no longer confuse a downed node with a downed system.

While load balancers are commonly used for web and database applications (where sessions tend to be extremely short and the amount of data transferred quite small), these devices have not typically been used in a scientific computing context (where connections are long-lived and data transfers large). To take advantage of this technology, NERSC staff worked closely with vendor Brocade. Together they conceived, adapted, and implemented three improvements to support HPC environments: (1) support for 10 GB/s connections, needed for HPC I/O; (2) support for jumbo frames, needed for large data sets; and (3) support for long-lasting login sessions. Brocade has integrated these innovations into their product line, making them available to other HPC centers.

Availability of files has also been improved. When Franklin was first acquired, it could not be integrated into the NERSC Global Filesystem (NGF), which uses GPFS software, because Franklin's I/O nodes only ran the Lustre filesystem software. As a result, files on Franklin were not accessible from other NERSC systems. So NERSC worked with Cray to test, scale, and tune the Data Virtualization Services (DVS) software that Cray had purchased and developed for the solution, and in 2010 the DVS software went into production. Now NGF hosts a global project filesystem that allows users to share files between systems and among diverse collaborators. In addition, NERSC introduced a GPFS-based global home filesystem that provides a convenient way for users to access source files, input files, configuration files, etc., regardless of the platform the user is logged onto; and a global scratch filesystem that can be used to temporarily store large amounts of data and access it from any system. When user applications load one of these dynamic shared libraries, the compute nodes use DVS to access and cache the libraries or data.
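The login-node balancing behavior described above (skip nodes that are out of service, then send the next session to the least-loaded node that remains) can be sketched in a few lines. This is a conceptual illustration only, not Brocade's product logic or NERSC's actual configuration:

    /* least_loaded.c -- conceptual sketch of least-loaded login-node selection. */
    #include <stdio.h>

    struct node { const char *name; int sessions; int in_service; };

    /* Return the index of the in-service node with the fewest sessions, or -1 if none. */
    static int pick_node(const struct node *nodes, int n)
    {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (!nodes[i].in_service)
                continue;                         /* down for maintenance or failed: skip */
            if (best < 0 || nodes[i].sessions < nodes[best].sessions)
                best = i;
        }
        return best;
    }

    int main(void)
    {
        struct node login[] = {
            { "login1", 12, 1 },
            { "login2",  4, 1 },
            { "login3",  0, 0 },   /* offline: never selected, so users never land on it */
            { "login4",  7, 1 },
        };
        int i = pick_node(login, 4);
        printf("route next session to %s\n", i >= 0 ? login[i].name : "no node available");
        return 0;
    }

Because offline nodes are simply filtered out of the candidate set, taking a node down for maintenance is invisible to users, which is the effect the text describes.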

Taken together, the benefit of these external services (Figure 6) is improved usability and availability: users do not have to worry about whether Hopper is up or down, because filesystem and login nodes and software are outside the main system. So, users can log in, access files, compile jobs, submit jobs, and analyze data even if the main system is down.

Figure 6. Schematic of external nodes on Hopper.

Providing New Computational Models

Magellan Cloud Computing Testbed and Evaluation

Cloud computing can be attractive to scientists because it can give them on-demand access to compute resources and lower wait times than the batch-scheduled, high-utilization systems at HPC centers. Clouds can also offer custom-built environments such as the software stack used with their scientific code. And some clouds have implemented data programming models for data intensive computing such as MapReduce.

With American Recovery and Reinvestment Act (ARRA) funding, NERSC has deployed the Magellan science cloud testbed for the DOE Office of Science. It leverages NERSC experience in provisioning a specialized cluster (PDSF) for the high energy and nuclear physics community. NERSC is focused on understanding what applications map well to existing and emerging commercial solutions, and what aspects of cloud computing are best provided at HPC centers like NERSC.

To date we have demonstrated on-demand access to cycles for the DOE Joint Genome Institute (JGI) with a configuration using a 9 GB network provided by ESnet and 120 nodes from the Magellan cluster. This arrangement allows biological data to remain at JGI even though the compute servers are at NERSC. One early success on this cluster comes from a JGI team using Hadoop to remove errors from 5 billion reads of next-generation DNA sequence data.

We have also deployed a MapReduce cluster running Hadoop, Apache's open-source implementation of MapReduce, which includes both a framework for running MapReduce jobs as well as the Hadoop Distributed File System (HDFS), which provides a locality-aware, fault-tolerant file system to support MapReduce jobs. NERSC is working with targeted communities such as bioinformatics to investigate how applications like BLAST perform inside the MapReduce framework. Other users are using the Hadoop cluster to perform their own evaluations, including implementing algorithms from scratch in the programming model. These experiences will provide insight into how to best take advantage of this new approach to distributed computing.
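One reason MapReduce appeals to communities like bioinformatics is that Hadoop's streaming interface will run any executable that reads records on standard input and writes tab-separated key/value pairs on standard output; Hadoop then sorts by key and feeds the pairs to a reducer, handling distribution and fault tolerance itself. The following mapper is a minimal, purely illustrative sketch (it is not the JGI team's pipeline): it emits a count of 1 for every fixed-length k-mer in each line of sequence text, and a trivial reducer would sum the counts per k-mer.

    /* kmer_mapper.c -- illustrative Hadoop Streaming mapper (not the JGI code).
       Reads one sequence per line on stdin and emits "<kmer>\t1" per k-mer. */
    #include <stdio.h>
    #include <string.h>

    #define K 8   /* k-mer length; an arbitrary choice for the example */

    int main(void)
    {
        char line[4096];
        while (fgets(line, sizeof line, stdin)) {
            size_t len = strcspn(line, "\r\n");   /* trim the trailing newline */
            line[len] = '\0';
            for (size_t i = 0; i + K <= len; i++)
                printf("%.*s\t1\n", K, line + i);
        }
        return 0;
    }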

The first published performance analysis research from Magellan collaborators shows that cloud computing will not work for science unless the cloud is optimized for it. The paper "Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud,"3 written by a team of researchers from Berkeley Lab's Computational Research Division, Information Technology Division, and NERSC, was honored with the Best Paper Award at the IEEE's International Conference on Cloud Computing Technology and Science (CloudCom 2010), held November 30-December 1 in Bloomington, Indiana.

After running a series of benchmarks designed to represent a typical midrange scientific workload (applications that use less than 1,000 cores) on Amazon's EC2 system, the researchers found that the EC2's interconnect severely limits performance and causes significant variability. Overall, the cloud ran six times slower than a typical midrange Linux cluster, and 20 times slower than a modern high performance computing system.

"We saw that the communication pattern of the application can impact performance," said Keith Jackson, a computer scientist in Berkeley Lab's Computational Research Division (CRD) and lead author of the paper. "Applications like Paratec with significant global communication perform relatively worse than those with less global communication." He noted that the EC2 cloud performance varied significantly for scientific applications because of the shared nature of the virtualized environment, the network, and differences in the underlying non-virtualized hardware.

A similar comparison of performance on the standard EC2 environment versus an optimized environment on the Magellan testbed showed a slowdown of up to 52x on EC2 for most applications, although the BLAST bioinformatics code ran well (Figure 7). Another key issue is cost: EC2 costs 20 cents per CPU hour, while Magellan can run workloads at less than 2 cents per CPU hour, since it is run by a nonprofit government lab.

Figure 7. On traditional science workloads, standard cloud configurations see application slowdown (up to 52x), but independent BLAST jobs run well.

The Magellan Project was honored with the 2010 HPCwire Readers' Choice Award for "Best Use of HPC in the Cloud."

Developing Best Practices in Multicore Programming

The motivation for studying new multicore programming methods stems from concerns expressed by DOE Office of Science users about how to rewrite their applications to keep ahead of the exponentially increasing HPC system parallelism. In particular, the increase in intra-node parallelism of the Hopper Phase 2 system presents significant challenges to existing implementations, and this is just the first step in a long-term technology trend. Applications and algorithms will need to rely increasingly on fine-grained parallelism, strong scaling, and improved support for fault resilience.

NERSC has worked with Cray to establish a Cray Center of Excellence (COE) to investigate these issues, to identify the correct programming model to express fine-grained parallelism, and to guide the transition of the user community by preparing training materials. The long-term goals are to evaluate advanced programming models and identify a durable approach for programming on the path to exascale.

3 K. R. Jackson, L. Ramakrishnan, K. Muriki, S. Canon, S. Cholia, J. Shalf, H. J. Wasserman, and N. J. Wright, "Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud," 2010 IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom), pp. 159-168, Nov. 30, 2010-Dec. 3, 2010, doi: 10.1109/CloudCom.2010.69.

Figure 8. MPI + OpenMP performance on two of NERSC's benchmark applications: (a) Paratec and (b) fvCAM.

MPI (Message Passing Interface), which has been the dominant HPC programming model for nearly two decades, does not scale well beyond eight cores. The hybrid OpenMP/MPI model is the most mature approach to multicore programming, and it works well on Hopper; but getting the best performance depends on customizing the implementation for each application.

While MPI is a collection of processes with no shared data, OpenMP is a collection of threads with both private and shared data, so OpenMP can save a significant amount of memory compared with MPI. The key question is how many OpenMP threads to put on a node versus MPI processes: one MPI process on each core? Or one MPI process on the whole node, and within the node use OpenMP thread parallelism? The running time for each option depends on the application: some run slower, some faster, some run faster for a while and then get slower. So the COE is developing guidelines for finding the "sweet spot" of how much OpenMP parallelism to add to various applications (Figure 8).
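A minimal sketch of the hybrid model under discussion, not one of the NERSC benchmark codes, shows where the thread-versus-process choice enters: the MPI rank count and OMP_NUM_THREADS can be traded against each other at run time while the source stays the same.

    /* hybrid.c -- generic MPI + OpenMP sketch; each rank owns a slice of a global
       sum and its OpenMP threads share that slice's memory instead of every core
       holding a full MPI process.  Build with an MPI compiler wrapper plus an
       OpenMP flag, e.g.:  mpicc -fopenmp hybrid.c -o hybrid                      */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nranks;
        /* Ask for thread support because OpenMP threads run inside each MPI process. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        const long n = 24000000;                 /* total number of terms */
        long chunk = n / nranks;                 /* contiguous slice per rank */
        long lo = rank * chunk;
        long hi = (rank == nranks - 1) ? n : lo + chunk;

        double local = 0.0;
        /* Threads within a rank share the slice and reduce into one variable,
           which is the memory saving relative to one MPI process per core. */
        #pragma omp parallel for reduction(+:local)
        for (long i = lo; i < hi; i++)
            local += 1.0 / (double)(i + 1);

        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum = %.6f with %d ranks x up to %d threads per rank\n",
                   global, nranks, omp_get_max_threads());
        MPI_Finalize();
        return 0;
    }

Running the same binary as, say, 24 ranks with 1 thread each versus 2 ranks with 12 threads each is precisely the sweet-spot experiment described above; which configuration wins depends on the application's communication and memory behavior.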
Computational Science and Engineering Petascale Initiative

To ensure that science effectively adapts to multicore computing, NERSC is receiving $3 million from the American Recovery and Reinvestment Act to develop the Computational Science and Engineering Petascale Initiative. This initiative identifies key application areas with specific needs for advanced programming models, algorithms, and other support. The applications are chosen to be consistent with the current mission of the Department of Energy, with a particular focus on applications that benefit energy research, those supported by other Recovery Act funding, and Energy Frontier Research Centers (EFRCs). The initiative pairs postdoctoral researchers (Figure 9) with these high-impact projects at NERSC. Here are the projects they are working on:

• Brian Austin is working with high-energy physicists to create computational models that will refine the design of Berkeley Lab's proposed soft x-ray, free electron laser user facility called the Next Generation Light Source.

• Kirsten Fagnan is working with researchers in Berkeley Lab's Center for Computational Sciences and Engineering (CCSE) to develop an adaptive mesh refinement capability for numerical simulation of carbon sequestration and porous media flow.

• Jihan Kim is helping chemists from the University of California, Berkeley accelerate their chemistry codes on NERSC supercomputers.

• Wangyi (Bobby) Liu (starting in 2011) is working with the Neutralized Drift Compression Experiment (NDCX) group at Berkeley Lab to develop a surface tension model within the current warm dense matter simulation framework, new tools for understanding inertial confinement fusion.

• Filipe Maia is working with researchers at Berkeley Lab's Advanced Light Source to accelerate biological imaging codes and with the Earth Sciences Division to accelerate geophysical imaging codes with graphics processing units (GPUs).

• Praveen Narayanan is working with CCSE researchers to run numerical experiments using high performance combustion solvers, and also working with tools to analyze performance of HPC codes slated for codesign at extremely high concurrency.

• Xuefei (Rebecca) Yuan (starting in 2011) is working with researchers in Berkeley Lab's Computational Research Division (CRD) to improve a hybrid linear software package and with physicists at Princeton Plasma Physics Laboratory (PPPL) to accelerate magnetohydrodynamic codes with GPUs.

• Robert Preissl works on advanced programming models for multicore systems and is applying this to physics codes from PPPL to optimize and analyze parallel magnetic fusion simulations.

Two of the "petascale postdocs," Robert Preissl and Jihan Kim, along with project leader Alice Koniges and collaborators from other institutions, were co-authors of the Best Paper winner at the Cray User Group meeting in Edinburgh, Scotland, in May 2010.4 Titled "Application Acceleration on Current and Future Cray Platforms," the paper described current bottlenecks and performance improvement areas for applications including plasma physics, chemistry related to carbon capture and sequestration, and material science. The paper discussed a variety of methods including advanced hybrid parallelization using multi-threaded MPI, GPU acceleration, and autoparallelization compilers.

GPU Testbed and Evaluation

Graphics processing units (GPUs) are becoming more widely used for computational science applications, and GPU testbeds have been requested at the requirements workshops NERSC has been conducting with its users. GPUs can offer energy-efficient performance boosts to traditional processors, since they contain massive numbers of simple processors, which are more energy-efficient than a smaller number of larger processors. They also are available at reasonable cost, since they are already being mass-produced for video gaming. The question is whether GPUs offer an effective solution for a broad scientific workload or for a more limited class of computations.

In April 2010 NERSC fielded a 50-node GPU cluster called Dirac in collaboration with the Computational Research Division at Berkeley Lab, with funding from the DOE ASCR Computer Science Research Testbeds program. Each node contains two InfiniBand-connected Intel Nehalem quad-core chips and one NVIDIA Fermi GPU (except for two nodes that have four GPUs each). The system includes a complete set of development tools for hybrid computing, including CUDA, OpenCL, and PGI GPGPU-targeted compilers.

The Dirac testbed provides the opportunity to engage with the NERSC user community to answer the following questions:

• What parts of the NERSC workload will benefit from GPU acceleration?

• What portions of the workload see no benefit, or insufficient benefit to justify the investment?

• Do GPUs represent the future of HPC platforms, or a feature that will be beneficial to a subset of the community?

Figure 9. Alice Koniges (third from left) of NERSC's Advanced Technologies Group leads the Computational Science and Engineering Petascale Initiative, which pairs postdoctoral researchers with high-impact projects at NERSC. The postdocs are (from left) Jihan Kim, Filipe Maia, Robert Preissl, Brian Austin (back), Wangyi (Bobby) Liu (front), Kirsten Fagnan, and Praveen Narayanan. Not shown: Xuefei (Rebecca) Yuan. Photo: Roy Kaltschmidt, Berkeley Lab Public Affairs.

Porting an application to a GPU requires reworking the code, such as deciding which parts to run on a GPU and then figuring out how to best modify the code. For example, running an application efficiently on a GPU often requires keeping the data near the device to reduce the computing time taken up with moving data from the CPU or memory to the GPU. Also, a programmer must decide how to thread results from the GPU back into the CPU program. This is not a trivial process, so the Dirac web site provides users with some tips for porting applications to this system.
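The data-movement point can be made concrete with a back-of-the-envelope model: offloading a kernel pays off only when the time saved on the device exceeds the time spent copying data across the host-device link. All of the numbers below are illustrative assumptions rather than Dirac measurements:

    /* offload_model.c -- toy model of when GPU offload is worthwhile (illustrative numbers). */
    #include <stdio.h>

    int main(void)
    {
        double bytes       = 512e6;  /* assumed data copied each way, in bytes */
        double link_bw     = 6e9;    /* assumed host-device bandwidth, bytes/s */
        double cpu_time    = 2.0;    /* assumed kernel time on the CPU, seconds */
        double gpu_speedup = 8.0;    /* assumed kernel-only speedup on the GPU */

        double transfer  = 2.0 * bytes / link_bw;             /* copy in + copy out */
        double gpu_total = cpu_time / gpu_speedup + transfer; /* GPU time seen by the user */

        printf("CPU: %.2f s   GPU including transfers: %.2f s\n", cpu_time, gpu_total);
        printf("offload %s worthwhile under these assumptions\n",
               gpu_total < cpu_time ? "is" : "is not");
        return 0;
    }

If the data can stay resident on the device across many kernel launches, the transfer term is amortized away, which is why keeping data near the GPU is the first piece of advice above.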

4 Alice Koniges, Robert Preissl, Jihan Kim, David Eder, Aaron Fisher, Nathan Masters, Velimir Mlaker, Stephane Ethier, Weixing Wang, Martin Head-Gordon, and Nathan Wichmann, "Application Acceleration on Current and Future Cray Platforms," CUG 2010, the Cray User Group meeting, Edinburgh, Scotland, May 2010.

Over 100 NERSC users have requested and been given access to the Dirac cluster. With their help, we have collected a broad range of preliminary performance data that support the idea that GPUs in their current configuration will be an important way to augment the effectiveness of NERSC systems for a subset of the user base.

Perhaps the largest collaboration using Dirac is the International Center for Computational Science (ICCS), a collaboration between Berkeley Lab, the University of California-Berkeley, NERSC, the University of Heidelberg, the National Astronomical Observatories of the Chinese Academy of Science, and Nagasaki University. ICCS explores emerging hardware devices, programming models, and algorithm techniques to develop effective and optimal tools for enabling scientific discovery and growth.

ICCS researchers have used Dirac and other GPU clusters for rigorous testing of an adaptive mesh refinement code called GAMER, used for astrophysics and other science domains, and the N-body code phi-GPU, which is used to simulate dense star clusters with many binaries and galactic nuclei with supermassive black holes, in which correlations between distant particles cannot be neglected. The results played a significant role in developing metrics for workloads and respective speedups; and the phi-GPU team won the 2011 PRACE (Partnership for Advanced Computing in Europe) Award for the paper "Astrophysical Particle Simulations with Large Custom GPU Clusters on Three Continents."5 In this paper, they present direct astrophysical N-body simulations with up to 6 million bodies using the parallel MPI-CUDA code phi-GPU on large GPU clusters in Beijing, Berkeley, and Heidelberg, with different kinds of GPU hardware. They reached about one-third of the peak performance for this code in a real application scenario with hierarchically blocked timesteps and a core-halo density structure of a stellar system.

Other research results from Dirac include:

• NERSC postdoc Jihan Kim has found impressive GPU versus single-node CPU speedups for the Q-Chem application, but performance varies significantly with the input structure (Figure 10).

• A study by researchers from Berkeley Lab's Computational Research Division, NERSC, UC Berkeley, and Princeton Plasma Physics Laboratory demonstrated the challenges of achieving high GPU performance for key kernels from a gyrokinetic fusion application due to irregular data access, fine-grained data hazards, and high memory requirements.6

• Researchers from Argonne National Laboratory, testing coupled cluster equations for electronic correlation in small- to medium-sized molecules, found that the GPU-accelerated algorithm readily achieves a factor of 4 to 5 speedup relative to the multi-threaded CPU algorithm on same-generation hardware.7

Figure 10. Q-Chem CPU/GPU performance comparisons for two proteins. CPU multi-thread performance is similar to GPU performance for smaller glycine molecules.

Improving User Productivity

With over 400 projects and 700 applications, NERSC is constantly looking for ways to improve the productivity of all 4,000 scientific users. NERSC closely collaborates with scientific users to improve their productivity and the performance of their applications.

5 R. Spurzem, P. Berczik, T. Hamada, K. Nitadori, G. Marcus, A. Kugel, R. Manner, I. Berentzen, J. Fiestas, R. Banerjee, and R. Klessen, "Astrophysical Particle Simulations with Large Custom GPU Clusters on Three Continents," in press.
6 Kamesh Madduri, Eun-Jin Im, Khaled Ibrahim, Samuel Williams, Stéphane Ethier, and Leonid Oliker, "Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms," Parallel Computing Journal, in press.
7 A. E. DePrince and J. R. Hammond, "Coupled Cluster Theory on Graphics Processing Units I. The Coupled Cluster Doubles Method," Journal of Chemical Theory and Computation, in press.

of users, we are always looking for collaborations with users where the results can be leveraged and applied to many applications. A performance optimization for one code is always useful, but extrapolating that optimization and applying it to a wider user base produces performance improvements on a much broader scale.

In addition to one-on-one deep collaborations, NERSC also provides rapid responses and advice to the hundreds of user requests received each month. Some questions are answered immediately. However, users also call on the NERSC technical staff’s deep knowledge of HPC to cut through complex issues that can otherwise hold up the user’s research for months. This section describes a few of our recent collaborations with NERSC users.

Expanding the Science Gateway Infrastructure

NERSC is continuing to work with users to build Science Gateways that allow scientists to access data, perform computations, and interact with NERSC resources using web-based interfaces and technologies. The goal is to make it easier for scientists to use NERSC while creating collaborative tools for sharing data with the rest of the scientific community.

NERSC engages with science teams interested in using these new services, assists with deployment, accepts feedback, and tries to recycle successful approaches into methods that other teams can use. Science Gateways can be configured to provide public unauthenticated access to data sets and services as well as authenticated access if needed.

To help users create Science Gateways, we have developed the NERSC Web Toolkit (NEWT), a service that allows users to access computing resources at NERSC through a simple representational state transfer (REST)-based application programming interface (API). The NEWT API and web service let researchers interact with the NERSC center through simple HTTP URLs and commands. This makes it very easy to build powerful web applications using nothing but HTML and Javascript, providing services such as authentication, file management, job submission, and accounting interfaces. A number of other web access and development tools are also available.
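As a concrete illustration of the REST style described above, the short Python sketch below shows how a script or web client might talk to a NEWT-like service over plain HTTP. The host name, endpoint paths, and field names are illustrative assumptions based on the description here, not a verbatim copy of the NEWT API.

```python
import requests

# Hypothetical base URL and endpoint paths for a NEWT-style REST service.
BASE = "https://example.nersc.gov/newt"

session = requests.Session()

# Authenticate with a plain HTTP POST (field names are illustrative).
session.post(f"{BASE}/login",
             data={"username": "jdoe", "password": "not-a-real-password"})

# File management: list a directory on an HPC system via a simple URL.
listing = session.get(f"{BASE}/file/hopper/global/homes/j/jdoe").json()

# Job submission: POST a reference to a batch script and read back a job ID.
job = session.post(f"{BASE}/queue/hopper",
                   data={"jobfile": "/global/homes/j/jdoe/run.pbs"}).json()

print(listing)
print(job)
```

Because every operation is an ordinary HTTP request, the same calls can be issued from JavaScript in a browser, which is what makes HTML-plus-Javascript gateways practical.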
New science gateways in 2010 include:

• The 20th Century Reanalysis Project Ensemble Gateway. This site provides data from the 20th Century Reanalysis Project, offering temperature, pressure, humidity, and wind predictions in 200 km sections all around the earth from 1871 to 2008, every six hours, based on historical data. The ensemble mean and standard deviation for each value were calculated over a set of 56 simulations. (This was the first gateway built with NEWT.)

• The Coherent X-Ray Imaging Data Bank (CXIDB). This website is dedicated to archiving and sharing data obtained in coherent x-ray imaging (CXI) experiments. The website also serves as the reference for the CXI file format, in which most of the experimental data on the database is stored.

• The Daya Bay Neutrino Detector Gateway. The Daya Bay Neutrino Experiment is an international neutrino-oscillation experiment designed to measure the mixing angle θ13 using anti-neutrinos produced by the reactors of the Daya Bay and Ling Ao nuclear power plants in China.

• The Earth System Grid (ESG) Climate Gateway. The SciDAC-funded Earth System Grid integrates supercomputers with large-scale data and analysis servers located at numerous national labs and research centers to create a powerful environment for next generation climate research. Access to ESG is provided through a system of federated data gateways that collectively allow access to massive data and services for global and regional climate models, IPCC research, and analysis and visualization software.

Speeding Up Turnaround Time for the Palomar Transient Factory

When astronomers search for supernovae, time is of the essence. These spectacular exploding stars don’t flare for long, and in order to catch them at their brightest, when they can offer the most information, supernovae must be detected early.

The NERSC Analytics Group recently retooled the workflow of the Palomar Transient Factory (PTF) to speed up the turnaround time for identifying objects such as supernovae from 4 hours to 15 minutes. In addition, the group has helped speed identification of supernova candidates by crowd-sourcing some of the work through “citizen scientists” who log into the Galaxy Zoo Supernovae website (http://supernova.galaxyzoo.org/).

Figure 11. Four example detection triplets from PTF, similar to those uploaded to Galaxy Zoo Supernovae. Each image is 100 arcseconds on a side. In each triplet, the panels show (from left to right) the most recent image containing the candidate supernova light (the science image), the reference image generated from data from an earlier epoch with no supernova candidate light, and the subtraction or difference image—the science image minus the reference image—with the supernova candidate at the centre of the crosshairs. The two triplets on the left are real supernovae, and were highly scored by the Zoo; the triplets on the right are not supernovae and were given the lowest possible score. Image: A. M. Smith et al., 2011.

The PTF, which started operating in 2008, is a consortium of universities and telescope sites that image the sky every night. The resulting files are sent to NERSC, where software running on the Carver system compares each night’s images to composites gathered over the last decade, a process called subtraction. Images in which stars appear to have moved or changed brightness are flagged for human follow-up. Some follow-up is done by scientists, but recently the PTF also enlisted volunteers through Galaxyzoo.org to sort stray airplane lights from supernovae. Once a good candidate is identified, one telescope in the consortium is assigned to follow up.

For example, one group of astronomers has embarked on an ambitious Type Ia supernova search, using the PTF to search for supernovae in nearby galaxies along with the Hubble Space Telescope. In order to get the Hubble to observe a “target of opportunity,” in this case a nearby supernova burning at peak brightness, astronomers must find the exploding star, do follow-up observations to confirm that they are indeed looking at a supernova, then alert the Hubble—all within the first seven or eight days of onset. The PTF sent 30 such target of opportunity triggers to Hubble for that program, and it inspired a new program in which Hubble observes PTF supernova targets starting more than two weeks before peak brightness and follows them for a month.

In a proof of concept for the Galaxy Zoo Supernovae project, from April to July 2010, nearly 14,000 supernova candidates from PTF were classified by more than 2,500 volunteers within a few hours of data collection (Figure 11). The citizen scientists successfully identified as transients 93% of the ~130 spectroscopically confirmed supernovae that PTF located during the trial period, with no false positive identifications. This trial run shows that the Galaxy Zoo Supernovae model can in principle be applied to any future imaging survey, and the resulting data can also be used to improve the accuracy of automated machine-learning transient classifiers.8

Speeding the PTF workflow with a high-speed link and running the database on the NERSC Science Gateways in parallel with the Carver subtraction pipeline has enabled the PTF to become the first survey to target astrophysical transients in real time on a massive scale. In its first two and a half years, PTF has discovered over 1,100 supernovae.
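The subtraction step at the heart of this pipeline is conceptually simple: align a new science image with a reference built from earlier epochs and look for residual point sources in the difference. The sketch below is a minimal, idealized illustration with numpy; the array names, the detection threshold, and the assumption that the frames are already registered and photometrically scaled are simplifications, not the production PTF code.

```python
import numpy as np

def difference_image(science: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Return science minus reference, assuming the frames are already
    registered and scaled to a common zero point (a large simplification)."""
    return science - reference

def candidate_pixels(diff: np.ndarray, nsigma: float = 5.0) -> np.ndarray:
    """Flag pixels that stand out above the noise in the difference image."""
    noise = np.std(diff)
    return np.argwhere(diff > nsigma * noise)

# Toy example: a flat reference plus one new bright source in the science frame.
rng = np.random.default_rng(0)
reference = rng.normal(100.0, 1.0, size=(64, 64))
science = reference + rng.normal(0.0, 1.0, size=(64, 64))
science[32, 40] += 50.0           # the "supernova"

diff = difference_image(science, reference)
print(candidate_pixels(diff))     # should include pixel (32, 40)
```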

8 A. M. Smith et al., “Galaxy Zoo Supernovae,” Monthly Notices of the Royal Astronomical Society 412, 1309 (2011).

NERSC and JGI Join Forces for Genomics Computing

A torrent of data has been flowing from the advanced sequencing platforms at the Department of Energy Joint Genome Institute (DOE JGI), among the world’s leading generators of DNA sequence information for bioenergy and environmental applications. In 2010, JGI generated more than 5 terabases (5 trillion nucleotide letters of genetic code) from its plant, fungal, microbial, and metagenome (microbial community) user programs—five times more information than JGI generated in 2009 (Figure 12). On December 14, 2010, the Energy Sciences Network (ESnet) tweeted that JGI had just sent more than 50 terabits of genomics data in the past 10 hours at the rate of nearly 10 gigabits per second. JGI is now the largest single user of ESnet, surpassing even the Large Hadron Collider.

To ensure that there is a robust computational infrastructure for managing, storing, and gleaning scientific insights from this ever-growing flood of data, JGI has joined forces with NERSC. Genomics computing systems are now split between JGI’s campus in Walnut Creek, California, and NERSC’s Oakland Scientific Facility, which are 20 miles apart. NERSC also manages JGI’s six-person systems staff to integrate, operate, and support these systems. NERSC performs this work on a cost-recovery basis similar to the way it manages and supports the Parallel Distributed Systems Facility (PDSF) for high energy and nuclear physics research.

“We evaluated a wide variety of options, and after a thorough review, it made perfect sense to partner with NERSC,” said Vito Mangiardi, JGI’s Deputy Director for Business Operations and Production. “With a successful track record in providing these kinds of services, they don’t have the steep learning curve to climb.”

“This is a great partnership, because data-centric computing is an important future direction for NERSC, and genomics is seeing exponential increases in computation and storage,” said NERSC Division Director Kathy Yelick. “The computing requirements for the genomics community are quite different from NERSC’s more traditional workload. The science is heavy in data analytics, and JGI runs web portals providing access to genomic information.”

“These are critically important assets that we can now bring to bear on some of the most complex questions in biology,” said JGI Director Eddy Rubin. “We really need the massive computation ‘horsepower’ and supporting infrastructure that NERSC offers to help us advance our understanding of the carbon cycle and many of the other biogeochemical processes in which microbes play a starring role and that the JGI is characterizing.”

Data is currently flowing between JGI and NERSC over ESnet’s Science Data Network (SDN). The SDN provides circuit-oriented services to enable the rapid movement of massive scientific data sets, and to support innovative computing architectures such as those being deployed at NERSC in support of JGI science.

In March 2010, JGI became an early user of NERSC’s Magellan Cloud Computing System when the Institute had a sudden need for increased computing resources. In less than three days, NERSC and JGI staff provisioned and configured hundreds of processor cores on Magellan to match the computing environment available on JGI’s local compute clusters. At the same time, staff at both centers collaborated with ESnet network engineers to deploy a dedicated 9 gigabit per second virtual circuit between Magellan and JGI over SDN within 24 hours.

This strategy gives JGI researchers around the world increased computational capacity without any change to their software or workflow. JGI users still log on to the Institute’s network and submit scientific computing jobs to its batch queues, managed by hardware located in Walnut Creek. Once the jobs reach the front of the queue, the information travels 20 miles on reserved SDN bandwidth, directly to NERSC’s Magellan system in Oakland. After a job has finished, the results are sent back to Walnut Creek on the SDN within milliseconds to be saved on filesystems at JGI.

Figure 12. DOE JGI sequence output in billions of nucleotide bases (gigabases, Gb), 2006–2010.

“What makes this use of cloud computing so attractive is that JGI users do not notice a difference between computing on Magellan, which is 20 miles away at NERSC, or on JGI’s computing environment in Walnut Creek,” says Jeff Broughton, who heads NERSC’s Systems Department.

Improving VisIt Performance for AMR Applications

Applications using adaptive mesh refinement (AMR) are an important component of the DOE scientific workload. VisIt, a 3D visualization tool, is widely used in data analysis and feature rendering of AMR data.

When Tomasz Plewa of Florida State University, a user of the FLASH AMR code, reported to Hank Childs of the NERSC Analytics Group that VisIt was performing very slowly, Childs found that VisIt’s representations of AMR data structures were not well designed for very large numbers of patches. Childs spent several weeks adding new infrastructure to VisIt to represent AMR data more efficiently. He then modified VisIt’s FLASH reader (in consultation with the VisIt-FLASH team at Argonne National Laboratory) to use this new infrastructure. As a result, VisIt brought up a plot of Plewa’s data in only 13 seconds—40 times faster than the 8 minutes, 38 seconds previously required. The new performance time is reasonable considering the size of the data with respect to I/O performance.

The AMR tuning proved beneficial to other NERSC users as well. Haitao Ma and Stan Woosley of the University of California, Santa Cruz, and the SciDAC Computational Astrophysics Consortium called on Childs to create an animation for the annual SciDAC conference’s Visualization Night (“Vis Night”). Their AMR data also had a large number of patches and thus was running slowly in VisIt. To address this, Childs modified the BoxLib reader to use the new AMR infrastructure, which allowed the data to be processed much more quickly. The movie was finished in time for the Vis Night contest, where it ultimately received one of the People’s Choice awards (Figure 13). All of the changes (AMR infrastructure, FLASH, and BoxLib readers) have been committed to the VisIt source code repository and are now available to the entire VisIt user community.

Figure 13. Flame front image from the video “Type 1a Supernova: Turbulent Combustion at the Grandest Scale” by Haitao Ma and Stan Woosley of UC Santa Cruz with John Bell, Ann Almgren, Andy Nonaka, and Hank Childs of Berkeley Lab.

Optimizing HPSS Transfers

Randall Hamper from Indiana University contacted NERSC because he needed to access data from the HPSS storage system and transfer it to his home university. Hamper collaborates with the Nearby Supernova Factory project sponsored by the Office of High Energy Physics, and he needed to access 1.6 million small files. A number of the files were stored as one group on the storage system, but many others were scattered around on various storage tapes.

Creating a simple list of files and attempting to access the files in the listed order resulted in the same tape media having to be mounted more than once. Because mounting the tape is the slowest part of retrieving a small file, possibly taking a minute or two, file access times crawled to a near halt. These numerous small requests also created contention on the system for other NERSC users attempting to access their own datasets.

Members of the HPSS storage team created a script that queries the HPSS metadata to determine which tape volume each file belongs to. The files were then grouped according to tape volume. This change improved data retrievals from HPSS by orders of magnitude. Additionally, this collaboration created a reusable technology: the file-grouping script has been distributed to a number of different users to help them retrieve data more optimally.
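A minimal sketch of the grouping idea: given a mapping from each file to the tape volume that holds it (which the real script obtains from HPSS metadata), reorder the retrieval list so each tape is mounted only once. The lookup dictionary below is a stand-in; the actual HPSS metadata query is site-specific and not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def group_by_tape(files: Iterable[str], tape_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Group file paths by the tape volume that holds them, so each tape
    only needs to be mounted once during retrieval."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in files:
        groups[tape_of[path]].append(path)
    return groups

# Stand-in for the HPSS metadata lookup performed by the real script.
tape_of = {"run01.dat": "T0001", "run02.dat": "T0047", "run03.dat": "T0001"}
files = ["run01.dat", "run02.dat", "run03.dat"]

for tape, paths in group_by_tape(files, tape_of).items():
    print(tape, paths)  # retrieve every file on one tape before moving on
```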
Uncovering Job Failures at Scale

Debugging applications at scale is essentially detective work. The issues only occur over a certain concurrency and are often tightly coupled to a system’s architecture. Weixing Wang from the Princeton Plasma Physics Laboratory contacted NERSC consultants because his application was failing more than half the time when he ran on more than 8,000 processor cores. The cause of his problem was not obvious. The simulation did not fail in the same part of the code consistently. Sometimes the simulation would run for two hours before failing; other times it ran for 18 hours, though it rarely ran to completion successfully. Furthermore, the application code, GTS, had been known to scale to very large concurrencies on other systems. NERSC put together a team of consultants, systems staff, and Cray staff to troubleshoot the issue.

After much detective work—searching for unhealthy nodes in the system, examining the code’s communication pattern and memory use—the I/O was analyzed, and it was discovered that the application was not doing I/O optimally. It was using a one-file-per-processor model to write out 20–30 MB of data per core on over 8,000 processors, roughly 240 GB per dump. A Lustre striping parameter was set to 32 disks, and with the user writing output frequently, it meant 7.6 million file segments were being created every 4 minutes. This large number of I/O transactions was flooding the system and causing the job to fail.

Once the problem was identified, the solution was straightforward. We recommended decreasing the striping parameter and reducing the restart frequency to every four hours. Smaller analysis output files continued to be produced every four minutes. Tests showed that making these two changes allowed the code to run successfully to completion.
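The arithmetic behind the fix can be sketched as follows: with file-per-process output, each write is spread over the stripe count, so the number of file segments touched per checkpoint grows as files times stripes. The values below are illustrative placeholders only; they do not reproduce the exact 7.6-million figure quoted above, which also folds in the checkpoint frequency and the full set of output files.

```python
def segments_per_checkpoint(n_files: int, stripe_count: int) -> int:
    """With one output file per process, each file is spread across
    `stripe_count` storage targets, so a checkpoint touches roughly
    n_files * stripe_count file segments."""
    return n_files * stripe_count

# Illustrative values only, not the actual GTS job configuration.
n_procs = 8192
print(segments_per_checkpoint(n_procs, stripe_count=32))  # wide striping: 262,144 segments
print(segments_per_checkpoint(n_procs, stripe_count=2))   # reduced striping: 16,384 segments
```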

Improving Network Transfer Rates

A user working with the Planck project, an international consortium studying the cosmic microwave background radiation, contacted the consulting group because he was experiencing slow data transfer rates between NERSC and the Leibniz Supercomputing Centre (LRZ) in Germany. Using the gridFTP tool, he saw rates of only about 5 MB/second, far too slow to transfer large observational astronomy images in a reasonable amount of time.

NERSC made recommendations to increase his TCP and gridFTP buffer settings, which increased performance from 5 to 30 MB/sec. Next, a NERSC networking engineer traced a packet from NERSC to LRZ and noticed that a router along the way was not enabled for jumbo frames, creating a bottleneck along the path between the two sites. Finding the administrator responsible for a specific router in Germany is not easy. However, with the help of staff from the LRZ, the administrator was found, and jumbo frames were enabled on the target router. This increased the transfer rate to 65 MB/sec for the Planck project user. Catching this misconfigured router will not only improve transfer rates between NERSC and LRZ for all users, but will also improve performance for other DOE collaborators transferring data along this link.

Finally, NERSC gave advice to LRZ staff on reconfiguring their firewall. This last improvement increased transfer rate performance to 110 MB/sec, for an overall improvement of over 18 times the original speed, as shown in Figure 14.

Figure 14. Transfer rates between NERSC and LRZ were boosted 18-fold.
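The buffer tuning in this case reflects a general rule: on a high-latency wide-area path, the socket buffers should be at least the bandwidth-delay product, or the transfer cannot keep the pipe full. The Python sketch below illustrates the idea; the bandwidth and round-trip numbers are illustrative, and production tools such as gridFTP expose equivalent settings through their own options rather than raw sockets.

```python
import socket

def bandwidth_delay_product(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Bytes that must be in flight to keep a path full: bandwidth times round-trip time."""
    return int(bandwidth_bits_per_s * rtt_s / 8)

# Illustrative wide-area path: ~10 Gb/s with a ~160 ms round-trip time.
bdp = bandwidth_delay_product(10e9, 0.160)  # 200,000,000 bytes (about 200 MB)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request larger send/receive buffers; the operating system may cap these values.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
```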

Software Support

NERSC pre-compiles and supports dozens of software packages for our users, comprising over 13.5 million lines of source code. These software packages include debugging tools, analysis and post-processing software, and third-party applications that scientists use to run their primary simulations. Instead of each user installing his or her own version of a third-party software package, NERSC installs and supports optimized versions of many applications, allowing users to focus on their science research rather than on installing and supporting software.

In the past few years, NERSC has seen an increase in the number of scientists (currently 400) using these third-party applications. This is primarily due to the relative increase in the proportion of materials science and chemistry time allocated to NERSC projects and the popularity of third-party applications in these science areas (Tables 1 and 2).

Table 1. Pre-Compiled Materials Science Codes at NERSC: ABINIT, CP2K, Quantum Espresso, LAMMPS, SIESTA, and VASP (available on Hopper, Franklin, and Carver); CPMD, Qbox, and WIEN 2k (available on a single system).

Table 2. Pre-Compiled Chemistry Codes at NERSC: GAMESS, GROMACS, MOLPRO, NAMD, and NWChem (available on Hopper, Franklin, and Carver); AMBER and Q-Chem (available on two systems); G09 (available on a single system).

Maintaining User Satisfaction

The annual NERSC user survey responses provide feedback about every aspect of NERSC’s operation, help us judge the quality of our services, give DOE information on how well NERSC is doing, and point us to areas we can improve. Three hundred ninety-five users responded to the 2009/2010 survey, representing 11 percent of authorized users and 67 percent of total MPP usage. The respondents represent all six DOE science offices and a variety of home institutions. On a long list of questions, users rated us on a 7-point satisfaction scale (Table 3).

On the open-ended question “What does NERSC do well?” 132 users responded. Some representative comments are:

User support is fantastic—timely and knowledgeable, including follow-up service. New machines are installed often, and they are state-of-the-art. The queues are crowded but fairly managed.

Website is first class, especially the clear instructions for compiling and running jobs.

The account allocation process is very fast and efficient.

NERSC has proven extremely effective for running high resolution models of the earth’s climate that require a large number of processors. Without NERSC I never would have been able to run these simulations at such a high resolution to predict future climate. Many thanks.

On the question “What can NERSC do to make you more productive?” 105 users responded. The top areas of concern were long queue turnaround times, the need for more computing resources, queue policies, and software support. Some of the comments from this section are:

There are a lot of users (it is good to be useful and popular), and the price of that success is long queues that lead to slower turn-around. A long-standing problem with no easy answer.

Allow longer jobs (such as one month or half year) on Carver and Hopper. Let science run to the course.

Provide an interface which can help the user determine which of the NERSC machines is more appropriate at a given time for running a job based on the number of processors and runtime that are requested.

I could possibly use some more web-based tutorials on various topics: MPI programming, data analysis with NERSC tools, a tutorial on getting VisIt (visualization tool) to work on my Linux machine.

Every year we institute changes based on the previous year’s survey. In 2010 NERSC took a number of actions in response to suggestions from the 2008/2009 user survey, including improvements to the Franklin system and the NERSC web site.

On the 2008/2009 survey, Franklin uptime received the second lowest average score (4.91). In the first half of 2009, Franklin underwent an intensive stabilization period. Tiger teams were formed with close collaborations with Cray to address system instability. These efforts were continued in the second half of 2009 and throughout 2010, when NERSC engaged in a project to understand system-initiated causes of hung jobs and to implement corrective actions to reduce their number. These investigations revealed bugs in the SeaStar interconnect as well as in the Lustre file system. These bugs were reported to Cray and were fixed in March 2010, when Franklin was upgraded to Cray Linux Environment 2.2. As a result, Franklin’s mean time between failures improved from a low of about 3 days in 2008 to 9 days in 2010.

On the 2010 survey, Franklin uptime received an average score of 5.99, a significant increase of 1.08 points over the previous year. Two other Franklin scores (overall satisfaction, and disk configuration and I/O performance) were significantly improved as well. Another indication of increased satisfaction with Franklin is that on the 2009 survey 40 users requested improvements in Franklin uptime or performance, whereas only 10 made such requests on the 2010 survey.

On the 2008/2009 survey, ten users requested improvements to the NERSC web site. In response, User Services staff removed older documentation and made sure that the remaining documentation was up to date. On the 2010 survey, the score for “ease of finding information on the NERSC web site” was significantly improved. Also, for the medium-scale MPP users, the scores for the web site overall and for the accuracy of information on the web showed significant improvement.

The complete survey results are available at https://www.nersc.gov/news-publications/publications-reports/user-surveys/2009-2010-user-survey-results/.

Table 3. 2009/2010 User Survey Satisfaction Scale

Satisfaction Score    Meaning                   Number of Times Selected
7                     Very Satisfied            8,053
6                     Mostly Satisfied          6,219
5                     Somewhat Satisfied        1,488
4                     Neutral                   1,032
3                     Somewhat Dissatisfied     366
2                     Mostly Dissatisfied       100
1                     Very Dissatisfied         88
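As a reading aid for Table 3, the overall mean satisfaction across all responses is simply the count-weighted average of the scores. The short sketch below computes it directly from the tabulated counts; the resulting value is derived here for illustration and is not a figure quoted in the survey report itself.

```python
# Satisfaction scores and the number of times each was selected (Table 3).
counts = {7: 8053, 6: 6219, 5: 1488, 4: 1032, 3: 366, 2: 100, 1: 88}

total_responses = sum(counts.values())                        # 17,346 individual ratings
weighted_sum = sum(score * n for score, n in counts.items())

print(total_responses)
print(round(weighted_sum / total_responses, 2))               # overall mean of about 6.15
```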

Leadership Changes

In September 2010, Associate Laboratory Director for Computing Sciences (and former NERSC Division Director) Horst Simon was named Deputy Director of Berkeley Lab. “Horst is a strong leader who has helped to lead a tremendously productive program in high performance computing that is world-class,” said Berkeley Lab Director Paul Alivisatos. “As Deputy Director he’ll help me lead major scientific initiatives, oversee strategic research investments, and maintain the intellectual vitality of Berkeley Lab.”

A few days later, Berkeley Lab announced that Kathy Yelick was named Associate Lab Director for Computing Sciences. Yelick has been the director of the NERSC Division since 2008, a position she will continue to hold. “I am very pleased to see Kathy Yelick assume this critical role in senior leadership as Associate Lab Director for Computing Sciences,” said Berkeley Lab Director Paul Alivisatos. “We will benefit immensely from her knowledge and vision. She will have tremendous opportunity to use Berkeley Lab’s strengths in scientific computing and advanced networking to accelerate research that addresses some of our nation’s most urgent scientific challenges.”

In August 2010 Jonathan Carter, who had led NERSC’s User Services Group since 2005, was named the new Computing Sciences Deputy, succeeding Michael Banda, who joined Berkeley Lab’s Advanced Light Source. Carter was one of the first new employees hired when NERSC moved to Berkeley Lab in 1996, and had most recently led the NERSC-6 procurement team that selected the Cray XE6 Hopper system.

Succeeding Carter as leader of the User Services Group was Katie Antypas, whom Kathy Yelick describes as “a passionate advocate for the NERSC users.” Antypas was a member of the NERSC-6 procurement team, co-lead of the NERSC-6 implementation team, and co-lead on a number of Franklin stabilization “tiger teams,” as well as participating in various planning activities.

Energy efficiency has become a major issue for NERSC, with costs growing with each new system and the need for facility upgrades likely every few years. This requires increased monitoring, reporting, and optimization of energy use, as well as management of the associated risks. Rather than distributing the management of these activities across multiple individuals, NERSC Director Kathy Yelick created a new position of Energy Efficiency and Risk Manager and named Jim Craw to this new role. “Jim’s extensive knowledge of NERSC’s systems, operations and vendor relations will be invaluable to support the expanded emphasis on these matters,” Yelick said. Craw had been leader of NERSC’s Computational Systems Group; Systems Department Head Jeff Broughton is acting leader of the group until a permanent replacement is hired.

Outreach and Training

As science tackles increasingly complex problems, interdisciplinary research becomes more essential, because interdisciplinary teams bring a variety of knowledge, approaches, and skills to solve the problem. Computational science is no exception. That’s why outreach and training, which promote diversity in the computational science community, are essential to NERSC’s mission.

Collaborations between institutions are one fruitful avenue of outreach. Examples include NERSC’s technical exchanges with KISTI (Korea), CSCS (Switzerland), and KAUST (Saudi Arabia); the data transfer working group activities with Oak Ridge and Argonne national labs; procurement and technical exchanges with Sandia; Cray systems administrator interactions with Oak Ridge; and center management interchanges between NERSC, Argonne, and Oak Ridge.

What began as an informal discussion two years ago between a physicist and a computer scientist has led to a new international collaboration aimed at creating computational tools to help scientists make more effective use of new computing technologies, including multicore processors. The International Center for Computational Science (ICCS) is located at Berkeley Lab/NERSC and the University of California at Berkeley, with partners at the University of Heidelberg in Germany, the National Astronomical Observatories of the Chinese Academy of Sciences, and Nagasaki University in Japan. ICCS explores the vast and rapidly changing space of emerging hardware devices, programming models, and algorithm techniques to develop effective and optimal tools for enabling scientific discovery and growth.

ICCS hosts several collaborative research programs, including ISAAC, a three-year NSF-funded project to focus on research and development of infrastructure to accelerate physics and astronomy applications using multicore architectures; and the GRACE Project and the Silk Road Project, which are developing programmable hardware for astrophysical simulations.

Top row, left to right: Horst Simon, Kathy Yelick, Jonathan Carter. Bottom row, left to right: Katie Antypas, Jim Craw, Jeff Broughton.

ICCS also offers training in programming multicore devices such as GPUs for advanced scientific computation. In August 2010, ICCS offered a five-day course at NERSC, “Proven Algorithmic Techniques for Many-Core Processors,” in collaboration with the Virtual School of Computational Science and Engineering, a project of the University of Illinois and NVIDIA. Participants applied the techniques in hands-on lab sessions using NERSC computers (Figure 15). Seating was limited and many people had to be turned away, so ICCS offered a second five-day workshop at UC Berkeley in January 2011 on “Manycore and Accelerator-Based High-Performance Scientific Computing,” which also included hands-on tutorials.

NERSC staff have been actively involved in developing and delivering tutorials at SciDAC meetings, at the SC conference, and for the TeraGrid. One hundred participants attended tutorials at the July 2010 SciDAC conference, which were organized by the SciDAC Outreach Center, which is hosted at NERSC. The SciDAC Outreach Center has provided HPC outreach to industry, identifying computing needs in industrial areas that reinforce the DOE mission with the Council on Competitiveness. An example is working with United Technologies Corporation (UTC) on code profiling, HPC scalability, and code tuning, with the result of a 3.6x faster time to solution for one class of problems.

NERSC has also extended its outreach to local high schools. In June 2010, NERSC hosted 12 students from Oakland Technical High School’s Computer Science and Technology Academy, a small academy within the larger high school, for an introduction to computational science and supercomputer architecture, and a tour of NERSC (Figure 16). In July, Berkeley Lab Computing Sciences hosted 14 local high school students for a four-day Careers in Computing Camp, with sessions at both the main Lab and NERSC’s Oakland Scientific Facility (Figure 17). The program was developed with input from computer science teachers at Berkeley High, Albany High, Kennedy High in Richmond, and Oakland Tech, and included presentations, hands-on activities, and tours of various facilities. Staff from NERSC, CRD, ESnet, and the IT Division presented a wide range of topics including assembling a desktop computer, cyber security war stories, algorithms for combustion and astrophysics, the role of applied math, networking, science portals, and hardware/software co-design.

Figure 15. Researchers with scientific backgrounds ranging from astrophysics to biochemistry attended the first computational science summer school at NERSC. Photo: Hemant Shukla.

Figure 16. Dave Paul, a systems engineer, brought out computer nodes and parts from NERSC’s older systems to show Oakland Tech students how the components have become both more dense and more power efficient as the technology has evolved over time. Photo: Roy Kaltschmidt, Berkeley Lab Public Affairs.

Figure 17. Students from the Berkeley Lab Careers in Computing Camp pose in front of NERSC’s Cray XT5 supercomputer, Franklin. Photo: Roy Kaltschmidt, Berkeley Lab Public Affairs.

To reach an even broader audience, NERSC, ESnet, and Computational Research Division staff are contributing to Berkeley Lab’s Video Glossary, a popular website where scientific terms are defined in lay language (http://videoglossary.lbl.gov/). Computing-related videos to date include:

• bits and bytes (Inder Monga, ESnet)
• computational science (Associate Lab Director Horst Simon)
• green computing (John Shalf, NERSC)
• IPv6 (Michael Sinatra, ESnet)
• petabytes (Inder Monga, ESnet)
• petaflop computing (David Bailey, CRD)
• quantum computing (Thomas Schenkel, Accelerator and Fusion Research Division)
• science network (Eli Dart, ESnet)
• supercomputing (NERSC Director Kathy Yelick)

Research and Development by NERSC Staff

Staying ahead of the technological curve, anticipating problems, and developing proactive solutions are part of NERSC’s culture. Many staff members collaborate on computer science research projects, scientific code development, and domain-specific research, as well as participating in professional organizations and conferences and contributing to journals and proceedings. The NERSC user community benefits from the results of these activities as they are applied to systems, software, and services at NERSC and throughout the HPC community.

In 2010, four conference papers co-authored by NERSC staff won Best Paper awards:

• MPI-Hybrid Parallelism for Volume Rendering on Large, Multi-Core Systems. Mark Howison, E. Wes Bethel, and Hank Childs. Proceedings of the Eurographics Symposium on Parallel Graphics and Visualization (EGPGV’10), Norrköping, Sweden, May 2–3, 2010.

• Application Acceleration on Current and Future Cray Platforms. Alice Koniges, Robert Preissl, Jihan Kim, David Eder, Aaron Fisher, Nathan Masters, Velimir Mlaker, Stephane Ethier, Weixing Wang, Martin Head-Gordon, and Nathan Wichmann. Proceedings of the 2010 Cray User Group Meeting, Edinburgh, Scotland, May 24–27, 2010.

• Seeking Supernovae in the Clouds: A Performance Study. Keith Jackson, Lavanya Ramakrishnan, Karl Runge, and Rollin Thomas. ScienceCloud 2010, the 1st Workshop on Scientific Cloud Computing, Chicago, Ill., June 2010.

• Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud. Keith R. Jackson, Lavanya Ramakrishnan, Krishna Muriki, Shane Canon, Shreyas Cholia, John Shalf, Harvey J. Wasserman, and Nicholas J. Wright. Proceedings of the 2nd IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2010), Bloomington, Ind., Nov. 30–Dec. 1, 2010.

Alice Koniges of NERSC’s Advanced Technologies Group and Gabriele Jost of the Texas Advanced Computing Center were guest editors of a special issue of the journal Scientific Programming titled “Exploring Languages for Expressing Medium to Massive On-Chip Parallelism” (Vol. 18, No. 3–4, 2010). Several NERSC researchers co-authored articles in this issue:

• Guest Editorial: Special Issue: Exploring Languages for Expressing Medium to Massive On-Chip Parallelism. Gabriele Jost and Alice Koniges.

• Overlapping Communication with Computation using OpenMP Tasks on the GTS Magnetic Fusion Code. Robert Preissl, Alice Koniges, Stephan Ethier, Weixing Wang, and Nathan Wichmann.

• A Programming Model Performance Study using the NAS Parallel Benchmarks. Hongzhang Shan, Filip Blagojević, Seung-Jai Min, Paul Hargrove, Haoqiang Jin, Karl Fuerlinger, Alice Koniges, and Nicholas J. Wright.

A paper co-authored by Daniela Ushizima of NERSC’s Analytics Group was on two successive lists of the “Top 25 Hottest Articles” based on number of downloads from the journal Digital Signal Processing, from April to September 2010:

• SAR Imagery Segmentation by Statistical Region Growing and Hierarchical Merging. E. A. Carvalho, D. M. Ushizima, F. N. S. Medeiros, C. I. O. Martins, R. C. P. Marques, and I. N. S. Oliveira. Digital Signal Processing, Volume 20, Issue 5, pp. 1365–1378, Sept 2010.

Other publications by NERSC staff in 2010 are listed below. (Not all co-authors are from NERSC.) Many of these papers are available online at https://www.nersc.gov/news-publications/publications-reports/nersc-staff-publications/.

ALE-AMR: A New 3D Multi-Physics Code for Modeling Laser/Target Effects. A. E. Koniges, N. D. Masters, A. C. Fisher, R. W. Anderson, D. C. Eder, T. B. Kaiser, D. S. Bailey, B. Gunney, P. Wang, B. Brown, K. Fisher, F. Hansen, B. R. Maddox, D. J. Benson, M. Meyers, and A. Geille. Proceedings of the Sixth International Conference on Inertial Fusion Sciences and Applications, Journal of Physics: Conference Series, Vol. 244 (3), 032019.

An Auto-Tuning Framework for Parallel Multicore Stencil Computations. S. Kamil, C. Chan, L. Oliker, J. Shalf, and S. Williams. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp. 1–12, April 19–23, 2010.

An Unusually Fast-Evolving Supernova. Poznanski, Dovi; Chornock, Ryan; Nugent, Peter E.; Bloom, Joshua S.; Filippenko, Alexei V.; Ganeshalingam, Mohan; Leonard, Douglas C.; Li, Weidong; Thomas, Rollin C. Science, Volume 327, Issue 5961, pp. 58– (2010).

Analyzing and Tracking Burning Structures in Lean Premixed Hydrogen Flames. P.-T. Bremer, G. H. Weber, V. Pascucci, M. S. Day, and John B. Bell. IEEE Transactions on Visualization and Computer Graphics, 16(2): 248–260, 2010.

Analyzing the Effect of Different Programming Models upon Performance and Memory Usage on Cray XT5 Platforms. H. Shan, H. Jin, K. Fuerlinger, A. Koniges, and N. Wright. Proceedings of the 2010 Cray User Group, May 24–27, 2010, Edinburgh, Scotland.

Assessment and Mitigation of Radiation, EMP, Debris and Shrapnel Impacts at Mega-Joule Class Laser Facilities. D. C. Eder, et al. J. Physics, Conf. Ser. 244: 0320018, 2010.

Auto-Tuning Stencil Computations on Diverse Multicore Architectures. K. Datta, S. Williams, V. Volkov, J. Carter, L. Oliker, J. Shalf, and K. Yelick. In Scientific Computing with Multicore and Accelerators, edited by Jakub Kurzak, David Bader, and Jack Dongarra. CRC Press, 2010.

Automated Detection and Analysis of Particle Beams in Laser-Plasma Accelerator Simulations. Daniela Ushizima, Cameron Geddes, Estelle Cormier-Michel, E. Wes Bethel, Janet Jacobsen, Prabhat, Oliver Rübel, Gunther Weber, Bernard Hamann, Peter Messmer, and Hans Hagen. In-Tech, 367–389, 2010.

Building a Physical Cell Simulation and Comparing with Confocal Microscopy. C. Rycroft, D. M. Ushizima, R. Saye, C. M. Ghajar, J. A. Sethian. Bay Area Physical Sciences–Oncology Center (NCI) Meeting 2010, UCSF Medical Sciences, Sept. 2010.

Communication Requirements and Interconnect Optimization for High-End Scientific Applications. Shoaib Kamil, Leonid Oliker, Ali Pinar, and John Shalf. IEEE Trans. Parallel Distrib. Syst. 21(2): 188–202 (2010).

Compressive Auto-Indexing in Femtosecond Nanocrystallography. Filipe Maia, Chao Yang, and Stefano Marchesini. Lawrence Berkeley National Laboratory technical report LBNL-4008E, 2010.

Compressive Phase Contrast Tomography. Filipe Maia, Alastair MacDowell, Stefano Marchesini, Howard A. Padmore, Dula Y. Parkinson, Jack Pien, Andre Schirotzek, and Chao Yang. SPIE Optics and Photonics 2010, San Diego, California.

Core-Collapse Supernovae from the Palomar Transient Factory: Indications for a Different Population in Dwarf Galaxies. Arcavi, Iair; Gal-Yam, Avishay; Kasliwal, Mansi M.; Quimby, Robert M.; Ofek, Eran O.; Kulkarni, Shrinivas R.; Nugent, Peter E.; Cenko, S. Bradley; Bloom, Joshua S.; Sullivan, Mark; Howell, D. Andrew; Poznanski, Dovi; Filippenko, Alexei V.; Law, Nicholas; Hook, Isobel; Jönsson, Jakob; Blake, Sarah; Cooke, Jeff; Dekany, Richard; Rahmer, Gustavo; Hale, David; Smith, Roger; Zolkower, Jeff; Velur, Viswa; Walters, Richard; Henning, John; Bui, Kahnh; McKenna, Dan; Jacobsen, Janet. Astrophysical Journal, Volume 721, Issue 1, pp. 777–784 (2010).

Coupling Visualization and Data Analysis for Knowledge Discovery from Multi-Dimensional Scientific Data. Oliver Rübel, Sean Ahern, E. Wes Bethel, Mark D. Biggin, Hank Childs, Estelle Cormier-Michel, Angela DePace, Michael B. Eisen, Charless C. Fowlkes, Cameron G. R. Geddes, Hans Hagen, Bernd Hamann, Min-Yu Huang, Soile V. E. Keränen, David W. Knowles, Cris L. Luengo Hendriks, Jitendra Malik, Jeremy Meredith, Peter Messmer, Prabhat, Daniela Ushizima, Gunther H. Weber, and Kesheng Wu. Proceedings of International Conference on Computational Science, ICCS 2010, May 2010.

Defining Future Platform Requirements for E-Science Clouds. Lavanya Ramakrishnan, Keith R. Jackson, Shane Canon, Shreyas Cholia, and John Shalf. Proceedings of the 1st ACM Symposium on Cloud Computing (SoCC ’10). ACM, New York, NY, USA, 101–106 (2010).

Discovery of Precursor Luminous Blue Variable Outbursts in Two Recent Optical Transients: The Fitfully Variable Missing Links UGC 2773-OT and SN 2009ip. Smith, Nathan; Miller, Adam; Li, Weidong; Filippenko, Alexei V.; Silverman, Jeffrey M.; Howard, Andrew W.; Nugent, Peter; Marcy, Geoffrey W.; Bloom, Joshua S.; Ghez, Andrea M.; Lu, Jessica; Yelda, Sylvana; Bernstein, Rebecca A.; Colucci, Janet E. Astronomical Journal, Volume 139, Issue 4, pp. 1451–1467 (2010).

Efficient Bulk Data Replication for the Earth System Grid. A. Sim, D. Gunter, V. Natarajan, A. Shoshani, D. Williams, J. Long, J. Hick, J. Lee, and E. Dart. International Symposium on Grid Computing, 2010.

Electromagnetic Counterparts of Compact Object Mergers Powered by the Radioactive Decay of r-Process Nuclei. Metzger, B. D.; Martínez-Pinedo, G.; Darbha, S.; Quataert, E.; Arcones, A.; Kasen, D.; Thomas, R.; Nugent, P.; Panov, I. V.; Zinner, N. T. Monthly Notices of the Royal Astronomical Society, Volume 406, Issue 4, pp. 2650–2662 (2010).

Exploitation of Dynamic Communication Patterns through Static Analysis. Robert Preissl, Bronis R. de Supinski, Martin Schulz, Daniel J. Quinlan, Dieter Kranzlmüller, and Thomas Panas. Proc. International Conference on Parallel Processing (ICPP), Sept. 13–16, 2010.

Extreme Scaling of Production Visualization Software on Diverse Architectures. H. Childs, D. Pugmire, S. Ahern, B. Whitlock, M. Howison, Prabhat, G. Weber, and E. W. Bethel. Computer Graphics and Applications, Vol. 30 (3), pp. 22–31, May/June 2010.

File System Monitoring as a Window into User I/O Requirements. A. Uselton, K. Antypas, D. M. Ushizima, and J. Sukharev. Proceedings of the 2010 Cray User Group Meeting, Edinburgh, Scotland, May 24–27, 2010.

H5hut: A High-Performance I/O Library for Particle-Based Simulations. Mark Howison, Andreas Adelmann, E. Wes Bethel, Achim Gsell, Benedikt Oswald, and Prabhat. Proceedings of IASDS10 — Workshop on Interfaces and Abstractions for Scientific Data Storage, Heraklion, Crete, Greece, September 2010.

High Performance Computing and Storage Requirements for High Energy Physics. Richard Gerber and Harvey Wasserman, eds. Lawrence Berkeley National Laboratory technical report LBNL-4128E, 2010.

HPSS in the Extreme Scale Era: Report to DOE Office of Science on HPSS in 2018–2022. D. Cook, J. Hick, J. Minton, H. Newman, T. Preston, G. Rich, C. Scott, J. Shoopman, J. Noe, J. O’Connell, G. Shipman, D. Watson, and V. White. Lawrence Berkeley National Laboratory technical report LBNL-3877E, 2010.

Hybrid Parallelism for Volume Rendering on Large, Multi-Core Systems. M. Howison, E. W. Bethel, and H. Childs. Journal of Physics: Conference Series, Proceedings of SciDAC 2010, July 2010, Chattanooga TN.

Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data. O. Rübel, G. H. Weber, M-Y Huang, E. W. Bethel, M. D. Biggin, C. C. Fowlkes, C. Luengo Hendriks, S. V. E. Keränen, M. Eisen, D. Knowles, J. Malik, H. Hagen and B. Hamann. IEEE Transactions on Computational Biology and Bioinformatics, Vol. 7, No. 1, pp. 64–79, Jan/Mar 2010.

International Exascale Software Project (IESP) Roadmap, version 1.1. Jack Dongarra et al. [incl. John Shalf, David Skinner, and Kathy Yelick]. October 18, 2010, www.exascale.org.

Large Data Visualization on Distributed Memory Multi-GPU Clusters. T. Fogal, H. Childs, S. Shankar, J. Kruger, R. D. Bergeron, and P. Hatcher. High Performance Graphics 2010, Saarbruken, Germany, June 2010.

Laser Ray Tracing in a Parallel Arbitrary Lagrangian Eulerian Adaptive Mesh Refinement Hydrocode. N. D. Masters, D. C. Eder, T. B. Kaiser, A. E. Koniges, and A. Fisher. J. Physics, Conf. Ser. 244: 032022, 2010.

Lessons Learned from Moving Earth System Grid Data Sets over a 20 Gbps Wide-Area Network. R. Kettimuthu, A. Sim, D. Gunter, W. Allcock, P. T. Bremer, J. Bresnahan, A. Cherry, L. Childers, E. Dart, I. Foster, K. Harms, J. Hick, J. Lee, M. Link, J. Long, K. Miller, V. Natarajan, V. Pascucci, K. Raffenetti, D. Ressman, D. Williams, L. Wilson, and L. Winkler. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC ’10). ACM, New York, NY, USA, 316–319.

Magellan: Experiences from a Science Cloud. Lavanya Ramakrishnan, Piotr T. Zbiegel, Scott Campbell, Rick Bradshaw, Richard Shane Canon, Susan Coghlan, Iwona Sakrejda, Narayan Desai, Tina Declerck, and Anping Liu. Proceedings of the 2nd International Workshop on Scientific Cloud Computing (ScienceCloud ’11). ACM, New York, NY, USA, 49–58.

Modeling Heat Conduction and Radiation Transport with the Diffusion Equation in NIF ALE-AMR. A. Fisher, D. S. Bailey, T. B. Kaiser, B. T. N. Gunney, N. D. Masters, A. E. Koniges, D. C. Eder, and R. W. Anderson. J. Physics, Conf. Ser. 244: 022075, 2010.

Multi-Level Bitmap Indexes for Flash Memory Storage. Kesheng Wu, Kamesh Madduri, and Shane Canon. IDEAS ’10: Proceedings of the Fourteenth International Database Engineering and Applications Symposium, Montreal, QC, Canada, 2010.

Nearby Supernova Factory Observations of SN 2007if: First Total Mass Measurement of a Super-Chandrasekhar-Mass Progenitor. Scalzo, R. A.; Aldering, G.; Antilogus, P.; Aragon, C.; Bailey, S.; Baltay, C.; Bongard, S.; Buton, C.; Childress, M.; Chotard, N.; Copin, Y.; Fakhouri, H. K.; Gal-Yam, A.; Gangler, E.; Hoyer, S.; Kasliwal, M.; Loken, S.; Nugent, P.; Pain, R.; Pécontal, E.; Pereira, R.; Perlmutter, S.; Rabinowitz, D.; Rau, A.; Rigaudier, G.; Runge, K.; Smadja, G.; Tao, C.; Thomas, R. C.; Weaver, B.; Wu, C. Astrophysical Journal, Volume 713, Issue 2, pp. 1073–1094 (2010).

Nickel-Rich Outflows Produced by the Accretion-Induced Collapse of White Dwarfs: Light Curves and Spectra. Darbha, S.; Metzger, B. D.; Quataert, E.; Kasen, D.; Nugent, P.; Thomas, R. Monthly Notices of the Royal Astronomical Society, Volume 409, Issue 2, pp. 846–854 (2010).

Ocular Fundus for Retinopathy Characterization. D. M. Ushizima and J. Cuadros. Bay Area Vision Meeting, Feb. 5, Berkeley, CA, 2010.

Optimizing and Tuning the Fast Multipole Method for State-of-the-Art Multicore Architectures. A. Chandramowlishwaran, S. Williams, L. Oliker, I. Lashuk, G. Biros, and R. Vuduc. Proceedings of 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), April 2010.

Parallel Algorithms for Moving Lagrangian Data on Block Structured Eulerian Meshes. A. Dubey, K. B. Antypas, C. Daley. Parallel Computing, Vol. 37 (2): 101–113 (2011).

Parallel and Streaming Generation of Ghost Data for Structured Grids. M. Isenberg, P. Lindstrom, and H. Childs. Computer Graphics and Applications, Vol. 30 (3), pp. 50–62, May/June 2010.

Parallel I/O Performance: From Events to Ensembles. Andrew Uselton, Mark Howison, Nicholas J. Wright, David Skinner, Noel Keen, John Shalf, Karen L. Karavanic, and Leonid Oliker. Proceedings of 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 19–23 April 2010; Atlanta, Georgia; pp. 1–11.

Performance Analysis of Commodity and Enterprise Class Flash Devices. N. Master, M. Andrews, J. Hick, S. Canon, and N. Wright. Petascale Data Storage Workshop (PDSW) 2010, Nov. 2010.

Petascale Parallelization of the Gyrokinetic Toroidal Code. S. Ethier, M. Adams, J. Carter, and L. Oliker. VECPAR: High Performance Computing for Computational Science, June 2010.

Quantitative Visualization of ChIP-chip Data by Using Linked Views. Min-Yu Huang, Gunther H. Weber, Xiao-Yong Li, Mark D. Biggin, and Bernd Hamann. Proceedings IEEE International Conference on Bioinformatics and Biomedicine 2010 (IEEE BIBM 2010) Workshops, Workshop on Integrative Data Analysis in Systems Biology (IDASB), pp. 195–200.

Rapidly Decaying Supernova 2010X: A Candidate “Ia” Explosion. Kasliwal, Mansi M.; Kulkarni, S. R.; Gal-Yam, Avishay; Yaron, Ofer; Quimby, Robert M.; Ofek, Eran O.; Nugent, Peter; Poznanski, Dovi; Jacobsen, Janet; Sternberg, Assaf; Arcavi, Iair; Howell, D. Andrew; Sullivan, Mark; Rich, Douglas J.; Burke, Paul F.; Brimacombe, Joseph; Milisavljevic, Dan; Fesen, Robert; Bildsten, Lars; Shen, Ken; Cenko, S. Bradley; Bloom, Joshua S.; Hsiao, Eric; Law, Nicholas M.; Gehrels, Neil; Immler, Stefan; Dekany, Richard; Rahmer, Gustavo; Hale, David; Smith, Roger; Zolkower, Jeff; Velur, Viswa; Walters, Richard; Henning, John; Bui, Kahnh; McKenna, Dan. Astrophysical Journal Letters, Volume 723, Issue 1, pp. L98–L102 (2010).

Recent Advances in VisIt: AMR Streamlines and Query-Driven Visualization. G. H. Weber, S. Ahern, E. W. Bethel, S. Borovikov, H. Childs, E. Deines, C. Garth, H. Hagen, B. Hamann, K. I. Joy, D. Martin, J. Meredith, Prabhat, D. Pugmire, O. Rübel, B. Van Straalen, and K. Wu. Numerical Modeling of Space Plasma Flows: Astronum-2009 (Astronomical Society of the Pacific Conference Series), Vol. 429, pp. 329–334, 2010.

Resource Management in the Tessellation Manycore OS. J. A. Colmenares, S. Bird, H. Cook, P. Pearce, D. Zhu, J. Shalf, K. Asanovic, and J. Kubiatowicz. Proceedings of the Second USENIX Workshop on Hot Topics in Parallelism (HotPar’10), Berkeley, California, June 2010.

Retinopathy Diagnosis from Ocular Fundus Image Analysis. D. M. Ushizima and F. Medeiros. Modeling and Analysis of Biomedical Image, SIAM Conference on Imaging Science (IS10), Chicago, Il, April 12–14, 2010.

SDWFS-MT-1: A Self-Obscured Luminous Supernova at z ~= 0.2. Kozłowski, Szymon; Kochanek, C. S.; Stern, D.; Prieto, J. L.; Stanek, K. Z.; Thompson, T. A.; Assef, R. J.; Drake, A. J.; Szczygieł, D. M.; Woźniak, P. R.; Nugent, P.; Ashby, M. L. N.; Beshore, E.; Brown, M. J. I.; Dey, Arjun; Griffith, R.; Harrison, F.; Jannuzi, B. T.; Larson, S.; Madsen, K.; Pilecki, B.; Pojmański, G.; Skowron, J.; Vestrand, W. T.; Wren, J. A. Astrophysical Journal, Volume 722, Issue 2, pp. 1624–1632 (2010).

Silicon Nanophotonic Network-On-Chip Using TDM Arbitration. Gilbert Hendry, Johnnie Chan, Shoaib Kamil, Lenny Oliker, John Shalf, Luca P. Carloni, and Keren Bergman. IEEE Symposium on High Performance Interconnects (HOTI) 5.1 (Aug. 2010).

SN 2008iy: An Unusual Type IIn Supernova with an Enduring 400-d Rise Time. Miller, A. A.; Silverman, J. M.; Butler, N. R.; Bloom, J. S.; Chornock, R.; Filippenko, A. V.; Ganeshalingam, M.; Klein, C. R.; Li, W.; Nugent, P. E.; Smith, N.; Steele, T. N. Monthly Notices of the Royal Astronomical Society, Volume 404, Issue 1, pp. 305–317 (2010).

Spectra and Hubble Space Telescope Light Curves of Six Type Ia Supernovae at 0.511 < z < 1.12 and the Union2 Compilation. Amanullah, R.; Lidman, C.; Rubin, D.; Aldering, G.; Astier, P.; Barbary, K.; Burns, M. S.; Conley, A.; Dawson, K. S.; Deustua, S. E.; Doi, M.; Fabbro, S.; Faccioli, L.; Fakhouri, H. K.; Folatelli, G.; Fruchter, A. S.; Furusawa, H.; Garavini, G.; Goldhaber, G.; Goobar, A.; Groom, D. E.; Hook, I.; Howell, D. A.; Kashikawa, N.; Kim, A. G.; Knop, R. A.; Kowalski, M.; Linder, E.; Meyers, J.; Morokuma, T.; Nobili, S.; Nordin, J.; Nugent, P. E.; Östman, L.; Pain, R.; Panagia, N.; Perlmutter, S.; Raux, J.; Ruiz-Lapuente, P.; Spadafora, A. L.; Strovink, M.; Suzuki, N.; Wang, L.; Wood-Vasey, W. M.; Yasuda, N.; Supernova Cosmology Project, The. Astrophysical Journal, Volume 716, Issue 1, pp. 712–738 (2010).

Streamline Integration using MPI-Hybrid Parallelism on Large Multi-Core Architecture. David Camp, Christoph Garth, Hank Childs, David Pugmire, and Kenneth Joy. IEEE Transactions on Visualization and Computer Graphics, 07 Dec. 2010.

Supernova PTF 09UJ: A Possible Shock Breakout from a Dense Circumstellar Wind. Ofek, E. O.; Rabinak, I.; Neill, J. D.; Arcavi, I.; Cenko, S. B.; Waxman, E.; Kulkarni, S. R.; Gal-Yam, A.; Nugent, P. E.; Bildsten, L.; Bloom, J. S.; Filippenko, A. V.; Forster, K.; Howell, D. A.; Jacobsen, J.; Kasliwal, M. M.; Law, N.; Martin, C.; Poznanski, D.; Quimby, R. M.; Shen, K. J.; Sullivan, M.; Dekany, R.; Rahmer, G.; Hale, D.; Smith, R.; Zolkower, J.; Velur, V.; Walters, R.; Henning, J.; Bui, K.; McKenna, D. Astrophysical Journal, Volume 724, Issue 2, pp. 1396–1401 (2010).

Transitioning Users from the Franklin XT4 System to the Hopper XE6 System. K. Antypas, Y. He. CUG Proceedings 2011, Fairbanks, Alaska.

Two-Stage Framework for a Topology-Based Projection and Visualization of Classified Document Collections. Patrick Oesterling, Gerik Scheuermann, Sven Teresniak, Gerhard Heyer, Steffen Koch, Thomas Ertl, and Gunther H. Weber. Proceedings IEEE Symposium on Visual Analytics Science and Technology (IEEE VAST), Salt Lake City, October 2010.

Type II-P Supernovae as Standard Candles: The SDSS-II Sample Revisited. Poznanski, Dovi; Nugent, Peter E.; Filippenko, Alexei V. Astrophysical Journal, Volume 721, Issue 2, pp. 956–959 (2010).

Unveiling the Origin of Grb 090709A: Lack of Periodicity in a Reddened Cosmological Long-Duration Gamma-Ray Burst. Cenko, S. B.; Butler, N. R.; Ofek, E. O.; Perley, D. A.; Morgan, A. N.; Frail, D. A.; Gorosabel, J.; Bloom, J. S.; Castro-Tirado, A. J.; Cepa, J.; Chandra, P.; de Ugarte Postigo, A.; Filippenko, A. V.; Klein, C. R.; Kulkarni, S. R.; Miller, A. A.; Nugent, P. E.; Starr, D. L. Astronomical Journal, Volume 140, Issue 1, pp. 224–234 (2010).

Vessel Network Detection Using Contour Evolution and Color Components. D. M. Ushizima, F. N. S. Medeiros, J. Cuadros, C. I. O. Martins. Proceedings of the 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Buenos Aires, Argentina. Sept 2010.

Visualization and Analysis-Oriented Reconstruction of Material Interfaces. J. Meredith and H. Childs. Eurographics/IEEE Symposium on Visualization (EuroVis), Bordeaux, France, June 2010.

What’s Ahead for Fusion Computing? Alice Koniges, Robert Preissl, Stephan Ethier, and John Shalf. International Sherwood Fusion Theory Conference, April 2010, Seattle, Washington.

Appendix A: NERSC Policy Board

The NERSC Policy Board meets at least annually and provides scientific and executive-level advice to the Berkeley Lab Director regarding the overall NERSC program and, specifically, on such issues as resource utilization to maximize the present and future scientific impact of NERSC, and long-range planning for the program, including the research and development necessary for future capabilities. Policy Board members are widely respected leaders in science, computing technology, or the management of scientific research and/or facilities.

Daniel A. Reed, Policy Board Chair Corporate Vice President, Extreme Computing Group Microsoft Corporation

Achim Bachem Chairman of the Board of Directors Forschungszentrum Jülich GmH

Jack Dongarra
University Distinguished Professor
Director, Innovative Computing Laboratory
University of Tennessee

Stephane Ethier
Ex officio, NERSC Users Group Chair
Computational Plasma Physics Group
Princeton Plasma Physics Laboratory

Sidney Karin
Professor of Computer Science and Engineering
University of California, San Diego

Pier Oddone
Director
Fermi National Accelerator Laboratory

Edward Seidel
Assistant Director for Mathematics and Physical Sciences
National Science Foundation

Appendix B
NERSC Client Statistics

In support of the DOE Office of Science’s mission, NERSC served 4,294 scientists throughout the United States in 2010. These researchers work in DOE laboratories, universities, industry, and other Federal agencies. Figure 1 shows the proportion of NERSC usage by each type of institution, while Figures 2 and 3 show laboratory, university, and other organizations that used large allocations of computer time. Computational science conducted at NERSC covers the entire range of scientific disciplines, but is focused on research that supports the DOE’s mission and scientific goals, as shown in Figure 4.

On their Allocation Year 2011 proposal forms, principal investigators reported 1,528 refereed publications (published or in press), as well as 262 publications submitted to refereed journals, for the preceding 12 months, based at least in part on the use of NERSC resources. Lists of publications resulting from the use of NERSC resources are available at https://www.nersc.gov/news-publications/publications-reports/nersc-user-publications/.

The MPP hours reported here are Cray XT4 equivalent hours.

Figure 1. NERSC usage by institution type, 2010 (MPP hours):
Universities 58.3%
DOE Labs 32.9%
Other Government Labs 7.4%
Industry 1.4%

Figure 2. DOE and other laboratory usage at NERSC, 2010 (MPP hours):
Berkeley Lab 32,687,481
NCAR 19,566,332
Oak Ridge 12,098,447
PPPL 11,218,833
PNNL 8,240,839
Argonne 6,860,146
NREL 3,421,450
General Atomics 3,229,656
SLAC 3,214,480
Livermore Lab 3,014,448
Brookhaven Lab 2,994,457
Sandia Lab CA 2,948,031
Fermilab 2,762,998
Los Alamos Lab 1,508,455
Ames Lab 1,239,465
Idaho National Lab 1,059,134
NOAA 722,827
Tech-X Corp. 667,089
Naval Research Lab 408,951
Jefferson Lab 401,467
NASA GISS 347,325
JGI 266,382
Army Engineer Center 112,398
Others (8) 296,867

Figure 3. Academic usage at NERSC, 2010 (MPP hours):
U. Washington 14,915,012
U. Arizona 14,464,867
UC Santa Cruz 14,356,525
UC Berkeley 12,725,264
UCLA 7,385,969
U. Missouri KC 5,715,209
Auburn Univ 4,939,672
MIT 4,795,298
U. Colorado Boulder 4,722,811
Stanford Univ 4,711,737
U. Wisc. Madison 4,436,563
U. Texas Austin 4,103,468
Courant Inst 4,008,502
UC Irvine 3,815,734
Cal Tech 3,482,915
Northwestern 2,922,222
U. Illinois U-C 2,429,845
U. Alaska 2,271,199
Harvard Univ 2,192,261
U. Central Florida 2,139,024
William & Mary 2,047,854
UC Santa Barbara 1,983,743
Indiana St. U. 1,952,762
UC San Diego 1,935,143
U. Maryland 1,931,734
Princeton Univ 1,824,899
U. New Hampshire 1,676,698
Florida State 1,658,498
Iowa State 1,591,337
Vanderbilt Univ 1,456,660
U. Florida 1,362,126
Colorado State 1,321,559
U. Kentucky 1,313,287
Yale Univ 1,173,411
UC Davis 1,168,201
U. Utah 1,154,251
North Carolina State 1,123,068
Louisiana State 1,083,770
George Wash Univ 1,059,775
Texas A&M Univ 1,020,161
Others (67) 16,395,335

Figure 4. NERSC usage by scientific discipline, 2010 (MPP hours):
Materials Science 18.0%
Fusion Energy 16.5%
Climate Research 13.6%
Lattice Gauge Theory 12.6%
Chemistry 12.4%
Astrophysics 6.6%
Accelerator Physics 5.5%
Life Sciences 4.9%
Combustion 2.1%
Environmental Sciences 1.8%
Geosciences 1.7%
Nuclear Physics 1.6%
Applied Math 1.4%
Computer Sciences 0.7%
Engineering 0.6%
High Energy Physics 0.016%
Other 0.001%

Appendix C
NERSC Users Group Executive Committee

Office of Advanced Scientific Computing Research
Mark Adams, Columbia University
Sergei Chumakov, Stanford University
Mike Lijewski, Lawrence Berkeley National Laboratory

Office of Basic Energy Sciences
Eric Bylaska, Pacific Northwest National Laboratory
Paul Kent, Oak Ridge National Laboratory
Brian Moritz, SLAC National Accelerator Laboratory

Office of Biological and Environmental Research
David Beck, University of Washington
Barbara Collignon, Oak Ridge National Laboratory**
Brian Hingerty, Oak Ridge National Laboratory*
Adrianne Middleton, National Center for Atmospheric Research

Office of Fusion Energy Sciences
Stephane Ethier (Chair), Princeton Plasma Physics Laboratory
Linda Sugiyama, Massachusetts Institute of Technology
Jean-Luc Vay, Lawrence Berkeley National Laboratory

Office of High Energy Physics
Julian Borrill, Lawrence Berkeley National Laboratory
Cameron Geddes, Lawrence Berkeley National Laboratory*
Steven Gottlieb, Indiana University**
Frank Tsung (Vice Chair), University of California, Los Angeles

Office of Nuclear Physics
David Bruhwiler, Tech-X Corporation*
Daniel Kasen, Lawrence Berkeley National Laboratory**
Peter Messmer, Tech-X Corporation*
Tomasz Plewa, Florida State University
Martin Savage, University of Washington**

Members at Large
Angus Macnab, DyNuSim, LLC*
Gregory Newman, Lawrence Berkeley National Laboratory**
Jason Nordhaus, Princeton University
Ned Patton, National Center for Atmospheric Research*
Maxim Umansky, Lawrence Livermore National Laboratory

* Outgoing members
** Incoming members

Appendix D
Office of Advanced Scientific Computing Research

The mission of the Advanced Scientific Computing Research (ASCR) program is to discover, develop, and deploy computational and networking capabilities to analyze, model, simulate, and predict complex phenomena important to the Department of Energy (DOE). A particular challenge of this program is fulfilling the science potential of emerging computing systems and other novel computing architectures, which will require numerous significant modifications to today’s tools and techniques to deliver on the promise of exascale science.

To accomplish its mission and address those challenges, the ASCR program is organized into two subprograms: Mathematical, Computational, and Computer Sciences Research; and High Performance Computing and Network Facilities.

• The Mathematical, Computational, and Computer Sciences Research subprogram develops mathematical descriptions, models, methods, and algorithms to describe and understand complex systems, often involving processes that span a wide range of time and/or length scales. The subprogram also develops the software to make effective use of advanced networks and computers, many of which contain thousands of multi-core processors with complicated interconnections, and to transform enormous data sets from experiments and simulations into scientific insight.

• The High Performance Computing and Network Facilities subprogram delivers forefront computational and networking capabilities and contributes to the development of next-generation capabilities through support of prototypes and testbeds.

Berkeley Lab thanks the program managers with direct responsibility for the NERSC program and the research projects described in this report:

Daniel Hitchcock, Acting Associate Director, ASCR
Barbara Helland, Senior Advisor
Walter M. Polansky, Senior Advisor
Robert Lindsay, Special Projects
Julie Scott, Financial Management Specialist
Lori Jernigan, Program Support Specialist
Melea Baker, Administrative Specialist

Appendix D (continued)
Office of Advanced Scientific Computing Research

Facilities Division
Daniel Hitchcock, Division Director
Vincent Dattoria, Computer Scientist
Dave Goodwin, Physical Scientist
Benjy Grover, Computer Scientist (Detailee)
Barbara Helland, Computer Scientist
Sally McPherson, Program Assistant
Betsy Riley, Computer Scientist
Yukiko Sekine, Computer Scientist

Computational Science Research and Partnerships (SciDAC) Division
William Harrod, Division Director
Teresa Beachley, Program Assistant
Rich Carlson, Computer Scientist
Christine Chalk, Physical Scientist
Sandy Landsberg, Mathematician
Randall Laviolette, Physical Scientist
Steven Lee, Physical Scientist
Lenore Mullin, Computer Scientist
Thomas Ndousse-Fetter, Computer Scientist
Lucy Nowell, Computer Scientist
Karen Pao, Mathematician
Sonia Sachs, Computer Scientist
Ceren Susut, Physical Scientist
Angie Thevenot, Program Assistant

Appendix E
Advanced Scientific Computing Advisory Committee

The Advanced Scientific Computing Advisory Committee (ASCAC) provides valuable, independent advice to the Department of Energy on a variety of complex scientific and technical issues related to its Advanced Scientific Computing Research program. ASCAC’s recommendations include advice on long-range plans, priorities, and strategies to address more effectively the scientific aspects of advanced scientific computing, including the relationship of advanced scientific computing to other scientific disciplines, and on maintaining an appropriate balance among elements of the program. The Committee formally reports to the Director, Office of Science. The Committee primarily includes representatives of universities, national laboratories, and industries involved in advanced computing research. Particular attention is paid to obtaining a diverse membership with a balance among scientific disciplines, institutions, and geographic regions.

Roscoe C. Giles, Chair, Boston University
Marsha Berger, Courant Institute of Mathematical Sciences
Jackie Chen, Sandia National Laboratories
Jack J. Dongarra, University of Tennessee
Susan L. Graham, University of California, Berkeley
James J. Hack, Oak Ridge National Laboratory
Anthony Hey, Microsoft Research
Thomas A. Manteuffel, University of Colorado at Boulder
John Negele, Massachusetts Institute of Technology
Linda R. Petzold, University of California, Santa Barbara
Vivek Sarkar, Rice University
Larry Smarr, University of California, San Diego
William M. Tang, Princeton Plasma Physics Laboratory
Victoria White, Fermi National Accelerator Laboratory

Appendix F
Acronyms and Abbreviations

ACM  Association for Computing Machinery
ACS  American Chemical Society
AIP  American Institute of Physics
ALCF  Argonne Leadership Computing Facility
AMR  Adaptive mesh refinement
API  Application programming interface
APS  American Physical Society
ARPA-E  Advanced Research Projects Agency-Energy (DOE)
ARRA  American Recovery and Reinvestment Act
ASCR  Office of Advanced Scientific Computing Research (DOE)
BER  Office of Biological and Environmental Research (DOE)
BES  Office of Basic Energy Sciences (DOE)
BNL  Brookhaven National Laboratory
C2H5OH  Ethanol
CAF  Co-Array Fortran
CAS  Chinese Academy of Sciences
CATH  Class, Architecture, Topology and Homology
CCNI  Computational Center for Nanotechnology Innovations
CCSE  Center for Computational Sciences and Engineering (LBNL)
CDU  Cooling distribution unit
CH3OH  Methanol
CH4  Methane
CNRS  Centre National de la Recherche Scientifique, France
CNT  Carbon nanotube
CO2  Carbon dioxide
COE  Center of Excellence
CPU  Central processing unit
CRD  Computational Research Division, Lawrence Berkeley National Laboratory
CSCS  Swiss National Supercomputing Centre
CSIR  Council of Scientific and Industrial Research, India
CUDA  Compute Unified Device Architecture
CXI  Coherent X-ray imaging
CXIDB  Coherent X-Ray Imaging Data Bank
DAE  Department of Atomic Energy, India
DFT  Density functional theory
DNA  Deoxyribonucleic acid
DOE  U.S. Department of Energy
DOS  Density of states
DST  Department of Science and Technology, India
DVS  Data Virtualization Services
EFRC  Energy Frontier Research Center
ELM  Edge localized mode
EPSRC  Engineering and Physical Sciences Research Council, UK
ESG  Earth System Grid
ESnet  Energy Sciences Network
FAPESP–CNPq  Fundação de Amparo à Pesquisa do Estado de São Paulo–Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brazil
FES  Office of Fusion Energy Sciences (DOE)
FOM  Fundamenteel Onderzoek der Materie, Netherlands
GA  Grant Agency of the Czech Republic
GB  Gigabyte
Gflop/s  Gigaflop/s
GHz  Gigahertz
GNR  Graphene nanoribbon
GPFS  General Parallel File System
GPGPU  General-purpose graphics processing unit
GPU  Graphics processing unit
H2  Hydrogen
HCO  Hydrogen carbonate (transition state)
HEP  Office of High Energy Physics (DOE)
HMA  Highly mismatched alloy
HPC  High performance computing
HPSS  High Performance Storage System
HTML  Hypertext Markup Language
HTTP  Hypertext Transfer Protocol
I/O  Input/output
ICCS  International Center for Computational Science
ICIAM  International Council for Industrial and Applied Mathematics
IEEE  Institute of Electrical and Electronics Engineers
IN2P3  Institut National de Physique Nucléaire et Physique des Particules, France
INCITE  Innovative and Novel Computational Impact on Theory and Experiment (DOE)
IPCC  Intergovernmental Panel on Climate Change
IT  Information technology
ITER  Latin for “the way”; an international fusion energy experiment in southern France
JGI  Joint Genome Institute (DOE)
KAUST  King Abdullah University of Science and Technology
KISTI  Korea Institute of Science and Technology Information
KRF  Korea Research Foundation
LBNL  Lawrence Berkeley National Laboratory
LED  Light-emitting diode
LLNL  Lawrence Livermore National Laboratory
LRZ  Leibniz Supercomputing Centre
MB  Megabyte
MESRF  Ministry of Education and Science of the Russian Federation
MHD  Magnetohydrodynamic
MIT  Massachusetts Institute of Technology
Mo6S8  Molybdenum sulfide nanocluster
MoE  Ministry of Education of the People’s Republic of China
MOM  Machine-oriented miniserver
MoS2  Molybdenum disulfide
MoST  Ministry of Science and Technology of the People’s Republic of China
MPI  Message Passing Interface
MPP  Massively parallel processing
MSES  Ministry of Science, Education and Sports of the Republic of Croatia
MSMT  Ministerstvo Školství, Mládeže a Tělovýchovy, Czech Republic
NASA  National Aeronautics and Space Administration
NCAR  National Center for Atmospheric Research
NCSA  National Center for Supercomputing Applications
NERSC  National Energy Research Scientific Computing Center
NEWT  NERSC Web Toolkit
NGF  NERSC Global Filesystem
NICS  National Institute for Computational Sciences, University of Tennessee
NIH  National Institutes of Health
NISE  NERSC Initiative for Scientific Exploration
NMR  Nuclear magnetic resonance
NNSFC  National Natural Science Foundation of China
NO2  Nitrogen dioxide
NOW  Nederlandse Organisatie voor Wetenschappelijk Onderzoek, Netherlands
NP  Office of Nuclear Physics (DOE)
NSF  National Science Foundation
NUG  NERSC Users Group
O2  Oxygen
O3  Ozone
OSC  Ohio Supercomputer Center
OSG  Open Science Grid
OSU  Ohio State University
PB  Petabyte
PDB  Protein Data Bank
PDSF  Parallel Distributed Systems Facility (NERSC)
PECASE  Presidential Early Career Award for Scientists and Engineers
Petaflops  Quadrillions of floating point operations per second
Pflops  Petaflops
PGI  Portland Group, Inc.
PMAMR  Porous Media AMR (computer code)
PMSHE  Polish Ministry of Science and Higher Education
PNAS  Proceedings of the National Academy of Sciences
PPPL  Princeton Plasma Physics Laboratory
PRACE  Partnership for Advanced Computing in Europe
PTF  Palomar Transient Factory
QMC  Quantum Monte Carlo
RCF  RHIC Computing Facility
REST  Representational state transfer
RHIC  Relativistic Heavy Ion Collider
RMST  Russian Ministry of Science and Technology
ROSATOM  State Atomic Energy Corporation, Russia
SC  Office of Science (DOE)
SciDAC  Scientific Discovery through Advanced Computing (DOE)
SCOP  Structural Classification of Proteins
SDN  Science Data Network (ESnet)
SIAM  Society for Industrial and Applied Mathematics
SLIRP  Structural Library of Intrinsic Residue Propensities
SMP  Symmetric multiprocessor
STFC  Science and Technology Facilities Council, UK
SUNY  State University of New York
TACC  Texas Advanced Computing Center
TB  Terabyte
TCP  Transmission Control Protocol
Teraflops  Trillions of floating point operations per second
TIGRESS  Terascale Infrastructure for Groundbreaking Research in Engineering and Science High Performance Computing Center, Princeton University
TPC  Time projection chamber
UC  University of California
UCB  University of California, Berkeley
UPC  Unified Parallel C (programming language)
URL  Uniform Resource Locator
UV  Ultraviolet

For more information about NERSC, contact:

Jon Bashor
NERSC Communications
Berkeley Lab, MS 50B4230
1 Cyclotron Road
Berkeley, CA 94720-8148
email: [email protected]
phone: (510) 486-5849
fax: (510) 486-4300

NERSC’s web site http://www.nersc.gov/

DISCLAIMER
This document was prepared as an account of work sponsored by the United States Government. While this document is believed to contain correct information, neither the United States Government nor any agency thereof, nor The Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or The Regents of the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof, or The Regents of the University of California.

NERSC Annual Report Editor: John Hules

Contributing writers: John Hules, Jon Bashor, Linda Vu, and Margie Wylie, Berkeley Lab Computing Sciences Communications Group; Paul Preuss and Lynn Yarris, Berkeley Lab Public Affairs Department; David L. Chandler, MIT News Office; Pam Frost Gorder, Ohio State University; Kitta MacPherson, Princeton University Communications Office; Rachel Shafer, UC Berkeley Engineering Marketing and Communications Office; Karen McNulty Walsh, Brookhaven National Laboratory.

Design, layout, illustrations, photography, and printing coordination provided by the Creative Services Office of Berkeley Lab’s Public Affairs Department.

Ernest Orlando Lawrence Berkeley National Laboratory is an equal opportunity employer. CSO 20748

Matthew Andrews

Katie Antypas ∙ Brian Austin ∙ Clayton Bagwell

William Baird ∙ Nicholas Balthaser ∙ Jon Bashor ∙ Elizabeth Bautista

Del Black ∙ Jeremy Brand ∙ Jeffrey Broughton ∙ Gregory Butler ∙ Tina Butler

David Camp ∙ Scott Campbell ∙ Shane Canon ∙ Nicholas Cardo ∙ Jonathan Carter ∙ Steve Chan

Ravi Cheema ∙ Hank Childs ∙ Shreyas Cholia ∙ Roxanne Clark ∙ James Craw ∙ Thomas Davis

Tina Declerck ∙ David Donofrio ∙ Brent Draney ∙ Matt Dunford ∙ Norma Early ∙ Kirsten Fagnan ∙ Aaron Garrett

Richard Gerber ∙ Annette Greiner ∙ Jeff Grounds ∙ Leah Gutierrez ∙ Patrick Hajek ∙ Matthew Harvey

Damian Hazen ∙ Helen He ∙ Mark Heer ∙ Jason Hick ∙ Eric Hjort ∙ John Hules ∙ Wayne Hurlbert ∙ John Hutchings

Christos Kavouklis ∙ Randy Kersnick ∙ Jihan Kim ∙ Alice Koniges ∙ Yu Lok Lam ∙ Thomas Langley ∙ Craig Lant

Stefan Lasiewski ∙ Jason Lee ∙ Rei Chi Lee ∙ Bobby Liu ∙ Fred Loebl ∙ Burlen Loring ∙ Steven Lowe ∙ Filipe Maia

Ilya Malinov ∙ Jim Mellander ∙ Joerg Meyer ∙ Praveen Narayanan ∙ Robert Neylan ∙ Trever Nightingale

Peter Nugent ∙ Isaac Ovadia ∙ R. K. Owen ∙ Viraj Paropkari ∙ David Paul ∙ Lawrence Pezzaglia ∙ Jeff Porter

Prabhat ∙ Robert Preissl ∙ Tony Quan ∙ Lavanya Ramakrishnan ∙ Lynn Rippe ∙ Cody Rotermund

Oliver Ruebel ∙ Iwona Sakrejda ∙ John Shalf ∙ Hongzhang Shan ∙ Hemant Shukla ∙ David Skinner

Jay Srinivasan ∙ Suzanne Stevenson ∙ David Stewart ∙ Michael Stewart ∙ David Turner ∙ Alex Ubungen

Andrew Uselton ∙ Daniela Ushizima ∙ Francesca Verdier ∙ Linda Vu ∙ Howard Walter

Harvey Wasserman ∙ Gunther Weber ∙ Mike Welcome ∙ Nancy Wenning ∙ Cary Whitney

Nicholas Wright ∙ Margie Wylie ∙ Woo-Sun Yang ∙ Yushu Yao

Katherine Yelick ∙ Rebecca Yuan ∙ Brian Yumae

Zhengji Zhao