NOVEMBER 2011

SUPERCOMPUTERS AT THE FRONTIERS OF EXTREME COMPUTING

PUBLISHED IN PARTNERSHIP WITH

Research and Innovation with HPC

At the interface of computer science and mathematics, Inria researchers have spent 40 years establishing the scientific bases of a new field of knowledge: computational science. In interaction with other scientific disciplines, computational science offers new concepts, languages, methods and subjects for study that open new perspectives in the understanding of complex phenomena.

HPC
High Performance Computing is a strategic topic for Inria: about thirty Inria research teams are involved. Inria has thus established large-scale strategic partnerships with Bull for the design of future HPC architectures and with EDF R&D focused on high-performance simulation for energy applications.

Joint Laboratory
At the international level, Inria and the University of Illinois at Urbana-Champaign (United States) created a joint laboratory for research in supercomputing, the Joint Laboratory for Petascale Computing (JLPC), in 2009. The work of this laboratory focuses on the development of algorithms and software for computers at the petaflop scale and beyond. The laboratory's researchers carry out their work as part of the Blue Waters project. It is also noteworthy that several former Inria spin-off companies, such as Kerlabs, Caps Enterprise, Activeon or Sysfera, have developed their business on this market.

SMEs
Finally, in order to boost technology transfer from public research to industry, which is part of Inria's core mission, the institute has launched an «SME go HPC» Program together with GENCI, OSEO and four French industry clusters (Aerospace Valley, Axelera, Minalogic, Systematic). The objective of the Program is to bring high-level expertise to SMEs willing to move to simulation and HPC as a means to strengthen their competitiveness. SMEs wanting to make use of high-performance computing or simulation to develop their products and services (design, modelling, system, test, processing and visualisation of data) can apply on the website devoted to this HPC-SME Initiative.

www.inria.fr
www.initiative-hpc-pme.org

Inria is the only French national public research organization fully dedicated to digital sciences, and it hosts more than 1,000 young researchers each year.

OUR STAKE IN THE FUTURE

BY PHILIPPE VANNIER, Chairman and CEO of Bull

High-performance computing, or HPC, has gradually become a part of our daily lives, even if we are not always aware of it. It is in our medicines, our investments, in the films we go to see at the cinema and the equipment of our favourite athletes, the cars we drive and the petrol that they run on. It makes our world a safer place, where our resources are used more wisely, and, thanks to researchers, a world we can more easily understand. Yet these giant steps forward, notably the breaking of the petaflops barrier, or one million billion operations per second, will soon seem modest indeed as even greater technological upheavals lie ahead. Cloud computing is revolutionising and broadening access to scientific computing. The exaflops, 1,000 times more powerful than a petaflops, will give a new dimension to digital simulations. Today the great regions of the world, with the United States in the lead, have taken significant steps to ensure control of future technologies. Up until now, Europe has remained on the sidelines. We need to act quickly if we want to hold on to this know-how, which is essential for our independence, our research and our industries, and preserve our jobs.

The TERATEC Technopole

Created by a CEA initiative to develop and promote high-performance simulation and computing, the TERATEC technopole is located in Bruyères-le-Châtel, in the southern part of Île-de-France, and includes all the elements of the HPC and simulation value chain around three entities:

The CEA Very Large Computing Center (TGCC)
An infrastructure dedicated to high-performance computing, equipped in particular with the CCRT machines and the European PRACE machine. It is also a place for exchanges and meetings, with a "conference area" including a 200-seat auditorium.

The TERATEC Campus
In the TERATEC Technopole, facing the CEA Very Large Computing Center, the TERATEC Campus, with more than 13,000 m², brings together: industrial companies (systems, software and services) and a business center plus an incubator; industrial research laboratories, such as the Exascale Computing Research Lab (Intel/CEA/GENCI/UVSQ) and the Extreme Computing Lab (Bull/CEA); a European HPC Training Institute; and platform services accessible to all industrial companies and research organizations. The objective of the TERATEC Campus is to provide professionals in the field of high-performance simulation and computing with a dynamic and user-friendly environment to serve as a crossroads for innovation in three major areas: systems performance and architecture, software development, and services.

The TERATEC Association
The TERATEC Association brings together more than 80 partners from industry and research that share the advanced usage and development of systems, software or services dedicated to high-performance simulation and computing. TERATEC federates and leads the HPC community to promote and develop numerical design and simulation, and facilitates exchanges and collaborations between participants. Each year, TERATEC organizes the TERATEC Forum, the major event in this domain in France and in Europe (next edition: June 26 and 27, 2012; more at www.teratec.eu).

If you are interested in joining the TERATEC Campus, contact TERATEC: [email protected] or +33 (0)1 69 26 61 76.

CONTENTS

NEW HORIZONS
06 AN ONGOING CHALLENGE FOR SUPERCOMPUTERS
Since the end of nuclear testing, the CEA is taking up the challenge of ensuring the reliability and security of nuclear weapons through simulations alone.
08 HIGH-PERFORMANCE COMPUTING FOR ALL!
Genci intends to provide all scientists access to high-performance computing.
11 INRIA IS LEADING THE WAY IN HPC
Digital simulation on supercomputers is driving France in the race to Exascale.
12 TERA 100: A LEADER IN EFFICIENCY
Tera 100 is 7 times more energy-efficient than its predecessor Tera 10.
14 TRI-GATE 3D TRANSISTORS IN THE RACE TO EXASCALE
The development of exaflopic computers will depend on major technological breakthroughs.

MAJOR CHALLENGES
16 MODELLING MOLECULES FOR MORE EFFECTIVE TREATMENTS
Simulation should orient research towards new drugs.
20 USING SUPERCOMPUTERS TO IMPROVE TSUNAMI WARNING SYSTEMS
The effects of submarine earthquakes on coastlines could be predicted in just 15 minutes!
22 FUTURE NUCLEAR REACTORS ALREADY BENEFIT FROM HPC
National security also relies on three-dimensional modelling.
24 WATCHING MATERIALS GROW, ONE ATOM AT A TIME
Simulating growth at the atomic level will lead to mastery of nanoelectronics.
26 CALCULATING NUCLEAR DISSUASION
Modelling and simulations are the key tools in nuclear design.
28 UNDERSTANDING HOW A STAR IS BORN
Analysing what happens when galaxies collide and how stars are born.
30 THE PHYSICS OF SHOCKS ON AN ATOMIC SCALE
The mechanics of materials must be understood at the atomic level.
32 MARTENSITIC DEFORMATIONS SEEN THROUGH THE PROCESSOR PRISM
Metal alloys can spring back to their initial shape after a major transformation.
34 USING GRAPHICS PROCESSORS TO VISUALISE LIGHT
Or the eternal question of how laser beams behave…

THE FUTURE: EXASCALE COMPUTING
35 THE NEXT CHALLENGE: CONTROLLING ENERGY CONSUMPTION
Improving the energy efficiency of memories and processors is a real challenge for tomorrow's machines…
41 CORRECTING ERRORS IS A TOP PRIORITY
In the run-up to Exascale, simulation should help researchers confirm calculations, even in the event of failures.

Supplement 2 of "La Recherche" cannot be sold separately from supplement 1 (LR N° 457). "La Recherche" is published by Sophia Publications, a subsidiary of Financière Tallandier.
SOPHIA PUBLICATIONS, 74, avenue du Maine, 75014 Paris, France. Tel.: +33 (0)1 44 10 10 10. Editorial office email: [email protected]
CEO AND PUBLICATION MANAGER: Philippe Clerget
MANAGEMENT ADVISOR: Jean-Michel Ghidaglia
EDITORIAL DIRECTOR: Aline Richard
EDITOR-IN-CHIEF: Luc Allemand
DEPUTY EDITOR-IN-CHIEF FOR SUPPLEMENT 2: Thomas Guillemain
EDITORIAL ASSISTANT FOR SUPPLEMENT 2: Jean-Marc Denis
ARTWORK AND LAYOUT: A noir, +33 (0)1 48 06 22 22
PRODUCTION: Christophe Perrusson (1378)
SALES, ADVERTISING AND DEVELOPMENT: Caroline Nourry (1396)
CUSTOMER RELATIONS: Laurent Petitbon (1212)
ADMINISTRATIVE AND FINANCE DIRECTOR: Dounia Ammor
SALES AND PROMOTION: Évelyne Miont (1380)
To contact a member of the editorial team directly by phone, dial +33 (0)1 44 10 followed by the four digits after his or her name.
Joint commission of Press Publications and Agencies: 0909 K 85863. ISSN 0029-5671. PRINTED IN ITALY BY G. Canale & C., Via Liguria 24, 10071 Borgaro, Torino. Copyright deposit © 2011 SOPHIA PUBLICATIONS.
Headings, subheadings, presentation texts and captions are written by the editorial office. The law of March 11, 1957 prohibits copying or reproduction intended for collective use. Any representation or reproduction in full or in part made without the consent of the author, or of his assigns or assignees, is unlawful (article L.122-4 of the French intellectual property Code). Any duplication must be approved by the French copyright agency (CFC, 20, rue des Grands-Augustins, 75006 Paris, France. Tel.: +33 (0)1 44 07 47 70, Fax: +33 (0)1 46 34 67 19). The editor reserves the right to refuse any insert that would be deemed contrary to the moral or material interests of the publication.

NEW HORIZONS

1996: FRANCE DECIDES TO DEFINITIVELY STOP ALL NUCLEAR TESTING. THIS MEANS A NEW CHALLENGE FOR THE CEA: GUARANTEEING, BY 2011, THE RELIABILITY AND SECURITY OF NUCLEAR WEAPONS EXCLUSIVELY VIA SIMULATIONS. THE FOLLOWING IS A RECAP OF THIS FIFTEEN-YEAR INDUSTRIAL AND RESEARCH ADVENTURE, WITH JEAN GONNORD FROM THE CEA.

"AN ONGOING CHALLENGE FOR SUPERCOMPUTERS"

z Now that we've reached the year 2011, would you say you have achieved your goals?
Jean Gonnord: We have just delivered the "Standard 2010" to weapons designers, i.e. the set of simulation codes for nuclear weapons that, combined with our Tera 100, now up and running, will guarantee future nuclear warheads on submarines without conducting new nuclear tests. This is a scientific first! Only the United States has dared, like France, to tackle this ambitious challenge.

z Today your vision of high-performance computing and simulation has been unanimously embraced by industry and research...
J. G.: Luckily, yes! Europe was very far behind. From 1996 to 2006, its presence in the Top 500 evolved from 28% down to less than 17%, and then back up to 25% today. It took us ten years to convince people that high-performance simulation was strategic, not only for the industrial world – in order to reduce development cycles and cut costs – but also for research – in energy, climatology, health, etc. This is now accepted throughout the world. But computing power alone is not enough. If we consider this capacity is strategic, then we need to control the technology!

JEAN GONNORD is project manager for digital simulations and computing in the military applications department of the CEA.

z However, up until 2005, Europe was completely absent on the HPC (High Performance Computing) market: less than 0.2% of these machines were designed here.
J. G.: We have a global vision of this industry, from mastering hardware and software technologies and integrating them in supercomputers to the final application. On this last point, we often felt like we were talking to a wall.

z How were you able to implement this policy?
J. G.: As an engineer, by developing a long-term strategy and intermediate phases to take advantage of feedback, and by setting the goal of developing general-purpose, reliable machines that can withstand competition, not computational behemoths designed only for the race to the top, or military machines for our own needs. The very high level of expertise within the team and the capacity of the CEA (Atomic Energy and Alternative Energies Commission) to organise big projects did the rest. Keeping our commitments in 2010 meant we needed 500 teraflops*. In 2001 we aimed for 5 teraflops and designed Tera 1 and, in 2005, we reached 50 teraflops with Tera 10. The success of Tera 10 encouraged us to double the power planned for Tera 100, which in the end has a capacity of more than 1,000 teraflops, or 1 petaflops. To achieve such computing power, a massive parallel architecture was required. This meant using 100,000 high-tech processors for Tera 100. For economic reasons, we chose mass-produced or "off the shelf" components. Fifteen years later, these turned out to be the most efficient ones too. Only in a global economy can we finance the R&D required for a new processor. Regarding software, we decided to pool development and validation skills by using open source software. Finally, for Tera 100, we implemented a highly innovative policy, co-design, which associates the manufacturer with the user-expert in architecture. Thanks to co-design, the real utility of these Formula 1 computers is ensured.

z How did you organise work with the manufacturer?
J. G.: On the basis of a contract, after a call for bids in 2008, which included a proposal to share R&D (and therefore intellectual property rights), construction of a demonstrator and an option to buy. The French manufacturer Bull, which had already built Tera 10, won the tender. More than two hundred people at the CEA and Bull worked together on Tera 100, both on the hardware architecture and the systems software. It was a huge human and organisational success, materialised in the quality of the end result. This is a general-purpose machine and not some research tool like Blue Gene or IBM's Roadrunner, in Los Alamos, New Mexico, which we beat out in the end. Finally, it is a real commercial success for the French manufacturer Bull. The architecture of Tera 100 was voted "Best architecture of the year 2009" by the American magazine HPC Wire (1). Bull has sold these computers in Europe as well as Brazil. Our British equivalent (AWE/Atomic Weapons Establishment) has purchased two 150-teraflops machines. The Genci has ordered the Curie supercomputer, with a capacity of more than 1.5 petaflops, for its Prace programme; it will be up and running this year at the TGCC (Très grand centre de calcul) on the CEA site in Bruyères-le-Châtel. It will be the first petaflopic computer available for all European researchers. The international programme for fusion energy, F4E, ordered a 1.5-petaflops computer in April that will be set up in Rokkasho, in Japan. This proves that when you have a goal, the desire to reach it and you're perfectly organised, nothing is impossible.

z More than 200 people at the CEA and Bull worked together on Tera 100. It is the third supercomputer set up at the CEA in Bruyères-le-Châtel (Essonne, France) dedicated to simulation. Below: a simulation of turbulent flows.

z How do you see the future?
J. G.: Now we need to ensure the future of this capacity, which we have demonstrated, for Europe to design and construct these huge supercomputers, which are strategic for our economy and society as a whole. In an era where Japan and China are now leading the way, in front of the United States, Europe cannot remain the only region in the world where others control a technology that is vital for its future. This is why we support the creation of an ETP, or European Technology Platform, run by European industries and backed by major research laboratories. As far as we are concerned, R&D for the next two generations of CEA/DAM machines, Tera 1000 and EXA1, has already been launched. And we will have broken the exaflops* barrier by the end of the decade.

INTERVIEW CONDUCTED BY ISABELLE BELLIN
(1) HPCwire, November 2009.

FLOPS (Floating point Operations Per Second) is a unit of measure for the performance of computers. One teraflops equals one thousand billion operations per second (10^12), one petaflops equals one million billion operations per second (10^15), and one exaflops equals one billion billion operations per second (10^18).

EFFICIENCY is the ratio between the measured and theoretical power of a computer.

DIGITAL SIMULATION HAS GRADUALLY BECOME A UNIVERSAL VECTOR FOR SCIENTIFIC AND ECONOMIC DEVELOPMENT. THE AMBITION OF THE GENCI, A PUBLIC ORGANISATION, IS TO INCREASE ACCESS TO HIGH-PERFORMANCE COMPUTING BY MAKING IT AVAILABLE TO ALL SCIENTISTS, THROUGHOUT THE COUNTRY.

ACCESS TO HIGH-PERFORMANCE COMPUTING FOR ALL!

BY LÆTITIA BAUDIN, head of communications for the GENCI (Grand équipement national de calcul intensif)

Imagine machines so powerful they could do in a day what it would take a desktop computer 150 years to accomplish. Is this science fiction? No; just science! These machines, called supercomputers, are capable of performing millions of billions of operations in a single second – hence the term high-performance computing. They help reproduce, through modelling and simulation, experiments that cannot be conducted in a lab because they are too dangerous, costly, time-consuming, complex or even inaccessible on a human scale.

Today, digital simulation has become a key approach in scientific research alongside theory and experimentation. In France, the GENCI (Grand équipement national de calcul intensif) is the public organisation in charge of implementing French policy in terms of high-performance computing in academic research. Alongside the Ministry of Higher Education and Research, the GENCI brings together the main players in high-performance computing: the CEA, CNRS, public universities and the INRIA. "Over four years, the GENCI's investments have helped multiply by more than 30 times the computing power available for the French scientific community, which is currently approximately 600 teraflops*", adds Catherine Rivière, CEO of GENCI.

Outside France, European high-performance computing is taking shape. Convinced that no country can finance and sustainably develop a world-class computing structure alone, twenty representatives of European countries, including the GENCI for France, created the PRACE (Partnership for Advanced Computing in Europe) research infrastructure on 9th June 2010 in Barcelona. Its objective? To set up and run a distributed and lasting computing infrastructure in the Old World consisting of four to six centres equipped with machines offering a computing power greater than one petaflops*.

A virtual laboratory
"The success of PRACE depends on the scientific results it can obtain, which must be recognised as among the best in the world, emphasises the British physicist Richard Kenway, President of the PRACE scientific council. This is a fundamental goal for us. Demonstrating our success is a prerequisite for enlarging the number of member countries in PRACE, who will contribute substantially to operating the research infrastructure. These new contributions will allow PRACE to make even more competitive resources available, which will promote, in turn, the production of the best scientific results... It's a virtuous circle." PRACE is off to a good start: Hungary joined the research infrastructure on 8th June, thus becoming the 21st European state in PRACE.

As early as mid-2010, European scientists had access to the Jugene supercomputer, the first component in the PRACE infrastructure, located in Jülich (Germany). Since the beginning of 2011, they have also been able to use the CURIE computer, set up in France at the TGCC (Très Grand Centre de Calcul) of the CEA. Financed by the GENCI, this supercomputer will be fully operational at the end of 2011, with a power of at least 1.8 petaflops. Starting in 2012, scientists will have access to even more machines, in Germany, Italy and Spain.

For Jérémie Bec, the first French scientist to benefit from these "European" computing hours, major scientific breakthroughs are on the horizon. "Petaflopic supercomputers open the door to a new era in research, an era of experimentation in a virtual lab", proclaims this specialist in turbulent flows based at the OCA (Côte d'Azur Observatory), near Nice, who is studying the role of turbulent fluctuations in triggering precipitation from hot clouds.

z Catherine Rivière, CEO of GENCI.
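As a rough sanity check of the opening claim, here is a short sketch of ours; the desktop speed is an assumption for illustration, not a figure from the article.

    # Rough check of the "150 years in a day" claim. The desktop speed is
    # our assumption (a typical machine of the era), not an article figure.
    petaflops = 1e15          # supercomputer: operations per second
    desktop_flops = 18e9      # assumed desktop: ~18 gigaflops
    seconds_per_day = 86_400
    seconds_per_year = 365 * seconds_per_day

    ops_in_one_day = petaflops * seconds_per_day
    years_on_desktop = ops_in_one_day / desktop_flops / seconds_per_year
    print(round(years_on_desktop))   # ~152 years, in line with the claim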

z Official inauguration of the European research infrastructure PRACE (Partnership for Advanced Computing in Europe) on 9th June 2010 in Barcelona.

z The CURIE petaflopic supercomputer during its installation at the TGCC (Très grand centre de calcul) of the CEA in Bruyères-le-Châtel, France.

More generally speaking, according to Alain Lichnewsky, scientific director of GENCI: "The increased capacities of supercomputers, installed in France under the aegis of the GENCI or within the framework of PRACE, allow innovative results such as the generalisation of ab initio models based on fundamental principles in the fields of chemistry and materials, or the collection of vital data for developing new experimental methods. As modelling has progressed, and the codes of new supercomputers have been adapted, the frontier of established knowledge is being defined by confronting state-of-the-art simulation with the nature of the problems studied." Thus scientists can tackle increasingly complex phenomena and provide practical answers to crucial economic or societal problems.

Towards a pyramid structure
In the field of climatology, for example, preserving the planet means gaining a deeper understanding of our climate: "We absolutely need massive computing power to simulate, as realistically as possible, our climate's past, our current conditions and future trends, according to different scenarios, explains Jean Jouzel, Vice President of the IPCC (Intergovernmental Panel on Climate Change). Thanks to CURIE, we will be able to envisage climate simulations with a resolution of a dozen kilometres for the entire planet and over several hundreds of years. This will also allow us to increase European participation in international exercises in climate simulation."


This is only one example. Others will be presented in this edition.

"Today, we do not have an intermediate level between national and European supercomputers and laboratory machines, underlines Olivier Pironneau, President of the CSCI (Comité stratégique du calcul intensif/Strategic Committee for Intensive Computing), in charge of supervising the coherency of French HPC initiatives. Developing university facilities is a top priority; in fact, that is the aim of the Equip@meso project, the Excellence facility for intensive computing of coordinated meso-centres, which is coordinated by GENCI and involves ten different regional partners." Selected during an "Excellence facilities" call for projects conducted under the aegis of the French General Commission for Investment (Commissariat général à l'investissement), Equip@meso benefits from €10.5 million in funds to reinforce computing means at the regional level, in order to support national facilities. "Therefore, we are going to speed up construction of an HPC pyramid around three geographical strata: European HPC facilities, resources in national computing centres and other means coordinated at the regional level", asserts Catherine Rivière.

Waiting for Exascale
Coordinating with universities should also allow the deployment of a concerted and complete training offer for specialists mastering high-performance computing and digital simulation, such as the Master in Modelling and Simulation set up by the CEA, Centrale Paris, the École polytechnique and UVSQ (Université de Versailles Saint-Quentin-en-Yvelines). "We do indeed need more young scientists trained in the latest computer technologies, who are capable of understanding, developing and maintaining the required software", reckons Richard Kenway.

Much remains to be done, especially since the next challenge will be the transition to Exascale in around 2018: "This is much more than a theoretical change, asserts Olivier Pironneau. To successfully make this transition, a huge amount of work needs to be done: improved communication between chips, development of software adapted to the number of processors required to perform a billion billion operations per second, and resistance to failures, which should benefit from virtualisation."

In France this decisive challenge is being tackled via the ECR Lab (Exascale Computing Research Laboratory), created in partnership by the CEA, GENCI, Intel and UVSQ. Home to around twenty researchers, the ECR Lab is preparing and developing hardware and software architectures (scientific codes and programming tools) that help support exaflopic performance. "The contribution of GENCI consists, notably, in preparing the French scientific community for the advent of Exascale", explains Stéphane Requena, technical manager at GENCI.

Digital simulations and SMEs
Yet the field of high-performance computing and digital simulation is not limited to academic research: "These are also strategic tools from an economic standpoint, adds Catherine Rivière. They are an essential component in industrial productivity, by considerably reducing the time spent on the design and commercialisation phases of a product or service and by significantly contributing to innovation and to the optimisation of production and maintenance cycles."

While large corporations or financial firms – like TOTAL, EDF, Airbus or BNP Paribas – have integrated digital simulation and high-performance computing in their development plans, SMEs need to be convinced, mainly because they do not always master the technological, financial and human stakes. Hence the HPC-PME initiative, supported by GENCI, INRIA and OSEO. Constructed in keeping with French government recommendations (France Numérique 2012), this initiative was launched just over a year ago. "Our aim is to help SMEs assess the relevance of using digital simulations in terms of their business model, by calling on players in HPC who can guide them in this process." By the summer of 2011, no fewer than 15 SMEs had expressed their interest in benefiting from this support. Operating in different sectors (automobile, aeronautics, digital media, shipbuilding, microelectronics, signal processing, etc.), they are located throughout the country.

At a time when Asia is winning the supercomputer race, it is crucial to support national scientific and economic development. Digital simulation is just one tool. Now more than ever, we need to increase access to high-performance computing!

z From top to bottom: British physicist Richard Kenway, President of the PRACE scientific council, and Olivier Pironneau, President of the CSCI (Comité Stratégique du Calcul Intensif/Strategic Committee for Intensive Computing).

THE INRIA (NATIONAL INSTITUTE FOR RESEARCH IN COMPUTER SCIENCE AND CONTROL) HAS LAUNCHED A "LARGE-SCALE INITIATIVE" IN ORDER TO OPTIMISE THE USE OF SUPERCOMPUTERS THROUGH DIGITAL SIMULATIONS. WE CAN ALREADY SAY THAT FRANCE IS UP AND RUNNING IN THE RACE TO EXASCALE.

INRIA IS LEADING THE WAY IN HPC

BY STÉPHANE LANTÉRI, head of the NACHOS research team, dedicated to numerical modelling and high-performance computing, AND JEAN ROMAN, head of the HiePACS research team, dedicated to high-end parallel algorithms for challenging numerical simulations

How can we tackle the major issues involved in programming massive parallel petaflopic and exaflopic computer architectures? How can we imagine their deployment in order to understand the complex scientific problems and technologies that are of interest for our society today? To answer these questions, INRIA has launched a "large-scale initiative" that brings together teams from its different sites around the theme of "High-performance computing for computational sciences". The objective: to set up a skills continuum with the aim of using supercomputer processing and storage capacities more efficiently to implement highly complex, large-scale digital simulations.

"The next challenge for the scientific community regarding digital simulation will be the transition to exascale, which means the performance of the best supercomputer will be multiplied by 1,000. INRIA has taken part, since 2009, in preparing the run-up to exascale."

Massively parallel
The initiative will be dictated by applicative fields that represent key "challenges" for the computing and mathematical methodologies studied. All these research activities – whether they concern methodology or applications – will focus on the same goal: achieving the best performance with the possibilities offered by current and future technologies.

This partnership within INRIA will be completed by the participation of other organisations and industries, such as, in an initial phase, the ANDRA (French National Radioactive Waste Management Agency), BRGM (Geosciences), CEA, EDF R&D or Dassault Aviation. Their participation will include the definition of application challenges in various fields such as the environment (seismic risks, CO2 capture, radionuclide transport), nuclear fusion in the framework of the Iter project (plasma dynamics) or aeronautics (aerodynamic design). Each of these application challenges will lead to the implementation of very large-scale digital simulations, in terms of problem size and the volume of data involved, on massive parallel computing configurations with several thousand to several hundreds of thousands of cores.

INRIA has also set up joint research laboratories with key institutions in the field. The first of such laboratories opened two years ago, in partnership with the University of Illinois at Urbana-Champaign (USA) and the National Center for Supercomputing Applications at the same university. Scientific objectives were defined around four themes: parallel programming languages and environments, systems software, algorithms and digital libraries, and failure tolerance. Another example is the joint national research laboratory INRIA has opened with the European Centre for Research and Advanced Training in Scientific Computation (CERFACS) in Toulouse. This laboratory is focused on the design and creation of precise digital tools using large numbers of processor cores for complex applications in materials physics, fluid mechanics and climatology.

Defining a road map
This involves drawing up an inventory of all the applications required for such a machine, imagining the architectures that could be built in 2018 with the help of manufacturers and, finally, estimating the R&D required to create viable software as soon as the first such supercomputer is available. The IESP (International Exascale Software Project) has drawn up an initial road map. The EESI (European Exascale Software Initiative) is helping the IESP by establishing a European version. INRIA is a key player in both of these projects.

Finally, we need to examine how to adapt applications so they can work on an extreme scale. INRIA is a founder and member of the G8 ECS (Enabling Climate Simulation at Extreme Scale) project, which brings together the best teams of six countries to study new algorithms and software technologies in order to achieve the highest level of performance from future exascale supercomputers in the field of climatology.

INSTALLED IN JULY 2010, THE TERA 100 SUPERCOMPUTER IS NOW RANKED AMONG THE MOST EFFICIENT ON THE PLANET. IT IS THE FIRST SUPERCOMPUTER IN EUROPE TO BREAK THE PETAFLOPS BARRIER. ITS GREATEST ADVANTAGE: IMPROVED ENERGY EFFICIENCY, UP TO 7 TIMES GREATER THAN ITS PREDECESSOR TERA 10, WHICH WAS 20 TIMES LESS POWERFUL!

TERA 100: A LEADER IN EFFICIENCY

BY PIERRE LECA, head of the simulation and information sciences department of the military applications division of the CEA, AND SOPHIE HOUSSIAUX, Tera 100 project manager at Bull

The product of more than a dozen years of work on digital simulation and the architecture of major computing and data management systems, Tera 100 also benefited from experience gained since the beginning of the years 2000 and the successful installation and operation of its predecessors: Tera 1 in 2001 and Tera 10 in 2005. Thanks to these skills and knowledge, in terms of needs analysis and computer architecture, as well as changes in hardware and software technologies, we have succeeded in defining the features of this computational behemoth. Tera 100 was designed to be efficient no matter which digital methods or algorithms are used.

z No sooner had Tera 100 been completely installed than the Bull-CEA team started thinking about its successor... which will be 1,000 times more powerful!

The end result of two years of joint R&D by the engineers of the military applications division of the CEA (Atomic Energy and Alternative Energies Commission) and the French computer company Bull, the Tera 100 supercomputer is the first computer designed and built in Europe to break the symbolic petaflops* barrier. A measured performance (Rmax) of 1.05 petaflops placed it 6th, in 2010, among the 500 most powerful computers worldwide. It is currently ranked n°9 on the June 2011 list. With a remarkable efficiency* of 83.7% during the benchmark test for this ranking, it is certainly one of the most general-purpose supercomputers among the top ten worldwide.

z The machine's interconnection network has a highly complex system of cables, enabling communications between its 4,370 nodes.

Design choices took into consideration technological evolutions in microprocessors. Their frequency – or speed – is no longer increasing; instead it is the number of cores, or basic processing units, that is being multiplied within each microprocessor. Those selected to build Tera 100, the Intel X7560 – also known as the Nehalem-EX – have eight cores. Four microprocessors are assembled to form a multi-processor (the bullx 3060 server), the foundation of the supercomputer, called a "node" in computer speak. Thus, at the heart of each node, 32 cores share 64 gigabytes of memory. To achieve its computing power, Tera 100 interconnects 4,370 nodes, or 17,480 microprocessors and nearly 140,000 cores.

Overall, the architecture of Tera 100 is a lot like a series of Russian dolls. First, there is a microprocessor, nested inside a node (4 microprocessors), inside a cabinet (24 nodes), within an archipelago (around 20 cabinets), which all make up the supercomputer (10 archipelagos). The interconnection network was designed according to an unusual architecture, a topology in archipelagos, linking clusters of nodes and offering easy access to data storage media. It has two levels, intra-archipelago and inter-archipelago, which ensures data is communicated between any two nodes in the computer. The maximum aggregated throughput of the network can reach 13 terabytes (13 thousand billion bytes) per second.

Energy consumption under control
Controlling energy consumption, a major problem for this type of megafacility, was one of the main preoccupations of the Bull-CEA team. Among its main innovations, a water-based cooling system, installed in the cabinet doors, has allowed the facility to remain compact: the supercomputer only occupies 650 m² of floor space. This cooling system, located closer to where heat originates, improves the energy efficiency of the computing facility. Furthermore, consumption is modulated according to workloads, which involves lowering the core frequencies when they are not using their full power. During normal use, this should limit the electrical power required to 3 megawatts. Twenty times more powerful than its predecessor Tera 10, Tera 100 improves energy efficiency by 7 fold.

Built with open source software and widely available processors, it has paved the way for a new competitive range of commercial supercomputers. Indeed, the purpose was not to build a single object, but to integrate this project in an industrial model. The widely used architecture of X86 microprocessors was selected for the project; the supercomputer therefore easily integrates the software environment of standard workstations. In order to benefit from the most advanced software, developments were conducted with the international community, including specialists from the US Department of Energy and industrial and academic researchers, notably on Lustre, the open source file management system. This "distributed" system can share data between hundreds of nodes. The software environment takes into consideration the specific features of the nodes and the special topology of the interconnection network. Of course, this is the case for the communication library, but also for the resource management software, which organises calculations according to their profile. Finally, an administration and supervision programme checks the status of the different components (memories, processors, network...) to prevent failures during computation. Maintaining the computer in operational condition is critical during its entire lifecycle.

Now that the petaflops barrier has been broken, the Bull-CEA team is working on the Holy Grail of supercomputing: exaflops*, or a power 1,000 times greater than Tera 100.

A FRENCH SUPERCOMPUTER RANKED 9TH
Each year, in June and November, a ranking of the 500 most powerful supercomputers worldwide is established on the basis of a benchmark system of equations called LINPACK. In the most recent ranking, in June 2011, all of the Top 10 computers – compared to 7 in November 2010 – had broken the petaflops barrier. This speed record was first achieved in 2008 by IBM's Roadrunner, currently ranked 10th. China continues its dizzying ascension, with two machines in the latest Top 10, but with low efficiency. The United States remains the unchallenged leader in the number of facilities, but Japan is again ranked N°1 – and by a long shot – with the K Computer by Fujitsu, running at 8.16 petaflops. With its 9th place worldwide, Tera 100 ranks second in terms of efficiency (83.7%), in other words, for its reliability compared to its theoretical performance. Tera 100 is the most powerful supercomputer in Europe, as was its predecessor, Tera 10, in June 2006.

THE TOP 10 SUPERCOMPUTERS (June 2011)
WORLDWIDE RANK | NAME | FIRM | SITE – COUNTRY | POWER (in petaflops) | EFFICIENCY (%)
1 | K Computer | Fujitsu | RIKEN Advanced Institute for Computational Science – Japan | 8.16 | 93.0
2 | Tianhe-1A | NUDT MPP | National Supercomputing Center, Tianjin – China | 2.57 | 54.6
3 | Jaguar | Cray XT | DOE/Oak Ridge National Laboratory, Tennessee – United States | 1.75 | 75.5
4 | Nebulae | Dawning | National Supercomputing Center, Shenzhen – China | 1.27 | 42.6
5 | Tsubame-2.0 | HP 3000SL | Tokyo Institute of Technology – Japan | 1.19 | 52.1
6 | Cielo | Cray XE | DOE/Los Alamos National Laboratory, New Mexico – United States | 1.11 | 81.2
7 | Pleiades | SGI | NASA/Ames Research Center/NAS – United States | 1.09 | 82.7
8 | Hopper | Cray XE | DOE/National Energy Research Scientific Computing Center, California – United States | 1.05 | 81.8
9 | Tera 100 | Bull bullx | Atomic Energy and Alternative Energies Commission (CEA) – France | 1.05 | 83.7
10 | Roadrunner | IBM | DOE/Los Alamos National Laboratory, New Mexico – United States | 1.04 | 75.7
18 15 ) ). FIRM than Tera100. or apower 1,000timesgreater of supercomputing: exaflops*, is now working ontheHoly Grail been broken,theBull-CEA team hasthat thepetaflopsbarrier itsentireduring lifecycle. Now operational conditioniscritical Maintaining thecomputerin failures computation. during cessors, network...) toprevent components (memories, pro- checks thestatusofdifferent programmeand supervision lations according totheirprofile. software, whichorganises calcu- for theresource management communication library, butalso Of course, thisisthecasefor of theinterconnection network. nodes andthespecialtopology deration thespecificfeatures of environment takesintoconsi- hundreds ofnodes. The software system canshare databetween ment system. This “distributed” the opensource filemanage- Finally, anadministration TH (in petaflops) WORLDWIDE POWER z EFFICIENCY (%) N° 457 . 13 NEW HORIZONS THE DEVELOPMENT OF EXAFLOPIC COMPUTERS, WITH A COMPUTING POWER A THOUSAND TIMES GREATER THAN CURRENT SUPERCOMPUTERS, WILL DEPEND ON MAJOR TECHNOLOGICAL BREAKTHROUGHS IN TERMS OF BOTH HARDWARE 14 AND SOFTWARE. TRI-GATE 3D TRANSISTORS IN THE RACE TO EXASCALE NEW HORIZONS

n the beginning of May, current generation of planar should allow Moore’s Law, and Intel announced a major 2D transistors. This additional the historic pace of innovation, I innovation in transistors, control means there is as much to continue.” a microscopic component current flowing as possible when Tri-Gate transistors allow that is the basis of modern elec- the transistor is in the “on” state, chips to function at lower volt- tronics. For the first time since increasing performance. Alter- BY ages and with less leaking. The MARC their invention more than fifty natively, when the transistor is DOLLFUS result is an unusual combina- years ago, transistor design is in the “off” state, the flow is close head of public tion of improved performance about to evolve, taking on a to zero, minimizing energy con- sector HPC and and greater energy efficiency three-dimensional structure. sumption. It also allows the tran- research at Intel. compared to the previous gen- These revolutionary new 3D sistor to switch very quickly from eration of transistors, even the transistors, called Tri-Gate, will one state to another, once again most modern ones. The new be integrated in a microproces- increasing performance. 22 nm Tri-Gate 3D transistor sor for the first time (code name: improves performance by 37% Ivy Bridge) using a 22-nanome- Continuing Moore’s Law compared to Intel’s 32 nm pla- tre etching process. Just as a skyscraper allows urban nar transistors. Moreover, these Until now, and for decades, developers to optimise the avail- new transistors use less than all transistors have used a planar able space by building a vertical 2D structure that can be found structure, the Tri-Gate 3D tran- in all computers, mobile phones sistor is a means of managing and consumer electronics, but density. As the fins are vertical, also in onboard control systems transistors can be arranged more in vehicles, avionics, household densely side by side, which is es- appliances, medical equipment sential in order to exploit the and literally thousands of other technological and economic devices that we use every day, advantages of Moore’s Law. For hence the importance of this future generations of transistors, innovation. designers will be able to length- Scientists recognised the en these fins to achieve even interest of a 3D structure a long better performance and energy time ago in improving proces- efficiency. sor characteristics. Today, the Fore more than forty years, microscopic size of transistors the economic model of the makes their design even more semiconductor industry has difficult, as they are subject to been dictated by Moore’s Law, the physical laws of the infinite- named after Gordon Moore, ly small. Therefore, this is a real co-founder of Intel. It describes technological feat, regarding the a long-term trend according to design of the processor in and of which the number of transistors itself, as well as the fact it can be that can be placed inexpensively mass produced. on an integrated circuit doubles The Tri-Gate 3D marks a new approximately every two years, era in transistors. The 2D planar exponentially improving func- (or “flat”) power-conducting tionality, performance and costs. channel is replaced by a thin 3D Commenting on the advent of fin that rises vertically from the a 3D structure, Gordon Moore silicon of the transistor. Current observes that “for years we have control is then gated on each of seen limits to how small transis- the three sides of the 3D tran- tors can get. 
This change in the sistor, rather than just on the basic structure is a truly revolu- top side, as is the case for the tionary approach, and one that

An unexpected leap forward
For Mark Bohr, Intel Senior Fellow, who contributed a great deal to these innovations, the gains in performance and energy savings offered by Tri-Gate 3D transistors are unlike anything that has been done before. This phase is much more than a simple confirmation of Moore's Law. The advantages in terms of voltage and energy consumption are much greater than those generally achieved in a single generation of etching techniques. They will offer designers the flexibility they need to make existing devices more intelligent, and will also allow them to design completely new products.

The Tri-Gate 3D transistor will be deployed during the shift to a new manufacturing process: 22 nm etching. Intel Core processors with Ivy Bridge chipsets will be the first to be mass produced, at the end of 2011.

z Up until now, transistors had a two-dimensional structure (left: the 90 nm transistor, presented in 2002). The 3D Tri-Gate transistor (pictured above) has a vertical fin that rises from the silicon of the transistor.

MAJOR CHALLENGES

HOW CAN WE DETERMINE THE EFFECTIVENESS OF A THERAPEUTIC MOLECULE BEFORE LAUNCHING CLINICAL TRIALS? ONE SOLUTION COULD LIE IN THE USE OF SUPERCOMPUTERS. IN PREPARING TESTS, SIMULATIONS SHOULD BE ABLE TO ORIENT RESEARCH TOWARDS NEW DRUGS.

MODELLING MOLECULES FOR MORE EFFECTIVE TREATMENTS

There are thousands of molecules that could potentially be used for therapeutic purposes. This is the problem facing pharmaceutical laboratories every day. It is impossible to organise clinical trials for each one. How can we predict which biomolecules have the best chances of success? One solution, which was once considered only a dream, consists in digital simulations. A colossal machine, a supercomputer, could become a vital tool in understanding treatments at the molecular level.

"Thanks to the computing power already available, it is now possible to simulate the behaviour of biomacromolecules – proteins, nucleic acids, polysaccharides, etc. – in their natural environment and understand their interactions and functional roles inside the cell", explains Richard Lavery, researcher at the CNRS (National Centre for Scientific Research) BMSSI laboratory (Structural and Molecular Basis of Infectious Systems) at the University of Lyon 1. "By replacing atomic models with simplified representations, we can even build a model of a virus – containing the equivalent of 15 million atoms – and observe its structure evolve over time."

Simulation can be used to probe what happens inside a living cell. Practically speaking, this means modelling systems with tens of millions of atoms, predicting the forces established between them – interactions – and estimating precisely how they will behave inside the human body. The stronger an interaction is, the more likely the molecule under consideration will be an effective therapeutic or diagnostic agent. "The problem is that simulation forces us to adopt models of macromolecules that comply with traditional Newtonian mechanics rather than the quantum mechanics of Schrödinger, which is better suited to the molecular world", underlines Richard Lavery.

Indeed, in simple terms, a chemical link between two atoms corresponds to an exchange of electrons, a phenomenon that is essentially quantic. However, this is a very difficult equation to solve, as there are thousands, even millions, of electrons that need to be taken into account. And while atomic interactions are important, it is also important to bear in mind that atoms move, that atomic structures fold, etc. These complex movements over time represent the dynamic part of the molecule. "Biomacromolecules have flexible structures and their functions involve movements over several time scales, from a femtosecond* to a second", adds Richard Lavery. Thus, many calculations are required to represent their behaviour with sufficient precision.

The problem of scalability
It is essentially for this reason that the power of supercomputers is vital, because it allows these calculations to be performed more and more rapidly. "At the start of the years 2000, predicting a single interaction between a therapeutic molecule and a biomolecule required three months of calculations!" recalls Michel Masella, researcher at the living chemistry laboratory of the CEA (Atomic Energy and Alternative Energies Commission). "At the time, some people thought digital simulations were amusing and that it would be quicker to conduct experiments and wait three months for the results. Today, our goal is to produce 100 to 1,000 predictions of interactions a day, which will be possible thanks to the resources offered by the Curie supercomputer, hosted by the CEA, and which will soon have 80,000 processors." This is a giant step forward! A step forward from a technical standpoint, because more efficient machines were needed, but also in terms of physics and algorithms, because the methods researchers are currently using have been significantly improved and optimised. Certain CEA research units continue, in fact, to work on improving the quantitative efficiency of algorithmic codes, i.e. increasing the number of operations performed per second, or flops.

FEMTOSECOND: A nanosecond (1 ns) equals 10^-9 seconds. A picosecond (1 ps) equals 10^-12 seconds. A femtosecond (1 fs) equals 10^-15 seconds.

SOLVATION: A physiochemical phenomenon observed when a chemical compound is dissolved in a solvent and the atoms, ions or molecules of the chemical species are dispersed in the solution by interacting with molecules of the solvent.


However, these quantum methods "will not make the scale" to exascale. In computer jargon, we talk about "non-scalability", meaning that it is impossible to use these calculations to develop the full potential of exascale computers, which will have millions, even billions, of processors. Today, the method of choice in quantum computing of living molecules is one that emerged in the 90s: DFT, or Density Functional Theory, involving electronic structures. While this approach can only be used to approximately solve Schrödinger's equation describing the quantum "nature" of electrons, it is considered a good compromise in terms of speed and chemical precision. Unfortunately, due to its poor "scalability", taking advantage of platforms with more than a few thousand processors would be very difficult.

Random trajectories
This is why research is focusing on the development of new approaches intrinsically adapted to computers with an arbitrary number of processors. In Toulouse, in the chemistry and quantum physics laboratory, Michel Caffarel is leading a research team dedicated to the development of an alternative method for solving Schrödinger's equation, as precise as DFT but not restricted by the number of processors: a quantum Monte-Carlo method offering so-called "ideal scalability". The term refers to the famous casino and is explained by the introduction of random electronic trajectories, constructed by a computer picking random series of numbers, like a roulette wheel. "The advantage, with the probabilistic dynamics of electrons, is that it is parallelisable. We can perform parallel calculations without knowing what is happening on each processor, explains Michel Caffarel. Why? Simply because the simulation can be broken down into a set of independent electronic trajectories. This unique property allows us to use an arbitrary number of processors, because they do not communicate with each other during the entire simulation."

z This model of a flu virus obtained through digital simulation will eventually allow the study of interactions between the main proteins (in orange, ivory and red) and therapeutic molecules. (D. Parton, Oxford University, et al.)

In practice, each processor calculates a single trajectory. Then the mean of the results obtained is used to reconstitute the final result, which should correspond to a single electronic trajectory obtained by juxtaposing the individual ones. Last spring this method was tested successfully on the Curie machine. Working with Anthony Scemama, a young researcher at the CNRS, a simulation was conducted on a set of 10,000 processors of the machine with perfect scalability. The simulation was therefore performed 10,000 times faster than it would have taken on a single processor. "I've been developing these methods for twenty years now and I think we have reached a turning point. We will be able to surpass current methods in just a few years!" exults Michel Caffarel. And the validation phase is about to start.
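The scheme Michel Caffarel describes is what programmers call embarrassingly parallel. The following toy sketch (ours, not the team's quantum Monte-Carlo code) shows its shape: each worker computes an independent random trajectory, and the only communication is the final averaging step.

    # Toy sketch of the "independent trajectories, average at the end"
    # pattern behind the method's ideal scalability.
    import random
    from multiprocessing import Pool

    def trajectory_estimate(seed: int) -> float:
        """One independent random trajectory; a stand-in observable."""
        rng = random.Random(seed)          # private random stream per worker
        walker, total = 0.0, 0.0
        for _ in range(10_000):
            walker += rng.gauss(0.0, 0.1)  # random move, roulette-wheel style
            total += walker * walker       # accumulate the observable
        return total / 10_000

    if __name__ == "__main__":
        with Pool() as pool:               # one trajectory per processor
            results = pool.map(trajectory_estimate, range(8))
        print(sum(results) / len(results)) # the single communication step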


"In October we will study the interaction of basic molecules in the chemistry of Alzheimer's disease – the aggregation of amyloid peptides – using 80,000 processors on the Curie supercomputer, and over the long term we can imagine contributing to the understanding of certain aspects of neurodegenerative diseases." What a challenge! Above all, this offers new hope in developing research on pathologies for which clinical trials are very difficult to organise.

In the more distant future lies the possibility of more targeted medical research. "We are going to start integrating precise digital simulations upstream in all our experimental programmes, which is what we did before with very simplified models, in order to shift to a more efficient predictive phase", forecasts Michel Masella. Cooperation between theory and experimentation is starting to take shape. This should allow researchers to predict highly probable paths to success.

It's all about interactions
"Yet we need to remain realistic, warns Michel Masella. In biology, chemical reactions sometimes depend on tiny things." While this approach is highly promising, we must continue to optimise "pre-selected" molecules. This is why models need to be refined as much as possible. "Little by little we are correcting known defects. For example, we are currently working on long-range electrostatic interactions. The existing models only analyse interactions at relatively short distances, which doesn't adequately describe the solvation* of charged molecules. However, this is an important parameter in understanding the reaction of a drug dissolved in a solvent. Now the goal is to adapt the model to charged interactions."

Furthermore, if we want to understand more clearly what happens inside a living cell, we must not forget that in addition to the therapeutic interactions of the drug with a cellular protein, proteins also interact with each other. And, to make matters even more complicated, genetics means that each individual displays variations in these proteins. While producing individualised simulations of therapeutic interactions seems like science fiction today, Michel Masella does not completely dismiss the possibility. "When we are capable of identifying mutations, we will be able to envisage what we refer to as personalisation. But that will take another twenty years!"

MORGANE KERGOAT

THE NEED FOR GREATER PRECISION
Unlike aeronautics or climatology, which have relied on digital models for a long time now, biochemistry is having difficulty integrating them into its culture. In the first two fields, there is no need for highly-evolved models to make predictions based on simulations, while in biology, during a test phase, a result must corroborate the entire experiment so the method can be confirmed. If this is not the case and a difference is observed, the results are not corrected; instead it is the entire theoretical model that must be reviewed. This requires calculations with an extreme level of precision and constantly adapted codes. This culture of excellence comes with a trade-off: falling behind other scientific fields that rely more heavily on digital simulations.

z The movement of electrons (white and grey dots) of this beta-amyloid peptide involved in Alzheimer's disease was simulated with the quantum Monte-Carlo method. During each step of the simulation, the colour of the electrons is modified. (A. Scemama, CEA; M. Caffarel, CNRS)



WHEN A SUBMARINE EARTHQUAKE CAUSES A TSUNAMI, THE TIME REQUIRED TO CALCULATE THE HEIGHT OF THE WAVES EXPECTED ON THE COAST PROHIBITS ANY REAL TIME USE. THANKS TO SUPERCOMPUTERS, THE EFFECTS OF TSUNAMIS ON COASTLINES COULD BE PREDICTED IN FIFTEEN MINUTES.

USING SUPERCOMPUTERS TO IMPROVE TSUNAMI WARNING SYSTEMS

As we observed last March in Japan, tsunamis can devastate entire coastal areas and cause considerable damage. These particular waves translate into successive flooding of the coast – every 20 to 40 minutes – alternating with marked withdrawals of the sea. Tsunamis are generated by strong earthquakes, generally with a magnitude greater than 7.5 on the Richter scale, that occur in subduction zones*. In the Pacific Ocean these phenomena are more frequent, due to the intense tectonic activity in that part of the world.

BY ANTHONY JAMELOT AND DOMINIQUE REYMOND of the Pamatai Geophysics Laboratory in Tahiti, AND SÉBASTIEN ALLGEYER, FRANÇOIS SCHINDELÉ AND HÉLÈNE HÉBERT of the Analysis, Surveillance and Environment Department of the CEA Military Applications Division.

During the 1960s, after five catastrophic tsunamis in the Pacific, the French Polynesia Warning Centre (CCPT - Centre Polynésien de Prévention des Tsunamis) was created at the CEA Geophysics Laboratory based in Tahiti. Its mission is to ensure constant surveillance of overall seismic activity in the Pacific Ocean in order to alert the Polynesian authorities. The closest region where earthquakes could occur is the Kermadec-Tonga subduction zone. A tsunami triggered in this zone would reach the Polynesian coasts in no more than three hours. When an earthquake is detected, researchers analyse its location and magnitude and then estimate if it could form a tsunami. Traditionally, the estimation of the size of the expected tsunami was performed using an empirical law based on previously observed events in Polynesia. This is how the tsunami alert was established for Chile in February 2010. Another method, which the CCPT started to develop in 2009, was used in an operational manner for the Japanese tsunami of 2011. It is based on data from 260 pre-calculated tsunami scenarios, whose fictional sources are spread across the main Pacific subduction* zones. This new method reduces uncertainty regarding the height of the expected waves and does so very quickly after the earthquake, but it does not offer a detailed map of expected heights along the coastline.

A triple model
While more precise, a complete simulation of a tsunami with detailed information on the coastline required a computation time that prohibited any real-time use. However, the parallelisation of code used in high-performance computing offers new possibilities. Digital simulations of tsunamis are based on modelling three phenomena: deformations in the ocean floor caused by an earthquake, displacement of the overlying water, and coastal effects. The initial deformation of the ocean floor is calculated using elastic models of the Earth's crustal deformation. Propagation, on the other hand, involves solving a set of non-linear equations used in fluid mechanics for "long waves": the length of tsunami waves (100 to 300 km) is much greater than the depth of the propagation zone (4 to 6 km). Finally, simulating the effects of a tsunami as it approaches the coastline is only possible if high-resolution bathymetric (sea depth) and topographical data are available. The precision of representations of physical processes such as flooding, whirlpools or amplification through resonance in a harbour depends on the resolution of the computing grids. With the model we use, we can access a bathymetric description with a spatial resolution of up to 5 km for the entire Pacific Ocean and up to 15 metres for harbours and inlets.
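The "long wave" regime described above – wavelengths of 100 to 300 km over depths of 4 to 6 km – has a convenient consequence: the propagation speed depends only on the water depth, c = √(g·d). A back-of-the-envelope check in Python, using the distances quoted in this article:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def travel_time_hours(distance_km, depth_m):
    """Long-wave (shallow-water) propagation: c = sqrt(g * d)."""
    speed = math.sqrt(G * depth_m)          # phase speed in m/s
    return distance_km * 1000.0 / speed / 3600.0

# ~10,000 km from Chile to French Polynesia, 4-6 km ocean depth:
for depth in (4000, 5000, 6000):
    print(f"{depth} m deep -> {travel_time_hours(10_000, depth):.1f} h")
# Consistent with the roughly fifteen hours of propagation reported here.
```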

z The height of a tsunami wave as it hits the coast (photomontage opposite) is only measured today after the fact. Faster calculations could be used to anticipate waves and evacuate populations. (STOCKLIB © CHRISTOPHE FOUQUIN)

z Figure 1. Maximum heights after fifteen hours of tsunami propagation. Comparison of records from two DART® (Deep-ocean Assessment and Reporting of Tsunamis) tsunami monitoring systems and simulations.

We have high-definition bathymetric and topographic data for nineteen Polynesian bays. Parallelisation of the code and the use of the CCRT (Centre de calcul recherche et technologie - Research and Technology Computing Centre) at the CEA's Ile-de-France site have been essential in multiplying the number of studies and reducing uncertainty.

However, what are these techniques worth when compared to actual experience? We were able to answer this question after the Chilean tsunami of February 27th 2010. At 6:34 GMT a major earthquake (magnitude 8.8) occurred off the coast of Chile. As expected for this type of quake, the tsunami caused significant damage off Chile and then propagated across the Pacific. In French Polynesia, the Marquesas Islands were affected the most. Indeed, gentle submarine slopes and large, open bays with no barrier reefs offer favourable conditions for the amplification of tsunamis. Since the middle of the 20th century, more than fifteen tsunamis have been observed in the bays of these islands. Considering the magnitude of the Chilean earthquake, the empirical law used for this alert indicated wave heights of up to 3 metres for the Marquesas and 2 metres for Tahiti.

From 36 hours to 15 minutes
The alert level rapidly rose to "red", which implies the evacuation of coastal areas and ports. The maximum water level (from "crest to trough") measured on tide gauges in ports reached more than 3 metres in Hiva Oa and Nuku Hiva, compared to approximately 35 cm in the port of Papeete. The heights observed on the coasts of the Marquesas (excluding tide gauges) range up to 3 metres above low-tide level, but the tsunami hit the coast at low tide. The alert worked well and there were no victims, only property damage – a boat whose owner refused to evacuate.

The digital simulation of the event was conducted a posteriori on two hundred CCRT processors, for fifteen bays in the Marquesas Islands. The results were obtained after only 15 minutes of computation, where this would have taken 36 hours with a single processor. This allowed us to establish the distribution of maximum heights off the coast after fifteen hours of propagation – or after covering a distance of 10,000 km. We could observe that French Polynesia was in the main energy axis (Figure 1).

Estimating water heights
Furthermore, digital simulation of the propagated tsunami on increasingly smaller grids around the coast (10 to 15 metres of resolution) has allowed us to estimate maximal water heights as well as horizontal speed fields describing the currents at a given moment for the fifteen bays. Conclusion: the synthetic tidal curves are comparable to real readings from harbour tide gauges. Similarly, the comparison is consistent with water heights reported in eyewitness accounts or photos, for instance in Tahauku Bay on Hiva Oa Island. A whirlpool photographed in Hakahau Bay (Ua Pou Island) after approximately 11 hrs and 45 min of propagation was reproduced at the same moment.

This complete simulation of a tsunami, from its source to the coastline, shows how the precision of these results constitutes a vital tool for warning populations, in addition to existing methods. Indeed, knowing the level of flooding of coastal areas in advance, with minimal uncertainty, would revolutionise the management of alerts. This is not possible today, but we can imagine that in the near future we will be able to obtain predictive results from these simulations for all of French Polynesia in less than an hour. This implies, of course, that adequate computing means are dedicated to the warning system. z

SUBDUCTION: A geological process by which one tectonic plate is forced under another and sinks into the Earth's mantle.
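Operationally, the pre-calculated scenario method described at the start of this article boils down to matching the detected earthquake against the nearest fictional source in the database and reading off the stored results. The sketch below is hypothetical throughout – the scenario entries and the distance weighting are invented – and only illustrates the lookup logic, not the CCPT's actual procedure:

```python
import math

# Invented stand-in for the base of 260 pre-calculated scenarios:
# (latitude, longitude, magnitude) -> expected offshore height (m).
SCENARIOS = {
    (-36.0, -73.0, 8.8): 3.0,    # Chile-type source
    (-30.0, -178.0, 8.2): 1.5,   # Kermadec-Tonga-type source
    (38.0, 142.0, 9.0): 2.4,     # Japan-type source
}

def nearest_scenario(lat, lon, mag):
    """Pick the stored scenario closest to the detected event,
    crudely weighting location against magnitude."""
    def score(key):
        s_lat, s_lon, s_mag = key
        return math.hypot(s_lat - lat, s_lon - lon) + 50.0 * abs(s_mag - mag)
    return min(SCENARIOS, key=score)

best = nearest_scenario(-35.5, -72.5, 8.8)   # detected epicentre, magnitude
print("closest scenario:", best, "-> expected height:", SCENARIOS[best], "m")
```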

TODAY WE CANNOT IMAGINE THE NUCLEAR POWER INDUSTRY WITHOUT SIMULATIONS. THEY ARE USED TO OPTIMISE THE ENTIRE FUEL CYCLE, INCLUDING MANAGEMENT OF NUCLEAR WASTE. THREE-DIMENSIONAL MODELLING HELPS ENGINEERS IN THEIR WORK… AND KEEPS REACTORS SAFE.

FUTURE NUCLEAR REACTORS ALREADY BENEFIT FROM HPC

Whether it comes to safety, extending the life cycle of reactors or optimising waste management, simulation plays an important and increasingly prevalent role in the nuclear power industry. Engineering teams as well as R&D are using it more and more to calculate the normal behaviour of systems, of course, but also to imagine fields of operation beyond what experiments can measure. This is a major point in nuclear safety. Like any industrial sector, this approach is based on a modelling trifecta: physical phenomena, digital simulation and experimental validation.

BY CHRISTOPHE CALVIN, head of the laboratory in the reactor engineering and applied mathematics department of the nuclear energy division of the CEA, expert in high-performance computing applied to reactor physics.

In the field of reactor physics and the fuel cycle, several physical phenomena are the object of calculations. The kinetics and distribution of neutrons in the core determine the control of the chain reaction and of the nuclear fuel. The propagation of ionizing radiation is calculated both for the protection of people and in order to understand its effects on materials. Finally, the evolution of nuclear fuel is directly linked to the optimisation of fissile resources and waste management.

Theoretical modelling of these phenomena is based on two neutron balance equations: the Boltzmann equation can be used to model the life cycle of neutrons, while the Bateman equation translates the evolution of isotopes over time. These are two "exact" equations, i.e. "without approximation". They use physical quantities that characterise the interaction of particles, such as neutrons, with the fuel and materials of the reactor. And the basic nuclear data is produced through precise experimental measurements.
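As a concrete, if highly simplified, illustration of what the Bateman equation expresses, the sketch below integrates a hypothetical three-nuclide decay chain A → B → C with an explicit Euler step; production codes use real nuclear data and far more robust solvers:

```python
# dN_A/dt = -la*N_A ; dN_B/dt = la*N_A - lb*N_B ; dN_C/dt = lb*N_B
la, lb = 1e-3, 5e-4                  # invented decay constants (1/s)
n_a, n_b, n_c = 1.0e20, 0.0, 0.0     # initial inventories (atoms)
dt, t = 10.0, 0.0                    # time step and clock (s)

while t < 1e5:
    d_a = -la * n_a * dt
    d_b = (la * n_a - lb * n_b) * dt
    d_c = lb * n_b * dt
    n_a, n_b, n_c = n_a + d_a, n_b + d_b, n_c + d_c
    t += dt

# The total number of atoms is conserved up to integration error.
print(n_a, n_b, n_c, n_a + n_b + n_c)
```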

z The core of a nuclear reactor is loaded with fuel. Its composition and position are key parameters for the plant's power and safety. (CEA)

z In this image of a fourth generation reactor core, calculated using the APOLLO3 code, each diamond represents an assembly of nuclear fuel. The level of nuclear power in each one is represented by a colour, from blue (no power) to red (maximal power). The blue diamonds in the centre are control rods that are used to manage the nuclear reaction. (CEA)

The finesse of modelling
All this seems well and good: precise data and equations can model physical phenomena with great accuracy. However, these equations involve solving systems with more than... 1,000 billion unknowns! An impossible mission, no matter how much computing power is available. Thus, in order to solve these equations, it is necessary to mobilise two complementary approaches. On the one hand there is the deterministic approach, which relies on physical hypotheses and digital models in order to solve a problem. This is the basis of the APOLLO2 and APOLLO3 codes developed by the nuclear energy department of the CEA (CEA/DEN). On the other hand there is the Monte-Carlo approach, which relies on a native representation of the data to "play" the life of billions of "useful" neutrons (the TRIPOLI-4™ code, also developed at the CEA/DEN). These digital simulation methods are closely coupled with experiments and measurements.

In the field of reactor physics, high-performance computing (HPC) leads to modifications in the way digital simulations are used.

Evolutions in computer codes (algorithms, digital methods, etc.) and the effective use of increased computing power continually contribute to improving the accuracy and finesse of models. The information obtained via digital simulation is also more complete, thanks to the generalisation of three-dimensional calculations – which first appeared in the 90s – and the consideration of multi-physical phenomena (neutron transport, thermohydraulics, fuel modelling, etc.).

Another step forward is the possibility of simultaneously simulating an ever vaster range of power plant components. For example, HPC opens the door to modelling, at the heart of three-dimensional simulations, the reactor core and the boiler, while taking into consideration nuclear power and thermo-hydraulic phenomena together – primary water system, steam generators, etc. Thus engineers obtain much more precise results without spending more time than they did before on common simplified models. Let us take the example of the nuclear simulation code APOLLO3, which was used at the end of 2010 on the new petaflopic supercomputer (Tera 100) at the CEA/DAM (military applications department of the CEA) to produce 3D simulations of a fourth generation reactor, the successor of the EPR.

Determining the uncertainties and biases of calculation systems for nuclear reactors is one of the key points in designing tomorrow's reactors. By associating HPC with modern code, engineers manage to assess these uncertainties more effectively. They can rapidly learn the precise impact of a more or less fine computation grid on the final result. They can therefore define "reference" solutions for assessing uncertainties. And to do this they can rely on significant means: computing power in the hundreds of teraflops*, using more than 30,000 processors simultaneously. And the result: an extremely realistic 3D computation of the reactor core, thanks to the precise solution of a neutron transport equation in just a few hours. With a single processor, the same operation would take at least a year…

Changes at the core
Finally, high-performance computing can be used to implement innovative methods for optimising operational parameters and systematically mastering uncertainties. A typical example of this approach is the combined use of computing power, artificial intelligence methods and a new generation of simulation codes to optimise loading plans for the reactor core. What does this mean? A reactor core is made up of fuel assemblies. These components remain a certain time inside the reactor core. Their position, depending on their type (nature of the fuel and time spent in the core), constitutes the core loading plan. According to this loading plan, the main operational parameters regarding safety and yields – like the maximum power of the reactor – can be optimised.

Up until now, these optimisations were performed based on human expertise and feedback from existing reactors. However, for new reactor concepts, it is more and more difficult, and above all time-consuming, to optimise fuel reloading "manually", considering the number of possible configurations. The design of an optimisation tool, suitable for determining nuclear fuel loading plans and independent from the configuration under study (core, fuel), can facilitate the work of engineers and above all reduce the time spent on studies. This tool, based on multi-criteria optimisation software integrating genetic algorithms (VIZIR) and the APOLLO3 code, allows the designer to improve reactor performance and safety. This is how, with the help of 4,000 processors, we can now find solutions for complex reactor core loading plans in less than 24 hours. z

FLOPS (Floating point Operations Per Second) is a unit of measure for the performance of computers. One teraflops equals one thousand billion operations per second (10¹²), one petaflops equals one million billion operations per second (10¹⁵) and one exaflops equals one billion billion operations per second (10¹⁸).
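A loading-plan optimiser of the kind just described can be pictured as an evolutionary loop: propose plans, score them, keep the best and mutate them. The toy below is not VIZIR – its "fitness" is an invented proxy for a core calculation, and its only genetic operator is a swap mutation – but it shows the shape of the search:

```python
import random

BURNUPS = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]   # 0 = fresh (most reactive) fuel

def fitness(plan):
    """Invented proxy: penalise neighbouring low-burnup assemblies
    on a ring-shaped core, a crude stand-in for limiting peak power."""
    n = len(plan)
    return -sum(1 for i in range(n) if plan[i] + plan[(i + 1) % n] <= 2)

def mutate(plan, rng):
    child = plan[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]   # swap two assemblies
    return child

rng = random.Random(1)
population = [rng.sample(BURNUPS, len(BURNUPS)) for _ in range(30)]
for _ in range(200):                          # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                 # elitist selection
    population = parents + [mutate(rng.choice(parents), rng)
                            for _ in range(20)]
best = max(population, key=fitness)
print(best, "violations:", -fitness(best))
```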

HOW DO ATOMS LINK UP TO FORM MATERIALS? SIMULATING GROWTH AT THE ATOMIC LEVEL REMAINS A MAJOR CHALLENGE. THE GOAL IS TO MASTER THE FLOW OF ATOMS, TEMPERATURE, THE ELECTRIC FIELD… KNOWLEDGE THAT IS VITAL IN THE FIELD OF NANOTECHNOLOGIES.

WATCHING MATERIALS GROW, ONE ATOM AT A TIME

On the atomic level, materials grow through the movement of atoms: as they arrive at the surface they combine with others that are already there. At the same time, this minimises the energy of the system, which is why atoms organise in highly regular networks, like crystals. To study this phenomenon, we need to identify the different mechanisms behind atomic distribution and then compare them at different temperatures. This means simulating matter on an atomic scale.

BY THIERRY DEUTSCH, researcher at the INAC (Institute for Nanoscience and Cryogenics) of the CEA (Atomic Energy and Alternative Energies Commission) in Grenoble, where he is in charge of the atomistic simulation laboratory, AND PASCAL POCHET, researcher at the INAC.

We can model growth using phenomenological models, i.e. using a model with very few parameters, based on certain hypotheses, to describe the essential features of a physical phenomenon. However, it is difficult to determine whether such a model really demonstrates the phenomena and whether it will have predictive capacities without conducting real or numerical experiments. Thus it is better to rely on "first principles", or ab initio, methods based on the Schrödinger equation of quantum mechanics, which is capable of predicting, very specifically, the most stable atomic configurations. They allow the observation of rare events governing growth, such as the movement or dispersion of an atom from one stable site to another.

Energy barriers
In practice, a computing method for the electronic structure, such as the BigDFT code (developed since 2005 by the L_Sim laboratory at the CEA in Grenoble), needs to be combined with a highly exhaustive search algorithm for the minima and energy barriers, such as MH (Minima Hopping, invented by Stefan Goedecker's group in Basle) or ART (Activation Relaxation Technique, developed by Normand Mousseau at the University of Montreal). Currently a large number of algorithms have been developed and are being tested to search as quickly as possible for the most stable atomic configurations and energy barriers. These algorithms are required to understand, among other things, protein folding.

The second step consists in comparing the different mechanisms for distributing atoms. To do this, we need to call on statistical physics, and more precisely a kinetic Monte Carlo algorithm, which consists in randomly "drawing" a new atomic position, depending on the energy barrier that must be overcome. It is therefore possible to conduct numerical experiments of material growth by considering each atom, one by one.

However, the calculation of minima and energy barriers using ab initio methods is a handicap for this approach, due to the huge computing power required. Indeed, the Schrödinger equation needs to be solved! For this we need to mobilise the "Kohn-Sham system", based on the "density functional theory" honoured by a Nobel Prize in Chemistry in 1998. What does it consist in? The underlying idea is that the electronic density is sufficient to determine the fundamental state of the electrons for a given atomic position. If we consider that electrons are always in equilibrium when atoms move, it is possible to calculate the atomic forces and find the minima – i.e. the stable atomic configurations – and the energy barriers (Figure 1).

Our group is focused on simulating the growth of graphene on silicon carbide (SiC) (Figure 2). We have considered bidirectionally periodic surfaces consisting of 700 atoms, or 2,608 electrons. In one of these directions, the silicon carbide crystal is made up of alternating layers of pure carbon and silicon. A surface can therefore end with a layer of carbon or of silicon. Ab initio methods are essential in order to describe, in particular, the bonds of carbon atoms, which are different in silicon carbide and in graphene. To give an idea, the calculation of one atomic configuration requires around five hours, using a supercomputer with 600 parallel processors!

Wavelets for good measure
The BigDFT code uses new mathematical functions – wavelets – in a novel way; until now they had been used mainly to compress images. The code has been optimised – in partnership with the Grenoble computer laboratory – to use several cores for each electron simulated. In practice, it is therefore possible for our system of 2,608 electrons to use more than 2,608 cores. The BigDFT code is also capable of using graphics processors, which reduces computing time by a factor of 10.

The Tera 100 supercomputer can perform calculations 2,000 times faster than a traditional computer. Thus, we can determine two new minima per day with the ART algorithm. To achieve this, the algorithm has been highly optimised. The goal is to reduce by a factor of 4 the number of energy assessments required to determine a minimum and achieve results with only 400 assessments – the basis of a preliminary study that has just been published in the Journal of Chemical Physics.
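The kinetic Monte Carlo step mentioned above fits in a few lines: each possible atomic move gets an Arrhenius rate from its energy barrier, one move is drawn with probability proportional to its rate, and the simulation clock advances by an exponentially distributed time. A minimal sketch with invented barrier values:

```python
import math
import random

K_B = 8.617e-5        # Boltzmann constant, eV/K
ATTEMPT_FREQ = 1e13   # typical attempt frequency, 1/s

def kmc_step(barriers_ev, temperature, rng):
    """Draw one event among the possible atomic moves."""
    rates = [ATTEMPT_FREQ * math.exp(-e / (K_B * temperature))
             for e in barriers_ev]
    total = sum(rates)
    pick, acc = rng.random() * total, 0.0
    for event, rate in enumerate(rates):
        acc += rate
        if pick <= acc:
            break
    dt = -math.log(rng.random()) / total   # time to the chosen event
    return event, dt

rng = random.Random(0)
# Three hypothetical diffusion moves with barriers in eV:
event, dt = kmc_step([0.8, 1.1, 1.3], temperature=1300.0, rng=rng)
print("chosen move:", event, "elapsed time (s):", dt)
```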



z Figure 1. Energy curve representing the path between two minima for a cage of SiC (120 atoms). There is an energy barrier (bump) between the structure on the left (local minimum) and the structure on the right (global minimum).

z Figure 2. Bare surface of SiC ending in silicon (in green) on a yellow plane. The bottom surface ends in carbon (in black). The box outlined with blue lines corresponds to the calculation box containing 700 atoms.

z Figure 3. View above the SiC surface showing a nanosheet consisting of 16 carbon atoms in blue. The red links in the graphene nanosheet are different from the purple ones in the SiC material.

Source: E. Machado-Charry et al., J. Chem. Phys., 135, 034102, 2011.


Nevertheless, we estimate it will take several months to obtain physical data that can be used in the kinetic Monte Carlo part. Currently, we are studying the growth of graphene on SiC using a surface ending with silicon (Figure 3). Next, we will examine the growth of a complete sheet of graphene using the carbon atoms present in the final layer of silicon carbide. We hope, in the end, to compare our results with experimental data. If these fit, we will have found a mechanism for synthesising graphene on silicon carbide. z

HOW CAN WE DESIGN ATOMIC WEAPONS WITHOUT CONDUCTING NUCLEAR TESTS? THE SOLUTION LIES IN COMBINING MODELLING AND SIMULATION. IN THESE FIELDS SUPERCOMPUTERS HAVE PROVEN THEIR UTILITY. AND IN THIS AREA FRANCE IS AHEAD OF THE CURVE.

CALCULATING NUCLEAR DISSUASION

To simulate the complete deployment of an atomic weapon without relying on nuclear testing, you need to come up with a mathematical model, solve a series of equations on supercomputers and, finally, confirm the results obtained through laboratory experiments, based on the measurements recorded during past nuclear tests.

BY CHARLES LION, Head of the simulation programme in the military applications department of the CEA.

The first step, establishing a model, requires, above all, knowledge of the different physical phenomena involved and how they are linked (see "H-bomb basics"). The equations that could reproduce them through computation are already known: the Navier-Stokes equations for fluid mechanics, the Boltzmann transport equation for neutrons, and diffusion and transport equations for the evolution of matter and photons. By combining these physical models, we can obtain a system of mathematical equations that faithfully reproduces the workings of a nuclear weapon.

Billions of unknowns
The next step, in order to solve this system of equations, is simulation in conditions as close as possible to the real deployment of a nuclear weapon. To achieve this, we need to describe a wide range of particles (neutrons and photons, as well as ions and electrons) on three time scales of less than a millionth of a second. And this must be done in states of extreme pressure, up to a thousand billion times atmospheric pressure! This calculation is incredibly more complex than many other modelling processes, such as meteorology, due to the brevity of the phenomena in play and the intimate links between physical mechanisms.

It is impossible to solve this system precisely. Digital analysis must be used to transform it into a system of "approximate" equations that a supercomputer can solve. Calculations are performed on small areas called grids. The more grids there are, the closer we can come to an accurate solution to the real problem. Practically speaking, dozens, even hundreds, of millions of grids are used. This represents billions of unknowns.

Step-by-step validation
To ensure the computer produces the most realistic representation, validation of results is a two-step process. The first step is validation of parts of the model: a single physical phenomenon, like the mechanical behaviour of a specific material, or a few combined phenomena. To do this, we compare the results of the simulations to experiments performed with two tools: the Airix radiographic machine and the Laser Mégajoule (LMJ).

Set up since 2000 in Moronvilliers, in the Champagne-Ardenne region of France, the Airix facility can produce a flash X-ray image of a nuclear weapon without fissile matter to confirm the initial, pyrotechnical phase. It simulates the compression of matter using non-radioactive materials with comparable mechanical and thermal behaviours. The X-ray images produced during this compression phase are compared to digital simulations. As for the LMJ facility, under construction in Le Barp, near Bordeaux, it will allow us, at the end of 2014, to reproduce and study the nuclear fusion phase. It will be the equivalent of a wind tunnel in airplane design. The LMJ facility will focus the equivalent of 240 laser beams on a bead filled with two hydrogen isotopes*, deuterium and tritium, triggering their fusion for several billionths of a second.

A prototype, the LIL (Laser Integration Line), consisting of four beams, started operating in March 2002 and has allowed us to confirm the technological choices of the LMJ. The latter, as is already the case for the LIL, will be available for use by the international scientific community in the fields of astrophysics, energy, etc.

Finally, the last step consists in global validation. This means comparing the results obtained with the software to all the measurements recorded during past nuclear tests, notably those conducted in 1996, during the last French programme. This step is used to produce a computation standard, in other words real prescriptions for using digital simulations, the trust domain of the simulation at time "t".

A world first
In 2001, thanks to the Tera 1 supercomputer with a computing power of 5 teraflops*, set up at the CEA in Bruyères-le-Châtel, we produced our first computation standard, the first step in guaranteeing nuclear weapons with a single simulation.

ISOTOPES are atoms of a chemical element with differing numbers of neutrons.
FLOPS (FLoating point Operations Per Second) is a unit of measure for the performance of computers. One teraflops equals one thousand billion operations per second (10¹²).

H-BOMB BASICS
The first stage in the deployment of a thermonuclear weapon is the detonation of an explosive charge (pyrotechnics). In just a few millionths of a second, temperatures of several thousand degrees are reached, triggering fission by compressing the fissile matter (plutonium or uranium) transformed into plasma (ionised gas). This mini nuclear explosion lasts around 100 millionths of a second and raises the temperature to the ten million degrees required to set off the third step, the fusion of deuterium and tritium, the two hydrogen isotopes. The temperature then rises to around a billion degrees for a few millionths of a second.


z The LMJ facility, currently under construction at Le Barp, near Bordeaux, is a major tool in nuclear weapon simulation. Nuclear fusion will be reproduced on a true scale in this experiment chamber, which measures 10 metres in diameter. (CEA)

In 2005, Tera 10 (50 teraflops) allowed us to define a computation standard that was confirmed through a much vaster set of experiments, with the first 3D simulations. We could guarantee the performance of an air-launched nuclear warhead with a single simulation, without organising new nuclear tests. To our knowledge, this represents a world first. Missiles on board Mirage and Rafale jet fighters started to be equipped with these warheads in 2009.

With a computing power of 1,000 teraflops, Tera 100 represents a giant step forward in modelling capacities. Installed in July 2010, the supercomputer is now fully operational. Three-dimensional simulations are available in large numbers, thus allowing us to guarantee the future nuclear warheads of submarines, which should be in service in 2015. Simulation will allow us to adapt the computation standard to their specific thermal and mechanical environments. The other fundamental dimension of this research consists in training and certifying the future designers of nuclear weapons. They must be able to master digital simulations and also understand their limitations.

The combination of unique facilities, such as the Tera world-class supercomputers and the LMJ, will also attract young and brilliant engineers, physicists and mathematicians, and preserve the skills necessary for our dissuasive missions. z
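The earlier remark that "the more grids there are, the closer we can come to an accurate solution" is the standard convergence property of any discretisation, which a generic toy problem (entirely unrelated to weapons physics) makes concrete: solving −u″ = π²·sin(πx) on [0,1] by finite differences, the error shrinks as the grid is refined:

```python
import math

def max_error(n, sweeps=20_000):
    """Solve -u'' = pi^2 sin(pi x), u(0)=u(1)=0, exact u = sin(pi x),
    on n interior grid points with Jacobi sweeps; return the max error."""
    h = 1.0 / (n + 1)
    x = [h * (i + 1) for i in range(n)]
    f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
    u = [0.0] * n
    for _ in range(sweeps):
        u = [0.5 * ((u[i - 1] if i > 0 else 0.0)
                    + (u[i + 1] if i < n - 1 else 0.0)
                    + h * h * f[i]) for i in range(n)]
    return max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x))

for n in (8, 16, 32):   # doubling the number of grids quarters the error
    print(n, "grid points -> max error", max_error(n))
```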

WHAT HAPPENS WHEN TWO GALAXIES COLLIDE? THIS IS NOT MERELY A QUESTION OF CURIOSITY FOR ASTRONOMERS. THE PROCESSES OF INTERACTION AND FUSION BETWEEN GALAXIES ARE KEY PHASES IN THE BIRTH AND FORMATION OF STARS.

UNDERSTANDING HOW A STAR IS BORN

At the beginning of the Universe, small dwarf galaxies started to form, which would combine to create increasingly massive galaxies. The merging of galaxies also plays a role in the redistribution of the angular momentum* between the visible component of rapidly rotating galaxies (gas and stars) and the dark matter, which is like a halo around the visible matter. They create favourable conditions for gas to collapse at the centre of galaxies, where intense starbursts can occur.

BY PAOLA DI MATTEO, FRANÇOISE COMBES AND BENOÎT SEMELIN, astrophysicists at the Paris Observatory.

One month of simulation
So much for theory. How can we describe these phenomena with precision? Since we cannot measure them, because they occur at astronomical distances and on time scales much longer than a human life... we can simulate the evolution of encounters between galaxies! While this seems simple enough at first glance, it is a real scientific challenge. To simulate the behaviour of galaxies, we need to consider the dynamics of their discs on a scale of tens of kiloparsecs (kpc)*, or a distance of 10²¹ m (a thousand billion billion metres). Regarding the formation of stars and interstellar clouds, the scale that needs to be taken into consideration is the parsec, or approximately 10¹⁶ m. Moreover, the calculations must combine these two scales.

Supercomputers have allowed us to start a statistical study of merging galaxies, conducting approximately fifty simulations with a spatial resolution of around 50 parsecs. Practically speaking, we performed calculations for over a month on 1,550 cores of the Curie supercomputer, set up at the CCRT (Centre de calcul recherche et technologie - Research and Technology Computing Centre) of the CEA.

ANGULAR MOMENTUM: A vector parallel to the axis of rotation whose amplitude represents the number of rotations of a system.
PARSEC: Parallax of one second, a unit of length used in astronomy that equals 3.26 light years, or 206,265 astronomical units (Earth-Sun distance).

Modelling mergers
Therefore we are modelling mergers between galaxies of the same size – called "major mergers" – as well as the accretion of satellites by a Milky Way type galaxy – or "minor mergers". The results of this violent transformation can then be compared with the evolution of slower internal processes, called secular evolution. The final goal is to understand, in particular, the redistribution of angular momentum between visible and dark matter, the intensity of starbursts, or the morphological properties of merger residues.

In our simulation method, the system is modelled by a set of "particles", which are monitored as they change under the effect of the different physical processes under consideration. One particle corresponds to either a cluster of stars or a cloud of interstellar gas. The physical processes are essentially the gravitational forces exerted between all the particles, but also the pressure of the gas, viscosity, shock waves, the formation of stars from gas and the ejection of gas by stars at the end of their life cycle.

To obtain a spatial resolution of 50 parsecs, the particles' mass must be 5,000 times that of the Sun and 30 million particles are required to model the merger of two large Milky Way type galaxies. To assess the gravitational forces between N particles, the simplest algorithms require N² calculations of interactions. At the cost of an approximation, the algorithm we used, called a "tree" and developed in the 1980s, performs the calculation in a number of operations proportional to N ln(N). Without this significant gain, these simulations would be impossible.

Hydrodynamic forces
However, gravitational forces are not the only phenomena at work. We must also consider the hydrodynamic forces (pressure and viscosity) that characterise the dynamics of galaxies. We calculated these using the SPH method (Smoothed Particle Hydrodynamics), developed at the end of the 1970s, which consists in representing the fluid as a multitude of tiny components covering it.

To monitor not only the rotation of galaxies, but also the formation of small dense structures during mergers (molecular clouds, whose cycle of dynamic evolution is much shorter), we assessed the forces every 250,000 years, or 8,000 times during the 2 billion years covered by a simulation. We estimate this corresponds to a total of around 10,000 billion calculations of interactions between particles during a single simulation.

If we had only one processor to work with, we would have needed several years to perform these calculations. There was only one solution: use several arithmetic units for each simulation. In the context of the "Grand Challenge" on the Curie supercomputer, and thanks to the parallelisation of the code (OpenMP), each simulation was launched on 32 cores and used more than 50 GB of RAM per node. The strength of these simulations – which are still in progress and have just started to yield viable data – is to offer results with a previously unheard-of level of spatial resolution. In an initial phase, this has allowed us to know where, and how efficiently, gases are transformed into stars.

Our vision of galactic dynamics will radically change. Indeed, the nuclear energy of stars is not only "radiated", but also transformed into kinetic energy, through flows of ejected gas. At low resolution, the formation of stars can only be treated in a semi-analytical way, by using a probability proportional to the density of the gas present. However, this is not satisfactory, as the instabilities of galactic disks are more subtle.

A different global structure?
Our hope is to treat interstellar gas with greater precision, by considering its dispersion via radiation, collisions between clouds, the shock waves formed, the macroscopic turbulence generated and its viscosity. If we take these different physical parameters into account very precisely, the global structure could be completely transformed. First, because the energy re-injected into the stellar environment by the stars, stellar winds and supernova explosions may go undetected if the scale considered is lower than the resolution. However, if simulations do not overlook them, this energy transferred within the interstellar environment completely changes its dynamics and could prevent the formation of stars.

Above all, the interstellar environment has several phases, with very different densities – more than 10 orders of magnitude separate the densest clouds from the diffuse environment. At low resolution, these different phases can be simulated by different components, with exchanges of mass between phases, calibrated in a semi-analytical manner. At high resolution we can hope to directly recreate the several phases, using the natural instabilities of the environment, and therefore treat full-fledged molecular clouds, without introducing them artificially. z

z Distribution of gas after the first encounter between two galaxies of the same size (major merger). When two galaxies pass near each other, some matter may be ejected far outside the disk, forming structures called a "tidal tail" that can span several hundred kiloparsecs.
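The gain brought by the "tree" algorithm is easiest to appreciate against the brute-force alternative it replaces. The sketch below implements only the O(N²) direct sum (with an invented softening length to avoid singularities); tree codes recover nearly the same forces in O(N ln N) operations by replacing distant groups of particles with single pseudo-particles:

```python
import math
import random

G = 6.674e-11  # gravitational constant (SI)

def direct_sum_forces(masses, pos, softening=1e15):
    """Brute-force gravity: every pair interacts, hence N^2 work."""
    n = len(masses)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + softening ** 2
            f = G * masses[i] * masses[j] / (r2 * math.sqrt(r2))
            for k in range(3):
                forces[i][k] += f * d[k]   # attraction on i towards j
                forces[j][k] -= f * d[k]   # equal and opposite on j
    return forces

rng = random.Random(2)
masses = [1e33] * 100                      # invented particle masses (kg)
pos = [[rng.uniform(0.0, 1e19) for _ in range(3)] for _ in range(100)]
print(direct_sum_forces(masses, pos)[0])   # net force on particle 0
```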

MECHANICS AFFECTING MATERIALS DURING INTENSE SHOCKS ARE STILL NOT FULLY UNDERSTOOD. TO STUDY THEM, WE NEED TO DESCEND TO THE ATOMIC LEVEL. IRREVERSIBLE DEFORMATIONS, DAMAGE, DECOMPOSITION… HIGH-LEVEL SIMULATIONS ARE REQUIRED TO UNDERSTAND THE PHYSICS OF SHOCKS.

THE PHYSICS OF SHOCKS ON AN ATOMIC SCALE

What happens, on a molecular level, when a material withstands an intense shock after, for example, an explosion or a violent collision? A shock can be seen as an extremely rapid compression – lasting some 10⁻¹¹ s – and an intense one – 10¹⁰ pascals, or 10⁵ atmospheres, or more. It is still impossible to observe experimentally, with precision and in real time, what occurs on an atomic scale. We do know that it involves complex phenomena: plasticity (irreversible deformation of the material), damage (cracks, breaks), chemical decomposition, etc. To get a more precise idea, one solution consists in simulations on petaflopic computers.

BY LAURENT SOULARD, NICOLAS PINEAU, OLIVIER DURAND AND JEAN-BERNARD MAILLET, engineers in the military applications department of the CEA.

Molecular dynamics is a widely used simulation method. It consists in solving the equations of motion for a set of particles (atoms, molecules) interacting under a predefined force. In recent years, computers have allowed us to conduct phenomenological studies of small systems – a few million atoms, or sizes of a few millionths of a micron. With the Tera 100 supercomputer, we can now study systems with dimensions of nearly a micron that can be compared with experiments, while using forces that more closely describe the complexity of interactions between atoms.

To successfully perform these simulations, we needed to rethink software programmes so that they take full advantage of the computing power of massively parallel systems. This is the case of the Stamp code, developed by the military applications department of the CEA (CEA/DAM) over the last fifteen years. Thanks to this code we were able to simulate some practical cases. For example, the transition from an elastic to a plastic state: above a certain threshold pressure, the propagation of a shock causes the creation of irreversible structural flaws. Because it significantly affects the mechanical properties of the material, plasticity requires very precise modelling, based on a description of the underlying elementary mechanisms.

Plasticity and rupture
We were particularly interested in studying the plasticity of diamond. This is a real challenge because it involves not only taking into account the complexity of the process, but also setting up an interaction model capable of reproducing the many different states of carbon (diamond, graphite, etc.). The Stamp code is capable of describing carbon very precisely, but the trade-off is a significant computation time. We used it to simulate the propagation of shocks of various intensities along the principal crystallographic orientations of a diamond (Figure 1). No fewer than 1.3 million carbon atoms were considered for a calculation performed on 6,400 cores of the Tera 100 supercomputer. The results? We observed multiple structural defects, signifying the appearance of plasticity. This highly detailed measurement of the material's state will be an invaluable tool in constructing a model that can be used with a code working on a macroscopic scale.

During its propagation, a shock encounters interfaces between materials and, in particular, the so-called free surfaces forming the frontiers of the system with the ambient air. When it is reflected on a free surface, a shock can cause different phenomena that depend on its intensity, as well as on the local topology of the surface, including its roughness. These phenomena can significantly affect the properties of a material and damage elements in its environment such as, for example, a measurement system. This is the case of rupture by spalling. This phenomenon occurs when an expansion (or decompression) wave, produced when a shock reflects on a free surface, encounters the one following the initial shock. This causes a sudden local tensioning of the material, which can lead to rupture. Spalling is currently the object of studies associating large-scale molecular dynamics simulations and specific experiments.

Ejection of matter
Another phenomenon caused by shocks is the projection of matter. This is caused by a defect in the flatness of a free surface, such as, for example, a scratch produced by a machining tool. Depending on the type of defect, the matter ejected can take the form of small aggregates, jets, etc. If we want to protect the surrounding environment (measurement apparatus, coatings on experiment chambers, etc.) we need to control the ejection mechanisms. The conditions for forming a jet, for example, depend on relatively well-known hydrodynamic conditions. Its subsequent behaviour, and particularly its fragmentation, is more difficult to apprehend. Once again, we can analyse the problem in an extremely precise manner but on a reduced spatial scale – thanks to a digital experiment in molecular dynamics. Thus, we simulated the formation of a jet involving 125 million copper atoms, using 4,000 cores of Tera 100 (Figure 2).


z Figure 1. Propagation of a shock wave in a diamond single crystal. Behind the shock front (white line), multiple structural defects show the appearance of plasticity (1 ps equals 1 picosecond, or 10⁻¹² second).

z Figure 2. Formation of a jet caused by the reflection of a shock on a surface with a defect in flatness. A total of 125 million copper atoms and 4,000 processor cores on Tera 100 were required for this calculation. (CEA)

As with our study of plasticity, this data will enable us to construct models on a macroscopic scale.

Finally, if the shock is propagated in a material that is also the seat of chemical reactions, we can expect a state of detonation, i.e. the propagation of a chemical reaction by the shock wave. We can understand this phenomenon by studying the molecular mechanisms behind it. The chemical decomposition of the explosive is produced over a thickness ranging from a few microns to several millimetres, depending on the nature of the explosive. However, simulations of chemical reactions, which necessarily require taking electrons into account, are limited to nanometric scales.

Revisiting models
How can we link these two scales? With a model that simulates the evolution of a set of "super particles", each representing one or more molecules with their own internal dynamics. The dynamics of this system is dictated by an extension of traditional Hamiltonian mechanics, and therefore remains compatible with a molecular dynamics code. Simulations of nitromethane (CH3NO2) with 8,000 cores on Tera 100 demonstrated that the appearance of a detonation is a much more complex process than suggested by experimental observations.

Simulation can be used to revisit macroscopic models. It complements experiments so we can understand these complex physics. Intimately linked to the power of computers, its importance can only grow over the coming years. z
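The molecular dynamics loop used throughout this article – integrate Newton's equations for particles under a predefined force – fits in a few lines. The sketch below uses the standard velocity-Verlet scheme on a 1D Lennard-Jones chain as a stand-in potential; real codes such as Stamp use far richer interaction models and millions of atoms:

```python
def forces(xs):
    """Pairwise Lennard-Jones forces along a 1D chain (reduced units)."""
    f = [0.0] * len(xs)
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            r = xs[j] - xs[i]
            pair = 24.0 * (2.0 / r ** 13 - 1.0 / r ** 7)  # repulsive > 0
            f[i] -= pair
            f[j] += pair
    return f

def velocity_verlet(xs, vs, dt=1e-3, steps=1000):
    """Advance positions and velocities with the velocity-Verlet scheme."""
    f = forces(xs)
    for _ in range(steps):
        xs = [x + v * dt + 0.5 * fi * dt * dt
              for x, v, fi in zip(xs, vs, f)]
        f_new = forces(xs)
        vs = [v + 0.5 * (fi + fn) * dt for v, fi, fn in zip(vs, f, f_new)]
        f = f_new
    return xs, vs

# Leftmost atom launched into the chain: a (very loose) cartoon of a
# piston driving a compression wave into a sample.
xs = [1.12 * i for i in range(10)]   # near the equilibrium spacing
vs = [3.0] + [0.0] * 9
print(velocity_verlet(xs, vs)[0])
```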

FOR AROUND FORTY YEARS NOW, METAL ALLOYS HAVE REVEALED ASTONISHING MECHANICAL PROPERTIES: THEY CAN UNDERGO MAJOR DEFORMATIONS BEFORE RETURNING TO THEIR INITIAL SHAPE. REACHING AN UNDERSTANDING OF THIS PHENOMENON ON THE ATOMIC LEVEL IS ONLY POSSIBLE WITH HUGE COMPUTING POWER.

MARTENSITIC DEFORMATIONS SEEN THROUGH THE PROCESSOR PRISM

It was in 1970 that we first discovered metal alloys had surprising properties: when subjected to weak tensile stress, they can stretch to a certain point and then, when "released", return to their initial shape. These complex alloys – often nickel and titanium based – whose behaviour is often qualified as "pseudo-elastic", have rapidly found many applications, like frames for eyeglasses or surgical stents.

BY CHRISTOPHE DENOUAL, research engineer at the CEA.

The mechanism behind such major deformations is called "martensitic transformation" (Figure 1): this involves a modification in the crystal structure after a change in the environment, which can be either an applied strain, pressure or a change in temperature. In metals, these transformations are extremely commonplace and are the object of intense research. While they rarely allow such spectacular behaviours as pseudo-elasticity, they can induce, under extreme strain, significant deformations, and they play a fundamental role in the behaviour of metals.

Our understanding of these mechanisms on an atomic level has grown a great deal, thanks to the modelling of martensitic transitions using molecular dynamics shock simulations conducted with millions of atoms.

For prolonged periods of stress, the final microstructure becomes very complex and stretches over broad areas of space, often up to one millimetre. Can molecular dynamics simulate martensitic transformations on such broad fields?

The wall of time
During such a simulation, equilibriums are established via the propagation of deformation waves. The maximum distance covered by these waves, proportional to the simulation time, limits the correlation between different points in space. In practice, increasing the size of the calculation box beyond this "maximal correlation distance" contributes nothing more if the duration of the simulation is not increased proportionally. However, while the power of computers has increased with the multiplication of processor cores, the power of each core has not really evolved. Today, as was the case ten years ago, it is very difficult in practice to exceed a million computing cycles, each corresponding to a length in time of one femtosecond (one millionth of one billionth of a second), or a total simulation time of one nanosecond. This limited duration corresponds to a maximal correlation distance of only a few micrometres and restricts the emergence of larger microstructures.

The largest simulations find themselves up against this "wall of time". It can be overcome by changing the representation, by grouping together several atoms in a complex molecule, for example, to optimise calculations. In the same spirit, in 2010 we opted for a method that allows for a radical change in the "granularity" of the representation of matter by forming groups of 10, 100 or 1,000 atoms, while ensuring a certain equivalency between the energy of this set and the atomic (and detailed) underlying representation.

This approach is based on a particularly compact representation of the energy landscape involved in phase transformations. Each of the stable states is represented by a minimum of energy, the level of which is estimated (before calculations) by a modelling technique at atomic scale.

During transformation, the material shifts from a stable state (or well) to another via a "reaction pathway", i.e. a series of states that allow for a smooth transition between two wells. This reaction pathway tree (Figure 1) is duplicated for all the cells in the calculation, each minimising the energy as much as possible.

The reaction pathway tree
Let us take, for example, calculations for an iron and nickel alloy (Figure 2). Conducted with Tera 100 on more than 4,500 processors, they have allowed us to reach cubes with sides measuring 0.5 microns* for a simulation time of 1 microsecond. The microstructures that emerge from these calculations show alternating lamella made up of different variants of martensites with very flat edges. On a larger scale, these also stack to form relatively straight bands, which are in turn contained within a broader corridor.

To obtain this three-level nested structure, it is important to guarantee excellent resolution – here each cell represents around 100 atoms –, large calculation boxes and sufficiently long computing times for the larger structures to emerge. Only coarse-grain computing, conducted on thousands of cores, can achieve these levels of resolution and these scales in space and time.

The results we have obtained today are based on a reaction pathway tree calculated using simplified atomic potentials. Yet, in reality, these transformations are induced by a re-composition of the atoms' electronic structure. On the scale of an individual atom, these transformations are described exclusively via a quantum approach. Simulating them requires a different method: ab initio calculations of electronic structures. These are also highly complex and greedy in computing time, and they too benefit from the massive parallelisation typical of high-performance computing. Thus we obtain a reliable estimation of reaction pathways.

By combining these two approaches, we will soon have a unified vision of a microstructure from the scale of one millimetre down to the scale of one atom. These transformations, which are sometimes so fast they do not show up in the most precise diagnostics, can therefore be analysed in detail and the mystery of how they are formed can be partially solved. z

z Figure 1. Martensitic transformations: iron-nickel alloys can easily shift from a face-centred cubic structure (upper left) to a body-centred cubic structure (upper right), providing there is homogeneous deformation of the cell (white frame). The phase transformations are represented as lines (in grey) connecting two stable states (coloured dots). The space in the tree (represented here simply in 2 dimensions) is generally 9-dimensional.

MICRON: Or micrometre, equals one thousandth of a millimetre (10⁻⁶ m).
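The energy-landscape picture – stable states as wells, transformations as paths over barriers – can be made concrete with a one-dimensional toy potential: gradient descent from different starting points finds the two wells, and the saddle between them gives the barrier, exactly the structure sketched in Figure 1. The potential below is invented for illustration:

```python
def energy(x):
    return (x * x - 1.0) ** 2        # two wells, at x = -1 and x = +1

def gradient(x):
    return 4.0 * x * (x * x - 1.0)

def descend(x, step=1e-2, iters=5000):
    """Follow the downhill gradient to the nearest minimum."""
    for _ in range(iters):
        x -= step * gradient(x)
    return x

left, right = descend(-0.3), descend(0.3)
barrier = energy(0.0) - energy(left)  # the saddle sits at x = 0 here
print(f"wells at {left:.3f} and {right:.3f}, barrier = {barrier:.3f}")
```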

z Figure 2. Appearance of lamella: a deformation, applied here in 1 microsecond to an iron and nickel alloy, induces a complex martensitic transformation. The different phases stack in large lamella, which in turn form bands. (CEA)

THE FIRST LASER BEAMS WERE PRODUCED MORE THAN FIFTY YEARS AGO! HOWEVER, THE BEHAVIOUR OF THIS LIGHT, AS IT PASSES THROUGH VARIOUS MATERIALS, STILL RAISES MANY QUESTIONS. NON-LINEAR OPTICS, SELF-FOCUSING... THANKS TO SUPERCOMPUTERS, THE LASER BEAM STILL HAS MANY SURPRISES IN STORE FOR US.

USING GRAPHICS PROCESSORS TO VISUALISE LIGHT

BY LUC BERGÉ, head of research in the theoretical and applied physics department at the CEA centre in Bruyères-le-Châtel, AND GUILLAUME COLIN DE VERDIÈRE, research engineer in the simulation and information sciences department at the CEA centre in Bruyères-le-Châtel.

It is one of the basic lessons of "linear" optics: light behaves differently in different media. Each material has a specific refractive index. Yet this principle has been challenged since the 1960s with the advent of laser sources: the refractive index of transparent matter – gases, glass, etc. – can depend on the intensity of the light source. The study of this phenomenon is called "non-linear optics" (NLO).

In certain conditions, for example when the power output of a laser beam is greater than a threshold value, the index of the medium increases continually along the optical path. Consequently, the laser pulse focuses like a magnifying glass or a contact lens. This is called "self-focusing".

Obsolete architecture
This strange property is encountered in experiments using moderate-energy laser sources – of just a few millijoules – but with pulses lasting a femtosecond *. It can also be observed if we use high-energy sources (a dozen kilojoules) with longer pulses, of one nanosecond *. This is the case, for example, in high-power laser installations, where self-focusing can be observed in silica glasses and leads to a fragmentation of the optical pulse into a multitude of micrometric filaments with an intensity one thousand times greater than the incident wave.

To describe these non-linear dynamics, we can conduct digital simulations on high-performance supercomputers, such as Titane or Tera 100, located at the CEA centre in Bruyères-le-Châtel. However, traditional parallel computing techniques, which consist in calculating the laser field simultaneously on different CPUs (Central Processing Units) distributed in a string along a single specific dimension, are becoming obsolete for this type of calculation.

Describing how plasma is formed
Indeed, simulating a laser pulse less than one millimetre in diameter would monopolise 128 CPUs for several months. With centimetre-sized beams like those of the LMJ (Laser Mégajoule), every operation would have to be multiplied by 10,000! Moreover, in addition to the optical mechanisms, it is necessary to simulate the deterioration of the material. In practical terms, this means calculating the formation of plasma, over a few femtoseconds, when the laser's intensity reaches several dozen TW/cm² *.

The solution lies in the use of another type of processor: graphics processors, or GPUs (Graphics Processing Units). Originally designed for video games, recent GPUs offer high processing power thanks to several thousand arithmetic units that can function simultaneously. The programmer's art consists in organising all these resources by expressing his algorithm with the help of thousands of lightweight processors within a special system called a "computational grid". Gains in speed for GPUs, compared to their precursor, the CPU, can represent a factor of 50: they can solve systems of non-linear equations in just a few hours, a process that still took several days only a year ago.

Today, GPUs offer the possibility of simulating laser pulses lasting 3 nanoseconds, with a resolution of 30 femtoseconds (or five orders of magnitude on one dimension), in less than a week. Our next challenge? Reaching the femtosecond to describe the formation of plasma. Using graphics processors to visualise light... what could be more natural? z

z In this 3D simulation of a "small" laser beam, lasting a picosecond and 0.5 mm in diameter (intensity in TW/cm², transverse position x in mm, time t in fs), the pulse is initially homogeneous, but then breaks up under the effect of self-focusing into a multitude of highly intense filaments, each with a micrometric diameter and a duration of several femtoseconds.

FEMTOSECOND: A nanosecond (1 ns) equals 10⁻⁹ seconds. A picosecond (1 ps) equals 10⁻¹² seconds. A femtosecond (1 fs) equals 10⁻¹⁵ seconds.

TERAWATT (TW): Equals one trillion (10¹²) watts.
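To give a concrete idea of the kind of computation involved, here is a minimal sketch of a split-step simulation of self-focusing, for a one-dimensional cubic non-linear Schrödinger model of the Kerr effect. It is illustrative only: the grid sizes, units and coefficients are assumptions for the sketch, not the CEA production code.

```python
import numpy as np

# Split-step Fourier method for i A_z + (1/2) A_xx + |A|^2 A = 0:
# alternate a linear (diffraction) step in Fourier space with a
# non-linear (intensity-dependent refractive index) step in real space.
nx, dz, steps = 1024, 1e-3, 2000
x = np.linspace(-5.0, 5.0, nx)
k = 2 * np.pi * np.fft.fftfreq(nx, d=x[1] - x[0])
field = 2.0 * np.exp(-x**2)          # initially smooth Gaussian pulse

for _ in range(steps):
    field = np.fft.ifft(np.exp(-0.5j * k**2 * dz) * np.fft.fft(field))
    field *= np.exp(1j * np.abs(field)**2 * dz)

print(np.abs(field).max())  # peak amplitude grows as the pulse self-focuses
```

On a GPU, each grid point of `field` maps naturally onto one of the thousands of lightweight threads mentioned above, which is why this class of problem benefits so much from graphics processors.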

THE FUTURE: EXASCALE COMPUTING

THE NEXT CHALLENGE: CONTROLLING ENERGY CONSUMPTION

INCREASING THE POWER OF SUPERCOMPUTERS WILL INVOLVE DRASTICALLY IMPROVING ENERGY EFFICIENCY. MEMORIES AND PROCESSORS THAT ARE LESS GREEDY, MASSIVELY PARALLEL ARCHITECTURES, OPTIMISED SOFTWARE, COOLING SYSTEMS: RESEARCHERS ARE EXAMINING EVERY POSSIBLE LEAD TO REDUCE THEIR VORACITY.

z Equipped with 68,544 processors, Super K is the world's most powerful supercomputer, according to benchmark tests last June. It is also one of the most efficient in terms of energy consumption. FUJITSU

High performance computing is an Olympic undertaking. To set new records, the "athletes" in the field have one priority today: reducing their consumption... of electricity. The voracity of these machines has indeed become a major drawback to the development of more powerful supercomputers.

Currently, they require huge amounts of electricity. The world champion since last June, the Japanese system baptised Super K, co-developed by Fujitsu and the RIKEN Advanced Institute of Computational Science in Kobe, consumes nearly 10 megawatts when running at full capacity, or the annual consumption (heating excluded) of 5,000 households! Thanks to all this energy, it can perform 8.16 million billion operations per second, or 8.16 petaflops * according to the Linpack benchmark test *, placing it in the n° 1 slot of the TOP500, an international ranking of supercomputers which, while sometimes challenged, remains widely used.

In absolute terms, the amount of electricity consumed by Super K is impressive, but, paradoxically, it is also one of the most energy-efficient machines of its kind. Its "performance per watt" is 824 megaflops * (8.16 petaflops per 9.9 megawatts), while the average energy efficiency of the top ten supercomputers is only 463 megaflops per watt. This good performance is still insufficient to envisage building a machine capable of the goal set in 2009 by key players in the field: breaking the exaflops * barrier, or reaching a billion billion operations per second. "We reckon that at the start of 2012 the most powerful supercomputers will deliver 10 petaflops while consuming 10 megawatts, or around one petaflops per >>>

FLOPS: (Floating-point Operations Per Second) is a unit of measure for the performance of computers. One teraflops equals one thousand billion operations per second (10¹²) and one exaflops equals one billion billion operations per second (10¹⁸).

LINPACK: A benchmark test used to measure the time required for a computer to solve a set of n linear equations with n unknowns.

>>> megawatt, explains Franck Cappello, co-director of the Joint Laboratory for Petascale Computing, formed by INRIA and the University of Illinois at Urbana-Champaign (USA), and a specialist in the race to break the exaflops barrier. By extrapolating these results, an exaflopic supercomputer would consume 1,000 megawatts. This is unacceptable."

Even if demand for computers capable of exaflopic performance is strong, the cost of the energy required is not economically sustainable. For a supercomputer "burning" up to 1,000 megawatts at full capacity – or as much as a space shuttle during lift-off – the annual utilities bill would exceed 500 million euros in France, or around twice the cost of the supercomputer itself (between 200 and 300 million). "The goal is to design an exaflopic supercomputer by 2018 that will consume only 20 megawatts, as this corresponds to the maximum capacity of the infrastructures required to host such a machine, explains Franck Cappello. However, there are those who believe it will be difficult to remain under 50 megawatts."

With a goal of 20 megawatts, the energy efficiency of an exaflopic supercomputer would be 50,000 megaflops per watt! Is it possible to multiply the power of current supercomputers by 100 and still improve their energy efficiency 50-fold? To meet this challenge, researchers are exploring all possible solutions. The first area of improvement consists in developing microprocessors and memories that use less energy. To perform billions of operations per second, supercomputers need more and more processors and memories containing a multitude of transistors, each requiring an electrical current to open or close in step with each cycle of operations. They need a phenomenal amount of energy – a large share of which is dissipated, moreover, in the form of heat (Joule effect) – which entails installing cooling systems that themselves consume large amounts of energy.

z This prototype silicon circuit, designed by Intel, produces a laser beam that can be used to exchange up to 50 gigabits of data per second – a boon for supercomputer manufacturers.

More than 500,000 cores
Over the last thirty years, component manufacturers (Intel, IBM, AMD, Fujitsu, Nvidia, etc.) have increased the fineness of etching, i.e. reduced the diameter of the smallest wire linking two components in a circuit. They have also developed alloys that have increased the switching frequencies of transistors while limiting chip voltages.

THE ENERGY EFFICIENCY OF THE WORLD'S TOP TEN SUPERCOMPUTERS

RANK  NAME         MANUFACTURER  SITE/COUNTRY    POWER           ENERGY CONSUMPTION  ENERGY EFFICIENCY
                                                 (IN PETAFLOPS)  (MEGAWATTS)         (MEGAFLOPS/W)
1     Super K      Fujitsu       Japan           8.16            9.9                 824.6
2     Tianhe-1A    NUDT          China           2.57            4.04                635.1
3     Jaguar       Cray          United States   1.75            6.95                253.1
4     Nebulae      Dawning       China           1.27            2.58                492.6
5     Tsubame 2.0  NEC/HP        Japan           1.19            1.4                 852.3
6     Cielo        Cray          United States   1.11            3.98                278.9
7     Pleiades     SGI           United States   1.09            4.10                265.2
8     Hopper       Cray          United States   1.05            2.91                362.2
9     Tera 100     Bull          France          1.05            4.59                228.8
10    Roadrunner   IBM           United States   1.04            2.35                444.3
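As a cross-check, the efficiency column and the 463 megaflops-per-watt average quoted in the article can be recomputed directly from the power and consumption figures. A small sketch (the rounded inputs reproduce the table's values to within a megaflop or two):

```python
# (petaflops, megawatts) pairs taken from the table above
top10 = {
    "Super K": (8.16, 9.9),   "Tianhe-1A": (2.57, 4.04),  "Jaguar": (1.75, 6.95),
    "Nebulae": (1.27, 2.58),  "Tsubame 2.0": (1.19, 1.4), "Cielo": (1.11, 3.98),
    "Pleiades": (1.09, 4.10), "Hopper": (1.05, 2.91),     "Tera 100": (1.05, 4.59),
    "Roadrunner": (1.04, 2.35),
}

def megaflops_per_watt(petaflops, megawatts):
    # 1 petaflops = 1e9 megaflops, 1 megawatt = 1e6 watts
    return petaflops * 1e9 / (megawatts * 1e6)

eff = {name: megaflops_per_watt(pf, mw) for name, (pf, mw) in top10.items()}
print(round(eff["Super K"]))                # ~824 megaflops per watt
print(round(sum(eff.values()) / len(eff)))  # ~463, the top-ten average quoted above
```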

CO-DESIGN TO THE RESCUE
Considering the energy performance of current general-purpose supercomputers, some researchers believe we will need to turn to co-design in order to develop an exaflopic machine that consumes less than 50 megawatts. Co-design consists in creating hardware architectures according to the application they will run, and therefore building specialised supercomputers offering the best performance and/or energy efficiency. It is highly probable that the first exaflopic supercomputer will be a specialised machine. The drawback, of course, is that it may be too specialised.

z The Green Flash project aims to design a climate simulator capable of running at 200 petaflops on only 4 megawatts, an energy efficiency of 50,000 megaflops per watt. To achieve this, it will deploy a massively parallel architecture, based on 20 million cores and created specifically for its target application. A prototype demonstrated the relevance of this approach at the end of 2010, but Green Flash still lacks the necessary funding, estimated at $75 million, to build the entire machine. DAVID RANDALL, UNIVERSITÉ DU COLORADO

But the heat given off by these increasingly small surfaces had already reached unacceptable levels in 2004, and manufacturers were forced to limit chip frequencies, which now rarely exceed 4 gigahertz (GHz); inside a supercomputer they generally oscillate between 1 and 3 GHz. Since speed could not be increased, progress in miniaturisation was put to work for parallelisation. Each microprocessor now contains several cores, processing units capable of working alone or in parallel. Super K, for example, uses 68,544 SPARC64 VIIIfx processors with 8 cores each, made with a 45-nanometre (nm) process and running at 2 GHz, for a total of 548,352 cores. And this is just the start. "We predict that in 2018 chips will be etched in 8 nm and processors will contain more than 1,000 cores", forecasts Franck Cappello.

The proliferation of cores can improve energy efficiency by reducing chip voltage, a technique called "voltage scaling". Reducing a processor's voltage reduces its consumption, but performance is also diminished, and more cores are required to compensate: if we reduce the voltage by half, we need to multiply the number of cores by four to maintain the same level of performance. This is a major area for reducing electricity bills. "In 2018 supercomputers will have hundreds of millions of cores, compared to 300,000 today", predicts Franck Cappello.
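The arithmetic behind that rule of thumb can be sketched as follows. This is a back-of-the-envelope model under assumed scalings (per-core performance taken as roughly proportional to V², per-core dynamic power to f·V² with f proportional to V), chosen so as to match the article's four-cores-per-halved-volt rule; it is not a vendor formula:

```python
# Illustrative voltage-scaling arithmetic (assumptions noted above).
def scaled_machine(v_ratio, cores=1.0, power=1.0):
    """Return (cores, total power) after scaling voltage by v_ratio,
    holding aggregate performance constant."""
    speed = v_ratio ** 2            # per-core performance (assumed ~ V^2)
    core_factor = 1.0 / speed       # extra cores compensate for slower ones
    power_per_core = v_ratio ** 3   # dynamic power ~ f * V^2, with f ~ V
    return cores * core_factor, power * core_factor * power_per_core

cores, power = scaled_machine(0.5)
print(cores)   # 4.0 -> four times as many cores, as the article states
print(power)   # 0.5 -> the same performance for roughly half the power
```

Under these assumptions, halving the voltage roughly halves the machine's total power at constant performance, which is why voltage scaling is such an attractive lever.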
Greedier than ever
Another lever for increasing energy efficiency consists in improving memories and the communication links between components. Currently, transferring data between processors and memories consumes more energy than the processing itself! Once again, designers are counting on component manufacturers to innovate.

The first of these innovations consists in implementing optical links instead of copper wires to connect cabinets and circuit boards, as well as components on chips, by replacing certain pathways in printed circuits. By using photons rather than electrons to transfer data, photonic links created by silicon chips emitting and receiving a laser beam have reached very high throughputs in laboratory tests (more than 50 Gb/s) and could reduce energy consumption tenfold. "Implementation of an end-to-end optical communication network will represent a major technological breakthrough in the race to break the exaflops barrier", underlines Patrick Demichel, a systems architect specialised in intensive computing at Hewlett-Packard. Manufacturers are developing >>>

>>> memory chips that allow 3D stacking thanks to vertical communication interfaces. It will therefore be possible to stack them above processors and reduce the distances between components. But it is above all the advent of non-volatile memories, such as phase-change memories * or memristors *, that will considerably reduce electricity consumption. Unlike current DRAM, non-volatile memories do not require continual power. "If they are sufficiently efficient, they could allow us to change the architecture of machines and simplify fault-tolerance systems, which would have fewer backups to perform on a hard drive or SSD * and would require only a few micro-checkpoints", explains Franck Cappello.

If we look at technological changes in components alone, without considering architectures, experts estimate that the consumption, excluding cooling, of an exaflopic supercomputer could reach 150 megawatts in 2018: its processors would consume 50 megawatts, the memory 80 megawatts and the network 20 megawatts. Then we would have to add the consumption of the cooling system, an additional 50 to 70%, for a total in excess of 200 megawatts. This is still much too high.

Graphics processing units
If we want to make further progress, the supercomputer's architecture will be a determining factor, and designers are examining the nature of the processing units inside supercomputers. There are two competing trends. Some favour machines using a single model of general-purpose processor – like Super K – while others combine general-purpose processors (CPUs, or Central Processing Units) and Graphics Processing Units (GPUs *) within "hybrid" machines – like the second most powerful supercomputer in the world, Tianhe-1A, located in Tianjin, China, capable of reaching 2.5 petaflops with 4 megawatts.

Used in this case for performing calculations instead of displaying graphics, GPUs serve as accelerators for certain applications and improve overall energy efficiency. In addition to their specialisation, their current weakness is that they do not communicate rapidly with other GPUs, because they need the CPUs to act as intermediaries. For applications where processors communicate a great deal with each other, this approach is not relevant. Yet this situation could evolve.

PHASE-CHANGE MEMORY: Phase-change memories (or PRAM, for Phase-Change Random Access Memory) record data in vitreous matter that changes state when an electric current is applied. They do not need to be powered continuously.

MEMRISTOR: A memristor is a passive electronic component whose electrical resistance changes under the effect of an electric charge. It is used to design RRAM or ReRAM memories (Resistive Random Access Memory), which do not need to be powered continuously.

COOLING SYSTEMS ARE THE FOCUS OF SPECIAL ATTENTION
Cooling systems, which ensure that supercomputers run smoothly, generally represent 50 to 75% of their energy consumption. Designers are therefore developing all sorts of innovations to make them more efficient. A top priority for manufacturers is "free cooling", i.e. systems where air circulates without using a heat pump. Whenever possible, it is best to select a site in a cool region, but free cooling alone remains out of reach for an exaflopic supercomputer. Manufacturers are therefore trying to find complementary solutions that use as little energy as possible. For example, the cabinet doors that enclose processor clusters can contain chilled-water circuits. Certain manufacturers are producing printed circuit boards that incorporate cooling circuits, and IBM is even developing chips covered with micro-channels filled with coolant. Despite these efforts, some experts believe that the owners of supercomputers will have to sell the heat they recover in order to achieve a balanced economic model. This is an attractive idea, but difficult to put into practice.

z The cabinet doors of the most powerful French supercomputer, Tera 100 (ranked 9th worldwide), built by Bull, contain a heat exchanger, fans and a chilled-water cooling circuit linked to a suspended ceiling 1 km long. Each door dissipates 40 kW. CEA

"In the end, we will probably no longer make the distinction between CPU and GPU, because processors will contain both general-purpose cores and specialised cores that will serve as accelerators", claims Franck Cappello.

"Considering the complexity of future supercomputers, tomorrow's software will play a vital role. In the race to break the exaflops barrier, the main scientific problem will be finding the right programming methods and new mathematical models, predicts Serge Petiton, head of the MAP team (Méthodologie et algorithmique parallèle pour le calcul scientifique) at the LIFL (the computer science laboratory of Lille 1 University). I think we're going to find ourselves up against a wall when we hit 100 petaflops and we'll absolutely need to consider new paradigms."

Already today it is more and more difficult to parallelise calculations, i.e. to divide them into sub-calculations executed by processor cores. Software must also focus on locality, limiting the movement of data between processors. This has led to the advent of new algorithmic disciplines such as "communication-avoiding algorithms", whose aim is to allow processors to work independently whenever possible. Since it is never possible to achieve perfect parallelism continually, software must now be capable of reclaiming unused hardware resources. This role has already been allocated to operating systems and to certain hardware, such as processors that can slow down or disable certain cores.

Applications, too, are being designed to save energy. "The programmer now has three criteria to manage: the number of iterations, the duration of each iteration and the overall energy associated with the execution of these iterations, explains Serge Petiton. Depending on the need, these algorithms will be used to reduce computation times or energy bills." With hundreds of thousands of processors, it is more and more difficult to predict the best computation method for reducing energy consumption. According to Serge Petiton, "This is not a deterministic problem. This is why we are implementing autotuning techniques, which consist in automatically updating computation parameters in real time in order to reduce energy consumption." Optimising operating systems, languages, compilers and applications could represent 10 to 20% in energy savings.

SSD: An SSD (Solid-State Drive) is a data storage device made up of Flash memories.

GPU: A Graphics Processing Unit is a processor dedicated to calculating displays. It can perform several tasks at the same time.
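To illustrate the autotuning idea in miniature (everything here is hypothetical: `run_iteration`, the candidate block sizes and the cost model stand in for a real kernel and real energy counters), the runtime probes each parameter setting online, keeps the cheapest, and re-probes as conditions drift:

```python
import random

def run_iteration(block_size):
    """Hypothetical stand-in for one solver iteration: returns a measured
    energy cost. A real autotuner would run the actual kernel and read
    hardware power counters instead of this synthetic trade-off."""
    return 1.0 / block_size + 0.001 * block_size + random.gauss(0, 1e-4)

candidates = [8, 16, 32, 64, 128]
best, best_cost = None, float("inf")

for step in range(1000):
    if step < len(candidates):        # first, probe every setting once
        block = candidates[step]
    elif step % 100 == 0:             # periodically re-explore: conditions change
        block = random.choice(candidates)
    else:                             # otherwise exploit the current best
        block = best
    cost = run_iteration(block)
    if cost < best_cost:
        best, best_cost = block, cost

print(best)  # the block size that currently minimises energy per iteration
```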

Implications for the general public
The gains obtained could be significant, but the complexity of computer architectures will require reinforced fault tolerance (error-correction systems, register backups...). "We estimate this will consume more energy; in the end, fault-tolerance mechanisms should drain a third of the energy burned by a supercomputer, compared to 20% today", predicts Franck Cappello. Researchers' ingenuity will be vital in solving all these problems. Serge Petiton estimates that "in order to design an exaflopic supercomputer, we will need cross-disciplinary teams with hundreds of people trained in energy issues; it will be a real challenge in terms of recruitment and training." The results, however, will be worth it. Exaflopic machines will make it possible to tackle a multitude of problems affecting society at large, such as climate forecasting or drug screening. And in the same way that innovations in Formula 1 racing have changed cars for the "common man", progress in supercomputers should significantly reduce the energy consumption of PCs and consumer electronics. So bring on the challenges! z CONSTANTIN LENSEELE

The CEA Scientific Computing Complex

>>> TGCC is a new "green infrastructure" for high-performance computing, able to host petascale supercomputers. This supercomputing centre has been planned to welcome the first French petascale machine for the PRACE project and the next generation of the Computing Centre for Research and Technology (CCRT).

>> to fulfill the needs of
- High-end simulations
- Large scientific and industrial projects
- Great European research programs

>> to mutualize
- Expertise in HPC technologies
- R&D efforts
- Infrastructures

>> to manage
- Access to key technologies
- The complexity of high-performance computing equipment

Today numerical simulation is used in many fields for research and development: mechanics, fluid mechanics, materials science, astrophysics, nuclear physics, aeronautics, climatology, meteorology, theoretical physics, quantum mechanics, biology, chemistry, technological research, etc.

Some applications of simulation:
• CLIMATE AND ENVIRONMENT: to refine climate-warming estimates of future change.
• DEFENSE (WITH THE TERA COMPUTING CENTER): design, development, guarantee and maintenance of the French nuclear stockpile.

• NUCLEAR ENERGY: 2nd and 3rd generation reactors, 4th generation of reactors, research reactors, naval propulsion, fuel cycle, waste storage.

• FUSION PLASMA PERFORMANCE: high-performance simulations of plasma turbulence, key physics issues for ITER.

www.cea.fr

A COMPUTER WITH 1 BILLION PROCESSING UNITS, EACH ONE FAILING ONLY ONCE EVERY THOUSAND YEARS, WOULD STILL PRODUCE ONE ERROR PER MINUTE! IN THE RUN-UP TO EXASCALE, THE PROBLEM OF FAILURE TOLERANCE NEEDS TO BE RESOLVED, WITH PROTOCOLS FOR BACKING UP EXECUTION STATES AND FOR FAILURE AVOIDANCE.

CORRECTING ERRORS IS A TOP PRIORITY

z The Jaguar supercomputer (USA) experiences one failure a day on average. This is not surprising considering the number of cores it contains (nearly 250,000) and its computing power (2.33 petaflops). AP/SIPA

BY FRANCK CAPPELLO, co-director of the INRIA/Urbana-Champaign Joint Laboratory for Petascale Computing in the United States.

THE CHALLENGE FOR NUMERICAL LIBRARIES
BY LUC GIRAUD, member of the HIEPACS team, dedicated to high-end parallel algorithms for challenging numerical simulations.
Numerical libraries are software building blocks that help solve recurring mathematical problems. These generic problems are part of large simulation codes developed to understand complex phenomena that cannot be studied through experimentation, but that we will be able to understand thanks to the next generation of exaflopic computers. INRIA teams are working with their partners on solutions aimed at facilitating the use of these new machines by researchers who are not experts in parallel computing. In this context, many challenges need to be met in order to significantly affect the use of supercomputers. New "flexible hierarchy" algorithms are capable of simultaneously using large numbers of cores on heterogeneous computers. These libraries must be particularly failure-tolerant. They should also limit the energy used for computing while automatically adapting to variable conditions of use, in terms of data volume and the number of cores available to process it.

At the end of the decade we should see the first "exascale" computers, that is to say, machines with a computing power of one exaflops. But in order to run scientific applications and take full advantage of hundreds of millions of cores (massively parallel systems), we must accept that implementing such calculations also involves tolerating a constant flow of failures. The Jaguar supercomputer at the Oak Ridge National Laboratory in the United States experiences an average of one failure a day, and it has "only" 250,000 cores. Imagine this: a computer with 1 billion processing units, each one failing only once every >>>

>>> thousand years – an optimistic hypothesis based on reliable hardware – would still produce one error per minute! We can see why it would be impossible to run any application without experiencing numerous failures.
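The arithmetic behind that claim is quick to check. A minimal sketch, assuming independent failures (the thousand-year component MTBF is the hypothesis stated above):

```python
# Aggregate failure rate of a machine built from many unreliable parts.
units = 1_000_000_000                 # a billion processing units
mtbf_years = 1_000                    # each fails once every thousand years
minutes_per_year = 365.25 * 24 * 60

failures_per_minute = units / (mtbf_years * minutes_per_year)
print(round(failures_per_minute, 1))  # ~1.9 -> roughly one error per minute
```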

What sort of failures can a supercomputer encounter? They range from power outages and "crashes" due to a system error to intermittent failures on integrated circuits. The latter are by far the most frequent kind: they involve changes in the state of memory cells – a bit flipping from 0 to 1 or back – caused by electromagnetic or cosmic radiation!

z Tianhe-1 was the world's most powerful supercomputer when it was first presented at the end of 2010. Designed in China, it beat out the Jaguar (USA) by adopting a hybrid architecture associating general-purpose processors (CPU) and graphics accelerators (GPU). TA WEI / IMAGINECHINA

Exascale applications
Supercomputer hardware is designed to detect and correct intermittent failures, but at the cost of greater energy consumption, because this multiplies the number of circuits. Otherwise, application data can be corrupted and programmes can enter unexpected states, producing erroneous results without the researcher realising it.

INRIA teams and their partners are developing algorithmic techniques and software tools to solve the problem of failure tolerance in exascale machines, working on protocols for backing up execution states and on failure avoidance. In the first case, if a failure occurs, execution is restarted from the most recent backed-up state. The second approach consists in predicting failures and shifting calculations in progress to reliable resources. Since predicting failures remains difficult, backing up execution states remains the key approach. The challenge lies in designing very fast backup algorithms that require fewer resources, even while the application uses 1 million processor cores.

Teams are also studying stochastic execution models in order to predict, and therefore optimise, the performance of a large-scale parallel scientific application. Finally, researchers are working on new numerical methods and robust algorithms for simulations, which can compute the desired solutions even if many failures occur. This last approach is a very promising one over the long term, but it will be difficult to adapt all exascale applications by the end of the decade. z
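A classic first-order result gives the flavour of the backup trade-off: the optimal interval between two checkpoints grows with the checkpoint cost and the machine's mean time between failures. A minimal sketch using Young's approximation (the numbers are illustrative, not from the article):

```python
import math

def optimal_checkpoint_interval(checkpoint_seconds, mtbf_seconds):
    """Young's first-order approximation: t_opt = sqrt(2 * C * MTBF)."""
    return math.sqrt(2 * checkpoint_seconds * mtbf_seconds)

# e.g. a 5-minute global checkpoint on a machine failing once a day:
t = optimal_checkpoint_interval(300, 24 * 3600)
print(t / 3600)  # ~2.0 -> checkpoint roughly every two hours
```

The formula makes the stakes plain: as failures become more frequent, checkpoints must be taken more often, which is why fast, low-resource backup algorithms matter so much.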
VIRTUALISATION OF HYBRID ARCHITECTURES
BY RAYMOND NAMYST, leader of the RUNTIME team, dedicated to high-performance runtime systems for parallel architectures.

The recent advent of "hybrid" computers, associating general-purpose processors and accelerators, has profoundly changed development techniques for simulation applications. While programmers of scientific applications and developers of environments and compilers already had a lot on their plate with the arrival of multi-core processors for supercomputers, a new revolution is taking place in the world of computing: the use of accelerators such as GPUs (graphics processing units) to shore up traditional multi-core processors. Originally, GPUs were adopted for their capacity to speed up specific parts of applications, whose computation was "delegated" to them. Gradually, the use of GPUs became more widespread, and today their power is often greater than that of general-purpose processors. However, they require radically different programming techniques from those traditionally used.

One of the greatest challenges facing the IT community is succeeding in using all these computing units simultaneously. To achieve this, we need to continuously feed a heterogeneous set of processing units. One approach recently explored at INRIA consists in breaking applications down into tasks, without deciding in advance which processing units will run them; the idea is to preserve as much flexibility as possible. The challenge then lies in adjusting the distribution of tasks among processing units as carefully as possible: typically, tasks will be assigned to the units that can run them most efficiently (see the sketch after this box). However, many parameters are involved, which makes the problem difficult to solve: the degree of usable parallelism, the amount of data transferred, energy consumption, etc.

Furthermore, while GPUs are generally more powerful than traditional cores, the gains in speed obtained depend a great deal on the task and on the volume of data processed. While certain calculations can be performed fifty times faster on a GPU than on a traditional core, the gains are much more modest for others, and sometimes negative! Surprisingly, this apparent drawback is in fact an asset: a hybrid machine on which tasks are allocated efficiently achieves much better results than a homogeneous one, even though the latter is much easier to program. The idea is that a factory with a variety of specialised workers is more efficient than one where all the workers have the same skill set.
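The task-distribution idea described in the box (in the spirit of INRIA's runtime work, though this sketch is ours, not the team's code) can be captured by a greedy earliest-finish-time rule: each task goes to whichever unit, CPU or GPU, will complete it soonest given its per-unit speed and the unit's current backlog. The costs below are invented for illustration:

```python
# Greedy heterogeneous scheduling sketch: assign each task to the
# processing unit (CPU core or GPU) that will finish it earliest.
# A real runtime measures these costs at execution time.
tasks = [
    {"name": "dense_kernel", "cost": {"cpu": 50.0, "gpu": 1.0}},  # 50x faster on GPU
    {"name": "irregular_op", "cost": {"cpu": 2.0,  "gpu": 3.0}},  # GPU gain negative
    {"name": "medium_op",    "cost": {"cpu": 10.0, "gpu": 4.0}},
]
ready_at = {"cpu": 0.0, "gpu": 0.0}   # current backlog of each unit

for task in tasks:
    # pick the unit with the earliest completion time for this task
    unit = min(ready_at, key=lambda u: ready_at[u] + task["cost"][u])
    ready_at[unit] += task["cost"][unit]
    print(task["name"], "->", unit)

print(ready_at)  # the makespan is the larger of the two backlogs
```

Note how the "irregular" task lands on a CPU core even though a GPU is available: heterogeneity pays off precisely because the scheduler can route each task to the worker best suited to it.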

MAISON DE LA SIMULATION
Mastering High Performance Computing

• A multidisciplinary research lab focused on numerical simulation, and a centre of expertise in high-performance computing dedicated to supporting the scientific communities using supercomputers
• A collaborative structure conducive to significant breakthroughs
• A large doctoral and post-doctoral program
• Training in high-performance computing and its applications

www.maisondelasimulation.fr

BECOME A MASTER SCIENTIST IN MODELLING AND SIMULATION
UVSQ – The future of computing

Proposed by major research institutions, universities and engineering schools (École Polytechnique, ENS Cachan, UVSQ, ENSTA, INSTN, CEA, Onera...), the Modelling and Simulation Master course (M2S) instructs first-class scientists, researchers and engineers in the mathematical modelling of complex phenomena and in simulation applied to the physical sciences. Bring your engineering or research expertise into high-potential industrial and scientific organizations.

The Master in Informatics: High-Performance Computing, Simulation (MIHPS) – École Centrale Paris (ECP), ENS Cachan, UVSQ – trains executives specialized in parallelism, multicore processors, supercomputers and numerical simulation, along with prestigious partners. Develop new tools and dive into the heart of business competitiveness and the industry of the future.

www.mihps.fr

www.maisondelasimulation.fr/M2S UVSQ.fr