
Contents

News
Interview with Prof. Dr. Dr. Thomas Lippert
Obituary: Dr. Walter Nadler
Minister of Science of Baden-Württemberg visits HLRS
Cray CEO visits HLRS
Member of Executive Board of Porsche visits HLRS
Jülich's Bernd Mohr to Chair SC17

Applications
HLRS Supercomputer successfully executed Extreme-scale Simulation Projects
Discontinuous Galerkin for High Performance Computational Fluid Dynamics
Large-scale Simulations of Particle-laden Flows
Quarks and Hadrons – and the Spectrum in Between
Advance Visualisation of Seismic Wave Propagation and Speed Model
A new Neutrino-Emission Asymmetry in forming Neutron Stars
Precision Physics from Simulations of Lattice Quantum Chromodynamics
Turbulence in the Planetary Boundary Layer
The strong Interaction at Neutron-rich Extremes
Lattice QCD as Tool for Discoveries
LIKWID Performance Tools

Projects
ECO2Clouds – Experimental Awareness of CO2 in Federated Cloud Sourcing
MIKELANGELO – Micro Kernel for High Performance Cloud and HPC Systems
FFMK: A Fast and Fault-tolerant Microkernel-based for Exascale Computing
EUDAT2020
ORPHEUS – Fire Safety in the Underground
Seamless Continuation: 4th PRACE Implementation Phase Project
Intel® Parallel Computing Center (IPCC) at LRZ

Systems

Centres

Activities

Courses

If you would like to receive inSiDE regularly, send an email with your postal address to [email protected]

© HLRS 2015

inSiDE is published two times a year by the Gauss Centre for Supercomputing (HLRS, LRZ, JSC).

Publishers Prof. Dr. A. Bode | Prof. Dr. Dr. Th. Lippert | Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Hon. Prof. M. M. Resch

Editor F. Rainer Klank, HLRS [email protected]

Design Pia Heusel, HLRS [email protected]

Fine dust particle distribution in the human lung during inspiration. The particles, tracked by a Lagrangian model, are transported down the trachea and distribute in the left and right main bronchi before they deposit in higher lung generations where they can cause severe damage to the respiratory system potentially leading to lung cancer.

Gonzalo Brito Gadeschi 1, Lennart Schneiders 1, Christoph Siewert 1,2, Andreas Lintermann 1,3, Matthias Meinke 1, Wolfgang Schröder 1,3
1 Institute of Aerodynamics, RWTH Aachen University, Germany
2 Laboratoire Lagrange, OCA, CNRS, Nice, France
3 SimLab "Highly Scalable Fluids & Solids Engineering", JARA-HPC, RWTH Aachen University, Germany

Editorial

Welcome to this new issue of inside, the journal on Innovative Supercomputing in Germany published by the Gauss Centre for Supercomputing (GCS). In this issue we introduce the new Chairman of the Board of GCS, Prof. Thomas Lippert. Prof. Lippert took over from Prof. Michael Resch in April and talks about his ideas and views on the future of High Performance Computing in an interview with inside. Unfortunately we also have to report sad news. Walter Nadler of the Jülich Supercomputing Centre passed away recently, having served the Gauss Centre for Supercomputing in a variety of functions over the last years. An obituary reflects on a life dedicated to High Performance Computing.

With a selection of applications we present the use of GCS supercomputers over the last months. The variety of applications shows that GCS systems are able to cover a wide spectrum of research topics and basically make supercomputing available for all scientific fields that rely on compute performance to break new barriers and achieve new results.

The section "Projects", on the other hand, shows how tightly GCS and its centres are enmeshed in the European and German scientific community when it comes to research in the field of High Performance Computing. We find a variety of project reports that range from basic research to applied research and industrial development. Most outstanding is the research conducted to guarantee a leading role for German and European research in the field of Exascale Computing.

High Performance Computing continues to be an important issue in the German research community. The significance of HPC for German science is emphasized by a new paper released by the Deutsche Wissenschaftsrat (German Science Council) on April 24 this year (Drs. 4488-15 at www.wissenschaftsrat.de/download/archiv/4488-15.pdf) addressing the question of the funding of HPC in Germany.

As usual, this issue includes information about events in supercomputing in Germany over the last months and gives an outlook of workshops in the field. Readers are invited to participate in these workshops, which are now part of the PRACE Advanced Training Centres (PATCs) and hence are open to all European researchers.

• Prof. Dr. A. Bode (LRZ)
• Prof. Dr. Dr. Th. Lippert (JSC)
• Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Hon. Prof. M. M. Resch (HLRS)

News

Interview with Prof. Dr. Dr. Thomas Lippert

Prof. Lippert, you were elected chairman of the Board of the Gauss Centre for Supercomputing in April. How do you see the role of the Chairman of GCS?

Since its foundation in 2007, the Gauss Centre for Supercomputing (GCS) has become the most powerful national HPC infrastructure in Europe. My predecessors as chairmen of the GCS, Prof. Achim Bachem, Prof. Heinz-Gerd Hegering, and Prof. Michael Resch all have significant merits as far as the success of this association is concerned. They have given GCS maximal stability and a reputation that is outstanding worldwide. Furthermore, the centres of the GCS have been working together ever closer within the last eight years; I am very proud to follow the role model of my colleagues as the Chair by building on their virtues and these past achievements.

Thanks to the formidable engagement of the German ministry for education and research (BMBF) on the one hand and the ministries in Baden-Württemberg, Bayern, and North Rhine-Westphalia on the other hand, between 2010 and 2016 the GCS was enabled (and still is) to provide Germany's HPC infrastructure of the highest national performance class, called Tier-1. Evidently, GCS has achieved one of its major objectives. In addition, GCS has almost completed its delivery of supercomputing capacity worth Euro 100 Mio. to the European association Partnership for Advanced Computing in Europe (PRACE) on behalf of the BMBF, so far the most significant contribution to Europe's Tier-0 HPC infrastructure.


On top of it, GCS not only is commissioned to provide the most powerful HPC infrastructure in Germany and Europe, its mission also comprises to serve a broad range of research and industrial activities in a variety of disciplines.

At this stage, I see it as a most important responsibility of the chairman to ensure that the GCS success story will continue and even can become a greater success in the future. This goal, on the one hand, requires leading the GCS safely into its next phase, starting in 2016, by deepening the deep trust GCS enjoys from its ministries. On the other hand, GCS has always been, as truly European partner, crucial for the progress of PRACE; the chairman must contribute his best to ensure that GCS stays being the stabilizing element in PRACE.

You mentioned the long-term collaboration between the GCS partners, which is now in its ninth year. What is the key success story of GCS from your point of view?

What was considered a tremendous challenge eight years ago – bringing together the three national centres to form a single organization – has turned out to become a most conducive research infrastructure of supercomputing for Germany and Europe that substantially has enhanced Germany's position and visibility in the field of computational science and engineering worldwide. It is quite interesting to observe the reason for such a successful long-term stability.

My view as to a GCS key success story more strongly reflects my computational scientist background rather than my computer science history – I am a practicing computational elementary particle physicist – and I think the crucial step was a scientific one, namely the establishment of a rigorous GCS peer review process, joining the virtues of established procedures at the three centres. In fact, it is remarkable that GCS achieved to promote its peer-review procedure to become the template for the European PRACE peer review. On the organizational level, GCS has established a Joint Steering Executive Committee (Lenkungsausschuss) led by experts from science and engineering of highest reputation and supported by a team from the three GCS centres. This executive committee is responsible for the peer review process and the practical coordination between the centres.


Our GCS evaluation practice promotes a continuously increasing level of quality of the proposals submitted, and, most importantly, it attracts those scientific and engineering groups to GCS facilities with highest scientific claims.

Which changes should we expect from you?

We have gained a lot of experience in the last years on data intensive supercomputing, being increasingly requested in a variety of fields like for instance terrestrial systems, neuroscience, climatology and engineering. Given the growing importance of Scientific Big Data Analytics (SBDA) for science and engineering, I assume that SBDA in science and industry will become a major component of high-end HPC activities worldwide, and both fields, data intensive simulations and SBDA, will grow. I am convinced that the GCS has the ability to become the trailblazer for peer reviewed SBDA in Germany and Europe and I will strongly try encouraging the GCS to engage in this field.

What are future challenges?

The HPC landscape is known to transform very rapidly. Every 3 to 5 years a new generation of HPC technology appears. The past and future challenge for GCS is to take our users with us, together with their highly evolved application codes, as we move towards exascale compute performances and data storages becoming established around 2025. We have to substantially expand our training efforts and our support activities in order to help our communities keeping up with the exponentially growing demands on parallelism, scalability, hierarchical memory layers and interactivity.

Many people say that the US and Japan benefit substantially from having national HPC industries. Would such a European player enhance European science?

It was obvious in the early times of supercomputing that US scientists and engineers had some competitive advantage as early adopters of their homemade technologies. I think that since then the rest of the world, in all scientific and engineering fields, has been able to strongly improve its position due to excellence in applications, algorithms and development of tools. As example, materials sciences groups in Europe are probably in the lead compared to US due to their high expertise in molecular dynamics and ab-initio simulation codes. Of course, it is natural and important for the HPC centres to engage in co-design projects with German and European providers of HPC technology; a most welcome effect is that this engagement will foster the field not only in Europe but also as a whole, not least because high-end HPC technology producers and vendors more and more become globalized companies. On all accounts, GCS must be open to the best technology available worldwide, procured on the basis of accepted rules, to ensure its continued ability to offer world-leading supercomputing systems and software to its users.

6 Spring 2015 • Vol. 13 No. 1 • inSiDE News world-leading supercomputing sys- Obituary: tems and software to its users. Dr. Walter Nadler Where do you want to be in two years from now when your term as a chairman of GCS will end?

I hope that GCS II will have been launched by the same winning team as in the first round of GCS, that we can manage to realize the vision of an interacting HPC pyramid with the Gauss Alliance, and that GCS will have succeeded to rejuvenate its leadership in the next round of PRACE.

Prof. Lippert, thank you for the interview. The interview was conducted by the inside team.

Obituary: Dr. Walter Nadler

On June 9, 2015, Dr. Walter Nadler passed away at the age of 61. Even to those close to him this came as an unexpected shock.

Dr. Walter Nadler was not only an established expert in simulating complex biological systems, he also excelled as a science manager, efficiently and effectively coordinating peer review procedures which had been established to grant Tier-1 and Tier-2 supercomputer resources to different user alliances. He was the driving force in implementing the GCS governance which was agreed upon by the GCS member centres in 2012/13, where he acted as indispensable advisor to the chairmen of numerous compute time granting commissions.

Walter received his PhD in Physics in 1985 at Technische Universität München, supervised by Prof. Klaus Schulten. In the following years he worked at different institutions, including the California Institute of Technology, Wuppertal University and Michigan Technological University. In October 1996 he joined Forschungszentrum Jülich for the first time and became for two years a member of the HLRZ (later NIC) research group Complex Systems, led by Prof. Peter Grassberger. He collaborated scientifically with many groups, among them the Complexity Science Group at Calgary University and the group Cardiac MRT and Biophysics at Würzburg University. Results of the latter collaboration were honoured 2003 with the Helmholtz prize. In 2007 he joined Forschungszentrum Jülich for a second time and became member of the NIC research group "Computational Biology and Biophysics", headed by Prof. Ulrich Hansmann. Mid 2008 he was appointed head of the newly established NIC coordination office. Since then he focussed on introducing innovative enhancements to the NIC peer review and FZJ-internal allocation procedures. In 2012 an additional regional peer-review process for all researchers from Forschungszentrum Jülich and RWTH Aachen University was approved (JARA-HPC partition), an extraordinarily complex undertaking which succeeded largely thanks to the skill and leadership demonstrated by Walter. One year later, the peer-review process for the newly established GCS Large-Scale Projects – defined by the GCS Governance – had to be integrated and implemented, again done by Walter. In the last few months he provided most valuable input to the redesign of JSC's application and peer-review server software.

A role that Walter will also be fondly remembered for both inside and outside JSC is as an enthusiastic, unflappable and fearsomely efficient booth manager: in this capacity he organized and coordinated the exhibition of the Jülich Supercomputing Centre at the European International Supercomputing Conference (ISC) and the US Supercomputing Conference (SC) every year since 2008. Under his supervision these exhibitions became a cornerstone with respect to the international visibility of the Jülich Supercomputing Centre.

The Jülich Supercomputing Centre and its partner institutions in the Gauss Centre for Supercomputing, the John von Neumann Institute for Computing, the JARA-HPC Vergabegremium and the Vergabekommission für Supercomputer-Ressourcen at Forschungszentrum Jülich will sorely miss him.

• Thomas Lippert, Jülich Supercomputing Centre, Germany
• Dietrich Wolf, Universität Duisburg-Essen, Germany

Minister of Science of Baden-Württemberg visits HLRS

On April 13th 2015 Theresia Bauer, Minister of Science of the State of Baden-Württemberg, visited the HLRS. Having declared High Performance Computing one of the key technologies for the state of Baden-Württemberg, Minister Bauer came to see the center and the new supercomputer "Hornet" that is operational since January 2015. In a guided tour through the facilities Prof. Michael Resch – Director of HLRS – explained the potential and power of the Cray XC40 and pointed to the challenges for both HLRS and its users that come with the ever growing level of parallelism. Minister Bauer was deeply impressed both by the Cray system and the technical facilities at HLRS.

After the technical tour Minister Bauer was shown to the Virtual Reality environment of HLRS. Dr. Wössner – Head of Visualization at HLRS – gave a presentation of how visualization can be used to make simulation results not only visible but also understandable for both the users of HLRS and the general public. Showing the three-dimensional view of a planned water power plant, Dr. Wössner gave an example for how the results of visualization had been made available to citizens affected by the construction and operation of the power plant. Minister Bauer got a very good impression of how simulation and visualization can be used in a dialogue between politics and citizens to discuss better solutions especially for large scale projects.

In a final discussion Minister Bauer and Prof. Resch touched on a number of further important issues related to simulation. Both agreed that the role of simulation in the political decision making process will be more important in the future. They also agreed that training and education in simulation technology and high performance computing are vital issues especially for a high-tech region like the state of Baden-Württemberg.

• inside team
University of Stuttgart, HLRS, Germany

Cray CEO visits HLRS

On January 20th 2015 Peter J. Ungaro, President and CEO of Cray, visited the High Performance Computing Center Stuttgart (HLRS) to discuss future strategies and collaborations in HPC. In a face-to-face meeting with the Director of HLRS, Prof. Michael Resch, strategic issues in the development of the worldwide HPC community as well as the collaboration between Cray and HLRS were discussed. After a tour through the facilities of HLRS and a visit to the new flagship system "Hornet" – a Cray XC40 with a peak performance of 3.8 PFlop/s – Peter Ungaro sat down for an intensive discussion with the management and technical leadership of HLRS.

During the discussion Mr. Ungaro presented the future product strategy of Cray, discussing technical options for an engineering focused center like HLRS. Both sides agreed that sustained performance is going to be the key to further success in HPC and that solutions rather than pure flop counting are the driving force for the further collaboration between HLRS and Cray. HLRS and Cray also agreed that data issues are growing in importance and have become a key driving factor for supercomputers, requiring solutions that go way beyond traditional data storage and management. Data analytics and data discovery are considered one important aspect. However, beyond the classical issues of Big Data a combination of data and simulation will become ever more important in the future. It will therefore be vital for an HPC center to be able to provide solutions that integrate both worlds in a single solution for its users.

• inside team
University of Stuttgart, HLRS, Germany

Member of Executive Board of Porsche visits HLRS

On May 13th 2015 Wolfgang Hatz, Member of the Executive Board – Research and Development – of Porsche, visited the HLRS. Together with his team in charge of virtual car development Wolfgang Hatz came to get a view of the new Cray system and to discuss with HLRS the long term relation of Porsche and HLRS in the field of high performance technical computing. Prof. Michael Resch – Director of HLRS – gave an overview of the center and the history of the collaboration with Porsche. In his talk he highlighted the role of Porsche as a partner for HLRS but also the potential of HPC and especially of the new Cray XC40 system. Wolfgang Hatz was impressed by the variety of issues and the potential of the new system for technical simulation.

After a tour through the computer room and the facilities Wolfgang Hatz was shown to the Virtual Reality environment of HLRS. Dr. Wössner – Head of Visualization at HLRS – gave a presentation of how visualization can be used to make simulation results not only visible but also understandable for both the users of HLRS and the general public. Wolfgang Hatz was invited to take a ride in the Porsche driving simulator at HLRS and was impressed by the potential of such a simulator.

During the presentation Porsche and HLRS discussed the potential of Augmented Reality in the design and development process of cars. Both sides agreed that there is a lot of potential for the extension of the collaboration not only in terms of computing but also in the integration of visualization and simulation.

• inside team
University of Stuttgart, HLRS, Germany

Jülich's Bernd Mohr to Chair SC17

Jülich Supercomputing Centre is extremely happy to announce that the SC Steering Committee elected Jülich scientist Dr.-Ing. Bernd Mohr as General Chair of the SC17 Conference in Denver, Colorado. It is the first time ever that the most important conference for High Performance Computing, networking, storage and analysis – the Supercomputing Conference (SC) in the USA – will be chaired by a non-American. Shortly after the announcement, the portal HPCwire listed Mohr as one of the people to watch in 2015. See: www.hpcwire.com/peoplewatch-2015/

Bernd Mohr started to design and develop tools for performance analysis of parallel programs already with his diploma thesis (1987) at the University of Erlangen in Germany, and continued this in his Ph.D. work (1987 to 1992). During his PostDoc position at the University of Oregon, he designed and implemented the original TAU performance analysis framework. Since 1996 he has been a senior scientist at Forschungszentrum Jülich, Germany's largest multidisciplinary research center.

He now acts as team leader for the group "Programming Environments and Performance Optimization" and serves as deputy head for the JSC division "Application support". Besides being responsible for user support and training in regard to performance tools at the Jülich Supercomputing Centre (JSC), he is leading the Scalasca performance tools effort in collaboration with Prof. Dr. Felix Wolf, now at TU Darmstadt.

His first visits to SC were 1993 in Portland and 1994 in Washington, D.C., during his PostDoc days at the University of Oregon in Eugene. Later, in 1999, he was among a small team which was responsible for setting up and staffing the research exhibits booth of the Jülich Supercomputing Centre for a few years. Dr.-Ing. Mohr also gave 11 SC tutorials between 1999 and 2009. Finally, he got involved in helping organize the conference as a research paper reviewer for SC03 and worked his way up serving various roles in the technical program committee. In 2011, he became the first European to be elected to the SC Steering Committee.

The work for SC17 is already well under way. One of the most important tasks is to build his core organization team, around which some 500 volunteers will gather in the coming year. He also started to think about fine-tuning the topics for 2017: Besides an even stronger international focus for the conference, potential directions will be linking HPC more closely to topics such as big data analytics and pre- and post-processing including visualization.

I am personally very proud of working with Bernd at Jülich Supercomputing Centre. He was the person promoting the international reputation of JSC's computer science research most profoundly. His personality and enthusiasm have encouraged many international researchers to cooperate with JSC and have helped to create numerous personal relationships between JSC and international experts.

Established in 1988, the annual SC conference attracts scientists and users from all over the world who come together to discuss current developments. More than 10,000 people now participate in SC each year. The conference is sponsored by the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE).

• Thomas Lippert
Jülich Supercomputing Centre (JSC), Germany

Applications

HLRS Supercomputer successfully executed Extreme-scale Simulation Projects

Figure 1: Fine scale cloud structures of typhoon Soulik depicted by the outgoing long wave radiation. Blue areas indicate dense high clouds whereas white areas show cloud-free regions. Because of the high resolution, the eyewall of the typhoon with the strongest precipitation is clearly seen by the dark blue area southwest of Okinawa. © Institute of Physics and Meteorology, Universität Hohenheim

The newly installed HPC system Hornet of HLRS successfully finished extensive simulation projects that by far exceeded the calibre of previously performed simulation runs at HLRS: six so-called XXL-Projects from computationally demanding scientific fields such as planetary research, climatology, environmental chemistry, aerospace, and scientific engineering were run on the HLRS supercomputer in early January. With each application scaling up to all of Hornet's available 94,646 compute cores, the machine was put through a demanding endurance test. The achieved results more than satisfied HLRS HPC experts as well as the scientific users: Hornet lived up to the challenge and passed the simulation "burn-in runs" with flying colors.


The six XXL-Projects implemented on the HLRS supercomputer are:

Convection permitting Channel Simulation
Institute of Physics and Meteorology, Universität Hohenheim, Wulfmeyer, V., Warrach-Sagi, K., Schwitalla, T.

• 84,000 compute cores used
• 84 machine hours
• 330 TB of data + 120 TB for post-processing

Current high-resolution weather and climate models are operated over a domain which is centred over the region of interest. This configuration suffers from a deterioration of large-scale features like low pressure systems when propagating in the inner domain. This feature is strongly limiting the quality of the simulation of extreme weather events and climate statistics. A solution can be a latitude belt simulation around the Earth at a resolution of a few km. By the XXL project, a corresponding simulation became possible for a time period long enough to cover various extreme events on the Northern hemisphere and to study the model performance. The results confirm an extraordinary quality with respect to the simulation of extreme weather events such as the typhoon Soulik in the Pacific from July 10-12, 2013.

The storage capabilities of the new Hornet system allowed the scientists to run the simulation without any interruptions for more than three days. Using the combination of MPI+OpenMP including PNetCDF libraries, the performance turned out to be excellent. Another finding was that not the computing time but the I/O performance became the limiting factor for the duration of the model run.

Figure 2: λ2-visualization of the turbulent structures along the flat plate. © IAG, Universität Stuttgart


Direct Numerical Simulation of a Spatially-developing Turbulent Boundary Layer Along a Flat Plate
Institute of Aerodynamics and Gas Dynamics (IAG), Universität Stuttgart, Atak, M., Munz, C. D.

• 93,840 compute cores used
• 70 machine hours
• 30 TB of data

The intake flow of hypersonic air-breathing propulsion systems is characterized by laminar and turbulent boundary layers and their interaction with impinging shock waves. The objective of this work was to conduct a direct numerical simulation of the complete transition of a boundary layer flow to fully-developed turbulence along a flat plate up to high Reynolds numbers. The scientists applied a high-order discontinuous Galerkin spectral element method which inherently involves excellent scaling attributes, a necessity to satisfy the computational demand in a sustainable and efficient way. The outcome of this work allowed the researchers to establish a database which can be used for further complex investigations such as shock wave / boundary layer interactions.

Figure 3: Vorticity contours with mapped-on Mach number. © Institute of Aerodynamics, RWTH Aachen University

Prediction of the turbulent Flow Field around a ducted axial Fan
Institute of Aerodynamics, RWTH Aachen University, Pogorelov, A., Meinke, M., Schröder, W.

• 92,000 compute cores used
• 110 machine hours
• 80 TB of data

The turbulent low Mach number flow through a ducted axial fan is investigated by large-eddy simulations using an unstructured hierarchical Cartesian method.

It is the purpose of this computation to understand the development of vortical flow structures and the turbulence intensity in the tip-gap region. To achieve this objective a resolution in the range of 1 billion cells is necessary. This defines a computational problem that can only be tackled on a Tier-0 machine.

Large-Eddy Simulation of a Helicopter Engine Jet
Institute of Aerodynamics, RWTH Aachen University, Cetin, M. O., Meinke, M., Schröder, W.

• 94,646 compute cores used
• 300 machine hours
• 120 TB of data

The impact of internal perturbations due to geometric variations, which are generally neglected, on the flow field and the acoustic field of a helicopter engine jet was analyzed by highly resolved large-eddy simulations based on hierarchically refined Cartesian meshes of up to 1 billion cells. The intricacy of the flow structure requires such a detailed resolution, which could only be realized on an architecture like that featured by Hornet.

Figure 4: Flow structure downstream of the centre body © Institute of Aerodynamics, RWTH Aachen University

Ion Transport by Convection and Diffusion
Institute of Simulation Techniques and Scientific Computing, Universität Siegen, Masilamani, K., Klimach, H., Roller, S.

• 94,080 compute cores used
• 5 machine hours
• 1.1 TB of data

The goal of this computation was a more detailed simulation of the boundary layer effects in electrodialysis processes, used for seawater desalination. This simulation involves the simultaneous consideration of multiple effects like flow through a complex geometry, mass transport due to diffusion and electrodynamic forces. The behavior in the boundary layer has a big influence on the overall process, but is not well understood. Only large computing resources offered by Petascale systems such as Hornet enable the consideration of all involved physical effects, enabling a more realistic simulation than ever before.

Large Scale Numerical Simulation of Planetary Interiors
German Aerospace Center / Technische Universität Berlin, Hüttig, C., Plesa, A. C., Tosi, N., Breuer, D.

• 54,000 compute cores used
• 3 machine hours
• 2 TB of data

Planets and planet-like objects have a very hot interior that causes heat-driven motion. To study the effect of this kind of motion on the evolution of a planet, large-scale computing facilities are necessary to understand the evolving patterns under realistic parameters and compare them with observations.


Figure 5: Species transport in the spacer filled flow channel © Institute of Simulation Techniques and Scientific Computing, Universität Siegen

The goal is to understand how the surface is influenced, how conditions for life are maintained, how plate-tectonics work and how quickly a planet can cool.

With the Hornet project, scientists were able, for the first time, to study the flow under realistic assumptions in full 3D and capture valuable information such as surface heat-flow and stresses.

Demand for HPC on the Rise
Demand for High Performance Computing is unbroken. Scientists continue to crave ever-increasing computing power. They are eagerly awaiting the availability of even faster systems and better scalable software enabling them to attack and puzzle out the most challenging scientific and engineering problems.

"Supply generates demand", states Prof. Michael M. Resch, Director of HLRS. "With the ability of ultra-fast machines like Hornet both industry and researchers are quickly realizing that fully leveraging the vast capabilities of such a supercomputer opens unprecedented opportunities and helps them to deliver results previously impossible to obtain.


"We are positive that our HPC infrastructure will be leveraged to its full extent. Hornet will be an invaluable tool in supporting researchers in their pursuit for answers to the most pressing subjects of today's time, leading to scientific findings and knowledge of great and enduring value," adds Prof. Michael M. Resch.

Outlook
Following its ambitious technology roadmap, HLRS is currently striving to implement a planned system expansion which is scheduled to be completed by the end of 2015. The HLRS supercomputing infrastructure will then deliver a peak performance of more than seven PetaFlops (quadrillion mathematical calculations per second) and feature 2.3 petabytes of additional file system storage.

contact: Uwe Küster, [email protected]

• Regina Weigand

Gauss Centre for Supercomputing

Figure 6: Bottom and internally heated high-Rayleigh number convection with strong viscosity contrasts in a 3D spherical shell to study patterns and stresses relevant for earth. © German Aerospace Center, Berlin

Golden Spike Award by the HLRS Steering Committee in 2014

Discontinuous Galerkin for High Performance Computational Fluid Dynamics

Figure 1: Artist's concept of the X-51A WaveRider (Image credit: www.nasa.gov)

Towards Scramjets using Discontinuous Galerkin Spectral Element Methods
A space shuttle taking off from the ground is an impressive moment and the images broadcasted on TV are still attracting our attention. But as fascinating as they are, the future of space flight may belong to air-breathing supersonic and hypersonic vehicles. Particularly scramjets (supersonic combustion ramjets) are considered an ambitious alternative to classical rocket driven systems. In contrast to today's space transportation technologies, which need to carry tons of oxygen supplies, the scramjet inhales the atmospheric air to gain the oxygen required for combustion. Thus, the aircraft becomes lighter, faster and eventually more economic since the payload can be significantly increased.

The idea of an air-breathing vehicle is not new though, scramjets having been a concept since the 1950s, and in the 1960s the first scramjet engines were tested in labs. However, after more than 50 years of scramjet research a breakthrough in aviation history was achieved in May 2013 when the U.S. Air Force successfully launched the X-51A WaveRider (Fig. 1), reaching a top speed of more than five times the speed of sound for a record-breaking flight time of four minutes.

Despite extensive studies over decades and first encouraging flight tests, the scramjet still merges and challenges various research fields like aerodynamics, combustion and material science. Especially the intake of a scramjet plays a key role for the proper functionality of air-breathing vehicles as it does not compress the incoming air via moving parts like compressors, but through a series of shock waves generated by the specific shape of the intake and the high flight velocity (see Fig. 2). As the supply of compressed air is of crucial importance for the subsequent efficient combustion of the fuel-air mixture to produce thrust, the intake flow also determines the operability limits of the whole system. Along with the previous two flight tests of the X-51A WaveRider, which failed due to inlet malfunctions, the scientists attached great importance to the detailed study of the intake flow. Since experimental and flight data of hypersonic air-breathing vehicles are difficult and utterly expensive to obtain, numerical methods are applied to enhance our understanding of the involved complex physical phenomena.

The Discontinuous Galerkin Method
The intake flow of a scramjet is characterized by laminar and turbulent boundary layers and their interaction with shock waves, yielding a three-dimensional unsteady complex flow pattern. Particularly shock wave / boundary layer interactions are of fundamental importance as they may cause intense heat loads leading to serious aircraft damages. A reliable numerical investigation of the occurring phenomena within the intake can be accomplished by means of high fidelity simulations, as direct numerical simulations (DNS) and large eddy simulations (LES) offer, which in turn require large computational resources. Hence, the applied numerical method has to satisfy many requirements, like for instance high spatial resolution and low dissipation errors, to accurately resolve the turbulent features of the flow. Moreover, the method should also allow an efficient usage of HPC systems to reduce the computational time.

Figure 2: Schematic of a scramjet


Due to its excellent dispersion and dissipation attributes as well as its highly performant parallelization, the discontinuous Galerkin (DG) method is recently considered as a promising approach to conduct such high-fidelity simulations in an efficient way. Beyond that, the DG method also combines arbitrary high-order resolution in space with geometrical flexibility, even enabling the computation of complex geometries.

In the present work, however, we apply a special collocation type formulation of the DG method for unstructured hexahedral element meshes, namely the discontinuous Galerkin spectral element method (DGSEM) [1], whose main advantage is based on its HPC capability. In contrast to other high-order schemes (e.g. finite difference and finite volume methods), the DGSEM algorithm with explicit time discretization is inherently parallel since all elements communicate only with direct neighbors. Furthermore, the DGSEM operator itself can be split into two parts, namely the volume part, which solely relies on local data, and the surface part, for which neighbor information is required. This feature of the DG method can be exploited by hiding communication latencies, and the negative influence of data transfer can be reduced to a minimum. It is therefore possible to send surface data while simultaneously performing volume data operations. Hence, the DGSEM facilitates a lean parallelization strategy, where except for direct neighbor communication no additional operations are introduced, being important for an efficiently scalable algorithm.

The DGSEM algorithm has been implemented into our code framework FLEXI [2], which is fully MPI-parallelized and exhibits excellent parallel efficiency and scaling attributes. In this context, Figs. 3 and 4 show strong scaling results of the DGSEM code FLEXI on the HLRS Cray XE6 supercomputer cluster for different polynomial degrees N and different mesh sizes, respectively. Given a fixed number of elements we increased the number of processors by a factor of two in each step and, thus, decreased the load per processor up to the one-element-per-processor limit. Here, we see that for the highest polynomial degree N=8, corresponding to an order of accuracy of 9, the code permanently yields a so-called super-scaling, i.e., scaling efficiency higher than 100%, owing to low data communication and memory consumption such that caching effects can be exploited.
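To make the latency-hiding strategy described above concrete, the following minimal sketch shows the typical pattern: post non-blocking exchanges of the surface data, perform the purely local volume work while the messages are in flight, and only then evaluate the terms that need neighbor data. It is an illustrative mpi4py example on a simple 1D ring of ranks with an invented buffer size; it is not code taken from FLEXI.

```python
# Illustrative sketch of communication hiding (assumed setup, not FLEXI code):
# surface data travels while the element-local volume work is being done.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size     # periodic 1D ring of ranks

n_face_dofs = 36                                       # hypothetical size of one face's data
send_l = np.full(n_face_dofs, float(rank))             # surface data for the left neighbor
send_r = np.full(n_face_dofs, float(rank))             # surface data for the right neighbor
recv_l = np.empty(n_face_dofs)
recv_r = np.empty(n_face_dofs)

# 1) start the non-blocking surface-data exchange
reqs = [comm.Isend(send_l, dest=left,  tag=0),
        comm.Isend(send_r, dest=right, tag=1),
        comm.Irecv(recv_r, source=right, tag=0),
        comm.Irecv(recv_l, source=left,  tag=1)]

# 2) volume part: uses only local data, so it overlaps the communication above
volume_term = np.sin(np.linspace(0.0, 1.0, 100_000)).sum()   # stand-in for real volume work

# 3) surface part: wait until the neighbor data has arrived, then use it
MPI.Request.Waitall(reqs)
surface_term = recv_l.sum() + recv_r.sum()

if rank == 0:
    print("volume:", volume_term, "surface:", surface_term)
```

The essential point is that step 2 needs no remote data, so its cost masks the message latency of step 1; this is what keeps only direct-neighbor communication on the critical path.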

Figure 3: Strong scaling for different polynomial degrees N on the same mesh
Figure 4: Strong scaling for a fixed polynomial degree N=5 on different meshes


It is also possible to achieve super-scaling for lower polynomial degrees by increasing the load per processor to optimize the volume-to-surface data ratio (4 elements per processor for N=5 and 8 elements per processor for N=3). Hence, the scaling results prove that the DG method is very well suited for massively parallel high performance computing.

Figure 5: Visualization of the turbulent structures using the λ2-criterion and colored by the streamwise velocity

DNS of a Turbulent Boundary Layer Along a Flat Plate
Transition describes the process of a laminar flow becoming turbulent. It also represents the initial phenomenon that occurs along the intake walls before shock waves interact with the emerging boundary layers. Thus, we will first investigate the most canonical form of transition, namely the DNS of a spatially-developing supersonic turbulent boundary layer on a flat plate, in order to demonstrate the high potential of DG schemes for compressible wall-bounded turbulent flows. The free-stream Mach number, temperature and pressure of the boundary layer flow are given by M = 2.67, T = 564 K and p = 14890 Pa, respectively. We follow the approach of developing turbulence, which involves the simulation of the complete laminar-turbulent transition process. Fig. 5 visualizes the streamwise development of turbulent structures showing how the initially laminar flow experiences transition and eventually becomes turbulent.

We choose a polynomial degree of N=5 to enable a direct comparison with Keller & Kloker [3] who used a sixth order compact finite difference (cFD) code for a similar simulation setup.

Figure 6: Van Driest transformed streamwise velocity profiles at the most downstream location


As displayed in Fig. 6, the van Driest transformed velocity profile at the most downstream position matches both the compressible reference solution and the incompressible scaling laws. In Fig. 7 we compare the Reynolds-stresses, again, at the most downstream position where the momentum thickness based Reynolds number reaches a value of Re_θ = 1361. In addition, the Reynolds-stresses are density-weighted to compare the results with Spalart's incompressible turbulent boundary layer at Re_θ = 1410 [4]. Except for the immediate near-wall region, where slight differences to the incompressible Reynolds-stresses can be observed, the DG results are in a very good agreement with both the compressible reference case and Spalart's incompressible boundary layer.

Figure 7: Spanwise averaged Reynolds-stress profiles

The computational efficiency of the DGSEM scheme can be assessed by the so-called performance index (PID). The PID is a convenient measure to judge the computational efficiency and it expresses the computational time needed to update one degree of freedom (DOF) for one time step: PID = (wall-clock time x #cores) / (#DOF x #timesteps). The comparison of the PID reveals that the DG method (PID_DGSEM = 15) is able to compete with state-of-the-art compact FD codes (PID_cFD = 17) and emphasizes the strong efficiency of the DGSEM code at performing high-fidelity simulations of wall-bounded compressible turbulent flows. Based on the encouraging results, the outlook of the present study is to conduct a DNS of a shock wave / boundary layer interaction to gain a deeper insight into the complex physics of the intake flow.
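For illustration, the PID defined above can be evaluated with a few lines of Python. The run parameters below are invented for the example; they are not the numbers behind the quoted PID values of the DGSEM and cFD codes.

```python
# Performance index as defined in the text: time to update one DOF for one time step.
def performance_index(wall_clock_s, n_cores, n_dofs, n_timesteps):
    """PID = (wall-clock time x #cores) / (#DOF x #timesteps), returned in microseconds."""
    return wall_clock_s * n_cores / (n_dofs * n_timesteps) * 1.0e6

# Hypothetical run: 2 hours on 4,096 cores, 50 million DOF, 40,000 time steps.
print(performance_index(2 * 3600.0, 4096, 50_000_000, 40_000))   # ~14.7 (microseconds)
```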

Acknowledgments
This research has been supported by the Deutsche Forschungsgemeinschaft through the research training group "Aero-Thermodynamic Design of a Scramjet Propulsion System for Future Space Transportation Systems". The simulations were conducted on the HLRS Cray XE6 supercomputer system.

References
[1] Kopriva, D. A.: Implementing Spectral Methods for Partial Differential Equations: Algorithms for Scientists and Engineers. Springer Publishing Company, Incorporated, 2009.
[2] Hindenlang, F., Gassner, G., Altmann, C., Beck, A., Staudenmaier, M., Munz, C.-D.: Explicit discontinuous Galerkin methods for unsteady problems. Computers & Fluids, 61, pp. 86-93, 2012.
[3] Keller, M., Kloker, M.: DNS of effusion cooling in a supersonic boundary-layer flow: Influence of turbulence. AIAA Paper 2013-2897, 2013.
[4] Spalart, P. R.: Direct simulation of a turbulent boundary layer up to Re_θ = 1410. Journal of Fluid Mechanics, 187:61-98, 1988.

• Muhammed Atak
• Claus-Dieter Munz
Institute of Aerodynamics and Gas Dynamics, University of Stuttgart, Germany

contact: Muhammed Atak, [email protected]; Claus-Dieter Munz, [email protected]

Golden Spike Award by the HLRS Steering Committee in 2014

Large-scale Simulations of Particle-laden Flows

The Computational Fluid Dynamics group of the Institute of Aerodynamics and Chair of Fluid Mechanics at the RWTH Aachen University uses the HLRS supercomputers to study particle-laden flows by both Lagrangian models as well as fully resolved particle methods. The latter are particularly challenging due to the wide range of time and length scales involved. As a consequence a high mesh resolution is required to capture the fluid motion around the particles, responsible for the particle motion, and the large scales of the flow. The interaction of these phenomena is, e.g., responsible for the particle concentration within the fluid domain.

Particle-laden flows play a major role in many engineering applications and natural sciences, which include fields as diverse as biomedical applications, weather forecasting, manufacturing and combustion processes.

Figure 1: Fine dust particle distribution in the human lung during inspiration. The particles, tracked by a Lagrangian model, are transported down the trachea and distribute in the left and right main bronchi before they deposit in higher lung generations where they can cause severe damage to the respiratory system potentially leading to lung cancer [2].


Figure 2: (Left) At the inflow plane of the computational domain (at the left side) synthetic eddies are introduced to generate an isotropic turbulence field. It is shown how the turbulent intensity decays due to the dissipation as the turbulence is convected downstream (in positive x direction). (Right) Rain particle distribution determined by Lagrangian particle-tracking for a Stokes number of ~1 in an isotropic, decaying turbulence field. The particles concentrate in regions of high strain and low vorticity leading to a higher collision probability.

Studying the deposition of inhaled aerosols and particles in the human lung is a biomedical application which aims to help to understand associated pathologies (see Fig. 1). Within the high-priority research program SPP1276 METSTROEM the growth of water droplets within vapor clouds is studied to increase our understanding of the rain formation process [1]. The range of scales in this process spans from the size of aerosol particles (micrometers) to the cloud's length (kilometers), see Fig. 2. The manufacturing quality of the electrical discharge machining process is known to be sensitive to the debris removal. In the SFB/TRR136 (Process Signatures) the physics of the debris deposition process for multiple removal strategies is studied to better understand the underlying physical phenomena of the cleaning and characterize the signature of the process. Finally, oxy-fuel combustion is an emerging combustion technology that significantly reduces the CO2 emissions of coal power plants. Detailed numerical simulations of the oxy-fuel combustion process including the burning coal particles are performed within the transregional collaborative research center SFB/TRR129 (Oxyflame), see Fig. 3.

Figure 3: Fully-resolved particle simulation involving 1,000 particles in a Taylor-Green vortex at Reynolds number 1,600 on 1,800 cores. The computational grid is a periodic cube. On its surfaces the vorticity distribution is shown (heat map). Vortical structures are visualized with the contours of the Q-criterion (green to blue).


With this objective in mind, the multi-physics in-house code ZFS (Zonal Flow Solver) has been developed at the Institute of Aerodynamics over the past decade. It is based on the concept of hierarchical Cartesian meshes [3], which allows adaptive mesh refinement techniques to automatically invest the computational effort on those regions of the domain where the solution changes more rapidly, while reducing the grid resolution where the solution is smooth. This allows tracking the small scales of fluid flows where they exist, i.e., maintaining a fine grid e.g. around the particles as they move through the domain. A space-filling curve is used to partition the domain such that the communication between processors is minimized. The concept of space-filling curves also allows redistributing the load dynamically as the particles move from one processor to another, enabling efficient large-scale adaptive simulations. Finally, the mathematical models of the fluid problems are solved on these hierarchical adaptive Cartesian grids by numerical methods optimized for the underlying physical phenomena characteristic of different flow regimes. ZFS provides a Lattice-Boltzmann method for incompressible flows, finite-volume methods for compressible flow and combustion problems, and high-order discontinuous Galerkin methods for acoustic problems. These are complemented with particle solvers that allow simulating millions of point particles using Lagrangian models, and level-set methods for performing fully-resolved particle simulations involving thousands of particles.

The parallel efficiency of ZFS is demonstrated by means of a so-called strong-scaling experiment, in which the size of the problem is kept constant while the number of computing cores is continuously increased. For an ideally efficient solver the product of the number of computing cores times the required computing time would stay constant. In Fig. 4 a study of the parallel efficiency of the grid generation process for a grid containing 9.82 x 10^9 cells is shown. A parallel efficiency close to 100% is maintained up to 32,768 cores. As the number of cells to be generated per core further decreases, the parallel efficiency also decreases due to a lacking amount of work per core, finally achieving a parallel efficiency of ~75% on 131,072 cores.
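The bookkeeping behind such a strong-scaling study is straightforward: with the problem size fixed, the ideal speedup equals the factor by which the core count grows, and the parallel efficiency is the measured speedup divided by that ideal value. The sketch below uses invented core counts and timings purely for illustration; it does not reproduce the measured ZFS grid-generation data shown in Fig. 4.

```python
# Strong-scaling evaluation: fixed problem size, growing core count.
def strong_scaling(cores, times_s):
    base_cores, base_time = cores[0], times_s[0]
    for c, t in zip(cores, times_s):
        speedup = base_time / t
        ideal = c / base_cores
        efficiency = speedup / ideal          # equals (base_cores * base_time) / (c * t)
        print(f"{c:7d} cores: speedup {speedup:5.2f}, parallel efficiency {100 * efficiency:5.1f} %")

# Hypothetical timings for a fixed-size problem (not the ZFS measurements):
strong_scaling([1024, 2048, 4096, 8192],
               [800.0, 405.0, 210.0, 118.0])
```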

Figure 4: Results of the strong scaling experiment for the generation of a grid consisting of 9.82 x 10^9 cells. The ideal speedup (red), the actual achieved speedup (green), as well as the number of cells generated per core (blue) are shown [4].
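The space-filling-curve partitioning mentioned above can be illustrated with a Morton (Z-order) curve: the integer coordinates of each cell are bit-interleaved into a single key, the cells are sorted by that key, and the sorted list is cut into one contiguous chunk per rank, so that cells that are close in space tend to stay on the same rank. The short sketch below is only an illustration of the idea; the actual curve, weighting and data structures used in ZFS are not reproduced here.

```python
# Toy space-filling-curve partitioning via Morton (Z-order) codes.
def morton3d(x, y, z, bits=10):
    code = 0
    for i in range(bits):                     # interleave bit i of x, y and z
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def partition(cells, n_ranks):
    """cells: list of (x, y, z) integer coordinates; returns one cell list per rank."""
    ordered = sorted(cells, key=lambda c: morton3d(*c))
    chunk = -(-len(ordered) // n_ranks)       # ceiling division
    return [ordered[i * chunk:(i + 1) * chunk] for i in range(n_ranks)]

cells = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
print([len(part) for part in partition(cells, 4)])   # 16 cells on each of 4 ranks
```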


Acknowledgements
This work has been financed by the research cluster Fuel production with renewable raw materials (BrenaRo) at RWTH Aachen University as well as by the German Research Foundation (DFG) within the framework of the SFB/Transregio 129, SFB/Transregio 136, the transregional collaborative research center SFB 686, and within the framework of the priority program SPP 1276 METSTROEM under grant number SCHR 309/39. The support is gratefully acknowledged. The authors are grateful for the computing time provided at the IBM BlueGene/Q system (JUQUEEN) at the Jülich Supercomputing Center (JSC) and the CRAY XE6 system (HERMIT) at the High Performance Computing Center Stuttgart (HLRS).

References
[1] Siewert, C., Kunnen, R.P.J., Schröder, W.: Collision Rates of Small Ellipsoids Settling in Turbulence, Journal of Fluid Mechanics, 758, 686–701, 2014.
[2] Lintermann, A., Schröder, W.: Simulation of Aerosol Particle Deposition in the Lower Human Respiratory System, submitted to the Journal of Aerosol Science, 2015.
[3] Lintermann, A., Schlimpert, S., Grimmen, J. H., Günther, C., Meinke, M., Schröder, W.: Massively parallel grid generation on HPC systems, Computer Methods in Applied Mechanics and Engineering, 277, 131–153, doi:10.1016/j.cma.2014.04.009, 2014.
[4] Brito Gadeschi, G., Siewert, C., Lintermann, A., Meinke, M., Schröder, W.: Towards large multi-scale particle simulations with conjugate heat transfer on heterogeneous supercomputers, High Performance Computing in Science and Engineering '14, 2015.

• Gonzalo Brito Gadeschi 1
• Lennart Schneiders 1
• Christoph Siewert 1,2
• Andreas Lintermann 1,3
• Matthias Meinke 1
• Wolfgang Schröder 1,3

1 Institute of Aerodynamics, RWTH Aachen University, Germany
2 Laboratoire Lagrange, OCA, CNRS, Nice, France
3 SimLab "Highly Scalable Fluids & Solids Engineering", JARA-HPC, RWTH Aachen University, Germany

contact: Gonzalo Brito Gadeschi, [email protected]; Dr. Matthias Meinke, [email protected]

Golden Spike Award by the HLRS Steering Committee in 2014

Quarks and Hadrons – and the Spectrum in Between

The mass of ordinary matter, due to protons and neutrons, is described with a precision of about five percent [1] by a theory void of any parameters – (massless) Quantum Chromodynamics (QCD). The remaining percent are due to other parts of the Standard Model of Elementary Particle Physics: the Higgs-Mechanism, which explains the existence of quark masses, and a per mil effect due to the presence of Electromagnetism. Only at this stage, free parameters are introduced¹. Mass is thus generated in QCD without the need of quark masses, an effect termed "mass without mass" [2].

This mass generation is due to the dynamics of the strongly interacting quarks and gluons, the latter mediating the force like the photon in the case of Quantum Electrodynamics. However, in contrast to the photons, which are themselves uncharged and thus do not interact (directly) with one another, the gluons themselves carry the strong (color) charge. This is the essential complication in the dynamics of the strong interaction, since it renders the interaction non-linear. Furthermore, due to the uncertainty relation of quantum physics and relativity, even the number of quarks inside the proton or neutron is not fixed: quark anti-quark pairs are created and annihilated constantly and contribute significantly to the overall mass of the particle.

So far, the only known way to analyze QCD at small energies, where its interaction strength is large, is through simulations. These simulations are based on the discretized theory, called Lattice QCD, which traditionally uses a space-time lattice with quarks on the lattice sites and the gluons located on the lattice links. In the so-called continuum limit, the lattice spacing is sent to zero and QCD and the continuous space-time are recovered. Simulations of Lattice QCD require significant computational resources. As a matter of fact, they have been a driving force of supercomputer development. A large range of special-purpose Lattice QCD machines, e.g. the computers of the APE family, QPACE, and QCDOC among others, have been developed, the QCDOC being the ancestor of the IBM Blue Gene family of supercomputers. Over the last decade, these simulations have matured substantially.

¹ These parameters are a coupling (electromagnetic charge) and the quark masses.

Figure 1: Mass splittings [3]. The horizontal lines are the experimental values and the grey shaded regions represent the experimental errors, which are smaller than those of our predictions. Our results are shown by red dots with their uncertainties. A blue shaded region indicates that no experimental results are available or that these results have larger errors.


In 2008, the first fully controlled calculation of the particle spectrum [1] became available and simpler quantities such as masses and decay constants are now routinely computed to percent precision.

Neutron, Proton, and the Stability of Matter
In order to increase the precision of the calculations further, one has to address the largest sources of uncertainties, which, in case of the spectrum, are due to Electrodynamics and the difference between the up- and down-quark masses. Once these effects are properly included in the simulations, one can calculate per mil effects of the particle spectrum of the Standard Model, e.g. the difference between the proton and the neutron mass.

For equal light quark masses, Electrodynamics renders the proton slightly heavier, due to energy stored in the electromagnetic field that surrounds it. The light quark mass splitting, conversely, increases the neutron mass, since it contains two of the heavier down-quarks compared to one in case of the proton. The interplay between these effects has significant implications for the stability of matter. If the neutron-proton mass splitting was about a third of the 0.14% found in nature, hydrogen atoms would undergo inverse beta decay, leaving predominantly neutrons. Even with a value somewhat larger than 0.05%, Big Bang Nucleosynthesis (BBN) would have produced much more helium-4 and far less hydrogen than it did in our universe. As a result, stars would not have ignited in the way they did. On the other hand, a value considerably larger than 0.14% would result in a much faster beta decay for neutrons. This would lead to far less neutrons at the end of the BBN epoch and would make the burning of hydrogen in stars and the synthesis of heavy elements more difficult.

Including these effects is, however, non-trivial. The biggest obstacle turns out to be the long ranged nature of electrodynamics. Whereas the strong force is essentially confined inside its "bound-states", the so-called hadrons, the electromagnetic force, falling off according to the well-known 1/r² rule, is still felt at large distances. This introduces significant finite size effects, which are typically larger than the mass splittings one is interested in. In our recent calculation, the correct theoretical framework for a treatment of these effects was established and the finite-size corrections were calculated analytically. A new simulation algorithm for the Electrodynamics part of the calculations was developed, which reduced the autocorrelation by three orders of magnitude. Combined with other advanced methods, such as the latest multi-level solvers, this allowed us to compute the particle splittings using the presently available resources of the Gauss Centre for Supercomputing², JUQUEEN at JSC, Hermit at HLRS, and SuperMUC at LRZ (Fig. 1).

From Hadrons to Quark Soup
In the case of the proton and the neutron, quarks and gluons are confined to the hadron. If we, however, increase the temperature of the system sufficiently, both particles will "melt" and quarks and gluons behave as free particles, forming an exotic state of matter called quark-gluon plasma. This (rapid) transition from the quark-gluon to the hadronic phase occurred when the early universe evolved from the "quark epoch", lasting from 10⁻¹² to 10⁻⁶ seconds after the Big Bang, to the following hadron epoch, which ended when the universe was about one second old.

2 see acknowledgements


Figure 2: (Left) Preliminary results for the charmed EoS. For comparison, we show the HRG result, the Nf = 2 + 1 band, and, at high temperatures, the HTL result, where the central line marks the HTL expectation for the EoS with the band resulting from (large) variations of the renormalization scale. (Right) Preliminary results for the pressure; errors indicate the Stefan-Boltzmann value. All errors are statistical only.

This (rapid) transition from the quark-gluon to the hadronic phase occurred when the early universe evolved from the "quark epoch", lasting from 10⁻¹² to 10⁻⁶ seconds after the Big Bang, to the following hadron epoch, which ended when the universe was about one second old. Present heavy-ion experiments (LHC@CERN, RHIC@BNL, and the upcoming FAIR@GSI) create, for a brief moment when two heavy nuclei collide, the extreme conditions of the early universe, allowing us to study this transition and the properties of the quark-gluon plasma some 13 billion years later.

Considerable theoretical effort is invested in attempting to describe these experiments, from collision to detector signals. Here, the Equation of State (EoS) of QCD [4] is a central ingredient for a complete understanding of the experimental findings. At low temperatures, the EoS can be calculated using the so-called "Hadron Resonance Gas" (HRG) model. At high temperatures, perturbative analyses of QCD become possible (e.g. "Hard Thermal Loop" (HTL) perturbation theory). The intermediate region, from ca. 100 MeV to 1 GeV, can be covered systematically through simulations of Lattice QCD.

Presently available Lattice QCD results for the EoS neglect the effects of the charm quark, which restricts their region of applicability to temperatures below about 400 MeV. In order to reach larger temperatures, we have set up new simulations which take the charm quark into account, using an improved formulation of Lattice QCD. Our preliminary results illustrate the impact of the charm quark for temperatures larger than 400 MeV (Fig. 2), and make contact with both HRG at low and HTL at large temperatures. The EoS is thus becoming available for the whole temperature region.

Computational Aspects
Simulations of Lattice QCD generally happen in three main phases. In the first phase, an ensemble is generated through a Markov process. This phase is thus usually scaled to a large number of cores, minimizing "wall-clock" time. We have, so far, scaled our production code used for the ensemble generation up to 1.8 million parallel threads, running at a sustained performance of over 1.6 PFlop/s (Fig. 3).
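As a rough back-of-the-envelope consistency check (our own estimate, not a figure quoted in the article), the sustained rate per thread implied by these numbers is

\[
\frac{1.6\times10^{15}\,\mathrm{Flop/s}}{1.8\times10^{6}\ \text{threads}}\;\approx\;0.9\ \mathrm{GFlop/s}\ \text{per thread}.
\]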


The second production stage then analyzes the individual "configurations" that constitute an ensemble one by one. Since an ensemble can contain 1,000 configurations and more, this greatly reduces the need for scaling to a large number of cores. Therefore, we can optimize production at this stage for efficiency (which reaches up to 70% of the hardware peak flop rate) and queue throughput. Physics results are then extracted in the final step of the calculation, which, with our involved blind analysis procedure [1,3], requires a small compute cluster on its own.

Figure 3: Strong scaling analysis of the Lattice QCD simulation code, reaching a sustained performance of over 1.6 PFlop/s on the Blue Gene/Q at JSC, Jülich.

Outlook
Simulations of Lattice Quantum Chromodynamics have reached per mil level precision. By now, we are able to reproduce even intricate details of the particle spectrum, such as the neutron-proton and other mass splittings, at high precision. The inclusion of Quantum Electrodynamics was essential to achieve this level of accuracy, rendering calculations of the combined theory possible in cases where this is needed in the future. Beyond conceptual advances, the correct reproduction of the mass splittings found in nature provides further strong evidence that Quantum Chromodynamics correctly accounts for the properties of strongly interacting matter. Moving beyond the mass spectrum, we can now calculate the properties of the early universe transition between, and the properties of matter in, the quark and the hadron epoch, starting 10⁻¹² seconds after the Big Bang.

Acknowledgements
We thank all the Members of the Budapest-Marseille-Wuppertal and Wuppertal-Budapest collaborations for the fruitful and enjoyable cooperation.

Our simulations require substantial computational resources. We are indebted to the infrastructure and computing time made available to us by the Gauss Centre for Supercomputing and the John von Neumann Institute for Computing.

Further support for these projects was provided by the DFG grant SFB/TR55, the PRACE initiative, the ERC grant (FP7/2007-2013/ERC No 208740), the Lendület program of HAS (LP2012-44/2012), the OCEVU


Labex (ANR-11-LABX-0060), the A*MIDEX project (ANR-11-IDEX-0001-0) and the GENCI-IDRIS Grand Challenge grant 2012 "StabMat" as well as grant No. 52275. The computations were performed on JUQUEEN and JUROPA at Forschungszentrum Jülich, on Turing at the Institute for Development and Resources in Intensive Scientific Computing in Orsay, on SuperMUC at Leibniz Supercomputing Centre in München, on Hermit and Hornet at the High Performance Computing Center in Stuttgart, and on local machines in Wuppertal and Budapest. Computing time on the JARA-HPC Partition is gratefully acknowledged.

References
[1] Dürr, S., et al., Ab-Initio Determination of Light Hadron Masses, Science 322, 1224-1227, 2008.

[2] Wilczek, F. Mass Without Mass I: Most of Matter, Physics Today 52, 11, 1999

[3] Borsanyi, S., et al., Ab initio calculation of the neutron-proton mass difference, Science 347, 1452-1455, 2015.

[4] Borsanyi, S., et al., The QCD equation of state with dynamical quarks, JHEP 1011, 077, 2010.

• Stefan Krieg1, 2
• Zoltan Fodor1, 2, 3

1 JSC & JARA-HPC, Forschungszentrum Jülich, Germany
2 University of Wuppertal, Germany
3 Eötvös Lorand University, Hungary

contact: Stefan Krieg, [email protected]

Advance Visualisation of Seismic Wave Propagation and Speed Model

With the introduction of the Virtual Reality and Visualisation Centre (V2C) at Leibniz Supercomputing Centre (LRZ), many domain specialists have approached LRZ to leverage the immersive projection technology. Large datasets can now be displayed stereoscopically, and specialists can interact with their complex datasets intuitively. Seismologists are one group of domain specialists that have benefited from the use of this virtual reality (VR) technology. To allow a deep insight into the simulated data, the seismologists make use of VR installations like a 5-sided projection installation based on the concepts of Carolina Cruz-Neira's CAVE Automated Virtual Environment (CAVE)1. In this article, the CAVE-like installation at LRZ will be referred to as the CAVE for convenience.

Seismology
Seismology is a field of science in which earthquakes and the propagation of seismic waves that move through the Earth and on its surface are scientifically studied. The geological structure and physical properties of the Earth have a significant impact on how the waves propagate. Depending on the material properties, e.g. the density and elasticity of the medium, the speed at which seismic waves propagate differs. As seismic waves travel through different materials, they can be reflected, refracted, dispersed, diffracted and/or attenuated. A wave speed model is used to approximate and represent the different materials that compose the Earth's structure and the speed at which the waves travel through each of these regions. Having a wave speed model that accurately represents a region implies understanding more about the Earth's interior and also the possibility to determine earthquake locations more accurately.

Computation
A forward simulation of a North Italian earthquake was computed to analyse the wave speed model and how its features affect the propagation of seismic waves. In particular, the considered event was the mainshock of a seismic sequence that struck North Italy in 2012.
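For background on the statement above that the wave speed depends on the density and elasticity of the medium, the standard relations for an isotropic elastic medium (textbook formulas added here for illustration, not taken from the article) read

\[
v_p=\sqrt{\frac{K+\tfrac{4}{3}\mu}{\rho}},\qquad v_s=\sqrt{\frac{\mu}{\rho}},
\]

with density ρ, bulk modulus K and shear modulus μ. A wave speed model therefore amounts to specifying how these material parameters (or, equivalently, v_p, v_s and ρ) vary in space, which is also why a P-wave model can be converted to S-wave speeds and densities via empirical scaling relations, as described below.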

Figure 1: The Wave Speed Model

1 CAVE™, a registered trademark of the University of Illinois' Board of Trustees


Figure 2: Slices and Contours of the Wave Speed Model

The event occurred on May 20, 2012, at 02:03 UTC with a local magnitude of 5.9, e.g. [5], and a hypocentre location of 44.98°N, 11.23°E and 6.3 km depth, relocated by the Istituto Nazionale di Geofisica e Vulcanologia (INGV). The source parameters were given by the centroid moment tensor solution calculated using the Time Domain Moment Tensor technique implemented at INGV [6]. The considered region in North Italy is characterised by a large sedimentary basin, a significant presence of fluid and strong heterogeneities, leading to remarkable site effects and liquefaction phenomena. To run the simulation, this region was discretised by constructing a conforming, unstructured mesh of hexahedral elements that covered a volume of ~350 km in longitude, ~230 km in latitude and 60 km in depth. The mesh was composed of ~1.6 million hexahedral elements and honoured the free-surface topography. Then, to represent the geological structure of the region, a wave speed model was constructed that was based on a 3D tomographic model for the Italian peninsula derived by the inversion of P-wave travel time measurements [2]. The P-wave speed model was scaled into the speed of S-waves and the density using a scaling relation. The model included the signature of the 8 km thick sedimentary basin. The combination of the described mesh and wave speed model allows simulations to be produced that resolve a minimum period of about 4 seconds.

The forward simulation was performed using SPECFEM3D_Cartesian [3], a very popular wave propagation simulation code, which employed the continuous Galerkin spectral-element method with arbitrary unstructured hexahedral meshes. The computation was submitted to SuperMUC at LRZ, utilising 500 cores to generate 1 minute of seismograms for 114 seismic stations. Since this application was well parallelised, it managed to take good advantage of the parallel architecture of SuperMUC. The computation, including the generation of additional output files for visualisation purposes, completed in less than 30 minutes. The corresponding wave speed model in Fig. 1, the slices and contours of this model in Fig. 2 and the animated surface propagation of seismic waves in Fig. 3 were then ported and displayed in the CAVE. These figures are renderings for illustration purposes of the actual displays in the CAVE. The interactive CAVE displays are particularly useful to allow the visualisation of the contours and slices of the wave speed model. They allow the scientists to literally step into the model, intuitively visualise the critical layers and associate them with the simulated speed of the wave propagation.


The computed synthetic seismograms can be compared to the observed seismograms to calculate the discrepancy, quantified by a misfit function. The misfit is then used to correct the wave speed model. With each such iteration, an increasingly accurate wave speed model can be represented, bringing the seismologists one step closer to understanding the Earth's interior.

Immersive Visualisation
A visualisation prototype has been developed from scratch which smoothly integrates into the seismologists' workflow. The prototype is based on OpenSG [4], an open source scene graph, which supports multi-display installations. Additionally, VRPN [7] is used as an abstraction layer for a large set of input devices. Thus it is possible to display the wave speed model, contours of it and a sequence of surface propagation on powerwalls, CAVE-like installations and Head-Mounted Displays (HMDs). Providing stereoscopic real-time interaction while turning the model allows building a mental map of the displayed structures. The advantage of the HMDs and surround displays lies in the intuitive data analysis. By providing head tracking, it is easily possible to change the perspective on the data in a fluent and natural way, fostering exploration of the displayed data and the discovery of unexpected connections inside the data.

Future Work
The benefits of using the CAVE to analyse their models have motivated the seismologists to integrate their existing tools to create data products that are useful for CAVE displays. The seismologists aim to include the CAVE visualisation items as an additional end-product of their scientific workflow. This will allow them to produce multiple types of representation, highlighting features and properties depending on their interests. This level of automation could allow a dynamic annotation of the visualisation elements to enrich the description and the metadata associated with the specific workflow run, experiment or study. From the perspective of advanced visualisation, the data exploration is currently focused on a single-user setup, but it is easily possible to integrate network communication and allow collaborative data analysis. This is also something to explore in the future to allow the seismologists not only to analyse the models independently but also collaboratively with their fellow seismologists.

Acknowledgement
This work was conducted in cooperation with the team from the EU project Virtual Earthquake and seismology Research Community in Europe e-science environment (VERCE). The output data required for the visualisation was generated by the execution of scientific workflows that were built around SPECFEM3D_Cartesian. These workflows were submitted and controlled within a virtual research environment (VRE), which is available to seismologists via the VERCE project as the VERCE platform. The platform offers easy and homogeneous access to the data and workflow management systems by connecting numerous distributed computational resources. These services are fully integrated within an interactive science gateway that is tailored to the needs of the seismologists. It enables them to experiment with different wave speed models and simulation parameters in an assisted environment.


Figure 3: Surface Propagation of Seismic Waves

References
[1] Cruz-Neira, C., Sandin, D. J., DeFanti, T. A., Kenyon, R. V., Hart, J. C., The CAVE: Audio visual experience automatic virtual environment, Communications of the ACM 35, 6, 64–72, 1992.

[2] Stefano, R. D., Kissling, E., Chiarabba, C., Amato, A., Giardini, D., Shallow subduction beneath Italy: three-dimensional images of the Adriatic-European-Tyrrhenian lithosphere system based on high-quality P wave arrival times, 114, B05305, 2009.

[3] Peter, D., Komatitsch, D., Luo, Y., Martin, R., Le Goff, N., Casarotti, E., Le Loher, P., Magnoni, F., Liu, Q., Blitz, C., Nissen-Meyer, T., Basini, P., Tromp, J., Forward and adjoint simulations of seismic wave propagation on fully unstructured hexahedral meshes, Geophysical Journal International 186, 721–739, 2011.

[4] Reiners, D., OpenSG: A scene graph system for flexible and efficient realtime rendering for virtual and augmented reality applications, Ph.D. thesis, Technische Universität Darmstadt, May 2002.

[5] Scognamiglio, L., Margheriti, L., Mele, F. M., Tinti, E., Bono, A., De Gori, P., Lauciani, V., Lucente, F. P., Mandiello, A. G., Marcocci, C., Mazza, S., Pintore, S., Quintiliani, M., The 2012 Pianura Padana Emiliana seismic sequence: locations, moment tensors and magnitudes, Annals of Geophysics 55, 4, 2012.

[6] Scognamiglio, L., Tinti, E., Michelini, A., Real-Time Determination of Seismic Moment Tensor for the Italian Region, Bulletin of the Seismological Society of America 99, 4, 2223–2242, 2009.

[7] Taylor, R. M. II, Hudson, T. C., Seeger, A., Weber, H., Juliano, J., Helser, A. T., VRPN: a device-independent, network-transparent VR peripheral system, Proceedings of the ACM Symposium on Virtual Reality Software and Technology (New York, NY, USA), VRST '01, ACM, pp. 55–61, 2001.

[8] VERCE - http://www.verce.eu

• Siew Hoon Leong1, 2
• Christoph Anthes1, 2
• Federica Magnoni3
• Alessandro Spinuso4
• Emanuele Casarotti3

1 Leibniz Supercomputing Centre, Germany
2 Ludwig-Maximilians-Universität München, Germany
3 Istituto Nazionale di Geofisica e Vulcanologia, Italy
4 Royal Netherlands Meteorological Institute, The Netherlands

contact: Siew Hoon Leong, [email protected]
Christoph Anthes, [email protected]
Federica Magnoni, [email protected]
Alessandro Spinuso, [email protected]
Emanuele Casarotti, [email protected]

A new Neutrino-Emission Asymmetry in forming Neutron Stars

Figure 1: Evolution of the neutrino emission asymmetry in a collapsing star of 11.2 solar masses. The ellipses represent the whole surface of the nascent neutron star (analog to world maps as planar projections of the Earth's surface). Red and yellow mean a large excess of electron neutrinos compared to electron antineutrinos, normalized to the average; blue means a low excess or even a deficit of electron neutrinos. The images show the merging of smaller patches that are present at 0.148 seconds (upper left panel) to a clear hemispheric (dipolar) anisotropy at 0.240 seconds (lower right panel). The dot and cross indicate the emission maximum and minimum, the dark-grey line marks the path described by a slow motion of the dipole direction. © Author and The American Astronomical Society

Supernovae are the spectacular explosions that terminate the lives of stars more massive than about nine times our sun. They belong to the most energetic and brightest phenomena in the universe and can outshine a whole galaxy for weeks. They are important cosmic sources of chemical elements like carbon, oxygen, silicon, and iron, which are disseminated in the circumstellar space by the blast wave of the explosion. Supernovae are also the birth places of the most exotic celestial objects: neutron stars and black holes.

Neutron stars contain about 1.5 times the mass of the sun, compressed into a sphere with the diameter of Munich. Their central density exceeds that in atomic nuclei: a gigantic 300 million tons (the weight of a mountain) in the volume of a sugar cube. Neutron stars are formed as extremely hot and dense objects when the central core of the highly evolved, massive star undergoes a catastrophic collapse because it cannot withstand its own gravitational weight any longer. Newly born neutron stars cool by the intense emission of neutrinos, ghostly elementary particles that hardly interact with matter on earth but that are produced in gigantic numbers at extreme temperatures and densities.

These neutrinos are thought to trigger the violent disruption of the dying star in the supernova if even only one percent of their huge total energy can be tapped to heat the stellar mantle that surrounds the forming neutron star.

Because neither experiments nor direct observations can reveal the processes at the centers of exploding stars, highly complex numerical simulations are indispensable to develop a deeper and quantitative understanding of this hypothetical "neutrino-driven explosion mechanism", whose solid theoretical foundation is still missing. The computational modeling must be done in three dimensions (3D), simulating the whole star, because turbulent flows as well as large-scale deformation play a crucial role in enhancing the neutrino-matter interactions. This requires the solution not only of the fluid dynamics problem in a strong-gravity environment, including a description of the properties of neutron-star matter and of nuclear reactions, but in particular also of the neutrino propagation and processes, which pose a grand computational challenge, because besides the three spatial dimensions there are three additional dimensions for neutrino energy and direction of motion. Not even the biggest existing supercomputers can solve such a six-dimensional, time-dependent transport problem in perfect generality.

Simulation Methods
In this project the Supernova Simulation Group at the Max Planck Institute for Astrophysics (MPA) simulates the gravitational collapse of stellar cores to neutron stars and the onset of the supernova explosion with the PROMETHEUS-VERTEX code for multi-dimensional hydrodynamical simulations, including a highly sophisticated description of three-flavor neutrino transport and neutrino-matter interactions with full energy dependence. While the former is treated with an explicit, higher-order Godunov-type scheme, the latter is solved by an implicit integrator of the neutrino energy and momentum equations, supplemented by a state-of-the-art set of neutrino-reaction kernels and a closure relation computed from a simplified model-Boltzmann equation. Consistent with the basically spherical geometry, the equations are discretized on a polar coordinate grid, and the computational efficiency is enhanced by the use of an axis-free Yin-Yang implementation and a time- and space-variable radial grid. The thermodynamics and changing chemical composition of the stellar medium are determined by high-dimensional equation-of-state tables and nuclear burning at non-equilibrium conditions.

Employing mixed MPI-OpenMP parallelization, our ray-by-ray-plus approximation of multidimensional transport allows for essentially linear scaling up to tested processor-core numbers of more than 130,000. In production applications we are granted access to up to 16,000 processor cores (allowing for two-degree angular resolution) and the code typically reaches 10–15% of the peak performance. The computational load is strongly dominated by the complexity of the neutrino transport. Despite remaining approximations, a single supernova run for an evolution period of roughly half a second takes several months of uninterrupted computing and up to 50 million core hours, producing more than a hundred TBytes of data. PRACE and GAUSS contingents on the SuperMUC infrastructure of LRZ have enabled the successful execution of this important and timely project of fundamental research in theoretical stellar astrophysics.
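To put the quoted core counts into perspective, here is a simple estimate of our own (assuming, purely for illustration, that the ray-by-ray decomposition works with on the order of one angular ray per core; the article does not spell out the actual mapping): a two-degree angular resolution of the full sphere corresponds to

\[
\frac{180^\circ}{2^\circ}\times\frac{360^\circ}{2^\circ}=90\times180=16{,}200\ \text{rays},
\]

which is of the same order as the roughly 16,000 processor cores used in production runs.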


Figure 2: Bubbles of "boiling" gas surrounding the nascent neutron star (invisible at the center). Despite the highly time-variable and dynamical pattern of plumes of hot, rising gas and inflows of cooler matter, the neutrino emission develops a hemispheric asymmetry (see Fig. 1) that remains stable for periods of time much longer than the life time of individual bubbles. © Author and The American Astronomical Society

Results
On the way to producing the first-ever 3D explosion models with a highly sophisticated treatment of the neutrino physics, the MPA team made a stunning and unexpected discovery [1]: The neutrino emission develops a strong dipolar asymmetry (Fig. 1). Neutrinos and their anti-particles are not radiated equally in all directions but with largely different numbers on opposite hemispheres of the neutron star. As expected, the neutrino emission starts out to be basically spherical except for smaller variations over the surface (see Fig. 1, upper left panel). These variations correspond to higher and lower temperatures associated with violent "boiling" of hot matter inside and around the newly formed neutron star, by which bubbles of hot matter rise outward and flows of cooler material move inward (Fig. 2). After a short while, however, the neutrino emission develops clear differences in two hemispheres. The initially small patches merge to larger areas of warmer and cooler medium until the two hemispheres begin to radiate neutrinos unequally. A stable dipolar pattern is established, which means that on one side more neutrinos leave the neutron star than on the other side. Observers in different directions thus receive different neutrino signals. While the directional variation of the summed emission of all kinds of neutrinos is only some per cent (Fig. 3a), the individual neutrino types (for example electron neutrinos or electron antineutrinos) show considerable contrast between the two hemispheres with up to about 20 per cent deviations from the average (Figs. 3b,c).

The directional variations are particularly pronounced in the difference between electron neutrino and antineutrino fluxes (Fig. 1, lower right panel), the so-called lepton number emission.

The possibility of such a global anisotropy in the neutrino emission was not predicted and its finding in the first-ever detailed three-dimensional simulations of dynamical neutron-star formation comes completely unexpectedly. The team of astrophysicists named this new phenomenon "LESA" for Lepton-Emission Self-sustained Asymmetry [1], because the emission dipole seems to stabilize and maintain itself through complicated feedback effects despite the violent bubbling motions of the "boiling" hot and cooler gas, which lead to rapidly changing structures in the flow around and inside the neutron star (Fig. 2).

Outlook
The new, stunning neutrino-hydrodynamical instability that manifests itself in the LESA phenomenon is not yet well understood. Much more research is needed to ensure that it is not an artifact produced by the highly complex numerical simulations. If it is physical reality, this novel effect would be a discovery truly based on the use of modern supercomputing possibilities and not anticipated by previous theoretical considerations.

If LESA happens in collapsing stellar cores, it will have important consequences for observable phenomena connected to supernova explosions. A directional variation between electron neutrino and antineutrino emission will lead to differences of the chemical element production in the supernova ejecta in different directions.

Figure 3: Neutrino emission asymmetry as observable over a period of 0.1 seconds. Analog to Fig. 1, the ellipses show all possible viewing directions. Observers in the red regions see the highest emission, whereas those in the blue areas receive lower emission. While the hemispheric difference of the emission of electron neutrinos plus antineutrinos is only one to two per cent (a), the differences of electron neutrinos (b) and antineutrinos (c) individually amount to up to 20 per cent of the maximum values with extrema on opposite sides. © Author and The American Astronomical Society


Moreover, the global dipolar anisotropy of the neutrino emission carries away momentum and thus imparts a kick to the nascent neutron star in the opposite direction. Also the neutrino signal arriving at Earth from the next supernova event in our Milky Way must be expected to depend on the angle from which we observe the supernova [2,3].

Acknowledgments
The author is grateful for GAUSS and PRACE computing time on LRZ's SuperMUC and for project funding by the European Research Council through grant ERC-AdG No. 341157-COCO2CASA.

References
[1] Tamborra, I., Hanke, F., Janka, H. Th., Müller, B., Raffelt, G. G., Marek, A., Self-sustained asymmetry of lepton-number emission: A new phenomenon during the supernova shock-accretion phase in three dimensions, Astrophysical Journal 792, 96, 2014, http://arxiv.org/abs/1402.5418

[2] Tamborra, I., Hanke, F., Müller, B., Janka, H. Th., Raffelt, G., Neutrino signature of supernova hydrodynamical instabilities in three dimensions, Physical Review Letters 111, 121104, 2013, http://arxiv.org/abs/1307.7936

[3] Tamborra, I., Raffelt, G., Hanke, F., Janka, H. Th., Müller, B., Neutrino emission characteristics and detection opportunities based on three-dimensional supernova simulations, Physical Review D 90, 045032, 2014, http://arxiv.org/abs/1406.0006

Internet Links
Institute web page: www.mpa-garching.mpg.de
Core-collapse supernova data archive: www.mpa-garching.mpg.de/ccsnarchive

• Hans-Thomas Janka

Max Planck Institute for Astrophysics, Garching, Germany

contact: Hans-Thomas Janka, [email protected]

Precision Physics from Simulations of Lattice Quantum Chromodynamics

In the last decade, simulations of Lattice Quantum Chromodynamics (QCD) have reached a new level of precision. By now we are able to compute per mil effects of the particle mass spectrum of QCD, by combining Lattice QCD with Lattice Quantum Electrodynamics (QED). This dramatic increase of precision was made possible by the combination of new and more powerful machines and new lattice actions and simulation algorithms.

Figure 1: Ab initio calculation of the light hadron spectrum of QCD [9]. Blue circles indicate experimental input required to fix the quark masses and overall scale of the calculation. Red points indicate predictions of QCD. Experimental widths are grey, with the central value indicated by a black line.

With these new methods at hand, we have computed the neutron-proton and other mass splittings from first principles [1,2], studied the range of applicability of chiral perturbation theory [3] and extracted the corresponding low energy constants [4]. Furthermore, we have computed the Equation of State of QCD [5], the freeze-out parameters of the cooling quark-gluon plasma produced in heavy ion experiments [6,7], including the flavor dependency of the freeze-out temperature [8].

In the following, we will briefly discuss these different results.

Precision Spectrum of the Standard Model
Moving beyond the per cent level precision of our 2008 calculation of the light hadron spectrum [9] (see also Fig. 1) requires the addition of two formally per cent level effects in the calculation: Quantum Electrodynamics (QED), due to the magnitude of the fine structure constant α ≈ 1/137, and the up/down quark mass splitting (md − mu)/ΛQCD ≲ 0.01 (which are of almost the same magnitude). In particular, including QED poses a range of conceptual issues [10].

Incorporating these effects allows us to calculate the neutron-proton and other mass splittings. This implies calculating a per mil level difference between particle masses that were previously available at per cent precision. However, by making use of statistical correlations between (e.g., neutron and proton) propagators, it is possible to calculate the splitting directly.

A first step in this direction was our result [2] using quenched QED, i.e. neglecting the effect of QED in quark loops. Such a calculation cannot be fully satisfactory.


For our recent result on the mass splittings [1], we took the next step and included these unquenching effects, which required us to develop new simulation techniques and to calculate finite volume effects analytically [10]. The precision of our results is illustrated in Fig. 2.

Figure 2: Mass splittings. Shown are individual masses from Figure 1, with the results [1] on the mass splittings shown in 40x magnification.

Confronting Chiral Perturbation Theory
Chiral perturbation theory (ΧPT) is an effective theory that is used to compute low-energy properties of QCD. It does not describe the dynamics of quarks and gluons, but rather the dynamics of hadrons. Since it is an expansion around zero quark masses (and momenta), it is in principle unclear if it does indeed apply to the physical world, where the quark masses are non-zero. ΧPT has also been used in the past to extrapolate from heavier-than-physical quark masses (due to the costs of simulating with physical parameters) down to the physical mass point.

In 2010, we were the first to compute the quark masses [11] using ensembles with physical quark mass parameters. With these new ensembles that now include the physical point, we could check the applicability of ΧPT for different observables. Furthermore, we could calculate several low-energy constants (LECs) of ΧPT that are not fixed by the theory. This is illustrated in Fig. 3 for one particular LEC, where also the limited applicability of ΧPT for heavy pion masses is clearly visible.

Lattice QCD for heavy Ion Experiments
Heavy ion experiments (such as RHIC at BNL, LHC at CERN or the upcoming FAIR at GSI) require the Equation of State of QCD (EoS) as a central ingredient for the understanding of the evolution of the quark-gluon plasma generated in collisions of heavy nuclei. The only tool available that allows for a calculation of the EoS from first principles is Lattice QCD.

Figure 3: Deviation of the low energy constant B in the RGI scheme from the value at physical mass parameters. Clearly, for simulated pion masses much larger than 300 MeV, the applicability of ΧPT (SU(2) in this case) to (Nf=2+1) QCD is limited.
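For context, and not as a formula quoted from the article: at leading order in SU(2) chiral perturbation theory the LEC B relates the pion mass to the average light quark mass,

\[
M_\pi^2\;\simeq\;2\,B\,m_{ud},
\]

so an approximately mass-independent B at small pion masses signals that the leading-order chiral behaviour is realized, while the growing deviation for pion masses well above 300 MeV is consistent with the limited applicability stated in the caption.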


In 2013 [5] we presented the first full calculation of this central quantity, by carefully balancing and controlling the different uncertainties and taking the continuum limit (vanishing lattice spacing). Recently, our findings were corroborated by the independent hotQCD collaboration, as shown in Fig. 4. This settled a long-standing discrepancy between the collaborations, dating back beyond our first continuum estimate of 2010, and became possible with the most recent results from hotQCD.

Figure 4: Comparison of independent results for the EoS of QCD (taken from Phys. Rev. D90 (2014) 094503 by hotQCD). Shown are our results ("stout") and the results of the hotQCD collaboration ("HISQ").

By studying so-called generalized susceptibilities (derivatives of the partition function with respect to the chemical potentials), it is also possible to directly match Lattice QCD results at finite temperature and chemical potentials to experimental data.

In heavy-ion experiments, the nuclei rarely hit head on, and the "cross-section volume" is not constant on an event-by-event basis. This implies that otherwise conserved quantities like the total electric charge or the baryon number, as measured from the collision products, fluctuate as well. If additional cuts on the experimental data are applied, this experimental setting can be described using a grand-canonical ensemble and thus be simulated using Lattice QCD. The moments of the distributions of the conserved charges found in experiment can then be matched to Lattice QCD results, when appropriate ratios of moments are taken, such that the interaction volume cancels out. In this way it is possible to extract the experimental freeze-out parameters, i.e. the temperature and chemical potential at "chemical freeze-out" (the last inelastic scattering of hadrons before detection).

We calculated these parameters [6] using preliminary data from the STAR collaboration. Using their latest experimental results, our findings for the freeze-out parameters are consistent [7], independent of whether electric charge or baryon number fluctuations were used for the matching. Furthermore, we see indications that hadrons with different quark flavors may have different freeze-out parameters [8], a finding that could be verified experimentally at LHC.

Acknowledgements
We thank all the Members of the Budapest-Marseille-Wuppertal and Wuppertal-Budapest collaborations for the fruitful and enjoyable cooperation.

Our simulations require substantial computational resources. We are indebted to the infrastructure and computing time made available to us by the Gauss Centre for Supercomputing and the John von Neumann Institute for Computing.

Further support for these projects was provided by the DFG grant SFB/TR55, the PRACE initiative, the ERC grant (FP7/2007-2013/ERC No 208740),


the Lendület program of HAS (LP2012-44/2012), the OCEVU Labex (ANR-11-LABX-0060), the A*MIDEX project (ANR-11-IDEX-0001-0) and the GENCI-IDRIS Grand Challenge grant 2012 "StabMat" as well as grant No. 52275. The computations were performed on JUQUEEN and JUROPA at Forschungszentrum Jülich, on Turing at the Institute for Development and Resources in Intensive Scientific Computing in Orsay, on SuperMUC at Leibniz Supercomputing Centre in München, on Hermit and Hornet at the High Performance Computing Center in Stuttgart, and on local machines in Wuppertal and Budapest. Computing time on the JARA-HPC Partition is gratefully acknowledged.

References
[1] Borsanyi, S., et al., Ab initio calculation of the neutron-proton mass difference, Science 347, 1452-1455, 2015.

[2] Borsanyi, S., et al., Isospin splittings in the light baryon octet from lattice QCD and QED, Phys. Rev. Lett. 111, 252001, 2013.

[3] Dürr, S., et al., Lattice QCD at the physical point meets SU(2) chiral perturbation theory, Phys. Rev. D90, 114504, 2014.

[4] Borsanyi, S., et al., SU(2) chiral perturbation theory low-energy constants from 2+1 flavor staggered lattice simulations, Phys. Rev. D88, 014513, 2013.

[5] Borsanyi, S., et al., Full result for the QCD equation of state with 2+1 flavors, Phys. Lett. B730, 99, 2014.

[6] Borsanyi, S., et al., Freeze-out parameters: lattice meets experiment, Phys. Rev. Lett. 111, 062005, 2013.

[7] Borsanyi, S., et al., Freeze-out parameters from electric charge and baryon number fluctuations: is there consistency?, Phys. Rev. Lett. 113, 052301, 2014.

[8] Borsanyi, S., et al., Is there a flavor hierarchy in the deconfinement transition of QCD?, Phys. Rev. Lett. 111, 202302, 2013.

[9] Dürr, S., et al., Ab-Initio Determination of Light Hadron Masses, Science 322, 1224-1227, 2008.

[10] See our contribution "Quarks and Hadrons - and the spectrum in between" to this inside issue.

[11] Dürr, S., et al., Lattice QCD at the physical point: Simulation and analysis details, JHEP 1108 (2011) 148, and Lattice QCD at the physical point: light quark masses, Phys. Lett. B701, 265-268, 2011.

• Zoltan Fodor1, 2, 3
• Stefan Krieg1, 2

1 JSC & JARA-HPC, Forschungszentrum Jülich, Germany
2 University of Wuppertal, Germany
3 Eötvös Lorand University, Hungary

contact: Zoltan Fodor, [email protected]
Stefan Krieg, [email protected]

Turbulence in the Planetary Boundary Layer

The Planetary Boundary Layer (PBL) is the lowest part of the atmosphere, the part that is in contact with the surface and that feels the cycle of day and night. The PBL is important not only for being the place where we live, but also because this relatively shallow layer – about 2 kilometers deep or less – regulates the exchange of mass, momentum and energy between the rest of the atmosphere and the land and the oceans. This exchange across the PBL strongly depends on how turbulence mixes the air. The correct representation of turbulence is then key in weather prediction and climate research, but turbulence remains poorly understood in some relevant PBL regions and regimes. By providing a faithful description of turbulence across all relevant scales, without any turbulence model, Direct Numerical Simulation (DNS) is opening new avenues to advance this understanding. In this brief article, we provide two examples that illustrate this new development.

Turbulence Collapse in the Stable Boundary Layer
One common PBL regime is a wind-driven boundary layer with a stable density stratification. Such a situation develops for instance at night or when a mass of air is advected over a relatively cold surface – layers of heavy fluid form below layers of lighter fluid due to the heat loss towards the earth's surface.

Figure 1: Horizontal cross-sections near the surface showing the magnitude of the angular velocity of the fluid particles (this angular velocity is one of the defining properties of turbulence), obtained from a DNS of an Ekman layer using a grid size 3072x6144x512. In a neutrally stratified case (left panel), turbulence is distributed homogeneously everywhere. In contrast, during turbulence collapse in a stably stratified case (right panel), the morphology is completely different: turbulence is reduced to turbulence pockets that are embedded in an otherwise smooth variation of flow properties – turbulence is intermittent in space.


Stable stratification weakens the turbulence as heavy fluid particles are lifted and kinetic energy is converted into potential energy. If strong enough, stable stratification can even induce a turbulence collapse: the flow becomes laminar (smooth, without turbulent fluctuations) and the mixing rates across the PBL drop significantly. The understanding and the correct representation in atmospheric models of such a process is a long-standing issue.

So far, simulations could not reach into the strongly stratified regime, and the analysis of turbulence collapse has been based on field measurements, to a large extent [1]. This paradigm has changed. Using DNS and a stably stratified Ekman layer as a simplified physical model of the PBL, we have succeeded in reproducing, in a single configuration, for the first time, the three stratification regimes found in nature: weakly, intermediately and strongly stratified [2].

An Ekman layer is a boundary-layer type of flow that results from the balance between the Coriolis force and the pressure force, along with the boundary condition that the flow at the surface has the same velocity as the surface itself (the so-called no-slip boundary condition). The density difference between the surface and the free flow above the boundary layer can also be controlled. This physical model is fully characterized by only two non-dimensional parameters: the Reynolds number, which measures the relative strength between the inertia forces and the viscous forces, and the Richardson number, which measures the relative strength between the buoyancy forces, caused by the density variations, and the inertia forces. By systematically varying this second parameter, we can reproduce the different regimes found in nature (see Fig. 1). This work has demonstrated that DNS has become a suitable tool to study turbulence collapse in the PBL under controlled conditions.
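Schematically, and using generic textbook definitions rather than the article's exact conventions (which are tied to the Ekman-layer scales), the two control parameters compare the forces described above,

\[
\mathrm{Re}\sim\frac{U\,L}{\nu},\qquad \mathrm{Ri}\sim\frac{\Delta b\,L}{U^{2}},
\]

where U and L are a characteristic velocity and length scale of the flow, ν is the kinematic viscosity, and Δb is the buoyancy difference imposed between the surface and the flow aloft. Increasing the Richardson number at fixed Reynolds number strengthens the stratification and pushes the flow towards the intermittent and, eventually, laminar states illustrated in Fig. 1.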

Figure 2: Vertical cross-sections showing the liquid content (mass of the liquid divided by the total mass of the fluid mixture) and the buoyancy, obtained from a DNS of a cloud-top mixing layer using a grid size 3072x3072x2216. The numbers within parentheses correspond to the first research flight of the DYCOMS-II field campaign, used as reference in the DNS study. The turbulence is driven by the convective instability caused by the evaporation of the droplets near the cloud interface.


As a first application, we have shown that turbulence collapse need not be an on-off process in time but can occur intermittently in space without the need of external factors that induce such an intermittency, like surface heterogeneity – intermittency is intrinsic to stably stratified turbulence. It suffices that wave-like, large-scale structures (several tens of times the boundary-layer depth) have space and time to develop (see Fig. 1). This result helps explain the difficulty to obtain spatial intermittency in simulations, because we need to retain these large scales and, simultaneously, we need to resolve the small-scale turbulence inside the turbulence regions.

Catalytic Effect of Wind Shear at the Cloud Top
Another example of the relevance of turbulence in the PBL is the dependence of marine-boundary clouds on the representation of turbulent mixing: Differences among climate models in the representation of this dependence explain about half of the model spread in the surface temperature response to increasing CO2 [3]. Part of the difficulty lies in representing correctly entrainment, or, more generally, small-scale turbulence at the PBL top. Part of the difficulty consists as well in understanding better the role of turbulence in clouds [4].

DNS allows us to address several aspects of this problem. For instance, we can disentangle key aspects of the role of the stratocumulus-top cooling that is caused by droplet evaporation. This process works as follows (see also Fig. 2). The cloud interface lies in a relatively thin layer across which the buoyancy rapidly increases from the in-cloud value (in white, corresponding to relatively cold fluid) to the tropospheric value (in red, corresponding to relatively warm fluid). As the cloud mixes with the dry air above it, droplets evaporate and tend to cool down the resulting mixture, as indicated in Fig. 2 by the blue colors in the buoyancy field. For some thermodynamic conditions, like in this example, this evaporative cooling is strong enough to render parcels of fluid colder, and thus heavier, than the in-cloud value. This condition is referred to as buoyancy reversal and leads to convective instability – heavier fluid on top of lighter fluid – and hence turbulence. The key question is: how important is this instability for the cloud-top dynamics?

By deriving an explicit parametrization of the mixing rates from the analysis of DNS data, it was possible to show that, although necessary, buoyancy reversal is not a sufficient condition for a rapid desiccation of the cloud. The reason is that the turbulence generated by buoyancy reversal alone is limited by molecular transport, and that is a very slow process compared with other processes acting at the cloud-top [5]. This work settled a three-decades-long discussion on the topic.

Still, evaporative cooling can become important when enhanced by other turbulence sources. DNS analysis has demonstrated that wind shear localized at the cloud-top (a vertical gradient of the horizontal velocity), which is often found in nature, can serve as such a source [6]. The reason is that turbulence generated by buoyancy reversal can locally thin the entrainment zone and thereby enhance the local shear, which in turn enhances mixing and the evaporation of more droplets, creating a positive feedback.


The implication of this finding is twofold. First, evaporative cooling needs indeed to be retained in the analysis of stratocumulus-topped PBLs, but always in combination with other processes. Second, wind shear needs to be added to that analysis.

References
[1] Mahrt, L., Stably stratified atmospheric boundary layers, Annu. Rev. Fluid Mech. 46, 23-45, 2014.

[2] Ansorge, C., Mellado, J. P., Global intermittency and collapsing turbulence in the stratified planetary boundary layer, Boundary-Layer Meteorol., 153, 89-116, 2014.

Outlook
We have presented two examples of how DNS is helping us to advance our understanding of turbulence inside the planetary boundary layer. By faithfully representing the flow properties across all the relevant spatial and temporal scales, DNS has led, in some cases, to new insights into the mechanisms that control the PBL dynamics. In some other cases, it has settled long-standing discussions. The idea of using DNS as a research tool to investigate this type of problem was already explored in the early seventies [7], but the computational resources were not yet there – now they are here. The possibilities that are emerging with the current generation of supercomputers, and that will further emerge during the coming decades, are opening new avenues in weather and climate research.

[3] Sherwood, S. C., Bony, S., Dufresne, J. L., Spread in model climate sensitivity traced to atmospheric convective mixing, Nature 505, 37-42, 2014.

[4] Bodenschatz, E., Malinowski, S., Shaw, R., Stratmann, F., Can we understand clouds without turbulence?, Science 327, 970-971, 2010.

[5] Mellado, J. P., The evaporatively driven cloud-top mixing layer, J. Fluid Mech., 660, 5-36, 2010.

[6] Mellado, J. P., Stevens, B., Schmidt, H., Wind shear and buoyancy reversal at the top of stratocumulus, J. Atmos. Sci., 71, 1040-1057, 2014.

they are here. The possibilities that are [7] Fox, D. G., Lilly, D. K. emerging with the current generation Numerical simulation of turbulence flows of supercomputers, and that will Rev. Geophys. Space Phys., 10, 51-72, 1972. • Juan Pedro further emerge during the coming Mellado decades, are opening new avenues in weather and climate research. Max-Planck- Institute for We gratefully acknowledge the Gauss Meteorology, Centre for Supercomputing (GCS) for Germany providing computing time through the John von Neumann Institute for Com- puting (NIC) on the GCS share of the supercomputer JUQUEEN at Jülich contact: Juan Pedro Mellado, Supercomputing Centre (JSC). [email protected]

The strong Interaction at Neutron-rich Extremes

The microscopic understanding of atomic nuclei and of high-density matter is a very challenging task. Powerful many-body simulations are required to connect the observations made in the laboratory to the underlying strong interactions between neutrons and protons, which govern the properties of nuclei and of strongly interacting matter in the universe. Renewed interest in the physics of nuclei is driven by discoveries at rare isotope beam facilities worldwide, which open the way to new regions of exotic, neutron-rich nuclei, and by astrophysical observations and simulations of neutron stars and supernovae, which require controlled constraints on the equation of state of high-density matter. Fig. 1 shows the substantial region of exotic nuclei that will be explored at the future FAIR facility in Darmstadt.

The nuclear many-body problem involves two major challenges. The first concerns the derivation of the strong interaction between nucleons (neutrons and protons), which is the starting point of few- and many-body ab initio calculations. Since nucleons are not elementary particles, but composed of quarks and gluons, the strong interaction has a very complex structure. Although it is becoming possible to study systems of few nucleons directly based on quarks and gluons, the fundamental degrees of freedom of Quantum Chromodynamics (QCD), high-precision calculations of nuclei based on quarks and gluons will not be feasible in the foreseeable future. As a systematic approach, chiral effective field theory (EFT) allows one to derive nuclear forces in terms of low-energy degrees of freedom, nucleons and pions, based on the symmetries of QCD [1]. The chiral EFT framework provides a systematically improvable Hamiltonian and explains the hierarchy of two-, three-, and higher-body forces. The presence of such many-body forces is an immediate consequence of strong interactions. In particular, the computation and inclusion of three-nucleon (3N) forces in many-body calculations is one of the current frontiers [2].

The second challenge concerns the practical solution of the many-body problem based on nuclear forces. Since the computational complexity grows significantly with the number of particles, up to about 10 years ago the scope of ab initio calculations was limited to light nuclei up to around carbon (with nucleon number A=12).

Figure 1: Chart of atomic nuclei. The black points mark the stable nuclei, while the grey region denotes all nuclei known to date. Of the latter, the points in purple have been studied with mass measurements at the GSI Helmholtzzentrum für Schwerionenforschung Darmstadt. The blue points mark the exotic nuclei that will be explored at the future FAIR facility. Figure courtesy Y. Litvinov (GSI).


Figure 2: Contour plots of RG-evolved 3N interactions as a function of three-body hypermomenta (see Ref. [5] for details). Shown is the systematic decoupling of low- and high-momentum parts in the interaction as the RG resolution scale λ is lowered from left to right. (Figure taken from Ref. [5]).

Due to advances on several fronts and also due to rapidly increasing computing power, this limitation has nowadays been pushed to much heavier systems (see, e.g., the recent work of Ref. [3]). One key step was the development of powerful renormalization group (RG) methods that allow one to systematically change the resolution scale of nuclear forces [4]. Such RG transformations lead to less correlated wave functions at low resolution, and the many-body problem becomes more perturbative and tractable.

Our work focuses on the derivation of RG-evolved interactions and electroweak operators, the inclusion of chiral 3N forces in many-body calculations of extreme neutron-rich nuclei, and on the development of Quantum Monte Carlo simulations with chiral EFT interactions, which open up nonperturbative benchmarks for high-density matter. These calculations enable us to explore the formation of structure in exotic nuclei, the properties of neutron-rich nuclei and matter that play a key role in the synthesis of heavy elements in the universe, as well as the nuclear physics involved in applications to fundamental symmetries, e.g., for the nuclear matrix elements of neutrinoless double-beta decay that probes the nature and mass scale of the neutrinos.

RG Evolution of nuclear Interactions
The convergence behavior and the required computational resources of many-body calculations for a given nucleus are governed by the properties of the employed nuclear forces. It is convenient to visualize nuclear interactions as a function in momentum space, where low momenta correspond to large interparticle distances and high momenta to short-range correlations. In general, a strong coupling of low- and high-momentum parts in nuclear interactions induces strong virtual excitations of particles and implies a poor perturbative convergence and large required basis spaces for the solution of the many-body Schrödinger equation. The similarity renormalization group (SRG) allows one to systematically decouple low-momentum physics from high-momentum details via a continuous sequence of unitary transformations that suppress off-diagonal matrix elements, driving the Hamiltonian towards a band-diagonal form [4]. This decoupling is illustrated in Fig. 2 on the basis of a representative chiral 3N interaction.
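Written out explicitly, in the notation commonly used in the SRG literature (added here for concreteness; the article itself does not display the equation), the continuous sequence of unitary transformations is generated by the flow equation

\[
\frac{dH_s}{ds}=\big[\eta_s,H_s\big],\qquad \eta_s=\big[T_{\mathrm{rel}},H_s\big],
\]

where s labels the flow, the relative kinetic energy T_rel is a common choice of generator, and the resolution scale is usually quoted as λ = s^{-1/4}. As s grows, off-diagonal matrix elements in momentum space are suppressed, which produces the band-diagonal structure shown in Fig. 2.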

Computationally, the SRG evolution of NN interactions is straightforward and can be performed on a local computer. However, when evolving nuclear interactions to lower resolution, it is inevitable that higher-body interactions are induced even if initially absent. This might be considered unnatural if nuclei could be accurately calculated based on only NN interactions. However, chiral EFT reveals the natural scale and hierarchy of many-body forces, which dictates their inclusion in calculations of nuclei and nuclear matter. In fact, the importance of 3N interactions has been demonstrated in many different calculations [2]. The RG evolution of 3N forces is computationally challenging since typical dimensions of interaction matrices in a momentum-space partial-wave representation can reach about 10⁴ to 10⁵. This means that the required memory for storing a single interaction matrix in double precision can reach about 40 GB. For the solution of the RG flow equations it is necessary to evaluate matrix products of this dimension efficiently. Since numerical solvers of differential equations need several copies of the solution vector for a stable and efficient evolution, a distributed storage of all matrices and vectors is mandatory. For an efficient evaluation of large matrix products we have employed a hybrid OpenMP/MPI strategy for our implementation.

3N Forces and Neutron-rich Nuclei
Nuclei with a certain number of protons and neutrons are observed to be particularly well-bound. These closed-shell or "magic" nuclei form the basis of the nuclear shell model, which is a key computational method in nuclear physics. Exploring the formation of shell structure and how these magic configurations evolve with nucleon number towards the limits of the nuclear chart is a frontier in the physics of nuclei, and the microscopic understanding from nuclear forces represents a major challenge. The theoretical shortcomings in predicting shell structure are particularly evident in the calcium isotopes. While microscopic calculations with well-established NN forces reproduce the standard magic numbers N = 2, 8, 20, they do not predict 48Ca as a doubly-magic nucleus when neutrons are added to 40Ca. As a result, phenomenological forces have been adjusted to render 48Ca doubly magic, and it has been argued that the need for these phenomenological adjustments may be largely due to neglected 3N forces. In recent work, we have shown that 3N forces play a decisive role in medium-mass nuclei and are crucial for the magic number N = 28 [6]. For the calcium isotopes, the predicted behavior of the two-neutron separation energy S2n up to 54Ca is in remarkable agreement with precision mass measurements of the ISOLTRAP collaboration at ISOLDE/CERN using a new multi-reflection time-of-flight mass spectrometer, as shown in Fig. 3.

Figure 3: Two-neutron separation energy S2n of the neutron-rich calcium isotopes as a function of neutron number N (see Ref. [7] for details). Shown are the new experimental results from the ISOLTRAP collaboration (red squares) in comparison with our predictions based on chiral NN+3N interactions (blue line), as well as shell-model (KB3G and GXPF1A) and coupled cluster (CC) calculations. (Figure taken from Ref. [7]).


3N Forces and Neutron-rich Nuclei
Nuclei with a certain number of protons and neutrons are observed to be particularly well-bound. These closed-shell or "magic" nuclei form the basis of the nuclear shell model, which is a key computational method in nuclear physics. Exploring the formation of shell structure and how these magic configurations evolve with nucleon number towards the limits of the nuclear chart is a frontier in the physics of nuclei, and the microscopic understanding from nuclear forces represents a major challenge. The theoretical shortcomings in predicting shell structure are particularly evident in the calcium isotopes. While microscopic calculations with well-established NN forces reproduce the standard magic numbers N = 2, 8, 20, they do not predict 48Ca as a doubly-magic nucleus when neutrons are added to 40Ca. As a result, phenomenological forces have been adjusted to render 48Ca doubly magic, and it has been argued that the need for these phenomenological adjustments may be largely due to neglected 3N forces. In recent work, we have shown that 3N forces play a decisive role in medium-mass nuclei and are crucial for the magic number N = 28 [6]. For the calcium isotopes, the predicted behavior of the two-neutron separation energy S2n up to 54Ca is in remarkable agreement with precision mass measurements of the ISOLTRAP collaboration at ISOLDE/CERN using a new multi-reflection time-of-flight mass spectrometer, as shown in Fig. 3. The new 53,54Ca masses are in excellent agreement with our NN+3N predictions and unambiguously establish N = 32 as a shell closure. This work was published with the ISOLTRAP collaboration in Nature [7].

Figure 3: Two-neutron separation energy S2n of the neutron-rich calcium isotopes as a function of neutron number N (see Ref. [7] for details). Shown are the new experimental results from the ISOLTRAP collaboration (red squares) in comparison with our predictions based on chiral NN+3N interactions (blue line), as well as shell-model (KB3G and GXPF1A) and coupled cluster (CC) calculations. (Figure taken from Ref. [7]).

Since it is not possible to solve the many-body problem exactly for general medium-mass nuclei, valence-space methods utilize a factorization of nuclei into a core and valence nucleons that occupy a truncated single-particle space above the core. The interactions of particles in this valence space are computed microscopically in many-body perturbation theory (MBPT), whereas the primary computation lies in the self-consistent evaluation of a large number of one- and two-body diagrams. The resulting effective Hamiltonian can then be diagonalized exactly and, within certain limits, reproduces the exact eigenvalues.

Quantum Monte Carlo Simulations with chiral EFT Interactions
Quantum Monte Carlo (QMC) methods have been proven to be a very powerful tool for studying light nuclei and neutron matter [8]. In Refs. [9,10,11], we have presented first QMC calculations based on chiral NN interactions. This was not possible before due to nonlocalities in chiral EFT interactions. However, it is possible to remove all sources of nonlocality in nuclear forces up to next-to-next-to-leading order (N2LO) in the chiral expansion. This enables us to perform auxiliary-field diffusion Monte Carlo (AFDMC) calculations for the neutron matter equation of state up to nuclear saturation density based on local leading-order (LO), next-to-leading order (NLO), and N2LO NN interactions. Our results exhibit a systematic order-by-order convergence in chiral EFT and provide nonperturbative benchmarks with theoretical uncertainties. For the softer interactions, perturbative calculations are in excellent agreement with the AFDMC results, as shown in Fig. 4. These advances also opened up first Green's Function Monte Carlo calculations of light nuclei based on chiral NN interactions [11]. Presently, we are working on the implementation of the leading 3N forces in QMC simulations.

Figure 4: Neutron matter energy per particle E/N as a function of density n, based on AFDMC and MBPT calculations (see Ref. [10] for details). The three panels show results using local N2LO NN interactions with different resolution scales R0 = 1.0, 1.1, and 1.2 fm (the latter being the softest interaction). For the MBPT results, the Hartree-Fock energies as well as the energies at second and third order are shown. The width of the bands provides a measure of the theoretical uncertainty. For the softer interactions (right panel), the perturbative calculations are in excellent agreement with the AFDMC results. (Figure taken from Ref. [10]).

This paves the way for QMC calculations with systematic chiral EFT interactions for nuclei and nuclear matter, for testing the perturbativeness of different orders, and also allows for matching to lattice QCD results in a finite volume.

The QMC methods we use in our calculations treat the Schrödinger equation as a diffusion equation in imaginary time and project out the ground-state wave function from a trial wave function by evolving to large imaginary times. GFMC performs, in addition to a stochastic integration over the particle coordinates, explicit summations in spin-isospin space, and is thus very accurate but computationally very costly, so that one can only access particle numbers with A ≤ 12. In contrast, AFDMC also stochastically evaluates the summations in spin-isospin space and shows a better scaling behavior at the cost of less accuracy. We can thus simulate 66 fermions in our neutron matter calculations. For our QMC simulations we typically average over 5-10k walkers for several thousand time steps. Since we use independent walkers, the code is easily parallelizable and shows an excellent and almost linear scaling behavior with the number of cores. Typically, we use 200-400 cores. In contrast to the SRG transformations, we have only moderate memory requirements of typically 1 GB per core.
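The near-linear scaling with independent walkers can be pictured with the following toy C sketch; it is an illustration only, not the GFMC/AFDMC codes. Each MPI rank propagates its own walkers with a placeholder random-walk kernel, and a single reduction averages the estimator at the end; all names and parameters are assumptions.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy model of walker-parallel QMC: every rank owns an independent set of
 * walkers, propagates them for a number of imaginary-time steps, and the
 * estimator is averaged over all walkers with one reduction. The
 * "propagation" below is a stand-in random walk, not a physics kernel. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int walkers_per_rank = 100;  /* thousands of walkers in practice */
    const int time_steps = 1000;       /* several thousand in practice     */

    srand(1234u + (unsigned)rank);     /* independent streams per rank     */

    double local_sum = 0.0;
    for (int w = 0; w < walkers_per_rank; w++) {
        double x = 0.0;                /* walker "configuration"           */
        for (int t = 0; t < time_steps; t++)
            x += rand() / (double)RAND_MAX - 0.5;
        local_sum += x * x;            /* placeholder estimator            */
    }

    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("estimate = %f\n", global_sum / (walkers_per_rank * size));

    MPI_Finalize();
    return 0;
}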

Acknowledgement
Our results could not have been achieved without an allocation of computing resources at the Jülich Supercomputing Centre. We are grateful to the John von Neumann Institute for Computing for selecting the present project as 2014 Excellence Project, and thank our collaborators on the JUROPA project, in particular Alexandros Gezerlis, Jason Holt, Javier Menéndez, and Johannes Simonis. This work was supported in part by the ERC Grant No. 307986 STRONGINT.

References
[1] Epelbaum, E., Hammer, H. W., Meißner, U.-G.: Modern theory of nuclear forces, Rev. Mod. Phys. 81, 1773, 2009.
[2] Hammer, H. W., Nogga, A., Schwenk, A.: Three-body forces: from cold atoms to nuclei, Rev. Mod. Phys. 85, 197, 2013.
[3] Binder, S., Langhammer, J., Calci, A., Roth, R.: Ab initio path to heavy nuclei, Phys. Lett. B 736, 119, 2014.
[4] Furnstahl, R. J., Hebeler, K.: New applications of renormalization group methods in nuclear physics, Rept. Prog. Phys. 76, 126301, 2013.
[5] Hebeler, K.: Momentum space evolution of chiral 3N interactions, Phys. Rev. C 85, 021002, 2012.
[6] Holt, J. D., Otsuka, T., Schwenk, A., Suzuki, T.: Three-body forces and shell structure in calcium isotopes, J. Phys. G 39, 085111, 2012.
[7] Wienholtz, F., et al.: Masses of exotic calcium isotopes pin down nuclear forces, Nature 498, 346, 2013.
[8] Carlson, J., et al.: Quantum Monte Carlo methods for nuclear physics, arXiv:1412.3081.
[9] Gezerlis, A., et al.: Quantum Monte Carlo calculations with chiral effective field theory interactions, Phys. Rev. Lett. 111, 032501, 2013.
[10] Gezerlis, A., et al.: Local chiral effective field theory interactions and quantum Monte Carlo applications, Phys. Rev. C 90, 054323, 2014.
[11] Lynn, J. E., et al.: Quantum Monte Carlo calculations of light nuclei using chiral potentials, Phys. Rev. Lett. 113, 192501, 2014.

• Kai Hebeler
• Achim Schwenk
• Ingo Tews

Institut für Kernphysik, Technische Universität Darmstadt, ExtreMe Matter Institute EMMI, GSI, Germany

contact: Achim Schwenk, [email protected]

Lattice QCD as Tool for Discoveries

With the discovery of the Higgs particle the Standard Model of particle physics has once again passed a crucial experimental test. Already before this discovery the gauge sector of the theory, which describes the electromagnetic, weak and strong interactions, was tested far beyond any reasonable doubt. Now that also the Standard Model mechanism for electroweak symmetry breaking (through the Higgs mechanism) has been proven to be correctly described, the Standard Model is established even further as one of the great cultural achievements of the 20th century. Consequently, the task of the 21st century is to clarify the physics Beyond the Standard Model (BSM), which is already known to exist (e.g., cosmologically relevant entities like dark matter, dark energy, inflatons etc. are not part of the Standard Model) but for which the fundamental theory responsible has not been determined. For this new endeavor the task is no longer to test the Standard Model but to achieve the highest possible accuracy in Standard Model calculations and thus to maximize the potential to uncover BSM physics in, e.g., the LHC experiments at CERN. In fact, success depends equally crucially on experimental progress as on a continuous improvement of theory. As the physics of quarks and gluons (QCD: Quantum Chromodynamics) is by far the most difficult part of the Standard Model, progress in QCD is especially important. Fig. 1 shows an attempt to illustrate the complexity of this task. What is shown is a schematic illustration of a proton. The properties of a proton are dominated by quantum effects. Therefore, it is a much too strong simplification to say that the proton is built up from three quarks. Instead, its properties depend very strongly on effects like the production of virtual quark-antiquark pairs and gluons. The resulting complex quantum state has a definite mass, charge and spin, but how these quantities split into individual contributions of quarks and gluons can only be answered by taking the statistical average over all possible quantum states. At the LHC protons are collided with extremely high energy, such that due to relativistic time dilatation the result of an individual collision depends on what one might want to call a snapshot of this complex state, making the task to fully understand the "underlying event" a formidable one.

Many highly sophisticated QCD techniques have been developed over the last decades for this purpose, and many aspects of perturbative and non-perturbative nature can be treated satisfactorily by analytic means. However, there exist also many important non-perturbative quantities which can only be calculated (at least up to now) by computationally expensive numerical simulations using Lattice QCD (LQCD). This is the field we are working in.

Linking QCD to purely statistical Calculations
The fundamental observation, which is the basis of LQCD, is that the analytic continuation of time to imaginary time allows one to map quantum field theory onto thermodynamics and statistics.


Figure 1: Attempt to illustrate the complex quantum state of a proton. Figure taken from [1].

This allows for a purely statistical treatment of otherwise untreatably complicated QCD problems. Discretization of space-time, i.e. introducing a "lattice" of space-time points, reduces the number of degrees of freedom of the corresponding statistical problems to a very large but finite number. Finally, Monte Carlo techniques allow one to generate ensembles of representative gauge field configurations. Calculating physical observables in QCD is then reduced to calculating suitable expectation values on these ensembles. For large ensembles with realistic quark masses, which allow one to reach the needed precision, the last two steps need supercomputer resources like SuperMUC. To describe any given experiment one always needs a combination of analytic and numerical techniques. However, the required analytic tools have reached such a high precision that the information needed from LQCD became the main source of uncertainty. The LQCD community has promised this vital information already many years ago, but fulfilling this promise has again and again turned out to be much more difficult than anticipated. We have now added a new chapter to this epic story.

Hadronic Structure
As argued above, a hadron like the proton, i.e. a quark-gluon bound state, has a truly mind-boggling complexity, see Fig. 1. QCD combines relativistic quantum field theory and strongly-coupled nonlinear dynamics. The task of deducing the incredibly complex internal structure of a hadron from the debris of a high energy particle collision "long" (relatively speaking) after the collision is over sounds like a completely hopeless enterprise. Still, the standards now reached by modern hadron physics mean that this is done successfully every day.


Over the years an ever longer list of quantities which provide information about the internal structure of hadrons in terms of their fundamental constituents (quarks and gluons) has been introduced and studied in great detail by the international physics community. For these quantities the crucially needed lattice input can be organised in the form of matrix elements of the type

⟨hadron 1 | specific quark-gluon operator | hadron 2⟩,

where hadron 1 and hadron 2 can be identical. The matrix element describes the transition from hadron 1 into hadron 2 via an interaction represented by the quark-gluon operator. The task of LQCD is to determine a large number of such matrix elements. Some of these are known experimentally to high precision. These provide valuable test cases which can be used to demonstrate how well the systematics involved in a lattice calculation are under control. This is used to guide estimates of the precision that can be obtained for those matrix elements which are not yet known experimentally. Unfortunately, so far LQCD has failed to reproduce the experimental results for two such test cases: the second moment of the isovector parton distribution function of the nucleon, denoted ⟨x⟩(u-d), which gives information about the fraction of the nucleon momentum carried by the individual quarks, and the isovector axial-vector coupling of the nucleon, denoted gA, which is associated with the beta decay of a neutron into a proton. These quantities are typical examples of hadron structure observables, such that failing to calculate them reliably is a most serious reason for concern.

Figure 2: Our new results for ⟨x⟩(u-d) (dark green circles) in comparison to results from other collaborations and experimental data (black symbols), plotted against mπ² [GeV²]. Please note the reduced parameter range plotted, resulting in optically large error bars. Figure taken from [2].

Figure 3: Our new results for gA (red squares) and the ratio of gA divided by the pion decay constant, labeled as gA/Fπ, plotted against mπ² [GeV²]. Please note the reduced parameter range plotted, resulting in optically large error bars. Figure taken from [3].
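For orientation, the two observables singled out above are conventionally defined as follows; these are the standard textbook conventions in the usual PDF and current notation, not formulas quoted from the article itself:

\langle x \rangle_{u-d} = \int_0^1 \mathrm{d}x \; x \left[ u(x) + \bar{u}(x) - d(x) - \bar{d}(x) \right],
\qquad
\langle p \,|\, \bar{u}\,\gamma^{\mu}\gamma_5\, d \,|\, n \rangle = g_A \, \bar{u}_p\,\gamma^{\mu}\gamma_5\, u_n \quad \text{(at vanishing momentum transfer)}.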


Our new results for ⟨x⟩(u-d) and gA
Lattice results are assessed in terms of the associated (Monte Carlo) statistical and systematic errors. The goal of LQCD is to simulate QCD using quarks with physical masses on a lattice with a large enough physical volume and small enough lattice spacing that finite volume and discretisation effects are not significant. However, the numerical cost of the simulations grows very rapidly as the quark mass is decreased towards the physical value while keeping the physical volume large enough. This leaves any lattice collaboration with basically two options. One can simulate with unphysically large quark masses and then extrapolate to the physical ones. The advantage of this option is that for a given amount of computing time one gets much smaller statistical errors. The disadvantage is that there are systematic uncertainties introduced by the mass extrapolation. Alternatively, one can simulate at or close to the physical masses but then has to live with (other) large systematic uncertainties unless one has access to very large computer resources. Most collaborations do not have the possibility to choose the second approach. However, thanks to QPACE, our home-made supercomputer, and SuperMUC, we had the computing resources required. Our results for the momentum fraction ⟨x⟩(u-d) and the coupling gA are shown in Figs. 2 and 3. The results are plotted against the square of the pion mass, which is proportional to the u/d quark mass.

The message of both figures is clear: as the quark mass is decreased, the experimental results can be reproduced with an error of around 10%. However, to achieve greater precision more work has to be done. Our new data at low pion mass achieve an overall uncertainty of less than 5% but significantly disagree with the phenomenological values. For ⟨x⟩(u-d) precision results are likely to require simulations with smaller lattice spacings to have better control of the continuum limit. Unfortunately, with the standard lattice formulation we used so far, simulations on finer lattices are unacceptably expensive, such that one needs a new approach. The only promising formulation proposed so far is that used by the CLS collaboration. We have joined this collaboration and have already started large scale simulations. For gA the remaining discrepancy is caused by finite volume effects, which can be corrected for with more computer time by simulating on larger space-time lattices.

While we obviously would have preferred to determine many different physics quantities already now with high precision, this story provides a nice illustration of the fundamentals of science research: one should never fall prey to wishful thinking but should always work on improving the precision achieved and be open to identifying problems which were previously not apparent.

References
[1] http://uni-hamburg.cfel.de/images/content/e8/e76/index_eng.html
[2] Bali, G. S., et al.: Physical Review D 91, 5, 054501, arXiv:1412.7336, 2015.
[3] Bali, G. S., et al.: Physical Review D 90, 7, 074510, arXiv:1408.6850, 2014.

• Andreas Schäfer

Institute for Theoretical Physics, Regensburg University, Germany

contact: Andreas Schäfer, [email protected]

LIKWID Performance Tools

LIKWID is a set of performance-related open source command line tools targeting X86 processors. It was initially presented in late 2009. In the inside spring edition 1, 2010, we presented it to a wider audience. Since that time a lot has happened, and we think it is time to report on the latest LIKWID developments. LIKWID is developed at the Erlangen Regional Computing Centre (RRZE). Since July 2013, LIKWID is officially funded as part of the BMBF FEPA project. FEPA is a joint effort together with LRZ-BADW and NEC Deutschland to enable system-wide application profiling on large-scale HPC cluster systems. LIKWID was downloaded several thousand times, and it has found its place in the tooling community as a set of simple-to-use command line tools with a unique feature set. Many HPC centers offer LIKWID as part of their standard tool set. With regard to functionality, LIKWID makes it possible to get information about compute nodes, measure various profiling data sources on processors (e.g., HPM data, RAPL energy counters), control thread/core affinity and perform microbenchmarking. The current stable release is version 3.1.3. We are currently in the process of finishing the next major release, LIKWID 4, which includes major changes in the internal software architecture of LIKWID. The command line applications in LIKWID 4 are implemented in the Lua scripting language and are based on a common C library API. This API is designed to also be used by other tools building on LIKWID functionality.

In its initial version, LIKWID consisted of three command line tools:
• likwid-topology – shows all topology information on a compute node that is relevant for software developers.
• likwid-pin – controls thread to core affinity.
• likwid-perfctr – enables measurements of hardware performance monitoring data on X86 processors.

Since that time several other tools were added:
• likwid-bench – a microbenchmarking framework and application allowing rapid prototyping of threaded assembly kernels.
• likwid-mpirun – enables simple and flexible pinning of MPI and MPI/threaded hybrid applications with integrated likwid-perfctr support.
• likwid-powermeter – a tool for accessing RAPL counters (power and energy consumption) and for querying Turbo mode steps on Intel processors.
• likwid-memsweeper – a tool to eliminate file system buffer cache from ccNUMA domains.
• likwid-setFrequencies – a tool to set specific processor clock frequencies.

What sets LIKWID apart from many other tools in this area is that it implements all functionality on its own in user space instead of relying on the Linux perf_event interface like most other tools in the Hardware Performance Monitoring (HPM) sector. This allows us to support new processors quickly without the need to install a new Linux kernel. As a consequence, LIKWID was one of the first tools offering usable access to the Uncore HPM units on newer Intel processors. For a long time LIKWID used the "cpuid" instruction to query node topology information. Starting with LIKWID 4 we decided to rely on "hwloc" as an alternative backend for topology information.

This enables a more robust affinity interface and also allows us to port LIKWID to non-X86 architectures more easily in the future. In the following we want to focus on a few features that let LIKWID stand out against other tools.

Thread Group Concept
The numbering of processor IDs as provided by the Linux OS is tedious to use in practice, since there are no fixed rules for how they should be mapped to the node topology. To alleviate this problem LIKWID provides "logical" processor IDs and the concept of "thread groups". A thread group is a topological entity shared by several processors. The thread group syntax is used throughout LIKWID to specify processor lists. In LIKWID a thread group is indicated by single capital letters: N for node, S for socket, M for memory domain, C for last level cache. There are two syntax variants for specifying thread groups, a list-based one and an expression-based one. Here is an example using the list-based syntax to pin an OpenMP application:

likwid-pin -c S0:0-3@S1:0-3 ./a.out

The above command will set OMP_NUM_THREADS to 8 in case it is not yet set from outside and place the first four threads on the first four physical cores on socket 0 and the second four threads on the first four physical cores on socket 1. Using the list-based syntax LIKWID deals with SMT threads using a "physical cores first" policy, which makes it easy to ignore the SMT feature. This means that on a socket with four cores having two SMT threads per core, the IDs 0-3 are hardware threads on the first four distinct physical cores and IDs 4-7 are their counterparts. As can be seen in the example, multiple thread group expressions can be chained using the "@" character.

As an alternative, LIKWID also supports an expression-based syntax. The following could be used, e.g., on the Intel Xeon Phi:

likwid-pin -c E:N:30:2:4 ./a.out

This command will set OMP_NUM_THREADS to 30 using two threads per core (out of four possible SMT threads). Such a placement would be difficult to realize using the list-based syntax. It is also more convenient to use in benchmarking scripts. LIKWID thread groups offer a consistent and flexible interface for expressing node topology domains using a future-proof unified command line syntax.

Performance Groups
One reason why hardware performance monitoring (HPM) is difficult to use is that the things a software developer is interested in can usually not be measured directly. Instead, a collection of HPM events must be configured to compute the resulting derived metric. There is no accepted standard for naming HPM events, and due to poor documentation it is difficult for the application developer to pick the right event sets for a specific purpose. Moreover, at least on X86 processors, HPM events are not an integral part of the product for processor vendors. Consequently, the documentation may be outdated or wrong, and the events are sometimes unreliable. LIKWID provides "performance groups", which bundle event sets and derived metrics computed from them. This relieves the developer from finding meaningful event sets on every new processor generation. We are currently working on validating the supported performance groups with respect to accuracy. Another feature of LIKWID is its easy extension. Since the performance groups are defined in plain text files, it is simple to alter existing groups or add new ones. In the current stable release LIKWID must be recompiled to be aware of changed or additional performance groups. In the upcoming LIKWID 4 release, performance groups are dynamically interpreted without the need to recompile.


As an example, the following command measures the memory bandwidth on an IvyBridge-EP system:

likwid-perfctr -C S0:0-1@S1:0-1 -g MEM -m ./a.out

This will run the thread-parallel application with four threads, two on socket 0 and two on socket 1. The -m option indicates that the application was instrumented using the "marker API". The output could look as follows (shortened):

The first output box shows the total runtime spent in a region ("copy" here) and how many times it was executed (10 times). The next table shows the raw events as measured per core on the hardware. The last box displays derived metrics computed from the raw event counts. The reported clock speed of 2.9 GHz indicates that Turbo mode is enabled on this system (the nominal clock is 2.2 GHz as shown in the header). It can also be seen that almost all data volume is served by just one of the two sockets. The application may thus have a ccNUMA-related problem. There is a final statistics table for threaded measurements providing MIN, MAX, MEAN and SUM for the derived metrics, which was omitted in this output.

LIKWID Libraries
Likwid-perfctr performs measurements on the specified cores, but it does not know about the code that was executed during the measurement. Measurements are connected to code by affinity enforcement. All the pinning facilities of likwid-pin are therefore also available in likwid-perfctr. To get a meaningful profiling result it is usually necessary to instrument regions in the code for measurement. LIKWID provides a simple marker API, which allows the user to tag and name regions in the code. This API consists of only six calls and supports a Fortran 90 and a C interface.

The following code snippet illustrates the marker API:
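The sketch below assumes the macro form of the marker API (LIKWID_MARKER_*), compilation with -DLIKWID_PERFMON and linking against the LIKWID library; the exact spelling of the calls may differ between LIKWID versions, and the function and region names are illustrative.

#include <likwid.h>   /* provides the LIKWID_MARKER_* macros */

void copy_kernel(double *a, const double *b, int n)
{
    LIKWID_MARKER_INIT;            /* initialize the marker API once per process */
    LIKWID_MARKER_THREADINIT;      /* register the calling thread                */

    LIKWID_MARKER_START("copy");   /* begin the named measurement region         */
    for (int i = 0; i < n; i++)
        a[i] = b[i];
    LIKWID_MARKER_STOP("copy");    /* end the named measurement region           */

    LIKWID_MARKER_CLOSE;           /* write region data for likwid-perfctr       */
}

The region name passed to the start and stop calls ("copy" in this sketch) is the name that later shows up in the likwid-perfctr output.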


The upcoming LIKWID 4 release will provide an extensive library API which makes it easier to implement tools on top of LIKWID. The C library consists of 72 functions.

Outlook
LIKWID 4 will provide a new software architecture and enhanced functionality. In the context of the BMBF FEPA project, LIKWID is established as part of a unified application profiling framework on the system level. Measurements have indicated that the current interface to the Linux OS generates significant overhead [4]. Hence, we will provide a dedicated Linux kernel module enabling low-overhead measurements. Porting of LIKWID to new microarchitectures is a constant effort. Depending on demand, LIKWID is also ported to non-HPC processors like, e.g., various flavors of Intel Atom. Currently LIKWID is hosted on Google Code. LIKWID depends on its users to report errors and guide its development by feedback and feature requests. If you want to try LIKWID, we would be happy to receive your feedback on the LIKWID user mailing list.

How to get it
The LIKWID performance tools are Open Source. You can download them at: http://tiny.cc/LIKWID

LIKWID is funded as part of the FEPA project. This work was funded by KONWIHR.

References
[1] Treibig, J., Hager, G., and Wellein, G.: LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego CA, September 13, 2010. DOI: 10.1109/ICPPW.2010.38
[2] Treibig, J., Hager, G., Meier, M., and Wellein, G.: LIKWID performance tools. inSiDE 8(1), 50-53, Spring 2010.
[3] Treibig, J., Hager, G., and Wellein, G.: LIKWID performance tools. In: C. Bischof et al. (eds.), Competence in High Performance Computing 2010. Springer, ISBN 978-3-642-24025-6 (2012), 165-175. DOI: 10.1007/978-3-642-24025-6_14.
[4] Röhl, T., Treibig, J., Hager, G., and Wellein, G.: Overhead analysis of performance counter measurements. Proceedings of PSTI2014, the Fifth International Workshop on Parallel Software Tools and Tool Infrastructures, Minneapolis MN, September 12, 2014.

• Jan Eitzinger
• Thomas Röhl
• Georg Hager
• Gerhard Wellein

Friedrich-Alexander-Universität Erlangen-Nürnberg, Regionales RechenZentrum Erlangen (RRZE), Germany

contact: Jan Eitzinger, [email protected]


ECO2Clouds – Experimental Awareness of CO2 in Federated Cloud Sourcing

Project Outline
Nowadays, data centers consume a high amount of energy and, in addition, they are responsible for the generation of a significant portion of CO2 emissions. Furthermore, the increasing adoption of cloud-based developments has a big impact on the environment: the energy consumption and the resulting emissions are growing dramatically.

The ECO2Clouds project target was to analyse and provide solutions for the ecological implications of cloud-based IT infrastructures (Fig. 1) and to bridge the critical gap between the latest state-of-the-art in research and business. It enfolded three geographically distributed testbeds across Europe offering heterogeneous cloud resources, including the possibility to measure precisely the energy consumption of each physical host. All ECO2Clouds infrastructure providers are also part of the BonFIRE project [1]; the resources can be accessed seamlessly by describing an application energy profile, which is parsed by the ECO2Clouds scheduler in order to submit the extended experiment descriptor to the BonFIRE control mechanisms, which are mainly based on the Open Cloud Computing Interface (OCCI) [6]. Within the ECO2Clouds project, strategies for enhancing cloud infrastructures by considering eco-awareness were developed, with the aim of reducing costs and protecting the environment by additionally tracing the carbon footprint and supporting the reduction of CO2 emissions from current cloud infrastructures. For achieving the ECO2Clouds goals, the project did not only focus on the infrastructure layer but as well on the virtual and the application layer to achieve the following project vision:

• Improving an effective application deployment on federated cloud infrastructures.
• Reducing the energy consumption and thus the arising costs and the CO2 emissions.
• Optimizing key assets such as deploying virtual machines, applications and databases.

In the scope of the ECO2Clouds project, eco-efficient data were collected at the physical cloud infrastructure and virtualization level. Further, required quality and cost parameters for deploying virtual machines in multi-cloud environments were identified and, additionally, evaluation mechanisms and optimizations for algorithms to assess different parameter configurations and their influence on energy-efficient cloud sourcing and application deployment strategies were established.


Figure 1: Schematic infrastructure of ECO2Clouds

The ECO2Clouds project has achieved its goals, including infrastructure support for energy efficiency, by taking steps towards the provision of the necessary information through monitoring metrics. Thus, the ECO2Clouds monitoring metrics quantify the energy consumption and the environmental impact arising through application execution. The information regarding the energy consumption was used to identify the produced grams of CO2. Furthermore, an application strategy focusing on the environmental impact for deploying applications on multi or federated clouds was developed. In this scope, key model parameters such as ecological impact, quality and cost dimensions were the focus. ECO2Clouds produced the following results:

• Methods for collecting and exposing the carbon footprint at the infrastructure and the virtual machine level.
• An approach for incorporating the carbon footprint into a federated cloud deployment strategy.
• Mechanisms for an optimized resource utilization of federated clouds.
• An approach for adapting changes to a running application based on the energy consumption.

Through implementing the above mentioned issues, the ECO2Clouds project has highlighted that monitoring and collecting knowledge about the execution of applications in a federated cloud environment enables a "green optimization".

Work of HLRS
Within the two-year duration of ECO2Clouds, HLRS was responsible for various tasks, starting with providing an efficient and scalable monitoring infrastructure for all the required data and supporting the data mining approach needed for making assumptions regarding eco-efficiency, offering two case studies for the improvement of the ECO2Clouds scheduler as well as providing a dedicated cloud infrastructure to the project.


Figure 2: The monitoring infrastructure

The project goals were achieved through implementing the monitoring framework (Fig. 2), taking into account the data mining approach for collecting and elaborating the available data and, further, capturing long-term data. In the scope of the monitoring framework development, HLRS considered various previously developed monitoring solutions, such as those of the GAMES [2] and OPTIMIS [3] projects. These solutions mainly rely on the monitoring tool Nagios [4], whereas BonFIRE already provides a complete Zabbix [5] monitoring framework. Hence, the monitoring solutions were adapted in order to overcome different monitoring approaches as well as potential monitoring overhead.

The monitoring infrastructure, including the three layers (infrastructure, virtualization and application layer) and the related monitoring components, the abstraction API and the monitoring collector, collects the required monitoring data from the power distribution units (PDUs), physical hosts and virtual machines. The abstraction API facilitates the access to the live measurements so that the monitoring collector can collect and store those measurements in the accounting database. The data collection of the monitoring infrastructure is performed through the monitoring metrics applicable for the PDUs, physical hosts, virtual machines and applications. A central feature of the monitoring system is the possibility to estimate the energy consumption of virtual machines (Fig. 3), as represented by the following simplified formula:
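As an assumed illustration of such an estimate (not necessarily the project's exact definition), the energy measured for a physical host can be apportioned to a virtual machine according to its share of the host's CPU utilization over the measurement interval:

E_{\mathrm{VM}_i} \;\approx\; E_{\mathrm{host}} \cdot \frac{U_{\mathrm{VM}_i}}{\sum_{j} U_{\mathrm{VM}_j}}

Here E_host denotes the energy reported by the PDU for the host and U_VM_j the CPU utilization attributed to the j-th virtual machine running on it; both symbols are our notation for this sketch.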

Figure 3: Energy consumption per VM

Based on the ECO2Clouds monitoring framework, the ECO2Clouds data mining service makes use of the data collected through the monitoring metrics.


Figure 4: Data mining architecture

The ECO2Clouds data mining architecture consists of mainly two major components: the accounting service, which transfers non-reduced data to a remote data storage (DM storage), and the DM storage, which gathers the non-reduced data and performs statistical analysis over it, e.g., correlation analysis over a large enough portion of data. The data reduction service is a crucial component of the accounting service to avoid an uncontrollable growth of the accounting DB, which could cause overloading with old data so that new data could not be stored and used.

Further implementations were performed for the two case studies. The GAMES project case studies were the foundation of the case study dealing with Finite Element (FEM) simulations of cancellous bones, which represent a High Performance Computing (HPC) application. The second case study deals with modern e-Business applications: various benchmarks like Linpack as well as I/O-intensive operations were executed to simulate a common e-Business application behaviour. Moreover, the mandatory cloud infrastructure was provided to enable the deployment execution and to obtain detailed data regarding the energy consumption, thus facilitating the calculation of the carbon footprint. For that purpose, the already offered infrastructure of BonFIRE was used and extended with PDUs. HLRS was providing the host for the central access points of ECO2Clouds, an application portal and the scheduler.

The ECO2Clouds results can be applied and reused for other cloud infrastructures and research projects dealing with federated clouds and eco-efficiency, focusing on reducing the energy consumption and CO2 emissions.

References
[1] BonFIRE Project Website: http://www.bonfire-project.eu/, Last visited: March 16, 2015.
[2] GAMES Project Website: http://www.green-datacenters.eu/, Last visited: March 16, 2015.
[3] OPTIMIS Project Website: http://www.optimis-project.eu/, Last visited: March 16, 2015.
[4] Nagios Website: http://www.nagios.org/, Last visited: March 16, 2015.
[5] Zabbix Website: http://www.zabbix.com/, Last visited: March 16, 2015.
[6] OCCI working group Website: http://occi-wg.org/, Last visited: March 16, 2015.

• Axel Tenschert
• Michael Gienger

University of Stuttgart, HLRS, Germany

contact: Axel Tenschert, [email protected]
Michael Gienger, [email protected]

MIKELANGELO – Micro Kernel Virtualization for High Performance Cloud and HPC Systems

MIKELANGELO is a project targeted to advance the core technologies of Cloud computing, enabling an even bigger uptake of virtualization mechanisms, especially in the HPC and Big Data domains, bringing HPC in the Cloud and Big Data technologies under one umbrella. The vision of MIKELANGELO is to improve responsiveness, agility and security of the virtual infrastructure through packaged applications, using the lean guest operating system OSv [1] and the newly developed superfast sKVM. In short, the work will concentrate on significant performance improvements of virtual I/O in KVM [2], using enhanced virtual I/O expertise and technologies, integrated and optimized in conjunction with the lightweight operating system OSv. Thus, it will provide the private and public cloud communities with technologies for fast, agile and secure Cloud application deployments in manifold hardware infrastructures and environments.

With respect to the proposed initial architecture above, an approach and the accompanying software stack that will disrupt the traditional HPC and Private Cloud fields will be developed, tested and validated. The key enablers for achieving this ambitious goal are the use and enhancement of the state-of-the-art hypervisor software KVM, improvements in Virtio [3], a virtualization standard for network and disk device drivers, as well as mechanisms for cross-layer optimization in conjunction with the guest operating system OSv. In order to validate the achieved results, the solution will be integrated in one of the well-accepted Cloud and HPC management systems (such as OpenStack and Torque).

Use Cases
The MIKELANGELO project will implement various use cases, covering different application domains with different requirements for Cloud and High Performance Computing environments, in order to demonstrate the functionality and flexibility of the developed and optimized components. Therefore, all use cases are built to show the capabilities and agility of the MIKELANGELO virtualization stack, as well as to measure and compare their performance with traditional approaches.

Big Data
The Big Data use case will leverage the MIKELANGELO software stack in order to provide flexible Big Data services enabling high performance processing. In the context of this use case, flexibility needs to cover on-demand access and elasticity. To achieve this special kind of flexibility, a dedicated Big Data software stack will be directly integrated within a Cloud middleware.

Cloud Bursting
In a Cloud environment, MIKELANGELO's software stack will showcase how to solve the problem of Cloud Bursting efficiently.


Cloud bursts are a web phenomenon in which many users suddenly use a particular service, outstripping its capacity. MIKELANGELO is used to optimize and enhance unprecedentedly reactive cloud elasticity.

Simulation using OpenFOAM
Within this use case, high performance simulation of plane wings using the OpenFOAM [4] and MIKELANGELO software stacks is targeted. Flexible and scalable workload-based services to simulate and analyse computational fluid dynamics for new aircraft designs in the Cloud will be enabled.

Cancellous Bones Simulation
This use case will implement mechanisms to determine the material behaviour of cancellous bones in order to improve bone implant systems. As high performance is required for both processing and I/O, this use case is executed at the edge of the performance capabilities of modern Cloud systems and will strongly benefit from the MIKELANGELO developments.


Role of HLRS in the Project
The role of the High Performance Computing Center Stuttgart is to coordinate the work package on use cases and architecture analysis, which is concerned with the definition and evaluation of the use cases and the overall architecture of the project. Furthermore, HLRS will integrate the developed MIKELANGELO components into its Research Cloud and an HPC test system in order to support the execution and validation of the use cases. Finally, HLRS will develop and optimize the Cancellous Bone Simulation software kernel for distributed Cloud execution.

Partners
The MIKELANGELO consortium brings together 9 partners, providing comprehensive research, technical and business expertise in hypervisor I/O research and development (IBM, Huawei), guest operating systems development and kernels (Cloudius), security in co-tenant environments (BGU), management of Cloud and HPC environments (GWDG, XLAB, HLRS), monitoring and performance evaluation of systems and applications (GWDG, HLRS, Intel) and, finally, a strong business context for exploiting the results (Intel, IBM, XLAB, PIPISTREL).

Key Facts
MIKELANGELO is a large-scale project funded by the European Commission within the Horizon 2020 Research and Innovation Programme. It started on 01.01.2015 and will run until 31.12.2017.

Webpage: http://www.mikelangelo-project.eu
Twitter: twitter.com/mikelangelo_eu

References
[1] OSv, http://osv.io/
[2] KVM, http://www.linux-kvm.org/page/Main_Page
[3] Virtio, http://www.linux-kvm.org/page/Virtio
[4] OpenFOAM, http://www.openfoam.com/

• Michael Gienger

University of Stuttgart, HLRS, Germany

contact: Michael Gienger, [email protected]
Bastian Koller, [email protected]

FFMK: A Fast and Fault-tolerant Microkernel-based Operating System for Exascale Computing

Figure 1: FFMK software architecture: Compute processes with performance-critical parts of (MPI) runtime and communication driver execute directly on L4 microkernel; non-critical functionality is split out into proxy processes on Linux, which also hosts global platform management.

Three out of five research directions of the priority program "Software for Exascale Computing" (SPPEXA) are named system software, computational algorithms, and application software. The project "FFMK: A Fast and Fault-Tolerant Microkernel-based Operating System for Exascale Computing" connects these research directions in an international project.

The FFMK project [1] is driven by Prof. Hermann Härtig (Computer Science, TU Dresden), Prof. Alexander Reinefeld (Zuse Institute Berlin), Prof. Amnon Barak (Computer Science, Hebrew University of Jerusalem), and Prof. Wolfgang E. Nagel (Center for Information Services and High Performance Computing, TU Dresden). The starting point for the design of the FFMK platform is the expectation that the following major challenges have to be addressed by systems software for exascale computers.

Current high-end HPC systems are tailored towards extremely well-tuned applications, which are assigned fixed partitions of hardware resources by a scheduler such as SLURM. Tuning of these applications includes significant load balancing efforts.

72 Spring 2015 • Vol. 13 No. 1 • inSiDE Projects

Figure 2: Run time of COSMO-SPECS+FD4 weather code on BlueGene/Q system JUQUEEN with gossip algorithm running in parallel: Gossip messages sent at different intervals; baseline is same benchmark without running gossip algorithm. Increasing the number of nodes causes more overhead at same intervals due to larger gossip messages describing more nodes. Shorter bars are better; orange part of the bars indicate the time spent communicating via MPI. shift from application programmers Furthermore, the sheer size of to operating systems and runtimes exascale computers with their (OS/Rs) because of the complexity unprecedented number of compo- and dynamics of future applications. nents will have significant impact on A number of runtime systems already fault rates. Some OS/Rs already addresses that challenge, notably address this concern in part by Charm++ and X10. We further enabling incremental and application- believe that an exascale OS/R specific checkpoint/recovery and must accommodate more dynamic by using on-node memory to store applications that extend and shrink checkpoint data. However, for exascale during their runtime, for example, machines, we expect more types of when computational demand explodes memory that differ in aspects like as particle density in a simulation persistence, energy requirements, increases. fault tolerance, and speed. Important examples are on-node non-volatile In addition to imbalances within memory (phase-change memory, applications, we expect future flash, etc.) and stacked DRAM. exascale hardware to have less A checkpoint store that can keep up consistent performance due to with increased fault rates and the fabrication tolerances and thermal vastly enlarged application-state on concerns. This is a radical change exascale machines requires an from the general-purpose CPUs integrated architecture that uses all (like x86-64) and accelerators (e.g., these different types of memory. GPGPUs) that are used today, which are assumed and selected to perform We believe that a systems-software very regularly. We also assume design for exascale machines that that not all compute elements can addresses the just described chal- be active at all times due to power lenges must be based on a coordi- and heat issues (dark silicon). Thus, nated approach across all layers, hardware will add to the unbalanced including applications. The platform execution of applications in a way that architecture as shown in Fig. 1 cannot be predicted upfront. uses an L4 microkernel [2] as the

Spring 2015 • Vol. 13 No. 1 • inSiDE 73 Projects

Figure 3: Run time of MPI-FFT benchmark on BlueGene/Q system JUQUEEN with gossip algorithm running in parallel: Gossip messages sent at different intervals; baseline is same benchmark without running any gossip algorithm. Increasing the number of nodes causes more overhead at same intervals due to larger gossip messages describing more nodes. Shorter bars are better; orange part of the bars indicate the time spent communicating via MPI.

The platform architecture as shown in Fig. 1 uses an L4 microkernel [2] as the light-weight kernel (LWK) that runs on each node. All cores run under this minimal common foundation consisting of the microkernel itself and a few extra services that provide higher-level OS functionality. Additionally, an instance of a full commodity OS (Linux in our case) runs on top of it, but only on a few dedicated cores that we refer to as service cores. Application programming paradigms are supported by small dedicated library operating systems running directly on L4. These library OSes and their supporting components are split into performance-critical parts running directly on L4 and uncritical parts running as mostly idle proxy processes on Linux; we refer to both parts as a runtime.

In the presence of frequent component failures, hardware heterogeneity, and dynamic demands, applications can no longer assume that compute resources are assigned statically. We envision the platform as a whole will be managed by load distribution components. Monitoring and decision making is done at three levels: (1) on each multi-core node (i.e., local scheduling of threads/processes), (2) per application/partition among nodes, and (3) based on a global view of a master management node. Node-local schedulers take care of (1); scalable gossip algorithms support (2) and (3).

Using gossip, the nodes build up a distributed, inherently fault-tolerant, and scalable bulletin board that provides information on the status of the system. Nodes have partial knowledge of the whole system: they know about only a subset of the other nodes, but enough of them in order to make decisions on how to balance load and how to react to failures in a decentralized way. The global view over all nodes is available to the master node, which receives gossip messages from some nodes. It makes global decisions such as where to put processes of a newly started application. We demonstrated that this two-layer gossip algorithm matches predictions of an analytical model and simulations for the average age (i.e., quality) of the gossiped information; we further simulated the effect of node failures [3].
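A compact C sketch of this kind of gossip update is shown below. It is an illustrative reimplementation under simplifying assumptions (a fixed number of known nodes, age counted in gossip rounds, the names entry_t, merge_view and prepare_message are ours), not the FFMK implementation.

/* One bulletin-board entry per known node: its load and the "age" of the
 * information, i.e. how many gossip rounds ago it was produced. */
typedef struct {
    int    node_id;
    double load;
    int    age;        /* 0 = produced in the current round */
} entry_t;

/* Merge a received gossip message into the local view: for every node,
 * keep whichever entry is fresher (smaller age). */
void merge_view(entry_t *local, const entry_t *received, int n_nodes)
{
    for (int i = 0; i < n_nodes; i++) {
        if (received[i].age < local[i].age)
            local[i] = received[i];
    }
}

/* Before sending, a node ages all entries and refreshes its own one. */
void prepare_message(entry_t *local, int self, double own_load, int n_nodes)
{
    for (int i = 0; i < n_nodes; i++)
        local[i].age++;
    local[self].node_id = self;
    local[self].load    = own_load;
    local[self].age     = 0;
}

In each round a node would send its refreshed view to a few randomly chosen partners and merge whatever views it receives; entries that have not been refreshed for many rounds can be treated as suspected failures.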

74 Spring 2015 • Vol. 13 No. 1 • inSiDE Projects gossip intervals of a few hundred mil- able. The process-level "taskification" liseconds — there is no noticeable of MPI also enables the FFMK OS on overhead on application performance each node to migrate MPI processes when the gossip algorithm runs on among cores to balance load locally; the same nodes and interconnect as cross-node migration is currently not the application (see Fig. 2 and Fig. 3 part of our prototype, as the previously from [4]). described platform management is not yet integrated. To handle hardware faults, a fast checkpointing module takes check- At the time of this writing, after the point state from applications and second year of the project run time, distributes and stores it redundantly the basic prototype (L4 microkernel, in memory across several nodes. , MPI, Infiniband) has been Our prototype is currently based on tested on a small partition of an HPC XtreemFS [5], which uses erasure system installed at TU Dresden’s coding to provide an efficient, redun- "Center for Information Services and dant checkpoint store backed by RAM High-Performance Computing". The disks. We demonstrated checkpoint/ weather code COSMO-SPECS+FD4 restart of a production code on a ran on 17 nodes connected with an Cray XC30, where a node holding FDR Infiniband network (240 cores checkpoint state was taken offline to in total). Another large production simulate failure. code (CP2K), small test codes, and the Intel MPI benchmarks have been The prototype provides a split MPI ported to our OS prototype. They runtime and communication driver comprise in the order of millions of (Infiniband). The Linux-based compo- lines of C, C++, and FORTRAN source nents provide management-related code, which we did not have to modify. functionality and are not on the critical path, whereas performance- While source-level compatibility to ex- critical functionality of MPI and the isting platforms is a great advantage, Infiniband driver runs directly on the we believe that efficient management LWK. Management functionality of of the platform at exascale requires Linux-hosted proxy processes includes cooperation from applications. In forwarding messages to and from the particular, application-level hints can MPI process manager during startup improve efficiency for fault-tolerance and shutdown. They also interface mechanisms (e.g., optimize check- with the Mellanox Infiniband kernel point/restart or avoid it altogether, if driver in Linux to manage allocation algorithms can compensate for failing of message transfer buffers and to nodes). They can also enable load- take care of the rather complicated balancing components predict what connection handling. We currently resources will be needed next (e.g., use MPI as an application runtime CPU or memory required by a pro- also to demonstrate extending and cess to compute the next time step) shrinking of existing applications. and whether migration is worthwhile. We therefore create more MPI pro- cesses than there are cores avail-


We develop in cooperation with application partners novel interfaces such that an HPC program can pass hints to the OS/R about what it is doing and what resources it needs. The ultimate goal of the project is to build and integrate mechanisms of a self-organizing, fault-tolerant HPC platform that frees application developers from micromanaging the dynamic exascale hardware we expect in the future.

References
[1] FFMK project website: http://ffmk.tudos.org
[2] Härtig, H., Hohmuth, M., Liedtke, J., Schönberg, S., Wolter, J.: The performance of μ-kernel-based systems. In: Proceedings of the 16th ACM Symposium on Operating System Principles (SOSP), pages 66–77, Saint-Malo, France, 1997.
[3] Barak, A., Drezner, Z., Barak, A., Levy, E., and Shiloh, A.: Resilient gossip algorithms for collecting online management information in exascale clusters. Concurrency and Computation: Practice and Experience, 2015.

[4] Levy, E., Barak, A., Shiloh, A., Lieber, M., Weinhold, C., Härtig, H. Overhead of a Decentralized Gossip Algorithm on the Performance of HPC Applications, ROSS 2014, June 2014, Munich, Germany.

[5] XtreemFS website: http://www.xtreemfs.org

Acknowledgments
This project is supported by the priority program 1648 "Software for Exascale Computing" funded by the German Research Foundation (DFG). The authors acknowledge the Jülich Supercomputing Centre, the Gauss Centre for Supercomputing, and the John von Neumann Institute for Computing for providing compute time on the JUQUEEN supercomputer.

• Hermann Härtig
• Carsten Weinhold

Department of Computer Science, TU Dresden, Germany

Contact: Carsten Weinhold, [email protected]

EUDAT2020

Figure 1: Interaction of EUDAT services

EUDAT2020 brings together a unique consortium of e-infrastructure providers, research infrastructure operators, and researchers from a wide range of scientific disciplines, working together to address the new data challenge. In most research communities, there is a growing awareness that the "rising tide of data" will require new approaches to data management and that data preservation, access and sharing should be supported in a much better way. Data, and in particular Big Data, is an issue touching all research infrastructures.

EUDAT2020's vision is to enable European researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure (CDI) conceived as a network of collaborating, cooperating centres. The CDI combines the richness of numerous community-specific data repositories with the permanence and persistence of some of Europe's largest scientific data centres.


EUDAT2020 builds on the foundations laid by the first EUDAT project (Fig. 1), strengthening the links between the CDI and expanding its functionalities and remit. Covering both access and deposit, from informal data sharing to long-term archiving, and addressing identification, discoverability and computability of both long-tail and big data, EUDAT2020's services will address the full lifecycle of research data.

One of the main ambitions of EUDAT2020 is to bridge the gap between research infrastructures and e-Infrastructures through an active engagement strategy, using the communities that are in the consortium as EUDAT beacons and integrating others through innovative partnerships.

During its three-year funded life, EUDAT2020 will evolve the CDI into a healthy and vibrant data infrastructure for Europe, and position EUDAT as a sustainable infrastructure within which the future, changing requirements of a wide range of research communities are addressed.

EUDAT Service Suite
Research communities from different disciplines have different needs for data management and analysis, but they all have requirements for basic data services.

EUDAT offers common data services (Fig. 2) that support multiple research communities through a geographically distributed network of general-purpose data centres and community-specific data repositories.

B2DROP is a secure and trusted data exchange service for researchers and scientists to keep their research data synchronized and up-to-date and to exchange it with other researchers.

B2SHARE is a user-friendly, reliable and trustworthy service for researchers, scientific communities and citizen scientists to store and share small-scale research data from diverse contexts. B2SHARE allows the data owner to define access policies and add community-specific metadata, and it assigns persistent identifiers to data sets.

B2SAFE is a robust, safe and highly available service that allows community repositories to implement data management policies on their research data and replicate data across multiple administrative domains in a trustworthy manner.

Figure 2: EUDAT service suite

B2STAGE is a reliable, efficient, and lightweight service to transfer research data sets between EUDAT storage resources and High Performance Computing (HPC) workspaces.

B2FIND is a simple, user-friendly metadata catalogue of research data collections stored in EUDAT data centres and other repositories.
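To give a flavour of how such services are typically used from a script, the following minimal sketch shows what a metadata-driven deposit into a B2SHARE-like repository could look like over HTTP. The base URL, routes, field names and token handling are illustrative assumptions only and do not reproduce the actual EUDAT/B2SHARE API.

```python
# Hypothetical sketch of depositing a small data set into a B2SHARE-like service.
# Endpoint paths and field names are illustrative assumptions, not the real EUDAT API.
import requests

BASE_URL = "https://b2share.example.org/api"   # placeholder endpoint
TOKEN = "YOUR-ACCESS-TOKEN"                    # placeholder credential

# 1) Create a draft record carrying community-specific metadata.
metadata = {
    "title": "Tracer-gas concentrations from a station experiment",
    "community": "example-community",
    "open_access": True,
}
draft = requests.post(f"{BASE_URL}/records", json=metadata,
                      params={"access_token": TOKEN}).json()

# 2) Attach the data file to the draft record.
with open("concentrations.csv", "rb") as fh:
    requests.put(f"{BASE_URL}/records/{draft['id']}/files/concentrations.csv",
                 data=fh, params={"access_token": TOKEN})

# 3) Publish the record; the service would then assign a persistent identifier.
requests.patch(f"{BASE_URL}/records/{draft['id']}", json={"published": True},
               params={"access_token": TOKEN})
```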

• Daniel Mallmann

Jülich Supercomputing Centre (JSC), Germany

contact: Daniel Mallmann, [email protected]

ORPHEUS — Fire Safety in the Underground

Public transport systems form a fundamental part of our cities and play a major role in our society. Thus, civil safety aspects of underground transport facilities are of crucial importance. Risks and danger manifest in large disasters that take many innocent lives and therefore draw large attention (like 1987 in London, Great Britain, or 2003 in Daegu, South Korea, to name only a few). However, there are many more incidents that do not trigger strong media attention (like recently in Washington, USA, and London, Great Britain, 2015). The demand for safety research in this context is represented by recent projects such as the Swedish project METRO [1] or the German projects OrGaMIR / OrGaMIRPLUS [2]. Both research projects covered selected safety aspects of underground metro stations. So far, there exist only limited concepts to construct smoke extraction systems in complex underground facilities. Such systems play a major role for the personal safety during fire hazards, and therefore smoke extraction systems represent the main aim of the ORPHEUS project (abbreviated from "Optimierung der Rauchableitung und Personenführung in U-Bahnhöfen: Experimente und Simulationen").

Thereby, the main obstacle in designing safety concepts for underground stations is the thermally driven movement of hot smoke or toxic gases. The gas dynamics may become very intense and hard to control. This puts the passengers in a very dangerous situation, since smoke has a very negative impact on the rescue process, as it limits the mobility and visibility of the occupants and the rescue teams. Hence, most fatalities occur due to the toxic nature of smoke. In contrast to common stations or street tunnels, underground stations often have a very low ceiling height and therefore cannot sustain a smoke-free layer which is large enough to prevent suffocation. As a consequence, fires in underground stations massively challenge the rescue operation and demand an effective inter-organisational crisis management. Considering the high relevance, the presented project investigates concepts and strategies to improve the safety of underground stations during a fire with an intensive smoke yield. This covers technical as well as inter-organisational aspects.

Figure 1: Typical metro stations are complex and compact structures. Hot smoke and passengers have in general the same movement paths to the surface. The figures illustrate the metro station "Osloer Strasse" in Berlin.

February 2015 saw the start of the ORPHEUS project, which is funded by the Federal Ministry of Education and Research (BMBF) for 36 months [3]. The consortium is coordinated by JSC's division "Civil Safety and Traffic" [4] and consists of the following funded partners:
• Bundesanstalt für Materialforschung und -prüfung
• IBIT GmbH
• Imtech Deutschland GmbH
• Institut für Industrieaerodynamik GmbH
• Ruhr-Universität Bochum, Lehrstuhl für Höhlen- und U-Bahnklimatologie
and the associated partners respectively subcontractors
• Berliner Feuerwehr
• Berliner Verkehrsbetriebe
• Deutsche Bahn Station&Service AG
• Hekatron Vertriebs GmbH
• Karstadt GmbH
• NVIDIA GmbH
• Team HF PartG

All of them met at the project kick-off meeting in Jülich on 12 March 2015. The proposed research plan is divided into three main parts:

1) Fire experiments in existing stations and assessment of the current state-of-the-art of technical fire safety engineering systems and methods, safety concepts, detection systems as well as the validity of numerical tools.

2) Technologies and concepts for personal and operational safety in case of fire with emphasis on people with special requirements and handicaps, covering physical and numerical modelling of smoke extraction and guiding systems as well as numerical pedestrian simulations.

3) Reactive situation detection and support systems for rescue teams with integration of the project's results in inter-organisational crisis management systems.

Figure 2: Participants of the kick-off meeting in Jülich


Fire Experiments
The first of the three main parts consists of the execution and validation of experiments in an operative metro station in Berlin. Outside of operating hours, real fire experiments will be carried out in a fully monitored station. The target station (Osloer Strasse) has a complex structure with three levels. Multiple experiments will be carried out with real fires. These technical fires produce no soot or toxic gases and are only used as a dynamic heat source to model thermally driven flows. The diagnostic is based on tracer gases (e.g. SF6) that are added to the fire plume. The obtained data (e.g. velocity, temperature, and tracer gas concentrations) measured at many locations will be used to validate small-scale and numerical models. These experiments are accompanied by long-term underground climate measurements and will apply novel techniques that make use of existing communication cables in the tunnels. The collected climate data will also be considered during the concept phase.

Preventive Fire Safety Concepts
The second part aims to investigate concepts for novel smoke control and extraction systems as well as evacuation aspects. During the initial phase of the project, the fire safety objectives will be defined and used to compare different safety systems and concepts. Small-scale physical experiments (1:5 and 1:20) and numerical simulations of smoke and heat propagation in underground stations will form the basis for the project studies. Besides the application of existing CFD (Computational Fluid Dynamics) models, new concepts for mesh-adaptive methods will be developed by JSC. They will allow refining the numerical mesh in regions of high dynamics and gradients, while passive regions will be resolved by a coarse mesh. These techniques are crucial to provide sufficient numerical resolution inside small subdomains of a large and complex structure. An accompanying goal is to develop a fire simulation model that can make efficient use of HPC systems.

Special emphasis is put on the availability of evacuation paths and routes used by fire fighters. Therefore, pedestrian simulations will accompany the work on preventive technical systems. These simulations will be coupled to fire simulations and will therefore allow evaluating the impact of smoke on the evacuation (e.g. visibility or toxic effects). Additionally, the pedestrian simulation model JuPedSim [5], developed in-house at JSC, will be expanded to include psychological decision models. Humans with special requirements, like sight and/or mobility handicapped people (e.g. elderly), will play a major role in the analysis and optimisation of the evacuation routes.

Figure 3: A snapshot of a smoke spread simulation in an underground station

Figure 4: First steps toward mesh-adaptive simulations. Here, the structure of a hot plume is well resolved, while the surrounding may stay coarsely represented.

Figure 5: Mobility and vision handicapped people pose special requirements, which are considered in this project.
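To make the refinement criterion described above more concrete, the following minimal sketch flags cells of a uniform 2D temperature field for refinement wherever the local gradient magnitude exceeds a threshold. It is a generic illustration under simplified assumptions, not the mesh-adaptive method being developed at JSC.

```python
# Generic illustration of a gradient-based refinement criterion (not the ORPHEUS code).
import numpy as np

def refinement_flags(temperature, threshold):
    """Return a boolean mask marking cells whose temperature gradient is steep."""
    # Central-difference gradients of the cell-centred field
    dT_dy, dT_dx = np.gradient(temperature)
    grad_mag = np.sqrt(dT_dx**2 + dT_dy**2)
    # Cells with strong gradients (e.g. the edge of a hot smoke plume) get refined,
    # while passive regions with small gradients stay on the coarse mesh.
    return grad_mag > threshold

# Toy example: a hot spot in an otherwise cold domain
field = np.full((64, 64), 20.0)    # ambient temperature in deg C
field[28:36, 28:36] = 300.0        # idealized fire/plume region
flags = refinement_flags(field, threshold=10.0)
print(f"{flags.sum()} of {flags.size} cells flagged for refinement")
```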

Support During Rescue Operations
The third part covers the interaction between the operators, emergency services, and third parties, like shopkeepers. In the case of fire, processes on the surface are essential for rescue attempts. Revision of past events and interviews with organisations involved will allow a communication pattern analysis, also with respect to warnings and alarms. Besides the individual organisation analysis, the cooperation and interaction between affected organisations before and during an incident is studied. The investigation covers the transfer of knowledge and information between the emergency services and the communication to infrastructure users and concerned residents.

To tackle the complex interaction scheme, an inter-organisational map and interfaces will be developed. The interfaces include communication paths as well as interaction points in the individual crisis management. This work will allow identifying inconsistent processes and potential for optimisation. Other important communicative aspects are warnings and alarms. Thereby, investigated topics are the contents and forms of the messages sent to the occupants.

To support rescue teams and technical systems, JSC will also examine a real-time smoke propagation prognosis model. This CFD-based model is supposed to run on GPGPU systems to provide a short time-to-solution and should be coupled to sensors to include live parameters such as the position or the intensity of the fire as well as climate data.

Figure 6: Surface events like markets may be located directly at the metro's exits and therefore amplify the complexity of a rescue operation.


Conclusion
All concepts created in this project focus on underground metro stations. However, during the final phase of the project the outcome will be evaluated for the application to other transport infrastructures, like airports or car parks. The technical and organisational results form the scientific basis to develop production systems and tools for fire safety engineering.

Funding: BMBF, call "Zivile Sicherheit – Schutz und Rettung bei komplexen Einsatzlagen"

Funding amount: Euro 3.2 million

Project duration: February 2015 – January 2018

Project website: http://www.orpheus-projekt.de

Links
[1] http://www.metroproject.se
[2] http://www.bmbf.de/pubRD/OrGaMIR.pdf
[3] http://www.bmbf.de/pubRD/Projektumriss_ORPHEUS.pdf
[4] http://www.fz-juelich.de/ias/jsc/cst
[5] http://www.jupedsim.org/

• Lukas Arnold

Jülich Supercomputing Centre (JSC), Germany

contact: Lukas Arnold, [email protected]

Seamless Continuation: 4th PRACE Implementation Phase Project

In response to the European Commission's call for proposals in 2014 within the new European framework programme Horizon 2020, PRACE partners from 25 countries submitted a successful proposal and started the 4th Implementation Phase project (PRACE-4IP) on 1 February 2015. The project will support the transition from the initial five-year period (2010-2015) of the Research Infrastructure established by the Partnership for Advanced Computing in Europe (PRACE) to PRACE 2.0.

Key objectives of PRACE-4IP are:

Ensure long-term sustainability of the infrastructure. The project will assist the PRACE Research Infrastructure (PRACE RI) in managing the transition from the business model used in the Initial Period. It demonstrated the case for a European HPC research infrastructure relying on the strong engagement of four hosting partners (BSC representing Spain, CINECA representing Italy, GCS representing Germany and GENCI representing France) who funded and deployed the petaflop/s systems used by PRACE RI.

Promote Europe's leadership in HPC applications. Scientific and engineering modelling and simulation require the capabilities of supercomputers. The project will enable application codes for PRACE leadership platforms and prepare for future systems, notably those with architectural innovation embodied in accelerators or co-processors, by investigating new programming tools and developing suitable benchmarks.

Increase European human resources skilled in HPC and HPC applications. The project will contribute by organizing highly visible events and enhancing the state-of-the-art training provided by the PRACE Advanced Training Centres (PATCs), targeting both the academic and industrial domains. On-line training will be improved and a pilot deployment will assess a Massive Open Online Course (MOOC).

Support a balanced eco-system of HPC resources for Europe's researchers. The project will contribute to this objective through tasks addressing: a) the improvement of PRACE operations; b) the prototyping of new services including "urgent computing", the visualization of extreme-size computational data, and the provision of repositories for open-source scientific libraries. Links will be established with other e-infrastructures and the Centres of Excellence which will be created in Horizon 2020, and the existing international collaborations will be extended.

Evaluate new technologies and define Europe's path for using ExaFlop/s resources. The project will extend its market watch and evaluation based on user requirements, study best practices for energy efficiency and lower environmental impact throughout the life cycle of large HPC infrastructures, and define best practices for prototype planning and evaluation. This will contribute to solving a wide range of technological, architectural and programming challenges for the exaflop/s era.

Disseminate the PRACE results effectively. This targets engaging European scientists and engineers in the wider utilisation of high-end HPC. The project will continue to organise well-known events like PRACEdays, Summer of HPC or the International HPC Summer School in order to promote and support innovative scientific approaches in modelling, simulation and data analysis. With an extended presence at conferences (e.g. SC, ISC or ICT) the project is seeking wider support of the general public for investment in HPC, in particular by illustrating success stories and raising awareness of the potential for development.

PRACE-4IP is again coordinated and managed by Forschungszentrum Jülich. It has a budget of nearly Euro 16.5 million, including an EC contribution of Euro 15 million. The duration will be 27 months.

Over 250 researchers collaborate in PRACE from 49 organisations1 in 25 countries. 106 collaborators met in Ostrava from 28-29 April 2015 for the PRACE-4IP kick-off meeting. The meeting was organised by IT4Innovations-VSB and held at the Technical University of Ostrava, Czech Republic.

Figure 1: PRACE collaborators at the PRACE-4IP kick-off meeting


Synopsis of the PRACE Projects

The European Commission supported the creation and implementation of PRACE through five projects with a total EC funding of Euro 82 Mio. The partners co-funded the projects with more than Euro 33 Mio, in addition to the commitment of Euro 400 Mio for the initial period by the hosting members to procure and operate Tier-0 systems and the in-kind contribution of Tier-1 resources on systems at presently twenty partner sites. The following table gives an overview of the PRACE projects.

Project Id (Grant Number) | Number of Partners | Budget (Euro Mio.) | EC Funds (Euro Mio.) | Duration | Status
PRACE-PP (RI-211528) | 16 | 18.9 | 10.0 | 1.1.2008 - 30.6.2010 | completed
PRACE-1IP (RI-261557) | 21 | 28.5 | 20.0 | 1.7.2010 - 31.8.2013 | completed
PRACE-2IP (RI-283493) | 22 | 25.4 | 18.0 | 1.9.2011 - 31.8.2013 | completed
PRACE-3IP (RI-312763) | 26 | 26.8 | 19.0 | 1.7.2012 - 30.6.2014 | completed (PCP2 to end June 30, 2016)
PRACE-4IP (653838) | 26 | 16.5 | 15.0 | 1.2.2015 - 30.4.2017 | started
(total) | | 116.0 | 82.0 | |

• Florian Berberich

Jülich Supercomputing Centre (JSC), Germany

1 This includes 26 Beneficiaries and 23 linked Third Parties from associated universities or centres.
2 The Pre Commercial Procurement Pilot (PCP) is part of the PRACE-3IP project, however with a duration of 48 months.

Intel® Parallel Computing Center (IPCC) at LRZ

The Intel® Parallel Computing Centers (IPCC) are novel initiatives established and financially supported by Intel within universities, institutions, and labs that are leaders in their own fields and focus on modernizing scientific applications to increase parallelization efficiency and scalability through optimizations that leverage cores, caches, threads, and vectorization capabilities of Intel microprocessors and coprocessors [1]. The main target architecture for these centers is the Intel MIC (Many Integrated Cores) architecture, i.e., the Intel Xeon Phi coprocessor [1].

The IPCC at the Garching research campus, consisting of two closely collaborating partners, LRZ and the Computer Science Department of Technische Universität München (TUM) [2], was started in July 2014 and is funded for two years. The Garching IPCC focuses on the optimization of four highly acclaimed applications from different areas of science and engineering: earthquake simulation and seismic wave propagation with SeisSol [3,4], simulation of cosmological structure formation using Gadget3 [5], the molecular dynamics code ls1 mardyn [6] developed for applications in chemical engineering, and the software framework SG++ [7] to tackle high-dimensional problems in data mining or financial mathematics (using sparse grids). All these codes have already demonstrated very high scalability on SuperMUC (with performance up to a Petaflop/s, see the SeisSol ACM Gordon Bell paper [4]), but are in different stages of development, e.g. with respect to running on the Intel MIC architecture. While particularly targeting the new Intel architectures, e.g. Xeon Phi (MIC) coprocessors, the project also simultaneously tackles some of the fundamental challenges that are relevant for most supercomputing architectures – such as parallelism on multiple levels (nodes, cores, hardware threads per core, data parallelism) or compute cores that offer strong SIMD capabilities with increasing vector register width, e.g. the Intel Haswell architecture [2].

Within this project, LRZ contributes by optimizing Gadget3, a numerical simulation code for cosmological structure formation, for the Intel Haswell and MIC architectures. Gadget3 has already established itself as a community code and has been scaled fairly efficiently on the entire SuperMUC, but Xeon Phi was basically unexplored territory; no MIC version of Gadget3 existed at the project start. The Garching IPCC will at the same time also provide the opportunity to draw out the best possible performance with such codes from the SuperMUC Haswell extension or potential SuperMUC successors. The far-reaching goal of this project is to establish a model process and a collection of experiences as well as best practices for similar codes in the computational sciences. To prepare simulation software for this new platform and to identify possible roadblocks at an early stage, one has to tackle two expected major challenges: (1) achieving a high fraction of the available node-level performance on (shared-memory) compute nodes and (2) scaling this performance up to the range of around 10,000 compute nodes. Concerning node-level performance, the Garching IPCC considers compute nodes with one or several Intel Xeon Phi (co-)processors. A respective small evaluation platform (32 nodes, each with 2 MIC coprocessors and a powerful InfiniBand fabric between the nodes), known as SuperMIC, has been in operation at LRZ since June 2014 [8]. Scalability on large supercomputers is studied on SuperMUC, which in its upgrade phase in 2015 is extended by a second 3-PetaFlop/s partition based on the new Intel Haswell architecture.

All successful code optimizations and improvements done within the IPCC work will be integrated into the regular software releases of the four research codes and will therefore provide feedback to the scientific user communities. The initial results of the code optimization and re-engineering were presented at the first IPCC User Forum Meeting in Berlin (September 2014). More results on performance and MIC-related improvements were showcased at the second IPCC User Forum Meeting in Dublin (February 2015).

To widen the community outreach of the IPCC work, LRZ, together with its funding partner Intel, has also taken initiatives to spread the MIC know-how and best known methods to its interested HPC users. A workshop, as part of the LRZ activities as an Intel Parallel Computing Center, was organized together with Intel during October 20-21, 2014 at LRZ [9]. The main objective of this workshop was to learn and discuss advanced Xeon Phi programming and analysis methodologies. Case studies of real-world codes were presented as hints for planning the optimal strategy of Xeon Phi coprocessor usage. Intel engineers were also available to support this effort. The major topics covered in this workshop were the Xeon Phi coprocessor architecture as well as coprocessor software infrastructure and programming (e.g. vectorization, OpenMP and offloading). Furthermore, Intel MPI best practices for native and symmetric usage in combination with offloading were discussed. To analyze user applications, Intel engineers introduced the Intel Trace Analyzer and Collector (ITAC), which can profile and visualize the behavior of Intel MPI based parallel applications, and the Intel VTune Amplifier XE for advanced sequential and threading analysis, e.g. vectorization analysis and debugging for Xeon Phi.

Figure 1: Intel® Xeon Phi™ Coprocessor (image courtesy © Intel Inc.)


Case studies for symmetric MPI and for offload in combination with MPI on the host only were demonstrated to the audience. Special emphasis was given to "hands-on" sessions on the four IPCC codes. There were 21 participants from different German universities and research institutes. Due to its overall success and the extremely high demand from users, the IPCC is planning follow-up workshops in the near future.

Project Partners

Principal collaborators
1. Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities; PI: Arndt Bode, responsible for Gadget3
2. Technische Universität München, Department of Informatics; PI: Michael Bader, responsible for SeisSol; PI: Hans-Joachim Bungartz, responsible for ls1 mardyn and SG++

Associated Partners
1. Ludwig-Maximilians-Universität, Geo and Environmental Sciences: Heiner Igel and Christian Pelties
2. Ludwig-Maximilians-Universität, USM, Computational Astrophysics: Klaus Dolag
3. Intel Corporation USA, Intel Labs, Parallel Computing Lab: Alexander Heinecke

• Anupam Karmakar
• Nicolay Hammer
• Luigi Iapichino
• Vasileios Karakasis

Leibniz Supercomputing Centre (LRZ), Germany

References
[1] Intel Parallel Computing Center, https://software.intel.com/en-us/ipcc
[2] IPCC at LRZ and TUM, https://software.intel.com/en-us/articles/intel-parallel-computing-center-at-leibniz-supercomputing-centre-and-technische-universit-t
[3] Dumbser, M., Käser, M.: An arbitrary high-order discontinuous Galerkin method for elastic waves on unstructured meshes II: The three-dimensional case. Geophys. J. Int. 167(1), pp. 319–336, 2006. Dumbser, M., Käser, M., Toro, E.F.: An arbitrary high-order discontinuous Galerkin method for elastic waves on unstructured meshes V: Local time stepping and p-adaptivity. Geophys. J. Int. 171(2), pp. 695–717, 2007.
[4] Heinecke, A., Breuer, A., Rettenberger, S., Bader, M., Gabriel, A., Pelties, C., Bode, A., Barth, W., Liao, X., Vaidyanathan, K., Smelyanskiy, M., Dubey, P.: Petascale high order dynamic rupture earthquake simulations on heterogeneous supercomputers. Proc. Int. Conf. High Performance Computing, Networking, Storage and Analysis (SC '14), IEEE Press, Piscataway, NJ, USA, 3-14, 2014 (DOI: 10.1109/SC.2014.6, http://dx.doi.org/10.1109/SC.2014.6)
[5] Bazin, G., Dolag, K., Hammer, N.: Gadget3: Numerical Simulation of Structure Formation in the Universe, inSiDE 11, 2, 2013.
[6] Niethammer, C., Becker, S., Bernreuther, M., Buchholz, M., Eckhardt, W., Heinecke, A., Werth, S., Bungartz, H.-J., Glass, C. W., Hasse, H., Vrabec, J., Horsch, M.: ls1 mardyn: The massively parallel molecular dynamics code for large systems. Journal of Chemical Theory and Computation 10(10), pp. 4455-4464, 2014.
[7] Pflüger, D.: Spatially Adaptive Sparse Grids for Higher-Dimensional Problems. Dissertation, Verlag Dr. Hut, München, 2010. ISBN 9-783-868-53555-6.
[8] Weinberg, V., Allalen, M.: The new Intel Xeon Phi based system SuperMIC at LRZ, inSiDE 12, 2, Autumn 2014.
[9] IPCC Intel Xeon Phi Coprocessor Workshop, http://www.lrz.de/services/compute/courses/archive/2014/2014-10-20_hmic1w14/


contact: Anupam Karmakar, [email protected]
Nicolay Hammer, [email protected]
Luigi Iapichino, [email protected]
Vasileios Karakasis, [email protected]

Figure 2: Participants of the IPCC Intel Xeon Phi Coprocessor Workshop during October 20 - 21, 2014 at LRZ

JURECA: Jülich Research on Exascale Cluster Architectures

Figure 1: (Left) Front view of a V-class enclosure. (Right) Rear view of a V-class enclosure hosting ten V210S compute nodes (slim or fat nodes of type 1 in JURECA). The same chassis can host five V210F accelerated compute nodes, each occupying twice the space of a V210S node.

According to its dual architecture strategy, Forschungszentrum Jülich is offering access to a leadership-class capability system and to a general-purpose supercomputing system. The latter meets the users' need for mixed capacity and capability computing time. Since 2009, the JUROPA (Jülich Research on Petaflop Architectures) cluster, based on Intel Nehalem CPUs and quad data rate InfiniBand networking technology, has taken up this challenge and has enabled outstanding science by researchers from around the world. Now, Jülich Supercomputing Centre (JSC) has started to install the next-generation general-purpose cluster JURECA (Jülich Research on Exascale Cluster Architectures), which will supersede JUROPA.

JURECA will be based on latest-generation Intel Haswell CPUs and provide a peak floating point performance of roughly 1.8 petaflop/s – a six-fold increase over JUROPA. With its high-speed connection of about 100 GiBps to the center-wide exported GPFS file systems, JURECA will not only serve the widest variety of user communities from the traditional computational science disciplines but will also be a welcoming home to data-intense science and big data projects.

JURECA will be built from V-class blade servers of the Russian supercomputing vendor T-Platforms. Once fully installed, the system will consist of more than 1,800 compute nodes with two Intel Haswell E5-2680 v3 12-core CPUs per node. About 1,680 compute nodes will be equipped with 128 GiB DDR4 main memory, i.e., more than 5 GiB per core. In support of workflows requiring more memory per core, an additional 128 nodes with 256 GiB and 64 nodes with 512 GiB DDR4 RAM will be available. 75 nodes will be equipped with two NVIDIA K80 cards each, providing an additional flop rate of 430 teraflop/s for accelerator-capable applications. 12 login nodes with 256 GiB per node will be available for workflow and data management as well as the convenient execution of short-running pre- and post-processing operations. Additionally, 12 visualization nodes with 512 GiB (10 nodes) and 1 TiB main memory (2 nodes) and two NVIDIA K40 GPUs are available for advanced visualization purposes. JURECA's visualization partition will replace the older JUVIS visualization cluster at JSC and move the analysis of data closer to its source. An overview of the different node types in JURECA is shown in Table 1. Fig. 1 and Fig. 2 show the V-class chassis and blade servers employed in JURECA.

The Haswell CPUs in JURECA support the AVX 2.0 instruction set architecture extension and can perform two 256-bit (i.e., two times 4 double-precision floating point numbers) wide multiply-add operations per cycle. Due to the increased core count and improved microarchitecture of the Haswell CPUs, the peak floating point capabilities of compute nodes in JURECA are 10 times higher than those of a JUROPA node. From the user perspective, however, this performance improvement does not come for free but requires code optimizations and potentially refactoring in order to take advantage of the wider single instruction multiple data (SIMD) units of the Haswell CPUs. Since October 2014, JSC has been providing JUROPA users with access to the 70 TFlop/s Haswell cluster JUROPATEST to foster such efforts.

JURECA will be interconnected with a cutting-edge Mellanox EDR (extended data rate) InfiniBand network. EDR InfiniBand with four lanes per direction achieves a unidirectional point-to-point bandwidth of 100 Gbps – a significant bandwidth improvement compared to the 4x QDR InfiniBand technology in use by JUROPA. As in JUROPA, the JURECA components will be organized in a fully non-blocking fat-tree topology. The core of the fabric is constituted by four 648-port EDR director switches connected to more than a hundred 36-port leaf switches located in the compute node racks.
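As a quick plausibility check of the quoted figures, the per-node and system peak can be estimated from the AVX2 capabilities described above, using the nominal 2.5 GHz clock and the 1,884-node count listed later in this issue. This is a rough sketch: actual AVX clock rates differ from the nominal value, and the JUROPA node characteristics used for comparison are an assumption.

```python
# Back-of-the-envelope peak-performance estimate for a JURECA compute node
# (illustrative only; assumes the nominal 2.5 GHz clock, not AVX base/turbo clocks).

FLOPS_PER_CYCLE = 2 * 4 * 2   # 2 FMA units x 4 doubles (256-bit) x 2 flops per FMA = 16
CORES_PER_NODE  = 2 * 12      # two Haswell E5-2680 v3 CPUs with 12 cores each
CLOCK_HZ        = 2.5e9       # nominal clock frequency

node_peak = FLOPS_PER_CYCLE * CORES_PER_NODE * CLOCK_HZ
print(f"per-node peak: {node_peak / 1e12:.2f} TFlop/s")    # ~0.96 TFlop/s

system_peak = 1884 * node_peak
print(f"system peak:   {system_peak / 1e15:.2f} PFlop/s")  # ~1.8 PFlop/s

# For comparison, a JUROPA node (assumed: 2 x 4 Nehalem cores at 2.93 GHz,
# 4 double-precision flops per cycle with SSE) yields roughly 94 GFlop/s,
# i.e. about a factor of 10 below a JURECA node, consistent with the text.
juropa_node_peak = (2 * 2) * (2 * 4) * 2.93e9
print(f"JUROPA node:   {juropa_node_peak / 1e9:.0f} GFlop/s")
```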

Figure 2: A V210S compute node as used in the JURECA cluster. Each node hosts two CPU sockets. The V210F version, which has twice the width of a V210S module, can additionally host two PCIe-based accelerator cards.


JURECA will not be equipped with a system-private (global) file system like JUROPA, but will be connected to JSC's storage cluster JUST4 and mount the work and home file systems from there. For users with access to multiple computing systems at JSC this consolidation of the storage landscape simplifies data management and reduces data movement across the center. The connection to JUST4 will be based on InfiniBand and Ethernet technologies. By utilizing the high-bandwidth InfiniBand network for storage, JURECA will not only feature a high accumulated storage bandwidth but individual nodes will be able to sustain a significant portion of this total bandwidth. This design choice ensures JURECA's ability to service the widest variety of users' requirements, from capacity to capability computing and from traditional computational to emerging big data sciences.

To bridge between the high-speed InfiniBand network and JSC's Ethernet-based storage backbone network, Mellanox gateway switches are employed in an active/active mode providing high bandwidth and reliability through failover mechanisms. The employed gateway switches will each route traffic between eighteen FDR (Fourteen Data Rate) InfiniBand links on one side and eighteen 40GigE links on the other side. While JURECA is built around EDR InfiniBand technology, the storage connection is realized with lower performing FDR links due to the available market offerings.

JURECA's cutting-edge hardware setup is matched by a state-of-the-art software stack. The system will be launched with a CentOS 7 Enterprise-Linux installation featuring a 3.10 Linux kernel. The main MPI (Message Passing Interface) implementation will be ParaStation MPI which, in the newest version available on JURECA, supports MPI-3.0. Additionally, Intel MPI will be supported on JURECA. JURECA will be the first large-scale system at JSC on which the open-source Slurm workload manager will be employed.

Node Type Number Characteristics

Standard/Slim 1,605 2x Haswell E5-2680 v3, 128 GiB DDR4 RAM

Fat Type 1 128 2x Haswell E5-2680 v3, 256 GiB DDR4 RAM

Fat Type 2 64 2x Haswell E5-2680 v3, 512 GiB DDR4 RAM

Accelerated 75 2x Haswell E5-2680 v3, 128 GiB DDR4 RAM, 2x K80 GPUs

Login 12 2x Haswell E5-2680 v3, 256 GiB DDR4 RAM

Visualization Type 1 10 2x Haswell E5-2680 v3, 512 GiB DDR4 RAM, 2x K40 GPUs

Visualization Type 2 2 2x Haswell E5-2680 v3, 1 TiB DDR4 RAM, 2x K40 GPUs

Table 1: Overview of the different node types in the JURECA system.


In the context of the JUROPA collaboration, a new plugin for the ParaStation resource management daemon has been developed by ParTec together with JSC. This plugin allows for using the Slurm batch system in combination with the ParaStation resource management – the resource management of choice on JSC's clusters – without requiring additional daemons on the compute nodes that would inject spurious jitter.

Building on the JUROPA experience – where the successful collaboration between JSC and the hardware and software vendors has contributed to the exceptional life span of the system – JSC, T-Platforms and the software provider ParTec are engaging in a collaborative project to further develop and augment the JURECA system after installation and to approach urgent research questions in the scalability of large-scale cluster systems.

The installation of JURECA will proceed in two phases so as to minimize the service interruption for the users, by allowing the first JURECA phase to be installed while JUROPA continues to operate. The first phase of JURECA consists of only six racks but delivers a performance equivalent to the current JUROPA system. It thus allows users to continue working while the JUROPA system is dismantled to free up space for the remaining 28 JURECA racks. The installation of this second phase will be done during production, and only a short offline maintenance window is required to integrate phases one and two in a single fat-tree InfiniBand network and perform benchmarks as part of the acceptance procedure.

• Dorian Krause

Jülich Supercomputing Centre (JSC), Germany

contact: Dorian Krause, [email protected]

Centres

The Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (Leibniz-Rechenzentrum, LRZ) provides comprehensive services to scientific and academic communities by:

• giving general IT services to more than 100,000 university customers in Munich and for the Bavarian Academy of Sciences

• running and managing the powerful communication infrastructure of the Munich Scientific Network (MWN)

• acting as a competence centre for data communication networks

• being a centre for large-scale archiving and backup, and by

• providing High Performance Computing resources, training and support on the local, regional, national and international level.

Research in HPC is carried out in collaboration with the distributed, statewide Competence Network for Technical and Scientific High Performance Computing in Bavaria (KONWIHR).

Contact:
Leibniz Supercomputing Centre
Prof. Dr. Arndt Bode
Boltzmannstr. 1
85748 Garching near Munich
Germany
Phone +49-89-358-31-8000
[email protected]
www.lrz.de

Picture of the Petascale system SuperMUC at the Leibniz Supercomputing Centre.


Compute servers currently operated by LRZ are given in the following table

IBM/Lenovo System x Cluster "SuperMUC Phase 1"
• 18 thin node islands (IBM/Lenovo iDataPlex): 512 nodes per island with 2 Intel Sandy Bridge EP processors each, 147,456 cores, 288 TByte main memory, FDR10 InfiniBand. Peak performance: 3,185 TFlop/s. Purpose: Capability Computing. User community: German universities and research institutes, PRACE projects (Tier-0 system).
• 1 fat node island (IBM/Lenovo BladeCenter HX5): 205 nodes with 4 Intel Westmere EX processors each, 52 TByte main memory, QDR InfiniBand. Peak performance: 78 TFlop/s. Purpose: Capability Computing.
• 32 accelerated nodes with 2 Intel Ivy Bridge EP and 2 Intel Xeon Phi each, 72 GByte main memory, dual-rail FDR14 InfiniBand. Peak performance: 75 TFlop/s. Purpose: Prototype System.

Lenovo/IBM Nextscale Cluster "SuperMUC Phase 2"
• 6 medium node islands: 512 nodes per island with 2 Intel Haswell EP processors each, 86,016 cores, 197 TByte main memory, FDR14 InfiniBand. Peak performance: 3,303 TFlop/s. Purpose: Capability Computing. User community: German universities and research institutes, PRACE projects (Tier-0 system).

Linux-Cluster
• 510 nodes with Intel Xeon EM64T/AMD Opteron (2-, 4-, 8-, 16-, 32-way), 2,030 cores, 4.7 TByte memory. Peak performance: 13.2 TFlop/s. Purpose: Capability Computing. User community: Bavarian and Munich universities, LCG Grid.

SGI Altix Ultraviolet
• 2 nodes with Intel Westmere EX, 2,080 cores, 6.0 TByte memory. Peak performance: 20.0 TFlop/s. Purpose: Capability Computing. User community: Bavarian universities, PRACE.

Megware IB-Cluster "CoolMUC"
• 178 nodes with AMD Magny Cours, 2,848 cores, 2.8 TByte memory. Peak performance: 22.7 TFlop/s. Purpose: Capability Computing, PRACE prototype. User community: Bavarian universities, PRACE.

A detailed description can be found on LRZ’s web pages: www.lrz.de/services/compute


First German National Center
Based on a long tradition in supercomputing at the University of Stuttgart, HLRS (Höchstleistungsrechenzentrum Stuttgart) was founded in 1995 as the first German federal Centre for High Performance Computing. HLRS serves researchers at universities and research laboratories in Europe and Germany and their external and industrial partners with high-end computing power for engineering and scientific applications.

Service for Industry
Service provisioning for industry is done together with T-Systems, T-Systems sfr, and Porsche in the public-private joint venture hww (Höchstleistungsrechner für Wissenschaft und Wirtschaft). Through this co-operation industry always has access to the most recent HPC technology.

Bundling Competencies
In order to bundle service resources in the state of Baden-Württemberg, HLRS has teamed up with the Steinbuch Center for Computing of the Karlsruhe Institute of Technology. This collaboration has been implemented in the non-profit organization SICOS BW GmbH.

World Class Research
As one of the largest research centers for HPC, HLRS takes a leading role in research. Participation in the German national initiative of excellence makes HLRS an outstanding place in the field.

Contact:
Höchstleistungsrechenzentrum Stuttgart (HLRS)
Universität Stuttgart
Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Hon. Prof. Michael M. Resch
Nobelstraße 19
70569 Stuttgart
Germany
Phone +49-711-685-87269
[email protected] / www.hlrs.de

View of the HLRS Cray XC40 "Hornet"


Compute servers currently operated by HLRS

Cray XC40 "Hornet"
• 3,944 dual-socket nodes with 94,656 Intel Haswell cores. Peak performance: 3,786 TFlop/s. Purpose: Capability Computing. User community: European (PRACE) and German research organizations and industry.

NEC Cluster (Laki, Laki2), heterogeneous computing platform of 2 independent clusters
• 911 nodes, 9,988 cores, 23 TB memory. Peak performance: 170 TFlop/s (Laki: 120.5 TFlop/s, Laki2: 47.2 TFlop/s). User community: German universities, research institutes and industry.

NEC SX-ACE
• 64 compute nodes with 256 cores, 4 TB memory. Peak performance: 16 TFlop/s. Purpose: Vector Computing. User community: German universities, research institutes and industry.

A detailed description can be found on HLRS's web pages: www.hlrs.de/systems

HLRS Cray XC40 "Hornet" frontdoor image


The Jülich Supercomputing Centre (JSC) at Forschungszentrum Jülich enables scientists and engineers to solve grand challenge problems of high complexity in science and engineering in collaborative infrastructures by means of supercomputing and Grid technologies.

Provision of supercomputer resources of the highest performance class for projects in science, research and industry in the fields of modeling and computer simulation including their methods. The selection of the projects is performed by international peer-review procedures implemented by the John von Neumann Institute for Computing (NIC), GCS, and PRACE.

Supercomputer-oriented research and development in selected fields of physics and other natural sciences by research groups and in technology, e.g. by doing co-design together with leading HPC companies.

Implementation of strategic support infrastructures including community-oriented simulation laboratories and cross-sectional teams, e.g. on mathematical methods and algorithms and parallel performance tools, enabling the effective usage of the supercomputer resources.

Higher education for master and doctoral students in cooperation, e.g., with the German Research School for Simulation Sciences.

Contact:
Jülich Supercomputing Centre (JSC)
Forschungszentrum Jülich
Prof. Dr. Dr. Thomas Lippert
52425 Jülich
Germany
Phone +49-2461-61-6402
[email protected]
www.fz-juelich.de/jsc

JSC's supercomputer "JUQUEEN", an IBM Blue Gene/Q system.


Compute servers currently operated by JSC

IBM Blue Gene/Q "JUQUEEN"
• 28 racks, 28,672 nodes, 458,752 processors (IBM PowerPC® A2), 448 TByte main memory. Peak performance: 5,872 TFlop/s. Purpose: Capability Computing. User community: European universities and research institutes, PRACE.

Intel Linux Cluster "JURECA"
• 1,884 SMT nodes with 2 Intel Haswell 12-core 2.5 GHz processors each, 150 graphics processors (NVIDIA K80), 281 TByte memory. Peak performance: 2,245 TFlop/s. Purpose: Capacity and Capability Computing. User community: European universities, research institutes and industry.

Intel GPU Cluster "JUDGE"
• 206 nodes with 2 Intel Westmere 6-core 2.66 GHz processors each, 412 graphics processors (NVIDIA Fermi), 20.0 TByte memory. Peak performance: 240 TFlop/s. Purpose: Capacity and Capability Computing. User community: selected HGF projects.

IBM Cell System "QPACE"
• 1,024 PowerXCell 8i processors, 4 TByte memory. Peak performance: 100 TFlop/s. Purpose: Capability Computing. User community: QCD applications, SFB TR55, PRACE.

New Books in HPC

High Performance Computing in Science and Engineering '14
Transactions of the High Performance Computing Center Stuttgart (HLRS), 2014
Editors: Nagel, W. E., Kröner, D. H., Resch, M. M.

This book presents the state-of-the-art in supercomputer simulation. It includes the latest findings from leading researchers using systems from the High Performance Computing Center Stuttgart (HLRS). The reports cover all fields of computational science and engineering, ranging from CFD to computational physics and from chemistry to computer science, with a special emphasis on industrially relevant applications. Presenting findings of one of Europe's leading systems, this volume covers a wide variety of applications that deliver a high level of sustained performance. The book covers the main methods in HPC. Its outstanding results in achieving the best performance for production codes are of particular interest for both scientists and engineers. The book comes with a wealth of color illustrations and tables of results.

Sustained Simulation Performance 2014
Proceedings of the Joint Workshop on Sustained Simulation Performance, University of Stuttgart (HLRS) and Tohoku University, 2014
Editors: Resch, M. M., Bez, W., Focht, E., Kobayashi, H., Patel, N.


This book presents the state-of-the-art in High Performance Computing and simulation on modern supercomputer architectures. It covers trends in hardware and software development in general and the future of high-performance systems and heterogeneous architectures in particular. The application-related contributions cover computational fluid dynamics, material science, medical applications and climate research; innovative fields such as coupled multi-physics and multi-scale simulations are highlighted. All papers were chosen from presentations given at the 18th Workshop on Sustained Simulation Performance held at the HLRS, University of Stuttgart, Germany, in October 2013 and the subsequent workshop of the same name held at Tohoku University in March 2014.

Tools for High Performance Computing 2014
Proceedings of the 8th International Workshop on Parallel Tools for HPC, 2014
Editors: Knüpfer, A., Gracia, J., Nagel, W. E., Resch, M. M.

Current advances in High Performance Computing increasingly impact efficient software development workflows. Programmers of HPC applications need to consider trends such as increased core counts, multiple levels of parallelism, reduced memory per core, and I/O system challenges in order to derive well-performing and highly scalable codes. At the same time, the increasing complexity adds further sources of program defects. While novel programming paradigms and advanced system libraries provide solutions for some of these challenges, appropriate supporting tools are indispensable. Such tools aid application developers in debugging, performance analysis, or code optimization and therefore make a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools presented and discussed at the 8th International Parallel Tools Workshop, held in Stuttgart, Germany, October 1-2, 2014.

Third JUQUEEN Porting and Tuning Workshop

Figure 1: Participants of the Third JUQUEEN Porting and Tuning Workshop (© Forschungszentrum Jülich GmbH).

Jülich Supercomputing Centre (JSC) continued its successful series of JUQUEEN Porting and Tuning Workshops from 2nd to 4th of February this year. The PRACE Advanced Training Centre (PATC) course attracted over 20 participants from various institutions in three European countries. The workshop familiarised the participants with the Blue Gene/Q supercomputer installed at JSC, including the provided toolchain from compilers and libraries to debuggers and performance analysis tools. The participants received help porting their codes, analysing execution performance and scalability, and improving the efficiency of their applications and workflow. Each year the workshop also focuses on one special user group, this time inviting users from the Earth system modelling community. This activity was supported by the Simulation Laboratory Climate Science and the Simulation Laboratory Terrestrial Systems.

The programme of the course started with overviews of JUQUEEN's hardware and software environment. A summary of the best practices was then followed by talks on performance analysis tools, debugging, and efficient I/O.


The workshop concluded with tips on very specific hardware features of the Blue Gene/Q architecture (QPX and TM/SE). At the heart of the workshop were hands-on sessions with the participants' codes, supervised by members of staff from JSC's Simulation Laboratories and cross-sectional teams (Application Optimisation, Performance Analysis, Mathematical Methods and Algorithms) as well as IBM. The general programme was accompanied by sessions for the Earth system modelling groups with short talks by the participants and members of the two Simulation Laboratories involved. Those covered a large variety of codes and applications, from process and sensitivity studies to numerical weather prediction and climate change projections. Examples shown included ensemble approaches with multiple concurrent realisations, parallel data assimilation frameworks, and innovative variable model grids. In general, the challenge here is to optimise the often large legacy codes used in the geosciences with their multiphysics models (clouds and precipitation, convection, chemistry, radiation, groundwater, sediment transport) and multiple spatial and temporal scales, from riverbed water percolation and fine sediment movement to global-scale climate simulations.

The slides of the talks can be found on the web at http://www.fz-juelich.de/ias/jsc/jqws15

• Dirk Brömmel 1
• Klaus Görgen 1, 2
• Lars Hoffmann 1

1 Jülich Supercomputing Centre (JSC), Germany
2 University of Bonn, Germany

contact: Dirk Brömmel, [email protected]
Klaus Görgen, [email protected]
Lars Hoffmann, [email protected]

Helmholtz Portfolio Theme "Supercomputing and Modelling for the Human Brain" − General Assembly 2015

A total of 92 scientists from Forschungszentrum Jülich and several partner institutions collaborating within the Helmholtz Portfolio Theme "Supercomputing and Modelling for the Human Brain" (SMHB) [1] met for their annual General Assembly on March 30th and 31st at the Jülich Supercomputing Centre (JSC) to present and discuss the work progress achieved in 2014 and to agree on the next steps in the project.

The SMHB started in January 2013 as a Portfolio Theme [2] of the Helmholtz Association. Its overarching goal is to better understand the organization and functioning of the human brain by developing a realistic model of the brain. To meet this grand scientific challenge, an appropriate infrastructure for High Performance Computing (HPC) in the exascale range and Big Data analytics needs to be built.

The work plan of the SMHB therefore integrates a wide range of knowledge and expertise from fundamental neuroscience, brain modelling and simulation, simulation software technology, High Performance Computing, large-scale data management, scientific workflows, and interactive visualization and analysis. The SMHB collaborates with JSC's Exascale Labs for the co-design of neuroscience applications and HPC technology.

The SMHB is embedded into the allied partnership of the two new Helmholtz Programmes "Decoding the Human Brain" and "Supercomputing and Big Data". Both were successfully reviewed in 2014 and started in January 2015 with the third period of the Programme-oriented Funding of the Helmholtz Association. The SMHB was also conceived as part of the Helmholtz contribution to the European Future and Emerging Technology (FET) Flagship "Human Brain Project" [3].

The SMHB General Assembly 2015 was opened by Prof. Wolfgang Marquardt, Chairman of the Board of Directors of Forschungszentrum Jülich, and by the two project speakers, Prof. Katrin Amunts (INM-1) and Prof. Thomas Lippert (JSC). In the course of this meeting, the vivid collaboration in the SMHB became once more evident through many examples of fruitful interactions between the SMHB work packages and tasks. The work plan was also updated in order to adapt the work to upcoming needs and to further strengthen existing or newly established links. For instance, two new tasks, "3D cellular architecture" and "multimodal modelling of structure, function and connectivity", were added to the work plan.


Dr. Moritz Helmstaedter from the Max Planck Institute for Brain Research gave an invited and very well received keynote talk on his research topic of Connectomics, in which he introduced the audience to the dense reconstruction of neuronal circuits.

A further highlight of this year's meeting was the SMHB young scientists presenting their work in a spotlight talk session and discussing it afterwards with colleagues and members of the SMHB Scientific Advisory Board, with expertise in several fields, during the poster session.

References
[1] http://www.fz-juelich.de/JuBrain/EN
[2] http://www.helmholtz.de/en/research
[3] https://www.humanbrainproject.eu

• Anne Do Lam-Ruschewski
• Anna Lührs
• Boris Orth

Jülich Supercomputing Centre (JSC), Germany

contact: Anne Do Lam-Ruschewski, [email protected]

6th Blue Gene Extreme Scaling Workshop

As an optional appendix to this year's JUQUEEN Blue Gene/Q Porting and Tuning Workshop at Jülich Supercomputing Centre (JSC), two additional days (5-6 February) were offered for select code-teams to (im)prove their applications' scalability to the entire 458,752 cores. This continued the tradition of the initial 2006 workshop using the JUBL Blue Gene/L, the 2008 workshop using the JUGENE Blue Gene/P and three subsequent workshops dedicated to extreme scaling which attracted participants from around the world [1,2,3,4].

Seven international application code-teams took up this offer: CoreNeuron brain activity simulator (EPFL Blue Brain Project), FE2TI scale-bridging incorporating micro-mechanics in macroscopic simulations of multi-phase steels (University of Cologne and TU Freiberg), FEMPAR finite-element code for multi-physics problems (UPC-CIMNE), ICON icosahedral non-hydrostatic atmospheric model (DKRZ), MPAS-A multi-scale non-hydrostatic atmospheric model for global, convection-resolving climate simulations (KIT and NCAR), psOpen direct numerical simulation of fine-scale turbulence (RWTH-ITV and JARA), and SHOCK structured high-order finite-difference kernel for compressible flows (RWTH-SWL). JSC Simulation Laboratories for Climate Science, Fluids & Solids Engineering and Neuroscience assisted the code-teams, along with JSC Cross-sectional Teams, JUQUEEN and IBM technical support.

12 million core-hours were used, covering a 30-hour period with the full 28 racks reserved, and all seven codes managed their first successful execution using all 28 racks within the first 24 hours of access. Figs. 1 & 2 show that the codes demonstrated excellent strong and/or weak scalability, six of them using 1.8 million MPI processes or OpenMP threads, which improved the existing High-Q Club entry for FEMPAR and qualified five new members for the High-Q Club. MPAS-A unfortunately was not accepted, as its scaling was limited to only 24 racks (393,216 cores) with its 1.2 TB 3-km dataset of 65 million grid points.

Detailed workshop reports provided by each code-team, and additional comparative analysis against the other 16 High-Q Club member codes, are available in a technical report and are expected to be published in the proceedings of a ParCo conference minisymposium later this year. The workshop surpassed our expectations and completely achieved its goal, with all participants finding it extremely useful, as on-hand support made it possible to quickly overcome issues. As a follow-up, a workshop is being organised at this year's ISC-HPC conference to compare JSC application extreme-scaling experience with that of leading supercomputing centres [5].

contact: Brian Wylie, [email protected]
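For readers less familiar with the metrics behind such plots, the short sketch below computes strong- and weak-scaling efficiencies from (core count, runtime) pairs. The timings are fictitious and only serve to illustrate the definitions; they are not workshop results.

```python
# Illustrative strong/weak scaling efficiencies; the timings below are made up.

def strong_scaling_efficiency(cores, runtimes, base=0):
    """Fixed total problem size: E = (c0 * t0) / (c * t)."""
    c0, t0 = cores[base], runtimes[base]
    return [(c0 * t0) / (c * t) for c, t in zip(cores, runtimes)]

def weak_scaling_efficiency(runtimes, base=0):
    """Problem size grows with the core count: E = t0 / t."""
    t0 = runtimes[base]
    return [t0 / t for t in runtimes]

cores    = [16_384, 131_072, 458_752]   # 1, 8 and 28 JUQUEEN racks
runtimes = [100.0, 13.0, 4.0]           # seconds per step (fictitious)
print([f"{e:.2f}" for e in strong_scaling_efficiency(cores, runtimes)])
# -> ['1.00', '0.96', '0.89']
```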


Figure 1: Strong scaling of workshop application codes compared to existing High-Q Club members.

Figure 2: Weak scaling of workshop application codes compared to existing High-Q Club members.

• Dirk Brömmel
• Wolfgang Frings
• Brian Wylie

Jülich Supercomputing Centre (JSC), Germany

References
[1] Mohr, B., Frings, W.: Jülich Blue Gene/P Porting, Tuning & Scaling Workshop 2008, inSiDE 6(2), 2008.
[2] Mohr, B., Frings, W. (Eds.): Jülich Blue Gene/P Extreme Scaling Workshops 2009, 2010, 2011, Tech. Reports FZJ-JSC-IB-2010-02, FZJ-JSC-IB-2010-03 & FZJ-JSC-IB-2011-02.
[3] The High-Q Club, http://www.fz-juelich.de/ias/jsc/high-q-club
[4] Brömmel, D., Frings, W., Wylie, B. J. N.: 2015 JUQUEEN Blue Gene/Q Extreme Scaling Workshop, Tech. Report FZJ-JSC-IB-2015-01, http://juser.fz-juelich.de/record/188191
[5] ISC-HPC Workshop on Application Extreme-scaling Experience of Leading Supercomputing Centres (Frankfurt, 16 July 2015), http://www.fz-juelich.de/ias/jsc/aXXLs/

Retrospective of International Jülich CECAM School on Computational Trends in Solvation and Transport in Liquids

The Jülich CECAM School [1], organized this year at JSC, continued a series of Schools which started in 2000 and which take place at two- or three-year intervals [2]. It has been part of the CECAM [3] event calendar since 2012 and addresses a larger audience interested in computational science and the parallelization of algorithms in various fields of application. This year the focus of the School was on Computational Trends in Solvation and Transport in Liquids, partially motivated by the Cluster of Excellence RESOLV, coordinated at Ruhr-University Bochum, which also co-sponsored the School financially [4].

The participants, 43 young scientists from 8 countries (Europe and overseas), demonstrated the broad interest in the topic, which covers both Soft Matter Physics and Solvation Chemistry. The well-balanced topics of the lectures made it possible to present both fields as coupled together. The 23 lecturers from 7 countries (also Europe and overseas) were highly renowned scientists who presented the latest developments of research embedded in a broad overview of the field.


The topics of the School were grouped into several blocks, covering Atomistic Methods, Coarse Grain and Continuum Methods, Hybrid Methods, Mesoscale Fluid Methods as well as Numerical Methods and Hardware. It is this mix of topical blocks which has proven most efficient in addressing an audience with a sometimes very diverse background, which profits highly from the introductory character of the lectures, spiced with advanced topics that give a perspective on research in the various fields.

An essential part of the School is the provision of Lecture Notes to the audience on the first day, which makes it easier to follow the courses and to refresh the knowledge during and after the School. The Lecture Notes have been published as volume 28 of the IAS series, which is available either as hard copy or as a PDF file on the web [5]. Although the topics were well structured in blocks in the Lecture Notes, during the School they were intentionally scattered over the whole week to provide a broad spectrum of topics to the whole audience and to avoid topic-specific days, which would have reduced the number of participants on individual School days. As a practical aspect of the School, a hands-on session on computing with graphics processing units (GPUs) was organized, providing a code-oriented approach to first steps in GPU computing.

To stimulate discussion between participants and lecturers, 4 evening events were organized: a get-together during the first evening, 2 poster sessions and an excursion of the whole group to the brown-coal open-cast mine in the close neighborhood of Jülich, together with a dinner. The positive feedback from participants and lecturers provides strong motivation to continue organizing Computational Science oriented Schools in a similar scheme.

References
[1] www.fz-juelich.de/stl-2015
[2] For a list of past Schools, see: www.fz-juelich.de/ias/jsc/EN/Expertise/Workshops/Conferences/STL-2015/PastWorkshops/_node.html
[3] www.cecam.org
[4] CoE Ruhr Explores Solvation: www.ruhr-uni-bochum.de/solvation/
[5] IAS Series, Vol. 28: https://juser.fz-juelich.de/record/188877/files/IAS-Band-28-V2.pdf

• Godehard Sutmann1
• Johannes Grotendorst1
• Gerhard Gompper
• Dominik Marx2

1 Jülich Supercomputing Centre (JSC), Germany
2 Ruhr-University Bochum, Germany

Workshop on Force-Field Development (Forces 2014)

Many processes in nature and technology can be rationalized with computer simulations of a few thousand to a few hundred million atoms. Examples range from protein folding via plastic deformation to the dynamics of explosions. Simulating such processes in a meaningful fashion is outside the scope of electronic-structure based techniques and will remain so for the next few decades, even if computers and algorithms keep improving at the current rate. Gaining useful insight into the various processes then often requires one to use low-cost force fields that still properly reflect the intricate quantum mechanics responsible for interatomic bonding and repulsion. For instance, the failure mechanism of a specific material can only be unraveled in simulations that accurately account for defect energetics. This cannot be achieved with generic two-body potentials. To advance the field, a workshop on the development of force fields was held in Jülich from November 3–5, 2014.

Seventy participants from four continents attended the workshop, in which eleven invited talks, seven contributed talks, as well as twenty posters were presented. Topics ranged from the systematic and fitting-free bottom-up design of force fields from first principles, machine-learning strategies, and force fields for non-equilibrium situations (excited electronic states) to potential repositories and force-field standardization. The IOP journal Modeling and Simulation in Materials Science and Engineering (MSMSE) will dedicate a special issue, Force Fields: From Atoms to Materials, to this very successful workshop. Ten invited contributions to the proceedings volume are currently under review and expected to be published within the next six months. More details on the workshop, including a link to the MSMSE special issue once it becomes available, can be found at http://www.fz-juelich.de/ias/jsc/ForceFields2014

• Martin Müser

Jülich Supercomputing Centre (JSC), Germany
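To make concrete what a "generic two-body potential" is (a standard textbook form, not a potential discussed at the workshop), the total energy is written as a sum over pair interactions, for example of Lennard-Jones type,

\[ E = \sum_{i<j} V(r_{ij}), \qquad V_\mathrm{LJ}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right], \]

which depends only on the pair distances r_ij and therefore cannot resolve the bonding environment of an atom; this is one reason why defect energetics call for many-body or machine-learned force fields.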

Figure 1: Participants of the NIC workshop Force Fields: From Atoms to Materials in front of the Jülich Supercomputing Centre.

JSC becomes full Member of JLESC

The Joint Laboratory for Extreme Scale Computing (JLESC) brings together researchers from the Institut National de Recherche en Informatique et en Automatique (Inria, France), the National Center for Supercomputing Applications (NCSA, USA), Argonne National Laboratory (ANL, USA), the Barcelona Supercomputing Center (BSC, Spain), the RIKEN Advanced Institute for Computational Science (Japan) and JSC.

The objectives of JLESC are to initiate and facilitate international collaborations on state-of-the-art research related to computational and data-focused simulation and analytics at extreme scales. JLESC promotes original ideas, research and publications, as well as open source software, aimed at addressing the most critical issues in advancing from petascale to extreme-scale computing. Research topics include parallel programming models, numerical algorithms, parallel I/O and storage systems, data analytics, heterogeneous computing, resilience and performance analysis. JLESC involves scientists and engineers from many different disciplines as well as from industry to ensure that the research supported by the laboratory addresses science and engineering's most critical needs and takes advantage of the continuing evolution of computing technologies.

The collaborative work within the joint laboratory is organized in projects between two or more partners. This includes mutual research visits, joint publications and software releases. The results of these projects are discussed during biannual workshops, where new ideas and collaboration opportunities are also presented. The next event will take place in Barcelona from June 29 to July 1, 2015, followed by a two-day summer school on big data. In December 2015, the event will be organized by JSC.

JSC is involved in many new developments related to key extreme-scale applications. This expertise nicely complements JLESC's current work and will bring new problems to the applied mathematics and computer science communities. As this is truly a high-ranking consortium in supercomputing, JSC felt very much honored to be invited to join as a full partner. JSC is represented in JLESC by Prof. Thomas Lippert as Steering Committee member and by Dr. Robert Speck as Executive Director. For more information visit the official JLESC website at http://publish.illinois.edu/jointlab-esc

• Robert Speck

Jülich Supercomputing Centre (JSC), Germany

JuPedSim: Framework for simulating and analyzing the Dynamics of Pedestrians

Figure 1: Screenshot of the simulation of a building evacuation with three floors connected by two staircases. Pedestrians are represented by cylinders and their velocities are color coded (red for slow and green for fast, ca. 1.34 m/s). After 4 seconds, there are still 36 pedestrians in the building.

The growing complexity of modern buildings and pedestrians' outdoor facilities makes the use of simulation software inescapable. Simulations are used not only in the design phase of new structures but also later in the preparation or monitoring of large-scale events. To obtain reliable results, it is paramount to understand and reproduce the underlying phenomena which rule the dynamics of the pedestrians. There exist numerous software tools, mostly commercial (or with proprietary licenses), for simulating pedestrians. They usually implement a single model and are only of limited use for academic purposes, where the aim is generally a rapid prototyping framework for implementing and testing new concepts or models. Open source tools exist as well, but they are mainly designed for applications and do not offer easy access for model testing or extension. In addition, the state of the art in the modeling of pedestrians is evolving rapidly, and the zoo of models is still growing. However, it lacks a common basis for model comparison and benchmarking. From the scientific and academic point of view, it is often crucial to understand how a model has been implemented, since the mathematical description and the computer implementation sometimes differ. This also raises issues about the validation of the models, especially against empirical data, which is often neglected in many software systems.

The Jülich Pedestrian Simulator, JuPedSim, is an extensible framework for simulating and analyzing pedestrians' motion at a microscopic level. It currently consists of three modules which are loosely coupled and can be used independently. JuPedSim implements state-of-the-art models and analysis methods.

The module JPScore computes the trajectories of the pedestrians given a geometry and an initial configuration. The start configuration includes the desired destinations, speeds, route choices and other demographic parameters of the pedestrians. Three-dimensional geometries are also supported. Two models at the operational level (locomotion system, collision avoidance) and three models at the tactical level (route choice, short-term decisions) are currently implemented in the framework [1,2]. Other models are in the process of being integrated, and further models can be incorporated by third parties without much effort. Other behavioral parameters are implemented as well, such as the possibility to share information about closed doors with other agents and the ability to explore an unknown environment looking for an exit.

The second module JPSvis visualizes the geometry and the trajectories, either from files or streamed from a network connection. High resolution videos can also be recorded directly from the module interface.

The module JPSreport analyses the results from the simulation, or from any other source such as experiments, and generates different types of plots. The reporting tool integrates four different measurement methods [3]. Possible analyses include densities, velocities, flows and profiles of pedestrians in a given geometry; the Voronoi-based density measure is sketched below.

Figure 2: Workflow for analyzing trajectories using JPSreport. Top left: experiment involving two pedestrian streams coming from left and right and merging in a T-junction. Top right: automatically extracted trajectories using PeTrack [4]. Bottom left: density profiles computed with JPSreport using the Voronoi method [3]. Bottom right: instantaneous velocities of the pedestrians using the Voronoi method [3].

Planned features for the framework include a graphical user interface for editing the geometry, which will also include import capabilities for various CAD formats, and the connection of the pedestrian simulation with a fire simulator. In contrast to other simulation packages, an emphasis is set on the validation of the implemented models. The empirical data used for the validation come from numerous experiments that have been conducted in different geometries during the past years. All input and output files are XML based. JuPedSim is platform independent, released under the LGPL license and written in C++. All information, including documentation, source code and experimental data, is available at www.jupedsim.org. JuPedSim has been used, for instance, in a real-time evacuation assistant for arenas [1] and will be used to simulate the Berlin underground station Osloer Straße in the ORPHEUS project [5].

References
[1] Kemloh Wagoum, A. U.: Route choice modelling and runtime optimisation for simulation of building evacuation, Dissertation, Schriften des Forschungszentrums Jülich: IAS Series 17, XVIII, 122 p., 2013.
[2] Chraibi, M.: Validated force-based modeling of pedestrian dynamics, Dissertation, Schriften des Forschungszentrums Jülich: IAS Series 13, 2012.
[3] Zhang, J.: Pedestrian fundamental diagrams: Comparative analysis of experiments in different geometries, Dissertation, Schriften des Forschungszentrums Jülich: IAS Series 14, 103 p., 2012.
[4] Boltes, M.: Automatische Erfassung präziser Trajektorien in Personenströmen hoher Dichte, Dissertation, Schriften des Forschungszentrums Jülich: IAS Series 27, xii, 308 p., 2015.
[5] Arnold, L.: ORPHEUS – Fire Safety in the Underground, inside, Spring 2015.

• Armel Ulrich Kemloh Wagoum
• Armin Seyfried

Jülich Supercomputing Centre (JSC), Germany

contact: Armel Ulrich Kemloh Wagoum, [email protected]
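The Voronoi-based density used by JPSreport [3], sketched here in its standard form (a summary of the published method, not a quotation from the JuPedSim documentation), assigns to every point x the inverse area of the Voronoi cell of the pedestrian that owns it and then averages over a measurement area A_m:

\[ \rho(\mathbf{x}) = \frac{1}{A_i} \ \text{for } \mathbf{x} \in \text{Voronoi cell } i, \qquad \langle\rho\rangle_{A_m} = \frac{1}{|A_m|} \int_{A_m} \rho(\mathbf{x})\,\mathrm{d}\mathbf{x}. \]

Compared with simple head counts in a fixed area, this construction yields smoother density and velocity profiles such as those shown in Figure 2.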

HLRS at SC14 in New Orleans, Louisiana

Belonging to the steady circle of participants at the annual Supercomputing Conference (SC), HLRS returned to SC14, held in New Orleans, Louisiana/USA (November 16-21, 2014). In addition to HLRS representatives participating in a number of workshops, birds-of-a-feather sessions, tutorials and other SC events, HLRS also hosted a booth in the SC exhibition hall, where they presented details about the large number of breakthrough-caliber simulation projects the HLRS supercomputing infrastructure is being used for.

A highlight of the HLRS exhibits was a hands-on Augmented Reality demo which analyzed simulation results of the airflow and the pressure distribution around a 3D-scanned triathlete on her racing bike. By moving a camera around the bicycle, or even riding the bicycle themselves, visitors could immediately observe changes to the airflow resulting from various riding positions of the triathlete. This method helps triathletes to find and verify the most efficient riding position on their racing bikes and to analyze individually mounted accessories and helmets.

Apart from HLRS researchers and scientists exchanging ideas with like-minded HPC experts, visitors to the HLRS booth were also given the opportunity to have some fun. The HLRS staff had organized a special competition with the opportunity for the participants to win real trophies: the HLRS HPC Awards (High Performance Cycling Awards). Those who dared to meet the challenge of the 5 km long bike trail were given the opportunity to enjoy the (simulated) charming scenery of the beautiful Black Forest. However, they also had to fight the elevation of the German mountain range: up- and downhill passages had to be mastered while at the same time "racing against the clock" and trying to beat the trail records.

What at first seemed like a feasible task proved for some biking fans to be too big a challenge. Not all courageous racers eventually crossed the "finishing line".

• Regina Weigand

University of Stuttgart, HLRS, Germany

contact: Uwe Wössner, [email protected]

14th HLRS/hww Workshop on Scalable Global Parallel File Systems

From April 27th to April 29th, 2015, representatives from science and industry working in the field of Global Parallel File Systems and High Performance Storage met at HLRS for the fourteenth annual HLRS/hww Workshop on Scalable Global Parallel File Systems, "The Non-Volatile Challenge". About 75 participants followed a total of 22 presentations on the workshop agenda.

Prof. Michael Resch, Director of HLRS, opened the workshop with an opening address on Monday morning.

In the keynote talk, Eric Barton, CTO of Intel's High Performance Data Division, discussed "A new storage paradigm for NVRAM and integrated fabrics". He explained emerging trends and upcoming technologies which might significantly change the storage landscape. Peter Braam, Braam Research and University of Cambridge, explained the exascale data requirements and issues of the Square Kilometre Array, which is currently under development.

In the first presentation of the Monday afternoon session, Torben Kling-Petersen, Seagate, discussed Seagate's Lustre Storage Enterprise HPC technology, which enables energy-efficient, extreme-performance storage solutions. Following new approaches for large HPC systems, Wilfried Oed, Cray, explained the Cray DataWarp solution, which can already be deployed in today's Cray XC30 and XC40 systems. Afterwards, Alexander Menck, NEC, gave an overview of the NEC storage portfolio.

In the second afternoon session, James Coomer, DDN, introduced the IME Burst Buffer technology and showed how real-world applications can profit from its usage. Franz-Josef Pfreundt, FhG – ITWM, showed BeeOND, which is BeeGFS (formerly known as the Fraunhofer File System) on demand. He explained how a file system can be set up and provided automatically on nodes which have been reserved for a user job, e.g. by a batch system. In the last presentation of the day, Mellanox' Oren Duer gave an overview of Mellanox' efforts in the storage field, including the different technologies they provide.

In the first presentation on Tuesday morning, Akram Hassan, EMC, provided EMC's view on elastic cloud and object storage. He showed how the object storage solution fits the needs of today's applications. The following talks were more research oriented.


Tim Süß, University of Mainz, presented results of his studies on the potential of data deduplication in checkpoint data of scientific simulations.

The second session started with a presentation by IBM, given by Olaf Weiser, explaining new developments in GPFS, especially GPFS Native RAID. For half a year now, Lenovo has been a new and active player on the HPC market; Michael Hennecke provided an overview of Lenovo's HPC storage solutions. In the last talk of the morning sessions, Thomas Uhl introduced a new, highly performant HSM solution provided by GrauData which works especially well together with Lustre.

The second half of the day is traditionally reserved for network-oriented presentations. This year a focus was on the opportunities of Software Defined Networking (SDN). Yaron Ekshtein presented in detail the Pica8 approach for a switch operating system, providing an open networking approach for improving cost/performance and scalability of compute clusters. This was followed by Edge-Core's Lukasz Lukowski, who showed its bare-metal switches as underlying hardware for SDN and how they can accelerate open networking. Klaus Grobe, Adva Optical, went even further down to the hardware and explained solutions for inter-data-center 400-Gb/s WDM transport.

As a preview of the Wednesday session, Radu Tudoran, Huawei, introduced the three storage musketeers of the Big Data era: Reliable, Integrated and Intelligent, and Technology-Convergent.

The other talks also touching the big data arena followed on Wednesday morning. Mario Vosschmidt, NetApp, discussed the benefits of declustered RAID technologies and in particular covered Hadoop file systems. Alexey Cheptsov, HLRS, introduced the Juniper project and how HPC can be used for data-intensive applications with Hadoop and OpenMPI.

The last session was again more future and research oriented. The Seagate perspective on the future of HPC storage was given by Torben Kling-Petersen. Michael Kuhn, Hamburg University, followed with a new approach to make future storage systems aware of I/O semantics, and finally Thomas Bönisch, HLRS, showed the results of the project SIOX, Scalable I/O Extensions.

HLRS appreciates the great interest it has received once again from the participants of this workshop and gratefully acknowledges the encouragement and support of the sponsors who have made this event possible.

• Thomas Bönisch

University of Stuttgart, HLRS, Germany

HLRS Scientific Tutorials and Workshop Report and Outlook

The Cray XC40 Optimization and Parallel I/O courses will provide all necessary information to move applications to the new Cray XC40 Hornet and the future Hazel Hen HPC system. We are looking forward to working with our users on this leading-edge supercomputing technology. The next course series in cooperation with Cray specialists will take place on October 20-23, 2015. An online recording from September 2014 is also available.

One of the flagships of our courses is the week on Iterative Solvers and Parallelization. Prof. A. Meister teaches basics and details on Krylov Subspace Methods. Lecturers from HLRS give lessons on distributed memory parallelization with the Message Passing Interface (MPI) and shared memory multithreading with OpenMP. This course will be presented twice, on March 16–20, 2015 at HLRS in Stuttgart and on September 07–11, 2015 at LRZ in Garching near Munich.

Another highlight is the Introduction to Computational Fluid Dynamics. This course was initiated at HLRS by Prof. Sabine Roller. It is again scheduled on February 23–27, 2015 in Siegen and on September 14–18, 2015 in Stuttgart.

In April 2015, Performance and Debugging Tools are presented to assist parallel programming. In July 2015 we continue the successful series of two courses on software optimization, Node-level Performance Engineering by Georg Hager and Jan Treibig, and User-guided Optimization in High-Level Languages from the Computer Graphics Lab at Saarland University. In June 2015, a Cluster Workshop will discuss hardware and software aspects of compute clusters.

The Visualization Courses in April and October 2015 are targeted at researchers who would like to learn how to visualize their simulation results on the desktop but also in Augmented Reality and Virtual Environments.

Our general course on parallelization, the Parallel Programming Workshop, September 07–11, 2015 at HLRS, will have three parts: the first two days of this course are dedicated to parallelization with the Message Passing Interface (MPI). Shared memory multi-threading is taught on the third day, and in the last two days, advanced topics are discussed. This includes MPI-2 functionality, e.g., parallel file I/O and hybrid MPI+OpenMP, as well as MPI-3.0 (and the upcoming 3.1) with its extensions to one-sided communication, a new shared memory programming model and a new Fortran interface. As in all courses, hands-on sessions (in C and Fortran) will allow users to immediately test and understand the parallelization methods. The course language is English.

ISC and SC Tutorials
Georg Hager, Gabriele Jost, Rolf Rabenseifner: MPI+X - Hybrid Programming on Modern Compute Clusters with Multicore Processors and Accelerators. Tutorial 02 at the International Supercomputing Conference, ISC'15, Frankfurt, July 12–16, 2015.
Georg Hager, Jan Treibig, Gerhard Wellein: Node-Level Performance Engineering. Full-day Tutorial at Supercomputing 2014, SC14, New Orleans, Louisiana, November 16–21, 2014.
Rolf Rabenseifner, Georg Hager: MPI+X - Hybrid Programming on Modern Compute Clusters with Multicore Processors and Accelerators. Half-day Tutorial at Supercomputing 2014, SC14, New Orleans, Louisiana, November 16–21, 2014.

Three- and four-day courses on MPI & OpenMP are presented at other locations throughout the year.

We also continue our series of Fortran for Scientific Computing on March 09–13, 2015 and December 07–11, 2015, mainly visited by PhD students from Stuttgart and other universities, who learn not only the basics of programming but also gain insight into the principles of developing high-performance applications with Fortran.

With Unified Parallel C (UPC) and Co-Array Fortran (CAF) and an Introduction to GASPI each year in spring, the participants get an introduction to partitioned global address space (PGAS) languages.

GPU programming is taught in OpenACC Programming for Parallel Accelerated Supercomputers – an alternative to CUDA from the Cray perspective – in April, and in two CUDA courses in April and October 2015.

In cooperation with Dr. Georg Hager from the RRZE in Erlangen and Dr. Gabriele Jost from Supersmith, HLRS also continues its contributions on hybrid MPI & OpenMP programming with tutorials at conferences; see the ISC and SC Tutorials box above, which also includes a second tutorial with Georg Hager from RRZE.

In the table below you can find the whole HLRS series of training courses in 2015. They are organized at HLRS and also at several other HPC institutions: LRZ Garching, JSC (FZ Jülich), ZIH (TU Dresden), and ZIMT (Siegen). GCS serves as a PRACE Advanced Training Centre (PATC). PATC courses are marked in the table.

2015 – Workshop Announcements

Scientific Conferences and Workshops at HLRS
14th HLRS/hww Workshop on Scalable Global Parallel File Systems (April 27-30)
High Performance Computing in Science and Engineering – The 17th Results and Review Workshop of the HPC Center Stuttgart (October 05-06)
IDC International HPC User Forum (Autumn 2015)

Parallel Programming Workshops: Training in Parallel Programming and CFD
Parallel Programming and Parallel Tools (TU Dresden, ZIH, February 16–19)
Industrial Services of HLRS (HLRS, February 25, July 01, and November 11)
VI-HPS Tuning Workshop (HLRS, February 23-27) (PATC)
Introduction to Computational Fluid Dynamics (ZIMT, Uni Siegen, February 23-27)
Cray XC40 and I/O Workshops (HLRS, March 02–05 and October 20–23) (PATC)
Iterative Linear Solvers and Parallelization (HLRS, March 16–20 / LRZ Garching, September 07-11)
Tools for Parallel Programming with OmpSs (HLRS, April 13–15)
OpenACC Programming for Parallel Accelerated Supercomputers (HLRS, April 16–17) (PATC)
GPU Programming using CUDA (HLRS, April 20–22 and October 26–28)
Unified Parallel C (UPC) and Co-Array Fortran (CAF) (HLRS, April 23-24) (PATC)
Efficient Parallel Programming with GASPI (HLRS, April 27)
Scientific Visualization (HLRS, April 28–29 and October 29–30)
Cluster Workshop (HLRS, June 29–30)
Node Level Performance Engineering (HLRS, July 06–07)
Introduction to Computational Fluid Dynamics (HLRS, September 14-18)
Message Passing Interface (MPI) for Beginners (HLRS, September 28-29)
Shared Memory Parallelization with OpenMP (HLRS, September 30)
Advanced Topics in Parallel Programming (HLRS, October 01-02) (PATC)
Advanced Parallel Programming with MPI & OpenMP (FZ Jülich, JSC, November 30–December 02)

Training in Programming Languages at HLRS
Fortran for Scientific Computing (HLRS, February 09–13 / December 07–11) (PATC)

URLs
http://www.hlrs.de/events/
http://www.hlrs.de/training/course-list/
(PATC): This is a PRACE PATC course

• Rolf Rabenseifner

University of Stuttgart, HLRS, Germany

GCS – High Performance Computing Courses and Tutorials

Node-Level Performance Engineering (PATC course)

Date & Location
July 6-7, 2015, HLRS, Stuttgart
and December 10-11, 2015, LRZ, Garching near Munich

Contents
This course teaches performance engineering approaches on the compute node level. "Performance engineering" as we define it is more than employing tools to identify hotspots and bottlenecks. It is about developing a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code gets executed that does the actual computational work. Once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of optimizations can often be predicted. We introduce a "holistic" node-level performance engineering strategy, apply it to different algorithms from computational science, and also show how an awareness of the performance features of an application may lead to notable reductions in power consumption.
• Introduction and Motivation
• Performance Engineering as a process
• Topology and affinity in multicore systems
• Microbenchmarking for architectural exploration
• The Roofline Model (see the formula below)
  - Basics and simple applications
  - Case study: sparse matrix-vector multiplication
  - Case study: Jacobi smoother
• Model-guided optimization
  - Blocking optimization for the Jacobi smoother
• Programming for optimal use of parallel resources
  - Single Instruction Multiple Data (SIMD)
  - Cache-coherent Non-Uniform Memory Architecture (ccNUMA)
  - Simultaneous Multi-Threading (SMT)
• Pattern-guided performance engineering
  - Hardware performance metrics
  - Typical performance patterns in scientific computing
  - Examples and best practices
• Beyond Roofline: The ECM Model
• Optional: Energy-efficient code execution
Between each module, there is time for Questions and Answers!
The course is a PRACE Advanced Training Center event.

Webpage
www.hlrs.de/training/course-list and http://www.lrz.de/services/compute/courses/

Introduction to Parallel Programming with MPI and OpenMP

Date & Location
August 11-14, 2015, JSC, Forschungszentrum Jülich

Contents
The course provides an introduction to the two most important standards for parallel programming under the distributed and shared memory paradigms: MPI, the Message Passing Interface, and OpenMP. While intended mainly for the JSC guest students, the course is open to other interested persons upon request.

Webpage
http://www.fz-juelich.de/ias/jsc/events/mpi-gsp
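For readers who have not met the Roofline model mentioned in the Node-Level Performance Engineering outline, its basic form (standard notation, not taken from the course material) bounds the attainable performance P of a loop by

\[ P = \min\left(P_\mathrm{peak},\; I \cdot b_S\right), \]

where P_peak is the peak arithmetic performance of the node, b_S the attainable memory bandwidth, and I the computational intensity (floating-point operations per byte of data traffic).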


Iterative Linear Solvers and Parallelization

Date & Location
September 7-11, 2015, LRZ, Garching near Munich

Contents
The focus is on iterative and parallel solvers, the parallel programming models MPI and OpenMP, and the parallel middleware PETSc. Different modern Krylov Subspace Methods (CG, GMRES, BiCGSTAB ...) as well as highly efficient preconditioning techniques are presented in the context of real-life applications (a minimal serial sketch of CG follows after these course listings). Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the basic constructs of iterative solvers, the Message Passing Interface (MPI) and the shared memory directives of OpenMP. This course is organized by LRZ, University of Kassel, HLRS, and IAG.

Webpage
http://www.lrz.de/services/compute/courses/

Advanced Fortran Topics

Date & Location
September 14-18, 2015, LRZ, Garching near Munich

Contents
This course, partly a PRACE Advanced Training Center (PATC) course, is targeted at scientists who wish to extend their knowledge of Fortran beyond the Fortran 95 standard. Days 1-3 (LRZ course): best practices (global objects & interfaces, abstract interfaces, import statement, object-based programming), object-oriented programming (type extension, polymorphism, inheritance, type- & object-bound procedures, generic type-bound procedures, abstract types & deferred bindings), IEEE features & floating point exceptions, Fortran 2003 I/O extensions. Days 4-5 (PATC course): object-oriented design patterns (polymorphic objects & function arguments, interacting objects, submodules & plugins), interoperability with C (mixed language programming), coarrays (basics, dynamic entities, advanced synchronization, programming patterns, collectives, events, teams, atomic subroutines, performance aspects). To consolidate the lecture material, each day's approximately 4 hours of lecture are complemented by 3 hours of hands-on sessions. The last 2 days of the course are a PATC event.

Prerequisites
• good knowledge of Fortran 95

Webpage
http://www.lrz.de/services/compute/courses/

Introduction to Computational Fluid Dynamics in High Performance Computing

Date & Location
September 14-18, 2015, HLRS, Stuttgart

Contents
The course deals with current numerical methods for Computational Fluid Dynamics in the context of high performance computing. An emphasis is placed on explicit methods for compressible flows, but classical numerical methods for incompressible Navier-Stokes equations are also covered. A brief introduction to turbulence modelling is also provided by the course. Additional topics are high-order numerical methods for the solution of systems of partial differential equations. The last day is dedicated to parallelization. Hands-on sessions will manifest the contents of the lectures. In most of these sessions, the application framework APES will be used. They cover grid generation using Seeder, visualization with ParaView and the usage of the parallel CFD solver Ateles on the local HPC system. The course is organized by HLRS, IAG (University of Stuttgart) and STS, ZIMT (University of Siegen).

Webpage
http://www.hlrs.de/events/
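As a taste of what the Iterative Linear Solvers course starts from (a minimal serial sketch, not course material: no preconditioning, and a tiny dense SPD test matrix instead of a realistic sparse system), a plain Conjugate Gradient iteration in C looks as follows:

/* Minimal unpreconditioned Conjugate Gradient sketch (illustrative only):
 * solves A x = b for a small dense symmetric positive definite matrix. */
#include <math.h>
#include <stdio.h>

#define N 4

static void matvec(const double A[N][N], const double x[N], double y[N]) {
    for (int i = 0; i < N; ++i) {
        y[i] = 0.0;
        for (int j = 0; j < N; ++j) y[i] += A[i][j] * x[j];
    }
}

static double dot(const double a[N], const double b[N]) {
    double s = 0.0;
    for (int i = 0; i < N; ++i) s += a[i] * b[i];
    return s;
}

int main(void) {
    /* Small SPD test matrix, right-hand side and zero start vector. */
    double A[N][N] = {{4,1,0,0},{1,4,1,0},{0,1,4,1},{0,0,1,4}};
    double b[N] = {1,2,3,4}, x[N] = {0,0,0,0};
    double r[N], p[N], Ap[N];

    matvec(A, x, r);                          /* r = b - A x */
    for (int i = 0; i < N; ++i) r[i] = b[i] - r[i];
    for (int i = 0; i < N; ++i) p[i] = r[i];  /* first search direction */
    double rr = dot(r, r);

    for (int k = 0; k < 100 && sqrt(rr) > 1e-10; ++k) {
        matvec(A, p, Ap);
        double alpha = rr / dot(p, Ap);       /* step length */
        for (int i = 0; i < N; ++i) x[i] += alpha * p[i];
        for (int i = 0; i < N; ++i) r[i] -= alpha * Ap[i];
        double rr_new = dot(r, r);
        double beta = rr_new / rr;            /* conjugation coefficient */
        for (int i = 0; i < N; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    for (int i = 0; i < N; ++i) printf("x[%d] = %f\n", i, x[i]);
    return 0;
}

The same building blocks, matrix-vector products and dot products, are what the MPI, OpenMP and PETSc versions treated in the course distribute across processes and threads.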


Message Passing Interface (MPI) for Beginners

Date & Location
September 28-29, 2015, HLRS, Stuttgart

Contents
The course gives a full introduction to MPI-1. Further aspects are domain decomposition, load balancing, and debugging. An MPI-2 overview and the MPI-2 one-sided communication are also taught. Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the basic constructs of the Message Passing Interface (MPI). Course language is English (if required).

Webpage
http://www.hlrs.de/events/

Shared Memory Parallelization with OpenMP

Date & Location
September 30, 2015, HLRS, Stuttgart

Contents
This course teaches shared memory OpenMP parallelization, the key concept on hyper-threading, dual-core, multi-core, shared memory, and ccNUMA platforms. Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the directives and other interfaces of OpenMP. Tools for performance tuning and debugging are presented. Course language is English (if required).

Webpage
http://www.hlrs.de/events/

Advanced Topics in Parallel Programming (PATC course)

Date & Location
October 1-2, 2015, HLRS, Stuttgart

Contents
Topics are MPI-2 parallel file I/O, hybrid mixed-model MPI+OpenMP parallelization (a minimal hybrid example follows after these course listings), MPI-3.0, parallelization of explicit and implicit solvers and of particle-based applications, parallel numerics and libraries, and parallelization with PETSc. Hands-on sessions are included. Course language is English (if required).

Webpage
http://www.hlrs.de/events/
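To illustrate the kind of program the hands-on sessions of these three courses begin with (a minimal sketch, not taken from the official course material), the classic hybrid MPI+OpenMP "hello world" combines both programming models:

/* Minimal hybrid MPI+OpenMP example (illustrative only, not official
 * course material). Each MPI process opens an OpenMP parallel region
 * and every thread reports its rank and thread id. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank, size;

    /* Request thread support suitable for OpenMP regions between MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        printf("Hello from thread %d/%d on MPI rank %d/%d\n",
               tid, nthreads, rank, size);
    }

    MPI_Finalize();
    return 0;
}

Compiled with, for example, mpicc -fopenmp hello.c and started with a few MPI processes, it already exposes the interplay of processes and threads that the Advanced Topics course examines in detail.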


Introduction to GPU programming using OpenACC

Date & Location
October 19-20, 2015, JSC, Forschungszentrum Jülich

Contents
GPU-accelerated computing drives current scientific research. Writing fast numeric algorithms for GPUs offers high application performance by offloading compute-intensive portions of the code to the GPU. The course will cover basic aspects of GPU architectures and programming. Focus is on the usage of the directive-based OpenACC programming model, which allows for development. Examples of increasing complexity will be used to demonstrate optimization and tuning of scientific applications.

Webpage
http://www.fz-juelich.de/ias/jsc/events/openacc

Cray XC40 Workshop and Parallel I/O Courses

Date & Location
October 20-23, 2015, HLRS, Stuttgart

Contents
This course consists of two parts.

A three-day optimization course covers the main topics of porting applications to the Cray XC40 system Hornet, application profiling, application debugging, and parallel I/O. Tools provided on the system are presented in talks and live demos. Furthermore, a series of hands-on sessions gives the opportunity to exercise also with one's own applications.

An increasing issue in parallel computing is the efficient handling of reading and writing files. The one-day workshop Efficient Parallel I/O offers users the ability to gain insight into how to optimize the I/O of an application. It shows methods and tools for parallel I/O, including MPI-IO, HDF5 and NetCDF (see the sketch after these course listings). Thus the time spent in reading and writing files can be significantly reduced, increasing the total efficiency of an application.

The workshop does include some Cray-specific information about tools but is still useful for I/O on other systems as well.

Webpage
http://www.hlrs.de/events/

GPU Programming using CUDA

Date & Location
October 26-28, 2015, HLRS, Stuttgart

Contents
The course provides an introduction to the programming language CUDA, which is used to write fast numeric algorithms for NVIDIA graphics processors (GPUs). Focus is on the basic usage of the language, the exploitation of the most important features of the device (massively parallel computation, shared memory, texture memory) and efficient usage of the hardware to maximize performance. An overview of the available development tools and the advanced features of the language is given.

Webpage
http://www.hlrs.de/events/
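To give a concrete flavour of the parallel I/O techniques listed for the Efficient Parallel I/O workshop (a minimal illustrative sketch under simple assumptions, not taken from the course material), the following MPI-IO fragment lets every rank write one contiguous block of a shared file:

/* Minimal MPI-IO sketch (illustrative only): every rank writes its own
 * block of integers to one shared file at a rank-dependent offset. */
#include <mpi.h>
#include <stdio.h>

#define BLOCK 1024  /* number of integers written per rank (example value) */

int main(int argc, char **argv) {
    int rank, buf[BLOCK];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < BLOCK; ++i) buf[i] = rank;  /* fill with rank id */

    /* Each rank writes at offset rank * BLOCK * sizeof(int) of "out.dat". */
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * BLOCK * sizeof(int);
    MPI_File_write_at_all(fh, offset, buf, BLOCK, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}

Collective calls such as MPI_File_write_at_all give the MPI library the chance to aggregate requests, which is typically among the first optimizations discussed for parallel file systems.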


Scientific Visualization

Date & Location
November 5-6, 2015, HLRS, Stuttgart

Contents
This two-day course is targeted at researchers with basic knowledge in numerical simulation who would like to learn how to visualize their simulation results on the desktop but also in Augmented Reality and Virtual Environments. It will start with a short overview of scientific visualization, followed by a hands-on introduction to 3D desktop visualization with COVISE. On the second day, we will discuss how to build interactive 3D models for Virtual Environments and how to set up an Augmented Reality visualization.

Webpage
http://www.hlrs.de/events/

Data Analysis and Data Mining with Python

Date & Location
November 9-11, 2015, JSC, Forschungszentrum Jülich

Contents
Pandas, matplotlib, and scikit-learn make Python a powerful tool for data analysis, data mining, and visualization. All of these packages and many more can be combined with IPython to provide an interactive, extensible environment. In this course, we will explore matplotlib for visualization, pandas for time series analysis, and scikit-learn for data mining. We will use IPython to show how these and other tools can be used to facilitate interactive data analysis and exploration.

Webpage
http://www.fz-juelich.de/ias/jsc/events/python-data-mining

Industrial Services of the National HPC Centre Stuttgart (HLRS)

Date & Location
November 11, 2015, HLRS, Stuttgart

Contents
In order to permanently assure their competitiveness, enterprises and institutions are increasingly forced to deliver highest performance. Powerful computers, among the best in the world, can reliably support them in doing so. This course is targeted towards decision makers in companies that would like to learn more about the advantages of using high performance computers in their field of business. They will be given extensive information about the properties and the capabilities of the computers as well as access methods and security aspects. In addition we present our comprehensive service offering, ranging from individual consulting via training courses to visualization. Real-world examples will finally allow an interesting insight into our current activities.

Webpage
http://www.hlrs.de/events/


Introduction to the Programming and Usage of the Supercomputer Resources at Jülich

Date & Location
November 26-27, 2015, JSC, Forschungszentrum Jülich

Contents
This course gives an overview of the supercomputers at Jülich. Especially new users will learn how to program and use these systems efficiently. Topics discussed are: system architecture, usage model, compilers, tools, monitoring, MPI, OpenMP, performance optimization, mathematical software, and application software.

Webpage
http://www.fz-juelich.de/ias/jsc/events/sc-nov

Advanced Parallel Programming with MPI and OpenMP

Date & Location
November 30 - December 2, 2015, JSC, Forschungszentrum Jülich

Contents
The focus is on the programming models MPI and OpenMP. Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the basic constructs of the Message Passing Interface (MPI) and the shared memory directives of OpenMP. Course language is German. This course is organized by JSC in collaboration with HLRS. Presented by Dr. Rolf Rabenseifner, HLRS.

Webpage
http://www.fz-juelich.de/ias/jsc/events/mpi

Fortran for Scientific Computing

Date & Location
December 7-11, 2015, HLRS, Stuttgart

Contents
This course is dedicated to scientists and students who want to learn (sequential) programming of scientific applications with Fortran. The course teaches the newest Fortran standards. Hands-on sessions will allow users to immediately test and understand the language constructs.

Webpage
http://www.hlrs.de/events/

