Autumn 2012 inSiDE • Vol. 10 No. 2

Innovatives Supercomputing in Deutschland

inSiDE is published two times a year by the Gauss Centre for Supercomputing (HLRS, JSC, LRZ).

Editorial

Welcome to this new issue of inSiDE, the journal on innovative Supercomputing in Germany published by the Gauss Centre for Supercomputing (GCS). We are well into our 10th year and continue to look back a little bit. This time we have invited the Chairman of the Board of GCS, Prof. Dr. Heinz-Gerd Hegering, for an interview on High Performance Computing in Germany and Europe.

Prof. Hegering not only looks back on the history of High Performance Computing. He and the Leibniz Computing Centre (LRZ) also have good reason to celebrate and look positively towards the future. LRZ inaugurated its new system called "SuperMUC" in July. Following the GCS strategy, SuperMUC introduces a new architecture to users in Europe and Germany. Emphasizing Germany's role as a leader in European High Performance Computing, LRZ and GCS were also proud to present SuperMUC as Europe's number one system in the TOP500 list. So the spiral of performance in Germany is working as designed, and the Jülich Supercomputing Centre (JSC) and the High Performance Computing Center Stuttgart (HLRS) are both preparing for their next step.

The work of GCS is a major part of the European PRACE initiative. Results of the 4th Pan-European call for user proposals are presented in this issue. Showing off these results and getting in touch with a wider community is part of two further contributions: A report on ISC'12 shows the strong interaction of GCS with the international supercomputing community. Introducing the newly created SICOS GmbH, we show how small and medium sized enterprises can be supported in their usage of HPC systems.

Again we present a rich section on applications running on the GCS systems. Ranging from Computational Fluid Dynamics to astrophysical simulations, the eight contributions clearly show the need for compute power and the potential of HPC simulation for both basic and applied science. The growing strength of GCS in HPC research is highlighted by the seven project descriptions that show how various groups at , Jülich and Stuttgart contribute to the progress in HPC research.

GCS has made big steps over the last years. The co-operation with PRACE has extended its reach into Europe and has allowed us to contribute to European research and development substantially. GCS is committed to continuing these efforts. In the field of HPC research GCS is moving forward both in European and national large scale research programs.

With this, we look forward to a bright future for supercomputing in Germany and hope you enjoy reading.

• Prof. Dr. A. Bode (LRZ)
• Prof. Dr. Dr. Th. Lippert (JSC)
• Prof. Dr.-Ing. Dr. h.c. Dr. h.c. M. M. Resch (HLRS)

Publishers: Prof. Dr. A. Bode | Prof. Dr. Dr. Th. Lippert | Prof. Dr.-Ing. Dr. h.c. Dr. h.c. M. M. Resch
Editor: F. Rainer Klank, HLRS
Design: Antje A. Häusser, HLRS; Stefan Steinbach, HLRS; Linda Weinmann, HLRS

Contents

News

» "SuperMUC", Europe's fastest Supercomputer, inaugurated at the 50th Anniversary of LRZ
» Interview with Prof. Hegering
» PRACE: Results of the 4th Regular Call
» GCS @ ISC'12
» SICOS: HPC and more for Industry

Applications

» Simulation of Nasal Cavity Flows for Virtual Surgery Environments
» Numerical Investigation of the Flow Field about the VFE-2 Delta-Wing
» Simulating Blood Cells and Blood Flow
» Direct Numerical Simulations of turbulent Rayleigh-Bénard Convection
» Development and Validation of Thermal Simulation Models for Li-Ion Batteries in Hybrid and Pure Electric Vehicles
» Matter-Antimatter Asymmetry and the Search for fundamental Laws of Physics with JUGENE
» Theoretical Mechanochemistry: Stressing Molecules in the "Virtual Lab"
» Global MHD Simulations of protostellar Jets with radiative Effects

Projects

» LRZ Validation-Suite for Life Science
» Optimizing the Energy-to-Solution on SandyBridge Systems
» ParaPhrase − Parallel Patterns for Adaptive Heterogeneous Multicore Systems
» H4H – Hybrid for HPC
» The SIOX Project
» Performance Dynamics of Massively Parallel Codes
» PRACE – Third Implementation Phase Project started

Systems

» JUQUEEN: The most recent Blue Gene Architecture at Jülich
» Virtual Reality and Visualization at the Leibniz Supercomputing Centre

Centres

Activities

Courses

Authors

News

"SuperMUC", Europe's fastest Supercomputer, inaugurated at the 50th Anniversary of LRZ

On July 20, 2012, LRZ celebrated the inauguration of SuperMUC and its 50th anniversary. Several hundred guests attended the event together with Prof. Dr. Annette Schavan, Germany's Federal Minister for Science and Education, and Dr. Wolfgang Heubisch, Bavaria's Minister of State for Research, Science and the Arts.

"SuperMUC", LRZ's new supercomputer, is ranked no. 1 in Europe and no. 4 in the world, as announced at the International Supercomputing Conference in Hamburg on June 18, 2012, by Prof. Dr. Hans Meuer, co-founder of the Top500 list of supercomputers (www.top500.org).

SuperMUC, as described in detail in our last issue, delivers 3 PetaFlop/s peak performance to Germany's and Europe's scientists.

In 1962, the Bavarian Academy of Sciences founded a "Commission for Electronic Computing", which was later renamed to "Commission for Informatics". This Commission for Informatics established and runs the Leibniz-Rechenzentrum LRZ (Leibniz Supercomputing Centre). Starting in a residential building in Munich, it moved to a new and larger building near the famous Karlsplatz "Stachus" in the centre of Munich a few years later. In 2006, LRZ had to move again, now to a much larger facility on the campus in Garching, north of Munich, which had to be extended further already in 2009 to 2011 to its current impressive size.

While High Performance Computing was there in the beginning and is at the heart of LRZ, LRZ also is the general IT service provider for Munich's universities and serves more than 100,000 customers in the Munich area. Prof. Dr. Heinz-Gerd Hegering, past Chairman of LRZ for more than twenty years, gave an overview of the evolution of the many different services offered by LRZ at the celebration of the 50th anniversary.

LRZ always has been a pioneer in information technology, e.g., communication networks, long term archiving, High Performance Computing in general and today with a special focus on energy efficiency. Thus, SuperMUC, manufactured by IBM, is the first computer of its class that is completely cooled with hot water. LRZ is a leading expert in the field of warm water cooling of computers. The extension of the computer building has been equipped with the necessary infrastructure for this innovative and highly energy-efficient way to run computers from SuperMUC to Linux clusters.

Federal Minister Prof. Dr. Annette Schavan as well as Minister of State Dr. Wolfgang Heubisch pointed out the importance of High Performance Computing for Germany and Bavaria, which is stressed by the fact that the Federal Government of Germany and the Free State of Bavaria jointly spent more than 80 Million Euro for the present configuration of SuperMUC and about 50 Million Euro for the extensions of the buildings of LRZ. SuperMUC is now available for academia in Germany as part of the Gauss Centre for Supercomputing GCS and for many researchers in 24 member states via the Partnership for Advanced Computing in Europe, PRACE.

• Ludger Palm, Leibniz Supercomputing Centre (LRZ)

Figure 1: Prof. Dr. Arndt Bode, Chairman of LRZ, Martina Koederitz, General Manager IBM Germany, Federal Minister for Science and Education Prof. Dr. Annette Schavan and Bavaria's Minister of State for Research, Science and the Arts Dr. Wolfgang Heubisch.
Figure 2: Prof. Dr. Hans Meuer, co-founder of the Top500 list, handing the certificates to Martina Koederitz, General Manager IBM Germany, and to Prof. Dr. Arndt Bode, Chairman of LRZ.
Figure 3: Celebrating the 50th anniversary of LRZ and the inauguration of "SuperMUC" with Ministers Prof. Dr. Annette Schavan and Dr. Wolfgang Heubisch.
Figure 4: The first LRZ building.
Figure 5: The LRZ building in the centre of Munich.
Figure 6: The LRZ building after the extension 2009-2011 in Garching near Munich.
Figure 7: Prof. Dr. Heinz-Gerd Hegering, LRZ's Chairman for more than 30 years, at the ceremony.
Figure 8: Ministers Schavan and Heubisch putting their hands on the warm-water cooling of SuperMUC.

Interview with Prof. Hegering

Prof. Hegering, you have been the chairman of the board of the Gauss Centre for Supercomputing (GCS) for the last 4 years. What have been the major changes over that time?

We all are witnesses to the fact that GCS has grown into one virtual supercomputing center that is responsible for the German tier 0 and tier 1 systems of the HPC pyramid in the national as well as the European context:

- In all 3 member centers of GCS (Garching, Jülich, Stuttgart) Petaflop-systems were installed that belong to the worldwide leading systems in the TOP500 list. Being of different but complementary architectures, these systems serve a broad range of sciences. GCS now offers the largest and most powerful supercomputing infrastructure in Europe, and a vast range of industrial and research activities in various disciplines will benefit from it.

- Access to GCS's resources is granted by a common GCS review and access committee through a unified evaluation process based on purely scientific criteria.

- The GCS governance implies clear and transparent coordination processes with regards to procurements, educational programs, outreach, as well as defining focal and main thematic emphases.

- GCS is member of PRACE and has a leading role in the European HPC scenario: it provides the most extensive system base in the Petaflop-range, has the role of PMO in the PRACE projects, and held until just recently also the role of the council chairman in PRACE AISBL.

- GCS is also active in the continuation of the German national HPC concept. Last year GCS proposed a concept paper that could serve as a replacement of the "Reuter-Papier" from 2006 and served as input for the latest HPC-related recommendations of the German Wissenschaftsrat, which is the advisory committee of the German government.

The world of HPC is discussing the roadmap to Exaflops intensively. How realistic do you see the chances that we reach this target within the next 8 to 10 years?

It seems to be certain that there will be an Exascale-system within the next decade. And Germany should participate in the development of such a system. Within GCS especially JSC is active in the related field. But I personally believe that within the next 6 years such systems will rather show more experimental characteristics and be still subject to related research than serve as common production platform for a broader scientific usage in an effective and efficient way. Think of the energy consumption and the lack of suitable languages, algorithms, and tools to reach a really high sustained performance.

HPC systems are never stand-alone installations but are enmeshed in an infrastructure of networks and data handling systems. How do you see the development of this overall infrastructure when we look at Exaflops?

The answer is related to what I mentioned before. I think it is important for the overall HPC infrastructure that the HPC pyramid in Europe as well as in the various countries is somewhat well balanced throughout all tiers. For the next round of HPC installations I expect that the leading HPC platforms for real use will be HPC systems with perhaps hundreds of Petaflop/s but not yet Exascale machines.

It is also a question of how much money we should spend for an experimental system that could be detrimental to top production machines. In any case we have to invest in people to improve usage efficiency.

With the European initiative PRACE, HPC has become more international in Europe than ever before. How do you see the role of PRACE in Europe and its impact on German HPC?

The success of PRACE has been to bring the European High Performance Computing community closer together and to provide a system platform of considerably more capacity and capability. This could not have been reached without the European FP7-call in 2007. In Germany this call helped to found GCS in a very short time frame and to allocate that significant amount of funding for the project PetaGCS.

When we look at the national level we find that Germany has a well-established pyramid of performance and has organized the medium sized centers in the Gauss Alliance. How do you see the future development of HPC in Germany as a whole?

GCS has proved to be a success story that is well recognized in Germany by science and politics. If you take a look at the positive statements in the latest recommendations of the Wissenschaftsrat, it is clearly stated that HPC is of national importance and that GCS is the adequate organization to form the top of the HPC pyramid in Germany with respect to the tiers 0 and 1. The situation for the tiers 2 and 3 is more complicated due to our national constitution, since most of the tier 2 and 3 centers are under the governance of the local German states. Possible changes of our constitution are under discussion, but this is politically a very complicated field. From a pragmatic standpoint it was wise to separate concerns and to found Gauss Centre (GCS) and Gauß-Allianz (GA) as separate associations. The preconditions for a good cooperation were already designed in the statutes of both associations, since GCS is a member of GA and a board member of GCS is also member of the GA board. I think that this is the best approach to build a powerful HPC pyramid in Germany facing the different political challenges of GCS and the Gauß-Allianz. We already have a strong cooperation for joint software research and development projects. I am fully convinced that the overall approach taken so far will lead to a successful structure of the whole HPC pyramid in Germany.

After several decades in HPC, what do you personally see as the most outstanding achievement?

Especially during the last 20 years the realization that simulation besides theory and experiment is an increasingly important and indispensable method for reaching scientific findings has gained momentum. This awareness led to supporting the concept of national supercomputing centers in 1999. In 2006 the German Federal Ministry for Education and Research, BMBF, gave the inducement to found GCS as an organization that became obligatory for the realization of the top of the national HPC concept. GCS was selected to carry out the PetaGCS project and to provide the German contributions to the European PRACE project. We succeeded in achieving the leading position in Europe in less than one decade thanks to the great support of BMBF and the states of Baden-Württemberg, Bavaria, and North Rhine-Westphalia. Now there is the challenge to continue this success story and make the achievements sustainable at a high level.

Prof. Hegering, thank you for the interview.

The interview was conducted by the inSiDE team.

Prof. Heinz-Gerd Hegering was the Managing Director of the Leibniz Rechenzentrum (LRZ) at Garching until 2008. He is one of the founders of GCS and has been the chairman of the Board of Directors of GCS since May 2008.

PRACE: Results of the 4th Regular Call

The Partnership for Advanced Computing in Europe (PRACE) is offering supercomputing resources on the highest level (tier-0) to European researchers.

The Gauss Centre for Supercomputing (GCS) is currently dedicating shares of its IBM Blue Gene/P system JUGENE in Jülich (currently being upgraded to the IBM Blue Gene/Q system JUQUEEN, see also the corresponding article in this edition), of its Cray XE6 system Hermit in Stuttgart, and of its IBM iDataPlex system SuperMUC in Garching.

The 4th call for proposals closed January 10, 2012. In addition to the above mentioned GCS systems, it covered computing time on CURIE, hosted by GENCI in TGCC/CEA, Bruyères-Le-Châtel, France; MareNostrum, hosted by BSC in Barcelona, Spain; and FERMI, hosted by CINECA in Casalecchio di Reno, Italy.

Eleven research projects have been awarded a total of about 300 million compute core hours on JUGENE, four have been awarded a total of about 140 million compute core hours on Hermit, and ten have been awarded a total of about 200 million compute core hours on SuperMUC for the allocation time period May 2012 to April 2013. Six of those research projects are from Italy, five are from France, four are from Germany, two each are from Spain and the UK, while one each are from Belgium, Denmark, Finland, Ireland, Portugal, and Switzerland. The research projects awarded computing time cover all scientific areas, from Astrophysics to Medicine and Life Sciences. More details, also on the projects granted access to the machines in France, Spain, and Italy, can be found via the PRACE web pages www.prace-ri.eu/PRACE-4th-Regular-Call.

Evaluation for the 5th call for proposals that closed May 30, 2012 is still under way, as of this writing. Details on the calls, also on the current 6th call, can be found on www.prace-ri.eu/Call-Announcements.

• Walter Nadler, Jülich Supercomputing Centre (JSC)

GCS @ ISC'12

At this year's ISC, the Gauss Centre for Supercomputing came up with a novelty: For the first time in its existence, GCS presented itself on a conjoint booth uniting the three national GCS centres HLRS, JSC and LRZ on a single presentation platform. The brand new 64m² large booth attracted countless like-minded HPC researchers, technology leaders, scientists, IT-decision makers as well as high tech media representatives who sought the interchange with the directors of the three GCS centres Prof. Bode (LRZ), Prof. Lippert (JSC), and Prof. Resch (HLRS) as well as Dr. Claus Axel Müller (Managing Director, GCS) and further GCS representatives.

One of the first visitors to the GCS booth was Dr. George Liang-Thai Chiu of the IBM Thomas J. Watson Research Center. Earlier that day, Dr. Chiu had been awarded the GCS Award in recognition of his scientific paper "Blue Gene/Q: By Co-Design", submitted to the ISC'12 Research Paper Sessions.

Two GCS Systems in Top Ten of TOP500
For GCS, the undisputed highlight of this year's ISC was the fact that two GCS systems made it into the Top Ten of the latest TOP500: With a Linpack peak performance of about 3 Petaflops, LRZ's SuperMUC ranks as Europe's fastest HPC system to date and came in 4th on the TOP500 world wide listing, while JSC's new JUQUEEN (Linpack peak: 1.6 Petaflops) took position 8 in the over-all ranking. HLRS's flagship computer Hermit successfully defended its title as the world's fastest industrially used supercomputer. With a peak performance of more than 1 Petaflops, Hermit continues to be number 1 on the TOP500 Industrial Sub-List.

Four GCS Workshops, Thursday Keynote on SuperMUC
At ISC'12, GCS hosted a total of four well-attended workshops. The Directors of the three GCS centres and other GCS representatives addressed various subjects related to supercomputing and HPC challenges in the fields of science and industry, underscoring Germany's efforts to maintain a leading role in the supply and use of HPC systems and services in Europe and beyond. Session titles included: "HPC Strategies in Germany", "Supercomputing Projects at LRZ", "Simulation Laboratories at JSC: Community-Oriented Software Support and Development for HPC", and "Scalability Matters for Industrial Applications?!" The GCS workshops were complemented by Thursday's Keynote delivered by Prof. Arndt Bode (LRZ) on the subject "Extreme Energy Efficiency with SuperMUC".

GCS Booth Highlights
On the GCS booth, the three GCS centres presented their wide-ranging HPC activities. HLRS concentrated on interactive 3D visualizations of simulation results on a 2x2 3D display wall. Its highlight was the visualization of a pumped hydro power station showing large scale terrain rendering combined with visualization of the power house and the actual water turbine/pump and the visualization of water flow through the draft tube of that turbine. JSC exhibited a broad spectrum of scientific results obtained with its supercomputers JUGENE and JUROPA in presentation videos and animations. In particular, JSC showcased LLview, the comprehensive interactive monitoring software for supercomputers developed in-house, in live demonstrations on supercomputers worldwide. Apart from demonstrating simulations on its current HPC systems, LRZ had its focus once again on SuperMUC and provided details about its newly installed, highly energy-efficient 3 Petaflop/s system to the vast number of visitors and journalists.

About ISC
The ISC has seen extremely positive feedback from the scientific community and has proven to be the number one event in Supercomputing together with the US Supercomputing Conference. GCS as a main supporter of the conference and fair has contributed in a variety of activities to this success and continues to endorse ISC.

• Regina Weigand, Gauss Centre for Supercomputing (GCS)

Figure 1: GCS Award winner Dr. Chiu with Claus Axel Müller (Managing Director, GCS, left) and Prof. Michael M. Resch (GCS Board Member)
Figure 2: The brand new GCS booth at ISC'12

SICOS: HPC and more for Industry

The competitive pressure for enterprises grows continuously. Especially for, but not limited to, companies in the field of manufacturing (automotive, aerospace, machinery, etc.) the importance of optimal efficiency with respect to their products cannot be overestimated. In times of dramatically increasing energy costs the weight of products plays a key role, for example in the transportation context. Being able to decrease the weight of parts while at the same time improving their structural stability is an important advantage over the competition.

Simulation technology has been developed by scientists in the last decades into an extremely powerful tool. Large companies have taken simulation up and integrated it into their development process. Modern cars, for example, would either be less safe or much more expensive if they had not been developed using crash simulations.

However, this take-up, which is not so easy even for large companies, has not yet fully reached the small and medium sized enterprises. They are usually not able to invest in the learning time that every company that wants to use simulation technology needs before the benefits can be harvested. And even if they are successful in applying this technology, especially smaller companies find it difficult to scale their IT infrastructure to the growing needs once they want to integrate simulation into their standard development process.

This is the background that led the Karlsruhe Institute of Technology (KIT) and the University of Stuttgart to found the SICOS GmbH more than one year ago. SICOS stands for SImulation, Computing and Storage, which outlines the technology fields that SICOS will address primarily. Its main task is to facilitate the access to the resources of its founders for industry, with focus on small and medium sized enterprises (SMEs); regionally SICOS concentrates on Baden-Württemberg, but is not limited to it.

Parts of the activities of SICOS are also oriented towards the internal structures of its founders' computing centres, the HLRS in Stuttgart and the SCC in Karlsruhe. In order to be able to offer an attractive platform for industrial users, SICOS will work with the two computing centres on optimizing the internal processes to meet the needs of the customers (which in turn also helps the academic users in many aspects). Currently a common security concept is under development and the first steps towards an integrated user management between the two centres have been made.

Of course these activities are fully aligned with the hww - Höchstleistungsrechner für Wissenschaft und Wirtschaft GmbH. For more than 15 years KIT, the University of Stuttgart and the State of Baden-Württemberg have cooperated closely with industry (today: Porsche and T-Systems) in the field of High Performance Computing as partners in hww. hww is naturally integrated in the projects that SICOS pursues regarding common security and user management in order to provide a professional and integrated service to the outside.

SICOS will be developed into a one-stop-shopping institution for industrial users that are looking for a way to utilize numerical simulation or large scale data facilities for their needs. This ranges from novices, who must be led through the initial phases of the use of simulation technology, over occasional users, who just need some help now and then, up to high end users, who can use the resources of the shareholders in order to be able to improve their offering. With the power of its shareholders in the back and a widespread network of highly competent partners, SICOS will be able to identify the best solutions and support the companies on their way to apply them successfully.

www.sicos-bw.de

• Andreas Wierse, SICOS GmbH

Figure 1: Dr. Andreas Wierse, Managing Director
Figure 2: Julia Englert, Communication
Figure 3: Individual Consulting

Applications

Simulation of Nasal Cavity Flows for Virtual Surgery Environments

The anatomy and the functionality of major role in the analysis and under- the pressure and the temperature can the human respiration system is well standing of the patients complaints. be investigated with a rhinomanometry understood, whereas physical pro- In addition, the nasal cavity plays an and temperature measurements in the cesses such as the details of the gas important role in the protection of the pharynx. These aspects only scratch transport within the respiration cycle human lung. The large surface of the the surface of the knowledge, which is lack insight. The influence of the air human nasal cavity allows the moistur- required to fully understand the physics flow on physiological functions like the ization and heating of the air to attain of the human respiration and starve for sense of smell and taste is only one optimal conditions for the lower airways. methods of higher detail. example which is of great interest. These functions can be impaired by Furthermore, the impact of pathologies e.g., congenital malformations or rhino- "Rhinomodel", an interdisci- in the human nasal cavity and the lung anaplasty. The nasal cavity is also re- plinary Research Project are still not fully explained and are sponsible for filtering out particles and Computer-based methods have been the topic of intensive scientific research aerosols from the inhaled air. Diesel developed over the last decades for in the fields of medical treatment, aerosols and respirable dust particles the analysis of fluid mechanical prob- engineering and computer science. An are proofed to cause damage to the lems and are well established in air and impaired respiration capability can lead human lung and induce the growth space science. They are, however, not to a reduction of the respiratory efficacy of cancer. 
Particle deposition in the very common in medical fields due to and can also induce diminished olfac- respiration tract is a topic of intensive the high complexity of human organs tory and degustation capabilities. Such research to further understand the and the physical aspects of the respi- Figure 2: Surface extraction pipeline of the human nasal cavity surface pathologies are in general caused by impact of such pathologies. To analyze ration, i.e. the particle and gas trans- from CT-data. The CT-images are preprocessed by a sharpening filter before a segmentation takes place, which splits the cavity into volumes allergies, mal- or deformations of e.g. the functions of the nasal cavity it is port, temperature and moisturization. of tissue and air. The surface is extracted and smoothed to obtain an the nasal septum, where their impact almost impossible to do in-vivo investi- Within the scope of several research accurate geometry of the nasal cavity. on the flow field in the nose plays a gations, although some parameters like projects funded by the German Research Foundation (DFG) the Institute of Aerodynamics of RWTH Aachen from the University Hospital in Aachen University (AIA) investigates the human (UKA), rhinology experts from the respiration system experimentally with Institute of Statistics and Medical the help of Particle Image Velocimetry Informatics Cologne (IMSIE) and the Virtual methods (PIV), a laser- and photogra- Reality (VR) group of the Computing phy-based particle tracking system, Center of the RWTH Aachen University and numerically by applying Computa- (RZ RWTH). A key aspect of this research tional Fluid Dynamics methods (CFD). is the relation between the physical The flow field in the complex shape of properties of the flow and the comfort the nasal cavity and the human lung is of the patient, which is raised in a simulated on High Performance Com- patient survey. 
puters (HPC) like the Cray XE6 Hermit System at the Höchstleistungsrechen- The study cycles through an analysis zentrum (HLRS) in Stuttgart. These procedure as shown in Fig. 1, where in projects are highly interdisciplinary a first step it is required to obtain the as they also require the expertise of geometrical representation of the nasal scientists from the medical and com- cavity from Computer Tomography (CT) Figure 1: Virtual surgery environment for treatment of individual pathologies in rhinology. A surface of the nasal puter science field. Therefore, a strong images obtained from the UKA. The cavity is extracted from CT-images and is used for flow simulations to analyze the patient's complaints. A virtual cooperation exists with radiologists CT-images contain density intensities surgery is performed and the resulting nasal cavity is analyzed and evaluated until a satisfactory treatment is found and a real surgery is carried out.

16 Autumn 2012 • Vol. 10 No. 2 • inSiDE Autumn 2012 • Vol. 10 No. 2 • inSiDE 17 Applications Applications

stored in a three-dimensional mesh. The extraction process itself involves several steps and is depicted in Fig. 2. In a preprocessing step, it is sometimes required to apply an image enhancement filter to the CT-data. Blurred boundaries between tissue and air complicate the extraction of the surface and hence need to be sharpened with a convolution filter, which applies a filter mask to each voxel and its surrounding neighbors. To detect the volume of interest, a seeded region growing algorithm recursively marks voxels depending on their density value as provided by the CT. A triangular representation is then generated from the segmentation by using the Marching Cubes algorithm, which produces a water-tight surface. The segmentation of the CT-data operates on binary data and hence the resulting surface contains stairstep artifacts. The surface is then smoothed by applying filter methods such as windowed sinc function smoothing, a low-pass filter that reduces the high-frequency surface variations caused by the stairsteps. Based on this surface (see Fig. 4) the pipeline enters an optimization cycle, which begins with a flow simulation using CFD methods to obtain the raw fluid mechanical properties. These are then analyzed by surgeons from the UKA and engineers from the AIA. Based on the gained knowledge, the surgeons perform a virtual surgery in the virtual environment "CAVE" provided by the VR group of the RZ RWTH and edit the extracted surface by virtually ablating or remodeling tissue. This results in a new triangular representation of the nasal cavity, which again enters the optimization cycle to detect improvements to the patient's comfort based on the fluid mechanical properties of the flow. This cycle is repeated until a satisfactory modification of the nasal cavity is obtained, which can then be applied in a real surgery. This way, the patient's complaints can be explored and treated individually. The complete respiratory cycle can be simulated on the Cray XE6 Hermit system at HLRS within hours to days, depending on the accuracy of the solution, which brings this method within reach of real-world clinical environments.

Figure 3: Surface of the nasal cavity based on CT-image data. A sagittal slice of the CT-data shows the relative position of the extracted surface. Streamlines, colored by their velocity, visualize the flow field.

Figure 4: Surface of the nasal cavity extracted from CT-data. This patient has a warped septum and a swollen inferior turbinate on the right side.

Figure 5: Streamlines of the nasal cavity flow colored by the local flow velocity. The bending in front shows the flow characteristics in the region of the inferior turbinate. A mixing region is found in the pharynx, where the left and right cavity merge.

From Billions of Cubes to a Simulation
The flow solver ZFS (Zonal Flow Solver) for the simulation of the respiration cycle is developed at the AIA and works on Cartesian grids. This kind of computational mesh allows a fully automatic grid generation, which is a great advantage over grid generation methods producing boundary fitted meshes. Such meshes have to be set up manually and their configuration


for complicated geometries such as the human respiratory system would take several days and is hence not feasible. ZFS combines a parallel grid generator, a Finite Volume (FV) and a Lattice-Boltzmann Method (LBM) solver for the simulation of the fluid flow, Lagrangian particle tracking and post-processing functionalities. As a basis for the flow solvers, space needs to be split into small cubes to discretize the equations of fluid motion, i.e. the Navier-Stokes equations for the FV solver and the Boltzmann equation for the LBM solver. The grid generator is efficiently parallelized to be executed on thousands of processors on HPC systems, allows a resolution of billions of cells per nasal cavity and takes only seconds to generate a grid on the Cray XE6 Hermit system at HLRS. The parallelism is a major advantage over serial grid generators, which are strictly limited by the memory available on a single system and may take up to several days to generate a grid with a sufficient resolution for the simulations in the respiratory tract. Local grid refinement, as shown in Fig. 6, allows a high resolution in regions where high gradients are expected, i.e. in wall-bounded shear layers and mixing zones, and increases the accuracy of wall-shear stress calculations, which are of great importance in the analysis of nasal cavity flows. Local flow phenomena, such as shear layers, also need a high resolution for an accurate prediction. Such refinement is obtained by dynamic solution-adaptive refinement initiated during the simulation.

Figure 6: Local boundary refinement in the human nasal cavity. The mesh is refined based on the distance to the boundary. In the center, a frontal slice through the nasal cavity shows the location of the mesh detail.

For the simulation, the LBM solver is used since it is more efficient for the low Mach number flow that appears in the respiration system. In the LBM, the Lattice-BGK equation, which is the discretized BGK-equation, a simplified form of the Boltzmann equation, is solved iteratively for so-called particle probability density functions (PPDFs). In contrast to the Navier-Stokes equations, the Boltzmann equation is a statistical approach for the description of the rate of change of the number of particles in an infinitesimally small volume of fluid. The LBM solver contains two simple steps, the propagation and the collision step, and can easily be parallelized on thousands of processors. The algorithm computes the new PPDFs in each cell for a certain number of discrete directions, i.e. for 14, 18 or 26 neighboring cells in three dimensions. After this collision step the information is exchanged between neighboring cells and is again used in the computation of the new PPDFs. The LBM is valid for quasi-incompressible flows and hence only in the low Mach number regime. This leads to a decoupling of the energy equation from the Navier-Stokes equations. To additionally solve for the temperature field, a one-directionally coupled scalar transport equation is integrated, in which the convection of the scalar is controlled by the velocity. With this procedure all primitive variables, i.e. velocity, density, pressure and temperature, can be obtained in each time step in every grid cell.

The nasal Cavity, a highly complex and individual Organ
Several aspects play a major role in specifying the quality of a nasal cavity. One of the properties governing the respiration efficacy is the static pressure loss resulting from the shape of the cavity. Warped septums and swollen turbinates are typical diagnoses of genetic endowments and allergies, respectively. Within the DFG-funded project "Rhinomodel", simulations of fluid flows in the nasal cavity have shown these deformations to be responsible for an increased pressure loss in such pathologies. The formation of recirculation zones inhibiting the direct access to the pharynx is analyzed by considering the characteristics of streamlines as shown in Fig. 3 and Fig. 5. Here, the streamlines are colored by the velocity of the fluid and show strong curvatures in regions of high vorticity. This form of visualization also supports the investigation of the olfactory capability, as fluid not directed along the olfactory organ suggests a diminished sense of smell since the olfactory receptors cannot be reached. The forces acting on the nasal cavity tissue generating high pressure losses are investigated by mapping the wall-shear stresses, which are highest in regions of strong velocity gradients near the wall, onto the geometry. Fig. 7 shows an example of such a visualization, where the stresses are highest (red) in highly convergent channels and in zones of increased vorticity. To accurately compute these stresses, a high grid resolution is required, which is why local grid refinement plays a fundamental role in the estimation of the wall-shear stresses.

Figure 7: A mapping of the wall-shear stresses on the nasal cavity surface. Red regions indicate high shear stresses, which appear in convergent channels, generating higher pressure losses.

Figure 8: Simulation of the temperature distribution in the human nasal cavity. The inflow condition is set to 20°C and the wall temperature to 37°C body temperature. The inhaled air is heated up depending on the heating capability of the nasal cavity and reaches almost body temperature in the pharynx under optimal conditions.

The temperature increase is important for attaining optimal conditions for the lower airways and is considered by investigating the passive scalar transport from the nostrils to the pharynx, with a boundary condition of 20°C ambient temperature and heating by the 37°C tissue temperature through diffusion and convection. Healthy cavities show an increase of the temperature to almost body temperature while pathological cavities often exhibit a reduced heating capability. An example of the temperature distribution in a nasal cavity is shown in Fig. 8. Another aspect not yet clearly verified in the literature is whether the flow in the nasal cavity is


completely laminar. The type of flow is especially important for investigations in the lower airways, where accurate boundary conditions have to be set at the transition from the pharynx to the larynx. Recent investigations at the AIA show that the flow can indeed become transitional, depending on the local shape of the geometry and the local properties of the flow. Fig. 11 shows a strong production of vortices after the unification of the nasal cavity channels near the pharynx. Such zones are potential candidates for transitional to turbulent flow and are currently under detailed investigation.

Figure 11: The velocity distribution in the right nasal cavity shows a strong production of vorticity where two nasal cavity channels merge and the flow is directed towards the pharynx.

Demand for highly optimized Code and future Requirements
The solution algorithm for the simulation of the nasal cavity flow should produce results within a reasonable time, especially when it is used in a virtual surgery environment. Therefore, it is important to have a highly optimized code, which is implemented on HPC systems. In cooperation with computer science specialists from HLRS Stuttgart and CRAY, ZFS has recently been optimized to yield a higher performance on the High Performance Computing systems of HLRS, with a special focus on Cray XE6 Hermit. It is important to note that the emphasis is placed not only on the reduction of the computation time, but also on the reduction of the communication time. Fig. 9 shows the results of a strong scaling experiment on Cray XE6 Hermit, where for a constant problem size of 0.403 x 10^9 cells the number of processes is continuously increased from 1,024 to 8,192 and the speedup is measured. The slightly reduced speedup for 8,192 processes is due to the increase of the communication overhead. Instead of keeping the overall problem size constant, the problem size per process can be kept constant, which leads to a weak scaling, for which the results in Fig. 10 show an almost perfect scalability. For this experiment a maximum number of cells of 10^9 has been reached. Currently, the scalability to even higher processor numbers up to the complete Cray XE6 system is being investigated and first tests show very promising results. Future research at the AIA includes investigations of the aeroelastic problem which occurs during sleep apnea and also snoring. In this case an interaction of the elastic tissue with the flow field occurs, where the force exerted by the flow leads to a deformation of the tissue which in turn impacts the flow. Such problems require a prediction of unsteady flow fields coupled with a finite element solver for the tissue structure, which increases the problem size and the computational effort considerably. These simulations will therefore be ideal candidates for the HPC systems of HLRS.

Figure 9: Speedup results for a strong scaling test for a total grid size of 0.403 x 10^9 cells for the LBGK D3Q19 algorithm on the Cray XE6 Hermit system at HLRS Stuttgart.

Figure 10: Speedup results for a weak scaling test for a constant local grid size of 0.537 x 10^6 cells for the LBGK D3Q19 algorithm on the Cray XE6 Hermit system at HLRS Stuttgart.

Acknowledgments
The research for project "Rhinomodel" has been conducted under DFG research grant WE-2186/5. The parallel grid generator has been developed as part of the collaborative research center SFB 686. The financial support by the German Research Foundation (DFG) and the optimization support by the HLRS and CRAY are gratefully acknowledged.

• Andreas Lintermann
Chair of Fluid Mechanics and Institute of Aerodynamics, RWTH Aachen University
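The strong and weak scaling measurements described in this article are usually condensed into speedup and parallel efficiency figures. The following Python sketch shows one common way to compute them. The function names and all timing values are illustrative assumptions and are not measurements from the Hermit runs.

```python
def strong_scaling(procs, times):
    """Speedup and efficiency for a fixed total problem size.

    procs: process counts, smallest (baseline) run first
    times: wall-clock times for the same problem on each process count
    """
    p0, t0 = procs[0], times[0]
    rows = []
    for p, t in zip(procs, times):
        speedup = t0 / t        # measured speedup relative to the baseline run
        ideal = p / p0          # speedup a perfectly scaling code would reach
        rows.append((p, speedup, speedup / ideal))
    return rows


def weak_scaling(times):
    """Efficiency for a fixed problem size per process (1.0 is perfect)."""
    t0 = times[0]
    return [t0 / t for t in times]


# Hypothetical timings in seconds, chosen only to illustrate the arithmetic.
procs = [1024, 2048, 4096, 8192]
strong_times = [100.0, 50.0, 25.0, 15.625]
weak_times = [100.0, 100.0, 100.0, 100.0]

for p, s, e in strong_scaling(procs, strong_times):
    print(f"{p:5d} processes: speedup {s:4.2f}, efficiency {e:4.2f}")
print("weak-scaling efficiency:", weak_scaling(weak_times))
```

An efficiency below 1.0 at the largest process count, as in the made-up last entry here, is the typical signature of growing communication overhead in a strong scaling test.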

Numerical Investigation of the Flow Field about the VFE-2 Delta-Wing

The industrial application of Delta-Wings is manifold and reaches from classical aerospace engineering, e.g. highly agile aircraft, aerodynamic devices or control surfaces, to unique environmental technologies, such as devices for snow clearance. In all cases the development of leading edge vortices is exploited.

However, steadiness and stability of these leading edge vortices are essential for controllability, particularly for highly agile aircraft. It is well known that vortices can undergo a sudden expansion often related to vortex break-down [5].

The occurrence of unsteady vortex breakdown is critical for aircraft. It is physically not fully understood, thus further investigation is required. The on-going investigation is funded by the DFG project “Numerische Untersuchung der instationären Strömung um generische schlanke Deltaflügel“ (DFG-B-506/2).

The international Vortex Flow Experiment 2 (VFE-2) Delta-Wing is taken as a generic aerodynamic configuration for which small angles of attack already lead to the development of leading edge vortices.

Underlying Theory
A profound understanding of vortex formation and breakdown requires a comprehensive insight into the complete unsteady flow field. This insight can only be obtained from time-accurate simulations accompanied by experiments.

The most complete description of flows in continuum mechanics is given by the Navier-Stokes equations, which describe the exchange of momentum in the fluid considering friction.

Solving the Navier-Stokes equations requires very high spatial and temporal resolution. A Direct Numerical Simulation (DNS) is still not feasible for complex turbulent flows in industrial applications due to the tremendous computational resources required. For simulating the turbulent flow about the VFE-2 Delta-Wing several hundred CPU years at 1 Teraflops would be needed, making a simplification of the Navier-Stokes equations inevitable.

Commonly used in industry is the RANS (Reynolds-Averaged Navier-Stokes) simulation with appropriate statistical turbulence models. This simplified approach often fails to accurately predict separated and reattached flows. Also in the case of the VFE-2 Delta-Wing, results do not compare well with the existing experimental results [9].

Better results are expected from Large-Eddy Simulation (LES). In LES only the large flow structures are resolved while small, stochastic structures are modelled with the help of SubGrid Scale (SGS) models. In Implicit Large Eddy Simulation (ILES) the truncation error of the discretization of the convective terms is deliberately tailored to act as an SGS-model which is therefore implicit to the discretization. One implementation of ILES is the Adaptive Local Deconvolution Method (ALDM) [1,4] which has shown considerable potential for the efficient representation of physically complex flows in generic configurations, such as isotropic turbulence, turbulent channel flow with periodic constrictions, turbulent boundary layer separation and turbulent cylinder flow [7]. In this project the ILES approach will be used for the essential numerical investigations.

Numerical Method
The Adaptive Local Deconvolution Method (ALDM) turbulence model is incorporated in a solver for the incompressible Navier-Stokes equations with constant density. Continuity is ensured by the pressure-Poisson equation. The equations are discretized on a staggered Cartesian mesh allowing for an easy control of the truncation error and offering superior computational efficiency compared to body-fitted grids [6]. Bounding surfaces of the flow that are not aligned with the grid are accounted for by the Conservative Immersed Interface (CIIM) approach [8].

For time advancement an explicit third-order Runge-Kutta scheme is used. The pressure-Poisson equation and diffusive terms are discretized by second-order centred differences.

Further improvement of efficiency is achieved by modelling the turbulent boundary layer using a wall model [2] and by locally adapting the mesh resolution with Local Mesh Refinement, such as used in [7]. In this context new physically based criteria for the Local Mesh Refinement algorithm are applied. A further considerable reduction of the number of computational cells is achieved, rendering the simulation much more efficient.

Figure 1: Isosurface of streamwise vorticity colored by streamwise velocity for the VFE-2 Delta Wing at an angle of attack of 13° and a Reynolds number of 0.5 million.

Results
The results in this reporting period have been obtained for a Reynolds number Re = 0.5 million based on root chord length and an angle of attack of 13°. The objective is to reach a qualitative


agreement with the respective experiments of Furman and Breitsamter [3].

For this angle of attack, both the experiment [3] and the current simulation show vortex formation over half the chord length (see Fig. 1). Close to the apex the boundary layer flow accelerates over the leading edge and undergoes laminar-turbulent transition. This is also reflected by the high suction levels visible in Fig. 2. Severe pressure gradients in lateral direction provoke boundary layer separation further downstream. The separation region is indicated by the low pressure regions on the upper surface of the wing (see Fig. 2).

Figure 2: Cp distribution on the upper (top) and front (bottom) surface of the VFE-2 Delta Wing at an angle of attack of 13° and a Reynolds number of 0.5 million. Qualitative results for this specific configuration look very promising. A profound understanding is expected after results have been quantitatively validated with the experimental data. The necessary simulations are on-going.

Computational Details
For this test case an SGI-ALTIX with Itanium II processors has again been used. CIIM and the wall modelling consume 2% of the overall computational time, the flux calculation with ALDM accounts for 7% and the Poisson solver uses 83%.

Table 1 summarizes further computational details for the present and planned simulations.

AoA [°]   Re [-] in mio   LE   Ncells in mio   NCPU   CPUh in mio
13        0.5             MR   20              1024   0.9
13        1               MR   20              1024   0.9
18        1               MR   40              1024   1.2
18        1               S    40              1024   1.2
23        1               MR   40              1024   1.2
23        1               S    40              1024   1.2

Table 1: Survey of conducted simulations (black) and planned simulations (grey). (AoA = Angle of Attack, Re = Reynolds number, LE = Leading Edge, Ncells = number of cells, NCPU = number of CPUs, CPUh = number of CPU hours)

On-going Research/Outlook
Since the objective is to investigate the vortex bursting process, further investigations at angles of attack of 18° and 23° with varying leading edge geometries, i.e. MR = Medium Round and S = Sharp, are planned. As a next step an investigation is planned for actively controlling and preventing the vortex burst with leading edge devices, i.e. oscillating control surfaces.

• Michael Meyer
• Stefan Hickel
• Nikolaus A. Adams
Institute of Aerodynamics and Fluidmechanics, Technische Universität München

References
[1] Adams, N.A., Hickel, S., and Franz, S. Implicit subgrid-scale modeling by adaptive local deconvolution, J. Comput. Phys. 200 (2), 412-431, 2004
[2] Chen, Z., Devesa, A., Meyer, M., Hickel, S., Lauer, E., Stemmer, C., and Adams, N.A. Wall Modelling for Implicit Large Eddy Simulation of Favourable and Adverse Pressure Gradient Flows, Proceedings of WALLTURB Workshop, Lille, Springer, Heidelberg, 2009
[3] Furman, A., and Breitsamter, C. Experimental investigations on the VFE-2 configuration at TU Munich, Germany, RTO-TR-AVT-113, 21, 1-47, 2008
[4] Hickel, S., Adams, N.A., and Domaradzki, J.A. An adaptive local deconvolution method for implicit LES, J. Comput. Phys. 213 (1), 413-436, 2006
[5] Ludwieg, H. Zur Erklärung der Instabilität der bei angestellten Deltaflügeln auftretenden freien Wirbelkerne, Z. Flugwiss. 10 (3), 242-249, 1962
[6] Meyer, M., Hickel, S., and Adams, N.A. Computational Aspects of Implicit LES of Complex Flows, in: High Performance Computing in Science and Engineering, Garching/Munich, Springer, Heidelberg, 2010
[7] Meyer, M., Hickel, S., and Adams, N.A. Assessment of Implicit Large-Eddy Simulation with a Conservative Immersed Interface Method for Turbulent Cylinder Flow, Int. J. Heat Fluid Flow 31 (3), 368-377, 2010
[8] Meyer, M., Devesa, A., Hickel, S., Hu, X.Y., and Adams, N.A. A Conservative Immersed Interface Method for Large-Eddy Simulation of Incompressible Flows, J. Comput. Phys. 229 (118), 6300-6317, 2010
[9] Morton, S., Forsythe, J., Mitchell, A., and Hajek, D. Detached-eddy simulations and Reynolds-Averaged Navier-Stokes simulations of delta wing vortical flow field, Trans. ASME 124 (4), 924-932, 2001
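The numerical method of the delta-wing study relies on an explicit third-order Runge-Kutta scheme for time advancement. The article does not state which third-order variant is used, so the Python sketch below assumes the widely used three-stage strong-stability-preserving form (Shu-Osher) and applies it to a scalar model equation. The names `rk3_step` and `integrate` and the model problem are hypothetical and serve only as illustration.

```python
import math


def rk3_step(f, u, dt):
    """Advance du/dt = f(u) by one step of the three-stage SSP Runge-Kutta scheme."""
    u1 = u + dt * f(u)                               # stage 1: forward Euler step
    u2 = 0.75 * u + 0.25 * (u1 + dt * f(u1))         # stage 2: convex combination
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * f(u2))   # stage 3: final update


def integrate(f, u0, dt, steps):
    """Apply rk3_step repeatedly, starting from u0."""
    u = u0
    for _ in range(steps):
        u = rk3_step(f, u, dt)
    return u


def decay(u):
    """Right-hand side of the model problem du/dt = -u (exact solution exp(-t))."""
    return -u


approx = integrate(decay, 1.0, 0.01, 100)   # integrate from t = 0 to t = 1
print(approx, math.exp(-1.0))
```

Halving the time step should reduce the error at a fixed end time by roughly a factor of eight, which is the practical meaning of third-order accuracy.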

Simulating Blood Cells and Blood Flow

Blood performs a multitude of functions on its way through our body, from the transport of oxygen to the immune response after infections. In addition, the circulatory system may also be affected by injuries which cause bleeding and by the formation of plaques in arteries which cause coronary heart disease, and it provides the pathway for the invasion of the organism by bacteria or viruses. Thus, modeling of blood flow and its functions is an important challenge with many medical implications, but also with many interesting physical phenomena.

Blood is a complex fluid, which consists of blood plasma (a liquid comprised of water, small molecules and various proteins) and blood cells. There are several types of blood cells, which include red blood cells (RBCs), white blood cells (WBCs) and platelets, as shown in Fig. 1, with typical sizes of about 8μm, 10μm, and 2μm, respectively. The blood cells together take up about 45% of the blood volume. However, they appear in very different concentrations: one microliter of blood contains about 5 million RBCs, 5,000 WBCs, and 250,000 platelets. Therefore, the flow properties of blood are dominated by the highly abundant RBCs [1,2].

Figure 1: A scanning electron micrograph of blood cells. From left to right: red blood cell (red), activated platelet (yellow), and white blood cell (cyan). Source: The National Cancer Institute at Frederick (NCI-Frederick).

Mesoscopic Modeling of Blood
The numerical modeling of blood flow in the circulatory system is very challenging due to the large range of length scales involved, ranging from the nanometer size of proteins and the micrometer size of blood cells to the centimeter size of large vessels (arteries and veins) and the decimeter size of the heart. Therefore, no single simulation technique is able to cover all required length scales. On the macroscopic level, appropriate for studying blood flow in the heart and in large vessels, Computational Fluid Dynamics (CFD), which describes blood as a continuum fluid using constitutive equations for the fluid properties, has been very successful. However, constitutive equations are not always accurate. Moreover, this approach fails when the particulate nature of blood becomes important, for instance, when modeling the blood flow in small vessels (arterioles, capillaries, and venules). Therefore, mesoscopic models and simulation approaches, which describe blood as a suspension of soft, deformable particles or cells, have been developed in recent years.

Two main ingredients are necessary for the mesoscopic modeling of blood cells under flow: the membrane and the embedding fluid. The membrane of an RBC consists of a fluidic lipid bilayer, to which a biopolymer (spectrin) network is attached. This composite membrane can be described by a triangulated network of springs with bending and stretching elasticity [3,4]. For the fluid, mesoscale hydrodynamics simulation techniques such as Dissipative Particle Dynamics (DPD) [5], Multi-Particle Collision Dynamics (MPC) [6], and the Lattice Boltzmann method (LBM) have proven to be very advantageous, because these hydrodynamics techniques contain thermal fluctuations and can be easily coupled to the membrane. The mesoscale hydrodynamics approaches are also very suitable for running on massively parallel supercomputers, such as JUROPA and JUGENE at the Jülich Supercomputing Centre, due to the simplicity of parallelization of the above methods. In order to highlight the strengths of mesoscale modeling to study blood flow, we focus in the following on two recent examples: blood rheology and the margination of WBCs in microchannel flows.

Figure 2: Simulation of blood under shear flow. RBCs are shown in red and in orange, where orange color depicts the rouleaux structures formed due to aggregation interactions between RBCs. The image also displays cut RBCs with the inside drawn in cyan to illustrate RBC shape and deformability.

Blood Rheology
Rheology is the science of the behavior and properties of fluids under flow. The aim of modern rheology is the prediction of the macroscopic flow properties, such as the fluid viscosity, from the sizes, deformability, and interactions of the constituent molecules and particles. For complex fluids, the rheological properties can be very rich. A characteristic feature is that under shear flow, the fluid viscosity may not be constant, but depend on the shear rate. Such fluids are called “non-Newtonian”. Blood is non-Newtonian and exhibits shear-thinning, i.e. with increasing shear rate, the viscosity of blood decreases.
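Shear-thinning can be illustrated with a simple constitutive law. The Python sketch below uses the power-law (Ostwald-de Waele) model, which is a generic textbook choice and not the model used in the simulations described in this article. The consistency index `k` and the exponent `n` are illustrative values, not fitted blood parameters.

```python
def power_law_viscosity(shear_rate, k=1.0, n=0.7):
    """Apparent viscosity eta = k * shear_rate**(n - 1).

    n < 1 gives shear-thinning, n = 1 a Newtonian fluid, n > 1 shear-thickening.
    """
    if shear_rate <= 0.0:
        raise ValueError("shear rate must be positive")
    return k * shear_rate ** (n - 1.0)


shear_rates = [0.1, 1.0, 10.0, 100.0]   # in 1/s
viscosities = [power_law_viscosity(g) for g in shear_rates]
for g, eta in zip(shear_rates, viscosities):
    print(f"shear rate {g:6.1f} 1/s -> viscosity {eta:.3f}")
```

With `n < 1` the printed viscosity falls monotonically as the shear rate grows, which is the qualitative behavior the text attributes to blood.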


In order to predict the shear-thinning behavior, we study RBC suspensions in shear flow illustrated in Fig. 2, which are characterized by the RBC volume fraction $H_t$, called the “hematocrit”, and the shear rate $\dot{\gamma}$. The simulation results for the relative viscosity $\eta/\eta_s$ (where $\eta_s$ is the plasma viscosity without cells) are displayed in Fig. 3. The two curves in Fig. 3 are for whole blood, for which proteins in the blood plasma (e.g., fibrinogen) mediate attractive interactions which at low shear rates lead to the formation of cylindrical stacks of RBCs called “rouleaux”, and for RBCs in Ringer solution, where these proteins have been removed and the interactions are purely repulsive. In both cases, blood is found to be a shear-thinning fluid, an important property to reduce the load on the heart. However, the shear thinning is much more pronounced for whole blood due to the formation of rouleaux structures at low shear rates, which break up into smaller aggregates and finally single cells with increasing shear rates. Above a shear rate of approximately 5 s^-1, the viscosity curves for whole blood and Ringer solution merge, and therefore we can conclude that rouleaux are absent at higher shear rates. In Fig. 3, the comparison of simulation and experimental results shows excellent agreement.

Figure 3: Non-Newtonian relative viscosity (the cell suspension viscosity normalized by plasma viscosity) of whole blood as a function of shear rate at $H_t = 0.45$ and 37°C. Experimental data are shown for whole blood (green crosses, black circles, and black squares) and for non-aggregating RBC suspension (red circles and red squares). Simulation results [7] are shown for both cases with black dashed and solid lines, respectively.

What can we now learn from simulations that is not already known from experiments? First, the good agreement between simulations and experiments for the viscosity of whole blood is obtained by tuning the strength of the aggregation interactions between RBCs. Thus, we are able to predict the attractive force between RBCs to be in the range of 3 to 7 piconewtons, 10 million times smaller than the weight of a mosquito! This force is below the resolution of current experimental techniques. Second, Fig. 3 shows a second shear-thinning regime above a shear rate of about 5 s^-1, where aggregation interactions are not relevant. This regime is due to deformations and orientations of RBCs, as revealed by a more detailed analysis of the simulated RBC shapes. In the simulations, the shapes are characterized by the eigenvalues $\lambda_j$ of the gyration tensor, from which the asphericity $\alpha = \sum_{i<j} (\lambda_i - \lambda_j)^2 / \left(2 \left(\sum_i \lambda_i\right)^2\right)$ can be calculated (which yields $\alpha = 0$ for a sphere and $\alpha = 1$ for a long and thin rod). The simulation results for the asphericity distributions, displayed in Fig. 4, show that at low shear rates all RBCs maintain their equilibrium discocyte shape; at higher shear rates, in the second shear-thinning regime, the distribution broadens and shifts to smaller values of $\alpha$, indicating strongly deformed and more rounded RBCs; finally, at very high shear rates, the distribution shifts to large values of $\alpha$, indicating prolate-shaped RBCs, which are elongated and oriented by the flow. Thus, the simulations provide new insights into the deformation and dynamics of single cells, but also their correlations and interactions, under flow.

Figure 4: RBC asphericity distributions to describe cell deformations through the deviation from a spherical shape. The asphericity for an RBC in equilibrium is $\alpha = 0.154$.

White Blood Cell Margination
WBCs in our organism take part in the defense against various infections. Experimental observations revealed an interesting behavior of WBCs in blood flow: they migrate towards the vessel walls, a process called “margination”. This phenomenon appears to be biomedically very important, since WBCs need to firmly adhere to vessel walls to perform their function, and thus they first have to closely approach or touch the walls, which is facilitated by their margination.

WBC margination is governed by hydrodynamic interactions of cells with the vessel walls and with each other in blood flow. In fact, RBCs play a major role in WBC margination, since RBCs, due to their discoidal shape and deformability, experience a stronger hydrodynamic lift force [8] away from the walls than WBCs, which are nearly spherical in shape and resistant to deformations. As a consequence, RBCs migrate to the center of a vessel and effectively push out WBCs to the vessel walls.

WBC margination strongly depends on blood cell deformability, hematocrit $H_t$, local flow rate, and RBC aggregation. Therefore, numerical modeling of blood flow provides an excellent opportunity to study WBC margination in detail for various flow and cell properties. To explore the effect of many parameters on WBC margination, we employ a two-dimensional (2D) blood flow model shown in Fig. 5, which significantly reduces computational cost but still captures the essential physics. In 2D, the cells are modeled using a bead-spring representation illustrated in Fig. 5 [9]. The flow rate in the channel is characterized by a dimensionless shear rate $\dot{\gamma}^* = \bar{\dot{\gamma}}\,\tau$, where $\bar{\dot{\gamma}}$ is the average shear rate and $\tau$ is a characteristic RBC relaxation time.

Fig. 6 shows a WBC margination diagram for various hematocrits and flow rates. The margination probability is defined as the probability for a WBC to be closer than 500nm to the channel walls. Clearly, efficient WBC margination occurs only within an intermediate range of hematocrits $H_t = 0.2$ to $0.5$ and flow rates $\dot{\gamma}^* = 1$ to $10$.
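The asphericity measure and the margination probability defined above can be sketched in a few lines of Python. The function names, the sample eigenvalues and the sample distances are hypothetical. The 500 nm threshold is taken from the text, while measuring the distance from the WBC to the nearest wall as a single number per sample is an assumption about how the simulation output is organized.

```python
def asphericity(l1, l2, l3):
    """Asphericity from the three eigenvalues of the gyration tensor.

    Sums the squared differences over the three distinct eigenvalue pairs
    and normalizes by twice the squared trace.
    """
    pair_sum = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    return pair_sum / (2.0 * (l1 + l2 + l3) ** 2)


def margination_probability(wall_distances_nm, threshold_nm=500.0):
    """Fraction of samples in which the WBC is closer to a wall than the threshold."""
    if not wall_distances_nm:
        raise ValueError("need at least one sample")
    close = sum(1 for d in wall_distances_nm if d < threshold_nm)
    return close / len(wall_distances_nm)


print(asphericity(1.0, 1.0, 1.0))   # sphere: all eigenvalues equal
print(asphericity(1.0, 0.0, 0.0))   # idealized thin rod: one non-zero eigenvalue

# Hypothetical WBC-to-wall distances (nm) sampled over a simulation.
samples = [300.0, 450.0, 800.0, 1200.0]
print(margination_probability(samples))
```

The two limiting cases reproduce the values quoted in the text for a sphere and a thin rod, which is a quick consistency check on the normalization of the formula.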


The simulation results are consistent with experimental observations that WBC adhesion is found mainly in venules (but not in arterioles) in the organism. The characteristic values of $\dot\gamma^*$ in venules are low enough, while in arterioles $\dot\gamma^* \gtrsim 30$ [2]. Thus, efficient WBC margination and consequent adhesion are mainly expected in the venular part of microcirculation. Our simulations also showed that RBC aggregation enhances WBC margination, while WBC deformability attenuates this effect [9].

Figure 5: Simulation snapshot of the flow (from left to right) at Ht = 0.35 and dimensionless shear rate $\dot\gamma^* = 1.59$. The large quasi-spherical particle is the WBC.

Figure 6: Probability diagrams of WBC margination with respect to $\dot\gamma^*$ and Ht. Symbols (o) indicate the values of Ht and $\dot\gamma^*$ for which simulations were performed.

Conclusions
The above models and results demonstrate a promising role of numerical modeling to understand and predict blood flow and its related processes. The potential of these models extends not only to RBC suspensions under various conditions, but also to modeling of other cell and capsule suspensions, blood flow in diseases (e.g., malaria, sickle-cell anemia), biomedical applications (e.g., lab-on-chip, microfluidics), etc. The main advantages of accurate modeling of such suspensions in comparison to experimental tests are robustness and low cost of numerical simulations. The future growth of supercomputing resources and further advances in cell modeling will make blood simulations a versatile and widely available tool in biophysical and biomedical research and applications.

Acknowledgement
The simulations reported in this article have been performed on the supercomputers JUROPA and JUGENE at the Jülich Supercomputing Centre. We thank FZJ and GCS for generous computing time grants.

References
[1] Pries, A.R., Secomb, T.W., and Gaehtgens, P. Biophysical aspects of blood flow in the microvasculature, Cardiovasc. Res., 32:654–667, 1996
[2] Popel, A.S., and Johnson, P.C. Microcirculation and hemorheology, Annu. Rev. of Fluid Mech., 37:43–69, 2005
[3] Noguchi, H., and Gompper, G. Shape transitions of fluid vesicles and red blood cells in capillary flows, Proc. Natl. Acad. Sci. USA, 102:14159–14164, 2005
[4] Fedosov, D.A., Caswell, B., and Karniadakis, G.E. A multiscale red blood cell model with accurate mechanics, rheology, and dynamics, Biophys. J., 98:2215–2225, 2010
[5] Pivkin, I.V., Caswell, B., and Karniadakis, G.E. Dissipative particle dynamics, in: K.B. Lipkowitz, editor, Reviews in Computational Chemistry, volume 27, pages 85–110, John Wiley & Sons, Inc., Hoboken, NJ, USA, 2010
[6] Gompper, G., Ihle, T., Kroll, D.M., and Winkler, R.G. Multi-particle collision dynamics: a particle-based mesoscale simulation approach to the hydrodynamics of complex fluids, Adv. Polym. Sci., 221:1–87, 2009
[7] Fedosov, D.A., Pan, W., Caswell, B., Gompper, G., and Karniadakis, G.E. Predicting human blood viscosity in silico, Proc. Natl. Acad. Sci. USA, 108:11772–11777, 2011
[8] Messlinger, S., Schmidt, B., Noguchi, H., and Gompper, G. Dynamical regimes and hydrodynamic lift of viscous vesicles under shear, Phys. Rev. E, 80:011901, 2009
[9] Fedosov, D.A., Fornleitner, J., and Gompper, G. Margination of white blood cells in micro-capillary flow, Phys. Rev. Lett., 108:028104, 2012

• Dmitry A. Fedosov*
• Gerhard Gompper

Theoretical Soft Matter and Biophysics, Institute of Complex Systems and Institute for Advanced Simulation, Forschungszentrum Jülich

* Dmitry Fedosov has recently received a Sofja Kovalevskaja Award from the Alexander von Humboldt Foundation for numerical studies of blood flow in health and disease.

Direct Numerical Simulations of turbulent Rayleigh-Bénard Convection

The large structures occurring in the fluid flow on the Sun's surface, in the atmosphere and oceans of planets, including our Earth, are primarily driven by convection. Their actual shape, but also the efficiency of the heat transport, is however significantly influenced by the Coriolis force due to rotation. Crystal growth and the ventilation of buildings and aircraft originate within the same physical framework. Understanding these fundamental processes is thus not only utterly important for geo- and astrophysics, but also in industry.

This is where the idealization, the so-called Rayleigh-Bénard convection, comes into play: the above-mentioned highly complex phenomena are simplified to a fluid heated from below and cooled from above, with solely gravity, and hence buoyancy, acting on it.

In most of the theoretical and numerical investigations of natural convection the Oberbeck-Boussinesq (OB) approximation is considered. This means that all physical properties of the fluid are assumed to be independent of temperature and pressure, except the density in the buoyancy term, which is considered linearly dependent on the temperature. These assumptions are evidently never fulfilled in reality, and the deviations of the flow characteristics due to their violation are called non-Oberbeck-Boussinesq (NOB) effects.

Thus the objective of our studies is to investigate the influence of rotation and NOB effects on turbulent thermal convection, also, but not exclusively, at very high Rayleigh numbers. We address these issues by means of Direct Numerical Simulations (DNS).

Numerical Method
We performed highly resolved direct numerical simulations of Rayleigh-Bénard convection making use of the well-tested fourth order finite volume code FLOWSI [2]. The code solves the incompressible Navier-Stokes equations and has shown an almost perfect scalability on the HLRB II cluster (Fig. 1).

Figure 1: Scaling behaviour of the FLOWSI code (pink diamonds) on HLRB II. The perfect scalability of an ideal code is shown as well (black dashed line).

For the purpose of investigating NOB effects, the code has been advanced by taking temperature-dependent material properties into account. Furthermore, we incorporated a module to model the effects of rotation.

The mesh size was chosen in a way to fulfil the criterion by Shishkina et al. [5], which guarantees the resolution of the smallest relevant scales, i.e. the Kolmogorov and the Batchelor scale.

So far, our simulations focussed on two fluids, water (Pr = 4.38) and glycerol (Pr = 2547.9). Their material properties were adopted from experimental data by Ahlers et al. [1]. The parameter ranges for our performed NOB simulations are given in Fig. 2, together with the estimation of the validity range of the OB approximation according to Gray & Giorgini [2]. In sum, we covered the range of Rayleigh numbers $10^5 \le Ra \le 10^9$ and NOB conditions up to a temperature difference $\Delta$ between the top and bottom plate of 80 K.

Figure 2: Validity range of the OB approximation according to Gray & Giorgini [2] for water (top) and glycerol (bottom). The pink stars mark the parameters for our NOB simulations.

Self-evidently, we performed the corresponding OB simulations for the purpose of comparison as well. Furthermore, we also conducted DNS for different rotation rates under OB and NOB conditions.

However, not only the required resolution but also the necessary computation time to obtain reliable statistics, that is at least several months, pushes us to the limits of today's supercomputers.

Results
Our NOB simulations revealed that the temperature dependencies of the material properties are able to significantly influence the global flow structures. In general they lead to a breakdown of the top-bottom symmetry typical for OB simulations. The NOB effects that we observe include, but are not limited to, different thermal and viscous boundary layers, asymmetric plume dynamics and an increase of the bulk temperature $T_c$ (Fig. 3, 4). In glycerol for the NOB case $\Delta = 80$ K we obtain a $T_c$ that is 15 K higher than the arithmetic mean temperature between the plates, whereas in contrast, in the case of water the observed increase of $T_c$ is only about 5 K.

Nevertheless, the Nusselt number Nu and the Reynolds number Re and their scaling with Ra show only a slight deviation from the OB case.

In rotating convection another physical phenomenon plays an important role: the formation of columnar vortices, as seen in Fig. 3. These so-called Ekman vortices are able to extract hot and cold fluid from the bottom and top boundary layers, respectively, and thereby increase the heat transport. They are also responsible for a persisting mean temperature gradient in the bulk. Their size, and related to that their efficiency, is determined by the heat diffusivity and viscosity, thus they are sensitive to NOB effects. Indeed, while $T_c$ is the same as in the non-rotating case, the absolute value of the temperature gradient is diminished. In line with this, the enhancement of Nu under rotation is stronger under NOB than under OB conditions.

Figure 4: Instantaneous temperature isosurfaces of glycerol (Pr = 2547.9, Ra = $10^9$) under OB (left) and NOB conditions (right).

On-going Research/Outlook
A next step in studying NOB effects will be the inclusion of compressible convective flows, i.e. almost all gases, where the Low Mach Number (LMN) approximation will be used, which generally admits the working fluid to be a non-perfect gas. And since in an astronomical and geophysical context Ra is typically rather in the order of $10^{20}$, we plan to go to higher Ra, which is of course computationally more expensive, because of the mesh size and the required time for sufficient statistics. Additionally, we plan to test existing and, as appropriate, develop subgrid scale models and perform large-eddy simulations (LES).

Acknowledgments
All simulations were performed at the Leibniz Supercomputing Centre on the national supercomputer HLRB II and on the HPC Cluster SCART (DLR Göttingen). The authors acknowledge support by the Deutsche Forschungsgemeinschaft under grant SH405/2-1 and would also like to thank the staff of the LRZ for their continuous support.

References
[1] Ahlers, G., Brown, E., Fontenele Araujo, F., Funfschilling, D., Grossmann, S., Lohse, D. J. Fluid Mech., 2006
[2] Gray, D.D., Giorgini, A. Int. J. Heat Mass Transfer 19, 1976
[3] Horn, S., Shishkina, O., Wagner, C. J. Phys.: Conf. Ser. 318, 082005, 2011
[4] Shishkina, O., Wagner, C. C. R. Mecanique 333, 2005
[5] Shishkina, O., Stevens, R.J.A.M., Grossmann, S., Lohse, D. New J. Physics 12, 2010

Links
http://scart.dlr.de/site/research-projects/non-oberbeck-boussinesq-effects-in-natural-convection/
http://scart.dlr.de/site/research-projects/highly-resolved-numerical-simulations-of-turbulent-rayleigh-benard-convection/
http://scart.dlr.de/site/research-projects/rotating-turbulent-rayleigh-benard-convection/

• Susanne Horn
• Matthias Kaczorowski
• Olga Shishkina

Institute of Aerodynamics and Flow Technology, German Aerospace Center (DLR), Göttingen

Figure 3: Instantaneous temperature isosurfaces of water (Pr = 4.38, Ra = $10^8$; magenta indicates the warm, cyan the cold fluid). Shown are the non-rotating OB (upper left) and NOB case (upper right), as well as a moderately rotating OB (lower left) and NOB case (lower right).
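The control parameters quoted in the convection article above (Pr, Ra, and the shift of the bulk temperature relative to the arithmetic mean of the plate temperatures) can be made concrete with a short sketch. The article does not spell out the definitions, so the standard textbook forms $Pr = \nu/\kappa$ and $Ra = \alpha g \Delta H^3/(\nu\kappa)$ are assumed here; all function names and numerical inputs are hypothetical round numbers, not the material data used in the simulations.

```python
def prandtl_number(nu, kappa):
    """Pr = kinematic viscosity / thermal diffusivity (standard definition)."""
    return nu / kappa


def rayleigh_number(alpha, g, delta_t, height, nu, kappa):
    """Ra = alpha * g * delta_t * height^3 / (nu * kappa) (standard definition)."""
    return alpha * g * delta_t * height ** 3 / (nu * kappa)


def bulk_temperature_shift(t_bottom, t_top, t_center):
    """Deviation of the bulk temperature from the arithmetic mean of the plates.

    Under OB conditions this shift is zero by symmetry; a non-zero value
    is one of the NOB signatures described in the text.
    """
    return t_center - 0.5 * (t_bottom + t_top)


# Hypothetical inputs in SI units, chosen only to exercise the formulas
pr = prandtl_number(nu=1.0e-6, kappa=1.0e-7)
ra = rayleigh_number(alpha=2.0e-4, g=9.81, delta_t=10.0, height=0.1,
                     nu=1.0e-6, kappa=1.0e-7)

# Plates 80 K apart with a bulk temperature 15 K above their mean,
# mirroring the size of the glycerol NOB shift reported in the text
shift = bulk_temperature_shift(t_bottom=100.0, t_top=20.0, t_center=75.0)
```

Because Ra grows with the cube of the cell height, reaching the very high Rayleigh numbers mentioned in the outlook is mainly a question of resolving ever thinner boundary layers, which is what drives the mesh size.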

Development and Validation of Thermal Simulation Models for Li-Ion Batteries in Hybrid and Pure Electric Vehicles

For the design and integration of alternative propulsion components such as Li-ion battery modules for electrified vehicles, efficient and well suited simulation tools are needed [1]. The tools are needed in the scope of the CAE-determined virtual development process to assure performance, durability, reliability and cost-related requirements for the battery module on vehicle, module and cell level. The simulation level determines the degree of geometric complexity covered by the CAE model. As an example, a cell level CAE model of a Li-ion pouch cell covers the entire electrochemically active area. Thus, modeling strategies are needed that combine traditional, well established, three-dimensional multi-physics CAE tools with electrochemical models as demonstrated in [2]. These modeling strategies should allow the investigation of requirement-related questions with different physical depth aligned with the respective simulation levels of the virtual development process. Since these modeling strategies are considered as pre-competitive by the OEM consortium of the asc(s [3], the necessity of strategies has led to several intensive discussions and development activities.

The prediction of the electro-thermal behavior of a single Li-ion battery cell under different electric loads can be covered by various modeling approaches described in literature. One group of models, called physics-based models [4], uses the porous-electrode and concentrated solution theory from Newman [5,6], Ohm’s law and mass as well as energy balances. Thus, this group of models is computationally intensive but covers the highest depth of physical understanding. Another group of models [7], called semi-empirical models, essentially replaces the porous-electrode and concentrated solution theory by the semi-empirical correlation from Newman and Tiedemann [8,9,10] to describe the current density as a function of the potential difference between the positive and negative electrodes and the current collectors of the battery cell as a function of depth of discharge. Thereby, this modeling group becomes less computationally intensive, is still on cell level but loses some degree of physical capability in the porous electrodes. The third group of models [11], called equivalent circuit models, describes the current-voltage behavior of a battery cell by an equivalent system, an electric circuit, which is the largest physical simplification in comparison to both physics-based and semi-empirical models. Thereby, this group of models takes the least computational effort and can also be effectively coupled with multi-physical CFD tools.

Figure 1: Coupling of electrical network model and thermal finite volume model.

The project “Development and Validation of Thermal Simulation Models for Li-Ion Batteries in Hybrid and Pure Electric Vehicles” is a cooperation of Adam Opel AG, asc(s, Battery Design LLC, CD-adapco, Daimler AG, Dr.-Ing. h.c. F. Porsche AG, as well as the Behr group as a subcontractor. It concentrates on the development of a simulation strategy on battery module level and introduces a coupling concept between an equivalent circuit cell model and a multi-physics CFD code (Fig. 1). The overall model can be used to investigate heat and energy management and to optimize the layout of the battery cooling system. Knowledge about the temperature and charge distribution between cells within the battery module is crucial, since potential inhomogeneities can have a negative impact on battery life.

In order to develop a best practice simulation strategy, theoretical thoughts as well as experimental results have been applied on the basis of several kinds of sub models. Module parts such as a two cell model, a single cooling plate or the inside air volume have been studied extensively, e.g. to find the optimal thermal and electrical mesh resolution of the stack in in- and through-plane direction, or to examine the impact of natural convection and radiation effects on the cell temperature.

Figure 2: Investigations at different kinds of sub models: two cell model including conduction part between and inside air volume model (all pictures with courtesy of the Behr group).

The entire lithium ion battery module consists of approx. 16 million mesh cells and includes multiple high power pouch cells, a cooling circuit and the housing. In order to verify the developed concept for electrothermal battery simulation and evaluate the quality of the simulation model and numerical set up on cell and pack level, several verification and validation tests have been performed. Fig. 3 demonstrates the interaction between the battery cells and cooling plates for an electrothermal test case. The battery cells are electrically cycled while the coolant plates switch on and off according to defined temperature ranges for the battery cells.

Figure 3: Transient behavior of a simplified battery module. Temperature distribution in the battery cells and cooling plates at different times.

The battery simulations were performed at the NEC Nehalem cluster as well as the CRAY XE6 Hermit. Simulating an electrothermal test case that lasts 2-3 hours for the full battery pack quickly takes a week of computing time. While the CFD part of the solution is fully parallelized, the electrical calculations work in serial for each battery cell. Thus only a few nodes or cores (related to the number of battery cells in the module) are needed to perform battery simulations, but each of those cores requires 3-4 GB of RAM. A common battery simulation within the scope of the project allocates e.g. four 64 GB nodes at the Cray XE6 Hermit but uses only every second core.

References
[1] Fell, S. CAE Methods and Test Requirements for Battery System Design, 3. VDI-Fachkongress Elektromobilität, 2011
[2] Damblanc, G., Hartridge, S., Spotnitz, R., Imachi, K. Validation of a new simulation tool for the analysis of electrochemical and thermal performance of lithium ion batteries, JSAE Annual Congress, 2011
[3] Automotive Simulation Center Stuttgart e.V. http://www.asc-s.de
[4] Doyle, M., Newman, J. The use of mathematical modeling in the design of lithium/polymer battery systems, Electrochimica Acta, Volume 40, Issues 13–14, October 1995, pp. 2191–2196
[5] Newman, J.S. and Tobias, C.W. Theoretical Analysis of Current Distribution in Porous Electrodes, J. Electrochem. Soc., Volume 109, Issue 12, pp. 1183–1191, 1962
[6] Newman, J.S. Electrochemical Systems, Prentice-Hall, Englewood Cliffs, New Jersey, 1991
[7] Kim, U.S., Shin, C.B., Kim, C.S. Modeling for the scale-up of a lithium-ion polymer battery, Journal of Power Sources 189 (2009) 841–846
[8] Tiedemann, W., Newman, J. in: Gross, S. (Ed.), The Electrochemical Society, Princeton, NJ, 1979, pp. 39–49
[9] Newman, J., Tiedemann, W. J. Electrochem. Soc. 140 (1993) 1961–1968
[10] Gu, H. J. Electrochem. Soc. 130 (1983) 1459–1464
[11] USABC Linearized Battery Model, FreedomCAR Battery Test Manual for Power-Assist Hybrid Electric Vehicles, October 2003

• Jenny Kremser et al.

Adam Opel AG
asc(s
Battery Design LLC
CD-adapco
Daimler AG
Porsche AG

contact: Jenny Kremser, [email protected]
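To make the idea of an equivalent circuit cell model feeding a thermal solver more tangible, the following is a minimal sketch of a single-cell model with one series resistance and one RC branch. The article does not state the circuit topology or any parameter values used in the project, so the topology, the function name and every number below are hypothetical; the heat term is the usual irreversible loss, current times the difference between open-circuit and terminal voltage, which is the kind of source a thermal finite volume model could receive.

```python
import math


def simulate_cell(current_a, dt_s, n_steps,
                  ocv_v=3.7, r0_ohm=0.002, r1_ohm=0.001, c1_f=20000.0):
    """Constant-current response of a one-RC equivalent circuit cell.

    Discharge current is positive. All default parameters are hypothetical.
    Returns a list of (terminal_voltage_V, heat_W) tuples, one per time step.
    """
    v_rc = 0.0                                   # voltage across the RC branch
    decay = math.exp(-dt_s / (r1_ohm * c1_f))    # exact decay over one step
    history = []
    for _ in range(n_steps):
        # Exact update of the RC branch for a current held constant over dt
        v_rc = v_rc * decay + r1_ohm * current_a * (1.0 - decay)
        v_term = ocv_v - current_a * r0_ohm - v_rc
        # Irreversible heat generation that a thermal model would use as source
        heat_w = current_a * (ocv_v - v_term)
        history.append((v_term, heat_w))
    return history


# 10 A discharge for 1000 s with 1 s steps (RC time constant is 20 s)
trace = simulate_cell(current_a=10.0, dt_s=1.0, n_steps=1000)
final_voltage, final_heat = trace[-1]
```

Each cell is advanced independently of the others in such a model, which is consistent with the remark above that the electrical part runs serially per battery cell while the CFD part is parallelized.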


Matter-Antimatter Asymmetry and the Search for fundamental Laws of Physics with JUGENE

The Matter-Antimatter Asymmetry
Our visible universe today consists of matter with only some tiny trace amounts of antimatter, e.g. in cosmic rays. Although this statement seems to be rather trivial given our everyday experience, it has some far-reaching consequences for the origin of our universe. Unless we are willing to accept that the universe started out in an asymmetric state, with more matter than antimatter, some mechanism has to exist that gives preference to the production of matter over antimatter in the course of evolution of our universe.

In 1967 Sakharov listed the necessary ingredients for such a mechanism. Obviously one ingredient is a process that preferentially transforms matter into antimatter, and indeed such processes have been found in nature. The most prominent of these processes (see Fig. 1) is the so-called “kaon mixing”, where a neutral particle (the kaon or $K^0$) is transformed into its antiparticle (the anti-kaon or $\bar{K}^0$). The reverse process, the transformation of an anti-kaon $\bar{K}^0$ into a kaon $K^0$, does also exist, but it progresses at a slightly different rate, thus leaving a slight imbalance between matter and antimatter.

Figure 1: An illustration of neutral kaon mixing. Both the kaon and its antiparticle contain one quark and one antiquark bound together by the strong nuclear force and they can transform into one another (grey).

The obvious question now is whether this slight imbalance is able to explain the observed matter dominance of the universe. This question, however, is not so straightforward to answer because the particles involved, the kaons and anti-kaons, are not simple elementary particles. They are what we call mesons, bound states of one quark and one antiquark held together by the strong nuclear force. Because the effects of the strong nuclear force are very difficult to account for precisely, it is not straightforward to deduce from the properties of the (anti-)kaons the fundamental properties of their constituent (anti-)quarks that are ultimately relevant for explaining the observed matter-antimatter asymmetry.

Broadly speaking, there are two possible scenarios: either the experimentally observed neutral kaon mixing can be explained by the known properties of quarks within the current standard model of particle physics, the so-called CKM (Cabibbo-Kobayashi-Maskawa) theory, or not. In the first case one would get a nontrivial confirmation of the standard model and a hint that the matter-antimatter asymmetry has its origin at a much higher energy scale, because the standard model alone is known not to provide enough of it. In the second case one would obtain clear evidence for the existence of “new physics”: new particles, forces, or mechanisms (e.g. extra dimensions) that go beyond the established standard model. In this scenario the “new physics” would provide some part of the experimentally observed kaon mixing and a potential answer for the matter-antimatter asymmetry in the universe.

QCD
In order to clearly distinguish between these two scenarios, the properties of kaons and their constituent quarks have to be related numerically in a precise way. This is not a straightforward task because the quarks that form a kaon are bound by the strong nuclear force that we can describe with a theory called quantum chromodynamics (QCD). QCD, however, does not allow for the existence of single, free quarks; quarks have to occur in bound states only. This essential feature of QCD, called quark confinement, is caused by the specific “color” interaction between quarks. In addition to the electrical charge, quarks do carry a “color” charge. This “color” has nothing to do with real color; it was labeled “color” because of the peculiar mixing behavior of the fundamental charges that is reminiscent of the mixing between the primary colors red, green, and blue.

While an electrical charge can only be positive or negative and have a different magnitude, color charge has three components that we may label red, green, and blue. This has surprising and far reaching consequences: let us first imagine placing two electrical charges of the same magnitude but opposite sign into otherwise empty space (see Fig. 2). The electrical field, or its field quanta, the photons, can spread out freely into space. This is essentially due to the fact that the photons themselves are electrically neutral.

Figure 2: Field lines of a pair of opposite electric charges.

We now turn to the color field between two opposite color charges or quarks. The field quanta of the color interaction, the gluons, must relate the different color components and therefore cannot all be color neutral. As a consequence, the field lines themselves are color charged and do interact with each other. This results in a squeezing of the color flux tube between the two opposite charges (see Fig. 3a). Trying to pull the charges apart will result in the flux tube containing more and more energy. At a certain point there is enough energy in the flux tube to pair-produce a quark-antiquark pair out of the vacuum (see Fig. 3b). Instead of having split up the quark-antiquark bound state, or meson, into its constituents, we end up with two mesons eventually.

Figure 3: (a) The color flux lines between a pair of opposite color charges. (b) When separating the two charges, the flux tube gets elongated and ultimately breaks up by producing a quark-antiquark pair out of the vacuum.

Lattice QCD
It is now evident that QCD (in the low energy regime that we are interested in here) is very hard to treat numerically: the fundamental degrees of freedom of the theory (quarks, gluons) are very different from the states that occur in nature (bound states like mesons or protons and neutrons). Usual perturbative techniques that deal with small corrections to the free fundamental degrees of freedom are evidently insufficient.

A more direct way of approaching this problem is lattice QCD. One discretizes the QCD equations on a space-time lattice and then proceeds to perform the quantum mechanical path integration stochastically. This approach is of course extremely demanding numerically. The integration has to be performed on a space with a dimension given by the number of lattice points times the internal degrees of freedom. Since the lattice has to be both fine enough to reproduce all essential details and big enough to fit the entire system being investigated, the typical dimension is of order $10^8$ to $10^{10}$.

In addition to the large dimensionality $D$ of the stochastic integral that needs to be performed, it also needs to be performed over quarks, which are fermions. Fermionic degrees of freedom, however, are anti-commuting numbers and their inclusion ultimately leads to the need of computing determinants of square matrices of dimension $D^2$. Though these matrices are typically sparse, it is nonetheless a challenging task even for petaflop-class installations like JUGENE to perform this stochastic integration.

Extracting Physics
In order to extract final physics results from a stochastic integration of the QCD path integral, a number of further steps are necessary. The first one of these is reaching the “physical point”. This step is rather unique to lattice QCD, as in most other computer simulations, the fundamental parameters of the theory are known beforehand, whereas the interesting observables are connected to the nontrivial behavior of large systems.

In lattice QCD we do not know beforehand the fundamental parameters of the theory: the masses of the quarks and the coupling constant. Free quarks do not occur in nature and therefore their masses and the strength of their interactions cannot be measured directly. We therefore have to first solve the inverse problem and determine which parameters of the theory allow us to reproduce the experimentally observed phenomena of the strong nuclear force.

In practice, we looked at the masses of three different bound states of quarks (two mesons, the kaon and the pion, and one heavier particle or baryon, that is a bound state of three quarks) and defined the physical point as the point where the ratios between the three masses assume their experimentally observed values. In our lattice simulations we had three parameters to tune: quark masses and the coupling constant. From the three bound state masses we use to identify the physical point, we obtain two independent mass ratios that constrain the space of three parameters to a one-parameter line of “physical points”.

We can further differentiate the physical points along the line by comparing any one mass, e.g. that of the pion, from experiment to the dimensionless mass of the same bound state as measured on the lattice. This procedure is called “scale setting” as it determines the fundamental length scale of the lattice, which is also not known beforehand. The second step which is then necessary to obtain physical results is to extrapolate along the line of physical points to the point where the lattice scale vanishes. This step is called “continuum extrapolation” as it removes the discretization mesh and therefore leads to continuum results.

Knowing how to obtain continuum results, we still have to deal with the fact that our lattice represents only a finite volume in space and a finite extent in time. The third step towards obtaining physical predictions is therefore the infinite volume extrapolation. Fortunately most of the finite volume corrections are exponential in the lattice extent so that a large lattice extent renders them negligibly small.

Controlled Errors
Once we have carried out this procedure of obtaining physical predictions, we might wonder what would happen if we used three other masses initially to construct our line of physical points. Had we captured the physics exactly with our discretized lattice, we would obtain the same line. In reality there are errors due to the truncation, however, so one can trace out different lines of “physical points” depending on the input quantities used. If the theory and our methods of solving it are correct, however, all these lines must extrapolate to the same continuum value. We can therefore obtain a strong crosscheck of QCD itself and our ability to solve it by checking that not only our input masses but also the masses of other bound states are correctly reproduced in the continuum limit at the physical point. As shown in Fig. 4 we have successfully performed such a crosscheck in 2008 using mainly the computing resources at the Jülich Supercomputing Centre [1].

Figure 4: Comparison of some experimentally observed masses of quark bound states to our QCD prediction.

This result exemplifies what is the most important aspect of our calculations: a full control over all sources of systematic error. When making a prediction that has the potential to disprove our standard physical theories about our world we need to make sure that we do not underestimate the inaccuracy of our prediction.

In order to achieve this goal, our first aim was trying to avoid any extrapolations. The supercomputing resources provided to us mainly by the Jülich Supercomputing Centre in combination with algorithmic and theoretical advances [2,3,4] in fact allowed us for the first time in this kind of calculation to entirely eliminate the extrapolation to the physical point and replace it by an interpolation (see Fig. 5a). The unavoidable continuum extrapolation is very flat (see Fig. 5b) and was performed with different extrapolation functions. The spread between different results allowed us to reliably estimate the corresponding systematic uncertainty.

Figure 5: Value of the parameter $B_K$ as measured in our lattice calculations. (a) One interpolation of the lattice values to the physical point and (b) one extrapolation of the lattice values to the continuum.

Following this general strategy, we in fact performed a total of 5,760 analyses resulting in a reliable estimate of the total uncertainty of our results.

Consistency of the Standard Model
As a final result of our calculation, we obtain the so-called bag parameter of the kaon $B_K$. This number parameterizes the amount of matter-antimatter asymmetry in neutral kaon mixing and previous calculations were placing it in the region of 0.72-0.75 with a typical uncertainty of about 5%. The standard model is favoring a larger number, albeit with a relatively large error, and some authors were seeing this as a possible indication of new physics [5].

Our new result (see Fig. 6) is 0.773(12) and has a precision of 1.5%, which is entirely consistent with the standard model expectations. It is also far more precise than the standard model prediction, which implies that an improved standard model prediction for $B_K$ would increase the accuracy of this test to the percent level.

Figure 6: Our final value of the parameter $B_K$ compared to previous determinations (black) and a very conservative standard model prediction (blue). For more details see [4].

In conclusion, our calculation provided strong evidence that the CKM mechanism of matter-antimatter asymmetry in the standard model does correctly explain the observed neutral kaon mixing. The origin of the matter-antimatter asymmetry in our universe is still a mystery of fundamental physics that needs to be explored in the future.

Acknowledgment
I would like to thank all the members of the Budapest-Marseille-Wuppertal collaboration with whom this work was performed. This project has been supported by DFG grant SFB-TR 55; the necessary computer resources have been provided by NIC/Jülich Supercomputing Centre, GENCI, the University of Wuppertal, and the CPT Marseille.

References
[1] Durr, S. et al. (Budapest-Marseille-Wuppertal collaboration), Ab-initio Determination of Light Hadron Masses, Science 322 (2008) 1224
[2] Durr, S. et al. (Budapest-Marseille-Wuppertal collaboration), Lattice QCD at the physical point: light quark masses, Phys. Lett. B701 (2011) 265
[3] Durr, S. et al. (Budapest-Marseille-Wuppertal collaboration), Lattice QCD at the physical point: Simulation and analysis details, JHEP 1108 (2011) 148
[4] Durr, S. et al. (Budapest-Marseille-Wuppertal collaboration), Precision computation of the kaon bag parameter, Phys. Lett. B705 (2011) 477
[5] Brod, J. and Gorbahn, M. Next-to-Next-to-Leading-Order Charm-Quark Contribution to the CP Violation Parameter epsilon_K and Delta M_K, Phys. Rev. Lett. 108 (2012) 121801

• Christian Hoelbling

Bergische Universität Wuppertal

46 Autumn 2012 • Vol. 10 No. 2 • inSiDE Autumn 2012 • Vol. 10 No. 2 • inSiDE 47 Applications Applications

Theoretical Mechanochemistry: Stressing Molecules in the "Virtual Lab"

What is "Mechanochemistry"?

Most chemical reactions must be activated by applying some form of excess energy. Certainly the oldest approach consists of using fire, i.e. thermal energy, leading to what is commonly known as thermochemistry. Other well-known “chemistries” are triggered by light or electricity, i.e. photochemistry or electrochemistry. A general observation is that different ways of activating chemical reactions might lead to different reaction pathways, including different products in extreme cases. In contrast and despite pioneering macroscopic experiments by Matthew Carey Lea at the end of the 19th century, the field of mechanically induced covalent chemistry, which is dubbed here “mechanochemistry” in analogy to photochemistry, electrochemistry etc., still remains largely unexplored when it comes to a theoretical understanding at the fundamental molecular level.

Clearly this lack of understanding is at odds with the fantastic current experimental possibilities to use mechanical nano-Newton forces as a tool to induce, alter, or control chemical reactions by manipulating atoms or molecules [1,2]. This is typically achieved within setups relying on atomic force microscopy (AFM), using individual molecules that are covalently anchored to tips, or sonication experiments carried out in solution, using an ensemble of mechanophores with long polymer chains attached that act as force transducers.

On the other hand it is obvious that mechanical forces almost certainly will lead to different reaction pathways and thus different products, much like what is known from e.g. photochemistry or electrochemistry. This implies a growing gap between the ever increasing potential of experimental nanomanipulation techniques and a missing theoretical framework to understand such experimental observations. Finally, it is noted that understanding mechanochemistry will not only be important within the realm of chemistry as such, but it will impact as well on getting insights into mechanical aspects of such diverse issues as molecular friction, nanoscale machines, enzymatic reactions, or targeted delivery to name but a few.

The aim of the project “Mechanochemistry of Covalent Bond Breaking from First Principles” is to complement experiments in the real laboratory by those carried out in the “Virtual Lab” [3], thus advancing the emerging field of theoretical mechanochemistry [4]. In particular, mechanically-induced ring-opening reactions have been in the focus of these investigations so far. Relying on ab initio simulations [5] at its heart, this study yielded unprecedented insights into the molecular details of the influence of constant external mechanical forces on reaction mechanisms. Before presenting our representative results on ring-opening of cyclopropane derivatives a few words on our simulation approach are in order.

Isotensional Stretching of Molecules in the "Virtual Lab"

Most quantum-chemical calculations or ab initio simulations of stretched molecules are carried out using so-called “isometric” conditions. Such setups are easily accessible in any conventional program package that has holonomic constraints available. This is either done by fixing an internal distance between two atoms in a molecule, or by constraining the two atoms to specified positions along a space-fixed axis. The latter approach is mostly used in periodic codes, whereas the former is used in quantum chemistry codes that have been optimized to treat finite systems. A comprehensive review on theoretical mechanochemistry can be found in Ref. [4]. In contrast to this approach, the mechanochemical simulations presented in the following draw on our previously devised conceptual framework based on force-transformed potential energy surfaces (FT-PES) [6]. In this “isotensional” formalism, the FT-PES, given an external constant force $F_0$, is rigorously defined as

$$V_{\mathrm{EFEI}}(x, F_0) = V_{\mathrm{BO}}(x) - F_0\, q(x),$$

where $V_{\mathrm{BO}}(x)$ is the usual Born-Oppenheimer (BO) PES as a function of all nuclear cartesian coordinates $x$, and $q$ is the mechanical coordinate, i.e. a structural parameter (a generalized coordinate in terms of $x$) on which the force acts. By locating the stationary points of this function, in which the “External Force is Explicitly Included” (EFEI), one can evaluate, without invoking any approximation, properties such as reactant and transition state (TS) structures as functions of $F_0$. It is certainly instructive to analyze the FT-PES, which provides a static zero-temperature perspective of mechanochemical reaction mechanisms [6,7,8,9], but this approximation clearly lacks the inclusion of thermal, entropic, dynamical and solvation effects except for crude modeling of these.

Using the power of capability platforms such as the Blue Gene/P machine JUGENE at Forschungszentrum Jülich allows us to go a major step beyond computing the FT-PES. The natural generalization is to carry out finite-temperature simulations using the FT-PES, i.e. $V_{\mathrm{EFEI}}(x, F_0)$, as input instead of the usual BO-PES, $V_{\mathrm{BO}}(x)$. Clearly, this calls for ab initio simulations [5] where the electronic structure and thus the forces acting on the nuclei are computed “on the fly” to move the nuclei under the influence of a constant external mechanical force, instead of pre-calculating and fitting $V_{\mathrm{BO}}(x)$ and thus $V_{\mathrm{EFEI}}(x, F_0)$. This combination enables us to map multi-dimensional free energy landscapes at constant force, which we call force-transformed free energy surface (FT-FES) [10].

However, following this avenue requires the computation of free energy surfaces for several constant forces, which is a daunting task. This is made possible using the powerful ab initio metadynamics technique developed by Laio and Parrinello as reviewed in Ref. [11]. At the technical level, using the so-called “multiple walker” sampling in conjunction with the CPMD software package [12] greatly improves the efficient use of the Blue Gene architecture, see our inSiDE contribution on prebiotic peptide synthesis [13] for technical background and scaling assessment. Together, this package of methods allows us to efficiently carry out ab initio simulations [5] of mechanochemical reactions on Blue Gene platforms.
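The EFEI definition above can be made concrete with a one-dimensional toy model. The sketch below is only an illustration under stated assumptions: a Morse potential stands in for the BO-PES along a single mechanical coordinate q, and the parameters `D`, `A` and `Q0` are hypothetical values in reduced units, not quantities from the project. It locates the reactant minimum and the transition state of the force-transformed potential analytically and shows how the barrier shrinks as the constant force grows.

```python
import math

# Hypothetical Morse parameters in reduced units (illustrative only,
# not taken from the article): well depth, inverse width, equilibrium q.
D, A, Q0 = 1.0, 1.0, 0.0


def v_bo(q):
    """Morse potential standing in for V_BO along the mechanical coordinate q."""
    return D * (1.0 - math.exp(-A * (q - Q0))) ** 2


def v_efei(q, f0):
    """Force-transformed potential V_EFEI(q, F0) = V_BO(q) - F0 * q."""
    return v_bo(q) - f0 * q


def stationary_points(f0):
    """Return (q_reactant, q_ts), the two solutions of dV_BO/dq = F0.

    Valid for 0 < F0 < F_max, where F_max = D*A/2 is the largest
    restoring force of the Morse well (the rupture force of the model).
    """
    f_max = 0.5 * D * A
    if not 0.0 < f0 < f_max:
        raise ValueError("need 0 < F0 < F_max = D*A/2")
    # With u = exp(-A*(q - Q0)) the condition reads u**2 - u + F0/(2*D*A) = 0.
    s = math.sqrt(1.0 - f0 / f_max)
    q_min = Q0 - math.log(0.5 * (1.0 + s)) / A  # minimum: stretched reactant
    q_ts = Q0 - math.log(0.5 * (1.0 - s)) / A   # maximum: transition state
    return q_min, q_ts


def barrier(f0):
    """Activation energy on the force-transformed potential."""
    q_min, q_ts = stationary_points(f0)
    return v_efei(q_ts, f0) - v_efei(q_min, f0)


if __name__ == "__main__":
    for f0 in (0.1, 0.25, 0.4):
        q_min, q_ts = stationary_points(f0)
        print(f"F0={f0:.2f}  q_min={q_min:.3f}  q_TS={q_ts:.3f}  "
              f"barrier={barrier(f0):.3f}")
```

In this model the reactant minimum moves outwards and the transition state moves inwards as the force increases, until both merge at the rupture force. The real calculations work on the full-dimensional ab initio surface, so the sketch conveys only the qualitative picture.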


Enforced Ring-Opening of Cyclopropanes

Cyclobutene-based mechanophores have received a lot of attention recently, whereas cyclopropane systems, such as gem-dihalogencyclopropane derivatives, are under-researched in the realm of covalent mechanochemistry [2,4]. Yet, the force-induced electrocyclic ring-opening of these compound classes has furnished striking results [2,4]. Indeed, the mechanochemical activation of cis benzocyclobutenes has been shown, both experimentally and computationally, to promote a thermally forbidden disrotatory ring-opening process. The application of a transient tensile force on gem-difluorocyclopropanes, in its turn, has been shown experimentally to lead to an unexpected isomerization of trans species into their less stable cis isomers via a mechanochemical trapping of a diradical TS. Even more enigmatic are gem-dichlorocyclopropane (gDCC) systems, which feature a counterintuitive lack of selectivity in the mechanically assisted ring-opening reactions of cis versus trans isomers: while one expects the external forces to promote the ring-opening of cis gDCC more efficiently (due to a better coupling between the mechanical coordinate and the reaction coordinate), the experimental observations indicate that both isomers undergo force-induced ring-opening processes with approximately the same probability.

In our project the mechanochemical reactivity and force-induced stereochemistry of gDCCs have been scrutinized based on isotensional ab initio metadynamics simulations [10]. Our thermodynamic approach is supplemented by ab initio trajectory shooting simulations operating on the FT-PESs in order to dissect genuinely dynamical effects on branching ratios as a function of force (the details of this technique are provided in the Supporting Information of Ref. [10]). Based on these methods we have unveiled the mechanisms of force-induced ring-openings of cis versus trans gDCCs, which rationalizes puzzling experimental findings. Even more importantly, we have discovered an unprecedented complex mechano-stereochemical behavior, whereby the ring-opening of trans isomers of 1,3-disubstituted gDCCs can lead to two different diastereomers, with the probabilities of obtaining them featuring an intricate dependence on the force exerted onto the system.

The model system chosen to explore the mechanochemistry of gDCCs is 1,1-dichloro-2,3-dimethylcyclopropane: its cis and trans isomers and the four possible distinct reaction products of the corresponding ring-opening processes are depicted in Fig. 1. In the first stage of our study, the thermal chemistry (i.e. the chemistry at zero force) of gDCCs was dissected. Our simulations (both at 0 K and at 300 K) have revealed that the ring-opening of these molecules, which yields the corresponding 2,3-dichloroalkenes, proceeds via a concerted disrotatory mechanism, whereby the breaking of the C-C bond takes place in concert with the C-Cl bond cleavage and the subsequent Cl migration. For cis gDCC there are two possible pathways as indicated in Fig. 1: the “outward pathway” which passes through TS-I and the “inward pathway” via TS-III. On the basis of the difference between the activation energies of both pathways (about 5 kcal/mol, see Fig. 1), it has been concluded that the thermal ring-opening of cis gDCC occurs via a “disrotatory outward mechanism”, whose BO-PES features a TS of Cs symmetry (TS-I) and a bifurcation point along the intrinsic reaction coordinate (IRC) after the TS is left behind. By virtue of this topological feature, the migrating Cl atom can move either to the C atom on the right side (thus yielding the Z,R-alkene) or to the left (leading to the Z,S-alkene), see Fig. 1. Given the topology of the underlying PES, the ring-opening of cis gDCC is expected to yield the two enantiomeric alkenes with equal probability. The disrotatory ring-opening of trans gDCC at zero force, in its turn, implies either the TS-II or the TS-IV according to Fig. 1, with neither of these two TSs featuring symmetry. Trajectory shooting simulations initiated from TS-II have brought to light the fact that the ring-opening of trans gDCC can yield either the E,S-alkene or the Z,S-alkene in the absence of external forces. The probabilities of obtaining these two diastereomers were calculated to be 0.76 and 0.24, respectively. The branching ratio for E,R- versus Z,R-alkenes via TS-IV is identical because of symmetry. The computed activation free energies were found to be lower (by about 4 kcal/mol) for the ring-opening of cis gDCC than for trans gDCC, as shown in Fig. 1.

Figure 1: Left: Scheme showing the involved chemical species, i.e. all reactants (cis; trans-I and trans-II being enantiomers), transition states (TS-I to TS-IV; S-TS-I to S-TS-IV), and products (Z,R; Z,S; E,S; E,R). The arrows connecting the reactants with the distinct products via the corresponding TSs represent the reaction paths obtained from IRC mapping and ab initio trajectory shooting starting from the TSs (see text). The second set of TSs (S-TS) belongs to interconversion reactions between selected products as indicated. For simplicity all structures correspond to the stationary points at zero force, the Z,R product is reproduced twice for clarity, and the Cl atoms are colored violet. Right: Force-dependence of activation energies $\Delta E^{\ddagger}(F_0)$ (open symbols) and free energies $\Delta A^{\ddagger}(F_0)$ at 300 K (filled symbols) of the disrotatory ring-opening of cis (red circles for the “outward” pathway) and trans (blue squares) 1,1-dichloro-2,3-dimethylcyclopropanes. The stretching force is applied to the C atoms of the two terminal methyl groups as indicated in the inset.

After having set the stage by describing the thermal chemistry of gDCCs, we will now focus on the mechanochemical behavior of these molecules. As explained in Ref. [10], of all the reaction pathways depicted in Fig. 1 only two are relevant for fully describing the mechanochemistry of gDCCs: the disrotatory outward mechanism of cis gDCC and the disrotatory ring-opening of trans gDCC passing through TS-II. As shown in Fig. 1, the force-dependence featured by the activation free energy, $\Delta A^{\ddagger}$, and the activation energy, $\Delta E^{\ddagger}$, for the two considered pathways follows a similar general trend. The values of $\Delta A^{\ddagger}(F_0)$ have been obtained from the FT-FESs displayed in Fig. 2. Despite the similar trends, the values of $\Delta A^{\ddagger}(F_0)$ are lower than those of $\Delta E^{\ddagger}(F_0)$ by about 6-7 kcal/mol over the whole range of forces, which results in a dramatic rate acceleration due to finite-temperature and entropy effects not accounted for when computing only $\Delta E^{\ddagger}(F_0)$. With reference to the selectivity with which the tensile load might promote preferentially the ring-opening of one isomer, both the qualitatively different shapes of the energy curves associated with the cis and trans reactants and the corresponding rupture forces (1.5 nN and 2.5 nN, respectively, at 300 K) reflect the fact that the external forces enhance the ring-opening of cis gDCCs more efficiently. This is somewhat contradictory to the experimentally found lack of selectivity. As argued in Ref. [10], the most plausible suggestion for resolving this apparent contradiction would be to speculate that sufficiently large forces (forces on the order of 2 nN would suffice, according to the data in Fig. 1) were generated in the sonochemical experiments so as to reach the barrierless regime for the ring-opening reactions of both isomers.

Figure 2: Force-transformed free energy surfaces (FT-FESs) for ring-opening for the cis reactant of 1,1-dichloro-2,3-dimethylcyclopropane in the absence of any external force and with three different external forces ($F_0$ = 0.5, 1.0, and 1.25 nN as indicated). These FT-FESs have been obtained by means of isotensional ab initio metadynamics simulations performed in a reaction subspace spanned by two collective variables: CV1 is the C-C distance associated with the bond that yields upon the ring-opening process. CV2 is the difference CN1 - CN2 of the coordination numbers of both chlorine atoms with respect to the left (CN1) or right (CN2) carbon atom in the cyclopropane ring.

Last but not least, our exploration of the mechanochemistry of gDCCs has unveiled a most surprising feature: the force-dependent selectivity of the ring-opening of the trans reactant to yield either the Z- or the E-diastereomer of the corresponding dichloroalkene. As clearly displayed in Fig. 3, the corresponding branching ratio changes dramatically and non-monotonically as a function of $F_0$. This intricate switching behavior obviously implies that a ratio of about 50:50 is expected to be observed only close to some specific critical forces (0.7 nN, 1.9 nN and 2.2 nN). Importantly, in the limit of large forces the majority product is the Z,S-product, which is just opposite to the situation encountered at zero force. To the best of our knowledge, this is the first time that such an intricate mechanochemical behavior of a reaction resulting from intrinsically dynamical effects but determining its stereochemistry is reported.

Figure 3: Force-dependence of the probability of obtaining the E,S-products (red) versus Z,S-products (black) upon ring-opening of the trans reactant as computed from dynamical trajectory shooting simulations. The ranges of forces with yellow and white backgrounds correspond to the forces at which the majority product of the ring-opening process is the E,S-alkene or the Z,S-alkene, respectively.

Outlook

The theoretical mechanochemistry presented so far has been carried out in the gas phase. Clearly, this is only a first step as each and every experiment is carried out in the condensed phase, be it a liquid phase in sonication or a protein matrix in AFM experiments. Although initial work along these lines has been carried out on JUGENE, we are looking very much forward to using the Blue Gene/Q platform JUQUEEN at Forschungszentrum Jülich to push theoretical mechanochemistry in much more realistic environments such as solutions and biomolecules.

Thanks

It gives us pleasure to thank Motoyuki Shiga, Janos Kiss, Martin Krupicka, Miriam Wollenhaupt, Matthias Rückert, and Marcus Böckmann for fruitful discussions.

We are grateful to Deutsche Forschungsgemeinschaft (Reinhart Koselleck Grant MA 1547/9 to D.M.), Alexander von Humboldt Stiftung (Humboldt Fellowship to J.R.A.), Catalan Government (Beatriu de Pinós Fellowship to J.R.A.), as well as Spanish Government (“Ramón y Cajal” contract to J.R.A.) for having supported financially our research on theoretical mechanochemistry.

Last but not least, the excellent and significant computational support by John von Neumann Institute for Computing (NIC) and Jülich Supercomputing Centre (JSC) is most gratefully acknowledged.

References

[1] Beyer, M.K., Clausen-Schaumann, H. Chem. Rev. 105:2921-2948, 2005

[2] Caruso, M.M., Davis, D.A., Shen, Q., Odom, S.A., Sottos, N.R., White, S.R., Moore, J.S. Chem. Rev. 109:5755-5798, 2009

[3] Marx, D. “Theoretical Chemistry in the 21st Century: The Virtual Lab”, in: Proceedings of the “Idea-Finding Symposium: Frankfurt Institute for Advanced Studies”, (Greiner, W.; Reinhardt, J., Eds.), EP Systema, Debrecen, pp 139-153, 2004. http://www.theochem.rub.de/go/cprev.html

[4] Ribas-Arino, J., Marx, D. Chem. Rev. (online at http://pubs.acs.org/doi/abs/10.1021/cr200399q)

[5] Marx, D., Hutter, J. Ab-initio Molecular Dynamics: Basic Theory and Advanced Methods, Cambridge University Press, 2009

[6] Ribas-Arino, J., Shiga, M., Marx, D. Angew. Chem. Int. Ed. 48:4190-4193, 2009

[7] Ribas-Arino, J., Shiga, M., Marx, D. Chem. Eur. J. (Communication) 15:13331-13335, 2009

[8] Ribas-Arino, J., Shiga, M., Marx, D. J. Am. Chem. Soc. 132:10609-10614, 2010

[9] Dopieralski, P., Anjukandi, P., Rückert, M., Shiga, M., Ribas-Arino, J., Marx, D. J. Mater. Chem. 21:8309-8316, 2011

[10] Dopieralski, P., Ribas-Arino, J., Marx, D. Angew. Chem. Int. Ed. 50:7105-7108, 2011

[11] Laio, A., Parrinello, M. Lect. Notes Phys. 703:315-347, 2006

[12] CPMD, J. Hutter et al., http://www.cpmd.org

[13] Nair, N.N., Schreiner, E., Marx, D. inSiDE 6:30-35, 2008

• Przemyslaw Dopieralski
• Padmesh Anjukandi
• Jordi Ribas-Arino
• Dominik Marx

Lehrstuhl für Theoretische Chemie, Ruhr-Universität Bochum
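As a rough illustration of the rate acceleration mentioned in the ring-opening discussion above, the following sketch converts a barrier lowering of 6-7 kcal/mol at 300 K into a rate ratio. It assumes a simple transition-state-theory-like dependence, k proportional to exp(-ΔA‡/RT), with identical prefactors for both rates; this is an assumption made here for illustration and not a description of how rates were evaluated in Ref. [10]. The function name `rate_ratio` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def rate_ratio(barrier_lowering_kcal, temperature_k=300.0):
    """Ratio k_low / k_high for two barriers differing by the given amount.

    Assumes k ~ exp(-barrier / (R*T)) with equal prefactors, so only the
    barrier difference enters.
    """
    return math.exp(barrier_lowering_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    for lowering in (6.0, 7.0):
        print(f"barrier lowered by {lowering:.0f} kcal/mol at 300 K: "
              f"rate ratio ~ {rate_ratio(lowering):.2e}")
```

Under this assumption a lowering of 6 to 7 kcal/mol at 300 K corresponds to a rate enhancement of roughly four to five orders of magnitude, which is why finite-temperature and entropic effects cannot be neglected here.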

Global MHD Simulations of Protostellar Jets with Radiative Effects

The subject of the work I carried out on the HLRB II was the investigation of the effect of radiative cooling on the propagation of protostellar jets and the consequent effect on the observable emissions from these objects.

These highly supersonic jets of gas are ejected from accreting protostars in the early stages of their development, the first 30,000 years or so, labelled the Class 0 phase. The jets at this stage typically extend to some tens of thousands of AU (astronomical unit, 1 AU is the distance from the earth to the sun).

Such an outflow interacts with the molecular cloud core in which the protostar is forming, and the resulting superheated gas emits light at particular wavelengths according to the chemical composition, allowing astronomers to observe the traces of the outflow with telescopes, see Fig. 1.

Figure 1: HH212, a stunning example of a bipolar protostellar jet emitting in molecular hydrogen lines [3].

The protostar itself is hidden within the dark centre of the molecular cloud, making theoretical investigation of the jet ejection process difficult. However, the appearance of the jet on such large scales gives observational clues which, in conjunction with simulations, can provide constraints for theoretical models.

Modelling Jets with Simulations

Our approach is to model the observed protostellar outflow system as a hydrodynamic or magnetohydrodynamic domain, representing the ambient medium of the molecular cloud, into which a collimated pulsating flow of gas is injected, representing the beam of the jet.

The geometry of jets lends itself quite well to simulation in a cylindrically axisymmetric 2-dimensional domain, which enabled us to carry out our simulations without the performance cost of full 3-dimensional computation. The setup is as shown in Fig. 2, where initial conditions are set for the gas of both the ambient medium and the jet.

Figure 2: Schematic of the simulation domain setup [2].

Our simulations examined the effect of key parameters on the jet morphology and emission characteristics. Such parameters include the relative densities of the jet and its surrounding medium, the presence/strength of a magnetic field in the surrounding medium, the chemical composition of the molecular cloud and the level of ionisation of the gas in the jet beam.

We used the PLUTO astrophysical simulation code [1], with an additional module we have written to compute the chemistry and corresponding cooling losses at each grid point for every time step. PLUTO, a grid-based finite-volume code, solves the HD/MHD equations in parallel via several possible integration methods, while the integration method used for the chemistry module was a simple but effective semi-implicit backwards difference method.

The chemical network consisted of the main chemical species and the dominant set of reactions involved in the production of molecular H2, a key coolant and emission source in the gas.

In order to be able to carry out simulations of such large scale objects, while still managing to resolve small-scale shock regions, we used the adaptive mesh refinement capability provided in PLUTO via the Chombo AMR libraries.

Performance and Scalability

The resolution chosen for the production simulations was a base grid of 512x64 grid points, with 4 levels of binary mesh refinement, corresponding to an equivalent resolution of 8192x1024 grid points. For the domain of 5000 AU this corresponds to a resolution of 0.61 AU per grid zone, which is sufficient to resolve the chemical dissociation and cooling zones behind the shock front.

As the jet propagates through the domain and the area of interaction between the jet and the ambient medium increases, the computational load of the simulation also increases, as shown in Fig. 3.

Figure 3: Increasing load corresponding to increasing wall-time as the jet propagates into the domain [2].

For the production runs we used the HLRB II’s SGI Altix 4700 supercomputer. Some thirty runs were carried out in order to probe the effect of different values for the physical and chemical parameters on the propagation and observable emissions.

For early parts of the simulation, where the computational load is small, the performance benefit of having a large number of processors is offset by the communication cost between the processors, so we used a varying number of processors at different stages of the runs, typically starting them with 32 or 64 processors and continuing at 128 or 256 processors. The typical CPU consumption per run was between 5 and 10 thousand CPU hours in total.

Results

The data from the simulations provided a wealth of information on some of the factors affecting the jet propagation, giving insight into how the observed jets work. One interesting result illustrates the role of the magnetic field in maintaining the stability of the bow shock. In the simulation run with no magnetic field, the cooling instabilities in the bow shock caused it to break up, allowing ambient molecular material to enter the jet “cocoon”. The molecular material was then heated by internal shocks from the jet and caused to emit in the ro-vibrational lines of H2, as shown in Fig. 4. This supports the “entrainment hypothesis” where the emissions are thought to arise from ambient matter swept up by the jet, as opposed to (or in addition to) matter in the jet beam itself.

Figure 4: Section of the jet showing the entrainment of molecular ambient material and subsequent emission [2].

In the case where magnetic fields are present, this is not seen to occur (see Fig. 5), and the emissions in these cases are seen to occur directly at the shock, a situation known as “prompt entrainment”.

Figure 5: Showing the effect of the magnetic field, seen here increasing from 0 μG on the left, through 30, 60 and 118 μG on the right. The white box shows matter from the unstable bow shock breaking up and entering the inner part of the jet in the absence of magnetic fields.

Conclusion

The work described here formed part of my PhD work, which I concluded in 2009. Further details are given in my thesis [2].

References

[1] Mignone, A. et al. PLUTO: a Numerical Code for Computational Astrophysics, ApJS 170, 228, 2007, http://adsabs.harvard.edu/abs/2007ApJS..170..228M

[2] O’Sullivan, J. Molecular Cooling and Emissions in Large Scale Simulations of Protostellar Jets, http://www.ub.uni-heidelberg.de/archiv/10164, 2009

[3] Zinnecker, A. et al. HH212: a prototype molecular hydrogen jet from a deeply embedded protostar, in: B. Reipurth & C. Bertout, eds., Herbig-Haro Flows and the Birth of Stars, Vol. 182 of IAU Symposium, 198, 1997

• Jamie O’Sullivan
• Max Camenzind

Landessternwarte, University of Heidelberg
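The resolution figures quoted under "Performance and Scalability" follow from simple arithmetic, sketched below. The helper name `effective_grid` is hypothetical, and the sketch assumes that the quoted 5000 AU refers to the long axis of the domain (the one with 512 base cells) and that "binary" refinement means a factor of 2 per level.

```python
def effective_grid(base_shape, levels, ratio=2):
    """Equivalent uniform grid for an AMR hierarchy.

    base_shape: cells per dimension on the coarsest level
    levels:     number of refinement levels above the base grid
    ratio:      refinement factor per level (2 for binary refinement)
    """
    factor = ratio ** levels
    return tuple(n * factor for n in base_shape)


# Values quoted in the text: 512x64 base grid, 4 binary refinement levels.
base = (512, 64)
effective = effective_grid(base, levels=4)

# Assumption: the 5000 AU extent is measured along the long axis.
domain_length_au = 5000.0
zone_size_au = domain_length_au / effective[0]

if __name__ == "__main__":
    print("equivalent resolution:", "x".join(str(n) for n in effective))
    print(f"finest zone size: {zone_size_au:.2f} AU")
```

The finest zones are 16 times smaller than the base zones in each direction, so a uniform grid of the same resolution would need 256 times as many cells as the base grid; adaptive refinement avoids most of that cost by refining only around shocks.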


Setup of the Validation Suite Overview of the different LRZ Validation-Suite for For each platform a set of variables can HPC Systems at LRZ Life Science be defined in the file "platform.xml" The current LRZ infrastructure consists (see Fig. 1); this file can be found in the of a broad range of HPC systems The potential of High Performance tively validate the quality and scalability directory "platform/". equipped with nodes based on multi-core Computing (HPC) is increasingly being of several life science related software processors. Programs designed for recognized in the fields of life science packages (GROMACS, DALTON and Once a platform is defined, each individual parallel execution on a node can take with particular focus on modelling and DISCRETE). To achieve this, realistic test application that should be used in the advantage of its shared memory archi- simulations. Life science organizations cases were provided by alpha users LRZ-VS needs to be configured for this tecture. The systems are summarized are developing into one of the largest of the Scalalife project, which were then platform. At the LRZ, for example, in the following table: e-Infrastructure users in Europe, and run and analysed on a wide variety of the molecular dynamics application Total number Memory System Processor face many technical challenges in their target HPC platforms. This is important GROMACS can be compiled and submitted of cores per node quest to increase collaboration among to understand and quantify the systems’ on the SuperMUC system with the Intel Sandy SuperMUC Bridge EP 147,456 32 GB researchers, streamline and replicate theoretical and actual performance following commands (see Fig. 
1): $ cd 3 PetaFlops 16 cores per processes, enhance service levels and for the constantly updated releases of ScalaLife/VS/applications/GROMACS node accelerate processing for faster time- the software packages included in $ ../../validation/lrz-validate Valid- Intel Westmere- EX (2.4 Ghz) SuperMIG 8,200 256 GB to-market. This development will accel- the project. SuperMUC.xmlThe LRZ-VS will compile 40 cores per node erate tremendously, and puts high a general binary for all node/task/ Intel Westmere- demands on simulation software and The test cases of the ScalaLife alpha thread combinations defined. If neces- EX (2.4 Ghz) SGI Altix UV 2,080 3 TB 20 cores per support services. users as well as the software packages sary, these parameters can be statically node (2.4 Ghz) have been integrated into the LRZ-VS. compiled into the binary. In the next AMD Opteron 6128HE (2 Ghz) CoolMUC 2,800 16 GB This article will review and present a Utilizing the LRZ-VS, the most recent step a so-called sandbox subdirectory 16 cores per new Validation-Suite framework devel- releases of the respective software is created for each job, ensuring node oped at the Leibniz Supercomputing packages are automatically compiled conflict free operation of the individual AMD Opteron + 8 NVIDIA GPU Cluster 64 128 GB Centre (LRZ-VS), which is an extended across the various HPC platforms. applications at run time; the input files Quadro FX 5800 GPGPUs version of the DEISA benchmark- Furthermore, the application of the LRZ-VS are prepared automatically. suite, developed by the Jülich Super- allows automated execution, running computing Centre [1,2]. The suite and verification of the input datasets The Test Cases Results and Discussion has been developed and used in the provided in the ScalaLife project. 
In the following table a brief overview of At the LRZ, we have measured and ScalaLife project [3], in order to effec- all the parameters for the test cases validated the application examples on of the individual applications is given: the systems SuperMIG and CoolMUC for Dalton (Fig. 3) and on the SuperMIG, Software Name of Number Number CoolMUC and SGI Altix UV machines package test case of atoms of bonds DALTON Cytiac 35 36 for GROMACS (Fig. 4). Spin-Label 71 75 Porp 72 79 All Dalton results scale well up to ap- proximately 100 cores; currently using DISCRETE 1 KIM 4,681 4,735 more cores does not produce results at reasonable scalability. This trend GROMACS Lipid bilayer 40,380 35,020 seems to be independent of the system Gly water 60,000 45,000 Snap 66,463 49,597 size. Cytiac is the smallest test sys- Interface tem but scales better than Spin-Label, 114,000 111,989 (Growth) which includes two-fold atoms and Membrane 261,412 196,630 bonds. In contrast, the Porp system of Virion 5,097,548 50,197 similar size seems to scale well. Figure 1: Automated compilation, execution and verification of validation runs - example for SuperMUC. Nevertheless, these trends can partly

58 Autumn 2012 • Vol. 10 No. 2 • inSiDE Autumn 2012 • Vol. 10 No. 2 • inSiDE 59 Projects Projects

be explained by the different operations executed in each test set, where Spin-Label is a large-scale test case for the QM/MM module of Dalton. The Cytiac test case applies advanced quantum mechanics methods, which are already quite expensive for small molecules. In the Porp case, density functional theory from quantum mechanics is applied.

For the GROMACS evaluation, we have compared the scalability of six different test cases on three different architectures. All test cases seem to scale well until approximately 400 cores. This observation is confirmed on the SuperMIG architecture: beyond 400 cores, the first test cases start to show sub-optimal scalability. The runs on the CoolMUC and SGI Altix UV machines scale well up to the maximum number of cores tested. The best performance is found for the medium-size test case Snap (~66 k atoms), where a special tweak (vsites) is used to double the step size during the simulation. However, this feature also seems to make this system less stable and prone to run-time errors; the medium size of the system further limits the maximum number of cores that can be utilised. The larger test cases Membrane (~260 k atoms) and Growth (~110 k atoms) scale up to 800 and 400 cores, respectively. Only the largest test set, Virion, amounting to more than 5 million particles, shows good scalability up to 2,000 cores and possibly more.

In summary, the LRZ Validation Suite provides a valuable means for automated testing, validation and benchmarking of a large range of supercomputing codes. Once the system-specific adaptation has been generated, several program packages can be validated just by adding them to the validation suite. The LRZ-VS is open source and architecture independent and can easily be deployed on a large range of systems, as the ScalaLife Computing Centres have already shown. For accessing the validation suite or using it together with your own software projects please visit the ScalaLife website at http://www.scalalife.eu.

Acknowledgements
We kindly thank all the ScalaLife alpha users for providing the datasets [9].

Figure 2: Membrane test case for GROMACS. Two layers of lipid in a water box can be seen. Embedded in the lipid layer is a protein.

Figure 3: Dalton scaling results for three different test cases: Cytiac (black), Spin-Label (red) and Porp (green). The test runs were performed on two architectures at the LRZ: SuperMIG and CoolMUC.

Figure 4: GROMACS scaling results for six different test cases: Virion (black), Membrane (red), Growth (green), Snap (blue), Gly (violet) and Bilayer (beige). The test runs were performed on three architectures, i.e. SuperMIG, CoolMUC and SGI Altix UV.

• Momme Allalen
• Marc Offman
• Helmut Satzger
• Ferdinand Jamitzky
Leibniz Supercomputing Centre (LRZ)

References
[1] http://www.deisa.eu/science/benchmarking
[2] http://www.fz-juelich.de/jsc/jube
[3] http://www.scalalife.eu
[4] Frings, W., Schnurpfeil, A., Meier, S., Janetzko, F., and Arnold, L. A Flexible, Application- and Platform-independent Environment for Benchmarking, Accepted for Proceedings Parallel Computing (ParCo), Lyon, 2009
[5] Berendsen, H.J. et al. Comp. Phys. Comm. 91: 43-56, 1995
[6] Lindahl, E. et al. J. Mol. Model. 7: 306-317, 2001
[7] van der Spoel, D. et al. J. Comput. Chem. 26: 1701-1718, 2005
[8] Hess, B. et al. J. Chem. Theory Comput. 4: 435-447, 2008
[9] http://www.scalalife.eu/validation
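The scaling statements in the Dalton and GROMACS evaluation ("scales well up to approximately 100 cores") can be made concrete with speedup and parallel efficiency. The following is a minimal sketch, not part of the LRZ Validation Suite: the function names, the 0.7 efficiency threshold and the runtimes are all hypothetical and only illustrate the arithmetic.

```python
from typing import Dict, List, Tuple


def scaling_table(runtimes: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (cores, speedup, efficiency) rows relative to the smallest run."""
    base_cores = min(runtimes)
    base_time = runtimes[base_cores]
    rows = []
    for cores in sorted(runtimes):
        speedup = base_time / runtimes[cores]
        # Ideal speedup relative to the baseline run is cores / base_cores.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


def max_efficient_cores(runtimes: Dict[int, float], threshold: float = 0.7) -> int:
    """Largest core count whose parallel efficiency is still >= threshold."""
    good = [cores for cores, _, eff in scaling_table(runtimes) if eff >= threshold]
    return max(good)


# Hypothetical wall-clock times in seconds, NOT measurements from the article.
example = {50: 400.0, 100: 210.0, 200: 150.0, 400: 130.0}

if __name__ == "__main__":
    for cores, speedup, eff in scaling_table(example):
        print(f"{cores:4d} cores  speedup {speedup:5.2f}  efficiency {eff:4.2f}")
    print("efficient up to", max_efficient_cores(example), "cores")
```

With a threshold like this, "scales well up to N cores" becomes a reproducible criterion that can be compared across test cases and machines.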


Optimizing the Energy-to-Solution on SandyBridge Systems

• Carmen Navarrete
• Carla Guillen
• Wolfram Hesse
• Matthias Brehm
Leibniz Supercomputing Centre (LRZ)

The increasing popularity of Green Computing has made energy consumption and its optimization a primary concern. The key to energy-efficient applications lies in the definition of energy models that minimize, or at least optimize, the energy consumption at any instant of the application's runtime [1]. Closely related to these energy models are the energy policies provided by the Linux kernel cpufreq infrastructure [2]. These policies allow managing the energy consumption based on frequency and processor utilization criteria.

We have evaluated how energy consumption changes when setting different governor and frequency parameters through the cpufreq kernel infrastructure, using the PAPI-RAPL component to measure the energy consumption. We also evaluate how the performance and runtime of the application change with the energy consumption of the running application, i.e. the energy-to-solution behaviour.

CPUFreq Infrastructure
Starting with the 2.6.0 Linux kernel, the processor frequencies can be dynamically scaled through the cpufreq subsystem. This dynamic scaling of the clock speed gives some control in throttling the system to consume less power when not operating at full capacity. The cpufreq infrastructure makes use of governors (policies) and daemons for setting a static or dynamic power policy for the system. The dynamic governors can switch between CPU frequencies based on processor utilization to allow for power savings while not sacrificing performance.

There are five possible kernel governors available for use with the cpufreq subsystem. These governors set the processor frequency based on certain criteria [3]:
• Performance: The performance governor statically sets the processor to the highest frequency available. The range of frequencies available to this governor can also be adjusted via parameters.
• Ondemand: Introduced in the 2.6.10 kernel, the ondemand governor was the first in-kernel governor to dynamically change processor frequency based on processor utilization. The ondemand governor checks the processor utilization and, if it exceeds a configurable threshold, sets the frequency to the highest available.
• Conservative: Based on the ondemand governor, the conservative governor is similar in the way it dynamically adjusts frequencies based on processor utilization; however, the conservative governor allows a more gradual increase in power.
• Powersave: On the flip side of the performance governor, the powersave policy statically sets the processor to the lowest available frequency.
• Userspace: This governor allows a frequency to be selected and set manually. Note that the userspace governor itself does not dynamically change the frequency; rather, it allows a user-space program to dynamically select the processor frequency.

PAPI-RAPL Component
PAPI (Performance Application Programming Interface) [4] aims to provide the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors. One of the new components of the PAPI library is the so-called PAPI-RAPL component [5], which makes use of the RAPL [6] sensors available on the SandyBridge microarchitecture [7,8]. These sensors allow measuring the power consumption of the CPU-level components by examining the MSR (Model Specific Registers).

The library implemented here combines the cpufreq and PAPI-RAPL features. It allows optimizing the energy consumption of running instrumented applications by changing the frequency and governor policy of processors in certain instrumented code regions. These code regions are detected by Periscope, a scalable automatic performance analysis tool developed by the TU München [9,10]. Periscope encloses the code regions with the start_region(...) and end_region(...) function calls. The library uses these functions to trigger the methods that change, if needed, the governor parameters or frequency of the processor on which the calling thread or process runs. For example, we can increase the frequency of the processors that host the threads or processes running the loop sections and decrease it in I/O regions. This is not the only possibility; suppose, for example, that we want to optimize the energy consumed by an application that shows parallel load imbalance (see Fig. 1). The instrumenter will determine the code regions where the processors are idle, whereas the library will set a lower frequency or a more energy-saving policy for those processors, trying to minimize idle time on tasks/threads.

Figure 1: Suppose an application with a high load imbalance and a barrier at the end of a section, like the one on the left side. The processes that finish their code must wait until the barrier, consuming energy. The library will set a different frequency for each parallel process, minimizing the idle time until the barrier for each task, as shown in the image on the right side.

Test Cases and Results
The library has been tested with three different applications with three different requirements. These applications are an O3 matrix multiplication code developed in C and parallelized with MPI, a SIP (Strongly-implicit procedure) test bench application developed in Fortran and parallelized with OpenMP, and the geophysical Seissol application implemented in Fortran and parallelized with MPI. The three applications have been instrumented with Periscope and then linked with the library described here. The three applications have been run on a test cluster composed of eight SandyBridge EP Intel® Xeon® CPU E5-2680 at 2.70 GHz. Both applications parallelized with MPI have been run using 32 tasks, whereas the SIP application has been run using eight threads. For each code, we have defined seven different scenarios, i.e. criteria to change the governor and frequency of the processors depending on the region type determined by Periscope. These seven scenarios are specified in Table 1.

The energy consumed, the runtime and the cycles used by the three applications in the seven different scenarios are presented graphically in Figures 2, 3 and 4, respectively.


Figure 2: Comparison between the energy to solution (Joules) of the three applications, depending on the energy optimization profile T_i. There are no results for the Seissol application for scenarios T5, T6 and T7 yet.

Figure 3: Comparison between the elapsed runtime (seconds) of the three applications, depending on the energy optimization profile T_i. There are no results for the Seissol application for scenarios T5, T6 and T7 yet.

Scenario   main region     sub, call, user   loop, for, forall   I/O region      parallel region
T1         ondemand 2.7    ondemand 2.7      ondemand 2.7        ondemand 2.7    ondemand 2.7
T2         userspace 2.7   userspace 2.7     userspace 2.7       userspace 2.7   userspace 2.7
T3         ondemand 1.9    ondemand 1.9      ondemand 1.9        ondemand 1.9    ondemand 1.9
T4         userspace 1.2   userspace 1.2     userspace 1.2       userspace 1.2   userspace 1.2
T5         ondemand 1.9    ondemand 1.9      ondemand 2.7        ondemand 2.7    ondemand 2.7
T6         ondemand 1.9    ondemand 1.9      userspace 1.9       ondemand 2.7    userspace 1.2
T7         ondemand 1.9    ondemand 1.2      userspace 1.2       ondemand 1.2    userspace 1.2

Table 1: Governor and frequency constraints for the region types identified by Periscope. Each set of constraints has been defined as an optimization profile or test scenario. The frequency specified for the ondemand governor is the maximum available frequency in GHz (the minimum in this case is always the lowest frequency, 1.2 GHz), whereas the frequency specified for the userspace governor is the fixed frequency in GHz to be used and not a maximum. 2.7 GHz is the maximum frequency of the processors, 1.9 GHz the medium frequency and 1.2 GHz the minimum.
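Table 1 can be read as a lookup from (scenario, region type) to a governor and a frequency, which is then applied through the cpufreq sysfs files. The sketch below is an illustration under assumptions, not the authors' library: the region keys, `policy_for` and `apply_policy` are hypothetical names, and the column order is taken from the flattened table header. The sysfs attributes `scaling_governor`, `scaling_setspeed` and `scaling_max_freq` are standard cpufreq files and expect frequencies in kHz.

```python
from pathlib import Path
from typing import Dict, Tuple

Policy = Tuple[str, float]  # (governor, frequency in GHz)

# Hypothetical keys for the five region-type columns of Table 1.
REGIONS = ("main", "sub_call_user", "loop", "io", "parallel")

# Table 1, one row per scenario, columns in the order of REGIONS.
SCENARIOS: Dict[str, Tuple[Policy, ...]] = {
    "T1": (("ondemand", 2.7),) * 5,
    "T2": (("userspace", 2.7),) * 5,
    "T3": (("ondemand", 1.9),) * 5,
    "T4": (("userspace", 1.2),) * 5,
    "T5": (("ondemand", 1.9), ("ondemand", 1.9), ("ondemand", 2.7),
           ("ondemand", 2.7), ("ondemand", 2.7)),
    "T6": (("ondemand", 1.9), ("ondemand", 1.9), ("userspace", 1.9),
           ("ondemand", 2.7), ("userspace", 1.2)),
    "T7": (("ondemand", 1.9), ("ondemand", 1.2), ("userspace", 1.2),
           ("ondemand", 1.2), ("userspace", 1.2)),
}


def policy_for(scenario: str, region: str) -> Policy:
    """Look up the governor and frequency for a region type in a scenario."""
    return SCENARIOS[scenario][REGIONS.index(region)]


def apply_policy(cpu: int, policy: Policy,
                 root: Path = Path("/sys/devices/system/cpu")) -> None:
    """Write the policy to the cpufreq sysfs files of one CPU."""
    governor, ghz = policy
    khz = str(int(round(ghz * 1_000_000)))  # cpufreq sysfs values are in kHz
    cpufreq = root / f"cpu{cpu}" / "cpufreq"
    # The governor must be set first: scaling_setspeed only works under userspace.
    (cpufreq / "scaling_governor").write_text(governor)
    if governor == "userspace":
        # userspace: pin the core to exactly this frequency.
        (cpufreq / "scaling_setspeed").write_text(khz)
    else:
        # ondemand: the table value is the upper frequency bound.
        (cpufreq / "scaling_max_freq").write_text(khz)
```

A region-entry hook would call `apply_policy(cpu, policy_for("T6", "io"))` for the core running the caller. Writing these files on a real system needs root privileges, and the requested value has to be one of the frequencies the driver actually offers.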

The results show clearly that each application has its own energy behaviour; we cannot derive a golden rule for all applications. Even so, the three applications show similar global behaviour. Comparing T1 and T2, which use the same frequency but different governors, T1 seems to perform better, because the ondemand governor lets the processor work at lower frequencies, saving energy when the processor does not really need the highest frequency. In the same way, T1 and T3 use the same governor policy but with a different frequency. In that case, T3 saves a little more energy but also takes more time.

As noted before, a lower processor frequency does not necessarily imply reduced energy consumption. This behaviour can also be observed in the table and figures. For example, T4 uses the same governor pattern as T2 but at roughly half the frequency (1.2 GHz instead of 2.7 GHz). If energy scaled with frequency, the energy consumed by T4 should be approximately half of the energy consumed by T2, which the experiments do not confirm. What seems to give better results is a mixed combination of governors during the application runtime (scenarios T5 to T7) instead of a fixed governor for the whole execution, although changing the governor and setting a new frequency also adds overhead. Although it is not possible to find a direct correspondence between energy and runtime, the results suggest a roughly linear relation between consumed energy and consumed cycles, so that the behaviour of the cycles used by the application can be anticipated from the curve of the energy consumption: the more energy is used overall, the more cycles are needed for the same input dataset.

We can conclude that regions of the same type may behave differently, and that frequency policies should therefore be set per code segment rather than per region type. This fine tuning of the frequency in the code should become possible when integrating this library with Periscope within the AutoTune project. AutoTune will extend Periscope, an automatic online and distributed performance analysis tool, with automatic online tuning plugins for performance and energy efficiency tuning. The resulting Periscope Tuning Framework will be able to tune serial and parallel codes with/without GPU kernels and will return tuning recommendations that can be integrated into the production version of the code. The whole tuning process, consisting of automatic performance analysis and automatic tuning, will be executed online, i.e., during a single run of the application.

Figure 4: Comparison between the consumed cycles of the three applications, depending on the energy optimization profile T_i. There are no results for the Seissol application for scenarios T5, T6 and T7 yet.

References
[1] Merkel, A., Bellosa, F. Balancing power consumption in multiprocessor systems, Proceedings of 1st ACM SIGOPS European Conference on Computer Systems, ACM, 2006
[2] CPU Frequency Scaling https://wiki.archlinux.org/index.php/CPU Frequency Scaling
[3] Using the Linux CPUFreq Subsystem for Energy Management. http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaai/cpufreq/liaai-cpufreq.htm
[4] Browne, S., Deane, C., Ho, G., Mucci, P. PAPI: A Portable Interface to Hardware Performance Counters, Proceedings of Department of Defence HPCMP Users Group Conference, 1999
[5] Weaver, V., Johnson, M., Ralph, J., Kasichayanula, K., Luszczek, P., Terpstra, D., Moore, S. Measuring Energy and Power with PAPI, The 1st International Workshop on Power-Aware Systems and Architectures, 2012
[6] David, H., Gorbato, E., Hanebutte, U., Khanna, R., Le, C. RAPL: memory power estimation and capping, Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design, ACM, 2010
[7] Intel Corporation Intel Xeon Processor, 2012. http://www.intel.com/xeon
[8] Intel Corporation Intel 64 and IA-32 Architectures Software Developer Manual, 2012
[9] Gerndt, M., Fuerlinger, K., Kereku, E. Periscope: Advanced Techniques for Performance Analysis, International Conference ParCo 2005, 2006
[10] Gerndt, M., Fuerlinger, K. Automatic Performance Analysis with Periscope, Concurrency and Computation: Practice and Experience, Wiley InterScience, John Wiley & Sons, Ltd., 2009

ParaPhrase − Parallel Patterns for Adaptive Heterogeneous Multicore Systems

• The ParaPhrase Consortium
• Mathias Nachtmann 1
• José Gracia 1
• Colin W. Glass 1
1 University of Stuttgart, HLRS

The ParaPhrase project aims to produce a new structured design and implementation process for heterogeneous parallel architectures, where developers exploit a variety of parallel patterns to develop component-based applications that can be mapped to the available hardware resources, and which may then be dynamically re-mapped to meet application needs and hardware availability. The project will exploit new developments in the implementation of parallel patterns that will allow us to express a variety of parallel algorithms as compositions of lightweight software components forming a collection of virtual parallel tasks. Components from multiple applications will be instantiated and dynamically allocated to the available hardware resources through a simple and efficient software virtualization layer. In this way, we will promote adaptivity, not only at an application level, but also at a system level. Finally, we will develop virtualization abstractions across the hardware boundaries, allowing components to be dynamically re-mapped to either CPU or GPU resources on the basis of suitability and availability in a simple and straightforward way.

Key Features
• Sustainable parallel computing through enhanced programmability and lower power consumption.
• Cost reduction in programmability and implementation of parallel systems.
• Better resource utilisation of parallel heterogeneous CPU/GPU architectures.

The work will enable major progress to be made in programming both current and future (parallel) computer systems. Using ParaPhrase technologies, we anticipate that we will be able to achieve significant parallel speedups for realistic applications and that these results will scale with larger systems.

ParaPhrase is based on the FastFlow framework, which is developed by the Universities of Pisa and Torino.

HLRS will initially define test kernels and testing frameworks. Together with other end-users, like SCCH, best practices will be adopted for the particular use cases and applications of the end-user. At a later stage, HLRS will port a full scientific application to the ParaPhrase model in order to evaluate it in an HPC production environment. Finally, HLRS will contribute to the optimization of placement of tasks, etc., on various current and future hardware architectures.

Project Partners
The ParaPhrase Consortium consists of six academic and three industrial partners from five countries.

1. University of St. Andrews, United Kingdom (Coordinator)
2. Robert Gordon University, United Kingdom
3. Universität Stuttgart (HLRS), Germany
4. Universita Degli Studi di Torino, Italy
5. Universita di Pisa, Italy
6. Queen's University Belfast, United Kingdom
7. Mellanox Technologies, Israel
8. Erlang Solutions, United Kingdom
9. Software Competence Center Hagenberg, Austria

H4H – Hybrid for HPC

• Thomas Baumann
• José Gracia
• Shiqing Fan
• Steffen Brinkmann
University of Stuttgart, HLRS

H4H – Hybrid for HPC is an ITEA2-labeled European project co-funded by the German BMBF. It continues in the tradition of the ParMA ITEA2 Project.

Aim of Project
The objective of H4H is to provide compute-intensive application developers with a highly efficient hybrid programming environment for heterogeneous computing clusters composed of a mix of classical processors and hardware accelerators. Future HPC architectures will rely heavily on non-uniformity and heterogeneity to achieve high performance. The process of incorporating different types of accelerators, each being optimal for a given type of code, has already started. However, we are lacking programming models, methods and tools to take full advantage of such platforms. H4H will leverage and consistently advance the state-of-the-art in several key areas: programming models and associated runtimes, performance measurement and correctness tools, intelligent mapping of processes / threads to hardware topology, dynamic automatic tuning and prediction of parallel execution time for a given application on different platforms.

HLRS is contributing to the provisioning of tools for correctness-checking and debugging. The first approach provides a tighter coupling of OpenMPI with the memory checking tools Valgrind and Intel PinTools with the goal of detecting runtime race conditions in the usage of communication buffers, such as read-before-write. The flexible implementation allows different memory checkers to be used as modules on the OpenMPI OPAL layer. As an additional benefit, the Portable Hardware Locality tool (hwloc) now supports Windows-based systems, which allows application developers to check and exploit the layout of the underlying hardware.

A further correctness-checking tool is also based on Valgrind. This Valgrind extension allows memory checking on CUDA-based architectures. Using the wrapping functionality of the Valgrind framework on the CUDA Driver level makes it possible to go beyond the existing capabilities of Valgrind and existing tools like cuda-memcheck.

Last but not least, Temanejo, the graphical debugger for task-parallel programming models developed at HLRS, will be extended and adapted to the latest family of the StarSs programming model. It provides a visual representation of task dependencies, task scheduling, and execution by the runtime environment. A tight integration with conventional debuggers allows the developer to verify parallel tasking and sequential code simultaneously.

Partners
The project has nine German partners:

• Fraunhofer SCAI
• GNS (Gesellschaft für Numerische Simulation mbH)
• GWT-TUD GmbH
• High Performance Computing Center Stuttgart (HLRS)
• INTES (Ingenieurgesellschaft für Technische Software GmbH)
• Jülich Supercomputing Center (JSC)
• MAGMA Gießereitechnologie GmbH
• RECOM Services GmbH
• ZIH, TU Dresden (Zentrum für Informationsdienste und Hochleistungsrechnen)

Another 14 partners come from Sweden, France and Spain.

The SIOX Project

• Thomas Bönisch
University of Stuttgart, HLRS

High Performance Computing has become an essential research tool in many scientific fields. The computing power of HPC systems continues to increase exponentially. At the same time, the capabilities of I/O hardware are not keeping pace. To keep up with the application needs, the number of I/O devices used with today’s HPC systems is continuously increasing. Furthermore, additional technologies like caching are used to further optimize performance. This growing complexity of the I/O system, together with the complexity of the necessary global parallel file systems, makes it very hard to locate and understand the reasons for performance issues in the I/O area.

In this context, a global optimization of the I/O system to the needs of all applications turns out to be very difficult or even impossible. This is in part due to the disparate nature of the requirements and in part because there is currently no way to identify abnormal I/O behaviour and to trace it back to its source. Moreover, the I/O system is typically not used exclusively by one application. Therefore, the interplay of several applications performing I/O concurrently further complicates the scenario. Additionally, it is difficult to predict whether the identified I/O patterns used by a specific parallel application will cause a performance problem or not.

Goals of the Project
SIOX' main goal is to gain an overview of all I/O activity taking place on an HPC system, to relate this I/O activity to the running applications and to use this information to optimize I/O.

Initially, the project's scope spans the development of standardized interfaces and an environment to collect, reduce, relate and store performance data from all relevant software and hardware layers (see Fig. 1). All the created components will be based on open interfaces. This opens, for example, the possibility to use SIOX with different file systems and different storage hardware.

The collected information will be stored in the SIOX database and it will be analyzed and correlated with previously observed access patterns in order to gain an understanding of the characteristics and causal relationships. The data collection can be controlled according to current requirements. This allows for collecting extended information when specific I/O issues are investigated.

For I/O optimization, the efficiency of the observed access patterns is to be estimated through an automatic analysis of the data, followed by the proposal of possible improvements based on the knowledge accumulated in the SIOX knowledge base.

For applications making use of MPI I/O, these improvements will be realized by giving this information back to the Open MPI I/O library. There, the most appropriate I/O algorithm for the request will be chosen accordingly. In addition, the available control parameters will be set to the best fitting values.

In addition, the obtained data collection gives an overview of the I/O requirements of all applications which have been using the HPC system over time. This will lead to a better understanding of the overall usage of the I/O system and allows for a better sizing of future HPC systems.

SIOX started in July 2011 with a duration of three years and is funded by the German Ministry of Research.

Figure 1: SIOX is collecting data from all relevant software and hardware layers and provides Open MPI I/O with information to optimize parallel I/O requests.

Project Partners
• Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden
• Universität Hamburg, Chair in Scientific Computing
• High Performance Computing Center

The project is also supported by the following associated partners:

• German Climate Computing Center (DKRZ GmbH), Hamburg
• IBM Germany GmbH, Ehningen

Project web page: http://www.hpc-io.org


Performance Dynamics of Massively Parallel Codes

• Christian Rössel 1
• Bernd Mohr 1
• Michael Gerndt 2
• Felix Wolf 3
1 Jülich Supercomputing Centre (JSC)
2 Technische Universität München
3 German Research School for Simulation Sciences, RWTH Aachen

The deployment of adaptive algorithms and sophisticated load-balancing schemes makes the execution behavior of simulation programs increasingly dynamic. Effective application optimization therefore requires capturing and analyzing performance data along the dimensions of both space and time. While existing performance analysis tools typically provide detailed information along spatial dimensions like processes and nodes, the aspect of performance dynamics, that is, the evolution of performance metrics along the time dimension, has so far been neglected. Additional insight that can be obtained by taking advantage of performance dynamics is depicted in Fig. 1.

To support the optimization of applications with time-dependent performance characteristics, the LMAC project (Leistungsdynamik massiv-paralleler Codes), funded under the second call of the BMBF program “HPC-Software für skalierbare Parallelrechner”, aims to extend the well-established performance-analysis tools Vampir, Scalasca, and Periscope with new functionality to measure and analyze performance dynamics. Vampir is an interactive trace browser whose particular strength is the detailed visualization of the interactions between the different processes of a parallel program, offering highly flexible views to the user. Scalasca, which has been specifically designed for large-scale systems, integrates efficient performance summaries with the ability to automatically identify wait states that occur in simulation codes, for example as a result of unevenly distributed workloads. Whereas the first two tools analyze the performance data postmortem, that is, after the parallel program has terminated, Periscope characterizes the performance properties of an application and quantifies associated overheads already at runtime.

A major portion of the new capabilities will be integrated in Score-P, a measurement infrastructure shared by all of the above-mentioned tools, which was created in the SILC project (2009-2011) funded under the first call of the same BMBF program. Score-P was designed for scalability and ease of use and supports profiling, event tracing, and online analysis of HPC applications. It enables enhanced interoperability between the end-user tools and will soon replace their proprietary measurement systems (see Fig. 2). The organization of Score-P as a community project is another task in LMAC.

To serve a broad user base, both within the Gauss Alliance and beyond, most of the performance tools are released free of charge to the community under an open-source license. Only Vampir, due to its sophisticated user interface, is distributed commercially. The software products are accompanied by training and support offerings through the Virtual Institute – High Productivity Supercomputing, and will be maintained and adapted to emerging HPC architectures and programming paradigms beyond the original project duration.

The academic project partners in LMAC are the German Research School for Simulation Sciences as the coordinator, the Jülich Supercomputing Centre, RWTH Aachen University, TU Dresden and TU Munich. GNS mbH, a private company that specializes in services related to metal forming simulations, such as mesh generation for complex structures and finite element analyses, joined the project as an industrial partner. In addition, the University of Oregon, an associated partner, complements the LMAC objectives with corresponding extensions to the performance tool TAU.

For more information see: http://www.vi-hps.org/projects/lmac and http://www.score-p.org

Figure 1: Runtime-distribution and point-to-point communication times over iterations and processes: (a) Runtime, (b,c) Point-to-point communication. The different colors in (b) correspond to the minimum, the median, and the maximum over all processes of a single iteration. In (c) the color coding denotes the entire distribution over space-time (darker means more). The rise in the maximum communication time in (b) is the cause for the increased iteration-runtime in (a).

Figure 2: Interaction of the tools Periscope, Scalasca, Vampir, TAU and Scalasca’s profile browser CUBE via the data exchange formats OTF2 for traces and CUBE4 for profiles. [Diagram: the Score-P measurement infrastructure instruments the target application (with PAPI) and feeds Periscope (online bottleneck search) through an online interface, Scalasca (parallel wait-state analysis) and Vampir (visual trace browser) through OTF2 traces, and CUBE (visual profile browser) and TAU PerfExplorer (performance data mining) through CUBE4 reports.]
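The per-iteration view in Fig. 1(b), i.e. the minimum, median and maximum over all processes for each iteration, can be sketched in a few lines. This is only an illustration of the idea of keeping the time dimension while summarizing over space. It is not code from Scalasca, Vampir or Periscope; the function names, the imbalance factor of 1.5 and the sample data are hypothetical.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple


def per_iteration_stats(
    times: Sequence[Sequence[float]],
) -> List[Tuple[float, float, float]]:
    """times[i][p] is the time process p spent in iteration i.

    Returns one (min, median, max) triple per iteration, i.e. a summary
    over the process dimension that keeps the time dimension.
    """
    return [(min(row), median(row), max(row)) for row in times]


def first_imbalanced_iteration(
    times: Sequence[Sequence[float]], factor: float = 1.5
) -> Optional[int]:
    """Index of the first iteration whose max exceeds factor * median, else None."""
    for i, (_, med, high) in enumerate(per_iteration_stats(times)):
        if high > factor * med:
            return i
    return None


# Hypothetical point-to-point communication times in seconds (4 processes).
comm = [
    [1.0, 1.1, 1.0, 1.2],
    [1.0, 1.2, 1.1, 1.3],
    [1.1, 1.2, 1.0, 2.4],  # one process starts to lag
    [1.0, 1.1, 1.2, 3.0],
]

if __name__ == "__main__":
    for i, (low, med, high) in enumerate(per_iteration_stats(comm)):
        print(f"iteration {i}: min {low:.2f}  median {med:.2f}  max {high:.2f}")
    print("imbalance first visible in iteration", first_imbalanced_iteration(comm))
```

A profile aggregated over the whole run would only show a high total for the lagging process; the per-iteration summary additionally shows when the problem begins.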


PRACE – Third Implementation Phase Project started

To support the accelerated implementation of the Research Infrastructure established by the Partnership for Advanced Computing in Europe (PRACE), the European Commission issued a third call for proposals in 2011. PRACE partners from 25 countries submitted a successful proposal and started the Third Implementation Phase project (PRACE-3IP) on July 1, 2012.

Key objectives of PRACE-3IP are:

• Support for the PRACE AISBL through the development and optimization of processes for collaborations between PRACE and other organizations; management procedures and related tools; impact assessment; and management of the DECI (Distributed European Computing Initiative, providing access to PRACE Tier-1 resources) call logistics.

• Dissemination and outreach, continuing the production of promotional material and the presentation of project results through the PRACE web site, electronic newsletters, printed digests and at major conferences and exhibitions (e.g. ISC, SC, ICT). Establishing a Summer of HPC programme to encourage students to seek careers in HPC through internships at HPC centres.

• Intensify the successful training of PRACE by transforming the PRACE Advanced Training Centres (PATC) into a permanent service, organizing face-to-face training classes, and improving the PRACE training portal.

• Provide services for industrial users and small and medium enterprises (SME). Develop models and best practices for industrial users including an integrated access programme and issue pilot projects for SMEs.

• Continue the operation and coordination of the distributed infrastructure through integration of new and upgraded Tier-0 and Tier-1 systems. Support new partners in the deployment of common services and enhance the existing ones.

• Scale and optimize application codes, especially for the Preparatory Access and DECI projects. Investigate and exploit new HPC tools and techniques including those developed in other petascale and exascale projects. Produce best practice guides to support European research communities.

• A completely new undertaking is a pilot on Pre-Commercial Procurement (PCP), which is an approach for procuring R&D (Product Driven Research and Pre-Commercial Development) services. The scope of the PRACE PCP pilot will be a Whole System Design for Energy Efficient HPC. Some PRACE partners, including the AISBL, form the PCP Group of Procurers that manages and co-funds this pilot, complementing the EC funding of up to 4.5 Mio Euro with an equal amount. The intention is to demonstrate that PCP can advance innovation in Europe through public procurers. The pilot will also help to understand if PCP is suitable to advance the European HPC industry.

PRACE-3IP is again managed by Forschungszentrum Jülich. It has a budget of nearly 27 Mio Euro including an EC contribution of 19 Mio Euro under grant agreement RI-312763. The duration will be 24 months except for the PCP work, which is planned for 48 months due to the complexity of the related multi-stage process.

Over 300 researchers collaborate in PRACE from 25 countries and 45 organizations that receive funding from the European Commission; this includes 26 Beneficiaries and 19 Third Parties from associated universities or centres. 180 collaborators met in Paris from September 5-7, 2012, for the PRACE-3IP kick-off and a joint PRACE-2IP all-hands meeting. The meeting was organized by GENCI and held at the Institut de Physique du Globe de Paris.

Synopsis of the PRACE Projects
The European Commission supported the creation and implementation of PRACE through four projects with a total EC funding of 67 Mio Euro. The partners co-funded the projects with over 110 Mio Euro in addition to the commitment of 400 Mio Euro by the hosting members to procure and operate Tier-0 systems and the in-kind contribution of Tier-1 resources on systems at presently twenty partner sites. The following table gives an overview of the PRACE projects.

Project Id | Number of Partners | Budget (Mio. €) | EC Funds (Mio. €) | Duration | Status | Grant Number
PRACE-PP | 16 | 20.1 | 10.0 | January 1, 2008 - June 30, 2010 | completed | RI-211528
PRACE-1IP | 21 | 28.7 | 20.0 | July 1, 2010 - June 30, 2012 | completed, except for the Evaluation of Prototypes in WP9 till March 31, 2013 | RI-261557
PRACE-2IP | 22 | 35.4 | 18.0 | September 1, 2011 - August 31, 2013 | in progress | RI-283493
PRACE-3IP | 26 | 26.8 | 19.0 | July 1, 2012 - June 30, 2014 | started; PCP to end June 30, 2016 | RI-312763
(Total) | | 99.6 | 77.0 | | |

Figure 1: PRACE collaborators at the PRACE-3IP Kickoff. © PRACE / L. Godart

• Dietmar Erwin

Forschungszentrum Jülich

Systems

JUQUEEN: The most recent Blue Gene Architecture at Jülich

Blue Gene/Q is the third generation in a line of highly scalable architectures. It is similar to the previous designs in that very power-efficient processors are integrated very densely. But, due to an immense increase of parallelism at the single node level, a significant boost of performance is achieved. The network, still implemented on the chip, remains a unique feature of this architecture. This is a key ingredient for being able to scale performance up to a huge number of cores.

The Blue Gene/Q processor [1] comprises 16 cores for executing the user's application. There is an additional core for operating services. By decoupling the execution of system services, effects of random asynchronous delays of user processes are suppressed. Such noise can significantly deteriorate the scalability of an architecture. Yet another, redundant, processor core is on the chip, but not available in practice; it has primarily been added to increase the manufacturing yield. With each of the cores being 4-way multi-threaded, up to 64 user threads (or processes) can run at the same time.

The processor core architecture is relatively simple and implements a standard 64-bit Power instruction set architecture. A particular feature of this core architecture is the support of an auxiliary execution unit. For Blue Gene/Q a Quad Floating-Point Processing Unit (QPU) has been developed. The QPU processes vectors of four 64-bit elements. In each clock cycle it can perform four fused multiply-add operations in parallel. As in previous generations of Blue Gene, the vector arithmetic instructions may involve a permutation of the vector elements, which is used to implement complex arithmetic (without the need for separate shuffle operations as in other vector instruction set architectures). With each of the 16 QPUs being able to complete four multiply-add operations per clock cycle at a clock speed of 1.6 GHz, the peak performance is 204.8 GFlop/s.

This huge amount of performance can only be exploited if it is balanced by a powerful memory subsystem. As can be seen in Fig. 1, a large fraction of the die space is occupied by the L2 cache, which has a capacity of 32 MBytes. Data is moved between external memory and this last-level cache by two memory controllers (MC 0 and MC 1). The L2 cache is shared by all processor cores. A central crossbar switch connects it to all cores plus the network subsystem. The cores can read from the L2 cache at an aggregate maximum bandwidth of 409.6 GByte/s.

We would like to highlight several features of the memory hierarchy. To hide the latencies which occur after an L1 cache miss, because data needs to be fetched from L2 cache or external memory, the hardware supports pre-fetching. As in other processor architectures, such pre-fetching can be triggered by the automatic detection of a contiguous sequence of memory addresses being accessed. Beyond this stream pre-fetching scheme there is a new scheme called list pre-fetching. This feature may be exploited by applications where a sequence of (possibly non-contiguous) addresses is accessed many times. Controlled by the user, the list pre-fetch engines can record access patterns. When the execution of the relevant code section is repeated, the list of recorded addresses is used for pre-fetching the data.

One feature of the Blue Gene architecture line is the dense integration of a large number of nodes within a single rack (see Fig. 3). Unlike in previous generations of Blue Gene, where air was used to remove the heat generated by the compute nodes, in the new generation of machines the nodes are directly connected to a liquid cooling system. 90% of the heat originating from the compute nodes in the system is directly taken away by the water; the remaining fraction, coming mainly from the power supplies, is still moved out of the rack by air.

Figure 1: Physical layout of the Blue Gene/Q chip.

Figure 2: The Quad Floating-Point Processing Unit (QPU) is an extension of the A2 processor core to accelerate floating-point computations [1]. Operands stored in the register file (RF) are processed in 4 fused multiply-add dataflow (MAD) pipelines.

Figure 3: A Blue Gene/Q processor (left) is mounted to a heat spreader. A node-card can host 32 nodes. With 16 node-cards per midplane and two midplanes per rack the total number of nodes per rack (right) is 1,024.
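The per-node figures quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original article; the constant and function names are hypothetical, and it assumes the common convention that one fused multiply-add counts as two floating-point operations, which the text does not state explicitly.

```python
# Figures quoted in the article text
USER_CORES_PER_NODE = 16       # cores available to the user's application
THREADS_PER_CORE = 4           # each core is 4-way multi-threaded
FMA_PER_CYCLE_PER_QPU = 4      # fused multiply-adds per clock cycle and QPU
CLOCK_GHZ = 1.6                # clock speed in GHz
NODES_PER_NODE_CARD = 32
NODE_CARDS_PER_MIDPLANE = 16
MIDPLANES_PER_RACK = 2

# Assumption (common convention, not spelled out in the article):
# one fused multiply-add counts as two floating-point operations.
FLOPS_PER_FMA = 2


def node_peak_gflops() -> float:
    """Peak floating-point performance of one node in GFlop/s."""
    flops_per_cycle = USER_CORES_PER_NODE * FMA_PER_CYCLE_PER_QPU * FLOPS_PER_FMA
    return flops_per_cycle * CLOCK_GHZ


def user_threads_per_node() -> int:
    """Maximum number of user threads (or processes) per node."""
    return USER_CORES_PER_NODE * THREADS_PER_CORE


def nodes_per_rack() -> int:
    """Number of compute nodes in one rack."""
    return NODES_PER_NODE_CARD * NODE_CARDS_PER_MIDPLANE * MIDPLANES_PER_RACK


if __name__ == "__main__":
    print(f"peak per node:    {node_peak_gflops():.1f} GFlop/s")
    print(f"threads per node: {user_threads_per_node()}")
    print(f"nodes per rack:   {nodes_per_rack()}")
```

Under these assumptions the sketch yields 204.8 GFlop/s per node, 64 user threads per node and 1,024 nodes per rack, matching the values given in the text and in Fig. 3.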


The sustained performance obtained with the High-Performance Linpack (HPL) benchmark is similar to previous generations of Blue Gene machines, i.e. slightly more than 80% of the peak performance. This high level of device utilization in combination with a high power efficiency allowed for a significant leap on the Green500 list [4]. In this list systems are ordered according to the ratio of the speed achieved with the HPL benchmark to the power consumed by the system during benchmark execution. The previous number one on this list, QPACE [5], a special purpose computer operated at Forschungszentrum Jülich, achieved 0.77 GFlop/s per Watt. With Blue Gene/Q the score went up by almost a factor of three to 2.1 GFlop/s per Watt.

The HPL benchmark performance of more than 80% demonstrates that it is, at least for some applications, feasible to obtain a high level of sustained performance, similar to previous generations of Blue Gene. However, there are challenges for the application developer to efficiently exploit the performance of this architecture. The most notable challenge is the increased parallelism at various levels. Compared to Blue Gene/P the vector units doubled their width, and the number of cores and threads per node increased by a factor of four and 16, respectively. Within the Exascale Innovation Centre (EIC) the utilization of performance features was already investigated at a stage when only prototype hardware was available.

Fig. 4 shows the inverse execution time (which is proportional to the performance) of a fluid dynamics application based on the Lattice Boltzmann method. In the case of ideal scaling the performance increases in proportion to the number of threads until each core executes one thread. A significant performance increase is expected until each core executes two threads, since one hardware thread can only schedule an instruction every second clock cycle. For this particular application a noticeable further increase of performance could be achieved by placing up to four threads on each core.

The first racks of Blue Gene/Q were delivered to Forschungszentrum Jülich in April 2012. Only a few weeks later JUQUEEN, consisting of eight racks, could be made accessible to users for early production runs (see Fig. 5), delivering a peak performance of 1.6 PFlop/s, which is 60% more than JUGENE (72 racks Blue Gene/P, 1 PFlop/s peak performance).

Even though JUGENE was an extremely reliable and stable system over the last years, achieving a user job utilization of over 90%, the system had to be shut down at the end of July and dismounted. Now that the space, power, and cooling capacity occupied by JUGENE have been freed, the extension of JUQUEEN to its final configuration with 28 racks is taking place. This translates into a more than five-fold increase of compute performance on Blue Gene available for scientific computing in Jülich, Germany and Europe.

A large fraction of the Blue Gene/Q installation at Forschungszentrum Jülich, JUQUEEN, will be available for users of the Gauss Centre for Supercomputing and PRACE.

Figure 4: Inverse execution time of a fluid dynamics application running on a single node as a function of the number of threads [6]. The result shows significant gain from Simultaneous Multi-Threading (SMT). (Plot axes: number of threads N from 0 to 64; inverse execution time 1/t in 1/sec; curves without and with SMT.)

Figure 5: 8-rack JUQUEEN at Forschungszentrum Jülich.

• Dirk Pleiter
• Michael Stephan

Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich

References
[1] Haring, R.A. et al. The IBM Blue Gene/Q Compute Chip, IEEE Micro, Vol. 32, No. 2, March/April 2012
[2] Chen, D. et al. The IBM Blue Gene/Q Interconnection Network and Message Unit, SC '11 Proceedings, 2011
[3] El-Harake, H. et al. Juniors: Prototypes for novel exascale I/O concepts, ISC'12 Poster, 2012
[4] http://www.green500.org
[5] Baier, H. et al. QPACE: A QCD parallel computer based on Cell processors, PoS LAT2009 (2009) 001
[6] Pozzati, F. Porting and optimization of a Lattice Boltzmann D2Q37 code to Blue Gene/Q, Technical Report FZJ-JSC-IB-2011-06, Forschungszentrum Jülich, 2011
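As a cross-check of the system-level numbers in the JUQUEEN article above, the following minimal Python sketch scales the per-node peak to a given number of racks and computes the Green500 improvement factor. It is an illustration added here, not taken from the article; the function and constant names are hypothetical, and the input values are the ones quoted in the text.

```python
# Values quoted in the article text
NODE_PEAK_GFLOPS = 204.8        # peak performance per Blue Gene/Q node
NODES_PER_RACK = 1024           # nodes per rack
JUGENE_PEAK_PFLOPS = 1.0        # peak of the Blue Gene/P system JUGENE
QPACE_GFLOPS_PER_WATT = 0.77    # Green500 score of QPACE
BGQ_GFLOPS_PER_WATT = 2.1       # Green500 score of Blue Gene/Q


def system_peak_pflops(racks: int) -> float:
    """Peak performance of a Blue Gene/Q system with `racks` racks, in PFlop/s."""
    return racks * NODES_PER_RACK * NODE_PEAK_GFLOPS / 1.0e6


def green500_improvement() -> float:
    """Ratio of the Blue Gene/Q score to the QPACE score."""
    return BGQ_GFLOPS_PER_WATT / QPACE_GFLOPS_PER_WATT


if __name__ == "__main__":
    for racks in (8, 28):
        peak = system_peak_pflops(racks)
        print(f"{racks:2d} racks: {peak:.2f} PFlop/s "
              f"({peak / JUGENE_PEAK_PFLOPS:.1f} x JUGENE)")
    print(f"Green500 improvement: factor {green500_improvement():.2f}")
```

With these inputs, eight racks give roughly 1.68 PFlop/s (quoted as 1.6 PFlop/s in the text), 28 racks give roughly 5.87 PFlop/s, consistent with the "more than five-fold" increase over JUGENE, and the Green500 score improves by a factor of about 2.7.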


Virtual Reality and Visualization at the Leibniz Supercomputing Centre

Due to the growing demand for Virtual Reality (VR) and Visualization services in recent years, the Leibniz Supercomputing Centre (LRZ) has decided to establish a Centre for Virtual Reality and Visualisation (V2C).

Whereas VR as a technology tries to immerse the user by providing highly interactive, real-time, stereoscopic images based on the user's viewpoint, visualization techniques process simulation data, measured data, or abstract data. These data sets can ideally be displayed using VR technology.

Hardware
The V2C hosts a large scale, high resolution powerwall installation, as well as a 5-sided projection installation, both equipped with optical tracking. Tracking is needed to satisfy the basic demands of any VR application and to allow easy and intuitive data exploration by adapting the projection of the scene based on the head position and orientation.

The installations are set up in a room of the size of 12x12x11 m having a raised floor at the height of 5.20 meters. The lower part of the V2C contains the computing equipment, electricity and cooling infrastructure, whereas the upper area is used as visitor and user space.

Powerwall Installation
The powerwall is typically observed from a cinema-like area consisting of 21 seats at 3 elevation levels. With its seating setup it can be used as a simple presentation medium, while its additional tracking capabilities allow its application for interactive visualizations.

5-Sided Projection Installation
Based on the concepts of Carolina Cruz-Neira's CAVE (CAVE Automated Virtual Environment) [1], a 5-sided projection installation, consisting of three back-projected sides, a back-projected floor and ceiling, was built. Mirrors are used to reduce the spatial demands.

Whereas the powerwall, although it has the capabilities of interaction with its optical tracking system, is used mainly as a presentation medium, the 5-sided projection medium provides full support for interactive applications, especially using finger tracking. With the setup of five display panels it is easily possible to fully cover the field of view of the user and immerse the user in the environment.

Technical Data
An overview of the key specifications of the two systems is given in the table.

Device | Powerwall | 5-Sided Projection Installation
Display | |
Dimensions | 6.00 m x 3.15 m | 5 x 2.70 m x 2.70 m
Stereo-separation | Circular Polarisation | Temporal
Projectors | 2 x Sony 4K | 10 x Christie Mirage WU3 DLP
Resolution | 16.5 MP in stereo | 41.4 MP in stereo
Graphics Boards | 2 x nVidia QuadroPlex | 2 x nVidia Quadro 6000
Compute Hardware | |
Machine | SGI Altix UV10 | SGI Altix XE500 (12 nodes)
Memory | 256 GB | 48 GB per render node
Tracking | ARTtrack2 (6 cameras) | ART TrackPack4 (8 cameras)

Projects
As usual with VR and visualization, the application domains are extremely diverse. Where many centres only focus on the visualization of simulation results, the LRZ supports users of classical virtual environment applications as well. Typical projects come from the HPC users of life sciences but also from arts and humanities.

Some example projects currently worked on in the V2C are terrain visualization, logistics simulation, architecture informatics, information visualization, archaeology, new media art, geo engineering and zoology.

When supporting users and cooperating in their domain specific projects, often additional competence is needed and corresponding research is undertaken by the team at LRZ to improve existing visualization techniques, navigation and interaction with the scenes.

The installations are set up in a way that they can communicate via a 10GE network directly with the recently installed LRZ supercomputer SuperMUC (Europe's most powerful supercomputer), but also with other systems world-wide. On the one hand, data display via remote visualization and computation is possible; on the other hand, traditional VR and visualization tools can be used and have extremely fast access to the computation results of the simulations stored on SuperMUC or elsewhere.

The V2C supports over thirty software packages from the domain of VR and Visualization in order to support the realization of the different project types.

A detailed overview of the installed VR and visualization software, the hardware setup and the running projects can be found at http://v2c.lrz.de

Figure 1: The V2C with the powerwall at the left and the 5-sided projection installation at the right.

Figure 2: An example from the area of archaeology - visualization of the tomb of Karaburun (Visualization by Andreas Hartmann with photos by Roy Hessing).

• Christoph Anthes
• Dieter Kranzlmüller

Leibniz Supercomputing Centre (LRZ)

References
[1] Cruz-Neira, C., Sandin, D.J., Defanti, T.A., Kenyon, R.V., and Hart, J.C. The cave: Audio visual experience automatic virtual environment, Communications of the ACM, 1992, 35, 64-72

Centres

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (Leibniz-Rechenzentrum, LRZ) provides comprehensive services to scientific and academic communities by:

• giving general IT services to more than 100,000 university customers in Munich and for the Bavarian Academy of Sciences

• running and managing the powerful communication infrastructure of the Munich Scientific Network (MWN)

• acting as a competence centre for data communication networks

• being a centre for large-scale archiving and backup, and by

• providing High Performance Computing resources, training and support on the local, regional and national level.

Research in HPC is carried out in collaboration with the distributed, statewide Competence Network for Technical and Scientific High Performance Computing in Bavaria (KONWIHR).

Contact:
Leibniz Supercomputing Centre
Prof. Dr. Arndt Bode
Boltzmannstr. 1
85478 Garching near Munich
Germany
Phone +49 - 89 - 358 - 31- 80 00
[email protected]
www.lrz.de

Compute servers currently operated by LRZ are given in the following table

System | Size | Peak Performance (TFlop/s) | Purpose | User Community
IBM System x iDataPlex "SuperMUC" (expected for June 2012) | 155,656 Cores, 330 TByte | 3,263 | Capability Computing | German Universities and Research Institutes, PRACE Projects
Linux-Cluster, Intel Xeon EM64T/AMD Opteron, 2-, 4-, 8-, 16-, 32-way | 4,438 Cores, 9.9 TByte | 36 | Capacity Computing | Bavarian and Munich Universities, D-Grid, LCG Grid
SGI ICE, Intel Nehalem 8-way | 512 Cores, 1.6 TByte | 5 | Capacity Computing | Bavarian Universities, PRACE
SGI UV | 2,080 Cores, 6.1 TByte | 20 | Capability Computing | Bavarian Universities, PRACE
Megware IB-Cluster | 2,848 Cores, 2.8 TByte | 22 | Capability Computing, PRACE Prototype | Bavarian Universities, PRACE

A detailed description can be found on LRZ’s web pages: www.lrz.de/services/compute

Draft of LRZ's next supercomputer "SuperMUC" by IBM.


First German National Center
Based on a long tradition in supercomputing at University of Stuttgart, HLRS (Höchstleistungsrechenzentrum Stuttgart) was founded in 1995 as the first German federal Centre for High Performance Computing. HLRS serves researchers at universities and research laboratories in Europe and Germany and their external and industrial partners with high-end computing power for engineering and scientific applications.

Service for Industry
Service provisioning for industry is done together with T-Systems, T-Systems sfr, and Porsche in the public-private joint venture hww (Höchstleistungsrechner für Wissenschaft und Wirtschaft). Through this co-operation industry always has access to the most recent HPC technology.

Bundling Competencies
In order to bundle service resources in the state of Baden-Württemberg HLRS has teamed up with the Steinbuch Center for Computing of the Karlsruhe Institute of Technology. This collaboration has been implemented in the non-profit organization SICOS GmbH.

World Class Research
As one of the largest research centers for HPC, HLRS takes a leading role in research. Participation in the German national initiative of excellence makes HLRS an outstanding place in the field.

Contact:
Höchstleistungsrechenzentrum Stuttgart (HLRS)
Universität Stuttgart
Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Michael M. Resch
Nobelstraße 19
70569 Stuttgart
Germany
Phone +49 - 711- 685 - 8 72 69
[email protected] / www.hlrs.de

Compute servers currently operated by HLRS

System | Size | Peak Performance (TFlop/s) | Purpose | User Community
Cray XE6 "Hermit" (Q4 2011) | 3,552 dual socket nodes with 113,664 AMD Interlagos cores | 1,045 | Capability Computing | European and German Research Organizations and Industry
NEC Intel Cluster | 4,700 Intel Nehalem cores + 4,600 Sandybridge cores, 10 TB memory and 64 NVIDIA Tesla S1070 | 170 | Capability Computing | German Universities, Research Institutes and Industry, D-Grid
IBM BW-Grid | 3,984 Intel Harpertown cores, 8 TByte memory | 45.9 | Grid Computing | D-Grid Community

A detailed description can be found on HLRS's web pages: www.hlrs.de/systems

View of the HLRS Cray XE6 "Hermit". View of the HLRS BW-Grid IBM Cluster (Photo: HLRS).


The Jülich Supercomputing Centre (JSC) at Forschungszentrum Jülich enables scientists and engineers to solve grand challenge problems of high complexity in science and engineering in collaborative infrastructures by means of supercomputing and Grid technologies.

Provision of supercomputer resources of the highest performance class for projects in science, research and industry in the fields of modeling and computer simulation including their methods. The selection of the projects is performed by an international peer-review procedure implemented by the Steering Committee of the Gauss Centre for Supercomputing (GCS) and by the John von Neumann Institute for Computing (NIC), the latter a joint foundation of Forschungszentrum Jülich, Deutsches Elektronen-Synchrotron DESY, and GSI Helmholtzzentrum für Schwerionenforschung.

Supercomputer-oriented research and development in selected fields of physics and other natural sciences by research groups of competence in supercomputing applications.

Operation of strategic support infrastructures including community-oriented simulation laboratories and cross-sectional groups on application optimization, on parallel performance tools and on mathematical methods and algorithms, enabling the effective usage of the supercomputer resources.

Higher education of bachelor, master and doctoral students.

Contact:
Jülich Supercomputing Centre (JSC)
Institute for Advanced Simulation (IAS)
Forschungszentrum Jülich
Prof. Dr. Dr. Thomas Lippert
52425 Jülich
Germany
Phone +49 - 24 61- 61- 64 02
[email protected]
www.fz-juelich.de/jsc

Compute servers currently operated by JSC

System | Size | Peak Performance (TFlop/s) | Purpose | User Community
IBM Blue Gene/Q "JUQUEEN" | 28 racks, 28,672 nodes, 458,752 processors IBM PowerPC® A2, 448 TByte main memory | 5,872 | Capability Computing | European Universities and Research Institutes, PRACE
Intel Linux Cluster "JUROPA" | 2,208 SMT nodes with 2 Intel Nehalem-EP quad-core 2.93 GHz processors each, 17,664 cores, 52 TByte memory | 207 | Capacity and Capability Computing | European Universities, Research Institutes and Industry, PRACE
Intel Linux Cluster "HPC-FF" | 1,080 SMT nodes with 2 Intel Nehalem-EP quad-core 2.93 GHz processors each, 8,640 cores, 25 TByte memory | 101 | Capacity and Capability Computing | EU Fusion Community
IBM Cell System "QPACE" | 1,024 PowerXCell 8i processors, 4 TByte memory | 100 | Capability Computing | QCD Applications SFB TR55, PRACE
Intel GPU Cluster "JUDGE" | 206 nodes with 2 Intel Westmere 6-core 2.66 GHz processors each, 412 graphic processors (NVIDIA Fermi), 20.0 TByte memory | 240 | Capacity and Capability Computing | selected HGF Projects

Draft of JSC's supercomputer JUQUEEN, an IBM Blue Gene/Q system.

Activities

New Division "Civil Security and Traffic" at JSC

The last decades have seen continuously growing urban populations and cities. Larger and more complex buildings as well as an increasing demand on public transportation systems accompany this growth. Additionally, the frequency and extent of large public events have increased. The study of dense crowds and related safety topics gains even more importance considering the fact that since 2006 half the world population lives in cities. Cities are confronted with dense crowds, dense traffic, and mixtures of traffic types. The potential hazards related to dense crowds become obvious and real, as during the Loveparade 2010 in Duisburg (Germany). It is important to assist urban planners, architects, and safety engineers confronted with the cities' growth and the conception of future cities. Therefore, there is a high demand for new tools which consider the complexity of all involved processes.

In recent years, a working group has formed at the Jülich Supercomputing Centre (JSC) that is developing simulation models in the areas of fire safety and pedestrian dynamics. To structurally support further developments, a new division "Civil Security and Traffic" was established at JSC in May 2012. The division is composed of scientists from mathematics, physics, computer sciences, and engineering. This interdisciplinary team develops and applies models and simulations of complex systems in the context of civil security, fire safety, and traffic planning. In combination with high performance computing it is possible to tackle challenges in the simulation of large systems using high fidelity models. The division intensively cooperates with universities around the world, research institutions as well as companies.

At the same time the demand for research on civil security is brought into the focus of the German Government, the Federal Ministry of Education and Research (BMBF) and the Helmholtz Association (HGF). In cooperation with the Karlsruhe Institute of Technology (KIT) and the German Aerospace Center (DLR), the new division will continue its work in the HGF portfolio project "Security Research". In addition, many programmes, on the European level as well, have been launched with the focus on citizens and their protection. The European Union's current Seventh Framework Programme has many sections directly dedicated to civil security. The next EU programme "Horizon 2020", starting in 2013, will also address the security of citizens. At the national level, the field "Protection and rescue solutions" is one of the funding priorities in the Federal Government's security research programme.

A summary of the three main research areas of the newly founded division is presented in the following.

Pedestrian Dynamics
The investigation of pedestrian dynamics is a young and lively field of research. Along with interesting self-organization phenomena there is a multitude of applications, like the evaluation of escape routes in the context of crowd management or the optimization of pedestrian facilities for urban development. Our aim is the quantitative description of pedestrian dynamics by using microscopic models of self-driven particle systems. For model validation the group cooperates with several universities on a systematic enhancement of the empirical data basis. The group has also taken a pioneering role in the conception and execution of large-scale laboratory experiments involving pedestrians. This covers the automatic extraction of information (for instance trajectories) from video footage of the experiments using pattern recognition techniques [1] and the analysis and evaluation of the information using high standard methods based on Voronoi decompositions [2]. The field is completed by the development of highly accurate models for pedestrian dynamics [3,4] which are used in simulations to reproduce the observed phenomena. Furthermore, to increase the transparency of the research activities and promote a sustainable development, the models and analysis tools developed so far are available to the scientific community in the form of open source projects combined with databases of experimental results.

Fire Safety
Modern and complex buildings, like airports and multi-level underground stations, represent novel architectural ideas and pose challenges for fire protection concepts. In general, official fire safety regulations are rigid and hardly applicable to such constructions. Computational fluid dynamics simulations of fire and smoke provide a flexible – and meanwhile also a reliable – tool for the outline of fire protection systems and strategies. Although fires in simple and small compartments and buildings can be reasonably computed, many models involved are based on crude simplifications. As the capability of the available fire safety software to use High Performance Computing facilities is very limited, we investigate ways to improve the parallelization of these codes. This effort will allow us to simulate large structures at an adequate numerical resolution.

Current engineering applications in our group are subway vehicles and underground stations. Here we run simulations to study the heat release rates of burning subway cars and the smoke spread in stations. These insights will be used to evaluate new ventilation setups and adaptive operation modes, including modern devices such as jet fans and high pressure mist sprinklers. To ease the evacuation of passengers and the access of rescue teams, the creation of zones with a high visibility is of particular interest.

Traffic
The third area of interest is the modelling of intermodal transport, such as large-area evacuations, and the modelling of mixed traffic for the planning of inner city roads. Bicycles and electric bikes increasingly enrich the traffic in cities. However, the consequences of this development for the safety and comfort on the roads have so far not been adequately investigated. In cooperation with the University of Wuppertal, initial laboratory experiments with bicycles have been conducted. Based on the results of such experiments, we compare different means of transport and provide tools to support the planning of routes for mixed traffic. It is planned to extend the aforementioned activities from local evacuation of buildings or venues to evacuation of cities or regions. In the latter, the consideration of intermodal traffic is indispensable, for instance by modelling trips that start on foot and then use a combination of various means of transport such as cars, trains or bus shuttles.

Projects
The research activities conducted so far have been supported by two BMBF and DFG-funded projects, which were successfully completed in 2011. The BMBF-funded project Hermes [6] covered the design and implementation of an evacuation assistant for large public events. The test venue was the ESPRIT arena in Düsseldorf (Germany). Succeeding Hermes, a new BMBF-funded research project BaSiGo [7] was launched in March 2012. Besides other topics, this project studies the emergence of critical states in large crowds. For that purpose large experiments are planned.

Figure 1: Left: Merging of three pedestrian streams at the gate of the tribune area of a stadium. The right picture shows the automatically extracted trajectories using the software PeTrack [5] developed at JSC.

Figure 2: Simulation of fire propagation in a train compartment. Shown is a three dimensional representation of the flame distribution.

Figure 3: Left: mixed and dense traffic involving pedestrians, vehicles and cyclists in the city of Kolkata (India). Right: experiment with bicyclists performed in Wuppertal (Germany). The extraction and analysis of trajectories will give new insights into the development of bicycle traffic models.

• Armel Ulrich Kemloh Wagoum
• Lukas Arnold
• Stefan Holl
• Armin Seyfried

Jülich Supercomputing Centre (JSC)

References
[1] Boltes, M., and Seyfried, A. Collecting Pedestrian Trajectories, Neurocomputing, Special Issue on Behaviours in Video, ISSN 0925-2312, 2012
[2] Zhang, J. Experimental study on characteristics of pedestrian traffic with Voronoi-based segmentation method, PhD Thesis, University of Wuppertal, 2012
[3] Kemloh Wagoum, A.U. Route choice modeling and runtime optimization for simulation of building evacuation, PhD Thesis, University of Wuppertal, 2012
[4] Chraibi, M. Validated force-based modeling of pedestrian dynamics, PhD Thesis, Universität zu Köln, 2012
[5] http://www.fz-juelich.de/jsc/petrack
[6] Holl, S., Seyfried, A. Hermes – an evacuation assistant for mass events, inSiDE 2009, 7(1), 60–61
[7] Fiedrich, F. Wie im "alten Rom“: Sicherheit und Planbarkeit von Großveranstaltungen, Forschungsmagazin Research Bulletin der Bergischen Universität Wuppertal, Nr. 7/Sommersemester 2012, http://www.buw-output.uni-wuppertal.de/p_pics/Output7_web.pdf

Activities

Greengineering the Future: Cray User Group Meeting 2012

The High Performance Computing Center Stuttgart (HLRS) at the University of Stuttgart, Germany hosted the annual Cray User Group Conference, CUG 2012.

CUG is an independent organization of users and owners of Cray systems that have come together to support and discuss the usage of Cray systems worldwide. Its annual meeting brings together centers and users on the one hand and Cray staff on the other. The meeting serves to exchange experience and discuss issues related to High Performance Computing on Cray systems.

The conference was held in the center of Stuttgart at the Maritim Hotel from April 29 to May 3, 2012. About 150 participants from Europe, Asia and the US enjoyed an interesting technical program.

Keynotes were presented by Wolfgang Nagel from the Supercomputing Center Dresden, who spoke about performance analysis and tools, Richard Kenway from the University of Edinburgh, who presented the current status of the European PRACE initiative, and Michael Resch from HLRS, who discussed the future of High Performance Computing in a broader sense.

Peter Ungaro, Chief Executive Officer of Cray, gave a tour d'horizon of Cray and took public questions from customers for one hour. Ungaro also laid out the future strategy of Cray and emphasized the importance of hardware and software integration in a world that converges technically.

The conference theme "Greengineering the Future" was reflected in many discussions about the usage of Cray systems to improve technology or to simulate climate in order to better understand the environment in which we live. The motto reflects the fact that Stuttgart is the place where most engineering innovations worldwide have been created in the last 100 years. Meeting the challenge of global climate change is one of the most important issues for the city of cars, which is home to companies like Daimler, Porsche or Bosch. Turning the world into a greener planet, while preserving the existing standard of living, is one of the key challenges of the future, in which simulation on supercomputers will play a major role.

It is obvious that High Performance Computing will be necessary to move technology forward in an age of growing environmental problems. Simulations on such systems are the basis for our understanding of environmental processes. They help us to better grasp the complexity of our climate. On the other hand, simulation has become the major driver of technology improvements that help to reduce fuel consumption, energy consumption or the pollution of air and water.

Over the last years simulation has also become a driving factor for sustainable energy supply. Wind, water and solar energy all have to be substantially improved to contribute to the energy supply of an industrial nation. Questions of improved power production, reduced production and maintenance costs as well as the implementation of new technologies all rely on simulation. CUG 2012 was a place to get in touch with the people who make all this possible.

CUG 2012 was an excellent forum for Cray users and developers to exchange their expertise, problems, and solutions. Participants had the opportunity to share ideas and experiences, and meet key members of the Cray team.

The open atmosphere fostered intensive discussion among users, providers and Cray staff. At the same time a large group of students was able to get in touch with professionals in the field and see what their future work place might look like.

Social events allowed communicating in a relaxed atmosphere. A visit to the Mercedes Museum gave participants a view of 125 years of automobile development, showing how simulation can become part of an outstanding history of development in individual mobility. A night out at the Stuttgart spring festival and a candlelight dinner at the splendid Solitude castle close to Stuttgart made the conference a lasting memory for many.

• Neriman Emre
• Michael Resch
University of Stuttgart, HLRS

47th International IDC HPC User Forum

The 47th IDC HPC User Forum was held July 9-10, 2012 at the High Performance Computing Center Stuttgart (HLRS). After a series of autumn events held at HLRS, the meeting was held in July in conjunction with a similar meeting in London. Even though ISC and many other meetings were close to the date, about 40 international participants joined and discussed the current problems of High Performance Computing over two days. For the sixth time in a row HLRS hosted an IDC user forum in Stuttgart. The collaboration of HLRS and IDC reflects the common interest in supercomputing applications with a link to the industrial usage of High Performance Computing systems.

The program was focused on technologies for advancing supercomputing and on supercomputer technology evolution: processors, algorithms and new technologies. After a warm welcome by HPC User Forum Chair Steve Finn and Deputy Director Stefan Wesner from HLRS, Earl Joseph and Steve Conway from IDC gave interesting insights into the market situation of HPC. Covering a field from low-end to high-end systems, the market shows extremely positive signs. A big share of the positive numbers comes from public funding. Projects like PRACE show a positive impact on the market.

Fausto Giunchiglia from the University of Trento gave a fascinating insight into the FuturICT project, highlighting the general potential of ICT to improve the quality of life for everyone. Vladimir Voevodin from the Moscow State University gave an update on the Russian situation in supercomputing and on the well-known Lomonosov system. Both talks showed that there are still many fields and countries in which supercomputing can grow and contribute. Stefan Wesner then discussed the potential of Cloud concepts for supercomputing. Presenting the EC-funded Bonfire project, he showed how Clouds can help to improve access to large scale systems.

Large scale data was one of the big issues. Jim Kasdorf presented a machine with big memory. Showing the data concept of the Pittsburgh Supercomputing Center (PSC), he worked out how large scale shared memory can help to solve large scale problems. Edgar Gabriel from the University of Houston showed how supercomputing can handle the problem of fast I/O that comes with the increased size of data sets to be handled and created by supercomputing. Paul Muzio gave a fascinating talk about the application of these technologies in the field of urban analytics. The session made clear that large scale data is already an important issue in supercomputing today and will play an ever larger role in the future.

A number of further talks addressed issues mainly in the industrial usage of supercomputers, new challenges in automotive simulation, and supercomputing hardware. New products as well as new methods were discussed. Michael Resch gave an overview of supercomputing in Germany and pointed to the internationally renowned strategy of joining forces at the national level in order to improve the international position of German supercomputing.

• Michael Resch
University of Stuttgart, HLRS

Minister Dr. Theresia Bauer to acknowledge Prof. Michael M. Resch as "Visionary Thinker"

On Thursday, May 31, HLRS was honored with the visit of Dr. Theresia Bauer, Minister of Science, Research and the Arts in Baden-Württemberg. The reason for Minister Bauer's visit was to present Prof. Michael M. Resch, Director of HLRS, with the "Visionary Thinker" award of the State of Baden-Württemberg. Prof. Resch's project "High Performance Computing Addressing Key Problems of Our Time" had been chosen earlier from among more than 700 visionary ideas submitted in a state-wide competition called "Die Übermorgenmacher". The competition was initiated in celebration of the 60th anniversary of the State of Baden-Württemberg. Prof. Resch was nominated by external experts for this position and was one of only six scientists accepted.

In recompense for being chosen as "Visionary Thinker", each awardee was given the opportunity to publicly introduce his project and was granted one wish to be fulfilled by the State of Baden-Württemberg. Guidelines for this one wish were: a) must be reasonable, b) must not exceed 1,000 in expenses, and c) must be related to the "Visionary Thinker Project". Prof. Resch's request was a comparatively easy one to meet: he asked Minister Bauer to take ample time to meet and talk to HLRS employees in order to learn what their (scientific) work is all about, comprehend their general ideas and concerns, and hear out their expectations of the government.

Minister Bauer complied with Prof. Resch's wish and spent a good two hours at HLRS. When she eventually left the premises, Minister Bauer had been given a comprehensive overview of HLRS and had engaged in meeting and interacting with the people who are behind the scientific project work and all the "visionary ideas" flourishing at HLRS.

Handing over of the "Visionary Thinker Award". Minister T. Bauer and M. Resch.

• Regina Weigand
Gauss Centre for Supercomputing (GCS)

6th International Parallel Tools Workshop

The International Parallel Tools Workshop, jointly organized by the Center for Information Services and High Performance Computing, University of Dresden (ZIH-TUD), and the High Performance Computing Center Stuttgart (HLRS), is an annual forum for providers of tools for debugging and performance tuning of parallel applications [1]. The workshop serves as a discussion forum between the developers and the users of state-of-the-art tools used in High Performance Computing, and as a place to give feedback on mutual requirements and expectations. The last instance of this series was hosted by HLRS on September 25 - 26 in Stuttgart [2].

Recent advances in High Performance Computing hardware, such as increased capabilities of a single NUMA node or heterogeneous architectures combining traditional CPU nodes with GPU accelerators, have fostered the creation of a new generation of highly scalable parallel applications and libraries. However, the new capabilities of modern supercomputers have increased the complexity of parallel application development. Despite numerous efforts to improve and simplify the development process, a lot of manual tuning is still needed to fully benefit from modern HPC architectures. The tools facilitating the debugging and performance analysis of parallel applications, whether these use OpenMP and MPI or more modern programming models such as StarSs, are still of the highest importance for the development and porting of applications for new HPC architectures.

A dozen technical papers were presented on the performance and debugging tools Threadspotter, Vampir, Scalasca, Score-P, MUST, Temanejo, and MemPin, as well as on integration and interoperability of tools, and on visualization of large amounts of performance data. In addition to the usual presentations, this year's workshop featured two half-day tutorials on Likwid and Allinea DDT, respectively, addressing the training of advanced users.

The Allinea DDT tutorial focused on the one hand on debugging at scale for large installations such as the Cray XE6 Hermit at HLRS. A second focus was the newly introduced support for the OpenACC [3] programming model, which targets accelerators. The Likwid tutorial focused in particular on the peculiarities of the Intel Sandybridge processor family, which is expected to form the basis of many future HPC systems.

[1] http://toolsworkshop.hlrs.de/2012/index.php/previous-workshops
[2] http://toolsworkshop.hlrs.de/2012/
[3] http://www.openacc.org/

• José Gracia
University of Stuttgart, HLRS

1st CHANGES Workshop

New and highly scalable algorithms, performance tools, and performance modelling were the topics of the first CHANGES workshop on High Performance Computing, which took place from 3 to 5 September 2012 at JSC. The workshop was initiated by the Computer Network Information Center (CNIC) of the Chinese Academy of Sciences (CAS), the National Center for Supercomputing Applications (NCSA) at University of Illinois, Urbana-Champaign (UIUC) and the Jülich Supercomputing Centre at Forschungszentrum Jülich. These global players in supercomputing, together with their parent organizations, joined forces and reached an agreement about co-sponsoring a CHinese-AmericaN-German E-Science and cyberinfrastructure workshop series (CHANGES), where the three sites will alternate as local organizers every year. In the past, the three institutions already had a couple of bilateral cooperations and workshops, like the US-American-Chinese series ACCESS and the German-Chinese Workshop on Supercomputing (CHSC11). CHANGES, however, is the first trilateral cooperation and will not only consider the issues of the partner institutions but will also take into account national topics.

The first CHANGES workshop was a high-level platform dealing with the latest trends in supercomputing, sophisticated information techniques and interdisciplinary applications. About 45 well-known experts, among them 21 speakers, came together by invitation and discussed the latest results of their research fields. The workshop focussed on the above mentioned topics but also covered several fields in supercomputing, e-Science and its applications in China, Germany and the United States. Besides the presentations, the workshop provided a forum for trilateral cooperations on student exchange and mutual research projects. First promising cooperation ideas were developed and are foreseen to be launched in the near future.

• Norbert Attig
Jülich Supercomputing Centre (JSC)

Workshop on Hybrid Particle-Continuum Methods

A main purpose of computational materials physics is to establish a fundamental link between atomic-scale processes and the macroscopic behavior of condensed matter, including composite materials, complex fluids, and materials of technological interest. A common characteristic of these systems is the existence of important features at multiple time or length scales. Typical examples are crack propagation in solids or protein folding in solution. A precise description of interatomic interactions is needed at the crack tip or when two side chains of the protein come close to each other. However, stress boundary conditions or local electrostatic fields in these examples are strongly affected by long-range interactions. The latter are conveyed by a medium that can be described in terms of continuous fields. Modeling such systems is challenging because the small and the large scale have to be incorporated simultaneously and their underlying constitutive equations differ. This is why sequential multi-scale modeling is not an option, despite its success in the bottom-up development of field theoretical approaches to homogeneous media.

There has been much progress on coupling different descriptions and levels of resolution in the communities interested in complex fluids and complex solids. However, there has been surprisingly little exchange between them. In the tradition of previous workshops organized by research groups of the John von Neumann Institute for Computing (NIC) at Jülich fostering the exchange between various scientific communities, a workshop on "Hybrid Particle-Continuum Methods in Computational Materials Physics" will be held at the Forschungszentrum Jülich from 4 to 7 March 2013. In order to narrow down the focus of the workshop, we will concentrate on methods that couple particles, or discrete degrees of freedom, to continuum fields. The two main topics are:
• Continuum-mediated interactions between particles
• Adaptive and non-adaptive coupling between particle-based and continuum-based descriptions of materials.

So far, 16 internationally recognized scientists have accepted our invitation to present their research. We will also allow for a limited number of contributed talks and give everybody attending the workshop the possibility to present a poster. More details can be found at http://www.fz-juelich.de/jsc/HYBRID2013

• Martin Müser
• Godehard Sutmann
• Roland G. Winkler
Forschungszentrum Jülich

Workshop on Hybrid Particle-Continuum Methods in Computational Materials Physics
March 4 – 7, 2013 | Jülich | Germany

Invited Speakers
B. Curtin, Lausanne (CH)
R. Delgado-Buscalioni, Madrid (ES)
C. Denniston, Western (CAN)
B. Ensing, Amsterdam (NL)
E. van der Giessen, Groningen (NL)
A. Hartmaier, Bochum (D)
P. Koumoutsakos, ETH Zürich (CH)
M. Kubo, Tohoku University (JAP)
E. Luijten, Northwestern University
M. Moseler, Freiburg (D)
D. Mukherji, MPI-Polymers Mainz (D)
A. Nitzan, Tel Aviv University (IL)
Y. Qi, GM R&D Detroit (USA)
M. O. Robbins, Johns Hopkins (USA)
J. Rottler, British Columbia (CAN)
I. Szlufarska, Madison Wisconsin (USA)
R. Yamamoto, Kyoto University (JAP)

Organizers
M. H. Müser, G. Sutmann, R. G. Winkler
Forschungszentrum Jülich, Institute for Advanced Simulation

Further details
www.fz-juelich.de/jsc/HYBRID2013

JSC Guest Student Programme on Scientific Computing 2012

As one of the leading HPC centres in Europe, Jülich Supercomputing Centre provides supercomputer resources, IT tools and HPC expertise for computational scientists at German and European universities, research institutions, and in industry. An important part of this mission is to introduce young academics to HPC and its role in scientific research. To this end, JSC hosts regular support activities and educational programmes in the field of Scientific Computing.

The JSC Guest Student Programme on Scientific Computing has already been running successfully for 13 years. Since the first programme in 2000, a total of 133 students have had the opportunity to join research teams from JSC and other institutes at Forschungszentrum Jülich for ten weeks. Working on challenging topical scientific projects, they received training with up-to-date hardware and software as well as HPC-related methods and algorithms. For many students, the programme has been the foundation for a career in HPC and the basis for fruitful continuing cooperations with their advisers, occasionally leading to widely recognized scientific publications; see [1] for a recent example. Some students also returned to JSC as Master or PhD candidates.

The JSC Guest Student Programme 2012 took place from August 6 to October 12. Once again it was run under the CECAM framework (Centre Européen de Calcul Atomique et Moléculaire) and organized in cooperation with the German Research School for Simulation Sciences (GRS). Support by IBM Germany through a sponsorship within the IBM University Relations programme is also gratefully acknowledged.

This year's announcement yielded a record response of more than 30 applications from 13 countries, comprising applicants from a particularly wide range of disciplines, including mathematics, physics, chemistry, computer science and engineering, biomedicine and earth sciences. This meant that competition for the available places was especially strong, and after the final selection process, 13 students were invited to Jülich.

The programme started with ten days of introductory courses on parallel programming, including the use of MPI on distributed-memory cluster systems and OpenMP for shared-memory parallelization, as well as CUDA and OpenCL for GPU-accelerated machines. For the remainder of their stay, the participants worked on research projects in a broad range of fields, from applications in atmospheric sciences, fluid and molecular dynamics, particle-in-cell methods, and safety research, to fundamental research in elementary particle physics and mathematical algorithms. Besides using the multi-purpose cluster JUROPA and the GPU system JUDGE, one of the key topics was the utilization of the newly installed IBM Blue Gene/Q system JUQUEEN in Jülich.

During a concluding colloquium, the participants had the opportunity to present and discuss their work with other students and supervisors. Finally, more detailed reports were prepared and compiled into a printed JSC publication, which will be made available online.

Next year's JSC Guest Student Programme will start on August 5, 2013. It will be officially announced in January 2013 and is open to students from the natural sciences, engineering, computer science and mathematics who have completed their Bachelor but not yet their Master's degree. The application deadline has been set for April 26, 2013. Additional information as well as the reports of previous years can be found online at www.fz-juelich.de/jsc/gsp

References
[1] Dapp, W., Lücke, A., Persson, B., Müser, H. Self-Affine Elastic Contacts: Percolation and Leakage, Phys. Rev. Lett. 108, p. 244301, 2012

• Mathias Winkel
Jülich Supercomputing Centre (JSC)

New Books in HPC

Sustained Simulation Performance 2012

Resch, M., Wang, X., Bez, W., Focht, E., Kobayashi, H. (Eds.)
2013, XII, 182 p. 81 illus., 67 in color.
Proceedings of the joint Workshop on High Performance Computing on Vector Systems, Stuttgart (HLRS), and Workshop on Sustained Simulation Performance, Tohoku University, 2012.

The book presents the state-of-the-art in High Performance Computing and simulation on modern supercomputer architectures. It covers trends in hardware and software development in general and specifically the future of high performance systems and heterogeneous architectures. The application contributions cover Computational Fluid Dynamics, material science, medical applications and climate research. Innovative fields like coupled multi-physics or multi-scale simulations are presented. All papers were chosen from presentations given at the 14th Teraflop Workshop held in December 2011 at HLRS, University of Stuttgart, Germany and the Workshop on Sustained Simulation Performance at Tohoku University in March 2012.

Tools for High Performance Computing 2011

Brunst, H., Müller, M., Nagel, W.E., Resch, M. (Eds.)
2012, XVII, 154 p. 116 illus., 44 in color.
Proceedings of the 5th International Workshop on Parallel Tools for High Performance Computing, September 2011, ZIH, Dresden.

The proceedings of the 5th International Workshop on Parallel Tools for High Performance Computing provide an overview of supportive software tools and environments in the fields of System Management, Parallel Debugging and Performance Analysis. In the pursuit of maintaining exponential growth in the performance of high performance computers, the HPC community is currently targeting Exascale Systems. The initial planning for Exascale already started when the first Petaflop system was delivered. Many challenges need to be addressed to reach the necessary performance. Scalability, energy efficiency and fault-tolerance need to be increased by orders of magnitude. The goal can only be achieved when advanced hardware is combined with a suitable software stack. In fact, the importance of software is rapidly growing.

High Performance Computing on Vector Systems 2011

Resch, M., Wang, X., Focht, E., Kobayashi, H., Roller, S. (Eds.)
2012, VIII, 184 p. 89 illus., 68 in color.

The book presents the state-of-the-art in High Performance Computing and simulation on modern supercomputer architectures. It covers trends in hardware and software development in general and specifically the future of vector-based systems and heterogeneous architectures.

The application contributions cover Computational Fluid Dynamics, material science, medical applications and climate research. Innovative fields like coupled multi-physics or multi-scale simulations are presented.

All papers were chosen from presentations given at the 13th Teraflop Workshop held in October 2010 at Tohoku University, Japan. For 8 years now this workshop has brought together experts in high performance computing who put the emphasis of their work on solutions for real world problems, focussing on the sustained performance achievable on modern hardware architectures.

With the advent of Petaflops systems and the predictable arrival of Exascale systems in the future, sustained performance is getting more and more important. This book serves as a guide through the jungles of architectures and simulation software systems.


HLRS Scientific Tutorials and Workshop Report and Outlook

HLRS has installed Hermit, a Cray XE6 system with AMD Interlagos processors and 1 PFlop/s peak performance. We strongly encourage you to port your applications to the new architecture as early as possible. To support such efforts we invite current and future users to participate in special Cray XE6 Optimization Workshops. With these courses, we will give all necessary information to move applications to this Petaflop system. The Cray XE6 will provide our users with a new level of performance. To harvest this potential will require all our efforts. We are looking forward to working with our users on these opportunities. These 4-day courses in cooperation with Cray and multi-core optimization specialists take place on April 23 - 26, 2013 and also in autumn 2013.

Programming of Cray XK6 clusters with GPUs is taught in OpenACC Programming for Parallel Accelerated Supercomputers on April 29 - 30, 2013.

These Cray XE6 and XK6 courses are also presented to the European community in the framework of the PRACE Advanced Training Centre (PATC). GCS, i.e., HLRS, LRZ and the Jülich Supercomputer Centre together, serve as one of the first six PATCs in Europe.

One of the flagships of our courses is the week on Iterative Solvers and Parallelization. Prof. A. Meister and Prof. B. Fischer teach basics and details on Krylov Subspace Methods. Lecturers from HLRS give lessons on distributed memory parallelization with the Message Passing Interface (MPI) and shared memory multi-threading with OpenMP. This course will be presented twice, in March 2013 at HLRS in Stuttgart and in September 2013 at LRZ.

Another highlight is the Introduction to Computational Fluid Dynamics. This course was initiated at HLRS by Dr.-Ing. Sabine Roller. She is now a professor at the German Research School at RWTH Aachen and moving to Siegen. It is again scheduled for February 2013 in Stuttgart. The emphasis is placed on explicit finite volume methods for the compressible Euler equations. Moreover, outlooks on implicit methods, the extension to the Navier-Stokes equations and turbulence modeling are given. Additional topics are classical numerical methods for the solution of the incompressible Navier-Stokes equations, aero-acoustics and high order numerical methods for the solution of systems of partial differential equations.

Our general course on parallelization, the Parallel Programming Workshop, September 2 - 6, 2013 at HLRS, will have three parts: The first two days of this course are dedicated to parallelization with the Message Passing Interface (MPI). Shared memory multi-threading is taught on the third day, and in the last two days, advanced topics are discussed. As in all courses, hands-on sessions (in C and Fortran) will allow users to immediately test and understand the parallelization methods. The course language is English.

Several three- and four-day courses on MPI & OpenMP will be presented at different locations throughout the year.

We also continue our series of Fortran for Scientific Computing in March 2013, mainly attended by PhD students from Stuttgart and other universities in Germany who want to learn not only the basics of programming, but also to gain insight into the principles of developing high-performance applications with Fortran.

With Unified Parallel C (UPC) and Co-Array Fortran (CAF) in May 2013, the participants will get an introduction to partitioned global address space (PGAS) languages.

In cooperation with Dr. Georg Hager from the RRZE in Erlangen and Dr. Gabriele Jost from TACC, HLRS also continues its contributions on hybrid MPI & OpenMP programming with tutorials at conferences; see the box "ISC and SC Tutorials in 2012".

In the table, you can find the whole HLRS series of training courses in 2013. They are organized at HLRS and also at several other HPC institutions: LRZ Garching, NIC/ZAM (FZ Jülich), ZIH (TU Dresden), TUHH (Hamburg Harburg), and GRS/RWTH (Aachen).

• Rolf Rabenseifner
University of Stuttgart, HLRS

ISC and SC Tutorials in 2012

• Georg Hager, Gabriele Jost, Rolf Rabenseifner, Jan Treibig: Performance-oriented Programming on Multicore-based Clusters with MPI, OpenMP & Hybrid MPI/OpenMP. Tutorial 4 at the International Supercomputing Conference, ISC'12, Hamburg, June 17 - 21, 2012.
• Rolf Rabenseifner, Georg Hager, Gabriele Jost: Hybrid MPI and OpenMP Parallel Programming. Half-day Tutorial at Super Computing 2012, SC12, Salt Lake City, USA, November 10 - 16, 2012.
• Alice E. Koniges, Katherine Yelick, Rolf Rabenseifner, Reinhold Bader, David Eder: Introduction to PGAS (UPC and CAF) and Hybrid for Multicore Programming. Full-day Tutorial at Super Computing 2012, SC12, Salt Lake City, USA, November 10 - 16, 2012.

2013 – Workshop Announcements

Scientific Conferences and Workshops at HLRS
• 11th HLRS/hww Workshop on Scalable Global Parallel File Systems (March/April 2013)
• 7th ZIH+HLRS Parallel Tools Workshop (date and location not yet fixed)
• High Performance Computing in Science and Engineering - The 16th Results and Review Workshop of the HPC Center Stuttgart (October 2013)
• IDC International HPC User Forum (October 2013)

Parallel Programming Workshops: Training in Parallel Programming and CFD
• Parallel Programming and Parallel Tools (TU Dresden, ZIH, February 4 - 7)
• Introduction to Computational Fluid Dynamics (HLRS, February 11 - 15)
• Iterative Linear Solvers and Parallelization (HLRS, March 4 - 8)
• Cray XE6 Optimization Workshops (HLRS, April 23 - 26)
• Unified Parallel C (UPC) and Co-Array Fortran (CAF) (HLRS, May 2 - 3)
• Parallel Programming with MPI & OpenMP (TU Hamburg-Harburg, July 29 - 31)
• Iterative Linear Solvers and Parallelization (LRZ, Garching, September 2 - 4)
• Message Passing Interface (MPI) for Beginners (HLRS, September 23 - 24)
• Shared Memory Parallelization with OpenMP (HLRS, September 25)
• Advanced Topics in Parallel Programming (HLRS, September 26 - 27)
• Parallel Programming with MPI & OpenMP (FZ Jülich, JSC, November 25 - 27)

Training in Programming Languages at HLRS
• Fortran for Scientific Computing (March 11 - 15)

URLs:
http://www.hlrs.de/events/
http://www.hlrs.de/organization/sos/par/services/training/course-list/
https://fs.hlrs.de/projects/par/events/2013/parallel_prog_2013/

GCS – High Performance Computing Courses and Tutorials

Introduction to the Programming and Usage of the Supercomputer Resources at Jülich

Date and Location
November 22 - 23, 2012
JSC, Research Centre Jülich

Contents
This course gives an overview of the supercomputers JUROPA and JUQUEEN. Especially new users will learn how to program and use these systems efficiently. Topics discussed are: system architecture, usage model, compilers, tools, monitoring, MPI, OpenMP, performance optimization, mathematical software, and application software.

Web page
http://www.fz-juelich.de/ias/jsc/events/sc-nov

Parallel Programming with MPI, OpenMP and PETSc

Date and Location
November 26 - 28, 2012
JSC, Research Centre Jülich

Contents
The focus is on programming models MPI, OpenMP, and PETSc. Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the basic constructs of the Message Passing Interface (MPI) and the shared memory directives of OpenMP. Course language is English. This course is organized by JSC in collaboration with HLRS. Presented by Dr. Rolf Rabenseifner, HLRS.

Web page
http://www.fz-juelich.de/ias/jsc/events/mpi

Node-Level Performance Engineering

Date and Location
December 06, 2012
December 07, 2012
LRZ Building, University Campus Garching, near Munich, Boltzmannstr. 1.

Contents
This course teaches performance engineering approaches on the compute node level. “Performance engineering” as we define it is more than employing tools to identify hotspots and bottlenecks. It is about developing a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code gets executed that does the actual computational work. Once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of optimizations can often be predicted. We introduce a “holistic” node-level performance engineering strategy, apply it to different algorithms from computational science, and also show how an awareness of the performance features of an application may lead to notable reductions in power consumption:
• Introduction
• Practical performance analysis
• Microbenchmarks and the memory hierarchy
• Typical node-level software overheads
• Example problems:
  - The 3D Jacobi solver
  - The Lattice-Boltzmann Method
  - Sparse Matrix-Vector Multiplication
  - Backprojection algorithm for CT reconstruction
• Energy & Parallel Scalability
Between each module, there is time for Questions and Answers!

Web page
http://www.lrz.de/services/compute/courses

Unified Parallel C (UPC) and Co-Array Fortran (CAF)

Dates and Locations
December 13 - 14, 2012
May 02 - 03, 2013
Stuttgart, HLRS

Contents
Partitioned Global Address Space (PGAS) is a new model for parallel programming. Unified Parallel C (UPC) and Co-Array Fortran (CAF) are PGAS extensions to C and Fortran. PGAS languages allow any processor to directly address memory/data on any other processor. Parallelism can be expressed more easily compared to library-based approaches such as MPI. Hands-on sessions (in UPC and/or CAF) will allow users to immediately test and understand the basic constructs of PGAS languages.

Web page
www.hlrs.de/events/

Parallel Programming with MPI, OpenMP, and Tools

Date and Location
February 04 - 07, 2013
Dresden, ZIH

Contents
The focus is on programming models MPI, OpenMP, and PETSc. Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the basic constructs of the Message Passing Interface (MPI) and the shared memory directives of OpenMP. The last day is dedicated to tools for debugging and performance analysis of parallel applications. This course is organized by ZIH in collaboration with HLRS.

Web page
www.hlrs.de/events/

Programming with Fortran

Date and Location
February 04 - 08, 2013
LRZ Building, University Campus Garching, near Munich, Boltzmannstr. 1.

Contents
This course is targeted at scientists who have little or no knowledge of the Fortran programming language, but need it for participation in projects using a Fortran code base, for development of their own codes, and for getting acquainted with additional tools like debugger and syntax checker as well as handling of compilers and libraries. The language is for the most part treated at the level of the Fortran 95 standard; features from Fortran 2003 are limited to improvements on the elementary level. Advanced Fortran features like object-oriented programming or coarrays will be covered in a follow-on course in autumn.
To consolidate the lecture material, each day's approximately 4 hours of lecture are complemented by 3 hours of hands-on sessions.

Prerequisites
Course participants should have basic UNIX/Linux knowledge (login with secure shell, shell commands, basic programming, vi or emacs editors).

Web page
http://www.lrz.de/services/compute/courses

Introduction to Computational Fluid Dynamics

Date and Location
February 11 - 15, 2013
Stuttgart, HLRS

Contents
Numerical methods to solve the equations of Fluid Dynamics are presented. The main focus is on explicit Finite Volume schemes for the compressible Euler equations. Hands-on sessions will reinforce the content of the lectures. Participants will learn to implement the algorithms, but also to apply existing software and to interpret the solutions correctly.


Methods and problems of parallelization are discussed. This course is based on a lecture and practical that were awarded the "Landeslehrpreis Baden-Württemberg 2003" and is organized by HLRS, IAG, and University of Kassel.

Web page
www.hlrs.de/events/

Iterative Linear Solvers and Parallelization

Date and Location
March 04 - 08, 2013
Stuttgart, HLRS

September 02 - 06, 2013
Garching, LRZ

Contents
The focus is on iterative and parallel solvers, the parallel programming models MPI and OpenMP, and the parallel middleware PETSc. Different modern Krylov Subspace Methods (CG, GMRES, BiCGSTAB ...) as well as highly efficient preconditioning techniques are presented in the context of real life applications. Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the basic constructs of iterative solvers, the Message Passing Interface (MPI) and the shared memory directives of OpenMP. This course is organized by University of Kassel, HLRS, and IAG.

Web page
www.hlrs.de/events/

Parallel Programming of High Performance Systems

Date and Location
March 04 - 10, 2013
LRZ Building, University Campus Garching, near Munich.
RRZE building, University Campus Erlangen, Martensstr. 1: Via video conference if there is sufficient interest.

Contents
This course, a collaboration of Erlangen Regional Computing Centre (RRZE) and LRZ, is targeted at students and scientists with interest in programming modern HPC hardware, specifically the large scale parallel computing systems available in Jülich, Stuttgart and Munich.
Each day comprises approximately 4 hours of lectures and 3 hours of hands-on sessions:
• Introduction to High Performance Computing
• Technical aspects of software engineering
• Basic parallel programming with MPI and OpenMP
• Processor Architectures: Register, Cache, Locality, Performance metrics
• Principles of code optimization
• Advanced MPI and OpenMP programming
• Parallel algorithms
• Advanced OpenMP programming
• Parallel Architectures: multi-core, multi-socket, ccNUMA
• Architecture-specific optimization strategies

Prerequisites
Good working knowledge of at least one of the standard HPC languages Fortran 95, C or C++.

Web page
http://www.lrz.de/services/compute/courses

Fortran for Scientific Computing

Date and Location
March 11 - 15, 2013
Stuttgart, HLRS

Contents
This course is intended for scientists and students who want to learn (sequential) programming of scientific applications with Fortran. The course teaches the newest Fortran standards. Hands-on sessions will allow users to immediately test and understand the language constructs.

Web page
www.hlrs.de/events/

Advanced Topics in High Performance Computing

Date and Location
March 18 - 21, 2013
LRZ Building, University Campus Garching, near Munich

Contents
In this add-on course to the parallel programming course, special topics are treated in more depth, in particular performance analysis, I/O and PGAS concepts. It is offered as a collaboration of Erlangen Regional Computing Centre (RRZE) and LRZ. Each day comprises approximately 5 hours of lectures and 2 hours of hands-on sessions:
• Intel Tracing Tools
• Introduction to Scalasca
• Parallel input/output with MPI-IO
• I/O tuning on high performance file systems
• Introduction to the PGAS languages Coarray Fortran and UPC

Prerequisites
Good MPI and OpenMP knowledge as presented in the course "Parallel programming of High Performance Systems" (see above).

Web page
http://www.lrz.de/services/compute/courses

Parallel I/O and Portable Data Formats

Date and Location
March 18 - 20, 2013
JSC, Research Centre Jülich

Contents
This course will introduce MPI parallel I/O and portable, self-describing data formats, such as HDF5 and NetCDF. Participants should have experience in parallel programming in general, and either C/C++ or Fortran in particular.
This course is a PATC course (PRACE Advanced Training Centres).

Web page
http://www.fz-juelich.de/ias/jsc/events/parallelio

Eclipse: C/C++/Fortran Programming

Date and Location
March 28, 2013
LRZ Building, University Campus Garching, near Munich

Contents
This course is targeted at scientists who wish to be introduced to programming C/C++/Fortran with the Eclipse C/C++ Development Tools (CDT), or the Photran Plugin. Topics covered include:
• Introduction to Eclipse IDE
• Introduction to CDT
• Hands-on with CDT
• Short introduction and demo of Photran

Prerequisites
Course participants should have basic knowledge of the C and/or C++/Fortran programming languages.

Web page
http://www.lrz.de/services/compute/courses

GPU Programming

Date and Location
April 15 - 17, 2013
JSC, Research Centre Jülich


Contents
Many-core programming is a very dynamic research area. Many scientific applications have been ported to GPU architectures during the past four years. We will give an introduction to CUDA, OpenCL, and multi-GPU programming using examples of increasing complexity. After introducing the basics, the focus will be on optimization and tuning of scientific applications.
This course is a PATC course (PRACE Advanced Training Centres).

Web page
http://www.fz-juelich.de/ias/jsc/events/gpu

Advanced GPU Programming

Date and Location
April 18 - 19, 2013
JSC, Research Centre Jülich

Contents
Today's computers are commonly equipped with multicore processors and graphics processing units. To make efficient use of these massively parallel compute resources, advanced knowledge of architecture and programming models is indispensable. This course builds on the introduction to GPU programming. It focuses on finding and eliminating bottlenecks using profiling and advanced programming techniques, optimal usage of CPUs and GPUs on a single node, and multi-GPU programming across multiple nodes. The material will be presented in the form of short lectures followed by in-depth hands-on sessions.

Web page
http://www.fz-juelich.de/ias/jsc/events/advgpu

Cray XE6 Optimization Workshop

Date and Location
April 23 - 26, 2013
Stuttgart, HLRS

Contents
HLRS installed Hermit, a Cray XE6 system with AMD Interlagos processors and a performance of 1 PFlop/s. We strongly encourage you to port your applications to the new architecture as early as possible. To support such effort we invite current and future users to participate in special Cray XE6 Optimization Workshops. With this course, we will give all necessary information to move applications from the current NEC SX-9, the Nehalem cluster, or other systems to Hermit. Hermit provides our users with a new level of performance. To harvest this potential will require all our efforts. We are looking forward to working with our users on these opportunities. From Monday to Wednesday, specialists from Cray will support you in your effort porting and optimizing your application on our Cray XE6. On Thursday, Georg Hager and Jan Treibig from RRZE will present detailed information on optimizing codes on the multicore AMD Interlagos processor. Course language is English (if required).

Web page
http://www.hlrs.de/events/

Cray XK at HLRS, Stuttgart

Date and Location
April 29 - 30, 2013
Stuttgart, HLRS

Contents
This workshop will cover the programming environment of the Cray XK6 hybrid supercomputer, which combines multicore CPUs with GPU accelerators. Attendees will learn about the directive-based OpenACC programming model whose multi-vendor support allows users to portably develop applications for parallel accelerated supercomputers.
The workshop will also demonstrate how to use the Cray Programming Environment tools to identify CPU application bottlenecks, facilitate the OpenACC porting, provide accelerated performance feedback and tune the ported applications. The Cray scientific libraries for accelerators will be presented, and interoperability of OpenACC directives with these and with CUDA will be demonstrated. Through application case studies and tutorials, users will gain direct experience of using OpenACC directives in realistic applications.
Users may also bring their own codes to discuss with Cray specialists or begin porting.

Web page
http://www.hlrs.de/events/

Introduction to the Programming and Usage of the Supercomputer Resources at Jülich

Date and Location
May 16 - 17, 2013
JSC, Research Centre Jülich

Contents
This course gives an overview of the supercomputers JUROPA and JUQUEEN. Especially new users will learn how to program and use these systems efficiently. Topics discussed are: system architecture, usage model, compilers, tools, monitoring, MPI, OpenMP, performance optimization, mathematical software, and application software.

Web page
http://www.fz-juelich.de/ias/jsc/events/sc-may

Guest Student Programme: Education in Scientific Computing

Date and Location
August 05 - October 11, 2013
JSC, Research Centre Jülich

Contents
Guest Student Programme “Scientific Computing” to support education and training in the fields of supercomputing. Application deadline is April 30, 2013. Students of Computational Sciences, Computer Science and Mathematics can work ten weeks in close collaboration with a local scientific host on a subject in their field.

Web page
http://www.fz-juelich.de/ias/jsc/gsp


Authors

Nikolaus A. Adams
Momme Allalen
Padmesh Anjukandi
Christoph Anthes
Lukas Arnold
Norbert Attig
Thomas Baumann
Thomas Bönisch
Matthias Brehm
Steffen Brinkmann
Max Camenzind
Przemyslaw Dopieralski, przemyslaw.dopieralski@theochem.rub.de
Neriman Emre
Dietmar Erwin
Shiqing Fan
Dmitry A. Fedosov
Michael Gerndt
Colin W. Glass
Gerhard Gompper
José Gracia
Carla Guillen
Wolfram Hesse
Stefan Hickel
Christian Hoelbling
Stefan Holl
Susanne Horn
Ferdinand Jamitzky
Matthias Kaczorowski
Armel Ulrich Kemloh Wagoum
Dieter Kranzlmüller
Jenny Kremser
Andreas Lintermann
Dominik Marx
Michael Meyer
Bernd Mohr
Martin Müser
Mathias Nachtmann
Walter Nadler
Carmen Navarrete
Marc Offman
Ludger Palm
Dirk Pleiter
Rolf Rabenseifner
Michael M. Resch
Jordi Ribas-Arino
Christian Rössel
Helmut Satzger
Armin Seyfried
Olga Shishkina
Michael Stephan
Jamie O’Sullivan
Godehard Sutmann
Regina Weigand
Andreas Wierse
Mathias Winkel
Roland G. Winkler
Felix Wolf

Erratum
Vol. 10, Nr. 1, Pages 40-43, “In Silico” Phosphate Transfer: From Bulk Water to Enzymes: we have to add Prof. Dr. Dominik Marx, Lehrstuhl für Theoretische Chemie, Ruhr-Universität Bochum, as co-author.

If you would like to receive inSiDE regularly, send an email with your postal address to [email protected]

© HLRS 2012
Web page: http://inside.hlrs.de/