inSiDE • Vol. 6 No.1 • Spring 2008

Innovatives Supercomputing in Deutschland

Editorial

Welcome to this first issue of inSiDE in 2008. German supercomputing is keeping the pace of the last year with the Gauss Centre for Supercomputing taking the lead. The collaboration between the three national supercomputing centres (HLRS, LRZ and NIC) is making good progress. At the same time European supercomputing has seen a boost. In this issue you will read about the most recent developments in the field.

In January the European supercomputing infrastructure project PRACE was started and is making excellent progress led by Prof. Achim Bachem from the Gauss Centre. In February the fastest civil supercomputer JUGENE was officially unveiled at the Jülich Research Centre. The IBM BlueGene/P system will provide German and European researchers with a peak speed of 223 TFlop/s. With this new installation Germany is knocking at the PFlops-door.

In March the University of Stuttgart and HLRS together with its partners from industry founded the Automotive Simulation Centre Stuttgart (ASCS). The focus of ASCS is on simulation in automotive engineering that requires high performance computing. The ultimate goal is to support the automotive industry in meeting the challenging targets in CO2 reduction as put forward by the European Commission recently.

All these activities are evolutionary steps towards a German and European hardware and software landscape that is able to provide researchers with a competitive cyberinfrastructure which allows them to see eye-to-eye with colleagues in the US and Japan. In addition to these activities, over the last months the Federal Ministry of Education and Research (BMBF) has issued a call for proposals for the development of software in the field of high performance computing. The projects within this software framework are expected to start in autumn 2008 and inSiDE will keep reporting on this.

Again we have included a section on applications running on one of the three main German supercomputing systems. In this issue the reader will find six articles about the various fields of applications and how they exploit the various architectures made available. The section not only reflects the wide variety of simulation research in Germany but at the same time nicely shows how various architectures provide users with the best hardware technology for all of their problems.

In the project section we present a look at one of the key projects for the usage of high performance computers, DEISA. This has recently been extended and you will find a report on the new and extended focus here. The close cooperation between the supercomputing community and the Grid community is reflected in the project report on SmartLM, which focuses on Grid-friendly software licensing for location independent application execution. The importance of visualization tools to understand the results of simulation is shown in the IRMOS project. It introduces interactive real-time multimedia applications on service oriented infrastructures. A further specifically interesting project focusing on optimal load balance in parallel programs for geophysical simulations is also featured.

As usual, this issue includes information about events in supercomputing in Germany over the last months and gives an outlook of future workshops in the field. Readers are welcome to participate.

• Prof. Dr. H.-G. Hegering (LRZ)
• Prof. Dr. Dr. Th. Lippert (NIC)
• Prof. Dr.-Ing. M. M. Resch (HLRS)

Contents

News
Automotive Simulation Centre (ASCS) founded 4
JUGENE officially unveiled 6
PRACE Project launched 7

Applications
Drag Reduction by Dimples? An Investigation Relying Upon HPC 8
Turbulent Convection in large-aspect-ratio Cells 14
Direct Numerical Simulation of Flame / Acoustic Interactions 18
Massively parallel single and multi-phase Flow in porous Media 22
Unravelling the Mechanism How Stars Explode 26
The Aquarius Project: Cold Dark Matter under a Numerical Microscope 32

Projects
IRMOS – Interactive Real-time Multimedia Applications on Service-oriented Infrastructures 36
DEISA to enhance the European HPC Infrastructure in FP7 40
SmartLM – Grid-friendly Software Licensing for Location independent Application Execution 42
Toward Optimal Load Balance in Parallel Programs for Geophysical Simulations 44

Centres 48
Activities 58
Courses 66

Automotive Simulation Centre Stuttgart (ASCS) founded

A step forward in bringing high performance computing to the automotive industry was made on March 7th, 2008 at Stuttgart. Private and public partners joined the University of Stuttgart in creating a centre for the collaboration in automotive simulation on high performance computing systems. For the University of Stuttgart this is a further step to strengthen its industrial collaboration and to focus the activities of its High Performance Computing Centre Stuttgart (HLRS) in the field of computational engineering. For the automotive industry, software and hardware vendors this is a step closer to public research organizations. The common driving force is the need to substantially reduce CO2 emissions while sustaining an economically leading position in the field.

Dr. Thomas Weber, member of the board of Daimler AG and of the board of the University of Stuttgart, said: "This first transfer centre between the University of Stuttgart and industry will further strengthen our competence in automotive and transportation technology and will further improve the competitiveness of Germany as a location for research and development." Wolfgang Dürnberger, member of the board of Porsche AG, added: "ASCS has a pilot character for the collaboration of research and industry and is setting an example for Europe." Prof. Wolfram Ressel, President of the University of Stuttgart, who had been a driving force behind the ASCS, said: "The big interest of industry in such a collaboration again emphasizes the outstanding level of quality and performance of research at the University of Stuttgart."

ASCS will focus on pre-competitive research in fields that require simulation on high performance computing systems. First projects are currently discussed with the board of ASCS. Among the most important problems are multidisciplinary optimization and combustion. Both require a high level of compute performance and expertise in computational engineering on high performance computers available at the HLRS.

Further topics to be covered by the projects of ASCS will be material science investigations and the usage of simulation and high performance computing in production planning and optimization. By bringing together public and private research ASCS aims at shortening the innovation cycle in automotive simulation technology, thus increasing the quality of automotive products. Initial funding of ASCS is provided by the University of Stuttgart and the State of Baden-Württemberg over a period of three years. Industrial partners will bring in additional funding for this first startup period. After that ASCS will work based on public and private funding and will be self-sustaining. Members of the ASCS contribute to the funding with an annual contribution.

The founding partners of ASCS are: Abaqus Deutschland GmbH, Adam Opel GmbH, Altair Engineering GmbH, Cray Computer Deutschland GmbH, Daimler AG, Dr.-Ing. h.c. F. Porsche AG, Dynamore, Engineous GmbH, Research Institute for Automotive and Engine Design (FKFS), INTES GmbH, the University of Stuttgart, The Virtual Dimension Centre Fellbach, Wilhelm Karman GmbH. Only four weeks after the foundation NEC Europe joined the centre as a full member. Collaboration with IBM and further hardware vendors is currently discussed.

The centre is the first in a series of simulation centres that the State of Baden-Württemberg wants to introduce in order to improve the know-how transfer between research and industry over the next years. Further similar centres are planned for energy simulation, chemical simulation, and biomedical simulation.

President Prof. Wolfram Ressel at the foundation of ASCS
Founding members at the foundation ceremony of ASCS on March 7th, 2008

JUGENE officially unveiled

On February 22nd, 2008, the supercomputer IBM Blue Gene/P JUGENE was officially unveiled at Research Centre Jülich. More than 250 guests attended the inauguration ceremony, which found its highlight in the go-ahead given by Prime Minister Dr. Jürgen Rüttgers together with State Secretary Thomas Rachel.

Prof. Bachem, Chairman of the Board of Directors of Research Centre Jülich, gave the welcoming address. "Science and industry need computing power of the highest quality", said Bachem. "With JUGENE, we have now set another milestone in Jülich for cutting-edge research. Supercomputer simulations have become a key technology. They function as virtual microscopes and telescopes and enhance our understanding in the subatomic region and in the universe, respectively."

North-Rhine Westphalia's Prime Minister Rüttgers could not conceal his pride. "Jülich not only operates the fastest civil supercomputer", he said approvingly, "but it also offers a network of education opportunities and top research in the simulation sciences. There is a vigorous, multi-faceted research and technology landscape in this area that makes North-Rhine Westphalia the number one in innovation."

Thomas Rachel, State Secretary of the Federal Ministry of Education and Research, explained the significance of the new supercomputer for Germany: "With JUGENE, the German position within Europe has been strengthened. The German Gauss Centre for Supercomputing will take on a leading role in establishing a European supercomputing centre."

Prof. Thomas Lippert, Director of the Jülich Supercomputing Centre, pointed out how supercomputer users can benefit from the dedicated topic-specific support in Jülich. "JSC's staff do not just ensure that JUGENE is ready for use; they also provide expertise on all aspects of the simulation sciences. In simulation laboratories, for example, scientific simulation know-how is provided for the topics of plasma physics, biology, earth system science and the nanosciences."

In a short discussion, representatives of IBM, Intel and EADS took a look at the importance of supercomputing from industry's point of view. Video clips on hardware, applications, users, and project partners in science and industry livened up the discussions and interviews.

PRACE Project launched

PRACE, the Partnership for Advanced Computing in Europe, has reached another milestone in its mission to create a European high-performance computing infrastructure. At the end of 2007 the project received a grant of € 10 million from the European Commission towards its total budget of € 20 million for the next two years. On January 29th to 30th, 2008, the kick-off meeting of the project – which is coordinated by Forschungszentrum Jülich – took place at JSC. 74 participants from 14 European countries attended the meeting. Thomas Rachel, Parliamentary State Secretary at the Federal Ministry of Education and Research, opened the event.

PRACE has been established to create a permanent pan-European High Performance Computing service for research. In the preparatory phase, which runs until the end of 2009, the project will establish the basis for a transnational organizational structure for scientific supercomputing in Europe. By bringing together the know-how and resources of the partners, PRACE will provide European researchers with access to supercomputing capacities at a world-class level, transcending those affordable at the national level. This includes a coordinated approach to hardware procurement and potentially a European platform for the development of hardware and software jointly with industry. Close co-operation with national and regional computer centres and scientific organizations will ease access to computing resources at all levels for scientists and engineers from academia and industry.

To achieve these challenging goals, the researchers from the partner organizations firmed up details of the project workplan during the kick-off meeting at Jülich. One task is to define a suitable legal form and organizational structure for the permanent European HPC infrastructure. Key to the success of the project are the technical developments required to enable operation of a distributed supercomputing infrastructure, the scaling and optimisation of application software, and the evaluation of prototypes of future computers. PRACE aims to install a petaflop system as early as 2009.

The following countries are collaborating in the PRACE project: Germany, UK, France, Spain, Finland, Greece, Italy, The Netherlands, Norway, Austria, Poland, Portugal, Sweden, and Switzerland. Germany is represented in PRACE through the Gauss Centre for Supercomputing, which focuses the activities of the three HPC centres in Jülich, Stuttgart, and Garching. Through close co-operation with established European research organizations such as ESF, EMBL, and ESA, PRACE will be embedded into the European Research Area.

The PRACE project receives funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. RI-211528. Additional information: www.prace-project.eu

Drag Reduction by Dimples? An Investigation Relying Upon HPC

Introducing a regular arrangement of surface depressions called dimples is a well-known measure to increase heat transfer from a wall. Compared to a smooth wall, the Nusselt number can be significantly enhanced by dimples, whereas the increase of the pressure drop was found to be small. For that purpose, deep dimples with a ratio of depth to print diameter of h/D = 0.2 - 0.5 are typically applied to enhance the convective transport from the wall.

Some years back Russian scientists discovered that apart from heat-transfer enhancement dimples might be useful for drag reduction. Their findings are based on experimental measurements of turbulent flows over surfaces with a regular arrangement of shallow dimples leading to a decrease of the skin-friction drag of a turbulent flow by up to 20%. (Note that this is not the "golf ball" effect!) However, up to now no clear explanation has been provided which can illuminate this effect. Hence some doubts remain regarding the experimental investigations and their outcome. Compared to classical devices for drag reduction, such as riblets, shallow dimples would be advantageous since they are composed of macroscopic structures which are less sensitive to dirt and mechanical degradation. Furthermore, their effectiveness should not depend on the flow direction.

Since the drag reduction promised by dimples would have a tremendous economical influence, it is worth studying it in more detail. For that purpose, experimental investigations (see [1]) as well as direct numerical simulations (DNS) of the turbulent flow inside a channel with dimpled walls were carried out to investigate the physical mechanism.

Direct Numerical Simulations
The direct numerical simulations were carried out with the CFD code LESOCC, which was developed for the simulation of complex turbulent flows. LESOCC is based on a 3-D finite-volume method for arbitrary non-orthogonal and non-staggered, block-structured grids [2,3]. The spatial discretization of all fluxes is based on central differences of second-order accuracy. A low-storage multi-stage Runge-Kutta method (second-order accurate) is applied for time-marching. LESOCC is highly vectorized (> 99.8%) and additionally parallelized by domain decomposition using MPI. The present simulations were carried out on the NEC SX-8 machine at HLRS Stuttgart using 8 processors of one node, leading to a total performance of about 52.7 GFlop/s. For details about vectorization and parallelization we refer to [4].

In order to investigate the effect of dimples, a classical wall-bounded flow often studied in the literature was considered, i.e., a turbulent plane channel flow at Re = 10,935. This test case has several advantages. With smooth non-dimpled walls the flow is homogeneous in the streamwise and spanwise directions. This allows the use of periodic boundary conditions in both directions and thus avoids the definition of appropriate inflow and outflow boundary conditions. The pressure gradient in the streamwise direction was adjusted such that a fixed mass flow rate was assured. Furthermore, no-slip boundary conditions were used at both walls.

Since the simulation cannot cover the entire experimental set-up, the computational domain consisted of a cutout of the channel maintaining periodic boundary conditions. Three cases were considered:

[L:] Plane channel with multiple dimples at the lower wall
[B:] Plane channel with multiple dimples at both walls
[R:] Plane channel without dimples, i.e., reference case with smooth walls

Two geometries of dimples were tested, both supplied by Inventors Network GmbH, the holder of the patents and representative of the Russian scientists. The results reported in the following refer to the smaller dimple geometry, which was provided as the first choice. The print diameter of these smaller dimples was D = 15 mm and the depth h = 0.75 mm. As shown in Figure 1, multiple shallow dimples (depth to print diameter of h/D = 0.05) were arranged regularly on the surface of the lower and/or upper channel walls.

Using block-structured curvilinear grids, two different grid resolutions were taken into account. The first, denoted coarse grid, consisted of about 5.5 million control volumes (CVs) and thus was not really coarse. For the second grid, denoted fine grid, the number of control volumes was exactly doubled in each direction. In total about 43 million CVs were used and the grid spacing in each direction was halved, yielding an excellent resolution of the viscous sublayer. For the cases with dimples the grid was clinging to the curvilinear geometry of the walls and thus was locally no longer Cartesian. To achieve reliable statistical data, the flow was averaged over long time intervals.

Figure 1: Channel wall with milled dimples
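The constant-mass-flux forcing mentioned in the set-up above is typically realized as a small feedback loop on a uniform streamwise body force. The toy sketch below (plain Python; the friction law, time step and target values are invented stand-ins, and none of this is LESOCC code) illustrates the idea: after each provisional update the force is corrected so that the bulk velocity returns exactly to its target.

```python
# Toy model of a constant-mass-flux controller for a periodic channel:
# a uniform body force f (playing the role of -dp/dx) is re-adjusted every
# time step so that the bulk velocity stays at its prescribed target value.

dt, n_steps = 1.0e-3, 2000
u_target = 1.0              # prescribed bulk velocity (fixed mass flow rate)
u_bulk = 0.8                # deliberately start off target
f = 0.0                     # driving body force

def wall_friction(u):
    return 0.5 * u * abs(u)   # made-up placeholder for the wall-shear losses

for step in range(n_steps):
    u_star = u_bulk + dt * (f - wall_friction(u_bulk))  # provisional bulk velocity
    df = (u_target - u_star) / dt                       # uniform force correction
    f += df
    u_bulk = u_star + dt * df                           # equals u_target by construction

print(f"bulk velocity = {u_bulk:.4f}, converged forcing = {f:.4f}")
```

In a real DNS the same correction is applied as a spatially uniform source term in the streamwise momentum equation, so the mass flow rate is held fixed while the required pressure gradient becomes a result of the simulation.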

Results
The skin-friction coefficients measured in the channel flow (see [1]) were plotted as a function of the Reynolds number. Figure 2 presents a summary of the experimental results for the smaller dimples together with data points derived from integration of the numerically obtained wall quantities. For comparison, data available in the literature (Dean, 1978; Moser et al., 1999) was added. The experimental skin-friction coefficients for the channel flow with smooth walls and with a dimpled wall on one side almost collapse. They differ by less than 1% through all the Reynolds number range considered. Therefore, it may be concluded that, surprisingly, the dimpled wall did not show any increase in pressure loss, but a significant decrease could also not be detected. The second dimple geometry performed even worse and resulted in a definite increase of the pressure loss in the channel flow. Additionally, an external flow test case was set up, i.e. a zero-pressure-gradient flat plate boundary layer flow. This again supports the conclusion drawn from the channel flow experiments, namely, that the dimples did not influence drag, neither to the better nor to the worse.

Figure 2: Measured and predicted friction coefficient Cf vs. Re; reference data: Dean correlation (1978) and DNS by Moser et al., 1999

More insight into this not very satisfactory result regarding the integral behavior of the dimpled surface compared to the flat walls was sought using CFD. Since only minor deviations were found between the results obtained on the coarse and the fine grid, in the following only the latter are discussed.

Figure 3 depicts the time-averaged pressure distribution obtained for case B using a regular arrangement of shallow dimples on both walls. The influence of the dimples on the pressure distribution on the lower wall can clearly be seen. It should be mentioned that the linear pressure gradient in the streamwise direction of the channel is not included in the figure. In Figure 4 the time-averaged wall-shear stress distribution is displayed. Obviously the wall-shear stresses decrease within the dimples, leading to a very small recirculation region at the falling edge (see Figs. 5 and 6). However, at the downstream edge of the dimples, where the fluid flow leaves the surface depression again, large values of the wall-shear stress are found.

Figure 3: Channel flow with multiple dimples at both walls, time-averaged pressure
Figure 4: Channel flow with multiple dimples at both walls, time-averaged wall-shear stress

In Figure 5 the streamlines at the lower walls defined by the average surface shear stress are displayed in conjunction with the distribution of the time-averaged spanwise velocity close to the wall. Upstream of the dimple and partially also in the dimple the streamlines converge. A tiny recirculation region, visible also in Figure 6, is found inside the dimple. The existence of this phenomenon is confirmed by the results on both grids. At the side and downstream of the recirculation region the streamlines diverge again.

Figure 5: Channel flow with multiple dimples at both walls, wall streamlines

Figures 6 and 7 display the time-averaged flow in an x-y midplane of one dimple. Based on the velocity vectors, the reduction of the velocity gradient near the wall, and thus of the wall-shear stress, is visible. On the other hand, it is obvious that the dimples lead to a modified pressure distribution compared with the case of a smooth wall. At both upstream and downstream regions of the dimple, where the flow either enters or leaves the surface depression, the pressure is slightly decreased. However, more important is the observation that the pressure increases on the rising edge of the dimple, yielding a contribution to the overall drag. The strongest flow structures in the time-averaged flow, visualized by iso-surfaces of λ2 used as a structure-identification method, are located at the upstream and downstream edges of the dimple, where the flow enters and leaves the dimple.

Figure 6: Zoom of the dimple region in an x-y midplane
Figure 7: Pressure distribution in the dimple in an x-y midplane
Figure 8: Iso-surfaces of λ2 = -0.05

In order to evaluate the effect on the drag resistance, histories of the forces acting on the lower dimpled wall of case B and on the non-dimpled smooth wall of case R were investigated. For the non-dimpled wall the situation is simple, since on this surface only the wall-shear stress leads to a skin-friction drag. This force also acts on the lower dimpled wall, where the time-average is marginally smaller on the dimpled wall than on the smooth wall. However, at the dimpled wall the pressure distribution on the wavy surface (Fig. 7) yields a pressure force in the main flow direction (about 5.5% of the total force) which contributes to the total drag force. Both effects – the slight decrease of the average shear force and the additional contribution of the pressure force – approximately compensate each other, so that there is no net drag reduction due to the dimples. In fact, the total drag for case B even increases by about 4.4% compared to the fine-grid reference case R. For the coarse grid, we found an increase of about 2.0% and 3.8%, respectively, for case L (one dimpled wall) and case B (two dimpled walls). Thus the effect of the dimples is approximately doubled for case B compared with case L, and an increase of about 4% is found for case B on both grids. Hence, these simulations confirm the experimental findings [1] that the present arrangement of dimples does not lead to drag reduction but rather to a slight increase of the total drag.
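In equation form (this force balance is implied by the discussion but not printed in the article; A_w denotes the wetted wall surface and n_x the streamwise component of its unit normal), the total streamwise force on a wall reads

$$ F_x \;=\; \int_{A_w} \tau_{w,x}\,\mathrm{d}A \;+\; \int_{A_w} p\, n_x\,\mathrm{d}A . $$

For the flat reference wall only the first (skin-friction) integral contributes, whereas on the dimpled wall the second (form-drag) integral amounts to roughly 5.5% of the total force and slightly outweighs the reduced shear contribution, which is why the net drag goes up instead of down.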
Conclusions
Based on the simulations (and the measurements [1]) described above, the question whether the present arrangement of dimples leads to drag reduction has to be answered in the negative. That is supported by both experimental measurements of an internal and an external flow as well as DNS of channel flows with dimpled and smooth walls. However, a significant increase of pressure loss was also not observed. Integrated over the entire surface, the contribution from the shear stresses is marginally decreased by the shallow dimples, but the effect on the drag is overcome by the newly appearing pressure force contribution. Concerning prospective applications of dimpled surfaces in engineering, this study implies that it is feasible to achieve an augmentation of heat transfer by shallow dimples without the significant additional pressure losses typically encountered with deep dimples.

Acknowledgments
This work was supported by the Deutsche Forschungsgemeinschaft under contract number BR 1847/9. The computations were carried out on the NEC SX-8 machine at HLRS Stuttgart (grant no.: DIMPLE/12760). All kinds of support are gratefully acknowledged.

References
[1] Lienhart, H., Breuer, M., Köksoy, C.: Drag Reduction by Dimples? A Complementary Experimental/Numerical Investigation, Int. J. of Heat and Fluid Flow, in press, 2008
[2] Breuer, M.: Large-Eddy Simulation of the Sub-Critical Flow Past a Circular Cylinder: Numerical and Modeling Aspects, Int. J. for Num. Methods in Fluids, vol. 28, pp. 1281-1302, 1998
[3] Breuer, M.: Direkte Numerische Simulation und Large-Eddy Simulation turbulenter Strömungen auf Hochleistungsrechnern, Habilitationsschrift, University of Erlangen-Nürnberg, Berichte aus der Strömungstechnik, ISBN: 3-8265-9958-6, 2002
[4] Breuer, M., Lammers, P., Zeiser, Th., Hager, G., Wellein, G.: Direct Numerical Simulation of Turbulent Flow Over Dimples - Code Optimization for NEC SX-8 plus Flow Results, High Performance Computing in Science and Engineering 2007, Transactions of the HLRS 2007, Oct. 4-5, 2007, HLRS Stuttgart, Germany, pp. 303-318, Springer, ISBN 978-3-540-74738-3, 2008

• Michael Breuer, University of Erlangen-Nürnberg, Institute of Fluid Mechanics

Turbulent Convection in large-aspect-ratio Cells

Convective turbulence is created in a fluid which is heated strongly from below. The resulting complex interplay between buoyancy and gravity is present in many systems, reaching from astro- and geophysical flows to chemical engineering and indoor ventilation (see e.g., [1]). Frequently the lateral extension of convective turbulence exceeds the vertical one by orders of magnitude. For example, atmospheric mesoscale layers and the Gulf stream evolve on characteristic horizontal lengths of 500 to 5,000 km in combination with vertical heights of about 1 to 5 km. The resulting aspect ratios are then 100 to 1,000.

The massively parallel supercomputing facilities which are available today allow us to study how the flow structures and the transport properties in convective turbulence change when the convection cell is successively extended in both horizontal directions, while keeping the vertical one fixed. Theoretically, this opens the possibility for the formation of pancake-like large-scale circulation patterns or bigger clusters of hot and cold temperature blobs. But is convective turbulence seizing this opportunity? Indeed, a very recent work by Hardenberg et al. [2] indicates that such structures can be formed. However, the lateral grid resolution and the degree of convective turbulence were still moderate in this simulation. Systematic numerical studies on the geometry dependence of turbulence are rare, since they become extremely numerically expensive. In particular, the large lateral grid extensions for the flat cells require computational resources which have become available with the advent of the Blue Gene systems.

Another important open question in this context is the dependence of transport properties on the aspect ratio. We can expect that the turbulent dispersion of tracer particles, such as aerosols in the atmosphere or phytoplankton in the upper ocean, will be different for the lateral and vertical directions. Recall that convective turbulence is inhomogeneous and anisotropic. Planets and stars rotate at different angular frequencies, i.e. in addition to the buoyancy and gravity effects Coriolis forces appear. The effect of rotation on the structure formation and the tracer transport is a further question which will be investigated in our massively parallel high-resolution simulations of turbulent convection.

Parallel Computation
Our direct numerical simulations solve the Boussinesq-Navier-Stokes equations by a pseudospectral method. At the core of this numerical scheme is the Fast Fourier Transform (FFT), which allows for the calculation of different terms of the nonlinear partial differential equations, either in physical or in wavenumber space, and also for rapid switching between the spaces. The classical parallel implementation of three-dimensional FFTs uses a slab-wise decomposition of the simulation domain. For a simulation with N³ grid points, the method allows a parallelization on up to N processors. The rather small memory size per core on Blue Gene and the large difference in the number of vertical and horizontal grid points require so-called volumetric FFTs which decompose the three-dimensional volume into iproc × jproc pencils and allow for a parallelization degree of N². The prime requirement for a simulation with a large grid is that the FFT algorithm should also be scalable, i.e. increasing the number of CPUs to solve the problem should also substantially decrease the time-to-answer [3]. Figure 1 shows the scaling properties of our simulation code for an increase in the number of CPUs by more than two orders of magnitude (see inset). The main figure also illustrates that different decompositions iproc × jproc of the total CPU number alter the performance significantly.
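The pencil idea can be illustrated with a small serial sketch (numpy only; this is not the production code or the P3DFFT library): the 3D transform is assembled from three passes of 1D FFTs, one per axis. In the parallel version each pass operates along the direction that is locally contiguous within a pencil, and an MPI all-to-all re-partitions the iproc × jproc pencils between the passes.

```python
import numpy as np

# Serial illustration of a "volumetric"/pencil-style 3D FFT: three passes of
# 1D transforms, one along each coordinate direction. In a distributed run an
# MPI all-to-all would redistribute the pencils between the passes (omitted).

rng = np.random.default_rng(1)
nx, ny, nz = 64, 64, 9            # flat cell: many horizontal, few vertical points
u = rng.standard_normal((nx, ny, nz))

v = np.fft.fft(u, axis=0)         # pass 1: transform along x (x-pencils)
v = np.fft.fft(v, axis=1)         # pass 2: transform along y (y-pencils)
v = np.fft.fft(v, axis=2)         # pass 3: transform along z (z-pencils)

print("matches the library 3D FFT:", np.allclose(v, np.fft.fftn(u)))
```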
Figure 1: Strong scaling tests of the production code. The cyan boxes mark different decompositions iproc × jproc for a given total number of CPUs. Inset: The number of CPUs has been varied from 16 to 2,048 on a smaller grid.

Results and Outlook
Figure 2 shows two instantaneous snapshots of the full convection cell at an aspect ratio of 8. In both cases, the computational grid contains 2,048 × 2,048 × 257 points. The production jobs for these parameter sets were run on 1,024 CPUs. An isosurface of the temperature close to the heating plane is shown. In addition, contour plots of the temperature on two side walls are displayed. The left picture for the non-rotating case indicates the formation of large-scale patches which are visible as ridges of the isosurface. They are known as thermal plumes which detach permanently from the thermal boundary layers at the top and bottom planes. These characteristic elements of convective turbulence can also be seen in the side planes. The situation changes drastically when strong rotation is present. The vertical transport of heat is arranged in columnar structures. Strong rotation thus prohibits the formation of large-scale lateral patterns in our convection cell.

Figure 2: Snapshot of the temperature field. An isosurface close to the heating plate and contour plots at two side planes are shown. Left: No rotation. Right: Strong rotation about the vertical axis.

Figure 3 (left) highlights large-scale temperature patterns which have formed for the non-rotating case. Contours of the vertically averaged temperature at one time instant are shown (red = hot; blue = cold). The figure also underlines clearly that the structures would not be observable in a cell of aspect ratio one, which is indicated by the black box. The strong rotation case (Figure 3, right) reflects again the vertical columnar arrangement of the turbulence as shown in Figure 2. An open question, that we are going to study in the future, is whether the large-scale filaments in the non-rotating case prevail when the driving of the turbulent motion by the vertical temperature difference becomes stronger.

Figure 3: Contours of the vertically averaged temperature field. The black box indicates a cell of aspect ratio one. The vertical direction points out of the plane. Left: No rotation. Right: Strong rotation.

As said before, an important aspect for our better understanding of convective turbulence is to unravel the local mechanisms of the transport of heat through the cell. The preferential choice is then to "go with the flow". This so-called Lagrangian description provides exactly this perspective on the turbulent fluid motion. The velocity and temperature fields are monitored from a local frame of reference which is co-moving with a tracer particle along the streamline. Here, we advect simultaneously up to 1.5 million of such tracers with the flow equations. In convective turbulence, it turns out that this motion becomes qualitatively different in the lateral directions compared to the vertical one. The lateral dispersion is found to be similar to isotropic turbulence and reveals the famous Richardson dispersion law [4].
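For reference (the scaling itself is not written out in the article), the Richardson dispersion law states that the mean square separation of tracer pairs grows cubically in time within the inertial range,

$$ \langle |\mathbf{r}(t)|^{2} \rangle \;\propto\; \varepsilon\, t^{3}, $$

where ε denotes the mean kinetic energy dissipation rate.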
Acknowledgements
The author thanks Mathias Pütz (IBM Germany) for his help to implement the P3DFFT package by Dmitry Pekurovsky (SDSC) and to optimize the simulation code on the Blue Gene/L. He also thanks Sonja Habbinga and Marc-André Hermanns from the JSC. Financial support by the Deutsche Forschungsgemeinschaft is acknowledged.

References
[1] Kadanoff, L. P.: Turbulent heat flow: Structures and scaling. Physics Today 54, 34-39, 2001
[2] Hardenberg, J. v., Parodi, A., Passoni, G., Provenzale, A., Spiegel, E. A.: Large-scale patterns in Rayleigh-Bénard convection, Physics Letters A, in press, 2008
[3] Schumacher, J., Pütz, M.: Turbulence in laterally extended systems, Proceedings of ParCo 2007, 585-592, 2007
[4] Schumacher, J.: Lagrangian dispersion and heat transport in convective turbulence, Phys. Rev. Lett., in press, 2008

• Jörg Schumacher, Technische Universität Ilmenau, Department for Theoretical Fluid Mechanics

Direct Numerical Simulation of Flame / Acoustic Interactions

Applications involving turbulent flows with chemical reactions play an essential role in our daily life. All our transportation systems and the overwhelming majority of our energy supply rely directly or indirectly on the combustion of fossil fuels. In order to improve the efficiency of such systems and to develop new approaches and configurations, numerical simulations play an increasing role in research and industry. They provide a cost-efficient complement to experimental testing and prototypes. But, of course, they are only useful if they can accurately reproduce the real physical processes controlling practical configurations.

Depending on the value of a key non-dimensional number, the Reynolds number, every flow can be predicted to be laminar or turbulent. Turbulent flows are found in most practical applications but are much more complex than laminar flows, being chaotic in nature. The most exact numerical description of a turbulent flow field is achieved using the so-called direct numerical simulation (DNS) approach, for which the Navier-Stokes equations are solved as exactly as possible on an extremely fine grid: DNS results are often called "numerical experiments". Nowadays, DNS is only possible for simple configurations and low-Reynolds-number flows even on large computing clusters, because this type of simulation requires enormous computational resources. Furthermore, DNS is associated with complex post-processing and visualization issues, due to the extremely large quantity of raw data delivered by such computations. Nevertheless, DNS has become an essential and well-established research tool to investigate the structure of turbulent flames, since it does not rely on any approximate turbulence models [1].

In the present project our DNS code π3C is employed to investigate different flames. This three-dimensional DNS code is a finite-difference solver written in FORTRAN 95 for high numerical efficiency. It solves directly the compressible Navier-Stokes equations plus supplementary transport equations for the total energy and for the mass fraction of each chemical component. Methane flames are considered in this study with a realistic description of all chemical processes, using 29 chemical species and 141 elementary chemical reactions. The modeling of the chemistry can be simplified when needed using a two-dimensional look-up table, built using the full mechanism [2,3]. In that case, species mass fractions and reaction variables are pre-computed and stored in a large database, used to speed up the computational procedure.

Using a realistic description of chemistry on a growing number of grid elements rapidly leads to a huge discretized equation system and to an enormous computation time. In this case parallel computations are absolutely necessary. The simulation time can be highly reduced by dividing the numerical domain into smaller sub-domains (a method called Domain Decomposition). Each processor of the parallel supercomputer is then responsible for its own sub-domain and exchanges information with its topological neighbours. The inter-processor communication relies on the Message-Passing Interface (MPI) communication library.
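A minimal sketch of this communication pattern is given below (Python with mpi4py for readability; the actual π3C solver is written in FORTRAN 95, and the file name, grid sizes and field values are purely illustrative).

```python
from mpi4py import MPI
import numpy as np

# Minimal 2D Cartesian domain decomposition with a halo exchange between
# topological neighbours. Launch with e.g. `mpirun -n 4 python halo_demo.py`.

comm = MPI.COMM_WORLD
dims = MPI.Compute_dims(comm.Get_size(), 2)           # process grid, e.g. 2 x 2
cart = comm.Create_cart(dims, periods=[True, True])   # periodic for simplicity
lower, upper = cart.Shift(0, 1)                       # neighbours along axis 0

n = 8                                                 # local sub-domain size
u = np.full((n + 2, n + 2), float(cart.Get_rank()))   # field with one halo layer

# exchange along axis 0: send my last interior row to the upper neighbour and
# receive the lower neighbour's row into my lower halo, then the reverse;
# the exchange along axis 1 works the same way (with contiguous column copies)
cart.Sendrecv(sendbuf=u[-2, :], dest=upper, recvbuf=u[0, :],  source=lower)
cart.Sendrecv(sendbuf=u[1, :],  dest=lower, recvbuf=u[-1, :], source=upper)

print(f"rank {cart.Get_rank()}: halo rows now hold ranks {u[0, 1]:.0f} and {u[-1, 1]:.0f}")
```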

Configuration
The ignition and initial development of a flame inside a turbulent flow is a problem of great interest, both from a fundamental (complex, multi-scale, fully coupled physical process) and from a practical (internal combustion engines, gas turbine re-ignition, security issues...) point of view. The flame is initially perfectly spherical, laminar and centred in the middle of the three-dimensional numerical domain. Initial conditions correspond to ambient pressure and temperature. As a result of the solution procedure, a turbulent, fully premixed flame expands with time into the numerical domain.

Each side of the computational domain is 1.0 cm long. This small size will be increased in the future by employing even more computer nodes. For the presented results the grid spacing is constant and uniform, equal to 33 µm. This leads to a computational grid with ca. 27 million grid points. Such a fine grid resolution is needed to describe correctly all turbulent structures (vortices) but also to resolve stiff intermediate (short-lived) chemical radicals, which exist only in very narrow reaction fronts.

Results
Various species can be used as an indicator of the flame front in a combustion process. Among them, one specific iso-surface of the main product (carbon dioxide, CO2) is retained here. A typical time evolution of the expanding turbulent premixed flame is shown in Figure 1. The impact of turbulence on the development of the initially spherical flame is clearly observed in this figure, leading to a considerable wrinkling and deformation of the flame front. As a result, the final structure indeed corresponds to a realistic turbulent flame. The first step of a standard post-processing is usually to extract the instantaneous flame front from the raw data, as illustrated in Figure 1. Many important modeling quantities can then be extracted from the DNS along the flame front. A dedicated Matlab-based library has been developed in our group for this purpose [4]. This post-processing delivers for example the local strain rate, as shown on top of the flame surface in Figure 1.

Figure 1: Successive positions of the flame front in time, corresponding respectively to t = 0, 0.3, 0.5, 0.7 ms (from left to right) after flame initialization. The flame surface has been colored by the tangential component of the strain rate.

Then, the unit vector locally perpendicular to the flame surface can be computed. The flame front curvature is given by the divergence of this vector. In Figure 2 (left) the local flame curvature is shown again on top of the instantaneous flame surface. As a complement, the local flame thickness has been also computed at the same time. Several definitions can be used to determine the flame thickness. Here, the thermal flame thickness, computed from the maximal temperature gradient, has been retained.
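As an illustration of this post-processing step, the sketch below (plain numpy, not the group's Matlab library; the grid spacing, field and flame radius are invented for a synthetic test) evaluates the curvature as the divergence of the unit normal n = ∇c/|∇c| of a scalar field c, and checks it against the analytic value for a sphere.

```python
import numpy as np

# Curvature of an iso-surface of a scalar field c on a uniform grid:
# n = grad(c) / |grad(c)|, curvature = div(n).

def curvature(c, dx):
    gx, gy, gz = np.gradient(c, dx)
    norm = np.sqrt(gx**2 + gy**2 + gz**2) + 1e-30       # avoid division by zero
    nxv, nyv, nzv = gx / norm, gy / norm, gz / norm
    return (np.gradient(nxv, dx, axis=0)
            + np.gradient(nyv, dx, axis=1)
            + np.gradient(nzv, dx, axis=2))

# synthetic test: a smooth spherical "flame" of radius R has curvature -2/R here
dx, R = 33e-6, 5e-4                                     # 33 micron spacing, 0.5 mm radius
x = (np.arange(64) - 32) * dx
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
r = np.sqrt(X**2 + Y**2 + Z**2)
c = np.tanh((R - r) / (3 * dx))                         # c = 1 inside, -1 outside

kappa = curvature(c, dx)
idx = np.unravel_index(np.argmin(np.abs(r - R)), r.shape)
print(f"curvature near the front: {kappa[idx]:.0f} 1/m (analytic -2/R = {-2/R:.0f} 1/m)")
```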

Figure 2: Instantaneous flame front at t = 0.7 ms, colored with local mean flame curvature (left) and flame thickness (right). Note that the angle of view is not the same as in Figure 1.

Figure 3: Instantaneous perturbation of the pressure field during the interaction of a turbulent flame with a Gaussian acoustic wave

Figure 2 (right) shows this local flame thickness on top of the instantaneous flame front. From Figure 2 a non-negligible positive correlation between curvature and thickness is already visible, which can easily be quantified in a further analysis. Such correlations should clearly be taken into account in simplified turbulent combustion models, such as those used for industrial configurations. This exemplifies how DNS results might be employed to investigate, check and eventually improve such models.

Summary
Using powerful parallel supercomputers, accurate physical models and efficient numerical techniques, complex three-dimensional turbulent flames can be computed as "numerical experiments" using Direct Numerical Simulation. Interesting information can be obtained in this way concerning, e.g., the modifications of the local flame structure induced by the turbulence, or concerning flame acoustics (Fig. 3). A further numerical optimization of the DNS code is now needed to employ efficiently an even higher number of nodes. In that manner larger numerical domains and more turbulent conditions (i.e., higher Reynolds numbers) will become accessible to such simulations.

Acknowledgement
The authors would like to thank the Leibniz Supercomputing Centre for providing access to its supercomputers (Project h1121). The financial support of the Deutsche Forschungsgemeinschaft (DFG) for the position of Dr. Hemdan Shalaby (Research Unit #486 "Combustion noise") has been essential for the success of this project.

References
[1] Hilbert, R., Tap, F., El-Rabii, H., Thévenin, D.: Impact of detailed chemistry and transport models on turbulent combustion simulations. Prog. Energy Combust. Sci. 30, 61–117, 2004
[2] Fiorina, B., Baron, R., Gicquel, O., Thévenin, D., Carpentier, S., Darabiha, N.: Modeling non-adiabatic partially premixed flames using flame-prolongation of ILDM. Combust. Theory Modelling 7, 449–470, 2003
[3] Janiga, G., Gicquel, O., Thévenin, D.: High-resolution simulation of three-dimensional laminar burners using tabulated chemistry on parallel computers. In: Proc. 2nd ECCOMAS Conference on Computational Combustion (Coelho, P., Boersma, B. J., Claramunt, K., Eds.), Delft, The Netherlands, 2007
[4] Zistl, C., Hilbert, R., Janiga, G., Thévenin, D.: Increasing the efficiency of post-processing for turbulent reacting flows. Comput. Visual. Sci., in press, 2008

• Gábor Janiga • Hemdan Shalaby • Dominique Thévenin, University of Magdeburg "Otto von Guericke", Laboratory of Fluid Dynamics and Technical Flows

Massively parallel single and multi-phase Flow in porous Media

The water exchange between the atmosphere and the terrestrial system is strongly influenced by water flow and phase transition processes at the soil surface and within the underlying vadose zone. Although this zone is a relatively small compartment of the biosphere, it hosts most terrestrial biological and life-supporting processes. Water distribution and dynamics within this zone on the micro-scale (the scale where the porous medium is resolved) determine the mass fluxes and the transport of substances at the macro-scale (which is related to the compartment). Thus a proper understanding of the processes on the micro-scale can help to improve the prediction of the behavior on the macro-scale and is extremely important for application fields like environmental engineering, the petroleum industry, irrigation, CO2 sequestration and many others.

Hysteresis
In multiphase flow systems of two immiscible phases, as e.g. air and water in a porous medium like a soil, the flow and transport properties depend on the amount and the spatial distribution of the phases within the pore space. On the pore scale the phase distribution is controlled by the capillary forces depending on pore size, surface tension and wettability. For that reason, the relationship between capillary pressure and liquid saturation (Pc-Sw relationship) is of high importance for the prediction of water flow and solute transport. Unfortunately, this relationship is ambiguous and depends on the preceding wetting and drainage processes. This phenomenon is denoted as hysteresis and was first documented in [7]: It is caused by different pore structures relevant for drainage and wetting processes. While the drainage of a large pore body may be prevented by surrounding small pore throats, the wetting of fine pores above a large pore is hampered by the weak capillary forces in the wide body. Additional hysteresis effects are caused by a difference in advancing and receding contact angles and the inclusion of air in a first wetting process. Due to the inclusion of air the porous medium cannot be completely re-saturated after the first drainage.

Computational Approach

Lattice Boltzmann Methods
In the last two decades the Lattice Boltzmann Method (LBM) has matured as an alternative and efficient numerical scheme for the simulation of fluid flows and transport problems. Unlike conventional numerical schemes based on discretizations of macroscopic continuum equations, the LBM is based on microscopic models and mesoscopic kinetic equations. The fundamental idea of the LBM is to construct simplified kinetic models that incorporate the essential physics of microscopic or mesoscopic processes in a way that the macroscopic averaged properties obey the desired macroscopic equations. The scheme is particularly successful in problems where transport is described by an advective and diffusive operator. Especially applications involving interfacial dynamics, complex and/or moving boundaries and complicated constitutive relationships which can be derived from a microscopic picture are suitable for the LBM. We developed an LBM optimized for incompressible multiphase Stokes flow problems. A detailed description of the model is given in [1,2]. The simulation kernel called "Virtual Fluids" [4], developed at the Institute for Computational Modeling, uses block-structured grids and the parallelization follows a distributed memory approach using the Message Passing Interface (MPI).
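To make the update rule concrete, here is a generic, textbook-style D2Q9 BGK collide-and-stream step in plain numpy. It only illustrates the single-phase LBM idea on a periodic box and is unrelated to the actual "Virtual Fluids" kernel; relaxation time, grid size and the tiny forcing are arbitrary choices.

```python
import numpy as np

# D2Q9 lattice: discrete velocities c_i and weights w_i
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)
tau = 0.8                                       # BGK relaxation time (sets viscosity)

nx, ny = 64, 64
f = np.ones((9, nx, ny)) * w[:, None, None]     # start from rest: rho = 1, u = 0

def equilibrium(rho, ux, uy):
    cu = c[:, 0, None, None] * ux + c[:, 1, None, None] * uy
    usq = ux**2 + uy**2
    return w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

for step in range(100):
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho + 1e-6  # crude body-force kick
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    f += -(f - equilibrium(rho, ux, uy)) / tau               # collision (relaxation)
    for i in range(9):                                       # streaming to neighbours
        f[i] = np.roll(f[i], shift=(c[i, 0], c[i, 1]), axis=(0, 1))

print("mass conserved:", np.isclose(f.sum(), nx * ny), "| mean ux:", float(ux.mean()))
```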

Tomography and Image Processing
To obtain the geometry of a real porous medium we scanned a sand sample of 1.5 cm in diameter and 1 cm in height using X-rays from a synchrotron source. The tomography was carried out at the Hamburger Synchrotron Laboratories (HASYLAB) in Germany. The particle size of the sand material ranged from 0.1 to 0.5 mm. Based on the imaged X-ray attenuation the density distribution of the solid material can be reconstructed. The reconstructed density map was segmented into a black-and-white image of pore space and solid phase with a voxel resolution of 11 µm. Figure 1 shows the reconstructed porous medium. Details can be found in [6].

Grid Generation
Direct computations of flows through porous media in the literature so far are based on binarized porous media data mapped to uniform Cartesian grids. The tomographic voxel set is directly used as the computational grid and therefore the geometrical representation is usually only first-order accurate due to staircase patterns. We pursue a more elaborate approach, where the geometry is obtained as follows: Starting from a highly resolved tomographic data set we use an isocontour algorithm (marching cubes) to reconstruct the surface of the porous medium as a set of planar triangles. Then the numerical resolution of the Cartesian grid for the simulation can be chosen independently from the voxel set. Details can be found in [5]. Figure 1 shows a detail of the triangulated surface of the porous medium. The complete geometry consists of 110 million triangles.

Figure 1: Left: Geometry obtained by an X-ray scan of a porous medium measured at HASYLAB and image processing methods. Right: Detail of the reconstructed surface of the porous medium.

Simulation
Only a few studies have reported simulations of multiphase flow in three-dimensional porous media, mainly because of computational limitations. Only with the computational resources delivered by the SGI Altix 4700 at the LRZ was it possible at all to perform meaningful simulations for such a complex problem. To compute the hysteretic Pc-Sw relationship the numerical simulation is conducted as follows. Initially the entire pore space is filled with the wetting phase (water). A time-dependent suction is applied at the bottom, while the top of the sample is connected to a non-wetting phase reservoir (air). The numerical grid resolution for the different simulations was in the range of 200³ to 800³. In Figure 2 a snapshot of a hysteresis simulation is given. Air infiltrates at the top of the porous medium which is filled with (non-visible) water. Many other simulations have been performed and the results have been compared to other modeling approaches (e.g. a morphological pore network model) and to experimental data and show a very good agreement [2].

Figure 2: Simulation of air (red) penetration from the top into an initially water (invisible) saturated porous medium

Computational and parallel Performance
The LBM is an explicit time stepping scheme, which operates on regular grids and usually relies only on next-neighbor interaction. Therefore the method is very well suited for massively parallel systems. Also the single-core performance is well exploited with LBM. The performance P of an LB implementation is usually measured in Lattice Updates per Second (LUPS), which states how many lattice nodes can be updated in one second. For a single core we obtain P = 3.33E6 LUPS (in single precision), which corresponds to 37% of the maximum possible performance.

In Figure 3 the parallel efficiency for a single-phase simulation and different grid sizes in a simple regular geometry with high porosity (0.9) is shown. Even for a relatively small grid size of 240³ we obtain a very good scaling behavior. Up to a number of 1,000 cores we note a super-linear scaling due to caching effects. In general a very good scaling behavior of the code is observed. Using 4,080 cores for a grid of 4,000³ we obtain a parallel efficiency of 92%. The performance of a parallel multiphase simulation in a real porous medium depends strongly on the local porosity of the sub-domains. The smaller the variation of the local porosities in the sub-domains, the better the load-balancing. For a typical simulation we used 512 cores and a partition of 8×8×8. The minimum and maximum porosity for one partition was 0.25 and 0.53. Nevertheless the parallel efficiency for this problem (70%) was still reasonable.

Figure 3: Parallel efficiency of "Virtual Fluids" for a regular single-phase problem and different grid sizes
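For completeness, the two performance measures used above can be written as (these definitions are assumed here, not quoted from the article)

$$ P_{\mathrm{LUPS}} = \frac{N_{\mathrm{nodes}} \cdot N_{\mathrm{steps}}}{t_{\mathrm{wall}}}, \qquad E(N) = \frac{T(N_{\mathrm{ref}}) \cdot N_{\mathrm{ref}}}{T(N) \cdot N}, $$

i.e. lattice-site updates per second of wall-clock time, and speedup relative to a reference run divided by the core count; values of E above one, as reported for up to 1,000 cores, point to cache effects rather than measurement error.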

Outlook
Presently we are working on algorithms to simulate free surface problems with mesh refinement. Here we follow an adaptive block-wise strategy, where cuboids containing a fixed number of nodes are the leaves of a hierarchical octree data structure and are dynamically (de-)allocated if needed [3].

The LBM is also very well suited for acoustics problems. The application field we are currently investigating is the determination of sound absorption properties of road surfaces. The geometry is again obtained by tomography methods. Since the number of time steps needed for an acoustics problem is much smaller than for a hydrodynamic simulation, we can run simulations on very large grids and thus expect to be able to solve complex real-life problems with sufficient accuracy.

References
[1] Tölke, J., Freudiger, S., Krafczyk, M.: An adaptive scheme for LBE Multiphase Flow Simulations on hierarchical Grids, Computers & Fluids, Volume 35(8-9): 820-830, 2006
[2] Ahrenholz, B., Tölke, J., Lehmann, P., Peters, A., Kaestner, A., Krafczyk, M., Durner, W.: Prediction of capillary hysteresis in porous material using lattice Boltzmann methods and comparison to experimental data and a morphological pore network model, acc. for publ. in Advances in Water Resources
[3] Freudiger, S., Hegewald, J., Krafczyk, M.: A parallelization concept for a multi-physics lattice Boltzmann prototype based on hierarchical Grids, Progress in Computational Fluid Dynamics, in press
[4] Geller, S., Krafczyk, M., Tölke, J., Turek, S., Hron, J.: Benchmark computations based on lattice Boltzmann, Finite Element and Finite Volume Methods for laminar flows, Computers & Fluids, Volume 35(8-9): 888-897, 2006
[5] Ahrenholz, B., Tölke, J., Krafczyk, M.: Second-order accurate lattice Boltzmann flow simulations in reconstructed porous media, International Journal of Computational Fluid Dynamics, 20 (6): 369-377, 2006
[6] Kaestner, A., Lehmann, E., Stampanoni, M.: Imaging and image processing in porous media research, acc. for publ. in Advances in Water Resources
[7] Haines, W.: Studies in the physical properties of soils: V. The hysteresis effect in capillary properties, and the modes of moisture distribution associated therewith, Journal of Agricultural Science, 20: 97-116, 1930

• J. Tölke¹ • B. Ahrenholz¹ • M. Krafczyk¹ • P. Lehmann²
¹ Technische Universität Braunschweig, Institute for Computational Modeling in Civil Engineering
² Ecole Polytechnique Fédérale de Lausanne, Laboratory of Soil and Environmental Physics

Unravelling the Mechanism How Stars Explode

Understanding how massive stars end their lives in supernova explosions is one of the most important problems in stellar astrophysics. With the most sophisticated computer simulations carried out to date, researchers at the Max-Planck-Institute for Astrophysics in Garching are making progress in deciphering the complex interplay of hydrodynamics and particle physics that reverses the collapse of the stellar core and initiates the violent disruption of the star.

Astrophysical Context
Roughly five times a second, somewhere in the universe the life of a star with more than eight times the mass of our sun is terminated in a gigantic supernova explosion. These cosmic catastrophes are the most violent events after the Big Bang. Within only fractions of a second more energy is released than the sun produces during all its evolution. The hot plasma of the destroyed star, expanding into interstellar space with a tenth of the speed of light, can outshine a whole galaxy for several weeks.

It is these cataclysmic blasts which we owe our existence to. They enrich the galaxy with carbon, oxygen, silicon, and iron, the building blocks of the earth and of all the creatures on it. Assembled during millions of years of nuclear burning as the dying star has been aging, or forged in the inferno of its final explosion, these chemical elements are disseminated into the galactic medium when the star is disrupted in the supernova event.

Observations of supernova remnants tell us that not all the stellar gas is ejected in the outburst. The core of the star, with the size of the moon but more massive than the sun, collapses to an ultracompact object, a neutron star, which has only the diameter of a big city. Such a neutron star is still visible as a point source in many of the gaseous clouds that are left behind as heralds of past explosions.

The Supernova Puzzle
The gravitational binding energy released in the neutron star formation is a hundred times more than needed for powering the supernova. But how does this happen? How is the implosion of the stellar core reversed to the explosion of the overlying layers of the star? This is still a puzzle that challenges theorists' intuition and modeling abilities. One of the most popular ideas involves neutrinos as the driving agents. These elementary particles are produced in huge numbers at the extreme conditions in the newly formed neutron star, where the matter is denser than in atomic nuclei and reaches temperatures of several hundred billion degrees. Neutrinos are the leak by which the collapsing stellar core loses its gravitational binding energy. But some fraction of the emitted neutrinos is still able to deposit its energy in the matter surrounding the compact neutron star (see Fig. 1). This energy transfer could be powerful enough to accelerate the supernova shock front and to expel the overlying shells of the star. The question whether this happens or not is a central problem in supernova theory.

Figure 1: Neutrinos, radiated from the newly formed neutron star at the centre of a collapsing star, deposit the energy to drive the shockwave that causes the supernova explosion of the star.

Goals of this Project
Answering this question would mean a major breakthrough in stellar astrophysics. It would not only allow us to better link the properties of supernova explosions and their remnants to the different types of progenitor stars. It would also bring us closer to an answer of the question whether supernovae are the still mysterious source of rare elements like gold, lead, thorium, and uranium. And it would allow us to more reliably predict the neutrino and gravitational wave signals, which are planned to be measured for future galactic supernovae by a new generation of big experiments, and which are the only ways to observationally probe the processes deep inside the core of a dying star. Until then, computer models are an indispensable tool for promoting our theoretical insight. At Garching, we have developed numerical codes that allow us to perform, with unprecedented accuracy, simulations of the complex particle physics, nuclear physics, and plasma dynamics that determine the destiny of collapsing and exploding stars.

Computational Challenges
The modeling of supernova explosions is in fact one of the most difficult problems in computational astrophysics.

26 Spring 2008 • Vol. 6 No. 1 • inSiDE Spring 2008 • Vol. 6 No. 1 • inSiDE 27 neutrino interactions, nuclear re- tion timescale of neutrinos in neutron Recent Progress actions, turbulent convection, and star matter is extremely short and With the most sophisticated simula- sound wave propagation in different the neutrino propagation happens with tions carried out to date, we have regions of the collapsing core and of the speed of light after decoupling, the learned that the neutrino energy the ejected outer layers of explod- nonlinear transport equations of neu- deposition around the newly formed ing stars. This is computationally trinos, which as fermions are subject neutron star is supported by different extremely demanding: half a second to phase-space blocking effects, need fluid instabilities that take place in the of evolution requires 500,000 time to be solved with fully implicit time gas flow that continuously adds more steps and in two spatial dimensions stepping. In our current numerical im- matter from the collapsing stellar core with 500-1,000 radial zones and plementation this leads to big, densly to the compact remnant at the cen- typically 196 lateral bins needs some filled matrices that have to be inverted tre. The neutrino-heated gas is stirred 1018 floating point operations. several times on every time level of up by vigorous convective overturn the calculated evolution. This is com- as hot matter becomes buoyant and Besides integrating the Euler equa- putationally very expensive and defies begins to rise while cooler fluid sinks tions that describe the time-dependent easy parallelization. New algorithms, inward and is partly absorbed into the motion of the stellar fluid, one needs based on iterative multigrid solvers for neutron star (Fig. 2). to solve the transport of neutrinos hyperbolic systems of equations, are Applications in the dense stellar matter. Different currently under construction but not In addition to this phenomenon, which Applications from the stellar gas, where very fast yet available for full-scale supernova has been known already for ten years, electromagnetic and strong interac- calculations. Their use, however, will we have discovered a global non-radial tions ensure that equilibrium is estab- be unavoidable in future three-dimen- instability of the gas flow towards the lished on dynamical timescales, neu- sional models of supernovae. centre. The layer between neutron trinos couple with matter only through star and supernova shock front is weak reactions. Thus they require a The special needs of our current two- unstable by a so-called advective- transport treatment in phase space dimensional simulations, i.e., shared- acoustic cycle, which constitutes an by the Boltzmann equation and its mo- memory nodes (up to 256 CPUs) amplifying feedback cycle of inward ment equations. This constitutes a with powerful processors and the advected vorticity perturbations and high-dimensional, time-dependent prob- continuous availability of these nodes outward propagating sound waves. lem – in spherically symmetric models for many months, are satisfied on This instability can grow even in condi- the transport is three-dimensional, for different supercomputing platforms tions where convection remains weak axisymmetric models five-dimensional, in all three national supercomputing and it can instigate violent secondary and in full generality six-dimensional – centres, each of which has required convection (Fig. 3). 
It thus improves and poses the major computational special code adaptation and optimiza- the conditions for ongoing strong neu- challenge: the neutrino transport con- tion: on the NEC SX-8 of the Höchst- trino energy transfer to the supernova sumes the by far dominant amount of leistungsrechenzentrum Stuttgart shock. In fact, our simulations show CPU time during supernova simula- (HLRS), on the sgi Altix 4700 of the that for stars from about eight solar tions. Leibniz-Rechenzentrum (LRZ) in Mu- masses to at least 15 solar masses nich, and on the IBM p690 Jump of neutrino energy deposition, supported Figure 2: The problem is also hard to be imple- the John von Neumann Institute for to different extent by both hydrody- Gas entropy (left half) mented efficiently on massively parallel Computing (NIC) in Jülich. Our project namic instabilities, can initiate and and electron-to-nucle- computers. In particular the neutrino also receives support by CPU time drive the supernova explosion (Figs. 2 on ratio (right half) during the early stages transport module has resisted such within the AstroGrid-D as part of the and 3). (0,097, 0,144, 0,185, efforts so far and currently prevents D-Grid initiative that is funded by the and 0,262 seconds) us from using distributed memory German Federal Ministry of Education of the explosion of a architectures. Because the interac- and Research (BMBF). star with nine solar masses. Convection causes anisotropies of the ejected gas.

28 Spring 2008 • Vol. 6 No. 1 • inSiDE Spring 2008 • Vol. 6 No. 1 • inSiDE 29 The onset of the explosion thus turns Outlook References out to be a generically multi-dimensional Despite the significant progress of our [1] Buras, R., Janka, H.-Th., Rampp, M., Kifonidis, K. phenomenon. The highly aspherical ini- fundamental understanding of the pro- Two-dimensional hydrodynamic core-col- tiation of the blast (Fig. 3), even in the cesses that conspire in starting the lapse supernova simulations with spectral absence of rapid rotation, suggests explosion, many more simulations and neutrino transport. Models for different explanations for a variety of observa- in particular long-time simulations are progenitor stars, Astronomy & Astrophys- ics, 457, 281-308, 2006 tions. The fast motions of many young needed to establish the properties of neutron stars, which are measured to self-consistently calculated explosions. [2] Janka, H.-Th., Marek, A., Müller, B., have average velocities of several hun- The different structures of stars with Scheck, L. Supernova explosions and the birth of dred kilometers per second, some of different masses require studies of a neutron stars, in Proceedings of Interna- them even of more than 1,000 km/s, wider range of supernova progenitors. tional Conference 40 Years of Pulsars: can be explained by the recoil imparted And the incomplete knowledge of the Millisecond Pulsars, Magnetars, and More, to the compact remnant by the aniso- initial conditions (e.g., of the angular McGill University, Montreal, Canada, August 12-17, 2007, Eds. Wang, Z., Bassa, C., tropically ejected supernova gas. The momentum in the stellar core) and of Cumming, A., Kaspi, V., AIP Conference asymmetry of the developing explosion various aspects of the microphysics Proceedings, Vol. 983 (American Institute also triggers large-scale mixing instabil- (e.g., of the equation of state of hot of Physics, New York), 369-378, 2008; ities in the outer layers of the disrupted neutron star matter) make it neces- arXiv:0712.3070 Applications Applications star, thus accounting for the clumpi- sary to explore the full range of vari- [3] Kitaura, F.S., Janka, H.-Th., Hillebrandt, W. ness seen in many supernovae and the ability. Ultimately, three-dimensional Explosions of O-Ne-Mg cores, the Crab asymmetric appearance of gaseous simulations will have to be performed supernova, and subluminous Type II-P supernovae, Astronomy & Astrophysics, supernova remnants. to confirm our findings of the present 450, 345-350, 2006 two-dimensional models. But for that to be possible, we still need to wait for [3] Marek, A., Janka, H.-Th. the next generations of more powerful Delayed neutrino-driven supernova explo- sions aided by the standing accretion-shock supercomputers, and we have to ad- instability, Astrophysical Journal, submitted; vance our modeling tools to massively arXiv:0708.3372, 2007 parallel application. • Hans-Thomas [3] Scheck, L., Kifonidis, K., Janka, H.-Th., Müller, E. Janka Multidimensional supernova simulations • Andreas Marek with approximative neutrino transport. • Bernhard Müller Neutron star kicks and the anisotropy of neutrino-driven explosions in two spatial

dimensions, Astronomy & Astrophysics, Max-Planck-Institut 457, 963-986, 2006 für Astrophysik, Garching

Figure 3: Gas entropy for the beginning explosion of a star with eleven solar masses. The explosion becomes globally asymmetric because of the generic nonradial instability of the developing super- nova shock.

30 Spring 2008 • Vol. 6 No. 1 • inSiDE Spring 2008 • Vol. 6 No. 1 • inSiDE 31 The Aquarius Project: inversely proportional to the square root grouped, multipole moments are calcu- of the density, so simulating a CDM halo lated for each node, and then the force Cold Dark Matter under means dealing with a system where on each particle is obtained by approxi- different regions evolve on timescales mating the exact force with a sum over a Numerical Microscope which may differ by factors of thou- multipoles. A great strength of the tree sands. Codes with spatially-dependent, algorithm is the near insensitivity of its adaptive timestepping are mandatory performance to clustering of matter, its A major puzzle in Cosmology is that the on the HLRB II supercomputer at LRZ, otherwise the most rapidly evolving re- ability to adapt to arbitrary geometries of main matter component in today‘s Uni- we aim to do this by studying the highly gions, which usually include only a tiny the particle distribution, and the absence verse appears to be a yet undiscovered nonlinear structure of CDM halos in fraction of the mass, force timesteps of any intrinsic resolution limitation. elementary particle whose contribution unprecedented detail. We are especially so short that the calculation grinds to to the cosmic density is more than 5 interested in the innermost regions of a halt. However, there are actually faster times that of ordinary baryonic matter. these halos and in their substructures, methods to obtain the gravitational This Cold Dark Matter (CDM) interacts where the density contrast exceeds 106 A second challenge stems from the fields on large scales. In particular, the extremely weakly with regular atoms and the astrophysical consequences of highly clustered spatial distribution of particle-mesh (PM) approach based and photons, so that gravity alone has the nature of dark matter may be most matter which affects in particular the on Fourier techniques is probably the affected its distribution since very early clearly apparent. Quantifying these con- scalability of parallel algorithms. A CDM fastest approach to calculate the gravi- Applications times, when the Universe was in a sequences reliably through simulations halo is a near-monolithic, highly con- tational field on a homogeneous mesh. Applications nearly uniform state. is, however, an acute challenge to nu- centrated structure with a well-defined The obvious limitation of this method is merical technique. centre and no obvious geometrical de- that the force resolution cannot be bet- When the effects of the baryons can composition which can separate it into ter than the size of one mesh cell, and be neglected, the nonlinear growth of The numerical Challenge the large number of computationally the latter cannot be made small enough structure is a well-posed problem where In the Aquarius Project, we have per- equivalent domains required for optimal to resolve all the scales of interest in both the initial conditions and the evolu- formed the first ever one-billion particle exploitation of the many processors cosmological simulations. In fact, in our tion equations are known. In fact, this simulation of a Milky Way-sized dark available in high-performance parallel application we would need a mesh with is an N-body problem par excellence. matter halo, improving resolution by architectures. 
In addition, gravity couples (10,000.000)3 cells to deliver the de- The faithfulness of late-time predictions a factor of more than 15 relative to the dynamics of matter throughout sired resolution with a single PM mesh is limited purely by numerical tech- previously published simulations of this the halo and beyond, requiring efficient - storing such a mesh would require nique and by the available computing type. The achieved mass resolution of communication between all parts of the several million petabytes! resources. Over the past two decades, ~1,700 solar masses is nearly a million simulated region. In our calculation, the simulations have already been of tre- times better than that of the largest clustering is so extreme that about one GADGET therefore uses a compromise mendous importance for establishing cosmological simulation of large-scale third of all simulation particles collect in between the two methods. The gravita- the viability of the CDM paradigm. In structure formation carried out to date a region that encompasses less than a tional field on large scales is calculated particular, simulation predictions for the (the “Millennium Simulation”). Our spatial fraction of 10-8 of the simulated volume! with a Particle-Mesh (PM) algorithm, distribution of matter on large scales resolution reaches 20 parsec, which while the short-range forces are de- have been compared directly with a implies a dynamic range of close to 107 Calculation Method and livered by the tree, such that a very wide array of observations; so far the per dimension within the simulated box- Parallelization Techniques accurate and fast gravitational solver paradigm has passed with flying colors. size of more than 400 million lightyears To make the Aquarius Project possible results. A central role in our parallel across. This huge dynamic range makes on the HLRB II, we have developed a code is played by the domain decompo- Given CDM‘s success in reproducing our simulation a microscope for the major new version of our simulation sition. It has to split the problem into the main aspects of the large-scale phase-space structure of dark matter code, GADGET-3, in order to improve smaller parts without data duplication, structure of the Universe, it is impor- and enables dramatic advances in our scalability and performance for this in a way that ensures a good balance tant to test its predictions on smaller understanding of the structure and sub- extremely tightly coupled problem. of the computational work induced for scales, both to test it further and to structure of dark matter in our galaxy. GADGET uses a hierarchical multipole each processor. GADGET uses a space- seek clues to the nature of dark mat- However, formidable challenges had to expansion (organized in a “tree”) to filling self-similar fractal, a Peano-Hilbert ter. In the Aquarius Project carried out be overcome to make this calculation calculate gravitational forces. In this curve, for this purpose, which is made by the international Virgo Consortium possible. Gravitational timescales are method, particles are hierarchically to become finer in high-density regions.

32 Spring 2008 • Vol. 6 No. 1 • inSiDE Spring 2008 • Vol. 6 No. 1 • inSiDE 33 The domains themselves are then gen- An impression of the dynamic range of regime where the local logarithmic slope Thanks to the powerful HLRB II com- erated by cutting the one-dimensional the simulation is given in Figure 1. What of the density profile of the dark matter puter, the Aquarius Project produced

space-filling curve into Ncpu pieces that is readily apparent is the fascinating rich- cusp becomes shallower than -1. The the best resolved calculation of the approximately induce the same compu- ness of dark matter substructure re- structure of the cusp is of fundamental Milky Way‘s halo carried out worldwide, tational work-load. The domains are of vealed by the simulation, which resolves importance for our understanding of the and the ongoing analysis delivers many nearly arbitrary shape but always have a several hundred thousand gravitational CDM model, but has remained a highly new insights for theoretical astrophys- relatively small surface-to-volume ratio. bound clumps of matter that orbit within contentious issue up to now. Our results ics. However, computational astrophys- the galaxy‘s potential. However, this ex- demonstrate convincingly that an asymp- ics in this area has still many exciting First Results traordinary dynamic range comes at a totic power law of fixed slope apparently challenges in store for the future. As part of our Aquarius Project on the price. More than 100,000 timesteps on does not exist. Instead, the profile con- Ultimately we would like to repeat our HLRB II, we have carried out extensive 1,024 cores of the HLRB II and about tinues to become gradually shallower at calculation by not only including the dark numerical resolution tests where we 3 TB of RAM were required to carry ever smaller radii. matter, but also the ordinary baryons systematically increased the particle out the simulation. In sum, the total that make up the stars, at similar or number used to simulate the same CPU-time needed for completion of the Our simulations also provide the first ac- even better resolution. Carrying out galaxy. Our primary simulation series C02-2400 calculation was nearly 4 mil- curate and numerically converged mea- such ultra-highly resolved hydrodynami- culminated in our largest production lion hours. The output produced forms a surement of the density profile of dark cal cosmological simulations will require calculation, which we refer to as C02- rich dataset of 45 terabytes in size, and matter substructures, and they deliver further progress in the scalability of our Applications 2400. This simulation followed about will provide an extremely valuable scien- precise predictions for the abundance codes, and the use of yet larger num- Applications 4,5 billion particles, from a time briefly tific resource for many years to come. of dark matter substructures. Using bers of compute cores. after the Big Bang to the present the simulation, we can obtain accurate epoch, over more than 13 billion years Figure 2 shows spherically averaged determinations of the dark matter an- References of cosmic evolution. About 1,3 billion density profiles obtained for the differ- nihilation signal expected from the Milky [1] Barnes, J., Hut, P. A Hierarchical O(N logN) Force-Calculation particles end up in the virialized region ent resolutions that we calculated for Way‘s halo, which becomes potentially Algorithm, 1986, Nature, 324, 446 of a single, Milky-Way-sized galaxy, our “C02” Milky Way halo. The conver- measurable with the launch of the GLAST • Volker Springel1 opening up a qualitatively new regime gence is excellent over the entire range gamma-ray satellite later this year. The [2] Davis, M., Efstathiou, G., Frenk, C. S., • Simon D.M. for studying the non-linear phase-space where convergence can be expected simulation will also help to improve our White, S. D. M. 
White1 The evolution of large-scale structure in structure of dark matter halos. based on the numerical parameters of understanding of galaxy formation and, in • Julio Navarro2 a universe dominated by cold dark matter, the simulations. For the first time, our particular, the role and physics of satel- • Adrian Jenkins3 1985, Astrophysical Journal, 292, 371 simulation series probes directly into a lite galaxies in the Milky Way‘s halo. • Carlos S. Frenk3 [3] Diemand J., Kuhlen M., Madau P. • Amina Helmi4 Dark Matter Substructure and Gamma-Ray 109 Annihilation in the Milky Way Halo, Astro- 1 z = 0.0 Max-Planck- 8 physical Journal, 657, 262, 2007 10 Institute for 3 [4] Navarro J. F., Frenk C. S., White S. D. M. 107 C02 - 2400 Astrophysics, The Structure of Cold Dark Matter Halos, 3 6 C02 - 1200 Garching 10 Astrophysical Journal, 462, 563, 1996

3 density 5 C02 - 800 10 [5] Springel, V. 2 University of

3 The cosmological simulation code GADGET-2, 104 C02 - 400 Victoria Monthly Notices of the Royal Astronomical 3 103 C02 - 200 Society, 364, 1105, 2005 3 University of 102 [6] Springel, V., White S. D. M., Jenkins A., 0.1 1.0 10.0 100.0 Durham, r [ h-1 kpc ] et al. Simulations of the formation, evolution and Institute for clustering of galaxies and quasars, Nature, Computational Figure 1: Dark matter distribution in the Figure 2: Spherically averaged density profi les of 435, 629, 2005 Cosmology C02-2400 halo at the present time, showing simulations of the same object carried out at different

the virialized region of the main halo and its numerical resolution within the Aquarius Project. The 4 immediate surroundings. The smaller picture dashed vertical lines mark the gravitational resolution University of enlarges the marked region by a factor of 4. limit of the individual calculations. Groningen, Kapteyn Astronomical Institute 34 Spring 2008 • Vol. 6 No. 1 • inSiDE Spring 2008 • Vol. 6 No. 1 • inSiDE 35 IRMOS – Interactive Real-time Multimedia Applications on Service-oriented Infrastructures

Traditionally, “real-time” refers to hard service oriented infrastructure that en- business) to allow complete end- • Toolbox and best practices for real-time systems, where even a single ables the real-time interaction of a dis- to-end solutions to be built using real-time interactive applications violation of the desired timing behaviour tributed set of people and resources. a single infrastructure. is not acceptable, for example because • Semantic descriptions and modelling it leads to total failure, possibly causing IRMOS is set apart from today’s Each of these benefits will be demon- of application characteristics loss of human lives. However, there is Service Oriented Infrastructures strated by using a range of applications also a wide range of applications that through the following key features: including collaborative film post produc- • Contribution to standardisation bodies also have stringent timing and per- tion, virtual and augmented reality ap- formance needs, but for which some 1) IRMOS will make it possible to plication and interactive online learning At the end IRMOS provides a framework deviations in Quality of Service (QoS) distribute interactive real-time using shared virtual environments. that eases usage of services, either Projects are acceptable, provided these are well applications across organizational This broad range of demonstrators will pure software-based or hardware-re- Projects understood and carefully managed. boundaries, instead of having to use enable the benefits of IRMOS to be pro- lated, across organizational boundaries. These are soft real-time applications dedicated, expensive and collocated moted to the widest possible audience For example it provides the possibility and include a broad class of interactive hardware at a single site. and achieve significant impact in the to SMEs to gain temporary access to and collaborative tools and environ- industrial, civil and educational sectors. specialized services on a rental basis ments, including concurrent design and 2) Businesses will be able to come visualization in the engineering sector, together quickly and efficiently using media production in the creative indus- IRMOS to identify, agree and deliver tries, and multi-user virtual environ- real-time applications without the ments in education and gaming. need for protracted manual negotia- tions or service provisioning. Soft real-time applications are tradi- tionally developed without any real-time 3) Providers will be able to deliver methodology or run-time support from services that are cost effective and the infrastructure on which they run. have guaranteed Quality of Service The result is that either expensive and by using IRMOS to give them full dedicated hardware has to be pur- control over their resources. Objectives in IRMOS include: that are only affordable to maintain with chased to ensure good interactivity high acquisition costs and specialized levels and performance, or that gen- 4) IRMOS will give all participants • Facilitate real-time interactivity staff. In addition the IRMOS provides eral-purpose resources are used as a in inter-organization value chains in SOIs certain guaranteed environment proper- compromise (e.g. commodity operating including service providers and ties, e.g. 
in terms of network link quality, systems and Internet networking) with consumers the confidence that • Consolidate management and which are negotiated on the basis of a no way to guarantee or control the be- interactive real-time applications control of the infrastructure and service level agreement, which acts like haviour of the application as a result. will be delivered in a predictable, services a contract between service-consumer reliable and efficient way. and service-provider. IRMOS in General • Enable integration between network IRMOS aims to break this mould by 5) IRMOS provides a comprehensive and application services enabling “soft real-time” applications to approach that addresses real-time be delivered through value chains that at all levels (network, processing, • Engineer a platform of services span organizational boundaries by a storage, application, workflow and

36 Spring 2008 • Vol. 6 No. 1 • inSiDE Spring 2008 • Vol. 6 No. 1 • inSiDE 37 Such a transmission of video data as Facts well as metadata is done in real-time IRMOS is a project funded by the Euro- in a synchronized way. The constraint pean Union within the 7th Framework of real-time in this context raises re- program in the 1st call of ICT. The quirements considering bandwidth, consortium consists of eleven project which has to be large enough for the partners with a total project budget video data of the appropriate resolu- of 12.9 M. It runs for 36 months and tion, as well as latency to ensure that started on 1st of February 2008. all participants of such collaboration more or less see the visualization at the Partners same time. These constraints are the Xyratex (UK) issues to be solved by IRMOS through University of Stuttgart (DE) its framework services that lookup and ICCS/NTUA (GR) determine appropriate resources for Alcatel-Lucent (DE) a gateway service as well as selection SINTEF (NO) of an appropriate network link. These University of Southampton (UK) features requested through the applica- Scuola Superiore Sant‘Anna (IT) Projects tion are, once setup, guaranteed for the Telefonica I+D (ES) Projects requested time. With its features for Giunti Labs (IT) SLA-based QoS, e.g. in terms of latency, Grass Valley (DE) bandwidth or even synchronization of Deutsche Thomson OHG (DE) multiple streams, as well as monitoring and feedback provision of the acquired Project Coordinator HLRS in IRMOS sion of real-time video data which addi- link, IRMOS makes the scenario more Eddie Townsed The role of HLRS in the IRMOS project tionally is synchronized with position and stable and reliable. is to provide an application scenario for orientation data of the Augmented Real- Xyratex Technology Limited the service-oriented infrastructure, to ity software. The video stream is then The HLRS application scenario poses Langstone Road show its advantages and benefits com- received by the collaboration partners high requirements on the IRMOS frame- Havant • Stefan Wesner pared to today’s existing solutions, as and incorporated into their Virtual Real- work in the area of guaranteed band- Hampshire well as contributing to the development ity environment and overlaid over simu- width, latency and processing power. PO9 1SA University of the framework. The application sce- lation data based on the position and For the IRMOS framework to be able United Kingdom of Stuttgart, nario focuses on Virtual and Augmented orientation that was received with the to provide resources and network links HLRS Reality in collaborative working sessions. additional synchronized stream. This way that fulfil these constraints an interface Tel: +44 (0) 23 9249 6000 It comprises cross-organizational provi- partners in the collaboration can share is provided, that allows an application Fax: +44 (0) 23 9249 6001 sion of Augmented Reality as a service the Augmented Display even though they to specify its requirements on a high to remote locations. Several partners don’t have the hardware equipment at abstraction layer using a specification Website: www.irmosproject.eu join in a visualization session where one their site, but instead benefit of the so language. For that reason framework partner site has the capabilities to per- called Remote Augmented Reality where services are provided by the IRMOS form the Augmented Reality. 
This part- the Augmented Reality service is pro- framework, that offer webservices for ner site however does not only provide vided by a specifically equipped remote requirements specification, SLA negotia- Augmented Reality locally, but also to all partner. The COVISE application needs tion, service discovery, resource booking collaboration partners over the internet. certain adaptations for interfacing with and service monitoring. With its experi- The application used in this scenario the IRMOS framework to allow the re- ence in the field of web technologies is called COVISE (which stands for Col- quired SLA-negotiation for IRMOS and HLRS provides a good knowledge in that laborative Visualization and Simulation the service development to be able to field of technology and also contributes environment). In IRMOS a service will be transparently utilise the hardware fea- to the IRMOS project to successfully developed that performs video transmis- tures provided by IRMOS. establish such a framework service.

38 Spring 2008 • Vol. 6 No. 1 • inSiDE Spring 2008 • Vol. 6 No. 1 • inSiDE 39 DEISA to enhance the European lifetime cycle of typically about 5 years, for the provision and maintenance as implied by Moore’s law. So far scien- of a European Benchmark Suite for HPC Infrastructure in FP7 tists from 15 different European coun- supercomputers. In addition, Joint tries with collaborators from four other Research Activities aim at an integrated continents have benefited. environment for scientific application development, and at the enabling of In EU FP7, the DEISA Consortium con- DEISA Background DEISA2 Essentials applications for the efficient exploitation tinues to support and further develop In spring 2002 the idea emerged to In the follow-up FP7 project DEISA2, the of current and future supercomputers the distributed high performance com- overcome the fragmentation of super- single-project oriented activities (DECI) by an aggressive parallelism. Further puting infrastructure and its services computing resources in Europe both in will be qualitatively extended towards reading: www.deisa.eu through the DEISA2 project funded for terms of system availability and in the persistent support of Virtual Science three years as of May 2008. Activities necessary skills for efficient supercom- Communities. DEISA2 will provide a DEISA Members and services relevant for Applications puting support. The establishment of a computational platform for them, offer- • BSC, Barcelona, Spain Enabling, Operation, and Technologies distributed European supercomputing ing integration via distributed services • CINECA, Bologna, Italy are continued and further enhanced, as infrastructure was proposed. In May and web applications, as well as man- • CSC, Espoo, Finland these are indispensable for the effective 2004 the DEISA project was started aging data repositories. Emphasis will • ECMWF, Reading, UK support of computational sciences in as a EU FP6 Integrated Infrastructure be put on collaborations with research • EPCC, Edinburgh, UK Projects the HPC area. The service provisioning Initiative by eight leading European su- infrastructure projects established by • FZJ, Jülich, Germany Projects model will be extended from one that percomputing centres. In 2006 DEISA the ESFRI, and European HPC and Grid • HLRS, Stuttgart, Germany supports single projects to one sup- was joined by three additional leading projects. The activity reinforces the rela- • IDRIS-CNRS, Orsay, France porting Virtual European Communities. centres. Through the joint efforts, tions to other European HPC centres, • LRZ, Garching, Germany Collaborative activities will be carried out DEISA reached production quality soon leading international HPC centres in • RZG, Garching, Germany with new European and other interna- after to support leading edge capability Australia, China, Japan, Russia and the • SARA, , The Netherlands tional initiatives. computing for the European scientific United States, and leading HPC projects community. DEISA has also contributed worldwide. 
For supporting international Acknowledgment Of strategic importance is the co-opera- to a raising awareness of the need for a science communities across existing The DEISA Consortium thanks the tion with the PRACE project which is persistent European HPC infrastructure political boundaries, DEISA2 participates European Commission for support preparing for the installation of a limited as recommended in the ESFRI report in the evaluation and implementation of through contracts RI-508830, • Hermann Lederer number of leadership-class Tier-0 super- 2006. standards for interoperation. RI-031513 and RI-222919. • Stefan Heinzel computers in Europe. The key role and aim will be to deliver a turnkey opera- DEISA Extreme Taking care of the operation of the infra- Rechenzentrum tional solution for a future persistent Computing Projects structure and the support of its efficient Garching der European HPC ecosystem, as suggested The DEISA Extreme Computing Initiative usage is the task of the service activities Max-Planck-Gesell- by ESFRI. The ecosystem will integrate (DECI), launched in 2005, continues to Operations, Technologies and Applica- schaft national Tier-1 centres and the new support the most challenging super- tions. Operations refers to operating the Tier-0 centres, as illustrated computing projects in Europe which HPC infrastructure and advancing it to a in the following require the special resources and skills turnkey solution for the future European sketch. of DEISA. A European Call for Extreme HPC ecosystem. Technology covers Computing Proposals is published annu- monitoring of existing technologies in ally in spring. By selecting the most ap- use and taking care of new emerging propriate supercomputer architectures technologies with relevance for the infra- for each project, DEISA is opening up structure. Applications addresses the the currently most powerful HPC archi- areas applications enabling for the tectures available in Europe for the most DECI, Virtual Communities and challenging projects. This mitigates the EU projects, environment and rapid performance decay of a single user related application sup- national supercomputer within its short port, and benchmarking

40 Spring 2008 • Vol. 6 No. 1 • inSiDE 41 New service-oriented business models ing and signing of the agreements will for this approach will be identified and a provide the necessary level of security number of widely-used license-protected and protection against fraud. All negotia- commercial applications will be adapted tions aiming to lead to a Service Level Grid-friendly Software to be executed under control of the new Agreement, i.e. an electronic contract, licensing mechanisms and will become on terms of the usage of licensed soft- Licensing for Location part of a high quality show-case to con- ware under the umbrella of the license independent Application vince more code-owners to adapt their contract framework. Agreement based applications. license schema enable brokering of Execution licenses, thus allowing to extend the Licenses as Grid Services current business models and to create A promising approach to overcome new ones. The currently existing licensing models underlying hardware and their perfor- the limitations of the current monolithic for commercial applications are focus- mance indicators, e.g. CPU type and licensing models is to reformulate li- Dynamic Licenses sing on software used on compute re- frequency, that are often used in license censes as manageable Grid resources To support license agreements that sources within an administrative domain. agreements. While current mechanisms or even Grid services. This will allow change over time and for cases where Licenses are provided on the basis of limit the usage of licensed software in managing and orchestrating licenses in negotiation between service provider Projects named users, hostnames, or sometimes Grid environments and virtualized in- a job or workflow together with other (i.e. license provider) and service con- Projects as a site license for the administrative frastructures, the increasing usage of resources like compute nodes, data sumer is needed to settle an agreement, domain of an organization. If we want these environments and infrastructures storage, and network Quality of Service SmartLM will implement negotiation to use this software in a distributed make it necessary to overcome these (QoS). SmartLM will provide an orches- procedures. Making licenses flexible with service oriented infrastructure, using limitations. tration service that is capable to co-allo- respect to lifetime, agreed QoS, pricing, resources that are spread across differ- cate different resources and services to etc. allows for the re-negotiating of ser- ent administrative domains, that do not Solution be used at the same time, e.g. reserva- vice consumption in case of unpredicted host the applications license server, we SmartLM will provide a generic and flex- tion of compute nodes and the licenses extension, reduction, or other changes run into trouble. The licenses usually are ible licensing virtualization technology for applications that will be executed in of the requested and agreed service. bound to hardware within the domain of based on standards for new service- a workflow scheduled to these nodes. the user and do not allow access from oriented business models. The solution Project Facts • Daniel Mallmann1 outside thus enforcing local use of the will implement software licenses as Grid Licenses managed The SmartLM project is partly funded • Wolfgang Ziegler2 protected applications only. 
Grid environ- services, providing platform-independent as Agreements by the European Commission under • Josep Martrat3 ments are usually spread across multiple access just like other Grid resources In emerging Grids and virtualized infra- contract number 216759. The project organizations and their administrative and being accessible from resources structures the conditions of resource started in February 2008 and has a 1 Forschungs- domains, and virtualised infrastructures, outside organizational boundaries. Ser- usage reflect and extend the conven- duration of 30 months. The project con- zentrum Jülich, like utility and cloud computing, hide the vice Level Agreements based on evolving tional Service Level Agreements (SLAs) sortium, led by Atos Origin, is formed Jülich standards will govern licenses. Secure which are made today between resource by Independent Software Vendor (ISV), Supercomputing agreements will be used to transport providers and resource consumers. Business Analyst, Application Service Centre licenses through the Grid and to make These SLAs define for example the life- Provider (ASP), academic partners, them available on the resource to which time, resource usage, accounting and and public centres: Atos Origin (Spain), 2 Fraunhofer a user has been granted access for the billing, but also the guarantee of QoS and Fraunhofer SCAI (Germany), FZ-Jülich Institut SCAI, execution of the application. The license penalties or compensations if guaran- (Germany), CINECA (Italy), The 451 Sankt Augustin agreement and conditions of use for an tees are missed. In SmartLM the agree- Group (UK), INTES (Germany), ANSYS application will be reached through ne- ments between service provider and (Germany), LMS International (), 3 Atos Origin, gotiation between service providers and service consumer about using license T-Systems (Germany), CESGA (Spain), Barcelona service customers. SmartLM will inte- protected software are expected to and Gridcore Aktiebolag (Sweden). grate the generic licensing virtualization transport and propagate the appropriate technology into the major Grid middle- licenses corresponding to the providers’ The website of the project offers detailed ware solutions UNICORE and Globus. and consumers’ requirements. Encrypt- information: http://www.smartlm.eu/

42 Spring 2008 • Vol. 6 No. 1 • inSiDE Spring 2008 • Vol. 6 No. 1 • inSiDE 43 Toward Optimal Load Balance in Parallel Programs for Geophysical Simulations

Understanding and simulating seis- that preserve the good qualities of the mic phenomena is of vital importance local time stepping and reduce the load today. Seissol, the tool for the simula- imbalance. tion of the generation and propaga- tion of seismic waves, used for such Space Filling Curves (SFC) is, in few purpose has been modified to achieve words, a line that passes for every point better speed up by introducing the in a discrete domain (Fig. 1). We use concept of local time stepping. In the the Hilbert space filling curve that has original implementation of Seissol, the time step was calculated for all ele- Projects ments being the smallest chosen and Projects communicated to all elements. Thus, regardless of the shape and size of any element, all of them were doing the same calculations in one time iteration. With local time stepping each cell uses Figure 2: Event time line. Up: Load balance using METIS. Down: load balance using SFC. its own time measurement. In one time In blue user code and in red MPI calls, 16 processors. iteration some cells will be updated while some will not. Some subdomains Generating a space filling curve is a de- the number of points by a factor of 8). have more cells to update than others, manding task. Besides we have also to Instead we use modern programming introducing load imbalance. Figure 1: Hilbert SFC of level 2 and 64 subcubes performe a lot of calculations over mil- techniques and efficient algorithms that

Standard methods like METIS [1], used two nice properties. One is its shape lions of cells, so brute-force methods together with binary representation to distribute data among processors that maintains communication at a mini- are not going to help. A SFC of level 7, and Gray code reduce the ratio of calcula- are no longer capable of decomposing mum level. The second one is that we normally required, has over 2.0x106 tions from O(N2) to O(N). In Figure 2 we the domains in an efficient manner. We can calculate the order in the curve of points whose information has to be see the load balance obtained by using are now looking toward new algorithms any point knowing only its coordinates. stored somehow (each level increases SFC compared to METIS.

44 Spring 2008 • Vol. 6 No. 1 • inSiDE Spring 2008 • Vol. 6 No. 1 • inSiDE 45 Figure 3: Domain decompoition by METIS, Figure 4: Domain decomposition by SFC, Figure 5: Time per processor vs. Iteration. Figure 6: Time per processor vs. Iteration, 16 subdomains, visible subdomains: 0 to 4 16 subdomains, visible subdomains: 0 to 4 16 processors and 15000 iterations 64 processors and 6500 iterations

How does it work? Conclusions processor. However, the partition done simulation than we thought. We will see A SFC begins at one corner of our In general we have seen an average by SFC presents a better performance. cells jumping from subdomains to sub- Projects domain and starts collecting elements improvement between 15% and 20%. One interesting property of these pic- domains. How to migrate them without Projects until they sum up to the load that We are still investigating how the time tures is, if we trace one single proces- creating an excessive overhead is still an corresponds to one subdomain. evolution of the load balance develops. sor during the simulation we will see open question.

This process is repeated for the next The time tracing per iteration of all pro- it moving from top to bottom and vice References • Orlando Rivera1 subdomian. Obviously, subdomains will cessors during one single simulation versa in the band. The same effect was [1] Karypis, G. • Martin Käser2 METIS - Family of Multilevel Partitioning no longer have the same number of generates a spectrum-like graph. The observed either using METIS or SFC Algorithms, http://glaros.dtc.umn.edu/ 1 cells. Instead they have a comparable narrower the band, the better the load decomposition (in Figure 5 and 6 pro- gkhome, 2006-2007 Leibniz load. In Figure 4 we see a typical do- balance is, as shown in Figure 5 and 6. cessor 0 is drawn in black as example). Supercomputing [2] Dumbser, M., Käser, M., Toro, E. main decomposition produced by SFC, The more processors are used for the That means that a processor might be Centre An Arbitrary High Order Discontinuous while in Figure 3 the same domain is simulation, the more crucial the load sometimes the fastest or sometimes Galerkin Method for Elastic Waves on decomposed by METIS. balancing problems gets. The domain might be the slowest. Predicting the load Unstructured Meshes V: Local Time 2 University decomposition made by SFC presents balance statically is a very difficult busi- Stepping and p-Adaptivity, 2000 of Munich,

a better load balance for 16 and 64 ness. But most importantly, it tells us [3] Hamilton., C. Department subdomains. In the later case the load that if we start considering dynamic load Compact Hilbert Indices, Dalhousie of Earth and imbalance is dramatically increased by balance we know as a fact that this has University Technical Report CS-2006-07, Environmental July 2006 the relatively small number of cells per to be done much more often during the Sciences

46 Spring 2008 • Vol. 6 No. 1 • inSiDE Spring 2008 • Vol. 6 No. 1 • inSiDE 47

Leibniz Supercomputing Centre of the Contact Compute servers currently operated by LRZ are Bavarian Academy of Sciences (Leibniz- Leibniz-Rechenzentrum

Rechenzentrum der Bayerischen Aka- Peak Performance demie der Wissenschaften, LRZ) in Dr. Horst-Dieter Steinhöfer User Garching near Munich provides national, Boltzmannstraße 1 System Size (TFlop/s) Purpose Community regional and local HPC services. 85748 Garching/München Germany HRLB II: 9,728 Cores 62,3 Capability German SGI Altix 4700 39 TByte Computing Universities Each platform described below is Phone +49 - 89 - 3 58 31- 87 79 Intel IA64 and Research documented on the LRZ WWW server; 19 x 512-way Institutes, please choose the appropriate link from DEISA www.lrz.de/services/compute

Linux-Cluster 256 Cores 1,6 C apabili t y Bavarian SGI Altix 4700 1 TByte Computing Universities Centres 256-way Centres

Linux-Cluster 128 Cores 0,8 Capability Bavarian SGI Altix 0,5 TByte Computing Universities 3700 BX2 128-way

Linux-Cluster 2,020 Cores 12,3 Capacit y Bavarian Intel Xeon 3,7 TByte Computing and Munich EM64T Universities, AMD Opteron D-Grid, 2-, 4-, 8-, and Bavarian 16-way State Library

Linux-Cluster 220 Cores 1,3 C apaci t y Bavarian Intel IA64 1,1 TByte Computing and Munich 2-, 4- and Universities 8-way

Linux-Cluster 752 Cores 7,2 C a p a c i t y LCG Grid Intel Xeon 1,4 TByte Computing Tier 2 EM64T 4-way

Linux-Cluster 132 Cores 0,72 Capacity LCG Grid Intel IA32 0,3 TByte Computing Tier 2

View of “Höchstleistungsrechner in Bayern HLRB II“, an SGI Altix 4700 A detailed description can be found on LRZ’s web pages: www.lrz.de/services/compute Foto: Kai Hamann, produced by gsiCom

48 Spring 2008 • Vol. 6 No. 1 • inSiDE Spring 2008 • Vol. 6 No. 1 • inSiDE 49 Based on a long tradition in supercom- the University of Heidelberg in the hkz- Compute servers currently operated by HLRS are puting at Universität Stuttgart, HLRS bw (Höchstleistungsrechner-Kompetenz-

was founded in 1995 as a federal Cen- zentrum Baden-Württemberg). Peak Performance tre for High Performance Computing. User HLRS serves researchers at universities Together with its partners HLRS System Size (TFlop/s) Purpose Community and research laboratories in Germany provides the right architecture for the and their external and industrial part- right application and can thus serve NEC SX-8 72 8-way nodes 12,67 Capabilit y German 9,22 TByte memory Computing Universities, ners with high-end computing power for a wide range of fi elds and a variety of Research engineering and scientifi c applications. user groups. Institutes and Industry Operation of its systems is done to- Contact

gether with T-Systems, T-Systems sfr, Höchstleistungsrechenzentrum TX-7 32-way node 0,19 Preprocessing German and Porsche in the public-private joint Stuttgart (HLRS) 256 GByte memory for SX-8 Universities, Research Centres venture hww (Höchstleistungsrechner Universität Stuttgart Centres Institutes für Wissenschaft und Wirtschaft). and Industry Through this co-operation a variety of Prof. Dr.-Ing. Michael M. Resch systems can be provided to its users. Nobelstraße 19 IBM BW-Grid 498 2-way nodes 45,9 Grid D-Grid 70569 Stuttgart 8 TByte memory Computing Community In order to bundle service resources in Germany the state of Baden-Württemberg HLRS Phone +49 - 711- 685 - 8 72 69 has teamed up with the Computing Cen- [email protected] tre of the University of Karlsruhe and www.hlrs.de the Centre for Scientifi c Computing of Nec SX-9 8 16-way nodes 12,8 Grid D-Grid 4 TByte memory Computing Community

Intel Nocona 205 2-way nodes 2,5 Capacity Research 0,62 TByte memory Computing Institutes and Industry

AMD Opteron 194 2-way nodes 2,5 Capacity Research 1 TByte memory Computing Institutes and Industry

Cray XD1 48 2-way nodes 0,54 Industrial Industry and 0,24 TByte memory Development Research Institutes

View of the NEC SX-8 at HLRS A detailed description can be found on HLRS’s web pages: www.hlrs.de/hw-access

50 Spring 2008 • Vol. 6 No. 1 • inSiDE Spring 2008 • Vol. 6 No. 1 • inSiDE 51 Compute servers currently operated by JSC are

Peak Performance User System Size (TFlop/s) Purpose Community The John von Neumann Institute for At present, two research groups exist: Computing (NIC) is a joint foundation of the group Elementary Particle Physics, IBM 16 racks 222,8 Capability German Forschungszentrum Jülich, Deutsches headed by Zoltan Fodor and located at BlueGene/P 16,384 nodes computing Universities, Elektronen-Synchrotron DESY, and the DESY laboratory in Zeuthen and “JUGENE” 65,536 processors Research Gesellschaft für Schwerionenforschung the group Computational Biology and PowerPC 450 Institutes 32 TByte memory and Industry GSI to support supercomputer-aided Biophysics, headed by Ulrich Hansmann scientific research and development. at the Research Centre Jülich. Its tasks are: IBM 41 SMP nodes 9,0 Capability German pSeries 690 1,312 processors computing Universities, Education and training in the fields Cluster 1600 POWER4+ Research Provision of supercomputer capacity of supercomputing by symposia, work- “JUMP” 5,1 TByte memory Institutes and Industry for projects in science, research and shops, school, seminars, courses, and industry in the fields of modeling and guest programmes. computer simulation including their IBM 2 racks 2,2 Capability Selected BladeCentre-H 56 Blades computing NIC Projects Centres methods. The supercomputers with Contact Centres “JULI” 224 PowerPC the required information technology John von Neumann Institute 970 MP cores infrastructure (software, data storage, for Computing (NIC) 224 GByte memory networks) are operated by the Jülich Jülich Supercomputing Centre (JSC)

Supercomputing Centre (JSC) and by Institute for Advanced Simulation (IAS) IBM 12 Blades 4,8 Capability Selected the Centre for Parallel Computing at Cell System 24 Cell processors (single computing NIC Projects “JUICE” 12 GByte memory precision) DESY in Zeuthen. Prof. Dr. Dr. Thomas Lippert 52425 Jülich Supercomputer-oriented research Germany

and development in selected fi elds of Phone +49 - 24 61- 61- 64 02 AMD 125 compute nodes 2,5 Capability EU SoftComp physics and other natural sciences, [email protected] Linux Cluster 500 AMD Opteron computing Community especially in elementary-particle phys- “SoftComp” 2.0 GHz cores 708 GByte memory ics, by research groups of competence www.fz-juelich.de/nic in supercomputing applications. www.fz-juelich.de/jsc

AMD 44 compute nodes 0,85 Capacity and Selected Linux Cluster 176 AMD Opteron capability D-Grid “JUGGLE” 2.4 GHz cores computing Projects 352 GByte memory

apeNEXT 4 racks 2,5 Capability Lattice gauge (special 2,048 processors computing theory Groups purpose 512 GByte memory at Universities computer) and Research Institutes

APEmille 4 racks 0,55 Capability Lattice gauge (special 1,024 processors computing theory Groups purpose 32 GByte memory at Universities computer) and Research Institutes

The IBM supercomputers “JUGENE“ (right) and “JUMP“ (left) in Jülich (Photo: Research Centre Jülich)

52 Spring 2008 • Vol. 6 No. 1 • inSiDE Spring 2008 • Vol. 6 No. 1 • inSiDE 53 The Jülich Supercomputing Centre

Supercomputing is a driving force for • Computational Science • Grid Technologies and Infra- In the following, we will outline some of the field of simulation sciences, which and Mathematical Methods, structures, making Grid technologies the important goals and activities for is the third category of scientific re- providing well-qualified support available for distributed supercom- the near future. search complementing theory and ex- through Simulation Laboratories, puting infrastructures, empowering periment. The ever increasing complex- an innovative community-oriented computational scientists and ex- Supercomputer Facility ity of the systems and processes under research and support structure, perimentalists to use the European The prime objective of this topic is the investigation in science and engineering together with cross-disciplinary high-performance computing (HPC) advancement of JSC to a world-class results at the same time in growing scientific support groups. infrastructure most effectively. leadership facility. JSC strives to become requirements concerning the acces- a European supercomputing centre with sibility of theories to simulation, the ac- curacy of mathematical modeling, the Centres efficiency of numerical and stochastic Centres methods, and the computational meth- odology. This includes the performance and scalability of supercomputers, networks, Grid infrastructures, and data centres as well as programming models, software technology and vi- sualization techniques. Furthermore, due to the generation of complex data flows, future large-scale experimental research presents extraordinary chal- lenges for the storage and processing of huge amounts of experimental data. The mission of the Jülich Supercomput- ing Centre (JSC) is to provide large-scale computational resources and infrastruc- tures for German and European science. Under the name Supercomputing, JSC’s work programme forms part of the Helmholtz Association‘s research field Key Technologies. It is structured into three topics:

• Supercomputer Facility, concerning the provision of super– computer resources of the highest performance and widest scope along with the design and construction of future leadership-class systems in co-operation with European companies.

... petaflop/s capability in 2009/2010.

JSC was the first German national supercomputer centre, founded in 1987, and has established - together with its partners DESY and GSI within the framework of the John von Neumann Institute for Computing (NIC) - a highly regarded peer-review scheme for the provision of supercomputer resources. JSC is a founding member of the European project Distributed European Infrastructure for Supercomputing Applications (DEISA). Together with the national HPC centres Leibniz-Rechenzentrum in Garching (LRZ) and Höchstleistungsrechenzentrum Stuttgart (HLRS), JSC founded the Gauss Centre for Supercomputing (GCS) in 2006. The GCS e.V. represents Germany as a single legal entity in the European supercomputing infrastructure initiative PRACE.

Under its new name, the Jülich Supercomputing Centre will endeavour to continue the internationally recognized work of the Zentralinstitut für Angewandte Mathematik. Prospects of success are very good for the Centre in view of its competence in high-performance computing, its excellent equipment, and its active integration into the European HPC ecosystem.

As a member of the Gauss Centre for Supercomputing, JSC heads the European project Partnership for Advanced Computing in Europe (PRACE), which is preparing the creation of a pan-European supercomputing service at the highest performance level, its full integration into the European HPC ecosystem and its sustained operation.

For Jülich, the realisation of such ambitious goals implies the cost-efficient and continuous upgrading of JSC's supercomputers, data storage, networking and post-processing capabilities towards a petascale facility, enabling both the highest scalability and broadest general purpose flexibility. Only through a continuous dual modernization process can the sustained provision of supercomputer resources of the highest performance and widest scope be guaranteed.

We can only remain in the forefront of high-performance computing through active research and technological development at the system level. This will be carried out by JSC in close collaboration with leading hardware vendors, software companies and system integrators, embedded in the PRACE initiative. Activities range from the investigation of novel hardware architectures to the design and development of hardware and software for the next generation of leadership-class systems, including cluster computing, networking, security and scientific visualization systems.

Computational Science and Mathematical Methods

An effective utilization of leadership-class supercomputing facilities requires both outstanding technical operation and high-level user support at JSC. To this end, JSC is pursuing the realignment of disciplinary research and cross-sectional activities by its innovative concept of Simulation Laboratories. This measure will ensure the long-term critical mass of the existing research teams and will further intensify co-operation with external academic groups. A Simulation Laboratory is a targeted research and support structure focused on a specific scientific community. Each of these consists of a core group located at a supercomputing centre and a number of associated scientists outside. The first Simulation Laboratories for Jülich and regionally based communities will be created in the fields of computational plasma physics, earth sciences, computational biology and nanosciences. They are complemented by cross-sectional research groups on mathematical modeling and methods and on performance analysis tools that play a key role in the scaling of large application codes to future petascale machines. The topic also comprises the work of the Jülich research group at the John von Neumann Institute for Computing, whose current research theme is Computational Biology and Biophysics.

Grid Technologies and Infrastructures

JSC develops and provides Grid technologies for distributed supercomputing infrastructures based on the UNICORE paradigm. The overall objective is to support national and international e-infrastructures, such as the Gauss Centre for Supercomputing, D-Grid, DEISA, and PRACE, forming the future European high-performance computing infrastructure. JSC collaborates with many partners to achieve standards-based interoperability between the various Grid technologies, thus creating seamless user access to a wide range of e-infrastructures. It designs and implements world-class service-oriented architectures according to emerging standards from the Grid Computing and Web Services domain, accompanied by the identification and implementation of new trends in the distributed systems research area. JSC is a major partner in the operation of the heterogeneous environment of the D-Grid and will continue to sustain the Grid infrastructures for the German e-Science community, including high-level user support and training events in order to strengthen the application sector. The topic also includes JSC's research and development of future communications and networking technologies, both at the wide and local area and at the processor interconnect level.

Partnerships and Alliances

JSC's work has a distinct interdisciplinary and collaborative character. It has an important bridging function for its user communities as well as for the research fields of the Helmholtz Association and partners in national and international science organizations and competence networks. The simulation activities of JSC are integrated in several formal collaborations, in particular in the newly established Jülich-Aachen Research Alliance (JARA), Section JARA-SIM. The long-term activities and experiences of JSC in the teaching and training of young scientists together with the Aachen University of Applied Sciences are strengthened by its firm commitment to the German Research School for Simulation Sciences at FZJ and RWTH Aachen University, which will be inaugurated in 2008.

• Dr. Sabine Höfler-Thierfeldt, Jülich Supercomputing Centre

HPC in Science and Engineering – The 10th Results and Review Workshop of the HLRS

State-of-the-art scientific simulations on the supercomputer systems of HLRS again emphasized the world-class research done at the centre, which obtains outstanding results and achieves highest performance for production codes that are of particular interest for both scientists and engineers.

The 10th Results and Review Workshop of the HLRS brought together more than 50 participants from German research institutions, the steering committee and the scientific support staff of HLRS. More than 45 sophisticated talk and poster presentations were selected in advance by the steering committee from the yearly supercomputer project reports and presented at the Results and Review Workshop.

The presentations covered all fields of computational science and engineering, ranging from computational fluid dynamics and reacting flows via computational physics, astrophysics, solid state physics, chemistry and nanotechnology to earth sciences and computer science, with a special emphasis on industrially relevant applications. These outstanding results of research problems in science and engineering are milestones of modern highest-level scientific computing, also using exceptionally complex models and methods, and therefore provide an excellent overview of recent developments in High Performance Computing and simulation.

Every year the steering committee of the HLRS, a panel of twelve top-class scientists responsible for the review of project proposals, acknowledges the high quality of the work carried out in Stuttgart as well as the spectacular scientific results and the efficient usage of supercomputer resources by honouring the three most outstanding projects with the Golden Spike Award. The laureates of the year 2007 and their project titles are:

• Rainer Stauch, Institute for Technical Thermodynamics, University of Karlsruhe: "Ignition of Droplets in a Laminar Convective Environment"

• Janina Zimmermann, Fraunhofer Institut für Werkstoffmechanik, Freiburg: "DFT Modeling of Oxygen Adsorption on CoCr Surfaces"

• Michael Breuer, Institute of Fluid Mechanics, University of Erlangen-Nürnberg: "Direct Numerical Simulation of Turbulent Flow Over Dimples - Code Optimization for NEC SX-8 plus Flow Results"

Transactions of the High Performance Computing Centre Stuttgart:
High Performance Computing in Science and Engineering '07

The book presents the state of the art in simulation on supercomputers. Leading researchers present results achieved on systems of the Höchstleistungsrechenzentrum Stuttgart (HLRS) for the year 2007. The reports cover all fields of computational science and engineering, ranging from CFD via computational physics and chemistry to computer science, with a special emphasis on industrially relevant applications. Presenting results for both vector systems and microprocessor-based systems, the book makes it possible to compare the performance levels and usability of the various architectures; it gives an excellent insight into the potential of vector systems. The book further covers the main methods in high performance computing. Its outstanding results in achieving highest performance for production codes are of particular interest for both scientists and engineers. The book comes with a wealth of coloured illustrations and tables.

Product Details:
• Hardcover: 676 pages
• 1st edition (January 2008)
• Language: English
• ISBN-10: 3540747389
• ISBN-13: 978-3540747383

Nagel, W. E., Kröner, D., Resch, M. (Eds.): High Performance Computing in Science and Engineering '07, Transactions of the High Performance Computing Centre Stuttgart (HLRS) 2007, Springer, Berlin, Heidelberg, New York 2007

The 8th HLRS-NEC Teraflop Workshop

The eighth Teraflop Workshop is a continuation of the series of workshops held twice a year, alternately at Tohoku University, Japan, and HLRS, Germany, as part of the Teraflop-Workbench cooperation between NEC and HLRS. The workshop provides a platform for scientists, application developers, hardware designers and international HPC experts to discuss recent developments and future directions of supercomputing. This year's spring event was held on April 10th to 11th.

The workshop included two hardware sessions with talks on the three major supercomputing projects. The talk on the IBM Roadrunner petaflop project, which is scheduled to be finished in summer 2008, was given by Cornell Wright from LANL/IBM. He talked about all aspects of the machine, including hardware, operation and application software. A general overview of Cray's recent and future HPC products, along with an introduction to the Cascade project, was given by Wilfried Oed. Hiroaki Kobayashi from Tohoku University, Japan presented their new NEC SX-9 installation along with first experiences on application performance. Tadashi Watanabe from RIKEN, Japan introduced the Japanese "Next Generation Supercomputer" project and underlined the challenges ahead in developing a petaflop system.

The talks on applications reflected the wide range of simulations run on the HLRS hardware installations. Various talks were closely related to industrial applications, such as surface technologies (Andreas Scheibe, IFF, University of Stuttgart), the simulation of fluid-structure interaction in turbines (Albert Ruprecht, IHS, University of Stuttgart) and the simulation of combustion chambers (Benedetto Riso, RECOM Services). The last talk was accompanied by a presentation from Ulrich Maas (ITT, University of Karlsruhe) on the details of numerical simulation of combustion.

A separate session was dedicated to computational fluid dynamics, in which Markus Kloker (IAG, University of Stuttgart) reported about DNS of controlled shear flow. Matthias Meinke presented a whole range of applications being worked on at AIA, RWTH Aachen University, and Dinan Wang (CCRL, St. Augustin) talked about blood flow simulation using Lattice Boltzmann methods. Harald Klimach (HLRS, University of Stuttgart) talked about the issues related to heterogeneous parallelism in coupling a CFD and a CAA (computational aeroacoustics) application.

Nowadays, there is a lot of focus on climate and environmental research, and the progress achieved in this field over the past few years was detailed in three talks. Regional climate simulations were presented by Hans-Jürgen Panitz (IMK, FZK). Malte Müller of DKRZ talked about barotropic free oscillations of a global ocean model, and Stefan Borowski (NEC, HPCE) discussed performance aspects of the widely used ECHAM5 climate code. Finally, the talk given by Stephan Blankenburg (Theoretical Physics, University of Paderborn) focused on molecular recognition and self-assembly using the molecular dynamics simulation code VASP.

The success of this workshop will be followed up by the proceedings book "High Performance Computing on Vector Systems 2008", to appear in autumn. The next Teraflop Workshop will be held at Tohoku University, Japan, in November 2008.

First Industrial Grids Meeting

The first Industrial Grids Meeting took place on October 25th 2008 in Leinfelden-Echterdingen near Stuttgart. It was organized by the projects AeroGrid, In-Grid, PartnerGrid, and ProGrid. Representatives of all D-Grid projects with a strong stake in industry, especially the 40 industrial project partners, were invited. The main point for discussion was industry's expectations and requirements for Grid Computing. Representatives from the organizing projects as well as from the projects BauVO-Grid, BIS-Grid, biz2-Grid, GDI-Grid, and FinGrid met for discussion. The participants were able to discuss a large number of points of view covering a broad palette of use cases that ranged over scientific, technical, commercial, financial as well as organizational work areas for the industrial Grid users.

Over the course of the workshop, after setting aside typical specifics, much common ground between the D-Grid projects was demonstrated. Important requirements were improved usability, integration into the existing IT infrastructures and possibilities for long-term system usage, supported by the reusability of the services offered. Another focal point of common expectations and requirements is for the organizations to integrate the users into the Grid structure. This comprises the management of roles and rights as well as the service provision of legally binding business processes and the availability of service level agreements. Finally, there are several common requirements regarding the Grid technology and the tools to operate the infrastructure. Examples discussed were applying security to the networks as well as questions in the fields of management and technical monitoring of the systems.

It was agreed to continue the contacts inspired by this exchange of information. There are intentions to form working groups between the different projects to tackle several themes in the future.

NIC-Workshop "From Computational Biophysics to Systems Biology"

About 120 scientists and students from Germany, Europe and the USA participated in the third annual workshop "From Computational Biophysics to Systems Biology" (CBSB08). The workshop took place from May 19th to May 21st, 2008 at the Research Centre Jülich and was organized by the NIC research group "Computational Biology and Biophysics" headed by Prof. Dr. Ulrich H.E. Hansmann. As in past years, experts from physics, chemistry, biology, and computer science discussed how to bridge the different scales in physics-based simulations of cells. Topics included protein folding and misfolding, aggregation, molecular docking, and the modeling of cardiac contractions.

FZJ-CEA Workshop on High-Performance Computing and Simulation

The Jülich Supercomputing Centre hosted a meeting of researchers from Jülich and the French research organization CEA. The subject of the workshop was the future collaboration between the two organizations, particularly in the computational sciences and in supercomputing. Leading scientists from a wide range of scientific and technical disciplines were present and discussed plans for joint research in the fields of materials science and quantum technology, soft matter and biophysics, earth and atmospheric sciences, nuclear safety research, hadron physics and nuclear structure, as well as plasma physics and fusion. It was noted that in these areas of the simulation sciences the two centres are in the forefront of European research, due both to their outstanding science and their excellent computational resources.

CEA and FZJ are members of the national consortia which are preparing for the future European HPC ecosystem. They stated that they are both ready and willing to host future European supercomputers and that they want to start with technical preparations for the evaluation and installation of these petaflop systems. They agreed to coordinate their search for complementary systems, which is a requirement of the international simulation community. They also decided to jointly investigate fundamental technical challenges of extreme parallelism. These challenges include the development of highly scalable communication technology, tools for monitoring and steering energy consumption, and the development of terabyte/s-class I/O systems with the emphasis on open software, interoperability, and hierarchical storage.

Special discussions concerned the fusion experiment ITER, which will be built on the CEA campus at Cadarache. CEA supports Jülich's proposal to host and operate a 100 TFlop/s system required for crucial simulations accompanying the construction phase.

The workshop successfully defined the scientific basis for a bilateral collaboration agreement between FZJ and CEA, which will be presented at the 3rd German-French Research Forum in Paris on 29th February 2008.

CECAM Tutorial 2008

Twenty scientists from several European and non-European countries attended this year's CECAM tutorial entitled "Programming Parallel Computers" at the Jülich Supercomputing Centre. This was the third tutorial of its kind held in Jülich and is part of the educational programme offered by the Centre Européen de Calcul Atomique et Moléculaire (CECAM). From 11th to 15th February 2008, the participants learned about aspects of parallel programming with MPI and OpenMP. JSC experts were responsible for the mixed programme of lectures and hands-on exercises, which continued the success of the preceding tutorials in 2006 and 2007 and established the tutorial in CECAM's educational programme.

HLRS Workshop Report and Outlook

In spring 2008 HLRS held a number of courses and workshops. The total number of participants in these courses from Germany and Europe was about 150.

One of the flagships of our courses is the week on Iterative Solvers and Parallelization. Prof. A. Meister and Prof. B. Fischer teach the basics and details of Krylov Subspace Methods. Lecturers from HLRS give lessons on distributed-memory parallelization with the Message Passing Interface (MPI) and shared-memory multithreading with OpenMP. This course will be repeated in September 2008 at LRZ in Garching.
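To give readers an impression of what the hands-on parts of these courses look like, the following minimal C sketch shows a distributed-memory dot product, the basic global reduction on which Krylov subspace solvers rely. It is purely illustrative and not taken from the course material; the vector length and values are invented for the example.

```c
/* Illustrative sketch only: a global dot product with MPI.
   Each process holds a local slice of the vectors x and y;
   the partial sums are combined with MPI_Allreduce.
   Vector length and contents are invented for this example. */
#include <mpi.h>
#include <stdio.h>

#define NLOCAL 1000   /* length of the local slice (assumed) */

int main(int argc, char **argv)
{
    int rank, size;
    double x[NLOCAL], y[NLOCAL];
    double local = 0.0, global = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < NLOCAL; i++) {   /* fill with dummy data */
        x[i] = 1.0;
        y[i] = 0.5;
    }

    for (int i = 0; i < NLOCAL; i++)     /* local partial sum */
        local += x[i] * y[i];

    /* combine the partial sums; every process receives the result */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("dot product = %f (computed on %d processes)\n", global, size);

    MPI_Finalize();
    return 0;
}
```

Compiled with an MPI wrapper compiler (e.g. mpicc) and launched with mpirun, the same small pattern - local work followed by a collective reduction - appears in every parallel CG or GMRES iteration.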

NEC SX-8 Usage and Programming is a 1.5-day course dedicated to our NEC SX-8 users to learn about vectorization and optimization of their applications on the Multi-Teraflops-System at HLRS.

The course on Fortran for Scientific Computing was given for the first time at HLRS in October 2006. The number of participants is still growing from course to course; this year we therefore had to move into our largest lab. Mainly PhD students from Stuttgart and other universities in Germany came to learn not only the basics of programming, but also to gain insight into the principles of developing high-performance applications with Fortran. The Fortran course will be repeated in October this year.

Below you will find a summary of HLRS courses for autumn 2008 as well as detailed information about the individual courses, including dates, locations and contents.

Workshop Announcements July-December 2008

Scientific Workshops at HLRS, 2008
2nd HLRS Parallel Tools Workshop (July 7-9)
High Performance Computing in Science and Engineering - The 11th Results and Review Workshop of the HPC Center Stuttgart (September 29-30)
IDC International HPC User Forum (October 13-14)

Parallel Programming Workshops: Training in Parallel Programming and CFD
2nd HLRS Parallel Tools Workshop (HLRS, July 7-9)
Parallel Programming with MPI & OpenMP (CSCS Manno, CH, August 12-14)
Iterative Linear Solvers and Parallelization (LRZ, Garching, September 15-19)
Introduction to Computational Fluid Dynamics (HLRS, September 22-26)
Message Passing Interface (MPI) for Beginners (HLRS, October 6-7)
Shared Memory Parallelization with OpenMP (HLRS, October 8)
Advanced Topics in Parallel Programming (HLRS, October 9-10)
Parallel Programming with MPI & OpenMP (FZ Jülich, ZAM/NIC, November 26-28)

Training in Programming Languages at HLRS
Fortran for Scientific Computing (October 27-31)

URL
http://www.hlrs.de/news-events/events/
http://www.hlrs.de/news-events/events/2008/parallel_prog_fall2008/
http://www.hlrs.de/news-events/events/2008/prog_lang_fall2008/

Workshop Announcement:
2nd HLRS Parallel Tools Workshop 2008

Date & Location
July 7-9, 2008, HLRS, Stuttgart

Developing for current and future processors will more and more require parallel programming techniques for application and library programmers. This workshop offers the industrial and scientific user community, as well as the tools developers themselves, an in-depth workshop on the state of the art of parallel programming tools, ranging from debugging tools and performance analysis to best practices in integrated development environments for parallel platforms. Participants and tools developers will get the chance to see the strengths of the various tools. Therefore, this workshop is focused on persons who already know about parallel programming, e.g. with MPI or OpenMP. Hands-on sessions will give a first impression and allow participants to test the features of the tools.

High Performance Computing Courses and Tutorials

Parallel Programming with MPI, OpenMP and PETSc

Date & Location
August 12-14, 2008, CSCS, Manno (CH)
November 26-28, 2008, NIC/JSC, Research Centre Jülich

Contents
The focus is on MPI, OpenMP, and PETSc. Hands-on sessions (C/Fortran) will allow participants to immediately test and understand the basic constructs of the presented programming models. Course language is English.

Webpage
http://www.hlrs.de/news-events/external-events
http://www.fz-juelich.de/zam/neues/termine/mpi-openmp

Shared Memory Parallelization with OpenMP

Date & Location
October 8, 2008, HLRS, Stuttgart

Contents
The course teaches OpenMP parallelization, the key concept on hyper-threading, multi-core, shared memory, and ccNUMA platforms. Hands-on sessions (C/Fortran) will allow participants to test and understand the directives and other interfaces of OpenMP. Tools for performance tuning and debugging are presented. Course language is English (if required). Lecturer: R. Rabenseifner (HLRS). (A short illustrative OpenMP example is sketched after the course listings.)

Webpage
http://www.hlrs.de/news-events/events

Fortran for Scientific Computing

Date & Location
October 27-31, 2008, HLRS, Stuttgart

Contents
This course is dedicated to scientists and students who want to learn (sequential) programming of scientific applications with Fortran. The course teaches the newest Fortran standards. Hands-on sessions will allow users to immediately test and understand the language constructs.

Webpage
http://www.hlrs.de/news-events/events

Introduction to Programming and Using the IBM Supercomputers

Date & Location
August 11-12, 2008, NIC/JSC, Research Centre Jülich

Contents
The course gives an overview of the supercomputers JUMP and JUGENE. Especially new users will learn how to program and use the systems efficiently. Topics discussed are: system architecture, usage model, compiler, tools, monitoring, MPI, OpenMP, performance optimization, mathematical software, and application software.

Webpage
http://www.fz-juelich.de/jsc/neues/termine/IBM-Supercomputer

NIC Guest Student Program: Introduction to Parallel Programming with MPI and OpenMP

Date & Location
August 5-8, 2008, NIC/JSC, Research Centre Jülich

Contents
The course provides an introduction to the two most important standards for parallel programming under the distributed- and shared-memory paradigms: MPI, the Message-Passing Interface, and OpenMP. While intended mainly for the NIC Guest Students, the course is open to other interested persons upon request.

Webpage
http://www.fz-juelich.de/jsc/neues/termine/parallele_programmierung

Message Passing Interface (MPI) for Beginners

Date & Location
October 6-7, 2008, HLRS, Stuttgart

Contents
The course gives a full introduction into MPI-1. Further aspects are domain decomposition, load balancing, and debugging. An MPI-2 overview and MPI-2 one-sided communication are also taught. Hands-on sessions (C/Fortran) will allow users to immediately test and understand the basic constructs of the Message Passing Interface (MPI). Course language is English (if required).

Webpage
http://www.hlrs.de/news-events/events

Iterative Linear Solvers and Parallelization

Date & Location
September 15-19, 2008, LRZ Building, Garching/Munich

Contents
The focus is on iterative and parallel solvers, the parallel programming models MPI and OpenMP, and the parallel middleware PETSc. Different modern Krylov Subspace Methods (CG, GMRES, BiCGSTAB ...) as well as highly efficient preconditioning and multigrid techniques are presented in the context of real-life applications. Hands-on sessions (C/Fortran) will allow users to immediately test and understand the basic constructs of iterative solvers, the Message Passing Interface (MPI) and the shared-memory directives of OpenMP. This course is organized by the University of Kassel, HLRS, IAG, and LRZ.

Webpage
http://lrz.www.de/services/compute/courses
http://www.hlrs.de/news-events/external-events

Advanced Topics in Parallel Programming

Date & Location
October 9-10, 2008, HLRS, Stuttgart

Contents
Topics are MPI-2 parallel file I/O, mixed-model parallelization, OpenMP on clusters, parallelization of explicit and implicit solvers and of particle-based applications, parallel numerics and libraries, and parallelization with PETSc. Hands-on sessions are included. Course language is English (if required).

Webpage
http://www.hlrs.de/news-events/events

Introduction to Computational Fluid Dynamics

Date & Location
September 22-26, 2008, HLRS, Stuttgart

Contents
Numerical methods to solve the equations of Fluid Dynamics are presented. The main focus is on explicit Finite Volume schemes for the compressible Euler equations. Hands-on sessions will manifest the content of the lectures. Participants will learn to implement the algorithms, but also to apply existing software and to interpret the solutions correctly. Methods and problems of parallelization are discussed. This course is organized by HLRS, IAG, and the University of Kassel, and is based on a lecture and practical awarded with the "Landeslehrpreis Baden-Württemberg 2003".

Webpage
http://www.hlrs.de/news-events/events

CECAM Tutorial: Programming Parallel Computers

Date & Location
February 2009 (tentative), NIC/JSC, Research Centre Jülich

Contents
This tutorial provides a thorough introduction to scientific parallel programming. It covers parallel programming with MPI and OpenMP. Lectures will alternate with hands-on exercises.

Webpage
http://www.cecam.org/tutorials.html
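As a flavour of the shared-memory directives covered in the OpenMP courses above, the following minimal C sketch parallelizes a loop with a reduction clause. It is illustrative only and not taken from any course material; the array size and contents are invented.

```c
/* Illustrative sketch only: a loop parallelized with OpenMP.
   The reduction clause lets each thread accumulate a private
   partial sum that is combined at the end of the parallel loop. */
#include <omp.h>
#include <stdio.h>

#define N 1000000   /* array length, invented for this example */

static double a[N], b[N];

int main(void)
{
    double sum = 0.0;

    for (int i = 0; i < N; i++) {   /* fill with dummy data */
        a[i] = 1.0;
        b[i] = 2.0;
    }

    /* distribute the loop iterations over the available threads */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("sum = %f, using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
```

Built with an OpenMP-capable compiler (e.g. with -fopenmp), the number of threads can be chosen at run time via the OMP_NUM_THREADS environment variable, while the source code itself stays unchanged - the appeal of the directive-based approach taught in these courses.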

inSiDE

inSiDE is published twice a year by The GAUSS Centre for Supercomputing (HLRS, LRZ, and NIC)

Publishers
Prof. Dr. Heinz-Gerd Hegering, LRZ
Prof. Dr. Dr. Thomas Lippert, NIC
Prof. Dr. Michael M. Resch, HLRS

Editor F. Rainer Klank, HLRS [email protected]

Design Stefanie Pichlmayer [email protected]

Authors
Benjamin Ahrenholz [email protected]
Michael Breuer [email protected]
Carlos S. Frenk [email protected]
Stefan Heinzel [email protected]
Amina Helmi [email protected]
Sabine Höfler-Thierfeldt [email protected]
Gábor Janiga [email protected]
Hans-Thomas Janka [email protected]
Adrian Jenkins [email protected]
Martin Käser [email protected]
Manfred Krafczyk [email protected]
Hermann Lederer [email protected]
Peter Lehmann [email protected]
Daniel Mallmann [email protected]
Andreas Marek [email protected]
Josep Martrat [email protected]
Bernhard Müller [email protected]
Julio Navarro [email protected]
Orlando Rivera [email protected]
Jörg Schumacher [email protected]
Hemdan Shalaby [email protected]
Volker Springel [email protected]
Dominique Thévenin [email protected]
Jonas Tölke [email protected]
Stefan Wesner [email protected]
Wolfgang Ziegler [email protected]

If you would like to receive inSiDE regularly, send an email with your postal address to [email protected] © HLRS 2008