inSiDE • Vol. 8 No.1 • Spring 2010

Innovatives Supercomputing in Deutschland

Editorial

Welcome to this new issue of InSiDE, the journal on innovative supercomputing in Germany published by the Gauss Centre for Supercomputing.

The focus of the last months all over Europe was on the European HPC initiative PRACE (Partnership for Advanced Computing in Europe), and the Gauss Centre for Supercomputing has played a very active and important role in this process. As a result of all these activities PRACE is now ready for the implementation phase. Here you find an overview of the results achieved. In our autumn edition we will be able to report on the next steps, which will include the founding of a formal European organization.

PRACE is an important driving factor also for German supercomputing. Hence the Gauss Centre for Supercomputing has again stepped up its effort to provide leading-edge systems for Europe in the next years. HLRS should be able to announce its system soon after we go to press – so stay tuned for further information. LRZ has started its procurement process; InSiDE expects to report on the results in the autumn issue.

As we enter the Petaflop era, applications are the driving factors of all our activities. InSiDE presents the three winners of a Golden Spike Award of the HLRS. The applications honored by the HLRS steering committee report on direct numerical simulation of active separation control, the modelling of convection over West Africa, and the maturing of galaxies by black hole activity. Further application papers deal with the numerical simulation and analysis of cavitating flows, the driving mechanisms of the geodynamo, the physics of galactic nuclei, and the global parameter optimization of a physics-based united-residue force field for proteins.

Projects are an important driving factor for our community. They provide a chance to develop and test new methods and tools, to further improve existing codes, and to build the tools we need in order to harness the performance of our systems. A comprehensive projects section describes, among others, the recent advances in the UNICORE 6 middleware. This is a good example of how a nationally funded project that was supported by European projects turned into a community tool and is now taking the next step in the improvement process.

Let us also highlight a typical hardware project, which is an excellent example of how the community can drive hardware development. QPACE (Quantum Chromodynamics Parallel Computing on the Cell) is a project driven by several research institutions and IBM. The collaboration shows how we can deviate from mainstream technology and still work closely with leading-edge technology providers. The resulting merger of ideas and technologies provides an outstanding solution for a large group of application researchers.

As usual, this issue includes information about new systems and events in supercomputing in Germany over the last months and gives an outlook on workshops in the field. Readers are invited to participate in these workshops.

• Prof. Dr. H.-G. Hegering (LRZ)
• Prof. Dr. Dr. Th. Lippert (JSC)
• Prof. Dr.-Ing. M. M. Resch (HLRS)

Contents

News
PRACE: Ready for Implementation 4
New JUVIS Visualization Cluster 7

Applications
Direct Numerical Simulation of Active Separation Control Devices 8
Numerical Simulation and Analysis of Cavitating Flows 14
Modelling Convection over West Africa 18
Driving Mechanisms of the Geodynamo 26
The Maturing of Giant Galaxies by Black Hole Activity 32
The Physics of Galactic Nuclei 36
Global Parameter Optimization of a Physics-based United Residue Force Field (UNRES) for Proteins 40

Projects
Recent Advances in the UNICORE 6 Middleware 46
LIKWID Performance Tools 50
Project ParaGauss 54
Real World Application Acceleration with GPGPUs 58

Systems
Storage and Network Design for the JUGENE Petaflop System 62
QPACE 67

Centres 68
Activities 74
Courses 88

PRACE: Ready for Implementation

In October 2009 PRACE [1], the Partnership for Advanced Computing in Europe, demonstrated to a panel of external experts and the European Commission that the project made "satisfactory progress in all areas" [2] and "that PRACE has the potential to have real impact on the future of European HPC, and the quality and outcome of European research that depends on HPC services".

This marked an important milestone towards the implementation of the permanent HPC Research Infrastructure in Europe, consisting of several world-class top-tier centres managed as a single European entity. It will be established temporarily as an association with its seat in Lisbon in spring 2010. The project has prepared signature-ready statutes and other documents required to create the association as a legal entity. Among them is the Agreement for the Initial Period, which covers the commitments of the partners during the first five years, in which national procurements are foreseen by the hosting partners. Germany, France, Italy, and Spain have each committed themselves to provide supercomputing resources worth at least 100 Million Euro to PRACE during the initial period. The Netherlands and the UK are expected to follow suit. Access to the PRACE resources will be governed by a single European Peer Review system.

PRACE is one of 34 Preparatory Phase projects funded in part by the European Commission to prepare the important European Research Infrastructures identified in 2006 by ESFRI [3]. Having started in January 2008, PRACE is among the first projects to complete its work after only two years. This includes not only the legal and administrative tasks and the securing of adequate funds but, due to the specific challenges of the rapidly evolving nature of High Performance Computing, also very important technical work.

PRACE identified and classified a set of over twenty applications that constitute a cross-section of the most important scientific applications that consume a large portion of the resources of Europe's leading HPC centres. When necessary, these applications were ported to the architectures that PRACE identified as likely candidates for near-term Petaflop/s systems. The focus was, however, to evaluate how well the applications scale on the different architectures, how well the systems enable petascaling, and how difficult it is to exploit the parallelism provided by the hardware. Ten of these applications constitute the core of a benchmark suite that may be used in the procurements of the next Tier-0 systems.

In parallel PRACE conducted an in-depth survey of the HPC market involving all relevant vendors. The objective was to classify and select architectures promising to deliver systems capable of a peak performance of one Petaflop/s in 2009/10. As one important result a comprehensive set of architectures was defined, and prototypes covering this set were deployed at partner sites. In detail these were: IBM Blue Gene/P (MPP [4]) at FZJ, CRAY XT5 (MPP) at CSC, IBM Power 6 (SMP-FN) at SARA, IBM Cell at BSC (Hybrid), NEC SX-9 coupled with Nehalem (Hybrid) at HLRS, and Intel Nehalem clusters by Bull and Sun at CEA and FZJ, respectively (SMP-TN). All prototypes were evaluated in close-to-production conditions using synthetic benchmarks to measure system performance, I/O bandwidth, and communication characteristics. Most applications from the above set were evaluated on at least three different prototypes. The results contain a wealth of information for the selection of future systems and show where vendors and application developers should focus to improve performance. The Blue Gene/P installed at FZJ in 2009 is the first European Petaflop/s system that qualifies as a PRACE Tier-0 system. In addition, PRACE made the prototypes available to users through open calls for proposals. In the first and second round, over 8.9 million CPU hours were granted to nine research teams. The results will be published on the PRACE web site.

To stay at the forefront of HPC, PRACE has to play an active role: the project evaluated promising hardware and software components for future multi-Petaflop/s systems, including computing nodes based on NVIDIA as well as AMD GPUs, Cell, ClearSpeed-Petapath, FPGAs, and Intel Nehalem and AMD Barcelona and Istanbul processors; systems to assess memory bandwidth, interconnect, power consumption and I/O performance as well as hybrid and power-efficient solutions; and programming models or languages such as Chapel, CUDA, RapidMind, HMPP, OpenCL, StarSs (CellSs), UPC and CAF. PRACE co-funded work to extend the world's "greenest" supercomputer QPACE [5] for general purpose applications.

Finally, promising technologies, especially with respect to energy efficiency, will be evaluated with the ultimate goal to engage in collaborations with industrial partners to develop products, exploiting STRATOS, the PRACE advisory group for Strategic Technologies created in the PRACE Preparatory Phase project.

PRACE Management Board welcomes Bulgaria and Serbia to the PRACE Initiative (Toulouse, September 8, 2009). Photo by Thomas Eickermann.

PRACE started an advanced training and education programme for scientists and students in advanced programming techniques, novel languages, and tools. Among them were a summer school (Stockholm, 2008) and a winter school (Athens, 2009), six code porting workshops, and two seminars for industry. This will be continued and intensified in the permanent Research Infrastructure to ensure that users can effectively harness the hundred-thousand-fold parallelism and exploit the opportunities of novel or heterogeneous architectures in systems containing Cell processors or GPUs. Over 48 hours of training material are available on the PRACE web site as streaming videos.

The full impact of PRACE for European science and industry can only be achieved if it serves informed users. PRACE partners gave over 90 presentations at scientific, technical, and policy-setting events. PRACE showcased its achievements in the exhibitions of the Supercomputing Conferences in the US, the International Supercomputing Conferences (ISC) in Germany, and the EU-initiated ICT conferences. The PRACE award for the best paper on petascaling by a student or young European scientist at ISC is already a recognized tradition.

The successful achievements of the PRACE project were important prerequisites to become eligible to apply for a grant of 20 Mio Euro supporting the implementation phase. PRACE submitted a related proposal for a PRACE First Implementation Project in November 2009. The number of partners grew from 16 in 14 countries to 21 in 20 states [6], demonstrating the importance of HPC to Europe. Of course, the permanent PRACE Research Infrastructure will be open to all European states. The Gauss Centre for Supercomputing represents Germany both in the legal entity and in the project. The project partners asked the Research Centre Jülich to coordinate also the Implementation Phase project and to continue the very successful management provided by JSC's project management office.

The implementation phase project is expected to start in July 2010. To keep the momentum among the project partners and bridge the gap, an extension of the project until June 2010 has been proposed by the project and granted by the European Commission. This will allow the continuation of dissemination and training, the porting and scaling of additional applications, and the evaluation of future technologies.

PRACE is well positioned and ready to shape the European HPC ecosystem.

References
[1] The PRACE project receives funding from the European Community's Seventh Framework Programme under grant agreement no RI-211528. Additional information can be found under www.prace-project.eu
[2] The reviewers may only tick yes or no beside this predefined text when evaluating a project.
[3] ESFRI: European Strategy Forum on Research Infrastructures, report at ftp://ftp.cordis.europa.eu/pub/esfri/docs/esfri-roadmap-report-26092006_en.pdf
[4] MPP: Massively Parallel Processor; SMP-FN/TN: Symmetric Multi Processor – Fat Node/Thin Node
[5] See: http://www.green500.org/lists/2009/11/top/list.php
[6] Germany is represented by GCS as member and FZJ as coordinator to comply with EC regulations.

New JUVIS Visualization Cluster

JSC offers a new visualization service which was made possible by additional BMBF funding dedicated to the D-Grid project. The compute cluster, named JUVIS, is intended to be used as a parallel server for the visualization of scientific data. JUVIS allows the parallel processing of data as well as parallel rendering, while the final image is displayed on the desktop of the user's local workstation. The cluster has been procured, installed and tested at JSC and has been ready for operation for a couple of months. It consists of one login node, 16 compute nodes, and one file server with a 10.5 TB RAID system. In total the compute nodes are equipped with 128 CPU cores (32 quad-core Intel Xeon E5450 processors, 3.00 GHz), 256 GB main memory, and Nvidia Quadro FX 4600 GPUs.

In a first step, the visualization software ParaView was installed on JUVIS. ParaView is an open-source, multi-platform data analysis and visualization application based on the Visualization Toolkit (VTK). It is able to run in parallel on the new visualization cluster, allowing the post-processing and remote visualization of very large data sets without the need to transfer the data to the local workstation.

In addition to ParaView, many other visualization software systems can be used on JUVIS with remote rendering enabled. For this purpose, a Virtual Network Computing (VNC) server and the VirtualGL package are installed. This infrastructure allows the usage of any OpenGL-based visualization software with full 3D hardware acceleration on JUVIS, while the resulting image is displayed on the user's desktop in a VNC client window.
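To give an impression of the client/server workflow, the following short sketch drives a parallel ParaView session on a remote visualization cluster from the local ParaView Python client. It is not taken from the JUVIS documentation: the host name, port, file path and array name are placeholders, and it assumes that a parallel pvserver has already been started on the cluster through the site's usual launch mechanism.

```python
# Minimal, hypothetical sketch of remote parallel visualization with ParaView's
# Python interface; only the rendered result travels to the local desktop.
from paraview.simple import (Connect, OpenDataFile, Contour, Show, Render,
                             SaveScreenshot)

# Connect the local client to the remote pvserver; the data stay on the cluster.
Connect("juvis.example.org", 11111)                    # placeholder host/port

# Open a data set that resides on the cluster's file system.
data = OpenDataFile("/work/project/flow_0480.vtk")     # placeholder path

# Extract an iso-surface in parallel on the server side.
iso = Contour(Input=data, ContourBy=["POINTS", "pressure"], Isosurfaces=[0.5])

Show(iso)
Render()
SaveScreenshot("iso_pressure.png")
```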

Golden Spike Award by the HLRS Steering Committee in 2009

Direct Numerical Simulation of Active Separation Control Devices

Aircraft design has always been a struggle for the most "efficient" form or method. The most efficient has been defined either by increased peak performance or, in more recent times, by reduced operational cost. In engineering terms reduced operational cost equals reduced or constant drag with constant or increased lift. Today's commercial airliners commonly encounter so-called flow separation on the wing flaps during landing, which has led to the employment of rather complex and heavy flap mechanics to compensate for the reduced lift. Therefore the suppression of such flow separation constitutes a vital step towards more efficient wing and flap design. The cause of flow separation are positive pressure gradients along the streamline, which act as an obstacle to the oncoming fluid. In order to enable the fluid to overcome larger adverse pressure gradients it has to be energized (see Figure 1). One method to do so is to move high-energy fluid out of the undisturbed freestream towards the surface. The necessary vortical motion is induced by a large longitudinal eddy created with so-called Vortex Generators (VG). Already in use are passive vortex generators, i.e. small sheets attached to the wing surface. Indeed these VGs are well able to generate the aforementioned eddies, but there are also disadvantages due to the fact that passive VGs are optimized for a specific point of operation and induce parasitic drag at all other flight attitudes. In 1992 experimental work by Johnston et al. [1] showed the general ability to suppress flow separation using so-called Jet Vortex Generators (JVG). The major advantage over solid VGs stems from the fact that JVGs are actively controllable and thus cause no additional drag outside the point of operation. These promising findings led to intensive research in JVGs and culminated in exhaustive experimental parameter studies covering many aspects such as velocity ratio, radius and blowing angle.

Figure 1: Scheme of Jet Vortex Actuator

Albeit the outcomes of these experiments yield a very good general idea of the mechanisms of active flow control devices, there are still a number of open questions involved, as no detailed picture of the formation of the vortex and its interaction with the boundary layer could be gained from experiments yet. Therefore, any design suggestions rely heavily on empirical data and are difficult to transpose to different configurations. During industrial design processes parameter studies are thus often undertaken numerically by means of Reynolds-Averaged Navier-Stokes (RANS) methods. The crux of RANS lies in the fact that the involved model assumptions have to be adapted to every new configuration and in the fact that the underlying equations are inherently steady-state and thus do not really allow for simulation of transient processes. Within this context the presented work covers simulations of a generic JVG configuration by means of highly accurate Direct Numerical Simulation (DNS) methods. The DNS approach is used for its lack of any model assumptions. Therefore, it is well suited to provide a reference solution for coarser or "more approximate" numerical schemes. Furthermore, DNS allows for a computation of the unsteady flow formation, especially in the beginning of the vortex generation, and a detailed analysis of the fluid dynamics involved.

Figure 2: Instantaneous streamwise velocity contours and isosurfaces of vorticity visualized by the λ2 criterion (β=150°)

Navier-Stokes 3D

The numerical solver Navier-Stokes 3D (NS3D) has been developed at the Institut für Aero- und Gasdynamik (IAG) at Stuttgart University [2]. The program is based on the complete Navier-Stokes equations, i.e. no turbulence or small-scale models are used. The computational domain consists of a rectangular box. Boundary conditions are applied to define an inflow, outflow, freestream and wall region. The fundamental differential equations are converted into approximate difference equations using finite differences on a structured grid. The finite difference stencils are chosen to be of compact form, i.e. information of both the flow variable and its derivative is taken into account, resulting in a numerical scheme of spectral-like wave resolution in space. The evolution in time is simulated by an explicit Runge-Kutta time integration scheme which maintains the accuracy of the spatial resolution. The resulting linear system of equations can be solved very efficiently on a vector CPU like the NEC SX-9 at HLRS Stuttgart because of its strongly structured form. Furthermore the domain can be split into boxes of equal dimensions which are then assigned to one computational process each. Interprocess communication is realized by use of Message Passing Interface (MPI) routines. Within each MPI process the domain is furthermore parallelized employing NEC Microtasking shared-memory parallelization. All computations described here have been run on one node of the NEC SX-9 within 12 hrs computing time.
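The compact ("Padé-type") stencils mentioned above couple the unknown derivative values at neighbouring points through a small banded system, which is what gives them their spectral-like resolution. The sketch below is not taken from NS3D (a Fortran code for the NEC SX-9); it merely illustrates, in Python and under the assumption of a periodic 1-D grid, how the standard fourth-order compact first-derivative scheme leads to such a (cyclic) tridiagonal system.

```python
# Illustrative sketch only; NS3D's actual stencils and boundary treatment differ.
import numpy as np

def compact_first_derivative(f, h):
    """Fourth-order compact (Pade) first derivative on a periodic grid.

    Solves  (1/4) f'_{i-1} + f'_i + (1/4) f'_{i+1} = 3 (f_{i+1} - f_{i-1}) / (4 h)
    for all points simultaneously, i.e. the derivative values are coupled
    through a cyclic tridiagonal system - the "compact" idea described above.
    """
    n = f.size
    A = np.eye(n) + 0.25 * (np.eye(n, k=1) + np.eye(n, k=-1))
    A[0, -1] = A[-1, 0] = 0.25                       # periodic wrap-around
    rhs = 3.0 * (np.roll(f, -1) - np.roll(f, 1)) / (4.0 * h)
    return np.linalg.solve(A, rhs)                   # a banded solver would be used in practice

# Quick check against an analytic derivative.
n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
df = compact_first_derivative(np.sin(x), x[1] - x[0])
print(np.max(np.abs(df - np.cos(x))))                # small fourth-order error
```

In the production code the analogous banded systems arise along the grid lines of the structured three-dimensional grid, which is why they can be solved so efficiently on the vector CPUs of the NEC SX-9.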

Jet Vortex Generator Simulations

Numerical simulations have been done for two jet-in-crossflow configurations which differ in the jet exit's orientation only. The configurations were chosen to closely match experiments described in [3] and are representative of separation control devices. The crossflow represents a laminar boundary layer on a flat plate with zero pressure gradient at a freestream Mach number of Ma=0.25, which corresponds to landing speed for commercial airliners. The jet is described by a steady velocity distribution at the wall boundary of the computational domain and resembles the profile of a pipe flow. The jet-to-crossflow velocity ratio is set to R=3. The jets are inclined by an angle of attack α=30° and skewed to the downstream direction by angles β=30° and β=150°, respectively. One can imagine an oblique jet to be blown either against the oncoming main flow or in line with it. Altogether 18.4 million grid nodes are used and computations have been carried out for 31 flow-through times. Albeit this does not suffice to provide data for statistical analysis, it yields a good picture of the evolution of the perturbed flow field.

Even though both the jet and the laminar boundary layer are initially in a steady state, the resulting flow regime becomes highly unsteady, with the jet exhibiting unstable modes leading to the formation of large transient vortex structures reaching far out of the boundary layer (Figure 2). The jet affects the boundary layer by mainly two mechanisms, namely the blockage of the oncoming fluid and the strong shear at the jet slopes. The blockage leads to flow structures similar to the wakes found behind solid obstacles, i.e. periodic eddy shedding close to the wall and a so-called horseshoe vortex which wraps around the jet. Furthermore the oncoming flow deflects the jet into the main streamwise direction, which leads to additional flow features unique to jets in crossflow, such as the entrainment of near-wall fluid layers upwards into the jet wake. Both the induced perturbations of the crossflow behind the jet exit and the fluid entrainment evidently depend on the jet parameters. The shear on the slope of the jet, on the other hand, induces a rotational motion which is prolonged and fortified behind the jet exit and leads to the formation of a large longitudinal eddy above the wall. In order to gain an insight into the fluid dynamics, snapshots of the flow field are recorded and evaluated. Figures 3 and 4 depict the flow at the last recorded time step. In case of a skew angle of β=30°, two longitudinal vortices establish in the flow and wrap around each other due to the induced velocities in the transversal plane. The development of a high-speed streak close to the wall takes place with an inclination to the centerline of the flow. The two longitudinal vortices seem not subject to instabilities at that point in time. They stretch rather well-defined downstream and upwards along the plate. The oncoming freestream deflects the jet in both spanwise and wall-normal direction. Thus a strong shear layer develops on the top side of the jet. The sweep of the jet over the boundary layer results in the near-wall fluid layers being entrained upwards into the trajectory of the jet.
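The jet described above enters only through its footprint on the wall boundary: a steady, pipe-flow-like velocity profile tilted by the pitch angle α and the skew angle β and scaled to the velocity ratio R. The short sketch below is not part of NS3D; it simply illustrates, under the assumption of a parabolic (Poiseuille-type) exit profile and a made-up orifice radius, how such a wall boundary condition could be assembled.

```python
# Hedged sketch only: assemble a steady jet-exit velocity field on the wall
# plane for given pitch/skew angles (profile shape and radius are assumptions).
import numpy as np

def jet_wall_velocity(x, z, x0=0.0, z0=0.0, radius=0.5,
                      R=3.0, u_inf=1.0, alpha_deg=30.0, beta_deg=30.0):
    """Return (u, v, w) on the wall plane y = 0 for an inclined, skewed jet.

    The magnitude follows a parabolic pipe-flow profile with peak velocity
    R * u_inf at the orifice centre; the direction is tilted by the pitch
    angle alpha (out of the wall) and the skew angle beta relative to the
    oncoming flow, as in the two configurations beta = 30 and 150 degrees.
    """
    alpha, beta = np.radians(alpha_deg), np.radians(beta_deg)
    r2 = ((x - x0) ** 2 + (z - z0) ** 2) / radius ** 2
    mag = np.where(r2 < 1.0, R * u_inf * (1.0 - r2), 0.0)   # zero outside the orifice
    u = mag * np.cos(alpha) * np.cos(beta)    # streamwise component
    v = mag * np.sin(alpha)                   # wall-normal component
    w = mag * np.cos(alpha) * np.sin(beta)    # spanwise component
    return u, v, w

# Example: sample the boundary condition on a small patch of the wall.
x, z = np.meshgrid(np.linspace(-1, 1, 41), np.linspace(-1, 1, 41), indexing="ij")
u, v, w = jet_wall_velocity(x, z)
```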

Figure 3: Instantaneous downstream velocity and streamlines (β=30°)
Figure 4: Instantaneous downstream velocity and streamlines (β=150°)

The entrainment region does not extend very far downstream though, and the exchange of high- and low-speed layers inside the boundary layer is not very strong. A different flow can be observed when the jet is blown against the main flow. Firstly, the blockage is a lot stronger than in the previous case. Secondly, the wake exhibits instabilities which lead to a mixing of the fluid layers in all directions. Again the jet wake gets deflected, but additionally the entrainment region extends further downstream and the wake rolls up into a longitudinal vortex.

Boundary Layer Control

As already mentioned, jet-in-crossflow configurations are investigated as a means to suppress boundary layer separation. This is to be achieved by moving faster fluid layers closer to the wall and therewith increasing the wall friction. Figure 5 depicts a comparison of the time-averaged wall-shear-stress distribution for the simulated cases. Both configurations lead to a net increase of wall shear stress in a confined stripe behind the jet. In case of a jet angle of β=30°, though, this area overlaps the trajectory of the jet only marginally. The additional energy is fed to the boundary layer in an overly straightforward fashion by adding momentum. In order to exploit such a mechanism with a reasonable input-to-output ratio, a tangential jet seems more advisable. On the other hand, when the jet is directed against the main flow the region of increased shear extends promisingly in both downstream and spanwise direction. The reason is the different fluid dynamics at work. Firstly, the jet acts as a turbulator, which leads to fuller velocity profiles compared to the unperturbed flow; furthermore, the jet-crossflow interaction leads to the formation of a longitudinal vortex which transports fast fluid layers closer to the wall. Therefore this configuration seems feasible for active flow control devices even in already fully turbulent shear flows as they are found on aircraft airfoils and flaps.

Figure 5: Comparison of mean wall shear distribution in actuated flow (top: β=150°, bottom: β=30°)

Conclusion

Simulations of two different jet-in-crossflow configurations have been carried out in order to investigate the effect on a boundary layer at Ma=0.25. The jets have been skewed and inclined by values typically found in active-flow-control device setups. The simulations gave insight into the generation of a jet-vortex system and wake perturbations. In case of a jet aligned with the mean flow, a stable flow established in which the boundary layer was only poorly energized by the jet. A skew of the jet against the main flow led to the generation of a longitudinal vortex, which in turn led to significantly increased wall friction. Following simulations may include further variation of the pitch and skew angles of the jet as well as variation of the velocity ratio. Also of interest is a change of initial conditions towards a fully developed turbulent boundary layer.

Acknowledgements
We thank the High Performance Computing Center Stuttgart (HLRS) for the provision of supercomputing time and technical support within the project "LAMTUR".

References
[1] Johnston, J. P., Nishi, M., "Vortex Generator Jets – a Means for Flow Separation Control", AIAA Journal, 28(6), pp. 989–994, 1990
[2] Babucke, A., Kloker, M., Rist, U., "Direct Numerical Simulation of a Serrated Nozzle End for Jet-noise Reduction", in M. Resch (Ed.), High Performance Computing in Science and Engineering 2007, Springer, 2007
[3] Casper, M., Kähler, C. J., Radespiel, R., "Fundamentals of Boundary Layer Control with Vortex Generator Jet Arrays", 4th Flow Control Conference, AIAA Paper 2008-3995, 2008

• Björn Selent
• Ulrich Rist

Institute of Aero- and Gasdynamics, University of Stuttgart

Numerical Simulation and Analysis of Cavitating Flows

Cavitation is a phenomenon which occurs in liquid flows if the pressure decreases locally below the vapor pressure. The liquid evaporates and a cavity of vapor is generated. With respect to fluid dynamics within hydraulic machines such as injection nozzles, pumps or ship propellers, local decompression of the liquid towards or even below vapor pressure commonly occurs [3]. Thereby, the liquid partially evaporates and completely alters its thermodynamic and fluid dynamic properties. Aside from harmful characteristics like cavitation erosion and noise production, an increasing number of modern applications benefit from controlled cavitation. Modeling, simulation and analysis of cavitating flows is a challenging task due to the strongly increased complexity as compared to pure liquid or pure gas flows. The main challenge is to take compressibility effects of the liquid and of the vapor into account. This is necessary in order to predict the formation and propagation of violent shocks due to collapsing vapor bubbles. Furthermore, the physics of cavitating flows contains multiple spatial and temporal scales that need to be sufficiently resolved by the numerical method.

Figure 1: Three instances in time of the predicted cavitating flow around a standardized shaped hydrofoil (NACA 0015) (Δtpic = 3.4 · 10^-3 s). The iso-surfaces represent the vapor volume fraction of α = 5%, that means 5% of the computational cell volume is covered by vapor. (The images were rendered with Blender using 8 CPU cores on the remote visualization server.)

Figure 2: Vapor volume fraction α within a cavity at a cut plane at midspan position. The detached cavity shows a complex inner structure.

To investigate the fundamental dynamics of cavitation we compute the shedding of sheet and cloud cavitation around a NACA 0015 hydrofoil within a rectangular test section [1]. The simulations were performed by applying the in-house CFD tool CATUM (CAvitation Technische Universität München), which is described in [4]. This tool relies on the Finite Volume Method and operates on block-structured meshes. It is written in FORTRAN 95 and parallelized with MPI. A special feature of CATUM is its low-Mach-number consistency, which is necessary to simulate wave propagation at 1500 m/s even for very low speed liquid flows. To reduce the computational effort a grid sequencing technique is applied as follows. The simulation is initiated on a coarse grid consisting of 2 x 10^5 cells. The solution of this simulation is interpolated onto a refined grid and provides the initial conditions for the next run on the refined grid. This grid sequencing procedure is repeated two times until the finest grid is reached. The finest grid consists of 2.4 x 10^7 cells, 250 in spanwise direction and 600 around the hydrofoil. The complete computation required about 10^5 CPU hours and was performed using 64 to 192 CPU cores on a compute cluster of the Leibniz-Rechenzentrum (LRZ). In order to resolve wave dynamics we apply numerical time steps of 8.5 x 10^-8 s. Every 1000th time step was stored on disk, giving a total of 480 data sets. Only this high resolution in time allows us to investigate the propagation of shock waves in the flow field in the postprocessing phase. Due to the high resolution in time and space, 700 GB of data were produced. To analyze this amount of data the new remote visualization server of the LRZ is used. This machine is directly connected to the high-performance file system of the compute servers and, therefore, no transfer of data to a local machine is necessary.
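The grid sequencing just described can be summarized as a simple driver loop: converge or advance the solution on a coarse mesh, interpolate it to the next finer mesh as the initial condition, and repeat until the target resolution is reached. The following Python sketch is purely illustrative and not part of CATUM (a Fortran 95/MPI code); the solver and interpolation routines are hypothetical stand-ins.

```python
# Illustrative driver for the grid-sequencing idea (not CATUM itself).
# `run_solver` and `interpolate_to` are hypothetical stand-ins for the CFD kernel
# and a prolongation operator onto the finer mesh.
def grid_sequencing(grids, run_solver, interpolate_to, initial_state):
    """Advance the solution through a hierarchy of successively refined grids.

    grids           -- list of meshes ordered from coarsest to finest
                       (e.g. 2e5 cells up to 2.4e7 cells as in the article)
    run_solver      -- run_solver(grid, state) -> advanced state on `grid`
    interpolate_to  -- interpolate_to(state, fine_grid) -> state on `fine_grid`
    initial_state   -- initial condition on the coarsest grid
    """
    state = initial_state
    for level, grid in enumerate(grids):
        state = run_solver(grid, state)                 # solve on the current level
        if level + 1 < len(grids):                      # prolongate to the next level
            state = interpolate_to(state, grids[level + 1])
    return state                                        # solution on the finest grid
```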

Additionally, the throughput rate is significantly higher than that of a local disc, which is crucial for time-dependent flow calculations. For data analysis, VisIt (using up to 24 CPU cores) and Tecplot (using up to 8 CPU cores) on the remote visualization server were used to extract flow features like iso-surfaces of the vapor volume fraction α, slices of the flow field and flow vectors.

The grid sequencing technique enables an investigation of the grid dependency of the solutions. We observe that large-scale structures like the shedding of vapor clouds are grid independent and that the cavity length and the shedding frequency remain constant for all grids. However, the flow computed on the finest grid exhibits more small-scale structures like horseshoe vortices and vortices in spanwise direction, as shown in Figure 1. A detailed study of the inner structure of the vapor regions was carried out on the finest grid. Figure 2 shows the vapor fraction at a cut plane at midspan position of the hydrofoil at one instance in time. It is observed that the attached cavity at the leading edge consists of almost pure vapor (white color), whereas the detached cloud exhibits a complex inner structure with strong variations of the vapor content. Each time such a cloud collapses, strong shocks are generated. Figure 3 depicts a collapse directly at the trailing edge of the hydrofoil which generates a maximum pressure of p = 225 bar. These pressure peaks at the surface are responsible for the damage of the material which is known as cavitation erosion. By recording the impact positions and the maximum pressures at the surface of the hydrofoil during the whole simulation we obtain detailed information on erosion-sensitive areas [2], as shown in Figure 4. We observe instantaneous maximum loads of up to 2500 bar. Our investigation confirms experimental observations that the regions close to the trailing edge are highly vulnerable to cavitation erosion.

Acknowledgments
Our investigations on the numerical simulation of cavitating flows, especially the prediction of erosion-sensitive areas, are supported by the KSB Stiftung Frankenthal. We also would like to thank the Deutsche Forschungsgemeinschaft (DFG) for supporting further investigations on cavitation in injection nozzles of automotive engines. The support and provision of computing time by the LRZ is also gratefully acknowledged.

References
[1] Schmidt, S. J., Thalhamer, M., Schnerr, G. H., "Inertia Controlled Instability and Small Scale Structures of Sheet and Cloud Cavitation", Proceedings of the 7th International Symposium on Cavitation, CAV2009, Paper No. 17
[2] Schmidt, S. J., Thalhamer, M., Schnerr, G. H., "Density Based CFD-Techniques for the Simulation of Cavitation Induced Shock Emission", Proceedings ETC – 8th European Turbomachinery Conference, Graz, Austria, pp. 197-208, 2009
[3] Sezal, I. H., Schmidt, S. J., Schnerr, G. H., Thalhamer, M., Förster, M., "Shock and Wave Dynamics in Cavitating Compressible Liquid Flows in Injection Nozzles", Journal of Shock Waves (Springer), DOI 10.1007/s00193-008-0185-3, 2009
[4] Sezal, I. H., "Compressible Dynamics of Cavitating 3-D Multi-Phase Flows", PhD Thesis, Technische Universität München, 2009

• Matthias Thalhamer • Steffen J. Schmidt • Günter H. Schnerr

Lehrstuhl für Fluidmechanik, Fachgebiet Gasdynamik, TU München

Figure 3: Shock formation and propagation due to collapsing vapor clouds. The collapse directly at the trailing edge generates a pressure of p = 225 bar.
Figure 4: Instantaneous maximum loads of up to p = 2500 bar at the surface of the hydrofoil during the complete computation. High peaks indicate areas where cavitation erosion is likely to occur.

Golden Spike Award by the HLRS Steering Committee in 2009

Modelling Convection over West Africa

In the African Sudanian (10°N-15°N) and Sahelian (15°N-18°N) climate zones convective systems play a key role in the water cycle, because they contribute about 80% to the annual rainfall. The convective systems are thunderstorm complexes with a horizontal extent of several hundreds of kilometers. They are part of the West African monsoon (WAM), which also impacts the downstream tropical Atlantic by providing the seedling disturbances for the majority of Atlantic tropical cyclones [1].

The WAM system is characterized by the interaction of the African easterly jet (AEJ), the African easterly waves (AEWs), the Saharan air layer (SAL), as well as by the low-level monsoon flow, the Harmattan, and the mesoscale convective systems (MCSs). The AEJ can be observed between 10-12°N at a height of about 600 hPa and has a typical wind speed of about 12 m s-1. The AEJ develops as a result of the reversed meridional temperature gradient due to the relatively cool and moist monsoon layer and the hot and dry SAL, which is located above the monsoon layer. The AEWs are synoptic-scale disturbances that propagate westwards across tropical West Africa toward the eastern Atlantic and the eastern Pacific. The AEWs are characterized by propagation speeds of 7-8 m s-1, a period of 2-5 days, and a wavelength of about 2,500-4,000 km. Important parameters for the initiation of convection are the spatial distribution and temporal development of water vapour in the convective boundary layer (CBL). Besides advective processes, water vapor is made available in the atmosphere locally through evapotranspiration from soil and vegetation. Many research findings show that soil moisture exerts a greater influence on the CBL than vegetation.

In the scope of the African Monsoon Multidisciplinary Analyses (AMMA) project we investigate the development of MCSs, the sensitivity of their life cycle to differing surface properties, the role of larger-scale weather systems (AEWs, the Saharan heat low) in their evolution, and the development of tropical cyclones out of such systems. In addition, we study the interaction of the SAL with African monsoon weather systems.

To simulate the weather systems over West Africa we use the COSMO (COnsortium for Small scale MOdelling, www.cosmo-model.org) model [2,3]. COSMO is an operational weather forecast model used by several European weather services, e.g. the German Weather Service (DWD). Additionally, we use COSMO coupled with the aerosol and reactive trace gases module (COSMO-ART) to investigate the interaction of the SAL with WAM systems. COSMO-ART [4,5] was developed in Karlsruhe and computes the emission and the transport of mineral dust. We used the computer facilities of the HP XC 4000 at the Steinbuch Centre for Computing (SCC), as the COSMO model requires substantial supercomputer resources. In the following, we focus on two different topics: on the one hand, the life cycle of an MCS over West Africa is modelled with respect to the soil conditions; on the other hand, the focus lies on a comparison between the simulated MCSs over West Africa and over the eastern Atlantic.

The first part of this study investigates the sensitivity of the life cycle of an MCS to surface conditions [6]. The analysis is based on simulations of a real MCS event on 11 June 2006, which occurred in the pre-onset phase of the monsoon, when vegetation cover is low and the impact of soil moisture is assumed to be dominant. Different conditions for soil moisture were applied for the initialization of the soil model TERRA-ML. The run based on the COSMO soil type distribution and on original ECMWF fields was denoted MOI. However, comparison with AMSR-E satellite data showed that the MOI field contained too much soil moisture in the upper surface layer. Therefore, we reduced the volumetric soil moisture content in all layers by 35% compared to the initial conditions of MOI. This resulted in a soil moisture content in the uppermost level of TERRA-ML similar to the soil moisture values of about 12% derived from the AMSR-E satellite and to in-situ measurements of about 18% for the uppermost 5 cm taken at Dano (3°W, 11°N) [7] for the region around 11°N, where the MCS was observed. The corresponding simulation was designated as the CTRL experiment. To eliminate the effect of spatial soil moisture variability on the initiation of convection, an additional simulation with a homogeneous (HOM) distribution of soil moisture and soil texture was performed. In this case the volumetric soil moisture was specified as the mean of the volumetric soil moisture in the CTRL experiment along 11°N from 4.5°W to 4.5°E. To investigate the effect of dry regions on convective systems, where the soil moisture structure is less complex than the conditions present in the CTRL run, a dry band of 2 degrees longitudinal extension was inserted into the homogeneous soil moisture field. In this band the volumetric soil moisture was reduced by 35% compared to the homogeneous environment. The corresponding experiment was denoted as BAND.
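The HOM and BAND soil moisture fields are derived from the CTRL field by two simple operations: a zonal mean taken around 11°N between 4.5°W and 4.5°E, and a 35% reduction inside a 2°-wide longitude band. The sketch below is not part of the COSMO/TERRA-ML preprocessing; it only illustrates these two constructions on a generic latitude-longitude array, with assumed array layout and a placeholder band position.

```python
# Illustrative only: construct HOM and BAND initial soil moisture fields from a
# CTRL field on a regular lat/lon grid (names and grid layout are assumptions).
import numpy as np

def make_hom_and_band(w_ctrl, lons, lats, band_west=-2.0, band_east=0.0):
    """w_ctrl: volumetric soil moisture with shape (lat, lon); lons/lats in degrees."""
    # HOM: a single mean value, taken around 11N between 4.5W and 4.5E.
    ref = (np.abs(lats - 11.0) < 0.5)[:, None] & ((lons >= -4.5) & (lons <= 4.5))[None, :]
    w_hom = np.full_like(w_ctrl, w_ctrl[ref].mean())

    # BAND: reduce the homogeneous value by 35% inside a 2-degree-wide
    # longitude band (the band position here is a placeholder).
    w_band = w_hom.copy()
    in_band = (lons >= band_west) & (lons <= band_east)
    w_band[:, in_band] *= 1.0 - 0.35
    return w_hom, w_band
```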

Figure 1: Volumetric soil moisture in % at 15 UTC in the uppermost layer (color shaded) and precipitation in mm h-1 (isolines, interval 2) on June 11, 2006 at 17 UTC for the CTRL (a), HOM (b), and MOI (c) case. Taken from [6].

In the CTRL case three separate cells were initiated in the south-eastern part of Burkina Faso. Precipitation of up to 6 mm h-1 was simulated at 17 UTC (Figure 1a). The south-western most cell developed in the lee of an area with orographically induced upward motion. Triggering of convection often occurs in this way in West Africa. Two further cells developed in the east, at 1.9°E and 11.9°N and at 2.2°E and 11.6°N. In Figure 1a the precipitation pattern of the CTRL case at 17 UTC is overlaid on the soil moisture distribution in the uppermost layer at 15 UTC. This figure shows that all three cells developed in the transition zone from a wetter to a dryer surface, while the centres of the precipitating cells were positioned over the dryer surface. In comparison to the CTRL case, only two precipitating cells had developed at 17 UTC in the HOM case (Figure 1b). These two cells were observed at roughly the same locations as the most intensive cells in the CTRL case. However, the precipitation of both cells was less intense than that of the CTRL case at the same time. In the MOI case the favorable conditions with high convective available potential energy (CAPE) and low convective inhibition (CIN) values were more limited in space than in the other cases. In addition, the surface temperature in the region of interest in the MOI case was about 3°C lower than in the CTRL case. Under these conditions only one weak precipitating cell had developed at 17 UTC.

Once triggered, the convective cells developed quickly in the CTRL and HOM cases and moved with the AEJ towards the west. About two hours after their initiation, the cells had already organized into an MCS in the CTRL run. In the HOM case three separate cells could still be distinguished at 19 UTC, which were less intense than in the CTRL case. The MOI run showed only weak convective activity. The impact of the dry band, with a volumetric moisture content of 8.3% surrounded by a homogeneous moisture content of 12.7%, on the modification of a mature MCS is shown in Figures 2a and 2b. Precipitation was significantly reduced between 0 and 2°W, i.e. shifted by about one degree to the east of the dry band. Precipitation was interrupted when the MCS approached the dry band, but re-generated when the system reached the western part of the band (Figure 2b). Convection was also initiated in the western part of the dry band (around 3°W) at about 19 UTC. The cloud cluster was accompanied by significant rainfall and moved to the west. At 18 UTC an area with lower CAPE values had developed within the dry band, while CIN increased to the east and west of it (Figure 2c). East of the band (0.83°W, 12.2°N) CIN further increased during the subsequent hours. Inside the band (1.8°W, 12.2°N) CIN was significantly lower. The lower CAPE inside the dry band resulted mainly from lower near-surface humidity. East of the dry band a lower near-surface temperature led to a higher CIN, which inhibited the convection of the approaching MCS. The increase of CIN inside the dry band later that night resulted from the nocturnal decrease of near-surface temperature and the passage of the MCS in the surrounding.

The simulations showed that convection was initiated in all model experiments, regardless of the initial soil moisture distribution.
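CAPE and CIN, which control whether the approaching MCS can re-develop, are vertical integrals of parcel buoyancy: CAPE sums the positive buoyancy above the level of free convection, CIN the negative buoyancy below it. The sketch below is a generic, simplified parcel-theory calculation on given virtual-temperature profiles; it is not the diagnostic used in [6].

```python
# Simplified illustration of CAPE/CIN from parcel theory (not the COSMO diagnostic).
# tv_parcel and tv_env are virtual-temperature profiles in K on heights z in m.
import numpy as np

G = 9.81  # m s-2

def cape_cin(z, tv_parcel, tv_env):
    """Integrate parcel buoyancy: the positive part above the level of free
    convection contributes to CAPE, the negative part below it to CIN (J kg-1)."""
    buoy = G * (tv_parcel - tv_env) / tv_env          # buoyancy acceleration
    lfc = np.argmax(buoy > 0.0)                       # first positively buoyant level
    cape = np.trapz(np.maximum(buoy[lfc:], 0.0), z[lfc:])
    cin = np.trapz(np.minimum(buoy[:lfc + 1], 0.0), z[:lfc + 1])
    return cape, -cin                                 # report CIN as a positive number

# Toy profiles: a parcel slightly cooler near the surface, warmer aloft.
z = np.linspace(0.0, 12000.0, 121)
tv_env = 300.0 - 6.5e-3 * z
tv_parcel = tv_env + np.where(z < 1500.0, -0.5, 1.0)
print(cape_cin(z, tv_parcel, tv_env))
```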

Figure 2: 24-h accumulated precipitation in mm (color shaded) starting from 06 UTC on 11 June 2006 (a), Hovmöller diagram of precipitation in mm h-1 (color shaded) averaged between 10.5°N and 13.5°N on June 11 and 12, 2006 (b), and CIN in J kg-1 (color shaded) on June 11, 2006 at 18 UTC (c) for the BAND case. The solid lines enframe the area of the dry band. Taken from [6].

The area where convection was initiated in the simulations corresponded roughly with the observations. In the CTRL case all three cells were initiated along soil moisture inhomogeneities and shifted towards the dry surface. Triggering of convection and optimal evolution were simulated in areas with low CIN, high CAPE and low soil moisture content (< 15%) or soil moisture inhomogeneities. In CTRL and HOM the cells developed quickly and merged into organized mesoscale systems while moving westwards. The MCS in the CTRL run experienced a significant modification: the precipitation disappeared when the MCS reached a region which was characterized by high CIN values (> 150 J kg-1), reduced total water content and soil moisture inhomogeneities.

In conclusion, triggering of convection on this day was favoured by drier surfaces and/or soil moisture inhomogeneities, while a mature system weakened in the vicinity of a drier surface. This means that a positive feedback between soil moisture and precipitation existed for a mature system, whereas a negative feedback was found for the triggering of convection. The two mechanisms are illustrative of the complexity of soil-precipitation feedbacks in an area where a high sensitivity of precipitation to soil moisture was proven.

The second study investigates convective systems over West Africa and the eastern Atlantic. The convective systems are embedded in the AEW out of which a hurricane developed. In the afternoon hours of September 9, 2006 an MCS was initiated over land ahead of the trough of this AEW. The convective system increased quickly in intensity and size and developed three westward-moving, arc-shaped convective systems (Figure 3a,b). The near-surface winds depict the mostly westerly monsoon inflow and the weak easterly inflow into the system. As the system decayed, new convective bursts occurred in the remains of the old MCS. This led to structural changes in the form of a squall line crossing the West African coastline around midnight on September 11, 2006. During the next 24 hours, intense convective bursts occurred over the eastern Atlantic. These convective bursts were embedded in a cyclonic circulation which intensified and became a tropical depression on September 12, 2006, 12 UTC, out of which Hurricane Helene developed. The largest and longest-lived convective burst in this intensification period (Figure 3c,d) was analyzed here and compared to the convective system over land. The convective systems modify their environment, and these changes can be related to the structure of the system itself.

A series of COSMO runs over large model domains (1000 x 500 gridpoints) with 2.8 km horizontal resolution was carried out such that the model area was centred around the MCS. As the system moved across West Africa, the position of the model region was adjusted for each subsequent run. This enabled us to identify structural features of convective systems over West Africa and the eastern Atlantic, and to analyze and discuss their differences. All the runs are 72 h in duration. The parameterization of convection is switched off. The model source code was adapted to provide information for moisture, temperature and momentum budgets [8,9,10].

The model was able to capture the above described convective systems as well as their structure and intensity changes very well. We identified three stages in the life cycle of the MCS over West Africa: the developing, the mature, and the decaying phase. To analyze the structure of these convective systems in more detail, an east-west cross section (Figure 4a) is drawn through the convective system shown in Figures 3a,b. During the mature phase, low-level convergence occurs between a strong westerly inflow and an easterly inflow from behind the system and is partly enhanced due to the descending air from the rear of the system. The associated ascent region extends up to about 200 hPa and is tilted eastwards with height (Figure 4a). The convergence continues up to 700 hPa. At this stage strong downdraughts have developed around 7.5°W, just behind the low-level convergence zone that is located under the tilted updraught region.

Figure 3: Comparison between the Rapid Developing Thunderstorm Product (RDT) (left) and COSMO model runs (right) on September 10, 2006 at 04 UTC (upper row) and on September 12, 2006 at 8 UTC (lower row). Left: Convective objects are superimposed over the Meteosat infrared images using shadings of grey up to -65°C, orange-red colors between -65°C and -81°C, and black above -81°C in the RDT images. Courtesy of Météo France. Right: The vertical integral of cloud water, cloud ice and humidity (kg m-2), indicating the convective updraught cores, from the 2.8-km COSMO run which was initialized on September 9, 2006 at 12 UTC. Taken from [10].

The area of the westerly inflow reaches deep into the system, up to about 700 hPa, and the AEJ has its maximum around 600 hPa. Divergence associated with the upper-level outflow can be seen at around 200 hPa, with very strong easterly winds ahead of the system and westerly winds in the rear. Thus, there is strong mid-level convergence east of the tilted updraught.

The cross section through a convective system embedded in the developing tropical depression over the eastern Atlantic is shown in Figure 4b. A major difference between the MCS over West Africa and the convective bursts that are embedded in the circulation over the Atlantic is their lifetime. The MCS over land lasted for about 3 days, the succession of MCSs over the ocean only for about 6 to 24 hours. The region of maximum heating and ascent is vertical and not tilted as in the convective system over land. The AEJ seems to occur at around 700 hPa west of the convective system. This is about 100 hPa lower than for the MCS over West Africa, where the air is accelerated due to air that exited the updraught core at lower levels and slightly descends. It is also apparent from the cross section that the downdraughts are not as strong as for the MCS over land. Furthermore, the low-level convergence covers a much broader region than over the continent.

To quantify the difference between the convective systems over land and over water and to assess the influence of the convective systems on the environment, potential temperature, relative vorticity, and potential vorticity budgets for regions encompassing the convective region were calculated. Details and the results of the budget calculations can be found in [10]. In future studies the analysis could be applied to other cases in order to generalise these results.

Acknowledgment
This project received support from the AMMA-EU project. Based on a French initiative, AMMA was built by an international scientific group and funded by a large number of agencies, especially from France, UK, US and Africa. It has been the beneficiary of a major financial contribution from the European Community's Sixth Framework Research Programme. Detailed information on scientific coordination and funding is available on the AMMA International web site: http://www.amma-international.org

References
[1] Avila, L. A., Pasch, R. J., "Atlantic Tropical Systems of 1993", Monthly Weather Review, 123:887-896, 1995
[2] Steppeler, J., Doms, G., Schättler, U., Bitzer, H. W., Gassmann, A., Damrath, U., Gregoric, G., "Meso-gamma Scale Forecasts using the Nonhydrostatic Model LM", Meteorology and Atmospheric Physics, 82:75-97, 2003
[3] Schättler, U., Doms, G., Schraff, C., "A Description of the Nonhydrostatic Regional Model LM", Part VII: User's Guide, Deutscher Wetterdienst, www.cosmo-model.org, 2008
[4] Vogel, B., Hoose, C., Vogel, H., Kottmeier, C., "A Model of Dust Transport Applied to the Dead Sea Area", Meteorologische Zeitschrift, 15, DOI: 10.1127/0941-2948/2006/0168, 2006
[5] Vogel, B., Vogel, H., Bäumer, D., Bangert, M., Lundgren, K., Rinke, R., Stanelle, T., "The Comprehensive Model System COSMO-ART – Radiative Impact of Aerosol on the State of the Atmosphere on the Regional Scale", Atmospheric Chemistry and Physics, 9:14483-14528, 2009
[6] Gantner, L., Kalthoff, N., "Sensitivity of a Modelled Life Cycle of a Mesoscale Convective System to Soil Conditions over West Africa", Quarterly Journal of the Royal Meteorological Society, 136(s1):471-482, 2010
[7] Kohler, M., Kalthoff, N., Kottmeier, C., "The Impact of Soil Moisture Modifications on CBL Characteristics in West Africa: A Case Study from the AMMA Campaign", Quarterly Journal of the Royal Meteorological Society, 136(s1):442-455, 2010
[8] Grams, C. M., "The Atlantic Inflow: Atmosphere-Land-Ocean Interaction at the South Western Edge of the Saharan Heat Low", Master's thesis, Institut für Meteorologie und Klimaforschung, Universität Karlsruhe, Karlsruhe, Germany, March 2008
[9] Grams, C. M., Jones, S. C., Marsham, J. H., Parker, D. J., Haywood, J. M., Heuveline, V., "The Atlantic Inflow to the Saharan Heat Low: Observations and Modelling", Quarterly Journal of the Royal Meteorological Society, 136(s1):125-140, 2010
[10] Schwendike, J., Jones, S. C., "Convection in an African Easterly Wave over West Africa and the Eastern Atlantic: a Model Case Study of Helene (2006)", Quarterly Journal of the Royal Meteorological Society, 136(s1):364-396, 2010

• Juliane Schwendike
• Leonhard Gantner
• Norbert Kalthoff
• Sarah Jones

Institut für Meteorologie und Klimaforschung, Karlsruher Institut für Technologie

Figure 4: Cross sections through the convective system over land (a) and over the ocean (b). The vertical velocity (shaded) and zonal wind (contour interval is 3 m s-1) are shown. The cross section (a) is along 13.1°N on September 10, 2006, 05 UTC, and (b) along 12.8°N on September 12, 2006, 09 UTC. Taken from [10].
Driving Mechanisms of the Geodynamo

The Earth's outer core is a region of our planet extending from a depth of 2,900 km to about 5,100 km. This region is not accessible to direct measurements. Geophysicists have at least partially revealed the nature of the Earth's outer core. According to today's knowledge, fluid motions in the Earth's metallic liquid outer core drive a machinery which is called the Geodynamo. Potential energy, which stems from the accretion of the Earth, is transformed into kinetic energy and further into magnetic energy by magnetic induction in an electrically conducting fluid.

The liquid outer core of the Earth is assumed to be a mixture composed predominantly of iron and nickel and a small fraction of lighter elements (e.g., sulphur, oxygen or silicon). Two potential sources are available for driving convection and thereby the dynamo in the liquid outer core: density variations due to temperature and due to compositional differences (cf. Jones, 2007 for an overview).

Temperature differences result from secular cooling of the planet. Once the planet has sufficiently cooled to grow a solid inner core, the latent heat released in the solidification process provides another source for thermal convection. Radioactive elements such as potassium (40K) are an additional possible thermal source. From the material physics point of view it remains an open question whether radioactive elements can indeed be present in the core. The additional compositional differences arise from light elements being released from the solidifying material at the Inner Core Boundary (ICB). It is still unclear which driving source dominates core convection and thus the dynamo mechanism. Additional potential sources of dynamo action, but likely less effective, are precession-induced flows due to the inclined rotation axis as well as tidal forces.

Various numerical dynamo simulations have demonstrated that convective flows in rotating fluids can maintain a self-sustaining dynamo. However, due to limited computational resources, these models do not come even close to realistic planetary conditions. On the other hand they are able to produce magnetic fields which show Earth-like features. Typically, numerical dynamo simulations do not take into account the effects that may arise from the presence of two buoyancy sources with significantly different diffusion time scales. Both driving sources differ in terms of their respective molecular diffusivities. The Lewis number Le = κ_T / κ_C gives the ratio of thermal diffusivity κ_T to compositional diffusivity κ_C. For the Earth's outer core this ratio is estimated to be of O(10^3). This implies that thermal instabilities diffuse much faster than compositional ones. It is well known that the spatial and temporal behaviour of convection depends strongly on the diffusivity of the driving component (e.g., Breuer et al., 2004).

For this reason we study thermo-chemically driven core convection and dynamos in rotating spherical shells by means of numerical models. Considering different thermal to chemical forcing ratios we hope to get a better general understanding of convection and magnetic field generation in the Earth's core, with regard to the observed spatial and temporal variations. These simulations are computationally very demanding, since both high spatial grid resolutions and long time series are required. Therefore, high performance clusters are an essential working tool for such an approach and efficient parallel codes are required. In the following sections we will give a short description of the applied model and numerical methods and present some representative results of our work.

Figure 1: Illustration of potential driving sources for flows in the Earth's fluid outer core.

Mathematical Model and Numerical Methods
For modeling core convection and the Geodynamo we consider a thermally and chemically driven flow within a spherical shell, rotating with a constant angular frequency. We employ the Boussinesq approximation where density changes are only retained in the buoyancy terms. Furthermore, all material properties in the core are assumed constant. The governing magneto-hydrodynamic equations in non-dimensional form, using d as the characteristic length scale and the viscous diffusion time scale d^2/ν, where ν is the kinematic viscosity, are expressed as Eqs. 1(a)-(f). These are the Navier-Stokes equation (1a), the equation of continuity (1b) which ensures incompressibility, the heat transport equation (1c), the compositional transport equation (1d), and the magnetic induction equation (1e) with the condition of a divergence-free magnetic field (1f). P, T, C are perturbations in pressure, temperature and composition with respect to an adiabatic, well mixed reference state. u and B are the velocity and magnetic field vector, respectively; the remaining symbols are the unit vector of position and the unit vector indicating the direction of the axis of rotation.

The non-dimensional control parameters appearing in Eq. 1(a)-(f) are:
Ek: Ekman number, relative importance of viscous to Coriolis force
Ra_n^T: thermal Rayleigh number, relative strength of thermal buoyancy
Ra_n^C: compositional Rayleigh number, relative strength of compositional buoyancy
Pr_T: thermal Prandtl number, ratio of viscosity to thermal diffusivity
Pr_C: compositional Prandtl number, ratio of viscosity to chemical diffusivity
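The equation system Eq. 1(a)-(f) itself did not survive the page layout. For orientation, a standard non-dimensional Boussinesq formulation consistent with the description above is sketched below. This is a sketch only: the placement of the prefactors, the radial gravity profile in the buoyancy terms, and the appearance of a magnetic Prandtl number Pm (which is not in the printed parameter list) depend on the chosen scaling and may differ from the original; the unit vectors \hat{\mathbf r} (position) and \hat{\mathbf z} (rotation axis) are assumed notation.

\[
\begin{aligned}
Ek\Big(\frac{\partial\mathbf u}{\partial t}+(\mathbf u\cdot\nabla)\mathbf u-\nabla^{2}\mathbf u\Big)+2\,\hat{\mathbf z}\times\mathbf u+\nabla P &= Ra_n^{T}\,T\,\hat{\mathbf r}+Ra_n^{C}\,C\,\hat{\mathbf r}+\frac{1}{Pm}\,(\nabla\times\mathbf B)\times\mathbf B, &&\text{(1a)}\\
\nabla\cdot\mathbf u &= 0, &&\text{(1b)}\\
\frac{\partial T}{\partial t}+(\mathbf u\cdot\nabla)T &= \frac{1}{Pr_T}\,\nabla^{2}T, &&\text{(1c)}\\
\frac{\partial C}{\partial t}+(\mathbf u\cdot\nabla)C &= \frac{1}{Pr_C}\,\nabla^{2}C, &&\text{(1d)}\\
\frac{\partial\mathbf B}{\partial t} &= \nabla\times(\mathbf u\times\mathbf B)+\frac{1}{Pm}\,\nabla^{2}\mathbf B, &&\text{(1e)}\\
\nabla\cdot\mathbf B &= 0. &&\text{(1f)}
\end{aligned}
\]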

In the present analysis we assume that both temperature and composition provide de-stabilizing buoyancy. Temperature and composition are fixed at Ti = 1 and Ci = 1 at the Inner Core Boundary (ICB) and at To = 0 and Co = 0 at the Core Mantle Boundary (CMB). The difference in temperature (composition) between the two boundaries defines the thermal (compositional) scale. For the compositional component the applied boundary condition at the CMB is unrealistic: it allows light material to pass from the outer core into the mantle. However, as a starting point we wanted to investigate the influence of different diffusivities isolated from distinct boundary conditions. Regarding the model parameters, the most critical one turns out to be the Ekman number Ek (a measure for the relative importance of viscous to Coriolis force in the momentum equation Eq. 1(a)). Motions in the Earth's fluid outer core are assumed to be strongly dominated by the Coriolis force, expressed by a very low Ekman number of Ek < 10^-15. Theoretical analyses predict that the smallest scales in the convective flow pattern decrease strongly with decreasing Ek. For this reason global Geodynamo models are so far restricted to Ek > O(10^-7).

In order to solve the system of partial differential equations Eq. 1(a)-(f) numerically we apply two methodically distinct numerical codes, depending on the required grid resolution and computer platform. The first one is based on a finite volume (FV) discretization with an implicit time-stepping scheme and efficient spatial domain decomposition for parallel computations using the Message Passing Interface (MPI) (Harder & Hansen, 2005). This sophisticated method has been developed over years at the Institute for Geophysics in Münster with the intention to reach more Earth-like parameter regimes, i.e., smaller Ekman numbers, by means of High Performance Computations on massively parallel computing architectures (>1,000 nodes) like Blue Gene at Jülich Supercomputing Centre (JSC). The second code is based on a pseudo-spectral method with a mixed implicit-explicit time stepping scheme (Dormy et al., 1998). Pseudo-spectral means spectral discretization in longitude and latitude and finite differences in radial direction. Parallelization is implemented via decomposition in radial direction using MPI. These methods are commonly used for Geodynamo simulations since they are very efficient on shared memory architectures with fast processors. Therefore, we use this method for extensive parameter studies in more "moderate" parameter regimes.

Parameter Study of Thermo-chemically Driven Convection and Dynamos in a Rotating Spherical Shell at Moderate Ekman Numbers
To investigate the influence of the driving mechanism on the convective flow pattern, we considered different scenarios including the two extreme cases of purely thermally and purely compositionally driven convection and the more likely situations of a joint action of both sources. Since we focus mainly on the influence of the driving mechanism we keep the Ekman number constant at a moderate value of Ek = 10^-4 for simplicity. The thermal and compositional Prandtl numbers are also fixed, with Pr_T = 0.3 and Pr_C = 3.0, respectively. To parameterize different driving scenarios we define a total Rayleigh number Ra_n^tot combining the thermal and compositional contributions (summarized below), where δ = Ra_n^T / Ra_n^tot denotes the fraction of thermal contribution to the total Rayleigh number. We have considered different values of δ including the two end members of purely thermal and purely compositional convection, which merely differ in the Prandtl numbers. For the results presented in this section, the value of Ra_n^tot is kept constant at a value of 4.8 x 10^6. One has to keep in mind that this fixed total Rayleigh number implies unequal super-criticality for the different scenarios defined by δ. This is due to the different diffusivities of the thermal and compositional component. For the purely thermally driven case this leads to a super-criticality of < 3 x Ra_crit, and of < 16 x Ra_crit for the purely compositionally driven case.

Figure 2 illustrates the difference between the spatial structure of the temperature and the compositional field for the predominantly compositionally driven case δ = 20%. The temperature field is much more diffusive compared to the compositional field, where coherent small-scale up-wellings develop, mainly in the polar and equatorial regions. This is due to the difference in their diffusivities expressed by the different Prandtl numbers Pr_T and Pr_C. As a result the overall flow behaviour depends strongly on the individual strength of thermal and chemical buoyancy in the momentum equation Eq. 1(a). This context is exemplarily illustrated in Figure 3 for the two cases δ = 20%, 80% by means of the spatial distribution of the azimuthally and temporally averaged helicity h and differential rotation (defined below), both being characteristic properties of convection in rotating fluids. These structures are mainly aligned parallel to the axis of rotation, reflecting the relative importance of the Coriolis force for the flow dynamics. Helical flows and an azimuthal mean flow component varying with distance from the axis of rotation are thought to be essential for the dynamo mechanism in planetary cores.

Figure 2: Snapshot of iso-surfaces of the temperature field (left) and composition (right) for the predominantly chemically driven case δ = 20%

Figure 3: Spatial distribution of azimuthally and temporally averaged differential rotation and helicity for the predominantly chemically driven case δ = 20% (left) and the predominantly thermally driven case δ = 80% (right).
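Since the displayed definition of the total Rayleigh number and the symbols for helicity and differential rotation did not survive extraction, the quantities used in this parameter study can be summarized as follows. The additive split of Ra_n^tot and the letter δ for the thermal fraction are reconstructions consistent with the surrounding text rather than a verbatim copy of the original:

\[
Ra_n^{tot} = Ra_n^{T} + Ra_n^{C}, \qquad
\delta = \frac{Ra_n^{T}}{Ra_n^{tot}}, \qquad
h = \mathbf{u}\cdot(\nabla\times\mathbf{u}), \qquad
\bar{u}_\varphi = \langle u_\varphi \rangle_{\varphi,\,t}.
\]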

In this respect the most apparent difference between the two cases δ = 20%, 80% is the strong contribution of differential rotation and helicity in the polar regions in the predominantly compositionally driven case. In the case of dominant thermal forcing these flow structures evolve mainly in small bands (Busse-Taylor columns) outside the tangential cylinder (tangent to the inner core). The importance of these structures for the dynamo mechanism is illustrated in Figure 4, showing snapshots of dominant magnetic field lines for the two distinct driving scenarios at arbitrary times. Here, the width of the field lines reflects the relative local strength of the magnetic field. In the predominantly compositionally driven case the magnetic field is mainly generated in the polar regions, forced by strong compositional up-wellings in these regions (cf. Figure 2). By contrast, in the mainly thermally forced situation the dynamo process acts primarily in the bulk region outside the tangential cylinder. These two distinct driving scenarios demonstrate quite plainly the strong importance of the considered driving scenario on the flow structure and the dynamo process.

Figure 4: Snapshots of internal magnetic field lines for the predominantly compositionally driven case δ = 20% and for the predominantly thermally driven case δ = 80% (figures created with DMFI (Aubert et al., 2008))

High Performance Computing Application of Low Ekman Number Dynamo Simulation
Since the liquid outer core is characterized by fast rotation and low viscosity, its dynamical behaviour is dominated by an extremely low Ekman number. This puts severe demands on numerical simulations of core flows due to the evolving small spatial scales in the solution. This is demonstrated in Figure 5, which displays temperature and the radial magnetic field component below the core mantle boundary for a low Ekman number of Ek = 2 x 10^-6 in a convective flow driven by basal heating. This particular simulation was calculated within the framework of the DEISA (Distributed European Infrastructure for Supercomputing Applications) project 3D-Earth utilizing massively parallel computing (1,920 processes). With such computing power it was possible to calculate a time series of nearly 10^5 timesteps with a spatial resolution of < 5 x 10^7 volume cells. This high resolution is needed in the low Ekman number regime to resolve the dominant scales of the solution. As displayed in Figure 5 (top), the temperature field and thus buoyancy is extremely fine-scaled in the equatorial region and drives nearly a hundred Busse-Taylor columns. However, the strong driving also creates strong upflows in the tangential cylinder. This vigorous convection inside the tangential cylinder has a strong impact on the generated magnetic field, which is no longer dominated by the dipole component but has strong higher multipole components (Figure 5, bottom). Dipole dominance is restricted to lower forcing, i.e., lower Rayleigh number, where buoyancy is insufficient to drive vigorous convection inside the tangential cylinder by breaking the Proudman-Taylor constraint. This particular result suggests that the observed different flow fields for thermal and compositional convection will have a strong influence on the generated magnetic field. A more detailed discussion of obtained results on thermally driven dynamos is given by Wicht et al. (2009).

Figure 5: Example of a High Performance Computing dynamo simulation with < 5 x 10^7 volume cells and 10^5 timesteps. Shown are snapshots of temperature T (top) and radial magnetic field component Br (bottom) near the core mantle boundary (CMB) at low Ekman number Ek = 2 x 10^-6.

Outlook
The main ambition for the next years will be to push the dynamo models continuously towards more realistic parameter regimes. In this respect, we have to answer the question whether the observed distinct flow behaviour dependent on the driving scenario will even hold in a regime with strongly increased weight of the Coriolis force. This is a challenging task and is basically constrained by the computational resources available and their future technical improvements. Therefore, modern High Performance Computer Centres with innovative and substantial technical support are essential for further progress in understanding the secrets of the Geodynamo.

• Martin Breuer
• Tobias Trümper
• Helmut Harder
• Ulrich Hansen

Institute for Geophysics, University of Münster

References
[1] Aubert, J., Aurnou, J., Wicht, J. "The Magnetic Structure of Convection-driven Numerical Dynamos", Geophysical Journal International, 172, pp. 945-956, 2008
[2] Breuer, M., Wessling, S., Schmalzl, J., Hansen, U. "Effect of Inertia in Rayleigh-Bénard Convection", Physical Review E, 69, 026302/1-10, 2004
[3] Dormy, E., Cardin, P., Jault, D. "MHD Flow in a Slightly Differentially Rotating Spherical Shell with Conducting Inner Core in a Dipolar Magnetic Field", Earth and Planetary Science Letters, 160, pp. 15-30, 1998
[4] Harder, H., Hansen, U. "A Finite Volume Solution Method for Thermal Convection and Dynamo Problems in Spherical Shells", Geophysical Journal International, 161, pp. 522-532, 2005
[5] Jones, C. A. "Thermal and Compositional Convection in the Outer Core", Vol. 8 of Treatise on Geophysics, chap. 5, pp. 131-185, Elsevier Science, 2007
[6] Wicht, J., Stellmach, S., Harder, H. "Geomagnetic Field Variations", chap. Numerical Models of the Geodynamo: from Fundamental Cartesian Models to 3D Simulations of Field Reversals (Eds: K.H. Glaßmeier, H. Soffel and J. Negendank), Advances in Geophysical and Environmental Mechanics and Mathematics, pp. 107-158, 2009

Golden Spike Award by the HLRS Steering Committee in 2009

the exact importance of jets is still un- assuming axisymmetry. These compu- The Maturing of Giant Galaxies known. This becomes obvious in two tations are extremely demanding if re- by Black Hole Activity opposing processes that are invoked alistic parameters are used: although in the current literature: positive and the jet plasma moves with almost the negative feedback. speed of light, its density is much lower than the ambient density of the extra- Extragalactic jets are amongst the most huge amount of energy clearly has a Think Positive – or Negative? galactic gas (in our simulations typically spectacular phenomena in astrophysics: considerable impact on the energy Dense and very cool clouds of gas are 1,000 times smaller), which results in these dilute but highly energetic beams budget of the respective galaxy and its the birthplaces of stars since only then a much slower propagation of the jet. of plasma are formed in the environs of environment. Since extragalactic jets can gravity surmount thermal pressure Also, a considerable range of spatial active black holes, moving with speeds are most frequent in the most mas- and cause the clouds to collapse, ulti- scales has to be covered. While the very near the speed of light. They run sive galaxies (observed in roughly every mately forming new stars as nuclear fu- jets extend over more than 100 kpc into the hot gas surrounding the galaxy, third among them), they have become sion sets in. If clouds are hit by a jet or (1 kpc = 3 x 1016 km), the jet beams where they are eventually decelerated a common explanation for cosmo- its preceding bow shock, they are com- are 100 times narrower and have to and heat up the gas considerably, dig- logical simulations failing to reproduce pressed by the impact and can become be resolved in detail. All this results in ging large cavities (the so-called jet co- giant galaxies correctly, despite their gravitationally unstable. This would simulations with more than 6 million coon) into the extragalactic gas. otherwise great success in modeling result in an increased rate of star cells and more than one million time Applications the evolution of our universe and its formation (positive feedback). On the steps necessary. The very large num- Applications Both the jets and the cavities are ob- galaxies. While astronomical observa- other hand, the same impact could as ber of time steps makes simulations servable today with radio and X-ray tions show that giant elliptical galaxies well disrupt the cold clouds and thereby time-consuming but does not allow a telescopes on earth or in space. These consist of old and red stars, showing destroy the seeds necessary for the massively parallel approach since the observations have revealed that the almost no signs of current formation of formation of new stars (negative feed- “problem per processor” would become power of these jets is even greater new stars, these galaxies still grow in back). Additionally, the jet would heat too small then and communication than was thought before: more than cosmological simulations and therefore up the hot gas surrounding the galaxy costs between processors would be- one trillion times the total power of our are much bluer, actively forming stars to higher temperatures and prevent it come unwieldy. These simulations were sun (1039 watts) for the most power- and have considerably greater masses. from cooling down, getting compressed ful ones – with activity durations of This has caused an increased interest and forming new cold clouds. 
Both pro- some ten million years. This power in jet physics and in research projects cesses have been demonstrated to be ultimately originates from the active focusing on the interaction of jets with possible, and only a detailed study with supermassive black hole of the galaxy, their host galaxy and its environment, a realistic underlying model can help us since the jet taps the enormous ro- now referred to as “jet feedback”. Yet, reveal their real importance. tational energy of the black hole. This the detailed processes involved and Observations of distant radio galaxies, showing them as they were 10 billion years ago, give additional hints on the interaction between the jet and the disk. There, extended emission-line nebulae aligned with the jet axis are observed and it is conjectured that these nebulae are actually created by the interaction of the jet with a galaxy's gaseous disk.

Computational Challenges
We have conducted a series of magnetohydrodynamic jet simulations to examine the interaction of jets with their environment at very high resolution

Figure 1: Velocity field of a jet (density contrast 0.001) after 30 million years. The jet beam (in red) reaches out to the jet head and then turns back, inflating a turbulent cocoon (mostly green).

Figure 2: 3D simulation showing the interaction of a jet and a galactic gaseous disk (volume rendering of density)

32 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 33 therefore only possible on a powerful The Mist of Distant Galaxies in contrast to a uniformly resolved gas clouds embedded in the ambient supercomputer such as the NEC SX-8 With respect to the riddle of the mesh. The current implementation medium is computed explicitly. Prelimi- at the High Performance Computing emission-line nebulae, we were able to allows a maximum dynamic range in nary results indicate that actually both Center Stuttgart (HLRS), to which we test two models for the location and length scale of 6 orders of magnitude. positive feedback by compression of have adjusted and optimized our code. kinematics of the line-emitting clouds The code is parallelized by MPI and in- clouds in the center as well as nega- Due to its vector capabilities and shared against observed properties and found cludes dynamic load-balancing between tive feedback by removal of gas along memory architecture per node, the that clouds embedded in the jet cocoon the different MPI processes. The 3D the jet axis may be acting at the same code ran very efficiently and allowed us were able to much better reproduce simulations are clearly computationally time. So far, the simulations have only to reach realistic sizes for the jets. observed velocities and morphologies very demanding, since an additional covered a time much shorter than our than clouds embedded in the shocked dimension results in a much larger previous simulations, but runs on a We found that the expansion of the ambient gas. number of cells, even if the resolution large number of processors will allow jet cocoon follows self-similar models is somewhat lower. On the other hand, us to eventually examine the action of only at early times, but deviates from About 10 percent of the total jet power a higher number of cells (in contrast to jet feedback in detail and over realisti- that significantly once the cocoon ap- was measured as kinetic energy in the time steps) generally can be handled cally long times. proaches pressure balance with its jet cocoon, which made it possible to by massive parallelization if an efficient environment. We found this to be link these findings to simulations of method is used. References consistent with observations and multi-phase turbulence that model the [1] Gaibler, V., Krause, M., Camenzind, M. “Very Light Magnetized Jets on Large Applications also achieved bow shock shapes and amounts and emission power of the Outlook Applications Scales – I. Evolution and Magnetic Fields”, strengths such as those observed cool gas phase embedded in these tur- We have successfully tested RAMSES Monthly Notices of the Royal Astronomical typically. Magnetic fields turned out to bulent regions. The expected emission on both the Nehalem cluster at the Society, 400, pp. 1782-1802, 2009 be important to stabilize the contact line luminosities, however, disagree HLRS and the HLRB-II at the Leibniz- [2] Krause, M., Gaibler, V. surface between the jet and the ambi- considerably with the observed range Rechenzentrum (LRZ), and found it to “Physics and Fate of Jet Related ent medium and were moreover found of luminosities and we concluded that be scaling almost linearly with up to Emission Line Regions”, Conference to be efficiently amplified by a shearing the models are still too simple to in- 4,000 cores, if sufficient bandwidth is contribution, at arXiv:0906.2122

mechanism in the jet head. clude all the necessary physics. This available. On the Nehalem cluster, the [3] Gaibler, V., Camenzind, M. was, admittedly, not too surprising performance was ~50 percent smaller “Numerical Models for Emission Line since the models relied on clouds pas- than expected from the single-core Nebulae in High Redshift Radio Galaxies”, Wolfgang E. Nagel, Dietmar B. Kröner, sively advected with and spread all performance since memory bandwidth • Volker Gaibler Michael M. Resch (Eds.), Springer 2010, across the cocoon plasma, while the become a bottleneck for more than 4 High Performance Computing in Science interaction of real clouds of cold gas cores per node; scaling then increased and Engineering ‘09 Max-Planck- would be considerably more complex. almost linearly for a larger number of Institut für [4] Gaibler, V. nodes. HLRB-II, however, did not suffer “Very Light Jets with Magnetic Fields”, extraterres- ­ Moving on – from this and showed excellent scaling PhD Thesis, Ruprecht-Karls-Universität trische Physik, Simulations in 3D behaviour. We conjecture that the Heidelberg, 2008 Garching To improve our model of the jet – cloud memory bandwidth bottleneck on the interaction, we had to extend our simu- Nehalem cluster may be related to de- lations to full three dimensions since tails of the MPI implementation, since a clumpy galactic or intergalactic gas MPI adjustments on a local Harper- cannot be modeled properly within axi- town Beowulf cluster were able to get symmetry. This also made it necessary around this limitation. to move to another code: RAMSES, an adaptive mesh refinement code, that in- For a small test simulation, we have cludes cooling, gravity, star formation, successfully set up a clumpy galactic cosmological evolution and magnetic gaseous disk, similar to disks actually fields. The mesh refinement is highly observed in distant galaxies. A jet is

Figure 3: The gaseous clouds in the disk (green) are compressed (blue) by the suitable for resolving small clouds in an positioned in the center of the galaxy action of the jet and the remaining gas is pushed outwards. otherwise large computational domain, and interaction of the jet with the cold

34 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 35 (radius 3.5 pc, H gas density of the distinction between the two types The Physics of Galactic Nuclei 2 104 cm-3 gas temperature of 50 K). of active galaxies. Several mecha- Figure 1 shows a cut through the mid- nisms have been proposed recently. Active Galactic Nuclei are powered by ing physical processes in these regions plane of the gas density distribution of We concentrate on one of the most supermassive black holes. Whenever are still poorly understood, the aim of our standard model (80 km/s cloud promising solutions, namely the ef- enough gas reaches the centre of oth- our project is to obtain a deep under- velocity, 3 pc offset from the x-axis) fects of infrared radiation pressure erwise normal galaxies, a fast rotating, standing of the origin and evolution of after an evolution time of 0.25 Myrs. [7,8]. We are currently approaching thin and hot gaseous accretion disc is the complex stellar, gaseous and dusty The cloud has approached the black the problem from two directions: (i) A able to form and the centre lights up to structures and their relation to the ob- hole from the right hand side along the parallel N-body like code is developed, become as bright as the stars of the served phases of nuclear activity. x-axis. After passing the black hole, ma- which additionally includes radiation whole galaxy. A surrounding ring-like, terial with opposite angular momentum pressure forces and collisions between dusty, geometrically thick gas reser- Nuclear Disc Formation collides, the angular momentum is effi- dusty gas clouds. This will enable us voir (the torus) provides mass supply in Galactic Nuclei ciently redistributed and a rotating spiral for the first time to assess the dy- and anisotropically blocks the light. Recent observations of our Galactic structure builds up, whereas a part of namical state of clumpy torus models, This gives rise to two characteristic Centre reveal a warped clockwise rotat- the gas is transported through the inner which are vitally discussed in literature observational signatures, depending ing stellar disc with a mean eccentricity boundary. The amount of accreted gas nowadays, mostly treated with radia- whether the line of sight is obscured by of 0.36 ± 0.06 and a second inclined is mainly determined by the cloud’s im- tive transfer codes and show excellent Applications the torus (edge-on view) or not (face-on counter-clockwise rotating disc extend- pact parameter relative to its radius. By agreement with observed properties Applications view). These nuclear regions of nearby ing from 0.04 pc to 0.5 pc (e.g. [1] that time, the inner gas disc has already of tori in nearby active galaxies. (ii) Tori active galaxies, as well as our own Ga- and references therein), made up of relaxed to an approximate equilibrium are actually thought to be made up of a lactic Centre have been observed in roughly 100 young and massive stars state. The instability of such pressure multi-phase medium including gas and great detail with the most up-to-date with a total mass of roughly 104-5 solar stabilized discs can be assessed with dust at various temperatures. To be instruments at the largest available masses [2,3]. Due to the hostility of the the help of the so-called Toomre param- able to treat this mixture correctly, we telescopes and interferometers, yielding environment, they cannot be explained eter. 
Figure 2 shows those regions of are currently implementing a frequency unprecedented resolution. As the occur- by normal means of star formation, as the disc, which are unstable to gravita- resolved radiative transfer module tidal forces would disrupt typical molecu- tional forces according to this Toomre lar clouds in the vicinity of the central criterion and will form stars. Defining black hole. We present the first numeri- the stellar disc in this way enables us to cal realization of a scenario proposed directly compare our results to the ob- by [4], where an infalling cloud covers served properties of the stellar system. the black hole in parts during the pas- Thereby we find that the initial cloud sage and forms a self-gravitating, ec- velocity determines the resulting stellar centric accretion disc, before it starts mass of the disc (the higher the veloc- fragmenting. For our simulations we use ity, the lower the disc mass). Our stan- the N-body Smoothed Particle Hydro­ dard model shows a disc eccentricity of dynamics (SPH) Code GADGET2 [5], order 0.4, an outer radius of 1.35 pc which shows very good scaling on the and a disc mass of 0.7-6.95 ×104 solar Altix 4700 supercomputer at the LRZ. masses, in good agreement with the ob- We gain roughly a factor of 1.8 in speed servations mentioned above. For details when doubling the number of CPUs. of this project we refer to [6]. Most of our simulations were done using 128 CPUs, with typically 10,000 Radiation Pressure Driven to 20,000 CPU hours per run. Dust Tori One of the main questions of current Our simulations start with a spherical AGN torus research is the stabiliz-

Figure 1: Density (colour coded in solar masses per pc3) and velocity (overlayed and homogeneous molecular cloud with ing mechanism of their vertical scale Figure 2: Gravitational unstable parts of the inner gas disc after 250,000 years, arrows) distribution of the cloud interacting with the central supermassive black parameters adapted from observations height against gravity, as needed for which will later form stars hole after 250,000 years

36 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 37 a turbulent and very dense disc builds up on the parsec scale (see Figure 4). PLUTO [9] is a multiphysics, multial- gorithm modular code which was es- pecially designed for the treatment of supersonic, compressible astrophysical flows in the presence of strong discon- tinuities in a conservative form, relying on a modern Godunov-type High-Reso- lution Shock-Capturing (HRSC) scheme. PLUTO's parallel performance on the SGI Altix 4700 supercomputer at the LRZ shows almost ideal scaling.

In order to be able to treat viscosity effects and star formation within the disc correctly, we complement our Applications 3D PLUTO simulations with an effec- Applications tive disc model, where we calculate the evolution of the surface density Figure 3: Gas density distribution (first two panels) and temperature distribution within a meridional distribution. At the current age of NGC Figure 4: Central gas density distribution in an equatorial slice slice through our 3D torus simulation of the Seyfert Galaxy NGC 1068. 1068's nuclear starburst of 250 Myr, through our 3D torus simulation. our simulations yield disc sizes of the into the Magneto-Hydrodynamics code Feeding Supermassive order of 0.8 to 0.9 pc, gas masses of [2] Nayakshin, S., Cuadra, J. NIRVANA. Radiation heats dust grains Black Holes with 106 solar masses and mass transfer "A Self-gravitating Accretion Disk in Sgr A* a Few Million Years Ago: via absorption and accelerates them Nuclear Star Clusters rates of 0.025 solar masses per year Is Sgr A* a Failed Quasar?", A&A 437, with the help of radiation pressure. We Recently, high resolution observa- through the inner rim of the disc. This pp. 437-445, 2005 propagate forward and backward six tions with the help of the near-infrared is in quite good agreement with the • Marc Schartmann radiation fields (one for each direction) adaptive optics integral field spec- disc size as inferred from interfero- [3] Nayakshin, S., et al. • Christian Alig "Weighing the Young Stellar Discs around and relate the thermodynamic quanti- trograph SINFONI at the VLT proved metric observations in the mid-infrared • Martin Krause Sgr A*", MNRAS 366, pp. 1410-1414, 2006 ties by equations of state for the gas the existence of massive and young and compares well to the extent and • Andreas Burkert and dust separately. As the light propa- nuclear star clusters in the centres of mass of a rotating disc structure as [4] Wardle, M., Yusef-Zadeh, F. gation timestep is often much smaller nearby Seyfert galaxies. With the help inferred from water maser observa- "On the Formation of Compact Stellar Max-Planck- Disks around Sagittarius A*", ApJL, 683, than the gas cooling and hydrodynam- of three-dimensional high resolution tions. Several other observational Institute for pp. 37-40, 2008 ics timesteps, the code can subcycle hydrodynamical simulations with the constraints are in good agreement as Extraterrestrial the radiative transfer within the same PLUTO code, we follow the evolution of well. On basis of these comparisons, [5] Springel, V. Physics hydrodynamics timestep. Test runs at the cluster of the Seyfert 2 galaxy NGC we conclude that the proposed sce- “The Cosmological Simulation Code GAD- GET-2”, MNRAS 364, pp. 1105-1134, 2005 LRZ showed that the code scales well 1068. The gas ejection of its stars pro- nario seems to be a reasonable model

down to a domain size of 16 cells and vide both, material for the obscuration and shows that nuclear star formation [6] Alig, Ch., et al. conserves the total energy (radiation, within the so-called unified scheme of activity and subsequent AGN activity MNRAS, 2010, submitted (arXiv:0908.1100) kinetic and thermal) very well. Using AGN and a reservoir to fuel the central, are intimately related. For details we 16 processors at LRZ for two days, active region and it additionally drives refer to [10]. [7] Pier, E. A., Krolik, J. H. “Radiation-pressure-supported Obscuring we are able to compute a periodic box turbulence in the interstellar medium Tori around Active Galactic Nuclei”, 399, 2 of 60 × 64 cells for almost 15,000 on tens of parsec scales. This leads to References pp. 23-26, 1992 timesteps. The code will then be ap- a vertically wide distributed clumpy or [1] Bartko, H., et al. “Evidence for Warped Disks of Young plied to global gas and dust tori, irradi- filamentary inflow of gas on tens of par- [8] Krolik, J. H. Stars in the Galactic Center”, ApJ 697, “AGN Obscuring Tori Supported by Infrared ated by a central accretion disk. sec scale (see Figure 3 a,b,c), whereas pp. 1741-1763, 2009 Radiation Pressure”, 661, pp. 52-59, 2007

38 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 39 the lowest potential energy [4], and the The UNRES force field has been derived Global Parameter Optimization prediction of the folding pathways can as a Restricted Free Energy (RFE) func- of a Physics-based United Residue be formulated as a search for the fam- tion of an all-atom polypeptide chain ily of minimum-action pathways leading plus the surrounding solvent, where Force Field (UNRES) for Proteins to this basin from the unfolded (dena- the all-atom energy function is aver- turated) state. In neither procedure do aged over the degrees of freedom we want to make use of ancillary data that are lost when passing from the One of the major and still unsolved protein structure prediction, methods from protein structural databases. all-atom to the simplified system (i.e., problems of computational biology is that implement direct information from the degrees of freedom of the solvent, to understand how interatomic forces structural data bases (e.g., homology The complexity of a system composed the dihedral angles x for rotation about determine how proteins fold into three- modeling and threading) are, to date, of a protein and the surrounding sol- the bonds in the side chains, and the dimensional structures. The practical more successful compared to physics- vent generally prevents us from seek- torsional angles l¸ for rotation of the aspect of the research on this problem based methods [2]; however only the ing the solution to the protein-structure peptide groups about the Ca · · ·Ca is to design a reliable algorithm for latter will enable us to understand the and folding pathway prediction problem virtual bonds) [7]. The RFE is further the prediction of the three-dimensional folding and structure-formation pro- at atomistic detail [1]. Therefore, great decomposed into factors coming from structure of a protein from amino- cess and protein dynamics. attention has been paid in recent years interactions within and between a given acid sequence, which is of utmost to develop coarse-grained models of number of united interaction sites. The Applications importance because the experimental The underlying principle of physics- polypeptide chains and other biomolec- energy of the virtual-bond chain is ex- Applications methods of the determination of pro- based methods of protein-structure ular systems [5]. As part of this effort, pressed by eq. (1). tein structures cover only about 10 % prediction and protein-folding simula- of new protein sequences. Even more tions is the thermodynamic hypothesis challenging is simulation of the dynam- formulated by Anfinsen [3] according ics of protein folding and unfolding and to which the ensemble called the “na- of the processes for which the dynam- tive structure” of a protein constitutes ics of the proteins involved is critical the basin with the lowest free energy (1) (e.g., enzymatic reactions, signal trans- under given conditions. Thus, energy- duction), because these processes based protein structure prediction is occur within millisecond or second formulated in terms of a search for the time scales, while the integration time basin with the lowest free energy; in a step in MD simulations is of the order simpler approach the task was defined of femtoseconds [1]. In the case of as searching the conformation with we have been developing a physics- The term represents the based united-residue force field, here- mean free energy of the hydrophobic after referred to as UNRES [6,7]. 
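The right-hand side of eq. (1) did not survive the page layout. As published by Liwo et al. [6,7], the UNRES effective energy has the general structure sketched below; the selection of terms shown follows those described in the text (side-chain, side-chain/peptide-group, peptide-group electrostatic and van der Waals, correlation, turn and virtual-bond terms, weighted by the w's and the temperature factors f_n(T)), while the exact functional forms and the full list of terms, including the local torsional, bending and rotamer contributions, should be taken from refs. [6,7]:

\[
U \;=\; w_{\mathrm{SCSC}}\sum_{i<j}U_{\mathrm{SC}_i\mathrm{SC}_j}
      + w_{\mathrm{SCp}}\sum_{i\neq j}U_{\mathrm{SC}_i p_j}
      + w_{\mathrm{el}}\,f_2(T)\sum_{i<j-1}U^{\mathrm{el}}_{p_i p_j}
      + w^{\mathrm{VDW}}_{\mathrm{pp}}\sum_{i<j-1}U^{\mathrm{VDW}}_{p_i p_j}
      + w_{\mathrm{bond}}\sum_{i=1}^{n_{\mathrm{bond}}}U_{\mathrm{bond}}(d_i)
      + \sum_{m}w^{(m)}_{\mathrm{corr}}\,f_m(T)\,U^{(m)}_{\mathrm{corr}}
      + \sum_{m}w^{(m)}_{\mathrm{turn}}\,f_m(T)\,U^{(m)}_{\mathrm{turn}}
      + \dots
\]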
In (hydrophilic) interactions between the the UNRES model [7] a polypeptide side chains, which implicitly contains chain is represented by a sequence the contributions from the interac- of a-carbon (Ca) atoms linked by vir- tions of the side chain with the sol- tual bonds with attached united side vent. The term denotes the chains (SC) and united peptide groups excluded-volume potential of the side- (p). Each united peptide group is lo- chain – peptide-group interactions. The cated in the middle between two con- peptide-group interaction potential is secutive a-carbons. Only these united split into two parts: the Lennard-Jones peptide groups and the united side interaction energy between peptide- chains serve as interaction sites, the group centers and the aver- (a) (b) a-carbons serving only to define the age electrostatic energy between chain geometry. The correspondence peptide-group dipoles ; the sec- Figure 1: Illustration of the correspondence between the all-atom polypeptide chain in water (a) and its UNRES representation (b). The side chains in part (b) are represented by ellipsoids of revolution and the peptide groups are represented by small spheres in between the all-atom and UNRES rep- ond of these terms accounts for the the middle between consecutive a-carbon atoms. The solvent is implicit in the UNRES model. resentations is illustrated in Figure 1. tendency to form backbone hydrogen

40 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 41 bonds between peptide groups and (2) global and not local optimization (as 1. Run the multidimensional MREMD . The terms represent correla- performed so far [6]) of parameter simulations on the training proteins tion or multibody contributions from the where being the ab- space is needed for this purpose. Our for all sets of energy parameters coupling between backbone-local and solute temperature corresponding to recent preliminary results [9] of ran- in the initial population and evaluate backbone-electrostatic interactions, the th trajectory, and denotes the dom scanning the parameter space the target function which makes use and the terms are correlation variables of the UNRES conformation and selecting the best parameter set of the hierarchical structure of the contributions involving consecutive of the th trajectory at the attempted by means of MREMD simulations of two protein energy landscape and se- peptide groups; they are, therefore, exchange point. If and are peptides: 1L2Y and 1LE1, which have lected experimental observables. termed turn contributions. The multi- exchanged, otherwise the exchange is a well-defined a-helical and b-hairpin body terms are indispensable for repro- performed with probability . structure, respectively, produced a 2. Select a subset of energy parameters duction of regular a-helical and b-sheet The multiplexing variant of the REMD force field with only about 5 - 6 Å reso- to generate a new trial population by structures. The terms , where method (MREMD) differs from the lution, but very good transferability. applying standard genetic operators: is the length of th virtual bond and REMD method in that several trajec- Therefore, in the present project, we mutations and crossovers on vectors is the number of virtual bonds, tories are run at a given temperature. aimed at extending this approach to representing energy parameters. are simple harmonic potentials of virtual- Each set of trajectories run at a differ- include a genetic algorithm, larger pro- bond distortion; it has been introduced ent temperature constitutes a layer. teins, and experimental data of protein 3. Run the multidimensional MREMD recently for molecular-dynamics imple- Exchanges are attempted not only folding which are being carried out in simulations on training proteins Applications mentation. The factors account within a single layer but also between our laboratory. after prescreening of trial population Applications for the dependence of the respective layers. In this project we enhanced the of energy parameters on two pep- free-energy terms on temperature [6]. MREMD by implementing multidimen- The goal of our current project is to op- tides with a minimal a-helical and a sional extension, in which pairs of repli- timize the UNRES force field by global minimal b-hairpin fold. The s are the weights of the energy cas run at different temperatures and/ search of the parameter space to re- terms, and they can be determined or different sets of energy-term weights produce structural and thermodynamic 4. Update the current population of (together with the parameters within are exchanged in the multidimensional properties of proteins as functions of energy parameters with best sets each cumulant term) only by optimiza- extension, based on the Metro­polis cri- temperature. The force field will be from trial population. 
If a new set tion of the potential-energy function, terion using the following : applicable to MD simulations of large has a lower score function which is the subject of our present proteins at millisecond time scale in than a previous set and the work. To search the conformational multiple-trajectory mode and to gener- Cartesian distance between these (3) space with UNRES and to study the ating canonical ensembles of proteins. sets is lower than a pre-set cut-off dynamics and thermodynamics of pro- These features will enable us to study value replaces , if the tein folding, we implemented Langevin where and denote different protein folding in both single-molecule distance is greater it is added and dynamics in UNRES [7]. To sample the parameters of the potential energy, and ensemble (kinetics) mode, chap- the “worst” previous set removed. conformational space more efficiently and denotes the variables of the erone interaction with proteins and is decreased as optimization than by canonical MD, we extended [8] UNRES conformation of the replica such biologically important processes progresses. the UNRES/MD approach with the rep- simulated at temperature using and systems as signal transduction, lica exchange and multiplexing replica- potential energy parameters other enzyme action, molecular motors, and 5. Iterate points 2. – 4. exchange molecular dynamics method symbols being the same as in eq. 2. amyloidogenesis, as well as to pursue (MREMD). In the REMD method, ca- This modification was necessary to physics-based prediction of protein The target function mentioned in points nonical MD simulations are carried out carry out runs with multiple sets of en- structures, and the prediction of the 1. and 4. is essentially the sum of the simultaneously, each one at a differ- ergy parameters in global optimization effect of mutations on thermodynamic fourth powers of the deviations of the ent temperature. After every of the parameters of the UNRES force and kinetic stability. points of the calculated and experimen- steps, an exchange of temperatures field. In order to obtain a force field tal free-energy differences between between neighbouring trajectories with folding properties, we developed a The procedure of global optimization of the folded, unfolded, and partially folded is attempted, the decision novel method of force-field optimization the UNRES energy-function parameters forms of the training protein(s). The about the exchange being made based [6,7] which makes use of the hierar- developed within our project consists of reader is referred to our earlier work on the Metropolis criterion, which is chical structure of the protein energy the following 5 steps: for details [6]. To test the genetic algo- expressed by eq 2. landscape. However, we realized that rithm we used the 1LE1 protein, which

Figure 2: The most probable (left, blue; rmsd=2.6 Å) and the lowest-rmsd (right, red; rmsd=1.6 Å) conformation of 1LE1 corresponding to the best initial set of energy-term weights and the experimental (center, green) conformation of this peptide. In the UNRES structures the side chains are represented by virtual Cα ··· SC bonds, while they are represented at atomistic detail in the experimental structure.

Figure 4: The most probable (left, blue; rmsd=2.0 Å) and the lowest-rmsd (right, red; rmsd=1.2 Å) conformation of 1LE1 corresponding to the best final (after 30 cycles) set of energy-term weights and the experimental (center, green) conformation of this peptide. In the UNRES structures the side chains are represented by virtual Cα ··· SC bonds, while they are represented at atomistic detail in the experimental structure.
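The symbols in eqs. (2) and (3), which govern the replica-exchange decisions described in the preceding section, were lost in extraction. In the standard formulation that the text describes, the exchange test between replicas i and j takes the following form (a sketch; sign conventions and the exact notation of the original may differ):

\[
\Delta_{ij} = \left(\beta_i - \beta_j\right)\left[E(\mathbf{q}_j) - E(\mathbf{q}_i)\right],\qquad \beta = \frac{1}{k_B T},
\]

with the exchange accepted unconditionally if \(\Delta_{ij}\le 0\) and with probability \(\exp(-\Delta_{ij})\) otherwise (eq. 2). When the two replicas also run with different energy-parameter sets \(p_i\) and \(p_j\), as in the multidimensional MREMD extension, the analogous quantity is

\[
\Delta_{ij} = \beta_i\left[U_{p_i}(\mathbf{q}_j) - U_{p_i}(\mathbf{q}_i)\right] + \beta_j\left[U_{p_j}(\mathbf{q}_i) - U_{p_j}(\mathbf{q}_j)\right],
\]

used with the same Metropolis acceptance rule (eq. 3).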

forms a single 12-residue b-hairpin formations corresponding to the best References Applications [10]. The initial population consisted set of energy-term weights are shown [1] Scheraga, H. A., Khalili, M., Liwo, A. [7] Liwo, A., Czaplewski, C., Scheraga, H. A. Applications “Protein-folding Dynamics: Overview Rojas, A. V., Murarka, R. K., Ołdziej, S., of 100 randomly-generated sets of in Figure 4. It can be seen that not a of Molecular Simulation Techniques”, Makowski, M., Kazmierkiewicz, R., energy-term weights. The most prob- only the C -rmsd dropped but also the Annual Review of Physical Chemistry, 58, “Simulation of Protein Structure and Dynam­ able and the lowest-Ca-rmsd conforma- strands now have correct orientation pp. 57-83, 2007 ics with the Coarse-grained UNRES Force tion, together with the experimental with respect to the direction of the hair- Field”, In Coarse-Graining of Condensed [2] Rohl, C. A., Strauss, C. E. M., Phase and Biomolecular Systems, 1st edi- 1LE1 structure are shown in Figure 2. pin. Using the developed methodology, Misura, K. M. S., Baker, D. tion; Voth, G., Ed, CRC Press, Taylor & Fran- It can be seen that the most probable we are now optimizing the UNRES force “Protein Structure Prediction Using cis; Boca Raton, FL, pp. 107–122, 2008 conformation has the b-hairpin shape; field with bigger training proteins, which ROSETTA”, Methods in Enzymology, 383, with the b-turn at Asn6-Gly7; however, have more complicated folds. While p. 66, 2004 [8] Czaplewski, C., Kalinowski, S., Liwo, A., Scheraga, H. A. the strands are flipped with respect the reported test required only modest • Dawid Jagiela1 Figure 3: Evolution of [3] Anfinsen, C. B. “Application of Multiplexing Replica Exchange 1,2 fitness score with the to their native positions. The evolution computational resources (20 trajecto- “Principles that Govern the Folding of Protein Molecular Dynamics Method to the UNRES • Cezary Czaplewski number of the cycle for of the fitness-score function with cycle ries and 100 energy-term-weight sets, Chains”, Science, 181, pp. 223-230, 1973 Force Field: Tests with a and a + b Proteins”, • Adam Liwo1,2 the optimization of the number is shown in Figure 3. It can be which makes a total of 2,000 cores), Journal of Chemical Theory and Computa- • Stanislaw Ołdziej2,3 population of 100 sets [4] Scheraga, H. A., Liwo, A., Ołdziej, S., tion, 5, pp. 627–640, 2009 seen that the score dropped by more the actual optimization typically implies • Harold A. Scheraga2 of weights with the Czaplewski, C., Pillardy, J., Ripoll, D. R., 1LE1 peptide. than 2 orders of magnitude. The most much larger jobs. Each energy energy Vila, J. A., Kazmierkiewicz, R., [9] He, Y., Xiao, Y., Liwo, A., Scheraga, H. A. probable and the lowest-Ca-rmsd con- and force calculation in a MREMD run Saunders, J. A., Arnautova, Y. A., “Exploring the Parameter Space of the 1 Faculty of is fine-grained to 8-16 cores and the Jagielska, A., Chinchio, M., Nanias, M. Coarse-grained UNRES Force Field by Chemistry, “The Protein Folding Problem: Global Random Search: Selecting a Transferable number of MREMD trajectories is 128, University of Gdansk Optimization of Force Fields”, Frontiers in Medium-resolution Force Field”, Journal multiplied by 100 energy-term-weight Bioscience, 9, pp. 3296-3323, 2004 of Computational Chemistry, 30, sets run simultaneously. Consequently, pp. 2127–2135, 2009 2 Baker Laboratory the optimization job requires 20,000+ [5] Kolinski, A., Skolnick, J. 
of Chemistry and “Reduced Models of Proteins and their Ap- [10] Cochran, A. G., Skelton, N. J., cores. Consequently, these calcula- Chemical Biology, plications”, Polymer, 45, pp. 511-524, 2004 Starovasnik, M. A. tions can be accomplished only with the “Tryptophan Zippers: Stable, Monomeric Cornell University massively parallel resources provided [6] Liwo, A., Khalili, M., Czaplewski, C., b-hairpins”, Proceedings of the National by the Jülich Supercomputing Centre Kalinowski, S., Ołdziej, S., Wachucik, K., Academy of Sciences U.S.A., 98, 3 Intercollegiate Scheraga, H.A. pp. 5578–5583, 2001 (Blue Gene/P). The optimized force field Faculty of “Modification and Optimization of the United- will be tested in 9th CommunityWide residue (UNRES) Potential Energy Function Biotechnology, Experiment on the Critical Assessment for Canonical Simulations. I. Temperature University of of Techniques for Protein Structure Dependence of the Effective Energy Function Gdansk, Medical and Tests of the Optimization Method with Prediction (CASP9) which will take place University of Gdansk Single Training Proteins”, The Journal of Phys- between March and August 2010. ical Chemistry B, 111, pp. 260-285, 2007

44 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 45 so-called IDB (Incarnation Database). UNICORE user database, Virtual Orga- Recent Advances in the The functionality of the XNJS is ac- nization (VO) management systems can UNICORE 6 Middleware cessible via two sets of Web service be accessed for getting user attributes interfaces. The first set of interfaces such as role, project membership or is called UAS (UNICORE Atomic Ser- local login. To enable interoperability The UNICORE Grid system provides a UNICORE 6 Architecture vices) and offers the full functionality to scenarios, proxy certificates that are seamless, secure and intuitive access A bird’s eye view of the UNICORE 6 higher level services, clients and users. common in other Grid software are op- to distributed computational and data system is given in Figure 1. The system In addition to the UAS, a second set of tionally supported. resources such as supercomputers, can be decomposed into three layers: interfaces based on open standards clusters, and large server farms. the client, services and system layers. defined by the Open Grid Forum is avail- On the lowest layer the TSI (Target UNICORE serves as a solid basis in On the client layer, a variety of client able (depicted as “OGSA-*”). Some of System Interface) component is the many European and international tools are available to the users, rang- the OGSA-* services are still under de- interface between UNICORE and the research projects that use existing ing from the graphical UNICORE Rich velopment or await official ratification, resource management system and op- UNICORE components to implement client (URC), the UNICORE command- and the UNICORE community actively erating system of a Grid resource. In advanced features, higher-level services, line client (UCC) to an easy-to-use high- contributes to these discussions and the TSI component the abstracted com- and support for scientific and business level programming API (HiLA). Custom provides reference implementations for mands from the Grid layer are trans- applications from a growing range of do- clients such as portal solutions can be emerging standards. The service layer lated to system-specific commands, e.g. Projects mains. Since its initial release in August easily implemented. includes a flexible and extensible secu- in the case of job submission, the spe- Projects 2006, the current version UNICORE 6 rity infrastructure, that allows interfac- cific commands like llsubmit or qsub of has been significantly enhanced with The middle layer comprises the UNI- ing to and integrating a variety of secu- the resource management system are new components, features and stan- CORE 6 services. Figure 1 shows three rity components. Instead of the default called. The TSI component is perform- dards, offering a wide range of func- sets of services, the left and right one tionality to its users from science and containing services at a single site while industry. After giving a brief overview of the middle shows the central services, UNICORE 6, this article introduces some such as the central registry, the work- of these new features and components. flow services and the information ser- vice, which serve all sites and users in a Overview of UNICORE 6 UNICORE Grid. The Gateway component UNICORE 6 is built on industry-standard acts as the entry point to a UNICORE technologies, using open, XML-based site and performs the authentication of solutions for secure inter-component all incoming requests. 
It acts as a HTTP communication. UNICORE 6 is designed and Web services router, forwarding cli- according to the principles of Service ent requests to the target service, real- Oriented Architectures (SOAs). It uses izing a secure firewall tunnel. Therefore, SOAP-based stateful Web services, in a typical UNICORE installation, only a as standardized by the standards con- single open firewall port is necessary. sortia W3C, OASIS and the Open Grid Forum (OGF). The security architecture The central UNICORE server compo- is based on X.509 certificates, the Secu- nent is based on UNICORE 6’s Web rity Assertion Markup Language (SAML) services hosting environment. The and the eXtensible Access Control XNJS component is the job manage- Markup Language (XACML). UNICORE 6 ment and execution engine of UNICORE is implemented in the platform-neutral 6. It performs the job incarnation, languages Java and Perl, and UNICORE 6 namely the mapping of the abstract job services successfully run on Linux/Unix, description to the concrete job descrip- Windows and MacOS, requiring only a tion for a specific compute resource Java 5 (or later) runtime environment. according to the rules stored in the

Figure 1: Overview of the UNICORE 6 system

ing tasks under the user's UID and is the only UNICORE component that needs to be executed with root privileges. The TSI is available for a variety of commonly used resource management systems such as Torque, LoadLeveler, LSF, SLURM, and Sun GridEngine. In addition to job submission and job management services, UNICORE 6 provides unified access to storages via its StorageManagement and FileTransfer service interfaces. File systems are accessed directly via the TSI, while other storages can be integrated using custom adaptors. For example, an adaptor to the Apache Hadoop cluster storage system has been realized.

Recent Development Highlights

Workflow System
The UNICORE 6 workflow system offers powerful workflow features such as sequences, branches, loops and conditions, while integrating seamlessly into UNICORE Grids and into the UNICORE clients. The workflow system supports automated resource selection and brokering, and is scalable due to its two-layered architecture. Recently a for-each loop construct has been added that makes it possible to process large file sets or perform parameter studies in a scalable and very user-friendly fashion. Figure 2 shows a screenshot of the URC, with the workflow editor shown on the right-hand side.

Improved Data and Storage Management
UNICORE's data and storage management capabilities have been integrated into the UNICORE Rich Client in an easy-to-use fashion. The URC allows common operations like editing, deleting or renaming, as well as drag and drop of files between the remote sites and the user's desktop. A new StorageFactory service makes it possible to create and manage remote storage resources.

Execution Environments: Improved Support for Parallel Applications
Users running parallel applications are usually required to know the invocation details of the parallel environment (such as OpenMP or MPI). A recently added UNICORE feature allows administrators to easily parametrize the available parallel environments in the UNICORE IDB, so that users can easily choose the required environment and set options and arguments without having to know the system specifics.

Remote Administration
It is often convenient to check site health, get up-to-date performance metrics and even deploy and un-deploy services remotely using the standard communication channels (i.e. Web services), without having to rely on other tools such as Nagios, or needing to actually log in to the UNICORE 6 server. For this reason, an Admin service has been developed that offers various remote administration and monitoring features. It can be used through a graphical URC plugin and through the command-line client. Using the UNICORE XACML-based access control, access to this service is of course limited to privileged users.

Shibboleth Integration in the UNICORE Rich Client
Shibboleth is a standards-based identity management system aiming to provide single sign-on across organisational boundaries. It is considered a prime candidate for simplifying the end-user experience by hiding the specifics of the X.509 certificate based security. Therefore a plugin for the URC was developed that provides access to Shibboleth-enabled attribute authorities, which create short-lived certificates that enable access to the Grid resources.

Interactive Access to Grid Sites
Building on the single sign-on feature of the UNICORE Rich Client, it is desirable to provide interactive terminal access to the remote site. A URC plugin has been developed that can be used to open an interactive connection using a variety of means such as standard SSH with username and password, GSI-SSH and a custom X.509 based solution. The client-side plugin offers VT100 emulation and port forwarding.

Open Source Model
UNICORE 6 is available as open source under the BSD license from SourceForge. At http://www.unicore.eu more information can be obtained and lightweight UNICORE 6 installation packages can be downloaded. A small UNICORE 6 testgrid is available for immediate testing. UNICORE 6 is continually evolving, with regular releases approximately every three months. The main contributors in the international UNICORE open-source community are ICM Warsaw, CINECA, CEA, Technische Universität Dresden, and Forschungszentrum Jülich.

• Bernd Schuller
• Morris Riedel
• Achim Streit

Jülich Supercomputing Centre, Forschungszentrum Jülich

Figure 2: Screen shot of the UNICORE Rich Client

LIKWID Performance Tools

Figure 1: Topology of a typical Intel Nehalem based Cluster Node
Figure 2: Relationship between counter registers, hardware events and LIKWID event groups

Today's multicore processors bear multiple complexities when aiming for high performance. Conventional performance tuning tools like Intel VTune, OProfile, CodeAnalyst, OpenSpeedshop etc. require a lot of experience in order to get sensible results. For this reason they are usually unsuitable for the scientific user, who would often be satisfied with a rough overview of the performance properties of their application code. Moreover, advanced tools often require kernel patches and additional software components, which makes them unwieldy and bug-prone. Additional confusion arises with the complex multi-core, multi-cache, multi-socket structure of modern systems; affinity has become an important concept, but users are all too often at a loss about how to handle it.

LIKWID ("Like I Knew What I'm Doing") is a set of easy-to-use command line tools to support optimization and is targeted towards performance-oriented programming in a Linux environment. It does not require any kernel patching and is suitable for Intel and AMD processor architectures. Multi-threaded and even hybrid shared/distributed-memory parallel code is also supported. Building it only requires a GNU C compiler and is a matter of seconds. In the following we describe all four tools in detail:

likwid-perfCtr
Hardware performance counters are facilities to count hardware events during code execution on a processor. Because this mechanism is implemented directly in hardware there is no overhead involved. All modern processors provide hardware performance counters, but their primary purpose is to support computer architects during the implementation phase. Still, they are also attractive for application programmers, because they allow an in-depth view of what happens on the processor while running applications. Tools for hardware performance counters have a reputation for being complex to install and even more complex to use. In almost all cases a special kernel module is necessary. Many tools are restricted to one processor vendor, others require code changes in the user's application. Hardware performance counters are controlled and accessed using processor-specific hardware registers (also called model specific registers (MSR)). likwid-perfCtr uses the Linux msr module to modify the MSRs from user space. It can be used as a wrapper or with code markers to probe only parts of the application. A major problem with hardware performance events is how to choose and interpret them correctly. The events differ between processors and it is difficult for a beginner to choose the right events to help in their optimization effort. likwid-perfCtr provides preconfigured groups with useful, ready-to-use event sets and derived metrics like bandwidths and event ratios. Still, likwid-perfCtr is fully transparent, i.e. it is clear at any given time on which events the performance groups are based. This ease of use sets likwid-perfCtr apart from existing tools.

Supported architectures:
• Intel Pentium M (Banias, Dothan)
• Intel Core 2 (all variants)
• Intel Nehalem (Xeon, i7, i5)
• AMD K8 (all variants)
• AMD K10 (Barcelona, Shanghai, Istanbul)
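As a rough illustration of this msr-module mechanism (and not of LIKWID's actual code), the following C sketch programs and reads the fixed-purpose "instructions retired" counter of core 0 through the /dev/cpu/0/msr device. The MSR addresses follow Intel's documentation for Nehalem-generation processors; the program assumes the msr kernel module is loaded and sufficient privileges:

/* Minimal sketch: count retired instructions on core 0 via /dev/cpu/0/msr.
   Illustrative only; likwid-perfCtr wraps this kind of low-level access.
   For a meaningful reading the process would also be pinned to core 0. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define IA32_FIXED_CTR_CTRL   0x38D  /* enables the fixed-purpose counters */
#define IA32_PERF_GLOBAL_CTRL 0x38F  /* global counter enable bits         */
#define IA32_FIXED_CTR0       0x309  /* counts INSTR_RETIRED.ANY           */

static uint64_t rdmsr(int fd, uint32_t reg)
{
    uint64_t val = 0;
    pread(fd, &val, sizeof(val), reg);   /* file offset selects the MSR */
    return val;
}

static void wrmsr(int fd, uint32_t reg, uint64_t val)
{
    pwrite(fd, &val, sizeof(val), reg);
}

int main(void)
{
    int fd = open("/dev/cpu/0/msr", O_RDWR);   /* needs root privileges */
    if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }

    wrmsr(fd, IA32_FIXED_CTR_CTRL, 0x3);          /* fixed ctr 0: user + OS   */
    wrmsr(fd, IA32_PERF_GLOBAL_CTRL, 1ULL << 32); /* enable fixed ctr 0 only;
                                                     clobbers other enable bits,
                                                     acceptable for this toy   */

    uint64_t start = rdmsr(fd, IA32_FIXED_CTR0);

    volatile double x = 0.0;                      /* toy workload to measure  */
    for (int i = 0; i < 10000000; i++) x += 1.0;

    uint64_t stop = rdmsr(fd, IA32_FIXED_CTR0);
    printf("retired instructions: %llu\n", (unsigned long long)(stop - start));

    close(fd);
    return 0;
}

likwid-perfCtr hides exactly this kind of low-level detail behind its preconfigured event groups and derived metrics.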

likwid-topology
Multicore/multisocket machines exhibit complex topologies, and this trend will continue with future architectures. Performance programming requires in-depth knowledge of cache and node topologies, i.e. which caches are shared between which cores and which cores reside on which sockets. The Linux kernel numbers the usable cores and makes this information accessible in /proc/cpuinfo. Still, how this numbering maps onto a processor's topology depends on BIOS settings and may even differ for otherwise identical processors. The processor and cache topology can be queried with the cpuid machine instruction. likwid-topology is based directly on the data provided by cpuid. It extracts the machine topology in an accessible way and can also report on cache characteristics. There is an option to get a quick overview of the node's cache and socket topology in ASCII art.

Figure 3: Intel Core i7

likwid-pin
Thread/process pinning is vital for performance. If topology information is available, it is possible to pin threads according to the application's resource requirements like bandwidth, cache sizes etc. likwid-pin supports thread affinity for all threading models that are based on POSIX threads, which includes most OpenMP implementations. By preloading a library wrapper to the pthread_create API call, likwid-pin is able to bind each thread upon creation (a sketch of this preloading mechanism is given at the end of this article). It configures a dynamic library using environment variables and starts the real application with the library preloaded. No code changes are required, but the application must be dynamically linked.

Supported threading implementations:
• POSIX threads
• Intel OpenMP
• gcc OpenMP

Other threading implementations are supported via a "skip mask". The skip mask specifies which threads should not be pinned by the wrapper library. For instance, the Intel OpenMP implementation always runs OMP_NUM_THREADS+1 threads but uses the first created thread as a management thread. The skip mask also makes it possible to pin hybrid applications by skipping MPI shepherd threads. In addition, likwid-pin can be used for sequential programs as a replacement for the (inferior) taskset tool.

Some compilers provide their own mechanisms for thread affinity. In order to avoid interference effects, those mechanisms should be disabled when using likwid-pin. In case of the Intel compiler, this can be achieved by setting the environment variable KMP_AFFINITY. The big advantage of likwid-pin is that the same tool can be used for all applications, compilers, MPI implementations and processor types.

Figure 6: Pinning threads with likwid

likwid-features
An important hardware optimization on modern processors is to hide data access latencies by hardware prefetching. Intel processors not only have a prefetcher for main memory; several prefetchers are responsible for prefetching the data inside the cache hierarchy. Often it is beneficial to know the influence of the hardware prefetchers, and in some situations turning off hardware prefetching might even increase performance. On the Intel Core 2 processor this can be achieved by setting bits in the IA32_MISC_ENABLE MSR register. Besides the ability to toggle the hardware prefetchers, likwid-features reports on the state of switchable processor features such as Intel SpeedStep. likwid-features only works for Intel Core 2 processors.

Future Plans
Some enhancements to the suite are imminent. Due to the limited number of performance counter registers on current CPUs, a code must often be run multiple times in order to get complete information. Event multiplexing, i.e. switching between sets of event sources on a regular basis, will enable the concurrent measurement of many performance metrics. This will introduce a statistical error into the event counts, which is, however, well controllable. The API for enabling and disabling performance counters in the current version of likwid-perfCtr is only rudimentary and will be enhanced to become more flexible.

Further goals are the combination of LIKWID with the well-known IPM tool for MPI profiling, to facilitate the collection of performance counter data in MPI programs, and the integration of a simple bandwidth map benchmark, which will allow a quick overview of the cache and memory bandwidth bottlenecks in a shared-memory node, including the ccNUMA behaviour.

How to get it
The LIKWID performance tools are open source. You can download them at: http://code.google.com/p/likwid/

• Jan Treibig
• Michael Meier
• Georg Hager
• Gerhard Wellein

Regional Computer Centre (RRZE), University of Erlangen-Nuremberg
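The preloading technique used by likwid-pin can be pictured with a small shim library. The following C sketch is illustrative only (naive round-robin placement, no skip mask, no error handling, hypothetical file name pin.c); it is not likwid-pin's actual implementation:

/* pin.c: sketch of the LD_PRELOAD technique behind likwid-pin.
   Intercepts pthread_create and binds every new thread to a core. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

typedef int (*create_fn)(pthread_t *, const pthread_attr_t *,
                         void *(*)(void *), void *);

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg)
{
    static create_fn real_create = NULL;
    static int next_core = 0;               /* naive round-robin placement   */

    if (!real_create)                       /* look up the real pthread_create */
        real_create = (create_fn)dlsym(RTLD_NEXT, "pthread_create");

    int ret = real_create(thread, attr, start_routine, arg);
    if (ret == 0) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(next_core, &set);           /* bind the new thread to one core */
        pthread_setaffinity_np(*thread, sizeof(set), &set);
        fprintf(stderr, "pinned new thread to core %d\n", next_core);
        next_core++;
    }
    return ret;
}

Built with gcc -shared -fPIC -o libpin.so pin.c -ldl and activated by starting the dynamically linked application with LD_PRELOAD pointing to libpin.so, every newly created thread is bound to a core without any change to the application source. likwid-pin adds on top of this the selection of target cores, the skip mask and support for the different threading implementations.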

Figure 4: Intel Dunnington
Figure 5: Output of likwid-topology for an Intel Xeon node with two sockets
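The cache characteristics reported by likwid-topology (Figure 5) are ultimately derived from the cpuid instruction mentioned above. On Intel processors the deterministic cache parameters leaf can be walked directly; the following minimal C sketch (using GCC's cpuid.h helpers, with the bit fields as documented by Intel) illustrates only the mechanism, not the tool's implementation:

/* Walk Intel CPUID leaf 4 ("deterministic cache parameters") and print
   the caches seen by the core the program happens to run on. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    for (unsigned int sub = 0; ; sub++) {
        unsigned int eax, ebx, ecx, edx;
        __cpuid_count(4, sub, eax, ebx, ecx, edx);
        (void)edx;

        unsigned int type = eax & 0x1f;          /* 0 means no more caches */
        if (type == 0)
            break;

        unsigned int level      = (eax >> 5) & 0x7;
        unsigned int line_size  = (ebx & 0xfff) + 1;
        unsigned int partitions = ((ebx >> 12) & 0x3ff) + 1;
        unsigned int ways       = ((ebx >> 22) & 0x3ff) + 1;
        unsigned int sets       = ecx + 1;
        unsigned int size_kb    = ways * partitions * line_size * sets / 1024;

        printf("L%u %s cache: %u KiB, %u-way, %u B lines\n",
               level,
               type == 1 ? "data" : type == 2 ? "instruction" : "unified",
               size_kb, ways, line_size);
    }
    return 0;
}

likwid-topology combines this kind of information with the core numbering of the Linux kernel to report which cores share which caches and which cores reside on which sockets.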

52 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 53 als as linear combinations of a finite set In the program Project ParaGauss of localized, most often Gaussian-type ParaGauss function . The resulting matrix ele- the procedure k The interdisciplinary project ParaGauss Compared to other areas of simula- ments are then obtained as integrals, sketched above in the framework of the Munich Center tion, like fluid dynamics or continuum in fact scalar products , over is parallelized in of Advanced Computing aims at devel­ mechanics in engineering or the geo- the various terms V of the Kohn-Sham a coarse-grained oping and implementing new and im- sciences, quantum chemistry meth- operator, representing the kinetic fashion using mes- proved strategies for calculating the ods, such as those based on Density energy of the electron, the attractive sage passing via MPI. properties of molecular systems at the Functional Theory (DFT), lack a com- potential due to the atomic nuclei, the Each individual subtask is level of quantum mechanics, i.e. from mon data structure which would allow repulsion of an electron by the classical divided in as few as possible “first principles” and without empirical an efficient and homogeneous paral- Coulomb potential due to electron den- consecutive blocks to minimize assumptions. The project relies on the lelization of the entire algorithm. sity and the so-called exchange and com­munication. Crucial for an efficient density functional approach to calcu- correlation terms. parallelization is that a large amount late the electronic structure of atoms, At the heart of the problem are the of “integrals” have to be handled molecules or solids and builds on the solutions of the Kohn-Sham equa- These latter terms comprise the when constructing the matrix ele- implementation in the parallel program tion where H is the quantum mechanical aspects of the ments of H. For larger molecules ParaGauss [1]. To reach the goals of Kohn-Sham operator which represents many-body effects. The more demand- this data set reaches several GB. Projects the project, alternatives to current the interaction energies of one electron ing tasks in the Kohn-Sham procedure Also, the computational effort of the Projects parallel implementations have to be with all the others as well as with the are the determination of the matrix different tasks scatters widely, result- developed, to increase the number of atomic nuclei of the system. Summing elements according to analytical ing in bottlenecks that limit the number efficiently usable processors and espe- over a sufficient number of one-elec- integrals and the evaluation of the cially to improve the usability of mod- tron orbitals from lower to higher electron-electron interaction during ern multi-core architectures. energies yields the electronic density the self-consistency cycle from such in- i tegrals (Coulomb term) or by numerical integration (exchange and correlation terms). A smaller task, besides others, The Kohn-Sham operator H itself de- is the diagonalization of the generalized pends on the electron density so eigenvalue problem. To determine the that the Kohn-Sham equation has to be electronic structure at a given molecu- solved iteratively until “self-consistency” lar geometry, i.e. a given set of nuclei has been reached. 
One starts from assumed to be fixed in space, one has an initial guess for , constructs to carry out these tasks repeatedly the Kohn-Sham operator H and solves and consecutively during the iteration the Kohn-Sham equation. With the re- process. In a typical application, one sulting density, one constructs a new determines the geometry of a molecu- operator, and so on, until the initial and lar system (Figure 1), i.e. one has to resulting orbitals are sufficiently close minimize the energy by varying the mo- to each other. lecular geometry, which requires first- order derivatives (forces) of the energy Various subtasks of mathematically dif- with regard to nuclear displacements, ferent structure need to be tackled hence additional analytical and numeri- in this procedure. First, the Kohn- cal integrations. Often one also has to Sham equation is transformed to determine second-order derivatives of a generalized matrix eigenvalue the energy with respect to nuclear co- problem by rep- ordinates, e.g. when one needs an ac- Figure 1: A model catalyst: transition metal hydride cluster Rh6H3 (rhodium resenting curate description (frequencies) of the purple, hydrogen white) adsorbed at the wall of a SiO2 framework (zeolite of the orbit- vibrations of the framework of nuclei. faujasite structure; silicon brown, oxygen grey)
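In compact form, and in generic textbook notation (the symbols below are standard choices, not reproduced from the ParaGauss publications), the procedure sketched above reads:

% Kohn-Sham equation and electron density; H depends on rho, hence the iteration
\hat{H}[\rho]\,\psi_i(\mathbf{r}) = \varepsilon_i\,\psi_i(\mathbf{r}),
\qquad
\rho(\mathbf{r}) = \sum_{i}^{\mathrm{occ.}} \bigl|\psi_i(\mathbf{r})\bigr|^2

% Expansion of the orbitals in a finite set of localized (Gaussian-type) basis functions
\psi_i(\mathbf{r}) = \sum_{k} c_{ki}\,\varphi_k(\mathbf{r})

% which turns the Kohn-Sham equation into a generalized matrix eigenvalue problem
\sum_{l} H_{kl}\,c_{li} = \varepsilon_i \sum_{l} S_{kl}\,c_{li},
\qquad
H_{kl} = \langle \varphi_k \,|\, \hat{H} \,|\, \varphi_l \rangle, \quad
S_{kl} = \langle \varphi_k \,|\, \varphi_l \rangle

Because the operator depends on the density, the matrix elements have to be rebuilt and the generalized eigenvalue problem has to be solved repeatedly until the orbitals, and hence the density, no longer change; this is the self-consistency cycle referred to in the text.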

54 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 55 of efficiently usable MPI tasks by means of OpenMP. On for symmetric molecular frameworks will increase the overall efficiency of a processors, especially a shared-memory multi-core system, or calculations that account for spin simulation. Implementation of this com- for the smaller sub- we will compare the efficiency of an polarization of open-shell systems. bined strategy requires a strict modu- tasks. To improve the OpenMP parallelized variant of a mod- These densely filled submatrices are larization of the overall procedure into usage of multi-core ule that evaluates analytical integrals symmetric (or hermitian) with typi- separate modules for each subtask. architectures as with the existing version on the basis cal dimensions ranging from 100 to a The number of processors used at a well as for a generally of message-passing parallelization. few 1,000, reaching 10,000 for large time will then be determined taking higher efficiency when This module will serve as prototype for problems. Currently these matrices into account the optimum number of engaging a larger number MPI parallelization between nodes. are diagonalized by a serial algorithm, processors for specific tasks and the of processors, several strat- but submatrices are treated in paral- number of currently available cores. egies will be combined in a new imple- As algorithmically simple tasks, e.g. lel. As parallel eigensolvers are not mentation of ParaGauss. Multi-core numerical integration, are easily paral- very efficient for smaller matrices, we Two-level parallelization, modulariza- architectures provide less memory lelized, computationally less demanding constructed a mixed procedure for di- tion, optimization of memory usage per core and typically slower access tasks, that are harder to parallelize, agonalizing a set of matrices. First one and pool-parallel execution of tasks than single-core hardware. This issue consume a growing part of the total determines a hardware-dependent opti- taken together will yield a new soft- will become even more important as run time with growing number of pro- mum number of cores for diagonalizing ware structure for quantum chemical memory access is already a bottleneck cessors. Therefore it is necessary to matrices of increasing size. Then one calculations that is also ready for dis- Projects at current computing platforms. optimize all tasks which take up more optimizes the order in which the matri- tributed computing. We are confident Projects than ~1 % of real time in a sequen- ces are diagonalized, employing varying that the MAC project ParaGauss will One way out is to limit the amount of tial run. Otherwise these tasks will numbers of processors to achieve a deliver a simulation tool of enhanced data of parallel tasks at the expense dampen the acceleration for a larger low overall execution time (Figure 2). usability and efficiency on shared- of communication overhead. Currently number of processors. memory multi-core environments. we are exploring an alternative solution In this spirit, other tasks also may be which uses message passing paral- The solution of the generalized matrix solved more efficiently by allowing vari- References lelization between multi-core shared eigenvalue problem is such a task. In able numbers of cores. 
In this way, [1] Belling, Th., Grauschopf, T., Krüger, S., Mayer, M., Nörtemann, F., Staufer, M., memory nodes and, on each multi-core general the matrix to be diagonalized dynamically variable numbers of cores Zenger, C., Rösch, N., node, low-level parallelization of single is decomposed into diagonal blocks will reduce the overall runtimes while "High Performance Scientific and Engineer- it also wastes hardware resources ing Computing", in Bungartz, H.-J., Durst, • Notker Rösch to a certain degree. In typical applica- F., Zenger, C. (Eds.), Proceedings of the • Sven Krüger First International FORTWIHR Conference tions, one hardly ever carries out just 1998, Lecture Notes in Computational a single huge simulation; rather, sev- Science and Engineering, Springer, Technische eral if not many similar calculations are Heidelberg, Vol. 8, p. 439, 1999 Universität needed. This observation may help to München, resolve the conflict just described by Theoretische tackling several such application tasks Chemie in parallel on a larger pool of cores. Typical problems benefitting from such a pooling strategy are the determina- tion of energies of educts and prod- ucts of a reaction, the determination of a reaction path that is represented by a finite set of intermediate struc- tures, or the comparison of various isomers of a given compound. Combin- ing this approach with an optimized dy-

namical usage of the numbers of cores for different subtasks of the algorithm

Figure 2: Scheme of parallel solution of several matrix eigenvalue problems. Left: serial solution of individual problems in parallel; right: parallel solution of problems, partially parallelized.

Real World Application Acceleration with GPGPUs

Future microprocessor development Molecular Dynamics) is a free-of-charge efforts will continue to concentrate on molecular dynamics simulation pack- adding cores rather than increasing age written using the Charm++ paral- single-thread performance. Similarly, lel programming model, noted for its the highly parallel Graphics Processing parallel efficiency and regularly used Unit (GPU) is rapidly gaining maturity as to simulate large systems (millions of a powerful engine for computationally atoms). It has been developed by the demanding applications [1]. GPUs, de- joint collaboration of the Theoretical signed primarily for 3D graphics appli- and Computational Biophysics Group cations, are starting to become an at- and the Parallel Programming Labo- tractive option to accelerate scientific ratory at the University of Illinois at computations. Modern GPUs do not Urbana-Champaign. At the LRZ, NAMD Projects only offer powerful graphics but also a (version 2.7b1) has been compiled both Projects highly parallel programmable processor with and without GPU support. CUDA featuring peak arithmetic performance acceleration mixes well with task-based and memory bandwidth to the on-board parallelism, allowing NAMD to run on

graphics card memory that substan- clusters with multiple GPUs per node. Figure 1: The Figure depicts a representation of the file apoa1 that was used in this article. This file tially out-paces the CPUs counterpart. consists of 92,224 atoms, including the water box surrounding the molecule (water is not shown in However, making efficient use of GPU GPUs at the LRZ the Figure). The full name of the molecule is “ApoA-1 Milano”. It is a naturally occurring mutated variant of the apolipoprotein A1 protein found in human HDL, the lipoprotein particle that carries cholesterol clusters still presents a number of The LRZ operates two powerful visual- (depicted in yellow in the middle of the protein) from tissues to the liver and it is associated with challenges in application development ization servers since early 2009. Both protection against cardiovascular disease. and system integration. GPUs are opti- machines are SunFire-X4600-M2 serv- mized for structured parallel execution, ers with 32 CPU cores (8 quad-core atomic trajectories by solving equa- yond the specified cutoff distance) are extensive ALU counts and memory Opterons with 2.3 GHz) and 256 GB tions of motion numerically using em- computed less frequently. This reduces bandwidth. GPUs are designed to pro- RAM each. Each server has 4 Nvidia pirical force fields [5,6], such as the the cost of computing the electrostatic vide high aggregate performance for Quadro FX5800 graphics cards at- CHARMM force field [7], that approxi- forces over several time steps. The ap- tens of thousands of independent com- tached. The graphics cards feature mate the actual atomic force in molec- plication decomposes and schedules the putations. This key design choice allows 4GB on-board memory and 240 CUDA ular systems. NAMD is distributed free computation among CPU computation GPUs to spend the vast majority of chip parallel processing cores (the Nvidia of charge to academics, pre-compiled and GPU kernel. The GPU computes the die area (and thus transistors) on arith- Quadro FX5800 graphics card is iden- for popular computing platforms or non-bonded interactions, bonded inter- metic units rather than on caches. tical to the Tesla C1060 card). CUDA in source code form [4]. The velocity actions such as angle, dihedral and tor- (Compute Unified Device Architecture) Verlet integration method [8] is used sion are still done on the CPU. A particularly interesting application is a parallel computing architecture to advance the positions and velocities for GPU acceleration is the field of mo- developed by NVIDIA [3]. CUDA has of the atoms in time. To further reduce At every iteration step, NAMD has to lecular dynamics simulation, which is unlocked the computational power of the cost of the evaluation of long-range calculate the short-range inter­action dominated by N-body atomic force cal- GPUs and made GPUs much more electrostatic forces, a multiple time forces between all pairs of atoms culations. This article re-examines the accessible to computational scientists. step scheme is employed. The local within a cutoff distance. By partitioning performance of GPUs and compares interactions (bonded, van der Waals space into patches slightly larger than the results to pure CPU performance. 
NAMD with CUDA and electrostatic interactions within the cutoff distance, NAMD ensures This comparison will be carried out NAMD is a parallel molecular dynam- a specified distance) are calculated at that all interactions of one atom are using the NAMD molecular dynamics ics (MD) code for large biomolecular each time step. The longer range inter- with atoms in the same or neighboring simulation package. NAMD (NAnoscale systems [2]. MD simulations compute actions (electrostatic interactions be- cubes.
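The cutoff-plus-patch scheme described above can be pictured with a deliberately simplified C kernel. The data layout, the names and the placeholder pair potential below are illustrative assumptions and not NAMD's actual code; in the CUDA-enabled build it is essentially this non-bonded inner loop that is off-loaded to the GPU:

/* Schematic non-bonded kernel: accumulate forces on the atoms of one patch
   ("home" cell) from the atoms of one neighbouring cell. Atoms are assumed
   to have been sorted into cells whose edge length is at least the cutoff. */
#include <math.h>

typedef struct { double x, y, z; } Vec3;

static void pair_forces(const Vec3 *pos_a, int na,      /* home patch      */
                        const Vec3 *pos_b, int nb,      /* neighbour patch */
                        double cutoff, Vec3 *force_a)
{
    const double rc2 = cutoff * cutoff;

    for (int i = 0; i < na; i++) {
        for (int j = 0; j < nb; j++) {
            double dx = pos_a[i].x - pos_b[j].x;
            double dy = pos_a[i].y - pos_b[j].y;
            double dz = pos_a[i].z - pos_b[j].z;
            double r2 = dx*dx + dy*dy + dz*dz;

            if (r2 < rc2 && r2 > 0.0) {
                /* placeholder pair potential: |F| = 1/r^2, directed along
                   the pair vector (a real code evaluates van der Waals and
                   electrostatic terms here, usually via lookup tables)     */
                double inv_r2 = 1.0 / r2;
                double s = inv_r2 * sqrt(inv_r2);        /* equals 1/r^3 */
                force_a[i].x += s * dx;
                force_a[i].y += s * dy;
                force_a[i].z += s * dz;
            }
        }
    }
}

Because each patch edge is at least as long as the cutoff, such a routine only needs to be called for a patch paired with itself and its 26 neighbouring patches, which keeps the work per atom bounded independently of the total system size.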

Figure 2: Run-time as a function of the cutoff value. Values for the CPU-only version of NAMD are represented by black circles, values for the CPU-GPU version of NAMD are depicted by red squares.
Figure 3: Variation of time performance as a function of cutoff and twoAway flags.

Results and Discussions The non-bonded interactions (including software at the LRZ Supercomputing [4] http://www.ks.uiuc.edu/Research/namd/ The performance results as a function electrostatic and van der Waals interac- Centre. The speed-up of the GPU- [5] Allen, M. P., Tildesley, D. J. of the cutoff is shown in Figure 2, where tions) are calculated beyond the cutoff enhanced version of NAMD depends “Computer Simulation of Liquids”, we compare the results of the CPU- distance on the GPUs. The rest of the most strongly on the cutoff parameter Oxford University Press, New York, 1987 only version of NAMD (using 4-Cores calculations such as the bonded interac- and also on the twoAwayX, twoAwayY, Opteron 2.3 GHz) with the CPU-GPU tion are done on the CPU. As the cut- and twoAwayZ options. These options [6] McCammon, J. A., Harvey, S. C. “Dynamics of Proteins and Nucleic Acids”, version of NAMD, using the Nvidia- off gets larger, the ratio of non-bonded reflect how the patches are arranged Cambridge University Press, Cambridge, FX58000 graphics cards. A larger to bonded calculations increases. This and the non-bonded interactions are 1987 • Momme Allalen value for cutoff means more compu- greatly favors the GPU as can be seen passed on to the GPU (some arrange- • Helmut Satzger tation is transferred to the GPU. To in the figures. However, most real-world ments are much better than others). [7] MacKerell, Jr. A. D., Banavali, N., • Ferdinand Jamitzky Foloppe, N. generate the results in Figure 2, one simulations use cutoff values that lead To get the optimum performance out “Development and Current Status of the has to switch off the flags twoAwayX, to the CPU being used more heavily of GPUs, fine-grained parallelism and CHARMM Force Field for Nucleic Acids”, Leibniz twoAwayY and twoAwayZ. The configu- than the GPU. As such, the huge pos- highly optimized code are needed. Biopolymers 56: 257–265 (2001). Supercomputing ration files of the apoa1 benchmark sible performance gain using GPUs can- Centre [8] Ma, Q., Izaguirre, J. A., Skeel, R. D. were used by varying the cutoff values not be fully passed on to the scientist. References “Verlet-I/r-RESPA/Impulse is Limited by [1] Owens, J. D., Houston, M., Luebke, D., from 6\Å to 32\Å. To increase paral- Using the CUDA streaming API for asyn- Nonlinear Instabilities Siam Journal on Green, S., Stone, J. E., Phillips, J. C. lelism on GPUs, every on or off com- chronous memory transfers and kernel Scientific Computing, 24:1951-1973, “GPU computing” Proceedings of the IEEE, 2003 bination of the twoAwayX, twoAwayY invocations to overlap GPU computation 96:879-899, May 2008 and twoAwayZ flags was tested. For with communication and other work example, twoAwayX=yes means that done by the CPU yields speedups of up [2] Phillips, J. C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, W., Schulten, K. the simulation box size is halfed in the to a factor of nine times faster than Chipot, Ch., Skeel, R. D., Kale, L., Villa, E., x direction. The results for different CPU-only runs. “Scalable Molecular Dynamics with NAMD”, simulation box sizes are summarized in Journal of Computational Chemistry, Figure 3. The majority of calculations Conclusion 26:1781-1802, 2005 are of the non-bonded type, and this is Application acceleration using GPUs [3] Nvidia CUDA (Compute Unified Device exactly what is off-loaded to the GPU has shown promising results for the Architecture) “Programming Guide”, in the CUDA-enabled version of NAMD. 
NAMD molecular dynamics simulation Nvidia, Santa Clara, CA 2007

60 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 61 • Memory is the most precious I/O nodes are equally distributed over Storage and Network Design for resource on petascale supercom- those 4 switch fabrics. Rather than the JUGENE Petaflop System puters, and the memory-efficient attaching whole BG/P racks to a single organization and scaling of applica- fabric (which would limit the rack to tion I/O patterns is an active area that one fabric), the 8 I/O nodes of a In conjunction with the installation of challenging than solutions for a single of our petascale research. BG/P rack are distributed over all four the IBM Blue Gene/P Petaflop system supercomputer. The JUST cluster uses fabrics, so smaller BG/P jobs can also JUGENE at Forschungszentrum Jülich 10-Gigabit Ethernet as its networking The Networking Layer benefit from the aggregate bandwidth (inSiDE Vol. 7 No. 1, p. 4), the support- technology, for several reasons: The networking solution needs to sup- of all four planes. ing storage and networking infrastruc- port a sustained GPFS bandwidth of ture has been upgraded to a sustained • The I/O nodes of the Blue Gene/P (as 66 GByte/s and has to provide 10 GbE The following candidates have been bandwidth of 66 GByte/s, matching the the main client) have on-chip 10 GbE. ports for 700 to 800 participants: evaluated for the switch fabric: enhanced computational capabilities. the 600 I/O nodes of the 72-rack • Ethernet is by far the most stable, Blue Gene/P, at least 100 ports in the • Cisco Nexus 7000 in its Starting with the installation of the ini- interoperable, and manageable JUST storage cluster, plus connectivity 18-slot version tial 16-rack JUGENE system in 2007, networking solution. to additional systems. To achieve this, scalable parallel file services have been JSC together with IBM started the • Force10 E1200i as a 14-slot switch Systems moved outside of the supercomputer • 10 GbE bandwidth is adequate, and design and planning phase well ahead Systems systems and are now provided by a the ultra-low latencies of other HPC of the actual deployment. Even today • Myricom Myri-10G switches dedicated storage cluster JUST, which interconnects are not needed for there is no switch solution on the is based on IBM General Parallel File typical HPC storage traffic. market that offers 700 to 800 10 GbE System (GPFS). There are three main ports on a single backplane, and the operational considerations leading to Expanding this infrastructure to support combination of multiple switches intro- this approach: Petaflop systems requires a solution de- duces several design and bandwidth sign which addresses some unique chal- limitations, e.g. due to port channel 1. Data is more long-lived than the lenges which are not present (or not as restrictions (IEEE802.3ad) and the compute systems it is generated on. relevant) at smaller scale. spanning tree algorithm.

2. There is a need to attach more than • The 10 GbE network needs to After discussing different network de- a single supercomputing resource provide very large port counts, and signs and even routing protocol based to the parallel file system (locally and has to sustain Terabits of bandwidth network solutions (e.g. equal-cost multi-­ in different projects, e.g. DEISA). between the supercomputer(s) and path), a rather flat network topology the storage cluster. was chosen, avoiding the risk of badly 3. Cutting edge supercomputers pick converging routing protocols and ad- up many performance-related soft • The storage cluster needs to be ditional costs for big pipes between ware and firmware improvements perfectly balanced in every aspect switch fabrics. Based on the knowledge that a pure storage cluster would of the hardware and software stack of the behaviour of GPFS as the main not need. to be able to deliver the raw perfor- application, a network layout was in- mance of thousands of disks to the vented that uses four switch chassis Separating out the HPC storage greatly end users. but keeps all high-bandwidth storage reduces the above interdependencies, access traffic local within each of the and also makes it easier to offer site- • The solution needs to be resilient switch fabrics. wide parallel file services to a more het- to faults and manageable. This be- erogeneous landscape of computing and comes ever more critical with the ex- To achieve this, quad-homed GPFS visualization resources. On the other ploding number of components both servers were deployed that offer direct hand, designing the networking infra- on the compute and the storage side. storage access local at each switch fab- structure around it then becomes more ric. The high number of Blue Gene/P

Figure 1: One of the four Force10 E1200i 10 GbE switches

62 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 63 While each alternative had some ad- to be the ideal solution for JSC’s Blue 3. The requirement of a seamless 2. $HOME filesystems, with daily vantages and some disadvantages, Gene/P and POWER6 based storage migration from the initial JUST stor- incremental backups. especially in the area of new upcoming cluster deployment. The traffic pat- age cluster installed in 2007 (and DataCenter Ethernet standards like terns observed in production mode via the goal to re-use parts of its hard- 3. $ARCH filesystems, with daily PerPriority Flow Control (IEEE802.1Qb) JSC’s RMON and SFlow/Netflow based ware, in particular the SATA disks). incremental backups and migration and Multichassis Etherchannel, at the monitoring match the assumptions that to tape storage through a combina- end the major decision criteria were were used as a base for this specific As in the initial JUST cluster, the GPFS tion of the IBM GPFS Information the proven stable functionality of the tailored network solution. metadata is stored on dedicated meta- Lifecycle Management (ILM) features existing single E1200 chassis for just data building blocks, with two IBM and IBM TSM/HSM. the functions needed for JSC’s network Using Ethernet as the base technol- DS5300 storage subsystems and 15k setup in combination with the assured ogy is a perfect fit for heterogeneous FC disks. Metadata is protected both by Figure 4 depicts the user view of the availability of the desired hardware in aspects like integrating the Nehalem- Raid1 arrays and by GPFS replication. GPFS filesystems on the expanded sync with the overall Blue Gene/P and based JUROPA cluster, and intercon- JUST cluster. Note that to cope with storage cluster deployment timeline. necting with different campus and proj- Several combinations of servers and the performance impact of the in- ect networks as well as visualization disk storage subsystems have been creased number of files resulting from Using the knowledge of Force10’s mod- solutions. evaluated for the GPFS data building an 5x enlarged Blue Gene/P system, ule and ASIC design in combination with block. For best price/performance, the number of $ARCH filesystems has Systems the known GPFS traffic patterns, JUST GPFS Storage Design nearline SATA disks are used and pro- been increased from one to three and Systems servers and BG/P I/O nodes are con- and Usage Model tected as 8+2P Raid6 arrays. The users are distributed across those. nected to the linecard ports in a layout The design of the hardware building resulting GPFS data building block is that minimizes the congestion prob- blocks for the expanded JUST storage shown in Figure 3: ability on the 4x over-subscribed lin- cluster is driven by three factors: ecards (which are required to achieve • 3 x GPFS-DATA servers p6-570, the large portcounts). At the end of 1. The target of a sustained aggregate each: 2 x CEC, 8 core, 8 x dual-port the multi-stage deployment phase and bandwidth of 66 GByte/s. FC4 adapters, (8 ports used), some additional network optimizations 4 x 10 GigE link into BG/P network (introduction of Jumbo frames, deploy- 2. The requirement to use quad-homed ment of end-to-end flow control, tuning GPFS/NSD servers with sufficient • 2 x DS5300 storage subsystems, of network options and buffers), the bandwidth to drive the four 10 GbE each: 16 x FC host ports (12 ports chosen network design has proven network fabrics. 
used), 24 x EXP5000, each @ 16 x SATA drives (5/6 are 1 TB drives, 1/6 are 500 GB drives reused from JUST-1)

To achieve the desired aggregate GPFS bandwidth, eight of these building blocks are deployed providing a total of 4.2 PByte usable capacity. This stor- age is partitioned into three classes of filesystems with different usage char- acteristics:

1. A large scratch filesystem $WORK with roughly half the capacity and bandwidth, not backed up or archived. Blue Gene/P production jobs should generally use this filesystem.

Figure 3: One JUST Building Block ($WORK uses four, $HOME and $ARCH share the other four building blocks)

Figure 2: JSC Network Overview

64 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 65 GPFS Performance Aspects During the deployment phase, a num- To reach the maximum performance ber of these well-known best practices QPACE of a GPFS cluster, a number of “GPFS have been re-visited. Some of the Golden Rules” need to be followed: design and configuration choices that QPACE (Quantum Chromodynam- and QPACE was identified as one of were optimal with the 16-rack Blue ics Parallel Computing on the Cell) the advanced prototypes. In order to • Performance primarily depends on Gene/P and generation-2007 storage is a massively parallel and scalable extend the range of applications for the number of disks within the file hardware are no longer appropriate for computer architecture optimized for QPACE beyond LQCD, an implementa- system (disks do not get faster any- a 72-rack Petaflop Blue Gene/P and Lattice Quantum ChromoDynamics tion of the high-performance LINPACK more). generation-2009 storage hardware. (LQCD). It has been developed in co- (HPL) benchmark was a first step. operation between several academic Here, different from QCD applications, • The hardware configuration should On the system level, these invaluable institutions (SFB TR 55 under the lead- the transfer of large messages and be symmetric and all system compo- lessons learned (during installation, ership of the University of Regensburg) collective operations must be sup- nents must be balanced. from early adopters and studies by the and the IBM Deutschland Research and ported efficiently. This required exten- JSC application support team) have Development GmbH. sions of both network communication • Settings of the GPFS parameters already been incorporated into the and the software stack. like the filesystem blocksize must be production setup. On the application At JSC, a 4-rack QPACE system is suitable and should match the physi- level, efforts are ongoing to optimize installed. The building block is a node A special FPGA bit-stream and a Systems cal layout. the applications I/O patterns so they card comprising an IBM PowerXCell 8i library of MPI functions for message Systems scale on Petaflop systems to the same processor and a custom FPGA-based passing between compute nodes has • Changed characteristics from one degree that the MPI communication network processor. 32 node cards are been developed at JSC in co-operation generation of storage HW to the scaling has reached. mounted on a single backplane and with IBM. The HPL benchmark pro- next (e.g. DS4000 to DS5000) eight backplanes are arranged inside duced excellent results (43.01 TFlop/s should be reflected in the GPFS one rack, hosting a total of 256 node on 512 nodes) and the 4 rack system configuration. cards. The closed node card housing, in Jülich corresponds to an aggregate which is connected to a liquid-cooled peak performance of more than • Changed usage patterns (e.g. an cold plate, acts as a heat conductor 100 TFlop/s. 5x increase in the number of files thus making QPACE a highly energy- • Olaf Mextorf1 generated) will also have an impact efficient system. Providing 723 MFlop/s/W QPACE • Willi Homberg • Ulrike Schmidt1 on the filesystem performance that was recognized as the most energy- • Lothar needs to be considered. 
To remove the generated heat a efficient supercomputer worldwide and Jülich Wollschläger1 cost-efficient liquid cooling system ranked top in the Green500 list at the Supercomputing • Michael Hennecke2 has been developed, which enables Supercomputing Conference 2009 in Centre, • Karsten Kutzer2 high packaging densities. The maxi- Portland, Oregon. Forschungs- mum power consumption of one zentrum Jülich 1 Jülich QPACE rack is about 32 kW. The 3D- Supercomputing torus network interconnects the node Centre, cards with nearest-neighbor commu- Forschungs- nication links driven by a lean custom zentrum Jülich, protocol optimized for low latencies. For the physical layer of the links 2 IBM Deutschland 10 Gigabit Ethernet PHYs are used GmbH providing a bandwidth of 1 GB/s per link and direction.

In the context of PRACE work-package 8, promising technologies for future supercomputers have been evaluated

Figure 4: User view of the JUST GPFS filesystems and the connected Supercomputers at JSC


Leibniz Supercomputing Centre of Research in HPC is carried out in colla- Compute servers currently operated by LRZ are the Bavarian Academy of Sciences and boration with the distributed, statewide Humanities (Leibniz-Rechenzentrum, LRZ) Competence Network for Technical and Peak ­Performance provides comprehensive services to Scientific High-Performance Computing User scientific and academic communities by: in Bavaria (KONWIHR). System Size (TFlop/s) Purpose ­Community

• giving general IT services to more Contact: SGI Altix 4700 9,728 Cores 62.3 Capability German “HLRB II” 39 TByte ­Computing Universities than 100,000 university customers Leibniz Supercomputing Centre Intel IA64 and Research in Munich and for the Bavarian 19 x 512-way Institutes, Academy of Sciences Prof. Dr. Arndt Bode DEISA Boltzmannstr. 1 • running and managing the powerful 85478 Garching near Munich communication infrastructure of the Germany Centres Centres Munich Scientific Network (MWN) Linux-Cluster 4,438 Cores 36.3 Capacity Bavarian Phone +49 - 89 - 358 - 31- 80 00 Intel Xeon 9.9 TByte Computing and Munich • acting as a competence centre [email protected] EM64T/ Universities, AMD Opteron D-Grid, for data communication networks www.lrz.de 2-, 4-, 8-, 16-, LCG Grid 32-way • being a centre for large-scale archiving and backup, and by

• providing High-Performance SGI ICE 512 Cores 5.2 Capability Bavarian Intel Nehalem 1.6 TByte Computing Universities, Computing resources, training 8-way PRACE and support on the local, regional and national level.

SGI Altix 4700 384 Cores 2.5 Capability Bavarian & 1 TByte Computing Universities SGI Altix 3700 BX2 Intel IA64 128 & 256- way

Linux Cluster 220 Cores 1.3 Capacity Bavarian Intel IA64 1.1 TByte ­Computing and Munich 2, 4- and Universities 8-way

View of “Höchstleistungsrechner in Bayern HLRB II“, an SGI Altix 4700 A detailed description can be found on LRZ’s web pages: www.lrz.de/services/compute (Photo: Kai Hamann, produced by gsiCom)

68 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 69 Based on a long tradition in supercom- the University of Heidelberg in the hkz-bw Compute servers currently operated by HLRS are puting at Universität Stuttgart, HLRS (Höchstleistungsrechner-Kompetenz­ was founded in 1995 as a federal Cen- zentrum Baden-Württemberg). Peak ­Performance tre for High-Performance Computing. User HLRS serves researchers at universities Together with its partners HLRS System Size (TFlop/s) Purpose ­Community and research laboratories in Germany ­provides the right architecture for the and their external and industrial part- right application and can thus serve NEC Hybrid 12 16-way nodes 146 Capability German Architecture SX-9 with 8 TByte Computing Universities, ners with high-end computing power for a wide range of fields and a variety of main memory Research engineering and scientific applications. user groups. + 5,600 Intel Institutes Nehalem cores and Industry, 9 TB memory and D-Grid Operation of its systems is done to­ Contact: 64 NVIDIA Tesla gether with T-Systems, T-Systems sfr, Höchstleistungs­rechenzentrum S1070 and Porsche in the public-private joint ­Stuttgart (HLRS) Centres IBM BW-Grid 3,984 Intel 45.9 Grid D-Grid Centres venture hww (Höchstleistungsrechner Universität Stuttgart Harpertown cores Computing Community für Wissenschaft und Wirtschaft). 8 TByte memory Through this co-operation a variety of Prof. Dr.-Ing. Dr. hc. Michael M. Resch systems can be provided to its users. Nobelstraße 19 Cray XT5m 896 AMD 9 Technical BW Users Shanghai cores Computing and Industry 70569 Stuttgart 1.8 TByte memory In order to bundle service resources in Germany the state of Baden-Württemberg HLRS AMD Cluster 288 AMD cores 3.7 Technical Research has teamed up with the Computing Cen- Phone +49 - 711- 685 - 8 72 69 1.6 TByte memory Computing Institutes tre of the University of Karlsruhe and [email protected] and Industry the Centre for Scientific Computing of www.hlrs.de A detailed description can be found on HLRS’s web pages: www.hlrs.de/systems

View of the HLRS BW-Grid IBM Cluster (Photo: HLRS) View of the HLRS hybrid NEC supercomputer (Intel Nehalem / NVIDIA S1070 / SX-9) (Photos: HLRS)

70 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 71 The Jülich Supercomputing Centre (JSC) Implementation of strategic support Compute servers currently operated by JSC are at Forschungszentrum Jülich enables infrastructures including community-

scientists and engineers to solve grand oriented simulation laboratories and Peak ­Performance challenge problems of high complexity in cross-sectional groups on mathemati- User science and engineering in collaborative cal methods and algorithms and par- System Size (TFlop/s) Purpose ­Community infrastructures by means of supercom- allel performance tools, enabling the puting and Grid technologies. effective usage of the supercomputer IBM 72 racks 1,002.6 Capability European Blue Gene/P 73,728 nodes ­computing Universities resources. “JUGENE” 294,912 processors and Research Provision of supercomputer resources PowerPC 450 Institutes, of the highest performance class for Higher education for master and 144 TByte memory PRACE, DEISA projects in science, research and indus- doctoral students in cooperation e.g.

try in the fields of modeling and comput- with the German Research School for Intel 2,208 SMT nodes with 207 Capacity European er simulation including their methods. Simulation Sciences, a joint venture of Linux CLuster 2 Intel Nehalem-EP and Universities, “JUROPA” quad-core 2.93 GHz Capability Research Centres The selection of the projects is per- Forschungszentrum Jülich and RWTH Centres processors each Computing Institutes formed by an international peer-review Aachen University. 17,664 cores and Industry, procedure implemented by the John von 52 TByte memory PRACE, DEISA Neumann Institute for Computing (NIC), Contact: Intel 1,080 SMT nodes with 101 Capacity EU Fusion a joint foundation of Forschungszentrum Jülich Supercomputing Centre (JSC) Linux CLuster 2 Intel Nehalem-EP and Community Jülich, Deutsches Elektronen-Synchro- Forschungszentrum Jülich “HPC-FF” quad-core 2.93 GHz Capability tron DESY, and GSI Helmholtzzentrum processors each Computing 8,640 cores für Schwerionenforschung. Prof. Dr. Dr. Thomas Lippert 25 TByte memory 52425 Jülich Supercomputer-oriented research Germany IBM 1,024 PowerXCell 100 Capability QCD Cell System 8i processors Computing applications and development in selected fields “QPACE” 4 TByte memory SFB TR55, of physics and other natural sciences Phone +49 - 24 61- 61- 64 02 PRACE by research groups of competence in [email protected] supercomputing applications. www.fz-juelich.de/jsc

IBM 35 Blades 7 Capability German Cell System 70 PowerXCell 8i computing Research “JUICEnext” processors School 280 GByte memory

AMD 125 compute nodes 2.5 Capability EU SoftComp Linux Cluster 500 AMD Opteron computing Community “SoftComp” 2.0 GHz cores 708 GByte memory

AMD 44 compute nodes 0.85 Capacity and D-Grid Linux Cluster 176 AMD Opteron capability Projects “JUGGLE” 2.4 GHz cores computing 352 GByte memory

View on the supercomputers JUGENE, JUST (storage cluster), HPC-FF and JUROPA in Jülich (Photo: Forschungszentrum Jülich)

NIC Symposium 2010 in Jülich
17th EuroMPI Conference at HLRS

The John von Neumann Institute for • Education and training in all areas EuroMPI 2010 will be held at Stuttgart, cussed at the HLRS. Furthermore the Computing (NIC) was established in of supercomputing by symposia, Germany on September 12 - 15, 2010. key questions of scalability will be ad- 1998 by Forschungszentrum Jülich workshops, summer schools, HLRS has a long tradition in MPI and dressed. Handling one million cores is and Deutsches Elektronen-Synchrotron seminars, courses and guest pro- has contributed to the definition of the a task that is beyond currently available DESY to support the supercomputer- grammes for scientists and students. standard and to the proliferation of implementations and the workshop will oriented simulation in natural and engi- MPI over the years. It was therefore be a good chance to discuss new ideas neering sciences. In 2006, GSI Helm- Every two years the NIC organizes a only natural to accept an invitation and strategies. The link to industry will holtzzentrum für Schwerionenforschung symposium to give an overview of the to host the next EuroMPI meeting at also be given in the planned social event. joined NIC as a contract partner. The activities and results obtained by the Stuttgart. The 17th European MPI The Mercedes Museum at Stuttgart is core task of NIC is the peer-reviewed NIC projects and research groups in the Users’ Group Meeting will be a forum the perfect place to get an overview of allocation of supercomputing resources last two years. The fifth NIC Symposium for users and developers of MPI and more than 100 years of automobile his- to computational science projects from took place at the Forschungszentrum other message passing programming tory embedded in the overall world his- Activities Germany and Europe. The NIC partners Jülich from February 24 - 25, 2010. environments. Through the presenta- tory. This visit will bring us close to the Activities support supercomputer-aided research It was attended by more than one hun- tion of contributed papers, poster pre- industrial heart of Stuttgart that beats in science and engineering through a dred scientists that had been using the sentations and invited talks, attendees for cars and everything automotive three-way strategy: supercomputers JUMP, JUROPA, and will have the opportunity to share ideas engineers can imagine. JUGENE in Jülich, and the APE topical and experiences to contribute to the • Provision of supercomputing computer at DESY in Zeuthen in their improvement and furthering of message Invited Speakers: resources for projects in science research. Fifteen talks and a large passing and related parallel program- Jack Dongarra, UTK, USA September research and industry poster session covered selected top- ming paradigms. With HLRS focusing on Bill Gropp, UIUC, USA ics. To accompany the conference, an industrial co-operations and applications Rolf Hempel, DLR, Germany 12-15, 2010 • Supercomputer-oriented research extended proceedings volume was pub- EuroMPI will aim at bringing science Jesus Labarta, BSC, Spain and development by NIC research lished that can also be viewed online at and industry in the field closer together. Jan Westerholm, AAU, Finnland groups in selected fields of physics The potential of MPI for industrial ap- Join us! and natural sciences. www.fz-juelich.de/nic/symposium plications will be one of the issues dis- See also: www.eurompi2010.org

Participants of the NIC Symposium 2010
Schlossplatz Stuttgart at night

74 Spring 2010 • Vol. 8 No. 1 • inSiDE Spring 2010 • Vol. 8 No. 1 • inSiDE 75 automated storage technology used to • Storage virtualization – Consistent Storage Consortium at HLRS provide long-term storage for even the step in protecting virtual infrastruc- largest data sets (Petabyte-Archiving, tures On March 11, 2010, the 7th Confer- data center operator. After Backup). ence for IT Professionals and Data Cen- Leibnitz Rechenzentrum, DESY, Max- • Efficient data management through ter Operators was held on the grounds Planck Institute for Extraterrestrial The conferences of the Storage Consor- file virtualization in the data center of the University of Stuttgart in the Physics and other exciting conference tium are strongly focused on technol- large lecture hall of the HLRS. This con- locations, here at HLRS over 100 par- ogy, planning, and application require- • Storage consolidation and ference was designed and organized by ticipants have given us a clear confirma- ments of most advanced virtualization centralization of desktop services the Storage Consortium – information tion by ways of their positive feedback and storage architectures and are with WAN optimization and communication platform of leading on the symposium that we do meet the targeted at managers and operators of IT manufacturers operating in the needs of IT-/Storage representatives. data centers, storage administrators, • Use of scenarios and reports for German-speaking market. The event was Therefore, another conference in 2010 IT managers and architects, and appli- data deduplication held under the slogan "Energy Efficiency, is still in the pipeline”. cation server administrators, but also Virtualization and Cloud Computing”. controllers and technical buyers. Typical • Convergence of server, network and The event was supported by the follow- concerns of users, such as the planning storage, benefits of virtualization in According to the Managing Director ing partners of the Storage Consortium: and selection of modern storage archi- the storage environment and user Activities and initiator of the Storage Consortium, Riverbed, Cisco Systems, NetApp, tectures, are of particular importance. motivations Activities Mr. Norbert Deuschle, the one-day Quantum, Bull, Symantec, F5 Networks, The overall motto is "from user to user," conference turned out to be a great HP StorageWorks, 3PAR and Fujitsu and the contents of the presentations • Solutions for cloud computing success, especially due to the dedicated Technology Solutions. For user pre- are only partially provided by the manu- support by both the management and sentations (practical relevance), the facturers. This is to ensure practical • Virtualized storage resources as a staff involved at HLRS. Mr. Deuschle companies Sonepar Germany, Freuden- relevance as much as possible which is basis for cloud computing in modern quote: “Thanks to the willingness and berg IT, T-Systems International and, further improved by the informal way data centers. proactive support by Prof. Dr. Michael of course, the HLRS data center itself the various participants are exchanging Resch, Dipl.-Ing. Peter W. Haas as well could be won. information between each other. Inquiries to the Storage Consortium, as Dr. Thomas Bönisch of HLRS with his please contact: Mr. Norbert E. 
Deuschle, exciting keynote speech, we could now Due to its IT infrastructure, HLRS is The agenda of the Storage Consortium [email protected] carry out our seventh successful user very well suited to host such an event, meeting at HLRS on March 11, 2010, conference on the grounds of a large in particular, of course, in the areas did include the following current topics: See also: www.storageconsortium.de of storage networks (10-/40-Gbit/s Ethernet) as well as systems (disk, • Virtualization solutions as a basis tape). During the lunch break, visitors for cloud computing in modern data to the conference had the opportunity centers of going on a total of four tours in order to convince themselves. In addition to • Solutions in the field of storage the large data center with NEC Super systems and file system manage- Computing and storage systems, espe- ment at HLRS cially the data management aspects and the Storage Management (File Manage- • Capacity optimized backups of ment, HSM) found special attention. virtualized server environments, This has to be seen, of course, in the data deduplication and virtualized context of ever-increasing capacities of file management unstructured data to companies. In this context, quite naturally, visitors were • Virtualized Data Center highly interested in having a close look Architectures with standardized, at HLRS' comprehensive disk and tape virtualized IT Services

C²A²S²E and JSC Co-operate to Achieve "Digital Aircraft"

Computer simulations are a key technology for the engineering as well as the natural sciences. Nowadays, numerical simulations are an indispensable ingredient of all aeronautical research activities, since they allow insights which would be impossible with theory and experiment alone. Simulations thus promise to reduce development risks and to shorten development cycles. It is expected that the next generation of aircraft will be developed and tested completely in the computer ("digital aircraft").

The Center for Computer Applications in AeroSpace Science and Engineering (C²A²S²E), founded by DLR and Airbus with support from the European Union and the federal state of Niedersachsen, is a well-known competence centre for numerical aircraft simulations. Besides its activities on improving the modelling of the physical processes and the numerical algorithms, it operates, as a topical computer centre, the most powerful parallel cluster for aeronautics research in Europe. In particular, the CFD code TAU is developed, maintained and applied in projects by C²A²S²E for its customers in the European aircraft industry.

But the computing resources available at a topical centre are not sufficient to simulate a virtual aircraft with all its multidisciplinary interactions. Therefore C²A²S²E needs access to capability computers at the top level of the European HPC ecosystem, such as JSC's JUGENE and JUROPA. Furthermore, the efficiency and scalability of the simulation software, i.e. the TAU code, have to be improved to allow calculations using tens to hundreds of thousands of cores. The necessary optimization and adaptation of this research code will be supported by supercomputing experts from JSC. The formal basis for the common research activities is the co-operation agreement "Numerical simulations for aeronautical research on supercomputers" signed by Deutsches Zentrum für Luft- und Raumfahrt (DLR) and Forschungszentrum Jülich in January 2010. As a first step, the TAU code has been implemented on JUGENE and JUROPA, and successful benchmarks were performed on up to several thousand cores. In order to intensify this collaboration and to achieve the "digital aircraft", JSC and C²A²S²E will in future work together on:

• Optimization of simulation software for up-to-date computer architectures, e.g. multi-core architectures or hybrid systems using GPUs, FPGAs, etc.
• Development of scalable CFD methods for Petaflop systems
• Evaluation of future trends in computer/cluster architectures
• Efficient usage of HPC resources
• Training of students and young scientists
• Initiation of common research projects in HPC

Large-scale Projects at GCS@FZJ

Two calls for large-scale projects were issued by the Gauss Centre for Supercomputing (GCS) in 2009, one in April and one in July. Projects are classified as "large-scale" if they require more than 5% of the available processor core cycles on each member's high-end system. In the case of the petaflop supercomputer JUGENE in Jülich, this amounts to 24 rack-months for a one-year application period. At its meeting in June, the NIC Peer Review Board – in collaboration with the review boards of HLRS and LRZ – decided to award the status of large-scale project to two projects: one from the field of fluid dynamics (Prof. N. Peters, RWTH Aachen: "Geometrical Properties of Small Scale Turbulence") and one from elementary particle physics (Dr. K. Jansen, DESY Zeuthen: "QCD Simulations with Light, Strange and Charm dynamical Quark Flavours").

In October, four more projects could be awarded that status: again one from the field of fluid dynamics (Dr. J. Harting, Universität Stuttgart: "Lattice Boltzmann simulation of emulsification processes in the presence of particles and turbulence"), and three from elementary particle physics (Prof. Z. Fodor, Universität Wuppertal: "Lattice QCD with 2 plus 1 flavours at the physical mass point"; Prof. G. Schierholz, DESY: "Hadron Physics at Physical Quark Masses on Large Lattices"; Prof. S. Katz, Eötvös University, Budapest, Hungary: "QCD Simulations with Light, Strange and Charm dynamical Quark Flavours").

Altogether there are currently six large-scale scientific projects working on JUGENE, with an allocation of more than 40% of the available CPU cycles.

HLRB and KONWIHR Results and Review Workshop

On December 8 and 9, 2009, the Leibniz Supercomputing Centre (LRZ) hosted the "HLRB and KONWIHR Results and Review Workshop". On the one hand, this workshop acts as a review of the scientific research performed on the HLRB II supercomputer in the years 2008 and 2009; on the other hand, it served as a forum for the exchange of experiences between users of different projects facing similar problems, as well as between users and LRZ's HPC support staff.

The workshop was attended by more than 100 participants, among them scientists and researchers from all over Germany, members of the steering committee, and LRZ's HPC support staff. The workshop featured an extraordinary interdisciplinarity, since the covered research areas included computer science, mechanical engineering (predominantly concerned with fluid-dynamics problems), astrophysics, high-energy physics, geosciences, condensed matter physics (including solid-state as well as plasma physics), chemistry, and bio sciences. During the two days, LRZ could offer 42 projects the opportunity to present their results in talks. The talks were divided into two parallel sessions with 10 talks per day; additionally, on each of the two days a plenary talk was given. Due to the considerable interdisciplinarity of the workshop, the speakers had been asked to prepare their talks in such a manner that even non-specialists could follow them to a large extent. This guideline was followed by the speakers in an outstandingly impressive way, leading not only to interesting talks on a high scientific level, but also to fruitful discussions between participants from the various disciplines. The interdisciplinarity was most evident during the plenary talks, which – following their nature – were attended by all workshop participants, yielding an audience of even broader diversity in disciplines than was faced by the speakers in the parallel sessions.

On the first day of the workshop, Bernd Brügmann (an astrophysicist from the University of Jena) spoke on "Black Holes and Gravitational Waves". On the second day, Sven Schmid (working at NASA in a joint project of NASA and the University of Stuttgart) gave a talk on "Characterization of the Aeroacoustic Properties of the SOFIA Cavity and its Passive Control". Both speakers deserve special thanks for their outstanding talks, as they managed to give a general overview on the one hand, while presenting the challenging problems faced in their projects on the other.

Apart from the talks, LRZ offered the possibility to put up posters, which was taken up by another dozen projects, resulting in active discussions among the workshop participants. The participants submitted seventy contributions for the workshop proceedings. Currently, these contributions are being reviewed by referees; presumably, the proceedings will be published in spring 2010.

Since the end of 2009, LRZ has been actively preparing the procurement of the successor system to the HLRB II supercomputer. As the new system will provide Petascale performance, the workshop also gave its participants the unique opportunity to present new challenging projects and their requirements on future hardware and software. LRZ for its part sketched its present plans for this procurement.

LRZ would like to thank the participants of the workshop for the important and interesting research performed on and justifying the HLRB II supercomputer, for their fruitful and comfortable collaboration with LRZ's HPC support staff, and for their active participation in the workshop.

German-Israeli Co-operation: 24th Umbrella Symposium

JSC hosted the "Umbrella Symposium for the Development of Joint Co-operation Ideas", an annual event run jointly by Technion Haifa, RWTH Aachen University and Forschungszentrum Jülich with changing topics since 1984. This year's symposium took place from January 18 - 20, 2010, during which around 50 scientists attended more than 20 presentations and follow-up discussions on the topic of "Modelling and Simulation in Medicine, Engineering and Sciences". This wide-ranging meeting included contributions on surface science, medical engineering, biophysics, nanoelectronics and virtual reality, but with a common emphasis on numerical modelling.

An important aspect of the symposium is to give attendees the opportunity to build new co-operations in their particular field: a call for Umbrella travel grants for young scientists will be published in spring this year to support a selected number of promising ideas arising from the symposium; eight such potential partnerships were already proposed by the end of the meeting. The next symposium on the same topic will be hosted by RWTH Aachen in early 2011.

Blue Gene Extreme Scaling Workshop 2009

From October 26 - 28, 2009, JSC organized the 2009 edition of its Blue Gene Scaling Workshop. This time, the main focus was on application codes able to scale up during the workshop to the full Blue Gene/P system JUGENE, which consists of 72 racks with a total of 294,912 cores – the highest number of cores available worldwide in a single system.

Interested application teams had to submit short proposals, which were evaluated with respect to the required extreme scaling, the application-related constraints which had to be fulfilled by the JUGENE software infrastructure, and the scientific impact that the codes could produce. Surprisingly, not just a handful of proposals were submitted, as the organizers had expected; instead, ten high-quality applications could be selected, among them two 2009 Gordon Bell Prize finalists. Applications came from Harvard University, Massachusetts Institute of Technology, Rensselaer Polytechnic Institute, Argonne National Laboratory, the University of Edinburgh, the Swiss Institute of Technology (Lausanne), Instituto Superior Técnico (Lisbon), as well as from RZG (MPG), DESY Zeuthen and JSC.

During the workshop, the teams were supported by four JSC parallel application experts, the JUGENE system administrators and one IBM MPI expert; however, the participants shared a lot of expertise and knowledge among themselves, too. Every participating team succeeded in submitting one or more successful full 72-rack jobs during the course of the workshop, executing many smaller jobs along the way to investigate the scaling properties of their applications. A total of 398 jobs were launched, using 135.61 rack-days of the total 169.3 rack-days reserved for the workshop. This is an 80% utilization, which is astonishing, especially considering that the system was only partially available on the third day due to necessary hardware repairs: IBM experts repaired half of the machine at a time while the other half was still used for the workshop.

Many interesting results were achieved. For example, the Gordon Bell team of Rensselaer Polytechnic Institute focused on a strong scaling study of their implicit, unstructured-mesh-based flow solver. An adaptive, unstructured mesh with 5 billion elements was considered in this study to model pulsatile transition to turbulence in an abdominal aortic aneurysm geometry that was acquired from image data of a diseased patient. At full system scale, they obtained an overall parallel efficiency of 95% relative to the 16-rack run, which is considered the base of their strong scaling study.

The Lisbon team investigated OSIRIS, a fully relativistic electrodynamic particle-in-cell code written at IST and UCLA. Their goal was to determine the strong scaling results for two different test cases: the first test case ("warm") simulates the evolution of one single, hot species, while the second case ("collision") simulates two counter-streaming species. They were able to obtain strong scaling results up to 294,912 cores with an efficiency of 81% (warm) and 72% (collision) compared to their baseline of 4,096 CPU cores. They also prepared a set of production runs to simulate, for the first time, a complete astrophysical plasma collision in three dimensions. Due to the limited availability of CPU time, these simulations will be the subject of future work. Finally, they have defined a work plan to integrate JSC's SION library into OSIRIS.

All three QCD teams (MIT, Edinburgh, DESY) reported linear scaling of their codes to the full system.

Finally, JSC's program MP2C, which couples MD and mesoscopic hydrodynamics, was scaled up to nearly the full capacity of JUGENE, i.e. 262,144 cores (64 racks), as the present version requires a power-of-two number of processes (P = 2^n). After some adjustments of the message passing protocols, good scaling behaviour was observed. Particle numbers up to N = 8×10^10 were chosen. In terms of memory requirements, about 10 times more particles could have been used; however, since parallel I/O capabilities were also investigated using SIONlib, benchmarking was limited to this particle number: a single restart configuration already produced about 5 TBytes, for which an I/O performance of 8 GB/s for writing and 16 GB/s for reading was measured.

All experiences gained during the workshop are summarized in the technical report FZJ-JSC-IB-2010-02, available at http://www.fz-juelich.de/jsc/docs/autoren2010/mohr1/. On March 22 - 24, 2010, JSC organized a follow-up Blue Gene Extreme Scaling Workshop; a more detailed report about this event will be published in the next edition of inSiDE.
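The scaling and utilization figures quoted above follow the usual strong-scaling conventions; since the report does not spell them out, the short summary below is an editorial addition based on the standard definition of relative parallel efficiency (fixed problem size, runtime T measured on N cores against a baseline run).

% Relative parallel efficiency with respect to a baseline run on
% N_ref cores with runtime T_ref (strong scaling, fixed problem size):
\[
  E(N) = \frac{N_{\mathrm{ref}}\,T_{\mathrm{ref}}}{N\,T(N)}
\]
% e.g. the quoted 95% compares the 72-rack run of the Rensselaer solver
% with its 16-rack baseline, and the OSIRIS values of 81%/72% compare the
% 294,912-core runs with the 4,096-core baseline. The workshop utilization
% is simply the ratio of consumed to reserved machine time:
\[
  \frac{135.61\ \text{rack-days}}{169.3\ \text{rack-days}} \approx 0.80
\]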

The 12th HLRS-NEC Teraflop Workshop

The 12th Teraflop Workshop, held on March 15 and 16, 2010, at HLRS, continued the Teraflop Workshop series initiated by NEC and HLRS in 2004. Once again, the covered topics spanned an arc from leading-edge operating system development to the needs and results of real-life applications.

The opening talk was given by Ryusuke Egawa of Tohoku University in Sendai, Japan, who gave deeper insight into the possibilities of vector computers in meta-computing environments. The following two talks addressed the SX-Linux project [1]. The first talk in the second session was given by Jochen Buchholz from HLRS, who discussed the aims and first ideas of the TIMACS project. He was followed by Georg Hager from RRZE Erlangen, who showed 13 ways to sugarcoat the fact that an application is not performing well on modern supercomputers.

The three talks given during the next session by Ulrich Rist from the IAG, University of Stuttgart, Hans Hasse from the University of Kaiserslautern and Matthias Meinke from the Aerodynamic Institute of RWTH Aachen targeted real-life applications from CFD and molecular dynamics. The afternoon session was shared by Thomas Soddemann of Fraunhofer SCAI, who raised the question whether GPU computing is affordable supercomputing, and Peter Berg from the IMK in Karlsruhe, who presented the work on ensemble simulations done at his department.

The second day started with a talk given by Wolfgang Ehlers from the Institute for Mechanics, University of Stuttgart. He presented various computational issues which have to be resolved when moving from particle dynamics to multiphase continuous media. The next two talks were given by Philip Rauschenberger from the ITLR, University of Stuttgart, and Arne Nägel from the GCSC, University of Frankfurt, who both gave an overview of the applications at their institutes which are related to large-scale computers. In the next session, Ralf Schneider, HLRS, presented the efforts necessary when dealing with direct mechanical simulations. Another research topic which poses a similar challenge is the coupling of multi-scale, multi-physics applications, which was discussed by Sabine Roller from the German Research School for Simulation Sciences, Aachen.

The workshop closed with a session in which distributed multi-scale simulations for medical physics applications and film cooling investigated by large-eddy simulation were discussed by Jörg Bernsdorf from the German Research School for Simulation Sciences, Aachen, and Lars Gräf from ETH Zürich, respectively.

References
[1] SX-Linux project: https://gforge.hlrs.de/projects/sxlinux/
[2] Teraflop Workbench: http://www.teraflop-workbench.de/

Publications in Computational Science and Engineering

Nagel, W. E.; Kröner, D. B.; Resch, M. M. (Eds.): High Performance Computing in Science and Engineering '09, Transactions of the High Performance Computing Center Stuttgart, 1st Edition, 2010, XII, 551 p., Hardcover, ISBN: 978-3-642-04664-3

This book presents the state of the art in simulation on supercomputers. Leading researchers present results achieved on systems of the HLRS for the year 2009. The reports cover all fields of computational science and engineering, ranging from CFD via computational physics and chemistry to computer science, with a special emphasis on industrially relevant applications. Presenting results for both vector systems and microprocessor-based systems, the book allows comparing the performance levels and usability of various architectures. As HLRS operates the largest NEC SX-8 vector system in the world, this book gives an excellent insight into the potential of vector systems. The book covers the main methods in high performance computing, and its outstanding results in achieving the highest performance for production codes are of particular interest for both scientists and engineers.

Resch, M. M.; Roller, S.; Benkert, K.; Galle, M.; Bez, W.; Kobayashi, H. (Eds.): High Performance Computing on Vector Systems 2009, 2010, XIII, 250 p., Hardcover, ISBN: 978-3-642-03912-6

The book presents the state of the art in high performance computing and simulation on modern supercomputer architectures. It covers trends in hardware and software development in general, and specifically the future of vector-based systems and heterogeneous architectures. The application contributions cover computational fluid dynamics, fluid-structure interaction, physics, chemistry, astrophysics, and climate research. Innovative fields like coupled multi-physics or multi-scale simulations are presented. All papers were chosen from presentations given at the 9th Teraflop Workshop held in November 2008 at Tohoku University, Japan, and the 10th Teraflop Workshop held in April 2009 at HLRS, Germany.

Müller, M. S.; Resch, M. M.; Nagel, W. E. (Eds.): Proceedings of the 3rd International Workshop on Parallel Tools for High Performance Computing, September 2009, ZIH, Dresden, 1st Edition, 2010, 200 p., Hardcover, ISBN: 978-3-642-11260-7, to be published in June 2010

As more and more hardware platforms support parallelism, parallel programming is gaining momentum. Applications can only leverage the performance of multi-core processors or graphics processing units if they are able to split a problem into smaller ones that can be solved in parallel. The proceedings of the 3rd International Workshop on Parallel Tools for High Performance Computing provide a technical overview intended to help engineers, developers and computer scientists decide which tools are best suited to enhancing their current development processes.

HLRS Scientific Workshops and Courses – Related to Fidelio?

[Photos 1 - 16: impressions from the social programme accompanying the courses at the participating centres.]

In many of the previous editions of inSiDE, I reported on past and future workshops and courses, on the new "high-productivity" programming paradigm PGAS (partitioned global address space) and the related UPC and CAF courses – but how does this fit with the pictures on these pages? – and also on "old-style" efficient MPI and OpenMP programming. The courses also cover computational fluid dynamics (CFD), iterative solvers, and modern object-oriented paradigms in the latest (2008) Fortran standard. After more than ten years, former participants now send their own new PhD students – still no relation to the pictures!? – and they come from all over Germany (sometimes also from really far away) and have to stay for a week in a boring/nice hotel. Scientific workshops and conferences have exciting social events, but courses? Do participants really prefer the hotel's wireless LAN or a thriller on TV? Should I cite Prof. Chr. Pfeiffer, who teaches that thrillers in the evening significantly reduce the learning effort of the day, and also that men prefer thrillers? As usual in scientific workshops, we believe that also in scientific courses, after a day of intensive study and practical work in hands-on sessions (photo 1), people want to see more than just their hotel room. We try to give them the opportunity to experience together the cities of the High Performance Computing centres.

A guided walk through the city (photo 2) can produce the necessary hunger for a comfortable dinner. In Jülich, the fortress with its several metres thick walls makes a monumental impression at night, and in Garching, on a warm September evening, the "English Garden" invited several participants and lecturers to relax in a beer garden (photo 3). In Stuttgart, after the city tour, Stuttgart's first micro brewery (photo 4) helps to survive (photo 5) the stressful lectures. Some participants also get to visit the first TV tower in the world (photo 6), with its beautiful view over Stuttgart by day and night. Besides the city tour, a tour through the 3D visualization in the CAVE (photo 7) and the computing rooms (photo 8) is always included, together with a special trip in the Porsche simulator (photo 9).

In Dresden, the tour through the famous historical city (photo 10) includes a group photo in front of the "Dresdner Frauenkirche" (photo 11). This year, based on a special offer of the Semper Opera (photos 12, 13), half of the course enjoyed Beethoven's Fidelio with its critical, political, and by now historical stage setting (photo 14), designed in the former "DDR"; the premiere was on October 7, 1989, just 33 days before the Fall of the Wall. Some of the attendees also visited the famous art and jewels in the treasury "Grünes Gewölbe" (photo 15) after an early arrival on Sunday. A really rare event is the coincidence of a course with Stuttgart's "Volksfest" (fair): on a ride with the largest mobile observation wheel, you can enjoy the view from a height of 60 m (photo 16).

This year, the October course at LRZ starts on the last day of Munich's famous "Oktoberfest" … and on the next day the heavy work with theory and practical hands-on sessions starts again – but more relaxed. ☺

Rolf Rabenseifner

2010 – Workshop Announcements

Scientific Conferences and Workshops at HLRS, 2010
12th Teraflop Workshop (March 15 - 16)
9th HLRS/hww Workshop on Scalable Global Parallel File Systems (April 26 - 28)
4th HLRS Parallel Tools Workshop (July, not yet fixed)
17th EuroMPI 2010 (successor of the EuroPVM/MPI series, September 12 - 15)
High-Performance Computing in Science and Engineering – The 13th Results and Review Workshop of the HPC Center Stuttgart (October 4 - 5)
IDC International HPC User Forum (October 7 - 8)

Parallel Programming Workshops: Training in Parallel Programming and CFD
Parallel Programming and Parallel Tools (TU Dresden, ZIH, February 22 - 25)
Introduction to Computational Fluid Dynamics (HLRS, March 15 - 19)
Iterative Linear Solvers and Parallelization (HLRS, March 22 - 26)
Platforms at HLRS (HLRS, March 29 - 30)
Unified Parallel C (UPC) and Co-Array Fortran (CAF) (HLRS, May 18 - 19)
4th Parallel Tools Workshop (not yet fixed)
Parallel Programming with MPI & OpenMP (TU Hamburg-Harburg, July 21 - 23)
Parallel Programming with MPI & OpenMP (CSCS Manno, CH, August 10 - 12)
Introduction to Computational Fluid Dynamics (HLRS or RWTH Aachen, September 6 - 10)
Message Passing Interface (MPI) for Beginners (HLRS, September 27 - 28)
Shared Memory Parallelization with OpenMP (HLRS, September 29)
Advanced Topics in Parallel Programming (HLRS, September 30 - October 1)
Iterative Linear Solvers and Parallelization (LRZ, Garching, October 4 - 8)
Parallel Programming with MPI & OpenMP (FZ Jülich, JSC, November 29 - December 1)
Unified Parallel C (UPC) and Co-Array Fortran (CAF) (HLRS, December 14 - 15)
Introduction to CUDA Programming (not yet fixed)

Training in Programming Languages at HLRS
Fortran for Scientific Computing (March 1 - 5 and October 25 - 29)

URLs:
http://www.hlrs.de/events/
https://fs.hlrs.de/projects/par/events/2010/parallel_prog_2010/

High Performance Computing Courses and Tutorials

GPGPU Programming
Date & Location: June 14 - 16, 2010, LRZ Building, Garching/Munich
Contents: Heterogeneous GPGPU computing promises tremendous acceleration of applications. This programming workshop includes hands-on sessions, application examples and an introduction to CUDA, CAPS, cuBLAS, cuFFT, the Portland Group Fortran compiler, pycuda, and R.
Webpage: http://www.lrz.de/services/compute/courses/#Gpgpu

Visualization of Large Data Sets on Supercomputers
Date & Location: June 29, 2010, LRZ Building, Garching/Munich
Contents: The results of supercomputing simulations are data sets which have grown considerably over the years, giving rise to a need for parallel visualization software packages. The course focuses on the software packages ParaView, VisIt and Vapor and their use to visualize and analyse large data sets generated by supercomputer applications, typically in the fields of CFD, molecular modelling, astrophysics, quantum chemistry and similar. Hands-on sessions featuring example applications are given.
Webpage: http://www.lrz.de/services/compute/courses/#Datavis

Parallel Programming with R
Date & Location: July 6, 2010, LRZ Building, Garching/Munich
Prerequisites: Participants should have some basic knowledge of programming with R.
Contents: R is known as a very powerful language for statistics, but it has also evolved into a tool for the analysis and visualization of large data sets which are typically obtained from supercomputing applications. The course teaches the use of the dynamic language R for parallel programming of supercomputers and features rapid prototyping of simple simulations. Several parallel programming models, including Rmpi, snow, multicore, and gputools, are presented which exploit the multiple processors that are standard on modern supercomputer architectures. Hands-on sessions with example applications are given.
Webpage: http://www.lrz.de/services/compute/courses/#RPar

Data Mining of XML Data Sets Using R
Date & Location: July 14, 2010, LRZ Building, Garching/Munich
Prerequisites: Participants should have some basic knowledge of programming with R, as well as a basic understanding of XML.
Contents: Large data sets are often stored on the web in XML format. Automated access and analysis of these data sets presents a non-trivial task. The dynamic language R is known as a very powerful language for statistics, but it has also evolved into a tool for the analysis and visualization of large data sets. In this course R will be used to perform and automate these tasks and to visualize the results interactively and on the web. The course includes hands-on sessions.
Webpage: http://www.lrz.de/services/compute/courses/#RXML

Scientific High-Quality Visualization with R
Date & Location: July 20, 2010, LRZ Building, Garching/Munich
Contents: R is a powerful software package that can also easily be used for high-quality visualization of large data sets of various kinds, such as particle data, continuous volume data, etc. The range of different plots comprises line plots, contour plots and surface plots up to interactive 3D OpenGL plotting. The course features hands-on sessions with examples.
Webpage: http://www.lrz.de/services/compute/courses/#RVis

Introduction to Parallel Programming with MPI and OpenMP (JSC Guest Student Programme)
Date & Location: August 3 - 5, 2010, JSC, Research Centre Jülich
Contents: The course provides an introduction to the two most important standards for parallel programming under the distributed and shared memory paradigms: MPI, the Message Passing Interface, and OpenMP. While intended mainly for the JSC Guest Students, the course is open to other interested persons upon request.
Webpage: http://www.fz-juelich.de/jsc/neues/termine/parallele_programmierung

Parallel Programming with MPI, OpenMP and PETSc
Dates & Locations: August 10 - 12, 2010, Manno (CH), CSCS; November 29 - December 1, 2010, JSC, Research Centre Jülich
Contents: The focus is on the programming models MPI, OpenMP, and PETSc. Hands-on sessions (in C/Fortran) will allow users to immediately test and understand the basic constructs of the Message Passing Interface (MPI) and the shared memory directives of OpenMP. These two courses are organized by CSCS and JSC in collaboration with HLRS.
Webpage: http://www.fz-juelich.de/jsc/neues/termine/mpi-openmp

Introduction to Computational Fluid Dynamics
Date & Location: September 6 - 10, 2010, RWTH Aachen
Contents: Numerical methods to solve the equations of fluid dynamics are presented. The main focus is on explicit finite volume schemes for the compressible Euler equations. Hands-on sessions will consolidate the content of the lectures. Participants will learn to implement the algorithms, but also to apply existing software and to interpret the solutions correctly. Methods and problems of parallelization are discussed. This course is based on a lecture and practical awarded with the "Landeslehrpreis Baden-Württemberg 2003" and is organized by the German Research School for Simulation Sciences.
Webpage: http://www.hlrs.de/events/

Message Passing Interface (MPI) for Beginners
Date & Location: September 27 - 28, 2010, Stuttgart, HLRS
Contents: The course gives a full introduction to MPI-1. Further aspects are domain decomposition, load balancing, and debugging. An MPI-2 overview and MPI-2 one-sided communication are also taught. Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the basic constructs of the Message Passing Interface (MPI). Course language is English (if required).
Webpage: http://www.hlrs.de/events/
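As an illustration of the kind of program written in the MPI hands-on sessions mentioned in the courses above, the following minimal C sketch shows the basic constructs (initialization, rank and size queries, point-to-point communication). It is an editorial addition, not taken from any course material, and the file name is only illustrative.

/* Minimal MPI sketch: every rank sends its rank number to rank 0,
 * which reports what it received.
 * Build:  mpicc mpi_hello.c -o mpi_hello
 * Run:    mpirun -np 4 ./mpi_hello
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value, i;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* my process number     */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */

    if (rank == 0) {
        printf("running on %d MPI processes\n", size);
        for (i = 1; i < size; i++) {
            /* receive one integer from any other rank, tag 0 */
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &status);
            printf("rank 0 received %d from rank %d\n",
                   value, status.MPI_SOURCE);
        }
    } else {
        /* send my own rank number to rank 0, tag 0 */
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}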

Shared Memory Parallelization with OpenMP
Date & Location: September 29, 2010, Stuttgart, HLRS
Contents: This course teaches shared memory OpenMP parallelization, the key concept on hyper-threading, dual-core, multi-core, shared memory, and ccNUMA platforms. Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the directives and other interfaces of OpenMP. Tools for performance tuning and debugging are presented. Course language is English (if required).
Webpage: http://www.hlrs.de/events/

Advanced Topics in Parallel Programming
Date & Location: September 30 - October 1, 2010, Stuttgart, HLRS
Contents: Topics are MPI-2 parallel file I/O, hybrid mixed-model MPI+OpenMP parallelization, OpenMP on clusters, parallelization of explicit and implicit solvers and of particle-based applications, parallel numerics and libraries, and parallelization with PETSc. Hands-on sessions are included. Course language is English (if required).
Webpage: http://www.hlrs.de/events/

Iterative Linear Solvers and Parallelization
Date & Location: October 4 - 8, 2010, LRZ Building, Garching/Munich
Prerequisites: Good knowledge of at least one of the programming languages C/C++ or Fortran; basic UNIX skills. Mathematical knowledge of basic linear algebra and analysis is also assumed.
Contents: The focus of this compact course is on iterative and parallel solvers, the parallel programming models MPI and OpenMP, and the parallel middleware PETSc. Different modern Krylov subspace methods (CG, GMRES, BiCGSTAB, ...) as well as highly efficient preconditioning techniques are presented in the context of real-life applications. Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the basic constructs of iterative solvers, the Message Passing Interface (MPI) and the shared memory directives of OpenMP. This course is organized by LRZ, the University of Kassel, HLRS Stuttgart and IAG.
Webpage: https://fs.hlrs.de/projects/par/events/2010/parallel_prog_2010/G.html

Advanced Fortran Topics
Date & Location: October 11 - 15, 2010, LRZ Building, Garching/Munich
Prerequisites: Course participants should have basic UNIX/Linux knowledge (login with secure shell, shell commands, simple scripts, editor vi or emacs). Good knowledge of the Fortran 95 standard is also necessary, such as covered in the previous winter semester course.
Contents: This course is targeted at scientists who wish to extend their knowledge of Fortran beyond what is provided in the Fortran 95 standard. Some other tools relevant for software engineering are also discussed. Topics covered include:
• object oriented features, submodules
• design patterns
• handling of shared libraries
• mixed language programming
• standardized IEEE arithmetic
• I/O extensions from Fortran 2003
• parallel programming with coarrays
• the Eclipse development environment
• source code versioning systems
To consolidate the lecture material, each day's approximately 4 hours of lecture are complemented by 3 hours of hands-on sessions.
Webpage: http://www.lrz.de/services/compute/courses/#TOC16

Parallel Performance Analysis with VAMPIR
Date & Location: October 18, 2010, LRZ Building, Garching/Munich
Prerequisites: Participants are required to have good knowledge of C, C++ or Fortran, as well as of the message passing concepts (MPI) needed for parallel programming.
Contents: Running parallel codes on large-scale systems with thousands of MPI tasks typically requires tuning measures in order to reduce MPI overhead, load balancing problems and serialized execution phases. VAMPIR is a tool that allows users to identify and locate these scalability issues by generating trace files from program runs that can afterwards be analyzed using a GUI.
Webpage: http://www.lrz.de/services/compute/courses/#VampirNG

Parallel I/O and Portable Data Formats
Date & Location: October 25 - 27, 2010, JSC, Research Centre Jülich
Contents: This course will introduce the use of MPI parallel I/O and portable self-describing data formats, such as HDF5 and netCDF. Participants should have experience in parallel programming in general, and in either C/C++ or Fortran in particular.
Webpage: http://www.fz-juelich.de/jsc/neues/termine/parallelio

Fortran for Scientific Computing
Date & Location: October 25 - 29, 2010, Stuttgart, HLRS
Contents: This course, organized by HLRS and the Institute for Computational Physics and taught with lectures and hands-on sessions, is dedicated to scientists and students who want to learn (sequential) programming of scientific applications with Fortran. The course teaches the newest Fortran standards. Hands-on sessions will allow users to immediately test and understand the language constructs.
Webpage: http://www.hlrs.de/events/

Introduction to the Programming and Usage of the Supercomputer Resources in Jülich
Date & Location: November 25 - 26, 2010, JSC, Research Centre Jülich
Contents: This course gives an overview of the supercomputers JUROPA/HPC-FF and JUGENE. Especially new users will learn how to program and use these systems efficiently. Topics discussed are: system architecture, usage model, compilers, tools, monitoring, MPI, OpenMP, performance optimization, mathematical software, and application software.
Webpage: http://www.fz-juelich.de/jsc/neues/termine/supercomputer

Unified Parallel C (UPC) and Co-Array Fortran (CAF)
Date & Location: December 14 - 15, 2010, Stuttgart, HLRS
Contents: Partitioned Global Address Space (PGAS) is a new model for parallel programming. Unified Parallel C (UPC) and Co-Array Fortran (CAF) are PGAS extensions to C and Fortran. PGAS languages allow any processor to directly address memory/data on any other processor. Parallelism can be expressed more easily compared to library-based approaches such as MPI. Hands-on sessions (in UPC and/or CAF) will allow users to test and understand the basic constructs of PGAS languages.
Webpage: http://www.hlrs.de/events/
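To give a flavour of the shared-memory directives covered in the OpenMP-related courses above, here is a minimal C sketch of a work-sharing loop with a reduction clause. It is an editorial addition rather than course material, and the file name is only illustrative.

/* Minimal OpenMP sketch: parallel dot product using a work-sharing
 * loop and a reduction clause.
 * Build: gcc -fopenmp omp_dot.c -o omp_dot
 */
#include <omp.h>
#include <stdio.h>

#define N 1000000

static double a[N], b[N];

int main(void)
{
    double sum = 0.0;
    int i;

    /* initialize the vectors; the loop index is private per thread */
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        a[i] = 0.5 * i;
        b[i] = 2.0;
    }

    /* each thread accumulates a private partial sum; the reduction
       clause combines the partial sums into the shared variable 'sum' */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("dot product = %.1f (max OpenMP threads: %d)\n",
           sum, omp_get_max_threads());
    return 0;
}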

inSiDE is published two times a year by The GAUSS Centre for Supercomputing (HLRS, LRZ, JSC)

Publishers Prof. Dr. H.-G. Hegering | Prof. Dr. Dr. T. Lippert | Prof. Dr. M. M. Resch

Editor & Design: F. Rainer Klank, HLRS, [email protected] | Stefanie Pichlmayer, HLRS, [email protected]

Authors
Christian Alig, [email protected]
Momme Allalen, [email protected]
Martin Breuer, [email protected]
Andreas Burkert, [email protected]
Cezary Czaplewski, [email protected]
Volker Gaibler, [email protected]
Leonhard Gantner, [email protected]
Georg Hager, [email protected]
Ulrich Hansen, [email protected]
Helmut Harder, [email protected]
Dawid Jagiela, [email protected]
Ferdinand Jamitzky, [email protected]
Sarah Jones, [email protected]
Norbert Kalthoff, [email protected]
Martin Krause, [email protected]
Sven Krüger, [email protected]
Adam Liwo, [email protected]
Michael Meier, [email protected]
Stanislaw Ołdziej, [email protected]
Morris Riedel, [email protected]
Ulrich Rist, [email protected]
Notker Rösch, [email protected]
Helmut Satzger, [email protected]
Marc Schartmann, [email protected]
Harold A. Scheraga, [email protected]
Steffen J. Schmidt, [email protected]
Günter H. Schnerr, [email protected]
Bernd Schuller, [email protected]
Juliane Schwendike, [email protected]
Björn Selent, [email protected]
Achim Streit, [email protected]
Matthias Thalhamer, [email protected]
Jan Treibig, [email protected]
Tobias Trümper, [email protected]
Gerhard Wellein, [email protected]

If you would like to receive inSiDE regularly, send an email with your postal address to [email protected] © HLRS 2010