LRZ-Newsletter Nr. 07/2020 of July 2nd, 2020

Our Topics:

News
Award-Winning Supercomputing Research
Supercomputing Technology under Test
The Hope for Quantum Computers
Learning Library for Three-Dimensional Animal Models
Poll Data per App
Figures of the Month

Workshops and Events
Deep Learning and Programming with OpenACC
Iterative Linear Solvers and Parallelization
Deep Learning and Programming
Fortran for Advanced Students
C++ for Software Development

Job Opportunities
More to Read
Information about the LRZ-Newsletter
Imprint

News

Award-Winning Supercomputing Research

The more than 6,480 compute nodes of the SuperMUC-NG contain around 15 million sensors that collect a wide range of data from the system. "In preparation for exascale times, high-performance computing systems are becoming increasingly complex," explains Alessio Netti, computer scientist at the Leibniz Supercomputing Centre (LRZ) in Garching. "For these systems to run stably, become more controllable and, above all, consume significantly less energy, we need more knowledge and thus more data." At the end of June, two projects that deal with the operating data of high-performance computers received awards: the jury of the ACM HPDC 2020 conference honoured the LRZ tool Wintermute in Stockholm as one of the most innovative analysis methods for High Performance Computing (HPC), and at ISC 2020 in Frankfurt a research team led by Amir Raoofy from the Technical University of Munich (TUM) won the Hans Meuer Award for their work on "Time Series Mining at Petascale Performance".

Collecting and Evaluating the Right Data

Sensors already provide all kinds of information from supercomputers, for example on temperature, power, load and stress on components. The open-source software Data Center Data Base (DCDB), which collects data from millions of sensors and thus enables the control of SuperMUC-NG and CoolMUC-3, has already been developed at the LRZ. In order to monitor and operate these systems efficiently, an analysis tool is needed, but above all a systematic approach for evaluating these data. With Wintermute, Netti presented a generic model at the digital edition of the HPDC conference and thus a basis for Operational Data Analytics (ODA). It is intended to provide as comprehensive a picture of supercomputers as possible and to enable forecasts and adjustments to the technology. To this end, Wintermute processes information generated in components (in-band data) or sent by them (out-of-band data), either continuously in a streaming fashion (online processing) or only when explicitly required (on-demand processing). Using three case studies carried out on CoolMUC-3, the LRZ computer scientist shows which monitoring data can be used, for example, to detect anomalies in individual compute nodes so that they can be exchanged or optimised. Energy consumption can also be tracked and adjusted using Wintermute and selected data. The open-source tool also shows where the technology is causing bottlenecks in simulation and modelling. "Wintermute uses machine learning methods to make Operational Data Analytics more meaningful and thus more powerful," says Netti. "The tool was designed to be integrated into any existing monitoring system." The name actually alludes to this: Wintermute is an artificial intelligence in a science fiction trilogy by William Gibson that merges with another AI and becomes a better digital life form.
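To illustrate the kind of online analysis an Operational Data Analytics framework performs on streaming sensor data, here is a minimal sketch in Python. The rolling z-score heuristic, the window size, the threshold and the power readings are purely illustrative assumptions and are not taken from Wintermute or DCDB.

```python
from collections import deque
from statistics import mean, stdev

class RollingAnomalyDetector:
    """Flag sensor readings that deviate strongly from a rolling window.

    Purely illustrative: Wintermute's actual operators, models and
    thresholds are described in the HPDC 2020 paper, not reproduced here.
    """

    def __init__(self, window=60, threshold=3.0):
        self.window = deque(maxlen=window)  # most recent readings (online processing)
        self.threshold = threshold          # z-score above which a reading is flagged

    def update(self, value):
        """Consume one reading from the stream; return True if it looks anomalous."""
        anomalous = False
        if len(self.window) >= 10:          # wait until the window carries some history
            mu, sigma = mean(self.window), stdev(self.window)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True
        self.window.append(value)
        return anomalous

# Hypothetical usage: one detector per node-level power sensor.
detector = RollingAnomalyDetector(window=60, threshold=3.0)
readings = [210.0 + (i % 5) for i in range(50)] + [600.0]  # watts; the last value is a spike
flags = [detector.update(r) for r in readings]
print("anomaly detected at sample", flags.index(True))
```

A production system would additionally persist such flags, correlate several sensors and feed the results back to operators, which is the loop that Operational Data Analytics aims to close.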
The findings from Wintermute can help to improve the computer systems of the future.

A Matrix for Sorting Data

Amir Raoofy, PhD candidate at the Chair for Computer Architecture and Parallel Systems of Professor Martin Schulz at the Technical University of Munich (TUM), also works with data supplied over weeks or even years by thousands of sensors in supercomputers or in the monitoring systems of power plants. However, he is interested in how SuperMUC-NG and CoolMUC-3 handle these huge amounts of data. "Using matrix profile algorithms, time series can be searched for patterns and similarities," says Raoofy, outlining the problem. "But they are difficult to scale and are not suitable for HPC systems." Yet the evaluation of large time series requires supercomputing: anyone who wants to know under which conditions a gas turbine will run reliably and when the first components will need repair has to examine a lot of data. Only the computing power of supercomputers, combined with scalable algorithms, makes such analyses possible. Raoofy and his colleagues developed the now award-winning scalable approach (MP)N. It runs efficiently on up to 256,000 compute cores, around 86 percent of the computing resources of the SuperMUC-NG, and that it delivers exact results was verified with performance data from the SuperMUC-NG. The algorithm is currently being used to analyse data supplied by two gas turbines belonging to Stadtwerke München, Munich's municipal utilities. TurbO is the name of this project, which is funded by the Bavarian Research Foundation. "In our experiments, we performed the fastest and largest multidimensional matrix profile calculation ever," reports Raoofy. "We achieved 1.3 petaflops." This means that supercomputers like the SuperMUC-NG can quickly and efficiently evaluate data from long time series, and science and technology will know how to use this. (vs)
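The idea behind a matrix profile can be sketched in a few lines: for every window of a time series, record the distance to the most similar non-overlapping window elsewhere in the series; unusually high values mark anomalies, low values mark recurring patterns. The brute-force, single-core NumPy sketch below only illustrates this core idea and makes no attempt to reproduce (MP)N, which distributes the computation across hundreds of thousands of cores and handles multidimensional series; the signal, window length and exclusion-zone rule are illustrative assumptions.

```python
import numpy as np

def matrix_profile_naive(ts, m):
    """Brute-force matrix profile of a 1-D time series.

    For every subsequence of length m, store the z-normalised Euclidean
    distance to its nearest non-trivial match elsewhere in the series.
    Illustrative only: O(n^2 * m) work, nothing like the scalable (MP)N scheme.
    """
    n = len(ts) - m + 1
    # z-normalise every subsequence so that matches are based on shape, not level
    subs = np.array([ts[i:i + m] for i in range(n)])
    subs = (subs - subs.mean(axis=1, keepdims=True)) / (subs.std(axis=1, keepdims=True) + 1e-12)

    profile = np.full(n, np.inf)
    excl = m // 2                          # exclusion zone around trivial self-matches
    for i in range(n):
        dists = np.linalg.norm(subs - subs[i], axis=1)
        lo, hi = max(0, i - excl), min(n, i + excl + 1)
        dists[lo:hi] = np.inf              # ignore overlapping neighbours
        profile[i] = dists.min()
    return profile

# Hypothetical usage: a noisy sine wave with one injected anomaly (a "discord").
t = np.linspace(0, 20 * np.pi, 1000)
signal = np.sin(t) + 0.05 * np.random.randn(1000)
signal[600:630] += 2.0                     # the unusual segment
mp = matrix_profile_naive(signal, m=50)
print("most unusual window starts near index", int(np.argmax(mp)))
```

Scaling this quadratic computation to petascale, across many sensor dimensions at once, is precisely the contribution of the award-winning work.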
Back to Content

Supercomputing Technology under Test

The Leibniz Supercomputing Centre (LRZ) will extend its testbed program with an HPE Cray CS500 system featuring Fujitsu A64FX processors based on the Arm architecture. Integrating the technology in early autumn, the institute of the Bavarian Academy of Sciences and Humanities will be the first academic computing centre in the European Union to offer this innovative hardware for its users to explore. "As a world-class academic supercomputing centre, it is one of our core tasks to explore new and diverse architectures within our Future Computing Program. Together with our core partners, we look forward to exploring the capabilities of this technology and the benefit it brings to our users, especially in the area of HPC and AI," says Dieter Kranzlmueller, Director of the LRZ.

HPE's Cray CS500 system is based on the same Fujitsu A64FX processor used in the recently installed Japanese high-performance computer Fugaku, currently #1 on the global Top500 list, and is a central asset for the LRZ testbed program and its system known as BEAST (Bavarian Energy, Architecture and Software Testbed). The Arm-based architecture of the Fujitsu processors, integrated by Hewlett Packard Enterprise (HPE), will offer unique pathways for traditional modelling and simulation tasks while being equally suitable for data analytics, machine learning and artificial intelligence workloads. It will be available to core academic partners and select projects as well as next-generation HPC practitioners. HPE's Cray CS500 system features the Cray Programming Environment, a fully integrated software suite that maximizes programmer productivity and application performance.

"HPC plays an increasingly critical role in meeting key societal and economic challenges in areas like medicine, climate change, and risk management. We are committed to advancing HPC technologies by developing diverse architectures to support any workload need," says Bill Mannel, Vice President and General Manager for HPC at Hewlett Packard Enterprise. "We look forward to working with the Leibniz Supercomputing Centre to explore how alternative architectures can enable new HPC applications to bolster research and innovation, while improving performance and efficiency."

Exploring Diverse Architectures for Future Computing

Each of the Fujitsu A64FX Arm-based processors will be equipped with 32 GB of second-generation high-bandwidth memory (HBM2). The servers are connected by an EDR InfiniBand network. The system comes with an open-source (GCC/LLVM) software stack as well as the Cray Programming Environment supporting the vectorised processor units. HPE's Cray CS500 offers LRZ users and HPC specialists opportunities to evaluate the processors' performance on real-world applications and in comparative studies against both regular CPUs and GPUs. As a pioneering centre for energy-efficient HPC systems and data centre infrastructure, LRZ is also highly interested in evaluating the performance per watt the system can deliver. "The Scalable Vector Extension (SVE) architecture, the high memory bandwidth with HBM2, and the Cray Compiler Environment and software stack are a few of the things we are excited to explore and better understand in support of our users and their scientific work," says Josef Weidendorfer, team lead Future Computing at LRZ. (lp)

Back to Content

The Hope for Quantum Computers

The economic stimulus package with which the German government intends to reactivate the economy includes around two billion euros for the construction and development of two quantum computers. Scientists are counting on the better computing capabilities they promise for data-intensive research: quantum technology electrifies researchers, politics and industry. The Leibniz Supercomputing Centre (LRZ) has already been working on this topic for some time, and this is now paying off in terms of research projects and international contacts.