Automatic Performance Engineering Workflows for High Performance Computing

Technische Universität München
Department of Informatics (Institut für Informatik)
Chair of Computer Architecture (Lehrstuhl für Rechnertechnik und Rechnerorganisation)

Automatic Performance Engineering Workflows for High Performance Computing

Ventsislav Petkov

Complete copy of the dissertation approved by the Faculty of Informatics of the Technische Universität München for the award of the academic degree of Doktor der Naturwissenschaften (Dr. rer. nat.).

Chair: Univ.-Prof. Dr. Helmut Krcmar
Examiners of the dissertation:
1. Univ.-Prof. Dr. Hans Michael Gerndt
2. Univ.-Prof. Dr. Felix Gerd Eugen Wolf, Rheinisch-Westfälische Technische Hochschule Aachen

The dissertation was submitted to the Technische Universität München on 25.09.2013 and accepted by the Faculty of Informatics on 03.02.2014.

Acknowledgments

The long journey of successfully completing this dissertation has been an unforgettable experience for me. Looking back, there are a number of people without whom this work might not have been written and whose tremendous support I greatly appreciate.

I would like to express my greatest gratitude to Professor Michael Gerndt for his insightful guidance, valuable support and unmitigated encouragement throughout my doctoral studies. I am thankful to him for giving me complete freedom to follow my research and for constantly providing me with insightful comments and suggestions. Furthermore, I would like to thank him for giving me the great chance to work in such a supportive environment as the Chair of Computer Architecture (LRR) at Technische Universität München (TUM).

I would also like to offer my special thanks to my second supervisor, Professor Felix Wolf, for his time reviewing this dissertation. I am also very grateful to him for giving me the opportunity to be part of the leading Virtual Institute - High Productivity Supercomputing (VI-HPS) and to work within the Performance Dynamics of Massively Parallel Codes (LMAC) project.
Furthermore, I would like to express my gratitude to all my friends and colleagues at LRR-TUM for their valuable support and all the great moments that we spent together over the last years.

Last but not least, I would like to express my deepest appreciation to my mother Ani, my father Valeriy and my brother Martin for their continuous, unwavering support over so many years and for constantly being the driving force behind this dissertation. I would also like to thank my girlfriend Martina for her loving care and affection, which immensely helped me to stay focused on achieving my goal.

Ventsislav Petkov
Munich, Germany
September 2013

Abstract

During the typical performance engineering process, application developers must often complete multiple iterative tasks to get an in-depth runtime profile of their application in order to identify new optimization opportunities and implement an effective tuning strategy. However, the majority of today’s mainstream tools do not support common workflows like analyzing the scalability and stability of parallel applications, studying their dynamic behavior, optimizing them for a particular architecture or even improving their power efficiency to help lower the operational costs of High-Performance Computing (HPC) centers. This, combined with the high complexity of today’s computing systems, makes it extremely difficult for developers to collect, maintain, and organize data from numerous performance engineering experiments and simultaneously track the evolution of their code throughout the overall software development and tuning cycle.

Problems such as long-running iterative processes, diverse data sources and totally different environments, however, also exist in other research areas. Making relevant data-based decisions that lead to improved process efficiency has been the goal of business intelligence (BI) for many years now.
A lot of research has been done in this field in order to accelerate and improve the overall decision-making process and create a smooth and stable working environment. Moreover, many tools have been developed to support the processing of huge data quantities and create model-driven, easier-to-use systems that greatly automate the overall analysis process.

Consequently, this dissertation explores the concept of process automation in the field of parallel performance engineering and proposes a framework to support application developers in structuring and executing the overall tuning process and simultaneously tracking the evolution of the source code that is part of it. It adopts established standards and research ideas from the field of business intelligence and adapts them to the scientific domain of performance analysis and tuning of HPC applications.

Furthermore, this work discusses new extensions to the Periscope performance engineering tool to accommodate such automated workflows and create a flexible, cross-platform environment. As a result, the overall performance analysis and tuning process can be enhanced and thus the productivity of high-performance computing developers can be improved. The evaluation of the thesis uses a set of popular performance engineering workflows, HPC benchmark codes and real-world applications to show the feasibility of the proposed framework.

Summary

During the typical performance engineering process, application developers must often carry out multiple iterative tasks to obtain a complete and detailed runtime profile of their application. This profile can then be used to identify new opportunities for improvement and to develop an optimal optimization strategy. However, the majority of today’s performance engineering tools often offer only limited support for some of the common performance engineering workflows. These include, for example, analyzing the scalability, stability and dynamic behavior of parallel applications, optimizing their runtime and adapting them to a particular architecture, or even improving their energy efficiency to help high-performance computing centers lower their often very high operational costs. This, combined with the high complexity of today’s IT systems, makes it extremely difficult to collect and manage the large volume of data from performance engineering experiments while continuing to develop the code throughout the entire software development and tuning cycle.

Such complex, long-running processes with diverse data sources and entirely different runtime environments, however, also exist in other research areas. For example, the goal of business intelligence (BI) has for many years been the preparation and evaluation of data and information in order to make complex decisions faster, increase the efficiency of processes and provide an optimal and stable working environment. After years of research, many tools have been developed to improve the processing of very large amounts of data and to support the model-based development of systems and components that automate the entire analysis process.

Building on this, this dissertation investigates the concept of automating the performance engineering process in the field of high-performance computing (HPC). It proposes a framework to support application developers in structuring and carrying out the entire performance analysis and tuning cycle. At the same time, the evolution of the source code that is part of the performance engineering process is tracked implicitly. The framework integrates established standards and research ideas from the field of business intelligence and adapts them to the scientific domain of parallel performance analysis and tuning of HPC applications.

In addition, this work discusses new extensions to the Periscope performance engineering tool to support automated workflows and thus provide a flexible, architecture-independent environment. As a result, a large part of the entire performance analysis and tuning process can be automated, which also significantly improves the productivity of developers in the field of high-performance computing. The evaluation of this work is based on a set of common performance engineering workflows, established HPC benchmark codes and other highly parallel applications to demonstrate the feasibility of the proposed framework.

Contents

Acknowledgements
Abstract
List of Figures
List of Tables

1. Introduction
1.1. Motivation and Problem Statement
1.2. Performance Analysis and Tuning Methodology
1.3. Process Automation and Standardization
1.4. Contributions of This Work
1.5. Outline of This Work

I. Theoretical Background and Technological Overview

2. Software Development Life-Cycle
2.1. Software Requirements Engineering
2.2. Software Design
2.3. Software Construction
2.4. Software Testing
2.5. Software Maintenance

3. Process Automation and Design of Workflows
3.1. Foundations of Process Automation
3.2. Process Automation Languages and Standards
3.3. Business Process Management Suites
3.4. Scientific Workflow Automation Tools

4. Supportive Software Development Tools
4.1. Revision Control Systems
4.2. Client-Server Repository Model
4.3. Distributed Repository Model

5. Related Work
