Université de Neuchâtel
Faculté des Sciences
Institut d’Informatique

Approaches for Cloudification of Complex High Performance Simulation Systems

par

Andrei Lapin

Thèse

présentée à la Faculté des Sciences pour l’obtention du grade de Docteur ès Sciences

Acceptée sur proposition du jury:

Prof. Peter Kropf, Directeur de thèse Université de Neuchâtel, Suisse

Prof. Pascal Felber Université de Neuchâtel, Suisse

Prof. Jesus Carretero University Carlos III of Madrid, Spain

Prof. Pierre Kuonen Haute école d’ingénierie et d’architecture de Fribourg, Suisse

Prof. Marcelo Pasin Haute école Arc, Neuchâtel, Suisse

Prof. Philip Brunner Université de Neuchâtel, Suisse

Soutenue le 24 octobre 2017


IMPRIMATUR POUR THESE DE DOCTORAT

La Faculté des sciences de l'Université de Neuchâtel autorise l'impression de la présente thèse soutenue par

Monsieur Andrei LAPIN

Titre:

“Approaches for Cloudification of Complex High Performance Simulation Systems”

sur le rapport des membres du jury composé comme suit:

– Prof. Peter Kropf, directeur de thèse, UniNE
– Prof. Pascal Felber, co-directeur de thèse ad interim, UniNE
– Prof. Philip Brunner, UniNE
– Prof. Pierre Kuonen, HEIA-FR, Fribourg, Suisse
– Prof. Jesus Carretero, U. Carlos III, Madrid, Espagne
– Prof. Marcelo Pasin, HE-Arc, Neuchâtel

Neuchâtel, le 17 novembre 2017

Le Doyen, Prof. R. Bshary


ACKNOWLEDGMENTS

I would like to thank my advisor, Prof. Peter Kropf, for his precious guidance and continuous support throughout the course of my PhD. I gained many useful skills and discovered multiple interesting research directions thanks to him.

I would like to thank Prof. Pascal Felber for being part of my jury. I would especially like to thank him for his remarkable help and support during the last months of my stay at the university.

I would like to thank Prof. Jesus Carretero for being part of my jury. With Prof. Jesus Carretero and his research group from the Polytechnical School of University Carlos III of Madrid we had a long-lasting research collaboration, which eventually became a very important part of my scientific work.

I would like to thank Prof. Philip Brunner for being part of my jury. Prof. Philip Brunner provided a wonderful hydrological use case, which I used as a benchmark for my scientific work. Also, I would like to thank Dr. Oliver Schilling and Dr. Wolfgang Kurtz, with whom I worked side-by-side on the provided hydrological problem.

I would like to thank Prof. Marcelo Pasin for being part of my jury. I would especially like to thank him for his detailed feedback and his practical suggestions on how to improve my work. I would also like to thank Prof. Pierre Kuonen for being part of my jury.

My sincere thanks also go to the Department of Computer Science of the University of Neuchâtel for the excellent working environment and very friendly atmosphere. A special thought for my colleague and office mate, Dr. Eryk Schiller, for always being helpful and motivated to collaborate. Also, I would like to thank the COST Action IC1305 “Network for Sustainable Ultrascale Computing Platforms” (NESUS) for the support and wonderful research opportunities.

Finally, I would like to thank my dear wife, Nadiia, for her never-ending patience, help, and support. She was always there when I needed to find the final bits of motivation to finish this challenging journey. And a special big thank you for our son Daniel!

RÉSUMÉ

Le calcul scientifique est souvent associé à un besoin de ressources toujours croissant pour réaliser des expériences, des simulations et obtenir des résultats dans un temps raisonnable. Même si une infrastructure locale peut offrir de bonnes performances, sa limite est souvent atteinte par les chercheurs. Pour subvenir à ces besoins en constante augmentation, une solution consiste à déléguer une partie de ces tâches à un environnement en nuage. Dans cette thèse, nous nous intéresserons au problème de la migration vers des environnements en nuage d’applications scientifiques basées sur le standard MPI. En particulier, nous nous concentrerons sur les simulateurs scientifiques qui implémentent la méthode itérative Monte Carlo. Pour résoudre le problème identifié, nous (a) donnerons un aperçu des domaines du calcul en nuage et du calcul à haute performance, (b) analyserons les types de problèmes actuels liés à la simulation, (c) présenterons un prototype de simulateur Monte Carlo, (d) présenterons deux méthodes de cloudification, (e) appliquerons ces méthodes au simulateur Monte Carlo, et (f) évaluerons l’application de ces méthodes à un exemple d’utilisation réelle.

Mots-clés: systèmes distribués, calcul à haute performance, calcul en nuage, workflows scientifiques, big data, architecture orientée services, Apache Spark, réseau maillé sans fil.

ABSTRACT

Scientific computing is often associated with an ever-increasing need for computing resources to conduct experiments and simulations and to obtain results in a reasonable time frame. While local infrastructures can hold substantial computing power and capabilities, researchers may still reach the limit of the available resources. With the continuously increasing need for higher computing power, one of the solutions could be to offload certain resource-intensive applications to a cloud environment with resources available on demand. In this thesis, we will address the problem of migrating MPI-based scientific applications to clouds. Specifically, we will concentrate on scientific simulators that implement the iterative Monte Carlo method. To tackle the identified problem, we will (a) overview the high performance and cloud computing domains, (b) analyze existing simulation problem types, (c) introduce an example Monte Carlo simulator, (d) present two cloudification methodologies, (e) apply the methodologies to the example simulator, and (f) evaluate the potential application of the methodologies in a real case study.

Keywords: distributed systems, high performance computing (HPC), cloud computing, scientific workflows, big data (BD), service-oriented architecture (SOA), Apache Spark, wireless mesh network (WMN).

Contents

1 Introduction
  1.1 Motivation
  1.2 Organization of the Thesis
  1.3 Contributions

2 State of the art
  2.1 High performance computing (HPC)
    2.1.1 HPC architectures
    2.1.2 HPC programming models
    2.1.3 Current trends and challenges
  2.2 Cloud computing
    2.2.1 Cloud computing models
    2.2.2 Enabling technologies behind cloud computing
    2.2.3 Service oriented approach to resources
    2.2.4 Current challenges
  2.3 Cloud computing for scientific applications
  2.4 Summary

3 Modern high performance simulators
  3.1 Introduction
  3.2 Problem definition
  3.3 Towards the paradigm shift
    3.3.1 Analysis of the key features of execution environments
    3.3.2 Classification of the simulation problem types
    3.3.3 Common aspects of modern simulators
  3.4 Case study: hydrological ensemble Kalman filter simulator (EnKF-HGS)
    3.4.1 Simulator description
    3.4.2 Implementation and execution model
  3.5 Summary

4 Big data inspired cloudification methodology
  4.1 Introduction
  4.2 State of the art
    4.2.1 Data analytics tools
    4.2.2 HPC and big data paradigms convergence
  4.3 Cloudification methodology
    4.3.1 Description of the original methodology
    4.3.2 Enhancing the methodology: cloudification of complex iterative workflows
  4.4 Enhanced methodology application to EnKF-HGS
    4.4.1 Simulator analysis
    4.4.2 Cloudification procedure
    4.4.3 Evaluation
  4.5 Summary

5 Service-oriented cloudification methodology
  5.1 Introduction
  5.2 State of the Art
  5.3 Cloudification methodology for Monte Carlo simulations
    5.3.1 Methodology description
    5.3.2 CLAUDE-building platform
  5.4 Application to EnKF-HGS
    5.4.1 Simulator analysis
    5.4.2 Cloudification procedure
    5.4.3 Evaluation
  5.5 Summary

6 Case study: hydrological real-time data acquisition and management in the clouds
  6.1 Introduction
  6.2 Conceptual framework for cloud-based hydrological modeling and data assimilation
    6.2.1 Data acquisition through wireless mesh networks
    6.2.2 Data assimilation with EnKF-HGS
  6.3 Summary

7 Conclusion
  7.1 Summary of contributions
  7.2 Future directions

A List of publications

References

Chapter 1 Introduction

1.1 Motivation

Since the 1970s, the high performance computing (HPC) market has offered researchers and scientists solutions that delivered the top performance, parallelization, and scalability needed for high quality scientific experiments. Over time, building a scientific application for HPC environments became more of a tradition than a necessity. Hence, today a considerable number of scientific applications is still tied to traditional HPC infrastructures. According to the TOP500 overview of IDC's latest forecast 2016-2020 for the HPC market [63], the demand for HPC continues to grow, with a compound annual growth rate of 8% from 2015 to 2019. However, there is still an ever-growing demand for even more computing power. One of the ways to satisfy this demand is to offload applications to cloud computing environments.

Despite the clear advantages and benefits of cloud computing (i.e., fault tolerance, scalability, reliability, and a more elastic infrastructure), it is not easy to abandon existing traditions. The migration of scientific applications to clouds is substantially impeded by difficulties in incorporating new technologies, significant architectural differences, software and hardware incompatibilities, and reliance on external libraries. Moreover, scientific applications are mostly tightly coupled to a specific platform or specific programs. Taken together, scientific applications face considerable challenges to evolve in accordance with the eight laws of software evolution proposed by Prof. Manny Lehman [102]. Prof. Lehman noted that the evolution of a software system is not an easy task: software must continuously adapt to a changing operational environment; otherwise, it becomes less appealing and less satisfactory to its users. Hence, this thesis is directed at finding a solution that will facilitate the adaptation of scientific applications to emerging new technologies, i.e., cloud computing.

Specifically, in this thesis we concentrate on scientific applications that simulate real-world processes by using a computer model as one of the key mechanisms to study the behavior of the processes in question. Such simulations are typically performed by means of specialized software, which shows the operation of the process over time. As simulations become more and more complex, the amount of required computing power notably increases. That is, in particular, the case of the Monte Carlo simulation, whose quality depends proportionally on the number of repeated samplings: the higher the number of sampling procedure invocations, the better the simulation quality. Because Monte Carlo sampling procedures are commonly compute-intensive, the demand for computing power is continuously increasing. In particular, there may be an issue of limited computing resources for Monte Carlo simulations that rely on HPC-based infrastructure, i.e., clusters. The higher the demand for high quality simulations, the more likely a researcher is to hit the limits of computing power and overall resources of the underlying infrastructure.
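To make the relationship between sampling effort and simulation quality concrete, the following toy sketch in Python (unrelated to the hydrological simulator studied later in this thesis) estimates π by repeated random sampling; the statistical error shrinks roughly as 1/√N, so each additional digit of accuracy multiplies the required computing effort by roughly one hundred.

    import random

    def estimate_pi(num_samples):
        """Monte Carlo estimate of pi: the fraction of random points in the
        unit square that fall inside the quarter circle, times 4."""
        inside = 0
        for _ in range(num_samples):
            x, y = random.random(), random.random()
            if x * x + y * y <= 1.0:
                inside += 1
        return 4.0 * inside / num_samples

    # The error decreases roughly as 1/sqrt(N), hence the ever-growing demand
    # for computing power in large Monte Carlo simulations.
    for n in (1_000, 100_000, 10_000_000):
        print(f"{n:>10} samples -> pi ~ {estimate_pi(n):.5f}")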

Therefore, in this thesis we will tackle the problem of the limited computing power of HPC-based infrastructures, i.e., clusters, for Monte Carlo simulations. To approach this problem, we propose to analyze the differences between computing environments, i.e., HPC and cloud computing, define common types of scientific problems, explain why we focus on the Monte Carlo simulation type, elaborate the solution, i.e., cloudification strategies, apply the solution to a representative simulator, and evaluate the final results.

1.2 Organization of the Thesis

The crucial issues addressed in the thesis are (a) the state of the art of HPC and cloud computing; (b) current trends and challenges of modern high performance simulators; (c) potential benefits of the big data cloudification methodology for scientific applications; (d) potential benefits of the service-oriented cloudification methodology for scientific applications; (e) the comparison of the two methodologies and their practical application. The overall structure of the thesis takes the form of seven chapters, including the introduction and conclusion.

Chapter Two begins by laying out the theoretical dimensions of the research. In this chapter, we cover both high performance and cloud computing, their current trends, and challenges. At the end of the chapter, we analyze the potential application of cloud computing to scientific applications.

Chapter Three presents the current state of modern high performance simulations, describes a number of sophisticated tools directed at the execution of scientific workflows, and concentrates on a standalone scientific application, i.e., a simulator. In this chapter, we cover the problem of adapting and migrating legacy scientific simulators to clouds and study a representative scientific simulator.

Chapter Four analyzes the existing big data cloudification methodology, investigates its limitations, and shows how to enhance and extend it to a wider range of applications that cannot be considered suitable following the initial cloudification procedure. At the end of the chapter, we evaluate the application of the enhanced cloudification methodology to the scientific simulator defined in Chapter 3.

Chapter Five is concerned with the service-oriented cloudification methodology directed at solving the performance problem of complex iterative scientific applications. To prove the viability of this methodology, we evaluate its application to the defined scientific simulator.

Chapter Six presents the comparison of the two cloudification solutions and illustrates their practical application in a conceptual framework for cloud-based hydrological modeling and data assimilation.

Chapter Seven gives a brief summary of the thesis and a critique of the findings. Based on the results obtained, we identify areas for further research.

1.3 Contributions

Contribution 1 - enhancing the “big data inspired cloudification methodology”

In collaboration with the group of Prof. Jesus Carretero from the Polytechnical School of University Carlos III of Madrid, partially under the COST Action IC1305 “Network for Sustainable Ultrascale Computing Platforms” (NESUS), we adapted one of the state of the art simulators from the domain of hydrology to the Apache Spark framework and demonstrated the viability and benefits of the chosen approach. As a result, we obtained an accepted conference publication:

Caíno-Lores, S.; Lapin, A.; Kropf, P.; Carretero, J., “Cloudification of a Legacy Hydrological Simulator using Apache Spark”, XXVII Jornadas de Paralelismo (JP2016), Salamanca, Spain, September, 2016.

We generalized the cloudification procedure and obtained a cloudification methodology for scientific iterative workflows using the MapReduce paradigm. This work resulted in an accepted conference publication:

Caíno-Lores, S.; Lapin, A.; Kropf, P.; Carretero, J., “Methodological Approach to Data-Centric Cloudification of Scientific Iterative Workflows”, 16th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP), Pages 469-482, Granada, Spain, December, 2016.

We identified potential problems and possible improvements of our implementation. Also, we performed extensive testing of the methodology, which resulted in an accepted workshop publication:

Caíno-Lores, S.; Lapin, A.; Kropf, P.; Carretero, J., “Lessons Learned from Applying Big Data Paradigms to a Large Scale Scientific Workflow”, 11th Workshop on Workflows in Support of Large-Scale Science (WORKS16), Salt Lake City, Utah, USA, November, 2016.

We studied the scalability and efficiency of the cloudified application using large and very large pools of computing resources. As a result, we submitted a journal article:

Caíno-Lores, S.; Lapin, A.; Kropf, P.; Carretero, J., “Applying Big Data Paradigms to a Large Scale Scientific Workflow: Lessons Learned and Future Directions”, Future Generation Computer Systems, submitted in December 2016.

Contribution 2 - service-oriented cloudification methodology and computing-oriented cloud services

In order to apply the service-oriented cloudification methodology to an iterative Monte Carlo simulator, there are a number of additional requirements that must be satisfied, e.g., access to external services and a mechanism to control computing resource utilization. To satisfy these requirements, we developed a typical cloud-based computational service at the Software as a Service (SaaS) level. The service allows executing a non-interactive scientific application in a cloud environment and requires very few changes to the original application. The work resulted in two accepted publications:

Lapin, A.; Schiller, E.; Kropf, P., “Integrated Cloud-based Computational Services”, Proceedings of the 7th GI Workshop in Autonomous Systems (AutoSys), Pages 280-292, 2014.

Lapin, A.; Schiller, E.; Kropf, P., “CLAUDE: Cloud-computing for non-interactive long-running computationally intensive scientific applications”, Proceedings of the 8th GI Conference in Autonomous Systems (AutoSys), Pages 221-232, 2015.

Contribution 3 - practical application of the cloudification methodologies and a conceptual framework for hydrological data acquisition and management

In collaboration with the Centre for Hydrogeology and Geothermics (CHYN) of the University of Neuchâtel, we analyzed the requirements of a typical hydrological scientific application and developed a prototype of an environmental monitoring system that allows hydrologists to automatically collect and store real-time sensor data in cloud-based storage. This work resulted in an accepted conference publication:

Schiller, E.; Kropf, P.; Lapin, A.; Brunner, P.; Schilling, O.; Hunkeler, D., “Wireless Mesh Networks and Cloud Computing for Real Time Environmental Simulations”, 10th International Conference on Computing and Information Technology (IC2IT2014), Pages 1-11, 2014.

As a follow-up, we developed a prototype of a system that comprises an environmental monitoring subsystem and a cloud-based computational subsystem, which resulted in an accepted conference publication:

Lapin, A.; Schiller, E.; Kropf, P.; Schilling, O.; Brunner, P.; Kapic, A.J.; Braun, T.; Maffioletti, S., “Real-Time Environmental Monitoring for Cloud-based Hydrogeological Modeling with HydroGeoSphere”, Proceedings of the High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC, CSS, ICESS), Pages 959-965, August, 2014.

Then, in collaboration with various universities and institutions:

1. Institute of Bio- and Geosciences (IBG-3): Agrosphere, Forschungszentrum Jülich GmbH;
2. Centre for High Performance Scientific Computing in Terrestrial Systems (HPSC-TerrSys), Geoverbund ABC/J;
3. Centre for Hydrogeology and Geothermics (CHYN) of University of Neuchâtel;
4. Communication and Distributed Systems (CDS) group of University of Bern;
5. Department of Earth and Environmental Sciences of University of Waterloo;
6. Aquanty Inc.

we developed a conceptual framework, which aims to provide the user with (i) easy-to-use real-time access to measurement data from the field, and (ii) dynamic stochastic simulations, which are continuously improved by using the data assimilation approach. This work resulted in a journal publication:

Kurtz, W.; Lapin, A.; Schilling, O.; Tang, Q.; Schiller, E.; Braun, T.; Hunkeler, D.; Vereecken, H.; Sudicky, E.; Kropf, P.; Hendricks Franssen, H.; Brunner, P., “Integrating hydrological modelling, data assimilation and cloud computing for real-time management of water resources”, Environmental Modelling & Software, Volume 93, Pages 418-435, 2017.

Chapter 2 State of the art

This chapter provides a detailed state of the art in both high performance and cloud computing. We cover the current trends and challenges in both fields and evaluate the potential of applying cloud computing to scientific applications.

2.1 High performance computing (HPC)

High performance computing (HPC) refers to computing with the high power provided by a single computer with several processors, or by a cluster of computers, directed at solving scientific or engineering problems that require higher performance than regular computers can provide. Broadly speaking, HPC is associated with the following concepts:

Supercomputing represents systems primarily designed for massively parallel processing with low latency, high bandwidth connections between a large number of co-located processors, typically used for long executions of an algorithm [156]. The term “supercomputer” originated in the 1960s and describes a computer with an elevated level of computing performance compared to a general purpose computer. The performance of a supercomputer is measured in floating-point operations per second (FLOPS). It is important to emphasize that the performance of a computer is a complicated issue to measure because of its many distinctive characteristics, e.g., the computer architecture, hardware, operating system, the compiler's ability to optimize the code, the application, implementation, algorithm, and high level language. Since 1986, TOP500 evaluates the performance of supercomputers by using the LINPACK Benchmark and publishes its ranking every year [3]. The LINPACK Benchmark is used not to evaluate the overall performance of a given machine but to reflect the performance of a dedicated system for solving a dense system of linear equations. The theoretical peak performance is determined by counting the number of floating-point additions and multiplications that can be completed in a period of time, usually the cycle time of the CPU [53]; a small worked example of this calculation is given after this list. According to the latest TOP500 list from June 2017, the aggregate performance of all supercomputers in the list is 749 petaflops.

Cluster computing refers to a type of parallel or distributed processing system consisting of a set of interconnected standalone computers operating as a single integrated computing unit [23]. In its basic form, a cluster consists of two or more computers or systems (also known as nodes), which work together to execute applications or perform tasks. A high bandwidth, low latency interconnection of nodes is critical to cluster computing performance. Cluster computing is popular among researchers due to the possible increase of scalability, resource management, availability, or processing to a supercomputing level for an affordable price.

Grid computing as a concept was established by Carl Kesselman, Ian Foster, and Steve Tuecke in the 1990s. According to Foster et al. [65], grids could be defined as “a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high end computational capabilities.” The key concept behind grids is to establish resource-sharing arrangements among providers and consumers, and then to use the resulting resource pool for large-scale computations. Ian Foster [66] suggested a checklist to help define what are, and what are not, grids: (i) grids coordinate resources that are not subject to centralized control; (ii) grids use standard, open, general purpose protocols and interfaces; (iii) grids deliver nontrivial qualities of service. One of the key grid computing characteristics is that its size can increase and decrease dynamically depending on resource costs and availability. Another important feature of grid computing is that it can use heterogeneous and dynamic compute, storage, and network resources, which may come from multiple geographically distributed domains having different resource usage policies, hardware and software configurations, and platforms. To integrate these geographically distributed resources into a single entity, grids apply a set of protocols and services [67] standardized by the Global Grid Forum. Grids can be assembled over a wide area, by using heterogeneous servers and CPU types or by borrowing spare computing cycles from idle machines in an in-house environment or across the Internet [131]. Essentially, grids are directed at addressing large-scale computation problems by using generally heterogeneous and dynamic resources from multiple geographically distributed domains.

High Throughput Computing (HTC) was first developed to address scientific problems that require extended periods of time (e.g., weeks, months) to perform a computational task. To meet this demand, the Center for High Throughput Computing at UW-Madison developed the first high throughput computing system, HTCondor, to enable scientists and engineers to increase their computing throughput [13]. In 1996, the HTCondor team presented the HTC system and underlined its distinction from HPC, i.e., the ability to deliver enormous amounts of computing resources and processing capacity over extended periods of time [140]. The purpose of HTC is to efficiently harness all available computing power. It is specifically related to environments with distributed ownership (thousands of personal computers instead of one supercomputer), where users throughout an organization own computing resources. In this case, the utilization of the organization's total computing power may be low, as many desktop machines will be idle when their users do not need them. HTCondor, a software system that creates an HTC environment, effectively utilizes the computing power of workstations communicating over a network. When a job is submitted to HTCondor, it finds an idle machine, runs the job on it, and, if the machine is no longer available, migrates the job to another machine [162].

Many Task Computing (MTC) refers to a system directed at completing many independent and dependent computational tasks by using an enormous number of computing resources over short periods of time. In contrast to HTC computations (completed in days or weeks), the primary MTC metrics are measured in seconds, e.g., tasks/sec, FLOPS. Raicu et al. [138] coined the term MTC primarily to draw the attention of the scientific community to heterogeneous applications that are not well parallelized. MTC denotes high performance computation tasks, which may be small or large, static or dynamic, homogeneous or heterogeneous, compute-intensive or data-intensive, loosely coupled or tightly coupled. MTC aims to bridge the HPC paradigm, which covers tightly coupled, communication-intensive tasks, with the HTC paradigm, which operates on long-running parallel tasks [99, 138].
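As a rough illustration of how a theoretical peak is counted, the sketch below multiplies socket count, cores per socket, clock rate, and floating-point operations per core per cycle; all figures are invented for the example and do not describe any particular TOP500 system.

    # Illustrative (invented) node configuration; real systems publish these
    # figures in their hardware documentation.
    sockets_per_node = 2
    cores_per_socket = 12
    clock_hz = 2.6e9           # 2.6 GHz
    flops_per_core_cycle = 16  # e.g., two 256-bit FMA units in double precision

    peak_node = sockets_per_node * cores_per_socket * clock_hz * flops_per_core_cycle
    print(f"Theoretical peak per node:    {peak_node / 1e12:.2f} TFLOPS")

    # A hypothetical cluster of 1000 identical nodes:
    print(f"Theoretical peak per cluster: {1000 * peak_node / 1e15:.2f} PFLOPS")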

In this section, HPC and related concepts were overviewed. In summary, HPC aims at delivering higher computing performance than a typical desktop computer can provide in order to solve large and complex problems. In the next subsections, we will review the HPC anatomy, i.e., its architectures, programming models, current trends, and challenges.

2.1.1 HPC architectures

Even after many years, the taxonomy developed by Flynn is still considered useful for the classification of high performance computer architectures [64]. This classification consists of four major HPC architectural classes:

1. Single instruction stream, single data stream (SISD): These systems contain one single processor (one CPU core) that accommodates one instruction stream directed at operating on data stored in one single memory. Examples of SISD systems are older generation mainframes, minicomputers, workstations, and single processor/core PCs.

2. Single instruction stream, multiple data streams (SIMD): These systems contain a large number of processing units (from 1'024 to 16'384 PUs) and a single control unit (CU) that executes the same instruction (an identical action) to operate on multiple data pieces in parallel. There are two types of SIMD systems: array and vector processors. Array processors operate on multiple data pieces at the same time, while vector processors act on data arrays in consecutive time slots. SIMD systems like GPUs are well suited for problems with a high degree of regularity, such as graphics or image processing. Other examples of SIMD-based systems are IBM's AltiVec and SPE for PowerPC, and Intel's MMX and iwMMXt.

3. Multiple instruction streams, single data stream (MISD): In theory, MISD systems have multiple processors and perform multiple instructions on a single input data stream. However, no practical machine of this architecture has been constructed yet. MISD might be well suited for multiple frequency filters operating on a single data stream or multiple cryptography algorithms directed at cracking a single encrypted code.

4. Multiple instruction streams, multiple data streams (MIMD): The MIMD architecture is typical for computers with multiple processors. These systems execute multiple instructions in parallel over multiple data streams. MIMD systems can run many sub-tasks in parallel to shorten the execution time of the main task. MIMD execution can be synchronous or asynchronous, deterministic or non-deterministic. Currently, the most common examples of MIMD systems are desktop PCs.

Considering the vast variety of SIMD and MIMD systems, it is important to elaborate further on their classification [160]:

Distributed memory systems are systems with multiple CPUs, each having its own associated private memory. The CPUs are interconnected and may exchange data among their memories if required. Both SIMD and MIMD systems can be distributed memory systems.

Shared memory systems have multiple CPUs sharing the same address space and accessing one memory resource on an equal basis. Both SIMD and MIMD systems can be shared memory systems.

Overall, Flynn's taxonomy and its enhanced version proposed by Van der Steen [160] aim to break computer architectures down by the number of instruction and data streams that can be handled in parallel.

2.1.2 HPC programming models

Regarding HPC software development, the most frequently used programming models are the following:

Message Passing Interface (MPI) is a programming interface, also known as a thread-safe application programming interface (API), which defines the syntax and semantics of a software library and provides standardized basic routines necessary for building parallel applications, as noted by Frank Nielsen [120]. MPI is directed at developing parallel programs that encapsulate data and exchange it by sending/receiving messages [120]. To put it simply, MPI aims to provide a standard for developers writing message passing programs (a minimal message-passing sketch follows after this list). Its interface is designed to be practical, efficient, portable, and flexible. It is important to emphasize that MPI depends neither on a memory architecture (i.e., distributed or shared memory) nor on a programming language (MPI bindings exist for different languages, e.g., Python, C, C++, Fortran). This programming interface is portable, widely available, standardized, and highly functional (with more than 430 functional routines).

High Performance Fortran (HPF) is defined by Harvey Richardson [143] as “a programming language designed to support the data parallel programming style”. HPF was developed to deliver high parallel performance without sacrificing portability. HPF is based upon the Fortran 90 programming language and extends it by providing support for controlling the alignment and distribution of data on a parallel computer, adding new data constructs, extending the functionality at a higher level of abstraction, and addressing some sequence- and storage-related concerns. According to Kennedy et al. [97], HPF pioneered a high level approach to parallel programming and initially gained high community interest in the language and its implementation, but failed to build a strong user base.

Pthreads, or POSIX threads, is a standardized C language threads programming interface defined in the POSIX standard [5]. According to Narlikar et al. [117], Pthreads can be implemented either at the kernel level or as a user-level threads library. The first implementation approach is relatively expensive in terms of thread operations, though it provides a single, rigid thread model with access to system-wide resources. The second approach provides an implementation in user space without kernel intervention and is thus significantly cheaper in terms of thread operations. Overall, both implementation approaches are well suited for writing parallel programs with many lightweight threads. In the POSIX model, threads share dynamically allocated heap memory and global variables. This may cause some programming difficulties: for instance, when multiple threads access the same data, programmers must be aware of race conditions and deadlocks to protect critical sections of the code. In general, the POSIX model is well suited for the fork/join parallel execution pattern, although Diaz et al. [51] do not recommend Pthreads as a general purpose parallel program development technology due to its unstructured nature.

OpenMP is a shared memory API directed at easing the development of shared memory parallel programs. The characteristics of OpenMP allow for a high abstraction level, making it well suited for developing HPC applications in shared memory systems. At the lowest level, OpenMP is a set of compiler directives and library routines, which are accessible from Fortran, C, and C++ to express shared memory parallelism [42]. At a higher level, OpenMP supports the execution of both parallel and sequential programs [126]. The switch between parallel and sequential sections follows the fork/join model of parallel execution [51, 94]. Overall, in OpenMP the use of threads is highly structured, which makes OpenMP a well suited alternative to MPI.
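The minimal sketch below illustrates the message passing style using the mpi4py Python bindings (an assumption: mpi4py and an MPI implementation such as Open MPI or MPICH are available); each process draws its own Monte Carlo samples and the partial counts are combined with a collective reduction.

    # Run with, e.g.:  mpiexec -n 4 python mc_pi_mpi.py
    from mpi4py import MPI
    import random

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # identifier of this process
    size = comm.Get_size()   # total number of processes

    samples_per_rank = 250_000
    random.seed(rank)        # a different random stream on every rank

    # Each rank draws its samples independently ...
    inside = sum(1 for _ in range(samples_per_rank)
                 if random.random() ** 2 + random.random() ** 2 <= 1.0)

    # ... and the partial counts are combined on rank 0 with a reduction.
    total_inside = comm.reduce(inside, op=MPI.SUM, root=0)

    if rank == 0:
        pi_estimate = 4.0 * total_inside / (samples_per_rank * size)
        print(f"pi ~ {pi_estimate:.5f} using {size} MPI processes")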

In general, every HPC programming model presented above has its place in specific situations. The choice depends more on software preferences, the programmer's expertise, and the available development time than on the hardware.

2.1.3 Current trends and challenges

Strohmaier et al. [149] analyzed the HPC market, its evolution, and trends from its advent up to 2005. The HPC market was born in the 1970s with the introduction of vector computing (typically represented by the aforementioned SIMD-based systems), which offered higher performance than a typical system. At the end of the 1980s, parallel computing with scalable systems and distributed memory gained popularity. At that time, massively parallel processing (MPP) systems, still SIMD-based, were developed, providing the highest performance. In the early 1990s, symmetric multiprocessing (SMP) systems became popular due to their performance/price ratio suitability for both the low and medium HPC market segments. Typically, SMP systems were MIMD-based and well adopted both in industry and academia. In the 2000s, computer clusters gained popularity; by 2005 most of the supercomputers in the TOP500 list were clusters. By now, HPC clusters are massively used in large, government-funded organizations, academic institutions, and commercial sectors, e.g., aerospace, automotive, manufacturing, energy, life sciences, and financial institutions. In all these fields, HPC is used for large computations with ensured high levels of accuracy and predictability [131].

According to the TOP500 overview of the IDC's Forecast 2016-2020 for the HPC Market [10], the demand for HPC continues to grow, with a compound annual growth rate of 8% from 2015 to 2019. IDC forecasts that the HPC growth is fueled by performance-critical areas, e.g., deep learning, artificial intelligence, and big data. IDC calls these areas high performance data analytics (HPDA), which is growing at high compound annual growth rates both in academia (16.8%) and industry (26.3%). However, it is also important to emphasize that in recent years TOP500 supercomputers have shown a general slowdown in performance advancements. It is difficult to identify the key reason behind this slowdown, whether it is related to budgetary constraints imposed on businesses and government organizations or rather to the time and cost necessary to deploy and maintain HPC infrastructures. When there is a need to maintain the advancement rate of certain domains, an alternative could be to offload applications to another environment, e.g., cloud computing.

In general, this subsection illustrated that the demand for HPC continues to grow. There is also a need for an alternative that could efficiently and cost-effectively solve data- and compute-intensive problems when deploying and maintaining an HPC infrastructure is not feasible.

2.2 Cloud computing

The idea behind cloud computing is not new. In the 1950s, a gradual evolution towards cloud computing started with mainframe computing. Considering the high cost of purchasing and operating multiple computers at every organization, shared access to a single computing resource, a central computer, provided an economic benefit. In the 1970s, the concept of virtual machines was established. By using virtualization, it became possible to execute one or many operating systems in parallel on one physical piece of hardware. The virtual machine took mainframe computing to a new level, as it enabled multiple distinct computing environments to operate inside one physical machine. In the 1990s, telecommunication companies started offering virtualized private network connections at a reduced cost. Instead of getting a personal connection, an end user got shared access to the same physical infrastructure. Altogether, this served as a catalyst for the development of cloud computing [8].

There are many definitions of cloud computing. Hereafter, the most relevant definitions will be presented. According to the National Institute of Standards and Technology (NIST) [80], cloud computing is “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models”.

According to a joint technical committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) [86], cloud computing is “a paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with self-service provisioning and administration on-demand. The cloud computing paradigm is composed of key characteristics, cloud computing roles and activities, cloud capabilities types and cloud service categories, cloud deployment models and cloud computing cross cutting aspects”.

Cisco [24], in turn, defined cloud computing through three key attributes characterizing a cloud computing system:

1. On-demand provisioning means that resources can be delivered immediately when needed, released when no longer required, and billed only when used.

2. At-scale service means that the service provides the illusion of infinite resource availability in order to meet the requirements.

3. Multitenant environment means that the resources are provided to many consumers from a single pool at the same time, saving the provider significant costs.

In Cisco's definition, all three attributes are required to define a cloud service. Interestingly, the physical location of resources (on-premise or off-premise) is not part of the definition.

Alternatively, IBM [19] defined cloud computing to be “an all-inclusive solution in which all computing resources (hardware, software, networking, storage, and so on) are provided rapidly to users as demand dictates. The resources, or services, that are delivered, are governable to ensure things like high availability, security, and quality. The key factor to these solutions is that they possess the ability to be scaled up and down, so that users get the resources they need: no more and no less. In short, cloud computing solutions enable IT to be delivered as a service.”

Considering all these relevant definitions, we adopt the following definition in the context of this thesis: Cloud computing is a popular paradigm, which relies on resource sharing and virtualization. Cloud computing provides a consumer with a transparent, scalable, and elastic computing system having computing, network, and storage resources and being able to expand and shrink on the fly.

The reason for the existence of distinct perceptions and, consequently, definitions of cloud computing is that it is not a new technology but rather a set of various services (2.2.1.1) and deployment models (2.2.1.2) bringing together the enabling technologies (2.2.2) to meet the technological and economic requirements of today’s service-oriented demand (2.2.3).

2.2.1 Cloud computing models

Cloud computing [161] is continuously evolving, delivering various computing resources as services to the consumer. Following the NIST Special Publication [80], the essential characteristics of cloud computing can be defined as:

1. On-demand self-service: a consumer can acquire computing services such as applications, networks, or servers without any interaction with the service provider.

2. Broad network access: cloud capabilities are available over the network and accessed through standard network infrastructures and protocols.

3. Resource pooling: computing resources are pooled to serve multiple consumers by using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. A consumer generally has no control or knowledge over the exact location of the provided resources, but it may be possible to specify the location at a higher level of abstraction, e.g., country, state, or datacenter.

4. Rapid elasticity and scalability: the consumer has access to seemingly unlimited capabilities, rapidly and elastically provisioned on a self-service basis almost in real time.

5. Measured service: cloud computing resource usage appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts) can be transparently measured, controlled, and reported both by the provider and the consumer.

The aforementioned characteristics bring considerable value to the consumer by lowering operating costs thanks to the pay-as-you-go model, minimizing management effort since maintenance and administration of the infrastructure are handled by a third party, and providing virtually unlimited resources on demand. Overall, this makes cloud computing attractive to the industry and facilitates its adoption.

2.2.1.1 Service models

According to the NIST Special Publication [80], there are only three service models: Cloud Software as a Service (SaaS), Cloud Platform as a Service (PaaS), and Cloud Infrastructure as a Service (IaaS). Interestingly, different authors extended this list with additional service models, e.g., Database as a Service, Security as a Service, Network as a Service, Simulation Software as a Service. Consequently, it is possible to observe the current trend, described in more detail in Subsection 2.2.3, that cloud computing is evolving towards an Everything/Anything as a Service (XaaS) model. Many scholars hold the view that the three models (SaaS, PaaS, and IaaS) are the major ones and that they can be presented as a pyramid (Figure 2.1), in which Software as a Service is placed at the top of the pyramid, while Infrastructure as a Service is placed at the bottom [19, 112, 146]. The following brief overview presents these service models.

Figure 2.1: Cloud Computing Pyramid [146]

Service model 1: the Software as a Service (SaaS)

The SaaS model delivers to the consumer access to software applications over the Internet. This refers to prebuilt and vertically integrated applications (e.g., an email system, an ERP, a CRM, a human resource management system, a payroll processing system) that are delivered to and purchased by users as services. Besides some user-specific application configuration settings, the consumer has no control over the underlying cloud infrastructure, including network, servers, operating systems, storage, or even individual application capabilities. The applications are centrally managed on the cloud, possibly by a third-party vendor, and can be accessed with a web browser without any preliminary installation. This approach simplifies the application usage experience for end users and removes the need to update the software or deal with licenses. SaaS may be regarded as a user layer, which can be further classified into services (a particular application for a specific use, e.g., Gmail) and applications (a unified software package that incorporates many business functions, e.g., SAP Business ByDesign). The market of SaaS products is very broad, as services can be anything from Web-based email to enterprise resource planning (ERP). Interestingly, Cisco [83] forecasts that by 2020, 74 percent of the total cloud workload will be Software as a Service (SaaS), up from 65 percent in 2015.

Service model 2: the Platform as a Service (PaaS)

The PaaS model (e.g., OpenShift) delivers an application development platform, allowing developers to concentrate on software design, development, and deployment rather than on maintenance of the underlying infrastructure. Developers do not manage or control the underlying cloud infrastructure, including network, servers, operating systems, or storage, but have control over the deployed applications and possibly over application hosting environment configurations. PaaS can be considered a developer layer intended to support the SaaS layer of the pyramid. A typical PaaS offering includes a runtime environment for application code, cloud services, computing power, storage, and networking infrastructure. Developers are usually charged per billing period for storage (per GB), data transfer in or out (per GB), I/O requests (per n thousand requests), etc. Good examples of PaaS services are AWS Elastic Beanstalk, Windows Azure, Heroku, Force.com, Google App Engine, and Apache Stratos. According to the Cisco [83] forecast, by 2020, 8 percent of the total cloud workload will be Platform as a Service (PaaS), down from 9 percent in 2015.

Service model 3: the Infrastructure as a Service (IaaS)

The IaaS model (e.g., Amazon, OpenStack, OpenNebula) delivers any kind of system resource on demand (e.g., compute power, storage, networking) and allows users to pay as they go for the actual consumption. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications, and possibly limited control over certain networking components (e.g., host firewalls). IaaS is the lowest layer of the pyramid, supporting the PaaS and SaaS layers. The added value of IaaS is the use of a complex infrastructure on a pay-for-what-you-use basis rather than investing upfront in building and maintaining an infrastructure (e.g., hardware, software, servers, datacenter facilities). According to the Cisco forecast [83], by 2020, 17 percent of the total cloud workloads will be Infrastructure as a Service (IaaS), down from 26 percent in 2015.

2.2.1.2 Deployment models

According to the National Institute of Standards and Technology (NIST) [80], there are four cloud deployment models:

Deployment model 1: private cloud

Private clouds, also known as internal clouds, are proprietary networks; often, their data centers reside within the organization [112]. A private cloud can be managed by the organization or a third party and can be on premise or off premise. If it is on premise, the organization oversees setting up and maintaining the cloud; thus, it is the organization that is responsible for provisioning and operating the cloud resources. The added value of private clouds for the organization is the possibility to control security and quality-of-service aspects. This is especially relevant for managing critical processes and storing sensitive data [80].

Deployment model 2: community cloud

The community cloud infrastructure is semi-private, as it is shared by several organizations and supports a specific group of tenants with similar backgrounds and shared concerns (e.g., mission, security requirements, policy, and compliance considerations). The cloud infrastructure thus becomes a private cloud for this specific community. It may be managed by the organizations or a third party and may exist on premises or off premises.

Deployment model 3: public cloud

Public clouds, also known as external clouds, are cloud services owned by cloud service providers and made available to the general public or a large industry group. The cloud providers are responsible for installation, management, provisioning, and maintenance of the cloud infrastructure. Considering the variety of cloud services provided to the general public, the pooling of resources becomes more efficient. In a public cloud, the infrastructure is organized and managed entirely by the cloud provider. Hence, the security mechanisms (for example, encryption, replication) may not be fully transparent to the user.

Deployment model 4: hybrid cloud

Hybrid clouds are a composition of two or more clouds that remain unique entities but are bound together by a standard or a proprietary technology. This enables data and application load balancing in order to avoid overloading one single cloud. Hence, hybrid clouds can be valuable for users who manage dynamic and highly changeable environments. However, they can also present some management challenges, because responsibilities are split among an organization, a community, and/or a public cloud provider.

This subsection presented an overview of the cloud computing anatomy from different perspectives: essential characteristics, service models, and deployment models. Cloud computing distinguishes itself from other computing paradigms by its attractive features, i.e., on-demand, flexible, easy-to-measure, and virtually unlimited services. Clouds can be classified according to their offered service models, i.e., SaaS, PaaS, IaaS. Clouds can also be classified according to their deployment models, i.e., private, community, public, and hybrid. Thereby, this subsection laid the foundation for the further discussion of the IT complexity and technologies behind the cloud computing paradigm.

2.2.2 Enabling technologies behind cloud computing

While a consumer experiences cloud computing as an easy-access and easy-to-use technology, cloud computing is a complex and powerful paradigm with a multitude of important technologies behind it. In 2008, Wang et al. [164] published a paper in which they identified a number of enabling technologies behind cloud computing, presented below.

Virtualization technology

In cloud computing, virtualization holds a crucial position. Virtualization is a very old concept that dates back to 1974, when Popek and Goldberg derived the first set of formal requirements for efficient virtualization [135]. The term virtualization refers to provisioning a virtual resource (e.g., a computer network resource, a storage device, a server) that behaves identically to its physical counterpart. The idea is to run several instances of the operating system (OS), called guests or virtual machines (VMs), on a single physical machine (bare metal), hereafter called the host. Virtualization is provided through a special piece of software called a hypervisor or virtual machine monitor (VMM) running on a physical machine. Hypervisors can run directly on the system hardware or on a host operating system. The latter allows delegating tasks like I/O device support and memory management to the host OS. Overall, virtualization allows splitting the physical infrastructure and creating dedicated virtual resources. Moreover, it is a key component of the Infrastructure as a Service (IaaS) cloud, which enables tenants to access VM-based computing resources. The tenant can scale resources in and out to meet its demand, for example, by releasing idle VMs or requesting new VMs upon a workload burst; a small sketch of such elastic resource management through an IaaS API is shown below.
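For illustration only, the following sketch shows how such elastic scaling could look with the AWS boto3 Python SDK; the region, machine image identifier, and instance type are placeholder values, and any IaaS API with equivalent calls (e.g., the OpenStack SDK) could be used instead.

    # Illustrative elasticity sketch using the AWS boto3 SDK.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-central-1")

    def scale_out(n):
        """Request n additional VMs when the workload bursts."""
        response = ec2.run_instances(
            ImageId="ami-0123456789abcdef0",   # placeholder machine image
            InstanceType="t3.medium",          # placeholder instance type
            MinCount=n,
            MaxCount=n,
        )
        return [instance["InstanceId"] for instance in response["Instances"]]

    def scale_in(instance_ids):
        """Release idle VMs so that they no longer incur costs."""
        ec2.terminate_instances(InstanceIds=instance_ids)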

Web services and SOA

Since cloud computing services are normally exposed as Web services, they follow industry standards such as the Universal Description, Discovery and Integration (UDDI), the Web Service Description Language (WSDL), and the Simple Object Access Protocol (SOAP), an XML-based RPC and messaging protocol [55]. In brief, UDDI is a registry mechanism that can be used to look up Web service descriptions. WSDL is a descriptive interface and protocol binding language. SOAP is a service-oriented architecture protocol, which allows distributed applications to communicate by using HTTP and XML. Cloud computing services can be designed and managed following a Service Oriented Architecture (SOA). SOA is a set of software design principles, patterns, and criteria, which address modularity, encapsulation, reusability, and loose coupling of distributed application components, i.e., services. By applying these industry standards and SOA, cloud computing services become interoperable, dynamic, and composable.

Worldwide distributed storage system

According to Cisco [83], the need for distributed data storage centers and cloud resources is accelerating drastically due to the forecasted almost quadruple (3.7-fold) growth of cloud traffic, which will increase from 3.9 ZB in 2015 to 14.1 ZB in 2020. The cloud storage model should foresee the need for data storage migration, merging, and transparent management by the end user, as is done, for example, by the Google File System and Amazon S3.

Programming model

For easy access to cloud services, cloud programming models intend to hide the underlying complexity from non-specialized individuals. For instance, MapReduce is a programming model and a corresponding implementation for simplified processing and generation of large volumes of data [44]. Programs written in its functional style are automatically parallelized and executed. This allows programmers not experienced with parallel and distributed systems to easily utilize the resources of a large distributed system [90]. For example, Hadoop is a MapReduce implementation that allows the user to process large data sets across clusters of computers. Hadoop was primarily designed to detect and handle failures at the application layer. Thus, it allows programmers to store and read large amounts of data in a reliable, fault-tolerant, and cost-effective way.
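The toy word-count sketch below expresses the MapReduce style with plain Python functions; it is not Hadoop's actual API, but it mirrors the two phases that a framework such as Hadoop would execute in parallel across a cluster while transparently handling failures.

    from collections import defaultdict
    from itertools import chain

    def map_phase(document):
        """Emit a (word, 1) pair for every word of one input document."""
        return [(word, 1) for word in document.split()]

    def reduce_phase(pairs):
        """Sum the emitted counts for every distinct key (word)."""
        counts = defaultdict(int)
        for word, count in pairs:
            counts[word] += count
        return dict(counts)

    documents = ["clouds deliver resources on demand", "clouds scale on demand"]
    all_pairs = chain.from_iterable(map_phase(d) for d in documents)
    print(reduce_phase(all_pairs))
    # {'clouds': 2, 'deliver': 1, 'resources': 1, 'on': 2, 'demand': 2, 'scale': 1}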

This subsection presented a short overview of the technologies behind the cloud computing paradigm and paved the way for the further discussion of the current service-oriented demand and the evolution of cloud computing service models into the broader form of Everything/Anything as a Service (XaaS).

2.2.3 Service oriented approach to resources

While the three major service models are SaaS, PaaS, and IaaS, different authors extended the list with additional resources presented as services, for example, Storage as a Service, Database as a Service, Security as a Service, Network as a Service, Communication as a Service, Integration as a Service, Testing as a Service, Business Process as a Service, and Simulation Software as a Service. Altogether, these broad categories of services related to computing and remote access form a larger set called Everything/Anything as a Service (XaaS). Virtualized physical resources, virtualized infrastructure, as well as virtualized middleware platforms and business applications are provided and consumed as cloud services. A rich ecosystem of cloud computing services and providers has emerged, and it forms a complex environment in which Web applications and services are developed, tested, deployed, and operated. While this complex environment provides many choices, at the same time it poses a great challenge to engineers in charge of building resilient application architectures. The growing diversity of services, frameworks, platforms, and tools within the cloud computing community tends to obfuscate the reasonable use and combination of the advertised offerings. In this subsection, the most distinct service models established in recent years are categorized and overviewed.

Service model 1: DataBase as a Service (DBaaS) It is difficult to argue with the importance of relational database management systems (DBMSs) or to diminish their value, as they have become an integral and indispensable component in most computing environments today. With the advent of hosted cloud computing and storage, the opportunity to offer a DBMS as an outsourced service is gaining momentum, as well illustrated by Amazon's RDS and Microsoft's SQL Azure [41]. Database as a Service (DBaaS) gains attention for two major reasons. First, due to economies of scale, users will likely pay less for the database service than when they buy the hardware and run it themselves. Second, when a DBaaS is well designed, any costs incurred are proportional to how much it is used (a pay-per-use payment model); the same applies to both software licensing and administrative costs. The latter costs often grow significantly when specialized expertise is required to obtain satisfactory performance from commodity DBMSs. When database management tasks are centralized and automated, DBaaS operational costs can be substantially reduced while maintaining the performance [41]. There are three challenges that drive the design of cloud database services: efficient multi-tenancy to minimize the hardware footprint required for a given (or predicted) workload, elastic scale-out to handle growing workloads, and database privacy. The existing offerings like Amazon RDS and Microsoft SQL Azure validate the market need for DBaaS. However, it is important to emphasize that they are constrained: both DBaaS offerings support only limited consolidation (often based on VMs), lack scalability beyond a single node (i.e., a server), and have considerable issues with data privacy and the processing of queries over encrypted data [41]. Regarding DBaaS multi-tenancy, the key challenge is related to dealing with big data and overcoming database limitations on extremely large numbers of tables and/or columns. The scalability of database systems is a popular issue both in academia and in industry, as both approaches (NoSQL and SQL-based systems) have limitations: the former sacrifices a fair amount of expressive power and/or consistency in favor of extreme scalability, while the latter limits the type of transactions allowed. At this stage, all the attempts to preserve consistency and achieve scalability via workload-aware partitioning are too expensive to be put into practice [41].

Service model 2: SECurity as a Service (SECaaS) Clouds offer attractive options to migrate corporate applications without any implications for the management or security of physical resources. However, the migration of critical business applications with sensitive data can be limited due to the lack of transparent security mechanisms from cloud service providers (CSPs). Typically, cloud service consumers have security requirements to fulfill due to regulatory compliance, but it is still difficult to verify how well the CSP can satisfy these requirements. Recently, the cloud security community, represented by workgroups of the European Network and Information Security Agency (ENISA) and the Cloud Security Alliance (CSA), has identified that specifying security parameters in Service Level Agreements (SLAs) could serve well to establish a common ground for CSPs and to standardize security issues from both the user and provider sides. SLAs lay the foundation for the transparency of the security mechanisms of the provided cloud services. The growth of the SaaS market has also given rise to a new generation of tools, platforms, and hardware that can provide protection for cloud-based applications, data, and business processes. This generation of tools is known as Security as a Service (SECaaS). According to MarketsandMarkets, the SECaaS market is expected to grow from USD 3.12 billion to USD 8.52 billion by 2020. Typical SECaaS solutions include Data Loss Prevention (DLP), Security Information and Event Management (SIEM), email encryption, and endpoint protection. Cloud security can also be ensured from the hardware perspective. For example, if cloud providers integrate Intel SGX into their infrastructure, then cloud users will be able to protect critical code and/or data from disclosure or modification.

Service model 3: Network as a Service (NaaS) Most cloud applications are distributed by nature and often involve significant network activity to perform their operations. In current cloud computing offerings, however, tenants have little control over the network. Tenants can accurately select the number and types of computational and storage resources needed for their applications, but they cannot directly access and manage the network infrastructure (i.e., routers or switches) unless it is a private or community cloud. This means that all packet processing must occur at the end hosts. Even some relatively common communication operations, such as multicast, are not supported, requiring tenants to implement them using inefficient application-level overlays. Triggered by lowering costs and increasing performance, a new approach towards software- and FPGA-based programmable routers arose, aiming at replacing traditional switch and router operations (e.g., IPv4 forwarding) with custom implementations [40]. This approach evolves into a Network as a Service model when such flexibility is provided to tenants so that they can implement part of the application logic in the cloud network. NaaS can be described as a new cloud computing model in which tenants have access to additional computing resources collocated with switches and routers. Tenants can use NaaS to implement custom forwarding rules based on application needs, e.g., a load-balancing anycast or a custom multicast service. Tenants can process packets on-path and even modify the payload or create new packets on the fly. This enables the design of efficient in-network services, such as data aggregation, stream processing, caching, and redundancy elimination protocols, that are application-specific as opposed to traditional application-agnostic network services. The NaaS model is expected to be widely adopted; however, there are considerable limitations to overcome, i.e., scalability, multi-tenant isolation of different network resources, and support of distinct solutions developed by various organizations and executed concurrently.

Service model 4: SiMulation as a Service (SMaaS) In academia, there is a need for services that could provide researchers and scientists with tremendous computing resources and the substantial amount of storage necessary for conducting scientific simulations and experiments. Simulation software, which is based on the mathematical process of modeling a real operation or phenomenon, is widely used in simulating weather conditions, heat pumps, chemical reactions, complex biological processes, etc. It may manifest the following characteristics [77]: long running time, high resource consumption, complicated I/O, and interconnection with other simulation software. Typically, researchers and field experts need to have either the simulation software installed on their own machines or access to machines that can host a simulation. Sometimes, experiments may be disrupted when there is a dependency between different simulations and one of them is not immediately available. With the advent of cloud computing and SOA, researchers and scientists are exposed to software, platforms, and infrastructure on the Internet. In the cloud environment, a researcher could conduct an experiment by connecting multiple simulation software services to form a workflow, which represents how the experiment proceeds. Even though simulations can be wrapped into web services, there is still one intrinsic difference between regular software and simulation: it is fundamental for a simulation to be dynamic in its nature and to have its elements constantly changing over time. Thus, the time factor is crucial and critical to the correctness of the simulation in order to guarantee and maintain the correct temporal order of the events occurring in the simulation. Compared to regular services, simulation services are tightly coupled and may incur extra work [77].

The aim of Subsection 2.2.3 was to identify the service-oriented offering models that altogether form the broad cloud computing concept of Everything/Anything as a Service (XaaS). While there exists a rich ecosystem of cloud service offerings, both academia and industry face considerable challenges in engineering resilient and reliable services, standardizing them, and meeting all the security demands. Hence, in the next subsection we will overview the major challenges that may hinder the rapid development and adoption of cloud computing services.

2.2.4 Current challenges Although cloud computing has become popular both in industry and academia, the cloud computing market is considered to be in its infancy. According to the cloud computing market maturity study results [85], cloud computing is considered a relatively immature service offering, especially IaaS and PaaS, which are at the stage of limited adoption with a potential for growth, though SaaS has started demonstrating significant adoption with notable innovation in terms of product offerings. There are still many challenges and issues to be considered in order to accelerate cloud computing adoption. Some of these challenges are briefly described below:

Challenge 1: security It is obvious that security plays a crucial role in the adoption of cloud computing, since the cloud service consumer places the data and runs the software on the third party's side. Security issues such as data loss, phishing, password weaknesses, and compromised hosts running botnets pose serious threats to an organization's data and software and, consequently, delay its cloudification [52]. Furthermore, with multi-tenancy and pooled computing resources, new security challenges arise. First, in a shared-resource environment, unexpected side channels, which passively monitor information, and covert channels, which actively send data, can emerge. Second, there is the issue of reputation fate-sharing, which can arise when a single subverter disrupts many users who rely on the cloud provider to ensure the use of cloud security best practices. Because the users can share the same network address, the disruption will be attributed to every tenant without distinguishing the real subverter [38].

Challenge 2: standardization According to Sixto Ortiz Jr. [127], the rapid growth and adoption of cloud computing is limited and threatened by the failure of comprehensive cloud-computing standards to gain traction, regardless of how many groups work on them, e.g., the IEEE Working Groups P2301 and P2302, the Distributed Management Task Force (DMTF), the Open Grid Forum (OGF), the Organization for the Advancement of Structured Information Standards (OASIS), and the Storage Networking Industry Association (SNIA). The lack of standards could make cloud computing trickier to use and restrict implementation by limiting interoperability among cloud platforms and causing inconsistency in the area of security. Despite having no control over the underlying computing resources, the cloud service consumer must be assured of the quality, availability, reliability, and performance of these resources. But because there is a lack of effective cloud services standardization, it is difficult for consumers to compare, evaluate, and choose the best cloud offering. However, there is a list [127] of in-progress standards that are based upon the essential characteristics defined by NIST and described in Subsection 2.2.1:

1. Open Virtualization Format (OVF) by DMTF; OVF establishes a transport mechanism for moving virtual machines from one hosted platform to another.
2. Guide for Cloud Portability and Interoperability Profiles (CPIP) by IEEE Working Group P2301; CPIP serves as a metastandard for existing or in-progress standards in critical areas such as cloud-based applications, portability, management, interoperability interfaces, file formats, and operation conventions.
3. Standard for Intercloud Interoperability and Federation (SIIF) by IEEE Working Group P2302; SIIF creates and defines the topology, protocols, functionality, and governance necessary for cloud-to-cloud interoperability and data exchange.
4. Open Cloud Computing Interface (OCCI) by OGF; OCCI provides specifications for numerous protocols and APIs for different cloud-computing management tasks, i.e., automatic scaling, deployment, and network monitoring.
5. Symptoms Automation Framework (SAF) by OASIS; SAF is directed at providing CSPs with a guide to understand consumer requirements, e.g., quality of service and capacity, in order to design better services.
6. Cloud Data Management Interface (CDMI) by SNIA; CDMI standardizes client interactions with cloud-based storage, cloud data management, and cloud-to-cloud storage interactions.

Challenge 3: cloud interoperability Currently, each cloud offering is proprietary and defines its own way of client/application/user interaction with the cloud. This considerably hinders the development of the cloud ecosystem, as clients are locked in and cannot easily switch to an alternative cloud offering due to the high costs. The user lock-in may be attractive to the vendor, but in this situation clients are particularly vulnerable to price changes, reliability problems, or even the vendor's bankruptcy. Moreover, it is very important to emphasize that proprietary APIs cause difficulties in integrating cloud services with an organization's existing local systems. The scope of this challenge covers both the interoperability across clouds and the interoperability between cloud and private systems. Fox et al. offer one solution, which could deal with the whole scope of this challenge – standardization [68]. Considering the argument of Dillon et al. [52] that Microsoft and Amazon, the two major cloud service providers, do not support the Unified Cloud Interface (UCI) Project proposed by the Cloud Computing Interoperability Forum (CCIF), it is clear that the standardization progress is considerably inhibited as the big market players cannot reach a consensus.

Challenge 4: energy efficiency As stated in the McKinsey report [95], data centers consume an enormous amount of energy and, consequently, emit carbon dioxide at a level comparable to, or even higher than, that of Argentina or the Netherlands. According to Cisco [83], by 2020 the number of large-scale public cloud data centers will almost double. That means that the energy consumption will grow accordingly. Moreover, cloud data centers consume electricity continuously, since resources must always be available. An idle server consumes about 70% of its peak power. This waste of idle power is considered a major cause of energy inefficiency [71]. As energy costs increase while availability diminishes, there is a clear need for energy efficiency optimization while maintaining high service level performance and availability.
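As a back-of-the-envelope illustration (not taken from the cited reports), a commonly used linear server power model makes the idle-power waste explicit:

P(u) ≈ P_idle + (P_peak − P_idle) · u, with P_idle ≈ 0.7 · P_peak,

where u is the server utilization between 0 and 1. Even at zero utilization (u = 0), the server draws roughly 70% of its peak power, which is why consolidating load and powering down idle servers is a key lever for energy-efficient clouds.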

This subsection provided an overview of cloud computing challenges and illustrated that the development and adoption of cloud services may be substantially impeded by security and privacy issues, the lack of widely adopted standards, the lack of interoperability and portability, and energy inefficiency. Considering all the advantages and drawbacks presented in Section 2.2, we will evaluate the potential of cloud computing for compute-intensive scientific applications in the next Section 2.3.

2.3 Cloud computing for scientific applications

The distinct characteristic of scientific computing is the ever-increasing need for computing resources to conduct experiments and simulations and to obtain outcomes in a reasonable time frame. Until recently, there were only two feasible options to satisfy this need: to use either expensive supercomputers or cheap commodity resources, e.g., clusters and grids. With cloud computing, this need can alternatively be satisfied by leasing the necessary computing resources from large-scale data centers. Even though several cloud offerings, e.g., Amazon and GoGrid, exist on the market, the cloud potential for scientific high performance computing has not yet been investigated in depth. Considering this issue, the current section presents a brief analysis of cloud computing and high performance computing (HPC) or high throughput computing (HTC) capabilities for scientific applications.

Cloud computing holds an immense potential for scientific computing. Clouds can be cheaper than supercomputers, more reliable than grid computing, and much easier to scale than clusters. The "pay-as-you-go" charging model of clouds brings additional advantages, i.e., virtually limitless scale-up with only one constraint – the financial budget – while clusters and supercomputers are physically limited when it comes to adding nodes. However, clouds also incur significant challenges, especially in performance. These challenges are rooted in the main differences between cloud and scientific computing: the job execution model, the system size, and the performance demand. The scientific job execution model mostly requires space-shared use of resources, while clouds mostly rely on time-shared resources and use virtualization to abstract away from physical hardware resources; the distinct advantage of virtualization is the increase of user concurrency, while its disadvantage is the potential decrease of achievable performance. Regarding system size, scientific facilities include very large systems, while cloud computing services are mostly directed at substituting small and medium-sized data centers. The scientific community has shown a clear interest in shifting HPC- or HTC-based applications to clouds, though it has also raised an important question: "Could the clouds' performance be sufficient for scientific computing and its applications?" Put differently, "Could clouds execute high performance scientific tasks at the same or a similar performance level as HPC/HTC systems?" Considering this question, some scientific works were directed at exploring data-intensive workflows, since they are tightly related to conventional scientific applications in terms of data volumes [106, 181]. Some experiments with well-known workflows, e.g., Montage, clearly showed that running costs could significantly decrease with cloud computing infrastructure, while performance would still suffer from virtualization and latency overheads [27, 48, 79]. Juve et al. [92] measured the usefulness and value of cloud computing for scientific workflows by conducting experiments on three workflow applications: Montage (an astronomy application), Broadband (a seismology application), and Epigenomics (a bioinformatics application). Montage is an application with high I/O usage but low memory and CPU usage. Broadband requires high memory usage but medium I/O and CPU usage, while Epigenomics is considered a CPU-bound application with a low requirement for I/O and medium memory usage. Juve et al. [92] ran experiments on the popular, widely used, and stable Amazon EC2 and on a typical HPC system, NCSA's Abe cluster. The findings showed that Amazon EC2 performance was not identical to that of Abe; however, it was reasonable enough, with a small (1-10%) virtualization overhead. These findings indicated that clouds could serve as a viable alternative to HPC systems if cloud service providers offer high speed networks and parallel file systems. Other studies revealed the feasibility of running cloud-based frameworks for multi-scale data analysis while preserving the performance and storage capabilities of grids [110, 172]. For example, the hydrology domain can serve as an illustration of the feasibility of cloud-based frameworks and their similarity to the HPC performance level [39]. A wide range of hydrological problems has been proven suitable for execution in hybrid computing infrastructures integrating grids with external cloud providers.
This covers both computationally intensive HPC simulators and MTC-based applications with multiple scenarios. There is also a strong need for scientific applications that can execute a non-trivial task in parallel on distributed data sets [124]. Cloud computing provides a scalable and cost-efficient solution to the big data challenge (for example, MapReduce). Jackson et al. [87] examined the performance of a typical HPC workload executed on Amazon EC2 and demonstrated in their experiments a strong correlation between the time and amount of communication and the overall application performance on Amazon EC2 – "the more communication there is, the worse performance becomes".

Overall, Section 2.3 presented an evaluation of the cloud computing potential for HPC/HTC-based scientific applications and an overview of related works in this field.

2.4 Summary

In this chapter, the state of the art in HPC and cloud computing was described. An overview of the current state, trends, and challenges for both paradigms was provided in Sections 2.1 and 2.2, respectively, while Section 2.3 provided an overview of related works evaluating the feasibility and suitability of the cloud computing paradigm for scientific applications. Considering how demanding scientific applications are in terms of computing power and efficiency, the application of cloud computing to scientific applications must be studied in more detail. In the next chapter, we will overview scientific high performance simulators, analyze the key features of HPC and cloud execution environments, and evaluate the cloudification potential of the simulators.

Chapter 3 Modern high performance simulators

3.1 Introduction

Nelson Goodman [74] famously observed that only a few other terms are used in scientific discourse more promiscuously than "model". The same might be said about simulation, which became a positive term in science after World War II and gained its current definition: "the technique of imitating the behavior of some situation or process ... by means of a suitably analogous situation or apparatus, especially for the purpose of study or personnel training" [96]. The rise of simulation in post-WWII science is not exclusively associated with military history or the advent of the computer, though it is difficult to imagine the construction of the first hydrogen bomb and its successful test in 1952 without the use of computer simulation [133]. Specifically, the introduction of the computer provided the impetus for the adoption of simulation in different scientific disciplines, e.g., meteorology, nuclear physics, biology, geology, hydrology, economics, and sociology.

The term "simulation" is often used interchangeably with the term "computer simulation". A modern simulation can be represented by a computer program built to explore and analyze a mathematical model of either a real-world or a hypothetical system. Primarily, a simulation relies upon a pre-defined algorithm that takes as input a specific state of the system at a concrete time slot t and calculates its state at the next time slot t + 1 under certain conditions. Prior to the computer, the study of complex, non-linear systems was limited in computations and could only be approached by perturbation methods, simplifying models/approximations, or paper schemes for numerical approximation [96]. Even though computers are not capable of providing researchers and scientists with the exact solution to the studied complex problem, they can definitely approximate the solution with a high degree of accuracy in a reasonably short period of time.

Specifically, the accuracy and speed of scientific computations is tightly interconnected with the exponential growth of computing power. According to Denning et al. [49], this growth can be considered a unique phenomenon continuously stimulating economic, social, and political disruptions. The performance of computers is doubling every 18 months (as claimed by David House) or 24 months (as claimed by Gordon Earle Moore) [49]. The exponential growth of computing power occurs not only at the chip level, where the number of chip components – transistors, resistors, and capacitors – doubles every two years according to Moore's law, but also at the system and market levels. For the last five decades, Moore's law has proved to be sustainable. According to the analyses conducted by Denning et al. [49], the exponential growth at all three levels of the computing ecosystem is likely to continue for decades to come. With the continuing growth of available computing power, scientists and researchers have gained the opportunity to investigate more complex problems, simulate compound systems, and describe more sophisticated interactions between these systems and processes by building and deploying advanced interconnected computer models.

To realize this opportunity to its full extent, however, researchers require a large variety of sophisticated tools. Scientific workflows (SWFs) emerged as one of the tools directed at integrating, structuring, and orchestrating various heterogeneous systems, services, and components of scientific computations to enable and accelerate the pace of scientific progress [72, 105, 155]. Lin et al. [105] defined SWFs as a scientific process formalization that represents, structures, and automates the workflow steps from dataset selection to computation, and from computation to data analysis and final data representation. This scientific process formalization demands a new kind of system that can ease the process of developing scientific workflows and provide adequate capabilities to modify, run, monitor, and control the execution of the workflows. Scientific workflow management systems (SWfMSs) emerged in response to this demand. Recent years have been marked by a considerable development of SWfMSs, for example, Taverna, Kepler, and Pegasus [137]. Pegasus is a framework that maps complex scientific workflows to the underlying distributed resources, specifically grid computing resources. Pegasus can accommodate different scheduling and replica selection algorithms and provides partition-level failure recovery [47]. Taverna is directed at creating and running workflows of bioinformatics experiments. Taverna builds a workflow using the Scufl language and represents it in the form of a graph consisting of processes that transform input data into output data [125]. Kepler claims to be unique because it provides the design of scientific workflows at the highest abstraction level, together with execution, runtime interaction, access to data, and service invocation. Basically, Kepler offers such features as an intuitive graphical user interface (GUI) to design a workflow and an actor-oriented paradigm to prototype, execute, and reuse workflows. Kepler workflows can be presented in XML, while Kepler workers can be executed as Java threads [18].

Considering the three presented examples, SWfMSs can be used as a convenient tool for researchers to build and test scientific workflows without spending too much time on the technical aspects of the execution (e.g., implementation, optimization, and resource control). However, before being conveniently used, each SWfMS requires an installation and a proper configuration in a specific execution environment. In itself, that might be a nontrivial task for a non-computer scientist. Considering the distributed nature of compute environments, scientists can run programs on different hardware platforms. Because the tasks must be properly assigned to these various resources, scientists may look for SWfMS flexibility to adapt scheduling techniques and find a suitable match between workflow tasks and computing capabilities. However, neither automatic nor manual workflow model modification during runtime is an easy task for a non-computer scientist, because it has to be carefully designed to handle exceptions for expected errors. Moreover, some scientific workflows, e.g., Montage, consist of a large number of short-running tasks [26]. In distributed infrastructures, these tasks can introduce a runtime overhead due to network latency.
To cope with this issue, the user might need to manually implement certain workflow optimizations, which might also be a challenging task. Normally, SWfMSs aim to comply with various requirements, such as being data-driven, advanced data handling, flexibility, monitoring, reproducibility, robustness, and scalability. However, there are also specific requirements for every scientific domain, e.g., life science, chemistry, computer science, and physics. Altogether, it becomes hard to cover all the key and domain-specific requirements in one SWfMS; consequently, there will always be a need to modify or extend a workflow model.

Summarizing the advantages and drawbacks of a typical scientific workflow management system, it is important to highlight that a SWfMS normally offers a user-friendly and intuitive interface to quickly build a complex multistep workflow. Moreover, a SWfMS does not require from researchers and scientists a profound technical background or even knowledge of a specific programming language. Overall, a SWfMS is well suited for prototyping, testing ideas, and sharing workflows with others. However, a SWfMS has a number of drawbacks. It can perform relatively well when dealing with simple and small workflows, but it faces considerable challenges when executing big and complex workflows, as they might require low-level optimizations, custom scheduling techniques, and specific execution environments. Another considerable disadvantage is related to the migration of the execution environment to a different infrastructure, as both the scientific workflow and the SWfMS environment have to be migrated. Moreover, any SWfMS comes with additional overhead as it also consumes resources.

Considering the disadvantages presented above, an alternative to using a SWfMS could be to build a standalone application capable of executing both simple and complex scientific workflows. An example of such an application is a simulator, as it normally represents a self-contained application that is installed in a computing environment and dedicated to executing a specific scientific workflow. Apart from possible library dependencies, a simulator does not require any additional layer for its execution. In comparison to a SWfMS, a simulator is often optimized to efficiently execute a workflow of any complexity, though it normally contains an implementation of only one single workflow at a time. Also, in order to develop a simulator, a researcher must have extensive programming knowledge in order to apply all the necessary programming techniques and different optimization strategies or algorithms. In general, both a scientific workflow management system and a simulator are sophisticated tools directed at the execution of scientific workflows and the production of simulation results. While a SWfMS is more workflow-oriented, as it allows a researcher to build as many workflows as needed to test ideas, a simulator is more result-oriented, as it allows a researcher to simply execute the workflow and get the execution result.

3.2 Problem definition

Historically, high performance computing (HPC) environments are considered the main execution environment for scientific applications. Universities and research centers often maintain local HPC infrastructures, i.e., clusters. While local infrastructures can hold substantial computing power and capabilities, researchers may still reach the limit of available resources. Consequently, they can confront the challenge of accommodating large volumes of data or scheduling the required number of jobs. The first concern we identify here is therefore the limitation of computing power and resources for scientific applications. One strategy to deal with this concern is to use a data center big enough for any computing burst. However, this strategy comes at a high cost. Another strategy is to off-load the local HPC infrastructure to remote cloud-based resources on demand. However, such a migration is a complicated task due to significant architectural differences and software and hardware incompatibilities. Moreover, scientific applications are dependency-rich and may rely on external libraries. Hence, we identify the second concern to be the migration of scientific applications to clouds regardless of the environments' differences and all existing incompatibilities. Considering the "Laws of Software Evolution" postulated by Prof. Manny Lehman [102] in 1996, the adaptation of scientific applications to emerging technologies can be subject to Lehman's laws, specifically to:

1. The law of continuing change indicates that software becomes less appealing to its users over time when it does not adapt recurrently to arising needs.
2. The law of declining quality indicates that software can be perceived as declining in quality, or as a legacy application, over time if it is not adapted to a changing execution environment.
3. The law of feedback system states that, to evolve successfully, software must be recognized as a multi-loop, multi-agent, multi-level feedback system. This law clearly indicates the importance of user feedback for software evolution, considering how difficult it becomes to change or improve software over time due to the complexity of its artifacts, processes, and agent interactions involved in a software change.

So far, there is straightforward evidence of the law of continuing change, as new sophisticated technologies – e.g., clouds, big data, service-oriented technologies – evolve every day, rapidly enter the market, and gain momentum because they can satisfy current users' needs. However, many of the current scientific applications face the problem of adapting to new operational constraints (described well by the law of declining quality) as well as the difficulty of considering and adopting the users' feedback (described well by the law of feedback system). Specifically, such systems can be recognized as legacy applications. Al Belushi et al. [15] defined legacy applications as the ones that were built without incorporating new technologies and that resist adaptation to those technologies. In general, legacy applications can be reliable, secure, and widely used because they are well tested and reach current objectives. It is important to emphasize that legacy applications are mostly tightly coupled to a specific platform or programs. Hence, their integration with other heterogeneous software can be challenging, which in turn can substantially limit the number of users and further scalability. This is certainly true in the case of numerous scientific applications, e.g., simulators. Therefore, we consider the third concern to be the resistance of scientific applications to adaptation to new technologies.

Overall, academia could substantially benefit from the adaptation of legacy scientific applications to clouds, because a cloud can provide researchers and scientists with fault tolerance, scalability, reliability, and a more elastic infrastructure. Generally, cloud applications are intended to operate on a large scale. Moreover, the cost model is straightforward, as scientists can access massive computing resources and pay only for what they use. Considering the aforementioned concerns, we identify the key problem to be solved in this thesis as follows: How can we successfully migrate cluster-based scientific applications to clouds, when these applications are built without incorporating new technologies and resist adaptation to those technologies? Hence, the major goal of this thesis is to provide a cloudification solution for adapting scientific applications to continuous changes of an execution environment. To achieve this goal, I propose to analyze the main differences between computing environments, specifically HPC and clouds, define common types of scientific problems, elaborate a solution, i.e., cloudification strategies, apply the solution to a common representative, and evaluate the final results.

3.3 Towards the paradigm shift

Historically, simulators are HPC-based. Since the 1970s, the HPC market has offered researchers and scientists high performance, parallelization, and scalability. Over time, building a simulator for HPC environments became more of a tradition than a necessity. Even though the HPC environment is well developed and well studied, there is still always a demand for even bigger computing power. In this regard, some simulation problems could easily benefit from cloud computing and boost the research in certain domains due to the increased scale of simulations. Considering this potential, in the following subsection, the two computing environments, i.e., HPC and cloud, will be compared and analyzed based on their key features. Then the simulation problem types will be analyzed and classified in Subsection 3.3.2. Based on these analyses, common aspects of modern simulators will be presented and aligned with the most beneficial computing environment in Subsection 3.3.3.

3.3.1 Analysis of the key features of execution environments In Table 3.1, the most distinct and relevant features that are generally found in two computing environments, i.e., cluster and cloud, are presented.

Feature                   Cluster                Cloud
Network Connection        Low Latency            Medium Latency
Type of Resources         Homogeneous            Heterogeneous
Scalability               Fixed                  High
Storage                   Parallel FS            Object Storage, Local Storage, Network FS
Provisioning              Physical Deployment    Cloud Provider
Virtualization Overhead   No                     Yes
Dominant Paradigm         MPI                    MapReduce, Service Oriented

Table 3.1: Comparison of the key features generally found in two computing environments.

Before going into the classification of simulation problem types, it is important to analyze what types of computational problems best fit each computing infrastructure. The ability to parallelize a computational problem is tightly interconnected with the decomposition of the original problem, specifically the degree of coupling between sub-problems. The degree of coupling is defined by the number of dependencies between sub-problems, and it varies between loose coupling and tight coupling. In general, the more interconnections and interdependencies there are between sub-problems, the tighter the coupling is likely to be. A good example of a loosely-coupled process is asynchronous communication between two systems, in which system A sends a message to system B. System B may be offline; it gets the message when it comes online, processes it, and sends a response. Meanwhile, system A does not wait for an immediate response and continues performing other tasks. As neither system depends on the other, both continue their operations without being blocked. According to Yourdon et al. [176], there are four key factors that can increase or decrease coupling, i.e., the type of connection (minimally connected systems are loosely-coupled), the complexity of the interface (the more complex a connection is, the higher the coupling), the type of information flows (data, control, or hybrid), and the binding time (the time of fixing the values of system identifiers).
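The system A/system B example above can be sketched in a few lines of Python, using an in-process queue as a stand-in for a message broker; the names and timing are illustrative, and the point is only that the sender never blocks on the receiver, which is what keeps the two systems loosely coupled.

```python
import queue
import threading
import time

mailbox = queue.Queue()  # stands in for an external message broker

def system_b():
    time.sleep(1.0)                  # B is initially "offline"
    while True:
        message = mailbox.get()      # picks up messages accumulated meanwhile
        if message is None:
            break                    # shutdown marker
        print(f"B processed: {message}")

def system_a():
    for i in range(3):
        mailbox.put(f"request {i}")  # fire-and-forget send, A never waits for B
        print(f"A sent request {i} and continues with other work")
    mailbox.put(None)

threading.Thread(target=system_b).start()
system_a()
```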

Clusters rely on low-latency networks that connect the distributed computing resources, on homogeneous hardware, and on a high performance parallel file system. Hence, clusters are perfectly capable of solving any kind of problem, from loosely- to tightly-coupled. However, the fixed size of the infrastructure might significantly limit the scale of the experiments to be executed. In other words, even if a problem is loosely-coupled and able to effectively use more computing resources, the user is always bound to the amount of physically available resources. On the other hand, clouds provide the consumer with a potentially unlimited amount of computing resources on demand, which should theoretically remove the barrier for the scale of experiments to be executed. However, the communication may become expensive due to the physical distribution of data centers, regular-speed network connections between them, and little to no control over the actual location of leased resources, especially in public clouds. Moreover, the virtualization overhead and low performance network storage may significantly influence the performance of a distributed application.

This subsection provided a comparison of the key features generally found in the two computing environments, i.e., cluster and cloud, classified computational problems based on the degree of the original problem decomposition, and matched problem types with the appropriate computing infrastructure. In the next subsection, the simulation problem types will be analyzed and classified.

3.3.2 Classification of the simulation problem types There are four main types of simulation problems defined in the literature, of which only two (i.e., equation-based and agent-based simulations) are widely adopted and well studied, as they are the most directly related to the laws of physics and nature. The other two (i.e., multiscale and Monte Carlo simulations) require the usage of more advanced methods and techniques; hence, there is still room for further studies [28, 130, 151].

Equation-based simulations The equation-based simulation adopts a model consisting of a set of equations, which define and govern the entire system [130]. Equation-based simulations can be of two types, either particle-based or field-based. The former type adopts a model with an n-number of particles and a set of differential equations governing the interactions among the particles, e.g., a simulation of a galaxy formation [178]. The latter type is based on a model with a set of equations governing the time evolution of a field, e.g., a simulation of a meteorological system or a tsunami [178]. Figure 3.1 depicts an example of the inter-process communication within an abstract equation-based simulation. Normally, this type of simulation is not well suited for parallelization, as the system must be treated as a whole. Even if the main problem domain can be decomposed and the governing equations are applied to its sub-domains in parallel, the reverse process is not trivial. The process of merging sub-domains into the original problem domain might involve a significant computational and communicational effort.
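As a toy field-based illustration (unrelated to the simulators discussed later in this thesis), one explicit time step of the 1-D heat equation shows why the field must be advanced as a whole: every grid point needs its neighbours' values at every step, so decomposing the domain immediately introduces communication along the sub-domain boundaries.

```python
# One explicit time step of u_t = alpha * u_xx on a uniform 1-D grid.
alpha, dx, dt = 0.1, 1.0, 0.1
u = [0.0, 0.0, 1.0, 0.0, 0.0]       # initial temperature field

def step(u):
    new_u = u[:]                     # boundary values stay fixed
    for i in range(1, len(u) - 1):   # each point depends on both neighbours
        new_u[i] = u[i] + alpha * dt / dx**2 * (u[i + 1] - 2 * u[i] + u[i - 1])
    return new_u

for t in range(10):
    u = step(u)
print(u)
```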

Figure 3.1: An example of inter-process communication within an abstract equation-based simulation.

Agent-based simulations An agent-based simulation adopts a model consisting of a number of agents. In general, agents are autonomous and interact with each other and with their environment according to a set of decision-making rules, so that higher-level system behavior can be observed. A simple example of agent-based modelling is cellular automata, which comprise a grid of cells that can be in one of several discrete states governed by transition rules. Each cell can have an influence on neighboring cells, so that emerging behavioral patterns can be observed. Moreover, there are some similarities between agent- and particle-based simulations considering the behavior of an n-number of individuals. However, the agent-based simulation is considerably different, because it is not governed by global system-level differential equations [130, 151, 178]. Figure 3.2 depicts an example of the inter-process communication within an abstract agent-based simulation. This type of simulation might be well suited for parallelization, as each agent represents a distinct element of a system with its own governing rules. However, such simulations often imply a large amount of communication between agents. Therefore, there is a considerable demand for high performance communication channels, otherwise the communication might easily become a bottleneck for the entire system.
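The cellular-automaton example mentioned above can be sketched as follows: an elementary one-dimensional automaton, chosen purely for brevity, where each cell plays the role of an agent whose next state is the XOR of its two neighbours. A global pattern emerges from purely local transition rules, without any system-level equation.

```python
# Elementary 1-D cellular automaton (rule 90): each cell's next state is the
# XOR of its two neighbours; the grid wraps around at the edges.
def step(cells):
    n = len(cells)
    return [cells[(i - 1) % n] ^ cells[(i + 1) % n] for i in range(n)]

cells = [0] * 15
cells[7] = 1                                   # a single live cell in the middle
for generation in range(8):
    print("".join("#" if c else "." for c in cells))
    cells = step(cells)
```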

Figure 3.2: An example of inter-process communication within an abstract agent-based simulation.

Multiscale simulations While the two previous types (equation-based and agent-based) are typical and based on one particular method, a multiscale simulation model is a hybrid of multiple modeling methods at different scales. A good example of a multiscale simulation is a model that treats a certain material (e.g., fluids, solids, polymers) like a field, simulates this field under some conditions (e.g., stress, strain), decomposes it into distinguishable micro-elements, and models the elements where important small-scale effects take place. A multiscale simulation is well suited to treat two types of problems. The first problem relates to notable events happening locally, e.g., chemical reactions or singularities. To approach this problem, there is a need to model the local behavior as well as the entire system dynamics under this event. The second problem correlates with the lack of constitutive information in the entire system model. To approach this problem, the missing information must be obtained through microscale (i.e., a certain system element) modeling [81, 167]. Figure 3.3 depicts an example of the inter-process communication within an abstract multiscale simulation. The ability to properly parallelize such simulation problems fully depends on the actual sub-simulations at different scales and their interconnections. In some cases, such problems might be perfectly parallelized by initially running a fast, low-precision model of the entire problem domain, and then executing long-lasting, high-precision models of only the specific sub-domains in parallel. In other cases, the simulation might involve a set of sequential workflow steps to properly treat the transitions between the low-precision and high-precision models.
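A hedged sketch of the first parallelization strategy mentioned above: a cheap, low-precision model of the whole domain flags the sub-domains where something interesting happens, and only those sub-domains are refined with an expensive high-precision model, independently and hence in parallel. Both models below are placeholders, not real physics.

```python
from concurrent.futures import ProcessPoolExecutor

def coarse_model(subdomains):
    # placeholder scores standing in for a fast, low-precision solve
    return {name: len(name) % 3 for name in subdomains}

def fine_model(name):
    # placeholder for a long-running, high-precision sub-simulation
    return name, sum(i * i for i in range(100_000))

if __name__ == "__main__":
    subdomains = ["north", "south", "east", "west"]
    scores = coarse_model(subdomains)
    flagged = [name for name, score in scores.items() if score >= 2]
    with ProcessPoolExecutor() as pool:       # refine flagged sub-domains in parallel
        refined = dict(pool.map(fine_model, flagged))
    print(refined)
```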


Figure 3.3: An example of inter-process communication within an abstract multiscale simulation.

Monte Carlo simulations Another type of problem is the Monte Carlo simulation. It covers dynamic systems with random processes that cannot be modeled but only predicted by the values of statistics and probability. In Monte Carlo simulations, randomness is a mechanism to obtain numerical results via the repeated execution of a single sampling procedure provided with random parameters and/or random time-function inputs. A simple example is the calculation of the number π by simulating the dropping of random objects into a square with a circle inscribed in it: the proportion of objects landing inside the circle will be π/4 [178]. Figure 3.4 depicts an example of the inter-process communication within an abstract Monte Carlo simulation. A distinct feature of all Monte Carlo simulations is that they consist of a large number of identical but completely independent processes, which can be trivially parallelized and distributed over any kind of computing infrastructure.
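The π example above translates directly into a short Monte Carlo sketch: independent sampling batches are distributed over worker processes and only their hit counts are merged at the end, which is exactly the embarrassingly parallel structure that makes this class of simulations so easy to distribute. The batch sizes are arbitrary.

```python
import random
from multiprocessing import Pool

def sample_batch(n):
    # Drop n random points into the unit square and count how many fall inside
    # the inscribed quarter circle; each batch is completely independent.
    hits = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

if __name__ == "__main__":
    batches, batch_size = 8, 100_000
    with Pool() as pool:                              # trivially parallel sampling
        hits = sum(pool.map(sample_batch, [batch_size] * batches))
    pi_estimate = 4.0 * hits / (batches * batch_size)
    print(pi_estimate)
```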

Figure 3.4: An example of inter-process communication within an abstract Monte Carlo simulation.

In this section, four major types of simulation problems were presented. In summary, both equation- and agent-based simulations are widely adopted approaches, which simulate the system by constructing a model consisting of, respectively, either equations or agents, and executing it on a computer. A multiscale simulation represents a combination of multiple models at different scales, whereas a Monte Carlo simulation is a unique approach that covers dynamic systems with random processes that can only be predicted by the values of statistics and probability.

3.3.3 Common aspects of modern simulators Now that the simulation problem types are defined, we can develop a generic cloudification methodology, which will allow us to adapt and migrate legacy scientific applications to clouds and gain maximum results from that migration. However, it is important to mention that the internal organization, technological choices, and implementation of existing workflows might differ considerably. Therefore, the cost and effort of migrating scientific applications to clouds might be equal to reimplementing the workflow from scratch. Hence, to avoid this situation, we propose to look at the internals (i.e., complexity and parallelization layers, internal communication patterns, and programming models) of existing simulators in order to better understand the potential challenges of the migration strategy.

3.3.3.1 Complexity and parallelization layers As mentioned in Section 3.1, simulators have become a key tool for scientists working with complex systems and multiphysics models. Nowadays, the computational complexity of these models has increased notably, threatening the scalability of the addressable simulation size, the number of runnable scenarios, and the time required to obtain the results. The complexity is further aggravated by a continuously increasing amount of input data originating from geographically distributed sources like sensors, radars, or cameras. Every workflow has diverse resource requirements and constraints. Workflows can vary in their characteristics, e.g., size, constraints, resource usage, structural patterns, data patterns, and usage scenarios. Altogether, these characteristics signify the level of complexity a workflow may have. Ramakrishnan et al. [139] classified workflows regarding their characteristics and overall complexity and presented examples from different scientific domains: bioinformatics and biomedicine, astronomy and neutron science, weather and ocean modeling. It is important to emphasize that workflow complexity has reached a level at which several additional sub-workflows need to be run in order to execute one key workflow. For example, the Linked Environments for Atmospheric Discovery (LEAD) workflow is directed at supporting a dynamic and adaptive response to severe weather conditions. The workflow's major inputs are real-time observational data and static terrain data, while the workflow outputs are the results of data-mining post-processing. Overall, LEAD's workflow requires the execution of two sub-workflows: weather forecasting and data post-processing by a data mining component, which can also trigger the execution of two supplementary forecast workflows or even steer remote radars for extra localized data [139]. Another example can be taken from the molecular sciences. In the drug-design process, it is important to understand the three-dimensional structure of protein and ligand atoms. Gemstone is an interface consisting of a set of tools that allow users to analyze and visualize these atomic structures. The Gemstone workflow consists of a set of sub-workflows launched manually in a sequential way. The first stage of the Gemstone workflow is to download data from the Protein Data Bank database and convert it into the required format. Then the data is preprocessed concurrently by the Babel and LigPrep services. Next, the analysis of the preprocessed data is conducted by GAMESS and APBS. In the end, the results are visualized using the QMView service [139]. We can see the dependencies of the workflows described above as a layered structure, where the lowest layer covers simple mathematical operations like matrix operations, and the highest layer comprises a complex multi-physics simulation problem. The computational infrastructure can be perceived as a layered structure as well, where the lowest layer consists of local resources like the CPU, and the highest layer includes a distributed environment like clouds (Figure 3.5). Of course, each layer has its own characteristics, which should be taken into account in the application development process. Ideally, a problem should have multiple parallelization layers. If each layer is properly parallelized, then the highest efficiency and resource utilization can be achieved.


Figure 3.5: Infrastructure Parallelization Layers and Problem Parallelization Layers

3.3.3.2 Internal communication patterns Every computational problem consists of interconnected algorithms, which work at different layers. Altogether, these algorithms form the final workflow. Considering the algorithms' interdependency and the layered structure of a workflow, it is possible to abstract the communication between workflow elements into a communication pattern. To define a pattern, it is worth citing [17], who coined the term "pattern language": "Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice". Workflow patterns have been a widely investigated topic both in academia and in industry interested in formalizing the behavior of business processes [45, 69, 139, 159, 169]. Gamma et al. [69] proposed a set of 23 design patterns in the field of object-oriented technologies that can be categorized into 3 groups: creational, structural, and behavioral patterns. For the Business Process Management community, van der Aalst et al. [159] elaborated a collection of control-flow patterns that can be categorized into 6 groups: basic control flow patterns, advanced branching and synchronization patterns, structural patterns, multiple instance patterns, state-based patterns, and cancellation patterns. Certain authors offered only basic patterns [45, 139], while other authors also examined advanced patterns not supported by today's generation of workflow management systems [158]. Here we present 7 basic workflow patterns (Figure 3.6):

1. Sequential pattern. The sequential pattern can be described as a sequence of workflow tasks following one another, where each new task is enabled only after completion of the previous task. The sequential pattern is implemented to design consecutive steps in a workflow process.
2. Iterative pattern. In the iterative pattern, a workflow task is repeated multiple times, but the parameters of each new iteration are based on the results of the previous iteration.
3. Parallel-split pattern. The parallel-split pattern consists of one task that splits itself into multiple tasks executed in parallel. The results of the first single task serve as an input for the subsequent parallel tasks.
4. Parallel pattern. The parallel pattern consists of multiple tasks that can be executed independently. Such a pattern appears due to a specific logic of a workflow (e.g., parallel-split) or input data that arrives at all the tasks at the same time.
5. Parallel-merge pattern. The parallel-merge pattern consists of multiple tasks executed in parallel and then merged into one task. The parallel tasks might differ in parameters and/or operations performed, but when the tasks are completed, the results are fed into the next single task.
6. Dynamic pattern. In the dynamic pattern, a task might produce additional tasks at runtime under certain conditions. A priori it is unknown whether and how many new additional tasks might be produced.
7. Multi-stage pattern. The multi-stage pattern consists of multiple tasks running in parallel; however, there are certain synchronization points at which these tasks stop their execution, exchange data, and then resume.

This basic classification is the most relevant to scientific workflows as it provides only a high-level view of the workflow sub-processes and clearly shows their interdependencies without over-complicating them [45, 139].
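Three of these patterns (parallel-split, parallel, and parallel-merge) can be sketched in a few lines with Python's standard concurrent.futures module; the splitting rule, the per-chunk computation, and the merge step are arbitrary placeholders used only to show the shape of the pattern.

```python
from concurrent.futures import ThreadPoolExecutor

def split_task(data):
    return [data[i::3] for i in range(3)]        # parallel-split: one task fans out

def parallel_task(chunk):
    return sum(x * x for x in chunk)             # independent parallel work

def merge_task(partial_results):
    return sum(partial_results)                  # parallel-merge: results fed into one task

data = list(range(1_000))
chunks = split_task(data)
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(parallel_task, chunks))
print(merge_task(partials))
```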


Figure 3.6: Basic Workflow Patterns

3.3.3.3 Programming model According to Subsection 2.1.2, there are a number of key HPC programming models, i.e., MPI, HPF, and OpenMP. Among them, MPI is widely recognized as the de facto standard for HPC applications. It is important to underline that the performance of an MPI-based application might be significantly affected by different factors like network latency and bandwidth characteristics, file system performance, and the homogeneity of computing resources. To be most effective, MPI requires a low-latency broadband network (e.g., InfiniBand, Myrinet 10G) to exchange data by sending and receiving messages uninterruptedly, a high performance parallel file system (e.g., Lustre) to share large files across multiple nodes, and homogeneous computing resources to achieve higher resource utilization. Considering Table 3.1, these requirements are most relevant to clusters. Furthermore, it is worth mentioning that MPI lacks integrated fault-tolerance and elasticity. Moreover, it cannot adapt dynamically to infrastructure changes. Altogether, this makes MPI not the best technological choice for clouds; hence, running MPI-based applications directly in a cloud environment might significantly hurt the performance. To prove this statement, two practical evaluations of the performance drop will be presented in Subsections 4.4.3 and 5.4.3. Hence, the cloud execution of MPI-based applications might require certain changes and a possible redesign of the applications or some of their components.
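For illustration, the message-passing style that MPI prescribes looks as follows with the mpi4py bindings (assuming mpi4py is installed and the script is launched with a process manager such as mpirun); this is a minimal point-to-point exchange, not one of the simulators discussed in this thesis.

```python
# Run with, e.g.:  mpirun -n 2 python example.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Blocking point-to-point send of a (pickled) Python object to rank 1.
    comm.send({"payload": [1, 2, 3]}, dest=1, tag=11)
elif rank == 1:
    data = comm.recv(source=0, tag=11)
    print(f"rank 1 received {data}")
```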

Overall, this subsection overviewed the computational complexity, parallelization layers, internal communication patterns, and programming models of modern simulators. Considering the classification of the simulation problems described in Subsection 3.3.2, we propose to concentrate on Monte Carlo simulators. These simulators may vary considerably, because they cover a wide range of scientific problems [36, 103, 132, 163] from distinct domains, i.e., physics, chemistry, biology, computer science, and electronic engineering. However, all these simulators share one particular similarity: they are based on the Monte Carlo method. Monte Carlo simulators use the same computational algorithm, which requires a large random sampling to determine the properties of some phenomenon or system behavior. An example of such a simulator is the EnKF-HGS simulator [101], which will be presented in the following Section 3.4.

3.4 Case study: hydrological ensemble Kalman filter simulator (EnKF-HGS)

In the domain of hydrology, recent technological and mathematical advances allow researchers to significantly improve the precision of simulations by integrating measurement data in the modeling process [25]. One of the state of the art simulators for this purpose is the compute- and data-intensive EnKF-HGS simulator [101], which was originally developed for execution on clusters. EnKF-HGS performs real-time stochastic simulations, which are continuously improved by assimilating the most recent field measurements, with the possibility for additional real-time control of water resource systems. EnKF-HGS introduces a high computational demand due to the high nonlinearity of the simulated processes. Moreover, the adaptation of the EnKF-HGS simulator to new technologies is impeded by its two proprietary simulation kernels. Therefore, we consider EnKF-HGS to be a common representative of scientific applications, which has a continuously increasing demand for computing power but faces challenges in migration to clouds. In this section, we will present the EnKF-HGS simulator in more detail and use it as a test scientific application throughout the rest of the thesis.

3.4.1 Simulator description

Data assimilation is the process of incorporating observations of an actual system into the model state of a numerical model of that system. In the EnKF-HGS simulator, this is done via the ensemble Kalman filter [30, 58]. The ensemble Kalman filter was first introduced by Evensen [58] in 1994; since then it has been widely examined, applied in different studies, and has gained popularity due to its simple conceptual formulation and relative ease of implementation [59]. Moreover, its computational requirements are affordable and comparable with those of other popular sophisticated assimilation methods, e.g., the 4DVAR method [109] and the representer method by Bennett [118]. The ensemble Kalman filter is used to effectively merge uncertain model predictions with uncertain observation data in a Bayesian sense. The uncertainty of model predictions is approximated through the forward simulation of an ensemble of model realisations, where each realisation can have a different combination of initial conditions, model forcings and model parameters. The model state vector x for each model realisation i at time step t (where observations are available) is derived by forward propagation of the dynamical model M, using as input the state vector from the previous time step (t−1), the parameters p (e.g., hydraulic conductivity, porosity) and the hydraulic forcings q (e.g., water level in a river). Parameters and forcings are different for each model realisation:

$$x_i^t = M\left(x_i^{t-1}, p_i, q_i\right) \qquad (3.1)$$

First, we would like to overview the integrated hydrological modelling software HydroGeoSphere (HGS) and its numerical model (see 3.4.1.1). Then we will present the advantages of the ensemble Kalman filter (see 3.4.1.2). At the end of this subsection, the challenges of this method will be described (see 3.4.1.3).

Figure 3.7: Typical surface water and groundwater processes in a pre-alpine type of valley [93].

3.4.1.1 Numerical model

The latest generation of numerical models is able to simulate the interactions between surface water and groundwater in a fully coupled way [29]. One of the most advanced codes in this respect is HydroGeoSphere (HGS) [153]. HGS is a numerically demanding code that implements a 3D control-volume finite element hydrology model with fully integrated surface-subsurface water flow and thermal energy transport. It provides functionality for dynamic stochastic simulations between water profiles composed of numerous elements, as depicted in Figure 3.7. In addition to simulating surface water and groundwater interactions, the code can also simulate vegetation dynamics as well as the recharge¹ and discharge processes in response to precipitation, evapotranspiration or groundwater abstraction [129, 145]. HGS uses the control volume finite element method to solve flow equations for all domains considered in a simulation. Non-linear equations are solved by means of the Newton-Raphson linearisation method, and the matrix equation arising from the discretization is solved by a preconditioned iterative solver. As shown in Eq. 3.2, the execution control is conducted with a variable time-stepping procedure. This is defined in HGS according to the rate of change of a solution variable (e.g., flow, temperature), so that the simulation uses increasingly larger time steps to reduce the number of steps towards convergence, and thus the computing time, if the dependent variable V does not experience drastic changes in several iterations:

$$\Delta t^{k+1} = \frac{V_{\max}}{\max_i \left|V_i^{k+1} - V_i^{k}\right|}\, \Delta t^{k} \qquad (3.2)$$

¹ Water infiltrating the soil and reaching the underground water table.
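As a minimal illustration of the time-stepping rule in Eq. 3.2 (an illustrative sketch only; the names dt_k, v_new, v_old and v_max are assumptions, and the real HGS code applies additional controls), the next time step grows when the solution variable barely changes and shrinks when it changes strongly:

    import numpy as np

    def next_time_step(dt_k, v_new, v_old, v_max):
        # Eq. 3.2: scale the previous step by the allowed change v_max
        # divided by the largest observed change of the solution variable.
        max_change = np.max(np.abs(v_new - v_old))
        return dt_k * v_max / max_change

    # Small change in V (0.25 at most) with an allowed change of 0.5:
    dt = next_time_step(dt_k=10.0,
                        v_new=np.array([1.25, 1.0]),
                        v_old=np.array([1.0, 1.0]),
                        v_max=0.5)
    print(dt)  # 20.0, i.e., the next time step is twice as large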

3.4.1.2 Data assimilation vs. “classical” model calibration

Data assimilation is the process of adjusting the simulated numerical model by continuously incorporating newly available environmental field measurements. Normally, the geometric setup is based on a high resolution digital terrain model. The numerical coupling between the surface and subsurface domains in HGS is conceptualized through a dual node approach, as described by Therrien et al. [153]. The model requires a very large number of parameters, such as the hydraulic properties of the streambed, the soil or the aquifer. These parameters cannot be measured in the field at the required spatial resolution. Therefore, they must be estimated. Numerous approaches are available in this regard. A “classical” way is to adjust parameters in order to minimize the mismatch between the available historical measurement data and the corresponding model simulations. Once a model reproduces historic measurement data satisfactorily, it is used to predict future system states under changing forcing functions. However, all numerical models are a simplification of reality, both in terms of the considered processes as well as in their parameterization. Therefore, any calibrated model will sooner or later deviate from the real, physical system state. Clearly, the model state (i.e., the simulated water levels or the actual discharge in the river) has to be as close as possible to the real system in order to provide reliable predictions on how a planned pumping scheme will affect the system in the near future. Therefore, the “classical” calibration approach is not well suited for this application. By using a data assimilation approach, the model is continuously updated in terms of its state and parameters, which allows EnKF-HGS to minimize the deviation of the model from the simulated system.

3.4.1.3 Monte Carlo simulation

The ensemble Kalman filter is an implementation of the Monte Carlo method that consists of two distinct steps: (i) the forward propagation of the ensemble of model realisations (i.e., the forward propagation phase), and (ii) an update of the simulated model state with the measurements (i.e., the filtering phase). The filtering phase is a relatively short-lasting but tightly-coupled process that performs a set of matrix operations and requires multiple data synchronization points. On the other hand, the forward propagation phase comprises a large pool of independent model realizations, which introduces a tremendous demand for computing power, especially if combined with a complex numerical model such as HydroGeoSphere. On top of that, the iterative nature of the data assimilation process requires that the two phases be repeated continuously, thus increasing the demand even further. Even though this method results in higher quality model predictions than the “classical” simulation methods, its high resource demand remains an unsolved problem for many environmental scientists. This makes the EnKF-HGS simulator a perfect representative of a complex and compute-intensive scientific application. EnKF-HGS performs the two aforementioned steps of the ensemble Kalman filter with a couple of implementation specific auxiliary steps shown in Figure 3.8. Without a carefully designed parallelization strategy, obtaining any simulation results within a reasonable execution time would be absolutely infeasible.
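For reference, the filtering phase corresponds to the standard ensemble Kalman filter analysis step (cf. Evensen [58]); a schematic form, written here with the usual textbook notation rather than thesis-specific symbols (forecast and analysed states of realisation i, perturbed observations d_i, observation operator H, ensemble forecast covariance P^f, observation error covariance R), is:

$$x_i^{a} = x_i^{f} + K\left(d_i - H x_i^{f}\right), \qquad K = P^{f} H^{\top}\left(H P^{f} H^{\top} + R\right)^{-1}$$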

In summary, this subsection described the internal functionality of the EnKF-HGS simulator, i.e., a numerical model, the process of adjusting the model to environmental changes, and the use of the Monte Carlo method in the context of water resource management. Overall, the EnKF-HGS simulator can provide higher quality predictions but at the same time it demands a large amount of computational resources. Altogether this makes the EnKF-HGS simulator a typical representative of complex scientific applications. The aim of the next subsection is to present in more detail the internals of the EnKF-HGS simulator, i.e., its implementation and execution model.

Figure 3.8: Main operations of the EnKF-HGS workflow: read control parameters, read stochastic input, split model realizations and simulate the ensemble (forward propagation phase), read observations and write ensemble statistics (filtering phase), and write user-defined output. The figure distinguishes I/O operations from EnKF operations.

3.4.2 Implementation and execution model

EnKF-HGS is a C program that uses the Message Passing Interface (MPI) framework for parallelizing the ensemble forward propagation and the filtering phases in order to speed up calculations. The parallelization is done by distributing different realisations among the available CPUs, so that each CPU handles the forward propagation and filtering phases of a specific subset of the whole data assimilation problem. It is worth mentioning that one simulation of the HydroGeoSphere model comprises the sequential execution of two proprietary simulation kernels: GROK and HGS [29, 152]. GROK is a preprocessor that prepares the input files for HGS, which makes it an I/O-intensive application. HGS, on the other hand, is an integrated hydrological modelling simulator, which mainly relies on the CPU to solve the aforementioned differential equations. The interfacing between EnKF and HGS is done via the input and output files of HGS. Figure 3.9 shows the execution model of the MPI implementation of EnKF-HGS, depicting the following stages:

Ensemble preparation stage

A. Initialization
At the beginning of the workflow, the root MPI process initializes global data structures and reads the provided model parameters from the input files.

B. Data distribution
According to the initial model parameters, the root MPI process generates input files for each model realization and stores the files in separate directories on the network storage.

Figure 3.9: Execution model of the MPI-based implementation of EnKF-HGS. Auxiliary operations: (A) initialization, (B) data distribution, (C) aggregation and persistence. EnKF operations: (1) data pre-processing (GROK), (2) model simulation (HGS), (3) distributed post-processing. Processes synchronize via MPI_Barrier and MPI_Allreduce calls.

Iterative ensemble simulation

1. Data pre-processing (GROK)
After all running MPI processes reach the first synchronization point (an MPI barrier), each process executes the preprocessor GROK on the corresponding input directory in order to generate HGS-specific input files that are written to the network file system.

2. Model simulation (HGS)
HGS reads the output of GROK, runs the model realization and writes the output to the network file system. At this point, all MPI processes synchronize for the second time using an MPI all-reduce directive, since further data updates require the simulation results of all model realizations.

3. Distributed post-processing
During the data update process, each model realization is optimally weighted and updated with the most recent field measurements in order to reduce the simulation error. Each process is in charge of updating its block, and the results are afterwards merged through another MPI all-reduce call.

Output management

C. Aggregation and persistence
After all iterations are completed and all MPI processes reach the barrier, the root MPI process aggregates the model simulation data from the realization directories and updates the global data structures. Before the program terminates, the root MPI process writes the model simulation results to the output files.
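The following is a deliberately simplified sketch of this execution model using the mpi4py bindings (an assumption; the actual simulator is written in C, the kernel invocations are replaced here by a stub, and the EnKF update is reduced to a toy operation). It only illustrates how the barrier and all-reduce synchronization points frame the per-realization work:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    def run_kernels(state):
        # Stand-in for the per-realization GROK + HGS pipeline, which in the
        # real simulator runs two external binaries on local input files.
        return state + 0.01 * np.random.rand(state.size)

    state = np.full(4, float(rank))          # toy per-realization model state

    for _ in range(3):                       # iterative ensemble simulation
        comm.Barrier()                       # first synchronization point
        state = run_kernels(state)           # steps 1 and 2: GROK and HGS
        # Steps 2 and 3: the filtering step needs results from all
        # realizations, hence the collective all-reduce across processes.
        ensemble_sum = np.empty_like(state)
        comm.Allreduce(state, ensemble_sum, op=MPI.SUM)
        state = 0.5 * (state + ensemble_sum / comm.Get_size())  # toy update

    if rank == 0:
        print("final state of realization 0:", state)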

It is important to emphasize that the EnKF-HGS simulator can also be divided into two major parallelization layers, with reference to Subsection 3.3.3, where the computational complexity of modern simulators and their layered structure is described. The first layer is the EnKF-HGS workflow itself. Using the classification of basic workflow patterns, this layer can be represented by the following set of patterns: the (3) parallel-split and (4) parallel patterns are used in the forward propagation phase; the (7) multi-stage pattern is used in the filtering phase; and the (2) iterative, (5) parallel-merge, and (1) sequential patterns connect the two phases together. The second layer consists of the GROK and HGS simulation kernels. Because both kernels are proprietary binaries, their parallelization can be neither analyzed nor modified, and it is thus out of the scope of this thesis. Overall, the EnKF-HGS simulator, like any other application with such an execution model, performs reasonably well in regular HPC environments due to certain important characteristics of the latter (e.g., the availability of high performance network storage and a low-latency, high-bandwidth network connection). However, when moved to a cloud environment, the application performance might drop drastically (as will be illustrated in Subsections 4.4.3 and 5.4.3). In order to benefit from the advantages of clouds while maintaining the performance at an acceptable level, certain modifications of the execution model are necessary. Accordingly, each of the cloudification methodologies proposed in Chapters 4 and 5 will modify the EnKF-HGS workflow (see Subsections 4.4.2 and 5.4.2).

3.5 Summary

The aim of this chapter was to overview scientific simulations, define the problem associated with the HPC-based scientific simulators, which require high computing power but face challenges in adaptation to new technologies, and present a common representative scientific simulator. In the following Chapters 4 and 5, we will present cloudification solutions to the defined problem and evaluate them by using the EnKF-HGS simulator.

Chapter 4

Big data inspired cloudification methodology

4.1 Introduction

Considering the complexity of modern scientific applications presented in Subsection 3.3.3, it is important to emphasize how convoluted and compute-intensive current simulators are. At the same time, these simulators are becoming data-intensive too. For example, various numerical simulations in physics, chemistry, bioinformatics, and other disciplines collect, process, and store vast volumes of data necessary for experiments. All these simulations, e.g., Monte Carlo simulations, N-body solvers, and molecular dynamics, require an abundance of compute hours and generate extremely large volumes of data. For example, at the Large Hadron Collider (LHC) every particle collision experiment generates petabytes of data, but the Worldwide LHC Computing Grid will only be able to store, distribute, and analyze around 50 petabytes of data per year in 2017 [2]. These scientific workflows are composed of heterogeneous and coupled components that simulate different aspects of the domain they model. Their modules interact and exchange significant volumes of data at runtime; consequently, having efficient data transfers can considerably influence the overall performance of the resulting application [180]. Hence, both the storage infrastructure and the logical file system abstractions affect performance and scalability, thus making data management a key aspect of workflow design and implementation [157]. Given the data-intensive nature of scientific simulations, recent studies have suggested combining the traditional high performance computing (HPC) and grid-based approaches with the big data (BD) analytics paradigm [108]. For example, typical BD analytics tools, such as MapReduce, have been considered as substitutes for MPI parallelism induction mechanisms, following a data-centric approach. This may substantially affect the underlying computing infrastructures. Indeed, cloud computing, a key element in current data analytics systems, could inspire hybrid platforms for exascale scientific workflows, in which storage is not completely isolated from computing nodes [142]. Following this trend, BD infrastructures and paradigms are increasingly seen as alternatives to traditional HPC approaches for some major types of scientific applications, especially those with many loosely-coupled tasks [43], or heterogeneous tasks with few interdependences [138]. The application of these mechanisms could improve scalability in parameter-based scientific simulations, as demonstrated by Caíno-Lores et al. [33]. In particular, that study focused on increasing the addressable size and complexity of standalone scientific applications executed within map-reduce-based wrappers.

Caíno-Lores et al. [33] proposed to use BD tools to scale up scientific workflows. It is important to emphasize that there could be considerable differences between the analytics and scientific worlds that might require novel approaches to migrate scientific simulations to BD infrastructures. Hence, the suitability of these data-oriented mechanisms for scientific workflows shall be assessed first.

4.2 State of the art

Following this trend, big data infrastructures arose as alternatives to traditional HPC infrastructures for some major types of scientific applications, especially those with many loosely-coupled tasks or heterogeneous tasks with some interdependences [138]. For instance, cloud computing appeared as an affordable possibility to build flexible infrastructures on demand. This is a popular paradigm that relies on resource sharing and virtualisation to provide the end user with a transparent and scalable system. Given the benefits of cloud environments, several areas of science and industry are trying to migrate their legacy applications to clouds in order to support scalability beyond their private infrastructure. Nevertheless, it is necessary that the cloudification procedure is able to manage resources and data in such a way that scientific applications benefit from the underlying infrastructure without hurting performance and resiliency. A key aspect in large-scale computing, especially for data-intensive applications, is data locality, defined as the minimisation and optimisation of information transmissions in order to reduce transfer latencies [173]. The degree of data locality in a distributed application has a major impact on its overall scalability; hence, we believe that cloudification must pay special attention to inter-datacenter and inter-node data locality.

4.2.1 Data analytics tools

According to Zikopoulos et al. [182], big data (BD) applies to information that cannot be processed or analyzed by using traditional data analytics tools. BD can be defined by the high volume, velocity and variety of data, which require new ways of high performance processing. Supercomputing and high performance computing have been widely used by scientists to analyze large datasets and simulate mathematically complex models. With the rise of big data, standard data processing and analytics tools face considerable challenges, i.e., a growing demand for high performance, the optimization of time and cost, and new or additional algorithms to preprocess, analyze, and fit data into memory. In 2003, Google MapReduce arose as the first framework that enabled the processing of large datasets. This data analytics tool was directed at processing and generating large datasets in an automatic and distributed way by using two major primitives, map and reduce. Then Apache Hadoop emerged as the most popular open source implementation of MapReduce, which maintains the key MapReduce features. MapReduce and its well known implementations will be presented in more detail and analyzed below, while in Subsection 4.2.2 BD analytics tools will be analyzed from the perspective of their applicability to HPC environments and applications.

MapReduce

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Its first version was built by Google in February 2003; since then it was successfully tested and officially presented to the scientific community at OSDI’04, the Sixth Symposium on Operating System Design and Implementation, in December 2004 [44]. It is a simple programming interface, which enables the parallelization and distribution of large-scale computations. Even though MapReduce was primarily developed for coping with the complexity of data parallelization, distribution, and failure handling in the search engine domain, according to Dean et al. [44] it is also widely used for data-intensive applications, e.g., machine learning and data mining.

MapReduce programming model

MapReduce is based upon two major functional programming primitives, map and reduce. First, a programmer applies the map function to a potentially large input dataset. Second, the map function produces intermediate data and passes it to the reduce function. The reduce function accepts the data, merges it, and produces a smaller output dataset. Consequently, lists of values that are too large to fit in memory can still be handled [44]. The key advantage of MapReduce is that the map and reduce functions are written and executed as single-node programs, while the subsequent data parallelization and synchronization are implemented internally without any participation from the programmer’s side. In comparison, MPI can offer parallel data processing, but it requires the programmer to take care of the implementation of data management, parallelization, and synchronization [60].
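As a minimal illustration of these two primitives (a self-contained word-count sketch in plain Python, not tied to any particular MapReduce framework), the map phase emits intermediate key/value pairs and the reduce phase merges all values that share a key:

    from functools import reduce
    from itertools import groupby

    documents = ["to be or not to be", "to do is to be"]

    def map_phase(doc):
        # map: emit intermediate (key, value) pairs
        return [(word, 1) for word in doc.split()]

    intermediate = sorted(pair for doc in documents for pair in map_phase(doc))

    def reduce_phase(key, values):
        # reduce: merge all values sharing the same key into a smaller result
        return key, reduce(lambda a, b: a + b, values)

    counts = [reduce_phase(key, [v for _, v in group])
              for key, group in groupby(intermediate, key=lambda kv: kv[0])]
    print(counts)  # [('be', 3), ('do', 1), ('is', 1), ('not', 1), ('or', 1), ('to', 4)]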

Analysis of MapReduce implementations

Considering its popularity, it is important to present the MapReduce implementations for different environments that have recently evolved.

Apache Hadoop is best known for Apache Hadoop MapReduce and its distributed file system HDFS, but it also covers many related projects within the infrastructure for distributed computing and large-scale data processing. It is one of the most popular open-source MapReduce implementations that uses the distributed file system HDFS. It stores data across the local disks of the computing nodes and presents a single file system view through the HDFS API. HDFS aims to be deployed on unreliable commodity clusters and achieves reliability through the replication of data. To optimize data communication while executing MapReduce programs, Hadoop uses the data locality information from the HDFS file system to schedule computations near the data. The Hadoop architecture consists of the master node, multiple worker nodes, and a global queue for scheduling computations. The MapReduce model mitigates data transferring bottlenecks through the reduce step. To achieve fault tolerance, Hadoop duplicates executions of slower tasks. Meanwhile, Hadoop reruns the failed tasks using different workers in order to handle failures [76].

Apache Spark was originally developed by Zaharia et al. [177] in UC Berkeley’s AMPLab and open sourced in 2010. They proposed Spark as a framework that can support a class of iterative machine learning algorithms and interactive data analysis tools, which reuse a working set of data across multiple parallel operations. While supporting these applications, Spark retains the scalability and fault tolerance of MapReduce. In comparison to Apache Hadoop, Spark performance in iterative machine learning jobs can be as much as 10x higher. The Spark MapReduce implementation is developed in Scala, a statically typed high level programming language designed to interact well with Java and C# environments [123], with a functional programming interface. Spark is based on the main abstraction of a resilient distributed dataset (RDD), which represents a read-only collection of objects partitioned across machines and easily rebuilt in case of loss. An RDD is loaded in memory across a set of machines and reused in MapReduce parallel operations [177]. In comparison to Twister, an iterative MapReduce implementation, Spark is both fault-tolerant and more general, as a Spark program can define multiple RDDs and run operations on them in sequence. As a consequence of its architecture, Spark supports not only MapReduce but other programming models too.

MapReduce-MPI is an open-source framework that implements the MapReduce operation on top of standard MPI message passing. The MapReduce-MPI library is developed in C++ and can be called from high level programming languages, i.e., C++, C, and Fortran. With the MapReduce-MPI library a regular MPI program can be compiled, thus this framework requires no special support for HPC-based programs or the HPC cluster environment in general [150]. Plimpton et al. [134] analyzed the performance of MapReduce-MPI and underlined its conceptual difference from a typical cloud computing model like Hadoop. In the MapReduce-MPI case, a programmer can potentially control which processor owns which data at various stages of the computation, whereas in a typical cloud-computing model this is hidden from the user. However, MapReduce-MPI completely lacks fault tolerance or any failure-recovery mechanisms.

MARIANE (MapReduce Implementation Adapted for HPC Environments) is an academic MapReduce implementation developed in Java and suitable for HPC environments. Fadika et al. [61] designed and implemented MARIANE, capable of making use of parallel networked file systems (GPFS, NFS), shared-disk, POSIX and various cluster sub-systems. MARIANE offers not only the traditional benefits of a MapReduce framework, i.e., data synchronization, ease of programming and deployment, but also high performance under failing conditions and MapReduce applicability to a wider range of HPC-based applications. The key challenge with MARIANE is that it is a closed academic project whose implementation is not widely adopted.

MARISSA (MapReduce Implementation for Streaming Science Applications) is an academic MapReduce implementation that supports any executable binary and is suitable for large-scale scientific applications and data analysis workflows. Dede et al. [46] state that MARISSA can perform at the same level or even better than Hadoop and supports various POSIX-compliant file systems widely adopted in scientific applications. According to Dede et al. [46], MARISSA provides iterative support for an application to access its output and schedule further computations, the ability to run different input datasets on various nodes, and the capacity for all or some of the nodes to run the same task duplicates and to select a result at the reduce stage. It is worth mentioning that MARISSA is also a closed academic project and its implementation is not widely adopted.

Phoenix is a shared memory MapReduce implementation originally developed in C/C++ at Stanford University and directed at multi-core and multiprocessor systems [141]. Phoenix uses shared memory threads to implement parallelism. Compared to Hadoop MapReduce, Phoenix workers communicate by accessing a shared address space, which results in lower communication overheads. However, shared memory communication also has some limitations for large-scale parallel computation, because the way threads access memory and perform I/O can considerably impact the overall performance [175].

Twister is an iterative MapReduce implementation. Its key idea is to enable a programmer to configure map and reduce only once and then run them in as many iterations as necessary. If only one iteration is needed, the execution of a computation is identical to Hadoop MapReduce. When two or more iterations are needed, the output from the reduce function is collected by a “combine” method and sent as a new dataset of key/value pairs to the next iteration. The Twister implementation does not provide overall fault tolerance, but only fault tolerance in certain job stages. That means that a job-specific task can be restarted but not rescued if one of the computing nodes fails [54, 60].

Amazon Elastic Map Reduce (EMR) provides MapReduce as an on-demand service on top of the Amazon IaaS, i.e., Amazon EC2 [50]. EMR is a service that provides a fully managed Apache Hadoop MapReduce framework, which uses Amazon EC2 for computing power and Amazon S3 for data storage [76]. With EMR there is no need to install and configure a Hadoop cluster. A programmer can simply execute Hadoop MapReduce computations in clouds by using a web interface and a command line API. If a programmer already has an existing Hadoop program, it will require minimal changes for execution on EMR. If there is a need for job debugging, EMR provides options to upload Hadoop log files into S3 and state information to SimpleDB [76]. The EMR pricing model includes the cost for EC2 computing power, S3 data storage, an optional cost for SimpleDB, and a cost per hour of EMR service usage.

AzureMapReduce is a distributed decentralized MapReduce runtime on top of the Microsoft Azure Platform. The AzureMapReduce implementation benefits from the Microsoft Azure Platform scalability, high availability, and distributed services that help to avoid failures, bandwidth bottlenecks, and management overheads [76]. In comparison to Spark, Hadoop, Twister, etc., which are centrally controlled and use the master node to anticipate and withstand failures, AzureMapReduce is designed around a decentralized control model without a master node. Hence, it avoids a single possible point of failure [76]. Moreover, AzureMapReduce offers the capability to scale computing instances up and down in a dynamic way at any time of the MapReduce computation. It also schedules map and reduce tasks dynamically in a global queue.

Cloud MapReduce is a MapReduce implementation based on a Cloud OS. Cloud MapReduce demonstrates considerable advantages, i.e., simplicity (because it is easy to operate with MapReduce and beyond it), incremental scalability (as it allows programmers to add servers while computations run), symmetry and decentralization (because Cloud MapReduce has no master node and all its workers hold the same set of responsibilities), and heterogeneity (as each computing node could have a different computation capacity) [107].

Among all the presented MapReduce implementations, the most widely adopted and used are Apache Hadoop, Apache Spark, and MapReduce in clouds, i.e., Amazon EMR, AzureMapReduce, and Cloud MapReduce.
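To make the RDD abstraction mentioned above more concrete, the following PySpark fragment (illustrative only; it assumes a local Spark installation and is unrelated to the simulators discussed later) caches a distributed dataset in memory and reuses it across several parallel operations, which is precisely the behaviour that benefits iterative jobs:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "rdd-sketch")

    # A pair RDD kept in memory so that iterative jobs can reuse it cheaply.
    points = sc.parallelize(range(1000000)).map(lambda i: (i % 10, i * 0.5))
    points.cache()

    result = 0.0
    for _ in range(5):                    # iterative reuse of the cached RDD
        result = points.values().mean()   # distributed action on cached data

    print(result)
    sc.stop()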

4.2.2 HPC and big data paradigms convergence

Recently, several papers have been devoted to the application of BD frameworks, especially MapReduce, to the HPC world. Most attempts have tried to adapt MapReduce to use the distributed file systems available in HPC environments. For instance, Maltzahn et al. [113] studied the single name-node limitation of the Hadoop file system, since HDFS provides only one name-node to store the entire file system namespace in memory. This limitation restricts the amount of metadata that can be stored and impedes Hadoop scalability and functionality. To cope with this limitation, the object-based parallel file system Ceph has been offered as a scalable alternative to HDFS. Ananthanarayanan et al. [20] studied the performance of traditional file systems (i.e., IBM's GPFS) and compared them with distributed file systems (i.e., Google's GFS, HDFS, Kosmix's KFS) that can provide extreme scalability and reliability but lack a POSIX interface or consistency semantics. This study indicates that traditional file systems are deficient in their support of data-intensive applications due to two considerable challenges: function shipping (no support for shipping computation to data and small block sizes) and high availability (data protection techniques that restrict recovery across multiple node or disk failures). Wang et al. [166] compared the traditional compute-centric paradigm for HPC applications with the new data-centric paradigm for big data analytics and evaluated the performance of a data-centric analytics framework (i.e., Spark) running on a compute-centric HPC environment that relies on high performance file systems. Both studies shed light on performance issues and storage bottlenecks. There have also been several papers studying the performance of HPC applications in data-centric environments (i.e., clouds). For example, Evangelinos et al. [57] evaluated the performance of a scientific HPC application on Amazon EC2 and showed that the performance of the network in the cloud is worse than that of HPC by one to two orders of magnitude. This was also confirmed by Gupta et al. [78], who conducted a set of experiments on different cloud platforms, i.e., Open Cirrus and the Eucalyptus Cloud. Both studies show that the raw performance difference between compute-centric HPC and the data-centric cloud is pronounced. Recently, researchers have also shown an increased interest in the use of BD tools, i.e., Apache Spark and Apache Hadoop, to create workflow execution engines and scientific workflow management systems (SWfMS) for current state of the art workflows [31, 179]. However, this topic is still fairly new and the experience of applying these techniques is still limited. Previous studies have contributed guidelines and methodological approaches to make the design of scientific workflows easier and more efficient from a user-centric and visual perspective [56]. For example, Dede et al. [45] analyzed the migration of common HPC-oriented workflows to a BD processing platform, i.e., Apache Hadoop. In this theoretical analysis, the authors implemented six representatives of common scientific workflow patterns in the Apache Hadoop environment and discussed the implementation challenges and the applicability of the Hadoop environment for each of the basic patterns. Nuthula et al. [122], in turn, determined that for the scalability and performance required by modern simulators, it is important to avoid I/O bottlenecks in the cloudification procedure. Considering the complexity of many state of the art simulators for scientific computing, Srirama et al. [148] proposed a workflow partitioning strategy to reduce data communication in the resulting cloud deployment. In this research, the authors focused on task scheduling and the underlying peer-to-peer data sharing mechanisms required for their efficient communication.

The aim of the next section is to present a cloudification methodology and how it can be enhanced in order to migrate a wider range of scientific workflows to cloud environments with a focus on data locality. This methodology is based on a data-centric approach, with data locality playing a critical role in the final performance and scalability of cloud-based applications.

4.3 Cloudification methodology

As discussed in Chapter 3, modern simulators are usually CPU- and memory-intensive. However, their scalability is often restricted not only by the available hardware resources but also by the appropriateness of the chosen programming paradigm. The latter might become particularly crucial in the process of migrating an HPC application to a cloud environment. To cope with this restriction, Caíno-Lores et al. [33] proposed a cloudification methodology, which focuses on minimizing internal data transfers by imposing data locality through a MapReduce framework. The main objective of the methodology is to achieve virtually unlimited scalability of simulators and to allow scientists to benefit from clouds with minimal development effort while systematically modernizing their legacy applications.

4.3.1 Description of the original methodology

The key idea of the cloudification methodology is to divide an application into n simulations that run the same simulation kernel on n fragments of the original input dataset. This allows running multiple independent instances of the simulation in parallel. To divide the simulation, an expert shall evaluate the application, determine an independent variable, and use it as an index to partition the input data and the following procedures. It is important to underline that the key to this cloudification methodology resides within the application's input data, which shall be partitioned and stored as pairs (instance, parameters) in an indexed structure. The cloudification methodology consists of two steps: an analysis of the original application to find an independent variable (a partitioning key) and the cloudification of the original application, which consists of two sub-steps: adaptation and simulation. These steps are described below in more detail:

1. Application analysis: at this step, an expert analyzes the original simulation application in order to find an independent variable, and uses it as an index to partition the input data and the following procedures.

2. Cloudification: as soon as the independent variable is found, the application is considered suitable for cloudification, which leads to the two operations described below:

(a) Adaptation: this operation reads the input data, indexes its parameters by the defined independent variable, and splits the original data into fragments (independent variable, parameters). Consequently, the following simulations can run autonomously and in parallel on the partitioned data fragments.

(b) Simulation: this operation runs the simulation kernel for each partitioned data fragment, generates output files, organizes these files by the unique identifiers/keys of each data fragment, gathers the output, and provides the final results in a reduced format.
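In plain Python terms, the adaptation and simulation operations can be sketched as follows (illustrative only; run_kernel is a hypothetical stand-in for an arbitrary simulation kernel, and the independent variable is simply the fragment key):

    def adapt(raw_input, extract_key):
        # Adaptation: index the input by the independent variable and split it
        # into autonomous (key, parameters) fragments.
        fragments = {}
        for record in raw_input:
            fragments.setdefault(extract_key(record), []).append(record)
        return fragments

    def run_kernel(params):
        # Hypothetical stand-in for the real simulation kernel.
        return len(params)

    def simulate(fragments):
        # Simulation: run the kernel per fragment (the part a MapReduce engine
        # would execute in parallel) and gather the per-key outputs.
        return {key: run_kernel(params) for key, params in fragments.items()}

    fragments = adapt([(1, "a"), (1, "b"), (2, "c")], extract_key=lambda r: r[0])
    print(simulate(fragments))   # {1: 2, 2: 1}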

The relevance of this cloudification methodology is clearly supported by the case study and its promising results. For the case study, Caíno-Lores et al. [33] chose a real-world simulator [35] with the following characteristics:

1. It represents a general class of engineering simulators commonly used to test and verify different scenarios.

2. The tool requires a considerable amount of computing power to perform complex matrix operations.

3. The application is memory-bound.

During the application analysis, its input dataset was investigated. Every specification file was identified as comprising the initial and final time of the simulation. The time parameter was accepted as the independent variable. During the cloudification stage, the initial input data was adapted and rearranged into multiple datasets, each containing the necessary information to run a simulation. When the data was prepared, the application was ready to be executed as two sequential MapReduce jobs. To evaluate the performance of the cloudified simulator, Caíno-Lores et al. [33] conducted experiments in both cluster and cloud (EC2) environments. Overall, the application deployed in clouds showed outstanding results in terms of performance (e.g., the simulation execution stage was 47% faster for the largest experiment) and scalability (linear scalability is achieved under the condition that the problem size increases proportionally with the number of worker nodes) in comparison to the original application. These promising results demonstrate the viability of the cloudification methodology. In order to improve it and extend it to a wider range of applications, we propose to enhance it in the next subsection by dealing with complex iterative workflows that cannot be defined as suitable following the initial cloudification procedure.

4.3.2 Enhancing the methodology: cloudification of complex iterative workflows

The examination of an application is a critical methodology step, as an expert must evaluate the application's structure, procedures, and input data and define a partitioning key. If no independent variable can be defined, the application is not suitable for this cloudification methodology. Taking into account the complexity of modern simulators described in Subsection 3.3.3, the identification of an independent variable for the entire simulation might be considerably restricted by the need to execute not only one simulation workflow but also many additional sub-workflows, especially when they are triggered based on the outcomes of the previous sub-workflow. A good example of such a simulation is LEAD, which consists of two sub-workflows: weather forecasting and data post-processing by a data mining component, which can also trigger the execution of two supplementary forecast workflows or even steer remote radars for extra localized data [139]. To define an independent variable in such complex scientific workflows, we propose that during the application analysis step of the original methodology, an expert conducts a multilayered analysis. At the first layer, the expert evaluates the entire simulation. If it is not possible to identify an independent variable for the entire application, then at the second layer, the expert conducts a more profound analysis of the application based on its characteristics (e.g., size, constraints, structural patterns) and searches separately for an independent variable in a simulation part (e.g., a workflow, a sub-workflow, a task). We aim to enhance the original methodology by extending its applicability to a greater number of applications, specifically complex and iterative ones. The enhanced methodology is directed at guiding complex and iterative workflows to clouds while maintaining a comparable level of performance against a traditional infrastructure. The enhanced version of the original methodology preserves all its key features and advantages.

Figure 4.1: Overview of the big data inspired cloudification methodology: selection of the independent variable k, initialization, domain distribution (parallelize), data adaptation and simulation (map by k), partial evaluation (reduce), and output analysis (collect), with local files, shared files and databases as inputs and outputs. Dashed boxes indicate optional stages, which may not be necessary for every iterative workflow.

It is built in a data-centric manner, because data locality plays a major role in the final performance and scalability of cloud-based applications. The enhanced methodology suits cloud environments well, as they are engineered to keep both computation and storage on the same nodes, i.e., VMs. Hence, it can provide a high degree of parallelism with regard to the target infrastructure and data management. The proposed methodology is depicted in Fig. 4.1. It consists of the following steps:

Key Selection. First, it is necessary to conduct a multilayered analysis of the original application in order to find a domain or its part(s), e.g., a workflow, a sub-workflow or a task, suitable for parallelization. This entails finding only one independent variable to act as the partitioning key (k, in Fig. 4.1), which will guide the distribution of the defined area and the following stages. It is important to mention that if an independent variable was found in more than one simulation part, then the methodology shall be applied multiple times for each parallelizable area.

Domain Partitioning. Once the parallelizable area is selected, we can model how the input will be distributed across the nodes. This parallelization stage distributes the proper portion of the input for each value of k. This sets the fraction of the input data or model that will be processed for each instantiation of the guiding independent variable.

Simulation. One or more simulation stages wrap the kernels involved in the simulation workflow to simulate each portion of the parallelizable area in an independent and autonomous way. This yields the execution of not one, but many smaller simulations. The large number of simulations to be executed factors the inherent complexity of the simulation process, yet it can be massively distributed due to the independent and autonomous nature of each simulation. Additionally, if there is a need for processing key-specific input, we can exploit data locality to minimize data transfers in this stage by scheduling the pipeline where the data is present.

Partial Reduction. Optionally, one or more reduction stages can be defined to filter or join partial outputs before the overall collection and evaluation of results, to reduce contention in the synchronization point.

Output Processing. This constitutes a collection stage followed by processing and analysis methods in charge of creating the input for the next iteration. In this step, the output evaluation must be defined to reflect the end criteria, the generation of the following input, and the validity of the results per iteration.

The aforementioned steps are aimed at finding a parallelizable simulation area, where we are able to identify an independent variable to act as an index for the subsequent steps. This must support the parallelization of that specific area in a key-value manner, so that further simulation pipes and optional partial evaluations can take place independently, as seen in massively-parallel data analytics frameworks. Of course, any partition-specific data will only concern the node which is going to process that domain partition. Hence, we can schedule the computation on the proper node to support data locality. This is particularly interesting if several stages of one parallelizable area are involved, since they can be scheduled together to benefit from local intermediate files. After these procedures, partial results can be filtered and assessed in parallel as well, again following the initial area distribution. Finally, these partial results can be analyzed and processed to build the next iteration. The effects of this synchronization point can, however, be alleviated by the preceding partial reductions.
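A compact PySpark rendering of these steps might look as follows (a hedged sketch, not the thesis implementation; extract_key, adapt_fragment, run_simulation, partial_merge, and build_next_input are hypothetical placeholders for the application-specific logic):

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "enhanced-methodology-sketch")

    def extract_key(record): return record[0]                  # key selection
    def adapt_fragment(record): return record                  # data adaptation
    def run_simulation(fragment): return len(str(fragment))    # kernel stand-in
    def partial_merge(a, b): return a + b                      # partial reduction
    def build_next_input(results): return [(k, v) for k, v in results]

    data = [(k, "params-%d" % k) for k in range(8)]             # toy input domain

    for iteration in range(3):                                  # iterative workflow
        partitioned = sc.parallelize(data).keyBy(extract_key)   # domain partitioning
        simulated = partitioned.mapValues(adapt_fragment) \
                               .mapValues(run_simulation)       # simulation (map by k)
        reduced = simulated.reduceByKey(partial_merge)          # partial reduction
        data = build_next_input(reduced.collect())              # output processing

    print(data)
    sc.stop()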

4.4 Enhanced methodology application to EnKF-HGS

Following the enhanced model of the cloudification methodology described in Subsection 4.3.2, we demonstrated its viability by applying it to the original EnKF-HGS application. As described in Subsection 3.4.2, the EnKF-HGS application is an MPI implementation of a data assimilation process via the ensemble Kalman filter. Specifically, data assimilation minimizes the deviation of a numerical model from the real physical system and supports the continuous improvement of HGS stochastic simulations by incorporating the most recent field measurements.

4.4.1 Simulator analysis

Following the enhanced cloudification procedure, we begin with the application of the first step, a simulator analysis. The simulator's input is tightly coupled. It describes a single hydro-geological real-world system, which we cannot decompose. Since we could not define an independent variable for the entire simulator, we continue with the second step of the multilayered analysis. Analyzing the simulator's workflow, we can see that it is iterative and that each iteration consists of two phases: forward propagation and filtering. In the forward propagation phase, we operate with a set of realizations, which constitute instantiations of the underlying model. These realizations are simulated independently, and the output is gathered afterwards for processing. It is worth mentioning that the forward propagation phase can be divided into two layers. The first layer consists of the Monte Carlo simulation, while the GROK and HGS simulation kernels compose the second layer. In brief, GROK is a preprocessor that prepares the input files for HGS, which makes it an I/O-intensive application, while HGS is an integrated hydrological modelling simulator, which mainly relies on the CPU to solve differential equations. Considering the definition of the Monte Carlo simulation method described in Subsection 3.3.2, every process is completely independent. Hence, the first layer of EnKF-HGS is totally parallelizable. However, the second layer can be neither analyzed nor modified, because both kernels are proprietary binaries. Hence, it is not suitable for the methodology. In the filtering phase, we update each of the realizations with the environmental field measurements to adjust the deviated models towards the actual state of the real-world system. This process represents a set of matrix operations and requires multiple data synchronizations, which leads to tight coupling and interdependency of the processes. Considering their interdependency, it is difficult to completely parallelize this phase. Following the methodology, we aim at parallelizing only the forward propagation phase and select the realization identifier as the key and the collection of the model's data per realization as the value. These key-value pairs can be independently distributed and executed across computing nodes. Their results are gathered in the master process for the filtering step.

4.4.2 Cloudification procedure

As described in Subsection 4.4.1, the EnKF-HGS workflow is iterative and has multiple data-synchronization points. Hence, the MapReduce representative should be carefully selected out of all the available implementations described in Subsection 4.2.1. We chose Apache Spark 1.6.0, which is currently a major representative of a BD programming model and execution engine. Moreover, Spark was developed particularly to overcome significant performance penalties during iterative jobs and frequent data accesses by using in-memory datasets, and it natively supports the iterative pattern.

Execution models: Spark vs. MPI

Figure 4.2 depicts the stages that belong to the final implementation of the workflow in Spark. The procedures executed in the Spark driver process are identified using letters (from A to D), while numbers (1 to 6) refer to tasks that are computed distributively in the Spark executors. The most relevant design and implementation details are described in the following paragraphs.

Figure 4.2: Execution model of the cloudified EnKF-HGS. Driver procedures: (A) data distribution, (B) input matrix composition, (C) column distribution, (D) model aggregation. Executor tasks: (1) data pre-processing (GROK), (2) model simulation (HGS), (3) distributed post-processing, (4) data analysis, (5) data update and caching, (6) output persistence. Dashed lines indicate that the operations are executed over a distributed dataset.

Ensemble preparation stage

A. Data distribution
The first step is to load the necessary auxiliary files that every executor will need to properly run its data partition. For instance, this includes the kernel binaries. Spark guarantees that these files will be available to the worker nodes in their current working directory.

B. Input matrix composition
Input data is read in the driver process in order to initialize the base model, consisting of two main matrices, M1 and M2, in which each column c1,r and c2,r corresponds to an instantiation, r, of the model. Additional data structures are created and initialized, and the parameters of the simulation are obtained.

C. Column distribution
Both matrices are distributed by columns in order to build the realization set, R. Each realization r is composed of the corresponding columns from both matrices, c1,r and c2,r. Figure 4.3 depicts the realization data distribution process, which yields the distributed dataset that will be transformed in the following stages and iterations. The rationale behind distributing the workload this way is that each realization can be simulated independently from the others, without any further communication. Additionally, we forced each partition to hold the data for a single realization in order to induce fine-grained parallelism.

Figure 4.3: Column distribution procedure. Both matrices are split column-wise, and realizations are built with the corresponding columns from both matrices.

Iterative ensemble simulation

1. Data pre-processing (GROK)
After the realizations are distributed, the GROK kernel writes the realization input data to local files. HGS will read these files in order to conduct the simulation of the model.

2. Model simulation (HGS)
With the input files from GROK, HGS simulates the model and writes its output for subsequent analysis. In steps 1 and 2 we must ensure that both binaries are executed on the same node to exploit data locality. To achieve this, we run GROK and HGS in the same map function, which is an indivisible task in Spark. Thus, they act as an inner pipeline within the workflow.

3. Distributed post-processing
The post-processing stage is partially distributed. First, the output from each HGS execution is read in each executor in order to create an updated realization set, R′. With this information we create a distributed matrix M′1 and conduct several distributed operations to avoid gathering the whole matrix in the driver.

4. Data analysis
Further operations with auxiliary matrices are executed in the driver in order to filter and randomize the input for the following iteration. The goal of this stage is to minimise the size of the dataset that needs to be collected in the driver prior to the model update. Note that to achieve this, significant data shuffles must be executed.

D. Model aggregation
Since not every stage of the analysis can be distributed, there is a step in which we aggregate the final matrix that will be used to compute an update matrix. This matrix is distributed afterwards, so that we can update the realizations without gathering the whole dataset in a single node.

5. Data update and caching
The distributed update matrix is used to update every realization in parallel. The resulting realization set is persisted to the local storage of the nodes as a fault-tolerance measure, and the following iteration starts.

Output management

6. Output persistence
After every iteration is executed, the output is stored to HDFS. This is executed in parallel, as every partition is stored independently.
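The core of this design, running GROK and HGS back to back inside one indivisible Spark task so that the intermediate files never leave the node, can be sketched as follows in PySpark (a hedged illustration only: the per-realization directory layout and the toy input are assumptions, the proprietary kernels are only referenced in a comment, and this is not the actual thesis code):

    import os
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "enkf-hgs-pipeline-sketch")

    def simulate_realization(kv):
        realization_id, column_data = kv
        workdir = "realization_%d" % realization_id     # hypothetical local layout
        os.makedirs(workdir, exist_ok=True)
        with open(os.path.join(workdir, "input.dat"), "w") as f:
            f.write(str(column_data))
        # In the real workflow the two proprietary kernels would be invoked
        # here, e.g. subprocess.run(["grok"], cwd=workdir) followed by
        # subprocess.run(["hgs"], cwd=workdir); both read and write only
        # local files, so keeping them in one map task preserves locality.
        with open(os.path.join(workdir, "input.dat")) as f:
            return realization_id, len(f.read())

    # One partition per realization induces fine-grained, pipelined parallelism.
    realizations = sc.parallelize([(r, [0.1 * r, 0.2 * r]) for r in range(4)],
                                  numSlices=4)
    print(realizations.map(simulate_realization).collect())
    sc.stop()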

Considering the initial EnKF-HGS MPI implementation described in Subsection 3.4.2, we can now compare the MPI- and Spark-based execution models in general terms. Owing to its nature, Spark can offer considerable benefits such as:

1. a distributed file system with failure and data replication management;

2. a set of tools and libraries for data analysis and management that will facilitate the use, deployment, and maintenance of the parallelized workflow;

3. unrestricted addition of nodes at runtime.

Overall, the Spark-based execution model takes full advantage of fault tolerance, automatic re-execution of failed tasks, and dynamic adaptation to the resource pool size. However, considering that EnKF-HGS executes the two proprietary binaries GROK and HGS, the Spark implementation is based on the current MPI model, which might restrict the exploitation of the Spark platform to its full extent. It is also worth mentioning that the MPI-based model benefits from more lightweight processes.

4.4.3 Evaluation

The goal of our evaluation was to assess the benefits and drawbacks of applying the techniques discussed in Section 4.4. We focused on the absolute execution time and speed-up to analyse the effects of the memory and virtualization overheads of Spark and clouds, respectively. In addition, we conducted further experiments to evaluate the influence of I/O saturation on the application performance, which might occur due to the lack of a high speed network connection between distributed resources.

4.4.3.1 Implementation and simulator configuration Following the general comparison of the original MPI and Spark execution models, we propose to compare in depth the original and Spark implementations in a traditional cluster against two types of cloud infrastructures: a private cloud running OpenNebula, and the AmazonEC2 public cloud. Hence, we will be able to evaluate the scalability, and analyze the behavior of the platform and the infrastructure as the problem size increases. A particularity of the kernel binaries is that they are pre-built black boxes. An effect of this is that they rely on hard-coded input paths for the intermediate files they handle. This constitutes a limitation, as we are forced to execute both binaries on the same machine to ensure data locality per realization. To achieve this, we exploited Apache Spark's partitioning mechanisms to ensure that each full realization is computed on the same node in a pipelined manner. As seen in Figure 4.4, the current workflow forms a collection of independent pipelines, each mapping a realization to its proper input data. There is room for further optimization, but it is already a promising approach that can be generalized to many applications that make use of this kind of filter.
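The partitioning mechanism mentioned above boils down to creating exactly one partition per realization, so that every GROK/HGS pair is scheduled as one task on one node. The following driver-side sketch illustrates the idea; the run_realization function is the hypothetical inner pipeline sketched earlier, the input list is a stand-in for the real ensemble inputs, and the local master is only used to keep the sketch self-contained.

    from pyspark import SparkContext

    def run_realization(realization_input):
        # Placeholder for the GROK -> HGS inner pipeline sketched earlier.
        return realization_input

    # In the real deployment the master would point at the cluster manager.
    sc = SparkContext(master="local[*]", appName="enkf-hgs-partitioning-sketch")
    realization_inputs = ["input %d" % i for i in range(64)]   # stand-in ensemble inputs
    # numSlices equal to the ensemble size yields one partition per realization,
    # turning the workflow into a collection of independent pipelines.
    realizations = sc.parallelize(realization_inputs, numSlices=len(realization_inputs))
    outputs = realizations.map(run_realization).collect()
    sc.stop()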

4.4.3.2 Experimental setup In order to assess the performance and scalability of the application, we selected three different execution infrastructures: a cluster, a private cloud running OpenNebula, and a virtual cluster on the AmazonEC2 public cloud. The specifications and limitations of these testbeds are described as follows:

Testbed A: local cluster This testbed comprised 8 worker nodes for the MPI platform and 8 worker nodes plus 1 additional driver node for the Spark platform (see Table 4.1). Each worker node held 8GB of RAM and two Intel Xeon E5405 @2.00GHz processors with four cores each. The driver node was necessary to host the driver process of the Spark implementation, which required 7GB in the largest experiment we conducted. This means that the container in charge of running the driver would require 7GB plus a 10% memory overhead (as configured by default in the platform), 512MB of extra memory for heap space, and other overhead sources like serialization buffers. Since Spark adds significant memory overhead to drivers and executors, we had to add a larger node to bypass the memory constraints of the worker nodes. Consequently, we added a node with 94GB of RAM and four Intel Xeon E7-4807 @1.87GHz processors with six cores each (see Table 4.2). The local cluster already had a pre-installed network file system, GlusterFS 3.7.11 (a scalable and production-ready network file system [12]), which was necessary for executing the MPI implementation.
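For illustration, the memory budget discussed above could be expressed as Spark configuration properties. The sketch below is only an assumption-based example: spark.driver.memory, spark.executor.memory, and spark.executor.cores are standard Spark settings, while the concrete executor size is an illustrative value rather than a figure reported here. Note that driver memory normally has to be fixed before the driver JVM starts (e.g., via spark-submit --driver-memory), so the snippet merely documents the intended sizing.

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("enkf-hgs")
            .setMaster("local[*]")   # stand-in; the real jobs ran under the cluster manager
            # Heap requested by the driver in the largest experiment; the platform adds
            # its default ~10% container overhead on top of this value.
            .set("spark.driver.memory", "7g")
            # Illustrative sizing for single-core executors on the 8GB worker nodes.
            .set("spark.executor.memory", "6g")
            .set("spark.executor.cores", "1"))
    sc = SparkContext(conf=conf)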

Platform   Node role   Type     Amount
Spark      driver      xlarge   1
Spark      worker      large    8
MPI        worker      large    8

Table 4.1: Cluster machine selection for the Spark and MPI platforms.

[Figure 4.4 depicts the workflow: base input → initialize → distribute → per-realization pipelines (write input → input data → GROK kernel → intermediate data → HGS kernel → output data → read input) for realizations 1 to N → collect → analyze.]

Figure 4.4: Final Spark workflow for the cloudified EnKF-HGS.

Type     Processor                         CPU cores   Memory   Storage          Network
large    2 x Intel Xeon E5405 @2.00GHz     8           8GB      2 x 1000GB HDD   1Gb/s Ethernet
xlarge   4 x Intel Xeon E7-4807 @1.87GHz   24          94GB     2 x 1000GB HDD   1Gb/s Ethernet

Table 4.2: Technical specifications of the selected machines (testbed A).

Testbed B: private cloud We relied on a private cloud running OpenNebula with the node and hardware configurations described in Table 4.3 and Table 4.4, respectively. Notice that the main difference between this infrastructure and the cluster is the clock speed, which would benefit this testbed in the evaluation. To build a virtual cluster, we spawned 8 (and 9 in the case of Spark) 8-core VMs with 7.5GB of RAM each, the maximum available memory per VM. Notice that this memory limitation is relevant, as there was no workaround to fit the driver safely in the largest experiments. For the network file system, we deployed the latest version (3.7.14) of GlusterFS on 3 additional storage nodes, which were organized in a distributed volume with no data replication in order to maximize the storage performance. On the computing node side, we exploited the FUSE-based Gluster Native Client for highly concurrent access to the file system.

Platform   Node role   Type     Amount
Spark      driver      large    1
Spark      worker      large    8
MPI        worker      large    8
MPI        storage     medium   3

Table 4.3: Private cloud VM selection for the Spark and MPI platforms.

Type     Processor                       vCPU cores   Memory   Storage     Network
large    2 x Intel Xeon L5420 @2.50GHz   8            7.4GB    500GB HDD   1Gb/s Ethernet
medium   Intel Xeon L5420 @2.50GHz       4            3.5GB    500GB HDD   1Gb/s Ethernet

Table 4.4: Technical specifications of the selected VMs (testbed B).

Testbed C: public cloud We selected c4.2xlarge instances as workers due to their balance between the number of cores and the amount of memory, and an r3.xlarge instance with larger memory to hold the driver. The virtual cluster for Spark consisted of 16 workers (128 cores in total), an m3.medium master, and the additional dedicated VM for the driver. MPI ran on a virtual cluster built with 16 identical workers, plus 3 additional c4.large storage nodes running GlusterFS with a configuration similar to the previous testbed. Each storage node was provisioned with a 5GB general purpose SSD brick. Table 4.5 shows a summary of the selected instances and their assigned roles in both the Spark and MPI execution platforms. In addition, Table 4.6 describes the hardware characteristics published by the provider [11].

Platform   Node role   Type         Amount
Spark      master      m3.medium    1
Spark      driver      r3.xlarge    1
Spark      worker      c4.2xlarge   16
MPI        worker      c4.2xlarge   16
MPI        storage     c4.large     3

Table 4.5: AmazonEC2 instance selection for the virtual clusters running the Spark and MPI platforms.

Type         Processor(∗)                                                 vCPU cores   Memory (GiB)   Storage   Network
m3.medium    Intel Xeon E5-2670 v2 @2.5GHz / Intel Xeon E5-2670 @2.6GHz   1            3.75           SSD       Moderate
r3.xlarge    Intel Xeon E5-2670 v2 @2.5GHz                                4            30.5           SSD       Moderate
c4.2xlarge   Intel Xeon E5-2666 v3 @2.9GHz                                8            15             EBS       High
c4.large     Intel Xeon E5-2666 v3 @2.9GHz                                2            3.75           EBS       Moderate

(∗) More than one processor is listed if VMs may be launched interchangeably on different physical processors. Table 4.6: Technical specifications of the selected public cloud instances (testbed C), as provided by the AmazonEC2 documentation.

4.4.3.3 I/O saturation per infrastructure Before evaluating the scalability of the two implementations and comparing their performance, it is important to measure the effect of concurrent access to a network file system on the execution time of the overall application. As described in Subsection 3.4.2, the forward propagation phase of the EnKF-HGS workflow runs an ensemble of concurrent HGS model realizations, which use a file-based data exchange mechanism via a network storage. This differs from the Spark implementation, where each execution runs on local data by design. In order to measure this, we had to guarantee that the number of I/O operations was identical for every instance of the GROK and HGS kernels. Since both kernel binaries are pre-built black boxes, we modified EnKF-HGS to compute one model realization N times, where N is the number of ensemble members. Thus, instead of computing N different model realizations, which would most likely result in a different number of I/O operations for each individual realization, we ensured that each instance of the simulator performed the same I/O operations by making each input identical. We measured the relation between the number of concurrent processes (i.e., GROK and HGS kernels) using the network file system and the execution time in the three testbeds presented above. Additionally, for testbed C, we repeated the experiment with the Elastic File System (EFS), which is a native network file system for the AmazonEC2 environment. For the experiment's execution, we modified EnKF-HGS to run 1 to 64 MPI processes, where each process was provided with one CPU core. Figures 4.5, 4.6, 4.7, and 4.8 show the impact of concurrent accesses to network storage on the execution time of the kernel binaries (GROK and HGS separately).

Surprisingly, even in experiment 1, which was conducted in testbed A, an "MPI-friendly" environment, the performance of the I/O intensive kernel (GROK) drops by a factor of five with 64 concurrently running instances of the kernel. The results of experiments 2, 3, and 4 illustrate that in cloud environments the situation is even more dramatic, because a similar performance drop happens not only for the I/O intensive kernel but also for the compute intensive one. From these experiments, we can clearly see that typical MPI-based application assumptions, i.e., the availability of a high performance network file system and a low-latency broadband network connection, should be taken seriously into consideration, because a violation of these assumptions might lead to a tremendous drop in application performance. In this experiment, we artificially equalized the number of concurrent I/O operations, which resulted in a large performance drop. For a realistic ensemble consisting of individual model realizations, the computations would most likely not take the same amount of time, which in turn would reduce the number of concurrent I/O operations and, consequently, the saturation.
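The measurement procedure described above can be pictured with a small harness that launches an increasing number of identical kernel instances against the shared file system and reports the relative slowdown. This is a simplified, hypothetical sketch: the kernel path and the pre-populated per-instance working directories are assumptions standing in for the actual modified EnKF-HGS runs.

    import subprocess
    import time

    KERNEL = ["./grok"]                 # assumed path to the kernel under test (GROK or HGS)
    PROCESS_COUNTS = [1, 2, 4, 8, 16, 32, 64]

    baseline = None
    for n in PROCESS_COUNTS:
        start = time.time()
        # Launch n identical instances concurrently; each works in its own prepared
        # directory on a copy of the same input, so every instance performs the same I/O.
        procs = [subprocess.Popen(KERNEL, cwd="run_%02d_%02d" % (n, i)) for i in range(n)]
        for p in procs:
            p.wait()
        elapsed = time.time() - start
        baseline = baseline if baseline is not None else elapsed
        print("processes=%2d relative_time=%.2f" % (n, elapsed / baseline))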


Figure 4.5: Experiment 1. Relative execution time of concurrent (a) GROK and (b) HGS kernels running on local cluster with GlusterFS.

The following experiments illustrate the performance and scalability of the two implementations in the presented testbeds.

4.4.3.4 Performance analysis: cluster The objective of these experiments is to detect the effects the execution models have on the performance of the workflow execution. We analysed the MPI-based implementation of EnKF-HGS and its data-centric version built in Spark, both running in the local cluster described earlier. We allocated Spark executors with one core in order to fairly compare scalability against single-core MPI processes. We experimented with an increasing number of realizations and executors. We measured the absolute execution time for a single execution (including the job launch time required by Spark) and computed the speed-up. The results for this execution on a local cluster are shown in Figure 4.9, in which plots (a) and (c) correspond to MPI, while plots (b) and (d) correspond to Spark. Remarkably, Spark yields better execution times for every experiment, and its speed-up improves the larger the experiment is for a given number of workers. This is the result of storing data on local storage instead of a network one. However, the speed-up in Spark for the largest experiment (i.e., 64 realizations with 64 executors) is lower than in the MPI case. In this case, the problem is that the 64 executors cannot be scheduled at once due to their memory requirements. As a result, fewer parallel executors can be allocated in the same infrastructure, and the slimmer MPI processes seem more suitable for large-scale executions. The main conclusion from this experiment is that, while the BD-inspired approach shows outstanding performance results, the large memory overhead of the execution framework hurts its scalability. Another interesting aspect is related to the post-processing stage of the workflow. As the post-processing computation time becomes shorter with the growing number of parallel executors, at some point the time spent transferring data might affect the overall execution time when there is a greater number of parallel executors.

Figure 4.6: Experiment 2. Relative execution time of concurrent (a) GROK and (b) HGS kernels running on OpenNebula with GlusterFS.

Figure 4.7: Experiment 3. Relative execution time of concurrent (a) GROK and (b) HGS kernels running on AmazonEC2 with GlusterFS.

Figure 4.8: Experiment 4. Relative execution time of concurrent (a) GROK and (b) HGS kernels running on AmazonEC2 with EFS.

Figure 4.9: Execution time and speed-up for the MPI and Spark implementations, running on a local cluster.

4.4.3.5 Performance analysis: private cloud After analysing the programming models, we focused on the underlying computing models to assess their impact on performance. Hence, we conducted further experiments on the OpenNebula private cloud with the Spark and MPI implementations.

Figure 4.10: Execution time and speed-up for the MPI and Spark implementations, running on a private OpenNebula cloud. Plots (a) and (c) show the execution time and speed-up of MPI, while plots (b) and (d) show those of Spark.

In addition to the cluster results shown in Figure 4.9, Figure 4.10 reflects the same metrics for the experiments conducted in OpenNebula for a single execution. Spark benefits most from BD-oriented environments like clouds, while keeping an overall evolution of speed-up that is similar in relative terms. However, we can also see that in OpenNebula the Spark results are more extreme, with lower execution times and smaller speed-ups. Interestingly, MPI shows a similar behaviour, with lower execution times and degraded speed-ups. These results could be associated with the higher clock frequency of the physical processors of the virtual cluster, which yields faster executions, and the lower memory per core ratio, which hurts scalability. With respect to the results for Spark, there are two remarkable exceptions. The first exception is related to the 64-realization execution with 64 workers. This experiment failed because the container corresponding to the driver exceeded the system's memory limit. Hence, this result is not included.

The other exception is associated with the experiment of 32 realizations with 32 workers. Its execution time is identical to that of the experiment with 16 realizations and 16 workers. We noticed that this happened because the heartbeats of several executors were missed in the driver, which caused the tasks assigned to those executors to time out. Under these circumstances, Spark considers the worker dead and reschedules the task on a different node. Since there are no more resources to launch a new container, it has to wait for an executor to finish and then launch the former tasks. Considering the synchronization point after the concurrent computation of the realizations, task rescheduling leads to a doubled execution time, which corresponds to the time obtained for 16 executors.

4.4.3.6 Performance analysis: public cloud Given the resource limitations in our private infrastructures, we moved both execution paradigms to the AmazonEC2 public cloud to test further scaling. We incrementally increased the size of the virtual cluster and conducted experiments until one of the implementations showed significant scalability issues that made further increases in problem size or number of workers impractical. Our goal was to determine which paradigm showed the most limitations in supporting large-scale executions. We considered the set-up described above sufficient to run 64 realizations smoothly in both MPI and Spark. We attempted 128 realizations to check whether we could exploit all the cores in the system with the Spark implementation. Figure 4.11 shows results coherent with the fast speed-up degradation observed in OpenNebula, when driver and executor memory sizes increase to the point at which not all the realizations can be computed in parallel (i.e., memory per core increased beyond 2GB). The major problem is not a performance degradation; rather, it is a case of not being able to run the experiment at all, because a job simply will not finish when there is not enough memory to host the driver or executor container, or their related overheads. The former experiments clearly indicate that the Spark implementation will not scale due to its memory requirements when running on the Spark stack. This is also problematic if cost is taken into consideration, as machines with a larger memory per core ratio would have to be selected, making the execution more expensive. On the other hand, the MPI-based workflow shows neither an outstanding performance nor speed-up, but it is able to scale further with fewer resources and in a more stable manner. We also found that Spark has limited capabilities to establish simultaneous connections among many nodes. Our first approach to data integration was to use in-memory mechanisms instead of files. However, this required an all-to-all communication pattern. Even though this pattern is straightforward and efficient in MPI, we were unable to scale this approach up to 256 nodes in Spark, because the underlying Spark RPC system failed due to the impossibility of creating enough connections.

4.4.3.7 Results of the enhanced methodology application to EnKF-HGS Overall, BD programming models (i.e., Hadoop and Spark) and infrastructures (i.e., clouds) might be used to improve the efficiency and scalability of some types of scientific applications, specifically simulators relying on parameter sweeps and partitionable domains, and kernel-based workflows comprising many loosely-coupled tasks. The detailed evaluation of the original and Spark implementations of the workflow in different infrastructures (i.e., a cluster and clouds) indicates that the BD techniques are still far from disrupting scientific applications. Specifically, this statement is relevant to the vast set of complex workflows with major synchronization and resource requirements when the scale increases. This experience and the current state of the literature are summarized in Table 4.7 and detailed as follows.

Figure 4.11: Execution time and speed-up for the MPI and Spark implementations, running on a virtual cluster on the AmazonEC2 cloud. Plots (a) and (c) show the execution time and speed-up of MPI, while plots (b) and (d) show those of Spark.

Among the different MapReduce implementations described in Subsection 4.2.1, we used Spark as it is specifically designed for iterative processes and provides such competitive advantages as fault-tolerance, automatic re-execution of failed tasks, and dynamic adaptation to the resource pool size. Regardless of these advantages, the Spark-based iterative workflow was not able to scale properly, mainly due to the significant memory overhead introduced by the execution environment. We believe that this extensive memory consumption was due to the deep component stack and its dependence on the JVM [73]. The additional resources in the memory stack are needed to mitigate the requirements of new workloads and to support the emerging in-memory and caching mechanisms coming from data-aware computing. Regarding workflow development and deployment, we experienced a considerable reduction of the time needed to build a Spark-based solution and deploy it on AmazonEC2. Without any user-related or software compatibility issues, the solution was tailored to meet all the experiment requirements. Moreover, Spark with its underlying resource manager and distributed file system substantially facilitated data distribution and task management. We only had to consider the design of the workflow and some specific implementation particularities of the language and the available functionalities.

       Big data paradigm                           HPC paradigm
Pros   Fault-tolerance by design                   Low resource consumption
       Transparent data locality                   Efficient communication
       Job and task scheduling at platform level   Generalist and tailorable processes
Cons   Low control of resource management          Limited parallel abstraction
       Significant memory overhead                 No native provenance or replication
       Poor integration with kernels               Steep learning curve
       Key-value only                              Deep software and communication stack

Table 4.7: Summary of the main features detected during the evaluation and literature review for the BD and the HPC paradigms.

The data-centric nature of Spark allowed us to exploit the parallelism of the workflow and obtain good experimental results, as shown in Subsection 4.4.3. Due to the scale of our experiments, we could not detect network issues in MPI. However, we found that I/O management could become a bottleneck. At small scale, Spark showed a competitive advantage in distributing data and accessing it locally. The advantages described above could benefit scientists who want to focus on the problem rather than on the computational elements of the work. However, these benefits come at the cost of a large memory overhead. To cope with this drawback, machines with a larger memory per core ratio would have to be selected, which in turn increases the cost. Another issue worth mentioning is the infrastructure size. It is common for small- and medium-sized research groups to have limited infrastructures. Hence, we consider our infrastructure to be representative of the capabilities of many small- and medium-sized research groups in science and data analysis. Even though it was not a large-scale evaluation due to the limited infrastructure size, it was still sufficient to detect major scalability issues and contrast them with larger experiments in the public cloud. It is also important to mention that the original workflow relies upon two third-party kernels, which can be considered black boxes with unknown data schemes and proprietary code. Hence, the Spark implementation is based on the current MPI version of the workflow, which might restrict the exploitation of the Spark platform to its full extent.

In general, we proved the viability of the enhanced methodology based on the results obtained, although it is still far from being a universal solution for the vast set of complex workflows due to the limitations described above.

4.5 Summary

In this chapter, we evaluated the trend of combining traditional high performance computing (HPC) with the big data (BD) analytics paradigm [108].

Following that trend, we investigated whether it is relevant to scientific applications, especially those with many loosely-coupled tasks [43] or heterogeneous tasks with few interdependences [138]. Moreover, we gave an overview of the cloudification methodology proposed by Caíno-Lores et al. [33]. This methodology is directed at migrating parameter-based scientific simulations to the cloud environment and scaling them up. The critical step of this methodology is the identification of an independent variable for the entire simulation. Regarding the complexity of scientific simulations described in Subsection 3.3.3, we suggested enhancing the original cloudification methodology by enlarging its applicability to a greater number of applications, including those where it is not possible to identify an independent variable for the entire application. In order to evaluate the enhanced cloudification methodology, we applied it to the original EnKF-HGS application, which is an MPI implementation of a data assimilation process via the ensemble Kalman filter. In general, we proved the viability of the enhanced cloudification methodology based on the results obtained. However, it is still far from being considered a universal solution due to its limitations.

Chapter 5 Service-oriented cloudification methodology

5.1 Introduction

According to Buyya et al. [32], computing has been significantly transformed over the last half century. In 1969, in the UCLA press release announcing the birth of the Internet, Leonard Kleinrock [100] was quoted as saying: “As of now, computer networks are still in their infancy. But as they grow up and become more sophisticated, we will probably see the spread of computer utilities, which, like present electric and telephone utilities, will service individual homes and offices across the country”. Basically, Leonard Kleinrock articulated the vision of computing as the 5th utility after water, electricity, gas, and telephony. Like the four existing utilities, the computing utility provides the general community with a basic level of computing services regardless of where they are hosted or how they are delivered. To make this vision possible, several computing paradigms, i.e., cluster, grid, and cloud computing, have been proposed. Each aforementioned paradigm is presented in more detail in Sections 2.1 and 2.2. Overall, there is a set of characteristics that specifies every paradigm. This set covers such important concepts as resource availability, resource provisioning, scheduling, load balancing, and monitoring. For instance, in clusters, resources are located in a single administrative domain with a single management policy, while in grids resources are geographically distributed across multiple domains with distinct management policies and goals. Considering the scheduling concept, in clusters, schedulers are responsible for the entire system and are directed at enhancing the system performance, whereas resource brokers – schedulers for distributed data-intensive applications on grids – focus on enhancing the performance of geographically distributed applications by matching available resources to the user's request. Cloud computing possesses all these characteristics with its special attributes and capabilities presented in Section 2.2, i.e., potentially unlimited scalability, on-demand resource provisioning, and a multitenant environment. Considering the dynamic nature of cloud computing, there is a recurring and growing concern about building reliable mechanisms for the aforementioned concepts, i.e., resource availability, resource provisioning, scheduling, load balancing, and monitoring. This is especially the case when cloud services can be taken offline due to power conservation policies, unexpected power outages, or possible denial of service attacks. The service-oriented computing (SOC) paradigm arose as a response to the aforementioned concern. According to Papazoglou et al. [128], SOC is a computing paradigm that utilizes services as fundamental elements to support rapid, low-cost development of distributed applications in heterogeneous environments.

SOC provides a semantically rich and flexible computational model and relies upon service-oriented architecture (SOA) to develop, use, and combine interoperable services. The design principles of SOA require services to be self-contained, loosely-coupled, platform-independent, dynamically discoverable, invokable, and composable [70]. Basically, SOA splits software development into three independent but collaborative abstractions, i.e., a service requestor, a service broker, and a service developer [154]. The service developer simply builds services. The responsibility of the service broker is to publish or market available services, while the service requestor is responsible for finding available resources through service brokers and using the developed services. Overall, this allows software developers to build a complex application consisting of easily customizable modules through interfaces with clearer semantics [82]. According to Channabasavaiah et al. [37], organizations could gain distinct benefits from deploying SOA, specifically:

1. Continuous business-process improvement due to service characteristics, i.e., coarse granularity, modularity, loose coupling, and platform independence.
2. Leveraging existing assets means that a service can be constructed from already existing components and applications. Using a service only requires users to know its interface and name.
3. Faster time-to-market means that the time to build and deploy new services diminishes drastically when existing services and components are reused.
4. Risk mitigation is gained due to the reduced possibility of failures in a new service if it is based on leveraging existing components.
5. Reduced cost of application development and integration.

Even though SOA is a promising concept, which can bring considerable benefits, the migration of traditional vertically-integrated applications to SOA presents a number of challenges ranging from purely technical issues to adoptability constraints. According to Mahmood [111], the greatest challenge of migrating applications to SOA is associated with its inherent characteristics. Considering the coarse granularity of SOA services, it may become difficult to test and validate all combinations of service components under every possible condition. Even though loose coupling makes a service distributed and fault-tolerant, it also leads to a new level of complexity. The most common implementation of SOA uses web services to exchange messages over the Internet. Taking into account the SOA requirement for interoperability, the current implementation might raise a critical concern associated with the integration of services in a heterogeneous environment.

Considering all the advantages presented above, the service-oriented computing paradigm could provide an alternative way to create an architecture of a scientific application that reflects components' autonomy, granularity, and heterogeneity. When combined with clouds, it can surely be regarded as a possible alternative execution environment for traditional HPC-based scientific applications. However, it is also important to take into account potential challenges, which come from the nature of scientific applications and the fundamental differences between cloud computing and HPC environments in such concepts as resource availability and provisioning, workload management, scheduling, load balancing, and monitoring.

5.2 State of the Art

Scientific software development can be substantially influenced by two areas: cloud computing presented in Section 2.2 and service-oriented computing (SOC) introduced in the previous section.

SOC provides a solid computational model, which enables academia to reuse existing data- and compute-intensive applications, develop new ones, and combine them into interoperable services, while cloud computing promises to provision scientists with unlimited computing resources (e.g., hardware, software, networking, storage) as a service. However, it could be challenging to migrate traditional scientific applications to clouds, as presented in Chapter 3. These challenges come from the nature of scientific applications as well as the fundamental differences between cloud computing and HPC environments. The challenges of migrating scientific applications to clouds include architectural differences, software and hardware incompatibilities, tight coupling, and dependencies on external libraries, while the fundamental differences between cloud computing and HPC environments are associated with resource availability and provisioning, workload management, scheduling, load balancing, and monitoring. Specifically, in this section, we cover various academic mechanisms and industry solutions directed at overcoming these differences. According to Daniel J. Abadi [14], one of the advantages of cloud computing is the elasticity and scalability of computing resources. For instance, an unlimited number of computational resources can be allocated to a scientific simulator on the fly to handle an increased demand in computational power. However, getting and managing additional resources is not as simple as might be expected. The challenge here is that scientific applications must be able to manage additional resources provisioned on demand. For example, in a traditional cluster environment, there is a resource management system consisting of a resource manager and a job scheduler, which manages the workload by preventing jobs from competing for limited computing resources. Traditionally, the job scheduler communicates with the resource manager on workload queues and makes scheduling decisions based on resource availability. When migrated to clouds, the scientific application might need an additional system to manage resource availability and provisioning. Computing resource management is a vast research area, which has been investigated from different perspectives. There is a wide range of scientific works that created models and mechanisms to deal with resource provisioning, utilization, and job scheduling. We have selected only a few studies related to the topic of resource management. For instance, Van et al. [119] presented a virtual resource management solution for service hosting providers that automates dynamic provisioning and placement of virtual machines and supports heterogeneous hosted applications and workloads. Considering the fixed size of the virtual resources provided, Lim et al. [104] investigated the problem of building an effective controller of resource provisioning for dynamic applications hosted in clouds. They recommended the proportional thresholding approach to control resource provisioning by using a dynamic target range, which adjusts according to the number of accumulated virtual machine instances. Niu et al. [121] introduced an alternative solution for academic, research, or business organizations, which use fixed capacity clusters, to use and dynamically resize a cloud-based cluster with a familiar batch scheduling interface. They proposed a semi-elastic cluster (SEC) computing model.
The model allows organizations to maintain a set of cloud instances, schedule jobs within the current capacity, and flexibly adjust the capacity level according to the application workload, responsiveness requirements, or service provider charges. The advantage of the SEC model is the integration of dynamic cluster size scaling with a common HPC batch job scheduling strategy. Niu et al. proved the feasibility of the SEC model by integrating it with SLURM, an open-source resource management solution for Linux-based clusters. According to Yoo et al. [174], SLURM (Simple Linux Utility for Resource Management) is a solution for cluster management and job scheduling.

The SLURM Workload Manager [6] is used by many of the TOP500 supercomputers (e.g., IBM BlueGene/L, IBM Purple) and academic communities [9]. It has such characteristics as simplicity (the source code is easy to understand and to modify with optional plugins), open source (free and available under the GNU General Public License), portability (initially written for Linux but also portable to other systems, e.g., FreeBSD, NetBSD, OS X, Solaris), scalability (initially designed for clusters with 1'000 nodes), fault tolerance (handling failures without terminating workloads), security, and user-friendliness. Overall, SLURM is a flexible industry solution widely used in academia to investigate challenges associated with resource provisioning, workload management, and scheduling. The problem of load balancing in clouds has also been widely studied. In the cloud environment, the load balancing research area is directed at finding mechanisms to enhance the overall performance by efficiently distributing workload across available cloud nodes. Proper load balancing in clouds can help to optimally use the available resources, avoid under- or over-provisioning of resources to jobs, and reduce response time. According to Al Nuaimi et al. [16], existing load balancing mechanisms can be classified as static or dynamic algorithms. Among the available static load balancing algorithms, it is worth starting with round robin. This algorithm was first introduced by John Nagle [116] for fair queuing of packet switches with infinite storage. The round robin algorithm uses only the network parameters of the incoming traffic for traffic allocation, without taking into consideration information from other components, e.g., the current load of an application or server. In general, the round robin algorithm pairs an incoming request with a specific server or virtual machine (VM) by cycling through the list of available servers or VMs capable of handling the request. Radojević et al. [136] suggested the Central Load Balancing Decision Module algorithm (CLBDM), which aims to improve round robin. The goal of the CLBDM is to monitor all components of the computer system and automatically administer traffic allocation. CLBDM monitors client sessions, including all the requests sent and received, calculates the connection time between the client and server and, if the connection time exceeds a threshold, terminates the connection and forwards the task to another server. Overall, CLBDM acts as an automated administrator. Several static load balancing mechanisms were developed according to ant colony optimization (ACO), which applies models of ants' collective intelligence to various combinatorial optimization and control problems. According to Sim et al. [147], in network routing and load balancing, ACO is typically implemented as a small program, which represents an artificial ant. Migrating from node to node, the program communicates the shortest path to resources, updates routing information, and records the number of ants/incoming traffic that pass the node. Overall, static algorithms can produce reliable results in homogeneous and stable environments and assign tasks to nodes according to the node's ability to process new requests. Dynamic algorithms are more flexible than static ones and, hence, better suited for heterogeneous and dynamic environments. For instance, the Index Name Server algorithm (INS) proposed by Wu et al. [171] is directed at minimizing data duplication and redundancy on a cloud storage.
Due to the extensive coverage and domains of cloud computing-based architectures, it is common that the number of stored files multiplies when the number of nodes increases. Consequently, that could lead to a heavy workload and a waste of hardware resources. INS calculates the optimum selection point by assessing many parameters (e.g., the hash code of the block data, transition quality) and the workload of the connection (availability to handle additional nodes). Wang et al. [165], in turn, proposed the Load Balance Min-Min (LBMM) scheduling algorithm, which takes as a foundation the Min-Min scheduling algorithm directed at minimizing the completion time of all tasks.

The LBMM goal is to split a task into a number of subtasks and distribute them among service managers that will assign the subtasks to suitable service nodes with the shortest execution time. In general, dynamic algorithms assign tasks according to network bandwidth and various node attributes, which are gathered and calculated in real time. Because dynamic algorithms require constant node monitoring, they are more accurate but harder to implement. Another important research area is monitoring, which is essential for the efficient management of large-scale cloud resources. Monitoring of physical and virtual resources can help to identify and solve issues associated with resource availability, a virtual machine's capacity to handle a workload, and other quality requirements. Monitoring can be defined as a process which identifies the root cause of an event by collecting the required information at the lowest cost in order to determine the state of a system [62]. Fatema et al. [62] examined existing monitoring systems and classified them into two categories: general purpose tools (e.g., Nagios, Collectd, Zabbix) and cloud specific monitoring tools (e.g., Amazon Cloud Watch, Azure Watch, Nimsoft). They proposed to evaluate the efficiency of these tools according to the desirable cloud monitoring capabilities, i.e., scalability, portability, non-intrusiveness, robustness, multi-tenancy, interoperability, customizability, extensibility, shared resource monitoring, usability, and affordability. In general, Fatema et al. [62] emphasized that general purpose monitoring tools can be efficiently used in a cloud environment; however, they may not be well suited for full cloud operational management, including resource and application monitoring. Cloud specific monitoring tools are mainly designed for monitoring in a cloud environment; hence, they can better address the desirable cloud monitoring capabilities. An interesting observation from Fatema et al. [62] is that cloud specific monitoring tools are mainly commercial and vendor-dependent; consequently, they are less portable and affordable in comparison with open source general purpose monitoring tools. Overall, every aforementioned challenge is well studied. There are distinct academic mechanisms and well functioning industry solutions for all of them. However, when we apply several suitable solutions to one computing system, the system's complexity increases drastically. Kephart and Chess [98] addressed this complexity issue by presenting an autonomic model capable of dealing with large, complex, heterogeneous cloud-based systems. According to Kephart et al. [98], an autonomic system comprises four distinct concepts:

1. Self-configuration means that systems configure themselves automatically by following high level policies.
2. Self-optimization means that systems continuously evolve and improve their performance and efficiency.
3. Self-healing means that systems are able to detect, diagnose, and solve problems resulting from bugs or failures.
4. Self-protection means that systems protect themselves from malicious attacks and are able to prevent and mitigate internal risks.

The autonomic model proposed by Kephart et al. [98] continuously executes four phases, i.e., monitoring, analysis, planning, and execution. That allows the system to learn and continuously improve its performance. In theory, this model addresses well all the challenges mentioned above, adds a new level of abstraction to the system and, consequently, copes with the complexity problem. However, there is not much evidence of this approach being widely implemented in current commercial solutions. Weingärtner et al. [168] addressed this issue by proposing an autonomic distributed management framework, which can implement all four phases of the autonomic model by using the cloud orchestration platform CloudStack (an open source cloud operating system).

This framework was designed to act on the cluster level for different purposes, i.e., migration of VMs to reduce networking overhead, balancing of the workload among the hosts of a cluster, SLA management, or VM consolidation. The framework, natively integrated with CloudStack, proved its feasibility by showing promising results during test cases. There is one inherent limitation, i.e., it was initially designed to be only a part of CloudStack. Because it is very platform-specific, its further integration with other cloud operating systems – e.g., Helion Eucalyptus, OpenStack, OpenNebula, Amazon EC2 – is limited.

Therefore, in the following section, we will propose a methodology, which can address the issue of cloud orchestration and allow us to migrate traditional HPC-based scientific applications, i.e., Monte Carlo simulations, to clouds managed by various cloud operating systems – e.g., OpenStack, CloudStack, OpenNebula, or Amazon EC2.

5.3 Cloudification methodology for Monte Carlo simulations

As discussed in Subsection 3.3.2, there are four main simulation problem types, three of which (i.e., equation-based, agent-based, and multiscale simulations) are deeply studied in terms of execution in both HPC and cloud environments. These three types of simulations are most commonly used and well adopted by the academic community. The fourth type of simulation – Monte Carlo – is far less trivial to implement, as not every system can be simulated by this method. Moreover, it introduces an extremely high demand for computing power when combined with an expensive and long-running individual sampling procedure. This is even more the case when the Monte Carlo method is implemented in an iterative framework like the ensemble Kalman filter. Considering the cloud computing potential discussed in Section 2.2, we introduce a CPU resource utilization-optimized cloudification methodology for iterative Monte Carlo simulations.

5.3.1 Methodology description One of the most popular computing paradigms for the cloud environment is service-oriented computing (SOC), introduced in Section 5.1, which heavily relies on service-oriented architecture (SOA) principles. SOC requires a workflow to be composed of invocations of self-contained, loosely-coupled, and potentially replaceable services. Hence, the application can be maintained and upgraded with less effort. Moreover, this also allows the application to scale horizontally better if multiple invocations of the same service are performed in parallel. However, in some cases, the end user might restrain the pool size of available computing resources in order to balance the application execution time against operational expenses. Then there are two available options – either to run several service invocations sequentially or to enable resource oversubscription. The former option might result in poor resource utilization if the number of sequential invocations is assigned incorrectly for each individual resource, while the latter will most certainly result in an unpredictably degraded performance of the whole application. By definition, Monte Carlo simulations run a large number of identical but completely independent processes. We can see the Monte Carlo sampling processes as independent invocations of the same core service, which is called with different input parameters. The key idea of this methodology is to allow an external scheduling service to set an optimal number of invocations to perform on each individual computing resource.

This methodology will allow us to optimize resource utilization and, consequently, improve the overall application performance, i.e., the execution time. Similarly to the big data inspired cloudification methodology described in Section 4.3, this methodology also consists of two main steps: the analysis of an original application and the actual cloudification of that application. These steps are presented in more detail below:

1. Application analysis: at this step an expert should analyze the Monte Carlo simulation workflow and carefully examine the individual sampling procedure. The cloudified application would show a noticeable performance gain in terms of the execution time if and only if the two important conditions presented below are fulfilled:

(a) The execution time t_i of every individual sampling process varies with the different input parameters provided;

(b) The average execution time t_avg of the individual sampling processes is much greater than the cloudification overhead t_ov, which comes from the paradigm change. This condition is denoted as t_avg ≫ t_ov.

2. Cloudification: if and only if these two conditions are fulfilled, the application is considered suitable for cloudification. The cloudification step can be split into the three following substeps:

(a) Workflow decomposition: this operation separates the individual sampling procedure from the simulation workflow and encapsulates it into an independent service, i.e., the individual sampling service. Consequently, the procedure is executed on different computing resources through the service invocation, while the main workflow only retrieves the execution results of the individual sampling service.
(b) Scheduling algorithm selection: an expert should select a suitable scheduling algorithm based on additional information about the individual sampling procedure, e.g., process execution time, priority, or additional conditions (a sketch of one possible algorithm is given after this list).
(c) Simulation: this operation runs the simulation workflow. Whenever the individual sampling procedure should be executed, the individual sampling service is called instead. It runs the procedure and returns the results back to the main workflow, which resumes its execution.
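As an illustration of substep (b), the following sketch shows one possible algorithm the external scheduling service could apply: a greedy longest-processing-time-first assignment that balances the estimated execution times t_i of the sampling invocations over a restrained pool of computing resources. The estimates, resource names, and the choice of this particular algorithm are assumptions made for illustration; the methodology itself does not prescribe a specific scheduler.

    import heapq

    def assign_invocations(estimated_times, resources):
        """Greedy LPT assignment: give each sampling invocation to the least-loaded resource."""
        loads = [(0.0, r) for r in resources]     # min-heap of (accumulated load, resource)
        heapq.heapify(loads)
        plan = {r: [] for r in resources}
        # Longest estimated invocations first, so large samples do not pile up at the end.
        for invocation, t in sorted(estimated_times.items(), key=lambda kv: -kv[1]):
            load, resource = heapq.heappop(loads)
            plan[resource].append(invocation)
            heapq.heappush(loads, (load + t, resource))
        return plan

    # Hypothetical estimates t_i (in seconds) for four sampling invocations on two resources.
    print(assign_invocations({"s1": 120.0, "s2": 45.0, "s3": 300.0, "s4": 80.0}, ["vm-1", "vm-2"]))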

Figure 5.1 depicts the application of the cloudification methodology to an abstract Monte Carlo simulation. The workflow starts with an abstract initialization service, where the input data is read and global data structures are initialized. Then the prephase service generates input parameters for the individual sampling processes. After that, the scheduling service distributes all executions of the individual sampling service over the available computing resources. When all sampling service instances finish their execution, the results are aggregated in the postphase service and postprocessed. The next step is very specific to iterative workflows: it is the analysis service, where the current iteration results are evaluated; then either the process starts again or the execution stops with the termination service. In order to apply the cloudification methodology to a concrete iterative Monte Carlo simulator, i.e., EnKF-HGS, there are a number of additional requirements that should be satisfied. In particular, the execution environment should have the following features:

1. Easy and straightforward access to an external scheduling service;

2. Available encapsulation mechanism, which would enable a quick organization of a subworkflow into a self-contained service;
3. Simple procedure to expand and shrink the pool of computing resources.

[Figure 5.1 shows the service chain: local input files / database → initialization service → prephase service → scheduling service → N parallel invocations of the sampling service → postphase service → analysis service → termination service → local output files / database.]

Figure 5.1: Overview of the service-oriented cloudification methodology applied to an abstract Monte Carlo simulation. Dashed boxes and arrows indicate optional stages and transitions, which may not be necessary for every Monte Carlo simulation.

For instance, a platform, which can provide the EnKF-HGS simulator with the aforementioned features, is the CLAUDE-building platform described in the following subsection.

5.3.2 CLAUDE-building platform We present a lightweight, modular, and highly extensible platform for building cloud-based computing-oriented services (CLAUDEs). The platform allows the user to construct highly configurable computing-oriented cloud services, which are able to individually address each requirement listed above.

5.3.2.1 Motivation Considering all the advantages of cloud computing presented in Section 2.2, there is an immense potential for the scientific community in migrating traditional HPC-based scientific applications to clouds. Clouds can be considerably cheaper than supercomputers, more reliable than grid computing, and much easier to scale than clusters. Taking into account the distinct benefits of SOC presented in Section 5.1, the potential value of migrating a traditional scientific application to clouds combined with an application of the SOC principles can be remarkably high. Numerous studies presented in Section 5.2 have examined this question by targeting various aspects and characteristics of clouds. Different solid solutions have been elaborated and suggested to solve specific cloud tasks, i.e., resource availability, resource provisioning, scheduling, load balancing, and monitoring. However, it is worth mentioning that these solutions can be considerably different and often incompatible. Moreover, a scientist or researcher who applies these solutions to a specific computing system may face several challenges, specifically:

1. What solution(s) to choose when many of them have an overlap in features? 2. How to make the solutions compatible? 3. How to cope with an increasing computing system’s complexity? 4. How to reduce a possibility of system’s failure when integrating external solutions?

To deal with these concerns, we propose a CLAUDE-building platform, which will serve similarly to the autonomic distributed framework suggested by Weingärtner et al. [168]. The clear advantages of the CLAUDE-building platform are its compatibility with different cloud operating systems, e.g., OpenStack, OpenNebula, Amazon EC2, as well as modularity, extensibility, and relatively easy integration with external solutions.

5.3.2.2 Platform description Two core principles of the platform are: (i) modularity and (ii) extensibility. This means that the resulting service should consist of multiple small and self-contained modules with clear and not overlapping functionality rather than be a single monolithic piece of complex software. In the context of CLAUDE-building platform, we call these modules – service applications, which communicate with each other over communication protocols. Respecting the platform core principles allows us to build extensible and highly configurable SOC-oriented cloud services by modifying and/or reusing existing service applications.

Abstract service application Abstract service application is an essential building block of the CLAUDE-building platform, which lays a foundation for all service applications. For instance, an internal organization of a simple service application may be realized through the state pattern implementation.

In this pattern, each state would represent a particular piece of application logic, while state transitions would be triggered by either incoming/outgoing protocol commands or internal application events. At the higher level, a service application represents a loosely-coupled and self-contained piece of service logic that can communicate with other applications over communication protocols. At the lower level, it comprises multiple core elements: (i) an application layer, (ii) a system layer, and (iii) a messaging gateway.
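A minimal sketch of how such a state-driven service application could be organized is given below. The state names, events, and dispatch loop are purely illustrative assumptions and do not show the actual CLAUDE building blocks.

    class State:
        """One piece of application logic; handling an event returns the next state's name."""
        def handle(self, event):
            raise NotImplementedError

    class Idle(State):
        def handle(self, event):
            return "running" if event == "start" else "idle"

    class Running(State):
        def handle(self, event):
            return "idle" if event in ("done", "stop") else "running"

    class ServiceApplication:
        """Application layer: reacts to protocol commands or internal events via state transitions."""
        def __init__(self):
            self.states = {"idle": Idle(), "running": Running()}
            self.current = "idle"

        def on_event(self, event):
            self.current = self.states[self.current].handle(event)
            return self.current

    app = ServiceApplication()
    print(app.on_event("start"), app.on_event("done"))   # prints: running idle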


Figure 5.2: Application communication model.

Figure 5.2 illustrates a general application communication model. In this model, all internal application logic lies at the application layer, while all communication and message routing logic is at the system layer. The messaging gateway represents a unified communication interface between the two layers. A more detailed view of the system layer is presented in Figure 5.3. There we can distinguish multiple lower level components (e.g., channel endpoint, system router, endpoint proxy). However, their presentation is beyond the scope of this thesis.


Figure 5.3: Detailed application communication model.

Abstract communication protocol Abstract communication protocol is the second essential building block of the CLAUDE-building platform. It aims to provide a foundation for developing standardized communication channels between two or more service applications. The resulting communication protocol may implement one of three available communication patterns: (i) One-to-One, (ii) One-to-Many, (iii) Many-to-One, or a combination of them.

76 and an optional (iii) endpoint proxy that may contain partial protocol handling logic in order to automate handling of certain protocol commands at the system layer (see Figure 5.3).
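Purely for illustration, the three protocol elements could be sketched as follows; this is a minimal sketch whose class names and behaviour are assumptions, not the actual CLAUDE implementation:

```python
class Channel:
    """Transports raw messages between endpoints (in reality, e.g., a ZeroMQ socket pair)."""

    def __init__(self):
        self.queue = []

    def send(self, message):
        self.queue.append(message)

    def receive(self):
        return self.queue.pop(0) if self.queue else None


class ChannelEndpoint:
    """Application-facing end of a channel; encodes protocol commands as messages."""

    def __init__(self, channel):
        self.channel = channel

    def send_command(self, name, payload):
        self.channel.send({"command": name, "payload": payload})


class EndpointProxy(ChannelEndpoint):
    """Optional element holding partial protocol logic handled at the system layer."""

    def poll(self):
        message = self.channel.receive()
        if message and message["command"] == "PING":
            # Heartbeat-like commands can be answered here, without ever
            # reaching the application layer.
            self.send_command("PONG", {})
            return None
        return message


# Two endpoints sharing one channel: the proxy answers heartbeats on its own.
channel = Channel()
peer = ChannelEndpoint(channel)
proxy = EndpointProxy(channel)
peer.send_command("PING", {})
assert proxy.poll() is None   # PING consumed and answered at the system layer
print(channel.receive())      # the PONG reply produced by the proxy
```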

Discovery group
The discovery group is a special CLAUDE-building platform concept that helps to group service applications by a certain application aspect (e.g., the type of application). It aims to group a set of service applications in order to automatically propagate updates of a changing application parameter within the group. There are three types of parties defined in a discovery group: (i) spectators, i.e., applications that wish to receive the group's updates; (ii) members, i.e., applications that hold the changing parameter; and (iii) the server, i.e., an application that maps spectators to members and propagates parameter updates. Additionally, a discovery group performs as an application discovery service, which facilitates the discovery of application information unknown beforehand (e.g., a service application's IP address).

CLAUDE discovery protocol (CLAUDE-DSCVRY protocol)
The CLAUDE-DSCVRY protocol was developed to operate as a standardized communication channel for the aforementioned discovery group parties. It inherits the parameter propagation features from the One-to-Many abstract communication protocol and the parameter monitoring features from the Many-to-One abstract communication protocol. From the conceptual point of view, the CLAUDE-DSCVRY protocol implements a publish-subscribe messaging pattern with an intermediary information collection point and an integrated heartbeating mechanism. The goals of the CLAUDE-DSCVRY protocol are:

1. To allow service members to join a server's discovery group;
2. To allow service spectators to subscribe to updates of a server's discovery group;
3. To automatically detect and propagate any availability or internal parameter changes of members in a server's discovery group;
4. To have a built-in authentication mechanism for discovery group access control using the Simple Authentication and Security Layer (SASL) [115].

Overall, the CLAUDE-DSCVRY protocol allows service developers to concentrate on the actual functionality of the service rather than on necessary but routine features such as the data propagation mechanism or the security layer implementation.
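Conceptually, the publish-subscribe pattern with an intermediary collection point can be sketched with ZeroMQ, which CLAUDE-0 uses as its messaging layer. The socket types, ports, and message layout below are illustrative only and do not reflect the actual CLAUDE-DSCVRY wire format, SASL authentication, or heartbeat timeouts:

```python
import zmq

ctx = zmq.Context.instance()

# Server: collects member heartbeats and republishes them to spectators.
collector = ctx.socket(zmq.PULL)
collector.bind("tcp://*:5556")
publisher = ctx.socket(zmq.PUB)
publisher.bind("tcp://*:5557")

# Member: periodically reports its changing parameter (here, a fake load value).
member = ctx.socket(zmq.PUSH)
member.connect("tcp://localhost:5556")
member.send_json({"member": "resource-daemon-1", "load": 0.42})

# Spectator: subscribes to all updates of the discovery group.
spectator = ctx.socket(zmq.SUB)
spectator.connect("tcp://localhost:5557")
spectator.setsockopt(zmq.SUBSCRIBE, b"")

# One server iteration: forward a received heartbeat to every spectator.
update = collector.recv_json()
publisher.send_json(update)
```

In a real deployment, the server would additionally time out members whose heartbeats stop arriving and notify spectators of the resulting availability change.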

5.3.2.3 The CLAUDE-0 architecture
As a proof of concept, we designed and implemented a cloud-based, computing-oriented service called CLAUDE-0 using the building blocks of the CLAUDE-building platform described above. This service allows us to cloudify the EnKF-HGS simulator using the previously presented cloudification methodology. Following the platform principles and general SOA guidelines, we defined a set of service applications such that each application encapsulates only a specific part of the service logic. This enables the modification or replacement of any application without making changes to the others. To this end, we first derived a set of service functionality and reliability requirements:

Requirement 1: The end user should be able to control the computing resource pool size.
Requirement 2: The end user should be able to submit software input, request software execution, and then retrieve software output.
Requirement 3: The end user should be able to monitor software execution progress.
Requirement 4: The service should automatically distribute and schedule software for execution.
Requirement 5: The service should automatically detect any software execution failures and perform re-execution if necessary.
Requirement 6: The service should automatically detect any service application failures and be able to recover from them.
Requirement 7: The service should automatically deny unauthorized access to service applications.

In order to fulfill these requirements, we relied on core platform building blocks, i.e., abstract service application, abstract One-to-One communication protocol, discovery group, and CLAUDE-DSCVRY protocol. Using these primitives, we developed seven service applications and connected them with six application layer communication protocols (see Figure 5.4):

Client application is the end user's interface for interacting with the service. Through the client application, the user performs the available operations, e.g., requests software execution, monitors running software, and controls the resource pool size. At the application layer, the client application interacts with the job manager application over the CLI-TO-JM protocol and with the resource controller application over the CLI-TO-RC protocol.

Resource controller application controls the pool of computing resources. It can expand and/or shrink the resource pool by spawning and/or terminating VMs through an Amazon EC2 endpoint. At the application layer, the resource controller application interacts with the client application over the CLI-TO-RC protocol and pushes configuration commands to the resource daemon application over the RC-TO-RD protocol.

Job manager application keeps track of all user requests for software execution and persists changes of the software execution status. In terms of communication with other applications, at the application layer, the job manager application interacts with the client application over the CLI-TO-JM protocol and with the job scheduler over the JM-TO-JS protocol. The job manager application also receives updates of the software execution status from the resource daemon application over the RD-TO-JM protocol.

Job scheduler application is responsible for scheduling software execution on available computing resources. At the application layer, the job scheduler application receives new software execution requests from the job manager application over the JM-TO-JS protocol and pushes the actual execution commands to the resource daemon application over the JS-TO-RD protocol.

Resource daemon application runs the software and monitors its execution status. At the application layer, the resource daemon application receives configuration commands from the resource controller application over the RC-TO-RD protocol and software execution commands from the job scheduler application over the JS-TO-RD protocol, and reports software execution status updates to the job manager application over the RD-TO-JM protocol.

Resource monitor application is a registry for available computing resources. It keeps track of running resource daemon applications and reports any status changes.

Figure 5.4: CLAUDE-0 applications and their interconnection over the application layer communication protocols. Arrowheads show the connection initialization direction, while dashed lines indicate currently not implemented protocols.

Application monitor application is a registry for service applications. It monitors resource monitor, job scheduler, job manager, and resource controller applications and reports any status changes.

At the system layer, we organized the applications into two individual discovery groups by using the CLAUDE-DSCVRY protocol. The first discovery group is named the application discovery group; its members and its server form the CLAUDE-0 core. The second discovery group is named the resource discovery group; its members form the pool of computing resources. The two discovery groups and the applications that compose them are presented in Table 5.1, while Figure 5.5 illustrates the interconnection of the applications. Such an organization of service applications allows us to fulfill all seven functionality and reliability requirements presented above, in particular:

Requirement 1: The end user monitors running VMs by receiving status updates from the resource monitor application over the CLAUDE-DSCVRY protocol, while the resource controller application gives full control over resources through the CLI-TO-RC protocol.

Dscvry Group 1 (application discovery group)
  Spectators: Client, Job manager, Job scheduler, Resource controller
  Members: Job manager, Job scheduler, Resource controller, Resource monitor
  Server: Application monitor

Dscvry Group 2 (resource discovery group)
  Spectators: Client, Job scheduler, Resource controller
  Members: Resource daemon
  Server: Resource monitor

Table 5.1: CLAUDE-0 discovery groups and their comprising applications. Dscvry Group 1 corresponds to the application discovery group, while Dscvry Group 2 corresponds to the resource discovery group.

Figure 5.5: CLAUDE-0 applications and their interconnection over the CLAUDE-DSCVRY protocol. Arrowheads show the connection initialization direction.

Requirements 2 and 3: The job manager application and the CLI-TO-JM protocol give full control over the software execution life cycle and enable the retrieval of the execution history through run/terminate/monitor operations.
Requirement 4: The job scheduler application is responsible for scheduling and distributing software execution over the JS-TO-RD protocol. It also automatically receives status updates of the available resources from the resource monitor application over the CLAUDE-DSCVRY protocol.
Requirement 5: The job manager application receives software execution status changes from the resource daemon application over the RD-TO-JM protocol. It also receives resource daemon application status changes from the resource monitor application over the CLAUDE-DSCVRY protocol. In case of a failure, the job manager application is able to reschedule either the failed software or the software that was running on the failed resource.
Requirement 6: The application monitor application reports any status changes of service applications over the CLAUDE-DSCVRY protocol. In case of a service application failure, all other applications are informed automatically and can react accordingly.
Requirement 7: The CLAUDE-DSCVRY protocol has a built-in authentication mechanism through SASL. This allows the user to implement any desired authorization/authentication system very quickly.

In the end, CLAUDE-0 covers key computing environment concepts such as monitoring, scheduling, resource provisioning, persistence, fault tolerance, protocol design, and distributed communication. The modular service design and the replaceability of service applications allow us to enhance or simply experiment with each of these concepts separately. Overall, it provides an excellent testbed environment for further research activities.

5.3.2.4 Implementation and deployment
As described in Subsection 2.2.1, cloud computing offers SaaS as a major service delivery model. SaaS may provision applications that use a web browser or predefined application programming interfaces (APIs) as their main communication interface. CLAUDE-0 also implements a typical provisioning model for applications delivered as SaaS. Overall, CLAUDE-0 is an easily migratable SaaS-based service, which is compatible with a wide range of cloud platforms (e.g., Amazon EC2, OpenNebula, OpenStack). Moreover, CLAUDE-0 allows its users to benefit from the features of the cloud environment for ordinary, non-interactive, computation-intensive applications commonly used in the high performance computing domain. Both the CLAUDE-building platform and CLAUDE-0 are implemented in Python and rely on the ZeroMQ framework as a communication medium for scalable and high performance distributed messaging.

To deploy CLAUDE-0, we need a physical or virtual machine for the installation of the CLAUDE-0 core, access to a cloud data storage that supports the Amazon S3 protocol for input/output data, and access to a cloud resource pool through an Amazon EC2 endpoint (see Figure 5.6). This allows the end user to adjust both the execution time, by exploiting parallel execution (e.g., for Monte Carlo applications), and the amount of consumed resources and, consequently, the final costs.

Figure 5.6: CLAUDE-0 deployment model.

From the user's perspective, CLAUDE-0 has been designed to be very user-friendly. By using CLAUDE-0, the user can easily access and launch the software without any prior knowledge of how to run applications on clouds. Once the CLAUDE-0 core is installed and configured, an operating system (OS) image for the virtual machine (VM) must be prepared, and the required software and the resource daemon application must be installed on it. Afterwards, the OS image is uploaded to the cloud repository. The uploading process may differ slightly from one cloud platform to another; however, most popular providers (e.g., Amazon EC2, OpenNebula, OpenStack) offer step-by-step tutorials explaining the entire process. Additionally, the user is expected to provide the CLAUDE-0 core with a Python script that defines software execution parameters and input/output locations. When all these steps are completed, the system is ready for execution. To minimize the necessary changes to the original application source code, CLAUDE-0 provides the user with a considerable advantage: a set of communication drivers, which implement communication operations in the cloud. The CLAUDE-0 drivers interact with the CLAUDE-0 core through the same application layer communication protocols as the client application. Therefore, CLAUDE-0 makes it easy to port an existing application, which was not designed for a cloud environment, to clouds.
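The exact format of this execution description is internal to CLAUDE-0; purely as an illustration, such a script could look like the following minimal sketch, in which all keys, bucket names, and binary names are hypothetical:

```python
# Hypothetical execution description handed to the CLAUDE-0 core.
# Keys, storage locations, and resource values are illustrative only.
EXECUTION = {
    "binary": "hgs",                                    # software installed in the VM image
    "arguments": [],                                    # command line arguments, if any
    "input": "s3://enkf-hgs/input/realization-042/",    # downloaded before execution
    "output": "s3://enkf-hgs/output/realization-042/",  # uploaded after execution
    "resources": {"vcpus": 1, "memory_mb": 2048},
}
```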

5.4 Application to EnKF-HGS

Following the service-oriented cloudification methodology described in Section 5.3 and using CLAUDE-0 presented in Subsection 5.3.2, we proved its viability by applying it to the original EnKF-HGS simulator. As described in Section 3.4, EnKF-HGS combines an iterative Monte Carlo method with the integrated hydrological modelling software HydroGeoSphere. This combination allows the simulator to continuously improve the deviating numerical model through a data assimilation process. In order to deal with the extremely high demand for computing power, the original EnKF-HGS implementation uses the Message Passing Interface (MPI) framework for workload distribution.

5.4.1 Simulator analysis
EnKF-HGS implements the data assimilation process through the ensemble Kalman filter, which is an implementation of the Monte Carlo method. In this case, the individual sampling processes of the general Monte Carlo method correspond to the ensemble members (i.e., individual model realizations) of the ensemble Kalman filter. Starting with the analysis step of the methodology, there are two conditions that should be fulfilled in order to consider the application suitable for the methodology. The first condition of the methodology requires the execution time t_i of every individual model realization to vary when provided with different input parameters. In order to verify whether the EnKF-HGS simulator fulfills this condition, we executed it with an input model and the following parameters: 144 method iterations and 100 individual model realizations. Figure 5.7 presents the results of the execution and shows the maximal, average, and minimal execution times in each iteration. We can clearly see the variation of execution times; thus, we consider the condition to be fulfilled.

Figure 5.7: Variation of the execution time of individual model realizations.

The second condition of the methodology requires the average execution time t_avg of the individual model realizations to be much greater than the cloudification overhead t_ov. Otherwise, the benefits of the methodology will be negated by the accumulated overhead. This condition heavily depends on the performance of the external scheduler, which in our case is CLAUDE-0, and much less on the simulation workflow itself. At the moment, CLAUDE-0 cannot show an outstanding scheduling performance as it is only in a proof of concept state. However, its extensibility and modular architecture allow the user to improve the necessary components without affecting the whole service. We discuss this in more detail in Subsection 5.4.3.
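In other words, the methodology remains beneficial as long as t_avg is much greater than t_ov. A trivial way to express this check is sketched below; the factor of ten is an arbitrary illustrative threshold, not a value prescribed by the methodology:

```python
def cloudification_worthwhile(realization_times, overhead_time, factor=10):
    """Return True if the average realization time dominates the per-task
    cloudification overhead (data transfer + scheduling) by `factor`."""
    t_avg = sum(realization_times) / len(realization_times)
    return t_avg > factor * overhead_time


# Example with made-up timings in seconds:
print(cloudification_worthwhile([30.0, 45.0, 60.0], overhead_time=2.0))
```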

5.4.2 Cloudification procedure

Execution models: CLAUDE vs. MPI
In order to port EnKF-HGS to clouds, the CLAUDE-0 driver was integrated into the program source code and all direct execution calls of HGS were replaced with their CLAUDE-0-based counterparts. This cloudification procedure changes the initial MPI execution model of EnKF-HGS. However, before looking at the final execution model, it is important to understand the general CLAUDE-0 execution logic. Figure 5.8 depicts the interactions between the CLAUDE-0 components presented in Subsection 5.3.2. The CLAUDE-0 components are identified with Roman numerals (I to IV), while the number (1) refers to the user's software executed in the cloud.

Figure 5.8: CLAUDE-0 components invocations. Legend: I: CLAUDE-0 client; II: CLAUDE-0 core and data storage; III: CLAUDE-0 resource daemon; IV: CLAUDE-0 driver; 1: EnKF-HGS / GROK / HGS.

Execution request
I. Request submission: Through the client application, the user sends a software execution request to the CLAUDE-0 core.
II. Scheduling: When the CLAUDE-0 core receives the software execution request, it finds an available computing resource according to the internal scheduling policy and forwards the request to that computing resource.

Software execution
III. Execution preparation: The resource daemon application receives the execution request, prepares the execution environment, downloads the input, and then launches the requested software.
1. Software execution: One of the three available software binaries, i.e., EnKF-HGS, GROK, or HGS, is executed on the previously created input directory.
IV. Subworkflow execution: In case of EnKF-HGS, the CLAUDE-0 driver is called for a subworkflow execution (a sketch of this driver-based invocation is given after this walkthrough). The driver interacts with the CLAUDE-0 core similarly to the client application, and the execution continues from stage II (Scheduling) of the Execution request block. Meanwhile, the driver waits for the execution termination signal in order to continue the execution of EnKF-HGS.
III. Clean up: When the software execution is finished, the resource daemon application cleans up the execution environment, uploads the output, and sends a signal to the CLAUDE-0 core.

Results retrieval
II. Status update: The CLAUDE-0 core receives the execution termination signal from the resource daemon application and forwards it to the originator of the execution request. (In case of an EnKF-HGS subworkflow execution, the originator of the request is the CLAUDE-0 communication driver.)
I. Execution results: The user receives the execution termination signal through the client application.
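To give an intuition of stage IV, the following minimal sketch contrasts a local invocation of the binaries with a driver-based one; the driver API shown here (upload_input, submit, wait_all, download_output) is hypothetical and only mimics the behaviour described above:

```python
class FakeDriver:
    """Stand-in for the hypothetical CLAUDE-0 driver API used in this sketch."""

    def upload_input(self, directory):
        print(f"uploading {directory} to cloud storage")

    def submit(self, binary, input_dir):
        print(f"requesting execution of {binary} on {input_dir}")
        return (binary, input_dir)

    def wait_all(self, jobs):
        print(f"waiting for {len(jobs)} termination signals")

    def download_output(self, directory):
        print(f"downloading results for {directory}")


def run_ensemble(driver, realization_dirs):
    # Before cloudification each realisation would be launched locally, e.g.:
    #   subprocess.run(["grok"], cwd=directory); subprocess.run(["hgs"], cwd=directory)
    jobs = []
    for directory in realization_dirs:
        driver.upload_input(directory)
        jobs.append(driver.submit(binary="hgs", input_dir=directory))
    driver.wait_all(jobs)
    for directory in realization_dirs:
        driver.download_output(directory)


run_ensemble(FakeDriver(), ["real_001", "real_002"])
```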

The cloudified EnKF-HGS separates the main EnKF-HGS simulation loop from the HGS instances, which run on remote computing resources. For this purpose, the CLAUDE-0 driver transmits the relevant input for each HGS model realisation to the cloud data storage and requests the CLAUDE-0 core to execute the corresponding simulations on available cloud computing resources. When all the simulations are completed, the CLAUDE-0 driver retrieves the relevant output from the cloud data storage. The data is then returned to EnKF-HGS, which continues the data assimilation loop. Figure 5.9 depicts the final execution model of the cloudified EnKF-HGS. The important EnKF-HGS workflow steps are identified with numbers (1 to 3), the auxiliary steps are identified with letters (A and B), while CLAUDE-0 driver invocations are identified with the Roman numeral IV.

Ensemble Kalman filter simulation
A. Initialization / Data distribution: The first step is to read the input files, initialize the global data structures, and generate realization-specific files in separate directories. This step combines the initialization and data distribution steps of the original workflow.
IV. Model simulation (GROK & HGS): Instead of running an ensemble of model simulations locally, the CLAUDE-0 driver requests the CLAUDE-0 core to execute them on available computing resources and waits for the results.

Figure 5.9: Execution models of cloudified a) EnKF-HGS and b) HydroGeoSphere.

IV. Ensemble simulation completed: When all individual simulation results are retrieved, the EnKF-HGS workflow execution continues.
3. Distributed post-processing: The filtering phase of the ensemble Kalman filter workflow is executed similarly to the original EnKF-HGS workflow.
B. Termination / Aggregation: After all iterations are completed, the output is written to disk and the program terminates. This step is similar to the one in the original EnKF-HGS workflow.

HydroGeoSphere simulation
A. Initialization: An individual model simulation does not need any additional initialization. All relevant input files are already extracted to the working directory.
1. Data pre-processing (GROK): This step is identical to the corresponding step of the original workflow. GROK is executed in order to generate the HGS-specific input files.
2. Model simulation (HGS): This step is identical to the corresponding step of the original workflow. HGS reads the output of GROK, runs the model, and writes the model output.
B. Termination: An individual model simulation does not need any additional termination or clean-up.

This service-oriented execution model allows us to dynamically adjust the amount of computing resources to momentary workload conditions. Moreover, one can benefit from an external scheduling component for better resource utilization. Additionally, this minimizes the network input/output (I/O) operations because all intermediate data files are stored locally on the computing resource.

5.4.3 Evaluation The EnKF-HGS simulator combines a Monte Carlo simulation method with an iterative process. As a result, the simulated model is continuously improved. In every iteration, two essential phases can be identified: (i) simulation of an ensemble (i.e., a set) of model realisations (referred to as the forward propagation phase), and (ii) an update, or feedback, phase for state vectors (and optionally parameters) (referred to as the filtering phase). The forward propagation phase computes all individual model realisations of the ensemble. Computing a single model realisation requires executing two proprietary binaries, i.e., GROK and HGS. GROK is a pre-processor of HGS, which prepares the input files for HGS. The runtime of each HGS realisation strongly depends on input parameters and model complexity. Normally, HGS is a comparably long-running and compute-intensive process due to a high non-linearity of the hydrological processes that are simulated. Moreover, due to a large number of parallel model realisations in the forward propagation phase, the main demand for computing power in the EnKF-HGS workflow comes from the execution of HGS. In the cloudified version of the EnKF-HGS simulator, the forward propagation phase is, therefore, distributed over multiple cloud computing resources. On the other hand, the filtering phase remains centralized. The focus of this analysis is to evaluate the performance of both implementations (i.e., original and cloudified). First, we will assess the execution model performance of the original implementation. Then we will evaluate two essential phases mentioned above. We will present and discuss the execution results of the forward propagation and filtering phases. Then we will evaluate the cloudified implementation and overview the execution models’ specifics, i.e., cloudification overhead and scheduling aspects. In the end, we will present the overall results of the methodology application to EnKF-HGS.

5.4.3.1 Experimental setup
In order to compare the performance of the two implementations, we executed both versions of EnKF-HGS on a private cloud operated by OpenNebula. In the experiment, the model performs 144 iterations with an ensemble of 100 model realisations. Since all realisations are independent HGS instances, a set of 4 experiments with varying numbers of CPU cores was conducted: (a) 10, (b) 20, (c) 50, and (d) 100. In each experiment, one single CPU core was assigned per HGS instance. In the original MPI-based implementation of EnKF-HGS, this resulted in 10, 5, 2, and 1 HGS executions per CPU core for (a), (b), (c), and (d), respectively. In the cloudified version of EnKF-HGS, CLAUDE-0 adaptively distributed the execution of the HGS instances over the available computing resources in every iteration by using the Round Robin algorithm, which improves resource utilization (a simple sketch of such an assignment is shown below). The specifications of the testbed are described afterwards.
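As a simple illustration of such a policy, a generic round-robin assignment of realisations to workers can be expressed as follows (this is a sketch, not the CLAUDE-0 scheduler code):

```python
def round_robin(realizations, workers):
    """Assign each realisation to a worker in circular order."""
    assignment = {worker: [] for worker in workers}
    for index, realization in enumerate(realizations):
        worker = workers[index % len(workers)]
        assignment[worker].append(realization)
    return assignment


# With 100 realisations and 10 workers, each worker receives 10 realisations;
# unlike a static MPI mapping, the assignment can be recomputed every iteration.
schedule = round_robin(range(100), [f"worker-{i}" for i in range(10)])
print({worker: len(tasks) for worker, tasks in schedule.items()})
```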

Testbed: private cloud
For the MPI platform, we relied on (a) 2, (b) 3, (c) 7, and (d) 13 worker nodes. For the CLAUDE-0 platform, we relied on two types of worker nodes: (i) worker-1 and (ii) worker-2. One instance of the worker-1 node was dedicated to the EnKF-HGS core execution, while (a) 2, (b) 3, (c) 7, and (d) 13 instances of worker-2 nodes were dedicated to the HGS simulator execution, as explained in Subsection 5.4.2. Regarding the network storage, each platform required a different type of network storage. For the MPI platform, we deployed the latest version (3.7.14) of GlusterFS on 3 additional storage nodes, as the application requires a POSIX-compliant file system. The nodes were organized in a distributed volume with no data replication in order to maximize the storage performance. On the computing nodes' side, we used the FUSE-based Gluster Native Client for highly concurrent access to the file system. For the CLAUDE-0 platform, we deployed an S3-compliant Riak CS solution [1] on one additional storage node. Table 5.2 shows a summary of the selected nodes and their assigned roles in both the CLAUDE-0 and MPI execution platforms. In addition, Table 5.3 describes the hardware characteristics of these nodes.

CLAUDE-0 platform: worker-1 (type: small, amount: 1); worker-2 (type: large, amount: 2/3/7/13); storage (type: medium, amount: 1).
MPI platform: worker (type: large, amount: 2/3/7/13); storage (type: medium, amount: 3).

Table 5.2: Private cloud machine selection for the CLAUDE-0 and MPI platforms.

small: Intel Xeon L5420 @2.50GHz, 1 vCPU core, 2.5 GB memory, 500 GB HDD storage, 1 Gb/s Ethernet.
medium: Intel Xeon L5420 @2.50GHz, 4 vCPU cores, 3.5 GB memory, 500 GB HDD storage, 1 Gb/s Ethernet.
large: 2 x Intel Xeon L5420 @2.50GHz, 8 vCPU cores, 7.4 GB memory, 500 GB HDD storage, 1 Gb/s Ethernet.

Table 5.3: Technical specifications of the selected machines.

5.4.3.2 Performance analysis of the original implementation
As discussed before, the original EnKF-HGS was designed for execution in a traditional cluster environment, which normally provides (i) a low latency broadband network connection and (ii) access to a high performance network file system. Violation of these assumptions might lead to a dramatic performance drop, as illustrated in Subsection 4.4.3. The second important aspect of the MPI implementation is the statically assigned number of model realisations per MPI process. This is a typical design choice for an MPI-based application due to a lack of load balancing, e.g., through a built-in job-stealing mechanism. In an ensemble with individual simulations of heterogeneous duration, EnKF-HGS might therefore show relatively poor performance: some MPI processes might finish all assigned simulations sooner than others, while EnKF-HGS must wait for all individual realisations to finish before starting the filtering phase. Therefore, the CPU idle time during the forward propagation phase was measured. Figure 5.10 shows the relative CPU idle time per iteration for the four experiments (a), (b), (c), and (d) described above. The heterogeneous execution times of the realisations are additionally amplified by the I/O saturation problem described in Subsection 4.4.3, yielding the following average results: (a) 8.87%, (b) 23.84%, (c) 23.09%, and (d) 20.37% of idle time for one or more CPU cores. Considering these numbers, it is evident that the statically assigned number of model realisations per MPI process results in a non-optimal utilization of CPU resources.

Figure 5.10: Relative CPU idle time per iteration.

5.4.3.3 Performance analysis of the forward propagation phase
In this experiment, only the duration of the forward propagation phase is considered. Figure 5.11 illustrates the computation time ratio between the two implementations against the iteration step. In the original implementation, this time is defined as the elapsed time from the beginning of the forward propagation phase to the end of the longest running MPI process. In the cloudified version, it is the time interval between the beginning of the forward propagation phase and the end of the last finishing HGS instance. It is worth mentioning that the data transmission and cloud scheduling times must also be taken into account in the cloudified version. The execution time of the forward propagation phase is comparable for the two implementations. A small difference of 4-15% in favour of the original MPI-based approach can be observed in experiments (c) and (d).

5.4.3.4 Performance analysis of the filtering phase
In contrast to the forward propagation phase, the filtering phase was not distributed, as it requires significantly less computing power and, thus, much shorter execution times. EnKF-HGS performs filtering on the initially allocated system resources. In the original implementation, this includes the entire resource pool ranging from 10 to 100 CPU cores. In the cloudified version, only one CPU core is initially provided to the filtering step (see Table 5.2). The relative execution time ratios of the filtering phase of the two implementations are shown in Figure 5.12. Surprisingly, in experiment (d), the filtering phase running on 100 CPU cores is 20% slower than on a single CPU core. This difference in performance is a result of the network connection of a virtual cluster in the cloud environment.

A regular physical cluster normally guarantees a high throughput and low latency connection. On a cloud, machines can be geographically distributed; hence, they might be connected with links of lower capacity and higher latency. In our experiments, the cloud workers were connected with a regular 1 GbE-T Ethernet connection, which is considered slow from the point of view of a regular cluster environment. As a result, the benefit of the parallel computation with multiple resources was surpassed by the data transmission time between the EnKF-HGS MPI processes. Therefore, the regular MPI-based setup is less suited for cloud computations with heterogeneous or low capacity resources.

Figure 5.11: Relative computation time of the forward propagation phase for two different execution approaches.

Figure 5.12: Relative computation time of the filtering phase for two execution approaches.

5.4.3.5 Performance analysis of the cloudified version
The forward propagation phase in the cloudified EnKF-HGS accommodates three individual, time-consuming processes: (i) input/output (I/O) data transmission, (ii) scheduling, and (iii) HGS execution. In the initial four experiments, the duration of every process was measured. Moreover, we separated the worker initialization time (i.e., worker overhead) from the actual simulation time. Figure 5.13 shows the relative duration of all these processes. The time spikes in the simulation phase correspond to long-running realisations. Figure 5.14 shows the absolute execution time of the nonsimulation processes. Considering these two figures, the I/O data transmission time is almost constant for the full ensemble execution as well as for the number of available CPU cores. The total nonsimulation time slightly decreases with the growing number of available CPU cores, but it is almost constant for a fixed amount of computing resources. The scheduling time also grows with the number of available CPU cores; however, it is almost constant for a fixed amount of computing resources. While keeping a relatively constant duration of the nonsimulation processes, this approach allows the parallel execution of individual realizations of complex hydrogeological models with long-running processes in a cloud environment. In practice, CLAUDE-0 is still in a proof of concept state, and the total nonsimulation time can be significantly reduced. By using an enhanced external scheduler and an I/O-optimized implementation with already existing scheduling and data locality techniques, the total nonsimulation time can be reduced further, which would highlight the benefits of the service-oriented execution approach. At the moment, the total nonsimulation time corresponds to 40% to 80% of the overall execution time for the given experiment.

Figure 5.13: Processes involved in the forward propagation phase of the cloudified simulator.

Figure 5.14: Absolute time of the nonsimulation processes.

5.4.3.6 Results of the methodology application to EnKF-HGS
Considering the features of cloud computing, i.e., on-demand resource provisioning and potentially unlimited scalability, there is a high potential for running compute-intensive simulations on clouds, as there will not be any financial and personnel overhead for acquiring and maintaining the required computational infrastructure. The primary goal of this cloudification methodology is to make an iterative Monte Carlo simulation benefit from the optimal resource utilization and, hence, improve the overall performance. To apply the cloudification methodology to a typical cluster-based iterative Monte Carlo data assimilation system, i.e., the EnKF-HGS simulator, we decided to migrate the original simulator source code to clouds with minimal changes in the implementation and a linkage to pre-existing management tools for the cloud-based execution. The changes were relatively straightforward because the EnKF-HGS simulator implements a Monte Carlo-based data assimilation approach. Considering the definition of the Monte Carlo simulation method described in Subsection 3.3.2, every model realization of the ensemble is completely independent. Hence, the ensemble can easily be distributed among the available CPUs and nodes, and a cloud infrastructure is an ideal environment for such tasks. The performance of the EnKF-HGS simulator clearly shows that the cloudified implementation produces an affordable overhead with respect to the data transfer and execution times. Considering the cloud execution, we developed the easily migratable cloud-based execution service CLAUDE-0 to distribute subworkflows over available cloud resources. With CLAUDE-0, we obtained a successfully cloudified version of EnKF-HGS, which executes comparatively well and has a predictable service-related overhead. CLAUDE-0 provides us with such advantages as:

1. Access to external services for automatic job distribution, efficient job scheduling, and optimal utilization of computational resources;
2. A well integrated data storage service.

Overall, CLAUDE-0 facilitates the EnKF-HGS development process, improves the application performance, and lowers the demands for the target infrastructure. It is worth mentioning that, for the proof of concept, the EnKF-HGS simulation was executed with a relatively simple real-world model. This setup has a considerably smaller problem size and fewer grid cells than a real-world target setup. With a more complex model, the size and the number of data packages that have to be transferred will substantially increase, which will lead to a higher data transmission time. Consequently, the execution time could be considerably higher than for the simple model we used. Overall, it means that the ratio of data transfer time to simulation time will remain comparable with the increased complexity of input models executed in clouds.

5.5 Summary

In this chapter, we presented a vision of computing as the 5th utility, which provides a broad community of users, i.e., individuals, households, and enterprises, with a basic level of computing services regardless of location or delivery means. A number of computing paradigms, i.e., supercomputers, clusters, grids, and clouds, could deliver this vision. Even though each paradigm is different, all of them can be characterized through such important aspects as resource availability, resource provisioning, scheduling, load balancing, and monitoring. Service-oriented computing (SOC) has been introduced as a paradigm which considers the dynamic nature of clouds and facilitates the development of reliable mechanisms to efficiently realize the aforementioned aspects. Altogether, clouds and SOC were regarded as alternative paradigms that allow the scientific community to gain a potentially unlimited amount of resources on the fly and to build and reuse applications with interoperable, loosely-coupled components. Considering potential challenges in migrating traditional scientific applications to clouds, we overviewed numerous studies. In Section 5.2, we presented various academic and industry solutions targeted at different aspects of the cloud computing environment, i.e., resource availability, resource provisioning, scheduling, load balancing, and monitoring. In Section 5.3, we proposed a methodology which facilitates the migration of a specific type of scientific applications, i.e., iterative Monte Carlo simulations, to clouds. Moreover, we suggested a platform which can provide a scientific application with access to an external scheduling service, a reliable mechanism to organize application components into self-contained and independent services, and the ability to resize the pool of computing resources. By using the primitives of this platform, we built CLAUDE-0, which is a lightweight, cloud platform-independent, modular, and highly extensible computing-oriented service. In Section 5.4, we successfully applied the cloudification methodology to a typical iterative Monte Carlo simulator, i.e., EnKF-HGS, with the help of CLAUDE-0.

Chapter 6

Case study: hydrological real-time data acquisition and management in the clouds

6.1 Introduction

According to George Johnson [91], all science is computer science. Researchers from different disciplines consider computation as the third paradigm of scientific investigation, together with theory and experimentation, because it could significantly contribute to and enable further scientific advancements in various fields. Apart from being a research domain of its own, computer science has a strong connection with diverse research disciplines, e.g., biology, physics, geology, hydrology, and the social and economic sciences. Computer technologies become indispensable when there is a need to model, design, simulate, or/and build any complex system, from a hydrological environmental system to a skyscraper. Due to the high complexity of these systems, there are a number of challenges and difficulties, e.g., real-time data collection and management, integration and orchestration of external heterogeneous services, modeling and deployment of simulations, and retrieval of dynamic simulation predictions, which researchers solve by relying on specialized supporting software and high compute power. These difficulties are very common in the hydrological domain. For instance, hydrological environmental systems are highly heterogeneous in terms of physical characteristics and parameters, and simulating them heavily relies on complex multiscale non-linear processes and matrix operations. Furthermore, to provide good quality and accurate predictions, the computer-aided models of these hydrological systems integrate and process huge amounts of real-time data gathered from geographically distributed sensors. Often, it is also critical to tackle these hydrological computations in a timely manner. Therefore, in collaboration with various universities and institutions, specifically:

1. Institute of Bio- and Geosciences (IBG-3): Agrosphere, Forschungszentrum Jülich GmbH;
2. Centre for High Performance Scientific Computing in Terrestrial Systems (HPSC-TerrSys), Geoverbund ABC/J;
3. Centre for Hydrogeology and Geothermics (CHYN) of University of Neuchâtel;
4. Communication and Distributed Systems (CDS) group of University of Bern;
5. Department of Earth and Environmental Sciences of University of Waterloo;
6. Aquanty Inc.;

we developed a conceptual framework, which helps to solve problems associated with the complexity of dynamic hydrological and hydrogeological systems. This framework consists of two essential parts. The first part is a cheap, reliable, and portable solution for environmental data acquisition from remote locations, i.e., a wireless mesh network (WMN) equipped with sensors. A real-time simulation platform deployed in clouds provides the second cornerstone of our conceptual framework.

6.2 Conceptual framework for cloud-based hydrological modeling and data assimilation

Our conceptual framework for cloud-based hydrological modeling and data assimilation offers two principal functionalities: (i) access to real-time measurement data and (ii) dynamic stochastic simulations, which are continuously improved by using the data assimilation approach. The first functionality comprises the acquisition, transmission, and storage of measurement data in a hydrological database. The second functionality provides stochastic real-time predictions of hydrological variables by the EnKF-HGS simulator presented in Section 3.4 and cloudified in Sections 4.4 and 5.4.

6.2.1 Data acquisition through wireless mesh networks
In Switzerland, 80% of drinking water is taken from groundwater sources close to rivers. Because groundwater abstraction could substantially affect the water balance in river catchments, it is important to consider critical parameters of the river dynamics and the interactions of the river-aquifer system. A perfect example is the pre-alpine Emme river catchment (about 200 km2) in the Emmental valley, where groundwater is abstracted to provide Bern with roughly 45% of its drinking water. In fact, during low flow periods, groundwater abstraction often causes the river to dry up. The stream water levels in the upper Emme are strongly affected by seasonality and are highly sensitive to dry periods. The efficiency and sustainability of the Emme water supply management is directly linked to the amount of water pumped from the aquifer. To optimize the pumping rates in such a dynamic environment, a quantitative simulation of the system is required. Data assimilation can complete the optimization task by continuously incorporating real-time measurement data into the model simulation and correcting potential model biases or inaccuracies. The key components of a data assimilation system are a communication network that provides field observations in real time, a data storage infrastructure, and numerical models that predict, for example, how groundwater abstraction schemes may affect the river baseflow. Based on these simulations, the pumping rates can be regulated in an optimal way, thus providing sustainable water supply management. The key goal of this case study is to build and deploy a cheap, reliable, and portable infrastructure for real-time data acquisition and collection.

6.2.1.1 Architecture and deployment
Climatic or hydrological systems are driven by highly dynamic forcing functions. Quantitative numerical frameworks such as simulation models are powerful tools to understand how these functions control the systems' response. However, models are always imperfect descriptions of reality and, therefore, model calculations increasingly deviate from the real physical conditions of the simulated environmental system. To alleviate these biases, the data assimilation technique can be used to integrate field data into the modeling framework. In this case, constant monitoring of the concerned geographical area is required. The technology should provide high performance even in case of harsh meteorological conditions (snow, low temperatures, fog, strong winds, etc.), other location- and infrastructure-related limitations (high altitude, lack of access to the power grid), and limited accessibility (resulting in long access delays and inducing significant installation/maintenance costs). In order to comply with the imposed requirements, a portable and robust data acquisition system was specified. Figure 6.1 presents its architectural overview, consisting of two components. Its major component is a wireless sensor network for environmental monitoring, while a cloud-based data collection gateway serves as an interface for accessing the collected data.

Figure 6.1: Architecture of the proposed data acquisition system.

6.2.1.2 Wireless mesh network
In principle, there are two types of communication networks: wired and wireless. To weigh the wired communication network option against the wireless one, it is important to take into account the following limitations of a wired infrastructure:

1. The cost of building a vast and complex wired infrastructure is considerably high.
2. The wired infrastructure is not portable.
3. It is technically difficult to place the wired infrastructure in the remote locations.
4. Any harsh meteorological conditions (snow, low temperatures, fog, strong winds, etc.) could cause substantial power problems.

An alternative could be to use one of the wireless communication technologies, e.g., GSM/UMTS, Bluetooth, radio-based, or satellite communication. Each wireless communication technology has its advantages and drawbacks, but all of them are considerably portable. In case of vast measuring networks, the major limitation of the GSM/UMTS connection is associated with the additional charges for every SIM card. Moreover, from an environmental or research perspective, there are important remote locations which have poor or non-existent coverage (e.g., highly elevated regions in the Swiss Alps). In contrast to the GSM/UMTS connection, Bluetooth technology is relatively cheap; however, its coverage is limited to a maximum range of 100 meters. Within this study, satellite communication is not a feasible option due to the high installation cost. All the aforementioned technologies have substantial drawbacks. Considering the recent progress in the domain of low power wireless devices, the most relevant and cheap option for this study is a wireless mesh network (WMN). A wireless mesh network is an alternative communication scheme for last-mile broadband Internet access: it can provide cheap Internet connectivity delivered to end users at the last mile, an easily deployable multi-hop wireless bridge between distant bases in non-direct line of sight scenarios, or a wireless back-haul connecting sensors of different purposes, such as environmental monitoring or smart-home applications. Moreover, it has such advantages as good reliability, scalability, and low upfront investments. However, a wireless mesh network also has drawbacks, such as numerous hardware and software challenges that must be addressed for a proper deployment. One of the challenges is a proper selection of hardware to operate under a specific power consumption regime [22]; e.g., when a node is solar powered, it has to harvest and store enough energy during the day-light operation to work uninterruptedly at night. With an acceptable signal, the node setup provides a high throughput and satisfactory performance of ciphering and packet forwarding. Moreover, other network adapters are able to accommodate the traffic coming from the wireless interfaces. Environmental monitoring pilot projects deployed in remote and mountainous regions [89, 170] and backup backbones installed in urban areas illustrate that wireless mesh networks integrate well into the existing AAA (Authentication, Authorization, Accounting) [21, 144], monitoring, and cloud infrastructure schemes of Swiss universities. For the purpose of this work, the existing Wi-Fi based backhauls are used to transport information from environmental sensors to Internet storage facilities in real time, and portable stations were deployed in the Emme river catchment. All the deployed stations serve as WMN nodes, allowing real-time data communication. The IEEE 802.11 specifications were chosen to serve as the basic link component. For the setup of a wireless mesh network, every node is supposed to act as a relay for multi-hop data forwarding. A robust TCP/IP-based WMN that allows multi-hop communication among all wireless stations was achieved through four major measures:

1. Careful selection of installation procedures;
2. Appropriate choice of hardware components, e.g., wireless cards, directional antennas of high gain, motherboards, batteries, solar panels;
3. Channel allocation schemes;
4. Auto-configuration mechanisms, which include dynamic routing protocols, e.g., Optimized Link State Routing (OLSR) and IEEE 802.11s.¹

¹ In some cases, we provide redundant connectivity to improve the performance of our network, i.e., when a link fails, our dynamic routing mechanisms switch to backup connections.

Moreover, some nodes act as environmental stations, gathering information about system states (i.e., temperature, water level, and pressure) at specific geographical locations through tailored sensors (e.g., thermometers, barometers) attached to the universal serial bus (USB) or serial ports.

6.2.1.3 Hardware platform
From the hardware perspective, every node uses an Alix3d2 motherboard with two on-board mini-PCI slots. The Linux-based ADAM system, which was developed by the Communication and Distributed Systems research group of the University of Bern, serves as the operating system. The key advantages of using the Alix boards are the low price, small size, limited power consumption, and an i386-compatible processor on-board. Despite the Alix3d2 motherboard's modest computational capacity (500 MHz CPU, 256 MB RAM), it is still capable of running a general purpose operating system such as Linux, which is able to perform TCP/IP mesh networking and environmental monitoring. Wistron DNMA-92 IEEE 802.11abgn adapters are installed in the mini-PCI slots, and up to two antennas (including MIMO ones) can be attached. When required, a directional antenna can support long-distance connections, while an omni-directional one provides short-distance communication. The Alix motherboard requires only 15 W during standard operation. Considering such low power requirements, the boards can be solar powered when the electric power grid is not available. To maintain 24-hour uninterruptible operation, the solar powered stations are equipped with batteries, which are charged during the day-light operation to provide energy at night. Due to their small size, Alix boards can be placed in special purpose enclosures, which protect the electronic devices from harmful environmental factors, e.g., precipitation and humidity. When a station requires many (i.e., ≥ 2) Wi-Fi interfaces, several motherboards can be combined in a single enclosure; only boxes of different sizes are required. Furthermore, from the administration perspective, the wireless mesh network is easily expandable thanks to the installed OLSR and IEEE 802.11s. The installation of new nodes requires little attention or technical expertise.

6.2.1.4 Environmental monitoring and data collection
In addition to the hardware platform, data collection software was developed for the purpose of environmental monitoring. Our studies reveal great similarities between environmental monitoring and the system/network monitoring provided by Zabbix, Nagios, or SNMP, which is aimed at tracking the status of system/network components. System/network monitoring is based upon a periodical query of the status of a certain unit. The received time-based information (t, value) is transmitted to a central database and stored as a time-based relation for future use in network administration, management, or planning. Environmental monitoring works the same way, as it also requires periodical information about the environmental system state² (i.e., periodical measurements through sensors). All the received data shall be stored, for example, in a hydrogeological database. In this case, the hydrogeological modeler accesses the database to get the required information and provides hydrogeological forecasting of the system state in the near future. We integrated Zabbix [4] with our mesh network. Zabbix fully implements our requirements. First of all, it is a client-server infrastructure for remote monitoring that uses TCP/IP for the client-server communication.

² “System state” should be understood as in physics, i.e., temperature, pressure, precipitation rate, etc.

As a long-lasting project in the open source community, it serves a large number of users. Moreover, it is extensively tested and well documented software. Considering all these advantages, it serves well to quickly configure all the operations required by hydrogeomonitoring. From the implementation perspective, a Zabbix agent is installed on every node in the network. It responds to monitoring queries initiated by the Zabbix server, which in our case resides on the data collection gateway. Normally, the Zabbix server reads out predefined parameters (e.g., traffic on interfaces). It also supports user-defined commands to monitor user-specific hardware components. To support every currently deployed sensor, drivers are implemented as remote commands. Every node in the network possesses drivers for all deployed sensors. When a sensor is physically installed on a given node, the Zabbix server is configured to collect environmental information from the specific resource, e.g., node or bus (i.e., USB, serial port), through a specific driver (remote command). To control the Zabbix server, there are an advanced back-end web interface and a rich logging system. This allows us to periodically check the environmental sensors and transport the readings to the gateway. For future use, the data are stored in the environmental database.
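For example, a sensor driver can be exposed to the Zabbix agent as a user-defined command via the standard UserParameter mechanism of the agent configuration; the item key and the driver script below are hypothetical, not the ones used in the deployment:

```python
#!/usr/bin/env python3
# Hypothetical sensor driver invoked by the Zabbix agent as a remote command.
# It would be wired into the agent configuration with a line such as:
#   UserParameter=env.water_level,/usr/local/bin/read_water_level.py
# so that the Zabbix server can periodically query the item key "env.water_level".
import random
import sys


def read_water_level():
    # In the real deployment the value would be read from a sensor attached to
    # the USB or serial port; here we return a dummy reading in meters.
    return round(random.uniform(0.0, 2.0), 3)


if __name__ == "__main__":
    # The agent expects the measured value on standard output.
    sys.stdout.write(str(read_water_level()))
```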

6.2.2 Data assimilation with EnKF-HGS
Data assimilation within the cloud-based modelling system is done via EnKF-HGS. Overall, the cloud-based data assimilation platform consists of several components: (i) a user interface, (ii) the EnKF-HGS hydrological simulator, and (iii) an execution environment. The first component provides the user with an interface to orchestrate the cloud-based simulation platform and operate on the input/output data. The second component is represented by the highly sophisticated hydrological simulator EnKF-HGS described in Section 3.4. The third component is defined by one of the aforementioned cloudification methodologies. The methodology should be carefully selected based on the main objective the user aims to accomplish and on the use case.

Cloudification methodology selection

First, we underline the main objectives of the aforementioned methodologies. When the researcher aims to improve the scalability of parameter-based scientific simulations, the big data inspired cloudification methodology described in Section 4.3 is recommended, because its key goal is to achieve virtually unlimited scalability of CPU- and memory-intensive scientific simulators and to gain the benefits provided by the big data paradigm, i.e., performance improvement by imposing data locality through a MapReduce framework. In case the researcher intends to experiment and find the most suitable solution for a specific scientific application while optimizing the utilization of resources, we recommend the service-oriented cloudification methodology described in Section 5.3, because it is directed at optimizing CPU resource utilization while balancing the application execution time against operational expenses. In Table 6.1, we summarize the key characteristics of the methodologies with respect to a number of parameters. These characteristics can serve as benchmarks for the evaluation of a specific use case. In general, the methodologies are mutually exclusive. Considering the research goal and use case, the user is expected to choose one over the other.

Fault-tolerance
– Big data inspired methodology: Big data platforms are fault-tolerant by design. This cloudification methodology is suitable in case a researcher prefers to use a built-in fault-tolerance mechanism.
– Service-oriented methodology: In case a researcher aims to experiment and find the most suitable fault-tolerance technique for the specific scientific application, the service-oriented methodology is more appropriate.

Resource management
– Big data inspired methodology: This methodology relies on Apache Spark as an execution engine with its underlying resource manager. The user has limited control over job and task scheduling, which are fully managed by the resource manager.
– Service-oriented methodology: Job and task management policies can be customized by the user, while resource management can be either automatic or manual.

Modularity
– Big data inspired methodology: The architecture of the cloudified application is restricted to the master-slave execution pattern.
– Service-oriented methodology: The application has no predefined execution pattern. The user is expected to implement one, e.g., client-server-worker.

Development time
– Big data inspired methodology: This methodology is based on a successful industry solution, i.e., Apache Spark. Cloudifying a scientific application with Spark requires reduced development time, without worrying about other users or software compatibility issues.
– Service-oriented methodology: Because this methodology relies on SOA principles, it facilitates development by leveraging existing assets, e.g., already developed components and services.

Integration with external heterogeneous services
– Big data inspired methodology: Because work distribution and task management are automatic in the Spark-based implementation, intermediate invocations of external services might be inefficient.
– Service-oriented methodology: Since the user has control over work distribution and task management policies, invocations of external services are relatively easy to integrate.

Versatility
– Big data inspired methodology: This methodology is MapReduce framework dependent, which might restrict the selection of appropriate programming languages and available libraries.
– Service-oriented methodology: This methodology is platform-independent. It only specifies the required communication channels, which link language-independent components.

Table 6.1: Summary of the key characteristics of two cloudification methodologies.
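To make the selection criteria above concrete, the following is a purely illustrative sketch that encodes the decision logic summarized in Table 6.1; the objective labels and argument names are assumptions introduced only for this example and are not part of either methodology.

```python
# Illustrative decision helper reflecting the criteria of Table 6.1;
# the objective labels are assumptions introduced only for this sketch.

def recommend_methodology(primary_objective: str,
                          needs_builtin_fault_tolerance: bool = False,
                          depends_on_external_services: bool = False) -> str:
    """Return which cloudification methodology Table 6.1 would point to."""
    if primary_objective == "scalability":
        # CPU/memory-intensive parameter sweeps benefit from the big data
        # paradigm (data locality, built-in fault tolerance of the platform).
        if depends_on_external_services:
            # Intermediate calls to external services are inefficient in the
            # Spark-based implementation, so reconsider the service approach.
            return "service-oriented cloudification methodology"
        return "big data inspired cloudification methodology"
    if primary_objective == "resource-utilization":
        # Custom job/task management and balancing execution time against
        # operational expenses favour the service-oriented approach.
        return "service-oriented cloudification methodology"
    if needs_builtin_fault_tolerance:
        return "big data inspired cloudification methodology"
    return "service-oriented cloudification methodology"

# Example: a memory-intensive Monte Carlo sweep with no external dependencies.
print(recommend_methodology("scalability"))
```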

6.3 Summary

In this chapter, we presented the conceptual framework for cloud-based hydrological modeling and data assimilation. This framework helps to solve a number of challenges and

difficulties associated with modeling, designing, simulating and/or building complex dynamic hydrological systems. The framework offers two key functionalities, i.e., real-time data acquisition and cloud-based environmental modeling using the data assimilation approach. Both functionalities of the conceptual framework have been successfully tested and implemented through a wireless mesh network infrastructure and the cloudified EnKF-HGS simulator. Figure 6.2 illustrates the overall architecture of the proposed conceptual framework, all its components, their current and potential interconnections, as well as the possibility to improve the framework's applicability with a feedback mechanism.

[Figure 6.2 shows the following components and connections: the sensor network with sensors and pumps, its management system, and the user; the data collection gateway receiving sensor data; cloud data storage; the data assimilation framework with its data, initialization, and monitoring links; and the feedback connection towards the pumps.]

Figure 6.2: Architecture of the proposed conceptual framework. Dashed lines indicate currently not implemented connections.

First, we designed and deployed a sensing mesh network in the valley of the river Emme. It continuously provided us with field observations over a one-year period. All the data were uninterruptedly transmitted through the Zabbix infrastructure to the data collection gateway. Second, we successfully cloudified the data assimilation platform using the two different methodologies presented in Sections 4.4 and 5.4. Considering the research goal and the specificities of the use case, the researcher can choose the cloud execution environment that suits best. Additionally, assessment and control of the data assimilation platform can be achieved through the user interface, which is easy to use for researchers with different levels of computer skills. Next, we envision implementing a stable connection between the data

collection gateway and the data assimilation platform, so that real-time measurement data can be automatically provided to the EnKF-HGS simulations. When integrated altogether, the framework will provide a researcher with considerable advantages in the dynamic regulation of water supply management systems. To improve the framework's applicability to hydrological systems, we propose to implement a feedback mechanism, which will provide pumps with optimal pumping rates derived from the EnKF-HGS simulations.
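A minimal sketch of the envisioned gateway-to-platform connection is given below, assuming the standard Zabbix JSON-RPC API (user.login, history.get) on the gateway and a hypothetical ingestion endpoint on the data assimilation platform; the URLs, credentials, and item identifiers are placeholders.

```python
# Hypothetical bridge between the data collection gateway (Zabbix) and the
# cloud-based data assimilation platform. URLs, credentials and item IDs are
# placeholders; the JSON-RPC methods used (user.login, history.get) belong to
# the standard Zabbix API.

import time
import requests

ZABBIX_API = "http://gateway.example.org/zabbix/api_jsonrpc.php"  # placeholder
PLATFORM_INGEST = "http://assimilation.example.org/observations"  # placeholder
ITEM_IDS = ["23296", "23297"]  # placeholder Zabbix item IDs (sensor readings)

def zabbix_call(method, params, auth=None):
    """Issue one JSON-RPC request to the Zabbix server and return its result."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params,
               "id": 1, "auth": auth}
    resp = requests.post(ZABBIX_API, json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()["result"]

# Authenticate and fetch the last hour of numeric (float) history.
token = zabbix_call("user.login", {"user": "api_user", "password": "secret"})
history = zabbix_call("history.get", {
    "itemids": ITEM_IDS,
    "history": 0,                       # 0 = numeric float values
    "time_from": int(time.time()) - 3600,
    "sortfield": "clock",
    "output": "extend",
}, auth=token)

# Forward (t, value) pairs to the data assimilation platform.
observations = [{"item": h["itemid"], "t": int(h["clock"]), "value": float(h["value"])}
                for h in history]
requests.post(PLATFORM_INGEST, json=observations, timeout=10).raise_for_status()
```

In such a setup the bridge would run periodically on the gateway, so that every new sensor reading becomes available to the EnKF-HGS simulations without manual intervention.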


Chapter 7 Conclusion

7.1 Summary of contributions

In this thesis, we identified the problem of migrating cluster-based scientific simulators to clouds when these applications have the following specific characteristics:

1. They implement the iterative Monte Carlo method.
2. They continuously require higher computing power.
3. They were built without incorporating new technologies – i.e., clouds, big data, service-oriented technologies.
4. Their adaptation to new technologies is substantially impeded by dependencies on external proprietary services.

To tackle this problem, we elaborated two cloudification solutions, tested them by using the hydrological simulator, and integrated our solutions into one conceptual framework.

Contribution 1 - enhancing the “big data inspired cloudification methodology”

The original cloudification methodology proposed by Caíno-Lores et al. [33] showed outstanding results in migrating scientific simulators to big data infrastructures. Hence, we proposed to enhance it by extending it to a wider range of applications, i.e., scientific simulators with the aforementioned characteristics that would not be considered suitable under the initial cloudification procedure.

Contribution 2 - service-oriented cloudification methodology and computing-oriented cloud services

Considering the dependency of the EnKF-HGS simulator on two proprietary simulation kernels, we proposed a cloudification methodology that does not require substantial modifications of the original code of the scientific simulator. To facilitate the migration of the EnKF-HGS simulator to clouds, we developed a platform for building cloud-based computing-oriented services (CLAUDEs). The platform allows the researcher to quickly encapsulate parts of the simulation logic into self-contained services and to concentrate only on the required aspects of the cloud-based execution, e.g., job scheduling and CPU resource utilization.
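The encapsulation idea can be illustrated with the minimal, hypothetical worker sketch below; it is not the CLAUDE implementation. The kernel executable name, the job layout, and the local job file are assumptions made only for this illustration.

```python
# Minimal, hypothetical sketch of encapsulating a proprietary simulation kernel
# as a self-contained computing service; this is not the CLAUDE implementation.
# The executable name "hgs" and the job description layout are assumptions.

import json
import subprocess
import tempfile
from pathlib import Path

def run_kernel_job(job: dict) -> dict:
    """Execute one simulation-kernel run in an isolated working directory."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # Materialize the input files shipped with the job description.
        for name, content in job["inputs"].items():
            (workdir / name).write_text(content)
        # Invoke the (proprietary) kernel binary; the service only needs the
        # binary to be installed, not its source code.
        result = subprocess.run(["hgs"], cwd=workdir,
                                capture_output=True, text=True)
        return {
            "job_id": job["job_id"],
            "return_code": result.returncode,
            "log_tail": result.stdout[-2000:],
            # Collect whatever output files the kernel produced.
            "outputs": {p.name: p.read_text() for p in workdir.glob("*.out")},
        }

if __name__ == "__main__":
    # In a real service the job would arrive over a communication channel
    # (e.g., a message queue); here it is read from a local file for brevity.
    job = json.loads(Path("job.json").read_text())
    Path("result.json").write_text(json.dumps(run_kernel_job(job)))
```

Wrapping the kernel this way keeps the proprietary code untouched while leaving job scheduling and resource utilization policies entirely under the researcher's control.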

Contribution 3 - practical application of the cloudification methodologies and a conceptual framework for hydrological data acquisition and management

We applied the aforementioned cloudification solutions to the EnKF-HGS simulator. Based on the research objective and the specific use case, the researcher is expected to choose one methodology over the other. We then developed the conceptual framework consisting of two functional parts – i.e., the real-time data acquisition infrastructure and the cloudified EnKF-HGS simulator.

7.2 Future directions

This thesis opens several interesting research perspectives, which can further enhance the cloudification methodologies presented in Chapters 4 and 5 and the conceptual framework introduced in Chapter 6.

Big data inspired cloudification methodology

From the big data paradigm perspective, we conclude that in order to gain big data benefits, significant modifications to the original scientific simulator code may be required. Hence, we propose to work towards a hybrid approach directed at combining the benefits of slim MPI processes with small memory overhead and the encapsulated data parallelism of the big data paradigm. From the infrastructure perspective, we identified the memory limitation of the Spark-based implementation when it requires a significant amount of memory for processing, caching, and exploiting in-memory solutions for enhanced performance. Hence, we propose to test the other MapReduce implementations presented in Subsection 4.2.1, analyze the results, and identify the reason behind the heavy memory consumption.

Service-oriented cloudification methodology

The key goal of the service-oriented cloudification methodology is to optimize CPU resource utilization. Because we have not yet evaluated all the potential trade-offs of the cloudification methodology, we propose to build a cost-efficiency model directed at balancing operational expenses and application performance. The cloudification methodology substantially relies on the platform for building cloud-based computing-oriented services (CLAUDEs). We identify the necessity for considerable stress testing of the CLAUDE-building platform. For the next version of CLAUDE-0, we plan to implement actual data persistence and component fault-tolerance mechanisms. Also, the potential benefit of replacing CLAUDE-0 components with other existing technologies and protocols shall be evaluated. Additionally, an alternative data distribution scheme shall be designed in order to reduce the severe data transmission overhead. Considering the methodology's goal of optimizing CPU resource utilization, we foresee considerable benefits in developing a novel mechanism based on recent advances in lightweight containerization technologies, i.e., Docker. This mechanism could substantially improve the distribution of computational tasks and job scheduling by means of live container migration. The mechanism would tentatively work by combining the lightweight containerization currently available in Docker with the methods provided by traditional VM-based virtualization. In this context, we propose to investigate these additional points more thoroughly.
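As a first, purely hypothetical sketch of the cost-efficiency model proposed above, one could weigh an estimated runtime against operational expenses for each candidate cluster size. The hourly price, baseline runtime, and ideal-speedup assumption below are illustrative placeholders, not measured values.

```python
# Hypothetical first cut of the proposed cost-efficiency model: pick the number
# of cloud instances that minimises a weighted combination of runtime and cost.
# All numeric parameters are illustrative placeholders.

def estimate(n_instances: int,
             baseline_hours: float = 48.0,   # runtime on one instance (assumed)
             price_per_hour: float = 0.40,   # price per instance-hour (assumed)
             parallel_fraction: float = 0.95):
    """Return (runtime_hours, cost) for a given cluster size (Amdahl-style)."""
    speedup = 1.0 / ((1 - parallel_fraction) + parallel_fraction / n_instances)
    runtime = baseline_hours / speedup
    cost = runtime * n_instances * price_per_hour
    return runtime, cost

def best_cluster_size(max_instances: int = 64, time_weight: float = 0.5):
    """Trade runtime against expenses; time_weight in [0, 1] expresses priority."""
    base_runtime, base_cost = estimate(1)
    candidates = []
    for n in range(1, max_instances + 1):
        runtime, cost = estimate(n)
        # Normalise both terms by the single-instance baseline.
        score = (time_weight * runtime / base_runtime
                 + (1 - time_weight) * cost / base_cost)
        candidates.append((score, n, runtime, cost))
    return min(candidates)

score, n, runtime, cost = best_cluster_size()
print(f"{n} instances -> {runtime:.1f} h, {cost:.2f} units (score {score:.3f})")
```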

Conceptual framework for cloud-based hydrological modeling and data assimilation

To improve the framework's applicability to hydrological systems, we propose to implement a stable connection between the data collection gateway and the data assimilation platform. Such a connection will automatically provide the platform with the real-time measurement data required for simulations. Another possible improvement is to implement a feedback mechanism, which will enable the integration of the cloud-based data assimilation platform with existing water supply management systems.


Appendix A List of publications

1. Caíno-Lores, S.; Lapin, A.; Kropf, P.; Carretero, J., “Applying Big Data Paradigms to a Large Scale Scientific Workflow: Lessons Learned and Future Directions”, Future Generation Computer Systems, submitted in December 2016

2. Kurtz, W.; Lapin, A.; Schilling, O.; Tang, Q.; Schiller, E.; Braun, T.; Hunkeler, D.; Vereecken, H.; Sudicky, E.; Kropf, P.; Hendricks Franssen, H.; Brunner, P., “Integrating hydrological modelling, data assimilation and cloud computing for real-time management of water resources”, Environmental Modelling & Software, Volume 93, Pages 418-435, ISSN 1364-8152, http://dx.doi.org/10.1016/j.envsoft.2017.03.011, 2017

3. Caíno-Lores, S.; Lapin, A.; Kropf, P.; Carretero, J., “Methodological Approach to Data-Centric Cloudification of Scientific Iterative Workflows”, 16th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP), Pages 469-482, DOI 10.1007/978-3-319-49583-5_36, Granada, Spain, December, 2016

4. Caíno-Lores, S.; Lapin, A.; Kropf, P.; Carretero, J., “Lessons Learned from Applying Big Data Paradigms to a Large Scale Scientific Workflow”, 11th Workshop on Workflows in Support of Large-Scale Science (WORKS16), Salt Lake City, Utah, USA, November, 2016

5. Caíno-Lores, S.; Lapin, A.; Kropf, P.; Carretero, J., “Cloudification of a Legacy Hydrological Simulator using Apache Spark”, XXVII Jornadas de Paralelismo (JP2016), Salamanca, Spain, September, 2016

6. Lapin, A.; Schiller, E.; Kropf, P., “CLAUDE: Cloud-computing for non-interactive long-running computationally intensive scientific applications”, Proceedings of the 8th GI Conference in Autonomous Systems (AutoSys), Pages 221-232, ISBN 978-3-18-384210-0, 2015

7. Lapin, A.; Schiller, E.; Kropf, P., “Integrated Cloud-based Computational Services”, Proceedings of the 7th GI Workshop in Autonomous Systems (AutoSys), Pages 280-292, ISBN 978-3-18-383510-2, 2014

8. Lapin, A.; Schiller, E.; Kropf, P.; Schilling, O.; Brunner, P.; Kapic, A.J.; Braun, T.; Maffioletti, S., “Real-Time Environmental Monitoring for Cloud-based Hydrogeological Modeling with HydroGeoSphere”, Proceedings of the High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC, CSS, ICESS), Pages 959-965, DOI 10.1109/HPCC.2014.154, August, 2014

9. Schiller, E.; Kropf, P.; Lapin, A.; Brunner, P.; Schilling, O.; Hunkeler, D., “Wireless Mesh Networks and Cloud Computing for Real Time Environmental Simulations”, 10th International Conference on Computing and Information Technology (IC2IT2014), Pages 1-11, DOI 10.1007/978-3-319-06538-0_1, 2014

References

[1] Riak CS. URL http://basho.com/. Accessed on September 2, 2017.

[2] Worldwide LHC Computing Grid. URL http://wlcg.web.cern.ch/. Accessed on September 2, 2017.

[3] TOP500 The List, 1993 - 2017. URL https://www.top500.org/. Accessed on September 2, 2017.

[4] Zabbix, The Enterprise-class Monitoring Platform, 2001-2017. URL http://www.zabbix.com. Accessed on September 2, 2017.

[5] IEEE Standard for Information Technology - Portable Operating System Interface (POSIX(R)). IEEE Std 1003.1, 2004 Edition The Open Group Technical Standard. Base Specifications, Issue 6. Includes IEEE Std 1003.1-2001, IEEE Std 1003.1-2001/Cor 1-2002 and IEEE Std 1003.1-2001/Cor 2-2004. Shell, pages 1–3874, Dec 2008. doi: 10.1109/IEEESTD.2008.7394902.

[6] SLURM Workload Manager. Overview, March 2013. URL https://slurm.schedmd.com/overview.html. Accessed on September 2, 2017.

[7] Apache Hadoop, 2014. URL http://hadoop.apache.org/. Accessed on September 2, 2017.

[8] A brief history of cloud computing, March 2014. URL https://www.ibm.com/blogs/cloud-computing/2014/03/a-brief-history-of-cloud-computing-3/. Accessed on September 2, 2017.

[9] SLURM Workload Manager. Customer Testimonials, April 2015. URL https://slurm.schedmd.com/testimonials.html. Accessed on September 2, 2017.

[10] IDC’s Latest Forecast for the HPC Market: “2016 is Looking Good”, November 2016. URL https://www.top500.org/news/idcs-latest-forecast-for-the-hpc-market-2016-is-looking-good/. Accessed on September 2, 2017.

[11] Amazon EC2 Instance Types, 2017. URL https://aws.amazon.com/ec2/instance-types/. Accessed on September 2, 2017.

[12] GlusterFS, 2017. URL https://www.gluster.org/. Accessed on September 2, 2017.

[13] Computing with HTCondor, August 2017. URL https://research.cs.wisc.edu/htcondor/. Accessed on September 2, 2017.

[14] Daniel J Abadi. Data management in the cloud: Limitations and opportunities. IEEE Data Eng. Bull., 32(1):3–12, 2009.

[15] Wesal Al Belushi and Youcef Baghdadi. An approach to wrap legacy applications into web services. In Service Systems and Service Management, 2007 International Conference on, pages 1–6. IEEE, 2007.

[16] Klaithem Al Nuaimi, Nader Mohamed, Mariam Al Nuaimi, and Jameela Al-Jaroodi. A survey of load balancing in cloud computing: Challenges and algorithms. In Network Cloud Computing and Applications (NCCA), 2012 Second Symposium on, pages 137–142. IEEE, 2012.

[17] Christopher Alexander, Sara Ishikawa, Murray Silverstein, Joaquim Romaguera i Ramió, Max Jacobson, and Ingrid Fiksdahl-King. A pattern language. Gustavo Gili, 1977.

[18] Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludascher, and Steve Mock. Kepler: an extensible system for design and execution of scientific workflows. In Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on, pages 423–424. IEEE, 2004.

[19] Dustin Amrhein and Scott Quint. Cloud computing for the enterprise: Part 1: Capturing the cloud. DeveloperWorks, IBM, 8, 2009.

[20] Rajagopal Ananthanarayanan, Karan Gupta, Prashant Pandey, Himabindu Pucha, Prasenjit Sarkar, Mansi Shah, and Renu Tewari. Cloud analytics: Do we really need to reinvent the storage stack? In HotCloud, 2009.

[21] M. Anwander, T. Braun, A. Jamakovic, and T. Staub. Authentication and authorisation mechanisms in support of secure access to wmn resources. In World of Wireless, Mobile and Multimedia Networks (WoWMoM), 2012 IEEE International Symposium on a, pages 1–6, June 2012. doi: 10.1109/WoWMoM.2012.6263776.

[22] G.H. Badawy, A.A. Sayegh, and T.D. Todd. Solar powered wlan mesh network provisioning for temporary deployments. In Wireless Communications and Networking Conference, 2008. WCNC 2008. IEEE, pages 2271–2276, March 2008. doi: 10.1109/WCNC.2008.401.

[23] Mark Baker and Rajkumar Buyya. Cluster computing at a glance. High Performance Cluster Computing, 1:3–47, 1999.

[24] Kapil Bakshi. Cisco cloud computing-data center strategy, architecture, and solutions. CISCO White Paper. Retrieved October, 13:2010, 2009.

[25] G. Bauser, Harrie-Jan Hendricks Franssen, Stauffer Fritz, Hans-Peter Kaiser, U. Kuhlmann, and W. Kinzelbach. A comparison study of two different control criteria for the real-time management of urban groundwater works. Journal of Environmental Management, 105:21 – 29, 2012. ISSN 0301-4797. doi: http://dx.doi.org/10.1016/j.jenvman.2011.12.024. URL http://www.sciencedirect.com/science/article/pii/S0301479711004774.

[26] G Bruce Berriman, Ewa Deelman, John C Good, Joseph C Jacob, Daniel S Katz, Carl Kesselman, Anastasia C Laity, Thomas A Prince, Gurmeet Singh, and Mei-Hu Su. Montage: a grid-enabled engine for delivering custom science-grade mosaics on demand. In SPIE Astronomical Telescopes+ Instrumentation, pages 221–232. International Society for Optics and Photonics, 2004.

[27] G. Bruce Berriman, Ewa Deelman, Gideon Juve, Mats Rynge, and Jens-S. Vöckler. The application of cloud computing to scientific workflows: a study of cost and performance. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 371(1983), 2012. ISSN 1364-503X. doi: 10.1098/rsta.2012.0066.

[28] Georgiy V Bobashev, D Michael Goedecke, Feng Yu, and Joshua M Epstein. A hybrid epidemic model: combining the advantages of agent-based and equation-based approaches. In Proceedings of the 39th conference on Winter simulation: 40 years! The best is yet to come, pages 1532–1537. IEEE Press, 2007.

[29] Philip Brunner and Craig T. Simmons. Hydrogeosphere: A fully integrated, physically based hydrological model. Ground Water, 50(2):170–176, 2012. ISSN 1745-6584. doi: 10.1111/j.1745-6584.2011.00882.x. URL http://dx.doi.org/10.1111/j.1745-6584.2011.00882.x.

[30] Gerrit Burgers, Peter-Jan van Leeuwen, and Geir Evensen. Analysis scheme in the ensemble Kalman filter. Monthly Weather Review, 126(6):1719–1724, 1998.

[31] Marc Bux, Jörgen Brandt, Carsten Lipka, Kamal Hakimzadeh, Jim Dowling, and Ulf Leser. Saasfee: Scalable scientific workflow execution engine. Proc. VLDB Endow., 8 (12):1892–1895, August 2015. ISSN 2150-8097. doi: 10.14778/2824032.2824094.

[32] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic. Cloud computing and emerging it platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation computer systems, 25(6): 599–616, 2009.

[33] Silvina Caíno-Lores, Alberto García Fernández, Félix García-Carballeira, and Jesús Carretero. A cloudification methodology for multidimensional analysis: Implementation and application to a railway power simulator. Simulation Modelling Practice and Theory, 55:46–62, 2015. ISSN 1569-190X. doi: http://dx.doi.org/10.1016/j.simpat.2015.04.002.

[34] Scott Callaghan, Philip Maechling, Patrick Small, Kevin Milner, Gideon Juve, Thomas H Jordan, Ewa Deelman, Gaurang Mehta, Karan Vahi, Dan Gunter, Keith Beattie, and Christopher Brooks. Metrics for heterogeneous scientific workflows: A case study of an earthquake science application. Int. J. High Perform. Comput. Appl., 25(3):274–285, August 2011. ISSN 1094-3420. doi: 10.1177/1094342011414743.

[35] Jesus Carretero, Silvina Caino, Felix Garcia-Carballeira, and Alberto Garcia. A multi-objective simulator for optimal power dimensioning on electric railways using cloud computing. In Proceedings of the 5th International Conference on Simulation and Modeling Methodologies, Technologies and Applications, SIMULTECH 2015, pages 428–438, Portugal, 2015. SCITEPRESS - Science and Technology Publications, Lda. ISBN 978-989-758-120-5. doi: 10.5220/0005573404280438. URL http://dx.doi.org/10.5220/0005573404280438.

[36] D Ceperley, GV Chester, and MH Kalos. Monte carlo simulation of a many-fermion study. Physical Review B, 16(7):3081, 1977.

[37] Kishore Channabasavaiah, Kerrie Holley, and Edward Tuggle. Migrating to a service-oriented architecture. IBM DeveloperWorks, 16:727–728, 2003.

[38] Yanpei Chen, Vern Paxson, and Randy H Katz. What’s new about cloud computing security. University of California, Berkeley Report No. UCB/EECS-2010-5 January, 20(2010):2010–5, 2010.

[39] Gen-Tao Chiang, Martin T Dove, C Isabella Bovolo, and John Ewen. Implementing a grid/cloud escience infrastructure for hydrological sciences. In Guide to e-Science, pages 3–28. Springer, 2011.

[40] Paolo Costa, Matteo Migliavacca, Peter Pietzuch, and Alexander L Wolf. Naas: Network-as-a-service in the cloud. In Hot-ICE, 2012.

[41] Carlo Curino, Evan PC Jones, Raluca Ada Popa, Nirmesh Malviya, Eugene Wu, Sam Madden, Hari Balakrishnan, and Nickolai Zeldovich. Relational cloud: A database-as-a-service for the cloud. 2011.

[42] Leonardo Dagum and Ramesh Menon. OpenMP: an industry standard API for shared-memory programming. Computational Science & Engineering, IEEE, 5(1):46–55, 1998.

[43] D. de Oliveira, E. Ogasawara, F. Baião, and M. Mattoso. Scicumulus: A lightweight cloud middleware to explore many task computing paradigm in scientific workflows. In 2010 IEEE 3rd International Conference on Cloud Computing, pages 378–385, July 2010. doi: 10.1109/CLOUD.2010.64.

[44] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation - Volume 6, OSDI’04, pages 10–10, Berkeley, CA, USA, 2004. USENIX Association. URL http://dl.acm.org/citation.cfm?id=1251254.1251264.

[45] Elif Dede, Madhusudhan Govindaraju, Daniel Gunter, and Lavanya Ramakrishnan. Riding the elephant: Managing ensembles with hadoop. In Proceedings of the 2011 ACM International Workshop on Many Task Computing on Grids and Supercomputers, MTAGS ’11, pages 49–58, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-1145-8. doi: 10.1145/2132876.2132888.

[46] Elif Dede, Zacharia Fadika, Jessica Hartog, Madhusudhan Govindaraju, Lavanya Ramakrishnan, Dan Gunter, and R Canon. Marissa: Mapreduce implementation for streaming science applications. In E-Science (e-Science), 2012 IEEE 8th International Conference on, pages 1–8. IEEE, 2012.

[47] Ewa Deelman, Gurmeet Singh, Mei-Hui Su, James Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Karan Vahi, G Bruce Berriman, John Good, et al. Pegasus: A framework for mapping complex scientific workflows onto distributed systems. Scientific Programming, 13(3):219–237, 2005.

[48] Ewa Deelman, Gurmeet Singh, Miron Livny, Bruce Berriman, and John Good. The cost of doing science on the cloud: The montage example. In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC ’08, pages 50:1–50:12, Piscataway, NJ, USA, 2008. IEEE Press. ISBN 978-1-4244-2835-9.

[49] Peter J Denning and Ted G Lewis. Exponential laws of computing growth. Communications of the ACM, 60(1):54–65, 2016.

[50] Parviz Deyhim. Best practices for amazon emr. Technical report, 2013.

[51] Javier Diaz, Camelia Munoz-Caro, and Alfonso Nino. A survey of parallel programming models and tools in the multi and many-core era. IEEE Transactions on parallel and distributed systems, 23(8):1369–1386, 2012.

[52] Tharam Dillon, Chen Wu, and Elizabeth Chang. Cloud computing: issues and chal- lenges. In Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on, pages 27–33. Ieee, 2010.

[53] Jack J Dongarra, Piotr Luszczek, and Antoine Petitet. The linpack benchmark: past, present and future. Concurrency and Computation: practice and experience, 15(9): 803–820, 2003.

[54] Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, and Geoffrey Fox. Twister: a runtime for iterative mapreduce. In Proceedings of the 19th ACM international symposium on high performance distributed computing, pages 810–818. ACM, 2010.

[55] Mark Endrei, Jenny Ang, Ali Arsanjani, Sook Chua, Philippe Comte, Pål Krogdahl, Min Luo, and Tony Newling. Patterns: service-oriented architecture and web services. IBM Corporation, International Technical Support Organization, 2004.

[56] R. Etemadpour, M. Bomhoff, E. Lyons, P. Murray, and A. Forbes. Designing and evaluating scientific workflows for big data interactions. In Big Data Visual Analytics (BDVA), 2015, pages 1–8, Sept 2015. doi: 10.1109/BDVA.2015.7314290.

[57] Constantinos Evangelinos and C Hill. Cloud computing for parallel scientific hpc applications: Feasibility of running coupled atmosphere-ocean climate models on amazon’s ec2. ratio, 2(2.40):2–34, 2008.

[58] Geir Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model using monte carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans, 99(C5):10143–10162, 1994. ISSN 2156-2202.

[59] Geir Evensen. The ensemble kalman filter: Theoretical formulation and practical implementation. Ocean dynamics, 53(4):343–367, 2003.

[60] Zacharia Fadika, Elif Dede, Madhusudhan Govindaraju, and Lavanya Ramakrishnan. Benchmarking mapreduce implementations for application usage scenarios. In Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing, pages 90–97. IEEE Computer Society, 2011.

[61] Zacharia Fadika, Elif Dede, Madhusudhan Govindaraju, and Lavanya Ramakrishnan. Mariane: Mapreduce implementation adapted for hpc environments. In Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing, pages 82–89. IEEE Computer Society, 2011.

[62] Kaniz Fatema, Vincent C Emeakaroha, Philip D Healy, John P Morrison, and Theo Lynn. A survey of cloud monitoring tools: Taxonomy, capabilities and objectives. Journal of Parallel and Distributed Computing, 74(10):2918–2933, 2014.

[63] M. Feldman. IDC’s Latest Forecast for the HPC Market: “2016 is looking good”, November 2016. URL https://www.top500.org/news/idcs-latest-forecast-for-the-hpc-market-2016-is-looking-good/. Accessed on September 1, 2017.

[64] Michael J Flynn. Some computer organizations and their effectiveness. IEEE transac- tions on computers, 100(9):948–960, 1972.

[65] Ian Foster and Carl Kesselman, editors. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1999. ISBN 1-55860-475-8.

[66] Ian Foster and Carl Kesselman. What is the grid. A three point checklist, 20, 2003.

[67] Ian Foster, Yong Zhao, Ioan Raicu, and Shiyong Lu. Cloud computing and grid computing 360-degree compared. In Grid Computing Environments Workshop, 2008. GCE’08, pages 1–10. Ieee, 2008.

[68] Armando Fox, Rean Griffith, Anthony Joseph, Randy Katz, Andrew Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, and Ion Stoica. Above the clouds: A berkeley view of cloud computing. Dept. Electrical Eng. and Comput. Sciences, University of California, Berkeley, Rep. UCB/EECS, 28(13):2009, 2009.

[69] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design patterns: elements of reusable object-oriented software. Pearson Education India, 1995.

[70] Dimitrios Georgakopoulos, Michael Papazoglou, et al. Service-oriented computing. Number Sirsi) i9780262072960. 2009.

[71] Chaima Ghribi, Makhlouf Hadji, and Djamal Zeghlache. Energy efficient vm scheduling for cloud data centers: Exact allocation and migration algorithms. In Cluster, Cloud and Grid Computing (CCGrid), 2013 13th IEEE/ACM International Symposium on, pages 671–678. IEEE, 2013.

[72] Yolanda Gil, Ewa Deelman, Mark Ellisman, Thomas Fahringer, Geoffrey Fox, Dennis Gannon, Carole Goble, Miron Livny, Luc Moreau, and Jim Myers. Examining the challenges of scientific workflows. Computer, 40(12), 2007.

[73] Ionel Gog, Jana Giceva, Malte Schwarzkopf, Kapil Vaswani, Dimitrios Vytiniotis, Ganesan Ramalingan, Manuel Costa, Derek Murray, Steven Hand, and Michael Isard. Broom: Sweeping out garbage collection from big data systems. Young, 4:8, 2015.

[74] Nelson Goodman. Languages of art: An approach to a theory of symbols. Hackett publishing, 1968.

[75] Katharina Görlach, Mirko Sonntag, Dimka Karastoyanova, Frank Leymann, and Michael Reiter. Conventional workflow technology for scientific simulation. In Guide to e-Science, pages 323–352. Springer, 2011.

[76] Thilina Gunarathne, Tak-Lon Wu, Judy Qiu, and Geoffrey Fox. Mapreduce in the clouds for science. In Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on, pages 565–572. IEEE, 2010.

[77] Song Guo, Fan Bai, and Xiaolin Hu. Simulation software as a service and service-oriented simulation experiment. In Information Reuse and Integration (IRI), 2011 IEEE International Conference on, pages 113–116. IEEE, 2011.

[78] A. Gupta and D. Milojicic. Evaluation of hpc applications on cloud. In 2011 Sixth Open Cirrus Summit, pages 22–26, Oct 2011. doi: 10.1109/OCS.2011.10.

[79] C. Hoffa, G. Mehta, T. Freeman, E. Deelman, K. Keahey, B. Berriman, and J. Good. On the use of cloud computing for scientific workflows. In eScience, 2008. eScience ’08. IEEE Fourth International Conference on, pages 640–645, Dec 2008.

[80] Michael Hogan, Fang Liu, Annie Sokol, and Jin Tong. Nist cloud computing standards roadmap. NIST Special Publication, 35, 2011.

[81] Mark F Horstemeyer. Multiscale modeling: a review. In Practical aspects of computational chemistry, pages 87–135. Springer, 2009.

[82] Michael N Huhns and Munindar P Singh. Service-oriented computing: Key concepts and principles. IEEE Internet computing, 9(1):75–81, 2005.

[83] Cisco Global Cloud Index. Forecast and methodology, 2015-2020 white paper, 2016.

[84] Alexandru Iosup and Dick Epema. Grid computing workloads. IEEE Internet Computing, 15(2):19–26, 2011.

[85] CSA ISACA. Cloud computing market maturity study results. ISACA, the Cloud Security Alliance, 2012.

[86] ISO/IEC 17788. Information technology – cloud computing – overview and vocabulary. International standard, International Organization for Standardization, Geneva, Switzerland, 2014.

[87] Keith R Jackson, Lavanya Ramakrishnan, Krishna Muriki, Shane Canon, Shreyas Cholia, John Shalf, Harvey J Wasserman, and Nicholas J Wright. Performance analysis of high performance computing applications on the amazon web services cloud. In Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on, pages 159–168. IEEE, 2010.

[88] Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, and Kristin A. Persson. Commentary: The materials project: A materials genome approach to accelerating materials innovation. APL Mater., 1(1):011002, 2013. doi: http://dx.doi.org/10.1063/1.4812323.

[89] A. Jamakovic, D.C. Dimitrova, M. Anwander, T. Macicas, T. Braun, J. Schwanbeck, T. Staub, and B. Nyffenegger. Real-world energy measurements of a wireless mesh network. In Jean-Marc Pierson, Georges Da Costa, and Lars Dittmann, editors, Energy Efficiency in Large Scale Distributed Systems, Lecture Notes in Computer Science, pages 218–232. Springer Berlin Heidelberg, 2013. ISBN 978-3-642-40516-7. doi: 10.1007/978-3-642-40517-4_18. URL http://dx.doi.org/10.1007/978-3-642-40517-4_18.

[90] Dean Jeffrey and Ghemawat Sanjay. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.

[91] G. Johnson. The World: In Silica Fertilization; All Science Is Computer Science, March 2001. URL http://www.nytimes.com/2001/03/25/weekinreview/the-world-in-silica-fertilization-all-science-is-computer-science.html. Accessed on September 3, 2017.

[92] Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, Bruce Berriman, Benjamin P Berman, and Phil Maechling. Scientific workflow applications on amazon ec2. In E-Science Workshops, 2009 5th IEEE International Conference on, pages 59–66. IEEE, 2009.

[93] Mikko Ilmari Jyrkama. A methodology for estimating groundwater recharge, volume 65. 2004.

[94] Sol Ji Kang, Sang Yeon Lee, and Keon Myung Lee. Performance comparison of openmp, mpi, and mapreduce in practical problems. Advances in Multimedia, 2015:7, 2015.

[95] James M Kaplan, William Forrest, and Noah Kindler. Revolutionizing data center energy efficiency. Technical report, McKinsey & Company, 2008.

[96] Evelyn Fox Keller. Models, simulation, and “computer experiments”. na, 2003.

[97] Ken Kennedy, Charles Koelbel, and Hans Zima. The rise and fall of high performance fortran. Communications of the ACM, 54(11):74–82, 2011.

[98] Jeffrey O Kephart and David M Chess. The vision of autonomic computing. IEEE Computer, 36:41–50, 2003.

[99] Jik-Soo Kim, Sangwan Kim, Seokkyoo Kim, Seoyoung Kim, Seungwoo Rho, Ok-Hwan Byeon, and Soonwook Hwang. Towards a next generation distributed middleware system for many-task computing. International Journal of Software Engineering and Its Applications, 7(4):379–388, 2013.

[100] Leonard Kleinrock. An internet vision: the invisible global infrastructure. Ad Hoc Networks, 1(1):3–11, 2003.

[101] Wolfgang Kurtz, Harrie-Jan Hendricks Franssen, Hans-Peter Kaiser, and Harry Vereecken. Joint assimilation of piezometric heads and groundwater temperatures for improved modeling of river-aquifer interactions. Water Resources Research, 50(2):1665–1688, 2014. ISSN 1944-7973. doi: 10.1002/2013WR014823. URL http://dx.doi.org/10.1002/2013WR014823.

[102] Manny M Lehman. Laws of software evolution revisited. In European Workshop on Software Process Technology, pages 108–124. Springer, 1996.

[103] Zhenqin Li and Harold A Scheraga. Monte carlo-minimization approach to the multiple- minima problem in protein folding. Proceedings of the National Academy of Sciences, 84(19):6611–6615, 1987.

[104] Harold C Lim, Shivnath Babu, Jeffrey S Chase, and Sujay S Parekh. Automated control in cloud computing: challenges and opportunities. In Proceedings of the 1st workshop on Automated control for datacenters and clouds, pages 13–18. ACM, 2009.

[105] Cui Lin, Shiyong Lu, Zhaoqiang Lai, Artem Chebotko, Xubo Fei, Jing Hua, and Farshad Fotouhi. Service-oriented architecture for view: a visual scientific workflow management system. In Services Computing, 2008. SCC’08. IEEE International Conference on, volume 1, pages 335–342. IEEE, 2008.

[106] G. Lin, B. Han, J. Yin, and I. Gorton. Exploring cloud computing for large-scale scientific applications. In 2013 IEEE Ninth World Congress on Services, pages 37–43, June 2013. doi: 10.1109/SERVICES.2013.13.

[107] Huan Liu and Dan Orban. Cloud mapreduce: A mapreduce implementation on top of a cloud operating system. In Proceedings of the 2011 11th IEEE/ACM international symposium on cluster, cloud and grid computing, pages 464–474. IEEE Computer Society, 2011.

[108] Ji Liu, Esther Pacitti, Patrick Valduriez, and Marta Mattoso. A survey of data-intensive scientific workflow management. Journal of Grid Computing, 13(4):457–493, 2015. ISSN 1572-9184. doi: 10.1007/s10723-015-9329-8.

[109] Andrew C Lorenc. The potential of the ensemble kalman filter for nwp – a comparison with 4d-var. Quarterly Journal of the Royal Meteorological Society, 129(595):3183–3203, 2003.

[110] Sifei Lu, Reuben Mingguang Li, William Chandra Tjhi, Kee Khoon Lee, Long Wang, Xiarong Li, and Di Ma. A framework for cloud-based large-scale data analytics and visualization: Case study on multiscale climate data. In Cloud Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference on, pages 618–622. IEEE, 2011.

[111] Zaigham Mahmood. Service oriented architecture: potential benefits and challenges. In Proceedings of the 11th WSEAS International Conference on COMPUTERS, pages 497–501. World Scientific and Engineering Academy and Society (WSEAS), 2007.

[112] Zaigham Mahmood. Cloud computing: Characteristics and deployment approaches. In Computer and Information Technology (CIT), 2011 IEEE 11th International Conference on, pages 121–126. IEEE, 2011.

[113] Carlos Maltzahn, Esteban Molina-Estolano, Amandeep Khurana, Alex J Nelson, Scott A Brandt, and Sage Weil. Ceph as a scalable alternative to the hadoop distributed file system. login: The USENIX Magazine, 35:38–49, 2010.

[114] Reed M. Maxwell, Mario Putti, Steven Meyerhoff, Jens-Olaf Delfs, Ian M. Ferguson, Valeriy Ivanov, Jongho Kim, Olaf Kolditz, Stefan J. Kollet, Mukesh Kumar, Sonya Lopez, Jie Niu, Claudio Paniconi, Young-Jin Park, Mantha S. Phanikumar, Chaopeng Shen, Edward A. Sudicky, and Mauro Sulis. Surface-subsurface model intercomparison: A first set of benchmark results to diagnose integrated hydrology and feedbacks. Water Resources Research, 50(2):1531–1549, 2014. ISSN 1944-7973. doi: 10.1002/2013WR013725.

[115] A. Melnikov and K. Zeilenga. Simple authentication and security layer (sasl). RFC 4422, RFC Editor, June 2006. URL https://tools.ietf.org/html/rfc4422.

[116] J Nagle. On packet switches with infinite storage. IEEE Transactions on Communications, 35(4):435–438, 1987.

[117] Girija J Narlikar and Guy E Blelloch. Pthreads for dynamic and irregular parallelism. In Proceedings of the 1998 ACM/IEEE conference on Supercomputing, pages 1–16. IEEE Computer Society, 1998.

[118] Hans E Ngodock, Gregg A Jacobs, and Mingshi Chen. The representer method, the ensemble kalman filter and the ensemble kalman smoother: A comparison study using a nonlinear reduced gravity ocean model. Ocean Modelling, 12(3):378–400, 2006.

[119] Hien Nguyen Van, Frederic Dang Tran, and Jean-Marc Menaud. Autonomic virtual resource management for service hosting platforms. In Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing, pages 1–8. IEEE Computer Society, 2009.

[120] Frank Nielsen. Introduction to HPC with MPI for Data Science. Springer, 2016.

[121] Shuangcheng Niu, Jidong Zhai, Xiaosong Ma, Xiongchao Tang, and Wenguang Chen. Cost-effective cloud hpc resource provisioning by building semi-elastic virtual clusters. In High Performance Computing, Networking, Storage and Analysis (SC), 2013 International Conference for, pages 1–12. IEEE, 2013.

[122] V. Nuthula and N. R. Challa. Cloudifying apps - a study of design and architectural considerations for developing cloudenabled applications with case study. In Cloud Computing in Emerging Markets (CCEM), 2014 IEEE International Conference on, pages 1–7, Oct 2014. doi: 10.1109/CCEM.2014.7015487.

[123] Martin Odersky, Philippe Altherr, Vincent Cremet, Burak Emir, Sebastian Maneth, Stéphane Micheloud, Nikolay Mihaylov, Michel Schinz, Erik Stenman, and Matthias Zenger. An overview of the scala programming language. Technical report, 2004.

[124] Aisling O’Driscoll, Jurate Daugelaite, and Roy D Sleator. ‘big data’, hadoop and cloud computing in genomics. Journal of biomedical informatics, 46(5):774–781, 2013.

[125] Tom Oinn, Matthew Addis, Justin Ferris, Darren Marvin, Martin Senger, Mark Greenwood, Tim Carver, Kevin Glover, Matthew R Pocock, Anil Wipat, et al. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics, 20(17):3045–3054, 2004.

[126] ARB OpenMP. Openmp application program interface version 4.0, 2013.

[127] Sixto Ortiz. The problem with cloud-computing standardization. Computer, 44(7): 13–16, 2011.

[128] Michael P Papazoglou, Paolo Traverso, Schahram Dustdar, and Frank Leymann. Service-oriented computing: State of the art and research challenges. Computer, 40 (11), 2007.

[129] D. Partington, P. Brunner, S. Frei, C. T. Simmons, A. D. Werner, R. Therrien, H. R. Maier, G. C. Dandy, and J. H. Fleckenstein. Interpreting streamflow generation mechanisms from integrated surface-subsurface flow models of a riparian wetland and catchment. Water Resources Research, 49(9):5501–5519, 2013. ISSN 1944-7973. doi: 10.1002/wrcr.20405.

[130] H Van Dyke Parunak, Robert Savit, and Rick L Riolo. Agent-based modeling vs. equation-based modeling: A case study and users’ guide. In International Workshop on Multi-Agent Systems and Agent-Based Simulation, pages 10–25. Springer, 1998.

[131] D Pellerin, D Ballantyne, and A Boeglin. An introduction to high performance computing on aws. Amazon Whitepaper, 2015.

[132] Diego Perez, Philipp Rohlfshagen, and Simon M Lucas. Monte-carlo tree search for the physical travelling salesman problem. In European Conference on the Applications of Evolutionary Computation, pages 255–264. Springer, 2012.

[133] Arthur C Petersen. Simulating nature: a philosophical study of computer-simulation uncertainties and their role in climate science and policy advice. CRC Press, 2012.

[134] Steven J Plimpton and Karen D Devine. Mapreduce in mpi for large-scale graph algorithms. Parallel Computing, 37(9):610–632, 2011.

[135] Gerald J. Popek and Robert P. Goldberg. Formal requirements for virtualizable third generation architectures. pages 412–421, 1974.

[136] Branko Radojević and Mario Žagar. Analysis of issues with load balancing algorithms in hosted (cloud) environments. In MIPRO, 2011 Proceedings of the 34th International Convention, pages 416–420. IEEE, 2011.

[137] M Abdul Rahman. Scalable scientific workflows management system swfms. International Journal of Advanced Computer Science and Applications, 7(11):291–296, 2016.

[138] I. Raicu, I.T. Foster, and Yong Zhao. Many-task computing for grids and supercomputers. In Many-Task Computing on Grids and Supercomputers, 2008. MTAGS 2008. Workshop on, pages 1–11, Nov 2008. doi: 10.1109/MTAGS.2008.4777912.

[139] Lavanya Ramakrishnan and Beth Plale. A multi-dimensional classification model for scientific workflow characteristics. In Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science, page 4. ACM, 2010.

[140] Rajesh Raman, Miron Livny, and Marvin Solomon. Matchmaking: Distributed resource management for high throughput computing. In High Performance Distributed Computing, 1998. Proceedings. The Seventh International Symposium on, pages 140–146. IEEE, 1998.

[141] Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, and Christos Kozyrakis. Evaluating mapreduce for multi-core and multiprocessor systems. In High Performance Computer Architecture, 2007. HPCA 2007. IEEE 13th International Symposium on, pages 13–24. Ieee, 2007.

[142] Daniel A Reed and Jack Dongarra. Exascale computing and big data. Communications of the ACM, 58(7):56–68, 2015.

[143] Harvey Richardson. High performance fortran: history, overview and current develop- ments. Thinking Machines Corporation, 14:13, 1996.

[144] Eryk Schiller, Alexey Monakhov, and Peter Kropf. Shibboleth based authentication, authorization, accounting and auditing in wireless mesh networks. In LCN, pages 918–926, 2011.

[145] O.S. Schilling, J. Doherty, W. Kinzelbach, H. Wang, P.N. Yang, and P. Brunner. Using tree ring data as a proxy for transpiration to reduce predictive uncertainty of a model simulating groundwater–surface water–vegetation interactions. Journal of Hydrology, 519, Part B:2258 – 2271, 2014. ISSN 0022-1694. doi: http://dx.doi.org/10.1016/j.jhydrol.2014.08.063.

[146] Michael Sheehan. Cloud computing expo: introducing the cloud pyramid. Cloud Computing Journal, 2008.

[147] Kwang Mong Sim and Weng Hong Sun. Ant colony optimization for routing and load-balancing: survey and new directions. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 33(5):560–572, 2003.

[148] Satish Narayana Srirama and Jaagup Viil. Migrating scientific workflows to the cloud: Through graph-partitioning, scheduling and peer-to-peer data sharing. In High Performance Computing and Communications, 2014 IEEE Intl Conf on, pages 1105–1112. IEEE, 2014.

[149] Erich Strohmaier, Jack J Dongarra, Hans W Meuer, and Horst D Simon. Recent trends in the marketplace of high performance computing. Parallel Computing, 31(3):261–273, 2005.

[150] Seung-Jin Sul and Andrey Tovchigrechko. Parallelizing blast and som algorithms with mapreduce-mpi library. In Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on, pages 481–489. IEEE, 2011.

[151] Chris Swinerd and Ken R McNaught. Design classes for hybrid simulations involving agent-based and system dynamics models. Simulation Modelling Practice and Theory, 25:118–133, 2012.

[152] R Therrien, R McLaren, E Sudicky, and S Panday. A Three-dimensional Numerical Model Describing Fully-integrated Subsurface and Surface Flow and Solute Transport. Technical report, 2010. URL http://www.ggl.ulaval.ca/fileadmin/ggl/documents/rtherrien/hydrogeosphere.pdf.

[153] R. Therrien, R.G. McLaren, E.A. Sudicky, and S.M. Panday. HydroGeoSphere. Groundwater Simulations Group, 2013.

[154] Wei-Tek Tsai. Service-oriented system engineering: a new paradigm. In Service-oriented system engineering, 2005. sose 2005. IEEE International Workshop, pages 3–6. IEEE, 2005.

[155] Aphrodite Tsalgatidou, George Athanasopoulos, Michael Pantazoglou, Cesare Pautasso, Thomas Heinis, Roy Grønmo, Hjørdis Hoff, Arne-Jørgen Berre, Magne Glittum, and Simela Topouzidou. Developing scientific workflows from heterogeneous services. ACM Sigmod Record, 35(2):22–28, 2006.

[156] Andrei Tsaregorodtsev, Vincent Garonne, and Ian Stokes-Rees. Dirac: A scalable lightweight architecture for high throughput computing. In Grid Computing, 2004. Proceedings. Fifth IEEE/ACM International Workshop on, pages 19–25. IEEE, 2004.

[157] K. Vahi, M. Rynge, G. Juve, R. Mayani, and E. Deelman. Rethinking data management for big data scientific workflows. In Big Data, 2013 IEEE International Conference on, pages 27–35, Oct 2013. doi: 10.1109/BigData.2013.6691724.

[158] Wil MP van der Aalst, Alistair P Barros, Arthur HM ter Hofstede, and Bartek Kiepuszewski. Advanced workflow patterns. In International Conference on Cooperative Information Systems, pages 18–29. Springer, 2000.

[159] Wil MP van Der Aalst, Arthur HM Ter Hofstede, Bartek Kiepuszewski, and Alistair P Barros. Workflow patterns. Distributed and parallel databases, 14(1):5–51, 2003.

[160] Aad J Van Der Steen. Overview of recent supercomputers. 2008.

[161] Jinesh Varia. Cloud Architectures-White Paper. 2008. URL https://aws.amazon.com/about-aws/whats-new/2008/07/16/cloud-architectures-white-paper/. (last access: 15th October 2016).

[162] HTCondor Version 8.0.6 manual. Center for High Throughput Computing, University of Wisconsin-Madison. Accessed, 18, 2013.

[163] Lihong Wang and Steven L Jacques. Monte carlo modeling of light transport in multi-layered tissues in standard c. The University of Texas, MD Anderson Cancer Center, Houston, pages 4–11, 1992.

[164] Lizhe Wang, Jie Tao, Marcel Kunze, Alvaro Canales Castellanos, David Kramer, and Wolfgang Karl. Scientific cloud computing: Early definition and experience. In High Performance Computing and Communications, 2008. HPCC’08. 10th IEEE International Conference on, pages 825–830. IEEE, 2008.

[165] Shu-Ching Wang, Kuo-Qin Yan, Wen-Pin Liao, and Shun-Sheng Wang. Towards a load balancing in a three-level cloud computing network. In Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on, volume 1, pages 108–113. IEEE, 2010.

[166] Y. Wang, R. Goldstone, W. Yu, and T. Wang. Characterization and optimization of memory-resident mapreduce on hpc systems. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pages 799–808, May 2014. doi: 10.1109/IPDPS.2014.87.

[167] E Weinan. Principles of multiscale modeling. Cambridge University Press, 2011.

[168] Rafael Weingärtner, Gabriel Beims Bräscher, and Carlos Becker Westphall. A distributed autonomic management framework for cloud computing orchestration. In Services (SERVICES), 2016 IEEE World Congress on, pages 9–17. IEEE, 2016.

[169] Stephen A White. Process modeling notations and workflow patterns. Workflow handbook, 2004:265–294, 2004.

[170] D. Wu and P. Mohapatra. Qurinet: A wide-area wireless mesh testbed for research and experimental evaluations. In Second International Conference on Communication Systems and Networks (COMSNETS), 2010, pages 1–10, Jan 2010. doi: 10.1109/COMSNETS.2010.5431993.

[171] Tin-Yu Wu, Wei-Tsong Lee, Yu-San Lin, Yih-Sin Lin, Hung-Lin Chan, and Jhih-Siang Huang. Dynamic load balancing mechanism based on cloud storage. In Computing, Communications and Applications Conference (ComComAp), 2012, pages 102–106. IEEE, 2012.

[172] Chaowei Yang, Michael Goodchild, Qunying Huang, Doug Nebert, Robert Raskin, Yan Xu, Myra Bambacus, and Daniel Fay. Spatial cloud computing: how can the geospatial sciences use and help shape cloud computing? International Journal of Digital Earth, 4(4):305–329, 2011.

[173] Katherine Yelick, Susan Coghlan, Brent Draney, Richard Shane Canon, et al. The magellan report on cloud computing for science. US Department of Energy, Washington DC, USA, Tech. Rep, 2011.

[174] Andy B Yoo, Morris A Jette, and Mark Grondona. Slurm: Simple linux utility for resource management. In Workshop on Job Scheduling Strategies for Parallel Processing, pages 44–60. Springer, 2003.

[175] Richard M Yoo, Anthony Romano, and Christos Kozyrakis. Phoenix rebirth: Scalable mapreduce on a large-scale shared-memory system. In Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on, pages 198–207. IEEE, 2009.

[176] Edward Yourdon and Larry L Constantine. Structured design: Fundamentals of a discipline of computer program and systems design. Prentice-Hall, Inc., 1979.

[177] Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. HotCloud, 10(10-10):95, 2010.

[178] Edward N Zalta et al. Stanford encyclopedia of philosophy, 2003.

[179] Z. Zhang, K. Barbary, F. A. Nothaft, E. Sparks, O. Zahn, M. J. Franklin, D. A. Patterson, and S. Perlmutter. Scientific computing meets big data technology: An astronomy use case. In Big Data (Big Data), 2015 IEEE International Conference on, pages 918–927, Oct 2015. doi: 10.1109/BigData.2015.7363840.

[180] Zhuoyao Zhang. Processing data-intensive workflows in the cloud. Technical Report No. MS-CIS-12-08, University of Pennsylvania Department of Computer and Information Science, 2012.

[181] Y. Zhao, X. Fei, I. Raicu, and S. Lu. Opportunities and challenges in running scientific workflows on the cloud. In Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2011 International Conference on, pages 455–462, Oct 2011. doi: 10.1109/CyberC.2011.80.

[182] Paul Zikopoulos, Chris Eaton, et al. Understanding big data: Analytics for enterprise class hadoop and streaming data. McGraw-Hill Osborne Media, 2011.
