A User-Centered and Autonomic Multi-Cloud Architecture for High Performance Computing Applications
Total Page:16
File Type:pdf, Size:1020Kb
Universidade de Brasília Instituto de Ciências Exatas Departamento de Ciência da Computação A User-Centered and Autonomic Multi-Cloud Architecture for High Performance Computing Applications Alessandro Ferreira Leite Thesis presented in partial fulfillment of the requirements for the degree of Doctor in Computer Science Advisor Prof. Dr. Alba Cristina Magalhães Alves de Melo Advisor Prof. Dr. Christine Eisenbeis Brasília 2014 Universidade de Brasília — UnB Instituto de Ciências Exatas Departamento de Ciência da Computação Doutorado em Informática Coordenadora: Prof. Dr.ª Alba Cristina Magalhães Alves de Melo Banca examinadora composta por: Prof. Dr. Alba Cristina Magalhães Alves de Melo (Advisor) — CIC/UnB Prof. Dr. Christine Eisenbeis (Advisor) — INRIA, LRI/Paris-Sud 11 Prof. Dr. Claude Tadonki (Co-Advisor) — Mines ParisTech Prof. Dr. Christine Froidevaux — LRI/Paris-Sud 11 Prof. Dr. Célia Ghedini Ralha — CIC/UnB Prof. Dr. Christine Morin — INRIA, IRISA Prof. Dr. Christophe Cérin — LIPN/Paris 13 Prof. Dr. Jean-Louis Pazat — INSA Rennes CIP — Catalogação Internacional na Publicação Leite, Alessandro Ferreira. A User-Centered and Autonomic Multi-Cloud Architecture for High Per- formance Computing Applications / Alessandro Ferreira Leite. Brasília : UnB, 2014. 282 p. : il. ; 29,5 cm. Thesis (Ph.D.) — Universidade de Brasília, Brasília, 2014. 1. Computação Autonômica; Computação de Alto Desempenho; Federação de Nuvens; Linha de Produto de Software; Modelo de Features de Infraestrutura de Nuvens CDU 004.4 Endereço: Universidade de Brasília Campus Universitário Darcy Ribeiro — Asa Norte CEP 70910-900 Brasília–DF — Brasil To my family and my parents To my niece and my nephews Acknowledgements First, I would like to thank my supervisors for their support and advices. Next, I would like to thank my thesis’s reviewers Christophe Cérin and Jean-Louis Pazat, and the members of my thesis committee: Christine Froidevaux, Célia Ghedini Ralha, and Christine Morin. I appreciate all yours comments and questions. Additionally, I would like to thank the professors Vander Alves, Genaína Rodrigues, Li Weigang, and Jacques Blanc for their comments, suggestions, and discussions. Next, I am very thankful to Katia Evrat and Valérie Berthou for kindly helping me in various administrative works. In addition, I am grateful to the authors that sent me their papers when I requested them. After, I would like to thank all my team members in INRIA, especially Michael Kruse and Taj Muhammad Khan. I also thank my friends Cícero Roberto, Antônio Junior, Éric Silva, André Ribeiro, Francinaldo Araujo, Emerson Macedo, and the former Logus’s team. Moreover, I would like to thank the people I collaborated with during the last years. I further acknowledge the financial assistance of CAPES/CNPq through the program science without borders (grant 237561/2012-3) during the year of 2013, and Campus France/INRIA (project Pascale) during the period of February to July and November to December 2014. Last but not least, I would like to thank my parents and my family for their uncondi- tional support and patience. iii Abstract Cloud computing has been seen as an option to execute high performance computing(HPC) applications. While traditional HPC platforms such as grid and supercomputers offer a stable environment in terms of failures, performance, and number of resources, cloud computing offers on-demand resources generally with unpredictable performance at low financial cost. Furthermore, in cloud environment, failures are part of its normal operation. To overcome the limits of a single cloud, clouds can be combined, forming a cloud federation often with minimal additional costs for the users. A cloud federation can help both cloud providers and cloud users to achieve their goals such as to reduce the execution time, to achieve minimum cost, to increase availability, to reduce power consumption, among others. Hence, cloud federation can be an elegant solution to avoid over provisioning, thus reducing the operational costs in an average load situation, and removing resources that would otherwise remain idle and wasting power consumption, for instance. However, cloud federation increases the range of resources available for the users. As a result, cloud or system administration skills may be demanded from the users, as well as a considerable time to learn about the available options. In this context, some questions arise such as: (a) which cloud resource is appropriate for a given application? (b) how can the users execute their HPC applications with acceptable performance and financial costs, without needing to re-engineer the applications to fit clouds’ constraints? (c) how can non-cloud specialists maximize the features of the clouds, without being tied to a cloud provider? and (d) how can the cloud providers use the federation to reduce power consumption of the clouds, while still being able to give service-level agreement (SLA) guarantees to the users? Motivated by these questions, this thesis presents a SLA-aware application consolidation solution for cloud federation. Using a multi-agent system(MAS) to negotiate virtual machine(VM) migrations between the clouds, simulation results show that our approach could reduce up to 46% of the power consumption, while trying to meet performance requirements. Using the federation, we developed and evaluated an approach to execute a huge bioinformatics application at zero-cost. Moreover, we could decrease the execution time in 22.55% over the best single cloud execution. In addition, this thesis presents a cloud architecture called Excalibur to auto-scale cloud-unaware application. Executing a genomics workflow, Excalibur could seamlessly scale the applications up to 11 virtual machines, reducing the execution time by 63% and the cost by 84% when compared to a user’s configuration. Finally, this thesis presents a software product line engineering(SPLE) method to handle the commonality and variability of infrastructure-as-a-service(IaaS) clouds, and an autonomic multi-cloud architecture that uses this method to configure and to deal with failures autonomously. The SPLE method uses extended feature model(EFM) with attributes to describe the resources and to select them based on the users’ objectives. Experiments realized with two different cloud providers show that using the proposed method, the users could execute their application on a federated cloud environment, without needing to know the variability and constraints of the clouds. Keywords: Autonomic computing; High performance computing (HPC); Cloud federation; Soft- ware Product Line Engineering, Variability Management of Cloud Infrastructure v Résumé Le cloud computing a été considéré comme une option pour exécuter des applications de calcul haute performance (HPC). Bien que les plateformes traditionnelles de calcul haute performance telles que les grilles et les supercalculateurs offrent un environnement stable du point de vue des défaillances, des performances, et de la taille des ressources, le cloud computing offre des ressources à la demande, généralement avec des performances imprévisibles mais à des coûts financiers abordables. En outre, dans un environnement de cloud, les défaillances sont perçues comme étant ordinaires. Pour surmonter les limites d’un cloud individuel, plusieurs clouds peuvent être combinés pour former une fédération de clouds, souvent avec des coûts supplémentaires légers pour les utilisateurs. Une fédération de clouds peut aider autant les fournisseurs que les utilisateurs à atteindre leurs objectifs tels la réduction du temps d’exécution, la minimisation des coûts, l’augmentation de la disponibilité, la réduction de la consommation d’énergie, pour ne citer que ceux-là. Ainsi, la fédération de clouds peut être une solution élégante pour éviter le sur-approvisionnement, réduisant ainsi les coûts d’exploitation en situation de charge moyenne, et en supprimant des ressources qui, autrement, resteraient inutilisées et gaspilleraient ainsi de énergie. Cependant, la fédération de clouds élargit la gamme des ressources disponibles. En conséquence, pour les utilisateurs, des compétences en cloud computing ou en administration système sont nécessaires, ainsi qu’un temps d’apprentissage considérable pour maîtrises les options disponibles. Dans ce contexte, certaines questions se posent : (a) Quelle ressource du cloud est appropriée pour une application donnée ? (b) Comment les utilisateurs peuvent-ils exécuter leurs applications HPC avec un rendement acceptable et des coûts financiers abordables, sans avoir à reconfigurer les applications pour répondre aux normes et contraintes du cloud ? (c) Comment les non-spécialistes du cloud peuvent-ils maximiser l’usage des caractéristiques du cloud, sans être liés au fournisseur du cloud ? et (d) Comment les fournisseurs de cloud peuvent-ils exploiter la fédération pour réduire la consommation électrique, tout en étant en mesure de fournir un service garantissant les normes de qualité préétablies ? À partir de ces questions, la présente thèse propose une solution de consolidation d’applications pour la fédération de clouds qui garantit le respect des normes de qualité de service. On utilise un système multi-agents (SMA) pour négocier la migration des machines virtuelles entre les clouds. Les résultats de simulations montrent que notre approche pourrait réduire jusqu’à 46% la consommation totale d’énergie, tout en respectant les exigences de performance. En nous basant sur la fédération de clouds, nous avons développé et évalué