Contributions to Desktop Grid Computing Gilles Fedak

Contributions to Desktop Grid Computing Gilles Fedak

Contributions to Desktop Grid Computing Gilles Fedak To cite this version: Gilles Fedak. Contributions to Desktop Grid Computing : From High Throughput Computing to Data-Intensive Sciences on Hybrid Distributed Computing Infrastructures. Calcul parallèle, distribué et partagé [cs.DC]. Ecole Normale Supérieure de Lyon, 2015. tel-01158462v2 HAL Id: tel-01158462 https://hal.inria.fr/tel-01158462v2 Submitted on 7 Dec 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Copyright University of Lyon Habilitation `aDiriger des Recherches Contributions to Desktop Grid Computing From High Throughput Computing to Data-Intensive Sciences on Hybrid Distributed Computing Infrastructures Gilles Fedak 28 Mai 2015 Apr`esavis de : Christine Morin Directeur de Recherche, INRIA Pierre Sens Professeur, Universit´eParis VI Domenico Talia Professeur, University of Calabria Devant la commission d'examen form´eede : Vincent Breton Directeur de Recherche, CNRS Christine Morin Directeur de Recherche, INRIA Manish Parashar Professeur, Rutgers University Christian Perez Directeur de Recherche, INRIA Pierre Sens Professeur, Universit´eParis VI Domenico Talia Professeur, University of Calabria Contents 1 Introduction 9 1.1 Historical Context . .9 1.2 Contributions . 13 1.3 Summary of the Document . 16 2 Evolution of Desktop Grid Computing 19 2.1 A Brief History of Desktop Grid and Volunteer Computing . 20 2.2 Algorithms and Technologies Developed to Implement the Desktop Grid Concept . 24 2.3 Evolution of the Distributed Computing Infrastructures Landscape . 29 2.4 Emerging Challenges of Desktop Grid Computing . 31 2.5 Conclusion . 36 3 Research Methodologies for Desktop Grid Computing 37 3.1 Observing and Characterizing Desktop Grid Infrastructures . 37 3.2 Simulating and Emulating Desktop Grid Systems . 40 3.3 DSL-Lab: a Platform to Experiment on Domestic Broadband Internet . 41 3.4 The European Desktop Grid Infrastructure . 43 3.5 Conclusion . 46 4 Algorithms and Software for Hybrid Desktop Grid Computing 49 4.1 Scheduling for Desktop Grids . 50 4.2 SpeQuloS, a QoS Service for Best-Effort DCIs . 51 4.3 The Promethee Multi-criteria Scheduler for Hybrid DCIs . 54 4.4 Virtualization and Security . 56 4.5 CloudPower: a Cloud Service Providing HTC on-Demand . 57 4.6 Conclusion . 57 5 Large Scale Data-Centric Processing and Management 59 5.1 Environment for Large Scale Data Management . 60 5.2 Implementing the MapReduce Programming Model on Desktop Grids . 62 5.3 Handling Data Life Cycles on Heteregeneous Distributed Infrastructures . 65 5.4 Conclusion . 68 6 Conclusion and Perspectives 71 6.1 Conclusion . 71 6.2 Perspectives . 73 3 Abstract Since the mid 90's, Desktop Grid Computing - i.e the idea of using a large number of remote PCs distributed on the Internet to execute large parallel applications - has proved to be an efficient paradigm to provide a large computational power at the fraction of the cost of a dedicated computing infrastructure. This document presents my contributions over the last decade to broaden the scope of Desktop Grid Computing. My research has followed three different directions. The first direction has established new methods to observe and characterize Desktop Grid resources and developed experimental platforms to test and validate our approach in conditions close to reality. The second line of research has focused on integrating Desk- top Grids in e-science Grid infrastructure (e.g. EGI), which requires to address many challenges such as security, scheduling, quality of service, and more. The third direction has investigated how to support large-scale data management and data intensive applica- tions on such infrastructures, including support for the new and emerging data-oriented programming models. This manuscript not only reports on the scientific achievements and the technologies developed to support our objectives, but also on the international collaborations and projects I have been involved in, as well as the scientific mentoring which motivates my candidature for the Habilitation `aDiriger les Recherches. 5 Acknowledgements I would like to express my gratitude to the members of the defense committee: Christine Morin, Pierre Sens and Domenico Talia, who accepted to review the manuscript and Vincent Breton, Christian Perez and Manish Parashar, who accepted to participate to the defense. I would sincerely like to thank the GRAAL/Avalon team and the LIP members for their kindness, support and for all the fruitful discussions we have daily. In particular, I would like to thank Christian Perez, head of the Avalon team for his constant support during these last years spent in Lyon. A great deal of credit should be given to the PhD students, post-docs and engineers without whom none of this would have been possible: Anthony Simonet, Paul Mal´ecot, Baoha Wei, Asma Ben Cheick, Julio Anjos, Bing Tang, Mircea Moca, Haiwu He, Jos´e Saray, Simon Delamare, Fran¸coisB´erenger,Derrick Kondo and more. This work gave me the privilege to meet many awesome people; some of them with whose I developed over the years a sincere and friendly relationship. Without naming them, I would like to thank all my co-authors and collaborators from around the world: US, China, Japan, Tunisia, Ukraine, Canada, Brazil, Romania and more. I would like to friendly salute Oleg Lodygensky, working at IN2P3, with whom I have shared a lot of this adventure. I would like to dedicate this manuscript to my beloved ones : Irina, Lison and Virginie. 7 Chapter 1 Introduction Learning is experience. Everything else is just information. (Albert Einstein (1879 { 1955)) This document presents the research I have done since receiving my Ph.D. in 2003. These activities started during my postdoctoral stay at the University of California San Diego and continued at University Paris XI in the Grand-Large team where I occu- pied a position of INRIA Researcher between 2004 and 2008. In 2008, I joined the GRAAL/AVALON team at the University of Lyon, whose research activities aim at de- signing software, algorithms and programming models for large scale distributed comput- ing infrastructures. Pursuing similar objective, I focused my research work on studying infrastructures and technologies for Desktop Grid Computing. 1.1 Historical Context Thorough this document, the reader will follow a part of the history of parallel and dis- tributed computing. This story illustrates the efforts to build more powerful computing infrastructures by designing large scale systems, that is to say, capable of assembling millions of computing nodes [1]. Desktop Grid Computing [2] is a discipline which, by considering the whole Internet as a possible computing platform, has pushed this idea to its extreme limit. It's hard to figure out now, how disruptive this approach was, when it emerged at the late 90's. To put things into context, Desktop Grid has appeared at the same time than Cluster Computing, which was aiming at designing parallel com- puters using off-the shelf components such as regular PC run by Linux and early Grid Computing, where the first experiments were considering parallel applications spanning over several super-computers geographically distributed. Thus, the environment was favorable to research leading to new directions for high performance computing. The characteristics of a Desktop Grid platform is radically different from a traditional super- computer: nodes are distributed over the Internet, nodes can join or leave the network 9 Chapter 1 Introduction at any time without notice, nodes have low network capabilities with many connection restrictions and nodes are shared with end-users. Our context was so radically different, that we had to invent new paradigms and tech- nologies before being able to explore the full possibility of Desktop Grid Computing. Indeed, Desktop Grid has revisited many traditional aspects of high performance com- puting: scheduling, file management, communications, programming model and more. Moreover, it has also revealed in advance several problematics that were considered as minor and secondary in the classical field of parallel computing, such as system security and dependability, and which are now widely addressed as a primary concern, at the age of Cloud Computing and Peta-scale High Performance Computing. This thesis presents my contributions to the domain, both in term of technology and scientific results. One particular aspect of research in Desktop Grid computing is that it requires to develop the technology which enables to build the infrastructure, and at the same time to observe and understand the infrastructure in order to improve the technology. Thus, my research follows two axis which are advanced in parallel. The first area concerns Desktop Grid algorithms and software, and addresses a large variety of topics such as performance, programming model, security, data management, fault- tolerance etc. The second direction concerns the Desktop Grid infrastructures, which consists in observing the infrastructures and characterizing the computing resources. This helps

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    111 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us