

Virtual Organization Clusters: Self-Provisioned Clouds on the Grid

A Dissertation Presented to the Graduate School of Clemson University

In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

by Michael A. Murphy May 2010

Accepted by:
Dr. Sebastien Goasguen, Committee Chair
Dr. Jim Martin
Dr. Walt Ligon
Dr. Robert Geist

Abstract

Virtual Organization Clusters (VOCs) provide a novel architecture for overlaying dedicated cluster systems on existing grid infrastructures. VOCs provide customized, homogeneous execution environments on a per-Virtual Organization basis, without the cost of physical cluster construction or the overhead of per-job containers. Administrative access and overlay network capabilities are granted to Virtual Organizations (VOs) that choose to implement VOC technology, while the system remains completely transparent to end users and non-participating VOs. Unlike alternative systems that require explicit leases, VOCs are autonomically self-provisioned according to configurable usage policies. As a grid computing architecture, VOCs are designed to be technology agnostic and are implementable by any combination of software and services that follows the Virtual Organization Cluster Model.

As demonstrated through simulation testing and evaluation of an implemented prototype, VOCs are a viable mechanism for increasing end-user job compatibility on grid sites. On existing production grids, where jobs are frequently submitted to a small subset of sites and thus experience high queuing delays relative to average job length, the grid-wide addition of VOCs does not adversely affect mean job sojourn time. By load-balancing jobs among grid sites, VOCs can reduce the total amount of queuing on a grid to a level sufficient to counteract the performance overhead introduced by virtualization.

Dedication

This dissertation is dedicated to my grandparents: Marge Tyner, Jim and Edrie Murphy, and the late Dr. Calvin Tyner. Throughout my entire academic career, from grade school through university, my grandparents have provided the encouragement and support that has made this journey to the doctorate possible, and to them I am exceedingly grateful.

Acknowledgments

I would like to express my sincere thanks to my research committee for their assistance with this project. My advisor, Prof. Sebastien Goasguen, has spent countless hours assessing my work, providing assistance with the research, and reading my long-winded e-mail messages. His patient guidance throughout this process, along with his encouragement to publish results in challenging venues, has inspired me to pursue loftier objectives in my research. Prof. Jim Martin, Prof. Walt Ligon, and Prof. Robert Geist have assisted in the process by dedicating their time and expertise to reviewing and aiding in the improvement of my research and its presentation.

Outside my committee, I would also like to thank the faculty members who have assisted me with research guidance, pursuit of funding, and day-to-day administration issues. Prof. Jason Hallstrom, Prof. Chris Post, and Prof. Steve Stevenson have guided my past research projects, advised me on career and professional issues, and were instrumental in my securing the NSF Graduate Research Fellowship. Prof. Wayne Madison not only assisted me with the fellowship pursuit but also contributed significant time and effort to handling administrative issues, advising me throughout my undergraduate and graduate careers, and providing me with teaching opportunities in the School of Computing. Prof. Pradip Srimani graciously reviewed paper submissions and mathematical estimation work, providing helpful comments for improvement of the material. The late Prof. Per Brinch Hansen of the Department of Electrical Engineering and Computer Science encouraged me with insights into computational problem solving, which greatly shaped my approaches to the research problems described in this dissertation.

For their assistance with numerous administrative and technical issues, I would like to thank the staff of the Clemson University School of Computing. Scott Duckworth and Nell Kennedy have provided exceptional assistance with the provisioning of networking and other services required to support our prototype systems, hardware and vendor advice, and assistance with hardware orders.

April Bowen, Lea Benson, and Kim Keasler have generously provided their assistance with school and university policy compliance, fellowship administration, facilities support, and other administrative issues.

Throughout the process of researching Virtual Organization Clusters and preparing this work, I have been privileged to work with a team of graduate and undergraduate students, who have assisted in the collection of data, administration of computer systems, and co-authoring of papers. I would like to express my sincere appreciation to Linton Abraham, Michael Fenn, Lance Stout, Brandon Kagey, Joshua Canter, and Emmett McQuinn for their contributions to this research project and to the publications that have resulted from it.

During my entire academic career, I have been supported by my relatives and friends as I have pursued challenging and often time-consuming courses of study. I would like to thank my grandparents, to whom this dissertation is dedicated, my parents – Mark Murphy and Ginny Murphy, and the remainder of my family for their support and encouragement. I also owe a debt of gratitude to all my friends through graduate school, including Dru and Heather Sepulveda, Katelyn Howay, Shawn Becker, Tim and Kristen Rabideau, James Bardoner, Claire Wellborn, Sarah Peck, Jordan and Shana Upham, Ben Sterrett, Christine Marusich, Myric Lehner, and Shawna Martell.

Simulation data sets used in this work have been provided by the Grid Observatory (grid-observatory.org). The Grid Observatory is part of the EGEE-III EU project INFSO-RI-222667.

This material is based upon work supported under a National Science Foundation Graduate Research Fellowship. Any opinions, findings, conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Table of Contents

Title Page
Abstract
Dedication
Acknowledgments
List of Tables
List of Figures

1 Introduction
1.1 Vision
1.2 Motivation
1.3 Thesis Statement
1.4 Dissertation Organization

2 Grid Systems
2.1 Computational Grids
2.2 Composition of Grids
2.3 Virtual Organizations
2.4 Computational Jobs
2.5 Service-Oriented Architecture
2.6 Cloud Computing Systems

3 Virtualization
3.1 Levels of Virtualization
3.2 Instruction Set Architecture Requirements
3.3 Dynamic Recompilation
3.4 Paravirtualization
3.5 Hardware Virtual Machine Extensions
3.6 Applications of Virtualization Technology

4 Related Work
4.1 Virtualization of Grid Systems
4.2 Simulation Systems
4.3 System Resources
4.4 System Management

5 Virtual Organization Cluster Model
5.1 Separation of Administrative Domains
5.2 Terminology
5.3 Definition of a Virtual Organization Cluster
5.4 Self-Provisioned Clouds

6 Simulator for Virtual Organization Clusters
6.1 Motivation for Simulation
6.2 Grid Trace Data
6.3 Simulator Design

7 Prototype Implementation
7.1 Cluster Setup
7.2 Dynamic Provisioning of Virtual Organization Clusters
7.3 Management of Systems

8 Simulation Test Results
8.1 Preliminary Simulations
8.2 Input Data Processing
8.3 Control Simulation
8.4 Standard Grid Architecture
8.5 Virtual Organization Cluster Architecture
8.6 Effects of Virtualization Overhead

9 Prototype Test Results
9.1 Performance Tests
9.2 Dynamic Provisioning System
9.3 Overlay Scheduling
9.4 Operational Tests
9.5 System Management with Stoker

10 Conclusions
10.1 Utility and Viability of Virtual Organization Clusters
10.2 Direct Impacts
10.3 Broader Impacts
10.4 Future Work

Appendices
A Electronic Attachments
B License Agreements

Bibliography

List of Tables

3.1 Levels of Virtualization
3.2 Common Linux Virtualization Systems

5.1 Examples of VOC Component Technologies

8.1 Simulation Results

9.1 Observed Boot Times
9.2 Performance Comparison Between Slackware Linux 12 and CentOS 5.1
9.3 Comparison of VOC and Physical System Performance
9.4 Compute-Bound Virtualization Overheads
9.5 Network-Bound Virtualization Overheads (Dual-Core)
9.6 Network-Bound Virtualization Overheads (Single-Core)
9.7 Bandwidth and Latency on Physical and Virtual Clusters
9.8 Effects of Overlay Scheduling

List of Figures

1.1 Vision of a Virtual Organization Cluster
1.2 Traditional Grid Systems
1.3 Virtualized Grids
1.4 Leasing Architectures
1.5 Virtual Overlay Cluster on a Grid

2.1 Traditional Grid Layers
2.2 Cluster Systems
2.3 Grid Systems as Networks of Clusters
2.4 Abstract View of Grid Systems
2.5 Virtual Organizations on the Grid
2.6 Regular Jobs
2.7 Pilot Jobs
2.8 Service Oriented Architecture
2.9 Cloud Computing

3.1 Virtual Machines
3.2 Popek and Goldberg Virtualization Requirements
3.3 Virtualization with Dynamic Recompilation
3.4 Type I Paravirtualization System
3.5 Full Virtualization with Hardware Support
3.6 Staged Virtual Machine Images
3.7 Non-Staged Image in KVM Snapshot Mode

4.1 Leasing Model
4.2 Autonomic Model

5.1 Administrative Domains
5.2 Virtual Organization Clusters on a Grid System

6.1 High-level components of SimVOC

7.1 Cluster Construction Process
7.2 Initial Test Cluster Setup
7.3 Watchdog Daemon Functions
7.4 Overview of Stoker
7.5 Integration of Stoker with DHCP and DNS Services
7.6 Overview of Pulley

8.1 Simulation of 2 VOCs Sharing 16 CPUs
8.2 Simulation of 2 VOCs Sharing 4 CPUs
8.3 Control Simulation
8.4 Trace of the Control Simulation
8.5 Standard Architecture Simulation
8.6 Trace of the Standard Architecture Simulation
8.7 Representative Trace of a VOC Simulation
8.8 KVM Simulation without Target Level
8.9 KVM Simulation with Target Level 1024
8.10 Xen Simulation without Target Level
8.11 Xen Simulation with Target Level 1024

9.1 Average Boot Times
9.2 Virtualization Overheads for Compute-Bound Processes (Single-Core)
9.3 Virtualization Overheads for Network-Bound Processes (Single-Core)
9.4 Virtualization Overheads for Network-Bound Processes (Dual-Core)
9.5 Virtual Machine Bridged Networking
9.6 Prototype System Architecture
9.7 Virtual Machine Boot Delays as Recorded by Condor
9.8 Direct Execution of Batches of Jobs
9.9 Long-Period Direct Submission of 10 Jobs (10 Second Length)
9.10 Short-Period Direct Submission of 10 Jobs (10 Second Length)
9.11 Short-Period Direct Submission of 10 Jobs (1 Second Length)
9.12 Globus Execution of Batches of Jobs
9.13 Long-Period Job Submission Through Globus
9.14 Short-Period Job Submission Through Globus (10 Second Jobs)
9.15 Short-Period Job Submission Through Globus (1 Second Jobs)
9.16 Transparent VOC Performance without Overlay Scheduling
9.17 Participating VOC Performance with Overlay Scheduling
9.18 Performance of the Simple Greedy VOC Watchdog Algorithm
9.19 Performance of the Delayed Response VOC Watchdog Algorithm
9.20 Short Operational Test
9.21 Transparent Operational VOC: Engage VO
9.22 Transparent Operational VOC: NanoHub VO
9.23 Stoker Performance on a Parallel Hostname Job
9.24 Stoker Performance on a Parallel Sleep Job
9.25 Overhead of Networking and SSH in Stoker
9.26 Stoker on a High Latency Network
9.27 Stoker SSH and Network Overheads on a High Latency Network

10.1 Revisiting the Virtual Organization Cluster Architecture

Chapter 1

Introduction

Grid computing systems enable users associated with a variety of application domains to run large-scale computational tasks that would overwhelm the capabilities of locally available hardware.

In order to utilize these systems, users create and submit batch jobs to grid sites, which schedule the jobs for execution. Users, jobs, and computational resources are connected together via middleware, or software layers that facilitate interaction between different parts of the grid system. At a high level, the combination of middleware, resources, and utilization methodology employed on a grid system may be called a grid architecture. Existing grid architectures generally require the user to select computational resources directly and manage issues of job compatibility between resources manually, resulting in a system that is neither easy to use [33] nor truly transparent to the user in distributed computing terms [53]. Moreover, since users typically favor a small subset of grid sites when submitting jobs, the grid is not effectively load-balanced, leading to longer queuing delays while jobs wait for computational resources to become available.

Virtualization technology has been proposed as a mechanism for improving the usability of grid computing systems by providing environment customization to users or groups of users. By using virtual machines to run user jobs, the environment in which jobs execute becomes decoupled from the underlying hardware, allowing any physical site with a compatible Instruction Set Architecture (ISA) to supply any variety of code libraries required for current or legacy applications. Each environment becomes an isolated container for user applications, providing better security and resource control than is available with traditional grid systems. [57] To date, a key limitation in the deployment of virtual environments for grid computing has been middleware integration and the paradigm selected for user interaction. In particular, existing grid virtualization systems, such as Shirako [78], Cluster on Demand [32], and Globus Nimbus [94], are designed around leasing models that require the user to make explicit resource reservations. As is often the case with traditional grid architectures, these lease-oriented systems compel the user to perform resource management tasks that otherwise could be automated by the system.

This dissertation presents a novel grid computing architecture that is designed to deliver the benefits of virtualized environments while simplifying grid utilization for the end user. The Virtual Organization Cluster Model (VOC Model) describes a grid architecture that provides execution environments that are homogeneous across grid sites, allowing applications to be run on any ISA-compatible resource, including those resources that do not provide the necessary software libraries to run user jobs directly on the hardware. VOCs remain transparent to end users and disinterested entities, permitting the new architecture to be deployed in conjunction with, and without disrupting, existing production grids. As demonstrated through simulation testing, large-scale deployment of VOCs would provide the benefits of virtualized grid systems without adversely affecting aggregate grid performance, since the overhead of virtualization systems would be offset by increased execution parallelism, leading to a reduction in total job queuing.
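The offset argument can be made concrete with a back-of-the-envelope calculation. The numbers below are purely illustrative assumptions, not measurements from this work: they simply show that a fixed runtime overhead is recovered whenever the reduction in queuing delay exceeds the extra service time.

```python
# Illustrative calculation with hypothetical numbers (not measurements from this
# work): a fixed per-job virtualization overhead can be offset by a reduction in
# mean queuing delay obtained through better load balancing.

def mean_sojourn(queue_delay_s: float, run_time_s: float, overhead: float) -> float:
    """Mean sojourn time = queuing delay + service time inflated by the overhead."""
    return queue_delay_s + run_time_s * (1.0 + overhead)

physical = mean_sojourn(queue_delay_s=3600.0, run_time_s=1800.0, overhead=0.00)
virtual = mean_sojourn(queue_delay_s=2400.0, run_time_s=1800.0, overhead=0.08)

print(f"physical grid: {physical:.0f} s, virtualized grid: {virtual:.0f} s")
# Even with an 8% runtime penalty, the virtualized case wins whenever the
# queuing reduction exceeds overhead * run_time (here 144 s).
```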

The remainder of this chapter introduces the reasoning behind Virtual Organization Clusters and the outcomes to be measured when VOCs are implemented and tested. Section 1.1 presents the vision for VOCs, after which a more detailed description of the motivation for creating this autonomic grid architecture is discussed in section 1.2. Specific features of the system to be demonstrated are presented in section 1.3. Finally, the organization of the rest of the dissertation is described in section 1.4.

1.1 Vision

Virtual Organization Clusters (VOCs) are autonomically managed virtual environments that are homogeneous across sites, transparent to end users, implementable in a phased and non-disruptive manner, optionally customizable by Virtual Organizations, and designed according to an architectural specification. As implied by the nomenclature, VOCs are clusters in a computational sense: each VOC can be represented as a star topology with a single head node and a collection of compute nodes. User jobs arrive at the head node and are scheduled to execute on the compute nodes by means of a private scheduler belonging to the virtual cluster. Like any computational cluster, VOCs may be administered by means of distributed management tools that execute at run time.

Figure 1.1: Two Virtual Organization Clusters on two different physical grid sites. Of the two Virtual Organizations, VO1 does not utilize the VOC technology; as a result, the use of virtual machines is completely transparent to this VO. In contrast, VO2 participates in the VOC technology and has its own private resources, a private overlay network, and even control over job scheduling on its private cluster. Both VOCs are completely transparent to end users, who submit jobs through existing, unmodified grid middleware. User jobs are routed either to a physical cluster or to a VOC through a Compute Element (CE) interface between the grid and each cluster.

Unlike traditional computational clusters, VOC compute nodes are ephemeral, since they are spawned from virtual machine instances that are scheduled on underlying physical hardware. In addition, the network links between the nodes in a VOC are overlays on top of a Wide Area Network with heterogeneous link capabilities between different systems, rendering VOCs less desirable for High Performance Computing (HPC) applications. For High Throughput Computing (HTC) and Many-Task Computing (MTC) applications, however, VOCs provide cloud environments: software systems that can be hosted entirely by off-site computational resources and accessed exclusively through the network [163]. These clouds are autonomically managed by middleware, implying that software agents manage, configure, and repair the virtual clusters without human involvement [96].

Furthermore, the software agents also adjust the size of VOCs in response to changing workload demands and resource availability, resulting in a self-provisioned system.
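A minimal sketch of such a self-provisioning policy is shown below. The function and variable names are hypothetical placeholders, and the watchdog algorithms actually evaluated later in this dissertation may differ; the sketch only illustrates the general grow-when-queued, shrink-when-idle behavior.

```python
# Minimal sketch of a greedy self-provisioning policy for a VOC watchdog.
# All names (queued_jobs, idle_vms, max_vms) are hypothetical placeholders for
# whatever the overlay scheduler and virtualization interface actually provide.

def adjust_voc_size(queued_jobs: int, idle_vms: int, running_vms: int,
                    max_vms: int) -> int:
    """Return the change in VOC size: positive to boot VMs, negative to stop them."""
    if queued_jobs > 0 and running_vms < max_vms:
        # Grow while work is waiting and the site-imposed cap is not yet reached.
        return min(queued_jobs, max_vms - running_vms)
    if queued_jobs == 0 and idle_vms > 0:
        # Shrink when the overlay queue is empty and nodes sit idle.
        return -idle_vms
    return 0

# Example: 12 jobs queued, 0 idle of 4 running VMs, cap of 16 -> grow by 12.
print(adjust_voc_size(queued_jobs=12, idle_vms=0, running_vms=4, max_vms=16))
```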

Following the model of grid computing presented in Foster et al., 2001 [66], Virtual Organization Clusters are dedicated to (and in non-transparent cases, owned by) a single Virtual Organization (VO), which is a collection of entities with a common purpose. In the case of VOCs, these VOs represent the end users, who are assumed to be domain experts, not computer scientists. VOs may employ computing professionals, however, to design and administer custom VOCs.

Since design decisions and issues in a grid system are by definition distributed and at scale, abruptly changing the architecture of an existing production grid would be impractical in terms of cost and complexity. Thus, it is necessary for a new grid architecture to be deployable in a phased manner that does not cause widespread disruption. As illustrated in figure 1.1, VOCs achieve phased deployability by leveraging existing middleware and permitting participating and non-participating (or transparent) deployments. In a participating deployment, a VOC is created for a single VO, which assumes administrative responsibility for the VOC. All participating grid sites must permit virtual machines to be spawned to create VOC compute nodes. Conversely, in a non-participating deployment, a single grid site transparently provides VOCs for VOs authorized to run jobs on the site, encapsulating the end-user workloads in local virtual environments. These non-participating VOCs are managed by the site administrators, and it is not necessary that VOs or end users even be made aware that VOCs are in use. In either case, middleware changes are needed only on the sites that will host virtual machines. No changes to submission endpoint middleware (the software operated by the end user) are required.

Participating VOCs achieve cross-site homogeneity by utilizing virtualization systems to separate job execution environments from the underlying hardware. Each participating VOC provides a virtual cluster head node, which is exposed to the grid as a standard grid site. Users affiliated with the Virtual Organization that owns the participating VOC then submit jobs to this one special VOC-specific site, instead of selecting one of a potentially large number of possible target sites at job submission time. By freeing the user from the requirement to select computational resources manually for each task, the grid system becomes more accessible to the user.

1.2 Motivation

Existing grid systems require the user to adapt his or her application to run in the environment provided at each grid site, and they require the grid site system administrators to install software to support specific user groups or to maintain a common software stack. Users are constrained to specific and possibly outdated application support libraries, and administrators face the challenge of trying to support multiple user groups with potentially conflicting software requirements. Software homogeneity across grid sites, while a desirable property, is essentially impossible due to hardware constraints, site policies, and cost constraints. [47] As a result, users encounter different software stacks on different grid sites, and they must ensure that jobs are submitted to grid sites containing the libraries necessary for job execution.

Requiring users to target specific sites when submitting jobs to a grid system results in a less transparent [53] system than would otherwise be possible with an autonomic system that selects target sites for jobs automatically. Although metascheduling systems [10, 139] can perform automatic site selection, these grid-level schedulers still require the user to specify extended information about the nature of the job and its requirements. As is the case with any scheduling system that matches requests to providers based on explicit information provided by either party, the omission of important information could result in failure to match a job to a target site. Alternatively, the job could be matched to a site that does not have the necessary software environment for actual job execution, since the matching requirements could be erroneous or absent.

Existing grid systems require each physical site to maintain application support libraries as part of the local software stack. Since the number of applications that grid users desire to execute is always increasing [63], the software stacks available at different sites may differ greatly in terms of library support. When grid systems mandate common software stacks, user applications are limited to programs that can be supported by the common system, which is likely to be outdated [47].

Figure 1.2: In current grid systems, the user queries collective brokering services (1) to obtain an interface to an abstract computational resource (2). The user submits jobs directly to these abstract resources (3), which are actually software layers that communicate with actual physical resources using a set of well-defined network protocols (4). In this example, the Static Application is compiled statically with the Code Library and can run on any of the three sites in the Physical Fabric Layer. However, the Dynamic Application is supported only on Site 3, since the other two sites do not have the Code Library installed.

Consider a dynamically compiled user application that depends upon an external code library. In the simple 3-site grid in figure 1.2, the user application is executable only on Physical Site 3, since the required code library is missing from the other two sites. Assuming the user manually and randomly submits her job to a site without a priori information about installed software at each site, she will have a 33% chance of selecting the site capable of running her application. If she uses a metascheduler instead of submitting her job directly, the metascheduler might select the proper site automatically, provided the metascheduler matching criteria are set correctly in the user’s job description and in each site’s local description. If the system administrator at Site 3 failed to add data indicating the presence of the code library, or the user failed to require the code library in her job submission, then a failure would occur. Alternatively, the user could submit the application along with the code library, statically compiled into a single executable. However, for a large code library shared among several applications, this approach may not be practical due to the amount of data that would have to be transferred between sites.
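The failure mode described above can be illustrated with a toy matchmaking sketch. The catalog layout and attribute names are invented for illustration and do not correspond to any particular metascheduler; the point is only that a job whose requirements reference a library that no site advertises cannot be matched, even when the library is physically installed somewhere.

```python
# Toy matchmaking sketch (invented catalog format, not a real metascheduler API):
# a job is routed only to sites that advertise every library it requires, so a
# missing catalog entry at Site 3 makes the job unmatchable even though the
# library is actually installed there.

site_catalog = {
    "site1": {"installed_libraries": set()},
    "site2": {"installed_libraries": set()},
    "site3": {"installed_libraries": set()},   # admin forgot to list "code_library"
}

job_requirements = {"required_libraries": {"code_library"}}

def matching_sites(catalog: dict, requirements: dict) -> list:
    """Return the sites whose advertised libraries satisfy the job's requirements."""
    return [name for name, attrs in catalog.items()
            if requirements["required_libraries"] <= attrs["installed_libraries"]]

print(matching_sites(site_catalog, job_requirements))   # [] -> the job cannot be placed
```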

A solution to both site heterogeneity and outdated per-site software on the grid is to provide virtual environments containing the software necessary to support the desired applications. Virtualization of grid systems has been proposed as a mechanism for providing custom environments to grid users, without imposing undue burdens on system administrators [57]. However, virtualized grid systems to date have taken the approach that new middleware should be developed for the leasing of physical resources on which to run virtual containers, since the existing middleware systems do not have native support for starting virtual machines (figure 1.3). The most widely published systems – Virtual Workspaces [94], In-VIGO [7], and Shirako [83] – all require the addition of system-specific middleware at both the execution and submission endpoints to create leasing architectures (figure 1.4). In some cases, these systems require replacement of entire middleware stacks. With such requirements imposed on both the hosting site and the user, these lease-oriented systems are not transparent and cannot be deployed in a non-disruptive fashion. Moreover, the user is responsible for instantiating his or her desired environment manually, which does not address the usability issue. It is perhaps for these reasons that leasing systems have not seen widespread use.

Figure 1.3: In a virtualized grid, the user performs job submission (steps 1, 2, and 3) using exactly the same procedures used in the current grid architecture (figure 1.2). However, the applications are executed in virtual environments hosted by the physical sites, instead of directly on physical sites. An issue with this architecture, as illustrated in step 4, is the method by which the virtual machines are started on the physical systems.

Figure 1.4: In a leasing architecture, the user queries the collective brokering services (steps 1 and 2) to locate a resource that supports virtualization. The user initiates a lease through middleware, which starts a virtual machine (step 3). Once the virtual machine has booted, the user interacts with it directly, running the application in a manner similar to execution directly on a local system (step 4). Note that the grid middleware is bypassed in the last step, violating the transparency principle from [53].

If each user were to create a custom virtual environment – on the order of 10 GB – for each application, and there were 1,000 replicas of a single application running on 1,000 different systems, a total of 10 TB of data would have to be transferred to instantiate the environment across all virtual machine hosts. Since transferring even one 10 GB virtual machine image over the Internet on a frequent basis would be impractical, it would be desirable for virtual environments to be created and administered by groups of users with similar interests, known as Virtual Organizations (VOs). Jobs submitted by VO-affiliated users could be scheduled to execute in the virtual environments provided by the VO. By utilizing this model, the example user’s dynamic application becomes executable on all sites, since actual job execution occurs within a virtual machine instantiated on the site. The virtual containers in this case are independent of the underlying hardware and effectively isolate end user jobs from physical systems [57]. Moreover, moving the application layer software libraries into the virtual systems eases administration for all groups on the grid, since a division of labor is created between the virtual system administrators and the physical site administrators [69]. By requiring that all virtual machine instances be spawned from copies of the same VM image, management of the virtual environments is greatly simplified. Unfortunately, the addition of virtualization alone is not sufficient to solve the problem of grid system usability, since the virtual environments must be scheduled and run on the physical systems.
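The 10 TB figure follows from simple arithmetic, repeated below together with an illustrative transfer-time estimate; the assumed 100 Mb/s sustained wide-area rate is a hypothetical value chosen only to show the scale of the problem.

```python
# Reproduces the arithmetic above and adds an illustrative transfer-time
# estimate. The 100 Mb/s wide-area bandwidth is an assumption for illustration.

image_size_gb = 10           # per-application virtual machine image
replicas = 1_000             # one image copy per host running the application

total_tb = image_size_gb * replicas / 1_000
print(f"total data staged: {total_tb:.0f} TB")          # 10 TB

wan_bandwidth_mbps = 100     # assumed sustained wide-area rate
seconds = (image_size_gb * 8_000) / wan_bandwidth_mbps  # one 10 GB image
print(f"one image at {wan_bandwidth_mbps} Mb/s: {seconds / 60:.0f} minutes")
```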

Improving ease of use for the end user necessitates the implementation of an autonomic [96] middleware system to manage the virtual machine instances in which user jobs are executed. Since these instances are spawned from the same virtual machine image, they are effectively homogeneous from a software compatibility perspective. By adding an overlay network to the set of virtual environments, a virtual cluster system is created. Private overlay scheduling is used to execute jobs submitted to a dedicated server that functions as a head node for the virtual cluster. Since the set of virtual resources is dedicated to, and managed by, a single Virtual Organization, this cluster is called a Virtual Organization Cluster (VOC).

Virtual Organization Clusters improve the usability of the grid by providing a dedicated site for the Virtual Organization that supplies the VOC. As illustrated in figure 1.5, the user submits her application through the grid middleware, targeting the dedicated server provided by the VO. The VO Server functions as a head node for the VOC and schedules the user’s jobs using an overlay scheduler and a private overlay network that connects the head node to the compute nodes. Since the VO Server manages the virtual machine instances, the user is liberated from the task of managing computational resources. Moreover, since the user always submits her jobs to the dedicated site provided by the VO with which she is affiliated, she is no longer required to select from a set of sites each time she submits a job. Instead, the VOC middleware handles site selection by starting virtual machines on grid sites and automatically routing jobs to the dynamically instantiated environments.

Figure 1.5: Virtual Overlay Cluster on a Grid. In this use case, the user queries collective brokering services to obtain a reference to the dedicated VO server, to which she submits her job. The VO Server autonomically starts virtual machines using pilot jobs [143] on the grid, then schedules and executes the user’s job through an overlay network with an overlay scheduler. Since the VM image contains the code library, the dynamic application is supported.

In addition, since the VO represents the interests of the user, it should be easier for her to request installation of specific code libraries and tools required to run her application.

Two other important issues are illustrated in figure 1.5: homogeneity of the execution environments and interoperability with existing middleware. Since each VM instance is spawned from the same VM image, all compute nodes are homogeneous from a software compatibility perspective. These VM instances are started through the use of pilot jobs, which effectively reserve a physical grid resource and then start another process on that resource by means of a request to an external server [143]. Pilot jobs are managed autonomically by the VOC middleware installed on the dedicated VO server system, and they are submitted through the existing middleware already in use on the grid, such as Globus [64]. Each physical site simply exposes a virtualization interface to the pilot job, which permits VM instances to be started or stopped. VM instances remain associated with their respective pilot jobs, permitting local site utilization policies to be enforced against the pilot jobs in the same way they are enforced against any other job. If a pilot job is killed by the site, its associated VM instance is terminated immediately.
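A simplified sketch of the pull model followed by such a pilot job appears below. The server URL, file names, and virtual machine invocation are placeholders rather than the prototype implementation described in chapter 7; the sketch only illustrates how the workload is retrieved at execution time and how the VM is terminated along with its pilot.

```python
# Simplified pilot-job sketch (placeholder URLs and commands, not the actual
# prototype): the pilot runs as an ordinary batch job on the physical site,
# pulls its real workload from a VO-operated server, and tears the VM down if
# the site kills the pilot.

import signal
import subprocess
import sys
import urllib.request

VO_SERVER = "http://vo-server.example.org/voc/worker-image-url"  # hypothetical

def main() -> int:
    # "Pull" step: ask the VO server what this slot should run.
    image_url = urllib.request.urlopen(VO_SERVER, timeout=30).read().decode().strip()
    urllib.request.urlretrieve(image_url, "worker.img")

    # Start the VOC compute node; the kvm invocation is illustrative only.
    vm = subprocess.Popen(["kvm", "-hda", "worker.img", "-m", "512", "-snapshot"])

    # If the site preempts the pilot, terminate the VM along with it.
    signal.signal(signal.SIGTERM, lambda *_: (vm.terminate(), sys.exit(143)))

    return vm.wait()

if __name__ == "__main__":
    sys.exit(main())
```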

Support for VOC implementations requires the addition of a virtualization interface on physical grid sites, as well as the creation of a dedicated VO server system with VOC management middleware installed. Importantly, no changes are required to the submission endpoint middleware installed by end users: users submit their grid jobs to VOCs in exactly the same way they submit grid jobs to physical sites. All required changes to the system are transparent to the users. Furthermore, the addition of a virtualization interface to one site does not affect the operation of other sites on the grid. Nor does the addition of a dedicated VO server for VOCs affect any other VOs on the grid.

Thus, VOCs may be deployed incrementally, without disrupting the overall operation of existing production grids. Entities that choose not to support VOCs may continue to operate and coexist on the grid.

Finally, Virtual Organization Clusters are designed to be technology-agnostic, permitting the use of different virtualization technologies, networking systems, and computational grids. Since these technologies all change rapidly, VOCs are defined as an architecture, not as a specific implementation. The requirements and properties of all Virtual Organization Clusters are specified by the Virtual Organization Cluster Model (VOC Model).

1.3 Thesis Statement

Virtual Organization Clusters are dynamically self-provisioned virtual software environments that can improve usability of grid systems by enabling computational environments to be matched to the needs of the user, instead of compelling the user to adapt his or her application to the software available on the grid. The addition of VOCs to existing production grids does not necessarily reduce aggregate performance, since the overhead introduced by virtualization systems can be offset by a reduction in average job waiting time. VOC implementations can load balance user jobs across a greater number of CPU cores than existing physical grid systems, since the overlay environments can make each CPU core compatible with each user job, eliminating the need for users to target specific grid sites to ensure job compatibility. Unlike leasing systems, users are not tasked with managing virtual machines themselves. Instead, autonomic middleware adjusts resource allocations in response to changing demands, allowing the user to focus on his or her particular domain application.

Although this document divides the theoretical framework and empirical results into separate chapters, the actual research process used to create this dissertation was not an empirical validation of existing theory. Instead, this process utilized an exploratory research model [2] to develop a novel architecture for virtualizing grid systems by way of a “middle-out” design. Unlike related grid virtualization projects (section 4.1), the objective of the Virtual Organization Cluster Model was to specify properties common to all VOC implementations, as opposed to providing a single implementation framework, in order to reach as broad an audience as possible by providing implementation flexibility. Thus, the primary objectives of the VOC research were to define the model and then demonstrate that VOCs implemented according to the model could be viable additions to existing production grids.

1.4 Dissertation Organization

The remainder of this dissertation is organized as follows. Chapter 2 presents an overview of grid computing systems, followed by a discussion of virtualization systems in chapter 3. These topics provide a framework for the discussion of virtualized grid architectures. Related works are then discussed in chapter 4.

Following the related work, a discussion of the design of Virtual Organization Clusters is presented in chapter 5. The SimVOC simulator, designed to evaluate VOC properties using actual grid trace data, is described in chapter 6. Descriptions of the components of a physical prototype implementation of an Open Science Grid site with associated non-participating VOCs are provided in chapter 7. Empirical test results are presented in chapters 8 and 9, using the SimVOC simulator and a prototype implementation, respectively. Finally, chapter 10 concludes the main portion of the dissertation.

Appendices to the dissertation describe the software and data sets that accompany the published version of the document. Due to the considerable length of both the software code listings and software documentation provided, these resources are provided electronically. Appendix A lists the electronic attachments that accompany the dissertation. License agreements that apply to these attachments are included in appendix B.

Chapter 2

Grid Systems

Grid systems, such as the Open Science Grid (OSG) [125], enable different entities to share computational resources using a flexible and secure framework. These resources enable users to run computational jobs that exceed the capabilities of the systems available at any single location. To mitigate compatibility issues that result from resource heterogeneity, computational grids could be designed as Service Oriented Architectures, in which grid-enabled applications do not interact with low-level systems and resources directly. Instead, these applications communicate with abstract service libraries, which are built on top of standard network communications protocols such as TCP and IP. The services layer acts as a high-level operating system for the underlying physical resources, providing both abstraction and arbitration functionality. Since co-scheduling physical resources across sites is a complex task, such grid services are well-adapted for High Throughput Computing (HTC) applications, which tend to be compute-bound and do not require completion deadlines. [63] Bag-of-tasks applications, in which multiple parallel tasks are executed independently, are a class of HTC applications especially well-suited to the grid environment [17]. As such, grids do not replace High Performance Computing (HPC) systems and supercomputers [66] but can be composed of such systems.

In this chapter, an overview of grid computing is presented. Section 2.1 describes the high-level architecture of computational grids, after which section 2.2 discusses the components that comprise a grid system. Virtual Organizations, the units around which grid transactions are coordinated, are presented in section 2.3. Workloads submitted by users, in the form of computational jobs, are discussed in section 2.4. Section 2.5 presents a vision of a grid system designed as a Service-Oriented Architecture, where all grid-level operations are effected through services with well-defined APIs. When services are utilized to create autonomically managed, dynamically provisioned execution environments, the resulting systems are examples of cloud computing, as described in section 2.6.

2.1 Computational Grids

Grid computing systems exist to permit different entities to share computational resources in a well-defined and flexible manner. The sharing of computational resources in this context is “more than simply document exchange” [66]: it is the provisioning of entire computing systems so that third parties may run computational tasks. Providing local computing services to external third parties requires a federated trust model, in which parties are organized into representative units known as Virtual Organizations (see section 2.3). Since the task of providing login credentials to each individual user across a large-scale distributed system is tedious, entities participating in the grid define access limits and sharing policies on a per-Virtual Organization (VO) basis. VOs, in turn, are responsible for managing their own user memberships. If a user is trusted by a VO, and the VO is permitted access to a particular resource on the grid, then the user has access to that resource. Resource sharing policies are not static and can change at any time. [66]

Resources on the grid are cataloged by collective brokering services, allowing users to discover the existence of grid sites and services by querying a resource broker. Once a resource has been located, a request to use the resource can be submitted directly to the site or service desired. Tasks on the grid run as they arrive. Due to the complexity of providing advance reservation capabilities, grid resources historically have been made available on a “best-effort” basis. [66] However, recent advances in grid virtualization technology also permit pre-arranged leases on some systems [78, 94].

As depicted in figure 2.1, the services provided by grid system middleware may be decomposed into three sub-layers: collective services, resource management services, and connectivity protocols. Collective services provide resource accounting and brokering, enabling end users to locate interfaces to available resources. Management services perform arbitration among the individual resources, providing role-based access controls. Standard network protocols provide connectivity both between services and to the underlying physical “fabric.” This low-level fabric includes servers, networking interconnects, storage devices, and the software systems that run directly on the hardware. [66, 65, 63]

Figure 2.1: Conceptual layers in a traditional grid system. Applications do not interact directly with hardware in this model but instead utilize resource abstractions, built on top of commodity communications protocols, to provide and access services.

Since the grid is distributed and at scale, a wide variety of systems may be made available by resource providers. The total collection of resources available on the grid at any time tends to be heterogeneous, particularly with respect to software stacks. To overcome issues of compatibility between different systems, grid applications ideally should be designed to utilize services instead of direct operating system resources. Grid resources ideally should be provided as services that are accessed through a narrow and well-defined set of protocols, such that the architecture of the grid can be drawn in the shape of an hourglass, with a large number of applications accessing a wide array of resources through a narrow set of services. [66] In practice, however, many grid applications still require direct operating system support, and most grid resources are exposed to users as computational environments at the operating system level.

Figure 2.2: A cluster system represented as a star network with a head node and multiple compute nodes.

2.2 Composition of Grids

Although the architecture of grid systems is typically conceptualized at a high level (or top-down view), actual construction of grid systems is accomplished by joining component parts (which may be viewed in a bottom-up model). The lowest level of granularity on a distributed system such as a grid is typically the individual system that serves as a compute node. Compute nodes are frequently Commercial Off The Shelf (COTS) systems that are linked together via a Local Area Network (LAN) to form a cluster computing system. A cluster provides its own scheduling services via a dedicated system known as a head node. Head nodes also may provide other services, such as name resolution, address leasing, and centralized management [114]. Conceptually, clusters can be represented as a star network, or as a rooted tree of depth one, as depicted in figure 2.2 [17]. Compute nodes within a cluster are typically uniform with respect to software configuration, varying only by minor system configuration settings such as host name or Internet Protocol (IP) address. These configuration details can be set dynamically at boot time, permitting the compute nodes to be stateless systems and improving ease of maintenance. [122]

By themselves, cluster systems typically have high-speed interconnects between nodes, making them suitable for High Performance Computing (HPC) applications [50]. In a grid system, multiple clusters share resources using Internet connections, which typically have high latency and low bandwidth, reducing suitability for HPC tasks [65]. Hierarchies of cluster systems can be developed, resulting in an arbitrary rooted tree topology for a grid system [17]. However, one simplification that is useful for the discussion of Virtual Organization Clusters is that grid systems may be conceptualized as a star network of clusters, where each cluster is a star network of systems (figure 2.3).

Figure 2.3: A grid system represented as a star network of cluster systems. Each cluster system forms a site on the grid.

Figure 2.4: At a high level of abstraction, a grid system may be viewed as a star network of sites.
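The star-of-stars simplification in figures 2.2 through 2.4 can be captured in a few lines of code. The class and host names below are illustrative only and are not part of any middleware interface.

```python
# Minimal sketch of the star-of-stars simplification in figures 2.2 through 2.4:
# a grid is a collection of sites, and each site is a head node with a set of
# compute nodes. Class names are illustrative, not part of any middleware.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Cluster:
    head_node: str                      # schedules jobs and exposes the CE interface
    compute_nodes: List[str] = field(default_factory=list)

@dataclass
class Grid:
    sites: List[Cluster] = field(default_factory=list)

    def total_cores(self, cores_per_node: int = 1) -> int:
        return sum(len(site.compute_nodes) * cores_per_node for site in self.sites)

grid = Grid(sites=[
    Cluster("ce01.example.org", [f"node{i:02d}" for i in range(16)]),
    Cluster("ce02.example.org", [f"node{i:02d}" for i in range(8)]),
])
print(grid.total_cores())   # 24
```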

Each cluster within the grid system is known as a site, while the interface between the site and the grid itself is often called a Computing Element (CE)¹. Special software installed on the head node of each cluster is used to connect the cluster to the grid.

From the user’s perspective, grid systems are accessed through middleware, or software that implements resource discovery, access, and communications protocols. Individual compute nodes are not directly visible to the user, since each grid site will implement its own scheduling and usage policies that abstract away the low-level details of the physical implementation. This abstraction is necessary, since each cluster system in the grid must support both its own local users and permitted grid users, whose access is arbitrated by the grid middleware. [66] In this view, shown in figure 2.4, the grid system itself becomes a star network. Unlike the cluster system, however, scheduling tasks are generally performed at the sites, or at the leaves of the tree.

¹ This interface is also called a “Compute Element.”

One property of this cluster-based grid design is that it is difficult, if not impossible, for any one cluster system to contain the necessary software libraries and resources to support the diverse requirements of all VOs on the grid [57]. Enforcing a single software stack on each grid site is impractical in terms of management and tends to result in outdated library and application installations [47]. It is therefore not guaranteed that any particular software application will be able to run on any arbitrary grid site. Users can query metadata catalogs, such as Inca [138], to identify compatible sites, provided that site administrators maintain the accuracy of the catalog. However, even with correct metadata available to users, applications that require specific libraries may be limited to a subset of grid resources. For some applications, the size of this subset may be zero.

2.3 Virtual Organizations

Fundamental to the architecture of grid systems is the concept of a Virtual Organization (VO), or a group of users with a common interest who organize themselves over a Wide Area Network (WAN) such as the Internet. VOs on science-oriented grid systems like OSG typically have a science-oriented mission, with areas as diverse as genome mapping, environmental modeling, and nanoelectronics simulation. However, there is no fundamental requirement that VOs consist only of scientists; industry groups working on a new product, artists requiring distributed rendering capabilities, and engineers simulating behaviors of new structures are all examples of groups that could form VOs and utilize grid services. [66] In addition, organizations that provide resources to the grid are also organized into VOs, which try to deliver specific Quality of Service (QoS) targets to consumer VOs. [65] Figure 2.5 depicts several VOs on a single grid.

A distinction between “consumer” and “provider” Virtual Organizations is not always clear.

For the purposes of discussing Virtual Organization Clusters, the provider and consumer designations may be viewed as roles within the grid. It is possible for a group of scientists to form a VO for the purpose of accessing grid resources to conduct scientific simulations, in which case the VO has the role of a consumer. However, the same group of scientists may provide their own resources to the grid for use by other VOs, thereby assuming a provider role. In the case of Virtual Organization Clusters, the VO assumes a consumer role for the purpose of obtaining grid resources, while it functions as a provider of administration services to its own users.

Figure 2.5: Virtual Organizations on the Grid: VOs may represent groups of consumers, groups of resource providers, or groups that both provide and consume resources.

Figure 2.6: Regular jobs consist of a batch-style task submitted by the user to the grid Computing Element (CE) at a grid site. These jobs may request data from a grid Storage Element (SE), but the executable application is packaged with the job itself.

2.4 Computational Jobs

Computational jobs executed on grid systems may be classified either as regular jobs or pilot jobs. Regular jobs, illustrated in figure 2.6, consist of a job description and application executable transmitted together to a grid site. Once the job arrives at the Computing Element interface on the site, it is placed into a scheduler pool for execution when resources become available. At execution time, the job may require data from external sources, which can be transferred directly or via a grid Storage Element. However, the actual application payload is contained within the job itself. Jobs are submitted directly to sites in the general case, although metascheduling of jobs is also possible, in which case jobs are sent to a grid metascheduler, which selects a site to which the job should be submitted. If a metascheduled job fails to execute properly on a site, the metascheduler may re-submit the job to a different site. [10]
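A minimal illustration of a regular job follows, expressed as a Condor-style submit description generated and handed to the local scheduler. The executable and file names are placeholders, and on a production grid the description would arrive through the Computing Element rather than being submitted by hand.

```python
# Minimal illustration of a "regular" job: the description and the executable
# travel to the site together. File names are placeholders; the submit keywords
# are ordinary Condor ones, but this is a sketch rather than a recipe for any
# particular grid site.

import subprocess
import textwrap

submit_description = textwrap.dedent("""\
    universe   = vanilla
    executable = analyze.sh
    arguments  = input.dat
    transfer_input_files = input.dat
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    output = job.out
    error  = job.err
    log    = job.log
    queue
""")

with open("job.submit", "w") as handle:
    handle.write(submit_description)

# Hand the description to the local scheduler; on a grid site this submission
# would arrive through the Computing Element instead of being run by hand.
subprocess.run(["condor_submit", "job.submit"], check=True)
```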

Pilot jobs differ from regular jobs in that the actual application executable is not “pushed” to the grid site with the job description. Instead, a simple executable task is sent to a grid site, and this task borrows the computational resource from the site at execution time. The simple task then contacts a remote server and retrieves (“pulls”) the actual workload to be executed, which may require another network transaction to obtain data (figure 2.7). Most simple tasks submitted as pilot jobs are machine-generated grid jobs produced by Workload Management Systems (WMS) that receive tasks directly from users, “subverting the established accounting and auditing procedures” [143]. Virtual Organizations frequently utilize these workload managers to share resources among individual users, as regular grid middleware only arbitrates resource access among the VOs. WMS mechanisms have become so popular on the Open Science Grid that pilot jobs now account for the majority of grid jobs on OSG. As a result, sites are deploying additional accounting and security mechanisms that are WMS-aware. [143]

Figure 2.7: Pilot jobs are submitted to grid Computing Elements in the same manner as regular jobs. Once scheduled and executed, pilot jobs request one or more executable applications from an external server. These applications are then run on the worker node leased from the grid site by the pilot job, potentially transferring data via a Storage Element.

Although grid jobs resemble the types of task that can be executed on a single cluster by a regular scheduler, such as Condor [155], a fundamental property of grid systems is that the links between grid sites have high latency and low bandwidth, since the commodity Internet is often used for this purpose [65]. Thus, regardless of whether a task starts via a regular job or through a pilot system, significant use of network resources will result in reduced task performance when compared to execution on a cluster system. Bag-of-tasks applications, which consist of multiple tasks that execute independently without communication among themselves, are considered ideal applications for grid environments [17].

Figure 2.8: A grid system is an example of a Service Oriented Architecture, in which the low-level details of the physical fabric are abstracted by common middleware that presents a uniform interface to the applications.

2.5 Service-Oriented Architecture

“A service ... is defined solely by the protocol that it speaks and the behaviors that it implements” [66]. A Service Oriented Architecture (SOA) is a mechanism for abstracting the details of a collection of heterogeneous systems, so that the same interface may be used to access and utilize resources across different domains. SOAs thus function as a type of high-level virtualization by enabling applications to run on disparate platforms without modification. However, instead of using individual operating-system level processes as application containers, SOAs utilize web services as the basic components of the system. Complex systems are constructed by “orchestrating” simpler web services, integrating them into a larger solution. When this model of computation is applied to a grid system (figure 2.8), applications are constructed solely from abstract services, without low-level operating system dependencies. [65]

As originally conceptualized in early models of the grid, the Open Grid Services Architecture (OGSA) provides a framework for service-oriented application development based on the Web Services Description Language (WSDL) and Simple Object Access Protocol (SOAP). The Globus grid middleware is based on these technologies and includes the Grid Resource Allocation and Management (GRAM) and the Grid Security Infrastructure (GSI). GRAM provides the “gatekeeper” mechanism required to implement a grid Computing Element, enabling resources to be connected to the grid system, while GSI supplies authentication services for users. Additional web services have been developed to move the grid into a large SOA-based resource. By implementing scientific applications as sets of Globus-aware services using WSDL, applications theoretically would be able to utilize a wide range of resources available on the grid. Since the services provided on a dynamic grid system may change frequently, such collections of services must be adaptable to the changing environment. Providing Quality of Service guarantees in applications that aggregate services is difficult. [65]

While grid systems constructed from collections of services are achievable, as evidenced by In-VIGO [7], in practice most grid systems utilize service-oriented middleware to provide application routing services to grid users [17]. These services include mechanisms for monitoring sites, submitting computational jobs, checking job status, and transferring data across the grid. A widely used service for job submission is Globus, which enables user jobs to be transferred to grid sites by means of an XML-based web service. Prior to job submission, user credentials are authenticated by the GSI middleware component of Globus, ensuring that users can send jobs only to authorized sites. Once a user job reaches a grid site through Globus, it is received by the local scheduler and queued for execution according to local scheduling policy. [64] Upon execution, the job runs as a regular process at the operating system level. If the job is compatible with the system on which it executes, then the execution should be successful, provided the job is not preempted by the local scheduler. Should the job require a library or other software resource not present on the compute node, then the job will fail. Thus, while the high-level grid may be considered to be an SOA, user applications are still typically process-oriented instead of service-oriented.
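
The submission path just described can be summarized in a short pseudo-workflow (illustrative only; the gsi and gatekeeper handles and their methods are hypothetical placeholders, not actual Globus API calls):

    def submit_grid_job(job_description, gsi, gatekeeper):
        # 1. GSI verifies the user's credentials before anything is sent.
        proxy = gsi.authenticate()
        # 2. The GRAM "gatekeeper" service accepts the job on behalf of the site.
        job_id = gatekeeper.submit(job_description, proxy)
        # 3. The site's local scheduler now queues the job under local policy; it
        #    eventually runs as an ordinary operating-system process and fails at
        #    run time if a required library is missing on the worker node.
        return job_id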

2.6 Cloud Computing Systems

When computational systems are designed to be accessed entirely through a Service Oriented Architecture, the resulting implementations are often marketed as forms of “cloud” computing. Cloud computing is a somewhat nebulously defined concept, the meaning of which may vary in different contexts. One definition of cloud systems follows directly from the decomposition of mainframe systems into data centers constructed of Commercial Off The Shelf (COTS) components, where the computational power of the data center is derived from the “cloud” of disparate systems used to replace the monolithic mainframe. Another variation of the definition follows from the availability of fast Internet connectivity, which permits the data centers to be located in areas of relatively low energy cost, thereby maximizing the amount of computation that can be effected for the same amount of money. Consolidation of the infrastructure at a potentially remote site results in a “cloud” in the sense of an abstraction: computational tasks are sent away to the data center for remote execution. [163]

Another iteration of the cloud computing definition results from the deployment of operating system virtualization technologies (chapter 3). Large data centers are designed with excess capacity for burst availability and redundancy, which results in a large number of spare computing cycles.

Virtualization permits these unused cycles to be harvested to perform useful work, provided that tasks can be matched to free resources by means of a dynamic scheduling system that is adaptable to changing workloads. Thus, the “cloud” becomes the collection of virtual machines, which provides an abstraction for the lowest level of resources available in the system. The autonomic systems that partition the workloads between virtual machines comprise a distributed system that performs the functions of the middleware in a grid architecture. [163]

As illustrated in figure 2.9, excess computational cycles in the data center can be sold to third parties to host virtual machines and associated applications. Large providers, including Google and Amazon.com, offer dedicated application and virtual machine hosting services that reduce wasted energy, since the exact resources required by an application can be provisioned in a virtual environment (see section 3.6.2). In this sense, the “cloud” is a computing model in which all the computational hardware, power distribution, cooling equipment, and maintenance are offered as a demand-based service (Infrastructure as a Service, or IaaS). [163]

Another model of cloud computing, also illustrated in figure 2.9, extends the IaaS concept to provide Software as a Service (SaaS). Since all processing, data storage, and complex software systems are moved to the data center, users may interact with this type of cloud system using a low-power device such as a thin client, netbook, or mobile phone. With resources and data both stored in the cloud, users can access their applications, documents, and other data from any location with Internet connectivity. One popular application of this type of cloud technology is web-based e-mail, which frees the user from maintaining copies of the same e-mail messages on different computer systems. Drawbacks to this model include availability problems resulting from limited network capacity, along with privacy and security concerns that result from the remote storage of personal information. Without enforced open standards for cloud systems, there is also the threat of vendor lock-in, since the remote provider supplies both the computation and data storage services. [163]

Figure 2.9: Cloud computing systems provide computational resources, including processing capabilities, data storage, and system support to users. Users may access these resources using relatively low-power devices, allowing the bulk of energy and infrastructure costs to be centralized in less expensive locations.

For the purpose of this dissertation, cloud computing is defined as computation that occurs within an autonomic private software environment. Autonomic systems manage themselves via software agents to create computational infrastructure that is self-configuring, self-optimizing, self-healing, and self-protecting [96]. Thus, for the purposes of this discussion, cloud systems must adapt to their execution environments automatically, allocating and returning computational resources in response to changing application demands. Private software environments are realized through the use of virtualization systems, which are described in the next chapter.
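
A minimal sketch of such an autonomic control loop is shown below (assumptions: the pool object and probe_demand callable are hypothetical stand-ins for a VM manager and a demand sensor such as a job-queue monitor):

    import time

    def autonomic_loop(pool, probe_demand, poll_interval=60):
        # Continuously match the number of running VMs to observed demand.
        while True:
            demand = probe_demand()      # e.g., jobs currently queued
            capacity = pool.size()       # VMs currently running
            if demand > capacity:
                pool.grow(demand - capacity)     # allocate resources (self-configuring)
            elif demand < capacity:
                pool.shrink(capacity - demand)   # return idle resources (self-optimizing)
            time.sleep(poll_interval)    # re-evaluate as the workload changes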

Chapter 3

Operating System Virtualization

Operating system virtualization enables one operating system to “host” a second “guest” operating system. The guest system is provided virtualized hardware resources, as shown in figure 3.1, so that the guest appears to be indistinguishable from a regular physical system. Special software known as a Virtual Machine Monitor (VMM) or hypervisor is installed on the host computer to provide this illusion. (In this document, the terms VMM and hypervisor are taken to be synonymous and are used interchangeably.) Virtualization systems allow multiple guest operating systems to be run in parallel on the same hardware, permitting legacy applications to be hosted on newer infrastructure. In addition, the ability to control the resources of isolated guests permits untrusted operating systems to be hosted inside a safe environment. Since these virtual environments become independent of the host hardware, virtualization enables scalable application deployment in response to changing resource requirements. [130] These same benefits can be realized on grid systems if virtualization technologies are widely deployed [57].

Virtualization technologies originally were developed by IBM Corporation to multiplex expensive mainframe systems [132]. In 1964, a group of IBM engineers started the design of a research system known as the CP-40, which featured the first implementation of virtualization software. Despite the efforts of some IBM managers to kill the project, the implementation was released to a small group of test customers for use on the IBM System/360 Model 67. Official support by IBM, and widespread availability of the virtualization software, arrived in November 1972 after the System/370 mainframe was released. [160] Although the System/370 was commercially successful, virtualization technology remained limited to mainframe systems until the 1990’s, when VMWare Workstation and Microsoft Virtual PC became available for Intel desktop computers [56].

Figure 3.1: Virtual Machine (guest) hosted by a physical system (host). The guest system is presented with the same, or similar, hardware as the host by means of software emulation or multiplexing. Aside from timing differences, application execution within the guest system is indistinguishable from execution of the same application on the host.

Since the original Intel x86 architecture was not virtualizable [130], dynamic recompilation techniques were required to implement virtualization on the desktop. These dynamic recompilers were not as efficient as their true virtualization counterparts on the mainframe, necessitating fast processors to provide acceptable desktop performance [150]. During the final years of the 20th century, significant clock speed increases occurred in commodity systems. Due to reliability concerns, these systems often each hosted a single application, leading to a large number of wasted clock cycles. Virtualization provided a mechanism for harvesting these cycles to run multiple applications in isolated containers on the same hardware. [132]

At present, desktop virtualization systems can be loosely divided into four classes: operating system containers, dynamic recompilers, paravirtualization systems, and hardware-assisted virtualization extensions. Both dynamic recompilers and hardware-assisted virtualization systems are able to execute unmodified guests that have an Instruction Set Architecture (ISA) supported by the host. These two approaches differ in that certain instructions (see section 3.2) must be trapped and substituted by a dynamic recompiler, while hardware extensions permit these instructions to run directly on the processor, with possible traps to a hypervisor program. Paravirtualization systems remove the need to trap these instructions by requiring that the guest operating system be modified to use the paravirtualization interface instead of the regular ISA. For practical purposes, this modification requirement restricts the set of available guest systems to certain Linux distributions. [14, 56, 150, 159]

Performance of applications executed within virtualized guest systems varies by hypervisor and workload. To measure the overheads introduced by a particular hypervisor for a particular type of workload, benchmarking tests are required. These tests use an application, such as High-Performance Linpack (HPL), to measure the native execution speed of an application running directly on the host system (without virtualization). The same application is then executed, using the same parameters, in the guest system. By dividing the difference between the host and guest benchmark results by the host metric, a relative overhead of virtualization may be obtained for the hypervisor and workload combination. [55, 54] These metrics are required to evaluate specific implementations of Virtual Organization Clusters.
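
Expressed as a formula (with P denoting the benchmark metric, for example HPL throughput in GFLOPS, measured identically on the host and in the guest), the overhead obtained in this way is

    \[ \text{overhead} = \frac{P_{\mathrm{host}} - P_{\mathrm{guest}}}{P_{\mathrm{host}}} \]

so that, for instance, a guest delivering 93.4% of the native HPL throughput corresponds to a virtualization overhead of 6.6%.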

The remainder of this chapter is organized as follows. Section 3.1 provides an overview of the relationships between the different levels of virtualization discussed in this document. Following this discussion, section 3.2 describes the Popek and Goldberg requirements for efficiently virtualizable Instruction Set Architectures. Three techniques of providing virtualization support on the x86 and x86_64 architectures are then discussed: dynamic recompilation (section 3.3), paravirtualization (section 3.4), and hardware virtual machine extensions (section 3.5). Finally, section 3.6 presents several enterprise and cloud computing applications of virtualization technology.

3.1 Levels of Virtualization

Virtualization systems allow architecture-compatible software systems to be decoupled from the underlying physical hardware implementation, thereby allowing computation to be location independent. [56] These systems can be loosely divided into four levels: operating system containers, dynamic recompilers, paravirtualization, and hardware-assisted virtualization. The latter three systems can be termed “full virtualization systems,” since they permit entire operating systems, including the kernel, to be hosted as guests. Since Virtual Organization Clusters are designed to contain entire environments, including private kernel and networking services, full virtualization systems are preferred. However, the different types of systems have different general properties, as illustrated in table 3.1.

Table 3.1: Levels of Virtualization. Different virtualization systems have varying support for separate guest kernels, require (or do not require) modifications to the guest system, isolate the guest system in different ways, and satisfy or do not satisfy the Popek and Goldberg efficiency requirements.

Level                               Guest Kernel   Guest Modifications   Isolation   Efficient
Operating System Container          Shared         Not Applicable        Userspace   Yes
Dynamic Recompilation               Separate       Unnecessary           Kernel      No
Paravirtualization                  Separate       Required              Kernel      Yes
Hardware-Assisted Virtualization    Separate       Unnecessary           Kernel      Yes

Containers isolate collections of guest processes that execute directly under the host system kernel with the same restrictions as any other process on the host. Example container systems include OpenVZ, VServer, Virtuozzo, XR Enterprise, chroot jails, and Solaris Zones. [104, 161]

Container-based virtualization is subject to a phenomenon known as “QoS crosstalk,” which results from imprecise kernel resource accounting when several containers are running on one host. As a result, it is difficult to control guest resource allocations properly with containers. [14] Moreover, the requirement that all guest processes run under a specific kernel limits the utility of containers as a virtualization abstraction on grid systems.

Of the full virtualization systems listed in table 3.1, hardware-assisted virtualization provides the greatest potential for use in grid environments, since entire guest systems can be run efficiently and without modification. Dynamic recompilation systems enable unmodified guests to execute, but efficiency is compromised by the need to substitute instructions at execution time. Paravirtualization systems are more efficient than dynamic recompilers, but modifications to the guest operating system are required. Since Virtual Organization Clusters are to remain technologically agnostic, support for guests that cannot readily be modified (such as proprietary operating systems) must be provided.

3.2 Instruction Set Architecture Requirements

A Virtual Machine, as defined in Popek and Goldberg, is “an efficient, isolated duplicate of the real machine” [124]. VMs are efficient provided that a “statistically dominant subset of the virtual processor’s instructions ... [are] executed directly by the real processor, with no software intervention by the VMM” [124]. Execution of an application within a VM must be indistinguishable from execution of the same application on the same host, except for differences in resource availability and timing. The VMM maintains control of the resources allocated to the VM, but it does not intercept or interpret every instruction executed by the guest: most guest instructions are passed directly to the host CPU. In addition, the VMM isolates guests from the host, preventing guests from changing host system state. A computer (or Instruction Set Architecture) for which an efficient VMM can be constructed is called “virtualizable.” [124]

Figure 3.2: According to the Popek and Goldberg virtualization requirements, a machine is virtualizable if its sensitive instructions are a subset of its privileged instructions, as shown in the Venn diagram.

Instructions provided by an ISA may be classified as privileged or unprivileged, where privileged instructions require the CPU to be in a supervisory or elevated privilege mode. If an attempt is made to execute a privileged instruction in an unprivileged mode, a trap will occur, permitting the operating system (or VMM) to intercept and handle the instruction. Instructions may also be classified as sensitive or non-sensitive. Sensitive instructions are further subdivided into behavior-sensitive instructions, which depend upon a physical resource setting, and control-sensitive instructions, which change resource allocations. Since the guest VM runs in unprivileged mode, and VMs should not change the state of the host system, the VMM must be invoked by the privilege trap to interpret privileged guest instructions. An ISA is virtualizable if the set of sensitive instructions in the ISA is a subset of the privileged instructions in the same ISA, as illustrated in the Venn diagram in figure 3.2. Furthermore, an ISA is recursively virtualizable if it is virtualizable and a VMM without timing dependencies can be constructed for it. [124]
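
In set notation, with S denoting the sensitive instructions of an ISA and P its privileged instructions, the condition described above can be written as

    \[ S \subseteq P \quad \Longrightarrow \quad \text{the ISA is virtualizable (an efficient VMM can be constructed).} \]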

The Intel x86 ISA, as implemented on the original Pentium processor family, is not virtualizable, due to the presence of 17 sensitive instructions that are not also privileged instructions [130]. If a VMM as described in Popek and Goldberg is constructed for this family of processors, the CPU will not trap when unprivileged sensitive instructions are executed, preventing the VMM from delivering correct state information to guests. [132] Thus, implementation of virtualization on the original x86 ISA requires either the construction of a less-than-efficient dynamic recompilation system (next section) or the modification of guests to run in a restricted subset of the ISA (paravirtualization, as described in section 3.4). Subsequent extensions to the original x86 architecture permit some newer x86 and x86_64 systems to provide a virtualizable ISA; these extensions are discussed in section 3.5.

Figure 3.3: Virtual machine architectures based on dynamic recompilation employ a hypervisor application that runs as a process in the host operating system. Guest virtual machines are executed within this process, which emulates the computer hardware seen by the guest. Since a hypervisor process runs for each hosted virtual machine, this virtualization architecture is sometimes called a “Type II” system.

3.3 Dynamic Recompilation

In dynamic recompilation systems, illustrated in figure 3.3, the VMM is installed as a service inside the host operating system, and the guests run as regular user-space processes on the host, using special kernel-level drivers to intercept and dynamically recompile privileged instructions. The VMM emulates hardware including video cards, disk devices, and networking interfaces. By emulating common and well-supported hardware, such VMMs can support a wide range of unmodified guest operating systems, at a cost of reduced performance. [56, 150] A VMM (also known as a hypervisor) process runs for each hosted virtual machine, creating a setup known (controversially) as a Type II hypervisor. [130]

Dynamic recompilation works by replacing both sensitive and privileged guest code with equivalent, unprivileged host code that calls an interface provided by the VMM. This code replacement occurs the first time a block of guest code is encountered, at which time the translated block of equivalent code is stored in a data structure known as a trace cache. Future execution of the same block of code results in a trace cache hit, substantially improving performance by amortizing the cost of re-compilation over all the calls to the same code block. Translated code is designed to be as efficient as possible: calls into the VMM are executed directly, without the overhead of traps and associated context switches. [132] All guest state is maintained in the unprivileged code of the VMM, bypassing the processor’s native state and privilege enforcement. [159]
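
The translate-and-cache behaviour can be sketched as follows (a deliberately simplified illustration: real binary translators such as VMWare's operate on machine-code basic blocks, and the translate and run_on_host callables are hypothetical placeholders):

    def make_block_executor(translate, run_on_host):
        trace_cache = {}   # maps a guest block address to its translated host code

        def execute_block(block_addr, guest_code):
            translated = trace_cache.get(block_addr)
            if translated is None:
                # First encounter: rewrite sensitive/privileged instructions as
                # unprivileged calls into the VMM, then remember the result.
                translated = translate(guest_code)
                trace_cache[block_addr] = translated
            # Later executions hit the cache, amortizing the translation cost
            # over every subsequent run of the same block.
            return run_on_host(translated)

        return execute_block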

Virtualized devices are provided to guest systems via a hosted architecture, in which emulated implementations of common Input/Output (I/O) devices are supplied. The VMM must intercept and handle all guest I/O calls in the host operating system, resulting in substantial overheads. As an example, the VMWare Workstation virtualization system requires two context switches (one in the host and one in the guest) for each network packet sent or received. Moreover, receiving a packet into the VMM requires polling the network interface with a select statement, since packets destined for each guest must be dispatched separately from those destined for other guests or for the host. VMWare Workstation thus incurs substantially greater overhead when receiving packets than when transmitting packets, which can result in an 80% performance degradation for network-bound tasks when a large number of VMs are running. In contrast, when specific hardware is installed on the host system, VMWare ESX server is able to bypass host I/O emulation for the network device, greatly improving performance. [150]

3.4 Paravirtualization

Paravirtualization enables virtualization systems to be supported on non-virtualizable platforms, such as the Intel x86 architecture, by exposing an alternate virtualizable interface to guests. Unprivileged sensitive instructions are removed from the guest and replaced with instructions provided by the hypervisor, creating an overlay architecture that satisfies the Popek and Goldberg virtualization requirements. These systems are typically (but not always) implemented as Type I hypervisors, or virtualization layers installed directly on the computational hardware (figure 3.4). All user-facing systems are then implemented as guests, where one or more guests have the ability to manage other guests. [130, 132, 161]

Figure 3.4: Type I paravirtualization systems use a single hypervisor layer installed directly on the hardware. Guest systems are modified to use the hypervisor interface.

One popular paravirtualization platform is Xen, which is engineered to support 100 guests executing on a single host. Xen has significant performance advantages when compared to dynamic recompilers and fault-dependent VMMs, due primarily to the direct sharing of hardware I/O devices. For efficient use of the hardware Translation Look-aside Buffer (TLB), the Xen hypervisor directly manages hardware page tables and associated page faults, requiring its implementation as a Type I hypervisor system. As such, Xen replaces the host operating system with a relatively lightweight hypervisor layer, which is managed by a special privileged guest called Domain0 (dom0) that persists as long as the host is running. [14]

Xen implements a virtualizable system with user and supervisor privileges by porting privileged guest code to run in ring 1 on the x86 CPU. [14] Intel x86 and compatible platforms utilize four “rings,” or CPU privilege levels, numbered 0 to 3. With the exception of IBM OS/2, recent desktop operating systems utilize only rings 0 and 3 for privileged and unprivileged instructions, respectively. By utilizing one of the unused privilege levels, Xen partly relies upon the built-in CPU privilege checking to enforce guest security, avoiding the need to monitor guest instructions in user-level code (ring 3), a situation known to Intel engineers as ring compression. This design improves performance when compared to dynamic recompilation systems. [159]

Host hardware is typically shared directly in Xen, although emulated devices are available via code borrowed from the Qemu project. Only the hardware actually required by the host OS is virtualized in Xen. [161] Shared hardware is made available to Xen guests via an I/O interface that uses Direct Memory Access (DMA) for performance, with request security checking logic to enforce isolation. I/O requests use descriptor rings (circular buffers) that permit out-of-order execution and host-guest communications without copying. Network packets are delivered to guests via these same DMA buffers, which are allocated by the guest and provided to the host whenever a receive operation is requested. Memory for guest systems is statically provisioned at guest initiation but can be increased to a specified limit during execution. [14, 67]
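
The descriptor-ring idea can be illustrated with a small sketch (illustrative only; Xen's actual rings are shared-memory structures with paired producer and consumer indices, and the descriptors reference DMA buffers rather than Python objects):

    class DescriptorRing:
        # A fixed-size circular buffer of I/O request descriptors.
        def __init__(self, size=32):
            self.slots = [None] * size
            self.req_prod = 0   # advanced by the guest as it posts requests
            self.rsp_cons = 0   # advanced by the guest as it consumes responses

        def post_request(self, descriptor):
            # The guest publishes a descriptor (e.g., the address of a DMA buffer)
            # rather than copying the payload itself.
            self.slots[self.req_prod % len(self.slots)] = descriptor
            self.req_prod += 1

        def collect_response(self):
            # The backend may complete requests out of order; completed entries
            # are read back from the ring without any data copy.
            response = self.slots[self.rsp_cons % len(self.slots)]
            self.rsp_cons += 1
            return response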

Aside from Xen, several other paravirtualization systems exist, including Disco for the MIPS platform [132] and User Mode Linux (UML) for the x86 platform. User-Mode Linux is a paravirtualization system that utilizes a host Linux kernel to run a guest Linux kernel directly [14]. Since the host installation includes a full operating system, the architecture of UML is closer to that of a Type II hypervisor than a Type I hypervisor.

Measured performance of paravirtualization systems typically exceeds that of dynamic recompilation systems. While CPU and memory management tend to scale well with increasing loads on most VMMs, paravirtualization is typically advantageous when workloads involve a significant amount of I/O [126]. The Xen hypervisor has execution overheads as low as 3% for some workloads and network throughput overheads as low as 14% for some packet streams [14]. Tests of virtualization systems in grid contexts yield execution overheads as low as 5% [61]. HPL tests conducted with scaling show application throughput overheads of up to 30% with Xen for a 7x4 matrix. The same overheads for a CPU-bound 1x1 matrix on a single VM are measured at 6.6% with Xen. [55][54] Un-scaled HPL tests with User Mode Linux (UML) show a virtualization overhead of 20% in terms of application throughput [133].

The most significant disadvantage to paravirtualization systems is the requirement that guest operating systems be ported to run on the hypervisor. Backward-compatibility of system-level code is a concern during code modification, and vendor cooperation is required for proprietary operating systems. [132] While paravirtualization systems such as Xen do not require modification to applications running within the guest [14, 132], required modifications to the guest can be significant. Linux guests require approximately 3,000 lines of code changes, while Windows XP guests require a substantially larger number of adjustments [14]. For these reasons, the long-term suitability of paravirtualization systems for production applications is questionable, since “adequate hardware support can decrease overhead ... to the point that the value of having a fully compatible virtual machine abstraction overrides any performance benefits from breaking compatibility” [132].

3.5 Hardware Virtual Machine Extensions

Hardware Virtual Machine (HVM) extensions, available on many newer Intel and AMD processors, implement an extended x86/x86_64 instruction set that is virtualizable according to the Popek and Goldberg requirements. The Intel Virtual Machine Extensions (VMX), first released in November 2005 on the Intel Pentium 4 models 662 and 672 [81], add a top-level set of execution modes to the CPU: host and guest. Within each mode, all privileged instructions run in privileged rings (usually ring 0), while unprivileged instructions run in ring 3, eliminating ring compression. Whenever the CPU needs to be switched between guests, or back to the host, a mode switch operation is invoked by VM entry and exit instructions. These instructions save and restore state information in the VM Control Structure (VMCS), allowing state information for the host and all guests to be maintained by the hardware. Interrupt masking and other common operating system support routines are implemented directly in the CPU, thus requiring mode switches back to the hypervisor only for operations that cannot be handled directly by the CPU. [159] On CPUs with the first generation of virtualization extensions, mode switching can be more expensive than dynamic recompilation for some workloads [9].

In this hardware-assisted model, the user-space component resembles a Type II VMM and emulates certain hardware devices available to the guests by means of a VMM process (figure 3.5). [159] This device emulation has the same benefits of device uniformity available with dynamic recompilation, with the same drawbacks in I/O performance. Some direct device sharing is possible with code borrowed from paravirtualization systems, although this sharing is easier to accomplish on channel-oriented devices such as USB and SCSI hardware. As originally implemented in the first generation of virtualization extensions, shadow page tables are still implemented in software, which reduces total memory performance. [132] Newer AMD and Intel CPUs implement hardware page table virtualization to mitigate this problem [76]. Additional I/O performance improvements could be obtained if hardware devices were designed to support direct access from guest systems [132], a feature provided by some high-end network cards [106].

Figure 3.5: Full virtualization with hardware support. A hypervisor (VMM) process runs to manage each guest and provide emulated hardware, but CPU virtualization extensions manage the majority of guest instruction handling and sensitive instruction management.

HVM capabilities have been added as an optional virtualization mode to Xen, allowing it to run unmodified guests. However, a newer application called the Kernel-based Virtual Machine (KVM) system relies on HVM extensions exclusively, eliminating dynamic recompilation and paravirtualization for features other than I/O. When requested, emulated hardware devices are provided by Qemu, one instance of which runs for each virtual machine hosted. Paravirtualized I/O devices, including a network interface, are supported for Linux guests. [129]

A major benefit of KVM for use in grid computing is its “snapshot” mode, inherited from Qemu. In this mode, KVM can spawn multiple instances of a virtual machine using a single image file, eliminating the need to have one image file per instance. [129] For cluster computing systems, this operating mode eliminates image staging to compute nodes (figure 3.6), a time-consuming operation [51]. With snapshot mode enabled, multiple VM instances can be booted from a single shared filesystem (such as PVFS [28]), as illustrated in figure 3.7. The shared image is maintained in a read-only state, with write operations redirected to temporary files on each compute node.
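
This behaviour can be exercised with QEMU's -snapshot option. The sketch below is only illustrative: the image path, memory size, and instance count are assumptions, and the binary name or KVM-acceleration flag appropriate to the local QEMU build may differ.

    import subprocess

    SHARED_IMAGE = "/mnt/pvfs/voc-compute-node.img"   # assumed path on a shared filesystem

    def boot_snapshot_instance(memory_mb=512):
        # -snapshot keeps the backing image read-only and redirects writes to a
        # temporary file, so many instances can share one image without staging.
        cmd = ["qemu-system-x86_64",
               "-hda", SHARED_IMAGE,
               "-m", str(memory_mb),
               "-snapshot"]
        return subprocess.Popen(cmd)

    # For example, boot four worker instances on one physical compute node:
    instances = [boot_snapshot_instance() for _ in range(4)]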

The virtualization performance of KVM on hardware with first-generation virtualization extensions is acceptable, although it is below that of Xen with paravirtualization. Scaled HPL benchmark tests indicate network-bound overheads of up to 85% using a 7x4 matrix, compared to 30% with Xen. Virtualization overheads for compute-bound processes, using a 1x1 matrix on a single VM, are measured at 8.8% for KVM – slightly higher than the corresponding 6.6% Xen measurement. It should be noted that a stable version of Xen is compared to an alpha version of KVM in these tests. [55][54]

Figure 3.6: Virtual machine images used in read-write mode generally must be copied (staged) to each host prior to VM instance invocation.

Figure 3.7: In KVM snapshot mode, a single VM image can be stored on a distributed filesystem and booted directly by each VMM, without staging. The shared image is maintained in a read-only state, with writes redirected to a temporary file on each local compute node.

Table 3.2: Some common virtualization systems supported by Linux hosts. Table adapted from [14, 105, 129].

Virtualization System   Type
VMWare                  Dynamic Recompilation / HVM
Xen                     Paravirtualization / HVM
KVM                     HVM
Qemu                    Dynamic Recompilation
VirtualBox              Dynamic Recompilation / HVM
UML                     Paravirtualization
OpenVZ                  Container
VServer                 Container

3.6 Applications of Virtualization Technology

Virtualization technology enables the creation of customized environments for a wide range of applications, including server consolidation [56], migratable containers [132], local containers for experimental test procedures [141], grid-connected systems [57], and cloud computing implementations [132, 163]. A number of virtualization systems are available to support these applications on x86 and x86_64 Linux hosts, as listed in table 3.2. In the remainder of this chapter, two major classes of virtualization applications will be discussed briefly as examples: server consolidation (section 3.6.1) and cloud computing (section 3.6.2).

3.6.1 Server Consolidation

One widely used application of virtualization is server consolidation, which involves the migration of physical servers into virtual machines. Security and reliability are increased when virtualization technology is used, since individual server applications are contained within their own private environments. [56] Since physical machines frequently host single services to mitigate availability problems resulting from an unstable application, virtualization can improve system utilization by allowing a single physical machine to host multiple highly available services. If one service becomes unstable and results in a system-wide error condition, that error will remain contained within the virtual machine hosting the service. Other services will be unaffected by such an issue. Furthermore, the hardware independence of virtual machines permits containers to be migrated from system to system within the data center, thus improving system load balancing while decreasing the need for redundant infrastructure. [132]

Server consolidation into single data centers also improves hardware utilization and energy efficiency by reducing the number of wasted cycles. Since physical systems would otherwise host single applications without virtualization, extra cycles on individual systems would be wasted. In addition to the lost amortization of the hardware equipment cost, these wasted cycles also contribute to energy demand both directly and through increased load on cooling infrastructure. Placing virtual servers in such a way as to maximize utilization and minimize waste is thus a means to cost savings. Moreover, when entire server farms can be centralized into data centers and made geographically independent of corporate operations, the data centers can be located in areas with relatively inexpensive electricity. Significant cost savings can be realized in such situations. [163]

3.6.2 Cloud Computing

Server consolidation into data centers reduces, but does not completely eliminate, wasted computational cycles. Enterprise applications require a certain amount of redundancy and burst capacity to maintain high availability, and this capacity is unused most of the time. By utilizing virtualization to provision isolated environments, corporations can sell the excess capacity to third parties. Several large corporations, notably Amazon, take this model further and provide entire virtual machine hosting services for a fee. Companies that do not desire to maintain their own computational resources and infrastructure may simply purchase these services on a fee-for-use model, creating a cloud system (section 2.6). Energy and resource waste is minimized through clouds, since environments can be provisioned to match the requirements of the workload. These virtual environments provide burst capacity through on-demand resource reallocation. [163]

When virtualization systems are used to implement cloud computing services, the services begin to resemble computational grids. Cloud systems enable applications to obtain resources beyond those immediately available, just as grid computing promotes resource sharing for the same purpose.

The distinctions between individual systems in the cloud begin to diminish, as effective use of cloud architectures requires middleware abstractions to dispatch workloads to different servers. By definition, the process of dividing these workloads intelligently – and, preferably, autonomically – is a distributed computing task. [163]

Cloud computing with virtual machines also provides a solution to the multi-core problem. Since a large number of existing production applications are written for a single CPU core, several single-core virtual machines can be hosted by one multi-core physical system, improving utilization while reducing hardware costs. In cases where high speed Internet access is ubiquitous, entire user-facing systems can be placed in the cloud and accessed from any location by means of a low-power device (see figure 2.9). [132] With per-hour execution costs as low as ten cents on the Amazon Elastic Compute Cloud [11], cloud-based Software as a Service (SaaS) may be practical for some applications.

Chapter 4

Related Work

Since Virtual Organization Clusters comprise a new grid architecture, a fairly broad set of research projects is related to VOC design and evaluation. This set can be loosely divided into grid system virtualization (section 4.1), grid simulation (section 4.2), system resource analysis (section 4.3), and system management (section 4.4). Several of these broad areas may be further subdivided into related projects. However, all areas are important to consider when evaluating VOCs in the context of production grid systems.

4.1 Virtualization of Grid Systems

Virtualization of grid systems, first proposed in [57], offers substantial benefits to grid users and system administrators. Users of virtualized systems can be granted administrative access rights to their virtual machines, thereby allowing end-user customization of the software environment to support current and legacy applications. These customization features can be granted without imposing additional workloads on the hardware administrators, making customization on a per-user basis practical. Moreover, isolation of VMs from the underlying hardware can provide additional security when compared to traditional shared multiprogramming systems. Since the hardware administrators retain control of the Virtual Machine Monitors (VMMs) or hypervisors, coarse-grained resource controls can be implemented on a per-VM basis, allowing hardware resources to be shared among different VMs. [57]

A primary benefit of offering administrative access to VM users is that custom overlay systems can be developed, where the environment of each virtual system can be determined by its user base. Such capabilities have existed since the early days of computing, when the RC4000 Monitor system enabled different users to run their own kernels with whichever process scheduling algorithm each user chose [19]. Virtual Organizations on modern grid systems are increasingly interested in performing their own resource management and scheduling functions, so that individual communities have finer-grained control over communal resources [62]. Virtualization of grid systems provides the needed mechanisms for community-specific customization [127].

Grid services, by definition, communicate with each other and with users via middleware, or software layers interposed between physical fabric and applications [65, 66]. Such middleware may include Globus [60], which is part of the standard middleware stack on the Open Science Grid [125], as well as custom middleware used for specific applications like grid filesystem access [173]. Most of the existing grid virtualization systems, described below, require all participating sites and users to install and enable system-specific middleware.

Many existing grid virtualization systems can be classified into lease-oriented systems (section 4.1.1) or autonomic systems (section 4.1.2). Related systems may then be grouped by project.

One unique project that relies more heavily on services and does not fit well into a leasing or autonomic classification is In-VIGO, discussed in section 4.1.3.

4.1.1 Leasing Models

One approach to grid system virtualization is to treat grids as commodity sources of computational resources, which can be obtained to run virtual machines directly. This model of leasing systems, illustrated in figure 4.1, requires the user to make an explicit resource reservation to instantiate a virtual grid node. Since each user manages his or her virtual environment directly, this approach violates the transparency property [53] required of distributed systems. Several leasing models have been proposed for grid computing, including Globus Virtual Workspaces (also known as Globus Nimbus, section 4.1.1.1) and Shirako and its Cluster On Demand back-end (section 4.1.1.2).

4.1.1.1 Virtual Workspaces

Virtual Workspaces, implemented as part of the Globus Nimbus toolkit [77], are an extension of Dynamic Virtual Environments (DVE) – dynamically instantiated execution environments designed to isolate individual grid users. Allocation of a DVE is accomplished in its simplest form by dynamically adding a user account to the local system [93]. Extended DVEs can utilize VMWare virtual machines as locally instantiated trusted containers, which must be trusted to return their allocated resources once limits have been reached. Untrusted users therefore cannot be given administrative access to these VMs. [89] Virtual Workspaces further extend DVEs by using Xen virtual machines to offer customized execution environments to the untrusted users, as Xen permits forcible recovery of resources allocated to the guests [89, 172].

Figure 4.1: In a leasing architecture, the user queries the collective brokering services (steps 1 and 2) to locate a resource that supports virtualization. The user initiates a lease through middleware, which starts a virtual machine (step 3). Once the virtual machine has booted, the user interacts with it directly, running the application in a manner similar to execution directly on a local system (step 4). Note that the grid middleware is bypassed in the last step, violating the transparency principle from [53].

Virtual Workspaces provide a lease-oriented mechanism for sharing resources on grid systems with fine-grained control over resource allocation. A Virtual Workspace (abbreviated VW) is allocated from a description of the hardware and software requirements of an application, allowing the workspace to be instantiated and deployed on a per-application basis. From the perspective of the end user, deployment of a VW first requires the explicit construction of the VW environment as a Xen VM, which is automated through user-provided specifications to a “VW Factory.” Following the construction of the environment, the VW must be explicitly deployed on a host site, after which it can be directly accessed to run computational jobs. [94]
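
The lease-oriented sequence just described can be summarized in a short sketch (the factory, site, and lease objects and their methods are hypothetical placeholders, not the actual Nimbus interfaces):

    def run_job_in_workspace(factory, site, spec, job):
        workspace = factory.build(spec)    # 1. a VW Factory builds the Xen image from a user-provided spec
        lease = site.deploy(workspace)     # 2. the user explicitly deploys the workspace on a host site
        try:
            return lease.run(job)          # 3. the user interacts with the workspace directly to run jobs
        finally:
            lease.terminate()              # 4. leased resources are returned when the workspace is released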

The primary motivation for Virtual Workspaces is to provide high-level, fine-grained resource management to deliver specific Quality of Service (QoS) guarantees to different applications [90].

Resources managed by VWs are provided directly to the end user, who is responsible for initiating his or her own workspace VM for the job or set of jobs he or she desires to run on the grid. These leases allow for individual jobs or applications to be given different QoS priorities, such as best-effort or pre-reserved allocations [148]. Provisioning of workspace services directly to Virtual Organizations, instead of directly to end users, has been discussed only at the level of sharing workspace-hosted edge services [128] between multiple VOs [69].

Aggregation of Virtual Workspaces into Virtual Workspace Clusters has been proposed in [172], [61], and [95]. These clusters are constructed by aggregating sets of homogeneous VW descriptions into heterogeneous outer sets. The outer sets are then matched to physical resources in an aggregated operation, permitting the cluster to be scheduled. [95] In practice, creation of a VW Cluster requires a single VM image per cluster node. While VM images can be created using the standard VW tools, inefficiencies would result if multiple copies of the same image were transferred across the grid to a workspace hosting site. To reduce this wide-area copy overhead, an image staging system at each cluster site to make copies of the image for each hypervisor instance has been proposed [172]. Staging time is related to the size of the virtual cluster and reaches at least 400 seconds for a VW Cluster with 8 virtual nodes, leading to the conclusion that staging is an operation to be measured in “minutes ... rather than seconds” [172]. Using shared nodes on a standard Network File System (NFS) share (without staging) was found to be impractical, owing to the large amount of bidirectional traffic involved with reading from and writing to the Xen images [61]. While the importance of the staging system is expressed in [172], [61], and [95], in-situ updates to the core workspace images are not discussed except to state their necessity [18]. Presumably, any changes to the Virtual Workspace Cluster environment would be effected by transmitting and staging a new image. Also, modifications to the cluster scheduling algorithm have been proposed as a mechanism for scheduling staging as a part of preparing the workspace for execution [148].

Selection and final customization of VM appliances to match the requirements of individual jobs are accomplished via “contextualization,” a process by which job requirements are used to make minor configuration changes, such as network and DNS settings, to an existing appliance [18, 92].

Contextualization is done once per VM invocation, just prior to boot time, and involves injecting configuration data into the VM image prior to initialization of the instance. A centralized “context broker” provides an XML description of the configuration parameters to be set on a per-appliance basis. [91] This contextualization process may occur during workspace scheduling [68].

For performance reasons, workspace jobs should have sufficient run length so as to make the overhead of leasing and starting a workspace relatively small. Given the non-trivial data copy overheads involved with constructing a workspace, it has been proposed that overhead be treated as a property of the job execution site, instead of charging the overhead to the user as part of the resource allocation [146]. Total execution overhead for Virtual Workspaces has not been measured; instead, the potential benefits of Workspaces as cloud computing containers have been demonstrated through comparison of local performance to the performance of a larger, more powerful system made available via the grid [80]. Providing a combination of pre-arranged and best-effort leases has been shown to have better aggregate performance than a traditional best-effort scheduler without task preemption, while approaching the performance of a traditional best-effort scheduler with task preemption [147][148].

Virtual Organization Clusters differ from Virtual Workspaces, in that the VO assumes responsibility for the creation, configuration, and maintenance of the virtualized execution environment. These VOCs are then autonomically scheduled on participating physical fabric sites, eliminating the need for the explicit leases used in VWs. VOCs are therefore transparent to the end user, who submits jobs using standard grid middleware. A single VOC requires only one VM image to spawn as many compute nodes as are needed.

4.1.1.2 Shirako and Cluster-On-Demand

Shirako [44] and Cluster-On-Demand (COD) [43] are related projects that implement a lease scheduling mechanism and back-end cluster provisioning system, respectively. Shirako utilizes a lease system that enables resource providers and consumers to negotiate resource allocations through automated brokers [83], which are designed around a self-recharging currency model [82]. COD provides back-end provisioning of physical resources, so that physical sites may be divided among different user groups. This division is accomplished either by reinstalling the physical compute nodes with user-specific software [32] or by virtualizing the installations using Xen guests [78]. In either case, a customized software load is installed onto the provisioned compute elements, using a user-supplied resource provisioning request [32]. Xen VMs installed in this manner may be migrated between physical systems for load-balancing purposes [78].

An application of Shirako, called the Grid Resource Oversight Coordinator (GROC), permits physical resources to be leased by individual Virtual Organizations for the purpose of deploying completely virtualized grids. VO users interact with these virtual grids using existing middleware tools for job submission and retrieval, resulting in submission endpoint transparency. However, GROC adds a new layer of software and scheduling to existing physical systems, essentially constructing a new overlay grid on top of the existing grid. Hosted virtual grids utilize a virtual head node on each physical grid site, without spanning of individual clusters across grid sites. These virtual grids are scheduled for execution on physical fabric by means of Shirako leases, which may be best-effort or pre-arranged with temporal guarantees. Back-end provisioning of physical resources for hosting the virtualized grids is performed using COD, with virtual grids running directly on the hardware or in Xen containers. [127]

Virtual Organization Clusters differ from Shirako-leased systems, including GROC, in both design and intent. VOCs are clusters of virtual systems, which can span multiple grid sites on existing infrastructures, while GROC creates overlay grid sites on single physical sites without spanning. Shirako is also fundamentally a leasing model, like Globus Virtual Workspaces, while VOCs provide no mechanism for prearranged resource leases. In this regard, VOCs are designed to be strictly “best-effort” systems that can be deployed in a manner completely transparent to the users and optionally transparent to the VOs. GROC permits user-level transparency, but it requires the explicit participation of both VOs and physical sites. In addition, VOCs require the use of VMs as containers, while the COD back-end in GROC makes container use optional.

4.1.2 Autonomic Models

In contrast to leasing systems, autonomic grid virtualization models are transparent to the user. Whenever the user submits a job, the system automatically provisions itself to handle the workload, as depicted in figure 4.2. Existing autonomic grid virtualization systems include VioCluster (section 4.1.2.1), Dynamic Virtual Clustering (also presented in section 4.1.2.1), and Violin (section 4.1.2.2). As VioCluster and Dynamic Virtual Clustering have the same general objective, they are presented together, although they are developed by unrelated groups.

4.1.2.1 VioCluster and Dynamic Virtual Clustering

Two similar projects from unrelated groups are VioCluster and Dynamic Virtual Clustering, both of which are designed to improve the utilization of independent research clusters within the same organization. The earlier project, VioCluster, is based on the concept of dividing each cluster into a physical and a virtual domain, where the virtual domain may be borrowed and administered by another research group. Virtual domains are transparently and autonomically resized by means of a broker application, which trades machine allocations between groups by following explicitly configured policies. This brokering process is transparent to end-users, permitting the virtualized cluster to be used as if it were a regular physical cluster. User Mode Linux (UML) provides virtualization capabilities, while the Violin overlay network (see section 4.3.3) joins the borrowed virtual nodes to the borrower’s physical cluster, providing a seamless private network. [134]

VioCluster base images are pre-staged onto the physical hosts to be borrowed, and these images are essentially permanent in nature. At lease initiation time, a small (on the order of 200 KB) binary patch file is transmitted to the physical hosts. This file contains configuration information for the virtual cluster, and it is applied to the static configuration image using a copy-on-write (COW) technique. COW allows the VM to be terminated by means of killing the containing virtualization process, resulting in fast shutdown performance. Boot times are measured to be 40 seconds, with shutdown times of 16 seconds. Execution overheads of 15% are observed using the High Performance Linpack (HPL) benchmark. [134]

Figure 4.2: In an autonomic grid architecture, the user submits jobs exactly as she would submit them to a traditional grid architecture (steps 1, 2, and 3). An autonomic management system automatically starts and manages virtual machine instances according to some algorithm. User jobs are then automatically routed into the virtual environments by some method, perhaps an extension to existing middleware.

Like VioCluster, Dynamic Virtual Clustering allows clusters of virtual machines to be instantiated on a per-job basis for the purpose of providing uniform execution environments across clusters co-located on a single research campus. Xen is used as a hypervisor, with a modified version of Moab employed for scheduling [48]. DVC facilitates sharing of different research clusters in a Campus Area Grid (CAG) by allowing for temporary software environment customization through virtualization, as well as forwarding and spanning jobs between clusters. [50] Clusters constructed with DVC can support VM-level checkpointing that is transparent to the job being executed, thus providing fault recovery without modifications to the application [49].

A distinguishing design element of DVC is the concept of the Campus Area Grid, which is defined as “a group of clusters in a small geographic area ... connected by a private, high-speed network” [50]. Latency and bandwidth properties of the private network are considered to be favorable for HPC jobs, thereby allowing a combination of spanned clusters to function as a single high-performance cluster for job execution. However, the software configurations of the different component clusters may differ, as the component clusters may belong to different entities with different management. DVC permits Xen VMs with homogeneous software to be run on federated clusters on the same CAG whenever the target cluster is not in use by its owner, thereby allowing research groups to increase the sizes of their clusters temporarily. In terms of performance, spanned clusters using DVC were found to have 20% overheads on the PTRANS benchmark and negligible overheads on the HPL benchmark (matrix sizes were not provided). Use of DVC permitted a synthetic test load of long-running jobs to achieve a 57% throughput increase with a corresponding 33% reduction in turnaround time. [47]

Like Virtual Workspaces, In-VIGO, and Shirako, DVC clusters utilize one VM image per compute node, and this image is staged onto the host machine prior to booting the VM instance. Image staging is built into the modified Moab scheduler, which is also responsible for scheduling both the VMs and user jobs. Due to the need to stage a VM image and boot a VM instance on a per-job basis, overheads of up to 60% in terms of throughput, and 80% in terms of turnaround time, were observed for short-running “transactional” jobs. Compared to the 118 seconds required to stage an image, 23 seconds required to boot an instance, and 11 seconds needed to terminate a VM, these transactional jobs would run for only 5 to 10 seconds. An optimization to improve this behavior would be to pre-arrange a lease for a VM dedicated to the execution of short jobs, thereby avoiding the image management overhead. [50] Caching a VM image for a fixed period of time after first use could reduce overheads substantially [51].
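
Putting those measurements together, the per-job virtual machine management cost in DVC is

    \[ t_{\mathrm{stage}} + t_{\mathrm{boot}} + t_{\mathrm{terminate}} = 118\,\mathrm{s} + 23\,\mathrm{s} + 11\,\mathrm{s} = 152\,\mathrm{s}, \]

which is more than an order of magnitude longer than the 5 to 10 seconds of useful execution in a transactional job.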

Unlike VioCluster and DVC clusters, Virtual Organization Clusters are explicitly designed for use on wide-area grid systems, which lack high-bandwidth, low-latency interconnects between physical sites. VOCs are ideally designed to use a distributed filesystem for a single VM image, which is shared among all VM instances on the local site. This combination eliminates both image staging and the need to perform an orderly shutdown, which would save 129 seconds of overhead if employed in DVC (16 seconds in VioCluster). In addition, the VOC Model is designed to support High Throughput Computing (HTC) applications instead of High Performance Computing (HPC) applications, reducing concerns about job turnaround times. Like VioCluster, VOC nodes may share a private network across sites through the use of an overlay network, as described in section 4.3.3.

4.1.2.2 Violin

Although primarily a network overlay system, Virtual Internetworking on Overlay Infrastructures (Violin) is extended for provisioning isolated networks of virtual machines hosted on a physical cluster system. Violin and VioCluster are designed by the same research group, but Violin focuses on using an overlay network to span multiple sites, while VioCluster focuses on sharing resources at the same physical site. [133, 134] Violin-enabled virtual clusters may use either User Mode Linux (UML) or Xen as the underlying virtualization technology [136]. Violin clusters may be created on demand, customized for specific applications, and used to isolate virtual systems from physical systems. High Performance Linpack (HPL) benchmarking results show a virtualization overhead of 20% with UML; however, the matrix sizes used in the calculations are not scaled with increasing cluster sizes. [133]

Clusters constructed with Violin are capable of autonomic adaptation to their execution environments. Local autonomy is achieved through the use of monitoring daemons, which dynamically adjust CPU and memory allocation to individual VMs. Distributed autonomy is effected by means of live VM migration, permitting a running VM to be moved from one physical host to another. As is the case with other virtual clustering technologies based on Xen, Violin requires the use of a single

VM image per VM instance, and that image may be live-migrated between physical hosts during execution. Performance improvements are observed for workloads that require or can utilize excess computational resources beyond those initially allocated at VM creation time. Autonomic increases in resource allocation, and migration of virtual execution hosts to more powerful hardware, permit

a reduction in total job execution time for these workloads. [135, 166]

Virtual Organization Clusters share some design similarity with Violin. VOCs are designed to be autonomically deployable across different grid sites, but the desired autonomy in VOCs arises from the ability to shrink or expand the total size of a VOC, rather than adjusting the resource allocations of its component VMs. While VOC nodes will use an overlay network to span physical sites, the design emphasis of the VOC Model is not the networking layer itself; Violin, by contrast, is primarily a networking layer that supports the creation of virtual environments. VOCs may utilize any overlay networking technology, including Violin.

4.1.3 In-VIGO

The In Virtual Information Grid Organizations (In-VIGO) project attempts to raise the level at which grid middleware operates from sharing physical systems to sharing virtualized resources on physical systems. As presented in [7], In-VIGO conceptually defines three major benefits of grid virtualization: multiplexing, manifolding, and polymorphism. Virtualized systems can appear as multiple different systems (manifolding) with potentially different capabilities (polymorphism) through multiplexing physical systems to support multiple simultaneous virtual machines. Three separate layers are specified in the In-VIGO design: low-level operating system virtualization to create virtual machines, grid middleware to connect application interfaces to the back-end VMs, and a top-level application layer to present the specific Grid-enabled scientific application to the end user. To provide data and connectivity to these applications, In-VIGO specifies per-application virtual filesystems and overlay networks. [7] Virtual filesystems are realized through the use of the

Grid Virtual File System (GVFS) [174], while overlay networking is provided by ViNe [109, 158].

Virtual machines used by In-VIGO are constructed using VMPlants [99], an automated

VM installation and cloning mechanism. A VMPlant is a system that prepares and installs VM images using a cost model to provide a “bid” to a centralized coordinator called a VMShop. The user sends an XML specification with the desired properties of the installed system to the VMShop, which collects bids from the various VMPlants and schedules construction of the individual VMs via a minimum-cost algorithm. Different VMPlants have different image stores, and partial images are collected over time on each VMPlant system. Thus, one VMPlant may be able to produce a desired VM configuration at a much lower cost than another VMPlant, due to the ability to re-use existing image components when producing a new image. VMPlant-generated VMs have a

flexible configuration selectable from a pre-determined allocation of “golden” images, which consist of VM images with manually preinstalled operating systems. Actions specified in the VMShop XML specification are partially ordered so that the VMPlant construction process can emulate that of a human system administrator at coarse granularity. [99]

Once instantiated, In-VIGO systems are directly accessed by the end user via a Web-based portal that permits data transfer into and out of the system [175] and enforces permissions on shared resources through the use of a single sign-on short-term credential mechanism [8]. While an

In-VIGO application can be targeted to a single Virtual Organization, such as NanoHub [59], each individual application is typically realized as a separate In-VIGO instance with application-specific virtual systems [7, 59, 110]. Multiple applications can be co-scheduled on shared hardware via best-effort multiplexing [7] or through the use of a fault-tolerant per-application deadline scheduler

[167].

A key distinction between In-VIGO and Virtual Organization Clusters is that In-VIGO provides a per-application end-to-end system with which the end user directly interacts via a custom interface. Virtual Organization Clusters provide services to end users transparently: that is, end users run applications using existing grid middleware technologies and existing grid interfaces. VOCs provide services to individual Virtual Organizations and multiplex physical resources to permit multiple VOs to share the same hardware. Individual VOCs are then multiplexed among the different end-user jobs by means of scheduling systems installed in the VOCs. Dedicated VOC administrators are responsible for making configuration decisions, including specification of VM image contents and virtual network overlays. The VOC Model does not specify a single mechanism for VM image creation, so VMPlants could be used to prepare the system image for any given VO. Moreover, the

VOC Model does not limit the choice of middleware systems to be installed on either the physical or virtual systems. Thus, it would be possible to implement an application-level grid computing system such as In-VIGO by using Virtual Organization Clusters as a back-end.

4.2 Simulation Systems

Simulation systems for modeling grid architectures and testing solutions to grid-related problems may be classified into a taxonomy as described in [152]. This taxonomy, hereinafter called the Sulistio Taxonomy, consists of a set of sub-taxonomies that address specific features of

the simulation design. Of particular importance are the distinctions between static and dynamic simulations, discrete and continuous output, serial and multithreaded execution, deterministic and non-deterministic (probabilistic) behavior, trace-driven and event-driven simulation kernels, entity-based and event-based grid models, and the software interface utilized to run the simulation. Static simulations ignore the passage of time, emphasizing changes in the overall system from one state to another state, while dynamic simulations are time-driven and report measurements over time.

These measurements may consist of discrete observations from a finite set of possible values, or of a measurement from a continuous range. Execution of a simulation experiment may be serial (single-threaded) or multithreaded. Simulation results may be deterministically reproducible or randomly variable from run to run. Internally, the simulator cores may be driven by discrete events or by input traces, while the grid systems under simulation may be internally modeled by software entities, events that represent communications between entities, or a combination of both. Finally, simulation systems may be further classified by the interfaces they present to users, which may consist of a library of components designed for use with a general-purpose programming language, or a custom simulation language that encapsulates the simulator into a stand-alone application. The architecture of the system is especially important in the case of simulation libraries; the Sulistio Taxonomy classifies this architecture as either structured or object-oriented. [152]

Among the grid simulation systems classified in Sulistio et al. [152] is SimGrid [29, 30, 70,

162]. SimGrid is a static, serialized, deterministic, trace-driven discrete event simulation system with both an entity-based and an event-based model [152]. SimGrid is implemented as a structured code library, around which different simulation experiments are implemented using the C or Java languages [30, 152]. Four core Application Programming Interfaces are exposed to client applications: SimDag, for investigating task graph scheduling heuristics; MSG, for simulating concurrent sequential processes; SMPI, for simulating MPI applications; and GRAS, for emulating a grid system to run actual grid applications under simulation [30]. As originally designed, SimGrid provides a framework for evaluating different grid scheduling models [29] while simultaneously providing realistic macro-scale simulation of network behavior, particularly for Transmission Control

Protocol (TCP) communications streams [70, 162].

Another simulation system for modeling grid scheduling is GridSim [25], which operates at a higher level of abstraction than SimGrid. GridSim is a static, multithreaded, deterministic, event-driven discrete event simulator, with both entity-based and event-based models, accessible either

by programming a driver using an Object-Oriented library or by utilizing a form-based Graphical

User Interface [152]. GridSim is implemented in Java and provides core components for modeling computational resources and costs, including detailed simulations of process execution behavior.

Networking aspects of the grid are abstracted to communications speeds between node sites. [25]

GridSim has been extended to model the transmission and staging of large amounts of scientific data in a Data Grid architecture [151]. Further refinements to scheduling simulations are implemented in the Alea [98] simulator. A recent extension to GridSim, called CloudSim, enables the simulation of virtualized data centers to assess resource allocation algorithms, virtual machine migration for reliability concerns, and scalable resource management beyond what is available with existing physical cloud implementations [26].

The Bricks simulator [154] models scheduling algorithms for global High Performance Computing (HPC) applications. According to the Sulistio Taxonomy, Bricks is a static, serialized, deterministic, event-driven discrete event simulator. Bricks can utilize both entity-based and event-based models, and operation of the simulation system is performed by scripting experiments using a custom Object-Oriented interface. [152] Implemented in Java, Bricks simulates network and processing resources at a high degree of fidelity, for the purpose of testing different global scheduling algorithms.

At the opposite end of the spectrum from application simulators are grid system emulators.

MicroGrid [145, 165] is an emulator for unmodified Globus [64] applications. The simulation kernel of MicroGrid is implemented as a dynamic, multithreaded, deterministic, event-driven discrete event simulator with continuous sampling. Entity-based and event-based models are supported for simulation drivers written to utilize a custom structured interface. [152] MicroGrid virtualizes resources such that Globus applications are presented with the same environment present on an actual grid system. Test results from micro-benchmark suites run against Globus applications both on actual grid systems and in the simulated environment indicate high simulation fidelity at the application level. [145] Globus applications, written in a wide variety of languages, can be tested on a wide variety of simulated grid systems to measure performance characteristics [165].

Other purpose-designed simulation systems are not classified in [152]. BeoSim is a simulation system for local grids created from different clusters on the same research campus, where compute nodes on each cluster are borrowed for use with other clusters [86, 85]. GSSIM [100] enables multilevel scheduler simulation using actual cluster traces in the Standard Workload Format or Grid Workload

Format. OptorSim [27] is designed to simulate grid job and file interactions, so that scheduling

and file replication tasks may be optimized. GangSim [45] is yet another grid scheduling simulator, which emphasizes interactions between grid-level job scheduling interfaces and local site scheduling and resource allocation policies.

While each of these simulation systems has the capability to model scheduling algorithms for different grid architectures, a new simulation system has been developed in this dissertation for testing VOCs: the Simulator for Virtual Organization Clusters (SimVOC), which is designed to model the aggregate behavior of an existing grid system that uses commodity scheduling algorithms.

As SimVOC is able to execute simulations both with and without the addition of virtual machines and extra middleware, differences between existing grid systems and grid systems with deployed

Virtual Organization Clusters may be evaluated. Although SimVOC is a purpose-driven simulation system, its Object-Oriented implementation using the Python language permits extensibility to other simulation tasks.

Utilizing the Sulistio Taxonomy [152], SimVOC is a dynamic simulation system with continuous output capabilities. By providing a centralized random source with user seeding support, the serialized SimVOC kernel is able to provide both deterministic and non-deterministic execution, depending upon the requirements of each simulation experiment. While the SimVOC kernel is fundamentally event-driven, a subset of events directly corresponds to an input trace. Grid models used within the simulator are hybrid entities capable of both creating and receiving events. SimVOC is presented to simulation consumers as a modular, object-oriented Python library.
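For illustration only, the following minimal Python sketch shows the general style of kernel described above: a heap of timestamped events, a centralized seedable random source, and hybrid entities that both create and receive events. The class and method names are hypothetical and do not correspond to the actual SimVOC interfaces.

    import heapq
    import random


    class SimulationKernel:
        """Illustrative event-driven kernel with a centralized, seedable random source."""

        def __init__(self, seed=None):
            self.clock = 0.0
            self.rng = random.Random(seed)   # deterministic when a seed is supplied
            self._queue = []                 # heap of (time, sequence, entity, payload)
            self._sequence = 0               # tie-breaker keeps event ordering stable

        def schedule(self, delay, entity, payload):
            """Schedule an event for delivery to an entity after the given delay."""
            heapq.heappush(self._queue,
                           (self.clock + delay, self._sequence, entity, payload))
            self._sequence += 1

        def run(self, until=float("inf")):
            """Dispatch events in timestamp order until the queue empties or time expires."""
            while self._queue and self._queue[0][0] <= until:
                self.clock, _, entity, payload = heapq.heappop(self._queue)
                entity.handle(self, payload)


    class TraceSource:
        """Injects events taken from an input trace (a list of (delay, job) pairs)."""

        def __init__(self, trace, target):
            self.trace = list(trace)
            self.target = target

        def start(self, kernel):
            for delay, job in self.trace:
                kernel.schedule(delay, self.target, job)


    class GridSite:
        """Hybrid entity: receives job events and creates completion events on itself."""

        def handle(self, kernel, payload):
            if isinstance(payload, dict) and "runtime" in payload:
                # Job arrival: sample a small queuing delay, then schedule completion.
                delay = payload["runtime"] + kernel.rng.expovariate(1.0)
                kernel.schedule(delay, self, ("done", payload["name"]))
            else:
                print("t=%.2f completed %s" % (kernel.clock, payload[1]))


    if __name__ == "__main__":
        kernel = SimulationKernel(seed=42)   # a seeded run is exactly reproducible
        site = GridSite()
        TraceSource([(0.0, {"name": "job1", "runtime": 5.0}),
                     (1.0, {"name": "job2", "runtime": 3.0})], site).start(kernel)
        kernel.run()

Seeding the single shared random source makes an otherwise stochastic experiment deterministic, which is the property the kernel described above exploits to support both reproducible and probabilistic runs.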

4.3 System Resources

Whether computational jobs are executed in virtual machines or directly on physical systems, hardware resources ultimately must be provisioned and assigned first to specific user groups, and then to individual jobs. Dynamic provisioning of resources is desirable, as it allows resource allocations to match job requirements more closely. However, resource allocation is a difficult issue because application resource demands change during the course of execution, typically exhibiting multiple phase transitions [171]. Ideally, provisioning should be both dynamic and autonomic, eliminating the requirement to specify resource needs in explicit lease requests. This dichotomy in resource provisioning is characterized by competing solutions such as Violin (autonomic allocation)

[135] and Virtual Workspaces (leasing) [94].

Regardless of the resource provisioning scheme, several classes of resources are needed to effect virtualized clustering with site spanning. These resources include networking, migration capabilities, and distributed filesystem support for image hosting. The scheduling of jobs to utilize these resources is important, as the choice of scheduling system can have significant impacts on overall system performance. In the remainder of this section, scheduling is discussed (subsection 4.3.1), followed by a survey of pilot job frameworks (subsection 4.3.2). Descriptions of available technologies for implementing VOCs follow, including overlay networking (subsection 4.3.3), live migration

(subsection 4.3.4), and distributed storage (subsection 4.3.5).

4.3.1 Scheduling Resource Use

Sharing cluster resources requires allocating those resources to individual jobs on a short-term basis, temporally multiplexing the resources over the set of jobs. High Throughput grid jobs may be viewed as a bag of tasks, where each task is an independent component of a common parallel application. Scheduling bag-of-tasks applications in order to minimize the makespan, or time from the start of the first job in the bag to the completion of the final job, is known to be

NP-hard in the general case [13]. For systems representable by a star network, which includes most cluster systems with a single head node and multiple compute nodes, both makespan minimization

(HPC) and throughput maximization (HTC) are NP-complete scheduling problems [102]. When different resource quantities can be scheduled for different users with different utilities, the problem of provisioning is an NP-hard knapsack problem [78].

A desirable alternative to makespan minimization scheduling is steady-state scheduling, which attempts to optimize resource usage for high performance over a period of time by adjusting fractional usage of specific resources by individual jobs [16]. While this type of scheduling shows benefits for high-performance applications, steady-state throughput maximization is still NP-complete in the general case [15]. When the computational resource has the form of a tree of depth greater than one, distributed scheduling becomes an alternative to centralized scheduling. In the distributed case, individual non-leaf child nodes in the tree perform local scheduling for the star network consisting of the leaf nodes rooted at that node. Although distributed scheduling may be preferable to centralized scheduling in terms of algorithmic complexity, testing indicates that centralized scheduling, when possible, is still more efficient. [17] Effective testing of general scheduling algorithms can be performed within grid simulation systems, instead of on actual grids, with reasonable accuracy

and much greater flexibility [103].

In practice, applications known to fit a bag-of-tasks model may not always be executed on grid systems, owing to the complexities inherent with matching application requirements to available systems through a manual job submission process. A solution to this problem is presented in [33], which describes MyGrid, a custom grid implementation for running bag-of-tasks applications on whatever hardware is available to the user. MyGrid features a replicated work queue scheduler, which matches tasks to available processors in an arbitrary order, without a priori knowledge of the environment [33]. The MyGrid middleware is Java-based, portable, and capable of interfacing with

Globus for general purpose grid connectivity. However, it does not utilize virtualization. [36]

Virtual Organization Clusters appear to be ideally suited for bag-of-tasks applications with current technologies, as virtual machine performance testing has shown that overhead minima tend to occur in compute-bound situations [14, 54, 55], and communications within bag-of-tasks applications are generally limited to task initiation and completion [17]. However, future advances in virtualization technology may improve performance for other classes of applications.

4.3.2 Pilot Job Frameworks

Pilot job frameworks are a mechanism for leasing computational resources by means of submitting a grid job that contacts a server at execution time, retrieving a workload from that server and thus bypassing the normal mechanisms for running jobs on the grid (section 2.4). Users of grid systems, particularly those from the High Energy Physics (HEP) community, often prefer to submit jobs to Workload Management Systems (WMS) instead of directly to grid sites, since the

WMS autonomically matches the job to a site. Since this process bypasses normal grid accounting and security systems, an accounting system for pilot frameworks has been developed for the Open

Science Grid (OSG). This system essentially places a grid gatekeeper on each compute node, checking the authorization of each pilot job arriving from the WMS. [143]
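As an illustration of the pull pattern described above, the following minimal Python sketch shows a pilot agent that repeatedly requests a workload from a WMS endpoint and executes whatever it receives. The endpoint URL, the plain-text response format, and the back-off policy are hypothetical; production frameworks such as DIRAC, Condor Glideins, and PanDA each define their own authenticated protocols for workload retrieval.

    import subprocess
    import time
    import urllib.request

    # Hypothetical WMS endpoint; real pilot frameworks use authenticated,
    # framework-specific protocols rather than a plain HTTP text interface.
    WMS_URL = "http://wms.example.org/next-task"


    def fetch_task():
        """Ask the WMS for a workload; an empty response means no work is available."""
        try:
            with urllib.request.urlopen(WMS_URL, timeout=30) as response:
                return response.read().decode().strip()
        except OSError:
            return ""


    def main():
        # The pilot itself was submitted as an ordinary grid job; once it starts
        # executing on a worker node, it repeatedly pulls real user workloads.
        idle_polls = 0
        while idle_polls < 10:                       # give up after ten empty polls
            command = fetch_task()
            if command:
                idle_polls = 0
                subprocess.run(command, shell=True)  # run the retrieved user workload
            else:
                idle_polls += 1
                time.sleep(60)                       # back off before polling again


    if __name__ == "__main__":
        main()

Because the user workload arrives only after the pilot has begun executing, accounting and authorization must be re-checked on the worker node, which is precisely the gap the OSG per-node gatekeeper described above is intended to close.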

Several Workload Management Systems have been developed for different grid applications, including DIRAC, Condor Glideins, and PanDA. DIRAC (Distributed Infrastructure with Remote

Agent Control) is deployed at CERN as part of the LHCb project charged with large-scale simulation of data associated with the Large Hadron Collider. DIRAC is designed to support High Throughput

Computing (HTC) applications using a pull model that finds jobs appropriate for available resources, instead of locating resources to match to jobs. A central WMS server is deployed at CERN, to which

user jobs are submitted. Each worker node using DIRAC runs an agent that functions as the pilot job, retrieving matching workloads from the WMS. [157] Condor Glideins support similar HTC applications on commodity grid systems, except that jobs are submitted to a local Condor pool and matched to compatible systems on the grid using Condor-G. [142]

PanDA differs from DIRAC and Glideins in that it chiefly supports High Performance

Computing (HPC) applications. Used by the ATLAS High Energy Physics Virtual Organization,

PanDA receives jobs into its WMS via a Python client. In the base implementation of the WMS technology, PanDA uses a pilot framework with Condor and Condor-G to find matching resources on which jobs are executed. [5] PanDA supports virtualization with the Xen hypervisor, using a

CERN-developed interface called the VIrtualization Resource Management (VIRM) API to request resources. In turn, VIRM uses the CERN vGrid layer to lease a physical node and start a virtual machine. VIRM is extensible to several back-ends including Globus Nimbus. [97] Coupled with the autonomic leasing back-end provided by PanDA, this technology provides a rudimentary Virtual

Organization Cluster to the ATLAS VO.

4.3.3 Network Overlays

Several networking libraries have been developed for virtual machines, which permit virtual clusters to use networks logically isolated from the underlying physical hardware. Virtual Distributed

Ethernet (VDE) [38], Virtuoso [153], and ViNe [158] provide low-level virtualized networks that can be utilized for interconnecting VMs. Wide-area connectivity of VMs can be achieved through the use of overlay networks such as Wide-area Overlays of virtual Workstations (WOW) [73, 164, 74] or

IP over P2P (IPOP) [72].

Virtual Internetworking on Overlay Infrastructures (Violin) enables the creation of isolated virtual private networks through the use of virtual hosts, virtual switches, and virtual routers.

Violins are isolated from the underlying IP network and are recursively layerable, thereby allowing one Violin to be hosted within another. Virtualized networks inside Violins have their own routing and protocol stacks, allowing for wide discretion in network configuration. Throughput overheads are measured at 5% using TCP data streams. [84] Traffic bounds and quality of service adjustments may be made on a per-virtual host basis [133].

IP over P2P (Internet Protocol over Peer-to-Peer: IPOP) is an overlay network that provides scalability, resistance to single bad links, and NAT traversal. IPOP works by encapsulating IP traffic

inside the Brunet peer-to-peer networking system. Brunet implements a version of the Address

Resolution Protocol (ARP) to permit IP address mobility within IPOP, improving support for live migration of VMs connected to the overlay. Within a Local Area Network (LAN), IPOP performs poorly, with an order-of-magnitude increase in packet Round-Trip Time (RTT) and a reduction of available bandwidth to 29% of the physical total bandwidth. However, over a Wide Area Network

(WAN), the packet RTTs are increased by only 33%, and available bandwidth is up to 75% of physical total bandwidth. [72]

For latency-sensitive applications such as MPI [108], network performance is a substantial concern when virtualizing cluster systems. As presented in [71], measurements of network performance obtained within VMs may be skewed by the virtualization process itself. Context switches, which occur whenever packets arrive at a network interface and generate interrupts, have a negative impact on network and system performance as a whole. Strategies for improving network performance include enabling interrupt coalescence and larger frame sizes [75], as well as bypassing the host operating system when delivering network packets to a virtualized guest [106]. Networking issues are less important for high-throughput bag-of-tasks applications, which have limited communications requirements [17].

4.3.4 Virtual Machine Migration Support

Live migration between physical nodes permits a virtual machine to be transferred from one physical host to another without terminating the VM or the applications running within the VM, allowing virtual grid systems to implement features such as checkpointing [49] or autonomic resource adaptation [136]. This migration is performed using an incremental memory copy procedure, and it produces only a momentary pause in VM execution on the order of a few tens to hundreds of milliseconds. [34] One limitation of live VM migration is that both the origin endpoint and the target endpoint must be on the same subnetwork, so that the VM does not lose networking connectivity during the migration process. This subnet limitation can be avoided by using IPv6 with mobility extensions to permit VMs to be migrated over a wide-area network such as the Internet

[79]. Alternatively, use of an overlay network may allow a VM to remain in the same private subnet, even though it is migrated to a completely new network, as is possible with VMs that use the Violin overlay [135].

The Virtual Organization Cluster Model does not require, nor does it prohibit, the availability of live VM migration. Since individual VOC nodes are given a static resource allocation, and the mechanism for increasing the total resource allocation for a VOC is to expand the size of the

VOC by booting additional VMs, there is no need for live migration for the purpose of increasing allocated resources. However, a VOC might use live migration to implement checkpointing or to move jobs to more optimal hardware if such hardware becomes available after a job starts. Specific implementations of VOCs may choose to provide or require migration capabilities.

4.3.5 Distributed Filesystems

A distributed filesystem allows virtual clusters to avoid staging images onto host machines, eliminating the delay caused by the image copy operation. Distributed image stores have been recommended since the first paper on grid system virtualization [57], but most grid virtualization systems still rely on one locally staged VM image per VM instance. Even where the hypervisor technology requires a single image per instance, reading the images directly from a Network File

System (NFS) share, without staging, may show performance gains [175]. Since an NFS share is typically backed by a single disk, and single disks may become bottlenecked under multiple concurrent reads, a multiple-disk parallel filesystem may provide better scalability. The Lustre filesystem [144], equipped with high-speed interconnects [169], could be used for this purpose. Other parallel filesystems, such as the

Parallel Virtual File System (PVFS) [28], may also be used.

The Virtual Organization Cluster Model does not require the use of a distributed filesystem (DFS), but use of a DFS is recommended for improved performance when VOC sizes change frequently. Ideally, each physical grid site provides a DFS from which copy-on-write images may be accessed directly, with read-only permissions. All writes are non-persistent and are performed to the local disk on which the VM instance is executing. Any distributed filesystem may be used as an image store, as long as the read speeds are shown to be sufficient when multiple VMs are booted simultaneously from the same shared image file. As a result of testing the relative ease of maintenance of both Lustre and PVFS on the prototype cluster used for this research, PVFS has been chosen as the distributed filesystem for the prototype implementation.
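The sufficiency of read speeds can be estimated before deployment with a simple measurement. The following Python sketch, which assumes a hypothetical shared image path, times several concurrent full reads of the same image file as a rough proxy for simultaneous VM boots; it is an illustrative check and not part of the prototype implementation.

    import multiprocessing
    import time

    IMAGE_PATH = "/mnt/pvfs/voc/compute-node.img"  # hypothetical shared image location
    CHUNK_SIZE = 4 * 1024 * 1024                   # read in 4 MiB chunks


    def stream_image(_):
        """Read the entire shared image once and return the number of bytes read."""
        total = 0
        with open(IMAGE_PATH, "rb") as image:
            while True:
                chunk = image.read(CHUNK_SIZE)
                if not chunk:
                    return total
                total += len(chunk)


    def measure(readers):
        """Time the given number of concurrent full reads of the shared image."""
        start = time.time()
        with multiprocessing.Pool(readers) as pool:
            bytes_read = sum(pool.map(stream_image, range(readers)))
        elapsed = time.time() - start
        return bytes_read / elapsed / 2 ** 20      # aggregate MiB/s


    if __name__ == "__main__":
        for readers in (1, 2, 4, 8):
            print("%d concurrent readers: %.1f MiB/s aggregate"
                  % (readers, measure(readers)))

If the aggregate rate remains roughly flat as the reader count grows, the image store is unlikely to be the bottleneck when a VOC boots many instances from the same shared image file.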

4.4 System Management

Related work on system management can be divided into three categories: physical system installation (subsection 4.4.1), virtual cluster construction (subsection 4.4.2), and post-installation management (subsection 4.4.3). Physical fabric is needed to host Virtual Organization Clusters, and these physical clusters must be configured with software for virtualization, network connectivity, and possibly other support services. Construction of virtual clusters may be simplified by exploiting the single-image property of VOCs. For both administrative domains, post-installation management of the cluster systems will be needed to update installed software and add or change existing functionality to adapt to the changing needs of the hardware site or Virtual Organization.

4.4.1 Installation of Physical Systems

Among the most popular physical system installation toolkits is NPACI Rocks [131]. The core of the Rocks system consists of a rapid installer for an underlying Red Hat-derived Linux distribution along with cluster-specific tools packaged for rapid installation. Rocks is designed to minimize the scalability issues inherent with managing a cluster system by centralizing configuration on the head node and treating compute nodes as stateless systems. [122] Entire clusters can be installed with Rocks in a matter of hours, provided the basic setup of the cluster hardware is suitable for the installation [123]. Standard tools, such as Red Hat Kickstart, are leveraged by means of an XML-based specification, so that clusters can be automatically configured according to a pre-specified design [88], which can be modified and extended to permit customized installs for specific cluster applications [137]. Rocks provides a mechanism for easy addition of software application groups via “rolls,” or meta-packages of related programs and libraries, grouped according to software dependency graphs [20].

Other systems for rapid physical cluster installation include the OSCAR meta-package system, RISE, and Cluster-On-Demand. OSCAR is an automated installer that uses groups of packages

(“meta-packages”) to add software to cluster compute nodes in a manner similar to Rocks rolls [112].

RISE is a Web services-based system imaging suite, which propagates binary software images to cluster compute nodes using the Preboot eXecution Environment (PXE) [101]. Cluster-On-Demand

(COD) permits dynamic reinstallation of all or part of a physical cluster, allowing the cluster to be split into multiple virtual cluster slices [32].

Each of these systems provides rapid installation and initial configuration of physical compute fabric, provided the hardware to be used is supported by the rapid installer. Experiential tests with some of these systems, particularly Rocks and OSCAR, have led to intractable failures when the hardware was not fully supported by the installation suite. Furthermore, these systems make little provision for the ongoing maintenance and updating of cluster systems, short of performing complete reinstallations.

Virtual Organization Clusters require little in the way of support from the host operating system: virtualization support, access to shared storage, and networking connectivity are the principal requirements of the virtual machine. As a result, VOCs can be run on minimal host installations using lightweight distributions such as Slackware Linux [55]. Many of the meta-packages managed by Rocks, OSCAR, and similar systems are simply unnecessary on physical hardware that will support VOCs. The software complexity arising from the various scientific software package sets is moved into the virtual machine space, eliminating the need for the vast majority of cluster software to be installed directly on the hardware.

4.4.2 Installation of Virtual Systems

Virtual Machine installation systems tend to be designed to produce a large number of identical, or mostly identical, VM images for use with virtualized grids. VMPlants generates a set of customized VMs by means of a directed acyclic graph specification. A cluster of VM-producing systems is used for actual image creation, with load balancing accomplished via a bid-cost model. [99]

Virtual Workspaces may be created by a factory system using an XML specification for customization

[94]. Factory systems themselves may be realized by specification-driven installers that permit software customization either at install time or by using standard grid tools [168]. Performance of these factory systems may be enhanced by using aggressive image and partial image caching strategies, significantly reducing installation time [120].

VM installation for Virtual Organization Clusters could be automated, and in such a case, one of these existing solutions could be utilized to produce the VM image. However, a single VOC uses only one compute node image, the creation of which would not be sufficient to produce full utilization of a cluster-based installer. Since the single image will be used to boot as many instances as are needed, and replication will occur only when the same image is used at multiple different physical sites, manual installation of the image will be practical. The only requirements for a VOC

image are that it be in a format compatible with the virtualization system in use, and that it be capable of dynamic configuration of its networking properties, such as IP address and hostname, at boot time using DHCP.

4.4.3 Post-Installation Management

Once a system is installed and operational, it is necessary to perform updates and make configuration changes as required for system security, stability, and feature additions. Software and configuration changes can be made through several mechanisms, such as the addition of Rocks rolls [20], or the use of remote command execution systems such as the Cluster Command and Control Suite [58] or Tentakel [156] to push changes to the compute nodes. Alternatively, nodes can be completely re-imaged using a tool such as the System Installation Suite [37]. All these “traditional” techniques are best suited for the management of physical systems within a single site. With the exception of the ability of the Cluster Command and Control Suite to handle multiple individual clusters [58], these tools tend to focus on batch management of entire systems simultaneously, omitting fine-grained targeting of individual systems. Moreover, a standard practice for upgrading existing cluster systems is complete re-imaging of the compute nodes, which are presumed to be stateless [122].

A non-imaging mechanism for post-installation management of systems is Cfengine [24], which utilizes a descriptive language to configure autonomous agents. These agents communicate with a central policy server to effect configuration changes via a convergent process [22] in which each machine tends to move toward a desired state. Although Cfengine allows different groups of systems to be targeted by specifying policies around system classes, it is still designed around the concept of single-site management. Cfengine also requires dedicated services and transports to operate, which might cause difficulties when spanning grid sites. [24] However, Cfengine is able to detect and correct configuration anomalies [21], and it uses a hybrid feedback loop to move the system configuration toward the desired state [23]. Applications of Cfengine include middleware deployment and file synchronization [111], as well as network device management with added components based on the

Simple Network Management Protocol (SNMP) [140].

An alternative agent-based configuration management system is Quattor. Developed by entities related to the Large Hadron Collider project at the European Organization for Nuclear

Research (CERN), Quattor is capable of both automated installation (via Red Hat Kickstart) and

post-installation management. Quattor is designed around a large set of small agent components, each of which performs a single configuration task. The Pan language is used to specify configurations, while target machines may be specified in plain text, XML, HyperGraph, or GraphViz format.

A central server aggregates configurations into profiles for retrieval by the Quattor-managed systems. [31]

Another automated mechanism for change detection and configuration updates is Bcfg2 [12], which was developed to unify the management of compute and service nodes in cluster systems [42].

Unlike Cfengine, which uses a custom descriptive language for expressing configuration states, Bcfg2 utilizes “generator” sub-applications to produce remote configuration specifications [39], which are matched to system groups expressed in an XML format [87]. Bcfg2 accommodates changes over time through integration with Subversion repositories and other change management tools, improving

flexibility [40]. However, despite an explicit design goal of simplicity in comparison with other tools

[39], a full-scale test deployment of Bcfg2 still required 4 months [41].

The division of administrative space that occurs with Virtual Organization Clusters presents new issues for system management. Virtual clusters are not tied to specific hardware or specific sites, but individual VOCs may be long-lived and require in-place updates to address security concerns or adjust capabilities. Complete reconstruction of a virtual cluster, while feasible given the single-image nature of VOCs, may not be desirable for small changes due to the need to propagate a large image over a wide area network. Moreover, simplification of the physical administrative domain may obviate the need for extensive change-management systems, instead suggesting simpler strategies for updates and reconfiguration.

Chapter 5

Virtual Organization Cluster Model

The Virtual Organization Cluster Model (VOC Model) describes a novel grid architecture that provides execution environments that are homogeneous across grid sites, allowing applications to be run on any ISA-compatible resource, including those resources that do not provide the necessary software libraries to run user jobs directly on the hardware. VOCs remain transparent to end users and disinterested entities, permitting the new architecture to be deployed in conjunction with, and without disrupting, existing production grids. Since design decisions and issues in a grid system are by definition distributed and at scale, abruptly changing the architecture of an existing production grid would be impractical in terms of cost and complexity. Thus, it is necessary for a new grid architecture to be deployable in a phased manner that does not cause widespread disruption. VOCs achieve phased deployability by leveraging existing middleware and permitting participating and non-participating (or transparent) deployments. In a participating deployment, a VOC is created for a single VO, which assumes administrative responsibility for the VOC. All participating grid sites must permit virtual machines to be spawned to create VOC compute nodes. Conversely, in a non-participating deployment, a single grid site transparently provides VOCs for VOs authorized to run jobs on the site, encapsulating the end-user workloads in local virtual environments. These non-participating VOCs are managed by the site administrators, and it is not necessary that VOs or end users even be made aware that VOCs are in use. In either case, middleware changes are needed only on the sites that will host virtual machines. No changes to submission endpoint middleware

(the software operated by the end user) are required.

Portions of this chapter will appear in Future Generation Computer Systems.

Virtual Organization Clusters are formally defined according to a set of abstract specifications, which are presented in this chapter. This set of specifications is intended to be flexible, so that actual implementations of VOCs can be accomplished using a wide variety of technologies for virtualization, networking, job scheduling, and autonomic adaptation. Any implementation that satisfies the requirements of the Virtual Organization Cluster Model is a Virtual Organization Cluster. In other words, the term “Virtual Organization Cluster” describes a concept as opposed to a specific implementation or product.

A description of the Virtual Organization Cluster Model (VOC Model) depends upon a separation of administrative domains (section 5.1), which conceptually divides responsibility for the computational environments between the physical resource provider and the Virtual Organization or its representative (which may be the physical resource provider in transparent implementations).

Definitions of terminology necessary to the model are presented in section 5.2. The abstract definition of Virtual Organization Clusters is presented in section 5.3. Finally, section 5.4 explains how VOCs are self-provisioned cloud systems.

5.1 Separation of Administrative Domains

The Virtual Organization Cluster Model specifies the high-level properties of systems that support the assignment of computational jobs to virtual clusters dedicated to single VOs. Central to this model is a fundamental division of responsibility between the administration of the physical computing resources and the virtual machine(s) implementing each VOC, permitting a division of labor between physical and virtual system administrators [69]. For clarity, the responsibilities of the hardware owners are said to belong to the Physical Administrative Domain (PAD), while responsibilities delegated to the VOC owners are part of the Virtual Administrative Domain (VAD) of the associated VOC. Although this model of domain separation builds upon the concept of physical and virtual domains described by the Violin project [136], a central distinction between the PAD and VAD is that both domains represent independent policy spaces, where decisions made by the administrators of each domain do not require cross-domain coordination. Each physical cluster has exactly one PAD and zero or more associated VADs. Figure 5.1 illustrates an example system designed using the VOC Model.

Figure 5.1: Implementations of Virtual Organization Clusters divide the grid into two administrative spaces: each grid site has a Physical Administrative Domain (PAD) that includes the hardware and software services needed to execute virtual machines. Each VOC is an isolated Virtual Administrative Domain, into which scientific software applications, computational libraries, and supporting programs are installed. Different administrative domains in this model may have different administrative staff.

5.1.1 Physical Administrative Domain

VOCs are hosted on physical computing fabric made available by affiliated organizations, and perhaps third parties, over a standard grid computing platform such as the Open Science Grid

[121]. Each of these physical sites is an isolated Physical Administrative Domain (PAD), managed independently from the VOCs it hosts. The PAD contains the physical computer hardware (see

figure 5.1), which comprises the host computers themselves, the physical network interconnecting those hosts, local and distributed storage for virtual machine images, power distribution systems, cooling, and all other infrastructure required to construct a cluster from hardware. Also within this domain are the host operating systems and central physical-level management systems and servers.

Fundamentally, the hardware cluster provides the hypervisors needed to host the VOC system images as guests.

An efficient physical cluster implementation requires some mechanism for creating multiple compute nodes from a single VO-submitted image file. One solution is to employ a hypervisor with the ability to spawn multiple virtual machine instances from a single image file in a read-only mode that does not persist VM run-time changes to the image file. Another solution would be to use a copy-on-write filesystem layer to allow the hypervisor to read from the centrally stored image, while writing to a temporary local file.
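As an example of the second approach, the following Python sketch uses the qcow2 backing-file feature of QEMU/KVM to give each compute-node instance a small local copy-on-write overlay on top of a centrally stored, read-only image. The paths, memory size, and networking options are illustrative only, and other hypervisors provide analogous mechanisms.

    import os
    import subprocess

    # Illustrative paths: a read-only base image on the site's shared filesystem
    # and per-instance overlays on local scratch disk.
    BASE_IMAGE = "/mnt/shared/voc/compute-node.qcow2"
    SCRATCH_DIR = "/var/tmp/voc-overlays"


    def create_overlay(instance_id):
        """Create a copy-on-write overlay backed by the shared read-only image."""
        overlay = os.path.join(SCRATCH_DIR, "node-%02d.qcow2" % instance_id)
        subprocess.run(["qemu-img", "create",
                        "-f", "qcow2",           # overlay format
                        "-F", "qcow2",           # format of the backing (base) image
                        "-b", BASE_IMAGE,
                        overlay],
                       check=True)
        return overlay


    def boot_instance(overlay, memory_mb=1024):
        """Boot a KVM guest from the overlay; writes go only to the local overlay."""
        return subprocess.Popen(["qemu-system-x86_64",
                                 "-enable-kvm",
                                 "-m", str(memory_mb),
                                 "-drive", "file=%s,if=virtio" % overlay,
                                 "-net", "nic,model=virtio",
                                 "-net", "user",
                                 "-nographic"])


    if __name__ == "__main__":
        os.makedirs(SCRATCH_DIR, exist_ok=True)
        guests = [boot_instance(create_overlay(i)) for i in range(4)]
        for guest in guests:
            guest.wait()

Because each overlay records only the blocks a guest modifies, the shared base image can remain read-only on the site's image store, and discarding an overlay after the VM terminates keeps run-time changes non-persistent.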

5.1.2 Virtual Administrative Domain

Each Virtual Administrative Domain (VAD) consists of a set of virtual machine images for a single Virtual Organization (VO). A VM image set contains one or two virtual machine images, depending upon the target physical system(s) on which the VOC system will execute. In the general case, two virtual machine images are required: one for the head node of the VOC, and one that will be used to spawn all the compute nodes of the VOC. For transparent implementations, only a compute node image with a compatible job scheduler interface is required.

VMs configured for use in VOCs may be accessed by the broader Grid in one of two ways.

For participating implementations, the head node of the VOC functions as a gatekeeper between the VOC and the Grid. In transparent implementations, the VOC is constructed using a single image and needs to be configured with a scheduler interface compatible with the physical site. The physical fabric will provide the gatekeeper between the Grid and the VOC (figure 5.1), and jobs will

be matched to the individual VOC.

A major benefit of the VOC Model design is that few constraints are placed on the VM. The

VO has great flexibility in selecting the operating system and software environment best suited to the requirements of its users. Of the few constraints that do exist, the primary ones are as follows:

• Image Compatibility. The VM image must be in a format usable by the Virtual Machine Monitor (VMM) or hypervisor software in use at the physical site(s) where the VOC will be executed.

• Architecture Compatibility. The operating system running in the VM must be compatible with the system architecture exposed by the VMM or hypervisor.

• Dynamic Reconfigurability. The guest system inside the VM must be able to have certain properties, such as its MAC address, IP address, and hostname, set at boot time.

• Scheduler Compatibility. When a transparent VOC is used with a shared scheduler provided by the physical site, the scheduler interface used by the VM must be compatible with the shared scheduler.

5.2 Terminology

To provide a framework for defining a Virtual Organization Cluster and evaluating its behavior, some preliminary terminology is necessary. These terms are presented in abstract grid computing contexts to avoid circular definitions.

Definition 5.1. A grid technology is transparent to an entity if the entity can utilize the technology through existing grid middleware services that are installed as part of a standard grid interconnection system, without the addition of any extra middleware. This definition extends that presented in

Enslow [53], which requires that access to the technology must be accomplished via services and not by direct connection to specific systems.

Transparency is a key contribution of VOC research compared to prior and related grid virtualization work. Most virtualized grid systems require the addition of specific middleware, and perhaps replacement of existing middleware, at both the execution endpoint and the submission endpoint in order to utilize virtualization capabilities through the acquisition of leases [7, 83, 94].

As a result, implementing these systems requires a substantial investment of resources for both cluster administrators and cluster users, along with substantial planning, coordination, and associated political will. Moreover, requiring users to access leased environments directly violates the original principle of transparency required in Enslow [53]. By extending the concept of transparency in definition 5.1 to include middleware, systems that claim this property must be usable by an end-user equipped only with the standard grid access tools in existence and installed before the addition of the new technology. Ideally, the user would also be unaware of the existence of the new system. If this property can be satisfied, then administrative action is only required at the computational sites where the system is to be deployed, minimizing disruption to the existing grid infrastructure.

Definition 5.2. A job is compatible with a cluster if the job is executable using the hardware, operating system, and software libraries installed on the cluster. The cluster in this case may be physical or virtual.

For a given cluster system, the compatible fraction of jobs belonging to an entity is the ratio of the number of compatible jobs belonging to that entity to the total number of jobs on the grid belonging to that entity. An entity is compatible with a cluster if the compatible fraction of jobs belonging to that entity is non-zero on the cluster.
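Expressed symbolically (the notation is introduced here only for illustration and is not used elsewhere in this document), the compatible fraction of an entity E on a cluster C is

    f_{\mathrm{compat}}(E, C) = \frac{\left|\{\, j \in J(E) : j \text{ is compatible with } C \,\}\right|}{\left|J(E)\right|}

where J(E) denotes the set of all jobs on the grid belonging to E; the entity E is compatible with C whenever f_{\mathrm{compat}}(E, C) > 0.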

Compatibility refers to the ability of a cluster system, whether physical or virtualized, to run a job with the available hardware (or virtual hardware) and installed software. Several different measures of compatibility may be considered when evaluating virtualized clusters, including the breadth of compatible VOs and the total proportion of compatible jobs across all VOs. Implementation of a new cluster technology might enable the cluster to support a larger number of different VOs; however, such an implementation might simultaneously reduce the previously compatible fraction of jobs for VOs already supported, reducing the total proportion of grid jobs that are compatible with the cluster. A trade-off may arise between these two measures for certain design decisions.

It is important to note that compatibility does not necessarily imply that a cluster is willing to run jobs from all compatible VOs. Local policies may restrict cluster usage to a specific subset of VOs, even though the cluster is compatible with a larger set of VOs. This distinction between capability and policy is formalized by the following definition:

Definition 5.3. An entity is authorized on a specific cluster if local cluster policies permit the entity to run jobs on the cluster. An unauthorized entity is denied use of a cluster system only by policy

and not by an insufficiency of mechanism.

The main purpose of definition 5.3 is to separate mechanism from policy when defining and evaluating technologies such as VOCs. It would be incorrect to treat a VO as incompatible if VO jobs were rejected by the cluster simply because local policy did not provide execution services to that VO. Technical compatibility, or the ability to run jobs within particular environments, is a separate issue from that of authorization to use a particular resource.

Definition 5.4. An entity is termed to be participating in a grid-enabled technology if the entity chooses to utilize the specific capabilities of the technology, including, but not limited to, the use of specific middleware or specific configuration settings. Entities that choose not to deploy the technology under discussion are termed non-participating.

In order to facilitate transparency and enable partial deployment of new grid-enabled technologies such as VOCs, it is necessary to accommodate the different schedules upon which different entities may choose to deploy the technology. Moreover, some entities may choose not to deploy the technology due to technical or policy constraints. Achieving interoperability of VOCs with existing grid systems requires that the grid remain able to support both participating and non-participating entities.

Definition 5.5. A grid technology is conservative with respect to jobs if jobs are neither created nor destroyed by the system. A grid technology is conservative with respect to user jobs if user jobs are neither created nor destroyed by the system.

Definition 5.5 implies that all jobs originate externally relative to a totally conservative grid technology. A grid technology that is conservative with respect to user jobs may create and destroy pilot jobs or utility jobs, but all user jobs originate outside the technology and are not destroyed by the technology.

5.3 Definition of a Virtual Organization Cluster

Utilizing the previous definitions, a Virtual Organization Cluster may be formally defined as a set of homogeneous computational resources that:

• Consists entirely of virtual machines that are scheduled and hosted by commodity physical resources on the grid;

• Autonomically changes size in response to changing workload demands;

• Is owned by, or dedicated to, exactly one Virtual Organization;

• Is transparent to end users;

• Is transparent to non-participating virtual organizations;

• Is conservative with respect to user jobs;

• Provides a Virtual Administrative Domain to participating VOs; and

• Optionally permits participating VOs to utilize a private overlay network to span resources across physical sites, receive jobs from a remote scheduler, and access private resources.

A VOC is effectively a set of compute nodes in the traditional cluster computing sense. The scheduling of jobs on these compute nodes may be performed by a shared local scheduler on the same physical site as the VOC, by a dedicated virtual head node on the same physical site as the VOC, or by a central scheduler accessible via an overlay network. The flexibility afforded by this definition permits a wide range of VOC deployments, from a transparent system provided on behalf of a

Virtual Organization without any involvement of the VO itself, to a fully overlaid environment with a private scheduling and policy domain directly manageable by the VO.

Figure 5.2 depicts two VOCs dedicated to two separate VOs. In this case, VO1 is non-participating and thus chooses not to deploy VOCs or any related middleware. Nevertheless, Site

2 provides VO1 with a transparent VOC, so that all VO1 user jobs on Site 2 run within virtual machines. Simultaneously, VO2 chooses to utilize a VOC with a private overlay network. End user jobs from users affiliated with VO2 are submitted directly to the private head node owned by VO2, and VOC nodes are autonomically started on both sites in response to the size of the scheduler queue. VO2 user jobs are then privately scheduled and routed by means of an overlay network. It should be noted that the VOC Model is intentionally designed to be as technology-agnostic as possible, particularly with respect to the choice of virtualization system, network overlay, distributed filesystem, and grid middleware. Although the prototype implementation described in chapter 7 uses KVM for the virtual machine monitor, IPOP for the network overlay, PVFS for the distributed filesystem, and Globus for the purpose of connecting to the Open Science Grid, a cluster implemented according to this model should be deployable using any combination of services on any computational grid, provided the above constraints can be met. Table 5.1 lists a small set of possible services that could be used to construct a VOC.

Figure 5.2: Virtual Organization Clusters on a grid system. In this example, Site 2 transparently provides a VOC to VO1, which does not deploy its own VOCs. VO2 chooses to deploy VOCs and has a customized environment with private scheduling and resource control.

Table 5.1: Examples of virtualization systems, network overlays, and distributed filesystems that may be used to implement Virtual Organization Clusters. This table is for illustrative purposes and is not an exhaustive survey of available systems.

    Virtualization Systems    Network Overlays    Distributed Filesystems
    VMWare                    IPOP                Lustre
    Xen                       Violin              PVFS
    KVM                       ViNe                GlusterFS

5.4 Self-Provisioned Clouds

In traditional grid terms, Virtual Organization Clusters fill the role of Workload Management Systems from regular pilot infrastructures (section 4.3.2), in that they simplify the task of sending a job to a grid site for execution by providing a dedicated computational cluster for each

Virtual Organization. End users simply submit jobs to the VOC as they would to any other cluster connected to the grid. Since there is assumed to be a single VOC for each VO, the user is not required to select from a potentially large number of grid sites to which a job may be submitted.

Instead, all jobs associated with a particular VO are simply submitted to its corresponding VOC.

As the user is not required to have any specialized software to send jobs to the VOC, the VOC head node must be connected to the grid for accessibility via standard middleware. It should be noted, however, that the provisions of the VOC Model do not prohibit VOC implementations from providing alternate mechanisms for user job submission, such as Web portals, provided that jobs also can be received via standard middleware. Furthermore, users could run local instances of the VM used to create the VOC and submit jobs directly to the VOC scheduler, bypassing the grid entirely.

Virtual Organization Clusters are cloud computing systems, since they are customized overlay clusters that do not require dedicated hardware. In the same way that virtual machines permit cloud services to be hosted by remote data centers (section 3.6.2), VOCs enable dedicated clusters to be hosted using the physical fabric provided by grid systems (chapter 2). Virtual Organizations are provided custom environments with administrative access, without the associated equipment and infrastructure costs that accompany physical cluster systems. By utilizing the grid as a commodity source of computational infrastructure, smaller VOs that would otherwise be constrained by existing resources have access to the same level of environment customization available to larger VOs with dedicated infrastructure. In turn, these VOs can provide better service to end users by improving administrative responsiveness.

Unlike leasing systems that require users to make explicit resource reservations (section 4.1.1), Virtual Organization Clusters are autonomically self-provisioned. Implementations of VOCs must increase the size of the VOC to handle increased workloads automatically. Similarly, whenever workloads fall below some level specified by the management policy in a VOC implementation, the system must decrease the size of the VOC and return the excess resources to the grid. Regardless of the mechanism used to implement this dynamic sizing behavior (an example is presented in section 7.2), it must be autonomic and part of the VOC implementation.

Chapter 6

Simulator for Virtual Organization Clusters1

Testing new paradigms for distributed computing at scale can be complex and expensive when a single entity owns the computing system. When the system consists of a large collection of resources owned by federated entities, such as the Open Science Grid (OSG) or Enabling Grids for E-sciencE (EGEE), reserving the entire grid for disruptive large-scale tests becomes impossible.

Simulation testing provides a means by which large-scale tests may be run in a repeatable manner without the need to interfere with existing production applications [30]. When a research application requires substantial replacement of middleware, major changes to grid or site policy, or the addition of a common software stack to each grid site, simulation testing provides a means to evaluate the behavior of the proposed application prior to actual deployment.

One research area in which actual physical testing is challenging is that of grid architectures, where “architecture” refers to the combination of middleware, resources, and utilization methodology used to run end-user computational jobs across different grid sites. Since a major architectural change to a production grid, such as the system-wide addition of virtualization systems and virtual overlay clusters, could change the way the grid is used by its consumers, a priori assessment of the impacts of such a change is desirable from both research and management perspectives. Evaluation of the impacts of high-level architectural changes requires a high-level – or aggregate – assessment of a grid system as a single entity. Component parts of the system, such as the networking behavior and individual site scheduling algorithms, must be abstracted for reasons of simulation performance and result accessibility.

1Portions of this chapter have been published in [113]. Other portions have been submitted to the 24th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation (PADS 2010).

This chapter describes a novel discrete-event simulation system for assessing the aggregate impacts of changing the high-level architecture of an operational grid system. Specifically, the motivation for this simulation system is to describe the behavior of a production grid in which Virtual Organization Clusters (VOCs) [115] are utilized to execute all end-user jobs across all available hardware. Although VOCs are designed to be deployed non-disruptively with existing grid middleware, simultaneous grid-wide deployment would be impractical due to the inter-entity coordination that would be required to perform a large-scale test. Thus, the Simulator for Virtual Organization Clusters (SimVOC) has been developed to provide an evaluation environment in which actual grid trace data is utilized to instantiate dynamic simulated grids with replaceable architectures. By enabling repeatable experiments to be run from the same trace data, self-normalizing grid architecture experiments can be effected using a commodity workstation or laptop computer. SimVOC is available free of charge under an open-source license [3].

In the remainder of this chapter, the motivation for simulation is discussed in section 6.1. A short discussion of issues with available grid trace data for use as simulator input follows in section 6.2. Finally, the design of the SimVOC simulation system is presented in section 6.3. Results obtained from tests conducted using SimVOC are presented in chapter 8.

6.1 Motivation for Simulation

At a high level of abstraction, a typical production grid system has a “standard” architecture consisting of a collection of discrete computational resources to which end users may submit computational jobs. Access to these resources is controlled via policies specified by the entities – known as Virtual Organizations (VOs) – representing the resource providers. End users also organize themselves into VOs, typically according to research domains, and these consumer VOs obtain resource access by negotiating with the provider VOs. If an end user belongs to a consumer VO that is authorized to use resources on a site by the provider VO, then his or her jobs will be accepted for scheduling on the site [66]. However, acceptance of a job for scheduling at a site does not guarantee (or even predict) that the job will run correctly. Since each site may have its own software stack, a job may arrive at a site on which the necessary software libraries are not present to enable proper execution. It is therefore the responsibility of the end user to select compatible sites for job submission.

Virtualization of grid systems is a possible mechanism for improving the usability of the grid by improving the job compatibility of execution environments [57]. Several virtual grid architectures are available and are based on the concept of explicit resource leases. These systems include Shirako [83], Cluster-On-Demand [32], and Virtual Workspaces via the Globus Nimbus middleware [94]. While these systems provide fine-grained resource control and provide each user with a customized execution environment, they each require additional middleware to be installed both at the execution and job submission endpoints.

Virtual Organization Clusters (VOCs) represent a different architecture for grid computing that is compatible with existing submission endpoint middleware. Instead of executing end-user jobs directly on the computational fabric, virtual cluster environments are instantiated for each VO, and user jobs are scheduled and executed within the virtual environments [115]. Since VOCs are constructed from virtual machines (VMs), and VM instances can be spawned from a single image, VOC environments are nominally homogeneous (and therefore software compatible) across grid sites. VOCs are autonomically managed and self-provisioned without explicit resource reservation requests from end users [117]; users instead simply submit jobs directly to a dedicated virtual grid site provided by the associated VO. Different technologies can be utilized to implement VOCs, since a VOC is simply an implementation of a system that conforms to the specifications presented in the VOC Model [115]. Thus, multiple mechanisms for VOC creation are possible. In the simple case, VOCs can be constructed by using pilot jobs [143] to lease resources from schedulers on existing grid sites, provided user-level virtualization systems (such as KVM [129]) are installed. Alternatively, VOCs may be created by autonomically managing leases from Shirako, Cluster-On-Demand, Nimbus, Amazon EC2 [11], or another leasing provider. Evaluating the viability of grid-wide VOCs is the primary motivation for developing SimVOC.

Unlike existing simulation systems (section 4.2), SimVOC is designed for the purpose of observing the aggregate behavior of an existing grid system using commodity scheduling algorithms. As SimVOC is able to execute simulations both with and without the addition of virtual machines and extra middleware, differences between existing grid systems and grid systems with deployed Virtual Organization Clusters may be evaluated. Key features of the system include:

• SimVOC is able to model dynamic grids that change over time. The internal grid representation within SimVOC is not limited to a static snapshot. Instead, sites, clusters, and CPU cores can join or leave the grid at any time during the simulation.

• SimVOC is designed to track the aggregate effects of adding dynamic overlay clusters to a production grid. Support for dynamic overlays is provided with the core simulator distribution. Per-overlay and per-entity resource allocation policies are supported.

• Unlike existing simulation systems, SimVOC contains explicit support for pilot jobs.

• SimVOC models machine capabilities, such as support for hosted virtual machines and file access resources, for each physical machine and virtual machine in the simulation.

• The SimVOC distribution includes models for both interval schedulers, such as Condor and the Load Sharing Facility (LSF), and generic scheduling systems such as the Portable Batch System (PBS).

• SimVOC provides a simple grid-level metascheduler that enables jobs to be dynamically routed to simulated grid sites based on site-reported resource availability.

• High-level simulations of data file transfers over Wide Area Networks are supported.

• SimVOC includes facilities for job and scheduler fault injection and system fault modeling.

• Input files to SimVOC are provided to the simulator core in an easy-to-use text-based input format that is designed for human readability. Conversion filters for Enabling Grids for E-sciencE (EGEE) trace data sets (which may be obtained from the Grid Observatory [46]) are supplied with the SimVOC distribution.

• Output is produced by SimVOC using a flexible processing pipeline that allows results to be filtered, post-processed, aggregated, and directed into either plain text or SQLite database [4] files.

Following the Sulistio Taxonomy [152] (see section 4.2), SimVOC is a dynamic simulation system with continuous output capabilities. By providing a centralized random source with user seeding support, the serialized SimVOC kernel is able to provide both deterministic and non-deterministic execution, depending upon the requirements of each simulation experiment. While the SimVOC kernel is fundamentally event-driven, a subset of events directly corresponds to an input trace. Grid models used within the simulator are hybrid entities capable of both creating and receiving events. SimVOC is presented to simulation consumers as a modular, object-oriented Python library.

6.2 Grid Trace Data

SimVOC utilizes a simple text-based input format for grid map, job trace, and observed Virtual Organization information. Each input file is time-based, and inputs are read at simulation time whenever the simulator clock reaches the time stamp of the next input block. Thus, all inputs to the simulation system are completely dynamic. In particular, the grid map may change over time, with cluster sizes increasing and decreasing, and grid sites appearing and disappearing. Also, new VOs may appear later in the simulation, representing the addition of newly formed VOs to actual grid systems.

Although the input format to the simulator is simple enough that experimental grids could be designed by hand, it is expected that many simulations will utilize trace data mechanically converted from an existing production grid system. A mechanism has been developed to utilize trace data from the Grid Observatory [46], which provides monitoring and topological information about the Enabling Grids for E-sciencE (EGEE) production grid system. The input conversion mechanism included with SimVOC is also easily extended to other input formats.

6.2.1 Grid Map Information

Information about the presence and size of grid sites is obtained from the Grid Observatory using gLite [52] data collected by the GRIF/LAL (Groupe Grilles du LAL, part of the Grille de Recherche d’Ile de France) organization. These files are distributed as compressed tape archives (tar files) and are formatted in a modified LDAP Data Interchange Format (LDIF). Each LDIF entry in each base-level file provides information about entities observed on the grid, including grid sites and their associated processor (CPU) counts.

Since multiple grid sites may share the same hardware in order to keep the underlying resources fully utilized, the same CPUs are reported multiple times within the data set. Without any more detailed information about the underlying physical systems, it is necessary to apply some CPU count reduction heuristics when processing the input data, in order to reduce the amount of over-count. The first of these heuristics detects multiple Computing Elements (CEs, or grid sites) sharing the same underlying hardware by detecting duplicate sites with the same domain name and the same CPU count. A second heuristic is then applied to the processed data, in which all sites (which now have distinct CPU counts) on the same domain are presumed to be sharing the same hardware. The largest CPU count from the set of such sites is selected as the actual cluster size for the underlying physical system. Although this heuristic is trivially incorrect, since it is quite possible to have two different clusters from the same domain attached to the grid, the additional over-count reduction has been found to be beneficial (see section 8.2). Without these heuristics, the grid appears to be largely under-utilized, which is not the case [1].
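The two heuristics amount to a small amount of set arithmetic over the reported Computing Element records. The following Python sketch illustrates the idea; the record layout and function name are illustrative assumptions, not the actual SimVOC conversion code.

def reduce_cpu_overcount(ce_records):
    """Apply the two over-count reduction heuristics to a list of
    (site_name, domain, cpu_count) tuples describing Computing Elements."""
    # Heuristic 1: CEs on the same domain reporting identical CPU counts
    # are assumed to share hardware, so duplicate reports are collapsed.
    distinct = set((domain, cpus) for (_site, domain, cpus) in ce_records)

    # Heuristic 2: among the remaining distinct counts for a domain,
    # the largest CPU count is taken as the size of the shared cluster.
    cluster_sizes = {}
    for domain, cpus in distinct:
        cluster_sizes[domain] = max(cluster_sizes.get(domain, 0), cpus)
    return cluster_sizes


# Example: three CE records from one domain reduce to a single 64-core cluster.
records = [
    ("ce01.example.org", "example.org", 64),
    ("ce02.example.org", "example.org", 64),   # duplicate of ce01
    ("ce03.example.org", "example.org", 32),   # smaller CE on shared hardware
]
print(reduce_cpu_overcount(records))           # {'example.org': 64}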

6.2.2 Job Traces

Job trace data sets are obtained from the Grid Observatory as observed by the EGEE Real-Time Monitoring (RTM) system. These traces are provided in a compressed tar archive containing tab-delimited information fields. As the RTM system records jobs from the time of submission, the data set must be filtered to select only jobs registered as having completed, since a priori length information is needed for simulation purposes. Jobs are also filtered to select those with valid (nonzero) submission timestamps, which further reduces the number of jobs from the trace that are actually usable in the simulator. It should be noted that the contents of the (unfiltered) job traces do not agree with the published grid utilization information from available accounting systems [1], for unknown reasons. The RTM trace sets always contain fewer jobs than are reported by the accounting systems, and only a fraction of jobs in the RTM traces have valid timestamps.
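The filtering step itself is straightforward: only records marked as completed and carrying a nonzero submission timestamp are kept. The sketch below assumes a hypothetical tab-delimited layout with named columns; the real RTM field names and status values may differ.

import csv

def usable_jobs(trace_path):
    """Yield RTM trace records that are usable for simulation: the job must be
    registered as completed and must carry a nonzero submission timestamp.
    Column names here are hypothetical placeholders for the real RTM fields."""
    with open(trace_path) as trace:
        reader = csv.DictReader(trace, delimiter="\t")
        for record in reader:
            completed = record.get("status") == "Done"
            has_timestamp = int(record.get("submit_time", "0") or 0) > 0
            if completed and has_timestamp:
                yield record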

6.3 Simulator Design

SimVOC is fundamentally designed to model dynamic grids that change over time, with dynamic overlay clusters (Virtual Organization Clusters) allocated using pilot jobs to obtain simulated physical resources. Abstract scheduling facilities are provided, which model common cluster scheduling systems at a high level. The simulation system is divided into several top-level components (figure 6.1): the simulator core, input format handlers, simulation drivers, and utilities. The core contains the simulation kernel and objects that work with the kernel to implement simulated grid computing technologies including schedulers, file caches, compute nodes, and virtualization systems. In addition, the core also provides the mechanisms needed to monitor the simulations and to collect results. To provide input data for the simulations, the input format handlers parse files in various formats to extract information about the architecture of grid systems to be modeled, as well as job-specific data such as simulation time and run length. Raw input files are preprocessed into intermediate files using utility programs; driver programs are then used to join these intermediate files with the simulator core to effect simulation experiments.

Figure 6.1: High-level components of SimVOC

6.3.1 Simulation Kernel

SimVOC utilizes a generic, reusable discrete-event simulation kernel as the central mechanism for dispatching events and receiving results. This kernel is dependent only upon other classes and functions within the same module and modules available in the standard Python library. As such, the simulation kernel can be reused easily in other simulation projects implemented in the Python language. Central to the kernel is a main event loop, which runs until the kernel event queue is empty. At execution time, each event invokes a function call, the return value of which (if any) is directed to the simulator output processing pipeline. Periodic trace information is also collected during event execution.

Simulator events are flexible enough to adapt to a wide variety of requirements. However, in practice, three types of events are used frequently: regular events, external events, and periodic events. Regular and external events have a flag set in the simulator kernel, which causes the total number of these types of events to be counted separately. In contrast, periodic events do not set this flag. As the name suggests, periodic events are designed to support components that need to re-schedule events on a regular basis; by detecting the absence of regular events in the event queue, these periodic processes determine when the end of simulation input has been reached. To prevent infinite event loops, it is necessary for periodic client code to stop re-scheduling periodic events once the simulator has reached an empty steady state. Among non-periodic events, the only distinction between a regular event and an external event is that the current time and a reference to the simulator object are passed to the function invoked by a regular event, while an external event invokes an external function that is not necessarily simulator-aware.
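A minimal event loop capturing the behavior described above is sketched below. The class and method names are illustrative and do not reproduce SimVOC's actual kernel interface; the periodic component simply stops re-scheduling itself once no regular events remain, which is the convention the text describes.

import heapq

class Kernel:
    """Minimal discrete-event kernel: events are kept in a priority queue
    ordered by time, and the loop runs until the queue is empty."""
    def __init__(self):
        self._queue = []
        self._seq = 0            # tie-breaker for events at the same time
        self.now = 0
        self.regular_pending = 0
        self.output = []

    def schedule(self, delay, action, periodic=False):
        if not periodic:
            self.regular_pending += 1
        self._seq += 1
        heapq.heappush(self._queue, (self.now + delay, self._seq, action, periodic))

    def run(self):
        while self._queue:
            self.now, _, action, periodic = heapq.heappop(self._queue)
            if not periodic:
                self.regular_pending -= 1
            result = action(self)            # regular events receive the kernel
            if result is not None:
                self.output.append(result)   # return values feed the output pipeline


def job_event(kernel):
    return ("job finished", kernel.now)

def watchdog(kernel):
    # Periodic event: keep polling until no regular events remain in the queue.
    if kernel.regular_pending > 0:
        kernel.schedule(10, watchdog, periodic=True)

kernel = Kernel()
kernel.schedule(0, watchdog, periodic=True)
kernel.schedule(25, job_event)
kernel.run()
print(kernel.output)   # [('job finished', 25)]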

6.3.2 Job Model

All jobs in the SimVOC system are uniquely identifiable objects containing methods for starting, stopping, and killing themselves. At start time, regular jobs schedule their own completions using a priori length information encoded in the job object. In addition, jobs may experience random errors, which are determined at start time using the centralized simulator random source and a specified error probability. Full job records are recorded in the simulation output at completion time. Completion of a job, with error status, may be effected at any time prior to scheduled completion by invoking the job kill method, which simulates a system-induced abnormal termination.

In contrast to regular jobs, pilot jobs [143] are a special subclass of jobs that do not have a priori length information. Instead, these jobs invoke a callback function at start time. The callback function initiates a simulated process representing a workload transmitted to the remote system executing the job. Whenever the process is complete, it notifies the pilot job via a termination method, which schedules job completion. Random pilot job initiation errors may be injected at start time, inhibiting execution of the callback function. Execution of the kill method on a pilot job results in transmission of a kill signal to the external process via the callback function, simulating the procedure by which processes are killed on Unix systems. Pilot job simulation is a key distinguishing feature of SimVOC.
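The distinction between regular jobs and pilot jobs can be sketched with two small classes. These are illustrative stand-ins, not SimVOC's actual job API; in particular, the start callback here merely prints a message where a real implementation would boot a VOC node.

class RegularJob:
    """A job with a priori length: at start time it schedules its own completion."""
    def __init__(self, job_id, length):
        self.job_id, self.length = job_id, length

    def start(self, schedule, now):
        # schedule(delay, action) is assumed to be supplied by the event kernel
        schedule(self.length, lambda: print("job", self.job_id, "done at", now + self.length))


class PilotJob:
    """A job with no a priori length: starting it fires a callback that launches
    a remote workload, and the workload later signals completion (or is killed)."""
    def __init__(self, job_id, start_callback):
        self.job_id = job_id
        self.start_callback = start_callback
        self.done = False

    def start(self, schedule, now):
        self.start_callback(self)          # e.g. boot a virtual machine

    def workload_finished(self):
        self.done = True                   # remote process notifies the pilot

    def kill(self):
        # Simulates SIGKILL-style termination of the external process.
        print("pilot", self.job_id, "killed; signalling external process")


# Hypothetical usage: the callback would normally boot a VOC node.
pilot = PilotJob("pilot-1", lambda job: print("booting VM for", job.job_id))
pilot.start(schedule=None, now=0)
pilot.workload_finished()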

6.3.3 Resource Representations

Within the simulation system, machines comprise the lowest level of granularity. For simplicity, each machine is assumed to contain a single CPU core, such that there is always a one-to-one correspondence between machines and cores. Each machine can be claimed by a single scheduler to run at most one job; however, multiple schedulers may target a single machine. Machines can be shut down, at which point they no longer permit job execution. Furthermore, machines are extensible through the addition of capabilities, which are named references to objects that provide specific functionality, such as simulated Virtual Machine Monitors (or hypervisors). At job execution time, machines may dynamically adjust the length of the job to simulate the addition of execution overhead introduced by various software stacks. This property is used by the Virtual Machine subclass to simulate virtualization overheads using adjustable overhead bounds, which is a novel capability of the system.
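The machine abstraction and the virtualization-overhead adjustment can be sketched as follows. The class names and the uniformly drawn overhead factor are assumptions for illustration; the model only requires that a virtual machine stretch job lengths within configurable bounds.

import random

class Machine:
    """Single-core machine that can hold named capabilities and run one job."""
    def __init__(self, name):
        self.name = name
        self.capabilities = {}     # e.g. {"hypervisor": ..., "file_cache": ...}
        self.busy = False

    def adjust_length(self, length):
        return length              # physical machines add no overhead


class VirtualMachine(Machine):
    """Machine subclass that stretches job lengths to model virtualization
    overhead, drawn uniformly between configurable bounds."""
    def __init__(self, name, overhead_bounds=(0.01, 0.10)):
        super().__init__(name)
        self.overhead_bounds = overhead_bounds

    def adjust_length(self, length):
        low, high = self.overhead_bounds
        return length * (1.0 + random.uniform(low, high))


vm = VirtualMachine("voc-node-01")
print(round(vm.adjust_length(600.0), 1))   # a 600 s job runs roughly 606-660 s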

Clusters consist of collections of machines, along with their associated schedulers. Each cluster may be resized from the top down, using methods to create or remove machines in a dynamic simulation. Virtual Organization Clusters are implemented by subclassing the base cluster definition, using pilot jobs to lease regular machines for the purpose of spawning virtual machines. In the process of instantiating virtual machines, pilot jobs must query the physical machine for a virtualization capability, request the virtual machine image from an associated file cache (obtained from a scheduler resource broker, described in the next section), and then instruct the virtual machine image to boot. Once booted, the new virtual machine joins the parent VOC and its scheduler pool. At any step within the boot process, faults may be simulated by making resources or host machine capabilities unavailable, resulting in abnormal pilot job termination. Although the VOC component implements the pilot job mechanism, a separate component – called a watchdog [117] – implements the autonomic VOC management policy.

Data files, including virtual machine images, are simulated by specifying the file size in bytes. Abstract network components simulate transfers of these files from stores (servers) to caches (clients) by computing the transfer time as a function of file size and link speed. At higher levels within the simulation, site objects collect resources to simulate grid compute elements, which primarily provide a named interface to a single scheduler on a cluster. A grid object forms the top level of the simulation, which collects site objects, receiving and routing jobs from the job trace input mechanism.

6.3.4 Scheduling Models

Two primary scheduling models are implemented within the simulator: greedy first-come, first-served scheduling, and interval scheduling with machine property matching. The former type of scheduling is non-periodic except when no free scheduler slots are available, at which time the scheduler periodically re-checks its targets in case a machine becomes available. Since multiple schedulers may target the same machines, there is no guarantee that a machine will be available to any one scheduler at any given time. The second type of scheduling is implemented as a purely periodic scheduler, which only performs property-based matching of jobs to machines at regular scheduling intervals. This scheduler is designed to simulate the Condor [155] High Throughput Computing system, with simplified ClassAds.

Another type of scheduler, known as a metascheduler, is available at the grid level. This scheduler does not start jobs directly on machines. Instead, the metascheduler selects a grid site to which the job should be submitted, based solely upon a ranking algorithm designed to favor sites with at least one free core. When grid jobs are submitted to the metascheduler, they are immediately submitted to the best-ranked site for execution. Should a job fail to start on a particular site within a given period of time, the metascheduler will cancel the job submission and re-submit the job to a different site. As jobs are sent to sites, the metascheduler utilizes a circular queue to load-balance submissions among all sites on the grid.
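A simplified version of the metascheduler's site-selection logic might look like the sketch below, which favors the first site in a circular rotation that reports a free core. The data structures and ranking rule are illustrative assumptions rather than the exact SimVOC algorithm, and re-submission after a timeout is omitted.

from collections import deque

class MetaScheduler:
    """Routes grid jobs to sites, favoring sites that report a free core and
    load-balancing submissions with a circular queue of sites."""
    def __init__(self, sites):
        # sites: mapping of site name -> reported number of free cores
        self.free_cores = dict(sites)
        self.rotation = deque(sites)

    def select_site(self):
        # Walk the circular queue once, preferring the first site with a free core.
        for _ in range(len(self.rotation)):
            site = self.rotation[0]
            self.rotation.rotate(-1)       # next submission starts at the next site
            if self.free_cores.get(site, 0) > 0:
                return site
        # No site reports a free core; fall back to the next site in rotation.
        return self.rotation[0]

meta = MetaScheduler({"site-a": 0, "site-b": 3, "site-c": 1})
print([meta.select_site() for _ in range(4)])   # ['site-b', 'site-c', 'site-b', 'site-c']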

In addition to providing scheduling services, the schedulers in SimVOC also provide resource brokering, local statistics collection, and fault simulation support. Resource brokering permits a scheduler to provide a named interface to an object providing a specific resource, such as a file cache, that is to be made available to running jobs. Statistics collection routines are implemented as a set of self-aggregations that maintain continuous counts of jobs waiting, running, completed, and in an error condition, both scheduler-wide and for all Virtual Organizations represented by all jobs observed by the scheduler. When a target machine is abruptly terminated, as is the case when a pilot job running a virtual machine is killed, scheduler fault behavior is reproduced by allowing the scheduler to attempt to claim machines that are no longer available. When such faults occur, the schedulers must recover from the fault by selecting a different target.

6.3.5 Input and Output

Inputs to SimVOC are provided in a simple, extensible text input format that is designed for both human accessibility and simple machine generation. Three categories of input are provided: grid map data, job traces, and Virtual Organization records. Grid maps specify the sizes of clusters at any given time in the simulation, as well as the names of sites and schedulers that utilize the clusters. Job traces provide submission time, job name, VO affiliation, length, and (optionally) observed queuing delay information for each job in the trace. Virtual Organization records provide the names of each VO observed in the job trace data, along with optional policy configurations used by the watchdog component in VOC simulations. All three input types are time-oriented, and time-specific blocks of input are read only when the simulator clock reaches the time specified in the block.

SimVOC utilizes a pipeline model for simulator output processing. Pipeline components include filters, duplicate data reduction mechanisms, data aggregation components, and file output handlers. The primary file format utilized by the initial release of SimVOC is the SQLite database format [4], version 3. External scripts, supplied with the SimVOC distribution, are used to query the database and extract data for analysis purposes. In addition, external programs may read the SQLite database directly to generate plots or perform statistical analysis.
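Because the primary output format is an SQLite database, result extraction reduces to a few lines of Python with the standard sqlite3 module. The table and column names below are hypothetical, since the actual SimVOC schema is not described here.

import sqlite3

def summarize_by_vo(db_path):
    """Print per-VO job counts and mean wait/sojourn times from a results
    database. The 'jobs' table and its columns are hypothetical placeholders
    for whatever schema a SimVOC output pipeline actually writes."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT vo, COUNT(*), AVG(wait_time), AVG(sojourn_time) "
            "FROM jobs GROUP BY vo ORDER BY vo")
        for vo, count, mean_wait, mean_sojourn in rows:
            print("%s: %d jobs, mean wait %.0f s, mean sojourn %.0f s"
                  % (vo, count, mean_wait, mean_sojourn))
    finally:
        conn.close()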

Chapter 7

Prototype Implementation

In order to evaluate the actual components required to implement Virtual Organization Clusters, and to measure the overheads introduced by virtualization, a physical cluster was constructed for use as a test bed. Several prototype installations were performed using different software stacks, so that the cluster could be adapted to changing research needs. In the initial configuration, the prototype cluster utilized a minimal installation of Slackware Linux 12 on the hardware, placing the end-user software stack in a set of CentOS 5.1 virtual machines. When it was later determined that a 64-bit host operating system was required for memory block allocation reasons, CentOS 5.2 was installed on the hardware. Over time, both the host and guest operating systems were upgraded to CentOS 5.3.

Several system management challenges were encountered during the test bed installations and maintenance operations. Due to the heterogeneity of the compute nodes and storage nodes within the same cluster, existing batch administration tools were difficult to use. Regular cluster services, such as the Dynamic Host Configuration Protocol (DHCP) and Domain Name System (DNS) servers, required duplication of configuration information, resulting in errors whenever configurations required adjustment. To address these problems, a cluster configuration tool called Stoker was developed to centralize all configuration settings in a Lightweight Directory Access Protocol (LDAP) database. For management of ephemeral virtual machine instances, a second tool – named Pulley – was envisioned to deliver configuration settings whenever a system became available. Due to the development of commercial applications for performing similar management tasks in virtualized data centers [170], the Pulley project was discontinued.

Both the prototype implementation and management systems are discussed in the remainder of this chapter. Section 7.1 describes the initial cluster configuration with Slackware Linux 12 hosts and CentOS guests. Section 7.2 discusses the process of refining the cluster services to support dynamically provisioned Virtual Organization Clusters. Finally, section 7.3 presents system management issues, Stoker, and Pulley.

7.1 Cluster Setup1

An initial cluster implementation was performed to test the Virtual Organization Cluster Model. This section presents in detail the procedure (figure 7.1) that was followed to set up the physical cluster, configure the physical fabric to support virtualization, and construct the VOC itself. The physical test cluster used the Kernel-based Virtual Machine (KVM) hypervisor, which was installed on physical hosts running Slackware Linux 12. A Virtual Organization Cluster was constructed around a single virtual machine image into which CentOS 5.1 had been installed. In this particular implementation, the head node for the VOC was provided as part of the physical fabric, even though it was actually implemented inside a virtual machine. This head node was connected to the Open Science Grid Integration Testbed.

7.1.1 Physical Cluster Construction

The hardware cluster for the test installation consisted of sixteen nodes: fifteen Dell PowerEdge 860 1U rackmount systems, and one Dell PowerEdge 2970 2U rackmount server. One PowerEdge 860 system was employed to host the VOC head node, while the other fourteen were each prepared to host two VOC virtual compute nodes. Each PowerEdge 860 machine used in this test was configured with a 2.66 GHz dual-core Intel Xeon CPU, 4 GiB of RAM, and an 80 GB hard disk drive. The 2U PowerEdge 2970 server was employed to host installation images, user home directories, network services, and a shared VM image store exported via a Network File System server. Prior benchmarks and considerable network test results [116] were obtained using a previous CentOS 5.0 installation on the same hardware.

To provide hypervisor services, the Kernel-based Virtual Machine (KVM) was installed on each compute node and on the physical head node. KVM was chosen primarily due to its compatibility with the most recent kernel release at cluster construction time, as the most recent drivers were needed for optimal performance of certain hardware components.

1The contents of this section have been published in [115] and [55].

Figure 7.1: Process of constructing the original prototype physical cluster and hosted Virtual Organization Cluster. The VOC represented an independent administrative domain and was constructed separately.

Network access was provided to each virtual machine by means of bridging the physical Ethernet card in each physical compute node both to the physical node itself and to each guest machine (two guests per host). Thus, three logical devices shared each physical device. MAC addresses were assigned to each VM instance by KVM, using a custom script to generate the MAC addresses deterministically based on the host machine. On the physical head node, two separate bridges were employed: one to the cluster’s private LAN, the other to the University network and public Internet. Network Address Translation and iptables firewalls were implemented on both the physical head node and utility system, allowing them to serve as edge routers for the entire private LAN. Since each VM obtained an IP address from the DHCP server, and each IP address was part of the same subnetwork without regard to physical or virtual host status, each VM instance had both Internet access and local connectivity to other VMs in the VOC.
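The deterministic MAC-assignment script itself is not reproduced in this document; the sketch below only illustrates the idea of deriving a repeatable address from the host name and guest slot. The 52:54:00 prefix is the conventional locally administered QEMU/KVM prefix, and the hashing scheme is an assumption, not the script actually used on the prototype cluster.

import hashlib

def guest_mac(hostname, guest_index, prefix="52:54:00"):
    """Derive a repeatable MAC address for a guest VM from its host's name.
    52:54:00 is the conventional QEMU/KVM locally administered prefix; the
    hashing scheme is an illustrative choice."""
    digest = hashlib.md5(("%s-%d" % (hostname, guest_index)).encode()).hexdigest()
    suffix = ":".join(digest[i:i + 2] for i in range(0, 6, 2))
    return "%s:%s" % (prefix, suffix)

# The same host and slot always yield the same address.
print(guest_mac("node07.cluster.example.edu", 0))
print(guest_mac("node07.cluster.example.edu", 1))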

In the test cluster, a common VOC head node was provided as part of the PAD. For administrative simplicity, this CentOS 5.1 node was implemented as a virtual machine that was bridged to the public Internet. To supply job scheduling, Condor 7.0.0 was installed on the shared VOC head node as well as on all compute nodes. Open Science Grid Integration Testbed membership was achieved by installing the OSG Virtual Data Toolkit (VDT) and connecting to OSG by configuring an OSG compute element. The compute element that ran Globus GRAM was set up as a shared head node for both the physical and virtual compute nodes. Differentiation between the PAD and VAD was done through the attributes advertised by each compute node’s Condor startd. This setup, shown in figure 7.2, provided a transparent VOC to the Engage VO.

Figure 7.2: Initial Test Cluster Setup. Each physical compute node hosted a VOC node as a virtual machine, using the KVM hypervisor. The single VM image file resided on a network filesystem that was provided by other hardware connected to the cluster. Job scheduling in the transparent VOC was performed using Condor.

7.1.2 Virtual Cluster Construction

Constructing the Virtual Organization Cluster for the test system was a straightforward task, since only one virtual machine image was required to implement the whole VOC. CentOS 5.1 was installed into a VM image, then Condor was installed and configured to run a startd process to enable jobs to be scheduled. The VM image was configured for DHCP networking, and the primary assumptions made about the underlying Physical Administrative Domain were that jobs would arrive via the Condor scheduler and that the KVM hypervisor would be used to execute the VMs.

As a result of the hardware emulated by KVM, implicit low-level requirements were imposed upon the VOC system. In practice, these requirements were not substantial, since the Linux system used in the VOC was generic enough to support the emulated hardware. However, a different choice of guest operating system might have required additional configuration steps for the VO administrator. In particular, KVM could execute only 32-bit, x86-compatible operating systems.

7.2 Dynamic Provisioning of Virtual Organization Clusters2

Virtual Organization Clusters are configured and started on the physical compute fabric by middleware installed in the Physical Administrative Domain. Such middleware can either receive a pre-configured virtual machine image (or pair of images) or provision a Virtual Organization Cluster on the fly using an approach such as VMPlants [99] or installation of nodes via virtual disk caches [120]. Middleware for creating VOCs can exist directly on the physical system, or it can be provided by another (perhaps third-party) system. To provide participating VOs with full administrative access, VM images also can be created manually and uploaded to the physical fabric with a grid data transfer mechanism.

Once the VM image is provided by the VO to the physical fabric provider, instances of the image can be started to form virtual compute nodes in the VOC. Since only one VM image is used to spawn many virtual compute nodes in efficient implementations, the image must be read-only. Run-time changes made to the image are stored in RAM or in temporary files on each physical compute node and are thus lost whenever the virtual compute node is stopped. Since changes to the image are non-persistent, data corruption is not an issue, and VM instances started in this way can be safely terminated without regard to the machine state. As an example, VM instances started with the KVM hypervisor are abstracted on the host system as standard Linux processes.

These processes can be safely stopped (e.g. using the SIGKILL signal) instantly, eliminating the time required for proper operating system shutdown in the guest. Since there is no requirement to perform an orderly shutdown, no special termination procedure needs to be added to a cluster process scheduler to remove a VM from execution on a physical processor.

2The contents of this section have been published in [117].

Figure 7.3: A watchdog process monitored the size of the job queue on the shared gatekeeper node (1), booting virtual machines to expand the size of the Virtual Organization Cluster as necessary (2-4). Jobs from the Condor queue on the shared gatekeeper were executed on the VOC nodes as they became available (5). This process was repeated until either the VOC was large enough to run all the jobs (6-11) or no more physical hosts were available to run VOC nodes.

Once mechanisms are in place to reserve physical resources and start VMs, entire virtual clusters can be started and stopped by the physical system. VOCs thus can be scheduled on the hardware following a cluster model: each VOC would simply be a job to be executed by the physical cluster system. Once a VOC is running, jobs arriving for that VOC can be dispatched to the VOC. The size of each VOC could be dynamically expanded or reduced according to job requirements and physical scheduling policy. Multiple VOCs could share the same hardware using mechanisms similar to sharing hardware among different jobs on a regular physical-level cluster.

In order to provision Virtual Organization Clusters dynamically, grid-enabled middleware was developed. This middleware enables physical systems to host VOCs that are connected to the Open Science Grid. Grid jobs arrive via Globus [64, 66] and are deposited in a job queue on a gatekeeper system, where the Condor job scheduler [155] runs. A watchdog process (illustrated in figure 7.3) periodically samples the Condor queue and starts virtual machines that belong to the VO with which each job is associated. As the number of jobs in the queue for a particular VO increases, the watchdog will attempt to start additional VMs to increase the size of the respective VOC, subject to the limitations imposed by the hardware and by site policy. When the watchdog observes fewer jobs in the queue than there are executing VOC nodes for a particular VO, VMs belonging to the VOC are terminated. This process ensures that provisioning of physical resources dynamically adapts to the job loads of the VOs supported by the grid site, without any manual intervention or need for external middleware.

7.2.1 Job Tagging

Upon the arrival of a grid job, a corresponding Condor job is created by Globus. The dynamic VOC system modifies the Condor ClassAd to add a requirement that the target system match the VO with which the grid job is associated. Within the ClassAd, this additional requirement takes a simple name-equals-value form, in which the name of the VO is prefixed to form the attribute name, while the Boolean condition of truth is used as the value.

As an example, the Requirements field for a job associated with the Engage VO, targeting a 32-bit Linux system, might have the form:

Requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (VO_ENGAGE == TRUE)

One requirement imposed upon each VOC compute node VM is that its Condor interface expose a corresponding ClassAd field to match the VO. Thus, a VOC node that is owned by the Engage VO should include the following field in its ClassAd:

VO_ENGAGE = TRUE

With this mechanism, the Condor scheduler will handle the matching process of a grid job to a corresponding VM belonging to an associated VO. The dynamic provisioning middleware, however, is still responsible for starting the VM. Only after a VM has booted and joined the pool does Condor proceed to run the job.

7.2.2 VOC Management

Management of the VM instances comprising the individual VOCs is performed entirely by the watchdog, using the Condor queue as its data source. As the watchdog observes an increasing number of jobs associated with a particular VO, it attempts to start additional VMs in order to increase the size of the corresponding VOC. The maximum number of available slots in which to start VMs is known to the watchdog, and it will not exceed the number of slots pre-specified by the system administrator. Furthermore, the total number of slots available across the physical fabric at a single site may be further subdivided among the VOs supported by that site. Thus, administrators can enforce flexible site policies to balance resource usage among different VOs.

Whenever the watchdog observes that a single VOC has more running VMs than there are jobs belonging to the corresponding VO in the queue, the watchdog will reduce the size of the VOC by terminating VMs that are not claimed by Condor. Since the VOC model [115] specifies the use of a single read-only disk image to spawn all VMs in a VOC, termination of VMs is accomplished instantly by means of killing the virtual machine monitor process. Once a VM has been terminated, its slot is re-claimed by the watchdog and added to the pool of unclaimed physical Condor slots.
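The watchdog's sizing decision reduces to comparing queued jobs against running VOC nodes under a per-VO slot cap. The following sketch captures that policy; the function name and the simple accounting are assumptions rather than the production watchdog code.

def watchdog_cycle(queued_jobs, running_vms, max_slots, free_slots):
    """Return (vms_to_boot, vms_to_kill) for one VO at one watchdog interval.

    queued_jobs -- idle jobs in the Condor queue belonging to this VO
    running_vms -- VOC nodes currently running for this VO
    max_slots   -- per-VO cap configured by the site administrator
    free_slots  -- physical slots currently unclaimed on the site
    """
    desired = min(queued_jobs, max_slots)
    if desired > running_vms:
        # Grow the VOC, limited by the slots actually available right now.
        return min(desired - running_vms, free_slots), 0
    # Shrink the VOC; only unclaimed VMs would actually be terminated.
    return 0, running_vms - desired


print(watchdog_cycle(queued_jobs=10, running_vms=2, max_slots=8, free_slots=4))  # (4, 0)
print(watchdog_cycle(queued_jobs=1, running_vms=5, max_slots=8, free_slots=0))   # (0, 4)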

7.3 Management of Systems3

In order to instantiate Virtual Organization Clusters, underlying physical hardware is needed to run virtualization software and host the VOC nodes. While this hardware could be owned by the Virtual Organizations themselves, a cloud computing model would favor the use of specialized hosting providers that make physical fabric resources available on a grid system. Each hosting provider would thus be responsible for the acquisition and maintenance of all the hardware necessary to construct physical clusters, including the computing systems, local storage, network systems, power conditioning and distribution, and cooling infrastructure. Physical site administrators would be responsible for local site policies and maintenance tasks, but their administrative responsibility would extend only to the physical fabric at the specific site. Thus, each individual physical site would be a separate administrative domain, known as a Physical Administrative Domain (or PAD).

A simplifying assumption made about each physical site is that physical resources are devoted exclusively to hosting VOCs: no scientific jobs are executed directly on the hardware. As a result, the software set required on each compute node is minimal and consists of the host operating system, virtualization applications, and any needed driver software to support installed hardware. Standard networking technologies, such as the Dynamic Host Configuration Protocol (DHCP) and Domain Name System (DNS), are employed in a centralized manner to enable connectivity between systems, while simultaneously maintaining the scalability of the physical cluster as a whole. By centralizing host-specific settings, such as the IP address and host name of each compute node, management inside the PAD is simplified.

3The contents of this section have been published in [114].

One issue that does arise when using standard networking services is the replication of identical information across different services. For example, if fixed IP and host name assignments are to be made using DHCP, then the same mapping of host names to IP addresses must be entered into the DNS records for resolution of host information to work correctly within the system. One solution for avoiding this replication is to centralize the duplicate information, using an external database such as a Lightweight Directory Access Protocol (LDAP) server. Services utilizing the information in the database are then configured to use the external database directly, provided the services have the required integration capability. Otherwise, the services are adapted, through the use of middleware layers, to update their local configurations from the centralized database upon request.

7.3.1 Stoker

Stoker is a scalable remote management tool, whose overall architecture is shown in figure 7.4. Stoker differs from prior management tools such as Tentakel [156] and the Cluster Command and Control Suite [58] in that it can obtain system information directly from a centralized database with system grouping capabilities, thereby avoiding replication of host information in a configuration file. Stoker also has an extensible, modular design with three major components: warehouses, core, and actors. These components handle, respectively, retrieving configuration information for a node or group of nodes, spawning and joining actor threads, and performing some type of action on the target nodes.

Stoker Warehouses

Stoker can use multiple data sources or warehouses to gather data about target nodes. Central to the warehouse data retrieval task is the concept of a resolver. Resolvers take as input a logical node or group name and return a data structure containing addressing information potentially including, but not limited to, hostnames, IP addresses, and MAC addresses. Stoker’s grouping feature allows machines to be organized into arbitrary groups and subgroups for the convenience of the administrators.

Figure 7.4: Overview of Stoker. Multiple configuration warehouses may be used to provide target machine information, including LDAP databases, MySQL databases, and text configuration files. Stoker uses SSH to run commands on machines described by these warehouses.

A separate resolver is required for each type of data warehouse. Resolvers currently exist for LDAP databases, MySQL databases, and plain text files. An administrator wishing to implement a new resolver needs only to write a small Python class that retrieves the information from the warehouse and inserts it into a simple data structure defined by Stoker.
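As an illustration of how small such a resolver can be, the sketch below resolves targets from a plain-text warehouse. The file layout and the Node record are hypothetical stand-ins for the data structure that Stoker actually defines.

class Node(object):
    """Hypothetical stand-in for Stoker's per-target record."""
    def __init__(self, name, ip_address, mac_address=None):
        self.name, self.ip_address, self.mac_address = name, ip_address, mac_address


class TextFileResolver(object):
    """Resolve node or group names from a warehouse file whose lines take the
    form 'name ip [mac] [group1,group2,...]'."""
    def __init__(self, path):
        self.nodes, self.groups = {}, {}
        with open(path) as warehouse:
            for line in warehouse:
                fields = line.split()
                if not fields or fields[0].startswith("#"):
                    continue
                node = Node(fields[0], fields[1], fields[2] if len(fields) > 2 else None)
                self.nodes[node.name] = node
                for group in (fields[3].split(",") if len(fields) > 3 else []):
                    self.groups.setdefault(group, []).append(node)

    def resolve(self, target):
        # A target may name a single node or a whole group of nodes.
        if target in self.groups:
            return list(self.groups[target])
        return [self.nodes[target]] if target in self.nodes else []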

Stoker Core

A design goal of the Stoker core was to encapsulate the complexity inherent in a multithreaded application, thus allowing a system administrator to extend or create, with minimal effort, new warehouses and actors to meet his or her needs. The primary function of the core is to invoke resolvers and spawn an actor thread for each target. The core then manages each thread, collecting any output from the actor, and generates an activity report for the user.

Since the core does not have a priori knowledge of whether or not a user-supplied target is a single node or a group of nodes, it must be flexible enough to handle a situation in which the user inadvertently specifies a target multiple times. For example, if a user toggles a configuration option on two groups of machines with overlapping membership, the core must ensure that the configuration option is toggled exactly once on each system.

Stoker Actors

Stoker actors are analogous to Stoker warehouses in the sense that they are (potentially) simple scripts that perform a simple task at the direction of the core. Actors execute in their own threads and have well-defined data structures that specify all known information about a single target. The actor’s task is to apply its argument, also provided by the core, to its target. Actors supplied by the core Stoker distribution perform such tasks as remote command invocation via SSH, ping, Wake-on-LAN, and local (relative to the user’s machine) command invocation. New actors are easily created by the system administrator to perform new functions.

7.3.2 DHCP and DNS Middleware

When managing physical clusters, it is convenient to have a single repository for all node-specific information. Unfortunately, many software packages do not include support for a centralized configuration repository, requiring instead a product-specific configuration file. To eliminate needless data duplication and possible inconsistencies between services that share data, database-aware middleware was developed (illustrated in figure 7.5) using a Lightweight Directory Access Protocol (LDAP) server as the data source. As a proof of concept, two network services were adapted to use this middleware: ISC dhcpd, a Dynamic Host Configuration Protocol (DHCP) server; and dnsmasq, a Domain Name System (DNS) server.

The middleware is implemented as a Python script that accesses the LDAP database and writes a configuration file for the given software package. This script is then integrated with the service initialization scripts so that every time the software package is started or restarted, the configuration file will be regenerated from the LDAP database, ensuring that any changes will be propagated throughout the system.
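A condensed sketch of this pattern, using the python-ldap module, is shown below: query the directory, then render dhcpd.conf host entries. The directory attributes (cn, macAddress, ipHostNumber) and the anonymous bind are assumptions, since the schema used on the prototype cluster is not given here.

import ldap  # python-ldap module

def write_dhcp_hosts(uri, base_dn, output_path):
    """Regenerate fixed-address host entries for dhcpd.conf from LDAP records.
    The object class and attribute names are illustrative assumptions."""
    conn = ldap.initialize(uri)
    conn.simple_bind_s()                      # anonymous bind for simplicity
    results = conn.search_s(base_dn, ldap.SCOPE_SUBTREE, "(objectClass=ipHost)",
                            ["cn", "macAddress", "ipHostNumber"])
    with open(output_path, "w") as config:
        for _dn, attrs in results:
            name = attrs["cn"][0].decode()
            mac = attrs["macAddress"][0].decode()
            ip = attrs["ipHostNumber"][0].decode()
            config.write("host %s {\n"
                         "  hardware ethernet %s;\n"
                         "  fixed-address %s;\n"
                         "}\n" % (name, mac, ip))
    conn.unbind_s()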

Figure 7.5: Automatic integration of Stoker with DHCP and DNS services when an LDAP database is available as a warehouse

7.3.3 Management of Virtual Machines4

Unlike the physical fabric systems, which consist of single, isolated sites, the Virtual Organization Clusters potentially execute on arbitrary sites, or even across physical sites. As a result, it is not practical to assume that virtual machines will necessarily always be directly reachable, since physical sites may be constructed using private networks with Network Address Translation (NAT) used on routers connecting the private networks to the commodity Internet. Without complex arrangements for forwarding connections to individual VOC nodes, the use of client-initiated transports like SSH is not feasible. As a result, configuration tools that operate on a “push” model, such as Stoker, are not suited for managing these virtual clusters.

An alternate system that can traverse NAT boundaries automatically is in the initial design stages. This middleware utility, called Pulley, will use a central database and application server to publish virtual cluster node configurations on the Internet. Client VMs will periodically poll the application server using standard HTTP Web service mechanisms. Since the polling requests will originate from the VMs themselves, they will be able to traverse NAT and firewall boundaries at the physical sites, without requiring any additional software or services to be made available on the physical fabric. This design is illustrated in figure 7.6.

4This section is presented as originally published in [114]. Since that time, the Pulley project has been discontinued.

Figure 7.6: Overview of Pulley, which consists of a configuration database, a configuration server, and clients located on each VOC node. Each client periodically polls the Pulley server, making a connection through the NAT router and across the Internet. The Pulley server responds with configuration data obtained from the Pulley database, enabling the Pulley client to effect configuration changes on the VOC node.

Since Pulley is in a formative stage of design, the exact mechanism of its management functionality is not yet known. Two models of operation are planned for investigation. One of these possible architectures would be to adapt Stoker actors to draw operational information directly from the configuration database via middleware, instead of relying on human system administrators to provide command input. The second potential architecture would be to implement a policy-based mechanism such as Cfengine, adapting the agent-based system to operate across Grid sites via middleware protocols. It is even possible that a hybrid of both architectures will be used in the final system. In the meantime, simple management scripts that obtain configuration information from an HTTP server are in use to install extra packages, set package repository priorities, and make firewall changes.
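Since Pulley never progressed past the design stage, the sketch below only illustrates the poll-over-HTTP idea described above: a client inside the VOC node periodically fetches its configuration from a central web service and applies it locally. The URL scheme, JSON payload, and apply step are all hypothetical.

import json
import time
import socket
import urllib.request

def poll_for_configuration(server="https://pulley.example.org", interval=300):
    """Periodically pull configuration for this VOC node from a central server.
    Outbound HTTP requests originate inside the VOC node, so they traverse NAT
    and firewalls without inbound connectivity. Everything here is a sketch."""
    node_name = socket.getfqdn()
    while True:
        try:
            url = "%s/config/%s" % (server, node_name)
            with urllib.request.urlopen(url, timeout=30) as response:
                settings = json.load(response)
            apply_settings(settings)           # hypothetical local apply step
        except (OSError, ValueError):
            pass                               # server unreachable; retry later
        time.sleep(interval)

def apply_settings(settings):
    # Placeholder: install packages, adjust firewall rules, etc.
    print("applying %d configuration settings" % len(settings))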

Chapter 8

Simulation Test Results1

Although the prototype system was available for testing, its size (reaching a maximum of 16 compute nodes with a total of 32 CPU cores) limited investigation of the aggregate performance of production grid systems with VOCs widely deployed. Since an actual physical deployment of an experimental architecture across an entire production grid was impractical, a simulation system was developed, called the Simulator for Virtual Organization Clusters (SimVOC) [3]. Following the taxonomy provided by Sulistio et al. [152], SimVOC was designed to be a dynamic, trace-driven, discrete-event simulation system with continuous output, hybrid grid model, and a serialized (single-threaded) kernel. As utilized in the experiments described here, SimVOC was operated via Python driver applications in a deterministic execution mode. For simplicity, the simulation system did not model cross-site authentication and authorization. Instead, it was assumed that each physical grid site permitted jobs from users affiliated with any Virtual Organization observed in the trace inputs. For the purpose of simulating VOCs, a synthetic VO named “_pilot_” was created, and it was assumed that every grid site would accept virtualization jobs from the “_pilot_” VO.

A number of simulation experiments were performed to evaluate the viability of VOCs on the production Enabling Grids for E-sciencE (EGEE) system. All simulation experiments were effected using a commodity Dell Latitude E6400 laptop computer with an Intel Core 2 Duo P8400 CPU at 2.26 GHz clock speed and a reported 4523.39 BogoMIPS per core. A total of 4 GiB physical system memory was installed, and virtual memory swapping behavior was not observed during any simulation run. The software environment of the system consisted of a 64-bit installation of Arch Linux, with Python 2.6.2 utilized as the interpreter for the SimVOC code.

1Portions of this chapter have been published in [113]. Other portions have been submitted to the 24th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation (PADS 2010).

In this chapter, the results of early small-scale simulation tests are presented in section 8.1. Processing steps required to handle input trace data are discussed in section 8.2, followed by a description of the control simulation used to evaluate the raw data in section 8.3. Simulation test results are presented for standard grid architectures in section 8.4 and the VOC architecture in section 8.5. A discussion of the effects of virtualization overhead concludes the chapter in section 8.6.

8.1 Preliminary Simulations

Early discrete-event simulation results were obtained through the use of a preliminary version of the discrete event simulator discussed in chapter 6. A synthetic workload was engineered, along with a simple greedy scheduling algorithm, to test the process of sharing a single physical site among several Virtual Organization Clusters. This workload contained 8 jobs of varying length, divided into three jobs of 200 seconds in length for the Engage VO, three jobs of 400 seconds in length for the Engage VO, and two jobs of 600 seconds in length for the NanoHub VO.

As depicted in figure 8.1, both VOCs could be started and expanded to a sufficient size so as to provide execution capacity for all eight jobs in parallel, since the physical site had capacity for 16 virtual machines. When jobs completed, the size of each VOC could be reduced at the next watchdog interval, and eventually all VOC nodes were removed when all jobs finished. The total execution time for the makespan was 1200 seconds.

When the number of virtual machine slots available to run the VOCs was reduced to four (figure 8.2), the total time for completion of the makespan was increased to 2100 seconds. Both the Engage and NanoHub VOCs were able to claim 2 slots at the first watchdog cycle, permitting the NanoHub VOC to execute both NanoHub jobs in parallel as soon as the corresponding VOC nodes booted. The Engage VOC was able to execute two jobs in parallel until the NanoHub VOC completely vacated the physical cluster, after which the Engage VOC started a third node. Parallelism was increased only briefly, as completion of the third job led to the removal of the third node after only 300 seconds. As the remaining two Engage jobs completed, the Engage VOC nodes terminated, leading to a stair-step pattern in the graph.

Figure 8.1: Sharing 16 virtual machine slots among two distinct Virtual Organization Clusters

Figure 8.2: Sharing 4 virtual machine slots among two distinct Virtual Organization Clusters

8.2 Input Data Processing

Before simulation of a full production grid, input data sets were obtained from the Grid Observatory [46] and converted into the text input format supported by SimVOC. The traces consisted of a dynamic map of the Enabling Grids for E-sciencE (EGEE) grid; the set of jobs, including submission times, lengths, and VO relationships, observed on EGEE by the gLite middleware [52] from 00:00 UTC on December 8, 2008, to 23:59 UTC on May 10, 2009; and a dynamic list of Virtual Organizations found on the grid over the same time period. Prior to application of the CPU overcount reduction heuristics described in section 6.2, an initial CPU count of 314,547 was observed in the map data. Once the heuristics were applied, the initial CPU count was reduced to 107,396. A total of 258,097 jobs from the associated job trace were found to have been registered as complete with valid time stamp data, as shown in table 8.1. This count represented a fraction of the approximately 4 million jobs present in the raw trace data; however, both valid time stamps and a priori job length information were required for simulation purposes.
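The preprocessing scripts themselves are not listed in the dissertation; the following is a minimal sketch, in Python, of the kind of filter used to retain only completed jobs with valid time stamps and a known length. The column names and the "DONE" status value are assumptions, not the actual Grid Observatory schema.

    import csv

    def filter_trace(in_path, out_path):
        """Keep only completed jobs with valid time stamps and known lengths.

        Assumes a CSV trace with hypothetical columns job_id, vo, submit_ts,
        start_ts, finish_ts, and status; the real Grid Observatory schema differs.
        """
        kept = 0
        with open(in_path) as fin, open(out_path, "w") as fout:
            reader = csv.DictReader(fin)
            writer = csv.writer(fout)
            for row in reader:
                try:
                    submit = float(row["submit_ts"])
                    start = float(row["start_ts"])
                    finish = float(row["finish_ts"])
                except (KeyError, ValueError):
                    continue  # missing or malformed time stamp
                if row.get("status") != "DONE" or finish < start or start < submit:
                    continue  # not registered as complete, or inconsistent times
                # a priori job length is required by the simulator
                writer.writerow([row["job_id"], row["vo"], submit, finish - start])
                kept += 1
        return kept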

An additional data issue resulted from the timing of job arrival and Virtual Organization registration, relative to the grid map data. Since the job and VO data were derived from the job traces, while the map data were extracted from a different data set, some discrepancies were found in the timing. In particular, jobs could not always be matched to target sites or to affiliated VOs, resulting in jobs receiving a zero execution length and an error status, since the simulator was unable to find the resources on which the jobs were supposed to run.

8.3 Control Simulation

To check the non-scheduler portion of the simulated grid model, a control simulation was effected, in which the actual job start and finish times from the EGEE job input trace were used to start and stop jobs on sites. As illustrated in figure 8.3 and table 8.1, most jobs executed on the actual EGEE grid were compelled to wait in the queue for some period of time, averaging 2,740 s, before starting execution. Combining the wait time and service time, jobs experienced an average sojourn time (time from submission to completion) of 11,700 s. However, the median sojourn time was only 402 s. This time discrepancy, along with the substantially smaller median service time compared to the mean service time, indicated that the distribution of job lengths was right-tailed, with an abundance of short jobs. Issues with the trace data also resulted in an absence of jobs between December 29, 2008 and January 18, 2009, as evidenced between hours 500 and 1000 in figure 8.3.

Figure 8.3: Control Simulation: Job scheduling was based upon observed time stamp information in the trace data. Simulated schedulers were not used, and multiple jobs could run on individual machines.

As illustrated in figure 8.3, most jobs were queued for some period of time prior to processing. No valid jobs were found to be present in the trace data between December 29, 2008, and January 18, 2009 (approximately 500 to 1000 hours into the trace), resulting in non-utilization of the simulated grid during that period. The control simulation procedure for the six-month trace period required 484 seconds to execute, with the simulator kernel processing a total of 893,805 events. Event processing rates fluctuated throughout the length of the simulation, as depicted in figure 8.4.
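Conceptually, the control simulation amounts to replaying the recorded start and finish times in event order, without invoking any simulated scheduler. A minimal sketch of that replay loop, in Python and independent of the actual SimVOC kernel, is:

    import heapq

    def replay_control(jobs):
        """Replay recorded (start, finish) times as a discrete-event trace.

        `jobs` is a list of (job_id, start_ts, finish_ts) tuples from the trace;
        no scheduling decisions are simulated, and any number of jobs may
        overlap on a machine, as in the control simulation described above.
        """
        events = []  # min-heap of (timestamp, kind, job_id)
        for job_id, start, finish in jobs:
            heapq.heappush(events, (start, "start", job_id))
            heapq.heappush(events, (finish, "finish", job_id))

        running = 0
        peak = 0
        while events:
            ts, kind, job_id = heapq.heappop(events)
            running += 1 if kind == "start" else -1
            peak = max(peak, running)
        return peak

    if __name__ == "__main__":
        print(replay_control([(1, 0.0, 10.0), (2, 2.0, 5.0), (3, 6.0, 9.0)]))  # prints 2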

8.4 Standard Grid Architecture

Significant differences were observed between the simulator and the EGEE data once the scheduling portion of the simulated grid model was enabled. Results of this “standard” simulation, illustrated in figure 8.5 and summarized in table 8.1, indicated greatly increased job queuing behavior, with an order of magnitude increase in observed job waiting time and an attendant increase in job sojourn time. The greatly reduced performance of the simulated grid in this case was determined to have been caused by several factors. A naive implementation of the simulated Condor scheduler, which had to be suspended and resumed on each site to maintain simulator performance, resulted in a mean wait time increase for those sites utilizing Condor. Furthermore, the control simulation permitted multiple jobs to run on single machines, while the simulated schedulers would not start a job on a machine that was marked as busy. Grid sites using the Condor scheduler were modeled using an interval scheduler optimized for High-Throughput Computing (HTC) applications, whereas sites specifying other schedulers were modeled using a zero-interval scheduler optimized for High-Performance Computing (HPC). Thus, the median wait time was reduced to zero as a large number of jobs targeted at sites with non-Condor schedulers started instantly. Nevertheless, the reduced performance of the simulated Condor scheduler resulted in a larger amount of aggregate queuing, indicating poorer performance than was actually observed on the physical EGEE grid.

Figure 8.4: Control Trace: Behavior of the simulator during the control simulation. Event processing rates fluctuated throughout the simulation.

In the standard architecture simulation, only the job submission time and length information was utilized for scheduling purposes. Simulated scheduler components were utilized to match jobs to free machines and simulate execution. Average job queuing delays increased to 4,470 seconds relative to the control simulation (table 8.1). This additional delay changed the alignment of the job trace and grid map data, permitting a total of 156,804 jobs to execute – an increase of 231 from the control simulation. These additional jobs were slightly longer than the other jobs in the set, as evidenced by the increase in average job length from 9,000 s to 9,020 s. Average sojourn times increased to 13,500 s.

Figure 8.5: Standard Architecture Simulation: Jobs were scheduled directly onto the simulated hardware using simulated schedulers. At most one job was permitted to run on any machine at one time.

Aggregate scheduling behavior for the standard architecture simulation (figure 8.5) revealed a general increase in total job queuing compared to the control simulation, which contributed to the increased waiting and sojourn times. Peak counts of simultaneous job executions remained similar.

Fewer simulator kernel events (816,075) were required to execute this simulation, resulting in a simulation experiment execution time of 388 s. Both the maximum event queue size and maximum event processing rate were observed near the middle of the simulation run, as illustrated in figure 8.6.

Figure 8.6: Standard Architecture Simulation Trace. Peak processing rates were observed near the middle of the run.

8.5 Virtual Organization Cluster Architecture

A final set of simulation experiments was run to evaluate the behavior of Virtual Organization Clusters applied to the grid. In these simulations, the same simulated physical grid was constructed, but every machine connected to the physical grid supported virtualization technology – either the Kernel-based Virtual Machine (KVM) [129] or Xen [14], depending upon the experiment.

Instead of submitting jobs directly to sites, the grid was configured to send jobs to the dedicated VOC created for each Virtual Organization present in the input data. A watchdog component (described in section 7.2) monitored the scheduler queue size on each VOC and determined when the VOC should be expanded or shrunk. VOC expansion was effected by submitting pilot jobs through a top-level grid metascheduler, which routed each pilot job to a site with free resources. Upon booting, virtual machines spawned by these pilot jobs joined the VOC scheduler pool and became available to run user jobs.
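The watchdog decision rule is described only in prose here; a minimal sketch, in Python with hypothetical callback names (submit_pilot, terminate_node), is shown below. With target_level left at zero this corresponds to the naive greedy behavior, while a nonzero value gives the target-level variant discussed later.

    def greedy_watchdog_step(queued_jobs, running_nodes, free_slots,
                             submit_pilot, terminate_node, target_level=0):
        """One watchdog interval for a single VOC (hedged sketch, not SimVOC code).

        Expands the VOC with pilot jobs while more user jobs are queued than
        nodes are running (subject to free slots), and shrinks it when nodes
        outnumber queued jobs, never dropping below the optional target level.
        """
        desired = max(queued_jobs, target_level)
        if desired > running_nodes:
            for _ in range(min(desired - running_nodes, free_slots)):
                submit_pilot()       # pilot job boots one VOC node when scheduled
        elif running_nodes > desired:
            for _ in range(running_nodes - desired):
                terminate_node()     # idle VOC node vacates its physical slot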

One simulated VOC was constructed for each Virtual Organization observed in the input job trace, for a total of 50 VOCs. Each VOC utilized the simulated Condor scheduler, again with a naive implementation, and was given a unique grid Compute Element (CE) named with a prefixed version of the VO name. Jobs from the EGEE input trace were modified so as to be submitted to the unique CE corresponding to the job VO, instead of submission directly to a simulated grid site.

Each VO CE was equipped with a simulated watchdog, and experiment sets were constructed using both a naive greedy VOC adjustment algorithm and a greedy algorithm with a minimum target level set to 1024. Although this target level was regarded as high, the total number of physical CPU cores that would be utilized at the target level for all 50 VOCs running simultaneously would have been 51,200, slightly under 50% of the CPU cores present in the simulated grid. Each set of VOC experiments was repeated for both the KVM and Xen hypervisors, using 8.8% and 6.6% overheads (respectively) for compute-bound jobs as measured in prior work [54].
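The text does not reproduce exactly how SimVOC applies these overhead factors; a plausible sketch, treating the overhead as a multiplicative increase in compute-bound service time, is:

    # Hedged sketch: scaling a compute-bound job's service time by a fixed
    # hypervisor overhead (factors taken from the paragraph above).
    KVM_OVERHEAD = 0.088   # 8.8 % for compute-bound jobs
    XEN_OVERHEAD = 0.066   # 6.6 %

    def virtualized_length(base_length_s, overhead):
        """Return the simulated service time of a job run inside a VOC node."""
        return base_length_s * (1.0 + overhead)

    if __name__ == "__main__":
        # e.g., a 9,000 s job (the mean control service time) becomes:
        print("%.1f s under KVM" % virtualized_length(9000, KVM_OVERHEAD))  # 9792.0 s
        print("%.1f s under Xen" % virtualized_length(9000, XEN_OVERHEAD))  # 9594.0 s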

As illustrated in figures 8.8 through 8.11 and summarized in table 8.1, the addition of VOCs to the grid was found to reduce total job queuing significantly. Average wait times for user jobs decreased to 457 s for both the KVM and Xen simulation tests. Since all sites in the simulated grid supported VOCs, and it was assumed that a VOC was always compatible with user jobs from associated VOs, the majority of user jobs could be executed. Sojourn times for end-user jobs remained similar to the standard architecture simulation, at 13,500 s for the KVM simulation and 13,300 s for the Xen test. Average job lengths increased to 13,100 s for the KVM simulation and 12,800 s for the Xen simulation. Length increases occurred due to a combination of the addition of virtualization overhead and an increase in the average length of the jobs executed. Simulation run times increased to 912 s for the KVM test and 881 s for the Xen test, with higher event rates observed in the earlier phases of both tests (represented in figure 8.7). These simulation time increases were due to the greatly increased number of kernel events processed to effect the simulations: 1,922,895 and 1,924,227 for KVM and Xen, respectively.

An interesting result of the VOC simulation tests, shown in table 8.1, was that the addition of the minimum target level did not generally improve the mean job sojourn time when compared to the naive greedy algorithm. Modest increases (under 1.5%) were observed in this metric using either hypervisor. The cause of this sojourn time increase was attributed to the increased number of user jobs executed by both target level systems. Since the simulated grid accepted a job whenever the specified grid site existed, the earlier creation and persistence of the target level VOCs further reduced the negative effects of timing interdependencies between the VO data and job data in the input trace. As evidenced by the increased service times, this change resulted in an increased mean job length, which yielded an increased mean sojourn time.

Figure 8.7: VOC Simulation Trace (KVM). The highest event rates were observed in the earlier portions of the trace.

Figure 8.8: KVM simulation without the target level set. Note that a larger number of user jobs were executed, compared to the standard and control simulations.

Figure 8.9: KVM simulation with the watchdog target level set to 1024. Higher peak utilization of the grid was observed due to the minimum target level.

Figure 8.10: Xen simulation without the target level set. As in the KVM simulation, a larger set of jobs was executed relative to the standard and control simulations.

Figure 8.11: Xen simulation with the watchdog target level set to 1024. Peak grid utilization was observed near the center of the trace but was still less than 10% of the total number of CPU cores on the simulated grid.

Table 8.1: Measured statistics for simulation experiments. Values are exact for job counts and to 3 significant figures for time measurements. All time measurements are in seconds. KVM-0 and Xen-0 refer to simulations without target levels set, while KVM-1024 and Xen-1024 refer to simulations with target levels set to 1024. Since job statistics were recorded when jobs finished, and pilot jobs for target-level VOC simulations were left running when above the target level, pilot job counts for these simulations were not recorded (shown as N/M).

Measure                Control     Standard    KVM-0      KVM-1024   Xen-0      Xen-1024
User Jobs              258,097     258,097     258,097    258,097    258,097    258,097
Pilot Jobs             0           0           149,888    N/M        150,189    N/M
User Jobs Discarded    101,524     101,293     9,151      120        8,939      115
User Jobs Executed     156,573     156,804     248,946    257,977    249,158    257,982
Min Wait Time          0.00        0.00        0.00       0.00       0.00       0.00
Median Wait Time       206         0.00        316        164        316        164
Max Wait Time          823,000     2,380,000   462,000    462,000    462,000    2,090,000
Mean Wait Time         2,740       4,470       457        417        457        418
Stddev Wait Time       15,100      62,800      6,240      6,400      6,240      7,610
Min Service Time       0.00        0.00        0.00       0.00       0.00       0.00
Median Service Time    111         111         885        1,000      878        982
Max Service Time       478,000     478,000     919,000    919,000    901,000    901,000
Mean Service Time      9,000       9,020       13,100     13,300     12,800     13,000
Stddev Service Time    23,200      23,200      31,900     32,300     31,400     31,700
Min Sojourn Time       0.00        0.00        0.00       0.00       0.00       0.00
Median Sojourn Time    402         111         1,190      1,200      1,180      1,170
Max Sojourn Time       1,040,000   2,430,000   920,000    919,000    901,000    2,090,000
Mean Sojourn Time      11,700      13,500      13,500     13,700     13,300     13,400
Stddev Sojourn Time    30,300      68,100      32,600     33,200     32,200     32,900

8.6 Effects of Virtualization Overhead

The addition of virtualization overhead did increase both the mean and median job service (execution) times, as expected. However, the increase in service times was largely offset by the decrease in waiting time, resulting in the same mean job sojourn time for both the KVM VOC (without target levels) and standard architecture simulations. A slight decrease in sojourn times was observed when the Xen hypervisor was simulated, as would be expected given the lower virtualization overhead of Xen. At scale, the addition of target levels demonstrated little performance improvement over the naive greedy watchdog algorithm, with an increase in mean sojourn times observed for both KVM and Xen. As noted in table 8.1, maximum sojourn times for any job decreased relative to the standard simulation whenever VOCs were in use. However, one job experienced an exceptionally long queuing delay when the Xen VOC test was conducted with target levels set.

To lease the simulated physical grid resources, pilot jobs were required. These jobs had the effect of doubling the utilization of the grid at any time at which user jobs were running, since there was a 1-to-1 correspondence between user jobs and pilot jobs when the naive greedy watchdog algorithm was used. Pilot job requirements for either hypervisor were determined to be nearly identical based on the experiments (table 8.1), with approximately 150,000 jobs required in both cases. This number was less than the total number of user jobs executed in either case due to re-use of existing virtual machines whenever possible. Since the simulation system recorded job results only upon completion of the jobs, and the simulations ended without completion of the pilot jobs in cases where target levels were set, the total number of pilot jobs utilized was not measured when the target level watchdog algorithm was employed.

Another effect of adding Virtual Organization Clusters to the grid was that 58% more user jobs were able to run, as recorded in table 8.1. With VOCs, user jobs were not submitted to physical grid sites; instead, these jobs were submitted to virtual grid sites created for each Virtual Organization observed in the trace. As a result, discrepancies between the grid map data and job trace data did not result in job errors. However, timing issues between VO registration (which resulted in associated VOC head node creation) and job arrival did result in a small set of user jobs failing to execute (3.5% in the worst case, compared to 39% in the standard simulation). Since a greater number of jobs executed, while total queuing across the grid was substantially decreased, it was determined that the primary contributor to queuing on the actual grid system was jobs targeting specific grid sites instead of being load-balanced among all sites.

Chapter 9

Prototype Test Results1

Following completion of the prototype test bed (chapter 7), performance tests were conducted using synthetic and operational workloads. The purpose of synthetic workload testing was to observe the behavior of both the underlying physical system and hosted Virtual Organization Clusters when engineered tasks with specific properties were run on the system. To provide more robust results based upon real workloads, several VOCs were deployed operationally onto the Open Science Grid, where they transparently received and processed end-user jobs.

In this chapter, basic performance tests are presented in section 9.1. Test results from the dynamic VOC provisioning system are described in section 9.2, after which the addition of overlay scheduling to hosted VOC systems is discussed in section 9.3. Results from operational testing are presented in section 9.4. Finally, scalability tests on the Stoker management application are presented in section 9.5.

9.1 Performance Tests

Several tests were conducted to ensure that the performance of Virtual Organization Clusters was not unreasonable. In order to evaluate the behavior of the transparent test VOC, two major installations were performed: a Slackware Linux 12 installation directly on the physical hardware and a CentOS 5.1 installation into a virtual machine image. Following the installations, boot times were measured for both the physical and virtual systems. A High Performance Linpack (HPL) benchmark was performed on the physical compute nodes, followed by a second HPL benchmark on the VOC. Several different process grid sizes were used in the benchmark tests. To determine the cause of observed poor performance with HPL on the prototype VOC, a set of network tests was conducted. These tests included bandwidth measurement and ping Round-Trip Time (RTT) measurements to assess network latency.

1Most of this chapter has been published in [54, 114, 115, 117].

To effect the performance tests, the High Performance Computing Challenge (HPCC) benchmark suite was used, which included HPL. Boot times were measured manually for the physical boot procedure, while a simple boot timing server was constructed to measure VM booting time. Network bandwidth was measured using both the Iperf bandwidth measurement tool and the RandomRing bandwidth assessment in the HPCC suite. Latency in network communications under load was also assessed using the RandomRing benchmark. Measurement of the Round-Trip Time (RTT) of ICMP Echo packets generated by the UNIX ping tool served as an additional measure of network latency, both under load (with the HPCC suite running) and without computational load on the VOC.

9.1.1 System Performance

Following system installation, boot times were recorded for both the physical and virtual systems. Since VM startup was scripted, automated means were devised to measure the VM boot times. A simple server was deployed on the physical utility node, which received boot starting notifications from the physical nodes and boot complete notifications from the associated virtual nodes. Timing of the boot process was performed at the server side, avoiding any clock skew potentially present between physical and virtual nodes, but possibly adding variable network latency. Boot times for the physical nodes were subject to greater variation, as these were measured manually.
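The boot timing server itself is not listed in this document; the following is a minimal sketch, in Python, of a server of that general shape. The message format and the port number are assumptions.

    import socket
    import time

    def run_boot_timer(port=9999):
        """Hedged sketch of a boot timing server (not the original implementation).

        Physical nodes send "START <node>" when they begin booting a VM; the VM
        sends "DONE <node>" once it is up.  Timing both messages on the server
        side avoids clock skew between physical and virtual nodes, at the cost
        of whatever network latency the notifications experience.
        """
        starts = {}
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", port))
        while True:
            data, _addr = sock.recvfrom(1024)
            kind, node = data.decode().split()
            now = time.time()
            if kind == "START":
                starts[node] = now
            elif kind == "DONE" and node in starts:
                print("%s booted in %.1f s" % (node, now - starts.pop(node)))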

Results of the boot time tests have been summarized in table 9.1 and figure 9.1. For the physical system, the boot process was divided into three phases: a PXE timeout, a GRUB timeout, and the actual kernel boot procedure. While total boot times ranged from 160 to 163 seconds, 105 to 107 seconds of that time were utilized by the PXE timeout, and 10 seconds were attributed to the GRUB timeout. Thus, the actual kernel boot time ranged from 43 to 46 seconds. In contrast, the virtual compute nodes required 61.2 to 70.2 seconds to boot. These virtual machines were configured with a different operating system (CentOS 5.1) and started approximately 10 additional processes at boot time, compared to the physical systems. As a result, not all the boot time discrepancy could be attributed to virtualization overhead. Nonetheless, the overhead was small enough that booting the VOC did not require an inordinate amount of time.

Figure 9.1: Average boot times of physical nodes (Slackware 12) and VMs (CentOS 5.1). Total physical boot times are increased due to BIOS and embedded controller menu access timeouts.

Table 9.1: Boot Times (seconds) for both the physical systems (Slackware 12) and virtual machines (CentOS 5.1). The physical system boot process involved several timeouts, including delays for the Power-On Self Test, disk controller management interface, Preboot eXecution Environment (PXE) for network booting, and a bootloader management menu.

                     Physical Node                               VM
Statistic        PXE Timeout   Total Boot   Actual Boot     VM Boot
Minimum          105           160          43              61.2
Median           106           160.5        44              65.4
Maximum          107           163          46              70.2
Average          106.4         160.9        44.5            65.5
Std Deviation    0.63          1.03         1.09            2.54

Table 9.2: Performance of Slackware 12 compared to CentOS 5.1 when installed directly on the hardware.

Process Grid (PxQ)       14x2
Problem Size             77,000
CentOS GFLOPS            115.6
Slackware GFLOPS         129.6
Performance Increase     12.11%

Table 9.3: Physical Cluster vs. VOC performance for original cluster implementation (Slackware 12) with CentOS 5.1 VOC.

Process Grid (PxQ)        1x1      7x2      7x4
Problem Size              10,300   38,700   54,800
Physical GFLOPS           7.29     74.39    143.45
VOC GFLOPS                6.57     29.74    63.09
Virtualization Overhead   9.86%    60.02%   56.02%

Following the boot procedures, HPL benchmark data were obtained for both the physical and operational VOC nodes (tables 9.2 and 9.3). First, an HPL benchmark previously conducted on the prior CentOS physical installation was repeated on the Slackware hosts. A 12% performance increase was noted as a result of the Slackware installation. Although the cause of this increase could not be conclusively determined, it was believed that the customization of the installation – including Linux kernel optimization – and minimization of unnecessary services contributed to the additional performance. This result showed that keeping the host configuration as lightweight and simple as possible not only made it easier for the cluster administrator to maintain but also increased the overall performance, thereby benefiting all VOs using the cluster.

One limitation of the original implementation was the inability to allocate a large, contiguous block of memory for the KVM guest process on the Slackware 12 compute nodes. Although the version of KVM in use at the time permitted up to 2 GiB of RAM to be assigned to the guest, a segmentation fault was observed when attempting to map more than 1.2 GiB to the process. An investigation determined that the 32-bit host system was mapping its shared object libraries into the middle of the userspace region. Thus, while 3 GiB of virtual memory was available to processes, the memory space was bisected by the shared objects and was thus non-contiguous. In order to retain the ability to perform a contiguous mapping (for performance and locality of reference reasons), a 64-bit operating system was installed on the host systems. Since Slackware Linux was available only in a 32-bit architecture, 64-bit CentOS 5.2 was chosen as the new host OS.

Figure 9.2: Compute-bound application performance as measured on the revised installation (CentOS 5.2 on both the physical system and VOC).

Figure 9.3: Single-Core VOC Performance (28 VOC Nodes on 14 Physical Nodes, CentOS 5.2 installed on both)

Table 9.4: Physical system performance compared to a VM for a compute-bound single HPCC MPI process (CentOS 5.2).

Process Grid                   1x1 Physical   1x1 Xen VOC   Xen Overhead   1x1 KVM VOC   KVM Overhead
Problem Size                   10300          10300         N/A            10300         N/A
G-HPL (GFLOPS)                 7.913          7.393         6.566%         7.218         8.771%
G-PTRANS (GB/s)                0.729          0.588         19.415%        0.635         12.946%
G-Random Access (GUP/s)        0.002          0.001         35.519%        0.002         15.818%
G-FFTE (GFLOPS)                0.799          0.658         17.733%        0.461         42.370%
EP-STREAM Sys (GB/s)           3.866          3.375         12.704%        3.808         1.491%
EP-STREAM Triad (GB/s)         3.866          3.375         12.704%        3.808         1.491%
EP-DGEMM (GFLOPS)              8.348          7.689         7.892%         7.682         7.977%
RandomRing Bandwidth (GB/s)    N/A            N/A           N/A            N/A           N/A
RandomRing Latency (µs)        N/A            N/A           N/A            N/A           N/A

Table 9.5: Physical System vs. VOC, One Dual-Core VM per Physical Node (32 processes, CentOS 5.2)

Process Grid                   7x4 Physical   7x4 Xen VOC   Xen Overhead   7x4 KVM VOC   KVM Overhead
Problem Size                   58600          58600         N/A            58600         N/A
G-HPL (GFLOPS)                 169.807        118.067       30.470%        25.178        85.173%
G-PTRANS (GB/s)                0.867          0.496         42.818%        0.069         91.985%
G-Random Access (GUP/s)        0.014          0.009         35.910%        0.004         73.082%
G-FFTE (GFLOPS)                2.287          1.717         24.899%        0.399         82.556%
EP-STREAM Sys (GB/s)           59.046         62.678        -6.151%        82.599        -39.889%
EP-STREAM Triad (GB/s)         1.845          1.959         -6.151%        2.581         -39.889%
EP-DGEMM (GFLOPS)              8.271          7.669         7.269%         6.901         16.559%
RandomRing Bandwidth (GB/s)    0.023          0.017         23.425%        0.007         67.419%
RandomRing Latency (µs)        74.444         150.831       102.611%       290.463       290.179%

Following the host software change, the High Performance Computing Challenge (HPCC) benchmark suite [107], which includes HPL, was executed on the host systems and in the VOC, repeating the previous tests with process grid dimensions of 1x1 and 7x4. These tests were conducted using both the KVM and Xen hypervisors and made use of the multicore guest support features of both systems, allowing for both single-core and dual-core guests. The results of these tests, published in [54], are presented in figures 9.2, 9.3, and 9.4, and in tables 9.4, 9.5, and 9.6.

Table 9.6: Physical System vs. VOC, Two Single-Core VMs per Physical Node (32 processes, CentOS 5.2)

Process Grid                   7x4 Physical   7x4 Xen VOC   Xen Overhead   7x4 KVM VOC   KVM Overhead
Problem Size                   58600          58600         N/A            58600         N/A
G-HPL (GFLOPS)                 169.807        130.862       22.935%        81.401        52.063%
G-PTRANS (GB/s)                0.867          0.830         4.302%         0.447         44.968%
G-Random Access (GUP/s)        0.014          0.011         22.941%        0.004         70.643%
G-FFTE (GFLOPS)                2.287          0.746         67.380%        1.751         23.449%
EP-STREAM Sys (GB/s)           59.046         62.382        -5.650%        73.110        -23.818%
EP-STREAM Triad (GB/s)         1.845          1.949         -5.650%        2.285         -23.818%
EP-DGEMM (GFLOPS)              8.271          7.726         6.588%         7.114         13.979%
RandomRing Bandwidth (GB/s)    0.023          0.007         68.779%        0.027         -17.148%
RandomRing Latency (µs)        74.444         125.258       67.259%        228.383       206.787%

HPL and HPCC tests were performed on both the physical machines and the VOC, using the same process grid layouts and problem sizes across administrative domains. These tests were roughly divided into three categories: compute-bound, embarrassingly parallel, and latency-sensitive. The first of these categories, compute-bound, represented High-Throughput Computing (HTC) jobs – the type of job that would run in a “vanilla” Condor universe. As shown in table 9.3, the original virtualization overhead in terms of HPL observed throughput was only 9.86% for this type of job, using Slackware host systems. Under CentOS 5.2 host systems (table 9.4), the overhead decreased to 8.77% with the KVM hypervisor. Xen was slightly more efficient, at 6.57%.
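The overhead percentages reported in tables 9.3 through 9.6 are simply the relative loss in observed throughput; a small Python check against the G-HPL figures from table 9.4 (the minor differences stem from rounding in the published values) is:

    def overhead_pct(physical, virtual):
        """Relative performance loss of the virtualized run, as a percentage."""
        return (physical - virtual) / physical * 100.0

    # G-HPL values from table 9.4 (1x1 process grid, CentOS 5.2 hosts)
    print("KVM: %.3f%%" % overhead_pct(7.913, 7.218))  # ~8.783%, table lists 8.771%
    print("Xen: %.3f%%" % overhead_pct(7.913, 7.393))  # ~6.571%, table lists 6.566%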

Embarrassingly parallel jobs (represented by EP prefixes in tables 9.4, 9.5, and 9.6) exhibited better performance inside virtual machines than they did when run directly on the physical hardware, except for the 1x1 process grid size. These jobs had good spatial locality of reference in data accesses, but low temporal locality [107]. KVM outperformed Xen on this job class. This result was attributed to the fact that KVM emulates all its hardware devices, including the hard disks, while Xen shares the physical system hard disk with extra buffering [14]. Since the emulated disk could be located in host system RAM, its performance would have been higher than the physical drive.

MPI jobs that utilized inter-node communications (HPL, Random Access, Fast-Fourier Transform, and RandomRing) incurred substantial performance overheads on VMs for a 7x4 process grid. With HPL, these overheads were observed to be 52% for the single-core KVM test, 23% for the single-core Xen test, 85% for the dual-core KVM test, and 30% for the dual-core Xen test. Network latency was suspected as the cause of this observed overhead, as latency has been implicated as a cause of performance reduction in prior studies involving MPI [108, 107], and RandomRing latencies were over 100% higher than on the physical systems for all tests except single-core Xen (67%). With MPI applications comprising a significant fraction of all scientific computing endeavors, it was desirable to be able to deploy a VOC that had good MPI performance. Additional network investigations were undertaken to determine the source of the latency.

Figure 9.4: Dual-Core VOC Performance (14 VOC Nodes on 14 Physical Nodes)

9.1.2 Network Performance

Several networking issues were suspected in the initial setup. Two VMs shared a single Linux TUN/TAP bridge to a single physical Gigabit Ethernet port, which was also shared by the host for host-level network connectivity (figure 9.5). Each KVM instance also emulated an Intel 82540EM Gigabit Ethernet Network Interface Card (NIC), which was presented to the guest OS and utilized as if the card were an actual physical device. The physical NIC on the host was configured in promiscuous mode, bypassing the internal NIC packet filtering code and offloading the low-level network processing onto the host CPU. Furthermore, the bridge component of the kernel and NIC emulation components of KVM also relied upon the host CPU to effect communications. As a result, the host CPU was taxed not only with the computationally-intensive HPL routines, but also with low-level networking operations typically carried out in the NIC hardware.

Figure 9.5: Virtual Machine Bridged Networking: The physical hardware NIC was operated in promiscuous mode, allowing the virtual machines layer 2 connectivity to the physical network.

Table 9.7 summarizes the results of cluster network testing using Iperf, ping, and the Random-Ring bandwidth and latency benchmarks in HPCC. Iperf showed 941 Mbps of available bandwidth when the cluster was not under load, decreasing to 882 Mbps when HPL benchmarks were running on the hosted VOC. This decrease could be attributed to inter-node MPI communications, which would have consumed a portion of the network resources. The decrease in measured bandwidth between VMs was more significant, dropping from 708 Mbps to 636 Mbps for communications between VMs hosted by different physical nodes. Communications between two VMs sharing the same bridge were found to have substantially lower available bandwidth, with only 499 Mbps (roughly half the nominal bandwidth of Gigabit Ethernet) available when not under load. During the HPL tests, this intra-bridge bandwidth fell to an available level of 206 Mbps. Bandwidth as measured under load by the Random-Ring benchmark was substantially lower in all cases: 544 Mbps for the physical hosts, and 24 Mbps to 32 Mbps for the VMs. Lower bandwidth was observed when the MPI rings included intra-bridge links (SB column of the table) than when only inter-bridge links (links between VMs hosted by different physical systems) were included in the MPI rings. Unlike the Iperf tests, the Random-Ring data for bandwidth across intra-bridge links are also averaged with the available bandwidth between bridges; without this averaging, it is likely that the intra-bridge links would have shown even lower available bandwidth, based upon the Iperf tests.

Latency between nodes was found to be higher between virtual hosts than between the underlying physical hosts. Measuring the Round-Trip Time (RTT) of the ping (ICMP echo) operation yielded an average of 106 µs without load, increasing to 191 µs under load. Ping operations across a single bridge (intra-bridge) required longer times to execute: 215 µs in the absence of load, increasing to 360 µs under load. RTTs for ping operations between VMs on different hosts were the longest, beginning at 312 µs and increasing to 484 µs under load, suggesting that inter-bridge communications incurred the greatest latency. However, the Random-Ring benchmarks indicated greater latency between VMs sharing a single bridge, with bridge-to-bridge latencies 146 µs lower at 233 µs. Both VM latency figures were an order of magnitude higher than the measured 54 µs latency on the physical network.

Table 9.7: Bandwidth and latency (ping RTT) as measured on the physical and virtual clusters.

Condition           No-Load                 Under Load
Parameter        P      SB     BTB        P      SB     BTB
Iperf (Mbps)     941    499    708        882    206    636
RRB (Mbps)       N/A    N/A    N/A        544    24     32
Ping RTT (µs)    106    215    312        191    360    484
RRL (µs)         N/A    N/A    N/A        54     379    233

Key: P – Physical, SB – Virtual links across the Same Bridge, BTB – Virtual links from one bridge (physical host) to another, RRB – RandomRing Bandwidth, RRL – RandomRing Latency

One significant limitation of the network architecture used for the first implementation was identified as a result of the test procedures. Two VMs and one physical host were configured to share one physical Ethernet NIC on each physical node. Thus, parallel communication between two pairs of VMs on two separate physical hosts would have been converted to sequential networking operations, with packet queuing needed either at the bridges or at the physical switch. Queuing, in turn, could have introduced added latency into the communications, which may have reduced MPI performance. Moreover, an increase in queuing could have increased packet transmission time, thus causing the TCP protocol used by MPI to place more packets in flight to fill the sliding sender window. Such an increase in packet saturation on the network used in the test cluster has been shown to increase queuing delays, thereby increasing latency and further aggravating communications difficulties [116].

The combination of virtual machine overhead, latency introduced by the bridged networking, and delay properties of the underlying physical network resulted in a network environment that could not support MPI or other latency-sensitive applications inside VOCs. Latency in the underlying physical network was already on the order of 50 µs for one-way unicast traffic. VOC traffic latency was greatly increased as a result of the addition of the emulated NIC, the use of the Linux bridge facility, and the reassignment of low-level network processing from the physical NIC to the host CPU.

The unsatisfactory performance results obtained from this experiment indicated that an alternative mechanism for providing network connectivity to VMs, such as VMM-bypass networking [106], would be needed if VOCs were to support HPC jobs.

Figure 9.6: Testbed architecture for the dynamic provisioning tests

9.2 Dynamic Provisioning System

A prototype of the dynamic Virtual Organization Cluster scheduler was implemented, and tests of the system were conducted using synthetic workloads. For analytical simplicity, the system only supported a single VO, with a single virtual machine slot per physical host, for a total limit of 16 slots. This simplifying design assumption permitted the use of a minimalistic physical system policy, so that the performance and behavior of the unrestricted mechanism could be observed. As a further simplification, the VMs used for the VOC in this test were all spawned from a single 20 GB image on a shared filesystem. Figure 9.6 depicts the architecture of the test system hardware.

Several sets of tests were conducted using the prototype system, the first of which was an analysis of the approximate time required to boot each VOC node such that it joined the Condor pool. Since all VMs started from an identical state — the result of using a single virtual disk image to spawn all VMs — the boot times were assumed to be constant for all members of the same VOC. Two test suites were employed to observe job scheduling behavior. In the first suite, jobs were submitted locally: that is, directly to the Condor queue, without any use of Globus. Globus was used as the vehicle for job submission in the second suite, allowing its effects to be observed.

Each test suite consisted of five tests, in which the periodicity of job submission, size of each job group submission, and run length of each job were varied. These tests were arranged as follows:

• Two submission groups of 50 jobs each were submitted with sufficient temporal separation so as to execute the vast majority of jobs from the first submission group prior to execution of the second submission group. This test was designed to simulate submitting large groups of jobs in which the results of the first group were retrieved before submitting the second group.

• Periodic submission of 10 jobs, each with a 10-second execution time, 90 seconds apart

• Periodic submission of 10 jobs, each with a 10-second execution time, 30 seconds apart

• Periodic submission of 10 jobs, each with a 1-second execution time, 30 seconds apart

The purpose of the latter three tests was to observe the behavior of the system under regular periodic loads, with variations in the period, job size, and per-job execution time. Execution times were varied in order to determine the sensitivity of the system to the boot time latency, while period and size variations were performed to test the responsiveness of the watchdog.
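The submission scripts used to drive these tests are not included in the dissertation text; a minimal sketch of a periodic driver, in Python, is shown below. The submit file name, and the use of condor_submit for the local-submission path, are assumptions.

    import subprocess
    import time

    def submit_periodically(batches, group_size, period_s, submit_file="sleep.submit"):
        """Hedged sketch of the periodic synthetic workload driver.

        Submits `group_size` copies of a Condor job every `period_s` seconds,
        `batches` times in total.  The job itself (e.g. a 1- or 10-second sleep)
        is assumed to be described in `submit_file`.
        """
        for batch in range(batches):
            for _ in range(group_size):
                subprocess.call(["condor_submit", submit_file])
            if batch < batches - 1:
                time.sleep(period_s)

    if __name__ == "__main__":
        # e.g., 10 jobs every 30 seconds, as in the third and fourth tests
        submit_periodically(batches=5, group_size=10, period_s=30)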

9.2.1 VM Boot Times

Tests were performed using local job submission (directly to the Condor queue) to measure the boot times for the VMs comprising the VOC. Submissions were performed in groups of 10 one-second jobs, with a period of 30 seconds between groups. The boot process for a VM was considered to be complete once it joined the Condor pool, as observed by the watchdog. As shown in figure 9.7, the first VM booted in response to incoming jobs joined the pool approximately 60 seconds after the first job was submitted, or about 55 seconds after the watchdog observed the first job and started the VM.

Since the watchdog required approximately 6 seconds to start all 10 initial VMs, a corresponding delay of approximately 7 seconds was observed between the time at which the first VM joined the Condor pool and the time at which the tenth VM joined the Condor pool. At a test wall time of approximately 38 seconds, the watchdog responded to the second batch of submitted jobs and began to increase the size of the VOC, continuing until the 44 second mark, at which point the 16 VM slots were exhausted. The additional 6 VMs joined the Condor pool between wall clock times of 92 and 101 seconds, corresponding to boot times in the range of 54 to 57 seconds. No additional VMs could be started once the slots were exhausted at 101 seconds, after which point the 16 running VMs were able to complete the remaining 1-second jobs quickly.

Figure 9.7: Virtual Machine boot delays relative to job submission time. VMs are started in response to jobs arriving in the queue. Since each VM is a virtual Linux machine, there is a boot delay between VM start time and the time at which the VM joins the Condor pool and is ready to accept jobs. The total count of these booted machines corresponds to the size of the Condor pool for the VOC.

Variations in VM boot time were expected, owing to dynamic processes that must occur during VM boot. These processes include the allocation of memory to the VM, initialization of the VM kernel, acquisition of a DHCP lease by the VM, and the starting of run-time services. Based on the test results, a conservative upper bound of 60 seconds was attributed to the VM boot process.

9.2.2 Jobs Submitted Locally

To effect completion of the first test suite, batches of jobs were submitted locally. The first test utilized two groups of 50 jobs each, with a sufficiently long delay between submissions to allow all but two of the first batch to complete before submitting the second batch. As shown in figure 9.8, the watchdog started the maximum number of VOC nodes by 20 seconds into the test. The majority of the first set of jobs completed rapidly between 162 and 198 seconds wall clock time, or about 178 seconds after submission. Given the 60-second boot time assumption, the total run time for 48 10-second jobs on 16 VMs was approximately 118 seconds. Between sets, the watchdog was observed to reduce the size of the VOC from 16 VMs to 2 VMs. The second set of jobs completed in approximately half the time of the first, with a clear “step-down” pattern observed in the Condor queue between 280 and 310 seconds. While this pattern might have been partly attributed to the need to reboot the VMs that were stopped during VOC contraction, it is important to remember that Condor only schedules jobs on a periodic basis, as it is designed for high job throughput, not high performance. Therefore, some of the “step-down” behavior could have been attributed to Condor simultaneously starting fewer jobs than there were slots available.

As shown in figures 9.9 through 9.11, the watchdog continued to exhibit predictable behavior similar to that observed in the first test. Whenever the number of jobs waiting in the queue dropped below the number of running VMs in the VOC, the VOC was contracted by terminating VMs. Conversely, whenever more jobs were waiting than VMs were running, as long as VM slots remained available, the watchdog expanded the VOC by adding VMs. A noticeable delay between queue size and VM count was observed in both cases, which was the exact behavior expected from the periodic sampling done by the watchdog. Longer delays were observed between initial job submission and job completion, owing to the time required to boot the VMs.

The effects of extremely short jobs, or of submission periods that were long relative to job execution length, were evident in the results, as illustrated in figures 9.9 and 9.11. Since the watchdog employed a simplistic VM scheduling policy, it terminated VMs as soon as VOC sizes were found to exceed queue sizes. This aggressive VM termination had a negative effect on throughput, as illustrated by comparing figure 9.9 to figure 9.10 and figure 9.10 to figure 9.11. When the submission period was long relative to execution time, it was necessary to re-expand the previously contracted VOC, thereby incurring VM boot delays. For exceptionally short-running jobs submitted regularly, the initial cluster of submissions completed quickly, allowing the entire VOC to be removed from operation. Two interesting, but mutually disadvantageous, phenomena were observed after the next group of jobs arrived in the queue (figure 9.11 at approximately 130 seconds wall time). Due to caching of the virtual machine monitor process, and its initial read-only VM data, on the physical host, the VOC nodes were able to boot somewhat faster during VOC re-start. However, the rapid job execution and simplistic scheduling algorithm in the watchdog combined to limit the total size of the VOC to 10 nodes following the re-start. The result of this combination of properties was exceptionally low throughput following the re-start.

Figure 9.8: Two submissions of 50 jobs, 10-second execution time, submitted locally

Figure 9.9: Submitted 10 jobs every 90 seconds, 10-second execution time, submitted locally

Figure 9.10: Submitted 10 jobs every 30 seconds, 10-second execution time, submitted locally

Figure 9.11: Submitted 10 jobs every 30 seconds, 1-second execution time, submitted locally

9.2.3 Jobs Submitted Through Globus

A second suite of identical tests was executed using the Globus job manager as the submission vehicle. In order to avoid Globus errors resulting from multiple simultaneous job submissions, it was necessary to introduce a small delay of two seconds between the submissions of individual jobs. As a result, the process of submitting a batch of jobs through the Globus system was longer than the process of local submission, resulting in the slower queue size growth visible in figures 9.12 through 9.15. In addition, some non-constant delays were observed between submission of jobs to Globus and the time at which the jobs were delivered to Condor. This delay, especially evident in the queue size jitter shown in figure 9.13, was not particularly alarming, once again due to the emphasis placed by both systems on high throughput, as opposed to high performance.

Figure 9.12: Two submissions of 50 jobs, 10-second execution time, submitted through Globus

Figure 9.13: Submitted 10 jobs every 90 seconds, 10-second execution time, submitted through Globus

Figure 9.14: Submitted 10 jobs every 30 seconds, 10-second execution time, submitted through Globus

Figure 9.15: Submitted 10 jobs every 30 seconds, 1-second execution time, submitted through Globus

Although net throughput was slightly reduced by the late arrival of the last set of jobs, the addition of Globus as a “buffer” reduced the extremity of the VOC contraction. While the size of the VOC was reduced, necessitating the rebooting of some nodes, the VOC was never completely removed from operation as it was in the local test. This buffering effect was not observed with extremely short jobs, as depicted in figure 9.15, where the VOC was briefly completely terminated.

9.3 Overlay Scheduling

Although the overhead of adding virtualization to grid systems had previously been evaluated [54], and the overhead of using the IPOP network overlay had been independently studied [72], the combination of both overheads in an overlaid scheduling environment with Virtual Organization Clusters had not been measured. Since the overhead of virtualization would primarily affect job service time in a grid system with sufficient physical resources to handle all jobs concurrently, a chief concern was the amount of latency that might be added by the overlay scheduling system, which necessitated the use of an overlay network. In order to ensure that this overlay overhead would not have unexpected detrimental impacts on compute-bound jobs running within the virtual machines, tests were conducted using an Open Science Grid (OSG) [125] site configured to support virtualization. The OSG grid site was configured with 16 dual-core compute nodes, each with an Intel Xeon 3070 CPU and 4 binary gigabytes (GiB) of Random Access Memory (RAM), with the Kernel-based Virtual Machine (KVM) [129] hypervisor running within a 64-bit installation of CentOS 5.2. Virtual machine images were configured with 32-bit CentOS 5.2 and located on a shared Parallel Virtual FileSystem (PVFS) [28] store. Virtual machine instances were booted directly from the image located on the shared filesystem, without first staging the image to the local compute nodes, using the “snapshot” mode of KVM. These shared images were made available in a read-only configuration, with non-persistent writes redirected to a temporary file on local disk storage at each compute node. Internet connectivity for the test site was provided by an edge router using Network Address Translation (NAT), with the physical compute nodes isolated in a private IPv4 subnetwork. OSG connectivity was provided through the standard OSG software stack including Globus [64].

A VOC head node was constructed using a virtual machine hosted by an off-site laboratory workstation. Condor [155] was installed on the head node to serve as a scheduler, and synthetic workload jobs were submitted directly to the VOC local pool. A watchdog daemon process, using the naive greedy watchdog algorithm with added support to maintain a minimum number of running VMs at all times, was run on the same head node. This watchdog created pilot jobs, which were submitted through a Globus client to the OSG test site, in response to the arrival of synthetic workload jobs in the VOC Condor queue. The pilot jobs started virtual compute nodes, which joined an IPOP network anchored at the workstation by contacting its bootstrap service. Once connected to the private IPOP network, the virtual compute nodes joined the Condor pool created by the collector on the VOC head node (the workstation-hosted virtual machine, also joined via IPOP), and Condor scheduled and executed the actual test jobs. Whenever the watchdog daemon determined that an excess number of pilot jobs were running in comparison to the size of the Condor queue, the pilot jobs were instructed to terminate, causing the virtual machines to be terminated.

9.3.1 Overlay Scheduling and Networking

To measure the relative performance difference of using a VOC with overlay scheduling and IPOP overlay networking, two sets of tests were conducted. A synthetic workload consisting of a 10-minute sleep procedure was devised, in order to approximate compute-bound jobs without incurring potential variations in service times that could result from running an actual compute-bound job within a virtual machine. In the control trials, a batch of 50 sleep jobs was submitted directly to the local scheduler on the physical grid site head node. For the experiment trials, the same batch of 50 jobs was submitted directly to the Condor central manager running within the VOC. Total makespan times were collected for both sets of trials, and each trial was repeated 10 times to reduce the effects of random variation in observed makespan lengths. Descriptive and relative statistics were computed for the makespan times. Throughput measures in jobs per second were also computed.

Results of the trials, summarized in table 9.8, indicated a slight increase in average makespan time (less than one half of one percent) for jobs submitted through the overlay scheduling system, compared to jobs submitted directly to the physical cluster scheduler. This increased makespan length corresponded to a similarly small decrease in job throughput resulting from the addition of the overlay. In the worst case observed in all trials, the maximum makespan and minimum throughput were affected by less than one percent. Variation in makespan and throughput observations between trials increased substantially, by over 400%, when the overlay scheduler and network were added, likely due to the addition of a second layer of interval scheduling with the additional Condor pool overlaid on top of the physical Condor pool. Plotted traces of mean observations (figures 9.16 and 9.17) further confirmed the minimal overhead of the VOC overlay system.

Table 9.8: Observed makespan lengths and system throughputs for 10 overlay experiment trials

                                   Test: None      Overlay: IPOP     Change: Absolute     Change: Relative
Minimum Makespan (s)               2706            2714              8.000                0.2956 %
Median Makespan (s)                2709            2720              11.00                0.4061 %
Maximum Makespan (s)               2710            2737              27.00                0.9963 %
Mean Makespan (s)                  2708            2722              13.50                0.4985 %
Makespan Standard Deviation (s)    1.414           7.735             6.321                447.0 %
Minimum Throughput (Jobs/s)        1.845 × 10^-2   1.827 × 10^-2     -1.820 × 10^-4       -0.9865 %
Median Throughput (Jobs/s)         1.846 × 10^-2   1.839 × 10^-2     -7.467 × 10^-5       -0.4045 %
Maximum Throughput (Jobs/s)        1.848 × 10^-2   1.842 × 10^-2     -5.447 × 10^-5       -0.2948 %
Mean Throughput (Jobs/s)           1.846 × 10^-2   1.837 × 10^-2     -9.000 × 10^-5       -0.4954 %
Throughput Standard Deviation      9.644 × 10^-6   5.207 × 10^-5     4.243 × 10^-5        439.9 %

Figure 9.16: Autonomic VOC size adjustment behavior when executing long jobs without an overlay network (average of 10 repetitions of the experiment).
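Throughput here is simply the 50-job batch size divided by the makespan, and the relative changes in table 9.8 follow directly from the makespan figures; a small Python check using the mean values from the table is:

    JOBS_PER_BATCH = 50

    def throughput(makespan_s):
        """Jobs per second for one 50-job trial."""
        return JOBS_PER_BATCH / float(makespan_s)

    def relative_change(before, after):
        """Percentage change of `after` relative to `before`."""
        return (after - before) / float(before) * 100.0

    # Mean makespans from table 9.8 (seconds)
    print("%.4f jobs/s" % throughput(2708))           # prints 0.0185 (1.846e-2, as tabulated)
    print("%.4f %%" % relative_change(2708, 2722))    # prints ~0.52; table lists 0.4985,
                                                      # computed from unrounded means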

9.3.2 VOC Adjustment Policy

Figure 9.17: Autonomic VOC size adjustment behavior when executing short jobs privately scheduled using the IPOP overlay network (average of 10 repetitions of the experiment).

A second experiment was performed to evaluate the behavior of the watchdog daemon when monitoring the private scheduler queue and adjusting the size of the Virtual Organization Cluster. As illustrated in figure 9.18, the simple greedy watchdog algorithm proved to be over-responsive when batches of microbenchmark (10-second) jobs were submitted to the scheduler. The VOC was rapidly expanded to use all 16 available processor cores at the first watchdog interval. A delay of approximately 60 seconds was observed while the VOC nodes booted, after which the short user jobs quickly ran to completion. Even though additional jobs arrived in the queue while the VOC nodes were booting, all jobs from the first two batches had completed within 140 seconds. At this time, the size of the VOC was shrunk to zero, causing the virtual machines to vacate the physical systems completely. When another batch of short jobs arrived at 180 seconds into the test, all 16 virtual machines had to be restarted, resulting in another boot delay.

To provide a buffer against excessively short jobs, a Delayed Response adaptation algorithm was devised. This policy resulted in the immediate creation of a pair of VOC nodes to remain active at all times for the handling of instantaneously short (by design or by failure) jobs. In addition, the VOC was expanded by only one node at a time, and expansion only occurred if the size of the Condor scheduler queue exceeded the VOC size for at least 10 watchdog intervals. Similarly, the VOC was decreased by one node at a time, to a minimum size of two nodes, only when the size of the VOC exceeded the size of the Condor queue for at least 10 watchdog intervals. The results of submitting two batches of short jobs have been illustrated in figure 9.19, which shows a slow increase in the number of VOC nodes in response to the first batch of jobs. A slow decrease in the number of VOC nodes was observed between batches, followed by another slow increase as the second batch of jobs arrived. Once all jobs from the second batch completed, the VOC size slowly declined to the minimum size (two) specified by the policy.
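The Delayed Response policy is described above only in prose; a minimal sketch of the decision rule, in Python with hypothetical state and callback names, is:

    class DelayedResponseWatchdog(object):
        """Hedged sketch of the Delayed Response policy (not the prototype code).

        Expands or contracts the VOC by a single node, and only after the queue
        has exceeded the VOC size (or vice versa) for `patience` consecutive
        watchdog intervals; a minimum of `min_nodes` VMs is always kept running.
        """

        def __init__(self, start_vm, stop_vm, min_nodes=2, patience=10):
            self.start_vm = start_vm
            self.stop_vm = stop_vm
            self.min_nodes = min_nodes
            self.patience = patience
            self.grow_streak = 0
            self.shrink_streak = 0

        def interval(self, queued_jobs, voc_size):
            """Called once per watchdog interval with current queue and VOC sizes."""
            if queued_jobs > voc_size:
                self.grow_streak += 1
                self.shrink_streak = 0
                if self.grow_streak >= self.patience:
                    self.start_vm()
                    self.grow_streak = 0
            elif voc_size > max(queued_jobs, self.min_nodes):
                self.shrink_streak += 1
                self.grow_streak = 0
                if self.shrink_streak >= self.patience:
                    self.stop_vm()
                    self.shrink_streak = 0
            else:
                self.grow_streak = 0
                self.shrink_streak = 0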

Figure 9.18: Simple Greedy Algorithm for autonomic VOC size adjustment: VMs are started whenever there are excess jobs in the scheduler queue and free physical nodes available. Once the jobs complete and the queue size decreases, VMs are terminated quickly.

Figure 9.19: Delayed Response Algorithm for autonomic VOC size adjustment: a minimum of 2 VMs are kept in operation at all times, and VMs are started and stopped in response to queue size trends over a time series of samples.

Figure 9.20: Short operational test (44 hours) with the physical cluster configured to support a 16-node Virtual Organization Cluster dedicated to the Engage Virtual Organization.

9.4 Operational Tests

After completion of the prototype performance tests with synthetic workloads, the prototype cluster was configured for operational tests. In these experiments, Virtual Organization Clusters were transparently provided to specific Virtual Organizations on the Open Science Grid, and jobs from those VOs were executed in 32-bit CentOS virtual machines. On June 1, 2009, a short operational test was started, in which a single VOC was placed into service on behalf of the Engage VO, using the delayed response watchdog provisioning algorithm with a minimum of two VMs and a maximum of sixteen VMs. After approximately 44 hours of testing, the VOC was removed from service on June 3, 2009. As visualized in figure 9.20, several bursts of jobs arrived during the test period, and these bursts were accommodated by increasing the size of the VOC to the maximum specified level (16 nodes).

Following the short operational test, a long operational deployment – approximately two months in length – was effected using the same watchdog algorithm. Two VOCs were attached to the Open Science Grid, with one VOC dedicated to the Engage VO and the other VOC dedicated to the NanoHub VO. Both VOCs were set to a minimum size of two VMs and a maximum size of sixteen VMs, which resulted in utilization of all 32 physical CPU cores whenever both VOCs were at maximum size. The long-running operational experiment commenced on June 4, 2009 and completed on August 17, 2009.

Figure 9.21: Long operational test: Engage VO. A second operational VOC, dedicated to the NanoHub VO, was sharing the same hardware.

As illustrated in figure 9.21, jobs associated with the Engage VO continued to arrive in bursts for the first two thirds of the test period, resulting in temporary increases in VOC size. Bursts of jobs associated with the NanoHub VO (figure 9.22) were less frequent, significantly smaller in size, and limited to the first third of the test period. Subsequent analysis determined that most NanoHub jobs, and an increasingly larger number of Engage jobs, required 64-bit operating environments. Since the prototype VOCs provided 32-bit environments, fewer jobs were sent to the prototype system as August approached.

After completion of the operational tests, the prototype system became obsolete for research purposes. The Intel Xeon processors installed in the physical compute nodes were equipped with the first generation of virtualization extensions, which did not include extended page tables or vir- tualized Input/Output devices. Rather than immediately discarding the hardware, a single Virtual

Organization Cluster was deployed for the STAR VO [6], using a custom virtual machine image provided by the VO administrators. In this deployment, the VOC was still provided transparently on OSG by the system, although the VO provided the software stack in a sort of “semi-transparent” arrangement. Thus, the prototype implementation was converted into a production system, and it was still in service as of October 2009.

Figure 9.22: Operational test: NanoHub VO. A second operational VOC, dedicated to the Engage VO, was sharing the same hardware.

9.5 System Management with Stoker

The contents of this section have been published in [114].

Once the prototype system was constructed and placed into operational service, ongoing system maintenance was performed using the Stoker remote administration tool. In order to measure overhead created by the resolution and threading management processes and to assess the performance improvement of parallelizing the Stoker command execution process, several tests were conducted. The first test, with results shown in figure 9.23, parallelized a short-running /bin/hostname task. Parallel execution was found to be of limited utility beyond 4–6 threads in this case due to the extremely short run time of the program (≈ 0.001s) relative to the total time necessary to spawn a thread and execute a remote command via SSH (≈ 0.225s). This test was conducted on a low-latency network inside a private computing cluster.
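The shape of this measurement can be illustrated with a small thread-pooled SSH fan-out in modern Python; this is a sketch in the spirit of the test above, not the Stoker source, and the node names and thread counts are placeholders.

# Illustrative thread-pooled remote command execution (not the Stoker implementation).
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

def run_remote(host, command):
    # Run one command on one host via the system ssh client.
    result = subprocess.run(["ssh", "-o", "BatchMode=yes", host, command],
                            capture_output=True, text=True)
    return host, result.returncode, result.stdout.strip()

def fan_out(hosts, command, threads):
    # Execute the command on every host using a fixed-size thread pool.
    start = time.time()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(lambda h: run_remote(h, command), hosts))
    return results, time.time() - start

if __name__ == "__main__":
    nodes = ["node%02d" % i for i in range(1, 17)]   # hypothetical 16-node cluster
    for n in (1, 2, 4, 8, 16):
        _, elapsed = fan_out(nodes, "hostname", threads=n)
        print("%2d threads: %.3f s" % (n, elapsed))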

Figure 9.23: Stoker performance on a low-latency 16-node cluster when parallelizing a hostname job

The next test (figure 9.24) involved a longer-running job: a ten-second process sleep that was engineered to simulate the restart procedure of a network service. This test was conducted on the same private, low-latency cluster as the first test. Performance was measured to the limit of one thread per target (16 nodes in this case). It should be noted that no performance improvement was measured for 8–15 threads.

Since the 10-second sleep procedure had a known run time, network and SSH overhead could be measured. As shown in figure 9.25, these overheads generally decreased in an absolute sense until 14 threads were utilized. Beyond this point, the overhead of spawning new threads began to increase. However, overhead as a percentage of the total time remained constant until 16 threads, the limit for this test, were utilized.
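One plausible way to recover these metrics from the raw timings is sketched below. It assumes that each thread dispatches its share of 10-second sleeps sequentially, so that the useful work at k threads over 16 targets is ceil(16/k) × 10 seconds; this is an illustration of the bookkeeping, not the analysis code used for the figures.

# Sketch of the overhead bookkeeping behind figures 9.25 and 9.27, assuming each
# thread runs its share of 10-second sleeps sequentially (an assumption).
import math

SLEEP_TIME = 10.0  # known run time of the synthetic job, in seconds

def overhead_metrics(total_times, num_targets):
    # total_times maps thread count -> measured wall-clock time in seconds.
    absolute = {k: t - math.ceil(num_targets / k) * SLEEP_TIME
                for k, t in total_times.items()}
    max_overhead = max(absolute.values())    # observed at 1 thread in these tests
    normalized = {k: 100.0 * o / max_overhead for k, o in absolute.items()}
    relative = {k: 100.0 * o / total_times[k] for k, o in absolute.items()}
    return normalized, relative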

To determine whether overheads would be more significant across a higher-latency network, another 10-second sleep test was conducted on 27 public laboratory workstations. Results of this test, summarized in figure 9.26, showed improvements in performance as threads increased from 1 to 6. Performance improvements declined beyond 6 threads, even though 27 machines were targeted with the 10-second sleep job. After further testing, including manual execution of the same job using shell scripts instead of Stoker, it was determined that SSH authentication latency was increasing as the number of simultaneous SSH processes was increased. The root cause of this behavior was found to be serialization of the SSH authentication requests resulting from the use of a single Network File System (NFS) share for storing the SSH keys used for authentication. Since the single NFS server was also utilized for unrelated purposes, additional latency was added to the authentication process.

As shown in figure 9.27, overhead as a percentage of total execution time increased with increasing parallelism, effectively negating the benefits of additional Stoker actor threads.

Figure 9.24: Stoker performance on a low-latency 16-node cluster when parallelizing a 10-second sleep operation

Figure 9.25: Comparative overheads of network and SSH with respect to the remote job being executed on a low-latency 16-node cluster. The normalized overhead is expressed as a percentage of the maximum observed overhead, which occurred when the number of threads was equal to 1. Relative overhead is expressed as the percentage of network and SSH overhead present in the total execution time of a 10-second sleep job.

Figure 9.26: Stoker performance on a high-latency group of 27 public laboratory workstations

Figure 9.27: Comparative overheads of network and SSH with respect to the remote job being executed on a high-latency 27-node group of public workstations.

Chapter 10

Conclusions

Virtual Organization Clusters provide a mechanism by which individual Virtual Organizations may operate and administer virtual computational clusters that support the computing needs of VO-affiliated users. As re-illustrated in figure 10.1, VOCs are deployable in both transparent

(non-participating) and participating contexts. In the transparent case, grid sites are able to separate the complexity of user-facing software stacks from the underlying services required to provide network connectivity and basic computational services. When VOs choose to make use of VOCs directly, pilot jobs are used to lease resources from physical sites, and the leased resources are joined via an overlay network to form a virtual cluster. Within this overlay cluster, the VO is able to make scheduling and resource allocation decisions for its users, who submit jobs to a dedicated VOC

Computing Element.

Virtual Organization Clusters enable the execution of long-lived customized computational environments, which may span Grid sites to make best use of available physical fabric to provide cloud computing resources for user applications. Although VOCs will directly benefit users by improving job compatibility across sites, the implementation details of this new architecture will remain transparent to the users. Instead of forcing users to create and manage explicit leases, VOCs autonomically adapt to changing resource demands, allowing users to focus on their own domain-specific research. Since these systems consist entirely of dynamically allocated virtual environments executing on remote grid infrastructure, Virtual Organization Clusters exist as self-provisioned clouds on the grid.

Conclusions regarding the utility and viability of VOCs are presented in section 10.1. Section 10.2 describes the direct impacts of VOCs on grid systems, after which section 10.3 discusses the broader impacts of this research. Finally, section 10.4 provides a brief overview of related opportunities for future research. Information about the software packages included as attachments to this dissertation follows in the appendices.

Figure 10.1: The architecture of Virtual Organization Clusters, revisited.

10.1 Utility and Viability of Virtual Organization Clusters

As demonstrated by the prototype results (chapter 9), VOCs can execute High-Throughput Computing (HTC) grid jobs with reasonable overheads under 9%. Although High Performance Computing (HPC) applications that use latency-sensitive MPI library routines experience substantially greater overheads, VOCs still enable execution on physical sites that lack MPI support. This overhead represents a trade-off between performance and environment flexibility: whenever a job is directly compatible with the physical system environment at a grid site, the addition of the VOC layer will reduce execution performance for that job. However, when the requirements of a job cannot be met by the environment present on the physical site, the addition of the VOC layer enables job execution that would otherwise be impossible, as demonstrated by the simulation results (chapter 8). The reduced execution performance in this case is preferable to the lack of execution that would otherwise occur.

The use of pilot jobs and IPOP overlay networking enables the provisioning of Virtual Organization Clusters with overlay scheduling, permitting each Virtual Organization to make resource allocation and job priority decisions within its private virtual environment (chapters 6 and 7). In this regard, VOCs are similar to pilot job frameworks used by High-Energy Physics (HEP) experiments (section 4.3.2). As demonstrated through tests using the prototype grid-connected system, the added overhead of the scheduling and network overlay is negligible for compute-bound grid jobs. Simulation results using actual trace data from the Enabling Grids for E-sciencE (EGEE) grid indicate that widespread VOC deployment on a grid system would not adversely affect the aggregate behavior of the grid, even though virtualization systems add execution overhead. VOCs reduce total aggregate queuing by making all jobs compatible with all sites composed of machines with the same instruction set architecture. Moreover, the virtual head node for each VOC creates a single submission point for all jobs affiliated with a particular VO, simplifying job submission for grid users.

10.2 Direct Impacts

Virtual Organization Clusters are a promising mechanism for delivering the benefits of grid virtualization systems, including environment customization, VO isolation, and legacy application support [57], to existing production grids without large-scale disruption. Through the use of overlay scheduling, individual VOs would be able to make resource allocation decisions for their members, allowing site and VO policies to be independent. Furthermore, the customization capabilities offered by virtualization would empower VOs to provide the software stacks required by their users, instead of forcing users to adapt applications to the available software environments installed by site administrators. As a result, existing computational grids could be made more useful for domain applications and more accessible to domain experts, without forcing users into system administration roles.

VOCs enable VOs to create and customize entire computational clusters for end users; these clusters must remain reachable through existing grid middleware. However, since the VOC Model does not proscribe other interfaces for submitting workloads to VOCs, a VO could provide a computational cloud directly to a set of users. Submission of jobs through an alternate interface to this cloud could bypass the existing grid middleware altogether, further simplifying the experience for end users. However, such a mechanism would also bypass the existing grid accounting and security mechanisms, in the same way such mechanisms can be bypassed by pilot job frameworks [143].

Thus, the new paradigm of grid utilization provided by this architecture requires a new conceptualization of the grid as a provider of resource abstractions. Instead of providing full computational services to end users, a grid system using VOCs could provide back-end computational fabric to VOs, which would then serve users directly.

10.3 Broader Impacts

The deployment of Virtual Organization Clusters on accessible grid systems would have broader impacts on the scientific community and society at large. As defined by the National Science

Foundation [119], these impacts can be classified into five main areas: integration of research and education, broadening participation, enhancing infrastructure, broad dissemination, and benefits to society.

Virtual Organization Clusters can be used to integrate research with education by enabling cluster construction in the classroom environment. Students would be able to construct computational clusters on top of grid systems using nothing more than a laptop with commodity Internet access and virtualization software. The dedicated hardware, cooling, power, networking, and other physical infrastructure required to construct a physical computational cluster, along with the software challenges inherent in its setup, are not present. Instead, students would be able to construct a VOC image within a few class periods, after which they would be able to instantiate virtual clusters on a commodity grid or a dedicated educational grid. Computational jobs spanning a wide range of disciplines, from computer science to the physical sciences, can be submitted by different student groups as part of an educational experience that demonstrates the benefits grid computing provides to diverse disciplines.

A second benefit of the modest requirements of VOCs hosted by cloud systems is that VOCs would be widely accessible to a large audience, regardless of the availability of cluster hardware or specialized networking at any given site. VOCs are therefore suitable for all educational institutions, from research universities to technical and community colleges, without requiring the availability of infrastructure, space, or dedicated support staff. This technology is ideal for broadening participation among institutions that might not otherwise have access to customized parallel computing resources, including rural institutions, institutions primarily serving underrepresented minority groups, and institutions participating in the EPSCoR program [118]. Moreover, the modest requirements of VOCs would enable participation at the K-12 level, allowing teachers and students to deploy custom services to support specific lesson plans. Use of VOC technologies at the preparatory level would have the benefit of exposing young people to the breadth and power of cloud computing, increasing awareness and interest in computational technologies as key enablers of both future scientific discoveries and future industries. VOCs also permit individuals or small groups to create internationally distributed systems, creating opportunities that could appeal to both genders equally while crossing cultural, ethnic, and racial barriers.

VOCs would provide new mechanisms to enhance and extend current grid infrastructures, without requiring disruptive middleware and system replacements. Existing physical cluster systems could be multiplexed among different custom virtual clusters instead of using a shared multiprogramming model to multiplex hardware-based systems among users. The resulting customization would increase the utility of these systems to end users, since the environments used by the domain scientists would contain self-selected libraries and application software sets. Domain scientists would thus be able to match the systems to their needs, instead of adapting their research applications to the capabilities provided by the system. Moreover, the lower technical entry barriers of VOCs would enable groups of domain scientists to form new, highly specialized Virtual Organizations focused on narrow problem areas. The resulting increased collaboration among such scientists would yield greater research productivity.

Both VOCs and the research behind them are designed for broad dissemination. VOCs themselves exist in the cloud and may be replicated across a wide range of systems, effectively becoming a new form of scientific communication. Dissemination of initial software and VOC research results has already occurred. The Stoker distributed management system [35] and the SimVOC simulator [3] are freely available under open-source licenses. In addition, the following publications have resulted from VOC and related research:

• M. Murphy and S. Goasguen. “Virtual Organization Clusters: Self-Provisioned Clouds on the Grid.” To appear, Future Generation Computer Systems.

• M. Murphy, L. Abraham, M. Fenn, and S. Goasguen. “Autonomic Clouds on the Grid.” Journal of Grid Computing, 8(1): 1-18, 2009. [113]

• L. Stout, M. Murphy, and S. Goasguen. “Kestrel: An XMPP-based Framework for Many Task Computing Applications.” MTAGS 2009. [149]

• M. Fenn, M. Murphy, and S. Goasguen. “A Study of a KVM-based Cluster for Grid Computing.” ACMSE ’09. [54]

• M. Murphy, B. Kagey, M. Fenn, and S. Goasguen. “Dynamic Provisioning of Virtual Organization Clusters.” CCGrid ’09 (21% acceptance rate). [117]

• M. Murphy, M. Fenn, L. Abraham, J. Canter, B. Sterrett, and S. Goasguen. “Distributed Management of Virtual Cluster Infrastructures.” SMTPS ’09. [114]

• M. Murphy, M. Fenn, and S. Goasguen. “Virtual Organization Clusters.” PDP 2009 (42% acceptance rate). [115]

• M. Fenn, M. Murphy, J. Martin, and S. Goasguen. “An Evaluation of KVM for Use in Cloud Computing.” ICVCI ’08. [55]

• E. Harney, S. Goasguen, J. Martin, M. Murphy, and M. Westall. “The Efficacy of Live Virtual Machine Migrations over the Internet.” VTDC ’07. [79]

Additionally, two journal articles and one additional conference proceeding, all directly related to

Virtual Organization Clusters, are under review at the time of this writing.

If widely deployed, Virtual Organization Clusters would have profound positive impacts on society. By separating parallel computational environments from the underlying physical hardware, it would be possible to locate computational clusters in geographic areas that optimize the use of energy resources and minimize carbon production. Since VOCs can be moved between physical sites, it would be possible to respond to changing resource availability, enabling optimal use of renewable energy. For example, with future efficient photovoltaic technologies, it might be possible to migrate grid computing jobs world-wide on a constant basis, exclusively utilizing solar energy. With existing energy technologies, virtualized cluster systems can be located in areas where nuclear and renewable sources make up a larger share of grid power, minimizing the fraction of computing performed using fossil fuels. Similarly, VOCs can be migrated geographically to take advantage of excess energy available during regional off-peak periods. In addition to these environmental benefits, the direct benefit of software environments customized for domain scientists will be better results from scientific modeling and prediction systems, which will in turn translate into discoveries that improve quality of life.

10.4 Future Work

Virtual Organization Clusters provide a new architecture for grid computing, which improves the usability of grid resources through the addition of virtualization technology. However, this new architecture and VOC research to date raise new questions that are not answered by the VOC Model itself. These issues may be loosely aggregated into four major areas: evaluation of the impact of

VOCs, efficiency of VOC implementations, system security, and migration of the VOC concept to generic cloud systems that are not part of a grid architecture.

Although the VOC Model provides a foundation for reasoning about the performance impacts of VOCs, a priori evaluation of the impacts of VOC deployment in specific situations is still difficult, while large-scale VOC deployments are not likely to be embraced on federated grid systems without some performance assurances. Improved simulation systems and the creation of analytical models could provide a more accurate estimate of VOC impacts than the current model and implementation of SimVOC are able to produce. Better heuristics for mapping production grids would improve simulation fidelity, as would a more accurate implementation of simulated scheduler components. Although simulation is the preferred mechanism for evaluating experimental architectures in the grid community [25], a rigorous analytical model based on queuing theory also could provide insight into the effects of virtualization overhead and overlay scheduling on production grids.
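As a starting point for such an analysis (an illustration only, not a result of this work), each grid site could be approximated as an M/M/1 queue whose service rate is deflated by the fractional virtualization overhead o, giving a mean job sojourn time of

T = \frac{1}{\mu / (1 + o) - \lambda}

where \mu is the native service rate of the site and \lambda its job arrival rate. Because VOCs make every job compatible with every site of the same instruction set architecture, load balancing lowers \lambda at congested sites; whenever the resulting reduction in queuing exceeds the inflation of service time by the factor 1 + o, mean sojourn time decreases, which is consistent with the simulation results in chapter 8.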

Implementations of VOCs could benefit from further research into optimal VOC sizing and adjustment algorithms, which would balance sizing responsiveness with the costs of booting virtual machines. More efficient virtualization systems and mechanisms for bypassing the hypervisor when performing I/O operations would reduce the performance impacts of VOCs. An adaptive, fault-tolerant scheduling system that is robust to continuously changing environments and tuned for VOC use would enable more efficient scheduling of user jobs within the cloud. In terms of human factors, the design of best practices for VO system administration could optimize procedures for managing the software stacks made available to end users, thus improving user experience.

Another major area of research opportunity for grid systems utilizing the VOC architecture – and for cloud computing systems in general – is that of security, authorization, and data privacy. The addition of virtual machines to a production grid site may raise security concerns when insecure services, such as the Network File System, are made available on the local private network. Mechanisms for enforcing isolation of VMs will be needed to address the concerns of physical system administrators. Procedures and systems for validating VM images provided by the VOs will be needed, in order to ensure that a VM image is not compromised by an intermediate attacker. Infrastructure for ensuring the privacy of potentially sensitive end user data will be necessary before VOCs (or other cloud systems) could be used for sensitive activities.

Finally, an ultimate objective of research into self-provisioned clouds for scientific computing would be to move the virtual clusters to generic cloud computing hosts, eliminating reliance upon the grid entirely. Although scientific clusters constructed in this way would not be VOCs according to the definition presented in chapter 5, the same transparency and autonomic management principles could be used to create entity-dedicated clusters without incurring direct hardware expenses. These clouds would utilize readily available hosting services, such as Amazon EC2 [11], to provide compatible environments for large-scale simulations and other domain-specific applications, leading to improved productivity for end users and decreased costs for system stakeholders.

Appendices

Appendix A Electronic Attachments

Several software applications and data sets are provided as electronic attachments to this dissertation. These attachments include:

• A copy of the Stoker distributed management application, in Python source form

• A copy of the SimVOC simulation system, in Python source form

• Pre-processed trace data sets for the Enabling Grids for E-sciencE (EGEE) production system, for use with SimVOC

• Operational data logs from the Clemson University Cyberinfrastructure Research Group’s Furnace cluster with transparent VOCs deployed

License agreements for these attachments are provided in the next appendix. Both Stoker and

SimVOC are released under the Apache License, version 2 (appendix B.3). EGEE trace data is provided under a license required by the Grid Observatory, which is listed in appendix B.1. Finally, the operational data logs are licensed under an MIT-style license, provided in appendix B.2.

Appendix B License Agreements

EGEE simulation data sets are provided by the Grid Observatory subject to the agreement in section B.1. Trace data from the Furnace cluster is supplied by the Cyberinfrastructure Research

Group in the Clemson University School of Computing. Data are copyright 2008-2009 Clemson

University and released under an MIT-style license (section B.2).

The Stoker and SimVOC applications that accompany this dissertation are released under the Apache License, version 2. A copy of this license is provided in section B.3.

B.1 Grid Observatory Data

The following conditions apply to data provided from the Grid Observatory:

All information, software and documentation are provided "as-is". The use of the material is restricted to scientific research. Significant use of the material for publication requires acknowledgement by the following citation: The datasets used in this work have been provided by the Grid

Observatory (www.grid-observatory.org). The Grid Observatory is part of the EGEE-III EU project

INFSO-RI-222667.

B.2 MIT License

Data sets provided by the Clemson University Cyberinfrastructure Research Group are released under the following license:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

B.3 Apache License, version 2.0

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition,

"control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this

License, Derivative Works shall not include works that remain separable from, or merely link (or

bind by name) to the interfaces of, the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or

Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the

Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a

Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a

Contribution has been received by Licensor and subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of this License, each

Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of this License, each

Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted.

If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that

You meet the following conditions:

(a) You must give any other recipients of the Work or Derivative Works a copy of this

License; and

(b) You must cause any modified files to carry prominent notices stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative

Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative

Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

You may add Your own copyright statement to Your modifications and may provide ad- ditional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative

Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each

Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don’t include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format.

We also recommend that a file or class name and description of purpose be included on the same

"printed page" as the copyright notice for easier identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the

License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF

ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Bibliography

[1] EGEE Accounting Portal. [Online]. Available: http://www3.egee.cesga.es

[2] Models in the research process. [Online]. Available: http://www2.uiah.fi/projects/metodi/ 177.htm [3] SimVOC. [Online]. Available: http://code.google.com/p/simvoc [4] SQLite. [Online]. Available: http://www.sqlite.org/

[5] The PanDA Production and Distributed Analysis System. [Online]. Available: https: //twiki.cern.ch/twiki/bin/view/Atlas/PanDA [6] The STAR Experiment. [Online]. Available: http://drupal.star.bnl.gov/STAR/ [7] S. Adabala, V. Chadha, P. Chawla, R. Figueiredo, J. Fortes, I. Krsul, A. Matsunaga, M. Tsug- awa, J. Zhang, M. Zhao, L. Zhu, and X. Zhu, “From virtualized resources to virtual computing grids: the In-VIGO system,” Future Generation Computer Systems, vol. 21, no. 6, pp. 896–909, June 2005. [8] S. Adabala, A. Matsunaga, M. Tsugawa, R. Figueiredo, and J. Fortes, “Single sign-on in In- VIGO: Role-based access via delegation mechanisms using short-lived user identities,” in 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004.

[9] K. Adams and O. Agesen, “A comparison of software and hardware techniques for x86 vir- tualization,” in Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’06), San Jose, CA, 2006. [10] J. M. Alonso, V. Hernandez, and G. Molto, “Towards on-demand ubiquitous metascheduling on computational grids,” in 15th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP ’07), Naples, Italy, February 2007. [11] Amazon Web Services, “Amazon elastic compute cloud (amazon EC2).” [Online]. Available: http://aws.amazon.com/ec2/ [12] Argonne National Laboratory, “Bcfg2.” [Online]. Available: http://trac.mcs.anl.gov/projects/ bcfg2 [13] C. Banino, O. Beaumont, L. Carter, J. Ferrante, A. Legrand, and Y. Robert, “Scheduling strategies for master-slave tasking on heterogeneous processor platforms,” IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 4, pp. 319–330, April 2004.

[14] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the art of virtualization,” in Nineteenth ACM Symposium on Operating Systems Principles, 2003.

[15] O. Beaumont, A. Legrand, L. Marchal, and Y. Robert, “Assessing the impact and limits of steady-state scheduling for mixed task and data parallelism on heterogeneous platforms,” in 3rd International Workshop on Parallel and Distributed Computing/Third International Symposium on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks, 5–7 July 2004, pp. 296–302.

[16] ——, “Steady-state scheduling on heterogeneous clusters: why and how?” in Proc. 18th Inter- national Parallel and Distributed Processing Symposium, 26–30 April 2004, p. 171. [17] O. Beaumont, L. Carter, J. Ferrante, A. Legrand, L. Marchal, and Y. Robert, “Centralized versus distributed schedulers for bag-of-tasks applications,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 5, pp. 698–709, May 2008.

[18] R. Bradshaw, N. Desai, T. Freeman, and K. Keahey, “A scalable approach to deploying and managing appliances,” in TeraGrid 2007, Madison, WI, June 2007. [19] P. Brinch Hansen, “The nucleus of a multiprogramming system,” Communications of the ACM, vol. 13, no. 4, pp. 238–242, 1970.

[20] G. Bruno, M. J. Katz, F. D. Sacerdoti, and P. M. Papadopoulos, “Rolls: Modifying a standard system installer to support user-customizable cluster frontend appliances,” in IEEE Interna- tional Conference on Cluster Computing, September 2004. [21] M. Burgess, “Recent developments in Cfengine,” in Unix.nl Conference, Waardenburg, Nether- lands, 2001.

[22] ——, “A tiny overview of Cfengine: Convergent management agent,” in 1st International Workshop on Multi-Agent and Robotic Systems, 2005. [23] ——, “A control theory perspective on configuration management and Cfengine,” ACM SIGBED Review, vol. 3, no. 2, pp. 12–16, April 2006.

[24] M. Burgess and R. Ralston, “Distributed resource administration using Cfengine,” Software – Practice and Experience, vol. 27, no. 9, pp. 1083–1101, 1997. [25] R. Buyya and M. Murshed, “GridSim: A toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing,” Journal of Concurrency and Com- putation: Practice and Experience (CCPE), vol. 14, pp. 1175–1220, 2002.

[26] R. Buyya, R. Ranjan, and R. N. Calheiros, “Modeling and simulation of scalable cloud com- puting environments and the CloudSim toolkit: Challenges and opportunities,” in 7th High Performance Computing and Simulation Conference (HPCS 2009), Leipzig, Germany, June 2009. [27] D. G. Cameron, A. P. Millar, C. Nicholson, R. Carvajal-Schiaffino, F. Zini, and K. Stockinger, “OptorSim: a simulation tool for scheduling and replica optimisation in data grids,” in Com- puting in High Energy and Nuclear Physics (CHEP 2004), Interlaken, Switzerland, September- October 2004. [28] P. H. Carns, W. B. Ligon, R. B. Ross, and R. Thakur, “PVFS: A parallel file system for Linux clusters,” in ALS’00: Proceedings of the 4th annual Linux Showcase and Conference, 2000.

[29] H. Casanova, A. Legrand, and L. Marchal, “Scheduling Distributed Applications: the SimGrid Simulation Framework,” in Third IEEE International Symposium on Cluster Computing and the Grid (CCGrid’03), 2003.

162 [30] H. Casanova, A. Legrand, and M. Quinson, “SimGrid: a Generic Framework for Large-Scale Distributed Experiments,” in 10th IEEE International Conference on Computer Modeling and Simulation, March 2008. [31] CERN, “The quattor system administration toolsuite,” March 2009. [Online]. Available: http://www.quattor.org [32] J. S. Chase, D. E. Irwin, L. E. Grit, J. D. Moore, and S. E. Sprenkle, “Dynamic virtual clusters in a grid site manager,” in HPDC ’03: Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing, June 2003. [33] W. Cirne, D. Paranhos, L. Costa, E. Santos-Neto, F. Brasileiro, J. Sauve, F. Silva, C. O. Barros, and C. Silveira, “Running bag-of-tasks applications on computational grids: The MyGrid approach,” in Proceedings of the 32nd International Conference on Parallel Processing, 2003. [34] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, “Live migration of virtual machines,” in Proceedings of the 2nd ACM/USENIX Symposium on Networked Systems Design and Implementation, Boston, MA, May 2005, pp. 273–286. [35] Clemson University Cyberinfrastructure Research Group, “Stoker home page.” [Online]. Available: http://code.google.com/p/cu-stoker [36] L. B. Costa, L. Feitosa, E. Araújo, G. Mendes, R. Coelho, W. Cirne, and D. Fireman, “MyGrid – a complete solution for running bag-of-tasks applications,” in 22nd Brazilian Symposium on Computer Networks, Gramado, RS, Brazil, May 2004. [37] S. Dague, “System Installation Suite: Massive installation for Linux,” in Ottawa Linux Sym- posium, 2002. [38] R. Davoli, “VDE: Virtual Distributed Ethernet,” in First International Conference on Testbeds and Research Infrastructures for the Development of Networks and Communities (Tridentcom 2005), Trento, Italy, February 2005. [39] N. Desai, “Bcfg2: A pay as you go approach to configuration complexity,” in 2005 Australian Unix Users Group (AUUG2005), Sydney, Australia, October 2005. [40] N. Desai, R. Bradshaw, J. Hagedorn, and C. Lueninghoener, “Directing change using Bcfg2,” in Twentieth Large Install System Administration Conference (LISA XX), Washington, DC, December 2006. [41] N. Desai, R. Bradshaw, S. Matott, S. Bittner, S. Coghlan, R. Evard, C. Leunighhoener, T. Leggett, J. Navarro, G. Rackow, C. Stacey, and T. Stacey, “A case study in configuration management tool deployment,” in Nineteenth Large Install System Administration Conference (LISA XIX), San Diego, CA, December 2005. [42] N. Desai, A. Lusk, R. Bradshaw, and R. Evard, “Bcfg: A configuration management tool for heterogenous environments,” in 5th IEEE International Conference on Cluster Computing (CLUSTER ’03), 2003. [43] Duke Systems, “COD: Cluster-On-Demand.” [Online]. Available: http://nicl.cod.cs.duke.edu/ cod/index.html [44] ——, “Shirako home page.” [Online]. Available: http://nicl.cod.cs.duke.edu/cereus/shirako. html [45] C. Dumitrescu and I. Foster, “GangSim: A simulator for grid scheduling studies,” in IEEE Cluster Computing and Grid 2005 (CCGrid ’05), Cardiff, UK, May 2005.

[46] EGEE, “Grid observatory.” [Online]. Available: http://grid-observatory.org/ [47] W. Emeneker, “Dynamic virtual clustering,” Master’s thesis, Arizona State University, April 2007. [48] W. Emeneker, D. Jackson, J. Butikofer, and D. Stanzione, “Dynamic virtual clustering with Xen and Moab,” in Workshop on XEN in HPC Cluster and Grid Computing Environments (XHPC), 2006. [49] W. Emeneker and D. Stanzione, “Increasing reliability through dynamic virtual clustering,” in High Availability and Performance Computing Workshop (HAPCW 2006), 2006. [50] ——, “Dynamic virtual clustering,” in 2007 IEEE International Conference on Cluster Computing, 2007. [51] ——, “Efficient virtual machine caching in dynamic virtual clusters,” in Third Workshop on Scheduling and Resource Management for Parallel and Distributed Systems (SRMPDS 2007), 2007.

[52] Enabling Grids for E-sciencE, “gLite.” [Online]. Available: http://glite.web.cern.ch/glite/ [53] J. Enslow, P.H., “What is a ‘distributed’ data processing system?” Computer, vol. 11, no. 1, pp. 13–21, January 1978. [54] M. Fenn, M. A. Murphy, and S. Goasguen, “A study of a KVM-based cluster for grid comput- ing,” in 47th ACM Southeast Conference (ACMSE ’09), Clemson, SC, March 2009.

[55] M. Fenn, M. A. Murphy, J. Martin, and S. Goasguen, “An evaluation of KVM for use in cloud computing,” in 2nd International Conference on the Virtual Computing Initiative (ICVCI ’08), Durham, NC, May 2008. [56] R. Figueiredo, P. A. Dinda, and J. Fortes, “Resource virtualization renaissance,” Computer, vol. 38, no. 5, pp. 28–31, May 2005. [57] R. J. Figueiredo, P. A. Dinda, and J. A. B. Fortes, “A case for grid computing on virtual machines,” in 23rd International Conference on Distributed Computing Systems, 2003. [58] R. Flanery, A. Geist, B. Luethke, and S. L. Scott, “Cluster Command & Control (C3) tools suite,” in 3rd Austrian-Hungarian Workshop on Distributed and Parallel Systems (DAPSYS 2000), 2000. [59] J. A. B. Fortes, R. J. Figueiredo, and M. S. Lundstrom, “Virtual computing infrastructures for nanoelectronics simulation,” Proceedings of the IEEE, vol. 93, no. 10, pp. 1839–1847, October 2005.

[60] I. Foster, “Globus toolkit version 4: Software for service-oriented systems,” in IFIP Interna- tional Conference on Network and Parallel Computing, Springer-Verlag LNCS 3779, 2005, pp. 2–13. [61] I. Foster, T. Freeman, K. Keahey, D. Scheftner, B. Sotomayor, and X. Zhang, “Virtual clusters for grid communities,” in 6th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), Singapore, May 2006. [62] I. Foster, K. Keahey, C. Kesselman, E. Laure, M. Livny, S. Martin, M. Rynge, and G. Singh, “Embedding community-specific resource managers in general-purpose grid infrastructure,” Argonne National Laboratory, Tech. Rep. ANL/MCS-P1318-0106, January 2006.

164 [63] I. Foster, “The grid: A new infrastructure for 21st century science,” Physics Today, vol. 55, no. 2, pp. 42–47, February 2002. [64] I. Foster and C. Kesselman, “Globus: A metacomputing infrastructure toolkit,” International Journal of Supercomputing Applications, vol. 11, no. 2, pp. 115–128, 1997. [65] I. Foster, C. Kesselman, J. M. Nick, and S. Tuecke, “The physiology of the Grid,” June 2002. [Online]. Available: http://www.globus.org/alliance/publications/papers/ogsa.pdf [66] I. Foster, C. Kesselman, and S. Tuecke, “The anatomy of the Grid: Enabling scalable virtual organizations,” International Journal of Supercomputing Applications, vol. 15, no. 3, pp. 200– 222, 2001. [67] K. Fraser, S. Hand, R. Neugebauer, I. Pratt, A. Warfield, and M. Williamson, “Safe hardware access with the Xen virtual machine monitor,” in 1st Workshop on Operating System and Architectural Support for the On Demand IT Infrastructure, 2004. [68] T. Freeman and K. Keahey, “Flying low: Simple leases with Workspace Pilot,” in Euro-Par 2008, Las Palmas de Gran Canaria, Spain, August 2008. [69] T. Freeman, K. Keahey, I. Foster, A. Rana, B. Sotomayor, and F. Wuerthwein, “Division of labor: Tools for growth and scalability of grids,” in 4th International Conference on Service Oriented Computing (ICSOC 2006), Chicago, IL, December 2006. [70] K. Fujiwara and H. Casanova, “Speed and accuracy of network simulation in the SimGrid framework,” in Proceedings of the First International Workshop on Network Simulation Tools (NSTools), Nantes, France, October 2007. [71] F. Galan, D. Fernandez, J. Ruiz, O. Walid, and T. de Miguel, “Use of virtualization tools in computer network laboratories,” in FIfth International Conference on Information Technology Based Higher Education and Training (ITHET 2004), 2004. [72] A. Ganguly, A. Agrawal, P. O. Boykin, and R. Figueiredo, “IP over P2P: Enabling self- configuring virtual IP networks for grid computing,” in 20th International Parallel and Dis- tributed Processing Symposium (IPDPS 2006), 2006. [73] ——, “WOW: Self-organizing wide area overlay networks of virtual workstations,” in 15th IEEE International Symposium on High Performance Distributed Computing, 2006. [74] A. Ganguly, D. Wolinsky, P. O. Boykin, and R. Figueiredo, “Decentralized dynamic host configuration in Wide-area Overlays of virtual Workstations,” in IEEE International Parallel and Distributed Processing Symposium IPDPS 2007, 2007. [75] R. Geist, J. Martin, J. Westall, and V. Yalla, “Performance tuning of gigabit network inter- faces,” in 31st Annual International Conference of the Computer Measurement Group (CMG 2005), Orlando, FL, December 2005.

[76] M. Gillespie. (2008) Best practices for paravirtualization enhancements from intel R virtualiza- tion technology: Ept and vt-d. [Online]. Available: http://software.intel.com/en-us/articles/ best-practices-for-paravirtualization-enhancements-from-intel-virtualization-technology-ept-and-vt-d/ [77] Globus Alliance, “Nimbus.” [Online]. Available: http://workspace.globus.org/ [78] L. Grit, D. Irwin, A. Yumerefendi, and J. Chase, “Virtual machine hosting for networked clus- ters: Building the foundations for ‘autonomic’ orchestration,” in First International Workshop on Virtualization Technology in Distributed Computing (VTDC ’06), Tampa, FL, November 2006.

165 [79] E. Harney, S. Goasguen, J. Martin, M. Murphy, and M. Westall, “The efficacy of live virtual machine migrations over the Internet,” in Second International Workshop on Virtualization Technology in Distributed Computing (VTDC ’07), Reno, NV, November 2007. [80] C. Hoffa, G. Mehta, T. Freeman, E. Deelman, K. Keahey, B. Berriman, and J. Good, “On the use of cloud computing for scientific workflows,” in 3rd International Workshop on Scientific Workflows and Business Workflow Standards in e-Science (SWBES 2008), Indianapolis, IN, December 2008. [81] Intel Corporation. (2005, November) Intel Pentium 4 Processor 6xx Sequence and Intel Pentium 4 Processor Extreme Edition Datasheet. [Online]. Available: http: //www.intel.com/design/pentium4/datashts/306382.htm [82] D. Irwin, J. Chase, L. Grit, and A. Yumerefendi, “Self-recharging virtual currency,” in Third Workshop on Economics of Peer-to-Peer Systems (P2PEcon), Philadelphia, PA, August 2005. [83] D. Irwin, J. Chase, L. Grit, A. Yumerefendi, D. Becker, and K. Yocum, “Sharing network resources with brokered leases,” in USENIX Technical Conference, Boston, MA, June 2006. [84] X. Jiang and D. Xu, “VIOLIN: Virtual Internetworking on OverLay INfrastructure,” Purdue University, Tech. Rep. CSD TR 03-027, July 2003. [85] W. M. Jones, L. W. Pang, D. Stanzione, and W. B. Ligon, “Job communication characterization and its impact on meta-scheduling co-allocated jobs in a mini-grid,” in 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), Santa Fe, New Mexico, April 2004. [86] W. M. Jones, L. W. Pang, and D. Stanzione, “Computational mini-grid research at clemson uni- versity,” Clemson University, Clemson, SC, Documentation PARL-2002-009, November 2002. [87] M. Jung and R. Gogolok, “Bcfg2 - konfigurationsmanagement für heterogene umgebungen,” in German Unix User Group’s Frühjahrsfachgespräch 2007, March 2007. [88] M. J. Katz, P. M. Papadopoulos, and G. Bruno, “Leveraging standard core technologies to pro- grammatically build Linux cluster appliances,” in IEEE International Conference on Cluster Computing (Cluster 2002), April 2002. [89] K. Keahey, K. Doering, and I. Foster, “From sandbox to playground: Dynamic virtual en- vironments in the Grid,” in 5th International Workshop on Grid Computing (Grid 2004), , PA, November 2004. [90] K. Keahey, I. Foster, T. Freeman, and X. Zhang, “Virtual Workspaces: Achieving quality of service and quality of life in the Grid,” Scientific Programming, vol. 13, no. 4, pp. 265–275, 2005. [91] K. Keahey and T. Freeman, “Contextualization: Providing one-click virtual clusters,” in 4th IEEE International Conference on e-Science, Indianapolis, IN, December 2008. [92] K. Keahey, T. Freeman, J. Lauret, and D. Olson, “Virtual Workspaces for scientific applica- tions,” in Scientific Discovery through Advanced Computing, Boston, MA, June 2007. [93] K. Keahey, M. Ripeanu, and K. Doering, “Dynamic creation and management of runtime environments in the Grid,” in Workshop on Designing and Building Web Services, Global Grid Forum 9, Chicago, IL, October 2003. [94] K. Keahey, I. Foster, T. Freeman, X. Zhang, and D. Galron, “Virtual workspaces in the Grid,” in 11th International Euro-Par Conference, Lisbon, Portugal, September 2005.

[95] K. Keahey, J. Chase, and I. Foster, “Virtual playgrounds: managing virtual resources in the grid,” in 20th International Symposium Parallel and Distributed Processing (IPDPS ’06), 2006. [96] J. O. Kephart and D. M. Chess, “The vision of autonomic computing,” Computer, vol. 36, no. 1, pp. 41–50, 2003.

[97] O. Khalid, R. J. Anthony, P. Nilsson, K. Keahey, M. Schulz, K. Parrot, and M. Petridis, “Enabling and optimizing pilot jobs using Xen based virtual machines for the HPC grid appli- cations,” in Proceedings of the 3rd International Workshop on Virtualization Technologies in Distributed Computing (VTDC ’09), Barcelona, Spain, June 2009. [98] D. Klusacek, L. Matyska, and H. Rudova, “Alea – grid scheduling simulation environment,” in Seventh International Conference on Parallel Processing and Applied Mathematics, Gdansk, Poland, September 2007. [99] I. Krsul, A. Ganguly, J. Zhang, J. A. B. Fortes, and R. J. Figueiredo, “VMPlants: Providing and managing virtual machine execution environments for grid computing,” in ACM/IEEE Conference on Supercomputing, 2004.

[100] K. Kurowski, J. Nabrzyski, A. Oleksiak, and J. Weglarz, “Grid scheduling simulations with GSSIM,” in Third International Workshop on Scheduling and Resource Management for Par- allel and Distributed Systems (SRMPDS ’07), Hsinchu, Taiwan, December 2007. [101] J. Lee, K. Jeong, H. Lee, I. Lee, S. Lee, D. Park, C. Lee, and W. Yang, “RISE: A grid-based self-configuring and self-healing remote system image management environment,” in Second IEEE International Conference on e-Science and Grid Computing (e-Science ’06), December 2006. [102] A. Legrand, O. Beaumont, L. Marchal, and Y. Robert, “Master slave scheduling on hetero- geneous star-shaped platforms with limited memory,” in IEEE International Conference on Cluster Computing, September 2004.

[103] A. Legrand, L. Marchal, and H. Casanova, “Scheduling distributed applications: the SimGrid simulation framework,” in 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid ’03), Tokyo, Japan, May 2003. [104] Linux Virtualization , “TechOverview,” January 2007. [Online]. Available: http: //virt.kernelnewbies.org/TechOverview

[105] ——, “TechComparison,” April 2008. [Online]. Available: http://virt.kernelnewbies.org/ TechComparison [106] J. Liu, W. Huang, B. Abali, and D. K. Panda, “High performance VMM-bypass I/O in virtual machines,” in USENIX Annual Technical Conference, June 2006.

[107] P. Luszczek, D. Bailey, J. Dongarra, J. Kepner, R. Lucas, R. Rabenseifner, and D. Takahashi, “The HPC Challenge (HPCC) benchmark suite,” in Supercomputing ’06, 2006. [108] M. Matsuda, T. Kudoh, and Y. Ishikawa, “Evaluation of MPI implementations on grid- connected clusters using an emulated WAN environment,” in IEEE International Symposium on Cluster Computing and the Grid (CCGrid03), 2003.

[109] A. Matsunaga, M. Tsugawa, M. Zhao, L. Zhu, V. Sanjeepan, S. Adabala, R. Figueiredo, H. Lam, and J. A. Fortes, “On the use of virtualization and service technologies to enable grid-computing,” in 11th International Euro-Par Conference, August 2005.

[110] A. M. Matsunaga, M. O. Tsugawa, S. Adabala, R. J. Figueiredo, H. Lam, and J. A. B. Fortes, “Science gateways made easy: the In-VIGO approach,” Concurrency and Computation: Practice and Experience, vol. 19, no. 6, pp. 905–919, April 2007.
[111] C. Morey, “Use of Cfengine for deploying LCG/gLite middle-ware,” in International Conference on Computing in High Energy and Nuclear Physics, 2007.

[112] J. Mugler, T. Naughton, and S. L. Scott, “OSCAR meta-package system,” in 19th International Symposium on High Performance Computing Systems and Applications, May 2005.
[113] M. A. Murphy, L. Abraham, M. Fenn, and S. Goasguen, “Autonomic clouds on the grid,” Journal of Grid Computing, vol. 8, no. 1, pp. 1–18, March 2010.

[114] M. A. Murphy, M. Fenn, L. Abraham, J. A. Canter, B. T. Sterrett, and S. Goasguen, “Distributed management of virtual cluster infrastructures,” in Fifth International Workshop on System Management Techniques, Processes, and Services (SMTPS ’09), Rome, Italy, May 2009.
[115] M. A. Murphy, M. Fenn, and S. Goasguen, “Virtual Organization Clusters,” in 17th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2009), Weimar, Germany, February 2009.
[116] M. A. Murphy and H. K. Harton, “Evaluation of local networking in a Lustre-enabled virtualization cluster,” Clemson University, Tech. Rep. CU-CILAB-2007-1, 2007.

[117] M. A. Murphy, B. Kagey, M. Fenn, and S. Goasguen, “Dynamic provisioning of Virtual Organization Clusters,” in 9th IEEE International Symposium on Cluster Computing and the Grid (CCGrid ’09), Shanghai, China, May 2009.
[118] National Science Foundation, “Experimental program to stimulate competitive research (EPSCoR).” [Online]. Available: http://www.nsf.gov/div/index.jsp?org=EPSC

[119] ——, “Merit review broader impacts criterion: Representative activities.” [Online]. Available: http://www.nsf.gov/pubs/gpg/broaderimpacts.pdf
[120] H. Nishimura, N. Maruyama, and S. Matsuoka, “Virtual clusters on the fly - fast, scalable, and flexible installation,” in Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGRID 2007), May 2007.

[121] Open Science Grid, “Open Science Grid.” [Online]. Available: http://www.opensciencegrid.org
[122] P. M. Papadopoulos, M. J. Katz, and G. Bruno, “NPACI Rocks: Tools and techniques for easily deploying manageable Linux clusters,” in IEEE International Conference on Cluster Computing (Cluster 2001), October 2001.

[123] P. M. Papadopoulos, C. A. Papadopoulos, M. J. Katz, W. J. Link, and G. Bruno, “Configuring large high-performance clusters at lightspeed: A case study,” in Clusters and Computational Grids for Scientific Computing 2002, December 2002.
[124] G. J. Popek and R. P. Goldberg, “Formal requirements for virtualizable third generation architectures,” Communications of the ACM, vol. 17, no. 7, pp. 412–421, 1974.

[125] R. Pordes, D. Petravick, B. Kramer, D. Olson, M. Livny, A. Roy, P. Avery, K. Blackburn, T. Wenaus, F. Würthwein, I. Foster, R. Gardner, M. Wilde, A. Blatecky, J. McGee, and R. Quick, “The Open Science Grid: Status and architecture,” in International Conference on Computing in High Energy and Nuclear Physics (CHEP ’07), 2007.

[126] B. Quetier, V. Neri, and F. Cappello, “Selecting a virtualization system for Grid/P2P large scale emulation,” in Workshop on Experimental Grid Testbeds for the Assessment of Large-scale Distributed Applications and Tools (EXPGRID ’06), Paris, France, June 2006.
[127] L. Ramakrishnan, L. Grit, A. Iamnitchi, D. Irwin, A. Yumerefendi, and J. Chase, “Toward a doctrine of containment: Grid hosting with adaptive resource control,” in 19th Annual Supercomputing Conference (SC ’06), Tampa, FL, November 2006.
[128] A. S. Rana, F. Würthwein, K. Keahey, T. Freeman, A. Vaniachine, D. Malon, E. May, S. Martin, R. Gardner, B. Sotomayor, B. Holzman, R. Pordes, D. Skow, T. Wenaus, R. Popescu, J. Shank, E. Laure, I. Bird, M. Schulz, M. Litmaath, S. Campana, D. Smith, B. Blumenfield, K. De, M. Vranicar, P. Smith, and S. Wang, “An Edge Services Framework (ESF) for EGEE, LCG, and OSG,” in Computing in High Energy and Nuclear Physics (CHEP ’06), Mumbai, India, February 2006.
[129] Red Hat, “Kernel-based Virtual Machine.” [Online]. Available: http://www.linux-kvm.org
[130] J. S. Robin and C. E. Irvine, “Analysis of the Intel Pentium’s ability to support a secure virtual machine monitor,” in USENIX Security Symposium, 2000.

[131] Rocks Development Team, “Rocks clusters,” Website. [Online]. Available: http://www.rocksclusters.org
[132] M. Rosenblum and T. Garfinkel, “Virtual machine monitors: Current technology and future trends,” IEEE Computer, vol. 38, no. 5, pp. 39–47, May 2005.

[133] P. Ruth, X. Jiang, D. Xu, and S. Goasguen, “Virtual distributed environments in a shared infrastructure,” Computer, vol. 38, no. 5, pp. 63–69, 2005.
[134] P. Ruth, P. McGachey, and D. Xu, “VioCluster: Virtualization for dynamic computational domains,” in IEEE International Conference on Cluster Computing, Boston, MA, September 2005.

[135] P. Ruth, J. Rhee, D. Xu, R. Kennell, and S. Goasguen, “Autonomic live adaptation of virtual computational environments in a multi-domain infrastructure,” in IEEE International Conference on Autonomic Computing (ICAC ’06), June 2006.
[136] ——, “Enabling autonomic adaptation of virtual computational environments in a shared distributed infrastructure,” Purdue University, Tech. Rep. CSD TR 06-004, January 2006.
[137] F. D. Sacerdoti, S. Chandra, and K. Bhatia, “Grid systems deployment and management using ROCKS,” in IEEE International Conference on Cluster Computing, September 2004.
[138] San Diego Supercomputer Center. Inca: User level grid monitoring. [Online]. Available: http://inca.sdsc.edu/

[139] J. M. Schopf. (2001, July) Ten actions when superscheduling. Open Grid Forum Scheduling Working Group. [Online]. Available: http://www.ogf.org/documents/GFD.4.pdf
[140] J. Schönwälder, V. Marinov, and M. Burgess, “Integrating cfengine and scli: Managing network devices like host systems,” in Network Operations and Management Symposium (NOMS 2008), Salvador, Bahia, Brazil, April 2008.
[141] S. Seetharaman and K. Murthy, “Test optimization using software virtualization,” IEEE Software, vol. 23, no. 5, pp. 66–69, 2006.

[142] I. Sfiligoi, “Making science in the Grid world: using glideins to maximize scientific output,” in IEEE Nuclear Science Symposium Conference Record (NSS ’07), Honolulu, HI, 2007.
[143] I. Sfiligoi, G. Quinn, C. Green, and G. Thain, “Pilot job accounting and auditing in Open Science Grid,” in 9th IEEE/ACM International Conference on Grid Computing (Grid ’08), 2008.
[144] S. C. Simms, G. G. Pike, and D. Balog, “Wide area filesystem performance using Lustre on the TeraGrid,” in TeraGrid 2007 Conference, Madison, WI, June 2007.
[145] H. J. Song, X. Liu, D. Jakobsen, R. Bhagwan, X. Zhang, K. Taura, and A. Chien, “The MicroGrid: A scientific tool for modeling computational grids,” in IEEE Supercomputing (SC2000), Dallas, TX, November 2000.
[146] B. Sotomayor, K. Keahey, and I. Foster, “Overhead matters: A model for virtual resource management,” in 2nd International Workshop on Virtualization Technologies in Distributed Computing (VTDC ’06), Tampa, FL, November 2006.
[147] B. Sotomayor, K. Keahey, I. Foster, and T. Freeman, “Enabling cost-effective resource leases with virtual machines,” in HPDC 2007 Hot Topics, Monterey Bay, CA, June 2007.
[148] B. Sotomayor, K. Keahey, and I. Foster, “Combining batch execution and leasing using virtual machines,” in 17th International Symposium on High Performance Distributed Computing (HPDC 2008), 2008.
[149] L. Stout, M. A. Murphy, and S. Goasguen, “Kestrel: An XMPP-based framework for many task computing applications,” in 2nd Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS 2009), Portland, OR, November 2009.
[150] J. Sugerman, G. Venkitachalam, and B.-H. Lim, “Virtualizing I/O devices on VMware Workstation’s hosted virtual machine monitor,” in USENIX 2001, 2001.
[151] A. Sulistio, U. Cibej, S. Venugopal, B. Robic, and R. Buyya, “A toolkit for modelling and simulating data grids: An extension to GridSim,” Concurrency and Computation: Practice and Experience (CCPE), vol. 20, no. 13, pp. 1591–1609, September 2008.
[152] A. Sulistio, C. S. Yeo, and R. Buyya, “A taxonomy of computer-based simulations and its mapping to parallel and distributed systems simulation tools,” International Journal of Software: Practice and Experience, vol. 34, no. 7, pp. 653–673, June 2004.
[153] A. I. Sundararaj and P. A. Dinda, “Towards virtual networks for virtual machine grid computing,” in Third Virtual Machine Research and Technology Symposium, San Jose, CA, May 2004.
[154] A. Takefusa, S. Matsuoka, H. Nakada, K. Aida, and U. Nagashima, “Overview of a performance evaluation system for global computing scheduling algorithms,” in Eighth International Symposium on High Performance Distributed Computing, 1999.
[155] T. Tannenbaum, D. Wright, K. Miller, and M. Livny, “Condor – a distributed job scheduler,” in Beowulf Cluster Computing with Linux, T. Sterling, Ed. MIT Press, October 2001.
[156] Tentakel Development Team, “Tentakel homepage.” [Online]. Available: http://tentakel.biskalar.de/
[157] A. Tsaregorodtsev, V. Garonne, and I. Stokes-Rees, “DIRAC: A Scalable Lightweight Architecture for High Throughput Computing,” in Fifth IEEE/ACM International Workshop on Grid Computing (GRID ’04), Pittsburgh, PA, November 2004.

[158] M. Tsugawa and J. A. B. Fortes, “A virtual network (ViNe) architecture for grid computing,” in 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006.
[159] R. Uhlig, G. Neiger, D. Rodgers, A. L. Santoni, F. C. Martins, A. V. Anderson, S. M. Bennett, A. Kagi, F. H. Leung, and L. Smith, “Intel virtualization technology,” IEEE Computer, vol. 38, no. 5, pp. 48–56, May 2005.
[160] M. Varian, “VM and the VM community: Past, present, and future,” in SHARE 89, 1997. [Online]. Available: http://www.princeton.edu/~melinda/25paper.pdf
[161] S. J. Vaughan-Nichols, “New approach to virtualization is a lightweight,” Computer, vol. 39, no. 11, pp. 12–14, 2006.
[162] P. Velho and A. Legrand, “Accuracy study and improvement of network simulation in the SimGrid framework,” in 2nd International Conference on Simulation Tools and Techniques (SIMUTools ’09), Rome, Italy, March 2009.
[163] A. Weiss, “Computing in the clouds,” netWorker, vol. 11, no. 4, pp. 16–25, 2007.
[164] D. Wolinsky, A. Agrawal, P. O. Boykin, J. Davis, A. Ganguly, V. Paramygin, P. Sheng, and R. Figueiredo, “On the design of virtual machine sandboxes for distributed computing in Wide-area Overlays of virtual Workstations,” in First International Workshop on Virtualization Technology in Distributed Computing, 2006.
[165] H. Xia, H. Dail, H. Casanova, and A. Chien, “The MicroGrid: Using online simulation to predict application performance in diverse grid network environments,” in Challenges of Large Applications in Distributed Environments (CLADE ’04), Honolulu, HI, June 2004.
[166] D. Xu, P. Ruth, J. Rhee, R. Kennell, and S. Goasguen, “Autonomic adaptation of virtual distributed environments in a multi-domain infrastructure,” in 15th IEEE International Symposium on High Performance Distributed Computing (HPDC ’06), Paris, France, June 2006.
[167] J. Xu, S. Adabala, and J. A. B. Fortes, “Towards autonomic virtual applications in the In-VIGO system,” in 2nd IEEE International Conference on Autonomic Computing (ICAC ’05), Seattle, WA, June 2005.
[168] S. Yamasaki, N. Maruyama, and S. Matsuoka, “Model-based resource selection for efficient virtual cluster deployment,” in 3rd International Workshop on Virtualization Technology in Distributed Computing (VTDC ’07), Reno, NV, 2007.
[169] W. Yu, R. Noronha, S. Liang, and D. K. Panda, “Benefits of high speed interconnects to cluster file systems: A case study with Lustre,” in 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006.
[170] Zenoss, Inc. Zenoss: Commercial Open-Source Application, Systems, and Network Monitoring. [Online]. Available: http://www.zenoss.com/
[171] J. Zhang, M. Yousif, R. Carpenter, and R. J. Figueiredo, “Application resource demand phase analysis and prediction in support of dynamic resource provisioning,” in Fourth International Conference on Autonomic Computing (ICAC ’07), 2007.
[172] X. Zhang, K. Keahey, I. Foster, and T. Freeman, “Virtual cluster workspaces for grid applications,” Argonne National Laboratory, Tech. Rep. ANL/MCS-P1246-0405, April 2005.
[173] M. Zhao, V. Chadha, and R. J. Figueiredo, “Supporting application-tailored grid file system sessions with WSRF-based services,” in 14th IEEE International Symposium on High Performance Distributed Computing, July 2005.

[174] M. Zhao, J. Xu, and R. J. Figueiredo, “Towards autonomic grid data management with virtualized distributed file systems,” in IEEE International Conference on Autonomic Computing (ICAC ’06), June 2006.
[175] M. Zhao, J. Zhang, and R. Figueiredo, “Distributed file system support for virtual machines in grid computing,” in 13th IEEE International Symposium on High Performance Distributed Computing (HPDC 2004), 2004.
