VCE: A New Personated Virtual Cluster Engine for Cluster Computing

Mohsen Sharifi, Masoud Hassani, Ehsan Mousavi Khaneghah, Seyedeh Leili Mirtaheri
Computer Engineering Department, Iran University of Science and Technology, Tehran, Iran
[email protected], [email protected], [email protected], [email protected]

Abstract— Virtualization addresses the problem of making more efficient use of available computer resources. Existing virtual clusters either virtualize a single physical cluster into multiple independent virtual clusters, providing a virtual server as a highly scalable and highly available server built on a cluster of real servers, or share physical clusters to offer economies of scale and more effective use of resources through multiplexing. However, programs running on a cluster demonstrate different types of requirements as their executions proceed, such as support for intensive processing, security, and massive data communication; it is therefore quite unrealistic to assume that a statically configured cluster with a predetermined number of nodes with specific features and support can serve such programs well. This paper presents a different usage for virtualization in the context of distributed computing using virtual clusters, called the Virtual Cluster Engine (VCE), which provides a computing environment that can be both statically and dynamically (re)organized according to the needs and requirements of programs, so that they can achieve the best possible performance within the constraints of available resources. The feasibility of the proposed architecture for VCE has been studied on an experimental platform using seven real machines, VMWare ESX, VMotion, the VMWare programming kit, and a number of virtual machines. On average, a 20% improvement on response times under VCE was experienced.

Keywords: Virtualization, Virtual Cluster, Cluster Computation, Distributed Computing, Operating System

I. INTRODUCTION

Generally speaking, there have been three ways to improve distributed computing performance:
- Work harder, by using faster hardware, e.g. reducing the time per instruction.
- Work smarter, by using optimized algorithms and techniques.
- Get help from multiple computers to solve problems faster.

On the other hand, there are typically three types of owners, who use their workstations mostly for:
(1) Sending and receiving email and preparing documents.
(2) Software development: editing, compiling, debugging, and testing.
(3) Running compute-intensive applications.

Cluster computing tries to steal spare cycles from (1) and (2) to provide resources for (3). However, this requires overcoming the ownership hurdle: people are very protective of their workstations, and an organizational mandate that computers are to be used in this way is usually required. Stealing cycles outside standard work hours (e.g. overnight) is easy, but stealing idle cycles during work hours without impacting interactive use (of both CPU and memory) is much harder.

There are more motivations for using cluster computing in addition to the above:
- The communication bandwidth between workstations is increasing as new networking technologies and protocols are implemented in LANs and WANs.
- Workstation clusters are easier to integrate into existing networks than special parallel computers.
- Surveys show that the utilization of CPU cycles of desktop workstations is typically below 10%.
- The performance of workstations and PCs is rapidly improving.
- As performance grows, percent utilization will decrease even further.
- Organizations are reluctant to buy large supercomputers, due to the large expense and short useful life span.
- The development tools for workstations are more mature than the contrasting proprietary solutions for parallel computers, mainly due to the non-standard nature of many parallel systems.
- Workstation clusters are a cheap and readily available alternative to specialized High Performance Computing (HPC) platforms.
- The use of clusters of workstations as a distributed compute resource is a very cost-effective path of incremental system growth.

Some of the key operational benefits of clustering include:
- System High Availability (HA): inherent high system availability due to the redundancy of hardware, operating systems, and applications.
- Hardware Fault Tolerance: redundancy for most system components (e.g. disk RAID), in both hardware and software.
- Operating System (OS) and Application Reliability: running multiple copies of the OS and applications, and gaining reliability through this redundancy.
- Scalability: adding servers to the cluster, adding more clusters to the network as the need arises, or adding CPUs to an SMP.
- High Performance: running cluster-enabled programs.

There are also many applications for cluster computing, including:
- Numerous scientific and engineering applications.
- Parametric simulations.
- Business applications, like e-commerce applications (Amazon.com, eBay.com), database applications (Oracle on a cluster), and decision support systems.
- Internet applications, like Web serving/searching, infowares (yahoo.com, AOL.com), ASPs (Application Service Providers), eMail, eChat, ePhone, eBook, eCommerce, eBank, eSociety, eAnything, and computing portals.
- Mission-critical applications, like command-and-control systems, banks, nuclear reactor control, star-war, and handling life-threatening situations.

Given this introductory background, what does a cluster really mean? A cluster is a type of parallel or distributed processing system that consists of a collection of interconnected stand-alone or complete computers cooperatively working together as a single, integrated computing resource. The general focus nowadays is mostly (as in this paper) on the MIMD model, using general-purpose processors or multi-computers.

Clusters have been subjected to virtualization in various ways, either to virtualize a single physical cluster into multiple independent virtual clusters [1], or to share physical clusters to offer economies of scale and more effective use of resources by multiplexing [2]. In this paper we do not focus on any particular cluster virtualization technology, nor do we present a new one; rather, we put forward a different usage for virtualization in the context of distributed computing using virtual clusters. We introduce a completely different perspective of cluster virtualization.

The rest of the paper is organized as follows. A number of virtual clusters are comparatively introduced in Section 2 to provide a context for our proposition. Section 3 presents the architecture of a new personated Virtual Cluster Engine for cluster computing, called VCE. Section 4 studies the feasibility of VCE through experimentation. Section 5 concludes the paper.

II. RELATED WORK

Virtualization addresses the problem of making more efficient use of available computer resources. This is done by providing an abstraction layer which maps real resources to virtual resources. Virtualization solutions have existed for more than forty years. For example, the IBM VM/370 project from the early sixties used virtualization to expose a virtual System/370 machine to the user. There is a wealth of virtualization technologies, like QEMU [3], Bochs [4], OpenVZ [5], coLinux [6], Xen [7], and many more. In this paper we are not going to focus on any particular technology or present any new virtualization technology, but rather intend to put forward a different usage for virtualization in the context of distributed computing using virtual clusters.

The commonplace technologies on virtual clusters, like COD [1], VPC [8], ZONE.NET [9], LVS [2], NLB [10,11], SV [12], VS [13], GVC [14], and many more, try either to virtualize a single physical cluster into multiple independent virtual clusters, providing a virtual server as a highly scalable and highly available server built on a cluster of real servers, or to share physical clusters to offer economies of scale and more effective use of resources by multiplexing.

VCE presents a new perspective and usage for cluster virtualization based on virtual cluster technology. Virtual clusters were originally used for educational purposes in cases where physical resources were not sufficient to run (new) systems or applications. A recurrent example is to deploy a virtualizer, like Xen or VMWare [15], on a single computer to create a number of virtual machines interconnected by a virtual switch, allowing a variety of experimentations on the single computer as though they run on a real network. Nowadays, virtual clusters are mostly used for real computation. Advances in multi-processor and multi-core technology, the increased maintenance costs of real machines, the scalability and flexibility of virtual machines, and the capability of dynamically transferring machine state from one machine to another have attracted industry and academia to use virtual machines extensively in the construction of virtual clusters in support of real computations.

Dinda [16] presents a case for grid computing based on the use of virtual machines. This was the first successful technique that deployed a bank of virtual machines in support of varieties of tasks. It was, however, restricted in that assignments of virtual machines to tasks were allowed only once, and only upon the start of computation. VCE furthers this capability by allowing run-time cluster reorganization (i.e., reassignment of virtual machines to tasks) even after the start of computation.

Foster [17,18] extends the definition of a virtual workspace [19] to encompass the notion of a cluster. Her team introduces Virtual Cluster Workspaces that treat virtual machines as one of the potential ways of providing virtual resources in the Grid, and focuses on using virtual machines to represent virtual clusters. This technique also falls short of supporting cluster reorganization at run-time.

Chase [1] presents Cluster-on-Demand (COD), a system to enable rapid, automated, on-the-fly partitioning of a physical cluster into multiple independent virtual clusters. Its difference with VCE is in run-time reorganization support.

Huang [20] improves Dinda's approach [16] and presents a case for high performance computing with virtual machines by introducing a framework which addresses the performance and management overheads associated with VM-based computing. Two key ideas in their design are Virtual Machine Monitor (VMM) bypass I/O and scalable VM image management. VMM-bypass I/O achieves high communication performance for VMs by exploiting the OS-bypass feature of modern high speed interconnects such as InfiniBand. Scalable VM image management significantly reduces the overhead of distributing and managing VMs in large scale clusters. Although Huang's approach is more favorable than Dinda's [16], it does not support the dynamic management of clusters.

Vogels [21] uses CLI virtual machine implementations for the high performance computing community. It does not support reorganization at start or run-time. There are also some other cluster solutions under Windows, such as MPICH2 [22] and Compute Cluster Server 2003 [23], that do not use virtual machines.

Table 1 compares the aforementioned virtual clusters. VCE is unique in two features, namely support for Dynamic Cluster (Re)Organization and Continued Execution in the Absence of Remote Machines. It is indeed these very two features that have enabled us to present a new personated virtual cluster engine for cluster computing.

III. VCE ARCHITECTURE

To begin with, we have a central Virtual Cluster Engine (VCE) running on a real cluster that can be physically scaled up on demand, either by sharing with other clusters in its accessible network or by putting more nodes into service. VCE is responsible for creating a virtual cluster for any particular required concern (such as computation, communication, security, naming, data management, etc.) and then mapping each node in each of these virtual clusters to a real machine. A single real machine may well host as many virtual nodes as it wishes. Real machines may be heterogeneous and in any numbers. To give an example, one virtual cluster can be made responsible for handling all user/client requests, one can be made responsible for certifications, and another can be made to perform only compute-intensive sequential coarse-grain tasks. In effect, we can envisage a central core containing the VCE surrounded by N real clusters, each of which serves one virtual cluster made by the VCE.

Table 1: A feature-wise comparison of virtual clusters.

| Cluster technology | Dynamic Task Assignment to Clusters | Full Migration | Dynamic Cluster (Re)Organization | Continued Execution in the Absence of Remote Machines |
| Industrial clusters, like Microsoft Cluster Server and MPICH2 [21,22,23] | Not Supported | Not Supported | Not Supported | Not Supported |
| Foster (Grid-like) Virtual Cluster [17,18] | Not Supported | Supported | Not Supported | Not Supported |
| Enhanced virtual clusters, like [16,20] | Supported | Not Supported | Not Supported | Not Supported |
| Virtual Cluster | Supported | Not Supported | Not Supported | Not Supported |
| VCE | Supported | Supported | Supported | Supported |

As shown in the architecture of VCE in Figure 1, VCE consists of a number of components: Task Manager, Resource Manager, Local High Speed Network Manager, Superintendent, Monitor, Scheduler, Cluster Executor, Organization Manager, and Remote Machines Manager. The figure also shows two example virtual clusters, VC1 and VC2, running Task1 and Task2, respectively. VC1 actually runs on 2 real machines of the VCE cluster under the auspices of the local high speed network manager, each offering 2 virtual machines to VC1. These 4 virtually interconnected virtual machines build the VC1 virtual cluster. Each of these 4 virtual machines can be supplied either from the pool of virtual machines in the VCE cluster, or from virtual machines on remote real machines interconnected to the VCE cluster, say via the Internet. This is one of the distinguishing features of VCE: it does not constrain itself to its own resources, but allows for the dynamic deployment of other available remote resources in addition to local ones. The VC2 cluster acts very much the same as VC1, but runs a different task.

Arriving tasks are delivered to the task manager, which interfaces with outsiders and submits the tasks to the superintendent. The resource manager keeps up-to-date information about the virtual machines and clusters, and provides this information to other components. The monitor is responsible for monitoring and displaying cluster operations. Tasks are scheduled by the scheduler. The cluster executor is responsible for executing the tasks and (re)organizing the virtual machines based on the runtime requirements of tasks and machines. The local high speed network manager creates sub-nets when required, by programming network switches accordingly. The remote machines manager manages the remote real machines, initiates the installation of any required virtual machines on them, and delivers these virtual machines to those needing them.

As mentioned before, VCE is based on the idea of building a virtual cluster out of virtual machines running in a central high speed network. Each virtual machine is assigned to a real machine, and has an associated virtual machine on another real machine. The latter can be a real machine in the VCE network, or any other machine in the world, say on the Internet, that is accessible and can be shared. Every pair of associated virtual machines shares a common state that is managed by VCE, and at any moment in time only one of them is active, based on the cluster organization. That is to say, the activity of one virtual machine may stop at any moment and its associated virtual machine be activated to continue the tasks already assigned to the stopped machine. This feature of VCE allows for dynamic (re)organization or (re)configuration of the clusters as the need arises.

But the crucial question is: what might these needs be? The answer is that programs running on a cluster demonstrate different types of requirements as their executions proceed. Sometimes they encounter heavy computations and require as much processing power as possible in order to cut down their execution times, while at other times they may get involved in massive data communication between program components, or with other programs running on the same cluster, and require high speed network support that may not necessarily be available on machines and clusters with high processing power. They may also require security support, like encryption and authentication services, only during specific parts of their executions and not throughout. It is therefore quite unrealistic to assume that a statically configured cluster with a predetermined number of nodes with specific features and support can serve such programs well.

The main objective and cornerstone of VCE is to provide a computational environment that can be both statically and dynamically (re)configured according to the needs and requirements of programs, so that they can achieve the best possible performance within the constraints of available resources. To achieve this goal, we have opted to provide primitives that enable programmers to partition their programs into different parts with specific requirements and supports. Furthermore, programs are allowed to publish their requests for resources and services, statically and dynamically, in a given format, such as putting them in pre-allocated files or directories. The requests for resources and services may include programmers' expectations and estimations of the type of required operating systems, the maximum number of required nodes, the maximum computational power, and the maximum communication bandwidths. Requests for services can be general, detailed, specific, functional, or non-functional. The big challenge remains, though, how a system may automatically derive such requirements without forcing programmers and users to the old-fashioned extreme of specifying the requirements themselves. For simplicity, we assume here that such information is somehow derived and that the cluster partitioning is done manually.

We envisage that users of VCE code their programs in a cluster programming language of their choice, e.g. MPI [24], and call VCE primitives to their advantage to partition the executions of their programs as they see fit to their objectives, like scalability, security, efficiency, and the like. In addition, programmers and users specify their requirements for resources and services either upon the start of programs or during their execution. Each execution of such a program with its specified requirements constitutes a Task, and is submitted to the task manager of VCE by the user.

Having investigated the feasibility of executing the submitted task, the task manager coordinates with the scheduler and superintendent of VCE and passes the task to the resource manager. Knowing the available real and virtual resources, the resource manager then designs a suitable virtual cluster for the execution of the task, and passes this design to the executor under the auspices of the scheduler and superintendent. It is the responsibility of the executor to use an available cluster and possibly add any required number of virtual nodes to it, or to coordinate with the local network manager and the superintendent to select a number of real machines in the local network and load the required virtual machines on them, as specified in the cluster design passed to it by the resource manager. The local network manager virtualizes the switching of these virtual machines in a sub-net. The executor then builds a virtual cluster out of the virtual machines in this sub-net, wherein the virtual machines communicate via the virtual switch created for them by the local network manager. The executor afterwards passes control to the superintendent, which in turn passes it to the organization manager. The organization manager can initiate the installation of an appropriate number of virtual machines, as and when required, on shared remote real machines for each virtual cluster; each virtual machine on a remote real machine corresponds to, and is associated with, a virtual machine in the original virtual cluster. Upon installation of these virtual machines by the remote machines manager, control gets back to the organization manager via the superintendent. The organization manager may switch between any node in the virtual cluster and its associated remote virtual node, depending on the task execution status and its requirements; one node stops executing and its state is transferred to the other node, which continues the execution path.

VCE can support many varieties of tasks simply by adding more virtual machines and virtual clusters to the bank of virtual machines and virtual clusters. Given the availability of virtual machines and virtual clusters, users need not always be present and logged in. Since the states of virtual machines can be transferred to other virtual machines, the overall system performance and availability can be expected to be quite high, especially in cases of failures such as communication failures.

IV. FEASIBILITY STUDY

The feasibility of VCE has been studied by developing an experimental platform using seven real machines, VMWare ESX [25], VMotion [26] virtual machines, and the VMWare programming kit [27].

Three real dual-core machines were interconnected with a programmable switch. VCE was implemented in C using the VMWare programming API [27], and installed on one of the real machines. VMWare ESX was installed on the other two real machines. Four virtual machines were defined on each of these two real machines. Each virtual machine had both Linux and Microsoft Windows installed. The eight virtual machines were interconnected via virtual switches, making up a virtual cluster. Security settings on each virtual machine were configured through the programming API. Utilities required for the execution of MPICH2 under both Linux and Microsoft Windows were installed too.

Four real machines equipped with VMWare ESX were also assigned. To begin with, two or three virtual machines were installed on each real machine, each associated with a virtual machine in the above virtual cluster. These virtual machines were also connected to the above virtual cluster. The four real remote machines were not dedicated to our experimentation and were used by other users to run their programs without intervention of VCE.

A number of scenarios were run on the experimental platform:
- Tasks with varying numbers of required resources.
- Tasks requiring three types of MPICH2 cluster under Microsoft Windows.
- Random disconnection of remote machines from the local network.
- Tasks with partitioned execution parts in their programs: security, massive computation, and high communication parts.

Although all scenarios ran successfully, the results of running the last scenario were noteworthy and indicative of whether higher overall performance can be achieved when programs are partitioned and their executions are managed dynamically by VCE. Numerous runs of this scenario showed a 20% improvement on response times on average, compared with runs of non-partitioned equivalent programs. Another interesting result was that higher rates of improvement were achievable as the number of nodes in the network increased. It has to be pointed out again that the four real remote machines were not dedicated to our experimentation and could be used by other users running their programs without intervention of VCE.

Based on our experimental evidence, we claim that VCE can cost-effectively run partitioned programs and achieve high performance, even with 500 nodes. More evidence supporting our claim requires more realistic platforms using light-weight virtual machines, like Xen, high-speed virtual networks, fast state transfer between virtual machines, and the deployment of a considerable number of nodes. We are currently working to provide such a platform.

V. CONCLUSION

Given that programs running on a cluster of computers demonstrate different types of requirements as their executions proceed, we presented a different usage for virtualization in the context of distributed computing using virtual clusters, namely VCE, which provides a computing environment that can be both statically and dynamically (re)organized according to the needs and requirements of programs, so that they can achieve the best possible performance within the constraints of available resources. The architecture of VCE was briefly presented and its feasibility studied on an experimental platform. It was shown that programs that partition their executions and request the necessary resources in support of each partition can benefit from VCE by at least a 20% improvement in performance, in addition to improvements in their availability and scalability. Based on our experimental evidence, we claimed that VCE can cost-effectively run partitioned programs and achieve high performance, even with 500 nodes. More evidence supporting this claim requires more realistic platforms using light-weight virtual machines, like Xen, high-speed virtual networks, fast state transfer between virtual machines, and the deployment of a considerable number of nodes. We are currently working to provide such a platform and to use it to run a distributed Web crawler on the Internet.

REFERENCES

[1] J. Chase, L. Grit, D. Irwin, J. Moore, and S. Sprenkle, "Dynamic virtual clusters in a grid site manager", The 12th International Symposium on High Performance Distributed Computing (HPDC-12), 2003.
[2] W. Zhang, "Linux Virtual Server Clusters: Build Highly-Scalable and Highly-Available Network Services at Low Cost", Linux Magazine, November 2003.
[3] F. Bellard, "QEMU: A Fast and Portable Dynamic Translator", FREENIX Track, USENIX Annual Technical Conference, 2005.
[4] T. Butler, "The Bochs Project", http://bochs.sourceforge.net, Last Accessed 2007.
[5] SWsoft Virtualization Technology, "OpenVZ", http://www.openvz.org, Last Accessed 2007.
[6] D. Aloni, "Cooperative Linux", Linux Symposium, Canada, July 2004.
[7] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the Art of Virtualization", The 19th ACM Symposium on Operating Systems Principles, October 2003.
[8] Cluster Resources Inc., "Virtual Private Cluster", http://www.clusterresources.com/products/mwm/docs/21.1vpc.shtml, Last Accessed 2007.
[9] ZONE.NET, "ZONE.NET Virtual Cluster", http://zone.net/virtual_clusters, Last Accessed 2006.
[10] Microsoft TechNet, "Understanding Virtual Clusters", Microsoft Windows Server Technical Center, January 2005.
[11] B. Monkman, "The Benefits of Virtual Clusters", HPCwire, 2006.
[12] M. Stansberry, "Virtual Clusters: IT Management's Magic Pill?", SearchDataCenter.com, 22 Feb 2005.
[13] B. Taylor, "Virtual Clusters", RailsConf 2007, posted by N. Sieger, May 19, 2007.
[14] Grid Today, "United Devices Announces Virtual Cluster Solution", http://www.gridtoday.com/grid/737958.html, July 2006.
[15] VMware Inc., "VMware Virtual Platform Technology", White Paper, February 1999.
[16] R. Figueiredo, P. Dinda, and J. Fortes, "A Case for Grid Computing on Virtual Machines", The 23rd International Conference on Distributed Computing Systems, May 2003.
[17] I. Foster, T. Freeman, K. Keahey, D. Scheftner, B. Sotomayor, and X. Zhang, "Virtual Clusters for Grid Communities", CCGRID 2006.
[18] X. Zhang, K. Keahey, I. Foster, and T. Freeman, "Virtual Cluster Workspaces for Grid Applications", ANL/MCS-P1246-0405, 2005.
[19] K. Keahey, I. Foster, T. Freeman, and X. Zhang, "Virtual Workspaces: Achieving Quality of Service and Quality of Life in the Grid", Scientific Programming Journal, 2005.
[20] W. Huang, J. Liu, B. Abali, and D. K. Panda, "A Case for High Performance Computing with Virtual Machines", The 20th ACM International Conference on Supercomputing (ICS '06), Cairns, Queensland, Australia, June 2006.
[21] W. Vogels, "HPC.NET: Are CLI-Based Virtual Machines Suitable for High Performance Computing?", Supercomputing, 2003.
[22] Argonne National Laboratory, "MPICH2", http://www.mcs.anl.gov/research/projects/mpich2/, Last Accessed 2007.
[23] Microsoft Corporation, "Windows Compute Cluster Server 2003", http://www.microsoft.com/windowsserver2003/ccs/, Last Accessed 2007.
[24] W. Yu, J. Wu, and D. K. Panda, "Fast and Scalable Startup of MPI Programs in InfiniBand Clusters", HiPC'04, Bangalore, India, 2004.
[25] C. Waldspurger, "Memory Resource Management in VMware ESX Server", The Fifth Symposium on Operating Systems Design and Implementation, 2002.
[26] VMware Inc., "VMware VMotion", http://www.vmware.com/products/vi/vc/vmotion.html, Last Accessed 2007.
[27] VMware Inc., "VMware Programming API", http://www.vmware.com/support/developer/prog-api/, Last Accessed 2007.