Replica Synchronization for Virtual Machines

Ashwin Sancheti, Johns Hopkins University, [email protected]
Prof. Randal Burns, Johns Hopkins University, [email protected]

Abstract

Replica synchronization for virtual machines is an important technique in distributed systems for reducing bandwidth usage. We propose a scalable data replication protocol that synchronizes virtual machines across multiple geographically distributed replica locations. The technique can be applied to a broad range of virtual machines, such as VMware Workstation, Microsoft Virtual Server, and many others. The protocol is designed to be bandwidth efficient, scalable, and content based, and it does not require prior knowledge of the virtual machines. We also aim to reduce the time required to synchronize the virtual machines. To achieve these properties we create a hierarchical inverted binary hash tree. We use VMDK files (VMware virtual hard disks) for our experimental setup.

1. Introduction

Virtualization refers to the abstraction of physical resources. Microsoft Virtual Server, VMware Workstation, and VMware Server Console are good examples of virtual machine software. Server consolidation, testing and development, dynamic load balancing, disaster recovery, and resource sharing are some of the advantages of virtualization.

In this paper we describe a redundancy elimination protocol for replica synchronization of virtual machines. Our motivation for this approach arose from TAPER: Tiered Approach for Eliminating Redundancy in Replica Synchronization. In that system, large amounts of data are synchronized across multiple geographically distributed replica locations. A very large number of applications share the same requirement: they must replicate and synchronize a large collection of data across multiple sites, possibly over low-bandwidth links. Virtual machine synchronization is one important application of this kind. Examples include software distribution for virtual machines on mirror sites (i.e., applying or removing patches), synchronizing personal virtual machines with remote virtual machines, and versioning systems for virtual machines. Virtual machine identification is one of the important aspects of this paper.

Unfortunately, there is currently no approach available that performs virtual machine synchronization. We propose an approach that achieves replica synchronization for virtual machines. Our approach does not require any prior knowledge of the remote virtual machines. An important aspect of our approach is identifying similar kinds of virtual machines. To achieve bandwidth efficiency and minimize synchronization time, we propose the following steps:

• Inverted binary hash tree generation at the source site
• Fast matching at the target
• Finding the common data blocks between the source and target virtual machines
• Transferring only the differing data blocks to the target

Our main contributions in this paper are: i) identifying similar virtual machines within a set of different virtual machines, ii) creating a list of hierarchical inverted binary hash trees at the source site, and iii) based on hash comparison, sending only the differing data blocks to the target site. For experimental purposes we use VMDK files (VMware hard disks). To generate hashes for data blocks we can use the MD5 or SHA hashing algorithms.

The rest of the paper is organized as follows. Section 2 presents the technique for identifying virtual machines. The VMDK architecture and address space are described in detail in Section 3. Section 4 covers the creation of the inverted binary hash tree. The complete algorithm is explained in Section 5. Section 6 covers the experimental setup, and Section 7 gives the results and performance. Finally, Section 8 covers future work, and we conclude in Section 9.

memsize = "384"
scsi:0.present = "TRUE"
scsi:0.fileName = "Windows 2000 Advanced Server.vmdk"
ide1:0.present = "TRUE"
ide1:0.fileName = "auto detect"
ide1:0.deviceType = "cdrom-raw"
floppy0.present = "FALSE"
Ethernet0.present = "TRUE"
displayName = "Windows 2000 Advanced Server"
guestOS = "Win2000advserv"
priority.grabbed = "normal"

Figure 1. VMX Configuration File

2. Virtual Machine Identification

Virtual machine identification is an important step in our approach. We must be able to identify matching virtual machines and update only those. For VMDK files, we use the .VMX configuration file to identify similar virtual machines. The guestOS field of the configuration file indicates which operating system is installed on the virtual machine. Figure 1 shows a VMware configuration file, from which we can see that the Windows 2000 Advanced Server operating system is installed on the virtual machine. The type of operating system can also be confirmed by looking at the VMDK header. Other kinds of virtual machines have different configuration files or different sets of headers. In this way we can solve the problem of identifying similar virtual machines within a set of different virtual machines in a distributed environment.

3. VMDK Architecture and Design

The VMDK file stores the virtual hard disk of a VMware virtual machine. A VMDK file consists of several kinds of structures: the SPARSE header, the descriptor header, grain directories, grain tables, and data blocks.

3.1. SPARSE Header

The SPARSE header is 512 bytes long. The following are some of the fields of the SPARSE header:

1. MAGIC NUMBER = 'VMDK'
2. VERSION = '1'
3. CAPACITY = '8388608'
4. GRAIN SIZE = '128'

The MAGIC NUMBER field identifies a valid VMDK image. The CAPACITY field gives the size of the virtual disk in sectors. The GRAIN SIZE field gives the size of a data block, which is 128 sectors, i.e., 64 KB.

3.2. Descriptor Header

One of the fields of the SPARSE header gives the total size of the descriptor header. The descriptor header contains the unique ID of the VMDK file and the name of the VMDK file. It may also contain a parent CID field, which marks the disk as a snapshot disk. The descriptor header also contains some of the fields present in the configuration file.

3.3. Grain Directories and Grain Tables

The total number of grain directory entries depends on the size of the VMDK file. Each grain directory entry points to the start of a grain table. Each grain table has 512 entries, and each grain table entry points to an actual data block. A 2 GB SPARSE VMDK file contains 64 grain directory entries, since each grain table covers 512 × 64 KB = 32 MB.

3.4. Grain Sector

As explained earlier, each grain table entry points to one grain sector. Each grain sector is 128 sectors long, i.e., 64 KB. Figure 2 shows the VMDK address space in detail. Figure 3 shows how grain directories, grain tables, and grain sectors are connected. Our approach deals mainly with the grain sectors, i.e., with the data blocks present in the VMDK file. We divide each grain sector into chunks of 1 KB. The next section discusses this in detail.

Figure 2. VMDK Address Space

Figure 3. VMDK Architecture and Design

4. Inverted Binary Hash Tree

We use an inverted binary hash tree to compare the data blocks of two virtual machines. We divide each grain sector block into 1 KB chunks and generate a hash for each chunk, using the MD5 or SHA algorithm. Once the hashes for a whole grain sector have been generated, we create the inverted binary hash tree for that grain sector. We create such a tree for every grain sector, which yields a list of inverted binary hash trees. This list helps identify the similar data blocks of two virtual machines. Figure 4 illustrates how the inverted binary hash tree is created for each grain sector block.

When synchronizing inverted binary hash trees between source and target virtual machines, this approach efficiently handles all the common updates to the virtual machines. These might include: 1) adding to VMDK files, 2) deleting from VMDK files, and 3) modifying VMDK files. Figure 5 shows the list of inverted binary hash trees for a whole virtual machine that needs to be transferred or compared with another virtual machine. The algorithm for comparing virtual machines is explained in Section 5.

Figure 4. Inverted Binary Hash Tree for Grain Sector

Figure 5. Inverted Binary Hash Tree Chain

5. Algorithm

Our algorithm is a hierarchical hash tree protocol between source and target virtual machines that aims to minimize the transmission of any common data that already exists at the target virtual machine. The algorithm does not assume any knowledge of the state or version of the data at the target virtual machine.

In general, for any hash-based synchronization protocol, the smaller the matching granularity, the better the match and the lower the number of bytes transferred. For this reason we divide each grain sector into 1 KB chunks. The grain sector size of 128 sectors is fixed by VMware and cannot be changed; if that field is changed, the VMDK file will not load in the virtual machine. The steps of our algorithm are as follows.

1. Similar virtual machines can be identified using the configuration file, as explained in Section 2. We first identify the similar virtual machines that are to be synchronized.

2. We start with the source virtual machine. We set our file pointer to the start of the grain directory. We will get the

Microsoft Virtual Server, flat VMDK files.
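
As an illustration of the identification step in Section 2, the following minimal Python sketch compares the guestOS field of two .VMX configurations. The function names parse_vmx and same_guest_os, the SAMPLE_VMX constant, and the handling of blank lines and '#' comments are assumptions made for this sketch; they are not part of our implementation or of any VMware API.

```python
def parse_vmx(text: str) -> dict[str, str]:
    """Parse `key = "value"` lines from .vmx text into a dictionary."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skipping blank lines and '#' comments is an assumption of this sketch.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Keys are lowercased so that guestOS and guestos compare equal.
        fields[key.strip().lower()] = value.strip().strip('"')
    return fields


def same_guest_os(vmx_a: str, vmx_b: str) -> bool:
    """Return True when both configurations declare the same guestOS value."""
    os_a = parse_vmx(vmx_a).get("guestos")
    os_b = parse_vmx(vmx_b).get("guestos")
    return os_a is not None and os_a == os_b


# A few of the fields shown in Figure 1 (hypothetical sample input).
SAMPLE_VMX = '''
memsize = "384"
displayName = "Windows 2000 Advanced Server"
guestOS = "Win2000advserv"
'''
```

Two virtual machines would be treated as candidates for synchronization only when same_guest_os returns True for their configuration files. A missing guestOS field is treated as a mismatch.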

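The hash tree of Section 4 and the comparison it enables can be sketched in Python as follows. Since Figure 4 is not reproduced here, the tree layout is an assumption: each parent is the MD5 hash of its two children's hashes, in the manner of a Merkle tree, and a node without a sibling is hashed on its own. The names build_tree and differing_chunks are hypothetical.

```python
import hashlib

CHUNK_SIZE = 1024  # 1 KB chunks, as in Section 4


def build_tree(grain: bytes) -> list[list[bytes]]:
    """Return the tree as a list of levels: chunk hashes first, root last."""
    if not grain:
        raise ValueError("grain sector must not be empty")
    level = [hashlib.md5(grain[i:i + CHUNK_SIZE]).digest()
             for i in range(0, len(grain), CHUNK_SIZE)]
    levels = [level]
    while len(level) > 1:
        # Assumption: pair adjacent hashes; a node without a sibling
        # is hashed on its own.
        level = [hashlib.md5(b"".join(level[i:i + 2])).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_chunks(src: list[list[bytes]],
                     dst: list[list[bytes]]) -> list[int]:
    """Descend from the root and return indices of chunks that differ."""
    if len(src[0]) != len(dst[0]):
        raise ValueError("trees must cover the same number of chunks")
    pending = [0]  # node indices at the current level, starting at the root
    for depth in range(len(src) - 1, 0, -1):
        children = []
        for idx in pending:
            # Only subtrees whose hashes differ need to be explored.
            if src[depth][idx] != dst[depth][idx]:
                for child in (2 * idx, 2 * idx + 1):
                    if child < len(src[depth - 1]):
                        children.append(child)
        pending = children
    return [i for i in pending if src[0][i] != dst[0][i]]
```

For a 64 KB grain sector this gives 64 leaf hashes and seven levels. When the root hashes match, the comparison stops immediately and no chunk needs to be sent; otherwise only the chunks whose indices are returned would be transferred to the target.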