Stork: Package Management for Distributed VM Environments
Justin Cappos, Scott Baker, Jeremy Plichta, Duy Nyugen, Jason Hardies, Matt Borgard, Jeffry Johnston, John H. Hartman
Department of Computer Science, University of Arizona, Tucson, AZ 85721
[email protected]

Abstract

In virtual machine environments each application is often run in its own virtual machine (VM), isolating it from other applications running on the same physical machine. Contention for memory, disk space, and network bandwidth among virtual machines, coupled with an inability to share due to the isolation virtual machines provide, leads to heavy resource utilization. Additionally, VMs increase management overhead as each is essentially a separate system.

Stork is a package management tool for virtual machine environments that is designed to alleviate these problems. Stork securely and efficiently downloads packages to physical machines and shares packages between VMs. Disk space and memory requirements are reduced because shared files, such as libraries and binaries, only require one persistent copy per physical machine. Experiments show that Stork reduces the disk space required to install additional copies of a package by over an order of magnitude, and memory by about 50%. Stork downloads each package once per physical machine no matter how many VMs install it. The transfer protocols used during download improve elapsed time by 7X and reduce repository traffic by an order of magnitude. Stork users can manage groups of VMs with the ease of managing a single machine – even groups that consist of machines distributed around the world. Stork is a real service that has run on PlanetLab for over 4 years and has managed thousands of VMs.

1 Introduction

The growing popularity of virtual machine (VM) environments such as Xen [3], VMWare [20], and Vservers [11, 12], has placed new demands on package management systems (e.g. apt [2], yum [25], RPM [19]). Traditionally, package management systems deal with installing and maintaining software on a single machine whether virtual or physical. There are no provisions for inter-VM sharing, so that multiple VMs on the same physical machine individually download and maintain separate copies of the same package. There are also no provisions for inter-machine package management, centralized administration of which packages should be installed on which machines, or allowing multiple machines to download the same package efficiently. Finally, current package management systems have relatively inflexible security mechanisms that are either based on implicit trust of the repository, or public/private key signatures on individual packages.

Stork is a package management system designed for distributed VM environments. Stork has several advantages over existing package management systems: it provides secure and efficient inter-VM package sharing on the same physical machine; it provides centralized package management that allows users to determine which packages should be installed on which VMs without configuring each VM individually; it allows multiple physical machines to download the same package efficiently; it ensures that package updates are propagated to the VMs in a timely fashion; and it provides a flexible security mechanism that allows users to specify which packages they trust as well as delegate that decision on a per-package basis to other (trusted) users.

Stork's inter-VM sharing facility is important for reducing resource consumption caused by package management in VM environments. VMs are excellent for isolation, but this very isolation can increase the disk, memory, and network bandwidth requirements of package management. It is very inefficient to have each VM install its own copy of each package's files. The same is true of memory: if each VM has its own copy of a package's files then it will have its own copy of the executable files in memory. Memory is often more of a limiting factor than disk, so Stork's ability to share package files between VMs is particularly important for increasing the number of VMs a single physical machine can support. In addition, Stork reduces network traffic by only downloading a package to a physical machine once, even if multiple VMs on the physical machine install it.

Stork's inter-machine package management facility enables centralized package management and efficient, reliable, and timely package downloads. Stork provides package management utilities and configuration files that allow the user to specify which packages are to be installed on which VMs. Machines download packages using efficient transfer mechanisms such as BitTorrent [6] and CoBlitz [16], making downloads efficient and reducing the load on the repository. Stork uses fail-over mechanisms to improve the reliability of downloads, even if the underlying content distribution systems fail. Stork also makes use of publish/subscribe technology to ensure that VMs are notified of package updates in a timely fashion.

Stork provides all of these performance benefits without compromising security; in fact, Stork has additional security benefits over existing package management systems. First, Stork shares files securely between VMs. Although a VM can delete its link to a file, it cannot modify the file itself. Second, a user can securely specify which packages he or she trusts and may delegate this decision for a subset of packages to another user. Users may also trust other users to know which packages not to install, such as those with security holes. Each VM makes package installation decisions based on a user's trust assumptions and will not install packages that are not trusted. While this paper touches on the security aspects of the system that are necessary to understand the design, a more rigorous and detailed analysis of security is deferred to future work.
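As a concrete, if simplified, illustration of this link-based sharing, the sketch below shows one way a single read-only master copy of a package file could be exposed inside several VMs on a Linux host. It is not Stork's implementation: the helper name, the paths, and the use of plain hard links with read-only permissions are assumptions made only for this example.

    import os
    import stat

    def link_shared_file(master_copy, vm_root, rel_path):
        """Hypothetical helper: expose one read-only master copy of a
        package file inside a VM by hard-linking it into the VM's
        filesystem tree on the host."""
        # Make the master copy read-only so its contents cannot be changed
        # through any VM's link. (A real system would want a stronger
        # guarantee, such as a filesystem immutable flag.)
        os.chmod(master_copy, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

        target = os.path.join(vm_root, rel_path)
        parent = os.path.dirname(target)
        if parent:
            os.makedirs(parent, exist_ok=True)

        # A hard link shares one inode across VMs: one persistent copy on
        # disk and, for executables and libraries, one copy of the pages
        # in memory. A VM can unlink its own name for the file, but doing
        # so does not affect the shared contents seen by other VMs.
        if not os.path.exists(target):
            os.link(master_copy, target)

Stork isolates platform-specific details like these behind pluggable modules, as described next.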
In addition, Stork is flexible and modular, allowing the same Stork code base to run on a desktop PC, a Vserver-based virtual environment, and a PlanetLab node. This is achieved via pluggable modules that isolate the platform-specific functionality; Stork accesses these modules through a well-defined API. This approach makes it easy to port Stork to different environments and allows the flexibility of different implementations for common operations such as file retrieval.

Stork has managed many thousands of VMs and has been deployed on PlanetLab [17, 18] for over 4 years. Stork is currently running on hundreds of PlanetLab nodes and its package repository receives a request roughly every ten seconds. Packages installed in multiple VMs by Stork typically use over an order of magnitude less disk space and about 50% of the memory of packages installed by other tools. Stork also reduces the repository load by over an order of magnitude compared to HTTP-based tools. Stork is also used in the Vserver [12] environment and can be used in non-VM environments (such as on a home system) as an efficient and secure package installation system. The source code for Stork is available at http://www.cs.arizona.edu/stork

2 Stork

Stork provides both manual management of packages on individual VMs using command-line tools and centralized management of groups of VMs. This section describes an example involving package management, the configuration files needed to manage VMs with Stork, and the primary components of Stork.

2.1 An Example

Consider a system administrator who manages thousands of machines at several sites around the globe. The company's servers run VM software that allows different production groups more flexible use of the hardware resources. In addition, the company's employees have desktop machines that have different software installed depending on their use.

The system administrator has just finished testing a new security release for a fictional package foobar, and she decides to have all of the desktop machines used for development update to the latest version, along with any testing VMs that are used by the coding group. The administrator modifies a few files on her local machine, signs them using her private key, and uploads them to a repository. Within minutes all of the desired machines that are online have the updated foobar package installed. As offline machines come online or new VMs are created, they automatically update their copies of foobar as instructed.

The subsequent sections describe the mechanisms Stork uses to provide this functionality to its users. Section 5 revisits this example and explains in detail how Stork provides the functionality described in this scenario.

2.2 File Types

Stork uses several types of files that contain different information and are protected in different ways (Table 2.2). The user creates a public/private key pair that authenticates the user to the VMs he or she controls. The public key is distributed to all of the VMs and the private key is used to sign the configuration files. In our previous example, the administrator's public key is distributed to all of the VMs under her control; when files signed by her private key are added to the repository, each VM independently verifies their authenticity using the public key.
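The following sketch captures the essence of this sign-and-verify step. It uses a generic cryptography library (Python's cryptography package) rather than Stork's own code, and the function names and the choice of RSA signatures over SHA-256 are assumptions made only for illustration: the administrator signs a file with her private key before uploading it, and each VM checks the signature against her public key before trusting the file's contents.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    def sign_file(path, private_key_pem):
        """Administrator side: produce a detached signature for a file."""
        key = serialization.load_pem_private_key(private_key_pem, password=None)
        with open(path, "rb") as f:
            data = f.read()
        return key.sign(data, padding.PKCS1v15(), hashes.SHA256())

    def verify_file(path, signature, public_key_pem):
        """VM side: accept the file only if the signature verifies."""
        key = serialization.load_pem_public_key(public_key_pem)
        with open(path, "rb") as f:
            data = f.read()
        try:
            key.verify(signature, data, padding.PKCS1v15(), hashes.SHA256())
            return True
        except InvalidSignature:
            return False

Because each VM performs this check itself, the trustworthiness of the files does not rest on the repository that stores them.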
The master configuration file is similar to those found in other package management tools and indicates settings such as the transfer method, repository name, and user name. It also indicates the location of the public key that should be used to verify signatures.

The user's trusted packages file (TP file) indicates which packages the user considers valid. The TP file does not cause those packages to be installed, but instead specifies which packages the user's VMs are permitted to install when asked to do so.

Table 2.2: The types of files used by Stork (the table is truncated in this excerpt; only its column headings, File Type, Repository, Client, Central Mgmt, Signed, and Embedded, survive).
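As a rough illustration of the two file types just described, the fragment below expresses their contents as Python literals. The field names, values, and structure are invented for this sketch (the excerpt does not show Stork's concrete file formats); only the kinds of information, such as the transfer method, repository name, user name, public key location, and per-package trust decisions with delegation, come from the description above.

    # Illustrative only: the kinds of settings a master configuration file carries.
    master_config = {
        "repository": "stork-repository.example.org",          # hypothetical repository name
        "username": "admin",                                    # hypothetical user name
        "transfermethod": ["coblitz", "bittorrent", "http"],    # tried in order (fail-over)
        "publickey": "/usr/local/stork/keys/admin.publickey",   # key used to verify signatures
    }

    # Illustrative only: the kinds of statements a trusted packages (TP) file makes.
    trusted_packages = [
        # Trust a specific package, identified by name and a hash of its contents.
        {"action": "allow", "package": "foobar-1.2-3.i386.rpm",
         "hash": "sha1:<hash of the package contents>"},
        # Explicitly distrust a package (for example, one with a known security hole).
        {"action": "deny", "package": "brokenlib-*"},
        # Delegate trust decisions for a subset of packages to another user.
        {"action": "delegate", "user": "coworker",
         "publickey": "/keys/coworker.publickey", "packages": "lib*"},
    ]

A TP file like this would itself be signed with the user's private key, so each VM can verify it in the same way as the other configuration files.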