<<

When Virtual is Harder than Real: Security Challenges in Virtual Machine Based Computing Environments

Tal Garfinkel Mendel Rosenblum {talg,mendel}@cs.stanford.edu Stanford University Department of Science

Abstract demands a significant re-examination of how security is implemented. As virtual machines become pervasive users will be able to create, modify and distribute new “machines” with unprece- In the next section we will elaborate on the capabil- dented ease. This flexibility provides tremendous benefits for ities that virtual machines provide, new usage mod- users. Unfortunately, it can also undermine many assumptions els they give rise to, and how this can adversely im- that today’s relatively static security architectures rely on about pact security in current systems. In section 3 we will the number of hosts in a system, their mobility, connectivity, patch explore how virtual environments can evolve to meet cycle, etc. these challenges. We review related work in section 4 We examine a variety of security problems virtual computing and offer conclusions in section 5. environments give rise to. We then discuss potential directions for changing security architectures to adapt to these demands. 2 Security Problems in Virtual Environ- ments 1 Introduction A virtual machine monitor (VMM) (e.g. VMware Virtual machines allow users to create, copy, save Workstation, , ), provides (checkpoint), read and modify, share, migrate and roll a layer of software between the (s) back the state of machines with all the ease and hardware of a machine to create the illusion of one of manipulating a file. This flexibility provides signifi- or more virtual machines (VMs) on a single physical cant value for users and administrators. Consequently, platform. A virtual machine entirely encapsulates the VMs are seeing rapid adoption in many computing en- state of the guest operating system running inside it. vironments. Encapsulated machine state can be copied and As virtual machine monitors provide the same in- shared over networks and removable media like a terface as existing hardware, users can take advan- normal file. It can also be instantiated on existing tage of these benefits with their current operating sys- networks and requires configuration and management tems, applications and management tools. This often like a physical machine. VM state can be modified leads to an organic process of adoption, where servers like a physical machine, by executing over time, or and desktops are gradually replaced with their virtual like a file, through direct modification. equivalents. Unfortunately, the ease of this transition is decep- Scaling Growth in physical machines is ultimately tive. As virtual platforms replace real hardware they limited by setup time and bounded by an organiza- can give rise to radically different and more dynamic tion’s capital equipment budget. In contrast, creating usage models than are found in traditional computing a new VM is as easy as copying a file. Users will fre- environments. quently have several or even dozens of special purpose This can undermine the security architecture of VMs lying around e.g. for testing or demonstration many organizations which often assume predictable purposes, “sandbox” VMs to try out new applications, and controlled change in number of hosts, host config- or for particular applications not provided by their reg- uration, host location, etc. Further, some of the useful ular OS (e.g. a Windows VM running Microsoft Of- mechanisms that virtual machines provide (e.g. roll- fice). Thus, the total number of VMs in an organi- back) can have unpredictable and harmful interactions zation can grow at an explosive rate, proportional to with existing security mechanisms. available storage. Virtual computing platforms cannot be deployed The rapid scaling in virtual environments can tax securely simply by dropping them into existing sys- the security systems of an organization. Rarely are tems. Realizing the full benefits of these platforms all administrative tasks completely automated. Up- grades, patch management, and configuration involve monotonically forward as software executes, configu- a combination of automated tools and individual ini- ration changes are made, software is installed, patches tiative from administrators. Consequently, the fast and are applied, etc. In a virtual environment machine unpredictable growth that can occur with VMs can ex- state is more akin to a tree: at any point the execution acerbate management tasks and significantly multiply can fork off into N different branches, where multiple the impact of catastrophic events, e.g. worm attacks instances of a VM can exist at any point in this tree at where all machines should be patched, scanned for a given time. vulnerabilities, and purged of malicious code. Branches are caused by undo-able disks and check- Transience In a traditional computing environment point features, that allow machines to be rolled back users have one or two machines that are online most to previous states in their execution (e.g. to fix con- of the time. Occasionally users have a special purpose figuration errors) or re-run from the same point many machine, or bring a mobile platform into the network, times, e.g. as a means of distributing dynamic content but this is not the common case. In contrast, collec- or circulating a “live” system image. tions of specialized VMs give rise to a phenomenon This execution model conflicts with assumptions in which large numbers of machines appear and dis- made by systems for patch management and main- appear from the network sporadically. tenance, that rely on monotonic forward progress. While conventional networks can rapidly “anneal” For example, rolling back a machine can re-expose into a known good configuration state, with many patched vulnerabilities, reactivate vulnerable services, transient machines getting the network to converge to re-enable previously disabled accounts or passwords, a “known state” can be nearly impossible. use previously retired encryption keys, and change For example, when worms hit conventional net- firewalls to expose vulnerabilities. It can also rein- works they will typically infect all vulnerable ma- troduce worms, viruses, and other malicious code that chines fairly quickly. Once this happens, administra- had previously been removed. tors can usually identify which machines are infected A subtler issue can break many existing security quite easily, then cleanup infected machines and patch protocols. Simply put, the problem is that while VMs them to prevent re-infection, rapidly bringing the net- may be rolled back, an attackers’ memory of what has work back into a steady state. already been seen cannot. In an unregulated virtual environment, such a For example, with a one-time password system like steady state is often never reached. Infected machines S/KEY, passwords are transmitted in the clear and se- appear briefly, infect other machines, and disappear curity is entirely reliant on the attacker not having before they can be detected, their owner identified, seen previous sessions. If a machine running S/KEY etc. Vulnerable machines appear briefly and either be- is rolled back, an attacker can simply replay previ- come infected or reappear in a vulnerable state at a ously sniffed passwords. later time. Also, new and potentially vulnerable vir- A more subtle problem arises in protocols that rely tual machines are created on an ongoing basis, due to on the “freshness” of their random number source copying, sharing, etc. e.g. for generating session keys or nonces. Consider As a result, worm infections tend to persist at a low a virtual machine that has been rolled back to a point level indefinitely, periodically flaring up again when after a random number has been chosen, but before it conditions are right. has been used, then resumes execution. In this case, The requirement that machines be online in con- randomness that must be “fresh” for security purposes ventional approaches to patch management, virus is reused. and vulnerability scanning, and machine configura- With a stream cipher, two different plaintexts could tion also creates a conflict between security and us- be encrypted under the same key stream, thus expos- ability. VMs that have been long dormant can require ing the XOR of the two messages. This could in turn significant time and effort to patch and maintain. This expose both messages if the messages have sufficient results in users either forgoing regular maintenance redundancy, as is common for English text. Non- of their VMs, thus increasing the number of vulner- cryptographic protocols that rely on freshness are also able machines at a site, or losing the ability to spon- at risk, e.g. reuse of TCP initial sequence numbers can taneously create and use machines, thus eliminating a allow TCP hijacking attacks [2]. major virtue of VMs. Zero Knowledge Proofs of Knowledge (ZKPK), by their very nature, are insecure if the same random Software Lifecycle Traditionally, a machine’s life- nonces are used multiple times. For example, ZKPK time can be envisioned as a straight line, where the authentication protocols, such as Fiat-Shamir authen- current state of the machine is a point that progresses tication [5] or Schnorr authentication [12], will leak Similar problems arise with worms and viruses. In- the user’s private key if the same nonce is used twice. fecting a VM is much like infecting a normal exe- Similarly, signature systems derived from ZKPK pro- cutable. Further, direct infection provides access to tocols, e.g. the Digital Signature Standard (DSS), will every part of a machines state irrespective of protec- leak the secret signing key if two signatures are gen- tion in the guest OS. erated using the same randomness [1]. Using VMs as a general-purpose solution for mo- Finally, cryptographic mechanisms that rely on pre- bility [10, 11] poses even more significant issues. Mi- vious execution history being thrown away are clearly grating a VM running on someone’s home machine of no longer effective, e.g. perfect forward secrecy in unknown configuration into a site’s security perimeter SSL. Such mechanisms are not only ineffective in vir- is a risky proposition at best. tual environments, but constitute a significant and un- From a theft standpoint, VMs are easy to copy to necessary overhead. a remote machine, or walk off with on a storage de- Diversity Many IT organizations tackle security vice. Similar issues of proprietary data loss due to problems by enforcing homogeneity: all machines laptop theft are consistently cited as one of the largest must run the most current patched software. VMs sources of financial loss due to computer crime [9]. can facilitate more efficient usage models which de- That VMs are such coarse grain units of mobil- rive benefit from running unpatched or older versions ity can also magnify the impact of theft. Facilitat- of software. This creates a range of problems as one ing easy movement of one’s entire computing envi- must try and maintain patches or other protection for ronment (e.g. on a USB keychain) makes users more a wide range of OSes, and deal with the risk posed by inclined to carry around all of their (potentially sensi- having many unpatched machines on the network. tive) files instead of simply the ones they need. For example, at many sites today users are simply Identity In traditional computing environments supplied with VMs running their new operating en- there is often an ad-hoc identity associated with a ma- vironment and applications are gradually migrated to chine. This can be as simple as a list of MAC ad- that environment, or conversely, legacy applications dresses, employee names, and office numbers. With- are run in a VM. This can mitigate the need for long out such mechanisms it can be extremely difficult to and painful upgrade cycles, but leads to a prolifera- establish who is responsible for a machine, e.g. who tion of OS versions. This makes patch management to contact if the machine turns malicious or who is more difficult, especially in the presence of older, dep- responsible for its origin/current state. recated versions of operating systems. Unfortunately, these static methods are impractical Virtual machines have also changed the way that for VMs. The dynamic creation of VMs makes the use software testing takes place. Previously one required of MAC addresses infeasible. Often VMs just pick a a large number of usually dedicated test machines to random MAC address (e.g. in VMware Workstation), test out a new piece of software, one for each different in the hope of avoiding collisions. OS, OS version (service pack), patch level, etc. Now Identifying machines by location/Ethernet port each developer or tester can simply have their own number is also problematic since a VM’s mobility collection of virtual test machines. Unfortunately, if makes it difficult to establish who owns a VM running these machines are not secured they rapidly become a on a particular physical host. Further, there are often cesspool of infected machines. multiple VMs on a physical host, thus shutting off the Mobility VMs provide mobility similar to a normal port to a machine can end up disabling non-malicious file; they can easily be copied over a network or car- VMs as well. ried on portable storage media. This can give rise to Establishing responsibility is further complicated host of security problems. as VMs have more complicated “ownership histories” For a normal platform, the trusted computing base than normal machines. A specialized virtual machine (TCB) consists of the hardware and software stack. In may be passed around from one user to the next, much a VM world, the TCB consists of all of the hosts that like a popular shell script. This can make it very diffi- a VM has run on. Combined with a lack of history, cult to establish just who made what changes to get a this can make it very difficult to figure out how far machine into its present state. a compromise has extended, e.g. if a file server has Data Lifetime A fundamental principle for build- been compromised, any VM that was on the server ing secure systems is minimizing the amount of time may have been backdoored by an attacker. Determin- that sensitive data remains in a system [6]. A VMM ing which VMs were exposed, subsequently copied, can undermine this process. For example, the VMM etc. can be quite challenging. must log execution state to implement rollback. This can undermine attempts by the guest to destroy sen- formed while VMs are offline, thus aiding issues of sitive data (e.g. cryptographic keys, medical docu- usability, scale and transience. ments) since data is never really “dead,” i.e. data can We will briefly outline what such a layer would always be made available again within the VM. look like and how it can address the challenges raised Outside the VM, logging can leak sensitive data in the prior section. to persistent storage, as can VM paging, checkpoint- Outlining a Layer The heart of a ing, and migration, etc. This breaks guest OS mech- virtualization layer is a high assurance virtual ma- anisms to prevent sensitive data from reaching disk, chine monitor. On top of it would run a secure dis- e.g. encrypted swap, pinning sensitive memory, and tributed storage system, and components replacing se- encrypted file systems. curity and management functions traditionally done in As a result sensitive files, encryption keys, pass- the guest OS. words, etc. can be left on the platform hosting a VM Enforcing policies such as limiting VM mobility indefinitely. Because of VMs’ increased mobility, and connectivity requires that the virtualization layer such data could easily be spread across several hosts. on a particular machine be trusted by the infrastruc- Similar Problems in Traditional Computing Envi- ture. Virtualization layer integrity could be veri- ronments Some existing platforms exhibit security fied either through normal authentication and access problems similar to those found in virtual environ- controls, or through dedicated attestation hardware ments. Laptops are known for making it difficult to e.g. TCPA. maintain a meaningful network perimeter by trans- Policy at this layer could limit replication of sen- porting worms into internal networks, and sensitive sitive VMs and control movement of VMs in and out data (e.g. ) out, thus making the firewall of a managed infrastructure. Document control style irrelevant. Undo features like Windows Restore intro- policies could prevent certain VMs from being placed duce many of the same difficulties as rollback in VMs. onto removable media, limit which physical hosts a Transience occurs with dual boot machines, and other VM could reside on, and limit access to VMs contain- occasionally used platforms. ing sensitive data to within a certain time frame. These examples can lend insight into the impact of User and machine identities at this layer could be VMs. However, they differ in a variety of ways. Most used to reintroduce a notion of ownership, responsi- of these technologies are deployed in limited parts bility and machine history. Tracking information such of IT organizations or see infrequent use; as virtual- as the number of machines in an organization and their ization is adopted, these dynamic behaviors become usage patterns could also help to gauge the impact of the common case. Similar characteristics manifest by potential threats. other platforms (e.g. mobility, transience) tend to be Encryption at this layer could help address data more extreme in VMs as VMs are software state. Fi- lifetime issues due to VM swapping, checkpointing, nally, VMs tend to magnify problems with the rapid rollback, etc. growth and novel uses they facilitate. VMM Assurance A VMM’s central role is provid- Notably, adapting virtual computing environments ing secure isolation. The need to preserve this prop- to meet these challenges also provides a solution for erty is sometimes seen as an argument against moving mobile platforms. functionality out of the guest operating system. How- ever, such arguments overlook the inherent flexibility 3 Towards Secure Virtual Environments available in a VMM. In essence, a virtual machine The dynamic usage models facilitated by virtual monitor is nothing more than a microkernel with a platforms demand a dedicated infrastructure for en- hardware compatibility layer. As such, it can support forcing security policies. We can provide this by in- arbitrary protection models for services running at the troducing a ubiquitous virtualization layer, and mov- virtualization layer. ing many of the security and managment functions of For example, firewall functionality running outside guest operating systems into this layer. of a guest OS would be hosted in its own protection Ubiquity allows administrators to flexibly re- domain (e.g. a paravirtualized VM), and could utilize introduce the constraints that virtualization relaxes on a special purpose operating system affording better as- mobility and data lifetime. Moving security and man- surance, greater efficiency, and a more suitable protec- agement functions (e.g. firewalling, virus scanning, tion model than common OSes. backup) from the guest OS to the virtualization layer Other requirements for building a high assurance allows delegation to a central administrator. It also VMM (e.g. device driver isolation) have been ex- permits management tasks to be automated and per- plored elsewhere [7]. 3.1 Benefits trust components running at network end-points, it Moving security and management functions out of can now delegate responsibility to these end-points, the guest OS provides a variety of benefits including: thus making policies such as trustworthy network quarantine (i.e. limiting network access based on VM contents) feasible. • Delegating Management • Lifecycle Independence Moving security relevant A virtual environment provides maximum utility state out of the guest OS solves many difficulties when users can focus on using their VMs however caused by rollback. they please, without having to worry about manag- This can be accomplished by moving secu- ing them. rity mechanisms out of the guest completely, into Moving security functionality out of guest OSes e.g. an external login mechanism, or by modify- makes it easier to delegate management responsi- ing guests to store state such as user account in- bilities to automated services and site administra- formation, virus signatures, firewall rules, etc. in tors. It also obviates the need for homogeneous dedicated storage that would operate independent systems where every machine runs a common man- of rollback. A combination of both approaches is agement suite (e.g. LANDesk), or where an admin- likely necessary. istrator must have an account on every machine. For protocol related issues, making guest soft- As administrators can externally modify VMs, ware lifecycle independant is likely the easiest path tasks not moved outside of the VM can still be del- forward, and seems possible without major changes egated while VMs are offline. Much of the required to today’s systems. scanning, patching, configuration, etc. can be done As a first step, lifecycle dependent by a service running on the virtualization layer could be replaced with lifecycle independent vari- that would periodically scan and maintain archived ants, e.g. ZKPK based signature schemes (such as VMs. DSS) can be replaced with lifecycle independent In a virtualization layer, VMs are first-class ob- signatures schemes such as RSA. jects, instead of merely a collection of bits (as in Guest software must have also some way of be- today’s file systems). Thus, operations that today ing notified when a VM has been restarted, so that it require reconfiguration could be provided transpar- can refresh any keys it is currently holding, perhaps ently, e.g. users should be able to copy VMs just a variation on existing approaches for notifying ap- as they would a normal file, without having to plications when a laptop has awakened from hiber- bring them online and reconfigure. The infrastruc- nation. Finally, randomness (e.g. data from ’s ture could appropriately update hostname, crypto- /dev/random) should be obtained directly from graphic keys, etc. to reflect the new machine iden- the VMM instead of relying on state/events within tity. the VM. Suspended VMs could be executed in a “sand- • Securely Supporting Diversity boxed” environment to allow certain configuration A virtual infrastructure should allow users to changes to anneal and ensure that they do not break use old unpatched VMs with diverse OSes much the guest. as they would be able to use old or non-standard • Guest OS Independence files without having to change them. This avoids Moving security and managment components problems such as patches breaking VMs and being to the virtualization layer makes them indepen- unable to secure deprecated versions of software dent of the structure of the guest operating system. where patches are no longer available. Thus, these components can provide greater assur- Enforcing policy from outside of VMs facilitates ance, as they can largely specify their own software this through the use of vulnerability specific pro- stack and protection model and are isolated from tection as an alternative to software modification. the guest OS. In contrast, today’s host-based fire- For example, vulnerability specific firewall rules, walls, intrusion detection and anti-virus software such as Shields [13], can allow users to run un- are tightly coupled with the fragile monolithic op- patched versions of applications and operating sys- erating systems they try to protect, making them tems while still accessing as much network func- trivial to bypass. tionality as is safely possible. This flexibility opens the door for the adoption Finally, today greater diversity requires sup- of more secure and flexible operating systems as a porting N different versions of security software foundation for infrastructure services. Further, be- (e.g. firewall, intrusion detection). While special- cause the infrastructure can now authenticate and ized policy is still required for scanning particu- Acknowledgments lar OSes, putting management at the virtualization This work benefited significantly from discussions layer eliminates this redundant infrastructure. with Ben Pfaff and Jim Chow. We are very greatful for generous feedback given to us by Dan Boneh, Jeff Mogul, Armando Fox, Emre Kiciman, Peter Chen, There are many challenges to building an architec- and Martin Casado. This material is based upon work ture that securely allows the full potential of VMs to supported in part by the National Science Foundation be realized. However, we believe the direction for- under Grant No. 0121481. ward is clear. Moving security relevant functional- References ity out of guest operating system to a ubiquitous vir- tualization layer provides a more secure and flexible [1] M. Bellare, S. Goldwasser, and D. Micciancio. ”Pseudo-random” number generation within cryptographic algorithms: The DDS case. model for managing and using VMs. In CRYPTO, pages 277–291, 1997. [2] S. M. Bellovin. Security problems in the TCP/IP protocol suite. 4 Related Work SIGCOMM Comput. Commun. Rev., 19(2):32–48, 1989. [3] P. M. Chen and B. D. Noble. When virtual is better than real. In A body of existing work has already examined the (HOTOS-VIII), Schloss Elmau, Germany, May 2001. security benefits of moving intrusion detection [8], [4] G. W. Dunlap, S. T. King, S. Cinar, M. A. Basrai, and P. M. Chen. and logging [3, 4] out of the guest e.g. to leverage the Revirt: Enabling intrusion analysis through virtual-machine logging isolation provided by a VMM and ability to interpose and replay. In OSDI, 2002. on all system events. Other work has also examined [5] U. Fiege, A. Fiat, and A. Shamir. Zero knowledge proofs of identity. the benefits of trust and flexible assurance provided by In STOC ’87: Proceedings of the nineteenth annual ACM confer- ence on Theory of computing, pages 210–217, New York, NY, USA, placing components such as the firewall outside of the 1987. ACM Press. VM [7]. [6] T. Garfinkel, B. Pfaff, J. Chow, and M. Rosenblum. Data lifetime is a A variety of recent projects have examined how systems problem. In Proc. 11th ACM SIGOPS European Workshop, virtualization can enhance manageability [11], mo- september 2004. bility [10] and security [3, 4, 7, 8]. Unfortunately, [7] T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh. Terra: A virtual machine-based platform for trusted computing. In this work has only considered single hosts or assumed Proceedings of the 19th Symposium on Operating System Princi- an entirely new organizational paradigm (e.g. utility ples(SOSP 2003), October 2003. computing), overlooking how virtual machine tech- [8] T. Garfinkel and M. Rosenblum. A virtual machine introspection nology impacts security in current organizations. based architecture for intrusion detection. In Proc. Network and Some of the problems presented here are beginning Distributed Systems Security Symposium, February 2003. to be addressed by VMware ACE, such as controlling [9] L. Gordon, M. L. W. Lucyshyn, and R. Richardson. CSI/FBI computer crime and security survey. http://www.gocsi.com, VM copying, preventing the spread of VM contents 2004. (encrypted virtual disks and suspend files) and some [10] M. Kozuch and M. Satyanarayanan. Internet suspend/resume. In support for network quarantine. Forth IEEE Workshop on Mobile Computing Systems and Applica- In the absence of other published studies on VMs tions, pages 40–, 2002. within organizations, we have relied on our own ex- [11] . Sapuntzakis and M. S. Lam. Virtual appliances in the Collective: perience and anecdotes from others while formulating A road to hassle-free computing. In (HOTOS-XI), May 2003. this work. [12] C.-P. Schnorr. Efficient signature generation by smart cards. J. Cryp- tology, 4(3):161–174, 1991. [13] H. J. Wang, C. Guo, D. R. Simon, and A. Zugenmaier. Shield: 5 Conclusions Vulnerability-driven network filters for preventing known vulnera- We expect end-to-end virtualization to become a bility exploits. In Proc. of ACM SIGCOMM, August 2004. normal part of future computing environments. Un- fortunately, simply providing a virtualization layer is not enough. The flexibility that makes virtual ma- chines such a useful technology can also undermine security within organizations and individual hosts. Current research on virtual machines has focused largely on the implementation of virtualization and its applications. We believe that further attention is due to the security risks that accompany this technology and the development of infrastructure to meet these challenges.