Virtual Machines

• Virtual machine technology, often just called virtualization, makes one computer behave as several computers by sharing the resources of a single computer between multiple virtual machines.
• Each of the virtual machines is able to run a complete operating system.
• This is an old idea, originally used in the IBM VM/370 system, released in 1972.
• The VM/370 system ran on big mainframe computers that could easily support multiple virtual machines.
• As expensive mainframes were replaced by cheap small computers, barely capable of running one application at a time, the interest in virtualization vanished.
• For many years the trend was to replace big expensive computers with many small inexpensive computers.
• Today a single microprocessor has at least two cores and is more powerful than the mainframes of yesterday, so most programs need only a fraction of the CPU cycles available in a single processor.
• Together with the discovery that small machines are expensive to administer if you have too many of them, this has created a renewed interest in virtualization.

Uses for Virtual Machines

There are several uses for virtual machines:
• Running several operating systems simultaneously on the same desktop.
→ May need to run both Unix and Windows programs.
→ Useful in program development for testing programs on different platforms.
→ Some legacy applications may not work with the standard system libraries.
• Running several virtual servers on the same hardware server.
→ One big server running many virtual machines is less expensive than many small servers.
→ With virtual machines it is possible to hand out complete administration rights of a specific virtual machine to a customer.

Types of Virtual Machines

• Type 1 hypervisor (virtual machine monitor or VMM)
→ The type 1 hypervisor runs on the physical hardware and is the real operating system.
→ Normal unmodified operating systems, like Linux or Windows, run on top of the hypervisor.
• Type 2 hypervisor
→ A normal unmodified host operating system like Linux or Windows runs on the physical hardware.
→ A type 2 hypervisor like VMware Workstation runs on the host operating system.
→ A normal unmodified guest operating system is booted in the type 2 hypervisor.

Virtual Machine Types

[Figure: two layer diagrams. Type 1 hypervisor: the hardware at the bottom, the type 1 hypervisor above it in kernel mode, and the Linux and Windows guests with their applications in user mode. Type 2 hypervisor: the hardware at the bottom, the host operating system (Linux) in kernel mode, and in user mode the type 2 hypervisor running a guest OS (Windows) with a Windows app, next to an ordinary Linux app.]

Requirements for Virtualization

The requirements for virtualization were formally described by Popek and Goldberg in 1974.
"Formal Requirements for Virtualizable Third Generation Architectures", Communications of the ACM 17 (7), 1974.

Popek and Goldberg formulated three requirements for a virtual machine monitor (VMM):

Isolation (safety): The VMM manages all hardware resources. The client system is limited to its own virtual space and is unable to determine that it is virtualized.
Equivalence (fidelity): Programs running on the VMM execute identically to their execution on hardware, with the exception of timing effects.
Performance: A majority of guest instructions are executed by the hardware without intervention of the VMM.

Requirements for Virtualization, cont.

Definitions used by Popek and Goldberg:
• Sensitive instructions are instructions that need to be executed in kernel mode because they do I/O or, for example, change the MMU settings.
• Privileged instructions are instructions that trap if executed in user mode.

Popek and Goldberg found that a processor architecture meets the requirements for virtualization if:
1. It has both user and kernel mode.
2. The set of sensitive instructions is a subset of the privileged instructions.

• Expressed in a simpler way: if an instruction that is not allowed to execute in user mode is executed in user mode, it should trap to kernel mode.
• Today the term classically virtualizable is often used for an architecture that has this property.
• The IBM/370 processor had this property but the Intel 80386 did not.
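The subset condition can be checked mechanically once the two instruction sets are listed. The following is only a minimal sketch: the instruction names and the two toy architectures are made up for illustration and do not describe any real processor.

```python
# Toy check of the Popek and Goldberg condition.
# All instruction names below are illustrative assumptions.

def is_classically_virtualizable(sensitive, privileged):
    """True if every sensitive instruction traps in user mode."""
    return set(sensitive) <= set(privileged)

def non_trapping_sensitive(sensitive, privileged):
    """Sensitive instructions that would run silently in user mode."""
    return set(sensitive) - set(privileged)

# Hypothetical architecture A: all sensitive instructions are privileged.
arch_a_sensitive = {"load_page_table", "io_out", "set_mode"}
arch_a_privileged = {"load_page_table", "io_out", "set_mode", "halt"}

# Hypothetical architecture B: "read_flags" is sensitive but does not trap.
arch_b_sensitive = {"load_page_table", "io_out", "read_flags"}
arch_b_privileged = {"load_page_table", "io_out"}

print(is_classically_virtualizable(arch_a_sensitive, arch_a_privileged))
print(is_classically_virtualizable(arch_b_sensitive, arch_b_privileged))
print(sorted(non_trapping_sensitive(arch_b_sensitive, arch_b_privileged)))
```

Architecture B fails the check because one sensitive instruction never reaches the hypervisor, which is the kind of gap that made the 80386 not classically virtualizable.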

Virtual Machine Implementation

Type 1 hypervisor
• The real operating system is the hypervisor, which runs in kernel mode.
• The guest operating system runs in user mode (sometimes called virtual kernel mode).
• All non-sensitive instructions executed by the guest execute in the normal way (but in user mode).
• When the guest operating system executes a sensitive instruction, a trap occurs and the instruction is emulated by the hypervisor.
→ If the machine has sensitive instructions that do not trap in user mode, the emulation will not work, making virtualization by this method impossible.

Virtual Machine Implementation, cont.

Type 2 hypervisor
• Actually, it is possible to virtualize a machine that does not meet the Popek and Goldberg requirements if the guest executes on an interpreter instead of directly on the hardware.
• Interpretation, however, would not meet the performance criterion.
• Combining interpretation with a technique called binary translation will, however, meet all requirements.
• This is how the x86 architecture, which is not classically virtualizable, was actually virtualized by VMware using a type 2 hypervisor.
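The trap-and-emulate scheme used by a type 1 hypervisor can be illustrated with a toy simulation. This is a sketch under stated assumptions: the opcodes, the `Trap` exception and the `Hypervisor` class are hypothetical stand-ins for what hardware and a real hypervisor do.

```python
# Toy trap-and-emulate loop. Opcodes and class names are assumptions.

SENSITIVE = {"set_ptbr", "io_out"}  # assumed sensitive opcodes

class Trap(Exception):
    """Raised by the toy CPU when a sensitive instruction runs in user mode."""
    def __init__(self, instr):
        super().__init__(instr)
        self.instr = instr

def cpu_execute_user_mode(instr, regs):
    """Run an ordinary instruction directly; trap on a sensitive one."""
    op, arg = instr
    if op in SENSITIVE:
        raise Trap(instr)
    if op == "load":
        regs["acc"] = arg
    elif op == "add":
        regs["acc"] += arg
    else:
        raise ValueError(f"unknown opcode {op!r}")

class Hypervisor:
    def __init__(self):
        # Virtual copies of the privileged state the guest believes it owns.
        self.virtual_state = {"ptbr": None, "io_log": []}
        self.traps = 0

    def emulate(self, instr):
        """Apply the effect of a sensitive instruction to the virtual state."""
        op, arg = instr
        self.traps += 1
        if op == "set_ptbr":
            self.virtual_state["ptbr"] = arg
        elif op == "io_out":
            self.virtual_state["io_log"].append(arg)

    def run_guest(self, program):
        regs = {"acc": 0}
        for instr in program:
            try:
                cpu_execute_user_mode(instr, regs)
            except Trap as trap:
                self.emulate(trap.instr)
        return regs

guest_program = [("load", 5), ("add", 2), ("set_ptbr", 0x1000),
                 ("io_out", 7), ("add", 1)]
hv = Hypervisor()
final_regs = hv.run_guest(guest_program)
print(final_regs, hv.virtual_state, hv.traps)
```

Only two of the five instructions reach the hypervisor; the rest run "directly", which is what the performance requirement asks for.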

Type 2 Hypervisor - VMware

• VMware runs as a normal user process on top of an operating system such as Linux or Windows.
• The first time VMware is started, it acts like a newly started computer with nothing installed, expecting to find an installation CD in the CD-ROM drive.
• The guest operating system is installed in the normal way on a virtual disk (just a file in Linux or Windows).
• Once the guest operating system is installed, it can be booted and run.

The working of VMware

• When VMware executes binary code from a guest system, it first scans the code to find basic blocks.
• A basic block is a sequence of instructions terminated by a jump, call, trap or other instruction that changes the control flow.
• The basic block is inspected for sensitive instructions and all such instructions are replaced by VMware procedure calls that emulate the instruction.
• When these steps have been taken, the basic block is cached inside VMware and executed.
• This technique is called binary translation.
• The basic block ends with a call to VMware, which locates the next basic block.
• All basic blocks that do not contain sensitive instructions will run at the same speed as on a bare machine, because they do run on a bare machine.
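The scan, replace and cache steps of binary translation can be sketched on a toy instruction list. This is not VMware's implementation; the opcodes, the `("emulate", ...)` marker and the `TranslationCache` class are assumptions chosen for illustration.

```python
# Toy binary translation: split code into basic blocks, rewrite sensitive
# instructions into emulator calls, and cache the translated blocks.
# Opcode names are illustrative assumptions.

CONTROL_FLOW = {"jump", "call", "trap", "ret"}  # instructions that end a block
SENSITIVE = {"set_ptbr", "io_out"}              # assumed sensitive opcodes

def split_basic_blocks(code):
    """Cut the instruction stream after every control-flow instruction."""
    blocks, current = [], []
    for instr in code:
        current.append(instr)
        if instr[0] in CONTROL_FLOW:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

def translate_block(block):
    """Replace each sensitive instruction by a call into the emulator."""
    return [("emulate", instr) if instr[0] in SENSITIVE else instr
            for instr in block]

class TranslationCache:
    def __init__(self):
        self.cache = {}
        self.translations = 0

    def get(self, block_id, block):
        """Translate a block once; later requests reuse the cached copy."""
        if block_id not in self.cache:
            self.cache[block_id] = translate_block(block)
            self.translations += 1
        return self.cache[block_id]

guest_code = [("load", 1), ("io_out", 3), ("jump", 0),
              ("add", 2), ("ret", None)]
blocks = split_basic_blocks(guest_code)
tc = TranslationCache()
first = tc.get(0, blocks[0])
again = tc.get(0, blocks[0])
print(first)
print(tc.translations)
```

The second block contains no sensitive instruction, so its translation is identical to the original code, which matches the observation that such blocks run at bare-machine speed.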

Paravirtualization

• Both type 1 and type 2 hypervisors work with unmodified guest operating systems.
• An alternative strategy is to modify the guest operating system, so that instead of executing sensitive instructions it makes a procedure call to the hypervisor.
• The method using modified guest operating systems is called paravirtualization.
• Paravirtualization can be used to solve the problem with architectures that are not in compliance with Popek and Goldberg, but it can also be used to improve performance on architectures that are classically virtualizable.
• On architectures such as x86, with many operating systems and many hypervisors, paravirtualization will require a standardized API.
• A proposal from VMware is VMI (Virtual Machine Interface).
→ A VMI is implemented for each hypervisor. All implementations present the same interface to the guest OS but make different calls to the hypervisor.
• Another such interface is paravirt ops, supported by the Linux kernel developers.

X86 Hardware Virtualization

• In late 2005 Intel started to ship some of its x86 processors with VT (Virtualization Technology) extensions.
• The primary goal of the VT extension was to bring x86 into compliance with the Popek and Goldberg criteria, making classical virtualization possible.
• The VT-x extensions introduce a new VMX mode of operation with two new operating states: VMX root and VMX guest.
• The root state is intended for the hypervisor and the guest state is for a guest OS.
• Both root and guest state support all the original privilege levels, making it possible to run a guest OS at its intended privilege level.
• A new Virtual Machine Control Structure (VMCS) specifies the behavior of some instructions in VMX mode.
→ In VMX mode all sensitive instructions will trap if executed in guest state.
• The processor boots with VMX mode disabled. To activate VMX mode a new vmxon instruction is executed, and a vmxoff instruction terminates VMX mode.
• AMD has added similar virtualization extensions to its processors, called AMD-V.
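The idea behind a standardized paravirtualization API, one interface for the guest and a different implementation per hypervisor, can be sketched as follows. The interface, method names and hypercall labels are hypothetical and are not taken from VMI or from the Linux kernel.

```python
# Toy paravirtualization interface. Everything here is a hypothetical
# illustration of "same interface to the guest, different calls to the
# hypervisor"; none of the names come from VMI or Linux.

from abc import ABC, abstractmethod

class ParavirtInterface(ABC):
    """What the modified guest calls instead of sensitive instructions."""

    @abstractmethod
    def set_page_table_base(self, address):
        ...

    @abstractmethod
    def disable_interrupts(self):
        ...

class HypervisorA(ParavirtInterface):
    def __init__(self):
        self.calls = []

    def set_page_table_base(self, address):
        self.calls.append(("A_set_mmu", address))

    def disable_interrupts(self):
        self.calls.append(("A_irq_off",))

class HypervisorB(ParavirtInterface):
    def __init__(self):
        self.calls = []

    def set_page_table_base(self, address):
        self.calls.append(("B_mmu_update", "base", address))

    def disable_interrupts(self):
        self.calls.append(("B_event_mask", True))

def guest_switch_address_space(pv, new_base):
    """Guest kernel code: identical no matter which hypervisor is below."""
    pv.disable_interrupts()
    pv.set_page_table_base(new_base)

hyp_a, hyp_b = HypervisorA(), HypervisorB()
guest_switch_address_space(hyp_a, 0x2000)
guest_switch_address_space(hyp_b, 0x2000)
print(hyp_a.calls)
print(hyp_b.calls)
```

The guest function is written once, while each backend decides how the request is passed on to its own hypervisor.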

Memory Virtualization

• Virtualized operating systems not only share the CPU; they also have to share virtual memory and I/O devices.
• Unfortunately virtualization greatly complicates management of virtual memory.

Example:
• Assume that a guest OS wants to map virtual pages 3, 4 and 5 to physical pages 10, 11 and 12.
• It will build page tables for this mapping and load a register that points to the top-level page table.
• The instruction that loads this register is sensitive and will trap on a VT-enabled CPU.
• The hypervisor can now allocate physical pages 10, 11 and 12 and set up the page tables to map the guest's pages 3, 4 and 5 to them.
• Now suppose that another guest OS starts and maps its virtual pages 6, 7 and 8 to physical pages 10, 11 and 12.
• Now the hypervisor cannot use this mapping because physical pages 10, 11 and 12 are already in use. It has to find some other free pages and modify the page tables to use them.
• In general, for each guest the hypervisor will have to manage a shadow page table.

Memory Virtualization, cont.

• Every time a guest OS changes a page table, the hypervisor must change the shadow page tables.
• The trouble is that the guest page tables reside in normal memory. No sensitive instruction is needed to write to these pages, so no trap to the hypervisor will occur.

A trick is needed to solve this problem:
• When the hypervisor creates the page tables for the guest, it marks all memory that contains the page tables as read-only.
• Now any attempt from the guest to write to its page tables will generate a protection fault and give control to the hypervisor.
• Doing some complicated analysis, the hypervisor will be able to determine that this was an attempt to update a page table and it will update the shadow page tables accordingly.
• A paravirtualized guest is in a much better position here. It knows that it is virtualized and will make a procedure call to inform the hypervisor that it has changed its page table. This is more efficient than generating lots of protection faults and having the hypervisor guess what is going on.
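The two-guest example with pages 10, 11 and 12 can be replayed in a small simulation of shadow page tables. This is a simplified sketch: the class, its method names and the flat dictionaries are assumptions, and the protection-fault analysis is reduced to a single callback.

```python
# Toy shadow page tables. The hypervisor maps each guest's "physical"
# pages onto distinct machine pages. All names are illustrative assumptions.

class ShadowPagingHypervisor:
    def __init__(self, free_machine_pages):
        self.free = list(free_machine_pages)
        self.guest_to_machine = {}  # (guest, guest physical page) -> machine page
        self.shadow = {}            # guest -> {virtual page: machine page}

    def _machine_page(self, guest, guest_phys):
        """Allocate a machine page the first time a guest page is used."""
        key = (guest, guest_phys)
        if key not in self.guest_to_machine:
            self.guest_to_machine[key] = self.free.pop(0)
        return self.guest_to_machine[key]

    def on_guest_page_table_write(self, guest, virt, guest_phys):
        """Called after a protection fault on a write-protected guest page
        table, or directly by a paravirtualized guest."""
        table = self.shadow.setdefault(guest, {})
        table[virt] = self._machine_page(guest, guest_phys)

hv = ShadowPagingHypervisor(free_machine_pages=range(10, 20))

# Guest 1 maps virtual pages 3, 4, 5 to what it believes are pages 10, 11, 12.
for virt, phys in zip((3, 4, 5), (10, 11, 12)):
    hv.on_guest_page_table_write("guest1", virt, phys)

# Guest 2 maps virtual pages 6, 7, 8 to the same guest physical pages.
for virt, phys in zip((6, 7, 8), (10, 11, 12)):
    hv.on_guest_page_table_write("guest2", virt, phys)

print(hv.shadow["guest1"])  # {3: 10, 4: 11, 5: 12}
print(hv.shadow["guest2"])  # {6: 13, 7: 14, 8: 15}
```

Both guests believe they own pages 10 to 12, but the shadow tables that the hardware actually uses point the second guest at other free pages.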

Hardware support for Memory Virtualization

• For a few years AMD has shipped processors with hardware support for memory virtualization.
• The AMD virtual memory support for virtual machines is called NPT (Nested Page Tables).
• Intel has also announced virtual memory hardware support, called EPT (Extended Page Tables).
• In processors with NPT another level of translation is added in the hardware MMU.
→ The guest page table converts between guest virtual address and guest physical address.
→ A new nested page table under control of the hypervisor converts between guest physical address and system physical address.
• Using NPT the guest OS will be in full control of its page tables.
• The price to pay for this is that the translation takes more time than with the original hardware and that the caching of the translations in the TLB is less efficient.

I/O Virtualization

When virtualizing I/O devices, at least the following interactions between software and devices must be handled:

Device discovery: A method must be provided for the guest OS to find out which I/O devices are present.
Device control: Special I/O instructions or memory mapped registers can be used. I/O instructions are sensitive and will trap, but memory mapped registers are not sensitive unless read/write protected.
Data transfers: Most devices use DMA, which in most cases uses absolute physical addresses. The guest OS will, however, use guest physical addresses, which need to be translated by the hypervisor for DMA to work.
I/O interrupts: A method is needed to relay interrupts for guest initiated data transfers to the guest.
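Returning to nested page tables, the two translation steps (guest virtual to guest physical, then guest physical to system physical) can be written out as a small sketch. Real hardware walks multi-level tables; the flat dictionaries, the 4 KiB page size and the function names here are simplifying assumptions.

```python
# Toy two-stage address translation as done with nested page tables.
# Flat single-level tables and the page size are simplifying assumptions.

PAGE_SIZE = 4096

def translate(page_table, address):
    """Translate one address through a single page table."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in page_table:
        raise KeyError(f"page fault at page {page}")
    return page_table[page] * PAGE_SIZE + offset

def nested_translate(guest_pt, nested_pt, guest_virtual):
    """Guest virtual -> guest physical -> system physical."""
    guest_physical = translate(guest_pt, guest_virtual)
    return translate(nested_pt, guest_physical)

# Controlled by the guest OS: virtual pages 3, 4, 5 -> guest physical 10, 11, 12.
guest_page_table = {3: 10, 4: 11, 5: 12}
# Controlled by the hypervisor: guest physical 10, 11, 12 -> system pages 40, 41, 42.
nested_page_table = {10: 40, 11: 41, 12: 42}

addr = nested_translate(guest_page_table, nested_page_table, 3 * PAGE_SIZE + 0x20)
print(hex(addr))  # 0x28020
```

The guest can change `guest_page_table` freely without involving the hypervisor, at the cost of a second lookup on every translation that misses the TLB.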

I/O Virtualization, cont.

• A common way for a hypervisor to handle I/O devices is by emulation.
• The hypervisor implements a software model of the I/O device including all the control registers.
• If the device uses memory mapped I/O, the memory addresses for the emulated registers must be protected by the hypervisor to make operations on them trap.
• The guest OS will believe that it is talking to a hardware device, when in fact it is communicating with a software model.
• The hypervisor may be accessing real hardware when emulating the device, but the emulated device may be different from the hardware device. For example, an IDE disk may be emulated while the real hardware is a SATA disk.
• Using emulation, the hypervisor can present many more devices to the guest systems than are physically present on the machine.
• A type 2 hypervisor will use the device drivers in the host OS to access the real hardware devices, while a type 1 hypervisor needs its own device drivers for all hardware present on the machine.
• The disadvantage of emulation is that performance suffers when every operation on an I/O register generates a time-consuming trap to the hypervisor. Here paravirtualization can give much better performance.

Hardware support for I/O Virtualization

• The use of DMA is a problem with virtual machines because a DMA controller is able to write to the entire physical memory, not just the memory assigned to a single guest OS.
• For this reason a guest OS is not allowed to handle the DMA controller itself, but must call the hypervisor to do it.
• To solve this problem both AMD and Intel have added IOMMUs (I/O Memory Management Units) to some recent processors.
• An IOMMU uses a translation technique similar to a virtual memory page table to translate between guest physical addresses and absolute physical addresses.
• The IOMMU will also restrict which physical addresses a device may access.
• Using an IOMMU, a guest OS may be allowed to initiate a DMA transfer itself; however, a call to the hypervisor is needed to set up the IOMMU mapping.
• Just as a virtual memory system needs a TLB (Translation Lookaside Buffer) to get reasonable performance, an IOMMU needs an IOTLB.
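The two jobs of an IOMMU described above, translating guest physical addresses and restricting which addresses a device may touch, can be sketched as a toy model. The class, the device name `nic0` and the 4 KiB page size are assumptions for illustration; no real IOMMU programming interface is shown.

```python
# Toy IOMMU: per-device translation from guest physical to machine
# addresses, with a fault for anything not mapped. Names are assumptions.

PAGE_SIZE = 4096

class IOMMUFault(Exception):
    """Raised when a device touches an address it has no mapping for."""

class IOMMU:
    def __init__(self):
        self.tables = {}  # device -> {guest physical page: machine page}

    def map_page(self, device, guest_page, machine_page):
        """Set up by the hypervisor before the guest starts a DMA transfer."""
        self.tables.setdefault(device, {})[guest_page] = machine_page

    def translate(self, device, guest_physical):
        """Applied to every DMA address the device issues."""
        page, offset = divmod(guest_physical, PAGE_SIZE)
        table = self.tables.get(device, {})
        if page not in table:
            raise IOMMUFault(f"{device}: no mapping for guest page {page}")
        return table[page] * PAGE_SIZE + offset

iommu = IOMMU()
iommu.map_page("nic0", guest_page=10, machine_page=42)

machine_addr = iommu.translate("nic0", 10 * PAGE_SIZE + 8)
print(machine_addr == 42 * PAGE_SIZE + 8)  # True

try:
    iommu.translate("nic0", 11 * PAGE_SIZE)  # page 11 was never mapped
    blocked = False
except IOMMUFault:
    blocked = True
print(blocked)  # True
```

Because unmapped pages fault, a device assigned to one guest cannot reach memory belonging to another guest or to the hypervisor.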

Some Existing Virtual Machine Software

• VMware Workstation, VMware Player, VMware Server
→ Type 2 hypervisor using binary translation
→ VMware Workstation 6.0 also supports paravirtualization (VMI)
→ Proprietary, free download for Player and Server
→ Host and guest CPUs: x86, AMD64
→ Host OS: Windows and Linux
• QEMU
→ Emulator using dynamic recompilation
→ Open source
→ Many different host and guest CPUs
→ Host OS: Linux, Mac OS X, Solaris, FreeBSD, OpenBSD, Windows
• KQEMU
→ Linux kernel module to accelerate QEMU
→ Type 2 hypervisor using dynamic recompilation
→ Host and guest CPUs: x86 and x86-64
→ Host OS: Linux (experimental versions for FreeBSD and Windows XP)

Some Existing Virtual Machine Software, cont.

• VirtualBox (Sun)
→ Type 2 hypervisor
→ Partially based on QEMU
→ Both open source and proprietary versions (free download)
→ Host and guest CPUs: x86 and x86-64
→ Host OS: Windows, Linux, Mac OS X, Solaris
• Xen
→ Type 1 hypervisor
→ Open source
→ Host CPUs: x86 with Intel VT, AMD64 with AMD-V
→ Host OS: Linux, FreeBSD, Solaris
→ Runs on any x86 processor with Linux and FreeBSD using paravirtualization
