Ept-Freebsd.Pdf
Total Page:16
File Type:pdf, Size:1020Kb
Implements BIOS emulation support for To virtualize this architecture efficiently, BHyVe: A BSD Hypervisor hypervisors needed to avoid execute these instructions, and replace instruction with Abstract suitable operations. Current BHyVe only supports FreeBSD/amd64 There were some approach to implement it: as a GuestOS. On VMware approach, the hypervisor replaces One of the reason why BHyVe cannot support problematic sensitive instructions on-the-fly, other OSes is lack of BIOS support. while running guest machine. This approach 2 My project is implementing BIOS emulator on called Binary Translation . BHyVe, to remove these limitations. It could run most of unmodified OSes, but it had some performance overhead. On Xen approach, the hypervisor requires to 1. Background run pre-modified GuestOS which replaced 1.1 History of virtualization on x86 problematic sensitive instructions to dedicated architecture operations called Hypercall. This approach There's a famous requirements called "Popek called Para-virtualization3. & Goldberg Virtualization requirements"1, It has less performance overhead than Binary which defines a set of conditions sufficient for Translation on some conditions, but requires an architecture to support virtualization pre-modified GuestOS. efficiently. Due to increasing popularity of virtualization Efficient virtualization means virtualize on x86 machines, Intel decided to enhance x86 machine without using full CPU emulation, run architecture to virtualizable. guest code natively. The feature called Intel VT-x, or Hardware- Explain the requirements simply, to an Assisted Virtualization which is vendor architecture virtualizable, all sensitive neutral term. instructions should be privileged instruction. AMD also developed hardware-assisted Sensitive instructions definition is the virtualization feature on their own CPU, called instruction which can interfere the global status AMD-V. of system. Which means, all sensitive instructions 1.2 Detail of Intel VT-x executed under user mode should be trapped VT-x provides new protection model which by privileged mode program. isolated with Ring protection, for Without this condition, Guest OS affects Host virtualization. OS system status and causes system crash. It added two CPU modes, hypervisor mode x86 architecture was the architecture which and guest machine mode. didin’t meet the requirement, because It had Hypervisor mode called VMX Root Mode, non-privileged sensitive instructions. and guest machine mode called VMX non Root Mode(Figure 1). Gerald J. Popek and Robert P. Goldberg. 1974. Formal requirements for virtualizable third generation architectures. Commun. ACM 17, 7 (July 1974), 412-421. DOI=10.1145/361011.361073 http://doi.acm.org/10.1145/361011.361073 2 Brian Walters. 1999. VMware Virtual Platform. Linux J. 1999, 63es, Article 6 (July 1999). 3 Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. 2003. Xen and the art of virtualization. In Proceedings of the nineteenth ACM symposium on Operating systems principles (SOSP '03). ACM, New York, NY, USA, 164-177. DOI=10.1145/945445.945462 http://doi.acm.org/10.1145/945445.945462 For example, IN/OUT instruction causes VMExit, and hypervisor emulates virtual VMX VMX device access. root mode non-root VT-x defines number of events which can mode cause VMExit, and hypervisor needs to User User configure enable/disable on each VMExit (Ring 3) (Ring 3) events. Reasons of VMExit is called VMExit reason, Kernel Kernel it classified by genres of events. (Ring 0) (Ring 0) Here are VMExit reason list: • Exception or NMI Figure 1. VMX root Mode and VMX non-root • External interrupt Mode • Triple fault • INIT signal received On VT-x, hypervisor can run guest OS on • SIPI received VMX non Root Mode without any • SM received modification, including sensitive instructions, • Internal interrupt without affecting Host OS system status. • Task switch When sensitive instructions are being executed • CPUID instruction under VMX non Root Mode, CPU stops • Intel SMX instructions execution of VMX non Root Mode, exit to • Cache operation instructions(INVD, VMX Root Mode. WBINVD) Then it trapped by hypervisor, hypervisor • TLB operation instructions(HNVLPG, emulates the instruction which guest tried to INVPCID) execute. • IO operation instructions(INB, OUTB, etc) Mode change from VMX Root Mode to VMX • Performance monitoring conter operation non-root Mode called VMEntry, from VMX instruction(RDTSC) non-root Mode to VMX Root Mode called • SMM related instruction(RSM) VMExit(Figure 2). • VT-x instructions(Can use for implement nested virtualization) VMX VMX • Accesses to control registers root mode non-root • Accesses to debug registers mode • Accesses to MSR • MONITOR/MWAIT instructions User User • PAUSE instruction (Ring 3) (Ring 3) VMEntry • Accesses to Local APIC • Accesses to GDTR, IDTR, LDTR, TR VMExit Kernel Kernel • VMX preemption timer (Ring 0) (Ring 0) • RDRAND instruction Figure 2. VMEntry and VMExit Some more events other than sensitive instructions which need to intercept by hypervisor also causes VMExit. VMCS data format revision number. Error code of VMExit failure. VMCS revision identifier An area for saving / restoring guest registers. VMX-abort indicator Register saving/restoring are automatically preformed at VMExit/VMEntry. Guest-state area (Actually not all register are it's target. Some registers should save by hypervisor manually.) The area saves some non-register state, Host-state area instruction blocking state etc. VM-exection control fields An area for saving / restoring hypervisor registers. Usage is almost identical with Guest-state area. VM-exit control fields VMCS data A field control processor behavior in VMX non-root VM-entry control fields operation. VMExit causing events can configure here. VM-exit information fields A field control processor behavior in VMExit operation. A field control processor behavior in VMEntry operation. Enabling/Disabling 64bit mode can configure here. VMExit reason stored here. Figure 3. Structure of VMCS All configuration data related to VT-x stored to write initial configuration values by VMCS(Virtual Machine Control Structure), VMWRITE instruction. which is on memory data structure for each You need to write initial register values guest machine4. here, and it done by /usr/sbin/bhyveload. Figure 3 shows VMCS structure. 3. VMEntry to VMX non root mode Entry to VMX non root mode by invoking 1.3 VT-x enabled hypervisor lifecycle VMLAUNCH or VMRESUME instruction. Hypervisors for VT-x works as following On first launch you need to use lifecycle (Figure 4). VMLAUNCH, after that you need to use VMRESUME. 1. VT-x enabling Before the entry operation, you need to It requires to enable at first to use VT-x save Host OS registers and restore Guest features. OS registers. To enable it, you need set VMXE bit on VT-x only offers minimum automatic save/ CR4 register, and invoke VMXON restore features, rest of the registers need to instruction. take care manually. 2. VMCS initialization 4. Run guest machine VMCS is 4KB alined 4KB page. CPU runs VMX non root mode, guest You need to notify the page address to CPU machine works natively. by invoking VMPTRLD instruction, then 4 If guest system has two or more virtual CPUs, VMCS needs for each vCPUs. 1. VT-x enabling 2. VMCS initialization 7. Run another 3. VMEntry to VMX process non root mode 6. Do emulation for 4. Run guest the exit reason machine 5. VMExit for some exit reason Figure 4. VT-x enabled hypervisor lifecycle 5. VMExit for some reason • Page 1 of Process B on Guest B -> When some events which causes VMExit, Page 1 of Guest physical memory -> CPU returns to VTX root mode. Page 5 of Host physical You need to save/restore register at first, then check the VMExit reason. But, if you run guest OS natively, CPU will 6. Do emulation for the exit reason translate Page 1 of Process B on Guest B to If VMExit reason was the event which Page 1 of Host physical memory. requires some emulation on hypervisor, Because CPU doesn’t know the paging for perform emulation. (Ex: Guest OS wrote guests are nested. data on HDD Depending Host OS scheduling, it may There is software technique to solve the resume VM by start again from 3, or task problem called shadow paging (Figure 6). switch to another process. Hypervisor creates clone of guest page table, set host physical address on it, traps guest 1.4 Memory Virtualization writing CR3 register and set cloned page table Mordan multi-tasking OSes use paging to to CR3. provide individual memory space for each Then CPU able to know correct mapping of processes. guest memory. To run guest OS program natively, address This technique was used on both Binary translation on paging become problematic translation based VMware, and also early function. implementation of hypervisors for VT-x. For example (Figure 5): But it has big overhead, Intel decided to add You allocate physical page 1- 4 to Guest A, and nested paging support on VT-x from Nehalem 5-8 to GuestB. micro-architecture. Both guests map page 1 of Process A to page 1 of guest physical memory. EPT is the name of nested paging feature Then it should point to: (Figure 7), • Page 1 of Process A on Guest A -> It simply adds Guest physical address to Host Page 1 of Guest physical memory -> physical address translation table. Page 1 of Host physical Now hypervisor doesn’t need to take care guest paging, it become much simpler and faster. Guest A Process A Guest physical memory Page table A 1 1 1 1 2 2 Host physical memory Process B Page table B 3 1 1 3 4 1 2 2 4 2 3 4 Guest B 5 Process A Guest physical memory 6 Page table A 7 1 1 1 1 8 2 2 Process B Page table B 3 4 1 1 3 2 2 4 Figure 5.