Bhyvecon Tokyo 2014

Introduction to bhyve
Takuya ASADA / @syuu1228

What is bhyve?
• bhyve is a hypervisor introduced in FreeBSD
• Similar to Linux KVM, it runs on a host OS
• BSD license
• Developed by Peter Grehan and Neel Natu

bhyve features
• Requires Intel VT-x and EPT (Nehalem or later); AMD support is in progress
• Does not support BIOS/UEFI for now; UEFI support is in progress
• Minimal device emulation support: virtio-blk, virtio-net, COM port, and a few others
• Supported guest OSes: FreeBSD/amd64 and i386, Linux/x86_64, OpenBSD/amd64

How to use it?

    kldload vmm.ko
    /usr/sbin/bhyveload -m ${mem} -d ${disk} ${name}
    /usr/sbin/bhyve -c ${cpus} -m ${mem} \
        -s 0,hostbridge -s 2,virtio-blk,${disk} \
        -s 3,virtio-net,${tap} -s 31,lpc -l com1,stdio vm0

How to run Linux?
• The bhyve OS loader (/usr/sbin/bhyveload) only supports FreeBSD; you need another OS loader for other OSes
• grub2-bhyve is the solution
• It is a modified version of grub2 that runs on the host OS (FreeBSD)
• It can load Linux and OpenBSD
• Available in ports & pkg!

Virtualization in general

Difference between container and hypervisor
• Jail is a container: it virtualizes the OS environment at the kernel level
• bhyve is a hypervisor: it virtualizes a whole machine
• Totally different approaches

Container
• A process in a jail is just a normal process for the kernel
• The kernel does some tricks to isolate environments between jails
• Lightweight, low overhead
• All jails share one kernel → if the kernel panics, all jails die
• You cannot install another OS (no Windows, no Linux!)
[Diagram: processes inside jail1 and jail2 all run on the single host kernel, sharing its NIC and disk]

Hypervisor
• A hypervisor virtualizes a machine
• From the guest OS, it looks like real hardware
• A virtual machine is a normal process for the host OS
• The kernel is not shared; each VM is completely isolated
• You can run a full OS inside the VM → Windows! Linux!
[Diagram: vm1 and vm2 each contain their own kernel, virtual NIC, and virtual disk, and run as host processes on top of the hypervisor kernel, which owns the real NIC and disk]

How does a hypervisor virtualize the machine?
• To make a complete virtual machine, you need to virtualize the following:
  • CPU
  • Memory (address space)
  • I/O

CPU Virtualization: Emulate the entire CPU?
• Like QEMU
• You can emulate the entire CPU operation in a normal process
• Very slow; not a really useful choice for virtualization
[Diagram: guest instructions (mov dx,3FBh / mov al,128 / out dx,al) are interpreted by a CPU emulator running on the physical CPU, which performs the I/O on a virtual device instead of the physical device]

CPU Virtualization: Direct execution?
• You want to run guest instructions directly on a real CPU, since you are virtualizing x86 on x86
• You need to avoid executing instructions that modify global system state or perform I/O (called sensitive instructions)
• If you execute these instructions on a real CPU, they may break host OS state, for example by directly accessing a hardware device

Performing I/O on a VM
• You need to keep the VM from accessing real hardware
• You need to prevent execution of the instruction
[Diagram: the guest OS executes outb on a virtual CPU and the access reaches the real display]

Performing I/O on a VM
• You can trap these instructions by executing the guest in a lower privilege level
• However, on x86 there are some instructions that are impossible to trap, because they are non-privileged instructions
[Diagram: the trapped outb is redirected to a virtual display]
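To make this concrete: the guest code shown in the CPU-emulation slide above (mov dx,3FBh / mov al,128 / out dx,al) is exactly this kind of sensitive I/O access. Below is a minimal C sketch with inline assembly, purely for illustration; it is not bhyve code, and run unprivileged on a host it will simply fault instead of reaching the device.

    #include <stdint.h>

    /* out dx,al: write one byte to an x86 I/O port (GCC/Clang inline asm). */
    static inline void outb(uint16_t port, uint8_t val)
    {
        __asm__ __volatile__("outb %0, %1" : : "a"(val), "Nd"(port));
    }

    int main(void)
    {
        /* 0x3FB is COM1's line control register; 128 sets the DLAB bit.
         * Inside a guest this access must not reach real hardware: the
         * hypervisor has to intercept it and emulate a virtual COM port. */
        outb(0x3FB, 128);
        return 0;
    }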
Software techniques to virtualize x86
• Binary translation (old VMware): interpret & modify the guest OS's instructions on the fly → runs fast, but the implementation is very complex
• Paravirtualization (old Xen): modify the guest OS for the hypervisor → runs fast, but cannot run unmodified OSes
• We want an easier & better solution → hardware-assisted virtualization!

Hardware-assisted virtualization (Intel VT-x)
• New CPU modes: VMX root mode (hypervisor) / VMX non-root mode (guest)
• When some event needs to be emulated by the hypervisor, the CPU stops the guest and exits to the hypervisor → VMExit
• You don't need complex software techniques, and you don't have to modify the guest OS
[Diagram: User (Ring 3) and Kernel (Ring 0) exist in both VMX root mode and VMX non-root mode; VMEntry enters the guest, VMExit returns to the hypervisor]

Memory Virtualization
• If you run a guest OS natively, memory address translation becomes problematic
• If Guest B loads page table A, virtual page 1 translates to host physical page 1, but you meant host physical page 5
[Diagram: Guest A and Guest B each map process pages through their own page tables into their own guest physical memory, which must in turn be placed somewhere in host physical memory]

Shadow Paging
• Trap page table loads and modifications, build a "Shadow Page Table", and hand the real physical page numbers to the MMU
• A software trick that works well, but is slow
[Diagram: shadow page tables A' and B' map guest virtual pages straight to host physical pages]

Nested Paging (Intel EPT)
• Hardware-assisted memory virtualization!
• You get a guest-physical to host-physical translation table (the EPT)
• The MMU translates addresses in two steps (nested)
[Diagram: each guest's page table maps virtual pages to guest physical pages, and a per-guest EPT maps guest physical pages to host physical pages]
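To illustrate the two-step translation, here is a minimal sketch that models the guest page table and the EPT as flat, single-level arrays indexed by page number. This is only an illustration: real x86 page tables and the EPT are multi-level structures. The numbers mirror the example above (guest virtual page 1 → guest physical page 1 → host physical page 5).

    #include <stdio.h>

    #define NPAGES 8

    /* Illustrative flat tables: index = source page number, value = target page. */
    static int guest_page_table[NPAGES];   /* guest virtual  -> guest physical */
    static int ept[NPAGES];                /* guest physical -> host physical  */

    /* Nested translation: the MMU first walks the guest's page table,
     * then walks the EPT set up by the hypervisor. */
    static int translate(int guest_virtual_page)
    {
        int guest_physical_page = guest_page_table[guest_virtual_page];
        return ept[guest_physical_page];
    }

    int main(void)
    {
        guest_page_table[1] = 1;  /* guest maps virtual page 1 to guest physical page 1 */
        ept[1] = 5;               /* hypervisor maps guest physical page 1 to host page 5 */
        printf("host physical page = %d\n", translate(1));  /* prints 5 */
        return 0;
    }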
I/O Virtualization
• To run unmodified OSes, you need to emulate all the devices present on real hardware
• SATA, NIC (e1000), USB (EHCI), VGA (Cirrus), interrupt controllers (LAPIC, I/O APIC), clock (HPET), COM port…
• Emulating real devices is not very fast because it causes a lot of VMExits; not ideal for virtualization

Paravirtual I/O
• Virtual I/O devices designed for VM use
• Much faster than emulating real devices
• Requires a device driver in the guest OS
• De facto standard: virtio-blk, virtio-net

PCI device passthrough
• If you attach a real hardware device to a VM, you have a problem with DMA
• The device requires physical addresses for DMA, but the guest OS doesn't know the host physical addresses
• The IOMMU (Intel VT-d) is an address translator for devices: it translates guest physical to host physical using a translation table
[Diagram: a PCI device performs DMA into guest memory; the IOMMU translation table remaps guest physical addresses to host physical addresses alongside the EPT]

bhyve internals

How does bhyve virtualize the machine?
• CPU: hardware-assisted virtualization (Intel VT-x)
• Memory: hardware-assisted memory virtualization (Intel EPT)
• I/O: virtio, PCI passthrough, and a few others
• Uses the hardware-assisted features

bhyve overview
• bhyveload: loads the guest OS (1. create the VM instance, load the guest kernel)
• bhyve: userland part of the hypervisor, emulates devices (2. run the VM instance)
• bhyvectl: a management tool (3. destroy the VM instance)
• libvmmapi: userland API
• vmm.ko: kernel part of the hypervisor
[Diagram: bhyveload, bhyvectl, and bhyve use libvmmapi to talk to /dev/vmm/${vm_name} (vmm.ko) in the FreeBSD kernel via mmap/ioctl; bhyve connects the disk image, tap device, and stdin/stdout guest console]

vmm.ko
• All VT-x features are only accessible in kernel mode; vmm.ko handles them
• The most important job of vmm.ko is CPU mode switching between hypervisor and guest
• Provides an interface for userland via /dev/vmm/${vmname}
• Each vmm device file holds the state of one VM instance

/dev/vmm/${vmname} interfaces
• create/destroy: device files can be created and destroyed via the sysctls hw.vmm.create and hw.vmm.destroy
• read/write/mmap: the guest memory area can be accessed with standard syscalls (which means you can even dump guest memory with the dd command)
• ioctl: provides various operations on the VM

/dev/vmm/${vmname} ioctls
• VM_MAP_MEMORY: maps the guest memory area at the requested size
• VM_SET_REGISTER / VM_GET_REGISTER: access registers
• VM_RUN: runs the guest machine until a virtual device is accessed (or some other trap happens)

libvmmapi
• Wrapper library around the /dev/vmm operations
• vm_create(name) → sysctl("hw.vmm.create", name)
• vm_set_register(reg, val) → ioctl(VM_SET_REGISTER, reg, val)

bhyveload
• bhyve uses an OS loader instead of BIOS/UEFI to load the guest OS
• The FreeBSD bootloader ported to userland: userboot
• bhyveload runs on the host OS to initialize the guest OS
• When invoked, it does the following:
  • Parses the UFS file system on the disk image and finds the kernel
  • Loads the kernel into the guest memory area
  • Initializes the page tables
  • Creates the GDT, IDT, and LDT
  • Initializes special registers to get ready for 64-bit mode
• The guest machine can then start from the kernel entry point, already in 64-bit mode

bhyve
• The bhyve command is the userland part of the hypervisor
• It invokes ioctl(VM_RUN) to run the guest OS
• Emulates virtual devices
• Provides the user interface (no GUI for now)

main loop in bhyve

    while (1) {
        ioctl(VM_RUN, &vmexit);
        switch (vmexit.exit_code) {
        case IOPORT_ACCESS:
            emulate_device(vmexit.ioport);
            break;
        …
        }
    }
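Filling in the pseudocode above, a standalone run loop over /dev/vmm might look roughly like the sketch below. It is only an illustration built from the interfaces named on these slides: the vmexit_info struct, the IOPORT_ACCESS constant, the VM_RUN request definition, and the emulate_device() helper here are placeholders, not bhyve's real definitions.

    #include <sys/ioctl.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Simplified stand-ins for the real vmm driver definitions;
     * field names, constants, and the ioctl request are illustrative. */
    struct vmexit_info {
        int exit_code;      /* why the guest stopped running      */
        int ioport;         /* which I/O port was touched, if any */
    };
    #define IOPORT_ACCESS 1                                  /* placeholder */
    #define VM_RUN        _IOWR('v', 1, struct vmexit_info)  /* placeholder */

    static void emulate_device(int port)
    {
        /* A real hypervisor would decode the access and update the
         * corresponding virtual device model (virtio, COM port, ...). */
        printf("guest touched I/O port 0x%x\n", port);
    }

    int main(int argc, char **argv)
    {
        char path[64];
        struct vmexit_info vmexit;

        /* One device file per VM instance, e.g. /dev/vmm/vm0. */
        snprintf(path, sizeof(path), "/dev/vmm/%s", argc > 1 ? argv[1] : "vm0");
        int fd = open(path, O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        for (;;) {
            /* Enter the guest; VM_RUN returns when a device access
             * (or some other trap) causes a VMExit. */
            if (ioctl(fd, VM_RUN, &vmexit) < 0) { perror("ioctl"); break; }

            switch (vmexit.exit_code) {
            case IOPORT_ACCESS:
                emulate_device(vmexit.ioport);
                break;
            default:
                /* Other exit reasons (halt, paging, ...) would be handled here. */
                break;
            }
        }
        close(fd);
        return 0;
    }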

Q&A?
