Universitatea POLITEHNICA din București

Migrating a bhyve guest BSDCan2019, Ottawa, Canada

Authors Elena Mihăilescu Mihai Carabaș [email protected] [email protected] [email protected] About me

• Master’s degree student at University POLITEHNICA of Bucharest • Study Complex Network Security

• Working on FreeBSD’s projects since September, 2017

BSDCan2019, Ottawa, Canada January 15, 2019 2 Introduction

& Cloud Solutions • Live Migration

, Hyper-V, KVM, VirtualBox, VMWare • bhyve – FreeBSD’s

BSDCan2019, Ottawa, Canada January 15, 2019 3 bhyve

libvmmapi User space

Kernel space vmm.ko

BSDCan2019, Ottawa, Canada January 15, 2019 4 bhyve

bhyveload

libvmmapi User space

Kernel space vmm.ko

BSDCan2019, Ottawa, Canada January 15, 2019 5 bhyve

bhyveload bhyverun

libvmmapi User space

Kernel space vmm.ko

BSDCan2019, Ottawa, Canada January 15, 2019 6 bhyve

bhyveload bhyverun bhyvectl

libvmmapi User space

Kernel space vmm.ko

BSDCan2019, Ottawa, Canada January 15, 2019 7 Migration

• Move a guest from one host to another

• Cold Migration • Warm Migration • Live Migration – Pre-Copy Live Migration – Post-Copy Live Migration

BSDCan2019, Ottawa, Canada January 15, 2019 8 Types of Migration

Cold Migration • Guest is powered off • Move its disk on another system

• Disadvantages: – Process is really slow (big down time) – The guest has to be powered off

BSDCan2019, Ottawa, Canada January 15, 2019 9 Types of Migration

Warm Migration • Guest is suspended • Transfer its state and memory on another host • Resume guest on destination system • Guest disk image has to be shared

• Disadvantages: – Big downtime (i.e., large sized guests)

BSDCan2019, Ottawa, Canada January 15, 2019 10 Types of Migration

Live Migration • Guest memory is migrated while running • At some point, guest is suspended and only the CPU’s & devices’ state is migrated • Short down time

BSDCan2019, Ottawa, Canada January 15, 2019 11 Types of Migration

Live Migration – Pre-Copy Approach • Memory migrated in rounds while guest is running • In final round: – Stop source VM – Copy remaining memory – Copy CPU & devices state – Start destination VM

BSDCan2019, Ottawa, Canada January 15, 2019 12 Types of Migration

Live Migration – Post-Copy Approach • Memory migrated using a page fault approach • Algorithm: – Stop source VM – Copy CPU and devices state on destination – Start destination VM – Copy memory page when a page fault occurs at destination

BSDCan2019, Ottawa, Canada January 15, 2019 13 Types of Migration

Pre-Copy Live Migration Post-Copy Live Migration • Same page can be migrated • A page is migrated only once multiple times

• Guest running on source • Guest running on destination until migration finishes until migration finishes

• If migration fails, guest • If migration fails, additional continue running on source mechanism should be host implemented for fallback

BSDCan2019, Ottawa, Canada January 15, 2019 14 Related Work

Save and Restore feature for bhyve • Intel/AMD CPU state – VMCS/VMCB • Guest physical memory • Kernel devices – VHPET, VRTC, VLAPIC etc. • Virtual devices – virtio devices, UART, AHCI

BSDCan2019, Ottawa, Canada January 15, 2019 15 Save Mechanism

bhyveload bhyverun bhyvectl

--suspend file.ckp

libvmmapi User space

Kernel space vmm.ko

BSDCan2019, Ottawa, Canada January 15, 2019 16 Restore Mechanism

bhyveload bhyverun bhyvectl

-r file.ckp

libvmmapi User space

Kernel space vmm.ko

BSDCan2019, Ottawa, Canada January 15, 2019 17 Adding a migration feature for bhyve

• FreeBSD-UPB

• Based on the Save&Restore for bhyve Project

• Features to be presented: – Warm Migration for bhyve – Pre-Copy Live Migration approach for bhyve based on a Copy-on-Write Mechanism

BSDCan2019, Ottawa, Canada January 15, 2019 18 Save-Restore “Migration”

• Use The Save-Restore feature to migrate a guest

• Suspend guest state on one host

• Resume guest state on another host

BSDCan2019, Ottawa, Canada January 15, 2019 19 Save-Restore “Migration”

Client 1 Client 2

Disk Shared with NFS 1. Open VM 2. Snapshot VM 3. Close VM 4. Restore VM____

BSDCan2019, Ottawa, Canada January 15, 2019 20 Save-Restore “Migration”

Limitations: • User has to manually check if hosts are compatible for migration • Additional space required for saving files • Takes a lot of time

BSDCan2019, Ottawa, Canada January 15, 2019 21 Warm Migration • Use the same Save & Restore feature, but… • Send state and memory via socket

• Freeze VM, send state & memory via socket, Restore VM

• Add options for: – bhvye (-R) – bhyvectl (--migrate)

BSDCan2019, Ottawa, Canada January 15, 2019 22 Warm Migration

Socket Connection Client 1 Client 2

Disk Shared with NFS 1. Open VM 2. Open VM 3. Wait for Migration 4. Stop VM 5. Send state through socket 6. Receive state 7. Start VM 8. Destroy VM

BSDCan2019, Ottawa, Canada January 15, 2019 23 Warm Migration - Usage

• Run VM root@src # bhyve vm_src

• Wait for migration root@dst # bhyve -R src_IP,port vm_dst

• Start Migration root@src # bhyvectl --migrate=dst_IP,port vm_src

BSDCan2019, Ottawa, Canada January 15, 2019 24 Live Migration

Socket Connection Client 1 Client 2

1. Open VM Disk Shared with NFS 2. Open VM 3. Wait for Migration 4. Send memory in rounds through socket 5. Receive memory 6. Stop VM 7. Send state 8. Receive state 9. Start VM 10. Destroy VM

BSDCan2019, Ottawa, Canada January 15, 2019 25 Live Migration Challenges

• The difficult part: live migrating the memory

• Memory is migrated in rounds

• Need to determine the memory pages that were modified since the last round started

BSDCan2019, Ottawa, Canada January 15, 2019 26 Live Migration using Copy-on-Write

• When spawning a process with fork(), its memory is marked as CoW

• Pages are duplicated when a write operation occurs

• Check the differences between the parent’s memory and child’s memory

BSDCan2019, Ottawa, Canada January 15, 2019 27 Virtual Memory Management in FreeBSD

struct vmspace

struct vm_map_entry ------struct vm_map_entry ------struct vm_map_entry ------...... ------struct vm_map_entry

BSDCan2019, Ottawa, Canada January 15, 2019 28 Virtual Memory Management in FreeBSD

struct vmspace

struct vm_map_entry ------struct vm_map_entry ------struct vm_map_entry struct vm_object ------...... ------struct vm_map_entry

BSDCan2019, Ottawa, Canada January 15, 2019 29 Virtual Memory Management in FreeBSD

struct vmspace Radix tree of struct vm_map_entry struct vm_page ------struct vm_map_entry ------struct vm_map_entry struct vm_object struct ------vm_page ...... ------struct vm_map_entry

BSDCan2019, Ottawa, Canada January 15, 2019 30 Copy on Write in FreeBSD

VM_MAP_ENTRY VM_OBJECT

BSDCan2019, Ottawa, Canada January 15, 2019 31 Copy on Write in FreeBSD

VM_MAP_ENTRY VM_OBJECT

entry->eflags |= (MAP_ENTRY_COW | MAP_ENTRY_NEEDS_COPY);

BSDCan2019, Ottawa, Canada January 15, 2019 32 Copy on Write in FreeBSD

VM_MAP_ENTRY VM_OBJECT

entry->eflags |= (MAP_ENTRY_COW | MAP_ENTRY_NEEDS_COPY);

VM_MAP_ENTRY SHADOW VM_OBJECT

BACKING VM_OBJECT

BSDCan2019, Ottawa, Canada January 15, 2019 33 Copy-on-Write Guest Memory

SHADOW VM_MAP_ENTRY VM_OBJECT Pages to be migrated

BACKING VM_OBJECT Already migrated pages

BSDCan2019, Ottawa, Canada January 15, 2019 34 Bhyve – Memory Layout

VMSPACE VMSPACE

PTE_x86 PTE_EPT

VMX(Guest)

Memory Layout Memory

MemoryLayout

Tool (Host) Tool bhyve

BSDCan2019, Ottawa, Canada January 15, 2019 35 Bhyve – Memory Layout

VMSPACE VMSPACE

PTE_x86 PTE_EPT VMX(Guest)

VM_MAP VM_MAP

Memory Layout Memory

MemoryLayout

Tool (Host) Tool bhyve

BSDCan2019, Ottawa, Canada January 15, 2019 36 Bhyve – Memory Layout

VMSPACE VMSPACE

PTE_x86 PTE_EPT VMX(Guest)

VM_MAP VM_MAP

Memory Layout Memory

– MemoryLayout

VM_MAP_ENTRY VM_MAP_ENTRY

Tool (Host) Tool bhyve

BSDCan2019, Ottawa, Canada January 15, 2019 37 Bhyve – Memory Layout

VMSPACE VMSPACE

PTE_x86 PTE_EPT VMX(Guest)

VM_MAP VM_MAP

Memory Layout Memory

– MemoryLayout

VM_MAP_ENTRY VM_MAP_ENTRY

Tool (Host) Tool bhyve VM_OBJECT

BSDCan2019, Ottawa, Canada January 15, 2019 38 Copy-on-Write Guest Memory Object

VMSPACE VMSPACE

PTE_x86 PTE_EPT VMX(Guest)

VM_MAP VM_MAP

Memory Layout Memory

– –

VM_MAP_ENTRY VM_MAP_ENTRY MemoryLayout

Tool (Host) Tool VM_OBJECT bhyve

BSDCan2019, Ottawa, Canada January 15, 2019 39 Copy-on-Write Guest Memory Object

VMSPACE VMSPACE

PTE_x86 PTE_EPT VMX(Guest)

VM_MAP VM_MAP

Memory Layout Memory

– –

VM_MAP_ENTRY VM_MAP_ENTRY MemoryLayout

SHADOW Tool (Host) Tool VM_OBJECT bhyve BACKING VM_OBJECT

BSDCan2019, Ottawa, Canada January 15, 2019 40 Copy-on-Write bhyve Memory Object

VMSPACE VMSPACE

PTE_x86 PTE_EPT VMX(Guest)

VM_MAP VM_MAP

Memory Layout Memory

– –

VM_MAP_ENTRY VM_MAP_ENTRY MemoryLayout

Tool (Host) Tool VM_OBJECT bhyve

BSDCan2019, Ottawa, Canada January 15, 2019 41 Copy-on-Write bhyve Memory Object

VMSPACE VMSPACE

PTE_x86 PTE_EPT VMX(Guest)

VM_MAP VM_MAP

Memory Layout Memory

– –

VM_MAP_ENTRY VM_MAP_ENTRY MemoryLayout

SHADOW Tool (Host) Tool VM_OBJECT bhyve BACKING VM_OBJECT

BSDCan2019, Ottawa, Canada January 15, 2019 42 Copy-on-Write Guest Memory

• Host and Guest won’t see the same memory • Communication between host and guest is lost (e.g., networking, block device access) • Virtual Machine will eventually crash

BSDCan2019, Ottawa, Canada January 15, 2019 43 New Approach

• We wanted to use Copy-on-Write to determine pages to be sent… but it doesn’t work :(

• Next, dirty bits approach

BSDCan2019, Ottawa, Canada January 15, 2019 44 Dirty bit approach

• use a dirty bit for each vm_page • clear all the dirty bits at the beginning of a round • in the next round, check all the dirtied pages and send them • clear all the dirty bits and repeat the procedure

BSDCan2019, Ottawa, Canada January 15, 2019 45 Dirty bit approach

• Each vm_page has a dirty flag field that is update from time to time based on the hardware Modified bit (AD bits)

• … but it cannot be used (vm_page’s dirty flag is used by other subsystems; laundry systems)

• So we’ll use our own dirty bit

BSDCan2019, Ottawa, Canada January 15, 2019 46 Algorithm

1. Connect source and destination 2. Check for compatibility

// First Migration Round 3. Clear vmm dirty bit for each page 4. page_list = all guest’s pages 5. Send page_list to destination

BSDCan2019, Ottawa, Canada January 15, 2019 47 Algorithm

6. for each remaining migration round – 1 7. page_list = [] 8. search_for_dirty_pages(page_list) 9. clear_vmm_dirty_bit_for_all_pages() 10. send_to_dest(page_list) 11. end for

BSDCan2019, Ottawa, Canada January 15, 2019 48 Algorithm

// Last Round 12. page_list = [] 13. freeze_vm() 14. search_for_dirty_pages(page_list) 15. send_to_dest(page_list) 16. send_to_dest(kern_structs) 17. send_to_dest(devs) 18. send_to_dest(CPU state)

BSDCan2019, Ottawa, Canada January 15, 2019 49 Implementation

• Use an unused bit from vm_page->flags • VPO_VMM_DIRTY

• Update VPO_VMM_DIRTY when vm_page->dirty is updated

• Clear VPO_VMM_DIRTY after a page is sent

BSDCan2019, Ottawa, Canada January 15, 2019 50 Implementation

• Iterate through all guest’s vm_pages and retain indexes for the dirty ones

• Copy vm_pages into a userspace buffer and send it to destination via sockets • … and from the userspace buffer to vm_spaces (recv part)

• Add --migrate-live option in bhyvectl

BSDCan2019, Ottawa, Canada January 15, 2019 51 Live Migration - Usage

• Run VM root@src # bhyve vm_src

• Wait for migration root@dst # bhyve -R src_IP,port vm_dst

• Start Migration root@src # bhyvectl –migrate-live=dst_IP,port vm_src

BSDCan2019, Ottawa, Canada January 15, 2019 52 Current Limitations

• Only for the lowmem segment (less than 3GB of RAM)

• Issues with memory corruption (can be related to bhyve’s threads for network and disk)

• Only with wired memory (otherwise pages can be swapped out)

BSDCan2019, Ottawa, Canada January 15, 2019 53 Current Status and Future Work

What we have implemented • Warm Migration and the framework for Live Migration

What we do now • Fix and Improve Live Migration Support in bhyve

BSDCan2019, Ottawa, Canada January 15, 2019 54 Special Thanks

• Mihai Carabaș, Darius Mihai, Sergiu Weisz • Marcelo Araujo

• John Baldwin, Mark Johnston, Alan Cox

• Matthew Grooms for financial support

BSDCan2019, Ottawa, Canada January 15, 2019 55 FreeBSD-UPB on Github

• Save-Restore Project: – https://github.com/FreeBSD- UPB/freebsd/tree/projects/bhyve_save_restore • Migration Project: – https://github.com/FreeBSD- UPB/freebsd/tree/projects/bhyve_migration • Live Migration Project: – https://github.com/FreeBSD- UPB/freebsd/tree/projects/bhyve_migration_dev

BSDCan2019, Ottawa, Canada January 15, 2019 56 References • Pictures and Logos: – http://bhyve.org/static/bhyve.png – https://www.freebsdfoundation.org/about/project/ • FreeBSD Virtual Memory Management: – Design elements of the FreeBSD VM system, [Online] https://www.freebsd.org/doc/en/articles/vm-design/ [Accessed Dec, 21st, 2018] – https://github.com/freebsd/freebsd • Bhyve Memory Layout: – Nested Paging in bhyve, N. Natu, P. Grehan – https://github.com/freebsd/freebsd

BSDCan2019, Ottawa, Canada January 15, 2019 57 References

• Migration Documentation: – Virtual Machine Migration [Online] https://nsrc.org/workshops/2014/sanog23- virtualization/raw-attachment/wiki/Agenda/migration- storage.pdf [Accessed Dec, 21st, 2018]. – VMware virtual machine migration types vSphere 6.0, VMWare [Online] https://communities.vmware.com/docs/DOC-31922 [Accessed Dec, 21st, 2018]

BSDCan2019, Ottawa, Canada January 15, 2019 58 References

• Migration Documentation: – Live migration of virtual machines, C. Clark, and K. Fraser and S. Hand, and J.G. Hansen, and E. Jul, and C. Limpach, and I. Pratt, and A. Warfield. – Post-copy live migration of virtual machines, M.R. Hines and U. Deshpande and K. Gopalan – kvm: the virtual machine monitor, A. Kivity and Y. Kamay and D. Laor and U. Lublin and A. Liguori

BSDCan2019, Ottawa, Canada January 15, 2019 59