Architecture of the Kernel-Based Virtual Machine (KVM), 2010

Jan Kiszka, Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
2010-09-23
Copyright © Siemens AG 2010. All rights reserved.

Agenda

  • Introduction
  • Basic KVM model
  • Memory
  • API
  • Optimizations
  • Paravirtual devices
  • Outlook

Virtualization of Commodity Computers

[Figure: parts of a commodity computer that need virtualizing: CPU (instruction set, on-chip resources), MMU, memory, interrupt controllers, clocks & timers, busses & I/O devices]

Virtualizing the x86 Instruction Set Architecture

x86 originally virtualization "unfriendly"
  • No hardware provisions
  • Instructions behave differently depending on privilege context
  • Performance suffered on trap-and-emulate
  • CISC nature complicates instruction replacements

Early approaches to x86 virtualization
  • Binary translation (e.g. VMware)
    • Execute substitution code for privileged guest code
    • May require substantial replacements to preserve illusion
  • CPU paravirtualization (e.g. Xen)
    • Guest is aware of instruction restrictions
    • Hypervisor provides replacement services (hypercalls)
    • Raised abstraction levels for better performance

Hardware-assisted x86 CPU Virtualization

Two variants
  • Intel's Virtualization Technology, VT-x
  • AMD-V (aka Secure Virtual Machine)

Identical core concept

[Figure: one CPU holding separate host state and guest (VCPU) state, each with its own privilege rings 0 to 3]

Advent and Evolution of KVM

Introduced to make VT-x/AMD-V available to user space
  • Exposes virtualization features securely
  • Interface: /dev/kvm

Merged quickly
  • Available since Linux 2.6.20 (merged in 2006)
  • From first LKML posting to merge: 3 months
  • One reason: originally 100% orthogonal to core kernel

Evolved significantly since then
  • Ported to further architectures (s390, PowerPC, IA64)
  • Always with latest x86 virtualization features
  • Became recognized & driving part of Linux

The KVM Model

Processes can create virtual machines

VMs can contain
  • Memory
  • Virtual CPUs
  • In-kernel device models

Guest physical memory is part of the creating process' address space

VCPUs run in process execution contexts
  • Process usually maps VCPUs on threads

[Figure: hypervisor process containing guest memory and VCPU threads, running on the Linux kernel with KVM, which runs on the physical CPUs]

Architectural Advantages of the KVM Model

Proximity of guest and user space hypervisor
  • Only one address space switch: guest ↔ host
  • Less rescheduling

Massive Linux kernel reuse
  • Scheduler
  • Memory management with swapping (though you don't want this)
  • I/O stacks
  • Power management
  • Host CPU hot-plugging
  • …

Massive Linux user land reuse
  • Network configuration
  • Handling VM images
  • Logging, tracing, debugging
  • …

VCPU Execution Flow (KVM View)

[Figure: VCPU execution loop across three layers]
  • User space: update context, raise IRQs, run; on return, handle I/O, invalid states, ... and handle signals
  • Kernel: update guest state; save host, load guest state; after the exit, save guest, load host state; handle in-kernel I/O, [vMMU], ...; handle host IRQ
  • CPU: VM entry, execute native guest code, VM exit (with reason)

KVM Memory Model

Slot-based guest memory
  • Maps guest physical to host virtual memory
  • Reconfigurable
  • Supports dirty tracking

In-Kernel Virtual MMU

Coalesced MMIO
  • Optimizes guest access to RAM-like virtual MMIO regions

Out of scope
  • Memory ballooning (guest ↔ user space hypervisor)
  • Kernel Same-page Merging (not KVM-specific)

[Figure: guest address space (RAM slots, coalesced MMIO, unassigned ranges) mapped into the hypervisor address space]

KVM API Overview

Step #1: open /dev/kvm

Three groups of IOCTLs
  • System-level requests
  • VM-level requests
  • VCPU-level requests

Per-group file descriptors
  • /dev/kvm fd for system level
  • Creating a VM or VCPU returns new fd

mmap on file descriptors
  • VCPU: fast kernel-user communication segment
    • Frequently read/modified part of VCPU state
    • Includes coalesced MMIO backlog
  • VM: map guest physical memory (deprecated)

Basic KVM IOCTLs

  • KVM_CREATE_VM
  • KVM_SET_USER_MEMORY_REGION
  • KVM_CREATE_IRQCHIP / ...PIT (x86)
  • KVM_CREATE_VCPU
  • KVM_SET_REGS / ...SREGS / ...FPU / ...
  • KVM_SET_CPUID / ...MSRS / ...VCPU_EVENTS / ... (x86)
  • KVM_SET_LAPIC (x86)
  • KVM_RUN

Optimizations of KVM

Hardware evolves quickly
  • Near-native performance in guest mode
  • Decreasing costs of mode switches
  • Additional features avoid software solutions, thus exits
    • Nested page tables
    • TLB tagging
    • APIC virtualization
    • ...

What will continue to consume cycles?
  • Code path between VM-exit and VM-entry
  • Mode switches, i.e. the need to exit at all

Lightweight vs. Heavy-weight VM-Exits

Exits cost time!
  • Basic state switch in hardware
  • Additional state switches in software
  • Analyze exit reason
  • Return to user space
    • Analyze exit reason
    • Obtain KVM state (VCPU, devices)
    • Handle exit cause
    • Write back states
    • Invoke KVM_RUN
  • Software-managed state switch
  • Hardware state switch
  • Cost annotations: >7,000 cycles, >10,000 cycles

Ways to avoid heavy-weight exits
  • In-kernel APIC
  • In-kernel IO-APIC + PIC
  • Coalescing MMIO
  • In-kernel instruction interpreter (detect MMIO access)
  • In-kernel network stub (vhost-net)

Optimizing Lightweight Exits

Let's get lazy!
  • Perform only partial state switches
  • Complete at latest possible point
  • Late restoring for guest and host state

Candidates (x86)
  • FPU
  • Debug registers
  • Model-specific registers (MSRs)

Requirements
  • Usage detection when in guest mode
    • Depends on hardware support
  • Demand detection while in host mode
    • Preemption notifiers
    • User-return notifier

Lazy MSR Switching

Why is this possible?
  • Some MSRs unused by Linux
  • Some MSRs only relevant when in user space
  • Some are identical for host & guest

Approach
  • Keep guest values of certain MSRs until...
    • sched-out fires
    • KVM_RUN IOCTL returns
  • Keep others until user-return fires (Intel only)

Optimizations are vendor-specific

Exemplary saving
  • 2000 cycles for guest → idle thread → guest

Paravirtual Devices

Advantages
  • Reduce VM exits or make them lightweight
  • Improve I/O throughput & latency (less emulation)
  • Compensate virtualization effects
  • Enable direct host-guest interaction

Available interfaces & implementations
  • virtio (PCI or alternative transports): user space business (primarily)
    • Network
    • Block
    • Serial I/O (console, host-guest channel, …)
    • Memory balloon
    • File system (9P)
  • Clock (x86 only): KVM business
    • Via shared page + MSRs
    • Enables safe™ TSC guest usage

An Almost-In-Kernel Device: vhost-net

Goal: high throughput / low latency guest networking
  • Avoid heavy exits
  • Reduce packet copying
  • No in-kernel QEMU, please!

The vhost-net model
  • Host user space opens and configures kernel helper
  • virtio as guest-host interface
  • KVM interface: eventfd
    • TX trigger → ioeventfd
    • RX signal → irqfd
  • Linux interface via tap or macvtap

Enables multi-gigabit throughput

[Figure: inside the hypervisor process, the VCPU and the virtio ring & buffers live in guest memory; KVM notifies the vhost-net worker kthread through ioeventfd and is notified through irqfd; vhost-net reads and writes guest memory via the memory slot table and connects to the Linux network stack]

What's next?

Generic Linux improvements
  • Transparent huge pages (mm topic)
  • NUMA optimizations (scheduler topic)

Improve spin-lock-holder preemption effects

Zero-copy & multi-queue vhost-net

Further optimize exits
  • Instruction interpretation (hardware may help)
  • Faster in-kernel device dispatching

Nested virtualization as standard feature
  • AMD-V bits already merged and working
  • VT-x more complex but likely solvable

Hardware-assisted virtualization on non-x86
  • PowerPC ISA 2.06
  • ARMv7-A "Eagle" extensions
  • …

Thank you for listening!

Questions?
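The slot-based mapping from the "KVM Memory Model" slide can be illustrated with a small user-space model. This is only a sketch in Python, not KVM code: the class names MemorySlot and SlotTable are made up for illustration, the field names are borrowed from struct kvm_userspace_memory_region, and dirty pages are kept in a set where the kernel uses a per-slot bitmap. Overlap checks between slots are omitted.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


@dataclass
class MemorySlot:
    """One guest-physical range backed by host virtual memory (hypothetical model)."""
    guest_phys_addr: int
    size: int
    userspace_addr: int
    dirty_pages: set = field(default_factory=set)

    def contains(self, gpa: int) -> bool:
        return self.guest_phys_addr <= gpa < self.guest_phys_addr + self.size


class SlotTable:
    """Toy counterpart of a VM's memory slot table."""

    def __init__(self):
        self.slots = {}

    def set_region(self, slot_id, guest_phys_addr, size, userspace_addr):
        # Mirrors the reconfigurable nature of slots: size 0 removes the slot.
        if size == 0:
            self.slots.pop(slot_id, None)
            return
        self.slots[slot_id] = MemorySlot(guest_phys_addr, size, userspace_addr)

    def translate(self, gpa, write=False):
        """Map a guest physical address to a host virtual address.

        Returns None for unassigned ranges, which a hypervisor would treat
        as MMIO or as an invalid access.
        """
        for slot in self.slots.values():
            if slot.contains(gpa):
                offset = gpa - slot.guest_phys_addr
                if write:
                    # Dirty tracking: remember which page of the slot was written.
                    slot.dirty_pages.add(offset // PAGE_SIZE)
                return slot.userspace_addr + offset
        return None
```

For example, after set_region(0, 0x0, 0x10000, 0x7f0000000000), translate(0x1234) yields 0x7f0000001234, while an address outside every slot yields None.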
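The file descriptor hierarchy from the "KVM API Overview" and "Basic KVM IOCTLs" slides can be tried out from Python. The following is a minimal sketch, not a hypervisor: it only opens /dev/kvm, queries the API version, and creates one VM fd and one VCPU fd. The function name probe_kvm is hypothetical, the ioctl numbers are encoded as on x86 (where _IO(type, nr) is (type << 8) | nr), and the function returns None when KVM is missing or not permitted.

```python
import fcntl
import os

KVMIO = 0xAE  # ioctl "type" byte used by KVM in <linux/kvm.h>


def _io(nr: int) -> int:
    """Encode an argument-less ioctl like the C macro _IO(KVMIO, nr) on x86."""
    return (KVMIO << 8) | nr


KVM_GET_API_VERSION = _io(0x00)     # system-level request
KVM_CREATE_VM = _io(0x01)           # system-level request, returns a VM fd
KVM_GET_VCPU_MMAP_SIZE = _io(0x04)  # system-level request
KVM_CREATE_VCPU = _io(0x41)         # VM-level request, returns a VCPU fd


def probe_kvm(path="/dev/kvm"):
    """Walk system fd -> VM fd -> VCPU fd; return a summary dict or None."""
    try:
        sys_fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None  # no KVM on this host, or no permission
    fds = [sys_fd]
    try:
        info = {
            "api_version": fcntl.ioctl(sys_fd, KVM_GET_API_VERSION, 0),
            # Size of the mmap-able kernel-user communication segment per VCPU.
            "vcpu_mmap_size": fcntl.ioctl(sys_fd, KVM_GET_VCPU_MMAP_SIZE, 0),
        }
        vm_fd = fcntl.ioctl(sys_fd, KVM_CREATE_VM, 0)     # machine type 0
        fds.append(vm_fd)
        vcpu_fd = fcntl.ioctl(vm_fd, KVM_CREATE_VCPU, 0)  # VCPU id 0
        fds.append(vcpu_fd)
        info["distinct_fds"] = len(set(fds)) == 3
        return info
    except OSError:
        return None
    finally:
        for fd in reversed(fds):
            os.close(fd)
```

A real hypervisor would continue from here with KVM_SET_USER_MEMORY_REGION on the VM fd, an mmap of the VCPU fd, register setup, and a KVM_RUN loop; those steps need C structures and are left out of this sketch.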
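The idea behind the "Lazy MSR Switching" slide can be shown with a toy cost model that counts simulated MSR writes. This is an illustrative sketch only: the class MsrSwitcher, the helper count_writes, and the register names MSR_A to MSR_C are hypothetical, and the model ignores sched-out handling and all vendor-specific details. It compares eager switching (restore host values on every exit) with lazy switching (restore only when returning to user space).

```python
class MsrSwitcher:
    """Counts simulated MSR writes for eager vs. lazy switching (toy model)."""

    def __init__(self, host, guest, lazy):
        self.host = dict(host)
        self.guest = dict(guest)
        self.lazy = lazy
        self.hw = dict(host)  # values currently loaded in the simulated CPU
        self.writes = 0

    def _load(self, values):
        for msr, value in values.items():
            # MSRs that are identical for host and guest never need a write.
            if self.hw[msr] != value:
                self.hw[msr] = value
                self.writes += 1

    def vm_entry(self):
        self._load(self.guest)

    def vm_exit(self):
        # Lazy mode keeps the guest values in place across lightweight exits.
        if not self.lazy:
            self._load(self.host)

    def user_return(self):
        # Stands in for the user-return notifier: host values are needed now.
        self._load(self.host)


def count_writes(lazy, exits):
    """Run `exits` entry/exit cycles, then return to user space once."""
    host = {"MSR_A": 1, "MSR_B": 2, "MSR_C": 3}
    guest = {"MSR_A": 10, "MSR_B": 20, "MSR_C": 3}  # MSR_C is shared
    switcher = MsrSwitcher(host, guest, lazy)
    for _ in range(exits):
        switcher.vm_entry()
        switcher.vm_exit()
    switcher.user_return()
    return switcher.writes
```

With three lightweight exits, the eager variant performs 12 simulated writes and the lazy variant 4, since the guest values stay loaded until user space actually needs the host values.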
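The vhost-net slide names eventfd as the KVM interface for TX triggers and RX signals. The sketch below shows only the plain eventfd counter semantics that such notifications build on; it does not register anything with KVM or vhost-net. The function name eventfd_kick_demo is hypothetical, and os.eventfd is assumed to be available (Linux, Python 3.10 or later), otherwise the function returns None.

```python
import os


def eventfd_kick_demo(kicks=3):
    """Signal an eventfd several times, then consume all signals in one read."""
    if not hasattr(os, "eventfd"):
        return None  # platform or Python version without eventfd support
    efd = os.eventfd(0, os.EFD_NONBLOCK)
    try:
        for _ in range(kicks):
            # Each "kick" adds 1 to the kernel-side counter.
            os.eventfd_write(efd, 1)
        # One read returns the accumulated count and resets the counter,
        # so a consumer can handle several notifications in one wakeup.
        return os.eventfd_read(efd)
    finally:
        os.close(efd)
```

Several signals collapsing into a single wakeup is one reason an eventfd suits a worker thread that drains a ring of buffers rather than handling one notification per packet.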
