System Virtual Machines
• Introduction • Key concepts • Resource virtualization – processors – memory – I/O devices • Performance issues • Applications
EECS 768 Virtual Machines 1 Introduction
• System virtual machine – capable of supporting multiple system images – each guest OS manages a set of virtualized hardware resources – VMM owns the real system resources – VMM allocates resources to guest OS by partitioning or time-sharing – guest OS allocates resources to its user programs – each virtual resource may not have physical resource – see Figure 8.1
EECS 768 Virtual Machines 2 Introduction (2)
L i n u x W i n d o w s O S / 2 Application Application Application
L i n u x O S W i n d o w s O S O S / 2 O S
V irtual Intel x86 V irtual Intel x86 V irtual Intel x86
I n t e l x 8 6 H a r d w a r e
EECS 768 Virtual Machines 3 Key Concepts
• Outward appearance • State management • Resource control • Native and hosted virtual machines
• Several concepts are simplified – same guest and host ISA – uniprocessor systems
EECS 768 Virtual Machines 4 Outward Appearance
• Create an illusion of multiple machines. • Illusion in software – no replication of hardware resources – switch to move devices from one VM to the other • Replication of hardware resources – devices used directly by each user are replicated – rest of the hardware is shared • Hosted VM – one OS more important than the other – UI of host OS provides way to display UI of the second EECS 768 Virtual Machines 5 State Management
• Refers to how each individual guest state is managed by the SVM – each guest OS has an architected state – host resources may not be adequate • Fixed location for all guest state in host – VMM manages pointer and switching – indirection can be very inefficient (Figure 8.3a) • Copy approach – more efficient (Figure 8.3b) – essential to allow direct execution of native code EECS 768 Virtual Machines 6 State Management (2)
V M M M e m o r y V M M M e m o r y
R e g i s t e r P r o c e s s o r R egister values P r o c e s s o r v a l u e s f o r V M 1 f o r V M 1
R e g i s t e r R egister values v a l u e s f o r V M 2 f o r V M 2 P r o c e s s o r R egister B lock R e g i s t e r s P o i n t e r
R e g i s t e r R egister values v a l u e s f o r V M 3 f o r V M 3
EECS 768 Virtual Machines 7 Resource Control
• Refers to how the VMM maintains overall control of all hardware resources. • Scheme similar to time-sharing OS – VMM intercept accesses to privileged resources – get control on all privileged instructions – VMM handles the timer interrupt (Figure 8.4) – similar tradeoffs between fair scheduling and large switching overhead
EECS 768 Virtual Machines 8 VMM Timesharing
• VMM timeshares resources among guests – similar to OS timesharing
VMM VM M restores determ ines next architected state V M t o b e for next VM a c t i v a t e d VM M sets tim er V M M s a v e s interval and VM M sets PC to tim er Tim er interrupt architected state e n a b l e s interrupt handler of O S o c c u r s of running VM i n t e r r u p t s i n n e x t V M
First VM A ctive V M M A c t i v e Next VM Active
EECS 768 Virtual Machines 9 Native and Hosted VM
• Refers to the runtime privilege level of the VMM (and OS). – for efficiency, some part of VMM should have higher privileges • Figure 8.5 – analogous relationship to OS and user programs – native VM system: only the VMM operates in privilege mode – hosted VM: VMM installed on top of existing OS • some or no part of VMM in privilege mode EECS 768 Virtual Machines 10 Native and Hosted VM (2)
V i r t u a l V i r t u a l M a c h i n e M a c h i n e V i r t u a l N o n - p r i v i l e g e d A pplications M a c h i n e m o d e s VMM VMM
OS VMM H o s t O S H o s t O S P r i v i l e g e d M o d e
H a r d w a r e H a r d w a r e H a r d w a r e H a r d w a r e
Traditional N a t i v e U s e r - m o d e D u a l - m o d e uniprocessor V M s y s t e m H o s t e d H o s t e d s y s t e m V M s y s t e m V M s y s t e m
EECS 768 Virtual Machines 11 Resource Virtualization – Processors
• Techniques to execute guest instructions – emulation – direct native execution • Emulation – interpretation or binary translation – necessary if different host and guest ISAs • Direct native execution – same host and guest ISAs required – may still require emulation of some instructions
EECS 768 Virtual Machines 12 Conditions for ISA Virtualizability
• Conditions laid out by Popek and Goldberg. • Assume native system VMs – VMM runs in system mode – all other software runs in user mode • Other assumptions – hardware consists of processor and uniformly addressable memory – processor can operate in two modes, user and system – some subset of instructions only available in system mode – relocation register available for memory addressing
EECS 768 Virtual Machines 13 Privileged Instructions
• Instructions that trap if the machine is in user mode, and does not trap if the machine is in system mode. • LPSW (Load Processor Status Word) – PSW can change the state of the CPU – can also change the instruction address (PC) • SPT (Set CPU Timer) – can change the CPU interval timer
EECS 768 Virtual Machines 14 Privileged Instructions (cont…)
• LRW (Load Real Address) – translate a virtual address, save corresponding real address in a specified GPR • POPF (Pop Stack into Flags Register) – replace flag register with top of memory stack – interrupt-enable flag can only be modified in system mode – no trap in user mode
EECS 768 Virtual Machines 15 Sensitive Instructions
• These specify instructions that interact with hardware. • Control-sensitive instructions – can change the configuration of system resources – e.g., physical memory assigned to a program – e.g., LPSW – change mode of operation – e.g., SPT – change CPU timer value
EECS 768 Virtual Machines 16 Sensitive Instructions (cont…)
• Behavior-sensitive instructions – behavior or results depend on configuration of resources – e.g., LRA – depends on mapping (state) of real memory resource – e.g., POPF – depends on mode of operation • Innocuous instructions – neither context-sensitive nor behavior- sensitive • see Figure 8.6
EECS 768 Virtual Machines 17 Instruction Types – Summary
N o n - P r i v i l e g e d I n n o c u o u s P r i v i l e g e d
B e h a v i o r - S e n s i t i v e C o n t r o l - S e n s i t i v e s e n s i t i v e s e n s i t i v e
EECS 768 Virtual Machines 18 Components of a VMM
Instruction trap occurs
These instructions D i s p a t c h e r desire to change m achine resources, P r i v i l e g e d Instruction e.g. Load R elocation Bounds Register P r i v i l e g e d P r i v i l e g e d Interpreter Instruction Instruction R o u t i n e 1
A l l o c a t o r P r i v i l e g e d Instruction Interpreter R o u t i n e 2
These instructions do not change m achine resources, but access privileged resources, e.g. IN , O U T, Interpreter W r i t e T L B R o u t i n e n
EECS 768 Virtual Machines 19 Components of a VMM (2)
• Dispatcher – control module, called on hardware traps – decides the next module to be invoked • Allocator – decides allocation of system resources – invoked by dispatcher to change machine resource allocation for VMs • Interpreter routines – emulates trap functionality for each user VM – all traps except resource re-assignment traps
EECS 768 Virtual Machines 20 Properties for a True VMM
• Efficiency – All innocuous instructions must be natively executed • Resource control – impossible for guest s/w to directly change system resource configurations • Equivalence – guaranty of identical behavior – allowed exceptions • performance can be slower • available resources can be limited • differences in performance due to changed
timings EECS 768 Virtual Machines 21 Requirement for True VMM
• For any third-generation computer, a VMM may be constructed if the set of sensitive instructions for a computer is a subset of the set of privileged instructions – sensitive instructions always trap in user mode – non-privileged instructions execute natively
EECS 768 Virtual Machines 22 Handling Privileged Instruction
• Privileged instruction traps in the guest OS
G uest O S code in VM V M M c o d e (user m ode) (privileged m ode)
D i s p a t c h e r
Privileged instruction (LPSW ) … ... … LPSW Routine: ... Change m ode to privileged N ext instruction (target of LPSW ) C heck privilege level in VM Em ulate instruction Com pute target Restore m ode to user Jum p to target
EECS 768 Virtual Machines 23 Problem Instructions
• Sensitive instructions that are not privileged – POPF instruction in the Intel IA-32 – IA-32 not virtualizable by Popek-Goldberg rules • Handling problem instructions – interpret all guest software – scan and patch problem instructions before execution – see Figure 8.11
EECS 768 Virtual Machines 24 Patching Problem Instructions
Code Patch for d i s c o v e r e d critical instruction C ontrol transfer, e . g . t r a p Scanner and P a t c h e r VMM
O riginal Program Patched Program
EECS 768 Virtual Machines 25 Patching Problem Instructions (2)
• VMM takes control at the head of each basic block. • Scan to find problem instructions. • Replace with trap to VMM. • Place another trap at end of basic block. • Patched basic blocks can be chained. • Mostly similar to binary translation.
EECS 768 Virtual Machines 26 Caching Emulation Code
• High frequency of sensitive instructions requiring interpretation increases VMM overhead – cache interpreter actions in code cache – cache block containing the problem instruction • Each instance of the problem instruction in the code associated with a distinct piece of cache code – make code instance-specific and optimize • Simpler management of cached code – cached code executed in system mode
EECS 768 Virtual Machines 27 Resource Virtualization – Memory
• Generalizes the concept of virtual memory • Virtual memory basics – application provided a logical view of memory – OS manages actual real memory – page tables provide logical-physical mapping – TLBs cache most common mappings • VMM distinguishes between real memory and physical memory – VMM does real-physical memory mapping
EECS 768 Virtual Machines 28 VMM Memory Support
• VMM mechanism to virtualize memory – maintain a per-VM real-map table – maps real pages to physical pages – VMM maintains a distinct swap space • Additional indirection layer may be inefficient – virtualize architected page tables – virtualize architected TLB
EECS 768 Virtual Machines 29 Resource Virtualization – I/O
• Virtualization of I/O is difficult – large number of I/O devices – each I/O device controlled in specific manner – OS have to deal with similar issues • Virtualization Scheme – provide VMs with virtual version of a device – intercept VM requests made to virtual device – convert request, perform equivalent action
EECS 768 Virtual Machines 30 Virtualizing Devices
• Dedicated devices – display, keyboard, mouse etc., for each user – no device virtualization necessary – device requests could bypass the VM • Partitioned devices – smaller virtual disks for each user – VMM use map to translate parameters – status information from the device also needs translation and reflection in the
map EECS 768 Virtual Machines 31 Virtualizing Devices (cont…)
• Shared devices – network adapter, virtual network address for a VM – maintain state information for each guest VM – translate device outgoing/incoming requests • Spooled devices – shared at higher granularity, e.g., printer – two-level spool tables in OS and VMM – VMM schedules requests from multiple guest VMs
EECS 768 Virtual Machines 32 Performance Degradation
• Setup – initialize the state of the machine • Emulation – sensitive instructions, maybe others • Interrupt handling – handling interrupts generated by the guest programs • State saving – save/restore state of VM during switch to VMM • Book-keeping – special operations to maintain equivalent behavior • Time elongation – some instructions may require more processing time
EECS 768 Virtual Machines 33 VM Assists
• Piece of hardware that improves performance of an application when running on a VM. • May not always improve performance – improve user program-VMM system-mode switches – improve address translation performance • VM assists can help improve performance of – instruction emulation – other aspects of VMM – the user-mode system running as the guest – specific types of guest systems
EECS 768 Virtual Machines 34 Instruction Emulation Assists
• Emulation of privileged instructions in the guest VM is a cause of fundamental overhead – causes interrupt that is handled by the VMM – VMM emulation depends on whether guest VM in user or system mode when instruction called • Assist for LPSW on system/370 – check state of guest VM – provides hardware-assisted instructions to modify physical resources • Depends on VM system implementation • The overhead of the trap still remains – eliminates overhead of emulation and mode switching
EECS 768 Virtual Machines 35 VMM Assists
• Context switch – hardware to save/restore registers, machine state • Decoding of privileged instructions – helps certain critical parts of the emulation • Virtual interval timer – precise timer for all guest VMs to schedule jobs • Adding to the instruction set – to help commonly executed VMM parts
EECS 768 Virtual Machines 36 Improve Guest VM Performance
• Requires guest system to be VMM aware – avoid duplication of functionality – provide VMM with additional information (handshaking), e.g., DIAGNOSE instruction • Example; non-paged mode – disable dynamic address translation in guest VM – no translation from virtual to real address – translation from real to physical done by VMM • Paravirtualization – modify guest OS to work around difficult to virtualize ISA features – Xen for the IA-32, eliminates code detection, patching, shadow page tables
EECS 768 Virtual Machines 37 Applications
• Implementing multiprogramming • Multiple single-application virtual machines • Multiple secure environments • Mixed-OS environments • Legacy applications • Multiplatform application development • Gradual migration to new system • New system software development • Operating system training • Help desk support • Operating system instrumentation • Event monitoring • System encapsulation and checkpointing
EECS 768 Virtual Machines 38