Distributed Systems: Lecture 4, Virtualization

Box Leangsuksun, SWECO Endowed Professor, Computer Science, Louisiana Tech University

CTO, PB Tech International Inc.

Introduction to Virtualization

• System virtualization studied since the 70's (Goldberg, Popek)
• Fundamentals
  – Run multiple virtual machines (OSes) simultaneously
  – Isolation between virtual machines
  – Controlling resource sharing between VMs
  – Increase resource utilization
  – One of the hottest technologies since 2006

Virtualization: Key concepts

• Virtual machine (VM), guest OS: complete operating system running in a virtual environment

• Host OS: operating system running on top of the hardware; the interface between the user and the VMM and VMs

• Virtual Machine Monitor (VMM), hypervisor: manages VMs (scheduling, hardware access)

Virtualization: Usage

• Server consolidation (cloud)
• Testing
• Security, isolation (cloud)
• Lower cost of ownership of servers (cloud)
• Increased manageability (cloud)
• Enhanced server reliability

Major Fields of Virtualization

• Storage Virtualization

• Server Virtualization

Architecture & Interfaces

• Architecture: formal specification of a system’s interface and the logical behavior of its visible resources.

[Figure: interface layers, top to bottom: applications, libraries (API), system calls (ABI), operating system, ISA (system ISA and user ISA), hardware.]

• API – application programming interface
• ABI – application binary interface
• ISA – instruction set architecture

Credit: CS5204 – Operating Systems from vtech u

Sample of API vs ABI

4/22/14 Towards survivable architecture

VMM Types

• System
  – Provides ABI interface
  – Efficient execution
  – Can add OS-independent services (e.g., migration, intrusion detection)

• Process
  – Provides API interface
  – Easier installation
  – Leverages OS services (e.g., device drivers)
  – Execution overhead (possibly mitigated by just-in-time compilation)

System-level Design Approaches

• Full virtualization (direct execution)
  – Exact hardware exposed to OS
  – Efficient execution
  – OS runs unchanged
  – Requires a “virtualizable” architecture
  – Example: VMWare

• Paravirtualization
  – OS modified to execute under VMM
  – Requires porting OS code
  – Execution overhead
  – Necessary for some (popular) architectures (e.g., x86)
  – Examples: Xen, Denali

Design Space (level vs. ISA)

[Figure: design space of virtualization approaches, with regions labeled "API interface" and "ABI interface".]

• Variety of techniques and approaches available • Critical technology space highlighted

System VMMs

• Structure
  – Type 1: runs directly on host hardware
  – Type 2: runs on HostOS
• Primary goals
  – Type 1: High performance
  – Type 2: Ease of construction/installation/acceptability
• Examples
  – Type 1: VMWare ESX Server, Xen, OS/370
  – Type 2: User-mode Linux

Hosted VMMs

• Structure
  – Hybrid between Type 1 and Type 2
  – Core VMM executes directly on hardware
  – I/O services provided by code running on HostOS

• Goals
  – Improve performance overall
  – Leverage I/O device support on the HostOS

• Disadvantages
  – Incurs overhead on I/O operations
  – Lacks performance isolation and performance guarantees

• Example: VMWare (Workstation)

Whole-system VMMs

• Challenge: GuestOS ISA differs from HostOS ISA
• Requires full emulation of GuestOS and its applications
• Example: VirtualPC

Strategies

• De-privileging
  – VMM emulates the effect on system/hardware resources of privileged instructions whose execution traps into the VMM
  – aka trap-and-emulate
  – Typically achieved by running GuestOS at a lower hardware priority level than the VMM
  – Problematic on some architectures where privileged instructions do not trap when executed at deprivileged priority

[Figure: the GuestOS executes a privileged instruction, which traps into the VMM; the VMM emulates the change to the resource.]

• Primary/shadow structures
  – VMM maintains “shadow” copies of critical structures whose “primary” versions are manipulated by the GuestOS
  – e.g., page tables
  – Primary copies needed to ensure the correct environment is visible to the GuestOS

• Memory traces
  – Controlling access to memory so that the shadow and primary structures remain coherent
  – Common strategy: write-protect primary copies so that update operations cause page faults which can be caught, interpreted, and emulated.
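The primary/shadow and memory-trace strategies can be sketched in a few lines of C. This is a toy model, not how any particular VMM is implemented: the `struct vmm` layout, the fixed `host_base` offset used to map guest frames to host frames, and the function names are all assumptions made for illustration. A guest write to a traced (write-protected) primary table is routed through the VMM's fault handler, which updates both copies; an untraced write leaves the shadow stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPAGES 4

/* Toy VMM state (hypothetical layout, for illustration only). */
struct vmm {
    uint32_t primary[NPAGES];  /* guest-visible table: page -> guest frame   */
    uint32_t shadow[NPAGES];   /* table used by hardware: page -> host frame */
    uint32_t host_base;        /* assumed fixed offset of guest frames       */
    bool     write_protected;  /* memory trace installed on the primary copy */
    unsigned faults;           /* number of write faults the VMM handled     */
};

/* VMM fault handler: interpret the trapped write, apply it to the
 * primary copy, and bring the shadow copy back in line. */
static void vmm_on_write_fault(struct vmm *m, unsigned idx, uint32_t guest_frame)
{
    m->faults++;
    m->primary[idx] = guest_frame;
    m->shadow[idx]  = m->host_base + guest_frame;
}

/* The guest tries to update one page-table entry.
 * Returns false if idx is out of range. */
bool guest_write_pte(struct vmm *m, unsigned idx, uint32_t guest_frame)
{
    assert(m);
    if (idx >= NPAGES)
        return false;
    if (m->write_protected)
        vmm_on_write_fault(m, idx, guest_frame);  /* write faults into the VMM */
    else
        m->primary[idx] = guest_frame;            /* untraced: shadow goes stale */
    return true;
}
```

With the trace installed, every guest update costs one fault but the two tables stay coherent; without it, the guest's view and the hardware's view diverge silently.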

Different Virtualization Concepts

• Full-virtualization: full virtual machine, from the boot sequence to the virtualized hardware
• Para-virtualization: the guest OS has to be modified for performance optimization
• Emulation: the guest OS architecture is different from the architecture of the host OS (translation on the fly). Ex: PPC VM on top of an x86 host OS.

Classification

• Two kinds of system virtualization
  – Type-I: the virtual machine monitor and the virtual machine run directly on top of the hardware
  – Type-II: the virtual machine monitor and the virtual machine run on top of the host OS

[Figure: Type I virtualization (bare-metal), where the VMM runs directly on the hardware, next to Type II virtualization (hosted), where the VMM runs on a host OS.]

• Type I (bare-metal): VMware ESX, Microsoft Hyper-V, Xen
• Type II (hosted): VMware Workstation, Virtual PC, Sun VirtualBox, QEMU, KVM

Bare-metal or hosted?

• Bare-metal
  – Has complete control over hardware
  – Doesn’t have to “fight” an OS
• Hosted
  – Avoid code duplication: need not code a process scheduler or memory management system; the OS already does that
  – Can run native processes alongside VMs
  – Familiar environment: how much CPU and memory does a VM take? Use top! How big is the virtual disk? ls -l
  – Easy management: stop a VM? Sure, just kill it!
• A combination
  – Mostly hosted, but some parts are inside the OS kernel for performance reasons
  – E.g., KVM

Available Solutions

• Example of Virtualization Projects
  – Type I: Xen, L4, VMware ESX, Microsoft Hyper-V
  – Type II: VMware Workstation, Microsoft Virtual PC, Sun VirtualBox, QEMU, KVM
• Different Benefits
  – Type I: performance
    • direct access to the hardware
    • simple to implement
    • para-virtualization possible
  – Type II: development
    • no limitation of para-virtualization
    • emulation possible

How to run a VM? Emulate!

• Do whatever the CPU does but in software
• Fetch the next instruction
• Decode: is it an ADD, an XOR, a MOV?
• Execute, using the emulated registers and memory

Example: addl %ebx, %eax is emulated as:

    enum {EAX=0, EBX=1, ECX=2, EDX=3, …};
    unsigned long regs[8];
    regs[EAX] += regs[EBX];
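The fetch/decode/execute loop can be sketched as a tiny interpreter. The register indices follow the example on the slide; the three-field `struct insn` and the `OP_*` opcodes are hypothetical stand-ins, since real x86 decoding is far more involved than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register indices as on the slide (only four are named here). */
enum { EAX = 0, EBX = 1, ECX = 2, EDX = 3, NUM_REGS = 8 };

/* Hypothetical toy opcodes, not real x86 encodings. */
enum { OP_HALT = 0, OP_ADD = 1, OP_XOR = 2, OP_MOV = 3 };

/* One toy instruction: dst = dst <op> src (MOV copies src into dst). */
struct insn { uint8_t op, dst, src; };

/* Emulated CPU state: registers and a program counter. */
struct cpu { uint32_t regs[NUM_REGS]; size_t pc; };

/* Runs until OP_HALT, an unknown opcode, or the end of the program.
 * Returns the number of instructions executed (HALT is not counted). */
size_t emulate(struct cpu *cpu, const struct insn *prog, size_t len)
{
    size_t executed = 0;
    while (cpu->pc < len) {
        struct insn in = prog[cpu->pc++];                 /* fetch   */
        assert(in.dst < NUM_REGS && in.src < NUM_REGS);
        switch (in.op) {                                  /* decode  */
        case OP_ADD: cpu->regs[in.dst] += cpu->regs[in.src]; break; /* execute */
        case OP_XOR: cpu->regs[in.dst] ^= cpu->regs[in.src]; break;
        case OP_MOV: cpu->regs[in.dst]  = cpu->regs[in.src]; break;
        default:     return executed;   /* OP_HALT or unknown opcode */
        }
        executed++;
    }
    return executed;
}
```

Every guest instruction costs a load, a switch, and several host instructions, which is where the slowdown of pure emulation comes from.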

How to run a VM? Emulate!

• Pro: – Simple!

• Con: – Slooooooooow

• Example hypervisor:

How to run a VM? Trap and emulate!

• Run the VM directly on the CPU: no emulation!
• Most of the code can execute just fine
  – E.g., addl %ebx, %eax
• Some code needs hypervisor intervention
  – int $0x80
  – movl something, %cr3
  – I/O
• Trap and emulate it!
  – E.g., if guest runs int $0x80, trap it and execute guest’s interrupt 0x80 handler

How to run a VM? Trap and emulate!

• Pro: – Performance!

• Cons:
  – Harder to implement
  – Need hardware support
    • Not all “sensitive” instructions cause a trap when executed in user mode
    • E.g., POPF, which may be used to clear IF
    • This instruction does not trap, but the value of IF does not change!
  – This hardware support is called VMX (Intel) or SVM (AMD)
  – Exists in modern CPUs
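The difference between instructions that trap and the POPF case can be illustrated with a small simulation. Everything here is a hypothetical model: the `guest_op` names, the `struct vcpu` fields, and the two functions stand in for hardware and VMM behavior and are not taken from any real hypervisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guest operations (illustrative names, not real encodings). */
enum guest_op { G_ADD, G_CLI, G_POPF_CLEAR_IF, G_WRITE_CR3 };

/* Virtual CPU state that the VMM keeps on behalf of the guest. */
struct vcpu {
    uint32_t eax, ebx;
    uint32_t cr3;    /* virtual page-table base                 */
    bool     vif;    /* the guest's virtual interrupt flag      */
    unsigned traps;  /* how many times the VMM had to step in   */
};

/* VMM side: emulate the effect of a trapped privileged instruction. */
static void vmm_emulate(struct vcpu *v, enum guest_op op, uint32_t arg)
{
    v->traps++;
    if (op == G_CLI)
        v->vif = false;
    else if (op == G_WRITE_CR3)
        v->cr3 = arg;
}

/* "Hardware" side: run one guest operation at deprivileged level.
 * Returns true if the operation trapped into the VMM. */
bool run_deprivileged(struct vcpu *v, enum guest_op op, uint32_t arg)
{
    assert(v);
    switch (op) {
    case G_ADD:              /* unprivileged: runs directly on the CPU */
        v->eax += v->ebx;
        return false;
    case G_CLI:
    case G_WRITE_CR3:        /* privileged: traps, and the VMM emulates it */
        vmm_emulate(v, op, arg);
        return true;
    case G_POPF_CLEAR_IF:    /* sensitive but no trap: the IF update is
                                silently dropped and the VMM never sees it */
        return false;
    }
    return false;
}
```

After `G_POPF_CLEAR_IF` the guest believes interrupts are off while `vif` is still true, and the VMM had no chance to correct this. That gap is what VMX and SVM close in hardware.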

• Example hypervisor: KVM

How to run a VM? Dynamic (binary) translation!

• Take a block of binary VM code that is about to be executed
• Translate it on the fly to “safe” code (like JIT, just-in-time compilation)
• Execute the new “safe” code directly on the CPU

• Translation rules?
  – Most code translates identically (e.g., movl %eax, %ebx translates to itself)
  – “Sensitive” operations are translated into hypercalls
    • Hypercall: call into the hypervisor to ask for service
    • Implemented as trapping instructions (unlike POPF)
    • Similar to syscall: call into the OS to request service

How to run a VM? Dynamic (binary) translation!

• Pros:
  – No hardware support required
  – Performance: better than emulation

• Cons:
  – Performance: worse than trap and emulate
  – Hard to implement: hypervisor needs an on-the-fly x86-to-x86 binary compiler
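The translation rules can be sketched on a toy instruction stream. This is a deliberate simplification: the `enum op` values and `struct insn` are hypothetical, and a real translator works on variable-length x86 machine code and emits blocks whose length may differ from the input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction kinds standing in for x86 opcodes. */
enum op {
    OP_MOV, OP_ADD,     /* innocuous: translate to themselves     */
    OP_CLI, OP_POPF,    /* sensitive: must not run unmodified     */
    OP_HYPERCALL        /* trapping call into the hypervisor      */
};

struct insn { enum op op; int arg; };

static int is_sensitive(enum op op)
{
    return op == OP_CLI || op == OP_POPF;
}

/* Translate one block of n instructions from in[] to out[].
 * Returns the number of sensitive instructions that were rewritten. */
size_t translate_block(const struct insn *in, struct insn *out, size_t n)
{
    size_t rewritten = 0;
    assert(in && out);
    for (size_t i = 0; i < n; i++) {
        if (is_sensitive(in[i].op)) {
            out[i].op  = OP_HYPERCALL;
            out[i].arg = (int)in[i].op;  /* tells the hypervisor what to emulate */
            rewritten++;
        } else {
            out[i] = in[i];              /* identical translation */
        }
    }
    return rewritten;
}
```

Because POPF is rewritten before it ever reaches the CPU, the translator sidesteps the non-trapping problem without needing hardware support.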

• Example: VMware, QEMU

How to run a VM? Paravirtualization!

• Does not run unmodified guest OSes
• Requires guest OS to “know” it is running on top of a hypervisor

• E.g., instead of doing cli to turn off interrupts, guest OS should do hypercall(DISABLE_INTERRUPTS)
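A paravirtualized guest kernel might wrap that call as follows. Only `hypercall(DISABLE_INTERRUPTS)` comes from the slide; `ENABLE_INTERRUPTS`, the return convention, the `guest_state` structure, and the helper names are assumptions, and the hypervisor is simulated by an ordinary function rather than a trapping instruction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypercall numbers. DISABLE_INTERRUPTS is from the slide;
 * ENABLE_INTERRUPTS is an assumed counterpart. */
enum hypercall_nr { DISABLE_INTERRUPTS, ENABLE_INTERRUPTS };

/* Per-guest state kept by the (simulated) hypervisor. */
struct guest_state { bool interrupts_enabled; unsigned hypercalls; };

static struct guest_state current_guest = { true, 0 };

/* Stand-in for the trapping instruction that enters the hypervisor.
 * Returns 0 on success and -1 for an unknown request (assumed convention). */
int hypercall(enum hypercall_nr nr)
{
    current_guest.hypercalls++;
    switch (nr) {
    case DISABLE_INTERRUPTS: current_guest.interrupts_enabled = false; return 0;
    case ENABLE_INTERRUPTS:  current_guest.interrupts_enabled = true;  return 0;
    }
    return -1;
}

/* Guest kernel helpers that replace the cli and sti instructions. */
void guest_irq_disable(void)
{
    int rc = hypercall(DISABLE_INTERRUPTS);
    assert(rc == 0);
    (void)rc;
}

void guest_irq_enable(void)
{
    int rc = hypercall(ENABLE_INTERRUPTS);
    assert(rc == 0);
    (void)rc;
}

/* Lets callers observe the guest's virtual interrupt state. */
bool guest_irqs_enabled(void)
{
    return current_guest.interrupts_enabled;
}
```

The guest source has to be changed at every place it used `cli`, which is why the same kernel binary no longer runs on bare metal.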

How to run a VM? Paravirtualization!

• Pros:
  – No hardware support required
  – Performance: better than emulation

• Cons:
  – Requires a specifically modified guest
  – The same guest OS cannot run both in the VM and on bare metal

• Example hypervisor: Xen

Industry trends

• Trap and emulate

• With hardware support

• VMX, SVM

Linux-related virtualization projects

Project         Type                                    License
Bochs           Emulation                               LGPL
QEMU            Emulation                               LGPL/GPL
VMware          Full virtualization                     Proprietary
z/VM            Full virtualization                     Proprietary
Xen             Paravirtualization                      GPL
UML             Paravirtualization                      GPL
Linux-VServer   Operating system-level virtualization   GPL
OpenVZ          Operating system-level virtualization   GPL

Hardware support for full virtualization and paravirtualization

• Recall that the IA-32 (x86) architecture creates some issues when it comes to virtualization. Certain privileged-mode instructions do not trap, and can return different results based upon the mode. For example, the x86 STR instruction retrieves the security state, but the value returned is based upon the particular requester's privilege level. This is problematic when attempting to virtualize different operating systems at different levels. For example, the x86 supports four rings of protection, where level 0 (the highest privilege) typically runs the operating system, levels 1 and 2 support operating system services, and level 3 (the lowest level) supports applications. Hardware vendors have recognized this shortcoming (and others), and have produced new designs that support and accelerate virtualization.

Hardware support for full virtualization and paravirtualization

• Intel is producing new virtualization technology that will support hypervisors for both the x86 (VT-x) and Itanium® (VT-i) architectures.
• VT-x supports two new forms of operation:
  – one for the VMM (root)
  – one for guest operating systems (non-root)
• The root form is fully privileged, while the non-root form is deprivileged (even for ring 0).
• The architecture also supports flexibility in defining the instructions that cause a VM (guest operating system) to exit to the VMM and store off processor state. Other capabilities have been added.

Hardware support for full virtualization and paravirtualization

• AMD is also producing hardware-assisted virtualization technology, under the name Pacifica.
• Among other things, Pacifica maintains a control block for guest operating systems that is saved on execution of special instructions.
• The VMRUN instruction allows a virtual machine (and its associated guest operating system) to run until the VMM regains control (which is also configurable). The configurability allows the VMM to customize the privileges for each of the guests.
• Pacifica also amends address translation with host and guest memory management unit (MMU) tables.

I/O Virtualization

• Typical methods to virtualize the CPU
• A computer is more than a CPU
• Also need I/O!

• Types of I/O:
  – Block (e.g., hard disk)
  – Network
  – Input (e.g., keyboard, mouse)
  – Sound
  – Video
• Most performance critical (for servers):
  – Network
  – Block

Xen Overview

• Para-virtualization possible
  – Full virtualization is possible with virtualization support at the hardware level (Intel VT technology, AMD-V/Pacifica technology)
  – XenoLinux: port of the Linux kernel to the Xen Hypervisor
• Hypervisor based on a micro-kernel
• Open source, Linux based
• Create and manage VMs via command line
• Restricted hardware access through API
• Host’s kernel needs to be patched.

VMware Overview

• Commercial virtualization applications
• Full virtualization
• High portability
• Simulates BIOS, PXE boot
• Simulates virtual hardware for every VM
• Supports Bridge, NAT, and Host-Only networks
• Runs a wide range of unmodified guest OSes such as Windows, Linux, Solaris, BSD, Netware, DOS

VMware Overview

Source: http://www.vmware.com

VMware vs. Xen

Relative performance on native Linux (L), Xen/Linux (X), VMware Workstation 3.2 (V), and User Mode Linux (U).

Source: “Xen and Art of Virtualization”, Ian Pratt, University of Cambridge, Xensource Inc. Http://www.cl.cam.ac.uk/netos/papers/2005-xen-may.ppt

VMware vs. Xen (TCP results)

[Figure: bar chart on a relative scale from 0.0 to 1.1, with bars for L, X, V, and U in four panels: Tx, MTU 1500 (Mbps); Rx, MTU 1500 (Mbps); Tx, MTU 500 (Mbps); Rx, MTU 500 (Mbps).]

TCP bandwidth on Linux (L), Xen (X), VMWare Workstation (V), and UML (U)

Source: “Xen and Art of Virtualization”, Ian Pratt, University of Cambridge, Xensource Inc. Http://www.cl.cam.ac.uk/netos/papers/2005-xen-may.ppt

Qemu

• Emulation solution
• Direct access to the hardware possible if the host OS and the guest OS have the same architecture

[Figure: Qemu architecture. Guest stacks (user space on top of Linux, Windows, Linux, Mac OS X, and Solaris, each with its own drivers) run on Qemu x86, Qemu x86, Qemu PPC, Qemu PPC, and Qemu Sparc instances. These run on a host OS (Linux, Mac OS X, or Windows) on top of the hardware: processor, memory, disk, network, etc.]

From http://fr.wikipedia.org/wiki/Qemu

Xen Overview

• Para-virtualization possible
  – Full virtualization is possible with virtualization support at the hardware level (Intel VT technology, AMD-V/Pacifica technology)
  – XenoLinux: port of the Linux kernel to the Xen Hypervisor
• Hypervisor based on a micro-kernel
• Efficient virtualization: HPC possible

Xen Overview

• Open source, Linux based
• High performance
• Supports Bridge and Routing networks
• Create and manage VMs via command line
• Restricted hardware access through API
• Host’s kernel needs to be patched.

Xen’s Ring Model

Ring     Standard x86 Architecture   Xen on x86 Architecture
Ring 3   User applications           User applications
Ring 2   Not used                    Not used
Ring 1   Not used                    VMs
Ring 0   Operating system            Xen’s hypervisor

The architecture of Xen

Instructor’s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5 © Pearson Education 2012

Use of rings of privilege

Virtualization of memory management

Split device drivers

I/O rings

The XenoServer Open Platform Architecture

Virtualization Examples

• Server consolidation - Virtual machines are used to consolidate many physical servers into fewer servers, which in turn host virtual machines. Each physical server is reflected as a virtual machine "guest" residing on a virtual machine host system. This is also known as Physical-to-Virtual or 'P2V' transformation.

Virtualization Examples

• Disaster recovery - Virtual machines can be used as "hot standby" environments for physical production servers. This changes the classical "backup-and-restore" philosophy, by providing backup images that can "boot" into live virtual machines, capable of taking over workload for a production server experiencing an outage.

Virtualization Examples

• Testing and training - Virtualization can give root access to a virtual machine. This can be very useful, for example in kernel development and operating system courses.

Virtualization Examples

• Portable applications - The Windows platform has a well-known issue involving the creation of portable applications, needed (for example) when running an application from a removable drive, without installing it on the system's main disk drive. This is a particular issue with USB drives. Virtualization can be used to encapsulate the application with a redirection layer that stores temporary files, Windows Registry entries, and other state information in the application's installation directory, and not within the system's permanent file system. See portable applications for further details. It is unclear whether such implementations are currently available.

Virtualization Examples

• Portable workspaces - Recent technologies have used virtualization to create portable workspaces on devices like iPods and USB memory sticks. These products include:
  – Application level – Thinstall – which is a driver-less solution for running "Thinstalled" applications directly from removable storage without system changes or needing Admin rights
  – OS-level – MojoPac, , and U3 – which allow end users to install some applications onto a storage device for use on another PC.
  – Machine-level – moka5 and LivePC – which deliver an operating system with a full software suite, including isolation and security protections.

Virtualization Tips

• In the VMware space, VirtualCenter is the management tool of choice for ESX Server.
• Other products, like Hewlett-Packard's Virtual Machine Management or IBM's Director modules, are adding functionality to deal with virtual machine (VM) environments.
• The problem is that most of these snap-in tools lack much of the simple functionality you get in VirtualCenter.
• Most companies will end up buying both VirtualCenter and the vendor's tool, and use one or the other depending on what they are doing.

Virtualization Tips

• Shy away from large amounts of processing when doing consolidation.
• If you are doing virtualization for other reasons, like workload management, then you can get nearly anything to run virtualized if you are willing to change some of the things you do.
• However, if you are looking for maximum consolidation ratios and high ROIs, stay away from the quad boxes that are already running at 50%.

VM on Amazon

Security Tips

• At a minimum, apply some standard security measures:
  – Disable remote root access
  – Use sudo when needed
  – Configure the AD PAM modules for Windows shops

Security Tips

• Some organizations use too much surrounding security and end up making their environment slower, more difficult, and more expensive to manage.
• When dealing with the VMs, all of the standard procedures should be followed.
• The host systems themselves should often be considered appliances, and organizations should limit the number of customized agents and security hacks applied to these systems.

Security Tips

• One should not go overboard with ESX hosts, since they are basically appliances serving up computing resources and should be treated as such. Nevertheless, taking a common sense approach to security on the servers is the best bet.
• The most common mistakes made with virtual security stem from ignorance: lack of knowledge of the Linux console, failure to understand how the virtual switch architecture works, and failure to realize that the host does not directly see the data in the VM disk files.

Security Tips

• The same practices that are performed to secure a physical environment can, and should, be used in a virtual environment as well.
• Everything from proper VLAN/firewall organization to host-based intrusion detection should be leveraged to keep the environment secure.

Scalability Tips

• Simplicity. The more complicated the design and infrastructure, the less scalable it will be.
  – For example, a common mistake in large organizations is to assume they cannot create a simple solution because they are big. One can argue that they should make the solution or design for VMware as simple as possible, so that it scales to the size of their organization and largest client base.
• Don't design the entire solution around the one-offs.

Scalability Tips

• When designing a virtual infrastructure, one should never look at the environment and try to plan one large infrastructure for the entire virtualization project. It won’t work.
• Organize the overall environment into smaller groupings of servers and address each grouping individually.
• When approached this way, the end of the project leaves in place a very scalable deployment methodology that applies the same principles to a manageable number of servers in each phase of the project.