
OpenSolaris Virtualization Technologies

Daniel Price
Staff Engineer, Solaris Kernel Technology

Agenda
• Virtualization Overview
• Zones
• BrandZ
• Xen
• Q&A

The Need for Virtualization
• Driven by the need to consolidate multiple hosts and services on a single machine
• Leads to...
  > Increased hardware utilization (average data center utilization is currently below 15%)
  > Greater flexibility in resource allocation
  > Reduced power requirements
  > Minimized management costs
  > Lower cost of ownership

Types of Virtualization
[Spectrum diagram, from Hard Partitions through Virtual Machines and OS Virtualization to Resource Management, showing the App/OS/Server layers, the shift from multiple OSes to a single OS, and the opposing trends toward isolation and toward flexibility. Example technologies: Dynamic System Domains, Logical Domains, Xen, Solaris Containers (Zones + SRM), BrandZ, CrossBow, Solaris Resource Manager (SRM)]

Solaris Zones
• Basic concept: isolated execution environment within a Solaris instance
• Virtualizes OS layer: file system, devices, network, processes
• Provides:
  > Privacy: can't see outside the zone
  > Security: can't affect activity outside the zone
  > Failure isolation: application failure in one zone doesn't affect others
• Lightweight, granular, efficient
• No porting for most apps; ABI/APIs are the same

Typical Uses for Zones
• Consolidating data center workloads such as multiple databases
• Deploying multi-tier application stacks
• Hosting untrusted or hostile applications, or applications that contend for global resources such as IP port space
• Deploying Internet-facing services
• Software development

Zones Block Diagram

[Block diagram: the global zone (v880-room2-rack5-1; 129.76.1.12) hosts four non-global zones, each an application environment running on its own virtual platform:
  dns1 zone (dnsserver1), root /zone/dns1: login services (sshd), network services (named), core services (inetd)
  web1 zone (foo.org), root /zone/web1: login services (sshd), network services (Apache, Tomcat), core services (inetd)
  web2 zone (bar.net), root /zone/web2: login services (sshd, telnetd), network services (IWS), core services (inetd)
  mail zone (mailserver), root /zone/mail1: login services (sshd), network services (sendmail, IMAP), core services (inetd)
Each zone has its own logical network interfaces and zone console, is managed by its own zoneadmd, and is bound to a resource pool (pool1, 4 CPUs with FSS; pool2, 4 CPUs). The global zone provides zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...), core services (inetd, rpcbind, sshd, ...), remote admin/monitoring (SNMP, SunMC, WBEM), and platform administration (syseventd, devfsadm, ifconfig, metadb, ...), and owns the storage complex and the physical network devices (hme0, ce0, ce1).]
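From the global zone, zoneadm(1M) can list this layout; a sketch of the output for the configuration above (zone names and paths are the illustrative ones from the diagram, output abbreviated):

  global# zoneadm list -cv
    ID NAME     STATUS     PATH
     0 global   running    /
     1 dns1     running    /zone/dns1
     2 web1     running    /zone/web1
     3 web2     running    /zone/web2
     4 mail     running    /zone/mail1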

Zones Security
• Each zone has a security boundary around it
• Runs with a subset of privileges(5)
• A compromised zone is unable to escalate its privileges
• Important name spaces are isolated
• BSM auditing can be configured globally or on a per-zone basis
• Processes running in a zone are unable to affect activity in other zones

Zones Processes
• Certain system calls are not permitted or have restricted scope inside a zone
• From the global zone, all processes can be seen, but control is privileged
• From within a zone, only processes in the same zone can be seen or affected
• The /proc file system has been virtualized to show only processes in the same zone

Zones Networking
• Single TCP/IP stack for the system
  > Shields non-global zones from configuration details
  > Prevents per-zone control (routing, tuning)
• Each zone can be assigned any number of IPv4 and IPv6 addresses, and each has its own port space
• Applications can bind to INADDR_ANY and will only get packets addressed to that zone
• A zone cannot send packets with source addresses other than those it has been assigned

Zones Observability
• From the global zone, process tools like prstat(1M), ps(1), and truss(1) can be used to observe processes in other zones (see the sketch below)
• DTrace can be used from the global zone and supports a "zonename" variable as well as psinfo_t's pr_zoneid field for use with the proc provider, e.g.

  global# dtrace -n 'io:::start{@[zonename] = count()}'

• A subset of the DTrace functionality (non-kernel probes) is available from within non-global zones
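For example, a few hedged one-liners from the global zone (the zone name web1 is illustrative):

  global# prstat -Z                              # per-process output plus a per-zone summary
  global# ps -efZ | grep web1                    # -Z adds a ZONE column to the listing
  global# pgrep -z web1 httpd                    # match processes only in zone web1
  global# truss -p `pgrep -z web1 -o httpd`      # trace a process running inside web1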

Zones Configuration & Installation
• zonecfg(1M) is used to configure a zone by specifying resources (file systems, network interfaces, etc.) and properties (zone name, file system directory path, etc.)
• zoneadm(1M) is used to administer zones (boot, install, etc.)
• Each zone is assigned its own file system, constructed by copying packaged files from the global zone

Zones Status
• Initially supported in Solaris 10
• Available today through OpenSolaris
  > Integration with ZFS, faster provisioning with ZFS
  > Configurable privileges on a per non-global zone basis
  > Cloning for rapid provisioning
  > Zone migration, both local (enables zones to be moved and renamed) and remote (using a new attach/detach facility); see the sketch below
• Future:
  > Enhancements to resource management and greater zones/RM integration
  > Enhanced management, integration with Live Upgrade
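A minimal end-to-end sketch tying the two slides above together. The zone name, path, and address are illustrative, and the clone and detach/attach steps assume the newer OpenSolaris bits listed under Zones Status:

  global# zonecfg -z web1
  zonecfg:web1> create
  zonecfg:web1> set zonepath=/zone/web1
  zonecfg:web1> add net
  zonecfg:web1:net> set physical=hme0
  zonecfg:web1:net> set address=129.76.1.20
  zonecfg:web1:net> end
  zonecfg:web1> commit
  zonecfg:web1> exit
  global# zoneadm -z web1 install              # populate the zone root from the global zone
  global# zoneadm -z web1 boot
  global# zlogin -C web1                       # attach to the zone console

  # Cloning for rapid provisioning (edit zonepath/address in the exported config first)
  global# zonecfg -z web1 export -f /tmp/web1.cfg
  global# zonecfg -z web2 -f /tmp/web1.cfg
  global# zoneadm -z web2 clone web1

  # Remote migration with detach/attach
  global# zoneadm -z web1 detach
    ... move /zone/web1 to the target host ...
  target# zonecfg -z web1 create -a /zone/web1
  target# zoneadm -z web1 attach
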
BrandZ: Branded Zones
• Extends the Zones model to support "non-native" zones on a Solaris system
  > Only supports user-space environments
  > If you need a different kernel, see Xen
• Each distinct zone type is called a Brand
• Possible uses:
  > A Linux zone
  > A Solaris GNU zone (Nexenta/SchilliX/BeleniX)
  > Support for Solaris N-1 on Solaris N
  > A MacOS X zone

BrandZ: Brand Implementation
• A Brand is composed of the following items:
  > Required:
    > An XML brand configuration file and an XML brand platform definition file
    > Scripts used to install a branded zone
  > Optional:
    > Kernel brand support module
    > Scripts that can be invoked at zone boot and shutdown
    > Linker libraries that can be used to help debug branded applications
    > Userland brand support library
    > Any other native programs, libraries, or kernel modules needed to support execution of the branded zone

BrandZ User-space Infrastructure
• Zone utilities call per-brand scripts
  > At zone installation, zone boot, zone shutdown
  > Allows a brand to customize the software installed, the execution environment, etc.
• Debugging support libraries recognize branded processes and core files
  > Brand-specific library plugins aid with:
    > Application and library segment mappings
    > Address and symbol mappings
  > Enables tools like mdb, the ptools, and the DTrace pid provider to work without modification

BrandZ Kernel Infrastructure
• BrandZ adds interposition points to the Solaris kernel:
  > syscall path, process loading, fork, exit, etc.
• Control is transferred to the brand kernel support module
• Allows a brand to replace or modify basic Solaris behaviors
• These interposition points are only applied to branded processes
• Fundamentally different brands may require new interposition points

The lx Brand
• Implements Solaris Containers for Linux Applications
• Enables Linux binaries to run unmodified on Solaris
• Creates a zone for Linux application execution
  > The zone is populated only with Linux software
  > At boot, it runs the Linux init(1M), configuration scripts, and applications
  > It all runs on a Solaris kernel
• There is no Linux software delivered with BrandZ
  > This is not a Linux distro and we do not include our own special Linux software; we install and run standard Linux distributions

BrandZ Use Cases
• As a transition tool, reducing the Linux "barrier to exit"
  > The customer would like to move to Solaris, but has legacy Linux applications
• Best of both worlds
  > Users familiar with the Linux environment
  > Administrators want Solaris' enterprise-class features: resource management, fault management, DTrace
• Developer/ISV workload
  > Solaris has strong development tools; let Linux developers leverage them
  > We want Solaris to be a better Linux development platform than Linux

BrandZ Status
• Available in OpenSolaris now
• Will be in Solaris 10 update 4
• Zones running a Red Hat Enterprise Linux 3.x or CentOS 3.5 operating environment (see the sketch below)
  > Support for the Linux 2.4.21 system call interface
  > Basic /proc and /dev support
• DTrace support for Linux applications
  > Linux syscall provider and pid provider
• Rapid deployment and teardown of Linux zones; perfect for building 'throwaway' zones for development/QA
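A hedged sketch of setting up an lx branded zone from the global zone; the zone name and install media path are illustrative, and the exact install options have varied between BrandZ drops:

  global# zonecfg -z centos
  zonecfg:centos> create -t SUNWlx             # lx brand template
  zonecfg:centos> set zonepath=/zone/centos
  zonecfg:centos> commit
  zonecfg:centos> exit
  global# zoneadm -z centos install -d /export/centos_fs_image.tar.gz
  global# zoneadm -z centos boot
  global# zlogin centos uname -a               # reports a Linux 2.4.21 environment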

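And a hedged DTrace one-liner against that zone, assuming the Linux syscall provider is exposed as lx-syscall:

  global# dtrace -n 'lx-syscall:::entry /zonename == "centos"/ { @[probefunc] = count(); }'
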
Hypervisors 101
● The "virtual machine" abstraction
● Virtualizes hardware: memory, CPU, I/O devices
● May emulate real devices
● For x86/x64, multiple choices
  ● Xen, VMware, Parallels
  ● MSFT now: Virtual Server; future: Viridian

Solaris Containers and Hypervision

● Containers (zones)
  • Scalable, fast virtual platform, platform agnostic
  • Emphasis on sharing, simpler admin
  • Improved fault isolation over "single system"
  • Alternate brands
● Hypervisors
  • Emphasis on separation
  • Fault isolation (Xen: SPOFs remain)
  • Live migration
  • Foreign OSes

Para vs. Full Virtualization

● Full virtualization
  ● Runs the binary image of a "metal" OS
  ● Must emulate real I/O devices
  ● Can be slow; needs help from hardware
  ● May use trap-and-emulate or binary rewriting
● Para-virtualization
  ● Runs an OS ported to the virtual machine architecture
  ● Uses "virtual" device drivers
  ● More efficient, since the guest is aware it is virtualized

Xen
• Open source hypervisor technology developed at the University of Cambridge
  > http://www.cl.cam.ac.uk/Research/SRG/netos/xen/
• OpenSolaris on Xen community
  > http://www.opensolaris.org/os/community/xen
• 2006: HW virtualization everywhere
  > x64 CPU capabilities (VT-x, AMD-V)
  > Workload consolidation

Xen Design Principles and Goals

● Existing applications run unmodified
● Support for multi-process, multi-application environments
● Permit complex server configurations to be virtualized within a single guest OS instance
● Paravirtualization enables high performance and strong isolation between domains
  ● Particularly on uncooperative architectures (x86)
● Live migration of VM instances between servers

Xen 3.x Architecture

[Architecture diagram: dom0 (VM0) runs the host OS (Solaris) with the device manager and control software, native device drivers, and back-end drivers; domU1 and domU2 (VM1, VM2) run paravirtualized guest OSes (XenLinux, Solaris) with unmodified user software and front-end device drivers; domU3 (VM3) runs an unmodified guest OS (WinXP) via VT-x/AMD-V. All sit on the 32/64-bit Xen Virtual Machine Monitor (control interface, safe hardware interface, event channels, virtual CPU, virtual MMU), which runs on the hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE).]

Key Xen Capabilities

● Checkpoint/restart and live migration (sketched below)
  • Provisioning
  • Grid operations
● Multiple OSes running simultaneously
  • Linux, Solaris, Windows XP
  • No longer a boot-time decision
● Special purpose kernels
  • Drivers, filesystems
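A hedged sketch of these capabilities from dom0 using the Xen 3.x xm tool; the domain name and target host are illustrative:

  dom0# xm list                                  # show running domains
  dom0# xm save osol-domU /var/xen/osol.chkpt    # checkpoint: suspend the domain to a file
  dom0# xm restore /var/xen/osol.chkpt           # restart it later
  dom0# xm migrate --live osol-domU hostB        # live migration to another dom0 host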

Xen Live Migration Experiment (2004)
● Two machines running Xen 2.0
  • 2GHz hyperthreaded CPUs
  • 1Gbit Ethernet
  • Remote storage
  • XenoLinux
● SPECweb99 benchmark
  • 800Mbyte domU, 90% CPU utilization

Now, move the workload from machine A to machine B ...

Live Migration: SPECweb99

[Graph: SPECweb99 performance during live migration, from the LinuxWorld 2005 Virtualization BoF]

OpenSolaris on Xen Port
• "Platform" rather than "arch" port
  > Privileged operations -> hypercalls
  > MMU, segmentation, exceptions
  > Xen "event" model of interrupts
• New virtual device drivers
  > net, disk, console
• Dom0 infrastructure and tools
• Paravirtualized DomU

Why Solaris Domain 0
• Observability, debugging tools
• ZFS
• FMA
• Containers and TX
• CrossBow
• HW support

OpenSolaris on Xen Status
• OpenSolaris domU and dom0
  > 32/64-bit, UP, MP (virtual 32-way!)
  > Virtual disks, network, bridge
  > CPU and memory hotplug support
• Versions
  > Xen 3.0.2-3 (xen-3.0.3-sun)
  > OpenSolaris build 44
• Future OpenSolaris drops
  > Performance, bug fixes
  > Soon-ish: Mercurial SCM

Join Us...
• Our communities and projects are open on OpenSolaris.org:
  > Zones: http://opensolaris.org/os/community/zones
  > BrandZ: http://opensolaris.org/os/community/brandz
  > Xen: http://opensolaris.org/os/community/xen
  > CrossBow: http://opensolaris.org/os/project/crossbow
• Where you will find:
  > Lively discussions, design docs, FAQs, source code drops, preliminary binary releases, etc.

OpenSolaris Virtualization Technologies

Dan Price [email protected] http://blogs.sun.com/dp