
White Paper
Communications and Storage Infrastructure
Container-Based Virtualization

Linux* Containers Streamline Virtualization and Complement Hypervisor-Based Virtual Machines

As strategic approaches such as software-defined networking (SDN) and network function virtualization (NFV) become more mainstream, combinations of OS-level virtualization and hypervisor-based virtual machines (VMs) often provide dramatic cost, agility, and performance benefits throughout the data center.

1 Executive Summary

By decoupling physical computing resources from the software that executes on them, virtualization has dramatically improved resource utilization, enabled consolidation, and allowed workloads to migrate among physical hosts for usages such as load balancing and failover. The following key techniques for resource sharing and partitioning have been under continual development and refinement since work began on them in the 1960s:

• Hypervisor-based virtualization has been the main focus within the Intel® architecture ecosystem. Increased server headroom and hardware assists from technologies that include Intel® Virtualization Technology (Intel® VT)1 have helped broaden its adoption and the scope of workloads for which it is effective on open-standard hardware.

• OS-level virtualization has been developed largely by makers of proprietary operating systems and system architectures. Linux* containers, introduced in 2007, extend these capabilities to highly elastic or latency-sensitive workloads on open-standards hardware.

Both approaches have distinct advantages. For example, whereas each hypervisor-based virtual machine (VM) runs an entire copy of the OS, multiple containers share one kernel instance, significantly reducing overhead. And while hypervisor-based VMs can be provisioned far more quickly than physical servers—taking only tens of seconds as opposed to several weeks—even that lag is unacceptable for workloads that range from real-time data analytics to control systems for autonomous vehicles. On the other hand, hypervisor-based VMs allow for multi-OS environments and superior workload isolation (even in public clouds). Enterprises are also finding that new virtualization usage models that draw on both containers and VMs help them get more value out of strategic approaches such as public cloud, software-defined networking (SDN), and network function virtualization (NFV).

As virtualization using Linux* containers emerges as a viable option for mainstream computing environments, Intel is investing in enablement through hardware and software technologies that include the following:
• The Intel® Data Plane Development Kit (Intel® DPDK)
• The Intel® Solid-State Drive Data Center Family for PCI Express*
• Intel® Virtualization Technology

NOTE: In the context of this paper, the term “container” is a generalized reference for any virtual partition other than a hypervisor-based VM (e.g., a chroot jail, FreeBSD* jail, Solaris* container/zone, or Linux container).

Table of Contents
1 Executive Summary
2 The Historical Context for Containers and Hypervisors
  2.1 Resource-Partitioning Techniques Not Based on Hypervisors
  2.2 Emergence of Software-Only, Hypervisor-Based Virtualization for Intel® Architecture
  2.3 Hardware-Assisted, Hypervisor-Based Virtualization with Intel® Virtualization Technology
3 Virtualization Using Linux Containers
  3.1 Usage Model: Platform as a Service on Software-Defined Infrastructure
  3.2 Usage Model: Combined Hypervisor-Based VMs and Containers in the Public Cloud
4 Enabling Technologies from Intel for Container-Based Virtualization
  4.1 Intel® Data Plane Development Kit
  4.2 Intel® Solid-State Drive Data Center Family for PCI Express*
  4.3 Enhanced Workload Isolation with Intel® Virtualization Technology
5 Containers in Real-World Implementations
  5.1 Google: Combinations and Layers of Complementary Containers and Hypervisors
  5.2 Heroku and Salesforce.com: Container-Based CRM, Worldwide
6 Conclusion

This paper introduces technical executives, architects, and engineers to the potential value of Linux containers, particularly in conjunction with hypervisor-based virtualization. It includes the following sections:

• The Historical Context for Containers and Hypervisors traces the development of these technologies from their common origin as an effort to share and partition system resources among workloads.

• Virtualization Using Linux Containers introduces the support for container-based virtualization in Linux, illustrated by representative usage models.

• Enabling Technologies from Intel for Container-Based Virtualization discusses Intel’s hardware and software technologies that help improve performance and data protection when using Linux containers.

• Containers in Real-World Implementations examines a few examples of early adopters that are using containers as part of their virtualization infrastructures today.

2 The Historical Context for Containers and Hypervisors

Hypervisors and containers have a joint ancestry, represented in Figure 1, driven by efforts to decrease the prescriptive coupling between software workloads and the hardware they run on. From the beginning, these advances enhanced the flexibility of computing environments, enabling a single system to handle a broader range of tasks. As a result, two related aspects of the interactions between hardware and software have evolved:

• Resource sharing among workloads allows greater efficiency compared to the use of dedicated, single-purpose equipment.

• Resource partitioning ensures that the system requirements of each workload are met and prevents unwanted interactions among workloads (isolating sensitive data, for example).

The first stage depicted in Figure 1 is that of the monolithic compute environment, where the hardware and software are logically conjoined in a single-purpose architecture.

[Figure: four stages, from the monolithic compute environment (GE-635/Multics, circa 1959), to the supervisor (GE-635/Multics, circa 1962; IBM 360, circa 1964), to the control program, or hypervisor, mediating multiple user environments (IBM 370, circa 1972).]

Figure 1. Early development of approaches to resource sharing and partitioning.

The ability to make significant changes to either hardware or software, short of replacing the entire system, is severely limited or even nonexistent. The second stage includes the supervisor (progenitor of the kernel), an intermediary between software and system resources that allows hardware and software to be changed independently of each other. That de-coupling is advanced further in the third stage, where the supervisor mediates resources among multiple users, allowing time-sharing among multiple, simultaneous jobs.

The fourth stage adds a control program (CP, also known as the “hypervisor”) that presents an independent view of the complete environment to each application; applications execute in isolation, unaware of each other. Importantly, the various hypervisor-enabled virtual environments can run different operating systems simultaneously on the same hardware, which would be impossible under most approaches to virtualization that are not based on hypervisors. (Running workloads based on different operating systems on the same host is possible to a limited extent using Solaris* containers, as described below.)

The supervisor, as mediator between user space and system resources, has full access to the host and allows a subset of that access to users (workloads). That arrangement allows the supervisor to enforce limitations on access that are necessary for usages such as the time sharing among jobs described in the third stage of Figure 1. In contrast, the CP shown in the fourth stage moderates between the supervisor and the multiple virtual representations of the environment presented to workloads. Accordingly, the CP operates at a level superior to that of the supervisor, conceptually suggesting a role “beyond the supervisor,” which is reflected in the term “hypervisor.”

IBM introduced its first hypervisor for production (but unsupported) use in 1967, to run on System/360* mainframes, followed by a fully supported version for System/370* equipment in 1972. The company is generally recognized as having held the industry’s most prominent role in the development of hypervisor-based virtualization from that point through the rest of the twentieth century.

2.1 Resource-Partitioning Techniques Not Based on Hypervisors

Although hypervisors enrich isolation among virtual environments running on a single physical host, they also add complexity and consume resources. Alternate approaches to partitioning resources were developed as interest in Unix* and Unix-like operating systems grew in the late 1970s and early 1980s, driven by organizations that included Bell Laboratories, AT&T, Digital Equipment Corporation, and academic institutions. These technologies (and others), which are collectively referred to as “OS-level virtualization,” provide lightweight approaches to virtualization, acting as foundations for today’s Linux containers:

• The chroot mechanism is a critical foundation for container-based virtualization, introduced as a feature of Version 7 Unix by Bell Laboratories in 1979 and as part of 4.2BSD by the University of California, Berkeley in 1983. The chroot mechanism can redefine the root directory for any running program, effectively preventing that program from being able to name or access resources outside that root directory tree (see the sketch after this list). While such partitions, referred to as “chroot jails,” support limited virtualization functionality, they must share a single OS kernel. Moreover, chroot is not designed to be tamper-resistant, and it is vulnerable to intentional efforts by users or programs to “break out” of their jails and gain unauthorized access to resources.

• FreeBSD* jails are similar in concept to chroot jails, but with a greater emphasis on security. This mechanism was introduced as a feature of FreeBSD 4.0 in 2000. FreeBSD jail definitions can explicitly restrict access outside the sandboxed environment by entities such as files, processes, and user accounts (including accounts created by the jail definition specifically for that purpose). While this approach significantly enhances control over resources compared to chroot, it is likewise incapable of supporting multiple operating systems, because the FreeBSD jails must share a single OS kernel.

• Solaris containers (Solaris zones) build on the virtualization capabilities of chroot and FreeBSD jails. Sun Microsystems introduced this feature with the name “Solaris containers” as part of Solaris 10 in 2005, and Oracle officially changed the name to “Solaris zones” with the release of Solaris 11 in 2011. Zones are individual virtual server instances that can coexist within a single OS instance. Similar to FreeBSD jails, Solaris zones allow for zone-specific user accounts and access restrictions on resources such as network interfaces, memory, storage, and processors. One significant advance toward full virtualization is that while Solaris zones must share a single OS kernel, a capability called “branded zones” (BrandZ) enables individual zones to emulate the behavior of certain operating systems. As mentioned previously, BrandZ allows the environment to simulate cross-OS virtualization to a limited degree.
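The chroot jail described in the first bullet above can be demonstrated in a few lines of C. The following is a minimal sketch, not taken from the paper: it assumes a directory (the hypothetical /srv/jail) that has already been populated with a shell and its libraries, and it must run with root privileges.

```c
/* Minimal chroot jail sketch: the process's root directory is
 * redefined so that paths outside the new tree cannot be resolved.
 * Note the caveat above: chroot alone is not tamper-resistant, so
 * real containers layer namespaces and cgroups on top of it. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    if (chroot("/srv/jail") != 0) {   /* redefine the root directory */
        perror("chroot");
        return EXIT_FAILURE;
    }
    if (chdir("/") != 0) {            /* move inside the new root */
        perror("chdir");
        return EXIT_FAILURE;
    }
    /* From here on, "/" refers to /srv/jail; this process and its
     * children cannot name files outside that directory tree. */
    execl("/bin/sh", "sh", (char *)NULL);
    perror("execl");                  /* reached only if exec fails */
    return EXIT_FAILURE;
}
```

Everything inside the jail still runs on the host's single OS kernel, which is exactly the shared-kernel limitation the bullets above attribute to all of these techniques.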


2.2 Emergence of Software-Only, Hypervisor-Based Virtualization for Intel® Architecture

Shortly after the year 2000, Intel architecture was beginning to reach the upper limits of single-server performance possible from scaling up processor frequency alone. The key factors behind that plateau were the increased power consumption and waste heat associated with raising clock speed. The primary solution that emerged was to create processor architectures with multiple execution cores on a single die.

Multi-core processors can execute multiple threads of programming instructions in parallel, simultaneously and independently of each other. While this approach increases throughput, redesigning serial applications for multithreading—so that work can be split efficiently among multiple cores—is a complex undertaking. Moreover, if done incorrectly, it can produce conflicts among threads that cause runtime errors, unpredictable results, or performance deficits. Other approaches to take advantage of hardware parallelism include the following:

• Stateless web pages can support many tasks running in parallel, such as user sessions or transactions.

• High-performance computing models, including clusters and grids, coordinate workloads across machines.

• Hypervisor-based virtualization allows serial applications to run in separate virtual machines (VMs).

• Non-hypervisor virtualization provides resource sharing and partitioning among workloads using mechanisms such as chroot, FreeBSD jails, and Solaris containers.

Of the models described above, hypervisor-based virtualization emerged as the dominant data-center approach, marked by a period of rapid hypervisor development in the Intel architecture ecosystem, led by VMware and the open-source Xen* project. Other prominent providers of virtualization solutions include Citrix (including commercial versions of Xen), Microsoft, Oracle, Parallels, Red Hat, and the Kernel-based Virtual Machine (KVM) open-source project. A high-level representation of virtualization stacks is shown in Figure 2, which depicts the two primary hypervisor variations: Type 1 and Type 2.

Type 1 hypervisors are installed directly on the physical host hardware, whereas Type 2 (also known as “hosted”) hypervisors are installed on top of a host OS. Type 1 hypervisors communicate directly with the underlying hardware and support robust virtual networking and dynamic resource allocation through a management server; they are the more prevalent approach for data-center virtualization of servers. Type 2 hypervisors typically rely on the host OS (rather than a management server) for services such as I/O device support and memory management; they are often used for virtualized desktop infrastructure.

Figure 2. Virtualization based on Type 1 and Type 2 hypervisors.

2.3 Hardware-Assisted, Hypervisor-Based Virtualization with Intel® Virtualization Technology

In software-only virtualization on Intel architecture, the hypervisor must emulate the hardware environment for VMs, using binary translation. This service provides software-based interfaces to physical resources such as processors, memory, storage, graphics cards, and network adapters. Because this emulation is performed in software, it consumes significant execution resources on the processor that are therefore not available to operate on workloads, which significantly limits scalability.

Intel introduced Intel VT in 2006, which includes a new privilege level for VMs to operate in without the overhead of binary translation, dramatically improving efficiency on the physical host. Intel VT has continued to develop into an expanding set of features within processors, chipsets, and network hardware to make virtualization increasingly efficient and to enhance the hardware-based isolation between workloads on separate VMs running on the same host.

Higher performance, improved data protection, and decreased latency enabled by Intel VT allowed larger numbers of servers to be consolidated per physical host and expanded the types of workloads suitable for virtualization. At the same time, progressive generations of Intel® Xeon® processors for the highly scalable server segment were engineered with more advanced reliability, availability, and serviceability (RAS) features, making them more suitable for virtualizing mission-critical workloads.

3 Virtualization Using Linux Containers

As discussed above, the requirement under hypervisor-based virtualization for a separate OS instance to run in each VM consumes significant resources. For example, multiple copies of many drivers and services must be run, including some that aren’t even needed by any of the running VMs. Furthermore, compute capacity can be readily increased in virtualized environments by starting additional VMs, but spinning up a new VM (or shutting one down) has time requirements on the order of tens of seconds. Workloads that can’t tolerate that level of delay are becoming more prevalent—notably those with elastic capacity needs and real-time or otherwise low-latency requirements, such as the following:

• Real-time data analytics, for systems providing services such as business intelligence that rely on immediate access to data as it emerges from dispersed sensors and other sources

• Remote interaction among users, including real-time usages such as voice or video communication, online collaboration, and remote control of automated systems, where lag or jitter is unacceptable

• Latency-sensitive or lossless networking, as required for systems such as network-attached storage or Ethernet-based storage area networks

• Page rendering for web content such as Facebook’s News Feed and LinkedIn’s Home/Feed, which involves gathering data in parallel from many sources

In contrast to a VM, a container does not require a hypervisor or incorporate a dedicated OS, which presents an attractive approach for workloads such as those described above. The open-source Linux Containers (LXC) project introduced public code in 2007 that provides support for multiple containers on the same physical host (or on the same VM, as discussed below), sharing the same instance of the Linux OS kernel (and in some cases a set of binaries and libraries), as shown in Figure 3. Each container includes only the services that it needs but can’t obtain from those shared resources, paring down the size of the software stack running on each container.

Two features of the Linux kernel are at the core of the functionality that underlies Linux containers (a minimal sketch combining them follows Figure 3):

• Cgroups provide the ability to govern and measure the use of resources such as memory, processor usage, and disk I/O by collections of processes called “control groups.” The kernel provides access to separate subsystems for managing each set of resources, either manually or programmatically.

• Namespace isolation provides a software-based means of limiting each control group’s view of global resources, such as details about file systems, processes, network interfaces, inter-process communication, host and domain names, and user IDs.

Figure 3. Linux containers compared to a Type 1 hypervisor.
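As a concrete illustration of those two kernel features, the following minimal C sketch (not from the paper) starts a shell in new UTS and PID namespaces via clone(); the comments show how a memory cgroup could be applied through the cgroup filesystem. The paths and the "demo" group are hypothetical, controller layouts vary by distribution and kernel version, and the program must run as root.

```c
/* Minimal sketch of namespaces plus cgroups. clone() places the
 * child in new UTS and PID namespaces, so its hostname change and
 * PID numbering are private to it. Requires root privileges. */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1024 * 1024];    /* stack for the cloned child */

static int child(void *arg)
{
    (void)arg;
    /* Namespace isolation: visible only inside this UTS namespace. */
    sethostname("container", strlen("container"));
    /* PID namespace: this process sees itself as PID 1. */
    printf("inside namespaces: pid=%d\n", (int)getpid());
    execl("/bin/sh", "sh", (char *)NULL);
    return 1;                             /* reached only if exec fails */
}

int main(void)
{
    /* Cgroup side (illustrative, cgroup-v1 layout; the "demo" group
     * is hypothetical and would be created first), e.g. as root:
     *   mkdir /sys/fs/cgroup/memory/demo
     *   echo 268435456 > /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
     * Writing a PID into demo/tasks then caps that process at 256 MB. */
    pid_t pid = clone(child, child_stack + sizeof(child_stack),
                      CLONE_NEWUTS | CLONE_NEWPID | SIGCHLD, NULL);
    if (pid == -1) {
        perror("clone");
        return EXIT_FAILURE;
    }
    waitpid(pid, NULL, 0);
    return EXIT_SUCCESS;
}
```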


Using these capabilities, the containers themselves are created as sandboxed environments for applications to execute in. Because the containers are unaware of each other, each of them interacts with resources such as the shared OS kernel and libraries as if the other containers didn’t exist. That isolation extends to the applications within the containers, their security settings, and their access to physical and virtual devices through the host. As discussed later in this paper, however, software-only approaches to isolating workloads are potentially more susceptible to compromise by malicious software than those that incorporate hardware-based mechanisms.

At the present early stage, it would be difficult to anticipate the roles that virtualization based on hypervisors versus virtualization based on containers will eventually take in mainstream data centers. Still, it is apparent that both models will persist for the foreseeable future, because of their complementary nature. For example, the requirement that all containers must use the same Linux OS kernel is an obvious limitation to server consolidation in a typical enterprise that is likely to also include Windows*. Desktop virtualization is another common example; organizations that want to run virtual Windows desktops will need to do so using a hypervisor. Moreover, hypervisors and containers can each play a valuable role as the installation substrate for the other, as in the usage models described in the remainder of this paper.

The Container-Based Virtualization Software Ecosystem for Linux*

The following list captures a few noteworthy software offerings related to the use of containers:

• LXC (Linux Containers) is a user-space interface (including an API and tools) for the cgroups and namespace-isolation features of the Linux kernel.

• Docker is an open-source engine designed to simplify the use of LXC by automating deployment of applications in containers, packaging applications and their dependencies in images with a standard format.

• Lmctfy (Let Me Contain That For You) is an open-source version of Google’s internal application containerization stack that, in contrast to Docker, prioritizes performance over ease of use.

• CoreOS is a fork of Chrome OS*, designed to include only the minimum level of capabilities that are needed to support the operation of applications in containers.

• Project Atomic is a community-based effort initiated by Red Hat to create technologies for lightweight container hosts; this work will be used in a new product variant, Red Hat Enterprise Linux Atomic Host.

• Kubernetes is an open-source container manager developed by Google that provides deployment, health management, replication, and connectivity services for containers.

NOTE: Lmctfy, CoreOS, and Project Atomic use the Linux kernel’s cgroup and namespace-isolation features directly, as opposed to using LXC, although Docker is integrated into all three.
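To make the first item on that list concrete, the sketch below drives the LXC user-space API from C to create, start, and stop a container. It is illustrative only: the container name "demo" and the busybox template are arbitrary choices, the exact API surface varies across liblxc releases, and the program must run as root.

```c
/* Create, start, and stop a container through the LXC user-space
 * API (liblxc). Compile with: cc demo.c -o demo -llxc */
#include <stdio.h>
#include <lxc/lxccontainer.h>

int main(void)
{
    struct lxc_container *c = lxc_container_new("demo", NULL);
    if (c == NULL) {
        fprintf(stderr, "failed to allocate container object\n");
        return 1;
    }
    /* Build a minimal root filesystem from the busybox template. */
    if (!c->createl(c, "busybox", NULL, NULL, LXC_CREATE_QUIET, NULL)) {
        fprintf(stderr, "create failed\n");
        goto out;
    }
    /* Boot the container's init; liblxc sets up the cgroups and
     * namespaces described earlier on our behalf. */
    if (!c->start(c, 0, NULL)) {
        fprintf(stderr, "start failed\n");
        goto out;
    }
    printf("container state: %s\n", c->state(c));  /* e.g., RUNNING */
    c->shutdown(c, 30);    /* clean shutdown, 30-second timeout */
    c->destroy(c);         /* remove the on-disk root filesystem */
out:
    lxc_container_put(c);  /* drop our reference */
    return 0;
}
```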
3.1 Usage Model: Platform as a Service on Software-Defined Infrastructure

The container model of virtualization is a good fit with the rack-scale architecture (RSA) approach favored in Intel’s vision for the re-imagined data center. RSA looks ahead to the need for dramatically increased levels of server resources in data centers to support developments such as the Internet of Things, which is expected to add billions of connected devices to the global infrastructure in the next decade. To support this vast internetwork, server architectures must allow resources to shift around dynamically as workloads change and adapt.

The RSA approach to meeting that challenge is to pool the processing, memory, networking, and storage resources of an entire server rack, disaggregating them from the host-level units to which we are accustomed. Physically, these components need not even be arranged in the form of servers; for instance, a rack could be provisioned using a chassis for each type of component and the hardware dependencies needed to support it. Logically, the rack replaces the individual system as the unit of management in the data center, with one rack’s worth of each resource referred to as a “tray.” That is, one rack as a logical unit consists of a processor tray, a memory tray, and so on.

The attributes of the hardware resources are exposed to the software layer, which provisions virtual systems on a dynamic, as-needed basis, according to the requirements of a specific workload or task.


RSA offers tremendous flexibility by tailoring software-defined infrastructure to the needs of applications and workloads in real time. This approach optimizes efficiency by dedicating only the level of resources needed for the task at hand, without the need to reserve idle headroom, because the virtual system will be disbanded and the resources returned to the rack-level pool before system demands could shift. It also offers a means of building a system that is far more reliable than any of its components, because each of them is inherently redundant and interchangeable. Overall, RSA extends data-center capabilities and drives up asset utilization, while reducing overall costs.

Container-based virtualization has obvious utility within this approach, because it enables extremely rapid provisioning and de-provisioning of resources, compared to hypervisor-based virtualization. Here, the container environment is scaled horizontally using a platform-as-a-service (PaaS) model, with a cluster-wide resource manager running as a service on all the nodes, coordinated by a central executive scheduler. Several such resource managers integrate with Intel’s RSA model, including the following:

• SLURM (Simple Linux Utility for Resource Management), an open-source project

• TORQUE (Terascale Open-source Resource and QUEue Manager), a community-based project

• Omega, created by Google

• Mesos, an Apache Software Foundation open-source project

• YARN (Yet Another Resource Negotiator), an Apache Software Foundation open-source project for Hadoop*

Usages that particularly benefit from the use of containers on PaaS architectures are those cases where long-running data-services applications with relatively higher priority can run in a multitenant fashion alongside fast-completion, lower-priority processing tasks such as MapReduce* or Apache Spark*. Implementation of this model is accelerating in segments such as high-performance computing, cloud and big data, telecommunications, the Internet of Things, and financial services (particularly to support high-frequency trading).

3.2 Usage Model: Combined Hypervisor-Based VMs and Containers in the Public Cloud

As they plan how to utilize the cost-effective, on-demand infrastructure offered by public clouds, many organizations see potential efficiencies from deploying workloads to these clouds in containers rather than hypervisor-based VMs. The resulting reduced compute overhead could significantly decrease the amount of resources they would be billed for, reducing operational expenses. At the same time, however, using containers may not meet organizations’ needs in terms of isolating data among their own workloads or from other unknown organizations in the shared multitenant environment. In this scenario, both containers and hypervisor-based VMs have unique advantages:

• Hypervisor-based workload isolation. The ability to isolate data on a per-VM basis, even within a shared multitenant infrastructure, is well established, including partitioning shared physical memory and I/O devices. Doing so is critical to data protection, particularly in cases of sensitive or regulated workloads.

• Container-based resource efficiency. Sharing the OS and some libraries and binaries among containers can dramatically reduce virtualization overhead, increasing provisioning speed, performance, and the number of client endpoints that can be supported with a given level of computing resources. That efficiency translates into cost savings for the customer.

The benefits of both approaches can be combined to a significant extent by the model where a customer provisions a set of VMs on a public cloud and then partitions each of those VMs using containers, as shown in Figure 4.

Figure 4. Containers inside hypervisor-based virtual machines.

Instances of applications deployed in those containers could achieve a valuable balance between data protection and efficiency, particularly by using large numbers of containers per VM and a stripped-down OS such as CoreOS, which is built explicitly for containers. At the same time, customers can take advantage of the global reach, cost-effectiveness, and elasticity of PaaS environments offered by global providers such as Amazon and Google.

Moreover, that PaaS environment could use a hybrid cloud model, meaning that it could include containers running on VMs within both internal data centers and public clouds. A virtual network with an umbrella DNS namespace could encompass the entire multi-site environment, providing a single, global service that is resolved using DNS and routing techniques. Hybrid cloud infrastructure can help deliver the benefits of public cloud services while also taking full advantage of existing internal infrastructure assets. Running containers on the VMs within that hybrid cloud provides the rapid provisioning and efficiency benefits of containers, as well.

Deploying combinations of containers and hypervisor-based VMs as described above can help organizations take advantage of public clouds to meet a variety of business needs. For example, many companies—even large multinationals—have consolidated their data-center infrastructures into a small number of physical facilities. While that strategy enables them to achieve cost savings, it also potentially extends the physical distance (and accordingly, the number of intermediate router connections, or “hops”) to some endpoints, thereby increasing the latency over the connections to those endpoints. In response, as the number and types of connected devices continue to expand, organizations are assessing the potential for containers in conjunction with hypervisor-based virtualization on leased public clouds.

That approach is particularly germane to supporting real-time or otherwise latency-sensitive workloads based on global networks of mobile devices. One example might arise in an organization fielding a global network of edge devices such as point-of-sale (POS) systems, customer kiosks, smart vending machines, or sensor arrays. Exchanging data between such endpoints and core back-end systems may be a key requirement for real-time data analytics, which can create business value by guiding tactical and strategic decision making in areas such as demand forecasting, dynamic price optimization, and supply-chain management.

Many such implementations could need low-latency connectivity in regions that are geographically distant from any of the company’s owned data centers. Likewise, an organization might have highly elastic capacity requirements for certain applications and workloads, separately or in conjunction with the global-endpoint example given above. Workload characteristics that could create such elastic demand might include holiday shopping causing a spike in activity on retail POS systems, seasonal variations in travel volume impacting usage of airline kiosks at airports all over the world, and hot weather or a major sporting event producing a temporary rise in sales from connected vending machines.

4 Enabling Technologies from Intel for Container-Based Virtualization

In keeping with its strategic vision for RSA and software-defined infrastructure, Intel has invested in a number of technologies that will help optimize performance and data protection within container environments.

4.1 Intel® Data Plane Development Kit

To increase agility and reduce costs, the industry is moving toward solutions that decouple network functions from the underlying hardware by means of abstraction. This transition encompasses the interrelated efforts of software-defined networking (SDN) and network function virtualization (NFV).

The Intel® Data Plane Development Kit (Intel® DPDK) is a set of software libraries that support the hardware abstraction required by SDN and NFV, as well as dramatically accelerating packet processing. Those capabilities enable application, control, and signal-processing workloads to be transitioned away from special-purpose hardware, to standards-based servers based on Intel® processors. With the reduction or elimination of network processor units (NPUs), coprocessors, application-specific integrated circuits (ASICs), and field-programmable gate arrays (FPGAs), the hardware environment becomes simpler, more cost-effective, and more scalable.

The more homogeneous, standardized architecture also better supports the dynamic definition of systems in software that underlies the RSA and software-defined infrastructure described earlier in this paper. Intel DPDK therefore can play a significant role in creating an optimal environment for the implementation of container-based virtualization. Key software-library components of Intel DPDK include the following:


• Environment Abstraction Layer provides access to low-level resources such as hardware, memory space, and logical cores using a generic interface that obscures the details of those resources from applications and libraries.

• Memory Manager allocates pools of objects in huge-page memory space. An alignment helper ensures that objects are padded, to spread them equally across DRAM channels.

• Buffer Manager significantly reduces the time the OS spends allocating and de-allocating buffers. The Intel DPDK pre-allocates fixed-size buffers, which are stored in memory pools.

• Queue Manager implements safe lockless queues (instead of using spinlocks), allowing different software components to process packets while avoiding unnecessary wait times.

• Flow Classification incorporates Intel® Streaming SIMD Extensions (Intel® SSE) to produce a hash based on tuple information, improving throughput by enabling packets to be placed quickly into processing flows.
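The following minimal C sketch (not from the paper) shows how three of those components surface in the DPDK API: EAL initialization, a pre-allocated packet-buffer pool, and a lockless ring. The names "mbuf_pool" and "pkt_ring" and the sizes are arbitrary, and exact function signatures vary across DPDK releases.

```c
/* Sketch of DPDK resource setup: EAL, a packet-buffer pool, and a
 * lockless ring. Treat signatures as release-dependent. */
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_lcore.h>
#include <rte_debug.h>
#include <rte_mempool.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

int main(int argc, char **argv)
{
    /* Environment Abstraction Layer: discovers cores, huge-page
     * memory, and devices behind a generic interface. */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL initialization failed\n");

    /* Memory/Buffer Manager: pre-allocate 8,192 fixed-size packet
     * buffers in a huge-page-backed pool on the local NUMA socket. */
    struct rte_mempool *pool = rte_pktmbuf_pool_create(
        "mbuf_pool", 8192, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
        rte_socket_id());
    if (pool == NULL)
        rte_exit(EXIT_FAILURE, "mbuf pool creation failed\n");

    /* Queue Manager: a lockless ring (single producer, single
     * consumer) for handing packets between pipeline stages. */
    struct rte_ring *ring = rte_ring_create(
        "pkt_ring", 1024, rte_socket_id(),
        RING_F_SP_ENQ | RING_F_SC_DEQ);
    if (ring == NULL)
        rte_exit(EXIT_FAILURE, "ring creation failed\n");

    /* A real application would now receive, classify, and transmit
     * packets; this sketch stops at resource setup. */
    return 0;
}
```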

Successful implementation of NFV depends to a large extent on robust virtual switching capabilities. Open vSwitch is a production-quality, multi-layer virtual switch distributed under the Apache 2.0 License. Intel is committed to working with the Open vSwitch community to extend it with support for Intel DPDK. This support is now included in the openvswitch.org source tree.

A priority for the integration of Open vSwitch and Intel DPDK is supporting NFV applications where high-frequency, small packet sizes are common. To address the challenges presented by processing small packets, Intel provides a user-space kernel-forwarding module (data plane) as part of Open vSwitch. The modifications made to create the Intel® DPDK Accelerated Open vSwitch (Intel® DPDK vSwitch) resulted in dramatic improvements to packet-switching throughput, providing an improved option for use with NFV use cases.

Ongoing work is planned to help ensure Intel DPDK vSwitch compatibility with the full range of container-based virtualization solutions and to make contributions to the main Open vSwitch project. In particular, development is underway to support direct assignment of single-root I/O virtualization (SR-IOV) virtual functions, as well as finer-grained direct assignment at the queue-pair level (i.e., RX and TX queues).

4.2 Intel® Solid-State Drive Data Center Family for PCI Express*

The reduction in virtualization overhead that is enabled by containers compared to hypervisor-only topologies allows for more application instances, users, or transactions to be run with the same level of system resources. One effect of that capability is that it places greater demand for simultaneous data accesses on the storage subsystem. Therefore, containers are particularly sensitive to the inherent limitation that conventional, spinning hard disk drives (HDDs) do not support parallel access to data.

Conversely, the NAND memory in solid-state drives (SSDs) inherently supports parallel data access, in addition to superior data-transfer rates and latency, compared to HDDs. Therefore, SSDs have the clear advantage over HDDs for the large numbers of simultaneous instances and rapid provisioning and de-provisioning associated with container-based virtualization. Those advantages are particularly pronounced for enabling highly parallel big data and high-performance computing workloads using containers.

Even using SSDs, however, the throughput limitation of SATA interfaces between SSDs and the motherboard is a further potential bottleneck. To address that issue, an industry consortium led by Intel developed the new Non-Volatile Memory Express* (NVMe) specification, which standardizes connectivity for SSDs and other non-volatile memory (NVM)-based devices using PCI Express* (PCIe*) 3.0, to dramatically improve data-transfer rates and latency.

Based on the NVMe standard, the Intel SSD Data Center Family for PCI Express offers a number of benefits that make it well suited to the needs of container-based virtualization:

• Dramatically improved performance. The Intel SSD Data Center Family for PCI Express provides up to six times faster data transfer speed than 6 Gbps SAS/SATA SSDs.2

• Optimized for multi-core processors. Hardware and software parallelism enhance performance with features such as deeper command queues, parallel interrupt processing, and lockless synchronization.

• Advanced software support. Intel’s NVMe driver has been incorporated in the Linux kernel since March 2012, and management capabilities are provided by the Intel® SSD Data Center Tool.

• Enhanced RAS. Rigorous qualification and compatibility testing, plus 230 years Mean Time Between Failures (MTBF), helps provide business-critical dependability.

4.3 Enhanced Workload Isolation with Intel® Virtualization Technology

A hypervisor-based VM provides a hardware-isolated environment for provisioning containers. While the containers within the VM are isolated from each other using only software capabilities, the group of containers within the VM as a whole can be isolated from the rest of the world using Intel VT.

As discussed above, the Linux kernel provides namespace isolation to restrict access to resources as needed.

The measure of assurance provided by this method, however, is limited to some degree by the fact that it is based in software, making it at least potentially susceptible to software-based attacks. In theory, a malicious process could reach outside its container by attacking the Linux OS and breach its separation from other containers, compromising all data and resources on the host.

Intel VT can mitigate the risk of software-based attacks with data-protection measures based on hardware features of processors, chipsets, and network controllers and adapters based on Intel architecture. Both open-source and proprietary hypervisors and other software offerings are deeply integrated with these hardware-based capabilities, through co-engineering and enabling activities by Intel. Hardware-based isolation of data, memory, and other resources assigned to a specific VM—beneath the level of system software such as OSs and hypervisors—provides an added level of data protection, beyond what’s possible with containers operating on bare metal.

Containers that are provisioned inside a hypervisor-based VM have hardware-enforced access only to the code, data, and memory space specified by the hypervisor; Intel VT restricts them from paging outside the VM’s memory boundaries. Applications can also be assigned to dedicated processor cores, to increase application isolation even further. Likewise, malware that has infiltrated one VM is effectively contained there, being prevented from affecting the contents of others. Noteworthy processor-level features of Intel VT that contribute to data protection within VMs include the following:

• Descriptor-table exiting. This feature helps protect guest operating systems from internal attack by preventing the relocation of key system-data structures.

• Pause-loop exiting. Taking advantage of the fact that spin-locking code typically uses PAUSE instructions in a loop, this feature detects when the duration of a loop is longer than “normal” (a sign of lock-holder preemption).

Intel® VT for Directed I/O (Intel® VT-d) operates at the chipset level, providing additional capabilities to enhance hardware-based isolation of VMs, including the following:

• Direct Memory Access (DMA) remapping. I/O devices move data independently of the processor using DMA. Intel VT-d enables the OS or hypervisor to create protection domains, which are isolated sections of physical memory to which one or more I/O devices can be assigned. The DMA remapping hardware provided by Intel VT-d uses address-translation tables to block access to protection domains by I/O devices not assigned to them.

• Direct assignment of I/O devices. Specific VMs can be assigned exclusive access to particular physical or virtual I/O devices, each of which is given exclusive access to a dedicated area of physical system memory; other VMs and I/O devices are restricted from reading data from or writing data to those memory locations.

5 Containers in Real-World Implementations

Exploring environments where container-based virtualization is in broad use helps characterize where this model is appropriate, where hypervisors are the better choice, and where the best option is for the two approaches to work together.

5.1 Google: Combinations and Layers of Complementary Containers and Hypervisors

Many of the technologies that enable container-based virtualization were developed at Google. With one of the world’s largest computer infrastructures, the company continuously looks for more lightweight, flexible approaches to virtualize those resources. At GlueCon in May 2014, Joe Beda, a senior staff software engineer for the Google Cloud Platform, declared to the audience during his presentation that “Everything at Google runs in a container.” A high-level look at some of the architecture associated with those containers is instructive.

Google presently has six main cloud product offerings: three storage services (based on a MySQL-compatible relational database, an object store, and a NoSQL database, respectively), the BigQuery analysis service for big data, and two compute services, described below. The use of containers and hypervisors in the compute-services architecture reveals one perspective regarding how these two forms of virtualization contrast and complement each other.

• Google App Engine* (GAE) is a container-based PaaS product that offers App Servers to customers—containers that each hold one or more instances of a customer’s application, which have reserved, guaranteed resource levels on Google’s infrastructure.

• Google Compute Engine* (GCE) is a hypervisor-based infrastructure-as-a-service (IaaS) product that offers Linux-based VMs running on top of KVM. Interestingly, KVM itself is installed in a container, and CoreOS (which can only run applications in containers) is emerging as the basis for the default VM image on GCE. Thus, by default, a GCE application runs in a container, within a VM, on top of a hypervisor, which is in a container, as illustrated in Figure 5.

Figure 5. Layers of nested containers and hypervisor-based virtual machines.

Using containers to subdivide resources among GAE end customers offers efficient use of hardware resources, as well as flexibility to customers in terms of static (manual) or dynamic (automated) assignment of those resources. GAE offers simple creation and hosting of applications, websites, and services, with some trade-offs in terms of flexibility, due to the model requiring Google to preselect the finite set of languages, APIs, and frameworks that GAE supports.

The architectural differences between the two offerings are significant. Consider two hypothetical customers that must choose between GAE and GCE. The first evaluates both services as the core infrastructure for a straightforward web-based business application and chooses GAE, planning to develop in PHP with Google’s Cloud SQL relational-database service on the back end. The lightweight containers model is a good fit in this case.

The second hypothetical customer wants to deploy a Hadoop* cluster within Google’s cloud platform and manage it with Cloud Foundry*, as the basis for worldwide data collection that feeds into real-time data analytics. GAE does not support this use case, and the customer opts to deploy Hadoop and the related business logic in containers built on top of GCE VMs. The implementation uses the BOSH Cloud Provider Interface (CPI) to deploy all the distributed software, and the Google Cloud Datastore NoSQL database service for the application’s large repository of non-relational data.

5.2 Heroku and Salesforce.com: Container-Based CRM, Worldwide

As one of the largest SaaS companies, Salesforce.com is another example of a technology provider that uses container-based virtualization in tandem with other approaches to partition compute resources. The company has steadily expanded its portfolio of product offerings in the past several years, building outward from its core Customer Relationship Management (CRM) software-as-a-service (SaaS) offering.

In particular, the company has added applications and services (many of a business-oriented social-networking nature) that integrate with and add value to customers’ CRM data. While those additions originate largely from internal development or acquisitions, Salesforce also has a substantial commitment to facilitating the development of software that integrates with its CRM by third parties. The AppExchange* app store has been active since 2005, joined in 2007 by Force.com, a PaaS for the development and deployment of third-party applications that consume Salesforce CRM data.

In May 2014, the container-based PaaS operated by Heroku, a Salesforce subsidiary, was richly integrated for the first time with Salesforce CRM as a data source. Like Force.com, Heroku is designed for developers to build and deploy customer-facing applications, mobile and otherwise. In contrast, however, where Force.com primarily serves developers within enterprises building apps for employees, Heroku customers tend to be at start-ups and smaller independent software vendors. Heroku’s architecture is also substantially different from the general approach at Salesforce, as shown in Table 1.

Before this integration, providing access to Salesforce data through a Heroku application involved relatively complex manipulation of APIs. The new topology simplifies that process while preserving the integrity of established workflows, toolchains, and expertise among Heroku developers.

Heroku applications and services access Salesforce data from a Postgres* database at Heroku that synchronizes in both directions with the main Salesforce CRM data store, which is based on Oracle Database* and custom-built object storage.

Table 1. Comparison of the Salesforce/Force.com and Heroku architectures.

|                         | Salesforce/Force.com                                                     | Heroku                                   |
|-------------------------|--------------------------------------------------------------------------|------------------------------------------|
| Physical Infrastructure | Company-owned data centers                                               | Amazon Web Services* (AWS)               |
| Resource Partitioning   | Custom multitenant architecture (details not disclosed)                  | Containers inside Xen*-based AWS VMs     |
| IT Service Delivery     | SaaS (Salesforce); PaaS (Force.com)                                      | PaaS                                     |
| Data Store              | RDBMS: Oracle Database*; object store: Fileforce (proprietary)           | Custom Postgres* variant                 |
| Programming Languages   | Apex (proprietary); abstractions allow development without writing code  | Ruby*, Node.js, Python*, Java*, others   |
| Pricing Basis           | Number of users                                                          | Capacity (e.g., number of HTTP requests) |

Establishing data synchronization between Salesforce CRM data and the Heroku application development and hosting PaaS enables organizations to extend the reach of their business data globally through the Amazon public cloud. That reach can strengthen mobility initiatives targeting sales teams and other field resources, while taking advantage of the efficiencies of container-based virtualization and the data protection of hypervisor-based VMs. This infrastructure demonstrates the ability and value of containers to harmoniously enhance complex environments, without displacing existing architectures.

6 Conclusion

Container-based virtualization is emerging as a viable solution for mainstream data-center operators, helping them drive greater efficiency, flexibility, and performance. The capabilities of this model are an excellent fit with strategic trends that IT decision makers are likely to be already implementing or considering, from hypervisor-based virtualization to SDN. Intel is investing in the development of hardware and software technologies that will support this trend, offering a high degree of performance optimization for Intel architecture-based servers, as well as enhanced, hardware-based data protection. A growing solution ecosystem is enriching the range of technology choices available, for use in public, private, and hybrid clouds. Integration among those options is growing as well.

Looking ahead, the addition of containers to IT infrastructures promises to help make businesses more agile, cost-effective, and capable of meeting their key objectives.

Learn more about Intel® Technology in Communications: www.intel.com/communications

1 Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance or other benefits will vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For more information, visit www.intel.com/go/virtualization.

2 Based on the Intel® Solid-State Drive DC P3500, P3600, and P3700 Series Product Specifications. Random I/O Operations based on IOPS.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel’s Web Site http://www.intel.com/.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.

*Other names and brands may be claimed as the property of others. Copyright © 2014 Intel Corporation. All rights reserved. Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and other countries. 0914/BY/MESH/PDF 330678-001US