Osprey: Operating System for Predictable Clouds

Jan Sacha, Jeff Napper, Sape Mullender
Bell Laboratories, Alcatel-Lucent
Antwerp, Belgium

Jim McKie
Bell Laboratories, Alcatel-Lucent
Murray Hill, NJ

Abstract—Cloud computing is currently based on hardware virtualization wherein a host operating system provides a virtual machine interface nearly identical to that of physical hardware to guest operating systems. Full transparency allows backward compatibility with legacy software but introduces unpredictability at the guest operating system (OS) level. The time perceived by the guest OS is non-linear. As a consequence, it is difficult to run real-time or latency-sensitive applications in the cloud. In this paper we describe an alternative approach to cloud computing where we run all user applications on top of a single cloud operating system called Osprey. Osprey allows dependable, predictable, and real-time computing by consistently managing all system resources and exporting relevant information to the applications. Osprey ensures compatibility with legacy software through OS emulation provided by libraries and by runtime environments. Osprey's resource containers fully specify constraints between applications to enforce full application isolation for real-time execution guarantees. Osprey pushes much of the state out of the kernel into user applications for several benefits: full application accounting, mobility support, and efficient networking. Using a kernel-based packet filter, Osprey dispatches incoming packets to the user application as quickly as possible, eliminating the kernel from the critical path. A real-time scheduler then decides on the priority and order in which applications inspect their incoming packets while maintaining the limits set forth in the resource container. We have implemented a mostly complete Osprey prototype for the x86 architecture and we plan to port it to ARM and PowerPC and to develop a library OS.

Keywords—cloud; virtualization; predictability; real-time

I. INTRODUCTION

A growing number of network services today are built on cloud computing infrastructure. Cloud computing is typically based on hardware virtualization wherein a host operating system (virtual machine monitor, or hypervisor) provides a virtual machine interface nearly identical to that of physical hardware. Services with strict latency or timing requirements, such as telecommunications systems, are difficult to run in current clouds: hardware virtualization introduces unpredictability at the guest operating system (OS) level that violates key assumptions the guest system makes about the hardware it is running on. The time perceived by the guest system in a virtual machine is non-linear when sharing hardware, making it difficult to provide reliable timing guarantees. Real-time scheduling is only possible if the guest system cooperates with the host operating system, which violates full transparency.

Although hardware virtualization allows backward compatibility with legacy software, full transparency comes at a high cost. In addition to real-time scheduling problems, virtualization also hides the real properties of redundant system components, such as the storage devices, compute nodes, and network, from the level of individual systems up to the level of the datacenter.

In paravirtualization [1], [2], [3], the virtual machine interface is close but not identical to the physical hardware, and the guest operating system is aware of virtualization. However, paravirtualization has been introduced to simplify the hypervisor and improve application performance rather than to provide predictable performance and dependability. For instance, the hypervisor might provide real-time information to the guest OS, but it does not cooperate in scheduling and thus the guest OS is not able to provide real-time guarantees to the applications.

In this paper we describe an alternative approach to using hardware virtualization for cloud computing. We abandon the split between host and guest operating systems to run all user applications on top of a single cloud operating system called Osprey. We trade full virtualization transparency for predictable, real-time application execution and fault-tolerance.

Osprey provides a virtual machine interface at the system call level as opposed to the instruction set (ISA) level used in hardware virtualization. This approach, known as OS virtualization [4], [5], incurs lower overhead compared to hardware virtualization, while both approaches introduce similar benefits to cloud computing, including application sandboxing, server consolidation, checkpoint/restart, and migration. Hardware virtualization introduces unnecessary overhead because an application runs on top of two operating systems (host and guest) which have separate sets of drivers, data buffers, resource management policies, etc. that often conflict.

In this paper we claim that OS virtualization allows dependable, predictable, and real-time computing since the operating system can consistently manage all system resources while exporting relevant information to the application. In Osprey, we combine OS virtualization with library operating systems [6], [7], [8], which implement legacy OS personalities in user space, providing compatibility with legacy software. We implement library OSes on top of the Osprey kernel to enable high performance, real-time guarantees, and scalability on modern, massively multicore hardware. In the following sections we describe the key features that allow Osprey to meet these goals.

II. PREDICTABLE PERFORMANCE

We expect the CPU core count in future hardware to gradually increase and memory access to become less and less uniform. Consequently, an operating system for future clouds should be able to execute real-time processes efficiently on multi-core NUMA hardware.

Similar to the multikernel model [9], we partition most of the kernel state and physical memory in Osprey between CPU cores in order to reduce synchronization cost, increase parallelism, maximize the efficiency of CPU memory caches, and reduce the load on CPU interconnects. In particular, we pin user processes to cores and we run an independent scheduler on each core. A load balancer occasionally migrates processes between cores. Given a large number of cores in the system and a smart process placement heuristic, we expect cross-core process migrations to be relatively infrequent. Each core is assigned its own memory allocation pool to reduce cross-core memory sharing. Parts of the kernel address space are shared between the cores, but there is also a region that is private to each core.
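As a rough illustration of this partitioning, the C sketch below keeps a run queue, a page pool, and a private kernel region per core; the percore structure and its fields are assumptions made for exposition and do not reflect Osprey's actual kernel data structures.

```c
/* Illustrative per-core kernel state; names and layout are assumptions
 * for exposition, not Osprey's implementation. */
#include <stdint.h>
#include <stddef.h>

#define MAX_CORES 64

struct proc;                      /* process control block (opaque here)   */

struct run_queue {
    struct proc *head;            /* runnable processes pinned to this core */
};

struct page_pool {
    void   *free_list;            /* per-core pool of free physical pages   */
    size_t  nfree;
};

/* One instance per core: fields touched only by the owning core need
 * no locks at all. */
struct percore {
    unsigned         core_id;
    struct run_queue rq;          /* independent scheduler state            */
    struct page_pool pages;       /* core-local memory allocation pool      */
    void            *private_va;  /* kernel region private to this core     */
};

static struct percore cores[MAX_CORES];

/* Allocate a page from the local pool; only an empty pool would force a
 * (rare, locked) refill from a global pool. */
void *alloc_page_local(struct percore *pc)
{
    if (pc->pages.nfree == 0)
        return NULL;              /* caller refills from the global pool    */
    void *page = pc->pages.free_list;
    pc->pages.free_list = *(void **)page;   /* pop intrusive free list      */
    pc->pages.nfree--;
    return page;
}
```

Because each structure is owned by exactly one core, the common allocation path needs no locking; only the occasional refill from a shared pool would.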
Osprey supports several types of locks and semaphores for synchronization. However, since most kernel data structures are partitioned between cores, access to these data structures can be protected using cheap locks which disable interrupts and do not require busy waiting. We use global system locks infrequently and keep critical code sections to a necessary minimum in order to avoid wasting CPU cycles on lock contention and to eliminate the unknown wait times that would undermine real-time guarantees.

Osprey generally attempts to minimize the number of traps, interrupts, and context switches since they have an adverse effect on system performance and real-time guarantees. Clock and inter-core interrupts are only used if a process needs to be preempted. Interrupts from I/O devices are made as short as possible and merely wake driver tasks, which run at their own priority and have their own resource budgets. We also redirect I/O interrupts to a noise core so that all other cores can run uninterrupted. Further, we disable clock interrupts if a core has only one runnable process. The kernel uses an inter-core interrupt to activate the scheduler if necessary.

The Osprey kernel is fully mapped in using large memory pages to avoid page faults and reduce TLB misses. Osprey also does not swap application memory to disk since swapping incurs a high performance penalty which greatly decreases application predictability.

Inter-process communication, as well as user-kernel communication, is performed using shared-memory asynchronous message queues. Traditional trap-based system calls are replaced in Osprey by these message queues, which allow user processes to batch multiple kernel requests to reduce context switches. Processes need to trap to the kernel only in cases where the process fills up its queue or waits for an incoming message. Message-based system calls also allow the kernel to handle requests on a different core while the user process is running in parallel, maximizing throughput.

Since writing event-driven applications is considered difficult, Osprey provides a lightweight, cooperative user-level threading library on top of asynchronous system calls. Osprey threads require almost no locking. When a thread yields or blocks waiting for a message, the library schedules another runnable thread, allowing the application to perform multiple system calls in parallel without ever trapping or blocking the process.
III. APPLICATION ISOLATION

Cloud applications often share hardware in order to run in a cost-effective manner, amortizing the costs of execution over multiple applications. It is crucial for a cloud operating system to isolate applications from each other in order to provide them with enough resources to meet their Service-Level Agreements (SLAs). Current SLAs offered by cloud providers specify only an annual uptime because fine-scale performance is very hard to guarantee with virtual machines sharing hardware resources.

Each application in Osprey runs in a resource container that controls resource consumption by the application. We distinguish between two types of resources: preemptive and non-preemptive. Preemptive resources, such as CPU time or network interface bandwidth, can be freely given to the application and taken back by the kernel when needed. Resource containers thus specify the minimum amount of preemptive resources guaranteed for the application.

Non-preemptive resources, such as memory, cannot be easily taken back once granted (note that we abandon swapping due to its high cost), and thus resource containers limit the maximum resource consumption. If the kernel does not over-provision, these limits are equivalent to guarantees. Further, an application can ensure receiving enough non-preemptive resources by allocating them upfront.
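A resource container specification along these lines might look like the following C sketch; the struct, its field names, and the admission check are illustrative assumptions rather than Osprey's actual interface.

```c
/* Illustrative resource-container specification; not Osprey's real API. */
#include <stdint.h>
#include <stdbool.h>

struct rcontainer {
    /* Preemptive resources: minimum shares the kernel guarantees and can
     * temporarily reclaim when needed. */
    uint32_t cpu_min_pct;         /* guaranteed share of CPU time            */
    uint64_t net_min_bps;         /* guaranteed network interface bandwidth  */

    /* Non-preemptive resources: hard caps; with no over-provisioning and
     * no swapping, a limit is also a guarantee, and an application can
     * secure it by allocating the memory up front. */
    uint64_t mem_max_bytes;
};

/* Simple admission check: accept a new container only while the machine
 * can still honor every guarantee it has already handed out. */
bool rc_admit(const struct rcontainer *rc,
              uint32_t cpu_pct_left, uint64_t mem_bytes_left)
{
    return rc->cpu_min_pct   <= cpu_pct_left &&
           rc->mem_max_bytes <= mem_bytes_left;
}
```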
A resource container in Osprey describes the maximum amount of CPU time the application is allowed to consume and the maximum delay the application can tolerate (its deadline) when receiving an external event such as a network packet. A real-time scheduler guarantees that all applications obey the limits set in their resource containers. Besides time-division multiplexing, resource containers can also divide entire cores between applications. This way of managing CPU resources is more coarse-grained compared to time sharing, but it introduces less overhead and enables more predictable computing.
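The prototype kernel uses an Earliest Deadline First (EDF) real-time scheduler (Section VIII). The sketch below shows the core of an EDF decision over such per-container deadlines; the task structure and the deadline bookkeeping are illustrative assumptions, not Osprey's scheduler code.

```c
/* Minimal Earliest-Deadline-First selection sketch. */
#include <stdint.h>
#include <stddef.h>

struct task {
    uint64_t     abs_deadline_us; /* absolute deadline of the pending event  */
    int          runnable;
    struct task *next;            /* per-core run list (see Section II)      */
};

/* Pick the runnable task with the earliest absolute deadline. */
struct task *edf_pick_next(struct task *runq)
{
    struct task *best = NULL;
    for (struct task *t = runq; t != NULL; t = t->next) {
        if (!t->runnable)
            continue;
        if (best == NULL || t->abs_deadline_us < best->abs_deadline_us)
            best = t;
    }
    return best;                  /* NULL: the core may idle and its clock
                                   * interrupt can stay disabled             */
}
```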
A resource container also defines the application's memory budget, both for the main memory (RAM) and external storage (e.g., disk space). Using techniques such as cache coloring, potentially with support from future hardware, a resource container could also control the use of CPU caches (TLB, L1, L2, L3) and I/O bus cycles by user applications. Finally, a resource container also limits the maximum number and total bandwidth of network packets that an application is allowed to send.

An Osprey resource container can also provide a private namespace to the application by translating between global system identifiers used by the kernel and local names used by the application. Namespace isolation improves application security and provides configuration independence.
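As an illustration of such a translation layer, the sketch below maps container-local names to global identifiers; the table layout and the /dev/console alias (taken from Section VI) are examples, not Osprey's real namespace implementation.

```c
/* Illustrative per-container name translation. */
#include <string.h>
#include <stddef.h>

struct ns_entry {
    const char *local;            /* name the application uses               */
    const char *global;           /* identifier the kernel uses              */
};

/* Example: a container-private alias that can be remapped on migration. */
static const struct ns_entry ns_table[] = {
    { "/dev/console", "/dev/tty7" },
};

/* Translate an application-local name to the global system identifier;
 * unknown names fall through unchanged. */
const char *ns_resolve(const char *local)
{
    for (size_t i = 0; i < sizeof ns_table / sizeof ns_table[0]; i++)
        if (strcmp(ns_table[i].local, local) == 0)
            return ns_table[i].global;
    return local;
}
```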
Since cloud applications can share physical network interfaces, Osprey needs to manage the use of low-level network addresses by user applications. As with current cloud systems, Osprey will multiplex low-level network protocols in the kernel, giving the illusion that every application has exclusive access to the network. However, this illusion can always be broken to verify whether applications share physical hardware in order to guarantee fault-tolerance.

Isolation is further enhanced in Osprey by moving most of the traditional operating system's functionality to user space. This approach allows Osprey to enforce simple application isolation and accounting. Osprey applications require very little kernel state and spend very little time in the kernel because they run entire network protocol stacks and most file systems in user space.

IV. BACKWARD COMPATIBILITY

In order to successfully deploy a new cloud infrastructure, it is important to be able to execute legacy applications. There are several ways of supporting legacy applications in Osprey: legacy system emulation, library operating systems, and porting native runtime environments. These methods differ in the degree of compatibility with legacy code and provide a tradeoff between simple migration and modifying the code to take advantage of Osprey services.

The most powerful mechanism to execute unmodified binary applications is based on legacy system emulation. In this approach, a legacy application is associated with a peer process, which we call a buddy operating system (buddyOS), that handles all system traps from the legacy application, such as system calls and page faults, and provides kernel file systems for devices and processes such as /dev and /proc. Although developing operating system emulators requires a significant engineering cost, it has been demonstrated to be feasible even for the largest commercial operating systems [10].

A second mechanism supports unmodified legacy code by relinking the source code to a library operating system (libOS) implementation. A libOS is a user-level library that implements the legacy functions (and possibly some system libraries) and translates them all to native Osprey system calls. It also provides legacy file systems. Compared to system emulation for binary compatibility, a libOS is more efficient because it runs in the same address space as the user application. It has been shown that a libOS can successfully run sophisticated commercial applications [6], [7], [11].

Finally, Osprey will implement native runtime environments for programming languages such as Python, or runtime environments like POSIX, allowing Osprey to run unmodified legacy applications (binaries, bytecode, or scripts) written in these languages or environments. This approach would allow Osprey to provide an interface to user applications similar to commercial platforms such as Google App Engine.

Importantly, buddyOSes, libOSes, and runtime environments will be implemented natively against the Osprey kernel API, taking advantage of Osprey's scalability and predictable (real-time) performance. The compatibility layers will extend Osprey's functionality but will avoid, as much as possible, redundant resource management. In particular, they will rely on the Osprey scheduler, resource containers, and memory management.

V. EFFICIENT NETWORKING

A cloud operating system needs to support efficient networking for user applications. Ideally, a cloud operating system should provide: low-latency communication in order to maximize application responsiveness, maximum throughput in order to fully utilize hardware, and support for mobility events such as migration to new machines and network address changes.

In order to address these goals in Osprey, we run all network protocol stacks in user space together with the user application. Such a design has a number of advantages. By dispatching incoming packets from the network interface to the user application as quickly as possible, we eliminate the kernel from the critical path and allow user applications to receive packets with a minimum delay. A real-time scheduler decides on the priority and order in which applications can inspect their incoming packets while maintaining the limits set forth in the resource container. Similarly, the scheduler serializes outgoing packets to the hardware from different applications.
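The sketch below illustrates early demultiplexing with a kernel-resident packet filter: a small rule table maps packet headers to per-application receive queues, so the kernel only classifies and enqueues. The rule format and queue interface are hypothetical stand-ins for the mechanism described above.

```c
/* Illustrative early-demultiplexing packet filter; rule and queue layouts
 * are hypothetical, not Osprey's packet-filter interface. */
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

struct pkt {
    uint8_t     proto;            /* e.g. 6 = TCP, 17 = UDP                  */
    uint16_t    dst_port;
    const void *data;
    size_t      len;
};

struct app_queue;                 /* shared-memory RX ring of one application */
bool app_queue_push(struct app_queue *q, const struct pkt *p);

struct filter_rule {
    uint8_t           proto;
    uint16_t          dst_port;
    struct app_queue *dest;       /* user-level stack that owns this flow    */
};

#define MAX_RULES 256
static struct filter_rule rules[MAX_RULES];
static size_t nrules;

/* Called from the driver task: classify and hand off, nothing more.
 * Protocol processing happens later, in user space, on the core where the
 * owning application runs and under its resource container. */
void demux(const struct pkt *p)
{
    for (size_t i = 0; i < nrules; i++) {
        if (rules[i].proto == p->proto && rules[i].dst_port == p->dst_port) {
            app_queue_push(rules[i].dest, p);   /* may wake the application  */
            return;
        }
    }
    /* No matching rule: drop, charging no application for the work. */
}
```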
User-level networking also improves the utilization of modern multi-core hardware. By generating and consuming network packets on their respective application cores, we parallelize network traffic processing and improve memory cache efficiency since each packet is mostly accessed on one core.

User-level networking simplifies operations in cloud environments such as application checkpointing, suspension, resumption, and migration. In Osprey, much of the state that is maintained by the kernel in traditional operating systems is moved to user-space libraries. Pushing all the application-specific state (connection status, packet buffers, etc.) into the user application greatly simplifies application marshaling and unmarshaling by the kernel. Further, it allows customized network protocol stacks to respond to mobility and fault events in an application-appropriate way.

Finally, we envisage that the main bottleneck when delivering high-volume traffic from the network interface to the user application is the host's memory bus. Indeed, it has been shown that more and more applications (especially multimedia and telecommunication) are memory-bound. In order to achieve high throughput, Osprey thus supports zero-copy networking.
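A minimal sketch of the zero-copy idea: the kernel and the user-level stack exchange small descriptors that reference packet buffers in a shared region, so payloads cross the memory bus once when the NIC writes them and are never copied again. The descriptor format is an assumption for illustration only.

```c
/* Illustrative zero-copy hand-off: descriptors reference buffers in a
 * shared region instead of copying payloads; names are hypothetical. */
#include <stdint.h>

struct buf_desc {
    uint32_t offset;              /* offset into the shared buffer region    */
    uint32_t len;                 /* bytes written by the NIC                */
    uint32_t refcnt;              /* buffer returns to the RX pool at zero   */
};

/* The application receives only this small descriptor; the payload stays
 * where the NIC DMA engine wrote it. */
static inline const void *pkt_payload(const uint8_t *shared_region,
                                      const struct buf_desc *d)
{
    return shared_region + d->offset;
}

/* When the user-level stack is done, it releases the descriptor rather
 * than freeing or copying the data (single-owner case shown). */
static inline void pkt_release(struct buf_desc *d)
{
    if (--d->refcnt == 0)
        d->len = 0;               /* slot can be recycled into the RX ring   */
}
```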

VI. CHECKPOINT AND RESTART

A cloud computing infrastructure needs to support application checkpointing and restarting for reasons such as fault tolerance and energy saving. Checkpoint and restart functionality also enables application migration if the target application can be restarted on a new machine.

Osprey supports application checkpoint, restart, and migration in two ways. First, due to library and buddy operating systems, it stores almost all application state in user space and can thus efficiently save and restore it. Second, resource containers in Osprey isolate application namespaces, preventing name clashes when applications are restarted or migrated. Private namespaces also decouple applications from non-portable, host-specific names, such as /dev/tty7, and help virtualize resources by providing aliases, such as /dev/console, which can be remapped upon migration.

VII. RELATED WORK

Arguably, many ideas described in this paper are not new. However, the mechanisms and techniques that we use in Osprey have mostly been studied in isolation, while we combine them into one consistent architecture for dependable cloud computing. In the following, we outline the systems that had the strongest impact on Osprey's design.

The original resource containers [12] were introduced to manage resource consumption by user applications. In traditional operating systems, protection domains and resource principals coincided with processes, and the kernel provided fairness by treating all processes (or threads) alike. Resource containers essentially allow the kernel to divide resources between arbitrary activities which may span multiple processes and threads and may involve both user-space and kernel-space resource consumption.

A Jail [13] is a facility originally introduced in FreeBSD which allows partitioning the system into administrative domains with separate users, processes, file systems, and network addresses. While resource containers provide performance isolation, jails provide security isolation.

Zap [14] introduced a Process Domain (POD) abstraction which essentially provides a private namespace to the processes it contains. It is implemented as a thin layer between user processes and the kernel which translates names used in system calls, such as process identifiers, IPC keys, filenames, device names, and network addresses, to pod-local names. By decoupling pods from global system names, Zap prevents name clashes when pods are checkpointed, restarted, and migrated.

A technique that combines resource containers, jails, and pods, and thus provides performance, security, and configuration isolation, is known as operating system (OS) virtualization [4], [15]. OS virtualization has been implemented in major operating systems, for example in the form of Zones [16] in Solaris and VServers [4] and Virtual Private Servers (VPS) [17] in Linux. A VPS is transparent to user applications and provides an environment almost identical to a non-virtualized Linux system running natively on hardware. VPS enables Application Checkpointing and Restart (ACR) [17] in Linux.

Scout [18] and its Linux implementation, called SILK [19], provide an architecture for efficient network traffic prioritization and processing. SILK classifies every incoming packet using a packet filter [20] and assigns it to a path, i.e., a chain of processing elements such as network protocol engines and user applications. Every path is scheduled based on its priority. Osprey achieves a similar goal by early demultiplexing of network traffic using a packet filter and handing packets to user-level libraries which run together with the application and are subject to scheduling and resource container policies.

Several operating systems address modern and future massively multicore hardware. Interestingly, all these systems are based on three simple observations. First, centralized or shared data structures are avoided to reduce synchronization overhead. Second, sharing between cores is minimized, and thereby locality is maximized, to enable efficient use of CPU caches and interconnects. Finally, the kernel exposes information about hardware to user applications to allow them to optimize their performance.

Corey [21] is an operating system that allows users to explicitly decide which data structures, such as file descriptors, sockets, and address space ranges, are shared between processes and threads. By avoiding unnecessary sharing it reduces synchronization overhead and improves caching performance. Corey also allows applications to dedicate cores to execute kernel functions, for example to manage physical devices, to reduce cross-core sharing.

The Factored Operating System (fos) [22] requires each core to specialize and execute a single user or kernel process only. Processes do not migrate between cores and communicate by exchanging messages. Fos thus replaces time sharing with space sharing.

Similarly, NIX [23] introduces three types of cores which can execute either a single user process, a group of kernel tasks, or a mixture of kernel tasks and user processes. In the first case, the core does not receive any interrupts and forwards system calls using messages to kernel cores, which allows user processes to achieve fully predictable performance.

We apply some of these principles in Osprey. Our tasks and processes are pinned to cores and cores are divided between applications using resource containers. If a core runs only one user process, we disable clock interrupts and let the process run undisturbed. Interrupts from I/O devices can be handled on a noise core, and similarly, system call requests can be forwarded through message queues to kernel cores.

Barrelfish [9] is an operating system based on the multikernel design principle where each core maintains its own scheduler and communicates with other cores using messages. There is no shared memory between cores apart from message queues. We adopt the multikernel design in Osprey by partitioning most of the kernel state, running an independent scheduler on each core, and using message queues for cross-core communication. Unlike Barrelfish, Osprey provides real-time guarantees and uses inter-core interrupts to enforce scheduling priority.

Akaros [24] exposes information about physical cores and memory to user applications to allow them to optimize their performance. Similar to FlexSC [25], it supports asynchronous system calls and provides a threading library which minimizes context switches and improves application performance and throughput. Osprey has a similar scheduling model, but in contrast to Akaros, Osprey threads are cooperative, which allows them to avoid locks. Since cooperative threads cannot be preempted, Osprey must suspend the entire process (all threads) on a page fault, while Akaros can redirect the fault to the thread scheduler and run another thread inside the process. However, Osprey does not swap pages out to disk and hence a page can fault at most once.

K42 [26] is the operating system closest to Osprey that we are aware of. K42 addresses high scalability by decentralizing its data structures, avoiding locks, and reducing cross-core memory sharing. It also moves most system functionality to user-level libraries and server processes. It has an object-oriented architecture designed for extensibility and customization. Unlike Osprey, it runs performance monitors and adaptive algorithms that perform dynamic system updates and reconfiguration. K42 has only one compatibility layer (for Linux) while Osprey aims to provide multiple operating system personalities.

VIII. CURRENT STATUS

We have implemented an Osprey prototype for the x86 32-bit and 64-bit architectures. The prototype consists of a kernel, a suite of user-space libraries, and several user programs. The kernel contains an Earliest Deadline First (EDF) real-time scheduler, memory management, low-level hardware management, basic system call handlers, and a packet filter. The libraries implement efficient message queues, cooperative user-level threading, and a number of network protocols. We also implemented essential device drivers, including USB and high-speed (10G) Ethernet drivers, and a framework for efficient zero-copy networking. We have ported an Inferno library OS [11] to Osprey, and we are currently working on a Plan 9 libOS [27]. We are also actively working on filesystems, resource containers, and checkpoint/restart support. In the mid-term future we plan to port Osprey to PowerPC, ARM, and possibly MIPS, and to provide a Linux personality.

IX. CONCLUSION

Osprey is an operating system for predictable cloud performance. It takes an alternative approach to hardware virtualization for cloud computing, in which all user applications run on top of a single operating system. Full transparency is broken to enable competing applications to meet real-time and fault-tolerance constraints. Resource containers fully specify constraints between applications to enforce full application isolation and provide predictability. Osprey pushes much of the state that is typically in the kernel into user applications for several benefits: efficient networking, mobility support, and full application accounting. Close compatibility with legacy software is supported through OS emulation provided by libraries or by porting runtime environments. We have developed an Osprey prototype and plan to thoroughly evaluate it and compare it with existing cloud computing architectures.

REFERENCES

[1] H. Härtig, M. Hohmuth, J. Liedtke, J. Wolter, and S. Schönberg, "The performance of µ-kernel-based systems," in Proceedings of the 16th Symposium on Operating Systems Principles. ACM, 1997, pp. 66–77.

[2] A. Whitaker, M. Shaw, and S. D. Gribble, "Scale and performance in the Denali isolation kernel," SIGOPS Operating Systems Review, vol. 36, no. SI, pp. 195–209, Dec. 2002.

[3] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," in Proceedings of the 19th Symposium on Operating Systems Principles. ACM, 2003, pp. 164–177.

[4] S. Soltesz, H. Pötzl, M. E. Fiuczynski, A. Bavier, and L. Peterson, "Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors," in Proceedings of the 2nd EuroSys European Conference on Computer Systems. ACM, 2007, pp. 275–287.

[5] O. Laadan and J. Nieh, "Transparent checkpoint-restart of multiple processes on commodity operating systems," in Proceedings of the USENIX Annual Technical Conference. USENIX Association, 2007, pp. 25:1–25:14.

[6] D. E. Porter, S. Boyd-Wickizer, J. Howell, R. Olinsky, and G. C. Hunt, "Rethinking the library OS from the top down," in Proceedings of the 16th Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 2011, pp. 291–304.

[7] G. Ammons, J. Appavoo, M. Butrico, D. Da Silva, D. Grove, K. Kawachiya, O. Krieger, B. Rosenburg, E. Van Hensbergen, and R. W. Wisniewski, "Libra: a library operating system for a JVM in a virtualized execution environment," in Proceedings of the 3rd Conference on Virtual Execution Environments. ACM, 2007, pp. 44–54.

[8] D. R. Engler, M. F. Kaashoek, and J. O'Toole, Jr., "Exokernel: an operating system architecture for application-level resource management," in Proceedings of the 15th Symposium on Operating Systems Principles. ACM, 1995, pp. 251–266.

[9] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania, "The multikernel: a new OS architecture for scalable multicore systems," in Proceedings of the 22nd Symposium on Operating Systems Principles. ACM, 2009, pp. 29–44.

[10] B. Amstadt and M. K. Johnson, "Wine," Linux Journal, August 1994.

[11] S. M. Dorward, R. Pike, D. L. Presotto, D. M. Ritchie, H. W. Trickey, and P. Winterbottom, "The Inferno operating system," Bell Labs Technical Journal, vol. 2, pp. 5–18, 1997.

[12] G. Banga, P. Druschel, and J. C. Mogul, "Resource containers: a new facility for resource management in server systems," in Proceedings of the 3rd Symposium on Operating Systems Design and Implementation. USENIX Association, 1999, pp. 45–58.

[13] P.-H. Kamp and R. N. M. Watson, "Jails: confining the omnipotent root," in Proceedings of the 2nd International SANE Conference, 2000.

[14] S. Osman, D. Subhraveti, G. Su, and J. Nieh, "The design and implementation of Zap: a system for migrating computing environments," SIGOPS Operating Systems Review, vol. 36, no. SI, pp. 361–376, Dec. 2002.

[15] O. Laadan and J. Nieh, "Operating system virtualization: practice and experience," in Proceedings of the 3rd Annual Haifa Experimental Systems Conference. ACM, 2010, pp. 17:1–17:12.

[16] D. Price and A. Tucker, "Solaris Zones: operating system support for consolidating commercial workloads," in Proceedings of the 18th USENIX Conference on System Administration. USENIX Association, 2004, pp. 241–254.

[17] S. Bhattiprolu, E. W. Biederman, S. Hallyn, and D. Lezcano, "Virtual servers and checkpoint/restart in mainstream Linux," SIGOPS Operating Systems Review, vol. 42, no. 5, pp. 104–113, Jul. 2008.

[18] D. Mosberger and L. L. Peterson, "Making paths explicit in the Scout operating system," in Proceedings of the 2nd Symposium on Operating Systems Design and Implementation. ACM, 1996, pp. 153–167.

[19] A. Bavier, T. Voigt, M. Wawrzoniak, L. Peterson, and P. Gunningberg, "SILK: Scout paths in the Linux kernel," Department of Information Technology, Uppsala University, Sweden, Tech. Rep., Feb. 2002.

[20] J. Mogul, R. Rashid, and M. Accetta, "The packet filter: an efficient mechanism for user-level network code," in Proceedings of the 11th ACM Symposium on Operating Systems Principles. ACM, 1987, pp. 39–51.

[21] S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang, "Corey: an operating system for many cores," in Proceedings of the 8th Conference on Operating Systems Design and Implementation. USENIX Association, 2008, pp. 43–57.

[22] D. Wentzlaff and A. Agarwal, "Factored operating systems (fos): the case for a scalable operating system for multicores," SIGOPS Operating Systems Review, vol. 43, no. 2, pp. 76–85, Apr. 2009.

[23] F. J. Ballesteros, N. Evans, C. Forsyth, G. Guardiola, J. McKie, R. Minnich, and E. Soriano, "NIX: an operating system for high performance manycore computing," to appear in Bell Labs Technical Journal, 2012.

[24] B. Rhoden, K. Klues, D. Zhu, and E. Brewer, "Improving per-node efficiency in the datacenter with new OS abstractions," in Proceedings of the 2nd Symposium on Cloud Computing. ACM, 2011, pp. 25:1–25:8.

[25] L. Soares and M. Stumm, "FlexSC: flexible system call scheduling with exception-less system calls," in Proceedings of the 9th Conference on Operating Systems Design and Implementation. USENIX Association, 2010, pp. 1–8.

[26] O. Krieger, M. Auslander, B. Rosenburg, R. W. Wisniewski, J. Xenidis, D. Da Silva, M. Ostrowski, J. Appavoo, M. Butrico, M. Mergen, A. Waterland, and V. Uhlig, "K42: building a complete operating system," in Proceedings of the 1st EuroSys European Conference on Computer Systems. ACM, 2006, pp. 133–145.

[27] R. Pike, D. L. Presotto, S. Dorward, B. Flandrena, K. Thompson, H. Trickey, and P. Winterbottom, "Plan 9 from Bell Labs," Computing Systems, vol. 8, no. 2, pp. 221–254, 1995.