Preemptable Ticket Spinlocks: Improving Consolidated Performance in the Cloud

Preemptable Ticket Spinlocks: Improving Consolidated Performance in the Cloud Jiannan Ouyang John R. Lange Department of Computer Science Department of Computer Science University of Pittsburgh University of Pittsburgh Pittsburgh, PA 15260 Pittsburgh, PA 15260 [email protected] [email protected] Abstract holder preemption occurs whenever a virtual machine’s (VM’s) vir- When executing inside a virtual machine environment, OS level tual CPU (vCPU) is scheduled off of a physical CPU while a lock synchronization primitives are faced with significant challenges is held inside the VM’s context. The result is that when the VM’s due to the scheduling behavior of the underlying virtual machine other vCPUs are attempting to acquire the lock they must wait until monitor. Operations that are ensured to last only a short amount of the vCPU holding the lock is scheduled back in by the VMM so it time on real hardware, are capable of taking considerably longer can release the lock. As kernel level synchronization is most often when running virtualized. This change in assumptions has signif- accomplished using spinlocks, the time spent waiting on a lock is icant impact when an OS is executing inside a critical region that wasted in a busy loop. While numerous attempts have been made to is protected by a spinlock. The interaction between OS level spin- address this problem, the solutions have targeted only generic spinlocks and VMM scheduling is known as the Lock Holder Preemp- lock behaviors and not more advanced locking primitives such as tion problem and has a significant impact on overall VM perfor- ticket spinlocks (spinlocks that ensure consistent ordering of acqui- mance. However, with the use of ticket locks instead of generic sitions). As a result of the introduction of ticket spinlocks virtual spinlocks, virtual environments must also contend with waiters be- machine synchronization now must contend not only with Lock ing preempted before they are able to acquire the lock. This has Holder Preemption but also Lock Waiter Preemption. the effect of blocking access to a lock, even if the lock itself is Ticket spinlocks [9] are a form of spinlock that enforces or- available. We identify this scenario as the Lock Waiter Preemption dering among lock acquisitions. Whenever a thread of execution problem. In order to solve both problems we introduce Preemptable attempts to acquire a ticket spinlock it either (1) acquires the lock Ticket spinlocks, a new locking primitive that is designed to enable immediately, or (2) is granted a ticket which determines the order a VM to always make forward progress by relaxing the ordering among all outstanding lock requests. The introduction of ticket guarantees offered by ticket locks. We show that the use of Pre- spinlocks was meant to ensure fairness and prevent starvation emptable Ticket spinlocks improves VM performance by 5:32X on among competing threads by preventing any single thread from average, when running on a non paravirtual VMM, and by 7:91X obtaining a lock before another thread that requested it first. In this when running on a VMM that supports a paravirtual locking inter- manner each thread must wait to acquire a lock until after it has face, when executing a set of microbenchmarks as well as a realistic been held by every other thread that previously tried to acquire e-commerce benchmark. it. This ensures that a given thread is never preempted by another thread while trying to acquire the same lock, and thus guarantees Categories and Subject Descriptors D.4.1 [Process Manage- that well behaved threads will all acquire the lock in a timely man- ment]: Mutual exclusion ner. While ticket spinlocks have been shown to provide advantages Keywords Virtual Machines; Lock Holder Preemption; Paravirtu- to performance and consistency for native OS environments, they alization pose a new challenge for virtualized environments. This is due to the fact that when running inside a VM, the use of a ticket 1. Introduction spinlock can result in multiple threads waiting to acquire a spinlock that is currently available. This problem exists whenever a VMM Synchronization has long been recognized as a source of bottle- preempts a waiter that has not yet acquired the lock. In this case necks in SMP and multicore operating systems. With the increased even if the lock is released, no other thread is allowed to acquire it use of virtualization, multi-core CPUs, and consolidated Infrastruc- until the next waiter is allowed to run, resulting in a scenario where ture as a Service (IaaS) clouds this issue has become more sig- there is contention over an idle resource. We denote this situation nificant due to the Lock Holder Preemption problem [15]. Lock as the Lock Waiter Preemption problem. Lock holder preemption has traditionally been addressed using a combination of configuration, software and hardware techniques. Initial workarounds to the lock holder preemption problem required Permission to make digital or hard copies of all or part of this work for personal or that every vCPU belonging to a given VM be gang scheduled in or- classroom use is granted without fee provided that copies are not made or distributed der to avoid this problem altogether [16]. While this eliminates the for profit or commercial advantage and that copies bear this notice and the full citation lock holder preemption problem, it does so in a way that dramat- on the first page. To copy otherwise, to republish, to post on servers or to redistribute ically reduces the amount of possible consolidation and increases to lists, requires prior specific permission and/or a fee. the amount of cross VM interference. As a result of these draw- VEE’13, March 16–17, 2013, Houston, Texas, USA. Copyright c 2013 ACM 978-1-4503-1266-0/13/03. $15.00 backs, several attempts have been made to address the lock holder preemption problem on a per vCPU level in order to move away 2. Related Work from the gang scheduling model. These approaches focus on de- Due to the impact virtual machine scheduling has on synchroniza- tecting when a given vCPU is stuck spinning on a busy lock, so that tion performance, significant research efforts have sought to opti- the VMM can adjust its scheduling decisions based on the lock de- mize the interaction between the guest OS and VMM. In general pendency. The detection techniques vary, but can be roughly clas- the existing approaches have fallen into the following categories. sified by where they are implemented: inside the VMM (relying on hardware virtualization features such as Pause Loop Exiting), or Preemption Aware Scheduling Initially, VMM architectures inside both VMM and guest OS (paravirtual). Unfortunately, while dealt with guest locking problems by requiring that every VM be these approaches have been effective in addressing the lock holder executed using co-scheduling [11]. This approach was adopted by preemption problem they are unable to handle the lock waiter pre- the virtual machine scheduler in VMware ESX [16]. Advances on emption problem. this approach were explored by [18, 19], that proposed an adap- We propose to address the problem of lock waiter preemption tive co-scheduling scheme, that allowed the VMM scheduler to through the introduction of a new spinlock primitive called Pre- dynamically alternate between co-scheduling and asynchronous emptable Ticket spinlocks. Preemptable Ticket spinlocks improve scheduling of a VM’s set of vCPUs. The choice of scheduling ap- the performance of traditional ticket spinlocks by allowing preemp- proaches is based on the detection of long lived lock contention in tion of a waiter that has been detected to be unresponsive. Unre- the guest OS. Finally, in [14] the authors proposed a “balancing sponsiveness is determined via a linearly increasing timeout that scheduler” scheme that associates a VM’s individual vCPUs with allows earlier waiters a window of opportunity in which they can dedicated physical CPUs, but does not require that the vCPUs be acquire a lock before the lock is offered to later waiters. Preempt- co-scheduled. These solutions add host side scheduling constraints, able Ticket spinlocks provide performance benefits when running and are complementary to Preemptable Ticket spinlock. either with or without VMM support (a specialized paravirtual in- Paravirtual Locks Paravirtual approaches to solve the lock holder terface), however VMM support does provide overall superior per- preemption problem was first explored in [15], where the authors formance. adopted a lock avoidance approach in which the guest OS provided Preemptable Ticket spinlocks are based on the observation that scheduler hints to the underlying VMM. These hints demarcated forward progress is preferable to fairness in the face of contention. non-preemptable regions of guest execution that corresponded to In the case where lock waiter preemption is preventing a guest from critical sections in which a non-blocking lock was held. [6] pro- acquiring a lock, then a thread waiting on the lock should be able posed another paravirtual lock approach (later adopted by Xen and to preempt the next thread in line if that thread is incapable of ac- KVM [12]), that uses a loop counter to detect “unusually long” wait quiring the lock in a reasonable amount of time. While Preemptable times for a spinlock. When the time spent waiting for a lock reaches Ticket spinlocks do allow preemption, which technically breaks the a given threshold, the VMM is notified via a hypercall that a vCPU ordering guarantees of standard ticket locks, it does so in a way that is currently blocked by a held lock. The VMM then halts the wait- minimizes the loss of fairness by always granting priority to earlier ing vCPU until the lock is detected to be available.

Preemptable Ticket Spinlocks: Improving Consolidated Performance in the Cloud

Scalable NUMA-Aware Blocking Synchronization Primitives

Non-Scalable Locks Are Dangerous

Recoverable Mutual Exclusion for Non-Volatile Main Memory Systems

Concurrency: Better Critical Section Solutions

Multicore Semantics & Programming

28. Locks Operating System: Three Easy Pieces

A Flexible and Scalable Affinity Lock for the Kernel

Non-Scalable Locks Are Dangerous

Lock-Based Data Structures on Gpus with Independent Thread Scheduling

A Program Logic for Concurrent Objects Under Fair Scheduling

15-418/618, Spring 2018 Exam 2 Practice SOLUTIONS

High Performance Synchronization Algorithms for Multiprogrammed Multiprocessors