Predicting Disk Scheduling Performance with Virtual Machines
Robert Geist, Zachary H. Jones, James Westall
Clemson University

Abstract. A method for predicting the performance of disk scheduling algorithms on real machines using only their performance on virtual machines is suggested. The method uses a dynamically loaded kernel intercept probe (iprobe) to adjust low-level virtual device timing to match that of a simple model derived from the real device. An example is provided in which the performance of a newly proposed disk scheduling algorithm is compared with that of standard Linux algorithms. The advantage of the proposed method is that reasonable performance predictions may be made without dedicated measurement platforms and with only relatively limited knowledge of the performance characteristics of the targeted devices.

1 Introduction

In the last five years, the use of virtual computing systems has grown rapidly, from nearly non-existent to commonplace. Nevertheless, system virtualization has been of interest to the computing community since at least the mid-1960s, when IBM developed the CP/CMS (Control Program/Conversational Monitor System or Cambridge Monitor System) for the IBM 360/67 [1]. In this design, a low-level software system called a hypervisor or virtual machine monitor sits between the hardware and multiple guest operating systems, each of which runs unmodified. The hypervisor handles scheduling and memory management. Privileged instructions, those that trap if executed in user mode, are simulated by the hypervisor's trap handlers when executed by a guest OS.

The specific architecture of the host machine essentially determines the difficulty of constructing such a hypervisor. Popek and Goldberg [2] characterize as sensitive those machine instructions that may modify or read resource configuration data. They show that an architecture is most readily virtualized if the sensitive instructions are a subset of the privileged instructions.

The principal roadblock to widespread deployment of virtual systems has been the basic Intel x86 architecture, the de facto standard, in which a relatively large collection of instructions is sensitive but not privileged. Thus a guest OS running at privilege level 3 may execute one of them without generating a trap that would allow the hypervisor to virtualize the effect of the instruction. A detailed analysis of the challenges to virtualization presented by these instructions is given by Robin and Irvine [3].

The designers of VMWare provided the first solution to this problem by using binary translation of guest OS code [4]. Xen [5] provided an open-source virtualization of the x86 using para-virtualization, in which the hypervisor provides a virtual machine interface that is similar to the hardware interface but avoids the instructions whose virtualization would be problematic. Each guest OS must then be modified to run on the virtual machine interface.

The true catalysts to the widespread development and deployment of virtual machines appeared in 2005-2006 as extensions to the basic x86 architecture, the Intel VT-x and AMD-V. The extensions include a "guest" operating mode, which carries all the privilege levels of the normal operating mode, except that system software can request that certain instructions be trapped. The hardware state switch to/from guest mode includes control registers, segment registers, and the instruction pointer. Exit from guest mode includes the cause of the exit [6].

These extensions have allowed the development of a full-virtualization version of Xen, in which the guest operating systems can run unmodified, and the Kernel-based Virtual Machine (KVM), which uses a standard Linux kernel as hypervisor.

Nevertheless, VMWare, Xen, and KVM are principally aimed at facilitating user-level applications, and thus they export only a fairly generic view of the guest operating system(s), in which the devices to be managed are taken from a small, fixed collection of emulated components. This would seem to preclude the use of virtual machines in testing system-level performance, such as in a comparative study of scheduling algorithms. The devices of interest are unlikely to be among those emulated, and, even if they are, the emulated devices may have arbitrary implementations, and thus exhibit performance characteristics that bear little or no resemblance to those of the real devices. For example, an entire virtual disk may be cached in the main memory of a large NAS device supporting the virtual machine, thus providing a disk with constant (O(1)) service times.

In [7], we introduced the iprobe, an extremely light-weight extension to the Linux kernel that can be dynamically loaded and yet allows the interception and replacement of arbitrary kernel functions. The iprobe was seen to allow a straightforward implementation of PCI device emulators that could be dynamically loaded and yet were suitable for full, system-level driver design, development, and testing.

The goal of this effort is to extend the use of the iprobe to allow performance prediction for real devices using only virtual machines. We suggested this possibility in [8]. In particular, we will predict the performance of a collection of disk scheduling algorithms, one of which is new, for a targeted file-server workload on a specific SCSI device. We then compare the results with measurements of the same algorithms on the real hardware. We will see that the advantage of our approach is that reasonable performance predictions may be made without dedicated measurement platforms and with relatively limited knowledge of the performance characteristics of the targeted devices.

The remainder of the paper is organized as follows. In the next section we provide background on both kernel probes and disk scheduling. In section 3, we propose a new disk scheduling algorithm that will serve as the focus of our tests. In section 4, we show how kernel probes may be used to implement a virtual performance throttle, the mechanism that allows performance prediction from virtual platforms. In section 5 we describe our virtual machine, the targeted (real) hardware, and a test workload. Section 6 contains results of algorithm performance prediction from the virtual machine and measurements of the same algorithms on the real hardware. Conclusions follow in section 7.

2 Background

2.1 Kernel Probes

The Linux kernel probe or kprobe utility was designed to facilitate kernel debugging [10]. All kprobes have the same basic operation. A kprobe structure is initialized, usually by a kernel module, i.e., a dynamically loaded kernel extension, to identify a target instruction and specify both pre-handler and post-handler functions. When the kprobe is registered, it saves the targeted instruction and replaces it with a breakpoint. When the breakpoint is hit, the pre-handler is executed, then the saved instruction is executed in single-step mode, and then the post-handler is executed. A return resumes execution after the breakpoint.
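As a concrete illustration, a minimal kprobe module, in the spirit of the examples that accompany the kernel's kprobes documentation, might look as follows; the probed symbol (do_fork here) and the printk messages are arbitrary illustrative choices.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>

/* Pre-handler: runs when the breakpoint at the probed instruction is hit. */
static int pre(struct kprobe *p, struct pt_regs *regs)
{
    printk(KERN_INFO "kprobe pre-handler at %p\n", p->addr);
    return 0;
}

/* Post-handler: runs after the saved instruction is single-stepped. */
static void post(struct kprobe *p, struct pt_regs *regs, unsigned long flags)
{
    printk(KERN_INFO "kprobe post-handler at %p\n", p->addr);
}

static struct kprobe kp = {
    .symbol_name  = "do_fork",   /* arbitrary probe target */
    .pre_handler  = pre,
    .post_handler = post,
};

static int __init kp_init(void)
{
    return register_kprobe(&kp);   /* plants the breakpoint */
}

static void __exit kp_exit(void)
{
    unregister_kprobe(&kp);        /* restores the saved instruction */
}

module_init(kp_init);
module_exit(kp_exit);
MODULE_LICENSE("GPL");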
A variation on the kprobe, also supplied with Linux, is the jprobe, or jump probe, which is intended for probing function calls rather than arbitrary kernel instructions. It is a kprobe with a two-stage pre-handler and an empty post-handler. On registration, it copies the first instruction of the registered function and replaces that instruction with the breakpoint. When this breakpoint is hit, the first-stage pre-handler, which is a fixed code sequence, is invoked. It copies both registers and stack, in addition to loading the saved instruction pointer with the address of the supplied, second-stage pre-handler. The second-stage pre-handler then sees the same register values and stack as the original function.
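For comparison, a minimal jprobe module, patterned on the jprobe example distributed with the kernel sources, might look roughly as follows. The second-stage pre-handler must be declared with the same signature as the probed function; the do_fork signature shown is that of the 2.6-era kernels contemporary with this work.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>

/* Second-stage pre-handler: declared with the same signature as the
 * probed function, so it sees the original argument values. */
static long jdo_fork(unsigned long clone_flags, unsigned long stack_start,
                     struct pt_regs *regs, unsigned long stack_size,
                     int __user *parent_tidptr, int __user *child_tidptr)
{
    printk(KERN_INFO "do_fork: clone_flags = 0x%lx\n", clone_flags);
    jprobe_return();   /* traps again to restore registers and stack */
    return 0;          /* never reached */
}

static struct jprobe jp = {
    .entry = jdo_fork,
    .kp = { .symbol_name = "do_fork" },
};

static int __init jp_init(void)  { return register_jprobe(&jp); }
static void __exit jp_exit(void) { unregister_jprobe(&jp); }

module_init(jp_init);
module_exit(jp_exit);
MODULE_LICENSE("GPL");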
In [7], we introduced the intercept probe or iprobe, which is a modified jprobe. Our iprobe second-stage pre-handler decides whether or not to replace the original function. If it decides to do so, it makes a backup copy of the saved (function entry) instruction and then overwrites the saved instruction with a no-op. As is standard with a jprobe, the second-stage pre-handler then executes a jprobe return, which traps again to restore the original register values and stack. The saved instruction (which now could be a no-op) is then executed in single-step mode. Next the post-handler runs. On a conventional jprobe, the post-handler is empty, but on the iprobe it checks whether replacement was called for by the second-stage pre-handler. If so, the single-stepped instruction was a no-op, and the registers and stack necessarily match those of the original function call. We simply load the instruction pointer with the address of the replacement function, restore the saved instruction from the backup copy (overwrite the no-op), and return. With this method, we can intercept and dynamically replace any kernel function of our choice. Note that it is possible to have two calls to the same probed function, one that is to be intercepted and one that is not. A discussion of the handling of potential attendant race conditions in SMP systems may be found in [9].

2.2 Disk Scheduling

Scheduling algorithms that re-order pending requests for disks have been studied for at least four decades. Such algorithms represent a particularly attractive area for investigation in that they are not constrained to be work-conserving. Further, it is easy to dismiss naive (but commonly held) beliefs about such scheduling, in particular, that a greedy or shortest-access-time-first algorithm will deliver performance that is optimal with respect to any common performance measure, such as mean service time or mean response time. Consider a hypothetical system in which requests are identified by their starting blocks and the service time between blocks is equal to the distance between them.
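The following small user-space program (ours, not the paper's; the head position and request blocks are invented for illustration) makes this concrete under exactly that cost model: it compares the greedy schedule's total service time against the optimum found by exhaustive enumeration.

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#define N 5
static const int head   = 4;                 /* initial head position  */
static const int req[N] = {1, 5, 6, 7, 8};   /* hypothetical requests  */

/* Total seek distance (== total service time) of a given visit order. */
static int total_distance(const int *order)
{
    int pos = head, sum = 0;
    for (int i = 0; i < N; i++) {
        sum += abs(order[i] - pos);
        pos = order[i];
    }
    return sum;
}

/* Greedy: always serve the request closest to the current position. */
static int greedy_distance(void)
{
    int done[N] = {0}, pos = head, sum = 0;
    for (int i = 0; i < N; i++) {
        int best = -1, bestd = INT_MAX;
        for (int j = 0; j < N; j++)
            if (!done[j] && abs(req[j] - pos) < bestd) {
                best = j;
                bestd = abs(req[j] - pos);
            }
        done[best] = 1;
        sum += bestd;
        pos = req[best];
    }
    return sum;
}

/* Enumerate all N! visit orders to find the true minimum. */
static void permute(int *order, int k, int *min)
{
    if (k == N) {
        int d = total_distance(order);
        if (d < *min)
            *min = d;
        return;
    }
    for (int i = k; i < N; i++) {
        int t = order[k]; order[k] = order[i]; order[i] = t;
        permute(order, k + 1, min);
        t = order[k]; order[k] = order[i]; order[i] = t;
    }
}

int main(void)
{
    int order[N], min = INT_MAX, g = greedy_distance();
    for (int i = 0; i < N; i++)
        order[i] = req[i];
    permute(order, 0, &min);
    printf("greedy : total %d, mean %.2f\n", g, g / (double)N);
    printf("optimal: total %d, mean %.2f\n", min, min / (double)N);
    return 0;
}

With the head at block 4, the greedy schedule serves 5, 6, 7, 8, and finally 1, for a total service time of 11 (mean 2.20), whereas serving 1 first and then sweeping right yields a total of 10 (mean 2.00). The greedy schedule is therefore not optimal even with respect to mean service time.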