Billing the CPU Time Used by System Components on Behalf of VMs
Boris Djomgwe Teabe, Alain Tchana, Daniel Hagimont
13th IEEE International Conference on Services Computing (SCC 2016), 27 June - 2 July 2016, San Francisco, CA, United States, pp. 307-315.
DOI: http://dx.doi.org/10.1109/SCC.2016.47
HAL Id: hal-01782591, https://hal.archives-ouvertes.fr/hal-01782591, submitted on 2 May 2018.
Billing the CPU time used by system components on behalf of VMs
Boris Teabe, Alain Tchana, Daniel Hagimont
Institut de Recherche en Informatique de Toulouse (IRIT), Toulouse, France
fi[email protected]

Abstract—Nowadays, virtualization is present in almost all cloud infrastructures. In virtualized clouds, virtual machines (VMs) are the basis for allocating resources. A VM is launched with a fixed allocated computing capacity that should be strictly provided by the hosting system scheduler. Unfortunately, this allocated capacity is not always respected, due to mechanisms provided by the virtual machine monitoring system (also known as the hypervisor). For instance, we observe that a significant amount of CPU is consumed by the underlying system components. This consumed CPU time is not only difficult to monitor, but is also not charged to VM capacities. Consequently, we have VMs using more computing capacity than the allocated values. Such a situation can lead to performance unpredictability for cloud clients, and resource waste for the cloud provider. In this paper, we present the design and evaluation of a mechanism which solves this issue. The proposed mechanism consists of estimating the CPU time consumed by the system components on behalf of each individual VM. Subsequently, this estimated CPU time is charged to the VM. We have implemented a prototype of the mechanism in the Xen system. The prototype has been validated with extensive evaluations using reference benchmarks.

Keywords-Cloud computing; Virtual machines; resources; computing capacities

I. INTRODUCTION

Cloud Computing enables remote on-demand access to a set of computing resources through infrastructures, so-called Infrastructure as a Service (IaaS). The latter is the most popular cloud model because it offers high flexibility to cloud users. In order to provide isolation, IaaS clouds are often virtualized such that resource acquisition is performed through virtual machines (VMs). A VM is launched with a fixed allocated computing capacity that should be strictly provided by the hosting system scheduler. The respect of the allocated capacity has two main motivations: (1) For the customer, performance isolation and predictability [7], [15], i.e. a VM's performance should not be influenced by other VMs running on the same physical machine. (2) For the provider, resource management and cost reduction, i.e. a VM should not be allowed to consume resources that are not charged to the customer. Unfortunately, the capacities allocated to VMs are not always respected [1], [10], due to the activities of some hypervisor system components (mainly network and disk drivers) [21], [24]. Surprisingly, the CPU time consumed by these system components is not charged to VMs. For instance, we observe that a significant amount of CPU is consumed by the underlying system components. This consumed CPU time is not only difficult to monitor, but is also not charged to VM capacities. Therefore, we have VMs using more computing capacity than the allocated values. Such a situation can lead to performance unpredictability for cloud clients, and resource waste for the cloud provider. We highlighted this issue in a previous work [24]. In this paper, we complete our previous analysis [24] by proposing a system extension which determines the CPU time consumed by the system components on behalf of each individual VM, in order to charge this time to the VM. We thereby significantly reduce the performance disturbance coming from competition on the hosting system components. Note that most research [12], [13] in this domain has investigated performance disturbance at the micro-architectural level (e.g. processor caches). Our extension is complementary to such work since it addresses another level of contention. A prototype has been implemented in the Xen system and extensive evaluations have been performed with various workloads (including real data center traces). The evaluations demonstrate that:
• without our extension, performance isolation and resource management can be significantly impacted;
• our extension can enforce performance isolation for the customer and prevent resource leaks for the provider;
• the intrusiveness of our implementation is negligible, near zero.

The rest of the article is structured as follows. Section II introduces the necessary background for the paper. Section III presents the motivations for this work. Section IV overviews our contributions. The implementation is depicted in Section IV-C. An evaluation is reported in Section V. It is followed by a review of related work in Section VI, and we present our conclusions and perspectives in Section VII.

II. BACKGROUND

The contributions described in this paper have been implemented in the Xen system (the most popular open source virtualization system, used by Amazon EC2). This section presents an overview of virtualization technologies in Xen. The presentation only covers virtualization aspects which are relevant to our study: I/O virtualization, and CPU allocation and accounting.

A. I/O virtualization in Xen

In the Xen para-virtualized system, the real driver of each I/O device resides within a particular VM named the "driver domain" (DD). The DD conducts I/O operations on behalf of VMs, which run a fake driver called the frontend (FE), as illustrated in Fig. 1. The frontend communicates with the real driver via a backend (BE) module (within the DD) which allows multiplexing the real device. This I/O virtualization architecture is used by the majority of virtualization systems and is known as the "split-driver model". It enhances the reliability of the physical machine (PM) by isolating faults which occur within a driver. It functions as follows. The real driver in the DD can access the hardware, but interrupts are handled by the hypervisor. The communication between the DD and the hypervisor, and between the DD and a domU, are based on event channels (EC1 and EC2 in Fig. 1). An I/O request issued by a guest OS (domU) creates an event on EC1. The data are transmitted to the BE in the DD through a ring buffer (based on memory pages shared between the domU and the DD). The BE is then responsible for invoking the real driver. An I/O request received by the hardware (as an interrupt) generates an event on EC2. From this event, the DD

Figure 1: The split-driver model for I/O virtualization.

When remainCredit reaches a lower threshold, the VM is no longer allowed to access a processor. Periodically, the scheduler increases the value of remainCredit for each VM according to its initial credit c, to give it a chance to become schedulable again. More formally, on a PM p, if v has n vCPUs (noted vCPU_1 to vCPU_n), and burntCredit(vCPU_i) and pmProcessTime(p, v) denote respectively the aggregated processing time used by vCPU_i and the time elapsed since the start-up of v, then the Credit scheduler's goal is to satisfy the following equation at each scheduling time:

    c = ( Σ_{i=1}^{n} burntCredit(vCPU_i) ) / pmProcessTime(p, v)    (1)

From the above description we can see that the processing time used by the DD to operate I/O requests on behalf of a VM is not charged to that VM by the scheduler. This is the source of several problems in the cloud.

III. MOTIVATIONS

A. Problem statement

According to the split-driver model, we have seen that I/O requests are handled by both the hypervisor and the DD on behalf of VMs. Therefore, they use their own processing time (hereafter referred to as "system time") to perform operations on behalf of VMs. Current schedulers (including Credit) do not take this system time into account when computing a VM's processing time. This processing time is distributed as follows:
• T1: within the hypervisor, for handling hardware interrupts.
• T2: within the DD's kernel, for handling virtual interrupts and transferring I/O requests between the real driver and the backend.
• T3: within the backend, for multiplexing and delivering/receiving I/O requests to/from the frontend.
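The split-driver request path (frontend, event channel EC1, shared ring buffer, backend, real driver) can be illustrated with a minimal sketch. All class and method names below are hypothetical stand-ins for illustration only, not Xen's actual interfaces:

```python
# Illustrative sketch of the split-driver I/O path.
# Names (RingBuffer, DriverDomain, GuestVM, ...) are hypothetical.

from collections import deque

class RingBuffer:
    """Stands in for the shared-memory ring between a domU and the DD."""
    def __init__(self):
        self._ring = deque()

    def push(self, request):
        self._ring.append(request)

    def pop(self):
        return self._ring.popleft()

class DriverDomain:
    """The DD: the backend (BE) multiplexes requests onto the real driver."""
    def __init__(self):
        self.ring = RingBuffer()
        self.completed = []

    def on_event_ec1(self):
        # The guest's event on EC1 tells the BE that a request is in the
        # ring; the BE then invokes the real driver.
        request = self.ring.pop()
        self.completed.append(self.real_driver(request))

    def real_driver(self, request):
        return f"done:{request}"

class GuestVM:
    """A domU whose frontend (FE) forwards I/O requests to the DD."""
    def __init__(self, dd):
        self.dd = dd

    def issue_io(self, request):
        self.dd.ring.push(request)   # data travel through the shared ring
        self.dd.on_event_ec1()       # notification via event channel EC1

dd = DriverDomain()
guest = GuestVM(dd)
guest.issue_io("read-block-42")
assert dd.completed == ["done:read-block-42"]
```

The sketch makes the key point of the split-driver model explicit: the work of serving the request runs inside the DD, not inside the guest that issued it.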
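As a rough illustration of the problem statement, the T1/T2/T3 system-time components could be summed and charged to the VM that triggered the I/O. The sketch below uses hypothetical names and is not the paper's actual mechanism:

```python
# Sketch of charging system time (T1 + T2 + T3) back to the VM that
# caused it. Names and structure are illustrative, not the paper's code.

from dataclasses import dataclass

@dataclass
class VMAccount:
    charged_time: float = 0.0  # processing time billed to this VM (ms)

    def charge(self, t1_hypervisor: float, t2_dd_kernel: float,
               t3_backend: float) -> None:
        # T1: hardware-interrupt handling in the hypervisor
        # T2: virtual-interrupt handling / request transfer in the DD kernel
        # T3: multiplexing and delivery in the backend
        self.charged_time += t1_hypervisor + t2_dd_kernel + t3_backend

vm = VMAccount()
vm.charge(t1_hypervisor=2.0, t2_dd_kernel=5.0, t3_backend=3.0)
assert vm.charged_time == 10.0
```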