Virtio: Towards a De-Facto Standard for Virtual I/O Devices
Rusty Russell
IBM OzLabs, 8 Brisbane Ave, Canberra, Australia
[email protected]

ABSTRACT

The Linux kernel currently supports at least 8 distinct virtualization systems: Xen, KVM, VMware's VMI, IBM's System p, IBM's System z, User Mode Linux, lguest and IBM's legacy iSeries. It seems likely that more such systems will appear, and until recently each of these had its own block, network, console and other drivers with varying features and optimizations.

The attempt to address this is virtio: a series of efficient, well-maintained Linux drivers which can be adapted for various different hypervisor implementations using a shim layer. This includes a simple extensible feature mechanism for each driver. We also provide an obvious ring buffer transport implementation called vring, which is currently used by KVM and lguest. This has the subtle effect of providing a path of least resistance for any new hypervisors: supporting this efficient transport mechanism will immediately reduce the amount of work which needs to be done. Finally, we provide an implementation which presents the vring transport and device configuration as a PCI device: this means guest operating systems merely need a new PCI driver, and hypervisors need only add vring support to the virtual devices they implement (currently only KVM does this).

This paper will describe the virtio API layer as implemented in Linux, then the vring implementation, and finally its embodiment in a PCI device for simple adoption on otherwise fully-virtualized guests. We'll wrap up with some of the preliminary work to integrate this I/O mechanism deeper into the Linux host kernel.

General Terms
Virtio, vring, virtio pci

Keywords
Virtualization, I/O, ring buffer, Linux, KVM, lguest

1. INTRODUCTION

The Linux kernel has been ported to a huge number of platforms; the official kernel tree contains 24 separate architecture directories and almost 2 million lines of architecture-specific code out of 8.4 million. Most of these architectures contain support for multiple platform variants. Unfortunately, we are aware of only one platform which has been deleted from the tree (as the last machine of its kind was destroyed), while new hardware variants sprout like weeds after a rain. With around 10,000 lines changing every day, the kernel has at least one of everything you can imagine.

When we look at Linux as a guest under virtualization, we are particularly blessed: IBM's System p, System z and legacy iSeries are all supported. User Mode Linux[4] has long been included, for running Linux as a userspace process on Power, IA64 and 32 and 64 bit x86 machines. In the last two years the x86 architecture has proven particularly fecund, with support for Xen[2] from XenSource, VMI[1] from VMware and KVM[5] from Qumranet. Last and least, we should mention my own contribution to this mess, lguest[7]: a toy hypervisor which is useful for development and teaching and which snuck quietly into the tree last year.

Each of these eight platforms wants its own block, network and console drivers, and sometimes a boutique framebuffer, USB controller, host filesystem and virtual kitchen sink controller. Few of them have optimized their drivers in any significant way, and they offer overlapping but often slightly different sets of features. Importantly, no-one seems particularly delighted with their drivers, or with having to maintain them.
This issue became particularly pertinent as the KVM project, which garnered much attention when it burst onto the Linux scene in late 2006, did not yet have a paravirtual device model. The performance limitations of emulating devices were becoming clear[6], and yet the prospect of adopting the very-Xen-centric driver model was almost as unappealing as developing Yet Another driver model. Having worked on the Xen device model, we believe it possible to create a general virtual I/O mechanism which is efficient[14], works on multiple hypervisors and platforms, and atones for Rusty's involvement with the Xen device configuration system.

2. VIRTIO: THE THREE GOALS

Our initial goal of driver unification is fairly straightforward: all the work is inside the Linux kernel, so there's no need for any buy-in by other parties. If developers of boutique virtual I/O mechanisms are familiar with Linux, it might guide them to map the Linux API neatly onto their own ABI. But "if" and "might" are insufficient: we can be more ambitious than this.

Experience has shown that boutique transport mechanisms tend to be particular not only to a given hypervisor and architecture, but often to each particular kind of device. So the next obvious step in our attempt to guide towards uniformity is to provide a common ABI for general publication and use of buffers. Deliberately, our virtio ring implementation is not at all revolutionary: developers should look at this code and see nothing to dislike.

Finally, we provide two complete ABI implementations, using the virtio ring infrastructure and the Linux API for virtual I/O devices. These implement the final part of virtual I/O: device probing and configuration. Importantly, they demonstrate how simple it is to use the Linux virtual I/O API to provide feature negotiation in a forward and backward compatible manner, so that future Linux driver features can be detected and used by any host implementation.

The explicit separation of drivers, transport and configuration represents a change in thinking from current implementations. For example, you can't really use Xen's Linux network driver in a new hypervisor unless you also support the XenBus probing and configuration system.

3. VIRTIO: A LINUX-INTERNAL ABSTRACTION API

If we want to reduce duplication in virtual device drivers, we need a decent abstraction so drivers can share code. One method is to provide a set of common helpers which virtual drivers can use, but more ambitious is to use common drivers and an operations structure: a series of function pointers which are handed to the generic driver to interface with any of several transport implementations. The task is to create a transport abstraction for all virtual devices which is simple, close to optimal for an efficient transport, and yet allows a shim to existing transports without undue pain.

The current result (integrated in 2.6.24) is that virtio drivers register themselves to handle a particular 32-bit device type, optionally restricting to a specific 32-bit vendor field. The driver's probe function is called when a suitable virtio device is found: the struct virtio_device passed in has a virtio_config_ops pointer which the driver uses to unpack the device configuration.

The configuration operations can be divided into four parts: reading and writing feature bits, reading and writing the configuration space, reading and writing the status bits, and device reset. The driver looks for device-type-specific feature bits corresponding to features it wants to use, such as the VIRTIO_NET_F_CSUM feature bit indicating whether a network device supports checksum offload.

The second part is the configuration space: this is effectively a structure associated with the virtual device containing device-specific information. This can be both read and written by the guest. For example, network devices have a VIRTIO_NET_F_MAC feature bit, which indicates that the host wants the device to have a particular MAC address, and the configuration space contains the value.

These mechanisms give us room to grow in future, and for hosts to add features to devices, with the only requirement being that the feature bit numbers and configuration space layout be agreed upon.

There are also operations to set and get an 8-bit device status word which the guest uses to indicate the status of device probing; when the VIRTIO_CONFIG_S_DRIVER_OK bit is set, it shows that the guest driver has completed feature probing. At this point the host knows which features the guest driver understands and wants to use.

Finally, the reset operation is expected to reset the device, its configuration and its status bits. This is necessary for modular drivers which may be removed and then re-added, thus encountering a previously initialized device. It also avoids the problem of removing buffers from a device on driver shutdown: after reset, the buffers can be freed in the sure knowledge that the device won't overwrite them. It could also be used to attempt driver recovery in the guest.
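To make these four configuration steps concrete, the sketch below shows roughly how a minimal network driver might register itself and walk through them at probe and remove time. This is not code from the paper: the example_* names are invented, the config-space offset of the MAC field is assumed, and the virtio_config_ops members used (feature, get, get_status, set_status, reset) follow the 2.6.24-era Linux interface described here rather than any later revision.

    #include <linux/module.h>
    #include <linux/slab.h>
    #include <linux/virtio.h>
    #include <linux/virtio_config.h>
    #include <linux/virtio_net.h>

    /* Sketch only: member names and the MAC field offset are assumptions. */

    struct example_net_priv {
            u8   mac[6];
            bool has_csum;
    };

    static int example_net_probe(struct virtio_device *vdev)
    {
            struct example_net_priv *priv = kzalloc(sizeof(*priv), GFP_KERNEL);

            if (!priv)
                    return -ENOMEM;

            /* Part 1: feature bits -- does the host offer checksum offload? */
            priv->has_csum = vdev->config->feature(vdev, VIRTIO_NET_F_CSUM);

            /* Part 2: configuration space -- if the host supplied a MAC
             * address, read it out of the device-specific config structure. */
            if (vdev->config->feature(vdev, VIRTIO_NET_F_MAC))
                    vdev->config->get(vdev, 0, priv->mac, sizeof(priv->mac));

            /* Part 3: status word -- tell the host feature probing is done. */
            vdev->config->set_status(vdev, vdev->config->get_status(vdev) |
                                           VIRTIO_CONFIG_S_DRIVER_OK);

            vdev->priv = priv;
            return 0;
    }

    static void example_net_remove(struct virtio_device *vdev)
    {
            /* Part 4: reset -- quiesce the device so posted buffers can be
             * freed without the host writing to them afterwards. */
            vdev->config->reset(vdev);
            kfree(vdev->priv);
    }

    /* Drivers bind by 32-bit device type, optionally restricted by vendor. */
    static struct virtio_device_id example_id_table[] = {
            { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID },
            { 0 },
    };

    static struct virtio_driver example_net_driver = {
            .driver.name = "example_virtio_net",
            .id_table    = example_id_table,
            .probe       = example_net_probe,
            .remove      = example_net_remove,
    };

    static int __init example_init(void)
    {
            return register_virtio_driver(&example_net_driver);
    }
    module_init(example_init);
    MODULE_LICENSE("GPL");

Note how a feature bit does double duty here: it both gates a behaviour (checksum offload) and tells the driver whether a corresponding configuration-space field (the MAC address) is meaningful.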
3.1 Virtqueues: A Transport Abstraction

Our configuration API is important, but the performance-critical part of the API is the actual I/O mechanism. Our abstraction for this is a virtqueue: the configuration operations include a find_vq entry which returns a populated structure for the queue, given the virtio device and an index number. Some devices have only one queue, such as the virtio block device, but others, such as the network and console devices, have one queue for input and one for output.

A virtqueue is simply a queue into which buffers are posted by the guest for consumption by the host. Each buffer is a scatter-gather array consisting of readable and writable parts: the structure of the data is dependent on the device type. The virtqueue operations structure looks like so:

    struct virtqueue_ops {
            int (*add_buf)(struct virtqueue *vq,
                           struct scatterlist sg[],
                           unsigned int out_num,
                           unsigned int in_num,
                           void *data);

            void (*kick)(struct virtqueue *vq);

            void *(*get_buf)(struct virtqueue *vq,
                             unsigned int *len);

            void (*disable_cb)(struct virtqueue *vq);
            bool (*enable_cb)(struct virtqueue *vq);
    };
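As a rough illustration of how a driver drives this interface, the sketch below posts a two-element scatter-gather buffer (one part for the host to read, one for it to write) and later collects completed buffers from the callback path. It assumes the operations above are reachable through a vq_ops pointer in struct virtqueue, as in the Linux implementation of the time; the example_* helpers are hypothetical and error handling is minimal.

    #include <linux/scatterlist.h>
    #include <linux/virtio.h>

    /* Sketch only: assumes vq->vq_ops points at the virtqueue_ops shown
     * above and that the queue was obtained earlier via find_vq. */

    static void example_complete(void *token, unsigned int len); /* hypothetical */

    /* Post one readable ("out") and one writable ("in") buffer, then notify. */
    static int example_submit(struct virtqueue *vq,
                              void *req, unsigned int req_len,
                              void *resp, unsigned int resp_len,
                              void *token)
    {
            struct scatterlist sg[2];
            int err;

            sg_init_table(sg, 2);
            sg_set_buf(&sg[0], req, req_len);    /* read by the host */
            sg_set_buf(&sg[1], resp, resp_len);  /* written by the host */

            err = vq->vq_ops->add_buf(vq, sg, 1, 1, token);
            if (err < 0)
                    return err;

            vq->vq_ops->kick(vq);   /* tell the host new buffers are pending */
            return 0;
    }

    /* Typically called from the virtqueue callback once the host is done. */
    static void example_drain(struct virtqueue *vq)
    {
            unsigned int len;
            void *token;

            do {
                    vq->vq_ops->disable_cb(vq);     /* suppress callbacks */
                    while ((token = vq->vq_ops->get_buf(vq, &len)) != NULL)
                            example_complete(token, len);
            } while (!vq->vq_ops->enable_cb(vq));   /* false: more pending, retry */
    }

The disable_cb/enable_cb pair is best treated as a hint for suppressing unnecessary notifications; the loop guards against the race where a buffer completes between the final get_buf and callbacks being re-enabled.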