Implementation of Xen PVHVM drivers in OpenBSD

Mike Belopuhov
Esdenera Networks GmbH
[email protected]

Abstract

OpenBSD 5.9 will include a native implementation of Xen PVHVM drivers. It was written from scratch to facilitate simplicity and maintainability. One of the major goals of this effort is to run OpenBSD images in the Amazon cloud.

1 Introduction

The Xen virtual machine monitor provides two types of guest hosting depending on the underlying hardware: paravirtualized mode, and hardware assisted virtualization mode when a CPU with virtualization extensions (AMD-V or Intel VT-x) is used.

At the same time, guests running in the hardware assisted virtualization mode are not denied access to the paravirtualized facilities: they can reach them via the hypercall interface normally used by the paravirtualized instances.

We will explore what facilities are provided and how an HVM guest can combine the emulated PCI device tree with interfaces provided via paravirtualization.

2 Guest domain initialization

In order to gain access to the paravirtualized services, the guest must take a few steps to identify the hypervisor and set up the hypercall interface.

In OpenBSD a pvbus(4) pseudo bus abstraction takes care of identifying the type of the hypervisor via a CPUID signature and probes for child devices using the standard config(9) framework.

The Xen nexus device that performs domU setup is implemented as a xen(4) device driver that also acts as an attachment point for other virtual devices configured in the virtual machine settings.

3 The hypercall interface

In order to perform operations such as configuring devices and virtual interrupts, or simply fetching information from the dom0, Xen makes use of a hypercall interface which works similarly to the syscall(9) interface and results in a VMEXIT event on the hypervisor side.

The guest allocates a page of memory within the kernel's code segment and communicates its physical address to the hypervisor via an MSR write, upon which the hypervisor fills the page with content. Upon inspection, the page contains SGDT instructions at offsets representing different hypercalls.

Since the OpenBSD virtual memory subsystem doesn't implement a proper way to allocate memory pages that can later be called into, the code segment of the kernel itself had to be extended by one page. And while this is a rather straightforward modification, perhaps a randomized location would suit it better.

Via the established hypercall interface other parameters of the system can be learned, for example the extended version, enabled virtual machine features, etc.

Unlike other implementations, OpenBSD uses a single hypercall function that is defined as a variable argument function and expands the parameter list in order to construct hypercall arguments.

4 Shared Information Page

One of several basic ways of communicating information to the hypervisor and back to the guest system is using shared memory pages.

The Shared Information Page is a specialized page of memory that provides guests with access to the bitmaps of masked and pending event channel ports as well as other information, such as RTC, TSC, and information about NMIs.

The guest system must allocate a page of memory that has both physical and virtual mappings, for example via malloc(9), and communicate its frame number (the physical address expressed in page sized increments) to the hypervisor via a memory operation hypercall.

The Shared Information Page also includes a running cycle counter and a wall clock, so it should be possible in the future to turn this into a system timecounter. In fact, the PVCLOCK interface used by Linux is implemented this way.

5 The interrupt subsystem

There are two ways for a Xen hypervisor to inject an interrupt request into the system: via an Interrupt Descriptor Table vector that has been allocated by the guest solely for this purpose, and via a virtual PCI device, the XenSource Platform Device.

Once the interrupt is triggered, the guest operating system must run an interrupt vector handler that traverses the pending event channel ports bitmap inside the Shared Information Page to establish which ports have triggered the event.

A xen_intr_establish() method is provided in order to set up a callback that will be executed by the interrupt vector when the associated event port is pending in the event channel ports bitmap. In many cases the event port number is not known in advance and can be allocated by the aforementioned method itself. Likewise, xen_intr_disestablish() can be called to remove the binding.

During system startup and device driver initialization interrupts remain masked; they are unmasked after the root filesystem is mounted. Device drivers are required to operate in polling mode until interrupts are enabled.

After startup is finished, device drivers can mask and unmask their interrupt sources at will via calls to xen_intr_mask() and xen_intr_unmask().

Unlike other implementations, we have included support for marking Xen upcall interrupts as pending to integrate interrupt processing better with the rest of the system, e.g. to ensure that the interrupt handler is not reentrant.

5.1 Interrupts: the IDT method

When indicated by the virtual machine features, a guest system may communicate an allocated Interrupt Descriptor Table vector to the hypervisor so that the interrupt is delivered directly into the system without the help of an emulated APIC.

To set up an IDT vector, a system must establish a link between an IDT vector number in the range 0-255 and a callback function via an IDT gate descriptor. OpenBSD groups IDT vector numbers according to the Interrupt Priority Level they represent. The IPL_NET priority is used for the Xen interrupt vector and therefore the first vector in that group, 0x70, has been reserved for it.

Because this interrupt vector is not established via a PIC-compatible interface, the low level interrupt stub functions that implement pending interrupt processing cannot be used for it. Instead, a set of new functions akin to those used for the LAPIC timer is rolled to provide this functionality.

5.2 Interrupts: Platform Device

As an alternative to the IDT method, a domU guest implementing PCI bus discovery can implement a driver for the XenSource Platform Device, 0x5853:0x0001. This device provides a level triggered interrupt wired to the emulated APIC, and once it is set up, Xen can be made aware of it.

A driver, xspd(4), has been implemented for this device. It configures the Xen interrupt vector to call the Xen upcall when the IDT method is not available.

6 Grant tables

Grant tables represent a mechanism of passing references to pages of memory allocated by the guest across domains. In essence it is similar to the IOMMU mechanism, where device visible addresses are translated into physical addresses, but in this case device visible addresses are represented as indexes into the grant table and point to grant table entries.

Grant table entries contain a frame number and access flags that are set up when one domain wants to provide access to its own memory to the other domain. Upon startup the hypervisor sets an upper limit on how many grant table frames can be used by the guest system.

OpenBSD implements a bus_dma(9) [1] abstraction on top of grant tables. It defines a new bus_dma_tag that contains methods that wrap the underlying bus_dmamap_* functions in such a way that the memory managed by these underlying methods gets accounted for by the grant tables as well.

This allows drivers for paravirtualized devices to take advantage of a standard approach to DMA memory management. The first step is to create a DMA map that records meta information about the mapping that will be performed later. It records the number of segments, their sizes and the total size of the mapping. Due to a limitation of grant tables, only page sized segments are currently supported. The wrapper of bus_dmamap_create() allocates an additional array of entries that will be used to map physical addresses of map segments to grant table references. At the same time, grant table entries for all map segments are reserved via xen_grant_table_alloc(). This array of entries is then set as the DMA map cookie. A wrapped bus_dmamap_destroy() method can free those references and destroy the map.

Not all bus_dma(9) methods need to be wrapped. For instance, the memory allocation and KVA mapping functions bus_dmamem_alloc() and bus_dmamem_map(), as well as their destructive counterparts, don't need any special handling.

However, in order to establish associations between physical addresses of DMA memory segments and grant table references, wrapped versions of the bus_dmamap_load() family of functions are required. After calling the system bus_dmamap_load() method, a wrapper needs to go through all map segments represented by the dm_segs member of the bus_dmamap_t structure, associate them with grant table references and update entries in the grant table via xen_grant_table_enter(). Upon success the physical address of the segment, ds_addr, is replaced with the grant table reference. After this call, the driver can pass this reference to the other domain via a ring descriptor or a similar mechanism.

To remove the mapping, a bus_dmamap_unload() method wrapper calls xen_grant_table_remove() and puts the physical addresses back into ds_addr before

[Figure: a bus_dmamap_t object whose _dm_cookie points to a mapping table. The table pairs segments 0-2 with references 0-2; the references select grant table entries 0-2 in the Grant Table, which in turn refer to pages 1-3.]

..
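The create, load and unload steps described in section 6 can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the OpenBSD implementation: the structures are simplified stand-ins for the bus_dma(9) types, the grant table is a plain array, and every name beginning with gnt_ (as well as GT_NENTRIES and GT_INUSE) is hypothetical. The caller fills dm_segs by hand, standing in for the system bus_dmamap_load() method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define MAP_MAXSEGS	4	/* hypothetical per-map segment limit */
#define GT_NENTRIES	8	/* hypothetical grant table size */
#define GT_INUSE	0x1	/* hypothetical "entry is valid" flag */

/* Simplified stand-in for a bus_dma(9) segment. */
struct dma_seg {
	uint64_t ds_addr;	/* physical address, or grant ref after load */
	uint64_t ds_len;
};

/* One cookie entry per segment: saved address plus reserved reference. */
struct gnt_cookie_ent {
	uint64_t paddr;
	uint32_t ref;
};

/* Simplified stand-in for a DMA map with its cookie array. */
struct dma_map {
	int dm_nsegs;
	struct dma_seg dm_segs[MAP_MAXSEGS];
	struct gnt_cookie_ent dm_cookie[MAP_MAXSEGS];
};

/* Toy grant table entry: a frame number and access flags. */
struct gnt_entry {
	uint64_t frame;
	uint16_t flags;
};

static struct gnt_entry gnt_table[GT_NENTRIES];
static uint32_t gnt_next;	/* next unreserved reference */

/* Reserve one reference; returns 0 on success, -1 if the table is full. */
int
gnt_alloc(uint32_t *ref)
{
	if (gnt_next >= GT_NENTRIES)
		return -1;
	*ref = gnt_next++;
	return 0;
}

/* Create step: reserve a reference for every segment slot of the map. */
int
gnt_dmamap_create(struct dma_map *map, int nsegs)
{
	int i;

	if (nsegs < 0 || nsegs > MAP_MAXSEGS)
		return -1;
	map->dm_nsegs = 0;
	for (i = 0; i < nsegs; i++)
		if (gnt_alloc(&map->dm_cookie[i].ref) != 0)
			return -1;
	return 0;
}

/* Load step: publish each segment's frame, then swap ds_addr for the ref. */
void
gnt_dmamap_load(struct dma_map *map)
{
	int i;

	for (i = 0; i < map->dm_nsegs; i++) {
		struct gnt_cookie_ent *ce = &map->dm_cookie[i];

		ce->paddr = map->dm_segs[i].ds_addr;
		gnt_table[ce->ref].frame = ce->paddr >> PAGE_SHIFT;
		gnt_table[ce->ref].flags = GT_INUSE;
		map->dm_segs[i].ds_addr = ce->ref;
	}
}

/* Unload step: revoke each entry and restore the physical address. */
void
gnt_dmamap_unload(struct dma_map *map)
{
	int i;

	for (i = 0; i < map->dm_nsegs; i++) {
		struct gnt_cookie_ent *ce = &map->dm_cookie[i];

		gnt_table[ce->ref].flags = 0;
		gnt_table[ce->ref].frame = 0;
		map->dm_segs[i].ds_addr = ce->paddr;
	}
}
```

In this model a driver would call gnt_dmamap_create() once, fill the segments, call gnt_dmamap_load(), and then hand the values now stored in ds_addr to the other domain. Keeping the saved physical address next to the reference in the cookie is what lets the unload step undo the substitution without consulting the grant table.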
