Getting X Off the Hardware

Keith Packard HP Cambridge Research Laboratory [email protected]

Abstract

The X window system is generally implemented by directly inserting hardware manipulation code into the X server. Mode selection and 2D acceleration code are often executed in user mode and directly communicate with the hardware. The current architecture provides for separate 2D and 3D acceleration code, with the 2D code executed within the X server and the 3D code directly executed by the application, partially in user space and partially in the kernel. Video mode selection remains within the X server, creating an artificial dependency for 3D graphics on the correct operation of the window system. This paper lays out an alternative structure for X within the Linux environment where the responsibility for acceleration lies entirely within the existing 3D user/kernel library, the mode selection is delegated to an external library and the X server becomes a simple application layered on top of both of these. Various technical issues related to this architecture, along with input device handling, will be discussed.

1 History

The X11[SG92] server architecture was designed assuming significant operating system assistance for supporting input and output devices. How that has changed over the years will inform the discussion of the design direction proposed in this paper.

1.1 Original Architecture

One of the first 2D accelerated targets for X11 was the Digital QDSS (Dragon) board. The Dragon included a 1024x768 frame buffer with 4 or 8 bits for each pixel. The frame buffer was not addressable by the CPU; rather, every graphics operation was performed by the co-processor. The Dragon board had only a single video mode supporting the monitor supplied with the machine. A primitive terminal emulator in the kernel provided the text mode necessary to boot the machine.

Graphics commands to the processor were queued to a shared DMA buffer. The X server would block in the kernel waiting for space in the buffer when full.

Keyboard and mouse support were provided by another shared memory queue between the kernel and X server. Abstract event structures were constructed by the kernel from the raw device data, timestamped and placed in the shared queue. A file descriptor would be signalled when new data were inserted to awaken the X server, and the X server could also directly examine the queue indices which were stored in the shared segment. This low-overhead queue polling was used by the X server to check for new input after every X request was executed to reduce input latency.

The hardware sprite was handled in the kernel; its movement was directly connected with the mouse driver so that it could be moved at interrupt time, leading to a responsive pointer even in the face of high CPU load within the X server and other applications. The keyboard controller managed the transition from ASCII console mode to key-transition X mode internally; abnormal termination of the X server would leave the underlying console session working normally.

1.2 The Slippery Slope

Early Sun workstations had unaccelerated frame buffers. Like the QDSS above, they used fixed monitors and had no need to support multiple video modes. As the hardware advanced, they did actually gain programmable timing hardware, but that was not configurable from user mode applications.

The X server simply mapped the frame buffer into its address space and manipulated the pixel values directly. Around 1990, Sun shipped the cgsix frame buffer which included an accelerator. Unlike the QDSS, the cgsix frame buffer could be mapped by the CPU, and the accelerator documentation was not published by Sun. X11R4 included support for this card as a simple dumb frame buffer. As CPU access to the frame buffer was slower than with Sun's earlier unaccelerated frame buffers, the result was a much slower display.

By disassembling the provided SunWindows driver, the author was able to construct an accelerated X driver for X11R5 entirely in user mode. This driver could not block waiting for the accelerator to finish; rather, it would spin, polling the accelerator until it indicated it was idle.

Keyboard and mouse support were provided by the kernel as files from which events could be read. The lack of any shared memory mechanism to signal available input meant that the original driver would not notice input events until the X server polled the kernel, something which could take significant time. As there was no kernel support for the pointer sprite, the X server was responsible for updating it as well, leading to poor mouse tracking when the CPU was busy.

To ameliorate the poor mouse tracking, the X server was modified to receive a signal when input was present on the file descriptors and immediately process the input. When supported, the hardware sprite would also be moved at this time, leading to improved tracking performance. Still, the fact that the X server itself was responsible for connecting the mouse motion to the sprite location meant that under high CPU load, the sprite would noticeably lag the mouse.

Kernel support for the keyboard consisted of a special mode setting which would transform the keyboard from an ASCII input device to one reporting raw key transition events. Because the kernel didn't track what state the keyboard was in, the X server had to carefully reset the keyboard on exit back to ASCII mode or the user would no longer be able to interact with the console.

Placing the entire graphics driver in user mode eliminated the need to write a kernel driver, but degraded overall system performance by forcing the CPU to busy-wait for the graphics engine.

Fixing the kernel to address these problems was never even considered; the problems didn't prevent the system from functioning, they only made it less than ideal.
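As a rough illustration of the shared input queue described in Section 1.1, the following Python sketch models a fixed-size ring with one producer (standing in for the kernel) and one consumer (standing in for the X server). It is only a single-process model under stated assumptions: the names InputEvent, SharedEventQueue and run_requests are hypothetical, the event fields are invented for illustration, and nothing here reproduces the actual QDSS driver, which shared memory between the kernel and the server.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InputEvent:
    """Abstract, device-independent event (hypothetical field layout)."""
    timestamp_ms: int
    device: str   # e.g. "keyboard" or "pointer"
    code: int
    value: int


class SharedEventQueue:
    """Ring buffer with one producer and one consumer.

    One slot is always left empty so that head == tail means "empty".
    """

    def __init__(self, size: int = 8) -> None:
        self.slots: List[Optional[InputEvent]] = [None] * size
        self.head = 0  # next slot the consumer will read
        self.tail = 0  # next slot the producer will write

    def has_input(self) -> bool:
        # The cheap check: compare two indices, no system call needed.
        return self.head != self.tail

    def put(self, event: InputEvent) -> bool:
        """Producer side; returns False when the ring is full."""
        nxt = (self.tail + 1) % len(self.slots)
        if nxt == self.head:
            return False
        self.slots[self.tail] = event
        self.tail = nxt
        return True

    def get(self) -> Optional[InputEvent]:
        """Consumer side; returns None when the ring is empty."""
        if not self.has_input():
            return None
        event = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        return event


def run_requests(requests: List[str],
                 queue: SharedEventQueue) -> List[Tuple[str, object]]:
    """Execute each request, then drain any pending input before the next."""
    log: List[Tuple[str, object]] = []
    for req in requests:
        log.append(("request", req))
        while queue.has_input():
            log.append(("input", queue.get().code))
    return log
```

The point of the design is that the check after every request costs only an index comparison, which is what made per-request polling affordable; the signalled file descriptor mentioned in the text is what would wake a server that is otherwise idle, and is not modelled here.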
1.3 The Dancing Bear

With widespread availability of commodity 386-based PC hardware, numerous vendors began shipping Unix (and Unix-like) operating systems for them. These originally did not include the X window system. A disparate group of users ported X to these systems without any support from the operating system vendors.

That these users managed to get X running on early 386 hardware was an impressive feat. That they had to do everything without any kernel support only increased the difficulty.

Early PC graphics cards were simple frame buffers as far as graphics operations went, but configuring them to generate correct video timings was far from simple. Because monitors varied greatly, each graphics card could be programmed to generate many different video timings. Incorrect timings could destroy the monitor.

Keyboard support in these early 386-based Unix systems was much like the Sun operating system; the keyboard was essentially a serial device and could be placed in a mode which translated key transitions into ASCII or in a mode which reported the raw bytes emitted from the keyboard.

The X server would read these raw bytes and convert them to X events. Again, there was latency here as the X server would not process them except when polling for input across all X clients and input devices. As with the Sun driver, if the X server terminated without switching the keyboard back to translated mode, it would not be usable for the console. This particular problem was eventually kludged in Linux by adding special key sequences to reset the keyboard to translated mode.

Mouse support really was just a kernel serial driver; PS/2 mice didn't exist, and so bus and serial mice were used. The X server itself would open the device, configure the communication parameters and parse the stream of bytes. As there was no hardware sprite support, the X server would also have to draw the cursor on the screen; that operation had to be synchronized with rendering and so would be delayed until the server was idle.

Because the X server itself was managing the video mode configuration, an abnormal X server termination would leave the video card misconfigured and unusable as the console. Similarly, the keyboard driver would be left in untranslated mode, so the user couldn't even operate the computer blind to reboot.

The X server had assumed the same reliability requirements as the operating system kernel itself; bugs in the X server would render the system just as unusable as bugs in the kernel.

1.4 The Pit of Despair

With the addition of graphics acceleration to the x86 environment, the X server extended its user-mode operations to include manipulation of the accelerator. As with the Sun GX driver described above, these drivers included no kernel support and were forced to busy-wait for the hardware.

However, unlike the GX hardware, PC graphics hardware would often tie down the PCI bus while transferring data between the CPU and the graphics card. Incorrect manipulation of the hardware would result in the PCI bus locking and the system not even responding to network or disk activity. Unlike the simple keyboard translation problem described above, this cannot be fixed in the operating system.

Because the graphics devices had no kernel driver support, there was no operating system management of their address space mappings. If the BIOS included with the system incorrectly mapped the graphics device, it fell to the X server to repair the PCI mapping spaces. Manipulating the PCI address configuration from a user-mode application would work only on systems without any dynamic management within the kernel.

If the machine included multiple graphics devices controlled through the standard VGA addresses, the X server would need to manipulate these PCI mappings on the fly to address the active card.

The overall goal was not to build the best system possible, but rather to make the code as portable as possible, even in the face of obviously incorrect system architecture.

1.5 A Glimmer of Hope

The Mesa project started as a software-only rasterizer for the OpenGL API. By providing a freely available implementation of this widely accepted API, people could run 3D applications on every machine, even those without custom 3D acceleration hardware. Of course, performance was a significant problem, especially as the 3D world moved from simple colored polygons to textures and complex lighting environments.

The Mesa developers started adding hardware support for the few cards for which documentation was available. At first, these were whole-screen drivers, but eventually the DRI project was started to support multiple 3D applications integrated into the X window system. Because of the desire to support secure direct rendering from multiple unprivileged applications, the DRI project had to include a kernel driver. That driver could manage device mappings, DMA and interrupt logic and even clean up the hardware when applications terminated abnormally.

All 3D commands are written to command buffers passed from the user application through the kernel to the device; the kernel can perform validation on the contents before queuing them for transfer to the graphics card. In many ways, this system is similar to the old Dragon X server described above.

The result is a system which is stable in the face of broken applications, and provides high performance and low CPU overhead. However, the DRI environment remains reliant on the X server to manage video mode selection and basic device input.

2 Forward to the Past

Given the dramatic changes in system architecture and performance characteristics since the original user-mode X server architecture was promulgated, it makes sense to look at how the system should be constructed from the ground up. Questions about where support for each operation should live will be addressed in turn, starting with graphics acceleration, then video mode selection and finally (and most briefly) input devices.

3 Graphics Acceleration

X has always directly accessed the lowest levels of the system to accelerate 2D graphics. Even on the QDSS, it constructed the register-level instructions within the X server itself. With the inclusion of OpenGL[SAe99] 3D graphics in some systems, the system requires two separate graphics drivers, one for the X server operating strictly in 2D mode and the other inside the GL library for 3D operations. Improvements to the 3D support have no effect on 2D performance.

As a demonstration of how effectively OpenGL can implement the existing X server graphics operations, Peter Nilsson and David Reveman implemented the Glitz library[NR04] which supports the Render[Pac01] API on top of the OpenGL API. In a few months, they managed to provide dramatic acceleration for the Cairo graphics library[WP03] on any hardware with an OpenGL implementation. In contrast, the Render implementation within the X sample server using custom 2D drivers has never seen significant acceleration, even three and a half years after the extension was originally designed. Only a few drivers include even half-hearted attempts at acceleration.

The goal here is to have the X server use the OpenGL API for all graphics operations. Eliminating the custom 2D acceleration code will reduce the development burden. Using accelerated OpenGL drivers will provide dramatic performance improvements for important operations now ill-supported in existing X drivers.

Work in this area will depend on the availability of stand-alone OpenGL drivers that work in the absence of an underlying window system. Fortunately, the Mesa project is busy developing the necessary infrastructure. Meanwhile, development can progress apace using the existing window-system dependent implementations, with the result that another X server is run just to configure the graphics hardware and set up the GL environment.

For cards without complete OpenGL acceleration, the desired goal is to provide DRI-like kernel functionality to support DMA and interrupts to enable efficient implementation of whatever useful operations the card does support. For 2D graphics, the operations needing acceleration are those limited by memory bandwidth: large area fills and copies. In particular, acceleration of image composition results in dramatic performance improvements with minimal amounts of code. The spectacular amounts of code written in the past that provide modest acceleration for corner cases in the X protocol should be removed and those cases left to software to minimize driver implementation effort.

This architecture has been implemented by Eric Anholt in his kdrive-based Xati server[Anh04]. Using the existing DRI driver for the Radeon graphics card, he developed a 2D X driver with reasonable acceleration for common operations, including significant portions of the X Render API. The driver uses only a small fraction of the Radeon DRI driver; a significantly smaller kernel driver would suffice for a ground-up implementation.

In summary, graphics cards should be supported in one of two ways:

1. With an OpenGL-based X server
2. With a 2D-only X server based on a simple loadable driver API.

3.1 Implications for Applications

None of the architectural decisions about the internal X server architecture change the nature of the existing X and Render APIs as the fundamental 2D interface for applications. Applications using the existing APIs will simply find them more efficient when the X server provides a better implementation for them. This means that applications needn't migrate to non-X APIs to gain access to reasonable acceleration.

However, applications that wish to use OpenGL should find a wider range of supported hardware as driver writers are given the choice of writing either an OpenGL or 2D driver, and aren't faced with the necessity of starting with a 2D driver just to support X.

In any case, use of the cairo graphics library provides insulation from this decision as it supports both X and GL, requiring only modest changes in initialization to select between them.

4 Video Mode Configuration

The area of video mode selection involves many different projects and interests; one significant goal of this discussion is to identify which areas are relevant to X and how those can be separated from the larger project.

4.1 Overview of the Problem

Back in 1984 when X was designed, graphics devices were fundamentally fixed in their relationship with the attached monitor. The hardware would be carefully designed to emit video timings compatible with the included monitor; there was no provision for adjusting video timings to adapt to different monitors, and each video card had a single monitor connector.

Fast forward to 2004 when common video cards have two or more monitor connectors along with outputs for standard NTSC, SECAM or PAL video formats. The desire to dynamically adjust the display environment to accommodate different use modes is well supported within the Apple OS X and Microsoft Windows environments, but the X window system has remained largely stuck with its 1984 legacy.

4.2 X Attempts to Fix Things

X servers for PC operating systems adapted to simple video mode selection by creating a "virtual" desktop at least as large as the largest desired mode and making the current mode view a subset of that, panning the display around to keep the mouse on the screen. For users able to accept this metaphor, this provided usable, if less than ideal, support. Most of the time, however, having content off of the screen which could only be reached by moving the mouse was confusing. To help address this, the X Resize and Rotate extension (RandR)[GP01] was designed to notify applications of changes in the pixel size of the screen and allow programmatic selection among available video modes.

The RandR extension solved the simple single monitor case well enough, even permitting the set of available modes to change on the fly as monitors were switched. However, it failed to address the wider problem of supporting multiple different video outputs and the dynamic manipulation of content between them.

Statically, the X server can address each video output correctly and even select between a large display spanning a collection of outputs or separate displays on each video screen. However, there is no capability to adjust these configurations dynamically, nor even to automatically adapt to detected changes in the environment.

4.3 X is Only Part of the Universe

With 2D performance no longer a significant marketing tool, graphics hardware vendors have been focusing instead on differentiating their products based on video output (and input) capabilities. This has dramatically extended the options available to the user, and increased the support necessary within the operating system.

As the suite of possible video configuration options continues to expand, it seems impossible to construct a fixed, standard X extension capable of addressing all present and future needs. Therefore, a fully capable mechanism must provide some "back door" through which display drivers and user agents can communicate information about the video environment which is not directly relevant to the window system or applications running within it.

One other problem with the current environment is that video mode selection is not a requirement unique to the X window system. Numerous other graphical systems exist which are all dependent on this code. Currently, that is implemented separately for each video card supported by each system. The MxN combination of graphics systems and video cards means that only a few systems have support for a wide range of video cards. Support for systems aside from X is pretty sparse.

4.4 Who's in Charge Here, Anyway?

X itself places relatively modest demands on the system. The X server needs to be aware of what video cards are available, what video modes are available for each card and how to select the current mode. Within that mode there may be a wealth of information that is not relevant to the X server; it really only needs to know the pixel dimensions of each frame buffer, the physical dimension of pixels on each monitor and the geometric relationship among monitors. Details about which video ports are in use, or how the various ports relate to the frame buffer, are not important. Information about video input mechanisms is even less relevant.

As the X server need have no way of interpreting the complexity of the video mode environment, it should have no role in managing it. Rather, an external system should assume complete control and let the X server interact in its own simple way.

This external system could be implemented partially in the kernel and partially in user mode. Doing this would allow the kernel to share the same logic for video mode selection during boot time for systems which don't automatically configure the video card suitably on power-on. In addition, alternate graphics systems would be able to share the same API for their own video mode configuration.

5 Input Device Support

In days of yore, the X environment supported exactly one kind of mouse and one (perhaps of an internationalized family) keyboard. Sadly, this is no longer the case. The wealth of available input devices has caused no small trouble in X configuration and management. Add to that the relative failure of the X Input extension to gain widespread acceptance in applications and the current environment is relegated to emulating that available in 1984.

5.1 Uniform Device Access

The first problem to attack is that of the current hodgepodge device support where the X server itself is responsible for parsing the raw bytestreams coming from the disparate input devices. Fortunately, the kernel has already solved that problem: the new /dev/input based drivers provide a uniform description of devices and a standard interface to all. Converting the X server over to those interfaces is straightforward.

However, the /dev/input/mice interface has a significant advantage in today's world; it unifies all mouse devices into a single stream so that the X server doesn't have to deal with devices that come and go. So, to switch input mechanisms, the X server must first learn to deal with devices appearing and disappearing.

5.2 Hotplug and HAL

Mice (and even keyboards) can be easily attached and detached from the machine. With USB, the system is even automatically notified about the coming and going of devices. What is missing here is a way of getting that notification delivered to the X server, having the X server connect to the new device (when appropriate), notifying X applications about the availability of the new device and integrating the device's events into the core pointer or keyboard event stream.

The Hardware Abstraction Layer (HAL)[Zeu] project is designed to act as an intermediary between the Linux Hotplug system and applications interested in following the state of devices connected to the machine. By interposing this mechanism, the complexity of discovering and selecting input devices for the X server can be moved into a separate system, leaving the X server with only the code necessary to read events from the devices specified by the HAL. One open question is whether this should be done by a direct connection between the X server and the HAL daemon or whether an X client could listen to HAL and transmit device state changes through the X protocol to the X server.

One additional change needed is to extend the X Input Extension to include notification of new and departed devices. That extension already permits the list of available devices to change over time; all that it lacks is the mechanism to notify applications when that occurs. Inside the X server implementation, the extension is in for some significantly more challenging changes as the current codebase assumes that the set of available devices is fixed at server initialization time.

6 Migrating Devices

When X was developed, each display consisted of a single keyboard and mouse along with a fixed set of monitors. That collection was used for a single login session, and the input devices never moved. All of that has now changed; input devices come and go, computers get plugged into video projectors, multiple users log in to the same display. The dynamic nature of the modern environment requires some changes to the X protocol in the form of new or modified extensions.

6.1 Whose Mouse Is This?

Input devices are generally located in physical proximity to the related output device. In a system with multiple output devices and multiple input devices, there is no existing mechanism to identify which device is where. Perhaps some future hardware advance will include geographic information along with the bus topology.

The best we can probably do for now is to provide a mechanism to encode in the HAL database the logical grouping of input and output devices. That way the X server would receive from the HAL the set of devices to use at startup time and then accept ongoing changes in that as the system was reconfigured.

One problem with this simplistic approach is that it doesn't permit the migration of input devices from one grouping to another; one can easily imagine the user holding a wireless pointing device and attempting to interact with the "wrong" display. Some mechanism for dynamically reconfiguring the association database will need to be included.

6.2 Hotplugging Video Hardware

While most systems have no ability to add or remove graphics cards, it's not unheard of; many handheld computers support CF video adapters. On the other hand, nearly all systems do support "hotplugging" of the actual display device or devices. Many can even detect the presence or absence of a monitor, enabling true auto-detection and automatic reconfiguration.

When a new monitor is connected, the X server needs to adapt its configuration to include it. In the case where the set of physical screens is gathered together as a single logical screen, the change can be reflected by resizing that single screen as supported by the RandR extension. However, if each physical screen is exposed to applications as a separate logical screen, then the X server must somehow adapt to the presence of a new screen and report that information to applications. This will require an extension.

In terms of the existing X server implementation, the changes are rather more dramatic. Again, it has some deep-seated assumptions that the set of hardware under its control will not change after startup. Fixing these will keep developers entertained for some time.

6.3 Virtual Terminal Switching

One capability Linux has had for a long time is the ability to rapidly switch among multiple sessions with "virtual terminals". The X server itself uses this to preserve a system console; running on a separate terminal ensures that the system console can be viewed by simply switching to the appropriate virtual terminal. Given this, multiple X servers can be started on the same hardware, each one on a different virtual terminal and rapidly switched among.

The virtual terminal mechanism manages only the primary graphics device and the system keyboard. Management of other graphics and input devices is purely by convention. The result is that multiple simultaneous X sessions are not easily supported by the standard build of the X server. The X server targeted at a non-primary graphics device needs to avoid configuring the virtual terminal. However, this also eliminates the ability for that device to support multiple sessions; there cannot be virtual terminal switching on a device which is not associated with any virtual terminals.

With the HAL providing some indication of which devices should be affiliated into a single session configuration, the X server can at least select them appropriately. Similarly, the X server should be able to detect which device is the console keyboard and manage virtual terminals from there. Whether the kernel needs to add support for virtual terminals on the other graphics/keyboard devices is not something X needs to answer.

The final problem is that of other input devices; when switching virtual terminals, the X server conventionally drops its connection to the other input devices, presuming that whatever other program is about to run will want to use the same ones. While that does work, it leaves open the possibility that an error in the X server will leave these devices connected and deny other applications access to them. Perhaps it would be better if the kernel were involved in the process, directing input among multiple consumers automatically as VT affiliation changed.

7 Conclusion

Adapting the X window system to work effectively and competently in the modern environment will take some significant changes in architecture; however, throughout this process existing applications will continue to operate largely unaffected. If this were not true, the fundamental motivation for the ongoing existence of the window system would be in doubt.

Migrating responsibility for device management out of the X server and back where it belongs inside the kernel will allow for improvements in system stability, power management and correct operation in a dynamic environment. Performance of the resulting system should improve as the kernel can take better advantage of the hardware than is possible in user mode.

Sharing graphics acceleration between 2D and 3D applications will reduce the effort needed to support new graphics hardware. Migrating the video mode selection will allow all graphics systems to take advantage of it. This should permit some interesting exploration in system architecture.

Significant work remains in defining the precise architecture of the kernel video drivers; these drivers need to support console operations, frame buffer device access and DRI (or other) 3D acceleration. A common memory allocation mechanism seems necessary, along with figuring out a reasonable division of labor between kernel and user mode for video mode selection.

Other work remains to resolve conflicts over sharing devices among multiple sessions and creating a mechanism for associating specific input and output devices together.

The resulting system regains much of the flavor of the original X11 server architecture. The overall picture of a system which provides hardware support at the right level in the architecture appears to have wide support among the relevant projects, making the future prospects bright.

References

[Anh04] Eric Anholt. High Performance X Servers in the Kdrive Architecture. In FREENIX Track, 2004 Usenix Annual Technical Conference, Boston, MA, July 2004. USENIX.

[GP01] Jim Gettys and Keith Packard. The X Resize and Rotate Extension - RandR. In FREENIX Track, 2001 Usenix Annual Technical Conference, Boston, MA, June 2001. USENIX.

[NR04] Peter Nilsson and David Reveman. Glitz: Hardware Accelerated Image Compositing using OpenGL. In FREENIX Track, 2004 Usenix Annual Technical Conference, Boston, MA, July 2004. USENIX.

[Pac01] Keith Packard. Design and Implementation of the X Rendering Extension. In FREENIX Track, 2001 Usenix Annual Technical Conference, Boston, MA, June 2001. USENIX.

[SAe99] Mark Segal, Kurt Akeley, and Jon Leach (ed). The OpenGL Graphics System: A Specification. SGI, 1999.

[SG92] Robert W. Scheifler and James Gettys. X Window System. Digital Press, third edition, 1992.

[WP03] Carl Worth and Keith Packard. Xr: Cross-device Rendering for Vector Graphics. In Proceedings of the Ottawa Linux Symposium, Ottawa, ON, July 2003. OLS.

[Zeu] David Zeuthen. HAL Specification 0.2. http://freedesktop.org/~david/hal-0.2/spec/hal-spec.html.