Cleaning up the Linux Desktop Audio Mess
Total Page:16
File Type:pdf, Size:1020Kb
Cleaning up the Linux Desktop Audio Mess Lennart Poettering Red Hat, Inc. [email protected] Abstract audio processing and network transparency in an exten- sible desktop sound server. PulseAudio is now part of Desktop audio on Linux is a mess. There are just too many distributions, and is likely to become the default many competing, incompatible sound systems around. sound system on Fedora and Ubuntu desktops in the Most current audio applications have to support every next releases of these distributions. sound system in parallel and thus ship with sound ab- straction layers with a more or less large number of The talk will mostly focus on the user-space side of back-end plug-ins. JACK clients are incompatible with Linux audio, specifically on the low-level interface be- ALSA clients, which in turn are incompatible with OSS tween hardware drivers and user-space applications. clients, which in turn are incompatible with ESD clients, and so on. “Incompatible” often means “exclusive;” e.g., if an OSS application gets access to the audio hard- 2 Current State of Linux Audio ware, all ALSA applications cannot access it. The current state of audio on Linux and other Free Soft- Apple MacOS X has CoreAudio, Microsoft Windows ware desktops is quite positive in some areas, but in XP has a new user-space audio layer; both manage to many other areas, it is unfortunately very poor. Several provide comprehensive APIs that make almost every competing audio systems and APIs are available. How- user happy, ranging from desktop users to pro audio ever, none of them is useful in all types of applications, people. Both systems provide fairly modern, easy-to- nor does any meet the goals of being easy-to-use, scal- use audio APIs, and a vast range of features including able, modern, clean, portable, and complete. Most of desktop audio “bling.” these APIs and systems conflict in one way or another. On Linux we should be able to provide the same: a com- mon solution that works on the desktop, in networked On the other hand, we have a few components and thin-client setups and in pro audio environments, scal- APIs for specific purposes that are well accepted and ing from mobile phones to desktop PCs and high-end cleanly designed (e.g., LADSPA, JACK). Also, Linux- audio hardware. based systems can offer a few features that are not avail- able on competing, proprietary systems. Among them 1 Fixing the Linux Audio Stack is network transparent audio and relatively low-latency scheduling. In my talk, I want to discuss what we can do to clean While competing, proprietary systems currently lack a up the mess that desktop audio on Linux is: why we few features the Linux audio stack can offer, they man- need a user-space sound system, what it should look aged to provide a single (specific to the respective OS) like, how we need to deal with the special requirements well-accepted API that avoids the balkanisation we cur- of networked audio and pro-audio stuff, and how we rently have on Free Software desktops. Most notably, should expose the sound system to applications for al- Apple MacOS X has CoreAudio which is useful for the lowing Compiz-style desktop “bling”—but for audio. I desktop as well as for professional audio applications. then will introduce the PulseAudio sound server as an Microsoft Windows Vista, on the other hand, now ships attempt to fix the Linux audio mess. a new user-space audio layer, which also fulfills many of PulseAudio already provides compatibility with 90% of the above requirements for modern audio systems and all current Linux audio software. It features low-latency APIs. • 145 • 146 • Cleaning up the Linux Desktop Audio Mess In the following sections, I will quickly introduce the • Hardly portable to non-Unix systems. systems that are currently available and used on Linux • ioctl()-based interface is not type-safe, not the desktop, their specific features, and their drawbacks. most user-friendly. 2.1 Advanced Linux Sound Architecture (ALSA) • Incompatible with everything else; every applica- tion gets exclusive access to the sound device, thus The ALSA system [2] has become the most widely ac- blocking all other applications from accessing it si- cepted audio layer for Linux. However, ALSA, both as multaneously. audio system and as API, has its share of problems: • Almost impossible to virtualize correctly and comprehensively. Hacks like esddsp, aoss, • The ALSA user-space API is relatively complicated. artsdsp have proven to not work. • ALSA is not available on anything but Linux. • Applications silently assume the availability of cer- • The ALSA API makes certain assumptions about tain driver functionality that is not necessarily avail- sound devices that are only true for hardware de- able in all setups. Most prominently, the 3D game vices. Implementing an ALSA plug-in for virtual Quake doesn’t run with drivers that don’t support devices (“software” devices) is not doable without mmap()-based access to the DMA audio buffer. nasty hacks. • No network transparency, no support for desktop • Not “high-level” enough for many situations. “bling.” • dmix is bug-ridden and incomplete. The good things: • Resampling is very low quality. • Relatively easy to use; • You need different configurations for normal desk- top use (dmix) and pro audio use (no dmix). • Very well accepted, even beyond Linux; • Feels very Unix-ish. On the other hand, it also has some real advantages over other solutions: 2.3 JACK • It is available on virtually every modern Linux in- stallation. The JACK Audio Connection Kit [3] is a sound server • It is very powerful. for professional audio purposes. As such, it is well ac- cepted in the pro-audio world. Its emphasis, besides • It is (to a certain degree) extendable. playback of audio through a local sound card, is stream- ing audio data between applications. 2.2 Open Sound System (OSS) Plusses: OSS [7] has been the predecessor of ALSA in the Linux kernel. ALSA provides a certain degree of compatibility • Easy to use; with OSS. Besides that the API is available on several • Powerful functionality; other Unixes. OSS is a relatively “old” API, and thus • It is a real sound server; has a number of limitations: • It is very well accepted in the pro-audio world. • Doesn’t offer all the functionality that modern sound hardware provides which needs to be supported by Drawbacks: the software (such as no surround sound, no float samples). • Only floating point samples; • Not high-level enough for almost all situations, since • Fixed sampling rate; it doesn’t provide sample format or sample rate • Somewhat awkward semantics which makes it un- conversions. Most software silently assumes that usable as a desktop audio server (i.e., server doesn’t S16NE samples at 44100Hz are available on all sys- start playback automatically, needs a manual “start” tems, which is no longer the case today. command); 2007 Linux Symposium, Volume Two • 147 • Not useful on embedded machines; goal, for the most part. More precisely, a modern sound • No network transparency. API for Free Software desktops should fulfill the follow- ing requirements: 2.4 aRts • Completeness: an audio API should be a general- purpose interface; it should be suitable for simple The KDE sound server aRts [5] is no longer actively de- audio playback as well as professional audio pro- veloped and has been orphaned by its developer. Having duction. a full music synthesiser as desktop sound server might • Scalability: usable on all kinds of different systems, not be such a good idea, anyway. ranging from embedded systems to modern desktop PCs and pro audio workstations. 2.5 EsounD • Modernness: provide good integration into the Free Software desktop ecosystem. The Enlightened Sound Daemon (EsounD or ESD) [4] • Proper support for networked and “software” (vir- has been the audio daemon of choice of the GNOME tual) devices, besides traditional hardware devices. desktop environment since GNOME 1.0 times. Besides basic mixing and network transparency capabilities, it • Portability: the audio API should be portable across doesn’t offer much. Latency querying, low-latency be- different operating systems. haviour, and surround sound are not available at all. It • Easy-to-use for both the user, and for the program- is thus hardly useful for anything beyond basic music mer. This includes a certain degree of automatic playback or playing event sounds (“bing!”). fine-tuning, to provide optimal functionality with “zero configuration.” 2.6 PortAudio 3.2 Routing and Filtering Audio in Software The PortAudio API [6] is a cross-platform abstraction layer for audio hardware access. As such it sits on top Classic audio systems such as OSS are designed to pro- of other audio systems, like OSS, ALSA, ESD, the Win- vide an abstract API around hardware devices. A mod- dows audio stack, or MacOS X’s CoreAudio. PortAudio ern audio system should provide features beyond that: has not been designed with networked audio devices in mind, and also doesn’t provide the necessary function- • It needs to be possible to play back multiple audio ality for clean integration into a desktop sound server. streams simultaneously, so that they are mixed in PortAudio has never experienced wide adoption. real time. • Applications should be able to hook into what is cur- 3 What we Need rently being played back. • Before audio is written to an output device, it might As shown above, none of the currently available audio be transferred over the network to another machine. systems and APIs can provide all that is necessary on • Before audio is played back some kind of post- a modern desktop environment.