In Proceedings of the 2002 International Computer Music Conference, Göteborg, Sweden

RtAudio: A Cross-Platform C++ Class for Realtime Audio Input/Output

Gary P. Scavone
Center for Computer Research in Music and Acoustics
Department of Music, Stanford University
Stanford, California 94305-8180 USA

Abstract

This paper presents a cross-platform C++ class for realtime audio input and output streaming. RtAudio provides a flexible, easy to use application programming interface (API) which allows complete audio system control, including device capability querying, multiple concurrent streams, and blocking and callback functionality. RtAudio is currently supported on Windows platforms using the DirectSound API, on Linux platforms using both the OSS and ALSA APIs, and on Irix platforms. Support for OS-X and Steinberg ASIO drivers is planned for Spring 2002.

1 Introduction

While programming languages have gained standardized support across the myriad of computer platforms and operating systems in existence, a commonly supported API for audio programming is far from a reality. As a result, an attempt to provide multi-platform support for an audio application can prove difficult at best. To further complicate matters, multiple audio driver interfaces often exist for a single operating system. For example, Windows platforms have DirectSound, Windows Multimedia Library, and ASIO (Steinberg) driver options, Linux platforms have Open Sound System (OSS) and Advanced Linux Sound Architecture (ALSA) drivers, and Macintosh platforms have Sound Manager and ASIO drivers, among others. RtAudio was designed to provide a common interface across a variety of these APIs in as flexible, yet simple, a manner as possible.

RtAudio was originally developed to provide audio input/output support for the Synthesis ToolKit in C++ (STK) [Cook and Scavone 1999]. However, the latest release of RtAudio (version 2.0, January 2002) was designed to function independently from STK, as well as any libraries other than those necessitated by the underlying platform-specific audio interfaces.

2 Features & Design Goals

RtAudio is a C++ class which provides a common API for realtime audio input/output across Linux, Irix, and Windows operating systems. RtAudio significantly simplifies the process of interfacing with computer audio hardware. It was designed with the following goals:

• object-oriented C++ structure
• single independent header and source file for easy inclusion in programming projects
• blocking and callback functionality
• flexible, easy to use, audio device parameter control
• automatic internal conversion for data format, channel number compensation, de-interleaving, and byte-swapping
• control over multiple audio streams and devices with a single class instance
• audio device capability probing

RtAudio incorporates the concept of audio streams, which represent independent audio output (playback) and/or input (recording) "connections" to audio devices. Available audio devices and their capabilities can be enumerated and then specified when opening a stream. Multiple streams can run at the same time and, when allowed by the underlying audio API, a single device can serve multiple streams.

The RtAudio API provides both blocking (synchronous) and callback (asynchronous) functionality. Callbacks offer a simple means for achieving non-blocking audio input/output. Blocking functionality is often necessary for explicit control of multiple input/output stream synchronization or when audio must be synchronized with other system events. All public RtAudio functions are thread-safe. This allows users to safely embed blocking RtAudio functions within a multi-threaded programming structure of their own design.

RtAudio offers uniform support for 8-bit, 16-bit, 24-bit, and 32-bit signed integer data formats, as well as 32-bit and 64-bit floating point formats. When an audio device does not natively support a requested user format, RtAudio provides automatic format conversion. In addition, internal routines will automatically perform any byte-swapping, channel number compensation, and channel de-interleaving required by the underlying audio driver or hardware.

On Linux platforms, both native ALSA and OSS audio APIs are supported. Portability to other OSS supported systems, such as Solaris and HP-UX, is untested but most likely easily achieved. The ALSA driver model was recently incorporated into the Linux development kernel and will likely gain wide acceptance in the near future. The ALSA API provides a more developed level of support for professional quality audio devices than OSS. On Windows platforms, only the DirectSound API is currently supported. On SGI platforms, the newer "al" API is supported.

The RtAudio API incorporates many of the concepts developed in the PortAudio project [Bencina and Burk 2001]. RtAudio distinguishes itself from PortAudio in its object-oriented C++ framework, single-file encapsulation, native blocking support, ALSA support, thread-safe routines, and slightly less ambitious API (which makes RtAudio less prone to bugs and easier to maintain and extend).

All source code for RtAudio is made freely available, allowing full user extensibility and customization. RtAudio is distributed with a tutorial and complete API documentation in HTML, PDF, and RTF formats.

3 The RtAudio API

All uses of RtAudio must begin with object instantiation. The default constructor RtAudio() scans the underlying audio system to verify that at least one audio input/output device is available. RtAudio uses C++ exceptions to handle critical errors, necessitating try/catch blocks around most member functions as well as constructors. Likewise, all uses of RtAudio must end with class destruction.

RtAudio uses a C++ exception handler called RtError, which is declared and defined within the RtAudio class files. An RtError can be caught by type, providing a means for error correction or, at a minimum, more detailed error reporting. Almost all RtAudio methods can "throw" an RtError, most typically if an invalid stream identifier is supplied to a method or a driver error occurs. There are a number of cases within RtAudio where warning messages may be displayed but an exception is not thrown.

3.1 Device Capabilities

RtAudio provides the following functions for use in probing the number and capabilities of available audio devices:

    int getDeviceCount (void);

    void getDeviceInfo (int device,
                        RTAUDIO_DEVICE *info);

The RTAUDIO_DEVICE structure contains information commonly required in assessing the capabilities of an audio device, including its name, minimum and maximum number of input, output, and duplex channels, supported sample rates, and native data formats.

3.2 Stream Creation & Parameters

In addition to the default constructor, RtAudio provides an overloaded constructor which allows a stream to be immediately opened with a given set of device parameters. Alternately, a stream can be opened after instantiation in much the same way.

    RtAudio (int *streamId,
             int outputDevice,
             int outputChannels,
             int inputDevice,
             int inputChannels,
             RTAUDIO_FORMAT format,
             int sampleRate,
             int *bufferSize,
             int numberOfBuffers);

    int openStream (int outputDevice,
                    int outputChannels,
                    int inputDevice,
                    int inputChannels,
                    RTAUDIO_FORMAT format,
                    int sampleRate,
                    int *bufferSize,
                    int numberOfBuffers);

A stream is opened with specified output and input devices, output and input channels, data format, sample rate, and buffer parameters. When successful, a stream identifier is returned which must be used for subsequent function calls on the stream. Audio devices are identified by integer values of one and higher, as enumerated by the getDeviceInfo() function. In addition, the system default input/output devices are identified by a zero value. When a device identifier of zero is specified during stream creation, RtAudio first attempts to open the default audio device(s) with the given parameters. If that fails, an attempt is made to find a device or set of devices which will meet the given parameters. If all attempts are unsuccessful, an RtError is thrown. When a positive, non-zero device value is specified, no additional devices are probed. Example program code is provided in the appendix of this paper.

Because RtAudio can be used to simultaneously control more than a single stream, it is necessary that the returned stream identifier be provided to nearly all public methods.

The bufferSize parameter specifies the desired number of sample frames which will be written to and/or read from a device per write/read operation. Both the bufferSize and numberOfBuffers parameters can be used to control stream latency, though there is no guarantee that the passed values will be accepted by a device. In general, lower values for both parameters will produce less latency but perhaps less robust performance. Both parameters can be specified with values of zero, in which case the smallest allowable values will be used. The bufferSize parameter is passed as a pointer and the actual value used by the stream is set during the device setup procedure.

3.3 Stream Control

An opened stream will not begin to input/output data until it is "started" using the startStream() function. Several other useful functions are listed below as well. A stream can be stopped and "restarted" as many times as necessary. Once the stream is closed, however, it ceases to exist. The abortStream() function stops a stream immediately, dropping any remaining audio samples in its queue. The stopStream() function plays out any remaining data in its queue before stopping.

    void startStream (int streamId);
    void stopStream (int streamId);
    void abortStream (int streamId);
    void closeStream (int streamId);

In general, the stopStream() and closeStream() methods should be called after finishing with a stream. However, both methods will implicitly be called during object destruction if necessary.

The remaining steps involved in audio playback or recording vary depending on whether blocking or callback functionality is used.

3.4 Blocking Input/Output

Blocking read/write functionality provides synchronous control of audio processing. In this mode, the user must first get a pointer to the stream buffer, provided by RtAudio, for use in feeding data to/from the opened stream. Memory management for the stream buffer is automatically controlled by RtAudio. The bufferSize value returned during stream creation defines the length, in sample frames, of the stream buffer. Multichannel data in the stream buffer must be in interleaved order.

    char *const getStreamBuffer (int streamId);
    void tickStream (int streamId);
    int streamWillBlock (int streamId);

After starting the stream, the sequence of events then consists of filling or reading from the stream buffer between calls to tickStream(). The tickStream() function blocks until the data within the stream buffer can be completely processed by the audio device. The streamWillBlock() function is provided as a means for determining, a priori, whether the tickStream() function will block, returning the number of sample frames that cannot be processed without blocking.

3.5 Callback Functionality

Callback functionality provides non-blocking, asynchronous control of audio processing. In this mode, the user defines a global C function which is periodically called when the audio device is ready to receive/send a new buffer of audio data. The callback function fills or reads interleaved data from the stream buffer, interfacing with the user program in an application-dependent manner. Internally, callback functionality involves the creation of a separate process or thread which provides non-blocking access to an audio device.

    void setStreamCallback (int streamId,
                            RTAUDIO_CALLBACK callback,
                            void *userData);
    void cancelStreamCallback (int streamId);

The cancelStreamCallback() function disassociates a callback function from an open stream. The user can subsequently set a new callback function for the stream or even use blocking functions. It should be noted that it is not possible to explicitly synchronize multiple simultaneous callback streams. When synchronous control is required in a non-blocking scheme, users should create their own thread in which they embed RtAudio blocking functions.

4 Summary

RtAudio provides a flexible and easy to use cross-platform API for realtime audio input/output within an object-oriented C++ framework. This paper has briefly presented some features and uses of RtAudio. Within the confines of this space, it is impossible to address all the necessary issues of interest to audio application programmers. We recommend that interested parties download the RtAudio distribution (http://www-ccrma.stanford.edu/~gary/rtaudio/) and read the extensive documentation provided.

References

R. Bencina and P. Burk. PortAudio - an Open Source Cross Platform Audio API. In Proc. 2001 Int. Computer Music Conf., pages 263–266, Havana, Cuba, 2001. Comp. Music Assoc.

P. R. Cook and G. P. Scavone. The Synthesis ToolKit (STK). In Proc. 1999 Int. Computer Music Conf., pages 164–166, Beijing, China, 1999. Comp. Music Assoc.

A Programming Examples

The following program example outlines the use of RtAudio in a simple, blocking playback situation. For the sake of clarity and space, error checking is omitted.

    // playback.cpp
    #include "RtAudio.h"

    int main()
    {
      int buffer_size = 256; // sample frames
      int id;                // the stream id
      RtAudio *out;

      // Open a 2 channel output stream during
      // class instantiation using the default
      // device, 32-bit floating point data,
      // and 44100 Hz sample rate. Suggest the
      // use of 4 internal device buffers of
      // 256 sample frames each.
      out = new RtAudio(&id, 0, 2, 0, 0,
                        RtAudio::RTAUDIO_FLOAT32,
                        44100, &buffer_size, 4);

      // Get a pointer to the stream buffer
      float *buf;
      buf = (float *)out->getStreamBuffer(id);

      // An example loop which runs for about
      // 40000 sample frames
      int count = 0;
      out->startStream(id);
      while (count < 40000) {
        // Generate samples and fill the buffer
        // with buffer_size sample frames.
        ...

        // Trigger the output of the data buffer
        out->tickStream(id);
        count += buffer_size;
      }

      out->stopStream(id);
      out->closeStream(id);
      delete out; // Cleanup.
      return 0;
    }

The last program example demonstrates callback functionality in a simple duplex, pass-through scenario. Again, error checking is omitted.

    // duplex.cpp
    #include <iostream>
    #include "RtAudio.h"

    // Pass-through callback function.
    int pass(char *buffer, int size, void *)
    {
      // Surprise!! Nothing to do here.
      return 0;
    }

    int main()
    {
      int buffer_size = 256; // sample frames
      int stream;            // the stream id
      RtAudio *audio;

      // Open a 2 channel input/output stream
      // during class instantiation using the
      // default devices, 64-bit floating point
      // data, and 44100 Hz sample rate.
      // Suggest the use of 2 internal device
      // buffers of 256 sample frames each.
      audio = new RtAudio(&stream, 0, 2, 0, 2,
                          RtAudio::RTAUDIO_FLOAT64,
                          44100, &buffer_size, 2);

      // Set the stream callback function
      audio->setStreamCallback(stream,
                               &pass, NULL);
      audio->startStream(stream);

      std::cout << "Hit <enter> to quit." << std::endl;
      char input;
      std::cin.get(input);

      audio->stopStream(stream);
      audio->closeStream(stream);
      delete audio; // Cleanup
      return 0;
    }