Low-Latency Audio on Linux by Means of Real-Time Scheduling∗

Tommaso CUCINOTTA and Dario FAGGIOLI and Giacomo BAGNOLI
Scuola Superiore Sant’Anna
Via G. Moruzzi 1, 56124, Pisa (Italy)
{t.cucinotta, d.faggioli}@sssup.it, [email protected]

Abstract

In this paper, we propose to use resource reservation scheduling and feedback-based allocation techniques to provide proper timeliness guarantees to audio processing applications. This allows real-time audio tasks to meet the tight timing constraints characterizing them, even if other interactive activities are present in the system. The JACK sound infrastructure has been modified, leveraging the real-time scheduler present in the Adaptive Quality of Service Architecture (AQuoSA). The effectiveness of the proposed approach, which does not require any modification to existing JACK clients, is validated through extensive experiments under different load conditions.

Keywords

JACK, real-time, scheduling, time-sensitive, resource reservation

∗ The research leading to these results has received funding from the European Community’s Seventh Framework Programme FP7 under grant agreement n.214777 “IRMOS – Interactive Realtime Multimedia Applications on Service Oriented Infrastructures” and n.248465 “S(o)OS – Service-oriented Operating Systems.”

1 Introduction and Related Work

There is an increasing interest in considering General Purpose Operating Systems (GPOSes) in the context of real-time and multimedia applications. In the Personal Computing domain, multimedia sharing, playback and processing increasingly require mechanisms allowing for low and predictable latencies even in the presence of background workloads nearly saturating the available resources, e.g., network links and CPU power. In the professional multimedia domain, it is becoming quite common to see, on stage, a digital keyboard attached to a common laptop running GNU/Linux. DJs and VJs are moving to computer-based setups, to the point that mixing consoles have turned from big decks into simple personal computers, only containing audio collections and running the proper mixing software.

In fact, developing complex multimedia applications on GNU/Linux allows for the exploitation of a multitude of OS services (e.g., networking), libraries (e.g., sophisticated multimedia compression libraries) and media/storage support (e.g., memory cards), as well as comfortable and easy-to-use programming and debugging tools. However, contrary to a Real-Time Operating System (RTOS), a GPOS is not generally designed to provide scheduling guarantees to the running applications. This is why either a large amount of buffering is very likely to occur, with an unavoidable impact on response time and latencies, or the POSIX fixed-priority (e.g., SCHED_FIFO) real-time scheduling is used, but this turns out to be difficult when there is more than one time-sensitive application in the system.

Yet, on a present-day GNU/Linux system, we may easily find a variety of applications with tight timing constraints that might benefit from precise scheduling guarantees, in order to provide near-professional quality of the user experience, e.g., audio acquisition and playback, multimedia (video, gaming, etc.) display, and video acquisition (v4l2), just to cite a few. In such a challenging scenario, in which we can easily find a few tens of threads of execution with potentially tight real-time requirements, an accurate set-up of real-time priorities may easily become cumbersome, especially for the user of the system, who is usually left alone with such critical decisions as setting the real-time priority of a multimedia task.

More advanced scheduling services than just priority-based ones have been made available for Linux in recent years, among others by [Palopoli et al., 2009; Faggioli et al., 2009; Checconi et al., 2009; Anderson and Students, 2006; Kato et al., 2010]. Such scheduling policies are based on a clear specification, made by the application, of the computing power it needs and of the time granularity with which it needs it (which determines the latency); this scheme is referred to as resource reservations. The specification is usually given in terms of a reservation budget of time units to be guaranteed every period. The reservation period may easily be set equal to the minimum activation period of the application. Identifying the reservation budget may be a more involved task, due to the need for a proper benchmarking phase of the application, and it is even harder for applications with significant fluctuations of the workload (as often happens in multimedia ones). Rather, it is more convenient to engage adaptive reservation scheduling policies, where the scheduling parameters are dynamically changed at run-time by an application-level control loop. This acts by monitoring some application-level metrics and increasing or decreasing the amount of allocated computing resources depending on the instantaneous application workload. Some approaches of this kind are [Segovia et al., 2010; Abeni et al., 2005; Nahrstedt et al., 1998], just to mention a few.

1.1 Contribution of This Paper

This work focuses on how to provide enhanced timeliness guarantees to low-latency real-time audio applications on GNU/Linux. We use adaptive reservations within the JACK audio framework, i.e., we show how we modified JACK in order to take advantage of AQuoSA [Palopoli et al., 2009], a software architecture we developed for enriching the Linux kernel with resource reservation scheduling and adaptive reservations. Notably, in the proposed architecture, JACK needs to be patched, but audio applications using it do not need to be modified or recompiled. We believe the discussion reported in this paper constitutes a valuable first-hand experience on how it is possible to integrate real-time scheduling policies into multimedia applications on a GPOS.

2 JACK: Jack Audio Connection Kit

JACK¹ is a well-known low-latency audio server for POSIX-conforming OSes (including Linux) aiming at providing an IPC infrastructure for audio processing where sound streams may traverse multiple independent processes running on the platform. Typical applications (i.e., clients) are audio effects, synthesisers, samplers, tuners, and many others. These clients run as independent system processes, but they all must have an audio processing thread handling the specific computation they make on the audio stream in real-time, and using the JACK API for data exchange.

For its part, JACK is in direct contact with the audio infrastructure of the OS (i.e., ALSA on Linux) by means of a component referred to (from now on) as the JACK driver, or just the driver. By default, double-buffering is used, so the JACK infrastructure is required to process audio data and fill a buffer while the underlying hardware is playing the other one. Each time a new buffer is not available in time, JACK logs the occurrence of an xrun event.

¹ Note that version 2 of JACK is used for this study.

3 AQuoSA Resource Reservation Framework

The Adaptive Quality of Service Architecture (AQuoSA²) is an open-source framework enabling soft real-time capabilities and QoS support in the Linux kernel. It includes: a deadline-based real-time scheduler; temporal encapsulation provided via the CBS [Abeni and Buttazzo, 1998] algorithm; various adaptive reservation strategies for building feedback-based scheduling control loops [Abeni et al., 2005]; reclamation of unused bandwidth through the SHRUB [Palopoli et al., 2008] algorithm; a simple hierarchical scheduling capability which allows for Round Robin scheduling of multiple tasks inside the same reservation; and a well-designed admission-control logic [Palopoli et al., 2009] allowing controlled access to the real-time scheduling capabilities of the system for unprivileged applications. For more details about AQuoSA, the reader is referred to [Palopoli et al., 2009].

² More information is available at: http://aquosa.sourceforge.net.

4 Integrating JACK with AQuoSA

Adaptive reservations have been applied to JACK as follows. In JACK, an entire graph of end-to-end computations is activated with a periodicity equal to buffersize / samplerate, and it must complete within the same period. Therefore, a reservation is created at the start-up of JACK, and all of the JACK clients, including the real-time threads of the JACK server itself (the audio “drivers”), have been attached to such reservation, exploiting the hierarchical capability of AQuoSA. The reservation period has been set equal to the period of the JACK work-flow activation. The reservation budget therefore needs to be sufficiently large to allow for completion of all of the JACK clients within the period, i.e., if the JACK graph comprises n clients, the execution times needed by the JACK clients are c_1, ..., c_n, and the JACK period is T, then the reservation will have the following budget Q and period P:

$$
\begin{cases}
Q = \sum_{i=1}^{n} c_i \\
P = T
\end{cases}
\qquad (1)
$$

Besides this, an AQuoSA QoS control loop was used for controlling the reservation budget, based on the monitoring of the budget actually consumed at each JACK cycle. The percentile estimator used for setting the budget is based on a moving window of a configurable number of consumed budget figures observed in past JACK cycles, and it is tuned to estimate a configurable percentile of the consumed budget distribution (such value needs to be sufficiently close to 100%). However, the actual allocated budget is increased with respect to the result of this estimation by a (configurable) over-provisioning factor, since there are events that can disturb the predictor, making it potentially consider inconsistent samples, and thus nullify all the effort of adding QoS support to JACK, if not properly addressed. Examples are an xrun event and the activation of a new client, since in such a case no guess can be made about the amount of budget it will need. In both cases, the budget is bumped up by a (configurable) percentage, allowing the predictor to reconstruct its queue using meaningful samples.

4.1 Implementation Details

All the AQuoSA-related code is contained in the JackAquosaController class. The operations of creating and deleting the AQuoSA reservation are handled by the class constructor and destructor, while the operations necessary for feedback scheduling (i.e., collecting the measurements of used budget, managing the samples in the queue of the predictor, setting new budget values, etc.) are done by the CycleBegin method, called once per cycle in the real-time thread of the server. Also, the JackPosixThread class needed some modifications, in order to attach real-time threads to the AQuoSA reservation when a new client registers with JACK, and to perform the corresponding detach operation on client termination.

The per-cycle consumed CPU time values were used to feed the AQuoSA predictor and apply the control algorithm to adjust the reservation budget.

5 Experimental Results

The proposed modifications to JACK have been validated through an extensive experimental evaluation conducted over the modified JACK implementation running on a Linux system. All experiments have been performed on a common consumer PC (Intel(R) [email protected] GHz) with CPU dynamic voltage-scaling disabled, and with a Terratec EWX24/96 PCI sound card. The modified JACK framework and all the tools needed in order to reproduce the experiments presented in this section are available on-line³.

³ http://retis.sssup.it/~tommaso/papers/lac11/

In all the conducted experiments, results have been gathered while scheduling JACK using various scheduling policies:

• CFS: the default Linux scheduling policy for best effort tasks;
• FIFO: the Linux fixed priority real-time scheduler;
• AQuoSA: the AQuoSA resource reservation scheduler, without reclaiming capabilities;
• SHRUB: the AQuoSA resource reservation scheduler with reclaiming capabilities.

The metrics that have been measured throughout the experiments are the following:

• audio driver timing: the time interval between two consecutive activations of the JACK driver. Ideally it should look like a horizontal line corresponding to the value buffersize / samplerate;
• driver end date: the time interval between the start of a cycle and the instant when the driver finishes writing the processed data into the sound card buffer. If this is longer than the server period, then an xrun just happened.

When the AQuoSA framework is used to provide QoS guarantees, we also monitored the following values:

• Set budget (Set Q): the budget dynamically set for the resource reservation dedicated to the JACK real-time threads;
• Predicted budget (Predicted Q): the value predicted at each cycle for the budget by the feedback mechanism.

Moreover, the CPU Time used, at each cycle, by JACK and all its clients has been measured as well. If AQuoSA is used and such value is greater than the Set Q, then an xrun occurs (unless the SHRUB reclaiming strategy is enabled).

First of all, the audio driver timing in a configuration where no clients were attached to JACK has been measured, and results are shown in Table 1. JACK was using a buffer-size of 128 samples and a sample-rate of 96 kHz, resulting in a period of 1333µs. Since, in this case, no other activities were running concurrently (and since the system load was being kept as low as possible), the statistics reveal a correct behaviour of all the tested scheduling strategies, with CFS exhibiting the highest variability, as could have been expected.

Table 1: Audio driver timing of JACK with no clients using the 4 different schedulers (values are in µs).

          Min     Max     Average     Std. Dev
CFS       1268    1555    1342.769    3.028
FIFO      1243    1423    1333.268    2.421
AQuoSA    1279    1389    1333.268    2.704
SHRUB     1275    1344    1333.268    2.692

5.1 Concurrent Experiments

To investigate the benefits of using reservations to isolate the behaviour of different, concurrently running real-time applications, a periodic task simulating the behaviour of a typical real-time application has been added to the system. The program is called rt-app, and it is able to execute for a configurable amount of time over some configurable period. The scheduling policy and configuration used for JACK and for the rt-app instance in the experiments shown below are given in Table 2.

Table 2: Scheduling policy and priority (where applicable) of JACK and rt-app in the experiments in this section.

        scheduling class        priority
        JACK      rt-app        JACK    rt-app
(1)     CFS       CFS           -       -
(2)     FIFO      FIFO          10      15
(3)     FIFO      FIFO          10      5
(4)     AQuoSA    AQuoSA        -       -
(5)     SHRUB     SHRUB         -       -

In all of the following experiments, we used a “fake” JACK client, dnl, constituted by a simple loop taking about 7% of the CPU for its computations. The audio processing pipeline of JACK is made up of 10 dnl clients, added one after the other. This leads to a total of 75% CPU utilisation. When AQuoSA is used (i.e., in cases (4) and (5)), JACK and all its clients share the same reservation, the budget of which is decided as described in Section 4. Concerning rt-app, when it is scheduled by AQuoSA or SHRUB, the reservation period is set equal to the application period, while the budget is slightly over-provisioned with respect to its execution time (5%). Each experiment was run for 1 minute.

5.1.1 JACK with a period of 1333µs and video-player-like load

In this experiment, JACK is configured with a sample-rate of 96 kHz and a buffer-size of 128 samples, resulting in an activation period of 1333µs, while rt-app has a period of 40ms and an execution time of 5ms. This configuration makes rt-app resemble the typical workload produced by a video (e.g., MPEG format) decoder/player displaying a video at 25 frames per second.

Figures 1a and 1b show the performance of JACK, in terms of driver end time, and of rt-app, in terms of response time, respectively. Horizontal lines at 1333µs and at 40ms are the deadlines. The best-effort Linux scheduler manages to keep the JACK performance good, but rt-app undergoes increased response times and exhibits deadline misses in correspondence with the start and termination of JACK clients. This is due to the lack of true temporal isolation between the applications (rather, the Linux CFS aims to be as fair as possible), which causes rt-app to miss some deadlines when JACK has peaks of computation time. The Linux fixed-priority real-time scheduler is able to correctly support both applications, but only if their relative priorities are correctly set, as shown by insets 2 and 3 (according to the well-known rate-monotonic assignment, in this case rt-app should have lower priority than JACK). On the contrary, when using AQuoSA (inset 4), we achieve acceptable response times for both applications: rt-app keeps its finishing time well below its deadline, whilst the JACK pipeline has sporadic terminations slightly beyond the deadline, in correspondence with the registration
of the first few clients. This is due to the over-provisioning and the budget pump-up heuristics, which would benefit from a slight increase on those occasions (a comparison of different heuristics is planned as future work). However, it is worth mentioning that the JACK performance in this case is basically dependent on itself only, and can be studied in isolation, independently of what else is running on the system. Finally, when enabling reclaiming of the unused bandwidth via SHRUB (inset 5), the slight budget shortages are compensated by the reclaiming strategy: the small budget residuals which remain unused by one of the real-time applications at each cycle are immediately reused by the other, if needed.

For the sake of completeness, Figure 1c shows the CPU Time and, for the configurations using AQuoSA, the Set Q and Predicted Q values for the experiment. The figure highlights that the over-provisioning made with a high overall JACK utilisation is probably excessive with the current heuristic, so we are working to improve it.

Figure 1: Driver end time of JACK (a) and response times of rt-app (b). The Y axis reports time values in µs, while the X axis reports the application period (activation) number. The various insets report results of the experiment run under configurations (1), (2), (3), (4) and (5), from left to right, as detailed in Table 2. In (c), we report the CPU Time and (in insets 4 and 5) the set and predicted budgets for JACK during the experiment.

Figure 4: Server period and clients end time of JACK with minimum possible latency scheduled by SHRUB (reservation period was 2001µs, i.e., three times the JACK period).

5.1.2 JACK with a period of 2666µs and VoIP-like load

Another experiment, very similar to the previous one but with slightly varied parameters for the two applications, has been run. This time JACK has a sample-rate of 48kHz and a buffer-size of 128 samples, resulting in a period of 2666µs, while rt-app has a period of 10ms and an execution time of 1.7ms. This could be representative of a VoIP application, or of a 100 Hz video player.

Results are reported in Figure 5. Observations similar to the ones made for the previous experiment may be made. However, the interferences between the two applications are much more evident, because the periods are closer to each other than in the previous case. Moreover, the benefits of the reclaiming logic provided by SHRUB appear more evident here, since using just a classical hard reservation strategy (e.g., the hard CBS implemented by AQuoSA, in the 4th insets) is not enough to guarantee correct behaviour and avoid deadline misses under the highest system load conditions (when all of the dnl clients are active).
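The timing parameters of the two concurrent experiments (Sections 5.1.1 and 5.1.2) can be cross-checked with a few lines of arithmetic. The following Python sketch is only an illustration: the function names are hypothetical and are not part of JACK, rt-app or AQuoSA. It derives the JACK period from buffer size and sample rate, computes the CPU utilisation requested by rt-app, and orders the two tasks by rate-monotonic priority (shorter period first).

```python
def jack_period_us(buffer_size: int, sample_rate_hz: int) -> float:
    """JACK cycle period in microseconds: buffersize / samplerate."""
    return buffer_size / sample_rate_hz * 1e6


def utilisation(exec_time_us: float, period_us: float) -> float:
    """Fraction of one CPU requested by a periodic task."""
    return exec_time_us / period_us


def rate_monotonic_order(periods_us: dict) -> list:
    """Task names from highest to lowest priority (shorter period first)."""
    return [name for name, _ in sorted(periods_us.items(), key=lambda kv: kv[1])]


# Section 5.1.1: 128 samples at 96 kHz; rt-app with 40 ms period, 5 ms execution.
video_case = {"JACK": jack_period_us(128, 96_000), "rt-app": 40_000.0}
# Section 5.1.2: 128 samples at 48 kHz; rt-app with 10 ms period, 1.7 ms execution.
voip_case = {"JACK": jack_period_us(128, 48_000), "rt-app": 10_000.0}

print(int(video_case["JACK"]), int(voip_case["JACK"]))  # 1333 2666
print(rate_monotonic_order(video_case))                 # ['JACK', 'rt-app']
print(rate_monotonic_order(voip_case))                  # ['JACK', 'rt-app']
print(utilisation(5_000, 40_000))                       # 0.125
print(round(utilisation(1_700, 10_000), 3))             # 0.17
```

In both experiments JACK has the shorter period, so a rate-monotonic assignment gives it the higher priority; with SCHED_FIFO, where a larger number means a higher priority, this corresponds to configuration (3) of Table 2 (JACK at 10, rt-app at 5). Taking the 75% figure quoted for the ten dnl clients at face value, adding the rt-app utilisation gives a rough total CPU demand of about 87.5% in the first experiment and about 92% in the second.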

5.1.3 JACK alone with minimum possible latency

Finally, we considered a scenario with JACK configured to have only 64 samples as buffer-size and a sample-rate of 96kHz, resulting in a period of 667µs. This corresponds to the minimum possible latency achievable with the mentioned audio hardware. When working at these small values, even if there are no other applications in the system and the overall load is relatively low, xruns might occur anyway due to system overheads, resolution of the OS timers, unforeseen kernel latencies due to non-preemptive kernel sections, etc.

In Figures 2, 3 and 4, we plot the client end times, i.e., the completion instants of each client for each cycle (relative to the cycle start time). Such metric provides an overview of the times at which audio calculations are finished by each client, as well as the audio period timing used as a reference. Things are working correctly if the last client end time is lower than the server period (667µs in this case). Clients are connected in a sequential pipeline, with Client0 being connected to the input (whose end times are reported in the bottommost curve), and Client9 providing the final output to the JACK output driver (whose end times are reported in the topmost curve). Also notice that when a client takes longer to complete, the one next to it in the pipeline starts later, and this is reflected in the period duration too. Some more details about this experiment are also reported in Table 3.

Figure 2: Server period and clients end time of JACK with minimum latency scheduled by CFS.

Figure 3: Server period and clients end time of JACK with minimum possible latency scheduled by FIFO.

Table 3: Period and driver end time in the 3 cases (values are in µs).

                  SHRUB      FIFO       CFS
Min.              650.0      629.0      621.0
Max.              683.0      711.0      1369.0
Average           666.645    666.263    666.652
Std. Dev          0.626      1.747      2.696
Drv. End Min.     6.0        6.0        5.0
Drv. End Max.     552.0      602.0      663.0

6 Conclusions and future work

In this work the JACK sound subsystem has been modified so as to leverage adaptive resource reservations as provided by the AQuoSA framework. It appears quite clear that both best-effort and POSIX-compliant fixed-priority schedulers have issues in supporting multiple real-time applications with different timing requirements, unless the user takes on the burden of setting the priorities correctly, which might be hard when the number of applications needing real-time support is large enough. On the other hand, resource reservation based approaches allow each application to be configured in isolation, without any need for full knowledge of the entire set of real-time tasks deployed on the system, and the performance of each application will depend exclusively on its own workload, independently of what else is deployed on the system. We therefore think that it can be stated that resource reservations, together with adaptive feedback-based control of the resource allocation and effective bandwidth reclamation techniques, allow for achieving precise scheduling guarantees for individual real-time applications that are concurrently running on the system, though there seems to be some room for improving the currently implemented budget feedback-control loop.

As for future research around the topics investigated in this paper, we plan to explore the use of two recently proposed reservation-based schedulers: the IRMOS [Checconi et al., 2009] hybrid EDF/FP real-time scheduler for multi-core (or multi-processor) platforms, and the SCHED_DEADLINE [Faggioli et al., 2009] patchset, which adds a new scheduling class that uses EDF to schedule tasks.

References

Luca Abeni and Giorgio Buttazzo. 1998. Integrating multimedia applications in hard real-time systems. In Proceedings of the IEEE Real-Time Systems Symposium, Madrid, Spain, December.

Luca Abeni, Tommaso Cucinotta, Giuseppe Lipari, Luca Marzario, and Luigi Palopoli. 2005. QoS management through adaptive reservations. Real-Time Systems Journal, 29(2-3):131–155, March.

Dr. James H. Anderson and Students. 2006. Linux Testbed for Multiprocessor Scheduling in Real-Time Systems (LITMUS^RT). http://www.cs.unc.edu/~anderson/litmus-rt/.

Fabio Checconi, Tommaso Cucinotta, Dario Faggioli, and Giuseppe Lipari. 2009. Hierarchical multiprocessor CPU reservations for the linux kernel. In Proceedings of the 5th International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2009), Dublin, Ireland, June.

Dario Faggioli, Fabio Checconi, Michael Trimarchi, and Claudio Scordino. 2009. An edf scheduling class for the linux kernel. In Proceedings of the 11th Real-Time Linux Workshop (RTLWS 2009), Dresden, Germany, October.

S. Kato, R. Rajkumar, and Y. Ishikawa. 2010. Airs: Supporting interactive real-time applications on multicore platforms. In Proc. of the 22nd Euromicro Conference on Real-Time Systems (ECRTS 2010), Brussels, Belgium, July.

Klara Nahrstedt, Hao-hua Chu, and Srinivas Narayan. 1998. QoS-aware resource management for distributed multimedia applications. J. High Speed Netw., 7(3-4):229–257.

Luigi Palopoli, Luca Abeni, Tommaso Cucinotta, Giuseppe Lipari, and Sanjoy K. Baruah. 2008. Weighted feedback reclaiming for multimedia applications. In Proceedings of the 6th IEEE Workshop on Embedded Systems for Real-Time Multimedia (ESTIMedia 2008), pages 121–126, Atlanta, Georgia, United States, October.

Luigi Palopoli, Tommaso Cucinotta, Luca Marzario, and Giuseppe Lipari. 2009. AQuoSA - adaptive quality of service architecture. Software: Practice and Experience, 39(1):1–31.

Vanessa Romero Segovia, Karl-Erik Årzén, Stefan Schorr, Raphael Guerra, Gerhard Fohler, Johan Eker, and Harald Gustafsson. 2010. Adaptive resource management framework for mobile terminals - the ACTORS approach. In Proc. of the Workshop on Adaptive Resource Management (WARM 2010), Stockholm, Sweden, April.

Figure 5: From top to bottom: driver end time and its CDF, JACK CPU Time and budgets, and response time of rt-app and its CDF, for the experiment with JACK and a VoIP-like load. As in Figure 1, time is in µs on the Y axes of (a), (c) and (d), while the X axes report application cycles.
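The Set Q and Predicted Q traces in panels (c) of Figures 1 and 5 are produced by the feedback loop described in Section 4. The Python sketch below illustrates that logic under explicit assumptions; it is not the AQuoSA API nor the actual JackAquosaController code. The class name, the parameter names, the nearest-rank percentile rule, the warm-up threshold, the initial budget and the cap at the period are all hypothetical choices made for illustration.

```python
from collections import deque


class BudgetController:
    """Hypothetical percentile-based budget predictor with over-provisioning."""

    def __init__(self, period_us, window=100, percentile=95,
                 overprovision=0.10, bump=0.20, warmup=4):
        self.period_us = period_us
        self.samples = deque(maxlen=window)  # moving window of consumed budgets
        self.percentile = percentile         # integer percentage, close to 100
        self.overprovision = overprovision   # over-provisioning factor
        self.bump = bump                     # budget increase after a disturbance
        self.warmup = warmup                 # samples needed before predicting
        self.set_q = period_us               # assumption: start with a full budget

    def predict(self):
        """Predicted Q: nearest-rank percentile of the samples in the window."""
        ordered = sorted(self.samples)
        rank = max(1, (self.percentile * len(ordered) + 99) // 100)
        return ordered[rank - 1]

    def cycle_begin(self, consumed_us, disturbed=False):
        """Update Set Q once per cycle from the CPU time consumed in the last one.

        disturbed=True marks an xrun or the activation of a new client: the old
        samples are dropped and the budget is bumped up by a percentage.
        """
        if disturbed:
            self.samples.clear()
            self.set_q = min(self.period_us, self.set_q * (1 + self.bump))
            return self.set_q
        self.samples.append(consumed_us)
        if len(self.samples) >= self.warmup:
            predicted = self.predict()
            self.set_q = min(self.period_us, predicted * (1 + self.overprovision))
        return self.set_q


ctl = BudgetController(period_us=1333, window=8)
for consumed in [400, 410, 405, 420, 415]:
    q = ctl.cycle_begin(consumed)
print(round(q, 1))                                   # 462.0
bumped = ctl.cycle_begin(0, disturbed=True)
print(round(bumped, 1))                              # 554.4
```

While the window is being refilled after a disturbance, this sketch simply holds the bumped budget; that is one possible reading of "allowing the predictor to reconstruct its queue using meaningful samples", and other heuristics are equally plausible.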