DEGREE PROJECT IN INFORMATION AND COMMUNICATION TECHNOLOGY, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2017

Real-time audio processing for an embedded system using a dual-kernel approach

NITIN KULKARNI

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING

Master's Thesis submitted in partial fulfillment of the requirements for the Degree of Master of Science in ICT Innovation - Embedded Systems at KTH Royal Institute of Technology, Stockholm

NITIN KULKARNI

Master's Thesis Report
Supervisor: Dr. Stefano Zambon
Examiner: Dr. Carlo Fischione

TRITA-EE 2017:140

Acknowledgment

I would like to sincerely thank my parents and friends for their unwavering support and love throughout this journey. I would like to thank Prof. Carlo Fischione for his guidance as my examiner and for providing me the opportunity to work on real-time embedded Linux. I am indebted to my colleague Sharan Yagneswar for making time to provide the feedback and questions that helped me fine-tune the project. Finally, I would like to thank everyone who has contributed to my graduate programme at Eindhoven University of Technology and at KTH Royal Institute of Technology, Stockholm.

Stockholm, October 2017
Nitin Kulkarni

Abstract

Professional audio processing systems such as digital musical instruments and audio mixers must operate under very tight constraints on overall processing latency and CPU performance. Consequently, traditional implementations are still mostly based on specialized hardware like Digital Signal Processors (DSPs) and Real-Time Operating Systems (RTOS) to meet such requirements. However, such systems are minimalistic in nature and lack many features (e.g. network connectivity, wide hardware support) that a general-purpose operating system such as Linux offers. Linux is a very popular choice of operating system for embedded devices, and many developers have started to use it for designing real-time systems with relaxed timing constraints. However, none of the available solutions based on a standard Linux kernel can satisfy the low-latency requirements of professional audio systems.

In this thesis, a dual-kernel approach is employed to enable an embedded Linux system to process audio with low round-trip latency. The solution is developed using the Xenomai framework for real-time computation, which is based on a technique known as the interrupt pipeline (I-pipe). The I-pipe enables interrupt virtualization through a micro-kernel running between the Linux kernel and the interrupt controller hardware.

The designed system includes an Intel Atom System-on-Chip (SoC), an XMOS microcontroller and audio converters to and from the analog domain. Custom kernel drivers and libraries have been developed to expose the audio programming functionalities to programs running in user-space. As a result, the system can achieve robust real-time performance appropriate for professional audio applications, while retaining all the advantages of a traditional Linux solution, such as compatibility with external devices and ease of programming. The real-time capability is measured by evaluating the worst-case response time of the real-time tasks in comparison to the same metrics obtained under a standard Linux kernel. The overall round-trip latency of audio processing is shown to improve by almost an order of magnitude (around 2.5 ms instead of 20 ms).

Keywords: Linux, Interrupt Latencies, RTDM, SPI, XMOS, Audio Processing, Round-trip Latency.

Sammanfattning

Professionella system för ljudbearbetning, som digitala musikinstrument, mixerbord, etc., arbetar med väldigt hårda krav på tidsfördröjning och CPU-prestanda. Som en konsekvens har dessa system traditionellt implementerats på specialiserad hårdvara som specifika DSP-processorer och speciella realtidsoperativsystem. Den typen av system är till sin natur minimalistiska och saknar många funktioner (till exempel nätverk och brett stöd för olika hårdvaror) som mer generella operativsystem, som Linux, kan erbjuda. Linux är ett väldigt populärt val av operativsystem för inbyggda system och många utvecklare har även börjat använda det till realtidssystem med mindre hårda tidskrav. Det finns dock idag inte någon lösning med en standard-linuxkärna som kan tillfredsställa de krav på låg fördröjning som krävs för användning i professionella ljudsystem.

I det här examensarbetet används en dubbelkärneuppsättning för att ge ett inbyggt Linuxsystem möjlighet att bearbeta digitalt ljud med låg fördröjning. Lösningen använder Xenomai-ramverket för realtidsberäkningar baserat på en teknik kallad interrupt pipeline (I-pipe). I-pipe ger möjlighet att virtualisera interrupt genom en mikrokärna som körs som ett lager mellan Linuxkärnan och hårdvarans interruptcontroller.

Det resulterande systemet inkluderar ett x86 Atom-enchipssystem, en XMOS-mikrokontroller, och ljudkonverterare till och från analoga ljud in- och utgångar. Drivrutiner och bibliotek utvecklas för att ge direkt tillgång till ljudfunktioner från applikationer. Systemet ges därmed robust realtidsprestanda som gör det lämpligt för professionella ljudtillämpningar samtidigt som det behåller alla fördelar från ett traditionellt Linuxsystem, som kompatibilitet med extern hårdvara och enklare applikationsutveckling. Systemets realtidsprestanda utvärderas som den maximala uppmätta tidsfördröjningen vid realtidsberäkningar jämfört med motsvarande beräkningar på en standardlinuxkärna. Resultaten visade på en förbättring på nästan en storleksordning (ca 2,5 ms mot 20 ms).

Nyckelord: RTDM, Linuxdrivrutin, Schemaläggning, avbrottsfördröjning, interruptfördröjning, ljudbearbetning, tidsfördröjning, XMOS, SPI.

Contents

List of Figures

List of Tables

List of Abbreviations

1 Introduction
  1.1 Problem Statement
  1.2 Goals
  1.3 Related Work
  1.4 The Approach
  1.5 Outline

2 Background
  2.1 Real Time Systems
    2.1.1 Basic concepts of Real-time systems
    2.1.2 Classification of Real-time systems
    2.1.3 Real-time operating system
  2.2 Audio Processing Systems
    2.2.1 A typical audio processing chain
    2.2.2 Audio processing - architecture
    2.2.3 Audio processing - use cases
  2.3 Linux
    2.3.1 User view of Linux
    2.3.2 Software architecture of Linux kernel
    2.3.3 Benefits of using Linux
    2.3.4 Drawbacks of Linux
  2.4 Approaches to Real-time Linux
    2.4.1 PREEMPT_RT
    2.4.2 LITMUSRT
    2.4.3 RTLinux
    2.4.4 RTAI
  2.5 Xenomai
    2.5.1 Xenomai Architecture
    2.5.2 Real-time Driver Model
    2.5.3 Xenomai Services
  2.6 XMOS

3 Xenomai setup for an Atom board
  3.1 Intel Joule specifications
  3.2 Yocto build system
    3.2.1 OpenEmbedded
    3.2.2 Building a Customized Linux Image using Yocto
  3.3 Applying I-pipe and Xenomai patches
    3.3.1 Applying I-pipe Patch
  3.4 Xenomai kernel configuration
  3.5 Installing Xenomai libraries and validating the setup

4 Development of a custom Real Time Driver for Audio over SPI
  4.1 Audio over SPI driver
    4.1.1 Protocol between Joule and XMOS
  4.2 Driver Architecture
    4.2.1 Chained GPIO
    4.2.2 Hardware controllers involved
  4.3 Driver Implementation
    4.3.1 Writing the RTDM driver
    4.3.2 Modifications to platform drivers
    4.3.3 wrapper

5 Benchmarking
  5.1 Benchmarking metrics
    5.1.1 Round-trip latency
    5.1.2 Scheduling latency
    5.1.3 Driver Interrupt servicing latency
    5.1.4 CPU Usage

6 Conclusion
  6.1 Summary and outlook
  6.2 Future Work
    6.2.1 Asynchronous Driver

Bibliography

List of Figures

1.1 Major features of Linux and RTOS indicating the differences between the two types of operating systems.

2.1 Typical real time audio processing system [1].
2.2 Interrupt to task latency.
2.3 Preemptive scheduling in a real-time system.
2.4 A typical audio processing system.
2.5 Generic software architecture for handling audio services.
2.6 User view of Linux.
2.7 Difference between Monolithic and Micro-kernel architecture.
2.8 Software architecture of Linux operating system.
2.9 Interrupt abstraction based real-time Linux [2].
2.10 Adeos architecture [3].
2.11 Adeos interrupt pipe [3].
2.12 IRQ handling through the I-pipe enabled kernel [4].
2.13 Xenomai enabled system architecture.
2.14 Dual kernel Cobalt architecture of Xenomai.
2.15 Single kernel Mercury architecture of Xenomai.
2.16 Xenomai API support for different skins.
2.17 RTDM layer interaction with other system layers [5].
2.18 xCORE architecture [6].

3.1 Yocto Layers [7].

4.1 Communication between Intel Joule and XMOS microcontroller with buffers.
4.2 Sequence diagram of the communication protocol between XMOS and Joule.
4.3 The Audio over SPI driver architecture.

5.1 Histogram of worst case scheduling latencies in standard Linux.
5.2 Histogram of worst case scheduling latencies in dual kernel Linux.
5.3 Probability of interrupt servicing latency in normal Linux kernel.
5.4 Probability of interrupt servicing latency in Xenomai enabled Linux.

List of Tables

2.1 Key differences between Hard real-time systems and Soft real-time systems.
2.2 Overview of Real-time frameworks for Linux.
2.3 Device description fields for device registration.

3.1 Intel Joule Specifications.

5.1 Round trip latencies comparison.
5.2 Scheduling latencies for single and dual kernel Linux.
5.3 CPU usage comparison between Xenomai Linux and normal Linux.

List of Abbreviations

ADC ...... Analog to Digital Converter

AES ...... Audio Engineering Society

API ...... Application Programming Interface

APIC ...... Advanced Programmable Interrupt Controller

ALSA ...... Advanced Linux Sound Architecture

BSP ...... Board Support Package

CLI ...... Command Line Interpreter

DAC ...... Digital to Analog Converter

DMA ...... Direct Memory Access

DSP ...... Digital Signal Processor

GCC ...... GNU Compiler Collection

GPIO ...... General Purpose Input Output

HAL ...... Hardware Abstraction Layer

IEEE ...... Institute of Electrical and Electronics Engineers

IoT ...... Internet of Things

IPC ...... Inter-Process Communication

IRQ ...... Interrupt Request

ISR ...... Interrupt Service Routine

MMU ...... Memory Management Unit

OS ...... Operating System

PC ...... Program Counter

PIO ...... Programmable Input Output

POSIX ...... Portable Operating System Interface

RTAI ...... Real Time Application Interface

RTDM ...... Real-Time Driver Model

RTOS ...... Real-Time Operating System

SIMD ...... Single Instruction Multiple Data

SMP ...... Symmetric Multi-Processing

SoC ...... System on Chip

SoM ...... System on Module

SP ...... Stack Pointer

SPI ...... Serial Peripheral Interface

TSC ...... Time Stamp Counter

VST ...... Virtual Studio Technology

WCET ...... Worst Case Execution Time

Chapter 1

Introduction

In recent times, computers have entered every sphere of our lives; they are embedded into objects we use daily. The advent of the Internet caused a massive disruption in the way we live, and thus we saw the introduction of a new technological area called the Internet of Things, most often referred to as IoT. Today, the music industry is also embracing IoT in its products [8] [9]. Recently, the concept of an Internet of Musical Things has also been proposed. The proposed Internet of Musical Things (IoMUT) relates to the network of physical objects (Musical Things) dedicated to the production of, interaction with, or experience of musical content [8]. Electronic music instruments, audio processors, digital mixers, etc. are embedded systems that can synthesize different sounds or process audio sources in real-time. In the context of smart music instruments, network connectivity becomes an essential requirement. Artists are seeking new ways to express their creativity, and thus there is a demand for these kinds of products in the music industry [10].

Professional music applications have low latency requirements and thus put tight constraints on system designers. Where artists demand low latency for sound processing, designers usually prefer systems based on DSPs (Digital Signal Processors) and an RTOS (Real-Time Operating System) to satisfy the requirements. Most RTOS have a proprietary API (Application Programming Interface). An RTOS is usually used for specific applications, particularly hard real-time applications (applications that strictly need a specific task to be completed within a stipulated time), and guarantees that the requirements of the specific application are met [11].

On the other hand, general-purpose operating systems such as Linux offer a broad range of functionality. Linux is one of the most popular general-purpose operating systems currently in use. Linux was created to support general-purpose computers and servers, which usually run non-real-time and soft real-time applications such as desktop applications. Within the Linux process model, the MMU (Memory Management Unit) is employed to completely insulate each task from all of the other tasks. Linux supports a broad range of devices and has the network stack support to connect to the Internet and enable devices to communicate. Fig. 1.1 depicts the major differences between Linux and a typical RTOS. Linux is an open-source project: a large community of developers contribute, and various online forums exist for getting help. The open-source nature of Linux makes it the best solution for customizing a system according to the needs of embedded applications.

Figure 1.1: Major features of Linux and RTOS indicating the differences between the two types of operating systems.

Modern processors meant for general-purpose applications are quite powerful and can be used for signal processing. Most modern general-purpose processors follow the symmetric multiprocessing (SMP) architecture, which means they have multiple processor cores that share the same memory. They also offer improved speed, multiple execution units or cores, and useful support for signal processing (e.g. SIMD instructions), which makes them attractive for this purpose. With the ever decreasing size of transistors, and in turn of integrated circuits, System-on-Chip (SoC) computers have become quite popular as the hardware platform in many embedded devices. A general-purpose operating system with real-time capability, running on a sufficiently powerful processor, is therefore a good option for audio processing systems.

1.1 Problem Statement

To meet the requirements of professional audio quality, audio processing systems typically rely on digital signal processors and real-time operating systems. Digital signal processors have special hardware meant for signal processing and thus provide the computational power needed for audio processing. RTOS are minimalist operating systems which do not have the features that a general-purpose operating system provides. For example, Linux is a general-purpose operating system that provides features such as network connectivity, file systems, security, and so on. These features do not have strict timing constraints, yet they are important and very useful for developing flexible and interactive applications: they enable the device to connect to the Internet, load files dynamically, provide an engaging user interface, and so on. In addition, a general-purpose operating system provides an easier development environment for applications.

Currently, Linux is not preferred for real-time applications that demand strict timing requirements. The main reason is that the scheduler can cause unbounded latencies, which makes Linux insufficiently deterministic to guarantee that task deadlines are met. However, in the recent past, more and more lightweight versions of Linux suitable for embedded devices have emerged. These lightweight versions, often called Embedded Linux, are used to design embedded systems. Such systems have slightly relaxed timing constraints, and thus customized Embedded Linux is widely used for them. Much emphasis has been placed on making Linux more responsive. The most popular solution for real-time Linux is currently the PREEMPT_RT patch for the mainline Linux kernel [12], but it cannot meet the low latency requirements of professional audio systems [13]. If an audio processing system using Linux is to provide professional audio quality, the round-trip latency of audio samples has to be as low as possible (some professional audio tools have round-trip latencies as low as 1 ms [14]), and according to professional musicians' observations, anything above 12 ms is audibly noticeable [14]. Hence, a customized Linux kernel is needed which can schedule the real-time audio processing task with the highest priority and provide a deterministic worst-case latency that meets this requirement.

1.2 Goals

The standard Linux kernel is not aimed at real-time applications with strict deadlines, or more precisely, at hard real-time systems. The main objectives of this thesis are:

• Low latency audio on Linux: The objective of this thesis is to enable Linux to handle an audio processing application as mentioned above with minimal round-trip latency (the time taken for processing an audio sample from an input channel to an output channel).

• Evaluating real-time Linux performance: The thesis evaluates the real-time capability of dual-kernel Linux, which is a modification of the standard mainline Linux. Although there are many performance metrics to measure real-time performance, the thesis focuses on the worst-case response time and interrupt latency, as they are the most relevant for audio processing.

1.3 Related Work

In the recent past, the most promising project to achieve low latency audio on Linux is the Bela project by McPherson et al. [15]. Bela is an embedded platform for ultra-low latency audio and sensor processing [16]. It has been shown that Bela can process block sizes as small as 2 samples and can reach latencies on the order of hundreds of microseconds [16]. However, Bela is a research project for prototyping, usually used by hobbyists, and it is restricted to a specific low-power SoC (the BeagleBone Black).

McPherson, Jack and Moro have also validated the performance of Bela with a comparison study of Bela against existing microcontroller based sound generators [13]. They studied the action-to-sound latency of various computer generated sound platforms available today. Berdahl and Ju [17] have proposed a low-cost solution for audio processing and building musical instruments on Linux, but it does not include any low latency technique. Later, in 2013, Berdahl et al. [18] proposed an upgraded platform for audio processing using the Raspberry Pi board. Cucinotta et al. [19] have proposed an approach to obtain low latency audio on Linux by optimizing the scheduling policy. They propose to use resource reservation scheduling and feedback-based allocation techniques to assure the timeliness of audio processing applications. Topliss et al. [20] performed a comparative study of audio latency in Linux using the JACK audio connection kit and the ALSA (Advanced Linux Sound Architecture) audio backend, with a USB audio interface and a codec (Audio Cape). In their study, a BeagleBone Black board running an Angstrom distribution without any kernel modification was used.

A lightweight C++ framework for real-time control has been proposed by Piechaud [21]. It uses Xenomai and C++ as a framework to develop real-time active control applications. Based on Piechaud's work, an application to control the vibrations of a string instrument using Xenomai was proposed by Benacchio et al. [22]. Benacchio's work also tests the real-time capability of the Xenomai framework for real-time Linux. A master's thesis by Bernard [23] ports a Linux device driver for the MOST (Media Oriented Systems Transport) protocol to a real-time driver using the RTDM framework. It also focuses on the evaluation of real-time performance on a Xenomai enabled kernel.

4 1.4 The Approach

Much research has been done on applying real-time features to the Linux kernel. In section 2.4, various approaches for real-time Linux are discussed in detail. The most popular approaches taken by system developers are the PREEMPT_RT patch [12] and RTAI - Real Time Application Interface [24]. The approach chosen for real-time Linux in this thesis is similar to the Bela project [15]: a dual-kernel modification of Linux obtained with the help of a real-time framework for Linux known as Xenomai [25]. It differs from Bela in the way audio data is collected and transferred to the codec. Bela does not use custom kernel drivers; instead it employs a specialized hardware unit present on-board. The approach used in this thesis uses the Linux device driver model and the hardware units of Intel x86 SoCs to provide the scalability that industry requires.

Xenomai is a patchset based on the principle of the interrupt pipeline, also known as Adeos or I-pipe in the Linux community [3] [26]. The I-pipe is a mechanism by which hardware resources are shared between multiple kernels or operating systems. Its implementation can be viewed as interrupt virtualization, with an abstraction layer between the Linux kernel and the interrupt controller hardware. Hence, this technique allows running a real-time kernel, with a scheduling policy like that of an RTOS, alongside a non-real-time kernel, which is the traditional Linux kernel with all its features available. The Xenomai patched Linux runs on an Intel Joule system-on-module, with an XMOS microcontroller [6] acting as an audio card for the Intel Joule processor [27]. A protocol was designed for the communication between the Intel Joule and the XMOS.

The following tasks were taken up to complete the objectives set for this thesis as mentioned in the previous section:

• Study of the various approaches available to achieve real-time capability with the Linux kernel.

• Study of the Intel Joule SoC specifications and the Intel Joule development board to check its feasibility to be used in this thesis work.

• Selection of the right Linux kernel distribution and the build system to build custom kernel for the x86_64 architecture of Intel Joule.

• Creating the dual kernel Linux image for the Intel Joule board using the I-pipe and Xenomai patches.

• Testing and solving the I-pipe related conflicts in the existing Linux drivers to reuse them in the real-time domain.

• Implementation of a simple RTDM GPIO driver to get familiarized with the RTDM framework.

• Implementation of an RTDM audio driver using SPI as the communication protocol between the Intel Joule and the XMOS microcontroller.

• Implementation of test application to test the RTDM audio driver.

• Evaluation of the interrupt latencies and response time with the Xenomai patched kernel against the standard Linux kernel.

• Creating a guideline to adapt standard Linux to a dual kernel version and documentation of critical aspects to be aware of while developing real-time safe RTDM drivers.

1.5 Outline

This thesis report is organized into six chapters. After the current introductory chapter, the remaining chapters are as follows:

Chapter 2 describes the background of the thesis. It explains all the topics related to this work and gives sufficient insight to understand it. It discusses the basics of real-time systems, audio processing systems and the related technical terminology. This chapter also discusses the available approaches to real-time Linux and the Xenomai project, and briefly introduces the XMOS microcontroller.

Chapter 3 describes the Xenomai setup and steps required to create a Xenomai enabled Linux image for Intel Atom processors. It also discusses the Yocto build system used to compile and create the Linux image.

Chapter 4 describes the process of developing an RTDM driver for audio over SPI. This chapter explains the protocol defined for communication between the XMOS and the Intel Joule processor. In addition, it sheds light on the architecture of the implemented RTDM driver and describes the modifications to the platform drivers that were necessary to make them compatible with the I-pipe architecture.

Chapter 5 describes the benchmarking methods followed in this thesis and the benchmarking metrics chosen to evaluate the performance.

Chapter 6 discusses the conclusions drawn from the observations and benchmarking metrics obtained through experiments. This chapter also discusses the future scope of work that can be done based on this thesis.

Chapter 2

Background

As mentioned in the previous chapter, the objective of the thesis is to reduce the latency of audio processing on Linux based systems. Hence, it is necessary to know the terms and literature associated with this work. This chapter gives insight into all the topics that a reader must know to understand this thesis thoroughly.

2.1 Real Time Systems

A real-time system, by definition, is one that reacts to stimuli within a finite and predictable amount of time [11]. Proper functionality of such systems requires not only logical correctness of the output, but also that the output is produced within a certain amount of physical time [28]. For example, as shown in Fig. 2.1, an audio processing system is a real-time system where input samples act as the stimulus and must be processed to produce an output signal within a certain amount of time. If the system fails to produce the output signal within the required time, it is considered functionally incorrect.

Figure 2.1: Typical real time audio processing system [1].

2.1.1 Basic concepts of Real-time systems

Events: An event indicates a state change that requires a response. An event can be generated from an external stimulus, for example a sensor, or from an internal state change, such as a timer overflow. Events can be classified as:

1. Asynchronous event: These events are sporadic in nature and can arrive at any instant. E.g. interrupt from a peripheral device [11].

2. Synchronous event: These events occur at regular time intervals and are periodical in nature. E.g. events from a timer [11].

3. Isochronous event: These are asynchronous events that occur only within a given window of time [11].

Task: A task is a computation that is executed by the CPU in a sequential fashion or, in simple terms, the sequence of actions that must be carried out to respond to an event [29].

Deadline: The time instant within which a meaningful and correct output should be produced is called a deadline [28].

Latencies: Latency can be defined as the time between the occurrence of an event and the beginning of the corresponding task for that event. Latencies can be measured between various points, but embedded real-time systems most often respond to external events called interrupts. Hence, one of the key metrics used to assess the real-time performance of a system is the interrupt latency: the time taken by a system to start executing the corresponding interrupt service routine upon an interrupt trigger. For a user, the latency of most importance is the interrupt-to-task latency, usually referred to as interrupt-to-process latency. It reflects the interval within which a user-mode application responds to a hardware request, measured from the moment the interrupt is raised. In operating systems terminology this is sometimes referred to as Process Dispatch Latency.

Interrupt-to-process latency is the sum of three other latencies: if TL is the interrupt-to-process latency, then TL = Ti + Tp + Ts, where Ti is the interrupt handling latency of the OS, Tp is the interrupt processing time in the ISR, and Ts is the OS scheduling latency. The interrupt handling mechanism of the OS is the part nearest to the hardware; it first recognizes the interrupt and jumps to the code location of the corresponding ISR, and the time needed to accomplish this is the interrupt handling latency. The interrupt processing time is the time taken by the interrupt handler in the device driver to do the necessary work before waking up a task. The final part is handled by the scheduler of the OS, which wakes up the task that is waiting for the interrupt event and is ready to be executed. Fig. 2.2 below shows the interrupt-to-task latency and the sequence of execution which leads to the task execution.

Figure 2.2: Interrupt to task latency.

Scheduling: When a set of tasks uses a limited number of shared resources, the resources have to be assigned to the various tasks according to a predefined criterion, called a scheduling policy. The set of rules that, at any time, determines the order in which tasks are executed is called a scheduling algorithm. The specific operation of allocating the resources to a selected task is known as scheduling or dispatching [29].

Out of a set of tasks, one that can potentially execute on the processor, independently of the processor's actual availability, is called an active task. A task waiting for the processor is called a ready task, whereas a task in execution is called a running task. All ready tasks waiting for the processor are kept in a queue, called the ready queue. The operating system is responsible for scheduling and managing the resources of the real-time system. Scheduling can be of two types:

1. Preemptive scheduling: A preemptive scheduler is one in which the running task can be arbitrarily suspended at any time, to assign the CPU to another task according to a predefined scheduling policy [29]. Fig. 2.3 shows the process of scheduling tasks in a system that follows preemptive scheduling. Preemptive scheduling typically allows higher efficiency, in the sense that it allows executing real-time task sets at the cost of scheduling overhead and higher execution times.

2. Non-preemptive scheduling: A non-preemptive scheduler is one where a task is never suspended to allow another task to execute; the running task always runs to completion first.

Figure 2.3: Preemptive scheduling in a real-time system.

An ideal operating system would ensure high data throughput with 100% CPU efficiency and minimal response times, but that is highly improbable in practice. A good operating system has to take many contradictory decisions and manage the resources through an intelligent scheduling algorithm. There is no such thing as the optimal scheduling algorithm, because there is always a trade-off between resource utilization and response times. General-purpose operating systems such as Linux are more focused on resource utilization, at the cost of somewhat longer response times. Moreover, Linux-like operating systems allow different priorities for tasks, where those with higher priorities get bigger time-slices of CPU usage. A situation could occur, however, where a lower-priority process is allowed to run first, if it has a larger chunk of its time-slice left [30][31]. Thus, no task is starved; this scheduling policy is known as fair scheduling. However, a real-time operating system following a fair scheduling algorithm would be disastrous: any event of higher importance, irrespective of other tasks' state, should be able to preempt the current task and execute.

A common misconception about real-time systems is that they are fast. Real-time systems are those that have a deterministic response time to a trigger. They need not be fast, but they must complete a task within a stipulated time. Real-time systems are often used in scenarios where they have to respond to a stimulus with minimal delay, hence the perception that they are faster than other systems. In computing, speed generally means the throughput of a system, that is, the number of instructions executed in a given period of time. In that respect, real-time systems are not better than their non-real-time counterparts: they compromise on throughput but guarantee responsiveness. A real-time operating system (RTOS) aims to offer as low a response time, or latency, as possible. Even more important than a low latency, however, is a deterministic latency, i.e. the ability to give a bounded interval for the latency of the system [31].

2.1.2 Classification of Real-time systems
Real-time systems can be classified broadly on the basis of their time constraints and on the basis of safety criticality.

Real-time systems can be classified on the basis of their time constraints as:

• Soft real-time systems: Soft real-time systems are those in which task deadlines are mostly met; there is no strict requirement to meet the deadline every single time. The response times are important but not critical to the operation of the system, and failure to meet the timing requirements does not impair the system. E.g. a web browser, a digital camera, etc.

• Hard real-time systems: Hard real-time systems adhere strictly to each task deadline. When an event occurs, it must be serviced within a predictable time, every time. Missing a deadline is considered a failure, which makes these systems highly time-constrained. The key element in these kinds of systems is the scheduling policy: tasks have to be scheduled and resources managed so that no contention can delay the execution of a task and cause a missed deadline. E.g. professional audio processing systems, anti-lock braking systems, etc.

As mentioned earlier, another characteristic by which real-time systems can be classified is the after-effect of system failure, i.e. safety. Real-time systems can be classified on the basis of safety as:

• Safety-critical real-time systems: A system whose failure to meet a task deadline leads to catastrophic results in terms of harm to people is classified as a safety-critical real-time system. These systems form a subset of hard real-time systems. Since human lives are at stake, these systems have to be fault-tolerant and have to complete each task within its deadline every single time. E.g. power plant control units, anti-lock braking systems, etc.

• Non-safety-critical real-time systems: Non-safety-critical real-time systems are those whose failure does not have catastrophic effects. A non-safety-critical real-time system can be either a hard or a soft real-time system. For example, a web browser is a non-safety-critical soft real-time system, and a high-quality audio processing system is a non-safety-critical hard real-time system.

Table 2.1 lists the major differences between hard real-time and soft real-time systems in terms of various characteristics of a typical real-time system.

Table 2.1: Key differences between Hard real-time systems and Soft real-time systems.

Characteristic          Hard real-time systems   Soft real-time systems
Response time           hard-required            soft-required
Peak load performance   predictable              degraded
Control of pace         environment              computer
Safety                  often critical           non-critical
Redundancy type         active                   check-point recovery
Data integrity          short-term               long-term
Error detection         autonomous               user assisted

The audio processing application considered in this thesis work can be classified as a non-safety-critical hard real-time system. The main reason behind this classification is that the application demands a high quality of service in terms of audio output, and thus even a single audio sample cannot be dropped.

2.1.3 Real-time operating system
An Operating System (OS) is a system program that provides an interface between hardware and application programs. An OS is commonly equipped with features such as multitasking, synchronization, interrupt and event handling, input/output, inter-task communication, timers and clocks, and memory management, to fulfill its primary role of managing the hardware resources to meet the demands of application programs.

A real-time operating system is an operating system that supports real-time applications and embedded systems by providing logically correct results within the required deadlines. These capabilities define its deterministic timing behavior and limited resource utilization.

Choosing an RTOS for a real-time system appears to be a simple decision, but care has to be taken to choose the right operating system when designing a system. The key technical considerations when choosing an RTOS are:

1. Scalability: Size or memory footprint is an important consideration. Most RTOS are scalable up to the point where only the code required is included in the final memory footprint. Looking for granular scalability in an RTOS is a worthwhile endeavor, as it minimizes memory usage.

2. Portability: Often, an application may outgrow the hardware it was originally designed for as the requirements of the product increase. Although many RTOS options are available on the market, they are often hardware-specific, and a change of hardware means porting the RTOS to the required architecture.

3. Run-time facilities: An RTOS provides run-time facilities such as inter-task communication, task synchronization, interrupt and event handling, etc. Different application systems have different sets of requirements, so RTOSs are frequently compared on the kernel-level facilities they provide.

4. Development tools: A sufficient set of development tools, including a debugger, compiler and performance profiler, helps shorten development and debugging time and improves code reliability. Commercial RTOS usually offer a more complete set of tools for analyzing and optimizing the RTOS's behavior than open-source alternatives.

2.2 Audio Processing Systems

2.2.1 A typical audio processing chain
An audio signal to be processed typically goes through a chain of devices called the audio processing chain. There are three main components in this chain: the Analog to Digital converter (ADC), the processor and the Digital to Analog converter (DAC). Apart from these main components, there are input and output devices to capture and reproduce the sound, respectively. In addition, a controller may also be present if the processor is not directly connected to the ADC and DAC. The ADC, DAC and controller can also be built into one board, called the sound card, which connects to a processor as shown in Fig. 2.4. The main components and parameters of an audio processing system are explained below:

Figure 2.4: A typical audio processing system.

• Analog to Digital converter (ADC): A microphone or other transducer converts a physical quantity (e.g. sound pressure) into an analog electrical signal which is connected to an input channel. The ADC samples the analog signal continuously and encodes it into a digital value proportional to the analog signal voltage. The two important parameters of any ADC are resolution and sampling frequency. The resolution of the converter indicates the number of discrete values it can produce over the range of analog values; hence, the size of the registers used to store the digital value defines the resolution. An analog signal is continuous in nature, so there are infinitely many instants in time at which it could be measured. The sampling frequency of an ADC is the frequency at which it samples the analog signal.

• Controller: The controller is an optional micro-controller which acts as a bridge between the converters (ADC and DAC) and the processor which actually processes the data. The ADC and DAC communicate with the controller using a peripheral protocol (I2S, SPI, etc.). The controller communicates with the processor over a bus, which can also be an I2S or SPI bus. The controller also facilitates channel management and other basic processing capabilities.

• CPU: The CPU is the core computing unit of the system. General-purpose processors these days have a lot of computational power and run at high frequencies. They also make it possible to run an OS and take advantage of the services it provides.

• Digital to Analog converter (DAC): Digital to analog converters reconstruct the analog signal from the digital signal. As with the ADC, resolution and sampling frequency are the main parameters deciding the quality of the output. The controller sends the processed digital audio samples, and the DAC converts them into an analog signal connected to an output channel.

• Sampling rate: The sampling rate or sampling frequency fs is the average number of digital samples obtained in one second when the ADC samples the analog signal. For any analog signal to be captured in digital form, its voltage has to be measured at discrete instants of time. To sample the signal and reconstruct it later without any imperfections, the Nyquist criterion has to be met, which states that: fs > 2B

where fs is the sampling rate and B is the bandwidth of the analog signal to be sampled. The sampling rate also determines the quality of the audio output. The sampling rate and the resolution of the ADC are directly proportional to the quality of the audio output: at higher sampling rates and higher ADC resolution, the details of the audio signal are preserved, and signal reconstruction at the DAC yields a high-quality audio signal.

The sampling rate is measured in samples per second. Audio frequencies that are relevant for humans are in the range of 20 Hz - 20 kHz [32]; thus a sampling frequency of 44.1 kHz is generally used to capture audio from different sources such as music, speech, acoustic events, etc. Although 44.1 kHz is enough, the industry has moved to even higher frequencies to relax the requirements on anti-aliasing filtering. The Audio Engineering Society (AES) recommends standard sampling frequency values for different applications:

A sampling frequency of 48 kHz is recommended for the origination, processing, and interchange of audio programs employing pulse-code modulation. Recognition is also given to the use of a 44.1-kHz sampling frequency related to certain consumer digital applications, the use of a 32-kHz sampling frequency for transmission-related applications, and the use of a 96-kHz sampling frequency for applications requiring a higher bandwidth [33].

• Audio channels: A channel is a path through which audio signal data (in analog or digital form) travels from source to destination. Audio systems can have multiple input sources that often need to be processed with specific requirements; a multi-channel audio system makes it possible to process each channel separately or, if required, to mix them into a single-channel audio signal. Multi-channel ADCs and DACs make it possible to sample different input sources and thus manage multiple channels.

2.2.2 Audio processing - Software architecture
Fig. 2.5 shows a generic software architecture followed by many operating systems to handle audio services. Audio services are tasks such as playback from files, recording to files or streaming of audio from an input device to an output device. The architecture can be divided into four layers, the top one being the application layer and the bottom one being the hardware device, which is usually the sound card. As in many general-purpose operating systems, the device driver in the kernel acts as an interface between the user-space applications and the hardware.

When dealing with audio processing in real-time, the samples can be processed sample-by-sample or in blocks of a fixed number of samples. Block processing of a fixed number of samples is the usual approach. The flow of actions performed by the operating system to process data for streaming applications is explained briefly below:

1. The hardware device (sound card) informs the processor through a hardware interrupt when it has audio samples ready to be processed.

2. The kernel driver waits for the interrupt request and captures the audio samples from the hardware device. In addition to capturing, it also sends out the audio samples destined for the output channel. Since a chunk of audio samples is handled at a time, there are two buffers in the kernel driver to store the audio samples: the capture buffer is a block of memory allocated to store input audio samples, and the output buffer is used to store the processed samples that will be sent out. The kernel driver uses DMA for data transfer, allowing computation and communication of data to proceed in parallel.

Figure 2.5: Generic software architecture for handling audio services.

3. After initiating the data transfer using a DMA controller, the kernel driver wakes up the user-space application waiting for audio samples through a callback function at every interrupt.

4. User-space programs can use the driver services only through system calls such as open(), read(), etc. A user-space library provides an abstraction over these system calls and offers easier interfaces for application developers. This library provides the callback function for processing a block of audio data at every interrupt. The callback function has two arguments, which are pointers to the input and output buffers, as shown in Fig. 2.5.

5. At every callback from the kernel driver, the application is woken up, processes the input buffer and writes the result into the output buffer.

With this software architecture for processing audio in mind, certain parameters are important for understanding the performance of the system.

Buffer size: The buffer size is the size of both the capture buffer and the output buffer. Hence, it is also the size of the block of audio data that is processed at every interrupt.

Interrupt period: The interrupt period is the average time between two successive interrupts from the audio hardware device. It depends on the buffer size: if larger buffer sizes are used, a larger chunk is processed at each interrupt, and thus fewer interrupts are needed in a given period of time.

Round-trip audio latency: Round-trip audio latency refers to the delay between the moment an analog audio signal enters a system and the moment it emerges from it.

2.2.3 Audio processing - use cases
Audio filters: Audio filters can be used to remove unwanted frequencies. Noise from unwanted sources present in the audio signal can be eliminated through filters. Filters can also be used to change the character of the sound.

Audio effects: In streaming audio applications, audio effects such as reverberation or echo can be added to a plain sound through audio processing. Compression, modulation, etc. are other audio effects achieved through audio processing [34].

Electronic music: Artificially generated sounds have become very popular, and this is possible only because of audio processing. The sound of an instrument can be generated in software by modelling the real sound of that instrument and converting it to analog through the DACs. With the help of MIDI (Musical Instrument Digital Interface), electronically generated sounds can be used in musical instruments.

2.3 Linux

Linux is one of many popular Unix-like operating systems, such as Mac OS X, Solaris and BSD from the University of California at Berkeley. Unix was conceived in the 1960s and released in 1970. Unix's wide availability and portability meant that it was adopted, copied and modified by various institutions and universities.

MINIX, a Unix-like system, was developed by Andrew S. Tanenbaum in 1987 for academic use. In 1991, Linus Torvalds, a student at the University of Helsinki, came into contact with MINIX and began developing a non-commercial version of Unix that became known as Linux, which became spectacularly popular starting in the late 1990s. Linus Torvalds remains deeply involved with improving Linux, keeping it up to date with hardware developments and coordinating the activity of Linux developers around the world. Over the years, developers have worked to make Linux available on other architectures, including Hewlett-Packard's Alpha, Itanium (Intel's 64-bit processor), MIPS, SPARC, MC680x0, PowerPC, and IBM's zSeries [30]. From smartphones to cars and home appliances, the Linux operating system is everywhere.

One of the reasons for Linux's popularity is that it is not a commercial operating system: its source code is open to all and it follows the GNU General Public License [35]. The GNU project started in 1984 with the goal of creating a complete Unix-compatible system made entirely of free software. The GNU project is coordinated by the Free Software Foundation, Inc. [35]; its aim is to implement a whole operating system freely usable by everyone. The availability of the GNU C compiler has been essential for the success of the Linux project [30]. Essential features of Linux are as follows:

• Linux was originally written in the C programming language and is therefore a classical platform for C programs. It offers well-suited environments for application development.

• Linux is well suited for use in networks.

• It offers various alternatives for the solution of most tasks, and a multitude of commands are available to the user.

• Linux was originally command-line oriented, but it can also be used via a graphical user interface (the X Window System).

2.3.1 User view of Linux

Figure 2.6: User view of Linux

Figure 2.6 shows the structure of Linux from the user's perspective. As Linux follows most of the concepts of Unix, it has a layered architecture with four layers: Application, Shell, Kernel and Hardware.

• Hardware: The hardware is essentially the platform on which the operating system runs. CPU, memory, peripheral devices, etc. are all resources that the operating system manages, and this management is almost completely abstracted from the user.

• Kernel: The kernel forms the core of any Unix-like operating system such as Linux. It interacts with the hardware and performs tasks on behalf of the user-space applications. The kernel is also responsible for allocating memory and CPU time to applications.

• Shell: The shell forms the interface between user and kernel. Computers have no inherent capability of translating commands into actions; this requires a command-line interpreter (CLI), which is handled by the "outer part" of the operating system, the shell.

• Applications and utilities: This layer provides many utility programs for common tasks that a user needs regularly. It is also the layer where users run their own applications.

2.3.2 Software architecture of Linux kernel
In the previous section 2.3.1, the user view of Linux was presented. Another way to look at an operating system's architecture is from the perspective of an OS designer. System memory in general-purpose operating systems can be divided into two distinct regions: kernel space and user space. Kernel space is where the kernel (i.e. the core of the operating system) executes and provides its services. User space is the set of memory locations in which user processes (i.e. everything other than the kernel) run. On the basis of this segregation of system memory, an operating system's architecture can be of two main types: monolithic or micro-kernel.

• A monolithic kernel is a single large process running entirely in a single address space; it is a single static binary file, and all kernel services exist and execute in the kernel address space. A monolithic kernel like Linux grows large and complex over time. To keep the operating system maintainable, it is structured into strictly functional units called subsystems, which follow strict protocols to communicate among themselves.

• In a micro-kernel, the kernel is broken down into separate processes known as servers. Some of the servers run in kernel space and some in user space. All servers are kept separate and run in different address spaces; they invoke "services" from each other by sending messages via IPC (Inter-process Communication). Figure 2.7 illustrates the difference between the monolithic and micro-kernel architectures. MINIX is an example of a micro-kernel OS.

Fig. 2.8 depicts the software architecture of Linux. Kernel space and user space are separated by a system call interface. The Linux system call interface provides the means to transfer control between user space and the kernel to invoke kernel API functions. The system calls are POSIX compliant. POSIX is a family of standards, specified by the IEEE, to clarify and make uniform the application programming interfaces (and ancillary issues, such as command-line shell utilities) provided by Unix-like operating systems.

Figure 2.7: Difference between Monolithic and Micro-kernel architecture.

User space contains the system library, which provides the user interface for system calls (open, read, write, close). In the case of Linux, usually the GNU C library (glibc) provides the necessary interfaces to interact with the Linux kernel. User programs and utilities use the system library to access kernel functions and get the system's low-level tasks done.

Figure 2.8: Software architecture of Linux operating system.

The Linux kernel is a complex piece of software. As a monolithic kernel it has grown over the years, and many subsystems have been added since its inception. These subsystems provide the flexibility and ease of developing applications in user space. The functionality of a few of the major subsystems of the Linux kernel is explained below:

• File system: The file system provides a consistent view of data stored on hardware devices. The Linux file system follows the Unix concept of treating everything as a file; if something is not a file, then it is a process. This allows common file interfaces to be used to interact with devices. The Linux file system goes further and allows the system administrator to mount a logical file system on any physical device. Logical file systems promote compatibility with other operating system standards and permit developers to implement file systems with different policies. The file system subsystem defines the file structure, file access mechanisms, types of files, space allocation, etc. and manages all functions related to files.

• Memory management: The memory management subsystem manages the main memory. Since most modern processors have a memory management unit, Linux offers virtual memory for each process running in user space. The memory manager maintains this mapping on a per-process basis, so that two processes can access the same virtual memory address while actually using different physical memory locations. The memory management subsystem also handles the movement of processes back and forth between main memory and disk, keeps track of memory usage in the system, and decides which process gets which area of physical memory so as to avoid fragmentation.

• Process management: Linux supports multitasking in a manner that is transparent to user processes: each process can act as though it were the only process on the computer, with exclusive use of main memory and other hardware resources. The process management subsystem implements primitives for process creation, destruction, wait and resume, etc. It is responsible for scheduling processes on the CPU, allowing multiple processes to run concurrently on the same machine while inter-process security is maintained. Process management maintains a process control block, a data structure that tracks all processes and the resources used by them. It also provides mechanisms to synchronize the execution of processes and threads.

• Inter-process communication: Inter-process communication (IPC) is a component of process management [36]. It implements a few communication primitives to facilitate communication among a group of processes. IPC can be implemented in a number of different ways, the most common being files, pipes, sockets, message passing, signals, semaphores, shared memory, and memory-mapped files.

• Network subsystem: The network subsystem allows Linux systems to communicate with other systems over a network. A number of hardware devices and protocols are supported to enable communication. The network protocol modules are responsible for implementing each of the possible network transport protocols. This subsystem provides all the services and mechanisms needed for user-space processes to send and receive data over the network. The device-independent interface module provides a consistent view of all hardware devices, so that higher levels in the subsystem do not need specific knowledge of the hardware in use.

• Linux device drivers: One of the main jobs of an operating system is to hide the peculiarities of the system's hardware devices from its users. Accessing and using hardware devices should be simple and consistent from user space. Device drivers take on a special role in the Linux kernel: they are distinct "black boxes" that make a particular piece of hardware respond to a well-defined internal programming interface and completely hide the details of how the device works [37]. Linux is a modular kernel, which means that various functionalities are built as modules, small units of software that interact with each other and run in the same address space (kernel space). Linux drivers are also built as kernel modules, which can be built into the kernel or loaded at run-time.

Kernel modules are simple object files (.o), linked together with some information into a kernel object file (.ko). Just like other kernel code, kernel modules have to be programmed in the C programming language. Kernel modules implement different functionalities or subsystems of the Linux kernel; hence, they are not limited to device drivers but are the general mechanism for doing something in kernel space or modifying the kernel.

2.3.3 Benefits of using Linux
Linux has several features that make it an exciting operating system. Commercial Unix kernels often introduce new features to gain a larger slice of the market, but these features are not necessarily useful, stable, or productive. Linux offers many benefits compared to other general-purpose operating systems.

• Linux is cost-free. You can install a complete Unix system at no expense other than the hardware.

• Linux is fully customizable in all its components. Since Linux follows the GPL license, the whole kernel is open source and anyone can modify the kernel according to their needs.

• Linux supports a wide range of hardware platforms, and if a platform is not supported, Linux can be ported to it with some adaptations.

• Linux can run on inexpensive low-end hardware platforms.

• Linux is light yet powerful. Linux is built with great importance given to efficiency and is optimized to utilize the hardware fully. The Linux developer community is a group of excellent programmers, and any piece of code that goes into the mainline kernel is reviewed and considered of high quality.

• The Linux architecture is well designed, with many subsystems providing a common interface to developers in user space. In kernel space, many generic layers of software are present which can be connected to a small piece of hardware-specific code, greatly reducing development time.

• Linux is well supported; it may be a lot easier to get patches and updates for Linux than for any proprietary operating system. A huge online community and forums are available for help, and drivers for new hardware are usually available soon after it is released.

2.3.4 Drawbacks of Linux
As with any other operating system, Linux also has some drawbacks associated with it. Some of them are listed below:

• No hard real-time support: Linux fails when it comes to hard real-time applications because of its unpredictable interrupt and scheduling latencies. Its interrupt latencies are too high for time-critical applications where the response time is very short.

• No standard edition: Linux as an operating system has a lot of different versions, called distributions. Although the mainline kernel is a standard kernel, the operating system as a package of tools and other components can vary.

• Large memory footprint: Linux usually requires a very large footprint compared to an RTOS, boots up slowly, and can be difficult to debug.

• Susceptible to bugs: The Linux kernel has a heavy emphasis on community testing. Although the mainline kernel is well reviewed, testing is done by the open-source community and bugs are reported only when encountered by users.

2.4 Approaches to Real-time Linux

Linux for real-time systems has been a topic of interest for embedded developers for a long time. One of the major drawbacks of Linux is that it does not promise predictable latencies. Several issues need to be addressed for real-time Linux to be practical.

The “latency” of an OS can be defined in many different ways. In general, latency is the time between the occurrence of an event and the beginning of the action that responds to it. In the case of embedded systems, it is the time between an interrupt from an external source and the execution of its corresponding handler. The two main sources of latency for user-space tasks in general-purpose operating systems are task latency and timer resolution. Task latency stems from the inability to preempt a lower-priority process while it is executing in kernel space; typically, monolithic kernels do not allow more than one stream of kernel execution [2]. The same situation occurs if the lower-priority process is running in kernel context while holding a lock on some resource. The other type of latency is due to the timer resolution of the OS. The kernel's process manager runs on a periodic, timer-driven basis: when the timer issues an interrupt, the kernel knows that the specified interval of time has elapsed. General-purpose operating systems such as Linux use the system timer to invoke interrupts at a certain frequency; the value of the period is called the tick, and it is often a configurable option which depends on the processor speed. Hence, how soon a process can be woken up on an event depends on the tick of the OS.

Many projects are working on this issue and have come up with different strategies to make Linux real-time capable. Table 2.2 shows some of the real-time Linux variants with their characteristics. Mainly two kinds of approaches are taken to achieve real-time capability in Linux:

• Interrupt abstraction: The interrupt abstraction approach consists of creating a layer of virtual hardware between the standard Linux kernel and the computer hardware [2]. Although it only virtualizes interrupts, a complete, separate real-time subsystem, consisting of an RTOS and a set of real-time tasks and device drivers, runs alongside the Linux OS. Figure 2.9 illustrates the architecture of this approach.

• Enhanced kernel: In this approach, the Linux kernel's important subsystems are modified to provide predictable latencies. This makes the system more deterministic by improving response times, scheduling and timing resolution.

The main reason for choosing Xenomai as the approach to real-time Linux in this thesis is that Xenomai provides hard real-time capability for user-space tasks.

Table 2.2: Overview of real-time frameworks for Linux

Approach     Kernel type            Hard real-time capable   User-space RT tasks
Preempt_RT   Enhanced kernel        No                       Yes
RTLinux      Micro-kernel           Yes                      No
RTAI         Interrupt abstraction  Yes                      No
LITMUSRT     Enhanced kernel        No                       Yes
Xenomai      Interrupt abstraction  Yes                      Yes

Figure 2.9: Interrupt abstraction based real-time Linux [2].

2.4.1 PREEMPT_RT

One of the most popular approaches for guaranteeing predictable latencies in Linux is the PREEMPT_RT patch. The project was an initiative by Ingo Molnar, following a patch posted by MontaVista Linux developers, who had pushed Linux's preemption ability to its limits; their technique, however, was only partially stable on uni-processor systems and unreliable on SMP systems [38]. PREEMPT_RT is an enhancement of the Linux kernel, and the main idea behind it is to improve the preemptability of Linux. The mainline kernel has numerous critical sections which are non-preemptable. For example, spinlocks are an efficient way to synchronize in kernel space when the waiting period on a resource is very short. But spinlocks also pose a risk of deadlock, so the Linux kernel has a mechanism to acquire a spinlock with interrupts disabled, which makes the kernel non-preemptable for as long as the lock is held. These kinds of critical sections introduce unpredictable latencies when responding to an interrupt or while scheduling a task. The PREEMPT_RT patch tries to reduce such non-preemptable sections in the kernel.

PREEMPT_RT improves the preemptability of Linux by implementing the following ideas:

• Threaded interrupts: For every requested IRQ for which the IRQ_NODELAY flag is not set, PREEMPT_RT runs the IRQ handler as a preemptible kernel thread. Any thread of higher priority can then preempt this IRQ handler, avoiding the latency caused by waiting for an interrupt handler to complete. A device's interrupt handler can still be forced to run in true interrupt context, and not as a thread, by setting the interrupt descriptor flag IRQ_NODELAY; thus, hard IRQs remain possible.

• Sleeping spinlocks: As mentioned earlier, the idea behind spinlocks is to protect critical sections that are very short and to avoid the overhead of putting a thread to sleep. PREEMPT_RT converts most spinlocks into mutexes, which enables preemption within these critical sections: if a thread tries to enter a critical section while another thread is inside it, the thread is scheduled out and sleeps until the mutex protecting the critical section is released. This reduces latency by avoiding busy waits and shrinking the non-preemptable sections. However, interrupts then have to be threaded, as sleeping spinlocks cannot be used inside an atomic context such as a non-threaded interrupt handler. For such cases, raw_spinlocks are used, which are true spinlocks and do not sleep.

• Priority inheritance: A major source of latency in an RTOS can be unbounded priority inversion. Priority inversion occurs when a high-priority thread waits for a resource held by a low-priority thread, so the low-priority thread effectively runs before it. The inversion becomes unbounded when a third thread, with priority between the two, becomes ready and preempts the low-priority task, delaying the release of the resource for an undetermined amount of time. This can be avoided by a technique called priority inheritance: the priority of the low-priority thread is boosted to that of the high-priority thread waiting on the resource. No other thread can then preempt the low-priority thread before it releases the resource, and the high-priority thread runs immediately after the release.

PREEMPT_RT has been the approach preferred by most system designers because of its simplicity. It has been considered for merging into the mainline kernel for a long time, but this has not happened yet, probably because of certain vulnerabilities it introduces; giving users this much power makes it easier to crash the system. Since spinlocks become sleeping, interruptible locks, care has to be taken with their use, and they have to be replaced appropriately throughout the kernel. This is a mammoth task that requires deep knowledge of the kernel. Even with all the features mentioned above, PREEMPT_RT does not guarantee deterministic, quantifiable latencies for hard real-time systems [38]. Hence, it is not the appropriate choice for a hard real-time system such as a professional audio processing system.

2.4.2 LITMUSRT

LITMUSRT is a real-time extension of the Linux kernel with a focus on multiprocessor real-time scheduling and synchronization. The Linux kernel is modified to support the sporadic task model, modular scheduler plugins, and reservation-based scheduling. Clustered, partitioned, and global schedulers are included, and semi-partitioned scheduling is supported as well [39]. LITMUS stands for Linux Testbed for Multiprocessor Scheduling in Real-Time Systems. This is one of the kernel-enhancement approaches to attaining real-time scheduling; the main idea is to improve task scheduling with intelligent algorithms that reduce scheduling overhead and thus scheduling latency.

The system operates in one of two modes: real-time or non-real-time. It boots in non-real-time mode, during which it initializes the appropriate real-time plugin. In order to run a real-time task set, that task set must first be created in non-real-time mode, and then the system must be switched to real-time mode to begin execution. A user performs these activities by invoking several system calls to create tasks, set task parameters, and prepare tasks for execution. When the execution of a real-time task set is complete, or the user wishes to end real-time execution, the system can be switched back to non-real-time mode. LITMUSRT is a research prototype; furthermore, its APIs are not stable and are focused on optimizing scheduling algorithms for multi-core processors. It has great potential for making the mainline kernel capable of handling hard real-time systems in the future.

2.4.3 RTLinux

RTLinux is a hard real-time OS that runs the entire Linux operating system as a fully preemptive process. RTLinux can be classified as an interrupt-abstraction approach, but with a micro-kernel that acts as an interface between the hardware and the Linux kernel. The micro-kernel handles interrupts first and then passes them on to the non-real-time Linux kernel. RTLinux was developed by Victor Yodaiken, Michael Barabanov, Cort Dougan and others at the New Mexico Institute of Mining and Technology, and then as a commercial product at FSMLabs. FSMLabs was later acquired by .

RTLinux is designed to achieve hard real-time capability on Linux for building control systems [40], and its approach relies on a traditional virtualization technique: the Linux kernel is a guest operating system and the RTOS micro-kernel is the host. The micro-kernel provides Linux with a virtual interrupt controller and timer, so Linux interacts with a software emulation of the interrupt-control hardware provided by RTLinux. The emulation supports the synchronization requirements of the Linux kernel while preventing Linux from disabling interrupts. Interrupts that are handled by Linux are passed through to the emulation software after any needed real-time processing completes [41], which makes Linux a completely preemptable system. Real-time tasks are built as kernel modules and run in kernel space, and RTLinux can run real-time handlers with very low latency. These tasks and handlers run first, independently of Linux. Real-time tasks have direct access to hardware and memory and do not use virtual memory.

RTLinux is released in two versions: an open-source version under the GPL v2 license and a commercial version with more features. RTLinux complies with the POSIX 1003.13 minimal real-time system profile, although some function calls do not follow the POSIX standard. It thus provides basic thread management, IPC primitives, semaphores, signals, spinlocks, etc. Commercial support for RTLinux was discontinued long ago, but FSMLabs owns the US patent over the RTLinux implementation. It is an old project, started in 1995, which has evolved into the much-improved RTAI and Xenomai projects discussed in the following sections.

2.4.4 RTAI

RTAI stands for Real-Time Application Interface. The RTAI project began at the “Dipartimento di Ingegneria Aerospaziale del Politecnico di Milano” (DIAPM) in 1996/97, emerging from the idea that RTLinux proposed. RTAI initially used the same approach as RTLinux, but due to a conflict with RTLinux over the patent on the micro-kernel technique, RTAI has since shifted to the Adeos kernel [38]. Although RTAI initially followed RTLinux, its components were completely rewritten with improved robustness. RTAI supports several architectures: x86-64, PowerPC, ARM and MIPS. RTAI consists of two main parts: an Adeos-patched Linux kernel, which adds the hardware abstraction layer, and a range of services for developers to write real-time applications.

Adeos

Adeos stands for Adaptive Domain Environment for Operating Systems. Adeos is an adaptive environment that enables the hardware resources to be shared among multiple operating systems, or multiple instances of the same operating system [3]. This is made possible by installing Adeos as a thin abstraction layer over the interrupt-control hardware; Adeos is sometimes even referred to as a nano-kernel, although no real-time task runs directly on Adeos. Instead, there are multiple entities called domains, which are essentially sandboxes in which operating systems run independently. However, the operating systems in different domains can communicate with each other through the Adeos abstraction layer. Fig. 2.10 shows how the Adeos layer communicates with the different domains. Four categories of communication are possible in an Adeos-enabled system. Category A involves domains making use of hardware directly. Category B involves Adeos receiving control from the hardware because of a software or hardware interrupt; it also includes all commands that Adeos issues to control the hardware. Category C involves invoking an OS's interrupt handler upon the occurrence of an interrupt registered for that OS; while invoking the handler, Adeos also provides the necessary information about the interrupt. Category D involves two-way communication between the domains and Adeos. Adeos is available as a patch to the mainline kernel and adds the abstraction layer between the domains and the hardware.

Figure 2.10: Adeos architecture [3].

Adeos Interrupt Pipeline

Adeos handles interrupts in a pipelined manner; Fig. 2.11 shows the interrupt-handling mechanism. Adeos uses an interrupt pipe to propagate interrupts through the different domains running on the hardware. Some domains have the priority to receive interrupts first and pass them on to the next domain if they are not handled there. When an OS in a domain wants to be uninterrupted, it asks Adeos to stall its stage of the interrupt pipeline. Interrupts are then held at that stage and maintained in a queue; later, when the OS is ready to receive interrupts again, they are propagated down the pipeline, similar to the handling of pending interrupts. Note that new interrupts can still arrive while a pipeline stage is stalled; they are queued and delivered when the stage is unstalled. When an OS in a domain discards an interrupt, it is passed on to the next domain; when a domain terminates an interrupt, it is not propagated further to subsequent domains.

Figure 2.11: Adeos interrupt pipe [3].

RTAI provides the required schedulers along with a wealth of services. The RTAI schedulers are fully preemptible and can be invoked directly from within interrupt handlers, so that Linux has no chance of delaying any RTAI hard real-time activity. RTAI provides services such as:

• Basic task management: Dynamic priority assignment, scheduling policy assignment, schedule locking and unlocking, task yielding, support for periodic tasks, etc.

• Memory management: Shared memory for inter-task and intra-task communication, dynamic memory allocation, and user/kernel space data sharing.

• Synchronization mechanisms: Counting and binary semaphores, condition variables with wait and signal operations, etc.

• Communication: Mailboxes and messaging queues (FIFO or priority order). Broadcast messages to multiple tasks, overwriting urgent sends.

• Hard real-time POSIX support: POSIX-compliant APIs for kernel threads and RTAI tasks.

• Timer and tasklet services: Timer driven events and tasklets with prioritised execution.

• Distributed services: RTAI provides RTNet, a protocol stack with standard UDP and socket APIs for use in real-time tasks. This allows real-time tasks to execute in a distributed manner across multiple nodes.

As in the case of RTLinux, RTAI tasks also run in kernel space; hence, both suffer from the same programming problems. Essentially, the real-time tasks run with the same privileges as the Linux kernel, which adds considerable risk to the stability of the system: there are no protection mechanisms, and real-time tasks are not protected from each other [42]. RTAI is licensed under the GPL, so application source code cannot be made proprietary. Although RTAI is a good option for real-time audio processing, executing in kernel space, as mentioned earlier, adds many restrictions and makes the development process extremely difficult. Moreover, audio processing algorithms are highly confidential, and developing them with RTAI would require releasing the source code under the GPL. Hence, a more sophisticated approach known as Xenomai, which is derived from RTAI, is chosen as the approach to real-time Linux in this thesis.

2.5 Xenomai

Xenomai is a spin-off of the RTAI project: it is the evolution of the Fusion project, an effort to run RTAI tasks in user space, and it pushes the idea of interrupt virtualization one step further. As in the case of RTAI, Xenomai uses an interrupt abstraction layer, called I-pipe. I-pipe is the new incarnation of the Adeos nano-kernel mentioned in section 2.4.4; the main difference is that I-pipe is written specifically for Xenomai's use case, whereas Adeos was a generic patch, so I-pipe removes various sections of Adeos code that were generic in nature. Xenomai uses I-pipe as the interrupt virtualization layer but allows real-time tasks to be executed in user space.

Originally known as Xenodaptor, the project became known as Xenomai in August 2001, when Philippe Gerum introduced it to the Linux developer community. Xenomai 1.0 was released in March 2002. In its early days, Xenomai was an add-on component to real-time Linux variants for emulating traditional RTOSs, originally based on a dual-kernel approach. Over the years, it has become a full-fledged real-time Linux framework in its own right, also available on single/native kernel systems [25]. It provides an emulated environment in which developers can write applications using familiar RTOS APIs. This is possible because of the wrappers that Xenomai provides, which it calls skins. These skins, or wrappers, are available for a few popular RTOSs such as VxWorks, in addition to the POSIX API that Linux follows.

I-Pipe

I-pipe is a much stripped-down version of the interrupt pipeline implemented in Adeos, yet it retains many of the concepts Adeos introduced. It enables deterministic interrupt response times from the Linux kernel by inserting itself between the kernel and the interrupt-controller hardware. The domains introduced by Adeos are still valid in I-pipe, and any interrupt can be registered to a domain with higher priority than the domain containing Linux. Thus, interrupts can be handled directly in the first domain with very low response times. When Linux tries to disable interrupts, the Linux stage is stalled instead; interrupts that arrive in this scenario are logged by another domain called the Interrupt shield, and these events are dispatched from the Interrupt shield when Linux is ready to accept interrupts, as shown in figure 2.12. This is the same mechanism as in Adeos.

An I-pipe system is a chain of domain clients asking the abstraction layer for notification of events. A domain is a software component consisting of an OS that relies on the abstraction layer to be notified of:

• Every incoming external interrupt, or auto-generated virtual interrupt.

• Every system call from Linux.

• Other system events triggered by the kernel code such as Linux task switching, signal notification, Linux task exits, etc.

I-pipe delivers these events in the order of the domains' static priority; hence, it can provide timely and predictable delivery of events. Fig. 2.12 shows a general view of the pipeline abstraction, with the Xenomai and Linux domains sharing the events.

Figure 2.12: IRQ handling through the I-pipe enabled kernel[4].

2.5.1 Xenomai Architecture

The Xenomai architecture is similar to the interrupt-abstraction architecture presented in section 2.4. Fig. 2.13 shows the software architecture of a Xenomai-patched kernel with Linux as a co-kernel. Xenomai and Linux are both registered as domains over the Adeos-based interrupt pipeline. An important aspect of Xenomai is that it can reuse any generic part of Linux that is useful; for example, the hardware abstraction layer (HAL) is common to both Linux and Xenomai. The HAL provides access to hardware devices, and on top of it the two kernels operate in their own domains, each providing its own services and interface to user space through syscalls. In the application layer are the Xenomai and Linux tasks; Xenomai tasks are synonymous with real-time tasks.

Primary and Secondary Domain

As shown in fig. 2.12, a Xenomai-enabled kernel has three domains registered with the I-pipe: the Primary domain, the Secondary domain, and the Interrupt shield.

• Primary domain: The Primary domain is the first domain in the interrupt pipeline and has the highest priority among the registered domains. It is also called the real-time domain or Xenomai domain, as the Xenomai real-time kernel runs in this domain. Any reference to the real-time domain or Xenomai domain henceforth in this document refers to the Primary domain.

Figure 2.13: Xenomai-enabled system architecture.

• Secondary domain: The Secondary domain has the lowest priority among the three registered domains and runs the Linux kernel. It is also called the non-real-time domain or Linux domain; hence, any reference to the non-real-time domain or Linux domain in this document means the Secondary domain.

• Interrupt shield: In addition to the two kernel domains, there is one extra domain between the real-time and the non-real-time domain. As mentioned earlier, the only job of this domain is to queue interrupts while Linux has interrupts disabled.

Xenomai modes and mode switching

The Xenomai kernel has two modes of execution; the mode is essentially the domain context in which the system is running. A system is said to be in Primary mode when the OS in the Primary domain is running, and in Secondary mode when the OS in the Secondary domain is running. As with the domains, Primary mode is synonymous with real-time mode, and Secondary mode with non-real-time mode.

Xenomai tasks in user space can run in either Primary or Secondary mode. When a real-time task is created using Xenomai services, it starts in Primary mode by default, but whenever it calls a Linux system call it switches to Secondary mode. Performance is clearly hindered in the Secondary domain, since the task is no longer scheduled by the real-time scheduler. A task that switched to Secondary mode does not remain there; it is switched back to Primary mode as soon as possible. This process of switching to Secondary mode and back to Primary mode is called a mode switch in Xenomai's terminology.

Xenomai 3.0 Architecture

Until Xenomai 2.0, the I-pipe (Adeos) interrupt pipeline was mandatory, as only the dual-kernel approach was supported. Since Xenomai 3.0, two configurations are possible, and Xenomai also supports running real-time applications without the I-pipe patch on a single-kernel system. The version that runs without the I-pipe patch is called Xenomai with the Mercury core, and the one that needs the I-pipe patch and the dual-kernel configuration is called Xenomai with the Cobalt core.

Cobalt

Cobalt is the dual-kernel configuration of Xenomai: a real-time extension built into the Linux kernel, running on top of the Adeos/I-pipe layer, that provides the core kernel services such as scheduling, task management, communication and real-time driver interfaces, and deals with all time-critical activities. In the Cobalt version, only the RTOS APIs provided by Cobalt are deemed real-time capable. These interfaces are implemented by libcobalt, the system-call interface, which offers RTOS extensions along with a POSIX subset. The RTOS skins that Xenomai supports are wrapped by an interface known as Copperplate, which allows applications to be developed in a standard RTOS framework (e.g. VxWorks). Xenomai provides its own RTOS skin called Alchemy, formerly known as the native skin in Xenomai 2.0. Fig. 2.14 shows the structure of the Xenomai interfaces and the architecture.

Figure 2.14: Dual kernel Cobalt architecture of Xenomai.

Mercury

Mercury was derived from a previous experiment known as the Xenomai/SOLO project [43]. Xenomai/SOLO was a re-implementation of the building blocks that connect an application with the underlying RTOS, with special respect to the requirements and semantics of the native Linux kernel [44]. VxWorks applications were the first RTOS applications migrated to a Xenomai-based Linux kernel through the VxWorks skin.

Mercury is the single-kernel configuration of Xenomai. It does not need the I-pipe interrupt virtualization and can run on any regular Linux kernel. This is made possible by the Copperplate interface, which mediates between the real-time API/emulator the application uses and the underlying real-time core. This way, applications can run in either environment without visible code changes. Fig. 2.15 shows the architecture of Xenomai with the Mercury core for single-kernel systems.

Figure 2.15: Single kernel Mercury architecture of Xenomai.

With the help of the various skins that Xenomai supports, developers can build applications using the standard POSIX interfaces, Xenomai's native API, the VxWorks RTOS framework, or even the APIs of Xenomai's ancestor RTAI, whose applications run in kernel space. Fig. 2.16 shows the various API services that Xenomai provides to its developers. These skin APIs are wrappers around the core, or nucleus, which is either the Cobalt core or Mercury.

Figure 2.16: Xenomai API support for different skins.

2.5.2 Real-time Driver Model

The Real-Time Driver Model (RTDM) is an approach to unify the interfaces for developing device drivers and associated applications under real-time Linux [5]. RTDM provides a framework for writing real-time device drivers that can be accessed through a POSIX interface. It offers a clean separation between the hardware interface and the application program, and can guarantee deterministic behaviour.

The Real-Time Driver Model is first of all intended to act as a mediator between the application requesting a service from a certain device and the device driver offering it [5]. Fig. 2.17 shows the interaction of the RTDM driver with the other system layers.

Figure 2.17: RTDM layer interaction with other system layers [5].

The RTDM framework models devices in two categories:

• Protocol devices: These are message-oriented devices, registered using two identifiers: protocol family and socket type. In this model, devices are addressed under the POSIX socket model: instances are created with socket() and terminated with close(). Messages are sent and received with sendmsg() and recvmsg(), onto which send()/sendto() and recv()/recvfrom() are mapped internally.

• Named devices: These devices are registered with the real-time subsystem under a unique clear-text name; instances are then created using open() [5] and terminated with close(). These devices support stream-oriented I/O through read() and write(). In addition, ioctl() can be implemented for further functionality.

Table 2.3: Device description fields for device registration.

Field name — Description
device_flags — Defines the device type (named/protocol) and whether the device can be instantiated only once at a time (exclusive device) or multiple times.
device_name — Name to be registered (named devices only).
protocol_family, socket_type — Protocol family and socket type to be registered (protocol devices only).
context_size — Size of the driver-defined appendix to the structure describing a device instance (device_context).
open_rt, open_nrt — Handlers for named-device instance creation (real-time and non-real-time invocation).
socket_rt, socket_nrt — Handlers for protocol socket creation (real-time and non-real-time invocation).
ops — Contains the operation handlers for newly opened device instances.
device_class — Categorisation of the device.

RTDM device Registration and Invocation

RTDM devices are registered with the registration function rtdm_dev_register() provided by the framework. A device is registered by passing a device description containing many fields of information about the device; Table 2.3 describes what each field means. Drivers are allowed to create device instances as well: an equivalent of the user API is available to them. Alternatively, drivers can also call the operation handlers of other drivers directly. For this purpose, the related device context first has to be resolved via rtdm_context_get() after instantiating the device as normal. This context can then be used to obtain the desired handler reference and to invoke the operation without any indirection through RTDM [5].

The RTDM framework provides the following services, which are very useful when developing drivers in the real-time domain:

• Clock services: RTDM offers a clock source from which timestamps can be read (rtdm_clock_read()). Time is expressed as a 64-bit value with nanosecond resolution.

• Task services: RTDM allows drivers to create real-time tasks and to manipulate their characteristics (priority and periodicity). Task services such as suspend and wakeup are available.

• Synchronization services: On multi-core platforms, kernel space can be entered at several points simultaneously, so protection of shared data is important. RTDM provides spinlocks for use in atomic contexts such as interrupt handlers; classic mechanisms such as semaphores and mutexes are also available.

• Interrupt management services: RTDM can register an interrupt handler for an IRQ line in the real-time domain, and the interrupt will then also be handled in the real-time domain when it fires. IRQ lines can also be enabled or disabled through these services.

• Non-real-time signalling services: To propagate an event from the real-time to the non-real-time domain, a special signalling service can be requested from the RTDM layer. The registered handler is executed in the non-real-time domain.

• Utility services: This group of services includes real-time memory allocation, safe access to user-space memory areas, real-time-safe kernel console output, and, as an alternative to separate service entry points, a check whether the current context is a real-time task.

2.5.3 Xenomai Services

The Xenomai API services are extensive and easy to use, and Xenomai 3.0 provides all the necessary interfaces in kernel as well as user space: task (thread) management, synchronization, message-passing services, clock and timer services, event management, heap management, and pipes for communication between real-time and non-real-time threads. Some of the important user-space services are listed below:

Task management: Xenomai tasks for the real-time domain can be created using different skins. A real-time task can be created in the native skin with rt_task_create(), and made periodic with rt_task_set_periodic(). In addition, several other interfaces exist to change task parameters and so on. The POSIX skin allows tasks to be created just like pthreads in Linux, using pthread_create().

Event services: Xenomai provides an API for initializing events and waiting on them; rt_event_wait(), rt_event_bind(), rt_event_create(), etc. are some of the available services.

Synchronization services: Xenomai has interfaces for mutexes (rt_mutex_create()), semaphores (rt_sem_create()), etc.

Timer and clock services: Timer services such as timer_create(), timer_delete(), timer_settime(), etc. are also available to kernel- and user-space developers.

2.6 XMOS

The XMOS xCORE is a multi-core microcontroller that delivers scalable, parallel multitasking compute. It can be configured to support a wide range of interfaces and peripherals, and it responds much faster than conventional microcontrollers, delivering precise real-time performance. Unlike conventional microcontrollers, it has multiple deterministic processor cores that can execute several real-time tasks simultaneously and independently.

XMOS is a fabless company that develops xCORE microcontrollers. These are based on the xCORE architecture, which takes a different approach compared to traditional microcontrollers: it assures deterministic execution times and behaviour equivalent to hardware-like solutions such as an FPGA. It is often referred to as software-defined hardware, because the peripherals are also implemented in software, and one can choose the exact combination of peripherals that is needed.

xCore Architecture

Figure 2.18: xCORE architecture [6].

The xCORE multicore microcontroller consists of multiple “logical processor cores” spread over one or more tiles; these logical cores are essentially hardware threads on each tile. Instructions execute in a single cycle, and there is no pipeline, no caches, and no interrupts, so the response times are extremely deterministic. An xCORE device can have 4, 6, 8, 12, 16 or even 32 logical cores on 1, 2 or 4 tiles. An xCORE tile provides 500 MIPS of compute (on a 500 MHz device), making it much more powerful than conventional microcontroller products. The logical processor cores share processing resources and memory on the tile, but each has its own register file, so context switches between logical cores cost very little. Each logical core gets a guaranteed slice of processing power (up to 125 MIPS on a 500 MHz device), which is controlled by the unique xTIME timing and synchronization technology [6].

xCORE is an efficient 32-bit RISC processor made up of tiles, with each tile hosting multiple logical cores. The XS1 family of devices has up to 8 logical cores per tile. To further enhance performance, the xCORE instruction set includes support for long arithmetic, including MAC, CRC and other DSP operators. A hardware scheduler acts as a hardware RTOS and schedules the logical cores efficiently to give real-time performance. Memory is unified and shared across the cores in a tile, so multiple cores can share it and pass data between them. The xCORE does not use cache memory and thus provides deterministic execution times. If a core is waiting on some data, the hardware scheduler passes the execution resource to another core that can execute.

Highly responsive and programmable I/O peripherals

xCORE provides highly flexible I/O ports that can be programmed to act as any peripheral device that is needed. These ports are tightly controlled from the logical cores, and data is transferred directly from the registers of the logical cores to the I/O registers, avoiding the latency usually caused by going through memory. The ports also handle I/O events in a unique way: unlike with interrupts, there is no handler; the I/O pin waits for a state, and when that state is asserted the controller executes the next action, which yields very responsive state machines. The I/O has dedicated serialization hardware, so serial protocols can easily be implemented in software. XMOS provides a complete range of software-based peripheral functions called xSOFTip, which includes CAN, USB, Ethernet, PWM, etc.

Chapter 3

Xenomai setup for an Intel Atom board

This chapter explains the setup used for conducting the experiments in this thesis. It covers all the components used in the experiments and the steps taken to achieve the goal. The hardware platform used in this thesis is a system-on-module based on an Intel Atom processor, known as the Intel Joule. The operating system running on the Joule is a Xenomai-patched Linux kernel (version 4.9.0).

3.1 Intel Joule specifications

The Intel Joule is one of the most powerful system-on-modules available on the market today. Based on the x86_64 architecture of the Intel Atom processor, the Joule provides great performance for its form factor. It was originally developed for IoT developers to use in graphics-based applications, but it has all the hardware capability that an audio processing engine needs.

The main reason to choose an x86 (Intel) based board is that Xenomai at its inception was aimed at x86 machines, and even now Xenomai performs better on an x86 machine than on an ARM-based one. The reason is that Intel processors have a TSC (Time Stamp Counter), which offers the high-precision timers needed by Xenomai. ARM also provides a similar hardware counter, but on ARM the hardware timer details are specific to each SoM or SoM family, so specific changes need to be made in the I-pipe patch to use them. However, ARM Cortex-A9 based SoMs are supported by the I-pipe patch. The only SoM that was a competitor to the Intel Joule was the i.MX 6 from Freescale; it was not chosen because it lacked the computational power needed for running heavy VST plugins on the audio processing engine.

The Intel Joule comes in two models: the Intel Joule 570x and the Intel Joule 550x. Table 3.1 below shows the specifications of both models.

Table 3.1: Intel Joule specifications.

Specification | Intel Joule 550x | Intel Joule 570x
Dimensions | 70 x 85 mm | 70 x 85 mm
GPIO pins | 48 GPIO (including 4 PWMs) | 48 GPIO (including 4 PWMs)
Storage type | eMMC, microSD | eMMC, microSD
UART | 4 UARTs: 3 full duplex & 1 half duplex | 3 full duplex & 1 half duplex
I2C | 5 I2C interfaces | 5 I2C interfaces
SPI | 2 SPI interfaces up to 25 MHz | 2 SPI interfaces up to 25 MHz
I2S | 1 I2S interface | 1 I2S interface
USB ports | Type C, micro USB & USB 3.0 | Type C, micro USB & USB 3.0
Processor | Intel Atom T5500 @ 1.5 GHz | Intel Atom T5700 @ 1.7 GHz
RAM | 3 GB | 4 GB
Flash storage | 8 GB eMMC | 16 GB eMMC
Bluetooth | 4.2 compliant | 4.2 compliant
Wi-Fi | 802.11ac Dual Band MIMO | 802.11ac Dual Band MIMO
Power | 12 V | 12 V
OS | Yocto Linux 4.4 | Yocto Linux 4.4

In addition to the specifications in Table 3.1, the Intel Joule has four CPU cores with SIMD instruction support, which can be used to parallelize audio processing algorithms. It has an Intel HD GPU to support high-definition graphics through its micro HDMI port. The Intel Joule is a powerful platform that can be used for professional applications and is a good choice for hard real-time systems such as audio processing. Its predecessors, the Intel Edison and the Intel Galileo, lacked the power developers were hoping for from an x86 machine. The Reference Linux for IoT that comes pre-installed on the platform is built using the Yocto Project build system and allows many customization possibilities, which is another reason to choose this system-on-module for this thesis study.

3.2 Yocto build system

The Yocto Project is an open-source collaboration project that provides templates, tools and methods to help create custom Linux-based systems for embedded products, regardless of the hardware architecture. It was founded in 2010 as a collaboration among many hardware manufacturers, open-source operating system vendors and electronics companies to bring some order to the chaos of embedded Linux development [45].

The Yocto Project gives developers of embedded Linux systems a starting point for creating customized Linux distributions, completely independent of the hardware platform. The project is sponsored by the Linux Foundation, and it is not just a build system: it also provides tools, templates and methods that accelerate the process of creating and deploying customized Linux images. The Yocto build system is well suited to embedded Linux development because it is a complete environment with tools, metadata and documentation. It provides free tools that are easy to get started with and powerful to work with, such as debuggers and an emulation environment. Yocto also comes with the support of the OpenEmbedded project, which provides many useful Yocto recipes. A recipe describes a particular piece of software that needs to be compiled and included in the image.

3.2.1 OpenEmbedded

OpenEmbedded is the build framework for embedded Linux. It offers a best-in-class cross-compile environment and allows developers to create a complete Linux distribution for embedded systems. Intel is a major participant in the OpenEmbedded project and uses the framework to create the embedded Linux distributions for its products such as the Intel Joule and Intel Edison. The advantages of OpenEmbedded are:

• Yocto Project uses OpenEmbedded as a core component to build images.

• Supports many hardware architectures.

• Tools to speed up creating images after changes have been made.

• Runs on any Linux distribution.

• Cross-compiles thousands of packages and tools that one may need in a distribution.

OpenEmbedded-Core contains a base layer of recipes, classes and associated files that is meant to be common among many different OpenEmbedded-derived systems, including the Yocto Project. This set of metadata is co-maintained by the Yocto Project and the OpenEmbedded project, which have agreed to work together and share a common core set of metadata, oe-core, containing much of the functionality. This collaboration achieves a long-standing OpenEmbedded objective of having a more tightly controlled and quality-assured core [7]. It follows a layered architecture in which most of the layers are customizable. Fig. 3.1 shows the organization of the Yocto layers, which a developer can modify according to his needs. The Yocto Project combines various components such as BitBake, OE-Core, etc.

To understand these layers, an example scenario of building a custom Linux distribution is explained next. OpenEmbedded-Core is a layer containing the core metadata for current versions of OpenEmbedded; any required recipe from the OpenEmbedded layer index web page can be added to this layer. A typical example is core-image-base, which will download all the code necessary for a console-only image that fully supports the target device hardware. For the BSP layer, one has to use the meta-intel layer officially provided by Intel as the board support package for Intel-based platforms.

Figure 3.1: Yocto layers. [7].

In the commercial layer and the developer-specific layer, the developer can choose any recipe of that layer (for example, development tools, Linux utilities, etc.) or even build his own application directly into the image.

3.2.2 Building a Customized Linux Image using Yocto

The following steps need to be taken to build a Linux image after all the recipes have been selected for the different layers as mentioned above.

• Create a local copy of the meta-intel repository: building an image for an Intel board needs its BSP layer, which for Intel boards such as the Joule and the Edison is meta-intel. Clone the repository using

$ git clone git://git.yoctoproject.org/meta-intel

• Set up the environment and build directory:

$ source ./oe-init-build-env

• Configure the build: edit the bblayers.conf and local.conf files, both located in the build/conf directory. For a 64-bit build, use the following command:

$ echo 'MACHINE = "intel-corei7-64"' >> conf/local.conf

To build a 32-bit image:

$ echo 'MACHINE = "intel-core2-32"' >> conf/local.conf

• Build the image: Use the bitbake build tool from OpenEmbedded to build the image of customized Linux distribution.

$ bitbake <target>

where <target> can be an image recipe such as core-image-minimal, which builds a small and practical image, or some other layer recipe if one wants to build only that part of the build process. A custom recipe for the image can also be written and used with the bitbake tool to build the image.

• Build a kernel module: in addition to the above commands, a kernel developer may need to compile just the kernel modules. The bitbake tool allows compiling kernel modules with the command below, where <kernel-recipe> is the kernel recipe (e.g. virtual/kernel):

$ bitbake <kernel-recipe> -f -c compile_kernelmodules

• Force-compile the Linux kernel: one key thing to keep in mind is that Yocto fetches the sources from a git repository. To compile local code modifications from the local sources, bitbake can be used to force-compile the Linux kernel:

$ bitbake <kernel-recipe> -f -c compile

• Flashing the image: to flash the image to a USB flash drive or a micro SD card, it is possible to use the dd utility or the bmaptool offered by the Yocto build tools. It is recommended to use bmaptool, since it checks data integrity while flashing and is faster than dd. The command to use bmaptool is given below:

$ bmaptool copy <image-file> /dev/<device>

3.3 Applying I-pipe and Xenomai patches

This section covers the process of applying the Xenomai and I-pipe patches. A patch is a structured file that consists of a list of differences between one set of files and another. I-pipe and Xenomai are kernel patches, meaning they carry changes to the kernel code that add the interrupt pipeline and the Xenomai real-time kernel services to the mainline kernel. First, it should be decided whether the system will be a dual-kernel or a single-kernel system. As mentioned earlier, Xenomai 3.0 supports real-time applications on a single kernel, but such a system is not truly real-time, since the scheduler is still that of non-real-time Linux. Only when the Cobalt core of Xenomai is used, which enables the dual-kernel system, are the I-pipe and Xenomai patches needed. Before applying the Xenomai patch, the I-pipe patch has to be applied.

3.3.1 Applying the I-pipe Patch

The I-pipe patch is maintained by the Xenomai project and can be downloaded from the Xenomai website. Applying it works just as applying any patch to a software code base. The steps involved are:

• The I-pipe patch version matching the Linux kernel to be patched is downloaded from the Xenomai downloads page. The kernel provided by the platform vendor is used rather than the mainline kernel, because certain platform-specific code is sometimes observed to be hard-coded (although this is not good practice).

• I-pipe patch is placed in the root directory of the kernel source.

• The Xenomai repository is cloned from its website, and the Cobalt kernel is prepared using the script provided by Xenomai:

$ scripts/prepare-kernel.sh --linux=<kernel-source> --ipipe=<ipipe-patch> --arch=<architecture>

where <kernel-source> is the path to the kernel source directory and <architecture> is x86_64 for the Intel Joule.

• After the patch is applied, the log is checked for merge conflicts; some code chunks might be rejected, in which case files with a .rej extension are created. Merge conflicts are resolved manually.

• The kernel is compiled to make sure it builds without issues.

3.4 Xenomai kernel configuration

The process of applying the Xenomai patch is fairly simple, as explained above, but certain kernel configuration options need to be set in order to get the best performance from Xenomai. These are:

• Processor type: pick the exact processor model of the target x86 platform or, at the very least, the most specific model close to the target CPU.

• Power management: turn off CPU frequency scaling and advanced power management options. Frequency scaling and power-saving modes can cause unwanted latencies and add jitter to the response times.

• Disable CPU Idle: The CPU idle mechanism sends the CPU into deep C states, causing huge latencies because the APIC timer that Xenomai uses may not fire anymore.

• Kernel command-line parameters: Xenomai needs certain kernel parameters, passed on the command line when the kernel is loaded. The most important ones are the heap size, the user group (and its id) allowed to access real-time devices, and the detection of SMIs (System Management Interrupts). In the Yocto build system these can be set using the BSP layer recipe: the meta-intel recipe has a machine .conf (configuration) file which contains the kernel command-line parameters passed while loading the kernel. The required kernel parameters such as xenomai.allowed_group, xenomai.sysheap_size, xenomai.state, xenomai.smi=detect, xenomai.smi_mask=1, etc. are appended in the machine .conf file.
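For example, the kernel command line could be extended in the machine configuration file roughly as follows; the group id and heap size below are illustrative values, not necessarily the ones used in this project.

```
# machine .conf fragment (illustrative values)
APPEND += "xenomai.allowed_group=1001 xenomai.sysheap_size=512 \
           xenomai.smi=detect xenomai.smi_mask=1"
```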

After patching the kernel with the I-pipe and Xenomai patches, the OS image is created using the Yocto build system explained earlier. To check whether the Xenomai kernel is working, inspect the kernel log during start-up. It should show Xenomai registered with the I-pipe as the primary domain, also called the "head domain":

[Xenomai] scheduling class idle registered
[Xenomai] scheduling class rt registered
I-pipe: head domain Xenomai registered

3.5 Installing Xenomai libraries and validating the setup

The following steps are taken to install the Xenomai library:

• Clone the Xenomai 3 git repository.

• Since the system is patched with the I-pipe and Xenomai patches, the Cobalt Xenomai core is installed.

• The Xenomai library has to be built from sources using the standard Autotools method, i.e. configure, make and make install.

• Check the /usr/ directory of the system: a new directory named xenomai has been created, which confirms the installation was successful.

To be sure that real-time tasks can be created and run in the real-time domain, Xenomai provides a test suite. In the /usr/xenomai/bin directory there are several Xenomai test applications. latency is one such test: it creates a real-time thread and measures the scheduling latency. The latency test was run to verify that Xenomai had been properly set up.


Chapter 4

Development of a custom Real Time Driver for Audio over SPI

The audio driver that is developed in this thesis work is explained in this chapter. The driver specification, architecture and implementation details are explained.

4.1 Audio over SPI driver

Any audio driver needs an underlying bus protocol to send the audio samples over; hence, this is also called a protocol driver. In this thesis work, the audio driver is implemented over SPI, whereas traditionally audio is sent over the I2S protocol, which is designed for sound. Due to the unavailability of an I2S platform driver for the Intel Joule, the design decision was made to switch to SPI; moreover, the XMOS microcontroller also communicates over SPI. Hence, it is an audio-over-SPI driver.

The required specifications of the driver are:

• Multi-channel audio: The driver should support eight input and output chan- nels of audio to the processor (Intel Joule).

• Sampling frequency: the driver should support a sampling frequency of 48 kHz, which is a standard sampling frequency in audio processing systems.

• Latency: the main objective of this driver is to reduce the round-trip latency considerably compared to the ALSA (Advanced Linux Sound Architecture) sound drivers.

• Buffer sizes: the driver should support buffer sizes of 16, 32 and 64 frames. A frame is a set of one audio sample from each channel; hence, a frame contains eight audio samples when there are eight channels in the system.

• Robustness and quality: the driver must be robust enough to handle the high workload and must not drop a single sample of audio.

4.1.1 Protocol between Joule and XMOS

To meet the high quality requirement, the system has to complete the processing and transfer of audio samples strictly within a fixed period of time. As discussed in section 2.2, audio samples are processed in blocks. Each block is called a buffer, and the size of the buffer determines the period at which blocks are transferred to and from the processor.

As in many audio drivers, interrupts are used to notify the processor that a new block of audio samples is ready to be processed. The period at which the interrupt occurs is called the interrupt period, and it depends on the buffer size and the sampling frequency. If TIRQ is the interrupt period, N the number of frames per buffer and FS the sampling frequency, then

TIRQ = N / FS (4.1)

Hence, for a buffer size of 32 frames (N = 32) and a sampling frequency of 48 kHz, TIRQ equals 666.66 µs. Figure 4.1 below illustrates the buffering of audio samples transferred at every interrupt.

Figure 4.1: Communication between Intel Joule and XMOS microcontroller with buffers.

Sequence of events during communication

Figure 4.2 below is the sequence diagram of the whole communication process between the XMOS and the Joule during one interrupt period. One interrupt period is also the time it takes for one block of audio samples to be exchanged between the XMOS and the Joule. Note that the incoming and outgoing blocks are different: a block that goes into the Joule for processing comes out after a small latency.

Figure 4.2: Sequence diagram of the communication protocol between XMOS and Joule.

The latency is caused by buffering. As shown in Fig. 4.1, the XMOS uses just a single buffer, which is possible due to its deterministic execution time, whereas the Joule has two buffers and uses the typical double-buffer ping-pong technique to process audio blocks. Hence, two buffers of latency are introduced on the Joule's side and one buffer's delay on the XMOS side. Since one buffer is transferred per interrupt period, a buffer's delay equals the interrupt period; thus the latency of the audio driver is 3 × TIRQ at any moment.

4.2 Driver Architecture

The architecture of the driver implemented in this thesis is similar to the audio processing system architecture explained in section 2.2. It is mainly inspired by the ALSA architecture: the audio codec device (in this case the XMOS microcontroller) interrupts the processor through a GPIO line whenever data is ready to be exchanged. Figure 4.3 shows the architecture, with the different layers of software that help process audio in the real-time domain.

Figure 4.3: The Audio over SPI driver architecture.

On top of the hardware, the Linux kernel has platform drivers, which are vendor-specific drivers that plug into the generic subsystems of Linux. A similar approach can be taken in the Xenomai RTDM framework for developing real-time drivers: the platform drivers register themselves during their probe at initialization, which is not a problem for real-time drivers, and RTDM drivers can still use the platform driver code beneath them. Hence, the SPI platform driver is modified according to the needs of the real-time domain. On top of the RTDM driver sits the RASPA library (RTDM Audio-over-SPI API), a user-space library that handles all interaction with the audio driver. It takes care of buffer mapping, conversion to floating-point data and, most importantly, the process() callback function.

4.2.1 Chained GPIO interrupts

The GPIO line used as the hardware interrupt is not a true interrupt line: it is connected to the interrupt controller of the Intel x86 architecture through what is known as a chained GPIO irqchip. The GPIO driver provides interrupts through a cascaded connection to the parent interrupt controller, registering the GPIO chip as an irqchip, which is made possible by the irq subsystem of Linux. Essentially, the interrupt handling uses two subsystems, irq and gpio. Chained irqchips are usually found on SoCs: a fast IRQ flow handler is called by the parent interrupt controller, which then calls the GPIO irq handler. Hence, there is a propagation delay, causing the interrupt latency to be higher than expected for a Xenomai kernel.

4.2.2 Hardware controllers involved

The hardware controllers used by this driver are the PXA2XX SPI controller, IDMA64 (Intel DMA) and Intel-Pinctrl (pin control). The SPI controller uses the on-chip DMA device for transferring data from memory to the device and vice versa. Intel-pinctrl is the platform driver for the GPIO lines.

4.3 Driver Implementation

The RTDM driver is implemented as a kernel module that can be loaded dynamically. This is very similar to writing a Linux device driver as a kernel module; the RTDM framework provides interfaces to register an RTDM device and the corresponding driver handlers.

4.3.1 Writing the RTDM driver

As a reference, the article The Real-Time Driver Model and First Applications by Jan Kiszka [5] was used as a starting point. The process of writing the RTDM driver was divided into the following steps:

• Developing a simple kernel module: any kernel module calls its __init() function when loaded and its __exit() function when removed. A simple kernel module was implemented to get familiar with the development and compilation of kernel modules.

• Implementing SPI communication from Kernel space: Linux provides a lot of generic subsystems such as GPIO, SPI, IRQ, etc. Using the SPI subsystem, the kernel module was improved to transfer SPI data on one of the ports from kernel space.

• Setting up a GPIO interrupt to trigger the SPI transfer: a real-time GPIO interrupt was set up using rtdm_irq_request(). This was the most crucial part of the driver; the transfer of audio data over SPI is handled when this interrupt is triggered.

• Allocating buffers: to handle the SPI transfers, buffers suitable for DMA operations are allocated using the interface dma_zalloc_coherent(). It allocates cache-coherent memory and hence avoids the cache coherency problems that arise from the use of DMA. Along with the buffers, SPI-related data structures such as spi_message and spi_transfer are also allocated.

• Implementing the device handlers: in this step all the required device handlers were identified and implemented: open() sets up the real-time interrupt that handles transfers, so the driver essentially starts functioning when the RTDM device is opened. The close() handler performs all the clean-up activities (e.g. freeing the memory used by the driver). An ioctl() is implemented to handle signaling the interrupt to user space. Finally, an mmap() handler is implemented to map the DMA buffers, which are in kernel address space, to user space.

• Defining the RTDM driver and device: the RTDM driver has to be described with its handlers, which are set in the fields of struct rtdm_driver. The driver then has to be associated with a device via struct rtdm_device; in this way the rtdm_driver and the rtdm_device are connected.

• Registering the RTDM device: after defining the driver and the device in the previous step, the device is registered using the RTDM interface rtdm_dev_register(). The registered device appears as a device file in the /dev/rtdm directory of the Linux file system, and any interaction with the RTDM driver goes through this device file.
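The definition and registration steps above can be sketched as a minimal RTDM kernel module. This is based on the Xenomai 3 RTDM API; the audio_spi names and the empty handler bodies are placeholders, not the actual driver code.

```c
#include <linux/module.h>
#include <rtdm/driver.h>

/* Placeholder handlers -- the real driver wires these to the
 * interrupt setup, clean-up, signaling and buffer mapping. */
static int audio_spi_open(struct rtdm_fd *fd, int oflags) { return 0; }
static void audio_spi_close(struct rtdm_fd *fd) { }
static int audio_spi_ioctl(struct rtdm_fd *fd, unsigned int request,
                           void __user *arg) { return 0; }

static struct rtdm_driver audio_spi_driver = {
    .profile_info = RTDM_PROFILE_INFO(audio_spi,
                                      RTDM_CLASS_EXPERIMENTAL, 0, 1),
    .device_flags = RTDM_NAMED_DEVICE,
    .device_count = 1,
    .context_size = 0,
    .ops = {
        .open     = audio_spi_open,
        .close    = audio_spi_close,
        .ioctl_rt = audio_spi_ioctl,
    },
};

static struct rtdm_device audio_spi_device = {
    .driver = &audio_spi_driver,
    .label  = "audio_spi",   /* appears as /dev/rtdm/audio_spi */
};

static int __init audio_spi_init(void)
{
    return rtdm_dev_register(&audio_spi_device);
}

static void __exit audio_spi_exit(void)
{
    rtdm_dev_unregister(&audio_spi_device);
}

module_init(audio_spi_init);
module_exit(audio_spi_exit);
MODULE_LICENSE("GPL");
```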

4.3.2 Modifications to platform drivers

The platform drivers provide all the necessary access to the hardware, and developing them from scratch for the RTDM driver would not be worth the effort. However, platform drivers cannot be used freely in the real-time domain: anything related to Linux kernel services must be avoided there. Hardware abstraction layers are not as dangerous as kernel services such as scheduler services, dynamic memory management, spinlocks, etc. Hence, everything related to dynamic memory management, scheduling, etc. is replaced with its RTDM equivalent and, if no equivalent is present in the Xenomai framework, avoided.

The Linux platform drivers that were adapted for the real-time domain are explained below:

• GPIO IRQ: when a Linux kernel is patched with I-pipe, the device drivers must also be modified so that they are aware of the abstraction layer. Making the intel-pinctrl platform driver aware of I-pipe was one of the major difficulties of this thesis project. I-pipe treats every interrupt as a level-sensitive interrupt but the drivers may not, and this mismatch causes a massive IRQ storm whenever an IRQ is requested through request_irq() for the real-time domain. The mismatch in interrupt management was solved by writing a chained interrupt flow handler that acknowledges interrupts as is done with level-sensitive interrupts. The interrupt line in the pinctrl driver, requested using devm_request_irq(), is changed to handle interrupts as chained interrupts, and a new flow handler is added which calls the irq handler of pinctrl. Another major change in pinctrl is that the irq handler invocation is done using ipipe_handle_demuxed_irq() instead of generic_handle_irq() as before. All the spinlocks are changed to I-pipe spinlocks, as Linux spinlocks cannot guarantee the safety of critical sections when running in the real-time domain.

• SPI controller: the SPI driver is very well written for the Linux architecture. Linux has a generic in-kernel subsystem for SPI interfaces, usually used for writing protocol drivers such as the one developed in this thesis. Many of its APIs use Linux services, but the hardware-related services can be reused; hence, all the scheduler-related code is removed, and a lot of code from the Linux subsystem was moved into the real-time SPI driver used for the audio transfer. A major change is that the spi_async() interface from the Linux SPI subsystem is not used, completely bypassing the SPI subsystem's mechanism for transferring SPI data. The SPI interrupt used to trigger transfers in PIO (programmed I/O) mode is disabled, as it is not necessary. Since the SPI controller uses coherent DMA memory, the SPI buffers are mapped once and reused every time the transfer interrupt is invoked; in this manner a lot of dynamic memory allocation and freeing can be avoided in the real-time domain. The SPI controller uses DMA to send and receive data from main memory, and hence the DMA controller driver was also adapted for operating in the real-time domain.

• DMA controller: the DMA never comes directly into the picture, but the SPI driver uses it to transfer bulk data. Hence, the DMA controller driver for an Intel x86 machine was also modified. The main issues to solve for the real-time domain were spinlocks, scheduling-related issues and dynamic memory allocation using kmalloc. Dynamic allocations were removed and static memory was allocated for the DMA descriptors. The DMA driver also interfaces with a generic Linux layer, the DMA Engine; hence, a lot of the IDMA64 driver services had to be changed to make the functions real-time safe (tasklets, kthreads and work queues are not real-time safe and must be avoided), and many different approaches were tried to make the platform driver real-time safe. Finally, a reliable solution was implemented to remove the non-real-time parts of the driver. It is not a clean solution, as it depends on the application and the environment in which the kernel works.

4.3.3 User-space wrapper

When the driver was ready for testing, a wrapper for handling the device-related activities was developed as part of the thesis. Basic device I/O calls such as open(), close() and ioctl() are wrapped in a more user-friendly API for developers in user space. In addition to the basic handlers, a library for audio processing systems called RASPA (the RTDM audio-over-SPI API) was also developed on top of the driver.

Chapter 5

Benchmarking

This chapter covers the results of the experiments done in this thesis work. Important metrics are identified for analyzing the performance of the real-time audio driver. Besides the main objective of reducing the round-trip latency of audio processing on a real-time Linux kernel, the real-time capability of Linux itself is also studied. The following sections explain the method followed to obtain the results; the benchmarks are analyzed and compared with those of standard Linux.

The benchmarking methodology is based on empirical data obtained through experiments; quantitative analysis is done with an appropriate representation of the data, and the results are obtained through physical oscilloscopes and through automation. Since it is very difficult to conclude that a real-time system absolutely meets its timing requirements, a large sample of measurements is taken to study the performance. The chosen benchmarking metrics are discussed in the next sections together with the observed results.

5.1 Benchmarking metrics

5.1.1 Round-trip latency

Round-trip latency was explained in section 2.2. This is the main metric, and the one that matters most to any company developing an audio processing system. The goal of this thesis from the beginning was to try the dual-kernel Linux approach and see how low the latency values could be.

Method
Round-trip latency can be measured in two ways: with an oscilloscope and a triggering signal, or with the Jack_iodelay latency measurement tool. In this thesis, both methods are used to confirm the round-trip latency. In the oscilloscope method, a trigger signal, a simple sinusoidal wave, is passed through the input channel and received at the output

Table 5.1: Round-trip latency comparison (ms).

Buffer size    Dual-kernel Linux real-time audio    USB audio interface (Focusrite)
16 samples     1.382                                6.348
32 samples     2.111                                8.473
64 samples     4.51                                 13.556

channel. An oscilloscope probe is connected to each of the input and output channels; the input channel probe is used as the trigger, and the oscilloscope takes a snapshot at the trigger. The time difference between the input and output channel signals captured by the oscilloscope is the round-trip latency of the system. In addition to this method, Jack_iodelay is used to obtain the round-trip latency. It creates an input port and an output port, sends a signal and measures the signal delay. For this method, the input channel and output channel must be connected to each other.
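As a rough sanity check on the measured values, a simplified model, an assumption of this sketch rather than a formula from the thesis, puts the lower bound on round-trip latency at two buffer periods (input buffering plus output buffering), at an assumed 48 kHz sample rate:

```c
/* Simplified lower-bound model (an assumption, not the thesis' formula):
 * one buffer period of input buffering plus one of output buffering, at
 * an assumed 48 kHz sample rate. Measured values in Table 5.1 sit above
 * this bound because SPI transfer, driver scheduling and converter
 * delays add on top. */
double min_round_trip_ms(int buffer_samples, double sample_rate_hz)
{
    double buffer_period_ms = 1000.0 * buffer_samples / sample_rate_hz;
    return 2.0 * buffer_period_ms;  /* input buffer + output buffer */
}

/* e.g. 32 samples at an assumed 48 kHz gives a bound of about 1.33 ms,
 * against the 2.111 ms measured for that buffer size */
```

The gap between the bound and the measured figure is the overhead the driver and hardware add per buffer.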

Summary of results
Table 5.1 shows the round-trip latency values of the real-time audio driver that was developed. The driver supports buffer sizes of 16, 32 and 64 samples, so the latency values are given for these three buffer sizes. The observed latency values are better than those of any general-purpose operating system's audio driver by a large margin. Round-trip latencies as low as 2.1 ms (at the 32-sample buffer size) were observed, better than the latency of a market-leading USB audio interface used in recording studios (Focusrite). The round-trip latency results show that an external device or interface is no longer needed for low-latency audio on Linux; the RTDM audio driver performs even better than the USB audio interface.

5.1.2 Scheduling latency
Scheduling latency is the time taken by the kernel to schedule a user-space task waiting on an event. An event can be the timer tick for the scheduler or a signal on which a task is waiting to be woken up. This metric measures the interrupt handling and scheduling mechanism. In a real-time OS it is extremely important to dispatch a task promptly when its event occurs.

Method
The scheduling latency is measured with a Xenomai test application known as Latency, which can be run on a dual-kernel system or on a normal single-kernel system.

Latency creates a high-priority thread that is scheduled periodically through a timer event. The difference between the time stamps at the event occurrence and at the start of the user task's execution is measured and recorded. The Latency test executes continuously in batches of a fixed number of timer invocations, and for each batch it collects the best-case, average-case and worst-case latency values. The test was run for more than two hours, collecting more than a million samples. For real-time systems the worst-case latency is what matters, because it guarantees that a task will be scheduled within a certain duration of time. Hence, the worst-case latencies are plotted in Figure 5.1 and Figure 5.2.

Summary of results
The results obtained from the latency test clearly show that Xenomai-enabled Linux performs much better in terms of scheduling latency. Figure 5.1 shows the histogram of scheduling latencies observed on the normal Linux kernel. The worst-case scheduling latency can be as high as 99 µs, whereas on Xenomai-enabled Linux it stays as low as 12 µs. Table 5.2 shows the worst-case, average-case and best-case scheduling latencies. It is interesting to note that the range of possible latencies (the difference between the worst-case and best-case latency) is much larger on a normal Linux kernel than on Xenomai-enabled Linux.

Figure 5.1: Histogram of worst case scheduling latencies in standard Linux.

Table 5.2: Scheduling latencies for single and dual kernel Linux.

Kernel                          Worst case scheduling latency (µs)    Average case scheduling latency (µs)    Best case scheduling latency (µs)
Single-kernel Linux             99                                    15                                      8
Dual-kernel Linux (Xenomai)     12                                    4                                       2

Figure 5.2: Histogram of worst case scheduling latencies in dual kernel Linux.

5.1.3 Driver interrupt servicing latency
Since this study also concerns the real-time capability of Linux under the dual-kernel approach, the interrupt response latency of both kernels is compared.

Method
The method employed is a long run of automated interrupt triggers. A kernel module was written whose interrupt handler responds to the interrupt simply by raising a GPIO line on the Intel Joule, and the trigger is generated by the XMOS. The response is also captured by the XMOS, since it is a highly accurate and deterministic microcontroller: it triggers, waits for the response, and measures and stores the time taken by the Intel Joule to respond. A total of one million interrupts were triggered to plot the following histograms of interrupt service latency. The histograms indicate the probability of each response time.
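The test module described above might look roughly like the following kernel-side pseudocode. The RTDM calls follow the documented API, but this is not buildable standalone, and RESPONSE_GPIO and TRIGGER_IRQ are placeholders for the actual Joule pin and interrupt numbers.

```c
/* Kernel-side pseudocode (not buildable standalone). rtdm_irq_request()
 * and RTDM_IRQ_HANDLED follow the documented RTDM API; RESPONSE_GPIO
 * and TRIGGER_IRQ are placeholders. */
static rtdm_irq_t irq_handle;

static int xmos_irq_handler(rtdm_irq_t *irq)
{
    gpio_set_value(RESPONSE_GPIO, 1);   /* XMOS timestamps this edge */
    gpio_set_value(RESPONSE_GPIO, 0);
    return RTDM_IRQ_HANDLED;
}

/* at module init: attach the handler in the real-time domain */
rtdm_irq_request(&irq_handle, TRIGGER_IRQ, xmos_irq_handler,
                 0, "xmos-latency-test", NULL);
```

Because the handler is registered through RTDM, it runs in the real-time domain; registering the same handler through request_irq() instead gives the normal-Linux numbers that the two histograms compare.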

Summary of results
Driver interrupt latency results are shown in Figure 5.3 and Figure 5.4 for the normal Linux kernel and the Xenomai-enabled Linux kernel respectively. The results show that the driver using the Xenomai interface for interrupt handling has lower latency than normal Linux interrupt handling. The Xenomai version has an average latency of around 55 µs, whereas the normal Linux interrupt service latency is around 77 µs. When the worst-case latencies are studied, the normal Linux driver performs poorly: its latencies can be as high as 110 µs, which certainly undermines the goal of achieving low-latency audio processing on Linux. The figures depict the probability histogram over the range of possible latencies. As in the case of scheduling latencies, the range of latencies on Xenomai-enabled Linux is much narrower than that observed on normal Linux.

Figure 5.3: Probability of interrupt servicing latency in normal Linux kernel.

Figure 5.4: Probability of interrupt servicing latency in Xenomai enabled Linux.

5.1.4 CPU usage
As mentioned in Section 2.5, latency is directly proportional to the buffer size, and the buffer size is directly proportional to the interrupt period. If the buffer size is reduced, interrupts must be serviced more often, causing higher CPU usage. CPU usage is crucial because other non-real-time background tasks run on the same system. Hence, CPU usage is a major metric in rating an audio processing system.

Table 5.3: CPU usage comparison between Xenomai Linux and normal Linux.

Buffer size    Xenomai    Normal Linux
16 samples     9.6%       NA
32 samples     4.85%      NA
64 samples     2.5%       25%
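The proportionality described above can be illustrated with simple arithmetic, assuming a 48 kHz sample rate (the text does not state the rate explicitly):

```c
/* Illustrative arithmetic only; a 48 kHz sample rate is assumed here,
 * since the text does not state the rate explicitly. */
double interrupt_period_ms(int buffer_samples, double sample_rate_hz)
{
    /* one interrupt per buffer: the period equals the buffer duration */
    return 1000.0 * buffer_samples / sample_rate_hz;
}

double interrupts_per_second(int buffer_samples, double sample_rate_hz)
{
    return sample_rate_hz / buffer_samples;
}

/* 64 samples: ~1.33 ms period, 750 interrupts/s;
 * 16 samples: ~0.33 ms period, 3000 interrupts/s: four times the
 * servicing work, consistent with the rise in CPU usage in Table 5.3. */
```

Quartering the buffer size quadruples the interrupt rate, which is why the 16-sample configuration costs roughly four times the CPU of the 64-sample one.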

Method
The Xenomai framework exposes real-time domain information through the /proc pseudo-filesystem in a directory named xenomai. Since the real-time and non-real-time domains run independently, the top command is not useful for monitoring real-time processes or interrupts. Instead, Xenomai provides files in the /proc/xenomai directory to monitor CPU usage and other process-related run-time information.
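An illustrative session is shown below; the exact file names under /proc/xenomai vary between Xenomai versions, so these paths are examples rather than a guaranteed layout.

```
cat /proc/xenomai/sched/stat   # per-thread CPU share in the real-time domain
cat /proc/xenomai/irq          # interrupt counts handled by the co-kernel
```

Sampling these files periodically gives the usage figures that top cannot see, since the co-kernel's threads are invisible to the normal Linux scheduler accounting.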

The CPU usage of an audio processing task running on Xenomai-enabled Linux and on normal Linux was compared. The two systems have almost the same computational power, with the normal Linux machine being slightly more powerful than the one running Xenomai-enabled Linux (an Intel Joule with a quad-core processor at 1.7 GHz versus an Intel Core i7 at 2 GHz).

Summary of results
The results for the 16-sample and 32-sample buffer sizes on normal Linux could not be obtained, as the Jack audio server complained about buffers not being processed at the rate of the interrupts. At the 64-sample buffer size, however, normal Linux used approximately 25% of the CPU, compared to 2.5% on the Xenomai-enabled kernel. Similarly, the CPU usage for 16 and 32 samples was much lower than typical values on a personal computer, with a maximum usage of 9.6% at the 16-sample buffer size. These results also show that Xenomai does not waste CPU time in obtaining low-latency audio.

Chapter 6

Conclusion

This chapter presents the conclusions that can be derived from this thesis work. It summarizes the work done, highlights some of the key observations, and gives the overall outlook of the project. Finally, the future work that builds on this thesis is outlined.

6.1 Summary and outlook

All the goals that were set at the start of this thesis work were met and tested. Although there were some doubts about the approach at the beginning of the project, the design decisions were made in response to the circumstances faced at different stages. Running hard real-time tasks in user space alongside the non-real-time tasks of Linux is entirely possible; this was a major limitation of earlier real-time frameworks for Linux (e.g. RTAI).

An incremental approach was used to complete the thesis: the Linux kernel was built with the Yocto build system, a simple RTDM driver was developed, and finally the whole audio-over-SPI driver was designed and implemented using the RTDM framework.

In the process of developing the RTDM driver, many issues surfaced, mainly with the handling of interrupts in the drivers of the devices used by the audio driver. This required studying the existing Linux drivers extensively and adapting them accordingly, which is the main drawback of the Xenomai framework: drivers must be real-time safe to run in the real-time domain, meaning that any non-real-time Linux service used by the real-time driver and its lower layers has to be abandoned.

This thesis provides substantial proof that very low-latency audio on Linux is possible. Compared to the existing audio latency figures of professional studio-grade audio interfaces, this approach provides much better results. It is shown that low-latency audio can be obtained without any compromise on the quality of service and with different buffer size configurations.

This thesis also confirms the real-time performance of the Xenomai framework for real-time Linux. The interrupt latency results were observed to be much better on Xenomai-enabled Linux. The overall responsiveness of the system is improved with Xenomai, providing evidence that Xenomai can be used for hard real-time applications.

6.2 Future Work

6.2.1 Asynchronous driver
Building on this thesis work, future solutions for low-latency audio can be implemented with a different communication approach between the Intel Joule and the XMOS microcontroller. Currently, the XMOS informs the processor through a GPIO interrupt, which keeps the implementation easy and simple. However, the GPIO line introduces some unwarranted latency before the interrupt handler starts, mainly because of the interrupt flow handling mechanism of the intel-pinctrl driver. If the processor instead initiates the audio transfer with the XMOS using a timer interrupt, the interrupt latency due to the GPIO line can be removed, thus improving the data transfer bandwidth.
