A Portable Real Time Threads Environment

by

Roland Mechler

B.Sc., University of British Columbia, 1992

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE STUDIES

(Department of Computer Science)

We accept this thesis as conforming to the required standard

The University of British Columbia

April 1997

© Roland Mechler, 1997

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science

The University of British Columbia
Vancouver, Canada

Date: April 10, 1997

Abstract

Real Time Threads (RT Threads) is a threads package which provides real time scheduling semantics, application portability to a variety of host operating systems and hardware architectures, and support for distributed application programming via location transparent message passing between threads in distinct address spaces.

RT Threads was initially designed as a programming environment for the development of the Continuous Media File System (CMFS), a distributed multimedia application requiring real time scheduling of asynchronous tasks. In particular, real time scheduling is required for timely disk reading and network transmission of continuous media data at the server nodes, as well as for synchronization of multiple media streams at clients. The goal of this thesis is to show that RT Threads has performance characteristics making it practical for use both as a general purpose threads package, and as a platform for developing applications requiring real time scheduling. In fact, it will be shown that RT Threads performs acceptably well (for non-critical applications), even in a non-real time environment.

Contents

Abstract ii

Contents iii

List of Tables vi

List of Figures viii

Acknowledgements ix

1 Introduction 1

2 Real Time Threads Overview 6

2.1 Real Time Scheduling 7

2.2 Communication and Synchronization 8

2.3 External I/O 11

3 Application Programming 12

3.1 Types and Constants 14

3.2 Thread Management 16

3.2.1 Thread Creation 16

3.2.2 Thread Termination 17

3.2.3 Thread Scheduling 18

3.2.4 Miscellaneous 18

3.3 Synchronization 19

3.4 Communication 22

3.4.1 Intra-Address Space Communication 22

3.4.2 Inter-Address Space Communication 24

3.5 Dynamic Memory Allocation 25

3.6 Miscellaneous 26

3.7 External I/O 27

4 Implementation 29

4.1 Data Structures 30

4.2 System Startup 33

4.3 Thread Management 35

4.4 Queue Management 40

4.5 Thread Scheduling 43

4.5.1 System Clock 44

4.5.2 Context Switching 47

4.5.3 Modifying Scheduling Attributes 50

4.6 Semaphores 51

4.7 Inter Thread Communication 55

4.7.1 Cross Address Space ITC 60

4.8 Dynamic Memory Allocation 66

4.9 External I/O 66

4.10 Mapped Implementation 70

4.11 Native Microkernel Implementation 72

5 Performance 73

5.1 Functional Overheads 74

5.1.1 Thread Creation 75

5.1.2 Semaphore Overhead 76

5.1.3 Context Switching 77

5.2 Inter Thread Communication 78

5.3 Real Time Performance 79

6 Related Work 94

6.1 POSIX 96

6.2 Real Time Support for Multimedia Applications 97

6.3 Inter Thread Communication 98

7 Additional Work 101

8 Conclusions 102

Bibliography 104

Appendix A Example Program 107

List of Tables

3.1 RT Threads Interface 13

3.2 Interface to I/O System Calls 28

5.1 Host operating systems and architectures 74

5.2 Thread creation times in microseconds 76

5.3 Context switch times in microseconds 78

5.4 Local send/receive/reply times 79

5.5 Cross address space send/receive/reply times 80

5.6 Base Test: No background threads, no background processes 82

5.7 1 background thread, no background processes 83

5.8 No background threads, 1 background process 84

5.9 1 background thread, 1 background process 85

5.10 No background threads, 2 background processes 86

5.11 1 background thread, 2 background processes 86

5.12 No background threads, 1 background compile 87

5.13 1 background thread, 1 background compile 88

5.14 2 RT Threads environments 89

5.15 2 RT Threads environments, 1 with background thread 89

5.16 2 RT Threads environments, both with background thread 90

5.17 One RT Threads environment with 100 threads 91

5.18 One RT Threads environment with 200 threads 91

5.19 One RT Threads environment with 300 threads 92

5.20 One RT Threads environment with 300 threads, measured in kernel 92

5.21 Multimedia client scheduling performance 93

List of Figures

2.1 ITC via Message Passing 8

2.2 Client/Server Communication 10

4.1 RttThreadId 33

Acknowledgements

First and foremost, I wish to thank my supervisor, Dr. Gerald Neufeld, for the tremendous support, guidance, and encouragement he has given me, for which I will be forever grateful. I would also like to thank Dr. Norman Hutchinson, for his invaluable generosity in lending advice and opinion. Many thanks to David Finkelstein and Dwight Makaroff, who were both closely involved in the implementation and testing of RT Threads, and with whom it has been a pleasure to work. Acknowledgement and thanks go as well to Murray Goldberg, whose Pthreads preemptive threads package was the basis for this implementation, and to Ann Lo, whose thesis work on real time scheduling was incorporated into RT Threads. Finally, I thank my parents for their patience, their love, and for always standing behind me. None of this would have been possible without them.

ROLAND MECHLER

The University of British Columbia

April 1997

Chapter 1

Introduction

This thesis is concerned with the design, implementation and performance of Real Time Threads (RT Threads). RT Threads is a threads package which supports distributed application programming and application portability across operating systems and machine architectures. The first motivation for RT Threads was the desire for a portable multi-threaded programming platform on which to build distributed multimedia applications in a heterogeneous computer network environment. Another motivating factor was that a real time scheduling policy (with a well-defined scheduling behaviour) was considered best suited to the time critical nature of multimedia applications, particularly with respect to the synchronization of multiple media objects and the timely delivery of continuous media data.

RT Threads has been ported to several versions of UNIX (AIX, SunOS 4.1.3, Solaris, Linux, FreeBSD), Microsoft's Win32 platform (Windows NT, 95), as well as the Kea microkernel platform [30]. Machine architectures supported are Sun SPARC, IBM PowerPC, and Intel x86/Pentium. All of the implementations are at the user level, except the Kea version which is implemented in the kernel.

Threads packages are often closely associated with a particular operating system or software development platform. A key problem with porting threaded applications among various threads packages (as a means of porting between platforms) is that these packages do not all have a common set of functions and policies.

For example, AIX Pthreads [1] does not support semaphores in its interface. Win32 Threads, on the other hand, has only five priority levels and does not support real time scheduling of threads¹. Defining a set of thread functions and policies with a single interface, and supporting that interface on various platforms was chosen as a way to make the applications easily portable.

General purpose threads packages tend to use a time sliced round robin scheduling policy among ready threads at the highest priority level which has ready threads. Often the priorities are not fixed, i.e., the system may temporarily modify priorities. The advantage of time slicing is that the application is not responsible for fairness among threads of equal priority. The advantage of non-fixed priorities is that the system has some control of fairness among threads in different processes (i.e., to prevent a high priority thread in one process from starving other processes of CPU resources), since threads have a global scheduling scope in many of these systems. The disadvantage of both policies is that they take control of scheduling away from the application.

¹A time sliced round robin policy is the only one supported.

RT Threads, on the other hand, uses a fixed priority scheduling policy with no time slicing. Threads run to completion unless preempted by another thread of higher priority or equal priority and earlier deadline. The lack of time slicing means that the application has greater control over scheduling of threads, but also that the application itself is responsible for fairness among its threads. This can be thought of as a cooperative threads environment. An RT Threads application can in fact implement its own time slicing, if this is desired. Starvation of system processes is not an issue since RT Threads has a per process scheduling scope for threads (although operating in a multi-process environment may have consequences for the real time response of RT Threads).

Several aspects of the design of UBC's distributed Continuous Media File Server (CMFS) [19, 18, 20] require real time scheduling. At the server, threads must be scheduled to read data from disk and to submit data for network transmission according to strict time constraints, in order to guarantee timely delivery of continuous media streams to clients. At the clients, real time scheduling is used to synchronize multiple media streams, e.g., video, audio and closed captioning text. Synchronization is achieved by scheduling real time threads to submit the data for each stream to the appropriate display device at the correct time.

RT Threads does not provide hard real time guarantees. Its scheduling algorithm is optimal, meaning it will produce a feasible schedule if such a schedule exists [12], but no dynamic schedulability analysis is done and threads which pass their deadline continue executing until completion. Hard real time scheduling was not implemented for RT Threads because such guarantees are not possible when porting to non real time operating systems such as UNIX and Windows NT, and because the overhead of dynamic schedulability analysis would likely be self defeating. RT Threads was ported to a microkernel environment in an effort to be able to provide real time guarantees for the CMFS server. Results from [2] showed that implementing a threads package on a bare machine provides a high degree of timing predictability as compared to implementation on top of UNIX. However, our results show that even in a non-real time general purpose operating system like UNIX or Windows NT, reasonably accurate results are achieved for timing predictability, as long as the load from other non-real time processes is relatively light. Being able to support applications with soft real time requirements, such as multimedia display clients, on widely used general purpose platforms has obvious advantages over requiring special purpose operating system environments.

An alternative to implementing a new threads package for portability would have been to use a standard threads interface, i.e., POSIX threads (Pthreads). Unfortunately, the POSIX interface does not provide the functionality required for the target real time applications. In particular, the specification does not provide for deadline scheduling, the ability for one thread to suspend another thread or put another thread to sleep, or support for distributed programming by allowing direct communication between threads in distinct address spaces via message passing. Also, POSIX does not specify a level of scheduling control considered important for the target applications. For example, the interface does not specify that a thread which signals a higher priority thread blocked on a condition variable² should be preempted by that thread immediately.

²Pthreads uses the condition variable as its basic synchronization primitive, whereas RT Threads uses semaphores.

Thesis statement: It is possible to implement a real time threads environment on a general purpose operating system, such that acceptable results for real time scheduling will be achieved even under moderate system loads.

The remainder of this thesis is organized as follows. An overview of RT Threads is given in chapter 2. Chapter 3 describes the application programming interface. Chapter 4 describes how RT Threads was implemented. Performance results are given in chapter 5. Related work and conclusions follow in chapters 6 and 8.

Chapter 2

Real Time Threads Overview

RT Threads has the following key features:

• A real time scheduling policy based on the starting time, priority and deadline attributes of threads

• Application modification of thread scheduling attributes

• Thread creation/destruction

• Thread synchronization (semaphores)

• Inter-thread communication (ITC)

• Globally unique (on the Internet) thread identifiers, allowing transparent message passing between address spaces (and machines)

• Asynchronous I/O (blocking I/O calls block only the calling thread)

2.1 Real Time Scheduling

Real time scheduling in RT Threads is defined by three scheduling attributes: starting time, priority, and deadline. Starting times and deadlines are absolute time values (measured in seconds and microseconds), and priorities are integer values from 0 to 31, where 0 is the highest priority. No thread is made ready before its starting time. The currently executing thread is called the running thread. Threads are scheduled first by priority, i.e., threads are preempted by higher priority ready threads. Threads of equal priority are scheduled according to the Earliest Deadline First (EDF) rule [13], which states that threads are to be scheduled in order of non-decreasing deadlines. Threads of equal priority and equal deadline are scheduled on a first come first served basis. Within a given priority level, threads with no deadline are scheduled after threads with deadlines. A running thread will run to completion unless preempted by another thread which becomes ready and has higher precedence according to the above algorithm. Such a thread can become ready in one of the following ways:

• Its starting time has been reached (i.e., the thread was sleeping).

• It has become unblocked due to completion of I/O.

• It has been signaled by the running thread (i.e., it was blocked on a semaphore or an ITC call).

• Its scheduling attributes have been explicitly modified by the running thread to give it higher precedence.

• It has been created by the running thread, with scheduling attributes giving it higher precedence.
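The precedence rules above can be collapsed into a single comparison. The following C fragment is an illustrative sketch, not code from the RT Threads source; the SchedKey type stands in for a thread's scheduling attributes, with hasDeadline set to 0 to represent a thread scheduled with no deadline (RTTNODEADLINE in chapter 3). It returns 1 exactly when thread a should run before thread b.

/* Illustrative sketch of the scheduling precedence rule. */
typedef struct {
    int priority;           /* 0 is the highest priority */
    int hasDeadline;        /* 0 means no deadline (RTTNODEADLINE) */
    long deadlineSec;       /* absolute deadline, seconds part */
    long deadlineUsec;      /* absolute deadline, microseconds part */
} SchedKey;

static int runsBefore(const SchedKey *a, const SchedKey *b)
{
    if (a->priority != b->priority)        /* 1: fixed priority first */
        return a->priority < b->priority;
    if (a->hasDeadline != b->hasDeadline)  /* 2: deadlines precede none */
        return a->hasDeadline;
    if (!a->hasDeadline)                   /* 3: neither has a deadline: */
        return 0;                          /*    first come first served */
    if (a->deadlineSec != b->deadlineSec)  /* 4: earliest deadline first */
        return a->deadlineSec < b->deadlineSec;
    return a->deadlineUsec < b->deadlineUsec;  /* equal: FCFS */
}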

Figure 2.1: ITC via Message Passing

2.2 Thread Communication and Synchronization

RT Threads provides two main mechanisms for inter-thread communication (ITC): shared memory and message passing. All threads within an RT Threads environment share a common address space. RT Threads provides semaphores to synchronize access to this shared memory. Semaphores are standard counting semaphores. A semaphore has a value which is decremented each time RttP() is called and incremented each time RttV() is called. Calling RttP() when the semaphore value is zero or negative will cause the calling thread to block. Calling RttV() when the semaphore value is negative will cause one blocked thread to become unblocked. A semaphore's initial value is specified when it is created with RttAllocSem(). Semaphores can be created in one of two modes. In first come first served mode, threads become unblocked in the same order in which they blocked, regardless of priority. In priority mode, threads become unblocked according to the same priority and deadline based algorithm used by RT Threads to schedule ready threads.

Message passing uses a send/receive/reply model. RttSend() sends a message (request) to a specified thread and blocks until it receives a reply. RttReceive() receives a message from any thread, blocking if there are no messages waiting for the calling thread. RttReply() sends a reply to a specified thread, which must be waiting for a reply, and does not block. Figure 2.1 shows the temporal relationship and blocking behaviour of these calls. Thread A and Thread B can be in the same address space or in different address spaces. Messages of arbitrary length can be sent using this mechanism. RT Threads' message passing facilities provide reliable communication between threads in distinct address spaces (and distinct machines), so as to provide support for programming distributed applications. These facilities are similar to the distributed inter-process communication (IPC) provided by the V Kernel [4]. RT Threads provides location transparency for message passing ITC using thread ids that are globally unique across the Internet. This is accomplished by embedding an IP address and a local port number in the identifier.

Each RT Threads environment (i.e., local address space) is identified by an IP address and port number, specified using the RttNetInit() call. An RT Threads environment can create one special name server thread using the RttCreateNS() call. The thread id for the name server thread in a remote RT Threads environment can be obtained using the RttGetNS() call, which takes the IP address and port number of the remote environment as parameters. RttGetNS() does not perform any network communication; the IP address and port number provide enough information to uniquely identify the name server thread in a remote RT Threads environment. The name server thread simply provides a mechanism by which each RT Threads environment can create an easily locatable thread. Whether this mechanism is actually used to implement a name service, with which local threads can register, is left up to the application.

Figure 2.2: Client/Server Communication

Cross address space ITC provides a convenient mechanism for implementing RPC style client/server transactions. Figure 2.2 shows an example of how a distributed client/server application might be designed using ITC. In the figure, the server consists of an administrator thread which creates local worker threads to handle client requests. The numbers in the figure indicate the order of events in the handling of a single request from a remote client. Note that administrator/worker communication is arranged such that the administrator never does an RttSend() because RttSend() blocks and the administrator must always be available to receive requests.

2.3 External I/O

RT Threads provides an I/O interface which is essentially a subset of the UNIX interface. Operations for file and socket I/O are supported. I/O calls which block will block only the calling thread. When a thread's I/O is complete it is immediately made ready.

Chapter 3

Application Programming Interface

An RT Threads application begins execution at the first statement of the routine mainp(). mainp() is passed the environment arguments argc and argv[]. mainp() is itself a thread and has the highest possible priority and earliest possible deadline (it cannot be preempted).
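As a minimal sketch, an application's entry point might look as follows; the header name rtthreads.h and the void return type of mainp() are assumptions, not taken from the thesis:

#include "rtthreads.h"   /* assumed header name for the interface below */

void mainp(int argc, char *argv[])
{
    /* parse argv, create the application's initial threads, then
     * return; the environment runs until no RTTUSR threads remain */
    (void)argc;
    (void)argv;
}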

Table 3.1 shows the operations which make up the RT Threads interface. All RT Threads interface routines have an int return value, returning either RTTOK (for success) or RTTFAILED (for failure), unless indicated otherwise in the following interface description.

Thread Management
  RttCreate()                Create a new thread
  RttMyThreadId()            Get thread id of running thread
  RttGetThreadSchedAttr()    Get scheduling attributes of given thread
  RttSetThreadSchedAttr()    Set scheduling attributes of given thread
  RttNameToThreadId()        Get thread id given name
  RttThreadExists()          Given thread id, determine whether thread exists
  RttKill()                  Given thread id, terminate thread
  RttExit()                  Terminate running thread
  RttGetTimeOfDay()          Get the current time

Thread Synchronization and Communication
  RttAllocSem()              Allocate a new semaphore
  RttFreeSem()               Free a semaphore
  RttP()                     Wait on a semaphore
  RttV()                     Signal a semaphore
  RttSemValue()              Get the value of a semaphore
  RttSend()                  Send a message to a given thread and receive a reply
  RttReceive()               Receive a message from another thread
  RttMsgWaits()              Determine whether RttReceive() would block
  RttReply()                 Reply to a message from given thread
  RttNetInit()               Enable cross address space ITC
  RttMyIP()                  Get IP address of local RT Threads environment
  RttMyPort()                Get port number of local RT Threads environment
  RttCreateNS()              Create new thread as name server thread
  RttGetNS()                 Get thread id of name server, given IP address and port

Memory Management
  RttMalloc()                Allocate memory from heap
  RttFree()                  Free memory previously allocated with RttMalloc()
  RttRealloc()               Reallocate memory previously allocated by RttMalloc()
  RttCalloc()                Allocate and clear memory from heap

Miscellaneous
  RttRegisterExitRoutine()   Register a routine to be executed when application exits
  RttBeginCritical()         Disable preempting events (I/O completion and clock)
  RttEndCritical()           Re-enable preempting events
  RttGetData()               Get data associated with thread
  RttSetData()               Set data associated with thread

Table 3.1: RT Threads Interface

3.1 Types and Constants

The type RttThreadId is an abstract type used to identify threads at the user level. The routine xdr_RttThreadId() is provided to facilitate external data representation of RttThreadIds so they can be transmitted as data between RT Threads environments on heterogeneous architectures. RttSem is another abstract type, used to identify semaphores. The type RttTimeValue is used to represent time values in RT Threads. The type is transparent, i.e., the application has knowledge of its fields seconds and microseconds.

typedef struct {
    long seconds;
    long microseconds;
} RttTimeValue;

The seconds field can take on values from 0 to 2147483647 and the microseconds field values from 0 to 999999. Variables of type RttTimeValue can express both relative times and absolute times. Absolute times are determined using the RttGetTimeOfDay() call.

The type RttSchAttr is used to specify the scheduling attributes of a thread. Its fields startingtime, priority and deadline are visible to applications.

typedef struct {
    RttTimeValue startingtime;
    int priority;
    RttTimeValue deadline;
} RttSchAttr;

The startingtime and deadline fields are used to specify the starting time and deadline, respectively, of a thread. Both express an absolute time. The priority field is used to specify the priority of a thread. Priorities can take on values from 0 to 30 (inclusive), where 0 is the highest priority and 30 is the lowest. It is highly recommended that threads use only priorities in the range 10 to 30 inclusive so as not to interfere with higher priority threads used by the underlying system (i.e., threads used to implement cross address space communication). For applications that do not require a fine granularity of priority, the following priority constants are available:

RTTHIGH = 10, RTTNORM = 20, RTTLOW = 30

The following constant values are also available as a convenience for scheduling time values:

RttTimeValue RTTZEROTIME;
RttTimeValue RTTINFINITE;
RttTimeValue RTTNODEADLINE;

RTTZEROTIME can be used to specify either the starting time or the deadline of a thread. Giving a thread a starting time of RTTZEROTIME will make it ready immediately (assuming it is not blocked). A deadline of RTTZEROTIME is the earliest possible deadline for a thread, causing it to be scheduled ahead of any thread with the same priority and a different deadline. RTTINFINITE can be used to specify a thread's starting time. A thread with a starting time of RTTINFINITE will effectively be suspended (it can only run again if another thread modifies its starting time). RTTNODEADLINE can be used to specify a thread's deadline. A thread with a deadline of RTTNODEADLINE will be scheduled after all threads of equal priority which have a deadline.
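As an illustrative sketch (assuming the header name rtthreads.h), the following builds attributes for a thread that should become ready immediately and complete within 50 ms; the 50 ms budget is an arbitrary example value:

#include "rtthreads.h"   /* assumed header name */

static RttSchAttr makeAttrs(void)
{
    RttTimeValue now;
    RttSchAttr attr;

    RttGetTimeOfDay(&now);                /* absolute current time */
    attr.startingtime = RTTZEROTIME;      /* ready immediately */
    attr.priority = RTTNORM;
    attr.deadline.seconds = now.seconds;  /* deadline = now + 50 ms */
    attr.deadline.microseconds = now.microseconds + 50000;
    if (attr.deadline.microseconds >= 1000000) {  /* carry over */
        attr.deadline.seconds += 1;
        attr.deadline.microseconds -= 1000000;
    }
    return attr;
}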

3.2 Thread Management

Thread management consists of routines for creating, scheduling and terminating threads, as well as some miscellaneous related routines.

3.2.1 Thread Creation

int RttCreate(RttThreadId *thread, void (*addr)(), int stksize, char *name, void *arg, RttSchAttr schedAttr, int level)

This call is used to create threads. The first parameter, thread, is a pointer to an RttThreadId, and is used to return the newly created thread's identifier by reference. The second parameter, addr, is a pointer to the routine which acts as the entry point for the created thread. The stksize parameter is the size, in bytes, that should be allocated for the new thread's stack. The name parameter is a null-terminated string used to specify a thread name for debugging purposes. The name is copied, so its memory may be released after the RttCreate() call returns. More than one thread may have the same name. The fifth parameter, arg, is an argument (parameter) for the created thread. The entry routine for the new thread receives this argument as a parameter. It can be used to pass a 32-bit value, or a pointer to a larger data structure. Note that passing the address of an automatic variable is dangerous since the procedure calling RttCreate() may return before the thread ever runs. The sixth argument, schedAttr, specifies the scheduling attributes of the created thread. If the starting time indicated is in the future, the thread will not become ready until that time. The final argument, level, indicates whether the created thread is a system (RTTSYS) or an application (RTTUSR) thread. The only difference between the two is that the RT Threads process (i.e., threads environment) will exit when there are no more RTTUSR level threads. Therefore, perpetual server threads should be created with level RTTSYS (unless they service requests from other RT Threads environments, in which case they will likely want to use RTTUSR).

3.2.2 Thread Termination

Threads terminate in one of three ways. Threads can return from their entry routine (either with a return() statement or by falling off the end of the routine). Equivalently, they can terminate themselves using the RttExit() call. Lastly, threads can be killed by other threads using the RttKill() call.

void RttExit()

RttExit() allows a thread to terminate itself.

int RttKill(RttThreadId thread)

RttKill() allows a thread to terminate another thread, specified with the thread parameter.

3.2.3 Thread Scheduling

A thread's scheduling attributes are set at creation time. These attributes may also be retrieved and modified (by the thread itself or by another thread) at any time during the thread's lifetime.

int RttGetThreadSchedAttr(RttThreadId thread, RttSchAttr *schedAttr)

This call is used to retrieve the current values of the scheduling attributes of the thread specified by the thread parameter. The attributes are returned by reference via the schedAttr parameter. This call has no effect on the scheduling of the specified thread.

int RttSetThreadSchedAttr(RttThreadId thread, RttSchAttr schedAttr)

This call is used to set the scheduling attributes of the thread specified by the thread parameter, as specified in the schedAttr parameter. Any system scheduling effects caused by a change in scheduling attributes will take place immediately.
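One use of these calls suggested by the semantics of RTTINFINITE in section 3.1 is suspending and resuming another thread. The following sketch is illustrative, not taken from the thesis (rtthreads.h is an assumed header name):

#include "rtthreads.h"   /* assumed header name */

static int suspendThread(RttThreadId t)
{
    RttSchAttr a;

    if (RttGetThreadSchedAttr(t, &a) != RTTOK)
        return RTTFAILED;
    a.startingtime = RTTINFINITE;        /* never becomes ready... */
    return RttSetThreadSchedAttr(t, a);  /* ...until reset by another thread */
}

static int resumeThread(RttThreadId t)
{
    RttSchAttr a;

    if (RttGetThreadSchedAttr(t, &a) != RTTOK)
        return RTTFAILED;
    a.startingtime = RTTZEROTIME;        /* ready immediately */
    return RttSetThreadSchedAttr(t, a);
}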

3.2.4 Miscellaneous

RttGetTimeOfDay(), RttMyThreadId() and RttThreadExists() are miscellaneous routines which are useful for thread scheduling.

int RttGetTimeOfDay(RttTimeValue *time)

This call returns, by reference, the current clock time in seconds and microseconds since January 1, 1970¹.

¹RttGetTimeOfDay() is based on the UNIX gettimeofday() system call. On other architectures, the base time may be different.

RttThreadId RttMyThreadId()

This call returns the currently running thread's RttThreadId.

int RttThreadExists(RttThreadId thread)

This call is a predicate which returns 1 if the thread specified by parameter thread exists, and 0 otherwise. Use of this call is discouraged since even with a return of 1, by the time the following statement executes, the specified thread may no longer exist (a context switch may occur between the two statements). The call is included primarily for reasons of backward compatibility.

3.3 Synchronization

Counting semaphores are provided as the primary means of thread synchronization. Two attributes are specified by the user when allocating a new semaphore, value and mode. Value indicates the number of times RttP() (see below) can be called before a calling thread will block, assuming no RttV() calls are made. Mode indicates the desired semantics for the order in which semaphore blocked threads are made ready.

RTTFCFS specifies first come first served semantics, in which threads are made ready in the order in which they were blocked, regardless of priority. RTTPRIORITY specifies highest priority first semantics, in which threads of higher priority will be made ready before threads of lower priority, with threads of equal priority being made ready in earliest deadline first order. Threads of equal priority and equal deadline are made ready in first come first served order in RTTPRIORITY mode.

Since memory is shared within an RT Threads environment, semaphores can easily be shared among threads within a single environment. In keeping with the cooperative nature of an RT Threads environment, it is left up to the user to ensure that semaphores are no longer in use when a semaphore is freed. To help detect such a situation, an attempt to free a semaphore on which other threads are blocked will result in failure. In the event that a thread is killed while blocked on a semaphore, the state of the semaphore is automatically modified (its value is incremented).

Other threads packages provide primitives intended specifically for providing mutually exclusive access to shared data (these are usually called mutexes). RT Threads does not provide special "mutex" primitives; semaphores are lightweight in RT Threads and can easily be used for this purpose.

RT Threads provides the semaphore operations RttAllocSem(), RttFreeSem(), RttP(), RttV() and RttSemValue(). Semaphore variables are of type RttSem.

int RttAllocSem(RttSem *sem, int value, int mode)

This call allocates a new semaphore, returned by reference via the sem parameter. The initial value of the semaphore is specified by the value parameter, and must be greater than or equal to 0. The mode for the semaphore (RTTFCFS or RTTPRIORITY) is specified by the mode parameter.

int RttFreeSem(RttSem sem)

This call frees a previously allocated semaphore, specified by the sem parameter. RttFreeSem() will return RTTFAILED if any threads are blocked on the semaphore.

int RttP(RttSem sem)

This is the conventional semaphore wait primitive. It decreases the value of the semaphore specified by parameter sem by one, and blocks if the resulting value is less than zero.

int RttV(RttSem sem)

This is the conventional semaphore signal primitive. It increases the value of the semaphore specified by parameter sem by one, and unblocks one thread blocked on the semaphore if the value was less than zero when the call was made.

int RttSemValue(RttSem sem, int *value)

This call returns, via reference parameter value, the current value of the semaphore specified by parameter sem. This call should be used with caution, since the semaphore's value could change between the return from the RttSemValue() call and the following statement.
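As an illustrative sketch of the mutex idiom described earlier in this section, a semaphore allocated with value 1 serializes access to shared data; rtthreads.h is an assumed header name and the shared counter is only an example:

#include "rtthreads.h"   /* assumed header name */

static RttSem lock;
static int counter;          /* shared data protected by lock */

static void initLock(void)
{
    RttAllocSem(&lock, 1, RTTPRIORITY);  /* value 1: one holder at a time */
}

static void increment(void)
{
    RttP(lock);              /* enter critical section (may block) */
    counter++;
    RttV(lock);              /* leave critical section */
}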

3.4 Communication

An inter-thread communication (ITC) facility is provided by RT Threads, allowing message passing between threads both within an RT Threads environment, and between threads in different environments. A single interface for both types of communication is made possible because RttThreadIds are universally unique on the Internet.

3.4.1 Intra-Address Space Communication

RT Threads provides blocking send/receive/reply style inter-thread communication primitives. A thread sending a message using RttSend() is blocked until the message has been received by the destination thread and a reply has been made. A thread receiving a message using RttReceive() is blocked until some other thread sends it a message. RttReply(), a non-blocking operation, is used to reply to a received message. There is also a non-blocking routine RttMsgWaits() that allows a thread to test whether it has a message waiting to be received. RttSend() and RttReply() both copy their message (request and reply) data. A description of the message passing routines follows.

int RttSend(RttThreadId to, void *sData, u_int slen, void *rData, u_int *rlen)

This call sends a message to the thread specified by the to parameter, and blocks waiting for a reply. The request message is specified by a buffer pointer, the sData parameter, and a length, the slen parameter. A buffer must be supplied for the reply to be copied into, given as the rData parameter. The parameter rlen is an input/output parameter, specifying the maximum reply length as input, and returning the actual length of the reply message as output. If the input length is less than the length of the reply sent, the reply will be truncated. RttSend() returns RTTOK on success, RTTNOSUCHTHREAD if the specified destination thread does not exist, and RTTFAILED on failure for any other reason.

int RttReceive(RttThreadId *from, void *data, u_int *len)

This call receives a message which was sent by another thread using RttSend(). The RttThreadId of the sending thread is returned by reference via the from parameter. A buffer must be supplied for the message to be copied into, given as the data parameter. The parameter len is an input/output parameter, specifying the maximum message length as input, and returning the actual length of the received message as output. If the input length is less than the length of the message sent, the message will be truncated.

int RttReply(RttThreadId sndr, void *data, u_int len)

This call sends a reply message back to a thread which is blocked on an RttSend() call. The destination thread (i.e., the original sender) is specified using the sndr parameter. A pointer to the reply message buffer and its length are given as the data and len parameters, respectively. RttReply() returns RTTOK on success, RTTNOTBLOCKED if the thread indicated by sndr is not blocked sending, and RTTFAILED for any other failure.

int RttMsgWaits()

This non-blocking call returns 1 if there is a message waiting to be received by the calling thread, and 0 otherwise.
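A minimal round trip using these primitives might look as follows. This sketch is illustrative; the message contents, buffer sizes and header name are assumptions, not taken from the thesis. It also reflects the design note from chapter 2: a server loop only receives and replies, and never sends.

#include "rtthreads.h"   /* assumed header name */

static int askServer(RttThreadId server)
{
    char reply[64];
    u_int rlen = sizeof(reply);

    /* blocks until the server has received the request and replied */
    return RttSend(server, "ping", 5, reply, &rlen);
}

static void serverLoop(void *arg)
{
    char req[64];

    for (;;) {
        RttThreadId client;
        u_int len = sizeof(req);    /* reset the max length each time */

        if (RttReceive(&client, req, &len) != RTTOK)
            break;
        RttReply(client, "pong", 5);  /* unblocks the sender */
    }
    (void)arg;
}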

3.4.2 Inter-Address Space Communication

The message passing routines described in the previous section can also be used to send messages between threads in distinct RT Threads environments, i.e., across address space (and possibly machine) boundaries. The routines described in this section make this possible.

int RttNetInit(unsigned long ipAddr, unsigned short portNo)

This call is used to uniquely identify the calling RT Threads environment on the Internet, and enable cross address space ITC. It must be called in mainp(), before any RttCreate() calls, so that the IP address and port number specified by the ipAddr and portNo parameters can be embedded in all RttThreadIds. The user can let the system choose values for the IP address and/or the port number by specifying 0 for the respective parameters.

int RttMyIP(unsigned int *ipAddr)
int RttMyPort(unsigned int *portNo)

These calls return, respectively, by reference, the IP address and port number which identify the local RT Threads environment. They are useful if 0 was specified as parameters to RttNetInit(). Both return RTTFAILED if RttNetInit() has not yet been called.

int RttCreateNS(RttThreadId *thread, void (*addr)(), int stksize, char *name, void *arg, RttSchAttr schedAttr, int level)

This call takes the same parameters as RttCreate() and has the same semantics, except the created thread is marked by RT Threads as the unique name server thread for the environment. RttCreateNS() may be called only once (it will return RTTFAILED on subsequent calls).

int RttGetNS(unsigned int ipAddr, int portNo, RttThreadId *nServThreadId)

This call returns, by reference, the RttThreadId of the name server thread for an RT Threads environment specified by IP address ipAddr and port number portNo.

bool_t xdr_RttThreadId(XDR *xdrs, RttThreadId *objp)

This call is provided so that RttThreadIds being sent in a message between address spaces can be encoded and decoded using the XDR external data representation.

3.5 Dynamic Memory Allocation

RT Threads provides thread-safe wrappers for the standard C library heap memory allocation routines malloc(), calloc(), realloc() and free(), declared as follows:

void *RttMalloc(int size)
void *RttCalloc(int numElems, int elemSize)
void *RttRealloc(void *ptr, int size)
void RttFree(void *mem)

Other C library routines which do not have RT Threads equivalents should be protected from context switches with RttBeginCritical() and RttEndCritical() calls.
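As an illustrative sketch of that protection, a non-reentrant C library call such as strtok() can be bracketed as follows (rtthreads.h is an assumed header name):

#include "rtthreads.h"   /* assumed header name */
#include <string.h>

static char *firstToken(char *line)
{
    char *tok;

    RttBeginCritical();          /* no RT Threads context switch can occur */
    tok = strtok(line, " \t");   /* strtok() keeps hidden static state */
    RttEndCritical();
    return tok;
}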

3.6 Miscellaneous

int RttRegisterExitRoutine(void (*routine)())

This call registers procedure routine (which takes no parameters) to be called when RT Threads exits. RttRegisterExitRoutine() may be called more than once to register multiple exit routines.

int RttRegisterDescriptor(int fd)

Any file descriptors obtained from a non-RT Threads call (e.g., ones obtained in a non-RT Threads process and passed to an RT Threads environment) should be registered with RT Threads using the RttRegisterDescriptor() call.

void RttBeginCritical()

void RttEndCritical()

These calls should surround any system calls or library calls which do not have RT Threads equivalents and may not be re-entrant. They delimit a critical section within which no RT Threads context switches will occur.

int RttSetData(RttThreadId thread, unsigned long data)
int RttGetData(RttThreadId thread, unsigned long *data)

These calls allow data to be associated with a given thread, and for that data to be retrieved.

3.7 External I/O

RT Threads also includes interface routines for many UNIX-style I/O calls. Blocking calls will only block the thread making the call and not the entire process in which RT Threads is running. The parameter lists are identical and the semantics are the same as the corresponding UNIX calls. See the UNIX man pages for details of a particular call. Table 3.2 shows the RT Threads calls which map directly to UNIX calls.

RTT Routine     UNIX Call          RTT Routine     UNIX Call
RttOpen         open               RttPipe         pipe
RttSocket       socket             RttClose        close
RttBind         bind               RttListen       listen
RttAccept       accept             RttConnect      connect
RttRead         read               RttWrite        write
RttLseek        lseek              RttRecvfrom     recvfrom
RttSendto       sendto             RttRecvmsg      recvmsg
RttSendmsg      sendmsg            RttGetsockopt   getsockopt
RttSetsockopt   setsockopt         RttGetsockname  getsockname

Table 3.2: Interface to UNIX I/O System Calls

Some other routines are provided which do not map directly to UNIX calls, but are provided for convenience. They are described as follows:

int RttReadN(int fd, char *buf, int numbytes)

This call has the same semantics as the UNIX read(), except that while read() may return after reading less than numbytes bytes, RttReadN() will block until numbytes bytes have been read (unless an error or the end of file is encountered).

int RttWriteN(int fd, char *buf, int numbytes)

This call has the same semantics as the UNIX write(), except that while write() may return after writing less than numbytes bytes, RttWriteN() will block until numbytes bytes have been written (unless an error is encountered).

int RttSeekNRead(int fd, char *buf, int numbytes, int seekpos, int seekmode)

This call combines the functionality of RttLseek() and RttRead() into a single call.

int RttSeekNWrite(int fd, char *buf, int numbytes, int seekpos, int seekmode)

This call combines the functionality of RttLseek() and RttWrite() into a single call.
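An illustrative sketch of the combined seek-and-read call follows; the file name, record size, open flags and the SEEK_SET-style seek mode value are assumptions, and rtthreads.h is an assumed header name. Only the calling thread blocks while the I/O is in progress.

#include "rtthreads.h"   /* assumed header name */

static int readRecord(char *buf, int recSize, int recNo)
{
    int fd = RttOpen("data.rec", 0, 0);  /* flags/mode as in UNIX open(); 0 = read only */

    if (fd < 0)
        return RTTFAILED;
    /* seek to the record and read exactly recSize bytes (0 = seek from start) */
    if (RttSeekNRead(fd, buf, recSize, recNo * recSize, 0) < 0) {
        RttClose(fd);
        return RTTFAILED;
    }
    return RttClose(fd);
}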

Chapter 4

Implementation

Three models were used for implementing RT Threads: a native host model where RT Threads implements its own context switching; a mapped model where RT Threads threads are mapped to existing threads provided by the underlying operating system (or a separate user level threads layer); and a native microkernel model where RT Threads forms part of a microkernel implementation.

Most of the implementation is shared between the native host and mapped models. The source code is shared between the two models for all architectures, with precompiler directives used to switch between model and architecture specific parts of the code. The general description of the implementation here will follow that of the native host version, that being the original model and environment (UNIX) used to implement RT Threads. Features which differ in the mapped implementation will be described in section 4.10. The native microkernel implementation is briefly described in section 4.11.

4.1 Data Structures

The overall state of each thread in the system is maintained in a structure called a thread control block (TCB). This structure has the type TCB.

typedef struct tcb {
    RttThreadId threadId;
    int errNo;
    ThreadMgmtInfo threadMgmtInfo;
    QueueMgmtInfo queueMgmtInfo;
    SchedInfo schedInfo;
    ITCInfo itcInfo;
    SemInfo semInfo;
    IOInfo ioInfo;
} TCB;

Each thread is identified within the system by a pointer to its TCB. The threadId field of the TCB contains the thread's thread id (an identifier of type RttThreadId, described later in this section). The errNo field of the TCB is a thread specific version of the errno global of the underlying operating system (for UNIX and Win32). The remaining fields of the TCB are structures specific to the various functional modules of the RT Threads implementation: threadMgmtInfo contains information used by the thread management module, queueMgmtInfo for the queue management module, and so on. Each will be described in greater detail in its respective section in this general implementation description.

RT Threads has a few global variables. Most are part of the _Globals_ global variable, which is a structure of the following type:

struct globals {
    TCB *currthread;
    int inOsFlag;
    int numbThreads;
    int numbReadyThreads;
    int numbSysThreads;
    int ioPending;
    int tickPending;
    int nameServerCreated;
    RttThreadId nameServerId;
    u_long hid;
    u_long lid;
    struct tcb tcbTbl[NUMBTCBS];
    QUEUE qTbl[NUMQS];
    QUEUE rdyqTbl[NUM_READYQ_PRIORITIES];
};

The currthread field is a pointer to the TCB of the currently running thread. The inOsFlag field is used to control access to shared data within the RT Threads kernel. Fields numbThreads, numbReadyThreads and numbSysThreads are the number of threads currently alive, the number which are ready, and the number which are "system" threads (threads created with level RTTSYS). The ioPending and tickPending fields are flags which indicate whether I/O processing and clock tick handling, respectively, have been deferred. The nameServerCreated field is a flag indicating whether the special name server thread has been created, and nameServerId is the thread id of the name server thread, if there is one. The hid and lid fields are the host id and local id (IP address and port) for identifying the local RT Threads environment (see section 4.7.1). The tcbTbl, qTbl and rdyqTbl fields are the global TCB table, queue table and ready queue table, respectively.

Access to shared data structures of the RT Threads implementation, such as TCBs, is protected by setting inOsFlag whenever a thread is in a critical section in the RT Threads kernel. This flag is checked whenever a preempting event occurs in the kernel, i.e., whenever a clock tick occurs (see section 4.5) or I/O has completed (see section 4.9). If the flag is set when such an event occurs, any associated processing and scheduling is deferred (by setting the ioPending or tickPending flag) until the original thread leaves its critical section. These critical sections are delimited with ENTERINGOS and LEAVINGOS, which are macros for the enteringOs() and leavingOs() routines. enteringOs() simply sets inOsFlag. leavingOs() clears the flag, and also checks ioPending and tickPending to see whether any processing has been deferred, calling their respective handlers directly if necessary.
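The deferral protocol can be made concrete with a short sketch. This is illustrative C, not the actual RT Threads source; it assumes the _Globals_ structure shown above and stands in for the real handler calls with comments.

extern struct globals _Globals_;   /* defined as shown above */

static void leavingOsSketch(void)
{
    _Globals_.inOsFlag = 0;            /* leave the critical section */
    if (_Globals_.ioPending) {         /* I/O completed while inside */
        _Globals_.ioPending = 0;
        /* invoke the deferred I/O completion handler here */
    }
    if (_Globals_.tickPending) {       /* a clock tick was deferred */
        _Globals_.tickPending = 0;
        /* invoke the deferred tick handler here */
    }
}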

The publicly visible type for threads, RttThreadId, is a 64-bit value which contains enough information to uniquely locate a thread among RT Threads environments on the Internet. Figure 4.1 shows the layout of the RttThreadId. It is divided into two parts, the host id (hid) and local id (lid). The host id contains the 32-bit IP address associated with the thread's RT Threads environment, used to locate its host machine. The local id identifies the thread on a particular machine, using the 16-bit port number, and within a particular threads environment, using the instance number and index. The index is a 12-bit index into the TCB table, allowing for as many as 4096 simultaneously alive threads per address space. The 4-bit instance number is incremented each time a TCB structure is reused, to help prevent the thread id for a terminated thread from mistakenly being accepted as a reference to a newer thread using the same TCB table index. The instance number can range from 1 to 15, instance 0 being reserved to indicate the special name server thread.

Figure 4.1: RttThreadId (layout: 32-bit IP address, 16-bit port, 4-bit instance number, 12-bit index)
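The layout lends itself to simple field extraction. The following macros are illustrative only (they are not from the RT Threads source, which may pack the id differently); they assume the id is carried as the two 32-bit words hid and lid shown in Figure 4.1.

#include <stdint.h>

#define TID_IP(hid)        ((uint32_t)(hid))          /* whole host id */
#define TID_PORT(lid)      ((uint16_t)((lid) >> 16))  /* upper 16 bits */
#define TID_INSTANCE(lid)  (((lid) >> 12) & 0xF)      /* next 4 bits */
#define TID_INDEX(lid)     ((lid) & 0xFFF)            /* low 12 bits */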

4.2 System Startup

This section describes how the RT Threads system (i.e., an RT Threads environment running in an underlying process) is initialized. From the user application's point of view, an RT Threads application starts in the same manner as a normal console application, except that the entry routine is named mainp() rather than main(). The actual main() routine is defined in the RT Threads library, and is the entry point for RT Threads startup. In main(), first the routine osInit() is called to initialize data structures, and then the null thread and main thread are created. After these threads are created, the StartOs_() routine (of the scheduling module, see section 4.5) is called to start the scheduling clock.

static void osInit()

This routine first does dynamic initialization of the global variables currthread and inOsFlag, and then calls the routines InitReadyQueue_(), InitThreadMgmt_() and InitIO_() to do module specific initialization. These routines are described in their respective sections.

RTTTHREAD NullThread_()

This is the entry routine for the null thread. The null thread is created with a special priority lower than that of any other thread in the system, and thus runs only when no other threads are ready to run. The null thread simply loops on the pause() system call, which yields control of the processor, allowing other processes in the underlying system to run while the RT Threads environment is idle waiting for threads to wake up or I/O to complete. When such an event occurs, a signal is generated, causing pause() to return. This return will not actually happen until threads scheduled by the preempting event have completed their tasks, at which time the RT Threads system will become idle again, and the null thread will loop back to call pause() again.

RTTTHREAD mainThread()

This is the entry routine for the main thread, which simply calls the user defined mainp() routine, passing along the argc and argv parameters from main(). The main thread is created with the highest possible priority and the earliest possible deadline (RTTZEROTIME), and as such it will not be preempted by any threads created in mainp() (the intended purpose of mainp() is for user application initialization, e.g., command line parsing and the creation of one or more initial threads).

4.3 Thread Management

The thread management module is responsible for the creation and termination of threads, and for maintaining state specifically related to the running of a thread, such as its entry point and stack. The ThreadMgmtInfo structure is shown below.

typedef struct {
    int inUse;
    RttSchAttr attrs;
    TSTATE state;
    char name[MAXTNAMELEN + 1];
    char *namePtr;
    int *stmembeg;
    int *stkend;
    int *sp;
    int stackSize;
    void (*threadAddr)();
    void *threadArg;
    unsigned long uVal;
    int level;
} ThreadMgmtInfo;

The inUse field is a flag indicating whether the TCB is currently in use. The attrs field maintains the current scheduling attributes for the thread in an RttSchAttr structure, as described in section 3.1. The current state of the thread is maintained in the state field, which is of type TSTATE, with possible values as follows:

typedef enum {
    RBLK = 0,             /* receive blocked */
    SBLK = 1,             /* send blocked */
    WTFR = 2,             /* waiting for reply */
    SLEP = 3,             /* sleeping threads */
    SEMB = 4,             /* semaphore blocked */
    IOBK = 5,             /* I/O blocked */
    THREAD_FREE = 20,     /* not in use */
    THREAD_READY = 21,    /* ready */
    THREAD_RUNNING = 22,  /* running */
    THREAD_DYING = 23,    /* dying */
    NOQ = 0xff            /* not on any queue */
} TSTATE;

The name and namePtr fields are used to hold and refer to the string name of the thread. The thread's stack is maintained using the stmembeg, stkend, sp and stackSize fields, which indicate the beginning and end of the stack in memory, the current stack pointer, and the stack size, respectively. The threadAddr and threadArg fields are the thread entry point (i.e., the address of the entry routine) and the argument passed into the entry routine. Thread specific user data is stored in the uVal field. The level field indicates the thread level as specified at thread creation and described in section 3.2.1.

The following routines, defined in the thread management module, are used to assist in thread management:

static TCB *findUnusedTCB()

This routine searches the TCB table for a TCB whose inUse field is 0. If found, a pointer to that TCB is returned, otherwise TNULL is returned (TNULL is a null TCB pointer). The last index tried is maintained as a static local variable as an optimization, so the search can continue with the next TCB in the table the next time findUnusedTCB() is called.

static void initTCB(int tcbNo, TCB *tcbPtr)

This routine initializes a TCB at system startup. Most notably, the inUse field is set to 0, the state is set to THREAD_FREE, and the instance number is set to 1 (instance number 0 is reserved for the special name server thread).

static void threadIsLeaving(TCB *tcbPtr)

This routine is used to call cleanup routines for exiting threads, as defined in other modules. ItcThreadCleanup_() and SemCleanup_() are called.

static void killCurrthread()

This routine is called when a thread terminates itself, either by returning from its entry routine or by calling RttExit() or RttKill(), to remove itself from the system. The numbThreads field of _Globals_ is decremented, as is the numbSysThreads field if the thread is a system thread. The threadIsLeaving() routine is called to do any cleanup for other modules. SysFree_() is used to free the thread's stack. The TCB's state is set to THREAD_FREE, the instance number is incremented (and wrapped to 1 if it exceeds 15), and the inUse field is set to 0. If numbThreads is greater than numbSysThreads, the next ready thread is scheduled by using DeqReadyThread_() to assign a new currthread, and then calling SchedThread_() to schedule it. Otherwise, there are no RTTUSR threads left in the system and the RT Threads environment first calls CallExitRoutines_() to call any user registered exit routines, and then exits using the underlying exit() call.

void InitThreadMgmt_()

This routine is used to initialize thread management. The numbThreads and numbSysThreads fields of _Globals_ are set to 0, and the initTCB() routine is called for each TCB in the TCB table.

void CallExitRoutines_()

This routine is called when the RT Threads environment exits naturally. It traverses a linked list of routines previously registered by the application with the RttRegisterExitRoutine() API call.

void CallNclean_()

CallNclean_() is the entry routine used by the RT Threads implementation to start each new thread, the first time it is scheduled to run. The actual entry routine for the thread, as specified in the RttCreate() call, is explicitly called by CallNclean_(). When CallNclean_() is called, currthread will have already been set to point to the new thread's TCB by the scheduler, so the entry routine address and argument can be obtained from the threadAddr and threadArg fields of the TCB's threadMgmtInfo field. Should the user specified thread entry routine return, the killCurrthread() routine is called to remove the thread from the system.

Threads are created using the RttCreate() interface routine. First, an unused TCB is obtained using the findUnusedTCB() call. If none are available, RttCreate() returns RTTFAILED. Otherwise, the new thread's threadMgmtInfo field is initialized. The inUse field is set to 1, and the attrs, name, stackSize, threadAddr, threadArg and level fields are assigned the values passed in as parameters to RttCreate(). A stack is allocated for the thread using SysMalloc_(). The IP address and port number of the threadId field of the TCB are set to the values of the hid and lid globals. The thread's state is set to THREAD_READY, and the new field of the TCB's schedInfo field is set to 1, to indicate to the scheduler that this is a new thread. The new thread is now ready to be scheduled. The current time is determined using RttGetTimeOfDay(), and is compared with the starting time attribute of the new thread to see whether it should be put to sleep or made ready. If the starting time has been reached, the thread is made ready by calling EnqueueEdfTail_(), otherwise it is put to sleep by calling SetTimeout_() (see section 4.5). If the thread's level is RTTSYS, the number of system threads (numbSysThreads) is incremented, and the next ready thread is scheduled using the SchedM macro (this will either be the currently running thread, i.e., the one which called RttCreate(), or the newly created thread, depending on their scheduling attributes).

RttExit() and RttKill() can be used to terminate threads. RttExit() terminates the currently running thread by calling killCurrthread(). RttKill() first checks to see whether its thread id parameter is the currently running thread, in which case it too calls killCurrthread(). Otherwise, it terminates the specified thread in much the same way that killCurrthread() does, except that it must also dequeue the thread from whichever scheduling queue it is on, determined by its state (and priority if it is a ready thread), using the DEQUEUE_ITEM() macro.

4.4 Queue Management

All threads except the running thread are maintained in queues. The queues described in this section are used primarily by the scheduler, and will be referred to in general as scheduling queues. Ready threads are maintained in an array of ready queues, one for each priority level. Queues are implemented as doubly linked lists, for efficient removal of a thread from a queue. The QueueMgmtInfo structure is shown below.

typedef struct {
    struct tcb *prev;
    struct tcb *next;
    u_char queue;
} QueueMgmtInfo;

The prev and next fields are links into whichever queue the thread is currently on. The queue field is an 8-bit integer value, corresponding to the current state of the thread, indicating which queue the thread is on. The routines ENQUEUE_HEAD(), ENQUEUE_TAIL(), ENQUEUE_ITEM(), DEQUEUE_HEAD() and DEQUEUE_ITEM(), defined as macros for efficiency, allow threads to be placed on or removed from anywhere in a queue. The FindThreadOnQ_() macro is used to determine whether a thread is on a particular queue.

A thread is always on a single scheduling queue, unless it is the currently running thread (state THREAD_RUNNING), in which case it is not on any queue. Threads which are ready to run, i.e., are not blocked, sleeping, or running, have state THREAD_READY and are maintained on one of the ready queues. There is one ready queue for each priority level, these queues being maintained in an array called the ready queue table (field rdyqTbl in the _Globals_ structure). The ready queue module provides routines which place threads in the appropriate position (according to the EDF rule) on the appropriate queue (i.e., the one corresponding to the thread's priority). Thus, the thread at the head of each priority queue is always the one which should be scheduled next for that priority. The ready queue routines are:

void EnqueueEdfTail_(TCB *thread)

This routine places a ready thread, specified as a pointer to its TCB, on the ready queue for its priority, behind all threads with an earlier deadline. The thread is placed immediately following (at the tail of) any other threads with the same deadline.
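For illustration, a minimal sketch of this insertion logic is shown below, using a simplified, self-contained queue representation; the type and field names here are assumptions for the example and do not reproduce the actual RT Threads declarations.

typedef struct node {
    struct node *prev, *next;
    long deadline;               /* smaller value = earlier deadline */
} Node;

typedef struct {
    Node *head, *tail;
} Queue;

/* Insert t behind every node whose deadline is earlier than or equal
 * to t's, so threads with equal deadlines are served in FIFO order. */
void enqueueEdfTail(Queue *q, Node *t)
{
    Node *cur = q->head;

    while (cur != NULL && cur->deadline <= t->deadline)
        cur = cur->next;

    t->next = cur;
    t->prev = (cur != NULL) ? cur->prev : q->tail;
    if (t->prev != NULL)
        t->prev->next = t;
    else
        q->head = t;
    if (cur != NULL)
        cur->prev = t;
    else
        q->tail = t;
}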

void EnqueueEdfHead_(TCB *thread)

This routine places a ready thread, specified as a pointer to its TCB, on the ready queue for its priority. The thread is placed immediately preceding (at the head of) any other threads with the same deadline. This routine is used when a thread is made ready after being blocked sending or receiving a message, or doing external

I/O.

TCB *DeqReadyThread_(void)

This routine returns a pointer to the TCB of the thread which is at the head of the highest priority, non-empty ready queue, removing that thread from the queue.

TCB *DeqIfReady_()

Like DeqReadyThread_(), this routine removes the highest priority thread and returns a pointer to its TCB, but only if that thread would preempt the currently running thread. If so, the currently running thread, i.e., the thread indicated by the global variable currthread, is placed on its appropriate priority queue. Otherwise, the value of currthread is returned.
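The preemption test this implies can be sketched as follows; the attribute representation is an assumed simplification, and the exact tie breaking in the real code depends on which enqueue routine placed the candidate at the head of its queue.

typedef struct {
    int priority;                /* larger value = higher priority (assumed) */
    long deadline;               /* smaller value = earlier deadline */
} SchedAttr;

/* Non-zero if the candidate at the head of the ready queues should
 * preempt the currently running thread. */
int wouldPreempt(const SchedAttr *cand, const SchedAttr *curr)
{
    if (cand->priority != curr->priority)
        return cand->priority > curr->priority;
    return cand->deadline < curr->deadline;
}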

void DeleteReadyThread_(TCB *thread)

This routine removes the thread specified by thread from its priority queue.

Threads which are not ready or running are either sleeping or blocked, and will be on an appropriate queue, having been placed there with one of the macro routines described earlier in this section. Sleeping threads, i.e., those whose starting time is in the future, are maintained in the sleep queue, ordered by start time for efficient waking. All other threads are on various blocked queues, one each for semaphores, sending, waiting for a reply, receiving and I/O.

4.5 Thread Scheduling

The key features of thread scheduling are:

• determining which thread should be scheduled to run at any given time

• a clock for scheduling future events

• a mechanism for context switching, including saving thread state (registers)

Determination of which ready thread should be next to run at any given time is handled largely by the queue management routines described in section 4.4. The following sections describe the implementation of the system clock for RT Threads, and how context switches are performed.

4.5.1 System Clock

A "clock" is required by RT Threads for the purpose of making threads ready at their starting time. This is implemented using an interval timer (setitimer()) which causes periodic alarm signals to be sent to the process. These signals are caught by a signal handler called the tick handler. The tick handler checks the sleep queue to see whether any threads have a starting time less than or equal to the current time (obtained using RttGetTimeOfDay()). All such threads are taken off the sleep queue and made ready. If there is a thread which should preempt the currently running thread, a context switch to that thread takes place immediately, within the signal handler. This means that there may be no actual return from the signal handler for an indefinite period of time, possibly longer than the interval between clock ticks. Thus, in order to ensure that subsequent alarm signals are caught at the correct time, the signal mask for the process must be modified before such a context switch in order to unblock alarm signals.

The following scheduling module routines are used to implement the clock and related functions:

static int wakeSleepers()

This routine is called by the tick handler to take all threads off the sleep queue whose starting times have passed, and make them ready by putting them on their respective ready queues. RttGetTimeOfDay() is called to get the current time for comparison, and the sleep queue is traversed using the queue management macros, dequeuing threads until one whose starting time is still in the future is encountered. Threads are ordered in the sleep queue by starting time, so it is not necessary to search the entire queue. Each thread which is dequeued from the sleep queue is put on its ready queue using EnqueueEdfTail_(). wakeSleepers() then returns the number of threads which were awakened.

static void tickHandler()

The tick handler is a routine registered with the operating system to be called at regular intervals. It is a signal handler which catches the SIGALRM signal. When it is called, the inOsFlag is first checked to see if it is set. If so, the tickPending field of _Globals_ is set, to defer the signal handling, and tickHandler() returns immediately. If not in a critical section, it calls wakeSleepers() to wake (make ready) any threads whose starting time has passed. If wakeSleepers() returns a non-zero value, at least one thread has been awakened, meaning that a context switch may be required. The SchedM scheduling macro (see section 4.5.2) is called to perform such a switch, if required. However, while in the signal handler, the underlying operating system blocks SIGALRM signals until the signal handler routine returns. If a context switch occurs, the routine will not return for an indefinite period of time, during which the SIGALRM signal would remain blocked, thus delaying future clock ticks indefinitely. To overcome this problem, the SigMask_() routine (see below) is called to unblock the SIGALRM signal before calling SchedM.

int SetTimeout_(TCB *thread, RttTimeValue wakeTime)

The SetTimeout_() routine is used to put a thread to sleep by placing it on the time ordered sleep queue. The thread's state is set to SLEP, and its starting time is set to wakeTime. The sleep queue is traversed using the queue management macros until a thread with a later starting time is encountered, and the new sleeper is linked into the queue ahead of that thread, unless the end of the queue is reached, in which case the new thread is linked to the tail of the queue.

void SigMask_(int how, int sig)

SigMask_() is a convenience routine used to block or unblock signals. The how parameter is used to select whether the signal is to be blocked or unblocked, taking as a value either SIG_BLOCK or SIG_UNBLOCK. The sig parameter is the signal to block or unblock, for example SIGALRM. The underlying sigprocmask() call is used.
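A plausible body for this routine, assuming the signature above (a sketch, not the actual source):

#include <signal.h>

void SigMask_(int how, int sig)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, sig);            /* mask containing only sig */
    sigprocmask(how, &set, NULL);    /* how is SIG_BLOCK or SIG_UNBLOCK */
}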

void StartClock_(int useconds)

StartClock_() is used to start the system clock. It uses the setitimer() call to initiate the generation of SIGALRM signals at regular intervals, that interval being specified in microseconds via the useconds parameter.

void StartOs_()

StartOs_() is used to initiate the signal related functionality of the system. The tick handler and SIGIO handler (see section 4.9) are registered with the operating system using the sigaction() call, and the RT Threads system clock is started by calling StartClock_().
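The underlying mechanism can be demonstrated with a small self-contained program; the handler body and the 10 millisecond period are assumptions chosen for the example.

#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t ticks = 0;

/* Stands in for the tick handler; the real handler wakes sleeping
 * threads and may perform a context switch. */
static void onTick(int sig)
{
    (void)sig;
    ticks++;
}

int main(void)
{
    struct sigaction sa;
    struct itimerval itv;

    sa.sa_handler = onTick;                 /* cf. StartOs_() */
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, NULL);

    itv.it_interval.tv_sec = 0;             /* cf. StartClock_(10000) */
    itv.it_interval.tv_usec = 10000;
    itv.it_value = itv.it_interval;
    setitimer(ITIMER_REAL, &itv, NULL);     /* start periodic SIGALRM */

    while (ticks < 100)                     /* run for roughly one second */
        pause();
    printf("caught %d ticks\n", (int)ticks);
    return 0;
}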

4.5.2 Context Switching

The following data structure, SchedInfo, is the type for the schedInfo field of a TCB.

typedef struct {
    int savearea[36];
    int new;
} SchedInfo;

The savearea field is a statically allocated piece of memory which is used to store the register state of a thread when execution switches to the context of another thread. Its size, 144 bytes, is large enough to accommodate the necessary registers for all of the architectures on which RT Threads is implemented. The new field is a flag used to determine whether a thread is being scheduled to run for the first time.

The routines which perform the actual context switches are implemented in assembler code because they require access to the registers for saving thread state, and because they employ a "trick" involving return addresses1. The assembler code implements 3 functions used to achieve user level context switching:

void startNewThread(void (*func)(), void *sp)

This routine is used to start a new thread. It sets the initial stack pointer for the thread to point to sp, the beginning of the stack memory for the thread as allocated in RttCreate(), and then jumps to func, which is the address of the entry routine for the new thread (in reality, the address of CallNclean_() is passed in, so that this routine can do cleanup if the actual thread entry routine returns).

1Solaris provides support for user level context switching (setcontext() and getcontext()). This support was not used in the RT Threads implementation, in part because it is not portable to other operating systems.

int saveThreadContext(void *savearea)

The saveThreadContext() routine is used to save the state (i.e., the registers) for a thread prior to a context switch. It takes a parameter savearea which is a pointer to memory which is used to save the current state of the thread. The savearea field of the TCB's schedInfo field is passed in for this purpose. When called to save the state of a thread, saveThreadContext() returns a non-zero value, so that the context switching code will know that another thread must be resumed.

void returnToThread(void *savearea)

The returnToThread() routine is used to resume a thread which had at some point previously saved its state using saveThreadContext(). The contents of the memory referred to by its parameter savearea are loaded into the registers. returnToThread() does an "unnatural" return, i.e., the function returns to the same address as the last call to saveThreadContext() for that thread did, except this time the return value is 0. Thus the context switching code can tell that a thread is being resumed and it should continue with execution of user code for that thread.
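The effect is analogous to the standard setjmp()/longjmp() pair, as the following self-contained fragment illustrates; note that the return value convention is inverted relative to saveThreadContext(), which returns non-zero on the saving call and 0 on resumption.

#include <setjmp.h>
#include <stdio.h>

static jmp_buf savearea;        /* plays the role of a TCB's savearea */

int main(void)
{
    if (setjmp(savearea) == 0) {            /* cf. saveThreadContext() */
        printf("state saved; switching away\n");
        longjmp(savearea, 1);               /* cf. returnToThread() */
    }
    /* Control arrives here "a second time", as if the saving call
     * had returned again with a different value. */
    printf("thread resumed\n");
    return 0;
}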

The following routines are also involved in context switching:

void SchedThread_()

This routine is called to perform a context switch, by calling the appropriate assembler routine. When called, currthread must point to the TCB of the next thread to run. The state of currthread is set to THREAD_RUNNING. Then, if the new field for the thread is set, startNewThread() is called, and otherwise returnToThread() is called, resulting in the actual context switch. SchedThread_() never returns.

SchedM

The SchedM routine is defined as a preprocessor macro for efficiency. It is called by the RT Threads kernel whenever there is the possibility that a context switch is needed, and it calls the appropriate routines to effect the context switch if required. It must be called within a kernel critical section (i.e., ENTERINGOS must have been called previously), and LEAVINGOS is called within the macro. When it is called, the current thread must either be in the THREAD_RUNNING state, or one of the blocked states (e.g., SEMB). In the latter case, the thread will have just been put in a blocked state by the kernel code, and SchedM will put the thread on the appropriate scheduling queue. A context switch will necessarily occur, because the current thread is now blocked. On the other hand, if the current thread was in the THREAD_RUNNING state before the SchedM call, a context switch might not occur, since there might not be a ready thread which would preempt the current one. DeqIfReady_() is called to determine the next thread to run. If it returns currthread, no context switch takes place; otherwise saveThreadContext() is called to save the state of the current thread in preparation for a context switch. The saveThreadContext() call will return a non-zero value, and this condition will result in currthread being set to the TCB pointer returned by DeqIfReady_(). The same procedure is followed if the originally current thread was in a blocked state, except that DeqReadyThread_() is used to obtain the TCB pointer for the next thread to run, since it is known that it will not be currthread. At this point SchedThread_() is called to perform the context switch. The saveThreadContext() call is in a conditional statement, which, when the return value is zero, results in a LEAVINGOS call and a return to the code which called SchedM. This occurs when the originally preempted thread is scheduled to run again.

4.5.3 Modifying Scheduling Attributes

The RttSetThreadSchedAttr() API routine can be used to modify the scheduling attributes of a thread. This can result in an immediate context switch. If the thread is sleeping, it is first removed from the sleep queue with the DEQUEUE_ITEM() queue management macro and its state is set to THREAD_READY. The sleeping thread is dequeued because if the thread is to remain sleeping, but with a modified starting time, it will need to be re-enqueued on the sleep queue to ensure it ends up in the correct position in the time ordering. If the thread is on a ready queue, it is removed from that ready queue with DeleteReadyThread_() because it may change state or priority. RttGetTimeOfDay() is then called to determine the current time for comparison. If the new starting time for the thread is in the future, the thread's scheduling attributes are assigned the values indicated by the attr parameter, and if the thread was running or ready, it is placed on the sleep queue using SetTimeout_() and SchedM is called to schedule the next ready thread. Otherwise, if the thread having its attributes set is the currently running thread, its attributes are simply assigned their new values and SchedM is called. If the thread was not the currently running thread and its state is now THREAD_READY (because it had been ready or sleeping), it is put on the appropriate ready queue with EnqueueEdfTail_() and SchedM is called to schedule the next ready thread (which may or may not be the one just made ready).

4.6 Semaphores

The publicly visible type for semaphores is the opaque type RttSem (defined as a

void* in the interface). In the semaphore implementation, the RttSem type maps to

the private type SemaphoreStruct, i.e., RttSem variables passed into the semaphore

routines are actually pointers to SemaphoreStructs, and are cast accordingly within

the routines. The SemaphoreStruct type is a structure defined as follows:

typedef struct semaphore {

int magicNumb;

TCB *firstBlockedThread;

TCB *lastBlockedThread;

int semaphValue;

int mode;

} SemaphoreStruct;

The magicNumb field is a magic number used to ensure that the semaphore pointer

is valid. The firstBlockedThread and lastBlockedThread fields are TCB pointers for

the first and last thread in the queue of threads blocked on the semaphore. The

semaphValue field is the current value of the semaphore. The mode field indicates the mode of the semaphore, either first come first served (RTTFCFS) or priority based (RTTPRIORITY).

Semaphores are created using the RttAllocSem() routine, which allocates a SemaphoreStruct dynamically using SysMalloc_(), and initializes its fields (the semaphValue and mode values are obtained from the value and mode parameters). The pointer to the allocated memory is cast to RttSem and returned by reference. RT Threads semaphores are destroyed by calling the RttFree() routine, which frees the memory allocated for the semaphore using SysFree_(). RttFree() first checks, however, whether the semaphore value is less than zero, indicating that threads are blocked on the semaphore, in which case it returns RTTFAILED without freeing the semaphore.

Threads which are blocked on a semaphore maintain the pertinent information in the semInfo field of the TCB, which is a structure of type:

typedef struct {
    struct tcb *semaphNext;
    struct tcb *semaphPrev;
    RttSem semaphore;
} SemInfo;

The semaphNext and semaphPrev fields are links used to place the thread in the queue of threads blocked on a particular semaphore. The semaphore field is the semaphore which the thread is blocked on (or NULL if the thread is not blocked on a semaphore).

The following are routines local to the semaphore module used to add and remove threads from a semaphore's queue of blocked threads:

static void addBlockedThread(SemaphoreStruct *semaph)

This routine links the currently running thread (currthread) into the blocked queue for the semaphore indicated by the pointer semaph. Links from the semInfo field of the thread's TCB are used. The lastBlockedThread field of semaph is set to currthread, placing the thread at the tail of the queue.

static void removeThread(SemaphoreStruct *semaph, TCB *tcbPtr)

This routine removes the thread indicated by tcbPtr from anywhere in the blocked queue for semaph. This is an efficient operation because the TCB has the previous and next links which locate the thread in the queue. The removeThread routine is used when a thread waiting on a semaphore becomes unblocked (due to an RttV() call on the semaphore), or when a semaphore blocked thread is terminated.

static TCB *dequeueFirstBlockedThread(SemaphoreStruct *semaph)

This routine calls removeThread() with the firstBlockedThread field of semaph to remove the first blocked thread from the semaphore's blocked queue. It is used for first come first served mode semaphores.

static TCB *dequeueHighPrioBlockedThread(SemaphoreStruct *semaph)

This routine searches the blocked queue for semaph to determine which thread has the highest priority. If more than one thread has this priority, the one with the earliest deadline is chosen, and if there is more than one such thread, the one closest to the first blocked thread is chosen. The resulting thread is then removed from the queue using removeThread(). This routine is used for priority mode semaphores.
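This selection rule can be sketched as a single scan over the blocked queue; the node type is an assumed simplification (the real routine walks TCBs via their semInfo links).

typedef struct blocked {
    struct blocked *next;
    int priority;                /* larger value = higher priority (assumed) */
    long deadline;               /* smaller value = earlier deadline */
} Blocked;

/* Highest priority wins; ties go to the earliest deadline; remaining
 * ties go to the thread nearest the head, i.e., first come first served. */
Blocked *pickHighPrioBlocked(Blocked *head)
{
    Blocked *best = head, *cur;

    for (cur = head; cur != NULL; cur = cur->next)
        if (cur->priority > best->priority ||
            (cur->priority == best->priority &&
             cur->deadline < best->deadline))
            best = cur;          /* strict improvement keeps earlier ties */
    return best;
}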

void SemCleanup_(TCB *tcbPtr)

SemCleanup_() is called by a terminating thread to do semaphore related cleanup.

If the thread indicated by tcbPtr is blocked on a semaphore, it is removed from the semaphore's blocked queue using removeThread(), and the semaphore's value is incremented to reflect the fact that one less thread is blocked on it.

Threads can wait on or signal a semaphore using the RttP() and RttV()

API routines. Each takes a semaphore variable (type RttSem), previously created with RttAllocSem(), as its single parameter, and casts that parameter to a pointer of type SemaphoreStruct* so it can access the semaphore's state. Each also checks the semaphore's magic number to ensure it is a valid semaphore.

In RttP(), the semaphore's value is first decremented. If the new value is less than zero, the thread will block on the semaphore. If this is the case, addBlockedThread() is called to put the thread (i.e., the currently running thread, currthread) at the tail of the blocked queue for the semaphore. Then the state of the thread is set to SEMB (semaphore blocked), and the scheduling macro SchedM is called to put the thread on the semaphore blocked queue (the scheduling queue, not to be confused with the semaphore's own blocked queue) and schedule the next ready thread (see section 4.5). If the new value of the semaphore is not less than zero, the thread does not block and RttP() returns immediately.

RttV() increments the semaphore's value. If the resulting value is less than or equal to zero, then there was at least one thread blocked on the semaphore, and thus a blocked thread should be made ready. If the semaphore's mode is RTTFCFS, the thread to be made ready is chosen with dequeueFirstBlockedThread(), and if the mode is RTTPRIORITY, the thread is chosen with dequeueHighPrioBlockedThread().

The chosen thread is made ready with EnqueueEdfTail_(), and the scheduling macro SchedM is called. If the newly ready thread has scheduling attributes which take precedence over those of the currently running thread, the current thread will be preempted immediately.
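In condensed form, the logic of the two routines is roughly as follows. The magic number checks, critical section protection, return codes and the exact invocation style of the SchedM macro are elided, so this sketch should not be read as the actual source.

int RttP(RttSem s)
{
    SemaphoreStruct *sem = (SemaphoreStruct *)s;

    if (--sem->semaphValue < 0) {        /* must block */
        addBlockedThread(sem);           /* tail of the semaphore's queue */
        currthread->state = SEMB;
        SchedM;                          /* switch to the next ready thread */
    }
    return RTTOK;
}

int RttV(RttSem s)
{
    SemaphoreStruct *sem = (SemaphoreStruct *)s;
    TCB *t;

    if (++sem->semaphValue <= 0) {       /* a thread was blocked */
        t = (sem->mode == RTTFCFS)
            ? dequeueFirstBlockedThread(sem)
            : dequeueHighPrioBlockedThread(sem);
        t->state = THREAD_READY;
        EnqueueEdfTail_(t);              /* make the chosen thread ready */
        SchedM;                          /* may preempt the caller */
    }
    return RTTOK;
}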

4.7 Inter Thread Communication

Information used for RT Threads' message passing facility is maintained in the itcInfo field of the TCB of each thread. This field is a structure of type ITCInfo, defined as follows:

typedef struct {
    int remoteItc;
    RttThreadId remoteThreadId;
    void *sendmsg;
    u_int sendlen;
    struct tcb *peer;
    void *receivedmsg;
    u_int receivedlen;
    struct tcb *waitingFirst;
    struct tcb *waitingLast;
    struct tcb *waitingNext;
    struct tcb *waitingPrev;
} ITCInfo;

The first field, remoteItc, is a flag indicating whether the thread is currently in the midst of a cross address space message passing transaction. The second field, remoteThreadId, is the thread id of the remote sending thread in such a transaction (see section 4.7.1). The fields sendmsg and sendlen are the message buffer and buffer length for the request message, maintained by the sending thread in the event that the intended receiver is not blocked receiving a message at the time the message is sent. The peer field is used to communicate the identity of the sending thread to the receiving thread. The receivedmsg and receivedlen fields are the buffer and buffer length for depositing the request message at the receiving thread, and for depositing the reply message at the sending thread. The remaining fields are used to implement a queue of waiting senders at a receiving thread, for the case where multiple threads send to a single thread before it does a receive. The waitingFirst and waitingLast fields are used to mark the ends of the queue in the intended receiver's TCB. The waitingNext and waitingPrev fields are links in the TCB of each of the sending threads in the queue.

The following local routines are defined in the ITC module:

static void addWaitingSender(TCB *to)

This routine uses the links in the itcInfo field of a sending thread's TCB to place the thread in the queue of waiting senders for an intended receiver thread, as specified by the to parameter.

static void dequeueSender(TCB *tcbPtr)

This routine dequeues the first sender thread from the queue of waiting senders maintained at a receiving thread. A first come first served policy is used for enqueu• ing and dequeuing senders.

static TCB *getTcbFromRemoteThreadId(RttThreadId thread)

This routine maps a remote thread id to the TCB in which it is stored. The TCB table is searched until a match is found.

void ItcThreadCleanup_(TCB *tcbPtr)

This routine is called to perform ITC related cleanup when a thread terminates.

If any sender threads are on the waiting sender queue of the thread referred to by tcbPtr, they are taken off the scheduler's send blocked queue and made ready.

The RttSend() routine first compares the IP address and port number in the destination thread id (determined using the GetIpAddrFromThreadId() and GetPortNoFromThreadId() macros) with the local RT Threads environment's host id and local id to see whether the intended receiver is a local thread. If they differ, the receiver is in another threads environment and ExecuteRemoteRequest_() is called to perform the cross address space communication (see section 4.7.1). Otherwise, the receiver is local, in which case a pointer to the TCB of the intended receiver is obtained to do the processing of the local send (the sending thread is the currently running thread, i.e., currthread). The macro getTcbFromThreadId() is used to convert a thread id to a TCB pointer. This is done with the to parameter thread id, unless its instance number is 0, which indicates that the intended receiver is the local name server thread (see section 4.7.1), in which case getTcbFromThreadId() is used with the globally stored name server thread id to determine its TCB pointer. This will occur either if the name server thread is the destination in a cross address space communication, in which case this will be the local proxy send, or if the name server thread id was obtained locally with the RttGetNS() call. The receivedmsg field for the sender thread is then set to the reply buffer address, passed in as the rData parameter to RttSend(). The FindThreadOnQ_() macro is used to see if the intended receiver is on the receive blocked (RBLK) scheduling queue. If so, the receiver will have set the receivedmsg and receivedlen fields of its itcInfo structure, and the sender copies the request data (parameter sData) into the receiver's buffer. The length of the copy is the minimum of the send length specified by the sender (parameter slen), and the maximum receive length specified by the receiver (the value in the receivedlen field). The sender then sets itself as the receiver's peer, sets its own state to waiting for reply (WTFR), makes the receiving thread ready by changing its state to THREAD_READY and putting it on its ready queue using EnqueueEdfHead_(), and then blocks itself and makes the next thread ready (possibly the receiver) by calling the scheduling macro SchedM. In the event that the intended receiver exists but is not blocked receiving, the sender puts itself on the receiver's queue of waiting senders using addWaitingSender(), sets its own sendmsg and sendlen fields for the receiver to eventually copy the request message from, sets the receiver as its peer, sets its own state to send blocked (SBLK) and calls SchedM to block itself and schedule the next ready thread. Note that RttSend() will always block. When the sending thread eventually becomes unblocked, it checks its own receivedlen field to find out the length of the reply message, as set by the replying thread, and returns (the reply length is returned by reference).

In RttReceive(), the receiving thread starts by checking the waitingFirst field of its TCB's itcInfo field. If it is not TNULL, there is at least one message waiting, and the thread can pick it up right away without blocking. I.e., the waitingFirst field will be the TCB pointer for the next sender, and the receiver copies the message stored in its sendmsg field into its own receive buffer (the Data parameter to RttReceive()), the copy length being the minimum of the send length (specified in the sender's sendlen field) and the receive length (the dereferenced value of the len parameter to RttReceive()). The receiver then changes the sender's state from send blocked (SBLK) to waiting for reply (WTFR), switching it to the appropriate scheduling queue, and then returns. In the event that there are no messages waiting for the receiver, it sets its own receivedmsg and receivedlen fields to indicate the buffer to eventually be filled with a request message, then sets its state to receive blocked (RBLK) and calls SchedM to block itself and schedule the next ready thread.

RttMsgWaits() simply checks waitingFirst for the calling thread, returning 0 if it is NULL and 1 otherwise.

RttReply() first checks the thread id to which the reply is being sent (parameter sndr) to see if it is a local or a remote thread. If it is a local thread, its TCB pointer is obtained using the getTcbFromThreadId() macro, otherwise the getTcbFromRemoteThreadId() routine is used (this maps the remote thread id to the TCB for a proxy thread which does a local send for the remote thread). In the local case, FindThreadOnQ_() is used to see whether the destination thread (i.e., original sender) is actually blocked waiting for a reply. If not, RTTNOTBLOCKED is returned. Otherwise, the reply is copied into the receivedmsg field in the sender's TCB, the sender is made ready by setting its state to THREAD_READY and calling EnqueueEdfHead_(), and SchedM is called. RttReply() never blocks, but there may be an immediate context switch to the sender if it has a higher priority, or equal priority and an earlier or equal deadline.

4.7.1 Cross Address Space ITC

The cross address space ITC facility is for the most part built on top of RT Threads, but since it makes use of a few of the kernel's facilities to make message passing location transparent, it is considered part of the RT Threads kernel. Location transparency is achieved by embedding each thread's location in the RttThreadId, and extracting it within the RttSend() call. TCP/IP is used as the underlying protocol for sending messages across address space boundaries. Thus, if within the RttSend() call it is determined that the RttThreadId of the intended receiver refers to a thread with an IP address and port number for an RT Threads environment other than the local one, ExecuteRemoteRequest_() is called to set up a TCP connection to the remote environment.

For receiving messages from remote senders, each RT Threads environment can enable an ITC server system thread which listens for incoming connection requests, and spawns worker threads which forward ITC messages to the appropriate local thread using a local RttSend(). The following data structures are used by the cross address space ITC implementation:

struct ITCinfo {
    u_int slen;
    u_int rlen;
    RttThreadId sender;
    RttThreadId receiver;
} ITCinfo;

The ITCinfo structure is used for transmitting control information. It precedes the actual message data in every transmission from sender to receiver. The slen field is the message data length, and rlen is the maximum length reply the sender will accept. The sender field is the sender thread id (which will be returned to the receiving thread by the RttReceive() call at the remote end) and receiver is the thread id of the intended receiver, so it can be located when the message gets to the remote threads environment.

struct ReplyInfo {
    int retCode;
    u_int replyLen;
} ReplyInfo;

The ReplyInfo structure is used to transmit control information back to the sender along with the reply message. The retCode field is used to relay the return code for the remote RttSend() back to the sender, and the replyLen indicates the length of the reply.

The RttNetInit() API routine is used to enable cross address space ITC. It sets the host id (hid) and local id (lid) for the local RT Threads environment according to the ipAddr and port parameters passed to it. Either or both of these parameters may be zero, in which case system chosen values are used. In the case of the IP address, the underlying gethostbyname() call is used to determine the local IP address (which may not be the desired one if the host is multi-homed). In the case of the port number, the underlying bind() call is used to determine the port, if one is not chosen by the application. The routines of the remote ITC module which implement cross address space message passing are described below:

int ExecuteRemoteRequest_(RttThreadId to, u_int slen, void *sData, u_int *rlen, void *rData)

This routine is used to perform a remote send. The parameters are the same as those passed to the RttSend() call from which ExecuteRemoteRequest_() is called. A socket is created, bound to a system chosen port, and used to connect (using RttConnect()) to the IP address and port embedded in the to parameter thread id. An ITCinfo structure is then filled in with the slen and dereferenced rlen parameter values, the thread id of the currently running thread for the sender, and the receiver being the to parameter. This control information structure is encoded for network transmission using the locally defined encodeInfo() routine, which converts each field to network byte order using htonl(). The control information structure (variable info) is then sent to the intended receiver using RttWriteN(), followed by another RttWriteN() call to send the actual message data. The message is considered to be a stream of bytes, and any byte order considerations are the responsibility of the higher level application calling RttSend(). At this point ExecuteRemoteRequest_() will block waiting for the reply from the remote receiver. First RttReadN() is called to receive the reply control information, which is read into a ReplyInfo structure. If the return code (retCode) is not RTTOK, the TCP socket is closed and the return code value is returned to RttSend(), where it is returned to the original caller. Otherwise, the replyLen field is used to determine how many more bytes are to be read from the connection in a subsequent RttReadN() call, which reads the data into the rData buffer supplied by the original caller. The dereferenced value of the rlen parameter is also set to the reply length. The TCP connection is then closed and RTTOK is returned.
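The byte order handling can be illustrated with a small self-contained fragment; the structure below is a stand-in with assumed 32-bit fields rather than the actual ITCinfo declaration (thread ids would need their own encoding).

#include <arpa/inet.h>
#include <stdint.h>

struct WireHeader {              /* stand-in for the control information */
    uint32_t slen;
    uint32_t rlen;
};

/* cf. encodeInfo(): convert each field to network byte order before
 * writing the header to the TCP connection. */
void encodeHeader(struct WireHeader *h)
{
    h->slen = htonl(h->slen);
    h->rlen = htonl(h->rlen);
}

/* cf. decodeInfo(): the receiving side applies the inverse. */
void decodeHeader(struct WireHeader *h)
{
    h->slen = ntohl(h->slen);
    h->rlen = ntohl(h->rlen);
}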

static RTTTHREAD itcServer()

This routine defines the ITC server thread, created in the RttNetInit() call. The ITC server first calls RttListen() on the local socket descriptor (also created by RttNetInit()), to listen for incoming connection requests from other RT Threads environments doing remote sends to this one. The server then goes into a loop, first calling RttAccept() to accept any incoming connections. Whenever RttAccept() returns with the socket descriptor for a new connection, it calls RttCreate() to create an ITC worker thread (see the itcWorker() description below), passing it the socket descriptor as its argument.

static RTTTHREAD itcWorker(void *arg)

An ITC worker thread is created each time a new connection corresponding to an incoming remote send request arrives, leaving the ITC server available to handle more incoming requests. The arg argument of itcWorker() contains the socket descriptor for the TCP connection on which the request control information and data is being sent. RttReadN() is first called to receive the control information into an ITCinfo structure, the contents of which are then decoded to host byte order using the locally defined decodeInfo() routine. The slen field from that structure is then used to determine how long the message data will be, so that a buffer (local variable sData) can be allocated for it and it can be read off the TCP connection using RttReadN(), provided it is not a zero length message, in which case sData is set to NULL. At this point the ITC worker thread prepares to do a local send, with the ITC worker acting as a proxy for the remote thread originally doing the send. RttSend() is used to do the local send. However, the corresponding local RttReceive() needs to return the thread id of the remote sender, and not that of the ITC worker (proxy), as would normally be the case. To accomplish this, the remoteItc field of the ITC worker's itcInfo TCB field is set to 1, and the remoteThreadId field is set to the thread id of the remote sender, as obtained from the sender field of the control information structure received on the TCP connection. Within the local RttReceive() implementation, before the sender thread id is returned to the receiving thread (via its reference parameter), the remoteItc flag is checked, and if set, the remoteThreadId thread id is returned instead of that of the local sender. Once the local RttSend() returns to the ITC worker, the worker can send the reply to the remote sender. The retCode field of a ReplyInfo structure is assigned the return value of the RttSend() call. If the return value is RTTOK, the replyLen field is set to the reply length returned by RttSend(), otherwise it is set to zero. The fields of the ReplyInfo structure are then converted to network byte order using htonl(), and it and the reply message are sent to the remote sender using RttWrite(). The TCP connection is then closed and the ITC worker thread terminates.

The cross address space ITC facility uses a name server thread mechanism so that a thread in one RT Threads environment can locate (i.e., find out the thread ids of) threads in other environments. RT Threads does not provide an actual name service, but rather provides a way to locate a single well known thread in any given RT Threads environment, such an environment itself being uniquely identified by its IP address and port number, as set with the RttNetInit() call. The mechanism used to identify the special name server thread is to give it an instance number of 0, while no other threads are allowed to have an instance number of 0. This is done by the RttCreateNS() call. A remote thread can use RttGetNS() to construct (without doing any communication) a thread id containing the IP address and port number of the destination threads environment. When RttSend() is called with this thread id, the remote receiver will notice the 0 instance number and send the message to the well known name server thread there. This is actually done within the local RttSend() call at the remote receiver.

4.8 Dynamic Memory Allocation

The RT Threads routines for dynamic allocation of heap memory are implemented simply as wrappers around the standard C library routines malloc(), calloc(), realloc() and free(). The corresponding API routines add protection from context switches (using ENTERINGOS and LEAVINGOS), because the underlying library routines are not necessarily re-entrant. Private (to the RT Threads kernel) routines SysMalloc_() and SysFree_() are also provided for memory allocation within the kernel, and must be called from within a critical section (i.e., kernel code protected with ENTERINGOS and LEAVINGOS). These routines simply call the underlying malloc() and free(), but are provided to facilitate alternate implementations.
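A plausible shape for one of these wrappers is sketched below; the wrapper name and signature are assumptions, and the critical section macros are stubbed out for the example.

#include <stdlib.h>

/* Stubs so the sketch is self-contained; the real macros mark kernel
 * entry and exit (e.g., by setting and clearing inOsFlag). */
#define ENTERINGOS()
#define LEAVINGOS()

/* Sketch of an API wrapper around the C library allocator; the
 * critical section prevents a context switch inside the potentially
 * non-reentrant library call. */
void *RttMalloc(unsigned int size)
{
    void *p;

    ENTERINGOS();
    p = malloc(size);
    LEAVINGOS();
    return p;
}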

4.9 External I/O

The asynchronous I/O features of UNIX are used to implement asynchronous I/O in RT Threads, i.e., all file (and socket) descriptors are set to be non-blocking and to have a SIGIO signal generated whenever I/O is complete. When an I/O routine is called by a thread (e.g., RttRead()), the underlying UNIX system call (e.g., read()) is called. If that system call returns with an indication that the call would block, the thread is placed on the I/O blocked (IOBK) queue. SIGIO signals are caught by the I/O handler routine, which uses the select()2 system call to determine which file descriptors have completed I/O. Each blocking API routine has a corresponding local routine which is called to complete the I/O operation when it is known that the descriptor is ready. For example, the read() system call is called in the RttRead() routine to read from the file or socket indicated by the file descriptor passed to it. If the read() call returns with an error and errno indicates that the call would have blocked, the calling thread is placed on the I/O blocked queue and changes its state to IOBK. When the I/O is ready, the SIGIO handler will catch a SIGIO signal and the local doRead() routine will be called to call read() again. If it completes successfully, the thread will be made ready again and a context switch will occur if the thread should run. The following structure, which is maintained as the ioInfo field of the TCB, is used to communicate the I/O operation to be performed and its parameters to the lower level I/O routine (e.g., doRead()).

2For Solaris, the poll() system call is used.

typedef struct {
    char ioOp;
    int ioRes;
    char *ioBuf;
    int ioNumBytes;
    int ioFlags;
    char *ioSrcBuf;
    int *ioSblPtr;
    int ioSeekPos;
    int ioSeekMode;
    int ioA1;
    int ioA2;
    int ioA3;
    int ioA4;
    int ioA5;
} IOInfo;

The ioOp field indicates the I/O operation which is to be performed. Constants are assigned to indicate various I/O operations, for example the SNDM constant flags the sendmsg() operation. The ioRes field is used to communicate the result of the underlying I/O operation back to the calling thread. The other fields of the IOInfo structure are used to communicate various parameters to the lower level routine. The ioA1, ioA2, etc., fields are general purpose fields used for any I/O call arguments, except that ioA1 is always used for the file descriptor argument.

static int enableSigio(int fd)

This routine uses ioctl() system calls to mark the given file descriptor for non-blocking I/O and asynchronous I/O notification with the underlying operating system. This means that normally blocking system calls on the file descriptor will not block, but will return an error value and set the errno system global to indicate that the call would block (EWOULDBLOCK or EAGAIN, depending on the version of UNIX).

It also means that whenever I/O is ready for completion at the operating system level, for example if data has arrived on the network for a given socket, a SIGIO signal will be generated, and a subsequent call for the associated file descriptor will not block.
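A plausible shape for this routine on a BSD-derived UNIX is sketched below; the exact ioctl and fcntl requests vary between the UNIX versions mentioned in this chapter, so the flags used here are assumptions.

#include <sys/ioctl.h>
#include <fcntl.h>
#include <unistd.h>

static int enableSigio(int fd)
{
    int on = 1;

    if (ioctl(fd, FIONBIO, &on) < 0)      /* non-blocking I/O */
        return -1;
    if (ioctl(fd, FIOASYNC, &on) < 0)     /* generate SIGIO on readiness */
        return -1;
    /* Deliver SIGIO to this process rather than the process group. */
    return fcntl(fd, F_SETOWN, getpid());
}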

void InitIO_()

InitIO_() is called to initialize I/O for an RT Threads environment. It clears the global signal masks and then sets descriptors 0, 1 and 2 (standard input, standard output and standard error) in fdmask. It also enables SIGIO generation for standard input by calling enableSigio().

int checkBlkIoOperations(int selected)

This routine traverses the I/O blocked queue, and for each thread in the queue, checks whether the appropriate global signal mask (wiomask or riomask) has been set for its file descriptor (which is in the ioA1 field of the TCB). If so, I/O is ready for completion and the appropriate lower level I/O routine can be called. A switch is done on the ioOp field of the thread's TCB to determine the appropriate signal mask to check (e.g., wiomask for output I/O such as write() or sendmsg()), and the appropriate lower level routine to call (e.g., doWrite() for an ioOp of WRITE).

The number of operations which were completed is returned.

void IoHandler_()

This routine defines the SIGIO handler, registered to be the signal handler for SIGIO signals. When called, the global inOsFlag is first checked to see if the interrupted thread was in a critical section in the kernel. If so, the handling must be deferred, which is done by setting the ioPending field of _Globals_ and returning immediately. Otherwise, the signal handler enters a critical section by calling ENTERINGOS. Then, select() is called to determine which file descriptors (hence, which threads) have I/O ready for completion. The descriptors to be checked are maintained in a global (to the I/O module) mask, called fdmask, which is of type sigset_t. Two other masks global to the module, riomask and wiomask, are set to fdmask and passed to select() to determine which file descriptors are ready for reading (riomask), and which are ready for writing (wiomask). If any descriptors are selected, checkBlkIoOperations() is called to complete the operations. At this point SIGIO signals are unblocked using SigMask_(), and SchedM is called in case one of the threads whose I/O was just completed would preempt the thread which was interrupted by the handler.

4.10 Mapped Implementation

Neither Win32 nor Windows NT provides an equivalent to the UNIX system call setitimer() for generating alarm signals at a regular interval. Win32 does provide similar functionality with its multimedia timer facilities, allowing a callback routine to be called at regular intervals. Thus the tick handler could be implemented as one of these multimedia callbacks. Unfortunately, callbacks will not occur if there is an outstanding callback which has not returned. Thus a context switch within the tick handler would effectively cause the scheduling clock to stop until that thread ran to completion. This could easily lead to incorrect scheduling behaviour (i.e., a thread with higher scheduling precedence might miss its "wake up call"). A suitable solution was not found, which provided the motivation for trying a new approach, the mapped model.

In the mapped model, RT Threads is implemented on top of an existing threads package, with each RT Threads thread mapped directly to a thread native to the underlying package. Thus the RT Threads implementation can take advantage of some of the features of the underlying package, such as context switching and asynchronous I/O among threads. A mapped implementation of RT Threads was done for both AIX 4.1 PThreads and for the Win32 Software Development Kit3.

The Win32 implementation is described here, i.e., those features which differ from the native host implementation.

All of the underlying Win32 threads are suspended at any given time, except the one underlying the currently running RT Thread. Context switches are performed by suspending the currently running thread's underlying Win32 thread, and resuming that for the new thread to be scheduled. The tick handler runs as a separate Win32 thread which executes in a loop, blocking on a Win32 semaphore on each pass. The multimedia timer callback routine simply signals this semaphore to effect the clock tick. The tick handler can then perform context switches by suspending one thread and resuming another. Note that the RT Threads scheduler is still used and not the Win32 scheduler, since the Win32 scheduler does not give the desired scheduling behaviour. Mapping to underlying Win32 threads is used merely as a technique for performing context switches.

Not all context switches occur in the tick handler. Often a thread yields control itself, for example by blocking on a semaphore. In this case the thread cannot suspend itself first and then resume the next thread. Instead, it first resumes the next thread and then suspends itself. This could, however, cause a context switch before it actually suspends itself. To overcome this problem, the thread first boosts its Win32 priority before resuming the next thread, to ensure that a context switch does not take place until it suspends itself. When the original thread is resumed again, it lowers its priority to the normal priority reserved for the underlying Win32 threads.

3Win32 provides portability between Windows NT and Windows 95
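A sketch of this yield sequence in terms of the Win32 primitives is shown below; the function itself, and the assumption that real (duplicated) thread handles are available, are illustrative rather than the actual implementation.

#include <windows.h>

/* Yield from the current RT Thread (self) to the next one (next). */
static void mappedYield(HANDLE self, HANDLE next)
{
    /* Boost so that resuming next cannot preempt us before the
     * SuspendThread() call below. */
    SetThreadPriority(self, THREAD_PRIORITY_HIGHEST);
    ResumeThread(next);          /* next thread becomes runnable */
    SuspendThread(self);         /* blocks here until resumed in turn */
    /* Running again: drop back to the normal priority reserved for
     * the underlying Win32 threads. */
    SetThreadPriority(self, THREAD_PRIORITY_NORMAL);
}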

Mapping RT Threads to Win32 threads made it possible to take advantage of Win32's asynchronous I/O among threads. I.e., this made it possible for the

RT Threads I/O routines to simply call the underlying Win32 I/O routines (e.g.,

_read()) and block if necessary rather than using an I/O queue. Additional scheduling is required to resume another thread before the actual I/O call (in case it blocks), and to ensure that the correct thread is scheduled to run when I/O completes, since during the I/O call (and immediately before and after), more than one underlying

Win32 thread is in an unsuspended state.

4.11 Native Microkernel Implementation

There were at one time plans to implement RT Threads as a stand-alone kernel on top of bare hardware for the Intel Pentium architecture. The motivation for this was to provide a very efficient environment for implementing the Continuous Media

File Server, one in which real time guarantees could be provided. However, the Kea microkernel operating system had already been developed at UBC, allowing the RT

Threads scheduler to be dropped into the kernel, so that Kea's threads would be scheduled according to RT Threads' scheduling policy. This put RT Threads close to the hardware, made it possible to make the RT Threads environment the only process4 in the system, and eliminated the need to develop new operating system support from the ground up.

4domain, in Kea terminology

Chapter 5

Performance

The performance of a threads package can be measured on a number of criteria. The most straightforward is measurement of the overhead of specific calls and threads-related functional aspects. The term functional overheads will be used in this thesis to describe such overheads. The most important of these are for synchronization calls (e.g., for mutexes), for thread creation, and for context switching.

Less straightforward is the impact on overall application performance of using a threads package. While functional overheads play a part in overall performance, their effect may be insignificant compared to performance improvements due to concurrency achieved by using threads for blocking I/O calls, so that computation may continue while I/O requests are being serviced by external hardware. The performance of a particular threaded application may be highly dependent on the nature of the application and the way it is implemented (i.e., its decomposition into concurrent tasks). It is difficult to choose representative benchmark applications which will show performance differences due to differences in scheduling policies or implementations between different threads packages. Instead, the approach taken here is to try to measure how well RT Threads performs in terms of meeting its own real time scheduling criteria.

Hostname       Operating System   Architecture
Stephen        SunOS 5.5.1        SPARC 2
rickards       SunOS 5.5.1        Pentium 166
cascade        SunOS 5.5.1        UltraSPARC
ice            SunOS 4.1.3C       SPARC 10
blackdog       AIX 4.1            PowerPC 601
appaloosa      AIX 3.2.5          PowerPC 601
dsg            AIX 3.2.5          PowerPC 601
monteith       Windows NT 3.51    Pentium 100
grouse         Windows NT 4.0     Pentium 200
sumas          Windows NT 4.0     Pentium 200
asp            Windows 95         Pentium 120
robson-win95   Windows 95         Pentium 200

Table 5.1: Host operating systems and architectures

In the tests described in this section, performance measurement times were averaged over several iterations, and unrelated overheads were subtracted where possible. All times listed in the tables are in microseconds. Table 5.1 shows a list of hostnames of the machines used in the tests, and their operating systems and architectures.

5.1 Functional Overheads

The following sections describe experiments used to measure the functional overheads of thread creation, semaphore calls and context switching for RT Threads.

Measurements were made for other threads packages as well, for comparison.

5.1.1 Thread Creation

The importance of thread creation performance depends on the nature of the application using threads. If only a few long lived threads are created, then creation time is not critical. However, if threads have a low overhead for creation ("cheap" threads), implementation approaches which make liberal use of dynamic thread creation may become practical. An example would be protocol stacks which allocate a thread per packet for processing.

The results for thread creation testing are summarized in table 5.2. Thread creation times for RT Threads were highly dependent on the size of the stack allocated (i.e., they reflect the time taken to allocate the stack using malloc() within the call). Solaris Threads and Pthreads allow a stack to be allocated by the user, and stacks were allocated using malloc() in the tests, the time for which is included in the results. Values in parentheses show creation times with average malloc() times factored out. For Win32 Threads, a stacksize is specified at creation, but the stack is allocated by the system. For the Win32 mapped RT Threads implementation, the stack of the underlying Win32 thread is utilized, and no explicit malloc() is done.

Creation times were measured as an average of 1000 sequential creation calls.

A fairly high degree of variability was observed between successive repeats of the same test, thus the results shown are very approximate. They do show, however, that creation times for RT Threads are comparable to those for the other threads packages tested. They are generally better than those for the UNIX thread packages, several times better than those for AIX 4.1 Pthreads. They are only slightly worse than those for Win32 Threads, a good result considering that an underlying Win32 Thread creation is included in the cost of an RT Thread creation in the mapped implementation.

Hostname       Threads Package    1024 bytes   4096 bytes   16384 bytes
Stephen        RT Threads         280 (65)     922 (110)    1830 (95)
               Solaris Threads    300 (117)    856 (144)    1744 (265)
               Pthreads           305 (118)    867 (169)    1698 (240)
rickards       RT Threads         82 (32)      230 (40)     425 (40)
               Solaris Threads    93 (46)      249 (59)     468 (85)
               Pthreads           100 (52)     245 (50)     448 (82)
blackdog       RT Threads         103 (64)     275 (145)    297 (144)
               Pthreads           626 (589)    816 (680)    887 (720)
monteith       RT Threads         67           93           138
               Win32 Threads      62           86           129
grouse         RT Threads         45.5         65           100
               Win32 Threads      45.2         63           93
robson-win95   RT Threads         27           99           106
               Win32 Threads      25.5         95           98

Table 5.2: Thread creation times in microseconds

5.1.2 Semaphore Overhead

The overhead of semaphores is measured here for two reasons. Firstly, semaphores are the most common way of controlling access to shared data structures in RT Threads1. Threaded applications will often require a high number of such calls, although few might actually result in context switches if there is low contention for such access, as is likely to be the case in a uniprocessor environment. Thus the overhead of the call itself can be important. Secondly, context switching performance was measured using semaphores to cause the switches, and the overhead of the calls needs to be determined so it can be factored out.

1Semaphores can easily provide the same functionality as "mutex" constructs often found in other threads packages

Results for semaphore call overheads are included in table 5.3 along with results for context switching. Semaphore overhead is lower for RT Threads in all cases. This is to be expected, since RT Threads is implemented at user level and there is no system call overhead for these calls.

5.1.3 Context Switching

The context switching performance of RT Threads was tested on various machines, comparing the results with those for native thread packages on each machine. For Solaris and AIX, the native host implementation of RT Threads was used, and for Windows NT and Windows 95 the mapped implementation was used. The test consisted of two threads alternately blocking on and signaling semaphores. Semaphores are implemented using condition variables and mutexes for Pthreads, since semaphores are not part of the Pthreads interface. Table 5.3 shows the results, where times are averaged over 100,000 iterations. Context switch times include the overhead for semaphore calls. An estimate of the overhead for semaphore calls (one P and one V per switch) is also shown, determined by timing calls which did not result in context switches.

For context switching, the native host implementation of RT Threads outperformed the native threads packages in all cases. The mapped implementation showed a considerable performance penalty as compared to the underlying Win32 threads alone. While the mapped implementation can obviously not perform better than the threads package it is built on top of, work is ongoing to reduce this overhead.

Hostname   Threads Package    Context Switch   Semaphore Overhead
Stephen    RT Threads         61               7
           Solaris Threads    99               10
           Pthreads           117              8
cascade    RT Threads         10               2
           Solaris Threads    24               4
           Pthreads           28               3
rickards   RT Threads         16               1
           Solaris Threads    21               5
           Pthreads           25               2
blackdog   RT Threads         38               4
           Pthreads           115              27
monteith   RT Threads         81               4
           Win32 Threads      23               17
asp        RT Threads         182              5
           Win32 Threads      82               7

Table 5.3: Context switch times in microseconds

5.2 Inter Thread Communication

Tests were made for RT Threads to determine the average times for send/receive/reply transactions, measured as the blocking time for the RttSend() call. Results were averaged over several iterations, the number of which varied with the message size.

Message sizes of 0, 1024, 65536 and 1048576 bytes were tested, both for local ITC transactions (i.e., between threads in the same RT Threads environment), and for cross address space ITC between machines on a local Ethernet network. Longer times for longer messages would reflect the overhead in copying the message in the local case, and the network transmission overhead in the cross address space tests.

The other threads packages tested did not have a communication facility comparable to RT Threads' message passing, so comparisons could not be made.

Host           0 bytes   1024 bytes   64 KBytes   1 MByte
Stephen        129       707          36458       557847
rickards       28        121          7342        126042
ice            155       190          3360        118748
blackdog       85        103          4075        68528
appaloosa      71        101          4172        58207
cascade        20        48           1104        17803
monteith       147       183          2310        49832
grouse         81        116          2513        51269
robson-win95   164       210          2773        57403

Table 5.4: Local send/receive/reply times

Table 5.4 shows times, in microseconds, for ITC between two threads in a single RT Threads environment, for various message sizes. Times for a zero length message would reflect the overhead for two context switches, plus the overhead of the RttSend(), RttReceive() and RttReply() calls combined.

Table 5.5 shows times, in microseconds, for cross address space ITC between various machines on a 10BaseT Ethernet network.

Sending Host   Receiving Host   0 bytes   1024 bytes   64 KBytes   1 MByte
ice            appaloosa        26549     24186        190584      2554876
appaloosa      ice              21355     20682        217334      2646952
dsg            appaloosa        20092     21526        228083      2211720
appaloosa      dsg              20058     66237        200006      2194095
cascade        appaloosa        26404     32858        403443      3117465
appaloosa      cascade          25841     31825        400061      3136351
Stephen        appaloosa        31016     33668        398772      2620522
appaloosa      Stephen          20225     23389        411676      2841969
rickards       appaloosa        21535     20107        409943      2823639
appaloosa      rickards         21322     21810        415503      2958978
grouse         appaloosa        13048     10750        203123      2323293
appaloosa      grouse           20073     20073        168049      2274666
monteith       appaloosa        13607     11505        201410      2412353
appaloosa      monteith         20014     20035        200548      2497584
robson-win95   appaloosa        12209     10394        201115      2382625
appaloosa      robson-win95     20103     20002        201727      2413129
robson-win95   grouse           6433      12892        309948      2270538
grouse         robson-win95     10176     10174        301699      2410048
grouse         sumas            6966      7669         148257      2267732
sumas          grouse           6905      6831         138490      2153997

Table 5.5: Cross address space send/receive/reply times

5.3 Real Time Performance

Experiments were designed to determine how well RT Threads does in terms of real time scheduling in various environments. To truly be a "real time" threads environment (i.e., such that threads will start as close to their scheduled starting times as is possible given the priorities, deadlines and execution times of other threads in the environment), RT Threads threads would have to be the only threads in the overall system on a computer, or preempt other non-RT Threads threads or processes instantaneously whenever ready to run. Nevertheless, it is of interest to see whether

RT Threads would be useful on a variety of platforms where such guarantees are

not possible. In other words, can user applications comfortably make use of the

real time aspects of RT Threads' scheduling in the context of a general purpose

operating system such as UNIX or Windows NT.

The basic test program consists of a thread which repeatedly schedules itself with a starting time which is a random short time in the future. The rand() library function is used to derive a delay time of between 20 and 40 milliseconds. Each time the thread wakes up, the difference between the scheduled starting time and the actual starting time is measured, and the average and standard deviation are calculated at the end of the test. In the base test, no other user threads were running while the test thread was sleeping. Tests were run for 10,000 iterations. In the tables, "Idle System" means there were no user processes running, other than those described for the tests, but there may have been active system processes. "Loaded System" refers to a system with several active user processes. In tests labeled "RT Threads in Real Time Process", RT Threads was run as a UNIX real time process. In Solaris, real time processes cannot be preempted by non-real time processes, including system processes.

Unfortunately, the resolution of the system clock used for scheduling threads was 10 milliseconds (10,000 microseconds) in all cases, so even under ideal conditions (i.e., no other competition for CPU resources), a thread could start as much as just under 10 milliseconds, plus any post scheduling overhead, after its scheduled starting time. With a uniform random distribution of starting times, one would expect the actual starting times to be roughly 5 milliseconds plus the scheduling overhead later than the scheduled starting times, on average, with a standard deviation of around 3 milliseconds.
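(The expected figures follow directly from the clock granularity: a lateness distributed uniformly over [0, 10) ms has mean 5 ms and standard deviation 10/sqrt(12), or approximately 2.9 ms.) The core of such a test thread might look as follows. The RttSchAttr fields and scheduling calls are those used in appendix A; the scalar microsecond time representation, the current_usec() helper, and the assumption that rescheduling with a future starting time suspends the calling thread until that time are illustrative stand-ins, not the actual interface.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <sys/time.h>
    #include "rtthreads.h"

    #define ITERS 10000

    /* Stand-in clock: read the host clock in microseconds. The real
       test would read the RT Threads system clock. */
    static long current_usec(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec * 1000000L + tv.tv_usec;
    }

    RTTTHREAD testThr(void *arg)
    {
        RttSchAttr attrs;
        long target, diff;
        double sum = 0.0, sumsq = 0.0, avg;
        int i;

        RttGetThreadSchedAttr(RttMyThreadId(), &attrs);
        for (i = 0; i < ITERS; i++) {
            /* schedule this thread 20..40 ms into the future */
            target = current_usec() + 20000 + rand() % 20001;
            attrs.startingtime = target;    /* microsecond scalar assumed */
            RttSetThreadSchedAttr(RttMyThreadId(), attrs);
            /* ... wakes up here at (approximately) the target time ... */
            diff = current_usec() - target; /* lateness */
            sum += diff;
            sumsq += (double)diff * diff;
        }
        avg = sum / ITERS;
        printf("average %.0f us, std dev %.0f us\n",
               avg, sqrt(sumsq / ITERS - avg * avg));
    }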

Hostname        Average      Standard     Maximum
                Difference   Deviation    Difference
Idle System
Stephen             5856         3932       101182
rickards            5566         3044        46489
ice                 5389         3340        77746
blackdog            5738         2921        13093
appaloosa           5721         2922        12835
monteith            5022         2887        10961
grouse              5017         2890        10909
robson-win95        5969         3799        14954
Loaded System
cascade             5540         5206       415719
RT Threads in Real Time Process
Stephen             5689         2926        10494
rickards            5548         2940        10349

Table 5.6: Base Test: No background threads, no background processes

Table 5.6 shows results for the base test. As expected, results show an average difference somewhat higher than 5,000 microseconds (5 ms), and standard deviations of around 3,000 microseconds (3 ms). Maximum differences tend to be somewhat over 10,000 microseconds (10 ms), as expected, except on the various versions of SunOS where considerably higher maximums are seen. For the loaded system, a relatively high standard deviation (5.2 ms) and maximum (415 ms) were observed, yet the average was still reasonable. The results for RT Threads running in a real time process, the case which should have the best results, were only marginally better.

Hostname        Average      Standard     Maximum
                Difference   Deviation    Difference
Idle System
Stephen             5940         5641       299452
rickards            5623         3414        90510
ice                 6350        26121      1966924
blackdog            5763         3088        85118
appaloosa           5733         2977        40758
monteith            5027         2889        14322
grouse              5007         2887        10946
robson-win95        5955         3774        19356
Loaded System
cascade           137988       165153       814844
RT Threads in Real Time Process
Stephen             5694         2926        10711
rickards            5545         2940        10240

Table 5.7: 1 background thread, no background processes

Table 5.7 shows results for the base test, with a low priority spinning (i.e., in a tight loop using up CPU resources) thread running in the same RT Threads environment (since they are of higher priority, the test threads which take the timing measurements should preempt the spinning thread). For the most part, results were only slightly worse than for the base test, as one would expect. The exception was the loaded system, where the result was far worse. One possible explanation is that with the spinning thread, the RT Threads environment process is taking up considerable CPU resources and the underlying operating system's scheduler is less likely to schedule it when its alarm signal is generated (in other words, it will have used up its quantum).
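The low priority spinner itself is trivial. A sketch follows; per the convention in appendix A, a numerically larger priority value denotes a less urgent thread, so the spinner is created with a priority value greater than the test thread's.

    /* Background load: a thread that never blocks or yields voluntarily.
       Being lower priority, it runs only while the test thread sleeps,
       and is preempted whenever the test thread's starting time arrives. */
    RTTTHREAD spinner(void *arg)
    {
        volatile long n = 0;

        for (;;)
            n++;
    }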

Table 5.8 shows results where there is a background spinning process, which presumably would compete with the RT Threads environment for CPU resources.

Results are comparable to the base test, which suggests that an RT Threads application which is not compute intensive should still observe reasonably good real time scheduling behaviour, even when other processes are active in the system.

Hostname        Average      Standard     Maximum
                Difference   Deviation    Difference
Idle System
Stephen             5852         3767        71459
rickards            5600         3491       160631
ice                 5528         7451       379967
blackdog            5749         2919        12824
appaloosa           5728         2945        38246
monteith            5021         2889        18729
grouse              5019         2896        28729
robson-win95        5376         3557        13998
Loaded System
cascade             5469         3141        38402
RT Threads in Real Time Process
Stephen             5699         2925        10610
rickards            5553         2938        10340

Table 5.8: No background threads, 1 background process

Table 5.9 shows results where there is a low priority background thread and a background spinning process. In this case there is greater competition for CPU resources, which is reflected in the results. For the various versions of SunOS, the results are quite poor, especially for Solaris with an average of over 38 ms on both the SPARC2 and the Pentium 166, and a standard deviation of around 54 ms. However, for AIX and NT, the results are actually quite good, especially for AIX 4.1 and NT 3.51. For the loaded system, the results are similar to those in table 5.7, which is not surprising since the background process may not be adding much to the overall background load. For the RT Threads environment running in a real time process, the results are still virtually the same as for the base test, as would be expected.

Hostname        Average      Standard     Maximum
                Difference   Deviation    Difference
Idle System
Stephen            38978        54476       252087
rickards           38174        53869       216343
ice                22187        15538       106325
blackdog            5881         3259        49079
appaloosa           7701         4880        39121
monteith            5249         3930        49692
grouse              5967        10452       129904
robson-win95        6081         4486        34924
Loaded System
cascade           149734       170841       820978
RT Threads in Real Time Process
Stephen             5689         2926        10717
rickards            5538         2941        10233

Table 5.9: 1 background thread, 1 background process

Table 5.10 shows results where there is no low priority background thread, but there are two background spinning processes. Results are similar to those for the same test with only one background process.

Table 5.11 shows results where there is a low priority background thread and two background spinning processes. Results are somewhat worse than for the same test with only one background process, but still quite good for the various Windows operating systems.

Tables 5.12 and 5.13 show results with a compilation (GNU gcc for UNIX, Visual C++ 4.0 for Windows) being performed in the background, which may be a more realistic situation than having a process spinning in a tight loop. Results are similar to those with a background spinner, although generally somewhat worse. The results for the loaded system are better, but that may be due to changes in the overall load over time.

Hostname        Average      Standard     Maximum
                Difference   Deviation    Difference
Idle System
Stephen             5861         4028       103529
rickards            5571         3050        45683
ice                 5377         3355        74688
blackdog            5742         2917        11015
appaloosa           5726         2922        15373
monteith            5023         2886        10961
grouse              5021         2971        66134
robson-win95        3632         1971         7999
Loaded System
cascade             5484         3258       102751
RT Threads in Real Time Process
Stephen             5703         2924        10561
rickards            5554         2939        10308

Table 5.10: No background threads, 2 background processes

Hostname        Average      Standard     Maximum
                Difference   Deviation    Difference
Idle System
Stephen            73425        99654       380421
rickards           71768        98779       378353
ice                45873        30867       363165
blackdog           10153         9143        67849
appaloosa          13856        10482        60720
monteith            5608         6928        89987
grouse              7142        22149       250817
robson-win95        5044         3550        24279
Loaded System
cascade           173509       181477       714466
RT Threads in Real Time Process
Stephen             5692         2925        10672
rickards            5542         2941        10286

Table 5.11: 1 background thread, 2 background processes

Hostname        Average      Standard     Maximum
                Difference   Deviation    Difference
Idle System
Stephen             6203         4412        85392
rickards            5734         3511       101128
ice                 5767         3504        46513
blackdog            6070         5900       466406
appaloosa           5786         2995        24153
monteith            5022         2887        10961
grouse              5026         2895        24697
robson-win95        6323         9672       302643
Loaded System
cascade             5522         3167        28866

Table 5.12: No background threads, 1 background compile

Hostname        Average      Standard     Maximum
                Difference   Deviation    Difference
Idle System
Stephen            18157        39494       390007
rickards           16900        39887       526672
ice                31452       125322      1533903
blackdog           13165        36437       631339
appaloosa          12799        42353       628420
monteith            5073         3131        47836
grouse              5133         3758       169975
robson-win95        7068         7473       378022
Loaded System
cascade            35452        65177       724476

Table 5.13: 1 background thread, 1 background compile

Tables 5.14, 5.15 and 5.16 show the results of tests where two RT Threads environments were run concurrently on the same machine. In the first test (table 5.14), neither has a spinning background thread. In the second (table 5.15), one of the two RT Threads processes has a spinning background thread, and in the third (table 5.16), both do. Two results are shown in each case, one for each environment. In table 5.14, the results in all cases are reasonably close to those for the base case, except that the maximum difference tended to be much higher. In the second test, with one of the environments having a background thread spinning, results were again similar but slightly worse. In the third case, where both environments had a background spinning thread, the results are considerably worse. For the UNIX operating systems, the results are similar to those for table 5.9 (one background thread, one background process), which is not a surprising result as each of the RT Threads environments would act like a background spinning process for the other. For the first time, however, some very poor results are observed for the Windows operating systems. In each case only one of the RT Threads environments exhibits the poor results, the one which was started first. For Windows NT 4.0 and Windows 95 the average was almost double that for the base test, for the first environment started. For Windows NT 3.51, the average was 39 ms, the standard deviation 360 ms, and the maximum nearly 4 seconds. It is difficult to interpret these results; it may be a problem with interaction of the multimedia timers in the two processes, but that is merely speculation.

Hostname        Average       Standard       Maximum
                Difference    Deviation      Difference
Idle System
Stephen         5949/5930     4139/4189      81745/81525
rickards        5581/5587     3029/3028      38304/38040
ice             5483/5430     3069/3040      55919/55742
blackdog        5760/5737     3057/2919      95803/13274
appaloosa       5713/5828     2927/2909      24606/12268
monteith        5024/5018     3088/2896     113711/27781
grouse          5023/5015     3105/2887     116683/10946
robson-win95    5556/5649     3633/3621      14990/14982
Loaded System
cascade         5493/5491     3140/3185      35649/32227

Table 5.14: 2 RT Threads environments

Hostname        Average       Standard       Maximum
                Difference    Deviation      Difference
Idle System
Stephen         6023/5981     6199/4436     386794/99719
rickards        5590/5576     3030/3013      39661/45026
ice             5402/5414     3429/3012     138975/44699
blackdog        5767/5751     3376/2917     167307/10735
appaloosa       5767/5733     3298/2920     149607/12443
monteith        5023/5017     3034/2893      97619/18353
grouse          5044/5014     3296/2891     159975/12650
robson-win95    5782/9887     3729/5400      24741/24127
Loaded System
cascade         6802/5474     8488/3273     254772/103873

Table 5.15: 2 RT Threads environments, 1 with background thread

Hostname        Average        Standard        Maximum
                Difference     Deviation       Difference
Idle System
Stephen         33287/27166    51808/42670     352237/211320
rickards        32459/26167    50755/42056     239984/241409
ice             22008/22252    16807/16107     565581/99998
blackdog         6802/8588      4707/5354      172364/43439
appaloosa        9705/9343      8196/5574      595853/42113
monteith        39060/5181    360349/3471     3893468/40304
grouse           9922/5021     14915/2886      169975/10907
robson-win95    10082/5240      8954/3430      261841/22954
Loaded System
cascade         35079/32557    55748/52008     569994/384839

Table 5.16: 2 RT Threads environments, both with background thread

Tables 5.17, 5.18 and 5.19 show results for the base test, but with several threads doing the scheduling test simultaneously (100, 200 and 300 respectively). The results show totals for all threads combined. As the number of threads increases, the results show considerably larger differences between the scheduled and actual starting times. This probably reflects the fact that many threads will be scheduled to run at approximately the same time, and by the time they all get a chance to run briefly, the accumulated overhead will add up to cause a high average delay in actual starting times. To test this idea, the test with 300 threads was run again, but measuring the time when each thread is made ready in the RT Threads kernel rather than its actual starting time. Results are shown in table 5.20, and are more reasonable, with average differences ranging from 5 ms to 10 ms, and standard deviations of around 3 ms.

Hostname        Average      Standard     Maximum
                Difference   Deviation    Difference
Idle System
Stephen             9054         9777       617879
rickards            7251         2309        67894
ice                 7525         3368       191491
blackdog            7645         3199       169761
appaloosa           8451         3118       196294
monteith            5046         3530       298528
grouse              5051         3247       178528
robson-win95        9507         3737        98975
Loaded System
cascade             6002         3128        94445

Table 5.17: One RT Threads environment with 100 threads

Hostname        Average      Standard     Maximum
                Difference   Deviation    Difference
Idle System
Stephen            30756        22385      1602504
rickards           10245         2000       207351
ice                14419         5878       376447
blackdog           21088        14307       822681
appaloosa          27473         5371       318741
monteith            5130         4182       374181
grouse              5424        14117      1756181
robson-win95       12716         9299      1195975
Loaded System
cascade             7248         3880       145242

Table 5.18: One RT Threads environment with 200 threads

Hostname        Average      Standard     Maximum
                Difference   Deviation    Difference
Idle System
Stephen            59979        39486      1930563
rickards           12885         4026       415807
ice                39954        16533      1085964
blackdog           46488        53473      4108626
appaloosa          57031        62089      4254461
monteith           18921         4239       229975
grouse              8868        69442      6277325
robson-win95       33167        12136      1224975
Loaded System
cascade            12978        18406       512005

Table 5.19: One RT Threads environment with 300 threads

Hostname        Average      Standard     Maximum
                Difference   Deviation    Difference
Idle System
Stephen             5308         6086       354863
rickards            5167         3256        93237
ice                 5071         3432       191146
blackdog            5029         3607       151694
appaloosa           5058         3891       148228
monteith            5018         2889        19189
grouse              5092         3246       129425
robson-win95        2912         2122       404975
Loaded System
cascade             7344        13224       329269

Table 5.20: One RT Threads environment with 300 threads, measured in kernel

Table 5.21 shows results for a scheduling test done with a real multimedia client application used with the CMFS file server. The application presents JPEG motion video using a Parallax hardware decoder card, synchronized with a simultaneous audio stream, and in some cases a closed captioning text stream. Threads, one for each stream, are scheduled periodically by the client to display their respective media data. Measurements are made in the RT Threads kernel to see how close to their scheduled starting time the threads are made ready, on average. Two versions of the same video clip showing the music group Yes, one at 15 frames per second, one at 30, and both with synchronized audio, were tested, as was a clip from the movie "Reservoir Dogs" with synchronized 20 frames per second video, audio and text. Measurements were taken over a two minute period. Results were generally as expected. Results were somewhat poorer for the 30 frame per second clip, although that may in part be explained by the fact that calls to the X server for displaying video frames were protected by the RttBeginCritical() and RttEndCritical() calls, which block signals to the RT Threads kernel and may have caused scheduling clock ticks to be delayed.

Title                     Media   Average      Standard    Maximum
                          Type    Difference   Deviation   Difference
Yes Video (15 fps)        Video     4925         2936        29381
                          Audio     4963         3151        42423
Yes Video (30 fps)        Video     6247         2603        24592
                          Audio     7348         7971        65987
Reservoir Dogs (20 fps)   Video     4394         3373        47382
                          Audio     4779         4617        29077
                          Text      4890         5124        32107

Table 5.21: Multimedia client scheduling performance

Chapter 6

Related Work

A number of other threads packages and operating systems with kernel thread support are available, such as AIX 4.1 Pthreads [1], Solaris Threads [23, 9], Mach C Threads [5], etc. Some of these provide real time functionality. For example, AIX 4.1 Pthreads can specify a fixed priority FIFO policy, but this requires root authority, due to the danger of starving system processes because of global thread scheduling. Solaris threads offer a real time class of threads, but these real time threads also require root authority and require a high degree of caution, since they preempt system processes [11] and could cause the entire system to hang.

Other work has been done to provide real time support for existing packages. The Real Time Thread Package (RTTP) developed at Brown University is a C++ framework which is an extension of the Brown Thread Package. The RTTP implementation described in [24] is a wrapper for Solaris Threads, providing a mechanism for non-super users to create real time threads and for protection against runaway real time threads taking over system resources.

Real-Time Threads [26] is a real time threads package based on Mach C Threads. One of its main goals, like RT Threads, is to provide portability among operating system platforms. Unlike RT Threads, this package does dynamic schedulability analysis, and offers support for multiprocessor platforms. The admission algorithm does, however, require static analysis of maximum execution times for threads, and threads must be assigned to specific nodes in multiprocessor environments.

Alternate techniques for providing user-level thread scheduling have been implemented. In [14], shared memory records in user space are used to communicate scheduling information efficiently between kernel space and user space to help user space thread schedulers make scheduling decisions based on kernel-level process scheduling and system call blocking behaviour. Most notably, software interrupts are delivered to user space when system calls block and unblock, and optionally to warn of imminent preemption to assist in avoiding context switches while holding locks. This framework was implemented in the Psyche parallel operating system and was used to implement more than one threads package, with different packages being able to coexist and communicate with each other. A similar shared data mechanism called a Meta-Level Scheduler (MLS) is presented in [22]. User-Level Schedulers (ULSs) communicate their state to the kernel via a shared memory region in the MLS (in user space), so the kernel scheduler can schedule the highest priority thread. This mechanism was implemented for the ARTS real time operating system [27] (see section 6.3). Another example of a shared memory mechanism for coordinating user level thread scheduling with the kernel can be found in [6].

6.1 POSIX

In [21], experience with the POSIX 1003.4 and 1003.4A standard proposals is discussed. POSIX 4 (1003.4) is a proposal for adding base real time support to the POSIX standard, while POSIX 4A provides thread extension (the "Pthreads" interface). LynxOS, a commercial UNIX compatible real time operating system, was modified to provide an interface and functionality conforming to 1003.4 and 1003.4A, with some of the new functionality added at the kernel level but as much as possible implemented as user libraries. POSIX 4 requires that at least two scheduling policies be implemented: Round-Robin and FIFO. Round-Robin is a round robin time-slicing policy, whereas FIFO is a fixed priority policy where threads of a given priority run to completion unless preempted by higher priority threads (threads have a global scheduling scope, so single threaded processes can be thought of as threads in the overall system in terms of scheduling policy). The FIFO policy is similar to the RT Threads scheduling policy, except that it excludes deadline scheduling. With respect to POSIX 4A, an interesting result found by the authors was unexpectedly poor performance of the mutex and condition variable synchronization mechanisms.
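For comparison with the policy described above, the following shows how a fixed priority FIFO thread is requested through the Pthreads interface as it was eventually standardized (POSIX 1003.1c); the draft 1003.4A interface evaluated in [21] differed in its details, so this is a present-day illustration rather than the interface used in that study.

    #include <pthread.h>
    #include <sched.h>

    /* Create a thread under the fixed priority FIFO policy. This
       typically requires appropriate privileges (e.g., root), for the
       reasons discussed above. */
    static int create_fifo_thread(pthread_t *t,
                                  void *(*fn)(void *), void *arg, int prio)
    {
        pthread_attr_t attr;
        struct sched_param param;

        pthread_attr_init(&attr);
        /* Use the attributes set below rather than inheriting the
           creating thread's scheduling policy. */
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        param.sched_priority = prio;
        pthread_attr_setschedparam(&attr, &param);
        return pthread_create(t, &attr, fn, arg);
    }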

A strictly user level implementation of Pthreads is described in [15]. Similarities with the RT Threads implementation include the use of a monolithic monitor for protecting kernel data structures, i.e., a single global flag to indicate execution in a kernel critical section, and a flag used to defer signal handling. Performance results were similar to those experienced with RT Threads: synchronization and context switching were as good as or better than kernel implementations, and dynamic allocation of stacks was found to be the main source of overhead for thread creation. Most of the complexity of the implementation was in providing for the delivery of signals to individual threads, as required by the POSIX standard. RT Threads does not provide this functionality because using a synchronous programming approach to handling asynchronous events with blocking threads was considered preferable to the asynchronous programming approach of using signals and signal handlers. In fact, this is one of the main motivations for providing threads in the first place, a sentiment also expressed in [21].

MiThOS (Microkernel Threads Operating System) [3], developed at Florida State University, provides deadline and periodic scheduling extensions to a POSIX Pthreads library, and is implemented on a bare machine.

6.2 Real Time Support for Multimedia Applications

Other work has been done in providing real time support for multimedia applications, particularly in terms of resource allocation in a shared system. Much of the work takes a Quality of Service (QoS) approach where threads which cannot meet their real time requirements dynamically lower those requirements. Real time thread support focuses on periodic scheduling and handling of missed deadlines. Another issue which has prominence in the area of real time systems is dealing with the problem of priority inversion, where a low priority thread holding a shared resource can cause a medium priority thread to hold up a high priority thread.
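The scenario can be sketched with RT Threads primitives. The thread bodies below are illustrative only; the comments narrate one problematic interleaving, and the priority ordering follows the RT Threads convention that a numerically larger value is less urgent.

    /* Priority inversion sketch: lowThr holds the semaphore guarding a
       shared resource when highThr attempts to acquire it, so highThr
       blocks. A compute-bound mediumThr then preempts lowThr, and
       highThr is held up by mediumThr even though highThr outranks it.
       The semaphore would be set up with RttAllocSem(&resource, 1, ...)
       as in appendix A. */
    static RttSem resource;          /* guards the shared resource */

    RTTTHREAD lowThr(void *arg)      /* least urgent of the three */
    {
        RttP(resource);              /* enters a long critical section */
        /* ... preempted here by mediumThr while highThr waits ... */
        RttV(resource);
    }

    RTTTHREAD mediumThr(void *arg)   /* middle urgency */
    {
        for (;;)
            ;                        /* compute-bound, never blocks */
    }

    RTTTHREAD highThr(void *arg)     /* most urgent */
    {
        RttP(resource);              /* blocks while lowThr holds it */
        RttV(resource);
    }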

In [28] and [16], periodic real-time threads are used to handle periodic events in a multimedia client application, such as the timely display of continuous media data units (e.g., video frames). In both cases the test application was a QuickTime video player. Supporting multiple clients on a single machine with graceful degradation required a mechanism for QoS management, and global scope for scheduling of threads. Real-Time Mach [29, 25] was used as the underlying operating system, providing such real time threads. One of the methods demonstrated was a self stabilization scheme whereby a missed deadline handler for each thread modified the thread's timing attributes, i.e., period and deadline, such that a thread would lower its frame rate if it was missing deadlines. Results from [16] were good when running multiple video clients with varying priorities, i.e., only the quality of the low priority client was degraded. The same test run with a time-sliced scheduler showed erratic degradation among the clients, such that the high priority stream was as likely to reduce frame rate as the low priority one.

In [17], a complex protocol for dealing with priority inversion, called the Semaphore Dependency Protocol, is presented. This protocol is less conservative than the priority ceiling protocol, and results in fewer context switches than the priority inheritance protocol. The authors go on to conjecture, however, that the structure of multimedia applications is usually not complex enough to warrant such an elaborate protocol, and that simple priority inheritance should suffice. The philosophy of RT Threads is that even priority inheritance is not required, i.e., that it should be left up to the application to deal with problems like priority inversion.

6.3 Inter Thread Communication

RT-IPC, a real time extension to Mach IPC for Real-Time Mach, is discussed in [10]. Like RT Threads, Mach provides a facility for message passing between threads, where multiple messages arriving at a receiver are queued awaiting reception by the receiving thread (a port mechanism is used in Mach, but the effect is the same). As is the case with RT Threads, messages are dequeued in FIFO order, regardless of the priority of the sender. RT-IPC adds a number of options with respect to dequeuing policy and priority of the receiver. For dequeuing order, a choice of FIFO or priority based policy is available, much like the first come first served and priority modes available for RT Threads semaphores. Priority inheritance and priority handoff options are also available to deal with priority inversion problems which may occur between sender and receiver, much as similar protocols are available for mutex locks in many real time operating systems. In RT Threads, only FIFO semantics are provided for message queuing. In keeping with RT Threads' cooperative programming philosophy, it is considered the application's responsibility to ensure that reasonable priorities are used for the message receiver, and that an appropriate design is used for handling received messages so as not to unduly hold up senders waiting in the queue (for example, the client/server model depicted in figure 2.2).
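As an illustration of such a design, a receiver can keep senders from being held up by replying before doing any lengthy processing (or by handing the work to worker threads, as in the client/server model of figure 2.2). The RttReceive() and RttReply() parameter lists below are assumptions for illustration.

    /* Server structure: reply promptly so that senders are not held
       blocked behind lengthy processing; any slow work happens after
       the reply, or is handed off to a worker thread. */
    RTTTHREAD serverLoop(void *arg)
    {
        static char req[4096];
        unsigned int len;
        RttThreadId client;

        for (;;) {
            len = sizeof(req);
            RttReceive(&client, req, &len);   /* dequeue in FIFO order  */
            RttReply(client, req, len);       /* unblock the sender now */
            /* ... process the request at leisure ... */
        }
    }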

The Chant threads package [7] extends the POSIX Threads interface with a mechanism called talking threads for distributed programming. Talking threads allow threads on different machines to communicate directly using message passing primitives. Thread identifiers are made universally unique by embedding in them a machine identifier, process identifier and thread identifier, a technique similar to that used by RT Threads. Notification of message reception is done by polling the underlying communication system in the Chant implementation, whereas RT Threads uses underlying asynchronous signals (as part of the external I/O implementation) to unblock threads waiting for messages.

HARTOS [8] is a distributed real time operating system developed at the University of Michigan. HARTOS is part of HARTS, the Hexagonal Architecture for Real-Time Systems, where several multiprocessor nodes are connected on a hexagonal mesh interconnection network. Dedicated network processors and per-hop deadlines for data transmission are among the techniques used to implement a real time message passing facility.

ARTS [27] is an object oriented distributed real time operating system. Its communication facilities include a synchronous request/response style message passing mechanism similar to that of RT Threads. Real time support for this mechanism includes transmitting the sender's time fence (deadline) with the request message, which is compared with the receiver's time fence, taking communication delays into account, to determine whether the request can be processed in time. Also, a communication manager is employed at the receiver with prioritized message queues and prioritized, preemptable message handling worker threads, using a priority inheritance protocol to avoid priority inversion within the communication subsystem.

Chapter 7

Additional Work

Work which remains to be done with RT Threads includes:

• testing the native microkernel implementation

• testing FreeBSD and Linux versions of RT Threads, and comparing the results with those for Windows NT and Windows 95 on the same hardware

• implementing a mapped version of RT Threads on top of Solaris Threads, testing it, and comparing the results with those for the native host implementation for Solaris.

Chapter 8

Conclusions

RT Threads has been used in a UNIX environment for approximately two years. Although refinements have been made over that time, it has proven to be stable. Both the server components and the clients for the CMFS distributed continuous media file server project have been implemented using RT Threads. These applications have a complex design of communicating threads, including network communication (and synchronization) between client and server components using the cross address space inter-thread communication facilities of RT Threads. Real time scheduling functionality has been used to implement timely disk reading and network transmission of continuous media data, fine grained network rate control for the purpose of traffic shaping, and display synchronization for multiple media streams. RT Threads has provided a convenient and powerful model for programming cooperating real time tasks.

Performance results show comparable or better functional overheads for RT Threads than for the other threads packages tested in a UNIX environment (Solaris Threads and POSIX Threads). This suggests that RT Threads is a practical alternative for use as a general purpose threads package in such an environment. Results for functional overheads in the Win32 environment were somewhat worse for RT Threads than for Win32 threads in the case of thread creation and context switches, which is to be expected considering that for RT Threads these functions are built on top of the Win32 facilities. Still, results were close enough that RT Threads should be a practical alternative for most applications.

Real time scheduling tests showed that RT Threads, in most cases, exhibits the expected real time scheduling behaviour in lightly loaded systems. Scheduling was almost always accurate to within the limiting underlying system timer clock resolution of 10 milliseconds. Results worsened when the underlying system had a heavy load, or when multiple compute intensive RT Threads applications were run on the same machine. These experiments were intended to test the limits of the RT Threads package. Generally, one would not expect to be able to run a compute intensive real time application and share computing resources successfully with other processes in a general purpose system.

Thus RT Threads can be successfully used as the basis for applications which require non-critical real time scheduling of events, even in a non-real time operating system environment. This result was shown in practical terms by testing the performance of a synchronized multimedia client.

Bibliography

[1] AIX Version 4.1 General Programming Concepts: Writing and Debugging Programs, October 1994.

[2] T.P. Baker, Frank Mueller, and Viresh Rustagi. Experience with a Prototype of the POSIX "Minimal Realtime System Profile". In the 11th IEEE Workshop on Real-Time Operating Systems and Software, May 1994.

[3] T.P. Baker, Frank Mueller, and Viresh Rustagi. MiThOS - A Real-Time Micro-Kernel Threads Operating System. In IEEE Real-Time Systems Symposium 1995, 1995.

[4] David R. Cheriton. The V Kernel: A Software Base for Distributed Systems. IEEE Software, 1(2):19-42, April 1984.

[5] Eric C. Cooper and Richard P. Draves. C Threads. Technical Report CMU-CS-88-129, Carnegie Mellon University, April 1988.

[6] Ramesh Govindan and David P. Anderson. Scheduling and IPC Mechanisms For Continuous Media. In Proceedings of the 13th Symposium on Operating Systems Principles, pages 68-80, October 1991.

[7] Matthew Haines, David Cronk, and Piyush Mehrotra. On the Design of Chant: A Talking Threads Package. In Supercomputing 94, April 1994.

[8] Dilip D. Kandlur, Daniel L. Kiskis, and Kang G. Shin. HARTOS: A Distributed Real-Time Operating System. ACM Operating Systems Review, 23(3):72-89, July 1989.

[9] Sandeep Khanna, Michael Sebree, and John Zolnowsky. Realtime Scheduling in SunOS 5.0. In Winter USENIX, pages 375-390, January 1992.

[10] Takuro Kitayama, Tatsuo Nakajima, and Hideyuki Tokuda. RT-IPC: An IPC Extension for Real-Time Mach. In Proceedings of the 2nd USENIX Conference on Microkernels and Other Kernel Architectures, 1993.

[11] Bil Lewis and Daniel J. Berg. Threads Primer. SunSoft Press, 1996.

[12] Siu Ling Ann Lo. An Operating System Architecture for Real-Time Tasking Abstractions. PhD thesis, University of British Columbia, April 1995.

[13] Siu Ling Ann Lo, Norman C. Hutchinson, and Samuel T. Chanson. Architectural Considerations in the Design of Real-Time Kernels. In IEEE Real-Time Systems Symposium, pages 138-147, December 1993.

[14] Brian D. Marsh, Michael L. Scott, Thomas J. LeBlanc, and Evangelos P. Markatos. First-Class User-Level Threads. In Proceedings of the 13th Symposium on Operating Systems Principles, pages 110-120, October 1991.

[15] Frank Mueller. A Library Implementation of POSIX Threads under UNIX. In Proceedings of the Winter USENIX Conference, pages 29-41, January 1993.

[16] Tatsuo Nakajima and Hiroshi Tezuka. A Continuous Media Application Supporting Dynamic QOS Control on Real-Time Mach. In ACM Multimedia '94, October 1994.

[17] Roger Needham and Akira Nakamura. An Approach to Real-time Scheduling - but is it Really a Problem for Multimedia? In Proceedings of the 3rd International Workshop on Network and Operating System Support for Digital Audio and Video, pages 32-39, 1993.

[18] Gerald W. Neufeld, Dwight J. Makaroff, and Norman C. Hutchinson. Design of a Variable Bit-Rate Continuous Media Server. In Proceedings of the 5th International Workshop on Network and Operating System Support for Digital Audio and Video, pages 375-378, Durham NH, April 1995.

[19] Gerald W. Neufeld, Dwight J. Makaroff, and Norman C. Hutchinson. Internal Design of the UBC Distributed Continuous Media File Server. Technical Report 95-06, University of British Columbia, April 1995.

[20] Gerald W. Neufeld, Dwight J. Makaroff, and Norman C. Hutchinson. Design of a Variable Bit-Rate Continuous Media Server for an ATM Network. In Proceedings of the IS&T/SPIE Multimedia Computing and Networking Conference, San Jose, CA, January 1996.

[21] Bill O. Gallmeister and Chris Lanier. Early Experience With POSIX 1003.4 and POSIX 1003.4A. In Proceedings of the IEEE Symposium on Real-Time Systems Conference, pages 190-198, July 1991.

[22] Shuichi Oikawa and Hideyuki Tokuda. User-Level Real-Time Threads: An Approach Towards High Performance Multimedia Threads. In Proceedings of the 4th International Workshop on Network and Operating System Support for Digital Audio and Video, pages 66-76, November 1993.

[23] M. L. Powell, S. R. Kleiman, S. Barton, D. Shah, D. Stein, and M. Weeks. SunOS Multi-thread Architecture. In Proceedings of the Winter USENIX Conference, pages 65-76, 1991.

[24] Kostadis Roussos. Real-Time Thread Package. Technical Report CS-96-18, Department of Computer Science, Brown University, May 1996.

[25] Stefan Savage and Hideyuki Tokuda. Real-Time Mach Timers: Exporting Time to the User. In Proceedings of the 3rd USENIX Mach Conference, pages 111-118, April 1993.

[26] Karsten Schwan, Hongyi Zhou, and Ahmed Gheith. Real-Time Threads. ACM Operating Systems Review, 25(4):35-46, October 1991.

[27] Hideyuki Tokuda and Clifford W. Mercer. ARTS: A Distributed Real-Time Kernel. ACM Operating Systems Review, 23(3):29-53, July 1989.

[28] Hideyuki Tokuda and Takuro Kitayama. Dynamic QOS Control based on Real-Time Threads. In Proceedings of the 4th International Workshop on Network and Operating System Support for Digital Audio and Video, pages 114-121, November 1993.

[29] Hideyuki Tokuda, Tatsuo Nakajima, and Prithvi Rao. Real-Time Mach: Towards a Predictable Real-Time System. In Proceedings of the 1st USENIX Mach Symposium, October 1990.

[30] Alistair C. Veitch and Norman C. Hutchinson. Kea - A Dynamically Extensible and Configurable Operating System Kernel. In Proceedings of the Third Conference on Configurable Distributed Systems (ICCDS '96), May 1996.

Appendix A

Example Program

The following simple example program is included to demonstrate the basic structure of an RT Threads program. Note that the header file rtthreads.h must be included. In the program, two threads are created, each being passed an integer to identify it in the output. Both threads are created with the same priority, RTTHIGH. Each thread lowers its priority, in turn, causing the other to be scheduled. This is done within a critical section, protected with a semaphore. The example program demonstrates the scheduling effects of both scheduling attribute modification and the use of synchronization primitives. Sample output is shown in comments at the end.

#include <stdio.h>

#include "rtthreads.h"

static RttThreadId threada, threadb;
static RttSem sem;

RTTTHREAD thr(void *arg)
{
    RttSchAttr myattrs;
    RttThreadId me;
    int i, threadNum;

    threadNum = (int)arg;
    me = RttMyThreadId();
    for (i = 0; i < 3; i++)
    {
        printf("%d: Wait\n", threadNum);
        RttP(sem);
        RttGetThreadSchedAttr(me, &myattrs);
        printf("%d: My priority is %d\n", threadNum, myattrs.priority);
        /* lower my priority */
        myattrs.priority++;
        RttSetThreadSchedAttr(me, myattrs);
        printf("%d: Signal\n", threadNum);
        RttV(sem);
    }
    printf("%d: Done\n", threadNum);
}

mainp()
{
    RttSchAttr attrs;

    /* semaphore with initial value 1, first come first served queuing */
    RttAllocSem(&sem, 1, RTTFCFS);
    /* start immediately, at high priority, with no deadline */
    attrs.startingtime = RTTZEROTIME;
    attrs.priority = RTTHIGH;
    attrs.deadline = RTTNODEADLINE;
    RttCreate(&threada, thr, 16000, "a", (void *)1, attrs, RTTUSR);
    RttCreate(&threadb, thr, 16000, "b", (void *)2, attrs, RTTUSR);
    printf("threads created\n");
}

/* Sample Output:

threads created
1: Wait
1: My priority is 10
2: Wait
1: Signal
2: My priority is 10
2: Signal
2: Wait
2: My priority is 11
1: Wait
2: Signal
1: My priority is 11
1: Signal
1: Wait
1: My priority is 12
2: Wait
1: Signal
2: My priority is 12
2: Signal
2: Done
1: Done
*/