BY PHILIP KOOPMAN JR .

o most embedded develop­ ers, multitasking means using a preemptive multi­ tasker and a complex, ex­ Cooperative pensive (in terms of soft­ wareT cost, memory size, and run-time tasking overhead) piece of software. While em­ bedded real-time systems are best writ­ techniques can ten with a multitasking approach, pre­ emptive multitasking is often overkill. often meet an A preemptive multitasker is a very gen­ eralized tool; users may not want to pay embedded the costs that always accompany gener­ alized solutions to their very specific system's problems. A less widely understood approach multitasking to real-time design is cooperative multi­ tasking. Judicious use of cooperative requirements and tasking techniques can often meet an 's multitasking re­ give better quirements, while giving better perfor­ mance and a simpler software environ­ performance. ment than a preemptive multitasker. In my work with stack-based proces­ sors at Harris Semiconductor Inc., I've Tasks can also be used to distinguish investigated the different approaches to different activity classes for a system. multitasking and explored the different For example, routines that only need to tradeoffs among complexity, cost, and be invoked occasionally can be included speed. As a result, I'm convinced that in a task run in response to an external the ease of implementation and effi­ stimulus, such as an or a timer ciency of cooperative multitasking are pulse. Tasks that always run but are not widely underestimated. time-critical can be assigned lower priorities than other tasks. WHY MULmASKING? On larger systems with multiple pro­ ultitasking is used in embed­ cessors, breaking a software system into ded systems for a variety of tasks provides natural boundaries for M purposes. Keeping separate migration. Setting parallelism system functions resident in separate granularity at the task level allows each tasks helps reduce the complexity of to be assigned to one or more each portion of the system. Separate active tasks, greatly reducing the soft­ tasks also help modularize the work of ware's complexity. In the best of all pos­ writing the system by forcing the cre­ sible worlds, we have one processor per ation of well-defined interfaces between task. Unfortunately, not all systems modules and programmers. have the luxury of multiple processors.

APRIL 1990 EMBEDDED SYSTEMS PROGRAMMING 43

Hemyweight Tasking

The problem comes when many tasks The preemptive method is usually contend for use of a single processor. what is meant by "multitasking" in con­ This processor must be shared among ventional computing terminology. In a tasks, which can involve significant preemptive multitasker, some prede­ overhead. fined event, such as a timer pulse, peri­ odically halts the executing program PREEMPTIVE TASKING and transfers control to an executive orth has a long tradition of pro­ program or kernel. The kernel saves the viding many support levels for complete state of the processor (all reg­ F multitasking. While coopera­ isters and status bits), including the pro­ tive multitasking is a well-known tech­ gram counter and stack contents. Once nique in Forth circles, the technique the old task state is saved, the kernel needn't be limited to that language. In­ uses some scheme to select another task deed, with the recent appearance of C for execution. The newly selected task compilers for stack processors like the has its state loaded into the processor RTX, the approach becomes particu­ and execution is resumed on the new larly attractive for C programmers as task. well. However, stack processors do alter The advantage to full preemptive some of the design tradeoffs. multitasking is that it is almost trans-

Table 1 Pause code for a lightweight multltasller written for the RTX 2000 RTX 2000 Instruction Comment R> Transfer return address of calling task (task restart address) to data stack.

OUI Store that restart address to task area word 0.

1 UO Fetch link value from task area word I.

UBRI Store that link value as the new value of the user base register.

ouo Fetch the restart address of the new task from the new task area.

>R EXIT Transfer the restart address from the data stack to the return stack, then perform a subroutine return (equivalent to a jump to the restart address).

felbns I. Each task uses the user base register to point to a 32-word task area used for scratch storage. 2. The pause subroutine code takes six words of memory shared among all tasks. 3. Each invocation of pause code takes I0 clock cycles, including the subroutine call instruc­ tion from the calling task.

44EMBEDDEDSYSTEMSPllOGRAMMIN6 APRIL 1990 parent to the programmer, and it is a powerful and automatic way to ensure that no problems occur during the tran­ sitions between tasks. Unfortunately, the oost for this transparency and power is very high. The machine's entire state must be saved to guarantee a correct restart when the task is resumed. In contrast to RISC and most ClSC processors, however, most stack ma­ chines automatically track the number of active elements on the stack. Tn prac­ tice, the stacks tend to be fairly shallow. However, whatever processor architec­ ture you 're using, the oost for a preemp­ tive multitasker is high. An additional problem with preemp­ tive multitasking is that certain code se­ quences, known as critical regions, must not be interrupted by a task switch. Typically, these sequences deal with ac­ cess to resources shared among tasks or time-critical code. To prevent a preemptive multitasker from interrupting these sequences, a flag must be used to notify the executive that the task is in a critical region. Be­ cause task switches are forbidden with­ in these regions, task-switching latency increases dramatically. Alternatively, semaphores or other synchronization methods may be used to notify other tasks when a data structure is in use in case the task is interrupted. Either way, a considerable run-time overhead and programming effort is involved in pre­ venting a from causing problems.

HEAVYWEIGHT COOPERATION n some systems, the number of shared data references and critical I regions executed will have a much greater impact on efficiency than the number of task switches made. Or, you may be dealing with a relatively simple system where it's hard to justify the generality and complexity of a preemp­ tive multitasker. In these cases, it may

APR LL L990 EMBEDDED SYSTEMS PROGRAMMING 45 actually be desirable to do away with tive tasking method is that neither syn­ the preemptive tasker and implement chronization variables nor critical-re­ Hearyweight cooperative tasking. gion flags are necessary. Since task In cooperative tasking, the task de­ switching only occurs when the pro­ cides when it is ready to give up control grammer requires it, a task cannot be Tasking of the processor. This decision is typical­ shut down by the kernel at an inoppor­ ly made by each task. The task periodi­ tune moment. This fact can substantial­ cally invokes a routine that queries a ly reduce your run-time overhead and timer to see whether the system desires code complexity. Further, much of the a task switch. This technique is called code in the kernel used for synchroniza­ "pausing." tion and data-structure sharing can be In a simplified but oft-used case, a eliminated, reducing memory costs. In cooperative task switch is performed on every pause. The disadvantage of using the coop­ When it is time for a task switch, the full erative tasking method is that it places tasking, the task processor state is saved in a manner more of a burden on the programmer similar to that used for preemptive task­ for correct operation. Programmers are decides when it is ing. The term "heavyweight" is used be­ responsible for ensuring that periodic cause the complete machine state must checks for task switching are placed in completed and be saved to guarantee correct operation. appropriate sections of the code. If they The term "cooperative" is used because place these checks wisely, the result will ready to give up the task relinquishes control of the sys­ be good performance with small task­ tem by executing pause statements switcbing latencies. Ifprogrammers are control of the sprinkled throughout its code rather not so wise, some task-switching laten­ than relyi ng on an external preemption cies may be undesirably long. processor to other mechanism. The advantage to using the coopera- DEALING WITH LATENCY tasks. he issue of task-switching laten­ cy can be addressed in three T ways. First, we can use coopera­ tive tasking in applications where task­ Figure 1 switcbing latency is not extremely criti­ Characteristics of tasking models. cal or the tasks are quite simple and rely ... • No pauses used, timer-driven on the programmer's skills. Obviously Low High scheduler preempts programs this approach has limited applicability. Preemptive anywhere The second way to reduce this prob­ multitasker • Critical regions and lem is to move all time-critical code to synchronization required for correct operation interrupt-service routines. For example, a task that is activated to read data from an input port before the data is overrun Heavyweight • Pause anywhere • Same context switching cost can be replaced by an interrupt-service cooperative as preemptive multitasker routine that places the data into a first­ multitasker • Cooperati ve model reduces in-first-out buffer and a task that pro­ need for synchronization cesses data from that buffer. This strat­ egy has the effect of desensitizing the System Transparency • Pause at module boundaries, Medium-weight system to task-switching latency. Simi­ efficiency to programmer when stacks are small cooperative • Programmer sched ules pauses lar methods must often be used with muJtitasker for reduced context switching preemptive tasking models as well be­ costs cause of their problems with latency • Cooperative model reduces caused by the critical regions. need for sy nchronization Finally, you can treat an unaccepta­ bly long latency as a soft error. This so­ Lightweight • Pause only at highest level • Stacks must be empty when lution can be accomplished by using a cooperative pausing on either the develop­ multitasker • Almost no context switching ment system or target system. When cost the maximum permissible task-switch­ • Task switch on every pause ing latency is exceeded, a warm start of (no overhead) in most implementations the task can be performed, along with a High Low • Cooperative model reduces suitable message to the programmer if need for synchronization he or she is in a development mode.

46 EMBEDDED SYSnMS PROGRAMMING APRIL 1990 three registers and leaves the other re­ CPU on many context switches. There­ sources idle. (Forth requires two stack fore, methods other than heavyweight Hemyweight pointers and, in some implementations, tasking that have reduced context­ will keep the top data stack element in a switching costs can be attractive in register.) Machine code sequences in a some situations. Tasking Forth program may use any number of registers for efficiency, but they are not MEDIUM-WEIGHT TASKING preserved across subroutine boundaries eavyweight cooperative task­ and need not be saved when pausing. ing eliminates the overhead Thus, Forth kernels on conventional H and complexity of dealing machines trade off a little bit of run­ with synchronization and critical re­ HEAVYWEIGHT COSTS? time speed (by not making optimal use gions by restricting the times at which he heavyweight cooperative of registers) in return for very fast con­ switching can occur. However, it still tasking model is the model tra­ text-switching times. Since the cost of pays a reasonably high overhead for T ditionally included in Forth context switching is quite low, most saving and restoring context on a task programming environments. lt is pre­ Forth systems running on conventional switch. Therefore, medium-weight co­ ferred to the preemptive tasking model processors use heavyweight cooperative operative tasking is an embellishment because of its simplicity and the direct multitasking. of heavyweight tasking that reduces the control it gives programmers. Slack-based processors usually con­ cost of context switching. Heavyweight tasking is very inex­ tain some sort of stack-buffer hardware Stack-based programs, especially pensive when executing Forth on a con­ on-chip. The use of this stack buffer sig­ those programs written in Forth, tend to ventional processor, because a Forth nificantJy increases the execution speed bequite modular. This modularity often virtual machine running on a register­ of Forth programs over that possible results in routines that run in less time based CPU typically uses only two or with conventional processors. Unfortu- than the desired interval between task switches. Further, the amount of infor­ mation transferred on the stack be­ tween independent modules is often very small. Since context switching on stack ma­ chines primarily consists of saving the active stack elements, the cost of coop­ erative multitasking can be significant­ ly reduced by simply choosing to place pauses at module boundaries. This method reduces the average number of stack elements that need to be saved on pauses and task-switching overhead. No compiler analysis is required, since the user chooses where the pauses go. The user cannot make a serious mis­ take, because the stack hardware auto­ matically tracks the depth of the stacks and saves all the required elements. nately, its use by Forth and other lan­ Medium-weight cooperative tasking guages also increases the state that modifies a program that uses the must be saved from the CPU on a con­ heavyweight tasking model so that on Medium-weight text switch. The amount of state that the average it has fewer elements on the must be saved is typically equal to a stack to save on context switches. cooperative small on a conventional CPU. A LIGHTWEIGHT EXAMPLE tasking is an cost­ Of course, techniques such as parti­ hat if pauses are placed tioning the hardware slack buffer into only where the stacks are reducing multiple areas for high priority tasks W guaranteed (by the pro­ can eliminate saving stack contents to grammer) to be empty? In this case, the embellishment of memory in a significant number of ap­ cost of a context switch can be much plications. Still, the general case of a longer, since only the heavyweight very large number of low priority tasks and perhaps one or two other registers on a single processor does require sever­ need be saved. This method is called tasking. al words of data to be saved from the lightweight cooperative tasking.

48 EMBEDDED SYSTEMS PROGRAMMING A PR IL 1990 Hemyweight Tasking

An even further refinement of lightweight cooperative tasking is possi­ ble. If the size of the routin es invoked between pauses is chosen appropriately, sor. This multitasker assumes a round­ ecution performs a subroutine call to no timer is needed to decide when to robin scheduling policy with equally the pause code, the user base register is change tasks. A task change can be per­ weighted task priorities. It traverses a set to the value for the next task in the formed on every pause. Since the cost of circular linked list for task scheduling linked list, and the program counter is task changing is so low in a lightweight and switches tasks on every PAUSE com­ set to the saved restart address for that tasking model, the last piece of tasking mand. Each task has a 32-word control next task. In this example, the total time overhead is removed and overall tasking area and scratchpad space in memory to process a task switch is 10 clock cy­ costs are reduced even further. Further, accessed through the user base register. cles, and the process uses seven instruc­ the code required to implement the Offset 0 from the user base register tions, including the subroutine call to tasker is drastically reduced, resulting contains the restart address for the task, the pause code. in a greatly simplified system software whfoh was set the last time the task was It should be noted that even if an ap­ environment. paused. Word offset I from the user plication does not appear to be suitable Table I shows PAUSE code for a base register contains a pointer to the for lightweight tasking at first, it may be lightweight multitasker implemented control area for the next task. To start a simple to adapt the technique to the sit­ on the Harris RTX 2000 stack proces- new task, the program suspending ex- uation. For example, if one task takes

50 EMBEDDED SYSTEMS PROGRAMMING APR IL t 990 too long between pauses, that task might be broken into several subtasks that do fit within the time constraints.

EXPLORING THE TRADEOFFS ne method to break up a task is to implement a finite state ma­ 0 chine and execute a pause on every state change. r once used this technique on a proprietary microcoded graphics adapter that used a bit-sliced architecture. The system requirements were to sample a host interface every 50 microseconds; sample a bit tablet, joy­ stick, and button box every few millisec­ onds; draw vectors, arcs, and shaded polygons on a storage tube display; and refresh several hundred dynamic vec­ tors without flicker-all without any timers available. I accomplished this feat in about 2,000 words of , using the various lightweight tasking techniques. Now that we've looked at the differ-

One method that can be used to break up a task is to implement a finite state machine and then execute a pause on every single state change.

A PR IL 1990 EMBEDDED SYSTEMS PROGRAMMING 51 plify the tasker. SUMMING IT ALL UP As tasking models progress from ful­ his discussion has been restrict­ Heavyweight ly preemptive to lightweight coopera­ ed to the costs of task synchro­ tive tasking, transparency of the tasking T nization and context switching. model decreases, because programmers The method used to select the next task Tasking must spend more effort explicitly man­ to be executed also involves a variety of aging tasks. On the other band, as tran­ tradeoffs between speed and providing sparency decreases, system efficiency such features as priority-based schedul­ increases because of reduced overhead ing. Most scheduling methods can be and complexity for task synchroniza­ combined with any tasking method, al­ ent tasking models and their character­ tion and context switching. though in practice, the simpler schedul­ istics, it's time to consider a selection The choice between preemptive and ing methods are more often associated process. How do you pick the model cooperative models should be made with the simpler tasking methods. that's right for your application? Figure based on the run-time overhead and The fact that lightweight tasking is I summarizes the characteristics of the programming effort required to support simple, small, and fast does not make it various tasking models. synchronization for the preemptive ideal for every application. As we move The driving force for the different model versus the effort required to in­ down the hierarchy from preemptive types of tasking is the placement of sert appropriate pauses into the code for multitasking to lightweight cooperative pause statements. Preemptive tasking the cooperative model. For those appli­ tasking, we find a tradeoff at every step models do not use pauses. Heavyweight cations where the costs of a preemptive between the burden placed on the pro­ tasking models allow pauses anywhere rnultitasker or heavyweight cooperative grammer for correct system operation in a program. Medium-weight and multitasker can be supported, they and a resulting increase in system speed lightweight tasking models place re­ should be used, since they reduce over­ and simplicity. The weight of this bur­ strictions on pause statements to sim- all programmer effort. den depends on system requirements, In general, the more frequent syn­ type of program, and language used for chronizations and critical regions be­ implementation. Many Forth program­ come and the more tightly constrained mers using stack-based architectures target system's resouces become, the find that a medium- or lightweight task­ more attractive cooperative tasking be­ ing model meet most of their needs. Ob­ Lightweight comes. For applications where task viously, you should choose the model switching must be extremely fast, medi­ that best meets yours. tasking is simple, um- and lightweight multitaskers are These methods for stack machines appropriate. While they require invest­ and Forth programs on conventional small, and fast, but ment in terms of program organi zation machines can be adapted for use by and placement of pause statements, CISC and RISC machines executing may not be ideal for they can reward the programmer with programs in any language. However, in superior performance over other some cases the adaptation may prove every application. methods. awkward, because it is much more diffi­ cult to analyze whether registers have active values than whether a s tack is empty. Compilers could be adapted to provide this information, but such tech­ niques are generally not available to ap­ plication developers.

Philip Koopman Jr. is a senior scientist at Harris Semiconductor Inc. and the author ofStack Computers: The New Wave (Chichester, W Sussex, U.K.: El­ lis Horwood ltd., 1989) He received his Masters in computer and systems engi­ neering from Rensselaer Polytechnic Institute, Troy, N.Y., and a Ph.D. in computer engineering from Carnegie Mellon University, Pittsburgh, Pa. He may be reached via usenet at koopman @greyhound.ece.cmu.edu.

52 EMBEDDED SYSTU1S PROGRAMMING APRIL 1990 VOL.3 NO. 4 · APRIL 1990 EMBEDDED SYSTEMS PROGRAMMING On the cover: .. hokl this truth lo be self4Vldent, that not al tasks S'8c:reated equally. Photop aph by Table of Contents David Bishop.

FEATURES COLUMNS+ DEPARTMENTS 28> Multitasking on 7 > #1nclude the 386 Good News and Bad BY JIM BETZ. Embedded PCs may be 11 > Real-Time popular now, but MS-DOS and real­ Estranged Bedfellows mode are still bad news for multitasking and real-time. Here's a look at how the 17 > Parity Bit advanced features of the 80386 can be 'S Sourcebo'* used to advantage in multitasking- and 23> ...... Ants Climbing Trees what design hazards )Qu'll have to face. 8 2> Embedded Gallery 42> Heavyweight 8 6> Embedded Marketplace Tasking 90> Advertiser Index BY PHILIP KOOPMAN JR. Preemp­ 54 > Cooperative tive multitasking isn't )Qur only design 9 5> State of the Art option. For some applications, coopera­ Multitasking Structuring Data tive multitasking offers significantly im­ BY JACK WOEH R. Virtually every proved efficiency. Koopman offers a tax­ commercial Forth system has support for onomy of cooperative methods, along multitasking. Making the most of back­ with some hints on multitasking with ground tasks and PAUSEs isn't always REVIEW stack machines. child's play, though. Woehr, one of our more frequent contributors, shares his ex­ 7 3 > An 8051 SlnUator Update perience with daemolilS and other cooper­ ative monsters. 62> Reentrant Floating BY RICK NARO. It's no big secret that Microsoft Corp. and Borland Interna­ tional don't design their C compilers to make life easy for embedded system de­ velopers. Sometimes, though, we luck out because they haven't gone out of their way to make life difficult for us. Here's the inside story on ROMing Turbo C's floating-point emulation routines. REENTRANT FLOATING PAGE62

HEAVYWEIGHT TASK ING PAGE 42

APRI L 1990 EMBEDDED SYSTEMS PROGRAMMING 3