Comparing the Real-time Performance of Windows NT to an NT Real-time Extension

Kevin M. Obenland, Tiffany Frazier, Jin S. Kim, John Kowalik The MITRE Corporation, 1820 Dolley Madison Blvd. McLean, VA 22102 Contact: [email protected]

Abstract Despite its shortcomings there are several efforts under- Because of the dominance of Microsoft® Windows® in the way to use NT as-is for real-time systems[6][7][8]. These efforts are motivated by the low cost of using NT based PC market there is a strong interest in using Windows NT® systems as well as the stability of the Windows Application as a platform for real-time process and control systems. Programming Interface (API)[9]. In this paper we quantify This type of solution is very cost effective because factors that determine under what conditions it makes applications and development tools are widely available. sense to use NT for real-time processing. However, Windows NT was designed as a general purpose Another alternative, that still allows the use of NT and optimizes average not worst case COTS, is to add a real-time extension to NT[10]. This performance. In this paper we investigate two methods for extension is essentially a separate real-time operating sys- bring real-time process and control systems to NT based tem. In this Dual-OS architecture all real-time processing is platforms. We first evaluate NT as-is, using a series of real- performed under the real-time OS, and NT handles the non time benchmarks, and show that NT use in real-time time critical processing. In this system all NT COTS still systems is limited to soft real-time systems where there is run out-of-the-box. Therefore development time and cost low system load. The second approach for developing NT are still kept to a minimum. In the last several years a num- based real-time systems is to add a real-time extension to ber of vendors have produced products that follow this NT. We evaluate one such product, INtime® from RadiSys approach. These include the products: INtime® from Radi- and conclude that, even under a heavy system load, hard Sys[11], RTXTM from VenturCom[12][13], and Hyperker- real-time determinism is possible. nelTM from Imagination Systems[14]. 
In this paper we 1 Introduction evaluate one of these products, INtime, and show that it is significantly more deterministic than NT alone. The Windows suite of operating systems, Windows There have been several previous studies which evalu- 95®/98® and Windows NT®, dominate the personal com- ated the real-time performance of NT and/or NT extension puter operating system market[1]. Because of this large products[6][7][8][15][16][10][17]. However, in this paper market there is a vast amount of affordable Commercial off we provide a direct and unbiased comparison between the the Shelf (COTS) software products available for these two approaches. Many of the previous performance evalua- operating systems. The popularity of Windows also makes tions were performed by the manufacturers themselves. it desirable to use it in other types of systems like real-time Another key factor, that we examine in detail, is the issue process and control systems. Windows NT is the most of system load. We find that the determinism of NT is sig- robust of all the Windows operating systems and it is there- nificantly worse when considering a load. However, the fore the most likely candidate for use in real-time systems. determinism of processes running under the INtime real- Using Windows NT allows these systems to utilize low time extension are relatively unaffected by a system load. cost COTS software and increases the user friendliness of the system because they look and feel more like the per- 2 Evaluating real-time OS behavior sonal computer environment. Because of these market forces, the use of Windows has also been mandated by var- The correctness of a calculation performed by a real- ious agencies within the Department of Defense[2][3][4]. time system is determined by its ability to satisfy time con- Unfortunately NT was designed as a general purpose straints as well as its ability to operate error free[18]. In this operating system not a real-time one[5]. 
It was designed to paper we consider only real-time systems that are based on optimize average performance and not worst case perfor- COTS operating systems which means that it is impossible mance as is required of any real-time operating system. to analyze all possible paths in an application. The use of COTS also means that our analysis must be stochastic, amount of time and then calculates the actual sleep time based on data gathered empirically, rather than mathemati- using the Pentium performance counter. The jitter is the cal. We note, however, for complex hard real-time systems difference between the actual and desired sleep durations. (such as those deployed by the Department of Defense) The response benchmark tests the ability of a system to stochastic analysis is the technique generally used to verify perform a fixed amount of processing in a deterministic critical timing requirements. amount of time. The test repeats simple operations—such as floating point additions, block memory copies, or disk 2.1 Real-time operating systems write operations—for a base case duration of approxi- mately 10 milliseconds. The Bintime benchmark tests to In order to construct a real-time computer system the what degree the clock granularity is available to a real-time underlining operating system must abide by several application. It records the latency between successive calls requirements[5]. These requirements are outlined below: to a time of day clock function[19]. • Support for multiple preemptable threads - Individual To support deterministic response to external events an threads are typically used to process different hardware OS must process interrupts in a timely fashion. The ISR devices. It is important that the threads are preemptable latency benchmark measures the responsiveness of the OS so that a scheduler can be used to control the allocation to the assertion of the interrupt and the ISR release latency of system resources. 
benchmark measures the delay seen when transferring con- • Thread priority - A priority scheme is essential to trol from the ISR to the interrupt thread that completes the ensure that time critical processing takes precedence processing of the interrupt. The final type of benchmark over less critical processing. tests inter-thread/process communication by measuring the time it takes one thread to signal another. • Synchronization between threads - Threads that share data require a means of synchronization so that data is TABLE 1. Real-time benchmarks read and written in the proper order. • Deterministic timing of the operating system - This usu- Benchmark Description ally requires that the operating system be preemptable. Jitter Measures the difference between the A non-preemptable operating system could impose observed and desired delay times of vari- unbounded delays. ous OS timer functions • Support to prevent priority inversion - Priority inver- Response Measures the amount of time required to sion occurs when a high priority task is waiting on a perform a fixed amount of processing resource that is held by a lower priority task. The high priority task could be blocked indefinitely if a task with Bintime Tests the responsiveness of an operating intermediate priority preempts the low priority task. system by measuring the time between The typical solution for priority inversion is to increase calls to a time of day clock function the priority of the low priority task until it relinquishes ISR Latency Measures the amount of time from the the resource. assertion of an interrupt to the beginning of its ISR Using these five requirements one can evaluate the abil- ISR Release Measures the time it takes to switch from ity of an operating system to support real-time processing. Latency an ISR to an interrupt thread For example, NT provides multiple preemptable threads but allows only 32 levels of priority. 
NT also does not pro- Communi- Tests various inter-thread/process com- vide a mechanism to prevent priority inversion for the cation munication mechanisms highest priority threads. The design choices used in NT and how this relates to real-time systems are discussed in more 2.3 Hardware setup detail in Section 3. Table 2 describes the hardware and software configura- 2.2 Testing a real-time operating system tion of the prototype systems used in our evaluation stud- ies. Everything used is commercially available as standard We used the six different types of benchmarks shown in COTS. All timings in our studies were gathered by the Table 1 to evaluate the real-time capabilities of NT and the benchmarks themselves, we did not use any special pur- Dual-OS system. The Jitter benchmark measures the deter- pose hardware to stimulate the system or gather statistics. minism of timers of different duration and resolution. The We did not install INtime when evaluating the NT only sys- benchmark uses a sleep function to sleep for the specified tem because INtime modifies the Hardware Abstraction Layer (HAL) of NT. The HAL is the portion of NT that iso- Table 4 shows the different WinBench98 benchmarks lates the low level hardware details from the kernel and the used in our testing. The CPU test consists of two different device drivers. tests, one targeting the CPU itself and one targeting the FPU. Both the Disk and Graphics benchmarks give results TABLE 2. Prototype hardware and software for a set of Business applications and a set of High-end setup used in benchmark tests applications. The business applications consists of pro- grams such as: Microsoft (Access®, Excel, Powerpoint®, Component Description and Word) as well as Lotus 1-2-3®. The High-end applica- PC Dell Dimension XPS D300 tions include programs such as: Microsoft Frontpage® and CPU 300 megahertz Pentium II Visual ++ and Adobe Photoshop. Graphics Matrox Millennium II with PCI TABLE 3. 
Application loads bus retries disabled Memory 64 Megabytes Load Description OS Windows NT Workstation 4.0 None No load other than the OS and with Service pack 3 the benchmark test Real-time INtime version 1.2 (only Moderate A CPU intensive application Extension installed when performing Heavy Three applications: a CPU Dual-OS testing) intensive application, a string search across all files on the The behavior of device drivers may affect the real-time hard disk, and playback of a performance of the Dual-OS system. In particular we found MPEG video from the CDROM that the Matrox graphics driver locked the PCI bus for WinBench98® Benchmarks that target specific extended periods of time. This problem can be eliminated hardware sub-systems by disabling the PCI retry feature of the Matrox device (cpu,disk,cdrom,graphics) driver. See Section 4.1.2 for a discussion of how device drivers can impact real-time performance.System Loading TABLE 4. WinBench benchmarks The main utility of using NT in a real-time system, with or with out a real-time extension, is the ability to run non- Sub-system Benchmark real-time applications as-is. Therefore we expect real-time CPU CPUmark32 and non-real-time processes to be running simultaneously, FPUwinmark making it important to include a system load while evaluat- ing the prototype systems. CDROM CD-ROM Winmark Table 3 describes the different types of system load Disk Business Disk Winmark introduced in the benchmark testing. We also ran without a High-end Disk Winmark load, signified by the load None, to measure the affect an idle NT has on the real-time performance. To generate the Graphics Business Graphics Winmark Moderate load we used an application that moves an image High-end Graphics Winmark around in the graphics frame buffer. This application stresses the CPU and graphics sub-systems. 
The Heavy 2.4 Testing methodology load adds a full text search across all files in the hard disk and a playback of an MPEG video to the Moderate load. The number of tests that must be run, so that a represen- ® The WinBench suite is a set of benchmarks that are tative maximum is seen, is a major consideration when traditionally used to test the performance of different hard- testing these operating systems. The benchmark results ware platforms[20]. The benchmark suite is made up of consist of the results from a number of trials, where each several different benchmarks, each targeting a specific trial consists of a series of 4,000 samples. We ran at least hardware sub-system. Each benchmark runs a set of popu- 40 trials (i.e. 160,000 samples) of each benchmark. To lar Windows application programs and produces a perfor- determine if more trials were needed we calculated a confi- mance metric. We use the WinBench suite, not to test dence interval over the means of the maximums from each system performance, but to determine the impact of differ- trial. This formulation assumes that the maximums are nor- ent types of loading on real-time performance. mally distributed. For the NT only tests, additional trials were run until the confidence interval of the maximums was less than 20% of the mean with a certainty of 95%. For 3.3 Design issues that limit NT’s use as an RTOS the INtime tests we were able to achieve a confidence interval of less than 5% of the mean with the same level of The scheduling and interrupt processing design choices certainty. Close to 1,000 trials were run for several of the described above limit NT’s use as a real-time operating tests to achieve this bound. Extra trials were needed for the system. The limitations imposed by these design choices benchmarks where the variance was the greatest. 
are listed below: • No priority inheritance for REALTIME threads - 3 Using Windows NT as a real-time OS Threads at this level are the highest priority, however there is no mechanism to prevent priority inversion. Using NT stand alone as a real-time OS is attractive for • Limited number of priorities - There are only seven lev- several reasons. First it is relatively inexpensive. All that is els of priority for real-time threads. This severely limits needed is a standard PC, NT itself and program develop- the amount of control the system designer has over ment tools like Visual C++. Portability and longevity is the thread priorities. When more than one thread shares a second advantage of using Windows NT. The WIN32 API priority level they are processed in FIFO order. is a defacto standard and programs written using it should • DPCs are processed in FIFO order - Much of the work be compatible with future upgrades of NT[9]. of a device driver occurs in a DPC. Because these DPCs However Windows NT was designed as a general pur- are processed in FIFO order, a DPC that needs to per- pose operating system and several design decisions limit its form time critical processing may be delayed indefi- use as a real-time operating system. In this section we dis- nitely by less critical processing. Time-critical DPCs cuss these design issues and use our benchmark tests to can also be delayed by the processing of low priority determine under what conditions NT might be suitable for ISRs. use as a real-time OS. • Masking interrupts - Any code, running at kernel level, 3.1 Scheduling and priority in NT can effectively gain exclusive access to the CPU by dis- abling interrupts or raising the interrupt request level NT uses a priority based scheduling scheme with 32 lev- (IRQL) to the highest level. This problem is exasper- els of priority. 
These 32 levels are divided into four differ- ated by the previous one because if a device driver ent priority classes: IDLE, NORMAL, HIGH and needs deterministic response its only choice is to dis- REALTIME, where IDLE is the lowest priority and REAL- abled interrupts. This winner-takes-all philosophy can TIME the highest. The NORMAL and HIGH priorities can lead to unpredictable and ill behaved device drivers. be modified dynamically by NT to achieve fairness and • Page Swapping - Because NT uses virtual memory, maximize average performance. The priorities of individ- page swapping can occur at any point during the execu- ual threads within classes can also be prioritized by using a tion of a thread. This can add a significant amount of priority modifier. NT uses this modifier to determine how indeterminism. There is a mechanism to lock pages in to raise and lower priority levels while doing dynamic pri- memory, however NT does not guarantee that the pages ority adjustment. REALTIME threads do not use priority will remain locked if the process is swapped out [21]. modifiers and are not affected by NT’s dynamic priority • IRQL mapping - The mapping of interrupts to IRQLs is adjustment scheme[1]. performed dynamically by the HAL at system startup as it detects the devices attached to the system. Any 3.2 Interrupt processing assumptions made about the relative order of interrupt- associated IRQLs within a system will not be portable In NT, interrupt processing is handled using two types because other systems may have different hardware of routines: an Interrupt Service Routine (ISR) and a architectures. Deferred Procedure Call (DPC). When an interrupt is • Interrupts and DPCs are higher priority than real-time detected, NT interrupts normal processing and passes con- threads - Any and all interrupts and DPCs, including trol to the interrupt’s corresponding ISR. 
Because an ISR non-real-time NT device drivers, can preempt any real- disables all processing, other than higher priority inter- time thread. This means that non-real-time activities, rupts, it is recommended that only minimal processing like the movement of the mouse, will preempt time crit- occur within an ISR. The bulk of the processing should be ical processing. performed in a DPC. DPCs are invoked via a signal from their corresponding ISR. If there are more than one DPC in the system, they are handled one at a time in FIFO order. 3.4 Benchmark results for NT (25 msec) when there is a heavy load, i.e. two and a half times the period. In this section we use the benchmark tests described in Percentage wise the maximum jitter is better for the 100 Section 2.2 to evaluate NT as an RTOS. msec timers than the 10 msec timer because the jitter in the longer timers represents a smaller percentage of the total Real-time performance of NT under load. Table 5 period. The maximum jitter, at around 25 msec, represents shows the maximum and average values for the bench- a jitter of about 25% of the period of the timer. For process marks of Table 1 under the different loads described in control systems, a general rule of thumb is that scan time Table 3 . For the Response benchmark the results are given variations should not exceed 5% of the cycle time [7]. in milliseconds, for all other benchmarks the results are Many of the studies performed by process control manu- given in microseconds. For all benchmarks, except the facturers indicate that NT can meet the 5% scan time varia- Response benchmarks, a good result is one that is close to tion requirement. However, our tests show that under a zero. For the Response benchmarks a desirable results system load this is not always the case.The general conclu- should be close to the base case, as defined in Section 2.2, sion of the jitter results is that NT can support at best a of approximately 10 msec. 
timer with a period of 100 msec. The bintime results also give a similar conclusion. Under a Heavy Load the maxi- TABLE 5. Benchmark results for NT mum time between calls to the time of day clock is 218 under different loads msec. Therefore the timer resolution available to a real- No Load Heavy Load time application is on the order of hundreds of msec. 107 No Load Benchmark MaxAvgMaxAvg Moderate Load 6 Jitter, T=10, 10 Heavy Load 6987.8 30.3 25635 659.9 R=1 (µsec) 105 Jitter, T=100, 4 2819.8 301.8 25976 968.8 10 R=1 (µsec) 103 Jitter, T=100,

222.6 4.42 24918 985 Frequency R=10 (µsec) 102 Bintime (µsec) 6918.3 1.74 218776 10.6 10 Response 15.9 9.65 60.9 9.9 1 (Add)(msec) 1 10 100 1000 10000 100000 µ Response Jitter ( sec) 16.1 9.8 27.3 10.1 (Copy)(msec) FIGURE 1. Histogram of NT jitter for period T=10 ms and resolution R=1 ms Response 31.1 10.6 3953.4 11.1 (Disk)(msec) 107 No Load ISR Latency 26.0 12.3 54.8 15.9 Moderate Load (µsec) 106 Heavy Load DPC Timing 5 52.0 5.3 11618 85.9 10 (µsec) 104 103 As the results show the average No Load real-time per- 2 formance for NT is reasonable. The average under Heavy Frequency 10 Load is higher than under the No Load, but still within an 10 acceptable region. These results are not surprising because NT is designed to optimize average performance. 1 However, in a real-time OS, determinism is important 1 10 100 1000 10000 100000 µ and therefore the maximum value of the benchmarks is a Jitter ( sec) more important statistic. In this case the benchmark results FIGURE 2. Histogram of NT jitter for period T=100 ms and resolution R=10 ms show that NT performs rather poorly. For example, there is a maximum jitter of close to 7 msec for a timer with a The jitter results near the maximum value represent a period of 10 msec without a load. The jitter is even worse very small percentage of the total number of samples. Figure 1 and Figure 2 show histograms of jitter results for two different timers: (T=10,R=1) and (T=100,R=10). The period of 100 msec and resolution of 10 msec. The jitter results are plotted on a log/log scale and each point repre- under a CD-ROM load is worse than any of the jitter sents a discrete bin; the lines connecting the points are results shown in Table 5 . The CPU WinBench load creates included only to improve the readability of the histogram. the least amount of worst case jitter, at less than 5 msec. As Figure 1 shows, for the 10 millisecond timer, there are Here again DPCs are the main cause of jitter. 
The I/O more than three million samples less than 100 msecs but intensive loads create more DPCs which always run at a only around 100 that are greater than 1 msecs for a Moder- higher priority than the real-time thread that is waiting on ate Load. The samples that are greater than 1 msec only the timer. This effect is substantiated by the results of the represent 0.003% of the total samples. The results for the DPC timing test shown in Table 6 where a maximum delay 10 msec timer under No Load are slightly better but very of nearly 11 msec is seen while using the CD-ROM Win- similar to the results under a Moderate Load. For a Heavy Bench test as a load. Load the maximum jitter occurs more frequently. However a jitter over 1 msec occurs in only 12% of the cases. TABLE 6. NT benchmark results under Figure 2 shows a histogram of the results for a timer different WinBench 98 loads with a period of 100 msecs and a resolution of 10 msecs. Maximum value Out of the more than one million samples gathered for the case with a Moderate Load, only one is on the order of 1 CPU/ CD- msec. There are also less than 0.3% that are greater than Benchmark FPU ROM Disk Graphics µ 100 secs. The results are worse when a Heavy Load is Jitter, T=10, 4496 41737 30881 13455 used. In this case about 12.5% are greater than 1 msec and R=1(µsec) 1.4% greater than 10 msecs. Jitter, T=100, 4880 1372570 25781 13136 The response results shown in Table 5 show that even R=1(µsec) without a load the time to perform a fixed amount of pro- Jitter, T=100, cessing can take anywhere from 59% to 210% more time 5224 2400660 24086 8645 R=10(µsec) than the desired base case. The average response is close to the desired time in all cases, i.e. about 10 msec. The results Bintime(µsec) 870963 758577 870964 501187 are the worst under a Heavy Load. 
The worst case for the Response 23 174 117 71 Add test is more than six times that of the base case and for (Add)(msec) the Disk test the maximum time takes close to four sec- Response 26.9 1,007.5 108.5 62.1 onds. The Disk test is especially bad because the DPCs (Copy)(mses) used in its I/O processing have to contend with DPCs used Response 3988 5601 548 1013 by the application programs used to create the loads. (Disk)(msec) The two benchmarks that test the interrupt performance ISR Latency of NT show that NT’s interrupt response time is fairly 28.9 33.9 47.2 35.7 (µsec) good, but the time it takes to start processing a DPC can be more than 11 msecs. Interrupt response time is good DPC Timing µ 1940 10772 4024 1916 because interrupts are higher priority than anything else in ( sec) the system. Table 5 shows that the maximum interrupt latency is 54.8 µsecs. The maximum DPC service time is The WinBench loads have an even greater effect on the so great because DPCs are processed in FIFO order. When real-time response than the loads of the previous section. the system is loaded the DPCs in the benchmark are For example the Add Response test has a maximum delayed by the DPCs generated by the application load. response time that is 17 times the base case under the CD- ROM WinBench load. Conversely the CPU WinBench Testing NT using WinBench 98. Using WinBench98 to load has the least effect on the Add Response test. The test generate a load allows us to pinpoint more accurately the and the WinBench load compete equally for the CPU. type of processing that causes the greatest effect on real- Likewise the Disk Response test is least affected by other time performance. Table 6 shows the maximum values for disk activity. The Disk Winbench load has the least effect the different WinBench loads. on the Disk Response benchmark. As Table 6 shows the timer jitter is most affected by a Table 6 shows that the maximum Bintime result is dra- load that contains a lot of I/O processing. 
The CD-ROM matically affected by the WinBench loads. Bintime tries to WinBench load produces the worst case jitter followed by utilize 100% of the CPU and when it must compete with the Disk and Graphics tests. The worst case jitter for the another CPU intensive application the delay between calls CD-ROM test is close to 1.4 seconds for a timer with a to the clock function can be as high as 870 msecs. 4 Dual-OS system Interrupt processing. Intime uses an interrupt mecha- nism similar to NT’s. Interrupt processing under INtime is As seen in the previous section, NT alone is not well performed using two elements: an ISR and a corresponding suited for use as a real-time operating system. However, interrupt thread. The ISR acknowledges the interrupt and frequently real-time systems utilize a mixture of real-time should perform a minimal amount of processing because and non-real-time processing. Because of the vast amounts all interrupts are disabled while processing an ISR. The of COTS available for NT there is an advantage, in terms system calls that an ISR can use is also limited. The bulk of of cost and usability, in using NT for the non-real-time pro- the processing should be performed in an interrupt thread. cessing. These requirements have led to the development An interrupt thread is analogous to a DPC, however inter- of several products which add a real-time extension to NT, rupt threads are prioritized. enabling a greater degree of real-time performance to be achieved while retaining the ability to run NT applications Communication and synchronization. INtime provides a out-of-the-box. These extensions essentially create a Dual- number of means of communication and synchronization OS system where all the real-time processing runs under between real-time threads as well as between a real-time the extension and the non-real-time processing runs on NT. process and a NT process. 
Message passing between A possible disadvantage of these products is the added threads is accomplished using mailboxes. A mailbox can be software cost. The cost of these products is driven by the used to pass up to 128 bytes of data between threads. When relatively small real-time computing market and not the sending a message the sender queues the sent message and mass consumer market. Therefore the extension products returns. The receiver of the message blocks on the receive will have a price closer to that of an RTOS which is consid- operation until the message is delivered. Shared memory is erably more than the cost of Windows NT. However, these used to share larger amounts of data between threads. All costs must be weighed with the savings of using NT COTS shared memory is locked into INtime’s address space and software for non-real-time tasks. Therefore the Dual-OS cannot be paged out of memory. approach provides a solution that has better real-time per- INtime provides two types of synchronization mecha- formance than NT but is more cost effective than using a nisms: semaphores and regions. A semaphore is a autono- custom RTOS. mous shared binary counter. One thread signals the To evaluate the usefulness of this Dual-OS approach we semaphore and another blocks waiting on the signal. evaluated one of the products, INtime from RadiSys, using Regions can be used to guard critical sections of code. Pri- the same benchmarks used for NT in the previous section. ority inheritance is implemented in regions thereby provid- The remainder of this section describes INtime and pre- ing a solution to the priority inversion problem. sents the results of the benchmark testing. Real-time API. To control the real-time mechanisms such 4.1 The INtime real-time extension to NT as thread scheduling and communication INtime provides a real-time API which is patterned after Win32. 
However, INtime encapsulates all of NT into one hardware task since Win32 is not a real-time API, the real-time API is and creates another hardware task for the real-time exten- specific to INtime. This is one of the drawbacks of using an sion. This encapsulation provides hardware isolation of the extension product, i.e. the code is dependent on one partic- real-time memory space from the NT memory space. ular vendor. The alternative of modifying Win32 function Everything that runs under NT then runs within a single calls to support real-time processing is not an attractive task that runs at the lowest priority under INtime. INtime solution either. An API which is syntactically the same as modifies the Hardware Abstraction Layer (HAL) of NT to Win32 but semantically different introduces inconsisten- prevent NT from modifying the real-time clock and dis- cies and invites application errors. Also different vendor’s abling or remapping real-time interrupts. modifications to Win32 will undoubtedly be inconsistent.

4.1.1 Design of INtime

Scheduling. The INtime scheduler is hooked into the NT timer interrupt, the highest priority interrupt other than the Non Maskable Interrupt (NMI). The INtime scheduler can therefore preempt any NT thread, DPC, or interrupt processing. There are 256 priority levels within INtime compared to NT's 32 levels.

4.1.2 INtime's support for real-time processing

Because of its design, INtime does not exhibit the same limiting factors with respect to real-time performance. The differences between INtime and the choices made by NT, as described in Section 3.3, are as follows:

• 256 thread priorities: INtime provides 128 user thread priorities and 128 interrupt thread priorities.
• Support for priority inheritance: Regions support priority inheritance in INtime, which provides a solution to the priority inversion problem.
• Interrupt thread priorities: Unlike their counterparts in NT (DPCs), interrupt threads in INtime are prioritized. This eliminates indeterminate delays within time critical interrupt processing. It also allows the system designer to prioritize interrupts, so a time critical device driver does not have to disable interrupts.
• All memory used within INtime is locked into physical memory: This eliminates the nondeterministic delay caused by page swapping.

Because of the basic design and use of NT based systems, there is a fundamental limit to the amount of nondeterminism that INtime or any other real-time extension can eliminate from the system. One example is the fact that code running in NT kernel mode can disable interrupts using the CLI instruction. Because the real-time scheduler is connected to the highest priority interrupt, disabling interrupts also disables the real-time scheduler. Within NT itself this is rarely done, but all device drivers run in kernel mode and there is no guarantee that they will not disable interrupts for prolonged periods of time. Another source of nondeterminism that cannot be prevented is I/O bus locking. For example, if a device driver locks the PCI bus, it will block any other process that needs to access it. This occurs, by default, in the PCI based Matrox video card that was used in our testing. In this particular case the problem can be eliminated by disabling the automatic PCI retry feature of the device driver.

4.2 Benchmark results for the Dual-OS system

In this section we use the benchmark tests described in Section 2.2 to evaluate the effectiveness of INtime.

Real-time performance of INtime with a load. Table 7 shows the maximum and average values of the real-time benchmarks described in Table 1. With a few exceptions, the benchmarks correspond closely to those run under NT. INtime allows a higher timer resolution, so we ran additional jitter tests with periods of 1 msec and 200 µsec. The Disk Response test could not be run because INtime does not support NT disk operations within real-time threads. The Bintime benchmark tries to use 100% of the CPU, so it is not possible to run it alongside a load. Also included are the Semaphore test, which tests the signaling delay of a semaphore, and the NT/RT Round-trip test, which tests the communication delay between NT and a real-time thread.

TABLE 7. Benchmark results for INtime under different loads (T = timer period, R = timer resolution, both in msec)

                                    No Load             Heavy Load
Benchmark                         Max      Avg        Max      Avg
Jitter, T=0.2, R=0.2 (µsec)      16.9     0.57       19.4      1.6
Jitter, T=1, R=1 (µsec)          22.1     1.2        26.2      2.9
Jitter, T=10, R=1 (µsec)         47.2     1.7        47.2      8.9
Jitter, T=100, R=1 (µsec)        36.9     1.7        40.3      6.6
Jitter, T=100, R=10 (µsec)       41.4     1.9        40.7      6.9
Bintime (µsec)                   18.2     6.9        N/A       N/A
Response (Add) (msec)            10.3    10.0        10.3     10.0
Response (Copy) (msec)           10.0    10.0        10.0     10.0
ISR Latency (µsec)               10.7     6.1        13.6      6.6
Thread Timing (µsec)             18.9     5.6        20.8      6.1
Semaphore (µsec)                 13.6     3.7        15.4      3.7
NT/RT Round-trip (µsec)         361.8    96.7      8749.9     98.6

As Table 7 shows, real-time processing under INtime is much more deterministic than under NT. The worst case timer jitter is less than 50 µsec even under a Heavy Load. In all cases the jitter is lower than the timer resolution. The average jitter is also low, ranging from 0.57 µsec to 8.9 µsec. The response for Bintime is at most 18.2 µsec, compared to a maximum of 6.9 msec for NT. The 18.2 µsec Bintime response seen for INtime is better than the response times of 16 msec for Lynx OS and 65 msec for Solaris seen in another study[22]. Figure 3 and Figure 4 show histograms of the jitter for two timers (T=10, R=1 msec and T=100, R=10 msec). For the 10 msec timer, 99.4% of the samples are less than 10 µsec without a load. With a Heavy Load, 81% of the samples are less than 10 µsec.

FIGURE 3. Histogram of INtime Jitter for period T=10 ms and resolution R=1 ms [plot of frequency versus jitter (µsec) for No Load, Moderate Load, and Heavy Load]

FIGURE 4. Histogram of INtime Jitter for period T=100 ms and resolution R=10 ms [plot of frequency versus jitter (µsec) for No Load, Moderate Load, and Heavy Load]

Testing INtime using WinBench 98. Table 8 shows that, even though real-time processing under INtime is still more deterministic than under NT, the WinBench loads have a more significant effect on real-time performance than the application loads shown in the previous section. The worst case jitter is slightly greater, but is still less than 100 µsec.

TABLE 8. INtime benchmark results under different WinBench loads (maximum value; T and R in msec)

Benchmark                       CPU/FPU    CD-ROM      Disk    Graphics
Jitter, T=0.2, R=0.2 (µsec)        18.4      23.4      18.1      25.7
Jitter, T=1, R=1 (µsec)            23.2      28.3      35.6      33.6
Jitter, T=10, R=1 (µsec)           45.5      58.9      58.0      56.5
Jitter, T=100, R=1 (µsec)          44.8      56.7      53.9      67.8
Jitter, T=100, R=10 (µsec)         49.3      62.9      65.2      62.1
Response (Add) (msec)              10.3      10.3      10.3      10.3
Response (Copy) (msec)             12.2      12.2      12.2      12.2
ISR Latency (µsec)                 18.4      18.3      20.9      21.6
Thread Timing (µsec)               20.4      22.7      23.3      24.2
Semaphore (µsec)                   13.5      15.1      12.9      24.4
NT/RT Round-trip (µsec)          1252.1   18485.4   23122.4    1300.9
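The jitter rows in Tables 7 and 8 and the histogram percentages quoted for Figure 3 are all summaries over a series of timer expirations. The following is a minimal sketch of how such a summary could be computed, not the benchmark code used in this paper. It assumes that jitter for one period is the absolute difference between the measured interval and the nominal period T; the definition given in Section 2.2 may differ. The names jitter_stats and JitterStats and the sample timestamps are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JitterStats:
    max_us: float        # worst case jitter, as in the "Max" columns
    avg_us: float        # mean jitter, as in the "Avg" columns
    share_below: float   # fraction of samples under the threshold


def jitter_stats(wakeups_us: Sequence[float], period_us: float,
                 threshold_us: float = 10.0) -> JitterStats:
    """Summarize timer jitter from consecutive wake-up timestamps (µsec).

    Assumption: jitter of one period = |measured interval - nominal period|.
    """
    if len(wakeups_us) < 2:
        raise ValueError("need at least two timestamps")
    # One jitter sample per pair of consecutive wake-ups.
    jitters = [abs((later - earlier) - period_us)
               for earlier, later in zip(wakeups_us, wakeups_us[1:])]
    below = sum(1 for j in jitters if j < threshold_us)
    return JitterStats(max_us=max(jitters),
                       avg_us=sum(jitters) / len(jitters),
                       share_below=below / len(jitters))


if __name__ == "__main__":
    # Hypothetical wake-up times for a 10 msec (10,000 µsec) timer.
    sample = [0, 10_000, 20_004, 30_001, 40_013]
    print(jitter_stats(sample, period_us=10_000))
```

For the hypothetical sample above, the four intervals deviate from the period by 0, 4, 3, and 12 µsec, giving a maximum of 12 µsec, an average of 4.75 µsec, and three of four samples under the 10 µsec threshold.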
The Copy Response test shows the most significant difference in determinism. For all the WinBench loads the maximum is about 22% higher than the base case. This is most likely attributable to cache effects, which occur because the data written in the test is flushed from the cache by the WinBench load. This worst case happens relatively infrequently. For example, for the CD-ROM load only 0.006% of the samples are greater than 20 msec.

The other benchmarks also show a significant advantage for using INtime versus NT. The Add and Copy response results are also very close to the base case. The interrupt latency is slightly better for INtime, and the thread timing results are significantly better than the DPC timing results of NT. The interrupt threads are higher priority than any NT processing. Communication between threads in INtime is very deterministic, as shown by the results of the Semaphore test. However, the NT/RT Round-trip benchmark shows that communication between NT and a real-time thread can take as much as 8.75 msec.

5 Conclusion - Comparing the real-time extensions to NT

When trying to decide whether to use NT for a real-time system or whether to also include a real-time extension, one must weigh the real-time requirements of the system against the issues of cost and portability. NT as a technology is mature and widely used, so software developed under it will have better longevity than software developed using a vendor specific API. The real-time extension products are not as mature, but they offer the opportunity for significantly better real-time performance. One must also consider that, with either approach, there will always be limitations imposed by using an NT based system and COTS software[23].

The benchmark results in this paper show that the real-time performance of the INtime extension product is significantly better than that of NT. We show that INtime can support periodic processing on the order of microseconds, under any loading condition, whereas NT can only support millisecond timers under light or no load. INtime can perform a fixed amount of processing with at most a 22% increase in processing time. NT, on the other hand, exhibits a worst case increase close to 10,000%. Even though our benchmark results are based on only one of the available products, we expect that other products that use the same basic underlying technology would exhibit similar results.

The difference between NT and INtime is largest when non-real-time application loads are run alongside real-time ones. It is important to consider this non-real-time processing because the biggest advantage of using NT is the ability to leverage the vast amount of COTS software available for it. Also, our benchmark tests only consider a single thread of real-time processing; running multiple threads will also increase the overall system load. The Dual-OS approach addresses these two problems first by isolating the non-real-time processing of NT into a low priority task, and second by using a priority scheme for all objects in the system so that all real-time processing can be prioritized.

Because the development of NT applications has a lower software cost, for systems with fairly soft real-time requirements using NT by itself may be more practical than the Dual-OS approach. NT is a lower cost solution because the general purpose operating system market is much larger than the real-time OS market. There may also be other software development costs associated with using a real-time extension. Each extension product uses its own API. This requires a time investment on the part of the developer to learn this API, and if in the future the product is no longer supported, porting the system to a new environment requires learning yet another API. Also, because of limited resources, the extension manufacturers generally provide real-time device drivers for only a subset of the different types of peripheral devices. Using any unsupported hardware requires writing a custom device driver.

There are other PC based operating systems besides NT that may allow the implementation of real-time systems using PC hardware. Windows CE has potential as a real-time OS[24][25]. CE requires a relatively small footprint and was developed with the hand-held market in mind. It is also modular, which allows a system developer to customize the OS for each system. However, CE is not identical to NT and therefore has the disadvantage that it cannot run NT applications out-of-the-box. Linux is another operating system that, in the future, may allow real-time processing on PC platforms. Linux is a UNIX variant and there are current research efforts underway for the development of a real-time Linux[26][27].

References

[1] H. Custer. Inside Windows NT. Microsoft Press. 1993.
[2] Defense Information Systems Agency (DISA). “Defense Information Common Operating Environment (DII COE)” 1998. http://spider.osfl.disa.mil/dii/.
[3] Defense Information Infrastructure (DII COE) Integration and Runtime Specification (I&RTS), Version 3.1, 1998.
[4] Dept. of the Navy (DON CIO ITSGIPT). “Information Technology Standards Guidance” Version 98-1.1. 1998.
[5] M. Timmerman and J. Monfret. “Windows NT as Real-time OS?” Real-Time Magazine 2Q97. 1997.
[6] K. Ramamritham, et al. “Using Windows NT for Real-time Applications: Experimental Observations and Recommendations” Proceedings of RTAS. 1998.
[7] R. Malina. “Using the Windows NT OS for Soft RT Control” Rockwell Automation. 1998. http://www.openautomation.com/newnt.html.
[8] “Real-Time Systems and NT” Microsoft. 1995. http://premium.microsoft.com/msdn/library/backgrnd/html/realtime.htm.
[9] R. Simon. Windows NT WIN32 API Superbible. Waite Group Press. 1997.
[10] M. Timmerman et al. “Windows NT Real-Time Extensions: Better or Worse” Real-Time Magazine 3Q98. 1998.
[11] INtime: http://www.radisys.com/products/intime/index.html.
[12] B. Carpenter, et al. “The RTX Real-Time Subsystem for Windows NT” Proceedings of the Usenix Association Windows NT System Engineering Workshop. 1997.
[13] VenturCom homepage: http://www.venturcom.com/.
[14] Imagination Systems, “Windows NT for Real-Time Control?” http://www.imagination.com/paper1.html.
[15] N. Frampton, J. Tsao, J. Yen. “Hard Real-time Extensions of Windows NT Evaluation Report” GM Powertrain. 1997. http://www.arcweb.com/omac/Documents/ntrtrpt2.pdf
[16] M. Jones and J. Regehr. “Issues in Using Commodity Operating Systems for Time-Dependent Tasks: Experiences from a Study of Windows NT” Proc. of NOSSDAV. 1998.
[17] K. Obenland, et al. “Dual Real-time/Non-real-time Common Operating Environment” MTR 98W0000154. 1998.
[18] E. Douglas Jensen. “Real-time for the Real World.” http://www.real-time.org/.
[19] L. Monk, et al. “Real-Time Communications Scheduling: Final Report” The MITRE Corp., MTR 97B0000069. 1997.
[20] Ziff-Davis Inc. WinBench98 Benchmark suite. http://www.zdnet.com/zdbop/winbench/winbench.html.
[21] J. Richter. Advanced Windows. Microsoft Press. 1997.
[22] R. Freedman et al. “Real-time Benchmarking of the Solaris and Lynx Operating Systems” MTR 980000043. 1998.
[23] Radisys Corp. “Determinism and the PC Architecture.” http://www.radisys.com/news/articles/pdf/determinism.pdf.
[24] “RT Systems and Microsoft Windows CE” Microsoft. 1998.
[25] M. Timmerman, et al. “Is Windows CE a real threat to the RTOS World” Real-Time Magazine 3Q98. 1998.
[26] Michael Barabanov. “A Linux-based Real-time OS.” Master’s Thesis New Mexico Institute of Technology. 1997.
[27] B. Srinivasan, et al. “A Firm Real-Time System Implementation Using COTS HW and Free SW.” RTAS. 1998.