The Second International Conference on Sensor Technologies and Applications

Anatomy of RTOS and Analyze the Best-Fit for Small, Medium and Large Footprint Embedded Devices in

Ranjan Dasgupta
Innovation Lab, Kolkata, Tata Consultancy Services Ltd, India

Abstract

A wireless sensor network is characterized as a massively distributed and deeply [...] with small, medium and large footprint embedded devices. The sensor devices and wireless sensor nodes are often severely resource constrained. Typically they are equipped with an 8-bit MCU, 100-512KB of code memory and 4-64KB of RAM. The background / foreground approach, finite state machine based software design and event-driven programming techniques are capable of handling the concurrent and asynchronous events that usually occur in these devices. Network routers and coordinators, however, are often medium and large footprint devices, equipped with a 16/32-bit MCU, 640KB-64MB of code memory and more RAM, where a real-time operating system is easier to implement. For a wireless sensor network design, choosing a suitable lightweight real-time operating system and OS abstractions that provide a rich enough execution environment while staying within the limitations of the constrained devices is a great challenge. This paper presents several RTOS architectures and analyzes the best fit for sensors and other wireless network devices.

1. Introduction

The wireless sensor network (WSN), as a key enabling technology for [...], has a wide range of applications, which are deeply embedded in the physical world. The embedded devices that make up a WSN (sensor, sensor node, router, coordinator) are generally resource constrained [1]: they have small CPUs, small memory footprints and limited energy. The limited hardware resources make special applications necessary, which in turn creates special requirements for the real-time operating system (RTOS). Choosing the best fitted RTOS for WSN hardware is therefore one of the greatest challenges to overcome. This paper presents several RTOS architectures, differentiates them in terms of complexity, and identifies which one is more suitable for deployment on small, medium or large footprint wireless embedded devices.

2. Architecture

2.1. WSN Framework

A component-based framework [2] is desirable for the development of [...] for WSN. The components of the framework provide the functionality of single sensors, sensor nodes and the whole sensor network. According to these components, applications are classified into sensor applications, node applications and network applications. Some of these applications are easy to implement without an RTOS, some require a simple scheduler, and some require a full-fledged RTOS, as discussed in this paper.

2.2. Sensor Application

The sensor application contains the entire measurement and readout of a sensor as well as the local storage of the data. It has full access to the hardware and is able to access the operating system (OS) directly. Network access is not possible. Smart sensors are small footprint embedded devices, and their hardware architectures are characterized by the following specifics:
• Restricted CPU power
• Restricted available memory, in the order of 100KB of code memory and less than 20KB of RAM
• Limited energy supply, mainly provided by batteries or alternative sources such as the sun, vibrations, etc.
• Dimension and weight restrictions, which can lead to cooling and CPU usage restrictions

All these specifics of smart sensors require special attention when choosing the RTOS architecture. For the sensor application shown in Fig 1, an operating system should implement low-power task scheduling. A scheduling algorithm could take advantage of nonlinear battery effects to reduce energy consumption. Three system software approaches can satisfy these constraints:

978-0-7695-3330-8/08 $25.00 © 2008 IEEE DOI 10.1109/SENSORCOMM.2008.139

• Foreground and background architecture
• Queued architecture
• Simple scheduler

Figure 1 Structure of a sensor application

2.3. Node Application

Sensor nodes are small to medium footprint embedded devices with wireless communication and networking capabilities. Typical sensor nodes are equipped with an 8/16-bit MCU, 512KB-4MB of code memory and more than 128KB of RAM. The node application contains all application-specific tasks and the functions of the middleware needed to build up and maintain the network, e.g., routing, looking for nodes, discovering services and self-localization. The different requirements and objectives for WSN can be achieved only by using a flexible architecture for the node software. The node software is therefore divided into three parts according to the main tasks (Fig 2).

The operating system handles the device-specific tasks. These include bootup, initialization of the hardware, scheduling, memory management and process management. The OS consists of specially tailored parts, only those needed by the specific application of the node.

The second part is the sensor driver. It initializes the sensor hardware and performs the measurements in the sensor. It encapsulates the sensor hardware and provides an optimized application programming interface (API) to the middleware.

The host middleware is the superior software layer. Its main task is to organize the cooperation of the distributed nodes in the network. The middleware management handles four optional components, which can be implemented and exchanged according to the node's task. Modules are additional components that increase the functionality of the middleware. Typical modules are routing modules or security modules. Algorithms describe the behavior of the modules, e.g. the behavior of a security module can vary if the encryption algorithm changes. The service component contains the software required to perform local and cooperative services. This component usually cooperates with other nodes to fulfill its task. Virtual machines (VM) enable the execution of platform-independent programs.

Figure 2 Structure of a node application (blocks: Host Middleware with Middleware Management, Algorithms, Modules, Services and VM; Operating System; Sensor Driver; CPU; Sensor)

All these functions of the sensor node application require one of the following system software approaches:
• Finite state machine and event driven architecture
• Reactive deadline driven architecture
• Virtual machine architecture

2.4. Sensor Network Application

The sensor network application mostly runs on medium to large footprint wireless embedded devices (e.g. network routers and the base station or coordinator). It describes the main tasks and provides the services of the entire network. Fig 3 shows the logical view of a sensor network application. The nodes can be contacted only through services of the middleware layers. They do not perform any individual tasks. The distributed middleware coordinates the cooperation of the services within the network. It is logically located in the network layer but it exists physically in the nodes. All layers together, in conjunction with their configuration, compose the sensor network application.

Figure 3 Structure of a sensor network application (Sensor Network Application over Distributed Middleware; Node A and Node B, each with Middleware, Operating System and Hardware)

All these functions of the sensor network application require one of the following system software approaches:
• Preemptive scheduler
• Layered multithreaded and network stack architecture

• Evolvable architecture

3. RTOS In Small Footprint Embedded Sensor Device

3.1. Foreground / Background Architecture

This approach [3] is a standard one for software implementation in small footprint embedded sensor devices and is a good solution when there is no need for well separated parallel execution. The foreground / background architecture consists of two main parts. The foreground comprises the interrupt service routines (ISRs) that handle asynchronous external events in a timely fashion. The background, or main, is an infinite loop which repeats the same activities forever and uses all remaining CPU cycles to perform the less time-critical processing. Parallelism is achieved through the ISRs, which are triggered by a software or hardware interrupt. The flow of the main loop is suspended when an interrupt occurs and while it is being processed by the ISR. The foreground typically communicates with the background through shared memory. The background loop protects this memory from potential corruption by disabling interrupts when accessing the shared variables. Dynamic power savings are also possible by frequently switching the MCU to a low-power mode (LPM) under software control, but the transition to LPM must be atomic, or at least interrupt-safe.

However, the foreground / background approach has several problems:
• It inevitably leads to the use of variables with global scope.
• It is hard to separate the application into independent modules that have their own timing constraints. Such an implementation has too many dependencies, and changes are therefore hard to maintain.
• The ISRs need to execute fast and return when done. This creates delays for certain activities and often means that the ISR needs to schedule something to happen later. The actual processing then has to wait until control returns to the corresponding point in the main loop.

3.2. Queued Architecture

One way to overcome the limitations of the ISR approach is to provide a layer that queues all interrupt sources and the data associated with them. The application can check and process the queued events and interrupts in the main loop in the order of their arrival, which keeps the application code well organized. If latency is not an issue, this approach is cleaner than having many ISRs. If there are some latency-sensitive events, some minimal reconfiguration of the ISR is required in order to meet the real-time requirements. This technique is used by Jennic in its application queue implementation.

3.3. The Simple Scheduler

A much more flexible way is to use a simple task scheduler, which has the advantage of allowing the creation of tasks. The term 'task' here is equivalent to a lightweight process or thread. Tasks allow the software solution to be divided into independent execution contexts having their own logic. Having their own context is especially beneficial for locally encapsulated variables, which reduces the need for global variables.

There needs to be some form of inter-process communication (IPC) in order to exchange information between tasks, but this is done through the framework and is not directly controlled by the tasks. The tasks can exchange messages, events, signals, etc. through message queues that the solution needs to provide. This is a small RTOS with nothing but the basic functionality needed for parallel software execution with deterministic behavior. The main steps in implementing this approach include:
• Initialization of the scheduler, which allocates resources and creates the framework for IPC and task entry points
• Creation of task code
• Starting the scheduler, which starts the tasks and the scheduling algorithm
• Handling of tasks and message queues for as long as there are tasks running

The scheduler usually runs in a round-robin fashion and allows tasks to be executed in turn. It assigns time slices to each task in equal portions and in order, thus handling all tasks without priority. The round-robin approach is suitable when there are only a small number of tasks that do not need to be run with different priorities. If the scheduler needs to provide preemptive execution based on task priorities, the complexity of the simple scheduler solution rises significantly. Unfortunately a preemptive RTOS is not always an option for small footprint embedded sensor devices with a built-in low-end MCU, which simply might not have enough RAM to accommodate it. AVIX, RTX51 and RTXC 3.2 are a few high-performance, low-overhead small real-time kernels designed for single-chip MCU applications where code size is the most important factor.

4. RTOS In Medium Footprint Embedded Sensor Node

4.1. Finite State Machine And Event Driven Architecture

A finite state machine (FSM) is described by an initial state, a set of inputs, outputs, a set of states, and the allowed transitions. This approach is fairly predictable,

because all the elements are a finite set and are predefined. A Finite State Machine Operating System (FSMOS) is compact, modular and optimal for medium footprint embedded sensor nodes, and has some interesting properties compared with a pre-emptive RTOS:
• An FSMOS can be made completely processor independent, i.e. written completely in [...] with no architecture-dependent code.
• All processes share work space (space for stack, parameters and local variables).
• As opposed to a pre-emptive RTOS, there are no concerns about atomic operations during process execution. A process function call always completes before the next process function is called. Therefore non-reentrant code can be used.
• There is less chance that a program based on an FSMOS will corrupt the stack in case of bugs. FSMOS programs are easier to debug. Good development tools are easily available for FSMOS based systems.

Event-driven programming is a common programming model for tiny memory-constrained wireless embedded systems, and it enforces a state machine programming style. Compared to multi-threaded systems, event-driven systems do not need to allocate memory for a per-thread stack, which leads to low or medium memory requirements. For this reason many operating systems for embedded sensor nodes, such as TinyOS and Contiki, are based on an event-driven model.

Berkeley's TinyOS is the most popular state machine based operating system for wireless sensor networks. It is a lightweight open source operating system designed to use minimal resources, and its configuration is defined at compile time by combining the TinyOS library and custom-developed components. A TinyOS application is implemented as a set of component modules written in nesC.

Contiki [4] is built around an event-driven kernel, which allows individual programs and services to be dynamically loaded and unloaded in a resource constrained environment while keeping the base system lightweight and compact. It includes the μIP TCP/IP stack and therefore has full TCP/IP support. Contiki provides optional preemptive multithreading as an application library that can be applied to individual processes running on top of the kernel, without the overhead of reentrancy or multiple stacks in all parts of the system, which leads to a very flexible structure.

4.2. Reactive Deadline Driven Architecture

RTOSs in general are too heavyweight for embedded sensor nodes, but FSM based minimalistic RTOSs like TinyOS, Contiki, PicoOS, FreeRTOS and Nucleus RTOS are suitable because of their unique features. The three key features a minimalistic RTOS should have are:
• Supply sufficient infrastructure for reactive concurrent programming
• Preserve state integrity
• Realize real-time constraints

But the major problem that still remains unsolved in TinyOS, as well as in all the others, is the realization of real-time constraints at run time.

Timber [5] is a reactive deadline-driven language for embedded programming. It is concurrent, object-oriented and functional. There is no division between application and RTOS; rather, the application and the run-time environment form one unit (Fig 4). The language allows concurrently executing objects, each in its own execution context. Inter-object communication takes the form of messages. By sending a message, a method can be invoked from another object. The semantics of Timber specify that every invoked method has to be executed within a specified time. This means that timing constraints are specified in the source code and are enforced by the scheduler at run time. Timber has a run-time system that allows all that to happen. The key features of the run-time system are:
• Scheduling: The fundamental functionality of the run-time system is to achieve concurrency between Timber objects, with scheduling based on the baselines and deadlines of their methods.
• Message-passing: Supplying sufficient infrastructure for the inter-object communication.
• Threading: Providing the unique execution contexts for Timber objects.
• Time: Ability to supply sufficient time information to make baselines and deadlines meaningful.
• Interrupt handling: Functionality for receiving and distributing interrupts throughout the system.
• Automatic memory management: Timber does not rely on explicit allocation and de-allocation of dynamic data and needs an automatic memory manager that provides garbage collection.
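Scheduling on baselines and deadlines, as described for Timber's run-time system, can be illustrated with a small host-side simulation. The sketch below is a generic, non-preemptive earliest-deadline-first (EDF) dispatcher in Python. It is not Timber's scheduler, whose algorithm the text does not detail. The names `Message` and `run_edf`, and the `cost` field (an assumed execution time in ticks), are hypothetical and introduced only for illustration.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(order=True)
class Message:
    """A method invocation with timing constraints (hypothetical model)."""
    deadline: int                          # absolute tick by which the method must finish
    seq: int                               # tie-breaker for messages with equal deadlines
    baseline: int = field(compare=False)   # earliest tick at which the method may start
    name: str = field(compare=False)
    cost: int = field(compare=False)       # assumed execution time in ticks


def run_edf(messages: List[Message]) -> List[Tuple[str, int, int, bool]]:
    """Simulate non-preemptive earliest-deadline-first dispatch.

    Returns one (name, start, finish, deadline_met) tuple per message,
    in the order in which the messages were executed.
    """
    pending = sorted(messages, key=lambda m: m.baseline)  # not yet released
    ready: List[Message] = []                             # heap ordered by deadline
    now = 0
    nxt = 0
    log = []
    while nxt < len(pending) or ready:
        # Release every message whose baseline has been reached.
        while nxt < len(pending) and pending[nxt].baseline <= now:
            heapq.heappush(ready, pending[nxt])
            nxt += 1
        if not ready:
            # Nothing runnable: idle until the next baseline.
            now = pending[nxt].baseline
            continue
        msg = heapq.heappop(ready)          # earliest deadline first
        start = now
        now += msg.cost                     # the method runs to completion
        log.append((msg.name, start, now, now <= msg.deadline))
    return log


if __name__ == "__main__":
    demo = [
        Message(deadline=10, seq=0, baseline=0, name="log", cost=3),
        Message(deadline=5, seq=1, baseline=0, name="sample", cost=2),
        Message(deadline=7, seq=2, baseline=1, name="radio", cost=2),
    ]
    for entry in run_edf(demo):
        print(entry)
```

A real deadline-driven run-time would also preempt a running method when a more urgent message is released; the sketch omits this to stay short.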

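The run-to-completion, event-driven style of Section 4.1 can be sketched as a host-side Python simulation. The example assumes a hypothetical three-state sensor node (`IDLE`, `SAMPLING`, `SENDING`) driven by a transition table. It is not TinyOS or Contiki code; `EventLoop`, the event names and the handler functions are illustrative assumptions only.

```python
from collections import deque


def start_sampling(ctx):
    ctx["samples"] += 1


def send_packet(ctx):
    ctx["sent"] += 1


def do_nothing(ctx):
    pass


# (current state, event) -> (next state, action); anything not listed is not allowed.
TRANSITIONS = {
    ("IDLE", "TIMER"): ("SAMPLING", start_sampling),
    ("SAMPLING", "ADC_DONE"): ("SENDING", send_packet),
    ("SENDING", "TX_DONE"): ("IDLE", do_nothing),
}


class EventLoop:
    """Single-stack event dispatcher: one handler runs at a time, to completion."""

    def __init__(self, initial_state="IDLE"):
        self.state = initial_state
        self.queue = deque()                    # events posted by (simulated) ISRs
        self.ctx = {"samples": 0, "sent": 0}    # shared work space, no per-task stack

    def post(self, event):
        self.queue.append(event)

    def run(self):
        """Process queued events in arrival order and return the final state."""
        while self.queue:
            event = self.queue.popleft()
            key = (self.state, event)
            if key not in TRANSITIONS:
                continue                        # event not allowed in this state: ignore
            next_state, action = TRANSITIONS[key]
            action(self.ctx)                    # never interrupted by another handler
            self.state = next_state
        return self.state


if __name__ == "__main__":
    loop = EventLoop()
    for ev in ("TIMER", "ADC_DONE", "TX_DONE"):
        loop.post(ev)
    print(loop.run(), loop.ctx)
```

Because each handler finishes before the next event is dispatched, the handlers can use non-reentrant code and share one work space, which is the property the FSMOS bullet list relies on.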

Figure 4 Timber run-time system overview (layers: Timber Application; RTS with Scheduler, Memory Manager, IRQ/Message handler, Time and Environmental Interface; Embedded Device)

4.3. Virtual Machine Architecture

The virtual machine (VM) concept is very different from the alternatives discussed so far. In many constrained node applications an RTOS, particularly a commercial one, is a hindrance rather than a help. It drives up the amount of memory and CPU resources required beyond what many embedded sensor nodes can handle. Here a VM, which needs no OS as an intermediary, is a big advantage.

Squawk [6] is a Java virtual machine (JVM) for embedded sensor nodes. It is a small Java 2 Micro Edition (J2ME) virtual machine written almost entirely in Java, and it runs without an RTOS on embedded hardware. Classes to be deployed onto the device are verified and transformed into Squawk's internal object representation, which is then saved to a file called a suite. Suites are then loaded into the sensor node device and are interpreted by the VM on-device. This allows a smaller VM to be stored in the sensor node, as well as a faster start-up time for the node application. Garbage collection, thread management and interrupt handling have special implementations in Squawk in addition to standard Java functionality. Interestingly, the drivers are also written in Java, leaving only a small portion of native code (Fig 5). Squawk also provides an isolation mechanism by which an application is represented as an object. In Squawk, one or more applications can run in a single JVM. Conceptually each application is completely isolated from all other applications. The performance penalty of using Java is mitigated by the ever increasing power of today's 16/32-bit MCUs.

Figure 5 Architectural difference between Squawk Java VM and standard Java VM

Another example of a VM approach, from the TinyOS community [7], is called Maté. It addresses the problem of reprogramming sensor nodes, from changing configuration parameters to replacing application code.

5. RTOS In Large Footprint Wireless Embedded Device

5.1. Preemptive Scheduler

Preemptive scheduling means dynamic task priority management. A running task may be rescheduled at any instruction by the arrival of a higher priority task. A preemptive scheduler can pre-empt (interrupt) lower priority tasks and resume them at a later time. Such a change is known as a context switch. Information about each task, its relative priority, and the amount of stack space it requires must be provided to the scheduler. Task priorities can be set in a variety of ways, even randomly. However, the rate monotonic algorithm (RMA), which gives higher priority to tasks with shorter periods, is the optimal fixed-priority assignment for ensuring that periodic task deadlines are met. When tasks share resources such as global variables, data structures, or peripheral control and status registers, an RTOS primitive called a mutex must be used to prevent race conditions. Mutexes are an effective means of preventing race conditions, but they introduce the possibility of priority inversion.

A preemptive kernel executes a special idle task when no other tasks are ready to run because all are blocked waiting for events. Most kernels provide a way to customize the idle task (using callback functions or macros), so that the transition to a low-power state can be conveniently implemented inside the idle task. The main difference between a preemptive kernel and a foreground / background system is that as long as tasks are ready to run, the kernel does not switch the context back to the idle task. Consequently the transition to a low-power mode is much simpler, because it does not need to occur with interrupts disabled.

The complexity of a preemptive scheduler is significantly higher than that of the simple scheduler approach. The memory costs of using a preemptive scheduler include extra ROM for the system calls plus RAM for task-specific stacks. Other costs are measured in lost CPU time: the scheduler is itself software that consumes processor cycles. Context switches and clock ticks can also consume a significant percentage of the available time, particularly if they occur frequently. Therefore the preemptive scheduler approach is best suited to large footprint wireless embedded network routers and coordinator devices.

5.2. Layered Multithreaded And Network Stack Architecture

A critical aspect of a wireless sensor networking operating system is the manner in which user applications communicate with other nodes or routers and potentially with a network coordinator. Mantis Operating System (MOS) [8] is a next generation

lightweight OS designed specifically for WSN applications. It is a classical layered multithreaded OS with a layered network stack similar to a typical TCP/IP stack. This network stack can be used to communicate easily using the radio and / or the serial connection. The design of the MOS scheduler is based on a pre-emptive, multi-threaded kernel for use in WSN systems. It achieves energy efficiency by implementing a sleep function. Its power-efficient scheduler recognizes when all threads are sleeping and then sleeps the MCU for a duration deduced from each thread's sleep time. The API separates application threads from the underlying OS. MOS enables cross-platform support by preserving the API across platforms.

In a nutshell, MOS (Fig 6) provides a lightweight and energy-efficient scheduler, a user-level network stack and other components such as device drivers, and compresses a classical multithreaded layered operating system design into a memory footprint of less than five hundred bytes.

Figure 6 MANTIS OS architecture

5.3. Evolvable Architecture

The concept of evolvability [9] means that the OS itself can be easily configured and upgraded. The evolvable operating system (EOS) is more space efficient because it provides both micro-threads and generic threads. The EOS includes a flexible hardware abstraction layer (HAL), which helps in porting EOS to different hardware platforms. It has a message-handling engine, which is efficient and seamless for both local and remote communication.

6. Conclusion

This paper provides a general outline of how the WSN RTOS landscape looks today and of how to choose or design the best fit. Since there are many new developments in the hardware of smart sensor chips, sensor nodes and wireless MCUs, new software solutions will become available. Hardware parallelism and newly unleashed power will make the RTOS architecture for WSN even more compelling.

7. References

[1] Anton Hristozov, "Choosing the best system software architecture for your wireless smart sensor devices," Embedded.com, December 4, 2007.

[2] Frank Golatowski, Jan Blumenthal, Matthias Handy, Marc Haase, Hagen Burchardt, Dirk Timmermann, "Service-Oriented Software Architecture for Sensor Networks," Rostock, Germany: Institute of Applied Microelectronics and Computer Science, University of Rostock.

[3] Miro Samek, "Use an MCU's low-power modes in foreground / background systems," Embedded Systems Design, October 2007.

[4] [...], Björn Grönvall, Thiemo Voigt, "Contiki - a Lightweight and Flexible Operating System for Tiny Networked Sensors," Swedish Institute of Computer Science.

[5] Martin Kero, Per Lindgren, Johan Nordlander, "Timber as an RTOS for Small Embedded Devices," Luleå, Sweden: Luleå University of Technology, Department of Computer Science and Electrical Engineering, EISLAB.

[6] Nik Shaylor, Douglas N. Simon, William R. Bush, "A Java Virtual Machine Architecture for Very Small Devices," Mountain View, CA: Sun Microsystems Research Laboratories.

[7] Philip Levis and David Culler, "Maté: A Tiny Virtual Machine for Sensor Networks," International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, USA.

[8] Shah Bhatti, James Carlson, Hui Dai, Jing Deng, Jeff Rose, Anmol Sheth, Brian Shucker, "MANTIS OS: An Embedded Multithreaded Operating," Boulder, CO: Computer Science Department, University of Colorado.

[9] Thu-Thuy Do, Daeyoung Kim, Tomass Sanchez Lopez, Hyunhak Kim, Seongki Hong, Minh-Long Pham, "An Evolvable Operating System For Wireless Sensor Network," Daejeon, Korea: Information and Communications University, Korea.
