TOSThreads: Extending the TinyOS Concurrency Model to Support Preemption

Kevin Klues⋆, Chieh-Jan Mike Liang†, Jeongyeup Paek‡, Razvan Musaloiu-E.†, Ramesh Govindan‡, Philip Levis⋆, Andreas Terzis†
⋆ Stanford University, Stanford, CA   † Johns Hopkins University, Baltimore, MD   ‡ University of Southern California, Los Angeles, CA

ABSTRACT

We present TOSThreads, an application-level threads package for TinyOS. In contrast to previous threads packages proposed for TinyOS, TOSThreads supports fully preemptive threads while maintaining the existing TinyOS concurrency model. TOSThreads defines a conceptual user/kernel boundary, whereby TinyOS runs as a high-priority kernel thread and application threads execute at lower priority. This approach naturally supports long-running computations while preserving the timing-sensitive nature of TinyOS itself. Additionally, the existence of a user/kernel boundary enables dynamic linking and loading of application binaries at runtime. The API at this boundary defines the set of TinyOS services supported by a TOSThreads kernel and is customizable in support of a diverse set of applications. We demonstrate that TOSThreads context switches and system calls introduce an overhead of less than 0.92% and that dynamic linking and loading takes as little as 90 ms for a representative sensing application. We compare different programming models built on top of TOSThreads, including standard C with blocking system calls and a reimplementation of Tenet. Additionally, we evaluate the ability of TOSThreads to run computationally intensive tasks. Taken as a whole, these results suggest that TOSThreads is an important step towards defining a TinyOS kernel that can support long-running computations and high concurrency.

1. INTRODUCTION

Many mote operating systems use event-driven execution to support multiple concurrent execution contexts with the memory cost of a single stack [6, 11, 13]. Network protocols, storage subsystems, and simple data filters can be easily developed in this model, as they typically perform short computations in response to I/O events. More generally, there are sound reasons for mote OSs to be event-based: given the motes' memory and processing constraints, an event-based OS permits greater concurrency than other alternatives.

On the other hand, preemptive threading offers a more intuitive programming paradigm for developing higher-level services and applications. In this model, application writers need not explicitly manage yield points or continuations, or partition a long-running computation to avoid missing events. Compression is a concrete example of an operation that can benefit from threading. Many sensor network applications, such as seismic sensing [27] and structural monitoring [2, 15], could benefit greatly from data compression. Nevertheless, real sensor network deployments rarely use it due to the difficulty of implementing it in event-driven environments.

In this paper, we explore this tension between being able to manage concurrency in the face of resource constraints and having an intuitive programming model. Specifically, we describe a new threads package called TOSThreads for TinyOS that has several novel goals:

Fully preemptive threads: Application developers should not have to manually manage yield points or continuations, greatly simplifying application development.

Minimal disruption: Adding threads should not negatively affect the OS's performance, and changes required to the existing code should be highly localized and minimal. This goal enables system developers to achieve high performance and concurrency wherever necessary.

Flexible boundary: Developers must be able to experiment with different "kernels" by altering the boundaries between threaded and event-driven code, based on tradeoffs between ease of use, efficiency, and application requirements.

Flexible application development: The system must enable programmers to develop their applications using multiple programming paradigms and provide a way to dynamically link and load executable application code.
To the best of our knowledge, no existing thread package for TinyOS satisfies all of these requirements. TinyThreads [21] relies on applications to explicitly yield the processor. TinyMOS [25] runs TinyOS inside a dedicated Mantis [1] thread, but requires placing a lock around most of the TinyOS code, limiting overall concurrency and efficiency. While our work has been inspired by these approaches, TOSThreads is different as it provides true multi-threading in TinyOS without limiting performance or sacrificing its event-driven programming model.

Contributions. We identify three contributions related to the goals we have just set forth. (1) We provide TOSThreads, a fully-preemptive threads library for TinyOS. In TOSThreads, TinyOS runs inside a high-priority kernel thread, while all application logic resides inside user-level threads, which execute whenever TinyOS becomes idle. This approach naturally extends the existing TinyOS concurrency model while adding support for long-running computations. (2) TOSThreads applications access underlying TinyOS services through a kernel API of blocking system calls. This framework allows system developers to evolve the kernel API by wrapping blocking system calls around event-driven TinyOS services. Moreover, by exporting this API in both nesC [9] and ANSI C, we allow developers to implement efficient applications without the need to learn a new language. (3) TOSThreads enables on-demand execution of application threads received from the network as binary objects. TinyLD, our dynamic linker-loader, patches a binary's unresolved system call references and loads it into the mote's memory before executing it.

In many ways, the TOSThreads approach resembles that taken in Contiki, where the Contiki core runs in one thread and additional threads context switch with it.
Our contribution beyond this work is to propose message passing between the application threads and the core OS thread, thereby avoiding the race conditions typical to preemptive threads that directly call kernel code. The challenges such race conditions present perhaps explain why no Contiki platform currently supports thread preemption [4].

Together, user-level threads, dynamic application loading and linking, and a flexible blocking system call API greatly simplify existing programming abstractions for TinyOS and the implementation of new ones. While these facilities are, of course, available on general-purpose systems today, they have not previously been demonstrated together in the context of the TinyOS event-driven model. To substantiate TOSThreads's flexibility, we have reimplemented the Tenet programming system [10] on top of TOSThreads. The resulting system has higher expressivity without increasing code size. TOSThreads has also enabled the development of a novel programming language, Latte, a JavaScript dialect that compiles to C. Finally, our evaluation results show that TOSThreads generates minimal overhead in terms of execution speed and energy consumption, while efficiently supporting long-running computations.

2. ARCHITECTURE

The existing TinyOS concurrency model has two execution contexts: synchronous (tasks) and asynchronous (interrupts). These two contexts follow a strict priority scheme: asynchronous code can preempt synchronous code but not vice-versa. TOSThreads extends this concurrency model to provide a third execution context in the form of user-level application threads. While application threads cannot preempt either synchronous or asynchronous code, they can preempt each other. Application threads synchronize using standard synchronization primitives such as mutexes, semaphores, barriers, condition variables, and blocking reference counters (a mechanism we have developed, §3.4).

Figure 1 presents the TOSThreads architecture, consisting of five key elements: the TinyOS task scheduler, a single kernel-level TinyOS thread, a thread scheduler, a set of user-level application threads, and a set of system call APIs and their corresponding implementations. Any number of application threads can concurrently exist (barring memory constraints), while a single kernel thread runs the TinyOS task scheduler. The thread scheduler manages the concurrency between application threads, while a set of system calls provides access to the TinyOS kernel.

Figure 1: Overview of the basic TOSThreads architecture. The vertical line separates user-level code on the left from kernel code on the right. Looping arrows indicate running threads, while the blocks in the middle of the figure indicate API slots for making a system call.

In order to preserve the timing-sensitive nature of TinyOS, the kernel thread has higher priority than application threads. So long as the TinyOS task queue is non-empty, this TinyOS thread takes precedence over all application threads. Once the TinyOS task queue empties, control passes to the thread scheduler and application threads can run. The processor goes to sleep only when either all application threads have run to completion, or when all threads are waiting on synchronization primitives or blocked on I/O operations.

There are two ways in which posted events can cause the TinyOS thread to wake up. First, an application thread can issue a blocking system call into the TinyOS kernel. This call internally posts a task, implicitly waking up the TinyOS thread to process it. Second, an interrupt handler can post a task for deferred computation. Since interrupt handlers have higher priority than the TinyOS thread, the TinyOS thread will not wake up to process the task until after the interrupt handler has completed; this sequence of operations is identical to that of traditional TinyOS without TOSThreads support. Because interrupts can arrive at any time, however, the TinyOS thread may require a context switch with an interrupted application thread. Control eventually returns to the application thread after the TinyOS thread has emptied the task queue.
TinyOS Modifications: Only two changes to the existing TinyOS code base are required to support TOSThreads: a modification to the boot sequence and the addition of a post-amble for every interrupt handler. The change in the boot sequence encapsulates TinyOS inside the single kernel-level thread before it boots. Once it runs, TinyOS operates as usual, passing control to the thread scheduler at the point when it would have otherwise put the processor to sleep. The interrupt handler post-ambles ensure that TinyOS runs when an interrupt handler posts a task to its task queue. Our evaluations show these modifications introduce minimal disruption to the operation of TinyOS (§4).

Flexible User/Kernel Boundary: A primary difference between TOSThreads and other TinyOS threading implementations is that TOSThreads defines a flexible boundary between user and kernel code. Rather than dividing code into user and kernel space based on access rights to privileged operations, TOSThreads loosely defines a conceptual user/kernel boundary as the point at which programs switch from a threaded to an event-driven programming model. Because all existing TinyOS code is event-driven, any component in the current TinyOS code base can be included in a TOSThreads kernel.

TOSThreads makes building a kernel from existing TinyOS components a straightforward process. Just as a traditional TinyOS application consists of the TinyOS task scheduler and a custom graph of components, a TOSThreads kernel consists of the task scheduler, a custom graph of components, and a custom set of blocking system calls. Each of these calls is a thin wrapper on top of an existing TinyOS service (e.g., active messaging, sensing, multi-hop routing). The wrapper's sole purpose is to convert the non-blocking split-phase operation of the underlying TinyOS service into a blocking one. The API that a kernel ultimately provides depends on the set of TinyOS services its designer wishes to present to applications.

Through its flexible user/kernel boundary, TOSThreads enables the kernel to evolve in support of diverse user-level code bases. We demonstrate this ability by developing two custom TOSThreads kernels: one that provides a standard set of TinyOS services (§3.3) and one that implements the Tenet API (§4.6).

Dynamic Linking and Loading: Defining a user/kernel boundary creates the possibility of compiling applications separately and dynamically linking them to a static kernel. TinyLD is the TOSThreads component implemented to provide this functionality. To use TinyLD, users write a standalone application that invokes system calls in the kernel API. This application is compiled into an object file and compressed into a custom MicroExe format we have developed. The compressed binary is then transported to a mote using some predefined method (e.g., serial interface, over-the-air dissemination protocol, etc.). Once on the mote, TinyLD dynamically links the binary to the TinyOS kernel, loads it into the mote's program memory, and executes it.

3. IMPLEMENTATION

This section describes the implementation of TOSThreads, including the internals of the thread scheduler, the thread and system call data structures, and the dynamic linking and loading process. While most of the TOSThreads code is platform independent, each supported platform must define platform-specific functions for (1) invoking assembly language instructions for performing a context switch and (2) adding a post-amble to every interrupt handler. Defining these functions is a fairly straightforward process, and implementations exist for many popular TinyOS platforms. As of TinyOS 2.1.0, TOSThreads is part of the baseline TinyOS distribution, with support for Tmote Sky, Mica2, Mica2dot, MicaZ, Iris, eyesIFX, Shimmer, and TinyNode motes. Full source code can be found at http://www.tinyos.net/.
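To illustrate what the platform-specific post-amble in point (2) amounts to, here is a minimal sketch; the symbol names are ours, not the actual TOSThreads identifiers, and the real code runs in the interrupt return path:

  #include <stdbool.h>

  /* Illustrative declarations; the real TOSThreads symbols differ. */
  extern bool task_queue_empty(void);
  extern struct thread *current_thread, *tinyos_thread;
  extern void wakeup(struct thread* t);
  extern void interrupt(struct thread* t);

  /* Run at the end of every interrupt handler: if the handler posted a
     task while an application thread was running, switch to the kernel
     thread so TinyOS can drain its task queue immediately. */
  void interrupt_postamble(void) {
    if (!task_queue_empty() && current_thread != tinyos_thread) {
      wakeup(tinyos_thread);     /* mark the kernel thread READY      */
      interrupt(current_thread); /* enqueue caller and context switch */
    }
  }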
3.1 The Thread Scheduler

TOSThreads exposes a relatively standard API for creating and manipulating threads: create(), destroy(), pause(), resume(), and join(). These functions form part of the system call API and can be invoked by any application program.
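As a concrete illustration of this API, the sketch below spawns a worker thread from an application's main function. We assume the C API's create call takes a thread handle, a start function, an argument, and a stack size, as in the TinyOS 2.1 distribution; exact signatures should be checked against tosthread.h:

  #include "tosthread.h"

  static tosthread_t worker;   /* handle filled in by the create call */

  /* Start function: free to block or loop forever, since the scheduler
     preempts it whenever TinyOS has tasks to run. */
  static void worker_fn(void* arg) {
    for (;;) {
      /* ... long-running computation or blocking system calls ... */
    }
  }

  void tosthread_main(void* arg) {
    /* Spawn the worker with an application-chosen 200-byte stack. */
    tosthread_create(&worker, worker_fn, NULL, 200);
  }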
Internally, TOSThreads components use thread scheduler commands that allow them to initialize(), start(), stop(), suspend(), interrupt(), or wakeup() a specific thread. The thread scheduler itself does not exist in any particular execution context (i.e., it is not a thread and does not have its own stack). Instead, any TOSThreads component that invokes one of the commands above executes in the context of the calling thread; only the interrupt handler post-ambles and the system call API wrappers invoke them directly.

The default TOSThreads scheduler implements a fully preemptive round-robin scheduling policy with a time slice of 5 msec. We chose this value to achieve low latency across multiple application-level computing tasks. While application threads currently run with the same priority, one can easily modify the scheduler to support other policies.

The thread scheduler is the first component to take control of the processor during the boot process. Its job is to encapsulate TinyOS inside a thread and trigger the normal TinyOS boot sequence. Once TinyOS boots and processes all of its initial tasks, control returns to the thread scheduler, which begins scheduling application threads. The scheduler keeps threads ready for processing on a ready queue, while threads blocked on I/O requests or waiting on a lock are kept on different queues. Calling interrupt() or suspend() places a thread onto one of these queues, while calling wakeup() removes it from a queue.

3.2 Threads

TOSThreads dynamically allocates Thread Control Blocks (TCBs) with space for a fixed-size stack that does not grow over time. While the memory costs associated with maintaining per-thread stacks can be substantial, we believe the benefits of the programming model provided by preemptive threading outweigh these costs in many situations. That said, one can use techniques such as those proposed by McCartney and Sridhar [21] to estimate (and thereby minimize) the memory required by each of these stacks. The code snippet below shows the complete structure of a TOSThreads TCB; we then describe each of the included fields in more detail:

  struct thread {
    thread_id_t thread_id;
    init_block_t* init_block;
    struct thread* next_thread;
    uint8_t mutex_count;        // thread_state
    uint8_t state;              // thread_state
    thread_regs_t regs;         // thread_state
    void (*start_ptr)(void*);   // start_function
    void* start_arg_ptr;        // start_function
    uint8_t joinedOnMe[];
    stack_ptr_t stack_ptr;
    syscall_t* syscall;
  };

thread_id: This field stores a thread's unique identifier. It is used primarily by system call implementations and synchronization primitives to identify the thread that should be blocked or woken up.

init_block: Applications use this field when dynamically loaded onto a mote. As §3.4 describes, whenever the system dynamically loads a TOSThreads application, the threads it creates must receive all the state associated with its global variables. An initialization block structure stores these global variables, and init_block points to this structure.

next_thread: TOSThreads uses thread queues to keep track of threads waiting to run. These queues are implemented as linked lists of threads connected through their next_thread pointers. By design, a single pointer suffices: threads are always added to a queue just before they are interrupted and are removed from a queue just before they wake up. This approach conserves memory (a minimal sketch of such a queue appears after this list).

thread_state: This set of fields stores information about a thread's current state. Specifically, it contains a count of the number of mutexes the thread currently holds; a state variable indicating the thread's state (INACTIVE, READY, SUSPENDED, or ACTIVE); and a set of variables that store the processor's register set when a context switch occurs.

start_function: This set of fields points to a thread's start function along with a pointer to a single argument. The application developer must ensure that the structure the argument points to is not deallocated before the thread's start function executes. These semantics are similar to those that Unix pthreads define.

joinedOnMe: This field stores a bitmap of the thread ids for all threads joined on the current thread through a join() system call. When the current thread terminates, this bitmap is traversed, and any threads waiting on it are woken up.

stack_ptr: This field points to the top of a thread's stack. Whenever a context switch is about to occur, the thread scheduler calls the switch_threads() function, pushing the return address onto the current thread's stack. This function stores the current thread's register state, replaces the processor's stack pointer with that of a new thread, and finally restores the register state of the new thread. Once this function returns, the new thread resumes its execution from the point it was interrupted.

syscall: This field contains a pointer to a structure used when making system calls into a TOSThreads kernel. This structure is readable by both a system call wrapper implementation and the TinyOS kernel thread. The following section explains how this structure is used.
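As promised above, the single-pointer queue design can be made concrete with a short sketch; the helper names below are ours, not necessarily those in the TOSThreads source:

  /* A FIFO of TCBs linked through their next_thread fields. */
  typedef struct {
    struct thread* head;
    struct thread* tail;
  } thread_queue_t;

  /* Append a thread: safe with one pointer per TCB because a thread is
     enqueued only once, just before it is interrupted or suspended. */
  static void queue_push(thread_queue_t* q, struct thread* t) {
    t->next_thread = NULL;
    if (q->tail == NULL) q->head = q->tail = t;
    else { q->tail->next_thread = t; q->tail = t; }
  }

  /* Remove the oldest thread, just before it is woken up. */
  static struct thread* queue_pop(thread_queue_t* q) {
    struct thread* t = q->head;
    if (t != NULL) {
      q->head = t->next_thread;
      if (q->head == NULL) q->tail = NULL;
      t->next_thread = NULL;
    }
    return t;
  }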
3.3 Blocking API

TOSThreads implements blocking system calls by wrapping existing TinyOS services inside blocking APIs. These wrappers are responsible for maintaining state across the non-blocking split-phase operations associated with the underlying TinyOS services. They also transfer control to the TinyOS thread whenever a user thread invokes a system call. All wrappers are written in nesC with an additional layer of C code on top of them. We refer to the API that provides system calls to standard TinyOS services (sending packets, sampling sensors, writing to flash) as the TOSThreads standard C API. Alternative APIs (potentially also written in C) can be implemented as well (e.g., the Tenet API discussed in §4.6).

A user thread initiates a system call by calling a function in one of the blocking API wrappers. This function creates a local instance of a system call block (SCB) structure which contains: a unique syscall_id associated with the system call; a pointer to the thread invoking the call; a pointer to the function that TinyOS should call once it assumes control; and the set of parameters this function should receive. The SCB is used to exchange data with the TinyOS thread.

All variables associated with a system call (i.e., the pointer to the SCB and the parameters passed to the system call itself) can be allocated on the local stack of the calling thread at the time of the system call. This is possible because once the calling thread invokes a system call, it will not return from the function which instantiates these variables until after the blocking system call completes. These variables remain on the local thread's stack throughout the duration of the system call and can therefore be accessed as necessary.

As discussed in §2, making a system call implicitly posts a TinyOS task, causing the TinyOS thread to immediately wake up and the calling thread to block. Because we constrain our design to allow the TinyOS thread to run whenever it has something to do, there can only be one outstanding system call at any given time. Thus, only one TinyOS task is necessary to perform all application system calls. The body of this task simply invokes the function the system call block points to. This is in contrast to existing threads packages, which must maintain system call queues to track the system calls that each thread makes. Figure 2 provides a visual representation of the TOSThreads approach.

Figure 2: TOSThreads exposes kernel APIs through blocking system calls wrapped around event-driven TinyOS services. These wrappers run their respective system calls inside a single shared TinyOS task. This task is interleaved with other TinyOS tasks which TinyOS itself posts.
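To make the mechanism concrete, here is a sketch of a blocking send wrapper. All names below (blocking_send(), post_syscall_task(), the SCB field names) are illustrative stand-ins rather than the exact TOSThreads definitions:

  /* Illustrative SCB and parameter types; not the exact TOSThreads API. */
  typedef struct {
    uint8_t        syscall_id;                /* which call this is      */
    struct thread* thread;                    /* caller, woken up later  */
    void         (*syscall_fn)(void* params); /* run inside kernel task  */
    void*          params;
  } syscall_t;

  typedef struct {
    am_addr_t dest; message_t* msg; uint8_t len; error_t error;
  } send_params_t;

  error_t blocking_send(am_addr_t dest, message_t* msg, uint8_t len) {
    syscall_t scb;                      /* lives on this thread's stack */
    send_params_t params = { dest, msg, len, FAIL };

    scb.syscall_id = SYSCALL_SEND;
    scb.thread     = current_thread();
    scb.syscall_fn = send_syscall_fn;   /* starts the split-phase send  */
    scb.params     = &params;

    post_syscall_task(&scb);   /* posts the single shared TinyOS task,
                                  implicitly waking the kernel thread  */
    suspend(current_thread()); /* block here until sendDone() wakes us */
    return params.error;       /* filled in by the kernel before wakeup */
  }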
plete, TinyLD invokesroot TOSThreads thread toA spawn pointer to a new thread, as well as aated pointer with to the all application global are variables passednew associ- as root arguments to thread, the inside the special the entire programinit_block to finish beforeapplication’s lifetime terminating. and can The eads be referenced that by anymately thr- spawn. ables associated with anates loadable the binary, binary TinyLD only termi- terminated. when To all ensure of this,nization its we primitive, designed children called a have the new also synchro- counter that increments whenand a new decrements thread whenever is athe spawned thread root terminates. threaderence itself When counter finishes, reaches itthread zero. blocks de-allocates until any At thisprogram resources that and ref- associated point, marks thethe with the program flash the root was ROM stored segments as in free. which 4. EVALUATION As described above, everythread thread or spawned one byoriginal of the root its children inherits a pointer to the provide a fully preemptive application-levelstraction, threads be ab- minimally invasive toruntime, the support existing a TinyOS enable flexible dynamic user/kernel linking boundary,runtime. and and Next, loading we evaluate of how wellthese applications TOSThreads requirements. meets at counts of TOSThreads basicond, scheduler we operations. analyze a Sec- representative sensortion network applica- as well ascomputation, one examining that how has TOSThreads a canprograms long-running simplify compression and quantifyingevaluate its dynamic energy linking cost.croExe and through Third, loading code we lines as size, of both well code. in as terms Mi- of Finally, bytes we and evaluate TOSThreads’ ex- Code size chained refer- local Number of relocations 16 bits external ... Number of relocations size Size Total Relocations Relocations allocation Allocations External Local [17] techniques to reduce its size. Al- is a file format designed specifically for offset ... to patch to patch to patch Number of allocations First address First address First address ...... Figure 3: MicroExe file format. MicroExe ID [17] to create a chain of linked lists for all the refer- offset Offset Offset address Data section Allocation

3.4.2 Linking and Loading

The linking and loading process consists of four steps. First, TinyLD links the binary's machine code to the kernel by patching unresolved addresses corresponding to calls for kernel services. It then allocates memory space for all global variables, patches references to these variables, and finally loads the machine code into the mote's flash ROM. These steps are conceptually straightforward, and all information required for them is encoded in the MicroExe file itself.

Once all of the linking and loading steps are complete, TinyLD invokes TOSThreads to spawn a new root thread. A pointer to tosthread_main(), the starting point of the application, as well as a pointer to the init_block structure associated with all application global variables, are passed as arguments to the new root thread, as described in §3.2. Once the newly spawned thread starts running, it calls tosthread_main() and begins running the application binary. The init_block structure remains active throughout the application's lifetime and can be referenced by any threads that tosthread_main() may ultimately spawn: every thread spawned by the root thread or one of its children inherits a pointer to the original init_block.

Since child threads may need access to global variables associated with a loadable binary, TinyLD only terminates the binary when all of its children have also terminated. To ensure this, we designed a new synchronization primitive, called a blocking reference counter, that increments when a new thread is spawned and decrements whenever a thread terminates. When the root thread itself finishes, it blocks until the reference counter reaches zero. At this point, the root thread de-allocates any resources associated with the program and marks the flash ROM segments in which the program was stored as free.
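The resulting protocol is sketched below with illustrative names (the refcount_* calls mirror the primitives listed in Table 2, while the hooks and free_program_resources() are hypothetical):

  /* Conceptual sketch of the blocking reference counter protocol. */
  static reference_counter_t live_threads;

  void on_thread_spawn(void) {
    refcount_increment(&live_threads);   /* one more live thread      */
  }

  void on_thread_exit(void) {
    refcount_decrement(&live_threads);   /* may wake a blocked waiter */
  }

  void root_thread_finish(void) {
    /* Block until every child has terminated... */
    refcount_wait_on_value(&live_threads, 0);
    /* ...then release RAM and mark the binary's flash segments free. */
    free_program_resources();
  }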

4. EVALUATION

In §1 we listed four requirements for TOSThreads: provide a fully preemptive application-level threads abstraction, be minimally invasive to the existing TinyOS, support a flexible user/kernel boundary, and enable dynamic linking and loading of applications at runtime. Next, we evaluate how well TOSThreads meets these requirements.

We first measure microbenchmarks based on the cycle counts of TOSThreads' basic scheduler operations. Second, we analyze a representative sensor network application as well as one that has a long-running computation (compression), examining how TOSThreads can simplify programs and quantifying its energy cost. Third, we evaluate dynamic linking and loading as well as MicroExe through code size, both in terms of bytes and lines of code. Finally, we evaluate TOSThreads' expressiveness by presenting a reimplementation of the Tenet API using TOSThreads, as well as Latte, a novel JavaScript dialect.

All measurements use the Tmote Sky platform running at 4 MHz with a serial baud rate of 57,600 bps. We use the onboard temperature, humidity, total solar, and photosynthetically active radiation sensors for experiments including sensor readings.

4.1 Microbenchmarks

Tables 1 and 2 present the number of cycles necessary to perform all relevant thread scheduler and basic synchronization operations, respectively. With the Tmote Sky running at 4 MHz, these operations (with the exception of starting a dynamic thread) take less than a few hundred cycles to complete. These numbers translate to less than 70 µsec of computation time per operation. Even starting a dynamic thread (which can take as many as 800 cycles, depending on the duration of malloc()) takes less than 200 µsec. Thus, the cost of performing these operations is negligible in terms of their impact on the system's responsiveness.

Table 1: Number of cycles necessary to perform thread-related operations.

  Operation                Number of cycles
  Start Static Thread      283
  Start Dynamic Thread     679 + malloc()
  Interrupt Thread         100
  Suspend Thread           145
  Wakeup Thread            15
  Static Thread Cleanup    229
  Dynamic Thread Cleanup   123
  Restore Next Thread      85
  Context Switch           184

Table 2: Number of cycles necessary to perform the basic TOSThreads synchronization primitives.

  Operation                      Number of cycles
  Mutex Init                     13
  Mutex Lock                     17
  Mutex Unlock                   71
  Barrier Reset                  13
  Barrier Block                  41
  Barrier Block with Wakeup      6 + 302 × num waiting
  Condvar Init                   8
  Condvar Wait                   30
  Condvar Signal Next            252
  Condvar Signal All             314 × num waiting
  Refcount Init                  12
  Refcount Wait On Value         39
  Refcount Increment             11
  Refcount Decrement             11
  Refcount Inc/Dec with Wakeup   11 + 320 × num waiting
  Join Block                     74
  Join Wakeup                    74 + 326 × num waiting

Tables 1 and 2 do not represent the true application-level cost of using TOSThreads, as more than one of these operations is usually performed in sequence. For example, whenever a thread is suspended, either via a blocking system call or because it waits on a synchronization primitive, it must be explicitly woken up before it resumes. The total cost of suspending the thread must then be calculated as the sum of the suspend, context switch, and wakeup costs (145 + 184 + 15), for a total of 344 cycles.

The total suspend cost is relevant when calculating the total overhead of making a blocking system call. The first column of Table 3 shows the marginal overhead of making a blocking system call, while the second column presents the total overhead including the cost of suspending and resuming a thread (i.e., adding 344 cycles). In turn, these totals are relevant when measuring the energy cost of using TOSThreads, which we present next.

Table 3: Overhead of invoking the system calls that the standard TOSThreads C API provides, in cycles. The marginal column includes the cost of the system call itself, while the total column also includes the cost of invoking the underlying thread scheduler operations.

  Operation               Marginal   Total
  Sleep                   371        715
  StdControl Start        466        810
  StdControl Stop         466        810
  AM Send                 390        734
  AM Receive              912        1256
  Sensirion Sensor Read   277        621
  ADC Read                477        821
  Log Sync                466        810
  Log Seek                476        820
  Log Read                491        835
  Log Append              500        844
  Log Erase               468        812
  Block Sync              468        812
  Block Read              495        839
  Block Write             495        839
  Block Erase             466        810
  Block CRC               506        850
4.2 Energy Analysis

To measure the impact that TOSThreads has on energy consumption, we implement a representative sensor network application and calculate the energy overhead of performing all system calls, context switches, and thread synchronization operations. Specifically, we develop a 'Sense, Store, and Forward' (SSF) application consisting of producer threads which sample sensors once every logging period and log their values to flash memory. The application also includes a consumer thread which reads the values written to flash and transmits them over the radio, using a different sending period. Our SSF application has six threads: one for sampling each of the four sensors onboard the Tmote Sky, one for logging these sensor values to flash, and one for sending them over the radio. We set the logging period to 5 minutes and the sending period to 12 hours, resulting in 144 samples gathered during each sending period. The six threads synchronize using a combination of mutexes and barriers.
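One producer thread from this application might look like the sketch below; the sensor and logging helpers are illustrative stand-ins for the corresponding TOSThreads system calls and synchronization primitives:

  /* One SSF producer (illustrative names): sample a sensor once per
     logging period and hand the value to the logging thread. */
  void temperature_thread(void* arg) {
    uint16_t val;
    for (;;) {
      tosthread_sleep(LOGGING_PERIOD_MS);  /* e.g., 5 minutes          */
      read_temperature(&val);              /* blocking sensor read     */
      mutex_lock(&buffer_mutex);           /* serialize buffer access  */
      enqueue_sample(SENSOR_TEMPERATURE, val);
      mutex_unlock(&buffer_mutex);
      barrier_block(&round_barrier);       /* sync with other producers */
    }
  }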

To calculate the energy overhead of executing this application, we combine the system call costs found in the second column of Table 3 and the synchronization costs calculated in Table 2. Specifically, for each logging period we include the cost of two Sensirion Sensor Reads, two ADC Reads, four Mutex Locks (plus 344 cycles for suspends), four Mutex Unlocks, eight Barrier Blocks, one Log Write, and one Sleep call, for a total of 6,286 cycles (log_cost). The overhead during each sending period is the sum of 144 Log Reads, 144 AM Sends, 144 Mutex Locks (plus suspends), 144 Mutex Unlocks, and one Sleep operation, for a total of 288,859 cycles (send_cost).

As measured in [16], the current the MSP430 processor draws while active is 1.92 mA. Using this value, the total energy consumed during each logging and sending period is:

  E_log  = (log_cost × 1.92 mA) / 4 MHz = 2.87 µAs
  E_send = (send_cost × 1.92 mA) / 4 MHz = 132.23 µAs

Using an analysis similar to the one in [16], we calculate the total lifetime of this application with different logging periods.¹ In all cases we adjust the sending period to be 144 times the logging period (derived from a typical 12-hour sending period and 5-minute sampling interval). Figure 4 presents the percentage of energy consumed by system calls and thread synchronization primitives as a function of the logging period. In all cases this cost is less than 1%.

Figure 4: Energy overhead of implementing the SSF application using TOSThreads as a function of the logging period. Also shown is the node lifetime, using standard AA batteries as the energy source.

¹This analysis assumes that the mote is powered by two AA batteries with an approximate capacity of 2,700 mAh (9.72 × 10⁹ µAs).

4.3 Supporting Long-Running Computations

We evaluate the ability of TOSThreads applications to perform long-running computations without interfering with the responsiveness of the underlying TinyOS kernel. To do so, we compare two versions of an application that uses compression: one implemented using standard TinyOS tasks and another using TOSThreads. In both cases, the application receives packets over the serial port every 50 msec and buffers their payloads in RAM (25 bytes per packet). Whenever the buffer is full, the application compresses the entire content of the buffer (1,250 bytes) with the Lempel-Ziv-Welch (LZW) compression algorithm. Experimental results show that compressing the buffer requires approximately 1.4 sec, which is more than sufficient to represent a long-running computation; any operation that lasts longer than 50 msec results in an unresponsive system that will start dropping packets.

The metric we use for this experiment is the total number of packets dropped after 500 serial packets have been sent. Since TinyOS does not support task preemption, we expect that the TinyOS version of the program will drop multiple packets while compressing its buffer. The experiment confirmed our expectation: TinyOS dropped 127 packets while TOSThreads dropped zero.

Although this application does not necessarily reflect the actual long-running computations we expect motes to perform, the results we provide expose a fundamental limitation in the existing TinyOS concurrency model – running long computations severely affects its performance. TOSThreads removes this limitation.
4.4 Dynamic Linking and Loading

TinyLD introduces space overhead in terms of application and system code size, as well as execution overhead in terms of the time necessary to link and load an application binary. In this section, we evaluate these overheads by measuring the cost of dynamically loading five sample applications compiled into the MicroExe format: Null, Blink, RadioStress, SSF, and BaseStation. The first application is effectively empty and serves as a baseline for the fixed cost of linking and loading. Blink is the standard TinyOS application that repeatedly blinks a mote's LEDs, while RadioStress transmits radio packets as fast as possible. Finally, SSF is the application described in §4.2, and BaseStation is the standard TinyOS application that forwards radio packets to the serial port (and vice-versa).

The size of a MicroExe binary depends on four factors: the size of the machine code (Code), the total number of relocations (UReloc), the total number of allocations (UAlloc), and the total number of initialized global variables (UInit). Since MicroExe stores patched addresses as chained references, UReloc is actually equal to the number of unique symbols in the program. The size of a MicroExe file (cf. Figure 3), in bytes, is then given by:

  Code + (UReloc + UAlloc) × 4 + UInit × 6 + 5 × 2

The graph at the top of Figure 5 shows the breakdown of the MicroExe binary into its code and header components for each of the five sample applications.

Figure 5: TinyLD overhead for five sample applications. The top graph shows the size of each application using the MicroExe format, while the bottom graph presents the time TinyLD requires to link and load each application.
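For reference, the formula transcribes directly into code; the reading of the constant terms (4-byte relocation and allocation entries, 6-byte initialized-variable records, and a five-field 2-byte metadata header) is our interpretation of the layout in Figure 3:

  #include <stdint.h>

  /* MicroExe file size in bytes, per the formula above. */
  uint32_t microexe_size(uint32_t code_bytes, uint16_t ureloc,
                         uint16_t ualloc, uint16_t uinit) {
    return code_bytes + (uint32_t)(ureloc + ualloc) * 4
                      + (uint32_t)uinit * 6
                      + 5 * 2;
  }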

The time required to link and load a MicroExe binary depends on multiple factors. First, TinyLD must copy the entire machine code section of a binary to the MCU's flash ROM. Experiments show that copying two bytes of data from memory to flash ROM takes 188 cycles (47 µsec on the Tmote Sky). Second, the loading time depends on both the number of unique symbols in the binary and the number of addresses that TinyLD must patch. This is because TinyLD implements an iterative loading process whereby the number of unique symbols determines the time required to find the next smallest patched address, and the number of patched addresses determines the total number of iterations required.

Table 4 presents the total number of symbols and addresses that require patching in each of the five sample applications. The graph at the bottom of Figure 5 presents the linking and loading time for these applications. One observation is that although SSF is smaller than BaseStation in terms of binary size, it takes longer to load because it has more symbols and patched addresses (see also Table 4).

Table 4: The number of symbols and the number of addresses that TinyLD patches for five sample applications.

  Program       Number of symbols   Number of patched addresses
  Null          0                   0
  Blink         11                  15
  RadioStress   37                  43
  SSF           50                  93
  BaseStation   47                  70

4.5 Code Size

We also compare the code size of just the application portion of our sample applications when implemented in both standard TinyOS and TOSThreads. As Figures 6 and 7 indicate, the TOSThreads versions are more compact in terms of both application code size and lines of code. We gathered binary code sizes by running msp430-objdump and manually counting the number of bytes in the application-specific portion of the binary. We counted lines of code using a version of the SLOCCount utility [28], modified to recognize nesC source code.

Figure 6: Comparison of application code sizes for five sample applications implemented using standard TinyOS and TOSThreads.

Figure 7: Comparison of application lines of code for five sample applications implemented using standard TinyOS and TOSThreads.

Finally, we present a breakdown of the binary code size and RAM usage of a complete TOSThreads kernel compiled together with TinyLD (Figure 8). The kernel used implements the TOSThreads standard C API.

Figure 8: Breakdown of the binary code size and RAM usage of a complete TOSThreads kernel based on the standard C API compiled together with TinyLD. The breakdown covers the API wrappers, the thread library, TinyLD, and the TinyOS core.

4.6 Tenet

We have re-implemented the Tenet API using TOSThreads. Tenet applications specify tasks as linear dataflow programs consisting of a sequence of tasklets.² Each tasklet is implemented as a core TinyOS component, providing a specific TinyOS service. For example, an application that wants to be notified when the temperature at any mote exceeds 50°F would write the following task:

  Repeat(1000ms) -> Sample(ADC1,T) -> LEQ(A,T,50) -> DeleteDataIf(A) -> Send()

Tenet consists of a task library, a task installer, and a task scheduler. The task library contains a collection of tasklets, the task installer dynamically executes tasks it receives from the network, and the task scheduler coordinates the execution of all running tasks. This scheduler maintains a queue of pending tasks and services them in round-robin order (see Figure 9). Each tasklet runs to completion before the next one is scheduled. Furthermore, Tenet includes a task dissemination protocol, transport and routing protocols, a time-synchronization protocol [19], and several other custom TinyOS components for accessing sensors and timers.

Figure 9: Original Tenet: the Tenet scheduler executes Tenet tasks by scheduling the execution of each tasklet included in those tasks.

Tenet-C is a reimplementation of Tenet that significantly increases the expressivity of the tasking language, yet does not require drastic modifications to the overall system. In Tenet-C, the user writes a C program, instead of a dataflow task description, and compiles it into a dynamically loadable binary object. Tenet-C spawns one thread to service each Tenet task and replaces Tenet's original task scheduler with the TOSThreads thread scheduler. It uses TinyLD to dynamically link and load application binaries (Figure 10). The rest of the original Tenet code runs unmodified but now becomes part of the kernel, running inside the TinyOS thread. However, Tenet-C's API is significantly smaller. In Tenet-C, we only need to implement tasklets such as Sample, Get, and Send in the form of blocking system calls into the Tenet kernel. Many of the other Tenet tasklets provide functionality (e.g., arithmetic operations, comparisons) which is already provided natively by C. In fact, the C language constructs for some of these functions are strict supersets of those Tenet's tasking language provides. For example, the original Tenet had no support for branching, and limited support for looping. An additional benefit of Tenet-C is that its binary code size is smaller than that of the original Tenet, as Figure 11 suggests.

Figure 10: Tenet-C: Each Tenet task is a native mote binary, compiled from C code, that runs as a user thread. Binaries are loaded by TinyLD and scheduled by the TOSThreads scheduler.

Figure 11: Tenet mote code size (ROM/RAM). Tenet-C is a reimplementation of Tenet which dynamically loads and executes compiled binary tasks as user threads.

For example, the Tenet temperature sensing task shown before can be re-written as:
  void tosthread_main(void* arg) {
    uint16_t T;
    for (;;) {
      tosthread_sleep(1000);   /* 1000 ms */
      T = Sample(ADC1);
      if (T <= 50)
        continue;
      Send(&T, sizeof(T));
    }
  }

²Even though Tenet runs on top of TinyOS, Tenet tasks are logically distinct from TinyOS tasks.

We tested Tenet and Tenet-C on a 35-node testbed using five simple Tenet applications: blink, pingtree (gathers topology information and draws the routing tree), system (gathers a mote's internal system information), collect (periodically collects sensor data), and deliverytest (tests end-to-end reliable packet delivery). All application binaries were disseminated to motes in the network using Tenet's internal task dissemination protocol.

4.7 Additional uses of TOSThreads

TOSThreads forms the basis for a high-level programming language we have written called Latte – a JavaScript variant for motes. Latte was designed to simplify the writing of efficient WSN applications. Latte programs can either be interpreted within a JavaScript-enabled web browser or compiled directly down into C. Running programs in a browser simplifies the early stages of application development and helps to reduce debugging cycles. Programs compiled into C make TOSThreads-based system calls that are either statically linked against a TinyOS kernel or dynamically loaded onto a running mote using TinyLD. A previous attempt of ours at providing the same end result, TinyJavaScript, was built directly on top of TinyOS without TOSThreads support. Because TinyOS does not support blocking calls, however, TinyJavaScript was forced to expose an event-driven programming interface, increasing the implementation complexity of the compiler and at the same time decreasing the language's ease of use. Details of our TinyJavaScript and Latte implementations can be found in [23] and [24], respectively.

TOSThreads has also been successfully used to ease the implementation of a polling-based SD card and GPS driver for the MAMMARK [8] project at UCSC, as well as for upcoming versions of the SPINE body sensor network project from Telecom Italia [14].
5. RELATED WORK

We review prior threading proposals for TinyOS and other sensor network operating systems. While there are many prior and existing thread implementations for mote-class devices, none of them meets all four of the requirements TOSThreads faces.

Message passing avoids the synchronization problems that direct kernel traps introduce, allowing TOSThreads to be fully preemptive. For example, TinyMOS [25], which follows a direct trap model, runs TinyOS in a dedicated thread just as TOSThreads does. However, it requires synchronization primitives around core OS abstractions, as the TinyOS concurrency model does not understand preemption outside of interrupts. In contrast, by using a message-passing approach, TOSThreads allows arbitrary concurrency within the kernel, while requiring no changes to TinyOS code except for the interrupt handler post-ambles and the boot sequence.

Putting TinyOS in a separate, high-priority thread allows TOSThreads to minimally disrupt existing TinyOS code, unlike TinyThreads [21]. As TinyThreads uses cooperative multithreading based on the TinyOS task scheduler, a single long-running thread can disrupt the task queue and therefore kernel services. In contrast, TOSThreads requires no explicit yields, simplifying programming and preventing errors: users can run multiple infinite loops.

Unlike Protothreads, which do not maintain thread context across blocking calls [7], TOSThreads is a full threads implementation. Therefore, users do not have to manually maintain continuations in the form of global variables, simplifying program design. On the other hand, this also means that TOSThreads requires much more memory than Protothreads, to maintain stacks.

Numerous other concurrency proposals for TinyOS exist, including fibers [26], virtual machine threads [18], and preemptive tasks [3]. None of these approaches allow users to write simple, application-level thread-based programs on a TinyOS kernel, because they either limit the number of threads (fibers), are built into a specialized runtime (fibers, VM threads), or break the TinyOS concurrency model (preemptive tasks).

In addition to fully event-driven and Protothread programming models, Contiki provides an optional full threads library to applications. While these threads support preemption in principle, none of the current implementations do, instead depending on explicit yield points [4], similar to TinyThreads. Furthermore, it is unclear how full preemption could be safely included without following a model similar to TOSThreads while allowing I/O. Just as TinyOS does with its tasks, Contiki assumes non-preemptive multitasking within its kernel. With preemption, a thread could context switch while in the middle of a kernel call, causing kernel state to be inconsistent and possibly corrupt.

Message passing in operating systems is not new; it is a staple of microkernel designs. Microkernels typically have kernel threads independent of user threads, which respond to application requests. Separating kernel and user concurrency in this way enables the kernel to control re-entrancy without explicit synchronization: instead, synchronization occurs around the message queues between user and kernel threads. While this approach has architectural elegance, experience has shown it to be prohibitively expensive: early implementations (e.g., MkLinux) exhibit up to a 60-fold slowdown on some system calls, and even state-of-the-art microkernels such as L4Linux exhibit slowdowns of 20-150% [12]. Virtual memory is a major cause of this slowdown, and of the cost of system calls in multithreaded operating systems generally. Motes, with their low-power microcontrollers, do not suffer from the major costs of context switches common to high-performance processors. They do not have virtual memory, removing the cost of TLB flushes, nor do they have speculative execution, removing the cost of a pipeline flush.

There have been other proposals for dynamically loading binaries on mote platforms. Like TinyLD, FlexCup allows dynamic loading of TinyOS components [20]. However, FlexCup uses a linking and loading method that requires rebooting the node for the new image to run. Moreover, the application halts during the linking and loading process. TinyLD does not have these limitations.
Contiki is another mote operating system that supports loadable objects [5]. Contiki uses Compact ELF (CELF) binaries which, like MicroExe binaries, are compressed versions of ELF binaries. Although both file formats derive from ELF, they are not compatible with it. TinyLD works in more restrictive environments because it does not assume the existence of a byte-level external memory (the ESB platform used in Contiki has a 64KB EEPROM for the loader to store the binary image during the linking and relocating phase).

6. SUMMARY AND CONCLUSIONS

TOSThreads is a fully functional thread library designed for TinyOS. It provides a natural extension to the existing TinyOS concurrency model, allowing long-running computations to be interleaved with timing-sensitive operations. TOSThreads' support for efficiently running dynamically loaded binaries, combined with its ability to support a flexible user/kernel boundary, enables experimentation with a variety of high-level programming paradigms for sensor network applications. We hope that this capability will accelerate the movement towards a standard TinyOS kernel that can support a wide range of applications.

Modern threading systems and OS kernels are optimized for high-performance processors, with virtual memory, caches, and large context switch latencies. In contrast, TOSThreads is designed for a microcontroller, whose different properties cause an approach discarded long ago – message passing – to be both efficient and compelling. This suggests another way in which different application workloads and hardware considerations cause system design in ultra-low power sensor networks to differ from that in mainstream platforms.

7. REFERENCES

[1] S. Bhatti, J. Carlson, H. Dai, J. Deng, J. Rose, A. Sheth, B. Shucker, C. Gruenwald, A. Torgerson, and R. Han. MANTIS OS: An Embedded Multithreaded Operating System for Wireless Micro Sensor Platforms. ACM/Kluwer Mobile Networks and Applications (MONET), Special Issue on Wireless Sensor Networks, 10(4):563-579, Aug. 2005.
[2] K. Chintalapudi, T. Fu, J. Paek, N. Kothari, S. Rangwala, J. Caffrey, R. Govindan, E. Johnson, and S. Masri. Monitoring Civil Structures with a Wireless Sensor Network. IEEE Internet Computing, 10(2), March/April 2006.
[3] C. Duffy, U. Roedig, J. Herbert, and C. J. Sreenan. Adding Preemption to TinyOS. In Proceedings of the Fourth IEEE Workshop on Embedded Networked Sensors (EmNetS), 2007.
[4] A. Dunkels. Using Multi-Threading in Contiki. Available at: http://www.sics.se/contiki/developers/using-multi-threading-in-contiki.html, Jan. 2007.
[5] A. Dunkels, N. Finne, J. Eriksson, and T. Voigt. Run-time dynamic linking for reprogramming wireless sensor networks. In Proc. of the 4th International Conference on Embedded Networked Sensor Systems (SenSys), 2006.
[6] A. Dunkels, B. Gronvall, and T. Voigt. Contiki - a lightweight and flexible operating system for tiny networked sensors. In Proc. of the First IEEE Workshop on Embedded Networked Sensors (EmNetS-I), Nov. 2004.
[7] A. Dunkels, O. Schmidt, T. Voigt, and M. Ali. Protothreads: Simplifying event-driven programming of memory-constrained embedded systems. In Proc. of the Fourth ACM Conference on Embedded Networked Sensor Systems (SenSys), Nov. 2006.
[8] G. Elkaim, E. Decker, G. Oliver, and B. Wright. Marine Mammal Marker (MAMMARK) dead reckoning sensor for in-situ environmental monitoring. In Proc. of the ION/IEEE Position, Location, and Navigation Symposium (ION/IEEE PLANS), pages 25-27, Apr. 2006.
[9] D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, and D. Culler. The nesC Language: A Holistic Approach to Networked Embedded Systems. In Proc. of PLDI, 2003.
[10] O. Gnawali, B. Greenstein, K.-Y. Jang, A. Joki, J. Paek, M. Vieira, D. Estrin, R. Govindan, and E. Kohler. The TENET Architecture for Tiered Sensor Networks. In Proc. of the 4th International Conference on Embedded Networked Sensor Systems (SenSys), Nov. 2006.
[11] C.-C. Han, R. Kumar, R. Shea, E. Kohler, and M. Srivastava. A Dynamic Operating System for Sensor Nodes. In Proc. of the International Conference on Mobile Systems, Applications, and Services (MobiSys), June 2005.
[12] H. Härtig, M. Hohmuth, J. Liedtke, S. Schönberg, and J. Wolter. The performance of µ-kernel-based systems. In Proc. of SOSP, 1997.
[13] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister. System architecture directions for networked sensors. In Proc. of ASPLOS 2000, Nov. 2000.
[14] Telecom Italia. SPINE: Signal Processing in Node Environment. Available at http://spine.tilab.com/.
[15] S. Kim, S. Pakzad, D. Culler, J. Demmel, G. Fenves, S. Glaser, and M. Turon. Health Monitoring of Civil Infrastructures Using Wireless Sensor Networks. In Proc. of the 6th International Conference on Information Processing in Sensor Networks (IPSN '07), Apr. 2007.
[16] K. Klues, V. Handziski, C. Lu, A. Wolisz, D. Culler, D. Gay, and P. Levis. Integrating Concurrency Control and Energy Management in Device Drivers. In Proc. of SOSP, 2007.
[17] J. R. Levine. Linkers and Loaders. Morgan Kaufmann Publishers, 2000.
[18] P. Levis, D. Gay, and D. Culler. Active sensor networks. In Proc. of NSDI, 2005.
[19] M. Maróti, B. Kusy, G. Simon, and A. Lédeczi. The flooding time synchronization protocol. In Proc. of the 2nd International Conference on Embedded Networked Sensor Systems (SenSys), pages 39-49, Nov. 2004.
[20] P. J. Marrón, M. Gauger, A. Lachenmann, D. Minder, O. Saukh, and K. Rothermel. FlexCup: A Flexible and Efficient Code Update Mechanism for Sensor Networks. In Proc. of the Third European Workshop on Wireless Sensor Networks (EWSN 2006), pages 212-227, Feb. 2006.
[21] W. P. McCartney and N. Sridhar. Abstractions for safe concurrent programming in networked embedded systems. In Proc. of the 4th International Conference on Embedded Networked Sensor Systems (SenSys), pages 167-180, 2006.
[22] MoteIV Corporation. Tmote Sky. Available at: http://www.moteiv.com/products/tmotesky.php.
[23] R. Musăloiu-E., C.-J. M. Liang, and A. Terzis. A Modular Approach for Developing and Updating Wireless Sensor Network Applications. Technical Report xx-10-2008-HiNRG, Johns Hopkins University, 2008.
[24] R. Musăloiu-E. and A. Terzis. The Latte Language. Technical Report xx-10-2008-HiNRG, Johns Hopkins University, 2008.
[25] E. Trumpler and R. Han. A Systematic Framework for Evolving TinyOS. In Proceedings of the Third IEEE Workshop on Embedded Networked Sensors (EmNetS), 2006.
[26] M. Welsh and G. Mainland. Programming Sensor Networks Using Abstract Regions. In Proc. of NSDI, Mar. 2004.
[27] G. Werner-Allen, K. Lorincz, J. Johnson, J. Lees, and M. Welsh. Fidelity and Yield in a Volcano Monitoring Sensor Network. In Proc. of OSDI, Nov. 2006.
[28] D. Wheeler. The SLOCCount utility. Available at http://www.dwheeler.com/sloccount/.