Dipartimento di Elettronica, Informazione e Bioingegneria Politecnico 20133 Milano (Italia) Piazza Leonardo da Vinci, 32 di Milano Tel. (39) 02-2399.3493 Fax (39) 02-2399.3574

Technical Report n. 2016.17

Coin: Opening the Internet of Things to People’s Mobile Devices

Maria Laura Stefanizzi, Luca Mottola+, Luca Mainetti, Luigi Patrono

University of Salento (Italy), +Politecnico di Milano (Italy) and SICS Swedish ICT

Coin:OpeningtheInternetofThings to People’s Mobile Devices

+ Maria Laura Stefanizzi⇤,LucaMottola , Luca Mainetti⇤,LuigiPatrono⇤ ⇤University of Salento (Italy), +Politecnico di Milano (Italy) and SICS Swedish ICT July 2, 2016

Abstract People’s interaction with Internet of Things (IoT) devices such as prox- imity beacons, body-worn sensors, and controllable light bulbs is often me- diated through personal mobile devices. Our study of current approaches reveals that applications often operate in separate silos, as the functional- ity of IoT devices is fixed by vendors and typically accessed only through low-level proprietary APIs. This limits the flexibility in designing applica- tions and requires intense wireless interactions, which may impact energy consumption. Coin is a system architecture that breaks this separation by allowing developers to flexibly run a slice of a mobile app’s logic onto IoT devices. Mobile apps can dynamically deploy arbitrary tasks imple- mented as loosely-coupled components. The underlying run-time support takes care of the coordination across tasks and of their real-time schedul- ing. Our prototype indicates that Coin both enables increased flexibility and improves energy eciency at the IoT device, compared to traditional architectures.

1 Introduction

The rise of smartphones and tablets makes them the established means for peo- ple to interact with Internet of Things (IoT) devices such as proximity beacons, body-worn sensors, and controllable light bulbs. A multitude of applications employ personal mobile devices to allow people to control, interact, and query sensors and actuators in the environment or on the human body. Technologies such as Low Energy (BLE), now commonly found both on personal mobile devices and IoT ones, are key facilitators for these trends. Motivation. Architectures applied in these applications, however, treat the IoT devices as immutable black-boxes and operate in separate silos. Mobile apps are sometimes re-routed to intermediate gateways. Whenever direct ac- cess to the IoT device is allowed, the latter typically o↵ers application-agnostic

1 APIs that mainly enable extracting raw data and/or controlling basic actuator functionality. Changes to the on-board are limited to firmware updates by manufacturers to patch bugs or security flaws. Such a state of a↵airs entails that: i) mobile apps are developed based on vendor-specific APIs, preventing portability; ii) even the simplest functionality requires intense wireless interactions, a↵ecting energy consumption; and iii) app functionality are limited to the time of actual wireless connection, that is, disconnected operations are fundamentally hampered. Unlike current practice, many foresee the interaction between humans and IoT devices to happen over an open, vendor-independent programmable sub- strate that enables a multitude of apps to co-exist, similar to smartphones and tablets. For example, an IoT device with body temperature and heart rate sen- sors may serve both fitness apps to track an individual’s well-being and smart- health apps that communicate vital parameters to doctors. Individual building blocks necessary to realize these functionality are perhaps already available, yet a system architecture that blends them together in a working realization is arguably still lacking. Coin. We bridge this gap by designing a system architecture that allows IoT devices to: i) run an arbitrary slice of a mobile app’s logic in an on-demand fashion, and ii) host application data according to programmer-provided crite- ria, independent of the connection to user devices. Coin’s key contribution is to provide the “glue” that allows existing building blocks to cooperate to realize the vision above. The problem is unique in many respects. Unlike traditional sensor network- ing, for example, applications are supplied by third parties. Their character- istics, such as processing and memory requirements, are dicult to anticipate. Multiple applications may need to operate concurrently, that is, not simply run side-by-side, but be able to exchange data with other a priori unknown apps. Processing is also expected to be largely event-driven; for example, being dic- tated by connections of mobile devices, rather than occurring periodically. Coin rests upon two pillars: a custom programming model andits run-time support. The former is based on a notion of lightweight task as a programmer- defined relocatable slice of mobile app logic. In a fitness scenario, for example, programmers may define a task that computes burned calories based on the sensors available on fitness trackers. The mobile phone may opportunistically deploy such a task on the fitness tracker to limit data exchanges to a single quantity rather raw data. Multiple tasks on an IoT device can interact among themselves in a loosely-coupled manner, based on an actor-like [1] model. Coin’s run-time support accommodates existing building blocks to eciently implement the required semantics. A message broker mediates interactions across tasks, while their executions are scheduled using an Earliest Deadline First (EDF) policy. Dynamic deployment of tasks is supported using a Virtual Machine (VM) developers plug in. Although Coin is independent of the under- lying technology, our prototype targets Cortex M microcontrollers (MCUs) and BLE radios, representative of target applications where energy budgets are as

2 small as a Coin-cell battery. The rest of the article is organized as follows. We describe the design and implementation of a pervasive game using Coin, an application otherwise un- feasible with equal features using vendor-specific architectures. We report on the performance of Coin in energy consumption and execution times. The re- sults indicate that the energy savings enabled by reducing wireless interactions through device-local processing overcome the cost of code interpretation, vali- dating our design choices. The price to pay are larger execution times, yet the values we obtain are not expected impact the application responsiveness.

2 Background

We analyze target application scenarios, corresponding software architectures in existing systems, and derive requirements on a system architecture that over- comes the limitations we identify. We then contrast our work with existing literature.

2.1 Applications Blending mobile computing as enabled by smart-phones and tablets, with envi- ronment interactions enabled by embedded sensors and actuators gives rise to applications with distinct characteristics. We study three classes of applications and corresponding software architectures. Body sensor apps. In fitness, health-care, and sport applications, body-worn devices with physiological, biochemical, or inertial sensors monitor key physi- cal parameters, for example, blood pressure, temperature, and heart rate [36]. Typically, the raw sensor values are streamed to a personal mobile device. The mobile device acts as the sole processing unit. Although simple to implement, similar architectures tend to be inecient in terms of energy consumption. The continuous interactions between mobile devices and embedded sensors/actuators occur over wireless, whose operation is costly in terms of energy consumption, especially for the connected device. Algorithms exist that are conceived to relocate part of the data process- ing closer to the sensors. Examples are found in activity recognition [34] and electrocardiogram analysis (ECG) [2], where techniques ranging from filtering and downsampling [4] to sophisticated classification algorithms [16] may be em- ployed aboard the connected device. As a result, the amount of data to trans- mit wirelessly reduces, and so energy consumption improves. Flexible system support on the connected device is required to employ these algorithms in a vendor-independent and reconfigurable manner. Immersive mobile computing. The interactions between mobile devices and environment-immersed sensors and actuators need not be continuous. The latter may be installed at fixed locations in the environment, and opportunistically interact with the mobile devices if and when they are in range.

3 Representative examples are pervasive games [22], where embedded devices are hidden in the environment to bridge the virtual world within a game with the physical reality. These devices are often used as general-purpose, environment- immersed data stores to handle information relevant to the game plot [23]. Access to this (virtual) information is dictated by (physical) location, enhancing the game experience. Pervasive games are currently installed using dedicated embedded hardware in the environment, deployed for the sole purpose of running the game and later removed. Reusing already installed hardware is generally not possible, as the latter misses the necessary programming facilities and is unable to serve more than one mobile app at a time. Monitoring and tracking. The continuous interactions between mobile and connected devices may break also because of mobility of the latter. In supply chain applications [14], embedded sensors are attached to packages to log infor- mation such as temperature, humidity, and vibrations the package is exposed to during transportation. As soon as the package enters a warehouse, locally stored information are uploaded to a central processing unit, for example, the mobile phone of the warehouse personnel, for inspection and further evaluations. In this scenario as in the previous one, the ability to store and process sensor data on the connected device independent of the connection to a mobile device is fundamental. Right now, such degree of decoupling can only be realized with one-o↵application-specific implementations.

2.2 Requirements To overcome these limitations, we derive the following main requirements for the design of a system architecture at the connected device: Device-local processing: the ability to run application-specific functionality • on the connected device enables continuing to compute independent of the mobile device, and applying algorithms to reduce the wireless interactions with the mobile devices, saving energy. Data persistency: the connected device must be able to retain application • data according to programmer-provided criteria independent of the connec- tion to a mobile device, enabling a form of disconnected operation. Dynamic deployment: the logic at the connected device may not be known • beforehand, but be provided on-the-fly by the mobile device itself; the run- time support at the connected device must accommodate this need. Real-time scheduling: multiple, independently-developed applications may • need to coexist on the same connected device; the system must ensure that the real-time requirements for each of them are fulfilled whenever possible. High-level programming and portability: the application functionality • for the connected device must be developed using high-level languages to ease the implementations, and not require increased e↵orts to adapt to di↵erent hardware.

The dominant approach to develop many of the applications we describe is

4 based on a straightforward sense-and-send design. Connected devices are em- ployed as shipped by manufacturers, that is, with pre-loaded firmwares that only enable low-level interactions to extract raw data or to drive actuators. As a result, the entire application logic needs to execute at the mobile device and be encoded in a vendor-specific manner. Besides not enabling any discon- nected operation, these designs decrease the general portability of mobile apps, consequently increase the development e↵orts, and are detrimental to energy consumption especially at the connected device.

2.3 Building Blocks Approaches exist that address single issues in the scenarios we target; for exam- ple, in the field of operating systems (OSes) for sensor nodes, VM technology for resource-constrained devices, and interoperability frameworks. Sensor network OSes, such as , LiteOS, and SenSmart, o↵er dynamic linking capabilities, which allows di↵erent applications to be added or replaced at run-time. However, the OS per se does not provide a programming model that allows dynamically-deployed applications to discover each other and exchange data through generic interfaces. Di↵erently, components must be developed based on how they bind to already running components, which requires intimate knowledge of the latter. Most importantly, OSes in this area o↵er low-level programming interfaces based on languages such as , which contrasts with the modern development tools available for mobile apps. VMs for resource-constrained devices retain the ability of dynamic code de- ployment while o↵ering hardware independence and higher-level programming languages, such as Java or Python. These aspects motivate us to base Coin on VM technology. The cost of code interpretation is the price we pay to facilitate the development process. Because of the rise of energy-ecient 32-bit MCUs, such as ARM’s Cortex M series, we demonstrate this represents an e↵ective design point. The design of Coin is orthogonal to the VM one plugs into the architecture. Mat´e[20] o↵ers a custom language that can be tailored to specific application domains. TakaTuka [3] and Darjeeling [6] provide Java VMs for 16-bit MCUs, whereas Squawk [33] targets higher-end SunSPOT sensor nodes. DAViM [15] rather focuses on isolating multiple applications from each other. In general, the design of embedded VMs targets ecient code interpretation, without providing dedicated abstractions for coordinating concurrent third-party applications. The heterogeneity of IoT devices motivate e↵orts in interoperability frame- works and reference architectures. Examples are AllJoyn and IoTivity. These ease development by defining vendor-agnostic APIs for applications to inter- operate, based on IoT devices with much greater resources compared to ours and without providing the ability to relocate parts of the application logic. Ef- forts such as IoT-A, SENSEI, and OpenIoT provide open architectures to facil- itate application development via semantically-interoperable interfaces. These are complementary to Coin, which may help realize more flexible or ecient implementations exported through the same interfaces in these architectures.

5 deploy task

Sensors Actuators Task 1 Task 2 … Task n

Data store read data connected device Figure 1: Tasks interacting in a loosely coupled manner.

1 def foo(x): 2 #dosomething... 3 def boo(x): 4 #dosomethingelse... 5 startTask() 6 y=foo(inputData)+boo(inputData) 7 output(y)

Figure 2: Example task code.

3 Programming Model

To fulfill the requirements identified in Section 2, first we shift the interactions across mobile and connected devices to data rather the devices. We achieve this through a notion of task as a programmer-defined relocatable slice a mobile app’s logic, amenable for execution on the connected device. Decoupling. To enable interactions among functionality supplied by di↵er- ent parties, tasks are decoupled both in time and data. They cannot share global data and only interact asynchronously in an actor-like fashion [1]. Their data-driven interactions occur through a single abstract data store, as shown in Figure 1. This spares the need of dynamically reconfiguring the bindings among tasks as these come and go, and enables data-driven discovery of available func- tionality. Tasks are completely defined by the data types they consume or produce. Information on input and output data types are included in a manifest deployed together with the task. The data types are specified as named data structures using the same programming language as for the task. The manifest of the task deployed by the fitness app, for example, may indicate input types that include acceleration and blood pressure readings as double precision numerical values, and outputs such as burned calories as integer values. Existing sensor data models [5] can be applied to uniform the naming and format of data. Execution. Figure 2 shows a code snippet for a simple task that applies two programmer-defined functions to some input data and outputs the results. The code in Figure 2 uses the Python syntax, yet the programming model is inde- pendent of the specific language. We discuss our choice in Section 4.

6 Task 1 Task 2 … Task n

VM Task manager Broker Scheduler

Embedded hardware

Figure 3: Coin architecture.

Task processing is meant to be entirely reactive, and only triggered by the availability of any of the input data types. For example, we discourage the use of long-running threads whose fair scheduling may become dicult in the setting we target. Input data is made available to the Python script using a dedicated API. The API also provides operations to output the results of the task and to indicate the start of processing, required to simplify Python’s modularity model. An example of the former is the output() function used in Figure 2, whereas startTask() indicates where the task should start executing each time. This API is the only interface developers employ to write Coin tasks; other than this, developers are free to encode any arbitrary application logic. As shown in Figure 1, the input data of a task may come from the sensors aboard the device, or be the result of a di↵erent task. In the former case, sensors are automatically probed according to the required input rates as specified in the task manifest. Otherwise, tasks employing the output of other tasks are triggered whenever these produce some data. For example, a health-monitoring app may employ the burned calorie information of the fitness app to augment the long-term time analysis. Such a data-driven model also facilitates developing vendor-independent interaction paradigms by progressively transforming raw data in semantically richer information or through di↵erent data formats. Persistence. The execution of tasks is also decoupled from the connection to the mobile device. Tasks may, for example, reside on the connected device also whenever the mobile device that originally deployed them moves away, enabling disconnected operations. Unlike the traditional actor model [1] where data is lost if no actor immedi- ately consumes it, Coin applies persistency to data as well. Tasks may produce data that remains on the connected device, until a mobile device running a matching app connects and downloads the data. Data resides on the connected device according to programmer-defined criteria, such as a given time interval, rendering the device a general-purpose environment-immersed data store. This matches the needs of immersive mobile computing applications, such as per- vasive games [22], whose system support requires to read/write information at di↵erent spatial locations.

7 deployment execution complete

Uninstalled Installed Started Enabled Running remove start/stop data VM request request match launch

Figure 4: Task life-cycle in Coin.

4 Run-time Support

We describe the design of the Coin run-time support and its prototype imple- mentation.

4.1 Components Figure 3 shows the architecture of the Coin run-time support, composed of four components: i) a task manager, ii) a VM layer, iii) a data broker, and iv) a real-time scheduler. Task manager. Figure 4 shows the life-cycle of a task, controlled by the task manager. As soon as a mobile device completes a task deployment, the man- ager examines the manifest, creates the necessary support data structures, and places the task in the Installed state. If a mobile device requires to start an Installed task, the task manager adds the task to the scheduling list; the task enters the Started state. Explicitly distinguishing between Installed and Started allows one to retain a task on a connected device with the smallest bookkeeping overhead even if it is not required to run, avoiding the costly oper- ation of dynamic deployment should the application need to run the task again in the future. Whenever data appears in the data store that matches a task’s input data types, the task enters the Enabled state and is scheduled for execution. Even- tually, the task is handed over to the VM; during this time, the task becomes Running. Once the execution finishes, the task returns in a Started state. If a mobile device requires to stop a Started task, the task manager moves it back into the Installed state, removing it from the scheduling list. Finally, whenever a mobile device requires a task in the Installed state to be unin- stalled, the task manager moves the task into the Uninstalled state while performing the necessary clean-up. Virtual machine. The VM layer is in charge of the concrete execution of dynamically deployed tasks. Using a VM rather than some form of run-time binary linking has pros and cons. A VM detaches the implementation of tasks from the specific hardware. Further, a VM lessens the programming burden by supporting high-level languages. On the downside, VMs su↵er some unavoidable processing overhead due to code interpretation. Given the heterogeneous setting we target, the non-safety critical nature of most mobile apps, and the availability of powerful MCUs with reduced energy consumption, we argue that the use of a VM brings more

8 advantages than disadvantages. We quantify these trade-o↵s in Section 6 based on representative applications. Broker and scheduler. The broker manages data exchanges through the data store. Matching of data producers and consumers occurs based on the data types, as defined in the task manifests or based on pre-defined types for sensors and actuators. Whenever a match is identified, the consumer task is handed over to the scheduler. A data match may happen for multiple tasks simultaneously if they consume the same data type. The question is when and in what order to execute these tasks. To ensure responsiveness, our scheduler implements an Earliest Deadline First (EDF) [21] policy. Information on the absolute deadline of a task and its expected execution time are part of the manifest. If two tasks have the same absolute deadlines, we execute the one with the lowest execution time first. The scheduler also ensures that every task runs to completion; concurrent events, such as connection requests from other mobile devices, are postponed to when the currently executing task finishes. We choose an EDF policy because of its real-time optimality [17]: if a sched- ule able to meet all task deadlines exist, EDF definitely finds one. The poten- tial issue in applying EDF is its processing overhead. As we demonstrate in Section 6, this does not represent an issue in our setting, also because we are generally not expecting an extremely large number of tasks to be simultaneously triggered. As tasks are also expected to be short-lived, running them to comple- tion should not pose particular problems, as it happens in system architectures that share a similar design rationale [13].

4.2 Prototype We implement a prototype of Coin using the C language, targeting 32-bit Cortex M MCUs and BLE radios. We realize custom implementations for all functional components but the VM layer. The broker takes care of mapping items in the data store to custom BLE characteristics and attributes [32] to give mobile devices standard-compliant access to the data produced by tasks. The task manager also interacts with the low-level BLE driver to receive the task along with its manifest. All necessary data exchanges happen over the air. We port PyMite [29], a reduced Python interpreter, as VM layer. Although Coin is independent of the programming language used to write tasks, we choose Python because of several reasons. Compared to VM-based languages such as Java, its realization on resource-constrained systems is far less limited; for exam- ple, PyMite retains the support to multiple programming paradigms, including object-oriented, imperative, and functional. Moreover, unlike languages such as JavaScript, Python directly compiles to bytecode, which reduces network traf- fic when deploying tasks. For example, the most complex application we use as benchmark in Section 6 yields slightly more than 1 KB of Python bytecode. The same functionality implemented in JavaScript yields almost twice the amount of data, to be transferred over wireless.

9 Table 1: Program memory and RAM requirements. Component Program memory [B] Data (RAM) [B] Task manager 112 564 Broker 8.456 876 Scheduler 3.912 584 PyMite VM 119.222 6.884 BLE and Platform 22.814 1.168 Complete system 154.516 10.076

We map Coin tasks to PyMite threads. This requires adapting the PyMite interpreter along di↵erent dimensions. We replace the built-in round robin scheduler with the EDF scheduler described earlier. In the original PyMite implementation, the state of a thread is lost as soon as the execution exits the Python bytecode. We thus extend the VM internals to maintain the thread state across di↵erent executions of the same task, and to support re-entrance using the startTask() construct. Finally, we choose to save the precious RAM seg- ments and store the Python bytecode on flash memory, which requires adapting the VM to execute the code o↵the latter. Table 1 reports on the memory overhead of our prototype. Task manager, broker, and scheduler account for about 12 KB (1,8 KB) of program (data) memory. The majority of the overhead is currently due to the PyMite port, which is however only meant to provide a working Python interpreter. Even with this significant overhead, which we believe could be shaved o↵significantly by tailoring PyMite for the specific functionality required in Coin tasks, our prototype fits the resource constraints of most existing 32-bit embedded plat- forms. For example, the Cortex M0 core often used in SoC designs together with a BLE radio [24] provides 256 KB (16 KB) of program (data) memory. On this platform, our prototype leaves about 40% (38%) of program (data) mem- ory available to tasks. The situation is even better for the other representative platforms we employ in Section 6.

5 Using Coin

We report on the use of Coin in the design and implementation of a pervasive game [22] whose logic spans smartphone and sensors in the environment. The game. Only the bravest can become pirates! To prove their qualities, the aspiring pirates must travel to a mysterious island and overcome several challenges. Players are divided into teams. The team who obtains the highest score becomes the pirate crew. To accumulate points, a team must collect items scattered across the island. Items are of di↵erent types: compasses, rare seeds, parrot eggs, bottles of rum, and sabers. The value of the first three kinds of

10 Figure 5: User interface on a person’s smartphone and environment-immersed sensor device in our pervasive game. items is 10 points. The latter two give 200 points, and can be obtained by paying a merchant with some of the collected items. A team may decide to transform rare seeds and parrot eggs into plants and parrots, respectively. By doing so, the transformed item yields a score 10 times higher of the original one. However, eggs and seeds grow only if they live at the right temperature for a sucient time. To this end, teams must deposit the collected eggs or seeds in suitable places, taking care not to be seen by opponents, who could kidnap the items after their transformation. From virtual to physical. Coin allows us to develop the game in a way that no other existing platform would enable. Real-time user interactions happen through the touch interface of a standard smartphone. We deploy Coin on ST Nucleo-F091RC prototyping boards equipped with a BlueNRG BLE radio and an ST X-Nucleo-IKS01A1 sensor shield, as shown in Figure 5, and install them in our department building. We map virtual items in the game to dedicated data structures dynamically stored through Coin aboard the Nucleo boards. As a result, accessibility of items corresponds to proximity to the Nucleo board storing the corresponding data structure—as dictated by BLE communication range—and mobility on the island maps to physical mobility in our building. Implementing the collection or release of items in the game is thus as simple as reading or writing from/to Coin’s data store from a player’s smartphone. With this design, straightfor- wardly enabled by Coin, virtual and physical dimensions spontaneously blend together. However, there is more that Coin enables. Temperature conditions that determine how eggs and seeds grow are now simple to link to temperature in the physical environment. A player’s smartphone can dynamically deploy, together with the item itself, a simple Coin task that periodically probes the temperature sensor on the Nucleo board and accordingly modifies the values of the data structure representing the item. This may happen independent of the connection to a player’s smartphone, giving players the illusion that the game

11 unfolds across the virtual and the physical world. In our implementation of the game, the corresponding Python script is as small as 320 . Finally, unlike existing pervasive games [22], Coin naturally allows other apps to re-use the deployed sensing infrastructure. Say, for example, a new app is developed to control air conditioning in our oces based on temperature and individual preferences. A user’s smartphone may deploy a new Coin task that computes short- and long-term trends of relevant quantities, useful as inputs for implementing the feedback loop. Provided the sampling periods are compatible, the new task may just re-use part of the sensed data that the game already requires.

6 Evaluation

The key aspect to demonstrate the benefits of our design is to assess whether the savings due to executing part of the mobile app’s logic onto the connected device can pay o↵the overhead due to the Coin run-time.

6.1 Benchmarks We consider three representative applications based on the scenarios and re- quirements described in Section 2. Each application corresponds to a Python task running in Coin. Data compression. The most straightforward solution to save energy by reducing wireless communications is to take advantage of device-local processing to compress data prior to transmission. This way, the net amount of data reduces, and so the utilization of the wireless link, the corresponding energy consumption, and the time required for the actual transfer. We consider a custom version of Run Length Encoding (RLE) compres- sion [8]. The functionality is simple: the algorithm replaces n consecutive oc- currences of the same data item with an encoding of the number of repetitions. As a result, the compression ratios it achieves are a function of the input data. The more repetitive these are, the more data is compressed. In fact, RLE is often advocated for applications where sensors report stable values [35]. We feed our RLE implementation with 1024 bytes of dummy data generated in a way to yield di↵erent compression ratios. This allows us to explore the influence of this parameter on the performance. Activity detection. Physical activities are often characterized using machine learning techniques on accelerometer data [18]. Part of the required processing may be o↵-loaded onto body-worn sensors to increase the decoupling from the mobile devices, besides reducing the amount of wireless transfers. We consider a classification algorithm able to distinguish between standing or walking activities. The activity detection (Ad) occurs based on 5 Hz ac- celerometer data and regardless of the orientation of the sensor, by computing the average and standard deviation of the amplitude of the acceleration signal

12 Table 2: Platforms used for performance evaluation. Name Core Program Data [KB] [KB] Freescale FRDM-KL46Z [12] Cortex M0+ 256 32 Nordic nRF51-DK [24] Cortex M0 256 16 NXP LPCXpresso1549 [25] Cortex M3 256 36 NXP LPCXpresso4337 [26] Cortex M4 1.024 136 along the three axis. The classifier considers a time window of T seconds to rec- ognize the current activity, and reports the result to the mobile device. We vary T to understand the trade-o↵between the latency to report the information to the mobile device and the run-time overhead. Electrocardiogram (Ecg) analysis. Heart problems may be detected by analyzing the signal from Ecg sensors. Commercial products [30] transmit raw data over a radio channel to a central gateway for storage, processing, and feature extraction. Parts of the signal processing, however, may occur closer to the sensor, reducing the communication load. We consider a simple algorithm for extracting heart rate information [2]. The signal is initially passed through a bandpass filter. Next, the algorithm computes the first and second derivative of the signal using two approximate 5-tap filters [11] with integer coecients. The results are added together and considered in absolute value. A 6-tap moving average is then applied, and the result compared against a threshold to detect any peaks that may indicate physiological issues. We keep the report interval of peaks fixed to seven seconds, and vary the frequency F of sensor readings within this interval.

6.2 Setup and Metrics We consider four prototyping platforms representative of the hardware applica- ble to the scenarios of Section 2. Table 2 reports their salient characteristics. Whenever the board does not include a BLE radio, we attach a Seeed Bluetooth Shield [31]. We use the on-board accelerometer whenever available, or other- wise hook an shield with an ST LSM303D sensor. In the absence of an electrocardiogram sensor, we use accelerometer readings as input; the nature of the data does not impact the execution of the algorithm for Ecg analysis. We configure all boards to favor energy consumption over processing time, as expected in battery-operated scenarios such as those described in Section 2. This generally entails running the MCU at the lowest frequency and disabling all unneeded peripherals, for example, the capacitive touch sensor and LCD interface on the FRDM-KL46Z. The advertising mode of BLE, which would equally impact the energy consumption of a traditional design and of Coin,is also disabled to gain better control on the execution of experiments. We focus on energy consumption and execution times.Whenstudyingthe former, we compare a Coin-based design with a traditional sense-and-send de-

13 40 FRDM-KL46Z 35 nRF51-DK 40 30 FRDM-KL46Z LPCXpresso1549 35 25 nRF51-DK LPCXpresso4337 30 4020 LPCXpresso1549 25 15 LPCXpresso4337 20 3510 15 30 5

10 Energy improvement (%) 25 0 5 RLE AD ECG 20 Application

Energy improvement (%) 0 RLE AD 15ECG Application 10 5

Energy improvement (%) 0 RLE AD ECG Application Figure 6: Energy performance of Coin compared to a sense-and-send design.

sign where a device only samples sensors and transmits raw data. The imple- mentation uses plain C and is cross-compiled as an (immutable) binary. The executions do not include any explicit query from a mobile device. The cost of receiving such a query over wireless for every single data time would only make the traditional design perform worse. As for execution times, we compare Coin executions against functionally-equivalent implementations of the same task functionality, again using plain C. We power the boards through a stabilized power generator and attach a Tektronix TBS 1072B oscilloscope to track the current absorption and the state of GPIO pins. This allows us to accurately track energy absorbed and execution times, the latter based on a proper instrumentation to change the state of GPIO pins depending on what slice of the code is executing. All values we present next are averages over at least five repetitions. The standard deviation across di↵erent runs, not shown in the charts, is always within 5% of the average.

6.3 Results Energy consumption. Initially, we consider the energy consumption of the benchmark applications with default parameters. This means RLE compression is set to achieve a 50% compression ratio—in fact quite pessimistic for RLE compression of sensor data [35]—Ad reports data to the mobile device every 30 sec, whereas Ecg samples the sensors at 30 Hz. We compute the energy consumption improvement of Coin compared to a sense-and-send design. Figure 6 reports the results. The trade-o↵between saving transmissions by deploying Coin tasks that implement device-local processing, and the additional MCU overhead due to the Coin run-time support is in favor of the former. Coin allows one to reap significant benefits; in our experiments, the improvement in energy consumption is at least 25%, with a best-case of 35%. This is despite the ecient energy performance of BLE radios and the higher energy consumption of 32-bit MCUs, compared to less capable 16-bit MCUs such as the widespread MSP430. With Coin, we are able to identify an ecient design point.

14 Table 3: Execution times of tasks with PCXpresso4337. Task Plain C [us] Coin [ms] Ratio RLE 108,37 8,63 79,93 Ad-local 163,97 4,98 30,55 Ad-transmission 68,31 5,22 76,76 Ecg-local 127,82 3,78 29,76 Ecg-transmission 89,21 4,55 51,12

Interestingly, the LPCXpresso4337 board shows the best performance in Figure 6 when running the Ad task. The FPU aboard its Cortex M4 core speeds up the execution of the floating point operations required in Ad. In contrast, the Cortex M0+ core on the nRF51-DK provides the best performance with sequential -level operations, as required in RLE.TheEcg task, instead, includes a mixture of integer and floating-point arithmetic, which is reflected in Figure 6. Execution times. Larger execution times represent the cost for increased flexibility and better energy eciency in Coin. To quantify this aspect, we measure the execution times of a single task iteration. For the Ad and Ecg tasks, we separate the case of regular local processing, which happens at every iteration, from the added execution time to prepare data for transmission, for example, by performing additional computations to post-process data. Table 3 exemplifies the execution times for the PCXpresso4337 board. The absolute times are di↵erent with the other platforms, yet the ratios are similar. If placed in perspective, the absolute values are still reasonable. They range in the same order of magnitude of packet transmissions. Therefore, this performance should not be detrimental to the responsiveness of the application as a whole, including user interactions through the mobile device. However, the table shows that task execution results in a significant slow- down using Coin. Such a slowdown is vastly dominated by the VM-based execution, that is, our implementations of broker, scheduler, and task manager play very little role. In terms of VM technology, the values in Table 3 are in line with existing literature [6, 3]. Worth considering is also that PyMite is not expressly designed for the platforms we target, neither we explicitly optimize it besides the adaptations described in Section 4. As each task may take longer to run, the slowdown may impact the overall schedulability, limiting the number of tasks that may concurrently execute on the same device. For example, we can run at most six instances of the Ad task that executes at 5Hz on a Cortex M0 MCU, before scheduling becomes unfeasible. The performance shown in Table 3 is, nonetheless, the price to pay for using a high-level language such as Python and for the flexibility and portability a VM o↵ers, including the ability of dynamically deploying tasks. Varying task parameters. Figure 7 plots the energy improvements in a Coin-based design compared to a traditional sense-and-send design against the compression ratios achieved in RLE. As this ratio increases, a Coin-based de-

15 35 30 25 20 15 FRDM-KL46Z 10 nRF51-DK 5 LPCXpresso1549 LPCXpresso4337

Energy improvement (%) 0 0.3 0.4 0.5 0.6 0.7 0.8 RLE compression ratio Figure 7: RLE compression: energy performance of Coin compared to a sense- and-send design against varying compression ratios.

45 40 35 30 25 20 FRDM-KL46Z 15 nRF51-DK 10 LPCXpresso1549 5 LPCXpresso4337

Energy improvement (%) 0 10 20 30 40 50 60 Activity report interval T (sec) Figure 8: Activity detection (Ad): energy performance of Coin compared to a sense-and-send design with varying report intervals. sign saves a higher number of transmissions. The corresponding energy im- provements peak at 36% with the nRF51-DK board, and yet a minimum 23% improvement is achieved even when RLE can only reduce the size of data by about one third. The compression ratio is inversely proportional to the amount of data to communicate over wireless, and thus to the number of packets the BLE radio needs to transmit. The BLE standard [32] dictates a maximum packet size of 22 bytes, which entails that numerous packets may be needed to transmit relatively small amounts of data. Compared to the amount of application data a packet can carry, the overhead for each such packet, for example, in terms of header data, is significant. Saving transmissions thus bears a significant impact. Figure 8 investigates the same trade-o↵with varying report intervals T in Ad. This task shows the largest range of potential improvements, from a mini- mum 14% when the report interval is 10 sec, to a maximum 44% when the report interval is one minute. Depending on the application requirements, developers may set the report interval di↵erently at the expense of energy consumption. The trends are similar for all platforms, and increase about linearly with T .

16 40 35 30 25 20 15 FRDM-KL46Z 10 nRF51-DK LPCXpresso1549 5 LPCXpresso4337

Energy improvement (%) 0 10 20 30 40 50 Sensing frequency F (Hz) Figure 9: Ecg analysis: energy performance of Coin compared to a sense-and- send design with varying sensing frequency.

Unlike the other tasks we test, the LPCXpresso4337 board constantly performs best. As mentioned before, this is most likely a result of the FPU aboard the Cortex M4 core. Figure 9 illustrates the energy trade-o↵s in the Ecg task with varying sens- ing frequency. Again the trend is about linear, but no platform distinctively dominates. On the other hand, the FRDM-KL46Z constantly results in the lowest energy improvements, closely followed by the nRF51-DK. In essence, the two Cortex M0/M0+ cores appear to be least able to trade energy consumption in computation over the energy cost of wireless transmissions, with the Ecg task. The observations in this Section may be instrumental to select the platform to run a Coin-based design, depending on the application characteristics. In all three benchmarks, one may expect a steeper increase in the energy consumption improvements as the compression ratio or data load increases. Figure 7 to 9, however, reflect the fact that the platforms we use still present an idle current in the mA range. This is a cost that persists throughout the whole execution and essentially tends to “flatten” the improvements. As new 32-bit platforms emerge that are explicitly designed to achieve sub mA idle currents [27], we expect these improvements to amplify. Dynamic deployment. The only added energy of Coin that does not per- tain to a traditional sense-and-send design is that of dynamically deploying the task. This is a one-time cost that is vastly dominated by the transmission of the Python bytecode over wireless. The Ad task, for example, requires about 50 packets. These may be transmitted using BLE’s streaming mode [32], which reduces the e↵orts for packet trains. Therefore, the corresponding energy over- head quickly amortizes as a task keeps running on a device.

7 Discussion

Our prototype is mainly meant to provide a basis to assess the architecture and to evaluate the performance trade-o↵s. The prototype has a few limitations,

17 which would require further implementation work to be overcome and yet would not alter Coin’s conceptual design. First, PyMite does not o↵er by itself resource allocation mechanism, which would be required to ensure that tasks developed by di↵erent parties safely share resources such as network bandwidth and energy. This issue, in fact, transcends our work and may be seen as a desirable feature of any VM employed in scenarios akin to ours. There exist vast literature on the subject [7, 19] that can be applied to address this issue by simply porting the necessary mechanisms. Our prototype o↵ers no security- or privacy-related features. For example, we perform no checks on the authenticity of the Python bytecode. There may also exist privacy concerns related to sharing data among tasks deployed by di↵erent mobile apps. These problems are not new: techniques such as code signing [10] exist that impose an overhead only upon loading the bytecode, but not when executing it. This would be a one-time cost, which would retain the validity of the trade-o↵s identified in Section 6. Solutions are also available to achieve secure and privacy-preserving computing on resource-constrained de- vices [28, 9]. Backward compatibility with vendor-specific APIs might represent an issue. However, given the simplicity of their operation, we maintain it should be pos- sible to create software adapters that mimic their operation and permanently deploy them as Coin tasks. These would interject the calls that a mobile app developed with vendor-specific APIs issues and generate corresponding replies.

8 Conclusion

We presented Coin, a software architecture that enables running a slice of a mobile app’s logic onto connected devices such as proximity beacons, body- worn sensors, and controllable light bulbs. Coin o↵ers a high-level program- ming model based on a lightweight notion of tasks as a relocatable unit of mobile app logic. Its run-time support takes care of the tasks’ lifestyle includ- ing dynamic deployment, real-time scheduling, and cross-task coordination. We demonstrated how Coin overcomes the limitations of traditional designs that limit the functionality of the connected device to sense-and-send interactions. The device-local processing Coin enables, for example, improves a device’s en- ergy consumption up to a 44% factor, at the expense of larger execution times due to the VM-based execution.

References

[1] G. Agha. Actors: a model of concurrent computation in distributed systems. PhD thesis, MIT Artificial Intelligence Laboratory, 1985. [2] D. Albu, J. Lukkien, and R. Verhoeven. On-node processing of ECG signals. In IEEE CCNC, Jan 2010.

18 [3] F. Aslam et al. Optimized Java binary and virtual machine for tiny motes. In IEEE DCOSS, 2010. [4] L. Au et al. Episodic sampling: Towards energy-ecient patient monitoring with wearable sensors. In IEEE EMBC, 2009. [5] M. Botts and A. Robin. OpenGIS sensor model language (SensorML). OpenGIS Implementation Specification OGC, 7, 2007. [6] N. Brouwers et al. Darjeeling, a feature-rich VM for the resource poor. In ACM SENSYS, 2009. [7] Q. Cao et al. Virtual battery: An energy reserve abstraction for embedded sensor networks. In IEEE RTSS, 2008. [8] E. Capo-Chichi, H. Guyennet, and J.-M. Friedt. K-RLE: a new data com- pression algorithm for . In IEEE SENSORCOMM, 2009. [9] H. Chan and A. Perrig. Security and privacy in sensor networks. IEEE Computer, 36(10), 2003. [10] P. Dutta et al. Securing the deluge network programming system. In ACM IPSN, 2006. [11] J. Evans. Ecient FIR filter architectures suitable for FPGA implementa- tion. IEEE Trans. on Circuits and Systems, 41(7), 1994. [12] Freescale. FRDM-KL46Z development platform. goo.gl/uPE71Q. [13] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister. System architecture directions for networked sensors. In ACM ASPLOS, 2000. [14] S. Hoppough. Shelf life. Forbes Magazine, 2006. [15] W. Horr´e, S. Michiels, W. Joosen, and P. Verbaeten. DAVIM: Adaptable middleware for sensor networks. IEEE Distributed Systems Online, 9(1), 2008. [16] D. Karantonis et al. Implementation of a real-time human movement classi- fier using a triaxial accelerometer for ambulatory monitoring. IEEE Trans. on Information Technology in Biomedicine, 10(1), 2006. [17] M. Kargahi and A. Movaghar. A method for performance analysis of earliest-deadline-first scheduling policy. The Journal of Supercomputing, 37(2), 2006. [18] G. Krassnig et al. User-friendly system for recognition of activities with an accelerometer. In ICST PervasiveHealth, March 2010. [19] A. Lachenmann et al. Meeting lifetime goals with energy levels. In ACM SENSYS, 2007.

19 [20] P. Levis and D. Culler. MatE:´ A tiny virtual machine for sensor networks. SIGOPS Oper. Syst. Rev., 36(5), 2002. [21] C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of ACM, 20(1), 1973.

[22] C. Magerkurth, A. D. Cheok, R. L. Mandryk, and T. Nilsen. Pervasive games: Bringing computer entertainment back to the real world. Comput. Entertain., 3(3), 2005. [23] L. Mottola et al. Pervasive games in a mote-enabled virtual world using tuple space middleware. In ACM NETGAMES, 2006.

[24] Nordic Semiconductor. nRF51 DK BLE development kit. www.nordicsemi.com/eng/Products/nRF51-DK. [25] NXP. OM13056: LPCXpresso board. goo.gl/6yYhYP. [26] NXP. OM13070: LPCXpresso board. goo.gl/cwec04.

[27] P. Pannuto et al. A networked platform for the post-mote era. In ACM SENSYS, 2014. [28] A. Perrig et al. Security in wireless sensor networks. Communications of the ACM, 47(6), 2004.

[29] PyMite. code.google.com/p/python-on-a-chip/. [30] Qardio, Inc. Qardiocore. getqardio.com. [31] Seeed Studios. Bluetooth shield. www.seeedstudio.com/wiki/- Bluetooth Shield.

[32] B. SIG. Bluetooth core specification v4.0. Technical report, Bluetooth SIG, June 2010. [33] D. Simon et al. JavaTMon the bare metal of wireless sensor devices: The Squawk Java virtual machine. In ACM VEE, 2006. [34] F.-T. Sun, C. Kuo, and M. Griss. PEAR: Power eciency through activity recognition (for ECG-based sensing). In IEEE PervasiveHealth, 2011. [35] N. Tsiftes et al. Ecient sensor network reprogramming through compres- sion of executable modules. In IEEE SECON, 2008. [36] Xiaomi. Mi band. www.mi.com.

20