Our TAs

Mo Sha
• [email protected]
• TinyOS tutorial.
• Help students with projects.
• Manage motes.
• Grade projects.
• Office Hour: Tue/Thu 5:30-6.
• Bryan 502A

Yong Fu
• [email protected]
• Grade critiques.
• Office Hour: by appointment.
• Bryan 502D

Top 11 Technologies of the Decade

1. Smartphones
2. Social Networking
3. Voice over IP
4. LED Lighting
5. Multicore CPUs
6. Cloud Computing
7. Drone Aircraft
8. Planetary Rovers
9. Flexible AC Transmission
10. Digital Photography
11. Class-D Audio

Chenyang Lu

TinyOS and nesC

• TinyOS: OS for wireless sensor networks.
• nesC: programming language for TinyOS.

Hardware Evolution

• Miniature devices manufactured economically
  - Microprocessors
  - Sensors/actuators
  - Wireless chips

[Figure: devices of decreasing size: 4.5" × 2.4", 1" × 1", 1 mm², 1 nm²]

Mica2 Mote

• Processor
  - 7.4 MHz, 8 bit
  - Memory: 4KB data, 128 KB program
• Radio
  - Max 38.4 Kbps
• Sensors
  - Light, temperature, acceleration, acoustic, magnetic…
• Power
  - <1 week on two AA batteries in active mode
  - >1 year battery life on sleep modes!

Hardware Constraints

Severe constraints on power, size, and cost:
• slow microprocessor
• low-bandwidth radio
• limited memory
• limited hardware parallelism: CPU hit by many interrupts!
• manage sleep modes in hardware components

Software Challenges

• Small memory footprint
• Efficiency - power and processing
• Concurrency-intensive operations
• Diversity in applications & platform: efficient modularity
  - Support reconfigurable hardware and software

Traditional OS

• Multi-threaded
• Preemptive scheduling
• Threads:
  - ready to run;
  - executing on the CPU;
  - waiting for data.

[Figure: thread state diagram with states executing, ready, and waiting; transitions labeled "gets CPU", "preempted", "needs data", "gets data"]

Pros and Cons of Traditional OS

• Multi-threaded + preemptive scheduling
  - Preempted threads waste memory
  - Context switch overhead
• I/O
  - Blocking I/O: waste memory on blocked threads
  - Polling (busy-wait): waste CPU cycles and power

Example: Preemptive Priority Scheduling

• Each process has a fixed priority (1 highest);
• P1: priority 1; P2: priority 2; P3: priority 3.

[Figure: timeline from 0 to 60 time units marking the release of P1, P2, and P3; execution order P2, P1, P2, P3]
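The preemptive priority example can be reproduced with a small simulation. The sketch below is a minimal illustration in Python, not code from the course. The slide text gives only the priorities; the release times (P2 at 0, P1 at 15, P3 at 18) and execution times (P1: 10, P2: 30, P3: 20) are assumptions chosen so that the result matches the P2, P1, P2, P3 order and the 0 to 60 time axis of the figure.

```python
# Assumed job set: name -> (priority, release time, execution time).
# Priority 1 is the highest. Release and execution times are illustrative.
JOBS = {
    "P1": (1, 15, 10),
    "P2": (2, 0, 30),
    "P3": (3, 18, 20),
}


def simulate(jobs):
    """Preemptive fixed-priority scheduling in unit time steps.

    Returns a list of (name, start, end) execution segments.
    """
    remaining = {name: exec_time for name, (_, _, exec_time) in jobs.items()}
    timeline = []
    t = 0
    while any(remaining.values()):
        ready = [n for n, (_, release, _) in jobs.items()
                 if release <= t and remaining[n] > 0]
        if ready:
            # The ready process with the highest priority (lowest number) runs.
            current = min(ready, key=lambda n: jobs[n][0])
            if timeline and timeline[-1][0] == current and timeline[-1][2] == t:
                # Same process keeps the CPU: extend the current segment.
                timeline[-1] = (current, timeline[-1][1], t + 1)
            else:
                timeline.append((current, t, t + 1))
            remaining[current] -= 1
        t += 1
    return timeline


for name, start, end in simulate(JOBS):
    print(f"{name}: {start}-{end}")
```

Under these assumptions P2 starts first, is preempted when P1 is released, resumes once P1 finishes, and P3 runs last even though it was released while P1 was still executing.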

Context Switch

[Figure: process 1, process 2, ... in memory; PC and registers in the CPU]

Existing Embedded OS

Name          Code Size   Target CPU
pOSEK         2K
pSOSystem                 PII -> ARM Thumb
VxWorks       286K        Pentium -> Strong ARM
QNX Neutrino  >100K       Pentium II -> NEC
QNX RealTime  100K        Pentium II -> SH4
OS-9                      Pentium -> SH4
Chorus OS     10K         Pentium -> Strong ARM
ARIEL         19K         SH2, ARM Thumb
Creem         560 bytes   ATMEL 8051

• QNX context switch = 2400 cycles on x86
• pOSEK context switch > 40 µs
• Creem -> no preemption

System architecture directions for network sensors, J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, K. Pister. ASPLOS 2000.

TinyOS Solutions

• Efficient modularity
  - Application = scheduler + graph of components
  - Compiled into one executable
  - Only needed components are compiled/loaded
• Concurrency: event-driven architecture

[Figure: layered architecture: Main (includes Scheduler); Application (User Components); Actuating, Sensing, Communication; Hardware Abstractions]
Modified from D. Culler et al., TinyOS boot camp presentation, Feb 2001

Example: Surge

[Figure: component graph of SurgeC: BootC and SurgeP, with interfaces StdControl, ADC, Timer, SendMsg, and Leds connecting to PhotoC (StdControl, ADC), TimerC (StdControl, Timer), MultihopC (StdControl, SendMsg), and LedsC (Leds)]

Typical Application

[Figure: layered component stack, top to bottom:
  application: sensing application
  routing: Routing Layer
  messaging: Messaging Layer
  packet: Radio Packet
  byte: Radio Byte (MAC), Temp, photo (SW)
  bit: RFM, clocks, ADC, i2c (HW)]
D. Culler et al., TinyOS boot camp presentation, Feb 2001

Two-level Scheduling

• Events handle interrupts
  - Interrupts trigger lowest level events
  - Events can signal events, call commands, or post tasks
• Tasks perform deferred computations
• Interrupts preempt tasks and interrupts

[Figure: tasks are posted (POST) to a FIFO queue; events and commands; interrupts from hardware preempt tasks; time axis]

Multiple Data Flows

• Respond quickly: sequence of event/command through the component graph.
  - Immediate execution of function calls
  - e.g., get bit out of radio hw before it gets lost.
• Post tasks for deferred computations.
  - e.g., encoding.
• Events preempt tasks to handle new interrupts.

Sending a Message

[Figure: timing diagram of event propagation (steps 0-6 take about 95 microseconds total)]

Scheduling

• Interrupts preempt tasks
  - Respond quickly
  - Event/command implemented as function calls
• Task cannot preempt tasks
  - Reduce context switch: efficiency
  - Single stack: low memory footprint
  - TinyOS 2 supports pluggable task scheduler (default: FIFO).
• Scheduler puts processor to sleep when
  - no event/command is running
  - task queue is empty

Space Breakdown…

Code size for ad hoc networking application

[Figure: stacked bar chart, y-axis in Bytes from 0 to 3500; components: Interrupts, Message Dispatch, Initialization, C-Runtime, Light Sensor, Clock, Scheduler, Led Control, Messaging Layer, Packet Layer, Radio Interface, Routing Application, Radio Byte Encoder]

• Scheduler: 144 Bytes code
• Totals: 3430 Bytes code, 226 Bytes data

D. Culler et al., TinyOS boot camp presentation, Feb 2001
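The scheduling rules above (events run immediately as function calls, tasks are posted to a FIFO queue and run to completion, the processor sleeps when the queue is empty) can be sketched as a toy model. This is a minimal Python illustration, not TinyOS code; the class name `ToyScheduler` and the task and event names are hypothetical, and the model does not show an interrupt arriving in the middle of a running task.

```python
from collections import deque


class ToyScheduler:
    """Toy model of a FIFO, run-to-completion task scheduler."""

    def __init__(self):
        self.queue = deque()   # single FIFO task queue
        self.trace = []        # record of what ran, in order

    def post(self, name, body):
        """Defer work: append a task to the end of the queue."""
        self.queue.append((name, body))

    def signal(self, name, handler):
        """An event handler runs right away, like a plain function call."""
        self.trace.append(f"event {name}")
        handler(self)

    def run(self):
        """Run tasks one at a time; a task is never preempted by a task."""
        while self.queue:
            name, body = self.queue.popleft()
            self.trace.append(f"task {name}")
            body(self)         # may post further tasks
        self.trace.append("sleep")   # queue empty: processor can sleep


def encode(sched):
    # A task may post another task; it runs after everything already queued.
    sched.post("transmit", lambda s: None)


sched = ToyScheduler()
sched.post("encode", encode)
sched.signal("dataReady", lambda s: s.post("sendData", lambda s2: None))
sched.run()
print(sched.trace)
```

The event handler does almost no work itself; it posts `sendData` and returns, which is the pattern the slides describe for keeping interrupt response short.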

Power Breakdown…

              Active       Idle          Sleep
CPU           5 mA         2 mA          5 μA
Radio         7 mA (TX)    4.5 mA (RX)   5 μA
EE-Prom       3 mA         0             0
LED's         4 mA         0             0
Photo Diode   200 μA       0             0
Temperature   200 μA       0             0

Battery: Panasonic CR2354, 560 mAh

• Lithium Battery runs for 35 hours at peak load and years at minimum load!
  • That's three orders of magnitude difference!
• A one byte transmission uses the same energy as approx 11000 cycles of computation.

Time Breakdown…

Components            Packet reception    CPU Utilization   Energy (nJ/bit)
                      work breakdown
AM                    0.05%               0.20%             0.33
Packet                1.12%               0.51%             7.58
Radio handler         26.87%              12.16%            182.38
Radio decode thread   5.48%               2.48%             37.2
RFM                   66.48%              30.08%            451.17
Radio Reception       -                   -                 1350
Idle                  -                   54.75%            -
Total                 100.00%             100.00%           2028.66

• 50 cycle task overhead (6 byte copies)
• 10 cycle event overhead (1.25 byte copies)
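The battery-life claim can be checked with simple arithmetic from the power table. The sketch below is a rough Python calculation under one assumption that the slides do not state: "peak load" is taken as CPU active + radio transmitting + LEDs on (16 mA), which happens to reproduce the 35-hour figure, and "minimum load" as CPU and radio both asleep (10 μA).

```python
BATTERY_MAH = 560            # CR2354 capacity from the slide

# Currents in microamps, from the power table.
# Which components count toward "peak load" is an assumption.
peak_ua = {"cpu_active": 5000, "radio_tx": 7000, "leds": 4000}
sleep_ua = {"cpu_sleep": 5, "radio_sleep": 5}

peak_total_ua = sum(peak_ua.values())      # 16000 uA = 16 mA
sleep_total_ua = sum(sleep_ua.values())    # 10 uA

peak_hours = BATTERY_MAH * 1000 / peak_total_ua
sleep_hours = BATTERY_MAH * 1000 / sleep_total_ua
sleep_years = sleep_hours / (24 * 365)
ratio = peak_total_ua / sleep_total_ua

print(f"peak load:    {peak_hours:.0f} hours")
print(f"minimum load: {sleep_years:.1f} years")
print(f"ratio:        {ratio:.0f}x")

# The energy column of the time breakdown table sums to the stated total.
energy_nj_per_bit = [0.33, 7.58, 182.38, 37.2, 451.17, 1350]
print(f"energy total: {sum(energy_nj_per_bit):.2f} nJ/bit")
```

The ratio between the two loads is about 1600, consistent with the "three orders of magnitude" remark. This ideal calculation ignores battery self-discharge and any sensor or EEPROM activity.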

Advantages

• Small memory footprint
  - Only needed components are compiled/loaded
  - Single stack for tasks
• Power efficiency
  - Put CPU to sleep whenever the task queue is empty
  - TinyOS 2 provides efficient power management for peripherals and microprocessors.
• Efficient modularity
  - Event/command interfaces between components
  - Event/command implemented as function calls
• Concurrency-intensive operations
  - Event/command + tasks

Disadvantages

• Lack preemptive real-time scheduling
  - Urgent task may wait for non-urgent ones
• Lack flexibility
  - Static linking only
  - Cannot change parts of the code dynamically
• Lack …

More

• Multi-threaded vs. event-driven architectures
  - Lack empirical comparison against existing OSes
  - A "standard" OS is more likely to be adopted by industry
  - Jury is still out…
• Alternative: Native Java Virtual Machine
  - Java programming
  - Virtual machine provides protection
  - Example: Sun SPOT

nesC

• Programming language for TinyOS and applications
• Support TinyOS components
• Whole-program analysis at compile time
  - Improve robustness: detect race conditions
  - Optimization: function inlining
• Static language
  - No function pointer
  - No malloc
  - Call graph and variable access are known at compile time

Application

• Implementation
  - module: C behavior
  - configuration: select & wire
• Interfaces
  - provides interface
  - uses interface

    module TimerP {
      provides {
        interface StdControl;
        interface Timer;
      }
      uses interface Clock;
      ...
    }

[Figure: TimerP provides StdControl and Timer, and uses Clock]

Interface

    interface Clock {
      command error_t setRate(char interval, char scale);
      event error_t fire();
    }

    interface Send {
      command error_t send(message_t *msg, uint16_t length);
      event error_t sendDone(message_t *msg, error_t success);
    }

    interface ADC {
      command error_t getData();
      event error_t dataReady(uint16_t data);
    }

Bidirectional interface supports split-phase operation
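Split-phase operation means the command only starts the work and returns at once, and the result arrives later through an event on the same interface. The following is a minimal Python sketch of that pattern, modeled loosely on the ADC interface above; `SimulatedADC`, `pending_events`, and the sample value 512 are hypothetical names and data chosen for illustration, not part of TinyOS.

```python
from collections import deque

# Stands in for hardware completions that have not been delivered yet.
pending_events = deque()


class SimulatedADC:
    """Hypothetical stand-in for a component that provides the ADC interface."""

    def __init__(self, sample):
        self.sample = sample
        self.data_ready = None   # event handler, wired in by the user

    def get_data(self):
        # Phase 1 (command): start the conversion and return immediately.
        pending_events.append(lambda: self.data_ready(self.sample))
        return "SUCCESS"


readings = []
adc = SimulatedADC(sample=512)
adc.data_ready = readings.append         # Phase 2 handler (event)

status = adc.get_data()
readings_before_event = list(readings)   # nothing has been delivered yet

# Later, the completion is delivered and the event handler runs.
while pending_events:
    pending_events.popleft()()

print(status, readings_before_event, readings)
```

The caller never blocks while the conversion is in progress, which is why a bidirectional interface (command down, event up) is needed.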

Module

    module SurgeP {
      provides interface StdControl;
      uses interface ADC;
      uses interface Timer;
      uses interface Send;
    }
    implementation {
      bool busy;
      norace uint16_t sensorReading;

      async event result_t Timer.fired() {
        bool localBusy;
        atomic {
          localBusy = busy;
          busy = TRUE;
        }
        if (!localBusy)
          call ADC.getData();
        return SUCCESS;
      }

      async event result_t ADC.dataReady(uint16_t data) {
        sensorReading = data;
        post sendData();
        return SUCCESS;
      }
      ...
    }

Configuration

    configuration TimerC {
      provides {
        interface StdControl;
        interface Timer;
      }
    }
    implementation {
      components TimerP, HWClock;
      StdControl = TimerP.StdControl;
      Timer = TimerP.Timer;
      TimerP.Clock -> HWClock.Clock;
    }

[Figure: TimerC exports StdControl and Timer from TimerP; TimerP's Clock is wired to HWClock]

Example: Surge

[Figure: component graph of SurgeC: BootC and SurgeP, with interfaces StdControl, ADC, Timer, SendMsg, and Leds connecting to PhotoC, TimerC, MultihopC, and LedsC]

Concurrency

• Race condition: concurrent interrupts/tasks update shared variables.
  - Asynchronous code (AC): reachable from at least one interrupt handler.
  - Synchronous code (SC): reachable from tasks only.
• Any update of a shared variable from AC is a potential race condition.

A Race Condition

    module SurgeP { ... }
    implementation {
      bool busy;
      norace uint16_t sensorReading;

      async event result_t Timer.fired() {
        if (!busy) {
          busy = TRUE;
          call ADC.getData();
        }
        return SUCCESS;
      }

      task void sendData() { // send sensorReading
        adcPacket.data = sensorReading;
        call Send.send(&adcPacket, sizeof adcPacket.data);
      }

      async event result_t ADC.dataReady(uint16_t data) {
        sensorReading = data;
        post sendData();
        return SUCCESS;
      }
    }

Atomic Sections

    atomic {
      ...
    }

• Disable interrupt when atomic code is being executed
• But cannot disable interrupt for long!
  - No loop
  - No command/event
  - Function calls OK, but callee must meet restrictions too

Prevent Race

    module SurgeP { ... }
    implementation {
      bool busy;
      norace uint16_t sensorReading;

      async event result_t Timer.fired() {
        bool localBusy;
        atomic {               // disable interrupt
          localBusy = busy;    // test-and-set
          busy = TRUE;
        }                      // enable interrupt
        if (!localBusy)
          call ADC.getData();
        return SUCCESS;
      }
    }

nesC Compiler

• Race-free invariant: Any update to a shared variable is
  - from SC only, or
  - occurs within an atomic section.
• Compiler returns error if the invariant is violated.
• Fix
  - Make access to shared variables atomic.
  - Move access to shared variables to tasks.
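The difference between the racy Timer.fired() and the test-and-set version can be shown with a deterministic simulation. The sketch below is a Python illustration only: each handler is a generator, and the `yield` marks a point where a second interrupt is assumed to preempt the first. The function names and the way preemption is modeled are assumptions made for the example; real nesC atomic sections work by disabling interrupts.

```python
def racy_fired(state, calls):
    """Test and set are separate steps, so an interrupt can land between them."""
    seen_busy = state["busy"]          # test
    yield                              # a second interrupt may preempt here
    if not seen_busy:
        state["busy"] = True           # set
        calls.append("ADC.getData")


def atomic_fired(state, calls):
    """Test-and-set happens in one step; preemption is only possible afterwards."""
    local_busy = state["busy"]         # start of atomic section
    state["busy"] = True               # end of atomic section
    yield                              # interrupts are enabled again here
    if not local_busy:
        calls.append("ADC.getData")


def run_with_preemption(handler):
    """Start one handler, let a second one preempt it, then resume the first."""
    state = {"busy": False}
    calls = []
    first = handler(state, calls)
    second = handler(state, calls)
    next(first)                # first handler runs up to the preemption point
    for _ in second:           # second interrupt runs to completion
        pass
    for _ in first:            # first handler resumes and finishes
        pass
    return calls


print("racy:  ", run_with_preemption(racy_fired))
print("atomic:", run_with_preemption(atomic_fired))
```

In the racy version both handlers see `busy` as false and both start a conversion; with the atomic test-and-set only the first one does.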

Results

• Tested on full TinyOS code, plus applications
  - 186 components (121 modules, 65 configurations)
  - 20-69 modules/app, 35 average
  - 17 tasks, 75 events on average (per application)
    • Lots of concurrency!
• Found 156 races: 103 real!
  - About 6 per 1000 lines of code
• Fixing races:
  - Add atomic sections
  - Post tasks (move code to task context)

Optimization: Inlining

App      Code size   Code size    Code        Data size   CPU
         inlined     noninlined   reduction               reduction
Surge    14794       16984        12%         1188        15%
Maté     25040       27458        9%          1710        34%
TinyDB   64910       71724        10%         2894        30%

• Inlining improves performance and reduces code size.
• Why?

Overhead for Function Calls

• Caller: call a function
  - Push return address to stack
  - Push parameters to stack
  - Jump to function
• Callee: receive a call
  - Pop parameters from stack
• Callee: return
  - Pop return address from stack
  - Push return value to stack
  - Jump back to caller
• Caller: return
  - Pop return value

Principles Revisited

• Support TinyOS components
  - Interface, modules, configuration
• Whole-program analysis and optimization
  - Improve robustness: detect race conditions
  - Optimization: function inlining
  - More: memory footprint.
• Static language
  - No malloc, no function pointers

Issues

• No dynamic memory allocation
  - Bound memory footprint
  - Allow offline footprint analysis
  - But: How to size buffer when data size varies dynamically?
• Restriction: no "long-running" code in
  - Command/event handlers
  - Atomic sections

Reading

• D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, and D. Culler, The nesC Language: A Holistic Approach to Networked Embedded Systems. [Required]
• D. Culler, TinyOS: Operating System Design for Wireless Sensor Networks, Sensors, May 2006.
• J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister, System Architecture Directions for Network Sensors.
• P. Levis and D. Gay, TinyOS Programming, Cambridge University Press, 2009.
  - Purchase the book online
  - Download the first half of the published version for free.
• http://www.tinyos.net/

Proposal

• One proposal/team, 1-2 pages
  - Team members
  - Concise description of project
  - Responsibilities of each member
  - Specific equipment needed
• Written proposal due: 9/22, 11:59pm
  - Email to Mo and CC me
  - Subject: [CSE 520S] Proposal: Project Name
• Proposal presentation: 9/22, in class