The Recent Past and Distant Future of [Micro-Scale] Embedded Systems

Pat Pannuto, [email protected], patpannuto.com

NextMote Workshop @ EWSN’16, Graz, Austria February 15, 2016 “Micro-Scale” are systems whose principle node dimension is measured on the order of millimeters • Today’s motes are decimeter-scale… …and pushing towards centimeter-scale A millimeter-sized system is a 1000x reduction in volume

• In practice, several nodes fit on a period. We have developed dozens of millimeter-scale motes as part of the Michigan Micro Mote (M3) project This talk: A tour of micro-scale mote challenges

• A Brief History of “Smart Dust” • Micro-Scale Composition Architecture • Energy • Communication • Debugging This talk: A tour of micro-scale mote challenges

• A Brief History of “Smart Dust” • Micro-Scale Composition Architecture • Energy • Communication • Debugging “Smart Dust” is not new…

• 1957: Fred Hoyle’s Black Cloud – Sentient space-faring cloud organism • Early-mid 90’s: Viable? – First serious discussions at RAND and ISAT workshops • 1995: Neal Stephenson’s Diamond Age – Popularized a modern vision of smart dust • 1998: Kris Pister coins the term – And starts making smart dust a reality! First-generation smart dust missed its 1mm3 goal, but their so-called “macro motes” were perhaps the most important • Ad-hoc composition • Early “macro motes” dovetailed lessened efficiency of and helped drive early “mote” volumetric space development 1999 • 900 MHz radio • Magnetometer • Accelerometer • Light • Temperature • Pressure

One week lifetime in continuous operation, http://robotics.eecs.berkeley.edu/~pister/SmartDust/ 2 years w/ 1% duty cycle The history of modern mote research picks up around this time

CSP Workshop @ Xerox PARC Epic Core [soon to be First MICA (modular IPSN] Great Duck Island platforms) TelosB (Tmot e Sky)

Workshop on Sensor Networks and Applications [soon to be SenSys] Glossy BTNode

Pister & Hollar’s B-MAC “Macro Motes” But what happened to research pushing towards a 1mm3 system?

? We stopped building mm-scale systems, and focused on mm-scale circuits – at least for a little while

Y.-S. Lin ‘07

V. Ekanayake ‘04 M. Scott ‘03

B. Warneke ‘04 Y.-S. Lin ‘08 2008 The Phoenix Project Monolithic designs let us push the envelope, but modularity must return to grow beyond a niche space • Phoenix • Intraocular Pressure – World’s lowest power – Collaboration for glaucoma computer (30 pW / 300 nW) health – Basically a temperature sensor – A pressure sensor From mm-scale temperature sensor to mm2008 -scale pressure sensor2011 took 3 years (and another half-dozen PhD students…)

The Phoenix Processor: A 30pW Platform for Sensor Applications, Mingoo Seok, Scott Hanson, Yu-Shiang Lin, Zhiyoong Foo, DaeyeonKim, A Cubic-Millimeter Energy-Autonomous Wireless Intraoc ul ar Pressure Monitor, Gregory Chen, Hassan Ghaed, Razi-ul Haque, Michael Yoonmyung Lee, Nurrachman Liu, Dennis Sylvester, David Blaauw , VLSI ‘08 Wiec kowski, Yejoong Kim, Gyouho Kim, David Fick, DaeyeonKim, Mingoo Seok, Kensall Wise, Dav id Blaauw, Dennis Sylvester, ISSC ‘11 Today, we are able to mix and match components to synthesize systems

Control/ Decap Temp Radio PMU + CDC Sensor

2 uAh 5 uAh Battery Battery

MEMS Motion P. S e n s o r Detection/ Imager

Pressure Sensing Temp. Sensing Visual Sensing However, we had to invent a new system interconnect to be able to do it

Control/ Decap Temp Radio PMU + CDC Sensor

2 uAh 5 uAh Battery Battery

MEMS Motion P. S e n s o r Detection/ Imager

Pressure Sensing Temp. Sensing Visual Sensing • A Brief History of “Smart Dust” • Micro-Scale Composition Architecture • Energy • Communication • Debugging Comparing system composition strategies between traditional motes and mm-scale motes • How to build a mote to • How to build a mm-scale sense temperature today? mote to sense temperature? – Processor

– Temperature Sensor temp_sense t1 ( … .sample_valid_out (to_proc_t1_sample_valid), .sample_out (to_proc_t1_sample), – Radio ); processor p1 ( … .temp_valid_in (to_proc_t1_sample_valid), .temp_in (to_proc_t1_sample), ); We have solutions for composing embedded systems… SPI and I2C • Nearly every has both • Nearly every peripheral has one or the other • Very few use anything else – (except maybe UART) • So why don’t they work for mm-scale systems?

17 Millimeter-scale systems are small

Node volume is dominated by energy storage “Phone” Battery “Sensor” Batteries

“Sensor”

Battery And volume is shrinking cubically

Budget 10’s μW active, 10’s nW sleep, DC 0.1% 18 Millimeter-scale systems are small

Node volume is dominated by I/O pads begin to account energy storage for non-trivial percentage of node surface area “Phone” Battery “Sensor” Batteries 1 mm = 1,000 μm

50 μm, minimum viable bond pad “Sensor”

Battery 16-20 maximum I/O pins for 3D stacking And volume is shrinking cubically

Budget 10’s μW active, 10’s nW sleep, DC 0.1% 19 What is wrong with how are systems composed today?

• SPI, invented by • One master, N slaves Motorola in ~1979 • Shared clock: SCLK Master Slave 1 • Shared data bus: MOSI • Shared data bus: MISO SCLK SCLK MOSI MOSI • One Slave Select line per slave MISO MISO SS SS0 • Key Properties SS1 Slave 2 … • One dedicated I/O line per slave SCLK • Master controls all communication MOSI MISO SS

20 What is wrong with how are systems composed today?

• SPI, invented by • One master, N slaves Motorola in ~1979 • Shared clock: SCLK Master Slave 1 • Shared data bus: MOSI SCLK SCLK • Shared data bus: MISO MOSI MOSI • One Slave Select line per slave MISO MISO • One interrupt line per slave SS SS0 INT SS1 Slave 2 • Key Properties … SCLK • One Two dedicated I/O lines per slave • Master controls all communication GPIO0 MOSI GPIO1 MISO • Interrupts must be out-of-band … SS INT 21 SPI’s I/O overhead and centralized architecture do not scale to mm-scale systems • SPI, invented by • One master, N slaves Motorola in ~1979 • Shared clock: SCLK Master Slave 1 • Shared data bus: MOSI SCLK SCLK • Shared data bus: MISO MOSI MOSI • One Slave Select line per slave MISO MISO • One interrupt line per slave SS SS0 INT SS1 Slave 2 • Key Properties … SCLK • One Two dedicated I/O lines per slave • Master controls all communication GPIO0 MOSI GPIO1 MISO • Interrupts must be out-of-band … SS INT 22 SPI’s I/O overhead and centralized architecture do not scale to mm-scale systems • SPI, invented by Motorola in ~1979

Master Slave 1 SCLK SCLK 1 mm = 1,000 μm MOSI MOSI MISO MISO SS 50 μm, minimum viable bond pad SS0 INT SS1 Slave 2 … SCLK GPIO0 MOSI GPIO1 MISO … SS INT 23 I2C has fixed I/O requirements and a decentralized architecture

• I2C, invented by Phillips in 1982 ‒ Any-to-(m)any on one shared bus Node1 Node2 Node3 SCL SCL SCL • Key Properties SDA SDA SDA ‒ Fixed wire count (2)

24 I2C has fixed I/O requirements and a decentralized architecture

• I2C, invented by Phillips in 1982 ‒ Any-to-(m)any on one shared bus Node1 Node2 Node3 SCL SCL SCL • Key Properties SDA SDA SDA ‒ Fixed wire count (2) ‒ Open-collector • Multi-master • Flow Control

Open-collector (aka wired-AND)

25 The problem is the energy costs of running an open- collector

@400 kHz

VDD = 1.2 V No setup, no hold

R = 15.5 kΩ (“the best”)

80% VDD as 1 Send a “1”: 0 J Send a “0”: 174 pJ

26 The problem is the energy costs of running an open- collector

@400 kHz @400 kHz

VDD = 1.2 V No setup, no hold Bigger R? R = 15.5 kΩ (“the best”)

80% VDD as 1 Send a “1”: 0 J Send a “0”: 174 pJ SCL Alone: 70 μW

27 The problem is the energy costs of running an open- collector

@400 kHz @400 kHz

VDD = 1.2 V No setup, no hold

R = 15.5 kΩ (“the best”)

80% VDD as 1 Send a “1”: 0 J Send a “0”: 174 pJ SCL Alone: 70 μW

28 The energy demands of open-collectors make them unsuitable for mm-scale systems

@400 kHz@400 kHz • Active Energy Budget: 20 μW • Not an arbitrary number: - Volume Target - Lifetime Target

Two batteries

SCL Alone: 70 μW ~ 3.1 mm 29 The design and synthesis of modular, millimeter-scale systems requires a new embedded interconnect MBus is a clean-slate design, built to satisfy interconnect requirements for this and the next generation of motes • Fixed pin count (4) • Minimal standby and active power • Large global address space • Multi-master design • Broadcast message support • Hardware acknowledgements • Data independent behavior • Modest protocol overhead • Power aware design

MBus: An Ultra-Low Power Interconnect Bus for Next Generation Nanopower Systems Pat Pannuto, Yoonmyung Lee, Ye-Sheng Kuo, ZhiYoong Foo, Benjamin Kempke, Gyo u h o Kim, Ronald G Dreslinski, D avi d Blaauw, and Prabal Dutta 31 Proceedings of the 42nd International Symposium on Computer Architecture (ISCA ’15) • Ring Topology MBus in a nutshell • 2 lines – 4 I/O per node – Clock Clock – Data MBus Data • Transaction oriented Mediator – Arbitration – Address Transmission MBus MBus – Data Transmission Member Member – Interjection – Control (ACK/NAK) • “Shoot-Through”

CLK_IN MBus MBus CLK_OUT Member Member

32 Embedded interconnect technology has not changed in over 30 years • If we re-examine… Embedded interconnect technology has not changed in over 30 years • If we re-examine… – Addressing

• Requires I/O not available • Makes packaging assumptions ‒ Systems will not always be PCBs ‒ Routing may not be easy • 3D stack • Flip-chip + TSVs Embedded interconnect technology has not changed in over 30 years • If we re-examine… – Addressing 3 Options Short static addresses and allow device conflicts Long static addresses to avoid device conflicts Non-static addresses

MBus does all 3 4-bit: Static short prefixes (device class) 24-bit: Static long prefixes (unique device ID) 4-bit: Runtime enumeration protocol (replaces short prefix) Embedded interconnect technology has not changed in over 30 years • If we re-examine… – Addressing – Acknowledgements • I2C acknowledges every byte 50 – How often do NAKs happen? 43 40 • To a random byte? 30 – 12.5% overhead 20 19 Bits of Overhead 10 • MBus ACKs transactions

2 0 – Receiver can interject message 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 Message Length (Bytes)

UART (1-bit stop) I2C MBus (short) UART (2-bit stop) SPI MBus (full) Embedded interconnect technology has not changed in over 30 years CPU • If we re-examine… – Addressing – Acknowledgements Sensor Radio – Centralized vs Decentralized CPU

Sensor Radio

Over 2x energy consumption with centralized

7% improvement in lifetime for a prototypical “sense & send” application Existing interconnect technology cannot service forthcoming motes, and may be hindering current motes • Leading embedded interconnects are showing flaws – Unbounded I/O requirements – Burdensome runtime energy demands – Limited addressing schemes – Inefficient or nonexistent acknowledgments – Restrictive centralized architecture • MBus addresses all of these issues and adds new features Energy is a looming system synthesis problem

• A Brief History of “Smart Dust” • Micro-Scale Composition Architecture • Energy • Communication • Debugging Motes embraced “dark silicon” first

• Powering off regions of a chip has become commonplace • Motes power off regions of a system – Requires coordination strategy – Historically the CPU 1999 • 900 MHz radio – Putting peripherals into • Magnetometer • Accelerometer standby mode was good • Light enough for early, battery- • Temperature backed motes… • Pressure One week lifetime in continuous operation, 2 years w/ 1% duty cycle Energy-critical systems push from “dark silicon” to “pitch black” silicon

• Clock-gating – Stop driving the clock tree of regions of a chip – Eliminates switching power, but not static leakage – “Dark Silicon” • Power-gating – Switch off power to regions of a chip – Eliminates all chip leakage – “Pitch Black Silicon” • M3 systems aggressively power-gate to reach energy budget Energy-harvesting motes embrace “pitch black” silicon Can we build a (useful) battery-free mote? Mote platform?

“The Nuclear Option”

• Zero static leakage • Very little control

L. Yerva, B. Campbell, A. Bansal, T. Schmid, and P. Dutta, 䇾Grafting Energy-Harvesting Leaves onto the Sensornet Tree,䇿 In Proceedings of the 11th Slide borrowed from Prabal Dutta International Conference on Information Processing in Sensor Networks (IPSN'12), Beijing, China, Apr. 16-20, 2012. Monjolo sensors—the harvester is the sensor

L. Yerva, B. Campbell, A. Bansal, T. Schmid, and P. Dutta, 䇾Grafting Energy-Harvesting Leaves onto the Sensornet Tree,䇿 In Proceedings of the 11th International Conference on Information Processing in Sensor Networks (IPSN'12), Beijing, China, Apr. 16-20, 2012.

service cap service entrance conductors service drop conductors

drip servicemast loop splices

current transformers

Samuel DeBruin, Bradford Campbell, and Prabal Dutta, "Monjolo: An Energy-Harvesting Energy Meter Architecture,” In Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems (Sensys'13), Rome, Italy, Nov. 11-14, 2013. Monjolo: A Portuguese water hammer § Hammer rate ó flow rate § Fixed water quanta / strike § Non-linearity with high flow rate B. Campbell, B. Ghena, and P. Dutta, "Energy-Harvesting Thermoelectric Sensing for Unobtrusive Water and Appliance Metering,” In Proceedings of the 2nd International Slide borrowed from Prabal Dutta Workshop on Energy Neutral Sensing Systems (ENSsys'14), Memphis, TN, Nov. 6, 2014. The Monjolo principle decomposes into an array of components that build a diverse set of energy-harvesting sensors

HW/SW designs available online: [1] http://lab11.eecs.umich.edu/pcb.html [2] https://github.com/lab11

Slide borrowed from Prabal Dutta Systems are not required to be all or none, however

• “Federated Energy” – Harvest an independent store for each component – Completely cut power when component is not in use – But this system cannot cut power to the MCU – MCU-Federated energy control

Josiah Hester, Lanny Sitanayah, and Jacob Sorber. 2015. Tragedy of the Coulombs: Federating Energy Storage for Tiny, Intermittently-Powered Sensors. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems (SenSys '15) Embedded interconnect technology has not changed in over 30 years • If we re-examine… – Addressing – Acknowledgements – Centralized vs Decentralized – Peripheral power state is an interconnect concern Our mm-scale systems leverage decentralized, autonomouspower management via MBus • Key Challenges – Energy efficient multi-master systems are a distributed state problem – Cold-booting silicon is a hard circuits problem • Because there is no stable clock • MBus Key Observation – A component is either already on, or can leverage an incoming message to wake itself MBus introduces three, transparent, hierarchical power domains to maximize efficiency • Provide an “always-on” abstraction – Interconnect fabric powers on components upon receipt of a message – Transparent to callers and chip designers As motes continue to push the boundaries of lifetime, harvesting, and volume, every microjoule is precious • Circuits advancements will continue to provide benefits – Non-volatile RAM will fall in price and area – Accelerators will replace compute-intensive tasks – Finer-grained internal power-gating – Diminishing returns… • But there are system-level constructs wasting energy today – Leveraging peripheral power-gating – Coalescing wakeups – Clock domain controls – … • A Brief History of “Smart Dust” • System Composition Architecture • Energy • Communication – “Internal” – “External” • Debugging Embedded interconnect protocols have not changed in 30 years either • In fact, they do not exist at all • Q: How do you read a register on a sensor? • Q: How do you write to a radio’s packet buffer? MPQ: A (mostly) PHY-independent protocol to standardize reading and writing registers and memory • Motivations Primitives – Simplicity across the ecosystem • Read/Write Register – Enabling reliable direct peripheral communication • Block DMA transfer • “Streaming” memory

– Transition CPU from active overseer to passive coordinator • Configure all peripherals at boot, then power off self • Distributed equivalent of many COTS chips (Sleepwalking, uDMA, etc) MPQ repurposes the 4 LSB of the address as a function identifier

– Could also use the first 4 bits of data if address bits are unavailable – Embedded write addresses for read commands enable powerful command synthesis An interesting quirk begets an interesting question

• M3 CPU chip has only SRAM – Loses state (and programming) on complete power loss • Separate flash chip programs CPU on boot – Actually, it is a state machine issuing arbitrary DMA messages • No CPU needed for a sense, send, and forget application – As peripherals become more capable, what is the role of the CPU? Perhaps it is time for a generic protocol on top of major embedded buses • Improve the programmability and utility of mostly-fixed- function peripheral frontend • Proving powerful for synthesis in M3 ecosystem • Easy to phase-in – No backward compatibility because there’s nothing today! • A Brief History of “Smart Dust” • System Composition Architecture • Energy • Communication – “Internal” – “External” • Debugging Q: How do you program a 1mm3 mote?

Actual size A 695 pW, always-on visible light communication receiver A sub-nanowattreceiver opens interesting possibilities for traditional motes

• Idle listening dominates energy Staying budget of many MACs Synchronized • Can we decouple synchronization from communication? 1200 μs @ 2ppm Guard

Local time Static Power: ~100 nW 300 s Real time All this begs the question, what else can we do with an intelligent lighting infrastructure?

IP-Backbone Cloud / Cloudlet PLC PLC

Control C Ethernet C Ethernet C Unit RF RF

Data Configuration RF Downlink Synchronization Uplink Positioning

System Architecture Directions for a -Defined Lighting Infrastructure Ye-Sheng Kuo, Pat Pannuto, and Prabal Dutta 1st ACM Workshop on Visible Light Communication Systems (VLCS ’14) The capabilities of VLC receivers vary greatly

Diffusing (single photodiode) • Simpler devices à Less computation capability • Cannot distinguish transmitters • Requires temporal signal diversity

Array (multiple photocell) • Can distinguish transmitters • Temporal or spatial diversity Luxapose: Astral navigation with smart phones and smart lights

Luxapose: Indoor Positioning with Mobile Phones and Visible Light (MobiCom’14) Ye-Sheng Kuo, Pat Pannuto, Ko-Jen Hsiao, and Prabal Dutta Luxapose enables high-fidelity indoor localization with impending infrastructure

Human Perception -60 • Decimeter-level TX 4 TX 3 -40 accuracy -20 • Unmodified

y 0 TX 5 20 Camera Perception • Trivially modified 40 TX 1 TX 2 LED lights 60

-60 -40 -20 0 20 40 60 x Location Orientation

Luxapose: Indoor Positioning with Mobile Phones and Visible Light (MobiCom’14) Ye-Sheng Kuo, Pat Pannuto, Ko-Jen Hsiao, and Prabal Dutta Can smart dust and smart phones co-exist on the same lighting infrastructure? With sufficient synchronization, transmitted signals can constructively interfere Emerging non-traditional (non-narrowband RF) communication technologies are just beginning exploration • Sub-nanowatt VLC receivers • VLC retroreflectorsfor bi-directional communication • Acoustic? • Vibratory? • Backscatter communication • High-fidelity localization with UWB – For almost no power with backscattered UWB? • A Brief History of “Smart Dust” • System Composition Architecture • Energy • Communication • Debugging An open question: How to effectively debug a system you cannot touch? • Debug chips provide some introspection – Snoop (or inject) on system bus – Power traces remarkably effective • Encapsulated systems remain a big issue

Bottom View: MEMS pressure sensor Side View: Light-blocking epoxy Top View: Clear epoxy for harvesting Many further questions in this space for future motes

• When is an energy harvesting node dead or just harvesting? • How to debug 1,000 nodes? – How to find “the” bad node among 1,000? – How to collect and recover failed nodes? • Or is it really just dust? J Everyone’s favorite debugging answer: Write something well enough that it won’t crash! • A teaser for Tock – New embedded operating system • Collaboration with Stanford and UC Berkeley – Defense-in-depth: language (Rust) and hardware (MPU) protections – It’s time for proper kernel/application separation • Even viable for micro-scale processor (currently M0, but M0+ has MPU) • And a safe, accessible user-space runtime (Lua) What have we learned from the development of millimeter-scale systems? • Existing system synthesis technologies cannot scale • Revisiting long-standing architectural components can yield unforeseen benefits • To maximize energy efficiency, power-awareness must permeate every aspect of the system design • New communication technologies are setting up major architectural shifts • There is lots of work yet to do! http://mbus.io http://github.com/mbus http://lab11.eecs.umich.edu http://github.com/lab11

Pat Pannuto, University of Michigan [email protected], patpannuto.com Gartner Hype Cycle

• Appeared 2013, fell off 2014, back in 2015

[2013]: http://www.gartner.com/newsroom/id/2575515 [2014]: http://www.gartner.com/newsroom/id/2819918 [2015]: http://www.gartner.com/newsroom/id/3114217