Mesh Networking in Low Power Location Systems (Swarm)

SONALI DEO

KTH ROYAL INSTITUTE OF TECHNOLOGY INFORMATION AND COMMUNICATION TECHNOLOGY

KUNGLIGA TEKNISKA HÖGSKOLAN

Royal Institute of Technology

Mesh networking in low power location systems (swarm)

Master Thesis

Last revised: 2016-08-24

Sonali Deo

Email: [email protected]

Examiner

Professor Mark T. Smith (KTH)

Supervisors

Johnny Öberg (KTH)

Andreas Lagemann (nanotron Technologies)

Acknowledgements

This thesis is dedicated to my parents and was accomplished with the help from a lot of people to whom I wish to express my sincere thanks.

I thank nanotron Technologies for giving me the opportunity to work on this topic which interested me the most from all the others that I came across. I am thankful to my supervisor at the company, Andreas Lagemann, for his enthusiasm in discussing and helping me develop new concepts for the implementation and encouraging me all the way. I thank the CTO, Rainer Hach, for introducing the Layer 2 routing protocol as a candidate in the first place and opening the possibilities for enhancement that enabled me to innovate. I am very grateful to Christian Bock, for helping me understand swarm, debugging my mistakes and for his insights in the whole implementation.

At KTH, I am highly indebted to Mark Smith for guiding me in making several decisions about the project and concepts. I also thank Mark Smith and Johnny Öberg for going through my long drafts and explanations and giving valuable suggestions that helped get this thesis to where it is now.

This work has given me deeper insight into many things, such as mesh networking, routing protocols, radio communication, synchronization schemes, embedded programming and, most of all, testing, which is fairly new to me. It will be very useful for the continuation of my work and study.

I hereby thank David, my parents and all the friends who have always given me kind help and support.


Abstract

Today, the Internet of Things (IoT) is the driving force in making operations and processes smart. Indoor localization is one such application of IoT that has proven the potential of location awareness in countless scenarios, from mines to industries to even people. nanotron Technologies GmbH, based in Berlin, is one of the pioneers in low power location systems. nanotron's embedded location platform delivers location-awareness for safety and productivity solutions across industrial and consumer markets. The platform consists of chips, modules and software that enable precise real-time positioning and concurrent wireless communication. The ubiquitous proliferation of interoperable platforms is creating the location-aware Internet of Things.

One of their product families is swarm. A swarm is a group of independent radios or nodes that communicate with their immediate neighboring nodes to get each other’s positions. This position information is collected by one of the nodes (called the gateway) and delivered to the host controller. However, the nodes need to be in range to communicate. The company wants to improve the range of communication, and for that purpose I am implementing a routing protocol with some additional changes for swarm, to allow out-of-range nodes to communicate via intermediate neighbors. This is called mesh networking. It results in a so-called ‘mesh’ of nodes and increases the range of swarm operation, which could be beneficial in achieving uniform connectivity throughout large spaces without needing an excessive number of gateways. This is of high importance because a node acting as gateway should be ‘awake’ all the time so that it can collect data efficiently, while the other nodes can be in power saving mode. Mesh networking will allow data collection even with fewer such gateways, thereby being energy efficient while facilitating a larger range of communication. This was made possible by adding a feature that allows nodes to store messages for their neighbors in case they are asleep, and to wake up so that the neighbors can transmit data. It is done using a schedule that is built and updated in addition to the routing protocol.

The purpose of this thesis is to justify the choice of the implemented mesh routing protocol for swarm among the other routing protocols available. It also focuses on the modifications and improvements that were devised to tailor the protocol to how swarm works and to support Message Queuing Telemetry Transport (MQTT) on top of it at a later stage. MQTT is a lightweight messaging protocol that provides resource-constrained network clients with a simple way to distribute information. It uses a publish/subscribe communication pattern, is used for machine-to-machine (M2M) communication and plays an important role in the Internet of Things. The implemented routing protocol also takes into consideration sleeping nodes, route maintenance through advertisements, the hierarchical nature of the mesh to make data collection more efficient, message formats designed with the memory shortage in mind, etc. The document gives a thorough overview of concepts, design, implementation, improvements and tests to prove the importance of mesh networking in existing swarm.

Keywords: WSN, routing, nanotron technologies, indoor positioning, swarm, mesh networking


Table of contents

Acronyms and abbreviations
Chapter 1: Introduction
1.1. Background
1.2. swarm bee LE
1.3. Overview – Comparison of routing protocols
1.4. Goals
1.5. Organization
Chapter 2: Concepts
2.1. Principle of swarm
2.2. swarm API
Chapter 3: Design
3.1. Proposed routing protocol
3.2. Additional custom features
3.3. Frame formats used
Chapter 4: Implementation
Chapter 5: Validation and testing
5.1. Tools used
5.2. Analysis of transmission and reception
Chapter 6: Results and Evaluation
6.1. Data analysis and evaluation
Chapter 7: Conclusions
7.1. Conclusion
7.2. Future work
References
Appendices
Appendix 1: Common swarm commands used
Appendix 2: Frame formats
Appendix 3: List of figures and tables


Acronyms and abbreviations

ACK: Acknowledgement
ADV: Advertise
AIR: Wireless medium
AODV: Ad hoc On demand Distance Vector
API: Application Programming Interface
APP: Application layer
BSN: Beacon sequence number
CDMA: Code Division Multiple Access
CSMA/CA: Carrier Sense Multiple Access with Collision Avoidance
CSS: Chirp Spread Spectrum
DA: Destination Address
DK Plus: Development Kit Plus
DS: Downstream
DSR: Dynamic Source Routing
EB: Enhanced Beacon
EBR: Enhanced Beacon Request
ER: Enhanced Resolution
ETX: Expected Transmission count
FFD: Full Function Device
FIFO: First In First Out
GPIO: General-purpose input/output
GUI: Graphic User Interface
HAL: Hardware Abstraction Layer
IE: Information Element
IoT: Internet of Things
L2R: Layer 2 Routing
L2R IE: Layer 2 Routing IE
LE: Low Energy
LEACH: Low-energy adaptive clustering hierarchy
CH: Cluster Head
LQM: Link Quality Metric
LSN: L2R sequence number
M2M: Machine to machine
MAC: Media access control
MEMS: Micro-Electro-Mechanical Systems
MHR: MAC header
MQTT: Message Queuing Telemetry Transport
MSN: Mesh sequence number
MT: Mesh table
nanoLOC: nano Location
nanoPAL: nano People and Asset Locating technology
NHA: Next Hop Address
NIN: Node ID Notification
nodeID: Node Identification
NT: Neighbor Table
P2P: Peer-to-peer
PC: Personal Computer
PHY: Physical layer
PQM: Path quality metric
RAM: Random access memory
REQ: Request
RF: Radio Frequency
RREQ: Routing request
RREP: Routing reply
RERR: Routing error
RSSI: Received signal strength indication
RX: Receive
SA: Source Address
SDSR: Secure Dynamic Source routing
SDS-TWR: Symmetrical Double-Sided Two Way Ranging
SLS: swarm location services
SPIN: Sensor Protocols for Information via Negotiation
TC IE: Topology Construction IE
TDMA: Time division multiple access
TDOA: Time difference of arrival
TOA: Time of arrival
TOF: Time of flight
TSCH: Time-Slotted Channel Hopping
TTL: Time to live
TxA: Transmitting node address
UART: Universal Asynchronous Receiver/Transmitter
US: Upstream
UWB: Ultra-wideband
WSN: Wireless sensor network


Chapter 1: Introduction

1.1. Background

Today, the Internet of Things (IoT) is the driving force in making operations and processes smart. Indoor localization is one such application of IoT that has proven the potential of location awareness in countless scenarios, from mines to industries to even people. nanotron Technologies GmbH, based in Berlin, is one of the pioneers in low power location systems. nanotron's embedded location platform delivers location-awareness for safety and productivity solutions across industrial and consumer markets. The platform consists of chips, modules and software that enable precise real-time positioning and concurrent wireless communication. The ubiquitous proliferation of interoperable platforms is creating the location-aware Internet of Things [1]. One of their product families is swarm. A swarm is a group of independent radios or nodes that communicate with their immediate neighboring nodes to get each other’s positions. This position information is collected by one of the nodes (called the gateway) and delivered to the host controller. However, the nodes need to be in range to communicate. So, for larger areas like mines, many gateways need to be deployed to gather data from nodes placed at different points. This increases the cost of deployment, while the range of operation of each gateway remains limited. To cope with this restriction on range, mesh networking is an appropriate solution.

A mesh network is a network topology in which each node relays data for the network. All mesh nodes cooperate in the distribution of data in the network. Mesh networks can relay messages using either a flooding technique or a routing technique. As the flooding technique is better suited to networks that can withstand high traffic, we decided to go with routing instead, so that optimal paths/routes can be chosen for relaying data from one point to another. With routing, the message is propagated along a path by hopping from node to node until it reaches its destination. To ensure the availability of all its paths, the network must allow for continuous connections and must reconfigure itself around broken paths, using self-healing algorithms. Self-healing allows a routing-based network to operate when a node breaks down or when a connection becomes unreliable. As a result, the network is typically quite reliable, as there is often more than one path between a source and a destination in the network. Mesh networks can be considered a type of ad hoc network.
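The hop-by-hop relaying and self-healing behaviour described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the topology, the node names and the function `shortest_route` are hypothetical and are not taken from the swarm firmware or from the routing protocol chosen later. A breadth-first search stands in for whatever route computation a real mesh would use.

```python
from collections import deque

# Hypothetical five-node mesh: each key lists the neighbors in radio range.
LINKS = {
    "gateway": ["A", "B"],
    "A": ["gateway", "C"],
    "B": ["gateway", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

def shortest_route(links, source, destination, failed=frozenset()):
    """Return a minimum-hop route that avoids failed nodes, or None."""
    if source in failed or destination in failed:
        return None
    previous = {source: None}          # node -> node it was reached from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            route = []
            while node is not None:    # walk back to the source
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbor in links[node]:
            if neighbor not in previous and neighbor not in failed:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

if __name__ == "__main__":
    print(shortest_route(LINKS, "D", "gateway"))                # normal operation
    print(shortest_route(LINKS, "D", "gateway", failed={"A"}))  # A is down, reroute
    print(shortest_route(LINKS, "D", "gateway", failed={"C"}))  # no path left
```

When relay A fails, the route is rebuilt through B, which is the self-healing property. When C fails, D is cut off because C is its only neighbor.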


1.2. swarm bee LE

A swarm is a group of independent radios (or nodes) with a common interest to know their positioning relative to one-another and to be able to communicate. nanotron swarm nodes automatically detect new members and departures from the local swarm area. The swarm concept has been used in order to illustrate the communication aspect of independent radio nodes. With nanotron's wireless technology, location-awareness has been added to swarm. Ranging is used to measure the distances between the individual swarm members [2].

The swarm bee LE (Low Energy) module belongs to nanotron's 2nd generation swarm product family. It combines flexibility and integration with enhanced power management and includes simultaneous support for collaborative and fixed location systems. The LE version serves applications for reliable distance or location information between 10 and 500 meters with an accuracy of about 1 meter. The integrated MEMS sensor can detect 3D acceleration as well as temperature changes. swarm bee LE is a 2.4 GHz autonomous radio module based on nanotron's second generation ranging and communication transceiver chip nanoLOC. With its integrated host microcontroller, the module is controlled by its comprehensive nodeID API. swarm bee LE has a compact footprint providing a convenient form factor for customized tag designs. An antenna and a battery with housing are the only necessary external components to rapidly construct low power tags. The swarmAPI command language eliminates the need for lower level firmware. Higher level functions like ranging or messaging can be executed by API commands [2].

The swarm bee LE Development Kit Plus ("DK Plus" for short) consists of several DK Plus boards with antennas and PC software, which includes the swarm PC Tool and the sniffer GUI. The kit helps users get familiar with the functionality of the swarm bee LE module and develop their own applications rapidly. This is also the kit I am using to test and validate the implementation. The kit is connected to a PC, and progress can be observed by printing messages, routing tables, etc.

Figure 1. swarm bee LE Functional Diagram and swarm bee LE Module [2]


1.3. Overview – Comparison of routing protocols

Wireless Sensor Networks (WSN) are intended for monitoring an environment. The main task of a wireless sensor node is to sense and collect data from a certain domain, process it and transmit it to the sink where the application lies. However, ensuring direct communication between a sensor and the sink may force nodes to emit their messages with such high power that their resources could be quickly depleted. Therefore, the collaboration of nodes to ensure that distant nodes can communicate with the sink is a requirement. In this way, messages are propagated by intermediate nodes so that a route with multiple links or hops to the sink is established [2].

Taking into account the reduced capabilities of sensors, the communication with the sink could initially be conceived without a routing protocol. With this premise, the flooding algorithm stands out as the simplest solution. In this algorithm, the transmitter broadcasts the data, which are then retransmitted by each receiving node until they arrive at the intended destination. However, its simplicity brings about significant drawbacks. Firstly, implosion occurs because nodes redundantly receive multiple copies of the same data message. Moreover, this leads to resource blindness, i.e. nodes consume large amounts of energy without consideration for the energy constraints.

One optimization relies on the gossiping algorithm. Gossiping avoids implosion as the sensor transmits the message to a selected or random neighbor instead of informing all its neighbors as in the classical flooding algorithm. However, this causes significant delays in propagation of data through the nodes. Furthermore, these inconveniences are highlighted when the number of nodes in the network increases.
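The trade-off between the two strategies can be made concrete with a short simulation. This is a minimal sketch on a hypothetical five-node topology. The names `flood` and `gossip` and the fixed random seed are assumptions made for illustration, and the radio is idealized (no losses or collisions). Flooding counts how many copies arrive at nodes that already have the message (implosion), while gossiping forwards a single copy to one random neighbor and counts how many hops that takes.

```python
import random
from collections import deque

# Hypothetical topology: each key lists the neighbors in radio range.
LINKS = {
    "gateway": ["A", "B"],
    "A": ["gateway", "C"],
    "B": ["gateway", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

def flood(links, source):
    """Every node rebroadcasts once. Returns (receptions, duplicate receptions)."""
    seen = {source}
    queue = deque([source])
    receptions = duplicates = 0
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:       # one broadcast reaches all neighbors
            receptions += 1
            if neighbor in seen:
                duplicates += 1            # redundant copy: implosion
            else:
                seen.add(neighbor)
                queue.append(neighbor)
    return receptions, duplicates

def gossip(links, source, destination, rng, max_hops=1000):
    """Forward a single copy to one random neighbor per hop. Returns hops or None."""
    node, hops = source, 0
    while node != destination and hops < max_hops:
        node = rng.choice(links[node])
        hops += 1
    return hops if node == destination else None

if __name__ == "__main__":
    print("flooding (receptions, duplicates):", flood(LINKS, "D"))
    print("gossiping hops:", gossip(LINKS, "D", "gateway", random.Random(1)))
```

In this topology the shortest path from D to the gateway is three hops. Gossiping never produces duplicate copies, but the random walk may take many more hops than that, which corresponds to the propagation delay mentioned above.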

Due to the deficiencies of the previous strategies, routing protocols become necessary in wireless sensor networks. Nevertheless, the inclusion of a routing protocol in a wireless sensor network is not a trivial task. There are many routing protocols and specifications that have been published and are aimed to achieve multi-hop communication in wireless sensor networks. However, they have certain pros and cons in terms of memory consumed, latency, performance, etc. which need to be evaluated before choosing one to implement. Routing protocols are also in charge of constructing and maintaining routes between distant nodes. The different ways in which routing protocols operate make them appropriate for certain applications.

Like any other application, the swarm application also has certain requirements which are crucial in the selection of an appropriate routing algorithm. They are as follows:

1. Hierarchical mesh: In flat schemes, all sensor nodes participate with the same role in the routing procedures. Hierarchical routing protocols, on the other hand, classify sensor nodes according to their functionalities. The network can be divided into meshes, groups or clusters. A mesh root or leader is selected in the group to coordinate the activities within the mesh and to communicate with nodes outside its own mesh. This is important in swarm, as the function of the gateway is to collect data, which differs from that of the other nodes, which feed it data. Hierarchical or cluster based methods are well known techniques with the special advantages of scalability and efficient communication. Hence, a mesh built hierarchically around the gateway is a better option than a flat scheme.


2. Energy efficient: Routing protocols should prolong network lifetime while maintaining a good grade of connectivity to allow communication between nodes. They should also cope with the possibility that one or more nodes may be in sleep mode, which is true in the case of swarm. In a routing protocol, multiple routes can connect a node and the sink. The aim of energy-aware algorithms is to select those routes that are expected to maximize the network lifetime. To do so, routes composed of nodes with higher energy resources are preferred.

3. Scalability: Since the number of sensor nodes in sensor networks is in the order of tens, hundreds, or thousands, network protocols designed for sensor networks should be scalable to different network sizes. Most swarm applications are deployed in large spaces, like mines, and so, to cover a wider range, hundreds of nodes are used. The routing protocol should be able to work with this number of nodes. A system’s scalability is said to be good if its effectiveness increases when hardware is added, in proportion to the capacity added. Routing schemes in WSNs have to deal with a vast collection of motes and should be scalable enough to respond to the events that take place in the environment.

4. Resilience: Sensor nodes may unpredictably stop operating due to environmental reasons or battery consumption. Routing protocols should cope with this eventuality so that when a node currently in use fails, an alternative route can be discovered.

5. Mobility Adaptability: As the application where mesh networking needs to be incorporated is indoor localization, it should be taken into consideration that the wireless swarm nodes will be mobile all the time, and hence the mesh should be self-healing and self-forming in case a node goes out of range or a new node comes into range. The topology should dynamically change itself whenever there is a deviation in behaviour and route packets reliably in case pre-determined routes are lost.

6. Data aggregation: The main goal of data aggregation algorithms is to gather and aggregate data from different sources to achieve energy efficiency and traffic optimization in routing protocols so that network lifetime is enhanced. However, this comes at the cost of increased packet size.
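The energy-efficiency requirement states that routes composed of nodes with higher energy resources are preferred. One possible way to express this, shown here purely as an assumption and not as the method used in this thesis, is a max-min rule: among all routes to the sink, pick the one whose weakest relay has the most residual energy, with fewer hops as a tie-breaker. The topology, the energy values and the function names are hypothetical.

```python
# Hypothetical topology and residual energy (percent of battery) per relay.
LINKS = {
    "gateway": ["A", "B"],
    "A": ["gateway", "C"],
    "B": ["gateway", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
ENERGY = {"A": 20, "B": 80, "C": 60}

def simple_routes(links, source, sink, visited=None):
    """Yield every loop-free route from source to sink (fine for small meshes)."""
    visited = (visited or []) + [source]
    if source == sink:
        yield visited
        return
    for neighbor in links[source]:
        if neighbor not in visited:
            yield from simple_routes(links, neighbor, sink, visited)

def bottleneck_energy(route, energy):
    """Residual energy of the weakest relay; source and sink are not counted."""
    relays = route[1:-1]
    return min((energy[node] for node in relays), default=float("inf"))

def pick_route(links, energy, source, sink):
    """Prefer the route with the strongest weakest relay, then the fewest hops."""
    routes = list(simple_routes(links, source, sink))
    if not routes:
        return None
    return max(routes, key=lambda r: (bottleneck_energy(r, energy), -len(r)))

if __name__ == "__main__":
    print(pick_route(LINKS, ENERGY, "D", "gateway"))
```

With these values the route through B is chosen, because the route through A would rely on a relay with only 20 percent of its energy left. Enumerating all routes is only practical for very small networks. A real protocol would fold the energy term into its path quality metric instead.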

A quantitative overview of requirements that should be fulfilled by a routing protocol to be eligible for swarm is presented in Table 1.


Table 1. Quantitative specification of mesh networking parameters

Parameter: Desired specification

Nature of mesh:
- Hierarchical, due to the presence of a sink or gateway for data collection, or
- Data centric, as the aim is to transfer location specific data to the gateway

Scalability:
- Scalable to 80 or more nodes
- Suitable for application areas like mines and asset tracking

Packet size:
- Small or medium. No strict limit.
- 20-60 bytes for frequent transmission (every few ms)
- More than 60 bytes acceptable for less frequent transmission (every few seconds)
- Maximum allowable packet size that can be transmitted via AIR is 128 bytes.
- The largest packet size currently used by swarm is 54 bytes, which is sent frequently.

Mobility adaptability:
- The wireless swarm nodes are usually mobile and hence the mesh should be self-healing and self-forming
- Dynamic topology should be supported by the routing protocol
- Resilient to loss of routes and nodes

Data aggregation:
- swarm nodes send ‘blinks’ or Node ID notifications (NIN) and range with each other periodically (every few ms to seconds). This is important for localization and should not be affected by routing. (More in chapter 2)
- So, routing packets should not interfere with blinks and ranging packets by increasing traffic over AIR.
- Data aggregation is a desired feature in routing, to reduce congestion and also filter data sent to the sink.
- However, it is an optional feature for swarm if the routing packets are few or less frequent.

Power efficient:
- swarm nodes are usually mobile and hence battery powered.
- Most swarm applications need them to blink periodically for location and presence awareness and sleep the rest of the time.
- The mesh networking should incorporate power saving or synchronization schemes to support routing in sleeping nodes.
- Additionally, the chosen routes from source to destination should consume fewer resources to increase network lifetime.

I considered many candidate routing protocols and schemes for achieving multi-hop communication between swarm nodes. The final choice depended on the tradeoff among the criteria mentioned above. These protocols and their functionalities are tabulated in Table 2.


Table 2. Comparison of different routing protocols

Columns: Routing protocol; Classification; Scalability; Packet size (excluding headers); Features; Advantages; Disadvantages

Routing protocol: Layer 2 routing (L2R) [12]
Classification: Hierarchical
Scalability: Good. Depends on the resource available in terms of storing node information in routing table. 200 or more.
Packet size (excluding headers): Medium. Maximum packet size is 22 bytes which is sent periodically. Packet size increases with data aggregation.
Features:
1. Initial phase of topology construction and route establishment
2. Routes divided into upstream (node to mesh root) and downstream (mesh root to node) routes.
3. Routing tables used to keep track of node’s depth from mesh root and neighbors
Advantages:
1. Satisfies most of the requirements of swarm by being hierarchical, scalable, resilient and adaptable
2. Routes always ready when a node wants to transmit
3. Route maintenance keeps track of changes in the topology
4. Packet size is manageable
Disadvantages:
1. Doesn’t have any mechanism for power saving
2. Doesn’t mention any scheme for synchronization
3. Initial phase of route establishment is needed
4. Requires periodic transmission of update messages for route maintenance
5. Routing tables need to be stored

Routing protocol: Ad hoc On-Demand Distance Vector (AODV) [13]
Classification: Flat
Scalability: Limited, up to 100 nodes. With 100+ nodes, path discovery and stabilization take a long time as this is a reactive protocol, i.e. the path is discovered when needed.
Packet size (excluding headers): Small, but needs route establishment using 2 or more messages every time for data exchange. These messages (in bytes) are:
- RREQ = 24
- RREP = 20
- RERR = 20
Features:
1. Routes discovered only when a node wants to transmit
2. Routing tables are maintained at each node to select next hop towards destination
3. Source broadcasts a route request, RREQ, which is propagated to destination. Destination then replies (RREP), which is unicast back to source
4. RREP is then used to transmit data from source to destination, by the information stored at intermediate nodes
Advantages:
1. On-demand in nature. So, no prior route discovery is needed
2. Energy efficient as the nodes do not need to communicate all the time but only when data needs to be transmitted
3. Routing tables are used by nodes, which eliminates the need to send information in packets (unlike DSR). So, packet size is low.
4. Tackles endless forwarding of RREQ by intermediate nodes, by using RREQ sequence numbers
Disadvantages:
1. Storage space needed for routing tables
2. Route establishment has to be performed every time data is to be transmitted.
3. Even though packet sizes are small, multiple command packets (RREQ, RREP, RERR) need to be sent before actual data transmission
4. Limited scalability
5. Flat in nature

Routing protocol: Dynamic Source Routing (DSR) [14]
Classification: Flat
Scalability: Limited, up to 200 nodes.
Packet size (excluding headers): Large, as the packet carries the list of addresses traversed by it to reach the destination. DSR also requires exchange of 2 or more messages before actual data transfer can take place.
- RREQ = 16 + n*addresses
- RREP = 15 + n*addresses
- RERR = 16 + variable
Features:
1. Similar to AODV regarding broadcasting a RREQ and receiving a RREP
2. Except, the RREQ has a route record where every intermediate node appends its own address before forwarding towards neighbor
Advantages:
1. On-demand in nature. So, no prior route discovery is needed
2. Energy efficient as the nodes do not need to communicate all the time but only when data needs to be transmitted
3. No routing tables need to be stored as all information is present in packets
4. Can be made secure with Secured DSR (SDSR)
Disadvantages:
1. Large packet size as intermediate nodes append their address to RREQ
2. Route establishment needed every time data needs to be transmitted
3. Multiple messages need to be exchanged before actual data transfer
4. Limited scalability
5. Flat in nature

Routing protocol: Gossiping [15]
Classification: Probabilistic
Scalability: Good. It can be implemented for any number of nodes as the forwarding nature of frames is non-deterministic and doesn’t care about the network topology or resources. Nodes can be in 100’s to 1000’s range.
Packet size (excluding headers): Depends on the protocol that the gossip algorithm is being implemented in. Time to live (TTL), Hop limit (HL) and destination address are the essential fields needed. These can be included in the header of the packet.
Features:
1. This is not a routing protocol but a mechanism of data forwarding, like flooding
2. Nodes do not need to know the network topology or any routing algorithms
3. When a node receives a message, rather than immediately retransmitting it as in flooding, it relies on a probability to determine whether to retransmit.
Advantages:
1. Better than flooding as it reduces the number of retransmissions by making some of the nodes discard a message instead of forwarding it
2. Retransmission relies on the probability Pgsp to determine whether or not to retransmit.
3. The main benefit is that when Pgsp is sufficiently large, the entire network receives the message with very high probability, even though only a non-deterministic subset of nodes has forwarded the message
4. Good scalability
5. Avoids the problem of implosion[a] by sending information to a random neighbor instead of broadcasting, like flooding
Disadvantages:
1. Non-deterministic nature
2. Pgsp needs to be decided carefully
3. Not suitable when urgent data is to be transferred
4. Uses no routing and hence could be less energy efficient
5. Creates problem of delay in propagation of data among sensor nodes

Routing protocol: Sensor Protocols for Information via Negotiation (SPIN) [16]
Classification: Data-centric
Scalability: Limited. Takes longer for a packet to reach the destination if it is several hops away from source, as it depends on whether the nodes in between are interested in the data. Appropriate for up to 100 nodes.
Packet size (excluding headers): Small, but 3 messages need to be exchanged every time for data exchange. 20 bytes or more for each type of packet.
Features:
1. The data is named using meta-data that highly describes the characteristics of data
2. It has three types of messages, i.e., ADV, REQ, and DATA.
   • ADV - When a node has data to send, it advertises this message containing metadata.
   • REQ - A node sends this message when it wishes to receive some data.
   • DATA - Data message contains the data with a meta-data header
Advantages:
1. The SPIN family of protocols includes SPIN-1 and SPIN-2 that disseminate information with low latency and conserve energy at the same time.
2. SPIN-1 uses negotiation to solve the difficulty of implosion and overlap.
3. It reduces energy consumption by a factor of 3.5 when compared to flooding.
4. Negotiation ensures the transmission of redundant data throughout the network is eliminated and only useful information is transferred.
Disadvantages:
1. Doesn’t guarantee data delivery: if the nodes that are interested in the data are far away from source and the nodes between source and destination are not interested in that data, such data will not be delivered to the destination at all.
2. Flat in nature.
3. Too many messages need to be transferred before actual data transfer.

Routing protocol: Low Energy Adaptive Clustering Hierarchy (LEACH) [17]
Classification: Hierarchical
Scalability: Good. Cluster based routing allows scalability for bigger WSNs as well. It can support 100’s to 1000’s of nodes, thereby maximizing network coverage.
Packet size (excluding headers): Large because of data aggregation[b]. More than 30 bytes due to data aggregation at cluster heads.
Features:
1. Network is divided into clusters which have a cluster head, each.
2. A cluster head collects data from sensor nodes of that cluster and sends the data to the sink after data aggregation
3. It randomly changes the cluster head every time period to ensure that sensor nodes in the network consume energy equally and to extend the lifetime of the network
Advantages:
1. Extends the battery life of network by transferring the role of cluster head to different nodes.
2. Data aggregation allows redundant and unwanted data to be excluded.
3. It is resource aware and energy efficient.
4. Doesn’t use meta-data.
Disadvantages:
1. The newly appointed cluster head needs to make all the other nodes aware of its functionality periodically.
2. To reduce intercluster and intracluster collisions, LEACH has to use TDMA/code-division multiple access (CDMA).
3. Data aggregation increases the packet size.
4. Optimal routing[c] is not possible as all the data is first sent to the cluster head irrespective of whether source has a better route to base station.

[a]Implosion - Flooding technique of data forwarding causes all nodes to broadcast the packet meant for a destination. This can result in multiple copies of the packet being received at nodes. This is called implosion.

[b]Data aggregation - The main goal of data aggregation algorithms is to gather and aggregate data from different sources, using functions such as suppression, min, max and average, to achieve energy efficiency and traffic optimization in routing protocols so that network lifetime is enhanced. In LEACH, the cluster head (CH) nodes compress data arriving from nodes that belong to the respective cluster, and send an aggregated packet to the base station in order to reduce the amount of information that must be transmitted to the base station, thereby reducing traffic.

[c]Optimal routing - This is a network optimization solution. It enables the source with an OR system to have a shorter path for a packet originating from a cluster and terminating at the base station or its cluster. The source node does not have to route the packet via cluster head and its own cluster.
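The packet-size entries of Table 2 can be compared with the 128-byte AIR limit of Table 1 through a short calculation. The sketch below is a rough estimate under several assumptions: the address length of 6 bytes is hypothetical, "n*addresses" is read as n recorded addresses times the address length, and header overhead is ignored (Table 2 excludes headers), so the real margin would be smaller.

```python
AIR_LIMIT_BYTES = 128      # maximum packet size over AIR (Table 1)
AODV_RREQ_BYTES = 24       # fixed AODV route request size (Table 2)
DSR_RREQ_BASE_BYTES = 16   # DSR route request size before addresses (Table 2)
ADDRESS_BYTES = 6          # assumption: length of one node address in bytes

def dsr_rreq_bytes(recorded_addresses, address_bytes=ADDRESS_BYTES):
    """Size of a DSR RREQ after the given number of nodes appended their address."""
    return DSR_RREQ_BASE_BYTES + recorded_addresses * address_bytes

def max_dsr_addresses(limit=AIR_LIMIT_BYTES, address_bytes=ADDRESS_BYTES):
    """Largest route record that still fits into one packet, headers ignored."""
    return (limit - DSR_RREQ_BASE_BYTES) // address_bytes

if __name__ == "__main__":
    for n in (1, 2, 5, 10, max_dsr_addresses()):
        print(n, "addresses:", dsr_rreq_bytes(n), "bytes (AODV RREQ:",
              AODV_RREQ_BYTES, "bytes)")
```

Under these assumptions a DSR route request outgrows the fixed AODV request as soon as two addresses are recorded, and the route record is capped at 18 addresses per packet. This is consistent with the table's classification of DSR packets as large.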

The above comparison of routing protocols makes it easier to select an appropriate one which can satisfy the requirements mentioned before. From this summary, I gathered that the L2R routing protocol fits the description, as it emphasizes mesh formation and maintenance around a mesh root (in our case, this role can be taken up by the gateway) and is also data centric. As we are not concerned (as of now) with P2P (peer-to-peer) routing, but mainly with unicast and broadcast communication between the sink (mesh root) and other nodes, the L2R routing specification can be simplified to suit our needs. However, P2P routing can still be done, either directly between the source and destination nodes or through the mesh root, as explained later.

This protocol satisfies most of the criteria mentioned in Table 1: a hierarchical mesh, good scalability, manageable packet sizes, adaptability to changes in topology, and data aggregation to filter packets and reduce traffic directed to the base station. However, it has a few disadvantages that could have a crucial effect on the swarm application. One of them is the absence of power saving mechanisms, which are essential in low power location systems, especially when they are battery operated. As swarm is used for localization, and is usually associated with mobile targets, most of its applications require the nodes to be in a power saving state and only ‘blink’ periodically to make their presence known (similar to a tracking device) while sleeping the rest of the time. In this case, for two nodes to communicate, they should be awake at the same time, which is highly unlikely. Therefore, a synchronization scheme is needed to ensure that all the sleeping nodes are up to date with the changes in the mesh, such as a change in topology or loss of the mesh root, which the L2R protocol lacks. I devised some improvements to overcome these shortcomings, which are described in chapter 3.

1.4. Goals

The specific deliverables for this master’s project are:

1. Create overview and analysis of existing mesh networking implementations and select/create a candidate approach for implementation in existing swarm. 2. Adapt and implement a candidate approach into existing low power embedded location platform (swarm). 3. Improve the implemented routing protocol to achieve energy efficiency, i.e. adapt it to support routing among sleeping nodes as well, by formulating a synchronization scheme. 4. Analysis of performance and robustness of implementation through tests. 1.5. Organization The remainder of this document is organized as follows:

Chapter 2 explains the principle of swarm and the concepts governing its operation, and gives an introduction to the API. This forms the basis for understanding the remainder of this thesis.

Chapter 3 introduces the design, including the initial L2R protocol according to the IEEE specification and the potential solutions to cope with its shortcomings. It explains the approach chosen to save power by supporting routing for sleeping nodes, and the frame formats used by the protocol.

Chapter 4 presents the platforms used for testing, the observations made and their relation to the final implementation.

In Chapter 5, the methods and techniques used in the implementation are presented, together with a detailed explanation of the different optimizations. It also explains how the implemented protocol satisfies the requirements so that MQTT-SN can be implemented in future.

The data and performance analysis of the implementation results will be presented in Chapter 6 as well as the interpretation of these results.

Finally, Chapter 7 summarizes the results, possible limitations, and future improvements.


Chapter 2: Concepts

2.1. Principle of swarm
A swarm is a collection of devices, called swarm bees, that can listen to their peers and talk to them. They can monitor their own position and that of the other devices in the swarm and can also report their condition. However, the most important feature of a swarm bee is that it announces its own presence, which allows automatic node discovery and flexible swarm structures. In a swarm, each node operates independently of the others and dynamically adapts to the changing environment. swarm bee LE uses chirp radio technology. Its communication relies on the concepts of ranging, Time Difference of Arrival (TDOA) and blinks. These are explained below [3]:

1. CSS (Chirp Spread Spectrum)
Chirp pulses have long been used in radar applications for precise target locating based on Time of Arrival (TOA) estimation. CSS is a modern modulation scheme which utilizes chirp pulses for communication. The CSS signal can conveniently be used for TOA estimation simultaneously with the communication process. Versions of CSS have been standardized in IEEE 802.15.4a and ISO/IEC 24730-5. nanotron Technologies provides CSS technology in mixed signal RF transceiver ICs. Chirp pulses offer:
• Robustness against noise and multipath fading due to a high BT (bandwidth-time) product
• Most effective utilization of the given bandwidth
• Simple synchronization
CSS operates in the 2.45 GHz ISM band and achieves a maximum data rate of 2 Mbps. Each symbol is transmitted with a chirp pulse that has a bandwidth of 80 MHz (an effective bandwidth of 64 MHz is the result of a selected roll-off factor of 0.25) and a fixed duration of 1 μs. The system gain of CSS is 17 dB.

2. SDS-TWR (Symmetrical Double-Sided Two Way Ranging)
SDS-TWR is a patented messaging scheme which allows ranging (TOF measurement) between simple, low cost transceiver devices. In particular, it eliminates the time base offset, which is always present between independent mobile devices and which makes precise TOF measurements difficult. As SDS-TWR can technically be applied to any modulation scheme, swarm bee LE uses it for CSS based ranging applications. SDS-TWR stands for Symmetrical Double-Sided Two-Way Ranging:
• Symmetrical: Measurements from Station A to Station B are a mirror-image of the measurements from Station B to Station A (ABA to BAB).
• Double-sided: Two stations are used for the ranging measurement.
• Two-way: Two packets, a data packet and an ACK packet, are used.

SDS-TWR uses two delays that naturally occur in signal transmission to determine the range between two stations: the signal propagation delay between two wireless devices and the processing delay of acknowledgements within a wireless device. Because of its simple and elegant methodology, SDS-TWR can be easily adapted to many purposes, including location awareness. Figure 2 illustrates SDS-TWR.


Figure 2. SDS-TWR
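
The range calculation behind SDS-TWR can be sketched as follows. This is a minimal illustration and not the swarm firmware: the function and parameter names are hypothetical, and the formula is the commonly published SDS-TWR estimate, which subtracts each station's reply (processing) delay from the round trip time measured by the other station and averages the two exchanges.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def sds_twr_tof(round_a, reply_b, round_b, reply_a):
    """Estimate the one-way time of flight in seconds.

    round_a: time station A measures between sending a packet and receiving B's ACK
    reply_b: processing delay inside station B for that exchange
    round_b: time station B measures for the mirrored exchange (B -> A -> B)
    reply_a: processing delay inside station A for the mirrored exchange
    """
    # Each round trip contains two propagation delays plus one reply delay,
    # so the two exchanges together contain four propagation delays.
    return ((round_a - reply_b) + (round_b - reply_a)) / 4.0


def sds_twr_distance(round_a, reply_b, round_b, reply_a):
    """Convert the estimated time of flight into a distance in meters."""
    return SPEED_OF_LIGHT_M_PER_S * sds_twr_tof(round_a, reply_b, round_b, reply_a)
```

For example, with a true time of flight of 100 ns and reply delays of 200 μs and 250 μs, the round trip times are 200.2 μs and 250.2 μs, and the sketch returns a distance of about 30 m.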


3. TOF (Time of Flight) Ranging
Based on time of flight measurements, the physical distance between two devices can be estimated. TOF estimation makes accurate zoning, secure access and virtual safety zone applications possible (‘collaborative localization’). Collaborative localization uses relative positions to provide location-awareness. Radio nodes determine the distance to their neighbors by exchanging packets and measuring their time of flight (TOF); as radio signals travel at the speed of light, the distance follows directly from the TOF. This method is called ranging. As these radios are autonomous, location infrastructure is not required.

4. TDOA (Time Difference of Arrival) Real Time Locating Systems
With the ability to generate precise TOA stamps at chip level, systems which utilize the TDOA between different infrastructure devices (location anchors) can be built. TDOA systems usually require significant effort for synchronizing the location anchors and for actually estimating the location of a device. Based on the CSS chip, nanotron has developed a conveniently manageable system (nanoPAL) with wireless synchronization between the location anchors. A location tag transmits broadcast messages (often called blinks) which are received by location anchors. The location anchors generate TOA stamps for the blinks received. The TOA stamps are reported to the location server through a wired or wireless communication backbone. The location server calculates the TDOAs from the reported TOAs. Based on the TDOAs and the known positions of the location anchors, it estimates the coordinates of a location tag.

The LE version reliably estimates distance or location information between 10 and 500 meters with an accuracy of about 1 meter. The ER (Enhanced Resolution) version, which is based on UWB technology, is more precise and reliably estimates distance or location information between 0 and 20 meters with an accuracy of 10 cm.
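
The first step performed by the location server, turning reported TOA stamps into TDOAs, can be sketched as follows. This is an illustration under stated assumptions: the function names and the dictionary layout are hypothetical, the anchors are assumed to be synchronized already, and the final position estimate (solving for the tag coordinates from the range differences) is deliberately left out.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tdoas_from_toas(toas, reference):
    """Compute TDOAs of one blink relative to a reference anchor.

    toas: dict mapping anchor id -> TOA stamp in seconds for the same blink
    reference: id of the anchor used as the time reference
    """
    t_ref = toas[reference]
    return {anchor: t - t_ref for anchor, t in toas.items() if anchor != reference}


def range_differences(tdoas):
    """Convert TDOAs (seconds) into range differences (meters)."""
    return {anchor: SPEED_OF_LIGHT_M_PER_S * dt for anchor, dt in tdoas.items()}
```

Each range difference constrains the tag to a hyperbola whose foci are the reference anchor and the other anchor; the location server would combine several of them with the known anchor positions.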

Figure 3. Radios ranging to each other


2.2. swarm API
The swarm API is a hardware-independent Application Programming Interface (API) that exposes the ranging functionality of swarm radio nodes. It provides high-level commands which act as an interface to the hardware and can be used to develop complex applications easily and rapidly.

The swarm API supports three protocols: ASCII, BINARY and AIR. The ASCII and BINARY options provide direct communication between an optional host controller and swarm radios using their serial interface. The AIR option provides wireless reconfiguration for swarm radio nodes. The commands most commonly used in the basic configuration and certain applications like sending unicast or broadcast data, are specified here. Before that, the features which are controlled using API commands are described.

1. Presence announcement
Every swarm bee can periodically announce itself with a broadcast message, also known as a nodeID blink; this enables automatic discovery. The nodeID blink contains the module ID and its condition. In order to receive the broadcast, swarm devices in the area must be listening on the same ‘channel’ that the announcing device is transmitting on. This ‘channel’ is defined by the transmission parameters.

Table 3. Transmission parameters

Parameter | Purpose | Default
Synchronization word | Separation of different populations of nodes | 1
80/1 transmission mode | Highest throughput | Selected
80/4 transmission mode | Longest range | -

The 80/1 and 80/4 transmission modes refer to the 80 MHz bandwidth and the symbol duration of 1 μs or 4 μs. With 80/1 a throughput of 1 Mbps can be reached. When using the mode 80/4, the maximum throughput is only 250 kbps, but the achieved range is longer. Communication is only possible among swarm devices with the same transmission parameters.
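
The relation between symbol duration and throughput can be checked with a short calculation. The sketch below assumes one bit per chirp symbol, which is an assumption made here because it reproduces the 1 Mbps and 250 kbps figures quoted above; the function names are hypothetical, and the airtime covers the payload bits only (no preamble, headers or acknowledgements).

```python
def max_throughput_kbps(symbol_duration_us, bits_per_symbol=1):
    """Raw bit rate in kbps for a given symbol duration in microseconds."""
    # 1 symbol per microsecond equals 1000 kilosymbols per second.
    return 1000 * bits_per_symbol // symbol_duration_us


def payload_airtime_us(payload_bytes, symbol_duration_us, bits_per_symbol=1):
    """Time in microseconds needed to send the payload bits alone."""
    symbols = -(-payload_bytes * 8 // bits_per_symbol)  # ceiling division
    return symbols * symbol_duration_us
```

Under these assumptions a 100 byte payload occupies the channel four times longer in 80/4 mode than in 80/1 mode, which is the price paid for the longer range.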

2. Monitor its position and that of the neighbors
When a swarm device hears the blink from a second swarm device, containing that device's ID and status, it reacts by sending an automatic ranging request. If the second device acknowledges and accepts the ranging request, both proceed with the ranging operation. Once the device has gathered all the required information, it estimates the distance (or range) between the two of them and broadcasts it to the neighbors.

3. Report condition
When a swarm bee listens to the channel, it hears the messages broadcast by its neighbors: the blink, containing the neighbor's ID, and the broadcast message containing information related to an estimated distance. However, this is not the only information contained in both messages; important information about the condition of the swarm bee is also added. The condition of the swarm bee is determined by the value of its sensors and GPIOs, its battery status, its power mode, etc. In addition, when a swarm device reads critical values at its sensors or GPIOs, it can generate an interrupt and trigger a broadcast message, similar to the blink, to indicate the cause of the transmission. All devices receiving that blink are then aware of the condition of the neighbor and can act accordingly.

4. Listen and talk
Last but not least, a swarm bee can also listen and talk to other swarm devices in the area; that is, it can transmit and receive other data. The data can be included in the blinks or in ranging request messages, or can be sent independently in a dedicated data transmission message, either unicast or broadcast. The strategy used for the data transmission is important to ensure the arrival of the information at its destination, and it depends on the condition (especially the power mode) of the recipient of said data. For instance, if the sender knows that the recipient is not active at that moment, it should store the message and delay the transmission until the recipient confirms that it is listening by broadcasting blinks. The swarm bee can store only one message per recipient swarm device, and up to 10 messages in total.

Table 4. Possible combinations for data transmission

Packet type | Immediate | Delayed
Other purpose packet: nodeID blink | yes | no
Other purpose packet: automatic range request | yes | no
Other purpose packet: range request | yes | yes
Data transmission packet: unicast | yes | yes
Data transmission packet: broadcast | yes | yes
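
The delayed transmission strategy described in item 4 can be sketched as a small store that holds one message per recipient and at most 10 messages overall. The class and method names are hypothetical, and the choice to overwrite an older message for the same recipient is an assumption; the text does not say whether a second message replaces the first or is rejected.

```python
class DelayedMessageStore:
    """Holds messages for recipients that are currently asleep."""

    MAX_MESSAGES = 10  # total capacity stated in the text

    def __init__(self):
        self._pending = {}  # recipient id -> payload (one per recipient)

    def store(self, recipient, payload):
        """Queue a payload; return False if the store is full."""
        if recipient not in self._pending and len(self._pending) >= self.MAX_MESSAGES:
            return False
        # Assumption: a newer message for the same recipient replaces the old one.
        self._pending[recipient] = payload
        return True

    def on_blink(self, sender):
        """Called when a blink from `sender` is heard.

        Returns the payload that should now be transmitted to the sender,
        or None if nothing is waiting for it.
        """
        return self._pending.pop(sender, None)
```

The sender would call `on_blink` for every nodeID blink it hears and transmit the returned payload within the recipient's reception window.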

5. Coexistence with other swarm bees and alien systems
When multiple swarm bees operate in the same area, it is quite probable that at some point two of them will transmit a packet at the same time. In this case, the two packets will collide in the air and neither will arrive at its destination. To avoid this problem and maximize the channel capacity, the swarm bee accesses the channel using Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) in symbol detection mode. When this technique is enabled, the swarm bee behaves as follows: whenever it has to transmit, it first listens to the channel. If it detects that someone is transmitting using the same symbol (another swarm device), it waits for a short predefined time and listens again. It keeps doing so until it no longer detects any other swarm device transmitting, and then proceeds with its own transmission. If any other wireless network is present in the same area, for instance WiFi, the alien system will probably use a different symbol; thus, CSMA in symbol detection mode is no longer useful. The swarm bee can react to this scenario by using CSMA in energy detection mode. Similar to the previous scenario, whenever the swarm bee is going to transmit, it first listens to the channel. If the energy detected in the transmission frequency band is higher than a predefined threshold, the swarm device waits for a predefined short time and listens again. It continues doing so until the energy that it detects in the transmission band is lower than the threshold.
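
The listen-before-talk behavior of both CSMA modes can be sketched with one loop and an interchangeable busy check. All names are hypothetical and the radio is represented by plain callables. The `max_tries` limit is an assumption added so that the sketch always terminates; the text describes the device as retrying until the channel is free.

```python
def make_energy_detector(read_energy_dbm, threshold_dbm):
    """Busy check for energy detection mode.

    read_energy_dbm: callable returning the energy measured in the band
    threshold_dbm: energy level above which the channel counts as busy
    """
    return lambda: read_energy_dbm() > threshold_dbm


def csma_send(is_busy, transmit, backoff, max_tries=100):
    """Listen before talking.

    is_busy: callable returning True while the channel is occupied
             (symbol detected, or energy above the threshold)
    transmit: callable that sends the pending packet
    backoff: callable that waits for the short predefined time
    Returns the number of backoffs before sending, or None if the
    channel never became free within max_tries checks.
    """
    for tries in range(max_tries):
        if not is_busy():
            transmit()
            return tries
        backoff()
    return None
```

For symbol detection mode, `is_busy` would instead report whether a chirp symbol of the own swarm was detected; the loop itself stays the same.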

6. Power management
With regard to its power consumption, a swarm bee LE module can be used either unmanaged (power mode 0) or managed. There are three different managed power modes (power modes 1, 2 and 3), which are controlled by hardware and software together.

Table 5. swarm power modes

Type of power mode | Power mode | Features

Un-Managed | 0: Always awake
• Device remains active with its receiver switched ON.
• Radio is always active; when not transmitting, it is in receive mode, waiting to receive messages.
• Appropriate, for instance, for anchor devices and gateways with a permanent power supply.
• The serial interface of the swarm bee is active to allow continuous communication.
• Sensors are continuously monitored and the GPIOs controlled, so that they can generate interrupts at any moment.

Managed | 1: Active and Sleep
• Alternates between active and sleep mode.
• Waking up is controlled by a timer.
• Periodically, the timer generates an interrupt to indicate to the swarm bee that it has to transmit a blink. After the swarm device has transmitted its nodeID blink, it can go to receive mode for a short time window. When the reception window is closed, the radio is disabled and the device goes to sleep mode until the next interrupt.
• Serial interface, sensors and GPIOs are always active.

Managed | 2: Active and Snooze
• Can be active or in snooze mode, which consumes 1000 times less power than sleep mode.
• In snooze mode, the radio is disabled, the serial interface is not active, sensors are not read and GPIOs are not controlled. This means that no communication with a host controller is possible and no interrupt is allowed other than the periodic timer controlling the transmission of nodeID blinks.
• When the timer generates its periodic interrupt, it triggers the swarm bee to read the value of the sensors, activate the GPIOs and the UART and send a nodeID blink. After the swarm device transmits its nodeID blink, it can go to receive mode for a short time window. When the reception window is closed, the radio, GPIOs and UART are disabled, so that the device goes into snooze mode again until the next interrupt.

Managed | 3: Active and Nap
• Minimizes the power consumption to levels similar to power mode 2 but, at the same time, allows the swarm device to be woken up using the GPIOs, or even the sensors. The number of components that are capable of generating an interrupt can be set; however, the more components are enabled, the higher the power consumption.
• When the GPIO pin (DIO_0), or any other pin configured as a wake-up source, is asserted, it also generates an interrupt and the swarm device is kept active until the pin goes low.


7. Common commands used
Certain swarm commands used for configuration and testing purposes are specified here along with their function. Their syntax and detailed information can be found in Appendix 1. The routing protocol requires various parameters to be set; some already existed in swarm (like the power mode, blink interval, etc.) and others were added for the implementation (like the size of the routing table, broadcast routing ON or OFF for blinks/ranging results, etc.). For now, the newly created parameters are hardcoded, and the following commands from the existing API are used for the others. However, while prototyping routing in swarm, new commands will be added to the API. The routing specific parameters and how their values are decided are explained in chapter 4.

Table 6. Common commands used

Category | Command | Function
swarm radio Setup Commands | GNID | Get Node ID of the node connected to host
swarm radio Setup Commands | SSET | Save SETtings; saves all settings including Node ID permanently to EEPROM
swarm radio Setup Commands | GSET | Get current SETtings (node configuration)
swarm radio Setup Commands | SPSA | Set Power Saving Active; sets power management mode ON/OFF
swarm radio Setup Commands | STXP | Sets transmission (TX) Power of the node
swarm radio Setup Commands | SSYC | Set the PHY SYnCword of swarm node
Data Communication Commands | SDAT | Sends DATa to node ID
Data Communication Commands | BDAT | Broadcasts DATa
Data Communication Commands | FNIN | Fill data into Node ID Notification packets.
Data Communication Commands | FRAD | Fills the RAnging data buffer. This data will be transmitted with the next RATO operation.
swarm radio Node Identification | SBIV | Sets the Broadcast Interval Value (or blinking rate)
Medium Access Commands | SRXW | Sets reception, RX, Window during which the receiver listens after its ID Broadcast.


Chapter 3: Design

swarm focuses primarily on data collection at the gateway. Hence, a routing protocol for swarm should be data centric, and a hierarchical mesh topology with the gateway as the sink node is preferable. From the discussion in section 1.3, we concluded that the Layer 2 routing (L2R) protocol provides most of the features deemed vital for swarm (as listed in Table 1). However, the IEEE recommended specification of this protocol has certain shortcomings in terms of energy efficiency, memory overhead and synchronization which need to be overcome. It does not provide any mechanism to support routing for nodes in power saving mode and assumes that all nodes are always awake, which is not the case in most swarm applications. In this chapter, I present a modified and improved design of the protocol that overcomes these limitations.

3.1. Proposed routing protocol
This section covers the L2R protocol for nodes which are always awake and elaborates on the essential features and processes involved in the routing. Section 3.2 deals with the custom synchronization scheme added to this protocol to allow sleeping nodes to participate in routing. It follows the same processes as explained in this section, but the communication is modified to accommodate nodes in power saving mode.

3.1.1. Architecture
In a WSN, each sensor node is capable of collecting data and routing it back to the sink and the end users. Data are routed back to the end user by a multihop infrastructure-less architecture through the sink. The sink may communicate with the host controller via Ethernet or wireless connectivity.

While WSNs share many similarities with other distributed systems, they are subject to a variety of unique challenges and constraints. These constraints impact the design of a WSN, leading to protocols and algorithms that differ from their counterparts in other distributed systems. The protocol stack used in WSNs by the sink and all sensor nodes combines power and routing awareness, integrates data with networking protocols, communicates power efficiently through the wireless medium, and promotes cooperative efforts of sensor nodes. The protocol stack is depicted in Figure 4. Note: the Network layer is also called the Routing layer.

Figure 4. WSN protocol stack


However, as the current swarm uses only single hop communication, a node’s reachability is restricted to its immediate neighbors. Hence, the network architecture of swarm does not contain a routing/network layer. It consists of a physical (PHY) layer (which performs the functions of the MAC layer as well) and an application (APP) layer on top of it. The PHY layer also contains a sublayer called the portation layer, which defines the hardware dependent functions. It acts as a Hardware Abstraction Layer (HAL), a logical division of code that serves as an abstraction layer between the physical hardware and its software (the layers above). It provides a device driver interface allowing a program to communicate with the hardware. The swarm architecture is shown below.

Figure 5. Swarm architecture

The reference protocol, Layer 2 routing, defines a routing protocol for IEEE 802.15.4 networks. It enables routing over the MAC and PHY layers available in IEEE Std 802.15.4. In this design, some features of L2R routing have been modified or eliminated so that swarm can support it with less overhead. In the recommended practice of L2R routing [12], a sublayer called the L2R sublayer is introduced to carry out the routing functionality. In the implementation, however, a Routing layer has been introduced instead, to fit the architecture of the existing swarm node. The Routing layer is implemented on top of the PHY layer and below the APP layer. However, the APP layer still has direct access to the PHY layer, in case routing needs to be ‘switched OFF’ for certain applications.

The following functionalities are provided by this practice:
• Topology construction
• Mesh discovery, join, update, and recovery
• Unicast and broadcast routing

Figure 6. Modified swarm architecture for routing


3.1.2. Functional Overview
A mesh is built around a mesh root device which enables access to a service such as a data collection service, control and monitoring service, etc. The mesh root acts as the controller of the mesh, defines the functionalities and the metrics in use, and is implemented in a Full Function Device (FFD). In the case of swarm, every device is capable of acting as mesh root, as any swarm device can assume the role of a gateway in swarm location services (SLS). Currently, to become a mesh root, a device needs to be awake all the time, i.e. in power mode 0 (power modes in swarm are explained in section 2.2, Table 5). In future, the mesh root will be selected through a command added to the API, which can be used in applications where multiple nodes are in power mode 0. If multiple nodes are set as mesh root, they will create their respective meshes and a node can join the one which is suitable for it.

A device that can function as a gateway may optionally become a mesh root and advertise that it is available. A device that needs to communicate with a gateway joins the corresponding mesh and, in turn, informs its neighbors of the existence of the mesh it has just joined. This mesh is organized hierarchically. The hierarchy is defined by the depth of each device in the mesh, in other words the relative distance of a device to the mesh root (which has a depth of 0) with respect to a certain metric. Ancestor-descendant and sibling relationships are established based on the depth of the devices. Nodes at the same depth from the mesh root are considered siblings, whereas if they have different depths, the node with the smaller depth is the ancestor of the node with the greater depth.

This proposal allows the following types of routing:
• Upstream (US) routing: routing from an end device to the mesh root
• Downstream (DS) routing: routing from the mesh root to a device

US routing is performed on a hop-by-hop basis by forwarding a frame from a device to an ancestor. A device only uses information about its neighbors’ depth and metric to perform US routing. US route establishment is described in 3.1.4.2.

DS routing is performed by forwarding a frame from a device to a descendant based on routing information advertised beforehand. A device selects a next hop towards the destination by evaluating its neighbors based on a metric determined by the mesh root. DS route establishment is described in 3.1.4.3.

3.1.3. Functional Description
a) Start a mesh
A mesh is started when the Routing layer in the gateway that is to become the mesh root receives the RLME_MESHSTART_REQUEST primitive from the APP layer. The Routing layer then initializes a mesh table (MT). The mesh root address in the MT is set to the device’s own address. The Depth and MyPQM are set to zero. The mesh sequence number (MSN) is set to an initial value of 0x00. The local neighbor table (NT) is initialized as an empty table. Other parameters in the MT are set to predefined defaults. These default values are obtained from a list called the Mesh Descriptor List, which is created and filled with device dependent constants and default values once a node joins a mesh.


The Routing layer then starts the periodic transmission of enhanced beacons (EBs) containing a Topology Construction Information Element (TC IE) at the interval indicated by TcIeInterval, whose value can be found in the Mesh Descriptor List. Upon successful completion of this start-up procedure, i.e. transmission of EBs containing a TC IE, the gateway becomes the mesh root and the Routing layer sends an RLME_MESHSTART_CONFIRM primitive to the APP layer.

b) Join a mesh
When a device wishes to join a mesh, the APP layer invokes the RLME_JOINMESH_REQUEST primitive to request the Routing layer to join a mesh. Upon reception of this primitive, the Routing layer broadcasts an enhanced beacon request (EBR) with a TC IE with an empty Content field. When a mesh device receives this TC IE, it immediately replies with an EB containing a TC IE and then resumes its regular periodic TC IE transmissions. When the joining device receives the response TC IE, it computes its own depth and PQM and creates or updates an MT entry in the Mesh List related to the mesh advertised in the TC IE. If the device receives multiple TC IEs from different meshes, it creates as many MTs as there are meshes. At the end of this process, the Routing layer selects the mesh providing the best PQM. The Routing layer creates a Mesh Descriptor List for the mesh the device is joining, with the attributes set to default values. A device is allowed to join a mesh if its depth does not exceed the value in the MaxDepth field of the TC IE. The device then deletes the Mesh List and transmits its own TC IE. The Routing layer sends the RLME_JOINMESH_CONFIRM primitive to the next higher layer. A device is only allowed to be part of a single mesh, not multiple.

c) Rejoin a mesh
If a device has no ancestor or no neighbor left in the local NT of an MT, it concludes that it has been disconnected from the corresponding mesh. Upon detection of the disconnection, the Routing layer clears all tables and mesh attributes and issues the RLME_DISCONNECTMESH_INDICATION primitive to the next higher layer. The device may rediscover all the existing meshes and join one according to the procedure described in 3.1.3 b.
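
The mesh selection step of the join procedure can be sketched as follows. This is an illustration under assumptions, not the implemented code: the `TcIe` structure and its field names are hypothetical, the metric is taken to be the hop count (so each link adds a cost of 1), and "best PQM" is interpreted as the lowest value.

```python
from dataclasses import dataclass


@dataclass
class TcIe:
    """Hypothetical subset of the fields carried in a received TC IE."""
    mesh_root: str     # address of the root of the advertised mesh
    sender_depth: int  # depth of the device that sent the TC IE
    sender_pqm: int    # PQM advertised by the sender
    max_depth: int     # MaxDepth field of the TC IE


def select_mesh(tc_ies, link_cost=1):
    """Return (mesh_root, own_depth, own_pqm) for the best joinable mesh.

    Returns None when no advertised mesh can be joined.
    """
    best = None
    for ie in tc_ies:
        depth = ie.sender_depth + 1
        if depth > ie.max_depth:
            continue  # joining would exceed the depth allowed in this mesh
        pqm = ie.sender_pqm + link_cost
        if best is None or pqm < best[2]:
            best = (ie.mesh_root, depth, pqm)
    return best
```

After the selection, the device would keep only the MT of the chosen mesh, discard the Mesh List and start transmitting its own TC IE, as described in b) above.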

Figure 7. Starting a mesh


Figure 8. Joining a mesh

3.1.4. Mesh Construction and Route Establishment
Each device generates its own L2R sequence number (LSN) regardless of the number of unique devices with which it wishes to communicate. The LSN is incremented before the transmission of each new IE used by the Routing layer that requires a sequence number, with the exception of the TC IE, which uses the Mesh Sequence Number (MSN). The LSN and MSN are permitted to roll over.

3.1.4.1. Mesh construction
A mesh is managed based on the information retrieved from the base attributes found in the TC IE. The relevant information is stored in an MT, illustrated in Table 7. A device manages the MT of the mesh it has joined.

A quality metric is used to evaluate the ‘cost’ of a path or link. In this implementation, the number of hops is used as the metric. However, other quality metrics can be used, such as RSSI, Expected Transmission Count (ETX), expected airtime, distance between nodes, etc. When a device A receives a TC IE from a neighbor B, A sets the PQM of B in the local NT to the sum of the Link Quality Metric (LQM) between A and B and the PQM retrieved from B's TC IE. Before transmitting its first TC IE, device A browses its local NT, finds the neighbor with the best PQM and sets its own PQM to the value found. Assuming that the depth of that neighbor is D, A then sets its own depth to D+1.
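
The PQM bookkeeping described above can be sketched as follows. The function names and the dictionary layout of the local NT are hypothetical; the hop count metric is assumed, so the LQM of every link is 1 and the best PQM is the lowest one.

```python
def record_neighbor(nt, neighbor, depth, advertised_pqm, lqm=1):
    """Store neighbor B in the local NT of device A.

    The PQM stored for B is the LQM of the link A-B plus the PQM
    that B advertised in its TC IE.
    """
    nt[neighbor] = {"depth": depth, "pqm": advertised_pqm + lqm}


def own_pqm_and_depth(nt):
    """Derive the device's own PQM and depth from its best neighbor."""
    best = min(nt.values(), key=lambda entry: entry["pqm"])
    return best["pqm"], best["depth"] + 1
```

With hop count as the metric, a device that hears the mesh root directly ends up with PQM 1 and depth 1, and a device that only hears that device ends up with PQM 2 and depth 2.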


Table 7. Elements of an MT Entry

Name | Type | Valid range | Description
Mesh root address | MAC address | 6 bytes | Address of the root of a mesh
MSN | Unsigned Integer | 0x00 - 0xff | Identifies the latest TC IE
Local NT | -- | As specified in Table 8 | Table of neighbors belonging to the mesh indicated by Mesh root address.
Number of neighbors | -- | 0 – Number of members in mesh | List of neighbors in Local NT
Metric ID | Unsigned Integer | 0x00 - 0x0f | Identifies the metric in use in the mesh.
My PQM value | Depends on metric ID | Depends on metric ID | Value of the metric of the current device.
L2R max depth | Unsigned Integer | 0x00 - 0xff | Indicates the maximum depth allowed in an L2R mesh.
My Depth | Unsigned Integer | 0x00 - 0xff | Depth of the current device.

Table 8. Elements of a local NT entry

Name | Type | Valid range | Description
Neighbor address | MAC address | 6 bytes | Address of the neighbor.
Depth | Integer | 0x00 – 0xff | Depth of the neighbor.
PQM value | Depends on metric ID | Depends on metric ID | Value of the PQM provided by the current neighbor.
Interval unit | Enumeration | Second, minute, hour | Unit of all intervals
TC IE interval | Unsigned Integer | 1-63 | Interval between the neighbor's TC IE transmissions in the unit specified by Interval unit.
Number of reachable nodes | Unsigned Integer | 0-5 | Number of nodes that are reachable through the neighbor by DS routing
List of reachable destinations | List of MAC addresses | 6 bytes each | List of devices that are reachable through the neighbor by DS routing.


3.1.4.2. US route establishment
A US route is the path from a device to the mesh root. In order to find the US route, each device uses the local NT built with the information retrieved from the TC IEs. Each device selects the next hop from its local NT without requiring knowledge of the entire path to the mesh root. US routing requires at least one ancestor in the local NT. If a device has several ancestors available, the frame is forwarded through the ancestor providing the lowest depth and the best PQM.
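
The US next hop selection can be sketched as follows. The function name and the NT layout are hypothetical. Ranking ancestors first by depth and then by PQM, with a lower PQM counting as better, is an assumption about how the two criteria in the text are combined.

```python
def us_next_hop(nt, my_depth):
    """Pick the ancestor to forward an upstream frame to.

    nt: dict mapping neighbor address -> {"depth": int, "pqm": int}
    my_depth: depth of the forwarding device
    Returns the chosen neighbor address, or None if no ancestor is known.
    """
    # Only neighbors closer to the mesh root qualify as ancestors.
    ancestors = {addr: e for addr, e in nt.items() if e["depth"] < my_depth}
    if not ancestors:
        return None
    return min(ancestors, key=lambda a: (ancestors[a]["depth"], ancestors[a]["pqm"]))
```

A return value of None corresponds to the situation in which the device has lost all its ancestors and has to rejoin a mesh.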

3.1.4.3. DS route establishment
A DS route is the path from the mesh root to a device. DS routes may be established using hop-by-hop routing. Hop-by-hop DS routes are established using the information found in L2R Routing IEs. They may be established once US routes have been established. L2R Routing IEs are transmitted periodically by devices with depth > 1 and they are also appended to the data frames to be routed. The data or EB frame with the L2R Routing IE is transmitted through the US routes established through the procedure described in 3.1.4.2. When an intermediate hop receives an L2R Routing IE, it stores the address of the original source of the frame in the list of reachable destinations of the neighbor from which the frame was received in the local NT.
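
How an intermediate hop learns DS routes from upstream traffic can be sketched as follows. The names are hypothetical, and the local NT is reduced to a mapping from neighbor address to its list of reachable destinations. The limit of 5 entries is taken from the valid range of "Number of reachable nodes" in Table 8; ignoring further sources once the list is full is an assumption.

```python
MAX_REACHABLE = 5  # upper bound of "Number of reachable nodes" in Table 8


def learn_ds_route(reachable_via, tx_addr, source_addr):
    """Record that source_addr can be reached through neighbor tx_addr.

    reachable_via: dict mapping neighbor address -> list of destinations
    Returns True if the route is known afterwards, False if the list is full.
    """
    dests = reachable_via.setdefault(tx_addr, [])
    if source_addr in dests:
        return True
    if len(dests) >= MAX_REACHABLE:
        return False
    dests.append(source_addr)
    return True


def ds_next_hop(reachable_via, dest_addr):
    """Return a neighbor through which dest_addr is reachable, or None."""
    for neighbor, dests in reachable_via.items():
        if dest_addr in dests:
            return neighbor
    return None
```

This sketch returns the first matching neighbor; an implementation that knows several neighbors leading to the same destination would rank them by the metric determined by the mesh root.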

3.1.5. Route maintenance

3.1.5.1. US route maintenance
A device uses the information present in a TC IE from its neighbors to maintain US routes. A device transmits EBs containing a TC IE periodically every TcIeInterval. Upon reception of the TC IE, a device browses the local NT in the MT of the mesh to which it belongs. If the transmitting address (TxA) of the TC IE is already present in the NT, the receiving device compares the information contained in the TC IE with the respective fields corresponding to TxA in the local NT. If the MSN of the TC IE is newer than the MSN of the last TC IE, the received values in the local NT are replaced with the values retrieved from the new TC IE. The MSN can roll over. If the TxA of a TC IE is not recorded in the local NT yet, the receiving device creates a new entry for the new neighbor with the information retrieved.
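
Because the MSN is an 8-bit value that rolls over, "newer" cannot be decided with a plain greater-than comparison. One common way to handle this is a half-window comparison in the style of serial number arithmetic, sketched below. This specific rule is an assumption made for illustration; the text only states that the MSN can roll over.

```python
MSN_MODULO = 256  # MSN range is 0x00 - 0xff (Table 7)


def msn_is_newer(received, stored):
    """Return True if `received` is newer than `stored`, allowing rollover.

    A received MSN counts as newer when it is ahead of the stored one
    by less than half of the sequence number space.
    """
    diff = (received - stored) % MSN_MODULO
    return 0 < diff < MSN_MODULO // 2
```

With this rule, an MSN of 0x00 received after 0xff is treated as newer, while a delayed TC IE carrying 0xff after 0x00 is treated as stale.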

If a device receives a TC IE from a neighbor with a depth D' different from the depth of all its ancestors, and providing a better PQM, the device updates its depth to D'+1 and informs its neighbors of its new depth in the next scheduled TC IE.

If a device misses maxMissedTcIe TC IEs from a neighbor, the corresponding entry in the NT is erased.

A device should have at least one ancestor in its local NT. If there is no remaining ancestor in the local NT, it should clear the local NT and rejoin a mesh according to the procedure described in 3.1.3.b.
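
The two maintenance rules above, erasing silent neighbors and rejoining when no ancestor remains, can be sketched as follows. The names and the NT layout are hypothetical, and estimating the number of missed TC IEs from the time since the neighbor was last heard is an assumption about how the count is kept.

```python
def expire_neighbors(nt, now, max_missed_tc_ie):
    """Erase neighbors from which too many TC IEs have been missed.

    nt: dict mapping address -> {"depth": int, "last_heard": int, "interval": int}
        with times in seconds
    now: current time in seconds
    """
    for addr in list(nt):
        entry = nt[addr]
        missed = (now - entry["last_heard"]) // entry["interval"]
        if missed >= max_missed_tc_ie:
            del nt[addr]


def must_rejoin(nt, my_depth):
    """True when the local NT contains no ancestor any more."""
    return not any(entry["depth"] < my_depth for entry in nt.values())
```

When `must_rejoin` returns True, the device would clear its local NT and start the join procedure of 3.1.3 b again.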


Figure 9. US route maintenance

3.1.5.2. DS route maintenance
A device uses the information contained in the L2R Routing IE, sent in EBs and data frames, to create, maintain and update the DS routes. If a device receives a data frame from a neighbor, it inspects the L2R Routing IE and retrieves the source address (SA) of the data frame.

The device determines whether the SA is already recorded in the list of reachable destinations of the neighbor with the address TxA. If not, the SA is recorded. The device may optionally delete an SA entry in the list of reachable destinations of a neighbor if no frame is received from the SA through that neighbor within a period of time determined by the implementer. A device transmits an EB with an L2R Routing IE every LrIeInterval to allow DS route maintenance.


Figure 10. DS route maintenance

3.1.6. Routing Capabilities

The L2R Routing IE is used to achieve routing. Four addresses need to be present in a frame routed in a mesh:

- Source address (SA): address of the originator of the frame, included in the Source Address field of the L2R Routing IE.
- Destination address (DA): address of the final destination of the frame, included in the Destination Address field of the L2R Routing IE.
- Transmitting node’s address (TxA): address of the device currently transmitting the frame, included in the Source Address field of the MAC header (MHR).
- Next hop address (NHA): address of the next hop through which the frame is being routed, included in the Destination Address field of the MHR.

The mesh enables unicast or broadcast routing as described in the following section. The L2R Routing IE should not be modified by intermediate hops.


3.1.6.1. Unicast routing

There are two kinds of unicast routing: US routing and DS routing. Once a device has at least one ancestor in its NT, it may start transmitting data frames. When a device wants to route a frame to a destination, it finds the appropriate next hop according to the routing information recorded in the NT. A device may route a frame US or DS. The decision to forward a frame US or DS is made as illustrated in Figure 11.

The source of the data frame appends an L2R IE to the frame, in which it sets the SA to its own address and the DA to the address of the final destination it wants to reach. Intermediate hops never modify the SA or the DA in the IE; they only change the source and destination addresses of the MHR. Once the next hop has been selected, the device sets the TxA to its own address and the NHA to the address of the selected next hop and forwards the frame.

A device processes a data frame according to the algorithm illustrated in Figure 12. If the frame is an outgoing frame from the next higher layer, the device adds the L2R Routing IE and the MHR and transmits the frame through the PHY layer. If the frame is an incoming frame from the PHY layer, the frame is delivered to the routing layer. If the DA matches the device’s address, the frame is delivered to the APP layer. If the DA does not match the device’s address, the frame is forwarded again.

If a data frame is received from a neighbor that does not belong to the local NT of the current mesh, the frame is dropped. The recipient is not responsible for routing a frame from a node that is not part of the same mesh.

Each device keeps a list of used neighbors for a given LSN and SA. This list contains the addresses of the devices from which it has received that frame or to which it has forwarded it. When a device selects a next hop, neighbors whose addresses are recorded in this list should not be considered, in order to avoid loops. The list starts empty for each new frame and is deleted after LsnSaRecordTimeout. A device performing routing should have the necessary resources to enforce this loop avoidance mechanism.
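The following sketch shows one way next-hop selection with the used-neighbor list could look. It is not taken from Figure 11: the rule "route DS if the destination is a neighbor or is listed as reachable through one, otherwise route US to an ancestor" is an assumption, as are the choice of the ancestor with the lowest PQM, the definition of an ancestor as a neighbor with a smaller depth, and all names used here.

```python
from typing import Optional


def select_next_hop(da: str, neighbors: dict, own_depth: int, used: set) -> Optional[str]:
    """Pick a next hop for `da`, skipping neighbors already used for this frame.

    neighbors maps address -> {"depth": int, "pqm": int, "reachable": set}.
    """
    candidates = {a: n for a, n in neighbors.items() if a not in used}
    # DS: the destination is a neighbor or is reachable through one.
    for addr, info in candidates.items():
        if addr == da or da in info["reachable"]:
            return addr
    # US: otherwise hand the frame to the ancestor with the lowest PQM.
    ancestors = [a for a, n in candidates.items() if n["depth"] < own_depth]
    if not ancestors:
        return None
    return min(ancestors, key=lambda a: candidates[a]["pqm"])
```

The `used` set corresponds to the list kept per LSN and SA: the caller adds the neighbor the frame came from and every neighbor it has already been forwarded to.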

Figure 11. Unicast Routing decision based on the NT


Figure 12. Data frame processing

3.1.6.2. Broadcast routing

If the DA in the MHR of a data frame passed from the higher layer to the routing layer is the broadcast address, the DA of the L2R Routing IE in the data frame is set to the broadcast address. The routing layer then sends this data frame as a unicast data frame to each of its neighbors. The source node of a broadcast (or any other message) ensures that all its neighbors, sleeping or awake, receive these packets by using a blink induced communication scheme, which has been added on top of the routing protocol and is explained in section 3.2.

When a device receives a data frame whose DA in the L2R Routing IE is the broadcast address, and the device is not the original source of the frame, it forwards the frame to all neighbors other than the one from which it received the frame. If a device receives a broadcast frame whose original source address is the device's own address, the frame is discarded. After forwarding a broadcast frame that it did not originate, a device records the SA and the LSN and discards any subsequent frames with the same SA and LSN in order to avoid duplicate transmissions. This record is deleted after LsnSaRecordTimeout. Hence, broadcast routing allows a frame to be routed throughout the mesh without duplication, even when the original source is many hops away from distant mesh nodes.


As the broadcast message is still sent in unicast form, the traffic is low and each copy is acknowledged in hardware by its recipient. This is because in swarm, broadcast messages are inherently not acknowledged, whereas unicast messages are automatically acknowledged and are retransmitted if the auto ACK is not received within a certain time period.
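The forwarding and duplicate-suppression rules for broadcast frames can be sketched as below. The class and method names are hypothetical; `record_timeout` stands in for LsnSaRecordTimeout, and the returned list contains the neighbors that would each receive a unicast copy.

```python
class BroadcastForwarder:
    def __init__(self, own_addr: str, record_timeout: float):
        self.own_addr = own_addr
        self.record_timeout = record_timeout  # plays the role of LsnSaRecordTimeout
        self.seen = {}  # (SA, LSN) -> time the record was created

    def targets(self, sa, lsn, received_from, neighbors, now):
        """Return the neighbors that should get a unicast copy of the frame."""
        # Drop records older than the timeout.
        self.seen = {k: t for k, t in self.seen.items()
                     if now - t < self.record_timeout}
        if sa == self.own_addr or (sa, lsn) in self.seen:
            return []  # own frame came back, or a duplicate
        self.seen[(sa, lsn)] = now
        return [n for n in neighbors if n != received_from]
```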

Figure 13. Broadcast routing and frame processing


3.2. Additional custom features

3.2.1. Blink induced communication

According to the protocol, the EBs (containing routing updates) need to be sent periodically as broadcasts. However, this is not reliable for sleeping nodes, as they may not be awake when the EBs are sent. To deal with this, I made all communication (updates or data) blink induced. This uses an inherent property of swarm: every node periodically sends out a Node ID notification (NIN), also called a blink, and even a sleeping node reliably wakes up to send it. The broadcast of a blink is followed by a configurable period of time during which the node’s receiving window is open, i.e. the receiver is ON. Hence, when a neighboring node is awake at the time the current node sends a blink, the neighbor learns that the current node is awake and sends it the EB/update as a unicast message.

In swarm, unicast messages are automatically acknowledged by the recipient (which is an immediate neighbor), but broadcast messages are not. This makes blink induced communication more reliable, as all messages are sent as unicast.

However, if an acknowledgement is not received within a certain period, the sender retransmits until it receives an acknowledgement. A maximum of 4 immediate retransmissions is permitted; they are possible because a copy of the last transmitted message is stored. If transmission is unsuccessful the fourth time, the upper layer is notified and takes the necessary actions. When this happens for a routing related packet, I prevent loss of data by saving a copy in the storage reserved for messages pending delivery to sleeping nodes and re-attempt to send it when the next blink from the intended destination is received.

3.2.2. Increased Power efficiency

The routing protocol works well for nodes that are always awake, i.e. have their receiver ON all the time. However, in a WSN most nodes are battery powered and hence should operate in power saving modes. Therefore, to let nodes in power saving mode receive updates and data without external synchronization like TSCH, I devised a two-step process in the protocol, which is explained below.

3.2.2.1. Storing frames destined for sleeping node

Until a node receives a blink from its neighbor, it assumes that the neighbor is sleeping and hence does not send anything to it. If a distant node sends a data packet to that sleeping node and the packet is routed through the current node, the current node should be able to store it until the intended destination is awake. When nodes are capable of storing messages for their sleeping neighbors, every kind of packet can be held temporarily until the neighbors wake up, which ensures that sleeping nodes are not left out of updates or data.

The storage for packets is limited in size. Currently, a maximum of 6 messages can be stored, but in future this limit can be made configurable by adding a command to the API, thereby supporting different RAM sizes. The storage is implemented as a FIFO linked list, and hence overflow causes the oldest message to be discarded. However, to limit data loss and overflow, the following optimizations have been implemented:


1. Time to live (TTL) of stored messages - Different types of EBs/updates and data are stored for different time periods (TTL) and deleted after their expiry. As EBs/updates are more frequent, their storage durations depend on their interval of arrival. For example, if a TC IE arrives every 10 s, its record is stored only for 10 s plus the blink interval of the neighbor to which it is to be sent. A data or concatenated frame, in contrast, might arrive at any time and so is stored for a fixed duration of LsnSaRecordTimeout, which is generally 4 times the blink interval of the storing device.
2. One entry for each update/EB - An older stored update is replaced with a newer one when available. Hence, only one copy of each kind of update is stored, which ensures that the freshest update is sent out and helps prevent storage overflow to some extent. There are two kinds of updates in the implementation, TC IE and L2R IE. A TC IE is sent between immediate neighbors, so only one TC IE is stored for a neighbor at a time. An L2R IE can come from any of a device’s successors and is destined to the mesh root. Hence, L2R IEs are filtered by their original source and not by their destination.
3. Concatenation of stored data records - If multiple data packets to the same destination are present in the storage, they are concatenated into one frame and stored. This is handled by a polling function that runs periodically throughout the device’s lifetime and keeps checking for multiple data frames to concatenate.
4. Discard packets for/from deleted neighbor - If a neighbor is deleted from the routing table by route maintenance (section 3.1.5) due to a long period of inactivity, the stored messages destined to it and those to be forwarded from it (such as L2R IE or data) are discarded.
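The storage behaviour described above (FIFO overflow, per-message TTL and one entry per update) can be sketched as follows. This is an illustrative model rather than the linked-list implementation of the thesis: `Stored`, `PendingStore` and the `key` field are hypothetical names, and the concatenation of data records and the purge for deleted neighbors are left out for brevity.

```python
from collections import deque
from dataclasses import dataclass

MAX_STORED = 6  # storage limit stated in the text


@dataclass
class Stored:
    kind: str        # "TC_IE", "L2R_IE" or "DATA"
    key: str         # neighbor for a TC IE, original source for an L2R IE
    payload: bytes
    expires_at: float


class PendingStore:
    def __init__(self, capacity: int = MAX_STORED):
        self.items = deque()
        self.capacity = capacity

    def purge(self, now: float) -> None:
        """Delete messages whose TTL has expired."""
        self.items = deque(i for i in self.items if i.expires_at > now)

    def put(self, item: Stored, now: float) -> None:
        self.purge(now)
        if item.kind != "DATA":
            # Keep only the freshest update of each kind per key.
            self.items = deque(i for i in self.items
                               if (i.kind, i.key) != (item.kind, item.key))
        if len(self.items) >= self.capacity:
            self.items.popleft()  # FIFO overflow: discard the oldest message
        self.items.append(item)
```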

3.2.2.2. Waking up to receive blinks from neighbors

A node storing data for a sleeping node has to be awake to receive the blink from the sleeping node, so that it can forward all the frames it is storing. As it is energy inefficient for the node to be awake all the time, I devised a schedule that every node can build and follow so that it wakes up close to the time a blink is expected to arrive. This schedule can also be used to predict when a periodic EB will arrive and to keep track of the number of EBs that have been missed, which is useful for route maintenance.

The nodes start out by tracking each other’s blink arrivals, waking up at blink intervals. After receiving EBs and becoming part of the mesh, they gradually start tracking beacon arrivals as well. The following formulas are used to estimate when to wake up to send updates (or receive blinks) and when to expect updates.

Note: The reference timestamp which is used to calculate future beacon and blink arrival is obtained from previous blink/beacon exchange. At the beginning, when a node is not part of the mesh, it is always awake and hence receives every blink and beacon from its neighbors which is used to build a schedule. The node resumes its power saving mode after it has joined a mesh. From its prior knowledge of blink arrival and blink interval (which is present in every NIN packet), it calculates when to expect the next blink and wakes up accordingly.

i) For predicting when to wake up to send TC IE

Case 1: When n is a positive integer

Case 2: When n is a positive integer

ii) For predicting when to expect TC IE

Case 1: When n is a positive integer

Case 2: When n is a positive integer

3.2.2.3. Concatenation of frames meant for a certain destination

After sending a NIN, the node has its receiver ON for a certain time, called the ‘RX window’. This value is configurable and, for sleeping nodes, should be between 10 and 30 ms for energy efficiency. However, a neighbor may have stored more than one beacon or data packet for a node, and some of them may not get delivered because the RX window is too small. As the neighbor has to find the packets for the node and send them one after another, the RX window would have to be large enough to receive all of them, which is not power efficient. Concatenation of the frames destined for a node is a more viable solution. Hence, I added concatenation by a neighbor storing multiple packets for a node. When a NIN is received from a node, the multiple IEs and/or data stored for it are concatenated into a long nested IE and sent by the neighbor. We then only need to ensure that the RX window is large enough to receive the long nested IE, whose format is explained in section 3.3.4. If the packets exceed the maximum allowable message size, the excess can be concatenated in a separate packet and stored until the next NIN is received from the destination.
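A minimal sketch of this concatenation step is given below, assuming the long nested IE layout of section 3.3.4 (one byte each for BSN, total IE length, type of IE, and for the Sub ID and length of every sub-IE). The function name and the `max_total` parameter are hypothetical; the actual maximum message size is not stated here, and because the length fields are single bytes the sketch caps the body at 255 bytes.

```python
SUB_ID_NESTED = 0x3A  # Sub ID of the nested IE from Table 10


def build_nested_ie(bsn: int, sub_ies: list, max_total: int):
    """Pack (sub_id, content) pairs into one long nested IE.

    Returns (frame, leftover); leftover holds the sub-IEs that did not fit
    and would be kept for a separate packet.
    """
    max_total = min(max_total, 255)  # total length is carried in one byte
    body = bytearray()
    leftover = []
    for index, (sub_id, content) in enumerate(sub_ies):
        # Two-byte sub-element header (Sub ID, length) followed by the content.
        element = bytes([sub_id, len(content)]) + content
        if len(body) + len(element) > max_total:
            leftover = sub_ies[index:]
            break
        body += element
    frame = bytes([bsn & 0xFF, len(body), SUB_ID_NESTED]) + bytes(body)
    return frame, leftover
```

For example, a TC IE (Sub ID 0x38) and an L2R IE (Sub ID 0x39) stored for the same neighbor would be passed as two pairs; whatever is returned in `leftover` stays in the storage until the next NIN from that neighbor.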


3.3. Frame formats used

3.3.1. Enhanced Beacon (EB)

An enhanced beacon is used to carry information elements between nodes. It has a fixed-size header but a variable-size content field, whose size depends on the type of IE it contains. To categorize a frame as an EB, its primitive is set to 0x22. It is sent in the message payload, as are all the other frames discussed here.

BSN | Sub ID of IE | Length of IE | IE content
Bit boundaries: 0, 7, 15, 23, variable

Figure 14. EB frame format

BSN: This field carries the sequence number of the beacon. BSN is incremented for each beacon sent.

Sub ID of IE: This field indicates the type of IE carried by the EB. The sub IDs of IEs are tabulated in Table 10.

Length of IE: It holds the length of the IE content present in the EB.

IE content: The actual IE content resides here.

3.3.2. Enhanced Beacon Request (EBR)

An enhanced beacon request is used to request a certain type of IE. It is always 3 bytes in size irrespective of the type of IE requested. It is generally sent by nodes that want to discover and join meshes; the sender expects an EB in reply that contains the type of IE requested. To categorize a frame as an EBR, its primitive is set to 0x23.

BSN | Sub ID of IE | Length of IE = 0
Bit boundaries: 0, 7, 15, 23

Figure 15. EBR frame format

BSN: This field carries the sequence number of the beacon. BSN is incremented for each beacon sent whether it is EB or EBR.

Sub ID of IE: It indicates the type of IE that the EBR is requesting. This instructs the recipient of the EBR to formulate an EB as a reply containing an IE of type Sub ID.

Length of IE: As an EBR only requests a type of IE and does not carry any IE content, the length of IE field is equal to zero.
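To make the two layouts concrete, the sketch below encodes and decodes the three-byte header shared by EBs and EBRs. The function names are hypothetical, and the primitive (0x22 or 0x23) is deliberately left out because its position in the message is not described in this section.

```python
def encode_eb(bsn: int, sub_id: int, content: bytes) -> bytes:
    """BSN, Sub ID and Length take one byte each, followed by the IE content."""
    return bytes([bsn & 0xFF, sub_id, len(content)]) + content


def encode_ebr(bsn: int, sub_id: int) -> bytes:
    """An EBR carries no content, so its length field is zero."""
    return encode_eb(bsn, sub_id, b"")


def decode_eb(frame: bytes):
    """Split an EB or EBR into (BSN, Sub ID, content)."""
    bsn, sub_id, length = frame[0], frame[1], frame[2]
    content = frame[3:3 + length]
    if len(content) != length:
        raise ValueError("truncated IE content")
    return bsn, sub_id, content
```

An EBR requesting a TC IE (Sub ID 0x38 in Table 10) with BSN 5 is then the three bytes `05 38 00`.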


3.3.3. Information elements

An Information Element (IE) is a well-defined, extensible mechanism to exchange data between nodes. IEs are used for topology construction, routing updates, route maintenance, etc. There are two information elements used in this routing protocol. They are sent in EBs, either independently or nested with other IEs and data.

3.3.3.1. Topology Construction (TC) IE

The TC IE is used in EB or EBR frames. The Content field in EBs is formatted as illustrated in Figure 16. The Content field is omitted from EBRs. The TC IE is mainly used for US route maintenance and mesh construction.

Mesh root address | Transmitting node’s address | Max depth | Depth | MSN | Unit of interval | TC IE interval | PQM ID | PQM
Bit boundaries: 0, 47, 95, 103, 111, 119, 127, 135, 143, 151

Figure 16. TC IE frame format

Mesh root address: This field contains the address of the mesh root that the TC IE source belongs to.

Transmitting node’s address: It contains the 6-byte MAC address of the transmitter of TC IE.

Max depth: The Max Depth field contains the maximum depth allowed in a mesh and is encoded as an unsigned integer.

Depth: It contains the depth of the device transmitting the TC IE and is encoded as an unsigned integer.

MSN: The MSN field contains the sequence number of the TC IE set as described in 3.1.4. and is encoded as an unsigned integer.

Unit of interval: The Unit field indicates the time unit of the IE interval and may take one of the values listed in Table 9. The Unit field value is determined by the intervalUnit attribute defined in Table 14.

Table 9. IE Interval units

Value | Description
00 | Second
01 | Minute
10 | Hour

TC IE interval: The TC IE Interval field, along with the Unit of Interval field, specifies the interval of the transmission of TC IEs by the current transmitter. This field contains the duration of the TC IE interval in the unit indicated by the Unit of interval field and is encoded as an unsigned integer.

PQM ID: This field identifies the metric in use in the mesh; in our case, number of hops. Many other metrics can be used as well.


PQM: This field indicates the value of the metric of the path between the mesh root and the transmitter of the TC IE. It is interpreted together with the PQM ID, which determines the type of PQM.
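The TC IE layout of Figure 16 adds up to 19 bytes: two 6-byte addresses followed by seven single-byte fields. The sketch below packs and unpacks it with Python's `struct` module. The names `TcIe`, `pack_tc_ie`, `unpack_tc_ie` and `interval_seconds` are hypothetical, the addresses are treated as opaque byte strings (their byte order is not specified here), and the values in Table 9 are assumed to be binary.

```python
import struct
from collections import namedtuple

TcIe = namedtuple(
    "TcIe", "mesh_root txa max_depth depth msn unit interval pqm_id pqm")

TC_IE_FORMAT = "<6s6sBBBBBBB"  # 6 + 6 + 7 single-byte fields = 19 bytes
UNIT_SECONDS = {0b00: 1, 0b01: 60, 0b10: 3600}  # Table 9, read as binary values


def pack_tc_ie(ie: TcIe) -> bytes:
    """Serialize the TC IE content field in the field order of Figure 16."""
    return struct.pack(TC_IE_FORMAT, *ie)


def unpack_tc_ie(data: bytes) -> TcIe:
    """Parse a 19-byte TC IE content field."""
    return TcIe(*struct.unpack(TC_IE_FORMAT, data))


def interval_seconds(ie: TcIe) -> int:
    """TC IE interval converted to seconds using the Unit of interval field."""
    return ie.interval * UNIT_SECONDS[ie.unit]
```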

3.3.3.2. L2R routing IE

The L2R Routing IE is used in data frames and EBs. The Content field is formatted as illustrated in Figure 17. It is mainly used for DS route maintenance and routing data.

Source address | Destination address | LSN | Unit of interval | L2R IE interval | Full broadcast
Bit boundaries: 0, 47, 95, 103, 111, 119, 127

Figure 17. L2R IE frame format

Source address: The Source Address field contains the address of the original source of the frame.

Destination address: This field contains the address of the final destination of the frame.

LSN: The LSN field contains the sequence number of the L2R Routing IE and is encoded as an unsigned integer.

Unit of interval: same as explained in 3.3.3.1.

L2R IE interval: This field, along with the Unit of Interval field, specifies the interval of the transmission of L2R IEs by the original source of the frame. It contains the duration of the L2R IE interval in the unit indicated by the Unit of interval field and is encoded as an unsigned integer.

Full broadcast: This is a custom field added to allow or restrict broadcast routing of a frame. In swarm, apart from broadcast user data, blinks and ranging results are also broadcast. The latter two are sent periodically and quite frequently. If they were broadcast routed throughout the mesh, they would generate a lot of traffic; hence, for them, full broadcast can be turned OFF by setting the field to 0x00. For data, however, full broadcast can be set to 0xff, which instructs the recipients of the frame to forward it in order to achieve a mesh-wide broadcast.

3.3.4. Long Nested IE

A long nested IE is used when a node wants to transmit multiple information elements or data to the same destination. This is particularly useful if the destination is in power saving mode and has a small RX window after transmitting a blink, in which case nesting or concatenating the packets meant for it can achieve successful delivery.

The Content field of nested IE is formatted as illustrated in Figure 18.

BSN | Total IE length | Type of IE = Nested IE | Sub ID of IE1 | Length of IE1 | IE1 content | Sub ID of IE2 | Length of IE2 | IE2 content | …
Bit boundaries: 0, 7, 15, 23, 31, 39, variable

Figure 18. Long nested IE frame format


BSN: The BSN field contains the sequence number of the beacon containing nested IE and is encoded as an unsigned integer.

Total IE length: This field contains the total length of all sub elements in the frame, i.e. the size of the content field of each sub-IE along with the 2-byte sub element header field (Sub ID and length of sub-IE).

Type of IE: This field indicates that the frame is a Nested IE. Its Sub ID is specified in Table 10.

For each sub element (or sub IE):

- Sub ID of IE: The Sub ID field identifies the type of sub-IE transmitted. The sub IDs of the IEs are listed below.

Table 10. Sub IDs of IEs and data

Sub ID | Name of IE | Subclause
0x38 | TC IE | 3.3.3.1.
0x39 | L2R IE | 3.3.3.2.
0x3b | Data packet with L2R IE | 3.3.6. & 3.3.7.
0x3a | Nested IE | 3.3.4.

- Length of IE: The Length field contains the length of the Sub-Element IE’s Content field.
- IE content: The Sub-Element Content field carries the content of each sub-IE.

3.3.5. Node ID Notification (NIN)

NINs, also called blinks, are transmitted by swarm nodes every blinkInterval, which is pre-defined by the user. A NIN is used in the routing protocol as an indication that a node is awake, as even a node in power saving mode wakes up to transmit blinks. The Blink Interval field carried in the NIN is used by nodes to predict when their sleeping neighbor will send a blink, so that they can wake up accordingly if they want to transmit something to the sleeping node. Here, a short version of the NIN is shown with the fields relevant to the protocol. The entire frame structure is explained in Appendix 2.

Frame sections: Blink header | swarm header | swarm sensor data | User data, if any
Blink header fields: Message tag | Protocol version | Blink ID | Payload length | Blink interval | RX slot counter
Bit boundaries: 0, 7, 15, 23, 31, 55, 63

Figure 19. NIN frame format


Blink Header:

- Message tag: It is equal to 0x02, a type used in swarm to allow localization with TDOA.
- Protocol version: This field is equal to 0x13 and is used when the frame carries sensor data.
- Blink ID: It is an up-counter that increases for every packet transmitted. It can be used to identify packet losses.
- Payload length: This field represents the number of bytes that follow until the packet ends.
- Blink interval: It indicates the time until the next blink occurs, in milliseconds. It is LSB first.
- RX slot counter: The RX slot counter is a backchannel counter that indicates the next RX slot. The receiver of this device is active if it is 0. In combination with the Blink interval, it gives the time in milliseconds until the receiver is next active.

3.3.6. Unicast Data frame

A unicast data frame that needs to be routed in the mesh is appended with an L2R Routing IE, which contains the original source and the final destination. The MAC header holds the current device address as MAC source and the next hop’s address as MAC destination.

MAC header | User data (variable) | L2R IE (16 bytes)

Figure 20. Unicast data frame format

3.3.7. Broadcast packet

A broadcast packet of type data, blink or ranging result can be broadcast locally or mesh-wide by setting the appropriate value in the full broadcast field of the L2R IE (described in 3.3.3.2.). The final destination in the L2R IE is set to the broadcast address (0xffffffffffff), while the source and destination addresses in the MAC header are the node addresses of the current node and the neighbor.

MAC header | User data (variable) | L2R IE with DA = broadcast address (16 bytes)

Figure 21. Broadcast frame format
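As an illustration of Figures 20 and 21, the sketch below appends the 16-byte L2R IE of Figure 17 to the user data and splits it off again. The MAC header is added by a lower layer and is not modelled. The function names and default arguments are hypothetical, and the addresses are treated as opaque 6-byte strings.

```python
import struct

BROADCAST_ADDR = b"\xff" * 6
L2R_IE_FORMAT = "<6s6sBBBB"  # SA, DA, LSN, unit, interval, full broadcast = 16 bytes


def append_l2r_ie(user_data: bytes, sa: bytes, da: bytes, lsn: int,
                  unit: int = 0, interval: int = 0,
                  full_broadcast: bool = False) -> bytes:
    """Return the user data followed by the 16-byte L2R IE."""
    flag = 0xFF if full_broadcast else 0x00
    ie = struct.pack(L2R_IE_FORMAT, sa, da, lsn & 0xFF, unit, interval, flag)
    return user_data + ie


def split_l2r_ie(payload: bytes):
    """Separate the user data from the trailing L2R IE fields."""
    ie = struct.unpack(L2R_IE_FORMAT, payload[-16:])
    return payload[:-16], ie
```

A mesh-wide broadcast would use `da=BROADCAST_ADDR` with `full_broadcast=True`; a blink or ranging result that should stay local would keep the flag at 0x00.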


Chapter 4: Implementation

Implementing the proposed routing protocol, together with the additional features that support sleeping nodes, required new structures, new tables and changes to normal swarm behaviour. This chapter elaborates on the implementation techniques and methods used.

4.1. Service Access Points (SAPs) in routing layer

SAPs are used to provide services by handling data and management tasks. In the routing layer, I have implemented two such SAPs, which separate the handling of data from the handling of management commands.

4.1.1. For data services – RDSap

This SAP is used to handle data to be routed. RDSap processes unicast or broadcast data from the APP layer before handing it over to the PHY layer. It appends the L2R IE to the outgoing data and assigns the appropriate next hop. In case of broadcast routing, it converts the broadcast data packet into multiple unicast packets, one for each of its neighbors, and creates an LsnSa record for it to avoid duplication. Packets that are not to be routed, like blinks or ranging results and requests, are passed on to the PHY layer without changes.

4.1.2. For management services - RLMESap

RLMESap stands for Routing Layer Management Entity SAP and is responsible for managing routing related tasks such as starting and joining a mesh. When starting a mesh, it updates the mesh attributes and sets the flags necessary to start transmission of TC IEs. When joining a mesh, it sets the parameters and flags that initiate the transmission of EBRs, the listen duration for replies, etc.

4.2. For deciding mesh root

The current implementation requires that the mesh root or gateway is always ON, i.e. not in power saving mode. Hence, when selecting the mesh root that will start a new mesh, a node's power mode is checked to verify that it is in unmanaged power mode (power mode 0). The available power modes for swarm have been explained in section 2.2, Table 5. For now, if multiple nodes in swarm are in power mode 0 and only one mesh is desired, I hardcode the address of the device to be made mesh root. Otherwise, multiple mesh roots will exist and create multiple meshes. In future, however, the mesh root will be chosen explicitly by a command that can be added to the swarm API.

Note: This implementation can also support a mesh whose mesh root is in power saving mode, because all communication is blink dependent. However, I currently decide the mesh root based on power mode because, in swarm, the gateway should be the mesh root and it is always ON in every application. Power mode 0 is therefore used here only as a filter to decide the mesh root, because there is currently no command to do so explicitly.

4.3. For outsider nodes trying to join a mesh

A node is automatically considered an outsider node on startup if it is in one of the managed power saving modes. Such nodes are sleeping nodes and only wake up to send blinks, after which their receiver is ON for the duration of the RX window, which can be set by the user. They can only communicate with other nodes in this RX window. However, for the routing protocol, the swarm behaviour in power saving mode was changed to allow a node to join a mesh. As the communication is blink induced, an outsider node sends a unicast enhanced beacon request for joining a mesh to another node only when it receives a blink from that node. The outsider node should have its receiver ON so that it can catch blinks, because it is highly unlikely to be awake at the exact moment another node sends a blink. So, I change the behaviour of a swarm node in power saving mode by making it stay awake until it joins a mesh, after which it resumes its power saving behaviour, i.e. waking up only to send blinks. This way it is able to catch blinks from neighbors and send EBRs in response. If the neighbor belongs to a mesh, it replies with an EB carrying a TC IE in its content field, which the outsider node can use to select a mesh to join.

4.4. Blink dependent communication

To be reliable and to reduce traffic, all communication is blink induced and unicast. So, an EB, broadcast packet, etc. is sent only to a node from which a blink was received. This is reliable, as a node stays awake for a configurable period after sending a blink. In addition, it reduces traffic because the updates are sent only to the blink source. For sleeping nodes to communicate in this manner, it must be ensured that they receive each other’s blinks so that they can exchange updates and data. This requires a sleeping node S, for instance, which is storing packets for another sleeping node D, to wake up at the time D will send a blink, so that it can receive the NIN and send the stored packets to D as unicast messages. To achieve this, each node maintains a schedule that stores at least the blink interval and the timestamp of the next blink arrival for each of its neighbors. These values are updated using the timestamp at which the last blink was received.

4.4.1. Before joining mesh - Schedule for waking up for blinks

The schedule contains the estimated timestamp of a neighbor’s next blink arrival, before which the current node should wake up. It is calculated from the timestamp of the last blink arrival, or at least from the initial blink arrival timestamp, incremented by the blink interval.

for blink interval in range greater than 1s. It needs to be reduced for smaller blink intervals, but it should be more than 30ms, to account for the time device takes to fully wake up.

A sleeping node is not guaranteed to receive even one blink from a neighbor, as their wake-up times may or may not coincide. But, as mentioned before, a node is awake all the time before it joins a mesh. Hence, it will receive all the blinks from its neighbors during this time. During the JOIN_MESH phase, the node uses a timetable to store the blink interval and the time at which the blink from a certain neighbor arrived. This information is used for waking up even after the node has joined a mesh and resumed power saving mode. This timetable is as follows.


Table 11. Elements of timetable for neighbors (before joining mesh)

Name | Type | Valid range | Description
Address of Node | MAC address | 6 bytes | Address of node from which blink was received
Blink Interval | Unsigned long integer | 4 bytes | Interval between blink transmissions
wakeUpAt | Unsigned long integer | 4 bytes | Timestamp of when to wake up for next blink
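A Table 11 entry and the wake-up estimate can be sketched as follows. This is only an illustration of "last blink arrival incremented by the blink interval": the names are hypothetical, timestamps are assumed to be in milliseconds, and the `guard` lead time (how long before the expected blink the node wakes up) is a parameter whose value is an assumption to be chosen by the implementer.

```python
from dataclasses import dataclass


@dataclass
class BlinkSchedule:
    address: str
    blink_interval: int  # ms, taken from the NIN
    wake_up_at: int      # ms timestamp


def next_wake_up(last_blink_at: int, blink_interval: int, now: int, guard: int) -> int:
    """Timestamp at which to wake up, `guard` ms before the next expected blink."""
    # Number of whole intervals already elapsed since the reference blink.
    elapsed = max(0, now - last_blink_at) // blink_interval
    next_blink = last_blink_at + (elapsed + 1) * blink_interval
    if next_blink - guard <= now:
        next_blink += blink_interval  # too late for this blink, aim for the next one
    return next_blink - guard
```

With a blink received at t = 1000 ms, a 2000 ms blink interval and a 50 ms guard, the entry would be `BlinkSchedule("D", 2000, next_wake_up(1000, 2000, 1000, 50))`, i.e. a wake-up at 2950 ms.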

4.4.2. After joining mesh - Schedule to wake up for blinks

Just as before joining a mesh, a node maintains a timetable after joining a mesh. It holds the blink intervals and the timestamps of when to wake up to receive blinks. This timetable is bigger than the one used before joining the mesh (which is flushed after joining, its contents being transferred to this timetable), as it also contains fields needed for route maintenance, such as when to expect the next TC IE or L2R IE and when to wake up to receive blinks in order to send beacons/data as unicast transmissions. A node using this schedule starts by waking up for every blink from the neighbor. Later, when it receives a TC IE from the neighbor, it adds the TC IE interval and the timestamp of when to expect the next TC IE. The structure of the timetable is as follows.

Table 12. Elements of timetable maintained for neighbors (after joining mesh)

Name | Type | Valid range | Description
Neighbor address | MAC address | 6 bytes | Address of neighbor
Primitive in use | Unsigned integer | 0x00 - 0xff | Indicates what to wake up for, TC IE or blink
Ttl | Unsigned long integer | 4 bytes | Indicates how long this entry should be retained in the timetable if nothing has been received from the neighbor after the current node joined the mesh. It is reset when something is received and is then no longer considered for deleting the entry
Depth | Unsigned integer | 0x00 - 0xff | Depth of neighbor
ExpectedAt | Unsigned long integer | 4 bytes | Timestamp of when to expect the packet of the type determined by primitive in use
WakeUpAt | Unsigned long integer | 4 bytes | Timestamp of when to wake up to receive blink from neighbor
No. of missed EB/blinks | Unsigned integer | 0x00 - 0xff | No. of blinks/TC IEs missed/not received from neighbor
Blink interval | Unsigned long integer | 4 bytes | Interval between blink transmissions
TC IE interval | Unsigned long integer | 4 bytes | Interval between TC IE transmissions
L2R IE interval | Unsigned long integer | 4 bytes | Interval between L2R IE transmissions from neighbor (optional). Used for calculation if available.
Timetable of reachable nodes | Described in Table 13 | - | Timetable similar to this one, for each of the reachable nodes of the neighbor. It is used for DS route maintenance.
Number of reachable nodes | Unsigned integer | 0x00 - 0xff | Number of reachable nodes for which the timetable in Table 13 is being maintained

Table 13. Elements of timetable maintained for nodes in reachable list

Name | Type | Valid range | Description
Reachable node’s address | MAC address | 6 bytes | Address of reachable node
ExpectedAt | Unsigned long integer | 4 bytes | Timestamp to expect L2R IE from the reachable node
No. of missed L2R IEs | Unsigned integer | 0x00 - 0xff | No. of L2R IEs missed/not received from reachable node
L2R IE interval | Unsigned long integer | 4 bytes | Interval between L2R IE transmissions from reachable node
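The ExpectedAt and missed-counter fields of Tables 12 and 13 could be maintained along the following lines. This is a sketch under assumptions: the function names and the `tolerance` parameter (how late an update may be before it counts as missed) are hypothetical and are not taken from the implementation.

```python
def update_missed(expected_at: int, interval: int, missed: int,
                  now: int, tolerance: int):
    """Advance expected_at past `now`, counting every expected arrival that went by."""
    while now > expected_at + tolerance:
        missed += 1
        expected_at += interval
    return expected_at, missed


def on_received(arrived_at: int, interval: int):
    """A reception resets the missed counter and re-anchors the expectation."""
    return arrived_at + interval, 0
```

Route maintenance can then compare the returned counter against maxMissedTcIe (for TC IEs) to decide whether the neighbor entry should be erased.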

Note: An EBR is sent by an outsider to a mesh node only if it receives a blink from the latter. In such a case, a mesh node in power saving mode might not have received any blinks from the outsider node, but only the EBR. The EBR neither carries blink interval information nor conveys the time at which a blink arrives, which prevents the mesh node from creating a schedule for this new node. To cope with this, the mesh node stays awake after receiving an EBR from an outsider node until it receives a regular blink from the same node, thereby allowing it to make an entry in the timetable and estimate when to wake up for the next blink from the EBR source.

4.4.3. Random back off timer to avoid collision

Blink induced communication allows sleeping nodes to communicate efficiently, but it can also cause collisions when several neighbors send messages to the blink source at the same time. The CSMA provided by the swarm bee LE chip handles this well, but its random backoff may not be enough when large (concatenated) packets are transmitted, as they occupy the air for longer and may still collide with other packets.

To avoid this situation, the nodes that want to transmit can use a longer backoff duration.

Hence, I implemented a random backoff delay at the routing layer. On receiving a blink from a node D, for instance, a node S that wants to transmit to D does so after a random backoff duration. The range of this backoff depends on the RX window of the intended destination. For example, if D has an RX window of 20 ms, S chooses an integer backoff between 0 and 18 ms. The safety margin of 2 ms ensures that the transmitted message is received well within the RX window of D. This way, multiple neighbors can transmit to the same device without their packets colliding in the air.

This mechanism is also useful in a linear topology, for example when nodes A, B and C are arranged as A-B-C. If A and C both want to send a packet to B after it blinks, they cannot sense each other's transmissions even with CSMA ON, because they are out of range of each other. They therefore send their packets to B at the same time, causing a collision. This is called the hidden node problem, and A and C are known as hidden nodes.

With bigger RX windows, it is not necessary to make the backoff bigger. Instead, the backoff range can be decided on the basis of the number of neighbors. In this implementation, a node can have a maximum of 16 neighbors, so a backoff range of 0-20 ms is enough to provide a reasonably random selection.
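The backoff selection described above can be sketched as follows. This is a minimal illustration, not the firmware code: the function names are hypothetical, and combining the 2 ms margin with the 20 ms upper bound in a single `min` is an assumption about how the two rules interact.

```python
import random

MAX_BACKOFF_MS = 20      # upper bound mentioned in the text for larger RX windows
SAFETY_MARGIN_MS = 2     # keeps the transmission inside the destination's RX window


def backoff_range_ms(rx_window_ms, margin_ms=SAFETY_MARGIN_MS, cap_ms=MAX_BACKOFF_MS):
    """Return the inclusive (low, high) range of backoff values in ms."""
    # Assumption: the range never exceeds the cap and never goes below zero.
    high = max(0, min(rx_window_ms - margin_ms, cap_ms))
    return 0, high


def choose_backoff_ms(rx_window_ms, rng=random):
    """Pick a random integer backoff for a transmission triggered by a blink."""
    low, high = backoff_range_ms(rx_window_ms)
    return rng.randint(low, high)
```

With a 20 ms RX window the range is 0-18 ms, as in the example in the text; with a 40 ms window it stays at 0-20 ms.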

The only disadvantage of the backoff is increased power consumption, as the transmitting node has to stay awake for the backoff period before sending its packet, instead of transmitting immediately.

4.5. Schedule for Route maintenance

Table 12 describes the parameters in the schedule that each node maintains to carry out route maintenance and achieve blink induced communication. Route maintenance is done for both US and DS routes. The timetable in Table 12 is responsible for US route maintenance, i.e. removing neighbors from which a certain number of TC IEs have not been received. This timetable also contains a smaller timetable (shown in Table 13), which keeps track of how many L2R IEs have been missed from each reachable node. The fields ExpectedAt and No. of missed EBs (present in both timetables) are used for this purpose. ExpectedAt is the time at which the next EB is expected; it is calculated from the timestamp at which the previous EB was received and the beacon interval. If a beacon is not received at the expected time, No. of missed EBs is incremented by one and the beacon interval is added to ExpectedAt to predict the next arrival. This applies to both timetables, i.e. to L2R IEs and TC IEs, and hence to DS and US route maintenance, respectively. The formulae for calculating ExpectedAt for TC IEs have already been described in section 3.2.2.2, and the formula for calculating the same for L2R IEs is as follows.

For predicting when to expect an L2R IE
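Going only by the description in this section, the prediction amounts to adding the L2R IE interval to the time of the last reception, and adding it again each time an L2R IE is missed. The sketch below illustrates that reading. The class and function names are hypothetical, and resetting the miss counter on reception is an assumption that the text does not state.

```python
from dataclasses import dataclass


@dataclass
class ReachableEntry:
    """One row of the per-reachable-node timetable (Table 13)."""
    address: str
    l2r_ie_interval: int      # same time unit as the timestamps
    expected_at: int = 0
    missed: int = 0


def on_l2r_ie_received(entry, received_at):
    """An L2R IE arrived: predict the next one from the reception time."""
    entry.expected_at = received_at + entry.l2r_ie_interval
    entry.missed = 0          # assumption: a reception clears the miss counter


def on_l2r_ie_missed(entry):
    """Nothing arrived by ExpectedAt: count the miss and predict the next slot."""
    entry.missed += 1
    entry.expected_at += entry.l2r_ie_interval
```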


4.6. Waking up and timeout

A node has to wake up close to the time a blink from its neighbor is expected, so that it can send out the packets meant for that neighbor. However, due to clock drift and inaccuracies across devices, the blink could arrive a little early or late. To compensate for this, the node wakes up 100 ms before the blink is expected and, if the blink has not arrived by the expected time, waits for another 100 ms. A sleeping device is therefore awake for a maximum of 200 ms to receive a blink. If it receives the blink during this period, it updates the timetable, stops waiting and goes back to sleep. The formulae to calculate wake up times are specified in section 3.2.2.2.
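The listening window around an expected blink can be sketched as below. This is an illustration under stated assumptions rather than the formulae of section 3.2.2.2: the function names are hypothetical, the shorter lead time for blink intervals below 1 s follows the 0.12 factor and 30 ms minimum given in this section, and treating an interval of exactly 1 s like the longer intervals is a choice made here.

```python
DEFAULT_WAKE_BEFORE_MS = 100   # lead time for blink intervals of 1 s and above
MIN_WAKE_BEFORE_MS = 30        # lower bound; the device needs 10-15 ms to wake up
SHORT_INTERVAL_FACTOR = 0.12   # fraction of the blink interval for short intervals


def wake_before_ms(blink_interval_ms):
    """Lead time before the expected blink at which the node wakes up."""
    if blink_interval_ms >= 1000:
        return DEFAULT_WAKE_BEFORE_MS
    return max(MIN_WAKE_BEFORE_MS, round(SHORT_INTERVAL_FACTOR * blink_interval_ms))


def listen_window(expected_at_ms, blink_interval_ms, wait_after_ms=100):
    """Return (wake_up_at, give_up_at) around the expected blink time."""
    wake_up_at = expected_at_ms - wake_before_ms(blink_interval_ms)
    give_up_at = expected_at_ms + wait_after_ms
    return wake_up_at, give_up_at
```

For a 5 s blink interval this gives the 200 ms maximum awake time described above; for a 500 ms interval the lead time shrinks to 60 ms.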

Note: Waking up 100 ms in advance is suitable for blink intervals greater than 1 s. For smaller blink intervals, this value needs to be smaller, preferably 0.12 times the blink interval. However, it should be no smaller than 30 ms, to account for the time it takes for a device to fully wake up (10-15 ms).

4.7. Dependencies between IE intervals

The current implementation allows nodes to wake up to send out TC IEs in response to received blinks. To do so, the formulae specified in section 3.2.2.2 are used. These formulae are based on the blink and TC IE intervals and cover the cases where the intervals are and are not multiples of each other. A node's wake up times are calculated initially to wake up every blink interval and later to wake up every TC IE interval. The TC IE interval is generally longer than the blink interval, so the device wakes up less often and becomes more energy efficient. However, I decided not to wake the node up separately for every L2R IE, which is possible if the L2R IE interval is greater than and a multiple of the TC IE interval. For example, if TCIEInterval = 10s and LRIEInterval = 20s, an L2R IE can be sent with every alternate TC IE. Concatenation comes in handy here, and the two IEs are sent together. So, a node does not need to wake up to send an L2R IE the way it does for blinks and TC IEs; it only needs to predict when to expect an L2R IE, for DS route maintenance. This reduces complexity, needs no additional storage for wake up timestamps and also reduces traffic, as IEs can be concatenated into a single message.
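The dependency between the two intervals can be illustrated with a small sketch. The function names and the string labels are hypothetical; the check follows the condition stated in the text that the L2R IE interval is greater than and a multiple of the TC IE interval.

```python
def l2r_ratio(tc_ie_interval_s, l2r_ie_interval_s):
    """Return n such that L2R IE interval = n * TC IE interval, or raise."""
    if (l2r_ie_interval_s <= tc_ie_interval_s
            or l2r_ie_interval_s % tc_ie_interval_s != 0):
        raise ValueError("L2R IE interval must be a larger multiple of the TC IE interval")
    return l2r_ie_interval_s // tc_ie_interval_s


def ies_for_transmission(tc_ie_count, n):
    """IEs concatenated into the tc_ie_count-th TC IE transmission (1-based)."""
    ies = ["TC IE"]
    if tc_ie_count % n == 0:
        # Every n-th TC IE carries an L2R IE, so no separate wake up is needed.
        ies.append("L2R IE")
    return ies
```

With TCIEInterval = 10 s and LRIEInterval = 20 s, `l2r_ratio` returns 2 and every second TC IE transmission carries an L2R IE.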

LRIEInterval > TCIEInterval and LRIEInterval = n × TCIEInterval, where n is an unsigned integer

4.8. EB list to store packets

To provide support for sleeping nodes, it is necessary to ensure that they do not miss routing updates. This is possible if the transmitting node saves the message for its sleeping neighbor and sends it out when the latter is awake. The EB list is used to store these messages. Because the number of entries in it is limited, old packets are discarded to accommodate new ones. The EB list is designed as a linked list in which each packet points to the location of the next; currently, a maximum of 6 packets can be stored at a time. To reduce storage overflow and limit packet loss, a few optimization techniques are used, which have already been discussed in 3.2.2.1.
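The behaviour of the EB list can be sketched as a bounded queue. The sketch uses a Python `deque` in place of the linked list of the implementation, and the class and method names are hypothetical; only the capacity of 6 and the discarding of the oldest packet come from the text.

```python
from collections import deque

EB_LIST_CAPACITY = 6   # maximum number of stored packets stated in the text


class EBList:
    """Bounded store of packets waiting for sleeping neighbours."""

    def __init__(self, capacity=EB_LIST_CAPACITY):
        # A deque with maxlen drops the oldest item when a new one is appended
        # to a full queue.
        self._packets = deque(maxlen=capacity)

    def store(self, destination, payload):
        """Save a packet for a neighbour that is currently asleep."""
        self._packets.append((destination, payload))

    def take_for(self, destination):
        """Remove and return, oldest first, all payloads queued for destination."""
        ready = [p for d, p in self._packets if d == destination]
        remaining = [(d, p) for d, p in self._packets if d != destination]
        self._packets = deque(remaining, maxlen=self._packets.maxlen)
        return ready

    def __len__(self):
        return len(self._packets)
```

A node would call `take_for` when it receives a blink from the destination, since that is when the destination's RX window opens.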


4.9. LsnSa Records

In the broadcast routing provided by this protocol, when a device receives a data frame whose DA in the L2R routing IE is the broadcast address, and the device is not the original source of the frame, it forwards the frame to all neighbors other than the one from which it received the frame. After transmitting a broadcast frame, a device records the original source address (SA) and the L2R sequence number (LSN) and discards any subsequent frames with the same SA and LSN, in order to avoid duplicate transmissions. The data structure implemented to store such records is LsnSa records. In my implementation, at most six such records can be stored at a time. A record is deleted after LsnSaRecordTimeout, which has been set to 40 seconds. Hence, a broadcast message that arrives for forwarding with the same LSN and SA as an entry in LsnSaRecords is discarded.

This structure is also used for unicast transmission to implement loop avoidance. If a node is forwarding a packet from a distant node, it makes an LSN-SA entry for the data and adds to a restricted list the node from which it received the packet and the node to which it forwarded it. This ensures that it does not send the packet to these addresses again, and so avoids loops. An example scenario is a linear topology A-B-C-D in which A transmits unicast data for D. The packet is forwarded from A to B and from B to C. If D is lost and C has recently deleted it from its neighbor table, C cannot deliver the data to D and will try to forward it upstream towards A (according to the algorithm for unicast routing discussed in 3.1.6.1). B, however, still has D in its reachable node list for C, because the L2R IE interval is usually greater than the TC IE interval, so B takes longer than C to discard D from its NT. B will therefore forward the data to C again, creating a loop. By storing the data in LsnSa records and adding B to the restricted list (as the data was received from B), C will not send the data to B again and will drop it after LsnSaRecordTimeout if D has not been found by then.
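A minimal sketch of such a record store is given below. The class and method names are hypothetical and timestamps are plain seconds passed in by the caller; the limits (six records, five restricted addresses per record, 40 s timeout, oldest record evicted on overflow) are the values given in this thesis.

```python
LSNSA_CAPACITY = 6        # maximum number of records
LSNSA_TIMEOUT_S = 40      # LsnSaRecordTimeout
RESTRICTED_CAPACITY = 5   # restricted addresses per record


class LsnSaRecords:
    def __init__(self):
        # Maps (source address, LSN) -> {"created": time, "restricted": [addresses]}
        self._records = {}

    def _expire(self, now):
        expired = [k for k, r in self._records.items()
                   if now - r["created"] >= LSNSA_TIMEOUT_S]
        for key in expired:
            del self._records[key]

    def seen(self, sa, lsn, now):
        """True if a frame with this SA and LSN was already handled."""
        self._expire(now)
        return (sa, lsn) in self._records

    def remember(self, sa, lsn, now, restricted=()):
        """Record a handled frame and the neighbours it must not be sent to."""
        self._expire(now)
        key = (sa, lsn)
        if key not in self._records:
            if len(self._records) >= LSNSA_CAPACITY:
                # On overflow the oldest record makes room for the newest.
                oldest = min(self._records, key=lambda k: self._records[k]["created"])
                del self._records[oldest]
            self._records[key] = {"created": now, "restricted": []}
        addresses = self._records[key]["restricted"]
        for address in restricted:
            if address not in addresses and len(addresses) < RESTRICTED_CAPACITY:
                addresses.append(address)

    def may_forward_to(self, sa, lsn, neighbour, now):
        """False if forwarding this frame to neighbour would risk a loop."""
        self._expire(now)
        record = self._records.get((sa, lsn))
        return record is None or neighbour not in record["restricted"]
```

In the A-B-C-D example, node C would call `remember("A", lsn, now, restricted=["B"])` when it receives the packet from B, after which `may_forward_to` returns False for B until the record times out.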

On overflow of LsnSa records, the oldest record is deleted to store the newest one.

4.10. Mesh list

When an outsider node wishes to join a mesh, it sends out EBRs to all the nodes it received blinks from. If the recipient of an EBR is a mesh node, it replies with an EB containing a TC IE. The outsider node could receive multiple such EBs and, if there are many meshes nearby, it has the opportunity to choose the best one to join. To facilitate this, a mesh list is implemented, which can store multiple mesh tables while the node receives replies and discovers meshes. As a mesh table is a large structure in itself, the mesh list is considerably bigger and is therefore allocated dynamically. At the end of mesh discovery, the node chooses the mesh to join, the contents of that mesh's table are transferred to the main mesh table, and the memory occupied by the mesh list is freed.


4.11. Polling function

The routing protocol requires many factors to be monitored regularly to carry out functions such as route maintenance and checking and deleting expired records. Hence, I implemented a polling function in the routing layer that is called periodically to perform maintenance tasks. It is also a good substitute for timers, as there is an upper limit on the number of timers that can run simultaneously. The routing layer polling function, RLPoll, handles and monitors the following:

a) Route maintenance - upstream and downstream

RLPoll monitors the ExpectedAt fields in the timetable to check whether they have expired. If ExpectedAt < current time, no blink/EB was received in time and the old ExpectedAt persists. In such a case, RLPoll updates the field to the next expected time and increments the number of missed beacons. It does so for both timetables, i.e. those for neighbors and for their reachable nodes, and so handles US and DS route maintenance.

b) Wake up time

Just as with ExpectedAt, RLPoll also checks and updates the wakeUpAt times. When no blink is received as expected, it updates the wakeUpAt field to the next time the node should wake up.

c) Checking expiry of elements in EB list and LsnSa Records

The implementation includes a list to store packets for sleeping nodes and LsnSa records to avoid duplication and loops. These lists have a limited size, and old entries need to be weeded out to make space for new ones. For this reason, every entry in these lists has a time to live, which must be checked often to know whether the entry has expired. RLPoll does this and deletes expired entries.

d) Monitoring if current node has no ancestors or an empty NT

The routing protocol requires that a mesh node leave the mesh if there are no ancestors in its NT or if the NT is empty. If there are no ancestors in the NT, a node cannot send anything to the mesh root unless the mesh root is its neighbor. If the NT is empty, the node has no neighbors and so has disconnected from the mesh. In such a case, it should start rediscovery of meshes and join a suitable one. RLPoll monitors the NT to make sure the appropriate actions are taken when the NT is empty or devoid of ancestors.

e) Concatenates data for same destination

To reduce the possibility of storage overflow for saved messages, multiple messages for the same destination are concatenated. RLPoll periodically checks the EB list and, if more than one data packet for a destination is present, concatenates them into a single message to save space.
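Tasks a) and d) of the polling function can be sketched as one pass over a neighbor table. The dictionary layout and the function name are hypothetical, the sketch applies to a non-root node, and removing a neighbor once its miss count exceeds the limit (rather than on reaching it) is an assumption. The loop advances ExpectedAt by several intervals if the poll has not run for a while.

```python
MAX_MISSED = 8   # default limit on missed TC IEs/blinks before an entry is deleted


def rl_poll(neighbour_table, now):
    """One polling pass over a neighbour table of a non-root node.

    neighbour_table maps address -> dict with the keys 'expected_at',
    'interval', 'missed' and 'is_ancestor'. Returns True if the node
    should leave the mesh and start rediscovery.
    """
    for address in list(neighbour_table):
        entry = neighbour_table[address]
        # a) Advance stale ExpectedAt values and count each missed beacon.
        while entry["expected_at"] < now:
            entry["expected_at"] += entry["interval"]
            entry["missed"] += 1
        if entry["missed"] > MAX_MISSED:
            del neighbour_table[address]
    # d) An empty table, or one without ancestors, means the mesh is lost.
    return not any(e["is_ancestor"] for e in neighbour_table.values())
```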


4.12. Solution for count to infinity problem

The protocol had an inherent count-to-infinity problem. Assume that a mesh node A with depth D sends an EB with a TC IE in response to an EBR from an outsider node B, and B becomes part of the mesh with depth D+1 (A is its ancestor). B now starts sending TC IEs as well. When A receives B's TC IE, it adds B to its NT and increases its own depth to D+2, as the received TC IE was fresh and the neighbor was new. The same happens at B when it receives A's TC IE with the newer depth: B increases its own depth. In this way, the depth keeps increasing for both nodes, which is incorrect. To avoid this, I perform an initial neighbor discovery: if A receives a blink from B, which is not yet present in its NT, it adds B with depth = 0xff. When A later receives a TC IE from B, it sees that B is already present in the NT with depth 0xff (i.e. the entry was filled with information from a blink, as 0xff is not a valid depth because 0xff > maxDepth) and concludes that B must have become part of the mesh with the help of the TC IE sent by A. This is confirmed if the depth in B's TC IE is greater than A's. Hence, A does not increment its depth and knows that it is B's ancestor. This solves the count-to-infinity problem.

4.13. Mesh attribute list

Every node has certain mesh parameters that are common to all nodes of its mesh. However, some properties can be customized by the node itself. Such properties, which are characteristic of a node and decided by it, are called mesh attributes. The mesh attributes are listed in Table 14.

4.14. Support for MQTT-SN

Message Queuing Telemetry Transport for Sensor Networks (MQTT-SN) is a data-centric communication approach in which information is delivered to the receivers not based on their network addresses but rather as a function of their contents and interests. It is a ‘Publish/Subscribe’ (pub/sub) protocol for wireless sensor networks and a counterpart of MQTT, which is already widely used in enterprise networks, mainly due to its scalability and support of dynamic network topology.

MQTT-SN can be considered as a version of MQTT which is adapted to the peculiarities of a wireless communication environment such as low bandwidth, high link failures, short message length, etc. It is ideal for implementation on low-cost, battery-operated devices with limited processing and storage resources [18].

In future, nanotron may implement this approach on a link layer above the routing layer. Even though MQTT-SN is designed to be agnostic of the underlying networking services, there are some requirements that should be fulfilled by the layer below. These have been discussed below:

Note: Mesh root stands for gateway

1. Bidirectional links

To support: Any network which provides a bi-directional data transfer service between any node and a particular one (a gateway) should be able to support MQTT-SN. This is because, to cope with the short message length and the limited transmission bandwidth in wireless networks, the topic name in the PUBLISH messages is replaced by a short, two-byte long “topic id”. A registration procedure is defined to allow clients to register their topic names with the server/gateway and obtain the corresponding topic ids. It is also used in the opposite direction to inform the client about the topic name and the corresponding topic id that will be included in a following PUBLISH message. Hence, bidirectional links are needed.

2. List of devices in the mesh stored at the root node/gateway

To support: MQTT-SN defines a new offline keep-alive procedure for the support of sleeping clients. With this procedure, battery-operated devices can go to a sleeping state during which all messages destined to them are buffered at the server/gateway and delivered later to them when they wake up.

3. Finite allowable number of nodes in the mesh

To support: To reduce the broadcast traffic created by the discovery procedure, it is desirable that MQTT-SN can indicate the required broadcast radius to the underlying layer, i.e. the broadcast range must be known. Also, in MQTT-SN a SEARCHGW message is broadcast by a client when it searches for a GW. The broadcast radius of the SEARCHGW is limited and depends on the density of the clients’ deployment, e.g. only 1-hop broadcast in case of a very dense network in which every MQTT-SN client is reachable from each other within 1-hop transmission.

All of these requirements are satisfied by the proposed protocol.

- As duplex communication is possible between the mesh root and the nodes in the mesh, the links are bidirectional.
- The mesh root has the list of all nodes in the mesh through the information stored in its neighbor table and the reachable node lists of its neighbors.
- The maximum depth allowed in the mesh limits the number of nodes that can join it. Hence, a finite number of nodes is guaranteed.


Table 14. Mesh attributes

Name | Type | Range | Description
intervalUnit | Enumeration | Second, Minute, Hour | Unit of TcIeInterval, LrIeInterval, lsnSaRecordTimeout
TcIeInterval | Unsigned integer | 1 byte | Interval between TC IE transmissions in the unit specified by intervalUnit.
LrIeInterval | Unsigned integer | 1 byte | Interval between L2R IE transmissions in the unit specified by intervalUnit.
maxMissedTcIe | Unsigned integer | 0x00 - 0xff | Indicates the maximum number of TC IEs a device may miss from a neighbor before removing it from its NT
maxMissedLrIe | Unsigned integer | 0x00 - 0xff | Indicates the maximum number of L2R IEs a device may miss from a reachable node forwarded by a neighbor before removing it from the reachable node list of that neighbor
maxScanRetry | Unsigned integer | 0x00 - 0xff | Number of times the Routing layer may attempt to trigger scan to discover meshes.
meshRootAddr | MAC address | 6 bytes | Address of the mesh root of the current mesh.
lsnSaRecordTimeout | Unsigned integer | 0x00 - 0xff | Duration, in intervalUnit, after which a record of an LSN and SA is deleted.


Chapter 5: Validation and testing

5.1. Tools used

5.1.1. swarm bee LE Development Kit

The swarm bee LE Development Kit consists of several DK Plus Boards with antennas. The kit helps users get familiar with the functionality of the swarm bee LE module and develop their own applications rapidly. This is also the kit I use to test and validate the implementation. Each board acts as a swarm node connected to a PC, and progress can be followed by printing messages, routing tables, etc. Multiple boards are used to test the communication between nodes. I use the swarm bee LE version, which serves applications that need reliable distance or location information between 10 and 500 meters with an accuracy of about 1 meter. The integrated MEMS sensor can detect 3D acceleration as well as temperature.

To perform testing in a smaller space, I used the command STXP (Set transmission power) to reduce the TX power and removed the antennas, so that the communication range was reduced to about 5 cm. This way, nodes could be placed side by side, with alternate nodes isolated from each other even at a distance of 20 cm.

Figure 22. swarm bee LE DK Plus Board

5.1.2. Tera Term

Tera Term is an open-source, free, software-implemented terminal emulator (communications) program. It emulates different types of computer terminals, from DEC VT100 to DEC VT382. It supports telnet, SSH 1 & 2 and serial port connections. It also has a built-in macro scripting language (supporting Oniguruma regular expressions) and a few other useful plugins [8].

I use this tool to see the messages printed by the swarm bee LE Development Kit, which is my node, connected to Tera Term via a serial port. Multiple such nodes can be connected to the PC, and the message exchange and building of routing tables can be seen on the Tera Term terminal. The output can also be logged with a timestamp prepended to each line, which gives useful readings for testing the performance of the protocol. Through the logs, the time taken to join a mesh, to send and receive a routed message, etc. can be recorded for testing and observation. In the following images, it can be seen that triggering events such as sending or receiving NINs, IEs and EBs are printed on the console to validate blink induced communication. The printed routing table and schedule are crucial in investigating the neighbors, reachable nodes, missed beacons, etc.


Figure 23. Screenshots from Tera Term

5.1.3. Gephi

Gephi is an open-source network analysis and visualization software package written in Java on the NetBeans platform, initially developed by students of the University of Technology of Compiègne (UTC) in France. It facilitates flexible exploration and manipulation of graphs. It is easy to add algorithms, filters, styles, data sources, tools or plugins [9].

I used Gephi to enable a real time visual representation of routing among multiple swarm bee LE modules. Using a Gephi plugin called Graph streaming [10] and a Python class called Gephi streamer [11], made to stream graphs to Gephi, I was able to demonstrate nodes joining and leaving the mesh in real time. It can be seen how a change in topology affects the relations between neighbors as they break pre-established links to form new ones. I achieved this by using the Gephi Streamer function to add nodes and edges dynamically, by parsing the messages received from nodes via the serial port. Screenshots from Gephi showing two topologies of the mesh network follow.
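The parsing step can be illustrated with a small sketch that turns console lines into graph update events. It is not the script used for the thesis: the `JOIN <node> <parent>` and `LEAVE <node>` line formats are invented for illustration, and the event keys (`an`, `ae`, `dn`) follow the JSON event format of the Graph Streaming plugin as I understand it, so they should be checked against the plugin version in use. The sketch only builds the event strings and does not open a serial port or a connection to Gephi.

```python
import json


def events_from_log_line(line):
    """Translate one console line into Graph Streaming JSON events.

    The 'JOIN <node> <parent>' and 'LEAVE <node>' formats are hypothetical;
    real firmware output would need its own parser.
    """
    parts = line.split()
    if len(parts) == 3 and parts[0] == "JOIN":
        _, node, parent = parts
        return [
            # Add the node, then an undirected edge to its parent.
            json.dumps({"an": {node: {"label": node}}}),
            json.dumps({"ae": {f"{parent}-{node}": {
                "source": parent, "target": node, "directed": False}}}),
        ]
    if len(parts) == 2 and parts[0] == "LEAVE":
        # Delete the node when it leaves the mesh.
        return [json.dumps({"dn": {parts[1]: {}}})]
    return []
```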

Note: The mesh root in both instances is the node with address 000055933dbb.


Linear Topology

Topology with denser node population around mesh root

Figure 24. Screenshots of visual representation of routing in swarm at different phases


5.1.4. Oscilloscope

The amount of power consumed by routing was studied by observing the waveforms on an oscilloscope and integrating them over time to get the power consumption in Volt-(milli)seconds. I did this for nodes with different numbers of neighbors, to study how often they wake up in a certain time interval to transmit updates, data and blinks. The oscilloscope readings were also vital in estimating the optimum RX window needed for successful transmission and reception of concatenated packets, so that the receiver is not ON for longer than required. This makes the algorithm more power efficient. The readings also helped estimate the optimal time to wake up before an expected blink. I observed that the node takes about 10-15 ms to wake up from the sleep state, and this should be taken into consideration when estimating a wake up time.

For example, the image below shows a screenshot of one such power consumption measurement. The waveform in purple is that of a node with two neighbors and a depth of 2. The pink curve integrates this waveform between the markers Ax and Bx. Each image shows the measurement for one of the periods within 1 s (100 ms * 10 units on the screen) during which the receiver of the node is ON (Note: The nodes shown in the images have a blink interval of 500 ms and an RX window of 40 ms). The readings at the bottom of each image show the power consumed in the duration between the markers Ax and Bx. From these images, it can be seen that the additional power consumed by the node in 1 s is (65.54 + 178.9 + 68.53) mVs = 312.97 mVs, and

Total time the node is awake for every 1s = sum of durations marked by Ax and Bx

= (40 + 110.9 + 41.8) ms = 192.7 ms

Note: The two smaller durations of about 40 ms are the RX windows and it can be seen that they occur every 500 ms after blink is broadcasted.

This is how awake time and power consumed in certain time interval for a node with varying number of neighbors, are calculated.


Figure 25. Oscilloscope screenshot showing power consumption by node with 2 neighbors


5.2. Analysis of transmission and reception

5.2.1. Ranging and blinks

A swarm node broadcasts NINs periodically, every blinkInterval, whose value can be set using the API command SBIV.

Reading MEMS and sending blink Receiving blink Ranging between 2 nodes

Figure 26. Sending NIN and Ranging between 2 swarm nodes

5.2.2. Turning the receiver ON

When a node wants to transmit something to a neighbor that is currently sleeping, it wakes up when a blink is anticipated from that neighbor. To make sure that it catches the blink, the node wakes up before the blink is transmitted. This wakeUpBefore period should take into consideration the time required to completely turn ON the receiver, which is about 10-15 ms. I initially set this interval to 100 ms and added a buffer period of 100 ms in case the blink is not received on time. This is done to allow for drift between the clocks of different nodes. The following figures show such an instance, where a neighbor (green waveform) wakes up approximately 100 ms before blink arrival and waits another 100 ms when the blink is not received as expected, thereby keeping the receiver ON for 200 ms before going back to sleep.

In the first image, the device wakes up, and what it receives after 90 ms is not what it expected; it therefore stays awake until approximately 200 ms (163.64 ms) have elapsed.

Note: The spike seen in green waveform, before device wakes up is the charging of capacitors and after that it takes a few ms before receiver is completely ON. This is the reason why total duration that the receiver is ON is slightly less than 200 ms.

Figure 27. Node waking up and waiting to receive blink

5.2.3. Calculation of power from waveforms

The power consumed by a node varies with the number of neighbors it has. Section 5.1.4 explains how the waveforms seen on the oscilloscope are integrated over a time period to find the power consumed in units of Volt-seconds. This value can also be used to calculate power consumption in units of amperes or watts. For example, consider the waveform in green shown in Figure 28.

To calculate power consumed in amperes, from Volt-seconds (determined by purple curve), we calculate the current in both cases that the device is awake.

In a duration of 1s (100 ms * 10 units), it is awake for 183.63 ms consuming 216.34 mVs and for 40 ms consuming 44.10 mVs. Let these be denoted as

T1 = 183.63 ms = 0.183 s, Vs1 = 216.34 mVs

T2 = 40 ms = 0.04 s, Vs2 = 44.10 mVs

The board’s measuring resistance, R = 10 ohm

Current, I1 = Vs1/(T1 in sec * R) = 117 mA

Current, I2 = Vs2/(T2 in sec * R) = 110 mA


So, total power consumed in 1s = I1 + I2 = 117 + 110 = 227 mA

From I1 and I2, power consumed in Watts can be calculated by using formula, Power, P = I²R

P = (I1)²R + (I2)²R = [((117 × 10⁻³)² × 10) + ((110 × 10⁻³)² × 10)] × 10³ mW = (136.89 + 121) mW

P = 257.89 mW
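The same calculation can be written as a short script, which makes it easy to repeat for other waveforms. The function names are hypothetical; the formulas are the ones used above (I = Vs/(T × R) and P = I²R with R = 10 ohm). Using the unrounded duration of 183.63 ms gives currents of about 117.8 mA and 110.25 mA, so the resulting power is slightly higher than the figure obtained above from the rounded currents.

```python
R_MEASURE_OHM = 10.0   # measuring resistance on the board


def current_ma(area_mvs, duration_s, r_ohm=R_MEASURE_OHM):
    """Average current in mA from an integrated waveform area in mV*s."""
    return area_mvs / (duration_s * r_ohm)


def resistor_power_mw(i_ma, r_ohm=R_MEASURE_OHM):
    """P = I^2 * R, with I given in mA and the result in mW."""
    return (i_ma / 1000.0) ** 2 * r_ohm * 1000.0


# Values read from the waveform in Figure 28.
i1 = current_ma(216.34, 0.18363)
i2 = current_ma(44.10, 0.040)
total_mw = resistor_power_mw(i1) + resistor_power_mw(i2)
```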

Figure 28. Waveform for power measurement


Chapter 6: Results and Evaluation

6.1. Data analysis and evaluation

In the implementation, certain parameters have been assigned default values. They are as follows:

1. Maximum limit allowed in cases of:

- No. of entries in NT = 16
- No. of reachable nodes for each neighbor = 5
- No. of LsnSa Records = 6
- Restricted list of addresses in LsnSa records, for loop avoidance = 5 for each record
- Saved messages = 6
- Max depth of mesh = 10 hops
- Max packet size = 128 bytes excluding header (device constraint)
- Max no. of missed TC IEs, blinks and L2R IEs before corresponding entry is deleted = 8

2. Conditions and default values:

- LrInterval = n*TCInterval, n = whole no.
- RX counter = 1, i.e. RX window of a node is open after every blink (More in Appendix 2)
- LsnSa record timeout = 40s
- Wake up to catch a blink before = 100ms, for blink intervals > 1s
- Wake up to catch a blink before = 60ms, for blink intervals < 1s
- Minimum wake up before = 30 ms, to account for time taken to fully wake device up
- Extra wait duration to receive the expected blink = wake up before + buffer time = 200ms (> wake up before)

3. The following readings have been taken with certain process constants. They are as follows:

- Blink interval of each node = 5s
- TC IE interval = 10s
- L2R IE interval = 20s
- Listen duration before joining a mesh = 5s
- RX window width = 40 ms (minimum value = 20 ms)
- CSMA = OFF
- Topology tested is linear, unless specified otherwise

Figure 29. Topology tested - linear

6.1.1. Time to start/join mesh

To initiate routing, nodes need to be part of a mesh. Currently, a node starts or joins a mesh as soon as the device is turned ON, but this can be made user controllable in future by adding a command to the API. The following readings show how much time the mesh root (denoted by depth = 0) takes to start a mesh and how much time nodes at varying depths take to join it. These times depend on the process constants, especially the blink interval, which is 5 s here. The lower the blink interval, the faster the process: all communication is blink induced, so a higher blink rate makes an RX window available for data transmission more often.

Table 15. Time taken to resume normal operation for linear topology

Depth of node | Time taken without routing | Time taken with routing (ms)
0 | Blink interval = 5s | 775 – 5006
1 | 5s | 5003 (1 neighbor) ≈ 5s; 9295 (2 neighbors) ≈ 10s
2 | 5s | 14328 ≈ 15s
3 | 5s | 19096 ≈ 20s
4 | 5s | 25303 ≈ 25s

[Bar chart: time to join mesh (seconds) against node depth. Depth 0: 5, Depth 1: 9.3, Depth 2: 14.3, Depth 3: 19.1, Depth 4: 25.3]

Figure 30. Time taken to start/join mesh

For swarm without routing, nodes are operating independently and hence their normal operation begins from the first blink transmission itself. As blink interval is 5s here, their normal operation also starts 5s after waking up.

In routing with swarm nodes, the mesh root is said to have started a mesh after it has sent the first TC IE. As communication is unicast and blink induced, this depends on when the mesh root receives the first blink from a neighbor. If all nodes are turned ON at the same time, the time to resume normal operation is almost equal to the blink interval, and if the neighbor was turned ON before the mesh root, it is even lower.

Outsider nodes that are not the mesh root still behave like swarm without routing, i.e. they send blinks periodically and range with neighbors. However, they take longer to join a mesh and perform routing, and this time depends on their listen duration (= 5s), the blink interval of their neighbors and the number of hops from the mesh root. As they send EBRs to the nodes from which they receive blinks and then wait for the listen duration for replies, joining takes longer when there are more neighbors.

Nodes farther away from the mesh root have to wait for their ancestors to become part of the mesh first. For example, nodes at depths 2, 3 and 4 have to wait for the node at depth 1 to join the mesh; they can then join one after the other, as they are out of range of the mesh root. Hence, they take even longer. It can be seen from the readings that the joining durations of consecutive nodes differ by approximately 1 or 2 blink intervals.
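The step between consecutive depths can be checked directly from the readings in Table 15. The sketch below uses the measured join times for depths 1 to 4 (taking the two-neighbor value for depth 1, which matches the linear chain) and expresses each step in blink intervals; the function name is hypothetical.

```python
BLINK_INTERVAL_MS = 5000

# Measured join times in ms from Table 15 (depth 1 uses the two-neighbour value).
join_time_ms = {1: 9295, 2: 14328, 3: 19096, 4: 25303}


def join_time_steps(times, blink_interval_ms=BLINK_INTERVAL_MS):
    """Difference between consecutive depths, in ms and in blink intervals."""
    depths = sorted(times)
    steps = []
    for shallow, deep in zip(depths, depths[1:]):
        delta = times[deep] - times[shallow]
        steps.append((deep, delta, round(delta / blink_interval_ms, 2)))
    return steps
```

For these readings the steps are 5033 ms, 4768 ms and 6207 ms, i.e. between about 0.95 and 1.24 blink intervals per additional hop.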

Result – Time taken to join mesh increases with depth, blink interval and listen duration of a device.

6.1.2. Increase in number and size of packets due to routing

In swarm without routing, mainly 2 kinds of packets are handled, viz. blinks (NINs) and ranging results. NINs are 37 bytes in size and ranging results are 54 bytes, excluding headers, and both are broadcast every blink interval. There are also data packets, of broadcast and unicast type, but they are not periodic or frequent and cannot exceed 128 bytes.

In routing, apart from NINs and ranging results, additional packets are transmitted, such as TC IE, L2R IE and concatenated packets which are periodic and frequent (generally less frequent than blinks). Data packets are not periodic. Hence, routing handles more and larger packets than generic swarm. The maximum number and size of packets sent by nodes at varying depth are tabulated below. These values are for linear topology as it results in a longer chain and hence larger depth with same no. of nodes, as compared to other topologies.

It can be seen from the table and the graph that nodes with more than one neighbor send more packets than those with one neighbor. Nodes at a depth greater than one transmit their own L2R IEs and also forward the L2R IEs of their successors. The fewest packets are transmitted by the mesh root, as it only has to transmit TC IEs for routing.

Result – The number of packets sent by a node (beyond depth 1) is proportional to its number of successors, if it provides an optimal route to the mesh root. For the node at depth 1, the number is the same as for its next successor, as the node at depth 1 doesn’t have to send any L2R IEs of its own.


Table 16. Packet size and number (linear topology A-B-C-D-E)

| Depth of node | Without routing: number | Without routing: max. packet size (bytes) | With routing: number | With routing: packets | With routing: max. packet size (bytes) |
|---|---|---|---|---|---|
| 0 | 2 (blink = 37 bytes, ranging packet = 54 bytes) | 91 | 4 (data additional) | Blink, ranging packet, TC IE = 22 bytes | Additional = 22, total = 113 |
| 1 | 2 | 91 | 6 | 3 L2R IEs (1 from each successor), TC IE, blink, ranging packet | Additional = 78, total = 169 |
| 2 | 2 | 91 | 6 | 3 L2R IEs (1 from itself + 2 from each successor), TC IE, blink, ranging packet | Additional = 78, total = 169 |
| 3 | 2 | 91 | 5 | 2 L2R IEs (1 from itself + 1 from successor), TC IE, blink, ranging packet | Additional = 60, total = 151 |
| 4 | 2 | 91 | 4 | 1 L2R IE from itself, TC IE, blink, ranging packet | Additional = 42, total = 133 |

[Bar chart: maximum packet size (y-axis, 0 to 180) against node depth (x-axis, depth 0 to depth 4), with the series "Routing" and "No routing".]

Figure 31. Maximum packet size transmitted at different node depths in linear topology


6.1.3. Time to stabilize after change in node position

Route maintenance in the mesh is essential to keep track of discovered routes. Routes need to be updated frequently to ensure that they are still reliable for routing data when needed. This is done for both US and DS routes by periodic transmission of updates (TC IE for US routes and L2R IE for DS routes). When a node’s position changes, for example from depth 1 to 2, the other nodes in the mesh should update this information so that proper routes can be chosen when data is to be transmitted in future.

The following table lists the time taken for the mesh nodes to detect and adapt to the change in topology when a node’s position (or depth) is changed. The experiment starts with linear topology and changes as node positions change.

Table 17. Time to stabilize after node's position change

| Old depth | New depth | Time taken |
|---|---|---|
| 1 | 2 | 65.4s |
| 1 | 3 | 60s |
| 1 | 4 | 62.4s |
| 2 | 3 | 34s |
| 3 | 4 | 40s |
| 3 | 2 | 5s |
| 2 | 1 | 5.45s |
| 3 | 1 | 24s |
| 4 | 3 | 10s |
| 4 | 1 | 30s |

When a node goes from a lower to a higher depth (for example, from depth 1 to 2, 3 or 4), its successors become its ancestors. Its earlier ancestor is now out of range and is hence removed from the NT after a certain number of TC IEs have been missed. According to the routing protocol, a node should disconnect from the mesh when it has no ancestors, which is what it does; it then proceeds to discover and join a mesh again. This process takes longer because the node first waits until its old ancestor is deleted from the NT, and then waits at least for the listen duration (= 5s) before joining a mesh. So, the time to stabilize when a node moves to a higher depth depends on the blink interval, the TC IE interval, the maximum limit of missed TC IEs and the listen duration.

A node that moves to a lower depth, i.e. closer to the mesh root, doesn’t take this long. This is because it now has a new ancestor, which has a lower depth than its previous ancestor. On receiving a TC IE from the new ancestor, the node simply updates its own depth to a smaller value than it had before, and its older ancestor now becomes its successor or is deleted (if out of range). As the node always had an ancestor in its NT, it doesn’t disconnect from the mesh and stabilizes faster, usually in a single blink interval.
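The two cases described above (moving away from and moving towards the mesh root) can be illustrated with a minimal Python sketch. The `MeshNode` class, its method names and the rule "own depth = ancestor depth + 1" are assumptions made for illustration only; they are not taken from the thesis implementation, and mesh discovery after a disconnect is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MeshNode:
    """Hypothetical model of a node's depth and neighbor table (NT)."""
    depth: Optional[int] = None                 # None = not part of a mesh
    neighbors: Dict[str, int] = field(default_factory=dict)  # address -> depth

    def ancestors(self) -> List[str]:
        """Neighbors that are closer to the mesh root than this node."""
        if self.depth is None:
            return []
        return [addr for addr, d in self.neighbors.items() if d < self.depth]

    def on_tc_ie(self, sender: str, sender_depth: int) -> None:
        """A TC IE from a better ancestor lowers our depth; no disconnect."""
        self.neighbors[sender] = sender_depth
        if self.depth is not None and sender_depth + 1 < self.depth:
            self.depth = sender_depth + 1       # assumed depth rule

    def on_neighbor_lost(self, sender: str) -> None:
        """Called once too many TC IEs from `sender` have been missed."""
        self.neighbors.pop(sender, None)
        if self.depth is not None and not self.ancestors():
            self.depth = None                   # disconnect, rediscover later
            self.neighbors.clear()


# Moving closer: a depth-3 node hears a depth-1 node and becomes depth 2.
closer = MeshNode(depth=3, neighbors={"C": 2, "E": 4})
closer.on_tc_ie("B", 1)

# Moving away: a depth-1 node loses the mesh root and must rejoin.
away = MeshNode(depth=1, neighbors={"A": 0, "C": 2})
away.on_neighbor_lost("A")
```

In this sketch the node that moves closer never leaves the mesh, whereas the node that moves away loses its only ancestor and has to go through discovery again, which is the slower path discussed in the text.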

In case of a node displacement like depth 1 to 3 or depth 1 to 4, where a node moves multiple hops away from or towards the mesh root, the positions of intermediate nodes also change. For example, when a node moves from depth 1 to 4, the intermediate nodes at depths 2 and 3 are also affected. Hence, the time taken to stabilize the mesh should include the time taken by the nodes at depths 2 and 3 to settle too. This is why these times are higher than the time taken when a node moves closer to the mesh root and is displaced by a single hop (depth 2 to 1, 3 to 2 and 4 to 3 in the experiment).

[Bar chart: time taken to stabilize (y-axis, 0 to 70) against old and new node depths (x-axis: 1,2; 1,3; 1,4; 2,3; 3,4), with the series "A-B" and "B-A".]

Figure 32. Transition of nodes within the mesh

Result – The mesh stabilizes faster after a change in node position when the node moves closer to the mesh root than when it moves away.

6.1.4. Awake time and power consumption of nodes with and without routing

With routing, a node is awake with its receiver ON in two situations: when it sends a NIN, for the duration of the RX window that follows, and when it wakes up a certain period before a blink from a neighbor is expected. If the expected blink is not received, the node stays awake for an additional timeout duration before going back to sleep; otherwise, it sends the packets for the blink source and goes to sleep before the timeout. If a node stays awake longer, it could be because:

1. It has to send a blink in less than 50 ms.
2. It has to range in less than 50 ms.
3. It has to wake up for the next neighbor’s blink in less than 50 ms.

This is because the device takes about 15-20 ms to wake up, and the above-mentioned events should not be delayed because of that.

The following table lists the durations for which a node is awake with a varying number of neighbors, and the corresponding power consumed. These have been calculated with different topologies in which all nodes have blink interval = 1s, RX window = 25 ms, TC IE interval = 5s and L2R IE interval = 10s. The following parameters were also modified: wake up before = 50 ms, extra wait duration before timeout = 100 ms, LSN-SA record timeout = 20s, maximum random number backoff = 18.

The best, worst and general case awake durations and power consumptions (in mVs) are measured from the experiments. The best case is the shortest time for which the node was awake, and hence the least power consumption. These have been measured over 1s intervals by observing the waveforms on an oscilloscope.
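The three stay-awake conditions listed earlier in this section (a blink, a ranging operation or a neighbor's blink due in less than 50 ms) can be sketched as a single check. This is a minimal illustration under assumptions: the function name, the millisecond timestamps and the way upcoming events are passed in are hypothetical and not part of the swarm firmware.

```python
from typing import Iterable, Optional

GUARD_MS = 50  # threshold quoted in the text; wake-up itself takes ~15-20 ms


def should_stay_awake(now_ms: int,
                      upcoming_events_ms: Iterable[Optional[int]],
                      guard_ms: int = GUARD_MS) -> bool:
    """Return True if any scheduled event is due in less than `guard_ms`.

    `upcoming_events_ms` holds the absolute times (ms) of the node's next
    own blink, next ranging operation and next expected neighbor blink;
    None means that no such event is scheduled.
    """
    return any(
        t is not None and 0 <= t - now_ms < guard_ms
        for t in upcoming_events_ms
    )


# Own blink due in 40 ms: going to sleep would risk delaying it.
stay = should_stay_awake(1000, [1040, None, 1200])
# Nearest event is exactly 50 ms away: the node may sleep.
sleep = should_stay_awake(1000, [1050, None, 1200])
```

A strict "less than" comparison is used because the text states "in less than 50 ms"; whether the real implementation treats the boundary inclusively is not stated.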

In Table 18, the current in milliamperes and the power in milliwatts have been calculated as explained in section 5.2.3. They have been calculated only for the general case, as it occurs most commonly.

Table 18. Awake time of nodes and power consumed

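| No. of neighbors | Case | Awake time in 1s interval, without routing (ms) | Awake time in 1s interval, with routing (ms) | Power consumed without routing | Power consumed with routing (mVs) | Power consumed with routing (mA) | Power consumed with routing (mW) | Additional power with routing |
|---|---|---|---|---|---|---|---|---|
| 1 | Best | 41.81 | 116.36 | 31.46 mVs, 75 mA, 56.25 mW | 88.6 | | | |
| 1 | General | 41.81 | 145.45 | 31.46 mVs, 75 mA, 56.25 mW | 140.39 | 96 | 92.16 | 35.91 mW (increase = 63.84 %) |
| 1 | Worst | 41.81 | 156.36 | 31.46 mVs, 75 mA, 56.25 mW | 162.97 | | | |
| 2 | Best | 41.81 | 121.81 | 31.46 mVs, 75 mA, 56.25 mW | 84.34 | | | |
| 2 | General | 41.81 | 223.63 | 31.46 mVs, 75 mA, 56.25 mW | 220.44 | 98 | 96.04 | 39.79 mW (increase = 70.73 %) |
| 2 | Worst | 41.81 | 263.63 | 31.46 mVs, 75 mA, 56.25 mW | 310.62 | | | |
| 3 | Best | 41.81 | 272.71 | 31.46 mVs, 75 mA, 56.25 mW | 251.29 | | | |
| 3 | General | 41.81 | 278.18 | 31.46 mVs, 75 mA, 56.25 mW | 280.73 | 100 | 100 | 43.75 mW (increase = 77.77 %) |
| 3 | Worst | 41.81 | 287.26 | 31.46 mVs, 75 mA, 56.25 mW | 267.73 | | | |
| 4 | Best | 41.81 | 282.63 | 31.46 mVs, 75 mA, 56.25 mW | 261.98 | | | |
| 4 | General | 41.81 | 326.21 | 31.46 mVs, 75 mA, 56.25 mW | 310.8 | 103 | 106.09 | 49.84 mW (increase = 88.6 %) |
| 4 | Worst | 41.81 | 340.22 | 31.46 mVs, 75 mA, 56.25 mW | 390.37 | | | |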
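The "additional power with routing" figures in Table 18 follow from the general-case power with and without routing. The short sketch below recomputes them; the function name is hypothetical, and the input values are simply the general-case milliwatt figures from the table.

```python
P_WITHOUT_ROUTING_MW = 56.25  # power without routing, from Table 18

# General-case power with routing (mW), keyed by number of neighbors.
P_WITH_ROUTING_MW = {1: 92.16, 2: 96.04, 3: 100.0, 4: 106.09}


def additional_power(p_without_mw: float, p_with_mw: float):
    """Return (extra power in mW, relative increase in percent)."""
    extra = p_with_mw - p_without_mw
    return extra, 100.0 * extra / p_without_mw


for neighbors, p_with in P_WITH_ROUTING_MW.items():
    extra_mw, pct = additional_power(P_WITHOUT_ROUTING_MW, p_with)
    print(f"{neighbors} neighbor(s): +{extra_mw:.2f} mW ({pct:.2f} %)")
```

The recomputed percentages agree with the tabulated ones to within 0.01 percentage points; the small differences come from how the last digit was rounded.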

A node wakes up less often when it has one neighbor than when it has more. The awake time increases drastically when the number of neighbors increases from one to two. This is because the node has to wake up within shorter time intervals to catch the blinks from both neighbors. However, in the general case the awake time in a 1s interval is more or less constant beyond 2 neighbors, provided the blink interval of all the neighbors is the same (in this case, 1s). This is because it is less probable for 3 or more neighbors to have their RX windows open (which happens every blink interval) in a single 1s interval. The worst case scenario mainly covers the instances when expected blinks were not received by a node, causing it to stay awake for the wake up before and extra wait durations, which total 100 ms. The best case scenario is the close-to-ideal situation in which the node doesn’t have to wait longer than necessary for the expected blink and goes back to sleep immediately after. However, these values fluctuate between experiments due to the random number backoff at the routing layer, which delays the sending of packets by a random value.

[Chart: awake time in ms (y-axis, 0 to 400) against number of neighbors (x-axis, 1 to 4), with the series "Best case", "General" and "Worst case".]

Figure 33. Awake time of node with routing

The additional power consumption with routing in swarm has been calculated for the general case scenario because it is more realistic, as it combines occurrences of both best and worst cases. This is the power profile that a node exhibits for most of its lifetime of operation. Worst case scenarios are seen mostly when a node’s position is changed, as it loses its previous neighbors and hence waits 100 ms (extra wait duration) for a blink that doesn’t arrive, or when it disconnects from the mesh. However, these events are temporary and last only for the time required to stabilize before the node goes back to the general case. The values measured here are for a test setup which can be optimized for lower power consumption by reducing the RX window to 20 ms, the wake up before duration to a value greater than 30 ms (preferably 40 ms-60 ms) and the extra wait duration to 60 ms (> wake up before).

Result – The duration for which a node is awake increases to some extent with the number of its neighbors and remains nearly constant after that. The same holds for power consumption. However, power consumption can be reduced by reducing the values of the parameters: RX window (> 20 ms), interval to wake up before blink arrival (> 30 ms) and additional duration to wait if a blink is not received (> wake up before).

6.1.5. Time taken to transfer data over varying hops

The unicast and broadcast data packets are routed through the mesh nodes to reach their destination(s). The routing information should provide the original source, original destination, current node and next hop. The first two fields are present in the L2R IE, which should be appended to every data packet to be routed (unicast or broadcast) and should not be altered by intermediate nodes, while the other two fields are present in the MAC header and are changed at every hop. The routing/forwarding of packets by intermediate nodes is done as described in 3.1.6.1 (Unicast routing) and 3.1.6.2 (Broadcast routing). The time taken for a data packet to travel from the source to the destination(s) across different depths and numbers of hops is tabulated below. This experiment has been carried out for a linear topology, as it provides the highest depth for a given number of nodes compared to other topologies. All the nodes have a blink interval of 5s.
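The split of routing information between the L2R IE (fixed end to end) and the MAC header (rewritten at every hop) can be sketched as follows. This is an illustrative model only: the class names, field names and the `next_hop_table` lookup are hypothetical, and the actual next-hop selection is the one described in sections 3.1.6.1 and 3.1.6.2.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class L2RIE:
    """End-to-end fields; must not be altered by intermediate nodes."""
    original_source: str
    original_destination: str


@dataclass(frozen=True)
class MacHeader:
    """Per-hop fields; rewritten by every forwarding node."""
    current_node: str
    next_hop: str


@dataclass(frozen=True)
class RoutedFrame:
    mac: MacHeader
    l2r: L2RIE
    payload: bytes


def forward(frame: RoutedFrame, this_node: str,
            next_hop_table: Dict[str, str]) -> Optional[RoutedFrame]:
    """Return the frame as re-sent by `this_node`, or None on arrival."""
    destination = frame.l2r.original_destination
    if destination == this_node:
        return None  # frame has reached its original destination
    new_mac = MacHeader(current_node=this_node,
                        next_hop=next_hop_table[destination])
    return replace(frame, mac=new_mac)  # L2R IE and payload are untouched


# Node A sends to E through the chain A-B-C-D-E; B forwards towards C.
sent = RoutedFrame(MacHeader("A", "B"), L2RIE("A", "E"), b"data")
relayed = forward(sent, "B", {"E": "C"})
```

Making the records immutable is a design choice of this sketch: it makes it obvious that forwarding produces a new MAC header while the L2R IE is carried over unchanged.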

Table 19. Time to transmit data over multiple hops

Time taken to reach each depth (in seconds):

| Type of data | From depth | Depth 0 | Depth 1 | Depth 2 | Depth 3 | Depth 4 |
|---|---|---|---|---|---|---|
| Broadcast | Mesh root (0) | - | 2 | 7 | 7.159 | 17 |
| Broadcast | 1 | 4 | - | 4.220 | 9 | 15 |
| Broadcast | 2 | 6 | 3.532 | - | 3.639 | 8 |
| Broadcast | 3 | 14 | 11 | 6.776 | - | 6.779 |
| Broadcast | 4 | 20 | 11 | 8 | 3 | - |
| Unicast | 0 to 4 | - | - | - | - | 24 |
| Unicast | 0 to 3 | - | - | - | 18 | - |
| Unicast | 0 to 2 | - | - | 11 | - | - |
| Unicast | 0 to 1 | - | 8 | - | - | - |
| Unicast | 4 to 0 | 22 | - | - | - | - |
| Unicast | 3 to 0 | 21 | - | - | - | - |
| Unicast | 2 to 0 | 15 | - | - | - | - |
| Unicast | 1 to 0 | 4 | - | - | - | - |

From the readings, it is observed that the neighbors of the packet source receive the data earliest, as they communicate more frequently (at least every blink interval or at most every TC IE interval). In the case of broadcast data, the source creates unicast packets for all its neighbors. It wakes up to receive a blink from a neighbor and transmits the data packet concatenated with the TC IE (and other pending packets for that neighbor). The neighbor in turn forwards it to its own neighbors, and so on. It can be seen that the time difference between 2 adjacent nodes receiving the packet is usually equal to the blink interval (= 5s). Where it is less than 5s, it is because the nodes are not synchronized with respect to when their blinks arrive. For example, in the case of broadcast data from the mesh root, the node at depth 3 receives the data within 159 ms of depth 2 receiving it. This could be because the blink from depth 3 is scheduled to arrive 159 ms after depth 2 sends a blink.

Where it takes longer for the data to reach its destination, a likely reason is concatenation: when a data packet is aimed at the mesh root or a lower depth, the data will probably be concatenated with the L2R IEs of successors and the TC IE of the transmitting node. In concatenation, IEs are added first and data at the end. If too many successors are present or the data is too big, the frame could exceed the 128-byte limit of a swarm packet, and the data will not be sent with the current concatenated frame. Instead, it will be sent with the next blink when a TC IE is to be sent.
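The effect of unsynchronized blinks on per-hop delay can be reproduced with a small model of blink-induced forwarding. It is a sketch under assumptions: each node forwards a packet only at the first blink of the next node that arrives strictly after the packet was received, and the blink phases below are hypothetical values chosen for illustration, not measurements.

```python
from typing import List


def next_blink_after(t_ms: int, phase_ms: int, interval_ms: int) -> int:
    """First blink of a node (phase + k * interval) strictly after t_ms."""
    if t_ms < phase_ms:
        return phase_ms
    return phase_ms + ((t_ms - phase_ms) // interval_ms + 1) * interval_ms


def delivery_times(t_send_ms: int, phases_ms: List[int],
                   interval_ms: int) -> List[int]:
    """Arrival time of a packet at each successive hop of a linear chain.

    phases_ms[i] is the (hypothetical) blink phase of the i-th node
    downstream of the source.
    """
    times = []
    t = t_send_ms
    for phase in phases_ms:
        t = next_blink_after(t, phase, interval_ms)
        times.append(t)
    return times


# Blink interval 5 s; the third node blinks 159 ms after the second one.
arrivals = delivery_times(0, [2000, 2000, 2159, 2000], 5000)
print(arrivals)  # [2000, 7000, 7159, 12000]
```

With these phases, most hops cost one full blink interval, while the hop to the third node costs only 159 ms because its blink happens to follow shortly after the previous node received the data. The model ignores deferral caused by the 128-byte concatenation limit, so it does not account for the longer delays in Table 19.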

Result – The time taken for data to reach destination is higher as we move away from the source and is lowest for the immediate neighbors. Time taken would also be lower for a lower blink interval and TC IE interval as the RX window among nodes will be available more often.


6.1.6. Memory consumed

Routing adds a substantial amount of information to be stored and updated for maintenance, blink-induced communication, etc. Some of the structures that are stored on-chip are the routing table, the schedule and the storage for pending packets. Apart from the on-chip memory, routing also influences the sizes of the packets that are transmitted, due to the addition of IEs and concatenated frames. All these memory and packet usages are accounted for below.

1. Size of Information Elements and beacons (packet sizes and usage in AIR)
- TC IE = 19 bytes
- L2R IE = 16 bytes
- EB = Header (3 bytes) + sizeof(IE used)
- EBR = Header (3 bytes)
- Data frame = User data (variable) + EB header (3 bytes) + L2R IE (16 bytes)
- Long Nested IE = Header (3 bytes) + IE1 header (2 bytes) + IE1 content (sizeof(IE1)) + IE2….
- AIR usage/traffic
  i) TC IE = 22 bytes every TCIEInterval
  ii) L2R IE = 19 bytes every L2RIEInterval
  iii) Long nested IE (concatenated frame) = 128 bytes maximum
    (a) 1 L2R IE and 1 TC IE = 42 bytes every L2RIEInterval
    (b) 2 L2R IE and 1 TC IE = 60 bytes every 2*L2RIEInterval
  Note: additional packet size for data
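The frame sizes quoted in item 1 above follow directly from the header and IE sizes. The sketch below recomputes them; the constant and function names are hypothetical, while the byte counts are the ones listed above.

```python
FRAME_HEADER_BYTES = 3    # EB / Long Nested IE header
SUB_IE_HEADER_BYTES = 2   # header of each IE inside a concatenated frame
TC_IE_BYTES = 19
L2R_IE_BYTES = 16
MAX_FRAME_BYTES = 128     # limit of a swarm packet


def single_ie_frame(ie_bytes: int) -> int:
    """Size of a beacon that carries exactly one IE."""
    return FRAME_HEADER_BYTES + ie_bytes


def concatenated_frame(n_l2r: int, n_tc: int = 1) -> int:
    """Size of a Long Nested IE with n_l2r L2R IEs and n_tc TC IEs."""
    return (FRAME_HEADER_BYTES
            + n_l2r * (SUB_IE_HEADER_BYTES + L2R_IE_BYTES)
            + n_tc * (SUB_IE_HEADER_BYTES + TC_IE_BYTES))


def room_left(n_l2r: int, n_tc: int = 1) -> int:
    """Bytes still free in a concatenated frame before the 128-byte limit."""
    return MAX_FRAME_BYTES - concatenated_frame(n_l2r, n_tc)


print(single_ie_frame(TC_IE_BYTES))   # 22
print(single_ie_frame(L2R_IE_BYTES))  # 19
print(concatenated_frame(1))          # 42
print(concatenated_frame(2))          # 60
```

The same formula gives 78 bytes for 3 L2R IEs plus 1 TC IE, which matches the largest "additional" size reported for the linear topology in Table 16. `room_left` only reports the remaining bytes; how any appended data is framed inside that space is not specified here.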

2. Static memory allocation
This shows the memory usage on the device for its lifetime of operation.
- Mesh table = 12 bytes + 16 * sizeof(NT entry) = 668 bytes
  sizeof(NT entry) = 11 + sizeof(Reachable node list) = 41 bytes
  sizeof(Reachable node list) = 6 * 5 nodes = 30 bytes
- Time Table = 5 + 16 * sizeof(TT entry) = 1749 bytes
  sizeof(TT entry) = 34 + 5 * sizeof(Remote list) = 109 bytes
  sizeof(Remote list) = 15 bytes
- LSN-SA records = 1 + 6 * sizeof(record entry) = 247 bytes
  sizeof(record entry) = 11 bytes + sizeof(restricted destinations) = 41 bytes
  sizeof(restricted destinations) = 5 * 6 byte address = 30 bytes
- Mesh attributes = 13 bytes
- EB List – to store packets for sleeping nodes
  EB list = 6 * sizeof(EBList entry) = 906 bytes
  sizeof(EBList entry) = 23 + max_payload = 151 bytes
  max_payload = 128 bytes

To generalize the memory occupied, let’s denote the various parameters by variable names:
N = maximum no. of neighbors
R = maximum no. of reachable nodes per neighbor
L = no. of LSN-SA records
D = no. of restricted addresses per LSN-SA record
E = no. of EB list records


Table 20. On-chip memory occupied by routing

| Structure | Use | Size | Generalized size (bytes) |
|---|---|---|---|
| Mesh Table | Stores neighbor depth and PQM, which are used to determine the next hop and an energy efficient route to the destination | 668 bytes | 12 + N(11 + 6R) |
| Time table | Stores future blink arrivals and IE durations. It is used for route maintenance and blink-induced communication | 1749 bytes | 5 + N(34 + 15R) |
| LSN-SA Records | Stores LSN-SA information about frames that have been forwarded and is used to avoid loops and duplicate transmissions | 247 bytes | 1 + L(11 + 6D) |
| Mesh attribute list | Stores current node specific parameter values, like IE intervals and LsnSaRecordTimeout | 13 bytes | 13 |
| EB list | Used to store pending packets meant for neighbors that are in sleep state | 906 bytes | 151E |
| Total RAM occupied | | 3583 bytes (≈ 3.5 kbytes) | 31 + N(45 + 21R) + L(11 + 6D) + 151E |
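The generalized sizes in Table 20 can be checked with a few lines of Python. The function name and dictionary keys are hypothetical; the formulas and default values (N = 16, R = 5, L = 6, D = 5, E = 6) are the ones used in the static memory breakdown above.

```python
def routing_ram_bytes(N: int = 16, R: int = 5, L: int = 6,
                      D: int = 5, E: int = 6) -> dict:
    """Static RAM used by routing, per structure and in total (bytes)."""
    sizes = {
        "mesh_table": 12 + N * (11 + 6 * R),
        "time_table": 5 + N * (34 + 15 * R),
        "lsn_sa_records": 1 + L * (11 + 6 * D),
        "mesh_attributes": 13,
        "eb_list": 151 * E,
    }
    sizes["total"] = sum(sizes.values())
    return sizes


current = routing_ram_bytes()
print(current["total"])  # 3583

# Example of a what-if: halving the neighbor table to 8 entries.
smaller = routing_ram_bytes(N=8)
print(smaller["total"])  # 2383
```

Summing the per-structure formulas gives 31 + N(45 + 21R) + L(11 + 6D) + 151E, since the per-neighbor constants 11 and 34 add up to 45; with the default values this evaluates to the 3583 bytes reported in the table.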

3. Dynamic memory allocation
This section accounts for the memory occupied on the heap. This is used to store temporary data, usually before a node joins a mesh, and is freed after joining.
- Mesh List – To store information about the different meshes found during mesh discovery. The mesh list is erased after a mesh is joined, and the data regarding the joined mesh is transferred to the mesh table. It has no fixed memory allocation, and the occupied memory is increased when required.
  Mesh List = no. of meshes found * sizeof(Mesh Table) = no. of meshes found * 668 bytes
- Temporary Timetable (TempTT) – To store the blink arrival and blink interval of neighbors discovered before joining a mesh.
  TempTT = no. of distinct blink sources * 14 bytes

6.1.7. Scalability

The Layer 2 Routing protocol implemented here is hierarchical in nature, which is a well-known technique for good scalability and efficient communication. This allows nodes to be divided into meshes or clusters, each managed by a mesh root, which facilitates extension of the network to hundreds of nodes. As the main concern is data collection, the mesh roots of multiple meshes can send the data from their respective meshes to the host controller or base station.

With the current implementation, where the values were decided arbitrarily by me, the following limits exist for various structures that affect the scalability.


- No. of entries in NT = 16
- No. of reachable nodes for each neighbor = 5
- Max depth of mesh = 10 hops

Taking into consideration the maximum number of neighbors about which a node can store information at a time, and that the mesh root knows all the nodes that exist in the mesh (through its neighbor table and the reachable node lists of its neighbors), we can calculate the maximum number of nodes supported in a single mesh as

= 16 neighbors of mesh root + (node list of each neighbor) * 16
= 16 + (5 reachable nodes) * 16
= 96 nodes (excluding mesh root)

or 97 nodes when the mesh root is included. With multiple such meshes, each headed by a mesh root or gateway, the network can be expanded to 96n nodes (excluding mesh roots, or 97n when mesh roots are counted), where n is the number of meshes. This is the range of a mesh in which the mesh root is able to send a directed unicast message to any node in the mesh. So, the features available here are data collection and unicast and broadcast routing from and to the mesh root. In this configuration, however, the mesh reaches only up to a depth of 6 (a neighbor at depth 1 followed by its 5 reachable nodes), not 10.

However, the range can be greatly increased with the same configuration if the features available are data collection, broadcast routing to and from the mesh root and unicast routing from any node to the mesh root. If unicast routing from the mesh root can be forgone for nodes beyond the 96 calculated above, then for the maximum allowable depth of 10, 4 more nodes can be added behind each neighbor of the mesh root. These nodes will not be known to the mesh root, as it can store a maximum of 5 reachable nodes per neighbor. However, they can be stored in the reachable lists of the successors of the mesh root, because those still have space. This adds 4*16 = 64 nodes to the 97 nodes calculated before.

Therefore, for a maximum depth of 10, and with the features of data collection, broadcast routing from and to the mesh root and unicast routing to the mesh root, the range of the mesh

= 97 + 4 * 16 neighbors of mesh root
= 161 nodes
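The node counts derived above can be reproduced with a small calculation. The function name and its parameters are hypothetical, and the sketch assumes, as the derivation does, that the nodes behind each neighbor of the mesh root form a linear chain.

```python
def mesh_capacity(n_neighbors: int = 16, n_reachable: int = 5,
                  max_depth: int = 10):
    """Return (fully routable nodes, extended nodes), excluding the root.

    Fully routable: nodes to which the mesh root can send unicast data.
    Extended: also counts deeper nodes that can only be reached by
    broadcast, but can still send unicast data to the mesh root.
    """
    fully_routable = n_neighbors * (1 + n_reachable)
    depth_known_to_root = 1 + n_reachable          # neighbor + its chain
    extra_per_branch = max(0, max_depth - depth_known_to_root)
    extended = fully_routable + n_neighbors * extra_per_branch
    return fully_routable, extended


routable, extended = mesh_capacity()
print(routable, routable + 1)   # 96 97  (without / with mesh root)
print(extended, extended + 1)   # 160 161
```

Raising `max_depth` increases only the extended count, whereas raising the neighbor table or reachable-list sizes increases both, at the cost of the additional RAM discussed in section 6.1.6.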

If the maximum depth is higher, the range is even higher. All of this is possible with the current implementation and the limits specified above. So, if the mesh root mainly sends configuration messages to mesh nodes either as unicast (up to 96 nodes) or as broadcast, the scalability of the mesh can be increased to a large extent. However, for faster communication, it is better to increase the range by having multiple meshes instead of a long linear chain of nodes, as a long chain increases the propagation delay in message delivery.

Result – The L2R protocol, with the current implementation and the constraints discussed above, scales to 97 nodes, which can be further extended to 161 nodes. Other ways to increase the range are to use multiple meshes or to increase the storage occupied by routing. If the sizes of the neighbor table and reachable node list are increased, the maximum number of nodes supported in a single mesh will increase. This also requires the schedule size to be increased. For fast and efficient communication, it is also better to increase the size of the storage for pending messages, but this is optional if IEs are not very frequent.


Chapter 7: Conclusions

7.1. Conclusion

The results and evaluations in chapter 6 are encouraging. It has been shown that the implementation is effective in achieving multi-hop communication among swarm nodes and in allowing out-of-range devices to communicate. It was successfully integrated with the existing hardware and doesn’t interfere with the usual swarm operation of ranging and broadcasting Node ID notifications. The additional changes are only in the software and in the memory occupied, which is only about 3.5 kbytes out of the available 16-64 kbytes of on-chip memory (depending on the hardware). The parameters used in routing can be made user controlled by adding a few commands to the existing swarm API. This will allow the user to decide the maximum number of nodes to be supported, the memory assigned to routing, the frequency of message exchange, etc.

Integrating routing into swarm was advantageous, as the range of operation increased to a large extent. The Layer 2 Routing protocol was a good choice due to its hierarchical nature, which is known for good scalability and efficient communication. The presence of a gateway for data collection in swarm fits well with the protocol’s requirement of forming a mesh around a mesh root (in this case, the gateway). Before, nodes could only communicate with their immediate neighbors, and hence for data collection the gateway needed to be in range of all the nodes. This resulted in small clusters with a few nodes and a gateway, and increasing the number of nodes required the deployment of more gateways. With routing and the current implementation parameters, however, only one gateway is needed for up to 96 nodes. At the cost of removing unicast routing from the mesh root to additional nodes (apart from the 96 nodes), the total number of nodes in a mesh can reach up to 160 (for a maximum depth of 10) or more (for higher depths). Hence, adding routing to swarm makes this location system scalable.

One of the attractive features of swarm is that it is a low power location system, and routing inevitably adds an overhead of increased power consumption. The synchronization technique that was added to cope with this proved to be a well-suited solution. Unlike TSCH and other heavy synchronization mechanisms, this scheme doesn’t require external clock(s) or a manager to provide network-wide synchronization. Instead, each node is responsible for its own synchronization with its immediate neighbors. The inherent feature of swarm nodes broadcasting blinks periodically was used to achieve power efficient, blink-induced communication. This is reliable because a message is sent only when a blink is received from a neighbor, indicating that it is awake. As a part of this scheme, the broadcast messages were made unicast to accommodate the different wake-up times of nodes. This additionally helped reduce traffic on AIR and made the communication more reliable, as swarm acknowledges unicast messages but not broadcast messages. Hence, the synchronization scheme increases the power efficiency of routing by allowing nodes to be in sleep state and still be able to wake up and receive routing updates or data.

The features of concatenation and random backoff at the routing layer before packet transmission were added as improvements to the basic functionalities discussed above. Concatenation was a good optimization to reduce traffic and prevent overflow of the storage for pending messages (EB List). Random backoff before transmission prevents collisions when packets are sent to the same destination by multiple nodes (hidden node problem). This allows the protocol to be used in linear topologies where CSMA is not effective and, to a certain extent, in the absence of CSMA or collision avoidance schemes in swarm.

7.2. Future work

All the goals listed in section 1.4 have been achieved, and some further improvements have been added to the implementation. To prototype swarm with routing for the market, the following can be done:

1. Decide the routing parameters that should be adjustable by the user or client. For these parameters, commands should be added to the swarm API. Some of the parameters that are hardcoded in this implementation and can be made user controlled are:
   a) Deciding the mesh root
   b) Command a node to join mesh
   c) Maximum number of neighbors in neighbor table
   d) Maximum number of reachable node addresses that can be stored
   e) Maximum depth allowed in mesh
   f) TC IE and L2R IE intervals
   g) LSN-SA record timeout
   h) Maximum TC IE, L2R IE and blinks to miss before deleting corresponding node
   i) Listen duration for a node to listen for EBR replies before joining mesh
   j) Separate data transmission commands for broadcast and unicast data with and without routing
   k) Turn routing ON or OFF
   l) Turn mesh wide broadcast of blinks and ranging results ON or OFF.
2. Currently, all communication is blink dependent. But in case the destination is always awake (like the gateway), the pending packets should be sent immediately and not stored till the next blink arrives. This will make the communication faster and can be implemented by adding a field indicating whether the neighbor is in power saving mode or not.
3. More tests can be done to determine the optimal and minimum values of RX window, blink, TC IE and L2R IE intervals and wake up before and extra wait durations needed for routing. This will increase the power efficiency by ensuring that the receiver is not ON for longer than necessary. If possible, make most of the values dependent on the blink interval, so that the user doesn’t have to explicitly set these values, thereby reducing error probability.
4. Improve the random backoff before transmission to work with CSMA, so that even a little backoff on the routing layer can prevent collision. This will result in a cumulative effect where, if the small random backoff at routing is not sufficient to avoid collision, it can be corrected by CSMA in the MAC layer (or PHY layer, in case of swarm).
5. Modify the frame format of the concatenated frame to achieve better optimization in terms of size and priorities of contents.
6. Perform exhaustive tests of the routing with a large number of nodes to analyze if it fails for bigger meshes.
7. Mesh selection by the next higher layer (APP): This could be useful when some metrics are available only at the APP layer and can be used to choose the mesh with the best metric.
8. End-to-end acknowledgement routed from distant destination to source
9. Addition of more metric units to evaluate link costs. More custom units can be added in future.
10. Multicast routing
11. Peer-to-peer (P2P) routing: routing between two devices not necessarily traversing the mesh root
12. Choice between hop-by-hop or source routing for DS route establishment
13. A device can be allowed to be part of multiple meshes

I aim to make some of these improvements and changes in future, with the main focus on energy efficiency.


References

[1] http://nanotron.com/EN/CO_overview.php
[2] http://nanotron.com/EN/PR_protect.php
[3] http://nanotron.com/EN/CO_technology.php
[4] Yosi Ben-Asher et al., ‘Scalability Issues in Ad-Hoc Networks: Metrical Routing Versus Table Driven Routing’, URL: http://www.openu.ac.il/personal_sites/moran-feldman/publications/WPCJ10.pdf
[5] Luis Javier García Villalba et al., ‘Routing Protocols in Wireless Sensor Networks’, ISSN 1424-8220, October 2009
[6] I.F. Akyildiz et al., ‘Wireless sensor networks: a survey’, Computer Networks 38 (2002) 393–422, December 2001
[7] Neha Rathi et al., ‘A review on routing protocols for application in wireless sensor networks’, International Journal of Distributed and Parallel Systems (IJDPS) Vol.3, No.5, September 2012
[8] https://ttssh2.osdn.jp/index.html.en
[9] https://gephi.org/features/
[10] https://marketplace.gephi.org/plugin/graph-streaming/
[11] https://github.com/totetmatt/GephiStreamer
[12] Committee of the IEEE Society, ‘Draft Recommended Practice for Routing Packets in 802.15.4 Dynamically Changing Wireless Networks’, February 2016
[13] AODV, https://www.ietf.org/rfc/rfc3561.txt
[14] DSR, https://www.ietf.org/rfc/rfc4728.txt
[15] Gossiping, http://web.mit.edu/devavrat/www/GossipBook.pdf
[16] Kunal M Pattani, Palak J Chauhan, ‘SPIN protocol for wireless sensor network’, International Journal of Advance Research in Engineering, Science & Technology (IJAREST), May 2015
[17] Chunyao FU, Zhifang JIANG et al., ‘An Energy Balanced Algorithm of LEACH Protocol in WSN’, IJCSI International Journal of Computer Science Issues, January 2013
[18] Andy Stanford-Clark and Hong Linh Truong, ‘MQTT for Sensor Networks (MQTT-SN) Protocol Specification’, November 14, 2013
[19] https://tools.ietf.org/html/rfc7554
[20] IEEE Std 802.15.4e-2012, ‘MAC sublayer’, IEEE Computer Society, April 2012
[21] nanotron Technologies, ‘nanoPAL Air Interface Description’, March 2011, http://wless.ru/files/Nanoloc/nanoPAL%20Air%20Interface%20Description.pdf


Appendices

Appendix 1: Common swarm commands used

Note: The Node IDs or addresses of swarm nodes are 6 bytes in size.

A.1.1. swarm radio Setup Commands

1. GNID (0x00) Get Node ID of the node connected to host

GNID Return value description: = configured 6-byte Node ID of swarm node Range: 000000000001 … FFFFFFFFFFFE

2. SSET (0x01) Save SETtings; saves all settings including Node ID permanently to EEPROM. This must be done after changing settings if they should persist through a power cycle or switch off state.

SSET
Return value description: = Result of saving operation
Range: 0 … 1
0 = Saving of all parameters successfully verified
1 = Saving of parameters not successful; verification failed

3. GSET (n.a.) Get current SETtings (node configuration). Reads current device configuration. First line is the number of following lines. All others state the name of parameter separated with ‘:’ and value. The value depends on parameter.

GSET Return value description: #\r\n:\r\n .. Number of lines after this line Range: 000 … 255 Name of following parameter value. Depends on parameter.

4. SPSA (0x04) Set Power Saving Active; sets power management mode ON/OFF

SPSA
Range: 0, 1 and 3
0 = Power saving OFF, device is continuously receiving
1 = Power saving ON, wake up on any interrupt
3 = Power saving ON, wake up on external interrupt and timer.
Return value description: =


5. STXP (0x05) Sets transmission (TX) Power of the node

STXP transmission power Range: 00 … 63 Return value description: =

6. SSYC (0x06) Set the PHY SYnCword of swarm node (0 … 12). The node will only listen to messages with its same syncword.

SSYC Range: 0 … 12 Return value description: =

A.1.2. Data Communication Commands

1. SDAT (0x21) Sends DATa to node ID

SDAT : SDAT : Sends of length to node .Depending on

6 byte Node ID of ranging partner node. Note:If ID is all FF than it send always after receiving a node ID notification until Range: 000000000001 … FFFFFFFFFFFE

length of payload in bytes (hex) Range: 01 … 80

payload to be transmitted Range: 00 … FF

maximum time in ms to wait for the node blink. Range: 0 … 65000 Return value description: : = : =

Status of transmission.
Range: 0 ... 1
0 = Successfully transmitted
1 = Transmission failed
2 = Overload, try again later


ID of , used to identify the asynchronous return value (*SDAT) Range: 00000000 … FFFFFFFF

2. BDAT (0x22) Broadcasts DATa

BDAT : BDAT : BDAT : Broadcasts of length to all nodes

length of payload in bytes (hex) Range: 01 … 70

payload to be transmitted Range: 00 … FF

Range: 0 … 65000 0 = timeout disabled, broadcast will always occur. > 0 = time in ms during which the node transmits after a node ID blink.

Return value with : = : = : =0 Status of transmission. Format: 1 byte (dec) Range: 0 ... 1 0 = Successfully transmitted 1 = Transmission failed ID of , used to identify the asynchronous return value (*BDAT) Format: 8 byte (hex) Range: 00000000 … FFFFFFFF

3. FNIN (0x28) Fill data into Node ID Notification packets

FNIN : FNIN 0> : Fills the data buffer of each node ID notification with of length . This data will be transmitted with every node ID notification until deleted by host application. The data is not visible for swarm devices, because no *DNO is generated. This command is used to provide data to other products such as TDOA RTLS.

length of ranging data payload in bytes (hex)

Range: 00 … 5B 0 = delete data payload to be transmitted Range: 00 … FF Return value description = Status on ranging data buffer fill operation Range: 0 … 1 0 = successful 1 = not successful

4. FRAD (0x2A) Fills the RAnging data buffer. This data will be transmitted with the next RATO operation

FRAD : Fills the ranging data buffer with of length . The ranging data is contained within every following ranging packet itself. Each receiving swarm will generate a *DNO. This causes a lot of traffic for each swarm device.

length of ranging data payload in bytes (hex) Range: 00 … 74 0 = delete data payload to be transmitted Range: 00 … FF Return value description = Status on ranging data buffer fill operation Range: 0 … 1 0 = successful 1 = not successful

A.1.3. swarm radio Node Identification

1. SBIV (0x31) Sets the Broadcast Interval Value (or blinking rate) in which the Node ID will be sent

SBIV

A.1.4. Medium Access Commands

1. SRXW (0x40) Sets reception, RX, Window during which the receiver listens after sending its broadcast ID in power saving mode.

SRXW

If Time > NodeID broadcast interval the receiver is always active Return value description =

Appendix 2: Frame formats

A.2.1. NIN packet format

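NIN packet parts: Blink header, swarm header, swarm sensor data, User data (if any)

Blink header fields: Message tag, Protocol version, Blink ID, Payload length, Blink interval, RX slot counter, Sensor protocol (bit positions marked in the figure: 0, 7, 15, 23, 31, 55, 63, variable)

Sensor protocol fields: S_LEN, S_TYPE, S_DATA (bit positions marked in the figure: 0, 7, 15, variable)

Figure 34. Blink packet format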

Blink Header:

Message tag: This field is equal to 0x02, a type used in swarm to allow localization with TDOA.

Protocol version: This field is equal to 0x13 and used when the frame carries sensor data.

Blink ID: It is an up-counter and increases for every packet transmitted. Can be used to identify packet losses.

Payload length: This field represents the number of following bytes until packet ends.

Blink interval: It indicates the time until the next blink occurs, in milliseconds. It is LSB first.

RX slot counter: The RX slot counter is a backchannel counter which indicates the next RX slot. The receiver of this device is active if it is 0. Combined with the Blink interval, it gives the time in milliseconds until the receiver is next active.
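As a rough illustration of how a receiver might use the Blink ID and RX slot counter fields, the sketch below counts missed blinks and estimates the wait until the sender's receiver is active. Both function names are hypothetical. The sketch assumes an 8-bit Blink ID that wraps at 256 and assumes the wait equals the RX slot counter multiplied by the Blink interval; neither detail is stated explicitly in the field descriptions.

```python
def lost_blinks(prev_id, new_id, modulus=256):
    """Number of blinks missed between two consecutively received Blink IDs.

    Assumes the Blink ID is an up-counter that wraps at `modulus`
    (256 is an assumption, not a documented field width).
    """
    return (new_id - prev_id - 1) % modulus


def ms_until_rx(blink_interval_ms, rx_slot_counter):
    """Estimated time in ms until the sender's receiver is active.

    Assumes one RX slot counter step per blink, so the wait is
    counter * interval; a counter of 0 means the receiver is active now.
    """
    return blink_interval_ms * rx_slot_counter


missed = lost_blinks(254, 1)   # IDs 255 and 0 were not received
wait = ms_until_rx(500, 3)     # three blink intervals of 500 ms
```

A repeated Blink ID would be reported as a large gap by this modulo arithmetic, so a real receiver would likely filter duplicates first.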

Sensor protocol:

S_LEN: This field indicates the length of following bytes.

If length is 0, no S_TYPE or S_DATA field is present.

If length is >0, the last byte is S_TYPE with NO_SENSOR.

S_TYPE: The S_TYPE field indicates a specific sensor. The data length of each sensor type must be known to the receiver.

Table 21. S_TYPE field in Sensor protocol

S_TYPE    Description
0x00      NO_SENSOR (end of sensor protocol)
0x01      BATTERY (2 Bytes - LSB first - uint16_t - 100mV)
0x02      TEMPERATURE (2 Bytes - LSB first - int16_t - 1°C)
0x03      MEMS (3x2 Bytes - LSB first - int16_t - [x|y|z] - 1mg)
0x04      GPIO (1 byte)
0x05      TIMESTAMP (4 Bytes - LSB first - uint32_t)

S_DATA: The S_DATA field contains the sensor data. The length depends on S_TYPE. Note: S_TYPE + S_DATA repeats until S_TYPE indicates NO_SENSOR.
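The sensor protocol described above can be decoded with a small loop over S_TYPE + S_DATA records. The following is a minimal sketch, not the firmware's implementation: the function name `parse_sensor_protocol` is hypothetical, and it assumes that S_LEN counts every following byte including the terminating NO_SENSOR type byte. Field sizes and byte order follow Table 21.

```python
import struct

# S_TYPE -> (name, struct format); all multi-byte values are LSB first.
SENSOR_FORMATS = {
    0x01: ("BATTERY", "<H"),      # uint16_t, units of 100 mV
    0x02: ("TEMPERATURE", "<h"),  # int16_t, units of 1 degree C
    0x03: ("MEMS", "<3h"),        # x, y, z as int16_t, units of 1 mg
    0x04: ("GPIO", "<B"),         # 1 byte
    0x05: ("TIMESTAMP", "<I"),    # uint32_t
}
NO_SENSOR = 0x00


def parse_sensor_protocol(buf):
    """Decode S_LEN followed by repeated S_TYPE + S_DATA records.

    Returns (records, consumed): records is a list of (name, values)
    tuples, consumed is the number of bytes read including S_LEN.
    """
    if not buf:
        raise ValueError("empty buffer")
    s_len = buf[0]
    body = buf[1:1 + s_len]
    if len(body) != s_len:
        raise ValueError("truncated sensor protocol block")
    records = []
    pos = 0
    while pos < len(body):
        s_type = body[pos]
        pos += 1
        if s_type == NO_SENSOR:
            break  # end of sensor protocol
        if s_type not in SENSOR_FORMATS:
            raise ValueError("unknown S_TYPE 0x%02X" % s_type)
        name, fmt = SENSOR_FORMATS[s_type]
        size = struct.calcsize(fmt)
        if pos + size > len(body):
            raise ValueError("S_DATA runs past S_LEN")
        records.append((name, struct.unpack_from(fmt, body, pos)))
        pos += size
    return records, 1 + s_len


# Battery = 33 (3.3 V in 100 mV units), temperature = -25, then NO_SENSOR.
block = bytes([0x07, 0x01, 0x21, 0x00, 0x02, 0xE7, 0xFF, 0x00])
records, consumed = parse_sensor_protocol(block)
```

An unknown S_TYPE is treated as an error here because its data length cannot be determined, which is why the length of each sensor type has to be known in advance.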

swarm header:

swarm header fields: Type, Version, Class, Power mode, Wake up reason, Length, Sensor data length (bit positions marked in the figure: 0, 7, 15, 23, 31, 39, 47, 55)

Figure 35. swarm header

TYPE: The TYPE field indicates the format of the swarm protocol header.

Table 22. TYPE field in swarm header

Type           Description
0x60 (NID)     Node ID notification
0x61 (RR)      Ranging result
0x62 (BRDC)    Broadcast packet

Version: The VERSION field indicates format of the swarm protocol header.

Class: The CLASS indicates the device class.

Power mode: The POWER_MODE field indicates the power saving behavior of the device.

Table 23. POWER_MODE field in swarm header

Power mode    Description
0x00          Power saving off
0x01          Power saving active
0x02          Autonomous mode

Wakeup reason: This field indicates the last wakeup source of the device (bitmask).

Length: It indicates number of following bytes until end of packet.

Sensor data length: It indicates the number of bytes of sensor data that follows.
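The swarm header fields listed above could be unpacked on the host as in the sketch below. It assumes each of the seven fields occupies exactly one byte, which is inferred from the bit positions in Figure 35 rather than stated in the text. The names `SwarmHeader` and `parse_swarm_header` are hypothetical, and the sample byte values are made up.

```python
import struct
from collections import namedtuple

SwarmHeader = namedtuple(
    "SwarmHeader",
    "type version dev_class power_mode wakeup_reason length sensor_data_length",
)

# Lookup tables taken from Table 22 and Table 23.
TYPE_NAMES = {0x60: "NID", 0x61: "RR", 0x62: "BRDC"}
POWER_MODES = {
    0x00: "Power saving off",
    0x01: "Power saving active",
    0x02: "Autonomous mode",
}


def parse_swarm_header(buf):
    """Unpack the first 7 bytes of buf as a swarm header (one byte per field)."""
    if len(buf) < 7:
        raise ValueError("swarm header needs at least 7 bytes")
    return SwarmHeader(*struct.unpack_from("<7B", buf, 0))


# Made-up example: a node ID notification from a node in power saving mode.
hdr = parse_swarm_header(bytes([0x60, 0x01, 0x01, 0x01, 0x04, 0x0A, 0x07]))
kind = TYPE_NAMES.get(hdr.type, "unknown")
mode = POWER_MODES.get(hdr.power_mode, "unknown")
```

The wakeup reason is left as a raw integer because it is a bitmask whose bit assignments are not listed in this appendix.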

Appendix 3: List of figures and tables

A.3.1. List of figures

Figure 1. swarm bee LE Functional Diagram and swarm bee LE Module
Figure 2. SDS-TWR
Figure 3. Radios ranging to each other
Figure 4. WSN protocol stack
Figure 5. Swarm architecture
Figure 6. Modified swarm architecture for routing
Figure 7. Starting a mesh
Figure 8. Joining a mesh
Figure 9. US route maintenance
Figure 10. DS route maintenance
Figure 11. Unicast Routing decision based on the NT
Figure 12. Data frame processing
Figure 13. Broadcast routing and frame processing
Figure 14. EB frame format
Figure 15. EBR frame format
Figure 16. TC IE frame format
Figure 17. L2R IE frame format
Figure 18. Long nested IE frame format
Figure 19. NIN frame format
Figure 20. Unicast data frame format
Figure 21. Broadcast frame format
Figure 22. swarm bee LE DK Plus Board
Figure 23. Screenshots from Tera Term
Figure 24. Screenshots of visual representation of routing in swarm at different phases
Figure 25. Oscilloscope screenshot showing power consumption by node with 2 neighbors
Figure 26. Sending NIN and Ranging between 2 swarm nodes
Figure 27. Node waking up and waiting to receive blink
Figure 28. Waveform for power measurement
Figure 29. Topology tested - linear
Figure 30. Time taken to start/join mesh
Figure 31. Maximum packet size transmitted at different node depths in linear topology
Figure 32. Transition of nodes within the mesh
Figure 33. Awake time of node with routing
Figure 34. Blink packet format
Figure 35. swarm header

A.3.2. List of tables

Table 1. Quantitative specification of mesh networking parameters
Table 2. Comparison of different routing protocols
Table 3. Transmission parameters
Table 4. Possible combinations for data transmission
Table 5. swarm power modes
Table 6. Common commands used
Table 7. Elements of an MT Entry
Table 8. Elements of a local NT entry
Table 9. IE Interval units
Table 10. Sub IDs of IEs and data
Table 11. Elements of timetable for neighbors (before joining mesh)
Table 12. Elements of timetable maintained for neighbors (after joining mesh)
Table 13. Elements of timetable maintained for nodes in reachable list
Table 14. Mesh attributes
Table 15. Time taken to resume normal operation for linear topology
Table 16. Packet size and number
Table 17. Time to stabilize after node's position change
Table 18. Awake time of nodes and power consumed
Table 19. Time to transmit data over multiple hops
Table 20. On-chip memory occupied by routing
Table 21. S_TYPE field in Sensor protocol
Table 22. TYPE field in swarm header
Table 23. POWER_MODE field in swarm header

TRITA-ICT-EX-2016:159

www.kth.se