Delft University of Technology Master’s Thesis in Embedded Systems

Indoor Localization using Accidental Infrastructure

Zsolt Kocsi-Horváth

Indoor Localization using Accidental Infrastructure

Master’s Thesis in Embedded Systems

Embedded Software Section
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology
Mekelweg 4, 2628 CD Delft, The Netherlands

Zsolt Kocsi-Horvath´ [email protected]

4th January 2013

Author: Zsolt Kocsi-Horváth ([email protected])
Title: Indoor Localization using Accidental Infrastructure
MSc presentation: 15 January 2013

Graduation Committee:
prof. dr. K. G. Langendoen (chair), Delft University of Technology
dr. ir. A. Phillips, ARM R&D, Cambridge
ir. H. Vincent, ARM R&D, Cambridge
dr. S. O. Dulman, Delft University of Technology
dr. A. Iosup, Delft University of Technology

Abstract

The technological capabilities of embedded devices are rising constantly. We can foresee a near-future scenario where a huge number of semi-intelligent devices are part of our everyday environment: our homes, public places and offices alike. The intelligent thermostat uploads its temperature readings to an online database; the fridge sends a tweet when we are out of milk; the coffee machine texts us when the coffee is ready. Each device has a unique and individual purpose. But what if they could be grouped together, as a so-called accidental infrastructure, to serve a more advanced cause?

We have set out to demonstrate the possibilities of such an accidental infrastructure in the field of indoor localization. An ambient device in itself is not intentionally prepared for localization purposes, but using many of them together and combining the collected data can surpass the devices’ limited individual capabilities.

Our approach was to build a prototype system based on a homogeneous array of radio-connected nodes and an additional entity with a higher magnitude of computing power. This central entity then controls the data collection from the nodes and executes a custom localization algorithm, based on probabilistic methods and a Kalman filter. We have evaluated our system both by simulations with ideal input data and by real-world measurements. The results show that the system is able to track and update the location estimates, but due to the heavy multipath effect it is only capable of very moderate improvements.

Preface

I would like to express my deepest gratitude to everyone who helped me during these past months. First, to my supervisors at ARM, Amyas Phillips and Hugo Vincent; and to my professor, Koen Langendoen. Thank you for all your help, insight, but most importantly, for your infinite patience. I would like to thank my family and my friends in Budapest, in Cambridge, and in Delft for their great support. Especially Beus, Réka, Shanti, Szaddám, Bálint, Krisz and Tomi – the people of dorm rooms SCH1514 and SCH408, where I spent most of my days and nights typing out the words of my thesis, until the Sun came up. I would also like to thank Paulína, Andrea, Adél and Gabi for polishing my writings on a regular basis. Finally, a word of thanks to all the great writers and musicians that provided some sort of inspiration or a peace of mind during the long hours of writing.

Zsolt Kocsi-Horváth

Delft, The Netherlands

4th January 2013


Contents

1 Introduction
  1.1 Project description
  1.2 Problem Statement
  1.3 Approach
  1.4 Outline

2 Background
  2.1 Wireless networks
  2.2 Radio Protocols
  2.3 Localization methods
  2.4 Location and probability
  2.5 Briefly on the Kalman filter
  2.6 Example systems and applications

3 Design
  3.1 Main thesis scenario
      Introducing the central entity
      Radio transmission
      Initial coordinates and reliability values
  3.2 Localization process
      Representation of the current system state
      Converting signal information to distance
      Custom localization algorithm with Kalman filter

4 Implementation
  4.1 Nodes hardware
      mbed LPC1768
      Radio module
  4.2 Node software
      Signal survey using the Chibi stack
      Modular software development
      Commands and actions
      Real Time OS and threads
  4.3 Central entity software
      Generating an alternate database
      Kalman subsystem

5 Evaluation
  5.1 Configuration and methodology
      Node arrangement
      System parameters
      Selected scenarios
  5.2 Analysing results
      Location updates
      Configuring the Kalman filter
      Sequences
      Convergence
      The advantage of diverse input
      Comparing measured and theoretical cases
      A note on errors

6 Conclusion
      Future Work

Appendix
  A.1 Localization system output figures, full size
      Configuring the Kalman filter
      Sequences
      The advantage of diverse input
      Comparing measured and theoretical cases

Bibliography


Chapter 1

Introduction

The term Embedded wireless networks is very loosely defined, but in general it refers to a network of small devices, where each of them has at least a central processing unit and some means of radio communication. Wireless sensor networks and Mobile Ad-hoc Networks fall under this category, and so does the Internet of Things.

When the devices themselves are deployed in large numbers, the manufacturing cost is significantly reduced. Low cost, small size and radio connectivity make embedded wireless networks ideal for localization and tracking purposes. Some devices incorporate extra circuitry for positioning (e.g. magnetometer, accelerometer, GPS chip), while others rely solely on their radio chips (i.e. signal strength, triangulation). These options are further explored in the Background chapter.

Besides the nodes that are deliberately developed for measurement and localization, there are numerous semi-intelligent general purpose devices that can be modified to serve a similar goal (e.g. a WiFi router with programmable firmware or an intelligent thermostat, as shown in Figure 1.1a). This accidental infrastructure of smart devices is predicted to see a huge increase both in numbers and in capabilities in the following decade. This scenario, part present and existing, part futuristic and hypothetical, provides the setting for this thesis.


1.1 Project description

An obstacle to the widespread deployment of indoor localization systems has been the provision of suitable supporting Internet of Things infrastructure at a cost justified by the value of the applications it enables.

It is possible to imagine a near-future scenario in which a building contains a variety of smart objects in fixed locations, such as room temperature sensors, door locks and window controls. Some are battery powered and must spend much of their time asleep. Others have mains power and function as gateways to the internet. These fixed nodes can potentially become an infrastructure for indoor positioning services. Nodes both mobile and fixed might benefit from knowing their location relative to one another or a map.

This thesis aims to study how this accidental infrastructure can be used to provide indoor positioning services. An example system is developed and tested that serves as a prototype of a future network of nodes deployed in a general work environment (an office building).

Project context

This Master's thesis was carried out at ARM Cambridge, in the Research & Development division's Internet of Things workgroup. The example system and the surrounding theoretical research are to benefit both TU Delft and ARM R&D.

Figure 1.1: Internet of Things in everyday use. (a) Intelligent thermostat; (b) smartphone guidance system.


1.2 Problem Statement

The setting is a hypothetical future environment: an office building augmented with radio-enabled intelligent devices. Each node's location is defined with a reliability ranging from exact to unknown, represented on a scale of [0 .. 1000]. One or multiple so-called central entities with a higher order of computing power are added to this environment. These have a room floorplan stored, with the nodes' rough initial locations and the reliability values. They can communicate with the nodes via their own radio interface and measure the signal strength of messages. There is also a reliability value on each signal strength measurement (signal quality, link quality). Please note that location reliability and signal reliability are separate concepts with separate variables.

The main task is to develop a method to keep track of the nodes’ locations and the reliability values, and, if possible, improve the average location accuracy of the system. In other words, using noisy and unreliable measurements, how can we improve the prediction of the nodes’ locations?

This problem definition also demonstrates the main difficulty of working with an accidental infrastructure and the Internet of Things: the vast amount of transmitted information, and the quest to identify the truly useful parts in the ocean of useless data.

1.3 Approach

To build a complete localization system, the main parts have to be separated. First, the radio nodes themselves will be built and tested. Then, the environment is going to be analysed. Here, we select a testing area and perform measurements to investigate multipath and signal fading.

Simultaneously, a background study is conducted, ranging from the very basic concepts, such as signal strength to distance conversion and multilateration, to the more advanced topics of probabilistic localization methods and Kalman filtering.

Finally, using the gathered information from the preliminary tests and the theoretical review, we will implement a localization algorithm at the central entity, run the localization measurements with the nodes and evaluate our results. The preceding scientific research in the field of indoor localization provides valuable groundwork, but also demonstrates that every system behaves differently due to local properties and differences in the environment. Hence, the main part of the work is centered around a more experimental, practical approach.

1.4 Outline

The following chapters first present the necessary background, covering, among other topics, positioning, the probabilistic approach and the Kalman filter. Next, in the Design chapter the localization process is explained in detail. Then the system implementation follows, where the nodes' hardware and software and the base station's data processing methodology are all described in detail. Finally, the performance of the system is evaluated by running the data processing software for various sets of input data. The thesis closes with a short conclusion and an outlook on the future prospects of the project.

Chapter 2

Background

2.1 Wireless networks

The term WPAN[8] denotes Wireless Personal Area Network, which generally consists of embedded nodes placed in the personal home area or office space. In other words, they are designed to connect over smaller distances and handle smaller data sizes than the well-known LAN and WLAN networks. The devices involved in a WPAN – such as mobile phones, wireless sensors and other embedded devices – are mostly battery powered, and the communication protocols are therefore heavily optimized. The most generally used transmission methods are the various versions of Bluetooth, ZigBee, and nowadays even WiFi.

The Internet of Things[3] concept aims to organize intelligent mobile nodes (dispatched in an office area, shopping mall, airport, and so forth) into a system similar to the internet of personal computers and laptops. The nodes are generally provided with a unique identifier, and the user is aware of their function (e.g. ID0001: coffee machine, ID0002: dishwasher). In the earliest realizations of pseudo-IoT networks, RFID tags were used to identify objects, and RFID readers retrieved the data from them[13, 14].

A more advanced way to realize an IoT network is to equip some or all nodes with Ethernet ports or WiFi modules and introduce some sort of gateway to the Internet. This translates the identifiers to IP addresses, connecting the embedded modules and the WPAN network itself to the web. (The coffee machine posts on Twitter1,2 when the coffee is ready.) If the IP addresses comply with the IPv6 standard [20], and the devices operate at low power levels, the system can be categorized as 6LoWPAN: IPv6 over Low power Wireless Personal Area Networks [15].

In the system envisioned in this thesis, nodes in an office space are connected via Bluetooth Low Energy radios, while the mains-powered anchors are also equipped with Ethernet ports, realizing the connection to the Internet.

2.2 Radio Protocols

In the prototype system of the thesis, an 868 MHz radio module implementing the communication standard IEEE 802.15.4 was used. It is meant to serve as an abstraction of a hypothetical future IoT system, where the real nodes will most likely be connected via Bluetooth Low Energy radios.

A short description of the important concepts and standards related to the radio protocols used during the thesis work follows.

IEEE 802.15.4 and the 868 MHz ISM band The IEEE 802.15.4 standard defines the lower level standards (specifically, the physical layer and MAC) of low-rate WPANs. In Europe, these devices operate in the 868 MHz industrial, scientific and medical band. Both ZigBee and the Chibi Wireless Stack define higher level protocols, assuming the hardware to be designed according to the IEEE 802.15.4 standard.

Chibi stack A minimalistic software stack, intended to be deployed on microprocessors connected with IEEE 802.15.4 radio hardware3. For the thesis project, it has been chosen mainly due to its simplicity over ZigBee.

Bluetooth Low Energy Bluetooth is a point-to-multipoint wireless communication protocol operating in the 2.4 – 2.5 GHz band. The fourth iteration of the standard incorporates the previous versions, a WiFi-related high speed version, as well as the Bluetooth Low Energy protocol. Compared to the other two sibling protocols, BLE has been optimized for battery operated embedded devices: it requires lower power levels and features a faster connection setup time and a slightly larger range, at the cost of lower bitrate and throughput [2].

1 [instructables.com] Tweet-a-Pot: Twitter Enabled Coffee Pot. [local html file] and [online link]. 2 [ericholm.nl] The Coffee machine on Twitter. [local html file] and [online link]. 3 [freaklabs.org] Introducing...Chibi - A Simple, Small, Wireless stack for Open Hardware Hackers and Enthusiasts. [local html file] and [online link].


2.3 Localization methods

Knowing the underlying protocols, and therefore the related system constants, one can proceed to the calculation-based procedures of the localization system. First, the distance estimation is carried out, using various possible forms of input, such as signal strength, measured time and time difference. Then, from the distances and the known anchor positions a position fix can be created. Finally, in more advanced systems the localization errors are identified and there is a possibility to combine multiple sources of information.

Distance estimation

To calculate the position of a device in a two dimensional space, one must be able to measure its distance to points with known locations (beacons, anchors). The two most significant options to accomplish this are related to the measurement of a transmission’s signal strength or a deduction of its travel time. From either or both of these data an estimation of the distance can be made.

Signal power Under ideal conditions, the signal strength expressed in mW (milliwatts) fades inversely proportionally to the square of the distance to its origin (see Figure 2.1); this can also be expressed by the Friis equation[6]:

\[ P_r(\text{mW}) = P_t G_t G_r \left(\frac{\lambda}{4\pi}\right)^2 \frac{1}{d^2} = \frac{k_1}{d^2} \tag{2.1} \]

It can also be converted to dBm (decibels above a reference level of one milliwatt); in this case the relation to the distance is logarithm-based (see Figure 2.1).

\[ P_r(\text{dBm}) = -\left(20 \cdot \log(d) + A\right) \tag{2.2} \]

Under non-ideal conditions (e.g. multipath), the equations are modified with an empirically determined path loss exponent n:

\[ P_r(\text{mW}) = P_t G_t G_r \left(\frac{\lambda}{4\pi}\right)^n \frac{1}{d^n} = \frac{k_1}{d^n} \quad \text{where } n = 2\,..\,6 \tag{2.3} \]

\[ P_r(\text{dBm}) = -\left(10 \cdot n \cdot \log(d) + A\right) \quad \text{where } n = 2\,..\,6 \tag{2.4} \]
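As a concrete illustration of Eq. (2.4), the following minimal sketch (not part of the thesis software) inverts the log-distance model to obtain a distance estimate from a received power value in dBm. The reference loss A and the path loss exponent n are illustrative placeholders that would have to be calibrated for the actual environment.

```cpp
#include <cmath>
#include <cstdio>

// Invert Eq. (2.4): Pr(dBm) = -(10 * n * log10(d) + A)
//   =>  d = 10^((-Pr - A) / (10 * n))
// A : path loss at a 1 m reference distance (dB), calibration-dependent (assumed here)
// n : empirical path loss exponent, 2..6 indoors
double distance_from_dbm(double pr_dbm, double A, double n)
{
    return std::pow(10.0, (-pr_dbm - A) / (10.0 * n));
}

int main()
{
    // Hypothetical calibration: A = 41 dB at 1 m, n = 2.7 for an office corridor
    double d = distance_from_dbm(-67.0, 41.0, 2.7);
    std::printf("estimated distance: %.2f m\n", d);   // roughly 9 m
    return 0;
}
```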


Figure 2.1: Approximation of the signal strength over distance; logarithmic scale[12]

RSSI (Received Signal Strength Indication) RSSI is the measure of a radio transmission's signal power at the receiver end. In contrast to signal power, RSSI does not necessarily have a unit; it can be represented on an integer scale. The depiction of values and scales is unique for each device, but the most common practice involves an equivalence or a not necessarily linear but monotonic relation to the actual radio signal power values in dBm [10]. Due to this relation, the distance between the transmitter and the receiver can be approximated not only from the signal power, but also from the RSSI value.

Time of Flight and Round Trip Given an anchor (or beacon) and an unknown node with synchronised clocks, the time delay between a discovery message being sent from the beacon and arriving at the unknown node can be used to infer the distance between them. This method, known as Time of Flight (ToF), is also the basis of the Global Positioning System. The clocks of the anchor and the unknown node have to be in accurate sync for this method, which is not always possible. Clock offset and clock drift often corrupt ranging accuracy[10]. The Round Trip method offers a solution to this problem by measuring the summed time of the sent message signal and the received acknowledge signal at the same node (hence, no need to have information about the other node's clock).

Time Difference of Arrival (TDoA) In contrast to the previous approach, the unknown node acts in this case as an emitter of the discovery signal, reaching the anchors at different moments. Each pair of anchors can determine a time difference and infer a distance difference value. For this, the clocks of the receivers need to be synchronized, but there is no need for synchronization with the unknown node. If at least four anchor nodes are present, the difference values can be combined to calculate a position fix (multilateration). The most common scenario involves multiple discovery signals sent with precisely known time delays[10].

Position fix

To calculate the actual X-Y position, one must combine the obtained distance values. Figure 2.2 illustrates the two most common methods.

Trilateration The distance is either calculated by RSSI or by ToF. In case of Time of Flight, all devices should have their clocks synchronized. The anchors' locations are known, and the unknown node is placed on a circle around each anchor with the estimated distance as a radius. If at least three anchors are present, the unknown node can calculate its assumed position by the intersection of these circles.
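A minimal sketch of the trilateration step just described, assuming three anchors with known coordinates and noise-free distance estimates. Subtracting the first circle equation from the other two linearizes the problem into a 2x2 system that can be solved directly; a real deployment would use more anchors and a least-squares fit to absorb measurement noise.

```cpp
#include <cstdio>

struct Point { double x, y; };

// Position fix from three anchors a[i] with distance estimates d[i].
// Subtracting circle 0 from circles 1 and 2 gives two linear equations:
//   2(a_i - a_0) . p = d_0^2 - d_i^2 + |a_i|^2 - |a_0|^2
Point trilaterate(const Point a[3], const double d[3])
{
    double A11 = 2.0 * (a[1].x - a[0].x), A12 = 2.0 * (a[1].y - a[0].y);
    double A21 = 2.0 * (a[2].x - a[0].x), A22 = 2.0 * (a[2].y - a[0].y);
    double b1  = d[0]*d[0] - d[1]*d[1] + a[1].x*a[1].x - a[0].x*a[0].x
                                       + a[1].y*a[1].y - a[0].y*a[0].y;
    double b2  = d[0]*d[0] - d[2]*d[2] + a[2].x*a[2].x - a[0].x*a[0].x
                                       + a[2].y*a[2].y - a[0].y*a[0].y;
    double det = A11 * A22 - A12 * A21;          // anchors must not be collinear
    return Point{ (b1 * A22 - b2 * A12) / det,
                  (A11 * b2 - A21 * b1) / det };
}

int main()
{
    Point anchors[3] = { {0, 0}, {10, 0}, {0, 10} };
    double dist[3]   = { 5.0, 8.0623, 6.7082 };  // distances measured from the point (3, 4)
    Point p = trilaterate(anchors, dist);
    std::printf("position fix: (%.2f, %.2f)\n", p.x, p.y);
    return 0;
}
```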

Multilateration It is related to the TDoA method. Multilateration has distance differences as input values, hence the unknown node is placed on a hyperbola – as opposed to a circle in the case of ToF. This also implies that the equations have more unknown variables and multilateration is more calculation-intensive. As a result, the intersection of the hyperbolas (or hyperboloids, in 3D space) produces the assumed location. Multilateration and TDoA are in general very vulnerable to multipath propagation, noise and interference[10].

Figure 2.2: Time-based localization techniques[10]. (a) ToF and trilateration; (b) TDoA and multilateration.


Localization errors

Multipath and NLOS The phenomenon caused by the reflection or scattering of radio waves is named multipath. It is mostly caused by a wall or an obstacle in the path of the signal – the receiver and the sender are Not in clear Line Of Sight (NLOS) of each other. The multipath signals reaching the destination have not travelled the same distance as the line-of-sight signal, which therefore degrades the position estimation. This is a huge concern in both time measurement based and signal strength based localization approaches. At the receiver end, it is in general very difficult to establish whether or not the signal has been subjected to multipath.

Geometric Dilution of Precision The geometrical alignment of the anchors relative to each other and to the unknown node influences the positioning precision. This effect is illustrated in Figure 2.3. To avoid an unfavourable constellation and the harsh decrease in positioning precision, it is necessary to spread the beacons in the testing area in such a way that any unknown node can get coverage from at least three of them under a satisfying angle and distance.

Figure 2.3: Simple trilateration (a) and trilateration with confidence ranges (b, c), illustrating good and poor GDOP[17]

Combining multiple sources of information

Many other techniques have been developed as a complement to or a replacement for signal based localization. Generally, most of these systems are not sufficient on their own (for example, inertial measurements are compromised by additive drift), but combining them with signal based localization results in a more redundant and reliable system with high location accuracy – at the price of increased hardware, computational and development complexity.


Fingerprinting If a building is well covered by anchored radio modules (e.g. WiFi routers), it is possible to create a map or floorplan of signal strength values by measuring the RSSI at various points of the building. The resolution and the size of the covered area varies from scenario to scenario. The map is then used to help the localization services. Each point has recorded unique RSSI values for each router; when the unknown node enters the area and does a survey, the results can be matched to the previously created map and give the location of the unknown node[11].

Figure 2.4: Fingerprinting of one signal source [16]

Sensor input and IMU Many types of localization systems can profit from combining multiple sensor inputs. The most advanced systems combine RSSI or ToF with barometric pressure sensor readings and/or inertial measurement units[21]. An IMU is a combination of an accelerometer, gyroscope and, optionally, a magnetometer. This kind of approach is advantageous because it can be perceived as an abstraction of a 2012 smartphone device. Most smartphones are capable of measuring signal strength and have an advanced IMU unit; some models (e.g. the Samsung Galaxy Nexus, GT-I9250) even have an internal air pressure sensor. It is easy to see that the next step is to deploy the combined localization system on the smartphones themselves.

2.4 Location and probability

Due to the irregularity of any given measurement environment and the related effects, such as the previously mentioned multipath and dilution of precision, the positions retrieved by bare localization or trilateration are generally not correct and are compromised by errors. For a more appropriate depiction that represents the uncertainty of an actual position, probabilistic approaches are to be used.

There are many different ways of probabilistic localization, but in general they are all based on observing a field where the node is possibly placed (in this case, the location is a random variable) and assigning probability values to the points of the field according to the likelihood of the node being located at each point (a probability density function).

At the beginning, there is no information about the node's position; it can be located anywhere in the observed area. At this point, the probability density function of the location variable can be represented as a uniform two-dimensional field with the same value assigned to every point in the grid. Then, with each set of new information (e.g. a new RSSI measurement) the field pdf is updated, representing at each given time the current prediction of the node being placed at each of the grid points. This method is titled Recursive Bayesian Estimation [4].

Threshold function For example, if a node estimates the distance to another node as x, the points within a certain threshold of that distance are assigned higher probability values than the others. The main significance of this approach, compared to regular trilateration, is that the threshold function can be more sophisticated. Instead of a simple line segment with error ranges, more complex functions – such as the normal and lognormal probability distributions – can be used with the probabilistic approach.

The most significant drawback of the probabilistic method (compared to the classic trilateration) is the extremely high computational time. There are, however, some simplified derivatives of this approach, which provide faster execution at the tradeoff of reduced accuracy.

Discrete approach The probability field can either be realized with a continuous function or – more commonly – with a discrete grid-based function. The Grid-based probability density matrix[26] approach divides the observed area into a 2D grid, thus the probability computations are reduced from the continuous realm to a discrete sampled form of data points. The resolution of the grid presents a tradeoff between the positioning accuracy and the complexity (execution time) of the calculations.
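The following sketch illustrates one grid-based recursive Bayesian update under the assumptions above: the prior is a discrete probability grid, a new range measurement from a known anchor is turned into a Gaussian likelihood over the cells, and the posterior is the normalized product of the two. Grid size, cell resolution, anchor positions and the noise parameter are made up for illustration.

```cpp
#include <cmath>
#include <vector>
#include <cstdio>

// One step of recursive Bayesian estimation on a discrete grid:
//   posterior(cell) is proportional to prior(cell) * likelihood(measurement | cell)
void bayes_update(std::vector<double>& grid, int w, int h, double cell,
                  double ax, double ay,        // anchor position [m]
                  double meas_d, double sigma) // measured distance and its std. dev. [m]
{
    double total = 0.0;
    for (int j = 0; j < h; ++j)
        for (int i = 0; i < w; ++i) {
            double dx = i * cell - ax, dy = j * cell - ay;
            double d  = std::sqrt(dx * dx + dy * dy);
            double e  = (d - meas_d) / sigma;
            grid[j * w + i] *= std::exp(-0.5 * e * e);   // Gaussian likelihood
            total += grid[j * w + i];
        }
    for (double& p : grid) p /= total;                    // renormalize to a pdf
}

int main()
{
    const int w = 40, h = 40;
    const double cell = 0.25;                             // 25 cm grid resolution
    std::vector<double> grid(w * h, 1.0 / (w * h));       // uniform prior
    bayes_update(grid, w, h, cell,  0.0, 0.0, 5.0, 1.0);  // anchor at the origin, d = 5 m
    bayes_update(grid, w, h, cell, 10.0, 0.0, 7.0, 1.0);  // a second anchor narrows it down
    int best = 0;                                         // report the most probable cell
    for (int k = 1; k < w * h; ++k) if (grid[k] > grid[best]) best = k;
    std::printf("most likely cell: (%.2f, %.2f) m\n",
                (best % w) * cell, (best / w) * cell);
    return 0;
}
```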

Propagating information is also a good way to improve localization accuracy. In [9] the authors proposed a method that is well applicable even if the anchor nodes are only sparsely distributed, while the mobile nodes are present in high numbers. It involves propagation of coordinate estimations, which means that a mobile node can estimate its position not only from the anchor nodes in range, but also by using the position of another mobile node that was estimated by other anchor nodes, which might fall outside of the communication range. The probabilities are then multiplied (similarly to [19]), which also ensures that the estimation errors introduced by the out-of-range anchor nodes do not propagate any further.

The amount of collected data would naturally increase continuously over time, resulting in increasing computation. A better approach is to model the system as a Markovian process, meaning that the current state of the system (at a given time) incorporates all the necessary information. Each new measurement represents a new state, and each state is conditionally independent of the earlier states. Thus, the system can be represented as a Hidden Markov Model, and only knowledge of the current and the previous state is necessary for the calculations.

2.5 Briefly on the Kalman filter

A derivative of the continuous Hidden Markov Model, the Kalman filter is an estimator with "(...) a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process" [24]. Here, the system model is presumed to be a linear dynamic system corrupted by additive Gaussian white noise; the measurements are defined as linear functions of the system state and noise [1]. This is represented in the following two equations:

\[ x_t = A x_{t-1} + B u_t + w_{t-1} \quad \text{(state)} \tag{2.5} \]
\[ z_t = H x_t + v_t \quad \text{(measurement)} \tag{2.6} \]

Here, the signal vector x_t is calculated as a linear combination of its previous value, a control vector u_t, and a Gaussian process noise w_{t-1}. A is the state transition matrix, which represents the correlation of the values within the state vector5. B is the control matrix, which represents the influence of the control vector. In some cases, and during this thesis work as well, both u_t and B can be neglected (no control input, B = 0). The measurement equation shows that the measurement vector z_t is a linear combination of the signal vector x_t and a Gaussian measurement noise v_t. H is the observation matrix.

5 For example, if the signal vector x_t represents a two-dimensional coordinate (x_1 = X; x_2 = Y), then the A matrix can describe the correlation of the two coordinates.

Figure 2.5: The Kalman filter [25]

The estimator works in two steps: Time Update (prediction) and Measurement Update (correction). Figure 2.5 explains the entire process. "Matrix A predicts the next state x̂⁻[t] from the current estimate x̂[t−1]. Matrix H generates a prediction of the sensor readings, that is subtracted from the actual input z[t] to calculate an error vector. This error is multiplied by the Kalman gain matrix K, that generates a correction added to the prediction to yield the final estimate x̂[t]" [25].

The mathematical background is considered to be quite complex, but the algorithmic implementation is nevertheless quite simple (requiring only matrix addition, multiplication and transposition), and in its parametrized form it does not require a full and deep understanding of the underlying mathematics – only an understanding of the input parameters. Due to this, the Kalman filter is widely used in development.
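To illustrate how little code the parametrized filter requires, the following sketch implements the two steps of Figure 2.5 for a static two-dimensional position with no control input (B = 0) and H = I, in which case the matrices reduce to per-axis scalars. The noise values and measurements are illustrative placeholders, not the parameters used later in the thesis.

```cpp
#include <cstdio>

// Minimal Kalman filter for a static 2D position (A = I, B = 0, H = I).
// With diagonal covariances the two axes decouple into independent scalar filters.
struct Kalman1D {
    double x;   // state estimate
    double p;   // estimate variance (how unreliable the estimate is)
    double q;   // process noise variance
    double r;   // measurement noise variance

    void step(double z) {
        p += q;                          // time update: uncertainty grows
        double k = p / (p + r);          // Kalman gain
        x += k * (z - x);                // measurement update: correct with the innovation
        p *= (1.0 - k);                  // variance shrinks after the correction
    }
};

int main()
{
    // Initial estimate far from the true location (about (5, 5)), with large uncertainty
    Kalman1D kx{0.0, 100.0, 0.01, 4.0};
    Kalman1D ky{0.0, 100.0, 0.01, 4.0};
    const double meas[][2] = { {5.6, 4.1}, {4.3, 5.8}, {5.2, 5.1}, {4.9, 4.7} };
    for (const auto& z : meas) {
        kx.step(z[0]);
        ky.step(z[1]);
        std::printf("estimate: (%.2f, %.2f), variance: %.2f\n", kx.x, ky.x, kx.p);
    }
    return 0;
}
```

The ratio of q and r controls how strongly a new measurement pulls the estimate away from the prediction, which is exactly the weighing behaviour exploited later in the Design chapter.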

For a more extensive and in-depth introduction to the Kalman filter, see Welch's and Bishop's Introduction to the Kalman Filter[5].

Combination with probabilistic approaches Mobile Beacon-assisted Localization [23] is a hybrid approach, based on both Kalman filter estimations and probabilistic methods. It has been further investigated which probability distribution should be selected as the dynamic model in the prediction stage, and whether the information provided by the neighbour node provides a significant improvement in accuracy compared to the drawback of its additional computational cost.


2.6 Example systems and applications

Various theoretical approaches have been listed in the previous sections. It is also important to show how one or more of these theories can be used and combined to build complex systems. Numerous research groups are working on the problem; in the following, some examples are presented, including the increasingly popular 3D localization subtopic.

2D approaches

In [19], the authors propose a probabilistic, fully distributed localization approach. Instead of calculating the assumed location of a node, a 2D field with probabilities assigned to each point is produced (a probability density function). The simulated setup consists of beacons and nodes with unknown location. First, the fixed beacons calibrate the RSSI parameters based on the propagation of initial test signals. Then, each unknown node estimates the probability density function of its own position. Nodes with position information, including both beacons and unknowns with updated estimations, send out beacon packets to their neighbours, and the estimates are updated with the new information. Finally, negative constraints are introduced.

Figure 2.6: Probabilistic approach, common area of two probability fields [9]

In [18] the authors have utilized the Weibull function to approximate the Bluetooth signal strength probability distribution function. The anchors send their measured RSSI data via WLAN to a high-performance central server that handles all the probability density based localization calculations. The observations are first used to create a fingerprint map and an RSSI distribution model, then the anchors connect to the user's mobile phone via Bluetooth and estimate its position with the help of the Bayesian theorem and the Histogram Maximum Likelihood algorithm.


3D localization

There are various methods to accomplish full or partial three-dimensional localization in a network of wireless nodes. The most simplistic approach uses the deployed wireless anchor nodes or WiFi routers in a multi-storey building. In most cases, these devices have not only their position, but also their floor number recorded. Therefore, coarse-grained 3D localization can be achieved – not the correct [X, Y, Z] position in meters/centimeters, but instead an [X, Y] position with a floor number as the [Z] coordinate.

Complementing (or contrasting with) the previous method, altitude determination can also be relevant on an in-room scale. A creative and efficient experimental way to solve this problem is the Complexity-reduced 3D Trilateration Localization Approach[22], using special anchor columns with two radio nodes – one at the bottom, one at the top – so they have pairwise the same [X, Y] coordinate and a known difference in altitude. Due to this unique setup, the processor does not have to perform complex 3D trilateration (O_3), only two 2D trilaterations (2 · O_2). In a 3D office environment, this new experimental method provides better accuracy while using fewer computational cycles, at the cost of extra hardware complexity at the anchors.

There are more complex approaches, which provide geometrically correct [X, Y, Z] localization. If the nodes have been deployed with the appropriate code to perform the trilateration in three dimensions instead of two, and there are at least four independent anchor nodes with known location, the localization of the unknown node is possible[7]. The drawback of this method is the increased computational complexity, increased processor load and power consumption and longer execution time. In a multi-storey building the method is significantly impaired by multipath.

These examples are only a tiny subset of the extensive research projects conducted in the field of complex localization systems. A list for further reading is provided in the Bibliography section.

Chapter 3

Design

3.1 Main thesis scenario

The localization system is depicted to be deployed in a hypothetical future office environment, where many devices are augmented with a microprocessor and some sort of radio chip for smart purposes: intelligent thermostat, coffee machine, radio-operated door lock, and so forth. Together they form a so-called accidental infrastructure. In the thesis, they are collectively referred to as ambient devices.

The computers, phones, tablets and laptops are also connected via compatible radio modules. They are referred to as workstations. Furthermore, the workstations can forward data to a higher level of computing power – termed the central entity. Both the ambient devices and the workstations are stationary, not moving objects.

Ambient device:
  • radio-connected
  • stationary
  • deployed in high numbers
  • sends messages (signals) to a workstation periodically, or on demand

Workstation:
  • radio-connected
  • stationary
  • present in low numbers
  • polls and reads messages (signals) from any ambient device
  • measures signal strength and the reliability1 of a signal received from the ambient node
  • connects to a central entity and forwards the collected data

Table 3.1: Properties of ambient devices and workstations

1 The reliability value is a vague abstraction of Link Quality. It is further explored in Section 3.1.

In this environment, there is a vast amount of radio traffic, but this traffic is dedicated to specific actions and messages, e.g. a smartphone polling the intelligent thermostat for temperature data. The devices are not intentionally prepared for localization purposes.

The devices of the accidental infrastructure are mostly sending and not receiving packets. The messages can be periodic, like a status message every hour, or event-based, like responding to a request. The workstations are taking the active role in this jungle of communication, sending the requests and processing the status messages. Here, we must introduce the concept of radio surveying.

Signal survey By surveying, we mean a workstation broadcasting a request signal to all ambient nodes, and the ambient nodes answering by sending their packets. The workstation reads the signal strength and reliability of each transmission. Since reliability or signal quality was not defined by the hardware, we have chosen to implement it ourselves. If the signal sending is repeated N times, meaning that at a constant distance N samples are sent (x_1 .. x_N), then the reliability value equals the standard deviation of the data set x_1 .. x_N:

\[ \text{reliability} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2} \quad \text{where } \mu = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{3.1} \]
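A small sketch of how a workstation could compute the reliability value of Eq. (3.1) from the N RSSI samples recorded during one survey; the sample values are invented.

```cpp
#include <cmath>
#include <vector>
#include <cstdio>

// Eq. (3.1): reliability = sqrt( (1/N) * sum_i (x_i - mu)^2 ),  mu = (1/N) * sum_i x_i
double reliability(const std::vector<double>& x)
{
    double mu = 0.0;
    for (double v : x) mu += v;
    mu /= x.size();

    double var = 0.0;
    for (double v : x) var += (v - mu) * (v - mu);
    return std::sqrt(var / x.size());
}

int main()
{
    // Hypothetical RSSI samples (dBm) received from one ambient node at a fixed distance
    std::vector<double> rssi = { -63, -61, -66, -62, -64, -70, -63, -65 };
    std::printf("reliability (standard deviation): %.2f dB\n", reliability(rssi));
    return 0;
}
```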

It is important to note here that in this hypothetical scenario the messages are all easily intercepted, because they are either openly broadcast (hourly status messages) or sent with no encryption. This can be explained by the fact that the data from the thermostat and coffee machine do not count as highly important, security-dependent information; hence an encryption algorithm would just unnecessarily introduce extra overhead into the process.

Introducing the central entity

The central entity (C.E.) is defined as follows:

• higher magnitude of computing power than the workstations and ambient devices
• has a basic map or coordinate system with the rough initial locations of the ambient nodes and the initial reliability values describing the location errors
• after a workstation's radio survey, the C.E. receives the collected signal information (signal strength and reliability values) from the workstation
• C.E. is able to convert the signal information to location estimation


• C.E. is able to follow and update location estimations as new data arrives
• the localization should be fully functional even with only one single workstation

In other words, the C.E. consists of some sort of server device and dedicated localization software that is connected to all workstations. This can mean, among others, a high-end smartphone running an app, a laptop connected to a data center via the Internet1, or – as implemented in this thesis – a powerful computer running custom Matlab code, connected to the workstations via USB.

The reason for the localization to be fully functional even with only one workstation is twofold. First, the thesis' main focus lies on the accidental infrastructure and not on computers, smartphones or workstations. The system should operate with only the minimal, necessary help of these advanced devices, and should be able to serve any number of devices from the accidental infrastructure. Second, a workstation is not just a smartphone or computer, but a smartphone or computer with the appropriate frontend for measuring signal strength, storing the data and the ability to forward it; hence there is a good chance that in the office environment there will be just a single device of the workstation category. Note that with the appropriate software installed this single workstation can take the role of the central entity as well.

Of course, there are many ways to introduce a dedicated localization entity. Another possibility would have been to create distributed software, with e.g. each workstation running part of the processing, thus eliminating the need for a data center or central entity with high complexity. This represents a completely different problem set, where the design bottleneck would have been the problems of communication and data transfer.

Note, therefore, that although the solution using a computer with Matlab is the one explored in depth in the Implementation chapter, it can nevertheless also represent a valid abstraction of the general problem of localization. Other solutions, be they central or distributed, would share the majority of the problems and design choices.

1in this case, the data center is responsible for running the localization software, the laptop only has the role of radio-communication with the workstations and forwarding the received information via the internet.


Radio transmission

When designing the radio related properties of the system, the main goal was to find a balance between maintaining simplicity (avoiding unnecessary levels of development in the project) and still delivering a reasonably accurate approximation of a future real-life scenario.

Monitoring the present trends in radio-connected embedded devices reveals a general direction towards diversification. The ambient devices of today are using a wide range of protocols (ZigBee, Bluetooth, WiFi, XBee, active RFID, NFC, DASH7, etc.) while brand new protocols are being introduced almost on a weekly basis. Each company deploys their own set, and they are almost never compatible with a product of another company. Meanwhile, in the smartphone industry there is a slight pattern of unification. Phones incorporate more and more types of wireless protocols: WiFi and Bluetooth communication is solved with a single chip, NFC is gaining popularity and WiMAX is just around the corner. Smartphones – abstracted as workstations in this thesis – have no problem whatsoever dealing with multiple radio protocols.

To sum up, it is a valid estimation that the ambient devices of the future are going to use diverse protocols, and that workstations will incorporate all the appropriate chips and protocols to communicate with them. Hence, using a single, universal radio chip with all the prototype ambient and central entity devices in the thesis (further described in Section 4.1) does not decrease the resemblance to a real-life scenario significantly.

We also made no distinction between the two naturally separate types of signal transmission, point-to-point messaging and broadcast. As shown in the Implementation chapter, the unique properties of the radio firmware enabled us to do so, as broadcast is defined almost identically to addressed messaging in the provided driver and software library.

Initial coordinates and reliability values

Now, we assume that there is a localization scenario with various devices dispersed in an indoor environment; the workstations collect signal information from the ambient nodes and forward it to the central entity, which then has to localize the ambient nodes. The question arises of what initial information we should provide to this system.

For the main problem of the thesis the workstations have been selected to be placed with fixed coordinates relative to the walls on the floorplan. The [X, Y] coordinates of the workstations are defined to be known, and the coordinates of the ambient devices are defined to be accompanied by a reliability variable on the range of [0 .. 1000] cm, representing the maximum possible Euclidean distance (or error) between the point depicted by the coordinates and the node's real location.

This was also realized later in the prototype implementation of the system. The ambient devices were placed next to the coffee machines, fridges, doorknobs, etc. of the office building in which the prototyping took place. The accurate coordinates of the ambient devices were measured by hand, then a random additive error according to the current reliability value was generated, and the modified coordinates were defined as the input values of the C.E. localization system. The randomness of the error variables is further explored in Section 4.3.
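A sketch of how such error-induced input coordinates could be generated: each hand-measured position is displaced by a random offset whose magnitude does not exceed the node's reliability value (in centimetres). The uniform distributions used here are an assumption for illustration; the actual randomization is discussed in Section 4.3.

```cpp
#include <cmath>
#include <random>
#include <cstdio>

struct Node { double x, y; double reliability; };   // coordinates and reliability in cm

// Displace a hand-measured position by a random vector no longer than the node's
// reliability value, yielding the deliberately error-induced input for the C.E.
Node perturb(const Node& true_node, std::mt19937& rng)
{
    const double kPi = 3.14159265358979323846;
    std::uniform_real_distribution<double> angle(0.0, 2.0 * kPi);
    std::uniform_real_distribution<double> magnitude(0.0, true_node.reliability);
    double a = angle(rng), r = magnitude(rng);
    return Node{ true_node.x + r * std::cos(a),
                 true_node.y + r * std::sin(a),
                 true_node.reliability };
}

int main()
{
    std::mt19937 rng(42);
    // Hypothetical ambient device: hand-measured position, error bound of 2 m
    Node coffee_machine{ 350.0, 1220.0, 200.0 };
    Node input = perturb(coffee_machine, rng);
    std::printf("C.E. input coordinates: (%.0f, %.0f) cm\n", input.x, input.y);
    return 0;
}
```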

3.2 Localization process

As explained in the previous section, the central entity has a deliberately error-induced estimation of the ambient nodes' positions as input. The main task is to update the location estimations based on the new signal information gathered by the workstations. The C.E. processes one set of data at a time, collected by a single workstation in one round of radio survey. This is also called a cycle.

The central entity repeats the following subtasks in each cycle:

• represent the current system state, the position estimation of the ambient nodes
• receive new signal information data set from workstation and convert it to distance
• merge the new data with the previously known data, update the coordinate estimations

In each cycle, the central entity has an estimation for the [X, Y] location of each ambient node. The C.E. uses the new information to update this value.

Representation of the current system state

For each single ambient node, we define the office area as a probability field. Each point of the field has a probability value, describing the likelihood of the node being located at that given point. Naturally, the currently predicted location has the highest value, and with increasing distance the probability decreases following a Gaussian distribution[9][19][26]. If we represent a node's probability field around the currently predicted location, we see a classic Gaussian bell shape.

Figure 3.1: Probability fields of multiple ambient nodes

When combining the probability fields of multiple ambient nodes it is shown that the nodes whose location information is less reliable have lower and wider Gaussian bells compared to the more reliable ones. Figure 3.1 demonstrates this effect.

Viewed from the top, the combined probability field represents the estimation of every ambient node's location in the observed area, as shown in Figure 3.2. The colours represent probabilities: red is highest, blue is lowest (zero). In Figure 3.2b, note the differences in width and height compared to Figure 3.2a.

Converting signal information to distance

The signal information consists of measured signal strength (RSSI) and measured reliability values. (Please note that the measured reliability is unrelated to the previously mentioned initial reliability.) The classic approach to distance conversion would be to convert the RSSI data (measured in dBm) to a distance value with the help of the Friis equation (see Eq. (2.3) in Section 2.3). In contrast, probabilistic approaches aim to represent the unreliability of the measured signal strength.

In this thesis, we convert the RSSI and the reliability variable to parameters of a lognormal function. When the distance between the transmitter and the receiver is constant, but the received RSSI values vary due to e.g. the multipath effect (thus incorrectly suggesting a varying distance), the lognormal probability density function has been proven to serve as a valid estimation, relating the most often received RSSI value to the highest probability[19].

Figure 3.2: Combined probability field of every ambient node in the observed area. (a) Ambient nodes with equal probability values; (b) ambient nodes with different probability values.

To represent all the points in space at the given distance from the receiver end of the signal – the workstation – one must rotate the lognormal function. This is quite similar to trilateration, where a single point is rotated around the receiver, giving a circle in 2D (Figure 2.2a). If the third dimension (height) is defined as the probability of the transmitter (ambient device) being located at each given point, one can imagine that the shape of the circle extends to 3D as an empty cylinder (the points of the circle have a probability of 1, the other points of the space have a probability of 0). With the probabilistic approach, the shape created as the result of the rotation (with the height defined as the probability variable) resembles not a cylinder, but a volcano (Figure 3.3).
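The following sketch builds such a "volcano" numerically: for every cell of a grid, the distance to the workstation is computed and fed into a lognormal probability density whose parameters would, in the real system, be derived from the measured RSSI and reliability values. The parameter values below are illustrative only.

```cpp
#include <cmath>
#include <vector>
#include <cstdio>

// Lognormal pdf: f(d) = exp(-(ln d - m)^2 / (2 s^2)) / (d * s * sqrt(2 * pi))
double lognormal_pdf(double d, double m, double s)
{
    if (d <= 0.0) return 0.0;
    double e = (std::log(d) - m) / s;
    return std::exp(-0.5 * e * e) / (d * s * std::sqrt(2.0 * 3.14159265358979));
}

int main()
{
    const int    w = 60, h = 60;
    const double cell = 0.25;                 // 25 cm grid resolution
    const double wx = 7.5, wy = 7.5;          // workstation position [m]
    // Illustrative parameters: ridge (mode) at about 5 m, width standing in for reliability
    const double s = 0.25, m = std::log(5.0) + s * s;

    std::vector<double> field(w * h);
    for (int j = 0; j < h; ++j)
        for (int i = 0; i < w; ++i) {
            double dx = i * cell - wx, dy = j * cell - wy;
            field[j * w + i] = lognormal_pdf(std::sqrt(dx * dx + dy * dy), m, s);
        }
    // The ridge of the "volcano" is the circle of radius ~5 m around the workstation.
    std::printf("value on the ridge: %.3f, at the crater centre: %.3f\n",
                field[(h / 2) * w + (int)((wx + 5.0) / cell)],
                field[(h / 2) * w + w / 2]);
    return 0;
}
```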

Figure 3.3: Probability field representation. (a) 3D rendering; (b) a slice shows the lognormal function.

If multiple workstations are available, we can take the cross-section of the "volcanoes" to get the possible location of the transmitter, the same way as in the case of trilateration. As part of the testing and implementation of the localization system, an experiment has been conducted to demonstrate this very effect. Shown in Figure 3.4, red dots mark the transmitter (ambient device), white dots mark the receivers (workstations).

Figure 3.4: Probabilistic approach to trilateration

Custom localization algorithm with Kalman filter

The system is defined to be fully functional with the minimal number of workstations. During one cycle, only one workstation provides signal information, which makes the trilateration of ambient nodes impossible (at least three workstations would be necessary for that). Therefore, a new approach is necessary. We are introducing a new localization approach, partially using the Kalman filter, as explained below.

As shown above, the signal information converted to distance gives a volcano shaped object around the receiver (workstation), and the highest points of this volcano (the maximum of the lognormal function) give a circle. The ambient node is here a point somewhere on that circle; the exact position is yet unknown. In terms of equation systems, we can say that we have the equation of a circle, and the exact position on the circle is an unknown variable. Thus, we introduce another source of information in the form of another equation to eliminate this variable. This extra information was chosen to be the line connecting the workstation to the previously predicted location of the ambient device. The method is illustrated in Figure 3.5.

Figure 3.5: Custom localization algorithm explained

(1) The C.E. already has some location estimation from the previous cycle or from the initial coordinates.
(2) The new data from the workstation is translated to a distance, which gives the radius of the circle around the workstation.
(3) The circle intersects with the line connecting the workstation to the previously predicted location (see the sketch below).
(4) Based on the reliability value of the previous state and the reliability of the new signal measurement (which was converted to distance), the final location estimation is placed.
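A minimal sketch of step (3), under the geometry described above: because the line passes through the circle's centre (the workstation), the intersection in the direction of the previous estimate reduces to scaling a unit vector by the measured distance. Names and the degenerate-case handling are illustrative.

```cpp
#include <cmath>
#include <cstdio>

struct Point { double x, y; };

// Step (3): the measurement-based location lies on the circle of radius `dist` around
// the workstation, in the direction of the previous estimate. As the line passes
// through the circle's centre, the intersection is a simple scaling of the unit vector.
Point measurement_point(const Point& workstation, const Point& prev_estimate, double dist)
{
    double dx  = prev_estimate.x - workstation.x;
    double dy  = prev_estimate.y - workstation.y;
    double len = std::sqrt(dx * dx + dy * dy);
    if (len < 1e-9)                      // degenerate: previous estimate on the workstation
        return Point{ workstation.x + dist, workstation.y };
    return Point{ workstation.x + dist * dx / len,
                  workstation.y + dist * dy / len };
}

int main()
{
    Point ws   { 0.0, 0.0 };             // workstation (known position)
    Point prev { 6.0, 8.0 };             // previous location estimate of the ambient node
    double measured_distance = 7.0;      // from the RSSI-to-distance conversion
    Point z = measurement_point(ws, prev, measured_distance);
    std::printf("measurement-based estimate: (%.2f, %.2f)\n", z.x, z.y);   // (4.20, 5.60)
    return 0;
}
```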

The Kalman filter is invoked to decide on the position of the new location estimation. As described in Section 2.5, it takes the previous location (x_{t-1}, the state variable) and the new measurement (z_t, the measurement variable) as input positions, and the reliability values as internal matrix parameters. It produces a new x_t state variable as output, which is going to be the new location estimate. In this particular scenario, x_{t-1} is point (1), and z_t is point (4). With the appropriate configuration of the parameters, the output is drawn somewhat to the vicinity of the more reliable of the two input positions.

The example in Figure 3.6 shows how the system updates location estimates over multiple cycles. We start with an initial position estimate x_0 that is far off the true location. The first iteration of the Kalman filter takes this x_0 and the measurement-based location estimation z_1 as inputs. Here, z_1 is calculated with the previously discussed method, by intersecting a circle and a line. The position estimate output of the Kalman filter, x_1, provides the input in the next iteration. In an ideal situation, the position estimate is nearer the true location after each cycle.

Figure 3.6: Localization after 3 cycles. Green circles mark workstations, the red square marks the true location of the ambient node; x_t is the position estimate, z_t is the measurement estimate.

Let it be mentioned here as a side note that for the task of the localization update and the weighing between the two input locations (1) and (4), the Kalman filter was specifically chosen over other possible solutions or algorithms. The main reason was that the Kalman-based method not only weighs the two inputs comfortably, but during an iteration the internal parameters are automatically updated as well, which solves the problem of updating the reliability variable. Another advantage of the Kalman filter is that, based on the configuration of the internal variables, one can control the arithmetical complexity of the system. This means we are first able to quickly develop a simple and uncomplicated proof-of-concept system, which in turn can be expanded in a very straightforward, modular way to a more complex one in the future.


Chapter 4

Implementation

As described before, the localization scenario takes place in an indoor environment with multiple simple, radio-connected nodes (ambient devices) present at locations only very roughly known to the searching party. The searching party consists of one or more workstations, with radio modules compatible with the ambient devices and a direct connection to the central entity. The goal is to continuously improve the location estimation at the C.E. with every new incoming measurement data set from the workstations' radio surveys.

During implementation, it was found possible to simplify the concept by using the same hardware for ambient devices and workstations; only the latter were connected to a computer, which then served both as the central entity (data processing) and as the controlling frontend to the mbed (commands, initiating radio surveys, collecting data). Furthermore, it was possible to reduce the number of computers to one, with the distant workstations forwarding their data to the one that is directly connected to the PC. Figure 4.1 displays the basic layout of the system setup.

Figure 4.1: Localization system. White: ambient devices. Red: workstations. Green & PC: the central entity.

4.1 Nodes hardware

The nodes, which serve both as ambient devices and as workstations, were assembled on top of the ARM microcontroller-based mbed platform, using a dresden elektronik REB212 radio extension board and (for ambient devices) an external battery holder (Figure 4.2). During hardware development the focus lay on reducing the development time as much as possible, hence the use of the mbed and pre-assembled, breadboard-compatible radio hardware.

Figure 4.2: Ambient node assembled. From left to right the parts: mbed, battery pack, radio module.

mbed LPC1768

The mbed is a development board encompassing a microcontroller and additional peripheral circuitry, with a pre-deployed firmware and drivers to ease communication with the PC via USB. It is also often referred to as a rapid prototyping platform, as many research and student groups around the globe are using it for demo and prototyping projects. The microcontroller inside is the NXP LPC1768, which is based on ARM's Cortex-M3 design. The mbed is capable of switching its power source between the internal miniUSB port and the VIN pin. This was found to be an important feature, as the workstations were deployed with a live USB connection to the PC (without battery) and the ambient nodes were deployed with battery packs – while the hardware design did not need to be changed (thus reducing hardware development time).

Figure 4.3: mbed microcontroller

Radio module

The dresden elektronik REB212 is a general purpose radio board encompassing the Atmel AT86RF212 as the transceiver chip. Its greatest advantage is that the RF-critical components are already built in, thus eliminating the need for extra tuning, and again, saving valuable development time.

The AT86RF212 acts as a slave microcontroller when connected to the mbed: it receives commands and settings. It is compatible with many standards, including 6LoWPAN, and can operate on ISM standard frequencies in the 700–900 MHz range, with BPSK or O-QPSK modulation at various bitrates. For our specific purpose, O-QPSK at 868 MHz was selected with a 100 kbps bitrate.

As briefly noted before, the AT86RF212 is also capable of providing RSSI values after each received transmission. Using the provided software libraries, the RSSI values from the original 0–127 scale can be converted to decibel values, which makes it possible to estimate the distance between the two nodes (signal power to distance conversion with the help of the Friis equation).


4.2 Node software

The mbed software was developed in C++, using the additionally provided mbed libraries and the Chibi software stack to control the AT86RF212. The functions were written on top of the mbed RTOS, and can be divided into two groups: threads and subfunctions. Figure 4.4 illustrates the connections between hardware, firmware and software on the node.

Figure 4.4: mbed hardware, firmware and software

Signal survey using the Chibi stack

Chibi is a minimalistic software library built on top of the 802.15.4 architecture, providing an interface between the master microcontroller (in our case, the mbed) and the radio chip (AT86RF212). It provides high-level C library functions and variable definitions for the master and translates them into low-level commands for the radio. Chibi is similar to ZigBee in the sense that it does not define the underlying hardware, but can be deployed on many different combinations of radio chips and microcontrollers.

The selection of the bit rate and the modulation (BPSK, O-QPSK) also happens from high-level software (i.e. in the mbed main code, during initialization), as the hardware itself can operate on multiple standards. The Chibi software stack provides user-friendly high-level functions to handle addressing and RSSI conversion, and the acknowledge signal is handled internally. In general, Chibi is designed to be straightforward and simple.


Modular software development

During prototyping, five nodes were assembled and programmed in total. As the hardware was found to be completely interchangeable between workstations and ambient devices, it was a logical step to make the software running on the nodes modular as well.

At this point, it is important to make a distinction between the workstation directly connected to the PC and the ones forwarding their data to it. For the sake of argument, let us name the connected workstation god and the other workstations demigods.

Each device received the same binary executable to run, with the same set of functions – but a different id.txt file. The code defines which ID belongs to an ambient device, a god node, or a demigod. This also implies that each device has knowledge about the group membership of all other devices – i.e. a demigod knows which IDs denote ambient nodes. The main function first reads the file containing the ID and then starts the appropriate functions (threads) accordingly. A short description of the tasks of each group follows, together with a sketch of this dispatch after the role descriptions.

God is connected to the PC via USB, emulating a serial connection to a terminal window (mbed library for serial-over-USB). It waits for recognizable inputs and commands from the PC keyboard (e.g. Start survey XYZ), then executes the appropriate subtasks. Over the radio it can give commands to both the demigods and the ambient nodes. It also processes the data received from them and forwards information to the PC via the terminal window, and it pings the demigods and ambient nodes at defined intervals. There are multiple modes of surveying and data processing.

Demigod receives commands setting the surveying mode from the god node, then executes the requested survey and records the signal strength values. If the mode requires it, it calculates an average and standard deviation from the recorded data. Finally, depending on the survey mode, it forwards either the average and standard deviation or all recorded signal data. It also responds to pings from the god node and pings the ambient nodes.

Ambient node upon receiving a survey request, starts broadcasting packets according to the survey mode (e.g. one packet; 1000 packets; an indefinite number of packets until receiving a stop command, etc.). It also responds to pings from the demigods and the god node.
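The sketch below illustrates the role dispatch performed by the main function. It is a simplified, desktop-runnable rendering that uses std::thread in place of the mbed RTOS threads listed later in this section; the ID-to-role mapping shown here is an assumption, not the mapping used in the thesis.

#include <fstream>
#include <thread>
#include <vector>

enum class Role { God, Demigod, Ambient };

Role role_from_id(int id) {
    if (id == 0) return Role::God;       // assumption: ID 0 is the god node
    if (id < 3)  return Role::Demigod;   // assumption: IDs 1-2 are demigods
    return Role::Ambient;                // remaining IDs are ambient nodes
}

void t_god_measure() {}                  // placeholders for the threads
void t_god_connect_and_listen() {}       // described later in this section
void t_god_ping() {}
void t_demigod_run() {}
void t_demigod_ping() {}
void t_amb_run() {}

int main() {
    int id = 0;
    std::ifstream("id.txt") >> id;       // every node carries its own id.txt

    std::vector<std::thread> threads;
    switch (role_from_id(id)) {
    case Role::God:
        threads.emplace_back(t_god_measure);
        threads.emplace_back(t_god_connect_and_listen);
        threads.emplace_back(t_god_ping);
        break;
    case Role::Demigod:
        threads.emplace_back(t_demigod_run);
        threads.emplace_back(t_demigod_ping);
        break;
    case Role::Ambient:
        threads.emplace_back(t_amb_run);
        break;
    }
    for (auto& t : threads) t.join();
}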


Commands and actions

The surveys are triggered and their modes are selected through keyboard commands typed in the terminal window that communicates with the mbed via a virtual serial connection over the USB cable. In general, the user sends a command to the god node, the god node sends a request to the demigods and ambient nodes, and the ambient nodes answer with a broadcast, which is picked up both by the demigods (which then forward the data in some form to the god node) and by the god node itself.

The modes are detailed as follows:

Classic The god node requests a predefined number of packets from the ambient nodes and informs the demigods about the mode of the survey. The ambient nodes start broadcasting the packets. For each packet the demigods receive, they read the signal strength, convert it to decibels, and then send the decibel value to the god node. On top of that, the god node receives the packets from the ambient nodes as well and calculates the decibel values of those transmissions. After finishing the transmission phase, the ambient nodes and the demigods return to listening mode.

Averaging Similarly, the god node requests N packets from the ambient nodes and informs the demigods about the mode setting. However, now the ambient nodes broadcast N packets in a single burst, and the demigods temporarily store all N signal strength values. The ambient nodes return to listening mode, the demigods calculate the average and standard deviation, transmit these to the god node, and then return to listening mode as well. Besides the data received from the demigods, the god node also receives the packets directly from the ambient nodes and calculates its own average and standard deviation.

Flow Almost identical to Classic mode, with the one significant difference that the number of packets is not predefined; instead, the survey lasts indefinitely until the node receives a command to cease transmission.

Both in Classic and in Averaging mode multiple packets are transmitted at the same time. Furthermore, in Averaging mode, a predefined number of messages must be received. The issues of collision and packet loss are handled by a combination of the Chibi stack's internal acknowledge handling and a custom counter implemented as part of the mbed software. Namely, each message carries a Message ID, incrementing from one to N. The god and demigods keep track of the ambient nodes and their progress in

message transmission. If the next expected packet does not arrive, the demigod or god node requests the ambient node to re-send it1.
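The counter mechanism can be illustrated with a small tracker class. This is a hedged sketch, not the literal node code: the re-send request is reduced to a callback, and since the thesis does not specify whether a timeout or a detected gap triggers the request, gap detection on arrival is used here for brevity.

#include <cstdint>
#include <functional>
#include <utility>

// Tracks the incrementing Message IDs of one ambient node and asks for a
// re-send when a gap is detected. The real node code combines this with the
// Chibi acknowledge handling, which is not reproduced here.
class MessageTracker {
public:
    explicit MessageTracker(std::function<void(uint16_t)> request_resend)
        : expected_(1), request_resend_(std::move(request_resend)) {}

    // Called for every packet received from the ambient node.
    void on_packet(uint16_t msg_id) {
        while (expected_ < msg_id) {            // ask for every skipped ID
            request_resend_(expected_);
            ++expected_;
        }
        if (msg_id == expected_) ++expected_;   // in-order packet accepted
        // msg_id < expected_: duplicate or late re-send, ignored here
    }

private:
    uint16_t expected_;                             // next expected Message ID
    std::function<void(uint16_t)> request_resend_;  // radio re-send hook
};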

Real Time OS and threads

The prototype system showed a need for a large amount of concurrent data transfer and forwarding, and (for the god node) for maintaining a constant serial connection with the PC. To ensure an adequate level of time and resource sharing, we opted to use a real-time operating system. Since the software development was homogeneous for all three types of devices, the god node's need for multiple concurrent tasks led to porting the real-time OS to the other two device groups as well.

The mbed RTOS is a minimalistic real-time operating system with a thread-based structure similar to µC-OS and a mixture of signals, semaphores and mutexes to control resource sharing between different threads.

Here follows the list of implemented threads. As noted before, the main function is set up so that only the appropriate threads are invoked, as defined in the node's unique ID file. A demigod node will never run the threads meant to run on a god node, and so forth.

void t_god_measure() The god node constantly listens on the virtual serial port for any input from the keyboard. If a recognizable command arrives (i.e. a command for one of the three survey modes mentioned above), it sets a global variable according to the requested mode and sends a broadcast message with the mode information to the demigods and ambient nodes.

void t_god_connect_and_listen() As the name suggests, this is the receiving thread running on the god node. It picks up the signals from the other nodes and processes the data according to the current survey mode. The mode is read from the global variable set by the former thread. It also displays the processed information in the terminal window.

void t_god_ping() At a predefined time interval, the god node pings all the other nodes expected to be alive. The results are printed in the terminal window. Mostly used for debug purposes.

1 During the thesis work, testing and measurements, we did not encounter a single case that would have required a more advanced level of collision and packet loss handling.


void t_demigod_run() It is quite similar to the connect-and-listen function: it processes the incoming signals in a simple if-then-else structure. According to the command (survey mode) received from the god node, it starts forwarding and/or processing the signals from the ambient nodes.

void t_demigod_ping() It pings only the ambient nodes and prints the results for debug purposes2.

void t_amb_run() On receiving a command from the god node, it starts broadcasting packets according to the survey mode. In Flow mode this thread is responsible for processing the commands to start and stop the message flow, while the packet sending itself is handled by a separate thread.

void t_amb_flow() If enabled, it repeats packet sending indefinitely. Enabling and disabling is controlled by the t_amb_run() thread.

It is interesting to note that, thanks to the Chibi protocol's pre-implemented acknowledge handling, there is no need for the ambient node to have a separate thread for responding to pings; it happens automatically.

The design of the software enables the threads of different node types to invoke the same subfunctions. For example, t_god_ping() running on the god node and t_demigod_ping() running on a demigod both call the same unsigned char ping() function, but with different parameters: the god node pings all nodes, while a demigod only pings the ambient nodes. This modular way of software development has reduced development time significantly while maintaining a cleaner, more usable code base.
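A minimal sketch of this shared-subfunction idea follows. The radio primitive, the return value (number of nodes that answered) and the address ranges are assumptions made for illustration; the real implementation sends Chibi packets and relies on their acknowledgements.

#include <cstdint>

// Assumed radio primitive: sends a ping frame and waits for the
// acknowledgement. Stubbed out here so the sketch compiles on its own.
bool send_ping_and_wait_ack(uint8_t /*address*/) { return true; }

// Pings every address in [first, last] and returns how many answered.
unsigned char ping(uint8_t first, uint8_t last) {
    unsigned char alive = 0;
    for (uint8_t addr = first; addr <= last; ++addr)
        if (send_ping_and_wait_ack(addr)) ++alive;
    return alive;
}

// God node pings all nodes; a demigod pings only the ambient nodes.
// (The address ranges below are illustrative.)
void t_god_ping()     { ping(1, 22); }
void t_demigod_ping() { ping(4, 22); }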

4.3 Central entity software

For the central entity, a complex set of Matlab scripts and programs has been implemented. The inputs of the system are the rough initial locations of the nodes and the signal data collected by the god and demigod nodes. The system updates the location estimate for each ambient node in every cycle in which a new set of data arrives. The output is the location coordinates with the corresponding reliability values, in numerical form and shown on a simple map as well.

2During the prototyping, demigod nodes were most often deployed with a battery pack and no connection to a PC. However, during some testing and debug sessions, they were indeed connected to a PC to monitor their outputs via the terminal window.


The main program script consists of two main parts, both of which are detailed in the following sections.

• First, the initial variables and databases (the true locations of the nodes) are loaded. Using the database of true locations, an alternate database of positions with artificially induced random error is generated.
• Afterwards, the Kalman filter subsystem processes the data: the initial locations of the nodes are set to the error-induced positions, and the Kalman filter uses the previously measured signal strength and signal reliability data (from the god node and the demigods) to correct the initial estimates.

Generating an alternate database

We start with the original, true location3 (x, y) of each ambient node. Then, a shift is generated for each location. The length of the shift is constant for all the nodes, but the angle of the shift is a uniformly distributed pseudorandom value in the range [0° .. 360°]. The length and angle are then converted to Cartesian coordinates, and these values are added to the original true coordinates to generate an error-induced array of coordinates.

The process is then repeated multiple times with different shift lengths. The different alternate databases are stored at this point, and the Kalman subsystem will process them in separate runs. In each run, the database is continuously updated according to the radio survey data. At the end of the run, the results are saved, the database is cleared and the next alternate database (with another shift length) is loaded. For example, in a test used during the evaluation of the system, the shift length is defined to be [5, 10, 15 .. 100] ∗ 10 cm and the central entity obtains radio survey data from 3 workstations. This means there are (100-5)/5+1 = 20 runs, and having 3 workstation databases means each run has 3 Kalman cycles (every database is updated 3 times, then cleared).
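A compact sketch of this procedure follows. The real script is implemented in Matlab; this is only a C++ rendering of the same idea, and the coordinate type and function name are of our own choosing.

#include <cmath>
#include <random>
#include <vector>

struct Point { double x, y; };   // coordinates in centimetres

// Shift every true position by a fixed length in a uniformly random direction.
std::vector<Point> make_alternate_db(const std::vector<Point>& true_pos,
                                     double shift_cm, std::mt19937& rng) {
    const double two_pi = 2.0 * std::acos(-1.0);
    std::uniform_real_distribution<double> angle(0.0, two_pi);
    std::vector<Point> shifted;
    shifted.reserve(true_pos.size());
    for (const Point& p : true_pos) {
        double a = angle(rng);                          // random direction
        shifted.push_back({p.x + shift_cm * std::cos(a),
                           p.y + shift_cm * std::sin(a)});
    }
    return shifted;
}

// One alternate database per shift length, e.g. 50, 100, ..., 1000 cm:
//   for (int s = 50; s <= 1000; s += 50)
//       databases.push_back(make_alternate_db(true_positions, s, rng));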

Kalman subsystem

The Kalman subsystem loads one of the alternate, deliberately error-induced databases and treats the values as the initial coordinates of the nodes. It is as if the facility personnel of the building were making a map and marking the locations of the ambient nodes very roughly and incorrectly.

3 These are measured during prototyping.

Forming the backbone of this thesis, the Kalman subsystem has the task of improving the estimates of the inaccurate coordinates of the ambient nodes using the radio survey data gathered by the god node and the demigods.

An estimate update cycle is triggered by a new set of measurement data, which can provide new information for all the nodes or only for some of them, depending on the radio survey. Either way, the appropriate nodes are processed by the Kalman subsystem.

For each node, a new distance estimate is first calculated, based solely on the new measurement data. As mentioned before, the god node collects and forwards data in decibels to the central entity, so a conversion to distance is required.

Conversion coefficients were tuned during preliminary tests and measurements, where two nodes were placed at predefined distances from each other – [1, 2, 3 .. 20] m – and the signal strength was measured (Figure 4.5, marked in red). We know from the Friis equation that the ideal signal strength fades inversely proportionally to the square of the distance from the origin. By varying the parameters of such a function we found the set of parameters with the smallest summed squared difference to the measured values. This is the best-fit function (Figure 4.5, blue), and its parameters are used as conversion coefficients in the Kalman subsystem (a sketch of this parameter search follows the figure).

Figure 4.5: Preliminary measurement data; distance to signal strength (red) and best 1/r2 fit (blue)
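The parameter search can be illustrated with a simple brute-force least-squares sweep over a log-distance model P(d) = a - 10·b·log10(d), which reduces to the 1/r² Friis case when b = 2. The model form, grid ranges and data structure here are placeholders; the thesis fits its own coefficients to the measurements shown in Figure 4.5.

#include <cmath>
#include <limits>
#include <utility>
#include <vector>

struct Sample { double dist_m; double rssi_db; };

// Brute-force least-squares fit of P(d) = a - 10 * b * log10(d) to the
// measured (distance, signal strength) pairs; grid ranges are illustrative.
std::pair<double, double> fit_log_distance(const std::vector<Sample>& data) {
    double best_a = 0.0, best_b = 0.0;
    double best_sse = std::numeric_limits<double>::max();
    for (double a = -60.0; a <= -20.0; a += 0.5) {        // offset sweep
        for (double b = 1.0; b <= 4.0; b += 0.05) {        // exponent sweep
            double sse = 0.0;
            for (const Sample& s : data) {
                double model = a - 10.0 * b * std::log10(s.dist_m);
                sse += (s.rssi_db - model) * (s.rssi_db - model);
            }
            if (sse < best_sse) { best_sse = sse; best_a = a; best_b = b; }
        }
    }
    return {best_a, best_b};   // later used for the dB-to-distance conversion
}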

After calculating the new distance from the signal strength, the conversion to 2D

coordinates follows. As explained in Section 3.2, the intersection of a circle with the new distance as its radius and the line connecting the workstation and the node's initial coordinate is calculated. This point is now the new coordinate estimate.
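Geometrically, this amounts to walking the newly measured distance from the workstation along the direction of the node's current estimate. A short sketch, with names of our own choosing (the thesis implements this step in Matlab):

#include <cmath>

struct Point { double x, y; };

// Move from the workstation towards the node's current estimate until the
// newly measured distance is reached; this is the intersection of the
// distance circle and the connecting line that lies nearest the estimate.
Point update_coordinate(Point workstation, Point node_estimate, double dist) {
    double dx = node_estimate.x - workstation.x;
    double dy = node_estimate.y - workstation.y;
    double len = std::hypot(dx, dy);
    if (len == 0.0) return node_estimate;   // degenerate case: keep old estimate
    return { workstation.x + dist * dx / len,
             workstation.y + dist * dy / len };
}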

One of the main reasons the Kalman filter was chosen for this thesis project is that it automatically updates its own parameters – the reliability, or error covariance, of the estimates – with every cycle. In the database, the coordinates and the reliability values are continuously monitored, and at the end of each cycle they are represented on a map as Gaussian bells, with the reliability values serving as the width of each bell.

At the very beginning, before the first set of measurement data, the static reliability value (S.R.V.) is defined freely by the user as a distance value somewhere between [0 .. 1000] cm. The larger the number, the less reliable the position estimate is. In each cycle of the Kalman filter, the S.R.V. is continuously updated.

There is also a reliability value for each measured signal strength value gathered during surveying. As described in Section 3.1, it is calculated as the standard deviation of the samples x_1 .. x_N around their mean, where N is the number of samples. The standard deviation is usually in the range of [0 .. 20] dB. This is called the measurement reliability value (M.R.V.), and once again, a larger number means less reliability.

The Kalman filter takes two inputs, the signal estimate x_k (or state variable) and the observation value z_k (or measurement variable), along with multiple arrays of parameters (e.g. measurement noise, process noise, error covariance). After taking everything into account, it produces an updated signal estimate and updated parameter arrays. In our case, the signal estimate is the coordinate value stored in the database and the observation value is the new distance estimate generated above.

However, the S.R.V. and M.R.V. cannot be mapped directly onto the Kalman parameter arrays. Furthermore, the Kalman parameters do not operate in the convenient ranges of [0 .. 1000] or [0 .. 20]. A conversion is therefore needed before and after each invocation of the Kalman filter. The appropriate conversion functions were developed in a rather empirical way, but the test results so far have shown them to be sufficient and well-functioning.

The Kalman filter provides an updated signal estimate as its main output variable and this new value replaces the previous coordinate estimates in the database. In the next Kalman cycle, this updated value serves as the signal estimate input, and another workstation’s survey data provides the measurement variable.


The Kalman output also provides the updated parameter arrays, so after the appropriate inverse conversions an updated S.R.V. variable can be retrieved. In the same way as the signal estimate, it is stored in the database and used as input in the next Kalman cycle.

Thanks to the internal operation of the Kalman filter, the updated and converted coordinate estimates and the S.R.V. variable track the change in the reliability of a position estimate. For example, if the coordinate estimate input is highly reliable and the measurement is unreliable, then the Kalman output – both the coordinate estimate and the S.R.V. – stays very close to the input data. If the input coordinates are unreliable and the measurement is reliable, then the output coordinates will be in the vicinity of the measurement data, while the S.R.V. variable will show a corresponding improvement (e.g. the S.R.V. input is an unreliable 1000 cm, and the output has a reduced unreliability of 500 cm).
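This behaviour can be illustrated with a one-dimensional, reliability-weighted update in which the two reliability values are treated directly as standard deviations. This is a deliberately simplified sketch: the thesis maps S.R.V. and M.R.V. to the Kalman matrices through empirical conversion functions that are not reproduced here.

#include <cmath>

struct Estimate { double value; double srv; };   // coordinate and its S.R.V.

// One-dimensional, reliability-weighted update; srv and mrv are treated as
// standard deviations for illustration. (Assumes srv and mrv are not both zero.)
Estimate update(Estimate prior, double measurement, double mrv) {
    double p = prior.srv * prior.srv;     // prior variance
    double r = mrv * mrv;                 // measurement variance
    double k = p / (p + r);               // gain in [0, 1]
    Estimate post;
    post.value = prior.value + k * (measurement - prior.value);
    post.srv   = std::sqrt((1.0 - k) * p);   // reliability can only improve
    return post;
}

// A reliable prior and an unreliable measurement keep the output close to
// the prior; an unreliable prior (large S.R.V.) and a reliable measurement
// pull the output towards the measurement and shrink the S.R.V.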

Chapter 5

Evaluation

5.1 Configuration and methodology

We have seen in the previous chapter that the full system consists of three main parts: node hardware, node software and central entity software. These parts are not fully separate; they have to work together for the system to be successful. Hence, the evaluation should confirm not only each individual part's status and readiness, but also their collaborative operation as a complete localization system.

Furthermore, we have to demonstrate the system in a realistic scenario, where multiple ambient devices and workstations are present in an office environment. They have to communicate with each other and with the central entity, and the central entity should be able to update and improve the location information based on the radio surveys.

We present how the prototype system has been configured and set up to demonstrate and evaluate all the implemented functions and actions in operation. The keyword throughout is realism: we aim to create a realistic scenario, with realistic node placement and measurements.

Node arrangement

Offices 6F8, 6F9, 6F10 and recreational area 6K4 in the Cambridge building of ARM Holdings have been selected as the testing field. The total area is roughly 15 m × 12 m. First, 19 ambient nodes and 3 workstations have been virtually

placed. As shown in Figure 5.1, the locations have been selected not in a uniform fashion, but rather in a realistic way. For example, ambient node (6) represents a chip embedded in a fire extinguisher, number (16) is "inside" a phone in the conference room, (3) is on an intelligent coffee machine, and so forth.

At this point, the above-mentioned locations are virtual; the measurements have not begun yet. The number of physically assembled nodes, however, is much lower than 19 + 3. With 4 nodes at hand, and knowing that the messaging is one-directional, it seemed logical to physically place the 3 workstations at their exact virtual locations and move the one remaining node around the 19 specified ambient node locations, providing three measurements at a time – one for each workstation. Finally, the 3 × 19 data points are compiled into three separate databases, which emulate three radio surveys – again, one for each workstation. This data is then fed to the central entity, running the localization software.

Figure 5.1: Virtual positions of ambient nodes [(1) - (19)] and true positions of workstations [*1 - *3] in the office environment

Now, the three workstation nodes were prepared and placed, then the ambient node

was also programmed and placed at one of the 19 virtual positions, i.e. on top of the coffee machine or next to the phone, according to its simulated functionality. During a typical radio survey, 1000 data points were collected. The mean of the samples served as the final signal strength value of that location, and the calculated standard deviation was converted to the Measurement Reliability Value. After a successful survey, the ambient node was moved to the next position, and so forth.
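The per-position statistics can be summarised in a few lines; the structure and naming below are illustrative.

#include <cmath>
#include <vector>

struct SurveyResult { double mean_db; double mrv_db; };

// Mean and standard deviation of one survey (e.g. 1000 RSSI samples in dB);
// the standard deviation becomes the M.R.V. for that node-workstation pair.
// Assumes a non-empty sample vector.
SurveyResult summarise(const std::vector<double>& samples_db) {
    double sum = 0.0;
    for (double s : samples_db) sum += s;
    const double mean = sum / samples_db.size();

    double var = 0.0;
    for (double s : samples_db) var += (s - mean) * (s - mean);
    var /= samples_db.size();              // population variance

    return { mean, std::sqrt(var) };
}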

It is important to note that during the measurements the office area was not emptied or modified in any way. People continued to work and to move in and out of the recreational area; the computers stayed switched on; and the WiFi routers and mobile devices were not disabled either. This contributed to the realistic nature of the scenario and added multipath-based noise to the measurements.

System parameters

After setting up the nodes and collecting the measurement data, but before starting the evaluation of the Kalman-based localization system at the central entity, a wide range of variables needs to be set and fine-tuned.

Database size and values After the data collection and database arrangement, 3 databases are compiled by default, each with 19 entries. These are processed in 3 sequential cycles by the localization system, continuously updating the assumptions about the 19 nodes' positions. When creating the evaluation scenario, we can choose to select all 3 × 19 data points, or just a subset of them. For example, leaving out some of the 19 points from one of the databases can suggest that those ambient nodes were out of range for that particular workstation and no measurement was recorded1. Furthermore, the dataset can even be transposed, as it consists only of signal strength and signal quality data: one could build 19 alternative databases, with 3 points in each, and iterate the localization system 19 times, and so forth.

Database sequence and theoretical databases After deciding which data points are included in the individual databases, they have to be ordered in a sequence. Remember, the measurement data has already been collected from all the workstations, hence feeding the central entity with the input in different sequences can represent and simulate different real-world scenarios. For example, in Scenario m123 the central entity receives the data from workstation *1 first, then from

1 In reality, this was not the case. All 3 workstations had good coverage of all 19 ambient nodes, so the full database in fact consists of 3 × 19 successfully recorded data points.


*2, and finally, from *3. In Scenario m231, the order is different (*2, then *3, then *1). Accordingly, the results after three Kalman cycles can vary from scenario to scenario. We also introduce here the concept of a theoretical database, meaning an artificially created database of signal data whose decibel values are not obtained by measurement, but converted from the true, originally recorded distances between the ambient nodes and the workstations.

Shift value of the initial coordinates The error-induced initial coordinates are the starting point of the localization system – the first cycle of the Kalman filter subsystem uses them as the input that needs to be corrected with the help of the workstations' databases. But how far off should they be? How well does the system perform if we set the shift length to 1000 cm? What happens if it is set to 50 cm? Again, a wide range of possibilities arises. We have to select some subset of the full possible scale, striking a balance between good detail and an acceptable runtime.

Initial coordinate static reliability values (S.R.V.) As mentioned in Section 4.3, both the reliability values of the initial, error-induced coordinates and those of the measured signal strength have to be converted to the same scale for the Kalman subsystem. For coordinates, this scale also represents a range in centimetres, showing the maximum difference the variable has from the real, originally recorded position. E.g. if the Kalman filter's input says [700, 350]2, with a reliability of 100, then the real position has to be somewhere inside a circle with a radius of 100 cm, centred at [700, 350] on the map. The static reliability is determined solely by the user, and configuring it has a huge impact – it has to be defined in accordance with the previously mentioned shift value. That is, if the reliability is set to 500 for an ambient node, the shift value of the same ambient node cannot be higher than 500 cm. It represents another degree of freedom: we can set every ambient node's reliability to the same constant or choose a different number for each node.

Kalman parameters Each time the Kalman subsystem is invoked, it has to calculate a coordinate estimate somewhere on the scale between the previous estimate x_k and the coordinate calculated from the new measurement data z_k. It takes the two position and reliability values as input, then calculates the new estimate and reliability value and gives them as output. Internally, the parametrisation of the Kalman matrices and internal variables plays a great role in the coordinate output: they influence how strongly the Kalman filter favours x_k over z_k – or the other way around. With a specific configuration of the internal parameters it is even possible to make the Kalman filter completely disregard one of them and choose the output to be equal to the other. During the thesis work, many preliminary measurements and central entity simulations took place to empirically determine a few sets of pseudo-optimal parameters3.

2 Do not forget, this is the generated, error-induced coordinate – not the original one. See Section 4.3, Generating an alternate database.

Selected scenarios

From the infinite number of possible configurations we had to choose a few to build the main scenarios for evaluating the performance of the localization system. During selection, the main criterion was, once more, realism: which possible scenarios represent a real-life use of the localization system in the most appropriate way? Following the previous list of parameters, the selected scenarios were built as follows:

• Database size and values We selected the full 3 × 19 databases, that is, three databases for the three workstations, each with the full set of 19 measured signal strength and signal reliability values4.
• Database sequence and theoretical databases The main scenarios involved the measured and the theoretical databases in the following sequences:
– Scenario m123: measured *1 → *2 → *3
– Scenario m231: measured *2 → *3 → *1
– Scenario m312: measured *3 → *1 → *2
– Scenario t1x6: theoretical *1 × 6
– Scenario t123123: theoretical *1 → *2 → *3 → *1 → *2 → *3
During Scenario m123 the C.E. localization software treated database *1 as the first set of arrived measurements, database *2 as the second, and so forth. In the following two scenarios the sequence was mixed5. During Scenario t1x6, the theoretical database at workstation location *1 (in the center of the area) was repeatedly fed to the Kalman system as the z_1..6 input arrays. Because the theoretical database was generated using the real coordinates of the ambient nodes, this scenario ought to push the outputs x_2..7 nearer to the real locations in every cycle. This feature is explored in more depth in the next section. Scenario t123123 ran altogether 6 Kalman cycles of theoretical data, and will be compared with Scenario m123 and Scenario t1x6.
• Shift value of the initial coordinates This parameter is perhaps the most significant of them all, thus it has to be investigated in as much detail as possible. Alternate databases were created with shift lengths of 50, 100, 150 (..) 1000 cm. The maximum shift length was chosen in correspondence with the size of the testing area (15 × 12 m).
• Initial coordinate static reliability values (S.R.V.) As the maximum shift length was set to 1000 cm, it made sense to set the S.R.V. to 1000 as well. Another possibility would have been, for example, to choose the S.R.V. in each variation to be equal to the momentary shift length (50 .. 1000) of the current alternate database.
• Kalman parameters Many possible configurations were tested and in the end we selected three main sets of parameters for the Kalman filter, namely config #1, config #2 and config #3. These have different values for the internal constants and parameter matrices of the Kalman filter. Simply put, the configuration with the higher serial number has parameters that end up handing a higher preference to z_k over x_k (measurement input over previous state). These sets are, just like the shift length, one of the main dimensions of the evaluation analysis.

3 It is not entirely uncommon in industry to parametrize the Kalman filter in an empirical way.
4 On a side note, there were also some minor testing scenarios involving non-full databases, and the C.E. software was modified to handle this situation as expected: if ambient node X did not have new measurement data in the current cycle, the Kalman filter was not invoked and its coordinates remained the same until the next cycle.
5 As in reality all the databases were pre-recorded, this did not pose any problems.

5.2 Analysing results

The signal measurement data have been collected and organized into databases, and the evaluation scenarios have been compiled. The Matlab system that represents the central entity's software has been set up to run the scenarios and give the output both in the form of number arrays and in the form of plotted graphs. The number arrays include every single ambient node's location state after each Kalman cycle, while the graphs represent an average over the 19 nodes' changes of location state in each cycle. Investigating these outputs has led to some deductions about the behaviour of the localization system.

In the following parts, these deduced properties are detailed, accompanied by figures taken from the system output. Unless otherwise noted, the X axis represents the initial shift length, and the Y axis represents the improvement over the initial error that was induced in the databases during the initialization of

the program6. For example, if a point lies at [500, 100] in one of the graphs, that means the 19 nodes each started with a shift length of 500 cm, and at the end of the Kalman cycle the 19 nodes' average improvement was 100 cm – meaning they are now only 400 cm off, on average. Each graph line represents the output after a Kalman cycle (i.e. after processing a set of data from a workstation).

Please note that in the following figures, unless otherwise noted, the horizontal axis stands for Initial error (shift length) [cm] and the vertical axis stands for Improvement over the initial error (shift length) [cm].

Location updates

Figure 5.2: Localization system output after the first (1), second (2) and third (3) Kalman cycle. Scenario m123. Kalman config #2.

As the first and foremost deduction about the system, we can state that the locations are successfully updated by the Kalman filter in each cycle. Figure 5.2 illustrates the first, second and third cycles during Scenario m123. The first cycle's inputs x_1 and z_1 are the pre-generated, error-induced coordinates and the signal data retrieved from workstation *1, respectively. The outputs x_2, for each variation of the shift length, are shown as the lowest curve (marked with (1)) in the figure. x_2 and z_2 (the signal data retrieved from workstation *2) provide the inputs of the second Kalman cycle, and so forth.

One can note two phenomena here. First, the results depend strongly on the shift length, and this is especially visible at low values. Under 250 cm, the outputs are

6 See Section 4.3, Generating an alternate database.

even negative, meaning the location prediction actually worsened. This can be explained by the fact that all initial S.R.V. reliability values were set to 1000, which is a rather high unreliability compared to the true induced error of 0–250 cm.

Secondly, disregarding the low shift value range, the mid-to-high range shows a moderate improvement in general; furthermore, each Kalman cycle improves over the previous one. There are, however, some catches. The claim that each Kalman cycle performs better than the one before is not always true: it holds in most scenarios, but in later sections we will see quite a few exceptions. Also, in the phrase "moderate improvement" the emphasis should lie on moderate: the highest improvement even after three cycles of Kalman updating is merely around 125 cm. Attempting to improve this number, we experimented with the Kalman filter's inner parameters and variables. The effect of changing these parameters is discussed in the next section.

Configuring the Kalman filter

Figure 5.3: Localization system output7 for (a) Kalman config #1, (b) Kalman config #2 and (c) Kalman config #3. Scenario m123. Same X-Y scale; the horizontal line marks zero.

Running the same scenario with three different Kalman filter configurations shows the importance of proper parametrization. Figure 5.3 illustrates the difference. All configurations take both the x_k and z_k inputs into account, but with different weights. From left to right, each configuration favours z_k slightly more than the previous one. The Y scales of the three graphs are equal.

By modifying the Kalman parameters, the graph's maximum value can in fact be increased from ∼125 cm to ∼200 cm, but this improvement comes with a tradeoff.

7 Full size figures are presented in Appendix A.1.


The new configuration amplifies the minimum values as well, so in the cases where config #1 and config #2 performed poorly, i.e. in the regions where the graphs are near or under the Y = 0 line, config #3 performs even worse. The leftmost positive point of the graph moved from 300 to 600 cm of shift length8.

The most important thing to note from these results is that in this system – and many other Kalman filter based systems – the Kalman parameters have to be configured with great care. One must look for an optimal configuration that yields both acceptable performance and a reasonable error rate.

Sequences

Figure 5.4: Localization system output8 for (a) Scenario m123, (b) Scenario m231 and (c) Scenario m312. Blue: output after the first Kalman cycle; green: after the second; magenta: after the third. Same X-Y scale; the horizontal line marks zero. Different sequences, Kalman config #2 for each.

As all the measurement data had already been recorded and compiled into databases before the central entity started localization, an extra degree of freedom arose – one can feed the C.E. the three databases in a different order, creating alternative sequences.

Figure 5.4 shows the three scenarios. In each case, blue denotes the output after the first Kalman cycle, regardless of which workstation provided the measurement input. Then follow green and magenta, as we proceed forward in time. The order of the sequence differs: Scenario m123, shown in Figure 5.4(a), starts by processing workstation *1; in Scenario m231 workstation *2 provides the first cycle's input; while Scenario m312 starts with *3.

The differences between the three figures originate from the differences between the workstation databases. More specifically, they are related to how the Kalman

8 Full size figures are presented in Appendix A.1.

system handles the measured signal quality or reliability values (M.R.V.). This can ultimately be traced back to the workstations' placement and the multipath properties of their local environment. For example, workstation *3 collected signal strength values with quite poor reliability, because it was placed in the far corner of the room and encountered severe multipath effects; whereas the data of workstation *1 had generally good reliability due to its fortunate placement in the center of the room (see Figure 5.1).

Convergence

In this part, we once more perform an abstract thought experiment, titled Scenario t1x6. The same initial databases are generated as in the previous sections (shift length 50–1000 cm, initial S.R.V. equal to 1000, Kalman configs #1–#3, etc.), but instead of the measurement data collected from the workstations we use an ideal database from a virtual workstation, which happens to be placed at the exact same location as real-life workstation *1. The database values are calculated from the true distance between the virtual workstation and the ambient nodes, using the inverse of the Friis equation (2.3). The M.R.V. reliability values of the data in this database are all set near 0, in this case meaning near-infinite reliability. Now, we iterate the Kalman subsystem six times9 and observe the output.

Disregarding the low shift length regions, each output improves over the previous one (see the graph lines in Figure 5.5a). But the interesting attribute to note here is that the improvement is gradually reduced with each cycle. This phenomenon illustrates the inner operation of the Kalman filter. The reliability values are constantly updated as Kalman parameters. At first, the position reliability is very low, while the measurement reliability is near maximum. After an iteration, the Kalman filter updates not only the X, Y location coordinates, but the position reliability value as well. As a highly reliable measurement was involved, the new position is deemed to be much less unreliable than the previous one. In the next cycle, the Kalman filter has the same measurement estimate and a slightly better position estimate as inputs. The output is therefore a bit nearer to the position estimate, so the improvement (or change) is slightly smaller than it was in the previous cycle.

9 The method is exactly the same as shown in the previous section, only instead of measured *1 → measured *2 → measured *3 it is now theoretical *1 → theoretical *1 → ... → theoretical *1 (six times).


Figure 5.5: Thought experiment with artificial, near-ideal data. (a) System output, Scenario t1x6, Kalman config #2. (b) Improvement over cycles.

The advantage of diverse input

Next, we perform the simulation marked as Scenario t123123. Similarly to Scenario t1x6, the databases are deliberately generated to provide perfect location information. Here, we use all 3 theoretical workstations in two complete runs, meaning the Kalman subsystem is invoked (and the location estimates are updated) a total of 6 times. It is important to note that t123123 performed best of all the available scenarios.

The number of Kalman cycles made t123123 comparable to Scenario t1x6 – both of them ran 6 full cycles. Figure 5.6 displays the two outputs side-by-side.

After the first cycle, where both scenarios processed the database of theoretical workstation *1, the two outputs are equal. But after completing 6 Kalman cycles, Scenario t123123 outperforms Scenario t1x6. The convergence of the values towards their maximum is also slower in the case of t123123. This demonstrates that using diverse databases is more advantageous than taking input from a single database.


Figure 5.6: Localization system output10. (a) Scenario t1x6, Kalman config #2. (b) Scenario t123123, Kalman config #2.

Comparing measured and theoretical cases

Next, we compare Scenario m123 with the first 3 cycles of Scenario t123123 (let us name this part Scenario t123). Both completed 3 Kalman cycles and processed the databases of 3 workstations in the same sequence, but with one significant difference: Scenario t123 ran the theoretical databases generated to provide perfect distances, while m123 used real-world input.

Figure 5.7: Localization system output after 3 Kalman cycles10. (a) Scenario m123, Kalman config #2. (b) Scenario t123, Kalman config #2.

10 Full size figures are presented in Appendix A.1.


As Figure 5.7 demonstrates, Scenario t123 performs better than m123. Remember, the only difference between the two scenarios is the input databases. This demonstrates the system's considerable dependence on the external environment: if our readings are imperfect, this has a corresponding impact on the output as well. The reasons behind the poor system performance and the imperfections of the real-world databases are briefly discussed in the next section.

A note on errors

As noted multiple times in this chapter, the system is far from perfect. The improvements are modest, the empirically determined Kalman parameters have to be precisely tuned, and the sequence in which the workstation data is processed can influence the outcome greatly. But why exactly is the system so sensitive to its external parameters and inputs?

Figure 5.8: Friis equation in noisy environment

Firstly, the inputs coming from the outside world are usually cluttered with error. Indoor multipath makes it difficult to convert signal strength into a reliable distance value beyond a certain range. Figure 5.8 shows the Friis equation in a realistic environment. At 3–5 meters from the origin the levels can be separated quite well, but above 5 meters it is nearly impossible to distinguish between data points. You cannot tell, for example, whether a signal strength value of -55 dB belongs to an object at 10 m or at 20 m from the origin.
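To make the ambiguity quantitative, consider the ideal Friis model in decibel form (the reference values P_0 and d_0 below are generic, not the thesis's fitted coefficients):

\[
P(d) = P_0 - 20\log_{10}\frac{d}{d_0},
\qquad
P(20\,\mathrm{m}) - P(10\,\mathrm{m}) = -20\log_{10} 2 \approx -6\,\mathrm{dB}.
\]

A 6 dB difference between 10 m and 20 m is easily masked by indoor multipath, which is on the same order as the 0–20 dB standard deviations observed during surveying.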


During the entire project, realism was a keyword, a mantra. The measurements were conducted in a real, indoor office environment, and multipath is part of that environment. Yes, it makes the job of the localization system extremely difficult, but it is something you cannot simply eliminate. In this situation, being able to configure the Kalman parameters and other system parameters empirically, in tune with this specific scenario, is not a disadvantage or an unfortunate necessity, but a powerful asset against multipath. This perspective – the author's take on and criticism of the system, the Kalman filter, multipath and measurement reliability – will be further detailed in the following and final chapter of the thesis.

Chapter 6

Conclusion

At the beginning of the thesis work we set out to investigate a theoretical, future environment filled with radio-connected ambient nodes, called the accidental infrastructure. These semi-intelligent devices are capable of sending radio messages on request, but they are not deliberately programmed for localization purposes. The thesis aimed at investigating the possibility of developing a higher-level localization service and introducing it into this environment.

An ecosystem of experimental devices was designed for this task. They were based on the same hardware and ran almost identical, modularly developed software – but they were separated according to their roles: ambient nodes symbolized the accidental infrastructure; workstations polled the ambient nodes for packets; the central entity used the workstations' databases, containing signal strength and signal reliability values, to estimate the locations of the ambient nodes.

Distance estimates, the basic building blocks of localization, are based on signal strength values. However, the noisy indoor environment introduced multipath-related errors into the measurements. To tackle this challenge, a probability-based approach was implemented, in which a lognormal approximating function performed the conversion from signal strength to distance.

Combining the newly received and converted distance values of the measurements with the data collected in the previous measurement cycle results in a new position estimate. The central entity's software continuously updates each ambient node's position estimate and its "goodness" or reliability with a custom, Kalman-filter-based localization algorithm.


The major finding of our research is that the parametrization of the Kalman filter plays a vital role in the accuracy of the position estimates. Using the same input data but setting up the system with a different configuration, we can obtain vastly different results. The Kalman filter's matrices and the order of the incoming data sequences influence the outcome greatly. The performance of the system is in general very volatile, and in most cases quite poor. The measurements using real-world data showed an improvement of 100–150 cm over initial error shifts of 200–1000 cm, while the theoretical simulation peaked at merely 200 cm of improvement.

In summary, this thesis studied the accidental infrastructure aided by an advanced localization system. The success of the system is rather questionable. It is functional and is able to track and calculate position estimates for a large number of nodes with a mildly sophisticated, Kalman-filter-based solution. However, the improvements are fairly weak. We have worked with noisy and unreliable, but realistic measurements; the indoor multipath compromised our output significantly. To present possible ways to improve the system, let us draft some possible future scenarios as the closing part of this thesis.

Future Work

As mentioned in the previous chapters, some parameters, like the measurement reliability value and the internal matrices of the Kalman filter, are determined partially with an empirical, rather unscientific method. We chose to address the problem of localization from an angle of engineering and realism, as opposed to theory and abstraction. As future work, one general area to explore is the reduction or elimination of these empirical methods and finding more sophisticated ways to determine the Kalman parameters. However, this requires a deeper understanding of the underlying mathematics, and a substantial amount of extra development time.

A further dimension to explore would be the elimination of the central entity. Is it possible to perform the localization calculations in a distributed manner on the nodes? As the entire thesis scenario takes place in a hypothetical future environment, we can assume that the embedded devices have higher computing power than today, perhaps enabling them to run the more complicated algorithms as well. The database could be moved to the cloud, and a smartphone could serve both as a workstation, polling new measurement data from the accidental infrastructure, and as the central entity – using both previous data from the cloud and new

measurement data to invoke the Kalman filter. Another approach would be to move the calculations and the Kalman subsystem to the cloud as well. This would eliminate the need for high computing power at the workstations.

As a final question, we can ask whether the localization system can or should be generalized. Perhaps we can port the concept of using error-induced measurements, Kalman filtering and careful parametrization to other projects. For example, using moving nodes instead of stationary ones would introduce a new dimension of variables for the Kalman filter to process. Or we could extend the system to cover a larger area, where not all workstations are placed within the communication range of the central entity and the network of workstations has to be converted into a mesh network with data forwarding. The possible options and variations are limitless.


Appendix


A.1 Localization system output figures, full size

Please note that in the following figures, unless otherwise noted, the horizontal axis stands for Initial error (shift length) [cm] and the vertical axis stands for Improvement over the initial error (shift length) [cm].

Configuring the Kalman filter

Original: Figure 5.3, page 48.

Figure A.1.1: Kalman config #1


Figure A.1.2: Kalman config #2


Figure A.1.3: Kalman config #3


Sequences

Original: Figure 5.4, page 49.

Figure A.1.4: Scenario m123, *1 → *2 → *3


Figure A.1.5: Scenario m231, *2 → *3 → *1

Figure A.1.6: Scenario m312, *3 → *1 → *2


The advantage of diverse input

Original: Figure 5.6, page 52.

Figure A.1.7: Scenario t1x6, Kalman config #2


Figure A.1.8: Scenario t123123, Kalman config #2


Comparing measured and theoretical cases

Original: Figure 5.7, page 52.

Figure A.1.9: Scenario m123, Kalman config #2


Figure A.1.10: Scenario t123123, Kalman config #2


71