IoT under Lock and Key: Tracking the Tracker Rapid Identification of Devices

Margus Lind


Fourth Year Project Report School of Informatics University of Edinburgh

2017

This work is dedicated to my mother, uncle, grandmother and grandfather.

You have always given me all the support I have needed in any way, and have never stopped inspiring me to strive forwards.

Abstract

The entitlement to privacy one might expect in everyday life is constantly being challenged by the advancement and widespread adoption of technology in our modern interconnected world. Small gadgets are part of our everyday life, but we are rarely concerned with what information those devices could be leaking about us: the device identity alone is sufficient to track its owner. Thus, rapid passive identification of devices makes it possible to track the population of a whole area.

We propose a highly parallelisable solution for sniffing Bluetooth device addresses, capable of rapid device identification. In contrast to past efforts, our suite is designed for wide-spectrum monitoring, with a view to hardware acceleration. Our solution is a custom Software Defined Radio stack capable of monitoring multiple channels in parallel, creating the opportunity for advanced analysis of timing correlations between channels.

We present conclusions on the work undertaken and the results obtained. Our implementation is evaluated with respect to performance, scalability, false positive rates, miss rates, and a confidence index derived from the latter two metrics. We highlight the implicit limitations of each of our decisions throughout the report, and give suggestions for future improvements and continued research.

Acknowledgements

Greatest of thanks to my supervisor Dr Paul Patras for the unending patience, support and guidance I have received. This has nurtured my interest in research, and helped me push on towards success.

I would like to mention the great support from my family and friends that has kept me going through difficulties. Johanna, I can completely understand the frustration you have had to put up with during my long days in the Forum. I wish to thank Toomas Remmelg and Rui Li for helping me understand academic work better and for sharing innumerable coffees throughout this project. Additional thanks to both of you for your amazing feedback. Lastly, I would like to mention Robert Petrut Dumitru for helping me overcome my lack of knowledge regarding Digital Signal Processing.

Table of Contents

List of Figures
List of Tables

1 Introduction
1.1 Overview
1.2 Aims
1.3 Outcomes
1.4 Contributions
1.5 Outline

2 Background
2.1 The Bluetooth Classic Protocol
2.1.1 The Device Address
2.1.2 Packet Structure
2.2 Software Defined Radio
2.2.1 USRP and GNU Radio
2.3 Modulation and Demodulation
2.3.1 The GFSK Modulation Scheme
2.4 Existing Bluetooth Sniffing Tools
2.4.1 BlueZ
2.4.2 gr-bluetooth
2.4.3 Ubertooth One

3 Platform Design
3.1 Receiving, Filtering and Demodulating Samples
3.2 Identifying Packets
3.3 Options for Parallelising

4 Implementation
4.1 GNU Radio Reception Pipeline
4.2 Preamble, Barker Code, and Trailer Pre-calculation
4.3 Access Code Pre-calculation
4.4 Header Pre-calculation
4.5 Packet Detection and Extraction
4.6 Further Packet Verification

5 Evaluation
5.1 Computational Performance and Speed
5.2 Design of Experiments
5.3 Proof of Motivation
5.4 Capturing Input
5.5 False Positives
5.6 Miss Rate
5.7 Detection/Discovery Time
5.8 Scalability

6 Conclusion
6.1 Tracking People Based on Partial Device Addresses
6.2 Bluetooth Low Energy (Bluetooth Smart)
6.3 Future work
6.4 Final Remarks

Bibliography

A Feedback Day Poster

List of Figures

2.1 Structure of the BD_ADDR.
2.2 Structure of the Access Code.
2.3 Dependence of Preamble and Trailer on Sync Word.
2.4 Dependence of Barker code on LAP.
2.5 Structure of the packet Header.
2.6 USRP B210
2.7 The uhd_fft spectrum analyser.
2.8 Professional Protocol Analysers
2.9 Ubertooth One

3.1 A top level diagram for eavesdropping on a single channel.
3.2 A top level diagram for eavesdropping on multiple channels in parallel.
3.3 Position of centre frequency relative to intended target.
3.4 Differences of Bluetooth and WiFi packets in frequency domain.

4.1 GNU Radio pipeline for capturing and extracting a channel.
4.2 GNU Radio flow graph for channelized capture and processing.
4.3 GUI options used during development.
4.4 A Bluetooth packet in frequency domain.
4.5 A Bluetooth packet after band pass filtering.
4.6 A Bluetooth packet after squelching.
4.7 A Bluetooth packet after second band pass filtering.
4.8 A Bluetooth packet viewed as an IQ constellation.
4.9 Constellation when there is no current BT packet.
4.10 Preamble and Trailer mapping generator.
4.11 Barker code mapping generator.
4.12 Pseudorandom LFSR.
4.13 Generating an AC from a LAP.
4.14 LFSR class design.
4.15 HEC LFSR module.
4.16 Whitener LFSR module.
4.17 Generating all possible headers.
4.18 Buffer struct with convenience methods.
4.19 Pushing an input bit into the buffer.
4.20 Exploring all LAPs within a Hamming distance.
4.21 Pushing an input bit into the buffer.
4.22 Verifying header correctness.
4.23 Correlating packets to discover UAP.

5.1 Layout of the interfaces for experiments.
5.2 Out-of-Memory/Disk Full with tmpfs.
5.3 Number of LAPs detected across background monitoring experiments.
5.4 Number of LAPs detected across all experiments.
5.5 Number of Inquiry packets detected.
5.6 Number of Inquiry packets detected during pairing (normalised).
5.7 Confidence index across all experiments.
5.8 Confidence index excluding WiFi and Background noise experiments.
5.9 Confidence index across all experiments, using our solution’s internal confidence measure.
5.10 Fast and Extended Processing with and without Internal Confidence Model (ICM) for detecting known devices during pairing (normalised).

6.1 Calculating probability of no collision given a pool size and a sample size.

A.1 Feedback Day Poster

List of Tables

4.1 Parameters for the GFSK demodulator.
4.2 FEC 1/3 decoding lookup table.

5.1 Suggested Bit Error Tolerances.
5.2 Configuration of USRP boards used during main experiments.
5.3 Monitoring paired but out of range devices.

6.1 Effect of UAP entropy on collisions during surveillance.

Chapter 1

Introduction

1.1 Overview

Rapid external identification of small portable devices allows their owners to be tracked. In this project we implement a way of obtaining identifying information from transmissions sent using Bluetooth (BT), a wireless communication protocol widely used by gadgets. Gaining such insights into small portable devices poses a direct security and privacy risk to people carrying wearables.

The requirement to trust commercial network operators with one’s privacy in order to use a network has been discussed for decades [1, 2]. Personal networks have not escaped scrutiny either: identification based on Wi-Fi probes is explored in [3]. Even in complex environmental settings, it has been shown that people can be positioned based on signals from their devices [4]. This information could serve many benevolent causes, for example, smart dynamic lighting and energy management [5], finding co-workers quickly [4, 6], improving public transportation systems [7] or optimising pedestrian routes [8]. On the other hand, tracking people can be considered a direct invasion of privacy, particularly when attempted without prior consent. Undesired uses of personal tracking data could include insurers altering premiums based on the frequency of visits to surgeries or pubs, advertisers gaining access to even more aspects of personal life, and criminals being able to predict opportune moments.

Our privacy in everyday life is constantly challenged by advancements in technology in the modern interconnected world. In many cases, devices are rushed to market, leaving security and privacy issues as an afterthought. Furthermore, small devices are limited in processing power, and thus computationally expensive approaches are often not suitable. This limits the methods available for establishing a higher standard of security and privacy, even when it is sought after. As a result, many small portable devices have opted for simpler, but less robust, methods to protect their communications, which in turn may leak information about the devices or even the data transmitted. Such shortcuts leave the door open to attacks on both the device and the protocols it uses.

1.2 Aims

The purpose of this project is to demonstrate the feasibility of tracking individuals based on the communications of their portable BT devices. To achieve that, we need to detect and sniff packets, and to extract the identifying parameters from the recorded messages. This is complicated by the obscure nature of BT’s interference mitigation, where the carrier frequency is regularly changed based on parameters not known to agents outwith the Personal Area Network (PAN). Furthermore, the majority of a device’s hardware address is not available in plain sight and needs to be retrieved by analysing the received packets.

Our goal is to demonstrate the feasibility of identifying devices on a larger scale. Gaining reliable access to the data communicated is out of scope for this project, but could be the aim of a future project building upon this work. Key requirements for the resulting platform are the reliability and speed of device detection.

1.3 Outcomes

We provide a complete, highly parallelisable tool for rapidly and reliably identifying Bluetooth devices. Our platform provides a Software Defined Radio (SDR) based BT receiver stack (see section 2.1 for details on the protocol), combined with a heuristic to increase confidence in reported devices.

This is unique, as current freely available and open source solutions are unable to receive a wider band. Proprietary professional testing and measurement devices, however, are prohibitively expensive (starting at tens of thousands of dollars [9]).

We use the Universal Software Radio Peripheral (USRP) to capture the raw radio samples on Bluetooth channels, and recover the data transmitted using signal processing and demodulation blocks from GNU Radio. The retrieved bitstream is then analysed by our custom workflow, revealing the packets within the received stream. These are then verified using the redundancy and error correction features of Bluetooth. We extract the Lower Address Part (LAP), i.e. the 24 Least Significant Bits (LSBs) of the device’s hardware address, to be used as the baseline device identifier. Due to the likelihood of LAP collisions in larger-scale tracking, this is not sufficient, so we propose methods to recover the next 8 bits of the device address. Detection times are minimised by a scalable design in which large portions of the work are offloaded to pre-computation, yielding a design for a real-time multi-channel Bluetooth sniffer. Please refer to section 6.3 for details on suggested improvements and extensions.
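To illustrate why the 24-bit LAP alone is not sufficient, the following Python sketch evaluates the birthday-problem probability that n devices all carry distinct identifiers. It assumes identifiers are drawn uniformly at random, which manufacturer-assigned LAPs need not be, so the figures are indicative only. The function name is our own.

```python
import math


def p_no_collision(pool_bits: int, n_devices: int) -> float:
    """Probability that n identifiers drawn uniformly at random from a pool of
    2**pool_bits values are all distinct (the classic birthday problem)."""
    pool = 2 ** pool_bits
    if n_devices > pool:
        return 0.0  # pigeonhole: a collision is certain
    # Product of (1 - i/pool) for i = 0..n-1, accumulated in log space for stability.
    log_p = sum(math.log1p(-i / pool) for i in range(n_devices))
    return math.exp(log_p)


for n in (1_000, 5_000, 10_000):
    lap_only = p_no_collision(24, n)   # LAP alone (24 bits)
    lap_uap = p_no_collision(32, n)    # LAP plus UAP (32 bits)
    print(f"{n:>6} devices: P(no LAP collision)={lap_only:.3f}, "
          f"P(no LAP+UAP collision)={lap_uap:.3f}")
```

Under this uniformity assumption, with 5,000 devices in view a LAP collision is already more likely than not. Adding the 8-bit UAP keeps the probability of any collision below 1%.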

Our platform can be developed into an easily deployable surveillance beacon capable of monitoring all Bluetooth channels concurrently. Currently, however, it serves as a proof of concept to motivate privacy-conscious changes to wireless protocols.

1.4 Contributions

In general, we have achieved the following:

• Presented a critical review of related projects and relevant technologies,
• Developed a generic single-channel LAP sniffer,
• Delivered a single-channel BT device enumerator,
• Parallelised the approach into a multi-channel implementation of the enumerator, and
• Evaluated the proposed tools.

In detail, in this work:

• We review prior work to establish the need for this project and provide a critical assessment of past research and competing solutions,
• We explain how the BT baseband layer works,
• We design and implement an SDR processing pipeline for receiving Bluetooth signals,
• We provide a tool to locate, identify, and confirm packets in the input stream with configurable error recovery thresholds,
• We summarise the experimental results to assess the performance of our tool,
• We establish the parallelisability of our design and implementation, and
• We provide a critical review of the project and offer results with a view to future improvements.

1.5 Outline

Chapter 1: Introduction gives an overview of the motivation for the work, the tasks undertaken, and what is included in the report. We present past research in order to establish the relevance and motivation for this work. Next, we provide a concise overview of our project and its potential applications. Finally, we formulate the aims and outcomes of the project and list our contributions.

Chapter 2: Background provides the reader with the background knowledge crucial to understanding this work. In this chapter, we further establish the motivation for the solution presented by reviewing prior work and related efforts. We also highlight the key differences between our approach and past endeavours. We explain the general workings of wireless networks, and describe Bluetooth in detail. Furthermore, we introduce the software and hardware platforms, techniques, and concepts referenced in the following chapters, for example SDR, Digital Signal Processing (DSP), and Gaussian Frequency Shift Keying (GFSK).

Chapter 3: Platform Design details the design of the project, starting from a top-level view and delving into the more intricate details of each stage of our processing pipeline. Here, we outline how the technologies presented in chapter 2 can be employed for our benefit. We explain the process of receiving arbitrary radio signals and attempting to retrieve the data encoded in them. We discuss how to detect and identify packets and extract the information within. Furthermore, we show how obfuscated data can be retrieved from the captured packet headers. Lastly, this chapter presents the conceptual limitations to parallelisation, and suggests workarounds and solutions for efficient deployment.

Chapter 4: Implementation explains how the proposed sniffing suite is constructed. We detail the implementation-specific decisions and outline the algorithms used to achieve the desired results. Furthermore, we explain how each of the stages is implemented, and how the aforementioned tools are employed. We maintain a view to parallelising the process, as well as keeping track of any limitations. Overall, this chapter presents a parallelisable implementation of the architecture outlined in chapter 3.

Chapter 5: Evaluation compares the work presented against objective measures, as well as against the achievements of other related works. We observe detection time and rate, as well as the quantity of false positive readings from our system, and compare these with other available solutions. Lastly, we detail the potential for using the presented framework on a wider scale, with reference to the design decisions that would enable it.

Chapter 6: Conclusion summarises our contributions and the implications of this work. We reiterate the limitations of the current solution, and identify opportunities for improvements. Furthermore, we explore some advancements in related technologies, and their effect on the outlined privacy concerns. Ultimately, we suggest avenues for future work that could build upon or complement the efforts presented in this report.

Chapter 2

Background

Both a Wide Area Network (WAN) and a Personal Area Network (PAN), such as a Bluetooth (BT) piconet, generally require registering with the network prior to gaining access. For a WAN, users expect identifying information to be sent to an agent not under their control. In the case of a PAN, however, all nodes of the network are assumed to be under the direct or indirect control of the same person. Thus, in neither case is the identity of the devices in the network implicitly public outwith the network; device identities within a PAN in particular are generally expected to be known only locally. When this is no longer the case, it becomes possible to gather information about devices in the vicinity, and through this to track people.

It has been shown that Wi-Fi can be used by a mobile device to determine its own location [10], by a network to position clients [11], and by external agents to identify devices and draw conclusions about the social relations between the device owners [3]. Furthermore, since the devices can be identified, wide-scale tracking of people has been shown to be possible [12]. This also demonstrates the risks stemming from easily accomplished tracking schemes, which give anyone the potential to monitor a whole area. Due to the obscure nature and shorter range of the protocol, Bluetooth has been less subject to such methods. High performance Software Defined Radio (SDR) is becoming more accessible and usable, enabling us to avoid long scanning times for device detection, and even to identify non-discoverable devices.

All standard network protocols require a device to possess and use a unique hardware ID, often known as the MAC address. In BT, packets contain parts of this assigned address for at least one of the devices in the PAN. This information can in turn be used to uniquely identify the device whenever it is in range. As gadgets rarely change hands, obtaining this information is effectively equivalent to tracking individuals.

The device address is partly hidden from plain sight, but can be partially recovered, as shown in [13, 14]. These solutions prove that such attacks are possible in a controlled environment, but do not work reliably. The Ubertooth platform [15], however, builds upon this work and performs well. None of the previous solutions offers a framework enabling large-scale parallel analysis of the whole radio spectrum used by Bluetooth.

2.1 The Bluetooth Classic Protocol

Bluetooth (BT) is an inexpensive and common wireless technology, supported by most modern mobile phones and used by many peripheral devices and small gadgets. With around 4 billion BT enabled devices sold and over 10,000 new products reaching the marketplace yearly [16], the "serial cable replacement" protocol is definitely commonplace in today’s world. Bluetooth uses 79 non-overlapping channels, each 1 MHz wide, within the unlicensed 2.4 GHz Industrial, Scientific and Medical (ISM) frequency band, and thus has to compete with WiFi and other radio technologies using the same spectrum. There are several variations of the Bluetooth protocol, the most popular being Basic Rate (BR)/Enhanced Data Rate (EDR) and Bluetooth Low Energy (BLE). The evolution of BT is led by the Bluetooth Special Interest Group, and the latest revision of the specification at the time of writing can be found in [17].
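The BR/EDR channel plan reduces to simple arithmetic: channel k, for k from 0 to 78, is centred at 2402 + k MHz. A minimal Python sketch of the mapping in both directions follows. The function names are our own, for illustration only.

```python
import math
from typing import Optional

FIRST_CHANNEL_MHZ = 2402   # centre frequency of channel 0
NUM_CHANNELS = 79          # channels 0..78, 1 MHz apart


def channel_frequency_mhz(k: int) -> int:
    """Centre frequency (MHz) of BR/EDR RF channel k."""
    if not 0 <= k < NUM_CHANNELS:
        raise ValueError(f"channel {k} outside 0..{NUM_CHANNELS - 1}")
    return FIRST_CHANNEL_MHZ + k


def channel_at(freq_mhz: float) -> Optional[int]:
    """Channel whose 1 MHz slot contains freq_mhz, or None outside the band."""
    k = math.floor(freq_mhz - FIRST_CHANNEL_MHZ + 0.5)
    return k if 0 <= k < NUM_CHANNELS else None
```

For example, `channel_at(2441.2)` returns 39, and anything below 2401.5 MHz or above 2480.5 MHz falls outside the band.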

Originally designed to be a wireless low cost serial cable replacement, BT is characterised by a low power consumption profile, inexpensive hardware, and a design fostering the coexistence of many devices in the same area. It is a "short-range technology to set-up wireless personal area networks with gross data rates less than 1 Mbit/s" [18]. It is robust to both continuous narrow-band and intermittent wide-band interference by taking advantage of frequency hopping, Time Division Duplexing (TDD), and a small packet size. This allows BT to use the full band, transmitting on clear channels during interference, as well as fitting packets into time periods where there are no stronger transmissions. This robustness to interference does reduce the link speed, but throughput was not one of the initial goals: reliability of the link was prioritised over throughput.

Bluetooth devices form a piconet, with one master node and up to 7 active slaves. It is possible for the devices to negotiate a role change, and thus for a slave to become the master of the piconet. In an everyday use case, a mobile phone would generally be the master of the accessories connected to it. The piconet is defined by the properties of the master node, and thus retrieving the specific details of the master gives us access to the local network.

Bluetooth takes several measures to protect devices and communications from undue attention. Firstly, BT uses frequency hopping, changing the carrier frequency of the signal regularly according to a pseudo-random sequence. While frequency hopping aims to avoid interference, it significantly complicates sniffing any communications, because an adversary outwith the piconet does not know on which frequency the next packet will be sent. TDD is realised through straightforward sequential time slots, with neighbouring time slots typically alternating between the master and a slave. This enables the protocol to tolerate more clock skew in the inexpensive transceivers without the need for more complex timing correlators. Secondly, the data in the packets sent is whitened: scrambled (XORed) with a pseudo-random sequence seeded from a part of the master’s clock. On top of that, an optional layer of encryption can be applied to the data contained in the packet payload.
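Whitening can be sketched as a Linear Feedback Shift Register (LFSR) whose output is XORed with the data. The Python below assumes the generator polynomial D^7 + D^4 + 1 and a seed built from six master clock bits, with the top register cell set to 1. We believe this matches the Specification in outline, but the register layout and bit order here are simplifications, not a reference implementation.

```python
from __future__ import annotations


def whitening_bits(clk6: int, n: int) -> list[int]:
    """First n bits of a 7-bit Galois LFSR with polynomial D^7 + D^4 + 1.
    Assumption: six clock bits fill the low cells and the top cell is set to 1;
    the exact cell/bit ordering must be checked against the Specification."""
    state = 0x40 | (clk6 & 0x3F)
    out = []
    for _ in range(n):
        bit = (state >> 6) & 1            # output taken from the top cell
        out.append(bit)
        state = (state << 1) & 0x7F       # shift every cell up by one
        if bit:
            state ^= 0x11                 # feedback into cells 0 and 4 (the D^4 + 1 taps)
    return out


def whiten(bits: list[int], clk6: int) -> list[int]:
    """XOR data bits with the whitening sequence; applying it twice restores the data."""
    return [b ^ w for b, w in zip(bits, whitening_bits(clk6, len(bits)))]
```

Because D^7 + D^4 + 1 is primitive, the sequence repeats only every 127 bits. Under this model the scrambling depends solely on six clock bits, so an observer who guesses them correctly can undo it.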

2.1.1 The Device Address

Section 1.2 of Part B of the Bluetooth Specification [17] details the 48-bit Bluetooth Device Address (BD_ADDR). This is the Media Access Control (MAC) layer address of the device, and is designed in accordance with the IEEE 802-2001 standard [19].

The device address can be viewed as three discrete parts, as shown in Figure 2.1. Note the bit order used when specifying the address parts, both here and in the figure. For more detail on how the address parts are used in Bluetooth packets, please refer to subsection 2.1.2 below.

Figure 2.1: Structure of the BD_ADDR [17].

NAP - Non-significant Address Part: This is a 2 byte sequence that is only used when a device needs to be absolutely identified. For example, this happens during pairing and reconnecting, when Frequency Hop Synchronisation (FHS) packets are sent. Thus, barring luck, this address part is inaccessible to us through passive means. However, active attacks causing disconnects in the piconet could prompt devices to resend this data.

UAP - Upper Address Part: The UAP is the next 8 bits of the device address. This part of the device address is not transmitted in the clear, but plays a role in almost all packets. The only packet where it is missing is the Identity (ID) packet, which can be seen as a beaconing or heartbeat packet to keep the piconet synchronised. This creates an opportunity to retrieve the UAP of the master device address by analysing packets in flight. Lastly, the UAP is also an input to the generator that produces the frequency hopping sequence, providing us with additional vectors for attempting UAP recovery.

LAP - Lower Address Part: This is the lowest 3 bytes of the device address. It is present in the clear in the Access Code (AC) of each packet, and is thus easily accessible to us. By design, the Access Code precedes the packet header, and serves to identify the current piconet in a simple and low-power fashion. The AC is generated solely from the LAP, and thus gives us error detection and correction capabilities when sniffing the LAPs of packets.

The NAP and UAP together form an Organisationally Unique Identifier (OUI), assigned by the IEEE [20]. The LAP, however, is assigned by the manufacturer uniquely to each individual device.
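The split in Figure 2.1 amounts to simple bit slicing. The following minimal Python sketch assumes the address is written in the common human-readable form of six colon-separated hexadecimal bytes, most significant byte first. The helper name is illustrative.

```python
from __future__ import annotations


def split_bd_addr(addr: str) -> tuple[int, int, int]:
    """Split a BD_ADDR written as six colon-separated hex bytes, most
    significant first (e.g. "00:11:22:33:44:55"), into (NAP, UAP, LAP)."""
    value = int(addr.replace(":", ""), 16)
    if value >> 48:
        raise ValueError("a BD_ADDR is 48 bits long")
    nap = (value >> 32) & 0xFFFF   # bits 47..32
    uap = (value >> 24) & 0xFF     # bits 31..24
    lap = value & 0xFFFFFF         # bits 23..0, the part carried in the Access Code
    return nap, uap, lap
```

For "00:11:22:33:44:55" this gives NAP 0x0011, UAP 0x22 and LAP 0x334455. Only the LAP can be read from a single sniffed packet. The NAP and UAP concatenated give the 24-bit OUI, here 0x001122.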

2.1.2 Packet Structure

All Bluetooth packets follow the same general structure: Access Code (AC), Header, Payload. The AC is fixed for a piconet and always present. The Header is usually present, and can provide us with additional leverage to recover the UAP. The packet Payload is optional and can vary depending on the packet type.

The AC is present in every packet and defines the piconet the packet is intended for. In the case of Inquiry packets, the LAPs that may be used are defined in the Specification and do not reflect any device directly. Inquiry packets can be seen as an anonymous shout, "Hey, is anyone (with these capabilities) around?", and they use the general form of the ID packet. A typical use of the ID packet, however, can be construed more as "Hello, friends in my network, I’m still here! Do you want to talk to me?".
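The inquiry LAPs are fixed by the Bluetooth assigned numbers: the block 0x9E8B00 to 0x9E8B3F is reserved, with 0x9E8B33 being the General Inquiry Access Code. A sniffer can therefore separate inquiry traffic from device-identifying LAPs with a simple range check. A sketch in Python follows. The constant and helper names are illustrative.

```python
from __future__ import annotations

LIAC_LAP = 0x9E8B00                    # Limited Inquiry Access Code
GIAC_LAP = 0x9E8B33                    # General Inquiry Access Code
IAC_LAPS = range(0x9E8B00, 0x9E8B40)   # 0x9E8B00..0x9E8B3F reserved for inquiry


def is_inquiry_lap(lap: int) -> bool:
    """True if the LAP belongs to the reserved inquiry access code block."""
    return lap in IAC_LAPS


def device_laps(sniffed: list[int]) -> set[int]:
    """Keep only LAPs that can identify a device (drop inquiry access codes)."""
    return {lap for lap in sniffed if not is_inquiry_lap(lap)}
```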

The Access Code can be broken down into a Preamble, a Sync Word, and an optional Trailer (Figure 2.2). The Preamble and Trailer are trivially dependent on the Sync Word (Figure 2.3), and the Trailer is only present if a Header follows; all packets except the ID packet contain at a minimum the AC and the packet Header. Note that the packet Header is separate from the payload header, much like the nested headers produced by encapsulation in the TCP/IP stack.

The Sync Word can be seen as three parts: the Code Word (obfuscated), the LAP and the Barker code. The Code Word is calculated from the LAP, and can serve as an error correction mechanism. In the Sync Word, the Code Word only appears XORed with a Pseudorandom Noise (PN) Sequence. The PN Sequence is a constant sequence of bits (the specification also describes its generator function). The Barker code is trivially dependent on the last bit of the LAP, much like the Preamble and Trailer depend on the Sync Word (Figure 2.4).
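These three trivial dependencies can be written out directly. The Python sketch below follows the rules as we read them from the Specification. Bits are listed in transmission order, least significant first. The Preamble continues an alternating pattern into the first Sync Word bit, and the Trailer continues it out of the last one. The six Barker bits appended after LAP bit a23 are 001101 or 110010, so that a23 followed by them forms the 7-bit Barker sequence 1110010 or its complement. Function names are our own.

```python
from __future__ import annotations


def preamble(sync_lsb: int) -> list[int]:
    """4-bit Preamble, chosen so that it alternates into the first Sync Word bit."""
    return [1, 0, 1, 0] if sync_lsb else [0, 1, 0, 1]


def trailer(sync_msb: int) -> list[int]:
    """4-bit Trailer, chosen so that it alternates out of the last Sync Word bit."""
    return [0, 1, 0, 1] if sync_msb else [1, 0, 1, 0]


def barker_bits(lap: int) -> list[int]:
    """Bits a24..a29 appended to the 24-bit LAP, selected by its MSB a23."""
    a23 = (lap >> 23) & 1
    return [1, 1, 0, 0, 1, 0] if a23 else [0, 0, 1, 1, 0, 1]


def alternates(bits: list[int]) -> bool:
    """True if no two neighbouring bits are equal."""
    return all(a != b for a, b in zip(bits, bits[1:]))
```

Since each choice depends on a single bit, all three can be precomputed once per LAP.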

In order to obtain the Sync Word, we need to append the Barker code to the LAP, add the PN Sequence, calculate and append the BCH(64,30) code parity bits, and add the PN Sequence again. As a result, the LAP is XORed twice with the PN Sequence, and thus appears in plain text. In the BT specification, the bit order is occasionally treated in opposite ways, particularly when mixing binary data, network order, and polynomial notations. Furthermore, "adding" bit sequences is not explicitly defined as an operation: in practice it refers to bitwise addition modulo 2, i.e. the XOR operation. More details on Sync Word calculation can be found in section 4.3.
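The key observation, that the LAP survives the double scrambling unchanged, can be checked with a generic sketch of the procedure. The Python below keeps the generator polynomial and PN Sequence as parameters rather than reproducing the Specification's 64-bit constants. It also assumes the information bits occupy the high-order end of the codeword. Both are simplifications for illustration.

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of polynomial division over GF(2); bit i holds the coefficient of x^i."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend


def sync_word(info: int, gen: int, pn: int) -> int:
    """Scramble, encode systematically, and scramble again.

    info: k information bits (for Bluetooth, the LAP plus the Barker bits, k = 30)
    gen:  generator polynomial of degree r = n - k (r = 34 for BCH(64,30))
    pn:   n-bit PN Sequence
    """
    r = gen.bit_length() - 1
    scrambled = info ^ (pn >> r)          # add the PN bits covering the information part
    shifted = scrambled << r              # information in the high-order positions
    parity = gf2_mod(shifted, gen)        # systematic parity bits in the low-order positions
    return (shifted | parity) ^ pn        # add the whole PN Sequence again
```

Consider a toy (7,4) code standing in for BCH(64,30), e.g. `sync_word(info, 0b1011, 0b1010011)`. The top four bits of every output equal `info`, and the full word XORed with the PN Sequence is a valid codeword. The same argument carries over to the 64-bit case.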

The packet Header contains the data depicted in Figure 2.5. The first fields are actual parameters specifying, for example, flow control and packet type. The Header Error Check (HEC) code is an error detection checksum calculated using a Linear Feedback Shift Register (LFSR) initialised with the UAP of the master of the network.

Figure 2.2: Structure of the Access Code [17].

Figure 2.3: Dependence of Preamble and Trailer on Sync Word [17].

The fact that the UAP is used to derive this data enables us to recover information about it. Due to the small entropy of the UAP (it is only 8 bits in size) and of the data (10 bits), we can simply brute force all combinations and retrieve all possible HEC codes. This yields a one-to-many mapping from the observed HEC to candidate pairs of original data and UAP. This set can be reduced rapidly by set intersection when we observe another packet from the same piconet. To complicate matters, the whole header is whitened, i.e. XORed with another pseudorandom sequence. This sequence’s generator is seeded with the current master clock, and changes on every timeslot. However, with time correlation between observed packets, it is possible to recover both the UAP and the master clock. This in turn defines the piconet completely, and would allow us to tune in with any conventional Bluetooth receiver. More detail about calculating the headers of the packets can be found in section 4.4.
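The brute-force idea can be sketched in Python, assuming the header has already been de-whitened. We assume the HEC generator polynomial D^8 + D^7 + D^5 + D^2 + D + 1 with the register initialised to the UAP. The 10 header bits are fed least significant bit first. This bit ordering is a simplification that has not been checked against the Specification's sample data, and the function names are our own.

```python
from __future__ import annotations

HEC_POLY = 0xA7  # D^8 + D^7 + D^5 + D^2 + D + 1, with the D^8 term implicit


def hec(data10: int, uap: int) -> int:
    """8-bit HEC over 10 header bits, register initialised with the UAP
    (assumed bit order: header bits fed LSB first)."""
    reg = uap & 0xFF
    for i in range(10):
        feedback = ((reg >> 7) & 1) ^ ((data10 >> i) & 1)
        reg = (reg << 1) & 0xFF
        if feedback:
            reg ^= HEC_POLY
    return reg


def candidate_uaps(data10: int, observed_hec: int) -> set[int]:
    """Brute force: every UAP consistent with one de-whitened header."""
    return {uap for uap in range(256) if hec(data10, uap) == observed_hec}


def hec_table() -> dict[int, list[tuple[int, int]]]:
    """Map each HEC value to all (data, UAP) pairs producing it.
    The LFSR is linear, so hec(d, u) == hec(d, 0) ^ hec(0, u); we use this
    to avoid re-running the register for all 2**18 pairs."""
    by_data = [hec(d, 0) for d in range(1024)]
    by_uap = [hec(0, u) for u in range(256)]
    table: dict[int, list[tuple[int, int]]] = {h: [] for h in range(256)}
    for d, hd in enumerate(by_data):
        for u, hu in enumerate(by_uap):
            table[hd ^ hu].append((d, u))
    return table
```

Because this register is linear and invertible, a single correctly de-whitened header is consistent with exactly one UAP. Every HEC value is shared by 1,024 (data, UAP) pairs. Each wrong guess of the clock bits therefore also yields a plausible-looking UAP, which is why correlating packets over time is needed.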

Figure 2.4: Dependence of Barker code on LAP [17].

Figure 2.5: Structure of the packet Header [17].

2.2 Software Defined Radio

An SDR system attempts to replace traditional hardware implementations of the radio signal processing pipeline with configurable ones that are, to a large extent, programmable in software [21]. This approach is highly suitable for our setting, as it allows arbitrary operations on the captured radio samples, limited only by our imagination and the performance of modern hardware. For this project, we are only interested in the receiving capabilities of our SDR platform; however, such interfaces often provide a full-duplex transceiver that may have Multiple-Input-Multiple-Output (MIMO) capabilities. This leaves the opportunity to mount active attacks using the same stack.

SDR is commonly used for amateur radio projects ranging from local FM stations to satellite communications. More importantly, SDR is often used to prototype and test new wireless protocols. The same technology can also be used to track aircraft with inexpensive hardware [22], and such endeavours can form crowdsourcing projects. Due to the availability and low price of radio interfaces, a crowdsourced people tracking service based on wearables is a potential future risk [23].

As digital systems are discrete by definition, Digital Signal Processing (DSP) cannot work directly with a continuous waveform input. Instead, the waveform is sampled, in our case at a rather impressive rate. A real-valued sample can be seen as the signal's average amplitude over the last sampling period, or its instantaneous amplitude at the time of sampling. In the case of complex samples, we capture two components, obtained by mixing the input with a cosine and a sine wave at the centre frequency. These are commonly called the in-phase (I) and quadrature (Q) components. Capturing both components allows a formal sampling rate half of that which the Nyquist criterion requires for real-valued samples of the same bandwidth. The Nyquist criterion specifies the granularity of samples required to reconstruct a waveform of a given frequency. Reconstruction of a waveform is similar in concept to recovering a polynomial from a given set of points, where the Nyquist criterion takes the place of defining the required number of points for a polynomial of a given degree.
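A practical benefit of IQ sampling for our receiver is that the sign of a frequency offset is preserved. This is exactly what a frequency shift keyed signal such as Bluetooth's GFSK uses to encode bits. The Python sketch below uses illustrative values (2 MS/s, tones at ±250 kHz from the centre frequency) rather than the parameters of our pipeline. The phase-difference discriminator shown works on the same principle as GNU Radio's quadrature demodulator block, and the function names are our own.

```python
from __future__ import annotations

import cmath
import math


def complex_tone(freq_hz: float, fs_hz: float, n: int) -> list[complex]:
    """n complex (IQ) samples of a tone at freq_hz; freq_hz may be negative."""
    return [cmath.exp(2j * math.pi * freq_hz * k / fs_hz) for k in range(n)]


def instantaneous_frequency(iq: list[complex], fs_hz: float) -> list[float]:
    """Quadrature discriminator: the phase step between consecutive IQ samples,
    converted to Hz. Its sign, which distinguishes a frequency above the centre
    from one below it, is only available because both I and Q were sampled."""
    return [cmath.phase(cur * prev.conjugate()) * fs_hz / (2 * math.pi)
            for prev, cur in zip(iq, iq[1:])]


fs = 2e6  # illustrative sample rate: 2 MS/s
above = instantaneous_frequency(complex_tone(+250e3, fs, 16), fs)
below = instantaneous_frequency(complex_tone(-250e3, fs, 16), fs)
bits = [1 if f > 0 else 0 for f in above[:1] + below[:1]]  # crude FSK slicer
```

With only the real (I) component, the two tones would produce identical samples, as cos is an even function, so their bits could not be told apart.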

Figure 2.6: USRP B210 [24].

2.2.1 USRP and GNU Radio

The workhorse for the practical experiments and implementation of this project is the Ettus Research Universal Software Radio Peripheral (USRP) B210 (Figure 2.6), available for purchase just above the $1,000 mark. This is an SDR board capable of full-duplex MIMO communications in the 70 MHz-6 GHz frequency range [24]. Its 56 MHz real-time bandwidth serves as a good starting point for a wide-band Bluetooth receiver, as Bluetooth uses a 79 MHz frequency range. In terms of the radio air interface, two of these USRP boards would be enough to build a full-spectrum receiver for Bluetooth. Even when several USRP boards are required, the cost of the platform would be significantly less than that of a commercial solution of similar capability. Furthermore, by its nature, an SDR approach allows us to easily benefit from future radio interfaces while using the same processing pipeline.
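The claim that two boards suffice follows from dividing the 79 channels between receivers. The Python sketch below does this greedily. It assumes the full nominal bandwidth is usable. In practice, filter roll-off and the DC offset at the centre frequency call for a margin, so the bandwidth passed in should be treated as a tunable assumption. The function name is our own.

```python
from __future__ import annotations

NUM_CHANNELS = 79          # BR/EDR channels 0..78
FIRST_CHANNEL_MHZ = 2402   # centre frequency of channel 0


def plan_receivers(bandwidth_mhz: float) -> list[tuple[float, list[int]]]:
    """Split the 79 one-MHz channels into contiguous groups that each fit inside
    one receiver's usable bandwidth; return (centre frequency in MHz, channels)."""
    per_rx = int(bandwidth_mhz)           # whole 1 MHz channels per receiver
    if per_rx < 1:
        raise ValueError("bandwidth must cover at least one channel")
    plan = []
    for start in range(0, NUM_CHANNELS, per_rx):
        chans = list(range(start, min(start + per_rx, NUM_CHANNELS)))
        centre = FIRST_CHANNEL_MHZ + (chans[0] + chans[-1]) / 2
        plan.append((centre, chans))
    return plan
```

With 56 MHz per receiver this yields two receivers, centred at 2429.5 MHz and 2469 MHz. Even with a conservative 40 MHz of usable bandwidth, two receivers would still suffice.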

GNU Radio is a versatile open-source tool for designing and testing signal processing workflows for software defined radios [25]. By employing a highly modular design, it enables developers to create "blocks" that carry out specific tasks, for example adding a constant to a signal, or performing low pass filtering. A similar approach is taken to input sources, which can be radio interfaces, files, network ports, or signals generated on the fly. For hardware sources, it is possible to use driver-specific source blocks, which often offer custom configuration options, or simply to opt for an "umbrella" block like the osmosdr source. When building GNU Radio from source, one has the option to enable or disable any of the modules.

Due to the open and modular design patterns, there are many core blocks available that make building small pipelines trivial. The core blocks include most common sources and generators, as well as a plethora of functionality ranging from simple arithmetic to Digital Television encoders. Furthermore, one can find a multitude of out-of-tree modules: modules that are not included in the main GNU Radio package, but use the same framework and can be easily added to the installation. These modules often implement more specialist standards or custom solutions.

In order to use the USRP source, we first need to build and install the USRP Hardware Driver (UHD) libraries. Following that, the GNU Radio build script will recognise the UHD module as supported on the host system. During runtime, UHD prints useful debugging information to the console, for example an O to represent an overrun (the pipeline is too slow to process incoming data), or a U to represent an underrun (the pipeline tries to consume more data than is available). UHD and GNU Radio's UHD module also provide several convenience programs for using and managing USRP devices, for example uhd_usrp_probe to retrieve information about connected devices, and uhd_fft, a simple spectrum analyser (Figure 2.7).

The general workflow of a GNU Radio block revolves around sequential input and output buffers. In order to register a block with GNU Radio, the block needs to specify its dataflow model: how many input and output buffers it has, the type and size of the items in them, and its rate of interpolation or decimation. At runtime, a block is presented with arrays of input and output buffers, along with the number of items available. Should the block be a source or a sink, the input or output buffers may accordingly be irrelevant and absent. In C++ code, the arrays simply hold pointers to the start of the actual buffers. The block then performs its intended task on the data it is presented with, reports how many input items it consumed (implicitly for synchronous blocks, or through consume() in general blocks), and returns the number of output items it produced. The framework shifts the buffers accordingly, so that the next call to the worker function receives the required input, and passes the produced output downstream to the next blocks. It is worth noting that the input buffers may not be full when a function is called, and neither does a function need to return with full output buffers. Should a difference in input and output rates be desirable, the framework allows for decimating and interpolating blocks, if properly registered.

Figure 2.7: The uhd_fft spectrum analyser.

This abstraction works well, and allows developers to write small blocks with a particular purpose. In our case, we have developed a flow graph that encompasses several standard modules and provides the processed input stream to our own custom block. The GNU Radio framework takes care of any buffering between blocks, and issues warnings and errors if there are any issues. Altogether, GNU Radio offers a very intuitive mechanism for building arbitrarily complex systems, which can be easily put together using the gnuradio-companion graphical user interface.
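The worker contract can be illustrated without GNU Radio itself. The pure-Python sketch below uses hypothetical names (AddConstBlock, run) and only mimics the shape of a synchronous block's work() call, which in GNU Radio's Python API receives input_items and output_items and returns the number of items produced; the real scheduler's buffering is far more involved.

```python
from typing import List

class AddConstBlock:
    """Hypothetical 1-in/1-out block mimicking a sync block's work() contract."""

    def __init__(self, k: float):
        self.k = k

    def work(self, input_items: List[List[float]], output_items: List[List[float]]) -> int:
        # Buffers need not be full: process only what both sides allow.
        n = min(len(input_items[0]), len(output_items[0]))
        for i in range(n):
            output_items[0][i] = input_items[0][i] + self.k
        return n  # number of output items produced

def run(block, stream, chunk=4):
    """Toy scheduler: offers fixed-size output buffers and, as in a 1:1
    sync block, consumes as many input items as were produced."""
    produced, pos = [], 0
    while pos < len(stream):
        inputs = [stream[pos:pos + chunk]]
        outputs = [[0.0] * chunk]
        n = block.work(inputs, outputs)
        if n == 0:
            break  # no progress possible; avoid looping forever
        produced.extend(outputs[0][:n])
        pos += n
    return produced

result = run(AddConstBlock(1.0), [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
print(result)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Note how the second call receives only two input items: the block still succeeds, returning fewer items than its output buffer could hold.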

2.3 Modulation and Demodulation

In order for a data stream to be transmitted as radio waves, it needs to be modulated: converted from the data domain to the domain of radio frequencies. This can be as simple as On-Off Keying, a sequence of high and low power transmissions on the specified frequency. However, such schemes offer neither high data rates nor much robustness. Therefore, a popular choice is to alter a sinusoidal wave at the given frequency, and thereby encode data onto it. Since the underlying wave is well defined, the receiver can then compare the incoming signal to a locally generated sinusoid, and recover the data from the differences. As part of our processing pipeline, we need to demodulate the radio signals in order to recover the data stream encoded onto the carrier wave.

2.3.1 The GFSK Modulation Scheme

For Bluetooth, all packet Access Codes and Headers are modulated with Gaussian Fre- quency Shift Keying (GFSK). Below, we give a high level overview of an algorithm for modulation and demodulation [26]. The complexities of demodulation performance with respect to speed and quality are outlined in [27], where various algorithms for the job are examined. For the implementation of our project, we simply use the GFSK Demodulator block from GNU Radio. However, it is important to understand how the block and the modulation scheme work in order to correctly configure the parameters.

In principle, the GFSK modulation scheme takes the following steps:

• Interpolating input bits gives us a larger number of data points to manipulate. For example, a sequence of 01001 could become 000111000000111.
• Smoothing with a Gaussian curve ensures that the transmitted signal is better contained within its channel, and thus reduces the side lobes during transmission. In essence, this is achieved by reducing the deltas between transmitted symbols, enabling the hardware to perform more accurately: rapid changes to analogue signals are difficult to achieve without error. To achieve the smoothing, we fit a discrete Gaussian curve over each of the interpolated bits and calculate the total area for the sections where the bits are 1 (in effect, convolving the bit sequence with a Gaussian filter). This gives us non-binary results, which together form a curve with no sharp edges.
• Treating the values as frequency deviations (the derivative of the phase) enables us to use a simple, common, and inexpensive Frequency Modulation (FM) scheme to encode the data onto physical radio waves. A high value in the input stream is translated into an increase of frequency on the output interface at the given sampling point.

Viewing frequency as the angular step between consecutive I/Q samples may help to see why a continuous high input does not cause the signal to drift off its channel. In the complex plane, with the I and Q sample values as axes, each sample lies at a distance from the origin equal to the signal amplitude, and a shift in frequency appears as a rotation around the origin. A continuous high input therefore causes a steady rotation at a fixed rate: the phase keeps increasing and wraps around the circle, but the frequency itself stays at a constant offset equal to the peak deviation.
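The three modulation steps can be sketched in plain Python as follows. This is an illustrative model rather than GNU Radio's GFSK modulator: the bandwidth-time product BT = 0.5 and modulation index h = 0.32 are nominal Bluetooth BR values assumed here, the filter span and samples per symbol are arbitrary choices, and the Gaussian's standard deviation follows the usual sigma = sqrt(ln 2) / (2 pi BT), measured in symbol periods.

```python
import cmath
import math

def gaussian_taps(bt, sps, span_symbols=3):
    """Gaussian pulse-shaping taps normalised to unit sum (unit DC gain).

    Time is measured in symbol periods; sigma = sqrt(ln 2) / (2 * pi * BT)."""
    sigma = math.sqrt(math.log(2)) / (2 * math.pi * bt)
    half = span_symbols * sps // 2
    taps = [math.exp(-((k / sps) ** 2) / (2 * sigma ** 2)) for k in range(-half, half + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def gfsk_modulate(bits, sps=8, bt=0.5, h=0.32):
    """Return unit-amplitude I/Q samples for a bit sequence (illustrative GFSK)."""
    # 1. Interpolate: repeat each bit sps times, mapping 0/1 to -1/+1.
    nrz = [1.0 if b else -1.0 for b in bits for _ in range(sps)]
    # 2. Smooth: convolve with the (symmetric) Gaussian taps; the output is as
    #    long as the input, with zero padding at the edges.
    taps = gaussian_taps(bt, sps)
    half = len(taps) // 2
    smooth = []
    for n in range(len(nrz)):
        acc = 0.0
        for k, t in enumerate(taps):
            idx = n + k - half
            if 0 <= idx < len(nrz):
                acc += t * nrz[idx]
        smooth.append(acc)
    # 3. FM: treat each value as a frequency deviation and integrate it into
    #    phase; a steady +1 advances the phase by pi * h per symbol.
    phase, iq = 0.0, []
    for v in smooth:
        phase += math.pi * h * v / sps
        iq.append(cmath.exp(1j * phase))
    return iq

iq = gfsk_modulate([0, 1, 0, 0, 1])
print(len(iq))  # 40
```

For a long run of ones, the sketch also reproduces the behaviour described above: the samples stay on the unit circle while the phase advances by a constant pi·h/sps per sample, i.e. a steady rotation at a fixed frequency offset.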

In order to demodulate a transmitted signal according to the GFSK modulation scheme, we need to consider the above steps in reverse:

• Obtaining frequency shifts from the input samples can be performed by standard FM demodulation. This yields, in essence, a signal whose amplitude represents the instantaneous frequency deviation, i.e. the derivative of the phase.
• Interpolating input samples enables us to establish the changes in the input more accurately, should the sample rate not be sufficiently high. At the simplest level, this gives us finer granularity in choosing which samples are the correct ones. However, a waveform specified in greater detail can also make the work of the decision algorithm more straightforward, as it can simply traverse the input without risk of greatly missing the true points of interest.
• Recovering the symbol clock is required to know at which points the input has the correct values, and when it is transitioning between two symbols. We would like to sample exactly at the best points, but this is not always possible: errors can result from differences between the transmitter and receiver oscillators, or from other hardware or environmental differences. Thus, clock recovery is a continuous process that needs to adjust itself according to the incoming input. This stage results in a decimated sample stream containing only what we believe to be the "right" samples.
• Decoding symbols is then simply done by outputting the bit 1 where a key sample indicates a positive frequency deviation, and the bit 0 where it indicates a negative one.
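The demodulation steps can be illustrated with a minimal Python sketch. It simplifies under stated assumptions: the test signal is plain (unshaped) 2-FSK generated by a helper written only for this example, the deviation of 0.02 cycles per sample is arbitrary, and the symbol timing is assumed known, so clock recovery is replaced by picking the centre sample of each symbol. The FM demodulator uses the standard quadrature approach of measuring the angle between consecutive I/Q samples.

```python
import cmath
import math

def fsk_modulate(bits, sps=8, deviation=0.02):
    """Test signal only: phase-continuous 2-FSK without Gaussian shaping.
    deviation is the frequency offset in cycles per sample (arbitrary)."""
    phase, iq = 0.0, [1 + 0j]
    for b in bits:
        step = 2 * math.pi * deviation * (1 if b else -1)
        for _ in range(sps):
            phase += step
            iq.append(cmath.exp(1j * phase))
    return iq

def quadrature_demod(iq):
    """FM demodulation: angle between consecutive samples = instantaneous
    frequency in radians per sample."""
    return [cmath.phase(b * a.conjugate()) for a, b in zip(iq, iq[1:])]

def slice_bits(freq, sps=8):
    """Decide each symbol from the sign of its centre sample. Symbol timing
    is assumed known here; a real receiver tracks it with clock recovery."""
    return [1 if freq[i * sps + sps // 2] > 0 else 0 for i in range(len(freq) // sps)]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = slice_bits(quadrature_demod(fsk_modulate(bits)))
print(recovered == bits)  # True
```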

2.4 Existing Bluetooth Sniffing Tools

The idea of sniffing Bluetooth is in no way novel on its own. Many interfaces exist that can retrieve data transmitted within a BT piconet. For most end-user tools, the sniffer would need to know some details about the network, or be connected to it. Since our proposed use case involves tracking and monitoring arbitrary piconets and devices, we will focus on alternatives that offer similar functionality.

(a) Ellisys Bluetooth Explorer 400 [28] (b) ComProbe Sodera [29]

Figure 2.8: Professional Protocol Analysers

A number of commercial platforms for Bluetooth development exist, reliably providing sniffing and debugging functionality. However, the majority focus on BLE and are not compatible with Bluetooth Classic, the predecessor of BLE and a significantly heavier and more complex protocol. These include, for example, Teledyne LeCroy (formerly Frontline) (Figure 2.8b) and Ellisys (Figure 2.8a) devices. These devices are hard to come by and carry a very significant price tag. Basic models capable of minimal analysis of BLE start at around $3,500, while interfaces with more advanced features start at around $20,000-$30,000 [9]. None of these, however, are capable of comprehensively (and most not at all) analysing Bluetooth Classic. In order to support Bluetooth Classic, we need to look at the top-of-the-line models, and are thereby presented with significantly higher prices. These devices are mostly intended for lab use by companies designing Bluetooth chips and producing devices, and are neither ideal nor affordable for deployment as a people-tracking network.

There are several affordable ($30-$50) options with BLE support only: the TI BLE Sniffer (CC2540EMK-USB dongle), the Nordic nRF Sniffer (nRF51 PCA10031 USB dongle), and the Adafruit Bluefruit LE Sniffer (based on the Nordic nRF Sniffer). These devices can only listen on one BLE advertising channel and have several other downsides [9]. In an application development setting they may work superbly, as the user would be able to configure the devices for their application.

This leads us one level up, to generic SDR solutions and the Ubertooth Project [15]. An SDR solution can abstract away any peculiarities of the radio interface and provide access to any layer of the network protocol. Although SDR platforms can provide us with the required access to the physical layer, they require a software stack to handle the received samples and to build any higher-layer abstractions upon it. Since all processing of input samples is done in software, a typical SDR solution does not take advantage of hardware acceleration for common routines, and thus needs a rather powerful host computer to keep up with protocols at high symbol rates. The best known of such suites is gr-bluetooth, an SDR Bluetooth receiver stack [13, 30].

2.4.1 BlueZ

The BlueZ Project [31] is most commonly used as the Bluetooth stack for Linux com- puters. It provides means of configuring the physical radio interfaces and controlling them according to the Bluetooth protocol specification.

The main command-line tools are hciconfig for configuring interfaces, and hcitool or bluetoothctl for sending commands to the interface. Standard Bluetooth interfaces do not, however, provide a reliable promiscuous mode for sniffing, and their physical radio receivers are limited to only one channel at a time.

Thus, this stack is very useful for everyday usage and testing devices, but proves cum- bersome when we would like to poke at a lower level.

2.4.2 gr-bluetooth

As a result of [13], gr-bluetooth was created [14, 32]. It was one of the first such systems, offering what was then a novel opportunity: receiving arbitrary Bluetooth signals using affordable hardware.

The project uses GNU Radio as the backbone of the processing pipeline. However, instead of using the standard blocks, their contents have been extracted and combined into one larger block [33]. Although the original sources are referenced, the code is largely unchanged, making such repackaging unjustified. Furthermore, the model builds the Access Code for each sighted LAP on the fly, incurring large performance penalties that render it unusable in a highly parallelised setting.

The project does outline a basic pipeline for SDR-based BT processing, which we adapt and improve. However, it can only work with one channel at a time and does not provide any correlation between channels. There have been attempts to implement following a known piconet's channel hopping sequence, but this only works under very specific conditions on the data transmitted, as the suite is not capable of taking advantage of multiple air interfaces. Thus, successfully following a piconet without loss of data depends on packets being short enough to leave time for the radio interface to reconfigure. Possible improvements to this package are further discussed in section 6.3.

Figure 2.9: Ubertooth One [34].

Unfortunately, the project does not work as a whole, and thus does not provide us with the comparison we would like.

2.4.3 Ubertooth One

The Ubertooth One (Figure 2.9) is an open source hardware and software platform capable of sniffing a single Bluetooth channel at a time. This model supersedes the discontinued Ubertooth Zero. It can be seen as the natural evolution of gr-bluetooth, as it was built upon the same code base and ideas, but uses hardware modules for the first stages of the pipeline. Unlike gr-bluetooth, however, it can follow a known piconet: as the radio interface and demodulation are implemented in hardware using standard modules, the Ubertooth can rapidly change the channel it is listening on.

The Ubertooth is controlled by a software package on the host computer, which communicates with the firmware of the device. The firmware, in turn, configures and manages the hardware modules for radio reception and demodulation. Such a design is efficient, as quick decisions can be made on board the device, cutting out the latency of the USB link to the host computer. The host software package then receives the demodulated data stream and performs analysis adapted from gr-bluetooth on it.

The Ubertooth works rather well, but it produces a significant number of false positive reports when a higher error margin (the error allowed in the Access Code) is used. For lower error thresholds, some packets are missed, but in general the performance is acceptable for most uses.

A clear downside of the platform is that it is bound to a specific hardware device. This in turn prevents us from taking advantage of more advanced radio interfaces without significant effort. Our bitstream processor did not perform well on the raw data dumped by the Ubertooth, and neither did the Ubertooth software produce quality output from our demodulated data. The two are in essence compatible, and cross-experiments yielded some valid output, just significantly less than either suite could produce on its own.

The greatest limitation of the Ubertooth is the lack of control over the radio interface and the limited performance of the hardware device. This is not an issue for single-channel monitoring, as the performance is sufficient. However, the device does not possess, and was never intended to possess, any capability for multi-channel reception. Thus, using this platform we would need 79 interfaces to capture all channels used by Bluetooth (only 40 for full-spectrum BLE analysis). Furthermore, we would need to significantly modify the host software to enable any cross-channel correlation. The options inspired by such a design are further discussed in section 6.3.

Chapter 3

Platform Design

This chapter provides a design for the proposed sniffing system. We will give both top- level overviews of the architecture, as well as details of requirements on component modules. The detailed implementation for the project is presented in chapter 4.

The proposed platform is a highly parallelisable implementation of a Bluetooth radio receiver for Bluetooth packet snooping that can be employed for mass surveillance. At this stage, we do not aim to analyse the content of the packets, and thus only examine the input with respect to the Access Code and Packet Header. Bluetooth communications are scoped to the piconet the devices form, which in turn is defined by its master node. Due to this, in most cases we will only be able to see details about the master device of the piconet (see section 2.1). However, in order to "wake" or "probe" a slave, the master may also address packets directly to other devices by their LAP. Additionally, Bluetooth supports Role Reversal, where a slave can become the master of the piconet, but this is not used much in practice, particularly after the initial connection. This does not limit us with respect to user tracking: identifying one device a user regularly carries is sufficient for identifying the user.

At the highest level the processing pipeline can be viewed as three main parts:

• Radio interface and channel isolation,
• Demodulation of the selected channel,
• Processing the data stream to identify packets.

We provide top-level process flows for single-channel and multi-channel snooping in Figure 3.1 and Figure 3.2 respectively. The design of each of the blocks is elaborated in the sections below.

Figure 3.1: A top level diagram for eavesdropping on a single channel.

Figure 3.2: A top level diagram for eavesdropping on multiple channels in parallel.

3.1 Receiving, Filtering and Demodulating Samples

The first stage for the analysis is receiving the radio signals emitted by the Bluetooth device - establishing a framework for signal detection and demodulation. Inspired by [35], we design a processing pipeline to (de-)modulate Bluetooth signals.

Digital radio receivers often exhibit a significantly and inconsistently increased power level at the centre frequency. This is called the DC spike, an artefact of the receiver's direct-conversion front end and Analog-to-Digital Converter (ADC), such as local oscillator leakage and DC offset. Trying to record a channel overlapping with it would cause high error rates, as the spike significantly affects the characteristics of the radio waves in that part of the spectrum. To avoid the DC spike, the centre frequency can be offset from the targeted frequency. For example, to record the 1 MHz wide Bluetooth channel at 2410 MHz, we can listen to a 3 MHz bandwidth centred at 2409 MHz, as demonstrated in Figure 3.3. This offset is only an example; for wideband receiving, an offset of 0.5 MHz would likely be best, as it makes the DC spike land between channels, where the following band-pass filter removes it. The offset therefore does not pose a limitation for wideband recording. Receiving off the desired frequency means the recorded signal needs to be frequency-shifted back by the offset to obtain a centred baseband signal. This is easily done with a translating band-pass filter or, in the case of multi-channel processing, a hierarchical polyphase channeliser: a very efficient implementation of a bank of band-pass filters that splits an input band into the required number of channel bands.

Figure 3.3: Position of centre frequency relative to intended target.¹
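The frequency shift itself amounts to multiplying every sample by a complex exponential. The Python sketch below illustrates it for the example above, assuming a 3 Msps capture tuned to 2409 MHz and an ideal tone standing in for the channel's signal; a translating filter such as GNU Radio's Frequency Xlating FIR Filter block additionally low-pass filters and decimates, which is omitted here.

```python
import cmath
import math

def freq_shift(iq, shift_hz, fs_hz):
    """Shift a complex signal by shift_hz by multiplying with exp(j*2*pi*f*n/fs)."""
    return [x * cmath.exp(2j * math.pi * shift_hz * n / fs_hz) for n, x in enumerate(iq)]

def mean_freq_hz(iq, fs_hz):
    """Average angle between consecutive samples, converted to Hz."""
    steps = [cmath.phase(b * a.conjugate()) for a, b in zip(iq, iq[1:])]
    return sum(steps) / len(steps) * fs_hz / (2 * math.pi)

fs = 3e6                  # 3 Msps capture tuned to 2409 MHz (example values)
offset = 2410e6 - 2409e6  # the target channel sits 1 MHz above the tuned centre
# Ideal tone standing in for the channel's signal, as seen in the capture.
captured = [cmath.exp(2j * math.pi * offset * n / fs) for n in range(64)]
baseband = freq_shift(captured, -offset, fs)
print(round(mean_freq_hz(captured, fs)), round(mean_freq_hz(baseband, fs)))  # 1000000 0
```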

Once we have a stream of samples centred on our channel of interest, we can remove noise from neighbouring frequencies using a band-pass filter from -0.5 MHz to 0.5 MHz (at baseband, effectively a low-pass filter with a 0.5 MHz cutoff). The resulting 1 MHz recording can be improved by applying Squelch, which mutes any time periods where the power level of the received signal is below a given threshold; this leaves non-zero samples only when there is some traffic on the channel. A further band-pass filter removes any side-band noise generated by the squelch. This second filtering pass cleans the signal further, but may be disabled for wideband receiving, as the side lobes created by the squelch are in most cases small enough.
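A power squelch can be sketched in a few lines of Python. This is our own minimal model, not GNU Radio's power squelch block (whose ramping and gating options may behave differently): the average power is tracked with a single-pole IIR filter using a hypothetical smoothing factor alpha, and muted samples are replaced by zeros rather than dropped, so that the stream keeps its timing.

```python
def power_squelch(iq, threshold, alpha=0.1):
    """Zero out samples while the smoothed power |x|^2 is below threshold.

    alpha is a hypothetical smoothing factor for a single-pole IIR average.
    Muted samples are replaced by zeros, not dropped, to preserve timing."""
    avg, out = 0.0, []
    for x in iq:
        avg = (1 - alpha) * avg + alpha * abs(x) ** 2
        out.append(x if avg >= threshold else 0j)
    return out

quiet = [0.01 + 0j] * 20
burst = [1 + 0j] * 20
out = power_squelch(quiet + burst + quiet, threshold=0.5)
print(len(out), out[25], out[26])  # 60 0j (1+0j)
```

The averaging delays un-muting at the start of a burst (here the first six burst samples are still muted), which illustrates why a threshold set too high can clip the start of weak packets.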

We calibrated the initial threshold for the Squelch manually based on the characteristics of the receiver and its surrounding environment. The process can be improved by dynamically adjusting the threshold to what is currently believed to be the noise level. However, dynamic Signal-to-Noise Ratio (SNR) detection and a squelch adjustment are not implemented within this project.

¹ http://labs.inguardians.com/images/grc_stob/blog_grc_dc_spike_capture_offset_fft.png

Figure 3.4: Differences of Bluetooth and Wi-Fi packets in frequency domain.

Simple squelching will not be able to remove other powerful signals, like Wi-Fi, which also uses the same ISM band. Since Wi-Fi signals are generally significantly stronger, a BT packet will not be comprehensible whilst being shadowed by a concurrent Wi-Fi packet. However, this situation can be identified, as Wi-Fi uses significantly wider channels. A decision algorithm can be devised to detect the presence of packets based on the difference between centre-band and side-band power levels, as shown in Figure 3.4. This would enable us to avoid treating the noise generated by Wi-Fi transmissions as potentially valid input.
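The centre-band versus side-band comparison could be prototyped as below. The Python sketch uses a naive DFT to stay self-contained; the 64-point length, the ±4-bin centre region and the power ratio of 10 are hypothetical parameters, and a random complex Gaussian signal merely stands in for a wideband Wi-Fi transmission.

```python
import cmath
import math
import random

def band_powers(iq, centre_bins):
    """Mean DFT power inside the centre bins and in the remaining side bins."""
    n = len(iq)
    powers = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * m / n) for m, x in enumerate(iq))
        powers.append(abs(acc) ** 2)
    # Bin k is at k/n of the sample rate; bins above n/2 are negative frequencies.
    centre = [p for k, p in enumerate(powers) if min(k, n - k) <= centre_bins]
    side = [p for k, p in enumerate(powers) if min(k, n - k) > centre_bins]
    return sum(centre) / len(centre), sum(side) / len(side)

def looks_narrowband(iq, centre_bins=4, ratio=10.0):
    """True when power is concentrated near the centre (Bluetooth-like)."""
    centre, side = band_powers(iq, centre_bins)
    return centre > ratio * side

n = 64
rng = random.Random(1)
bt_like = [cmath.exp(2j * math.pi * 2 * m / n) for m in range(n)]          # tone 2 bins off centre
wifi_like = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]  # flat, wideband noise
print(looks_narrowband(bt_like), looks_narrowband(wifi_like))  # True False
```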

We have at this point removed some noise from the input signal, and muted it when we believe there is no transmission present. This reduces the work the demodulator needs to do, thus reducing the amount of data to be decoded, and in turn the load on the downstream components and the system overall. If not done carefully, however, we could filter out weak signals that might have been decoded successfully. Furthermore, while muting we need to retain the timing correlation of separate sample and bit streams, which complicates reducing the load on the demodulator. Muting the input when no communications are present also allows the demodulator's clock recovery module to "start fresh". This counteracts any skew created by a previous transmission, may reduce the time required to lock onto a new transmission, and helps ensure similar performance for each packet. Additionally, identifying when the channel grows quiet gives us concrete boundaries between packets, as this never happens during a single-timeslot BT packet. This further reduces the complexity of the data for the following processing units by narrowing down where a packet could be present.

Lastly for this stage, the packets are fed through a demodulator. The demodulator finds the symbol timings and, following clock recovery, outputs the symbol values. The parameters for the demodulator have been collected from research papers on Bluetooth and GFSK modulation [27], and by exploring the approaches used in gr-bluetooth [33]. Any of the parameters can easily be adjusted, should the need arise. All in all, we have designed a flexible and extensible pipeline to turn incoming radio samples into a bitstream of the transmitted Bluetooth data.

3.2 Identifying Packets

Identifying what is a packet, and where, can be a troublesome task. Due to the fast data rate and lack of knowledge of where packets start, we need to examine each possible packet candidate bitwise. Bluetooth receivers are inexpensive and thus cannot perform a significant amount of general-purpose computation. Nonetheless, they use hardware correlators and can rapidly verify and apply the Bose–Chaudhuri–Hocquenghem (BCH) codes, a class of cyclic error-correcting codes. This makes the receiver efficient and not overly complex. Naive software approaches that emulate the hardware will be prohibitively slow. With our software-based solution, we are unable to take advantage of such acceleration methods, as we are constrained by a fairly serial processor, and the dispatching overhead would be prohibitively high for small work chunks. This can be overcome by further parallelising the whole process, and thus hiding the latency by pipelining the packet candidates.

Every BT packet starts with a distinctive preamble (5 alternating 1 and 0 bits), and at a fixed offset there should be a Barker code. Each of these can take one of two complementary forms; for example, the Barker code could take the form 1110010, or its inverse 0001101. Which of the two forms is used for the preamble and for the Barker code depends on the BCH code and the Lower Address Part (LAP) respectively: the last bit of the preamble is the first bit of the BCH code, and similarly the first bit of the Barker code is the last bit of the LAP. For most packets there is also a trailer similar to the preamble following the Barker code, whose form depends on the last bit of the Barker code.

Based on the above, we can look for 5-bit sequences that could be the preamble. However, allowing 2 bits of error covers every possible 5-bit sequence exactly once: each of the two alternating patterns has 1 + 5 + 10 = 16 sequences within 2 bit errors, and 2 × 16 = 32 = 2^5. Thus (assuming we want to allow a minimum of 2 bits of error) we cannot discriminate directly on the preamble. We can, however, retrieve what should be the Barker code based on our current potential start of a packet. Allowing 2 bits of error here does not cover the whole 7-bit search space (only 2 × 29 = 58 of the 128 sequences), so we have obtained a concept of an "invalid" Barker code in our erroneous input. This is the first point of discrimination, followed by the total error of each of the aforementioned sections.
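The coverage argument and the first discriminator can be checked with a short Python sketch. The 5-bit and 7-bit patterns are those given above. The offset of the Barker window is an assumption derived from the Access Code layout in the Bluetooth specification (a 4-bit preamble followed by 34 BCH parity bits and the 24-bit LAP, whose last bit starts the 7-bit window at bit 61), with bits taken in transmission order.

```python
from itertools import product

PREAMBLES = ("01010", "10101")    # 4-bit preamble + first BCH bit
BARKERS = ("1110010", "0001101")  # last LAP bit + 6-bit Barker sequence
BARKER_OFFSET = 61                # assumption: 4 preamble + 34 BCH + 23 LAP bits

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def min_distance(window, valid):
    return min(hamming(window, v) for v in valid)

def coverage(valid, max_err):
    """Count the windows within max_err bits of some valid pattern."""
    n = len(valid[0])
    hits = sum(1 for w in product("01", repeat=n) if min_distance(w, valid) <= max_err)
    return hits, 2 ** n

def candidate_starts(bits, max_err=2):
    """Yield (start, preamble error + Barker error) for plausible packet starts."""
    for start in range(len(bits) - BARKER_OFFSET - 6):
        barker = bits[start + BARKER_OFFSET:start + BARKER_OFFSET + 7]
        b_err = min_distance(barker, BARKERS)
        if b_err <= max_err:  # first discriminator: skip "invalid" Barker codes
            yield start, b_err + min_distance(bits[start:start + 5], PREAMBLES)

print(coverage(PREAMBLES, 2))  # (32, 32): the preamble alone cannot discriminate
print(coverage(BARKERS, 2))    # (58, 128): 70 windows are "invalid"
stream = "10101" + "0" * 56 + "1110010"
print(list(candidate_starts(stream)))  # [(0, 0)]
```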

Having reduced the amount of data we need to process further, we extract the LAP from the packet. Since this could also be erroneous, we allow a predefined number of bit errors. The error patterns are precomputed, so applying an error to the initial LAP can be performed with a simple XOR. For example, to apply 1 bit of error to the sequence 101, we would in turn XOR it with 001, 010, and 100. The suggested degree is 2: allowing 2 bit errors causes us to examine the initial LAP, the 24 LAPs with 1 bit of error, and the C(24,2) = 276 LAPs with 2 bits of error, 301 candidates in total. For each of the examined LAPs, we look up the precomputed AC and compare it with our input, yielding the Access Code error. From all the candidate LAPs, we choose the one whose AC is closest to the input by Hamming distance.
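The error-mask search can be sketched as follows. The masks are precomputed once, as described; however, toy_access_code is a hypothetical stand-in that merely repeats the LAP three times, since reproducing the BCH-based sync word generation is beyond this sketch. The example LAP 0x9E8B33 is the General Inquiry Access Code.

```python
from itertools import combinations

def error_masks(n_bits=24, max_errors=2):
    """All XOR masks with up to max_errors set bits, precomputed once."""
    masks = [0]
    for k in range(1, max_errors + 1):
        for positions in combinations(range(n_bits), k):
            mask = 0
            for p in positions:
                mask |= 1 << p
            masks.append(mask)
    return masks

def toy_access_code(lap):
    """HYPOTHETICAL stand-in for the real 72-bit AC: the LAP repeated three
    times. The real AC uses a BCH code; only the search below is the point."""
    return lap | (lap << 24) | (lap << 48)

def best_lap(observed_lap, observed_ac, masks, access_code=toy_access_code):
    """Try every corrected LAP and keep the one whose AC is closest."""
    best_value, best_dist = None, None
    for mask in masks:
        candidate = observed_lap ^ mask
        dist = bin(access_code(candidate) ^ observed_ac).count("1")
        if best_dist is None or dist < best_dist:
            best_value, best_dist = candidate, dist
    return best_value, best_dist

masks = error_masks()
print(len(masks))  # 301 = 1 + 24 + 276

true_lap = 0x9E8B33
noise = 0b101  # two bit errors hit the LAP field of the received AC
received_ac = toy_access_code(true_lap) ^ noise
received_lap = received_ac & 0xFFFFFF
print(hex(best_lap(received_lap, received_ac, masks)[0]))  # 0x9e8b33
```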

Since the whole AC depends solely on the LAP, the ACs are simply precomputed for all possible 24-bit LAPs and stored in an array indexed by LAP. This provides compact memory usage with fast lookups. For each AC we reserve 72 bits (9 bytes), thus requiring 9 B × 2^24 = 144 MiB of memory for the whole lookup table. Since the table can be shared between the processing threads for all channels (in the case of a multi-channel deployment), we regard this as an efficient use of available resources, and highly feasible on any modern hardware.
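A flat table with 9 bytes per LAP can be sketched in Python as below. The helper names are hypothetical. Building the full 2^24-entry table in pure Python is slow, so the usage example builds an 8-bit table with a placeholder AC function; a deployment would build the table once (for example in C) and share it read-only between channel threads.

```python
AC_BYTES = 9  # 72 bits reserved per Access Code

def build_table(access_code, lap_bits=24):
    """Pack one AC per LAP into a flat bytearray; entry i occupies bytes 9*i to 9*i+8."""
    table = bytearray(AC_BYTES << lap_bits)
    for lap in range(1 << lap_bits):
        start = lap * AC_BYTES
        table[start:start + AC_BYTES] = access_code(lap).to_bytes(AC_BYTES, "little")
    return table

def lookup(table, lap):
    """Constant-time AC lookup by LAP."""
    start = lap * AC_BYTES
    return int.from_bytes(table[start:start + AC_BYTES], "little")

# Full table size: 9 bytes x 2^24 LAPs, in MiB.
print(AC_BYTES * 2**24 / 2**20)  # 144.0

# Small demonstration with 8-bit "LAPs" and a placeholder AC function (hypothetical).
small = build_table(lambda lap: (lap << 64) | lap, lap_bits=8)
print(lookup(small, 200) == ((200 << 64) | 200))  # True
```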

At this point, the AC and Packet Header are inserted into a queue that is processed by a separate thread. This enables us to perform near-arbitrary analysis on the packet without holding back the pipeline. For improved future versions of the platform, we preserve the ability to also pass on the packet contents. The AC error is artificially increased for ID packets, as they do not include a trailer and their last 4 compared bits are therefore arbitrary. However, due to the design of the Access Code, the sync words generated from similar inputs are guaranteed to be a large Hamming distance apart. Thus, we do not regard this as a limitation: we allow sufficiently large error margins, and an error in the LAP will dominate significantly. The main reason the trailer is not used as one of the discriminators, however, is that it would cause a significant skew against detecting ID packets, which have neither a trailer nor a header.

As the next step, we examine the packet header. The header is always FEC 1/3 encoded: each of its bits is tripled. This provides us with another discriminator. For each triplet in the header, we take the minimum distance to 111 and 000, and decode it while remembering the error. As a result, we always obtain a decoded 18-bit packet header from the 54 bits of input header, along with the number of bit errors. At this stage, we can discriminate whether packets are valid based on whether the error found during header decoding is within the configured acceptable bounds. For packets deemed valid, we save the LAP along with any required data and metadata, and add 1 to a confidence index kept for the LAP (confidence_LAP += 1). Should the packet be deemed to have an invalid header, we still store the packet, but assign the sighting a lower confidence (confidence_LAP += 0.1) and add it to the overall index for the given LAP. ID packets, which do not have a header, are sent as high-intensity trains of packets, so the repeated sightings ensure that the confidence adds up and the LAP is registered as sighted with confidence (confidence_LAP ≥ 1).
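The header decoding and the confidence bookkeeping can be sketched together. The Python below is a minimal model with hypothetical names. One deliberate deviation is that confidence is counted in integer tenths: adding 0.1 ten times in floating point yields 0.9999999999999999, which would never reach the threshold of 1.

```python
from collections import defaultdict

def decode_fec13(bits):
    """Majority-vote decoding of rate-1/3 repetition FEC.

    Returns (decoded_bits, bit_errors); 54 received bits give 18 header bits."""
    if len(bits) % 3:
        raise ValueError("length must be a multiple of 3")
    decoded, errors = [], 0
    for i in range(0, len(bits), 3):
        ones = sum(bits[i:i + 3])
        decoded.append(1 if ones >= 2 else 0)
        errors += min(ones, 3 - ones)  # distance to the nearer of 000 / 111
    return decoded, errors

# Confidence in integer tenths: 10 for a valid header, 1 for an invalid one.
STRONG, WEAK, THRESHOLD = 10, 1, 10
confidence = defaultdict(int)

def record_sighting(lap, header_ok):
    """Add a sighting; return True once the LAP counts as sighted."""
    confidence[lap] += STRONG if header_ok else WEAK
    return confidence[lap] >= THRESHOLD

print(decode_fec13([1, 1, 1, 0, 1, 0, 1, 0, 1]))  # ([1, 0, 1], 2)
```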

At this stage we print out the LAP found with some meta-data to help us monitor and analyse the performance of the module. In a production deployment we would avoid printing the debug messages, and instead employ a specialised handler. The amount of data has by this stage greatly lessened, so it can be further correlated locally or shipped off to a central location for further processing.

Now we can analyse the header further. The header consists of 10 bits of data fields and an 8-bit HEC checksum. The HEC checksum is calculated with a Linear Feedback Shift Register (LFSR) initialised with the Upper Address Part (UAP) of the master device. In addition, the whole header is whitened with a sequence from a generator initialised with the master clock. While these transformations make processing the header more difficult for agents unaware of the master clock and UAP, they also link the transmitted packets to the UAP, and thus create the potential for recovering it. In order to reverse the whitening, we have precomputed all possible 18-bit header sequences originating from any 10 bits of header data, with checksums seeded with any 8-bit UAP and then whitened with a sequence seeded with any value of the 6 lower bits of the master clock. This again provides an efficient lookup based on the observed whitened header.

However, given the relatively short 18 bits of header, which depend on a total of 24 bits of input (10 data, 8 UAP and 6 clock bits), we should expect collisions. Indeed, due to the design of the transformations, we get exactly 64 (2^24 / 2^18) matches for any possible input. On its own, this is not enough to draw conclusions about either the master clock of the piconet or the master device's UAP. However, different packets in the same piconet map to different sets. Correlating these enables us to repeatedly take the intersection of the UAPs implied by the 64 possible sources recovered for each packet. Combined with hopping sequence analysis, this is sufficient to retrieve the UAP of the master device within tens of packets observed (Bluetooth transmits up to 1600 packets a second).
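The narrowing of UAP candidates can be sketched as repeated set intersection. In this Python illustration the candidate sets are random stand-ins (each containing a hypothetical true UAP plus up to 63 random values) rather than sets derived from real HEC and whitening, and the clock consistency that hopping sequence analysis would add is ignored.

```python
import random

def narrow_uap(candidate_sets):
    """Intersect per-packet UAP candidate sets; also return the set size after each packet."""
    surviving, history = None, []
    for candidates in candidate_sets:
        surviving = set(candidates) if surviving is None else surviving & set(candidates)
        history.append(len(surviving))
    return surviving, history

rng = random.Random(7)
TRUE_UAP = 0x6B  # hypothetical
packets = [{TRUE_UAP} | {rng.randrange(256) for _ in range(63)} for _ in range(8)]
survivors, history = narrow_uap(packets)
print(sorted(survivors), history)
```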

3.3 Options for Parallelising

Instead of recording a single channel, it is possible to use the whole processing power of the radio interface boards and channelise the wideband signal in bulk. This enables us to capture the Bluetooth traffic on a wide range of frequencies and analyse the data in parallel. Due to the frequency hopping nature of Bluetooth, covering a significant portion of the channels is sufficient to have a high likelihood of detecting a device.
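How quickly partial coverage detects a device can be estimated with a simple model. The Python sketch assumes that each packet lands on an independent, uniformly chosen channel; the real hopping sequence is pseudo-random rather than independent, so the figure is a rough estimate only.

```python
def miss_probability(channels_covered, packets, total_channels=79):
    """Chance that none of `packets` hops lands on a monitored channel,
    assuming independent, uniformly distributed hops (a simplification)."""
    p_hit = channels_covered / total_channels
    return (1 - p_hit) ** packets

print(miss_probability(56, 10))
```

With 56 of the 79 channels monitored (roughly one B210), the chance of missing all of 10 consecutive packets comes out at about 4 × 10^-6.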

Ensuring that the timing of the bits on the channels is kept in sync, or timestamped correctly, enables us to derive accurate relative timings of each bit. This in turn enables us to perform analysis of hopping sequences. Combined with the Packet Header analysis above, this enables us to efficiently retrieve further information about the master of the piconet.

The amount of parallelisation available will depend on:

Radio interface maximum bandwidth and sampling rate: Using the USRP B210 boards, we can receive up to 56 MHz of instantaneous bandwidth and process up to 61.44 Msps with the internal Analog-to-Digital Converter (ADC) unit [36]. In practice, however, sampling at much more than 50 Msps causes sample loss, likely due to restricted throughput on the USB3 bus. Since our proposed solution is not hardware bound, we can use any SDR radio interface. This also enables us to shift more processing towards the hardware receiver stack, and thus lessen the load on the link to the host machine.

Channel selection, filtering, and demodulation speed: The channelisation is currently done on the host computer using the standard GNU Radio blocks. This could be improved with hardware accelerators such as FPGAs and GPUs, which may be part of the receiving interface.

Speed of packet detection: The speed of packet detection mostly depends on the allowed error thresholds. The processing does not require particularly complex computations, so a high number of simple cores would work well to analyse all possible LAPs derived from the input concurrently. In future work, GPUs could be used to perform packet detection in parallel.

Signalling and synchronisation overhead between threads: There is always an overhead for signalling and synchronisation in parallel programs. This problem offers a high level of parallelism, and its work consists of separate, repeated tasks on independent input data. Given this, the overhead would not be limiting.

Despite the very high data bandwidth, it would be conceivable (and more plausible in the future) to transmit the whole 80 MHz spectrum over a fast link (for example, 100G fibre) and perform all processing centrally and offline. This allows for additional degrees of parallelisation within the processing infrastructure.
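The arithmetic behind this is short. The sketch assumes 8 bytes per complex sample, which matches the 128 MB/s quoted in Section 5.2 for 16 Msps. It also assumes the full band would be sampled at about 80 Msps. Both figures are assumptions for illustration.

```python
def link_rate_gbps(sample_rate_msps, bytes_per_sample=8):
    """Raw IQ stream rate in Gbit/s (1 Gbit = 1e9 bits)."""
    return sample_rate_msps * 1e6 * bytes_per_sample * 8 / 1e9
```

Under these assumptions, `link_rate_gbps(80)` is about 5.12 Gbit/s, a small fraction of a 100 Gbit/s link. `link_rate_gbps(16)` is about 1.024 Gbit/s, which is the 128 MB/s figure.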

Many of the above concerns can be solved by using more specialised hardware. However, even with the currently available tools, we are able to show that full-spectrum analysis is possible and achievable for an adversary.

Chapter 4

Implementation

During development we used Python scripts to speed up code writing. For the final version used in the experiments, the code base was ported to C++ for performance and reliability reasons. Furthermore, as the language performs fewer operations automatically, we can expect more uniform performance. As a result, some of the pre-calculation scripts are presented in Python, while the main program code is discussed in C++.

We implement two top-level architectures, shown in Figure 4.1 and Figure 4.2, for single-channel and multichannel processing respectively. We propose a unified sink for multichannel processing, but it has not been implemented. As a result, we do not currently have the capability to correlate the timings of activities on different channels. However, it is possible to start the single-channel processors in parallel and have them use the same final-stage mapper. This enables us to process the header data collectively, and to discard false UAP guesses more quickly.

4.1 GNU Radio Reception Pipeline

With GNU Radio it is straightforward to combine "blocks" into a signal processing pipeline. We use the USRP radio boards and the UHD driver as the signal source. This is followed by standard blocks, culminating in our custom out-of-tree module. The top-level architecture was designed in GNU Radio Companion, the graphical user interface (GUI) for editing GNU Radio pipelines. When compiled, the flow graph is converted into a Python script that simply links the modules together. The script also provides an interface for calling the program with any required arguments. During development, it was very useful to view the frequency domain outputs and to have easy runtime access to module parameters (see Figure 4.3). In the GUI we are able to select, for example, the squelch levels, the received bandwidth, the sampling rate, the band-pass filter's transition width and the mu for the demodulator. However, for the experiments the GUI was disabled altogether.

Figure 4.1: GNU Radio pipeline for capturing and extracting a channel.

Figure 4.2: GNU Radio flow graph for channelized capture and processing.

Figure 4.3: GUI options used during development.

At the first stage, we simply receive a band of frequencies as raw complex samples. For example, Figure 4.4 shows a received BT packet in the frequency domain. In a busy area especially, there are likely to be other transmissions on neighbouring channels. We perform band-pass filtering to remove these and isolate the channel we are interested in. The resulting output may look like Figure 4.5. Following this, we squelch low-power periods of the input. However, this can generate additional side lobes, as seen in Figure 4.6. These can be removed by another band-pass filter, resulting in something like Figure 4.7.
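The squelch step can be illustrated with a minimal power squelch. GNU Radio ships its own squelch blocks, and this pure-Python sketch does not reproduce any particular block's behaviour. The single-pole averaging, the `threshold_db` and `alpha` parameters and the function name are all assumptions for illustration.

```python
def power_squelch(samples, threshold_db, alpha=0.01):
    """Zero out complex samples while their smoothed power is below threshold_db.

    A single-pole IIR average tracks |x|^2, and each sample passes only while
    that average is at or above the threshold.
    """
    threshold = 10 ** (threshold_db / 10)  # dB -> linear power
    avg = 0.0
    out = []
    for x in samples:
        avg = (1 - alpha) * avg + alpha * (abs(x) ** 2)
        out.append(x if avg >= threshold else 0j)
    return out
```

A larger `alpha` reacts faster to the start of a packet but lets short noise bursts through more easily. This trade-off is also why the muting can introduce the abrupt edges and side lobes mentioned above.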

In the frequency domain we can simply see the levels go up and down. Once we shift to exploring IQ sample constellations, however, it becomes much clearer when a BT packet may be present. Figure 4.9 shows the constellation when no packet is present but the output has not been muted. A muted output would give an empty constellation, or a constellation with only the origin marked. This depends on whether we output zero-valued samples or nothing while muted. A BT packet, in contrast, creates a clear pattern on the constellation graph, as seen in Figure 4.8. This could be used for further optimisations, but no such improvements are implemented in our solution.

The above transformations create the input for the demodulator. Although GNU Radio has a GFSK demodulation block, the task is not quite as straightforward as it seems. Demodulation depends on several variables, both when fitting the Gaussian to recover the original samples and when decoding the smoothed signal from the carrier. Finding the exact parameters for the GNU Radio block is not a trivial task, even when referencing the Bluetooth Core specifications. However, we have recovered the required parameters, shown in Table 4.1 [27, 33].

Figure 4.4: A Bluetooth packet in the frequency domain.

Figure 4.5: A Bluetooth packet after band-pass filtering.

Figure 4.6: A Bluetooth packet after squelching.

Figure 4.7: A Bluetooth packet after the second band-pass filtering.

Figure 4.8: A Bluetooth packet viewed as an IQ constellation.

Figure 4.9: Constellation when there is no current BT packet.

Parameter Value Sensitivity 8/π Gain Mu 0.175 Mu 2 0.32 Omega Relative Limit 0.005 Freq Error 0.00

Table 4.1: Parameters for the GFSK demodulator.

```cpp
printf("Generating erred preamble mappings...\n");
// For each 5-bit sequence
for (uint64_t i = 0; i < 0b100000; i++) {
    // find the Hamming distances from both valid options
    uint32_t err1 = calc_hamming(i, 0b10101);
    uint32_t err2 = calc_hamming(i, 0b01010);

    // Exceeding 2 bits of Hamming distance is impossible:
    // the two patterns are complements, so err1 + err2 == 5
    if (err1 > 2 && err2 > 2) exit(1);

    // Save mapping to a map indexed by the input combination
    if (err1 < err2) {
        preambles[(uint8_t) i] = erred_field_t(0b10101, err1);
    } else {
        preambles[(uint8_t) i] = erred_field_t(0b01010, err2);
    }
}
```

Figure 4.10: Preamble and Trailer mapping generator.

4.2 Preamble, Barker Code, and Trailer Pre-calculation

The preamble, trailer and Barker code mappings from erroneous to correct values, tagged with the Hamming distance between the two, are generated on each launch. The preamble and trailer patterns are identical by specification, although the two need not match within a given packet. We can therefore use a single mapping for both. In each case, we look at all possible inputs from the packet candidate and map them to the best guess of valid data, as determined by the Hamming distance. As a result, we populate a hashmap with a key size of 5 or 7 bits respectively (Figure 4.10, Figure 4.11). For the preambles we obtain a complete map covering all 5-bit inputs, whereas not all 7-bit Barker code inputs are recognised.

4.3 Access Code Pre-calculation

There are 2^24 options for the access codes, since the AC depends solely on the LAP of the piconet it belongs to. As per the BT standard [17], the Access Code consists of the Preamble, a Sync Word and an optional Trailer. To calculate the Sync Word, we take the LAP and calculate the Code Word from it. We then add the Preamble, Barker code and Trailer. Apart from the Code Word, computing the Access Code is trivial. Due to the large number of LAPs, the ACs were calculated in parallel using multiple processes. Once all the data was calculated, the (LAP, AC) tuples were sorted and dumped in a raw binary format (72 bits, or 9 bytes, per AC). This could then be loaded directly into memory to populate a C array.

```cpp
printf("Generating erred barker code mappings...\n");
// For each 7-bit input
for (uint64_t i = 0; i < 0b10000000; i++) {
    // find the Hamming distances from both valid options
    uint32_t err1 = calc_hamming(i, 0b1110010);
    uint32_t err2 = calc_hamming(i, 0b0001101);

    // Exceeding 2 bits of Hamming distance is possible here;
    // do not allow more than 2 bits of error in the Barker code
    if (err1 > 2 && err2 > 2) continue;

    // Save mapping to a hashmap indexed by the input combination
    if (err1 < err2) {
        barkers[(uint8_t) i] = erred_field_t(0b1110010, err1);
    } else {
        barkers[(uint8_t) i] = erred_field_t(0b0001101, err2);
    }
}
```

Figure 4.11: Barker code mapping generator.
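The 9-byte records described above suggest that only the AC is stored, with the LAP implied by each record's position in the sorted file. Under that assumption, the LAP itself becomes the array index. The sketch below shows this layout. The big-endian byte order and the function names are assumptions for illustration.

```python
import io

AC_BYTES = 9  # one 72-bit access code per record

def dump_access_codes(acs, stream):
    """Write ACs in LAP order, so record i holds the AC for LAP i."""
    for ac in acs:
        stream.write(ac.to_bytes(AC_BYTES, "big"))

def load_access_code(raw, lap):
    """Fetch the AC for `lap` from the flat buffer by direct offset."""
    offset = lap * AC_BYTES
    return int.from_bytes(raw[offset:offset + AC_BYTES], "big")

# Tiny usage example with two made-up 72-bit values
buf = io.BytesIO()
dump_access_codes([0x0123456789ABCDEF01, 0xFFEEDDCCBBAA998877], buf)
raw = buf.getvalue()
assert load_access_code(raw, 1) == 0xFFEEDDCCBBAA998877
```

For the full table this layout takes 2^24 × 9 = 150,994,944 bytes (144 MiB). That is small enough to hold in memory on a typical host.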

To calculate the Code Word, we first need to generate a constant Pseudo-Random Noise sequence (see Figure 4.12). The BT specification provides us with the generator polynomial 1574641655478, which is used to generate the parity bits for the BCH code. The whole process can be seen in Figure 4.13. Understanding the process is complicated by near-arbitrary conversions between little- and big-endian bit orderings throughout the specification. In the code referenced, we start with an LSB-first bit ordering. The parity calculations are inspired by gr-bluetooth [33], rather than directly by the BT specification.

```python
from collections import deque

# 6-register LFSR: emits 63 bits, then a trailing 0 extends the sequence to 64
lsfr = deque([1, 0, 0, 0, 0, 0])
pn = ""
for x in range(63):
    o = lsfr.popleft()
    pn += str(o)
    # feedback: XOR (sum mod 2) of the outgoing bit and three remaining taps
    o = o + lsfr[1] + lsfr[2] + lsfr[4]
    lsfr.append(o % 2)
pn += '0'
```

Figure 4.12: Pseudorandom noise LFSR.

4.4 Header Pre-calculation

To calculate the packet header, we take the header data, calculate the HEC checksum, and whiten the resulting 18-bit sequence. For this, we use a standard LFSR class, shown in Figure 4.14. This class is then used to create the HEC and Whitener LFSRs (Figure 4.15 and Figure 4.16 respectively). For the HEC we use the LFSR in a completely standard fashion, so we simply initialise it with the correct starting value and generator polynomial. For the Whitener, however, we need to slightly alter the data flow. The input data must not enter the LFSR, but is simply passed by and XORed with the LFSR's output.

Combining these allows us to calculate the headers for all possible combinations of header data, UAP and master clock (lower 6 bits), as shown in Figure 4.17. We initialise the HEC LFSR with the UAP and the Whitener LFSR with the clock. We then simply feed the 10 bits of data through the HEC LFSR to yield the 8-bit checksum, and XOR the resulting 18 bits with the output of the Whitener LFSR. Since the input space (24 bits) is larger than the output space (18 bits), each calculated header has several possible sources, namely 64. Again, this was precalculated in parallel.
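The figure of 64 follows from counting the inputs and outputs of the pre-calculation. Counting alone only gives the average number of sources per header. That every header has exactly 64 sources is the property observed in the pre-calculated table.

```latex
\underbrace{2^{10}}_{\text{header data}} \cdot
\underbrace{2^{8}}_{\text{UAP}} \cdot
\underbrace{2^{6}}_{\text{clock}} = 2^{24} \text{ inputs},
\qquad
\frac{2^{24}}{2^{18}} = 2^{6} = 64 \text{ sources per 18-bit header}
```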

4.5 Packet Detection and Extraction

At this stage we turn a stream of bits into a set of packets. For development, the ID packet was the most convenient option, as it is easy to produce and its contents are consistent and easy to predict. ID packets are sent when performing an inquiry. An inquiry can be triggered, for example, by simply calling hcitool inq through the BlueZ stack [31].

```python
# Assumes lap (24-bit int), pn (64-bit PN sequence as an int) and
# g (BCH generator polynomial as an int) are already defined.
# Obtain the Information Sequence: append Barker code
inf_seq = lap ^ ((0b101100 if (lap >> 23) == 0 else 0b010011) << 24)
# BT spec 6.3.3.1 step 2 ~x: Apply pseudorandom noise
inf_seq_p = inf_seq ^ (pn >> 34)
# BT spec 6.3.3.1 step 3 ~c: generate parity (reversed bit orders)
g_parity = list(map(int, "{0:035b}".format(g)))[::-1]
data_parity = list(map(int, "{0:030b}".format(inf_seq_p)))[::-1]
# Initiate LFSR
length_parity = 64
k_parity = 30
cw_parity = [0] * (length_parity - k_parity)
# Feed Information Sequence through the LFSR
for i in reversed(range(k_parity)):
    feedback = data_parity[i] ^ cw_parity[length_parity - k_parity - 1]
    if feedback != 0:
        for j in reversed(range(1, length_parity - k_parity)):
            if g_parity[j] != 0:
                cw_parity[j] = cw_parity[j - 1] ^ feedback
            else:
                cw_parity[j] = cw_parity[j - 1]
        cw_parity[0] = 1 if g_parity[0] and feedback else 0
    else:
        for j in reversed(range(1, length_parity - k_parity)):
            cw_parity[j] = cw_parity[j - 1]
        cw_parity[0] = 0
cw_parity = "".join(map(str, cw_parity))[::-1]
parity = int(cw_parity, 2)
# BT spec 6.3.3.1 step 4 ~s: Obtain Code Word
codeword = (inf_seq_p << 34) ^ parity
# BT spec 6.3.3.1 step 5 s: Add pseudorandom noise again
# Note that the PN cancels out everywhere except over the 34 parity bits
addPN = codeword ^ pn
# BT spec 6.3.3.1 step 6 y: Add Preamble and Trailer
AccessCodeFromLap = \
    "{0:04b}".format(0b0101 if (addPN >> 63) & 1 == 0 else 0b1010) + \
    "{0:064b}".format(addPN) + \
    "{0:04b}".format(0b0101 if lap & 1 == 0 else 0b1010)
```

Figure 4.13: Generating an AC from a LAP.

```python
class lsfr():
    # Initialise with data and set generator poly
    def __init__(self, init, gen):
        self.generator = gen
        self.regs = init
        self.regs.reverse()
        self.regs.append(0)  # the "out" register

    # Push a single bit of data
    def push_bit(self, bit):
        # spoof the output bit to be the input
        self.regs[-1] = bit

        # do the shift. the spoofed value will be overwritten first,
        # then the others follow and use the newly calculated value
        for i in range(len(self.regs) - 1, -1, -1):
            self.regs[i] = (self.regs[-1] ^ self.regs[i - 1]
                            if self.generator[i] else self.regs[i - 1])
        return self.regs[-1]

    # Push each bit of data
    def push(self, data, size):
        data_b = "{data:0{size:d}b}".format(data=data, size=size)
        for b in data_b:
            self.push_bit(int(b))
        return self.retrieve()

    # Retrieve final value (reversed)
    def retrieve(self):
        return int("".join(map(str, reversed(self.regs[:-1]))), 2)
```

Figure 4.14: LFSR class design.

```python
class HEC_lsfr(lsfr):
    # HEC LFSR: initialised with the 8-bit UAP
    def __init__(self, uap):
        super().__init__(
            init=[int(b) for b in "{:08b}".format(uap)[:8]],
            gen=[0, 1, 1, 0, 0, 1, 0, 1, 1])
```

Figure 4.15: HEC LFSR module.

```python
class Whitener():
    # Whitening LFSR: initialised with "1" followed by the 6 lower clock bits
    def __init__(self, clk):
        self.lsfr = lsfr(init=[int(b) for b in "1" + "{:06b}".format(clk)[:7]],
                         gen=[0, 0, 0, 0, 1, 0, 0, 1])
        self.out = []

    # Clock the LFSR with a 0 input: the data never enters the registers
    def get_bit(self):
        return self.lsfr.push_bit(0)

    # XOR each data bit with the next whitening bit
    def push(self, data, size):
        data_b = map(int, "{data:0{size:d}b}".format(data=data, size=size))
        for b in data_b:
            self.out.append(self.get_bit() ^ b)

    def retrieve(self):
        return int("".join(map(str, self.out)), 2)
```

Figure 4.16: Whitener LFSR module.

```python
from collections import defaultdict

headers = defaultdict(list)  # whitened header -> [(uap, clk, data), ...]
for uap in range(1 << 8):
    for clk in range(1 << 6):
        for data in range(1 << 10):
            hecer = HEC_lsfr(uap)
            whiter = Whitener(clk)

            hec = hecer.push(data, 10)
            whiter.push(data, size=10)
            whiter.push(hec, size=8)
            headers[whiter.retrieve()].append((uap, clk, data))
```

Figure 4.17: Generating all possible headers.

Later, we moved on to packets that also have a trailer and a header.

When starting the program, we load all the pre-calculated data into memory. This provides us with rapid and cheap lookups while processing the input. Additionally, we generate all 24-bit numbers with 0, 1, 2 (and 3) bits set. These are used as error patterns. Applying them to the LAP seen in a packet candidate generates all its neighbours within a given Hamming distance.
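The error patterns can be generated with the standard library. This Python sketch uses illustrative function names, not those of the C++ code. It also shows how quickly the candidate count grows with the allowed number of LAP bit errors.

```python
from itertools import combinations

def error_patterns(width=24, max_bits=2):
    """All `width`-bit masks with at most `max_bits` bits set."""
    patterns = []
    for k in range(max_bits + 1):
        for positions in combinations(range(width), k):
            mask = 0
            for p in positions:
                mask |= 1 << p
            patterns.append(mask)
    return patterns

def neighbours(lap, patterns):
    """All LAPs within the Hamming radius covered by `patterns`."""
    return [lap ^ m for m in patterns]
```

With 24 bits there are 1 + 24 + 276 = 301 patterns for up to 2 errors and 301 + 2024 = 2325 for up to 3. Allowing a third error therefore multiplies the per-candidate search by roughly 7.7.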

To handle the data efficiently, we create a buffer struct holding the 72 + 54 = 126 bits of data we need at a time: the 72-bit access code followed by the 54-bit FEC-encoded header. This is implemented as a struct with two 64-bit unsigned integer fields, and can thus easily be extended to hold the packet body as well. Furthermore, our buffer struct implements convenience methods to efficiently retrieve interesting partitions of the buffer, such as the LAP or the Barker code (see Figure 4.18). Figure 4.19 shows how the data is shifted by one bit to handle the next demodulated bit of input. After this operation, the new contents of the packet candidate buffer are checked for validity as a packet.
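The two-word shift can be modelled in Python to check that it behaves as a single 126-bit window. Python integers are unbounded, so the sketch masks each word to 64 bits to emulate `uint64_t` wraparound. The function names are illustrative, and the sketch is not the report's C++.

```python
MASK64 = (1 << 64) - 1

def push_bit(hi, lo, bit):
    """Shift one bit into a (hi, lo) pair of 64-bit words.

    As in the C++ push, the top bit of `lo` carries into `hi`, and the new
    bit enters at the least significant end of `lo`.
    """
    hi = ((hi << 1) | (lo >> 63)) & MASK64
    lo = ((lo << 1) | (bit & 1)) & MASK64
    return hi, lo

def get_lap(hi):
    """The 24 least significant bits of `hi`, as in the C++ get_lap()."""
    return hi & ((1 << 24) - 1)
```

The upper 2 bits of `hi` are never cleared. This is harmless because every accessor masks out the field it needs.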

To verify a packet candidate, we first look up the bits in the Preamble, Barker code and Trailer positions. If the error seen so far is within our allowed threshold, we continue processing the candidate. Otherwise, we reject the candidate and push another bit. Following this initial verification, we generate all LAPs within a given Hamming distance and choose the one that would yield the AC closest to the given input, as shown in Figure 4.20. If the best option found is not within the allowed error margin, we discard it. Otherwise, we package the data into a finding struct and pass it into a queue for header analysis. This ensures that the main processing loop performs consistently and does not lose input. Queue overflow has not posed an issue so far. Should the queue start overflowing, we could enlarge it, since very long periods of very intensive traffic are unlikely and a larger queue may absorb any delays incurred. Alternatively, we could simply store or transmit the data, as its volume has by this point been greatly reduced.
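The hand-off between the bit-level loop and header analysis follows a producer/consumer pattern. The sketch below uses Python's standard `queue.Queue` with a background thread. The function names are hypothetical, and rejecting items when the queue is full is only one possible policy. As noted above, the alternatives are a larger queue or spilling findings to storage.

```python
import queue
import threading

def start_header_worker(findings, handle):
    """Consume findings on a background thread so the detection loop never waits."""
    def run():
        while True:
            item = findings.get()
            if item is None:  # sentinel: shut down
                findings.task_done()
                break
            handle(item)
            findings.task_done()
    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker

def submit(findings, item):
    """Non-blocking hand-off from the detection loop; False if the queue is full."""
    try:
        findings.put_nowait(item)
        return True
    except queue.Full:
        return False
```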

4.6 Further Packet Verification

Once we have identified a good candidate for a packet, we explore the header section using the algorithm in Figure 4.22. The buffer class has a method we omitted above. This method (Figure 4.21) extracts the packet header and undoes the FEC 1/3 encoding.

```cpp
struct buffer_t {
    uint64_t hi;
    uint64_t lo;

    // accessors assuming data is aligned on LSB
    uint32_t get_lap() { return this->hi & ((1 << 24) - 1); }

    buffer_t get_ac() {
        buffer_t res;
        res.hi = (this->hi >> 54) & 0b11111111;
        res.lo = (this->hi << 10) | (this->lo >> 54);
        return res;
    }

    uint8_t get_pre() { return (this->hi >> (64 - 7)) & 0b11111; }

    uint8_t get_bark() { return (this->lo >> 58) | ((this->hi & 1) << 6); }

    uint8_t get_trailer() { return (this->lo >> 54) & 0b11111; }
};
```

Figure 4.18: Buffer struct with convenience methods.

To do this, it looks up the correct value for each triplet, as well as the Hamming distance between the input and the correct value, based on Table 4.2. As a result, we get the 18-bit header and checksum, and a count of bit errors in the header portion. At this stage we discriminate with respect to the errors detected, and classify the packet candidate as valid or possibly invalid. This approach may cause us to miss some packets we could otherwise have detected. However, BT packets have a minuscule payload size and a fast packet rate, and we can process multiple channels in parallel. This will therefore not have a significant adverse effect on the results. The decoded header can then be looked up among the previously calculated headers to retrieve a collection of plausible UAPs. Repeatedly intersecting these sets across packets with a given LAP will eventually yield a single UAP for the master node of the piconet (see Figure 4.23).

```cpp
void clap_impl::push(uint8_t in) {
    in = in & 1;
    this->buffer.hi <<= 1;
    this->buffer.hi |= (this->buffer.lo >> 63) & 1;
    this->buffer.lo <<= 1;
    this->buffer.lo |= in;

    if (this->buff_contents < 72 + 54) {
        this->buff_contents++;
    } else {
        this->check();
    }
}
```

Figure 4.19: Pushing an input bit into the buffer.

```cpp
// lap_errors[] holds the precomputed 24-bit error patterns with at most
// k bits set, and lap_err_count is their number (array name assumed)
uint32_t best_dist = UINT32_MAX;
uint32_t best_lap = lap;
for (uint32_t i = 0; i < lap_err_count; i++) {
    uint32_t lap_errd = lap ^ lap_errors[i];
    buffer_t ac_errd = get_ac_from_lap(lap_errd);

    uint32_t distance = calc_hamming(ac, ac_errd);
    if (distance < best_dist) {
        best_dist = distance;
        best_lap = lap_errd;
    }
    if (distance == 0) {
        break;
    }
}
```

Figure 4.20: Exploring all LAPs within a Hamming distance.

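```cpp
// Undo the rate-1/3 repetition code on the 54 header bits held in `lo`.
// Returns the bit-error count from bit 24 upwards and the 18-bit header below.
uint32_t get_header_unfec_with_err() {
    uint32_t header = 0;  // 18 decoded bits do not fit in a uint16_t
    uint32_t error = 0;
    // Hamming distance from each triplet to its nearest codeword (Table 4.2)
    const uint8_t err_table[8] = {0, 1, 1, 1, 1, 1, 1, 0};
    // Decoded bit for each triplet (majority vote)
    const uint8_t bit_table[8] = {0, 0, 0, 1, 0, 1, 1, 1};
    uint64_t tmp = this->lo & ((((uint64_t)1) << 54) - 1);
    uint8_t triplet;
    for (int i = 0; i < 18; i++) {
        header <<= 1;
        // take the top triplet of the 54-bit window (bits 53..51)
        triplet = (tmp & (((uint64_t)7) << 51)) >> 51;
        error += err_table[triplet];
        header |= bit_table[triplet];
        tmp <<= 3;
    }
    return (error << 24) | header;
}
```

Figure 4.21: Extracting the packet header and undoing the FEC 1/3 encoding.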

Input   Correct   Distance
000     000       0
001     000       1
010     000       1
011     111       1
100     000       1
101     111       1
110     111       1
111     111       0

Table 4.2: FEC 1/3 decoding lookup table.

```cpp
void verify_header(detection_t& finding) {
    auto search = devices.find(finding.lap);
    bool fresh = false;

    // get_header_unfec_with_err() packs the error count from bit 24 upwards
    uint32_t header_with_errcnt = finding.data.get_header_unfec_with_err();
    uint32_t header = header_with_errcnt & ((1u << 18) - 1);
    uint16_t fec_err_count = (uint16_t)(header_with_errcnt >> 24);

    device_t device;
    if (search == devices.end()) {
        device.lap = finding.lap;
        device.confidence = 0;
        device.count = 1;
        fresh = true;
    } else {
        device = search->second;
        device.count += 1;
    }

    if (finding.lap == reverse_lap(0x9e8b33)) {
        // inquiry (GIAC) LAP
        device.confidence += 0.2;
    } else if (fec_err_count < 5) {
        // FEC likely to be real
        device.confidence += 1;
    } else {
        // FEC failed => garbage or inquiry
        device.confidence += 0.1;
    }

    output(device, finding);
    devices[finding.lap] = device;
}
```

Figure 4.22: Verifying header correctness.

```python
# header_opts: the (uap, clk, header_data) sources stored for the observed header
# uaps: dict mapping each LAP to its set of remaining candidate UAPs
candidates = set(uap for uap, clk, header_data in header_opts
                 if aprrove_header(header_data, lap, uap, clk))
if lap in uaps:
    uaps[lap].intersection_update(candidates)
else:
    uaps[lap] = candidates
```

Figure 4.23: Correlating packets to discover UAP.
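The decoding in Table 4.2 can also be modelled in Python. The sketch returns the decoded bits and the corrected-error count as a tuple, rather than packing them into one integer as Figure 4.21 does. The names are illustrative. The first-received triplet is assumed to sit in the most significant position, as in the buffer layout.

```python
# Decoded bit and Hamming distance for each 3-bit triplet (Table 4.2).
DECODED = [0, 0, 0, 1, 0, 1, 1, 1]
DISTANCE = [0, 1, 1, 1, 1, 1, 1, 0]

def decode_fec13(coded, n_bits=18):
    """Majority-decode 3 * n_bits coded bits.

    Returns (decoded_bits, corrected_bit_errors). The first-received
    triplet is taken from the most significant end.
    """
    decoded, errors = 0, 0
    for i in range(n_bits):
        shift = 3 * (n_bits - 1 - i)
        triplet = (coded >> shift) & 0b111
        decoded = (decoded << 1) | DECODED[triplet]
        errors += DISTANCE[triplet]
    return decoded, errors
```

The error count is only a lower bound on the true number of flipped bits. Two flips within one triplet decode to the wrong bit and are counted as a single error. This is one reason the FEC error threshold is treated as a confidence signal rather than a hard proof of validity.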

Chapter 5

Evaluation

We evaluated the performance of our proposed platform against Ubertooth, a commercially available open source solution. Ubertooth is an open source software and hardware platform for Bluetooth sniffing and analysis.

The gr-bluetooth suite is SDR-based software aiming to achieve goals similar to ours. We intended to assess performance against gr-bluetooth as well, but the gr-bluetooth suite does not work reliably enough to contribute to the data in any way. The output of its main script (btrx) is littered with false positive BLE packets. Furthermore, when used for capturing an Inquiry procedure, for example, the inquiry packets we would expect to see were not present. The code base is generally poorly maintained and difficult to improve. It considers some good strategies, such as using the BCH coding for proper error correction, but these are not implemented. Overall, this renders the whole package unsuitable for our evaluation. This is not a great downside, as the Ubertooth project originates (to some degree) from gr-bluetooth and the maintainers greatly overlap.

5.1 Computational Performance and Speed

Due to precomputation and simple lookups during runtime, and the separation of deeper analysis from input stream processing, we have succeeded in creating a solution that can run in real time. This is straightforward to assess. Whenever a block in the GNU Radio flow graph is too slow, the UHD driver reports an overflow by printing an O to the output. When running the suite with the suggested error tolerances in Table 5.1, this does not happen.

Bit error tolerance          Standard Usage   Deep Analysis 1   Deep Analysis 2
Preamble and Barker Code     2                4                 4
LAP Errors                   2                2                 3
Access Code                  7                13                13

Table 5.1: Suggested Bit Error Tolerances.

The Ubertooth project uses some techniques similar to ours to ensure performance. However, the bitstream analysis portion of its software does not work as efficiently, which makes equivalent error recovery infeasible for it.

The error tolerances are used during bitstream processing to discriminate against noise and separate valid packets. Please refer to Chapter 3 for details on what each of the tolerances entails. Increasing the AC error tolerance does not impact performance much, as it mostly applies to work outside the direct input processing methods. Every additional bit error allowed in the initial LAP results in up to a 24-fold increase in the search space. It causes a similar increase in running time, since the bulk of the analysis depends on the LAP considered. Increasing the preamble and Barker code error tolerance will also affect performance adversely, but not as significantly as the LAP error tolerance does. Thus, increasing accuracy is possible, but we do not recommend using settings beyond Standard Usage, as this will hinder the ability to run in real time. The effect of deeper analysis can be seen in Figure 5.10.

5.2 Design of Experiments

Despite being able to process the radio input in real time, we have chosen to process the data offline for our experiments. This gives us additional control over the data, and decouples some hardware properties from our experimental results. Furthermore, we can examine the same data using the suggested real-time-capable settings. We can also perform computationally more expensive analyses that are possible offline, or may become possible in real time in the future.

During the experiments, we used three USRP boards and the Ubertooth, recording concurrently. Each of the USRP boards was controlled by a different host machine, one of which also interfaced with the Ubertooth. Raw complex samples at 16 Msps (million samples per second) generate 128 MB of data per second. We therefore opted for storage in an in-memory file system, and copied the files to a less volatile medium at the end of each experiment. More details on how this was achieved can be found in Section 5.4.

Interface   Centre Frequency   Receiving Bandwidth   Sample Rate
Ubertooth   2410 MHz           1 MHz                 Unspecified
USRP 1      2409 MHz           4 MHz                 16 Msps
USRP 2      2409 MHz           4 MHz                 4 Msps
USRP 3      2409 MHz           3 MHz                 4 Msps

Table 5.2: Configuration of USRP boards used during main experiments.

The particular configuration details of each of the interfaces can be found in Table 5.2. The GFSK demodulator requires 4 samples per output symbol to work [37]. The experimental setup is shown in Figure 5.1. When exploring the performance of our model, we found no consistent difference between the data from the three receivers. However, any one of the receivers may perform significantly better in a single experiment. This is likely due to device positioning, and to radio reflections and noise within the room. To treat our platform and the Ubertooth equally, we use the data from USRP 1 for more detailed analyses, unless stated otherwise, instead of picking the best recording for each experiment. The Ubertooth stack does not allow configuring more than 4 bits of error in the AC. In the graphs, its 4-bit result is therefore duplicated for comparison with our platform's higher tolerances.

To establish the effects of splitting a recording into several channels, we have recorded at both 4 Msps and 16 Msps. Additionally, we have recorded with a narrower bandwidth to explore the effects our software band-pass filtering might have. This provides valuable insight into the scalability of our solution without the need for an excessive number of radio interfaces. In the case of the Ubertooth, the input bandwidth is limited (1 MHz for BT and 2 MHz for BLE). The demodulation is handled by the hardware, and the host device sees only the demodulated bitstream (with metadata).

In our experiments, we use a limited number of Bluetooth Classic devices: a Microsoft Band 2 activity monitor and smart-band, a Nokia BH-904 wireless headset and a Kano wireless keyboard. As the master we used a Sony Xperia Z3 Compact mobile phone. More experimental devices are not necessary at this stage, as we are evaluating the receiver stack rather than comparing the characteristics of particular devices.

Figure 5.1: Layout of the interfaces for experiments.

To gain comprehensive insight into the characteristics of our implementation's performance, we have gathered data during a number of distinct everyday use cases of the devices.

Background "noise" - in quiet and within an active WiFi channel: We recorded 3 separate instances with no intentional or controlled wireless traffic. This provides a baseline for the experimental conditions. Furthermore, we set up a WiFi network on a channel overlapping our monitored BT channels, and recorded during an active data transfer on that network. This enables us to establish how susceptible the receiving stack is to false positive results. The BT receiver will demodulate the WiFi signals, but the modulation schemes and data rates are different. Even if the demodulator recovers a symbol clock and consistently outputs data, the resulting bitstream will be arbitrary. Although possible, the chances of this data containing valid Bluetooth packets are low. In reality, we cannot completely control the background noise of our experimental environment. However, we have enumerated other known devices in the area, and performed the measurements at a time when the presence of other agents was expected to be minimal.

Band - Idle and Voice Control: This recording demonstrates the capabilities of our approach against an example fitness tracker while it is paired and in range. We recorded both active use of the tracker and periods when the device waits idly. Active use includes, for example, the voice control feature, which streams audio to the phone for voice recognition. Being able to identify devices in similar states enables us to identify a person simply walking past.

Device Pairing - Band, Headset and Kano: The pairing procedure is particularly susceptible to attacks that leverage snooping. During pairing, devices identify one another and may set up keys to protect future communications, which makes it the most critical period of communication. However, pairing is usually done once and in a relatively controlled environment (for example, at home or in the office). The contents of these messages are therefore not relevant for our generic tracking. Although we are not interested in the contents of the packets, we can demonstrate the capability of snooping on the pairing procedure.

Kano Typing and Mouse Movement: We recorded both short discrete packets (key presses) and more continuous transmissions (continued movement on the touchpad) from the Kano keyboard. This demonstrates our capability to detect discrete and infrequent packets as well as continuous low-bandwidth transmissions.

Headset Voice Stream: We recorded the Nokia headset while it was used to issue voice commands (similarly to the Band), and during a call. For the call, we also recorded the transmission alongside a background WiFi file transfer. This gives us a good example of a high quality data stream, both on its own and when overpowered by a WiFi channel.

5.3 Proof of Motivation

We examined the behaviour of both the phone and the Band when paired but unreachable from one another. This simulates, for example, a user leaving home without the smartband but with their phone, with Bluetooth still enabled, or vice versa. For this, we paired the devices and, in turn, removed one of them from the range of the radios. We then performed measurements on a single channel for between 1 and 2 hours. Extensive analysis of these results is not required for this project. However, they demonstrate the need for care with identifiable information on such devices.

Device             Mentions of Phone        Mentions of Band
Sony Xperia Z3C    128 packets per hour     2 packets per hour
Microsoft Band 2   1931 packets per hour    0 packets per hour

Table 5.3: Monitoring paired but out of range devices.

It is clear from Table 5.3 that, with Bluetooth turned on but undiscoverable, these devices still attempt to communicate with and locate one another. As expected, the devices mostly use the master LAP, but concerningly, the phone also sent packets with the Band's LAP. The packets were mostly sent in bursts a few minutes apart. This kind of activity is required for the devices to rediscover one another. However, it also opens the door to third parties identifying them, even when the user may believe no wireless transmissions are being sent. To capture this, we designed one of our experiments to record the behaviour of a headset when losing connection to the paired phone. In that particular case, we noted numerous inquiry packets, which led us to believe that at least one of the devices switched back into discovery mode.

5.4 Capturing Input

In order to receive on several host computers simultaneously, we used Terminator to launch and stop the software. This is a terminal manager that can send identical input to a number of shells, in our case each controlling a different host computer. It would have been possible to create a more complex, and possibly more accurate, experiment harness, but the effort would not have been justified merely to reduce the already sub-second differences between launch times on the host machines. Furthermore, the delays caused by loading the software were observed to be far greater and more varied than those of sending the commands. To lessen these delays, we launched, stopped and relaunched each of the experiments. This ensured that the program code for our receiving stack was similarly cached on every host, and that recordings would start at the same time within a tighter tolerance.

Since we are using conventional hardware and the raw sample output of the radio boards is rather large, we need to take care when storing the data. The hard disk drives in our host machines are not fast enough to save the samples to disk on the fly. To overcome this limitation, we consolidated the available memory modules into the host machines being used, and allocated 10 GiB to a tmpfs. This is a file system that resides in the main memory of the host machine - a ramdisk. We chose tmpfs over the alternative ramfs, as it behaves more like a persistent storage partition, for example returning correct errors when the disk is full.

Figure 5.2: Out-of-Memory/Disk Full with tmpfs.
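How long a recording fits into the 10 GiB ramdisk follows from the sample rate and the sample size. The sketch below assumes complex samples stored as two 32-bit floats (8 bytes, as in the USB figure in Section 5.8); the 20 Msps rate is a hypothetical example, not our measured configuration.

```python
GIB = 2**30

def recording_seconds(capacity_bytes, sample_rate_sps, bytes_per_sample=8):
    """Seconds of raw IQ samples that fit in capacity_bytes.

    bytes_per_sample=8 assumes complex samples stored as two 32-bit floats."""
    return capacity_bytes / (sample_rate_sps * bytes_per_sample)

print(recording_seconds(10 * GIB, 80e6))  # ~16.8 s for the full 80 Msps band
print(recording_seconds(10 * GIB, 20e6))  # ~67.1 s at a hypothetical 20 Msps
```

This is why the maximum recording length, rather than the storage format, was the parameter we had to limit.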

To establish the best configuration for our experiments, we tried ramdisks larger than the available memory, with and without swapping enabled (see Figure 5.2). As expected, although swapping let us use slightly more space for storage, the behaviour was unpredictable, and we would incur unknown losses from being unable to process the incoming data in time. Based on this, we limited the ramdisks to a safe size for all host machines; by reducing the maximum length of recordings, we thus ensured the correctness of the data in this respect.

Following capture to the ramdisk, whenever any of the partitions was nearing capacity, we stopped all recordings and copied the data to central persistent storage. This enabled us to run individual experiments without concerns about data transfer speeds, delays, or exceeding capacity.

5.5 False Positives

Insufficient verification of packets can often lead to a high number of false positives. Similarly, custom aggressive, naive, and non-standard error correction algorithms can create packets from noise. Thus, it is important both to verify what we believe to be packets and to calibrate the system properly. Both the Ubertooth platform and our implementation support specifying the allowed error in the Access Codes. For our solution, we also allow adjusting other internal error thresholds (see Table 5.1).

A problem with naive approaches to packet detection is that the system may mistake a particular bit sequence for a packet. Due to the high bit rate, we need to examine a very large number of candidates for each packet. For example, without using the BCH code, our decision would depend only on the 4-bit Preamble and the 6-bit Barker code - a total of 10 bits. Accommodating some error recovery further decreases the chance of a spurious bit sequence being discarded. This problem is particularly apparent with the gr-bluetooth system, which we have had to exclude from the comparison due to its poor performance.
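The scale of the problem can be estimated with a simple model: if every bit offset in a stream of random bits is a candidate, the chance that 10 random bits land within `k` bit errors of the fixed Preamble and Barker pattern is the sum of the binomial coefficients up to `k` divided by 2^10. The sketch assumes a 1 Mbit/s basic-rate stream on one channel and treats offsets as independent; it is an illustration of the argument, not our detector.

```python
from math import comb

def false_match_prob(n_bits, max_errors):
    """Probability that n_bits uniformly random bits fall within Hamming
    distance max_errors of one fixed bit pattern."""
    return sum(comb(n_bits, i) for i in range(max_errors + 1)) / 2 ** n_bits

BIT_RATE = 1_000_000  # assumed: 1 Mbit/s basic-rate stream on one channel

for k in range(4):
    p = false_match_prob(10, k)  # 4-bit preamble + 6-bit Barker code
    # every bit offset is a candidate, so p * BIT_RATE spurious matches/s
    print(f"{k} errors: p = {p:.5f}, ~{p * BIT_RATE:,.0f} candidates/s")
```

Even with exact matching, this model gives about 977 spurious candidates per second per channel, and allowing 2 errors raises it to roughly 55,000, which is why the BCH parity and LAP checks are needed to reject them.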

To compare the performance of our suite against the Ubertooth, we devised the following test scenario. In a setting with no known spurious signals, we run all systems concurrently and gather data. Ideally, this should yield no valid packets. We then repeat the experiment with other traffic transmitted on the radio band, by transferring a large file on a WiFi channel overlapping with the observed frequency. Again, this should not give us any results. We can see this in Figure 5.3, where our solution consistently yields the expected result, while the Ubertooth's performance deteriorates rapidly as more slack is allowed in the bit errors observed in the AC. In reality, some packets from devices further away and outwith our control may be present, and during experiments with an overpowering WiFi signal some BT packets can fit into quieter periods (for example, during the CSMA/CA back-off periods).

Beyond the special cases listed above, we also analyse the false positive rates for all the general use cases. In all these cases we expect a small number of devices (in some cases none) to be well represented, and few others. The performance in terms of LAPs reported is shown in Figure 5.4. We can clearly see that our platform is much more successful in filtering out invalid packets. This analysis feeds into the Miss Rate section below.

Figure 5.3: Number of LAPs detected across background monitoring experiments.

Figure 5.4: Number of LAPs detected across all experiments.

Figure 5.5: Number of Inquiry packets detected.

5.6 Miss Rate

For a tracking solution to be usable, it needs to reliably identify the devices present. We claim that our platform provides a more reliable output of real LAPs than the Ubertooth device we are comparing against. Since we do not have complete control of the medium, or of the transmitters at such low levels, we cannot establish a true miss rate for either device. However, we can examine the outputs produced from the same radio input and compare the LAPs detected in each case.

In Figure 5.5 we can see that our proposed solution may be outperformed by the Ubertooth at lower error tolerances. This is likely due to demodulator calibration and more aggressive channel filtering; both can be adjusted easily, and experiments run to find the optimal settings. However, when 3 or more bits of error are allowed in the Access Code, our platform consistently outperforms the Ubertooth. The large standard deviations are due to different experiments having greatly differing numbers of packets that could possibly be detected. For this reason, we also present Figure 5.6, which displays inquiry packets detected during pairing procedures. In this graph the data has been normalised with respect to the common mean of our platform and the Ubertooth for each experiment. We still see consistently better performance with the AC error threshold set to 3 or more.
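The per-experiment normalisation used for Figure 5.6 can be sketched as dividing each platform's count by the mean count of all platforms in that experiment. The dictionary layout, the platform names and the numbers below are illustrative assumptions, not our measured data.

```python
def normalise_by_common_mean(counts):
    """counts: {experiment: {platform: packets_detected}}.

    Divide each platform's count by the mean count over all platforms for
    that experiment, so experiments with very different traffic volumes
    share one scale (1.0 = the experiment's common mean)."""
    normalised = {}
    for experiment, per_platform in counts.items():
        mean = sum(per_platform.values()) / len(per_platform)
        normalised[experiment] = {
            platform: (count / mean if mean else 0.0)
            for platform, count in per_platform.items()
        }
    return normalised

# toy numbers for illustration only, not measured results
example = {"pairing_a": {"ours": 30, "ubertooth": 10},
           "pairing_b": {"ours": 300, "ubertooth": 200}}
print(normalise_by_common_mean(example))
```

Both toy experiments end up on the same scale (1.5 vs 0.5 and 1.2 vs 0.8) even though their raw counts differ by an order of magnitude, which is what removes the large standard deviations.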

Figure 5.6: Number of Inquiry packets detected during pairing (normalised).

Increasing the error tolerance of the Ubertooth can decrease its perceived miss rate to levels similar to our proposed interface; however, this generates an excessive number of falsely identified devices. To evaluate the performance of the two platforms in this light, we observe the ratio between known LAPs and unknown LAPs reported by each platform - let us call this the Confidence Index. Again, there were no known external devices actively transmitting in the area, so unknown LAPs can be assumed to be false positives. Because overwhelming WiFi traffic is treated as a possible Bluetooth transmission, all results in Figure 5.7 are slightly worse than in Figure 5.8, where we have excluded the background noise and overlapping WiFi experiments.
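One way to compute such an index from a platform's output is sketched below. It assumes the bounded form known / (known + unknown) over distinct LAPs, since a raw known:unknown ratio would be undefined whenever there are no false positives; both the bounded form and the use of distinct LAPs are our assumptions for illustration.

```python
def confidence_index(reported_laps, known_laps):
    """Share of distinct reported LAPs that belong to known devices.

    Assumes the bounded form known / (known + unknown); a plain
    known:unknown ratio would be undefined with zero false positives."""
    reported = set(reported_laps)
    if not reported:
        return None  # nothing reported, index undefined
    known = len(reported & set(known_laps))
    return known / len(reported)
```

A platform reporting four distinct LAPs, two of which belong to our test devices, scores 0.5; one reporting only known LAPs scores 1.0.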

Lastly, we would like to point out that by applying our internal confidence model, we can reach near-perfect performance, as shown in Figure 5.9, without increasing our miss rate.

Figure 5.10 compares our suggested configuration for real-time analysis (2 bits of error allowed in the LAP, and 2 bits in the Preamble and Barker code) against a significantly slower, more in-depth analysis allowing 3 bits of error in the LAP and 3 bits of error across the Preamble and Barker code. For the latter, the performance cost is not justified in the general case given the small increase in accuracy. However, it demonstrates that we can set virtually arbitrary error limits when processing the data, which keeps our solution relevant for future hardware. We also show that our heuristic does not have any detrimental effects in our experimental settings. We do not claim that it can never reduce the number of devices identified, for example when a device is far away and its signal is weak. However, being able to process the input in parallel gives us a sufficient advantage to hide such possible limitations. Since our heuristic can only discard packets and never adds new ones, it cannot increase the number of false positives.

Figure 5.7: Confidence index across all experiments.

5.7 Detection/Discovery Time

In a real-life tracking scenario, sub-second differences in identification time are likely to be irrelevant. However, being able to capture, and ideally process, the data in real time is crucial: a device may only be in range for a short while when a user walks past at a greater distance. Thus, any outage in recordings caused by processing delays is unacceptable.

Since both tools are capable of running in real time, a shorter time to identification becomes synonymous with a lower miss rate: the fewer packets we miss, the sooner we are likely to see one that we can use to identify a new device. This is because the first detection of a device cannot occur before a packet linked to it is seen. Similarly, for devices further away and with a weaker signal, discovery time is directly related to when we happen to successfully capture and decode a relevant packet.

Figure 5.8: Confidence index excluding WiFi and Background noise experiments.

Figure 5.9: Confidence index across all experiments, using our solution's internal confidence measure.

Figure 5.10: Fast and Extended Processing with and without Internal Confidence Model (ICM) for detecting known devices during pairing (normalised).
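The link between miss rate and time to identification can be made concrete with a simple model. Assume, purely for illustration, that a device's packets reach our monitored channels as a Poisson process with rate λ and that each is missed independently with probability m. The decoded packets then form a Poisson process with rate λ(1 - m), so the first detection arrives after 1/(λ(1 - m)) seconds on average.

```python
import math

def mean_time_to_detection(packet_rate_hz, miss_rate):
    """Mean wait for the first decoded packet, assuming Poisson arrivals and
    independent misses with probability miss_rate."""
    return 1.0 / (packet_rate_hz * (1.0 - miss_rate))

def prob_detected_within(t_s, packet_rate_hz, miss_rate):
    """Probability that at least one packet is decoded within t_s seconds."""
    return 1.0 - math.exp(-packet_rate_hz * (1.0 - miss_rate) * t_s)
```

With a hypothetical 2 packets per second and a miss rate of 0.5, the mean wait is 1 s and the chance of detecting a passer-by within 3 s is 1 - e^-3, about 95%. Lowering the miss rate to 0.25 shortens the mean wait to about 0.67 s.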

The benefit of our platform is that, with high-quality radio interfaces, we are able to amplify the input more without introducing excessive noise. Further, since the parameters for noise removal (for example, squelching) are easy to adjust, we can effectively tune our receiver to any configuration. A more flexible pipeline, along with the freedom to use any radio hardware, gives us a clear advantage over the Ubertooth device.
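As an illustration of the kind of tunable noise removal meant here, a basic power squelch passes complex samples only while their short-term average power exceeds a threshold. The sketch below is a minimal pure-Python version with an assumed moving-average window and a threshold in dB relative to unit power; it is not our pipeline's squelch block.

```python
import math
from collections import deque

def power_squelch(samples, threshold_db, window=16):
    """Zero complex samples while the moving-average power over the last
    `window` samples is below threshold_db (dB relative to unit power)."""
    recent = deque(maxlen=window)
    out = []
    for s in samples:
        recent.append(abs(s) ** 2)
        avg = sum(recent) / len(recent)
        level_db = 10 * math.log10(avg) if avg > 0 else -math.inf
        out.append(s if level_db >= threshold_db else 0j)
    return out
```

Both `threshold_db` and `window` are exactly the kind of parameters that can be adjusted per receiver and environment.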

5.8 Scalability

The evaluation of scalability will be a discussion outlining the properties and limitations of each stage of our pipeline.

Both the design and the implementation of the project have been architected with parallelisation in mind. Nonetheless, there are several important bottlenecks to consider.

Radio Interface Limitations
Any hardware will have its limitations, and this also applies to the SDR platforms we are using. With an extended budget it is possible to adopt more advanced peripherals and thus overcome these limitations. With the current USRP B210 boards, we are able to receive the whole BT spectrum bandwidth with 2 devices.

Radio-to-Host Link
The link between the radio peripheral and the host computer is the next variable in the system. In our case, we are using a USB 3 bus to communicate between the host machine and the radio peripherals. With a maximum data bandwidth of 640 MB/s, this is exactly enough for capturing 80 Msps of complex samples (2 × 32 bits each). Pushing the bus to its limits will likely cause congestion, but a simple solution is to use a separate host machine for each radio peripheral. Alternatively, we could opt for SDR platforms capable of more on-board processing, thus reducing the amount of data that must be sent to the host machine.

Host Computer Performance
As is apparent from the solutions required for our experiments, storing even lower sample rate recordings can be challenging. This problem is amplified by increasing the bandwidth, and thus the required sample rate, of our recordings. It can be mitigated by using a host computer with extremely fast storage, or a fast data link. For processing the data, we have needed roughly one core per channel - this poses a limitation for conventional hardware, where more than 16 virtual cores are uncommon. However, the processing could be further streamlined and then ported to a GPU, which we expect would provide sufficient performance (see section 6.3).
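The figures above can be reproduced with simple arithmetic. The sketch assumes 8 bytes per complex sample and, following the rough observation above, one CPU core per channel. The 79 channels are those of BT Classic; the 16-core host is the conventional upper bound mentioned in the text.

```python
import math

BYTES_PER_SAMPLE = 8  # one complex sample as 2 x 32-bit floats

def link_rate_mb_s(sample_rate_sps, bytes_per_sample=BYTES_PER_SAMPLE):
    """Raw sample stream the radio-to-host link must carry, in MB/s."""
    return sample_rate_sps * bytes_per_sample / 1e6

def hosts_needed(channels, cores_per_host, cores_per_channel=1.0):
    """Hosts required if each channel needs cores_per_channel CPU cores."""
    return math.ceil(channels * cores_per_channel / cores_per_host)

print(link_rate_mb_s(80e6))   # 640.0, the full 79 MHz band at 80 Msps
print(hosts_needed(79, 16))   # 5 hosts with 16 virtual cores each
```

Under these assumptions a full-spectrum receiver needs about five conventional hosts, which is why we point to GPU offloading as the more practical route.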

Thus, our solution can be extended relatively easily to a full spectrum receiver while using conventional, affordable and accessible hardware. As an alternative, we propose designing a hardware interface to receive and demodulate all channels in parallel, which would then allow us to perform the bitstream processing to an arbitrary error threshold on the host computer. This would increase performance greatly by employing dedicated hardware modules, while keeping the flexibility of our software-defined implementation.

Chapter 6

Conclusion

We have provided a parallel pipeline for monitoring a number of Bluetooth channels and rapidly identifying the devices present. Our platform is specified in terms of overall design as well as our current implementation. This enables anyone with access to a 2.4 GHz band capable Software Defined Radio frontend to set up a similar test-bed to ours and improve the platform and verify its performance.

We have evaluated our tool against the Ubertooth One [15], a popular device for Bluetooth testing. We have shown slightly increased performance for packet detection, and a significantly reduced false positive rate. Furthermore, we have experimented with a more computationally complex analysis performing greater error recovery. Such configurations are currently not suitable for real-time monitoring, but may be feasible on future hardware or when employing accelerators.

6.1 Tracking People Based on Partial Device Addresses

The main goal of the project is to quickly and reliably detect the Lower Address Parts (LAPs) of the devices communicating. This gives us 24 bits of the full 48 bits of the device address.

The Lower Address Part (LAP) can take only 2^24 (about 16.8 million) distinct values. As with the Birthday Paradox¹, a collision becomes disproportionately likely even with a relatively small sample size (around 11% with just under 2,000 devices sampled). While sufficient for small-scale device identification, the risk of collisions increases greatly with the number of devices observed. We have used Figure 6.1 to populate Table 6.1, which shows that recovering half of the 8 bits of the UAP (i.e. reducing its remaining entropy to 4 bits) would greatly improve the reliability of people tracking. This demonstrates the importance of our current efforts to reduce the entropy of UAPs and recover more of the device address.

¹https://betterexplained.com/articles/understanding-the-birthday-paradox/

Method          Device Pool   Sample   P[no_collisions]
LAP             2^24          100      0.9997
LAP             2^24          1,000    0.9707
LAP             2^24          10,000   0.0508
LAP + 0.5 UAP   2^28          100      1.0000
LAP + 0.5 UAP   2^28          1,000    0.9981
LAP + 0.5 UAP   2^28          10,000   0.8301
LAP + UAP       2^32          100      1.0000
LAP + UAP       2^32          1,000    0.9999
LAP + UAP       2^32          10,000   0.9884

Table 6.1: Effect of UAP entropy on collisions during surveillance.

```python
from functools import reduce
from operator import mul

# pool for LAPs is 2**24
def no_collision(pool, sample):
    # P[all distinct] = product of a / pool for a in (pool - sample, pool]
    return reduce(mul, (a / pool for a in range(pool - sample + 1, pool + 1)), 1.0)
```

Figure 6.1: Calculating probability of no collision given a pool size and a sample size.
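The exact product in Figure 6.1 can be cross-checked with the standard birthday approximation P[no collision] ≈ exp(-n(n-1)/(2N)) for n devices drawn from a pool of N values. The sketch below uses only that textbook approximation. For the pool and sample sizes in Table 6.1, it matches the tabulated values to the four decimal places shown, and it gives the roughly 11% collision chance quoted for about 2,000 devices.

```python
import math

def no_collision_approx(pool, sample):
    """Birthday approximation: P[no collision] ~= exp(-n(n-1) / (2N))."""
    return math.exp(-sample * (sample - 1) / (2 * pool))

print(round(1 - no_collision_approx(2**24, 2000), 3))   # 0.112
print(round(no_collision_approx(2**24, 10_000), 4))     # 0.0508
```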

6.2 Bluetooth Low Energy (Bluetooth Smart)

Bluetooth Low Energy (BLE) is a newer release of the Bluetooth protocol (part of the specification since version 4.0) for very low power devices (see [38] for an overview of BLE characteristics). It is not backwards compatible, nor does it aim to replace the original protocol. It uses 40 channels, each 2 MHz wide, as opposed to the 79 × 1 MHz configuration of BT Classic. The focus on reducing power consumption has simplified the protocol significantly, and in doing so removed some of the security and privacy provisions present in Bluetooth Classic [39]. Devices send messages containing their address on 3 advertising channels to report having data to transmit. Following this, in coordination with the master device, they transmit the data on the remaining channels.
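The two channel plans overlap in the same 2.4 GHz band, which matters for any receiver that monitors both. The sketch below encodes the channel-to-frequency mapping from the Bluetooth Core Specification. BR/EDR channel k sits at 2402 + k MHz. BLE advertising channels 37, 38 and 39 sit at 2402, 2426 and 2480 MHz, and data channels 0 to 36 fill the remaining 2 MHz slots. The function names are ours.

```python
def classic_channel_mhz(k):
    """Centre frequency of BR/EDR channel k (0-78): 2402 + k MHz."""
    if not 0 <= k <= 78:
        raise ValueError("BR/EDR channel must be 0-78")
    return 2402 + k

def ble_channel_mhz(index):
    """Centre frequency of BLE channel index 0-39 (0-36 data, 37-39 advertising)."""
    if not 0 <= index <= 39:
        raise ValueError("BLE channel index must be 0-39")
    advertising = {37: 2402, 38: 2426, 39: 2480}
    if index in advertising:
        return advertising[index]
    # data channels occupy the 2 MHz RF slots not used for advertising
    # (RF slot r is centred on 2402 + 2r MHz; slots 0, 12 and 39 are taken)
    rf_slot = index + 1 if index <= 10 else index + 2
    return 2402 + 2 * rf_slot

# BR/EDR channels sharing a centre frequency with each advertising channel
print({i: ble_channel_mhz(i) - 2402 for i in (37, 38, 39)})  # {37: 0, 38: 24, 39: 78}
```

Each BLE advertising channel therefore shares its centre frequency with a BR/EDR channel, which is where concurrent processing of overlapping channels could be explored.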

BLE provides an optional mechanism for regularly changing the advertised device addresses [40]; however, this is not widely used [41], and can be reset in data packets. The same paper [41] proposed a protection mechanism in which an additional device jams unauthorised and unnecessary communications to and from a user's devices. Their proposed protection method could theoretically be circumvented using advanced signal processing and correlation with a Multiple-Input-Multiple-Output (MIMO) radio interface.

Thus, BLE does offer some optional protection, but in general is susceptible to the approach we propose, as manufacturers often do not take advantage of the new provisions.

6.3 Future work

Throughout this report we have mentioned a number of possible improvements and avenues for future efforts. Firstly, we could integrate our improvements into gr-bluetooth, and thus contribute towards maintaining one joint open source repository. Alternatively, we could polish our implementation and release it as open source as an individual GNU Radio module.

For improving the performance of our platform, we shall consider the following.

Feasibility of Large Scale Surveillance
Most importantly, a follow-up to this work would be an in-depth analysis of the data obtainable in a realistic tracking setting. This would require a deployment specification for the platform, and large scale experiments identifying devices. From the data we would be able to draw conclusions about the location and social relations of the people observed. Furthermore, we would be able to obtain an overview of how susceptible various devices are to such exploration. Since tracking people has clear ethical implications, even in an academic research setting rather than for profit, such experiments would need to be carefully designed in order to avoid unintentional infringement of the subjects' privacy.

Developing a Complete SDR Bluetooth Transceiver
The current platform can serve well as a starting point for building a complete Software Defined Radio transceiver for Bluetooth. Such a tool would be useful for obtaining access to every layer of the communication protocol stack. Thus, we would be able to carry out extensive testing of devices to explore the security and privacy of Bluetooth in depth. At the same time, we would create a teaching tool for Bluetooth.

Bluetooth Low Energy Support
Our currently proposed solution supports exploring only Bluetooth Classic communications. Many small wearable devices use BLE nowadays, so being able to identify BLE devices as well would significantly increase the number of potential targets. Since the BLE protocol stack differs greatly from that of Bluetooth Classic, a large portion of our receiving stack would have to be adjusted. However, there may be opportunities to improve overall performance by processing overlapping BLE and BT channels concurrently.

Processing Pipeline Improvements
To reduce the required computation, our proposed pipeline could be improved with signalling paths that notify blocks of the presence of valid data. For example, the demodulator could ignore the high noise levels created by WiFi packets if we detect such situations earlier and propagate the metadata. Similarly, we could improve our current pipeline by dynamically adjusting the thresholds of processing blocks, for example by implementing dynamic noise levels for squelching.

Hardware Accelerated Processing
Since Digital Signal Processing is a very repetitive procedure, many solutions exist for accelerating it. We could use Field Programmable Gate Arrays (FPGAs) to perform the processing, as we would be able to reproduce the hardware receivers. Alternatively, we could use Graphics Processing Units (GPUs) - processors with hundreds of simple cores. This would enable us to segregate the work further and create separate hardware pipelines within the GPU. As a result, we could benefit from significantly increased performance while still using conventional and affordable hardware.

Parallel Pipelines in Hardware
The Ubertooth device provides a good platform for single-channel monitoring. However, for wide-band surveillance, we would need one device per channel.

This approach would be infeasible, but it leads us to imagine an integrated hardware solution with dedicated pipelines for each channel or small group of channels. Such a solution would not require excessive amounts of power or be prohibitively expensive, as Bluetooth receiver modules are designed to be power-efficient and affordable. With this approach, we would sacrifice the flexibility of SDR in exchange for reduced computational requirements on the host computer and a streamlined architecture. Furthermore, we would be able to work with a more modular system in which the signal processing runs completely independently of the binary data processing, thus providing higher levels of abstraction.

Evaluation
The main limitation of our evaluation is the lack of absolute ground truth. Since there is always some interference present and we cannot control the experimental environment completely, we can only perform experiments assessing different solutions concurrently against one another. To overcome these limitations, we would need to perform the experiments in an electromagnetically shielded room and have absolute control over each packet sent. Additionally, we suggest exploring the differences in output between radio interfaces. It is difficult to establish the true set of packets transmitted. As a solution, we propose using a suite of professional tools (such as the ones depicted in Figure 2.8) to measure the packets present. Furthermore, we propose exploring the options for modifying Bluetooth interface drivers to enable bit-accurate logging without delays incurred in transmission.
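The dynamic squelch suggested under Processing Pipeline Improvements could, for instance, track a noise-floor estimate instead of using a fixed threshold. The class below is a hypothetical sketch and not part of our pipeline. It updates an exponential moving average of power only while the channel looks idle, and opens when the input exceeds that floor by a margin. The margin, smoothing factor and initial floor are illustrative assumptions.

```python
class AdaptiveSquelch:
    """Hypothetical adaptive squelch: tracks the noise floor with an
    exponential moving average updated only while the squelch is closed,
    and opens when power exceeds the floor by margin_db."""

    def __init__(self, margin_db=10.0, alpha=0.01, initial_floor=1e-6):
        self.margin = 10 ** (margin_db / 10)  # dB margin as a power ratio
        self.alpha = alpha                    # smoothing factor of the EMA
        self.floor = initial_floor            # current noise-floor estimate

    def process(self, power):
        """Return True if a sample of this power should pass the squelch."""
        is_open = power > self.floor * self.margin
        if not is_open:
            # only learn the floor from samples judged to be noise
            self.floor += self.alpha * (power - self.floor)
        return is_open
```

Because the floor adapts, the same configuration could follow a quiet room and a busy WiFi environment without manual retuning. A real implementation would still need safeguards, for example against a sustained transmission being slowly absorbed into the floor.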

6.4 Final Remarks

This project has been rather challenging, but also fun and enticing. The world of signal processing is scary to delve into, and specifications several thousand pages long [17] are not far behind. Overall, this work has made me realise my liking for research projects, and played a big role in my decision to apply for a research degree. This, in turn, gives me the opportunity to continue work on some of the elements proposed in section 6.3.

Bibliography

[1] M. M. Werdegar, "Lost? The Government Knows Where You Are: Cellular Telephone Call Location Technology and the Expectation of Privacy," Stanford Law & Policy Review, vol. 10, p. 103, 1998.
[2] K. Michael and R. Clarke, "Location and tracking of mobile devices: Überveillance stalks the streets," Computer Law and Security Review, vol. 29, no. 3, pp. 216–228, 2013.
[3] M. Cunche, M. A. Kaafar, and R. Boreli, "I know who you will meet this evening! Linking wireless devices using Wi-Fi probe requests," 2012 IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks, WoWMoM 2012 - Digital Proceedings, 2012.
[4] R. Faragher and R. Harle, "An Analysis of the Accuracy of Bluetooth Low Energy for Indoor Positioning Applications," Proceedings of the 27th International Technical Meeting of The Satellite Division of the Institute of Navigation (ION GNSS+ 2014), pp. 201–210, 2014.
[5] B. Balaji, J. Xu, A. Nwokafor, R. Gupta, and Y. Agarwal, "Sentinel: occupancy based HVAC actuation using existing WiFi infrastructure within commercial buildings," Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems, p. 17, 2013.
[6] S. Hay and R. Harle, "Bluetooth tracking without discoverability," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5561 LNCS, pp. 120–137, 2009.
[7] V. Kostakos, "Using Bluetooth to capture passenger trips on public transport buses," Personal and Ubiquitous Computing, pp. 1–13, 2008.
[8] L. Schauer, M. Werner, and P. Marcus, "Estimating Crowd Densities and Pedestrian Flows Using Wi-Fi and Bluetooth," Proceedings of the 11th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, pp. 171–177, 2014.
[9] How to use a BLE sniffer without pulling your hair out! [Online]. Available: http://www.novelbits.io/how-to-use-a-bluetooth-low-energy-sniffer/.
[10] D. Kelly, R. Behan, R. Villing, and S. McLoone, "Computationally tractable location estimation on WiFi enabled mobile phones," IET Irish Signals and Systems Conference (ISSC 2009), pp. 30–30, 2009.
[11] S. Woo, S. Jeong, E. Mok, L. Xia, C. Choi, M. Pyeon, and J. Heo, "Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR," Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.
[12] F. Meneses and A. Moreira, "Large scale movement analysis from WiFi based location data," International Conference on Indoor Positioning and Indoor Navigation, no. November, 2012.
[13] D. Spill, "Final Report: Implementation of the Bluetooth stack for software defined radio, with a view to sniffing and injecting packets," PhD thesis, University College London, 2007.
[14] D. Spill and A. Bittau, "BlueSniff: Eve meets Alice and Bluetooth," WOOT '07 Proceedings of the first USENIX workshop on Offensive Technologies, p. 10, 2007.
[15] D. Spill, "Bluetooth Packet Sniffing Using Project Ubertooth," [Online]. Available: https://dominicspill.com/ruxcon/Spill-Ubertooth.pdf.
[16] Bluetooth Special Interest Group, "Bluetooth SIG 2014 Annual Report," Tech. Rep., 2014. [Online]. Available: https://www.bluetooth.org/en-us/Documents/Annual_Report_2014.pdf.
[17] Bluetooth Special Interest Group, Specification of the Bluetooth System, 2014.
[18] J. Schiller, Mobile Communications. 2003, p. 520.
[19] IEEE Computer Society, IEEE Standard for Local and Metropolitan Area Networks: Overview and Architecture. 2014, vol. 2014, p. 56.
[20] IEEE, IEEE Organisationally Unique Identifiers. [Online]. Available: http://standards-oui.ieee.org/oui.txt (visited on 04/05/2017).
[21] M. Dillinger, K. Madani, and N. Alonistioti, Software Defined Radio Architectures, Systems, Functions, 1-6. 2003, vol. 11, pp. 1–454.
[22] S. Sanfilippo, Dump1090. [Online]. Available: https://github.com/antirez/dump1090.
[23] P. Patras, Wearables That Snitch on Us, 2017. [Online]. Available: http://www.edinburgh.bcs.org/events/2017/170403.htm (visited on 04/05/2017).
[24] USRP Software Defined Radio (SDR). [Online]. Available: https://www.ettus.com/product/details/UB210-KIT (visited on 10/26/2016).
[25] M. Lind, "Software Defined Radio Systems," University of Edinburgh, Edinburgh, Tech. Rep., 2015.
[26] J. G. Proakis and M. Salehi, DIGITAL COMMUNICATIONS, FIFTH EDITION, published by McGraw-Hill, 3. 2001, vol. 49, pp. 1727–1737.
[27] R. Schiphorst, F. Hoeksema, and K. Slump, "Bluetooth demodulation algorithms and their performance," 2nd Karlsruhe Workshop on Software Radios, vol. 2, pp. 99–106, 2002.
[28] "Ellisys Bluetooth Explorer 400," 2014. [Online]. Available: http://www.ellisys.com/products/bex400/.
[29] ComProbe Sodera Wide Band Bluetooth Protocol Analyzer. [Online]. Available: http://www.fte.com/products/sodera.aspx (visited on 04/05/2017).
[30] gr-bluetooth - Bluetooth for GNU Radio. [Online]. Available: http://gr-bluetooth.sourceforge.net/.
[31] BlueZ. [Online]. Available: http://www.bluez.org/ (visited on 04/05/2017).
[32] M. Ossmann and D. Spill, Building an All-Channel Bluetooth Monitor.
[33] D. Spill and M. Ossmann, gr-bluetooth - GitHub. [Online]. Available: https://github.com/greatscottgadgets/gr-bluetooth.
[34] NooElec - Ubertooth One. [Online]. Available: http://www.nooelec.com/store/ubertooth-one.html.
[35] R. Schiphorst, F. Hoeksema, and K. Slump, "Channel selection requirements for Bluetooth receivers using a simple demodulation algorithm," pp. 1–8.
[36] Ettus Research, "USRP™ B200/B210 Bus Series," 2015. [Online]. Available: https://www.ettus.com/content/files/b200-b210_spec_sheet.pdf.
[37] S. H. Gerez, "Implementation of Digital Signal Processing: Some Background on GFSK Modulation," [Online]. Available: http://wwwhome.ewi.utwente.nl/~gerezsh/sendfile/sendfile.php/gfsk-intro.pdf?sendfile=gfsk-intro.pdf.
[38] NovelBits, Bluetooth Low Energy Cheat Sheet. [Online]. Available: http://www.novelbits.io/what-is-ble-and-its-role-in-iot/.
[39] M. Ryan, "Bluetooth: With Low Energy Comes Low Security," Proceedings of the 7th USENIX Conference on Offensive Technologies, p. 4, 2013.
[40] Bluetooth Technology Protecting Your Privacy. [Online]. Available: https://blog.bluetooth.com/bluetooth-technology-protecting-your-privacy (visited on 04/06/2017).
[41] K. Fawaz, K.-H. Kim, and K. G. Shin, "Protecting Privacy of BLE Device Users," 25th USENIX Security Symposium (USENIX Security 16), 2016.

Appendix A

Feedback Day Poster

Figure A.1: Feedback Day Poster
