Hasselt University Transnationale Universiteit Limburg
Assessing and improving security and privacy for smartphone users
Dissertation submitted for the degree of Doctor of Philosophy in Computer Science, at Hasselt University to be defended by
Bram Bonné
on August 31, 2017 Promotor: Prof. Dr. Wim Lamotte Co-promotor: Prof. Dr. Peter Quax 2011 – 2017
Abstract
Smartphone and other mobile device usage has increased greatly in the past years. This increased popularity has also led to a changed security and privacy landscape, with more personal devices being outfitted with a plethora of sensors that allow to track our every step, and a vastly larger attack surface than older, more static devices. This allows a variety of actors, including malicious hackers, state-sponsored entities and legitimate service providers, to have access to a large trove of mobile user data. In this dissertation, we assess the current state of privacy and security on smartphones, we create and gauge awareness of smartphone users around these issues, and we provide so- lutions to enhance security and privacy on mobile devices. To show the ease with which smartphone users’ data can be gathered surreptitiously, we describe a mechanism for invol- untarily tracking visitors at mass events making use of Wi-Fi technology, and show that this can be implemented at a low cost, allowing location tracking of 29% of visitors at a major music festival. We show how these techniques can be (ab)used in different scenarios (notably, by using the gathered data to compare different opportunistic routing algorithms that can be used for ad-hoc communication at mass events), and provide an open platform to researchers that can be used to quantify the impact and remediation rate of similar wireless protocol vulnerabilities. We create awareness about these issues, and we explain to smartphone users how they can secure themselves against them. For this, we provide a method to inform mobile device users when using wireless networks, showing privacy-sensitive (but anonymized) informa- tion about passers-by on a public display. We later use the same setup to inform audiences in talks on security awareness. Results from our user studies also show that specific, personal- ized scenarios may help to better inform users about security and privacy issues (increasing awareness of 76% of the participants in one study), and that the increased awareness leads to as much as 81% of device users willing to put an extra effort into securing their smartphones. Interestingly, we also show in a later study that an increased awareness does not automatically translate to better security practices. Additionally, we perform two studies to measure users’ privacy and security behaviors when using their smartphones. For the first study, we look at how aware users are about connections ii being made by apps on their device, while taking into account the security of both the Wi- Fi networks used and the connections made over these networks. For the second study, we extend the Paco ESM study tool to be able to examine the reasons why Android users install or remove an app at the time this happens, to look at the motivation behind granting or denying a permission right after users make their choice, and to assess how comfortable and aware users are about their decisions at a later point in time. We provide recommendations to different stakeholders (developers, manufacturers, network providers, researchers and mobile device users) on how to improve privacy and security on mobile devices without affecting usability, some of which have already been implemented by operating system manufacturers. Part of these recommendations are implemented as a tool that automatically mitigates Wi-Fi attacks for Android smartphones, which is distributed to the general public. In addition, we formulate a proposal to improve transparency in how user data is shared by service providers to third parties. Acknowledgments
This dissertation is the result of work conducted during the period between September 2011 and June 2017 at Hasselt University. It would have never been possible without the help of some people, whom I would like to thank here. First, many thanks go out to my promotor and co-promotor Wim Lamotte and Peter Quax. They provided me with the necessary guidance throughout the past 6 years, not only with respect to the research that went into completing my PhD, but also through assistance when performing teaching duties. When performing experiments involving deploying and moni- toring large amounts of Wi-Fi scanners, they were also there to help with the hands-on work, soldering together the scanners and climbing structures to suspend them. Without their con- tinued support, this thesis would not have been possible. I also want to thank my PhD jury: Frank Van Reeth, Igor Bilogrevic, Karin Coninx, Lieven Desmet, Marc Gyssens and Tristan Henderson, for reviewing my text, and for their comments that helped make this dissertation. I want to thank the rest of the EDM, and by extension the entire computer science fac- ulty for the great working environment. Specifically, I want to mention Arno Barzan and Pieter Robyns, who have contributed greatly to this research, and helped in keeping me sane throughout this work. They were always there for support and laughs. I also want to thank Kris Luyten, Jo Vermeulen, and Fabian Di Fiore, for respectively encouraging me to go work at Google, for the genuine interest in my work, and for dark industrial techno. Similarly, I want to thank Sai Teja Peddinti, Nina Taft, and the rest of the wonderful people I had a chance to work with at Google for the awesome working environment. They immedi- ately made me feel like part of the team, and supported my work all the way through, even at the expense of other projects they had to manage at the same time. There are some other people who provided great support for individual experiments. First of all, a huge thanks goes out to the Pukkelpop organization, including Chokri Mahassine himself, for allowing us to perform our experiments at their festival, and to The Safe Group for helping out and for staying incredibly nice while we abused their infrastructure. Thank you also to Bob Hagemann and the rest of the WiGLE.net team, for allowing the nearly iv unlimited use of their wardriving database to inform people about the dangers of wireless networks. Lastly, I want to thank my parents and my brother (who is incidentally also my best friend), my girlfriend Tine, all my other amazing friends1 and possibly the best family in the world. They all contributed so much to this thesis without even realizing it, by encouraging me to pull through, and by just being the nicest and friendliest people around.
Thank you all so much,
Bram Bonné
1“Friends” sounds so offhanded. So here we go; at the very least (apart from family and (ex-)colleagues): Alicja, Andrea, Anneleen, Ben, Bert, Brecht, Carl, Cedric, Daniël, Dries, Eef, Hannah, Hanne, Hélène, Ian, Ine, Jasper, Jimmy, Joachim, Joke, Jolien, Julie, Kevin, Koen, Koenraad, Kristof, Lore, Lowie, Maarten, Marijke, Marleen, Michelle, Miet, Nathalie, Nel, Nico, Phung, Rachel, Roald, Ruben, Sam, Sarina, Sebastiaan, Sophie, Stephanie, Thierry, Thijs, Tiberd, Tinne, Tom, Wim, Xavier and Yifei. Contents
Abstract i
Acknowledgments iii
Contents v
1 Introduction and research questions 1
I Security and privacy issues in smartphone connectivity 7
2 WiFiPi: Involuntary Tracking of Visitors at Mass Events 11 2.1 Introduction ...... 13 2.2 Related work ...... 13 2.3 Technical background ...... 15 2.3.1 Scanning for wireless networks ...... 15 2.3.2 Connecting to a network ...... 18 2.4 The WiFiPi detection mechanism ...... 18 2.4.1 Detecting a device ...... 19 2.4.2 Tracking the location of a device ...... 20 2.5 Implementation ...... 21 2.6 Experiments ...... 22 2.6.1 Pukkelpop 2012 ...... 24 2.6.2 University campus ...... 24 2.6.3 Pukkelpop 2013 and WiFiPi 2.0 ...... 25 2.7 Applications ...... 28 2.7.1 Real-time crowd management and marketing ...... 28 2.7.2 Mobility models for simulations ...... 29 2.7.3 Ubiquitous computing ...... 29 2.8 Privacy implications ...... 30 vi CONTENTS
2.9 Conclusion ...... 31
3 A Comparative Simulation of Opportunistic Routing Protocols Using Realistic Mobility Data Obtained From Mass Events 33 3.1 Introduction ...... 35 3.2 Background and Related Work ...... 35 3.3 Mobility ...... 36 3.3.1 Collecting Data ...... 37 3.3.2 Calculating movement paths ...... 37 3.4 Simulations ...... 38 3.4.1 Simulation Settings ...... 39 3.4.2 Routing Protocols ...... 41 3.5 Results and analysis ...... 42 3.5.1 Metrics ...... 42 3.5.2 Individual Protocol Examination ...... 44 3.5.3 Comparison ...... 45 3.5.4 Candidate Selection ...... 46 3.6 Conclusion ...... 48
4 Assessing the Impact of 802.11 Vulnerabilities using Wicability 49 4.1 Introduction ...... 51 4.2 Capability aggregation ...... 52 4.2.1 Acquisition ...... 52 4.2.2 Matching and processing ...... 53 4.2.3 Presentation ...... 54 4.3 Case study: prevalence of devices susceptible to active probing attacks . . . . 54 4.4 Datasets ...... 56 4.5 Conclusion ...... 56
II Creating and assessing user awareness 61
5 Raising Awareness on Ubiquitous Privacy Issues with SASQUATCH 65 5.1 Introduction ...... 67 5.2 Mobile Phones that “Never Forget” ...... 68 5.3 The SASQUATCH System ...... 68 5.3.1 Inferring a smartphone’s whereabouts ...... 69 5.3.2 Determining a network’s authentication type ...... 70 5.4 Study ...... 71 5.5 Results ...... 74 5.6 System analysis and limitations ...... 77 CONTENTS vii
5.7 Conclusion ...... 79
6 Understanding Wi-Fi Privacy Assumptions of Mobile Device Users 81 6.1 Introduction ...... 83 6.2 Related work ...... 84 6.3 Methodology ...... 86 6.3.1 Connection monitoring ...... 86 6.3.2 Exit survey ...... 88 6.4 Participants ...... 90 6.5 Results ...... 92 6.6 Discussion ...... 96 6.7 Conclusion ...... 99
7 Exploring privacy-sensitive decision making in Android using in-context surveys101 7.1 Introduction ...... 103 7.2 Related Work ...... 105 7.3 Methodology ...... 107 7.3.1 Designing the Surveys ...... 107 7.3.2 Recruitment and Incentives ...... 110 7.3.3 Ethical Considerations ...... 110 7.3.4 Limitations ...... 110 7.4 Technical implementation ...... 112 7.4.1 App Installation and Removal Triggers ...... 112 7.4.2 Permission Change Triggers ...... 113 7.4.3 Generating and Surfacing Surveys ...... 113 7.5 App Decisions ...... 115 7.5.1 Data Summary ...... 115 7.5.2 App Installs ...... 116 7.5.3 App Removals ...... 118 7.6 Permission Decisions ...... 120 7.6.1 Permission denials ...... 120 7.6.2 Permission Grants ...... 125 7.6.3 Other influences ...... 128 7.7 Conclusion ...... 128
III Towards improving security and privacy for mobile devices and users 133
8 Technical solutions to smartphone privacy and security issues 137 8.1 Introduction ...... 139 viii CONTENTS
8.2 Recommendations on smartphone security ...... 139 8.2.1 What the smartphone user can do ...... 139 8.2.2 What a developer / manufacturer can do ...... 141 8.2.3 What network providers and ISPs can do ...... 143 8.2.4 What security and privacy researchers can do ...... 143 8.3 Automatically solving privacy issues with Wi-Fi PrivacyPolice ...... 144 8.3.1 Preventing network leakage ...... 145 8.3.2 Preventing evil twin attacks ...... 145 8.3.3 From proof-of-concept to consumer app ...... 146 8.4 Comparison to other mitigation strategies ...... 147 8.5 Conclusion ...... 148
9 The Privacy API: Facilitating Insights In How One’s Own User Data Is Shared 149 9.1 Introduction ...... 151 9.2 Definitions ...... 152 9.3 The API in practice ...... 153 9.4 Methods of enforcement ...... 154 9.5 Considerations and limitations ...... 156 9.6 Related work ...... 156 9.7 Discussion ...... 157 9.8 Conclusion ...... 158
10 A look at the future: the Internet of Things 159
11 Conclusion and Future Work 165 11.1 Conclusion ...... 167 11.2 Future work ...... 171
Appendices 173
A Nederlandse Samenvatting (Dutch Summary) 175
B Survey questions for the study on Wi-Fi privacy 177 B.1 Recruitment survey questions ...... 178 B.2 Exit survey questions ...... 178 B.2.1 Introductory questions ...... 179 B.2.2 Network questions ...... 179 B.2.3 General questions ...... 180 B.2.4 Connection awareness questions ...... 181 B.2.5 Feedback question ...... 181 CONTENTS ix
C Survey questions for the study on Android permissions 183 C.1 In-situ questions ...... 184 C.1.1 App installation scenario ...... 184 C.1.2 App removal scenario ...... 185 C.1.3 Permission grant scenario ...... 185 C.1.4 Permission deny scenario ...... 186 C.2 Exit Survey ...... 186
D Scientific Contributions and Publications 189
Bibliography 209 x CONTENTS Chapter 1
Introduction and research questions
Since the earliest computers, computing devices have become smaller and more personal over time. The earliest commercial general purpose computers, such as the Ferranti Mark 1, the UNIVAC and the IBM 650, measured somewhere in the order of 10 m3. The invention of the transistor – replacing large vacuum tubes in the earlier computers – caused computers to be much smaller in size, an effect that was reinforced by the advent of integrated circuits, which allowed for a personal computing device in every household. The System on a Chip (SoC) is the latest step in consolidating even more of the circuitry into a single chip, supporting the development of ever smaller and more mobile computing devices such as smartphones and smartwatches. Smartphone and other mobile device usage has increased greatly in the past years: even dur- ing the six years it took to complete the research reflected in this dissertation, the device usage landscape has changed significantly. Data from Eurostat shows that over the past 5 years, the number of individuals in the EU aged 16 to 74 that are using a mobile phone to access the internet has increased from only 19% in 2011 to 56% in 2016 [Eurostat, 2016]. This trend is even more noticeable in emerging economies: in just two years, smartphone ownership rates have increased by more than 25% (with an increase of 42% in Turkey alone) [Poushter, 2016].
The rise in popularity of mobile devices has also led to a changed security and privacy land- scape. Every step of the miniaturization of computers has caused them to become more personal. Indeed, the earliest computers were largely tied to a company and used by different employees of the same corporation; personal computers were tied to a household and used by different members of the same family; smartphones, finally, are almost always tied to a single person. As smartphones are considered to be at least as private by their owners as 2 Introduction and research questions desktop computers [Urban et al., 2012], they contain a trove of personal data, with access to even more data through apps that are connected to cloud services and persistent logins to websites in the device’s web browser. Adding to this is the fact that smartphones contain a wealth of sensors (including, among others, an accelerometer, a GPS chip, a Wi-Fi chip and a light sensor), and that they are always carried around by their owners, something that has earned them the nickname of ‘ideal tracking device’. Not only do these devices contain more data, this data is also more easily available to (both benign and malicious) third parties. First, smartphones are always-on, always-connected devices, which increases the number of channels and time over which their privacy-sensitive data is available. Furthermore, because of their form factor and mobility, these devices can get lost or stolen more easily. Because of these reasons, smartphones are deemed to have a large attack surface, consisting of the number of ways a third-party malicious actor is able to attempt to access the data. Not only malicious actors are attempting to gather privacy-sensitive data of smartphone users. Increasingly, legitimate entities are gathering this information for various purposes. One example is state-sponsored actors, of which we know from e.g. the Snowden leaks and WikiLeaks’ CIA files that they actively use data-gathering techniques to collect infor- mation from mobile devices. Another common example of non-malicious actors gathering privacy-sensitive data are internet services that, instead of (or in addition to) asking for a direct payment, monetize their users’ data as their main source of revenue. This includes not only social networks; non-free service providers such as mobile operators or hardware manufacturers have also been using these methods as a way to increase their revenue. [Lee, 2016; Troianovski, 2013]. All of these services have access to (part of) this large trove of mobile data, either because their app or website requires it as part of providing their service, or because they control (part of) the connection the smartphone has with the internet. In some instances, these third parties do not even require any cooperation from the user. Indeed, techniques that will be described throughout this text are already being used by commercial entities to gather data on their customers. Despite the predominant public opinion that this type of data collection is only possible for large organizations with world-wide networks, both legitimate and illicit data gathering can be done with a fairly simple setup using off-the-shelf wireless hardware. Furthermore, the rise in popularity of mobile devices has also caused Wi-Fi networks to become more prevalent, often being offered by either commercial or public entities, or by internet service providers (ISPs) as a service to their customers. Indeed, as we will show in Chapter 6, users connect to an average of 8 Wi-Fi networks in a 30-day period. Connecting to the internet using one of these Wi-Fi networks entails some form of trust: the connections themselves – and, in case these connections happen unencrypted, their data – can be mon- itored by the provider of the network. Moreover, networks provided by these entities often lack any form of security, allowing not only the network provider, but also other third parties within range of the network to eavesdrop on communications. 3
The aim of this work is to assess the current state of privacy and security on smartphones, to create and gauge awareness of smartphone users around these issues, and to provide so- lutions to enhance security and privacy on mobile devices. We start from the main question “Are mobile device users aware about the security and privacy issues inherent in using their devices?”, and deduce the following research questions from this:
RQ1 What privacy and security issues exist in using a smartphone on wireless networks? How can these issues be (ab)used by a third party?
RQ2 How prevalent are the aforementioned issues in today’s devices?
RQ3 Are mobile device users aware about personal data transmitted by their devices?
RQ4 Does privacy sensitiveness influence the behavior of mobile device users?
RQ5 How do mobile device users make decisions regarding privacy sensitive data?
RQ6 Can we inform non-technical mobile device users about the aforementioned risks? Does the increased awareness cause better security habits?
RQ7 How can we improve privacy and security for mobile device users?
In the first part of this thesis, we describe the privacy and security issues that result from the shift to mobile device usage, aiming to tackle research question RQ1. We will see that on one hand, there is a major impact on the security of these devices: while attacks on Wi-Fi networks and their users have existed for a long time, they have become more prevalent as devices are more mobile, and the attack surface (in the form of Wi-Fi networks) has increased in size. On the other hand, privacy of mobile users might be impacted even more, as the shift to mobile devices has opened a whole new range of techniques for gathering privacy-sensitive information. As we will see, mobile devices allow the location of their users to be tracked surreptitiously by third parties, and allow these third parties to infer personally identifiable data from the user by monitoring only their signals. We show how these techniques can be (ab)used in different scenarios, and provide a way for observing how prevalent security issues are in today’s devices (covering research question RQ2). The second part provides an assessment of how aware mobile device users are about the issues discussed in the first part, tackling research questions RQ3 to RQ6, while in the meantime continuing our assessment of the prevalence of security issues in modern devices (research question RQ2). This is done in three different studies. The first two studies include presenting participants with their own privacy-sensitive information that we were able to gather. These studies provide an image of how aware users are about the ability for third parties to gather privacy-sensitive information from their devices (and the consequences thereof), and of how concerned users are about this information being available. At the same time, our studies aim to increase awareness among smartphone users by informing them about existing issues. We 4 Introduction and research questions will see that, whereas these issues are very prevalent due to the way mobile devices are used nowadays, users are often unaware. While we will demonstrate techniques that are highly effective in increasing this awareness, we will also show that, interestingly, an increased awareness does not automatically translate to better security practices. We end the second part with a third study, exploring how users make privacy sensitive decisions when using their devices, by employing an experience sampling methodology in order to ask users the reasons influencing their decisions immediately after they decide. In the third part of this thesis, we provide a variety of different solutions to the problems presented in the second chapter, from a variety of contexts. These solutions are presented as suggestions to changes in technology, habits and legislature, and in the form of actual implementations. This addresses our final research question RQ7. Our main contributions are as follows:
C1 We describe a mechanism for involuntary tracking of visitors at mass events which makes use of Wi-Fi technology, and show how this can be implemented at a low cost (related to RQ1).
C2 We compare and select different opportunistic routing algorithms that can be used for ad-hoc communication at mass events, based on simulations with real mobility data. We were helped for performing the actual simulations by Arno Barzan.
C3 We provide an open platform to researchers that can be used to quantify the impact and remediation rate of wireless protocol vulnerabilities (related to RQ2). Pieter Robyns helped by extracting the information from various datasets.
C4 We provide a method to inform mobile device users about privacy and security issues when using wireless networks, and use this method to successfully create user aware- ness, explaining how to secure against these threats in the process (related to RQ1 and RQ6).
C5 We provide statistics on mobile users’ privacy and security practices on wireless net- works, in the form of data about Wi-Fi usage and connections made by apps (related to RQ2).
C6 We assess mobile users’ awareness about insecure connections being made by apps on their devices over insecure wireless networks, together with the users’ level of comfort about these connections (related to RQ3). We analyze the impact of expertise in com- puter networks and the usage of privacy enhancing technologies on this awareness and RQ4). Gustavo Rovelo Ruiz generated statistics from the survey dataset.
C7 We extend the Paco ESM study tool to be able to query users about the reasons behind privacy-related decisions they make on their Android device, and make these modifi- cations available to the broader research community. 5
C8 We conduct the first study that examines the reasons why Android users install or remove an app at the time this happens, and the motivation behind granting or denying a permission right after users make their choice (related to RQ5). We also assess how comfortable and aware users are about their decisions at a later point in time (related to RQ3). This study was performed together with Sai Teja Peddinti, Igor Bilogrevic and Nina Taft, all helping out in equal amounts with the data analysis.
C9 We provide recommendations to different stakeholders (developers, manufacturers, network providers, researchers and mobile device users) on how to improve privacy and security on mobile devices without affecting usability. These recommendations are based on the results from our previous studies (related to RQ7). In addition, we formulate a proposal to improve transparency in how user data is shared by service providers to third parties.
C10 We design and develop a tool that automatically mitigates Wi-Fi attacks for Android smartphones (related to RQ7), and make this tool available to the general public.
We presented our contributions to the larger research community, in the form of three jour- nal publications, three presentations and publications at international conferences, and three presentations and publications at workshops. In addition to purely scientific contributions, we aimed to increase awareness through scientific outreach, by giving talks on security and privacy to the general public (including members and staff of the European Commission, members of multiple city councils, and younger children). A more complete list is available in Appendix D. 6 Introduction and research questions Part I
Security and privacy issues in smartphone connectivity
Introduction
For long, mobile phones have been thought of as “trusted devices”: they are used for private communication, coupled to a single user. However, in a time where smartphones dominate the mobile landscape, this trust is often unfounded. In reality, smartphones are known to collect all kinds of data – ranging from a user’s contacts to precise location data – via many different channels. For example, leaked documents by Edward Snowden show that popular smartphone apps were targeted by the NSA and GHCQ because of the vast amounts of user data they collect (as published in The Guardian on January 27, 2014). Even though a significant amount of people carry a smartphone around all the time, most of them are unaware about the privacy impact of using such a device to connect to wireless networks. For example, we will show that the strategies currently used by mobile operat- ing systems to search and connect to available wireless networks involve sharing a list of previously accessed networks.
The first part of this dissertation describes a method that allows smartphone users to be tracked by a third party surreptitiously (that is, without active cooperation of the smartphone users themselves). We show how this method can be implemented at very low cost on Rasp- berry Pi computers, and trial its implementation in two different contexts: a popular Belgian music festival and our university campus. The fact that this low-cost implementation allows a multitude of parties to surreptitiously track smartphone users leads to important security concerns.
Apart from discussing the security issues themselves, we also show how this method can be used for other research, providing the example of using the gathered mobility data to perform simulations of different opportunistic routing protocols. The routing protocol that is deemed to work best in the situations of real mobility could then be considered suitable for creating applications routing data between smartphone users directly, without the need for any network infrastructure.
We close the first part of this thesis by describing a system that allows researchers to find the impact of these kinds of wireless vulnerabilities, through the automated collection and 10 analysis of large datasets containing IEEE 802.11 Information Elements (IEs) transmitted by access points and stations. These IEs allow to see the capabilities that are supported by devices, thereby showing the fraction of devices that are at risk. Chapter 2
WiFiPi: Involuntary Tracking of Visitors at Mass Events
2.1 Introduction ...... 13
2.2 Related work ...... 13
2.3 Technical background ...... 15
2.3.1 Scanning for wireless networks ...... 15
2.3.2 Connecting to a network ...... 18
2.4 The WiFiPi detection mechanism ...... 18
2.4.1 Detecting a device ...... 19
2.4.2 Tracking the location of a device...... 20
2.5 Implementation ...... 21
2.6 Experiments ...... 22
2.6.1 Pukkelpop 2012 ...... 24
2.6.2 University campus ...... 24
2.6.3 Pukkelpop 2013 and WiFiPi 2.0 ...... 25
2.7 Applications ...... 28
2.7.1 Real-time crowd management and marketing ...... 28
2.7.2 Mobility models for simulations ...... 29
2.7.3 Ubiquitous computing ...... 29
2.8 Privacy implications ...... 30
2.9 Conclusion ...... 31 12 WiFiPi: Involuntary Tracking of Visitors at Mass Events 2.1 Introduction 13 2.1 Introduction
With the recent growth of smartphone usage, a large percentage of the population now carries a device frequently sending out signals, which can be detected by eavesdroppers nearby. At the time we published the work from which this chapter is derived (June 2013), it was esti- mated that over 50% of U.S. mobile subscribers owned a smartphone [Nielsen, 2012]. This number was only expected to increase over time as more than 1.3 million new Android de- vices are activated worldwide every day [Spradlin, 2012]; a prediction that, as we mentioned in Chapter 1, was confirmed afterwards. Modern smartphones have Wi-Fi communication enabled by default, allowing their owners to be detected at any moment by scanning the ether for Wi-Fi packets broadcast by their devices. In this chapter, we show one specific way in which smartphone users are exposing themselves to third parties gathering privacy-sensitive data; we show how their Wi-Fi signals allow them to be tracked involuntarily across a specific area. To someone trying to gather movement data from specific smartphone users, involuntary tracking provides a significant advantage com- pared to other techniques for tracking where the subjects need to actively cooperate, either by carrying a specialized tracking device or by actively and willingly sharing information about their location. Systems like the one described in this chapter do not require user consent and are therefore capable of tracking a much larger sample set of the population. At the same time, this creates a disadvantage to the smartphone users themselves, who are tracked without their consent and possibly without knowing the tracking is taking place. This chapter makes the following contributions. First, we describe a mechanism for tracking visitors at mass events which makes use of Wi-Fi technology. We explain how this mecha- nism works at a high level, and continue by discussing its implementation. Next, we describe how the detection mechanism was and is being used in two different contexts to infer move- ment patterns from visitors. One of these contexts is a three-day international music festival attracting 100 000 visitors every year. The other is a university campus, where we have been tracking students and staff for a period of three months. Section 2.7 provides an overview of possible applications of the tracking method in different domains. In Section 2.8, privacy im- plications for this technology are discussed. We conclude and present an overview of future work in section 2.9.
2.2 Related work
Several techniques have been proposed over the years to accomplish location determination and to provide the ability to track the movement of objects and people. One of these tech- niques is the use of wireless sensor networks (WSNs), where tiny motes are attached to objects to achieve tracking abilities [Kung and Vlah, 2003]. Although a high degree of pre- cision can be obtained, the disadvantages of such a method are clear: the cost of motes is non-negligible, they are not available off-the-shelf and, as such, the number of objects that 14 WiFiPi: Involuntary Tracking of Visitors at Mass Events could be tracked is limited in practice. A WSN system requires active consent and participa- tion from users to carry the motes. Alternatively, the use of tracking applications (sometimes as part of applications providing other functionality) on more common hardware equipped with GPS sensors has been proposed. Although this approach solves the cost issue to some degree, users still need to actively collaborate in the tracking by installing an application and agreeing to be tracked (unless the tracking software is part of malware, a topic which is not discussed here). Given the fact that the goal is to show how a non-obtrusive tracking technique that requires no active consent from people being tracked can be designed, these methods are not applicable. Versichele et al. [Versichele et al., 2012] developed a method for tracking people at mass events which uses Bluetooth signals sent out by mobile phones to detect a person’s loca- tion. While similar to the approach described in this chapter, this detection method requires phones to have their Bluetooth functionality set to ’discoverable’, a feature which is disabled on modern smartphones for security reasons [Android Open Source project, 2013; Apple Inc., 2012]. Because of this limitation, Bluetooth tracking can only be used to track older generation cell phones, resulting in a coverage rate of around 8% of the population at the time of our experiments. We expect this coverage rate to have decreased even further in 2017 as more people switched from older cell phones to smartphones. Our approach on the other hand requires only control signals sent out as part of the 802.11 protocol, which are required for Wi-Fi communication to function properly. This tracking method is future proof because unlike Bluetooth, Wi-Fi is enabled by default on modern smartphones. Furthermore, our method does not depend on a smartphone being put into a discoverable mode, nor does it require the smartphone to be actually connected to a wireless network. The tracking method of Versichele et al. can be used complementary to our own method to allow for tracking visi- tors using both Bluetooth and Wi-Fi signals. Using this combination, both older cell phones and smartphones can be tracked, which may provide a coverage rate close to the sum of the individual coverage rates for both tracking methods. Other work has been done in the area of using Wi-Fi communication to obtain information (besides location) about smartphone users. Cunche et al. have used passive Wi-Fi monitor- ing [Cunche and Boreli, 2012] to derive social links between smartphone owners. Moreover, Rose et al. [Rose and Welsh, 2010] have shown that SSIDs found in 802.11 probe requests can be used to produce a list of locations a smartphone user has visited. The described technique works by looking up the broadcasted SSIDs in the WiGLE wardriving database [WiGLE.net, 2016]. This database contains the locations of different wireless networks all over the world, submitted by users, and identified by their SSIDs. Our approach differs from the work done by Rose et al. in that it allows for tracking users without the presence of an infrastructure of access points (with specific SSIDS). Rose’s method also is unable to infer the time at which a device was at a certain location. In Chapter 5, we will build upon and extend Rose’s method to create awareness among smartphone users. Similarly, Becker et al. [Becker et al., 2013] have described how cellular telephone networks can be used to study human mobility 2.3 Technical background 15 on a large scale. This, too, measures mobility on a much larger, less granular scale than the technique described in this chapter. Network simulations use either synthetic movement data or data gathered from real-life crowd tracking, with a preference for the latter [Aschenbruck et al., 2011; Camp et al., 2002]. The CRAWDAD data archive [Yeo et al., 2006a] is a resource containing wireless trace data from many contributing parties. Methods for acquiring this data vary from equipping people with sensors to using network traces from access points which are under the control of researchers. We believe that our method can provide substantial benefits for people willing to extend the CRAWDAD data archive, by allowing for tracking of visitors without requiring active cooperation.
2.3 Technical background
Before we discuss how the WiFiPi system is able to track the location of a device, we first pro- vide a high-level overview of the parts of the 802.11 protocol which are exploited to achieve this, and which will allow us to collect privacy-sensitive information about smartphone users in the SASQUATCH study described in Chapter 5.
2.3.1 Scanning for wireless networks
Before a wireless (IEEE 802.11) device can connect to a network, it needs to be aware of the different access points that are in range. The IEEE 802.11 standard [IEEE Standards Asso- ciation, 2012] describes two different methods that can be used to scan for wireless (IEEE 802.11) networks: passive scanning (or ‘stumbling’) and active scanning. Passive scanning (see Figure 2.1(a)) is used with all regular networks. This method works by listening for specific packets – called beacons – which are sent out by access points with small intervals. These beacons contain, among other information, the SSID of the access point. By listening for these packets, a device knows exactly which networks are in range. The active scanning method (see Figure 2.1(b)) is used in a second class of wireless networks, called hidden or cloaked networks. In order to remain invisible to devices unaware of its existence, this type of network does not send out beacons. Instead, devices that know of this network need to scan for the access point’s presence in a proactive manner. For this purpose, probe requests are used. Broadcasting a probe request containing a specific SSID is similar to asking: “Is the network with name SSID around?”. If an access point receives a probe request containing its own SSID, it responds by sending a probe response directly to the device that sent out the probe request, notifying the device of its presence. Active scanning is also used when scanning for known (or ‘remembered’) networks in mod- ern smartphones. Since constant passive scanning requires the smartphone’s Wi-Fi radio to be powered even when no connection is available, this method would have a significant im- 16 WiFiPi: Involuntary Tracking of Visitors at Mass Events
(a) Passive scanning. The access point broadcasts its availability to everyone in range.
(b) Active scanning. The smartphone actively searches for access points that are in range.
Figure 2.1: The difference between active and passive scanning in modern Wi-Fi devices. pact on the smartphone’s battery life. Because of this, Android, iOS and Windows Phone implement a mechanism that works as follows:
1. Send out a probe request for the first ‘remembered’ network in the smartphone’s pre- ferred network list (PNL).
2. Keep the Wi-Fi radio powered on for a short amount of time, listening for access points sending probe responses.
(a) If a probe response is received, initiate a connection with the network. (b) If no probe response is received within the chosen time window, go back to step 1 while choosing the next network in the PNL. (c) If no probe response is received, and we arrived at the last network in the PNL, put the Wi-Fi radio to sleep for n seconds (the probe request interval).
In reality, smartphones running Android send probe requests only for the first 16 networks in their PNL, as can be seen from the WPAS_MAC_SCAN_SSIDS constant in the source code.1
1The source code for wpa_supplicant as it is included in Android 7.7.1 is available at http://androidxref. com/7.1.1_r6/xref/external/wpa_supplicant_8/src/drivers/driver.h. 2.3 Technical background 17
Table 2.1: Average values of probe request intervals for popular smartphone vendors.
Vendor Interval Sample size Sony 11 seconds 3 devices Samsung 22 seconds 30 devices Huawei 23 seconds 5 devices Apple 24 seconds 149 devices Nokia 25 seconds 11 devices HTC 26 seconds 9 devices Asus 30 seconds 4 devices Blackberry 30 seconds 2 devices LG 32 seconds 16 devices
Devices running iOS are not known to limit the amount of networks for which a probe request is broacasted. The probe request interval, i.e. the number of seconds during which the Wi-Fi radio is put to sleep in between scanning for the same set of networks, is dependent on the smartphone vendor, model, operating system and on the state of the smartphone (standby, screen on, connected to a network, etc.) [Cisco Systems, 2013]. Based on our preliminary experiments, we list the average probe request intervals for some popular vendors in Table 2.1. For this, we captured probe requests for all devices at the university’s main hall. For every device, we calculated the average time between bursts of probe requests sent out. These values were then averaged per vendor. Note that we are only able to distinguish between vendors (as will be described in Section 5.3), and not between operating systems. While Apple and Blackberry devices will almost certainly be running respectively iOS and Blackberry OS2, devices by other vendors may be running Android, Windows Phone or any of the other vendor-specific alternatives like Symbian (Nokia) or Bada (Samsung). Note that the active scanning mechanism is still used when the smartphone is already con- nected to a network, either to find other access points with a better signal strength than the connected network, or to improve network performance when handoff happens between dif- ferent access points [Lindqvist et al., 2009]. Smartphones also use probe requests to find unknown networks. In this case, a broadcast probe request is sent. This type of probe re- quest does not contain an SSID, and invites all access points in range to respond with a probe response containing the network identifier of the access point’s network. These broadcast probe requests can be considered as a device-initiated alternative to the beacons that are used in the passive scanning method. As a last remark, probe requests can also contain additional information in the form of Information Elements (IEs). We will discuss these in more detail
2At the time this experiment was performed, Research In Motion (the company behind Blackberry devices) did not yet offer any Android-powered smartphones. 18 WiFiPi: Involuntary Tracking of Visitors at Mass Events
1 2 3 4
Smartphone Access point network B
1 Authentication (request)2 Association request 3 Authentication (response) 4 Association response
Figure 2.2: Connecting to a network: authentication and association. in Chapter 4, but will note for now that major operating systems are currently in the process of phasing out these IEs in probe requests [Hogben, 2017].
2.3.2 Connecting to a network Once a suitable network has been found, the mobile device (the client for this network) con- nects to it. This connection happens in two stages: an authentication and an association stage. The complete process is displayed in Figure 2.2. The authentication stage allows devices to prove that they are authorized to connect to the network. If shared authentication is used by the wireless network, this stage involves the access point sending a challenge text to the client, which is then encrypted by the client to prove that it knows the key that is required to connect to the network. In modern networks, or networks that do not use authentication, shared authentication is not used3, and the authenti- cation process is deferred to a later stage. Instead, the network will use open authentication, exchanging only two authentication messages between the client and the access point. In the association stage, the client sends an association request to the access point, informing it of its wireless capabilities4 and its supported data rates and encryption protocols. The access point then responds with an association response, providing its capabilities to the client. Note that, although a client can execute the authorization stage with different access points at the same time (putting it in a pre-associated state), it can only be associated with one access point.
2.4 The WiFiPi detection mechanism
In this section, we lay out the structure of the mechanism used for tracking smartphone own- ers. Our system consists of a number of detectors, placed at different locations, with the
3The Wireless Equivalent Privacy (WEP) protocol, which is the only protocol that uses shared authentication, has been proven to have major security weaknesses, and its use has been discouraged [Fluhrer et al., 2001; Tews et al., 2007]. Modern wireless authentication protocols, such as the WPA2 industry standard, defer their authentication to a later stage. 4As noted in the previous section, we will provide some examples of these capabilities in Chapter 4. 2.4 The WiFiPi detection mechanism 19 aim of covering the entire (event) area. This allows for detecting smartphone users without their active cooperation. We first describe which elements of the 802.11 standard enable us to detect devices on a single location, and continue with an overview of how we use these detection techniques to track a device across different locations.
2.4.1 Detecting a device
In order to be able to detect a particular device that enters a detector’s range, the detector needs to scan the ether for packets that have the following characteristics:
• In order to be able to uniquely identify a device, the captured packets should contain the hardware (MAC) address of the device’s Wi-Fi interface.
• To make sure that all devices in range are detected, the packets must be sent out regu- larly by the device.
In addition to the probe request and association request frames that were described in the previous section, our system also detects reassociation request frames. A reassociation Re- quest is sent when a Wi-Fi device wants to connect to another access point on the same network. This is the case, for example, when the Wi-Fi device is roaming and has detected an access point with a stronger signal serving the same network. Reassociation requests can even be triggered by any station present on the network. Indeed, by sending out a disassoci- ation frame, a station is able to request all associated clients to disconnect from the network, effectively forcing them to reassociate afterwards. Moreover, disassociation frames can be easily forged by a third party [He and Mitchell, 2005], causing every connected device to reassociate. None of these three types of packets are sent out continuously. We define a detection round to be the minimum time interval for which we can be reasonably sure that a device sends out at least one of these three types of packets. Based on our results in Section 2.3, and based on specific lab tests in which we singled out devices, the ideal duration of a detection round was empirically determined to be 130 seconds: each of the tested Wi-Fi devices (including multiple Android devices, notebooks, an iPhone, and an iPad) sends out a probe request at least once every two minutes, regardless of whether it was connected to a wireless network, and regardless of whether it had been in use. We refrain from using a mechanism which analyzes every possible type of 802.11 packet because capturing and processing all packets would induce a great computational strain on the detectors, while adding little benefit. Indeed, processing every large data packet at the user space level instead of dropping it at the level of the network interface could very easily overload the low-cost, low-power detectors. Furthermore, to ensure completely transparent and non-obtrusive operation, our mechanism does not use disassociation frames to force re- association of devices. If it is not required that the detectors operate stealthily, disassociation 20 WiFiPi: Involuntary Tracking of Visitors at Mass Events frames can be used to force devices to reconnect to networks, essentially eliciting an associa- tion or reassociation request from the device (causing it to be easily detected). We published a separate paper (not part of this dissertation) that discusses other methods for instigating transmissions from mobile devices [Robyns et al., 2017]. Channel hopping techniques could be used to capture packets on all different 802.11 chan- nels. While channel hopping would allow a detector to detect (re)association requests on all different channels, the fact that the radio is tuned into a single frequency band for only a short period at a time has a negative impact on the number of complete probe requests detected (probe requests are sent on all channels). Empirical tests have shown that the ad- vantages of channel hopping do not outweigh its cost (i.e. fewer devices are detected in total when using channel hopping); it is therefore not used in the proposed solution.
2.4.2 Tracking the location of a device
Using the device detection method described above, a system which tracks the movement of smartphone users can be created by dispersing multiple detectors over the coverage area. The location of a specific device – and thus, its user – can then be determined by keeping track of the different times at which each detector detected a packet originating from the device’s MAC address. An optimal tracking setup considers the placement of the detectors as well as the range of the antennas to cover an area that is as large as possible. The latter can be tuned by opting for directional (beam-type) or omni-directional antennas with a specific gain factor. Detection ranges of individual detectors are allowed to overlap: both detectors could then be used to establish a more precise location of the detected device. It is a requirement that the clocks on the different detectors are at least loosely synchronized, in order to be able to correlate data from the detectors afterwards. Alternatively, the detectors could have their logging information sent to a server immediately. The server could then be held responsible for correctly synchronizing the data from different detectors. By correlating data from different detectors over time, a path can be established for every visitor. By taking into account physical properties of the tracked area, such as blocked paths and distance between detectors, more granular paths can be inferred. Moreover, in case of only one entrance and/or exit, it can be determined when a visitor enters or leaves the tracked area. We will use similar techniques in Section 3.3.2, when we will be using data gathered by tracking festival visitors to simulate visitor movement patterns. Additionally, to further increase location determination precision, the RSSI value – which indicates the received signal strength for a received 802.11 packet – could in theory be used. From this value, an estimation could be made on the distance between the detector and the device sending out the packet. However, empirical tests have shown that the RSSI value is of little use in crowded environments containing a high amount of electronic devices and people due to severe fluctuations and noise in the data sets. Because of these environmental factors, 2.5 Implementation 21 the RSSI value is currently not used in the detection mechanism. Rather, the overlapping range of the individual detectors provides a similar, but more consistent result. We use this to provide an estimate on the location of a tracked device.
2.5 Implementation
The detection mechanism as described above was implemented on a Raspberry Pi computer. The Raspberry Pi was the hardware of choice because of its low power requirements (3.5W), its small size, and its very low cost: a Raspberry Pi cost under US$35 at the time of our experiments (in 2012), while a current model Raspberry Pi Zero (released in 2015) can be bought for as low as US$5. A USB hub was attached to the Raspberry Pi for power and expansion, and a Wi-Fi dongle supporting monitor mode was used for capturing packets. An external antenna was attached to the Wi-Fi dongle to increase the detection radius. Either a cell phone or an ethernet cable was used for network communication, depending on the facilities at the tracking site.5 Lastly, an LED was connected to the Raspberry Pi’s GPIO pins to provide a status indicator, displaying whether or not the detector was a) scanning, b) connected, and c) aware of the correct local time. The result can be seen in Figure 2.3. Since Raspberry Pi’s do not have a real-time clock (RTC), the time for the Raspberry Pi is reset at boot time. For providing the Raspberry Pi with the correct time, we used ntp when a network connection was available. For situations in which no network connection was available, we created a custom-made Android application that could be installed on our own smartphones for informing the detectors in range about the current time. As we described in Section 2.4.1, we limit the detectors to process only probe requests for efficiency purposes. Because of this, the Android app (ab)uses the probe request mechanism to transmit this infor- mation to the detectors. It does this by using Android’s WifiManager to temporarily create a Wi-Fi network with a specific SSID containing an identification string, as well as the cur- rent time. This causes the Android phone to broadcast a probe request with this information, which can then be picked up by our detectors. The Raspberry Pi was running Raspbian, a Debian-based GNU/Linux distribution specifically tailored for use with the Raspberry Pi. The detection software was implemented in Python, making use of the scapy packet capturing and manipulation library [Biondi, 2005]. This library was chosen because it allows for filtering of packets at the kernel driver level by making use of the Berkeley Packet Filter (bpf), while still allowing for easy interaction with captured packets at the user space level. This combination ensures that the scanner software is as lightweight and speedy as possible, while still being able to extract useful information from packets on-line, at the scanner itself. This is important because the low-cost Raspberry Pi devices have both limited processing power and limited storage speeds, which requires
5Note that a network connection is not required for the detector to function correctly. A network connection was used to display a real-time overview of the gathered data and to provide a dashboard to the organization of the festival. 22 WiFiPi: Involuntary Tracking of Visitors at Mass Events
Figure 2.3: The Wi-Fi detector, consisting of: (a) a Raspberry Pi, (b) a Wi-Fi dongle, (c) an external long-range antenna, (d) a USB hub, (e) a Nokia N78 phone, and (f) a heartbeat LED. a scanner implementation to write as little data as possible, while doing as little processing work as possible. Because of our optimizations, the detectors are able to process more than 4 000 detected Wi-Fi nodes per detection round in real time. A list of detected devices was sent to a central server after every detection round in order to both provide a failsafe logging mechanism and to be able to gather statistics in real time. These real-time statistics include information such as the crowd density at different detectors, information about visitors’ devices (manufacturer, broadcasted SSIDs), and spatio-temporal information about the visitors. An example of the dashboard displaying part of this informa- tion can be found in Figure 2.4.
2.6 Experiments
To investigate the feasibility of using the WiFiPi system at various types mass events, we perform experiments in different settings. We do this by looking at the number of devices that the system is able to consistently track, as well as the number of different data points we can gather per device to generate a movement path that is consistent with the actual movements of the device’s user. 2.6 Experiments 23
Figure 2.4: The dashboard showing information from the second scenario (university campus). Left: a heatmap showing the density around the different detectors; right: the MAC addresses of the currently detected devices (blurred out), together with the device manufacturers; bottom: a timeline showing the number of detected devices per detector over time.
Table 2.2: Detection statistics for three experiments using the WiFiPi system.
Pukkelpop 2012 University campus Pukkelpop 2013 Detection period 3 days 3 months 3 days Visitors 100 000 3 200 (daily) 85 000 Detected devices 29 296 16 383.4 (daily) 40 815 Detection rate 29.3% 40% 45.7% Detections per device 10.5 272.1 149.2
At the time of the study, we performed two experiments using the detector software. The first experiment consisted of placing the detectors at a three-day international music festival. For the second experiment, detectors were placed at the university campus, tracking visitors for over three months. One year later (after publication of this work [Bonné et al., 2013b]), we performed the same experiment using an upgraded version of our detector software and hardware, at the next edition of the music festival. We added the results for this study in Section 2.6.3. A summary of the detection statistics for all three experiments is available in Table 2.2. 24 WiFiPi: Involuntary Tracking of Visitors at Mass Events
2.6.1 Pukkelpop 2012
Pukkelpop is a Belgian music festival attracting 100 000 visitors6 every year. The 2012 edi- tion spanned three days, from August 16th to August 18th. During these days, fifteen de- tectors were placed at strategic locations, ranging from the 8 different stages to important passageways. Three detectors contained a cell phone for real-time monitoring. The total area of coverage was about 400m by 500m. Combining the data from all fifteen detectors, a total of 137 899 unique devices (MAC ad- dresses) were detected. These detections also include some devices not belonging to festival visitors. For example, some detections might have resulted from devices in passing cars or from devices that are part of the fixed infrastructure. For this reason, we included in the final dataset only those devices which were detected in at least two different locations, at different moments in time. Filtering out the devices conforming to this requirement, we find a total of 29 296 detected devices, giving us a relatively good estimate of the number of people carry- ing a Wi-Fi-enabled device at the festival (29.3%). The 29 296 devices account for a total of 307 256 data points, spread out over the three days of the festival. It must be noted that the previous numbers establish a lower bound, not only because some Wi-Fi enabled devices might have had their Wi-Fi turned off for the duration of our exper- iments, or because it is expected that smartphone usage will increase over time, but also because technical difficulties occurred during the first experiment. Indeed, during this ex- periment, power failures were common, and both the detector software and the Raspberry Pi operating system still suffered from childhood diseases. These problems caused the detectors to sometimes malfunction, hindering them from detecting some devices. Another reason that these numbers establish a lower bound is our requirement that devices should be detected at at least two different locations. Because of this, rather stationary visitors are not part of the results. Together with our own Wi-Fi tracking setup, we also deployed Versichele et al.’s Bluetooth tracking setup [Versichele et al., 2012]. As we will explain in Chapter 3, their method pro- vided a coverage rate of approximately 10% of festival visitors, already being surpassed by our own method by a factor of 3 in 2012.
2.6.2 University campus To demonstrate the versatility of the system, detectors were also placed at the Diepenbeek campus of Hasselt University (Universiteit Hasselt). Besides being an indoor location, this scenario also differed from the one above in the fact that more long term monitoring (3 months+) was done and because the coverage area was smaller with more overlap between detectors to increase accuracy. The Diepenbeek campus building of Hasselt University typi- cally has around 3 200 daily visitors, consisting of students, staff and other guests.
6The Pukkelpop 2012 festival attracted 100 000 unique visitors over the course of three days. This number was provided to us by the Pukkelpop organization. 2.6 Experiments 25
Figure 2.5: The second version of the Wi-Fi detector, installed at Pukkelpop 2013, also containing an LCD display.
For this experiment, four detectors, all of them provided with a network connection, were placed both at the Diepenbeek campus building and at the Expertise Centre for Digital Media (EDM, Hasselt University’s multimedia research center). Two of the detectors were placed relatively close to each other. This way, it was possible to cross-check data from those detec- tors, allowing us to verify that devices were either tracked by both detectors or not detected at all. Over a period of three months, 16 486 devices passed within the range of at least one detector. In total, the devices were detected a total number of 4 486 310 times. On average, 1 383.4 unique devices are detected at the main campus per working day. As- suming that every visitor carries exactly one switched-on Wi-Fi enabled device7, it can be concluded that the detectors provide a coverage rate of around 40% for this scenario.
2.6.3 Pukkelpop 2013 and WiFiPi 2.0 In our music festival experiment from 2012, we collected information from smartphones of over 29 000 festival visitors (about 29.3%). To see if this number would increase over time, we repeated the experiment in 2013, with an updated version of both the detector software
7Note that a visitor may be carrying more than one switched on Wi-Fi enabled device (notebook, smartphone, tablet), or that he/she may not be carrying a smartphone at all. 26 WiFiPi: Involuntary Tracking of Visitors at Mass Events
Figure 2.6: The updated version of the dashboard, providing a.o. real-time information about average travel times between the different stages for festival visitors.
and hardware. Among improvements in detection performance, the newer version of our detector (see Figure 2.5) also included an LCD display, providing us with basic information such as the network connection information, the detector’s wall clock time and the number of detected devices, allowing us to easily monitor those detectors in real time even when an internet connection was not available. Apart from the detectors, the dashboard was also updated (see Figure 2.6) to provide addi- tional information. Among others, it provided the festival organization with real-time infor- mation about the travel times between different stages, an updated real-time heatmap and different spatio-temporal visualisations of the distribution of visitors between different stages and different points in time. This repeated experiment provided us with smartphone detections for 38 828 festival visitors (about 48% of the amount of tickets sold). Based on anonymized data we could again infer which music stages a person visited at what time. We were able to cluster visitors based on their musical preferences, and we generated association rules that could be used to infer which artists a visitor was most likely to visit based on other attended performances. An ex- ample of this is shown in Figure 2.7, where association rules for the band “BadBadNotGood” are shown. From this figure, we can see that a visitor in our dataset that attended BadBad- NotGood had a chance of 14.6% of also attending the band TNGHT, and that approximately 64 visitors attended both bands. “interestingness” is used as a term to describe the statistical 2.6 Experiments 27
Figure 2.7: Association rules generated for Pukkelpop 2013, showing the correlation between vis- itors of different artists. This screenshot shows association rules for the band “BadBadNotGood” (which played at the “Club” stage). concept of lift, and is calculated as follows: P(ArtistB | ArtistA) lift(ArtistA ⇒ ArtistB) = P(ArtistB) The intuition behind showing the statistical lift as an “interestingness” metric to the festival organizers is that it scales the probability of a specific rule to the probability that a random festival visitor would visit the artists in the second column. This gives rules with less popular artists in the second column a higher “interestingness” value. Intuitively, artists that were more likely to have been visited by any festival visitors (as opposed to only the ones who visited the artists in the first column) correspond to a less interesting case than artists who have been visited by only a small part of overall festival attendees. The example of BadBadNotGood attendees also attending TNGHT given above in particular was interesting to festival organizers, as BadBadNotGood and TNGHT are very different bands: whereas BadBadNotGood plays jazz that is influenced by hip hop, TNGHT is known for its electronic trap music. What caused these bands to be visited by the same audience is that BadBadNotGood, programmed early on in the day, played a cover version of one of TNGHT’s songs, encouraging the audience to attend the concert of the latter later that day. In our repeated experiment, we did not limit the WiFiPi system to collect only MAC ad- dresses, but also used it to collect lists of the networks that users had connected to in the 28 WiFiPi: Involuntary Tracking of Visitors at Mass Events past (as explained in Section 2.3). This led to some interesting results: of the 38 828 devices that we detected during this event, 23 103 devices (59.50%) sent out at least one network identifier (SSID). Moreover, 6 609 of these devices (28.61%) broadcasted at least one net- work that was not broadcasted by any of the other devices, and more than 50% of the devices contained an SSID that was shared among at most 50 other devices. This means nearly a quarter of the people carrying a smartphone could be uniquely identified only by the list of networks in their phone, and that clustering people by workplace or home based on the gath- ered data is possible. We were able to achieve these results with a simple and low cost setup, which underscores the fact that anyone with limited resources could collect similar data at any particular location. In Chapter 5, we will present a system (called SASQUATCH) that uses this information in an experiment that aims to both assess and increase user awareness about privacy issues in wireless networks. As we will see in Chapter 3, the detection rates that the WiFiPi system is able to achieve are sufficient for building realistic movement paths of visitors at mass events, both in the context of a music festival as in the more low-traffic context of a university campus.
2.7 Applications
Using low-cost hardware for tracking people offers a wide variety of applications, of which we give some examples in this section. We emphasize that this is only a small subset of many possible use cases.
2.7.1 Real-time crowd management and marketing An interesting application for organizers of mass events is to use the real-time gathered data for crowd control. Similar to the dashboard (pictured in Figures 2.4, 2.6 and 2.7) that was used for visualizing crowd data during our experiments in real time, one could visualize the real-time data in a way that shows the flows of people moving, and provides additional information on crowd density compared to the maximum capacity in a particular location. Moreover, a visualisation of real-time data could prove to be vital in an evacuation scenario where the goal is to get people to move to – or away from – a certain location as fast as possible. Furthermore, the data can be used after-the-fact to gather some interesting statistics about visitor behavior at music festivals, for example: • Which artists or stages are most popular and which artists attract a similar audience?
• Which (unforeseen) crowd movements happen on the terrain over the duration of the festival (due to unplanned events or scheduling issues between the stages)?
• How stationary are visitors? How much time do they actually spend on the festival site? 2.7 Applications 29
As an example, we provided some analysis on the first point (correlating different artists based on their visitors, as described in Section 2.6.3), and showed the results to the festival organization. They confirmed the most important trends, and indicated to be interested in a commercial system providing this information. This gathering of statistics does not only apply to music festivals, but also to places like shopping centers. There, the data could be used, for example, to monitor the shops in which customers spend most of their time, or to oversee queueing times at the cash registers. After publication of this work [Bonné et al., 2013b], different marketing agencies and cities like London and New York have started adopting similar techniques for tracking their vis- itors and inhabitants. One example is marketing agency “Renew”, which used trash cans in London to track people via their smartphones’ Wi-Fi signals. Another example is the company “Navizon” (now Accuware), specializing in tracking devices through a variety of signals [Accuware, 2017].
2.7.2 Mobility models for simulations
Opportunistic, multi-hop networks deal with using ad hoc communication to provide network connectivity among different devices in a local area. A possible application for this technol- ogy is a mass event, where conventional cellular networks are likely to become overloaded due to the high amount of visitors. In simulating opportunistic networks, it is essential that the simulation runs are performed using realistic movement patterns [Aschenbruck et al., 2011; Camp et al., 2002]. Because of this, real mobility data of visitors at a mass event can provide invaluable information for creating a realistic simulation. The dataset acquired at the Pukkelpop festival was converted to two different types of mobility traces: one that could be used by the ONE opportunistic network simulator [Keränen et al., 2009], and one that could be directly used in simulations run by either ns-2 or ns-3 [Henderson et al., 2006]. In Section 3, we will discuss this specific application of involuntary tracking, using data gathered in a similar fashion to optimize routing protocols that can be used for opportunistic communication between music festival visitors.
2.7.3 Ubiquitous computing
In the domain of Ubiquitous Computing, a low-cost Wi-Fi detector could be used to infer which people are currently present within a certain room, and to tune the atmosphere ac- cordingly. A visitor to a room can have preferences for certain types of lighting or genres of music. Furthermore, user interfaces can be tweaked to accommodate a user’s preferences, or the user could automatically be logged in to certain services. It must be noted that the maximum detection interval of 130 seconds may be too high in such a scenario. In this case, techniques such as disassociation requests (see Section 2.4.1 and [Robyns et al., 2017]) could 30 WiFiPi: Involuntary Tracking of Visitors at Mass Events be used to speed up the detection of a device as a wireless infrastructure using access points is likely to be present. A Wi-Fi detector could also be used to save energy in rooms where no one is present, possibly enhancing other detection methods such as motion sensors, audio detection or security cam- eras. Assuming that every visitor carries a smartphone, if it is detected that all devices that were previously present have now left the room, lights and other appliances could be turned off automatically. This could increase the accuracy of detection especially when attendees are mostly static. Lastly, as described by Rose and Welsh [Rose and Welsh, 2010], information from probe requests can be used to infer past location data from users, allowing for user profiling. This user profiling could aid, for example, in automatic language selection, choosing the user’s language based on the locations over the world he/she has visited most often.
2.8 Privacy implications
Clearly, tracking people via their smartphones brings about important privacy implications. The most obvious one is that when the MAC address of a person’s smartphone is known, it is easy to reconstruct the complete path this person has traveled. Because MAC address information needs to be shared among different detectors for tracking purposes, it is not possible to anonymize our dataset simply by associating a unique identifier to every MAC address in each individual detector. A naive solution might consist of creating a one-way hash of every MAC, in order to obfuscate the MAC addresses, while still making sure that the identifier would remain the same over different detectors. However, it would then still be possible to track a certain MAC address by calculating its one-way hash. Thus, care must be taken that the data is anonymized in some other way (e.g. by associating a random number with every MAC address) after it has been combined from all individual detectors. However, even if the MAC address of a person’s smartphone is not known, it is still possible to derive information from the captured Wi-Fi probe requests alone. For example:
• The list of known SSIDs is available as part of the probe request. From this list we can derive other networks the user has connected to, which may include e.g. the SSID of the home network, the SSID of places visited, or even the user’s personal name (as part of the SSID of the user’s home network or mobile hotspot).
• The manufacturer of a person’s smartphone, which can be derived from the first part of a smartphone’s MAC address, can be used to identify that person. To illustrate how, consider the scenario where the visiting times for a specific person at certain places are known. The list of devices detected at that time can then be reduced to a list of devices matching that person’s smartphone manufacturer, which makes it easy to derive the MAC address of the smartphone. 2.9 Conclusion 31
Indeed, we will use both of these points of information in our experiments to increase user awareness in Chapter 5. Moreover, de-anonymization of the dataset is possible when some other information is known. Indeed, relationship graphs in social networks can be compared to devices traveling together amongst different detectors in order to infer real-world identities from the mobility dataset. This is analogous to previous work done by Narayanan et al., wherein original identities are derived from anonymized social network graphs [Narayanan and Shmatikov, 2009]. More- over, Bilogrevic et al. showed that an adversary having access to a network of wireless sniffing stations (similar to our detectors) is able to reconstruct social communities based on gathered MAC addresses of participants’ mobile devices [Bilogrevic et al., 2012]. As in the work of Rose and Welsh [Rose and Welsh, 2010], the list of SSID’s collected from a smartphone can also be used to infer the coordinates of past locations of a smartphone user. As we will discuss in more detail in Part II (more specifically, in Sections 5.2 and 5.3.1), this can be achieved by using the list of SSID’s as an input when querying a “wardriving database” such as WiGLE [WiGLE.net, 2016], which contains a list of access points and their real-world locations.
2.9 Conclusion
We have shown that tracking of visitors at mass events can be achieved at a very low cost and – more importantly – unobtrusively and without requiring active cooperation. We achieve this by implementing the scanning software in a way that limits both processing and disk I/O, allowing for detection of up to 4 000 devices per 130 seconds (a ‘detection round’). The proposed method can easily be tailored to suit various contexts of use, demonstrated by the two scenarios presented. A number of possible applications have been discussed, along with some privacy implications that should be kept in mind when using the proposed solution. As we noted in Section 2.7, marketing agencies and public institutions have already started adopting similar techniques after the publication of this work [Bonné et al., 2013b]. The privacy implications that this technique entails, combined with the very low cost and ease with which this tracking can be peformed, show the large risk that any third party can abuse this system for nefarious purposes. Indeed, even if marketing agencies and public in- stitutions comply with ethical and legal standards, nothing prevents a malicious actor from using the same techniques to gather information or mount attacks. This ties into research question RQ1, showing that the default modus operandi of wireless network already entails important privacy and security issues. We will show how these vulnerabilities can be ex- ploited in more detail in Chapter 5.
The data acquired at the Pukkelpop festival is part of a project in which we aimed to develop a smartphone application which allows opportunistic (ad hoc) communication at mass events such as music festivals as an add-on to infrastructure-based networks. The gathered data 32 WiFiPi: Involuntary Tracking of Visitors at Mass Events can be used in network simulation experiments, in which the optimal opportunistic routing protocol for use at mass events is determined. We will discuss how these kinds of simulations work in the next chapter. Chapter 3
A Comparative Simulation of Opportunistic Routing Protocols Using Realistic Mobility Data Obtained From Mass Events
3.1 Introduction ...... 35
3.2 Background and Related Work ...... 35
3.3 Mobility ...... 36
3.3.1 Collecting Data...... 37
3.3.2 Calculating movement paths...... 37
3.4 Simulations ...... 38
3.4.1 Simulation Settings ...... 39
3.4.2 Routing Protocols ...... 41
3.5 Results and analysis ...... 42
3.5.1 Metrics ...... 42
3.5.2 Individual Protocol Examination ...... 44
3.5.3 Comparison ...... 45
3.5.4 Candidate Selection ...... 46
3.6 Conclusion ...... 48 A Comparative Simulation of Opportunistic Routing Protocols Using Realistic 34 Mobility Data Obtained From Mass Events 3.1 Introduction 35 3.1 Introduction
In the previous chapter, we showed how smartphone users can be surreptitiously tracked by third parties through Wi-Fi signals that are sent out by their smartphones. In this chapter, we show a legitimate use case for data that is gathered in this fashion. Specifically, we show that mobility data gathered by an opportunistic tracking system at a mass event can be used in simulations that try to emulate these types of events. We again use the “music festival” mass event use case as an example, using the gathered mobility data to optimize and adapt an opportunistic routing protocol that could be used in applications providing ad-hoc communications to festival visitors. To see why ad-hoc communications at music festivals are an interesting use case for mobility simulations, consider the fact that the existing cellular network infrastructure at mass events is often faced with very large amounts of data. This can result in network outage or serious delays, especially in places where many people gather in a small geographical area. Thus, it would benefit festival attendees if they could exchange small messages (e.g. SMS or Whats- App) through ad hoc communication at these events, reducing the load and freeing capacity on traditional cellular networks. For this, opportunistic network technology can be leveraged to directly deliver data between devices. Crowds at mass events follow specific movement patterns, the properties of which can be exploited to better route information between users. In literature, several routing protocols have been proposed that can be used to implement such an ad hoc messaging application on mobile devices. All of these protocols require different parameters to be tuned. These choices range from how long one should wait for the arrival of a message to the number of messages that should be passed on to a neighboring device. Simulation is a common tool in research on computer networks, especially in Mobile Ad Hoc Network (MANET) and Delay Tolerant Network (DTN) research. In this chapter, we use mobility data which was captured using a technique similar to the one described in the previous chapter to perform simulations of these various routing protocols using the Opportunistic Network Environment (ONE) Simulator [Keränen and Ott, 2007], and present our results. Furthermore, by analysing the data, a suggestion for best-suited protocols will be made for an actual implementation.
3.2 Background and Related Work
In the last decade of DTN research the main focus has been on the development of effi- cient routing protocols. Several surveys can be found in literature [Zhang, 2006; Khabbaz et al., 2012]. A commonly used tool to evaluate these protocols is simulation. There are several available network simulators, for example ns-2 (and its successor ns-3) [Henderson et al., 2006] or OMNeT++ [Varga and Hornig, 2008], each with its own benefits and draw- backs. The Opportunistic Networking Environment (ONE) simulator presented by Keränen A Comparative Simulation of Opportunistic Routing Protocols Using Realistic 36 Mobility Data Obtained From Mass Events and Ott [Keränen and Ott, 2007], is specifically designed to facilitate DTN related simulations and is used and evaluated in several publications. Although simulations are a commonplace tool in research, there are typical mistakes and misconceptions. Andel and Yasinsac [Andel and Yasinsac, 2006] point out several of these in the reporting of simulation studies. More specifically, Grasic and Lindgren [Grasic and Lindgren, 2012] have surveyed a number of DTN related papers published in the last decade and describe several issues that should be avoided in future research, with the emphasis on using the correct node density and mobility models for a given scenario. This shows the benefit of using real mobility data. Aschenbruck et al. [Aschenbruck et al., 2011] provide an overview of the available synthetic and trace-based mobility models that can be used in simulations and emphasize the need for more realistic traces and mobility models. The CRAWDAD database [Yeo et al., 2006a] is a well known repository in the domain and contains diverse wireless trace data sets which can be used to extract mobility information. Numerous publications using those traces to improve synthetic models can be found, for example Lee et al. [Lee et al., 2009] who developed a movement model based on GPS traces. Other publications use the aforementioned traces to evaluate a protocol, like the work of Radenkovic and Grundy [Radenkovic and Grundy, 2011] wherein several traces are used to evaluate a proposed routing strategy. We have used a similar method to the one presented in the previous chapter [Versichele et al., 2012] to gather a large set of mobility traces at the same international music festival, and have used this data to simulate and investigate existing DTN protocols in a mass event setting without the need to depend on synthetic models. The difference between the method laid out in the previous chapter and the one presented by Versichele et al. [Versichele et al., 2012] is that whereas our method uses Wi-Fi signals to track visitors, their method tracks devices that are in the Bluetooth discoverable state. We compare both methods in Section 2.2. Both methods were deployed at the same time, with our own method as a proof-of-concept, and their method for the actual data gathering. Vukadinovic and Mangold [Vukadinovic and Mangold, 2011] performed an analogous experiment in an entertainment park where GPS traces were collected using mobile devices that were randomly distributed to visitors over five days. However, they did not focus on different routing protocols using these traces.
3.3 Mobility
Simulating the movement of crowds at mass events in a realistic manner relies on the ability to precisely tune the simulation parameters to the context of the event. Traditionally, syn- thetic movement models – such as the random waypoint model or the city section model – have been used for this purpose [Camp et al., 2002]. Recent studies have shown that using synthetic movement models provides only a limited approximation of real trajectories, and that more realistic simulations can be achieved by using movement data gathered by tracking people within the desired simulation context [Aschenbruck et al., 2011]. A problem often en- 3.3 Mobility 37 countered with methods for gathering this type of information is that only a relatively small subset of the population can be tracked at the same time [Versichele et al., 2012]. The chosen mobility model has a profound impact on the realism and thus outcome of a simulation. As Grasic and Lindgren [Grasic and Lindgren, 2012] have pointed out, many protocols in the literature are tested and evaluated in unrealistic or irrelevant scenarios. This chapter focuses on a specific real-world scenario and the development of applications for it.
3.3.1 Collecting Data Despite projects like CRAWDAD, the public availability of wireless traces is rather limited. For this study, a subsample of festival attendees was tracked using the method of Versichele et al [Versichele et al., 2012].1 By deploying 15 Bluetooth scanners at strategic locations over the festival terrain, proximity-based trajectories of over 10 000 unique devices were registered during the same festival (Pukkelpop 2012) as where our setup from the previous chapter was deployed. The unique hardware addresses (MAC) of Bluetooth chips make it possible to detect and track the movement of a device when combining data from several detectors in much the same way as the Wi-Fi technique. The hardware address can also be used to couple a detected device to a node that is used in a simulation environment. The resulting data contains timestamps of detected devices collected at certain strategically chosen locations on the terrain, such as stages and passageways, while trying to cover as much of the terrain as possible.
3.3.2 Calculating movement paths As the collected data only includes discrete detection points in time, we need a way to gen- erate movement paths from this data. We do this by letting a simulated node travel from a certain discrete detection location corresponding to a real node and time to the location and time of the next detection of that node. The range of the detectors, depending on the sensor used, is approximately 20-30 meters. Nodes in the simulation are therefore placed at or move towards a random position around the sensor locations in a radius of 20 meters. If the nodes would simply travel in straight lines from one detector to another, certain obsta- cles that were present at the festival area would be ignored. To improve realism, a rudimen- tary shortest path finding algorithm was used to let nodes travel a more realistic path from one point to the next: paths are only created along passageways, since fences and tents on the terrain can not be crossed. To achieve this, additional points were added along paths in order to avoid these obstacles. In between detections it is possible that certain devices remain undetected for a while. This does not necessarily mean these users left the festival area or turned off their devices: they could simply be out of range of all detectors. The problem is that there is no other information
1See Sections 3.2 and 2.2 for a comparison between our method from the previous chapter and the method by Versichele et al. A Comparative Simulation of Opportunistic Routing Protocols Using Realistic 38 Mobility Data Obtained From Mass Events
Figure 3.1: Number of detectors encountered by each of the top 1 000 nodes for a single festival day. about the actual status of the device. The chosen solution to this problem is to disable the node in the simulator after a certain time of not being detected (we will explain how in Section 3.4.1). This time was empirically chosen to be 15 minutes: most of the festival area is covered and it should take less than 15 minutes to move between any two detectors. The node will remain disabled until a detection at a later timestamp is encountered in the data. The Bluetooth detectors discovered an average of 5 300 unique devices per festival day. This number also includes devices that are only detected a few times (e.g. a passer-by) or are originating from organisational equipment which is static. A set of 1 000 nodes with the best paths (i.e. paths with the most detection points) were selected for our simulations. We perform the simulations with subsets of 50, 100, 250, 500 and 1 000 nodes, selected for 12 hours of a single festival day. This gives us an average of 202.18 detections per node, with an average of 6.40 (out of a maximum of 15) detectors visited for each node. The selected nodes moved from one detector to a different detector on the festival area 38.66 times on average (median: 32). A full distribution of encountered detectors, and number of movements per node is available in Figures 3.1 and 3.2, respectively.
3.4 Simulations
The main goal of all evaluated protocols is to enable communication between arbitrary de- vices in an ad hoc manner. This means that messages must be relayed by other devices to reach destinations that are not in transmission range. This section briefly explains the settings that were used for the simulations, followed by a description of the evaluated protocols. 3.4 Simulations 39
Figure 3.2: Number of times a node moved from one detector to a different detector on the festival area, for each of the top 1 000 nodes for a single festival day. The last bar represents devices with numbers ranging from 200 to 527 movements between detectors.
3.4.1 Simulation Settings
The ONE simulator is a network simulator developed to evaluate network protocols in a delay-tolerant environment. It is specifically designed to facilitate delay tolerant networking (DTN) research by abstracting certain elements of typical DTN communication and offering a number of tools for simulation. Visualization is one of these tools; Figure 3.3 shows a screenshot of the visualization component while simulating moving nodes on the festival area. Another tool enables researchers to import mobility data into the simulator. The source code of this component had to be slightly altered so nodes could be disabled when no trace data was available (as explained in section 3.3.2). Messages in the buffers of these disabled nodes are kept there but do age; depending on the routing protocol, it is possible they are dropped from the buffer due to aging as would be the case when devices have left the festival site in a real-world scenario. Each simulation run covers a full festival day (12 hours). The size of the festival area is circa 400m by 500m. For all nodes in the simulations a communication range of ten meters was chosen using a Bluetooth interface with a transfer rate of 2 Mbit/s. Although many wireless technologies have a much larger range in theory, interference in a dense crowd and other obstacles often hinder communication over a large distance. The ONE simulator does not take into account these interferences; a relatively small transmission range is chosen to compensate for this deficiency, so the results are also relevant for other wireless technologies (e.g. Wi-Fi) that can be used in a real-world application. The messages in the simulations have a size of 100 kB and the buffers in each node have a capacity of 10 MB. This is an estimation of the requirements of an application on a mo- A Comparative Simulation of Opportunistic Routing Protocols Using Realistic 40 Mobility Data Obtained From Mass Events
Table 3.1: Simulation settings.
Simulation time One festival day (12 hours) Simulation area size 400m x 500m Transmission rate 2 Mbit/s Transmission range 10m Transmission protocol Bluetooth Message rate 4 messages per hour, per node, on average Message size 100 kB Buffer size 10 MB TTL 15 minutes
Figure 3.3: Simulation of a festival day: the nodes and their transmission ranges are visualized by a circle, the background shows the festival area. bile device for sending small text messages or pictures like in SMS/MMS. The size of the messages is large enough to support additional security measures such as the use of public- key cryptography. The buffer size is intentionally kept small (a maximum of 100 messages): more messages lead to more processing and transmissions and thus more energy consump- 3.4 Simulations 41 tion. People who do not return home between festival days have to consider their energy consumption since they often have no way of recharging their devices. This factor should be taken into account when developing and testing routing protocols for scenarios like a festival spanning several days. Messages are generated by random active nodes in the festival area. A node is assumed to be active when the simulator is certain of the node’s location, i.e. when there is a related detec- tion in the real-world trace for that particular node. The destination is also a random node in the festival area. Because nodes can become disabled, it is possible that the associated mes- sages will disappear as well. This is not unrealistic since people may turn off their devices, the application may be shut down or because batteries could be depleted in real life. Using a random event generator in our simulations, every node sends an average of four messages per hour in a simulated festival day. The time-to-live (TTL) of the messages was set to 15 minutes. This is rather a long time to mimic SMS-like behavior but it ensures that all protocols have a reasonable amount of time to deliver a message. In an actual opportunistic mobile messaging application this TTL should be further reduced since an acknowledgement system should inform the user whether a sent message has arrived at its destination, and long waiting times for this feedback are impractical. An overview of the simulation settings is given in Table 3.1.
3.4.2 Routing Protocols
Six routing protocols were tested in the simulations. Every protocol was run through five iterations of simulation, each iteration simulating a full festival day of 12 hours, with the results being averaged. In each of the five runs a different seed was used for the random num- ber generator responsible for the time of creation, sender and destination of the messages. The protocols used were Direct Delivery (DD), First Contact (FC), Epidemic (EP) [Vahdat and Becker, 2000], Spray and Wait (SW and SWb) [Spyropoulos et al., 2005] and two ver- sions of the Probabilistic Routing Protocol (PRoPHET): the original protocol (PR) [Lindgren et al., 2003] and a slightly adjusted version PRoPHETv2 (PR2) [Grasic et al., 2011]. The implementations of these protocols were made by the developers of the ONE simulator and other contributors. The active community and diverse research contributions have checked and optimized these implementations. Since the goal was to investigate general properties of routing algorithms in a specific environment, the choice was made to use specifically these thoroughly tested and reviewed protocols. A short description of each is provided below. The first three protocols do not have parameters that can be tuned, so there is only one version to be tested. DD delivers the message only directly to the destination, i.e. there is no relaying, and messages are only delivered when the destination is a direct neighbor of the sender. In FC only one copy of a message exists at a certain point in time: a message is relayed to the first encountered node unless the message already traversed that node, and is then removed from the message buffer of the previous hop. This happens until the destination is reached or A Comparative Simulation of Opportunistic Routing Protocols Using Realistic 42 Mobility Data Obtained From Mass Events the TTL is reached. The EP protocol replicates messages using a flooding approach so that all messages are constantly being sent to all nodes that are in range. In SW, a maximum of n copies of a message can exist in the network. These copies are ’sprayed’ towards encountered nodes, until only one copy remains at the sender. Two imple- mentations of SW were used: binary (SWb) and non-binary (SW). In non-binary mode the sender sprays only one copy of the message to each (different) encountered node. In binary mode half of the available copies is passed on to an encountered node, which in turn does the same with these received copies. This happens until only one copy is left. The remaining copy then waits, in both binary and non-binary, to be delivered to the destination as soon as it comes into range. Both the binary and non-binary versions were simulated with different numbers of copies (n = 6,12,18,36,72,144), based on values proposed in the original work and further empirically determined by preliminary simulations. The PRoPHET protocols try to estimate the chance that a node can deliver the message to another node, based on historical information, and spread messages based on this chance. The transitivity property describes the probability by which a node can deliver a copy to the destination recursively. For example, if node A wants to send a message to node D via B, the transitivity property specifies the probability that B will encounter a node C which can in turn deliver the copy to D. PR2 uses a slightly adjusted transitivity property and aging of the delivery probabilities. More details can be found in the respective papers [Lindgren et al., 2003; Grasic et al., 2011]. PR and PR2 ran with different values for the transitivity property (β = 0.0,0.09,0.25,0.90) based on the proposed values in the original and on previous work.
3.5 Results and analysis
This section discusses the results and analysis of the simulations described in Section 3.4. First, the proposed metrics are briefly described. The importance of these metrics is depen- dent on the application scenario and will be detailed in the analysis that follows. Finally, the most suitable protocols are suggested based on the simulations. The figures in this section show the results of the simulation runs with 500 nodes.
3.5.1 Metrics
1. Delivery ratio: This metric represents the percentage of messages that were created and effectively reached their destination.
2. Latency: The time it takes for a delivered message to reach its destination. An upper bound on the time a message has to arrive at the destination may be needed in an application to decide whether the message is lost or a resend may be beneficial. 3.5 Results and analysis 43