Master Thesis Exploring spatio-temporal data from distributed Bluetooth scanning

Łukasz Dynowski [email protected]

Marcos Fuentes [email protected]

Kongens Lyngby 2012 IMM-MSC-2012-??

Technical University of Denmark
Informatics and Mathematical Modelling
Building 321, DK-2800 Kongens Lyngby, Denmark
Phone +45 45253351, Fax +45 45882673
[email protected]
www.imm.dtu.dk

IMM-MSC: ISSN ????-????

Abstract

Roskilde Festival is a music event attended every year by 130,000 people. Its high population density (0.78 [pop./m²]) provides a major opportunity for collecting spatio-temporal data about the participants. In current research, data collection has been achieved by installing smartphones in the areas of interest, which detect the Bluetooth adapters of the participants’ mobile phones. However, these installed smartphones require permanent access to a power supply, which can be difficult to guarantee due to infrastructure limitations. Considering that during the 10 days of the event 50% of the participants may carry a smartphone, we propose the use of these devices to collect other participants’ data. Because participants have limited opportunities to recharge their smartphones, this solution must reduce the energy required to detect other devices. In this work, we present the design, implementation and deployment of software for scanning Bluetooth devices. This software can be attached as a library to other applications intended for the festival, running as a background process. We designed an algorithm to reduce the energy used in detecting Bluetooth devices. Using this technique, we collected data on 2.4% of the festival participants, with 60% of the concert area scanned at least once. The technique proves to be highly scalable, since it depends on the number of participants using the applications of the event and not on infrastructural limitations. The data collected comprises the location of the smartphone used in the scans, the detected devices and the network conditions, and can be used to analyze the behavior of the participants.

Acknowledgments

First and foremost, we would like to thank our supervisors Jakob Eg Larsen and Sune Lehmann Jørgensen for their assistance during the project. We give particular thanks to Arkadiusz Stopczyński for the guidance, contact and fast feedback that he always provided. Moreover, we would like to thank the developers of the front-end applications to which the library was attached to collect the data in this work: Félix Rubio (Roskilde Hide And Seek), Christian Graver Larsen (Roskilde MusicNerd), Lasse Reedtz, Andrea Cuttone, Morten Georg Jensen and Frederik Toft (Roskilde Decibel).

Łukasz Dynowski

I owe thanks to all the teachers whom I met along my educational path. For pointing out the path to follow and for the faith that they put in me, I thank Kamil Szkarłat, Lidia Skibińska and my BSc supervisor Grzegorz Musiał. The biggest thanks go to my elementary school teacher Stefan Kowalski for breaking the rules, shifting the thoughts, going to the margin of patience, being different and breaking stereotypes. Moreover, I would like to thank my family: my parents and my siblings, for being there, for believing in me and for every form of help, which they always provide. Furthermore, I would like to thank my friends Marcos Fuentes, Filip Leczyński, Andrii Sereda and many others whom I have met on my way and who are too many to mention. Thanks for their tremendous patience, their unappreciated criticism and for proving that problems are easier to solve as long as you know a good process and try to follow it. Besides, I would like to thank all the bad colleagues I have met, for showing me who I should not be and how I should not behave. In addition, I would like to thank the Danish and Polish governments for the resources that they invested in me throughout my education. Finally, I would like to thank my girlfriends for changing me for the better; without them I would not be where I am. Thanks to Luana for opening the world of opportunities and triggering my linguistic skills. The biggest thanks to Juliet for great patience, support, words of hope and for being the complete opposite (while sharing the same values).

Marcos Fuentes

First of all, I would like to dedicate this work to my parents for their unconditional support, advice and all the encouraging words; to my father Carlos, for exposing me to the world of electronics and computers when I was just a little child; to my mother Coralia, for teaching me not to be afraid of any challenge. I would also like to dedicate this work to my girlfriend Maria Paz, for her unrestricted love and for always being with me, despite the long physical distance between us. I want to thank my friend Łukasz for all those interesting discussions about any topic and for being able to separate friendship from professional work. I give thanks to all the Chilean people for their financial contribution to the scholarship that allowed me to study abroad. Finally, I would like to thank all the Danish people for showing me that a society can be better if everyone supports each other. These lessons will be with me forever.

Contents

CHAPTER 1: INTRODUCTION
  1.1 PROBLEM
    1.1.1 Main goal
    1.1.2 Secondary goals
  1.2 METHODOLOGY
  1.3 THESIS STRUCTURE
  1.4 DISTRIBUTION OF WORK
    1.4.1 Work done by Łukasz Dynowski
    1.4.2 Work done by Marcos Fuentes

CHAPTER 2: RELATED WORK
  2.1 INTRODUCTION
  2.2 TRACKING PEOPLE MOVEMENTS USING BLUETOOTH
  2.3 GAINING INFORMATION FROM THE DATA COLLECTED
  2.4 SUMMARY

CHAPTER 3: ANALYSIS
  3.1 OVERVIEW
  3.2 MOVEMENT TRACKING USING SMARTPHONES AND BLUETOOTH
  3.3 TECHNICAL CHALLENGES
    3.3.1 Energy consumption
    3.3.2 Limited Network Access
    3.3.3 Attractiveness
  3.4 HARDWARE AVAILABLE FOR THE RESEARCH
  3.5 MOBILE PLATFORMS
    3.5.1 Market Analysis
    3.5.2 Previous Festival Applications
    3.5.3 Technical Feasibility Analysis
  3.6 RISK ANALYSIS
    3.6.1 Front-end applications
    3.6.2 Human resources
    3.6.3 Technical resources
  3.7 SUMMARY

CHAPTER 4: EXPERIMENTS
  4.1 OVERVIEW
    4.1.1 Bluetooth discoverability
    4.1.2 Theoretical calculations
    4.1.3 Bluetooth scanning vs. people movement
    4.1.4 Energy consumption
    4.1.5 Energy consumption with different intervals
  4.2 DATA ANALYSIS FROM 2011
    4.2.1 Overview
    4.2.2 Dataset
    4.2.3 Crowd activity during the day
    4.2.4 Data loss versus scanning frequency
    4.2.5 Data predictability and dynamic intervals using replacement rate
  4.3 SUMMARY

CHAPTER 5: DEVELOPMENT
  5.1 OVERVIEW
  5.2 ARCHITECTURE
    5.2.1 Mobile Components
    5.2.2 Server Components
    5.2.3 Monitoring tools
  5.3 REQUIREMENT ANALYSIS
  5.4 DESIGN
    5.4.1 Databases
    5.4.2 Scanner Library
    5.4.3 Schedulers
    5.4.4 Periodic tasks
    5.4.5 Coordination with other library instances
  5.5 IMPLEMENTATION
    5.5.1 Methodology
    5.5.2 Technologies used
    5.5.3 Database
    5.5.4 Web services and package appending
  5.6 MONITORING TOOLS
  5.7 TESTING
    5.7.1 Alpha Testing
    5.7.2 Beta Testing
    5.7.3 Integration tests with Roskilde 2012 front-end applications
  5.8 DEPLOYMENT
  5.9 SUMMARY

CHAPTER 6: RESULTS
  6.1 INTRODUCTION
  6.2 DATA PREPARATION
  6.3 GENERAL STATISTICS
    6.3.1 Statistics
  6.4 AREA COVERED
    6.4.1 Percentage of the festival and concert area covered
    6.4.2 Traversed area covered per scanner
  6.5 DISCOVERIES EFFICIENCY
    6.5.1 Total occurrences and unique devices per discovery
    6.5.2 Energy efficiency
    6.5.3 Intervals assigned by the algorithm
    6.5.4 Distribution of discovered devices
  6.6 LIBRARY RUNNING TIME AND FRONT-END IMPACT
  6.7 NETWORK COVERAGE
  6.8 LOCATION ACCURACY
  6.9 BLUETOOTH DISCOVERY TIME
  6.10 DATA COLLECTED AND MARKET SHARE
  6.11 ADDITIONAL RESULTS
    6.11.1 Daily walking speed
    6.11.2 Battery Level
    6.11.3 Manufacturers
  6.12 APPLICABILITY
    6.12.1 Social network analysis
  6.13 SUMMARY

CHAPTER 7: CONCLUSIONS
  7.1 FUTURE WORK

BIBLIOGRAPHY

APPENDIX
  I. LIBRARY MANUAL
    A. ADDING BTSCANNERLIBRARY TO THE PROJECT
    B. TO START THE SERVICE
    C. TO STOP THE SERVICE: (THIS KILL ALL THE SUBSERVICES STARTED OR IN PROGRESS)
    D. GET THE BLUETOOTH MAC
    E. GET THE CACHED OR OBTAIN GPS LOCATION:
    F. GET THE CACHED OR OBTAIN BLUETOOTH DEVICES:
    G. SCHEDULE A POST REQUEST:
    H. SCHEDULE A POST REQUEST WITH FILE
    I. ACTIVATE THE BLUETOOTH
  II. SQL QUERIES
    A. QUERY USED TO FIND AND ASSIGN THE NEAREST TIME FROM THE GPS TABLE INTO THE BLUETOOTH TABLE
    B. QUERY USED TO MEASURE THE DIFFERENCE BETWEEN THE LOCATION TIME AND THE TIME OF THE DISCOVERIES
    C. QUERY FOR COUNTING THE REPEATED DEVICES LIMITED TO LOCATION, TIME AND ACCURACY.
    D. QUERY FOR CALCULATING THE TIME OF DISCOVERY
    E. QUERY USED TO SELECT THE DISCOVERED DEVICES THAT WERE IN AN EVENT
    F. QUERY FOR CALCULATING THE DISTANCE BETWEEN THE DESIRED LOCATION (55.621506, 12.077213 –ORANGE ARENA LOCATION) AND LOCATIONS APPEARED IN BLUETOOTH TABLE
    G. THE SAME FORMULA FOR DISTANCE CALCULATION IN GPS COORDINATES ADOPTED FOR EXCEL.
  III. CODE SNIPPETS
    A. INTERVALCOMPUTER.JAVA
  IV. TEST CASES

CHAPTER 1

Introduction

1.1 Problem

Roskilde Festival is one of the biggest music festivals in Europe. It takes place near the city of Roskilde in Denmark. The average age of the participants is around 25 years; 80% of them come from Denmark and the rest from other Nordic countries and other parts of the world¹. The festival lasts eight days and participants can experience more than 100 concerts, parties and random events. In order to provide basic services like access to water, showers, electricity, food or Internet, a “temporary city” is built. This city covers an area of 1.6 [km²], of which 407 [m²] are dedicated to concert stages. Most participants live in a large camping zone located within the temporary city. The population density of this city is five times higher than that of Shanghai (one of the most densely populated cities in the world). This environment provides an excellent test case for collecting information about the participants. Even though we know statistical information about the participants and their environment (Marling & Kiib, 2011), we know nothing about the participants’ behavior. However, if we can collect data about people, we will be able to analyze it and discover the patterns that they follow. For example, by identifying a participant’s location at a specific time, we can associate him or her with a particular event (Stange, Liebig, Hecker, Andrienko, & Andrienko, 2011) (Larsen & Stopczynski, 2012). Furthermore, we can also infer information about participants’ music preferences, indicate the places where people gather and see their flow.

Bluetooth is the standard technology for short-range wireless communication and is present in most wireless devices. Therefore, it is very likely that the phones carried by participants are equipped with a Bluetooth antenna. The presence of this antenna can be detected by another Bluetooth device (which acts as a scanner) in a process called discovery². The detected device answers with its unique identifier (MAC address³).
Therefore, by performing discoveries it is possible to know which devices were around the scanner location at a given time. In several studies, this technique has proven successful for tracking people’s movements (Kostakos & O’Neill, 2009) (Jensen, Larsen, Jensen, Larsen, & Hansen, 2010). In 2011, at the 41st edition of Roskilde Festival, a study intended to collect data about people and explore social patterns using Bluetooth scanning was performed by (Larsen & Stopczynski, 2012). In that study, 33 smartphones were placed at different locations around the festival area. These devices were equipped with custom software that performed periodic Bluetooth scans. The phones were placed at fixed locations covering only 0.65% of the festival

1 Roskilde Festival, Nice to know: http://roskilde-festival.dk/presse/nice_to_know/
2 Wikipedia, Bluetooth: http://en.wikipedia.org/wiki/Bluetooth
3 Wikipedia, MAC Address: http://en.wikipedia.org/wiki/MAC_address

area. This limitation is mainly due to the fact that Bluetooth devices can discover similar devices only within a radius of 10 meters. The main purpose of this project is to provide a solution that increases the amount of data collected and the area covered compared with the previous research. Therefore, instead of installing several scanners around the festival area, we will use the participants’ smartphones as scanners. This might be possible because it is very likely that participants carry a smartphone during the festival. According to (TNS Gallup A/S, 2011), 50% of the Danish population between 25 and 29 years of age owns one. Moreover, if we consider the ideal case of people uniformly distributed within the festival area, only 4% of the participants are required to run the scanner application. In order to use a participant’s smartphone as a scanner, it is necessary to develop software for that purpose. This scanning software will perform periodic Bluetooth scans and send the collected data to an external repository. The software will be restricted to smartphones, since it is easy to deploy and install software on them. Nevertheless, performing Bluetooth discoveries on a smartphone requires the use of shared phone resources such as battery, processor power and networking. Energy is the crucial resource, because the possibility of recharging batteries is limited. For that reason, the frequency of discoveries (scans) should be carefully calculated to collect the highest amount of data with the minimum battery usage. Another issue is network coverage. Once the data is collected, it should be sent from the participant’s smartphone to an external repository. Unfortunately, due to the population density, network coverage is limited and it is difficult for a participant to get Internet access. Therefore, we have to investigate a reliable way of sending the data that takes the network issues into account.
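The two coverage figures quoted above (0.65% for the 33 fixed scanners and roughly 4% of the participants for full coverage) can be approximately reproduced with a simple back-of-the-envelope model. The sketch below is only an illustration under stated assumptions: each scanner is modelled as a disc of 10 m radius, discs do not overlap, and the festival area is taken as 1.6 km². Whether the original figures were derived exactly this way is an assumption, and the function names are hypothetical.

```python
import math

# Figures quoted in the text.
FESTIVAL_AREA_M2 = 1.6e6     # 1.6 km^2 "temporary city"
PARTICIPANTS = 130_000
SCAN_RADIUS_M = 10.0         # typical Bluetooth discovery radius
FIXED_SCANNERS_2011 = 33


def disc_area(radius_m: float) -> float:
    """Area covered by one scanner, modelled as a disc (assumption)."""
    return math.pi * radius_m ** 2


def coverage_fraction(n_scanners: int) -> float:
    """Fraction of the festival area covered by non-overlapping discs."""
    return n_scanners * disc_area(SCAN_RADIUS_M) / FESTIVAL_AREA_M2


def scanners_for_full_coverage() -> int:
    """Number of ideally placed scanners whose discs add up to the whole area."""
    return math.ceil(FESTIVAL_AREA_M2 / disc_area(SCAN_RADIUS_M))


coverage_2011 = coverage_fraction(FIXED_SCANNERS_2011)
needed = scanners_for_full_coverage()
share = needed / PARTICIPANTS

print(f"coverage with {FIXED_SCANNERS_2011} fixed scanners: {coverage_2011:.2%}")
print(f"scanners needed: {needed} ({share:.1%} of the participants)")
```

Under these assumptions the model gives about 0.65% coverage for 33 scanners and about 5,100 scanners (just under 4% of 130,000 participants) for the whole area. Real coverage would be lower, because discs cannot tile a plane without overlap and people are not uniformly distributed.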

Considering the aforementioned, we can define the following goals for this project:

1.1.1 Main goal

• Explore the possibility of collecting participants’ data at a massive event using distributed Bluetooth scanning, so that the data can be used to discover behavior patterns.

1.1.2 Secondary goals

• Analyze how Bluetooth scanning impacts the normal usage of a smartphone.
• Design, develop and deploy Bluetooth scanning software that can be used on a smartphone.
• Analyze the quality of the data collected.
• Explore the applicability of the data collected using this technique.

1.2 Methodology

In order to meet our goals, the project will be divided into five stages, which are presented in the Gantt chart of the project shown in Figure 1. Nevertheless, not every task in a stage depends on the previous stage. For example, we can start testing at an early stage of the implementation. The first stage consists of analyzing the state of the art of the techniques used to collect data about people (1.5 months). In the second stage, we will carry out a set of experiments, aiming to analyze the factors needed to design Bluetooth scanning software (2 months). In the third stage, we will design, implement and test the scanning software, taking into account the

results from the previous stages (3 months). Next, in the fourth stage, the solution will be deployed for public download. In this stage, the festival takes place and the actual data collection begins. Finally, the fifth stage corresponds to the analysis of the data collected.

Figure 1. Project Gantt chart

1.3 Thesis Structure

This thesis is divided into seven chapters. In Chapter 1, we give an introduction to the problem that is going to be solved. Chapter 2 describes previous work related to the use of Bluetooth for tracking people in cities and at massive events. In Chapter 3, we identify the main factors that influence the design of software for distributed scanning. Later, in Chapter 4, we carry out experiments in order to measure quantitatively the factors previously identified. In Chapter 5, we present the design, implementation, testing and deployment of the scanning software. In Chapter 6, we describe the data collected through several analyses. Additionally, we evaluate the performance of the scanning software and describe a simple social network analysis to show what can be done with the data. Finally, in Chapter 7, we present the conclusions and future work.

1.4 Distribution of work

The thesis was done in a strictly collaborative way. Theoretical issues like design, methodology, mathematical calculations and difficult algorithms were discussed together in brainstorming sessions prior to implementation. The development was done in an incremental and interchangeable way, where the core functionality was created first, and then the details. The same rule was applied to the writing, where the authors reviewed each other’s text and usually added missing details. However, each of us was responsible for particular topics. Below is the specification of the responsibilities, where the sections in bold were written together.

1.4.1 Work done by Łukasz Dynowski

Writing:
• Introduction
• Related work
• Analysis
  o Technical challenges
  o Market analysis
  o Previous festival applications
  o Risk analysis
• Experiments
  o Bluetooth discoverability
  o Theoretical calculations
• Development
  o Architecture: Mobile Components
  o Requirement analysis
  o Databases
  o Implementation: Technologies used, Monitoring tools and Testing
• Results
  o General Statistics
  o Area covered: Traversed area covered per scanner
  o Discoveries efficiency: Intervals assigned by the algorithm
  o Location accuracy
  o Bluetooth discovery time
  o Additional results: Battery level, Manufacturers
  o Applicability
• Conclusions

Implementation:
• Mobile and server databases
• Distortion CPH 2012 testing application
• Web service: SQLite parser
• Monitors: Browser monitors
• Testing
• Social network analysis

1.4.2 Work done by Marcos Fuentes

Writing:
• Introduction
• Analysis
  o Movement tracking using smartphones and Bluetooth
  o Technical feasibility analysis
• Experiments
  o Energy consumption
  o Energy consumption with different intervals
  o Data analysis of Roskilde Festival 2011
• Development
  o Architecture: Server components
  o Scanner Library
  o Schedulers
  o Periodic tasks
  o Coordination between library instances
  o Implementation: Database, Webservices and Deployment
• Results
  o Data preparation
  o Area covered: Percentage of the festival and concert covered
  o Discoveries efficiency: Total occurrences and unique devices per discovery, Energy efficiency, Distribution of discovered devices
  o Library running time and front-end impact
  o Network coverage
  o Data collected and market share
  o Additional results: Daily walking speed
• Conclusions

Implementation:
• Applications used in the experiments
• Scanner Library
• Web service: Package reception, data compression, POST requests, SQLite parser
• Monitors: Unity 3D monitor
• Testing
• Festival area covered analysis

CHAPTER 2

Related Work

2.1 Introduction

Bluetooth is a wireless communication technology designed for exchanging data between devices in Wireless Personal Area Networks⁴ (WPAN). This technology was created as a replacement for wired connections over short distances, and its range is typically a radius of 10 meters (Woodings, Joos, Clifton, & Knutson, 2001). At the time of writing this document, Bluetooth technology is 18 years old⁵. Bluetooth adapters can now be found in almost every mobile phone and in some other devices, like cars, printers or laptops. Prior to establishing a connection between two mobile devices, the one that starts the connection needs to know which devices are within its reach. This process is called discovery. In order to do so, the device interested in establishing the connection broadcasts inquiry messages to its environment. If another device within its proximity is enabled to answer (it is discoverable), it will respond with its MAC address and clock (Woodings, Joos, Clifton, & Knutson, 2001). For the purposes of this project, the device that performs a discovery is called the scanner. The detected devices are simply called discovered devices. Additional information can be obtained along with the inquiry response. First, the discovered devices send their class of device (CoD) code, which indicates the type of device. Second, the time when the device was discovered. Finally, the received signal strength (RSSI)⁶, which could be used to estimate the user’s distance from the Bluetooth adapter by multilateration⁷.

Beyond data exchange, this technology can also be used to track people’s locations by detecting the mobile phones around a scanner. If a device is discovered, it can be associated with the scanner’s location and the time of discovery. Massive events offer an exceptional opportunity for collecting data on discovered devices, due to their population density. One step further is the analysis of the collected data in order to establish social relationships between users.
In the following sections, we will present several studies that aim to track people at massive events and show how the resulting data can be used for social analysis.
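The information obtained per discovery, as described in this section, can be pictured as a small record that ties a discovered device to the scanner and the time of the scan. The sketch below is illustrative only: the class name `Discovery`, its field names, the helper `normalise_mac` and all sample values are hypothetical and are not taken from the software described in this thesis.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Discovery:
    """One discovered device, as seen by one scanner (hypothetical layout)."""
    scanner_id: str      # identifies the scanner, and through it the location
    device_mac: str      # unique MAC address of the discovered device
    device_class: int    # class-of-device (CoD) code reported by the device
    rssi_dbm: int        # received signal strength at discovery time
    seen_at: datetime    # time when the device was discovered


def normalise_mac(mac: str) -> str:
    """Return a MAC address in upper-case, colon-separated form."""
    digits = mac.replace(":", "").replace("-", "").upper()
    if len(digits) != 12 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


# Made-up sample values, for illustration only.
record = Discovery(
    scanner_id="scanner-01",
    device_mac=normalise_mac("00-1a-7d-da-71-13"),
    device_class=0x5A020C,
    rssi_dbm=-67,
    seen_at=datetime(2012, 7, 5, 21, 30, tzinfo=timezone.utc),
)
print(record)
```

Normalising the MAC address before storing it is a design choice of this sketch: it makes the address usable as a stable key when the same device is reported by several scanners.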

4 Wikipedia, Personal area network: http://en.wikipedia.org/wiki/Personal_Area_Network
5 Wikipedia, Bluetooth, Bluetooth vs. Wi-Fi (IEEE 802.11): http://en.wikipedia.org/wiki/Bluetooth#Bluetooth_vs._Wi-Fi_.28IEEE_802.11.29
6 Wikipedia, RSSI: http://en.wikipedia.org/wiki/RSSI
7 Wikipedia, Multilateration: http://en.wikipedia.org/wiki/Multilateration

2.2 Tracking people movements using Bluetooth

In this section, we will describe four studies that used Bluetooth as a technique to track people’s movements. These studies were carried out in cities and at massive events. In 2009, a study focused on analyzing human movement patterns was carried out at the Ghent street festival by (Versichele, Neutens, Delafontaine, & Weghe, 2012). Ghent is a city in Belgium with a population of 243,000⁸ inhabitants. During the 10 days of the festival, around 2 million people visit the city. To measure the movement of people, the researchers used 22 USB⁹ Bluetooth adapters, situated at fixed positions around the city. These adapters were used as scanners and belonged to two Bluetooth power classes¹⁰. The first class (class 1, 100mW) is able to detect other Bluetooth devices within a range of 100 [m]. The second class (class 2, 3.6mW) covers a radius of up to 10 [m]. Each participant of the festival was identified using the unique MAC address¹¹ of his or her mobile device. The authors of this research focused on analyzing the data in terms of people’s location and time. The collected data allowed the researchers to distinguish participants as “night owls” (with mainly nightly activities) and “morning persons” (mainly morning activities). The occurrences of a participant over the time span of the festival indicated whether the person was a one-day visitor, a local person, or a tourist staying a few days. In addition, placing Bluetooth antennas near the train stations allowed monitoring the flow of people. Another relevant finding, inferred from where guests gathered, is that during the day people are explorers moving everywhere, but at night they stay in the same place most of the time (usually bars).

The Cityware project (Vassilis Kostakos, Eamonn O’Neill, 2010) is another attempt to obtain spatio-temporal data.
Originally started in London, the project uses Bluetooth devices to track the position of people in the city, some of whom could be associated with their Facebook accounts. To obtain the users’ locations, Cityware deployed nodes that acted as scanners. Nodes are computers continuously scanning for Bluetooth devices. The collected data is later sent to a main Cityware server, where it can be analyzed. However, to make it possible to relate a detected device to a Facebook account, the user has to enter the MAC address of his or her mobile device into the Cityware Facebook application. The user also has the possibility of scanning for other users by turning his or her computer or smartphone into a scanner. Cityware seems to be the first project that allows the fusion of real and online social networks. The two previous studies determine the user’s location through the discovery process (called inquiry-based tracking). Although this technique is effective, it is not necessarily efficient. The disadvantages are the following. First, the discovered device has to be in Bluetooth’s discoverable mode. Second, periodic scanning is required to discover devices. Third, the minimum time to perform a discovery is 10.24 s (Peterson, Baldwin, & Kharoufeh, 2006). Therefore, dynamic scanning is impossible. These conclusions have been confirmed by (Simon Hay and Robert Harle, 2009) in a research paper, where they propose a technique called connection-based tracking. In this technique, the scanner does not perform discoveries but attempts to establish a connection to each device on a list of devices it already knows. The advantages of this proposal are extended battery life for the devices (low energy consumption) and a reduction of the time elapsed until a device is marked as detected. In the best case, the detection time was 1.28 s and in the worst 5.12 s, which is competitive with the inquiry-based solution. On the other hand, this technique does not detect unknown devices.
Therefore, it is necessary to create a database of MAC addresses (in (Simon Hay and Robert Harle, 2009) this data was entered

8 Wikipedia, Ghent: http://en.wikipedia.org/wiki/Ghent
9 Wikipedia, Universal Serial Bus: http://en.wikipedia.org/wiki/Universal_Serial_Bus
10 Wikipedia, Bluetooth: http://en.wikipedia.org/wiki/Bluetooth
11 Wikipedia, MAC Address: http://en.wikipedia.org/wiki/MAC_address

manually before the experiment). Additionally, the time required for a full detection is proportional to the number of devices in the database. For example, performing a full detection with 1000 devices registered might take up to 85 [min] (5.12 s x 1000).

The most relevant research for this project was carried out by (Larsen & Stopczynski, 2012). In order to record the presence of Roskilde participants in the festival area, the authors used 33 Nokia N900 phones as scanners. These devices were located at fixed places where participants might pass by frequently (music stages, eating places, etc.). The scanning frequency for discovering the participants’ devices was set to a high value, giving the authors 2 scans per minute. The collected data was stored on the scanner’s SD card as an SQLite¹² database and later sent to a server. After the festival finished, the collected data was analyzed and associated with the festival metadata (such as the time schedule and music genres) in order to discover patterns. The authors were able to analyze individual music tastes, as well as relationships between participants. However, scanners placed at fixed locations could only cover a small section of the festival area.
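The trade-off between inquiry-based and connection-based tracking discussed earlier in this section can be made concrete with a little arithmetic. The sketch below uses only the timings quoted in the text (10.24 s per inquiry, 1.28 s to 5.12 s per connection attempt). The function names are hypothetical, and the model assumes that connection attempts are made strictly one after another, which is a simplification.

```python
# Timings quoted in the text, in milliseconds to keep the arithmetic exact.
INQUIRY_MS = 10_240        # minimum duration of one inquiry-based discovery
CONNECT_BEST_MS = 1_280    # best-case connection-based detection of one device
CONNECT_WORST_MS = 5_120   # worst-case connection-based detection of one device


def sweep_minutes(known_devices: int, per_device_ms: int = CONNECT_WORST_MS) -> float:
    """Time to try every device on the list once, one after another."""
    return known_devices * per_device_ms / 60_000


def break_even_list_size(per_device_ms: int) -> int:
    """Largest list that can be swept in no more time than a single inquiry."""
    return INQUIRY_MS // per_device_ms


worst_case = sweep_minutes(1000)                   # about 85 minutes
best_case = sweep_minutes(1000, CONNECT_BEST_MS)   # about 21 minutes

print(f"1000 devices: {best_case:.1f} to {worst_case:.1f} minutes per sweep")
print("break-even list size:",
      break_even_list_size(CONNECT_WORST_MS), "to",
      break_even_list_size(CONNECT_BEST_MS), "devices")
```

Under these assumptions, a connection-based sweep is only faster than a single inquiry for lists of roughly two to eight devices, which illustrates why the approach does not scale to an event with an unknown and very large population.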

2.3 Gaining information from the data collected

People who gather at massive events usually have similar goals. They are attracted by concerts and sporting events and also want to be part of the unique atmosphere of the event. Therefore, it would be interesting to know not only statistics about an event but also more about its social aspects. For example: What are the social relations between people located relatively close to each other? Do they tend to appear alone or in groups? Are they one-day guests or do they stay for the whole event? In the following, we present relevant works that answer some of these questions and show the current progress on the subject of social analysis at massive events.

Wireless Rope (Tom Nicolai, Nils Behrens, Eiko Yoneki, 2006) is a project that took place during a conference. After downloading a dedicated application for this project, the users started to create groups. The rope that held them together was the Bluetooth network. The authors of this experiment focused on the relationships between people and objects at the conference. The interesting aspect was how the participants of the experiment were categorized. For example, the authors were able to distinguish three types of users. First, the stranger: this is each newly discovered device around a person. Second, the familiar stranger: this is a user who appears repeatedly. Finally, the watcher: a user who detects a person of interest every time this person approaches or leaves him. This work gives us insight into how to distinguish users, and it also focuses on the context in which the experiment took place.

Understanding human mobility patterns is a study by (Marta C. González, César A. Hidalgo, Albert-László Barabási, 2008). In this research, a sample containing the trajectories of 100,000 anonymous people out of 6 million was studied.
In this experiment, the authors considered many variables associated with human mobility patterns (like transportation, population size and job location). Every time a user picked up a call on his or her mobile phone, the location was determined using the cell antenna position plus the signal strength. The result was that human beings regularly follow the same paths, and the probability of repetition is related to the path distance. In the previous section, we described the Cityware project (Kostakos & O’Neill, 2009), which merges an online social network (Facebook) with the user’s location. The data analysis consisted of obtaining the time a user spent at a physical place. An interesting visualization made by Cityware is shown in Figure 2. Each node represents a person detected by the scanners, while its size illustrates the time that the person spent in a location. The color

12 Wikipedia, SQLite: http://en.wikipedia.org/wiki/SQLite

represents the betweenness (Tsvetovat & Kouznetsov, 2012) of the node, which indicates intermediaries in the social network. In addition, Cityware applied centrality measures and community detection to identify the most influential people in the social network.

Figure 2. Cityware project: The graph illustrates the presence of people and the encounters between them in a bar. The node size represents the time spent in a location and the color its betweenness (Vassilis Kostakos, Eamonn O’Neill, 2010)

In the experiment carried out at the Ghent festival in Belgium (Versichele, Neutens, Delafontaine, & Weghe, 2012), the authors analyzed location popularity (how many unique devices were detected per scanner) and the distribution of the change rate of people per day (hour). By detecting repeated devices, the authors could identify one-day visitors or returning spectators.
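Two of the measures mentioned above, location popularity and the distinction between one-day visitors and returning spectators, can be sketched with a few lines of code. This is a minimal illustration and not the method of the cited study: the detection tuples, the dates and the rule "seen on more than one calendar day means returning" are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical detections: (scanner id, device MAC, time of detection).
detections = [
    (1, "AA", datetime(2009, 7, 18, 14, 0)),
    (2, "AA", datetime(2009, 7, 18, 15, 0)),
    (1, "BB", datetime(2009, 7, 18, 14, 5)),
    (1, "BB", datetime(2009, 7, 20, 22, 0)),
    (2, "CC", datetime(2009, 7, 19, 11, 0)),
]


def location_popularity(rows):
    """Number of unique devices detected by each scanner."""
    devices = defaultdict(set)
    for scanner_id, mac, _when in rows:
        devices[scanner_id].add(mac)
    return {scanner_id: len(macs) for scanner_id, macs in devices.items()}


def visitor_type(rows):
    """Label a device by the number of distinct days on which it was seen."""
    days = defaultdict(set)
    for _scanner_id, mac, when in rows:
        days[mac].add(when.date())
    return {
        mac: "one-day visitor" if len(seen) == 1 else "returning spectator"
        for mac, seen in days.items()
    }


popularity = location_popularity(detections)
visitors = visitor_type(detections)
print(popularity)
print(visitors)
```

Counting unique MAC addresses rather than raw detections matters here, because a device that stays near a scanner is detected many times and would otherwise inflate the popularity of that location.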

Figure 3. Flow and direction of people during one afternoon at the Ghent festival. Each node represents a scanner and its number is the scanner id. The number on each arc is the number of people going from one node to the other

The result of this analysis can be represented as a directed graph, like the one shown in Figure 3. Each arc’s arrow shows the direction of people’s movement and its number the number of people moving. The number inside each node indicates the scanner identifier. The analysis in this project was oriented more toward the spatio-temporal presence of people and their flow than toward studying the relationships between them. In (Larsen & Stopczynski, 2012), the analysis of the collected data focused on finding the relationships established between participants. The event was Roskilde Festival in 2011. One of the analyses performed by the researchers aimed to find the micro-groups that

participants formed. For example, if two people appear together at different places, a relationship between them is created. The same applies to bigger groups, describing bipolar, triangular and other relations, which can be seen in Figure 4.

Figure 4. Micro-groups and relations between users. Every node represents a user and the arcs a relationship between them.
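The co-occurrence idea described above can be illustrated with a minimal Python sketch. It is not the method used by Larsen & Stopczynski; the observation format `(time_slot, place, user)`, the function name `find_pairs` and the threshold `min_places` are assumptions made only for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def find_pairs(observations, min_places=2):
    """Return the pairs of users seen together at >= min_places distinct places.

    observations: iterable of (time_slot, place, user) tuples (hypothetical format).
    """
    # Group the users detected at the same place during the same time slot.
    present = defaultdict(set)
    for time_slot, place, user in observations:
        present[(time_slot, place)].add(user)

    # For every pair, collect the distinct places where both appeared together.
    shared_places = defaultdict(set)
    for (_, place), users in present.items():
        for pair in combinations(sorted(users), 2):
            shared_places[pair].add(place)

    return {pair for pair, places in shared_places.items()
            if len(places) >= min_places}


# Invented example data: u1 and u2 stay together, u3 goes elsewhere.
observations = [
    (0, "stage_A", "u1"), (0, "stage_A", "u2"), (0, "stage_A", "u3"),
    (1, "food_court", "u1"), (1, "food_court", "u2"),
    (1, "stage_B", "u3"),
]
pairs = find_pairs(observations)
```

In this invented data set only u1 and u2 appear together at two different places, so they are the only pair reported. Larger groups could be derived from the pairs, for example by looking for triangles in the resulting graph.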

Another interesting aspect measured in this study was the relationship between people and the context of the festival. By knowing the festival schedule, it was possible to determine the music genre of every band and the time and location where they performed. This information was cross-referenced with the users’ locations in order to determine the most popular music genres and the profiles of the users who attended them.

2.4 Summary

In this chapter two important aspects were studied: first, the state of the art related to the use of Bluetooth technology for tracking people; second, the possibilities of analyzing the data collected in massive events. Although each data collection technique has advantages and disadvantages, the most suitable approach for this project is the discovery process. This approach allows us to search for devices without previous information about the people carrying them. Additionally, the chapter showed how the collected data can be described as relationships between users and as people movement, and how it can be associated with metadata.

CHAPTER 3

Analysis

3.1 Overview

What the previous studies have in common is the method used to collect the participants’ data. Basically, it consisted of placing scanners at different locations within the area to be scanned. These scanners were Bluetooth-enabled mobile phones connected to a continuous power source and permanently connected to the Internet. In other words, the researchers had control over the scanners. However, the deployment of a network of scanners requires a logistic effort, and the amount of data collected depends on the number of scanners.

In order to increase the number of scanners, this project proposes the use of the smartphones carried by the participants of an event. This is possible because any Bluetooth-enabled smartphone can work as a scanner, as long as it is running software for that purpose (scanning software from now on). Nevertheless, in this approach the devices are under the participants’ control. They decide when to reset the phone, stop applications or recharge batteries. The authors have no control over the scanners.

In this chapter, we identify the issues that must be taken into account to design scanning software. Among these issues are: energy efficiency, limited access to battery recharging, technical feasibility, low reliability of network access and how to encourage the participants to collaborate by downloading the software.

3.2 Movement tracking using smartphones and Bluetooth

In Chapter 2, we described Bluetooth as a technology intended for data exchange that can also be used for movement tracking (thanks to the discovery feature). If the scanner location is known or can be retrieved, the discovered devices can be associated with that location. For example, in Figure 5 there is one scanner (scanner1) that can discover the devices within its range (device1, device2, …, device6). We may suppose that the devices are in an area of 10-meter radius, with the scanner at the center.

[Figure 5: scanner1 at the center of the covered area, surrounded by device1 to device6. Axes: longitude (horizontal), latitude (vertical).]

Figure 5. Bluetooth scanning using custom scanning software. In the center, scanner1 is a smartphone running the scanning software. The blue area represents the area covered, and the devices shown are the devices discovered.

We can define a scan as the set of devices discovered at a given time by a particular phone, with the following notation:

$$S_{\text{scanner}}(\text{time}) = \{\text{device}_1, \text{device}_2, \ldots, \text{device}_n\}$$

In order to associate the discovered devices with a location, the scanner must obtain its own global location by using a positioning technology such as GPS, 3G antenna(s) or, alternatively, Wi-Fi. The position of the scanner at a given time can be defined by:

$$L_{\text{scanner}}(\text{time}) = (\text{latitude}, \text{longitude})$$

When two or more scanners are performing scans at the same time and they are close to each other, their information can even be combined. For example, we can estimate the direction in which a detected device is moving by calculating its motion vector. This situation is illustrated in Figure 6, where we have the following scans performed by the scanners 0 and 1 at t0 and t1:

S0(t0) = {0,1,2,4} S0(t1) = {1,3,4}

S1(t0) = {3} S1(t1) = {2,4}
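As an illustration only, the definitions of S and L above could be held in a small record type. The following Python sketch stores the four example scans; the class name `Scan`, the helper `locate` and the coordinates are hypothetical placeholders, not part of the thesis software.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Scan:
    scanner_id: int
    time: int                      # index of the scan instant (t0 -> 0, t1 -> 1)
    location: Tuple[float, float]  # L_scanner(time) = (latitude, longitude)
    devices: FrozenSet[int]        # S_scanner(time), ids of the discovered devices


# Placeholder coordinates, invented for the example.
L0 = (0.0, 0.0)
L1 = (0.0, 0.0002)

scans = [
    Scan(0, 0, L0, frozenset({0, 1, 2, 4})),  # S0(t0)
    Scan(0, 1, L0, frozenset({1, 3, 4})),     # S0(t1)
    Scan(1, 0, L1, frozenset({3})),           # S1(t0)
    Scan(1, 1, L1, frozenset({2, 4})),        # S1(t1)
]


def locate(device, time, scans):
    """Return the scanner locations at which `device` was seen at `time`."""
    return [s.location for s in scans if s.time == time and device in s.devices]


seen_at_t1 = locate(4, 1, scans)
```

Here `locate(4, 1, scans)` returns both L0 and L1, because device 4 is detected by both scanners at t1.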

Figure 6. a) Scans performed by scanners 0 and 1 at t0. b) Scans performed by scanners 0 and 1 at t1. The blue area is the area covered and the circles are the discovered devices with their ids.

Taking the information from the scans at t0 and t1, we can generate a picture of what is likely to have happened between them. For example, device 0 appears only in S0(t0) and in no scan at t1. We only know its initial location, so we can say that it moved out of the area centered on L0, but we know nothing about its direction. Device 1 remained in L0, while devices 2 and 3 are moving in opposite directions at the same speed. Finally, device 4 is moving towards L1 but more slowly, since it is still detected by S0 at t1.
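The reasoning above can be written as a few set comparisons. The following Python sketch is only an illustration under simple assumptions: the function name `classify` and the textual labels are invented here, and no real motion vector (which would need the scanner coordinates) is computed.

```python
def classify(device, scans_t0, scans_t1):
    """Describe the movement of `device` between two consecutive scan rounds.

    scans_t0, scans_t1: dict mapping scanner id -> set of discovered device ids.
    """
    before = {s for s, devices in scans_t0.items() if device in devices}
    after = {s for s, devices in scans_t1.items() if device in devices}

    if not before and not after:
        return "never seen"
    if not after:
        return "left, direction unknown"
    if not before:
        return "appeared, origin unknown"
    if before == after:
        return "stationary"

    arrived = sorted(after - before)
    if not arrived:
        # Seen by fewer scanners than before, but by no new one.
        return "still in range, direction unknown"
    target = ", ".join(f"L{s}" for s in arrived)
    if before & after:
        # Still detected by a previous scanner: the device moved less.
        return f"moving slowly towards {target}"
    return f"moved to {target}"


# Scans of the example: S0(t0), S1(t0) and S0(t1), S1(t1).
scans_t0 = {0: {0, 1, 2, 4}, 1: {3}}
scans_t1 = {0: {1, 3, 4}, 1: {2, 4}}
movements = {d: classify(d, scans_t0, scans_t1) for d in range(5)}
```

With the example scans, device 0 is reported as having left with unknown direction, device 1 as stationary, devices 2 and 3 as moved to L1 and L0 respectively, and device 4 as moving slowly towards L1.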


Figure 7. Motion vectors using two consecutive scans. Devices 2 and 4 move towards the right area and device 3 towards the left. Device 1 remains in the same position and the direction of device 0 is unknown.

In order to have a constant input of information during a period of time, the scanning software must determine how often it will perform Bluetooth discoveries and location updates. This is not a trivial question: if the scanning frequency is too low, much data about the devices will be lost. On the other hand, as discoveries and location updates consume energy, a high scanning frequency can drain the battery very quickly.

Finally, the data collected from the scans should be saved temporarily in the phone and then be sent to a repository located on an external server for further analysis. It is up to the custom software to decide how often the data should be sent and to take precautions if there is no network availability.

3.3 Technical challenges

The lack of control over the smartphones requires that the scanning software operate with a certain level of autonomy while using energy properly. The main concern is that the participants use their phones for their daily activities and, consequently, the software has to share the phone's resources with them. In the next subsections, we identify the issues that scanning software must address. In massive events, access to a power plug to recharge phones is usually limited. At Roskilde Festival, the only option to charge mobile phones is at the wardrobes, for a fee (roskilde-festival, 2012).

3.3.1 Energy consumption

Every task performed on the mobile phone consumes energy. In the particular case of distributed Bluetooth scanning, the hardware elements that consume energy are the processor, Bluetooth, GPS and network access. Each application running on the mobile device requires the use of the CPU. The operating system tries to assign CPU cycles to every task while keeping real-time responsiveness. However, when many applications are running at the same time, the system cannot guarantee

real-time responsiveness at the current CPU clock speed. To overcome this, the system temporarily increases the CPU speed, but this also increases the energy consumed (Zhang & Chanson, 2004). Another task that consumes energy is access to the network, which is used to send the collected data to an external repository. Nevertheless, the most energy-consuming task is receiving location updates using GPS.

As can be seen, any application intended for a massive event requires careful usage of the battery. For example, there is no need to perform scans every minute when it is likely that there are other scanners around or that most of the people are stationary. How often the software should perform scans without missing information will be analyzed in the next chapter, as well as how much battery the elements described in this section consume.

3.3.2 Limited Network Access

Normally, a smartphone can obtain Internet access from either Wi-Fi or 3G networks. These networks have different advantages and disadvantages. On one hand, Wi-Fi networks can provide high bandwidth, but their coverage is around 95 m¹³. On the other hand, 3G networks offer wider coverage at a lower bandwidth.

In massive events, the network coverage changes dramatically compared to normal situations. In previous editions of Roskilde Festival, many people experienced connection problems related to 3G networks (Dagbladet Politiken, 2011). This is mainly due to the small number of 3G cells compared to the number of users. On the other hand, providing Wi-Fi access for the whole festival area requires setting up around 144 access points (considering an average range of 95 m radius). Wi-Fi access at Roskilde Festival will be provided for the first time in the 2012 edition.

As mentioned above, the scanning software needs Internet access to send the collected information. Considering the issues presented in this section, the software should be designed in such a way that it can cope with unstable network access. Additionally, the data could be compressed prior to sending in order to reduce the network usage.
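A minimal Python sketch of these two ideas, compressing before sending and keeping the data until the network is available, is given below. It is only an illustration: the class `UploadQueue`, the JSON record format and the `send` callback are assumptions, and a real implementation on the phone would also need persistent storage.

```python
import json
import zlib
from collections import deque


class UploadQueue:
    """Keeps compressed scan records until they can be sent."""

    def __init__(self, send):
        self._send = send          # hypothetical transport: callable(bytes) -> bool
        self._pending = deque()

    def add(self, record):
        # Serialize and compress each record before queuing it.
        payload = zlib.compress(json.dumps(record).encode("utf-8"))
        self._pending.append(payload)

    def flush(self):
        """Send pending payloads in order; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break              # no network: keep the data for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)


# Simulated server side and network state, for the example only.
network = {"up": False}
received = []


def fake_send(payload):
    if not network["up"]:
        return False
    received.append(json.loads(zlib.decompress(payload).decode("utf-8")))
    return True


queue = UploadQueue(fake_send)
queue.add({"scanner": 0, "time": 0, "devices": [0, 1, 2, 4]})
queue.add({"scanner": 0, "time": 1, "devices": [1, 3, 4]})
sent_offline = queue.flush()   # network down: nothing is sent, nothing is lost
network["up"] = True
sent_online = queue.flush()    # network back: both records are delivered
```

Stopping at the first failed send keeps the records in order and avoids repeated attempts while the network is down, which also saves energy.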

3.3.3 Attractiveness

The more scanners operating in the area of a massive event, the more data can be captured. Taking this into consideration, it is necessary to encourage the spectators to download and use the software. The ideal way to do this is to deploy it not as a stand-alone application but as a library. This library could be attached to other Roskilde Festival applications (front-end applications, for the purposes of this project), such as games or schedule information. The data collection could be performed in the background by the library without interfering with the performance of the front-end application. In this project, we collaborated with three students from DTU who were working on front-end applications, to which the library was attached.

3.4 Hardware available for the research

In this project we will use both smartphones provided by DTU and our own resources. The table below shows the specifications of the mobile devices available for research purposes.

13 Wikipedia, WiFi: http://en.wikipedia.org/wiki/Wi-Fi


| | Apple iPhone 3GS | HTC Desire (2 units) | HTC Wildfire | Samsung C3510 Genoa | Nokia 6300 |
|---|---|---|---|---|---|
| Smartphone¹⁴ | Yes | Yes | Yes | No | No |
| Release date | June 2009 | February 2010 | May 2010 | December 2009 | January 2007 |
| Application environment | iPhone SDK | Android SDK | Android SDK | Java 2 ME | Java 2 ME |
| Sensors | Accelerometer, proximity, compass | Accelerometer, proximity, compass | Accelerometer, proximity, compass | None | None |
| Bluetooth | v2.1 with A2DP, headset support only | v2.1 with A2DP | v2.1 with A2DP | v2.1 with A2DP | v2.0 |
| GPS | Yes, with A-GPS | Yes, with A-GPS | Yes, with A-GPS | No | No |
| Network speed | HSDPA, 7.2 Mbps | HSDPA, 7.2 Mbps | HSDPA, 7.2 Mbps | EDGE, 236.8 kbps | EDGE, 236.8 kbps |
| Battery | Li-Ion, 1219 mAh | Li-Ion, 1400 mAh | Li-Ion, 1300 mAh | Li-Ion, 960 mAh | Li-Ion, 860 mAh (BL-4C) |
| Battery life, standby | Up to 300 h | Up to 360 h | Up to 690 h | Up to 720 h | Up to 348 h |
| Battery life, talking | Up to 5 h | Up to 6,5 h | Up to 8,2 h | Up to 10 h | Up to 3,5 h |

Table 1. Mobile phones available for the project.¹⁵

3.5 Mobile Platforms

In this section we analyze the platforms for which we will design the scanning software. The decision will be based on their popularity and on how accessible the smartphone's Bluetooth device is through the platform API.

3.5.1 Market Analysis

According to an analysis of the Danish smartphone market, two platforms clearly have the highest share. Android takes first place, covering 48% of the market (IDC, 2012). The iPhone is second with 36% (Asymco, 2011). Therefore, developing applications for both platforms would cover most of the mobile market. Symbian is in third place with 13%. This information is shown in Figure 8.

14 Although there is no official definition of what a smartphone is, the term usually refers to the ability to run third-party applications with a high level of integration with the phone. http://www.pcmag.com/encyclopedia_term/0,2542,t=Smartphone&i=51537,00.asp
15 GSMArena.com: http://www.gsmarena.com/

[Figure 8: pie chart. Android 48%, iOS 36%, Symbian 13%, Others 5%.]

Figure 8. Danish mobile market share by platform.

The diagram in Figure 9 shows the daily Internet usage for different age groups. 66% of the users in the 18 to 29 age group access the Internet multiple times per day. This supports the idea that the Roskilde Festival participants can act as potential scanners, since they are in this age range and own a smartphone.

[Figure 9: bar chart. Vertical axis: percentage of users (0% to 70%). Horizontal axis: age (All ages, 18 to 29, 30 to 49, 50+). Series: Multiple times, 2-3 times, Once, Not at all.]

Figure 9. Daily Internet usage per age group.¹⁶ Every category indicates the frequency of Internet usage. Every bar shows the percentage of users for a given frequency.

Nowadays, social networks like Facebook, Twitter or Foursquare are shifting towards mobile devices. This might be explained by the progress of technology and by the unlimited Internet access that is often offered by 3G providers, which allows users to access popular social network services regardless of their location. These factors make the smartphone the first device used for checking emails or other related services. Figure 10 shows the common Internet tasks performed on smartphones by different age groups.

Figure 10. Mobile Internet access per age group. The colors represent the type of activity performed on the device.

16 Our Mobile Planet: http://www.ourmobileplanet.com/omp/T2071506754#view-the-chart

3.5.2 Previous Festival Applications

By analyzing applications related to Roskilde and other music festivals, we were able to find common features. These features are repeated across mobile platforms and applications, and include: schedule information, a map of the festival, information related to the festival, band biographies and information about the places to eat in the festival area. This analysis will help us to collaborate in the development of front-end applications, by knowing which of them are successful and which are not.

| Application | Platform | Downloads | Stars | Functionality | Description |
|---|---|---|---|---|---|
| Roskilde '11 ArtistRecommender | Android | 1000+ | 4.5 | Recommendation | This application focuses on the musical aspects of the festival, particularly on bands. Using Last.fm, the creators search for related artists, so that users can expand their musical experience. |
| Venner på Roskilde | Android | 100+ | 5 | Find Friend, Facebook picture | Find the friend: by marking the position of a friend on a map, the participants are able to find each other easily. Facebook photo: allows a user to upload pictures from the camera straight to the Facebook wall. |
| Roskilde Festival 2011 | Android, iPhone | 1000+ | 3.5 | Schedule | This application shows information related to the festival. |
| Fang Fanen! | Android | 100+ | 4.5 | Flag Fan | This application allows the user to take a flag, which is a virtual object associated with a user. |
| Festival Buddy | Android | 50+ | 4.5 | Tent Finder | This application allows the user to locate his or her tent. It works like a radar that indicates the distance and direction from the user to the tent. |
| Woodstock Pathfinder 2011 | Android | 500+ | 4.5 | Augmented reality | Using the phone’s camera, the user can see the position of the stage as an augmented point. |
| Open’er Festival | Android | 1000+ | 4.5 | Multimedia Content, Organizer | At every moment of the festival, the user can reach the movies and photos associated with an artist. It also allows users to make their own schedule, so that they do not miss their favorite bands. |
| Bonnaroo | Android | 10000+ | 3.5 | Radio | The user can listen to a radio without headphones. |
| Wireless Festival 2011 | Android | 1000+ | - | Drop Map | Dropping a mark on a map allows the user to set a meeting place, so that friends can see the location on a map and meet at the marked point. |

Table 2. Applications found on the Android Market and the App Store intended for massive events.

There are many applications made for various music festivals on the smartphone market. Even though some of them provide social interaction between users, their number of downloads is still not significant compared to the number of participants. Our assumption is that when a person is surrounded by thousands of people, he or she is less likely to interact with them using a mobile phone. On the other hand, the applications focused on sharing artist content seem to be the most successful ones, with more than 1000 downloads.

3.5.3 Technical Feasibility Analysis

In the previous section, we described that the most popular platforms in the Danish market are Android, iPhone and Symbian. This is not surprising, since they are also the most popular globally. Among these options, we should investigate which platforms are technically the most adequate for developing custom scanning software. This decision will be based on their API functionality and the time required for development.

3.5.3.1 Android platform

At first glance, the Android platform¹⁷ is our preferred one for the development: it owns the biggest portion of the market and both authors have previous experience developing applications for it. The development language is Java. The Android API provides access to the smartphone’s Bluetooth device via the BluetoothAdapter class¹⁸, which is available since API Level 5. This class contains methods for enabling and disabling the Bluetooth device and for performing discoveries. For security reasons, Android OS only allows setting the device in discoverable mode for a fixed period of time (300 [s] at most), and requires user confirmation. In other words, Android devices can “see” non-Android Bluetooth devices, but they cannot be reliably seen by other devices, since it is not possible to make them permanently discoverable.

From the developer’s point of view, this platform offers great flexibility for creating software components as external libraries. The most popular development environment is the Eclipse IDE, which contributes to the integration. Finally, the applications are published in the Android application store, namely Google Play¹⁹. The process requires creating a developer account for a 25 USD fee, which is immediately ready to use (without manual confirmation from Google). Similarly, there are no manual tests after the application is submitted. In conclusion, it is feasible to develop and deploy scanning software using Android technology.

3.5.3.2 iPhone platform

The iPhone is another interesting candidate for the development. It is the second most used platform in the Danish market and one of the authors has experience developing for it. The operating system is iOS and the development language is Objective-C. Unfortunately, iOS does not expose low-level Bluetooth access to the developer, for example to perform discoveries. To enable communication between two devices, it provides

17 Android Developers: http://developer.android.com/
18 Android Developers, Bluetooth adapter: http://developer.android.com/reference/android/bluetooth/BluetoothAdapter.html
19 Android apps in Google Play: https://play.google.com/

the GameKit library²⁰. Nevertheless, this can be used only to discover and connect to Bluetooth devices compatible with Apple’s Bonjour technology²¹. Another alternative is the unofficial BTStack²² library for iOS, which requires the use of a custom kernel (not supported by Apple) via the jailbreaking²³ process.

iOS applications are distributed through Apple’s App Store, which requires a membership of Apple’s Developer Program²⁴ for 99 USD. The store does not accept applications that require jailbreaking, and applications are tested manually once they are submitted, a process that can take up to 10 days. In summary, this platform is discarded because of the lack of access to the Bluetooth device and because there are no options to deploy applications not supported by Apple. Fortunately, the iPhone has no limit on Bluetooth discoverability, which allows devices working as scanners, such as Android phones, to discover it.

3.5.3.3 Symbian platform

The Symbian platform is a technology owned by Nokia that used to be one of the most popular in the mobile market. However, its popularity is decreasing dramatically due to Nokia's announcement of its switch to a different operating system²⁵. This technology has been proven successful for developing Bluetooth scanning software in a controlled environment (using fixed scanners) in (Larsen & Stopczynski, 2012). Applications for this platform can be developed in C++ (Nokia Corporation, 2005) or Java 2 ME (Nokia Corporation, 2004). Although Bluetooth development is feasible on this platform, its market share is only 13%. Moreover, there is little interest from developers in making front-end apps, which makes it difficult to find applications to which the scanning software could be attached. In conclusion, we have decided to discard this platform.

3.6 Risk Analysis

3.6.1 Front-end applications

One element to be considered as a risk is the participant who uses the front-end application. Since the front-end application contains the library that makes the data collection possible, the user might stop or uninstall it (if it has no value for the user or if it fails). This would directly affect the data acquisition. For this reason, the application has to be well designed in order to satisfy user needs and carefully tested to avoid integration failures with the library. Nevertheless, the development of these applications is the responsibility of the associated students, and the authors have only a small influence on it. In order to reduce the risk of failure, we will generate prototypes for other massive events before Roskilde Festival. This will help us to test the software and to have data to analyze.

20 iOS Development library: http://developer.apple.com/library/ios/#DOCUMENTATION/NetworkingInternet/Conceptual/GameKit_Guide/Introduction/Introduction.html
21 Apple Support, Bonjour: http://www.apple.com/support/bonjour/
22 BTStack: A Portable User-Space Bluetooth Stack: http://code.google.com/p/btstack
23 Wikipedia, iOS Jailbreaking: http://en.wikipedia.org/wiki/IOS_jailbreaking
24 Wikipedia, iOS Developer Program: https://developer.apple.com/programs/ios/
25 Engadget, RIP Symbian: http://www.engadget.com/2011/02/11/rip-symbian/

3.6.2 Human resources

Cooperation with the people related to this project requires communication skills, such as understanding the needs of others and the ability to help them. Having periodic meetings with the associate professors can help to reduce this risk. Another risk is that the project length may be underestimated or that the technology chosen may turn out to be the wrong one. Considering that the project has to be implemented before Roskilde Festival, a wrong decision can put the deadlines at risk. To reduce that risk, every decision should be assessed carefully in order to minimize the amount of lost work.

3.6.3 Technical resources

The rapid changes in technology and in the tools related to the development might affect the project directly. From the hardware point of view, we may face situations where the smartphone is not capable of performing the required task. This may impact the functionality provided and increase the development time. Another issue is the vast choice of devices and manufacturers for the Android platform, which makes it difficult to ensure that the application is tested on the devices of every manufacturer. A reasonable solution to this challenge is to test the application on the most popular smartphones, as well as with different versions of the operating system.

3.7 Summary

This chapter presented the aspects that we have to be aware of before designing the final solution. First, we presented an overview of the approach proposed in this project for tracking users. Next, we described the environmental issues to take into account in the design of the solution, such as limited access to the network, few possibilities for battery recharging and the impact of the front-end applications. These issues will be measured quantitatively in the following chapter.

Additionally, we examined the popularity of massive event applications. We were able to identify the key features that define which applications are more successful. This will help us to collaborate with the associated students who are also developing applications for the festival.

The analysis of the different smartphone platforms helped us to choose the Android platform for development. This decision is supported by the size of its market share and the accessibility of the Bluetooth sensor. We discarded the iPhone (despite its significant position in the market) because of the lack of access to the Bluetooth sensor. Finally, we described the critical risks that could affect the delivery of the project on time.

CHAPTER 4

Experiments

4.1 Overview

In the previous chapter, we identified the main factors that should be taken into account in order to develop custom software for Bluetooth scanning. In this chapter, we perform a set of experiments to measure these factors quantitatively.

The first experiment aims to determine the range of discoverability between the scanner and the detected devices. Additionally, we would like to identify which factors have an impact on it, such as obstacles, the device manufacturer, etc. Having the Bluetooth range and knowing the size of the festival area, we will estimate mathematically the number of scanners needed to cover the festival and the concert area. We will also calculate the minimum scanning frequency required to detect a person in the scanned area at least once.

One of the critical factors in the design of the solution is its energy efficiency. For this reason, we will perform a set of experiments aiming to measure the energy consumed on a smartphone by a Bluetooth discovery and a GPS location update. Nevertheless, the energy consumed by the software is closely related to the scanning frequency. We want to perform scans only when we expect to have devices around, thus obtaining more data with fewer scans. For that purpose, we will design adaptive algorithms that define the scanning frequency according to the flow of people. The efficiency of these algorithms will be simulated using the data collected at the Roskilde Festival in 2011 (Larsen & Stopczynski, 2012). With the empirical results of the experiments, we will be ready to start the design of the solution.

4.1.1 Bluetooth discoverability

A very important aspect to consider before designing the software solution is the Bluetooth discoverability range. Previously, we mentioned that in related research this value was 10 [m]. Nevertheless, we would like to determine empirically how this value changes for devices made by different manufacturers, and the impact of the environment on it.

4.1.1.1 Methodology

In order to measure these parameters, we created a mobile application for the Android platform that runs on an HTC Desire smartphone (leftmost device in Figure 11a). This application performs Bluetooth scans looking for 4 devices that were previously registered (the

other devices shown in Figure 11a). On every scan, the application shows which of these devices were detected (Figure 11b). Of the 4 devices used, 2 were smartphones. The experiment took place in an open-space area (GPS location 55.759484, 12.551986) isolated from elements that might cause interference (other devices, cars, etc.). The weather during the experiment was sunny (03 March 2012). Two people participated in the experiment. One person was responsible for keeping the devices in discoverable mode at all times (in the case of Android, this mode has a limit of 300 s, after which it has to be set again). The other person used the scanner running the experiment software and performed measurements at distances 2 [m] apart.

Figure 11. a) Devices used in the experiment. The leftmost is the scanner and the rest are the discoverable devices. b) Application developed for the experiment. The red icons show when a device is detected and the green ones when it is not.

4.1.1.2 Results

Figure 12 shows the relationship between the distance and the probability of detecting a device, according to its manufacturer. We can see that all devices are always detected up to a distance of 10 [m]. Even though all the devices used in the experiment are Bluetooth Class 2 devices, the smartphones were more likely to be discovered than the rest.

[Figure 12: line chart. Vertical axis: probability (30 to 100). Horizontal axis: distance [m] (0 to 50, in steps of 2). Series: HTC Wildfire, Nokia, iPhone, Samsung.]

Figure 12. Probability of finding the devices in the experiment according to the distance to the scanner

Another way of seeing this is as a circular probability function (see Figure 13). The blue gradient uses the average percentage obtained previously to represent the probability of being detected, while the green gradient shows interpolated data. In the center of the circle (the darkest blue, of 10 [m] radius) the probability of discovering a device is 100%. Beyond 12 [m], the discoverability of devices starts to decrease. On the edge of the blue circle (50 [m]) the probability of detecting a device is 50%.

Figure 13. Probability density function of detecting a device according to the distance from the scanner.

This same result can be seen as a distribution function. In Figure 14, it is clear that the probability of detecting a device decreases with the distance.

Figure 14. Probability distribution function of device discoverability. The blue region represents empirical data and the green interpolated data.
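For simulation purposes, the measured behavior could be approximated by a simple function. The Python sketch below assumes a linear decrease between the two values reported in the text (100% up to 12 [m], 50% at 50 [m]); the linear shape and the function name `detection_probability` are assumptions made for illustration, not the measured curve.

```python
def detection_probability(distance_m, full_range_m=12.0, max_range_m=50.0,
                          edge_probability=0.5):
    """Approximate probability of detecting a device at `distance_m` meters.

    Returns None beyond `max_range_m`, where nothing was measured.
    The linear decrease between the two ranges is an assumption.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m > max_range_m:
        return None
    if distance_m <= full_range_m:
        return 1.0
    fraction = (distance_m - full_range_m) / (max_range_m - full_range_m)
    return 1.0 - fraction * (1.0 - edge_probability)


p_near = detection_probability(10)   # inside the fully covered area
p_mid = detection_probability(31)    # halfway between 12 m and 50 m
p_edge = detection_probability(50)   # edge of the measured area
```

Under this assumption a device at 31 [m] would be detected with a probability of 75%, halfway between the two reported values.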

An interesting fact is that the human body can act as an obstacle. A scanner is more likely to detect a device if the people carrying them are facing each other. When one person has his or her back turned to the other, the discoverability does not go beyond 10 [m]. In the results, we only considered the values measured when the volunteers faced each other.

4.1.1.3 Conclusions

Although the manufacturers specify that the maximum coverage of Bluetooth is 10 [m], we were able to discover devices even at 50 [m]. The explanation might be found in the environmental conditions, which were free from obstacles that might cause interference.

Taking the empirical data as the frame of reference, we consider 10 [m] a feasible approximation of the discoverability distance.

4.1.2 Theoretical calculations

The scanning frequency is the most critical parameter for developing energy-efficient scanning software. If the frequency is set too low, many devices may be missed in the discovery process. On the other hand, if it is too high, the scanning device will perform so many scans that it will drain the battery too fast. In this section, we would like to obtain a frame of reference for the largest interval between scans that avoids potential data loss. Additionally, we will calculate how many scanners we need in order to cover the whole festival area.

4.1.3 Bluetooth scanning vs. people movement In this analysis we aim to obtain the maximum interval between scans that still captures, at least once, a person who is walking towards the scanner. The situation can be seen in Figure 15. If one person is passing through the area where the scan is performed, the interval between scans should be at most the time required to cross the area. This is a good approximation for an area that is not crowded.

Figure 15. Relationship between the diameter of a scanning area and the walking speed

If we consider that the average human walking speed is 1,4[m/s] (footnote 26) and the Bluetooth scanning radius is 10[m], an approximate maximum interval could be:

$$\text{interval} = \frac{2 \cdot 10}{1{,}4} = 14{,}28\ [\text{s}]$$

$$\text{frequency} = \frac{1}{\text{interval}} = 0{,}07\ [\text{Hz}]$$
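The calculation above can be sketched in a few lines of Python. The function name and the parameter names are illustrative only and do not come from the thesis software.

```python
def max_scan_interval(radius_m: float, speed_m_s: float) -> float:
    """Seconds a walker needs to cross the full diameter of the scan area."""
    return 2 * radius_m / speed_m_s


# Values used in the text: 10 m scanning radius, 1.4 m/s walking speed.
interval_s = max_scan_interval(radius_m=10.0, speed_m_s=1.4)
frequency_hz = 1.0 / interval_s

print(f"interval = {interval_s:.2f} s, frequency = {frequency_hz:.2f} Hz")
```

The unrounded interval is 20/1.4 ≈ 14.286 [s]; the value 14,28 [s] quoted in the text is the same number truncated to two decimals.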

However, the previous case does not take into account a crowded area. If we suppose that the scanned area is completely filled with people, it would be interesting to determine the average time required for all of them to leave the area (see Figure 16a). This gives us an insight into the maximum average interval needed to detect at least one person in a crowded area. The equation in Figure 16b was used for that purpose. Basically, it sums up the time required to leave the circle for every person inside it, and then this value is divided by the total amount of people. In our case, the amount of people inside the circle is 297 and we suppose that every person uses 1[m2] of space. The average time obtained was 2.8 [s].

[Diagram a): circular scanning area of diameter d = 20[m], with every person occupying A = 1[m2].]

$$\text{average time} = \frac{\sum_{i=1}^{10} \frac{10.5 - i}{v} \cdot (2i - 1)\pi}{\sum_{i=1}^{10} (2i - 1)\pi}$$

$$\text{average time} = 2.8\ [\text{s}]$$

Figure 16. a) People uniformly distributed in a crowded scanning area. Every person uses 1[m2] of space and all of them leave the area at the same time. b) Average time required for every person to leave the scanning area.

However, there is a technical limitation on the discovery time, which is 10,24 [s] (Simon Hay and Robert Harle, 2009). According to this, it is impossible to guarantee the maximum interval of 2.8 [s] for the crowded case. In the non-crowded situation described previously, the interval of 14,28 [s] can be guaranteed, but in practice it requires around 4 scans per minute. This frequency may consume too much energy, as we will see in the next sections. On the other hand, if many scanners perform scans in the same area, the frequency of each one can be reduced according to their total number.

26 Wikipedia, Walking: http://en.wikipedia.org/wiki/Walking

4.1.3.1 Festival area Following the previous results, we would like to treat the whole festival, and in particular the concert area, as the scanned area. This assumption is based on the ideal case in which all the scanners are evenly distributed. It gives us the maximum interval needed to detect a person at least once while they pass through the areas mentioned. The festival has an area of 1.576.000 [m2] and the concert area 166.000 [m2] (roskilde-festival, 2012). We can calculate the side length of both by considering them as squares (Figure 17 a). In the equations of Figure 17 b), the value tc represents the approximate time needed to cross the concert area and tf is the value for the whole area of the festival. Analogously, lc and lf represent the side lengths.

$$l_c = \sqrt{166000} = 407\ [\text{m}]$$

$$l_f = \sqrt{1576000} = 1255\ [\text{m}]$$

$$t = \frac{l}{v}$$

$$t_c = \frac{l_c}{v} = \frac{407\ [\text{m}]}{1.4\ [\text{m/s}]} = 290\ [\text{s}] = 4.8\ [\text{min}]$$

$$t_f = \frac{l_f}{v} = \frac{1255\ [\text{m}]}{1.4\ [\text{m/s}]} = 896\ [\text{s}] \approx 14.9\ [\text{min}]$$

Figure 17. a) Sizes of Roskilde festival concert and festival area b) Formulas used to determine the maximum intervals to detect a person in the concert and festival area
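As a quick check of these figures, the crossing times can be recomputed as follows. This is only a sketch of the arithmetic under the same assumptions as the text (square areas, constant walking speed of 1.4 [m/s]); the function and constant names are made up for the example.

```python
import math

WALKING_SPEED_M_S = 1.4  # average walking speed used in the text


def crossing_time_s(area_m2: float, speed_m_s: float = WALKING_SPEED_M_S) -> float:
    """Seconds needed to walk along one side of a square with the given area."""
    side_m = math.sqrt(area_m2)
    return side_m / speed_m_s


concert_s = crossing_time_s(166_000)      # concert area
festival_s = crossing_time_s(1_576_000)   # whole festival area

print(f"concert: {concert_s:.0f} s, festival: {festival_s:.0f} s")
```

Small differences with respect to the values in the text come from rounding the side lengths to whole meters before dividing.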


If one person is passing through the concert area, the maximum interval between discoveries should not be greater than 4.8[min] in this ideal case. If this value is exceeded, we would lose Bluetooth devices that might otherwise be detected by the scanner.

4.1.3.2 Scanners coverage In relation to the previous result, now we would like to know the number of scanners needed to cover the entire concert's area. Again, we assume that the scanners are evenly distributed and the area of scan has a radius of 10[m]. This situation is depicted in Figure 18.

Figure 18. Scanners needed to cover the concerts area. The devices are evenly distributed.

Thus, the number of scanners that can cover the concert area is 400. Even though this number seems relatively big, it is only 0.3% of the festival's population. In the case of the whole festival area, the number is 3900, which amounts to 3% of the population. It is important to remark that this is the best case and, in practice, the scanners are distributed in a non-uniform way.
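The text does not state how the figures 400 and 3900 were derived. One plausible reading, which is an assumption of this sketch and not a statement from the thesis, is a square grid in which every scanner covers a cell whose side equals the scan diameter (20 [m]). The attendance of 130.000 people is taken from the abstract.

```python
ATTENDEES = 130_000  # festival attendance quoted in the abstract


def scanners_needed(area_m2: float, radius_m: float = 10.0) -> float:
    """Scanners needed if each one covers a square cell of side 2 * radius (assumed layout)."""
    cell_area_m2 = (2 * radius_m) ** 2
    return area_m2 / cell_area_m2


concert = scanners_needed(166_000)
festival = scanners_needed(1_576_000)

print(f"concert:  {concert:.0f} scanners ({100 * concert / ATTENDEES:.1f}% of attendees)")
print(f"festival: {festival:.0f} scanners ({100 * festival / ATTENDEES:.1f}% of attendees)")
```

Under this assumption the results are 415 and 3940 scanners, which is consistent with the rounded values of 400 and 3900 used in the text.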

4.1.4 Energy consumption In the previous section, we have obtained the ideal intervals in order to collect the maximum amount of data. However, to have realistic intervals, we need to determine empirically how much energy is consumed in the whole discovery process. In order to do that, we will measure the energy consumed by the Bluetooth scans and the location updates using GPS.

[Diagram: the Alarm Setter Activity triggers the Service every 30 seconds; the Service runs Bluetooth Discovery, GPS Location and Send Results.]

Figure 19. Android application diagram for battery measurement. The alarm activity triggers Bluetooth discoveries, location updates and data submission to a server every 30[s].

4.1.4.1 Methodology In this experiment, we used 2 identical HTC Desire devices running Android OS 2.2.2. All other background services and applications were disabled. We developed a simple Android application consisting of an activity that starts a service every 30 [s] using alarms (Figure 19). In Android, these alarms are set using the class AlarmManager (footnote 27). The advantage of using alarms is that the service does not need to be running in the background but is woken up by the system. Every time the service is woken up, we start another service, which can perform a Bluetooth discovery, a GPS location update, or both. Finally, the results are sent to a server. We measured the energy consumption in four different scenarios: Bluetooth on/GPS on, Bluetooth on/GPS off, Bluetooth off/GPS on and Bluetooth off/GPS off.

4.1.4.2 Results The results for the two devices used in the experiment were very similar (shown in Figure 20). We can see how the battery charge decays according to the services started with the alarm.

[Plots: a) Device A, b) Device B. Battery level [percentage] against battery life [hours] for four cases: No Services Running, Bluetooth, GPS, Bluetooth + GPS.]

Figure 20. Battery consumption according to the sensors enabled for device A and device B. Every line represents a different test case.

Each line represents a different experiment. As the battery consumption is evidently linear, we performed a linear regression to estimate the decay rate over time. We can see that the combination Bluetooth + GPS consumed the most energy, followed by GPS alone and finally Bluetooth alone. In Table 3, we show the averaged results of both devices. When both sensors are off, the battery lasts for 91,5[hours]. Bluetooth consumes 0,03% of the battery on every discovery, while GPS consumes 0,05% on every location update. When both sensors are enabled, the discharge per operation is 0,06%, which is less than the sum of the Bluetooth and GPS values. This is due to the optimizations made by Android when using non-real-time alarms.
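The regression step can be illustrated with an ordinary least-squares fit. The sketch below uses made-up battery readings (they are not measurements from the experiment) and a hand-written fit so that it runs without external libraries.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical readings: elapsed hours and battery level in percent.
hours = [0, 2, 4, 6, 8]
level = [100.0, 92.3, 84.6, 76.9, 69.2]

slope, intercept = fit_line(hours, level)
battery_life_h = -intercept / slope  # time at which the fitted line reaches 0%

print(f"discharge rate = {-slope:.2f} %/h, estimated battery life = {battery_life_h:.1f} h")
```

The slope of the fitted line is the discharge rate in percent per hour, and the point where the line crosses 0% gives the estimated battery life.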

27 Android Developers, Alarm Manager: http://developer.android.com/reference/android/app/AlarmManager.html


| | No services | Bluetooth | GPS | Bluetooth + GPS |
|---|---|---|---|---|
| Battery life [hh:mm:ss] | 91:31:25 | 26:02:34 | 17:36:23 | 13:31:08 |
| Discharging rate: % per hour | 1,10% | 3,85% | 5,70% | 7,40% |
| Discharging rate: % per Bluetooth + GPS operations | 0,01% | 0,03% | 0,05% | 0,06% |
| Discharging rate (relative to no services) | 1,0 | 3,5 | 5,2 | 6,8 |
| Readings | 10982,8 | 3125,1 | 2112,8 | 1622,3 |

Table 3. Average results
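The rows of Table 3 are related to each other through the 30 [s] alarm period. The following sketch recomputes them from the battery life alone; it assumes that one reading is taken per alarm and that the battery starts at 100%. The helper names are illustrative.

```python
SCAN_INTERVAL_S = 30  # alarm period used in the experiment


def parse_hms(text: str) -> float:
    """Convert 'hh:mm:ss' into hours."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours + minutes / 60 + seconds / 3600


def derive_row(battery_life: str) -> dict:
    """Derive discharge rates and number of readings from the battery life."""
    life_h = parse_hms(battery_life)
    readings = life_h * 3600 / SCAN_INTERVAL_S
    return {
        "rate_per_hour": 100 / life_h,        # % per hour
        "readings": readings,                 # alarms fired before the battery is empty
        "rate_per_reading": 100 / readings,   # % per operation
    }


battery_life = {
    "No services": "91:31:25",
    "Bluetooth": "26:02:34",
    "GPS": "17:36:23",
    "Bluetooth + GPS": "13:31:08",
}

for name, life in battery_life.items():
    row = derive_row(life)
    print(f"{name:16s} {row['rate_per_hour']:.2f} %/h  "
          f"{row['readings']:.1f} readings  {row['rate_per_reading']:.3f} %/reading")
```

The number of readings obtained this way matches the last row of the table, and the hourly rates agree with the table to within rounding.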

4.1.5 Energy consumption with different intervals The previous experiment showed what happened to the battery when the scanning intervals were set to 30[s]. However, we don’t know if the discharge ratio is the same when this interval is different. In the following experiment we aim to determine how the length of the scanning intervals impacts on the discharge ratio.

4.1.5.1 Methodology The software used for this experiment was the same developed in the previous section, but setting different scanning intervals. We performed only Bluetooth + GPS location updates with intervals of: 30, 60, 90, 120, 150, 180 and 210[s].

4.1.5.2 Results As expected, the battery discharge ratio decreases as the scanning interval increases, as we can see in Figure 21. In Figure 22 we plot the battery consumption per operation according to the interval. We used the experimental data for the small intervals and a forecast for the rest. This plot gives us an idea of how much battery is consumed for a particular interval.

[Plot: battery charge [percentage] against battery duration [hour] for No Services Running and for Bluetooth + GPS at intervals of 30, 60, 90, 120, 150, 180 and 210[s].]

Figure 21. Battery discharge using different intervals

[Plot: battery consumed per Bluetooth & GPS operation (0,00% to 1,40%) against interval period (0 to 30).]

Figure 22. Battery consumed by Bluetooth and GPS according to different intervals. We used the experimental data to plot the first intervals between 30-210[s] (0,5 - 3,5[min]). We used linear interpolation to forecast the longer intervals.

4.1.5.3 Conclusions The battery consumed per Bluetooth + GPS operation increases with the length of the interval. This can be explained by the fact that the energy consumed by the operating system between scans varies with the interval and is attributed to the operation.

4.2 Data analysis from Roskilde Festival 2011

4.2.1 Overview During the 2011 edition of Roskilde Festival, (Larsen & Stopczynski, 2012) performed a study using scanners placed at different locations within the festival area. Every scanner performed a scan every 0,6[min] on average. Since these scanners were static, no location updates were required. Additionally, energy consumption was not an issue, since these devices were continuously connected to a power source. We were given access to the data collected in this study, which is extremely helpful for analyzing the participants' trends. It is also useful for simulating what happens when the scanning interval is changed. For example, in Figure 23 we can see different scenarios for a set of scanning intervals. We consider that there is only one device in the scanning area. The only axis represents time; the dark turquoise boxes represent intervals when the person is moving, the light turquoise boxes intervals when the person is still, and the blue lines the scans. We can observe that scans 1, 2 and 3 return new information about the audience, since three events took place (the person is still, moving, and still again). However, between scans 3 and 4 no events happened, but 2 scan cycles were performed, consuming battery. On the other hand, if we consider a larger interval where only scans 1 and 6 are performed, all the information regarding the events in between is lost. We will use Figure 23 to define two different measurements for a set of scans. We call unique devices the number of distinct devices that appear in the scans; in this case, we have only one. We call device occurrences the number of devices that appear in a set of scans, counting repetitions. In this case we have 6 scans, each of them with one occurrence, so the set of scans has 6 occurrences. Unique devices indicate the number of individuals in the set, while device occurrences indicate the number of events. The more occurrences (in different places or at different times) we have of a single device, the more we can say about it.

[Diagram: time axis with scans 1 to 6.]

Figure 23. Different Bluetooth discovery scenarios when only one device is around. The blue lines represent when a scan is performed. The light turquoise ranges indicate when the person is still and the dark ones when he walks. The scans 3 and 4 provide the same information.

The purpose of the experiments described in this section is to analyze the festival dynamics using the data collected during the year 2011. For example, if we know the hours when the participants are sleeping, the scanning intervals can be greater, in order to minimize the situations described above.

4.2.2 Dataset The dataset of Roskilde Festival 2011 was available in Python and contained the data collected during the 8 days of the festival (26 June to 03 July). Its structure is depicted in Figure 24. The main object (dataset) is a dictionary keyed by time values, each representing the time when a scan was performed. Every time contains an array of scanners, which is the list of scanners that performed a scan at that time. Every scanner contains the list of devices discovered in that scan.

[Diagram: dataset → time (t0, t1, …, tn) → scanner (e.g. s0, s3, s5, s10, s2, s1, s20) → devices (e.g. d0, d2, d5, d44, d10).]

Figure 24. Data set structure
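A miniature version of this structure helps to make the two measurements defined in 4.2.1 concrete. The exact Python types of the original dataset are not described in detail, so the nesting below (time → scanner → list of devices), the timestamps and the identifiers are assumptions made for illustration.

```python
# Hypothetical miniature dataset: time [s] -> scanner id -> devices discovered.
dataset = {
    0:  {"s0": ["d0", "d2"], "s3": ["d5"]},
    30: {"s0": ["d0"], "s5": ["d2", "d44"]},
    60: {"s3": ["d5", "d10"]},
}


def device_occurrences(data) -> int:
    """Total number of detections, counting repeated devices every time."""
    return sum(len(devices) for scanners in data.values() for devices in scanners.values())


def unique_devices(data) -> set:
    """Set of distinct devices seen in at least one scan."""
    return {
        device
        for scanners in data.values()
        for devices in scanners.values()
        for device in devices
    }


print(device_occurrences(dataset), len(unique_devices(dataset)))
```

In this toy example there are 8 device occurrences but only 5 unique devices, because d0, d2 and d5 are each detected twice.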

4.2.3 Crowd activity during the day

4.2.3.1 Overview On a single day of the festival, there are certain hours when the participants are more active than at others. For example, during the night they remain in one position, and during the evening they move from one concert to another. If the participants remain in the same position during a period of time, all the scans performed will return the same data. Therefore, it would be better to perform fewer scans during that period in order to save battery life. We introduce the new devices ratio as a measure of the crowd activity, which is basically the percentage of devices in the current scan that were not seen in the previous one. This ratio is given by the following formula:

$$\text{NewDevices}(t_n) = 1 - \frac{\#\left(S_i(t_n) \cap S_i(t_{n-1})\right)}{\#S_i(t_n)}$$

This formula considers one scanner i, the scan $S_i(t_n)$ performed at $t_n$ and the most recent previous scan $S_i(t_{n-1})$ performed at $t_{n-1}$. The operator # indicates the cardinality of the set.
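The new devices ratio translates directly into set operations. The sketch below is one possible implementation; returning 0 for an empty current scan is an assumption, since the formula is undefined in that case.

```python
def new_devices_ratio(current, previous) -> float:
    """Share of devices in the current scan that were not in the previous scan."""
    current, previous = set(current), set(previous)
    if not current:
        return 0.0  # assumption: an empty scan contributes no new devices
    return 1 - len(current & previous) / len(current)


# Two of the four devices were already present in the previous scan.
ratio = new_devices_ratio({"a", "b", "c", "d"}, {"a", "b"})
print(ratio)
```

A ratio close to 1 means that almost every device is new (high crowd activity), while a ratio close to 0 means that the scan repeats the previous one.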

4.2.3.2 Methodology To reduce the computing time needed to analyze the data, we created a sample with the data collected on the day with the most activity (01 July). Then, we calculated the percentage of new devices for every reading and all the scanners. Additionally, the experiment was performed several times, forcing two consecutive scans to be separated by at least 60, 90, 120 … 570, 600 [s]. By skipping scans in this way, we can simulate what would happen if we used different intervals.

[Plot: new devices [%] against hour of the day [hh.mm], with polynomial fits for the 60, 300, 420 and 600[s] intervals.]

Figure 25. New devices ratio according to the hour of the day

4.2.3.3 Results Figure 25 shows the new devices ratio according to the hour of the day. Additionally, a polynomial fit is plotted for the intervals of 60, 300, 420 and 600[s]. As can be seen in the figure, the highest activity takes place between 12:00 and 3:00. This matches the start time of the concerts (12:00), while the parties finish around 3:00.

We can also see that when the interval between scans decreases, the ratio of new devices decreases as well. This makes sense because when the interval is short, many repeated readings may occur. Nevertheless, we obtained similar results when the interval was set between 300 and 600[s]. Considering the previous analysis, we introduce below the function obtained for the daily activity using the polynomial fit for 300[s]. The variable t is expressed in hours between 0 and 24.

$$f(t) = 94{,}6577 + 1{,}16841\,t - 0{,}0959903\,t^2 + 0{,}00350576\,t^3 - 0{,}000048\,t^4$$

4.2.4 Data loss versus scanning frequency As mentioned at the beginning of this section, the scanning frequency should be carefully chosen in order to manage the tradeoff between battery power and information collected. The purpose of this experiment is to measure how many unique devices are missed for a given scanning frequency. For simplicity and faster calculations, we restricted the dataset to only one day (June 30).

4.2.4.1 Fix intervals In this case the scanning interval was set to the following random values: 0.68, 1.19, 1.64, 2.63, 3.54, 4.9, 7.27, 8.48, 9.24 and 9.81 minutes. To skip data and simulate these intervals, we wrote an algorithm that ensures that the timestamp difference between two consecutive scans is at least the interval mentioned. The measurements we want to obtain are: total occurrences (the sum of the number of devices obtained in all the scans), unique devices (devices seen at least once), unique devices lost (percentage of unique devices missing compared to the results from the lowest interval) and data reduction (percentage of total occurrences removed compared to the lowest interval).
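The skipping procedure and the four measurements can be sketched as follows. This is a minimal reconstruction from the description above, not the original analysis code; the scan data is made up and the function names are illustrative.

```python
def subsample(scans, interval_s):
    """Keep only scans taken at least `interval_s` seconds after the last kept scan.

    `scans` is a list of (timestamp_s, devices) tuples sorted by timestamp.
    """
    kept = []
    last_time = None
    for timestamp, devices in scans:
        if last_time is not None and timestamp - last_time < interval_s:
            continue  # too close to the previous kept scan: skip it
        kept.append((timestamp, devices))
        last_time = timestamp
    return kept


def summarise(scans, baseline):
    """Compute the four measurements relative to the `baseline` scans."""
    total = sum(len(devices) for _, devices in scans)
    unique = len({d for _, devices in scans for d in devices})
    base_total = sum(len(devices) for _, devices in baseline)
    base_unique = len({d for _, devices in baseline for d in devices})
    return {
        "total_occurrences": total,
        "unique_devices": unique,
        "unique_devices_lost": 1 - unique / base_unique,
        "data_reduction": 1 - total / base_total,
    }


# Made-up scans of a single scanner, one every 40 s.
scans = [(0, {"a", "b"}), (40, {"a"}), (80, {"a", "c"}), (120, {"b"}), (160, {"d"})]
result = summarise(subsample(scans, interval_s=100), baseline=scans)
print(result)
```

With a simulated interval of 100 [s], only the scans at 0 and 120 [s] are kept, so devices c and d are never seen: half of the unique devices are lost and the raw data shrinks from 7 to 3 occurrences.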

| Interval [min] | Unique devices | Total occurrences | Unique devices lost | Data reduction |
|---|---|---|---|---|
| 0,68 | 4566 | 60619 | 0,0% | 0,00% |
| 1,19 | 3975 | 22033 | 12,9% | 63,65% |
| 1,64 | 3737 | 16896 | 18,2% | 72,13% |
| 2,64 | 3441 | 11553 | 24,6% | 80,94% |
| 3,54 | 3205 | 9333 | 29,8% | 84,60% |
| 4,9 | 2930 | 7016 | 35,8% | 88,43% |
| 7,27 | 2515 | 4799 | 44,9% | 92,08% |
| 8,48 | 2425 | 4244 | 46,9% | 93,00% |
| 9,24 | 2269 | 3846 | 50,3% | 93,66% |
| 9,81 | 2254 | 3650 | 50,6% | 93,98% |

Figure 26. Fix intervals results. The bars show the unique devices (blue) and total occurrences (red). The table also shows the unique devices lost and the data reduction compared to the 0,68 interval.

We can observe that the amount of unique devices decreases as the interval increases. When the interval is set to about 9[min], 50,3% of the original unique devices are lost. The total occurrences in the data set drop by more than half when using a 1,19[min] interval. We will use this measurement as a basis to compare more sophisticated techniques.

4.2.4.2 Dynamic intervals using new devices ratio In the previous experiment, we used fixed values to measure the devices discovered using different scanning intervals. Now, we propose an algorithm that applies a different interval on every cycle, whose value is in the range [iMin, iMax]. The value that the algorithm chooses depends on the percentage of new devices described in 4.2.3. The main idea behind this is the assumption that when participants are entering an area, others will follow them. Therefore, the scanning frequency will increase if the new devices ratio does, in order to capture more movement changes. The algorithm is described in Figure 27. It is simplified to one scanner, where scanner[time] contains the Bluetooth devices discovered. The interval is kept within the range [iMin, iMax].

 1. const iMin, iMax, averageLength
 2. newDevicesArray[] = arrayWithZeros(averageLength)
 3. interval = iMin
 4. lastTime = null
 5. foreach time in scanner
 6.     if lastTime != null
 7.         if (time - lastTime) < interval
 8.             continue
 9.         newDevices = getNewDevices(lastTime, time)
10.         addAndRotate(newDevicesArray, newDevices)
11.         newDevicesAverage = getNewDevicesAverage(newDevicesArray)
12.         factor = f(newDevicesAverage)
13.         interval = iMin + factor*(iMax - iMin)
14.     end if
15.     lastTime = time

Figure 27. Dynamic intervals algorithm (with simplified data)

In lines 1 and 2 the constants are initialized, where averageLength is the number of historical new devices ratios used in the calculation, which are stored in newDevicesArray. The initial interval is set to iMin, and lastTime stores the time of the last scan taken into account. Lines 7 and 8 skip the scans that fall within the current interval. In 9 we calculate the new devices ratio, and in 10 the function addAndRotate adds it at position 0 of the array and shifts the current values by one position. In 11 the new devices average is calculated in order to provide a smooth change of the ratio. In 12 the factor is calculated using a function (described below) with values in [0,1] that takes the new devices average as input. Finally, the interval is calculated in 13 as a value between iMin and iMax depending on the factor. In 15, lastTime is updated with the time of the scan just taken into account. The functions used to transform the new devices ratio into a factor are shown in Figure 28. The x-axis is the new devices ratio (activity ratio) and the y-axis the factor. The idea is that when the new devices ratio is greater than 0.5, the factor is 0 and thus the interval is set to the minimum. If the ratio is below that, the interval will be longer due to the low activity.
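A runnable Python sketch of the algorithm in Figure 27 is given below. Several details are assumptions of this sketch and not taken from the thesis code: the scans are passed as (timestamp, devices) tuples, the linear factor function is modelled as falling from 1 at ratio 0 to 0 at ratio 0.5 (one reading of Figure 28a), and the reference scan is updated after every scan that is taken into account.

```python
from collections import deque


def linear_factor(ratio: float) -> float:
    """Assumed linear shape: 1 at ratio 0, falling to 0 at ratio 0.5 and above."""
    return max(0.0, 1.0 - 2.0 * ratio)


def dynamic_subsample(scans, i_min, i_max, average_length, factor=linear_factor):
    """Return (timestamp, next_interval) for every scan that is taken into account."""
    history = deque([0.0] * average_length, maxlen=average_length)
    interval = i_min
    last_time, last_devices = None, set()
    kept = []
    for timestamp, devices in scans:
        devices = set(devices)
        if last_time is not None:
            if timestamp - last_time < interval:
                continue  # still inside the current interval: skip this scan
            # New devices ratio with respect to the last scan taken into account.
            ratio = 1 - len(devices & last_devices) / len(devices) if devices else 0.0
            history.appendleft(ratio)  # the oldest value drops off the other end
            average = sum(history) / len(history)
            interval = i_min + factor(average) * (i_max - i_min)
        kept.append((timestamp, interval))
        last_time, last_devices = timestamp, devices
    return kept


# Made-up scans of one scanner: timestamps in seconds and discovered devices.
scans = [(0, {"a"}), (30, {"a"}), (60, {"b"}), (90, {"b", "c"}), (400, {"d"})]
kept = dynamic_subsample(scans, i_min=30, i_max=300, average_length=2)
print(kept)
```

In this toy run the second scan repeats the first one, so the interval grows to iMax and the scans at 60 and 90 [s] are skipped. The scan at 400 [s] contains only new devices, which raises the averaged ratio to 0.5 and brings the interval back to iMin.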

[Plots: factor (y-axis, 0 to 1.0) against new devices ratio (x-axis, 0 to 1.0) for panels a) and b).]

Figure 28. Functions used to transform new devices ratio into a factor a) Linear b) Exponential

Nevertheless, we have to run the experiment several times until the average of all dynamic intervals matches the values of the fixed intervals experiment, so that the two can be compared. We will use the fixed interval for 30[s] as a reference. In Figure 29 and Figure 30 the results for dynamic intervals using the previous factor functions are presented. Both functions returned very similar data, and much better data than the fix intervals. With this technique, more data is retrieved using the same amount of scans. For example, when the interval was 1,64[min], the raw data reduction was 58% compared to 72% for the fix intervals. However, after 4,9[min], the values tend to be similar.

| Interval [min] | Unique devices | Total occurrences | Unique devices lost | Data reduction |
|---|---|---|---|---|
| 0,68 | 4566 | 60619 | 0,0% | 0,0% |
| 1,19 | 4467 | 34554 | 2,2% | 43,0% |
| 1,64 | 4307 | 24963 | 5,7% | 58,8% |
| 2,63 | 3994 | 15536 | 12,5% | 74,4% |
| 3,54 | 3606 | 11550 | 21,0% | 80,9% |
| 4,90 | 3219 | 8340 | 29,5% | 86,2% |
| 7,27 | 2845 | 5626 | 37,7% | 90,7% |
| 8,48 | 2732 | 4819 | 40,2% | 92,1% |
| 9,24 | 2689 | 4428 | 41,1% | 92,7% |
| 9,81 | 2677 | 4167 | 41,4% | 93,1% |

Figure 29. Results for dynamic intervals using a linear function

| Interval [min] | Unique devices | Total occurrences | Unique devices lost | Data reduction |
|---|---|---|---|---|
| 0,68 | 4566 | 60619 | 0,0% | 0,0% |
| 1,19 | 4477 | 33865 | 1,9% | 44,1% |
| 1,64 | 4317 | 25271 | 5,5% | 58,3% |
| 2,63 | 3915 | 15284 | 14,3% | 74,8% |
| 3,54 | 3659 | 11725 | 19,9% | 80,7% |
| 4,90 | 3271 | 8278 | 28,4% | 86,3% |
| 7,27 | 2785 | 5629 | 39,0% | 90,7% |
| 8,48 | 2593 | 4834 | 43,2% | 92,0% |
| 9,24 | 2448 | 4419 | 46,4% | 92,7% |
| 9,81 | 2349 | 4143 | 48,6% | 93,2% |

Figure 30. Results for dynamic intervals using an exponential function

Finally, Figure 31 shows the amount of unique devices obtained by the techniques described earlier, while Figure 32 shows the total occurrences. We can conclude that the dynamic intervals are better because they obtain more data using the same amount of readings, and thus with the same battery usage.

[Plot: unique devices against interval length for the Linear, Exponential and Fix Intervals techniques.]

Figure 31. Amount of unique devices results according to interval length

[Plot: total occurrences against interval length for the Linear, Exponential and Fix Intervals techniques.]

Figure 32. Total occurrences results according to interval length

4.2.5 Data predictability and dynamic intervals using replacement rate In the previous experiment, we defined the new devices ratio. However, that measure only takes into account the new devices that appear in the current reading. We would also like to know when a device is leaving the scanning area. We define the replacement rate as a measure of the incoming and outgoing devices, which is shown below. It compares the data from the current scan Si(tn) and the previous scan Si(tn-1). The incoming devices are those that appear in the current scan but not in the previous one, Si(tn) \ Si(tn-1), and, analogously, the outgoing devices are those that appear in the previous scan but not in the current one, Si(tn-1) \ Si(tn).

$$\text{Replacement}(t_n) = \frac{\#\left(S_i(t_n) \setminus S_i(t_{n-1})\right) - \#\left(S_i(t_{n-1}) \setminus S_i(t_n)\right)}{\#\left(S_i(t_n) \cup S_i(t_{n-1})\right)}$$

Using the raw data from the previous year, it is possible to calculate the replacement ratio for every scan. In Figure 33 we can see the replacement ratio for the scanner 24 during the festival. In order to reduce the peaks and have smooth changes, every value is averaged with the previous 10 values.
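The replacement rate and the smoothing step can be sketched as below. The sketch follows the formula as printed above, with the outgoing devices subtracted from the incoming ones; the handling of two empty scans and the function names are assumptions made for the example.

```python
def replacement_rate(current, previous) -> float:
    """(incoming - outgoing) devices divided by the devices seen in either scan."""
    current, previous = set(current), set(previous)
    union = current | previous
    if not union:
        return 0.0  # assumption: two empty scans mean no replacement
    incoming = len(current - previous)
    outgoing = len(previous - current)
    return (incoming - outgoing) / len(union)


def smooth(values, window=10):
    """Average every value with up to `window` previous values."""
    smoothed = []
    for index in range(len(values)):
        chunk = values[max(0, index - window): index + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# One device arrives ("a") and two devices leave ("d" and "e").
rate = replacement_rate({"a", "b", "c"}, {"b", "c", "d", "e"})
print(rate, smooth([1.0, 0.0, 0.5], window=1))
```

With the subtraction in the numerator the value is negative whenever more devices leave than arrive, as in this example.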

[Plot: replacement ratio (0 to 0,7) against day (29 to 04).]

Figure 33. Replacement ratio for scanner 24 during the festival

From the previous data, we can see that the activity peaks periodically every day, similar to the results obtained in 4.2.3. As in the previous experiment, we will adjust the interval dynamically, but using the replacement ratio instead of the new devices ratio as the parameter. Additionally, we will combine it with the results from 4.2.3.

The algorithm for dynamic intervals is shown in Figure 34; it is very similar to the one from the previous section. The main difference is line 12. One version of the algorithm (in red) uses only the average ratio to recalculate the intervals, while the other version (in blue) also combines the polynomial obtained in 4.2.3.3.

 1. const iMin, iMax, averageLength
 2. averageRatioArray[] = arrayWithZeros(averageLength)
 3. interval = iMin
 4. lastTime = null
 5. foreach time in scanner
 6.     if lastTime != null
 7.         if (time - lastTime) < interval
 8.             continue
 9.         ratio = averageRatio(lastTime, time)
10.         addAndRotate(averageRatioArray, ratio)
11.         averageRatio = getAverageRatio(averageRatioArray)
12.         interval = iMin + (1 - averageRatio)*(iMax - iMin)                          (red)
12.         interval = iMin + dailyPolynomial(time)*(1 - averageRatio)*(iMax - iMin)    (blue)
13.     end if
14.     lastTime = time

Figure 34. Algorithm using replacement ratio. To take into account only the replacement ratio, the line 12 in red is used. The line 12 in blue is used to also take into account the polynomial from 4.2.3.3.

Additionally, the average ratio was calculated in two ways: one uses the plain average, and the other assigns a weight to every value. This weight is the total amount of devices in that scan, in order to take the amount of devices into account. Using the data filtered by the previous algorithm we were able to obtain interesting results, which are shown in Table 4. We are interested not only in the amount of unique devices, but also in the amount of total occurrences, because they can contain information about the current position of the device. We can see that all techniques return more devices per scan than fix intervals.

| | Fix Intervals | Weighted Average | Average | Weighted Average & Daily Function | Average & Daily Function |
|---|---|---|---|---|---|
| Total occurrences efficiency [devices/discovery] | 2,495 | 2,574 | 2,539 | 2,699 | 2,562 |
| Total occurrences improvement with respect to fix intervals | - | 3% | 2% | 8% | 3% |
| Unique devices efficiency [devices/discovery] | 1,047 | 1,063 | 1,077 | 1,104 | 1,106 |
| Unique devices improvement with respect to fix intervals | - | 2% | 3% | 5% | 6% |

Table 4. Comparison between interval setting techniques

When the replacement ratio is combined with the polynomial function the results are even better than without it. We can see an increment of 8% in the total occurrences per scan with respect to fix intervals. In Figure 35 we can see a comparison of the total occurrences obtained according to the amount of readings.
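The difference between the plain and the weighted average of the ratios can be illustrated with a small example. The numbers are made up and the function names are hypothetical; the weight of each ratio is the number of devices in the corresponding scan, as described above.

```python
def plain_average(ratios) -> float:
    """Every scan counts the same, regardless of how many devices it saw."""
    return sum(ratios) / len(ratios)


def weighted_average(ratios, device_counts) -> float:
    """Every ratio is weighted by the number of devices in its scan."""
    total = sum(device_counts)
    if total == 0:
        return 0.0  # assumption: no devices at all means no replacement
    return sum(r * w for r, w in zip(ratios, device_counts)) / total


def next_interval(average_ratio: float, i_min: float, i_max: float) -> float:
    """Interval update used in line 12 (red) of Figure 34."""
    return i_min + (1 - average_ratio) * (i_max - i_min)


ratios = [0.2, 0.8]        # replacement ratios of two scans
device_counts = [1, 9]     # the second scan saw many more devices

plain = plain_average(ratios)
weighted = weighted_average(ratios, device_counts)
print(plain, weighted, next_interval(weighted, i_min=30, i_max=300))
```

Because the scan with the high ratio also contains most of the devices, the weighted average (0.74) is well above the plain average (0.5) and therefore produces a shorter next interval.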

[Plots: total occurrences against discoveries for Weighted Average, Average and Fix Intervals, each with a linear fit, in panels a) and b).]

Figure 35. Total device occurrences per amount of readings a) Using replacement rate b) Using replacement ratio and daily function

Using the information obtained in this experiment, we found that the algorithm with the best efficiency is the weighted average combined with the daily activity function. This algorithm will be implemented in the library.

4.3 Summary This chapter brought many important results ahead of the development stage. One of them is the discovery range of the scanning device. Even though devices could be detected at a distance of 30[m] and more, a feasible range should be close to a radius of 10[m]. Another aspect is the movement of people in the discovery zone. Considering the average walking speed and the density of people, the time needed to discover the smallest change of a device within the discovery zone is 2.8[s]. However, a technical limitation of the discovery technique does not allow intervals shorter than 10.24[s]. Knowing the festival area, we found that 400 evenly distributed devices (0.3% of the total population) are able to cover the whole concert area. The energy consumption experiments with the GPS and Bluetooth sensors gave us the energy cost of performing scans, indicating how much battery we will consume using a particular interval. The data analysis indicated that intervals between 300 and 600[s] lose a similar amount of information. Combining these results with the battery consumption, the most suitable value for the scanning interval should lie in that range. Furthermore, the intervals should be longer at night, when people move less. Moreover, comparing the various interval algorithms, we can point out that the adaptive approach using the replacement rate plus the polynomial daily function gives the best results. With this information, we are able to create the functional requirements for the final solution.

CHAPTER 5

Development

5.1 Overview In the previous chapter, we measured the different parameters that impact the design of energy efficient scanning software. In this chapter, we will describe the development process for it. In general terms, the proposed solution is depicted in Figure 36. As can be seen, the smartphone icons represent a participant running the scanning software on their mobile device, which are the scanners. The blue circles show the area of coverage for every scanner, while the Bluetooth icons the potential discovered devices. The scanners perform Bluetooth discoveries on a periodic basis. Another key point is the participants’ constant trajectory change, which is shown as the dashed lines. In order to associate the discovered devices to the scanner location, it is necessary that the scanners perform periodic location updates.

[Diagram: scanners connected through the Internet to a server.]

Figure 36. Overview of the proposed solution. The smartphone icons represent the scanners and the Bluetooth icons the devices in discoverable mode. Every scanner periodically sends the information to a server via the Internet.

Every scanner collects relevant information about each discovery and location update, which is basically the data described in Section 3.2. This data is saved temporarily on the mobile phone. Once a significant amount of data has been collected, the scanning software sends it to a server through the Internet. This server collects the data from all the scanners and saves it in a database. In the next section, we will describe the key components of the solution in more detail.

Chapter 5: Development | 46 5.2 Architecture The architecture of the solution is illustrated in Figure 37. We have decided to describe first the main components instead of the requirements, in order to introduce them. The solution is divided into two main groups: the mobile and the server part. On the mobile side we have the front-end application that contains the scanner library. The front-end application only needs to start the scanner library to initialize the periodic discovery process. The library interacts with the Android API for controlling the Bluetooth device and obtains location updates. Once it has enough data collected, it sends it through Internet to the server part. On the server side there is a web service that serves as the interface for the library. It receives the data collected and adds it to a database located on the database server.

[Figure 37 diagram: the Mobile side (Front-End Application, Scanner Library, Android API) is connected through the Internet to the Server side (Web Service, Database Server, Server OS).]

Figure 37. Solution Architecture. The left side contains the scanner components, while the right side the server used to store all the data collected.

5.2.1 Mobile Components

5.2.1.1 Front-end Application

In the previous chapter we mentioned that the scanning software by itself might not be attractive enough to be downloaded by the participants. For that reason, the scanning software will be deployed as a library that can be attached to a front-end application and runs in the background. As mentioned previously, in this project we are collaborating with DTU students who are developing applications for Roskilde Festival. The library will be attached to three front-end applications covering the following topics: festival information, social games and noise metering.

5.2.1.2 Scanner Library

The scanning library is the actual scanning software. It does not have a graphical interface and works autonomously after being started by the front-end application. It interacts with the Android API to access the sensors and has an internal database to save the collected data. Additionally, it exposes interfaces that give the front-end application access to the collected data; using these interfaces is optional. Figure 38 shows the mobile components in detail. It presents the case of many front-end applications running on the same device. Although every front-end app has its own instance of the library running, they all share the same database. The library instances need a synchronization mechanism in order to maintain the scanning frequency.

[Figure 38 diagram: three devices (Android 2.1, Android …, Android 4.0), each running front-end applications App 1, App 2, ..., App n. Every application embeds its own Lib instance, and all instances on a device share one SQLite DB.]

Figure 38. Mobile components. In this example, many front-end applications are running on the same device; however, all of them use the same database.

5.2.2 Server Components The primary task of the server is to receive the data collected by the library and store it into a database. Nevertheless, it has to organize the data in such a way that all the data collected from different devices could be stored in the same database.

5.2.2.1 Web Service The web service is the bridge between the library and the database. The purpose of it is to receive the package sent by the library, store it temporarily and insert it into the main database. It will be explained in detail in 5.5.4.

5.2.2.2 Database Server This component is basically a database engine running on the server side. It can run on the same server or on an external one. The structure of it will be explained in 5.4.1.

5.2.3 Monitoring tools

It would be difficult to understand how the data collection process is working just by reading the raw data. For that reason, we developed a set of visualizations that run in a web interface and obtain their data from the main database. They are useful for identifying issues, testing, and checking the status of the data collection in real time. These monitoring tools are explained in Section 5.6.

5.3 Requirement Analysis Having identified the key components of the solution, we will describe the characteristics they should meet. These characteristics are tightly related to the analysis and experiments described in the chapters 3 and 4. The components that will be developed in this project are: scanner library, web service, database structure and the monitoring tools.

5.3.1.1 Scanner Library

• Functional Requirements
  o It has to perform periodic Bluetooth discoveries and GPS location updates.
  o It cannot run standalone; therefore it has to be attached to a front-end application.
  o It has to perform periodic tasks while running in the background.
  o It has to expose interfaces to the front-end application for accessing the collected data.
  o The results collected from Bluetooth discoveries and GPS location updates should be persisted in a database located on the mobile phone.
  o Multiple instances of the library running on the same device should be allowed, using a common database, avoiding data duplication and keeping the same scanning intervals.
  o The results collected from Bluetooth discoveries and GPS location updates should be sent to a server through the Internet.
  o It has to have failsafe mechanisms to deal with Internet access issues.

• Non-Functional Requirements
  o It has to be energy efficient, obtaining the highest possible amount of data with the minimum number of scans.
  o The library should not affect the performance of the front-end application.
  o If the library throws an exception, this should not interrupt the front-end application.

5.3.1.2 Web Service

• Functional Requirements
  o It should receive the packages from the library and save them locally as files.
  o It should be able to append the packages to the main database.
  o It should return information to the library about the status of the package insertion.

• Non-Functional Requirements
  o It has to support many simultaneous connections, since many scanners can send packages at the same time.

5.3.1.3 Server Database Structure

• Functional Requirements
  o It should be designed in such a way that it supports appending the collected information submitted by multiple scanners to one database.

5.3.1.4 Monitor tools

• Functional Requirements
  o They should provide information for identifying bugs and performance issues.
  o They have to visualize data in a readable way.
  o They have to run in a web browser.
  o Access should be restricted to authorized users.
  o They should provide statistical information.

5.4 Design

In this section we present the design of the components described above. We start with the database design, in order to show how the data is organized in the server database and on the mobile device.

5.4.1 Databases

5.4.1.1 Overview

We mentioned that the scanner library should store the collected data in order to send it to the server once enough has accumulated. For that purpose, the scanner library stores all the collected data in an SQLite28 database. Every scanner has only one local database, regardless of how many library instances are running on the device. The left side of Figure 39 illustrates the SQLite database, or local database. Its tables store the collected data, and every record has a flag indicating its status: submitted, packaged or unsubmitted. Submitted means that the data has already been sent to and saved on the server. Unsubmitted data exists only on the scanner. Packaged data has been stored in a package and is probably in transit to the server, but has not yet been stored there. A package is an SQLite database that contains a fraction of the local database. It preserves the same structure and is compressed before being sent to the server.

On the server side, we have a main database to which all the collected data is appended. It runs on a MySQL server and its structure is very similar to that of the local database, with a package identifier and the scanner Bluetooth MAC address added to every table. As can be seen on the right part of Figure 39, the idea is to gather the data from every local database in one main database. When a package is added successfully, a confirmation is sent to the scanner library, which deletes the package and marks all of its data as submitted. We chose SQL on both sides for simplicity and to avoid unnecessary steps such as converting data files.

[Figure 39 diagram: the local database (SQLite) holds records flagged Unsubmitted, Packaged and Submitted; a package is sent to the main database (MySQL), which holds the data of Device A, Device B, Device C and Device D.]

Figure 39. Package Creator. On the left side it shows the scanner database (local database) with the status of every record. On the right side, the main database on the server side is illustrated. Every record contains the scanner that collected that information.
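The life cycle of the status flag described in the overview can be illustrated with a small in-memory model. This is only a sketch under assumptions: the class, enum and method names (RecordStatusSketch, Status, createPackage, confirm) are hypothetical and are not taken from the library, and the real implementation works on SQLite tables rather than Java lists.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of the per-record status flag. */
final class RecordStatusSketch {

    enum Status { UNSUBMITTED, PACKAGED, SUBMITTED }

    /** One collected row together with its status flag. */
    static final class Row {
        final String payload;
        Status status = Status.UNSUBMITTED;

        Row(String payload) {
            this.payload = payload;
        }
    }

    /** Copies every unsubmitted row into a package and flags it as PACKAGED. */
    static List<Row> createPackage(List<Row> localDb) {
        List<Row> pkg = new ArrayList<>();
        for (Row row : localDb) {
            if (row.status == Status.UNSUBMITTED) {
                row.status = Status.PACKAGED;
                pkg.add(row);
            }
        }
        return pkg;
    }

    /** Called once the server confirms the package: its rows become SUBMITTED. */
    static void confirm(List<Row> pkg) {
        for (Row row : pkg) {
            row.status = Status.SUBMITTED;
        }
    }

    public static void main(String[] args) {
        List<Row> db = new ArrayList<>();
        db.add(new Row("scan-1"));
        db.add(new Row("scan-2"));
        List<Row> pkg = createPackage(db);
        db.add(new Row("scan-3")); // collected while the package is in transit
        confirm(pkg);
        for (Row row : db) {
            System.out.println(row.payload + " -> " + row.status);
        }
    }
}
```

Rows collected while a package is in transit stay unsubmitted and are picked up by the next package, so a confirmation never marks data that the server has not received.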

5.4.1.2 Mobile local database (SQLite)

Figure 40 shows the data structure of the local database. Because of the limited network bandwidth we expect at the festival and the limited storage space on smartphones, we decided to normalize the database in order to avoid data redundancy and thus use less storage space on the users' devices. Every table contains a submitted field that holds the status of the data (see the previous section). To reduce the amount of data even more, we assigned suitable data types to every field, often limiting the length of VARCHAR fields according to the function of the field. The list below describes the tables shown in Figure 40.

28 SQLite Home Page: http://www.sqlite.org/

Figure 40. Diagram of the mobile local database

• Application: contains information about the applications that have attached the scanner library. It maps a numerical identifier to every application name. This number is used as a reference in other tables, in order to know which application collected the data.
• Packages: contains information about the packages created by the library. The packageName field indicates the path where the package was stored.
• Submission: every time a package is submitted to the server, a record is inserted in this table. It contains the information needed to track the packages sent. For example, the latency field contains the time elapsed between the package submission and the confirmation from the server.
• Gps: stores information related to the location updates. Every time a location update is performed, a row containing the latitude, longitude, time and accuracy is added to this table. Additionally, the discoveryTime field contains the time the process took.
• GpsUsed: we mentioned that the library provides interfaces for the front-end application to obtain information about the collected data. One of these interfaces returns the last location obtained by the library, so that the front-end application does not need to perform a location update itself. This table records every location request from the front-end application.
• Bluetooth: contains the time of each Bluetooth discovery. The details of the MAC addresses detected in a particular scan are stored in the Devices and Scans tables.
• BluetoothUsed: analogous to GpsUsed, this table registers the front-end application's requests for information about the Bluetooth discoveries.
• Devices: this table contains a normalization of the discovered devices. Basically, it maps each MAC address to an id and contains only unique devices.
• Scans: this table contains the list of devices obtained in a particular scan. It is basically a bridge between the Bluetooth and the Devices tables.
• Post: another functionality provided to the front-end application is the possibility of sending data to its own web service when the library sends a package. It is described in detail in Section 5.4.4.3.
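The normalization performed by the Devices and Scans tables can be sketched with plain Java collections. This is an illustrative model only: the class and method names are hypothetical, and the real library stores these rows in SQLite tables. The point is that a MAC address is stored once, while every scan refers to it by its numeric id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the Devices and Scans tables. */
final class ScanNormalizationSketch {

    /** Devices table: unique MAC address -> numeric id. */
    private final Map<String, Integer> devices = new LinkedHashMap<>();

    /** Scans table: rows of {bluetoothId, deviceId}. */
    private final List<int[]> scans = new ArrayList<>();

    /** Returns the id of a MAC address, inserting it the first time it is seen. */
    int deviceId(String mac) {
        Integer id = devices.get(mac);
        if (id == null) {
            id = devices.size() + 1;
            devices.put(mac, id);
        }
        return id;
    }

    /** Records the devices found in one Bluetooth discovery. */
    void addDiscovery(int bluetoothId, List<String> macs) {
        for (String mac : macs) {
            scans.add(new int[] { bluetoothId, deviceId(mac) });
        }
    }

    int deviceCount() {
        return devices.size();
    }

    int scanRowCount() {
        return scans.size();
    }

    public static void main(String[] args) {
        ScanNormalizationSketch db = new ScanNormalizationSketch();
        db.addDiscovery(1, Arrays.asList("00:11:22:33:44:55", "66:77:88:99:AA:BB"));
        db.addDiscovery(2, Arrays.asList("66:77:88:99:AA:BB", "CC:DD:EE:FF:00:11"));
        System.out.println("unique devices: " + db.deviceCount());
        System.out.println("scan rows: " + db.scanRowCount());
    }
}
```

A device seen in many scans therefore costs one MAC string plus one small integer pair per sighting, which is the storage saving the normalization aims for.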

5.4.1.3 Main Database (MySQL)

The main database on the server side has a structure similar to that of the local database (see Figure 41) in order to simplify the storing process. However, it has to identify clearly the device and the package to which the collected data belongs. Every time a new package arrives, a record is inserted into the Submission table. This table saves relevant information about the package, such as the scanner IP address, its manufacturer and the time of reception. After the insertion, a unique identifier (idSubmission) is created for that package. The other tables have a structure similar to the local database, but with two extra fields: idSubmission and senderMac. These fields identify which scanner collected the data. On the other hand, not all the fields are reflected in the server-side database: fields that are of no use on the server side were removed, e.g. the scanning status field in the Gps table.

Figure 41. Design of server side database


5.4.2 Scanner Library After describing the database structure for storing the data collected, we will describe how the data collection mechanism works. For that purpose, we will describe the relevant processes done by the scanning library, as well as their connections with other components described in the solution.

5.4.2.1 Components Diagram

Previously, we identified the key components of the solution, namely the front-end application, the scanner library and the web service. These components interact with each other by exposing interfaces. The component diagram is shown in Figure 42. The library is in the center and exposes several interfaces that give the front-end application access to the scanner library functionalities. At the bottom of the diagram is the only interface between the library and the web service (send package/receive package), which is used to transfer packages. Next, we explain the functionality of all these interfaces.

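[Figure 42 diagram: the Front-end Application component uses the Scanner Library interfaces startService, stopService, schedulePost, getLocation and getBluetoothDevices, and receives results through registerReceiver and IntentFilter. The Scanner Library component is connected to the Web Service component through sendPackage and receivePackage.]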

Figure 42. Component diagram of the solution. It shows the front-end application on top, the scanner library in the middle and the server part on the bottom.

Interfaces between the front-end application and the scanner library:
• startService: creates an instance of the scanner library from the front-end application. After the instance is created, the data collection process starts automatically.
• stopService: stops the scanner library and the data collection process.
• schedulePost: sends a POST request using the library. See Section 5.4.4.3 and Appendix i.
• getLocation: returns the most recent location collected by the scanner library, so that the front-end app does not need to perform a new location update. See Section 5.4.4.2 and Appendix i.
• getBluetoothDevices: returns the devices found in the last Bluetooth discovery. See Section 5.4.4.1 and Appendix i.
• registerReceiver and IntentFilter: all the interfaces described above return information asynchronously. Before invoking them, the front-end application should register a receiver for the IntentFilter type of messages. The receiver is a piece of code that is called when the response is ready. For instance, when the front-end application needs to receive the results from the interfaces schedulePost, getLocation or getBluetoothDevices, it should register the intents ACTION_SCAN, ACTION_LOCATION and ACTION_POST. See Appendix i.

Interfaces between the scanner library and the web service:
• sendPackage: prepares a package with the collected data that has not been submitted yet.
• receivePackage: on the web service, this interface receives a package and inserts its data into the main database.

5.4.2.2 Class diagram

Next, we describe the classes contained in the scanner library. As can be seen in Figure 43, the classes are grouped into three packages: main, services and utils.

[Figure 43 diagram: packages dk.dtu.imm.btscanner, dk.dtu.imm.btscanner.services and dk.dtu.imm.btscanner.utils, containing the classes Scheduler, ScannerService, Database, ServerConnection, Bluetooth, GPS, Battery, Server, PackageCreator and IntervalComputer.]

Figure 43. Class Diagram of the main classes of the scanner library grouped into packages.

Main Package: dk.dtu.imm.btscanner
• ScannerService: this is the main class of the library. As the library works as an Android service29, it extends the Service class. Additionally, it provides the interfaces mentioned in the components diagram.
• Scheduler: contains the code that periodically performs the following tasks: obtaining the location, performing Bluetooth discoveries, obtaining the battery status and sending the collected data to the server. It is explained in detail in Section 5.4.3.

Services Package: dk.dtu.imm.btscanner.services
This package contains the classes that implement the periodic tasks triggered by the Scheduler instances. These classes are singletons, because only one instance can exist at a particular time.
• Bluetooth: performs n consecutive Bluetooth discoveries and saves the results into the local database. See Subsection 5.4.4.1.
• GPS: performs a location update. It uses the GPS and the 3G antennas to obtain the location. See Subsection 5.4.4.2.

29 Android Developers, Service: http://developer.android.com/reference/android/app/Service.html

• Battery: saves the current battery level in the local database.
• Server: tries to send a package with the collected data to the server, if the connection is reliable. It also updates the local database to refresh the submission status of the data sent. See Section 5.4.4.3.

Utils Package: dk.dtu.imm.btscanner.utils
• Database: provides methods to read from and write to the local database.
• ServerConnection: provides an asynchronous HTTP connection to a server. It is used to communicate with the web service.
• PackageCreator: generates a package with the data that has not been submitted to the server yet.
• IntervalComputer: calculates the next time the scheduler will perform a task, for the processes that use a dynamic frequency. To calculate this value, it uses the collected data and the algorithm from Section 4.2.5. The implementation of this class is shown in Appendix iii.a, and the formula used to calculate the intervals is shown in the next section.

5.4.3 Schedulers

5.4.3.1 Overview

The objects created from the Scheduler class are used to perform the same task on a periodic basis. These objects are very important because they determine how much battery is consumed, by deciding when to perform a Bluetooth discovery, a location update or a package submission to the server. The scheduler mechanism is depicted in Figure 44. The battery-intensive tasks (GPS and Bluetooth) use dynamic intervals, while the rest (Server and Battery) use fixed intervals. After every cycle, the next interval is recalculated using the IntervalComputer class.

[Figure 44 diagram: four timelines along a time axis, one each for the Server, GPS, Bluetooth and Battery schedulers.]

Figure 44. Service Schedulers. Every line represents an instance of the scheduler associated to a task. The server and battery schedulers use fixed intervals, while the GPS and Bluetooth dynamic ones.

Every time the task is performed, the interval for the next cycle is recalculated, using the IntervalComputer class.

5.4.3.2 Sequence Diagram

The Scheduler can be explained using the sequence diagram in Figure 45. The front-end application initializes the library by instantiating the main class (ScannerService). After that, the ScannerService object creates Scheduler objects for the four periodic tasks in the library. For simplicity, the diagram shows only the Scheduler instance for the Bluetooth task (BluetoothScheduler).

[Figure 45 diagram: lifelines ScannerService, BluetoothScheduler : Scheduler, AlarmManager and Bluetooth. Messages shown: start(), schedule(now, Bluetooth), onReceive(), startService(Bluetooth), sendBroadcast(BLUETOOTH_DEVICES) and schedule(nextInterval, Bluetooth); the last four are repeated on every cycle.]

Figure 45. Scheduler Sequence Diagram

The Scheduler class contains a method called schedule(time, Service), which sets an alarm that starts the task Service at the given time (for an explanation of alarms in Android, see Subsection 4.1.4.1). The first time a Scheduler is started, it sets an alarm that triggers the Bluetooth task immediately (schedule(now, Bluetooth)). When an alarm is triggered, the Android OS calls the onReceive method that was previously registered to handle alarms. In this method, the Bluetooth service instance is created asynchronously and the alarm for the next cycle is set using schedule(nextInterval, Bluetooth). The IntervalComputer class returns the nextInterval value. Once the task is completed, it sends a broadcast to all the receivers subscribed to it, and the service that performed it is destroyed.

5.4.3.3 Interval Computer Class

The IntervalComputer class contains a set of static methods used to calculate the next interval. It combines the functions and algorithms studied in Chapter 4 with the battery level. The algorithm presented in Figure 46 returns the next interval using these functions. First, two constants (iMin, iMax) are defined for every scheduler, representing the minimum and maximum intervals that the algorithm can return. Then, the batteryFactor function returns 0 if the current battery level is at least 35% and 0.5 if it drops below that. This ensures that the intervals are at least half of the interval range (iMin + 0.5*∆i) when the battery drops below 35%. The variable t contains the current time, which is used as input to calculate the current replacement ratio (replacement(t), see Section 4.2.5) and the current daily factor (dailyFactor(t), see Subsection 4.2.3.3). These two values are multiplied so that the interval takes both into account. Finally, the result is clamped to the interval range. The inputs of these functions come from the following sources: batteryFactor(t) uses the level returned by the Battery task, replacement(t) uses the data from the Bluetooth table, and dailyFactor(t) is a polynomial hard-coded in the library.

We decided to set the minimum and maximum intervals to [7, 30][min] in order to keep the daily battery consumption in the range of [72%; 78%]. These values are based on the experimental and forecasted battery consumption for Bluetooth + GPS obtained in Section 4.1.5.2. According to the graph in Figure 22, the energy consumed by one Bluetooth and GPS operation using an interval of 7 and 30 [min] is 0,35% and 1,3% respectively. These intervals allow the library to perform 205 and 48 scans per day respectively, consuming 72% and 78% of the battery.

$$
\begin{aligned}
&\textbf{constant: } \mathit{iMin},\ \mathit{iMax} \\
&\Delta i = \mathit{iMax} - \mathit{iMin} \\
&\mathit{batteryFactor}(t) =
  \begin{cases}
    0 & \text{if current battery level} \ge 0.35 \\
    0.5 & \text{otherwise}
  \end{cases} \\
&\mathit{interval} = t + \mathit{iMin} + \Delta i \cdot \mathit{batteryFactor}(t) + \Delta i \cdot \mathit{replacement}(t) \cdot \mathit{dailyFactor}(t) \\
&\mathit{interval} = \mathit{clamp}(\mathit{interval},\ \mathit{iMin},\ \mathit{iMax})
\end{aligned}
$$

Figure 46. Algorithm to calculate the next intervals
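The interval calculation described in this section can be sketched in Java as follows. This is not the implementation from Appendix iii.a: the class and method names are hypothetical, and replacement(t) and dailyFactor(t) are passed in as already evaluated numbers, since their definitions belong to Chapter 4 and are not reproduced here. The sketch returns only the length of the interval; adding the current time to obtain the absolute alarm time is left to the caller, which is an interpretation on our side.

```java
/** Hypothetical sketch of the next-interval calculation (all interval values in minutes). */
final class IntervalSketch {

    /** Returns 0 while the battery is at 35% or more, and 0.5 below that level. */
    static double batteryFactor(double batteryLevel) {
        return batteryLevel >= 0.35 ? 0.0 : 0.5;
    }

    /** Limits a value to the range [lo, hi]. */
    static double clamp(double value, double lo, double hi) {
        return Math.max(lo, Math.min(hi, value));
    }

    /**
     * Computes the next interval length.
     *
     * @param iMin         minimum interval the scheduler may use
     * @param iMax         maximum interval the scheduler may use
     * @param batteryLevel current battery level as a fraction in [0, 1]
     * @param replacement  value of replacement(t), assumed to be precomputed
     * @param dailyFactor  value of dailyFactor(t), assumed to be precomputed
     */
    static double nextInterval(double iMin, double iMax, double batteryLevel,
                               double replacement, double dailyFactor) {
        double delta = iMax - iMin;
        double raw = iMin
                + delta * batteryFactor(batteryLevel)
                + delta * replacement * dailyFactor;
        return clamp(raw, iMin, iMax);
    }

    public static void main(String[] args) {
        // Full battery and no contribution from the adaptive term: the minimum interval.
        System.out.println(nextInterval(7, 30, 0.80, 0.0, 0.0));
        // Low battery: the interval grows by half of the range.
        System.out.println(nextInterval(7, 30, 0.20, 0.0, 0.0));
    }
}
```

With the [7, 30][min] range used in this work, a battery level below 35% alone moves the interval from 7 to 18.5[min], and the clamp guarantees that the adaptive term can never push it beyond 30[min].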

5.4.4 Periodic tasks

The Scheduler is used to perform periodic tasks, of which the most important are Bluetooth, GPS (location) and Server. In this section we explain in detail how they work internally.

5.4.4.1 Bluetooth task

The main purpose of the Bluetooth task is to perform Bluetooth discoveries on every call, save the collected data in the local database and broadcast the results of the discoveries to the front-end application. The state machine diagram in Figure 47 shows how this works. In the Idle state, the task checks whether the database contains a recent Bluetooth discovery (no older than 2[min]), in order to avoid performing a new one. If such a discovery is found, its results are broadcast and the process finishes. Otherwise, the task checks whether the Bluetooth device is turned on. If it is off, the process finishes by broadcasting an error. If the process reaches the Discovery state, it performs n consecutive discoveries (in the current setup n=2). The reason for performing consecutive discoveries is to detect devices that may be occluded by obstacles. After the discoveries are completed, the MAC addresses of the devices found are saved in the database. Finally, this data is broadcast to the front-end application.

[Figure 47 diagram: states Idle, Checking, Discovery and Saving. Idle: "Is there a recent Bluetooth discovery?" [Yes] sendBroadcast(results), [No] go to Checking. Checking: "Is the Bluetooth sensor enabled?" [No] sendBroadcast(error), [Yes] go to Discovery. Discovery: repeat until the total number of discoveries has been performed, then go to Saving and sendBroadcast(results).]

Figure 47. Bluetooth task state machine diagram
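The decision flow of the Bluetooth task can be sketched in plain Java without any Android dependency. The names below (BluetoothTaskSketch, Outcome, run) are hypothetical, the actual discovery is abstracted as a Supplier, and merging the consecutive discoveries into one set of MAC addresses is an assumption about how their results are combined. The 2[min] limit and n=2 come from the text.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical sketch of the Bluetooth task state machine. */
final class BluetoothTaskSketch {

    static final long MAX_AGE_MS = 2 * 60 * 1000L; // a stored discovery is "recent" for 2 minutes
    static final int DISCOVERIES = 2;              // n consecutive discoveries

    enum Outcome { CACHED_RESULTS, ERROR_BLUETOOTH_OFF, NEW_RESULTS }

    static final class Result {
        final Outcome outcome;
        final Set<String> macs;

        Result(Outcome outcome, Set<String> macs) {
            this.outcome = outcome;
            this.macs = macs;
        }
    }

    /**
     * @param lastDiscoveryMs time of the last stored discovery, or null if there is none
     * @param lastMacs        devices found in that stored discovery
     * @param discovery       performs one discovery and returns the MAC addresses found
     */
    static Result run(long nowMs, Long lastDiscoveryMs, Set<String> lastMacs,
                      boolean bluetoothOn, Supplier<List<String>> discovery) {
        // Idle: reuse a stored discovery if it is recent enough.
        if (lastDiscoveryMs != null && nowMs - lastDiscoveryMs <= MAX_AGE_MS) {
            return new Result(Outcome.CACHED_RESULTS, lastMacs);
        }
        // Checking: without the Bluetooth device the task can only report an error.
        if (!bluetoothOn) {
            return new Result(Outcome.ERROR_BLUETOOTH_OFF, Collections.<String>emptySet());
        }
        // Discovery: merge the devices found in n consecutive discoveries (assumed union).
        Set<String> found = new LinkedHashSet<>();
        for (int i = 0; i < DISCOVERIES; i++) {
            found.addAll(discovery.get());
        }
        // Saving to the database is omitted; the caller would broadcast the result.
        return new Result(Outcome.NEW_RESULTS, found);
    }

    public static void main(String[] args) {
        Iterator<List<String>> rounds = Arrays.asList(
                Arrays.asList("AA:01"),
                Arrays.asList("AA:01", "BB:02")).iterator();
        Result result = run(1_000_000L, null, null, true, rounds::next);
        System.out.println(result.outcome + " " + result.macs);
    }
}
```

In the usage example the second discovery finds a device that the first one missed, which is the situation the consecutive discoveries are meant to cover.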

5.4.4.2 GPS Task (Location)

The GPS task is used to determine the current location of the scanner. For this purpose, it uses both the GPS sensor and the network provider30. If the location is obtained successfully, it is broadcast to the front-end application. Its state machine diagram is shown in Figure 48. When the task is started, it checks whether there is a recent location stored in the database. If the database contains a location that is not older than 1[min] and has a precision of less than 170[m], these values are broadcast and the process finishes. If there are no such locations in the database, the task checks whether a location is cached in the operating system (i.e. another app obtained the location recently). This location has to meet the same freshness and accuracy criteria as in the previous step to be accepted.

If the previous steps did not yield a fresh and accurate location, the actual location update process starts. First, the task checks whether the GPS and the network provider are enabled in the OS. If neither of them is enabled, the process finishes by broadcasting an error. Otherwise, the location update process is started using both the GPS and the network provider. Obtaining a location can take from seconds to minutes, depending on the satellite availability in the case of GPS and on the network access in the case of the network provider. In order to prevent the task from running too long, a countdown timer is set to 120[s]. If the countdown reaches zero before any location is obtained, the location update is cancelled and an error is returned; otherwise the best location from GPS and/or network is returned.

[Figure 48 diagram: states Checking Database, Checking Cached, Preparing Location Updates, Timer Running, Performing GPS Location and Performing Network Location. A fresh location in the database or in the OS cache is saved to the DB and sent with sendBroadcast(location). If no provider is available, or on timeout, sendBroadcast(error) is sent. A location obtained from the GPS or network provider is saved to the DB and sent with sendBroadcast(location).]

Figure 48. GPS task state machine diagram
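The first two states of the GPS task (Checking Database and Checking Cached) apply the same acceptance test to a stored location. A minimal sketch of that test follows; the names (LocationFreshnessSketch, Fix, pickStoredFix) are hypothetical, and treating "not older than 1[min]" and "precision less than 170[m]" as an inclusive age limit and a strict accuracy limit is an assumption.

```java
/** Hypothetical sketch of the freshness and accuracy check used before a new location update. */
final class LocationFreshnessSketch {

    static final long MAX_AGE_MS = 60_000L;  // not older than 1 minute
    static final float MAX_ACCURACY_M = 170f; // precision better than 170 m

    /** A stored location fix. */
    static final class Fix {
        final double latitude;
        final double longitude;
        final long timeMs;
        final float accuracyM;

        Fix(double latitude, double longitude, long timeMs, float accuracyM) {
            this.latitude = latitude;
            this.longitude = longitude;
            this.timeMs = timeMs;
            this.accuracyM = accuracyM;
        }
    }

    /** True if the fix exists, is recent enough and is accurate enough. */
    static boolean isUsable(Fix fix, long nowMs) {
        return fix != null
                && nowMs - fix.timeMs <= MAX_AGE_MS
                && fix.accuracyM < MAX_ACCURACY_M;
    }

    /**
     * Returns the first usable fix, preferring the local database over the OS cache.
     * A null result means that a real location update has to be started.
     */
    static Fix pickStoredFix(Fix fromDatabase, Fix fromOsCache, long nowMs) {
        if (isUsable(fromDatabase, nowMs)) {
            return fromDatabase;
        }
        if (isUsable(fromOsCache, nowMs)) {
            return fromOsCache;
        }
        return null;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        Fix stale = new Fix(55.62, 12.08, 10_000L, 20f);   // 90 s old
        Fix cached = new Fix(55.62, 12.08, 80_000L, 50f);  // 20 s old, accurate
        Fix chosen = pickStoredFix(stale, cached, now);
        System.out.println(chosen == cached ? "using OS cache" : "other");
    }
}
```

Only when both checks fail does the task pay the energy cost of a real GPS or network location update.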

5.4.4.3 Server Task

The Server task is in charge of submitting the packages with the collected data to the server, refreshing the local database and transporting POST requests for the front-end application. The state machine diagram is shown in Figure 49. The variable CAPACITY determines how many packages are sent in every cycle; it ranges over [2^0, 2^3] and starts at 1. If all the packages of a cycle are submitted correctly, CAPACITY is incremented to the next power of 2; otherwise it is decremented.

The task starts in the Idle state. It checks how many packages have already been created but not yet submitted. If the number of packages is less than CAPACITY, new packages are created using the PackageCreator class. The next step is to check whether there is Internet access. If not, CAPACITY is reduced and the cycle finishes. Otherwise, the task starts to submit the data to the server. It creates a connection to the server for every package in a separate thread. At the same time, a countdown timer is set to 40[s]. When the timer reaches zero, the threads that have not completed are cancelled. If all the packages are submitted correctly, the CAPACITY value is incremented. At the same time, the local database is refreshed to mark as submitted all the data that was sent successfully.

30 Android Developers, LocationManager: http://developer.android.com/reference/android/location/LocationManager.html
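The adjustment of CAPACITY can be sketched as a small pure function. Two assumptions are made here and should be read as such: the range is taken to be the powers of two from 1 to 8, and "decremented" is interpreted as moving to the previous power of two. The class and method names are hypothetical.

```java
/** Hypothetical sketch of how CAPACITY could be adjusted after each Server cycle. */
final class CapacitySketch {

    static final int MIN_CAPACITY = 1; // 2^0
    static final int MAX_CAPACITY = 8; // 2^3

    /**
     * Returns the capacity for the next cycle.
     *
     * @param capacity     current capacity (a power of two between 1 and 8)
     * @param allSubmitted true if every package of the cycle was submitted correctly
     */
    static int next(int capacity, boolean allSubmitted) {
        // Assumed rule: double on success, halve on any failure or missing connection.
        int updated = allSubmitted ? capacity * 2 : capacity / 2;
        return Math.max(MIN_CAPACITY, Math.min(MAX_CAPACITY, updated));
    }

    public static void main(String[] args) {
        int capacity = MIN_CAPACITY;
        boolean[] cycles = { true, true, false, true, true, true };
        for (boolean ok : cycles) {
            capacity = next(capacity, ok);
            System.out.println((ok ? "success" : "failure") + " -> CAPACITY " + capacity);
        }
    }
}
```

Under this rule the task sends more packages per cycle while the network behaves well and backs off quickly when submissions fail, which limits the energy wasted on a congested network.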

[Figure 49 diagram: states Idle, Preparing Submission, Timer Running and Submitting Package. Idle: get the unsubmitted packages from the database and generate (CAPACITY - unsubmittedPackages) packages. Preparing Submission: "Is there Internet access?" [No] decrease CAPACITY, [Yes] save the generated packages and start the timer. Submitting Package: on success, refresh the local database, delete the package and finish the thread; on timeout, cancel the running threads. If all packages were submitted correctly, increase CAPACITY.]

Figure 49. Server task state machine diagram

Another process performed by the Server task, not shown in the diagram above, is carrying POST requests for the front-end application. We mentioned previously the connection problems during a massive event, which affect both the front-end application and the scanner library. If the front-end application needs to exchange information with its own web service, it has to try periodically to get Internet access. We aim to spare the front-end application developers from implementing that function by reusing the periodic Server task.

[Figure 50 diagram: lifelines Front-end App, ScannerLibrary, Webservice and Front-end App Webservice. Messages shown: schedulePOST(now, URL, postData[]), add POST to database, wait until the next Server Service cycle or force it to start, Server Service started, sendPackages(), insert packages into the main database, check if the packages contain POST requests, POST(url, postData[]), POST results, and onReceive() with the package result and the POST results.]

Figure 50. POST requests and package submissions using the scanner library

This process is shown in the sequence diagram in Figure 50. First, the front-end application uses the schedulePOST(time, URL, postData[]) interface provided by the library, where time is the moment at which the POST request should be processed. This data is saved in the database and is processed when a Server cycle occurs. It is important to note that this mechanism is intended for asynchronous requests only, because the response can take from minutes to half an hour, depending on the network conditions. When the Server task is started, the packages are sent to the server. The web service reads the information contained in the package and appends it to the main database. If the package contains POST requests, the web service connects to the specified URL and sends the postData[]. Once it has the response from that server, the results are sent back to the library, along with the result of appending the package. More details about the implementation of this process are presented in Section 5.5.4.

One advantage of this process is that the front-end application does not need to establish a direct connection to its own server. Instead, the web service establishes the connection. Since a server-to-server connection might be faster than a mobile-to-server one, we can reduce the connection latency.

5.4.5 Coordination with other library instances

As mentioned before, the scanner library is a component embedded within the front-end application. This means that every front-end application runs its own copy of the library. However, if two or more instances of the library are running at the same time in different front-end applications, concurrency problems may occur. For example, two different Bluetooth tasks may try to access the Bluetooth device at the same time, one blocking the other. In addition, two instances of the scanner library will consume twice the battery that one instance does if their interval calculations are not synchronized.

[Sequence diagram. Lifelines: SchedulerApp1, SchedulerApp2 and SchedulerApp3, each of type Scheduler. Messages in order: setNextAlarm(); alarm is triggered; setNextAlarm(); sendOrderedBroadcast(counter=1), after which SchedulerApp3 is MASTER and increments the counter; sendOrderedBroadcast(counter=2), after which SchedulerApp2 is SLAVE and increments the counter; sendOrderedBroadcast(counter=3), after which SchedulerApp1 is SLAVE and increments the counter; launchService() only if this scheduler is the MASTER.]

Figure 51. Sequence diagram for scheduler coordination between different library instances

In order to prevent the scenarios above, we designed a system that coordinates the Schedulers among different library instances. By enabling or disabling the schedulers assigned to the same task (but in different instances), we avoid starting that task several times. There is always exactly one scheduler that can start the task (the master scheduler), while the rest cannot (the slave schedulers). If the master scheduler is killed (by closing its front-end application), then a slave scheduler becomes the new master. The oldest scheduler instance running in the system is always the master scheduler.

The sequence diagram in Figure 51 describes this coordination process. Let us assume that three instances of the library are running, so that there are three schedulers for every task. Let us also consider that the schedulers in the diagram belong to the same task. Every lifeline represents a scheduler: schedulerApp1, schedulerApp2 and schedulerApp3, where schedulerApp3 was the earliest scheduler instantiated on the device. Let us follow the schedulerApp1 lifeline. When the scheduler is woken up (as in 5.4.3.2), it immediately sets the next interval. Then, before starting its task, it checks which scheduler was instantiated first by sending an Ordered Broadcast [31]. Ordered Broadcasts allow sending a message to many receivers, one after another, and every receiver can modify the message. Thus, schedulerApp1 sends an Ordered Broadcast message to the schedulers with a variable counter=1, and the first scheduler instance that receives this value becomes the master. As the first scheduler instance created is schedulerApp3, it sets itself as the master and increments the counter. The message is then passed to the other schedulers with a counter greater than 1, which sets them as slaves. Finally, schedulerApp1 does not start the task, because it is a slave.
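The counter-based election can be simulated without any Android classes. The following is a minimal sketch, not the library's implementation: the MasterElection and Scheduler classes and the onBroadcast method are hypothetical names, and the sketch simply assumes that the message reaches the receivers oldest instance first, since the text does not detail how that delivery order is obtained.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the election; no Android classes are used.
final class MasterElection {
    static final class Scheduler {
        final String name;
        boolean master;

        Scheduler(String name) {
            this.name = name;
        }

        // Mimics one receiver in the chain: the instance that sees
        // counter == 1 becomes master, every later one becomes slave.
        int onBroadcast(int counter) {
            master = (counter == 1);
            return counter + 1; // value handed to the next receiver
        }
    }

    // Assumption: receivers are visited oldest instance first.
    static void sendOrderedBroadcast(List<Scheduler> oldestFirst) {
        int counter = 1;
        for (Scheduler s : oldestFirst) {
            counter = s.onBroadcast(counter);
        }
    }

    public static void main(String[] args) {
        List<Scheduler> running = new ArrayList<>();
        running.add(new Scheduler("schedulerApp3")); // oldest instance
        running.add(new Scheduler("schedulerApp2"));
        running.add(new Scheduler("schedulerApp1"));

        sendOrderedBroadcast(running);
        for (Scheduler s : running) {
            System.out.println(s.name + ": " + (s.master ? "MASTER" : "SLAVE"));
        }

        running.remove(0);             // the master's application is closed
        sendOrderedBroadcast(running); // on the next alarm a slave takes over
        for (Scheduler s : running) {
            System.out.println(s.name + ": " + (s.master ? "MASTER" : "SLAVE"));
        }
    }
}
```

Because the roles are recomputed on every alarm, no explicit hand-over message is needed when the master disappears; the next broadcast promotes the oldest surviving instance.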

5.5 Implementation

In this section we describe the main aspects of the development process. We present the working methodology and the tools used, as well as relevant implementation details.

5.5.1 Methodology

Since the design of the solution is very modular, we decided to implement first the simplest components that could give us an output, before proceeding to the complex ones. After every component was completed, a prototype was created for testing purposes. The first components implemented were the ones related to the Bluetooth and GPS tasks. At the same time, we implemented the local database and the methods to access it. Next, we developed their counterparts on the server side. In this stage, we also made the connections between the scanner library and the web service. Finally, we developed the most complex algorithms, such as the coordination between library instances and the process that calculates the intervals. The library had to be completed three weeks before the festival because of the integration tests with the front-end applications, which were finished around that time.

5.5.2 Technologies used

5.5.2.1 Languages

The list below contains the languages used to implement the solution presented in this thesis.

• Java: Used to implement the scanner library, since it is the main development language for Android.
• PHP: Used to implement the web services on the server.
• SQL: Used to perform operations on the databases.
• JavaScript: Used for developing monitoring tools.

31 Android Developers, Context: http://developer.android.com/reference/android/content/Context.html

• C#: Used for developing monitoring tools.
• HTML: Used for developing monitoring tools.

5.5.2.2 Tools

The list below describes the tools used to implement the library and the web service and to create the monitoring tools.

• Eclipse Classic SDK version 3.7.2: Main programming environment.
  o Android SDK version: 18.0.0.
  o SVNKit version: 1.3.5.
  o PyDev: Python’s environment.
  o PHP Development Tools (PDT) SDK version 3.0.0.
• Unity 3.5.2f: Game engine used for monitoring.
• Python 2.6.6: Integrated Development Environment (IDLE) for Python.
• MySQL Workbench 5.2CE: Visual tool for database operations.
• Photoshop CS5: Tool for creating raster and vector graphics.
• SQLite: Database engine used by the library.
• MySQL: Database engine used by the server.

5.5.3 Database

5.5.3.1 Mobile

The engine used for the local database in the scanner library was SQLite [32]. Since it is shipped with the Android API, there was no need to add extra dependencies to the library. Access to the local database is implemented in the Database class, located in the dk.dtu.imm.btscanner.utils package. This class provides methods to perform insert, select and update operations for each table, so that the rest of the components do not contain SQL queries. On Android, every application has a private folder with exclusive read/write access. However, the library instances running in different front-end applications needed access to a single database. Therefore, the database was saved in a folder that every application can access (the smartphone SD card), at the path /mnt/sdcard/databases/database.db. In the rare case of a smartphone without an SD card, the database file was stored in the phone's internal storage (e.g. /data/data/database.db). Additionally, a database journal file is created automatically every time an insert statement is executed. SQLite uses this file in case of failure, since it keeps track of the operations executed on the database.

32 SQLite Documentation: http://www.sqlite.org/docs.html

5.5.3.2 Server

The server running the database engine was lestrade.imm.dtu.dk, provided by the Department of Informatics and Mathematical Modeling at DTU. The database engine used on the server was MySQL, and the tool used to access its databases remotely was MySQL Workbench 5.2 CE [33]. The database presented in 5.4.1.3 was created in the schema s100433_Roskilde2012.

5.5.4 Web services and package appending

The main objective of the web service is to receive the packages of collected data and append them to the main database. The functionality is implemented as a set of PHP files (shown in Figure 52). These files reside on the same server as the database: lestrade.imm.dtu.dk.

Figure 52. Web service files

The process of appending a package to the main database is illustrated in Figure 53. On the mobile side, the package is created as an SQLite db file and is compressed using GZip before submission.

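[Diagram. Mobile side: the mobile SQLite database produces a package (SQLite file), which is compressed with GZip and sent over the Internet as a compressed package. Server side: ws.php and receivePackage.php receive the compressed package and decompress it with GZip into the package SQLite file; sqliteParser.php parses it with sqlite3 into SQL inserts; appendSQLite.php adds the scanner information to the SQL inserts, which are then inserted into the main MySQL database.]

Figure 53. Steps to append a package into the main database. The illustration shows how a package is transformed on its way from the scanner to its insertion into the main database.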
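The GZip step of this pipeline can be illustrated with the standard java.util.zip classes. This is a minimal sketch, assuming the package is handled as a byte array; the class name PackageGzip and its methods are hypothetical. In the thesis the decompression happens in PHP on the server, so the decompress method is included here only to show the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compresses a package before upload and restores it again.
final class PackageGzip {
    /** Compresses the raw bytes of a package file with GZip. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new IllegalStateException("compression failed", e);
        }
        // Closing the GZip stream above writes the trailer, so the buffer is complete here.
        return buffer.toByteArray();
    }

    /** Restores the original bytes from a GZip-compressed package. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new IllegalStateException("not a valid GZip package", e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = new byte[10000]; // stand-in for the bytes of a package .db file
        byte[] packed = compress(raw);
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes compressed");
        System.out.println("round trip ok: " + java.util.Arrays.equals(raw, decompress(packed)));
    }
}
```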

33 MySQL, Download MySQL Workbench: http://www.mysql.com/downloads/workbench/

The scanner library interacts only with the ws.php service, using the multipart/form-data [34] encoding, which allows sending POST fields and files in the same request. The ws.php service reads the data from the POST fields and uses them to call the method receivePackage, implemented in receivePackage.php. The signature of the receivePackage method is the following:

receivePackage($senderMac, $time, $idApp, $tries, $manufacturer, $model)

The receivePackage method decompresses the package and saves it into the sqlite folder. Additionally, it registers the package information in the table Submission. The package information consists of the parameters passed to the receivePackage method. After the package information is inserted, an identifier (idSubmission) is returned.

In the next step, performed by the sqliteParser.php service, the file is converted from the SQLite format into a raw SQL file using the sqlite3 application installed on the server. The resulting file consists of insert statements in SQL, which can be used to insert the data into the main MySQL database. However, the SQL dialect used by SQLite differs slightly in format from the one used by MySQL, so all those differences are corrected at this stage.

In Section 5.4.1 we mentioned that the structures of the SQLite and MySQL tables differ only in two extra fields (idSubmission and senderMac). These two fields identify the package to which the data belongs. In the following step, performed by the appendSQLite.php service, the two extra fields are appended to every insert line. Finally, the SQL file is sent to MySQL to insert the collected data.
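The rewriting of one insert line can be sketched as follows. This is only an illustration: the actual services are written in PHP, and the sketch assumes a dump line of the form INSERT INTO "Table" VALUES(...); with the two extra fields stored as the last two columns. The class name InsertRewriter, the table name Bluetooth and the sample values are hypothetical.

```java
// Hypothetical sketch of the per-line rewriting done before the data reaches MySQL.
final class InsertRewriter {
    /**
     * Turns an SQLite dump line into a MySQL insert and appends the two
     * extra fields (idSubmission, senderMac) to the value list.
     */
    static String appendScannerInfo(String insertLine, long idSubmission, String senderMac) {
        String line = insertLine.trim();
        int valuesAt = line.indexOf(" VALUES(");
        if (!line.startsWith("INSERT INTO") || !line.endsWith(");") || valuesAt < 0) {
            throw new IllegalArgumentException("not an INSERT line: " + insertLine);
        }
        // One format difference: the dump quotes identifiers with double quotes,
        // while MySQL by default expects backticks.
        String head = line.substring(0, valuesAt).replace('"', '`');
        // Keep the original values and drop the trailing ");".
        String values = line.substring(valuesAt, line.length() - 2);
        String mac = senderMac.replace("'", "''"); // escape single quotes in the literal
        return head + values + "," + idSubmission + ",'" + mac + "');";
    }

    public static void main(String[] args) {
        String dumped = "INSERT INTO \"Bluetooth\" VALUES(7,'00:1A:2B:3C:4D:5E',1341500000);";
        System.out.println(appendScannerInfo(dumped, 42L, "AA:BB:CC:DD:EE:FF"));
    }
}
```

String concatenation keeps the sketch close to the line-by-line process described above; a production service would more commonly bind the values through prepared statements.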

In summary, the web service functionality can be broken down into the following tasks:

• Receive the package from the scanner library.
• Save the package on the server.
• Decompress the package (it is originally compressed using GZip).
• Convert the .db file into SQL insert statements.
• Add the scanner information to the data provided by the package.
• Insert the data into the main database.
• Perform the POST requests required by the front-end application (see 5.4.4.3).
• Return a response to the scanner library about the status of the appending.

5.6 Monitoring tools

Due to the complexity and distribution of the data, it is difficult to identify issues by looking at the raw data in the database. Therefore, before testing the library, we decided to create monitoring tools that represent the data graphically in real time. These tools give us a readable way to carry out the testing stage. They also provide live information about what happens during the actual data collection.

5.6.1.1 First development stage monitor

The First development stage monitor is a visualization tool intended to detect abnormal behavior or incorrect data in the early stages of development. It is written in PHP and runs in a web browser. This monitor provides three functionalities: a map containing the current scanner locations and the devices discovered, a network coverage map, and a graph showing the latest battery levels for an individual scanner. It uses a PHP wrapper for the Google Maps API v3 to render the map [35]. We also use a PHP script, macManufacturer.php, which provides the function macLookup to determine the vendor of a device from a given MAC address. We can see an example of this tool in Figure 54, where the relevant elements are marked with numbered green clouds. In order to show the current scanners map, the user has to click on (1), and for the network coverage on (2). In both cases there is a form to specify a range of dates for the data.

34 The Internet Society RFC2388: http://tools.ietf.org/html/rfc2388


Figure 54. First stage development monitor. 1. Selection of date ranges for plotting the historical position of the scanners. 2. Selection of date ranges for plotting the network coverage. 3. Selection of a scanner to show its battery levels. 4. Scanners map. 5. Bubble containing the scanner status for a particular location.

In the current scanners map, every discovery performed by a scanner is shown as a yellow circle with a number inside (4). This number is the number of devices discovered by that scanner at a given time. The scanner is plotted at the location update closest in time to the discovery. All discoveries performed by the same scanner are connected by a line whose color is unique to that scanner. These lines indicate the probable path along which the scanner was moving.

Figure 55. History of the battery level for a scanner. The top graph shows the history on a uniformly spaced time axis. The bottom graph shows the same data as a scatter plot.

35 PHP Class for Building maps using Google Maps API: http://code.google.com/p/php-google-map-api/

Detailed information about a discovery is available to the user by clicking on the discovery icon (5). This information is displayed in a bubble, which shows the time the discovery was performed, the location accuracy and the Bluetooth MAC address of the scanner. This bubble also contains a button to generate a graph with the battery usage for that scanner, shown in Figure 55. Additionally, we can manually specify the scanner whose battery level we want to see by selecting its MAC address in (3). The histogram for the battery level was generated using the Open Flash Chart PHP libraries [36]. A second map showing the network coverage can be displayed by clicking on (2). This map (Figure 56) shows the places where a user could not send data (had no access to the Internet). Each spot is marked with a semitransparent pink square, which can overlap with other squares. A darker color indicates overlapping spots and thus more users having connection issues.

Figure 56. Network coverage. The pink spots show the areas where the users have connection issues. The darker the spot, the more connection problems took place.

5.6.1.2 Roskilde live monitor

This monitor was created to see the current status of the actual data collection during the Roskilde festival. The Roskilde live monitor is a simplified version of the monitor described previously. Figure 57 shows a screenshot of it, with the scanner positions (green scanner icon) and the discovered Bluetooth devices (blue icon B). Isolated scanner icons represent scanners that have the Bluetooth device turned off and are only collecting location data.

Figure 57. Roskilde live monitor

36 Open Flash Chart, Home: http://teethgrinder.co.uk/open-flash-chart-2/

The data used to display the map is taken directly from the main MySQL database. To retrieve it, we created a set of PHP functions that perform SQL queries against the server database. These functions return an array that can be used to display the map.

5.6.1.3 Roskilde live statistics

In addition to the geographical distribution of the scanners and detected devices during the Roskilde festival, we created a website with live statistics. This monitor presents the status of the data collection as amounts and as historical graphs.

Figure 58. Roskilde live statistics monitor. The amounts are shown at the top, and the historical graphs at the bottom.

The data described as amounts were:

• Total unique devices discovered by all the scanners.
• Total unique devices seen by more than one scanner.
• Total scanners running the library.
• Total scanners with the Bluetooth device activated.
• Total scanners with the Bluetooth device activated within the festival area. (These are the only scanners that collect the information properly.)
• Total scans.
• Average number of scans performed by a device.
• Scans performed by each of the 3 front-end applications.

The data described as historical graphs covered the last 24 hours. There was one version with the data from the entire database and one version for each front-end application. The information displayed in these graphs was:

• Scans per hour
• Unique devices per hour
• Scanners per hour
• Battery average per hour


5.6.1.4 Roskilde simulation

One of the disadvantages of the previous monitors is that they show the data without the time dimension. In other words, events that occurred at different times are plotted on a single graph, so the information overlaps. In order to produce a good visualization for a qualitative analysis once the festival had finished, we created a 3D interactive animation in the Unity 3D [37] game engine. This tool also helps us confirm our quantitative results and previous assumptions. The animation consists of a 3D model of the festival area, the scanners and the detected devices (see Figure 59). It uses the collected data to render the scanner positions (white spheres) and their detected devices (blue spheres around the scanners). The position of every scanner is always interpolated over time between two location readings. Furthermore, the animation shows the information in a “fast forward” mode, displaying two festival days per minute. The animation is available online [38].

Figure 59. Roskilde simulation monitor. The white spheres represent the scanners, while the blue ones the devices discovered by that scanner.
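The interpolation of a scanner position between two location readings can be sketched as below. The text does not state which interpolation the animation uses, so linear interpolation is an assumption here, and the sketch is written in Java rather than in the scripting language of the Unity project. The names PositionInterpolator and Fix and the sample coordinates are hypothetical.

```java
// Hypothetical sketch: linear interpolation between two location readings.
final class PositionInterpolator {
    /** One location reading of a scanner. */
    static final class Fix {
        final long timeMs;
        final double lat;
        final double lon;

        Fix(long timeMs, double lat, double lon) {
            this.timeMs = timeMs;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Estimates the position at time t, with a.timeMs earlier than b.timeMs. */
    static Fix interpolate(Fix a, Fix b, long t) {
        if (b.timeMs <= a.timeMs) {
            throw new IllegalArgumentException("b must be later than a");
        }
        double f = (double) (t - a.timeMs) / (b.timeMs - a.timeMs);
        f = Math.max(0.0, Math.min(1.0, f)); // outside the interval, stay at the nearest reading
        return new Fix(t, a.lat + f * (b.lat - a.lat), a.lon + f * (b.lon - a.lon));
    }

    public static void main(String[] args) {
        Fix a = new Fix(0L, 55.6200, 12.0800);
        Fix b = new Fix(60000L, 55.6210, 12.0820);
        Fix mid = interpolate(a, b, 30000L); // halfway between the two readings
        System.out.println(mid.lat + ", " + mid.lon);
    }
}
```

Interpolating latitude and longitude directly is a reasonable simplification over an area the size of a festival site, where the curvature of the Earth is negligible.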

5.7 Testing

We tested every component as soon as it was completed. However, we also wanted to test the library in environments with conditions similar to those of the Roskilde festival. Below we present the three main testing stages performed during the development process.

5.7.1 Alpha Testing

5.7.1.1 Methodology

Once the development of the library had reached 60% completion, we performed a set of test cases in which the library was integrated into open source applications (acting as front-end applications). At this point, the library was able to perform scans at fixed intervals and send data to the server. However, the coordination between multiple instances of the library was not implemented yet.

37 Unity, Game Engine: http://unity3d.com/
38 Roskilde 2012 animation: http://lestrade.imm.dtu.dk/~s101422/RoskildeUnity/atoms.html

This alpha testing aimed to check the consistency of the collected data, detect exceptions, analyze qualitatively the performance of the front-end applications and see what happens when multiple instances of the library are running. Before performing the testing, we created a set of test cases (see Appendix iv and the example in Table 5). The open source applications used as front-end applications in the tests were Frozen Bubble [39] and Panoramio [40]. The former is a well-known computer game, while the latter is an application that shows pictures on top of a Google Map. The devices used in the testing were two HTC smartphones (Desire and Wildfire, see 3.4). Only two people (the authors) were involved in the testing, each of them carrying one of the testing devices. One of the testing environments was the First May event, which takes place in Fælledparken in Copenhagen and attracts around 100.000 [41] participants. The second environment was the Copenhagen Carnival, with around 80.000 [42] participants. These two events have conditions similar to those we expected to find at the Roskilde festival.

Test Case Nr: 01
Start Date / End Date: 23/05/12
Written by: Lukasz Dynowski
Approved by: Marcos Fuentes
Description: Multi-application testing (library compatibility test).
Instruction: Attach the module to other available applications. Install the applications on an Android smartphone (minimum v2.2). Run the applications with the attached library simultaneously on the same device, in this particular order: one application installed and run, two applications installed and run, three applications installed and run.
Expected Results:
  Positive: The applications do not cause the malfunction of another application or of the system. The services run interchangeably or simultaneously.
  Negative: The application causes a malfunction, such as killing other services of applications running the same library.
Additional Notes: The same test shall be run on different devices.

Table 5. Alpha testing case example

5.7.1.2 Results

Performance

After starting the front-end application, we noticed a small lag caused by the library initialization. After that, however, we did not observe any significant system slowdown.

Two library instances at the same time

Even though we had not yet implemented synchronization for multiple library instances, we wanted to see what happened. We discovered that when two front-end applications were running at the same time, either of them could throw random exceptions. It is interesting to note that after the exception only the front-end application stopped, while the library instance kept running in the background.

Database consistency

The database consistency testing did not reveal errors in the process of sending data. However, by analyzing the collected data we discovered that some fields in the database took larger values than expected. For example, the Bluetooth discoveryTime field (indicating how long a discovery takes) was greater than 300 [s] in some cases, whereas the expected value was not greater than 12 [s].

Qualitative testing using the first stage monitor

Using the first stage monitor to plot the scanner locations collected by the library with only the network provider (no GPS), we identified a strange teleportation. While all the location updates were close to each other (in Copenhagen City), one update was located 30 [km] away from the rest (in Roskilde City). This is physically impossible, since a location was obtained every 1 [min] and the participants of the test were walking. Figure 60 shows the first stage monitor screenshot that helped us identify this issue.

39 The Android port of the Frozen Bubble game: http://code.google.com/p/frozenbubbleandroid/source/browse/
40 Sample Applications for the Android platform: http://code.google.com/p/apps-for-android/source/browse/#git%2FPanoramio%253Fstate%253Dclosed
41 Velkommen til 2.maj (Welcome to 2nd may): http://www.bt.dk/node/19950677/print
42 Netnyhederne.dk: http://netnyhederne.dk/2012/05/29/se-video-her-80-000-til-karneval-i-faelledparken-kobenhavn/

Figure 60. Teleportation of a user when the library obtains the location using the network provider.

At the testing place in Fælledparken we noticed containers with 3G cells (Figure 61). We suspect that these containers act as portable antennas that provide better network reliability during massive events. However, these portable antennas may be registered at a different place than where they actually are, leading to incorrect location updates. For this reason, we conclude that locations obtained from the network provider are not fully reliable at massive events. The library will always use both the GPS and the network provider, in order to confirm the location obtained.

Figure 61. Container carrying a set of 3G cells at First May event at Fælledparken. Location updates using 3G networks may not be reliable on massive events, since these portable cells may be registered at different places.
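The argument that the jump is physically impossible can be made concrete by computing the speed implied by two consecutive location updates. The sketch below is one possible plausibility check and is not described in the thesis as part of the library; the class name TeleportCheck, the approximate coordinates for Fælledparken and Roskilde City, and the 10 m/s threshold are all assumptions chosen for illustration.

```java
// Hypothetical plausibility check based on the speed implied by two updates.
final class TeleportCheck {
    static final double EARTH_RADIUS_M = 6371000.0;

    /** Great-circle distance in meters between two points (haversine formula). */
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    /** Returns false when moving between the two updates would exceed maxSpeedMps. */
    static boolean isPlausible(double lat1, double lon1, long t1Ms,
                               double lat2, double lon2, long t2Ms,
                               double maxSpeedMps) {
        double meters = distanceMeters(lat1, lon1, lat2, lon2);
        double seconds = Math.abs(t2Ms - t1Ms) / 1000.0;
        if (seconds == 0.0) {
            return meters == 0.0; // same instant: only an identical position is plausible
        }
        return meters / seconds <= maxSpeedMps;
    }

    public static void main(String[] args) {
        // Approximate coordinates (assumed): Fælledparken and Roskilde City, one minute apart.
        double meters = distanceMeters(55.70, 12.57, 55.64, 12.08);
        System.out.println("distance: " + Math.round(meters) + " m");
        System.out.println("plausible on foot: "
                + isPlausible(55.70, 12.57, 0L, 55.64, 12.08, 60000L, 10.0));
    }
}
```

Covering roughly 30 km in one minute would require about 500 m/s, far beyond any walking speed, which is why a single check of this kind is enough to flag the update seen in Figure 60.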

5.7.2 Beta Testing

5.7.2.1 Methodology

Once the library development had reached 90%, we decided to create and publish a simple front-end application for the Copenhagen Distortion [43] 2012 event. Distortion is another massive event with conditions similar to those of the Roskilde festival. It takes place in the streets of Copenhagen over three days, gathering around 140.000 [44] people. The front-end application was called Distortion CPH and displayed the festival schedule. Screenshots of it can be seen in Figure 62. When the application starts, it asks the user to turn on the Bluetooth adapter if it is disabled.

Figure 62. Screenshots of Distortion CPH 2012 application

At this point, the library supported synchronization between multiple instances, and several interfaces exposed to the front-end application were completed. The front-end application displayed a map of the event along with the user location, which was retrieved using the library interface getLocation. Finally, the application was submitted to the Google Play store one day before the event started. In addition to the people who freely downloaded the application, we also relied on volunteers from DTU who actively participated in the testing.

5.7.2.2 Results

By analyzing the data after the Distortion event, we discovered additional bugs that had been missed in the alpha testing. The first bug was that, in some circumstances, packages were sent twice to the main database. The reason is that the library sent the packages in parallel using threads, while the database can only be accessed by one thread at a time. As a result, the library was unable to mark the data as already sent. Another issue was an incorrect value for the battery scheduler frequency: by mistake, we read the battery level only twice per day. We also noticed that the server received empty packages. The reason was that the counter that indicates the number of records in a package was never reset, so the library always considered that there was enough data to create a package.
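Both package bugs concern shared state: which records are still unsent, and how many there are. The thesis does not describe the fix that was applied, so the following is only one possible sketch. It derives the count from the list of unsent records instead of a separate counter, and it serializes access so that two sending threads cannot take the same records. The class name PackageBuffer and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a thread-safe buffer of records waiting to be packaged.
final class PackageBuffer {
    private final int minRecords;
    private final List<String> unsent = new ArrayList<>();

    PackageBuffer(int minRecords) {
        this.minRecords = minRecords;
    }

    /** Stores one collected record until the next package is built. */
    synchronized void addRecord(String record) {
        unsent.add(record);
    }

    /** Returns the next package, or an empty list when there is not enough data yet. */
    synchronized List<String> nextPackage() {
        if (unsent.size() < minRecords) {
            return new ArrayList<>();
        }
        List<String> pkg = new ArrayList<>(unsent);
        unsent.clear(); // the count of pending records drops back to zero here
        return pkg;
    }

    public static void main(String[] args) {
        PackageBuffer buffer = new PackageBuffer(2);
        buffer.addRecord("scan-1");
        System.out.println("records in package: " + buffer.nextPackage().size());
        buffer.addRecord("scan-2");
        System.out.println("records in package: " + buffer.nextPackage().size());
        System.out.println("records in package: " + buffer.nextPackage().size());
    }
}
```

Since nextPackage both reads and clears the list under the same lock, a second thread calling it right afterwards gets an empty list rather than a duplicate or an empty package that is still sent.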

43 DISTORTION 2012 x Front: http://cphdistortion.dk/ 44 Distortion 2012 Nørrebroportalen 2200.dk: http://2200.dk/default.asp?ID=50010300251

On the other hand, we realized that only 40% of the users actively enabled the Bluetooth device. This may significantly reduce the number of Bluetooth discoveries. However, it is a variable that is out of our control.

5.7.3 Integration tests with Roskilde 2012 front-end applications

Once the bugs detected in the previous tests were fixed, the library was ready to be integrated into the Roskilde 2012 front-end applications created by DTU students.

Figure 63. Roskilde 2012 Front-end applications created by DTU students containing the scanner library. From left to right: Hide and Seek, MusicNerd and Decibel.

5.7.3.1 Methodology

Two weeks before the festival, the three front-end applications were completed and ready for integration. These applications were Roskilde Hide and Seek [45], Roskilde MusicNerd [46] and Roskilde Decibel [47]. Screenshots of all of them are shown in Figure 63. The main features to be tested were the interfaces provided by the library to the front-end applications. The interfaces used were getLocation, schedulePost and schedulePostAndFile. We submitted the library to an SVN server, and the students were in charge of the integration. We provided them with an installation manual, which can be seen in Appendix i. All communication with them was via email.

5.7.3.2 Results

At the beginning of this process, the main issues were questions about installing the library in the front-end application project. Additionally, the students asked us to clarify the use of the interfaces. The most important problem was related to the database structure. After the beta testing, we added an extra field to the local database. However, the local database is only created when it does not already exist on the device. Therefore, on some devices the old database structure was still present and the library tried to insert records with a different structure.

45 Hide and Seek: https://play.google.com/store/apps/details?id=dk.dtu.imm.hideandseek 46 MusicNerd: https://play.google.com/store/apps/details?id=com.togomachine.musicnerd 47 Decibel: https://play.google.com/store/apps/details?id=dtu.imm.roskilde

We fixed this problem by adding extra code to the library that recreates the database if it detects the old structure. The provided interfaces getLocation and schedulePost worked correctly. However, the application Hide and Seek needed to send large images taken with the camera through the schedulePostAndFile interface, and on several occasions the images never arrived at the server. This issue was not fixed, because the problem was detected only a few days before the deadline. In the end, we suggested that the developer use this interface only when there is no Internet access.

5.8 Deployment

The front-end applications containing the library were submitted two weeks before the festival (June 18), as required by the Roskilde Festival organization. All of them were published in the Google Play store using a single account, owned by the DTU IMM department. Once in the store, the applications were freely available for download by any Android user. It was still possible to solve integration issues between the deadline and the festival opening and to resubmit the applications to the store. It is important to note that a popup is displayed the first time these front-end applications start. This popup notifies the user that anonymous location and Bluetooth data will be collected, and it offers the option to accept or to terminate the application.

5.9 Summary

This chapter synthesizes the data collection solution of this thesis. The solution is a scanning software component deployed as a library, which can be attached to any front-end Android application. This library runs on a smartphone as a background service. We presented the architecture of the scanning library before the requirements, in order to introduce them. Furthermore, we divided the architecture into two sides. The first is the mobile side, which contains the front-end application with the attached scanning library and the local mobile database. This database temporarily stores the data collected by the smartphone. The second is the server side, which provides the main database that stores the data collected by all scanners. Additionally, the server hosts a set of monitoring tools that show statistical and visual information about the movement of the participants. These tools were used in the testing processes to identify qualitatively any abnormalities in the data collection, which might be difficult to notice by just checking the raw data. We also used them to display the data geographically and to produce statistics about the festival. The next step after presenting the architecture and the requirements was the design phase, which was also divided between the mobile and server sides. The scanning library can be broken down into four periodic asynchronous tasks: Bluetooth (discovery process), GPS (location updates using the GPS sensor and the network provider), Battery (checking the battery level) and Server (sending packages to the server). The frequency of these tasks is defined by the Scheduler component, using the information obtained in Chapter 4. The server side of the solution contains a web service, which receives the packages sent by the scanner library and appends them to the main server database.
We also discussed in this chapter the synchronization of multiple instances of the scanning library. This situation occurs when several front-end applications are running at the same time. Therefore, we created a master-slave relation between the scanner library instances. There is always exactly one instance (the master) that performs the tasks. When the master instance is destroyed, one of the remaining slave instances becomes the master. In the implementation part, we described the technologies used for development, and we gave the details of the database implementation on the mobile and server sides. Furthermore, we explained in detail the process of creating and sending packages. Before shipping the library as a fully compatible and reliable product, we tested it several times. These tests were related to the results of the risk assessment conducted in Chapter 3.6. The testing phase was done in three stages. At 60% of the development progress, we performed an alpha testing phase, in which we used open source applications as front-end applications and tested at two massive events. At 90% completion, we conducted a beta testing phase using an application that we created for the Distortion massive event in Copenhagen. Finally, we performed integration tests with the front-end applications dedicated to the 2012 edition of the Roskilde festival. These applications were designed and developed by DTU students of the IMM department. The applications were Roskilde Decibel (sound level measurement), Hide and Seek (social game) and Roskilde Schedule (festival schedule). Finally, the three front-end applications containing the library were submitted to the Google Play store for public download.

CHAPTER 6

Results

6.1 Introduction

In this chapter, we discuss the results obtained from the data collected during Roskilde Festival 2012. Its main focus is to validate the assumptions presented in the previous chapters and the design of the library. It also describes the characteristics and statistics of the distributed scanning technique, as well as information about the participants of the festival.

The first part of this chapter explains how the data was prepared for the analysis. Next, it provides general statistical information about all the data collected during the festival and their relations. The second part focuses only on the data collected within the festival area. We determine the area covered by the scanners in the festival and concert areas, using two different measurements. Later, we analyze the efficiency of the library, according to the energy consumption and the ratio of discovered devices per scan. Next, we describe the behavior of every front-end application in terms of running time and the reasons why the library stopped sending data. In the third part, we validate our assumptions about the network coverage and the Bluetooth discovery time. Additionally, we measure the accuracy of the locations obtained during the data collection. The fourth part comprises additional results, such as the walking speed of the participants, the daily distribution of the battery level and information about the manufacturers of the scanners and the discovered devices. At the end of this chapter, we show the potential applicability of the collected data to social network analysis.

6.2 Data Preparation

6.2.1.1 Overview

Once the festival had finished and all the collected data was stored in the main database, we had to prepare it for the analysis. This preparation includes the removal of all unnecessary data and the simplification of the database structure, in order to reduce the complexity of the SQL queries used in the analysis. The main database presented in the previous chapter was designed to simplify the process of appending the data collected from the scanners. This structure works as a set of parallel databases, where the primary key of every table is composite: the table index together with the id of the package that provided the data. In this structure, the table indexes are relative to each package and not to the whole database, which hinders readability.

In addition, the data collected may contain unnecessary records scattered over all the tables. For example, there are records collected outside the festival area that do not provide relevant information. In order to remove them, we need to join every table with the GPS table to know at which location they were collected, and then determine whether they are inside or outside the desired area. We also have data collected before and after the festival, which should be removed as well. Joining the tables with the GPS table every time we perform an analysis increases the complexity of the SQL queries and reduces their readability. In addition, the computational time required for processing those queries is considerable. Given these points, we decided to create a simplified version of the entire database, adding geographical information to every table. This geographical information allows us to easily apply location and time/date filters. Nevertheless, some statistics still have to be obtained directly from the original database.

6.2.1.2 Methodology

The simplified version of the database used in the analysis is based on the structure of the original one, plus geographical information. In Figure 64, we can see that every table in the new structure contains the geographical location, namely: latitude, longitude, accuracy and gpsTime (the time when the location was retrieved). Additionally, we regenerated the table indexes. In this structure, the indexes are relative to the whole database instead of to a package. Finally, we added the data from the main database to the simplified one, filtering it according to location and time.

Figure 64. Simplified version of the main database. Every table contains geographical information

The first step of the simplification process was the addition of geographical information (location) to every table. However, the location collected by the GPS task in the library was not necessarily retrieved at the same time as the data of the other tasks. Therefore, for every record in every table, we took its time column and searched the entire GPS table for the location record nearest in time. For this operation, we only compared records belonging to the same scanner. The process of adding location to a table can be explained using the algorithm of Figure 66 and the example in Figure 65. In this example, Source is the table to which we add location data. It can be any of the tables in the main database structure, as long as it contains a time column. The algorithm of Figure 66 is a representation of the SQL query used to select the best location for a given time. This query is presented in Appendix ii.

When a record is selected from the Source table (rectangle in red), its Source.time column is compared to all the Gps.time columns in order to find the location obtained at a similar time. However, the Gps.time column is first adjusted in order to reflect the impact of the accuracy: the accuracy introduces a penalty to the Gps.time column. This penalty is the time a person would need to walk out of the accuracy radius, assuming a walking speed of 1[m/s].

Figure 65 (panels: Source table, GPS table). To obtain the best location for the record marked in red in the Source table, a whole search in the GPS table should be performed, comparing Source.time to Gps.time. The best location is marked in red in the GPS table. This GPS time was adjusted using a penalty based on the accuracy.

1. Select the first record from the Source table.
2. For all the records in the Gps table whose Gps.time < Source.time, replace Gps.time with Gps.time - accuracy (accuracy penalty).
3. For all the records in the Gps table whose Gps.time > Source.time, replace Gps.time with Gps.time + accuracy (accuracy penalty).
4. Find the Gps.time that minimizes the value | Gps.time - Source.time |.
5. Add the Gps.latitude and Gps.longitude values to the record selected in the Source table.
6. Select the next record from the Source table and continue with step 2.

Figure 66. Algorithm describing the mechanism of the SQL queries used to select the nearest location from the GPS table.

Figure 66 shows the algorithm that explains what the SQL queries do. In line 1, a record is selected from the Source table. In line 2, we subtract the accuracy penalty from the records in the GPS table with Gps.time less than Source.time, and in line 3 we add it to the records with Gps.time greater than Source.time. Then, in line 4, we select the minimum time difference between Source.time and Gps.time. This minimum gives us the nearest location according to time. In line 5, we add this location to the table, and in line 6 we continue with the rest of the records.
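The steps of Figure 66 can be illustrated with a minimal Python sketch. It is not the SQL query of Appendix ii; the names `GpsFix`, `penalised_gap` and `best_fix` are hypothetical, and the sketch assumes that timestamps are in seconds, that the accuracy is in metres and that the fixes passed in belong to a single scanner.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

WALKING_SPEED_M_PER_S = 1.0  # walking speed assumed in the text


@dataclass(frozen=True)
class GpsFix:
    """One record of the GPS table (field names are illustrative)."""
    time: float       # time of the location update [s]
    latitude: float
    longitude: float
    accuracy: float   # accuracy radius [m]


def penalised_gap(fix: GpsFix, source_time: float) -> float:
    """Steps 2 to 4: time difference after applying the accuracy penalty."""
    penalty = fix.accuracy / WALKING_SPEED_M_PER_S  # seconds needed to leave the radius
    if fix.time < source_time:
        adjusted = fix.time - penalty   # step 2: push earlier fixes further back
    elif fix.time > source_time:
        adjusted = fix.time + penalty   # step 3: push later fixes further ahead
    else:
        adjusted = fix.time             # equal times are left unchanged by steps 2 and 3
    return abs(adjusted - source_time)


def best_fix(source_time: float, fixes: Sequence[GpsFix]) -> Optional[GpsFix]:
    """Step 4: the fix with the smallest penalised time difference, if any."""
    if not fixes:
        return None
    return min(fixes, key=lambda fix: penalised_gap(fix, source_time))
```

With this penalty, a fix taken 8[s] before the source record with an accuracy of 5[m] is preferred over a fix taken 2[s] after it with an accuracy of 200[m].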

Once all the tables had geographical information, we applied a filter to remove all the data collected outside the festival. This filter was based on a rectangle enclosing the Roskilde Festival area, given by the following corner coordinates:

(55.627705, 12.057495) and (55.600463, 12.098866)
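A minimal sketch of such a location filter is shown below. The function name `in_festival_area` is hypothetical, and the sketch assumes that the two coordinate pairs are (latitude, longitude) corners of an axis-aligned bounding box.

```python
# Corner coordinates quoted in the text, read as (latitude, longitude).
CORNER_A = (55.627705, 12.057495)
CORNER_B = (55.600463, 12.098866)


def in_festival_area(latitude: float, longitude: float) -> bool:
    """Return True if the point lies inside the bounding box of the festival."""
    lat_min, lat_max = sorted((CORNER_A[0], CORNER_B[0]))
    lon_min, lon_max = sorted((CORNER_A[1], CORNER_B[1]))
    return lat_min <= latitude <= lat_max and lon_min <= longitude <= lon_max
```

A record is kept when `in_festival_area` returns True for its assigned location.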

Then we removed the data that was not collected during the festival, by discarding all the records whose time column was outside the range:

[2012-06-30 00:00:00, 2012-07-09 03:00:00]

After filtering the data, we measured the time difference between a discovery (in the Bluetooth table) and the time of its assigned location. For 85% of the discoveries, the average time difference was 216[s]. That is, the assigned location was obtained, on average, 216[s] before or after the discovery.

Furthermore, we distinguished two groups of scanners. The first group comprises all the scanners that had the Bluetooth device turned on before the front-end application was started. These devices were identified by the MAC address of the Bluetooth device. The second group contains the scanners that had the Bluetooth device turned off when the front-end application was started. In this case, the library could not use the Bluetooth MAC address as identifier. Instead, the library generated a random MAC address. A generated address can be distinguished from a real one by its first four characters, which are always “DD:DD”. An important fact is that the second group can still send location data, even though Bluetooth is not active. If the participant turned on the Bluetooth device later, the randomly generated MAC address was not changed. Unfortunately, we made the mistake of not saving this address on the device. Consequently, the MAC address is regenerated every time the application is restarted on the same device. Nevertheless, this problem did not affect the counts of scanners or discovered devices. We counted the total number of scanners only from the Applications table, which is written only the first time the library runs on the device, regardless of whether the generated MAC address changes.
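The time filter and the distinction between the two groups of scanners can be sketched as follows. The function names are hypothetical, and the sketch assumes naive local timestamps and MAC addresses stored as colon-separated strings.

```python
from datetime import datetime

# Time range of the festival quoted in the text.
FESTIVAL_START = datetime(2012, 6, 30, 0, 0, 0)
FESTIVAL_END = datetime(2012, 7, 9, 3, 0, 0)

# Prefix of the MAC addresses generated by the library.
GENERATED_MAC_PREFIX = "DD:DD"


def in_festival_window(record_time: datetime) -> bool:
    """Return True if the record was collected during the festival."""
    return FESTIVAL_START <= record_time <= FESTIVAL_END


def is_generated_mac(mac: str) -> bool:
    """Return True for scanners of the second group (Bluetooth off at start-up).

    Upper-casing the address is an assumption about how it is stored.
    """
    return mac.upper().startswith(GENERATED_MAC_PREFIX)
```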

6.3 General Statistics

6.3.1 Statistics

In the following, we divide the statistics into two groups. Firstly, we describe all the data collected, including the data retrieved outside the festival. Secondly, we present only the data collected within the festival area. For these results we have used both the original database and the simplified one.

6.3.1.1 Data collected, inside and outside the festival area

Table 6 contains a summary of all the data collected by the library. This includes the data obtained inside and outside the Roskilde Festival area. We provide the values in units as well as a percentage of all the scanners in the data set (377[units]). We can see that the percentage of devices working properly as scanners (performing discoveries and location updates) was only 35%. On the other hand, the percentage of devices sending location updates was 57,8%. These devices still provide information about the event, even though they do not perform discoveries. Moreover, there is a very small number of devices performing discoveries but not location updates (4%). These devices are discarded for the rest of the analysis in this chapter, since we do not know whether they were inside or outside the festival. Nevertheless, it would be possible to obtain their location by matching their discovered devices with those discovered by another scanner that sent location updates.

Data collected statistics. Inside and outside the festival area

Scanners | Units | % of all scanners
Scanners running the library | 377 [scanners] | 100%
Scanners performing discoveries | 147 [scanners] | 39%
Scanners performing location updates | 218 [scanners] | 57,8%
Scanners performing discoveries and location updates | 132 [scanners] | 35,0%
Scanners performing discoveries but not location updates | 15 [scanners] | 4%
Scanners performing location updates but not discoveries | 86 [scanners] | 22,8%
Scanners with the Bluetooth device turned off when the front-end application started | 198 [scanners] | 52,5%
Scanners that activated the Bluetooth device when the front-end application asked to do so | 24 [scanners] | 6,4%
Scanners running Roskilde Hide and Seek | 95 [scanners] | 25,2%
Scanners running Roskilde Decibel | 222 [scanners] | 58,9%
Scanners running Roskilde MusicNerd | 88 [scanners] | 23,3%

Discoveries | Units
Total amount of discoveries | 16.717 [scans]

Table 6. General statistics for all the data collected, inside and outside the Roskilde Festival area. We describe them as units and as a percentage of all the scanners in the data set (377 units).

Many devices did not have the Bluetooth device turned on before starting the application (52,5%). In that case, the front-end application showed a popup asking the participant to enable it. Only 24 scanners out of 198 enabled the Bluetooth device when they were requested to do so. In other words, 12,1% of these participants answered Yes to enable the Bluetooth device. In the whole dataset, the most used front-end application was Roskilde Decibel, with 222 scanners, followed by Roskilde Hide and Seek with 95 and Roskilde MusicNerd with 88 scanners.

6.3.1.2 Data collected in the Roskilde Festival area

This section focuses only on the data collected within the Roskilde Festival area. Table 7 presents statistics for the data after applying the location and time filters. In the case of the scanners, we present them as units, as a percentage of all the scanners within the festival area (106 [units]) and as a percentage of all the scanners inside and outside the festival area (377[units]). It is important to note that we only considered the scanners performing location updates. As mentioned above, we are unable to know the location of the scanners that did not send their location, unless we match their discovered devices with those of another scanner that sent location updates. Additionally, the table below indicates the sections in the rest of this chapter that cover the results in detail.


Data collected in the Roskilde Festival area (see footnote 48)

Scanners | Units | % of all scanners within the festival area | % of all scanners, inside and outside the festival area
Scanners running the library | 106 [devices] | 100% | 28,2%
Scanners performing discoveries and location updates | 75 [scanners] | 70,8% | 19,9%
Scanners performing only location updates | 31 [scanners] | 29,2% | 8,2%
Scanners with the Bluetooth device turned off when the front-end application started | 35 [scanners] | 33% | 9,3%
Scanners that activated the Bluetooth device when the front-end application asked to do so | 18 [scanners] | 17% | 4,8%
Scanners running Roskilde Hide and Seek | 33 [scanners] | 31,1% | 8,8%
Scanners running Roskilde Decibel | 47 [scanners] | 44,3% | 12,5%
Scanners running Roskilde MusicNerd | 30 [scanners] | 28,3% | 8%

Discoveries | Units
Total amount of discoveries (See Section 6.5) | 3.631 [scans]
Unique devices detected (See Section 6.5) | 3.161 [devices]
Total amount of occurrences (See Section 6.5) | 9.133 [devices]
Average of occurrences per discovery | 2,515 [devices]
Standard deviation of occurrences per discovery | 3,3 [devices]
Unique devices per discovery | 0,87 [devices]
Percentage of the festival population covered | 2,4 %
Average intervals per scanner (See Section 6.5) | 12,65 [min]
Average intervals considering all the scanners as a system (See Section 6.5) | 3 [min]
Average of occurrences of a unique device | 2,65 [occurrences]
Discoveries performed by Roskilde Hide and Seek | 929 [scans]
Discoveries performed by Roskilde Decibel | 1.529 [scans]
Discoveries performed by Roskilde MusicNerd | 1.391 [scans]

Area covered (See Section 6.4) | Units
Festival area covered using Potential area measurement | 1,016 [km2] or 45,6%
Festival area covered using Actual area measurement | 0,439 [km2] or 19,71%
Concert area covered using Potential area measurement | 0,252 [km2] or 94,83%
Concert area covered using Actual area measurement | 0,16 [km2] or 60,34%
Average area covered per scanner | 3.436 [m2]

Physical numbers (See Sections 6.4.2 and 6.11.1) | Units
Average distance traversed by every scanner (without considering the warm up days, that is, from 5th July to 8th July) | 2,563 [km]
Average walking speed | 0,5 [m/s]

Table 7. Statistics considering only the data received from Roskilde Festival area

48 We consider in this list only devices with the location activated, which was used to determine whether the devices were within the Roskilde Festival area or outside it.

We can note that 28,2% of the scanners from all the data collected were in the festival area. This measurement counts the scanners that sent at least one location update within the festival area coordinates. The number of scanners working as expected (performing discoveries and location updates) was 75[devices], which represents 70,8% of the scanners in the festival area. This value is very different from the one obtained for all the data collected, which was 35%. When the users were prompted to enable Bluetooth, 18 out of 35 participants answered Yes (51,4%). This value is also very different from the one for all the data collected, which was 12,1%. We suspect that the participants are more engaged with the applications when they are inside the festival area.

Regarding the front-end applications, we can see that the percentage of devices running Hide and Seek increases from 25,2% to 31,1% when the data is filtered. This can be related to the fact that this is a social game that can only be played within the Roskilde Festival area. On the other hand, it is interesting that Roskilde Decibel is more popular when all the data is considered than when only the festival area is considered. This front-end application is a gadget that measures the environmental noise and can be used perfectly well outside the Roskilde Festival area. Additionally, this was the front-end application with the highest contribution of discoveries.

Considering every scanner, the average interval for discoveries and location updates within the range [7; 30] [min] was 12,65[min]. Considering the scanners as a system, the interval was 3 [min]. The total amount of discoveries in the data collected was 3.631. The number of unique devices discovered was 3.161, giving a ratio of 0,87 unique devices per scan. These unique devices comprise 2,43% of all the festival participants (130.000). The total amount of occurrences was 9.133, with 2,515 occurrences per scan. This data will be analyzed in depth in Section 6.5 in order to measure the efficiency of the library. In Section 6.11.3 we will discuss whether the data collected is a representative sample of the Danish mobile market share. We also measured the distance travelled by every scanner during the festival and its speed. This will be analyzed in Sections 6.4.2 and 6.11.1. In the next sections we will analyze only the data collected within the Roskilde Festival area.

6.4 Area covered

6.4.1 Percentage of the festival and concert area covered

In this section, we analyze the percentage of the festival and concert area that was visited at least once, using different measurements. The complement of this percentage indicates the areas that were never visited. This percentage is calculated per day and also for the whole duration of the festival.

In order to measure the area covered by the scanners, we used a combination of Google Maps and GIMP (see footnote 49), an image manipulation program. We modified the monitors described in Section 5.6 by placing a mask on top of the map of the festival, indicating the area of interest. This mask is shown in Figure 67a: the white and blue areas together form the festival area, and the blue area alone is the concert area. Note that the concert area is therefore included within the festival area. In the first step, we measured the area enclosed by the whole mask (black, blue and white areas), comprising 4,575 [km2]. Later, we used GIMP to count the percentage of white and blue pixels in this mask for the festival area (48,7% of the image or 2,23 [km2]). For the concert area, we only counted the blue area (5,8% of the image or 0,265 [km2]).
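The conversion from pixel percentages to areas can be reproduced with a short calculation. The function name below is hypothetical; the numbers are the ones quoted in the text.

```python
MASK_AREA_KM2 = 4.575  # area enclosed by the whole mask [km2]


def area_from_pixel_share(pixel_share: float, mask_area_km2: float = MASK_AREA_KM2) -> float:
    """Area [km2] represented by a given share of the pixels of the mask."""
    return pixel_share * mask_area_km2


festival_area_km2 = area_from_pixel_share(0.487)  # white and blue pixels
concert_area_km2 = area_from_pixel_share(0.058)   # blue pixels only

print(f"Festival area: {festival_area_km2:.2f} km2")
print(f"Concert area: {concert_area_km2:.3f} km2")
```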

49 GIMP, The GNU Image Manipulation Program: http://www.gimp.org/


Figure 67. a) Mask used on top of the monitors to demarcate the area of interest in the festival. The white and the blue areas together are the festival area, and the blue area alone is the concert area. b) An example showing how the mask matches the zones of interest in the festival. c) The mask with the visited areas on top.

In order to measure the area covered by the scanners at least once, we used the monitor tools to plot, on top of the mask, the locations where discoveries were performed, using black circles (see Figure 67c). We then counted the remaining white and blue pixels to measure the area covered. We use two different measurements to establish the size and location of the area covered by every plotted discovery, namely Potential area and Actual area. Additionally, we considered only the location updates with accuracies below 120[m]. In the following, we explain these measurements and the percentage of area covered using them. At the end of this section, we compare the results obtained with each measurement.

6.4.1.1 Potential area

This measurement considers the areas that have a probability of being covered. It uses the radius of the area probably covered in a discovery, taking the accuracy error into account. This radius is the sum of the location accuracy and the range of a Bluetooth scan (10[m]).

Figure 68. Areas covered during the days 30,1,2,3,4,5,6,7,8 of the festival. The last image with red borders indicates all these areas overlapped, representing the whole festival duration. The measurement used was the Potential area. The total area covered during the whole festival was, for the festival area 45,59% or 1,016 [km2] and for the concert area 94,83% or 0,252 [km2].

The location accuracy indicates the radius of the area where the participant may be located. We also have to take into account the case in which the participant was performing discoveries at the boundary of this area. To include this extra area, we added 10[m] to the accuracy radius. It is important to note that this measurement introduces a high error in the area covered, because it does not consider the different probabilities of finding a person at different points inside the accuracy radius. In Figure 68, we can see the scans plotted in black for every day of the festival. The last image shows all the previous plots overlapped, representing the potential area visited during the whole duration of the festival. During the whole festival, the total area with a possibility of having been visited was 45,59% or 1,016 [km2] for the festival area and 94,83% or 0,252 [km2] for the concert area.

6.4.1.2 Actual area

The Actual area measurement works almost exactly like the Potential area, but it differs in the size of the area covered in a discovery. Instead of using the location accuracy for the radius of the area covered in a discovery, it uses only the Bluetooth range of coverage, that is, 10[m]. This measurement provides a more confident estimation of the area covered. In Figure 69, the plots obtained with this method are displayed in the same way as for the previous measurement. Using this measurement, the total area covered was 19,71% or 0,439 [km2] for the festival area and 60,34% or 0,16 [km2] for the concert area.
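The difference between the two measurements can be illustrated with a small rasterized sketch. It is not the procedure used in this work, which relied on the monitor tools and on pixel counting in GIMP; the grid representation, the function names and the coordinates in metres are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

BLUETOOTH_RANGE_M = 10.0  # Bluetooth range used in the text
MAX_ACCURACY_M = 120.0    # only accuracies below this value are considered


def coverage_radius(accuracy_m: float, measurement: str) -> float:
    """Radius [m] of the circle plotted for one discovery."""
    if measurement == "potential":
        return accuracy_m + BLUETOOTH_RANGE_M
    if measurement == "actual":
        return BLUETOOTH_RANGE_M
    raise ValueError("measurement must be 'potential' or 'actual'")


def covered_share(
    mask: List[List[bool]],
    discoveries: Sequence[Tuple[float, float, float]],
    metres_per_pixel: float,
    measurement: str,
) -> float:
    """Share of the masked pixels covered by at least one discovery.

    mask[row][col] is True for pixels inside the area of interest.
    Each discovery is (x_m, y_m, accuracy_m), measured from the top-left
    corner of the mask.
    """
    usable = [d for d in discoveries if d[2] < MAX_ACCURACY_M]
    total = 0
    covered = 0
    for row, line in enumerate(mask):
        for col, inside in enumerate(line):
            if not inside:
                continue
            total += 1
            # Centre of the pixel in metres.
            px = (col + 0.5) * metres_per_pixel
            py = (row + 0.5) * metres_per_pixel
            if any(
                math.hypot(px - x, py - y) <= coverage_radius(acc, measurement)
                for x, y, acc in usable
            ):
                covered += 1
    return covered / total if total else 0.0
```

For the same set of discoveries, the Potential area share is never smaller than the Actual area share, since every plotted circle has a radius of at least 10[m].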

Figure 69. Areas covered during the days 30,1,2,3,4,5,6,7,8 of the festival. The last image with red borders indicates all these areas overlapped, representing the whole festival duration. The measurement used was the Actual area. The total area covered during the whole festival was, for the festival area 19,71% or 0,439 [km2] and for the concert area 60,34% or 0,16 [km2].

6.4.1.3 Comparison between measurements

In this section, we analyze the previous measurements in a quantitative way. For that purpose, we created the graphs presented in Figure 70. The graph in Figure 70a shows the comparison between the whole areas covered during the 10 days of the festival, while the graph in Figure 70b shows the area covered per day. Considering the Potential area criterion, we could say that almost half of the festival area (45,59%) had a probability of being visited at least once over the course of the festival. The concert area was almost completely visited, with 94,83%. On the other hand, using the Actual area measurement, we can say with more confidence that the festival area covered was 19,71% and the concert area covered was 60,34%.

[Figure 70: a) bar chart, x-axis: Measurement, y-axis: Percentage (Festival Potential Area 45,59%, Festival Actual Area 19,71%, Concert Potential Area 94,83%, Concert Actual Area 60,34%); b) line chart, x-axis: Day (30, 1, 2, 3, 4, 5, 6, 7, 8), y-axis: Percentage, series: Concert Potential Area, Concert Actual Area, Festival Potential Area, Festival Actual Area]

Figure 70. Comparison of the area covered using the measurements described. a) During the whole festival b) Distribution per day

Using the plots presented earlier, we could observe that the coverage was concentrated in the concert area rather than spread over the whole festival area. It is also possible to analyze the coverage per day of the festival. During the days 30-4, only the camping area of the festival was open and only two stages had concerts (the warm up days). In that period, only a fraction of the whole festival population was present. We can observe that during the first three days, the area covered was around 3%-10% for the concert and festival areas. The day before the full opening of the festival (day 4), the area covered started to increase. This can be related to the fact that many people who did not take part in the warm up arrived on this day. Once the concert area was opened (days 5-8), we can see an increase in the area covered. On day 6, the concert area covered was around 28%-65% and the festival area covered around 5%-29%. On the last day (Sunday 8), many people returned home before the festival finished. This is reflected in a drop in coverage.

This year the festival was attended by 130.000 people and the number of active scanners was 106. This represents 0,082% of the entire festival population. In conclusion, these results are promising, considering the low percentage of scanners compared to the area visited. On the other hand, we should redefine the areas of interest used in the calculation, in order to be more representative of the geographical characteristics of the terrain. For example, we could restrict the area to roads only and discard the places with trees, fences, restaurants or water. Nevertheless, this requires a geographical analysis of the terrain.

6.4.2 Traversed area covered per scanner

The purpose of this analysis is to measure the traversed area covered by every scanner. This measurement differs from the previous ones in that the scanners are treated independently and not as a system. The area covered per scanner was calculated according to the following rules. Firstly, we measured the distance that each scanner travelled during a day. Secondly, we multiplied the distance by the range of a Bluetooth discovery (10[m]). The location updates with an accuracy greater than 100[m] were discarded, because they could yield a covered area for a single scanner greater than the whole festival area, which would be misleading.
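A literal reading of these two rules can be sketched as follows. The function names are hypothetical, the great-circle (haversine) distance is an assumption about how the distance between consecutive locations is measured, and the exact computation behind the reported values may differ.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius [m]
BLUETOOTH_RANGE_M = 10.0      # range of a Bluetooth discovery [m]
MAX_ACCURACY_M = 100.0        # less accurate location updates are discarded


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance [m] between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def traversed_area_m2(fixes: Sequence[Tuple[float, float, float]]) -> float:
    """Distance travelled multiplied by the Bluetooth range.

    fixes are the time-ordered (latitude, longitude, accuracy_m) location
    updates of one scanner during one day.
    """
    usable = [f for f in fixes if f[2] <= MAX_ACCURACY_M]
    distance = sum(
        haversine_m(a[0], a[1], b[0], b[1]) for a, b in zip(usable, usable[1:])
    )
    return distance * BLUETOOTH_RANGE_M
```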

[Figure 71: bar chart, x-axis: Day (2012-07-05, 2012-07-06, 2012-07-07, 2012-07-08), y-axis: Area [m2], scale from 0 to 1000]

Figure 71. Traversed average area covered by the scanners during the days after the warm up.

Figure 71 shows the average area covered by the scanners during the days after the warm up. As can be seen, the area increases from day 5 to day 7 and drops slightly on day 8. This can be explained by the popularity of the concerts and artists on these days. The average traversed area covered per scanner was 858[m2], which gives 38% of the coverage.

 | Day 5 | Day 6 | Day 7 | Day 8
Area covered | 709[m2] | 900[m2] | 937[m2] | 889[m2]
Average walking distance per day | 1.724[m] | 2.787[m] | 3.022[m] | 2.719[m]
Active scanners / Total scanners | 17/52 | 13/37 | 14/25 | 5/11
Percentage of Active scanners / Total scanners | 32,7% | 35,1% | 56% | 45,4%

Table 8. Values used in the calculation of the area covered per scanner

Furthermore, Table 8 shows the number of active scanners that were sending data compared to the total number of scanners seen during the day. As can be inferred, the number of scanners decreases as the festival progresses. A likely reason is that people started the applications during the first days of the festival and later stopped them. The probable causes of termination will be analyzed per front-end application in Section 6.6.

6.5 Discoveries Efficiency

In this section we evaluate the efficiency of the proposed solution. This evaluation is based on the number of discovered devices per scan and on the energy consumed.

6.5.1 Total occurrences and unique devices per discovery

In this measurement, we describe how efficient every scan was. Basically, we want to know how many occurrences were obtained per scan and how many of them were unique devices.

The total amount of discoveries was 3.631. The amount of occurrences in those scans was 9.133, while the total amount of unique devices was 3.161. Accordingly, the efficiencies of total occurrences and unique devices were 2,515 [devices/scan] and 0,871 [devices/scan] respectively.
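Both efficiencies are plain ratios over the number of discoveries, as the following short calculation with the totals quoted above shows (the variable names are illustrative).

```python
DISCOVERIES = 3631      # total amount of discoveries (scans)
OCCURRENCES = 9133      # device occurrences in those scans
UNIQUE_DEVICES = 3161   # unique devices detected

occurrences_per_scan = OCCURRENCES / DISCOVERIES
unique_devices_per_scan = UNIQUE_DEVICES / DISCOVERIES

print(f"Total occurrences efficiency: {occurrences_per_scan:.3f} devices/scan")
print(f"Unique devices efficiency: {unique_devices_per_scan:.3f} devices/scan")
```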

 | Roskilde 2011: Fixed scanners, Fixed Intervals | Roskilde 2011: Fixed scanners, Replacement Ratio & Daily Polynomial | Roskilde 2012: Distributed scanners, Replacement Ratio & Daily Polynomial
Total occurrences efficiency [devices/discovery] | 2,495 | 2,699 | 2,515
Total occurrences improvement with respect to fixed intervals | - | 8% | 0,8%
Total occurrences improvement with respect to Roskilde 2011: R. Ratio and D. Polynomial algorithm | -7,6% | - | -6,8%
Unique devices efficiency [devices/discovery] | 1,047 | 1,104 | 0,871
Unique devices improvement with respect to fixed intervals | - | 5% | -20,2%
Unique devices improvement with respect to Roskilde 2011: R. Ratio and D. Polynomial algorithm | -5,2% | - | -21,1%

Table 9. Comparison between the results from Roskilde 2011 using fixed scanners and Roskilde 2012 using distributed scanning. We can observe that distributed scanning is more efficient at obtaining device occurrences, but less efficient for unique devices.

In Table 9, we can see a comparison between the results obtained in the experiments of Section 4.2, using the data from Roskilde 2011, and the current results. To set the intervals, the library used an algorithm that considers the weighted average of the historical replacement ratio and the daily polynomial function presented in Section 4.2.5. Considering all the discoveries collected during the festival by all the scanners, we can see that the algorithm used in the library gained 0,8% in efficiency with respect to fixed intervals, in terms of device occurrences. Compared to the simulation of the same algorithm using the data of Roskilde 2011, it obtained 6,8% fewer occurrences. If we compare the number of unique devices, we can see a drop of 20,2% with respect to fixed intervals and a drop of 21,1% with respect to the same algorithm using the data of Roskilde 2011.

These results reflect the nature of the data collected using distributed scanning. This technique is more efficient than fixed scanning at obtaining device occurrences. However, the drop in unique devices indicates that the same devices appear in many scans. Having many occurrences of one device is not an issue, since it allows us to tell more about that particular device (see the analysis in Section 4.2.1). This situation can be explained by the fact that, in distributed scanning, the participant running the scanner is usually surrounded by the same people (such as friends) plus random people. In the case of fixed scanning, the scanners do not follow any person in particular.

6.5.2 Energy efficiency

In order to measure the energy consumed by the GPS and Bluetooth devices, we added to every Bluetooth record the closest measured battery level, for the scanners that sent discoveries and location updates. Later, we measured the discharging ratio of the battery, considering the time elapsed between two consecutive discoveries. It is important to note that this measurement does not take into account other activities performed by the user between discoveries. Nevertheless, this estimation is similar to the one obtained experimentally. We measured the battery discharge ratio between every two consecutive scans for all the scanners and then computed the average of all of them.

| Bluetooth + GPS | Experimental and theoretical (Figure 22 in Section 4.1.5.2), minimum | Experimental and theoretical (Figure 22 in Section 4.1.5.2) | Experimental and theoretical (Figure 22 in Section 4.1.5.2), maximum | Roskilde 2012 results, average |
|---|---|---|---|---|
| Intervals | 7 [min] | 12 [min] | 30 [min] | 12,65 [min] |
| Discharging rate: % per hour | 3% | 2,5% | 2,6% | 5,98% |
| Discharging rate: % per Bluetooth + GPS operation | 0,35% | 0,5% | 1,3% | 0,95% |
| Daily battery consumption | 72% | 60% | 78% | 108% |

Table 10. Energy used by the Bluetooth and GPS in the experiments and after the data collection.

In Table 10, we can see that the battery consumption estimated after the experiments in Section 4.1.4.2 was different when applied to a massive event. Using intervals in the range of [7; 30] [min], we expected the Bluetooth and GPS operations to consume between 72% and 78% of the battery during a day. In practice, the library consumed 108% of the battery per day using an average interval of 12,65 [min], which was not so far from the expected range of 72%-78%. On the other hand, the discharging rate per Bluetooth + GPS operation (0,95%) was within the expected range (0,35%-1,3%). This situation can be explained because, in the experiments, the application used in the measurements was the only activity running on the device. Additionally, we tested it on only one hardware model. In contrast, in a massive event we find a vast amount of different hardware with different battery lives. Furthermore, the participants use their devices for many other activities, such as making calls and running other applications.
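The link between the discharge per operation, the interval and the daily consumption can be illustrated with a short calculation. The sketch below assumes that one Bluetooth + GPS operation runs every interval, around the clock, and that the daily consumption is simply the per-operation discharge multiplied by the number of operations per day. This relation is our assumption for illustration (the function name is hypothetical); under it, the 72%, 60% and 108% entries of Table 10 are reproduced.

```python
MINUTES_PER_DAY = 24 * 60


def daily_consumption_pct(per_operation_pct: float, interval_min: float) -> float:
    """Battery share (in %) drained per day by Bluetooth + GPS operations.

    Assumption: exactly one operation runs every `interval_min` minutes,
    around the clock, and each one costs `per_operation_pct` of the battery.
    """
    if interval_min <= 0:
        raise ValueError("interval_min must be positive")
    operations_per_day = MINUTES_PER_DAY / interval_min
    return per_operation_pct * operations_per_day


# Values taken from Table 10 (decimal commas written as points).
for label, per_op, interval in [
    ("7 [min] interval", 0.35, 7.0),
    ("12 [min] interval", 0.5, 12.0),
    ("Roskilde 2012 average, 12,65 [min]", 0.95, 12.65),
]:
    print(f"{label}: {daily_consumption_pct(per_op, interval):.1f}% per day")
```

A value above 100% simply means that, at this rate, a full charge would not last a whole day.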

6.5.3 Intervals assigned by the algorithm

During the design of the library in Section 5.4.3.3, we decided to set the intervals in the range of [7; 30] [min], in order to have a daily energy consumption of [72%; 78%]. The interval within this range is calculated by the library on every cycle of the Scheduler component and is determined by the replacement ratio, the polynomial function obtained from the dataset of Roskilde 2011 and the battery level. By analyzing all the intervals assigned by the algorithm, we obtained an average interval per scanner of 12,65[min] with a standard deviation of 5,6[min]. However, if we treat the scanners as a system, the interval between scans is 3[min]. In Figure 72, we can see the distribution of the intervals assigned by the library: 70% of them were around 16 [min]. Judging only by this distribution, the algorithm may seem inflexible due to the high concentration of intervals around this value.
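The difference between the per-scanner interval and the interval of the scanners seen as a system can be made concrete with a small sketch. It assumes that each scanner is represented by a list of scan times in minutes; the data layout and function names are hypothetical and not taken from the library.

```python
from statistics import mean


def per_scanner_gaps(scans):
    """Gaps between consecutive scans of the same scanner.

    `scans` maps a scanner id to its scan times in minutes.
    """
    gaps = []
    for times in scans.values():
        ordered = sorted(times)
        gaps.extend(later - earlier for earlier, later in zip(ordered, ordered[1:]))
    return gaps


def system_gaps(scans):
    """Gaps between consecutive scans when all scanners are merged into one timeline."""
    merged = sorted(t for times in scans.values() for t in times)
    return [later - earlier for earlier, later in zip(merged, merged[1:])]


# Three hypothetical scanners, each scanning every 12 minutes with shifted start times.
example = {"A": [0, 12, 24], "B": [4, 16, 28], "C": [8, 20, 32]}
print(mean(per_scanner_gaps(example)))  # interval seen by a single scanner
print(mean(system_gaps(example)))       # interval seen by the system as a whole
```

With more scanners active at the same time, the system-wide interval shrinks even though each scanner keeps its own, longer interval.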

Figure 72. Probability distribution of the intervals (in [min]) assigned by the algorithm. The average interval was 12,65 [min]. [Axes: Interval [min] (x), Probability (y).]

Nevertheless, the distribution of the intervals over time during the festival shows more flexibility, as can be seen in Figure 73. In this graph, the intervals for all the scanners are shown, with vertical lines indicating the beginning of a day. We can observe that the intervals roughly oscillate between 11 and 20[min]. Additionally, we can see peaks at the beginning of every day that decrease towards its end. These peaks may be explained by the influence of the polynomial function. The small peaks are produced by the influence of the replacement ratio function.

Figure 73. Average of the intervals assigned by the algorithm during the whole festival. Every line in the scale indicates the beginning of a new day. [Axes: Day, from 30-06 to 09-07 (x), Interval between discoveries [min] (y).]

Analogously, we can combine the results of the whole festival into one day. This graph is shown in Figure 74. In this case, we can see that the intervals have a peak of 24[min] at 5:50am. The time of this peak matches the maximum of the polynomial obtained in Section 4.2.3.3. In this graph we can see more clearly the influence of the polynomial (as the trend) and the influence of the replacement ratio (as the noise).

Figure 74. Daily variation of the intervals. This graph is obtained by combining every day of the festival into a single day. [Axes: Hour (x), Interval between discoveries [min] (y).]

The previous results indicate that the algorithm worked as expected, considering the variables that influence it. However, the maximum and minimum values of the range were rarely reached. This indicates that the algorithm should be adjusted so that the assigned intervals cover the whole range.
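Combining every day of the festival into a single day, as done for Figure 74, amounts to grouping the samples by hour of the day and averaging each group. A minimal sketch, assuming the samples are available as (timestamp, value) pairs and using one-hour bins (both are our assumptions; the bin width used for the figure is not stated):

```python
from collections import defaultdict
from datetime import datetime


def fold_into_one_day(samples):
    """Average all values that fall into the same hour of the day.

    `samples` is an iterable of (datetime, value) pairs from any number of days.
    Returns a dict mapping hour (0-23) to the mean value for that hour.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[timestamp.hour].append(value)
    return {hour: sum(values) / len(values) for hour, values in sorted(buckets.items())}


# Hypothetical interval samples (in minutes) from two different days.
samples = [
    (datetime(2012, 7, 1, 5, 50), 24.0),
    (datetime(2012, 7, 2, 5, 10), 20.0),
    (datetime(2012, 7, 1, 18, 0), 12.0),
]
print(fold_into_one_day(samples))
```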

6.5.4 Distribution of discovered devices

In this section we will analyze the distribution of discovered devices over the course of the festival. In Figure 75 we can see the number of devices per discovery during the whole festival. We can note that the number of discovered devices increases after the warm up (days 5-8). This indicates greater movement of the people, which can be explained by the beginning of the main concerts.

Figure 75. Number of detected devices per scan for the whole festival. The vertical lines indicate the beginning of a new day. [Axes: Day, from 30 to 09 (x), Number of discovered devices (y).]

We combined the previous results into one day, as shown in Figure 76. In this graph, we averaged every 20 records in order to obtain a smooth curve. We can see that during the afternoon the values reach their maximum of 7 devices, while during the resting time (hours 4-7) the value is only one device. This is a consequence of the festival activity and of the intervals chosen by the library.
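The smoothing mentioned above can be sketched as follows. We assume here that the records are averaged in consecutive, non-overlapping blocks of 20; a moving average over the last 20 records would be an equally plausible reading, so this is only one possible interpretation.

```python
def block_average(values, block_size=20):
    """Average `values` in consecutive, non-overlapping blocks of `block_size`.

    The last block may be shorter if len(values) is not a multiple of block_size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    averaged = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        averaged.append(sum(block) / len(block))
    return averaged


# Forty hypothetical records collapse into two smoothed points.
print(block_average(list(range(40))))
```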

Figure 76. Daily variation of detected devices per scan. The results from all the days are combined and averaged. [Axes: Hour (x), Nr. of discovered devices (y).]

All things considered, we can conclude that the algorithm worked as it was designed. We assumed that the level of activity observed in the data of Roskilde 2011 is repeated in this edition, and the results obtained are similar. Nevertheless, in order to be more certain that this activity is repeated, it would be necessary to carry out the experiment using fixed intervals and check whether the results are similar.

6.6 Library running time and front-end impact

An important detail about the operation of the library is the time it effectively ran on the participant's smartphone and the reasons why it stopped sending data. There are four possible scenarios: the participant actively stopped the front-end application (it did not give any value to them), the application crashed, the participant turned off the device, or the device ran out of battery. From the data collected, we can only identify whether the application stopped because of the battery level. For this analysis, we looked into the Battery table, which contains the measured battery levels. We analyzed three things. First, the time difference between the first and the last battery readings of every scanner, which gives us the running time of the library. Second, the last battery level stored in the database for every scanner. This tells us whether the library stopped because the battery was depleted (the last battery level was low) or because the participant stopped it (the last battery level was high). Third, we searched for large time and level gaps between battery readings, in order to recognize whether the device was reset, and checked whether the application was started again. In addition, we have to group the data per front-end application, since the reason for termination can be related to it.
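The first two checks described above can be sketched in a few lines. The record layout (scanner id, time in hours, battery level in %) and the 20% threshold used to call a level "low" are assumptions made for this illustration; they are not taken from the Battery table or from the library.

```python
from collections import defaultdict


def summarize_battery(readings, low_level=20):
    """Running time and last battery level per scanner.

    `readings` is an iterable of (scanner_id, time_h, level_pct) tuples.
    `low_level` is a hypothetical threshold below which we guess that the
    library stopped because the battery was depleted.
    """
    by_scanner = defaultdict(list)
    for scanner_id, time_h, level_pct in readings:
        by_scanner[scanner_id].append((time_h, level_pct))

    summary = {}
    for scanner_id, rows in by_scanner.items():
        rows.sort()  # chronological order
        first_time, last_time = rows[0][0], rows[-1][0]
        last_level = rows[-1][1]
        summary[scanner_id] = {
            "running_time_h": last_time - first_time,
            "last_level_pct": last_level,
            "stopped_on_low_battery": last_level <= low_level,
        }
    return summary


readings = [("s1", 0.0, 90), ("s1", 10.0, 15), ("s2", 2.0, 80), ("s2", 5.0, 70)]
for scanner, info in summarize_battery(readings).items():
    print(scanner, info)
```

Averaging `running_time_h` and `last_level_pct` over the scanners of each front-end application would give figures comparable to the first two rows of Table 11.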

| | All front-end applications | Roskilde Hide and Seek | Roskilde Decibel | Roskilde MusicNerd |
|---|---|---|---|---|
| Average running time | 19,3 [h] | 9,1 [h] | 23,6 [h] | 33,3 [h] |
| Average of the last battery level sent | 49% | 60% | 36,5% | 48% |
| Amount of scanners running the application | 110 | 33 | 47 | 30 |
| Amount of scanners that ran the front-end application more than once | 29 [scanners] or 26% | 8 [scanners] or 24% | 14 [scanners] or 30% | 7 [scanners] or 23% |

Table 11. Behavior of the library according to the front-end applications. The values are the averages for all the scanners. The running time indicates how long the library was running. The last battery reading indicates the battery level just before the library stopped sending data.

In Table 11 we can see the running time and the last battery level sent, averaged over all the scanners. We also provide statistics per front-end application. Without grouping per front-end application, the library ran 19,3[h] on average and the last battery level was 49%. The high battery level of the last reading clearly indicates that, most of the time, the library did not run until the battery was depleted. On the other hand, 26% of the scanners restarted a front-end application after turning off the device. The numbers of scanners per front-end application are of the same order of magnitude, so we can make a fair comparison between them. Along with the data from Table 11, we will use the distribution of the running time of the library (Figure 77) and the cumulative distribution of the last battery level (Figure 78).

In the case of Roskilde Hide and Seek, the library stopped sending data with a high battery level and ran only 9,1[h] on average. Similarly, the distributions indicate a bias towards short running times and high last battery levels. In fact, in 35% of the cases it stopped after 0-10[hours] and in 30% of the cases the last battery level was below 50% of charge. After the data collection, we discovered that this application contained a bug and was not working properly, which might be the reason why the participants stopped it.

Figure 77. Probability functions of the time active of the library in hours. [Panels: All front-end applications, Roskilde Hide and Seek, Roskilde Decibel, Roskilde MusicNerd. Axes: Time Active (Hours) (x), Probability (y).]

Roskilde Decibel obtained the best results in terms of battery level: in 80% of the cases the last battery reading was below 50%. The running time was around 20[hours] and, in 30% of the cases, the library ran for more than 60[hours]. This was also the application most often restarted after turning off the device (30% of the devices restarted it). We did not identify any problems with this application.

Figure 78. Cumulative probability function of the last battery level sent by the library. [Panels: All front-end applications, Roskilde Hide and Seek, Roskilde Decibel, Roskilde MusicNerd. Axes: Battery Level [%] (x), Probability (y).]

Finally, Roskilde MusicNerd obtained the best results in terms of running time. The average running time was 33,3[hours] and, in 20% of the cases, more than 80 hours. In around 50% of the cases, the last battery level sent was below 50%. We did not identify any problems with this application.

All things considered, we conclude that the running time of the library depends highly on the front-end application. We suspect that during these events the users usually turn off the device in order to save battery, which explains the short running times compared to common battery lives. After the devices were turned off, only around 26% of the applications were restarted, indicating that many users used them only once. In addition, the most suitable front-end applications are the ones that the user needs to use frequently, such as information tools like MusicNerd. Gadgets like Decibel are also important, because the users run them again even if the device ran out of battery.

6.7 Network coverage

One of the aspects that we considered at the beginning of this project was the unreliability of the network access. In this analysis we describe the behavior of the network during this edition of Roskilde Festival. The analysis is based on the table SubmissionMobile. This table contains records about the status of the package submissions from the devices to the server, indicating whether each package was sent successfully or not. Additionally, this table registers the time elapsed between the moment the package was sent and the moment the confirmation from the server was received. The size of the packages is calculated on the server side. Dividing the size of the package in bytes by the time elapsed gives us a good approximation of the network speed. Nevertheless, this is not equivalent to the bandwidth, because it also includes the network latency.

On the other hand, the web service on the server registers the IP address of the scanner for every package received. We used this information to identify the telecommunication company the user is connected to, so that we can base an analysis on it. In the case of a failed attempt, we were not able to obtain the IP, so we associated the package with the next successful attempt. To help the identification, we used the WHOIS tool (see footnote 50), which provides information about the owner of a particular IP. We were able to identify the source of 99,7% of the packages. We assigned the source as Wi-Fi in the cases of IPs that did not belong to a telecommunication company. The Danish companies identified were the following: TDC, Telenor DK, Telia and 3.dk. Additionally, the server received packages with IPs belonging to foreign companies, namely internet.is (Iceland) and netcom.no (Norway), indicating scanners connected using roaming.

During the festival, 5.792 connection attempts were performed, of which 2.611 (45,02%) were submitted successfully and 3.181 (54,98%) failed. In Figure 79, a bar graph shows the percentage of success/failure for every company, and the last bar indicates the total attempts for the whole festival.
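The rule for failed attempts described above (a failed package inherits the provider of the next successful attempt) can be sketched as follows. The data layout is an assumption made for this illustration: one scanner's attempts in chronological order, each an (ok, provider) pair where the provider is unknown (None) for failures. The function names are hypothetical.

```python
from collections import Counter


def assign_providers(attempts):
    """Give every attempt a provider label.

    `attempts` is one scanner's chronological list of (ok, provider) tuples;
    `provider` is None for failed attempts because no IP was recorded.
    A failed attempt inherits the provider of the next successful attempt.
    """
    labels = [None] * len(attempts)
    next_provider = None
    for i in range(len(attempts) - 1, -1, -1):  # walk backwards in time
        ok, provider = attempts[i]
        if ok:
            next_provider = provider
        labels[i] = next_provider
    return labels


def failure_rates(attempts):
    """Fraction of failed attempts per provider (unlabelled attempts are skipped)."""
    total, failed = Counter(), Counter()
    for (ok, _), label in zip(attempts, assign_providers(attempts)):
        if label is None:
            continue
        total[label] += 1
        if not ok:
            failed[label] += 1
    return {provider: failed[provider] / total[provider] for provider in total}


attempts = [(False, None), (False, None), (True, "TDC"),
            (False, None), (True, "3.dk"), (False, None)]
print(assign_providers(attempts))
print(failure_rates(attempts))
```

In this sketch, a trailing failure with no later success stays unlabelled and is left out of the rates.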

50 Whois By IP Address: http://tools.whois.net/whoisbyip/

Figure 79. Connection status for the whole festival divided by the telecommunication company. The last bar indicates the results for the whole festival. [Axes: Telecommunication Company: TDC, Telenor DK, Telia, 3.dk, internet.is, netcom.no, WiFi, Total (x), Percentage (y). Series: Success, Failure.]

We can see that the worst provider was the Wi-Fi connection, with 83,73% of failures. It is very likely that the Wi-Fi routers had a poor signal in the festival, making it difficult to send data through them. The worst telecommunication company was netcom.no (40% of failures) and, if we consider only Danish companies, TDC (52,36%). On the other hand, the best provider was 3.dk, with only 9,59% of failures. In Figure 80, we can see the distribution per day of the network coverage during the festival. The blue lines indicate the percentage of success (packages submitted successfully/total packages submitted). The red lines indicate the percentage of failure. We can observe that the network was reliable enough during the warm up days of the festival (days 30-4), except for day 1. However, after the festival began, connection problems started to occur, especially during the night. This can be explained by the high concentration of people in the concert area during those hours, when the main artists performed.

Figure 80. Connection status according to the day of the festival. The red lines indicate the percentage of failure and the blue lines the percentage of success. Every vertical line represents the beginning of a day. [Axes: Day, from 30 to 09 (x), Success/Fail percentage (y).]

We determined the network speed by dividing the package size by the time elapsed between the submission and the confirmation from the server. Using this, we obtained a capacity of 0,12[KB/s]. As mentioned above, this value is not the actual bandwidth because it includes the latency. In Figure 81 we can see how the network speed evolved during the days of the festival. The values stay around the average during the whole festival, except for some peaks at the beginning and valleys at the end, probably due to the number of participants.
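The speed estimate described above is a simple ratio, which can be written down as a small sketch. We assume that 1 KB equals 1024 bytes and that the overall figure is the mean of the per-package speeds; neither detail is specified in the text, and the function names are hypothetical.

```python
BYTES_PER_KB = 1024  # assumption: whether 1 KB is 1000 or 1024 bytes is not stated


def effective_speed_kb_s(size_bytes: int, elapsed_s: float) -> float:
    """Package size divided by the round-trip time (submission to confirmation).

    The result includes the network latency, so it underestimates the bandwidth.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return size_bytes / BYTES_PER_KB / elapsed_s


def mean_speed_kb_s(submissions):
    """Mean of the per-package speeds for an iterable of (size_bytes, elapsed_s)."""
    speeds = [effective_speed_kb_s(size, elapsed) for size, elapsed in submissions]
    return sum(speeds) / len(speeds)


# Two hypothetical successful submissions.
print(mean_speed_kb_s([(1024, 8.0), (2048, 8.0)]))
```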

Figure 81. Network speed based on package size divided by the time elapsed since submission and the confirmation of reception. [Axes: Day, from 30 to 09 (x), KB/s (y).]

As a conclusion, we can confirm that our initial considerations about the network reliability and bandwidth were correct. The percentage of failure is high and should be taken into account. On the other hand, the speed is also very low, thus, front-end applications sending large files should be avoided.

6.8 Location accuracy

Location accuracy is another factor that has to be explored. When a location update was performed, we also registered the method used to determine it, namely GPS, Wi-Fi or 3G. Figure 82 presents a pie chart with the distribution of the methods used. The majority of the location updates came from GPS (55%), followed by 3G (35%) and finally Wi-Fi (10%).

Figure 82. Distribution of location updates per method. [Pie chart: GPS 55%, 3G 35%, WiFi 10%.]

After identifying the distribution of the methods used in the location updates, we analyze the distribution of their accuracy. Figure 83 shows the probability distribution of the accuracy in meters per method. The distribution is mostly located within the range of 0[m] to 300[m]: 90% of the GPS updates had an accuracy below 300[m]. In the case of Wi-Fi, only 10% of the updates fell in the same range, and in the case of 3G, only 25%. Although the Wi-Fi and 3G methods might be expected to provide the best accuracy, our results show the opposite. The results for Wi-Fi and 3G include accuracy errors greater than 900[m] (even 2000[m]), making these methods unreliable. Furthermore, their accuracy error can sometimes be larger than the festival area.

Figure 83. Probability distribution of accuracy per method. [Axes: Accuracy [m], from 0 to 2700 (x), Percentage [%] (y). Series: GPS, 3G, WiFi, Whole.]

The next step after analyzing the distribution of the data is to obtain the standard deviation for each method, which allows us to point out the most reliable one. Figure 84 shows the average accuracy error per method (the y-axis is displayed in logarithmic scale to ease the comparison). Not surprisingly, the biggest average errors are those of the 3G and Wi-Fi methods. The standard deviation of each method is indicated above its bar.
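The per-method statistics discussed in this section (average error, standard deviation and share of updates below 300[m]) can be computed as in the following sketch. The input layout and the use of the population standard deviation are our assumptions; the numbers in the example are invented and are not the festival data.

```python
from statistics import mean, pstdev


def accuracy_summary(updates, threshold_m=300.0):
    """Mean, standard deviation and share below `threshold_m` per location method.

    `updates` is an iterable of (method, accuracy_m) pairs.
    """
    by_method = {}
    for method, accuracy_m in updates:
        by_method.setdefault(method, []).append(accuracy_m)

    summary = {}
    for method, values in by_method.items():
        summary[method] = {
            "mean_m": mean(values),
            "std_m": pstdev(values),  # population standard deviation (assumption)
            "share_below_threshold": sum(v < threshold_m for v in values) / len(values),
        }
    return summary


# Invented sample values, for illustration only.
updates = [("GPS", 10.0), ("GPS", 20.0), ("3G", 200.0), ("3G", 1800.0)]
for method, stats in accuracy_summary(updates).items():
    print(method, stats)
```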

Figure 84. Logarithmic scale of the location accuracy per method. The overall accuracy is indicated in the table below the graph. The standard deviation is shown on top of each bar. [Axes: method (x), Accuracy [m], logarithmic from 1 to 10000 (y). Average accuracy: GPS 13,87253521 [m], WiFi 1134,414894 [m], 3G 1459,238806 [m]. Standard deviation labels in the extracted figure: 788 [m], 650 [m], 5 [m].]

The differences in accuracy can be explained by how the location is calculated. In the case of Wi-Fi, the location depends on the position of the device relative to the surrounding Wi-Fi antennas and on their signal strength. In the case of 3G, the location can be calculated using five different techniques (see footnote 51). One of these techniques takes into account the angle of the arriving signal between devices. Additionally, it uses triangulation, which calculates the position of a device according to the antennas it can detect. In an event like Roskilde Festival, these two techniques are limited: there are few 3G antennas and Wi-Fi access points, which makes it difficult to calculate the location accurately. Another fact is that the Wi-Fi signal range is 32[m] indoors and 95[m] outdoors (see footnote 52), which physically limits the effectiveness of this technique. Considering the standard deviation of the error for each method, we can point out that the most reliable method was GPS, which was also the most used. On the other hand, the empirical data shows that the location obtained using Wi-Fi antennas is more accurate than the one obtained using 3G (average error of about 1134[m] against 1459[m]).

51 http://www.scribd.com/doc/35282752/Positioning-Techniques-in-3G-Networks
52 http://en.wikipedia.org/wiki/Wi-Fi#Range

6.9 Bluetooth discovery time

The Bluetooth discovery time is the time elapsed from the start to the end of a discovery process. The library was designed to perform two consecutive scans. For improving the library in future work, the Bluetooth discovery time can help to determine the smallest interval of a discovery process. As mentioned in (Peterson, Baldwin, & Kharoufeh, 2006), the minimum time needed to perform a discovery is 10.24[s]. With this setting, we can assume that two scans take at least 20.48[s]. However, the data collected showed values different from the assumed one. Figure 85 shows the probability distribution of the Bluetooth discovery time. As we can see, most of the scans (50%) were located in the 30[s] range. This value is about three times the minimum time of a single discovery (10.24[s]) and about one and a half times the assumed 20.48[s]. This difference may be related to the different hardware used in the data collection and to the number of discovered devices.

Figure 85. Probability distribution of the Bluetooth discovery time. [Axes: Time [s], from 10 to 70 (x), Percentage [%] (y).]

In fact, we found that the number of discovered devices has an impact on the Bluetooth discovery time. This relation can be seen in Figure 86. The blue line in the graph presents the real data and the red line a linear fit of the results. As can be seen, the Bluetooth discovery time increases with the number of discovered devices. It is possible that the scanners extend the discovery process when they detect many devices.
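A straight line through such data can be obtained with an ordinary least-squares fit. The sketch below is one standard way of doing it and is not necessarily the method used to draw the red line in Figure 86; the sample points are invented for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented points: discovered devices versus discovery time in seconds.
devices = [0, 1, 2, 3]
seconds = [20.0, 25.0, 30.0, 35.0]
slope, intercept = linear_fit(devices, seconds)
print(f"time = {slope:.1f} * devices + {intercept:.1f}")
```

The slope can be read as the additional discovery time per extra detected device.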

Figure 86. Relation between amount of discovered devices and Bluetooth discovery time. [Axes: Discovered devices [units], from 0 to 25 (x), Discovery time [s] (y).]

In order to confirm the quality of the results, we conducted a standard deviation analysis of the discovery time. The results can be seen in Figure 87, which shows how the variance of the data sample changes with the number of discovered devices. As we mentioned before, these results might vary because devices from different manufacturers were used as scanners. Another factor might be the motion of the scanner: changes in the location of a scanner during the discovery process may have an impact on the discovery time.

Figure 87. Scanning time versus number of discovered devices. The error bars indicate the standard deviation. [Axes: Number of Discovered Devices, from 0 to 19 (x), Discovery Time (y).]

6.10 Data collected and market share

In this analysis, we want to know whether the data collected is representative of the Danish mobile market presented in Section 3.5.1. This analysis is possible by identifying the device manufacturers from the data collected within the Roskilde Festival area. The market analysis presented earlier only covered smartphones, so we have to add the phones that are not smartphones. According to (TNS Gallup A/S, 2011), 50% of the Danish people between 25 and 29 years old own a smartphone, an age group that matches the age of the participants in the festival (Marling & Kiib, 2011). Therefore, we rescale the previous market share to 50% of the total, while the other 50% is assigned to others. We can see this redistribution in Figure 88a. On the other hand, the pie chart in Figure 88b shows the distribution of the data collected. In this distribution, we included all the devices identified (scanners and unique discovered devices). All the scanners were assigned to the Android category. The unique discovered devices were identified by MAC address. The devices identified as Apple were assigned to the iOS category, Nokia devices to their own category (which can also be Symbian) and the rest to the others category. As mentioned before, Android devices are very unlikely to be detected using Bluetooth discoveries because of the time limitations of their discoverable mode.

Figure 88. Comparison of the distribution of the data collected. a) Market share adding non-smartphones. b) Platforms identified in the data collected. [Pie charts. a) Distribution of platforms in the market share, including non-smartphones; categories: Others, Android, iOS, Symbian. b) Distribution of platforms in the data collected; categories: Others, Android, iOS, Nokia. Percentage labels in the extracted figure: 7%, 18%, 36%, 50%, 60%, 25%, 1%, 3%.]

Using this approximation to measure the representativeness of the data collected, it is evident that the data collected does not represent the market share. We can find two explanations for this: Firstly, as Android devices cannot be detected, they only count as scanners. Secondly, it is possible that the participants only carry their ordinary phones in these events.


6.11 Additional results

6.11.1 Daily walking speed

During the initial analysis in Chapter 3, we used an approximation of 1,4[m/s] for the walking speed of an average person. In this analysis, we want to confirm this value by dividing the distance between every pair of consecutive location readings by the time elapsed between them. We assumed that the participant followed a straight path, which is not necessarily the actual situation. Nevertheless, this method gives us an approximation of his or her walking speed.

Figure 89. Daily variation of the walking speed for all the scanners. The blue line represents the average of the walking speed, while the red lines represent the standard deviation. [Axes: Hour of the day, from 0 to 23 (x), Walking speed [m/s] (y). Series: Average, Standard deviation.]

For the calculations, we measured the speed using every pair of consecutive scans for the whole duration of the festival. Then, we combined these values into one day by averaging them. The average walking speed for the whole festival was 0,5[m/s] with a standard deviation of 0,9[m/s]. In Figure 89, we can see these results according to the hour of the day. There is a clear valley between 02:00 am and 08:00 am, which coincides with the assumed resting hours. During this resting period, there is a minimum of 0,07[m/s] at 05:00 am. The results of this analysis indicate that our original walking speed of 1,4[m/s] may not apply to massive events and that a more conservative value should be used. Unlike people walking on a road, the participants of massive events spend a lot of time standing, for example when watching concerts, eating or resting.
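The speed between two location readings can be estimated as sketched below. We use the haversine formula for the straight-line distance, which is one common choice and an assumption on our part, since the distance formula used in the analysis is not stated. The point layout (latitude, longitude, time in seconds) is also hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def walking_speed_m_s(point_a, point_b):
    """Speed between two readings, each given as (lat, lon, time_s).

    Assumes a straight path between the readings, so it is a lower bound
    on the speed along the path actually walked.
    """
    elapsed = point_b[2] - point_a[2]
    if elapsed <= 0:
        raise ValueError("readings must be in chronological order")
    return haversine_m(point_a[0], point_a[1], point_b[0], point_b[1]) / elapsed


# Two hypothetical readings, 0.001 degrees of latitude and 100 seconds apart.
print(walking_speed_m_s((55.0, 12.0, 0.0), (55.001, 12.0, 100.0)))
```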

6.11.2 Battery Level

In Figure 90, we can see the battery levels sent during the festival. As this data was very noisy, we averaged it over the last 20 records in order to obtain a good visualization. As can be seen in the graph, the battery levels sent stayed around an average of 57%. Nevertheless, this graph shows the battery levels sent while the library was running and not necessarily the actual values. As discussed in Section 6.6, the library stopped sending data before the batteries lost all their charge.

Figure 90. Battery level distribution during the festival. The values are averaged every 20 values in order to obtain a good visualization. [Axes: Day, from 30 to 09 (x), Battery Level [%] (y).]

6.11.3 Manufacturers

This analysis reveals the manufacturers of the discovered devices and of the scanners. In the case of the discovered devices, we used the MAC address for the identification. This method only gives information about the manufacturer of the Bluetooth device, which is not necessarily the manufacturer of the mobile phone, and it does not give any information about the model. In the case of the scanners, the identification was much more accurate. The library used the Android API to identify the manufacturer and the model. This information was later collected on the server.

6.11.3.1 Manufacturers of the discovered devices

Figure 91 shows the manufacturers of the devices discovered by the scanners. The majority of the recognized devices were Nokia, Sony Ericsson and Samsung. In addition, many devices in the Other category are Bluetooth headphones.

Figure 91. Manufacturers of the discovered devices. [Pie chart; number of devices: Nokia 1185, Sony Ericsson 434, Samsung 177, Apple 31, LG 40, HTC 6, RIM 10, Other 103, Not recognizable 1191.]

6.11.3.2 Manufacturers of the scanners

The pie chart presented in Figure 92 shows the scanners grouped by manufacturer. The most popular brands among the smartphones running the scanner software were HTC, Samsung and Sony Ericsson. Next, we present the models of these scanners.

Figure 92. Scanners by brand. [Pie chart: HTC 251 (57%), Samsung 147 (33%), Sony Ericsson 30 (7%), Motorola 9 (2%), LGE 5 (1%), HUAWEI 2 (0%).]

6.11.3.3 HTC scanners models

The HTC scanners were the most popular, with a high variety of models. As we can see from Figure 93, the most popular devices among HTC smartphones were: Sensation, Desire S and One.

Figure 93. Distribution of the HTC scanners models. [Pie chart; number of scanners: Desire HD 1, HTC Desire 16, HTC Desire HD A9191 15, HTC Desire S 41, HTC Incredible S 15, HTC Legend 3, HTC One S 12, HTC One V 5, HTC One X 26, HTC Sensation XE 6, HTC Sensation XL 3, HTC Sensation Z710e 81, HTC Vision 8, HTC Wildfire 2, HTC Wildfire S A510e 7, Liberty 7, Nexus One 1, Sensation 2.]

6.11.3.4 Samsung scanners models

Figure 94 divides Samsung's phones per model. The most popular models were: Galaxy S II (code name GT-I9100) and Galaxy S III (code name GT-I9300).

Figure 94. Distribution of the Samsung scanners models. [Pie chart; number of scanners: Galaxy Nexus 18, GT-I8150 10, GT-I9000 3, GT-I9100 71, GT-I9100G 1, GT-I9300 30, GT-N7000 2, GT-S5660 7, Nexus S 5.]

6.11.3.5 Sony Ericsson scanners models

Figure 95 shows the Sony Ericsson scanners. The most popular smartphones in this segment were: Xperia Arc, Xperia Arc S and Xperia S.

Figure 95. Distribution of the Sony Ericsson scanners models. [Pie chart; number of scanners: LT15i 10, LT18i 7, LT26i 4, ST17i 1, ST18i 8.]

The results show the variety of scanners used during the festival. We can see that the library was able to detect many devices (and to run on many). This information can be valuable for the testing process in the future, by restricting testing to the most popular devices.

6.12 Applicability

Massive events gather a significant number of people in a relatively small space. From this point of view, it seems difficult to infer any information about a single participant of a festival of 130.000 people. In this analysis, we tried to attach additional information to the participants by associating them with information about the events they attended.

6.12.1 Social network analysis

In this section, we show an example of the social network analysis that can be performed with the collected data. The analysis was created to demonstrate what can be done with the data, not to assess the accuracy of the results. Validating the results requires a deep statistical analysis that considers all the accumulated errors that influence the data, such as the location accuracy, the time difference between the performed discovery and the location, and the Bluetooth discoverability process. That validation is out of the scope of this work.

Before performing the analysis, we prepared a metadata table in the database, which can later be associated with the discovered devices. The metadata table contains the information about the events in the Roskilde Festival. Additionally, we created central points (GPS coordinates) of the areas that enclose each stage. The central point is not the center of the stage; instead, it marks the center of the area just in front of the stage (the place where people gathered to listen to an artist). For the stages considered small, the central point was the center of the stage. For example, the radius related to the Orange arena was 100[m] from the central point (latitude: 55.621846, longitude: 12.066579, 60). Another step in creating the metadata table was to associate each artist with the stage where he/she performed. Furthermore, each artist is placed in a time frame (start and end of the performance), together with the duration of the performance. All data were taken from the official schedule of the festival. An example of this table can be seen in Figure 96.

Figure 96. Fragment of metadata table.

Combining the Scans table with the Bluetooth table, we obtained the MAC address of every discovered device and the accuracy of its position (which is the GPS accuracy of the scanner). This allowed us to filter the data according to the location error in the further analysis. The result of this combination is the MacMeta table (Figure 97).

Figure 97. Fragment of MacMeta table.

The encountered devices were fused with the environment by calculating the distance between each device and the stages at the time of a concert. Having this value, we associated the user with the artist. The result of this association is stored in the MacStyle table. An example record can be seen in Figure 98.

Figure 98. Fragment of MacStyle database.
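The association step described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used in the thesis: the dictionary keys, the function names, the artist name and the performance times are hypothetical, while the Orange arena central point, the 100[m] radius and the 10[m] accuracy limit are the values quoted in the text. The distance uses the same spherical law of cosines and constants as the distance query in the appendix.

```python
import math
from datetime import datetime

# Same constants as the appendix distance query: 3959 miles * 1.60934 km/mile
EARTH_RADIUS_M = 3959 * 1.60934 * 1000


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (spherical law of cosines)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    # Clamp to [-1, 1] so rounding errors cannot break acos
    return EARTH_RADIUS_M * math.acos(max(-1.0, min(1.0, cos_angle)))


def attended(sighting, performance, max_accuracy_m=10.0):
    """True if the sighting passes the accuracy constraint, falls inside the
    time frame of the performance and lies within the stage radius."""
    if sighting["accuracy"] > max_accuracy_m:
        return False
    if not performance["start"] <= sighting["gps_time"] <= performance["stop"]:
        return False
    d = distance_m(sighting["latitude"], sighting["longitude"],
                   performance["latitude"], performance["longitude"])
    return d <= performance["radius_m"]


# Central point and radius of the Orange arena as quoted in the text;
# the artist name and the times are hypothetical placeholders.
performance = {
    "artist": "Example Artist", "stage": "Orange",
    "latitude": 55.621846, "longitude": 12.066579, "radius_m": 100.0,
    "start": datetime(2012, 7, 5, 21, 0), "stop": datetime(2012, 7, 5, 22, 30),
}
sighting = {
    "mac": "00:00:00:00:00:01", "accuracy": 5.0,
    "latitude": 55.622346, "longitude": 12.066579,
    "gps_time": datetime(2012, 7, 5, 21, 15),
}
print(attended(sighting, performance))
```

Checking the accuracy and the time frame before computing the distance keeps the trigonometric work for the records that can still qualify.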

The process of associating the metadata with a participant was done by constraining the location values. The first constraint limited the accuracy of the device position (hence the scanner position) to 10[m]. The second constraint limited the distance between the device and the stages.

The analysis of the social network was done using the NetworkX library for Python (footnote 53). We exported the data from the tables in the simplified database to CSV files and parsed the files using the csv module for Python. In the main algorithm, we counted the occurrences of devices related to each other and assigned this value to the edge of the graph. Furthermore, we associated each node with a music style. The condition was that the device should be associated with at least one music style in order to attach the metadata (it does not matter which style). The result is illustrated in Figure 99. This figure was generated for the devices that encountered each other more than 4 and less than 12 times, which yields 83[nodes] and 55[edges]. Nodes represent unique devices and edges represent the relationships between them. A relationship is an encounter between the same two devices. In Figure 99, we can identify 42 connected components (pairs of connected nodes). Furthermore, the nodes are colored red or blue. Red nodes are those to which we cannot attach any metadata. Blue nodes are those classified as potential nodes that can be associated with metadata.

53 http://networkx.lanl.gov/

Figure 99. Graph presenting relation between festival participants. Red nodes present participants without associated metadata. Blue nodes present participants who are associated with metadata.
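The graph construction described above can be sketched as follows. This is a minimal illustration that uses only the Python standard library instead of NetworkX, so that it stays self-contained. It assumes that every input row is one (scanner MAC, discovered MAC) pair and that an encounter is one such row; the function names and the sample data are hypothetical.

```python
from collections import Counter, deque


def count_encounters(rows):
    """Count how often each unordered pair of devices met."""
    counts = Counter()
    for scanner_mac, discovered_mac in rows:
        if scanner_mac != discovered_mac:
            counts[frozenset((scanner_mac, discovered_mac))] += 1
    return counts


def build_graph(counts, low=4, high=12):
    """Keep the pairs seen more than `low` and less than `high` times.
    The encounter count becomes the edge weight."""
    graph = {}
    for pair, n in counts.items():
        if low < n < high:
            a, b = tuple(pair)
            graph.setdefault(a, {})[b] = n
            graph.setdefault(b, {})[a] = n
    return graph


def connected_components(graph):
    """Breadth-first search over the adjacency dictionary."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def node_colors(graph, styles):
    """Blue if at least one music style is attached to the device, else red."""
    return {mac: "blue" if styles.get(mac) else "red" for mac in graph}


# Hypothetical sample: A and B met 5 times, C and D 3 times, B and E 12 times
rows = [("A", "B")] * 5 + [("C", "D")] * 3 + [("B", "E")] * 12
graph = build_graph(count_encounters(rows))
print(len(graph), len(connected_components(graph)))
print(node_colors(graph, {"A": ["rock"]}))
```

In this sample only the pair A and B passes the filter, because 3 and 12 encounters fall outside the open interval between 4 and 12.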

In summary, we have to mention the bias associated with this type of analysis. Obtaining data samples that are reliable (“pure”) for the analysis is not a problem. The problem is the amount of data, which might not be significant enough to create associations between nodes. For example, Figure 99 shows 83[nodes] generated out of 3.631[unique devices], which gives 2,4% of the possible information attached. This might not be enough to perform a deeper analysis. Another issue is the time needed to generate the graph for the devices that were encountered at least once (it takes 30[min] on a dual-core Intel 2.26GHz CPU with 6GB of DDR 1066MHz RAM). Furthermore, a graph of more than 400 nodes, where the nodes encountered each other at least once, is not readable (because most points are connected to each other).

6.13 Summary

In this chapter we analyzed the results for the data collected by the library over the course of Roskilde Festival 2012. We described how the data was prepared prior to the analysis by adding geographical information to every record in the database. This preparation allowed us to simplify the queries and reduce their processing time.

We presented general statistics about all the data and about the data restricted to the festival area. The results were very satisfying, since we were able to cover 2.4% of the population using 75 scanners. We expect that if the number of scanners can be increased, even more participants can be tracked. According to our measurements, using the actual area measurement method, the distributed scanning was able to cover around 19,71% of the festival area and 60,34% of the concerts area during the whole festival. These values could be even greater if we redefined the areas of interest through a terrain analysis of the area. They are very promising considering the small number of scanners.

We also measured the characteristics of distributed scanning. We noticed that this technique detects the same number of occurrences per discovery as fixed scanning. However, many of these occurrences were repeated devices. We suspect that the participants walk along with the same people, such as friends or acquaintances.

The algorithm designed to determine the intervals improved the number of detected devices per discovery. We were able to increase the scanning frequency during the day and decrease it during the resting hours, as expected. On the other hand, the daily battery consumption was 108%, a value not far from the expected range of 72%-78%.

Analyzing the front-end applications, we discovered that the success of the distributed scanning depends highly on them. Some applications were able to keep the library running for more than 40 hours and others for no more than around 20. Additionally, we suspect that the users turned off their smartphones in order to save battery, which stopped the library. The front-end applications were restarted after this situation in only 26% of the cases. We suggest that the front-end applications should be designed to encourage the user to restart them.

We were able to confirm our concerns about the network reliability. We discovered that 54,98% of the packets sent never arrived at the server. The average network transfer speed was 0,12 [KB/s]. These values are very different from those under normal conditions.

After analyzing the location accuracy of the three available methods (GPS, Wi-Fi and 3G), we can point out that the most accurate method was GPS. At the same time, this method was the most used. The accuracy error was 5[m] for GPS, while for the rest it was 900[m].

Analyzing the Bluetooth discovery, we found that the average time per discovery was 30[s], which differs from our initial assumption of 10[s]. Nevertheless, this value had a high standard deviation, which might be caused by the different hardware used by the participants. This value can be used to fine-tune the algorithm used to define the intervals.

We identified the manufacturers of the devices used in the data collection process. This allowed us to determine that the data collected was not a representative sample of the market share. On the other hand, this analysis might be helpful for testing purposes in future work.

In our initial calculations at the beginning of this thesis, we used an average human walking speed of 1.4[m/s]. However, the empirical data from the festival showed a relatively slow walking speed of 0.5[m/s]. This can be explained by the fact that people stand still in several situations, such as eating, watching a concert or just resting.

The social network analysis gave us an idea of what can be done with the data. Knowing the position and time of an individual, we were able to associate a participant with metadata from the festival. The analysis showed that, even though it is possible to create associations between people, the amount of data available to make a strong statement about a relationship is relatively small. Additionally, it requires a deep analysis of the accuracy of the results.
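The interval algorithm summarized above relies on the replacement ratio, whose formula is given in the comments of IntervalComputer.java in the appendix. The following Python sketch restates that formula and its weighted average. It is an illustration only: the function names are hypothetical and the handling of the history is simplified compared to the Java code.

```python
from collections import deque

HISTORY_RATIOS_SIZE = 10  # same history length as in IntervalComputer.java


def replacement_ratio(previous, current):
    """ratio = |(size(C \\ P) - size(P \\ C)) / size(C U P)|
    Returns the ratio and the number of devices used as its weight."""
    if previous is None:              # first scan: neutral starting value
        return 0.5, len(current)
    total = len(current | previous)   # C U P
    if total == 0:
        return 0.0, 0
    incoming = len(current - previous)   # C \ P
    outgoing = len(previous - current)   # P \ C
    return abs(incoming - outgoing) / total, total


def weighted_average(history):
    """Average of the ratios weighted by the number of devices per scan."""
    denominator = sum(total for _, total in history)
    if denominator == 0:
        return 0.0
    return sum(ratio * total for ratio, total in history) / denominator


# Hypothetical sequence of three scans
history = deque(maxlen=HISTORY_RATIOS_SIZE)
previous = None
for scan in [{"a", "b"}, {"b", "c", "d"}, {"c", "d"}]:
    history.appendleft(replacement_ratio(previous, scan))
    previous = scan
print(weighted_average(history))
```

Weighting by the number of devices means that a scan in a crowded place influences the current ratio more than a scan that found only a few devices.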

CHAPTER 7

Conclusions

At the end of this thesis, the goals mentioned in sections 1.1.1 and 1.1.2 were achieved successfully by applying scientific and engineering methods throughout the project. Namely, we designed, implemented, tested and deployed the scanner library for data collection at the Roskilde Festival 2012.

In this work, the technique of distributed scanning has proven to be successful for collecting participants' data in massive events. By using 110 scanners (or 0,08% of the festival population), the library was able to collect the data of 3.161 unique devices (or 2,4% of the festival population). Moreover, the area covered using this technique was considerably large compared to the number of scanners.

It is important to note that the scalability of this technique differs from the previous methods studied. The main difference is that the deployment does not depend on the infrastructure, but on the collaboration of the users or, more specifically, on the front-end applications the library is attached to. If the number of front-end applications can be increased, more scanners could be used and thus more devices could be discovered. In addition, its deployment is cheaper than the previous techniques, since it does not require any hardware installation in situ.

In the work presented in this thesis, we registered 2.4% of the festival population, which is 30 times bigger than the number of scanners. Following this approach, we can assume that with 1% of the population acting as scanners, we would be able to capture 28% of the festival population, provided that there are enough devices in discoverable mode. There are many ways to increase the number of scanners by increasing the number of downloads and the usage of the front-end applications. We discovered the characteristics that these front-end applications should meet in the case of Roskilde Festival. These applications should encourage the participants to run them constantly, because participants may turn off their devices in order to save battery. Given the network speed measured at the festival, these applications should avoid sending large files through the network, such as large pictures or other metadata. They also have to encourage the users to turn on Bluetooth. In our analysis, we discovered that only half of the participants with Bluetooth turned off answered Yes when they were asked to turn it on.

The decision to develop the scanner solution as a library was the correct one. It provided an easy integration into the front-end applications without decreasing their performance. In addition, it relieves the developers from dealing with the same issues the library faced in the festival environment.

The algorithm used to define the intervals worked as expected. The data from the previous edition of Roskilde Festival was useful for the library to avoid scanning when there were few devices. The replacement ratio was a good method to introduce a variation of the intervals. The algorithm met the goal of keeping the battery consumption close to the estimated levels. In addition, the algorithm allows the minimum and maximum intervals to be set in order to reach a desired battery consumption. However, it has to be fine-tuned, because these maximum and minimum intervals were never reached. Another way to improve the energy efficiency could be to disable the scanning during a range of hours.

In our analysis, we were able to identify the nature of the data collected using distributed scanning. First, because the same people usually surround the participants carrying the scanners, the number of repeated devices is higher than in other methods. Second, this technique is suitable for covering only crowded areas. This could be seen from the fact that the concert area was covered much more than the rest of the festival area.

One of the things that could have been done better was the energy consumption measurement prior to the deployment. The experiments for estimating the energy consumed by Bluetooth and GPS were very limited. First, we did not consider the human factor in the usage of a smartphone, such as browsing the Internet, making regular calls or playing games. Secondly, we tested the battery consumption on only one specific device. Finally, we could have used the battery consumption results from the Distortion 2012 event to support our assumptions.

We were able to perform a simple social network analysis to show that it is possible to associate the users with the environment. This analysis was done using the festival metadata, which was attached to the discovered devices. However, in order to validate the results, the errors introduced in the collected data should be assessed. Moreover, the amount of data collected and the number of encounters between people were relatively small, which might not be enough to make a strong statement about the analysis.
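The extrapolation made earlier in this chapter, from 110 scanners to 1% of the population acting as scanners, can be reproduced with a short calculation. The sketch below assumes that the number of discovered devices grows linearly with the number of scanners, which is the assumption stated in the text. It ignores the overlap between scanners and the limited number of discoverable devices, so it should be read as an optimistic estimate. The function name is hypothetical.

```python
def extrapolate_coverage(population, scanners, devices, target_scanner_share):
    """Scale the devices found per scanner linearly to a larger scanner share.
    Returns (devices per scanner, expected share of the population covered)."""
    devices_per_scanner = devices / scanners
    target_scanners = population * target_scanner_share
    # The estimate cannot exceed the whole population
    expected_devices = min(population, target_scanners * devices_per_scanner)
    return devices_per_scanner, expected_devices / population


# Figures quoted in the text: 130.000 participants, 110 scanners, 3.161 devices
per_scanner, share = extrapolate_coverage(130_000, 110, 3_161, 0.01)
print(f"{110 / 130_000:.2%} of the population acted as scanners")
print(f"{per_scanner:.1f} devices per scanner")
print(f"{share:.1%} expected coverage with 1% scanners")
```

With the figures quoted in the text, the result is about 28.7 devices per scanner and an expected coverage between 28% and 29%, in line with the 28% stated above.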

7.1 Future Work

Given the time assigned to this thesis, we performed as much analysis as we could. Nevertheless, there is much more research that can be applied to the collected data. One of the most important topics is the error assessment. Errors were introduced in every measurement obtained from the library. We can identify the error of the location accuracy, the time difference between the Bluetooth scans and the time of the location, and the error introduced by the coverage area of the Bluetooth discoveries. This analysis will reveal the degree of confidence of the data. Therefore, this project leaves opportunities for future research in other projects.

One aspect that can put the future of this technique, and of Bluetooth tracking in general, at risk is the decreasing number of devices on the market that can be set in discoverable mode. More and more, the smartphone manufacturers restrict the use of this mode. Nevertheless, there is still a significant number of devices that can be detected. Moreover, the scanners that only send location updates can still be tracked, even though they only provide information about one participant.

This technique is not restricted exclusively to Roskilde Festival. It can also be used in other massive events. In the testing at the Distortion event, we were able to show that this technique can also be applied to events located in cities. However, the library should be adjusted, since the conditions in cities are different from those in the festival studied. For example, battery issues are less important in cities. Moreover, the dynamics of these events should be measured in order to provide a new daily polynomial to the algorithm used to set the intervals.

We can distinguish two types of data that were not captured by the library and could be used as additional support for a better analysis. First, the accelerometer sensor could be used. Having this data, we might be able to infer information about the dynamics of a person as well as about his/her surrounding environment. For example, if a person attends a concert, we might be able to recognize whether the person was jumping or standing. This data could be used, for example, to measure the physical activity of the participants. Second, we could record a sound sample with each Bluetooth scan. By analyzing the background music, this sample could help to confirm whether a person who attended an artist's performance was in the area of the concert. This data could be helpful for the social network analysis, because it can increase the accuracy of the metadata associated with the devices detected near the stages.

There are several future applications for this technique. It can be used to identify the pathways that people follow, for example to reduce pedestrian congestion or to improve the exit routes in case of emergency. If this technique is applied to cities, it can be used to learn how diseases spread. Additionally, it can be used to increase the impact of advertising campaigns. By knowing where the participants stop more frequently, it would be logical to place advertisements at those places.


Appendix

i. Library manual

a. Adding BTScannerLibrary to the project

• Add the BTScannerLibrary project to your Eclipse workspace.
• Right-click your project and go to Properties. Select the Android section, then go to Library->Add and select BTScannerLibrary.
• Add the following permissions to your project's AndroidManifest.xml:

• Register the following services in your project's AndroidManifest.xml (inside the application tag):

The library should be ready to use. The possible actions are:

b. To start the service

startService(new Intent(this, ScannerService.class));

c. To stop the service (this kills all the subservices started or in progress):

stopService(new Intent(this, ScannerService.class));

d. Get the Bluetooth MAC

String mac = ScannerService.getLocalMac();

e. Get the cached GPS location or obtain a new one:

IntentFilter filter = new IntentFilter(ScannerService.ACTION_LOCATION);
this.registerReceiver(new BroadcastReceiver() {
    public void onReceive(Context context, Intent intent) {
        double latitude = intent.getExtras().getDouble("latitude");
        double longitude = intent.getExtras().getDouble("longitude");
        unregisterReceiver(this);
    }
}, filter);

ScannerService.getLocation();

f. Get the cached Bluetooth devices or obtain new ones:

IntentFilter filter = new IntentFilter(ScannerService.ACTION_SCAN);
this.registerReceiver(new BroadcastReceiver() {
    public void onReceive(Context context, Intent intent) {
        String devices[] = intent.getExtras().getStringArray("devices");
        unregisterReceiver(this);
    }
}, filter);

ScannerService.getBluetoothDevices();

g. Schedule a POST request:

String url = "http://lestrade.imm.dtu.dk/~s101422/BTScannerWebService/testotherWS.php";

List<NameValuePair> nameValuePairs = new ArrayList<NameValuePair>(2);

// POST parameters
nameValuePairs.add(new BasicNameValuePair("a", "1"));
nameValuePairs.add(new BasicNameValuePair("b", "2"));

// Use this call to try the request NOW
long id = ScannerService.schedulePost(this, url, nameValuePairs);

// Or use this call to try the request after a timestamp (long time)
long id = ScannerService.schedulePost(this, url, nameValuePairs, time);

IntentFilter filter = new IntentFilter("dk.dtu.imm.btscanner.ACTION_POST_ID_" + id);
this.registerReceiver(new BroadcastReceiver() {
    public void onReceive(Context context, Intent intent) {
        String result = intent.getExtras().getString("result");
        unregisterReceiver(this);
    }
}, filter);

ScannerService.getPost(this, id);

h. Schedule a POST request with FILE

The same as above, but using ScannerService.schedulePostAndFile. The values passed are a byte[] array and the file name. On the web service, the file is available as $_FILES['file'];

i. Activate the Bluetooth

• To show the popup:
ScannerService.enableBluetooth(this);

• To handle the popup result:
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    boolean isEnabled = ScannerService.onBluetoothActivityResult(requestCode, resultCode, data);
}

ii. SQL Queries

a. Query used to find and assign the nearest time from the GPS table into the Bluetooth table

DROP PROCEDURE IF EXISTS addToDatabase;
DELIMITER $$
CREATE PROCEDURE addToDatabase()
BEGIN
    DECLARE senderMacProcess VARCHAR(100);
    DECLARE no_more_rows INT DEFAULT 0;
    DECLARE cur CURSOR FOR SELECT DISTINCT senderMac FROM Bluetooth ORDER BY time;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET no_more_rows = 1;
    OPEN cur;
    fetchLoop: LOOP
        FETCH cur INTO senderMacProcess;
        IF no_more_rows THEN
            CLOSE cur;
            LEAVE fetchLoop;
        END IF;

        INSERT INTO s100433_RoskildeSimplified.Bluetooth
            (idBluetoothKey, time, discoveryTime, latitude, longitude, accuracy, gpsTime, senderMac)
        SELECT idBluetoothKey, time, discoveryTime, latitude, longitude, accuracy, gpsTime, senderMac
        FROM
            /* Search for locations in the past */
            (SELECT idBluetooth as idBluetoothKey, Bluetooth.time, Bluetooth.discoveryTime,
                    Bluetooth.idSubmission, Bluetooth.senderMac, Gps.time as gpsTime,
                    DATE_SUB(Gps.time, INTERVAL accuracy SECOND) as WTime,
                    latitude, longitude, accuracy,
                    ABS(UNIX_TIMESTAMP(Bluetooth.time)-UNIX_TIMESTAMP(DATE_SUB(Gps.time, INTERVAL accuracy SECOND)))+1 as timedif
             FROM Bluetooth LEFT JOIN Gps ON Bluetooth.senderMac=Gps.senderMac
             WHERE Bluetooth.senderMac=senderMacProcess AND Bluetooth.time > Gps.time
             UNION
             /* Search for locations in the future */
             SELECT idBluetooth as idBluetoothKey, Bluetooth.time, Bluetooth.discoveryTime,
                    Bluetooth.idSubmission, Bluetooth.senderMac, Gps.time as gpsTime,
                    DATE_ADD(Gps.time, INTERVAL accuracy SECOND) as WTime,
                    latitude, longitude, accuracy,
                    ABS(UNIX_TIMESTAMP(Bluetooth.time)-UNIX_TIMESTAMP(DATE_ADD(Gps.time, INTERVAL accuracy SECOND)))+1 as timedif
             FROM Bluetooth LEFT JOIN Gps ON Bluetooth.senderMac=Gps.senderMac
             WHERE Bluetooth.senderMac=senderMacProcess AND Bluetooth.time <= Gps.time
             UNION
             SELECT idBluetooth as idBluetoothKey, Bluetooth.time, Bluetooth.discoveryTime,
                    Bluetooth.idSubmission, Bluetooth.senderMac,
                    MAKETIME(0,0,0) as gpsTime, MAKETIME(0,0,0) as WTime,
                    -1, -1, 100000, 1000000000 as timedif
             FROM Bluetooth
             WHERE Bluetooth.senderMac=senderMacProcess
             ORDER BY timedif ASC) AS BT1
        GROUP BY idBluetoothKey;
    END LOOP fetchLoop;
END $$
DELIMITER ;

CALL addToDatabase();

b. Query used to measure the difference between the location time and the time of the discoveries

SELECT COUNT(error), AVG(error)
FROM (SELECT idBluetooth, time, gpsTime, ROUND(ABS(time - gpsTime), 0) as error
      FROM s100433_RoskildeSimplified.Bluetooth
      WHERE latitude > 55.600463 AND latitude < 55.627705
        AND longitude > 12.057495 AND longitude < 12.098866
        AND time BETWEEN '2012-06-30 00:00:00' AND '2012-07-09 03:00:00'
      ORDER BY idBluetooth ASC) AS errorTable
WHERE error < 5000

c. Query for counting the repeated devices, limited by location, time and accuracy

SELECT Scans.idBluetooth, Scans.senderMac, Bluetooth.time, Count(Scans.senderMac) AS total
FROM s100433_RoskildeSimplified.Scans
INNER JOIN s100433_RoskildeSimplified.Bluetooth ON Scans.idBluetooth = Bluetooth.idBluetooth
WHERE latitude > 55.600463 AND latitude < 55.627705
  AND longitude > 12.057495 AND longitude < 12.098866
  AND time BETWEEN '2012-06-30 00:00:00' AND '2012-07-09 02:00:00'
  AND accuracy < 120
GROUP BY idBluetooth, senderMac
ORDER BY time ASC;

d. Query for calculating the time of discovery

SELECT Bluetooth.senderMac AS bt_mac_scanner, mac AS bt_mac_discovered, time AS timestamp, discoveryTime
FROM (SELECT idBluetooth, Scans.senderMac, mac, Scans.idSubmission
      FROM Scans
      LEFT JOIN Devices ON Devices.senderMac = Scans.senderMac AND Devices.idDevices = Scans.idDevices) AS Scans2
LEFT JOIN Bluetooth ON Bluetooth.idBluetooth = Scans2.idBluetooth AND Bluetooth.senderMac = Scans2.senderMac
WHERE Bluetooth.time > '2012-06-30 12:00:48'
GROUP BY time, Bluetooth.senderMac, mac

e. Query used to select the discovered devices that were in an event

CREATE VIEW devOccView AS
SELECT mac, gpsTime, accuracy, artist, genre, country, start, stop, stage, distance,
       count(mac) as repeted
       -- TIMEDIFF(gpsTime,start) as occurance
FROM s100433_RoskildeSimplified.MacMeta
WHERE stage = 'Orange' AND artist='BJORK' AND distance <= 100 AND mac='00:25:D0:68:BD:B5'
GROUP BY gpsTime
HAVING repeted > 1
ORDER BY gpsTime ASC;

f. Query for calculating the distance between the desired location (55.621506, 12.077213, the Orange Arena location) and the locations that appear in the Bluetooth table

SELECT mac, gpsTime, time, latitude, longitude, accuracy,
       round(( 3959 * acos( cos( radians(Bluetooth.latitude) ) * cos( radians( 55.621506 ) )
               * cos( radians(12.077213) - radians(Bluetooth.longitude))
               + sin(radians(Bluetooth.latitude)) * sin( radians(55.621506))))*1.60934 *1000) AS distance
       -- +accuracy AS distance
FROM s100433_RoskildeSimplified.Scans
INNER JOIN Bluetooth ON Scans.idBluetooth = Bluetooth.idBluetooth
WHERE latitude > 55.600463 AND latitude < 55.627705
  AND longitude > 12.057495 AND longitude < 12.098866
  AND mac NOT LIKE 'null'
  AND round(( 3959 * acos( cos( radians(Bluetooth.latitude) ) * cos( radians( 55.621506 ) )
              * cos( radians(12.077213) - radians(Bluetooth.longitude))
              + sin(radians(Bluetooth.latitude)) * sin( radians(55.621506))))*1.60934 *1000)

g. The same formula for distance calculation in GPS coordinates, adapted for Excel

=ROUND(((3959*ACOS(COS(RADIANS(latitude.FROM))*COS(RADIANS(latitude.TO))*COS(RADIANS(longitude.TO)-RADIANS(longitude.FROM))+SIN(RADIANS(latitude.FROM))*SIN(RADIANS(latitude.TO))))*1.60934*1000),2)

iii. Code Snippets

a. IntervalComputer.java package dk.dtu.imm.btscanner.utils; import java.text.SimpleDateFormat; import java.util.Date; import java.util.HashSet; import java.util.Set; import dk.dtu.imm.btscanner.services.Battery;

public class IntervalComputer {

private static final int HISTORY_RATIOS_SIZE = 10; private static double ratios[] = null; private static int totalDevices[] = null; private static Set previousDevices = null; public static double currentRatio = 0.5f;

    /**
     * This method should be called after every Bluetooth scan.
     * It updates currentRatio and returns nothing.
     */
    @SuppressWarnings("unchecked")
    public static void replacementRatio(Set currentDevices) {
        double ratio = 0.0f;
        int totalDevice = 0;

        /*
         * Calculate the current refresh ratio
         *
         * P: Previous bluetooth set
         * C: Current bluetooth set
         *
         * ratio = |(size(C \ P) - size(P \ C)) / size(C U P)|
         */

        // First time
        if (previousDevices == null) {
            ratio = 0.5f;
            totalDevice = currentDevices.size();
        } else {
            // Clone the sets
            Set incomingSet = (Set) ((HashSet) currentDevices).clone();
            Set outgoingSet = (Set) ((HashSet) previousDevices).clone();
            Set totalSet = (Set) ((HashSet) currentDevices).clone();

            // Perform set operations

            // C \ P
            incomingSet.removeAll(previousDevices);
            // P \ C
            outgoingSet.removeAll(currentDevices);
            // C U P
            totalSet.addAll(previousDevices);

            // Get the size
            double incoming = incomingSet.size();
            double outgoing = outgoingSet.size();
            double total = totalSet.size();

            if (total != 0)
                ratio = (incoming - outgoing) / total;
            else
                ratio = 0;

            ratio = Math.abs(ratio);
            totalDevice = (int) total;
        }

        previousDevices = (Set) ((HashSet) currentDevices).clone();

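        /*
         * Calculate weighted average for replacement ratio
         *
         * ratio = (totalDevices_1*ratio_1 + totalDevices_2*ratio_2 + ...) /
         *         (totalDevices_1 + totalDevices_2 + ...)
         */

        // First time for weighted average
        if (ratios == null) {
            // Initialize ratios array
            ratios = new double[HISTORY_RATIOS_SIZE];
            totalDevices = new int[HISTORY_RATIOS_SIZE];

            // Assumption: this loop is cut off in the source after "for (int i=0; i";
            // zero-filling both arrays is a reconstruction.
            for (int i = 0; i < HISTORY_RATIOS_SIZE; i++) {
                ratios[i] = 0.0;
                totalDevices[i] = 0;
            }

            // Add current ratio to the top of the array
            ratios[0] = ratio;
            totalDevices[0] = totalDevice;

            // Current ratio remains unchanged
        }
        // Now the actual calculation
        else {
            double numerator = 0.0;
            double denominator = 0.0;

            // Add the current ratio and total devices
            addAndRotate(ratio, ratios);
            addAndRotate(totalDevice, totalDevices);

            // Calculate the historical weighted average
            // Assumption: this loop is cut off in the source after "for (int i=0; i";
            // the body is reconstructed from the weighted-average formula above.
            for (int i = 0; i < HISTORY_RATIOS_SIZE; i++) {
                numerator += totalDevices[i] * ratios[i];
                denominator += totalDevices[i];
            }

            if (denominator != 0)
                ratio = numerator / denominator;
            else
                ratio = 0;
        }
        currentRatio = ratio;
    }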

    private static void addAndRotate(int totalDevice, int[] totalDevices) {
        for (int i = totalDevices.length - 2; i >= 0; i--)
            totalDevices[i + 1] = totalDevices[i];

        totalDevices[0] = totalDevice;
    }

    private static void addAndRotate(double ratio, double[] ratios) {
        for (int i = ratios.length - 2; i >= 0; i--)
            ratios[i + 1] = ratios[i];

        ratios[0] = ratio;
    }

    /**
     * Polynomial function to obtain the activity (refresh rate of people)
     * according to the time of the day, based on Roskilde 2011 data.
     *
     * To plot in Mathematica (x is the hour of the day):
     *
     * f[x] = 0.9968` - 0.33` Mod[-6.28` + x, 24.`] + 0.0413` Mod[-6.28` + x, 24.`]^2
     *        - 0.0023` Mod[-6.28` + x, 24.`]^3 + 0.000048` Mod[-6.28` + x, 24.`]^4
     *
     * Plot[f[x], {x, 17.276, 17.28}]
     *
     * @param time timestamp in milliseconds since the epoch
     * @return a value between 0 and 1. 1 is low activity (nobody moves) and
     *         0 is high activity (many people move)
     */
    public static double dailyFunction(long time) {

        // Convert from timestamp to hours
        SimpleDateFormat dateFormatHour = new SimpleDateFormat("HH");
        SimpleDateFormat dateFormatMin = new SimpleDateFormat("mm");
        SimpleDateFormat dateFormatSec = new SimpleDateFormat("ss");

        Date date = new Date(time);
        double x = Double.parseDouble(dateFormatHour.format(date))
                + Double.parseDouble(dateFormatMin.format(date)) / 60.0
                + Double.parseDouble(dateFormatSec.format(date)) / 3600.0;

        // Take the parameters for the modulo part: Mod[-6.28` + x, 24.`]
        double a = -6.28 + x;
        double m;

        // Calculate the modulo. Java works differently than Mathematica for negative modulo.
        if (a < 0)
            m = 24 - (Math.abs(a) % 24.0);
        else
            m = a % 24.0;

        // Calculate the polynomial
        double f = 0.9968 - 0.33 * m + 0.0413 * m * m - 0.0023 * m * m * m
                + 0.000048 * m * m * m * m;
        return f;
    }

    /**
     * This function computes the factor for the current battery level.
     *
     * @return 0.5 when the battery level is below 35%, otherwise 0.0
     */
    public static double batteryFactor() {
        double batteryLevel = Battery.level;
        double factor = 0.0;

        // If the battery is below 35% we reduce the scan frequency
        if (batteryLevel < 0.35) {
            factor = 0.5;
        } else {
            factor = 0.0;
        }

        return factor;
    }

    /**
     * Interval calculator
     *
     * @return the interval until the next scan, clamped to [minInterval, maxInterval]
     */
    public static long getInterval(String name, long minInterval, long maxInterval) {
        long nextScan = 0;
        double dailyFloor = 0.3;
        double dailyFactor = IntervalComputer.dailyFunction(System.currentTimeMillis());
        double replacementRatio = IntervalComputer.currentRatio;
        double batteryLevel = IntervalComputer.batteryFactor();

        // Obtain full interval range
        long delta = maxInterval - minInterval;

        // Keep dailyFactor from dropping below the floor
        dailyFactor = clamp(dailyFactor, dailyFloor, 1.0);

        // The daily function reduces the delta interval
        long deltaDaily = (long) (dailyFactor * delta);

        // The replacement ratio affects the deltaDaily value
        replacementRatio = 1 - replacementRatio;
        deltaDaily = (long) (deltaDaily * replacementRatio);

        // The battery level finally modifies the previous calculations
        nextScan = (long) (minInterval + delta * batteryLevel + deltaDaily);
        nextScan = clamp(nextScan, minInterval, maxInterval);

        return nextScan;
    }

    /**
     * Clamp math function
     *
     * @param var
     * @param min
     * @param max
     * @return
     */
    private static long clamp(long var, long min, long max) {
        if (var > max)
            var = max;

        if (var < min)
            var = min;

        return var;
    }

    private static double clamp(double var, double min, double max) {
        if (var > max)
            var = max;

        if (var < min)
            var = min;

        return var;
    }
}
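The listing above depends on the Battery service and on the system clock, which makes the arithmetic hard to follow in isolation. The following self-contained sketch repeats the two central calculations, the replacement ratio and the interval formula, with every factor passed in explicitly. The class IntervalSketch and its method names are hypothetical and are not part of the library. The 7 and 20 minute bounds are borrowed from test case 04 purely as an illustration, and typed sets are used instead of the raw Set of the original.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class IntervalSketch {

    /** ratio = |(size(C \ P) - size(P \ C)) / size(C U P)|, or 0 when both sets are empty. */
    static double replacementRatio(Set<String> previous, Set<String> current) {
        Set<String> incoming = new HashSet<>(current);
        incoming.removeAll(previous);               // C \ P
        Set<String> outgoing = new HashSet<>(previous);
        outgoing.removeAll(current);                // P \ C
        Set<String> total = new HashSet<>(current);
        total.addAll(previous);                     // C U P
        if (total.isEmpty()) {
            return 0.0;
        }
        return Math.abs((incoming.size() - outgoing.size()) / (double) total.size());
    }

    /** Same arithmetic as getInterval, but with all three factors supplied by the caller. */
    static long nextInterval(long minInterval, long maxInterval,
                             double dailyFactor, double replacementRatio, double batteryFactor) {
        long delta = maxInterval - minInterval;
        // Daily factor is kept within [0.3, 1.0], as in the listing.
        double daily = Math.max(0.3, Math.min(1.0, dailyFactor));
        long deltaDaily = (long) (daily * delta);
        // A high replacement ratio (many devices changed) shortens the interval.
        deltaDaily = (long) (deltaDaily * (1 - replacementRatio));
        long next = (long) (minInterval + delta * batteryFactor + deltaDaily);
        return Math.max(minInterval, Math.min(maxInterval, next));
    }

    public static void main(String[] args) {
        // Illustrative device sets: A left, D and E arrived, B and C stayed.
        Set<String> previous = new HashSet<>(Arrays.asList("A", "B", "C"));
        Set<String> current = new HashSet<>(Arrays.asList("B", "C", "D", "E"));
        double ratio = replacementRatio(previous, current);

        long min = 7 * 60 * 1000L;   // 7 minutes in milliseconds (illustrative bound)
        long max = 20 * 60 * 1000L;  // 20 minutes in milliseconds (illustrative bound)
        System.out.println("ratio = " + ratio);
        System.out.println("interval, battery ok  = " + nextInterval(min, max, 0.5, ratio, 0.0));
        System.out.println("interval, battery low = " + nextInterval(min, max, 0.5, ratio, 0.5));
    }
}
```

In this example two devices arrive and one leaves out of five seen in total, so the ratio is |2 - 1| / 5 = 0.2. Because the incoming and outgoing counts are subtracted before the absolute value is taken, a scan in which as many devices arrive as leave yields a ratio of 0, the same as a scan in which nothing changed.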

iv. Test cases

Test Case Nr: 01
Start Date:
End Date: 23/05/12
Written by: Lukasz Dynowski
Approved by: Marcos Fuentes
Description: Multi-application testing (library compatibility test).
Instruction: Attach the module to the other available applications. Install the applications on an Android smartphone (minimum v2.2). Run the applications with the attached library simultaneously on the same device, in this order: one application installed and running, two applications installed and running, three applications installed and running.
Expected Results:
  Positive: The applications do not cause other applications or the system to malfunction. The services run interchangeably or simultaneously.
  Negative: An application causes a malfunction or kills the services of other applications running the same library.
Additional Notes: The same test shall be run on different devices.

Test Case Nr: 02
Start Date: 23/05/12
End Date:
Written by: Lukasz Dynowski
Approved by: Marcos Fuentes
Description: Monitors test
Instruction: Open the monitoring tools:
  - In the computer web browser (http://lestrade.imm.dtu.dk/~s101422/BTScannerWebService/monitorRoskilde.php)
  - In the mobile device web browser (http://lestrade.imm.dtu.dk/~s101422/BTScannerWebService/monitorRoskildeMobile.php)
Expected Results:
  Positive: The web page fits the device's screen (480x800). The size of the icons is correct. The displayed data are easily readable. The forms are easy to use.
  Negative: The font size is too small, the icons are too big or too small, or the data are displayed incorrectly.
Additional Notes: Please disregard incorrectly displayed scanners located far away from the event where the data were collected.

Test Case Nr: 03
Start Date:
End Date: 23/05/12
Written by: Lukasz Dynowski
Approved by: Marcos Fuentes
Description: SQLite database testing (SD card testing)
Instruction: Run the application with the attached library in the following cases:
  1. The SD card is inserted in the smartphone.
  2. There is no SD card in the smartphone.
  3. The SD card is suddenly removed from the smartphone.
Expected Results:
  Positive:
    1. The database is correctly created and the data are saved.
    2. The warning "there is no SD Card in phone" is shown to the user. Data should be written to the internal mobile memory (RAM).
    3. Data should be saved on the SD card until the moment the card is removed. Removing the card should not kill the running application. Data should then be stored in the internal memory (RAM).
  Negative:
    2. It is not possible to write the data to the internal memory. The application crashes.
    3. The application crashes. Data are not saved anywhere.
Additional Notes:

Test Case Nr: 04
Start Date: 23/05/12
End Date:
Written by: Lukasz Dynowski
Approved by: Marcos Fuentes
Description: Battery test (average usage of the mobile device).
Instruction: Run the application with the attached library. Set the interval of Bluetooth scanning and GPS location to each of two values:
  1. 7 min
  2. 20 min
Compare the results with the predicted calculations.
Expected Results:
  Positive: The results of the experiment are similar to the calculated ones.
  Negative: There is a large discrepancy between the measured and the calculated results.
Additional Notes:

Test Case Nr: 05
Start Date: 23/05/12
End Date:
Written by: Lukasz Dynowski
Approved by: Marcos Fuentes
Description: Method test (server side)
Instruction: Open the available PHP files located in BTScannerService:
  - Test SQLite Parser (sqlieteParser.php).
  - Test receiver (recivePackage.php).
  - Test DB (database.php).
  - Test Post (checkPost.php).
  - Test SQLite (appendSQLite.php).
Perform queries on the SQL database.
Expected Results:
  Positive: The functions do not cause exceptions or errors.
  Negative: A function throws exceptions or errors.
Additional Notes:

Test Case Nr: 06
Start Date: 23/05/12
End Date:
Written by: Lukasz Dynowski
Approved by: Marcos Fuentes
Description: Performance test
Instruction: Run the FrozenBubbles application and monitor the CPU, memory, and behavior. Run the same application with the attached library and monitor the same parameters. Compare the differences.
Expected Results:
  Positive: The application does not cause memory leaks or any failures.
  Negative: Memory and CPU usage increase significantly, slowing down the application or the entire system.
Additional Notes:

Test Case Nr: 07
Start Date: 23/05/12
End Date:
Written by: Lukasz Dynowski
Approved by: Marcos Fuentes
Description: Internet connection test
Instruction: Run the application with the attached library. Cause the application to upload a package to the server. Interrupt the upload.
Expected Results:
  Positive: The package is not uploaded in the interrupted attempt. It is saved on the SD card until it is uploaded again.
  Negative: The interruption causes a malfunction of the application. The package is lost.
Additional Notes:
