Opportunities and Challenges in Crowdsourced Wardriving
Total Page:16
File Type:pdf, Size:1020Kb
Opportunities and Challenges in Crowdsourced Wardriving Piotr Sapiezynski Radu Gatej Alan Mislove Sune Lehmann Technical University University Northeastern Technical University of Denmark of Copenhagen University of Denmark ABSTRACT even as a form of authentication [10]. Determining a mo- Knowing the physical location of a mobile device is crucial bile device’s location is typically accomplished in one of two for a number of context-aware applications. This informa- ways: First, mobile devices can use various satellite-based tion is usually obtained using the Global Positioning System systems (GPS, Galileo, or GLONASS). While most mobile (GPS), or by calculating the position based on proximity of devices today ship with dedicated GPS hardware, relying on WiFi access points with known location (where the posi- GPS alone for determining location has a number of down- tion of the access points is stored in a database at a central sides: obtaining an initial GPS fix introduces non-negligible server). To date, most of the research regarding the cre- delay, and causes significant power consumption. ation of such a database has investigated datasets collected Second, mobile devices can use WiFi localization. In brief, both artificially and over short periods of time (e.g., dur- WiFi localization works by having the mobile device listen ing a one-day drive around a city). In contrast, most in-use for advertised WiFi networks (each WiFi access point peri- databases are collected by mobile devices automatically, and odically announces its unique identifier or BSSID,aswellas are maintained by large mobile OS providers. the name of the network, referred to as SSID), and report As a result, the research community has a poor under- that list to a central server. The server then computes the standing of the challenges in creating and using large-scale most likely location of the mobile device and returns the re- WiFi localization databases. We address this situation using sult. Thus, for WiFi localization to be e↵ective, the server the deployment of over 800 mobile devices to real users over must have a pre-computed database of WiFi access points a 1.5 year period. Each device periodically records WiFi (APs) and their locations. Unfortunately, building such a scans and its GPS coordinates, reporting the collected data database is time-consuming and expensive: the database to us. We identify a number of challenges in using such must be comprehensive (covering many locations) and up- data to build a WiFi localization database (e.g., mobility of to-date (as new APs are deployed and existing ones move). access points), and introduce techniques to mitigate them. Originally, the aim of such databases was to enable in- We also explore the level of coverage needed to accurately door positioning through finger-printing [3, 9, 20] and later estimate a user’s location, showing that only a small subset through RF-modeling [15, 5]. Most recent work on indoor lo- of the database is needed to achieve high accuracy. calization achieves sub-meter accuracy by rotating the sens- ing device to simulate directional antennas [14]. As the APs became more wide spread it became possible to use them for Categories and Subject Descriptors outdoor localization as well. The databases were then cre- H.4 [Information Systems Applications]: Miscellaneous ated by manually going to di↵erent locations and recording the observed APs (often termed wardriving)[4,18,7,11]. Keywords Today, however, these databases are often built by having dedicated software on the mobile devices collect and report wifi; wardriving; mobility; location data back both in indoor [21, 27] and outdoor [2, 19, 26] contexts. Therefore, creating such a database at scale is typ- 1. INTRODUCTION ically only the domain of mobile OS providers (e.g., Apple, Localization is an increasingly important trend on mo- Google) or dedicated companies (e.g., Skyhook Wireless). bile devices today. Mobile applications use localization to As a result, the research community currently has a rel- provide users with accurate driving directions, recommen- atively poor understanding of large-scale WiFi localization dations for local points of interest (e.g., restaurants), and databases. In this paper, we address this situation by pro- viding insights into the challenges underlying the creation of Permission to make digital or hard copies of all or part of this work for personal or such a database, and the trade-o↵s in using them. We first classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation collect a data set based on a deployment of over 800 mobile on the first page. Copyrights for components of this work owned by others than the phones to students at a university in Copenhagen, Denmark author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or for over 1.5 years. These phones run a stock Android OS republish, to post on servers or to redistribute to lists, requires prior specific permission with custom collection software instrumented to gather GPS and/or a fee. Request permissions from [email protected]. location and overheard WiFi APs. IMC’15, October 28–30, 2015, Tokyo, Japan. Overall, we collect over 1.8M simultaneous measurements Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-3848-6/15/10 ...$15.00. of WiFi APs and GPS location, and observe more than 1.3M DOI: http://dx.doi.org/10.1145/2815675.2815711. unique WiFi APs. Many of the APs are only seen a small The app collects data both actively (it requests location number of times, so we focus on the 376K APs that we and WiFi updates every 5 minutes) and opportunistically observe at least five times. To the best of our knowledge, (whenever another app requests updates). In order to save this represents the most comprehensive data set of this kind the battery, most of the location data is obtained using the that has been examined in the research literature. Using this network and/or fused provider (i.e., an existing WiFi local- data set, we build a WiFi localization database for Copen- ization database). Since we intend to use the GPS mea- hagen. We discuss and identify a number of key challenges surements as ground truth, we focus only on the 10.5% of and issues in doing so: location readings that are provided by the GPS hardware. As a consequence, while the median sampling period be- The scale of the dataset. Most existing studies were tween GPS readings is 1 second, only 29% of per-user hourly performed either in controlled environments or over a short bins have at least one GPS sample (i.e., we only know the time. Here, we show that the WiFi landscape is constantly GPS location of users in 29% of the hours, on average). This changing, new access points are added and old ones are distribution is a consequence of apps like Google Maps that moved to new locations or retired. either use GPS data constantly or not at all. Mobility. With increasing trend of mobile WiFi APs, such Since we are studying WiFi localization databases, in the remainder of the paper we focus on the 1,794,473 GPS sam- as MiFi devices, routers on buses and trains, and mobile 1 phones which also serve as hotspots, we observe that discov- ples which happened within the same second as a WiFi ering and filtering mobile APs presents a significant chal- access point scan. According to our measurements, a single lenge. Failing to properly filter these can lead to gross errors WiFi scan lasts approximately 500 ms and this time does when estimating a device’s location. not depend on the number of saved networks. It is important to note that the securing the wireless net- Noisy data. Unsurprisingly, relying on commodity hard- work does not make it impossible to scan it: regardless of ware introduces noise into the measurements of location, the encryption, each router broadcasts its unique identifier signal strength, and detectability of APs, which must be and the name of the network in clear text.2 handled when inferring the location and mobility of APs. Filtering data. In the 567 days of observations, our partici- We also explore using the database we build to estimate pants observed 7,203,471 unique APs, out of which 1,320,838 the locations of devices given a set of overheard APs. Specif- (18.3%) were scanned at least once in the same second as ically, we examine the trade o↵ between the number of APs a GPS estimation. However, the majority of these APs in the database and the estimation accuracy. We show that were observed with a GPS estimation a very small number knowing the location of only a small fraction of all the APs of times: 944,904 (71.5%) have less than five observations. (3.7%) is actually needed to locate users to within 15 meters Thus, in the remainder of the paper, we focus only on the 75% of time. 375,934 APs that were observed at least five times together with a GPS estimate in the same second to build our WiFi localization database. 2. METHODS We now describe the data we use to build our WiFi local- 3. BUILDING THE DATABASE ization database. We now examine the collected data, with the goal of build- ing a WiFi localization database. Phone deployment. We use data collected by the Copen- hagen Networks Study experiment [23]. In this experiment, 3.1 Estimating the locations of APs students opt-in to receive a smartphone in exchange for The primary challenge we face is estimating the positions agreeing to let us use to collect data (e.g., Bluetooth and of the APs, given our WiFi scan data.