Design and Large-Scale Evaluation of WiFi Proximity Metrics

Aymen Fakhreddine, IMDEA Networks Institute & University Carlos III of Madrid, Madrid, Spain
Nils Ole Tippenhauer, Singapore University of Technology and Design, Singapore
Domenico Giustiniano, IMDEA Networks Institute, Madrid, Spain

Abstract: We study the problem of deriving proximity metrics based on WiFi fingerprints without the need for external sensors or access to the locations of APs. Applications that benefit from proximity metrics are movement estimation of a single node over time, WiFi fingerprint matching for localization systems, and attacks on privacy. Using a large-scale, real-world WiFi fingerprint data set consisting of 200,000 fingerprints resulting from a large deployment of wearable WiFi sensors, we show that metrics from related work perform poorly on real-world data. We analyze the cause of this poor performance, and show that imperfect observations of APs in the neighborhood are the root cause. We then propose improved metrics to provide such proximity estimates, without requiring knowledge of the location of the observed APs. Our metrics allow deriving a relative distance estimate based on two observed WiFi fingerprints. We demonstrate that their performance is superior to that of the related work metrics.

I. INTRODUCTION

In recent years, mobile smart devices have become ubiquitous, e.g., smart phones, personal health devices, smart watches, and other smart home appliances. Low cost requirements and integration with legacy networks have led to wide adoption of IEEE 802.11-based wireless communication (WiFi). Many such devices, and devices that are part of the so-called Internet-of-Things (IoT), interact with their physical environment and physically close communication partners either directly or via cloud platforms.

To determine the physical locations of such devices, WiFi-localization based approaches have become widespread. Typically, the mobile device collects a wireless fingerprint at its current location. This consists of a set of MAC addresses of nearby Access Points (APs) and their received signal strength indicator (RSSI). This fingerprint data is sent to a third-party cloud service (such as Google, Apple, Skyhook, etc.) that estimates the device's location. The motivation for using a third-party cloud service is that fingerprint data is noisy, and it requires a large set of measurements from a large number of users and access to a large database of positions of APs in order to provide a high-to-medium accuracy level.

In this work, we look at the problem of WiFi proximity metrics in positioning systems, i.e., metrics that allow estimating the spatial correlation or physical distance between two WiFi fingerprints. Such metrics can be valuable in a number of scenarios:
• Measurement sanitization for user-assisted WiFi localization database building [1], [2];
• Fingerprint ambiguity estimation in an unknown environment [3];
• Detection of co-location for mobile applications on mobile devices (as an attack on users' privacy, see [4] for Bluetooth).

There are three main challenges to address to effectively solve the problem of WiFi proximity metrics.

Challenge 1: Design of proximity metrics without using any external sensors. To the best of our knowledge, metrics that do not require any external sensors have received limited attention in related work. Yet, the availability of a solution in this space would allow widespread adoption on any WiFi transceiver, from low-end IoT devices to more powerful smartphones.

Challenge 2: Metrics that can cope with probabilistic observations. Fingerprints might not contain information on all APs in range. There are two reasons:
• storage constraints might limit the number of APs stored for each fingerprint;
• collisions, time-outs, and channel hopping during WiFi scanning can prevent deterministic observations of nearby APs.

Storage of WiFi fingerprints can require a relatively large amount of memory if the MAC address (6 Bytes) and an RSSI value (1 Byte) are stored for each AP. Given the growing density of WiFi APs, single scans for neighboring APs can easily return 30 or more results. Storing an unlimited number of APs per fingerprint can thus violate memory constraints.

Challenge 3: Metrics that do not have access to large databases of known locations of APs. In this work, we are interested in metrics that do not require prior knowledge of the locations of APs in the environment. In particular, large-scale records of AP locations are proprietary data owned by a few companies, and not accessible to the public.

Finally, related work lacks a large-scale, real-world evaluation of proximity metrics, which is needed to assess the soundness of the proposed solution.

Our main contributions are listed in what follows:
• We perform an extensive evaluation of a novel Jaccard Index-based metric and two prior work metrics over two main datasets: i) an artificial dataset based on simplified propagation models and perfect knowledge of the true locations, and ii) a large-scale, real-world WiFi fingerprint data set consisting of 200,000 fingerprints resulting from a large deployment of wearable WiFi sensors [5].
• We identify key drawbacks of all three metrics using our real-world dataset, analyze the causes and propose a model to explain the issue. Further analysis of our dataset confirms that our model closely matches observed data.
• Based on our error model, we propose three improved distance metric definitions. Our proposed metrics only require two WiFi fingerprint readings, and enable mobile devices to compute the results i) without continuous requests to the third-party cloud service, ii) without disclosing the location to the cloud service and to neighbor nodes, and iii) with limited requirements of local storage and low computational and implementation complexity.

This work is structured as follows. In Section II, we introduce the system model, metrics from related work and the datasets used (real and simulated data). In Section III, we introduce the detailed problem, our first attempts at addressing it, and investigations into the mismatch between performance on simulated data versus real data. To address the performance drop with real data, we introduce improved metrics in Section V and show that they perform well in both settings. We conclude in Section VI.

II. BACKGROUND

A. System Model and WiFi Fingerprints

In this work, we focus on IEEE 802.11b (which we will refer to as WiFi), but general principles hold for other wireless standards as well. We consider a node with a wireless transceiver that is capable of observing the presence of WiFi APs and their RSSI values. To simplify the discussion, we assume that the infrastructure is static, and the node is moving over time. We also assume that omnidirectional antennas are used. The density of APs in range can vary from 0 to more than 40.

The node gathers measurements of WiFi fingerprints over time. Then, the node uses the proximity metric to find an estimate of the moved distance for consecutive fingerprints.

We use the following notation in this work to refer to WiFi fingerprints. We define $n$ as our node of interest, a set $\mathcal{A}$ of all APs in the target area, and $\mathcal{N} \subset \mathcal{A}$ as the subset of nearby APs within $\mathcal{A}$ that are in range of node $n$. We use $|\mathcal{N}|$ to indicate the cardinality of a set, i.e., the count of distinct items in the set. For integers, $|x|$ is the absolute value of $x$.

Let $o_i = (RSSI_i, MAC_i)$ denote the AP observation of $n$ for AP $i$.

[...] present them only for the sake of comparing them to the ones we design in this work.

MetricE (Euclidean Distance): In [6], the authors discuss RSSI proximity metrics. They propose to compute the Euclidean distance between the RSSI vectors $\{RSSI_{a_i}\}_{i \in \mathcal{N}_a \cap \mathcal{N}_b}$ and $\{RSSI_{b_i}\}_{i \in \mathcal{N}_a \cap \mathcal{N}_b}$ from the set of APs $\mathcal{N}_a \cap \mathcal{N}_b$ present in both fingerprints $F_a$ and $F_b$:

$$m(F_a, F_b) = \sqrt{\sum_{i \in \mathcal{N}_a \cap \mathcal{N}_b} \left( RSSI_{a_i} - RSSI_{b_i} \right)^2} \tag{1}$$

MetricM (The "Manhattan" distance): In [6], the authors also propose to use the Manhattan distance, which is the sum of the absolute differences instead of the Euclidean one:

$$m(F_a, F_b) = \sum_{i \in \mathcal{N}_a \cap \mathcal{N}_b} \left| RSSI_{a_i} - RSSI_{b_i} \right| \tag{2}$$

C. NSE Dataset

Our evaluation uses a large-scale real-world dataset, collected as part of the National Science Experiment (NSE) project in Singapore. We now briefly introduce the SENSg devices that are used to gather the dataset.

SENSg sensors: A total of 50,000 devices are produced and used by students at schools. The students wear the devices for a week or longer, and collect data about their daily life. The data is automatically uploaded to a cloud platform, and made available to the students to analyze. The devices are called SENSg [5], and they record WiFi fingerprints (as defined in this work) every 12 seconds. Using a third-party API, the WiFi fingerprints are mapped to location estimates after the data is uploaded to the cloud. The SENSg devices store only the 20 APs with the highest RSSI per fingerprint. Storing the MAC address and received signal strength for each AP requires 7 Bytes, so a fingerprint with 20 observed APs occupies 140 Bytes.

Measurements and data gathering: The real-world datasets used in this work are gathered by students during the NSE project. In particular, the dataset used in this work is a subset of all fingerprints taken. For all fingerprints used, we also have a location estimate from the third-party API. The location provided by the third-party API is subject to accuracy errors, as in typical cloud-based location-based systems.

D. Simulation setup

In addition to the real-world dataset, we generate an artificial dataset with 200,000 fingerprints, as in the real measurement data. The fingerprints are randomly distributed in an area [...]
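The two related-work metrics in Equations (1) and (2), and the SENSg rule of keeping only the 20 strongest APs per fingerprint, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: representing a fingerprint as a dictionary from MAC address to RSSI (in dBm) is an assumption, and the names `metric_e`, `metric_m`, and `keep_strongest`, as well as the example MAC addresses and RSSI values, are hypothetical.

```python
import math

# Assumption: a fingerprint is a dict mapping an AP's MAC address (str)
# to its observed RSSI in dBm (int). This layout is illustrative only.


def metric_e(fa, fb):
    """Euclidean distance over the APs present in both fingerprints (Eq. 1)."""
    shared = fa.keys() & fb.keys()
    return math.sqrt(sum((fa[mac] - fb[mac]) ** 2 for mac in shared))


def metric_m(fa, fb):
    """Manhattan distance over the APs present in both fingerprints (Eq. 2)."""
    shared = fa.keys() & fb.keys()
    return sum(abs(fa[mac] - fb[mac]) for mac in shared)


def keep_strongest(fp, k=20):
    """Keep only the k APs with the highest RSSI, as the SENSg devices do."""
    strongest = sorted(fp.items(), key=lambda item: item[1], reverse=True)[:k]
    return dict(strongest)


# Hypothetical example fingerprints sharing two of their three APs.
fa = {"aa:aa:aa:aa:aa:01": -40, "aa:aa:aa:aa:aa:02": -70, "aa:aa:aa:aa:aa:03": -80}
fb = {"aa:aa:aa:aa:aa:01": -43, "aa:aa:aa:aa:aa:02": -66, "aa:aa:aa:aa:aa:04": -75}

print(metric_e(fa, fb))  # 5.0  (sqrt(3^2 + 4^2))
print(metric_m(fa, fb))  # 7    (3 + 4)
```

Note that in this sketch both functions return 0 when the two fingerprints share no AP, because the sums run over an empty intersection. A caller would need to treat that case separately, since a zero distance would otherwise be indistinguishable from two identical fingerprints.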
