Smartphone Fingerprinting Via Motion Sensors: Analyzing Feasibility at Large-Scale and Studying Real Usage Patterns

Anupam Das (University of Illinois at Urbana-Champaign) — [email protected]
Nikita Borisov (University of Illinois at Urbana-Champaign) — [email protected]
Edward Chou (University of Illinois at Urbana-Champaign) — [email protected]
Muhammad Haris Mughees (Hong Kong University of Science and Technology) — [email protected]

ABSTRACT

Advertisers are increasingly turning to fingerprinting techniques to track users across the web. As web browsing activity shifts to mobile platforms, traditional browser fingerprinting techniques become less effective; however, device fingerprinting using built-in sensors offers a new avenue for attack. We study the feasibility of using motion sensors to perform device fingerprinting at scale, and explore countermeasures that can be used to protect privacy.

We perform a large-scale user study to demonstrate that motion sensor fingerprinting is effective with 500 users. We also develop a model to estimate prediction accuracy for larger user populations; our model provides a conservative estimate of at least 12% classification accuracy with 100 000 users. We then investigate the use of motion sensors on the web and find, distressingly, that many sites send motion sensor data to servers for storage and analysis, paving the way to potential fingerprinting. Finally, we consider the problem of developing fingerprinting countermeasures; we evaluate a previously proposed obfuscation technique and a newly developed quantization technique via a user study. We find that both techniques are able to drastically reduce fingerprinting accuracy without significantly impacting the utility of the sensors in web applications.

Keywords
Fingerprinting; Motion Sensors; Privacy-Utility tradeoff

arXiv:1605.08763v2 [cs.CY] 13 Jun 2016

1. INTRODUCTION

We are in the middle of a war over user privacy on the web. After the failure of the "Do Not Track" proposal, users have increasingly started using tools such as ad- and tracker-blocking extensions, as well as private browsing modes, to protect their privacy. In turn, advertisers have started using browser fingerprinting [20, 32, 43] to track users across the web without the use of cookies. As the battleground shifts to mobile platforms, which are quickly becoming the dominant mode for web browsing [1, 10, 14, 18], existing fingerprinting techniques become less effective [35, 51]; at the same time, new threats emerge: mobile browsers give web pages access to internal motion sensors (accelerometers and gyroscopes), and researchers have shown that imperfections in these sensors can be used to fingerprint devices [29, 31, 35], boosting the accuracy of a weakened browser fingerprint.

An important question not addressed by prior work is whether such fingerprinting can be effective at scale, as state-of-the-art techniques [29] have only been evaluated on a set of 100 devices. We first perform a larger-scale evaluation of the methods, collecting motion sensor data from a total of 610 devices, and show that high-accuracy classification is still feasible. We then use the data we collected to develop a model to predict classification accuracy for larger data sets, by fitting parametric distributions to model inter- and intra-cluster distances. We can then use these distributions to predict the accuracy of a k-NN classifier, used with state-of-the-art distance metric learning techniques; our evaluation shows that even with 100 000 devices, 12–16% accuracy can be achieved, depending on training set size, which suggests that motion sensor fingerprinting can be effective when combined with even a weak browser fingerprint. Note that because k-NN underperforms other classifiers, such as bagged trees, our estimate of accuracy is quite conservative.

A second question we wanted to answer was: how are motion sensors used on the web? We analyzed the static and dynamic scripts used by the Alexa top-100K websites [5] and identified over 1 000 instances of motion sensor access. After clustering, we were able to identify a number of common uses of motion sensors, including orientation detection and random number generation. More distressingly, we noted that a large fraction of scripts send motion data back to a server, while others use the presence of motion sensors in advertising decisions. Thus, although we have not been able to identify cases of motion sensor fingerprinting in the wild, the infrastructure for collecting and analyzing this data is already there in some cases.

These results suggest that motion sensor fingerprinting is a realistic privacy threat. We therefore wanted to understand the feasibility of defending against fingerprinting through browser- or OS-based techniques. Although several defenses have been proposed to mitigate motion sensor fingerprinting, they reduce the potential utility of the sensor data by adding noise or other transforms. We wanted to understand how this trade-off between privacy and utility plays out for the likely uses of the device motion API. To do this, we implement a game that uses motion sensors for controls—a relatively demanding application. We then carry out a user study to investigate the impact of privacy protections on the game difficulty. We evaluate an obfuscation method proposed by Das et al. [29] and develop a new quantization-based protection method. Encouragingly, we find that neither method creates a statistically significant impact on motion sensor utility, as measured by both subjective and objective measures. This suggests that user privacy may be preserved without sacrificing much utility.

In summary, we make the following contributions:

• We evaluate the state-of-the-art fingerprinting techniques by Das et al. [29] on a large data set of 610 devices. (§4.1)
• We develop a model for predicting how the k-NN classifier will perform on larger data sets and use it to obtain a conservative estimate of fingerprinting accuracy for up to 100 000 devices. (§4.2)
• We perform a measurement study to evaluate how motion sensor information is used by existing websites. We identify several common uses for motion sensor data and find that motion data is frequently sent to servers. (§5)
• We develop a new fingerprinting countermeasure that uses quantization of data in polar coordinates. (§6.1)
• We carry out a user study to evaluate the impact of our countermeasure, as well as the obfuscation technique proposed by Das et al. [29], on the utility of motion sensors, and find that users experience no significant ill effects from the countermeasures. (§6.3)

Roadmap. The remainder of this paper is organized as follows. We present background information and related work in Section 2. In Section 3, we briefly describe our data collection and feature extraction process along with the classification algorithms and metrics used in our evaluation. Section 4 describes how we extrapolate fingerprinting accuracy at large scale by deriving intra- and inter-class distance distributions. We present our measurement study on how top websites access motion sensors in Section 5. Section 6 evaluates the usability impact of fingerprinting countermeasures through a large-scale online user study. Finally, we conclude in Section 7.

2. RELATED WORK

Fingerprinting devices has been an interesting research area for many decades. It all started with a rich body of research that looked at fingerprinting wireless devices by analyzing the spectral characteristics of wireless transmitters [40, 47, 48]. Researchers then moved on to fingerprinting computers by exploiting their clock skew rate [42]. Later on, as computers got connected to the Internet, researchers were able to exploit such skew rates to distinguish connected devices through TCP and ICMP timestamps [36]. Installed software has also been used to track devices, as different devices usually have a different software base installed. Researchers have utilized such a strategy to uniquely distinguish subtle differences in firmwares and device drivers [33]. Moreover, there are open-source toolkits like Nmap [41] and Xprobe [54] that can fingerprint the underlying operating system remotely by analyzing responses from the TCP/IP stack. The latest trend in fingerprinting devices is through the web browser. We will now describe some of the most recent and well-known results in this field.

Browser Fingerprinting.
The primary application of browser fingerprinting is to uniquely track a user across multiple websites for advertisement purposes. Traditionally this has been done through the injection of cookies. However, privacy concerns have pushed browser developers to provide ways to clear cookies, and also to offer private browsing modes that do not store long-term cookies. This has forced publishers to come up with new ways to uniquely identify and track users. The Panopticlick project was one of the first works to look into exploiting easily accessible browser properties, such as installed fonts and plug-ins, to fingerprint browsers [32]. In recent years, researchers have come up with a more advanced technique that uses HTML5 canvas elements to fingerprint the fonts and rendering engines used by the browser [43]. Moreover, users can be profiled and tracked by their browsing history [46]. Many studies have shown that all of these techniques are actually used in the wild [20, 21, 44]. Researchers have also looked at countermeasures that typically disable or limit the ability of a web publisher to probe particular browser characteristics. Privaricator [45] is one such approach that adds noise to the fingerprint to break linkability across multiple visits.

With the rapid growth of smart devices, researchers are now focusing on adapting existing fingerprinting techniques to the context of smart devices. Like cookies, app developers have looked at using device IDs such as the Unique Device Identifier (UDID) or the International Mobile Station Equipment Identity (IMEI) to track users across multiple applications. However, Apple ceased the use of UDID since iOS 6 [4], and on Android accessing the IMEI requires explicit user permission [3]. Moreover, due to the constrained hardware and software environment, existing methods often lack precision for smartphones, and recent studies have confirmed this [35, 51]. However, this year Laperdrix et al. have shown that it is in fact possible to fingerprint smartphones effectively through the user-agent string, which is becoming richer every day due to the numerous vendors with their different firmware versions [39]. Others have looked at fingerprinting smartphones by exploiting personal configuration settings, which are often accessible to third-party apps [38].

Sensor Fingerprinting.
Today's smartphones come with a wide range of sensors, all of which provide different useful functionality. However, such sensors can also provide side channels that can be exploited by an adversary to uniquely fingerprint smartphones. Recent studies have looked at exploiting microphones and speakers to fingerprint smartphones [25, 27, 55]. Others have looked at utilizing motion sensors like the accelerometer to uniquely distinguish smartphones [25, 31]. Most recently, Das et al. have shown that they can improve fingerprinting accuracy by combining the gyroscope with inaudible sound [29]. Our approach builds on the work done by Das et al., but provides a real-world perspective on the problem. We not only show that sensor-based fingerprinting works at large scale, but also show how websites access sensor data in the wild. Moreover, we provide a new countermeasure technique in which we quantize sensor data to lower the resolution of the sensor, and we perform a large-scale user study, in which users play an online game, to show that our countermeasure does not affect the utility of the sensors.

3. FEATURES AND EVALUATION METRICS

In this section we briefly describe the data collection and data preprocessing steps. We also discuss the classification algorithms and evaluation metrics used in our evaluation.

3.1 Data Collection

To collect sensor data from smartphones we develop a web page.1 The web page contains JavaScript to access motion sensors like the accelerometer and gyroscope. We create an event listener for device motion in the following manner:

    window.addEventListener('devicemotion', motionHandler)

Once the event listener is registered, the motionHandler function can access accelerometer and gyroscope data in the following manner:

    function motionHandler(event) {
        // Access accelerometer data
        ax = event.accelerationIncludingGravity.x;
        ay = event.accelerationIncludingGravity.y;
        az = event.accelerationIncludingGravity.z;
        // Access gyroscope data
        rR = event.rotationRate;
        if (rR != null) {
            gx = rR.alpha;
            gy = rR.beta;
            gz = rR.gamma;
        }
    }

1 http://datarepo.cs.illinois.edu/MTurkExp.html. We obtained IRB approval for collecting sensor data.

3.2 Processed Data Streams

We process the accelerometer and gyroscope data into four time-series data streams, similar to the way Das et al. do in their paper [29]. At any given timestamp t, we have the following two data vectors: 1) acceleration including gravity, a(t) = (ax, ay, az), and 2) rotational rate, ω(t) = (ωx, ωy, ωz). The accelerometer value includes gravity, i.e., whenever the device lies stationary flat on top of a surface we get a value of 9.81 m/s² along the z-axis. To make the fingerprinting technique independent of device orientation we take the magnitude of the acceleration vector, |a(t)| = sqrt(ax² + ay² + az²), as one of our processed data streams. For the gyroscope, since there is no baseline rotational speed (i.e., irrespective of device orientation, a stationary device should register a rotation rate of 0 rad/s along all three axes), we consider each axis as a separate source of data stream.
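The orientation-independent magnitude computation described above can be sketched in a few lines. This is a minimal sketch; the function and variable names are ours, not the paper's, and the commented-out listener shows how samples would be gathered in a browser.

```javascript
// Sketch: derive the orientation-independent acceleration magnitude
// |a(t)| = sqrt(ax^2 + ay^2 + az^2) from an accelerationIncludingGravity
// reading, as described in Section 3.2. Names are illustrative.
function accelMagnitude(accel) {
  return Math.sqrt(accel.x * accel.x + accel.y * accel.y + accel.z * accel.z);
}

// In a browser, samples would be appended from the devicemotion handler:
// const samples = [];
// window.addEventListener('devicemotion', (event) => {
//   const a = event.accelerationIncludingGravity;
//   samples.push({ t: event.timeStamp, mag: accelMagnitude(a) });
// });

// A stationary device should yield roughly 9.81 m/s^2 regardless of
// which axis gravity falls on:
accelMagnitude({ x: 0, y: 0, z: 9.81 }); // ~9.81
accelMagnitude({ x: 9.81, y: 0, z: 0 }); // ~9.81
```

Because the magnitude is invariant under rotations of the device frame, the same fingerprint features can be extracted whether the phone lies flat, on its side, or at an angle.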
Thus, we end up with the following four streams of sensor data: {|a(t)|, ωx(t), ωy(t), ωz(t)}. To obtain frequency-domain characteristics, we interpolate the non-equally-spaced data stream into an equally-spaced time series using cubic-spline interpolation.

Users are asked to visit our web page while placing their smartphone on a flat surface, thus mimicking the scenario where the user has placed his/her smartphone on a desk while browsing a web page. Our web page collects 10 samples consecutively, where each sample is 5 seconds' worth of sensor data (total participation time is in the range of 1 minute). We found that popular mobile web browsers such as Chrome, Safari and Opera all have a similar sampling rate in the range of 100–120 Hz (Firefox provided a sampling rate close to 180 Hz).2 However, the sample rate available at any instance of time depends on multiple factors, such as the current battery life and the number of applications running in the background. As a result, we received data from participants at various sampling rates ranging from 20 Hz to 120 Hz. Initially, we recruited users through university mass email and social media like Facebook and Twitter. Later on, we recruited participants through Amazon Mechanical Turk [2]. In total, we had 610 participants over a period of three months. We obtained data from 108 different brands (i.e., make and model) of smartphones, with different models of iPhone comprising nearly half of the total devices.3 Figure 1 shows the distribution of the number of samples per device. Since participation was voluntary for users not recruited through Mechanical Turk, many users provided fewer than 10 samples.

[Figure 1: Distribution of the number of data samples per smartphone. Roughly 65.6% of devices provided all 10 samples; each of the remaining counts (1–9 samples) accounts for about 2.5–7.2% of devices.]

3.3 Features

Inspired by the most recent work in this field by Das et al. [29], we extract the same set of 25 features from each data stream. We obtained the feature extraction code base from Das et al. [29]. Out of these 25 features, 10 are temporal features and the remaining 15 are spectral features.5 As we have four data streams, we have a total of 100 features to summarize the unique characteristics of the motion sensors.

3.4 Classification Algorithms and Metrics

Classification Algorithms: Following the approach of Das et al., we use a supervised multi-class classifier. For any supervised algorithm we need to split our data set into a training set and a testing set. The training set (labeled with true device identity) is used to train the classifier, while the testing set is used to evaluate how well we can classify unseen data points. In this paper we explore the performance of the following two classifiers: k-Nearest Neighbor (k-NN) and Random Forest (MATLAB's Treebagger model) [17], the latter having been found by Das et al. to achieve the best classification performance.

Evaluation metrics: For our evaluation metric we use the well-known classification metric F-score [50]. F-score is the harmonic mean of precision and recall. To compute precision and recall we first compute the true positives (TP) for each class, i.e., the number of traces that are correctly classified. Similarly, we compute the false positives (FP) and false negatives (FN) as the number of wrongly accepted and wrongly rejected traces, respectively, for each class i (1 ≤ i ≤ n). We then compute precision, recall, and F-score for each class using the following equations:

    Precision, Pr_i = TP_i / (TP_i + FP_i)                    (1)
    Recall,    Re_i = TP_i / (TP_i + FN_i)                    (2)
    F-Score,   F_i  = (2 × Pr_i × Re_i) / (Pr_i + Re_i)       (3)
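The per-class metrics of Equations (1)–(3) can be computed directly from arrays of true and predicted labels. The sketch below is illustrative (function and variable names are ours):

```javascript
// Sketch: per-class precision, recall, and F-score (Equations 1-3),
// computed from parallel arrays of true and predicted labels.
// TP: predicted cls and truly cls; FP: predicted cls but truly other;
// FN: truly cls but predicted other.
function perClassF(trueLabels, predLabels, cls) {
  let tp = 0, fp = 0, fn = 0;
  for (let i = 0; i < trueLabels.length; i++) {
    if (predLabels[i] === cls && trueLabels[i] === cls) tp++;
    else if (predLabels[i] === cls && trueLabels[i] !== cls) fp++;
    else if (predLabels[i] !== cls && trueLabels[i] === cls) fn++;
  }
  const pr = tp / (tp + fp);
  const re = tp / (tp + fn);
  return { pr, re, f: (2 * pr * re) / (pr + re) };
}

// Example: for class 'b', two correct hits and one false alarm give
// precision 2/3, recall 1, and F-score 0.8.
perClassF(['a', 'a', 'b', 'b'], ['a', 'b', 'b', 'b'], 'b');
```

The harmonic mean penalizes imbalance: a classifier with perfect recall but poor precision (or vice versa) still receives a low F-score.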

For the purpose of labeling our data, we plant a unique random number inside a cookie. This provides us with ground truth data, making it possible to correlate data samples coming from the same physical device.4

2 http://datarepo.cs.illinois.edu/SamplingFreq.html
3 We used https://web.wurfl.io/#learnmore to obtain the make and model of a smartphone.
4 It is possible that users cleared this cookie, but we do not expect this to happen with enough frequency to significantly affect our data.
5 A detailed description of each feature is available in the technical report provided by Das et al. [28].

To obtain the overall performance of the system, we compute average values across all classes in the following way:

    Avg. Precision, AvgPr = (Σ_{i=1}^{n} Pr_i) / n            (4)
    Avg. Recall,    AvgRe = (Σ_{i=1}^{n} Re_i) / n            (5)
    Avg. F-Score,   AvgF  = (2 × AvgPr × AvgRe) / (AvgPr + AvgRe)   (6)

To evaluate our large-scale simulation results we use Accuracy as our evaluation metric.6 Accuracy is defined as the portion of test traces that are correctly classified:

    Accuracy, Acc = (# of samples correctly classified) / (Total test samples)   (7)

4. FINGERPRINTING SMARTPHONES

In this section we will first look at how well we can fingerprint our participating smartphones. Next, we will discuss how we can expand our results to simulate experiments with a large number of smartphones. Finally, we will provide simulation results for a large number of smartphones.

4.1 Results From Participating Smartphones

We had a total of 610 participants in our data collection study. To evaluate the performance of the classifiers, we first split our data set into a training set and a testing set. As we have devices with different numbers of data samples (see Figure 1), we evaluate F-score for different training set sizes. We randomly choose the training and testing samples. To prevent any bias in the selection of the training and testing sets, we rerun our experiments 10 times and report the average F-score.7 Table 1 summarizes the average F-score for different numbers of training samples per device.

Table 1: Average F-score for different training set sizes
(Random Forest with 100 bagged decision trees)

    Training samples per device | Number of devices | Avg. F-score (%)
    1                           | 586               | 33
    2                           | 567               | 65
    3                           | 545               | 78
    4                           | 524               | 83
    5                           | 501               | 86
    6                           | 483               | 88
    7                           | 468               | 89
    8                           | 444               | 89
    9                           | 400               | 90

From Table 1 we see that we can achieve high classification accuracy even for this larger data set. With five training samples, which correspond to about 25 s of data, accuracy is 86%, increasing to 90% with 9 training samples. Even with a single 5 s sample, we obtain 33% accuracy, which may be sufficient if a small amount of extra information can be obtained through other browser fingerprinting techniques, however weak.

4.2 Analyzing Scalability

Although we have shown that we can reliably fingerprint a few hundred devices, in real-world scenarios the fingerprinted population will be much larger. It is not feasible for us to collect data on much larger populations; instead, we develop a model to predict how well a classifier will perform as the number of devices grows. However, although random forest provides the best classification performance on our data set, its operation is hard to model, as different trees use a different random sample of features. We therefore base our analysis on the nearest-neighbor classifier (k-NN), which uses a distance metric that we can model parametrically. Note that k-NN does not perform as well as random forest; as a result, our estimates are a conservative measure of the actual attainable classification accuracy.

4.2.1 Distance Metric Learning

The k-NN algorithm relies on a distance metric to identify neighboring points. It is possible to compute simple Euclidean distance between feature vectors; however, this is unlikely to yield optimal results, as some features will tend to dominate. Learning a better distance (or similarity) metric between data points has received much attention in the fields of machine learning, pattern recognition and data mining over the past decade [24]. Handcrafting a good distance metric for a specific problem is generally difficult, and this has led to the emergence of metric learning. The goal of a distance metric learning algorithm is to take advantage of prior information, in the form of labels, to automatically learn a transformation of the input feature space. A particular class of distance functions that exhibits good generalization performance for distance-based classifiers such as k-NN is Mahalanobis metric learning [37]. The aim is to find a global, linear transformation of the feature space such that relevant dimensions are emphasized while irrelevant ones are discarded. The linear transformation performs arbitrary rotations and scalings to conform to the desired geometry. After projection, Euclidean distance between data points is measured.

State-of-the-art Mahalanobis metric learning algorithms include Large Margin Nearest Neighbor (LMNN) [53], Information Theoretic Metric Learning (ITML) [30] and Logistic Discriminant Metric Learning (LDML) [34]. A brief description of these metric learning algorithms is provided by Köstinger et al. [37]. To understand how these metric learning algorithms improve the performance of the k-NN classifier, we first plot the mutual information (MI) of each feature before and after each transformation. Figure 2 shows the amount of mutual information per feature under both untransformed and transformed settings. Figure 2 shows a clear benefit of the distance metric learning algorithms: all of the transformations provide a higher degree of mutual information compared to the original untransformed data.

Table 2: Performance of different metric learning algorithms
(Avg. F-score (%) for k-NN, k = 1, 3 training samples per device)

    Untransformed | LMNN | ITML | LDML
    35            | 41   | 46   | 50
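The projection view of Mahalanobis metric learning described above — learn a linear map L, then measure plain Euclidean distance in the projected space — can be sketched as follows. The matrix L here is a made-up 2×2 example for illustration; LMNN, ITML, or LDML would learn it from labeled data.

```javascript
// Sketch: a learned Mahalanobis-style metric is equivalent to applying
// a linear transformation L to each feature vector and then taking
// ordinary Euclidean distance. L below is illustrative only.
function applyTransform(L, v) {
  // Matrix-vector product: each output coordinate is a row of L dotted with v.
  return L.map(row => row.reduce((s, lij, j) => s + lij * v[j], 0));
}

function euclidean(p, q) {
  return Math.sqrt(p.reduce((s, pi, i) => s + (pi - q[i]) ** 2, 0));
}

function learnedDistance(L, p, q) {
  return euclidean(applyTransform(L, p), applyTransform(L, q));
}

// Example: a transform that doubles the weight of dimension 0 and
// discards dimension 1 entirely (an "irrelevant" feature).
const L = [[2, 0], [0, 0]];
learnedDistance(L, [1, 5], [3, 9]); // 2 * |1 - 3| = 4
```

This is why a single learned matrix suffices: once the data is projected, any off-the-shelf Euclidean k-NN implementation can be used unchanged.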
Among the three transformations, we see that LDML on average provides a slightly higher amount of mutual information per feature. This is confirmed when we rerun the k-NN classifier on the transformed feature space: Table 2 highlights the average F-score for the different metric learning algorithms. We see that for our data set, LDML seems to be the best choice. We therefore use the LDML algorithm to transform our feature space before applying k-NN for the rest of the paper.

In terms of performance, we found that on average it takes around 100–200 ms to match a new fingerprint to an existing fingerprint. For our experiments we use a desktop machine with an Intel i7-2600 3.4 GHz processor and 12 GiB RAM.

6 Accuracy can be thought of as a relaxed version of F-score.
7 We also compute the 95% confidence interval for F-score, but we found it to be less than 1% in most cases.

[Figure 2: Comparing mutual information for different metric learning algorithms. Mutual information per feature for (a) untransformed data, (b) LMNN transformation, (c) ITML transformation, and (d) LDML transformation.]

However, even with LDML, k-NN underperforms random forest, as seen in Table 3: our F-score drops from 78% to 54% with 3 samples and from 86% to 64% with 5 samples.

Table 3: Average F-score of k-NN after LDML
(k = 1 for both k-NN columns; Random Forest uses 100 bagged decision trees)

    Training samples per device | Number of devices | k-NN (%) | k-NN+LDML (%) | Random Forest (%)
    1                           | 586               | 24       | 38            | 33
    2                           | 567               | 31       | 43            | 65
    3                           | 545               | 35       | 50            | 78
    4                           | 524               | 36       | 52            | 83
    5                           | 501               | 38       | 54            | 86
    6                           | 483               | 38       | 54            | 88
    7                           | 468               | 38       | 53            | 89
    8                           | 444               | 37       | 52            | 89
    9                           | 400               | 35       | 50            | 90

4.2.2 Intra- and Inter-Device Distance

To predict how k-NN will operate on larger data sets, we proceed to derive a distribution for distances between samples from different devices (inter-device), and a second distribution for distances between different samples from the same device (intra-device), after first applying the LDML transformation to the feature space. Since each data sample is a point in an n-dimensional feature space, we compute the Euclidean distance between any two data samples using the following equation:

    d(p, q) = sqrt( Σ_{i=1}^{n} (p_i − q_i)² )                (8)

where p and q represent two feature vectors defined as p = (p_1, p_2, ..., p_n) and q = (q_1, q_2, ..., q_n). We then group distances between feature vectors from the same device into one class, C_intra, and distances between feature vectors from different devices into another class, C_inter. Classes C_intra and C_inter can be defined as follows:

    C_intra = {x : x = d(p, q), p ∈ D_i, q ∈ D_i, ∀i ∈ D}
    C_inter = {x : x = d(p, q), p ∈ D_i, q ∈ D_j, i ≠ j, ∀i, j ∈ D}

where D refers to the set of all devices (we consider only devices with at least 2 training samples; we have 567 such devices). We can now fit an individual distribution for each class. To do this, we utilize MATLAB's fitdist function [7]. To avoid overfitting, we distribute our devices into four equal subsets. We then fit and compare distributions from each subset. Figure 3 shows the top five estimated inter-device distance (C_inter) distributions for each subset of devices. Here, the distributions are ranked based on the Akaike Information Criterion (AIC) [23]. From Figure 3 we can see that the top five distributions are more or less consistent across all four subsets.

Next, we plot the same inter-device distance distribution, but this time we consider data from all 567 devices. Figure 4(a) highlights the top five distributions. Comparing Figure 3 and Figure 4(a), we see that the most representative inter-device distance distribution is an inverse Gaussian distribution. Similarly, we find that the most likely intra-device distance distribution (C_intra) is a generalized extreme value distribution, as shown in Figure 4(b). Figure 4(c) shows the difference between the intra- and inter-device distance distributions.

4.2.3 Simulating A Large Number Of Smartphones

Now that we have representative distributions for intra- and inter-device distances, we can simulate a k-NN classifier. The pseudocode for simulating the k-NN classifier is provided in Algorithm 1. The algorithm works as follows. Let us assume that there are D devices and for each device we have N training samples. Now, for any given test sample, a k-NN classifier first computes N×D distances, of which N distances are with samples from the same device and N×(D − 1) distances are with samples belonging to the (D − 1) other devices. We emulate these distances by drawing N and N×(D − 1) distances from our representative intra- and inter-device distance distributions, respectively. The k-NN classifier then classifies the test sample correctly whenever genuine (intra-device) distances form the majority among the k nearest distances.
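The simulation just described — draw N intra-device and N×(D−1) inter-device distances, then take a k-NN vote — can be sketched as a short runnable routine. The two sampler arguments below are placeholders standing in for the fitted inverse Gaussian (inter) and generalized extreme value (intra) distributions; the function names are ours.

```javascript
// Sketch of the Algorithm 1 core loop: simulate one k-NN decision.
// Intra-device distances carry label 0 (genuine), inter-device label 1
// (imposter). The test sample is correctly classified when imposters
// are a minority among the k nearest distances.
function simulateOneTest(k, N, D, sampleIntra, sampleInter) {
  const d = [];
  for (let j = 0; j < N; j++) d.push([sampleIntra(), 0]);
  for (let j = 0; j < N * (D - 1); j++) d.push([sampleInter(), 1]);
  d.sort((a, b) => a[0] - b[0]); // ascending order of distance
  const imposters = d.slice(0, k).reduce((s, [, label]) => s + label, 0);
  return imposters < k / 2; // true => correct decision
}

// Average accuracy over many independent runs.
function simulateAccuracy(k, N, D, sampleIntra, sampleInter, runs) {
  let correct = 0;
  for (let r = 0; r < runs; r++) {
    if (simulateOneTest(k, N, D, sampleIntra, sampleInter)) correct++;
  }
  return correct / runs;
}
```

With k = 1 this reduces to checking whether the single nearest distance is a genuine one, matching the setting the paper finds to best reproduce its real-world results.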

[Figure 3: Estimated inter-device distance distributions for 4 subsets of devices, where each subset contains 141 devices. The top-ranked candidate distributions in each subset include inverse Gaussian, Birnbaum-Saunders, lognormal, generalized extreme value, Burr and loglogistic.]

[Figure 4: Estimated distributions for (a) inter-device distance (C_inter) and (b) intra-device distance (C_intra). (c) Difference between the intra- and inter-device distance distributions.]
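Readers reproducing the simulation outside MATLAB will need a sampler for the fitted inverse Gaussian (the best-fitting inter-device distribution). A standard approach is the Michael–Schucany–Haas transformation method, sketched below; the parameters mu and lambda are illustrative placeholders, not the paper's fitted values.

```javascript
// Sketch: sample from an inverse Gaussian distribution IG(mu, lambda)
// via the Michael-Schucany-Haas transformation method. Parameters are
// illustrative; the paper fits its own with MATLAB's fitdist.
function randNormal() {
  // Box-Muller transform for a standard normal variate.
  const u1 = Math.random() || 1e-12; // guard against log(0)
  const u2 = Math.random();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

function randInverseGaussian(mu, lambda) {
  const y = randNormal() ** 2;
  const x = mu + (mu * mu * y) / (2 * lambda)
    - (mu / (2 * lambda)) * Math.sqrt(4 * mu * lambda * y + mu * mu * y * y);
  // Accept x with probability mu / (mu + x); otherwise return mu^2 / x.
  return Math.random() <= mu / (mu + x) ? x : (mu * mu) / x;
}
```

The sampler produces only positive values and has mean mu, so a quick sanity check is to average a large batch of draws and compare against mu.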

Algorithm 1: Simulating the k-NN classifier

    Input:  k – number of nearest neighbors (odd integer)
            N – number of training samples per device
            D – number of devices
            Distr_intra – intra-device distance distribution
            Distr_inter – inter-device distance distribution
            Runs – number of runs
    Output: Acc – average classification accuracy

    Acc ← 0
    for i := 1 to Runs do
        d ← {}                          # list of (distance, label) tuples
        # add N intra-device distances, each labeled 0
        for j := 1 to N do
            d ← d + {(random(Distr_intra), 0)}
        end for
        # add N×(D − 1) inter-device distances, each labeled 1
        for j := 1 to N×(D − 1) do
            d ← d + {(random(Distr_inter), 1)}
        end for
        d ← sort(d)                     # in ascending order of distance
        l ← label(d, k)                 # labels of the k nearest elements
        imposters ← sum(l)              # imposters among the k nearest
        if imposters < k/2 then
            Acc ← Acc + 1               # correct decision
        end if
    end for
    Acc ← Acc / Runs
    return Acc

Given that an average user spends anywhere between 15 and 20 seconds on a web page [8, 19], values of N ≤ 5 seem most realistic (each of our data samples corresponds to 5 seconds' worth of a web session). We also experimented with other values of k, but found that setting k = 1 provides the best overlap between real-world and simulation results.8 From Figure 5 we see that our simulation results closely match our real-world results. We can also see that the average classification accuracy is in the range of 12–16% when we scale up to 100 000 devices. This accuracy is unlikely to be sufficient if motion sensors are the sole source of a fingerprint, but it suggests that combining motion sensor data with even a weak browser-based fingerprint is likely to be effective at distinguishing users in large populations. Additionally, these classification accuracies are conservative and potentially provide a lower bound on performance, as random forests provide significantly better performance.

5. WEBSITES ACCESSING SENSORS

In this section we look at how many of the top websites access motion sensors. We also try to cluster the access patterns into broad use cases.

5.1 Methodology

Figure 6 provides an overview of our methodology for automatically capturing and clustering JavaScript that accesses sensor data from mobile browsers. To automate this process, we use Selenium WebDriver [15] to run an instance of the Chrome browser with a user agent set for a smartphone client. In order to collect unfolded JavaScript, we attach a debugger between the V8 JavaScript engine [12] and the web page. Specifically, we observe the script.parsed function, which is invoked when new code is added with
end for d ← sort(d) #in ascending order of distance 5.1 Methodology l ← label(d, k) #return label for top k elements Figure6 provides an overview of our methodology to automat- imposters ← sum(l) #sum top k labels ically capture and cluster JavaScripts accessing sensor data from if imposters < k/2 then mobile browsers. To automate this process, we use Selenium Web Acc ← Acc + 1 #correct decision Driver [15] to run an instance of Chrome browser with a user agent end if set for a smartphone client. In order to collect unfolded JavaScripts, end for we attach a debugger between the V8 JavaScript engine [12] and Acc ← Acc/Runs the web page. Specifically, we observe script.parsed func- return Acc tion, which is invoked when new code is added with