
Proceedings on Privacy Enhancing Technologies; 2016 (4):62–82

Hao Wu and Yih-Chun Hu

Location Privacy with Randomness Consistency

Abstract: Location-Based Social Network (LBSN) applications that support geo-location-based posting and queries to provide location-relevant information to mobile users are increasingly popular, but pose a location-privacy risk to posts. We investigated existing LBSNs and location privacy mechanisms, and found a powerful potential attack that can accurately locate users with relatively few queries, even when location data is well secured and location noise is applied. Our technique defeats previously proposed solutions including fake-location detection and query rate limits. To protect systems from this attack, we propose a simple, scalable, yet effective defense that quantizes the map into squares using hierarchical subdivision, consistently returns the same random result to multiple queries from the same square for posts from the same user, and responds to queries with different distance thresholds in a correlated manner, limiting the information gained by attackers and ensuring that an attacker can never accurately know the quantized square containing a user. Finally, we verify the performance of our defense and analyze the trade-offs through comprehensive simulation in realistic settings. Surprisingly, our results show that in many environments, privacy level and user accuracy can be tuned using two independent parameters; in the remaining environments, a single parameter adjusts the tradeoff between privacy level and user accuracy. We also thoroughly explore the parameter space to provide guidance for actual deployments.

Keywords: Location based social network, Location privacy, Randomness consistency

DOI 10.1515/popets-2016-0029
Received 2016-02-29; revised 2016-06-02; accepted 2016-06-02.

Hao Wu: University of Illinois at Urbana-Champaign, E-mail: [email protected]
Yih-Chun Hu: University of Illinois at Urbana-Champaign, E-mail: [email protected]

1 Introduction

Because of the increasing popularity of smartphones worldwide, Location-Based Social Network (LBSN) [31] mobile applications are becoming increasingly popular. Such applications use location services to make mobile applications more location-aware by providing knowledge about the surroundings of a mobile user, e.g., by retrieving posts or finding people around the user. For example, users are able to socialize with people nearby using a rendezvous social network such as PeopleNet [33], online dating applications which connect nearby people [4, 5, 7], friend recommendation networks for shopping together [26], and collaborative networks which gather people who work together [13, 41]. Not only do some regular social networks widely use the user's location to tag a user's post, but many popular anonymous social network applications also rely on geo-location to help users filter and retrieve more related posts around them. Examples of such applications include Whisper [8] and Yik Yak [10].

With the increasing number of LBSN users, location privacy in such environments is becoming more important. Abuse and unauthorized use of users' location can threaten users physically [23], financially [39], and even legally [20]. For example, thieves can locate and rob a victim's home through Facebook, resulting in both physical injuries and financial losses [11]. Because an attacker can learn a user's identity through anonymized GPS traces [22, 29], the vulnerabilities are exacerbated if the location of a user is exposed from an anonymous social application, in which the user's identity is supposed to be hidden so that they are able to speak without fear of abuse [12]. For applications with geo-distance-based retrieval (e.g. Whisper filters posts by the distance to the query), attackers can easily deduce the location of the post owner by triangulation [42] or the space partition attack [30, 36], and even adding noise and offset to the location data fails to resist attackers [36, 43].

Existing approaches to location privacy can be divided into three categories: first, those that use trusted servers to anonymize the user's location data [27, 28, 32]; second, those that apply cryptographic techniques or private information retrieval (PIR) to location data [21, 34, 35, 37, 38, 45]; third, those that add noise or obfuscation to location data [15, 19, 24, 30, 32, 43]. The first two approaches focus on protecting location data from unsecured data transmission or data storage, and the third sacrifices user experience for reduced attacker precision.

In this work, we examine a new powerful entropy-minimization-based attack, which is able to expose victims' locations in applications that support geo-location-based information retrieval (e.g. Tinder [7], Yik Yak [10], Whisper [8], etc.), even when these three previous defense categories are performed perfectly (i.e. the location data is secured and the location noise is large enough while providing a reasonable user experience).

In this paper, we consider a geo-location-based query response system which receives a query with the geo-location coordinate of the requestor and returns a set of posts within a specified distance threshold, so that each requestor can retrieve posts around itself. The content of a "post" varies by application, and may include user-generated content, or even the users themselves. We develop an effective entropy-minimization-based attack algorithm that can efficiently deduce the location of a post using only a small number of queries, even when 1) the threshold distance is not adjustable by the requestor (an extreme case of coarse-grained threshold values [43]), 2) noise and offset are added to the location data, and 3) the system is secured against direct access to location information. This attack algorithm relies only on knowledge of the location offset and the probability distribution of the random noise, which can be deduced by queries on posts with pre-known location [36, 43]. In each round, our attack algorithm starts with a probability distribution on a particular victim post. The algorithm then considers all possible query locations and greedily chooses the one that minimizes the expected entropy. (Specifically, for each possible query location, we can compute the resulting probability distribution conditional on whether the system returns the post, and weight those probabilities based on the attacker's current probability distribution, to compute the expected post-query entropy.) Our experiments show that our attack can accurately locate the target post in a 10.25 km × 10.25 km map with 30 meter error in around 100 queries (rounds). This outperforms the RANDUDP attack [36], which needs 1400 queries to achieve an error of 37.4 meters against Gaussian noise of the same variance (0.039 miles), by a factor of 10. Even against stronger noise, our attack can achieve 100-meter error in hundreds to thousands of queries, depending on the variance introduced by random noise. When no location noise is involved, the space partition attack [30, 36] has the same performance as our attack, but the space partition attack is limited to systems that do not add noise.

Although existing work has proposed faked-GPS-coordinate detection and query rate limitation as countermeasures [36, 43], attackers can simply create a few Sybil accounts [44] to improve their total query rate and avoid faked-GPS-coordinate detection, which is mainly based on detecting unrealistic movement patterns in an account [43]. Because only a few queries are needed for our attack, attackers only need tens or hundreds of Sybil accounts, so such accounts can be maintained by attackers to mimic real accounts and avoid Sybil account detection, which usually aims to detect a larger number of such accounts, e.g., thousands of accounts [17].

To mitigate privacy compromise attacks, we propose a simple and effective defense using a quantized map and consistent random response, which can always prevent an attacker from gaining precise knowledge of the quantized square containing the user, even against a strong attacker who is able to make unlimited queries and knows the probability distribution of the noise (but does not know the random values themselves). The system responds to each query based on the center coordinates of the quantized squares containing the post and the query, and provides an identical response to all queries from the same square for posts by the same user. When queries can be made with different distance thresholds, the system responds to those queries in a consistent manner. Against our defense, the attacker can at most learn the quantized square in which the victim is located, and can only make a finite number of different queries in a size-limited map, because the number of quantized squares is finite. The attacker cannot cancel noise by making multiple queries at the same location, so the information available to the attacker is limited.

Moreover, we make this defense scalable over a map of any size by a hierarchical subdivision of this square-quantized map, while limiting the attacker's information gain with an upper bound which is customizable by tuning the hierarchical subdivision. This approach trades location accuracy of posts that are far away from the query requestor for scalability; however, experiments indicate that it has minimal impact on application usability. Moreover, users usually don't care whether a post is 99 miles or 101 miles away from them, and existing applications such as Whisper also set a more coarse-grained resolution for far-away posts than for nearby posts [8].

Surprisingly, the evaluation shows that our defense mechanism allows us to easily tune system parameters to maximize privacy while limiting average user error to an acceptable level. Broadly speaking, noise distributions can be categorized based on whether the user error is bounded in world size (with a constant density of posts). In bounded distributions, user error is bounded regardless of the map scale, so to trade off between privacy and user error, we need only consider the size of the quantized squares.

In unbounded distributions, privacy is primarily driven by square size, while user error is primarily driven by map scale, so we can maximize privacy for a given distance threshold while bounding map size to provide acceptable user error and usability; we discuss such tradeoffs in Section 6.

Our main contributions are as follows:
– We analyzed today's LBSN applications and found a potential threat which can expose a user's location quickly and effectively through an entropy-minimization algorithm that bypasses existing defense and detection mechanisms.
– To limit the capability of attackers, we propose a simple and scalable defense based on map quantization, randomness-consistent response, and a novel hierarchical subdivision over a quantized map. In our defense, privacy and accuracy can be easily tuned.
– We comprehensively evaluated our scheme's performance by simulating different levels of noise, scales of map, and sizes of quantized squares. The results support the effectiveness of our defense and provide a thorough guide for parameter tuning.

2 Background and Related Work

2.1 Location-Based Social Network

Location-Based Social Networks (LBSNs) [31] are mobile applications that allow users to retrieve nearby users [2, 4, 7] or posts [8, 10] through geo-location-based queries, which are usually based on the distance from the query requestor to other users or posts. For example, the "Nearby Friends" function in the Facebook mobile application [2] lists each nearby friend and their distance to you; Tinder [7] (a mobile dating application) and Whisper [8] (an anonymous social network application) allow the user to set the distance threshold and retrieve all the results within this threshold.

We classify LBSN vulnerabilities into two categories. (1) Location data leakage. Puttaswamy and Zhao [38] show that some LBSN applications run on untrusted servers that store location coordinates in plain text and may leak location data through software bugs or plaintext transmission. (2) Location information leakage. As Wang et al. [43], Li et al. [30], Polakis et al. [36], and our paper show, attackers can infer the location of nearby users or posts by exploiting information leaked from query results. Such locations can be easily learned by our attack algorithm, or even simpler ones, e.g. triangulation [42] and the space partition attack [30, 36].

2.2 Information-Leakage Location Attacks

Polakis et al. [36] and Li et al. [30] both propose the space partition attack, which can achieve O(log(d/ε)) query count complexity for a noise-free LBSN system (d is the query threshold and ε is the distance error of the attack). Polakis et al. [36] also propose an attack, RANDUDP, based on Maximum Likelihood Estimation (MLE) for an LBSN with Gaussian noise. In our paper, we propose a more generic entropy-minimization-based attack which works against any kind of noise distribution, and outperforms RANDUDP by approximately 10x against the Gaussian noise of Skout [36].

2.3 Location Privacy Techniques

We have classified existing location-privacy-enhancement techniques into three categories, as in Puttaswamy et al. [37].

Anonymization by trusted servers. Some techniques rely on trusted servers to remove the identities of users and send anonymized location data to service providers, so that the service providers cannot know the owner of the location data. Mokbel et al. [32] and Kalnis et al. [28] both use an anonymizer to remove the user identity before transmitting data to a service provider. Kalnis et al. [28] also achieve K-anonymity that hides the query requestor by mixing with K − 1 other users around it. However, these anonymization techniques cannot be certain to prevent an attacker from identifying the user, because anonymized GPS traces can be used to infer users' homes, offices, and even real identities [22, 29]. Even if anonymization techniques can prevent attackers from knowing the identity of leaked location data, they still cannot prevent location leakage from the query results.

Cryptographic techniques and Private Information Retrieval (PIR). These two techniques are widely used to keep location data secure in transmission and storage. Zhong et al. [45], Narayanan et al. [34], and Puttaswamy et al. [37, 38] use cryptographic techniques to ensure that only nearby users can successfully decrypt location data. Again, these techniques focus on securing the location data, but do not protect location privacy from leaking through query results.


Although Puttaswamy et al.'s approach [37] allows only a user's social group to see the user's location information in kNN-query-based LBSN applications, it is not applicable in applications which rely entirely on distance-based queries, such as anonymous applications in which users do not have friends or social groups, e.g. Whisper [8] and Yik Yak [10]. Ghinita et al. [21] and Papadopoulos et al. [35] use Private Information Retrieval to allow users to retrieve the nearest neighbor or k-nearest neighbors (kNN) without revealing the location of the query requestor; however, they focus on the location privacy of the query requestor but not that of the "post" in this paper.

Low resolution and obfuscation. Gruteser et al. [24], Gedik et al. [19], Mokbel et al. [32], Li et al. [30], and Polakis et al. [36] improve location privacy by limiting the resolution of the location data or replacing the location data with a region based on privacy level, so that neither attackers nor users can acquire location data beyond this maximum resolution. These low-resolution approaches equally hurt both users and attackers. Ardagna et al. [15] introduce obfuscation-based techniques that randomly adjust the user's location to a nearby point. Wang et al. [43] indicated that Whisper [8] is also using random noise and distance offset to increase the location uncertainty. However, Wang et al. [43], Polakis et al. [36], and this paper show that simply adding offset and random noise cannot stop attackers from learning the location of victims. We even demonstrate that limiting the query rate cannot mitigate this potential threat.

Directly storing noisy location data in the database can create error for the attacker (as in Andrés et al. [14]). We consider such strategies to be orthogonal to our defense mechanisms, which operate on the query response. Such approaches permanently introduce a bias in each post, and some posts will experience high location noise (even though the average noise across all posts can be smaller), which hurts the experience of posters who sent those unlucky high-noise posts (because the poster expects that nearby people can see its post).

3 Problem Definition

This paper examines the location privacy issue in LBSN applications that support geo-location-based queries, and designs a defense to limit an attacker's ability to learn user locations. In this section, we define the system model and adversary model, formulate the location privacy problem, and summarize our goals.

3.1 System Model

An LBSN application retrieves posts by queries based on geo-location. Although the query response algorithms can vary, we generalize the system model as follows.

Post from users. A post includes the location of the user at the posting time. When a server receives a post, it stores the post with its location so that the server can retrieve the post in response to a query from a nearby location. The content of a "post" varies from application to application, and may be a user, a text post, an image, or any other geographically-bound content.

Geo-location-based query. In an LBSN, posts are returned in response to geo-location-based queries. Each query Q includes the sender's location X_query and a set of parameters d. For example, d can specify the radius of the search. For each post P with location X_post, the server adds random noise to X_post, and decides whether or not to return the post based on X_query and X_post. For example, the server uses a response probability function P(X_query, X_post, d) to calculate the probability (between [0, 1]) of returning post P in response to query Q, then generates a random number R from [0, 1] to decide whether to include P in the response:
– If R ≤ P(X_query, X_post, d), then include post P in the response.
– If R > P(X_query, X_post, d), then do not include post P in the response.

In our evaluation, we choose a typical setting. We set a distance threshold d. A querier expects to receive a post if and only if that post lies in a circle C_d with center X_query and radius d. By choosing different P(X_query, X_post, d), this generic model can cover all three models (Disk UDP, Round UDP, and Randomized UDP) discussed by Polakis et al. [36].

Secured data transmission and storage. We assume that the system secures both data transmission and storage, so that the only source of location information is through queries. Existing techniques, such as location data anonymization [27, 28, 32] and location data encryption [21, 34, 35, 38, 45], can provide such security.
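To make the query response model above concrete, the following sketch implements the probabilistic return rule; the inverse-polynomial response probability used here is only an illustrative stand-in (similar in shape to the P_3–P_5 family evaluated in Section 4.3), and the function and parameter names are ours, not part of any deployed system.

```python
import math
import random

def response_probability(x_query, x_post, d, k=2):
    # Illustrative response probability function P(X_query, X_post, d):
    # an inverse-polynomial curve of the distance between query and post.
    dist = math.hypot(x_query[0] - x_post[0], x_query[1] - x_post[1])
    return 1.0 / ((dist / d) ** k + 1.05)

def include_post(x_query, x_post, d, rng=random):
    # Draw R uniformly from [0, 1] and include the post iff
    # R <= P(X_query, X_post, d), as described above.
    return rng.random() <= response_probability(x_query, x_post, d)

# Example: decide whether a post 1 km away is returned for threshold d = 1.75 km.
print(include_post((0.0, 0.0), (1.0, 0.0), d=1.75))
```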


3.2 Location Privacy Problem

The attacker's goal is to maximize certainty about post location, while the system's goal is to minimize the attacker's certainty subject to acceptable application accuracy. We define two metrics that quantify the attacker and application errors:

Privacy Level: mean absolute error (MAE) for attackers. According to Shokri et al., the MAE (called "correctness" in [40]) is the difference between the attacker's knowledge and the actual location, which measures user privacy. When the attacker learns a probability distribution Pr_post(X) for a post at X_post, the MAE, MAE_att, is:

MAE_att = \sum_{X} |X − X_post| \cdot Pr_post(X)    (1)

where |X_1 − X_2| is the geometric distance between X_1 and X_2.

Accuracy Level: per-post absolute error (PAE) for legitimate users. When the system introduces noise or other error sources into its reply, we can categorize responses as true positives (responses that are in the desired area and are returned), false positives (responses that are not in the desired area but are returned), true negatives (responses that are not in the desired area and are not returned), and false negatives (responses that are in the desired area but are not returned). We calculate the per-post absolute error by calculating the error represented by the false positives and false negatives (that is, the distance by which the response is in error) divided by the number of posts that are either returned, or that should have been returned (that is, are in C_d). Then the PAE is defined as:

PAE_user = (FPE + FNE) / (E(|FP|) + |P_A|)    (2)

where

FNE = \sum_{X \in X(P) \cap C_d} [(d − |X − X_query|) \cdot (1 − P(X_query, X, d))]    (3)

FPE = \sum_{X \in X(P) \setminus C_d} [(|X − X_query| − d) \cdot P(X_query, X, d)]    (4)

E(|FP|) = \sum_{X \in X(P) \setminus C_d} P(X_query, X, d)    (5)

|P_A| = |X(P) \cap C_d|    (6)

and X(P) is the set of coordinates of all posts.

FNE is the total false negative error, which is the distance error represented in the unreturned posts, and FPE is the false positive error, which is the distance error represented in the incorrectly returned posts. When calculating the FNE and FPE, we give more weight to posts far from the threshold and less weight to posts near the threshold, because smaller location errors impact user experience less than large location errors. E(|FP|) is the expected number of false positive posts, and |P_A| is the number of posts in the desired area. We normalize the error by the number of posts that the user cares about, E(|FP|) + |P_A|, which is the number of returned posts plus the number of false negatives, since these posts drive the user experience (as compared to true negatives). Moreover, normalized error is a better measure than absolute error. For example, if in Case 1 the user sees 10 posts but 5 of them are wrong, and in Case 2 the user sees 100 posts but only 10 of them are wrong, the user experience in Case 2 is better, though the absolute error of Case 1 is larger.

Tradeoff between privacy level and accuracy level. Because high attacker error may cause high user error, we aim to tune the system to maximize the privacy-to-user-error ratio metric MAE_att/PAE_user, to maximize privacy level per unit loss of accuracy level.

3.3 Adversary Model

Because location authentication is difficult on the Internet, especially when the attacker only makes a small number of queries from each Sybil account, we assume the attacker can choose arbitrary coordinates for each query, and send queries with any X_query it wants. Our defense assumes that the attacker can send an unlimited number of queries; however, in our evaluation of previously proposed schemes, we show how an attacker can gain extensive information even with a limited number of queries.

We further assume the attacker knows all system settings, such as the response probability function P(X_query, X_post, d). This assumption is reasonable because the attacker can make some posts and use multiple probe queries with attacker-selected coordinates to estimate the noise and distance offset from the responses; such an attack could determine [36, 43] the noise and distance offset added in Skout [6] and Whisper [8]. However, the attacker lacks direct access to the random seed for noise generation and to location data for any post.
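As an illustration of this probing assumption, the sketch below estimates the response probability curve empirically by querying a self-owned post from known distances. The server-side function here is a hypothetical stand-in for a real LBSN service; the attacker only observes whether the post is returned.

```python
import math
import random

def server_returns_post(x_query, x_post, d):
    # Hypothetical service whose response probability is unknown to the attacker.
    dist = math.hypot(x_query[0] - x_post[0], x_query[1] - x_post[1])
    return random.random() <= 1.0 / ((dist / d) ** 2 + 1.05)

def estimate_response_curve(x_post, d, distances, trials=2000):
    # For each probe distance, issue repeated queries and record the return rate;
    # the empirical rate approximates P(x, d) at that distance.
    curve = {}
    for dist in distances:
        probe = (x_post[0] + dist, x_post[1])
        hits = sum(server_returns_post(probe, x_post, d) for _ in range(trials))
        curve[dist] = hits / trials
    return curve

if __name__ == "__main__":
    for dist, p in estimate_response_curve((0.0, 0.0), 1.75, [0.5, 1.0, 1.75, 2.5, 3.5]).items():
        print(f"distance {dist:.2f} km: estimated return probability {p:.3f}")
```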

3.4 Goals

In this paper, we aim to evaluate the ability of an attacker, and to develop a defense mechanism.

Evaluate attacker ability. When evaluating an attacker's ability, we assume that the location data is well-secured and can only be inferred from queries, that noise is added to each response, and that the distance threshold d of each query is fixed. Under this system setting, many previously proposed attacks, including triangulation and space partition [30], do not work, because they do not expect noise, and because triangulation requires an adjustable threshold (to do binary search to learn the distance from a query to a victim). Even the attack of Wang et al. [43] will be thwarted, because it depends on an adjustable threshold. Even under these strong constraints, we show that the attacker can learn nearly perfect information about the location of each post.

Defense mechanism. Because we show that an attacker can learn strong information, we develop a defense that limits the attacker's ability to learn location information. Specifically, we want a mechanism that maximizes privacy level for a given level of accuracy.

4 Attack Algorithm

In this section, we present a powerful attack algorithm (Algorithm 1) based on the adversary model defined in Section 3. The attacker updates its estimated probability distribution for the victim's location based on the query result in each round, and chooses the next query location to maximize the expected information gain. With this algorithm, the entropy of the probability distribution converges to zero.

Algorithm 1 Attack Algorithm
1:  Initialization: Init() (Lines 12–13)
2:  for n from 1 to N do
3:    if n = 1 then
4:      Pr(X) := UpdateWithRx(X_{Q,n}, Pr(X)) (Lines 14–15)
5:    else
6:      if Rx then
7:        Pr(X) := UpdateWithRx(X_{Q,n}, Pr(X)) (Lines 14–15)
8:      else
9:        Pr(X) := UpdateWithoutRx(X_{Q,n}, Pr(X)) (Lines 16–17)
10:   X_{Q,n+1} := NextQueryLocation(Pr(X)) (Lines 18–21)
11: Return Pr(X)
12: Initialization, Init()
13:   Initialize Pr(X) to a flat distribution Pr_0(X)
14: Update distribution when victim post P is in the reply to a query at X_{Q,n}, UpdateWithRx(X_{Q,n}, Pr(X))
15:   Pr(X) = Pr(X | Rx, X_{Q,n}) (follow Formula 10)
16: Update distribution when victim post P is not in the reply to a query at X_{Q,n}, UpdateWithoutRx(X_{Q,n}, Pr(X))
17:   Pr(X) = Pr(X | \overline{Rx}, X_{Q,n}) (follow Formula 11)
18: Pick the next query location, NextQueryLocation(Pr(X))
19:   for each possible X_Q in map M do
20:     Use Formula 16 to calculate the expected entropy E(H(X | X_Q)) if our next query is at X_Q
21:   Return the X_Q with the smallest E(H(X | X_Q))

4.1 Algorithm Construction

Let M be the region of possible victim locations, and let |M| be the area of the map. The attacker initially assumes the victim's location falls in a prior probability distribution Pr_0(X), where X ∈ M. Following the principle of maximum entropy [25], when an attacker knows nothing, it assumes Pr_0(X) is uniformly distributed across M:

Pr_0(X) = 1 / |M|    (7)

Update probability distribution. In round n, we compute an updated victim location probability distribution Pr_n(X) based on the previous probability distribution Pr_{n−1}(X). Before the attacker can learn the location of a post P, he must first see post P, so in our model, the first round is the first time the attacker sees post P. Then

Pr_1(X) = Pr_1(X | Rx, X_{Q,1})    (8)

where Rx denotes that post P was returned in response to the query made in that round, and \overline{Rx} denotes that P was not returned. X_{Q,n} denotes the event that the nth-round query is made at location X_{Q,n}.

After the first round (n ≥ 2), the attacker may or may not receive post P in its query response. Then

Pr_n(X) = Pr_n(X | Rx, X_{Q,n}) if the post is received, and Pr_n(X) = Pr_n(X | \overline{Rx}, X_{Q,n}) otherwise.    (9)

To calculate Pr_n(X | Rx, X_{Q,n}) and Pr_n(X | \overline{Rx}, X_{Q,n}), we apply Bayes' rule to get

Pr_n(X | Rx, X_{Q,n}) = Pr(Rx | X, X_{Q,n}) \cdot Pr_{n−1}(X) / Pr(Rx | X_{Q,n})    (10)

Pr_n(X | \overline{Rx}, X_{Q,n}) = Pr(\overline{Rx} | X, X_{Q,n}) \cdot Pr_{n−1}(X) / Pr(\overline{Rx} | X_{Q,n})    (11)

where Pr(Rx | X, X_{Q,n}) is the response probability function P(X_query, X_post, d) (chosen by the service) with inputs X_{Q,n} and X. Thus,

Pr(Rx | X, X_{Q,n}) = P(X_{Q,n}, X, d)    (12)

Pr(\overline{Rx} | X, X_{Q,n}) = 1 − P(X_{Q,n}, X, d)    (13)

Then Pr(Rx | X_{Q,n}) is the probability that we receive post P in response to a query at X_{Q,n}. This probability is calculated using the attacker's probability distribution from the previous round, Pr_{n−1}(X):

Pr(Rx | X_{Q,n}) = \sum_{X \in M} Pr(Rx | X, X_{Q,n}) \cdot Pr_{n−1}(X)    (14)

Pr(\overline{Rx} | X_{Q,n}) = 1 − Pr(Rx | X_{Q,n})    (15)
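The sketch below is one way to implement Algorithm 1 over a quantized grid: a Bayesian update following Formulas 10–15, and a greedy choice of the next query by the expected-entropy criterion of Formula 16 (given in the next subsection). The grid, the response probability function, and all names are assumptions of this sketch rather than the paper's reference implementation.

```python
import math

def bayes_update(prior, squares, x_q, received, resp_prob, d):
    # Posterior over squares after observing whether the post was returned
    # for a query at x_q (Formulas 10-15). `prior` maps square center -> probability.
    pr_rx = sum(resp_prob(x_q, x, d) * prior[x] for x in squares)   # Formula 14
    denom = pr_rx if received else 1.0 - pr_rx                      # Formula 15
    posterior = {}
    for x in squares:
        like = resp_prob(x_q, x, d) if received else 1.0 - resp_prob(x_q, x, d)
        posterior[x] = (like * prior[x] / denom) if denom > 0 else 0.0
    return posterior

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def next_query_location(prior, squares, resp_prob, d):
    # Greedily pick the query location minimizing the expected posterior
    # entropy, weighted by the probability of each outcome (Formula 16).
    best_x, best_h = None, float("inf")
    for x_q in squares:
        pr_rx = sum(resp_prob(x_q, x, d) * prior[x] for x in squares)
        expected_h = (pr_rx * entropy(bayes_update(prior, squares, x_q, True, resp_prob, d))
                      + (1.0 - pr_rx) * entropy(bayes_update(prior, squares, x_q, False, resp_prob, d)))
        if expected_h < best_h:
            best_x, best_h = x_q, expected_h
    return best_x

# Usage: start from a flat prior over square centers (Formula 7), issue the query,
# call bayes_update with the observed outcome, and repeat until the entropy converges.
```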


Choosing the next query location. At the end of the nth round, we choose the next query location X_{Q,n+1} to be the one with the largest expected information gain (or, equivalently, the smallest expected entropy of Pr_{n+1}(X), computed as a weighted average across the two cases where the post is, and is not, received). For each possible X_{Q,n+1}, we calculate the expected entropy:

E(H(X | X_{Q,n+1})) = H(X | Rx, X_{Q,n+1}) \cdot Pr(Rx | X_{Q,n+1}) + H(X | \overline{Rx}, X_{Q,n+1}) \cdot Pr(\overline{Rx} | X_{Q,n+1})    (16)

where

H(X | Rx, X_{Q,n+1}) = − \sum_{X \in M} Pr(X | Rx, X_{Q,n+1}) \cdot log_2 Pr(X | Rx, X_{Q,n+1})    (17)

H(X | \overline{Rx}, X_{Q,n+1}) = − \sum_{X \in M} Pr(X | \overline{Rx}, X_{Q,n+1}) \cdot log_2 Pr(X | \overline{Rx}, X_{Q,n+1})    (18)

We choose the X_{Q,n+1} that produces the smallest E(H(X | X_{Q,n+1})) and use it for the next round. We run the algorithm for N rounds until the entropy converges.

4.2 Engineering the Attack Algorithm

Map quantization. To efficiently determine the point of each query, we quantize the map M into a finite number of squares, and make each query at the center of such a square, so the algorithm can calculate the discrete probability that the victim is in each square. The attacker should choose a square size comparable to the accuracy level desired, because our attack algorithm tends to converge the probability distribution to a single square after several iterations.

Gradually shrinking quantized squares. Because smaller square sizes give the attacker a greater level of accuracy at the cost of greater attacker computational complexity (O(N^2) time to choose the next query position, where N is the number of squares), an attacker can start with relatively large square sizes, identify the post's larger square, then refine the search with progressively smaller square sizes.

4.3 Examples and Evaluation

In this section, we present some attack examples over different response probability functions P(x, d) to show the effectiveness of our attack algorithm, where x is the distance between a post and a query, |X_query − X_post|.

Fig. 1. Sample response probability functions (probability vs. x/d): (a) step function, (b) step function with noise, (c) 1/((x/d)^4 + 1.05), (d) 1/((x/d)^3 + 1.05), (e) 1/((x/d)^2 + 1.05), (f) function with Gaussian noise.

Experiment setting. We pick a 10.25 km × 10.25 km area of North Chicago as M for evaluation, and we quantize the map M into a grid of 41 × 41 squares. The square size is 0.25 km × 0.25 km, and each square has center x- and y-coordinates from −5 to 5 km. The victim post P is randomly sampled, the distance threshold is fixed at d = 1.75 km (which is a typical threshold used in Tinder [7] and Yelp [9]), and we consider the following 6 response probability functions:
– Step function (Figure 1(a)). P_1(x, d) = 1 when x ≤ d and P_1(x, d) = 0 when x > d. This function introduces no error, but provides no attacker uncertainty.
– Step function with random noise (Figure 1(b)). We add noise around the distance threshold d, so that P_2(x, d) = P_1(x, d) + Noise(x, d). However, the noise is bounded and the attacker can still shrink the potential region of the victim by repeated queries.
– Non-step functions (Figures 1(c)–1(e)). P_3(x, d) = 1/((x/d)^4 + 1.05), P_4(x, d) = 1/((x/d)^3 + 1.05), and P_5(x, d) = 1/((x/d)^2 + 1.05), so no distance has a probability of 0 or 1. The attacker can never completely exclude any location from the map. P_3(x, d) to P_5(x, d) decrease differently as x/d increases, with P_3(x, d) being the least uncertain (most like the step function) and P_5(x, d) being the most uncertain (least like the step function).
– Gaussian-noise function (Figure 1(f)). P_6(x, d) = 0.5 − sgn(x/d − 1) \cdot (G(0) − G(x/d − 1)) / (2G(0)), where G(x) is a Gaussian function with µ = 0, σ = 0.036. Because our threshold is d = 1.75 km, the variance of the noise is d · σ = 0.039 miles, which is the same noise variance used in the evaluation of RANDUDP [36]. Thus, we can compare the performance of our attack against RANDUDP.
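For reference, these response probability functions can be transcribed directly; the sketch below encodes P_1 and P_3–P_6 (P_2 is omitted because the shape of its bounded noise term is not specified here), with σ = 0.036 assumed for the Gaussian case as above.

```python
import math

def p1(x, d):
    # Step function: the post is returned exactly when it is within the threshold.
    return 1.0 if x <= d else 0.0

def p_poly(x, d, k):
    # Inverse-polynomial family: P3 (k = 4), P4 (k = 3), P5 (k = 2).
    return 1.0 / ((x / d) ** k + 1.05)

def p6(x, d, sigma=0.036):
    # Step function smoothed by Gaussian noise around the threshold.
    g = lambda t: math.exp(-(t * t) / (2.0 * sigma * sigma))   # Gaussian shape, mu = 0
    ratio = x / d - 1.0
    sign = 1.0 if ratio > 0 else (-1.0 if ratio < 0 else 0.0)
    return 0.5 - sign * (g(0.0) - g(ratio)) / (2.0 * g(0.0))

# Example: probabilities at half the threshold distance, d = 1.75 km.
print(p1(0.875, 1.75), p_poly(0.875, 1.75, 2), p6(0.875, 1.75))
```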


No-noise and bounded-random-noise response probability functions provide no privacy to the victim. When the system uses P_1(x, d), very few queries (in this example, 8) are needed to perfectly locate the victim post. Though the attacker cannot adjust d for triangulation, it can still shrink the potential locations of the victim by a factor of two each round, just as in the space partition attack [30, 36]. Thus, existing defenses based on a coarse-grained d cannot by themselves provide privacy. When the system uses P_2(x, d), the attacker can also quickly locate the victim with a high degree of accuracy even when bounded uncertainty is added to the step function. In this example, the attacker lowers the MAE_att to 0.0211 km within 50 queries. Thus, simply adding bounded random noise to the response also cannot preserve location privacy.

Fig. 2. North Chicago population probability distribution map (latitude vs. longitude).

We combine the 2010 block population data of Cook County, IL from the U.S. Census Bureau [3] with the borders of those blocks from the City of Chicago [1] to generate the population probability distribution map (Figure 2), and use this map as the attacker's prior probability to make the evaluation more realistic. We also evaluate the attack algorithm without the prior knowledge of the population distribution, to demonstrate a harder case for the attacker.

Furthermore, we demonstrate an example of how to gradually shrink quantized squares (Section 4.2). After the attacker has located the 0.25 km × 0.25 km square (level-1 square) containing the victim, it subdivides the square into 11 × 11 level-2 squares (each 23 m × 23 m), and locates the victim within its level-2 square.

Fig. 3. Probability distribution map vs. round number, for the response probability function P_3(x, d) = 1/((x/d)^4 + 1.05). The post is at coordinate (1.25, 0.75) km.

Fig. 4. Average entropy and MAE_att over 100 runs.

Fig. 5. Average entropy and MAE_att over 100 runs (gradual shrink).

Experimental results. We run our attack algorithm using these six probability response functions. Figure 3 shows the attacker's probability distribution on a victim post P for various numbers of rounds, which gives us a sense of the power of the attack algorithm. Figures 17(a)–17(e) in Appendix B show the same attack against the other response probability functions.

Figure 4 presents the attacker's certainty, measured using entropy and MAE_att, plotted against the number of rounds. Each point represents the average over 100 runs using different victim post positions. We include MAE_att values to present the actual error in the attacker's estimate.

Gaussian noise with low variance, such as P_6(x, d), provides privacy approximately equal to bounded random noise, and cannot prevent the attacker from locating the victim.


Against response probability functions with more noise (e.g. P_3(x, d), P_4(x, d), and P_5(x, d)), the attacker's performance somewhat slows. The attacker needs hundreds of queries to be confident in its location estimate; for example, with P_5(x, d) the attacker needs 500 queries to reduce the MAE_att to 0.103 km.

Figure 5 shows that gradually shrinking squares can help the attacker improve accuracy with reduced computation overhead. For P_1(x, d), due to the lack of noise, the attacker can accurately locate the level-2 square in a few rounds. For P_2(x, d) to P_6(x, d), because of the noise, the attacker needs hundreds to thousands of rounds to locate the level-2 square, depending on the noise level. We show only the first 10000 rounds to show the trend.

Figures 4 and 5 show that our attack can locate a victim against P_6(x, d) to an accuracy of 30 meters using about 100 queries, while the authors of RANDUDP [36] show that RANDUDP needs 1400 queries to achieve an error of 37.4 meters against a system with the same noise P_6(x, d), demonstrating the increased power of our attack algorithm.

Evaluations for attackers without prior knowledge of the population distribution also show similar results to those presented above, which suggests that the attack algorithm does not rely significantly on prior knowledge.

Table 1. User per-post absolute error (PAE_user).

P(x, d)        | P_1 | P_2   | P_3   | P_4   | P_5
PAE_user (km)  | 0   | 0.042 | 0.508 | 0.804 | 1.224

Our results confirm Wang et al.'s [43] observation that increasing uncertainty in the response probability cannot provide strong privacy. Moreover, introducing additional uncertainty in the response probability increases the user error PAE_user, as shown in Table 1, which is calculated using Formulas 2–6.

Wang et al. [43] proposed to cap the number of daily queries from each user. For example, if an attacker can make at most 100 queries per day, then he can only lower the MAE_att to 0.5 km for P_5(x, d). However, the attacker can use a distributed attack with multiple accounts (i.e. Sybil accounts [44]) to make more queries. Although techniques exist to detect dummy users [17], such techniques are better equipped to handle thousands of such accounts; our attack only needs dozens of accounts, which is hard to detect using existing schemes. In the examples above, 10 accounts are sufficient to locate a victim within a 0.25 km × 0.25 km square from a 10.25 km × 10.25 km square, and at most 100 accounts are enough to locate it to hundreds of square meters. Thus, a limited query rate cannot guarantee privacy against a reasonable adversary.

5 Defense Approach

In this section, we propose a simple and effective approach that uses consistent randomness to limit attacker effectiveness. We evaluate this approach experimentally in Section 6.

Adversary model. We extend our adversary model from Section 3.3 to admit an attacker that can make unlimited queries. Such an attacker reflects a worst-case scenario, so a protocol that can resist such an adversary is also secure against a more limited adversary.

5.1 Approach Description

Our goal is to limit the information that can be gained by an attacker with an unlimited number of queries by limiting the number of queries that give unique information, through three techniques: map quantization, consistent response, and hierarchical subdivision.

Map quantization. First, as in [36], we quantize the map M into squares of equal size, as illustrated in Figure 6(a). Any queries or posts made within a quantized square S_q are assigned a location equal to the center of that square. When the server responds to a query Q, it computes the distance between the centers of the squares of the query Q and post P, instead of the distance between the query and the post. This technique limits the granularity of the attacker's knowledge: the attacker can at most learn the square in which the victim is located; the attacker cannot learn the location within that square. Moreover, this technique also limits the number of distinct queries that the attacker can make; previously, each point in space was a distinct query, but now all queries within a square are identical.

Consistent response. The technique of consistent response works together with the limited number of distinct queries to limit the total information leakage. In consistent response, if two queries (by possibly different nodes) are made within the same square, the same set of posts is returned for those queries. With this technique, an attacker gains information only from the first query in each square. Consistent response is implemented using a random oracle [16] to return a random response according to the response probability function for the first query at this square, but to return the same result for future queries at this square. If a map M has |M| squares, an attacker gets a maximum of |M| independent responses, no matter how many queries the attacker makes. Moreover, because the response is consistent between queriers, additional attackers do not provide additional information.
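One way to realize the random-oracle behavior described above is to derive the "random" draw from a keyed hash over the post and the query square, so every query from the same square sees the same decision for the same post. The use of HMAC-SHA256 and the names below are implementation choices for this sketch, not the paper's specification.

```python
import hashlib
import hmac

SERVER_SECRET = b"server-side secret key"   # hypothetical; kept off the client

def consistent_draw(post_id: str, square_id: str) -> float:
    # Deterministic value in [0, 1) keyed on (post, query square): repeated or
    # Sybil queries from the same square always see the same draw.
    mac = hmac.new(SERVER_SECRET, f"{post_id}|{square_id}".encode(), hashlib.sha256)
    return int.from_bytes(mac.digest()[:8], "big") / 2**64

def consistent_include(post_id: str, square_id: str, return_probability: float) -> bool:
    # Include the post iff the consistent draw falls below the response probability
    # computed from the square centers.
    return consistent_draw(post_id, square_id) <= return_probability
```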


Fig. 6. Map quantization examples: (a) map quantization, (b) hierarchical subdivision (example of n = 3).

Consistent response for a multi-post system. In practice, a victim may make multiple posts at the same location and approximately the same time; an attacker may use internal and external information to infer that the same user is responsible for each such post. The attacker can then join the information it learns about all posts to better infer the victim's location. In Section 6, we show that the multi-post case can severely compromise location privacy. Such privacy loss in a multi-post environment is common for defenses that use noisy location data (as in Andrés et al. [14]). However, consistent response has the advantage that the system can cluster multiple posts from the same user that are close in time and location, and treat all such posts consistently for each query. We evaluate this approach in Section 6, but leave parameter choices and further development as future work.

Monotonic response for a multi-threshold system. The attacker can gain more information from systems with multiple query thresholds than from systems with a single threshold, especially when responses to queries with different thresholds are independent. To limit the information gain, we propose a monotonic response that ensures consistency between queries of different distances. In particular, if a query at threshold a (< b) returns P, then a query at threshold b will also return P. Monotonic responses correlate responses for different thresholds, reducing the attacker's potential information gain. An n-threshold system has n response probability functions, P^(i)(x, d_i) (1 ≤ i ≤ n), for n thresholds (d_i, 1 ≤ i ≤ n). We choose sorted thresholds so that d_i < d_j for i < j, and then for 1 ≤ i < j ≤ n and 0 < x, we require P^(i)(x, d_i) < P^(j)(x, d_j). For an incoming query of d_i, we denote the smallest queried threshold that is larger than d_i as d_b, and denote the largest queried threshold that is smaller than d_i as d_a. Then the system returns a monotonic response as follows:
1. If d_b exists and its response does not include a post P, we do not include P for d_i either.
2. Otherwise, if d_a exists and its response includes P, we include P for d_i as well.
3. Otherwise, we determine whether or not to return P according to the response probability function P(x) = (P^(i)(x, d_i) − P^(a)(x, d_a)) / (P^(b)(x, d_b) − P^(a)(x, d_a)). If d_a does not exist, P^(a)(x, d_a) is replaced with 0; if d_b does not exist, P^(b)(x, d_b) is replaced with 1.

The quantity (P^(i)(x, d_i) − P^(a)(x, d_a)) / (P^(b)(x, d_b) − P^(a)(x, d_a)) is Pr(Rx_i | x, Rx_a, \overline{Rx}_b), where Rx_k denotes that P was returned to a query of threshold d_k, and \overline{Rx}_k denotes that P was not returned, where k can be i, a, or b.

Hierarchical subdivision. Although map quantization and consistent response can bound the information learned by attackers, the information available to attackers increases with the map area. To reduce the information available in a large map, we introduce hierarchical subdivision, which uses the structure of an n-by-n base map M_B to hierarchically quantize the map M. We divide our map M_B into squares S_0 as before, and bundle adjacent groups of n × n squares into a higher-level square S_1. We repeat the joining of n × n squares of S_i to get higher-level squares S_{i+1}, and stop when S_m covers M. Figure 6(b) shows an example where M_B is a 3-by-3 map.

With hierarchical subdivision, our responses are consistent over potentially larger squares. Specifically, when we receive a query Q at X_query and consider whether to return a post P at X_post, we first determine the hierarchical squares S_{i,q} and S_{i,p} that contain the query and the post respectively. Then we find the highest-level square in which the post and query are in different squares (that is, the maximum i for which S_{i,p} ≠ S_{i,q} and S_{i+1,p} = S_{i+1,q}). We then use the centers of those squares to determine the response probability; that is, we compute the response probability based on the distance between the centers of X_{S_{i,p}} and X_{S_{i,q}}.

Figure 6(b) shows a 3-by-3 base map, with a query at X_1 and two posts at X_2 and X_3. Because X_2 and X_1 are not in the same S_1, but are in the same S_2, hierarchical subdivision calculates the response probability function using the center of X_1's S_1 and the center of X_2's S_1. Likewise, X_1 and X_3 are not in the same S_2, but are in the same S_3; therefore, the center of X_1's S_2 and the center of X_3's S_2 are used to calculate the response probability.
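The level selection described above can be sketched as follows: squares at level i have side base_cell · n^i, and the server measures distance between the centers of the level-i squares at which the query and the post first diverge. The grid alignment, parameter values, and names here are simplifying assumptions of this sketch.

```python
def square_index(coord, level, base_cell, n):
    # Index of the level-`level` square containing a 1-D coordinate.
    return int(coord // (base_cell * n ** level))

def centre(coord, level, base_cell, n):
    side = base_cell * n ** level
    return (int(coord // side) + 0.5) * side

def response_distance(x_query, x_post, base_cell=0.25, n=3, max_level=4):
    # Find the highest level i at which query and post lie in different squares
    # (assuming they share the top-level square S_m), then return the distance
    # between the centers of those level-i squares.
    for i in range(max_level, -1, -1):
        differs = any(square_index(q, i, base_cell, n) != square_index(p, i, base_cell, n)
                      for q, p in zip(x_query, x_post))
        if differs:
            cq = [centre(c, i, base_cell, n) for c in x_query]
            cp = [centre(c, i, base_cell, n) for c in x_post]
            return ((cq[0] - cp[0]) ** 2 + (cq[1] - cp[1]) ** 2) ** 0.5
    return 0.0   # query and post share the same level-0 square

# Example: a query and a post in adjacent level-0 squares of a 3-by-3 base map.
print(response_distance((0.1, 0.1), (0.3, 0.1)))
```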


Theorem 1. An attacker's accuracy on the whole map M under hierarchical subdivision is no greater than the attacker's accuracy on the base map M_B without hierarchical subdivision. Specifically, given an attacker probability distribution on the entire map Pr(S_{0,p}), and given an attacker probability on the base map Pr(S_{0,p})_{B,max}, Pr(S_{0,p}) ≤ Pr(S_{0,p})_{B,max} for all S_{0,p}.

Proof sketch: In M, for any post P in S_{0,p},

Pr(S_{0,p}) = Pr(S_{m,p}) \cdot \prod_{0 ≤ i ≤ m−1} Pr(S_{i,p} | S_{i+1,p})    (19)

In hierarchical subdivision, any query made outside S_{1,p} provides no information about sub-squares within S_{1,p}, because we respond to such queries using the center point of S_{1,p}, regardless of the actual S_{0,p}. Thus only queries inside S_{1,p} provide any information on Pr(S_{0,p} | S_{1,p}). But S_1 is the base map M_B, so Pr(S_{0,p} | S_{1,p}) ≤ Pr(S_{0,p})_{B,max}, and

Pr(S_{0,p}) ≤ Pr(S_{m,p}) \cdot Pr(S_{0,p})_{B,max} \cdot \prod_{1 ≤ i ≤ m−1} Pr(S_{i,p} | S_{i+1,p}) ≤ Pr(S_{0,p})_{B,max}    (20)

which completes the proof. □

Theorem 1 shows that hierarchical subdivision can expand the world size while retaining an equal level of location privacy. Moreover, hierarchical subdivision makes thresholds larger than the base map size useless to attackers, thus reducing the number of thresholds available to attackers.

Scalability. To implement response consistency, we can either store the consistent query responses for each square in the map M, or use a hash function to compute the response for each post for each query. Hierarchical subdivision allows us to optimize memory usage in the former case, where each response is stored.

To avoid visiting every post for each query, we pre-compute the result for each S_i at which a distinct query may be made. In particular, our data structure is a quadtree [18], where the root node represents the whole map M, each leaf node represents a distinct S_0 square, and a node at height i represents a distinct middle-level square S_i. Each node's children are the subsquares at the next lower S_i level. Each node in the tree contains a list of posts that should be returned for each query within that square. Thus, when servicing a query for a particular S_0 square, the system returns all posts in the leaf node corresponding to the query location, and in each ancestor of that leaf node.

When we process a post P, we consider each square S_i at which a distinct query may be made, and determine the set of squares at which P is returned. We then place the key of P at the node corresponding to each square S_i for which P is to be returned.

The quadtree takes O(N + D·P) storage, and inserting each post takes O(D) time, where P is the number of posts, N is the number of S_0 squares in the map M, and D is the number of unique queries possible for each post; D is O(s·log_s N) (where s is the number of S_0 squares in the base map M_B). For example, when s = 25 (M_B is a 5 × 5 base map) and N = 390625 (if S_0's area is 0.25 km × 0.25 km, then the area of the whole map M is 156.25 km × 156.25 km, which is much larger than Chicago), D is only 100. Because D is small, our defense is scalable to a system with a large map and a large number of posts. The above data structure demonstrates that our scheme can be efficiently implemented; further improvements are beyond the scope of this paper.
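A minimal version of this index might look like the sketch below. The paper's structure is a quadtree generalized to the n × n base-map fan-out; here nodes are simply addressed by (level, ix, iy) in a dictionary, and the enumeration of the candidate query squares for a post (the D squares discussed above) is left to the caller, so this is an illustration of the lookup pattern rather than a full implementation.

```python
from collections import defaultdict

class HierarchicalPostIndex:
    # Posts are pre-assigned to the query squares at which they should be
    # returned; a query collects posts from its leaf square and every ancestor.

    def __init__(self, base_cell=0.25, n=3, max_level=4):
        self.base_cell, self.n, self.max_level = base_cell, n, max_level
        self.node_posts = defaultdict(list)

    def node(self, coord, level):
        side = self.base_cell * self.n ** level
        return (level, int(coord[0] // side), int(coord[1] // side))

    def insert(self, post_key, candidate_query_nodes, should_return):
        # `candidate_query_nodes` enumerates the distinct query squares relevant
        # to this post; `should_return(node)` encapsulates the consistent draw.
        for query_node in candidate_query_nodes:
            if should_return(query_node):
                self.node_posts[query_node].append(post_key)

    def query(self, x_query):
        # Return every post stored at the query's level-0 square or any ancestor.
        results = []
        for level in range(self.max_level + 1):
            results.extend(self.node_posts[self.node(x_query, level)])
        return results
```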

5.2 Evaluation Metrics

Privacy level metrics. Because we quantize the map M into squares S_q, attackers can only localize any post P to a quantized square S_q, and cannot know the precise location of P in S_q. If an attacker can determine that post P is in the square with center X̂ with probability Pr_post(X̂), and the post P is uniformly distributed in a square, then we calculate MAE_att for P as follows:

MAE_att = (1 / L^2) \sum_{X̂ \in M} \iint_{x_min ≤ x ≤ x_max, y_min ≤ y ≤ y_max} |X̂ − (x, y)| \cdot Pr_post(X̂) dx dy    (21)

where (x, y) is the coordinate of the victim post P, L is the length of one side of the quantized square S_q, x_min and y_min are the minimum x and y in the square of post P respectively, and x_max and y_max are the maximum x and y in the square of post P respectively. The derivation is in Appendix A.

Our defense increases MAE_att in two ways: (1) map quantization makes the location of users coarse-grained; (2) consistent response limits the information gain for attackers so that they cannot accurately estimate the victim's square. To isolate the contributions of each factor, we define the quantized mean absolute error QMAE_att to represent the error in an attacker's knowledge of the square of post P:

QMAE_att = \sum_{X̂ \in M} |X̂ − C(X_post)| \cdot Pr_post(X̂)    (22)

where C(X) is the center coordinate of X's square.

Accuracy level metrics. We also change the calculation of the per-post absolute error PAE_user for legitimate users. For a post P at X_post and a query at X_query in the quantized map M, the service provider decides whether to return P based on the distance between C(X_post) and C(X_query) and the probability response function P(C(X_query), C(X_post), d). Therefore, we replace P(X_query, X, d) with P(C(X_query), C(X), d) in Formulas 2–6. The derivation is in Appendix A.

6 Defense Evaluation

We experimentally evaluate our defense using the North Chicago map (Figure 2), across different scales, square sizes, and response probability functions.

6.1 Experiment Settings

Response probability functions. In Section 4.3, we showed that the step response probability function P_1(x, d) and its noise-added variants P_2(x, d) and P_6(x, d) cannot defend against our new attack, so in our evaluation we focus on P_3(x, d), P_4(x, d), and P_5(x, d) (Figures 1(c)–1(e)) to evaluate our defense.

Map scenarios. We evaluate our defense with and without hierarchical subdivision. In experiments without hierarchical subdivision, discussed in Section 6.3, the scale of the map M and the size of the quantized squares are key factors in defense performance. Therefore, we consider square maps with width ranging from 1.25 km to 10.25 km. We put the origin at the center of the map, so an n × n map has x- and y-axis values in [−n/2, n/2]. For each map, we perform experiments with the following quantized square sizes L: 0.08325, 0.1, 0.125, 0.1665, 0.25, 0.5, and 0.75 km. We fix d = 0.375 km as the distance threshold. In experiments with hierarchical subdivision, we use the same settings of square sizes, threshold, and response probability functions, but we fix the map to 10.25 km wide and vary the base map size from 1.25 km to 10.25 km.

We use the real population density as the attacker's prior knowledge to make the evaluation of defense performance more realistic, and we use a flat population density to show more generic trends of user error and privacy level as map scale and square size vary.

User per-post absolute error (PAE_user). To calculate the PAE_user, we assume all posts are uniformly distributed across the map M. We chose a density of 100 posts per square unit of map area (more posts in a map result in more accurate computations of error due to the law of large numbers), consider a query at X_query = (0, 0), and use Formula 26 to calculate PAE_user for each map scale and quantized square size. We also calculate PAE_user under hierarchical subdivision for a 10.25 km-wide map using various base map scales, to evaluate how hierarchical subdivision influences PAE_user on a fixed-size map. WLOG, we use threshold d = 0.375 km here.

Evaluation for privacy level. We randomly sample 10 victim posts within the smallest test map, with size 1.25 km × 1.25 km, and calculate the average across these 10 posts. For each map scale and quantized square size, we run our attack algorithm (described in Section 4) to locate the victim post P, where queries are answered using our defense without hierarchical subdivision. We also evaluate our defense with hierarchical subdivision, fixing the map size at 10.25 km wide and varying the size of the base map. We also compare with a control that omits consistent response, so the attacker can gain information through multiple queries from the same location. Because of quantization, an attacker can at best learn the square in which the post was made. We calculate the MAE_att and QMAE_att using Formulas 24–25. We run the experiments and the control tests 100 times and plot the averages.

Evaluation against multiple posts. We place multiple posts of the same victim at X_post = (0.5, 0.5) km. The attacker queries at multiple locations for posts within a fixed threshold, and the system returns each relevant post according to the probability response function. Our evaluation uses the 5.25 km × 5.25 km North Chicago map and its population density as the attacker's prior knowledge.

Evaluation against multiple thresholds. We evaluate both non-monotonic- and monotonic-response defense systems on the 5.25 km-wide North Chicago map, and show their performance under different thresholds from 0.375 to 2.25 km. We choose different thresholds because hierarchical subdivision limits the number of thresholds available to the attacker. We still place the victim at X_post = (0.5, 0.5) km.
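The privacy metrics of Section 5.2 used in this evaluation can be computed directly from the attacker's final distribution over square centers; the sketch below implements QMAE_att (Formula 22) and MAE_att (Formula 21, with the inner integral approximated by sampling), under the assumption that the attacker's distribution is given as a dictionary over square centers.

```python
import math
import random

def qmae_att(distribution, post_square_center):
    # Formula 22: expected distance from the attacker's estimated square center
    # to the true square center of the post.
    return sum(math.dist(center, post_square_center) * p
               for center, p in distribution.items())

def mae_att(distribution, post_square_center, L, samples=10000):
    # Formula 21, approximated by sampling the post's position uniformly within
    # its quantized square of side L (the closed form integrates this exactly).
    cx, cy = post_square_center
    total = 0.0
    for _ in range(samples):
        x = cx + random.uniform(-L / 2, L / 2)
        y = cy + random.uniform(-L / 2, L / 2)
        total += sum(math.dist(center, (x, y)) * p for center, p in distribution.items())
    return total / samples

# Example: attacker is 60% sure of the correct square and 40% sure of a neighbor.
dist = {(0.125, 0.125): 0.6, (0.375, 0.125): 0.4}
print(qmae_att(dist, (0.125, 0.125)), mae_att(dist, (0.125, 0.125), L=0.25, samples=2000))
```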


6.2 Experiment Results

Figures 7(a)–7(d) show the user per-post absolute error (PAE_user) in our defense without hierarchical subdivision. To show generic trends, we assume a flat population density. As Figures 7(b)–7(d) show, for a given response probability function P(x, d), PAE_user is mainly determined by the map scale, not the square size L. For a fixed map scale, PAE_user is mostly constant as square size varies; when the square size starts to reach or exceed the threshold d, PAE_user increases, and edge effects cause PAE_user to fluctuate. Thus, the threshold loosely bounds the square size. Figure 7(a) shows PAE_user when L = 0.25 km. This figure suggests that the response probability function P(x, d) is also a key factor in PAE_user. In our experiments, each of our probability response functions decreases as O(1/(x/d)^k); when such functions have a long tail (such as k = 2), PAE_user is unbounded with increasing map scale. Formulas 26–30 imply that FNE and |PA| are constant in k, and that E(|FP|) and FPE are bounded (with respect to map scale) if k > 2 and k > 3 respectively. Therefore, when k > 3, PAE_user converges; when k ∈ (2, 3], PAE_user is sublinear in map scale; when k ≤ 2, PAE_user increases linearly in map scale. Moreover, when PAE_user is bounded, it is primarily influenced by map size and secondarily (and to a much smaller extent) by square size. At larger map sizes, as PAE_user approaches its bound, increasing the map size has little impact, and square size becomes the only factor. Although PAE_user may vary when X_query ≠ (0, 0), our results still show the trend of PAE_user across different parameter choices.
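A back-of-the-envelope sketch of why these exponents appear, assuming a flat post density \rho, a query at the origin, and replacing the sums in Formulas 27–30 by integrals over a disc of radius R (roughly half the map scale):

    E(|FP|) \approx \int_{d}^{R} P(r,d)\,\rho\,2\pi r\,dr \;\sim\; \int_{d}^{R} r^{1-k}\,dr,
    FPE \approx \int_{d}^{R} (r-d)\,P(r,d)\,\rho\,2\pi r\,dr \;\sim\; \int_{d}^{R} r^{2-k}\,dr,

so E(|FP|) stays bounded as R grows only if k > 2 and FPE only if k > 3, while FNE and |PA| involve only the posts within distance d and therefore do not grow with R.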

Figure 8 shows PAE_user for a 10.25 km-wide square map in our defense with hierarchical subdivision, for varying sizes of the base map. For any given probability response function, PAE_user is approximately equal to the PAE_user of a 10.25 km-wide square map regardless of base map size (cf. Figure 7(a)). Due to space constraints, we show only results for base map widths of 0.75 km, 1.25 km, and 3.25 km; the results for other base map sizes are similar. Our results demonstrate that for any given response probability function, PAE_user is primarily determined by the scale of the whole map M, regardless of the use of hierarchical subdivision. Thus, further experiments examine PAE_user without hierarchical subdivision.

Fig. 7. Per-post user absolute error (PAE_user) in maps with various scales and quantized square sizes. The "scale" and "sq size" denote the length of the side of the map and square respectively (the unit is km). These experiments do not include hierarchical subdivision. Panels: (a) PAE_user (0.25 km square size); (b) PAE_user of P3(x, d); (c) PAE_user of P4(x, d); (d) PAE_user of P5(x, d).

Fig. 8. Per-post user absolute error under our defense with hierarchical subdivision. The map size is 10.25 km × 10.25 km, and we vary the size of the base map (panels show 0.75 × 0.75, 1.25 × 1.25, and 3.25 × 3.25 km base maps M_B) and the quantized squares.

Figures 9(a) and 9(b) present attacker entropy and quantized MAE (QMAE_att) in maps with different scales, using a fixed square size L = 0.25 km. The attacker has prior knowledge of the real population density. We show only QMAE_att as a function of round number for the response probability function P5(x, d) here; the results are similar for P3(x, d) and P4(x, d) (Figures 16(a) and 16(b) in Appendix B). These results show that our defense provides a significant privacy improvement. Figure 9(b) shows that without our defense, the attacker can accurately learn the quantized square containing the victim post, with nearly zero entropy and QMAE_att, whereas with our defense both the entropy and QMAE_att remain well above zero: the attacker can never accurately locate the square containing the victim.

Figure 9(a) shows how QMAE_att decreases across rounds. Without our defense, the attacker can decrease the entropy to zero in at most a few hundred rounds (queries).
In contrast, against our defense, the attacker cannot gain more information to lower the entropy after a few rounds (queries). In this experiment we show how soon the attacker can locate the square of the victim if our defense is not applied; in practice, the attacker can easily locate the victim's location even more accurately, as described in Section 4.2. Furthermore, these figures show that the entropy and QMAE_att of experiments with hierarchical subdivision are similar to those without hierarchical subdivision, as Theorem 1 predicts. The results also show that smaller base maps result in increased privacy under hierarchical subdivision, with the best improvements when the scale of the base map is smaller than the threshold, as shown by Figures 18(a)–18(b) in Appendix B. Because of Theorem 1, we consider only the defense without hierarchical subdivision in the remaining experiments.

Fig. 9. The average attacker entropy and QMAE (QMAE_att) over North Chicago maps with different scales, using a fixed square size L = 0.25 km. We choose a threshold d = 0.375 km, and average the QMAE_att across 10 victims randomly sampled from the map with scale 1.25 km × 1.25 km. In the multi-post case, we set the victim at X_post = (0.5, 0.5) km. HS denotes runs on a 10.25 km-wide square map with hierarchical subdivision; the map scale reflects the size of the base map. Each point represents the average of 100 runs. Panels: (a) case of P5(x, d) = 1/((x/d)^2 + 1.05); (b) average final entropy and QMAE (single-post case); (c) average final QMAE (multi-post case).

Figure 9(c) shows the final QMAE_att for increasing numbers of same-location posts from one user. These results show that an attacker can quickly reduce its error when the system treats each such post independently; at 5 posts, the user has no location privacy. Consistently responding to each query provides identical privacy to the one-post scenario.

Figures 10(a) and 10(b) show the QMAE_att against a system with multiple thresholds. Without monotonic response, the final QMAE_att drops quickly in systems with more thresholds. With monotonic response, the attacker's improvement in error is not as dramatic, and almost no marginal improvement is found for the fourth and subsequent thresholds. The monotonic defense has a greater effect with noisier response probability functions.

Fig. 10. Average attacker QMAE and entropy (over 1000 runs) against monotonic-response and non-monotonic-response systems with multiple thresholds. Panels: (a) average final QMAE_att; (b) QMAE_att of P5; (c) average final QMAE_att and entropy in a two-threshold system.

Figure 10(c) further shows how the interval between thresholds affects attacker information gain and final accuracy. The system has thresholds d1 and d2, and the x-axis shows d2 − d1, where d1 is fixed at 0.375 km. In the monotonic-response system, the attacker QMAE_att and entropy are high when the two thresholds are close, because two close thresholds have highly correlated responses. On the contrary, in the non-monotonic-response system, closer thresholds reduce attacker error, because each independent query gains more information, but farther thresholds reduce the information available from the second query (intuitively, if d2 = ∞ and the second query returns all posts, then it provides no additional information even when its responses are drawn independently).
At greater d2 − d1, the correlation between the two queries in the monotonic-response system weakens, causing the monotonic-response curve to converge to the non-monotonic-response curve, and both increase in d2 − d1. The reduced correlation also explains why additional thresholds in the range [0.375, 2.25] km do not help attackers obtain a lower QMAE_att against the monotonic-response system of Figure 10(a): more thresholds in a fixed range (limited by hierarchical subdivision) result in higher correlation between them.
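To illustrate why correlated responses across thresholds give the attacker so little extra information, here is one plausible way to realize monotonic responses: reuse a single consistent draw per (user, post, query square) and compare it against P(x, d) for every threshold. This is an illustrative sketch, not the exact construction of our defense; the key format, helper names, and example values are assumptions.

```python
import hashlib

def consistent_uniform(key: str) -> float:
    # Deterministic, random-oracle-style draw in [0, 1): the same key
    # always yields the same value, so repeated queries see one answer.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def respond(user, post_id, query_square, p_of_d, thresholds):
    # One draw is reused for every threshold, so the answers are fully
    # correlated; if p_of_d(d) is nondecreasing in d, they are also
    # monotone: a post returned at d1 < d2 is always returned at d2.
    u = consistent_uniform(f"{user}|{post_id}|{query_square}")
    return {d: u < p_of_d(d) for d in thresholds}

# Example with a P5-style curve at a fixed (quantized) distance x:
x = 0.5
p5 = lambda d: 1.0 / ((x / d) ** 2 + 1.05)   # increases with d
print(respond("alice", 42, (2, 3), p5, [0.375, 0.75, 1.25, 2.25]))
```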

To explore the privacy level attained under different parameterizations of our defense, we performed extensive experiments under flat population density, and plot the results for P5(x, d) in Figures 11(a)–11(c). The results for the other response probability functions show similar trends. Figures 11(a) and 11(b) show MAE_att and QMAE_att in maps of different scales and quantized square sizes (labeled "Sq Size" in the figures). QMAE_att has a trend similar to that of MAE_att, showing that the attacker error is mostly attributable to our defense rather than to the map quantization. Moreover, MAE_att and QMAE_att are mainly determined by two factors: (1) the square size and (2) the uncertainty of P(x, d). Attacker error increases with larger square sizes and with more uncertain probability response functions. On the other hand, the scale of the map does not significantly influence attacker error. However, if the scale of the map is too large and hierarchical subdivision is not used, the privacy level will be low, because Figures 9(b) and 11(a)–11(b) show that the privacy level decreases with increasing map size. Thus, we need hierarchical subdivision to construct a large-scale map from a small base map while keeping the privacy level the same as in the small base map. Figure 11(c) shows our privacy-to-user-error ratio (MAE_att/PAE_user). A large ratio means the system can achieve more privacy (MAE_att) per unit of lost accuracy. This figure shows that as the quantized square size increases, and as the map scale decreases, the ratio increases.

Fig. 11. The average attacker MAE (MAE_att), quantized MAE (QMAE_att), and privacy-to-user-error ratio (MAE_att/PAE_user) over varying map scale (no hierarchical subdivision) and quantized square size. Each point represents the average of 100 runs with flat population density. Panels: (a) MAE_att of P5; (b) QMAE_att of P5; (c) MAE_att/PAE_user of P5.

6.3 Tuning Our Defense

Response probability function. A less certain response probability function increases the privacy level, but at the cost of user error. Applications requiring a relatively large map should use a probability response function that decreases as O(1/(x/d)^k) for k > 3, so that user error is bounded regardless of map size.

Square size and map scale. Section 6.2 shows that, for a given response probability function where PAE_user is not bounded, the privacy level (MAE_att) is most strongly correlated with the quantized square size, whereas the user error (PAE_user) is most strongly correlated with map scale; the relative independence of map scale and square size allows the system designer to maximize the privacy level for a given threshold d (because the threshold loosely bounds the choice of usable square sizes), while choosing a map scale that keeps an acceptable user error and user experience (because increasing a user's map scale potentially increases the usefulness of the system, but also increases the user's average error). For a given response probability function where PAE_user is bounded, user error is only slightly affected by square size, while privacy is strongly affected by it; in this case, the system designer must consider the tradeoff between MAE_att and PAE_user to choose a proper square size. The threshold d of a real system may be adjustable by the user, so the square size should be smaller than the granularity of the threshold d. For example, Whisper [8] uses 1 km, 5 km, 15 km, and so forth as threshold choices, so Whisper would likely not want to use squares larger than 1 km².
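As a concrete (and deliberately simplistic) reading of this guidance, a deployment could derive its square size from the finest threshold it exposes; the helper below and the 0.5 fraction are illustrative assumptions, not values prescribed by our analysis.

```python
def pick_square_size(threshold_menu_km, fraction=0.5):
    # Keep the quantized square side below the finest threshold
    # granularity, as suggested above.
    return fraction * min(threshold_menu_km)

# Whisper-style menu of 1, 5, 15 km thresholds -> 0.5 km squares.
print(pick_square_size([1.0, 5.0, 15.0]))
```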

Hierarchical subdivision. Both our theorem and our experimental results show that hierarchical subdivision makes our defense scalable, in that it extends the privacy protection of a small base map to an arbitrarily sized map while having minimal impact on user experience. Nonetheless, the map size impacts user accuracy, and an excessively large base map could result in a low privacy level.

Multiple thresholds. If the LBSN system requires multiple thresholds, we recommend that the designer use fewer thresholds and keep them smaller than the scale of the base map. For a typical multi-threshold setting of 0.3, 1, 5, and 20 miles (from Yelp [9]), we can set the base map to 2 miles wide so that only the first two thresholds are useful to attackers. For thresholds smaller than the base map scale, privacy improves with closer thresholds.

Periodically updating responses. Consistent response limits the information gained by attackers, but it introduces a usability problem for users who do not move between squares: if we never update our consistent response, a user who does not move may miss some nearby posts. We therefore allow a system to periodically update its responses. Naturally, there is a tradeoff between the update period and the privacy level. Because attackers gain new information each time the random oracle updates its response, the shorter the period, the more quickly attackers can locate users. Also, the lifetime of a post affects the number of times an attacker's query can be used to locate it, so systems with short-lived posts can use shorter periods than systems with long-lived posts. For example, Yik Yak [10] appears to return posts for only 12 hours, and could update its responses every 4 hours with minimal privacy loss, since each query location would gain information on the location of a post only 3 times before the post expires.
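One simple way to realize such periodic updates is to fold an epoch index into the key of the consistent draw, so answers stay fixed within one period and refresh afterwards. This is a hedged sketch only: the keying scheme, the SHA-256 choice, and the parameter names are assumptions for illustration, not the construction used by our system.

```python
import hashlib, math

def epoch_consistent_uniform(user, post_id, square, now_h, period_h):
    # Fixed within one update period; refreshed when the epoch changes.
    epoch = math.floor(now_h / period_h)
    key = f"{user}|{post_id}|{square}|{epoch}"
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

# A 12-hour post lifetime with a 4-hour update period gives an attacker
# at most ceil(12 / 4) = 3 independently drawn answers per query square.
print(math.ceil(12 / 4))  # 3
```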
Population density: sparse vs. dense. To keep an equivalent privacy level, we should choose a larger square and threshold in a sparsely populated map, and a smaller square size and threshold in a denser map. For example, the population density is about 500/km² in the Chicago suburbs, but 50,000/km² in downtown Chicago. Then 0.1 × 0.1 km² squares in downtown Chicago contain 500 people on average, achieving "500-anonymity", whereas we need larger 1 × 1 km² squares to keep the same "500-anonymity" privacy level in the Chicago suburbs.

7 Conclusion

LBSN applications face strong threats to location privacy. We investigated current LBSNs and found a powerful attack which can quickly locate a victim using relatively few queries, defeating previously proposed defense and detection mechanisms. To limit the ability of such attackers, we proposed a novel randomness-consistent defense that quantizes the map into squares and consistently responds to queries from the same quantized square. Our defense prevents an attacker from accurately learning users' locations, and can be scaled to a large map using hierarchical subdivision without loss of privacy or significant detriment to user experience.

The results of our comprehensive simulations show the effectiveness of our strategy, and suggest that in our system privacy and accuracy can be easily tuned. We can bound the user error (PAE_user) independent of map size when using a response probability function without a long tail, and can then tune the quantized square size to manage the tradeoff between privacy level (MAE_att) and user error (PAE_user) for a given threshold d (because the threshold loosely bounds the choice of usable square sizes). We found that the privacy level is primarily influenced by the quantized square size, while the user error is not. Although the user error is proportional to the scale of the map when using a response probability function with a long tail, the privacy level and the user error can be almost independently tuned by the quantized square size and the scale of the map respectively, so we can maximize privacy for any given distance threshold and choose a map scale that provides the desired user error. Though randomness consistency prevents a user from seeing some posts if that user stays in the same square, we believe this bias is acceptable in applications that do not require users to see all nearby posts. Moreover, the service provider can update its consistent response periodically to trade privacy for usability.

8 Acknowledgments

We thank the reviewers for their input. This research was supported in part by NSF under grant CNS-0953600. We also thank Hui Shi for helping us find and process the data of the borders of census blocks from the City of Chicago [1].


References

[1] Boundaries - Census Blocks (Chicago, IL). http://www.cityofchicago.org/city/en/depts/doit/dataset/boundaries_-_censusblocks.html.
[2] Facebook. Online social mobile application. https://itunes.apple.com/us/app/facebook/id284882215?mt=8.
[3] Fact Finder. http://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml.
[4] Hinge. Online dating mobile application. http://hinge.co.
[5] Match. Online dating mobile application. http://www.match.com.
[6] Skout. http://www.skout.com.
[7] Tinder. Online dating mobile application. https://www.gotinder.com.
[8] Whisper. Anonymous mobile social network. https://whisper.sh.
[9] Yelp. http://www.yelp.com.
[10] Yik Yak. Anonymous mobile social network. http://www.yikyakapp.com.
[11] Police: Thieves robbed homes based on Facebook, social media sites. http://www.wmur.com/Police-Thieves-Robbed-Homes-Based-On-Facebook-Social-Media-Sites/11861116, 2010.
[12] New social app has juicy posts, all anonymous. http://www.nytimes.com/2014/03/19/technology/new-social-app-has-juicy-posts-but-no-names.html?_r=0, 2014.
[13] G. Ananthanarayanan, V. N. Padmanabhan, L. Ravindranath, and C. A. Thekkath. Combine: leveraging the power of wireless peers through collaborative downloading. In Proceedings of the 5th international conference on Mobile systems, applications and services, pages 286–298. ACM, 2007.
[14] M. E. Andrés, N. E. Bordenabe, K. Chatzikokolakis, and C. Palamidessi. Geo-indistinguishability: Differential privacy for location-based systems. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 901–914. ACM, 2013.
[15] C. A. Ardagna, M. Cremonini, E. Damiani, S. D. C. Di Vimercati, and P. Samarati. Location privacy protection through obfuscation-based techniques. In Data and Applications Security XXI, pages 47–60. Springer, 2007.
[16] M. Bellare and P. Rogaway. Random oracles are practical: A paradigm for designing efficient protocols. In Proceedings of the 1st ACM conference on Computer and communications security, pages 62–73. ACM, 1993.
[17] G. Danezis and P. Mittal. SybilInfer: Detecting sybil nodes using social networks. In NDSS. San Diego, CA, 2009.
[18] R. A. Finkel and J. L. Bentley. Quad trees: A data structure for retrieval on composite keys. Acta Informatica, 4(1):1–9, 1974.
[19] B. Gedik and L. Liu. Location privacy in mobile systems: A personalized anonymization model. In Distributed Computing Systems, 2005. ICDCS 2005. Proceedings. 25th IEEE International Conference on, pages 620–629. IEEE, 2005.
[20] A. Gendar and A. Lisberg. How cell phone helped cops nail key murder suspect. Secret 'pings' that gave bouncer away. http://www.nydailynews.com/archives/news/cell-phone-helped-cops-nail-key-murder-suspect-secret-pings-gave-bouncer-article-1.599672, 2006.
[21] G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and K.-L. Tan. Private queries in location based services: anonymizers are not necessary. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 121–132. ACM, 2008.
[22] P. Golle and K. Partridge. On the anonymity of home/work location pairs. In Pervasive Computing, pages 390–397. Springer, 2009.
[23] F. Grace. Stalker victims should check for GPS. http://www.cbsnews.com/news/stalker-victims-should-check-for-gps/, 2003.
[24] M. Gruteser and D. Grunwald. Anonymous usage of location-based services through spatial and temporal cloaking. In Proceedings of the 1st international conference on Mobile systems, applications and services, pages 31–42. ACM, 2003.
[25] P. Harremoës and F. Topsøe. Maximum entropy fundamentals. Entropy, 3(3):191–226, 2001.
[26] M. Hendrickson. The state of location-based social networking on the iPhone. http://techcrunch.com/2008/09/28/the-state-of-location-based-social-networking-on-the-iphone, 2008.
[27] T. Jiang, H. J. Wang, and Y.-C. Hu. Preserving location privacy in wireless LANs. In Proceedings of the 5th international conference on Mobile systems, applications and services, pages 246–257. ACM, 2007.
[28] P. Kalnis, G. Ghinita, K. Mouratidis, and D. Papadias. Preventing location-based identity inference in anonymous spatial queries. Knowledge and Data Engineering, IEEE Transactions on, 19(12):1719–1733, 2007.
[29] J. Krumm. Inference attacks on location tracks. In Pervasive Computing, pages 127–143. Springer, 2007.
[30] M. Li, H. Zhu, Z. Gao, S. Chen, L. Yu, S. Hu, and K. Ren. All your location are belong to us: Breaking mobile social networks for automated user location tracking. In Proceedings of the 15th ACM international symposium on Mobile ad hoc networking and computing, pages 43–52. ACM, 2014.
[31] N. Li and G. Chen. Analysis of a location-based social network. In Computational Science and Engineering, 2009. CSE'09. International Conference on, volume 4, pages 263–270. IEEE, 2009.
[32] M. F. Mokbel, C.-Y. Chow, and W. G. Aref. The New Casper: A privacy-aware location-based database server. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on, pages 1499–1500. IEEE, 2007.
[33] M. Motani, V. Srinivasan, and P. S. Nuggehalli. PeopleNet: engineering a wireless virtual social network. In Proceedings of the 11th annual international conference on Mobile computing and networking, pages 243–257. ACM, 2005.
[34] A. Narayanan, N. Thiagarajan, M. Lakhani, M. Hamburg, and D. Boneh. Location privacy via private proximity testing. In NDSS. Citeseer, 2011.
[35] S. Papadopoulos, S. Bakiras, and D. Papadias. Nearest neighbor search with strong location privacy. Proceedings of the VLDB Endowment, 3(1-2):619–629, 2010.
[36] I. Polakis, G. Argyros, T. Petsios, S. Sivakorn, and A. D. Keromytis. Where's Wally?: Precise user discovery attacks in location proximity services. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 817–828. ACM, 2015.


[37] K. P. Puttaswamy, S. Wang, T. Steinbauer, D. Agrawal, A. El Abbadi, C. Kruegel, and B. Y. Zhao. Preserving location privacy in geosocial applications. Mobile Computing, IEEE Transactions on, 13(1):159–173, 2014.
[38] K. P. Puttaswamy and B. Y. Zhao. Preserving privacy in location-based mobile social applications. In Proceedings of the Eleventh Workshop on Mobile Computing Systems & Applications, pages 1–6. ACM, 2010.
[39] B. Schilit, J. Hong, and M. Gruteser. Wireless location privacy protection. Computer, 36(12):135–137, 2003.
[40] R. Shokri, G. Theodorakopoulos, J.-Y. Le Boudec, and J.-P. Hubaux. Quantifying location privacy. In Security and Privacy (SP), 2011 IEEE Symposium on, pages 247–262. IEEE, 2011.
[41] M. Siegler. Foodspotting is a location-based game that will make your mouth water. http://techcrunch.com/2010/03/04/foodspotting.
[42] M. Veytsman. How I was able to track the location of any Tinder user. http://blog.includesecurity.com/2014/02/how-i-was-able-to-track-location-of-any.html.
[43] G. Wang, B. Wang, T. Wang, A. Nika, H. Zheng, and B. Y. Zhao. Whispers in the dark: Analysis of an anonymous social network. In IMC '14 Proceedings of the 2014 Conference on Internet Measurement Conference, pages 137–150, New York, USA, 2014. ACM.
[44] Z. Yang, C. Wilson, X. Wang, T. Gao, B. Y. Zhao, and Y. Dai. Uncovering social network sybils in the wild. ACM Transactions on Knowledge Discovery from Data (TKDD), 8(1):2, 2014.
[45] G. Zhong, I. Goldberg, and U. Hengartner. Louis, Lester and Pierre: Three protocols for location privacy. In Privacy Enhancing Technologies, pages 62–76. Springer, 2007.

A Defense Evaluation Metrics

Privacy level metrics. Because we quantize a map M into squares S_q, attackers can only localize any post P to a quantized square S_q, and cannot know the precise location of P within S_q. If an attacker can determine that post P is in the square with center \hat{X} with probability Pr_post(\hat{X}), then we modify the calculation of mean absolute error (Formula 1) as follows:

    MAE_att = \sum_{\hat{X} \in M} |\hat{X} - X_{post}| \cdot Pr_{post}(\hat{X})    (23)

where X_post is the coordinate of the victim post P. If post P is uniformly distributed in a square, then the average MAE_att for P is:

    MAE_att = \frac{1}{L^2} \int_{y_{min}}^{y_{max}} \int_{x_{min}}^{x_{max}} \sum_{\hat{X} \in M} |\hat{X} - (x, y)| \cdot Pr_{post}(\hat{X}) \, dx \, dy    (24)

where L is the length of one side of the quantized square S_q, x_min (y_min) is the minimum x (y) in the square of post P, and x_max (y_max) is the maximum x (y) in the square of post P.

Our defense increases MAE_att in two ways: (1) map quantization makes the location of users coarse-grained; (2) consistent response limits the information gain for attackers so that they cannot accurately estimate the victim's square. To isolate the contributions of each factor, we define the quantized mean absolute error QMAE_att to represent the error in an attacker's knowledge of the square of post P:

    QMAE_att = \sum_{\hat{X} \in M} |\hat{X} - C(X_{post})| \cdot Pr_{post}(\hat{X})    (25)

where C(X) is the center coordinate of X's square.

Accuracy level metrics. We also change the calculation of per-post absolute error PAE_user for legitimate users. For a post P at X_post and a query at X_query in the quantized map M, the service provider decides whether to return P based on the distance between C(X_post) and C(X_query) and the probability response function P(C(X_query), C(X_post), d). We use Formulas 2–6 to calculate PAE_user as follows:

    PAE_user = \frac{FPE + FNE}{E(|FP|) + |PA|}    (26)

where

    FNE = \sum_{X \in X(P) \cap C_d} (d - |X - X_{query}|) \cdot (1 - P(C(X_{query}), C(X), d))    (27)

    FPE = \sum_{X \in X(P) \setminus C_d} (|X - X_{query}| - d) \cdot P(C(X_{query}), C(X), d)    (28)

    E(|FP|) = \sum_{X \in X(P) \setminus C_d} P(C(X_{query}), C(X), d)    (29)

    |PA| = |X(P) \cap C_d|    (30)

X(P) is the set of locations of all posts in the system. The distribution of X(P) may vary across environments, so one system setting can produce different metrics.
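As a cross-check of these definitions, the following sketch transcribes Formulas 26–30 directly into Python. The `center` quantization helper, the example response function, and the toy post layout are illustrative assumptions, and the denominator is assumed to be nonzero.

```python
import math

def pae_user(posts, x_query, d, P, center):
    # Formulas 26-30: per-post absolute error for one query.
    #   posts  - X(P), the locations of all posts in the system
    #   P      - response probability on quantized centers: P(cq, cp, d)
    #   center - maps a coordinate to the center of its quantized square
    cq = center(*x_query)
    fne = fpe = e_fp = n_pa = 0.0
    for x in posts:
        dist = math.dist(x, x_query)
        p = P(cq, center(*x), d)
        if dist <= d:                    # post inside C_d
            n_pa += 1                    # |PA|
            fne += (d - dist) * (1 - p)  # FNE
        else:                            # post outside C_d
            e_fp += p                    # E(|FP|)
            fpe += (dist - d) * p        # FPE
    return (fpe + fne) / (e_fp + n_pa)

# Toy check on a flat 2 km x 2 km patch with 0.25 km squares and P5.
L = 0.25
center = lambda x, y: ((math.floor(x / L) + 0.5) * L,
                       (math.floor(y / L) + 0.5) * L)
P5 = lambda cq, cp, d: 1.0 / ((math.dist(cq, cp) / d) ** 2 + 1.05)
posts = [(i / 10, j / 10) for i in range(-10, 11) for j in range(-10, 11)]
print(round(pae_user(posts, (0.0, 0.0), 0.375, P5, center), 3))
```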

B Additional Figures

Fig. 12. North Chicago geographical map.

Fig. 13. The average attacker MAE (MAE_att), quantized MAE (QMAE_att), and privacy-to-user-error ratio (MAE_att/PAE_user) of P3(x, d) and P4(x, d) over varying map scale (no hierarchical subdivision) and quantized square size. Each point represents the average of 100 runs. Panels: (a) MAE_att of P3; (b) QMAE_att of P3; (c) MAE_att/PAE_user of P3; (d) MAE_att of P4; (e) QMAE_att of P4; (f) MAE_att/PAE_user of P4.

Fig. 14. Non-user-consistent defense privacy loss rate. Average attacker entropy and QMAE (over 100 runs) against a system with multiple same-location posts from one user in the 5.25 km × 5.25 km North Chicago map. The square size is 0.25 km × 0.25 km. The victim is at X_post = (0.5, 0.5) km and the threshold d = 0.375 km. Panels: (a) entropy of P5 vs. rounds; (b) QMAE_att of P5 vs. rounds.

Fig. 15. Average attacker entropy and QMAE (over 100 runs) against non-monotonic-response and monotonic-response systems with multiple thresholds in the 5.25 km × 5.25 km North Chicago map. The square size is 0.25 km × 0.25 km. The victim is at X_post = (0.5, 0.5) km and the threshold d = 0.375 km. The threshold settings are given in the figure legend; for example, "0.375-1.25" means the thresholds are 0.375 and 1.25 km. Panels: (a) entropy of P5 vs. rounds; (b) QMAE_att of P5 vs. rounds.

Fig. 16. The average attacker entropy and QMAE (QMAE_att) over North Chicago maps with different scales, using a fixed square size L = 0.25 km. We choose a threshold d = 0.375 km, and average the QMAE_att across 10 victims randomly sampled from the map with scale 1.25 km × 1.25 km. HS denotes runs on a 10.25 km × 10.25 km map with hierarchical subdivision; the map scale reflects the size of the base map. Each point represents the average of 100 runs. Panels: (a) P3(x, d) = 1/((x/d)^4 + 1.05); (b) P4(x, d) = 1/((x/d)^3 + 1.05); (c) P5(x, d) = 1/((x/d)^2 + 1.05).

Fig. 17. No-defense privacy loss rate. Probability distribution map vs. round number (the attacker starts with the population distribution) under no response-consistent defense. The map scale is 10.25 km × 10.25 km, the square size is 0.25 km × 0.25 km, the victim post is at (1.25, 0.75) km, and the threshold distance is 1.75 km. The number of rounds between images varies between the functions to show the effectiveness of each probability response function. Panels: (a)–(f) response probability functions P1(x, d) through P6(x, d), with P3(x, d) = 1/((x/d)^4 + 1.05), P4(x, d) = 1/((x/d)^3 + 1.05), and P5(x, d) = 1/((x/d)^2 + 1.05).

Fig. 18. Hierarchical-subdivision defense privacy loss rate: change of the probability distribution map across rounds. We run the system with three response probability functions (P3(x, d), P4(x, d), and P5(x, d)) under the defense without hierarchical subdivision and the defense with hierarchical subdivision. Due to space limits, we show only the result for P5(x, d) here; the other results are similar. The scale of the whole map is 10.25 km × 10.25 km, the victim post X_post is at (1.25, 0.75) km, the threshold d = 1.75 km, and the quantized square size L = 0.25 km. In the defense with hierarchical subdivision, the scale of the base map is 1.25 km × 1.25 km, and we expand it to the 10.25 km × 10.25 km map. The attacker knows the population density in advance. Panels: (a) P5(x, d), defense without hierarchical subdivision; (b) P5(x, d), defense with hierarchical subdivision.
