AnonyCast: Privacy-Preserving Location Distribution for Anonymous Crowd Tracking Systems
Takamasa Higuchi†, Paul Martin‡, Supriyo Chakraborty*, Mani Srivastava‡
†Osaka University, Japan; ‡University of California, Los Angeles, CA; *IBM Research, NY
[email protected], [email protected], [email protected], [email protected]
ABSTRACT
Fusion of infrastructure-based pedestrian tracking systems and embedded sensors on mobile devices holds promise for providing accurate positioning in large public buildings. However, privacy concerns regarding the handling of sensitive user location data may disrupt the adoption of such systems. This paper presents AnonyCast, a novel privacy-aware mechanism for delivering precise location information measured by crowd-tracking systems to individual pedestrians' smartphones. AnonyCast uses sparsely placed Bluetooth Low Energy transmitters to advertise location-dependent, time-varying keys. Using location measurements, AnonyCast estimates a subset of keys that each pedestrian's phone receives along its path. By combining a cryptography scheme called CP-ABE with a novel greedy algorithm for key selection, it encrypts each path before publishing, allowing users to decrypt only their own trajectories. The results from field experiments show that AnonyCast delivers accurate locations over 84% of the time while bounding the probability of unauthorized access to one's location below 1%.

Author Keywords
Location privacy; crowd tracking; trajectory identification; ciphertext-policy attribute-based encryption

ACM Classification Keywords
C.5.3 Computer System Implementation: Portable devices; E.3 Data Encryption: Public key cryptosystems

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
UbiComp '15, September 7–11, 2015, Osaka, Japan.
Copyright 2015 © ACM 978-1-4503-3574-4/15/09...$15.00.
http://dx.doi.org/10.1145/2750858.2805827

INTRODUCTION
Recent evolution of crowd tracking technologies has enabled accurate measurement of occupancy and trajectories for pedestrians in indoor spaces using vision [6], radio tomography [18, 27], and laser range scanners [9, 29]. This in turn has motivated research communities in both academia and industry to leverage them for marketing [14], crowd management [12], and even optimizing energy expenditures in buildings [1, 23]. As a result, an increasing number of public buildings are equipped with sensors like cameras or laser range scanners and are capable of fine-grained crowd behavior analyses.

Given the growing popularity of location-based services for mobile devices, it would be natural to expect that the powerful measurement capability of such wide-spread sensor infrastructures could also benefit individual pedestrians walking in indoor spaces. Accurate indoor positioning for mobile devices has been a long-standing open problem in ubiquitous computing. Currently, the most popular positioning solution for consumer mobile products is radio fingerprinting using Wi-Fi [5, 15] and Bluetooth Low Energy (BLE) radios [7, 8, 17]. However, these approaches often suffer from large position errors in practical indoor environments due to dense multi-path signal propagation and low temporal stability of radio fingerprints [4]. Furthermore, the accuracy of radio-based positioning systems depends considerably on the density of anchor devices (e.g., BLE transmitters) [8]. Since dense anchor deployments obviously cause non-negligible maintenance costs, positioning accuracy is also often limited by operational constraints.

The output of crowd tracking systems is typically a set of anonymous trajectories which are not associated with any mobile device. Therefore, these systems cannot serve alone to provide mobile devices with their own locations. Recent research has bridged this gap by developing trajectory identification algorithms which find trajectories of individual mobile users from a set of anonymous trajectories [24, 25, 26]. These approaches assume that the crowd tracking system publishes all of the anonymous trajectories obtained by crowd tracking sensors via a Wi-Fi network. Each mobile device connects to Wi-Fi access points to obtain the published trajectories and then identifies its own location based on the consistency between the trajectories and local measurements from phone-embedded sensors (e.g., accelerometers, gyroscopes, etc.).

While these efforts have established an effective way of utilizing the crowd tracking infrastructure for indoor localization, growing awareness of and concern for privacy makes such unrestricted release of trajectory information a difficult proposition. These systems publish pedestrians' trajectories without consent and, although the trajectories themselves are anonymous, it is possible for a malicious user to combine these trajectories with external information (e.g., collected by following an individual for a short period) to deanonymize a desired trajectory. This trajectory can then be used to infer potentially private information about an individual.

In this paper we present AnonyCast, a privacy-preserving location distribution mechanism for crowd tracking systems. We assume that sensors capable of accurate trajectory measurement (e.g., laser range scanners) are already installed and operated in a target building for crowd behavior analysis. AnonyCast extends this system to feed the precise trajectory measurements to individual mobile phone users in a privacy-preserving manner. The extension is enabled by a small number of BLE transmitters, which are sparsely deployed in the environment and periodically advertise location-dependent, time-varying keys. Based on the trajectories measured by the crowd tracking sensors, the AnonyCast server estimates a set of keys that each pedestrian's device is likely to have received. The server then uses these keys to encrypt each trajectory prior to publishing it, ensuring that mobile phone users can gain access to only their own trajectories.

Although the proposed mechanism follows as a natural privacy extension, the following aspects present challenges in its implementation as a practical system: (1) Mobile devices may fail to receive advertised keys due to packet loss, even if they are in close proximity to a BLE transmitter. Ensuring that the system provides reasonable accessibility to trajectory information even with such frequent packet loss is difficult. (2) Decryption keys are publicly broadcast, making it non-trivial to prevent potential privacy leaks by ensuring that people other than true owners cannot decrypt the published trajectories. As a solution to these issues, we base our system on the emerging public key cryptography scheme called Ciphertext-Policy Attribute-Based Encryption (CP-ABE). This allows the sender to specify an access policy on the secret data in the form of a logical expression over private keys, so that users can decrypt the data only if they have a set of keys that satisfy the policy. Upon this scheme, we build a framework that probabilistically ensures a desired privacy level. Finally, we build and deploy a prototype system upon which we conduct field experiments using real crowd tracking sensors and various smartphone models. The results of these experiments show that AnonyCast enables users to obtain their own precise locations more than 84% of the time, while bounding the probability of unauthorized access to one's location data below 1%. In addition, we conducted extensive simulations to better understand AnonyCast's performance under a variety of conditions and parameters.

The contributions of this paper are summarized as follows: (i) We analyze privacy risks in trajectory identification systems. To the best of our knowledge, this is the first work to explore the potential privacy risks in utilizing crowd tracking infrastructures for localization of mobile devices. (ii) We design AnonyCast, a novel location distribution mechanism that allows mobile users to reliably access accurate trajectory measurements from a crowd tracking system without compromising location privacy. To this end, we develop a computationally efficient greedy algorithm that provides strong probabilistic guarantees on user privacy. (iii) We implement a prototype system and benchmark the performance of AnonyCast through experiments with real sensor devices as well as extensive simulations. The experimental results show that our system can successfully achieve a specified privacy level while providing reasonable accessibility to the trajectory information by the true owner even with severe packet loss.

RELATED WORK
One of the most popular approaches to crowd tracking uses image sensors (i.e., cameras). The current mainstream in vision-based pedestrian tracking systems is to extract the features that best distinguish pedestrians from images in a training data set and then to use a pattern matching algorithm to detect human bodies [10, 19, 30]. However, the ethics and acceptability of using images from surveillance cameras in public spaces for such purposes remain controversial [21], as personal identities (e.g., faces) can easily be associated with trajectories, potentially infringing user privacy.

As alternative solutions, there have been a variety of approaches that track pedestrian locations in an anonymous manner. Radio tomography [18, 27, 28] employs received signal strength between multiple radio stations to detect human locations, assuming that movement of pedestrians in the environment causes temporal variations in the signal strength. Laser range scanners (LRS) have also been explored as a reasonable option for accurate and anonymous pedestrian tracking [9, 29]. This sensor provides precise distance measurements to surrounding objects, allowing robust crowd tracking with sub-meter accuracy. Previous literature has shown that capacitive sensor arrays [24] and low-resolution image sensors [25] are also suitable for anonymous pedestrian tracking.

Trajectory identification technology has bridged the gap between the crowd tracking systems described above and location-dependent mobile applications. Teixeira et al. [25] effectively combine a vision-based pedestrian tracking system with MEMS inertial sensors in mobile phones to enable accurate indoor positioning. They find the corresponding trajectory of each mobile user based on the consistency between shapes of the anonymous trajectories and measurements from inertial sensors in the pedestrians' mobile phones. Sousa et al. [24] developed a similar localization system using capacitive sensor arrays laid out on a floor. They assume pedestrians have wearable accelerometers and detect the timing of walking steps by both the wearable sensors and the capacitive sensors on the floor. Thus trajectory identification can be done by comparing the sequence of walking steps on the anonymous trajectories. Wada et al. [26] periodically measure proximity between neighboring mobile phones by Bluetooth radios and evaluate consistency between proximity patterns between phones and distances between the anonymous trajectories. These systems assume that the underlying crowd tracking systems publish all the detected trajectories via a network so that mobile phones can locally perform trajectory identification to find their own trajectory from a set of anonymous trajectories. This introduces privacy risks since a pedestrian's accurate trajectory can be published without consent.

Some recent work develops mechanisms to prove users' locations, intending to cope with mobile users who report spoofed locations to mobile systems [16, 22]. They basically assume that mobile devices communicate with the neighboring wireless stations to obtain time-varying tokens as location proofs. Unlike the existing systems, AnonyCast intelligently combines multiple location proofs collected on a path to enable secure delivery of private information (i.e., trajectories).

PRIVACY MODELS
In this section we describe the threat model and privacy requirements for AnonyCast.

Threat Model
This work assumes that a given building has a crowd tracking system capable of tracking the locations of pedestrians in an area of interest in an accurate and anonymous manner.
The crowd tracking system then publishes the detected trajectories to mobile users for use in location-dependent mobile applications. However, the users may hesitate to subscribe to the service if there is any concern that the system may associate the anonymous human trajectories with personal identifying information. For example, server operators may try to deanonymize trajectories by associating them with MAC addresses of mobile devices obtained in the process of location distribution. This problem is exacerbated if device MAC addresses can be linked to other personal attributes (e.g., phone number, home address, etc.). Although some recent mobile operating systems attempt to reduce this kind of privacy risk by randomly rotating the phone's Wi-Fi MAC address while probing for access points, the device's original MAC address is still used once a connection to a specific access point is established. Our first goal is to cope with this problem by designing a mechanism that enables an operator of the crowd tracking system to deliver the precise location information to mobile users, guaranteeing that this kind of association is not possible. Thus the users can subscribe to the service even if they do not fully trust the system operator.

A privacy threat may also exist among the users: An attacker, say Bob, may attempt to use the published trajectory to learn the current location of a specific person, say Alice, without her knowledge. Prior to release, a trajectory is anonymized by stripping it of all personal identifiers and only a temporal sequence of two-dimensional coordinates is published. However, the assumption of anonymity no longer holds in the presence of external, identifying information, for example if Bob follows Alice for a short period of time. If Bob can follow Alice long enough to uniquely identify and associate Alice with a specific trajectory in the published data, he can continue to track her location as long as her trajectory is detected by the crowd sensing system. A similar attack can be possible without physically tracking the target person if her mobility has characteristic patterns. For example, if Bob knows that Alice works at a store in a shopping mall and she usually goes to a restaurant for lunch at a specific time, Bob may infer which anonymous trajectory belongs to her.

Privacy Requirement
Let $T$ be a set of anonymous trajectories that are detected by the crowd tracking sensors. We define a pedestrian $A$ as the true owner of a trajectory $tr_j \in T$ if $A$'s true location has been within $d$ meters of $tr_j$ for a ratio $\theta_{own}$ of time steps over a recent window $W$, where $d$, $\theta_{own}$ and $W$ are system parameters. Otherwise, $A$ is designated as a non-owner of trajectory $tr_j$. Our privacy requirement is that only true owners can access each published trajectory. The spatial tolerance $d$ and the temporal criterion $\theta_{own}$ are introduced to offer reasonable accessibility to the trajectory information even if multiple people move together in a group. While this definition allows people to obtain trajectories of other members in the same group, this does not introduce any privacy concerns because all members are in proximity to each other and can be considered true owners of the group trajectory.

We define the privacy level of a system by $(1 - \theta_{pl})$, where $\theta_{pl}$ is the probability that published trajectories are successfully decrypted by non-owners. In AnonyCast, $\theta_{pl}$ is given as a system parameter and should be sufficiently small to prevent privacy leakage from the published trajectories.

SYSTEM OVERVIEW
In this section, we outline the architecture and design decisions taken to realize privacy-preserving location distribution.

Architecture
Fig. 1 depicts a high-level overview of the AnonyCast system. We assume that a sensor infrastructure for anonymous crowd tracking is deployed in the area of interest, tracking locations of pedestrians in the area. For simplicity of discussion, we assume an LRS-based tracking system hereafter. Note, however, that the basic mechanism of AnonyCast can be easily extended to other types of sensors provided they can anonymously track pedestrians with sufficient resolution.

[Figure 1. A high-level overview of AnonyCast. Diagram annotations: networked crowd tracking sensors; clients receive location-dependent, time-varying decryption keys from BLE transmitters.]

In addition to the sensors for anonymous tracking, we sparsely deploy BLE transmitters on the walls or ceilings. Every $\tau$ seconds, each transmitter $b_i$ advertises a location-dependent, time-varying key, say $\mathrm{key}(b_i, t)$ for time $t$. Mobile clients that subscribe to the AnonyCast location service probe these BLE beacons using standard Bluetooth device discovery mechanisms and save the corresponding keys in local storage as evidence that they were within the signal transmission range of $b_i$ at time $t$.

Here we assume that the AnonyCast server (or simply the server) maintains the following information: (i) locations of the BLE transmitters, (ii) the keys that are advertised by each BLE transmitter at each time step, and (iii) the set of anonymous trajectories $T$ observed during the recent $W$ time steps. If the server and each BLE transmitter share common seed parameters in an installation phase, they can generate the same keys without the need for communication. For each anonymous trajectory $tr_j \in T$, the server estimates a set of keys $K_j$ that are likely to be received by the owner's phone. This is derived by calculating the probability of beacon reception based on the Euclidean distance between $tr_j$ and each BLE transmitter, given an empirical radio signal reception model (discussed in the next section). The server then encrypts $tr_j$ with a subset of the keys in $K_j$ and publishes all the encrypted trajectories via the Wi-Fi network.

Subscribers to the location distribution server connect to a Wi-Fi access point nearby and receive all encrypted trajectories. Each client can then recover its own trajectory only if it has the keys that are requested by the server. Thus each trajectory is delivered only to its true owner as long as the server selects the appropriate set of keys for trajectory encryption.

Decentralized Location Servers
The AnonyCast location distribution system is based on a decentralized architecture in which the system publishes all trajectories via a local network so that trajectory identification can be performed locally on mobile phones. This is in contrast to a centralized architecture where each mobile device periodically uploads feature values for trajectory identification to a server, allowing the server to find and send back the user's own trajectory via a secure communication channel. A basic assumption behind this scheme is that the server is trustworthy, which may not always hold in practical use cases. By adopting a decentralized architecture, AnonyCast eliminates the need for a trusted central server.

Comparison with Purely BLE-based Localization
Readers may wonder why the BLE transmitters broadcast keys rather than their own locations: if they advertise the reference positions, mobile devices can receive these beacons to locally record their own trajectory. Although this approach does not incur any privacy issues, the accuracy of such positioning systems depends considerably on the density of transmitters. The recent literature [8] analyzes the accuracy of BLE-based indoor positioning systems under a variety of configurations, and reports that 6-8 beacons should be available within the signal reception range of smartphones to achieve sub-meter positioning accuracy. This means that we need to deploy tens of transmitters to cover, e.g., a wide exhibition venue.

As we discussed in the Introduction section, the recent major trends for cyber-physical systems, together with the rapid technological advancements in big data analytics, have been continuously encouraging building managers to consider introducing sensor infrastructures for path analysis. The basic motivation behind our work is to extend the anonymous crowd tracking systems, which are already installed in public indoor space for crowd behavior analysis, so that they can help people walking around the building to obtain their own precise location information through mobile phones. We will show in the Evaluation section that AnonyCast can enable robust delivery of precise location measurements over the whole simulated exhibition venue of 40m × 27m by only 4–6 BLE transmitters. Thus AnonyCast would provide a strong option if crowd tracking infrastructures are already installed in the environment.

PRELIMINARY
This section discusses observations from our feasibility study and the basic idea of the proposed encryption mechanism.

Characteristics of BLE beacons
In order to meet our privacy requirement, all trajectories are encrypted prior to their release so that users can only decrypt their own trajectories. To facilitate this decryption, AnonyCast broadcasts location- and time-dependent keys using BLE beacons. Thus, BLE propagation characteristics play an important role in the design and feasibility of our system. To explore the characteristics of BLE, we conducted reception rate experiments in an 8m × 15m room using a commercial BLE transmitter [20] and several models of Android smartphones (Nexus 4 and Nexus 5 from LG Electronics, and Nexus 7 from ASUSTeK). The transmitter was positioned at a height of 1m and programmed to transmit advertisement beacons every 0.5 seconds. Smartphones were placed at distances of 1 to 10m from the transmitter and continuously probed for beacons for 300 seconds.

Fig. 2 (a)–(b) show the beacon reception rate for signal transmission powers of −90 dBm and −86 dBm, respectively. Due to hardware variations across phone models and Bluetooth chipsets, the beacon reception rate differs for each of the devices evaluated. In addition, because of multipath and fading effects, the reception rate does not always degrade monotonically with distance. Nevertheless, beacon reception rates clearly tend to decrease with distance, falling to zero when the distance exceeds a certain value.

[Figure 2. Beacon reception rates for varying transmission powers: (a) −90 dBm, (b) −86 dBm.]

Trajectory Encryption by CP-ABE
As a basic cryptography scheme for our location data distribution mechanism, we harness the emerging concept of Ciphertext-Policy Attribute-Based Encryption (CP-ABE) [3]. CP-ABE is a type of public key cryptography that allows flexible access control to the encrypted data based on attributes that each client owns. It assumes that clients have a set of keys, each of which is associated with a specific attribute such as name, title, affiliation, etc. In the encryption process, a party wanting to send a secret message specifies an access policy described in the form of a logical expression over these attributes. The access policy is then embedded in the ciphertext so that only people who have those attributes, and thus have the corresponding private keys, can decrypt it to access the original data. The private keys corresponding to each attribute are distributed beforehand via a secure channel.

In AnonyCast, each attribute is no longer associated with an individual person. Instead, each BLE transmitter has an ID attribute $b_i$ and a time attribute $t$, advertising the corresponding private keys at the corresponding time. In encrypting each trajectory, the server builds an access policy based on the probabilities that the owner has received each key. Consider the example scenario shown in Fig. 3, where three BLE transmitters periodically advertise location-dependent, time-varying keys. During the time window from $t_1$ to $t_5$, crowd tracking sensors detect three anonymous trajectories, namely $tr_0$, $tr_1$ and $tr_2$. Without loss of generality we consider an access policy for the trajectory $tr_0$. Based on the distance between $tr_0$ and each BLE transmitter at each time step, the server estimates that the owner of $tr_0$ is likely to have received $\mathrm{key}(b_1, t_1)$ and $\mathrm{key}(b_1, t_2)$ from transmitter $b_1$, $\mathrm{key}(b_2, t_3)$ from $b_2$, and $\mathrm{key}(b_3, t_4)$ and $\mathrm{key}(b_3, t_5)$ from $b_3$. In this case, a possible access policy would be "$(\mathrm{key}(b_1, t_1) \lor \mathrm{key}(b_1, t_2) \lor \mathrm{key}(b_2, t_3)) \land (\mathrm{key}(b_3, t_4) \lor \mathrm{key}(b_3, t_5))$." The idea behind this policy is that the three keys in the first clause serve as evidence that a pedestrian is the owner of $tr_0$ rather than $tr_2$, since the owner of $tr_2$ is unlikely to have any of these keys. In the same way, the two keys in the second clause serve as evidence that the pedestrian is the owner of $tr_0$ rather than $tr_1$. By joining these two clauses with an AND operator, the server can ensure that the owner of $tr_0$ is uniquely identified against other pedestrians. Obviously, generating such a reasonable access policy becomes much harder as the number of trajectories, pedestrians, and beacons increases. We design an algorithm to solve this problem in the following sections.

[Figure 3. An example scenario: three BLE transmitters ($b_1$, $b_2$, $b_3$), three trajectories ($tr_0$, $tr_1$, $tr_2$), and time steps $t_1$ to $t_5$.]

ALGORITHM DESIGN
This section provides detailed discussions on problem formulation and algorithm design for the AnonyCast system.

Problem Formulation
We denote a set of anonymous trajectories obtained by the crowd tracking sensors by $T$. Each trajectory $tr_j \in T$ is a time series of up to $W$ locations of a single pedestrian, where $W$ is the window size for trajectory encryption. Thus a trajectory is denoted by $tr_j = \langle tr_{j,t},\, tr_{j,t-\tau},\, \ldots,\, tr_{j,\max(t_0,\, t-(W-1)\tau)} \rangle$, where $t_0$ is the time when $tr_j$ first appeared in the sight of the sensors, and $tr_{j,t}$ is the estimated location of the pedestrian at time $t$. The server also knows the location of each BLE transmitter $b_i$ and the set of all keys $K$ that have been advertised during the recent $W$ time steps. We assume that the probability that the owner of $tr_j$ receives a private key $\mathrm{key}(b_i, t)$ (denoted by $p_{rcv}(tr_j, \mathrm{key}(b_i, t))$) is a function of the distance between each point on trajectory $tr_j$ and each BLE transmitter $b_i$.

Based on the reception probabilities for the keys advertised during the recent $W$ time steps, the server generates an access policy for each trajectory $tr_j \in T$.

The goal of access policy generation is to specify a set of acceptable private key combinations such that the system can probabilistically ensure that a client is the true owner of a given trajectory if it has received a valid combination of the requested keys. In order to make this guarantee, we have to calculate the probability that any other clients in the target field can receive the requested keys in any of the allowable combinations, and check that the probability is sufficiently smaller than that of the true owner. Since the number of possible combinations of private keys is $2^{|K|}$, the computational cost for the probability calculation amounts to $O(|T| \cdot 2^{|K|})$ in the worst case. Although CP-ABE allows for an arbitrary logical formula for an access policy, we limit each access policy by the following rules in order to bound the search space for access policy generation.

Rule 1 An access policy $C$ is defined in a conjunctive normal form as follows:

\[ C = C_1 \land C_2 \land \cdots \land C_m \tag{1} \]

where each clause $C_k$ is defined as:

\[ C_k = \mathrm{key}_{k,1} \lor \mathrm{key}_{k,2} \lor \cdots \lor \mathrm{key}_{k,n}. \tag{2} \]

Rule 2 Each private key in $K$ appears in at most one clause in an access policy $C$.

The subscripts $m$ and $n$ are the number of clauses in the access policy $C$ and the number of keys in a clause $C_k$, respectively. Each $\mathrm{key}_{k,l}$ in a clause is a private key which is advertised by any of the BLE transmitters during the recent $W$ time steps. Rule 1 does not reduce the description capability of access policies, because any logical formula can be converted to such a conjunctive normal form. While Rule 2 limits the types of access structures that a policy can describe, it drastically reduces the computational cost for probability calculation to $O(|T| \cdot |K|^2)$ in return.

For simplicity of notation, we represent each clause $C_k$ in an access policy by the set of keys in it, say $S_k$. An access policy is then denoted by $S = \{S_1, S_2, \ldots, S_m\}$, where each element $S_k$ corresponds to $C_k$ in Eq. (1).

Consider access policy generation for a specific trajectory $tr_0$, and assume that a certain pedestrian has received a private key $\mathrm{key}(b_i, t)$. The probability that she is the true owner of $tr_0$ rather than another trajectory $tr_j$ (denoted by $tr_0 \succ tr_j$) can be defined as:

\[ P_{id}(tr_0 \succ tr_j \mid \mathrm{key}(b_i, t)) = \frac{p_{rcv}(tr_0, t, \mathrm{key}(b_i, t))}{p_{rcv}(tr_0, t, \mathrm{key}(b_i, t)) + p_{rcv}(tr_j, t, \mathrm{key}(b_i, t))} \tag{3} \]

We term the probability in Eq. (3) the pair-wise identification probability of $\mathrm{key}(b_i, t)$ for $tr_0$ against $tr_j$.

In the same manner, we consider the pair-wise identification probability of a given access policy $S$, assuming that a certain pedestrian has received a set of keys that satisfy $S$ during the recent $W$ time steps. In this case, the probability that she is the owner of $tr_0$ rather than another trajectory $tr_j$ can be lower bounded as:

$P_{id}(tr_0 \succ tr_j \mid S) \ge p_{rcv}(tr_0, \mathrm{key}'(tr_0, tr_j, S_k))$ /

$|T'|$-dimensional feature vector for each key $\mathrm{key}(b_i, t)$ in $K'$, whose elements are its pair-wise identification probabilities for $tr_0$ against each of other trajectories $tr_j$:
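As an aside to the problem formulation, the pair-wise identification probability of Eq. (3) and the clause structure imposed by Rules 1 and 2 can be illustrated with a minimal Python sketch. The linear reception model, its 10 m range, and all function and variable names below are assumptions made for illustration; they are not the empirical reception model or the implementation used by AnonyCast, and the CP-ABE encryption step itself is omitted.

```python
from math import dist

def p_rcv(point, beacon, max_range=10.0):
    """Hypothetical reception model (an assumption, not the paper's empirical
    model): probability decays linearly from 1 at the transmitter to 0 at
    max_range metres."""
    return max(0.0, 1.0 - dist(point, beacon) / max_range)

def pairwise_id_probability(p_owner, p_other):
    """Eq. (3): probability that the holder of a key owns tr_0 rather than
    tr_j, given each owner's reception probability for that key."""
    total = p_owner + p_other
    # Assumption: if neither could have received the key, it carries no evidence.
    return p_owner / total if total > 0 else 0.0

def obeys_rule2(policy):
    """Rule 2: each key appears in at most one clause of the policy."""
    seen = set()
    for clause in policy:
        if seen & clause:
            return False
        seen |= clause
    return True

def satisfies(policy, received):
    """Rule 1 (CNF): every clause must contain at least one received key."""
    return all(any(key in received for key in clause) for clause in policy)

# Example policy for tr_0 from the Fig. 3 scenario, with keys written as
# (transmitter, time) pairs.
policy_tr0 = [
    {("b1", "t1"), ("b1", "t2"), ("b2", "t3")},
    {("b3", "t4"), ("b3", "t5")},
]
```

In this sketch a client holding `("b1", "t2")` and `("b3", "t5")` satisfies `policy_tr0`, while a client holding only `("b1", "t1")` does not, because the second clause is left unsatisfied.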