PRIVACY AND SECURITY IN CROWDSENSING

by Jian Lin

© Copyright by Jian Lin, 2019

All Rights Reserved

A thesis submitted to the Faculty and the Board of Trustees of the Colorado School of Mines in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science).

Golden, Colorado Date

Signed: Jian Lin

Signed: Dr. Dejun Yang, Thesis Advisor

Golden, Colorado Date

Signed: Dr. Tracy Camp, Professor and Head, Department of Computer Science

ABSTRACT

The rapid proliferation of sensor-embedded devices has enabled crowdsensing, a new paradigm which effectively collects sensing data from pervasive users. However, both the openness of crowdsensing systems and the richness of users' submitted data raise significant concerns for privacy and security. In this thesis, we aim to identify and solve privacy and security issues in crowdsensing. Specifically, we consider three important parts in crowdsensing: task allocation, incentive mechanisms, and truth discovery. In crowdsensing systems, task allocation is used to select a proper subset of users to perform tasks. Incentive mechanisms are used to stimulate users to participate in the system. Truth discovery is used to aggregate data. We first analyze privacy issues in task allocation and incentive mechanisms raised by the inference attack, in which a user is able to infer other users' sensitive information according to published information. We propose two task allocation algorithms which defend against the location-inference attack. To protect users' bid privacy from the inference attack, we propose two frameworks for privacy-preserving incentive mechanisms. Then, we analyze the security issues in incentive mechanisms and truth discovery raised by the Sybil attack, in which a user illegitimately pretends to be multiple identities to gain benefits. To deter users from conducting a Sybil attack, we propose Sybil-proof incentive mechanisms for both offline and online scenarios. Additionally, we propose a Sybil-resistant truth discovery framework to diminish the impact of the Sybil attack on the aggregated data. Both simulation and experiment results show the effectiveness of the proposed works in solving privacy and security issues in crowdsensing.

TABLE OF CONTENTS

ABSTRACT

LIST OF FIGURES

LIST OF TABLES

ACKNOWLEDGMENTS

CHAPTER 1 INTRODUCTION
    1.1 Task Allocation in Crowdsensing
    1.2 Incentive Mechanisms in Crowdsensing
    1.3 Truth Discovery in Crowdsensing
    1.4 Privacy Issues in Crowdsensing
        1.4.1 Privacy Issues in Task Allocation
        1.4.2 Privacy Issues in Incentive Mechanisms
    1.5 Security Issues in Crowdsensing
        1.5.1 Security Issues in Incentive Mechanisms
        1.5.2 Security Issues in Truth Discovery
    1.6 Contribution
    1.7 Thesis Organization

CHAPTER 2 RELATED WORK
    2.1 Privacy and Security in Task Allocation
    2.2 Privacy and Security in Incentive Mechanisms
    2.3 Privacy and Security in Truth Discovery

CHAPTER 3 PRESERVING LOCATION PRIVACY IN TASK ALLOCATION
    3.1 Background
    3.2 Model and Problem Formulation
        3.2.1 System Model
        3.2.2 Adversary Model
        3.2.3 Problem Formulation
    3.3 Our Approach
        3.3.1 Design of PASTA
        3.3.2 Design of Heuristic
    3.4 Performance Evaluation
        3.4.1 Evaluation Setup
        3.4.2 Impact of the Number of Tasks
        3.4.3 Impact of the Number of Users
        3.4.4 Impact of the Privacy Requirement
        3.4.5 Impact of the Parameter ǫ
    3.5 Conclusion

CHAPTER 4 PRESERVING BID PRIVACY IN INCENTIVE MECHANISMS
    4.1 Background
    4.2 Models and Problem Formulation
        4.2.1 Single-bid Model
        4.2.2 Multi-bid Model
        4.2.3 Threat Models
        4.2.4 Desired Properties
        4.2.5 Design Objective
    4.3 Our Approach
        4.3.1 Design Rationale
        4.3.2 Design of BidGuard
        4.3.3 Design of BidGuard-M
    4.4 Analysis
        4.4.1 Analysis of BidGuard
        4.4.2 Analysis of BidGuard-M
    4.5 Performance Evaluation
        4.5.1 Simulation Setup
        4.5.2 Evaluation of Social Cost
        4.5.3 Evaluation of Total Payment
        4.5.4 Evaluation of Privacy Leakage
    4.6 Conclusion

CHAPTER 5 DETERRING THE SYBIL ATTACK IN INCENTIVE MECHANISMS
    5.1 Background
    5.2 Model and Problem Formulation
        5.2.1 Offline Scenario
        5.2.2 Online Scenario
        5.2.3 Threat Models
        5.2.4 Desired Properties and Objective
    5.3 Our Approach
        5.3.1 Design of SPIM-S and SPIM-M
        5.3.2 Design of SOS and SOM
    5.4 Analysis
        5.4.1 Analysis of SPIM-S and SPIM-M
        5.4.2 Analysis of SOS and SOM
    5.5 Performance Evaluation of SPIM-S and SPIM-M
        5.5.1 Evaluation Setup
        5.5.2 Evaluation of Running Time
        5.5.3 Evaluation of Total Payment
        5.5.4 Evaluation of Platform Utility
    5.6 Performance Evaluation of SOS and SOM
        5.6.1 Evaluation Setup
        5.6.2 Evaluation of Total Payment
        5.6.3 Evaluation of Platform Utility
        5.6.4 Evaluation of Sybil-proofness
    5.7 Conclusion

CHAPTER 6 ALLEVIATING THE SYBIL ATTACK IN TRUTH DISCOVERY
    6.1 Background
        6.1.1 Truth Discovery
        6.1.2 Device Fingerprinting
    6.2 Model and Problem Formulation
        6.2.1 System Model
        6.2.2 Adversary Models
    6.3 Our Approach
        6.3.1 Design Rationale
        6.3.2 Design of Framework
        6.3.3 Design of Account Grouping Methods
    6.4 Experiment
        6.4.1 Experimental Setup
        6.4.2 Evaluation of Account Grouping
        6.4.3 Evaluation of Accuracy
    6.5 Conclusion

CHAPTER 7 CONCLUSION
    7.1 Summary of Results
    7.2 Summary of Publications
    7.3 Future Research Opportunities

REFERENCES CITED

APPENDIX COPYRIGHT PERMISSION

LIST OF FIGURES

Figure 1.1  Three major components in crowdsensing and their privacy and security issues studied in this thesis
Figure 3.1  Comparison between our work and existing works
Figure 3.2  System model
Figure 3.3  Example of location-inference attack via task overlap
Figure 3.4  Example of location-inference attack via task sequence
Figure 3.5  Example of mix-zone-based task allocation
Figure 3.6  GPS locations of the taxi drivers in Rome
Figure 3.7  Impact of m on PASTA, Heuristic, and PWSM
Figure 3.8  Impact of n on PASTA, Heuristic, and PWSM
Figure 3.9  Impact of n on Heuristic and PWSM
Figure 3.10 Impact of k on PASTA and Heuristic
Figure 3.11 Impact of ǫ on PASTA
Figure 4.1  Impact of the number of sensing tasks on the social cost. (a) BidGuard. (b) BidGuard-M
Figure 4.2  Impact of the number of users on the social cost. (a) BidGuard. (b) BidGuard-M
Figure 4.3  Comparison of BidGuard, TRAC, DP-hSRC and OPT. (a) Impact of the number of sensing tasks. (b) Impact of the number of users
Figure 4.4  Impact of the number of sensing tasks on the total payment. (a) BidGuard. (b) BidGuard-M
Figure 4.5  Impact of the number of users on the total payment
Figure 4.6  Impact of the number of sensing tasks on privacy leakage
Figure 4.7  Impact of the number of users on privacy leakage
Figure 4.8  Impact of ǫ on privacy leakage
Figure 4.9  Social cost vs. privacy leakage
Figure 5.1  Online crowdsensing system
Figure 5.2  Example showing MMT is not Sybil-proof in the SM case
Figure 5.3  Example showing MSensing is not Sybil-proof in the MM case
Figure 5.4  Running time
Figure 5.5  Total payment
Figure 5.6  Platform utility
Figure 5.7  Total payment
Figure 5.8  Platform utility
Figure 5.9  Sybil-proofness
Figure 6.1  MEMS-based accelerometer and gyroscope
Figure 6.2  Example of AG-FP
Figure 6.3  Example of AG-TS
Figure 6.4  Example of AG-TR
Figure 6.5  POIs for Wi-Fi signal strength measurement
Figure 6.6  Smartphone fingerprints in the first two principal components' space
Figure 6.7  ARI comparison
Figure 6.8  MAE comparison

LIST OF TABLES

Table 4.1  Example showing the inference attack in the single-bid model
Table 4.2  Example showing the inference attack in the multi-bid model
Table 5.1  Example showing vulnerabilities to the Sybil attack
Table 6.1  Example showing the Sybil attack in truth discovery
Table 6.2  Temporal and spectral features
Table 6.3  Example showing the Sybil attack in truth discovery
Table 6.4  Models of smartphones used in the experiment

ACKNOWLEDGMENTS

First and foremost, I would like to thank my advisor, Dr. Dejun Yang, for his endless support during the entire process. He always pushed me towards interesting and important research topics in crowdsensing. His graceful advising style and rigorous research style will always be a great source of inspiration for my future work. It is my great honor to be his first PhD student. Additionally, I would like to thank my committee members, Dr. Alexandra Newman, Dr. Qi Han, and Dr. Hao Zhang. They have offered me invaluable guidance and advice for improving the quality of this thesis. I would also like to thank all of my colleagues: Ming Li, Yuhui Zhang, and Nan Jiang. I will miss the brainstorming sessions in our group meetings. Finally, and most importantly, I would like to thank my family and all of my friends for encouraging and supporting me throughout my entire life.

CHAPTER 1 INTRODUCTION

With the rapid proliferation of mobile devices equipped with rich on-board sensors (e.g., camera, accelerometer, and compass), crowdsensing emerges as a new paradigm which outsources sensing tasks to a crowd of ubiquitous participants. A sensing task asks a user to collect data using the sensors on its mobile device. The effectiveness of crowdsensing in collecting data enables numerous applications which facilitate our lives in various aspects, e.g., transportation [1], environmental monitoring [2], and social networks [3]. A typical crowdsensing system consists of a cloud-based platform and a large number of mobile users. The platform works as a sensing service buyer who posts the required sensing information and recruits a set of users to provide sensing services. Once selected by the platform, a user starts to collect the required data and sends it back to the platform. The success of a crowdsensing system relies on two factors: 1) a sufficient number of users and 2) the quality of the sensing data contributed by individual users. These two factors are influenced by three important components in crowdsensing: task allocation, incentive mechanisms, and truth discovery, as shown in Figure 1.1. The goal of this thesis is to identify and solve privacy and security issues in these three parts of crowdsensing. Figure 1.1 illustrates the research topics in this thesis.

1.1 Task Allocation in Crowdsensing

In a typical crowdsensing system, users are registered as candidate workers. For any task, the platform usually selects a proper subset of candidates to perform the task. This user-selection process, called task allocation, is a key component of crowdsensing since it significantly impacts the performance of the system. For example, the platform may exceed its budget (the maximum amount of money the platform is willing to pay the users) if it selects users with high costs, or may suffer a higher sensing latency if it selects users far away from the task. In addition, users may be reluctant to participate in the system if performing a task incurs a higher cost (e.g., a large travel distance) than they expected. In the literature, many works have been proposed to optimize task allocation in crowdsensing for different goals, such as maximizing the sensing quality with budget constraints [4, 5] and minimizing the total travel distance [6, 7].

Figure 1.1: Three major components in crowdsensing and their privacy and security issues studied in this thesis

1.2 Incentive Mechanisms in Crowdsensing

In reality, users consume their own resources, such as battery and sensing time, while completing sensing tasks. In addition, they might suffer from potential privacy disclosure by sharing their sensed data together with personal information (e.g., location tags and bid prices). Therefore, users may be reluctant to participate in a crowdsensing system unless they are paid a reward that compensates for their resource consumption or potential privacy leaks. Since the number of participating users has a significant impact on the performance of crowdsensing systems, it is necessary to stimulate users to join them. In recent years, many auction-based incentive mechanisms have been proposed for crowdsensing [8–11]. They are essentially reverse auctions in which the platform is the service buyer and the users are the bidders selling sensing services.
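To make the reverse-auction setting concrete, the following toy sketch (our own illustration, not any of the mechanisms in [8–11]; all names and numbers are hypothetical) shows a platform acting as the buyer and greedily purchasing sensing services from the cheapest bidders within a budget.

```python
def reverse_auction(bids, budget):
    """bids: {user_id: asking price for its sensing service}.
    The platform, acting as the buyer, greedily selects the cheapest
    bidders whose total asking price fits within the budget."""
    winners, spent = [], 0.0
    for user, bid in sorted(bids.items(), key=lambda kv: kv[1]):
        if spent + bid <= budget:
            winners.append(user)
            spent += bid
    return winners, spent

# u2 and u3 are selected; adding u1 would exceed the budget.
winners, spent = reverse_auction({"u1": 3.0, "u2": 1.5, "u3": 2.0}, budget=4.0)
```

A real incentive mechanism would additionally compute payments (typically above the winning bids) so that bidding one's true cost is a dominant strategy; that part is omitted in this sketch.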

1.3 Truth Discovery in Crowdsensing

In crowdsensing, the platform collects sensing data from the selected users. In practice, the quality of users' sensing data varies due to many factors, e.g., insufficient skill, poor sensor quality, and environmental noise. Therefore, the platform needs to properly aggregate the noisy sensing data collected from a variety of sources to recover the true information (i.e., the truths). Ideally, the platform would use a weighted aggregation method that assigns higher weights to reliable users, so that the aggregated result is close to the data provided by reliable users. However, the reliability of users is usually unknown to the platform. Truth discovery [12] has been proposed to address this issue. Without any prior knowledge about users' reliability, a truth discovery algorithm iteratively assigns weights to users according to the quality of their data and computes the estimated truths as the weighted average of all data. This process repeats until a convergence criterion is satisfied. The criterion is application-specific; for example, the algorithm can be treated as converged once the difference between the estimated truths in two consecutive iterations falls below a threshold.
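To make the iteration concrete, the following minimal sketch alternates between the weight update and the weighted average until the estimates stabilize. The log-ratio weight rule is one common choice from the truth discovery literature, not the specific design of any algorithm in [12, 20–22], and all names are our own illustration.

```python
import numpy as np

def truth_discovery(reports, max_iters=100, tol=1e-6, eps=1e-12):
    """Iterative truth discovery sketch.
    reports: array of shape (num_users, num_tasks); reports[i][j] is
    user i's observation for task j (continuous data, no missing values).
    Returns (estimated truths, user weights)."""
    reports = np.asarray(reports, dtype=float)
    truths = reports.mean(axis=0)          # start from the unweighted mean
    for _ in range(max_iters):
        # Weight update: users whose data are close to the current
        # estimates receive higher weights (log-ratio rule).
        errors = ((reports - truths) ** 2).sum(axis=1) + eps
        weights = np.log(errors.sum() / errors)
        # Truth update: weighted average of all users' data.
        new_truths = weights @ reports / weights.sum()
        # Converged once the estimates change less than the threshold.
        if np.max(np.abs(new_truths - truths)) < tol:
            truths = new_truths
            break
        truths = new_truths
    return truths, weights

# Three users observe two tasks; the third user is unreliable and
# therefore receives a small weight.
truths, weights = truth_discovery([[10.1, 20.2], [9.9, 19.8], [15.0, 30.0]])
```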

1.4 Privacy Issues in Crowdsensing

In this thesis, we first analyze the privacy issues in task allocation and incentive mechanisms raised by the inference attack [13], in which a user is able to infer other users' sensitive information according to the published information.

1.4.1 Privacy Issues in Task Allocation

Although many works [4–7] have been proposed to optimize task allocation in crowdsensing, they all neglect users' location privacy. Recently, it has been pointed out that an attacker can conduct different attacks such as physical stalking, identity theft, and breach of sensitive information if the location of a user is known [14]. Therefore, disclosing locations may severely discourage users' participation in crowdsensing.

In this thesis, we propose a work to address the location privacy issues in task allocation. Specifically, we formulate the location privacy-aware spatial task allocation problem, which aims to maximize the total number of assigned tasks while providing personalized location privacy protection for users against the location-inference attack (to be elaborated in Section 3.2.2).

1.4.2 Privacy Issues in Incentive Mechanisms

Recently, many auction-based incentive mechanisms have been proposed for crowdsensing [8–11]. The objectives of these mechanisms focus on either maximizing the total value gained by the platform or minimizing the total payment to the selected users. However, none of them takes users' bid privacy into consideration. Protecting users' bid privacy is important because its disclosure might also threaten users' other private information, such as location [15, 16]. Therefore, we propose a work in this thesis to protect users' bid privacy in incentive mechanisms for crowdsensing. Specifically, we formalize the notion of users' bid privacy by employing the concept of differential privacy [17]. Intuitively, a mechanism provides differential privacy if the change of one user's bid has limited impact on the outcome. To preserve users' bid privacy, we leverage the exponential mechanism [18], a technique for designing differentially private mechanisms.
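As a concrete illustration of the exponential mechanism, the sketch below selects one winner from a set of bids: lower bids receive exponentially higher selection probability, and the randomness bounds how much any single bid change can shift the outcome. The score function and parameter names are our own illustrative choices, not the exact design used in this thesis.

```python
import numpy as np

def exponential_mechanism(bids, epsilon, sensitivity=1.0):
    """Select one bidder with probability proportional to
    exp(epsilon * score / (2 * sensitivity)). Here a bid is scored by
    its negation, so lower bids are more likely to win."""
    bids = np.asarray(bids, dtype=float)
    scores = -bids
    # Subtract the max score before exponentiating for numerical stability.
    logits = epsilon * (scores - scores.max()) / (2 * sensitivity)
    probs = np.exp(logits)
    probs /= probs.sum()
    return np.random.choice(len(bids), p=probs)

# The lowest bid (3.0) is the most likely, but not certain, winner.
winner = exponential_mechanism([3.0, 5.0, 4.0], epsilon=1.0)
```

A larger epsilon makes the selection closer to deterministically picking the lowest bid, trading privacy for utility.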

1.5 Security Issues in Crowdsensing

In Section 1.4, we discussed the privacy issues in crowdsensing, which concern the safeguarding of users. Next, we shift our focus to the safeguarding of the platform. Specifically, we analyze the security issues in incentive mechanisms and truth discovery raised by the Sybil attack [19], in which a user illegitimately pretends to be multiple identities to gain benefits.

1.5.1 Security Issues in Incentive Mechanisms

In crowdsensing, a user may try to profit from submitting multiple bids under fictitious identities, e.g., by creating multiple accounts. This is known as a Sybil attack. This attack is easy to conduct but difficult to detect. The vulnerability of an incentive mechanism to the Sybil attack may make the system fail to achieve the desired properties and jeopardize the fairness of the system, which discourages users from participating in crowdsensing. We analyze existing auction-based incentive mechanisms and demonstrate that they are all vulnerable to the Sybil attack. Therefore, the problem of designing Sybil-proof auction-based incentive mechanisms for crowdsensing is still open. Moreover, the Sybil attack model in crowdsensing is yet to be formally defined. In this thesis, we propose a work to design Sybil-proof incentive mechanisms for crowdsensing. Specifically, we consider both the offline and online scenarios. In the offline scenario, users are required to submit their bids at the beginning of the auction, and the platform selects a subset of users according to some criteria. In the online scenario, users participate in the system in a random order; once a user arrives, the system has to decide whether to select it. We first formally define the Sybil attack model in both scenarios. Then, we provide a sufficient condition for a mechanism to be Sybil-proof in each scenario. Finally, we design Sybil-proof incentive mechanisms for each scenario.

1.5.2 Security Issues in Truth Discovery

Although the benefits of truth discovery in data aggregation are well understood, the fact that aggregation accuracy highly depends on the quality of input data raises security concerns. Existing truth discovery algorithms [12, 20–22] assume that most users are reliable. However, this assumption does not hold in practice, especially when a crowdsensing system is under the Sybil attack. In crowdsensing, a user could conduct the Sybil attack by submitting data using multiple accounts for various motives. For example, a rapacious user may want

to receive more rewards than it is supposed to be paid without contributing extra effort (e.g., by submitting duplicated data multiple times using different accounts). In addition, a malicious user may aim to manipulate the results of the system (e.g., by submitting multiple fake data using different accounts). Therefore, the power of truth discovery algorithms will be undermined unless they are resistant to the Sybil attack. In this thesis, we propose a work which aims to diminish the impact of the Sybil attack on truth discovery algorithms. We consider two types of Sybil attacks based on whether a Sybil attacker uses multiple devices. These two types of attacks are sufficient to represent the general Sybil attack in crowdsensing. We propose three account-grouping methods to cluster suspicious users. The first method is based on users' device fingerprints and can defend against the first type of attack. To defend against the second type of attack, we propose two further methods to handle different scenarios. The second method is based on users' accomplished task sets and handles the scenario in which accounts have diverse accomplished task sets. The third method is based on users' trajectories and handles the scenario in which most users perform similar tasks. These account-grouping methods are paired with a truth discovery algorithm and ensure high aggregation accuracy under the Sybil attack.

1.6 Contribution

The goal of this thesis is to identify and solve privacy and security issues in crowdsensing. Specifically, we consider privacy and security issues in three important parts of crowdsensing: task allocation, incentive mechanisms, and truth discovery. The main contributions of this thesis are as follows:

• For task allocation, we identify and solve the privacy issues raised by the inference attack. We propose two task allocation algorithms which maximize the number of assigned tasks while providing personalized location privacy protection against the location-inference attack.

• For incentive mechanisms, we first identify and solve the privacy issues raised by the inference attack. We design two frameworks for privacy-preserving auction-based incentive mechanisms. Then, we identify and solve the security issues raised by the Sybil attack. We first provide a sufficient condition for a mechanism to be Sybil-proof in each of the offline and online scenarios. These two sufficient conditions can be used as guidelines for designing Sybil-proof incentive mechanisms. Finally, we design Sybil-proof incentive mechanisms for both the offline and online scenarios.

• For truth discovery, we also identify and solve the security issues raised by the Sybil attack. To diminish the impact of the Sybil attack on the aggregated data, we propose a Sybil-resistant truth discovery framework. To realize the framework, we design three account grouping methods, which are paired with a truth discovery algorithm.

1.7 Thesis Organization

In this thesis, each of the aforementioned works is summarized in a chapter. These chapters together contribute to our goal of identifying and solving security and privacy issues in crowdsensing. The remainder of this thesis is organized as follows: In Chapter 2, we review the existing works related to privacy and security in crowdsensing. In Chapter 3, we identify and solve the location privacy issues in task allocation. In Chapter 4, we identify and solve the bid privacy issues in incentive mechanisms. In Chapter 5, we identify and address the vulnerability of existing incentive mechanisms to the Sybil attack. In Chapter 6, we analyze the security issues in truth discovery raised by the Sybil attack. We conclude this thesis in Chapter 7.

CHAPTER 2 RELATED WORK

2.1 Privacy and Security in Task Allocation

Existing works have devoted a lot of effort to task allocation since it is an important problem in crowdsensing [23]. The objective of task allocation in crowdsensing is to optimize the overall system utility while completing the tasks sent by requesters. In the literature, system utilities can be sensing data quality [24], incentive cost [25], energy consumption [26], and travel distance [6, 7, 27]. Zhao et al. [28] considered a crowdsensing system in which each task requires multiple workers to accomplish and proposed greedy algorithms for worker selection. Wang et al. [29] proposed a demand-based dynamic task allocation mechanism to balance the participation among location-dependent tasks. In [30], a task allocation method was proposed to maximize the profits of participants. In crowdsensing, a user usually needs to travel to a specific location to perform a task. Therefore, users' locations play an important role in task allocation because they significantly impact the performance of the system. However, all of the above works neglect the protection of users' locations, which raises serious privacy issues. With the increasing awareness of users' location privacy, many location privacy-preserving works have been proposed for task allocation using different techniques. The spatial cloaking technique is a straightforward method to protect location privacy in task allocation [31–33]: each user's location is hidden in a cloaked region such that an attacker cannot obtain the user's actual location. The k-anonymity technique [34] is a popular cloaking technique and has been widely used in maintaining data privacy. The main idea is to make each record in the database k-anonymous, i.e., not distinguishable among k−1 other records. The concept of k-anonymity for location privacy was introduced by Gruteser et al. [35]. However, the privacy guarantee of these cloaking methods will be jeopardized if an attacker has prior knowledge.

To address this issue, differential privacy [17] was introduced for privacy-preserving task allocation. Andres et al. [36] proposed a location perturbation method which obfuscates a user's real location to a fake location according to a pre-configured probability function. To et al. [14] treated the cellular service provider as a trusted third party, and differential location privacy is preserved by perturbing the total number of participants in a certain region. However, a cellular service provider has no incentive to join the system in practice. To avoid involving a third party, Wang et al. [37] proposed a location privacy-preserving task allocation framework with geo-obfuscation. However, users may be reluctant to participate since they need to frequently update the obfuscating function whenever the task distribution changes. In addition, both of the above methods employ the same level of privacy protection for all workers, which fails to satisfy users' different privacy demands. Therefore, a personalized privacy-preserving task allocation framework was proposed [38], in which each user submits its obfuscated distance and privacy level to the platform. Recently, Wang et al. [39] combined k-anonymity and differential privacy to achieve the anonymity of users. In addition, some works [40, 41] proposed to protect users' bids in auction-based crowdsensing systems to indirectly preserve users' location privacy. However, all of the above works are vulnerable to the location-inference attack, in which a user is able to infer other users' location information through the task assignment. Note that existing works address location privacy issues and task allocation in a disjoint manner: location obfuscation is required to protect users' actual locations. Our work differs from existing works in that we address location privacy issues and task allocation in a joint manner. Specifically, we formulate a location privacy-aware spatial task allocation problem, which aims to maximize the total number of assigned tasks while protecting users' location privacy against the location-inference attack. Meanwhile, we do not involve any third party, and users do not need to protect their location privacy on their side.

9 2.2 Privacy and Security in Incentive Mechanisms

In recent years, incentive mechanisms in crowdsensing have been widely studied [42, 43]. As one of the pioneering works on designing incentive mechanisms for crowdsensing, Yang et al. [8, 44] proposed two incentive mechanisms for the user-centric and platform-centric models using an auction and a Stackelberg game, respectively. The objectives of most state-of-the-art incentive mechanisms are either maximizing the total utility/value of the platform under a certain constraint (e.g., budget) [45] or minimizing the total payment of the platform [46]. Feng et al. [47] proposed a mechanism called TRAC, which takes into consideration the importance of location information when assigning sensing tasks. Some efforts have been made to protect users' privacy in crowdsensing [48]. Although providing good performance in privacy preservation, the mechanisms in [49–56] are based on cryptography techniques and do not take into consideration users' strategic behaviors. Besides, all cryptography-based works are vulnerable to the inference attack, in which an attacker can infer other users' private information through published results. Sun et al. [57] proposed an auction-based incentive mechanism which encrypts users' bids by oblivious transfer. However, it does not solve the privacy issue raised by the inference attack because one user can still infer others' bids from the received payment. Jin et al. proposed a privacy-preserving approximately truthful incentive mechanism [58], which minimizes the total payment, and a privacy-preserving framework [59] for data aggregation. However, none of the above works has a performance guarantee on social cost (i.e., the total cost of the selected users). In this thesis, our objectives are to preserve users' bid privacy against the inference attack while achieving approximate social cost minimization. Next, we analyze security issues in incentive mechanisms. Specifically, we consider the impact of the Sybil attack [19] on incentive mechanisms. In recent years, security issues raised by the Sybil attack have been widely analyzed in a variety of domains, e.g., social networks [60], crowdsourced mobile apps [61], virtual machine instance allocation [62], and wireless sensor networks [63]. The effects of the Sybil attack on combinatorial auctions were first analyzed

in [64]. This work proved that the VCG auction is not Sybil-proof. Yokoo et al. [65] introduced the price-oriented rationing-free protocols, which characterize the Sybil-proof protocols for combinatorial auctions. Most existing incentive mechanisms (e.g., [8, 9, 44, 66, 67]) focus on the offline scenario, in which users are required to submit their bids at the beginning of the auction and the platform selects a subset of users according to some criteria for different objectives, e.g., maximizing social welfare or platform utility. Some works (e.g., [45, 54, 68–73]) consider a more practical yet dynamic online scenario, in which users participate in the system in a random order. Once a user arrives, the platform has to make irrevocable decisions on whether to select it and how much it should be paid, without knowing future information. However, none of the existing mechanisms takes the Sybil attack into consideration. We analyzed all existing auction-based incentive mechanisms and demonstrated that they are all vulnerable to the Sybil attack. We show this vulnerability in Section 5.2.3.

2.3 Privacy and Security in Truth Discovery

In crowdsensing, sensing data are collected from individual users with different qualities. Truth discovery, which aims to identify true information (i.e., the truth) out of all collected data, has received considerable attention in both industry and academia. Truth discovery refers to a family of algorithms [12, 20–22] that aim to discover the truth from a crowd of users' noisy data. Recently, Jin et al. [74] took into consideration users' strategic behaviors and proposed a payment mechanism to incentivize high-effort sensing from users. Tang et al. [75] considered privacy issues in truth discovery and proposed a non-interactive privacy-preserving truth discovery system for crowdsensing. Zhu et al. [76] proposed to use interactive filtering truth discovery to provide reliable crowdsensing services in connected vehicular cloud computing. However, none of these works considered the Sybil attack, and thus they may yield unsatisfactory results due to this vulnerability. Recently, some Sybil-proof incentive mechanisms [77–79] have been proposed. Although these Sybil-proof

incentive mechanisms can fundamentally eliminate rapacious users' motivation to conduct the Sybil attack under their models, they cannot address malicious users who aim to manipulate the aggregated data through the Sybil attack, because they consider only rapacious users in their models. Therefore, the problem of designing Sybil-resistant truth discovery algorithms for crowdsensing is still open. We will show the vulnerability of existing truth discovery algorithms to the Sybil attack in Section 6.2.2.

CHAPTER 3 PRESERVING LOCATION PRIVACY IN TASK ALLOCATION

3.1 Background

Task allocation is an important problem in crowdsensing since it has a significant impact on the effectiveness of a system. For example, a platform may exceed the budget if it selects users with high costs or may have a higher sensing latency if it selects users far away from tasks. In addition, users may be reluctant to participate in crowdsensing if performing a task incurs a higher cost (e.g., large travel distance) than they expected. In recent years, many works have been proposed to optimize task allocation in crowdsensing for different goals such as maximizing the sensing quality with budget constraints [4, 5] and minimizing the total travel distance [6, 7]. However, these works neglect users’ location privacy. In [14], it has been pointed out that an attacker can conduct different attacks such as physical stalking, identity theft, and breach of sensitive information if a user’s location is known. Therefore, disclosing users’ location may severely discourage users’ participation. To protect users’ location privacy during task allocation, several solutions have been proposed. A widely used technique is spatial cloaking [31–33], which uses a cloaked region to represent users’ actual locations. However, the privacy guarantee of these cloaking methods will be jeopardized if an attacker has prior knowledge. Recently, differential privacy has been introduced to secure users’ location privacy [14, 37, 38, 80, 81]. These methods provide theoretically guaranteed location privacy protection regardless of attackers’ prior knowledge. Although different techniques are used, all of the above works address the location privacy issues and task allocation in a disjoint manner as shown in Figure 3.1(a). Essentially, they all need to obfuscate users’ actual locations to fake locations. The obfuscation process can be done by a trusted third party or users themselves. Taking these fake locations as input instead of users’ actual locations, the platform determines task allocation according

to different objectives. However, the quality of the system might be degraded due to the obfuscation. For example, a user may be far away from a task while its obfuscated location is close to the task; if this user is assigned the task, it incurs a high travel cost. Conversely, a user's obfuscated location might be far from a task while its actual location is close to it, which might lead to the failure of task allocation since there appear to be not enough users in the vicinity of the task. In addition, these works are all vulnerable to the location-inference attack (to be elaborated in Section 3.2.2).

Figure 3.1: Comparison between our work and existing works. (a) Existing works address location privacy issues and task allocation in a disjoint manner: users' locations are first obfuscated, and the task assignment is computed from the obfuscated locations. (b) Our work addresses location privacy issues and task allocation in a joint manner: the task assignment is computed directly from users' locations by a location privacy-aware spatial task allocation algorithm.

In this work, we propose to address location privacy issues and task allocation in a joint manner, as shown in Figure 3.1(b). Specifically, we formulate a location privacy-aware spatial task allocation problem, which aims to maximize the total number of assigned tasks while providing personalized location privacy protection for users against the location-inference attack. Note that we choose to maximize the number of assigned tasks since this number is an important measurement of the effectiveness of a system and is also related to other measurements, such as the total value of tasks to a system. Different from existing works, we assume the platform is trusted while anyone else knowing the task assignment can

be an attacker. Meanwhile, no trusted third party is required in our work.

3.2 Model and Problem Formulation

In this section, we first introduce the system model and the adversary model. Then, we formalize the location privacy-aware spatial task allocation problem and prove its NP-completeness.

3.2.1 System Model

We consider a crowdsensing system as shown in Figure 3.2, which consists of a cloud-based platform, a set of data requesters, and a crowd of mobile users. Each user sends its actual location to the platform. It is worth noting that we assume the platform is trusted in this work. A requester needs to collect sensing data (e.g., cellular signal strength [82] and noise level measurements [2]) at specific locations. The sensing work at each location is defined as a task. When the platform receives tasks from a requester, it assigns the tasks to selected users based on their locations. The selected users perform the assigned tasks and send the sensing data back to the platform. The platform aggregates the collected data and sends the sensing results to the requester. We assume that each task can be assigned to multiple users and that a task is completed once it is assigned to a user. Since each selected user has to travel to the task location to collect sensing data, and it usually has a travel budget [32], also referred to as the maximum travel distance, we assume that a user can only perform tasks within a distance of R, where R is the maximum distance a user is willing to travel from its location. In this work, we assume that all users have the same maximum travel distance.

3.2.2 Adversary Model

In this work, we assume that the platform is trusted while an attacker can be anyone who has access to the task assignment. The task assignment consists of the selected users and their corresponding assigned tasks. An attacker might obtain the task assignment by

hacking the platform or through information published by the platform. For example, the system in [41] publishes the task assignment to ensure transparency; it has been shown in [83] that transparency is essential to efficiency in the procurement procedure since it enhances the competitiveness of public procurement. How an attacker obtains the task assignment is out of the scope of this work. We focus instead on how an attacker can infer a target user's location information from the task assignment through the location-inference attack [80]. We specify two types of location-inference attack.

Figure 3.2: System model

Location-inference attack via task overlap: In our system, each task is associated with a location, and users can only perform tasks within a distance of R. This implies that the users selected for a task must be in the circle centered at the task location with radius R. Therefore, an attacker may infer a target user's location information through the locations of its assigned tasks. As an example, in Figure 3.3, assume that user u1 is assigned two tasks t1 and t2 in one task assignment. Since the task locations are published by the platform, an attacker can infer that u1 is highly likely to be in the shaded region (the overlap of the two tasks' ranges).
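To see how quickly task overlap shrinks the attacker's uncertainty, the following sketch (our own illustration; the coordinates are hypothetical) estimates, by Monte Carlo sampling, the area of the region consistent with a set of assigned tasks, i.e., the intersection of the radius-R disks around the task locations.

```python
import random

def overlap_area(tasks, R, region=(0.0, 0.0, 10.0, 10.0), samples=100_000):
    """Estimate the area where a user assigned all `tasks` may be located:
    the intersection of the radius-R disks centered at the task locations.
    `region` is the bounding box (x0, y0, x1, y1) of the target area."""
    x0, y0, x1, y1 = region
    hits = 0
    for _ in range(samples):
        x, y = random.uniform(x0, x1), random.uniform(y0, y1)
        if all((x - tx) ** 2 + (y - ty) ** 2 <= R * R for tx, ty in tasks):
            hits += 1
    return hits / samples * (x1 - x0) * (y1 - y0)

# One assigned task leaves a large feasible region; a second task shrinks it.
print(overlap_area([(4.0, 5.0)], R=1.0))              # ~ pi * R^2 ~ 3.14
print(overlap_area([(4.0, 5.0), (5.5, 5.0)], R=1.0))  # much smaller lens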

Furthermore, an attacker can narrow down the shaded region if u1 is assigned more tasks.

Such narrowing obviously breaches u1's location privacy. Note that the location privacy in this work refers to the secrecy of each user's actual location sent to the platform for task allocation. Protecting this location information is important since it usually corresponds to a user's most frequently visited locations, such as home and workplace. One possible solution to preserve a user's location privacy in this case is to limit the number of tasks each user can be assigned in one assignment, so that an attacker cannot narrow down the user's location via task overlap. Therefore, we assume that each user can be assigned at most one task in one assignment in our system.

Figure 3.3: Example of location-inference attack via task overlap

Location-inference attack via task sequence: Next, we demonstrate that an attacker can still infer a target user's location information even if the assigned tasks have no overlap. As shown in Figure 3.4, assume that user u1 participates in the system multiple times and is assigned three tasks t1, t2, and t3 in three task assignments. Note that each user can be assigned at most one task at a time. An attacker can obtain the assigned task sequence of the target user (i.e., t1 → t2 → t3) from the assignments according to the timestamps of the assigned tasks. Although there is no overlap between these three tasks, according to a recent study [84], an attacker can still infer important location information by analyzing the spatial-temporal correlation of the task sequence.

For example, suppose an attacker knows that 1) t1 is in a central business district (CBD), 2) there is an elementary school within the range of t2, and 3) t3 is located in a residential area. This task sequence might then be classified as a travel pattern in which a user picks up its child after work and goes back home. Although u1's location was protected at each individual task, the task sequence combined with side information such as this travel pattern may reveal its actual location at the elementary school when it is assigned t2.

Figure 3.4: Example of location-inference attack via task sequence

To protect users' location privacy in this case, we leverage the concept of mix-zone [85]. To prevent users from being identified by the locations they visit, mix-zones frequently change users' pseudonyms. Specifically, a user swaps its pseudonym with a new pseudonym chosen among the pseudonyms of the users inside a mix-zone [86]. For example, when k users are inside a mix-zone at the same time, their identities are shuffled, resulting in k-anonymity. Note that at least k users are required in each mix-zone to provide sufficient assurance of the unlinkability of pseudonyms [85]. In this work, we treat the disk centered at each task with radius R as a mix-zone; the selected users inside this disk exchange their pseudonyms with each other. Figure 3.5 illustrates the same task sequence assigned to u1 as in Figure 3.4, but each task is now assigned to two users, and u1 swaps its pseudonym with the other user assigned to t1, t2, and t3, respectively. Therefore, an attacker cannot infer u1's task sequence from the task assignment since it cannot link u1's pseudonyms together.

Figure 3.5: Example of mix-zone-based task allocation
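The pseudonym exchange itself is simple to state in code. The sketch below is our own illustration (the pseudonym strings are hypothetical): it uniformly permutes the pseudonyms of the k users selected for the same task, so that, from the attacker's view, each new pseudonym is equally likely to belong to any of the k users.

```python
import random

def mix_zone_swap(pseudonyms):
    """Randomly permute the pseudonyms of the users selected for one task
    (i.e., the users inside the same mix-zone). Returns a mapping from
    each user's old pseudonym to its new one."""
    shuffled = pseudonyms[:]
    random.shuffle(shuffled)
    return dict(zip(pseudonyms, shuffled))

# Two users assigned the same task swap (or keep) pseudonyms at random,
# providing 2-anonymity for the next assignment.
print(mix_zone_swap(["p17", "p42"]))
```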

3.2.3 Problem Formulation

In this work, the platform aims to assign as many tasks as possible while preserving users' location privacy. We choose to maximize the number of assigned tasks for the following reasons: 1) it is an important measurement of the effectiveness of a system; 2) users' location privacy is protected at the potential cost of sacrificing this number; 3) it is related to other measurements, e.g., the total travel distance and the total value of tasks to a system. We assume the platform has a set $\mathcal{T} = \{\tau_1, \tau_2, \ldots, \tau_m\}$ of $m$ tasks sent by requesters and a set $\mathcal{U} = \{u_1, u_2, \ldots, u_n\}$ of $n$ users. Each task is associated with a location. Each user is associated with a location and can be assigned at most one task to protect its location privacy against the location-inference attack via task overlap. Besides, each user $u_i$ has a privacy requirement $k_i$ representing the minimum number of users who must be assigned the same task as $u_i$. For example, $k_1 = 3$ means that $u_1$ can only be assigned a task which is also assigned to at least two other users. Let $k_{\max} = \max_{u_i \in \mathcal{U}} k_i$. Each user $u_i$ should be assigned the same task as at least $k_i - 1$ other users to defend against the location-inference attack via task sequence. Let $d_{ij}$ denote the Euclidean distance between user $u_i$ and task $\tau_j$, and let $a_{ij}$ denote the indicator such that $a_{ij} = 1$ means user $u_i$ is assigned task $\tau_j$, and $a_{ij} = 0$ otherwise. Therefore, the location privacy-aware spatial task allocation problem is formulated as:

$$\max \sum_{\tau_j \in \mathcal{T}} \pi_j \tag{3.1}$$

$$\text{s.t.} \quad d_{ij} a_{ij} \le R, \quad \forall u_i \in \mathcal{U}, \tau_j \in \mathcal{T}, \tag{i}$$

$$\sum_{\tau_j \in \mathcal{T}} a_{ij} \le 1, \quad \forall u_i \in \mathcal{U}, \tag{ii}$$

$$\sum_{u_i \in \mathcal{U}} a_{ij} \ge k_l a_{lj}, \quad \forall u_l \in \mathcal{U}, \tau_j \in \mathcal{T}, \tag{iii}$$

$$\sum_{u_i \in \mathcal{U}} a_{ij} \ge \pi_j, \quad \forall \tau_j \in \mathcal{T}, \tag{iv}$$

$$a_{ij} \in \{0, 1\}, \quad \forall u_i \in \mathcal{U}, \tau_j \in \mathcal{T}, \tag{v}$$

where $\pi_j = 1$ indicates that $\tau_j$ is assigned (i.e., $a_{ij} = 1$ for some $u_i \in \mathcal{U}$). The objective is to maximize the number of assigned tasks. Constraints (i) are due to users' maximum travel distance. Constraints (ii) ensure that each user is assigned at most one task. Constraints (iii) ensure that, for each task $\tau_j$, the number of selected users is at least the privacy requirement of every user $u_l$ assigned this task. Constraints (iv) and (v) ensure that $\pi_j = 1$ only if $\tau_j$ is assigned to at least one user.
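For readers who want to experiment with the formulation, it maps directly to a 0-1 integer linear program. The sketch below expresses objective (3.1) and constraints (i)-(v) using the open-source PuLP library; this is our own illustration (the thesis does not prescribe a solver), and PuLP's bundled CBC backend is assumed.

```python
import math
import pulp

def allocate(users, tasks, R, k):
    """users: {i: (x, y)}; tasks: {j: (x, y)}; k: {i: privacy requirement}.
    Solves the location privacy-aware spatial task allocation ILP."""
    d = {(i, j): math.dist(users[i], tasks[j]) for i in users for j in tasks}
    prob = pulp.LpProblem("lpsta", pulp.LpMaximize)
    a = {(i, j): pulp.LpVariable(f"a_{i}_{j}", cat="Binary")
         for i in users for j in tasks}
    pi = {j: pulp.LpVariable(f"pi_{j}", cat="Binary") for j in tasks}
    prob += pulp.lpSum(pi.values())                         # objective (3.1)
    for i in users:
        prob += pulp.lpSum(a[i, j] for j in tasks) <= 1     # (ii)
        for j in tasks:
            prob += d[i, j] * a[i, j] <= R                  # (i)
    for j in tasks:
        prob += pulp.lpSum(a[i, j] for i in users) >= pi[j]     # (iv)
        for l in users:
            prob += pulp.lpSum(a[i, j] for i in users) >= k[l] * a[l, j]  # (iii)
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {j: [i for i in users if a[i, j].value() == 1] for j in tasks}
```

Since the problem is NP-complete, such an exact solver is practical only for small instances, which motivates the approximation and heuristic algorithms of Section 3.3.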

Theorem 1. The location privacy-aware spatial task allocation problem is NP-complete.

Proof. We prove the NP-completeness of the problem by a polynomial-time reduction from the planar exact cover by 3-sets (Planar X3C) problem, which is NP-complete [87]. The Planar X3C problem is defined as follows: let $\mathcal{F} = \{S_1, \ldots, S_p\}$ be a family of sets, where each $S_i$ is a 3-element subset of a set $\mathcal{X} = \{x_1, \ldots, x_{3n}\}$, and the bipartite graph $G_{X3C} = (V_{X3C}, E_{X3C})$, where $V_{X3C} = \mathcal{X} \cup \mathcal{F}$ and $E_{X3C} = \{(x_i, S_j) \mid x_i \in S_j, 1 \le i \le 3n, 1 \le j \le p\}$, is planar. Does there exist a subfamily $\mathcal{F}^*$ of $\mathcal{F}$ such that each element of $\mathcal{X}$ occurs in exactly one member of $\mathcal{F}^*$? Next, we construct an instance of the location privacy-aware spatial task allocation problem from an instance of the Planar X3C problem. We create a task in the task set $\mathcal{T}$ for each element in $\mathcal{F}$, and a user in the user set $\mathcal{U}$ for each element in $\mathcal{X}$. For each user, $k$ is set to 3. It is straightforward to verify that there is a solution to the Planar X3C problem if and only if there is a solution to the location privacy-aware spatial task allocation problem. Therefore, the location privacy-aware spatial task allocation problem is NP-complete.

Since the privacy-aware spatial task allocation problem is NP-complete, we aim to find an approximate solution.

3.3 Our Approach

In this section, we first present an approximation algorithm to solve the location privacy-aware spatial task allocation problem. Then, we propose a heuristic algorithm to reduce the time complexity.

3.3.1 Design of PASTA

Due to its NP-hardness, the location privacy-aware task allocation problem is unlikely to have an efficient optimal algorithm unless P = NP. Therefore, we design an approximation algorithm by incorporating the shifted grid technique [88], which is applicable to geometric covering and packing problems. A shifted grid algorithm has two stages. In the first stage, the plane is partitioned into squares of size s × s, where s is an arbitrary number. By shifting the partition grid lines over unit distances, new ways of partitioning can be derived; each way of partitioning is called a shift, so there are s × s shifts in total. In the second stage, a local algorithm is used to find an optimal solution within each square. By combining the solutions of all squares in a shift, we obtain the global solution for that shift. The final solution is the one with the best performance among all shifts. Algorithm 1 illustrates the proposed approximation algorithm, referred to as PASTA, for the location Privacy-Aware Spatial Task Allocation problem. Let A = [A1, A2, ..., Am] denote the task assignment for all tasks, where Aj is the set of users selected for τj; for any task τj, Aj is initialized to ∅. Initially, PASTA normalizes the target area with respect to users' maximum travel distance R (Line 1) such that the shifting unit is 1. The outer loops (Lines

Algorithm 1: PASTA
Input: Users U, tasks T, and ǫ
Output: Task assignment A
1:  Normalize the target area with respect to users' maximal travel distance R;
2:  s ← 2/ǫ;
3:  for i ← 1 to s do
4:      for j ← 1 to s do
5:          Partition the plane into squares of size s × s such that the lower left corner is at (i, j);
6:          Use a brute-force algorithm to select an optimal task allocation for each square;
7:          Combine the task allocations of all squares as the solution for this shift;
8:      end
9:  end
10: A ← the solution with the maximum number of assigned tasks;
11: return A

3-9) generate all shifts iteratively. In each iteration, the target area is partitioned into squares of size s × s (Line 5). For each square, we use a brute-force algorithm to find an optimal task assignment given the tasks and users in that square (Line 6). The task assignments of all squares in a shift are combined as the result of this shift (Line 7). After calculating the results of all shifts, PASTA chooses the result with the maximum number of assigned tasks as the final task assignment (Line 10).
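The shift enumeration at the heart of PASTA is compact to express in code. The sketch below is our own illustration: `solve_square` is a hypothetical placeholder standing in for the brute-force local algorithm of Line 6, and coordinates are assumed to be already normalized so that the shifting unit is 1.

```python
from collections import defaultdict

def shifted_grid(users, tasks, s, solve_square):
    """Enumerate all s*s shifts of an s x s grid over the normalized plane,
    solve each square independently, and keep the best global solution.
    solve_square(sq_users, sq_tasks) returns a task assignment
    (dict: task -> list of selected users) for one square."""
    best, best_count = {}, -1
    for i in range(s):
        for j in range(s):
            squares = defaultdict(lambda: ([], []))
            for u, (x, y) in users.items():      # bucket users by square
                squares[(x - i) // s, (y - j) // s][0].append(u)
            for t, (x, y) in tasks.items():      # bucket tasks by square
                squares[(x - i) // s, (y - j) // s][1].append(t)
            assignment = {}
            for sq_users, sq_tasks in squares.values():
                assignment.update(solve_square(sq_users, sq_tasks))
            count = sum(1 for sel in assignment.values() if sel)
            if count > best_count:               # keep the best shift
                best, best_count = assignment, count
    return best
```

Solving squares independently is what makes the brute-force step affordable: only users and tasks inside the same square ever interact, at the cost of losing the tasks cut by grid lines, which the shifting compensates for.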

Theorem 2. The approximation ratio of PASTA is 1 − ǫ, where ǫ ∈ (0, 1) is an arbitrarily small constant.

Proof. Let $OPT$ denote the set of selected users in an optimal solution to the problem and $OPT_{(i,j)}$ the set of selected users in shift $(i,j)$. Let $\widetilde{OPT}_{(i,j)}$ be the set of selected users in $OPT$ intersecting the lines $x = as + i$ and $y = bs + j$, where $a, b \in \mathbb{Z}$. We have

$$|OPT_{(i,j)}| + |\widetilde{OPT}_{(i,j)}| \ge |OPT|, \tag{3.2}$$

$$\sum_{i=1}^{s} \sum_{j=1}^{s} \left( |OPT_{(i,j)}| + |\widetilde{OPT}_{(i,j)}| \right) \ge s^2 |OPT|. \tag{3.3}$$

We also have

$$\sum_{i=1}^{s} \sum_{j=1}^{s} |\widetilde{OPT}_{(i,j)}| \le 2s |OPT|, \tag{3.4}$$

and thus

$$\sum_{i=1}^{s} \sum_{j=1}^{s} |OPT_{(i,j)}| \ge (s^2 - 2s) |OPT|. \tag{3.5}$$

Therefore,

$$\max_{i,j \in \{1,\ldots,s\}} |OPT_{(i,j)}| \ge \left( 1 - \frac{2}{s} \right) |OPT|. \tag{3.6}$$

Setting $s = 2/\epsilon$ gives a $(1-\epsilon)$-approximation algorithm.

Next, we analyze the running time of PASTA. According to Algorithm 1, the time complexity is dominated by the two nested for-loops, which perform $s^2$ iterations in total. Each iteration is dominated by the brute-force algorithm, which searches all the user combinations for all the tasks in a square and chooses the task assignment with the largest number of assigned tasks. The brute-force algorithm takes $O(m_s n_s 2^{n_s})$ time, where $m_s$ is the largest number of tasks in a square and $n_s$ is the largest number of users for each task in a square. Note that the time complexity of PASTA is exponential in the maximum number of users for each task in any square. However, this number is small in practice. According to a recent study [89], the numbers of users within 1 km² of two popular location-based applications, Gowalla and Foursquare, are 35 and 90, respectively.

3.3.2 Design of Heuristic

To reduce the time complexity, we also design a fast and effective heuristic algorithm for location privacy-aware spatial task allocation, referred to as Heuristic and illustrated in Algorithm 2. Initially, Heuristic screens all the tasks in T and removes the tasks with no candidate user or no feasible user assignment (Line 1). Then, Heuristic assigns tasks iteratively until there is no task left in T (Lines 2-11). In each iteration, the tasks in T are sorted in increasing order of their numbers of candidate users (Line 3), and the first task in this order is selected (Line 4). For this task, Heuristic sorts its candidate users in increasing order of their privacy requirements, using the number of feasible tasks as a tie-breaker (Line 5); for example, if two users have the same privacy requirement, the user with the smaller number of feasible tasks is placed before the other. Then, Heuristic selects users iteratively from the beginning of this order until the privacy requirements of all the selected users are satisfied (Line 6). The selected users form the assignment of this task (Line 7). Meanwhile, the selected users are removed from the user set U (Line 8), and the assigned task is removed from T (Line 9). At the end of each iteration, Heuristic again screens and removes infeasible tasks from T as in Line 1 (Line 10).

Algorithm 2: Heuristic
Input: Users U and tasks T
Output: Task assignment A
1:  Remove all the tasks with no candidate user or feasible user assignment from T;
2:  while T ≠ ∅ do
3:      Sort the tasks in T in increasing order of their numbers of candidate users;
4:      τj ← the first task in this order;
5:      Sort τj's candidate users in increasing order of their privacy requirements, with the number of feasible tasks as a tie-breaker;
6:      Select users iteratively from the beginning until all selected users' privacy requirements are satisfied;
7:      Aj ← the selected users;
8:      Remove the selected users from U;
9:      T ← T \ {τj};
10:     Remove all the tasks with no candidate user or feasible user assignment from T;
11: end
12: return A
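A direct Python transcription of Algorithm 2 is given below as our own illustration. It assumes `candidates[j]` is the precomputed set of users within distance R of task j and `k[i]` is user i's privacy requirement; both names are hypothetical.

```python
def heuristic(candidates, k):
    """Greedy location privacy-aware task allocation (Algorithm 2).
    candidates: {task: set of candidate users}; k: {user: privacy req.}.
    Returns {task: list of selected users}."""
    assignment = {}
    remaining = {j: set(c) for j, c in candidates.items()}

    def try_select(j):
        """Select users for task j in increasing order of privacy
        requirement (fewer feasible tasks breaks ties) until every
        selected user's requirement is met; None if impossible."""
        def num_feasible_tasks(i):
            return sum(1 for c in remaining.values() if i in c)
        selected = []
        for i in sorted(remaining[j], key=lambda i: (k[i], num_feasible_tasks(i))):
            selected.append(i)
            if len(selected) >= max(k[u] for u in selected):
                return selected
        return None

    while remaining:
        # Drop tasks with no feasible user selection (Lines 1 and 10).
        remaining = {j: c for j, c in remaining.items() if try_select(j)}
        if not remaining:
            break
        j = min(remaining, key=lambda t: len(remaining[t]))  # fewest candidates
        selected = try_select(j)
        assignment[j] = selected
        del remaining[j]
        for c in remaining.values():
            c.difference_update(selected)   # selected users leave the pool
    return assignment
```

Handling the tasks with the fewest candidates first reduces the chance that a tightly constrained task loses its last candidates to a task that could have been served by many other users.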

3.4 Performance Evaluation

In this section, we evaluate the performance of the proposed algorithms and compare them with a state-of-the-art privacy-preserving task allocation algorithm [38] (denoted by PWSM), which leverages differential privacy to protect users' location information. The objective of PWSM is to minimize the total travel distance to all tasks. Intuitively, PWSM iteratively assigns a task to the user with the minimum obfuscated distance to this task among all the remaining users, and a selected user is not considered again. The performance metrics include the number of assigned tasks, the average travel distance of each selected user, and the running time.

Figure 3.6: GPS locations of the taxi drivers in Rome

3.4.1 Evaluation Setup

In the simulation, we use a real-world dataset consisting of the traces of taxi drivers in Rome [90]. This dataset has been widely used to simulate crowdsensing systems [77, 78, 80, 91]. As in [77], we consider a crowdsensing system whose objective is generating a Wi-Fi

25 signal map in Rome. Therefore, the tasks are measuring the Wi-Fi signal strength at specific locations on the street. In this system, tasks are represented by GPS locations of the taxi drivers in the dataset, and users are all the taxi drivers. Figure 3.6 shows the distribution of the GPS locations of the taxi drivers in Rome. The maximum travel distance of each user is set to 1 km. In the evaluation, we randomly select locations on taxi drivers’ traces as the sensing tasks. To evaluate the impact of the number of sensing tasks (m) on the performance metrics, we fix the number of users (n) at 50 and vary m from 5 to 25 with a step of 5. To evaluate the impact of the number of users on the performance metrics, we fix m at 10 and vary n from 20 to 100 with a step of 20. Note that, we fix the privacy requirement of each user (k) at 2 and ǫ at 2/7 in the above two evaluations. For a fair comparison, we assume that PWSM only selects users whose obfuscated distances to the tasks are less than the maximum travel distance (1 km). Meanwhile, we set the privacy budget of PWSM to 1 for all users.

Since different users might have different privacy requirements, we define kmax as the highest privacy requirement, such that each user's privacy requirement is randomly selected from [2, kmax]. For example, kmax = 6 means that a user's privacy requirement can be any integer in [2, 6]. To evaluate the impact of k on the performance metrics, we fix m at 10, n at 100, and vary kmax from 2 to 6 with a step of 1. The impact of ǫ is also evaluated by fixing m at 5, n at 100, and k at 2. All results are averaged over 1000 independent runs.

3.4.2 Impact of the Number of Tasks

The impact of the number of sensing tasks (m) on the number of assigned tasks, the average travel distance, and the running time is shown in Figure 3.7(a), Figure 3.7(b), and Figure 3.7(c), respectively. In Figure 3.7(a), we observe that the number of assigned tasks increases when m grows for all three algorithms. This is expected, since given enough users, more tasks can be assigned. However, we do not see a linear relationship between the number of assigned tasks and m. This is because some tasks might be generated in areas with a low population and thus cannot find enough users given users' maximum travel distance.

Figure 3.7: Impact of m on PASTA, Heuristic, and PWSM. (a) Number of assigned tasks. (b) Average travel distance. (c) Running time.

Note that PWSM assigns more tasks than PASTA and Heuristic. This is because PWSM assigns at most one user to each task, which means that as long as there is a user within the maximum travel range of a task, the task can be assigned. This is easily satisfied for tasks in densely populated areas. However, both PASTA and Heuristic require k = 2 users in this evaluation. This requirement eliminates some tasks, especially those in areas with a low population density. In Figure 3.7(b), we see that the average travel distance does not vary much when m grows for both PASTA and Heuristic. This is because the travel distance from each user to a task is bounded by the maximum travel distance. However, the average travel distance of PWSM is clearly larger than those of PASTA and Heuristic. This is because PWSM obfuscates users' real distances to the tasks to satisfy differential privacy; the user with the smallest obfuscated distance to a task might in reality be outside the circular range of the task with a radius of the maximum travel distance. Figure 3.7(c) shows the comparison of running time on a log scale. We see that the running time increases when m grows for all three methods, because more tasks require more time for all methods. In addition, Heuristic is faster than PWSM since PWSM needs to obfuscate users' locations. Although PASTA takes significantly more time than Heuristic and PWSM, due to the fact that its time complexity is exponential in the maximum number of users for each task in any square, it still terminates in a reasonable time.

3.4.3 Impact of the Number of Users

The impact of the number of users (n) on the number of assigned tasks, the average travel distance, and the running time is shown in Figure 3.8(a), Figure 3.8(b), and Figure 3.8(c), respectively. As expected, the number of assigned tasks increases with n for PWSM, PASTA, and Heuristic, as shown in Figure 3.8(a). This is because, with more users, there is a higher probability that a task can find users in the circular range with a radius of R, and thus can be assigned. We also see that PWSM assigns more tasks than PASTA and Heuristic, as explained before. In Figure 3.8(b), we observe that the average travel distance decreases as n grows for PWSM, PASTA, and Heuristic. This is because, with more users, the platform can find more users closer to tasks. Similar to the previous evaluation, PWSM also has a larger average travel distance than both PASTA and Heuristic because of the user distance obfuscation.

In Figure 3.8(c), we see that the impact of n on the running time follows the same pattern as in Figure 3.7(c).

Figure 3.8: Impact of n on PASTA, Heuristic, and PWSM. (a) Number of assigned tasks. (b) Average travel distance. (c) Running time.

To evaluate the performance in a system with a large number of users, we also conduct the evaluation by varying n from 200 to 1000 with a step of 200, as shown in Figure 3.9. Note that we only compare PWSM and Heuristic in this setting, since PASTA cannot terminate in a reasonable time due to the fact that it searches for the optimal solution in a square. The results for the number of assigned tasks and the average travel distance follow similar patterns to those in Figure 3.8(a) and Figure 3.8(b), respectively.

Figure 3.9: Impact of n on Heuristic and PWSM. (a) Number of assigned tasks. (b) Average travel distance.

3.4.4 Impact of the Privacy Requirement

Figure 3.10 shows the impact of the privacy requirement on the number of assigned tasks and the average travel distance. PWSM is not shown in this evaluation because it assigns at most one user to each task. In Figure 3.10(a), we observe that the number of assigned tasks decreases as kmax grows for both PASTA and Heuristic. This is because, with a higher value of k, more users are expected to be in the circular range of a task within the maximum travel distance if this task is to be assigned. However, it is hard to find enough users if a task is generated in rural areas with a low population. In Figure 3.10(b), we see that the average travel distance increases with kmax for both PASTA and Heuristic. This is because, with a larger value of kmax, a user can set a higher privacy requirement. However, if a user with a larger k is selected for a task, more users need to be selected for the same task, and thus users with longer distances to the task might be selected.

Figure 3.10: Impact of kmax on PASTA and Heuristic. (a) Number of assigned tasks. (b) Average travel distance.

3.4.5 Impact of the Parameter ǫ

Figure 3.11 shows the impact of ǫ on the number of assigned tasks and the average travel distance. Only PASTA is shown since ǫ has no impact on Heuristic and PWSM. In Figure 3.11(a), we see that the number of assigned tasks increases as ǫ decreases. This is because the smaller ǫ is, the better the performance PASTA can achieve, according to Theorem 2. For the same reason, the average travel distance also decreases as ǫ decreases, as shown in Figure 3.11(b).

Figure 3.11: Impact of ǫ on PASTA. (a) Number of assigned tasks. (b) Average travel distance.

3.5 Conclusion

In this work, we analyzed the location privacy issues in task allocation. The main contributions of this work are as follows: First, we identified the location-inference attack in task allocation. We demonstrated that an attacker can infer a target user's location information from the task assignment. To defend against this attack, we formulated a location privacy-aware spatial task allocation problem, which aims to maximize the number of assigned tasks while taking users' location privacy as constraints. Second, we proved the NP-completeness of this problem. To solve this optimization problem, we then designed a (1 − ǫ)-approximation algorithm (PASTA) based on the shifted grid technique and a fast and effective heuristic algorithm (Heuristic). Third, we conducted extensive simulations on a real-world dataset and compared the proposed algorithms with a state-of-the-art privacy-preserving task allocation algorithm. The results demonstrated the effectiveness of PASTA and Heuristic in maximizing the number of assigned tasks.

CHAPTER 4 PRESERVING BID PRIVACY IN INCENTIVE MECHANISMS

This work was first published in IEEE CNS in 2016 [41], and an extended version was published in IEEE TMC in 2018 [92].

4.1 Background

In most of the proposed truthful auction-based incentive mechanisms, bidders are stimulated to bid their true costs, which are private information of users. For transparency, the platform will publish the outcome of the auction, which consists of the winning bidders and their payments. Ensuring transparency in the procurement procedure is essential to efficiency, as it enhances the competitiveness of public procurement [83]. Meanwhile, it has been proven by bid sale dealers for years that transparency leads to profit [93]. The FCC uses auctions to sell the licenses to transmit signals over specific bands of the electromagnetic spectrum, and releases the result of each auction online for transparency [94]. In recent years, many commercial platforms have put more emphasis on transparency as well, e.g., Auction.com [95], which is a trusted leader in the web-based real estate auction industry, and eBay [96], which is a multinational e-commerce corporation. However, once the true cost of a smartphone user is reported to the platform, other bidders might infer this private information based on the published outcome. This is known as the inference attack [13]. The inference attack has been analyzed in many areas, e.g., multilevel secure databases [97], data mining [98], web-based applications [99], and mobile devices [100]. Protecting users' bid privacy is important because its disclosure might also incur threats to users' other private information, such as location [15, 16]. To formalize the notion of users' bid privacy, we employ the concept of differential privacy [17]. Intuitively, a mechanism provides differential privacy if the change of one user's

bid has limited impact on the outcome. We also leverage the exponential mechanism [18], a technique for designing differentially private mechanisms, to preserve users' bid privacy. In this work, we study the problem of designing truthful mechanisms which achieve computational efficiency, individual rationality, differential privacy, and approximate social cost minimization. We consider the scenario in which there is one buyer and multiple sellers. Users act as bidders and submit their bids to compete for the chance of being selected to perform tasks. Besides, users do not want others to know their own bid information. We first consider the single-bid model, in which each user can only submit a single bid for its set of tasks. Then we consider the multi-bid model, in which each user can submit a bid for each task in its task set.

4.2 Models and Problem Formulation

In this section, we model the crowdsensing system as a reverse auction and present two different models. Similar to most crowdsensing systems [8–10, 44, 47], we consider a crowdsensing system consisting of a platform and multiple users who are interested in performing sensing tasks. In the first model, each user can submit only one task-bid pair. Our second model allows each user to submit multiple task-bid pairs and to be assigned to work on multiple tasks. Then we describe the threat models. At the end of this section, we present some important properties and give our design objective.

4.2.1 Single-bid Model

The platform first publicizes a set T = {τ1,τ2,...,τm} of m sensing tasks. Assume there is a set U = {1, 2,...,n} of n ≥ 2 smartphone users. Each user i has a task set Γi ⊆ T , which it can perform. Each Γi is associated with a cost ci, which is private information of user i. The platform selects a subset of users S ⊆U to complete all the sensing tasks

in T. Finally, the platform calculates the payment pi for each selected user i ∈ S. Let p⃗ = (p1, p2, ..., pn) denote the payment profile. The utility of any user i ∈ U is

    ui = pi − ci if i ∈ S, and ui = 0 otherwise.

In this work, we model the interactive process between the platform and users as a sealed-bid reverse auction in which the platform buys sensing service and users are bidders who sell sensing service. In order to prevent a monopoly and guarantee the quality of the sensing tasks, we assume each task in T can be completed by more than one user in U. This assumption is reasonable for mobile crowdsensing, as made in [47]. If a task in T can only be completed by at most one user in U, we simply remove it from T.

At the beginning of this auction, each user i ∈ U submits a task-bid pair βi = (Γi, bi) to the platform, where bi is user i's bid, representing the minimum price for which user i wants to sell its sensing service. Note that in a truthful auction-based incentive mechanism, users are stimulated to bid their true costs, i.e., bi = ci. Without loss of generality, we assume that each user's bid is bounded by [b, b̄], where b is normalized to 1 and b̄ is a constant. Let ∆ denote the difference between b̄ and b. Let β⃗ = (β1, β2, ..., βn) denote the task-bid profile. Given the task-bid profile β⃗, the platform determines the outcome of the auction, which consists of the selected winning users S and the payment profile p⃗.

4.2.2 Multi-bid Model

In the single-bid model, each user submits a bid for a set of tasks. In the multi-bid model, each user is allowed to submit a bid for each task in its task set, and each user can be assigned to work on multiple tasks.

The definitions of T, U, S, Γi, p⃗, and ∆ are the same as in Section 4.2.1. In the multi-bid model, for each user i ∈ U, each task τi^k in Γi has an associated cost ci^k. Each user i submits a set Bi = {βi^1, βi^2, ..., βi^{ki}} of ki = |Γi| task-bid pairs. Each task-bid pair is denoted by βi^k = (τi^k, bi^k), where τi^k is a single task from Γi and bi^k is the minimum price for which user i wants to sell its sensing service for τi^k. Note that in a truthful auction-based incentive mechanism, users are stimulated to bid their true costs, i.e., bi^k = ci^k. Let B⃗ = (B1, B2, ..., Bn) denote the task-bid profile. Given the task-bid profile B⃗, the platform determines the winning task-bid pair set BW ⊆ ∪_{i∈U} Bi such that ∪_{βi^k∈BW} τi^k = T. For each winning task-bid pair βi^k ∈ BW, the platform calculates a payment pi^k. A user i is called a winner and is added into S if it has at least one winning task-bid pair, i.e., Bi ∩ BW ≠ ∅. The payment for each winner i is pi = Σ_{βi^k∈Bi∩BW} pi^k. The utility of any user i ∈ U is

    ui = pi − Σ_{βi^k∈Bi∩BW} ci^k if i ∈ S, and ui = 0 otherwise.

4.2.3 Threat Models

Threats to Incentive: We assume that users are selfish but rational. This implies that each user only wants to maximize its own utility and will not participate in crowdsensing unless there is sufficient incentive. Hence, user i could report a bid that differs from its true cost, i.e., bi ≠ ci in the single-bid model or bi^k ≠ ci^k in the multi-bid model, to maximize its own utility. We also assume that user i does not misreport its task set Γi in the single-bid model, as in [8–10, 44, 47], and does not misreport any τi^k ∈ Γi in the multi-bid model. In the single-bid model, if user i reports Γi′ containing tasks not in Γi, i.e., Γi′ \ Γi ≠ ∅, then i cannot finish Γi′ when selected. If user i reports Γi′ ⊂ Γi with ci, the probability of user i being selected will not increase according to our mechanism. The case in which user i misreports both Γi and ci is challenging, because calculating the true cost of Γi′ ⊂ Γi is still an open question. In the multi-bid model, if user i reports a task-bid pair with a task τi′^k not in Γi, i.e., τi′^k ∉ Γi, then i cannot finish τi′^k when selected. Other threats to incentive (e.g., collusion among bidders) are out of the scope of this work.

Threats to Privacy: Bidders are stimulated to bid their true costs in a truthful auction-based incentive mechanism, i.e., bi = ci in the single-bid model and bi^k = ci^k in the multi-bid model. However, one bidder could infer other bidders' bids from the outcome of the mechanism. This inference attack can be seen in the following examples.

We first consider the single-bid model. Suppose there are 5 users in the system and their task-bid pairs βi = (Γi, bi), i ∈ [1, 5], are shown in Table 4.1. The platform publicizes a set of 3 sensing tasks T = {τ1, τ2, τ3}. According to the proposed truthful mechanism in TRAC [47], the winning users are S = {2, 1, 3}. Suppose user 5 is a bidder who wants to infer other bidders' bids, and it changes its bid b5 from $5 to $3 in the next auction while the other four bidders do not change their task-bid pairs. The winning users of the new auction are S = {2, 1, 5}. Since the platform publishes the outcome of the mechanism for transparency, user 5 learns the results and can infer that user 3's bid is between $3 and $5, by the fact that if it bids $5 it will be replaced by user 3 and if it bids $3 it will replace user 3. We can see that, after many rounds of auction, user 5 might narrow down user 3's bid range, and even infer the exact value in some cases.

Table 4.1: Example showing the inference attack in the single-bid model

User    1        2     3        4        5
Γi      τ1,τ2    τ1    τ1,τ3    τ1,τ2    τ1,τ3
bi      $3       $1    $4       $5       $5
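To make the attack concrete, the short Python sketch below replays the Table 4.1 scenario under the assumption that the mechanism follows a TRAC-style greedy rule that repeatedly picks the user minimizing bid divided by the number of newly covered tasks; it is a toy model for illustration, not TRAC's actual code.

    def greedy_winners(bids, tasks):
        # bids: user -> (task_set, bid); greedily cover all tasks.
        covered, winners = set(), []
        remaining = dict(bids)
        while covered != tasks:
            u = min((v for v in remaining if remaining[v][0] - covered),
                    key=lambda v: remaining[v][1] /
                                  len(remaining[v][0] - covered))
            winners.append(u)
            covered |= remaining.pop(u)[0]
        return winners

    tasks = {1, 2, 3}
    bids = {1: ({1, 2}, 3), 2: ({1}, 1), 3: ({1, 3}, 4),
            4: ({1, 2}, 5), 5: ({1, 3}, 5)}
    print(greedy_winners(bids, tasks))   # [2, 1, 3]: user 3 wins
    bids[5] = ({1, 3}, 3)                # user 5 lowers its bid to $3 ...
    print(greedy_winners(bids, tasks))   # [2, 1, 5]: ... and replaces user 3

From the two published outcomes alone, user 5 concludes that user 3's bid lies in ($3, $5), exactly as argued above.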

Next, we consider the inference attack in the multi-bid model using the example shown in Table 4.2. According to the proposed truthful mechanism in TRAC [47], the winning task-bid pairs are BW = {β2^1, β1^2, β3^2}, and thus S = {2, 1, 3}. Suppose user 5 is a bidder who wants to infer other bidders' bids, and it changes its bid b5^2 from $2.5 to $2 in the next auction while the other four bidders do not change their task-bid pairs. Then the winning task-bid pairs of the new auction are BW = {β2^1, β1^2, β5^2}. Based on this outcome, user 5 can infer that user 3's bid for τ3 is between $2 and $2.5, by the fact that if it bids $2.5 it will be replaced by user 3 and if it bids $2 it will replace user 3. After many rounds of auction, user 5 might also narrow down user 3's bid range, and even infer the exact value in some cases. This inference attack is practical in most mobile crowdsensing applications, e.g., [1, 101], in which tasks are publicized periodically for collecting dynamic sensing data.

Table 4.2: Example showing the inference attack in the multi-bid model

User     1             2      3             4            5
τi^k     τ1     τ2     τ1     τ1     τ3     τ1     τ2    τ1     τ3
bi^k     $1.5   $1.5   $1     $1.6   $2.4   $3     $2    $2.5   $2.5

Protecting users' bid privacy from such an inference attack is important because its disclosure might also incur threats to users' other private information, such as location [15, 16]. For example, in [15] each user i's cost ci of a task is modeled as a linear function of its distance di to the task. In a truthful mechanism, user i's bid bi = ci. Therefore, an attacker can infer that user i's location lies inside a suspicion region, which is the circle centered at the task with radius di, by inferring its bid bi. Besides, an attacker can also improve the inference accuracy by narrowing down the victim's bid through many rounds of auction.

4.2.4 Desired Properties

In this work, we consider the following important properties.

• Computational Efficiency: A mechanism is computationally efficient if it terminates in polynomial time.

• Individual Rationality: A mechanism is individually rational if each user will have a non-negative utility when bidding its true cost.

• Truthfulness: A mechanism is truthful if any user’s utility is maximized when bidding its true cost.

• Social Cost Minimization: A mechanism achieves social cost minimization if the total cost of the users in S is minimized subject to certain constraints on S.

In addition, we consider users’ bid privacy preservation.

Definition 1. (Differential Privacy [17]). A randomized function M has ǫ-differential privacy if for any two input sets A and B with a single input difference, and for any set of

outcomes O ⊆ Range(M),

Pr[M(A) ∈O] ≤ exp(ǫ) × Pr[M(B) ∈O]. (4.1)

In this work, the randomized function M corresponds to our frameworks, and Range(M) is the outcome space of the frameworks. One relaxation of differential privacy is as follows.

Definition 2. (Approximate Differential Privacy [102]). A randomized function M gives (ǫ, δ)-differential privacy if for any two input sets A and B with a single data difference, and for any set of outcomes O ⊆ Range(M),

Pr[M(A) ∈O] ≤ exp(ǫ) × Pr[M(B) ∈O]+ δ. (4.2)

The truthfulness of an auction mechanism is guaranteed by the following theorem.

Theorem 3. [103] Let Pri(z) denote the probability that bidder i is selected when its bid is z. A mechanism with bid profile b⃗ and payment profile p⃗ is truthful in expectation if and only if, for any bidder i,

1) Pri(z) is monotonically non-increasing in z;
2) ∫_0^∞ Pri(z) dz < ∞;
3) the expected payment satisfies E[pi] = bi Pri(bi) + ∫_{bi}^∞ Pri(z) dz.

Next, we introduce the concept of the exponential mechanism and its properties. In the literature of differential privacy, the exponential mechanism is often used to design privacy-preserving mechanisms. A key component of the exponential mechanism is the score function f(A, o), which maps the input set A and an outcome o ∈ O to a real-valued score. The score represents how good the outcome o is for the input set A compared with the optimal outcome.

Exponential mechanism E_f^ǫ(A): Given an outcome space O, an input set A, a score function f, and a small constant ǫ, the exponential mechanism E_f^ǫ(A) chooses an outcome o ∈ O with probability

    Pr[E_f^ǫ(A) = o] ∝ exp(ǫ f(A, o)).    (4.3)
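To make Equation (4.3) concrete, here is a minimal, self-contained Python sketch of exponential-mechanism sampling; the outcome space, score function, and parameter names are generic placeholders rather than anything defined in this thesis.

    import math, random

    def exponential_mechanism(outcomes, score, eps):
        # Sample o with probability proportional to exp(eps * score(o)).
        m = max(score(o) for o in outcomes)  # shift scores for numerical
        weights = [math.exp(eps * (score(o) - m)) for o in outcomes]
        r = random.uniform(0, sum(weights))  # stability; the shift does not
        acc = 0.0                            # change the distribution.
        for o, w in zip(outcomes, weights):
            acc += w
            if r <= acc:
                return o
        return outcomes[-1]

A small eps makes the distribution nearly uniform (strong privacy, weak utility), while a large eps concentrates mass on high-score outcomes, which is exactly the privacy-utility tradeoff quantified by the two theorems below.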

Let Λ denote an upper bound on the difference of any two input sets; the exponential mechanism has the following properties.

Theorem 4. [18] The exponential mechanism gives 2ǫΛ-differential privacy.

Theorem 5. [104] For any α ≥ 0, the exponential mechanism E_f^ǫ(A), when used to select an output o ∈ O, yields 2ǫΛ-differential privacy and, letting O* be the subset of O achieving f(A, o) = max_o f(A, o), ensures that

    Pr[ f(A, E_f^ǫ(A)) < max_o f(A, o) − ln(|O|/|O*|)/ǫ − α/ǫ ] ≤ exp(−α).    (4.4)

4.2.5 Design Objective

The goal of our framework design is to minimize the social cost while achieving computational efficiency, individual rationality, truthfulness, and differential privacy. Specifically, the minimization problem in the single-bid model is referred to as the Social Cost Minimization (SCM) problem, and the minimization problem in the multi-bid model is referred to as the SCM-M problem. Next, we give the formal formulations of the SCM problem and the SCM-M problem, respectively.

SCM problem: Given a task set T and a user set U, the goal of the SCM problem is to find a subset of users S ⊆ U such that C(S) = Σ_{i∈S} ci is minimized subject to ∪_{i∈S} Γi = T.

SCM-M problem: Given a task set T and a user set U, the goal of the SCM-M problem is to find a subset of users S ⊆ U and their assigned task-bid pairs BW such that C(BW) = Σ_{βi^k∈BW} ci^k is minimized subject to ∪_{βi^k∈BW} τi^k = T.

Note that the SCM problem is challenging because it is NP-hard (proved by Theorem 4 in [41]). It is even more challenging to solve the SCM problem while achieving computational efficiency, individual rationality, truthfulness, and differential privacy. Although the SCM-M problem can be solved optimally, it is still challenging when combined with the other properties. Therefore, we aim to design differentially private truthful frameworks with theoretically guaranteed approximate social cost.

4.3 Our Approach

In this section, we first describe BidGuard, a differentially private auction framework for the Single-bid Model. Then we describe BidGuard-M, a differentially private auction framework for the Multi-bid Model.

4.3.1 Design Rationale

For the single-bid model, we propose BidGuard, which integrates the exponential mechanism with the reverse auction to achieve computational efficiency, individual rationality, truthfulness, differential privacy, and approximate social cost minimization. In this framework, users are selected iteratively. In each iteration, redundant users are eliminated and each remaining user is assigned a probability of being selected. BidGuard then selects one of them as a winner based on the probability distribution. Specifically, the probability of a user being selected is set according to a specific criterion. The above process repeats until all tasks can be completed by the selected users. Finally, BidGuard computes the payment to each winner. For the multi-bid model, we propose BidGuard-M, which also integrates the exponential mechanism with the reverse auction. Different from BidGuard, task-bid pairs are selected iteratively in BidGuard-M. In each iteration, one task is considered. Each of the task-bid pairs for this task is assigned a probability, and BidGuard-M selects one based on the probability distribution. Specifically, the probability of a task-bid pair being selected is set according to a specific criterion. The above process repeats until all tasks can be completed by the selected task-bid pairs. Finally, BidGuard-M computes the payment for each winning task-bid pair.

4.3.2 Design of BidGuard

In this section, we describe BidGuard in detail. As illustrated in Algorithm 3, BidGuard consists of three phases: user screening, winner selection, and payment determination. It executes these three phases iteratively until all tasks can be completed by the selected users.

Algorithm 3: BidGuard
Input : A set of sensing tasks T, a set of users U, a submitted task-bid profile β⃗, and differential privacy parameters ǫ > 0 and δ ∈ (0, 1/2].
Output: A set of winners S and a payment profile p⃗.
1  S ← ∅, Tc ← ∅, R ← U;
2  foreach i ∈ U do pi ← 0;
3  while Tc ≠ T do
4      foreach i ∈ R do
5          if Γi ⊆ Tc then R ← R \ {i};
6      end
7      foreach i ∈ R do
8          Calculate the probability Pri(bi) of user i being selected according to the score function;
9      end
10     Select one user randomly, denoted by i′, according to the computed probability distribution;
11     S ← S ∪ {i′}, Tc ← Tc ∪ Γi′, R ← R \ {i′};
12 end
13 foreach i ∈ S do pi ← bi + (∫_{bi}^{b̄} Pri(z) dz) / Pri(bi);
14 return S and p⃗.

1) User Screening Phase: BidGuard eliminates all redundant users, whose task sets can be completed by the currently selected users. The set of remaining users is denoted by R.

2) Winner Selection Phase: BidGuard assigns each user i ∈ R a probability of being selected as follows. It first computes a criterion r(βi), which is the bid divided by the number of tasks that cannot be completed by the currently selected users, i.e.,

    r(βi) = bi / |Γi − Tc|,    (4.5)

where Tc is the set of tasks that can be completed by the currently selected users. BidGuard tends to select the user with the lowest r(βi) in each iteration. To apply the exponential mechanism, we need to design a score function, which is a non-increasing function of r(βi). The probability of each user being selected is set according to the value of the score function.

3) Payment Determination Phase: Let Pri(z) denote the probability of user i being selected with bid z. According to Theorem 3, the payment to winner i is

    pi = bi + (∫_{bi}^{b̄} Pri(z) dz) / Pri(bi).    (4.6)

To apply the exponential mechanism, we need to design a score function. Specifically, we design two score functions: the linear score function and the log score function.
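Equation (4.6) can be evaluated numerically once Pri(z) is computable for an arbitrary hypothetical bid z. The sketch below does this with a simple trapezoidal rule; it is only an illustration of the payment formula, and prob_selected is an assumed callable, not an interface defined in this thesis.

    def payment(bid, b_bar, prob_selected, steps=1000):
        # Equation (4.6): p_i = b_i + (integral of Pr_i(z) over [b_i, b_bar])
        # divided by Pr_i(b_i), approximated with the trapezoidal rule.
        h = (b_bar - bid) / steps
        zs = [bid + k * h for k in range(steps + 1)]
        vals = [prob_selected(z) for z in zs]
        integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
        return bid + integral / prob_selected(bid)

Because Pri(z) is non-increasing in z (the condition of Theorem 3), this payment is always at least the bid, which is exactly the individual-rationality argument used in the analysis below.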

Linear score function: fLIN(x) = 1 − x. For any bidder i ∈ R, the probability of being selected in each iteration is

    Pri(bi) ∝ exp(ǫ′(1 − bi/(b̄|Γi − Tc|))) if i ∈ R, and Pri(bi) = 0 otherwise,    (4.7)

where ǫ′ = ǫ/(e∆ln(e/δ)). Note that in order to guarantee that the value of the score function is nonnegative, we normalize r(βi), i.e., bi/(b̄|Γi − Tc|). Then the probability is

    Pri(bi) = exp(ǫ′(1 − bi/(b̄|Γi − Tc|))) / Σ_{j∈R} exp(ǫ′(1 − bj/(b̄|Γj − Tc|))) if i ∈ R, and Pri(bi) = 0 otherwise.    (4.8)

Log score function: fLOG(x) = log_{1/2} x. For any bidder i ∈ R, the probability of being selected in each iteration is

    Pri(bi) ∝ exp(ǫ′ log_{1/2}(bi/(b̄|Γi − Tc|))) if i ∈ R, and Pri(bi) = 0 otherwise,    (4.9)

where ǫ′ = ǫ/(e ln(e/δ) log_{1/2}(1/(1+∆))). We also normalize r(βi), i.e., bi/(b̄|Γi − Tc|), to guarantee that the value of the score function is nonnegative. Then the probability is

    Pri(bi) = exp(ǫ′ log_{1/2}(bi/(b̄|Γi − Tc|))) / Σ_{j∈R} exp(ǫ′ log_{1/2}(bj/(b̄|Γj − Tc|))) if i ∈ R, and Pri(bi) = 0 otherwise.    (4.10)
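As a sanity check on Equation (4.8), the following Python sketch computes the LIN selection probabilities for one winner-selection iteration; the data structures are assumptions for illustration, and its exact scores (e.g., 0.75 for user 1) appear truncated to one decimal (0.7) in the illustrating example that follows.

    import math

    def lin_probabilities(remaining, covered, b_bar, eps_prime):
        # remaining: user -> (task_set, bid), already screened (Γi ⊄ Tc).
        # Score fLIN(r) = 1 - r with r(βi) = bi / (b̄ |Γi - Tc|), Eq. (4.8).
        def score(ts, b):
            return 1.0 - b / (b_bar * len(ts - covered))
        w = {u: math.exp(eps_prime * score(ts, b))
             for u, (ts, b) in remaining.items()}
        total = sum(w.values())
        return {u: wi / total for u, wi in w.items()}

    # First iteration of the Table 4.1 example (b = 1, b̄ = 6, ∆ = 5):
    users = {1: ({'t1', 't2'}, 3), 2: ({'t1'}, 1), 3: ({'t1', 't3'}, 4),
             4: ({'t1', 't2'}, 5), 5: ({'t1', 't3'}, 5)}
    eps_prime = 0.1 / (math.e * 5 * math.log(math.e / 0.5))
    print(lin_probabilities(users, set(), 6, eps_prime))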

Throughout the rest of this chapter, we denote BidGuard with the linear score function fLIN and the log score function fLOG by LIN and LOG, respectively.

Illustrating example: We use the example in Table 4.1 to illustrate how LIN works. Assume b = 1 and b̄ = 6, so ∆ = 5. Let the differential privacy parameters be ǫ = 0.1 and δ = 0.5; then ǫ′ = 0.1/(e × 5 ln(e/0.5)). Note that these values are chosen arbitrarily for illustration. At the beginning, T = {τ1, τ2, τ3}, S = ∅, Tc = ∅, and R = U = {1, 2, 3, 4, 5}. LIN starts to select users iteratively. In the first iteration, LIN calculates |Γi − Tc| for each user i ∈ R. We have |Γ1 − Tc| = 2, |Γ2 − Tc| = 1, |Γ3 − Tc| = 2, |Γ4 − Tc| = 2, and |Γ5 − Tc| = 2. Based on Equation (4.8), LIN calculates the probability of every user in R being selected in this iteration, e.g., Pr1(3) = exp(0.7ǫ′)/(exp(0.7ǫ′) + exp(0.8ǫ′) + exp(0.6ǫ′) + exp(0.5ǫ′) + exp(0.5ǫ′)). LIN selects a user based on the calculated probability distribution. Assume LIN selects user 1; then S = {1}, Tc = {τ1, τ2}, and R = {3, 5}. At the beginning of the second iteration, LIN calculates |Γi − Tc| for the remaining users, user 3 and user 5. We have |Γ3 − Tc| = 1 and |Γ5 − Tc| = 1. Then LIN calculates the probabilities of user 3 and user 5 being selected in this iteration according to Equation (4.8). We have Pr3(4) = exp(0.3ǫ′)/(exp(0.3ǫ′) + exp(0.1ǫ′)) and Pr5(5) = exp(0.1ǫ′)/(exp(0.3ǫ′) + exp(0.1ǫ′)). Assume user 3 is selected in this iteration; then LIN terminates since Tc = T. Finally, LIN calculates the payments to all selected users, i.e., user 1 and user 3. We have p1 = 3 + (∫_3^6 Pr1(z) dz)/Pr1(3) and p3 = 4 + (∫_4^6 Pr3(z) dz)/Pr3(4).

4.3.3 Design of BidGuard-M

In this section, we describe BidGuard-M in detail, as illustrated in Algorithm 4. BidGuard-M selects a winning task-bid pair for each task in T iteratively until all the tasks can be completed. All winning task-bid pairs constitute BW.

Algorithm 4: BidGuard-M
Input : A set of sensing tasks T, a set of users U, a submitted task-bid profile B⃗, and differential privacy parameter ǫ > 0.
Output: A set of winners S and a payment profile p⃗.
1  BW ← ∅, S ← ∅, Bt ← ∅;
2  foreach i ∈ U do pi ← 0;
3  foreach τ ∈ T do
4      foreach i ∈ U do
5          if ∃βi^k ∈ Bi such that τi^k = τ then Bt ← Bt ∪ {βi^k};
6      end
7      foreach βi^k ∈ Bt do
8          Calculate the probability Pri(bi^k) of each task-bid pair being selected according to the score function;
9      end
10     Select one task-bid pair randomly, denoted by βi′^k′, according to the computed probability distribution;
11     BW ← BW ∪ {βi′^k′}, Bt ← ∅;
12 end
13 foreach βi^k ∈ BW do
14     pi^k ← bi^k + (∫_{bi^k}^{b̄} Pri(z) dz) / Pri(bi^k);
15 end
16 foreach i ∈ U do
17     if Bi ∩ BW ≠ ∅ then
18         S ← S ∪ {i};
19         pi ← Σ_{βi^k∈Bi∩BW} pi^k;
20     end
21 end
22 return S and p⃗.

At the beginning of each iteration, BidGuard-M first selects, for an unassigned task τ ∈ T, a set Bt of task-bid pairs in which τi^k = τ for all βi^k ∈ Bt. BidGuard-M then assigns each task-bid pair βi^k ∈ Bt a probability of being selected as follows. It is desired to select the task-bid pair with the lowest bi^k from Bt. To apply the exponential mechanism, we need to design a score function, which is a non-increasing function of bi^k. The probability of each task-bid pair being selected is set according to the value of the score function. Finally, BidGuard-M calculates the payment pi^k for each winning task-bid pair βi^k ∈ BW. Let Pri(z) denote the probability of a task-bid pair being selected with bid z. According to Theorem 3, the payment to a winning task-bid pair is

    pi^k = bi^k + (∫_{bi^k}^{b̄} Pri(z) dz) / Pri(bi^k).    (4.11)

For each user, if it has at least one winning task-bid pair, it is added into the winner set S and its payment is pi = Σ_{βi^k∈Bi∩BW} pi^k.

As in the single-bid model, we adopt fLIN and fLOG as score functions.

Linear score function: For any task-bid pair βi^k ∈ Bt, the probability of being selected is

    Pri(bi^k) ∝ exp(ǫ(1 − bi^k/b̄)).    (4.12)

Note that in order to guarantee that the value of the score function is nonnegative, we normalize bi^k, i.e., bi^k/b̄. Then the probability is

    Pri(bi^k) = exp(ǫ(1 − bi^k/b̄)) / Σ_{βj^k∈Bt} exp(ǫ(1 − bj^k/b̄)).    (4.13)

Log score function: For any task-bid pair βi^k ∈ Bt, the probability of being selected is

    Pri(bi^k) ∝ exp(ǫ log_{1/2}(bi^k/b̄)).    (4.14)

We also normalize bi^k, i.e., bi^k/b̄, to guarantee that the value of the score function is nonnegative. Then the probability is

    Pri(bi^k) = exp(ǫ log_{1/2}(bi^k/b̄)) / Σ_{βj^k∈Bt} exp(ǫ log_{1/2}(bj^k/b̄)).    (4.15)

Throughout the rest of this chapter, we denote BidGuard-M with the linear score function fLIN and the log score function fLOG by LIN-M and LOG-M, respectively.

Illustrating example: We use the example in Table 4.2 to illustrate how LIN-M works. Let b = 1, b̄ = 4, and the differential privacy parameter ǫ = 0.1. At the beginning,

T = {τ1, τ2, τ3}, S = ∅, BW = ∅, Bt = ∅, and U = {1, 2, 3, 4, 5}. LIN-M starts to select task-bid pairs for every task in T iteratively. For τ1, LIN-M first constructs B1 = {β1^1, β2^1, β3^1, β4^1, β5^1}. By Equation (4.13), LIN-M calculates the probability of every task-bid pair in B1 being selected. For example, the probability of β1^1 being selected is Pr1(1.5) = exp(0.0625)/(exp(0.0625) + exp(0.075) + exp(0.06) + exp(0.025) + exp(0.0375)). LIN-M selects one task-bid pair based on the calculated probability distribution. Assume β2^1 is selected; then BW = {β2^1}. LIN-M executes the same process for τ2. We have B2 = {β1^2, β4^2}. The probabilities of β1^2 and β4^2 being selected are Pr1(1.5) = exp(0.0625)/(exp(0.0625) + exp(0.05)) and Pr4(2) = exp(0.05)/(exp(0.0625) + exp(0.05)), respectively. Assume LIN-M selects β1^2; then BW = {β2^1, β1^2}. For τ3, LIN-M constructs B3 = {β3^2, β5^2}. The probabilities of β3^2 and β5^2 being selected are Pr3(2.4) = exp(0.04)/(exp(0.04) + exp(0.0375)) and Pr5(2.5) = exp(0.0375)/(exp(0.04) + exp(0.0375)), respectively. Assume LIN-M selects β3^2; then BW = {β2^1, β1^2, β3^2}. Once all tasks are assigned, LIN-M calculates the payment for each task-bid pair in BW. We have p2^1 = 1 + (∫_1^4 Pr2(z) dz)/Pr2(1), p1^2 = 1.5 + (∫_{1.5}^4 Pr1(z) dz)/Pr1(1.5), and p3^2 = 2.4 + (∫_{2.4}^4 Pr3(z) dz)/Pr3(2.4). Finally, LIN-M determines the winner set S = {1, 2, 3} and the corresponding payments, i.e., p1 = p1^2, p2 = p2^1, and p3 = p3^2.
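A matching sketch for Equation (4.13) reproduces the per-task selection probabilities in the example above; again, the data structures are illustrative assumptions, not the thesis code.

    import math

    def lin_m_probabilities(bids_for_task, b_bar, eps):
        # bids_for_task: user -> bid for the current task τ (the set Bt).
        # Score fLIN(b/b̄) = 1 - b/b̄, as in Equation (4.13).
        w = {u: math.exp(eps * (1.0 - b / b_bar))
             for u, b in bids_for_task.items()}
        total = sum(w.values())
        return {u: wi / total for u, wi in w.items()}

    # Task τ1 of the Table 4.2 example (b̄ = 4, ǫ = 0.1): the exponents are
    # 0.0625, 0.075, 0.06, 0.025, and 0.0375 for users 1-5, as in the text.
    print(lin_m_probabilities({1: 1.5, 2: 1.0, 3: 1.6, 4: 3.0, 5: 2.5}, 4, 0.1))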

4.4 Analysis

In this section, we first analyze the properties of BidGuard with two score functions: LIN and LOG. Then, we analyze the properties of BidGuard-M with two score functions: LIN-M and LOG-M.

4.4.1 Analysis of BidGuard

In this section, we first analyze the properties of LIN.

Theorem 6. LIN achieves computational efficiency, individual rationality, truthfulness, and (ǫ(e − 1)/e, δ)-differential privacy, where ǫ > 0 and δ ∈ (0, 1/2] are constants and e is the base of the natural logarithm. In addition, it has social cost at most gOPT + O(ln n) with probability at least 1 − 1/n^O(1), where g is the cardinality of the largest user task set, OPT is the optimal social cost of the SCM problem, and n is the number of users.

Proof. We first prove the computational efficiency. The outer while-loop (Lines 3-12) runs at most m iterations since there are m tasks. Meanwhile, the two inner for-loops (Lines 4-6 and Lines 7-9) each run at most n iterations since there are n users. Therefore, the total computational complexity of LIN is O(mn). The individual rationality is guaranteed by the fact that the payment to each winner i is pi = bi + (∫_{bi}^{b̄} Pri(z) dz)/Pri(bi) ≥ bi. In order to prove the rest of this theorem, we prove the following lemmas.

Lemma 1. LIN is truthful.

Proof. According to Equation (4.8) and Equation (4.10), the probability Pri(bi) of user i being selected in BidGuard is monotonically non-increasing in its bid bi. In addition, no bid is greater than b̄ in our model. Thus we have ∫_0^∞ Pri(z) dz = ∫_0^{b̄} Pri(z) dz < ∞. Furthermore, we have

    E[pi] = (1 − Pri(bi)) × 0 + Pri(bi) × (bi + (∫_{bi}^{b̄} Pri(z) dz)/Pri(bi))    (4.16)
          = bi Pri(bi) + ∫_{bi}^∞ Pri(z) dz.    (4.17)

Then, according to Theorem 3, the lemma holds.

Lemma 2. For any constants ǫ > 0 and δ ∈ (0, 1/2], LIN achieves (ǫ(e − 1)/e, δ)-differential privacy, where e is the base of the natural logarithm.

Proof. Let β⃗ and β⃗′ be two input task-bid profiles that differ only in some user d's bid. Let M(β⃗) and M(β⃗′) denote the sequences of users selected by LIN with inputs β⃗ and β⃗′, respectively. We show that LIN, even revealing the order in which the users are chosen, achieves differential privacy for an arbitrary sequence of users I = i1, i2, ..., il of arbitrary length l. We consider the relative probability of LIN for the given task-bid inputs β⃗ and β⃗′:

    Pr[M(β⃗) = I] / Pr[M(β⃗′) = I]
      = ∏_{j=1}^{l} [ exp(ǫ′(1 − b_{ij}/(b̄|Γ_{ij} − Tc|))) / Σ_{i∈Uj} exp(ǫ′(1 − bi/(b̄|Γi − Tc|))) ]
        ÷ [ exp(ǫ′(1 − b′_{ij}/(b̄|Γ_{ij} − Tc|))) / Σ_{i∈Uj} exp(ǫ′(1 − b′i/(b̄|Γi − Tc|))) ]    (4.18)
      = ∏_{j=1}^{l} [ exp(ǫ′(1 − b_{ij}/(b̄|Γ_{ij} − Tc|))) / exp(ǫ′(1 − b′_{ij}/(b̄|Γ_{ij} − Tc|))) ]    (4.19)
        × ∏_{j=1}^{l} [ Σ_{i∈Uj} exp(ǫ′(1 − b′i/(b̄|Γi − Tc|))) / Σ_{i∈Uj} exp(ǫ′(1 − bi/(b̄|Γi − Tc|))) ],    (4.20)

where Uj = U \ {i1, i2, ..., i_{j−1}} and the first equality is based on Equation (4.8). We then prove this lemma by cases. When bd < b′d, the second product is at most 1, so

    Pr[M(β⃗) = i1, i2, ..., il] / Pr[M(β⃗′) = i1, i2, ..., il]
      ≤ exp(ǫ′(1 − bd/(b̄|Γd − Tc|))) / exp(ǫ′(1 − b′d/(b̄|Γd − Tc|)))    (4.21)
      = exp(ǫ′(b′d − bd)/(b̄|Γd − Tc|))    (4.22)
      ≤ exp(ǫ′(b′d − bd))    (4.23)
      ≤ exp(ǫ′∆).    (4.24)

When bd ≥ b′d, the first product is at most 1, because its factor is less than 1 for any j ∈ [1, l] with ij = d and equal to 1 otherwise. In the remainder of the proof, we focus on this case. Therefore, we have

    Pr[M(β⃗) = i1, i2, ..., il] / Pr[M(β⃗′) = i1, i2, ..., il]    (4.25)
      ≤ ∏_{j=1}^{l} [ Σ_{i∈Uj} exp(ǫ′(1 − b′i/(b̄|Γi − Tc|))) / Σ_{i∈Uj} exp(ǫ′(1 − bi/(b̄|Γi − Tc|))) ]    (4.26)
      = ∏_{j=1}^{l} [ Σ_{i∈Uj} exp(ǫ′θi/(b̄|Γi − Tc|)) exp(ǫ′(1 − bi/(b̄|Γi − Tc|))) / Σ_{i∈Uj} exp(ǫ′(1 − bi/(b̄|Γi − Tc|))) ]    (4.27)
      = ∏_{j=1}^{l} E_{i∈Uj}[ exp(ǫ′θi/(b̄|Γi − Tc|)) ]    (4.28)
      ≤ ∏_{j=1}^{l} E_{i∈Uj}[ exp(ǫ′θi) ],    (4.29)

where θi = bi − b′i and the expectation is taken over the selection distribution of Equation (4.8). For all x ≤ 1, e^x ≤ 1 + (e − 1)x. Therefore, for all ǫ′ ≤ 1, we have

    ∏_{j=1}^{l} E_{i∈Uj}[exp(ǫ′θi)] ≤ ∏_{j=1}^{l} E_{i∈Uj}[1 + (e − 1)ǫ′θi]    (4.30)
      ≤ exp( (e − 1)ǫ′ Σ_{j=1}^{l} E_{i∈Uj}[θi] ).    (4.31)

Lemma B.2 in [104] implies that Pr[ Σ_{j=1}^{l} E_{i∈Uj}[θi] > ∆ ln(e/δ) ] ≤ δ. Let O denote the outcome space, where each o ∈ O is a sequence of users i1, i2, ..., il. We split O into two sets O′ and O′′, where O′ = {o ∈ O | Σ_{j=1}^{l} E_{i∈Uj}[θi] ≤ ∆ ln(e/δ)} and O′′ = O \ O′. Thus we have

    Pr[M(β⃗) ∈ O]    (4.32)
      = Σ_{o∈O} Pr[M(β⃗) = o]    (4.33)
      = Σ_{o∈O′} Pr[M(β⃗) = o] + Σ_{o∈O′′} Pr[M(β⃗) = o]    (4.34)
      ≤ exp((e − 1)ǫ′∆ ln(e/δ)) Σ_{o∈O′} Pr[M(β⃗′) = o] + δ    (4.35)
      ≤ exp((e − 1)ǫ′∆ ln(e/δ)) Pr[M(β⃗′) ∈ O] + δ    (4.36)
      = exp(ǫ(e − 1)/e) Pr[M(β⃗′) ∈ O] + δ.    (4.37)

The lemma holds.

Lemma 3. With probability at least 1 − 1/n^O(1), LIN has social cost at most gOPT + O(ln n), where g is the cardinality of the largest user task set, OPT is the optimal social cost of the SCM problem, and n is the number of users.

Proof. Let S* denote the optimal solution to the SCM problem. For LIN, we consider the sequence W of winners in the order they are selected, i.e., W = w1, w2, ..., wl. For each wi, 1 ≤ i ≤ l, let Wi denote the set of users satisfying, for all j ∈ Wi:

1) j ∈ S*;
2) Γj ∩ Γwi ≠ ∅;
3) Γj ∩ Γwk = ∅ for all k ∈ [1, i − 1].

That is, Wi is the set of users in S* that are excluded from W because of wi. For truthful mechanisms, we have bi = ci. According to Theorem 5, by taking α = O(ln n), we have

    1 − cwi/(b̄|Γwi − Tc|) ≥ 1 − cj/(b̄|Γj − Tc|) − O(ln n),    (4.38)

with a probability of at least 1 − 1/n^O(1). This implies that

    cj ≥ (cwi/|Γwi − Tc|) · |Γj − Tc| − O(ln n),    (4.39)

with a probability of at least 1 − 1/n^O(1), where the constant factor b̄ is absorbed into the O(ln n) term. Summing over all j ∈ Wi, we have

    Σ_{j∈Wi} cj ≥ (cwi/|Γwi − Tc| − O(ln n)) · Σ_{j∈Wi} |Γj − Tc|    (4.40)
      ≥ cwi/|Γwi − Tc| − O(ln n)    (4.41)

with a probability of at least 1 − 1/n^O(1). The first inequality holds because Σ_{j∈Wi} |Γj − Tc| ≥ |Wi|. The second inequality holds because Σ_{j∈Wi} |Γj − Tc| ≥ 1. Note that |Γwi − Tc| can be upper bounded by a constant g, which is the cardinality of the largest user task set. Therefore, we have

    Σ_{j∈Wi} cj ≥ cwi/g − O(ln n).    (4.42)

Summing over all wi ∈ W, we have

    OPT = Σ_{j∈S*} cj = Σ_{wi∈W} Σ_{j∈Wi} cj + Σ_{j∈S*∩W} cj    (4.43)
      ≥ Σ_{wi∈W} cwi/g − O(ln n),    (4.44)

where the inequality holds because when n is large, |W| ≪ n. Then the lemma holds.

For LOG, we have the following properties. The proofs are similar to those for LIN, and thus omitted.

Theorem 7. LOG achieves computational efficiency, individual rationality, truthfulness, and (ǫ(e − 1)/e, δ)-differential privacy, where ǫ > 0 and δ ∈ (0, 1/2] are two constants and e is the base of the natural logarithm. In addition, it has social cost at most 2^ω Hm OPT with probability at least 1 − e^{−ω} for any constant ω > 0, where Hm = Σ_{j=1}^{m} 1/j, m is the number of sensing tasks, and OPT is the optimal social cost of the SCM problem.

Remarks: According to Theorem 4 in [41], the minimum weighted set cover problem can be reduced to the SCM problem. It is well known that the best-possible polynomial-time approximation algorithm for the weighted set cover problem is an Hm-approximation algorithm [105], where Hm is the m-th harmonic number. LOG has social cost at most 2^ω Hm OPT, where ω is a constant, and thus it is asymptotically optimal. Even though LIN cannot be proved to be asymptotically optimal in terms of the social cost, we will show in Section 4.5.4 that it achieves better privacy protection than LOG in terms of the privacy leakage.

4.4.2 Analysis of BidGuard-M

In this section, we first analyze the properties of LIN-M.

Theorem 8. LIN-M achieves computational efficiency, individual rationality, truthfulness, and 2mǫ-differential privacy, where ǫ > 0 is a constant and m is the number of sensing tasks. In addition, it has social cost at most OPT + mO(ln n) with probability at least 1 − 1/n^O(1), where OPT is the optimal social cost of the SCM-M problem and n is the number of users.

Proof. We first prove the computational efficiency. The outer loop (Lines 3-12) runs at most m iterations since there are m tasks. Meanwhile, the two inner for-loops (Lines 4-6 and Lines 7-9) each run at most n iterations since there are n users. The payment calculation for the winning task-bid pairs (Lines 13-15) runs at most m iterations since there are m tasks. The winner selection and payment calculation (Lines 16-21) run at most n iterations since there are n users. Therefore, the total computational complexity of LIN-M is O(mn). The individual rationality is guaranteed by the fact that the payment to each winning task-bid pair is pi^k = bi^k + (∫_{bi^k}^{b̄} Pri(z) dz)/Pri(bi^k) ≥ bi^k. In order to prove the rest of this theorem, we prove the following lemmas.

Lemma 4. LIN-M is truthful.

Proof. According to Equation (4.13) and Equation (4.15), the probability Pri(bi^k) of task-bid pair βi^k ∈ Bt being selected in BidGuard-M is monotonically non-increasing in its bid bi^k. In addition, no bid is greater than b̄ in our model. Thus we have ∫_0^∞ Pri(z) dz = ∫_0^{b̄} Pri(z) dz < ∞. Furthermore, we have

    E[pi^k] = (1 − Pri(bi^k)) × 0 + Pri(bi^k) × (bi^k + (∫_{bi^k}^{b̄} Pri(z) dz)/Pri(bi^k))    (4.45)
            = bi^k Pri(bi^k) + ∫_{bi^k}^∞ Pri(z) dz.    (4.46)

Then, according to Theorem 3, the lemma holds.

In order to quantify the differential privacy performance of LIN-M, we use the following lemmas.

Lemma 5. (Composability [18]). The sequential application of randomized computations Mi, each giving ǫi-differential privacy, yields (Σi ǫi)-differential privacy.

Lemma 6. For any constant ǫ > 0, LIN-M achieves 2mǫ-differential privacy, where m is the number of sensing tasks.

Proof. Since LIN-M follows the exponential mechanism, it selects a task-bid pair for each task based on Equation (4.13). According to Theorem 4, for each task LIN-M achieves 2ǫ-differential privacy, since the largest difference in the score function (Λ) is 1. LIN-M selects a task-bid pair for each task in T iteratively until all tasks can be finished. This is a sequential application of the selection mechanism for one task. Therefore, according to Lemma 5, LIN-M achieves 2mǫ-differential privacy, since there are m tasks.

Next, we bound the social cost of LIN-M.

Lemma 7. With probability at least 1 − 1/n^O(1), LIN-M has social cost at most OPT + mO(ln n), where OPT is the optimal social cost of the SCM-M problem, m is the number of sensing tasks, and n is the number of users.

Proof. Let B* denote the optimal solution to the SCM-M problem. We denote by BW an arbitrary set of winning task-bid pairs returned by LIN-M. Because only one task-bid pair is selected for each task and all tasks need to be completed, we have |B*| = |BW|. Therefore, for any task-bid pair βj^k ∈ B*, there exists a task-bid pair βi^k ∈ BW such that τi^k = τj^k, and vice versa. According to Theorem 5, by taking α = O(ln n), we have

    bi^k ≤ bj^k + O(ln n)    (4.47)

with a probability of at least 1 − 1/n^O(1) for each task τ ∈ T. Summing Equation (4.47) over all tasks, Σ_{βi^k∈BW} bi^k ≤ Σ_{βj^k∈B*} bj^k + mO(ln n) with a probability of at least 1 − 1/n^O(1). For truthful mechanisms, we have bi^k = ci^k and bj^k = cj^k. Thus Σ_{βi^k∈BW} ci^k is the social cost of LIN-M, and OPT = Σ_{βj^k∈B*} cj^k. This concludes the proof.

For LOG-M, we have the following properties. The proofs are similar to those for LIN-M, and thus omitted.

Theorem 9. LOG-M achieves computational efficiency, individual rationality, truthfulness, and 2m log_{1/2}(1/(1+∆))ǫ-differential privacy, where m is the number of sensing tasks, ∆ is the maximum difference in the bidding price, and ǫ > 0 is a constant. In addition, it has social cost at most 2^ω OPT with probability at least 1 − e^{−ω} for any constant ω > 0, where OPT is the optimal social cost of the SCM-M problem.

Remarks: LOG-M has social cost at most 2^ω OPT, where ω is a constant, and thus it is asymptotically optimal.

4.5 Performance Evaluation

In this section, we evaluate the performance of BidGuard and BidGuard-M and compare them respectively with TRAC [47] and DP-hSRC [58]. TRAC is closest to our work in terms of the design objective, but does not protect users’ bid privacy. DP-hSRC considers users’ bid privacy, but minimizes total payment instead of social cost.

4.5.1 Simulation Setup

All results are based on a real-world dataset of taxi traces. The dataset consists of the traces of 320 taxi drivers who work in the center of Rome [90]. Each taxi driver has a tablet that periodically (every 7 s) retrieves the GPS location (latitude and longitude) and sends it with the corresponding driver ID to a central server. The mobility pattern of taxi traces can be used to depict the mobility of smartphone users, as in [10, 15]. We consider a mobile crowdsensing system in which the task is to measure the cellular signal strength at specific locations. Each user can sense the cellular signal strength within the area centered at the user's location with a radius of 30 m. Tasks are represented by GPS locations reported by taxis. We assume that the driver of each taxi is a user. We preprocess the tasks such that each task can be sensed by at least two users, according to our system model. We use three metrics to evaluate the performance: social cost, total payment, and privacy leakage. The social cost, as defined in Section 4.2.4, refers to the total cost of all selected users. The total payment measures the payment paid by the platform to all selected users. We first compare the social cost and total payment of BidGuard and BidGuard-M with those of TRAC. Then we compare the social cost of BidGuard with the optimal social cost. We define privacy leakage to quantitatively measure the differential privacy performance of BidGuard and BidGuard-M.

Privacy Leakage: Given a mechanism M, let β⃗ and β⃗′ be two task-bid profiles which differ in only one user's bid. Let M(β⃗) and M(β⃗′) denote the outcomes of M with inputs β⃗ and β⃗′, respectively. The privacy leakage, denoted by PL, is defined as the Kullback-Leibler divergence of the two outcome probability distributions based on β⃗ and β⃗′:

    PL = Σ_{o∈O} Pr[M(β⃗) = o] ln( Pr[M(β⃗) = o] / Pr[M(β⃗′) = o] ).    (4.48)

Note that the smaller the PL value is, the harder it is to distinguish the two task-bid profiles, and thus the better the privacy-preserving performance. In our evaluation, we randomly select locations as the sensing tasks according to the settings. We assume the bids of users are randomly distributed over [1, 50] for BidGuard and over [1, 10] for BidGuard-M; this is because users in BidGuard bid for a set of tasks, while users in BidGuard-M bid for a single task each. We generate users' bids according to two different distributions, i.e., the uniform distribution and the normal distribution. To evaluate the impact of the number of sensing tasks on the performance metrics, we set the number of users to 200 and vary the number of sensing tasks from 20 to 60 with a step of 10. To evaluate the impact of the number of users on the performance metrics, we set the number of sensing tasks to 150 and vary the number of users from 100 to 300 with a step of 50. For the differential privacy parameters, we set ǫ = 0.1 and δ = 0.25 by default. All results are averaged over 1000 independent runs for each setting. Note that since the performance under both the uniform and normal distributions follows the same pattern according to our evaluation, in the following we only show the performance under the uniform distribution.
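The privacy leakage in Equation (4.48) can also be estimated empirically whenever the outcome distributions are not available in closed form. A minimal Monte Carlo sketch, assuming only that mechanism(profile) returns a hashable outcome, is given below; it is an illustration of the metric, not the evaluation harness used for the reported results.

    import math
    from collections import Counter

    def privacy_leakage(mechanism, profile_a, profile_b, runs=100000):
        # Estimate the KL divergence between outcome distributions under two
        # task-bid profiles differing in one user's bid (Equation (4.48)).
        pa = Counter(mechanism(profile_a) for _ in range(runs))
        pb = Counter(mechanism(profile_b) for _ in range(runs))
        pl = 0.0
        for o, cnt in pa.items():
            p, q = cnt / runs, pb.get(o, 0) / runs
            if q == 0:
                # Outcome impossible under profile_b: PL diverges, as it
                # does for a deterministic mechanism such as TRAC.
                return float('inf')
            pl += p * math.log(p / q)
        return pl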

4.5.2 Evaluation of Social Cost

We first compare the social cost of BidGuard and BidGuard-M with those of TRAC and DP-hSRC. Note that TRAC is optimal in the multi-bid model. The impact of the number of sensing tasks on the social cost of BidGuard and BidGuard-M is shown in Figure 4.1(a) and Figure 4.1(b), respectively. We observe that the social costs of TRAC, DP-hSRC, BidGuard, and BidGuard-M all increase when the number of sensing tasks grows. This is because, with more sensing tasks, the platform may select more users, incurring a higher social cost.

Figure 4.1: Impact of the number of sensing tasks on the social cost. (a) BidGuard. (b) BidGuard-M.


Figure 4.2: Impact of the number of users on the social cost. (a) BidGuard. (b) BidGuard-M.

We also see that the social cost of TRAC is lower than those of DP-hSRC, BidGuard, and BidGuard-M. This is because, in each iteration, TRAC selects the user with the lowest criterion value (defined in Equation (4.5)) in the single-bid model and the user with the lowest bid for each task in the multi-bid model. In contrast, since both BidGuard and BidGuard-M are randomized, they cannot always guarantee to select the user with the lowest criterion value or the lowest bid in each iteration. DP-hSRC selects users based on a threshold price and has no performance guarantee on the social cost. Besides, the social cost of LOG is smaller than that of LIN, and the social cost of LOG-M is lower than that of LIN-M. This is because both LOG and LOG-M prefer to select users with low bids, as the log score function gives low-bid users a higher probability of being selected.


Figure 4.3: Comparison of BidGuard, TRAC, DP-hSRC and OPT. (a) Impact of the number of sensing tasks. (b) Impact of the number of users.

Figure 4.2(a) and Figure 4.2(b) depict the impact of the number of users on the social cost of BidGuard and BidGuard-M, respectively. We see that the social cost decreases slightly when the number of users increases for TRAC, DP-hSRC, BidGuard, and BidGuard-M. This is because, with more users, the platform can find more low-cost users to complete the sensing tasks. The social cost of TRAC is lower than those of DP-hSRC, BidGuard, and BidGuard-M; the reason is the same as explained for Figure 4.1. Meanwhile, for the same reason as above, the social cost of LOG is lower than that of LIN, and the social cost of LOG-M is lower than that of LIN-M. In Figure 4.3, we compare the social cost of incentive mechanisms in the single-bid model. Let OPT denote the optimal solution. Since finding the optimal solution takes exponential time, we set the number of users to 10 for Figure 4.3(a) and set the number of sensing tasks to 4 for Figure 4.3(b). We observe that Figure 4.3(a) and Figure 4.3(b) follow the same patterns as Figure 4.1(a) and Figure 4.2(a), respectively; the reasons are similar to those explained for Figure 4.1(a) and Figure 4.2(a). Furthermore, we observe that BidGuard sacrifices the social cost for the users' bid privacy, compared to TRAC and the optimal solution. Note that in Figure 4.3(b), the social cost of TRAC is very close to that of OPT.

This is because TRAC is an H_K-approximation algorithm, where H_K ≈ 2.34 in this figure.

Figure 4.4: Impact of the number of sensing tasks on the total payment. (a) BidGuard. (b) BidGuard-M.

4.5.3 Evaluation of Total Payment

In Figure 4.4 and Figure 4.5, we plot the impact of the number of sensing tasks and the impact of the number of users on the total payment of BidGuard and BidGuard-M, respectively. The results show that the total payments of TRAC, DP-hSRC, BidGuard, and BidGuard-M all follow the same pattern as the social cost. In addition, both LOG and LOG-M have a lower total payment than LIN and LIN-M, respectively. This is because the log score function tends to select users with lower bids, as shown in Figure 4.1 and Figure 4.2. We also observe that the total payment of DP-hSRC is lower than those of BidGuard and BidGuard-M. This is because DP-hSRC selects and pays users according to a single price, and thus it has a performance guarantee on the total payment. However, DP-hSRC can only achieve approximate truthfulness, which ensures that no user is able to make more than a slight gain in its expected utility by bidding untruthfully. In addition, in the next subsection, we will see that the privacy protection of LIN and LIN-M is better than that of DP-hSRC.

4.5.4 Evaluation of Privacy Leakage

Next, we evaluate BidGuard and BidGuard-M in terms of privacy leakage. We observe that the privacy leakage values of BidGuard, BidGuard-M, and DP-hSRC are very small. This is because they all achieve differential privacy. However, the privacy leakage value of TRAC is positive infinity, which indicates poor differential privacy performance. This is because TRAC does not protect users' bid privacy, and the denominator in Equation (4.48) can be 0 for TRAC.


Figure 4.5: Impact of the number of users on the total payment. (a) BidGuard. (b) BidGuard-M.

In both Figure 4.6(a) and Figure 4.6(b), we see that the privacy leakage of LIN and LIN-M is always smaller than that of LOG and LOG-M, respectively, which indicates that LIN and LIN-M have better privacy protection performance than LOG and LOG-M, respectively. This is because the linear score function treats the probability of every outcome uniformly, whereas the log score function gives more probability to outcomes with low social cost. We also observe that the privacy leakage of both LIN and LIN-M is always smaller than that of DP-hSRC. This is because DP-hSRC is a single-price mechanism in which one user's bid change may significantly change the outcome of the mechanism; according to Equation (4.48), DP-hSRC's PL value thus increases. We do not observe a pattern in the privacy leakage when the number of tasks increases for BidGuard. The reason is that, according to the definition of privacy leakage, the difference between the probabilities of two outcomes being selected is independent of the number of sensing tasks. However, the privacy leakage value increases with the number of tasks for BidGuard-M. This is because, according to Theorem 8 and Theorem 9, the differential privacy guarantee of BidGuard-M degrades linearly with the number of sensing tasks. Figure 4.7(a) and Figure 4.7(b) show the impact of the number of users on the privacy leakage of BidGuard and BidGuard-M, respectively. Note that the privacy leakage value decreases when the number of users increases for DP-hSRC, BidGuard, and BidGuard-M.

Figure 4.6: Impact of the number of sensing tasks on privacy leakage. (a) BidGuard. (b) BidGuard-M.


Figure 4.7: Impact of the number of users on privacy leakage. and BidGuard-M. This is because the probability of each outcome decreases as the number of users increases. Specifically, the more users in the system, the more possible outcomes for DP-hSRC, BidGuard and BidGuard-M, the less difference between the probabilities of two outcomes to be selected, and thus the better differential privacy performance. We also see the privacy protection performance of LIN and LIM-M are better than that of LIN and LOG-M, respectively. In addition, the privacy protection performance of LIN and LIM-M are also better than that of DP-hSRC. The reason for this is similar to that discussed before.

Figure 4.8(a) and Figure 4.8(b) show the impact of the differential privacy parameter ǫ on the privacy leakage for BidGuard and BidGuard-M, respectively. The results show that the value of ǫ has more impact on the privacy leakage for LIN and LOG-M than that of LIN and LIM-M, respectively. This is because the log score function is more sensitive than the

61 linear score function. For LIN and LOG-M, the privacy leakage increases slightly when the value of ǫ grows. This is because, theoretically, the larger the ǫ is, the worse the differential privacy is achieved, and thus the higher privacy leakage. Meanwhile, it is easy to observe that the privacy leakage of LIN and LIM-M are smaller than that of DP-hSRC, LIN and LOG-M, respectively. This can also be explained by the same reason for Figure 4.6(a). Figure 4.9(a) and Figure 4.9(b) illustrate the tradeoff between the social cost and the privacy leakage of LIN and LOG-M, respectively. We observe that the privacy leakage decreases as the decreasing of ǫ. The reason is similar to that discussed for Figure 4.8(a). However, this improvement in privacy comes at a cost of the increased social cost for both LIN and LOG-M. Remark. Compared with TRAC, which does not protect users’ bid privacy, both Bid- Guard and BidGuard-M sacrifice the social cost and payment for the users’ bid privacy. Compared with DP-hSRC, BidGuard and BidGuard-M have better bid privacy preserva- tion and lower social cost in most cases although incurring higher total payment. In addi- tion, BidGuard and BidGuard-M achieve truthfulness while DP-hSRC achieves approximate truthfulness. Besides, LIN and LIM-M outperform LIN and LOG-M in terms of privacy protection, respectively. However LIN and LOG-M have lower social cost and payment.

0.6 1 0.5 0.8 0.4 LIN LIN-M LOG 0.6 LOG-M 0.3 DP-hSRC DP-hSRC 0.2 0.4 0.1 0.2 Privacy Leakage 0.1 0.3 0.5 0.7 0.9 Privacy Leakage 0.1 0.3 0.5 0.7 0.9 ǫ ǫ (a) BidGuard (b) BidGuard-M

Figure 4.8: Impact of the ǫ on privacy leakage.

62

52 0.55 67.5 0.96

51 0.54 67 0.94

50 0.53

66.5 0.92 49 0.52 Social cost Social cost Social cost Social cost Privacy leakage Privacy leakage Privacy leakage Privacy leakage 48 0.51 66 0.9 0.2 0.4 ε 0.6 0.8 1 0.2 0.4 ε 0.6 0.8 1 (a) BidGuard (b) BidGuard-M

Figure 4.9: Social cost v.s. privacy leakage.

4.6 Conclusion

In this work, we analyzed the privacy issues in the incentive mechanisms. The main contributions of this work are as follows: First, we identified the bid privacy issues in exist- ing incentive mechanisms raised by the inference attack. We considered both single-minded and multi-minded models, and proposed BidGuard and BidGuard-M, for each model, re- spectively. Both BidGuard and BidGuard-M are frameworks for privacy-preserving mobile crowdsensing incentive mechanisms which achieve computational efficiency, individual ratio- nality, truthfulness, differential privacy, and approximate social cost minimization. Specifi- cally, we designed two different score functions, linear score function and log score function, to realize these two frameworks. Second, we proved that with linear score function, BidGuard achieves (ǫ(e − 1)/e,δ)-differential privacy and the social cost is at most gOPT + O(ln n)

O(1) 1 with the probability of at least 1 − 1/n , where ǫ > 0 is a constant, δ ∈ (0, 2 ] and g is the cardinality of the largest user task set, e is the base of the natural logarithm, OPT is the optimal social cost, and n is the number of the users. BidGuard-M achieves 2mǫ- differential privacy and the social cost is at most OPT + mO(ln n) with the probability of at least 1 − 1/nO(1), where m is the number of sensing tasks. Third, we also proved that with log score function, BidGuard achieves (ǫ(e − 1)/e,δ)-differential privacy and the

ω −ω social cost is at most 2 HmOPT with the probability of at least 1 − e for any constant

m ω > 0, where Hm = j=1 1/j, and m is the number of sensing tasks. BidGuard-M achieves P

63 1 t 2m log 1 ( )ǫ-differential privacy and the social cost is at most 2 OPT with the probability 2 1+∆ of at least 1 − 1/nO(1), where ∆ is the maximum difference in the bidding price. In addi- tion, both BidGuard and BidGuard-M are proved to be asymptotically optimal. At last, we evaluated the performance of BidGuard and BidGuard-M through simulations based on a real data set. Extensive numerical results demonstrated that both frameworks achieve bid-privacy preservation although sacrificing social cost.

64 CHAPTER 5 DETERRING THE SYBIL ATTACK IN INCENTIVE MECHANISMS

This work was published in IEEE INFOCOM [77, 78].

5.1 Background

In crowdsensing, a user may try to profit from submitting multiple bids under fictitious identities, e.g., creating multiple accounts. This is referred to as the Sybil attack. This attack is easy to conduct but difficult to detect. We demonstrate that existing auction-based incentive mechanisms are vulnerable to Sybil attack. Among them, all VCG auction-based incentive mechanisms [71, 106–108] are not Sybil-proof, since VCG auction has been proved not Sybil-proof in [64]. Mechanisms proposed in [9, 47, 109–111] are not Sybil-proof, since a user can exploit multiple fictitious identities to increase its critical value, and thus increase its payment. For mechanisms in [45, 73, 91, 112], a user can change from a loser to a winner with a positive utility via conducting Sybil attack. A user can increase its utility by completing a subset of its sensing task set through mechanisms in [8, 44]. The vulnerability of a mechanism to Sybil attack may make the system fail to achieve desired properties, e.g., social cost minimization [91], and jeopardize the system’s fairness, which discourages users from participating in crowdsensing. However, the problem of designing Sybil-proof auction- based incentive mechanisms for crowdsensing is still open. Moreover, the Sybil attack model in crowdsensing is yet to be formally defined.

5.2 Model and Problem Formulation

In this section, we model the crowdsensing system as a reverse auction and consider two different scenarios. The first is an offline scenario in which all users are required to submit their bids at the beginning of the auction and the platform selects a subset of users according to some criteria. The second is an online scenario in which users participate in the system

65 Dynamic users Task description Bid Reject

Assignment Sensing data Payment

Platform Time line Deadline

Figure 5.1: Online crowdsensing system in a random order, as shown in Figure 5.1. Once a user arrives, the platform has to make irrevocable decisions on whether to select it and how much it should be paid without knowing future information.

5.2.1 Offline Scenario

Similar to most crowdsensing systems [8, 47], we consider a crowdsensing system consist- ing of a platform and a set U = {1, 2,...,n} of n ≥ 2 users, who are interested in performing sensing tasks. The platform first publicizes a set T = {τ1,τ2,...,τm} of m sensing tasks.

Each task τi ∈ T has a value vi to the platform. We use bundle to refer to any subset of T . There is a function V (B) to calculate the value of bundle B to the platform, i.e., V (B)= v , B⊆T . Each user i has a task set Γ ⊆T , which i can perform according ti∈B i i to its preference.P Generalizing existing works, we assume that each user i has a cost function ci(B), which determines the cost for i to perform all tasks in bundle B. The cost function ci(·) satisfies the following properties:

• ci(∅)=0;

• ci({τj})= ∞, ∀tj ∈ T \ Γi;

′ ′′ ′ ′′ ′ ′′ • ci(B ) ≤ ci(B ), ∀B , B ⊆T with B ⊆B ;

′ ′′ ′ ′′ ′ ′′ • ci(B) ≤ ci(B )+ ci(B ), ∀B , B ⊆T and B = B ∪B .

66 The physical meanings of the first two properties are the rationality and capability of users, respectively. The third property implies that performing addition tasks may incur more cost. The last property guarantees that the cost of performing a set of tasks is not greater than the cost of performing these tasks separately. These four properties together closely depict the cost of performing sensing tasks in practice and serve as a base of our attack model to be defined in Section 5.2.3.

The platform will assign each user i ∈U a bundle Ai ⊆ Γi to complete. Note that Ai = ∅ −→ means user i is not assigned any task to perform. Let A = (A1, A2,..., An) denote the assignment profile. Finally, the platform calculates the payment pi for each user i. Note −→ that pi = 0, if Ai = ∅. Let p = (p1,p2,...,pn) denote the payment profile. Depending on whether a user is willing to perform its whole task set, we consider two cases in this work.

For the single-minded (SM) case, each user i ∈ U is willing to perform only Γi, and the utility of i is

p − c (A ), if A =Γ ; u = i i i i i (5.1) i 0, otherwise.  Since ci(Ai) can only be equal to ci(Γi), we use ci instead of ci(Ai) for notational simplicity.

For the multi-minded (MM) case, each user i ∈U is willing to perform any subset of Γi, and the utility of i when performing bundle Ai ⊆T is

p − c (A ), if A ⊆ Γ ; u = i i i i i (5.2) i 0, otherwise.  The utility of the platform is

u0 = V ( Ai) − pi. (5.3) i∈U i∈U [ X In this work, we model the interaction between the platform and the users as a sealed-bid reverse auction, in which the platform is a buyer who buys sensing service and the users are sellers who bid to perform sensing tasks. We call user i a winner if Ai =6 ∅, and a loser otherwise.

67 Let βi =(Γ˜i,bi) denote the task-cost pair of user i. In the SM case, bi is a value, while bi

is a cost function in the MM case. A task-cost pair is true if Γ˜i =Γi and bi = ci(Γi) in the

SM case; Γ˜i =Γi and bi = ci(·) in the MM case. At the beginning of the auction, each user i ∈ U submits its task-cost pair as its bid to the platform, which is not necessarily its true −→ task-cost pair. Let β = (β1, β2,...,βn) denote the task-cost profile. Given the task-cost −→ profile β , the platform determines the outcome of auction, which consists of assignment −→ profile A and payment profile −→p .

5.2.2 Online Scenario

In the online scenario, we consider a crowdsensing system consisting of a platform and a crowd of smartphone users U = {1, 2,...,n}, where n is unknown. The platform first

publicizes a set T = {τ1,τ2,...,τm} of m sensing tasks, aiming at finding some users to complete these tasks before a specified deadline T , which is divided into slots of equal size.

Each task τi ∈ T has a value vi to the platform. We use bundle to refer to any subset of T . There is a function V (B) to calculate the value of bundle B to the platform, i.e., V (B) = v , B⊆T . Each user i has a active time window within which it promises τi∈B i

to completeP tasks if it is assigned, and a task set Γ˜i ⊆ T , which i can complete within its

active time window. Leta ˜i ∈ {1,...,T } denote the beginning of active time window and

d˜i ∈ {1,...,T }, d˜i ≥ a˜i denote the end of active time window. Note that the platform has to

make the decision on whether to select user i by d˜i. As with the offline scenario, we assume that each user i has a cost functionc ˜i(B), which determines the cost for i to perform all tasks in bundle B. The cost functionc ˜i(·) also satisfies the properties described in Section 5.2.1.

5.2.3 Threat Models

Threats to Incentive: We assume that users are selfish but rational. Hence, in both offline and online scenarios, it is possible that user i maximizes its utility by reporting a false cost value ˜bi, which differs from its true cost ci(Γi) in the SM case; or reporting a false cost functionc ˜i(·) =6 ci(·) in the MM case. Besides, user i could also misreport the task set by

68 submitting Γ˜i =Γ6 i. In addition, a user may submit a false active time window in the online scenario, i.e., ai =˜6 ai or di =6 d˜i. Other threats to incentive, e.g., collusion, are out the scope of this work. Sybil Attack: We first consider the Sybil attack in the offline scenario. Based on our system model, a user could conduct Sybil attack by submitting multiple task-cost pairs

under fictitious identities. As a simple case, user i could submit two task-cost pairs βi′ =

′ ′′ (Γ˜i′ ,˜bi′ ) and βi′′ =(Γ˜i′′ , ˜bi′′ ) under two identities i and i , respectively. This case is sufficient to represent the general Sybil attack. Depending on whether a user is interested in only performing its whole task set, we consider the following two cases. Single-Minded Case: Each user is interested in only performing its whole task set. User

′ ′′ i submits βi′ = (Γ˜i′ , ˜bi′ ) and βi′′ = (Γ˜i′′ , ˜bi′′ ) using identities i and i , where Γ˜i′ ∪ Γ˜i′′ = Γi.

User i’s utilityu ˜i through Sybil attack is

p ′ + p ′′ − c (Γ ), if A ′ ∪A ′′ =Γ ; u˜ = i i i i i i i (5.4) i 0, otherwise.  Multi-Minded Case: Each user is willing to perform any subset of its task set. User i

′ ′′ submits βi′ = (Γ˜i′ ,˜bi′ ) and βi′′ = (Γ˜i′′ ,˜bi′′ ) using identities i and i , where Γ˜i′ ⊆ Γi and

Γ˜i′′ ⊆ Γi. User i’s utilityu ˜i through Sybil attack is

p ′ + p ′′ − c (A ′ ∪A ′′ ),if A ′ ∪A ′′ ⊆ Γ ; u˜ = i i i i i i i i (5.5) i 0, otherwise.  Ifu ˜i >ui, user i has an incentive to conduct Sybil attack in either case. Note that because SM case is a special case of MM case, we have: 1) if a mechanism is not Sybil-proof in SM case, it is not Sybil-proof in MM case; 2) if a mechanism is Sybil-proof in MM case, it is Sybil-proof in SM case. Next, we show that existing offline mechanisms are not Sybil-proof in either SM case or MM case. We classify them into four categories according to their vulnerabilities to the Sybil attack. The first category is composed of the VCG-based mechanisms [71, 106–108]. They are vulnerable to Sybil attack, since VCG auction is not Sybil-proof, as proved in [64]. In the second category, each winner is paid its critical value, which is based on a certain

69 user’s bid. All the mechanisms [9, 47, 109–111] are not Sybil-proof, since a user can exploit multiple fictitious identities to increase its critical value, and thus increase its payment. The third category consists of mechanisms [45, 73, 91, 112] in which users are selected iteratively according to a ratio criterion, and a loser can become a winner by rigging the criterion value through Sybil attack. The fourth category consists of mechanisms [8, 44] in which users are selected iteratively according to a linear criterion, and a user can increase its utility by completing a subset of its sensing task set. Next, we use examples to show the vulnerabilities to Sybil attack for the last two categories. First, we use MMT [91] as an example from the third category and show that it is not Sybil-proof in SM case. MMT selects users iteratively. In each iteration, MMT selects

user with the lowest bi/vi(S) value, where vi(S) = V ( j∈S∪{j} Γj) − V ( j∈S Γj) is user i’s marginal value to the platform given the selected usersS in S. Besides, MMTS is also an Hk-

approximation algorithm in terms of the social cost, where Hk is the k-th harmonic number, and k is the largest user task set size. Social cost is the summation of the true cost of all the selected users. We use the example in Figure 5.2 to show that MMT is not Sybil- proof. In this example, squares represent users, and disks represent tasks. A link between a user and a task represents that the task is in that user’s task set. The number above

user i denotes its bids for Γi. Since MMT is truthful, we assume bi = ci(Γi). The number

below task τj denotes its value to the platform. In Figure 5.2(a), we have U = {1, 2, 3, 4},

T = {τ1,τ2,τ3,τ4,τ5}, Γ1 = {τ1,τ5}, Γ2 = {τ2,τ4,τ5}, Γ3 = {τ1,τ2,τ3}, Γ4 = {τ3,τ4}, b1 = 6,

b2 = 12, b3 = 6, b4 = 7. According to MMT, users 2 and 3 will be selected with the social cost 18. Note that user 4 is a loser in this case, and thus its utility is 0. Now, assume user 4 conducts Sybil attack by submitting two bids β4′ =({τ3}, 1.5)) and β4′′ =({τ4}, 5.5) under identities 4′ and 4′′, respectively, as shown in Figure 5.2(b). In this case, MMT selects users 1, 3, 4′, and 4′′ with a social cost at 19. From this example, we see that MMT is not Sybil-proof in SM case, since user 4 could increase its utility through Sybil attack. Therefore, other mechanisms [47, 109, 112] similar to MMT in terms of user selection criterion are not

70 Sybil-proof, either. Next, we use MSensing [8] as an example from the fourth category. MSensing is inten- tionally designed for SM case and can be proved Sybil-proof in SM case. However, it is not Sybil-proof in MM case. MSensing selects users iteratively. In each iteration, MSensing se-

lects user with the highest vi(S) − bi value, where vi(S) is the same as that defined in MMT.

Since MSensing is truthful, we assume bi = ci(Γi). In the example shown in Figure 5.3(a), MSensing will select users 1, 2, and 3 as winners. In this case, user 4 is a loser with utility

0. Assume that user 4 conducts Sybil attack by submitting two bids β4′ = ({τ3}, 5) and

′ ′′ β4′′ = ({τ6}, 2) under identities 4 and 4 , respectively, as shown in Figure 5.3(b). In this case, MSensing selects users 1, 2 and 4′′. Compared with the former case, user 4 changes from a loser to a winner, and thus its utility increases. This example shows that a user can increase its utility through Sybil attack in MSensing, and thus MSensing is not Sybil-proof in MM case. Next, we consider the impact of Sybil attack in the online scenario. We assume that an attacker could conduct Sybil attack at any time slot in its active time by submitting multiple bids under fictitious identities. As a simple case, attacker i could submit two bids under two identities i′ and i′′, respectively. Note that i could submit these two bids simultaneously or at different time slots in its active time window. This case is sufficient to represent the

6 12 6 7 6 12 6 1.5 5.5

1 2 3 4 1 2 3 4’ 4’’

1 2 3 4 5 1 2 3 4 5 4 8 6 7 6 4 8 6 7 6 (a) No Sybil attack (b) With Sybil attack

Figure 5.2: Example showing MMT is not Sybil-proof in SM case

71 general Sybil attack. We extend the definition of attacker’s utility in the following two cases.

Single-Minded Case: Each attacker i is only willing to perform Γ˜i. Attacker i submits

′ ′′ βi′ = (ai′ ,di′ , Γi′ ,bi′ ) and βi′′ = (ai′′ ,di′′ , Γi′′ ,bi′′ ) within [˜ai, d˜i] using identities i and i ,

respectively, where Γi′ ∪ Γi′′ = Γ˜i. Attacker i’s utility is

p ′ + p ′′ − c˜ (Γ˜ ), if A ′ ∪A ′′ = Γ˜ ; u = i i i i i i i (5.6) i 0, otherwise.  Multi-Minded Case: Each attacker is willing to perform any subset of its task set. At- tacker i submits βi′ =(ai′ ,di′ , Γi′ ,bi′ ) and βi′′ =(ai′′ ,di′′ , Γi′′ ,bi′′ ) within [˜ai, d˜i] using identi-

′ ′′ ties i and i , respectively, where Γi′ ⊆ Γ˜i and Γi′′ ⊆ Γ˜i. Attacker i’s utility is

p ′ + p ′′ − c˜ (A ′ ∪A ′′ ),if A ′ ∪A ′′ ⊆ Γ˜ ; u = i i i i i i i i (5.7) i 0, otherwise.  Note that attacker i has an incentive to conduct Sybil attack if ui > u˜i in either case. Next, we show that existing online mechanisms are not Sybil-proof in either SM case or MM case. We classify existing online incentive mechanisms into three categories according to their vulnerabilities to Sybil attack. The first category is VCG-based mechanism [71]. The second category is the critical value-based mechanism [72], in which each winner is paid its critical value. The third category consists of threshold-based mechanisms [45, 54, 70, 73, 112]. The mechanism in the first category is not Sybil-proof since VCG auction is proved not Sybil- proof in [64]. Next, we analyze the vulnerabilities to Sybil attack for the last two categories.

8 6 6 7 8 6 6 5 2 1 2 3 4 1 2 3 4’ 4’’

1 2 3 4 5 6 1 2 3 4 5 6 3 8 6 8 10 9 3 8 6 8 10 9 (a) No Sybil attack (b) With Sybil attack

Figure 5.3: Example showing MSensing is not Sybil-proof in MM case

72 We first analyze the vulnerabilities of critical value-based mechanisms. The mechanism in [72] executes a reverse auction round by round. In one round of auction, the mechanism

sorts all active users in a non-decreasing order by their bids. The first mt users will be selected as winners, where mt is the number of tasks at time slot t. At last, the payment to each winner i is set to its critical value. Let ti denote the time slot i wins. Winner i’s critical

value is the largest of all the mt-th users’ bids at each time slot t ∈ [ti,di]. It is obvious that

at any time slot t ∈ [ti,di] in which the mt-th user does not exist, i can submit a higher bid

using a fictitious identity to take the mt-th place. Therefore, i can increase its critical value and thus increase its payment through Sybil attack. Next, we analyze the vulnerabilities of threshold-based mechanisms. Within this category, we further divide the mechanisms into two groups. The first group comprises mechanisms

n in [45, 73]. These two mechanisms are two-stage mechanisms in which the first arrived ⌊ e ⌋ users are rejected and their bids are used as the sample for the next stage, where n is the number of participating users and e is the base of the natural logarithm. In the first stage,

the largest vi/bi value will be used as a threshold for the user selection in the next stage,

where vi is user i’s marginal value and bi is its bid. In the second stage, the first user i whose

vi/bi value is no less than the threshold will be selected as a winner. We use the example in Table 5.1. to show that these mechanisms are not Sybil-proof. In this example, the value

of τ1 is 1 and the value of τ2 is 3. The mechanism will reject the first user (User1) since

5 n = 5 and ⌊ e ⌋ = 1. Next, assume user 1 conducts the Sybil attack by submitting two bids ′ ′′ βi′ = (1, 1, {τ1}, 2) and βi′′ = (2, 2, {τ2}, 2) under fictitious identities 1 and 1 , respectively.

′ 6 In this case, the first two users (User 1 and User 2) are rejected since n = 6 and ⌊ e ⌋ = 2, ′′ and the threshold is 0.5. User i is the third arrived user whose vi′′ /bi′′ = 1.5 > 0.5, and thus it is selected as a winner. Because these mechanisms satisfy individual rationality, user i′′ has a non-negative utility. Therefore, these mechanisms are not Sybil-proof, since a user can increase its utility by changing from a loser to a winner through Sybil attack.

73 The second group comprises of an improved two-stage mechanism [54] and a multi-stage mechanism called OMG [70, 112]. Given the deadline T and budget B, these two mechanisms will set the cutoff time of the first stage by T and T , respectively, and allocate the 2⌊ln T ⌋ 2⌊log2 T ⌋ first stage a stage-budget B and B , respectively. We take the OMG as an example 2⌊ln T ⌋ 2⌊log2 T ⌋ and show that it is still not Sybil-proof using the example in Table 5.1. In the first stage,

OMG selects users iteratively according to users’ marginal density, vi/bi, where vi is user i’s marginal value, and bi is its bid. In each iteration, the user with the largest marginal value will be selected, and if its marginal density is not less than a preset density threshold ρ∗ and its bid does not exceed the stage-budget, it will be selected as a winner. In this example, density threshold ρ∗ = 0.5, T = 8 and B = 16, and thus the first stage ends at time slot

1 and the stage-budget in the first stage is 2. The value of τ1 is 1 and the value of τ2 is 2. In this case, user 1 will not be selected since its bid is greater than the stage-budget.

Next, assume user 1 conducts the Sybil attack by submitting two bids βi′ = (1, 2, {τ1}, 2)

′ ′′ and βi′′ = (2, 2, {τ2}, 2) under fictitious identities 1 and 1 , respectively. In this case, user 1′ will win with non-negative utility since OMG is individually rational. Therefore, OMG is not Sybil-proof, since a user can increase its utility by changing from a loser to a winner via Sybil attack.

Table 5.1: Example showing vulnerabilities to Sybil attack

User Bid (ai, di, Γi, bi)

1 (1,2, {τ1,τ2}, 4) 2 (2,2, {τ1}, 2) 3 (3,4, {τ1}, 3) 4 (3,3, {τ2}, 4) 5 (3,4, {τ1,τ2}, 4)

5.2.4 Desired Properties and Objective

In this work, we consider the following important properties:

74 • Computational Efficiency: A mechanism is computationally efficient if it terminates in polynomial time.

• Individual Rationality: A mechanism is individually rational if each user has a non- negative utility when bidding its true task-cost pair.

• Truthfulness: A mechanism is truthful if any user’s utility is maximized when bidding its true task-cost pair.

• Sybil-Proofness: A mechanism is Sybil-proof if any user’s utility is maximized when bidding its true task-cost pair using a single identity.

In this work, we aim to design Sybil-proof incentive mechanisms (SPIM) for both offline and online scenarios, which also achieve computational efficiency, individual rationality and truthfulness.

5.3 Our Approach

In this section, we first describe two offline Sybil-proof auction-based incentive mech- anisms SPIM-S and SPIM-M for SM and MM case, respectively. Then we describe two online Sybil-proof auction-based incentive mechanisms SOS and SOM for SM and MM case, respectively.

5.3.1 Design of SPIM-S and SPIM-M

In SM case, a user could maximize its utility by splitting its task set into multiple subsets and submitting these subsets using multiple identities in the hope that all the identities will be selected as winners. In order to design Sybil-proof mechanisms, we provide a sufficient condition for a mechanism to be Sybil-proof in the following lemma.

Lemma 8. A mechanism is Sybil-proof if it satisfies the following two conditions:

1. If any user i pretends two identities i′ and i′′, and both i′ and i′′ are selected as winners, then i should be selected as a winner while using only one identity;

75 2. If any user i pretends two identities i′ and i′′, the payment to i should not be less than the summation of the payments to i′ and i′′.

The truthfulness of SPIM-S relies on Myerson’s well-known characterization [113, 114].

Theorem 10. [113, 114] An auction mechanism is truthful iff:

• The selection rule is monotone: If user i wins the auction by bidding bi, it also wins

′ by bidding bi ≤ bi;

• Each winner is paid the critical value, which is the smallest value such that user i would lose the auction if it bids higher than this value.

In order to satisfy the first condition in Lemma 8, SPIM-S should select i before i′ and i′′ under the same selection criterion. To guarantee this, SPIM-S groups users by the size of their task sets and starts from the group with the largest task set size. Because the task set size of neither i′ nor i′′ is greater than that of i, i will be selected before both i′ and i′′. Following Theorem 10, in each group, SPIM-S selects users iteratively according to the value of a criterion function, which is non-decreasing in terms of users’ bids. Besides, the payment to each winner is its critical value, which also guarantees the second condition of Lemma 8. Now, we describe the details of SPIM-S, which is illustrated in Algorithm 5. At first, SPIM-S groups all users by their task set size and sorts all groups in decreasing order. SPIM-S starts from the group with the largest task set size. Within each group, SPIM-S calls WPG, as shown in Algorithm 6, to select winners and calculate their payments. SPIM-S repeatedly calls WPG until all tasks are assigned or all groups are processed. As a fundamental part of SPIM-S, WPG consists of two phases: winner selection and payment determination. The inputs to WPG are a set R of sensing tasks to be assigned, a −→ group Gk of users, and a submitted task-bid profile βk by users in Gk. The output is a tuple −→ −→ consisting of an assignment profile Ak, a payment profile pk , and a set Tk of assigned tasks of users in Gk. In the winner selection phase, WPG selects winners iteratively. Given the

76 Algorithm 5: SPIM-S −→ Input: Sensing task set T , user set U, bid profile β , criterion function QS and payment function P . S −→ Output: Assignment profile A and payment profile −→p . 1 R←T , pi ← 0, Ai ← ∅, ∀i ∈U; 2 Group users by the task set size, and sort these groups in decreasing order G1, G2,..., Gl; 3 k ← 1; 4 while R= 6 ∅ and k ≤ l do −→ −→ −→ 5 (Ak, pk , Tk) ← WPG(R, Gk, βk); 6 R←R\Tk; 7 k ← k + 1; 8 end −→ 9 return A and −→p .

set R of unassigned tasks, let vi(R)= V (R∩ Γi) denote the marginal value of user i to the platform. Let [i] denote the winner selected in the i-th iteration such that the value of the criterion function QS(v[i](R[i]),b[i]) is the minimum over Gk \S[i], where

{[1], [2],..., [i − 1]}, i ≥ 2; S[i] = (5.8) (∅, i =1, and R = R\ Γ . Note that we prefer to select users with higher marginal values [i] j∈S[i] j but lower bids, andS thus the criterion function QS : ❘≥0 × ❘≥0 → ❘ could be any function that satisfies the following properties:

• QS(x,y) is non-increasing with respect to x;

• QS(x,y) is non-decreasing with respect to y.

This implies that

QS(v[1],b[1]) ≤ QS(v[2],b[2]) ≤ QS(v[3],b[3]) ≤··· (5.9)

For notational simplicity, we use v[i] instead of v[i](R[i]).

In the payment determination phase, WPG computes the payment pi to each winner i, i.e., Ai =6 ∅. It processes the users in Gk \{i} similarly to how it selects users in the winner selection phase. In the j-th iteration, let ij denote the selected user. WPG uses a payment

77 Algorithm 6: Winner Selection and Payment Determination in a Group (WPG) −→ Input: Sensing task set R, user group Gk, bid profile βk, criterion function QS and payment function PS. −→ −→ Output: Assignment profile Ak, payment profile pk and a set of assigned tasks Tk. // Winner Selection Phase 1 ′ ′ R ←R, Tk ← ∅, Gk ←Gk; ′ 2 i ← arg min ′ Q (v (R ),b ); j∈Gk S j j 3 ′ while bi ≤ vi and Gk =6 ∅ do ′ ′ 4 Ai ← Γi, Tk ←Tk ∪ Γi, R ←R \ Γi; 5 ′ ′ Gk ←Gk \{i}; ′ 6 i ← arg min ′ Q (v (R ),b ); j∈Gk S j j 7 end // Payment Determination Phase 8 foreach i ∈Gk s.t. Ai =6 ∅ do 9 ′ ′ Gk ←Gk \{i}, R ←R; ′ 10 i ← arg min ′ Q (v (R ),b ); j j∈Gk S j j 11 ′ ′ while Gk =6 ∅ and R =6 ∅ do 12 if bij >vij then break; 13 ′ ′ ′ pi ← max{pi, min{PS vi(R ),QS(vij(R ),bij) ,vi(R )}}; 14 ′ ′ ′ ′ R ←R \ Γij , Gk ←Gk \{ij}; ′  15 i ← arg min ′ Q (v (R ),b ); j j∈Gk S j j 16 end ′ ′ 17 if bi ≤ vi(R ) then pi ← max{pi,vi(R )}; 18 end −→ −→ 19 return (Ak, pk , Tk).

function PS : ❘≥0 ×❘ → ❘≥0 to compute the maximum bid, with which user i can be selected as a winner instead of ij. The payment function PS could be any function that satisfies the following properties:

• PS(x,y) is non-decreasing with respect to y;

• ∀x,y ∈ ❘≥0, ∀z ∈ ❘, if QS(x,y)= z,PS(x,z)= y.

Note that given a value of criterion function QS, the higher the maximum bid is, the higher the payment should be. This process repeats until no user’s submitted cost is less than its

marginal value or no user is left in Gk \{i}. Let vij = V (Rij ∩ Γij ) denote the marginal value of the j-th winner to the platform in the payment determination phase, where Rij is

78 a set of unassigned tasks before ij is selected. Let K denote the number of iterations of the while-loop in the payment determination phase of WPG. Therefore, we have K values, with which user i could have been selected as a winner in one of the K iterations if i was considered. We set the payment pi to the maximum of these K values and the marginal value of i after these K iterations. Next, we design SPIM-M for MM case. In MM case, a user is willing to perform any subset of its task set, and tries to maximize its utility by submitting multiple task-cost pairs using fictitious identities. We design SPIM-M based on the characterization of Sybil-proof mechanisms in [65]. SPIM-M guarantees that the utility of a user is not less than its utility while using multiple identities. To achieve this, SPIM-M first calculates the payments to each user for any subset of its task set. The payment is determined independently of its own cost function. At last, SPIM-M assigns each user a subset of its task set that maximizes its utility independently of the assignments to other users. SPIM-M is illustrated in Algorithm 7.

Algorithm 7: SPIM-M −→ Input: Sensing task set T , user set U, bid profile β , criterion function QM and payment function P . M−→ Output: Assignment profile A and payment profile −→p . 1 foreach i ∈U do 2 Calculate the payment to i for any bundle B ⊆ Γi, ′ ′ p ← P V (B), max ′ ′ Q (V (B ),c (B )) ; 3 end i,B M j6=i,B ⊆Γj ,B ∩B6=∅ M j 4 foreach i ∈U do  5 Ai ← arg maxB⊆Γi (pi,B − ci(B)); 6 pi ← pi,Ai ; 7 end −→ 8 return A and −→p .

At first, SPIM-M uses a payment function PM (x,y) = x − max{0,y} to calculate the payment pi,B to user i for any bundle B ⊆ Γi. PM is based on the value of the criterion

function QM (V (B),cj(B)), where QM (x,y) = x − y, V (B) is the value of bundle B to the platform, and cj(B) is the cost of bundle B to any user j ∈ U \ {i}. Note that, user i will

79 only be assigned a bundle B ⊆ Γi, since ci({tj}) = ∞ for any task tj ∈ T \ Γi. Note that the payment to user i for any bundle B ⊆ Γi is independent of its cost function ci(·), i.e.,

′ ′ pi,B = V (B)−max{0, max (V (B )−cj(B ))}. (5.10) ′ ′ j6=i,B ⊆Γj ,B ∩B6=∅

At last, SPIM-M will assign each user i a set Ai of tasks, which is a bundle B ⊆ Γi maximizing its utility based on the calculated payment, i.e.,

Ai = argmax(pi,B − ci(B)). (5.11) B⊆Γi

SPIM-M gives each user i a payment pi = pi,Ai . Note that pi = pi,Ai = 0, if Ai = ∅.

5.3.2 Design of SOS and SOM

Next, we describe the design of SOS, a Sybil-proof online incentive mechanism for SM case. In SM case, a user could maximize its utility by submitting multiple subsets of its task set using multiple identities with different active time window in the hope that all the identities will be selected as winners. In order to design Sybil-proof mechanisms, we provide a sufficient condition for an online mechanism to be Sybil-proof in the following theorem.

Theorem 11. An online mechanism is Sybil-proof if it satisfies the following two conditions: If any user i pretends two identities i′ and i′′, and both i′ and i′′ are selected as winners with

assignment Ai′ and Ai′′ , respectively within [˜ai, d˜i], then

1. i should be selected as a winner with assignment Ai = Ai′ ∪Ai′′ within [˜ai, d˜i] while using only one identity;

2. pi ≥ pi′ + pi′′ .

The following theorem will be used to guarantee the truthfulness of SOS.

Theorem 12. [72] An online mechanism is truthful iff:

• The winner selection rule is monotone: If user i wins the auction by bidding βi =

′ ′ ′ ′ ′ ′ ′ (ai,di, Γi,bi), it also wins by bidding βi = (ai,di, Γi,bi), where ai ≤ ai,di ≥ di, Γi ⊆

′ ′ Γi,bi ≤ bi;

80 • Each winner is paid the critical value, which is the smallest value such that user i would lose the auction if it bids higher than this value.

In order to guarantee Sybil-proofness and truthfulness, SOS should satisfy both Theo- rem 11 and Theorem 12. Next, we describe the details of SOS, which is comprised of two subroutines: winner selection with assignment and payment determination. The winner selection with assignment is illustrated in Algorithm 8. It selects winners iteratively at each time slot until all tasks are assigned or deadline T is reached. Let Rt

t t denote the set of currently unassigned tasks and vi = V (R ∩ Γi) denote the marginal value of user i to the platform at time slot t. At each time slot t, Algorithm 8 selects the user

t t with the largest criterion value, vi − bi, from all active users in U . If its criterion value is non-negative, this user will be put into winner set W and assigned the tasks it submits. Otherwise, it will not be assigned tasks. All active users’ task assignments constitute the assignment profile At of time slot t. The assignment profile of every time slot constitutes the overall assignment profile A. The outcome of Algorithm 8 are A and W.

Algorithm 8: SOS-WSA(T , T ) 1 W ← ∅, At ← ∅, ∀t ∈ [1,T ], t ← 1, Rt ←T ; 2 while Rt 6= ∅ and t ≤ T do 3 U t ← the set of active users at time slot t; t t 4 Aj ← ∅, ∀j ∈U ; t 5 i ← arg maxj∈U t (vj − bj); if t then 6 bi ≤ vi t t+1 t 7 W ← W ∪ {i}, Ai ← Γi, R ←R \ Γi; 8 end 9 t ← t + 1; 10 end 1 11 A ← (A ,... AT ); 12 return (A, W).

The payment determination is illustrated in Algorithm 9. The input are user i’s ID, the time slot t[i] in which i is assigned tasks, and the set Rt[i] of unassigned tasks at the

beginning of time slot t[i]. For each time slot in [t[i],di], Algorithm 9 calculates the highest

81 price i can bid in order to be a winner. At last, the payment to user i is set to the highest price among these prices. Note that

t t pi =arg max {vi − c }, (5.12) t∈[t[i],di]

t t where c = max{0,vij − bij }, and ij is the user with the largest criterion value at time slot t when i is not in U t. The main algorithm of SOS is illustrated in Algorithm 10. Note that, there is at most one winner at each time slot according to Algorithm 8. Therefore, SOS iterates all time slots and calculates the payment for the winner at each time slot using Algorithm 9.

Algorithm 9: SOS-PD(i, t[i], Rt[i]) t[i] 1 t ← t[i],pi ← 0; t 2 while t ≤ di and R =6 ∅ do 3 U t ← the set of active users at time slot t; 4 U t ←U t \{i}; 5 t ij ← arg maxj∈U t (vj − bj); 6 if t then bij ≤ vij t[i] t[i] 7 t t pi ← max{pi ,vi − (vij − bij )}; t+1 t 8 R ←R \ Γi; 9 else t[i] t[i] 10 t pi ← max{pi ,vi }; 11 end 12 t ← t + 1; 13 end t[i] 14 return pi .

Algorithm 10: SOS(T , T ) 1 t ← 1 , Rt ←T ; 2 (A, W) ← SPIM-S-WSA(T , T ); 3 while Rt =6 ∅ and t ≤ T do 4 t i ← j ∈W s.t. Aj =6 ∅ ; 5 t t pi ← SPIM-S-PD(i, t, R ); 6 t+1 t t R ←R \Ai; 7 t ← t + 1; 8 end 9 return (A, p).

82 Next, we design SOM, a Sybil-proof online incentive mechanism for MM case. In MM case, a user is willing to perform any subset of its task set and tries to maximize its utility by submitting multiple bids under fictitious identities. To guarantee that each user submits its true cost function, SOM gives the payment to each user, which is independent of its own cost function. The time-truthfulness is based on the monotonic task assignment rule and the submodularity of users’ marginal value. To achieve Sybil-proofness, we extend the characterization of Sybil-proof mechanisms in [65] to the online scenario. The main algorithm of SOM is shown in Algorithm 11. It selects users iteratively at each time slot until all tasks are assigned or deadline T is reached. Given sensing tasks T and deadline T , SOM outputs the overall assignment profile A and the overall payment profile p.

t ˜ Let Γi ⊆ Γi denote a set of unassigned tasks that i can perform at time slot t. Let ❇t ′ ′ t ′ i(B) = {B |B ⊆ Γi, B ∩ B 6= ∅}. At each time slot t, SOM first calculates the payment t t pi,B to each active user i for any bundle B ⊆ Γi. Note that the payment to user i for any

t bundle B ⊆ Γi is independent of its cost function ci(·), i.e.,

t ′ ′ pi,B = V (B)−max{0, max (V (B )−cj(B ))}. (5.13) ′ ❇t j6=i,B ∈ j (B) t t At last, SOM will assign each active user i a set of tasks Ai, which is a bundle B ⊆ Γi maximizing its utility based on the calculated payment, i.e.,

t t A = argmax(p − ci(B)). (5.14) i t i,B B⊆Γi t t t t t The payment pi to each user i at time slot t for assignment Ai is p t . Note that pi = p t = i,Ai i,Ai t 0, if Ai = ∅.

5.4 Analysis

In this section, we first analyze the properties of SPIM-S and SPIM-S. Then, we analyze the properties of SOM and SOS.

83 Algorithm 11: SOM(T , T ) 1 F ← ∅, At ← ∅, ∀t ∈ [1,T ], t ← 1; 2 while F 6= T and t ≤ T do 3 U t ← the set of active users at time slot t; 4 t t Γi ← Γi \ F, ∀i ∈U ; 5 foreach i ∈U t do 6 t Calculate the payment to i for any bundle B ⊆ Γi, t ′ ′ p ← V (B) − max{0, max ′ ❇t (V (B ) − c (B ))}; i,B j6=i,B ∈ j (B) j 7 end 8 foreach i ∈U t do t t 9 A ← arg max t (p − c (B)); i B⊆Γi i,B i 10 t t pi ← p t ; i,Ai 11 t F←F∪Ai; 12 end 13 t ← t + 1; 14 end 15 A = A1,... AT ); 16 p =(p1,...,pT ); 17 return A and p.

5.4.1 Analysis of SPIM-S and SPIM-M

In this section, we analyze the properties of SPIM-S.

Theorem 13. SPIM-S is computationally efficient, individually rational, truthful and Sybil-proof in SM case.

We prove this theorem with the following lemmas.

Lemma 9. SPIM-S is computationally efficient.

Proof. The running time of SPIM-S is dominated by the while-loop (Lines 4-8). Because there are m tasks and each winner should contribute at least one new task to be selected, the number of winners is at most m. Thus the while-loop will run at most m iterations. The running time of WPG is dominated by the for-loop (Lines 8-18), which is bounded by bounded by O(nm3). Because finding the user with minimum criterion value takes O(nm2) time and the number of winners is at most m. Therefore, the total computational complexity of SPIM-S is bounded by O(nm4), since Algorithm 5 will call WPG at most m times.

84 Note that, the running time of SPIM-S is only linear in the number of users n. In crowdsensing systems, n is usually very large, whereas the number of sensing tasks m is much less than n. Thus SPIM-S is efficient.

Lemma 10. SPIM-S is individually rational.

Proof. Let [i] denote the winner selected in the i-th iteration in the winner determination phase of WPG. If [i] is the last winner in Gk, there is no winner selected in the i-th iter- ation in the payment determination phase. According to Line 17 in Algorithm 6, we have b[i] ≤ v[i] ≤ p[i]. Otherwise, let [i]i denote the winner selected in the i-th iteration when processing users in Gk \{i}. Since user [i]i would not be selected in the i-th iteration if [i] is considered, according to Equation (5.9), we have Q (v[i] ,b ) ≤ Q (v ,b ). Thus we have S [i]i [i] S [i]i [i]i b = P (v[i] ,Q (v[i] ,b )) ≤ P (v[i] ,Q (v ,b )), where the equality is due to the second [i] S [i]i S [i]i [i] S [i]i S [i]i [i]i property of function PS, and the inequality is due to the first property of function PS. We also have b ≤ v = v[i] , since user [i] is the winner selected in the i-th iteration. It follows [i] [i] [i]i that b ≤ min{P (v[i] ,Q (v ,b )),v[i] } ≤ p , where the second inequality is because [i] S [i]i S [i]i [i]i [i]i [i] of Line 13 in Algorithm 6. Therefore, u[i] = p[i] − b[i] ≥ 0, and SPIM-S is individually rational.

Lemma 11. SPIM-S is truthful.

Proof. We first prove that user i cannot increase its utility by submitting a false task set. Then, we prove that user i cannot increase its utility by submitting a false cost. We assume that user i submits a false bid β˜i = (Γ˜i, ˜bi). If user i submits a false task set Γ˜i ⊂ Γi, the

utility of i is 0 according to Equation (5.1). On the contrary, if Γi ⊂ Γ˜i, user i will not be paid because it cannot finish all the tasks in Γ˜i. Therefore, there is no incentive for i to submit a false task set Γ˜i. For the truthfulness of the submitted cost, by Theorem 10, it suffices to prove that the selection rule of SPIM-S is monotone and the payment to each winner is its critical value. It is obvious that the selection rule is monotone, since according to the criterion function

85 QS, the criterion value of a user will not increase if it bids a smaller value. Next, we prove that the payment pi to winner i is its critical value. Note that

i pi = max max PS vi ,QS(vij ,bij ) ,vi(RiK+1 ) , (5.15) 1≤j≤K j      where RiK+1 is a set of unassigned tasks after K iterations. If user i bids bi > pi, we have

i i i bi >PS vij ,QS(vij ,bij ) , which implies QS(vij ,bij ) vi(RiK+1 ), thus i will still not be selected after K

iterations. Therefore, pi is the critical value for user i. Since no user can increase its utility by submitting a false task-cost pair, SPIM-S is truthful.

Lemma 12. SPIM-S is Sybil-proof.

Proof. To prove SPIM-S is Sybil-proof, we show that SPIM-S satisfies the sufficient condi- tions in Lemma 8. Suppose user i submits (Γi′ ,bi′ ) and (Γi′′ ,bi′′ ) using two fictitious identities i′ and i′′, respectively. We first prove that SPIM-S satisfies the fist condition in Lemma 8. The following discussion is focused on the winner selection phase of WPG. We assume that both i′ and

′′ i are winners. This implies that Γi′ ⊂ Γi and Γi′′ ⊂ Γi, since one will make the other lose otherwise. Thus, both i′ and i′′ will be in the group(s) after i’s group. Let R˜ ′ and R˜ ′′ denote the set of unassigned tasks before i′ and i′′ are selected, respectively. According to SPIM-S,

′ ′′ ′ ′′ we have vi′ (R˜ ) ≥ bi′ and vi′′ (R˜ ) ≥ bi′′ , since both i and i are winners. In addition, due to

the truthfulness of SPIM-S and the fourth property of the cost function, we have bi = ci ≤

ci′ + ci′′ = bi′ + bi′′ . Because Γi = Γi′ ∪ Γi′′ , i will be considered before the group(s), which i′ and i′′ should belong to. Let R˜ denote the set of unassigned tasks when i is considered.

′ ′′ ′ ′′ Therefore we have R˜ ∪ R˜ ⊆ R˜. It follows that vi′ (R˜ ) ≤ vi′ (R˜) and vi′′ (R˜ ) ≤ vi′′ (R˜),

due to the decreasing property of vi(R). Meanwhile, we have vi′ (R˜)+ vi′′ (R˜) ≤ vi(R˜), since

′ ′′ Γi = Γi′ ∪ Γi′′ . Therefore, vi(R˜) ≥ vi′ (R˜ )+ vi′′ (R˜ ) ≥ bi′ + bi′′ ≥ bi. This implies that i is still a winner while using a single identity. Thus the first condition in Lemma 8 is satisfied.

86 We next prove that SPIM-S satisfies the second condition in Lemma 8. The following discussion is focused on the payment determination phase of WPG. Since i is still a winner

while using one identity, let R˜ i denote the set of unassigned tasks before i is selected. Let

′ ′′ R˜ i′ and R˜ i′′ denote the set of unassigned tasks before i and i are selected, respectively. In addition, there is no user in i’s group can make i lose, i.e., bi > vi(R˜ i), otherwise neither

′ ′′ i nor i can be a winner. Therefore, the payment to i is at least vi(RiK+1 ) according to Equation (5.15). Recall that K is the number of iterations of the while-loop in the

payment determination phase of WPG. In any iteration r ≤ K, we have bir ≤ vir . Due to the properties of functions PS and QS and the decreasing property of vi(R), we have i i i ˜ ˜ min{PS(vir ,QS(vir ,bir )),vir } ≤ vir ≤ vi(Ri). After K iterations, vi(RiK+1 ) ≤ vi(Ri), due to the decreasing property of vi(R). Thus it follows that pi ≤ vi(R˜ i). Similarly, we have

′ ′ ˜ ′ ′′ ′′ ˜ ′′ pi ≤ vi (Ri ) and pi ≤ vi (Ri ). Let RiK+1 denote the set of unassigned tasks after K

′ ′′ ˜ ′ ˜ ′′ ′ ˜ ′ ′′ ˜ ′′ iterations. Since Γi =Γi ∪ Γi , Ri ⊆RiK+1 and Ri ⊆RiK+1 , we have vi (Ri )+ vi (Ri ) ≤

′ ′′ ′ ˜ ′ ′′ ˜ ′′ vi (RiK+1 )+ vi (RiK+1 ) ≤ vi(RiK+1 ). Thus, we have pi ≥ vi(RiK+1 ) ≥ vi (Ri )+ vi (Ri ) ≥ pi′ + pi′′ . Hence, the second condition in Lemma 8 is satisfied. Therefore, SPIM-S is Sybil-proof according to Lemma 8. We can use a similar proof for the case in which a user pretends more than two identities.

Next, we analyze the properties of SPIM-M.

Theorem 14. SPIM-M is individually rational, truthful and Sybil-proof in MM case.

We prove this theorem with the following lemmas.

Lemma 13. SPIM-M is individually rational.

Proof. According to Equation (5.11), any user i is assigned a bundle Ai, which is a subset of

Γi maximizing i’s utility. Since pi,∅ = 0 and ci(∅) = 0, the utility of any user i is non-negative according to Equation (5.2). Thus SPIM-M is individually rational.

Lemma 14. SPIM-M is truthful.

87 Proof. We first prove that user i cannot increase its utility by submitting a false task set. Then, we prove that user i cannot increase its utility by submitting a false cost function.

We assume that user i submits a false bid β˜i = (Γ˜i, c˜i(·)). According to Equation (5.10),

the payment to i for any bundle B ⊆ Γi is calculated independently of i’s cost function.

If Γ˜i ⊂ Γi, the payment to i for any subset of Γ˜i is the same as that when i submits

Γi. Meanwhile, according to Equation (5.11), SPIM-M will assign i a bundle Ai, which

maximizes i’s utility. Therefore, user i cannot increase its utility by submitting Γ˜i ⊂ Γi. On

the contrary, if Γi ⊂ Γ˜i, user i cannot finish the assigned tasks in Ai \ Γi =6 ∅, because of the

second property of the cost function. It follows that pi = 0 in this case. Hence there is no

incentive for i to submit a false task set Γ˜i.

Furthermore, the false cost functionc ˜i(·) can only affect the result of Equation (5.11).

Let A˜i and Ai denote the bundles assigned to i when i submits the false cost functionc ˜i(·)

and the true cost function ci(·), respectively. It is obvious that the utility of i will not

change if A˜i = Ai. On the contrary, if A˜i =6 Ai, according to i’s true cost function, we have ˜ ˜ pi,A˜i − ci(Ai) ≤ pi,Ai − ci(Ai). This is due to the fact that both Ai and Ai are the subset of

Γi, and Ai is the bundle maximizing i’s utility. Therefore, user i cannot increase its utility

by submitting a false cost functionc ˜i(·). Thus SPIM-M is truthful.

Lemma 15. SPIM-M is Sybil-proof.

Proof. We assume that user i pretends two identities i′ and i′′, and both i′ and i′′ are

′ ′ ′ ′′ ′ ′ ′ ′ assigned Ai and Ai , respectively. Let mi denote maxj6=i ,B ⊆Γj ,B ∩Ai′ 6=∅(V (B ) − cj(B )), ′′ ′′ ′′ ′′ ′′ ′′ and mi denote maxj6=i ,B ⊆Γj ,B ∩Ai′′ 6=∅(V (B ) − cj(B )). According to Equation (5.10), the payments to i′ and i′′ are

′ ′ ′ pi ,Ai′ = V (Ai ) − max {0,mi } , (5.16)

′′ ′′ ′′ pi ,Ai′′ = V (Ai ) − max {0,mi } . (5.17)

88 If i is assigned a set Ai = Ai′ ∪Ai′′ of tasks while using a single identity, the payment to i is

pi,Ai = V (Ai) − max 0, max (V (B) − cj(B)) . (5.18) j6=i,B⊆Γ ,B∩A 6=∅  j i  Since Ai = Ai′ ∪Ai′′ , we know that

′ ′ ′ ′ {B|j =6 i, B ⊆ Γj, B∩Ai =6 ∅} = {B |j =6 i , B ⊆ Γj, B ∩Ai′ =6 ∅} (5.19) ′′ ′′ ′′ ′′ ∪ {B |j =6 i , B ⊆ Γj, B ∩Ai′′ =6 ∅}. (5.20)

′ ′′ Let mi denote maxj6=i,B⊆Γj ,B∩Ai6=∅(V (B) − cj(B)). Thus we have mi = max{mi ,mi } and mi ≤ mi′ + mi′′ . In addition, we can prove that Ai′ ∩Ai′′ = ∅ by contradiction. Assume

′ Ai′ ∩Ai′′ =6 ∅, the payments to i is

′ ′ ′ ′ ′′ ′′ pi ,Ai′ = V (Ai ) − max{0,mi }≤ V (Ai ) − (V (Ai )−ci(Ai )), (5.21)

′ ′ ′ where the inequality is due to the fact V (Ai′′ )−ci(Ai′′ ) ≤ mi′ , since Ai′′ is in {B |j =6 i , B ⊆

′ ′ ′′ ′′ ′ ′ Γj, B ∩Ai =6 ∅}. Similarly, we have pi ,Ai′′ ≤ V (Ai ) − (V (Ai ) − ci(Ai )). Therefore the summation of the utilities of i′ and i′′ is

′ ′′ ′ ′ ′′ ′′ ui + ui = pi ,Ai′ − ci(Ai )+ pi ,Ai′′ − ci(Ai ) (5.22) ≤ V (Ai′ ) − (V (Ai′′ ) − ci(Ai′′ )) − ci(Ai′ ) (5.23)

+ V (Ai′′ ) − (V (Ai′ ) − ci(Ai′ )) − ci(Ai′′ )=0. (5.24)

Since SPIM-M is individually rational, it follows that Ai′ = Ai′′ = ∅, which contradicts the assumption. Thus Ai′ ∩Ai′′ = ∅. We also have V (Ai) = V (Ai′ ∪Ai′′ ) = V (Ai′ )+ V (Ai′′ ), since Ai = Ai′ ∪Ai′′ . According to Equation (5.10), the payment to i is

pi,Ai = V (Ai) − max{0,mi} (5.25)

≥ V (Ai) − (mi′ + mi′′ ) (5.26)

= V (Ai′ ∪Ai′′ ) − (mi′ + mi′′ ) (5.27)

= V (Ai′ ) − mi′ + V (Ai′′ ) − mi′′ (5.28)

′ ′′ ≥ pi ,Ai′ + pi ,Ai′′ . (5.29)

89 This implies that if a user pretends two identities and is assigned tasks separately, its payment

will not increase. Due to the fourth property of the cost function, we have ci(Ai) ≤ ci(Ai′ )+ ci(Ai′′ ), since Ai = Ai′ ∪Ai′′ . Therefore, according to Equation (5.2) and Equation (5.5), the utility of i when using two identities is not greater than that obtained by using a single identity. Thus SPIM-M is Sybil-proof.

Remarks: From the proof in Lemma 8, we can also prove Ai ∩Aj = ∅ for any i, j ∈

U, i =6 j. In addition, the time complexity is O(n · 2|Γmax|), where n is the number of users, and |Γmax| is the largest user task set size, since SPIM-M will calculate the payments to each user for every subset of its task set. To be shown in Section 5.5.2, the execution time of SPIM-M is in the magnitude of second in our simulation. This is because the number of tasks each user can perform is limited by some constraints, e.g., travel budget [32], and thus very small.

5.4.2 Analysis of SOS and SOM

In this section, we prove the properties of SOS in the following theorem.

Theorem 15. SOS is computationally efficient, individually rational, truthful and Sybil- proof in SM case.

We prove this theorem with the following lemmas.

Lemma 16. SOS is computationally efficient.

The proof is similar to that for Lemma 9, and thus omitted.

Lemma 17. SOS is individually rational.

Proof. For any winner i, assume it is selected at time slot t[i] with its true bid, i.e., bi =˜c(Γ˜i).

t[i] t[i] t[i] If there exists a winner j at time slot t[i] when i is not in U , we have vi −bi ≥ vj −bj ≥ 0 since i was the winner at time slot t[i]. Then according to Line 7 in Algorithm 9, we have

90 t[i] t[i] t[i] t[i] pi ≥ vi − (vj − bj) ≥ bi. If there is no winner at time slot t[i] when i is not in U , we t[i] ˜ have pi ≥ vi ≥ bi according to Line 10. Therefore, ui = pi − c˜(Γi)= pi − bi ≥ 0, and SOS is individually rational.

Lemma 18. SOS is truthful.

Proof. We first prove that user i cannot increase its utility by submitting a false task set. We then prove that user i cannot increase its utility by submitting a false active time window

or a false cost. If user i submits a false task set Γi ⊂ Γ˜i, the utility of i is 0 according to

Equation (5.1). On the contrary, if Γi \ Γ˜i =6 ∅, user i will not be paid since it cannot finish

all the tasks in Γi. Thus, there is no incentive for i to submit a false task set. To prove that user i cannot increase its utility by submitting a false active time window or a false cost, it suffices to prove that the selection rule of SOS is monotone and the payment to each winner is its critical value according to Theorem 10. Obviously, the criterion value of a user will increase with the decrease of user’s cost. Meanwhile, due to the submodularity of user’s marginal value, the criterion value of a user at each time slot will not decrease if it bids a wider active time window. Therefore, the selection rule of SOS is monotone. Next,

we prove that the payment pi to winner i is its critical value. Assume i was selected at time t[i] ˜ slot t[i] with bi, and thus pi = pi ≥ bi. If user i bids bi > pi, it is obvious that i still loses

at any time slot t ∈ [ai,t[i]) since its criterion value is less than that when i bids bi but loses

in [ai,t[i]). At any time slot t ∈ [t[i],di], i still loses since there always exists a user ij such t ˜ t ˜ that vi − bi < vij − bij according to Equation (5.12). If user i bids bi < pi, i wins at least

within [t[i],di], according to Equation (5.12), if not earlier. Therefore, pi is the critical value for user i.

Lemma 19. SOS is Sybil-proof.

Proof. We prove that SOS is Sybil-proof by proving that it satisfies the sufficient conditions in Theorem 11. Assume user $i$ submits $(a_{i'}, d_{i'}, \Gamma_{i'}, b_{i'})$ and $(a_{i''}, d_{i''}, \Gamma_{i''}, b_{i''})$ using two fictitious identities $i'$ and $i''$, respectively, where $\Gamma_{i'} \cup \Gamma_{i''} = \tilde{\Gamma}_i$.

We first prove that SOS satisfies the first condition in Theorem 11. We assume that both $i'$ and $i''$ are selected as winners, at times $t[i']$ and $t[i'']$ (w.l.o.g., $t[i'] \leq t[i'']$). This implies that $\Gamma_{i'} \subset \tilde{\Gamma}_i$ and $\Gamma_{i''} \subset \tilde{\Gamma}_i$, since otherwise one identity would make the other lose. In addition, we have $v_{i'}^{t[i']} \geq b_{i'}$ and $v_{i''}^{t[i'']} \geq b_{i''}$, since both $i'$ and $i''$ are winners. Suppose user $i$ uses a single identity and submits its true bid $(\tilde{a}_i, \tilde{d}_i, \tilde{\Gamma}_i, \tilde{b}_i)$. At time slot $t[i']$, we have $v_i^{t[i']} - \tilde{b}_i \geq v_{i'}^{t[i']} + v_{i''}^{t[i'']} - \tilde{b}_i \geq v_{i'}^{t[i']} + v_{i''}^{t[i'']} - b_{i'} - b_{i''}$. The first inequality follows from the fact that $\Gamma_{i'} \cup \Gamma_{i''} = \tilde{\Gamma}_i$; the second is based on the fourth property of the cost function. Therefore, $v_i^{t[i']} - \tilde{b}_i \geq v_{i'}^{t[i']} - b_{i'}$ since $v_{i''}^{t[i'']} - b_{i''} \geq 0$. This implies that $i$ wins at $t[i']$ at the latest when using a single identity, so the first condition in Theorem 11 is satisfied.

We next prove that SOS satisfies the second condition in Theorem 11. We know that user $i$ wins at $t[i] \leq t[i']$. Let $t_c$, $t'_c$, and $t''_c$ denote the times that determine the payments of $i$, $i'$, and $i''$ according to Equation (5.12), respectively. By Algorithm 9, $t_c \in [t[i], d_i]$, $t'_c \in [t[i'], d_{i'}]$, and $t''_c \in [t[i''], d_{i''}]$. We then prove by cases. In Case 1, $t_c \in [t[i'], d_i]$ and $t'_c \leq t''_c$. According to Equation (5.12), we have $p_i \geq v_i^{t'_c} - c^{t'_c} \geq v_{i'}^{t'_c} + v_{i''}^{t''_c} - c^{t'_c} \geq p_{i'} + p_{i''}$. The first inequality results from the fact that $t_c \in [t[i], d_i]$; the second is based on the fact that $\Gamma_{i'} \cup \Gamma_{i''} = \tilde{\Gamma}_i$; the third is based on the facts that $p_{i'} = v_{i'}^{t'_c} - c^{t'_c}$ and $p_{i''} \leq v_{i''}^{t''_c}$ according to Equation (5.12). In Case 2, $t_c \in [t[i'], d_i]$ and $t'_c > t''_c$. Similar to Case 1, we can prove that $p_i \geq v_i^{t''_c} - c^{t''_c} \geq v_{i'}^{t'_c} + v_{i''}^{t''_c} - c^{t''_c} \geq p_{i'} + p_{i''}$. In Case 3, $t_c \in [t[i], t[i'])$. Because $t_c < t[i']$, we have $p_i \geq \max\{v_i^{t'_c} - c^{t'_c},\, v_i^{t''_c} - c^{t''_c}\} \geq p_{i'} + p_{i''}$ based on the proofs of Cases 1 and 2. Therefore, $p_i \geq p_{i'} + p_{i''}$, and the second condition in Theorem 11 is satisfied. Hence, SOS is Sybil-proof according to Theorem 11. A similar proof applies when a user pretends to be more than two identities.

Next, we prove the properties of SOM in the following theorem.

Theorem 16. SOM is individually rational, truthful, and Sybil-proof in the MM case.

We prove this theorem with the following lemmas.

Lemma 20. SOM is individually rational.

Proof. The utility of any active user $i$ at any time slot $t$ is $0$ when the assignment $\mathcal{A}_i^t = \emptyset$, according to Equation (5.2). According to Equation (5.14), at any time slot $t$, SOM assigns any active user $i$ a bundle $\mathcal{A}_i^t$ maximizing its utility. This implies that $\mathcal{A}_i^t \neq \emptyset$ only if $u_i = p_{i,\mathcal{A}_i^t} - \tilde{c}_i(\mathcal{A}_i^t) > 0$, and thus the utility of any user $i$ is non-negative. Therefore, SOM is individually rational.

Lemma 21. SOM is truthful.

Proof. We first prove that user $i$ cannot increase its utility by submitting a false task set. Then, we prove that user $i$ cannot increase its utility by submitting a false active time window or a false cost function. Assume user $i$ submits a false bid $\beta_i = (a_i, d_i, \Gamma_i, c_i(\cdot))$. By Equation (5.13), at any time slot $t$, the payment to $i$ for any bundle $B \subseteq \Gamma_i^t$ is calculated independently of $i$'s own cost function. If $\Gamma_i \subset \tilde{\Gamma}_i$, the payment to $i$ for any subset of $\Gamma_i$ is the same as that when $i$ submits $\tilde{\Gamma}_i$. In addition, at each time slot $t$, SOM assigns $i$ a bundle maximizing its utility by Equation (5.14). Therefore, user $i$ cannot increase its utility by submitting $\Gamma_i \subset \tilde{\Gamma}_i$. On the contrary, if $\Gamma_i \setminus \tilde{\Gamma}_i \neq \emptyset$, user $i$ will not be assigned tasks in $\Gamma_i \setminus \tilde{\Gamma}_i$, since its utility would be negative according to the second property of the cost function. Therefore, user $i$ has no incentive to submit a false task set.

Next, we prove that user $i$ has no incentive to submit a false active time window, i.e., $a_i > \tilde{a}_i$ or $d_i < \tilde{d}_i$. By Equation (5.13), SOM only considers $\Gamma_i^t$ for each active user $i$ at any time slot $t$. Besides, the size of $\Gamma_i^t$ is non-increasing with time. Thus, a narrower time window will not increase a user's chance to be a winner. Therefore, a user has no incentive to submit a false time window.

At last, a false cost function $c_i(\cdot) \neq \tilde{c}_i(\cdot)$ can only affect the result of Equation (5.14) in SOM. Let $\mathcal{A}_i^t$ and $\tilde{\mathcal{A}}_i^t$ denote the assignments to $i$ when $i$ submits $c_i(\cdot)$ and the true cost function $\tilde{c}_i(\cdot)$, respectively. It is obvious that the utility of $i$ does not change if $\mathcal{A}_i^t = \tilde{\mathcal{A}}_i^t$. If $\mathcal{A}_i^t \neq \tilde{\mathcal{A}}_i^t$, we have $p_{i,\tilde{\mathcal{A}}_i^t} - \tilde{c}_i(\tilde{\mathcal{A}}_i^t) \geq p_{i,\mathcal{A}_i^t} - \tilde{c}_i(\mathcal{A}_i^t)$, because both $\mathcal{A}_i^t$ and $\tilde{\mathcal{A}}_i^t$ are subsets of $\Gamma_i^t$ and $\tilde{\mathcal{A}}_i^t$ is the bundle maximizing $i$'s utility. Thus, user $i$ cannot increase its utility by submitting $c_i(\cdot)$. Therefore, SOM is truthful.

Lemma 22. SOM is Sybil-proof.

Proof. We assume that user $i$ pretends to be two identities $i'$ and $i''$, which are assigned $\mathcal{A}_{i'}$ and $\mathcal{A}_{i''}$, respectively. Let $u_i(\mathcal{A}_i)$ denote the utility of user $i$ when assigned $\mathcal{A}_i$. For any time slot $t \in [a_i, d_i]$, we have $u_i(\mathcal{A}_i^t) \geq u_i(\mathcal{A}_{i'}^t \cup \mathcal{A}_{i''}^t)$, since $\mathcal{A}_{i'}^t \cup \mathcal{A}_{i''}^t \subseteq \Gamma_i^t$ and SOM assigns $i$ the bundle that maximizes its utility. Next, we prove that $u_i(\mathcal{A}_{i'}^t \cup \mathcal{A}_{i''}^t) \geq p_{i'}^t + p_{i''}^t - c_i(\mathcal{A}_{i'}^t \cup \mathcal{A}_{i''}^t)$. Let $m_{i'}$ denote $\max_{j \neq i',\, B \in \mathbb{B}_j^t(\mathcal{A}_{i'}^t)} (V(B) - c_j(B))$, and $m_{i''}$ denote $\max_{j \neq i'',\, B \in \mathbb{B}_j^t(\mathcal{A}_{i''}^t)} (V(B) - c_j(B))$. By Equation (5.13), the payments to $i'$ and $i''$ at any time slot $t$ are

$p_{i'}^t = p_{i',\mathcal{A}_{i'}^t} = V(\mathcal{A}_{i'}^t) - \max\{0, m_{i'}\}$,   (5.30)

$p_{i''}^t = p_{i'',\mathcal{A}_{i''}^t} = V(\mathcal{A}_{i''}^t) - \max\{0, m_{i''}\}$.   (5.31)

Let $\tilde{\mathcal{A}}_i^t = \mathcal{A}_{i'}^t \cup \mathcal{A}_{i''}^t$; the payment to $i$ when assigned $\tilde{\mathcal{A}}_i^t$ is

$p_{i,\tilde{\mathcal{A}}_i^t} = V(\tilde{\mathcal{A}}_i^t) - \max\Bigl\{0,\ \max_{j \neq i,\, B \in \mathbb{B}_j^t(\tilde{\mathcal{A}}_i^t)} (V(B) - c_j(B))\Bigr\}$.   (5.32)

In addition, we know that

$\bigcup_{\forall j \in S^t, j \neq i} \mathbb{B}_j^t(\tilde{\mathcal{A}}_i^t) = \bigcup_{\forall j \in S^t, j \neq i'} \mathbb{B}_j^t(\mathcal{A}_{i'}^t) \cup \bigcup_{\forall j \in S^t, j \neq i''} \mathbb{B}_j^t(\mathcal{A}_{i''}^t)$.   (5.33)

Let $m_i$ denote $\max_{j \neq i,\, B \in \mathbb{B}_j^t(\tilde{\mathcal{A}}_i^t)} (V(B) - c_j(B))$. Thus we have $m_i = \max\{m_{i'}, m_{i''}\}$, and hence $m_i \leq m_{i'} + m_{i''}$. In addition, we can prove that $\mathcal{A}_{i'}^t \cap \mathcal{A}_{i''}^t = \emptyset$ by contradiction. Assume $\mathcal{A}_{i'}^t \cap \mathcal{A}_{i''}^t \neq \emptyset$; then the payment to $i'$ is

$p_{i',\mathcal{A}_{i'}^t} = V(\mathcal{A}_{i'}^t) - \max\{0, m_{i'}\} \leq V(\mathcal{A}_{i'}^t) - (V(\mathcal{A}_{i''}^t) - c_i(\mathcal{A}_{i''}^t))$.   (5.34)

The inequality results from the fact that $V(\mathcal{A}_{i''}^t) - c_i(\mathcal{A}_{i''}^t) \leq m_{i'}$, since $\mathcal{A}_{i''}^t \in \bigcup_{\forall j \in S^t, j \neq i'} \mathbb{B}_j^t(\mathcal{A}_{i'}^t)$. Similarly, we have $p_{i'',\mathcal{A}_{i''}^t} \leq V(\mathcal{A}_{i''}^t) - (V(\mathcal{A}_{i'}^t) - c_i(\mathcal{A}_{i'}^t))$. Therefore, the summation of the utilities of $i'$ and $i''$ at any time slot $t$ is

$p_{i',\mathcal{A}_{i'}^t} - c_i(\mathcal{A}_{i'}^t) + p_{i'',\mathcal{A}_{i''}^t} - c_i(\mathcal{A}_{i''}^t)$   (5.35)
$\leq V(\mathcal{A}_{i'}^t) - (V(\mathcal{A}_{i''}^t) - c_i(\mathcal{A}_{i''}^t)) - c_i(\mathcal{A}_{i'}^t)$   (5.36)
$\quad + V(\mathcal{A}_{i''}^t) - (V(\mathcal{A}_{i'}^t) - c_i(\mathcal{A}_{i'}^t)) - c_i(\mathcal{A}_{i''}^t) = 0$.   (5.37)

This implies that $\mathcal{A}_{i'}^t = \mathcal{A}_{i''}^t = \emptyset$ since SOM is individually rational, which contradicts the assumption. Therefore, $\mathcal{A}_{i'}^t \cap \mathcal{A}_{i''}^t = \emptyset$. We also have $V(\tilde{\mathcal{A}}_i^t) = V(\mathcal{A}_{i'}^t \cup \mathcal{A}_{i''}^t) = V(\mathcal{A}_{i'}^t) + V(\mathcal{A}_{i''}^t)$, since $\tilde{\mathcal{A}}_i^t = \mathcal{A}_{i'}^t \cup \mathcal{A}_{i''}^t$. By Equation (5.13), the payment to $i$ is

$p_{i,\tilde{\mathcal{A}}_i^t} = V(\tilde{\mathcal{A}}_i^t) - \max\{0, m_i\}$   (5.38)
$\geq V(\tilde{\mathcal{A}}_i^t) - (m_{i'} + m_{i''})$   (5.39)
$= V(\mathcal{A}_{i'}^t \cup \mathcal{A}_{i''}^t) - (m_{i'} + m_{i''})$   (5.40)
$= V(\mathcal{A}_{i'}^t) - m_{i'} + V(\mathcal{A}_{i''}^t) - m_{i''}$   (5.41)
$\geq p_{i',\mathcal{A}_{i'}^t} + p_{i'',\mathcal{A}_{i''}^t}$.   (5.42)

In addition, we have $c_i(\tilde{\mathcal{A}}_i^t) \leq c_i(\mathcal{A}_{i'}^t) + c_i(\mathcal{A}_{i''}^t)$ because of the fourth property of the cost function. By Equation (5.2) and Equation (5.5), the utility of $i$ when using two identities is therefore not greater than that obtained by using a single identity at any time slot $t$; hence, user $i$'s utility via a Sybil attack is not greater than that obtained with a single identity. Therefore, SOM is Sybil-proof.

We can use a proof similar to that in Lemma 22 to prove that $\mathcal{A}_i \cap \mathcal{A}_j = \emptyset$ for any two users $i$ and $j$. In addition, SOM does not satisfy computational efficiency, since at each time slot $t$ it calculates the payments to each active user $i$ for every subset of $\Gamma_i^t$, and the time complexity is therefore exponential in the largest $|\Gamma_i^t|$ over all $i \in S^t$. In reality, however, the number of tasks each user can perform is very small because of various constraints, e.g., travel budget [32], and thus the execution time of SOM is still practical.

5.5 Performance Evaluation of SPIM-S and SPIM-M

In this section, we compare the performance of SPIM-S and SPIM-M with MMT and MSensing. Specifically, we implement SPIM-S with the same criterion and payment function as in MMT [91], i.e., $Q_S(x, y) = y/x$ and $P_S(x, y) = xy$. Note that the criterion of SPIM-M is the same as in MSensing [8]. The performance metrics are running time, total payment, and platform utility.

5.5.1 Evaluation Setup

We use a real data set for evaluation. It consists of the traces of 320 taxi drivers who work in the center of Rome [90]. Each trace is represented by a sequence of locations. Each taxi driver has a tablet, which periodically retrieves the GPS location and sends it with the corresponding driver ID to a server. The mobility pattern of taxi traces can be used to depict the mobility of smartphone users as in [91]. We consider a crowdsensing system in which the tasks are to measure the Wi-Fi signal strength at specific locations. Each user can sense the Wi-Fi signal strength within 30 meters of its location. Tasks are represented by GPS locations reported by taxis. We assume that all drivers are willing to participate in this crowdsensing system. We preprocess the tasks such that each task can be sensed by at least two users, to prevent monopoly and to guarantee the quality of the sensing tasks. In our evaluation, we randomly select locations on taxi drivers' traces as sensing tasks. We assume the value of each task is uniformly distributed over [1, 5], and each user's cost for each task is uniformly distributed over [1, 10]. To evaluate the impact of the number of sensing tasks (m) on the performance metrics, we fix the number of users (n) at 200 and vary m from 20 to 60 with a step of 10. To evaluate the impact of the number of users, we fix m at 150 and vary n from 100 to 300 with a step of 50. All results are averaged over 1000 independent runs.
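For concreteness, an evaluation instance as described above could be sampled as in the following sketch; the function and parameter names are ours, not taken from the evaluation code.

```python
import random

def sample_instance(n_users=200, n_tasks=40, seed=None):
    """Hypothetical re-creation of the evaluation setup described above."""
    rng = random.Random(seed)
    task_values = [rng.uniform(1, 5) for _ in range(n_tasks)]    # value ~ U[1, 5]
    user_costs = [[rng.uniform(1, 10) for _ in range(n_tasks)]   # cost  ~ U[1, 10]
                  for _ in range(n_users)]
    return task_values, user_costs
```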

5.5.2 Evaluation of Running Time

Figure 5.4 shows the impacts of m and n on the running time. We see that the running times of SPIM-S, SPIM-M, MMT, and MSensing all increase as m and n grow. In both Figure 5.4(a) and Figure 5.4(b), the running time of SPIM-M exceeds that of SPIM-S, MMT, and MSensing. This is because SPIM-M calculates the payment to each user for every subset of its task set. In addition, the running time of SPIM-S is less than that of MMT, even though they use the same criterion. This is because SPIM-S starts from the group with the largest task set size, and thus may finish assigning tasks earlier than MMT.

Figure 5.4: Running time. (a) Impact of m; (b) Impact of n.

5.5.3 Evaluation of Total Payment

Figure 5.5 plots the impacts of m and n on the total payment to users. In Figure 5.5(a), we see that the total payments of SPIM-S, SPIM-M, MMT, and MSensing all increase with an increase in m. This is because, with more tasks, the platform may select more users to perform the tasks, which incurs a higher payment. In Figure 5.5(b), we observe that the total payments of SPIM-M and MMT decrease with an increase in n, because the platform may find more low-cost users to perform the tasks. Note that the total payment of MMT is larger than those of the others. The reason is that MMT selects a user as long as its marginal value is nonzero, and thus may select more users, incurring a higher payment. In addition, the total payments of SPIM-S and MSensing both increase slightly with the increase of n. This is because, with more users, SPIM-S and MSensing can assign more tasks, which incurs a higher payment.

Figure 5.5: Total payment. (a) Impact of m; (b) Impact of n.

5.5.4 Evaluation of Platform Utility

Figure 5.6 shows the impacts of m and n on the platform utility. Note that the y-axis in Figure 5.6(b) is log-scaled. The platform utility of MMT is negative and thus omitted from Figure 5.6; the reason is that MMT selects a user as long as its marginal value to the platform is nonzero, which may make the total payment to users higher than the total value to the platform. In both Figure 5.6(a) and Figure 5.6(b), we see that the platform utility achieved by SPIM-M is larger than those achieved by SPIM-S and MSensing. This is because SPIM-M assigns each task to at most one user, and thus avoids paying users to perform duplicated tasks. This advantage is amplified for large m.

Figure 5.6: Platform utility. (a) Impact of m; (b) Impact of n.

5.6 Performance Evaluation of SOS and SOM

In this section, we compare the performance of SOS and SOM with three benchmarks. The first benchmark is an online mechanism adapted from [72] for the SM case, denoted by Greedy; note that this mechanism is not Sybil-proof. The second benchmark is SPIM-S, a Sybil-proof mechanism for the SM case. The third benchmark is SPIM-M, a Sybil-proof mechanism for the MM case. The performance metrics include total payment, platform utility, and Sybil-proofness.

5.6.1 Evaluation Setup

For a fair comparison with SPIM-S and SPIM-M, we use the same real-world dataset consisting of the traces of taxi drivers [90]. We again consider a crowdsensing system in which the tasks are to measure the Wi-Fi signal strength at specific locations. In this system, tasks are represented by GPS locations of the taxi drivers in the dataset, and the users are all the taxi drivers. In our evaluation, we randomly select locations on taxi drivers' traces as the sensing tasks. The value of each task is uniformly distributed over [1, 5], and each user's cost for each task is uniformly distributed over [1, 5]. We set the deadline $T$ to 60 minutes; each user $i$'s arrival time $\tilde{a}_i$ is uniformly distributed over [0, 60], and the active time window ($\tilde{d}_i - \tilde{a}_i$) of each user is uniformly distributed over [0, 5]. To evaluate the impact of the number of sensing tasks (m) on the performance metrics, we fix the number of users (n) at 200 and vary m from 20 to 60 with a step of 10. To evaluate the impact of the number of users, we fix m at 150 and vary n from 100 to 300 with a step of 50. All results are averaged over 1000 independent runs.

5.6.2 Evaluation of Total Payment

The impacts of m and n on the total payment to users are shown in Figure 5.7. In Figure 5.7(a), we see that the total payments of both the offline and online mechanisms increase with the increase of m. This is because the platform may recruit more users as m increases, and thus incurs a higher payment. In addition, we see that the total payments of the online mechanisms (Greedy, SOS, SOM) are higher than those of the offline mechanisms (SPIM-S, SPIM-M). This is because the online mechanisms may recruit more users as users arrive in different time slots, and thus incur a higher payment. In Figure 5.7(b), we observe that the total payment of SPIM-M decreases with the increase of n. This is because, with more users, SPIM-M may find more low-cost users to perform the tasks. The total payments of the other four mechanisms increase slightly with the increase of n, because, with more users, these mechanisms may assign more tasks, incurring a higher payment. In addition, the total payments of the online mechanisms are higher than those of the offline mechanisms, as explained before. Note that SOS has a performance similar to Greedy, which is not Sybil-proof.

Figure 5.7: Total payment. (a) Impact of m; (b) Impact of n.

5.6.3 Evaluation of Platform Utility

The impacts of m and n on the platform utility are shown in Figure 5.8. In both Figure 5.8(a) and Figure 5.8(b), we see that the platform utilities achieved by the offline mechanisms (SPIM-S, SPIM-M) are larger than those achieved by the online mechanisms (Greedy, SOS, SOM). This is because the offline mechanisms know all users' bids before making decisions, while the online mechanisms have no information about future users. In addition, we see that SOM outperforms SOS in terms of platform utility. This is because SOM assigns each task to at most one user, and thus avoids paying users to perform duplicated tasks.

Figure 5.8: Platform utility. (a) Impact of m; (b) Impact of n.

5.6.4 Evaluation of Sybil-proofness

Figure 5.9 shows the utilities of a Sybil attacker and of the other users without the Sybil attack, denoted by Attacker and Other, and the corresponding utilities under the Sybil attack, denoted by Attacker-Sybil and Other-Sybil. In both Figure 5.9(a) and Figure 5.9(b), we see that the attacker increases its utility while decreasing the utility of others in Greedy. This is because Greedy is vulnerable to the Sybil attack. However, the attacker cannot increase its utility in SPIM-S, SOS, and SOM, since they are Sybil-proof.

Figure 5.9: Sybil-proofness. (a) Single-minded case; (b) Multi-minded case.

5.7 Conclusion

In this work, we analyzed security issues in incentive mechanisms raised by the Sybil attack, considering both offline and online scenarios. The main contributions of this work are as follows. First, we investigated the Sybil attack in auction-based incentive mechanisms for crowdsensing. As an essential step, we formally defined the Sybil attack models in crowdsensing for both offline and online scenarios. Second, we analyzed existing offline and online auction-based incentive mechanisms and demonstrated that all of them are vulnerable to the Sybil attack. Third, depending on whether a user is willing to perform a subset of its submitted task set, we investigated both the single-minded and multi-minded cases. We designed SPIM-S and SPIM-M for these two cases in the offline scenario, and SOS and SOM for these two cases in the online scenario. In order to design SPIM-S, we provided a sufficient condition for a mechanism to be Sybil-proof in the offline scenario; in order to design SOS, we provided a sufficient condition for a mechanism to be Sybil-proof in the online scenario. These sufficient conditions can be used as guidelines for designing Sybil-proof incentive mechanisms. In addition, we proved that SPIM-S achieves computational efficiency, individual rationality, truthfulness, and Sybil-proofness, and that SPIM-M achieves individual rationality, truthfulness, and Sybil-proofness. Finally, we proved that SOS achieves computational efficiency, individual rationality, truthfulness, and Sybil-proofness, and that SOM achieves individual rationality, truthfulness, and Sybil-proofness.

CHAPTER 6 ALLEVIATING THE SYBIL ATTACK IN TRUTH DISCOVERY

This work was published in IEEE ICDCS [115].

6.1 Background

In this section, we first introduce truth discovery, which is used for data aggregation. Then, we introduce device fingerprinting, which is used to identify an individual device.

6.1.1 Truth Discovery

Truth discovery [12] has been proposed to aggregate noisy sensing data collected from a variety of sources in order to recover the true information (i.e., the truths). Without any prior knowledge about users' reliability, a truth discovery algorithm iteratively assigns different weights to users according to the quality of their data and computes the estimated truth as the weighted average of all data. This process repeats until it converges. A general truth discovery algorithm involves iterative estimation of weights and truths in a joint manner, as summarized in Algorithm 12, and can be divided into two phases: weight estimation and truth estimation.

Initially, the algorithm randomly guesses each task's truth, and then iteratively updates each user's weight and the estimated truth until it converges. Note that the convergence criterion depends on the application; for example, in [21] the convergence criterion is a maximum number of iterations.

Weight estimation: Let $T = \{\tau_1, \tau_2, \cdots, \tau_m\}$ denote the set of sensing tasks, and let $U_j = \{i \mid i \in U, \tau_j \in T_i\}$ denote the set of users who submit sensing data for task $\tau_j$, where each user $i$ performs the sensing tasks in $T_i \subseteq T$. Let $d_j^i$ denote the sensing data submitted by user $i$ for task $\tau_j$.

Algorithm 12: Truth Discovery Algorithm
Input: Users' data $\mathcal{D}$;
Output: Estimated truths $\{d_j \mid \tau_j \in T\}$;
1: Randomly initialize the estimated truth for each task;
2: repeat
   // Weight estimation
3:   foreach $i \in U$ do
4:     Update the weight $w_i$ based on Equation (6.1);
5:   end
   // Truth estimation
6:   foreach $\tau_j \in T$ do
7:     Update the estimated truth $d_j$ based on Equation (6.2);
8:   end
9: until the convergence criterion is satisfied;
10: return Estimated truths $\{d_j \mid \tau_j \in T\}$.

In the weight estimation step, given the estimated truth $d_j$ for any task $\tau_j$, the weight $w_i$ of each user $i \in U_j$ is calculated as

$w_i = \mathcal{W}\Bigl(\sum_{\tau_j \in T_i} \mathcal{D}(d_j^i, d_j)\Bigr)$,   (6.1)

where $\mathcal{W}(\cdot)$ is a monotonically decreasing function, and $\mathcal{D}(\cdot)$ is the distance function measuring the difference between the user's data and the estimated truth.

Truth estimation: In this step, given users' weights for task $\tau_j$, the estimated truth of task $\tau_j$ is calculated as

$d_j = \dfrac{\sum_{i \in U_j} w_i d_j^i}{\sum_{i \in U_j} w_i}$.   (6.2)

Although existing truth discovery algorithms [12, 20–22] differ in the ways they update users' weights and the estimated truths, they all follow the same principles: 1) users whose data are closer to the estimated truth are assigned higher weights; 2) the aggregated result relies more on users with higher weights.

In recent years, the benefits of truth discovery have been well explored. However, the fact that the aggregation accuracy highly depends on the quality of the input data also raises security concerns. Existing truth discovery algorithms [12, 20–22] assume that most users are reliable. However, this assumption does not hold in practice, especially when a crowdsensing system is under Sybil attack [19]. In practice, a user could submit data to a crowdsensing system using multiple accounts for various motives. For example, a rapacious user may want to receive more rewards without contributing extra effort (e.g., by submitting duplicated data multiple times using different accounts), whereas a malicious user aims to manipulate the aggregated data and mislead the decision making of a system [61]. The Sybil attack is easy to conduct (e.g., by creating multiple accounts) but difficult to detect. Therefore, the power of truth discovery algorithms will be undermined unless they are resistant to the Sybil attack. The rapacious users can be deterred by our proposed Sybil-proof incentive mechanisms (in Chapter 5), since these fundamentally eliminate such users' motivation to conduct the Sybil attack; however, they cannot address malicious attackers. In Section 6.2.2, we demonstrate that existing truth discovery algorithms are vulnerable to the Sybil attack.
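To make the iterative procedure of Algorithm 12 concrete, the following is a minimal CRH-style sketch in Python. The choices of $\mathcal{W}(\cdot)$ (a negative-log weight) and $\mathcal{D}(\cdot)$ (squared distance) are assumptions made for illustration, since the text leaves both functions abstract.

```python
import math
import random

def truth_discovery(data, max_iters=20, seed=0):
    """Minimal CRH-style sketch of Algorithm 12 (not the thesis's code).

    `data` maps each user to a dict {task: reading}. We assume
    W(x) = -log(x / total) and D = squared distance.
    """
    rng = random.Random(seed)
    tasks = {t for readings in data.values() for t in readings}
    truth = {t: rng.uniform(-100.0, 0.0) for t in tasks}  # random initialization
    for _ in range(max_iters):               # convergence: fixed iteration count
        # Weight estimation (Equation (6.1))
        dist = {u: sum((d - truth[t]) ** 2 for t, d in r.items()) + 1e-12
                for u, r in data.items()}
        total = sum(dist.values())
        weight = {u: -math.log(dist[u] / total) for u in data}
        # Truth estimation (Equation (6.2))
        for t in tasks:
            users = [u for u in data if t in data[u]]
            truth[t] = (sum(weight[u] * data[u][t] for u in users)
                        / max(sum(weight[u] for u in users), 1e-12))
    return truth
```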

6.1.2 Device Fingerprinting

Device fingerprinting refers to finding characteristics that can help identify an individual device. It has been exploited to track a user across multiple visits to websites [116]. Modern mobile devices (e.g., smartphones) are equipped with rich sensors, e.g., accelerometers and gyroscopes, that are available to apps. In recent years, these sensors have been extensively investigated and used to uniquely identify a smartphone by measuring anomalies in their produced signals, which are the result of manufacturing imperfections. The accelerometer and gyroscope sensors in smartphones are based on Micro Electro Mechanical Systems (MEMS), as shown in Figure 6.1. When an accelerometer is subjected to a linear acceleration along its sensitive axis, the seismic mass shifts closer to one of the fixed electrodes, causing a change in the generated capacitance. During the manufacturing process, there might be a slight gap difference between these electrodes across chips, which may cause a difference in the generated capacitance for the same acceleration. A gyroscope measures the rate of rotation based on the Coriolis effect. The angular velocity is computed from the Coriolis force, which is sensed by a capacitive sensing structure. A change in the vibration of the proof mass causes a change in capacitance. Similar to the accelerometer, imperfections in the electro-mechanical structure may cause a difference in the generated capacitance for the same Coriolis force across chips. In this work, we exploit both the accelerometer and the gyroscope for device fingerprinting. The fingerprints generated from these two sensors can uniquely identify a mobile device.

Figure 6.1: MEMS-based accelerometer and gyroscope. (a) Accelerometer; (b) Gyroscope.

6.2 Model and Problem Formulation

In this section, we introduce the system model and the adversary models.

6.2.1 System Model

We consider a crowdsensing system consisting of a cloud-based platform and a crowd of $n$ mobile users $U = \{1, 2, \cdots, n\}$. The platform first publicizes a set $T = \{\tau_1, \tau_2, \cdots, \tau_m\}$ of sensing tasks. Each sensing task can be a task to sense a particular object, event, or local phenomenon in a specific sensing region. Then, each user performs sensing tasks $T_i \subseteq T$. Let $D_i = \{(d_j^i, t_j^i) \mid \tau_j \in T_i\}$ denote user $i$'s accomplished task set, where $d_j^i$ is the sensing data for task $\tau_j$ in the form of numerical values, e.g., cellular signal strength [82], noise level measurements [2], and Wi-Fi signal strength [117], and $t_j^i$ is the corresponding timestamp. Each user $i$ submits $D_i$ to the platform. Meanwhile, the platform collects sensor data from $i$'s device for device fingerprinting; note that this sensor data is independent of the sensing data $d_j^i$. Once the sensing data $\mathcal{D} = \{D_i \mid i \in U\}$ from all users has been collected, the platform calculates an aggregated result $d_j$ for each task $\tau_j$ as an estimate of the truth, which is unknown to both the platform and the users. In practice, the quality of sensing data from different users varies, and users' data quality is unknown to the platform. Therefore, it is common for a platform to utilize a truth discovery algorithm to aggregate data, which calculates users' weights and estimates the truths in a joint manner.

6.2.2 Adversary Models

A crowdsensing system can recruit users through various channels, such as Amazon Mechanical Turk. Each user performs a task by submitting timestamped sensing data via an app to the platform. In this work, we assume that users can conduct the Sybil attack but cannot tamper with their devices; fabricated mobile sensor data produced by tampering attacks can be identified by evaluating the authenticity of the data [118]. A Sybil attacker is a user who performs a task once but submits data multiple times under different accounts (possibly after simple modification). We also assume that an attacker can submit fake sensing data, but timestamps cannot be fabricated; fabricated timestamps can be detected using the method in [119]. We consider the following two scenarios, depending on whether a Sybil attacker uses multiple devices. Note that these two scenarios are sufficient to represent the general Sybil attack in crowdsensing.

Attack-I: single device and multiple accounts. In this attack, an attacker $i$ has only one device. Therefore, it can only conduct the Sybil attack by creating multiple accounts. For example, attacker $i$ performs the sensing task $\tau_j$ and submits the sensing data $d_j^{i'}$ at $t_j^{i'}$ using account $i'$. Then it switches to account $i''$ and submits $d_j^{i''}$ (possibly fabricated) at $t_j^{i''}$, which could be a copy of $d_j^{i'}$ without any sensing effort. Note that $t_j^{i'}$ and $t_j^{i''}$ are different due to the switching of accounts, and the device fingerprints associated with $i'$ and $i''$ are supposed to be the same since they are from the same device.

Attack-II: multiple devices and multiple accounts. In this attack, an attacker $i$ has multiple devices. Therefore, it can conduct the Sybil attack by using multiple accounts on different devices. For example, attacker $i$ performs the sensing task $\tau_j$ and submits the sensing data $d_j^{i'}$ (possibly fabricated) at $t_j^{i'}$ using account $i'$ on one device. Then $i$ uses another device with account $i''$ and submits data $d_j^{i''}$ at $t_j^{i''}$, which could be a copy of $d_j^{i'}$ without any sensing effort. Note that $t_j^{i'}$ and $t_j^{i''}$ are different due to the switching of devices, and the device fingerprints associated with $i'$ and $i''$ could be different since they are from different devices.

Next, we use an example, shown in Table 6.1, to demonstrate that existing truth discovery algorithms [12, 20–22] are vulnerable to the Sybil attack. Consider a crowdsensing system with 4

tasks ($\tau_1, \tau_2, \tau_3, \tau_4$) and 4 users. Each task is to measure the Wi-Fi signal strength (dBm) at a specific location. Each account is allowed to submit at most one data point for one task. For each task, the system aggregates the data submitted by users to identify the Wi-Fi signal strength at the associated location. Assume that User 4 is a Sybil attacker who has one device and three accounts (Attack-I), i.e., $4'$, $4''$, and $4'''$. The objective of the attacker is to mislead the system into deciding that the Wi-Fi signals at the locations of $\tau_1$, $\tau_3$, and $\tau_4$ are strong. Therefore, User 4 uses 3 accounts ($4'$, $4''$, $4'''$) to submit fabricated data (−50 dBm), which represents a strong Wi-Fi signal. We use CRH [21], a widely adopted truth discovery (TD) algorithm, to aggregate the data in the cases without and with Sybil attacker 4, respectively. Note that CRH is sufficient to represent existing truth discovery algorithms, since they follow the same procedure as Algorithm 12.

Table 6.1: Example showing the Sybil attack in truth discovery

Account                       τ1        τ2        τ3        τ4
1                           −84.48    −82.11    −75.16    −72.71
2                              -      −72.27    −77.21       -
3                           −72.41    −91.49       -      −73.55
4′                           −50         -       −50       −50
4′′                          −50         -       −50       −50
4′′′                         −50         -       −50       −50
TD without the Sybil attack −84.23    −82.01    −75.22    −72.72
TD with the Sybil attack    −56.06    −86.17    −53.29    −55.35

According to the results, we see that the Sybil attack has a significant impact on the aggregated results for $\tau_1$, $\tau_3$, and $\tau_4$. This example can be extended to any crowdsensing system in which the sensing data for each task is in the form of numerical values. Therefore, it is urgent to design effective Sybil-resistant truth discovery algorithms for crowdsensing to diminish the impact of the Sybil attack.

6.3 Our Approach

6.3.1 Design Rationale

To diminish the impact of the Sybil attack, we propose a framework whose key idea is to identify the data submitted by suspicious accounts and diminish the impact of these data on truth discovery. The following three methods can be used to identify such data: 1) In a crowdsensing system, users usually collect and submit sensing data using their mobile devices (e.g., smartphones). Therefore, if multiple data submitted by different accounts can be identified as coming from the same device, these data should be assigned less weight in truth discovery, since they might be from a Sybil attacker. 2) In order to manipulate the aggregated results for multiple tasks, a Sybil attacker has to submit data for each task using different accounts (as in the example in Section 6.2.2). Therefore, the data submitted by accounts with highly similar task sets are likely from a Sybil attacker. 3) Similarly, the trajectories of a Sybil attacker's accounts should have a similar pattern, since they belong to the same user. Therefore, the data submitted by accounts with highly similar trajectories are likely from a Sybil attacker. In order to diminish the impact of the data submitted by Sybil attackers, we group the accounts potentially from Sybil attackers and assign low weights to their data in the truth discovery algorithm. Note that we do not directly eliminate the data submitted by suspicious accounts, since there might be false-positives in the identification process.

6.3.2 Design of Framework

We now describe the details of the framework, which is illustrated in Algorithm 13. It takes users' data for each task and their associated device fingerprints as inputs. The framework first groups all accounts (to be elaborated in Section 6.3.3). Let $G = \{g_1, g_2, \cdots, g_l\}$ denote the account-grouping result, where $g_i \cap g_j = \emptyset$ and $\cup_{g_i \in G}\, g_i = U$. Each group $g_k \in G$ represents a set of accounts likely used by the same user. Let $\tilde{T}_k = \cup_{i \in g_k} T_i$ denote the set of tasks performed by the accounts in group $g_k$.

Algorithm 13: Sybil-resistant Truth Discovery Framework
Input: Users' data $\mathcal{D}$ and device fingerprints $\mathcal{F}$;
Output: Estimated truths $\{d_j \mid \tau_j \in T\}$;
   // Account grouping
1: $G \leftarrow AG(\mathcal{D}, \mathcal{F})$;
   // Data grouping
2: foreach $\tau_j \in T$ do
3:   Group the sensing data for $\tau_j$ based on $G$;
4:   Aggregate the data in each group $g_k \in G_j$ by Equation (6.3);
5:   Calculate the weight $\tilde{w}_k$ of each group by Equation (6.4);
6: end
7: Initialize the estimated truth for each task by Equation (6.5);
8: repeat
   // Group weight estimation
9:   foreach $g_k \in G$ do
10:    Update the weight $\tilde{w}_k$ using $\tilde{w}_k = \mathcal{W}\bigl(\sum_{\tau_j \in \tilde{T}_k} \mathcal{D}(\tilde{d}_j^k, d_j)\bigr)$, where $\tilde{T}_k = \cup_{i \in g_k} T_i$;
11:  end
   // Truth estimation
12:  foreach $\tau_j \in T$ do
13:    Update the estimated truth using $d_j = \frac{\sum_{g_k \in G_j} \tilde{w}_k \tilde{d}_j^k}{\sum_{g_k \in G_j} \tilde{w}_k}$;
14:  end
15: until the convergence criterion is satisfied;
16: return Estimated truths $\{d_j \mid \tau_j \in T\}$.

Then the framework groups data as follows. For each task $\tau_j$, we first aggregate the data within each group $g_k \in G_j$, where $G_j = \{g_k \mid g_k \in G, \tau_j \in \tilde{T}_k\}$, according to

$\tilde{d}_j^k = \dfrac{\sum_{i \in g_k} (d_j^i - \bar{d}_j^k)\, d_j^i}{\sum_{i \in g_k} (d_j^i - \bar{d}_j^k)}$,   (6.3)

where $\bar{d}_j^k$ is the arithmetic mean of the data submitted by the accounts in $g_k$. The weight of each group $g_k \in G_j$ is calculated as

$\tilde{w}_k = 1 - \dfrac{|g_k|}{|U_j|}$,   (6.4)

where $|g_k|$ denotes the number of accounts in group $g_k$, and $|U_j|$ is the number of accounts that submit data for task $\tau_j$. Note that using one data point for each group diminishes the impact of the Sybil attack.

At last, the framework estimates the truth for each task similarly to Algorithm 12. Different from Algorithm 12, we treat the accounts in one group as a whole, and thus we use $\tilde{d}_j^k$ for group $g_k$. In addition, instead of randomly initializing the estimated truth for each task, we initialize the estimated truth of each task $\tau_j$ as

$d_j = \dfrac{\sum_{g_k \in G_j} \tilde{w}_k \tilde{d}_j^k}{\sum_{g_k \in G_j} \tilde{w}_k}$.   (6.5)
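The data-grouping step can be sketched as follows. For brevity, the per-group aggregate below is a plain mean rather than the distance-weighted aggregate of Equation (6.3); the weights follow Equation (6.4) and the initialization follows Equation (6.5). All names are illustrative.

```python
def group_and_initialize(groups, data, tasks):
    """Sketch of the data-grouping step (Equations (6.3)-(6.5)).

    `groups` is a list of disjoint account sets; `data[u][t]` is account u's
    reading for task t. The group aggregate is a plain mean -- an assumption
    made to keep the sketch short.
    """
    agg, weight, init = {}, {}, {}
    for t in tasks:
        submitters = {u for g in groups for u in g if t in data[u]}
        active = []                                  # groups covering task t (G_j)
        for k, g in enumerate(groups):
            readings = [data[u][t] for u in g if t in data[u]]
            if not readings:
                continue
            agg[(k, t)] = sum(readings) / len(readings)     # stand-in for Eq. (6.3)
            weight[(k, t)] = 1 - len(g) / len(submitters)   # Eq. (6.4)
            active.append(k)
        # Initialization of the estimated truth (Equation (6.5))
        den = sum(weight[(k, t)] for k in active)
        init[t] = (sum(weight[(k, t)] * agg[(k, t)] for k in active) / den
                   if den else None)
    return agg, weight, init
```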

6.3.3 Design of Account Grouping Methods

As an important component of the framework, account grouping is used to group accounts that are likely from the same Sybil attacker. Specifically, we design three account-grouping methods.

Account Grouping by Device Fingerprint (AG-FP). Inspired by a recent work on device fingerprinting [116], we design an account-grouping method that exploits both the accelerometer and the gyroscope to produce device fingerprints. The device fingerprint can be used to identify sensing data submitted from different accounts but the same device.

We treat the data from the accelerometer and gyroscope as two streams of timestamped real values. Given a timestamp $t$, the platform collects values from the accelerometer and gyroscope along three axes in the form of $\vec{a}(t) = (a_x, a_y, a_z)$ and $\vec{w}(t) = (w_x, w_y, w_z)$, respectively. The platform collects these sensor data for $T$ seconds from the moment an account $i$ signs in to the system, and stores them as $F_i = ((\vec{a}(1), \vec{a}(2), \cdots, \vec{a}(T)), (\vec{w}(1), \vec{w}(2), \cdots, \vec{w}(T)))$. Once all accounts' device fingerprints $\mathcal{F} = \{F_i \mid i \in U\}$ have been collected, the platform converts the acceleration data at each timestamp into a scalar by taking its magnitude (i.e., $|\vec{a}(t)| = \sqrt{a_x^2 + a_y^2 + a_z^2}$), such that the accelerometer data is independent of the device orientation. For the gyroscope, we consider the data from each axis as a separate stream. Therefore, each account $i$'s device fingerprint is treated as four sensor data streams of the form $\{|\vec{a}(t)|, w_x(t), w_y(t), w_z(t)\}$. To characterize a sensor data stream, we use both temporal and spectral features as summarized in Table 6.2, including 9 temporal and 11 spectral features. All of these features are widely used and have been well analyzed in the literature [116, 120].

Table 6.2: Temporal and spectral features

#   Domain      Feature                 Description
1   Time        Mean                    The arithmetic mean of the signal strength at different timestamps
2               Standard Deviation      Standard deviation of the signal strength
3               Skewness                Measure of asymmetry about the mean
4               Kurtosis                Measure of the flatness or spikiness of a distribution
5               RMS                     Square root of the arithmetic mean of the squares of the signal strength at various timestamps
6               Max                     Maximum signal strength
7               Min                     Minimum signal strength
8               ZCR                     The rate at which the signal changes sign from positive to negative or back
9               Non-Negative Count      Number of non-negative values
10  Frequency   Spectral Centroid       The center of mass of a spectral power distribution
11              Spectral Spread         The dispersion of the spectrum around its centroid
12              Spectral Skewness       The coefficient of skewness of a spectrum
13              Spectral Kurtosis       Measure of the flatness or spikiness of a distribution relative to a normal distribution
14              Spectral Flatness       Measures how energy is spread across the spectrum
15              Spectral Irregularity   The degree of variation of the successive peaks of a spectrum
16              Spectral Entropy        The peaks of a spectrum and their locations
17              Spectral Rolloff        The frequency below which 85% of the distribution magnitude is concentrated
18              Spectral Brightness     Amount of spectral energy corresponding to frequencies higher than a given cut-off threshold
19              Spectral RMS            Square root of the arithmetic mean of the squares of the signal strength at various frequencies
20              Spectral Roughness      Average of all the dissonance between all possible pairs of peaks in a spectrum
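As an illustration of how the four streams and a small subset of the Table 6.2 time-domain features could be computed, consider the following sketch; the function names are ours, and the spectral features (extracted with MIRtoolbox in the experiments) are omitted.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def fingerprint_streams(acc, gyro):
    """Build the four per-account streams: |a(t)| plus each gyroscope axis.

    acc and gyro are arrays of shape (T, 3). Taking the accelerometer
    magnitude makes that stream independent of device orientation.
    """
    magnitude = np.linalg.norm(acc, axis=1)   # |a(t)| = sqrt(ax^2 + ay^2 + az^2)
    return [magnitude, gyro[:, 0], gyro[:, 1], gyro[:, 2]]

def temporal_features(x):
    """An illustrative subset of the time-domain features in Table 6.2."""
    zcr = np.mean(np.diff(np.sign(x)) != 0)   # zero-crossing rate
    return np.array([x.mean(), x.std(), skew(x), kurtosis(x),
                     np.sqrt(np.mean(x ** 2)),  # RMS
                     x.max(), x.min(), zcr,
                     np.sum(x >= 0)])           # non-negative count
```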

After extracting features from the sensor data, the platform groups device fingerprints using k-Means [121], a widely used clustering method in machine learning. The cluster number k corresponds to the number of devices in our case. Since the platform does not know the exact number of devices in practice, we use the elbow method [122] to estimate the value of k. The idea of the elbow method is to run k-Means on all the device fingerprints for a range of values of k (e.g., k from 1 to n) and calculate the sum of squared errors (SSE) for each value of k. Note that the SSE tends to decrease toward 0 as k increases. At last, we choose the value of k at which the SSE starts to diminish. It is worth mentioning that the time complexity of k-Means is $O(nkdi)$, where $n$ is the number of $d$-dimensional vectors, $k$ is the number of clusters, and $i$ is the number of iterations needed until convergence. In addition, the running time of the elbow method is linear in the number of users. In a crowdsensing system, the number of selected users for each task is usually limited and smaller than the total number of users [6]. Therefore, AG-FP is efficient in practice. Note that the device-fingerprint grouping result directly yields the account-grouping result, since each account is associated with one fingerprint.

As an example, we use 3 smartphones of different models, each collecting 5 fingerprint samples. Figure 6.2(a) shows the distribution of the fingerprints of these 3 smartphones in the feature space of the first two principal components (denoted by PC1 and PC2, respectively). Figure 6.2(b) shows the grouping result of k-Means with k = 3. We see that Smartphone 2 has a stable fingerprint (i.e., its 5 fingerprints are close together), and thus it can easily be differentiated from the other two smartphones. However, three fingerprints from Smartphone 1 are wrongly grouped with Smartphone 3; these are the false-positives of this grouping method.

Remarks: AG-FP can be used to defend against Attack-I, since it can diminish the impact of multiple data submitted from different accounts but the same device by the same Sybil attacker. However, a Sybil attacker can use multiple devices in Attack-II. To effectively defend against Attack-II, we propose the following two account-grouping methods.
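A sketch of the elbow estimation of k described above, using scikit-learn's k-Means, is given below. The rule for locating the elbow (the largest drop-off in SSE improvement) is one simple heuristic among several; the text does not pin down a specific one.

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_k(features, k_max):
    """Estimate the number of devices via the elbow method (a sketch).

    Runs k-Means for k = 1..k_max, records the SSE (inertia), and picks
    the k where the improvement in SSE shrinks the most.
    """
    sse = [KMeans(n_clusters=k, n_init=10, random_state=0)
           .fit(features).inertia_ for k in range(1, k_max + 1)]
    if len(sse) <= 2:
        return 1, sse
    gains = np.diff(sse)                  # (negative) improvement per extra cluster
    elbow = int(np.argmax(np.diff(gains))) + 2   # largest curvature of the SSE curve
    return elbow, sse
```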

Figure 6.2: Example of AG-FP. (a) Fingerprints of smartphones; (b) Grouping result.

Account Grouping by Task Set (AG-TS). As mentioned in Section 6.3.1, there should be a consistency among the accomplished task sets of the accounts from the same Sybil attacker. Inspired by [63], we design an account-grouping method that groups accounts by calculating the affinity between two accounts according to their accomplished task sets. The grouping method involves the following steps (a code sketch follows the list):

1. Let $T_{i,j}$ denote the number of tasks both accounts $i$ and $j$ have done, and let $L_{i,j}$ denote the number of tasks either $i$ or $j$ has done alone. Then the affinity between $i$ and $j$, denoted by $A_{i,j}$, is calculated as

$A_{i,j} = (T_{i,j} - 2L_{i,j}) \cdot \dfrac{T_{i,j} + L_{i,j}}{m}$,   (6.6)

where $m$ is the number of tasks. Note that the larger the affinity value, the more similar the accomplished task sets of the two accounts.

2. An undirected graph is constructed, in which accounts are the nodes and the undirected edge between $i$ and $j$ is weighted with the affinity value $A_{i,j}$. Note that only edges whose weights are greater than a threshold $\rho$ are included.

3. Connected components are discovered using the Depth-First Search (DFS) algorithm. Each component represents a set of accounts that have done a similar set of tasks.

4. Each component is a group, and any account that is not in any component is treated as a separate group.
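The following sketch implements the four steps above: pairwise affinities by Equation (6.6), thresholded edges, and connected components via DFS. Variable names are illustrative.

```python
from itertools import combinations

def ag_ts(task_sets, m, rho=1.0):
    """Sketch of AG-TS: affinity graph plus connected components via DFS.

    task_sets: {account: set of task ids}; m: total number of tasks.
    """
    accounts = list(task_sets)
    adj = {a: set() for a in accounts}
    for i, j in combinations(accounts, 2):
        t = len(task_sets[i] & task_sets[j])      # tasks both have done
        l = len(task_sets[i] ^ task_sets[j])      # tasks one has done alone
        affinity = (t - 2 * l) * (t + l) / m      # Equation (6.6)
        if affinity > rho:                        # keep only heavy edges
            adj[i].add(j)
            adj[j].add(i)
    groups, seen = [], set()
    for a in accounts:                            # iterative DFS over the graph
        if a in seen:
            continue
        stack, comp = [a], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        groups.append(comp)                       # singletons stay alone
    return groups
```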

To illustrate the process of this grouping method, we use the example in Table 6.3. This example has the same setting as the example in Table 6.1, except that the values in the table are the timestamps of the corresponding tasks.

Table 6.3: Example showing the Sybil attack in truth discovery

Account  τ1             τ2             τ3             τ4
1        10:00:35 a.m.  10:02:42 a.m.  10:10:22 a.m.  10:13:41 a.m.
2        -              10:04:15 a.m.  10:06:01 a.m.  -
3        10:01:21 a.m.  10:04:05 a.m.  -              10:08:28 a.m.
4′       10:01:10 a.m.  -              10:15:24 a.m.  10:20:06 a.m.
4′′      10:01:34 a.m.  -              10:16:08 a.m.  10:21:25 a.m.
4′′′     10:02:35 a.m.  -              10:17:35 a.m.  10:22:02 a.m.

Figure 6.3 shows the procedure of AG-TS. The adjacency matrix in Figure 6.3(a) shows the number of tasks both $i$ and $j$ have done, and the adjacency matrix in Figure 6.3(b) shows the number of tasks either $i$ or $j$ has done alone. Figure 6.3(c) shows the affinity value of $i$ with respect to $j$. We set the threshold $\rho = 1$, and the resulting undirected graph is shown in Figure 6.3(d). One component is constructed in this example, i.e., $\{1, 4', 4'', 4'''\}$. Accounts 2 and 3 are not in the component, so each of them is treated as a separate group. Therefore, the grouping result of this example consists of three groups, i.e., $\{1, 4', 4'', 4'''\}$, $\{2\}$, and $\{3\}$. Note that AG-TS places account 1 in the same group as the accounts used by the Sybil attacker; this is a false-positive.

Remarks: AG-TS can be used in scenarios in which accounts have diverse accomplished task sets. To handle the scenario in which most accounts have similar accomplished task sets, we propose the following grouping method.

Figure 6.3: Example of AG-TS. (a) $T_{i,j}$; (b) $L_{i,j}$; (c) $A_{i,j}$; (d) Undirected graph with $A_{i,j} > 1$.

Account Grouping by Trajectory (AG-TR). The sensing data $D_i = \{(d_j^i, t_j^i) \mid \tau_j \in T_i\}$ submitted by account $i$ can be regarded as two time series, i.e., a task series $X_i$ and a timestamp series $Y_i$. These two time series together can be regarded as the trajectory of an account. We design an account-grouping method that groups accounts by calculating the dissimilarity between two accounts according to their task series and timestamp series. This method relies on the principle that the accounts belonging to a Sybil attacker ought to perform a similar set of tasks in a similar time pattern.

We use Dynamic Time Warping (DTW) [123] to measure the distance between two time series, since it does not require the two series to be of the same length. Given two time series $A = a_1, a_2, \cdots, a_m$ and $B = b_1, b_2, \cdots, b_n$, we construct an $m$-by-$n$ matrix, where each element $(i, j)$ of the matrix is the squared distance $(a_i - b_j)^2$, representing the alignment between points $a_i$ and $b_j$. A warping path $W = \omega_1, \omega_2, \cdots, \omega_K$ is a contiguous set of matrix elements that defines a mapping between $A$ and $B$, where $\max(m, n) \leq K < m + n - 1$. The DTW distance is the path that minimizes the warping cost:

$DTW(A, B) = \min\Bigl\{\sqrt{\textstyle\sum_{k=1}^{K} \omega_k}\,/\,K\Bigr\}$.   (6.7)

This can be calculated using dynamic programming. Let $r(i, j)$ denote the cumulative distance, which is calculated as the distance $dist(a_i, b_j)$ of the current cell plus the minimum of the cumulative distances of the adjacent elements: $r(i, j) = dist(a_i, b_j) + \min\{r(i-1, j-1),\, r(i-1, j),\, r(i, j-1)\}$. Based on the DTW distances between accounts' task series and timestamp series, we propose an account-grouping method that involves the following steps (a code sketch follows the list):

1. The dissimilarity between accounts $i$ and $j$, denoted by $D_{i,j}$, is calculated as

$D_{i,j} = DTW(X_i, X_j) + DTW(Y_i, Y_j)$,   (6.8)

where $DTW(X_i, X_j)$ is the DTW distance between the task series of $i$ and $j$, and $DTW(Y_i, Y_j)$ is the DTW distance between the timestamp series of $i$ and $j$. Note that the smaller the dissimilarity value, the more similar the trajectories of the two accounts.

2. An undirected graph is constructed, in which accounts are the nodes and the undirected edge between $i$ and $j$ is weighted with the dissimilarity value $D_{i,j}$. Only edges whose weights are less than a threshold $\phi$ are included.

3. Connected components are discovered using the Depth-First Search (DFS) algorithm. Each component represents a set of accounts with similar trajectories.

4. Each component is a group, and any account that is not in any component is treated as a separate group.
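The DTW recurrence and the dissimilarity of Equation (6.8) can be sketched as follows. Note that, for brevity, this version omits the $1/K$ path-length normalization in Equation (6.7); the thresholded graph and DFS grouping are the same as in the AG-TS sketch above.

```python
import numpy as np

def dtw(a, b):
    """Cumulative-distance DTW via the r(i, j) recurrence above.

    Returns sqrt of the cumulative cost; the 1/K normalization of
    Equation (6.7) is omitted for brevity.
    """
    r = np.full((len(a) + 1, len(b) + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2          # squared distance
            r[i, j] = d + min(r[i - 1, j - 1], r[i - 1, j], r[i, j - 1])
    return float(np.sqrt(r[len(a), len(b)]))

def dissimilarity(x_i, y_i, x_j, y_j):
    """Equation (6.8): DTW over the task series plus DTW over the timestamps."""
    return dtw(x_i, x_j) + dtw(y_i, y_j)
```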

Figure 6.4: Example of AG-TR. (a) $DTW(X_i, X_j)$; (b) $DTW(Y_i, Y_j)$; (c) $D_{i,j}$; (d) Undirected graph with $D_{i,j} < 1$.

Next, we again use the example in Table 6.3 to illustrate the process of this grouping method. The two adjacency matrices in Figure 6.4(a) and Figure 6.4(b) show the DTW distances between $i$ and $j$ in terms of the task series and the timestamp series, respectively. The adjacency matrix in Figure 6.4(c) shows the dissimilarity value of $i$ with respect to $j$. We set the threshold $\phi = 1$, and the resulting undirected graph is shown in Figure 6.4(d). One component is constructed in this example, i.e., $\{4', 4'', 4'''\}$. Accounts 1, 2, and 3 are not in the component, so each of them is treated as a separate group. Therefore, the grouping result includes four groups, i.e., $\{4', 4'', 4'''\}$, $\{1\}$, $\{2\}$, and $\{3\}$. Compared with the result of AG-TS, AG-TR has fewer false-positives, since it correctly groups all the accounts used by the Sybil attacker into one group. The better performance of AG-TR stems from the fact that it considers not only the similarities in accounts' accomplished tasks, as AG-TS does, but also the similarities in the associated timestamps of these tasks.

In this work, the aforementioned three account-grouping methods are used independently in the framework; we leave their combination for future work. The thresholds $\rho$ in AG-TS and $\phi$ in AG-TR depend on the tasks in a crowdsensing system. A higher value of $\rho$ means that accounts are more likely to have common accomplished tasks, and a lower value of $\phi$ means that accounts are more likely to have similar trajectories. Note that AG-TS and AG-TR may produce false-positives, in which two legitimate users with similar accomplished tasks and similar trajectories are considered to be accounts from a Sybil attacker. This problem can be alleviated when the system uses existing incentive mechanisms [8, 9, 44] to incentivize and select users: if one of the two users is selected, the other is less likely to be selected by the incentive mechanism due to its low marginal contribution.

6.4 Experiment

Since there is no public dataset with Sybil attackers' behaviors for crowdsensing, we evaluate our framework by conducting experiments instead of large-scale simulations. In this section, we first describe our experimental setup. Then, we evaluate the three account-grouping methods. At last, we implement the framework with a truth discovery algorithm similar to CRH [21] and compare the aggregation accuracy of our proposed framework with that of CRH. As in [74], we use the mean absolute error (MAE) as the metric to measure aggregation accuracy, which is defined as $\frac{1}{m} \sum_{j=1}^{m} |d_j - d_j^*|$, where $m$ is the number of tasks, and $d_j$ and $d_j^*$ are the estimated truth and the ground truth for task $\tau_j$, respectively. The lower the MAE value, the higher the accuracy of the data aggregation. Note that we only compare our framework with CRH, since it is sufficient to represent existing truth discovery algorithms.
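The MAE metric itself is a one-liner; a sketch follows, where `estimated` and `ground_truth` map each task to its estimated and ground-truth value, respectively.

```python
def mae(estimated, ground_truth):
    """Mean absolute error over the m tasks (the accuracy metric used here)."""
    return (sum(abs(estimated[t] - ground_truth[t]) for t in ground_truth)
            / len(ground_truth))
```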

Figure 6.5: POIs for Wi-Fi signal strength measurement

6.4.1 Experimental Setup

In our experiment, we consider a crowdsensing system in which the tasks are to measure the Wi-Fi signal strength at 10 Points of Interest (POIs), as shown in Figure 6.5. We recruited 10 volunteers for our system; 8 volunteers acted as legitimate users and 2 volunteers acted as Sybil attackers. Each legitimate user has one account and uses one smartphone to perform tasks. Each of the two Sybil attackers has 5 accounts. One of the Sybil attackers conducts Attack-I with one smartphone, and the other conducts Attack-II with two smartphones of different models. Each account is only allowed to submit one sensing data point at one POI. Therefore, a Sybil attacker can submit at most 5 data points for one task using its 5 accounts. We assume that the objective of each Sybil attacker is to mislead the system, and thus both Sybil attackers fabricate their sensing data. Note that although we only have 2 Sybil attackers in our experiment, the experimental results can still represent the scenario in which a crowdsensing system is under a large-scale Sybil attack, since the percentage of Sybil accounts is larger than that of legitimate users. We collect the Wi-Fi signal strength at each POI multiple times and use the average as the ground truth. To measure the activeness of each account, we define

$\alpha_i = \dfrac{|T_i|}{m}$,   (6.9)

where $|T_i|$ is the number of tasks performed by $i$ and $m$ is the total number of tasks. In our experiment, each account has to perform at least two tasks, and thus $\alpha_i \in [0.2, 1]$. To some extent, the activeness is a good indicator of the contribution of legitimate accounts to the system; however, Sybil attackers with higher activeness can also cause more damage. In our experiment, each user performs tasks according to its own preference and the corresponding activeness. In total, we collect 54 walking traces. We use 11 smartphones in our experiment, and the distribution of these smartphones is listed in Table 6.4. One iPhone 6S is used to conduct Attack-I, and one iPhone SE and one Nexus 6P are used to conduct Attack-II. As in [116], we collect device fingerprints through the browser by using JavaScript to access the accelerometer and gyroscope. We use MIRtoolbox [125] to extract spectral features. Since AG-FP depends on the inherent imperfections of motion sensors to generate fingerprints, the smartphone needs to be kept stationary while collecting sensor data. Therefore, we ask users to hold their smartphones still for 6 seconds when they sign in to the system. Note that the Sybil attackers repeat this process when they change accounts.

6.4.2 Evaluation of Account Grouping

In this part, we first show the performance of AG-FP, and then compare the performance of the three proposed grouping methods in terms of the Adjusted Rand Index (ARI) [126], a metric widely used in machine learning to evaluate the performance of clustering. The value of ARI lies in the range [−1, 1], and the larger the value, the better the grouping result.

Table 6.4: Models of smartphones used in the experiment

OS       Model         Quantity
iOS      iPhone SE**   1
         iPhone 6      1
         iPhone 6S*    2
         iPhone 7      1
         iPhone X      1
Android  Nexus 6P**    3
         LG G5         1
         Nexus 5       1
Total                  11
* Used to conduct Attack-I. ** Used to conduct Attack-II.

Figure 6.6 shows the distribution of the centers of all 11 smartphones' fingerprints in the space of the first two principal components.

Figure 6.6: Smartphone fingerprints in the first two principal components' space

We see that the centers of the smartphones of the same model are very close, and thus it is hard to differentiate them; indeed, the smartphones of the same model are usually grouped together in our experiment. Therefore, the impact of Attack-I can be effectively diminished, since multiple data submitted by a Sybil attacker for one task are treated as a single data point. However, there are false-positives in this method: the smartphone used by a legitimate user might be grouped with other smartphones, either from legitimate users or from Sybil attackers. In the first case, the truth discovery result is not affected, since the aggregated data for the group is calculated from data submitted by legitimate users. In the second case, the aggregated data for the group will be close to the average of the data submitted by both the legitimate users and the Sybil attackers according to Equation (6.3), and thus the impact of the Sybil attack is diminished.

Figure 6.7 shows the ARI values of the three proposed grouping methods in different settings. In each setting, we fix the activeness of legitimate users and vary the activeness of Sybil attackers. Specifically, we consider three settings, i.e., α = 0.2, 0.5, and 1. In Figure 6.7, we see that the ARI of AG-FP decreases as the activeness increases. This is because, with more users in the system, there might be more smartphones of the same model, causing more false-positives. We also see that the ARI values of both AG-TS and AG-TR increase with activeness. This is because more information (i.e., more accomplished tasks and longer trajectories) can be used to differentiate accounts when accounts have higher activeness. In addition, we see that the performance of AG-TR is better than that of AG-TS. This is because AG-TR can still differentiate accounts according to their timestamp series when they have similar accomplished task sets.

6.4.3 Evaluation of Accuracy

We now use MAE as the metric to measure the accuracy of the proposed framework and compare it with CRH. We implement the framework with each of the aforementioned three account-grouping methods independently, denoted as TD-FP, TD-TS, and TD-TR, respectively.

Figure 6.7: ARI comparison. (a) Legitimate accounts' α = 0.2; (b) Legitimate accounts' α = 0.5; (c) Legitimate accounts' α = 1.

Figure 6.8 shows the MAE of our framework and of CRH in different settings. In each setting, we again fix the activeness of legitimate users and vary the activeness of Sybil attackers. We see that the MAE values of the four methods decrease with the increase of the activeness of legitimate users. This is because, with more data from legitimate users, it is harder for a Sybil attacker to manipulate the aggregated results. We also see that, fixing the activeness of legitimate users, the MAE increases with the activeness of Sybil attackers. This demonstrates the impact of the Sybil attack on the aggregated results: a larger activeness value of a Sybil attacker implies more false data, which may form a majority for a task and cause the aggregated result to be manipulated. As shown in Figure 6.8(c), the MAE of CRH remains large even with a high activeness of legitimate

users. On the contrary, the MAE of our proposed framework is always lower than that of CRH, no matter which grouping method is used. This is because our framework can diminish the impact of the Sybil attack by grouping data from suspicious accounts. The performance of TD-TR is better than that of TD-FP, since TD-TR can address both Attack-I and Attack-II. Meanwhile, TD-TR is better than TD-TS, since it has fewer false-positives in account grouping, as discussed before.

Figure 6.8: MAE comparison. (a) Legitimate accounts' α = 0.2; (b) Legitimate accounts' α = 0.5; (c) Legitimate accounts' α = 1.

6.5 Conclusion

In this work, we analyzed security issues in truth discovery raised by the Sybil attack. The main contributions of this work are as follows. First, we investigated the security problem of truth discovery under the Sybil attack: we characterized two types of Sybil attacks and demonstrated that existing truth discovery algorithms are vulnerable to them. Second, to diminish the impact of the Sybil attack, we proposed a Sybil-resistant truth discovery framework to ensure high aggregation accuracy under the Sybil attack. As an important component of the framework, we designed three effective account-grouping methods, each of which groups the accounts that are likely from the same Sybil attacker. Third, we evaluated the proposed framework through a real-world experiment. The results showed that existing truth discovery algorithms are vulnerable to the Sybil attack and that the proposed framework can effectively diminish its impact.

CHAPTER 7 CONCLUSION

7.1 Summary of Results

In this thesis, we analyzed privacy and security issues in crowdsensing. First, we focused on privacy issues in crowdsensing, which concern the safeguarding of users. We identified the privacy problems in task allocation and incentive mechanisms raised by the inference attack. To preserve users' location privacy during task allocation, we proposed in Chapter 3 two task allocation algorithms that maximize the number of assigned tasks while providing personalized location privacy protection against the location-inference attack. To protect users' bid privacy in incentive mechanisms, we designed in Chapter 4 two frameworks for privacy-preserving auction-based incentive mechanisms. Next, we shifted our focus to security issues in crowdsensing, which concern the safeguarding of the system. Specifically, we identified the security problems in incentive mechanisms and truth discovery raised by the Sybil attack. To deter users from conducting the Sybil attack in incentive mechanisms, we designed in Chapter 5 Sybil-proof incentive mechanisms for both offline and online scenarios. To diminish the impact of the Sybil attack on the aggregated data, we proposed in Chapter 6 a Sybil-resistant truth discovery framework.

The major novelty and contribution of this thesis lie in two parts. The first part is identifying privacy and security issues in crowdsensing. Although many works aim to improve the performance of crowdsensing systems, the potential privacy and security issues might undermine crowdsensing. In this thesis, we identified privacy and security issues in three parts of crowdsensing: task allocation, incentive mechanisms, and truth discovery. Pointing out these issues provides new aspects from which to evaluate the vulnerability of crowdsensing. The second part is solving the identified privacy and security issues. Solving privacy and security issues is much harder than identifying them; sometimes it is impossible to fully eliminate the impact of an attack. For each of the aforementioned issues, we proposed corresponding solutions. These solutions can be treated as guidelines for future solutions to the corresponding privacy and security issues.

7.2 Summary of Publications

In this section, I summarize the papers I authored or coauthored during my PhD study.

Published Articles:

The following papers are about preserving bid privacy in incentive mechanisms (Chapter 4).

[P1] Jian Lin, Dejun Yang, Ming Li, Jia Xu, and Guoliang Xue, “BidGuard: A framework for privacy-preserving crowdsensing incentive mechanisms”, CNS, 2016.

[P2] Jian Lin, Dejun Yang, Ming Li, Jia Xu, and Guoliang Xue, “Frameworks for Privacy-Preserving Mobile Crowdsensing Incentive Mechanisms”, IEEE Transactions on Mobile Computing (TMC), Vol. 17, No. 8, 2018.

The following papers are about deterring the Sybil attack in incentive mechanisms (Chap- ter 5).

[P3] Jian Lin, Ming Li, Dejun Yang, Guoliang Xue, and Jian Tang, “Sybil-Proof Incentive Mechanisms for Crowdsensing”, INFOCOM, 2017.

[P4] Jian Lin, Ming Li, Dejun Yang, Guoliang Xue, and Jian Tang, “Sybil-Proof Online Incentive Mechanisms for Crowdsensing”, INFOCOM, 2018.

The following paper is about alleviating the Sybil attack in truth discovery (Chapter 6).

[P5] Jian Lin, Dejun Yang, Kun Wu, Jian Tang, and Guoliang Xue, “A Sybil-Resistant Truth Discovery Framework for Mobile Crowdsensing”, ICDCS, 2019.

In addition, I coauthored the following papers.

[P6] Ming Li, Jian Lin, Dejun Yang, Guoliang Xue, and Jian Tang, “QUAC: Quality-Aware Contract-Based Incentive Mechanisms for Crowdsensing”, IEEE International Conference on Mobile Ad-hoc and Sensor Systems (MASS), 2017.

[P7] Michael Brown, Colin Marshall, Dejun Yang, Ming Li, Jian Lin, and Guoliang Xue, “Maximizing Capacity in Cognitive Radio Networks Under Physical Interference Model”, IEEE/ACM Transactions on Networking (TON), Vol. 25, No. 5, 2017.

[P8] Yuhui Zhang, Dejun Yang, Jian Lin, Ming Li, Guoliang Xue, Jian Tang, and Lei Xie, “Spectrum Auctions under Physical Interference Model”, IEEE Transactions on Cognitive Communications and Networking (TCCN), 2017.

[P9] Ming Li, Dejun Yang, Jian Lin, Ming Li, and Jian Tang, “SpecWatch: Adversarial Spectrum Usage Monitoring in CRNs with Unknown Statistics”, INFOCOM, 2016.

Manuscripts Under Review:

The following paper is about preserving location privacy in task allocation (Chapter 3).

[R1] Jian Lin, Ming Li, Yuhui Zhang, and Dejun Yang, “PASTA: Personalized Location Privacy-Aware Spatial Task Allocation for Mobile Crowdsensing”.

7.3 Future Research Opportunities

Although there are many works on privacy and security issues in crowdsensing, we believe that several research opportunities remain to be explored. In this section, we highlight some directions for future research.

One research direction is identifying new threats to crowdsensing. In this thesis, we focused on analyzing the impact of the inference attack and the Sybil attack on crowdsensing. However, other threats to crowdsensing have not been well explored, e.g., the collusion attack. Collusion among users might threaten both incentive mechanisms and truth discovery: a user might increase its utility by colluding with others, and an attacker might degrade the quality of the aggregated data through collusion.

Another research direction is to improve the performance of existing works. For example, in our work on alleviating the Sybil attack in truth discovery (Chapter 6), we proposed an account-grouping method based on device fingerprints and used k-means to group suspicious accounts (a minimal sketch of this grouping step appears at the end of this section). With the recent development of machine learning, other methods (e.g., artificial neural networks) might increase the grouping accuracy.

Finally, bridging the gap between idealized problem settings and real-world applications is another promising research direction. For example, in our work on preserving location privacy in task allocation (Chapter 3), we considered the offline scenario in which users first submit their locations to the system and the platform then assigns tasks. It is more challenging to consider the online scenario in which users participate in a random order: once a user arrives, the system has to decide immediately whether to select it and which tasks to assign to it.
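To make the k-means-based grouping step mentioned above concrete, the following is a minimal sketch, assuming each suspicious account has already been summarized as a fixed-length device-fingerprint feature vector; it uses scikit-learn, and all function and variable names are illustrative rather than the implementation of Chapter 6.

```python
# Minimal sketch: group suspicious accounts by clustering their device
# fingerprints with k-means. Accounts whose fingerprints fall into the same
# cluster are treated as likely controlled by the same Sybil attacker.
# Names and parameters here are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def group_accounts(fingerprints: np.ndarray, n_groups: int) -> dict:
    """Map each cluster label to the list of account indices assigned to it."""
    # Standardize features so no single fingerprint dimension dominates.
    scaled = StandardScaler().fit_transform(fingerprints)
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(scaled)
    groups: dict = {}
    for account_id, label in enumerate(labels):
        groups.setdefault(int(label), []).append(account_id)
    return groups

# Toy example: accounts 0-2 come from one device (near-identical fingerprints),
# while accounts 3-5 come from distinct devices.
rng = np.random.default_rng(0)
same_device = rng.normal(0.0, 0.05, size=(3, 4))
distinct = rng.normal(1.0, 0.5, size=(3, 4))
print(group_accounts(np.vstack([same_device, distinct]), n_groups=4))
```

A learned grouping method, e.g., clustering neural-network embeddings of the fingerprints rather than the raw features, would slot into the same interface.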


APPENDIX COPYRIGHT PERMISSION


A Sybil-Resistant Truth Discovery Framework for Mobile Crowdsensing

Conference Proceedings: 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS) Author: Jian Lin Publisher: IEEE Date: Jul 2019

Copyright © 2019, IEEE

Thesis / Dissertation Reuse

The IEEE does not require individuals working on a thesis to obtain a formal reuse license, however, you may print out this statement to be used as a permission grant:

Requirements to be followed when using any portion (e.g., figure, graph, table, or textual material) of an IEEE copyrighted paper in a thesis:

1) In the case of textual material (e.g., using short quotes or referring to the work within these papers) users must give full credit to the original source (author, paper, publication) followed by the IEEE copyright line © 2011 IEEE. 2) In the case of illustrations or tabular material, we require that the copyright line © [Year of original publication] IEEE appear prominently with each reprinted figure and/or table. 3) If a substantial portion of the original paper is to be used, and if you are not the senior author, also obtain the senior author's approval.

Requirements to be followed when using an entire IEEE copyrighted paper in a thesis:

1) The following IEEE copyright/credit notice should be placed prominently in the references: © [year of original publication] IEEE. Reprinted, with permission, from [author names, paper title, IEEE publication title, and month/year of publication] 2) Only the accepted version of an IEEE copyrighted paper can be used when posting the paper or your thesis online. 3) In placing the thesis on the author's university website, please display the following message in a prominent place on the website: In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of [university/educational entity's name goes here]'s products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.

If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies of the dissertation.




BidGuard: A framework for privacy-preserving crowdsensing incentive mechanisms

Conference Proceedings: 2016 IEEE Conference on Communications and Network Security (CNS) Author: Jian Lin; Dejun Yang; Ming Li; Jia Xu; Guoliang Xue Publisher: IEEE Date: 17-19 Oct. 2016

Copyright © 2016, IEEE

Thesis / Dissertation Reuse

The IEEE does not require individuals working on a thesis to obtain a formal reuse license, however, you may print out this statement to be used as a permission grant:

Requirements to be followed when using any portion (e.g., figure, graph, table, or textual material) of an IEEE copyrighted paper in a thesis:

1) In the case of textual material (e.g., using short quotes or referring to the work within these papers) users must give full credit to the original source (author, paper, publication) followed by the IEEE copyright line © 2011 IEEE. 2) In the case of illustrations or tabular material, we require that the copyright line © [Year of original publication] IEEE appear prominently with each reprinted figure and/or table. 3) If a substantial portion of the original paper is to be used, and if you are not the senior author, also obtain the senior author's approval.

Requirements to be followed when using an entire IEEE copyrighted paper in a thesis:

1) The following IEEE copyright/credit notice should be placed prominently in the references: © [year of original publication] IEEE. Reprinted, with permission, from [author names, paper title, IEEE publication title, and month/year of publication] 2) Only the accepted version of an IEEE copyrighted paper can be used when posting the paper or your thesis online. 3) In placing the thesis on the author's university website, please display the following message in a prominent place on the website: In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of [university/educational entity's name goes here]'s products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.

If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies of the dissertation.




Frameworks for Privacy-Preserving Mobile Crowdsensing Incentive Mechanisms

Author: Jian Lin Publication: Mobile Computing, IEEE Transactions on Publisher: IEEE Date: 1 Aug. 2018

Copyright © 2018, IEEE

Thesis / Dissertation Reuse

The IEEE does not require individuals working on a thesis to obtain a formal reuse license, however, you may print out this statement to be used as a permission grant:

Requirements to be followed when using any portion (e.g., figure, graph, table, or textual material) of an IEEE copyrighted paper in a thesis:

1) In the case of textual material (e.g., using short quotes or referring to the work within these papers) users must give full credit to the original source (author, paper, publication) followed by the IEEE copyright line © 2011 IEEE. 2) In the case of illustrations or tabular material, we require that the copyright line © [Year of original publication] IEEE appear prominently with each reprinted figure and/or table. 3) If a substantial portion of the original paper is to be used, and if you are not the senior author, also obtain the senior author's approval.

Requirements to be followed when using an entire IEEE copyrighted paper in a thesis:

1) The following IEEE copyright/credit notice should be placed prominently in the references: © [year of original publication] IEEE. Reprinted, with permission, from [author names, paper title, IEEE publication title, and month/year of publication] 2) Only the accepted version of an IEEE copyrighted paper can be used when posting the paper or your thesis online. 3) In placing the thesis on the author's university website, please display the following message in a prominent place on the website: In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of [university/educational entity's name goes here]'s products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.

If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies of the dissertation.




Sybil-proof incentive mechanisms for crowdsensing

Conference Proceedings: IEEE INFOCOM 2017 - IEEE Conference on Computer Communications Author: Jian Lin; Ming Li; Dejun Yang; Guoliang Xue; Jian Tang Publisher: IEEE Date: 1-4 May 2017

Copyright © 2017, IEEE

Thesis / Dissertation Reuse

The IEEE does not require individuals working on a thesis to obtain a formal reuse license, however, you may print out this statement to be used as a permission grant:

Requirements to be followed when using any portion (e.g., figure, graph, table, or textual material) of an IEEE copyrighted paper in a thesis:

1) In the case of textual material (e.g., using short quotes or referring to the work within these papers) users must give full credit to the original source (author, paper, publication) followed by the IEEE copyright line © 2011 IEEE. 2) In the case of illustrations or tabular material, we require that the copyright line © [Year of original publication] IEEE appear prominently with each reprinted figure and/or table. 3) If a substantial portion of the original paper is to be used, and if you are not the senior author, also obtain the senior author's approval.

Requirements to be followed when using an entire IEEE copyrighted paper in a thesis:

1) The following IEEE copyright/credit notice should be placed prominently in the references: © [year of original publication] IEEE. Reprinted, with permission, from [author names, paper title, IEEE publication title, and month/year of publication] 2) Only the accepted version of an IEEE copyrighted paper can be used when posting the paper or your thesis online. 3) In placing the thesis on the author's university website, please display the following message in a prominent place on the website: In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of [university/educational entity's name goes here]'s products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.

If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies of the dissertation.




Sybil-Proof Online Incentive Mechanisms for Crowdsensing

Conference Proceedings: IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Author: Jian Lin Publisher: IEEE Date: April 2018

Copyright © 2018, IEEE

Thesis / Dissertation Reuse

The IEEE does not require individuals working on a thesis to obtain a formal reuse license, however, you may print out this statement to be used as a permission grant:

Requirements to be followed when using any portion (e.g., figure, graph, table, or textual material) of an IEEE copyrighted paper in a thesis:

1) In the case of textual material (e.g., using short quotes or referring to the work within these papers) users must give full credit to the original source (author, paper, publication) followed by the IEEE copyright line © 2011 IEEE. 2) In the case of illustrations or tabular material, we require that the copyright line © [Year of original publication] IEEE appear prominently with each reprinted figure and/or table. 3) If a substantial portion of the original paper is to be used, and if you are not the senior author, also obtain the senior author's approval.

Requirements to be followed when using an entire IEEE copyrighted paper in a thesis:

1) The following IEEE copyright/credit notice should be placed prominently in the references: © [year of original publication] IEEE. Reprinted, with permission, from [author names, paper title, IEEE publication title, and month/year of publication] 2) Only the accepted version of an IEEE copyrighted paper can be used when posting the paper or your thesis online. 3) In placing the thesis on the author's university website, please display the following message in a prominent place on the website: In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of [university/educational entity's name goes here]'s products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.

If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies of the dissertation.



Jian Lin

Request for the copyright permission 2 messages

Jian Thu, Oct 31, 2019 at 1:47 PM To: Dejun Yang

Dear Professor Dejun Yang,

I am writing to ask your permission to incorporate the materials in the following papers you coauthored into my PhD thesis.

1. Jian Lin, Dejun Yang, Ming Li, Jia Xu, and Guoliang Xue, "BidGuard: A framework for privacy-preserving crowdsensing incentive mechanisms", CNS, 2016.

2. Jian Lin, Dejun Yang, Ming Li, Jia Xu and Guoliang Xue, "Frameworks for Privacy-Preserving Mobile Crowdsensing Incentive Mechanisms", IEEE Transactions on Mobile Computing (TMC), 2018.

3. Jian Lin, Ming Li, Dejun Yang, Guoliang Xue, and Jian Tang, "Sybil-proof incentive mechanisms for crowdsensing", INFOCOM, 2017.

4. Jian Lin, Ming Li, Dejun Yang, Guoliang Xue and Jian Tang, "Sybil-Proof Online Incentive Mechanisms for Crowdsensing", INFOCOM, 2018.

5. Jian Lin, Dejun Yang, Kun Wu, Jian Tang and Guoliang Xue, "A Sybil-Resistant Truth Discovery Framework for Mobile Crowdsensing", ICDCS, 2019

If you agree to grant me the permission, you can simply reply “I grant Jian Lin permission to use the above referenced materials.”

Best, Jian Lin

Dejun Yang Thu, Oct 31, 2019 at 1:48 PM To: Jian Lin

I grant Jian Lin permission to use the above referenced materials.

Sent from my phone. Please excuse typos.

From: Jian Sent: Thursday, October 31, 2019 1:47:10 PM To: Dejun Yang Subject: Request for the copyright permission


Jian Lin

Request for the copyright permission 3 messages

Jian Thu, Oct 31, 2019 at 1:49 PM To: Ming Li

Dear Ming Li,

I am writing to ask your permission to incorporate the materials in the following papers you coauthored into my PhD thesis.

1. Jian Lin, Dejun Yang, Ming Li, Jia Xu, and Guoliang Xue, "BidGuard: A framework for privacy-preserving crowdsensing incentive mechanisms", CNS, 2016.

2. Jian Lin, Dejun Yang, Ming Li, Jia Xu and Guoliang Xue, "Frameworks for Privacy-Preserving Mobile Crowdsensing Incentive Mechanisms", IEEE Transactions on Mobile Computing (TMC), 2018.

3. Jian Lin, Ming Li, Dejun Yang, Guoliang Xue, and Jian Tang, "Sybil-proof incentive mechanisms for crowdsensing", INFOCOM, 2017.

4. Jian Lin, Ming Li, Dejun Yang, Guoliang Xue and Jian Tang, "Sybil-Proof Online Incentive Mechanisms for Crowdsensing", INFOCOM, 2018.

If you agree to grant me the permission, you can simply reply “I grant Jian Lin permission to use the above referenced materials.”

Best, Jian Lin

Ming Li Thu, Oct 31, 2019 at 1:56 PM To: Jian

Hi Jian,

Sure, I approve.

Best, Ming -- Ming Li (李明) PhD candidate, Computer Science Colorado School of Mines

Ming Li Thu, Oct 31, 2019 at 8:53 PM To: Jian

Hi Jian,

I grant Jian Lin permission to use the above referenced materials.

Best, Ming


Jian Lin

Request for the copyright permission 3 messages

Jian Thu, Oct 31, 2019 at 1:50 PM To: Guoliang Xue

Dear Professor Guoliang Xue,

I am writing to ask your permission to incorporate the materials in the following papers you coauthored into my PhD thesis.

1. Jian Lin, Dejun Yang, Ming Li, Jia Xu, and Guoliang Xue, "BidGuard: A framework for privacy-preserving crowdsensing incentive mechanisms", CNS, 2016.

2. Jian Lin, Dejun Yang, Ming Li, Jia Xu and Guoliang Xue, "Frameworks for Privacy-Preserving Mobile Crowdsensing Incentive Mechanisms", IEEE Transactions on Mobile Computing (TMC), 2018.

3. Jian Lin, Ming Li, Dejun Yang, Guoliang Xue, and Jian Tang, "Sybil-proof incentive mechanisms for crowdsensing", INFOCOM, 2017.

4. Jian Lin, Ming Li, Dejun Yang, Guoliang Xue and Jian Tang, "Sybil-Proof Online Incentive Mechanisms for Crowdsensing", INFOCOM, 2018.

5. Jian Lin, Dejun Yang, Kun Wu, Jian Tang and Guoliang Xue, "A Sybil-Resistant Truth Discovery Framework for Mobile Crowdsensing", ICDCS, 2019

If you agree to grant me the permission, you can simply reply “I grant Jian Lin permission to use the above referenced materials.”

Best, Jian Lin

Guoliang Xue Thu, Oct 31, 2019 at 4:13 PM To: Jian

Dear Jian,

You certainly have my permission.

Best regards,

Guoliang -- Guoliang (Larry) Xue, IEEE Fellow Area Editor (Wireless Networking), IEEE Transactions on Wireless Communications Professor of Computer Science and Engineering Arizona State University, Tempe, AZ 85287 URL: http://optimization.asu.edu/~xue


Jian Lin Thu, Oct 31, 2019 at 8:51 PM To: Guoliang Xue

Got it. Thank you very much for your permission.

Jian Lin

Request for the copyright permission 3 messages

Jian Thu, Oct 31, 2019 at 1:53 PM To: Kun Wu

Dear Kun Wu,

I am writing to ask your permission to incorporate the materials in the following paper you coauthored into my PhD thesis.

Jian Lin, Dejun Yang, Kun Wu, Jian Tang and Guoliang Xue, "A Sybil-Resistant Truth Discovery Framework for Mobile Crowdsensing", ICDCS, 2019

If you agree to grant me the permission, you can simply reply “I grant Jian Lin permission to use the above referenced materials.”

Best, Jian Lin

Kun Wu Thu, Oct 31, 2019 at 8:01 PM To: Jian

I grant Jian Lin permission to use the above referenced materials.

Best, Kun Wu

Jian Lin Thu, Oct 31, 2019 at 8:57 PM To: Kun Wu

Thank you very much!

Jian Lin

Request for the copyright permission 2 messages

Jian Thu, Oct 31, 2019 at 1:52 PM To: Jian Tang

Dear Professor Jian Tang,

I am writing to ask your permission to incorporate the materials in the following papers you coauthored into my PhD thesis.

1. Jian Lin, Ming Li, Dejun Yang, Guoliang Xue, and Jian Tang, "Sybil-proof incentive mechanisms for crowdsensing", INFOCOM, 2017.

2. Jian Lin, Ming Li, Dejun Yang, Guoliang Xue and Jian Tang, "Sybil-Proof Online Incentive Mechanisms for Crowdsensing", INFOCOM, 2018.

3. Jian Lin, Dejun Yang, Kun Wu, Jian Tang and Guoliang Xue, "A Sybil-Resistant Truth Discovery Framework for Mobile Crowdsensing", ICDCS, 2019

If you agree to grant me the permission, you can simply reply “I grant Jian Lin permission to use the above referenced materials.”

Best, Jian Lin

Jian Tang Thu, Oct 31, 2019 at 7:46 PM To: Jian Cc: "DJ ([email protected])"

Sure, of course.

Best, Jian

Jian Lin

Request for the copyright permission 3 messages

Jian Thu, Oct 31, 2019 at 1:55 PM To: [email protected]

Dear Professor Jia Xu,

I am writing to ask your permission to incorporate the materials in the following papers you coauthored into my PhD thesis.

1. Jian Lin, Dejun Yang, Ming Li, Jia Xu, and Guoliang Xue, "BidGuard: A framework for privacy-preserving crowdsensing incentive mechanisms", CNS, 2016.

2. Jian Lin, Dejun Yang, Ming Li, Jia Xu and Guoliang Xue, "Frameworks for Privacy-Preserving Mobile Crowdsensing Incentive Mechanisms", IEEE Transactions on Mobile Computing (TMC), 2018.

If you agree to grant me the permission, you can simply reply “I grant Jian Lin permission to use the above referenced materials.”

Best, Jian Lin

xujia Thu, Oct 31, 2019 at 11:40 PM To: Jian

Jian,

I grant Jian Lin permission to use the above referenced materials.

Jia Xu

2019-11-01

xujia

From: Jian Sent: 2019-11-01 04:21:48 To: xujia Cc: Subject: Request for the copyright permission

Jian Lin Fri, Nov 1, 2019 at 9:08 AM To: xujia

Thank you very much.
