arXiv:2004.08856v2 [cs.CR] 22 Dec 2020

This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

Local Differential Privacy based Federated Learning for Internet of Things

Yang Zhao, Graduate Student Member, IEEE, Jun Zhao, Member, IEEE, Mengmeng Yang, Member, IEEE, Teng Wang, Member, IEEE, Ning Wang, Member, IEEE, Lingjuan Lyu, Member, IEEE, Dusit Niyato, Fellow, IEEE, and Kwok-Yan Lam, Senior Member, IEEE

Yang Zhao, Jun Zhao, Mengmeng Yang, Dusit Niyato, and Kwok-Yan Lam are with the School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798. Teng Wang is with the School of Cyberspace Security, Xi'an University of Posts and Telecommunications, China. Ning Wang is with the College of Information Science and Engineering, Ocean University of China, China 266102. Lingjuan Lyu is with the Department of Computer Science, National University of Singapore, Singapore 117417. (Corresponding author: Jun Zhao.)

Yang Zhao and Jun Zhao are supported by 1) Nanyang Technological University (NTU) Startup Grant, 2) Alibaba-NTU Joint Research Institute (JRI), 3) Singapore Ministry of Education Academic Research Fund Tier 1 RG128/18, Tier 1 RG115/19, Tier 1 RT07/19, Tier 2 MOE2019-T2-1-176, 4) NTU-WASP Joint Project, 5) Singapore National Research Foundation (NRF) under its Strategic Capability Research Centres Funding Initiative: Strategic Centre for Research in Privacy-Preserving Technologies & Systems (SCRIPTS), 6) Energy Research Institute @NTU (ERIAN), 7) Singapore NRF National Satellite of Excellence, Design Science and Technology for Secure Critical Infrastructure NSoE DeST-SCI2019-0012, 8) AI Singapore (AISG) 100 Experiments (100E) Program, and 9) NTU Project for Large Vertical Take-Off & Landing (VTOL) Research Platform. Mengmeng Yang and Kwok-Yan Lam are supported by the National Research Foundation, Prime Minister's Office, Singapore under its Strategic Capability Research Centres Funding Initiative. Ning Wang is supported by the National Natural Science Foundation of China (61902365) and the China Postdoctoral Science Foundation (2019M652473). Dusit Niyato is supported by the National Research Foundation (NRF), Singapore, under the Energy Market Authority (EMA) grant EP003-041, Singapore NRF2015-NRF-ISF001-2277, Singapore NRF National Satellite of Excellence in Design Science and Technology for Secure Critical Infrastructure NSoE DeST-SCI2019-0007, A*STAR-NTU-SUTD Joint Research Grant RGANS1906, Wallenberg AI, Autonomous Systems and Software Program and Nanyang Technological University (WASP/NTU) under grant M4082187 (4080), Singapore Ministry of Education (MOE) Tier 1 (RG16/20), NTU-WeBank JRI (NWJ-2020-004), and Alibaba Group through the Alibaba Innovative Research (AIR) Program and Alibaba-NTU JRI.

Abstract—The Internet of Vehicles (IoV) is a promising branch of the Internet of Things. IoV simulates a large variety of crowdsourcing applications such as Waze, Uber, and Amazon Mechanical Turk, etc. Users of these applications report real-time traffic information to the cloud server, which trains a machine learning model based on users' reported traffic information for intelligent traffic management. However, crowdsourcing application owners can easily infer users' location information, traffic information, motor vehicle information, environmental information, etc., which raises severe privacy concerns about users' sensitive personal information. In addition, as the number of vehicles increases, the frequent communication between vehicles and the cloud server incurs an unexpected amount of communication cost. To avoid the privacy threat and reduce the communication cost, in this paper, we propose to integrate federated learning and local differential privacy (LDP) to facilitate the crowdsourcing applications to achieve the machine learning model. Specifically, we propose four LDP mechanisms to perturb gradients generated by vehicles. The proposed Three-Outputs mechanism introduces three different output possibilities to deliver a high accuracy when the privacy budget is small. The output possibilities of Three-Outputs can be encoded with two bits to reduce the communication cost. Besides, to maximize the performance when the privacy budget is large, an optimal piecewise mechanism (PM-OPT) is proposed. We further propose a suboptimal mechanism (PM-SUB) with a simple formula and comparable utility to PM-OPT. Then, we build a novel hybrid mechanism by combining Three-Outputs and PM-SUB. Finally, an LDP-FedSGD algorithm is proposed to coordinate the cloud server and vehicles to train the model collaboratively. Extensive experimental results on real-world datasets validate that our proposed algorithms are capable of protecting privacy while guaranteeing utility.

The development of sensors and communication technologies for the Internet of Things (IoT) has enabled a fast and large-scale collection of user data, which has bred intelligent transportation applications, such as the Waze application that provides routing service, whose new services benefit users' daily life. However, this kind of data may raise users' privacy concerns, as it contains sensitive data such as users' location information. To address these concerns, we propose a hybrid approach that integrates federated learning (FL) [1] with local differential privacy (LDP) [2] techniques. FL can facilitate users' collaborative learning by sharing uploaded gradients instead of raw data. Still, an honest-but-curious aggregator may be able to leverage users' uploaded gradients to infer the original data [3,4]. Thus, we deploy LDP noises to gradients to ensure privacy while not compromising the utility of gradients. In addition to LDP mechanisms, FL also provides privacy protection to the data by enabling users to maintain data locally.

Federated learning with LDP. The FedSGD algorithm [5] allows users to submit gradients instead of true data. However, attackers may reverse the gradients to obtain the original data. By adding LDP noises to the gradients before uploading, we prevent attackers from deducing the original data even though they obtain users' submitted perturbed gradients. As a result, the FL server gathers and averages the perturbed gradients to update the global model's parameters.

Existing LDP mechanisms. Since the proposal of LDP in [7], various LDP mechanisms have been proposed in the literature. Mechanisms for categorical data are presented in [9]–[11]. For numerical data, for which new LDP mechanisms are developed in this paper, the prior LDP mechanisms of [6]–[8] are discussed as follows.
For simplicity, we consider data with a single numeric attribute which has a domain [−1, 1] (after normalization if the original domain is not [−1, 1]). Extensions to the case of multiple numeric attributes will be discussed later in the paper.

• Laplace of [7]. LDP can be understood as a variant of DP, where the difference is the definition of "neighboring data(sets)". In DP, two datasets are neighboring if they differ in just one record; in LDP, any two instances of the user's data are neighboring. Due to this connection between DP and LDP, the classical Laplace mechanism (referred to as Laplace hereinafter) for DP can thus also be used to achieve LDP. Yet, Laplace may not achieve a high utility for some ǫ since ① Laplace does not consider the difference between DP and LDP.

• Duchi of [6]. In view of the above drawback ① of Laplace, Duchi et al. [6] introduce alternative mechanisms for LDP. For a single numeric attribute, one mechanism, hereinafter referred to as Duchi of [6], flips a coin with two possibilities to generate an output, where the probability of each possibility depends on the input. A disadvantage of Duchi is as follows: ② since the output of Duchi has only two possibilities, the utility may not be high for large ǫ (intuitively, for large ǫ, the privacy protection is weak, so the output should be close to the input, which means the output should have many possibilities since the input can take any value in [−1, 1])¹.

• Piecewise Mechanism (PM) of [8]. Due to the drawback ① of Laplace and the drawback ② of Duchi above, Wang et al. [8] propose the Piecewise Mechanism (PM), which achieves higher utility than Duchi for large ǫ, since the output range of PM is continuous and has infinite possibilities, instead of just 2 possibilities as in Duchi. In PM, the plot of the output's probability density function with respect to the output value consists of three "pieces", among which the center piece has a higher probability than the other two. As the input increases, the length of the center piece remains unchanged, but the length of the leftmost (resp., rightmost) piece increases (resp., decreases). Since PM is tailored for LDP, unlike Laplace, PM has a strictly lower worst-case variance (i.e., the maximum variance with respect to the input given ǫ) than Laplace for any ǫ.

TABLE I: A comparison of the worst-case variances of existing ǫ-LDP mechanisms on a single numeric attribute with a domain [−1, 1]: Duchi of [6] generating a binary output, Laplace of [7] with the addition of Laplace noise, and the Piecewise Mechanism (PM) of [8]. For an LDP mechanism A, its worst-case variance is denoted by V_A. We obtain this table based on results of [8].

  Range of ǫ         Comparison of mechanisms
  0 < ǫ < 1.29       V_Duchi < V_PM < V_Laplace
  1.29 < ǫ < 2.32    V_PM < V_Duchi < V_Laplace
  ǫ > 2.32           V_PM < V_Laplace < V_Duchi

Proposing new LDP mechanisms. Based on results of [8], Table I on Page 2 shows that among the three mechanisms Duchi, Laplace, and PM, in terms of the worst-case variance, Duchi is the best for 0 < ǫ < 1.29, while PM is the best for ǫ > 1.29. Then a natural research question is: can we propose better or even optimal LDP mechanisms? The optimal LDP mechanism for numeric data is still open in the literature, but the optimal LDP mechanism for categorical data has been discussed by Kairouz et al. [12]. In particular, [12] shows that for a categorical attribute (with a limited number of discrete values), the binary and randomized response mechanisms are universally optimal for very small and large ǫ, respectively. Although [12] handles categorical data, its results can provide the following insight even for continuous numeric data: for very small ǫ, a mechanism generating a binary output should be optimal; for very large ǫ, the optimal mechanism's output should have infinite possibilities. This is also consistent with the results of Table I on Page 2. Based on the above insight, intuitively, there may exist a range of medium ǫ where a mechanism with three, or four, or five, ... output possibilities can be optimal. To this end, our first motivation for proposing new LDP mechanisms is to develop a mechanism with three output possibilities such that the mechanism outperforms the existing mechanisms of Table I for some ǫ. The outcome of the above motivation is our mechanism called Three-Outputs. Since the analysis of Three-Outputs is already very complex, we do not consider a mechanism with four, or five, ... output possibilities.

In addition, our second motivation for proposing new LDP mechanisms is that despite the elegance of the Piecewise Mechanism (PM) of [8], we can derive the optimal mechanism PM-OPT under the "piecewise framework" of [8], in order to improve PM. Note that PM-OPT is just optimal under the above framework and may not be the optimal LDP mechanism.

TABLE II: A comparison of the worst-case variances of our and existing ǫ-LDP mechanisms on a single numeric attribute with a domain [−1, 1]. Three-Outputs and PM-SUB are our main LDP mechanisms proposed in this paper. The results in this table show the advantages of our mechanisms over existing mechanisms for a wide range of the privacy parameter ǫ.

  Range of ǫ              Comparison of mechanisms
  0 < ǫ < ln 2 ≈ 0.69     V_Duchi = V_Three-Outputs < V_PM-SUB < V_PM
  ln 2 < ǫ < 1.19         V_Three-Outputs < V_Duchi < V_PM-SUB < V_PM
  1.19 < ǫ < 1.29         V_Three-Outputs < V_PM-SUB < V_Duchi < V_PM
  1.29 < ǫ < 2.56         V_Three-Outputs < V_PM-SUB < V_PM < V_Duchi
  2.56 < ǫ < 3.27         V_PM-SUB < V_Three-Outputs < V_PM < V_Duchi
  ǫ > 3.27                V_PM-SUB < V_PM < V_Three-Outputs < V_Duchi

¹ Note that any algorithm satisfying DP or LDP has the following property: the set of possible values for the output does not depend on the input (though the output distribution depends on the input). This can be easily seen by contradiction. Suppose an output y is possible for input x but not for x′ (x and x′ satisfy the neighboring relation in DP or LDP). Then P[y | x] > 0 and P[y | x′] = 0, resulting in P[y | x] > e^ǫ P[y | x′] and hence violating the privacy requirement (P[· | ·] denotes conditional probability).
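To make the three-piece construction described above concrete, here is a minimal Python sketch of PM's sampling step, following the description of [8]; the function name and parameterization are ours, and multi-attribute handling is omitted:

```python
import math
import random

def piecewise_mechanism(x, epsilon, rng=random):
    """Perturb x in [-1, 1] under epsilon-LDP with the Piecewise Mechanism.

    The output lies in [-C, C]; the density is uniform and higher on a
    center piece [l, r] of fixed length C - 1, and uniform and lower on
    the two outer pieces, which makes the output unbiased: E[Y | x] = x.
    """
    e_half = math.exp(epsilon / 2)
    C = (e_half + 1) / (e_half - 1)
    l = (C + 1) / 2 * x - (C - 1) / 2   # left end of the center piece
    r = l + C - 1                       # right end of the center piece
    if rng.random() < e_half / (e_half + 1):
        return rng.uniform(l, r)        # high-probability center piece
    # Otherwise sample uniformly from the outer pieces [-C, l] and [r, C].
    left_len, right_len = l + C, C - r
    u = rng.uniform(0, left_len + right_len)
    return (-C + u) if u < left_len else (r + (u - left_len))
```

Since the mechanism is unbiased, an estimate of the population mean is simply the average of the perturbed values.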
Since the expressions for PM-OPT are quite complex, we present PM-SUB, which compared with PM-OPT is suboptimal, but has simpler expressions and achieves a comparable utility.

Table II on Page 2 gives a comparison of our and existing LDP mechanisms. For simplicity, we do not include Laplace in the comparison since Laplace is worse than PM for any ǫ according to Table I. As shown in Table II, in terms of the worst-case variance for a single numeric attribute with a domain [−1, 1], Three-Outputs outperforms Duchi for ǫ > ln 2 ≈ 0.69 (and is the same as Duchi for ǫ ≤ ln 2), while PM-SUB beats PM for any ǫ; moreover, Three-Outputs outperforms both Duchi and PM for ln 2 < ǫ < 3.27, while PM-SUB beats both Duchi and PM for ǫ > 1.19.

We also follow the practice of [8], which combines the different mechanisms Duchi and PM to obtain a hybrid mechanism HM. HM has a lower worst-case variance than those of Duchi and PM. In this paper, we combine Three-Outputs and PM-SUB to propose HM-TP. The intuition is as follows. Given ǫ, the variances of Three-Outputs and PM-SUB (denoted by T_ǫ(x) and P_ǫ(x)) depend on the input x, and their maximal values (i.e., the worst-case variances) may be taken at different x. Hence, for a hybrid mechanism which probabilistically uses Three-Outputs with probability q or PM-SUB with probability 1 − q, given ǫ, the worst-case variance max_x (q · T_ǫ(x) + (1 − q) · P_ǫ(x)) may be strictly smaller than the minimum of max_x T_ǫ(x) and max_x P_ǫ(x). We optimize q for each ǫ to obtain HM-TP. Due to the already complex expression of PM-OPT, we do not consider the combination of PM-OPT and Three-Outputs.

Contributions. Our contributions can be summarized as follows:

• Using the LDP-FedSGD algorithm for federated learning in IoV as a motivating context, we present novel LDP mechanisms for numeric data with a continuous domain. Among our proposed mechanisms, Three-Outputs and PM-SUB outperform existing mechanisms for a wide range of ǫ, as shown in the theoretical results in Table II and confirmed by experiments. In terms of comparing our Three-Outputs and PM-SUB, we have: Three-Outputs, whose output has three possibilities, is better for small ǫ, while PM-SUB, whose output can take infinite possibilities of an interval, has higher utility for large ǫ. Our PM-SUB is a slightly suboptimal version of our PM-OPT to simplify the expressions. We further combine Three-Outputs and PM-SUB to obtain a hybrid mechanism HM-TP, which achieves even higher utility.

• We discretize the continuous output ranges of our proposed mechanisms PM-SUB and PM-OPT. Through the discretization post-processing, we enable vehicles to use our proposed mechanisms. In Section VII, we confirm with our experiments that the discretization post-processing algorithm maintains utility while reducing the communication cost.

• Experimental evaluation of our proposed mechanisms on real-world datasets and synthetic datasets demonstrates that our proposed mechanisms achieve higher accuracy in estimating the mean frequency of the data and performing empirical risk minimization tasks than existing approaches.

Organization. In the following, Section I introduces the preliminaries. Then, we introduce related works in Section II. Then, we illustrate the system model and the local differential privacy based FedSGD algorithm in Section III. Section IV presents the problem formation. Section V proposes novel solutions for single numerical data estimation. Section VI illustrates the proposed mechanisms used for multidimensional numerical data estimation. Section VII demonstrates our experimental results. Section VIII concludes the paper.

I. PRELIMINARIES

In local differential privacy, users complete the perturbation by themselves. To protect users' privacy, each user runs a random perturbation algorithm M, and then he sends the perturbed results to the aggregator. The privacy budget ǫ controls the privacy-utility trade-off, and a higher privacy budget means a lower privacy protection. As a result, we define local differential privacy as follows:

Definition 1 (Local Differential Privacy). Let M be a randomized function with domain X and range Y; i.e., M maps each element in X to a probability distribution with sample space Y. For a non-negative ǫ, the randomized mechanism M satisfies ǫ-local differential privacy if

    ln ( P_M[Y ∈ S | x] / P_M[Y ∈ S | x′] ) ≤ ǫ,   ∀x, x′ ∈ X, ∀S ⊆ Y.    (1)

P_M[· | ·] denotes the conditional probability distribution depending on M. In local differential privacy, the random perturbation is performed by users instead of a centralized aggregator. The centralized aggregator only receives perturbed results, which makes sure that the aggregator is unable to distinguish whether the true tuple is x or x′ with high confidence (controlled by the privacy budget ǫ).
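The bound (1) in Definition 1 can be checked mechanically for any mechanism with finitely many inputs and outputs by bounding the probability ratio over all input pairs. A small sketch, using binary randomized response as an assumed illustrative mechanism (it is not one of the mechanisms proposed in this paper):

```python
import math

def satisfies_ldp(probs, epsilon):
    """Check Definition 1: ln(P[Y = y | x] / P[Y = y | x']) <= epsilon
    for all inputs x, x' and all outputs y.

    probs maps each input to a dict {output: probability}. For a discrete
    mechanism, checking singleton output sets suffices, since the ratio
    for any set S is bounded by the maximum singleton ratio.
    """
    inputs = list(probs)
    outputs = set().union(*(set(probs[x]) for x in inputs))
    for y in outputs:
        for x in inputs:
            for x2 in inputs:
                p = probs[x].get(y, 0.0)
                q = probs[x2].get(y, 0.0)
                if p == 0.0:
                    continue
                if q == 0.0 or math.log(p / q) > epsilon + 1e-12:
                    return False
    return True

# Binary randomized response: keep the bit with probability e^eps / (1 + e^eps).
eps = 1.0
keep = math.exp(eps) / (1 + math.exp(eps))
rr = {0: {0: keep, 1: 1 - keep}, 1: {0: 1 - keep, 1: keep}}
```

Here `satisfies_ldp(rr, 1.0)` holds while `satisfies_ldp(rr, 0.9)` fails, since randomized response meets its privacy budget with equality.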
Fig. 1: Different mechanisms' worst-case noise variance for one-dimensional numeric data versus the privacy budget ǫ.

II. RELATED WORK

Recently, local differential privacy has attracted much attention [13]–[20]. Several mechanisms for numeric data estimation have been proposed [6]–[8], [21]. (i) Dwork et al. [7] propose the Laplace mechanism, which adds Laplace noise to real one-dimensional data directly. The Laplace mechanism was originally used in centralized differential privacy, and it can be applied to local differential privacy directly. (ii) For a single numeric attribute with a domain [−1, 1], Duchi et al. [6] propose an LDP framework that provides output from {−C, C}, where C > 1. (iii) Wang et al. [8] propose the Piecewise Mechanism (PM), which offers an output that contains infinite possibilities in the range of [−A, A], where A > 1. In addition, they apply the LDP mechanism to preserve the privacy of gradients generated during machine learning tasks. Both approaches by Duchi et al. [6] and Wang et al. [8] can be extended to the case of multidimensional numerical data.

Deficiencies of existing solutions. Fig. 1 illustrates that when ǫ ≤ 2.3, the Laplace mechanism's worst-case noise variance is larger than that of Duchi et al.'s [22] solution; however, the Laplace mechanism outperforms Duchi et al.'s [22] solution if ǫ is larger. The worst-case noise variance of PM is smaller than those of Laplace and Duchi et al.'s [22] solution when ǫ is large. The HM mechanism outperforms other existing solutions by taking advantage of Duchi et al.'s [22] solution when ǫ is small and PM when ǫ is large. However, PM's and HM's outputs have infinite possibilities that are hard to encode. We would like to find a mechanism that can improve the utility of existing mechanisms. In addition, we believe there is a mechanism that retains a high utility and whose outputs are easy to encode. Based on the above intuition, we propose four novel mechanisms that can be used by vehicles in Section V.

In addition, LDP has been widely used in the research of IoT [23]–[27]. For example, Xu et al. [23] integrate deep learning with local differential privacy techniques and apply them to protect users' privacy in edge computing. They develop an EdgeSanitizer framework that forms a new protection layer against sensitive inference by leveraging a deep learning model to mask the learned features with noise and minimize data. Choi et al. [24] explore the feasibility of applying LDP on ultra-low-power (ULP) systems. They use resampling, thresholding, and a privacy budget control algorithm to overcome the low resolution and fixed-point nature of ULPs. He et al. [25] address the location privacy and usage pattern privacy induced by mobile edge computing's wireless task offloading feature by proposing a privacy-aware task offloading scheduling algorithm based on a constrained Markov decision process. Li et al. [26] propose a scheme for privacy-preserving data aggregation in mobile edge computing to assist IoT applications with three participants, i.e., a public cloud center (PCC), an edge server (ES), and a terminal device (TD). TDs generate and encrypt data and send them to the ES, and then the ES submits the aggregated data to the PCC. The PCC uses its private key to recover the aggregated plaintext data. Their scheme provides source authentication and integrity and guarantees the data privacy of the TDs. In addition, their scheme can save half of the communication cost. To protect the privacy of massive data generated from IoT platforms, Arachchige et al. [27] design an LDP mechanism named LATENT for deep learning. A randomization layer between the convolutional module and the fully connected module is added to LATENT to perturb data before the data leave data owners for machine learning services. Pihur et al. [28] propose the Podium Mechanism, which is similar to our PM-SUB, but their mechanism is applicable to DP instead of LDP.

Moreover, federated learning, or collaborative learning, is an emerging distributed machine learning paradigm, and it is widely used to address the data privacy problem in machine learning [1,29]. Recently, federated learning has been explored extensively in the Internet of Things [30]–[34]. Lim et al. [30] comprehensively survey federated learning applications in mobile edge networks, including algorithms, applications, and potential research problems. Besides, Lu et al. [32] propose CLONE, a collaborative learning framework on the edges for connected vehicles, which reduces the training time while guaranteeing the prediction accuracy. Different from CLONE, our proposed approach utilizes local differential privacy noises to protect the privacy of the uploaded data. Furthermore, Fantacci et al. [33] leverage FL to protect the privacy of mobile edge computing, while Saputra et al. [34] apply FL to predict the energy demand for electrical vehicle networks.

Furthermore, there have been many papers on federated learning and differential privacy, such as [31,35]–[43]. For example, Truex et al. [36] utilize both secure multiparty computation and centralized differential privacy to prevent inference over the messages exchanged in the process of training the model. However, they do not analyze the impact of the privacy budget on the performance of FL. Hu et al. [37] propose a privacy-preserving FL approach for learning effective personalized models. They use the Gaussian mechanism, a centralized DP mechanism, to protect the privacy of the model. Compared with them, our proposed LDP mechanisms provide a stronger privacy protection using the local differential privacy mechanism. Hao et al. [31] propose a differential privacy-enhanced federated learning scheme for industrial artificial intelligence. Triastcyn et al. [39] employ Bayesian differential privacy on federated learning. They make use of the centralized differential privacy mechanism to protect the privacy of gradients, but we leverage a stronger privacy-preserving mechanism (LDP) to protect each vehicle's privacy.

Additionally, DP can be applied to various FL algorithms such as FedSGD [5] and FedAvg [1]. FedAvg requires users to upload model parameters instead of the gradients uploaded in FedSGD. The advantage of FedAvg is that it allows users to train the model for multiple rounds locally before submitting updates. McMahan et al. [35] propose to apply centralized DP to the FedAvg and FedSGD algorithms. In our paper, we deploy LDP mechanisms to gradients in the FedSGD algorithm. Our future work is to develop LDP mechanisms for state-of-the-art FL algorithms. Zhao et al. [44] propose a SecProbe mechanism to protect the privacy and quality of participants' data by leveraging the exponential mechanism and the functional mechanism of differential privacy. SecProbe guarantees high accuracy as well as privacy protection. In addition, it prevents unreliable participants in collaborative learning.
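As a baseline for the mechanisms surveyed above, a minimal sketch of the Laplace mechanism applied to LDP on the domain [−1, 1] (the function name is ours; the noise scale 2/ǫ follows from the fact that under LDP any two inputs are neighboring, so the sensitivity is the domain diameter 2):

```python
import math
import random

def laplace_ldp(x, epsilon, rng=random):
    """Perturb x in [-1, 1] with Laplace noise of scale 2/epsilon.

    The estimate is unbiased, but the output is unbounded, unlike
    Duchi's and PM's bounded output ranges.
    """
    u = rng.random() - 0.5                       # u in [-0.5, 0.5)
    u = min(max(u, -0.5 + 1e-15), 0.5 - 1e-15)   # keep log's argument positive
    scale = 2.0 / epsilon
    # Inverse-CDF sampling of the Laplace distribution.
    return x - scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
```

The noise variance is 2 · (2/ǫ)² = 8/ǫ², which is why Laplace loses to Duchi for small ǫ and to PM for all ǫ in the worst-case-variance comparisons of Tables I and II.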
III. SYSTEM MODEL AND LOCAL DIFFERENTIAL PRIVACY BASED FEDSGD ALGORITHM

A. System Model

In this work, we consider a scenario where a number of vehicles are connected with a cloud server, as in Fig. 2. Each vehicle is responsible for continuously performing training and inference locally based on the data that it collects and the model initiated by the cloud server. The local training dataset is never uploaded to the cloud server. After vehicles finish predefined epochs locally, the cloud server calculates the average of the uploaded gradients from vehicles and updates the global model with the average. The FL aggregator is honest-but-curious or semi-honest: it follows the FL protocol, but it will try to learn additional information using received data [36,45]. With the injected LDP noise, servers or attackers cannot retrieve users' information by reversing their uploaded gradients [3,4]. Thus, there is a need to deploy LDP mechanisms in FL to develop a communication-efficient LDP-based FL algorithm.

Fig. 2: System Design. (Each vehicle perturbs its local gradient with LDP and uploads the noisy gradient to the cloud server; each vehicle downloads the model from the cloud server.)

Algorithm 1: Local Differential Privacy based FedSGD (LDP-FedSGD) Algorithm.
 1  Server executes:
 2    Server initializes the parameter as θ_0;
 3    for t from 1 to the maximal iteration number do
 4      Server sends θ_{t−1} to vehicles in group G_t;
 5      for each vehicle i in group G_t do
 6        VehicleUpdate(i, ∆L);
 7      Server computes the average of the noisy gradients of group G_t and updates the parameter from θ_{t−1} to θ_t:
          θ_t ← θ_{t−1} − η_t · (1/|G_t|) · Σ_{i ∈ G_t} M(∆L(θ_{t−1}; x_i)),
        where η_t is the learning rate;
 8      if θ_t and θ_{t−1} are close enough, or there remains no vehicle which has not participated in the computation then
 9        break;
10      t ← t + 1;
11  VehicleUpdate(i, ∆L):
12    Compute the (true) gradient ∆L(θ_{t−1}; x_i), where x_i is vehicle i's data;
13    Use the local differential privacy-compliant algorithm M to compute the noisy gradient M(∆L(θ_{t−1}; x_i));
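The round structure of Algorithm 1 can be sketched end-to-end on a toy scalar model. In this sketch, Duchi et al.'s binary mechanism [6] stands in for the perturbation M (the mechanisms of Section VI would be plugged in the same way); all function names and parameters are illustrative:

```python
import math
import random

def duchi_perturb(g, epsilon, rng):
    """Duchi et al.'s binary mechanism: unbiased two-output perturbation
    of g in [-1, 1], returning +C or -C with C = (e^eps + 1)/(e^eps - 1)."""
    e = math.exp(epsilon)
    C = (e + 1) / (e - 1)
    p_pos = 0.5 + g * (e - 1) / (2 * (e + 1))   # P[output = +C]
    return C if rng.random() < p_pos else -C

def ldp_fedsgd_round(theta, data, epsilon, eta, rng):
    """One round of Algorithm 1 on the toy loss L(theta; x) = (theta - x)^2 / 2.

    Each vehicle computes its true gradient theta - x, clips it to [-1, 1],
    and uploads only the perturbed value; the server averages the noisy
    gradients and takes one gradient step.
    """
    noisy = [duchi_perturb(max(-1.0, min(1.0, theta - x)), epsilon, rng)
             for x in data]
    return theta - eta * sum(noisy) / len(noisy)
```

Running a few dozen rounds with on the order of a thousand vehicles drives θ close to the minimizer (the mean of the vehicles' data) even though every uploaded gradient is a single perturbed value ±C, illustrating both the privacy and the communication-cost points above.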
B. Federated learning with LDP: LDP-FedSGD

In addition, we propose a local differential privacy based federated stochastic gradient descent algorithm (LDP-FedSGD) for our proposed system. Details of LDP-FedSGD are given in Algorithm 1. Unlike the FedAvg algorithm, in the FedSGD algorithm, clients (i.e., vehicles) upload updated gradients instead of model parameters to the central aggregator (i.e., cloud server) [5]. However, compared with the standard FedSGD [5], we add the LDP mechanism proposed in Section VI to prevent the privacy leakage of gradients. Each vehicle locally takes one step of gradient descent on the current model using its local data, and then it perturbs the true gradient with Algorithm 6. The server aggregates and averages the updated gradients from vehicles and then updates the model. To reduce the communication rounds, we separate vehicles into groups, so that the cloud server updates the model after gathering gradient updates from the vehicles in a group. In the following sections, we will introduce how we obtain the LDP algorithm in detail.

C. Comparing LDP-FedSGD with other privacy-preserving federated learning paradigms

The LDP-FedSGD algorithm incorporates LDP into federated learning. In addition to LDP-FedSGD, one may be interested in other ways of using DP in federated learning. To explain them, we categorize combinations of DP and federated learning (or distributed computations in general) by considering the place of perturbation (distributed/centralized perturbation) and the privacy granularity (user-level/record-level privacy protection) [46]:

• Distributed/Centralized perturbation. Note that differential privacy is achieved by introducing perturbation. Distributed perturbation considers an honest-but-curious aggregator, while centralized perturbation needs a trusted aggregator. Both perturbation methods defend against external inference attacks after model publishing.

• User-level/Record-level privacy protection. In general, a differentially private algorithm ensures that the probability distributions of the outputs on two neighboring datasets do not differ much. The distinction between user-level and record-level privacy protection lies in how neighboring datasets are defined. We define that two datasets are user-neighboring if one dataset can be formed from the other dataset by adding or removing all of one user's records arbitrarily. We define that two datasets are record-neighboring if one dataset can be obtained from the other dataset by changing a single record of one user.

Based on the above, we further obtain four paradigms as follows: 1) ULDP (user-level privacy protection with distributed perturbation), 2) RLDP (record-level privacy protection with distributed perturbation), 3) RLCP (record-level privacy protection with centralized perturbation), and 4) ULCP (user-level privacy protection with centralized perturbation). The details are as follows and can also be found in the second author's prior work [46].
TABLE III: We compare different privacy notions in this table. In this paper, we focus on ǫ-local differential privacy, which achieves user-level privacy protection with distributed perturbation (ULDP). We do not consider record-level privacy protection with distributed perturbation (RLDP), which implements perturbation at each user via standard differential privacy, since we aim to achieve user-level privacy protection instead of the weaker record-level privacy protection (a vehicle is a user in our IoV applications and may have multiple records). We also do not investigate record/user-level privacy protection with centralized perturbation (RLCP/ULCP) since this paper considers an honest-but-curious aggregator instead of a trusted aggregator.

property | privacy granularity and place of perturbation    | adversary model
ǫ-ULDP   | ǫ-LDP (defined for distributed perturbation)     | defend against an honest-but-curious aggregator & external attacks after model publishing
ǫ-RLDP   | ǫ-DP with distributed perturbation               | defend against an honest-but-curious aggregator & external attacks after model publishing
ǫ-RLCP   | ǫ-DP with centralized perturbation               | trusted aggregator; defend against external attacks after model publishing
ǫ-ULCP   | user-level privacy with centralized perturbation | trusted aggregator; defend against external attacks after model publishing

1) In ULDP, each user i selects a privacy parameter ǫ_i and applies a randomization algorithm Y_i such that, given any two instances x_i and x_i′ of user i's data (which are user-neighboring), and for any possible subset of outputs 𝒴_i of Y_i, we obtain P[Y_i ∈ 𝒴_i | x_i] ≤ e^{ǫ_i} × P[Y_i ∈ 𝒴_i | x_i′].² Clearly, ULDP is achieved by each user implementing the local differential privacy studied in this paper.

2) In RLDP, each user i selects a privacy parameter ǫ_i and applies a randomization algorithm Y_i such that for any two record-neighboring instances x_i and x_i′ of user i's data (i.e., x_i and x_i′ differ in only one record), and for any possible subset of outputs 𝒴_i of Y_i, we have P[Y_i ∈ 𝒴_i | x_i] ≤ e^{ǫ_i} × P[Y_i ∈ 𝒴_i | x_i′]. Clearly, in RLDP, what each user does is just to apply standard differential privacy; in contrast, in ULDP above, each user applies local differential privacy.

3) In ǫ-RLCP, the aggregator sets a privacy parameter ǫ and applies a randomization algorithm Y such that for any user i, for any two record-neighboring instances x_i and x_i′ of user i's data, and for any possible subset of outputs 𝒴 of Y, we obtain P[Y ∈ 𝒴 | x_i] ≤ e^ǫ × P[Y ∈ 𝒴 | x_i′]. In other words, in RLCP, the aggregator applies standard differential privacy. For the aggregator to implement RLCP well, typically the aggregator should be able to bound the impact of each record on the information sent from a user to the aggregator. Further discussions on this can be interesting, but we do not present more details since RLCP is not our paper's focus.

4) In ǫ-ULCP, the aggregator sets a privacy parameter ǫ and applies a randomization algorithm Y so that for any user i, for any two instances x_i and x_i′ of user i's data (which are user-neighboring), and for any possible subset of outputs 𝒴 of Y, we have P[Y ∈ 𝒴 | x_i] ≤ e^ǫ × P[Y ∈ 𝒴 | x_i′]. The difference between RLCP and ULCP is that RLCP achieves record-level privacy protection while ULCP ensures the stronger user-level privacy protection.

²For simplicity, we slightly abuse the notation and denote the output of algorithm Y_i (resp., algorithm Y) by Y_i (resp., Y).

In the case of distributed perturbation, when all users set the same privacy parameter ǫ, we refer to ULDP and RLDP above as ǫ-ULDP and ǫ-RLDP, respectively. Table III presents a comparison of ǫ-ULDP, ǫ-RLDP, ǫ-RLCP, and ǫ-ULCP. In this paper's focus, each user applies ǫ-local differential privacy, so our framework is under ULDP. The reasons that we consider ULDP instead of RLDP, RLCP, and ULCP are as follows.

• We do not consider RLDP, which implements perturbation at each user via standard differential privacy, since we aim to achieve user-level privacy protection instead of the weaker record-level privacy protection (a vehicle is a user in our IoV applications and may have multiple records). The motivation is that much of the data from a vehicle may be about the vehicle's regular driver, and it often makes more sense to protect all data about the regular driver instead of just protecting each single record. A similar argument has recently been stated in [35], which incorporates user-level differential privacy into the training process of federated learning for language modeling. Specifically, [35] considers user-level privacy to protect the privacy of all typed words of a user, and explains that such privacy protection is more reasonable than protecting individual words as in the case of record-level privacy. In addition, although we can compute the level of user-level privacy from record-level privacy via the group privacy property of differential privacy (see Theorem 2.2 of [47]), this may significantly increase the privacy parameter and hence weaken the privacy protection if a user has many records (note that a larger privacy parameter ǫ in ǫ-DP means weaker privacy protection). More specifically, for a user with m records, according to the group privacy property [47], the privacy protection strength for the user under ǫ-record-level privacy is just that under mǫ-user-level privacy (i.e., for a user with m records, ǫ-RLDP ensures only mǫ-ULDP, and ǫ-RLCP ensures only mǫ-ULCP).

• We also do not investigate RLCP and ULCP since this paper considers an honest-but-curious aggregator instead of a trusted aggregator. The aggregator is not completely trusted, so the perturbation is implemented at each user (i.e., each vehicle in IoV).

IV. PROBLEM FORMULATION

Let x be a user's true value, and Y be the perturbed value. Under the perturbation mechanism M, we use E_M[Y | x] to denote the expectation of the randomized output Y given input x, and Var_M[Y | x] to denote the variance of the output Y given input x. MaxVar(M) denotes the worst-case Var_M[Y | x]. We are interested in finding a privatization mechanism M that minimizes MaxVar(M) by solving the following constrained minimization problem:

min_M MaxVar(M),
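The group-privacy conversion discussed above (ǫ-record-level privacy degrading to mǫ-user-level privacy for a user with m records) can be illustrated numerically; the helper name below is ours.

```python
import math

def user_level_epsilon(record_eps, m):
    """Group privacy (Theorem 2.2 of [47]): eps-DP at the record level
    only guarantees (m * eps)-DP for a user whose data has m records."""
    return record_eps * m

# a vehicle reporting 20 traffic records under eps = 0.5 record-level DP
eps_user = user_level_epsilon(0.5, 20)   # 10.0 -- a much weaker guarantee
ratio_bound = math.exp(eps_user)         # allowed likelihood ratio e^eps
```

The blow-up of the allowed likelihood ratio from e^{0.5} ≈ 1.65 to e^{10} ≈ 22026 is exactly why the paper targets user-level privacy directly rather than deriving it from record-level privacy.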
s.t. Eq. (1), E_M[Y | x] = x, and P_M[Y ∈ 𝒴 | x] = 1.

The second constraint illustrates that our estimator is unbiased, and the third constraint ensures a proper distribution, where 𝒴 is the range of the randomized function M. In the following sections, if M is clear from the context, we omit the subscript M for simplicity.

V. MECHANISMS FOR ESTIMATION OF A SINGLE NUMERIC ATTRIBUTE

To solve the problem in Section IV, we propose four local differential privacy mechanisms: Three-Outputs, PM-OPT, PM-SUB, and HM-TP. Fig. 1 compares the worst-case noise variances of existing mechanisms and our proposed mechanisms. Three-Outputs has three discrete output possibilities, which incurs little communication cost because two bits are enough to encode three different outputs. Moreover, it achieves a small worst-case noise variance in the high privacy regime (small privacy budget ǫ). To maintain a low worst-case noise variance in the low privacy regime (large privacy budget ǫ), we propose PM-OPT and PM-SUB; both achieve higher accuracies than Three-Outputs and other existing solutions when the privacy budget ǫ is large. Additionally, we discretize their continuous output ranges for vehicles to encode, using a post-processing discretization algorithm. In the following sections, we explain our four proposed mechanisms and the post-processing discretization algorithm in detail.

A. Three-Outputs Mechanism

Now, we propose a mechanism with three output possibilities, named Three-Outputs, which is illustrated in Algorithm 2. Three-Outputs ensures low communication cost while achieving a smaller worst-case noise variance than existing solutions in the high privacy regime (small privacy budget ǫ). Duchi et al.'s [22] solution contains two output possibilities, and it outperforms other approaches when the privacy budget is small. However, Kairouz et al. [12] prove that two outputs are not always optimal as ǫ increases. By outputting three values instead of two, Three-Outputs improves the performance as the privacy budget increases, as shown in Fig. 1. When the privacy budget is small, Three-Outputs is equivalent to Duchi et al.'s [22] solution.

For notational simplicity, given a mechanism M, we often write P_M[Y = y | X = x] as P_{y←x}(M). We also sometimes omit M to obtain P[Y = y | X = x] and P_{y←x}.

Algorithm 2: Three-Outputs Mechanism for One-Dimensional Numeric Data.
Input: tuple x ∈ [−1, 1] and privacy parameter ǫ.
Output: tuple Y ∈ {−C, 0, C}.
1: Sample a random variable u with the probability distribution
     P[u = −1] = P_{−C←x},
     P[u = 0]  = P_{0←x},
     P[u = 1]  = P_{C←x},
   where P_{−C←x}, P_{0←x} and P_{C←x} are given in Eq. (2), Eq. (3) and Eq. (4);
2: if u = −1 then
3:   Y = −C;
4: else if u = 0 then
5:   Y = 0;
6: else
7:   Y = C;
8: return Y;

Given a tuple x ∈ [−1, 1], Three-Outputs returns a perturbed value Y that equals −C, 0 or C with probabilities defined by

P_{−C←x} = (1 − P_{0←0})/2 − [(1 − P_{0←0})/2 − (e^ǫ − P_{0←0})/(e^ǫ(e^ǫ + 1))] · x, if 0 ≤ x ≤ 1,
P_{−C←x} = (1 − P_{0←0})/2 − [(e^ǫ − P_{0←0})/(e^ǫ + 1) − (1 − P_{0←0})/2] · x, if −1 ≤ x ≤ 0,   (2)

P_{C←x} = (1 − P_{0←0})/2 + [(e^ǫ − P_{0←0})/(e^ǫ + 1) − (1 − P_{0←0})/2] · x, if 0 ≤ x ≤ 1,
P_{C←x} = (1 − P_{0←0})/2 + [(1 − P_{0←0})/2 − (e^ǫ − P_{0←0})/(e^ǫ(e^ǫ + 1))] · x, if −1 ≤ x ≤ 0,   (3)

P_{0←x} = P_{0←0} + (P_{0←0}/e^ǫ − P_{0←0}) · |x|, if −1 ≤ x ≤ 1,   (4)

where P_{0←0} is defined by

P_{0←0} := 0, if ǫ < ln 2,
P_{0←0} := −(1/6) · (e^{2ǫ} − 4e^ǫ − 5 + 2√∆₀ · cos(π/3 + (1/3) · arccos(−∆₁/(2∆₀^{3/2})))), if ln 2 ≤ ǫ ≤ ǫ′,
P_{0←0} := e^ǫ/(e^ǫ + 2), if ǫ > ǫ′,   (5)

in which

∆₀ := e^{4ǫ} + 14e^{3ǫ} + 50e^{2ǫ} − 2e^ǫ + 25,   (6)
∆₁ := −2e^{6ǫ} − 42e^{5ǫ} − 270e^{4ǫ} − 404e^{3ǫ} − 918e^{2ǫ} + 30e^ǫ − 250,   (7)
and ǫ′ := ln((3 + √65)/2) ≈ ln 5.53.   (8)

Next, we show how we derive the above probabilities. For a mechanism which uses x ∈ [−1, 1] as the input and only the three possibilities {−C, 0, C} for the output value, it satisfies

ǫ-LDP: P_{C←x}/P_{C←x′}, P_{0←x}/P_{0←x′}, P_{−C←x}/P_{−C←x′} ∈ [e^{−ǫ}, e^ǫ],   (9a)
unbiased estimation: C · P_{C←x} + 0 · P_{0←x} + (−C) · P_{−C←x} = x,   (9b)
proper distribution: P_{y←x} ≥ 0 and P_{C←x} + P_{0←x} + P_{−C←x} = 1.   (9c)

To calculate the values of P_{C←x}, P_{0←x} and P_{−C←x}, we use Lemma 1 below to convert a mechanism M₁ satisfying the requirements in (9a) (9b) (9c) to a symmetric mechanism M₂.
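The sampler of Algorithm 2 can be sketched as below. Two caveats: the scale C is derived here by solving the unbiasedness constraint (9b) against the probabilities of Eqs. (2)-(4), not quoted from the paper, and only the closed-form branch P_{0←0} = e^ǫ/(e^ǫ + 2) of Eq. (5) (the regime ǫ > ǫ′) is used; the cubic-root branch is omitted.

```python
import math
import random

def three_outputs_probs(x, eps, p00):
    """Probabilities of outputting -C, 0, +C (Eqs. (2)-(4)) for x in [-1, 1]."""
    e = math.exp(eps)
    mid = (1 - p00) / 2                # P_{C<-0} = P_{-C<-0}
    hi = (e - p00) / (e + 1)           # P_{C<-1}
    lo = (e - p00) / (e * (e + 1))     # P_{C<- -1} = P_{-C<-1}
    if x >= 0:
        p_c, p_mc = mid + (hi - mid) * x, mid - (mid - lo) * x
    else:
        p_c, p_mc = mid + (mid - lo) * x, mid - (hi - mid) * x
    p0 = p00 + (p00 / e - p00) * abs(x)   # Eq. (4)
    return p_mc, p0, p_c

def three_outputs(x, eps, p00):
    """Algorithm 2: return a perturbed value in {-C, 0, C}.
    C is the value that makes E[Y|x] = x for the probabilities above
    (our derivation from constraint (9b), not a formula quoted verbatim)."""
    e = math.exp(eps)
    C = e * (e + 1) / ((e - p00) * (e - 1))
    p_mc, p0, _ = three_outputs_probs(x, eps, p00)
    u = random.random()
    return -C if u < p_mc else (0.0 if u < p_mc + p0 else C)

# low-privacy branch of Eq. (5): P_{0<-0} = e^eps/(e^eps + 2) for eps > eps'
eps = 3.0
p00 = math.exp(eps) / (math.exp(eps) + 2)
```

With p00 = 0 the constant C reduces to (e^ǫ + 1)/(e^ǫ − 1), matching the two-output mechanism of Duchi et al. [22], consistent with the equivalence claimed above for small ǫ.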
Then, we use Lemma 2 below to transform the symmetric mechanism M₂ further to M₃, whose worst-case noise variance is smaller than M₂'s. Next, we use P_{0←1} to represent the other probabilities, and we prove via Lemma 3 that the minimum variance is attained when P_{0←0} = e^ǫ P_{0←1}. Finally, Lemma 4 and Lemma 5 are used to obtain the value of P_{0←0} and the worst-case noise variance of Three-Outputs, respectively. Thus, we can obtain the values of P_{C←x}, P_{0←x} and P_{−C←x} using P_{0←0}. In the following, we illustrate the above processes in detail.

By symmetry, for any x ∈ [−1, 1], we enforce

P_{C←x} = P_{−C←−x},   (10a)
P_{0←x} = P_{0←−x},   (10b)

where Eq. (10b) can be derived from Eq. (10a). The formal justification of Eq. (10a) (10b) is given by Lemma 1: since the input domain [−1, 1] is symmetric, we can transform any mechanism satisfying the requirements in (9a) (9b) (9c) to a symmetric mechanism M₂.

Then, we assign values to the designed probabilities above. We find that if a symmetric mechanism satisfies Eq. (19a) and Eq. (19b), it obtains a smaller worst-case noise variance. From Lemma 2 below, we enforce

P_{C←1} = e^ǫ · P_{C←−1},   (19a)
P_{−C←−1} = e^ǫ · P_{−C←1}.   (19b)

Hence, given a symmetric mechanism M₂ satisfying Inequality (20), we can transform it to a new symmetric mechanism M₃ which satisfies Eq. (19a) and Eq. (19b) through the processes of Eq. (21) (22) (23), applied until Eq. (19a) holds. After the transformation, the new mechanism M₃ achieves a smaller worst-case noise variance than mechanism M₂; therefore, we use M₃ to replace M₂ in the subsequent discussion. Details of the transformation are in Lemma 2. For a symmetric mechanism, since the estimator is unbiased, the noise variance satisfies

Var[Y | x] = C² · (P_{C←x} + P_{−C←x}) − x².   (25)

Complete details for obtaining Eq. (25) are in Appendix C of the submitted supplementary file.

Next, we use Lemma 4 to obtain the optimal P_{0←0} in Three-Outputs to achieve the minimum worst-case variance, as follows.

Lemma 4. The optimal P_{0←0} to minimize max_{x∈[−1,1]} Var[Y | x] is defined by Eq. (5).

Proof. The proof details are given in Appendix E of the submitted supplementary file.

Remark 1. Fig. 3 displays how P_{0←0} changes with ǫ in Eq. (5). When the privacy budget ǫ is small, P_{0←0} = 0; thus, Three-Outputs is equivalent to Duchi et al.'s [22] solution when P_{0←0} = 0. However, as the privacy budget ǫ increases, P_{0←0} increases, which means that the probability of outputting the true value increases.

[Fig. 3: Optimal P_{0←0} for privacy budget ǫ ∈ [0, 8].]

By summarizing the above, we obtain P_{−C←x}, P_{C←x} and P_{0←x} from Eq. (2), Eq. (3) and Eq. (4) using P_{0←0}. Then, we can calculate the optimal P_{0←0} to obtain the minimum worst-case noise variance of Three-Outputs, as follows.

Lemma 5. The minimum worst-case noise variance of Three-Outputs is obtained when P_{0←0} satisfies Eq. (5).

Proof. The proof details are given in Appendix F of the submitted supplementary file.

A clarification about Three-Outputs versus Four-Outputs. One may wonder why we consider a perturbation mechanism with three outputs (i.e., our Three-Outputs) instead of a perturbation mechanism with four outputs (referred to as Four-Outputs), since using two bits to encode the output of a perturbation mechanism can represent four outputs. The reason is as follows. The approach to design Four-Outputs is similar to that for Three-Outputs, but the detailed analysis for Four-Outputs would be even more tedious than that for Three-Outputs (which is already quite complex). For the above reasons, we elaborate Three-Outputs but not Four-Outputs in this paper.

B. PM-OPT Mechanism

Now, we advocate an optimal piecewise mechanism (PM-OPT), as shown in Algorithm 3, to obtain a small worst-case variance when the privacy budget is large. As shown in Fig. 1, Three-Outputs's worst-case noise variance is smaller than PM's when the privacy budget ǫ < 3.2, but it loses the advantage when ǫ ≥ 3.2. As the privacy budget increases, Kairouz et al. [12] suggested sending more information using more output possibilities. Besides, we observe that it is possible to improve Wang et al.'s [8] PM to achieve a smaller worst-case noise variance. Thus, inspired by them, we propose an optimal piecewise mechanism named PM-OPT with a smaller worst-case noise variance than PM.

Algorithm 3: PM-OPT Mechanism for One-Dimensional Numeric Data under Local Differential Privacy.
Input: tuple x ∈ [−1, 1] and privacy parameter ǫ.
Output: tuple Y ∈ [−A, A].
1: Calculate the value t by Eq. (30);
2: Sample u uniformly at random from [0, 1];
3: if u < e^ǫ/(t + e^ǫ) then
4:   Sample Y uniformly at random from [L(ǫ,x,t), R(ǫ,x,t)];
5: else
6:   Sample Y uniformly at random from [−A, L(ǫ,x,t)) ∪ (R(ǫ,x,t), A];
7: return Y;

[Fig. 4: The probability density function F[Y = y | x] of the randomized output Y after applying ǫ-local differential privacy: the density equals c on [L(ǫ,x,t), R(ǫ,x,t)] and d on the rest of [−A, A].]

For a true input x ∈ [−1, 1], the probability density function of the randomized output Y ∈ [−A, A] after applying local differential privacy is given by

F[Y = y | x] = c, for y ∈ [L(ǫ,x,t), R(ǫ,x,t)],   (26a)
F[Y = y | x] = d, for y ∈ [−A, L(ǫ,x,t)) ∪ (R(ǫ,x,t), A],   (26b)

where

c = e^ǫ t(e^ǫ − 1) / (2(t + e^ǫ)²),   (27)
d = t(e^ǫ − 1) / (2(t + e^ǫ)²),   (28)
A = (e^ǫ + t)(t + 1) / (t(e^ǫ − 1)),   (29)
L(ǫ,x,t) = (e^ǫ + t)(xt − 1) / (t(e^ǫ − 1)), and
R(ǫ,x,t) = (e^ǫ + t)(xt + 1) / (t(e^ǫ − 1)).

Lemma 7. The optimal t for min_t max_{x∈[−1,1]} Var[Y | x] is given by Eq. (30).

Proof. By computing the first-order derivative and the second-order derivative of max_{x∈[−1,1]} Var[Y | x] with respect to t, we get the optimal t. The proof details are given in Appendix D of the submitted supplementary file.
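The structural identities behind Eqs. (26)-(29) can be checked numerically: the two-level density integrates to one, the high-density interval carries probability mass e^ǫ/(t + e^ǫ), the density ratio c/d is exactly e^ǫ, and the mechanism is unbiased for any t > 0. This is a verification sketch with variable names of ours, not an implementation from the paper.

```python
import math

def pm_params(x, eps, t):
    """c, d, A, L, R of Eqs. (26)-(29) for an input x in [-1, 1]."""
    e = math.exp(eps)
    c = e * t * (e - 1) / (2 * (t + e) ** 2)
    d = t * (e - 1) / (2 * (t + e) ** 2)
    A = (e + t) * (t + 1) / (t * (e - 1))
    L = (e + t) * (x * t - 1) / (t * (e - 1))
    R = (e + t) * (x * t + 1) / (t * (e - 1))
    return c, d, A, L, R

# check the density for an arbitrary input and an arbitrary t > 0
x, eps, t = 0.5, 1.0, 1.7
c, d, A, L, R = pm_params(x, eps, t)
mass_high = c * (R - L)                        # should equal e^eps/(t+e^eps)
total = mass_high + d * ((L + A) + (A - R))    # density integrates to 1
mean = c * (R**2 - L**2) / 2 \
     + d * ((L**2 - A**2) / 2 + (A**2 - R**2) / 2)   # E[Y|x], unbiased
```

Because unbiasedness holds for every t, choosing t only trades off the worst-case variance, which is exactly what Lemma 7's optimization over t captures.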
C. PM-SUB Mechanism

We propose a suboptimal piecewise mechanism (PM-SUB) to simplify the sophisticated computation of t in Eq. (30) of PM-OPT; the details of PM-SUB are shown in Algorithm 4.

[Eq. (30): the closed-form optimal t, given piecewise in ǫ with cases ǫ < ln √2 (30a), ǫ > ln √2 (30b), and ǫ = ln √2 (30c); the expression involves nested radicals in e^{2ǫ} and e^{4ǫ}.]

Algorithm 4: PM-SUB Mechanism for One-Dimensional Numeric Data under Local Differential Privacy.
Input: tuple x ∈ [−1, 1] and privacy parameter ǫ.
Output: tuple Y ∈ [−A, A].
1: Sample u uniformly at random from [0, 1];
2: if u < e^ǫ/(e^{ǫ/3} + e^ǫ) then
3:   Sample Y uniformly at random from [(e^ǫ + e^{ǫ/3})(x e^{ǫ/3} − 1)/(e^{ǫ/3}(e^ǫ − 1)), (e^ǫ + e^{ǫ/3})(x e^{ǫ/3} + 1)/(e^{ǫ/3}(e^ǫ − 1))];
4: else
5:   Sample Y uniformly at random from [−A, (e^ǫ + e^{ǫ/3})(x e^{ǫ/3} − 1)/(e^{ǫ/3}(e^ǫ − 1))) ∪ ((e^ǫ + e^{ǫ/3})(x e^{ǫ/3} + 1)/(e^{ǫ/3}(e^ǫ − 1)), A];
6: return Y;

That is, PM-SUB instantiates the sampling procedure of PM-OPT with the simple choice t = e^{ǫ/3}. The meaning of t can be seen from (t − 1)/(t + 1) = L(ǫ,1,t)/R(ǫ,1,t). When the input is x = 1, the length of the region with the higher probability density F[Y = y | x] = e^ǫ t(e^ǫ − 1)/(2(t + e^ǫ)²) is R(ǫ,1,t) − L(ǫ,1,t), where R(ǫ,1,t) is the right boundary and L(ǫ,1,t) is the left boundary. If 0
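Algorithm 4 can be sketched as runnable code. Here t = e^{ǫ/3} is substituted into the PM boundary formulas, and the low-density draw chooses between the two side intervals in proportion to their lengths, which is equivalent to sampling uniformly over their union (an implementation choice of ours; the algorithm only specifies the uniform draw over the union).

```python
import math
import random

def pm_sub(x, eps):
    """Algorithm 4 (PM-SUB): piecewise mechanism with t = e^(eps/3)."""
    e = math.exp(eps)
    t = math.exp(eps / 3)
    k = (e + t) / (t * (e - 1))
    A, L, R = k * (t + 1), k * (x * t - 1), k * (x * t + 1)
    if random.random() < e / (t + e):
        return random.uniform(L, R)        # high-density interval
    # otherwise land in [-A, L) or (R, A], proportionally to length
    left, right = L + A, A - R
    u = random.uniform(0, left + right)
    return -A + u if u < left else R + (u - left)

random.seed(7)
samples = [pm_sub(0.2, 2.0) for _ in range(2000)]
```

Averaging many perturbed reports recovers the true value, which is what the cloud server relies on when it aggregates the perturbed gradients of a group.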