arXiv:2004.08856v2 [cs.CR] 22 Dec 2020 hspprapasi EEItre fTig ora (IoT-J Journal Things of Internet IEEE in appears paper This fSnaoe 147 Eal [email protected] (Email: 117417. Singapore, of [email protected] (Email: 266102. China, of University Posts a a r ihSho fCmue cec n Engineer- and Science Computer 639798 of Singapore, School University, [email protected]). [email protected], melody.yang Technological with [email protected], [email protected], Nanyang are ing, Lam Yan Zhao). Jun I author: Res Alibaba Joint (Corresponding Singapore through (JRI). Alibaba-NTU and Group Program (AIR) Alibaba Research (RG16/ (NWJ-2020-004), 1 Tier JRI (MOE) WeBank Education M of of gra Ministry Singapore under Software Future (WASP/NTU) (4080), and University the Technological Systems Nanyang for Autonomous and Intelligence AI, Wallenberg for Artificial RGANS1906, Technology on A*STAR-NTU- Grant and DeST-SCI2019-0007, Research Science NSoE Infrastructure Design ical Excellence, of N Satellite Resilience, Singapor Energy NRF2015-NRF-ISF001-2277, Singapore (EMA), EP003-041, Authority Singa Dus Market (NRF), Energy Foundation (2019M652473). (61902365 Research gapore National China Grant the by of Foundation supported Foundation Science Science WangPostdoctoral Natural Ning National Initiative. Funding the un R Centres Singapore (VTOL) Research Office, Minister’s Capability Landing th by program Prime & supported Foundation, Take-Off are (100E) Lam Research Vertical Kwok-Yan Experiments DeS and Large Yang 100 Mengmeng for NSoE Platform. (AISG) Project Infrastructure NTU Singapore Critical 9) AI Secure 8) 0012, for Excellence Insti Technology of Research Satellite National and Energy NRF RT01/ 6) Singapore 7) 1 (SCRIPTS), (ERIAN), Systems Tier & RT07/19, Research Cap S Technologies for Centre Strategic 1 5) Strategic its Initiative: Project, Funding Tier under Centres Joint (NRF) NTU-WASP RG115/19, Foundation 4) Research 1 MOE2019-T2-1-176, National 2 Tier Tier RG128/18, Academ and Education 1 of J Ministry Singapore Tier Singapore Alibaba-NTU 3) 2) (JRI), Institute Grant, Startup (NTU) University learnin pertu machine to federate mechanisms the facilitat LDP integrate four to achieve propose communi- to we (LDP) Specifically, to the propose model. privacy reduce applications we differential and paper, crowdsourcing local threat this and privacy in learning of the cost, number avoid cation the communicat To of as amount unexpected cost. ve incurs addition, server between cloud In communication the and frequent users. the the increases, vehicles of in concerns personal sensitive privacy severe raises inform which environme location etc., information, formation, users’ vehicle motor infer information, easily traffic can owners crowdso However, th application management. trai reporte traffic of which report information intelligent traffic for server applications variety on users cloud based these model large the learning Amazon to machine of a information and Users traffic simulates Uber, real-time etc. Waze, IoV Turk, as Things. 
Mechanical such of applications Internet crowdsourcing the of oa ifrnilPiaybsdFdrtdLearning Federated based Privacy Differential Local igunLui ihDprmn fCmue cec,Nation Science, Computer of Department with is Lyu Enginee Lingjuan and Science Information of College with is Wang Ning egWn swt colo yesaeScrt,X’nUnive Xi’an Security, Cyberspace of School with is Kwok- Wang Teng and Niyato Dusit Yang, Mengmeng Zhao, Jun Zhao, Yang agZa n u hoaespotdb )NnagTechnologi Nanyang 1) by supported are Zhao Jun and Zhao Yang Abstract agZhao, Yang & eeomnctos Eal [email protected]). (Email: Telecommunications. TeItre fVhce IV sapoiigbranch promising a is (IoV) Vehicles of Internet —The egWang, Teng rdaeSuetMme,IEEE, Member, Student Graduate ui Niyato, Dusit ebr IEEE, Member, o h nento Things of Internet the for elw IEEE, Fellow, nPrivacy-Preserving in nueus,dniy- @ntu.edu.sg, igWang, Ning ). cRsac Fund Research ic einScience Design , blt Research ability oe ne Sin- under pore, e t Strategic its der u.cn). sspotdby supported is R National NRF e 0,adNTU- and 20), ac Institute earch RF2017EWT- itResearch oint tM4082187 nt lUniversity al n China and ) tNyt is Niyato it eueCrit- Secure UDJoint SUTD anufacturing ue@NTU tute formation T-SCI2019- ig Ocean ring, (Emails: . National e n wkYnLam, Kwok-Yan and nnovative tlin- ntal ingapore Program and me, urcing st of rsity u Zhao, Jun esearch ation, hicles the e by d sa ns ion 19, cal rb d g ebr IEEE, Member, e .Pes elfe ocnatu o usin rremarks. or questions for us contact to free feel Please ). 1 rdet.A eut h Lsre ahr n averages average parameters. model’s and the global gathers obtain the server update to to gradients FL result perturbed the submitted result, users’ a perturb we obtain As they uploading, though gradients. even before data original gradients deducing from the data. to the original obtain noises infer to LDP gradients Howevadding the data. reverse true of may instead attackers gradients submit to users allows icse sflos o ipiiy ecnie aawith data consider we are simplicity, [6]–[8] mechanisms of For mechanisms LDP follows. prior new as paper, which discussed this for in developed data, [ are numerical in presented For are lit- data [11]. the categorical in proposed for been Mechanisms have erature. mechanisms LDP various [7], in nss Las rvdspiaypoeto otedt by data the to locally. protection data maintain privacy to users provides enabling also FL anisms, gradie of utility the gradients compromising to th not noises while infer LDP privacy to deploy ensure we gradients Thus, aggregato uploaded [3,4]. users’ honest-but-curious data leverage original A to instea able data. users be collab- raw may from the users’ gradients facilitate sharing uploaded can of with differen FL learning local techniques. orative with [2] [1] (LDP) that (FL) privacy privacy approach learning hybrid raise a federated propose may integrates informat we location it concerns, users’ these but as address To such kind life, data sensitive This daily of concerns users’ service. provides services that benefits routing new application service transportation bred Waze intelligent has the as which the such data, larg applications user and fast and of a enabled collection have (IoT) scale Things of Internet for gies priv protecting collaborativel of capable val model utility. 
are guaranteeing datasets the algorithms while real-world proposed train on our to that results vehicles experimental and Extensive server cloud an nally, ehns ycombining by mechanism oprbeuiiyto utility ( comparable mechanism suboptimal a h efrac hntepiaybde slre noptimal an large, is budget maxim privacy ( to mechanism the Besides, piecewise when cost. communication performance the the reduce to bits two eie ihacrc hntepiaybde ssal The small. is budget privacy possibilitie of the when possibilities output accuracy output different high a three deliver introduces proposed The mechanism vehicles. by generated gradients xsigLPmechanisms. LDP Existing eeae erigwt LDP. with learning Federated h eeomn fsnosadcmuiaintechnolo- communication and sensors of development The ebr IEEE, Member, LDP-FedSGD LDP-FedSGD eirMme,IEEE Member, Senior igunLyu, Lingjuan egegYang, Mengmeng PM-OPT PM-OPT Three-Outputs loih spooe ocodnt the coordinate to proposed is algorithm PM-SUB loih,wihpeet attackers prevents which algorithm, Three-Outputs spooe.W ute propose further We proposed. is ) hn ebidanvlhybrid novel a build we Then, . ebr IEEE, Member, ic h rpslo LDP of proposal the Since ihasml oml and formula simple a with ) nadto oLPmech- LDP to addition In FedSGD a eecddwith encoded be can ebr IEEE, Member, Three-Outputs and loih [5] algorithm PM-SUB idate Fi- . to s ion. nts. 9]– acy tial By ize er, ed of to e- y. d d e r This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

TABLE I: A comparison of the worst-case variances of ex- [ 1, 1])1. isting ǫ-LDP mechanisms on a single numeric attribute with Piecewise− Mechanism (PM) of [8]. Due to the draw- a domain [ 1, 1]: Duchi of [6] generating a binary output, • ① ② − back of Laplace, and the drawback of Duchi Laplace of [7] with the addition of Laplace noise, and the above, Wang et al. [8] propose the Piecewise Mechanism Piecewise Mechanism (PM) of [8]. For an LDP mechanism , (PM) which achieves higher utility than Duchi for large its worst-case variance is denoted by V . We obtain this tableA A ǫ, since the output range of PM is continuous and has based on results of [8]. infinite possibilities, instead of just 2 possibilities as Range of ǫ Comparison of mechanisms in Duchi. In PM, the plot of the output’s probability den- 0 <ǫ< 1.29 VDuchi < VPM < VLaplace sity function with respect to the output value consists of 1.29 <ǫ< 2.32 VPM < VDuchi < VLaplace three “pieces”, among which the center piece has a higher ǫ> 2.32 VPM < VLaplace < V Duchi probability than the other two. As the input increases, the length of the center piece unchanged, but the length of the leftmost (resp., rightmost) piece increases (resp., decreases). Since PM is tailored for LDP, unlike TABLE II: A comparison of the worst-case variances of our Laplace for LDP, PM has a strictly lower worst-case and existing ǫ-LDP mechanisms on a single numeric attribute variance (i.e., the maximum variance with respect to the with a domain [ 1, 1]. Three-Outputs and PM-SUB are input given ǫ) than Laplace for any ǫ. − our main LDP mechanisms proposed in this paper. The results Proposing new LDP mechanisms. Based on results of [8], in this table show the advantages of our mechanisms over Table I on Page 2 shows that among the three mechanisms existing mechanisms for a wide range of privacy parameter ǫ. Duchi, Laplace, and PM, in terms of the worst-case vari- Range of ǫ Comparison of mechanisms ance, Duchi is the best for 0 <ǫ< 1.29, while PM is the best 0 <ǫ< ln 2 ≈ 0.69 VDuchi = VThree-Outputs < VPM-SUB < VPM for ǫ> 1.29. Then a natural research question is that can we ln 2 <ǫ< 1.19 VThree-Outputs < VDuchi < VPM-SUB < VPM propose better or even optimal LDP mechanisms? The optimal 1.19 <ǫ< 1.29 V < V < V < V Three-Outputs PM-SUB Duchi PM LDP mechanism for numeric data is still open in the literature, 1.29 <ǫ< 2.56 VThree-Outputs < VPM-SUB < VPM < VDuchi 2.56 <ǫ< 3.27 VPM-SUB < VThree-Outputs < VPM < VDuchi but the optimal LDP mechanism for categorical data has been ǫ > 3.27 VPM-SUB < VPM < VThree-Outputs < VDuchi discussed by Kairouz et al. [12]. In particular, [12] shows that for a categorical attribute (with a limited number of discrete values), the binary and randomized response mechanisms, are universally optimal for very small and large ǫ, respectively. Although [12] handles categorical data, its results can provide a single numeric attribute which has a domain [ 1, 1] (after the following insight even for continuous numeric data: for − normalization if the original domain is not [ 1, 1]). Extensions very small , a mechanism generating a binary output should − ǫ to the case of multiple numeric attributes will be discussed be optimal; for very large ǫ, the optimal mechanism’s output later in the paper. should have infinite possibilities. This is also in consistent with the results of Table I on Page 2. Based on the above insight, intuitively, there may exist a range of medium ǫ Laplace of [7]. 
LDP can be understood as a variant of • where a mechanism with three, or four, or five, ... output DP, where the difference is the definition of “neighboring possibilities can be optimal. To this end, our first motivation data(sets)”. In DP, two datasets are neighboring if they of proposing new LDP mechanisms is to develop a mecha- differ in just one record; in LDP, any two instances of nism with three output possibilities such that the mechanism the user’s data are neighboring. Due to this connection outperform existing mechanisms of Table I for some ǫ. The between DP and LDP, the classical Laplace mechanism outcome of the above motivation is our mechanism called (referred to Laplace hereinafter) for DP can thus also Three-Outputs. Since the analysis of Three-Outputs be used to achieve LDP. Yet, Laplace may not achieve is already very complex, we do not consider a mechanism with ① a high utility for some ǫ since Laplace does not four, or five, ... output possibilities. consider the difference between DP and LDP . In addition, our second motivation of proposing new LDP Duchi of [6]. In view of the above drawback ① of • mechanisms is that despite the elegance of the Piecewise Laplace et al. , Duchi [6] introduce alternative mech- Mechanism (PM) of [8], we can derive the optimal mechanism anisms for LDP. For a single numeric attribute, one PM-OPT under the “piecewise framework” of [8], in order to Duchi mechanism, hereinafter referred to of [6], flips improve PM. Note that PM-OPT is just optimal under the above a coin with two possibilities to generate an output, where framework and may not be the optimal LDP mechanism. Since the probability of each possibility depends on the input. A disadvantage of Duchi is as follows: ② Since the 1Note that any algorithm satisfying DP or LDP has the following property: output of Duchi has only two possibilities, the utility the set of possible values for the output does not depend on the input (though may not be high for large ǫ (intuitively, for large ǫ, the output distribution depends on the input). This can be easily seen by ′ the privacy protection is weak so the output should be contradiction. Suppose an output y is possible for input x but not for x (x and x′ satisfy the neighboring relation in DP or LDP). Then P [y | x] > 0 close to the input which means the output should have and P [y | x′] = 0, resulting in P [y | x] > eǫP [y | x′] and hence violating many possibilities since the input can take any value in the privacy requirement (P[·|·] denotes conditional probability).

2 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks. the expressions for PM-OPT are quite complex, we present forming empirical risk minimization tasks than existing PM-SUB which compared with PM-OPT is suboptimal, but approaches. has simpler expressions and achieves a comparable utility. Organization. In the following, Section I introduces the Table II on Page 2 gives a comparison of our and existing preliminaries. Then, we introduce related works in Section II. LDP mechanisms. For simplicity, we do not include Laplace Then, we illustrate the system model and the local differential in the comparison since Laplace is worse than PM for any privacy based FedSGD algorithm in Section III. Section IV ǫ according to Table I. As shown in Table II, in terms of presents the problem formation. Section V proposes novel the worst-case variance for a single numeric attribute with solutions for the single numerical data estimation. Section VI a domain [ 1, 1], Three-Outputs outperforms Duchi for illustrates proposed mechanisms used for multidimensional − ǫ> ln 2 0.69 (and is the same as Duchi for ǫ ln 2), while numerical data estimation. Section VII demonstrates our ex- ≈ ≤ PM-SUB beats PM for any ǫ; moreover, Three-Outputs perimental results. Section VIII concludes the paper. outperforms both Duchi and PM for ln 2 <ǫ< 3.27, while PM-SUB beats both Duchi and PM for ǫ> 1.19. I. PRELIMINARIES We also follow the practice of [8], which combines different In local differential privacy, users complete the perturbation mechanisms Duchi and PM to obtain a hybrid mechanism by themselves. To protect users’ privacy, each user runs a ran- HM. HM has a lower worst-case variance than those of Duchi dom perturbation algorithm , and then he sends perturbed and PM. In this paper, we combine Three-Outputs and results to the aggregator. TheM privacy budget ǫ controls the PM-SUB to propose HM-TP. The intuition is as follows. privacy-utility trade-off, and a higher privacy budget means a Given ǫ, the variances of Three-Outputs and PM-SUB lower privacy protection. As a result of this, we define local (denoted by Tǫ(x) and Pǫ(x)) depend on the input x, and differential privacy as follows: their maximal values (i.e., the worst-case variances) may be taken at different x. Hence, for a hybrid mechanism which Definition 1. (Local Differential Privacy.) Let be a ran- X Y M probabilistically uses Three-Outputs with probability q domized function with domain and range ; i.e., maps X M or PM-SUB with probability 1 q, given ǫ, the worst-case each element in to a probability distribution with sample − space Y. For a non-negative ǫ, the randomized mechanism variance maxx (q Tǫ(x)+(1 q) Pǫ(x)) may be strictly · − · satisfies ǫ-local differential privacy if smaller than the minimum of maxx Tǫ(x) and maxx Pǫ(x). M P We optimize q for each ǫ to obtain HM-TP. Due to the [Y S x] X Y ln P M ∈ | ǫ, x, x′ , S . (1) [Y S x′] ≤ ∀ ∈ ∀ ⊆ already complex expression of PM-OPT, we do not consider M ∈ | the combination of PM-OPT and Three-Outputs. P [ ] means conditional probability distribution depend- Contributions. Our contributions can be summarized as ingM on ·|· . In local differential privacy, the random perturba- follows: tion is performedM by users instead of a centralized aggregator. 
Using the LDP-FedSGD algorithm for federated learning Centralized aggregator only receives perturbed results which • in IoV as a motivating context, we present novel LDP make sure that the aggregator is unable to distinguish whether mechanisms for numeric data with a continuous domain. the true tuple is x or x′ with high confidence (controlled by Among our proposed mechanisms, Three-Outputs the privacy budget ǫ). and PM-SUB outperform existing mechanisms for a wide range of ǫ, as shown in the theoretical results in Ta- 6 ble II and confirmed by experiments. In terms of com- paring our Three-Outputs and PM-SUB, we have: 5 Three-Outputs, whose output has three possibilities, is better for small ǫ, while PM-SUB, whose output can 4 take infinite possibilities of an interval, has higher utility for large ǫ. Our PM-SUB is a slightly suboptimal version 3 of our PM-OPT to simplify the expressions. We further combine Three-Outputs and PM-SUB to obtain a 2

hybrid mechanism HM-TP, which achieves even higher worst-case variance 1 utility. We discretize the continuous output ranges of our pro- • 0 posed mechanisms PM-SUB and PM-OPT. Through the 0 1 2 3 4 5 6 7 8 discretization post-processing, we enable vehicles to use privacy budget our proposed mechanisms. In Section VII, we confirm Fig. 1: Different mechanisms’ worst-case noise variance for that the discretization post-processing algorithm main- one-dimensional numeric data versus the privacy budget ǫ. tains utility with our experiments, while reducing the communication cost. Experimental evaluation of our proposed mechanisms on II. RELATED WORK • real-world datasets and synthetic datasets demonstrates Recently, local differential privacy has attracted much at- that our proposed mechanisms achieve higher accuracy tention [13]–[20]. Several mechanisms for numeric data es- in estimating the mean frequency of the data and per- timation have been proposed [6]–[8,21]. (i) Dwork et al. [7]

3 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks. propose the Laplace mechanism which adds the Laplace noise data before data leave data owners for machine learning ser- to real one-dimensional data directly. The Laplace mechanism vices. Pihur et al. [28] propose the Podium Mechanism which is originally used in the centralized differential privacy mecha- is similar to our PM-SUB, but his mechanism is applicable to nism, and it can be applied to local differential privacy directly. DP instead of LDP. (ii) For a single numeric attribute with a domain [ 1, 1], Duchi et al. [6] propose an LDP framework that provides− out- Moreover, federated learning or collaborative learning is put from C, C , where C > 1. (iii) Wang et al. [8] propose an emerging distributed machine learning paradigm, and it the piecewise{− mechanism} (PM) which offers an output that is widely used to address data privacy problem in machine contains infinite possibilities in the range of [ A, A], where learning [1,29]. Recently, federated learning is explored ex- A> 1. In addition, they apply LDP mechanism− to preserve the tensively in the Internet of Things recently [30]–[34]. Lim et privacy of gradients generated during machine learning tasks. al. [30] survey federated learning applications in mobile edge Both approaches by Duchi et al. [6] and Wang et al. [8] can network comprehensively, including algorithms, applications be extended to the case of multidimensional numerical data. and potential research problems, etc. Besides, Lu et al. [32] Deficiencies of existing solutions. Fig. 1 illustrates that when propose CLONE which is a collaborative learning framework ǫ 2.3, Laplace mechanism’s worst-case noise variance is on the edges for connected vehicles, and it reduces the training larger≤ than that of Duchi et al.’s [22] solution; however, the time while guaranteeing the prediction accuracy. Different Laplace mechanism outperforms Duchi et al.’s [22] solution from CLONE, our proposed approach utilizes local differential if ǫ is larger. The worst-case noise variance in PM is smaller privacy noises to protect the privacy of the uploaded data. than that of Laplace and Duchi et al.’s [22] solution when ǫ is Furthermore, Fantacci et al. [33] leverage FL to protect the large. The HM mechanism outperforms other existing solutions privacy of mobile edge computing, while Saputra et al. [34] by taking advantage of Duchi et al.’s [22] solution when ǫ is apply FL to predict the energy demand for electrical vehicle small and PM when ǫ is large. However, PM and HM’s outputs networks. have infinite possibilities that are hard to encode. We would like to find a mechanism that can improve the utility of existing Furthermore, there have been many papers on federated mechanisms. In addition, we believe there is a mechanism that learning and differential privacy such as [31,35]–[43]. For retains a high utility and is easy to encode its outputs. Based example, Truex et al. [36] utilize both secure multiparty on the above intuition, we propose four novel mechanisms that computation and centralized differential privacy to prevent can be used by vehicles in Section V. inference over both the messages exchanged in the process In addition, LDP has been widely used in the research of training the model. However, they do not analyze the of IoT [23]–[27]. For example, Xu et al. [23] integrate impact of the privacy budget on performance of FL. 
Hu et deep learning with local differential privacy techniques and al. [37] propose a privacy-preserving FL approach for learning apply them to protect users’ privacy in edge computing. effective personalized models. They use Gaussian mechanism, They develop an EdgeSanitizer framework that forms a new a centralized DP mechanism, to protect the privacy of the protection layer against sensitive inference by leveraging a model. Compared with them, our proposed LDP mechanisms deep learning model to mask the learned features with noise provide a stronger privacy protection using the local differen- and minimize data. Choi et al. [24] explore the feasibility tial privacy mechanism. Hao et al. [31] propose a differential of applying LDP on ultra-low-power (ULP) systems. They enhanced federated learning scheme for industrial artificial use resampling, thresholding, and a privacy budget control industry. Triastcyn et al. [39] employ Bayesian differential algorithm to overcome the low resolution and fixed point privacy on federated learning. They make use of the cen- nature of ULPs. He et al. [25] address the location privacy and tralized differential privacy mechanism to protect the privacy usage pattern privacy induced by the mobile edge computing’s of gradients, but we leverage a stronger privacy-preserving wireless task offloading feature by proposing a privacy-aware mechanism (LDP) to protect each vehicle’s privacy. task offloading scheduling algorithm based on constrained Markov decision process. Li et al. [26] propose a scheme Additionally, DP can be applied to various FL algorithms for privacy-preserving data aggregation in the mobile edge such as FedSGD [5] and FedAvg [1]. FedAvg requires computing to assist IoT applications with three participants, users to upload model parameters instead of gradients in i.e., a public cloud center (PCC), an edge server (ES), and FedSGD. The advantage of FedAvg is that it allows users to a terminal device (TD). TDs generate and encrypt data and train the model for multiple rounds locally before submitting send them to the ES, and then the ES submits the aggregated gradients. McMahan et al. [35] propose to apply centralized data to the PCC. The PCC uses its private key to recover DP to FedAvg and FedSGD algorithm. In our paper, we the aggregated plaintext data. Their scheme provides source deploy LDP mechanisms to gradients in FedSGD algorithm. authentication and integrity and guarantees the data privacy Our future work is to develop LDP mechanisms for state-of- of the TDs. In addition, their scheme can save half of the the-art FL algorithms. Zhao et al. [44] propose a SecProbe communication cost. To protect the privacy of massive data mechanism to protect privacy and quality of participants’ generated from IoT platforms, Arachchige et al. [27] design data by leveraging exponential mechanism and functional an LDP mechanism named as LATENT for deep learning. A mechanism of differential privacy. SecProbe guarantees the randomization layer between the convolutional module and the high accuracy as well as privacy protection. In addition, it fully connected module is added to the LATENT to perturb prevents unreliable participants in the collaborative learning.

4 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

III. SYSTEM MODELANDERENTIAL PRIVACY BASED Algorithm 1: Local Differential Privacy based FEDSGD ALGORITHM FedSGD (LDP-FedSGD) Algorithm.

A. System Model 1 Server executes: In this work, we consider a scenario where a number of 2 Server initializes the parameter as θ0; vehicles are connected with a cloud server as Fig. 2. Each 3 for t from 1 to maximal iteration number do vehicle is responsible for continuously performing training and 4 Server sends θt 1 to vehicles in group Gt; − inference locally based on data that it collects and the model 5 for each vehicle i in Group Gt do initiated by the cloud server. Local training dataset is never 6 VehicleUpdate(i, ∆L): uploaded to the cloud server. After finishing predefined epochs 7 Server computes the average of the noisy gradient locally, the cloud server calculates the average of uploaded of group Gt and updates the parameter from θt 1 − gradients from vehicles and updates the global model with to θt: the average. The FL aggregator is honest-but-curious or semi- θ θ η 1 (∆L(θ ; x )), t t 1 G i Gt t 1 i ← − − · | t| ∈ M − honest, which follows the FL protocol but it will try to learn where ηt is the learning rate; P additional information using received data [36,45]. With the 8 if θt and θt 1 are close enough or these remains injected LDP noise, servers or attackers cannot retrieve users’ no vehicle− which has not participated in the information by reversing their uploaded gradients [3,4]. Thus, computation then there is a need to deploy LDP mechanisms to the FL to develop 9 break; a communication-efficient LDP-based FL algorithm. 10 t t + 1; → Noisy gradient gradient Local Dataset 11 VehicleUpdate (i, ∆L):

LDP 12 Compute the (true) gradient ∆L(θt 1; xi), where xi is vehicle i’s data; − 13 Use local differential privacy-compliant algorithm Cloud Server Download model M to compute the noisy gradient (∆(θt 1; xi)); Upload gradients LDP Download model M −

Upload gradients Download model interested in other ways of using DP in federated learning. Upload gradients LDP To explain them, we categorize combinations of DP and federated learning (or distributed computations in general) by considering the place of perturbation (distributed/centralized Fig. 2: System Design. perturbation) and privacy granularity (user-level/record-level privacy protection) [46]: Distributed/Centralized perturbation. Note that differ- B. Federated learning with LDP: LDP-FedSGD • ential privacy is achieved by introducing perturbation. In addition, we propose a local differential privacy Distributed perturbation considers an honest-but-curious based federated stochastic gradient descent algorithm aggregator, while centralized perturbation needs a trusted (LDP-FedSGD) for our proposed system. Details of aggregator. Both perturbation methods defend against LDP-FedSGD are given in Algorithm 1. Unlike the FedAvg external inference attacks after model publishing. algorithm, in the FedSGD algorithm, clients (i.e., vehicles) User-level/Record-level privacy protection. In general, • upload updated gradients instead of model parameters to the a differentially private algorithm ensures that the prob- central aggregator (i.e., cloud server) [5]. However, compared ability distributions of the outputs on two neighboring with the standard FedSGD [5], we add our proposed LDP datasets do not differ much. The distinction between mechanism proposed in Section VI to prevent the privacy user-level and record-level privacy protection lies in how leakage of gradients. Each vehicle locally takes one step of neighboring datasets are defined. We define that two gradient descent on the current model using its local data, and datasets are user-neighboring if one dataset can be formed then it perturbs the true gradient with Algorithm 6. The server from the other dataset by adding or removing all one aggregates and averages the updated gradients from vehicles user’s records arbitrarily. We define that two datasets are and then updates the model. To reduce the communication record-neighboring if one dataset can be obtained from rounds, we separate vehicles into groups, so that the cloud the other dataset by changing a single record of one user. server updates the model after gathering gradient updates Based on the above, we further obtain four paradigms as fol- from vehicles in a group. In the following sections, we will lows: 1) ULDP (user-level privacy protection with distributed introduce how we obtain the LDP algorithm in detail. perturbation), 2) RLDP (record-level privacy protection with distributed perturbation), 3) RLCP (record-level privacy pro- C. Comparing LDP-FedSGD with other privacy-preserving tection with centralized perturbation), and 4) ULCP (user-level federated learning paradigms privacy protection with centralized perturbation). The details The LDP-FedSGD algorithm incorporates LDP into fed- are as follows and can also be found in the second author’s erated learning. In addition to LDP-FedSGD, one may be prior work [46].

5 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

TABLE III: We compare different privacy notions in this 1) In ULDP, each user i selects a privacy parameter ǫi and table. In this paper, we focus on ǫ-local differential privacy applies a randomization algorithm Yi such that given any which achieves user-level privacy protection with distributed two instances xi and xi′ of user i’s data (which are user- 2 perturbation (ULDP). We do not consider record-level pri- neighboring), and for any possible subset of outputs i P ǫi P Y vacy protection with distributed perturbation (RLDP) which of Yi, we obtain [Yi i xi] e [Yi i xi′ ]. Clearly, ULDP is achieved∈ Y | by each≤ user× implementing∈ Y | implements perturbation at each user via standard differential local differential privacy studied in this paper. privacy, since we aim to achieve user-level privacy protection instead of the weaker record-level privacy protection (a vehicle 2) In RLDP, each user i selects a privacy parameter ǫi is a user in our IoV applications and may have multiple and applies a randomization algorithm Yi such that for records). We also do not investigate record/user-level privacy any two record-neighboring instances xi and x′ of user i protection with centralized perturbation (RLCP/ULCP) since i’s data (i.e, xi and x′ differ in only one record), and i this paper considers a honest-but-curious aggregator instead of for any possible subset of outputs i of Yi, we have ǫi Y a trusted aggregator. P [Y x ] e P [Y x′ ]. Clearly, in i ∈ Yi | i ≤ × i ∈ Yi | i RLDP, what each user does is just to apply standard privacy granularity and privacy adversary model differential privacy. In contrast, in ULDP above, each place of perturbation property ǫ-LDP (defined for ǫ-ULDP defend against a honest-but-curious user applies local differential privacy. distributed perturbation) aggregator &external attacks after model 3) In ǫ-RLCP, the aggregator sets a privacy parameter ǫ ǫ-DP with ǫ-RLDP publishing and applies a randomization algorithm Y such that for distributed perturbation ǫ-DP with any user , for any two record-neighboring instances ǫ-RLCP trusted aggregator; defend against external i centralized perturbation attacks after model publishing xi and xi′ of user i’s data, and for any possible subset user-level privacy with 2 ǫ ǫ-ULCP of outputs of Y , we obtain P [Y xi] e centralized perturbation P Y ∈ Y | ≤ × [Y xi′ ]. In other words, in RLCP, the aggregator applies∈ Y standard | differential privacy. For the aggrega- tor to implement RLCP well, typically the aggregator ing process of federated learning for language modeling. should be able to bound the impact of each record on the Specifically, [35] considers user-level privacy to protect information sent from a user to the aggregator. Further the privacy of all typed words of a user, and explains discussions on this can be interesting, but we do not that such privacy protection is more reasonable than present more details since RLCP is not our paper’s focus. protecting individual words as in the case of record-level 4) In ǫ-ULCP, the aggregator sets a privacy parameter ǫ privacy. 
In addition, although we can compute the level of and applies a randomization algorithm Y so that for user-level privacy from record-level privacy via the group privacy property of differential privacy (see Theorem 2.2 any user i, for any two instances xi and xi′ of user i’s data (which are user-neighboring), and for any possible of [47]), but this may significantly increase the privacy parameter and hence weaken the privacy protection if subset of outputs of Y , we have P [Y xi] ǫ P Y ∈ Y | ≤ a user has many records (note that a larger privacy e [Y xi′ ]. The difference RLCP and ULCP is× that RLCP∈ Y achieves | record-level privacy protection parameter ǫ in ǫ-DP means weaker privacy protection). while ULCP ensures the stronger user-level privacy More specifically, for a user with m records, according protection. to the group privacy property [47], the privacy protection strength for the user under ǫ-record-level privacy is just In the case of distributed perturbation, when all users set as that under mǫ-user-level privacy (i.e., for a user with the same privacy parameter ǫ, we refer to ULDP and RLDP m records, ǫ-RLDP ensures mǫ-ULDP; ǫ-RLCP ensures above as ǫ-ULDP and ǫ-RLDP, respectively. Table III presents mǫ-ULCP). a comparison of ǫ-ULDP, ǫ-RLDP, ǫ-RLCP, and ǫ-ULCP. In We also do not investigate RLCP and ULCP since this this paper’s focus, each user applies ǫ-local differential privacy, • paper considers a honest-but-curious aggregator instead so our framework is under ULDP. The reasons that we consider of a trusted aggregator. The aggregator is not completely ULDP instead of RLDP, RLCP, and ULCP are as follows. trusted, so the perturbation is implemented at each user We do not consider RLDP which implements perturbation • (i.e., vehicle in IoV). at each user via standard differential privacy, since we aim to achieve user-level privacy protection instead of the weaker record-level privacy protection (a vehicle is a user IV. PROBLEM FORMATION in our IoV applications and may have multiple records). Let x be a user’s true value, and Y be the perturbed value. The motivation is that often much data from a vehicle Under the perturbation mechanism , we use E [Y x] to may be about the vehicle’s regular driver, and it often denote the expectation of the randomizedM output Y givenM | input makes more sense to protect all data about the regular x. Var [Y x] is the variance of output Y given input x. driver instead of just protecting each single record. A MaxVarM( )| denotes the worst-case Var [Y x]. We are inter- similar argument has been recently stated in [35], which ested in findingM a privatization mechanismM | that minimizes incorporates user-level differential privacy into the train- MaxVar( ) by solving the following constraintM minimization problem:M 2For simplicity, we slightly abuse the notation and denote the output of algorithm Yi (resp., algorithm Y ) by Yi (resp., Y ). min MaxVar( ), M M 6 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

s.t. Eq. (1), Algorithm 2: Three-Outputs Mechanism for One- E [Y x]= x, and Dimensional Numeric Data. M | Input: tuple x [ 1, 1] and privacy parameter ǫ. P [Y Y x]=1. ∈ − M ∈ | Output: tuple Y C, 0, C . The second constraint illustrates that our estimator is unbiased, ∈ {− } 1 Sampling a random variable u with the probability and the third constraint shows the proper distribution where distribution as follows: Y is the range of randomized function . In the following M P [u = 1] = P C x, sections, if is clear from the context, we omit the subscript − − ← M P for simplicity. [u =0]= P0 x, and M ← P [u =1]= PC x, ← where P , P and P are given in Eq. (2), V. MECHANISMS FOR ESTIMATION OF A SINGLE C x 0 x C x Eq. (3) and− ← Eq. (4).← ← NUMERIC ATTRIBUTE 2 if u = 1 then − To solve the problem in Section IV, we propose four 3 Y = C; − local differential privacy mechanisms: Three-Outputs, 4 else if u =0 then PM-OPT, PM-SUB, and HM-TP. Fig. 1 compares the worst- 5 Y =0; case noise variances of existing mechanisms and our proposed 6 else mechanisms. Three-Outputs has three discrete output pos- 7 Y = C; sibilities, which incurs little communication cost because two 8 return Y ; bits are enough to encode three different outputs. Moreover, it achieves a small worst-case noise variance in the high privacy regime (small privacy budget ǫ). However, to maintain a low ǫ worst-case noise variance in the low privacy regime (large 1 P0←0 e P0←0 1 P0←0 − 2 + −eǫ+1 − 2 x, if 0 x 1, PC x = − ǫ ≤ ≤ privacy budget ǫ), we propose PM-OPT and PM-SUB. Both of 1 P ← 1 P ← e P ← ←  − 0 0 + − 0 0 − 0 0  x, if 1 x 0, them achieve higher accuracies than Three-Outputs and  2 2 − eǫ(eǫ+1) − ≤ ≤ other existing solutions when the privacy budget ǫ is large.   (3)  Additionally, we discretize their continuous ranges of output P0 0 and P0 x = P0 0 + ( ← P0 0)x, if 1 x 1, for vehicles to encode using a post-processing discretization ← ← eǫ − ← − ≤ ≤ algorithm. In the following sections, we will explain our pro- (4) where P0 0 is defined by posed four mechanisms and the post-processing discretization ← P0 0 := algorithm in detail respectively. ← 0, if ǫ< ln 2, 1 ( e2ǫ 4eǫ 5 A. Three-Outputs Mechanism  − 6 − − −  π 1 ∆1  +2√∆0 cos( 3 + 3 arccos( 3 ))), if ln 2 ǫ ǫ′,  − 2∆ 2 ≤ ≤ Now, we propose a mechanism with three output possi-  0 eǫ bilities named as Three-Outputs which is illustrated in eǫ+2 , if ǫ>ǫ′, Algorithm 2. Three-Outputs ensures low communication  (5)  cost while achieving a smaller worst-case noise variance than in which existing solutions in the high privacy regime (small privacy ∆ := e4ǫ + 14e3ǫ + 50e2ǫ 2eǫ + 25, (6) 0 − budget ǫ). Duchi et al.’s [22] solution contains two output 6ǫ 5ǫ 4ǫ 3ǫ 2ǫ ∆1 := 2e 42e 270e 404e 918e possibilities, and it outperforms other approaches when the − − − − − + 30eǫ 250, (7) privacy budget is small. However, Kairouz et al. [12] prove − that two outputs are not always optimal as ǫ increases. By 3+ √65 outputting three values instead of two, Three-Outputs and ǫ′ := ln ln 5.53. (8) 2 ! ≈ improves the performance as the privacy budget increases, Next, we will show how we derive the above probabil- which is shown in Fig. 1. When the privacy budget is small, ities. For a mechanism which uses x [ 1, 1] as the input Three-Outputs is equivalent to Duchi et al.’s [22] solu- and only three possibilities C, 0, C for∈ the− output value, it tion. satisfies − For notional simplicity, given a mechanism , we often PC x P0 x P C x ǫ ǫ M ǫ-LDP : ← , ← , − ← [e− ,e ], (9a) write P as below. 
We also ′ ′ ′ [Y = y X = x] Py x( ) PC x P0 x P C x ∈ M | P ← M ← ← − ← sometimes omit to obtain [Y = y X = x] and Py x.  unbiased estimation: M | ← Given a tuple , Three-Outputs returns a  x [ 1, 1]  (9b) ∈ −  C PC x +0 P0 x + ( C) P C x = x, perturbed value Y that equals C, 0 or C with probabilities  · ← · ← − · − ← defined by − proper distribution: ǫ 1 P0←0 1 P0←0 e P0←0  Py x 0 and PC x + P0 x + P C x =1. (9c) − + − ǫ− ǫ x, if 0 x 1,  2 2 − e (e +1) ≤ ≤  ← ≥ ← ← − ← P C x = ǫ To calculate values of PC x, P0 x and P C x, we use − ←  1 P0←0  e P0←0 1 P0←0   ← ← − ← − 2 + −eǫ+1 − 2 x, if 1 x 0, Lemma 1 below to convert a mechanism satisfying the  − − ≤ ≤ M1   (2) requirements in (9a) (9b) (9c) to a symmetric mechanism 2.  M 7 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

Then, we use Lemma 2 below to transform the symmetric Then, we may assign values to our designed probabili- mechanism further to whose worst-case noise variance is ties above. We find that if a symmetric mechanism satisfies M3 smaller than 2’s. Next, we use P0 1 to represent other prob- Eq. (19a) and Eq. (19b), it obtains a smaller worst-case noise abilities, andM then we prove that we← get the minimum variance variance. From Lemma 2 below, we enforce ǫ when ǫ using Lemma 3. Finally, Lemma 4 PC 1 = e PC 1, (19a) P0 0 = e P0 1 ← ←− ← ← ǫ and Lemma 5 are used to obtain values for P0 0 and the P C 1 = e P C 1. (19b) ←  − ←− − ← worst-case noise variance of Three-Outputs, respectively. Hence, given a symmetric mechanism 2 satisfying Inequal- M Thus, we can obtain values of PC x, P0 x and P C x using ity (20), we can transform it to a new symmetric mechanism ← ← − ← P0 0. In the following, we will illustrate above processes in 3 which satisfies Eq. (19a) and Eq. (19b) through processes ← M ǫ detail. of Eq. (21) (22) (23) until PC 1 = e P C 1. After ←− − ← By symmetry, for any x [ 1, 1], we enforce transformation, the new mechanism 3 achieves a smaller ∈ − M PC x = P C x, (10a) worst-case noise variance than mechanism 2. Therefore, ← − ←− we use the new symmetric mechanism Mto replace P0 x = P0 x, (10b) M3 M2 where Eq. (10b) can be← derived from←− Eq. (10a). The formal in the future’s discussion. Details of transformation are in the justification of Eq. (10a) (10b) is given by Lemma 1 below. Lemma 2. Since the input domain [ 1, 1] is symmetric, we can transform − Lemma 2. For a symmetric mechanism 2, if any mechanism satisfying requirements in (9a) (9b) (9c) to a ǫ M PC 1( 2)

x2. (25) above reasons, we elaborate Three-Outputs but not − Complete details for obtaining Eq. (25) are in Appendix C of Four-Outputs in this paper. the submitted supplementary file. Next, we use Lemma 4 to obtain the optimal P0 0 in B. PM-OPT Mechanism Three-Outputs to achieve the minimum worst-case← vari- ance as follows: Now, we advocate an optimal piecewise mechanism (PM-OPT) as shown in Algorithm 3 to get a small worst- Lemma 4. The optimal P0 0 to minimize the case variance when the privacy budget is large. As shown ← maxx [ 1,1] Var[Y x] is defined by Eq. (5). in Fig. 1, Three-Outputs’s worst-case noise variance is ∈ − | smaller than PM’s when the privacy budget ǫ < 3.2. But it Proof. The proof details are given in Appendix E of the loses the advantage when the privacy budget ǫ 3.2. As the submitted supplementary file. privacy budget increases, Kairouz et al. [12] suggested≥ to send

Remark 1. Fig. 3 displays how P0 0 changes with ǫ in more information using more output possibilities. Besides, we ← observe that it is possible to improve Wang et al.’s [8] PM to Eq. (5). When the privacy budget ǫ is small, P0 0 =0. Thus, Three-Outputs is equivalent to Duchi et al.’s← [22] solution achieve a smaller worst-case noise variance. Thus, inspired by them, we propose an optimal piecewise mechanism named as when P0 0 =0. However, as the privacy budget ǫ increases, ← PM-OPT with a smaller worst-case noise variance than PM. P0 0 increases, which means that the probability of outputting true← value increases. Algorithm 3: PM-OPT Mechanism for One- Dimensional Numeric Data under Local Differential 1 Privacy. Input: tuple x [ 1, 1] and privacy parameter ǫ. Output: tuple Y∈ −[ A, A]. 0.8 ∈ − 1 Value t is calculated in the Eq. (30); 2 Sample u uniformly at random from [0, 1]; 0.6 eǫ 0 3 if u< t+eǫ then

0 4 Sample Y uniformly at random from P 0.4 [L(ǫ,x,t), R(ǫ,x,t)]; 5 else 0.2 6 Sample Y uniformly at random from [ A, L(ǫ,x,t)) (R(ǫ,x,t), A]; − ∪ 0 7 return Y ; 0 2 4 6 8 privacy budget

Fig. 3: Optimal P0 0 if the privacy budget ǫ [0, 8]. ← ∈ ( = | ) By summarizing above, we obtain P C x, PC x and ݌݂݀ ܻ ݕ ݔ ← ← − P0 x from Eq. (2), Eq. (3) and Eq. (4) using P0 0. ← ← Then, we can calculate the optimal P0 0 to obtain the ← minimum worst-case noise variance of Three-Outputs as ܿ follows: Lemma 5. The minimum worst-case noise variance of ݀ Three-Outputs is obtained when P0 0 satisfies Eq. (5). ← 0 Proof. The proof details are given in Appendix F of the ( , , ) ( , , ) submitted supplementary file. െܣ ܣ ݕ ߳ ݔ ݐ functionݔ ܴ ߳ ݔFݐ[Y = y x] of the ܮFig. 4: The probability density A clarification about Three-Outputs ver- randomized output Y after applying ǫ-local differential| pri- sus Four-Outputs. One may wonder why we consider vacy. a perturbation mechanism with three outputs (i.e., our Three-Outputs) instead of a perturbation mechanism For a true input x [ 1, 1], the probability density function with four outputs (referred to as Four-Outputs), since of the randomized output∈ − Y [ A, A] after applying local using two bits to encode the output of a perturbation differential privacy is given by∈ − mechanism can represent four outputs. The reason is as c,for y [L(ǫ,x,t), R(ǫ,x,t)], (26a) follows. The approach to design Four-Outputs is similar F [Y = y x]= ∈ | for , (26b) to that for Three-Outputs, but the detailed analysis for d, y [ A, L(ǫ,x,t)) (R(ǫ,x,t), A] where  ∈ − ∪ Four-Outputs will be even more tedious than that for eǫt(eǫ 1) c = − , (27) Three-Outputs (which is already quite complex). Given 2(t + eǫ)2

9 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

t(eǫ 1) d = − , (28) Lemma 7. The optimal t for mint maxx [ 1,1] Var[Y x] is 2(t + eǫ)2 Eq. (30). ∈ − | (eǫ + t)(t + 1) A = , (29) Proof. By computing the first-order derivative and second- t(eǫ 1) ǫ − order derivative of mint maxx [ 1,1] Var[Y x], we get the (e + t)(xt 1) ∈ − | L(ǫ,x,t)= − , optimal t. The proof details are given in Appendix D of the t(eǫ 1) − submitted supplementary file. (eǫ + t)(xt + 1) R(ǫ,x,t)= , and t(eǫ 1) − C. PM-SUB Mechanism 1 e2ǫ +22/3 3 e2ǫ e4ǫ+ We propose a suboptimal piecewise mechanism (PM-SUB) 2 −  q to simplify the sophisticated computation of t in Eq. (4) of p ǫ 3ǫ 1 3 4e 2e PM-OPT, and details of PM-SUB are shown in Algorithm 4.  2e2ǫ 22/3 e2ǫ e4ǫ + − 2 − − 2ǫ 2/3 3 2ǫ 4ǫ  s e +2 √e e  ǫ p − Algorithm 4: PM-SUB Mechanism for One-  e p  , if ǫ< ln √2, (30a) Dimensional Numeric Data under Local Differential − 2  Privacy.  1 3 t = e2ǫ +22/3 e2ǫ e4ǫ+ − 2 − Input: tuple x [ 1, 1] and privacy parameter ǫ.  q ∈ −  p ǫ 3ǫ Output: tuple Y [ A, A]. 1 2ǫ 2/3 3 2ǫ 4ǫ 4e 2e ∈ − 2e 2 e e − 1 Sample uniformly at random from ; 3 u [0, 1] 2s − − − e2ǫ +22/3 √e2ǫ e4ǫ eǫ  2 if u< then  ǫ p − eǫ/3+eǫ  e  , if ǫ> ln √2, p (30b) 3 Sample Y uniformly at random from − 2 (eǫ+eǫ/3)(xeǫ/3 1) (eǫ+eǫ/3)(xeǫ/3+1)  −  [ eǫ/3(eǫ 1) , eǫ/3(eǫ 1) ];  3+2√3 1 − −  − , if ǫ = ln √2. (30c) 4 else  √ p 2 5 Sample uniformly at random from  Y  t 1 L(ǫ,1,t) (eǫ+eǫ/3)(xeǫ/3 1) (eǫ+eǫ/3)(xeǫ/3+1) The meaning of t can be seen from t+1− = R(ǫ,1,t) . When −  [ A, eǫ/3(eǫ 1) ) ( eǫ/3(eǫ 1) , A]; the input is x =1, the length of the higher probability density − − ∪ − eǫt(eǫ 1) function F [Y = y x] = − is R(ǫ, 1,t) L(ǫ, 1,t). 6 return Y ; | 2(t+eǫ)2 − R(ǫ, 1,t) is the right boundary, and L(ǫ, 1,t) is the left t 1 boundary. If 0

Thus, when x =1, we obtain the worst-case noise variance as follows: D. Discretization Post-Processing t +1 (t + eǫ) (t + 1)3 + eǫ 1 Both PM-OPT and PM-SUB’s output ranges is [ 1, 1] which max Var[Y x]= + − . − x [ 1,1] | eǫ 1 3t2(eǫ 1)2 is continuous, so that there are infinite output possibilities ∈ −  − − (32) given an input x. Thus, it is difficult to encode their outputs Then, we obtain the optimal t in Lemma 7 to minimize for vehicles. Hence, we consider to apply a post-processing Eq. (32). process to discretize the continuous output range into finite

10 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

① If y is one of the above (2m + 1) values, we set Z as y. 1 ② If y is not one of the above (2m + 1) values, and then 0.995 there exist some integer k m, m +1,...,m 1 kC (k+1)C∈ {− − ym− } 0.99 such that m

0.965 of Z given the original input as x equals x). The following Lemma 8 shows the probability distribution 0.96 of assigning y with a boundary value in the second case above 0.955 0 1 2 3 4 5 6 7 8 when the intermediate output y is not one of discrete (2m+1) privacy budget values. Fig. 5: PM-OPT’s worst-case noise variance versus PM-SUB’s Lemma 8. After we obtain the intermediate output y after worst-case noise variance. perturbation, we discretize it to a random variable Z equal to kC (k+1)C m or m with the following probabilities: ym kC Algorithm 5: Discretization Post-Processing. k +1 C , if z = m , P [Z = z Y = y]= − (35) Input: Perturbed data y [ C, C], and domain  ym (k+1)C ∈ − | k, if z = . [ C, C] is separated into 2m pieces, where m  C − m is− a positive integer. Proof. The proof details are given in Appendix K of the  Output: Discrete data Z. submitted supplementary file. 1 Sample a Bernoulli variable u such that After discretization, the worst-case noise variance does not m y C ( · + 1) m change or get worse proved by Lemma 9 as follows: P [u =1]= · C y ; m − · C   ! Lemma 9. Let local differential privacy mechanism be Mech- 2 if u =1 then · anism 1, and discretization algorithm be Mechanism 2. C m y M M 3 Z = · C ; Let all of output possibilities of Mechanism 1 be S1, and ⌊m ⌋ M 4 else output possibilities of Mechanism 2 be S2. S2 S1. When m·y M ⊂ C ( +1) given input x, 1 and 2 are unbiased. The worst-case 5 · C Z = ⌊ m ⌋ ; M M noise variance of Mechanism 2 is greater than or equal to 6 return Z; the worst-case noise varianceM of Mechanism . M1 Proof. The proof details are given in Appendix L of the submitted supplementary file. output possibilities. Algorithm 5 shows our discretization post- processing steps. E. HM-TP Mechanism The idea of Algorithm 5 is as follows. We discretize the Fig. 1 shows that Three-Outputs outperforms PM-SUB range of output into 2m parts due to the symmetric range when the privacy budget ǫ is small, whereas PM-SUB [ C, C], and then we obtain 2m + 1 output possibilities. achieves a smaller variance if the privacy budget ǫ is large. After− we get a perturbed data y, it will fall into one of 2m To fully take advantage of two mechanisms, we combine segments. Then, we categorize it to the left boundary or the Three-Outputs and PM-SUB to create a new hybrid right boundary of the segment, which resembles sampling a mechanism named as HM-TP. Fig. 1 illustrates that HM-TP Bernoulli variable. obtains a lower worst-case noise variance than other solutions. Next, we explain how we derive probabilities for the Hence, HM-TP invokes PM-SUB with probability β. Oth- Bernoulli variable. Let the original input be x. A random erwise, it invokes Three-Outputs. We define the noisy variable Y represents the intermediate output after the per- variance of HM-TP as Var [Y x] given inputs x as follows: H | turbation and a random variable Z represents the output after Var [Y x]= β Var [Y x]+(1 β) Var [Y x], H | · P | − · T | the discretization. The range of Y is [ C, C]. Because the where Var [Y x] and Var [Y x] denote noisy outputs’ vari- − P T range of output is symmetric with respect to 0, we discretize ances incurred| by PM-SUB and| Three-Outputs, respec- both [ C, 0] and [0, C] into m parts, where the value of m tively. The following lemma presents the value of β: depends− on the user’ requirement. Thus, we discretize Y to Z Lemma 10. The maxx [ 1,1] Var [Y x] is minimized when β to take only the following (2m + 1) values: H is Eq. (150). 
Due to the∈ − complicated equation| of , we put it C β i : integer i m, m +1,...,m . (34) in the appendix. × m ∈ {− − } WhenY is instantiated as y [ C, C], we have the following Proof. The proof details are given in Appendix M of the two cases: ∈ − submitted supplementary file.

11 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

Since we have obtained the probability β, we can calculate and Appendix S of the submitted supplementary file the exact expression for the worst-case noise variance in proves our selected k is optimal after extending PM-SUB, Lemma 11 as follows: Three-Outputs, and HM-TP to support d dimensional attributes. Lemma 11. If β satisfies Lemma 10, we obtain the worst-case Overall, our algorithm for collecting multiple attributes noise variance of HM-TP as outperforms existing solutions, which is confirmed by our max Var [Y x]= x [ 1,1] H | experiments in the Section VII. But Three-Outputs uses ∈ − 2(eǫ a)2(eǫ 1) aeǫ(eǫ+1)2 only one more bit compared with Duchi et al.’s [22] solution Var [Y x∗], if 0 <β< 2(eǫ−a)2(eǫ−+t)−aeǫ(eǫ+1)2 , H | − − to encode outputs. Moreover, our Three-Outputs obtains a ( max Var [Y 0], Var [Y 1] , otherwise, higher accuracy in the high privacy regime (where the privacy { H | H | } (β 1)aeǫ(eǫ+1)2 budget is small) and saves many bits for encoding since PM and where x∗ := 2(eǫ a−)2(β(eǫ+t) eǫ+1) and a = P0 0 which is defined in Eq. (5).− − ← HM’s continuous output range requires infinite bits to encode, whereas PM-SUB and HM-TP’s advantages are obvious at a Proof. The proof details are given in Appendix U of the large privacy budget. Furthermore, because vehicles can not submitted supplementary file. encode continuous range, we discretize the continuous range of outputs to discrete outputs. Our experiments in Section ?? VI. MECHANISMS FOR ESTIMATION OF MULTIPLE confirm that we can achieve similar results to algorithms NUMERIC ATTRIBUTES before discretizing by carefully designing the number of discrete parts. Hence, our proposed algorithms are obviously Now, we consider a case in which the user’s data record more suitable for vehicles than existing solutions. contains d > 1 attributes. There are three existing solu- Intuitively, Algorithm 6 requires every user to submit k tions to collect multiple attributes: (i) The straightforward attributes instead of d attributes, such that the privacy budget approach which collects each attribute with privacy budget for each attribute increases from ǫ/d to ǫ/k, which helps ǫ/d. Based on the composition theorem [47], it satisfies ǫ- to minimize the noisy variance. In addition, by setting k LDP after collecting of all attributes. But the added noise can as Eq. (36), algorithm 6 achieves an asymptotically optimal be excessive if d is large [8]. (ii) Duchi et al.’s [22] solution, performance while preserving privacy, which we will prove which is rather complicated, handles numeric attributes only. using Lemma 12 and 13. Lemma 12 and 13 are proved in the (iii) Wang et al.’s [8] solution is the advanced approach same way as that of Lemma 4 and 5 in [8]. that deals with a data tuple containing both numeric and categorical attributes. Their algorithm requires to calculate an Lemma 12. Algorithm 6 satisfies ǫ-local differential privacy. optimal k < d based on the single dimensional attribute’s ǫ- In addition, given an input tuple x, it outputs a noisy tuple Y , LDP mechanism, and a user submits selected k dimensional such that for any j [1, d], and each tj of those k attributes ∈ attributes instead of d dimensions. is selected uniformly at random (without replacement) from all d attributes of x, and then E[Y [tj ]] = x[tj ]. Algorithm 6: Mechanism for Multiple-Dimensional Proof. Algorithm 6 composes k numbers of ǫ-LDP pertur- Numeric Attributes. 
bation algorithms; thus, based on composition theorem of d Input: tuple x [ 1, 1] and privacy parameter ǫ. differential mechanism [48], Algorithm 6 satisfies ǫ-LDP. As Output: tuple Y∈ −[ A, A]d. ∈ − we can see from Algorithm 6, each perturbed output Y equals 1 Let Y = 0, 0,..., 0 ; d k k to k yj with probability d or equals to 0 with probability 1 d . h i ǫ − 2 Let k = max 1, min d, ; Thus, E[Y [t ]] = k E[ d y ]= E[y ]= x[t ] holds. { { 2.5 }} j d · k · j j j 3 Sample k values uniformlyj withoutk replacement from 1 n Lemma 13. For any j [1, d], let Z[tj]= n i=1 Y [tj ] and 1, 2,...,d ; 1 n .∈ With at least probability, { } X[tj]= n i=1 x[tj ] 1 β 4 for each sampled value j do − P ǫ d ln(d/β)) 5 P Feed x[tj ] and k as input to PM-SUB, max Z[tj ] X[tj] = O . j [1,d] | − | ǫ√n Three-Outputs or HM-TP, and obtain a noisy ∈ p  value yj ; Proof. The proof details are given in Appendix R of the d 6 submitted supplementary file. Y [tj ]= k yj ; 7 return Y ; VII. EXPERIMENTS Thus, we follow Wang et al.’s [8] idea to extend Sec- We implemented both existing solutions and our proposed tion V to the case of multidimensional attributes. Algorithm6 solutions, including PM-SUB, Three-Outputs, HM-TP shows the pseudo-code of our extension for our PM-SUB, proposed by us, PM and HM proposed by Wang et al. [8], Three-Outputs, and HM-TP. Given a tuple x [ 1, 1]d, Duchi et al.’s [22] solution and the traditional Laplace mech- the algorithm returns a perturbed tuple Y that has∈ non-zero− anism. Our datasets include (i) the WISDM Human Activ- value on k attributes, where ity Recognition dataset [49] is a set of accelerometer data ǫ collecting on Android phones from 35 subjects performing 6 k = max 1, min d, , (36) { { 2.5 }} activities, where the domain of the timestamps of the phone’s j k 12 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

-2 -3 10 10 10-2 10-3

10-3 10-4 10-4 -4 10 10-4 MSE MSE MSE MSE -5 10-5 10 10-5

-6 -6 -6 10 10 10 10-6 0.5 1 2 4 0.5 1 2 4 0.5 1 2 4 0.5 1 2 4 privacy budget privacy budget privacy budget privacy budget (a) MX-Numeric (b) BR-Numeric (c) WISDM-Numeric (d) Vehicle-Numeric Fig. 6: Result accuracy for mean estimation (on numeric attributes).

10-2 10-2 10-2 10-2

-4 -4 -4 -4

MSE 10 MSE 10 MSE 10 MSE 10

10-6 10-6 10-6 10-6 0.5 1 2 4 0.5 1 2 4 0.5 1 2 4 0.5 1 2 4 privacy budget privacy budget privacy budget privacy budget (a) µ = 0 (b) µ = 1/3 (c) µ = 2/3 (d) µ = 1 Fig. 7: Result accuracy on synthetic datasets with 16 dimensions, each of which follows a Gaussian distribution N(µ, 1/16) truncated to [ 1, 1]. −

0.4 0.4 0.55 0.5 0.3 0.3 0.45

0.4 0.2 0.2 0.35 Misclassification rates Misclassification rates Misclassification rates 0.3 0.5 1 2 4 0.5 1 2 4 0.5 1 2 4 privacy budget privacy budget privacy budget (a) MX (b) BR (c) WISDN Fig. 8: Logistic Regression. uptime is removed from the dataset, and the remaining 3 extracted from each sensor’s microphone. One attribute is numeric attributes are accelerations in x, y, and z directions categorical denoting different types of vehicles, which are measured by the Android phone’s accelerometer and 2 cat- labeled manually by a human operator to ensure high accuracy. egorical attributes; (ii) two public datasets extracted from Besides, information about vehicles is gathered to find out Integrated Public Use Microdata Series [50] contain census the type or brand of the vehicle. The Vehicle dataset is records from Brazil (BR) and Mexico (MX). BR includes 4M also used as the federated learning benchmark by [52]. We tuples and 16 attributes, of which 6 are numerical and 10 are normalize the domain of each numeric attribute to [ 1, 1]. In categorical. MX contains 4M records and 19 attributes, of our experiments, we report average results over 100−runs. which 5 are numerical and 14 are categorical; (iii) a Vehicle dataset obtained by collecting from a distributed sensor net- A. Results on the Mean Values of Numeric Attributes work, including acoustic (microphone), seismic (geophone), We estimate the mean of every numeric attribute by col- and infrared (polarized IR sensor) [51]. The dataset contains lecting a noisy multidimensional tuple from each user. To 98528 tuples and 101 attributes, where 100 attributes are compare with Wang et al.’s [8] mechanisms, we follow their numerical representing information such as the raw time series experiments and then divide the total privacy budget ǫ into data observed at each sensor and acoustic feature vectors two parts. Assume a tuple contains d attributes which include

13 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

0 0 0 10 10 10

-2 -1 10 MSE MSE MSE 10

10-5 10-4 0.5 1 2 4 0.5 1 2 4 0.5 1 2 4 privacy budget privacy budget privacy budget (a) MX (b) BR (c) WISDN Fig. 9: Linear Regression.

0.4 0.4 0.5 0.5 0.4 0.3 0.3 0.3 0.2 0.2 0.4 0.2 0.1 Misclassification rates Misclassification rates Misclassification rates

0.1 0.3 Misclassification rates 0.5 1 2 4 0.5 1 2 4 0.5 1 2 4 0.5 1 2 4 privacy budget privacy budget privacy budget privacy budget (a) MX (b) BR (c) WISDN (d) Vehicle Fig. 10: Support Vector Machines.

10-2 10-2

10-4 10-4 10-4 MSE MSE MSE

10-6 10-6 10-6 0.5 1 2 4 0.5 1 2 4 0.5 1 2 4 privacy budget privacy budget privacy budget (a) MX (b) BR (c) WISDN Fig. 11: Result accuracy for mean estimation with discretization post processing on PM, HM, and HM-TP.

dn numeric attributes and dc categorical attributes. Then, we We also run a set of experiments on synthetic datasets allocate dnǫ/d budget to numeric attributes, and dcǫ/d to that contain numeric attributes only. We create four synthetic categorical ones, respectively. Our approach of using LDP datasets, including 16 numeric attributes where each attribute for categorical data is same as that of Wang et al. [8]. We value is obtained by sampling from a Gaussian distribution 1 2 estimate the mean value for each of the numeric attributes with mean value u 0, 3 , 3 , 1 and standard deviation 1 ∈ { } using existing methods: (i) Duchi et al.’s [22] solution han- of 4 . By evaluating the MSE in estimating mean values of dles multiple numeric attributes directly; (ii) when using the numeric attributes with our proposed mechanisms, we present Laplace mechanism, it applies ǫ/d budget to each numeric our experimental results in Fig. 7. Hereby, we confirm that attribute individually; (iii) PM and HM are from Wang et al. [8]. PM-SUB, Three-Outputs and HM-TP outperform existing In Section VI, we evaluate the mean square error (MSE) solutions. of the estimated mean values for numeric attributes using our proposed approaches. Fig. 6 presents MSE results as a function of the total budget of ǫ in the datasets (WISDM, MX, B. Results on Empirical Risk Minimization BR and Vehicle). To simplify the complexity, we use last 6 In the following experiments, we evaluate the proposed al- numerical attributes of the Vehicle dataset to calculate MSE. gorithms’ performance using linear regression, logistic regres- Overall, our experimental evaluation shows that our proposed sion, and SVM classification tasks. We change each categorical approaches outperform existing solutions. HM-TP outperforms attribute t with k values into k 1 binary attributes with a j − existing solutions in all settings, whereas PM-SUB’s MSE is domain 1, 1 , for example, given tj , (i) 1 represents the smaller than PM’s when privacy budget ǫ is large such as 4, and l-th (l <{− k) value} on the l-th binary attribute and 1 on Three-Outputs’ performance is better at a small privacy each of the rest of k 2 attributes; (ii) 1 represents− the budget. Hence, experimental results are in accordance with our k-th value on all binary− attributes. After the− transformation, theories. the dimension of WISDN is 43, BR (resp. MX) is 90 (resp.

14 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

100 0.4 100 0.3

-2 0.2

MSE 10 MSE MSE -1 0.1 10

0.5 1 2 4 0.5 1 2 4 0.5 1 2 4 privacy budget privacy budget privacy budget (a) MX (b) BR (c) WISDN Fig. 12: Linear Regression with discretization post processing on PM, HM, and HM-TP (privacy parameter ǫ =4).

0.4 0.4 0.5 0.3 0.3 0.4

0.2 0.2 0.3 Misclassification rates Misclassification rates Misclassification rates 0.5 1 2 4 0.5 1 2 4 0.5 1 2 4 privacy budget privacy budget privacy budget (a) MX (b) BR (c) WISDN Fig. 13: Logistic Regression with discretization post processing on PM, HM, and HM-TP (privacy budget ǫ =4).

0.04 0.02 0.03 PM PM 0.07 PM PM-SUB 0.015 PM-SUB PM-SUB 0.02 Three-Outputs Three-Outputs Three-Outputs 0.01 0.06 MSE MSE MSE 0.01 0.005 0.05

3 21 201 2001 3 21 201 2001 3 21 201 2001 Output Possibilities Output Possibilities Output Possibilities (a) MX (b) BR (c) WISDN Fig. 14: Linear Regression with discretization post processing on PM, HM, and HM-TP (privacy budget ǫ =4).

0.186 updates the model after each group of vehicles send noisy PM PM-SUB gradients. The experiment involves 8 competitors: PM-SUB, Three-Outputs Three-Outputs, HM-TP, PM, HM, Duchi et al.’s solution, 0.184 Laplace and a non-private setting. We set the regularization 4 factor λ = 10− in all approaches. We use 10-fold cross-

Misclassification rate 0.182 200001 2e+06 2e+07 validation 5 times to evaluate the performance of each method Output Possibilities in each dataset. Fig. 8 and Fig. 10 show that the proposed (a) BR mechanisms (PM-SUB, Three-Outputs, and HM-TP) have lower misclassification rates than other mechanisms. Fig. 9 Fig. 15: Support Vector Machine with discretization post shows the MSE of the linear regression model. We ignore processing on PM, HM, and HM-TP (privacy budget ǫ =5). Laplace’s result because its MSE clearly exceeds those of other mechanisms. In the selected privacy budgets, our proposed mechanisms (PM-SUB, Three-Outputs, and HM-TP) out- 94) and Vehicle is 101. Since both the BR and MX datasets perform existing approaches, including Laplace mechanism, contain the “total income” attribute, we use it as the dependent Duchi et al.’s solution, PM, and HM. variable and consider other attributes as independent variables. The Vehicle dataset is used for SVM [51,52]. Each tuple in the Vehicle dataset contains 100-dimensional feature and a binary C. Results after Discretization label. In this section, we add a discretization post processing step Consider each tuple of data as the dataset of a vehicle, so in Algorithm 5 to the implementation of mechanisms with vehicles calculate gradients and run different LDP mechanisms continuous range of outputs, including PM, PM-SUB, HM and to generate noisy gradients. Each mini-batch is a group of HM-TP. To confirm that the discretization is effective, we vehicles. Thus, the centralized aggregator i.e. cloud server perform the following experiments. We separate the output

15 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

domain [ C, C] into 2000 segments, and then we have 2001 [2] V. Bindschaedler, R. Shokri, and C. A. Gunter, “Plausible deniability possible outputs− given an initial input x. We add a discretiza- for privacy-preserving data synthesis,” Proceedings of the VLDB En- dowment, vol. 10, no. 5, pp. 481–492, 2017. tion step to the experiments in Section VII-A. Fig. 11 displays [3] B. Hitaj, G. Ateniese, and F. Perez-Cruz, “Deep models under the GAN: our experimental results. We confirm that our proposed ap- information leakage from collaborative deep learning,” in Proceedings of proaches outperform existing solutions in estimating the mean the 2017 ACM SIGSAC Conference on Computer and Communications value using three real-world datasets: WISDM, MX, and BR Security, 2017, pp. 603–618. [4] G. Xu, H. Li, S. Liu, K. Yang, and X. Lin, “Verifynet: Secure and veri- after discretizing. fiable federated learning,” IEEE Transactions on Information Forensics In addition, we use log regression and linear regression to and Security, vol. 15, pp. 911–926, 2019. evaluate the performance after discretization. We repeat the [5] J. Chen, X. Pan, R. Monga, S. Bengio, and R. Jozefowicz, “Revisiting distributed synchronous SGD,” arXiv preprint arXiv:1604.00981, 2016. experiments in Section VII-B with an additional discretiza- [6] J. Duchi, M. J. Wainwright, and M. I. Jordan, “Local privacy and tion post processing step. Fig. 12 and Fig. 13 present our minimax bounds: Sharp rates for probability estimation,” in Advances experimental results. Compared with other approaches, the in Neural Information Processing Systems, 2013, pp. 1529–1537. [7] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise performance is similar to that before discretizing. Furthermore, to sensitivity in private data analysis,” in Theory of Cryptography Fig. 14 illustrates how the accuracy changes as output possi- Conference (TCC), 2006, pp. 265–284. bilities increase. It shows that the misclassification rate of the [8] N. Wang, X. Xiao, Y. Yang, J. Zhao, S. C. Hui, H. Shin, J. Shin, logistic regression task and the MSE of the linear regression and G. Yu, “Collecting and analyzing multidimensional data with local differential privacy,” in IEEE International Conference on Data task are related to the size of output possibilities. Although Engineering (ICDE), 2019. incurring with randomness, we find that the misclassification [9] R. Bassily and A. Smith, “Local, private, efficient protocols for succinct rate and MSE decrease as the number of output possibili- histograms,” in Proceedings of the forty-seventh annual ACM symposium on Theory of computing, 2015, pp. 127–135. ties increases. When there are three output possibilities, it [10] T. Wang, J. Blocki, N. Li, and S. Jha, “Locally differentially private pro- incurs randomness. Moreover, Fig. 15 shows that PM-SUB tocols for frequency estimation,” in 26th {USENIX} Security Symposium outperforms Three-Outputs, when the number of output ({USENIX} Security 17), 2017, pp. 729–745. ´ possibilities is large. However, when we discretize the range [11] U. Erlingsson, V. Pihur, and A. Korolova, “RAPPOR: Randomized aggregatable privacy-preserving ordinal response,” in Proc. ACM Con- of outputs into 2000 segments, the performance is satisfactory ference on Computer and Communications Security (CCS), 2014, pp. and similar to the performance with a continuous range of 1054–1067. outputs. Hence, our proposed approaches combined with the [12] P. Kairouz, S. Oh, and P. 
Viswanath, “Extremal mechanisms for local differential privacy,” in Advances in Neural Information Processing discretization step help retain the performance while enabling Systems, 2014, pp. 2879–2887. the usage in vehicles. [13] L. Ou, Z. Qin, S. Liao, T. Li, and D. Zhang, “Singular spectrum analysis for local differential privacy of classifications in the smart grid,” IEEE VIII. CONCLUSION Internet of Things Journal, 2020. [14] W. Tang, J. Ren, K. Deng, and Y. Zhang, “Secure data aggregation of In this paper, we propose PM-OPT, PM-SUB, lightweight e-healthcare IoT devices with fair incentives,” IEEE Internet Three-Outputs, and HM-TP local differential privacy of Things Journal, vol. 6, no. 5, pp. 8714–8726, 2019. mechanisms. These mechanisms effectively preserve the [15] M. Sun and W. P. Tay, “On the relationship between inference and data privacy in decentralized IoT networks,” IEEE Transactions on privacy when collecting data records and computing Information Forensics and Security, vol. 15, pp. 852–866, 2019. accurate statistics in various data analysis tasks, including [16] P. Zhao, G. Zhang, S. Wan, G. Liu, and T. Umer, “A survey of local estimating the mean frequency and machine learning tasks differential privacy for securing Internet of Vehicles,” The Journal of such as SVM classification, logistic regression, and linear Supercomputing, pp. 1–22, 2019. [17] L. Sun, J. Zhao, and X. Ye, “Distributed clustering in the anonymized regression. Moreover, we integrate our proposed local space with local differential privacy,” arXiv preprint arXiv:1906.11441, differential privacy mechanisms with FedSGD algorithm 2019. to create an LDP-FedSGD algorithm. The LDP-FedSGD [18] M. E. Gursoy, A. Tamersoy, S. Truex, W. Wei, and L. Liu, “Secure and utility-aware data collection with condensed local differential privacy,” algorithm enables the vehicular crowdsourcing applications IEEE Transactions on Dependable and Secure Computing, 2019. to train a machine learning model to predict the traffic [19] S. Ghane, A. Jolfaei, L. Kulik, K. Ramamohanarao, and D. Puthal, “Pre- status while avoiding the privacy threat and reducing the serving privacy in the internet of connected vehicles,” IEEE Transactions communication cost. More specifically, by leveraging LDP on Intelligent Transportation Systems, 2020. [20] L. Lyu, J. Yu, K. Nandakumar, Y. Li, X. Ma, J. Jin, H. Yu, and K. S. mechanisms, adversaries are unable to deduce the exact Ng, “Towards fair and privacy-preserving federated deep models,” IEEE location information of vehicles from uploaded gradients. Transactions on Parallel and Distributed Systems, vol. 31, no. 11, pp. Then, FL enables vehicles to train their local machine 2524–2541, 2020. [21] L. Sun, X. Ye, J. Zhao, C. Lu, and M. Yang, “Bisample: Bidirectional learning models using collected data and then send noisy sampling for handling missing data with local differential privacy,” 2020. gradients instead of data to the cloud server to obtain a global [22] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, “Minimax optimal model. Extensive experiments demonstrate that our proposed procedures for locally private estimation,” Journal of the American Statistical Association, vol. 113, no. 521, pp. 182–201, 2018. approaches are effective and able to perform better than [23] C. Xu, J. Ren, L. She, Y. Zhang, Z. Qin, and K. Ren, “EdgeSanitizer: existing solutions. 
Further, we intend to apply our proposed Locally differentially private deep inference at the edge for mobile data LDP mechanisms to deep neural network to deal with more analytics,” IEEE Internet of Things Journal, 2019. complex data analysis tasks. [24] W.-S. Choi, M. Tomei, J. R. S. Vicarte, P. K. Hanumolu, and R. Kumar, “Guaranteeing local differential privacy on ultra-low-power systems,” in 2018 ACM/IEEE 45th Annual International Symposium on Computer EFERENCES R Architecture (ISCA). IEEE, 2018, pp. 561–574. [1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, [25] X. He, J. Liu, R. Jin, and H. Dai, “Privacy-aware offloading in mobile- “Communication-efficient learning of deep networks from decentralized edge computing,” in GLOBECOM 2017-2017 IEEE Global Communi- data,” in Artificial Intelligence and Statistics, 2017, pp. 1273–1282. cations Conference. IEEE, 2017, pp. 1–6.

16 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

[26] X. Li, S. Liu, F. Wu, S. Kumari, and J. J. Rodrigues, “Privacy [50] S. Ruggles, S. Flood, R. Goeken, J. Grover, E. Meyer, J. Pacas, and preserving data aggregation scheme for mobile edge computing assisted M. Sobek, “IPUMS USA: Version 10.0,” dataset]. Minneapolis, MN: IoT applications,” IEEE Internet of Things Journal, 2018. IPUMS. https://doi.org/10.18128/D010.V10.0, 2020. [27] P. C. M. Arachchige, P. Bertok, I. Khalil, D. Liu, S. Camtepe, and [51] M. F. Duarte and Y. H. Hu, “Vehicle classification in distributed sensor M. Atiquzzaman, “Local differential privacy for deep learning,” IEEE networks,” Journal of Parallel and Distributed Computing, vol. 64, no. 7, Internet of Things Journal, 2019. pp. 826–838, 2004. [28] V. Pihur, “The podium mechanism: Improving on the laplace and [52] T. Li, M. Sanjabi, A. Beirami, and V. Smith, “Fair resource allocation staircase mechanisms,” arXiv preprint arXiv:1905.00191, 2019. in federated learning,” arXiv preprint arXiv:1905.10497, 2019. [29] L. Zhao, S. Hu, Q. Wang, J. Jiang, C. Shen, X. Luo, and P. Hu, “Shielding collaborative learning: Mitigating poisoning attacks through client-side detection,” IEEE Transactions on Dependable and Secure Computing, vol. PP, no. 99, pp. 1–1, 10.1109/TDSC.2020.2 986 205, 2020. [30] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang, D. Niyato, and C. Miao, “Federated learning in mobile edge networks: A comprehensive survey,” arXiv preprint arXiv:1909.11875, 2019. [31] M. Hao, H. Li, X. Luo, G. Xu, H. Yang, and S. Liu, “Efficient and privacy-enhanced federated learning for industrial artificial intelligence,” IEEE Transactions on Industrial Informatics, 2019. [32] S. Lu, Y. Yao, and W. Shi, “Collaborative learning on the edges: A case study on connected vehicles,” in 2nd USENIX Workshop on Hot Topics in Edge Computing (HotEdge 19), 2019. [33] R. Fantacci and B. Picano, “Federated learning framework for mobile edge computing networks,” CAAI Transactions on Intelligence Technol- ogy, vol. 5, no. 1, pp. 15–21, 2020. [34] Y. M. Saputra, D. T. Hoang, D. N. Nguyen, E. Dutkiewicz, M. D. Mueck, and S. Srikanteswara, “Energy demand prediction with federated learning for electric vehicle networks,” arXiv preprint arXiv:1909.00907, 2019. [35] M. Brendan, D. Ramage, K. Talwar, and L. Zhang, “Learning differen- tially private recurrent language models,” in International Conference on Learning and Representation, 2018. [36] S. Truex, N. Baracaldo, A. Anwar, T. Steinke, H. Ludwig, R. Zhang, and Y. Zhou, “A hybrid approach to privacy-preserving federated learning,” in Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, 2019, pp. 1–11. [37] R. Hu, Y. Guo, H. Li, Q. Pei, and Y. Gong, “Personalized federated learning with differential privacy,” IEEE Internet of Things Journal, 2020. [38] E. Bagdasaryan, O. Poursaeed, and V. Shmatikov, “Differential privacy has disparate impact on model accuracy,” in Advances in Neural Infor- mation Processing Systems, 2019, pp. 15 453–15 462. [39] A. Triastcyn and B. Faltings, “Federated learning with bayesian differ- ential privacy,” arXiv preprint arXiv:1911.10071, 2019. [40] Y. Wang, Y. Tong, and D. Shi, “Federated latent dirichlet allocation: A local differential privacy based framework.” in AAAI, 2020, pp. 6283– 6290. [41] L. Lyu, J. C. Bezdek, X. He, and J. Jin, “Fog-embedded deep learning for the Internet of Things,” IEEE Transactions on Industrial Informatics, 2019. [42] T. Li, Z. Liu, V. Sekar, and V. 
Smith, “Privacy for free: Communication- efficient learning with differential privacy using sketches,” arXiv preprint arXiv:1911.00972, 2019. [43] Y. Zhao, J. Zhao, L. Jiang, R. Tan, and D. Niyato, “Mobile edge com- puting, blockchain and reputation-based crowdsourcing IoT federated learning: A secure, decentralized and privacy-preserving system,” arXiv preprint arXiv:1906.10893, 2019. [44] L. Zhao, Q. Wang, Q. Zou, Y. Zhang, and Y. Chen, “Privacy-preserving collaborative deep learning with unreliable participants,” IEEE Transac- tions on Information Forensics and Security, vol. 15, no. 1, pp. 1486– 1500, 2020. [45] L. Lyu, H. Yu, and Q. Yang, “Threats to federated learning: A survey,” arXiv preprint arXiv:2003.02133, 2020. [46] T. Wang, J. Zhao, H. Yu, J. Liu, X. Yang, X. Ren, and S. Shi, “Privacy- preserving crowd-guided AI decision-making in ethical dilemmas,” in ACM International Conference on Information and Knowledge Man- agement (CIKM), 2019, pp. 1311–1320. [47] C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,” Foundations and Trends in Theoretical Computer Science (FnT- TCS), vol. 9, no. 3–4, pp. 211–407, 2014. [48] F. McSherry and K. Talwar, “Mechanism design via differential privacy,” in IEEE Symposium on Foundations of Computer Science (FOCS), 2007, pp. 94–103. [49] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, “Activity recognition us- ing cell phone accelerometers,” ACM SigKDD Explorations Newsletter, vol. 12, no. 2, pp. 74–82, 2011.

17 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

2 P0 x( 1)+ P0 x( 1) 2 APPENDIX = C 1 ← M ←− M x , (44) − 2 − when given x.  A. Proof of Lemma 1 The variance− of mechanism is M1 Var [Y X = x]= E[Y 2 X = x] (E[Y X = x])2 The mechanism 2 satisfies the proper distribution in M1 M 2 | | − | Eq. (9c) because of = C PC x( 1)+0 PC x( 1) ← ← · 2 M · 2 M PC x( 2)+ P C x( 2)+ P0 x( 2) + ( C) P C x( 1) x ← M − ← M ← M − · − ← M − P ( )+ P ( ) 2 2 C x 1 C x 1 = C (1 P0 x( 1)) x , (45) = ← M − ←− M − ← M − 2 or P C x( 1)+ PC x( 1) E 2 E 2 + − ← M ←− M Var 1 [Y X = x]= [Y X = x] ( [Y X = x]) M 2 2 | − | − − | − = C PC x( 1)+0 PC x( 1) P0 x( 1)+ P0 x( 1) ←− ←− + ← M ←− M =1. · 2 M · 2 M 2 + ( C) P C x( 1) x − ←− Besides, the mechanism 2 satisfies unbiased estimation in 2 − · M −2 M = C (1 P0 x( 1)) x . (46) Eq. (9b) because − ←− M − Hence, C PC x( 2) + ( C) P C x( 2)+0 P0 x( 2) · ← M − · − ← M · ← M Var 2 [Y X = x]= Var 2 [Y X = x] PC x( 1)+ P C x( 1) M | M | − = C ← M − ←− M Var [Y X = x]+ Var [Y X = x] · 2 = M2 | M2 | − (47) 2 P C x( 1)+ PC x( 1) + ( C) − ← M ←− M Eq. (45) + Eq. (46) − · 2 = 2 P0 x( 1)+ P0 x( 1) +0 ← M ←− M = x. Max Var [Y X = x], Var [Y X = x] . · 2 ≤ { M1 | − M1 | } In addition,  PC x( 2) PC x( 1)+ P C x( 1) ← M = ← M − ←− M (37) PC x′ ( 2) PC x′ ( 1)+ P C x′ ( 1) B. Proof of Lemma 2 and ← M ← M − ←− M P C x( 2) P C x( 1)+ PC x( 1) In essence, with Eq. (21) (21), we have − ← M = − ← M ←− M (38) P C x′ ( 2) P C x′ ( 1)+ PC x′ ( 1) PC 1( 3) − ← M − ← M ←− M ← M and PC 1( 3) P0 x( 2) P0 x( 1)+ P0 x( 1) ←− M ǫ ← ← ←− e PC←−1( 2) PC←1( 2) M = M M . (39) PC 1( 2) Mǫ − M P0 x′ ( 2) P0 x′ ( 1)+ P0 x′ ( 1) ← e 1 = M − ǫ − ← M ← M ←− M e PC←−1( 2) PC←1( 2) According to (9a), we obtain M − M PC 1( 2) eǫ 1 ǫ ′ ′ ←− M − e− (PC x ( 1)+ P C x ( 1)) ǫ − ← M − ←− M = e . P ′ ( )+ P ′ ( ) C x 1 C x 1 PC←1( 3) PC←1( 3) ǫ ← M − ←− M Similarly, we can prove that M , M = e . ǫ ′ ′ P ←− ( ) P ←− ( ) e (PC x ( 1)+ P C x ( 1)) C 1 M3 C 1 M3 Eq. (37) ← M − ←− M , Besides, ≤ ≤ PC x′ ( 1)+ P C x′ ( 1) ← M − ←− M PC x + P C x + P0 x =1. which is equavalent to ← − ← ← ǫ ǫ In addition, e− Eq. (37) e . (40) ≤ ≤ C PC x + ( C) P C x +0 P0 x Similarly, we prove that Eq. (38) (39) satisfy (9a). Hence, we · ← − · − ← · ← conclude that satisfies ǫ-LDP requirements. = C (PC x( 2) P C x( 2)) = x. 2 · ← M − − ← M M Hence, mechanism satisfies requirements in (9a) (9b) (9c). Then, we prove that the symmetrization process does not M3 increase the worst-case noise variance as follows: Because the symmetric mechanism 3 satisfies require- M ments in (9a) (9b) (9c), the variance of 3 is Since 2 satisfies the unbiased estimation of Eq. (9b), M E M Var [Y X = x]= E[Y 2 X = x] (E[Y X = x])2 [Y X = x] = x. Hence, the variance of mechanism 2 M3 | M 2 | | − | given x is = C PC x( 3)+0 PC x( 3) · ← M · ← M Var [Y X = x]= E[Y 2 X = x] (E[Y X = x])2 2 2 M2 + ( C) P C x( 3) x 2 | | − | − · − ← M − = C PC x( 2)+0 PC x( 2) 2 2 ← ← = C (PC x( 3)+ P C x( 3)) x · 2 M · 2 M ← M − ← M − + ( C) P C x( 2) x 2 2 − ← = C (1 P0 x( 3)) x . 2 − · M −2 − ← M − = C (1 P0 x( 2)) x (41) By comparing with variance of 2 in Eq. (41), we obtain − ← M − M 2 P0 x( 1)+ P0 x( 1) 2 Var 3 [Y X = x] Var 2 [Y X = x] ← ←− M M = C 1 M M x , (42) 2 | − 2 | 2 2 − 2 − = C (1 P0 x( 3)) x (C (1 P0 x( 2)) x )   ← ← or it changes to 2 − M − − − M − = C (P0 x( 2) P0 x( 3)). E 2 E 2 ← ← Var 2 [Y X = x]= [Y X = x] ( [Y X = x]) M − M M | − | − − | − Because of P0 x( 2) > P0 x( 3), based on Eq. 
(23) 2 ← M ← M = C PC x( 2)+0 PC x( 2) and Inequality (20), we get P ( ) > P ( ) which ←− ←− 0 x 2 0 x 3 · 2 M · 2 M ← M ← M + ( C) P C x( 2) x means that − · − ←− M − 2 2 Var [Y X = x] > Var [Y X = x]. = C (1 P0 x( 2)) x (43) M2 M3 − ←− M − | | 18 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

Thus, the variance of is smaller than the variance of x2. (54) M3 M2 − when x [ 1, 1], so we obtain that the worst-case noise Derive the partial derivative of Var[Y X = x] to P0 1, and ∈ − | ← variance of 3 is smaller that of 2 using we get M M Var max Var 2 [Y X = x] > max Var 3 [Y X = x]. d( [Y X = x]) x [ 1,1] M | x [ 1,1] M | | ∈ − ∈ −  dP0 1 ǫ ← 2 (e + 1) (2P0 0 + x (1 2P0 0 + P0 1) 2) ← ← ← (55) = | | 2− ǫ 2 − . C. Proof of Lemma 3 (P0 1 + 1) (e 1) Then, we have the following← cases:− Since P C 0 +PC 0 +P0 0 =1 and unbiased estimation ǫ 2 − ← ← ← (e +1) (2P0←0 2) I. If x =0, Eq. (55) = 2 ǫ − 2 < 0, C P C 0 + C PC 0 +0 P0 0 =0, we have (P0←1+1) (e 1) − ← ← ← | | ǫ 2 − − · · · 1 P (e +1) (P0←1 1) 0 0 II. If x =1, Eq. (55) = 2 ǫ − 2 < 0. PC 0 = P C 0 = − ← . (48) (P0←1+1) (e 1) ← − ← 2 | | − Then, based on (9a) (9b) (9c) and Lemma 2, we can derive C Therefore, if given P0 0, the variance of the output given ← with the following steps: input x is a strictly decreasing function of P0 1. Hence, we ← get the minimized variance when P0←0 .  P C 1 + PC 1 + P0 1 =1, P0 1 = eǫ − ← ← ← ← C P C 1 + C PC 1 +0 P0 1 =1. − · − ← · ← · ← Therefore, we have 1 D. Solve Eq. (58) 1 P0 1 + C PC 1 = − ← , ← 2 1 To find the optimal t for mint maxx [ 1,1] Var[Y x], we 1 P0 1 C ∈ − | P C 1 = − ← − . calculate first-order derivative of the maxx [ 1,1] Var[Y x] as − ← 2 follows: ∈ − | From Lemma 2, we obtain 2 1 1 2t 4 4 4t− 1 P0 1 + 1 P0 1 + + − ← C = eǫ − ← − C , 3(eǫ 1)2 3(eǫ 1) 3(eǫ 1)2 − 3(eǫ 1)2 2 · 2 − 2 − 3 − 3 −3 which is equivalent to   4t− 2t− 4t− 2t− eǫ +1 − 3(eǫ 1) − 3(eǫ 1)2 − 3(eǫ 1) − 3 (49) C = ǫ . − − − (e 1)(1 P0 1) 2 ǫ ǫ 2 2ǫ 3 ← = [t +2e 2e t e t ]. (56) Hence, − − ǫ 2 − − ǫ 3(e 1) − − (1 P0 1)e Next, we− calculate the second-order derivative of PC 1 = P C 1 = − ← , ← − ←− ǫ e +1 maxx [ 1,1] Var[Y x] as follows: ∈ − | (1 P0 1) 2 ǫ 3 ǫ 4 P C 1 = PC 1 = − ← . (50) [1 + 4e t− +3e t− ] > 0. (57) − ← ←− eǫ +1 3(eǫ 1)2 Then, we compute the variance as follows: − Since the second-order derivative of maxx [ 1,1] Var[Y x] > I. For x [0, 1], we have 0, we can conclude that max Var[Y∈ x−] has minimum| ∈ x [ 1,1] Var[Y X = x]= E[Y 2 X = x] (E[Y X = x])2 point in its domain. ∈ − | 2| | − 2 | 2 = C PC x +0 P0 x + ( C) P C x x To find t which minimizes maxx [ 1,1] Var[Y x], we set · ← · ← − · − ← − 4 ǫ 3 ǫ 2ǫ ∈ − | 2 2 t +2e t 2e t e =0. By solving = C (PC x + P C x) x . (51) − − ← − ← − t4 +2eǫt3 2eǫt e2ǫ =0, (58) Substituting Eq. (13) and Eq. (14) into Eq. (51) yields − − 2 we obtain Eq. (30). Define Eq. (58)’s coefficients as c4 := = C (PC 0 + (PC 1 PC 0)x) ← ← ← ǫ ǫ 2ǫ 2 − 2 1, c3 := 2e , c2 := 0, c1 := 2e , c0 := e , and we + C (P C 0 (P C 0 P C 1)x) x − − − ← − ← − ← obtain 2 − −2 − = C (PC 0 + P C 0)+ C (PC 1 + PC 1)x 4 3 (59) ← − ← ← ←− c4 t + c3 t + c1 t + c0 =0. 2 2 · · · C (PC 0 + P C 0)x x To change Eq. (59) into a depressed quartic form, we substitute − ← − ← − f 2 2 f := eǫ,t := y c3 = y into Eq. (59) and obtain = C (PC 0 + P C 0)+ C (1 P0 1)x 4c4 2 ← − ← ← − 4 −2 2 2 − y + p y + q y + r =0, (60) C (1 P0 0)x x ← · · − 2 − −2 2 where = C (1 P0 0)+ C (P0 0 P0 1)x x . (52) 8c c 3c2 3f 2 − ← ← − ← − p = 2 4 − 3 = , (61) II. For x [ 1, 0], we have 8c2 − 2 ∈ − 2 2 4 Var[Y X = x]= E[Y X = x] (E[Y X = x]) 3 2 | | − | c3 4c2c3c4 +8c1c4 3 2 2 2 q = − = f 2f, (62) = C (1 P0 0)+ C (P0 0 P0 1)( x) x . 8c3 − − ← ← − ← − − 4 (53) 4 3 2 2 3c + 256c0c 64c1c3c + 16c2c c4 Hence, by summarizing Eq. (49), Eq. (52), and Eq. (53), we r = − 3 4 − 4 3 256c4 get the variance as follows: 4 3 Var[Y X = x] = f 4. 
(63) −16 2| 2 2 = C (1 P0 0)+ C (P0 0 P0 1) x x Rewrite Eq. (60) to the following: ← ← ← 2 − 2 − | |− p p eǫ +1 y2 + = qy r + . (64) = ǫ (1 P0 0 + (P0 0 P0 1) x ) 2 − − 4 (e 1)(1 P0 1) − ← ← − ← | |    − − ←  19 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

Then, we introduce a variable m into the factor on the left- Since x [ 1, 1], the worst-case noise variance is ∈ − 4 2 2 hand side of Eq. (64) by adding 2y2m + pm + m2 to both 2 C b C b (1 a)C + 4 , if 2 < 1, sides. Thus, we can change the equation to the following: max Var[Y x]= − 2 x [ 1,1] | (1 a + b)C2 1, if C b 1. p p2 ∈ − ( − − 2 ≥ y2 + + m =2my2 qy + m2 + mp + r. (65) (76) 2 − 4 − Substituting Eq. (74), Eq. (75) and Eq. (73) into Eq. (76) Since m is arbitrarily chosen, we choose the value of m to yields get a perfect square in the right-hand side. Hence, 3 2 2 2 max Var[Y x] 8m +8pm + (2p 8r)m q =0. (66) x [ 1,1] | − − ∈ − To solve Eq. (66), we substitute Eq. (61), Eq. (62) and Eq. (63) ǫ 2 2ǫ ǫ 2 2 (e +1) e 1 a (e +1) a C2b into the following equations: (eǫ 1)·2 (eǫ −a)2 + 4(eǫ a·)4 , if 2 < 1, =  −  − −  c := 8, ǫ 2 2ǫ ǫ 2 3′  (e +1) e 1 a (e 1) a C b  (eǫ 1)·2 (eǫ −a)2 + eǫ(e−ǫ a·)2 1,if 2 1. c′ := 8p, − ≥ 2 −  − −  2 ǫ 2 2ǫ ǫ 2 2 2 c1′ := 2p 8r,  (e +1) e 1 a (e +1) a C b  ǫ ·2 ǫ − 2 + ǫ · 4 , if < 1, 2− (e 1) (e a) 4(e a) 2 c′ := q , = − − − 0  ǫ 2 ǫ  − (e +1) e C2b 2 2 2  · 1, if 1. ∆ = (c′ ) 3c′ c′ = (8p) 3 8 (2p 8r)=0,  (eǫ 1)2 (eǫ a)2 2 0 2 3 1 − · − − ≥ 2− − · 2· − (77) ∆ = 2(c′ ) 9c′ c′ c′ + 27(c′ ) c′ 1 2 3 2 1 3 0 Substituting Eq. (75) and Eq. (73) yields 3 − · · 2 · 2 2 = 2(8p) 9 8 8p (2p 8r)+27 8 ( q ) C2b (eǫ + 1)2 e2ǫ a(eǫ 1) − · · · − · · − = · − = 6912(f 4 f 2), 2 2(eǫ 1)2(eǫ a)2 · eǫ − ǫ − 2 ǫ− 2 3 (e + 1) e a 3 ∆1 ∆1 4∆0 3 = ǫ ·ǫ · 2 C = ± − = ∆1 2(e 1)(e a) s 2 − − p < 1, (78) = 3 6912(f 4 f 2). p (67) − and By solving the cubic function Eq. (67), we have roots as p 2(eǫ 1)a2 4(eǫ 1)eǫ + (eǫ + 1)2 eǫ a follows : − −ǫ −2ǫ · 1 k ∆0 + 2(e  1)e > 0.  (79) mk = (c2′ + ξ C + k ) − −3c3′ ξ C To solve Eq. (79), we denote the smaller solution of the 2 2 4 quadratic function as f 3 f f = + − ξk, k 0, 1, 2. (68) eǫ(e2ǫ +6eǫ 3) (eǫ + 1)eǫ (eǫ + 1)2 + 8(eǫ 1) 2 2 ∈ a∗ = − − − . We only use the real-valuer root; thus, we get ǫ 4(e 1)p 2 2 4 − (80) f 3 f f m = + − . (69) From Eq. (14), we get 2 r 2 Thus, P C 0 P C 1. (81) − ← − ← √2q ≥ √ Then, substituting P C 0 and P C 1 with Eq. (48) and 1 2m 2 (2p +2m 1 √m ) y = ± ± − ± . (70) Eq. (50) in Eq. (81) yields− ← − ← q 2 1 P0 0 (1 P0 1) Then, the solutions of the original quartic equation are − ← − ← . (82) √ 2 ≥ eǫ +1 √2m (2p +2m 2q ) c3 1 2 1 √m Hence, t = + ± ± − ± . (71) ǫ q e −4c4 2 a = P0 0 . (83) Since t is a real number and t> 0, we obtain Eq. (30) after ← ≤ eǫ +2 substituting into Eq. (71).  From Eq. (83), we know that the Eq. (79) will be ensured (i) c3,c4,m,p,q,f eǫ eǫ when 0 a

20 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

eǫ Substituting Eq. (80) into a∗ = eǫ+2 yields After simplifying f1′ (a), we have 3 2 2ǫ ǫ eǫ(e2ǫ +6eǫ 3) (eǫ + 1)eǫ (eǫ + 1)2 + 8(eǫ 1) 2a a ( e 5 4e ) − − − f1′ (a)= − − −ǫ −5 − 4(eǫ 1) 2(e a) − p ǫ 2ǫ −3ǫ 3ǫ 2ǫ eǫ a(7e 4e e ) (2e 4e )) = . (85) + − − − − − . (91) eǫ +2 2(eǫ a)5 ǫ 5 − After solving Eq. (85), we get ǫ = ln 4. Since 2(e a) > 0, solving f1′ (a)=0 is equivalent to solve the following− equation 2a3 + a2( e2ǫ 5 4eǫ)+ a(7eǫ 4e2ǫ e3ǫ) 1.5 − − − − − + (2e3ǫ 4e2ǫ)=0. (92) − We define coefficients of Eq. (92) as follows:

c3 := 2, (93) 1 c := e2ǫ 5 4eǫ, (94) 2 − − − c := 7eǫ 4e2ǫ e3ǫ, (95) 1 − − c := 2e3ǫ 4e2ǫ. (96) 0 − 0.5 The general solution of the cubic equation involves calculation of ∆ = c 2 3c c 0 2 − 3 1 = ( e2ǫ 5 4eǫ)2 3 2(7eǫ 4e2ǫ e3ǫ) 0 − − − − × − − 0 0.5 1 1.5 2 2.5 3 = e4ǫ + 14e3ǫ + 50e2ǫ 2eǫ + 25 > 0, − ǫ 3 2 e ∆1 =2c2 9c3c2c1 + 27c3c0 Fig. 16: Compare a∗ with eǫ+2 . − = 2( e2ǫ 5 4eǫ)3 − − − ǫ 2ǫ ǫ ǫ 2ǫ 3ǫ According to Fig. 16, we obtain that e if 9 2( e 5 4e )(7e 4e e ) a∗ eǫ+2 0 <ǫ − × − − − − − ln 4. Since ǫ = ln 4 is the only solution if ǫ>≥ 0, we conclude≤ + 27 22(2e3ǫ 4e2ǫ) ǫ e 6×ǫ 5ǫ − 4ǫ 3ǫ 2ǫ that a∗ > eǫ+2 if ǫ > ln 4. Therefore, we can replace the = 2e 42e 270e 404e 918e eǫ condition a < and write the variance as follows: − ǫ− − − − ∗ eǫ +2 + 30e 250 < 0, maxx [ 1,1] Var[Y x]= − ∈ − | 2 3 (eǫ+1)2 e2ǫ 1 a (eǫ+1)2 a2 3 ∆1 ∆1 4∆0 (eǫ 1)·2 (eǫ −a)2 + 4(eǫ a·)4 , C = ± − . − − − s p 2     Substituting ∆ and ∆ into C yields   for 0 a 0. (89) Therefore, we obtain C = √∆ qe . According to Euler’s 2 (eǫ 1)2(eǫ a)3 0 − − Formula, we have Since f2′ (a) > 0, the worst-case noise variance monotonously i3θ eǫ e = cos3θ + i sin 3θ, (101) increases if a [a∗, ǫ ], we can get optimal a by analyzing ∈ e +2 ∆ f1(a) in Eq. (87) when a [0,a∗) if ǫ < ln 4. First order 1 ∈ cos3θ = 3 < 0, (102) derivative of Eq. (87) is 2 2∆0 2(1 a) 1 a(eǫ + 1)2 a2(eǫ + 1)2 4∆3 ∆2 f1′ (a)= ǫ − 3 ǫ 2 + ǫ 4 + ǫ 5 . 0 1 (e a) − (e a) 2(e a) (e a) sin 3θ = −3 < 0. (103) − − − − − 2 (90) p 2∆0

21 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

Hereby, yields ∆1 ∆= e2ǫ(eǫ + 1)2(e6ǫ + 30e5ǫ 3θ = π + arccos( 3 ), (104) − − 2 4ǫ 3ǫ 2eǫ ǫ 2∆0 + 279e + 580e 2385e + 606e 775). (116) π 1 ∆ − 6 5 − 4 3 1 (105) If ∆=0, we get ǫ = ln(root of ǫ +30ǫ +279ǫ +580ǫ θ = + arccos( 3 ). 2 − − 3 3 − 2 2385ǫ + 606ǫ 775 near ǫ =1.87686) 0.629598. 2∆0 − ≈ The solution of the cubic function is If 0 <ǫ< 0.629598, ∆ < 0, the equation has one real 1 ∆ • k 0 (106) root and two non-real complex conjugate roots. ak = (c2 + ξ C + k ), k 0, 1, 2 . −3c3 ξ C ∈{ } If ǫ =0.629598, ∆=0, the equation has a multiple root To solve Eq. (106), we have the following cases: • all of its roots are real. If k =0, we have • If ǫ > 0.629598, ∆ > 0, the equation has three distinct 1 ∆0 • a0 = (c2 + C + ). (107) real roots. −3c3 C Substituting θ (105) and c2 (94) into Eq. (107) yields From the simplified f1′ (a) Eq. (91), we know that the sign and 1 roots of f (a) are same as its numerator, defined as follows: a = ( e2ǫ 4eǫ 5 1′ 0 −6 − − − g(a) := 2a3 a2( e2ǫ 5 4eǫ) π 1 ∆ − − − − − 1 a(7eǫ 4e2ǫ e3ǫ) (2e3ǫ 4e2ǫ). (117) +2 ∆0 cos( + arccos( 3 ))). (108) − 3 3 − 2 ǫ− − − − − 2∆0 Let c = e , we can change g(a) to the following: If k =1, wep have 3 2 2 2 3 • g(a)= 2a a ( c 5 4c) a(7c 4c c ) 1 ∆0 − − − − − − − − a1 = (c2 + ξC + ) (2c3 4c2). (118) −3c3 ξC − − Case 1: If , by observing , it 1 1 √3i ∆ 0 <ǫ< 0.629598 a0,a1,a2 = (c + ( + )C + 0 ) is obvious that is a real root. Fig. 17 shows that . 2 √ a2 a2 > 1 −3c3 −2 2 ( 1 + 3i )C − 2 2 Since a2 is the only real value root, Eq. (90) f1′ (a) > 0 if 2π 4π 1 i ∆0 i 0 < ǫ 0.629598 and f1′ (a) 0 if ǫ > 0.629598, f1(a) = (c2 + Ce 3 + e 3 ) ≤ ≤ −3c3 C monotonously increases if 0 < ǫ 0.629598. Therefore, we ≤ 1 i(θ+ 2 π) i( 4π θ) conclude that a =0. = (c2 + ∆0e 3 + ∆0e 3 − ). −3c3 p p (109) Simplify Eq. (109) using Eq. (101), we have 8.5 1 2π 8 a1 = (c2 +2 ∆0 cos(θ + )). (110) −3c3 3 7.5 Substituting θ (105) and pc2 (94) into Eq. (110) yields 1 2ǫ ǫ 7 a1 = ( e 4e 5 2 6.5 −6 − − − a π 1 ∆1 6 +2 ∆0 cos( + arccos( 3 ))). (111) 3 3 −2∆ 2 0 5.5 If k =2, wep get • 1 2 ∆0 5 a2 = (c2 + ξ C + 2 ) −3c3 ξ C 4.5 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 1 1 √3i ∆ privacy budget = (c + ( + )2C + 0 ) 2 √ −3c3 −2 2 ( 1 + 3i )2C Fig. 17: a2 if ǫ [0, 0.629598]. − 2 2 ∈ 1 1 √3i ∆ = (c + ( )C + 0 ) 2 √ Case 2: If 0.629598 ǫ< ln 2, ∆ 0, we get real roots. −3c3 −2 − 2 ( 1 3i )C ≤ ≥ − 2 − 2 If a =0, g(0) = (2c3 4c2). 1 4π 2π • i(θ+ ) i( θ) − 2 − = (c2 + ∆0e 3 + ∆0e 3 − ). (112) If a =2, g(2) = 2(8c + c + 2) > 0. −3c • 3 If a =+ , lim g(a)= < 0. Simplify Eq. (112) using Eq. (101), we obtain • a p p ∞ →∞ ǫ −∞ 1 2π If there is a root is in e , it means that . By a = (c +2 ∆ cos(θ )). (113) [0, eǫ+2 ] g(0) 0 2 2 0 3 2 ≤ −3c3 − 3 solving g(0) = (2c 4c ) 0, we have c 2 meaning ǫ Substituting θ (105) and pc2 (94) into Eq. (113) yields ln 2. When 0.629598− <ǫ<− ln≤ 2, we have a ≥root (2, + ≥). 1 2ǫ ǫ ∈ ∞ a2 = ( e 4e 5 Based on the properties of cubic function, we have −6 − − − ǫ 2ǫ 3ǫ c1 7e 4e e 1 ∆1 a0a1 + a0a2 + a1a2 = = − − , (119) +2 ∆0 cos( π + arccos( 3 ))). (114) c3 2 − 3 −2∆ 2 3ǫ 2ǫ p 0 c0 2e 4e The number of real and complex roots are determined by the a0a1a2 = = − . (120) −c3 − 2 discriminant of the cubic equation as follows: Fig. 18 shows that a0a1 + a0a2 + a1a2 < 0 (Eq. (119)) 3 2 2 3 2 2 ∆=18c3c2c1c0 4c c0 + c c 4c3c 27c c . (115) and a a a > 0 (Eq. (120)). Therefore, we can conclude that − 2 2 1 − 1 − 3 0 0 1 2 Substituting c3 (93), c2 (94), c1 (95) and c0 (96) into ∆ (115) there are one positive real root and two negative real roots or

22 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

1 35

30 0 25 -1 20

-2 15

10 -3 5 -4 0

-5 -5 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.6 0.8 1 1.2 1.4 1.6 1.8 privacy budget privacy budget Fig. 18: a a +a a +a a and a a a if ǫ [0.629598, ln 2]. Fig. 19: a ,a and a if ǫ [ln 2, ln 5.53]. 0 1 0 2 1 2 0 1 2 ∈ 0 1 2 ∈ a multiple root. Since two negative roots are out of the a’s F. Proof of Lemma 5 domain, we only discuss the positive root. By substituting the optimal P0 0 of Eq. (5) with a in the ← If a [0, root), g(a) > 0 meaning f ′ (a) > 0. maxx [ 1,1] Var[Y x] of Eq. (86), we obtain the worst-case • ∈ 1 ∈ − | If a [root, + ), g(a) 0 meaning f ′ (a) 0. noise variance of Three-Outputs as follows: • 1 ∈ ∞ ≤ ≤ min max Var[Y x]= From above we know that g(a) > 0, so that f1(a) P0←0 x [ 1,1] | ǫ ∈ − e ǫ 2 monotonously increases if a [0, eǫ+2 ]. Therefore, a =0. (e +1) ∈ (eǫ 1)2 , for ǫ< ln 2, Case 3: If ln 2 ǫ ln 5.53, ∆ > 0, there are three distinct − ǫ 2 2ǫ ǫ 2 2 ≤ ≤  (e +1) e 1 P0←0 (e +1) P0←0 real roots. Since a0a1 + a0a2 + a1a2 < 0 and a0a1a2 < 0, ǫ ·2 ǫ − 2 + ǫ · 4 , for ln 2 ǫ ln 5.53,  (e 1) (e P0←0) 4(e P0←0) there are one negative root or three negative roots. If there are  − − − ≤ ≤   1 2ǫ ǫ   where P0 0 = ( e 4e 5 three negative roots, a0a1 +a0a2 +a1a2 > 0, there is only one  ← − 6 − − −  √ π 1 ∆1 negative root, f1′ (a) have two positive roots and one negative +2 ∆0 cos( 3 + 3 arccos( 3 ))) − 2∆ 2 root. ǫ ǫ 0  (e +2)(e +10) , for ǫ> ln 5.53.  4(eǫ 1)2 If a =0, g(0) = (2c3 4c2) < 0.  − •  (122) If a =2, g(2) = 2(8− c2 +−c + 2) > 0.  •   If a =+ , lima g(a)= . • ∞ →∞ −∞ From above results, we can deduce that there is one positive 2(eǫ a)2(eǫ 1) aeǫ(eǫ+1)2 eǫ 1 ǫ G. Proof of 2(eǫ−a)2(eǫ−+t)−aeǫ(eǫ+1)2 eǫ−+t if t = 3 root in (0, 2) defined as root1, the other positive root is in − − ≤ 2(eǫ a)2(eǫ 1) aeǫ(eǫ+1)2 eǫ 1 (2, + ) defined as root2. Since root2 > 1, we only discuss Proposition 1. − − − − for ǫ > ∞ 2(eǫ a)2(eǫ+t) aeǫ(eǫ+1)2 ≤ eǫ +t root1. 0. − −

a [0, root1], g(a) 0. Define • ∈ ≤ 2(eǫ a)2(eǫ 1) aeǫ(eǫ + 1)2 eǫ 1 a (root1, root2), g(a) > 0. f := − − − − • ∈ 2(eǫ a)2(eǫ + ǫ ) aeǫ(eǫ + 1)2 − eǫ + ǫ Therefore, if c , we can conclude that c . 3 3 g( c+2 ) 0 root1 c+2 − ǫ ǫ − 2 The exact form of g(≥c ) is ≤ ae (e + 1) (t + 1) c+2 = ǫ ǫ 2 ǫ 2 ǫ ǫ ǫ ǫ , c c2(c + 1)2( c2 +3c + 14) (ae (e + 1) 2(e a) (e + 3 ))(e + 3 ) g = − . (121) − − c +2 (c + 2)3 (123)   √ and By solving g( c ) 0, we have c ln 3+ 65 5.53, ǫ c+2 ≥ ≤ 2 ≈ h := 2(eǫ a)2(eǫ + ) aeǫ(eǫ + 1)2. (124) i.e. ǫ ln 5.53. From Fig. 19, we can conclude thata1 is the − 3 − ≤ 1 2ǫ ǫ When ǫ> 0, we have eǫ + ǫ > 0 and aeǫ(eǫ +1)2( ǫ +1) > 0 correct root, a0 < 0 and a2 > 1. a = a1 = 6 ( e 4e 3 3 π 1 ∆1 − − − − (the numerator of Eq. (123)). 5+2√∆0 cos( 3 + 3 arccos( 3 ))). − 2∆ 2 0 If 0 <ǫ< ln 2, we have a = 0 and h = 2e2ǫ(eǫ + Case 4: If ǫ > ln 5.53, ∆ > 0, there are three distinct real • eǫ/3) < 0. Therefore, we conclude that Eq. (123)− < 0. roots. From analysis in Case 3, we know that if ǫ > ln 5.53, If ln 2 ǫ ln 5.53, we have a = 1 ( e2ǫ 4eǫ c and c . We know that if • 6 root1 > c+2 g( c+2 ) < 0 g(a) 0 ≤ ≤ π 1 ∆1 − − − − c ≤ 5+2√∆0 cos( 3 + 3 arccos( 3 ))). Fig. 20 shows a [0, ], meaning f ′ (a) < 0, so that f1(a) monotonously 2 c+2 1 − 2∆0 ∈ eǫ ǫ 2 ǫ ǫ ǫ 2 decreases if ǫ > ln 5.53. Since a [0, eǫ+2 ], we have a = that h := 2(e a) (e + t) ae (e + 1) < 0, so that eǫ ∈ we obtain Eq.− (123) < 0. − eǫ+2 .

Summarize above, we obtain the optimal a which is named ǫ If ǫ > ln 5.53, we have a = e and h = as P0 0 in the Eq. (5). • eǫ+2 2ǫ ǫ 2 ǫ ǫ/3 ←  e (e +1) ( 2+e +2e ) ǫ ǫ/3 (eǫ−+2)2 . Since e and e > 1, we

23 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

obtain 2 + eǫ + 2eǫ/3 > 0 and h > 0. Hence, we By solving above Eq. (126) and Eq. (127), we have conclude− that Eq. (123) < 0. x 1 2Ad Lx = 1 2Ad 2(−c d) , − − − (128) Based on above analysis, we have Eq. (123)) < 0 when ǫ> 0, R = x + 1 2Ad . ǫ 2 ǫ ǫ ǫ 2 ( x 1 2Ad 2(−c d) 2(e a) (e 1) ae (e +1) eǫ 1 − − meaning that ǫ− 2 ǫ− − ǫ ǫ 2 ǫ− when ǫ> 0. With A y A, the constraint A L < R A for 2(e a) (e +t) ae (e +1) ≤ e +t x x − −  any −1 ≤x ≤1 in Eq. (26), Eq. (28)− and≤ Eq. (29) implies≤ − ≤ ≤ 1 Ad < , (129) ǫ 2 ǫ ǫ ǫ 2 2 H. Proof of 2(e a) (e + t) ae (e + 1) > 0. 1 1 2Ad − − A + − . (130) From values of ǫ, we have the following cases: ≥ 1 2Ad 2(c d) For notational simplicity,− we define α and− ξ as If 0 <ǫ ln 5.53, we have 2(eǫ a)2(eǫ + t) aeǫ(eǫ + • 1)2 > 0 ≤referring to Fig. 20. − − α := Ad, (131) eǫ ǫ If ǫ> ln 5.53, we have a = and t = e 3 , so that c d • eǫ+2 ξ := − , (132) 2(eǫ a)2(eǫ + t) aeǫ(eǫ + 1)2 d − − where it is clear under privacy parameter ǫ that 2ǫ ǫ 2 ǫ ǫ e (e + 1) (e 2+2e 3 ) ξ = eǫ 1. (133) = − . (125) − (eǫ + 2)2 Applying Eq. (131) and Eq. (132) to Inequality (129) and ǫ e2ǫ(eǫ+1)2( eǫ+2 2e 3 ) Since ǫ − 2 − > 0 if ǫ > 0, we have Eq. (130), we obtain (e +2) α 1 1 2α 2(eǫ a)2(eǫ + t) aeǫ(eǫ + 1)2 > 0 if ǫ> ln 5.53. + − , (134) − − d ≥ 1 2α 2ξd 1 − α< . (135) 18 2 (2ξ+4)α (4+4ξ)α2 1 16 The condition Eq. (134) induces d −2ξ − = [(2ξ+2)α 1](1 2α) 1 ≤ 1 1 14 − − 2 2ξ . In view of 2(ξ+1) = 2eǫ <α< 2 , we

+1) t+1 12 define t satisfying 0

(e 2(t+e ) t+1 1 ∞ t+1 1 10 that limt 0 2(t+eǫ) = 2eǫ , limt 2(t+eǫ) = 2 and d = → →∞ ǫ

+ t)-ae [(2ξ+2)α 1](1 2α) ξt ξ t(e 1) 8 1 − − = = −ǫ 2 . (e 2ξ 2ξ t+1+ξ t+1+ξ 2(t+e ) 2 · ·

-a) 6

2(e 4 Applying α, d and ξ to Eq. (128), we have 2 x 1 2α eǫ+t eǫ+t (eǫ+t)(xt 1) L = − = x ǫ ǫ = ǫ − , 0 x 1 2α 2ξd e 1 t(e 1) t(e 1) 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 −x − 1 2α · eǫ−+t − eǫ+−t (eǫ+t)(−xt+1) privacy budget ( Rx= 1 2α + 2−ξd = x eǫ 1 + t(eǫ 1) = t(eǫ 1) . − · − − − Fig. 20: Value 2(eǫ a)2(eǫ + t) aeǫ(eǫ + 1)2 > 0, when (136) ǫ (0, ln 5.53]. − − Furthermore, the variance of Y is ∈ Var[Y x]= E[Y 2 x] (E[Y x])2 | | − | Thus, we conclude that 2(eǫ a)2(eǫ +t) aeǫ(eǫ +1)2 > 0 A 2F 2 if ǫ> 0. − −  = y [Y = y x] dy x A | − Z− Lx Rx A = dy2 dy + cy2 dy + dy2 dy x2 I. Proving Lemma 6 A L R − Z− Z x Z x From Eq. (26a) and Eq. (26b), for any Y [ A, A] and d 3 3 c 3 3 = max [Lx ( A) ]+ (Rx Lx )+ ∈ −pdf(Y x1) x [ 1,1] any two input values x1, x2 [ 1, 1], we have | 3 − − 3 − pdf(Y x2) ∈ − c ∈ − | ≤ = exp(ǫ). Thus, Algorithm 3 satisfies local differential d 3 3 2 d (A Rx ) x privacy. For notational simplicity, with a fixed ǫ below, we 3 − − will write L(ǫ,x,t) and R(ǫ,x,t) as Lx and Rx. Based on 2d 3 (c d) 3 3 2 = A + − (Rx Lx ) x . (137) proper probability distribution, we have 3 3 − − A Substituting Eq. (128) into Eq. (137) yields F [Y = y x] dy = c(Rx Lx)+ d[2A (Rx Lx)]=1. 2d 3 (c d) A | − − − Var[Y x]= A + − Z− | 3 3 · (126) 3 3 In addition, x 1 2Ad x 1 2Ad A + − − " 1 2Ad 2(c d) − 1 2Ad − 2(c d) # E[Y = y x]= y pdf(Y = y x) dy  − −   − −  2 | A · | x Z− − = c(Rx Lx)+ d[2A (Rx Lx)] 2d (c d) − − − = A3 + − d 2 2 c 2 2 d 2 2 3 3 · = (Lx A )+ (Rx Lx )+ (A Rx ) 2 · − 2 − 2 · − x 2 1 2Ad 1 2Ad 3 6 − +2 − x2 = x. (127) 1 2Ad × 2(c d) 2(c d) − (  −  −  −  ) 24 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

1 2d (1 2Ad)3 2 3 (138) Therefore, we obtain = 1 x + A + − 2 . 1 2Ad − 3 12(c d) E[Z x]= z P [Z = z x]   ǫ ǫ − e 1 t(e 1) − c d ǫ | × | Substituting 1 2α = ǫ− , d = −ǫ 2 , ξ = − , ξ = e 1, z − e +t 2(t+e ) d − X and α = t+1 into Eq. (138) yields 2(t+eǫ) = z P [Z = z Y = y] F [Y = y x] dy ǫ 3 ǫ × | | t +1 (t + e ) (t + 1) + e 1 z y Var[Y x]= x2 + − . (139) X Z | eǫ 1 3t2(eǫ 1)2 − −   = z P [Z = z Y = y] F [Y = y x] dy × | | y z ! Z X J. Calculate the probability of a variable Y falling in the = y F [Y = y x] dy = x. (145) ǫ/3 ǫ/3 y × | interval [L(ǫ,x,e ), R(ǫ,x,e )]. Z  By replacing t in Eq.(26) of PM-OPT with eǫ/3, we obtain the probability as follows: L. Proof of Lemma 9 P L(ǫ,x,eǫ/3) Y R(ǫ,x,eǫ/3) ≤ ≤ To prove Var[Z X = x] Var[Y X = x], it is equivalent ǫ/3 | ≥ | h R(ǫ,x,e ) i to prove = c dY E Z2 X = x (E [Z X = x])2 L(ǫ,x,eǫ/3) | − | Z E 2 E 2 (eǫ+eǫ/3)(xeǫ/3+1) Y X = x ( [Y X = x]) . ǫ ǫ   eǫ/3(eǫ −1) e t(e 1) ≥ | − | E = − dY Since 1 and 2 are unbiased, we have [Z X = x] = ǫ ǫ/3 ǫ/3− ǫ 2 M M  | (e +e )(xe 1) 2(t + e ) E [Y X = x]= x. Hence, it is sufficient to prove Z eǫ/3(eǫ−1) | ǫ E Z2 X = x E Y 2 X = x . (146) e | ≥ | = . We can derive that eǫ/3 + eǫ     E Z2 X = x | 2 K. Proof of Lemma 8 = z P [Z = z X = x] · | z The expression of these two probabilities in Eq. (142) can X be solved from the following: = z2 P [Z = z Y = y] F [Y = y X = x] dy | · | proper distribution so that z y X Z P kC = z2P [Z = z Y = y] F [Y = y X = x] dy  Z = Y = y | · |  m | Zy z    X  P (k + 1)C E 2 F  + Z = Y = y =1, (140a) = Z Y = y [Y = y X = x] dy, (147)  m | y | · |    Z  and   E [Z Y = y]= y so that E Y 2 X = x = y2 F [Y = y X = x] dy. (148) | | · |  kC P Z = kC Y = y Zy  m × m | To prove Inequality (146), because of Eq. (147) and  (k+1)C P (k+1)C = y. (140b)  + m  Z = m  Y = y ! Eq. (148), it is sufficient to prove  × | Summarizing ① andh ②, with k := ym ,i we have E 2 2  C Z Y = y y , y Range(Y ). (149)  ⌊ ym⌋ kC | ≥ ∀ ∈ k +1 , if z = , After getting the intermediate output y from 1, we may P − C m   M [Z = z Y = y]= discretize the intermediate output y into z1 with probability |  ym (k+1)C k, if z = . p and z with probability p . Hereby,  C − m 1 2 2 (141) p1 + p2 =1. In the perturbation step, the distribution of Y given the input Mechanism 2 is unbiased, so that E [Z Y = y] = y, and x is given by we have M | p , if y [L(x), R(x)], F 1 p1 z1 + p2 z2 = y. [Y = y x]= ∈ · · | p2, if y [ C,L(x)) (R(x), C]. According to Cauchy–Schwarz inequality, we have ( ∈ − ∪ (142) E 2 2 2 Z Y = y = p1 z1 + p2 z2 Hence, | · 2 · 2 2 2 = [(√p1) + (√p2) ][(√p1z1) + (√p2z2) ] P [Z = z x]   | (p z + p z )2 = y2. ≥ 1 · 1 2 · 2 = P [Z = z x and Y = y] F [Y = y x] dy Thus, we get Inequality (146) and Inequality (149).  | | Zy = P [Z = z Y = y] F [Y = y x] dy. (143) y | | M. Proof of Lemma 10 In addition,Z our mechanisms are unbiased, such that The maxx [ 1,1] Var [Y x] is minimized when E[Y x]= y F [Y = y x] dy = x. (144) ∈ − H | | × | β = (150) Zy 25 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

0, If 0 <ǫ<ǫ∗, b) A =0, if ǫ =0.610986. ǫ 2 ǫ ǫ ǫ 2 2(e a) (e 1) ae (e +1) c) A< 0, if 0.610986 <ǫ<β .  β1 = 2(eǫ−a)2(eǫ−+t)−aeǫ(eǫ+1)2 , If ǫ∗ ǫ< ln 2, 1 − − ≤  √ B +eǫ 1 Therefore, min max Var [Y x] is at:  − A − β x [ 1,1] β3 = eǫ+eǫ/3 , If ǫ ln 2, ∈ − H | ≥ a) β =0, if 0 <ǫ< 0.610986. Where  5 3 2 b) β = β , if 0.610986 ǫ< ln 2. ǫ∗ := 3 ln(root of 3x 2x +3x 5x 3 near x =1.22588) 1 ≤ ǫ 3 ǫ − − − t+1 (t+e )((t+1) +e 1) -If β [β ,β ], slope = +1+ − 0.610986. (151) ∈ 1 2 1 eǫ 1 3t2(eǫ 1)2 − ≈ ǫ ǫ 2 (eǫ+1)2 (t+eǫ)((t+1)− 3+eǫ 1) (eǫ+1)−2 (β 1)ae (e +1) , slope − . Proof. If x = x∗ = ǫ −2 ǫ ǫ , we have (eǫ 1)2 2 = 3t2(eǫ 1)2 (eǫ 1)2 2(e a) (β(e +t) e +1) − − − − variance of Y as follows: − − Fig. 21 proves that slope > 0, when ǫ [0, ln 2]. 1 ∈ Var [Y x∗] H | ǫ ǫ 2 2 t +1 (β 1)ae (e + 1) 14000 = (β + β 1) − eǫ 1 − 2(eǫ a)2(β(eǫ + t) eǫ + 1) −  − −  12000 aeǫ(eǫ + 1)2 (β 1)aeǫ(eǫ + 1)2 + (1 β) − − (eǫ 1)(eǫ a)2 2(eǫ a)2(β(eǫ + t) eǫ + 1) 10000 − −  − −  ǫ 3 ǫ 8000 (t + e )((t + 1) + e 1) 1 + 2 ǫ 2 − β

3t (e 1) slope  − 6000 e2ǫ(eǫ + 1)2 + (1 β)(1 a) . (152) 4000 − − (eǫ 1)2(eǫ a)2 ǫ − ǫ −  ǫ Let γ := β(e + t) e +1 and c := e , we can transform 2000 − Var [Y x∗] Eq. (152) to the following: H 0 | 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 a2c2(c + 1)4 a2c2(c + 1)4 privacy budget γ 2 4 2 4 Fig. 21: slope when ǫ [0, ln 2]. 4(c + t) (c a) (c 1) − 2(c + t) (c 1)(c a) 1 ∈  3 − − 2 2− − (t + 1) + c 1 (1 a)c (c + 1) ǫ + − − When a =0, β (178) = β (180) = e 1 , the 3t2(c 1)2 − (c + t)(c 1)2(c a)2 1 intersection t+−eǫ − 2 2 2 4 − − 2 22 4 intersection of Var [Y 0] and Var [Y 1] is at β1. 1 (1 + t) a c (c + 1) (1 + t) a c (c + 1) H | H | 5 + When slope2 =0, we have root at ǫ = 3 ln(root of 3x γ 4(c + t)2(c a)4(c 1) − 2(c + t)2(c 1)(c a)4 3 2 −  − − − −  2x +3x 5x 3 near x =1.22588) 0.610986. (1 + t)a2c2(c + 1)4 (1 + t)a2c2(c + 1)4 − − ≈ + From Fig. 22, we have: − 2(c + t)2(c a)4(c 1) (c + t)2(c 1)(c a)4 − − − − (1) If 0 <ǫ< 0.610986, slope2 > 0. (t + 1)3 + c 1 (1 + t)(1 a)c2(c + 1)2 (153) (2) If ǫ =0.610986, slope2 =0. + 2 − + − 2 2 . 3t (c 1) (c + t)(c 1) (c a) (3) If ǫ> 0.610986, slope2 < 0. Set coefficient− of as − − γ A : Based on previous analysis, we have: a2c2(c + 1)4 a2c2(c + 1)4 A := (1) If , 4(c + t)2(c a)4(c 1) − 2(c + t)2(c 1)(c a)4 0 < ǫ < 0.610986 3 − − 2 2 − − minβ maxx [ 1,1] Var [Y x] = (t + 1) + c 1 (1 a)c (c + 1) ∈ − H | + − − . (154) maxx [ 1,1] Var [Y x, β = β1]. 3t2(c 1)2 − (c + t)(c 1)2(c a)2 ∈ − H | − − − (2) If ǫ = 0.610986, minβ maxx [ 1,1] Var [Y x] = Set coefficient of 1 as B : ∈ − H | γ Var [Y 1,β = β1]. (1 + t)2a2c2(c + 1)4 H | (155) (3) If ǫ > 0.610986, minβ maxx [ 1,1] Var [Y x] = B := 2 4 . ∈ − H | −4(c + t) (c a) (c 1) Var [Y 1,β = β1]. Set C as : − − H | (1 + t)a2c2(c + 1)4 (1 + t)a2c2(c + 1)4 C := + −2(c + t)2(c a)4(c 1) (c + t)2(c 1)(c a)4 − − − − 14000 (t + 1)3 + c 1 (1 + t)(1 a)c2(c + 1)2 + − + − . (156) 12000 3t2(c 1) (c + t)(c 1)2(c a)2 Since γ monotonically− increases with− β in− the domain β 10000 ǫ ∈ (0,β1), the minimum γ is γ1 := e +1 at β =0, maximum 8000

− 2 γ is γ2 := 0 at β = β1. 6000 slope If 0 <ǫ< ln 2, a =0, we have 4000 • (t + 1)3 + c 1 (c + 1)2 A = − , 2000 3t2(c 1)2 − (c + t)(c 1)2 − − B =0, 0 -2000 Var [Y x∗]= Aγ + C is a linear function. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 H | privacy budget -If β (0,β1), Appendix O proves: ∈ Fig. 22: slope when ǫ [0, ln 2]. a) A> 0, if 0 <ǫ< 0.610986. 2 ∈

26 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

Therefore, minβ maxx [ 1,1] Var [Y x] is at β = β1 if ∈ − H | 0.3 β [β1,β2]. Summarize above analysis, we can conclude ∈ 0.2 that minβ maxx [ 1,1] Var [Y x]= ∈ − H | 0.1 – Var [Y x∗,β = 0], if 0 <ǫ< 0.610986. 0 H | – maxx [ 1,1] Var [Y x, β = β1], if 0.610986 ǫ < -0.1 ∈ − H | ≤ 2 ln 2. -0.2 slope If ln 2 ǫ ln 5.53, -0.3 • ≤ ≤ -0.4 When β (0,β1), Appendix O proves A < ∈ B -0.5 0, B < 0, Appendix N proves when γ = A , − -0.6 we have minβ maxx [ 1,1] Var [Y x]q = ∈ − H | -0.7 maxx [ 1,1] Var [Y x, β = β3]. Since 0.6 0.8 1 1.2 1.4 1.6 1.8 ∈ − H | privacy budget γ := β(c + t) c + 1, we have β3 := B − Fig. 24: slope when ǫ [ln 2, ln 5.53]. √ A +c 1 2 − − . minβ maxx [ 1,1] Var [Y x] = ∈ c+t ∈ − H | maxx [ 1,1] Var [Y x∗,β = β3]. ∈ − H | When β [β ,β ], Fig. 23 shows that slope > 0 if 400 ∈ 1 2 1 ǫ [ln 2, ln 5.53]. Fig. 24 shows that slope : 350 ∈ 2 300 – If 0 <ǫ< 1.4338, slope2 < 0, βintersection < 250 β1, see Fig. 25, minβ maxx [ 1,1] Var [Y x] = 1 ∈ − H | 200 maxx [ 1,1] Var [Y x, β = β1]. ∈ − H | 150 – If ǫ 1.4338, slope2 = 0, βintersection < ≈ 100 β1, see Fig. 25, minβ max Var [Y x] =

x [ 1,1] intersection - ∈ − H | 50 maxx [ 1,1] Var [Y x, β = β1]. ∈ − H | 0 – If 1.4338 < ǫ ln 5.53, slope2 > ≤ -50 0, minβ maxx [ 1,1] Var [Y x] = ∈ − H | -100 maxx [ 1,1] Var [Y x, β = β1]. 0.6 0.8 1 1.2 1.4 1.6 1.8 ∈ − H | privacy budget Since Var [Y x, β = β3] < Var [Y x, β = β1], H | H | Fig. 25: βintersection β1 when ǫ [ln 2, ln 5.53]. minβ maxx [ 1,1] Var [Y x] is at β = β3. − ∈ ∈ − H |

N. Proof of the monotonicity of Var [Y x∗] 3 H | Substituting A (Eq. 154), B (Eq. 155) and C (Eq. 156) into Var [Y x∗] (Eq. 153) yields 2.5 H | B Var [Y x∗]= Aγ + + C. (157) H | γ 2

1 The first order derivative of (157) is B

slope Var [Y x∗]′ = A . (158) 1.5 H | − γ2 If A B =0, we get two roots: − γ2 1 B B γ = ,γ = . 1 − A 2 A 0.5 r r 0.6 0.8 1 1.2 1.4 1.6 1.8 Since γ < 0, γ1 is eligible. Hereby, privacy budget B If γ ( , ], Var [Y x∗]′ < 0, Var [Y x∗] • A Fig. 23: slope1 when ǫ [ln 2, ln 5.53]. ∈ −∞ − H | H | ∈ monotonically decreases.q B If γ ( A , 0), Var [Y x∗]′ > 0, Var [Y x∗] mono- If ǫ> ln 5.53, • ∈ − H | H | • tonically increases.q – If β (0,β ), Appendix O proves A < 0 and  ∈ 1 B < 0, Appendix N proves when γ = B , β := − A 4 B O. The sign of A to ǫ √ A +c 1 q − − , we get minβ maxx [ 1,1] Var [Y x] = c+t ∈ − H | If A =0, we have ǫ = 3 ln(root of 3ǫ5 2ǫ3 +3ǫ2 5ǫ maxx [ 1,1] Var [Y x, β = β4]. − − − ∈ − H | 3 near x =1.22588) 0.610986. – If β [β1,β2], Appendix P proves slope1 > ≈ 0 and∈ Appendix Q proves slope > 0. There- First order derivative of A is 2 ǫ 2ǫ 3ǫ ǫ/3 2ǫ/3 4ǫ/3 A′ = (25e 27e 9e 12e + 19e e fore, we obtain minβ maxx [ 1,1] Var [Y x] = − − − − − ∈ − H | 5ǫ/3 7ǫ/3 2ǫ/3 2ǫ/3 2 ǫ 3 maxx [ 1,1] Var [Y x, β = β1]. + 41e +7e + 5)/(9e (e + 1) (e 1) ). ∈ − H | −  (159)

27 This paper appears in IEEE Internet of Things Journal (IoT-J). Please feel free to contact us for questions or remarks.

O. The sign of $A$ with respect to $\epsilon$

If $A = 0$, we have $\epsilon = 3\ln(\text{root of } 3x^5 - 2x^3 + 3x^2 - 5x - 3 \text{ near } x = 1.22588) \approx 0.610986$. The first-order derivative of $A$ is
$$
A' = -\frac{25e^\epsilon - 27e^{2\epsilon} - 9e^{3\epsilon} - 12e^{\epsilon/3} + 19e^{2\epsilon/3} - e^{4\epsilon/3} + 41e^{5\epsilon/3} + 7e^{7\epsilon/3} + 5}{9e^{2\epsilon/3}\,(e^{2\epsilon/3}+1)^2(e^\epsilon-1)^3}. \tag{159}
$$

• If $0 < \epsilon < \ln 2$, Fig. 26 shows that $A' < 0$, so $A$ monotonically decreases for $\epsilon \in (0, \ln 2)$. Therefore, we have
  – $A > 0$, if $0 < \epsilon < 0.610986$;
  – $A = 0$, if $\epsilon = 0.610986$;
  – $A < 0$, if $0.610986 < \epsilon < \ln 2$.

Fig. 26: The first-order derivative of $A$ is less than 0 if $0 < \epsilon \le \ln 2$.

• If $\ln 2 \le \epsilon \le \ln 5.53$, Fig. 27 shows $A < 0$.

Fig. 27: The value of $A$ is less than 0 if $\ln 2 < \epsilon \le \ln 5.53$.

• If $\epsilon > \ln 5.53$, we obtain
$$
A = -\frac{16e^\epsilon + 21e^{2\epsilon} + 3e^{3\epsilon} + 36e^{\epsilon/3} - 12e^{2\epsilon/3} - 28e^{5\epsilon/3} - 8e^{7\epsilon/3} - 12}{12e^{2\epsilon/3}\,(e^\epsilon - e^{2\epsilon/3} + e^{5\epsilon/3} - 1)^2}. \tag{160}
$$
When $A = 0$, we have three roots: $r_1 \approx -16.9563$, $r_2 \approx -1.2284$, $r_3 \approx 0.0463914$. The denominator $12e^{2\epsilon/3}(e^\epsilon - e^{2\epsilon/3} + e^{5\epsilon/3} - 1)^2 > 0$, and
$$
\lim_{\epsilon\to\infty} -\big(16e^\epsilon + 21e^{2\epsilon} + 3e^{3\epsilon} + 36e^{\epsilon/3} - 12e^{2\epsilon/3} - 28e^{5\epsilon/3} - 8e^{7\epsilon/3} - 12\big) = -\infty.
$$
Since $r_3$ is the largest real root, the sign of $16e^\epsilon + 21e^{2\epsilon} + 3e^{3\epsilon} + 36e^{\epsilon/3} - 12e^{2\epsilon/3} - 28e^{5\epsilon/3} - 8e^{7\epsilon/3} - 12$ does not change, so that $A < 0$ when $\epsilon > \ln 5.53$. ∎

P. The sign of slope$_1$ when $\epsilon > \ln 5.53$

If
$$
\mathrm{slope}_1 = \frac{t+1}{e^\epsilon-1} + 1 - \frac{ae^\epsilon(e^\epsilon+1)^2}{(e^\epsilon-1)(e^\epsilon-a)^2} + \frac{(t+e^\epsilon)\big((t+1)^3+e^\epsilon-1\big)}{3t^2(e^\epsilon-1)^2} - \frac{(1-a)e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2} = 0,
$$
we have two roots: $e^\epsilon_1 \approx -0.141506$ and $e^\epsilon_2 \approx 2.21598$. The first-order derivative of slope$_1$ is
$$
\mathrm{slope}_1' = -\frac{e^{2\epsilon/3}(10e^\epsilon+20) + 20e^\epsilon + (-27e^\epsilon-45)e^{\epsilon/3} + 10}{9e^{\epsilon/3}(e^\epsilon-1)^3}.
$$
The denominator of slope$_1'$ is $> 0$. Thus,
$$
\lim_{\epsilon\to\infty} -\big(e^{2\epsilon/3}(10e^\epsilon+20) + 20e^\epsilon + (-27e^\epsilon-45)e^{\epsilon/3} + 10\big) = -\infty.
$$
If $\mathrm{slope}_1' = 0$, we have three roots: $e^\epsilon_1 \approx -1.35696$, $e^\epsilon_2 \approx 0.0169067$, $e^\epsilon_3 \approx 4.22192$. Since $e^\epsilon_3 \approx 4.22192$ is the largest real root, the sign of slope$_1'$ does not change when $e^\epsilon > 4.22192$. Therefore, when $\epsilon > \ln 5.53$, slope$_1' < 0$ and slope$_1$ monotonically decreases. By simplifying slope$_1$, we get
$$
\mathrm{slope}_1 = -\frac{9e^\epsilon - 5e^{2\epsilon/3} - 5e^{4\epsilon/3} + 3}{3(e^\epsilon-1)^2}.
$$
Then, we obtain $\lim_{\epsilon\to\infty} \mathrm{slope}_1 = 0$. Thus, we have slope$_1 > 0$ if $\epsilon > \ln 5.53$. ∎
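The sign claim in Appendix P can be double-checked by evaluating the simplified closed form of slope$_1$ above on a grid of $\epsilon > \ln 5.53$. This is a quick numerical sketch of ours, not part of the proof:

```python
import numpy as np

def slope1(eps):
    """Simplified slope_1 from Appendix P, written with u = e^(eps/3)."""
    u = np.exp(eps / 3)
    return -(9 * u**3 - 5 * u**2 - 5 * u**4 + 3) / (3 * (u**3 - 1)**2)

eps = np.linspace(np.log(5.53), 10, 1000)
vals = slope1(eps)
print(vals.min() > 0)               # True: slope_1 stays positive
print(np.all(np.diff(vals) < 0))    # True: monotonically decreasing
print(vals[-1])                     # small positive value, tending to 0
```

Consistently with the proof, the values are positive, strictly decreasing, and approach 0 from above; the root $e^\epsilon_2 \approx 2.21598$ can also be recovered by solving the simplified numerator.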

Q. The sign of slope$_2$ when $\epsilon > \ln 5.53$

When
$$
\mathrm{slope}_2 = \frac{(t+e^\epsilon)\big((t+1)^3+e^\epsilon-1\big)}{3t^2(e^\epsilon-1)^2} - \frac{(1-a)e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2} = 0,
$$
we have $\epsilon_1 \approx -\ln 1.24835$ and $\epsilon_2 \approx \ln 1.52144$. The first-order derivative of slope$_2$ is
$$
\mathrm{slope}_2' = \frac{-4e^{2\epsilon} + e^{2\epsilon/3}(9e^\epsilon+63) - 23e^\epsilon + (-20e^\epsilon-10)e^{\epsilon/3} - 3}{9e^{2\epsilon/3}(e^\epsilon-1)^3}. \tag{161}
$$
The denominator of Eq. (161) is $> 0$. Besides, from the numerator of Eq. (161), we obtain
$$
\lim_{\epsilon\to\infty}\big({-4e^{2\epsilon}} + e^{2\epsilon/3}(9e^\epsilon+63) - 23e^\epsilon + (-20e^\epsilon-10)e^{\epsilon/3} - 3\big) = -\infty.
$$
If Eq. (161) $= 0$, we have two roots:
$$
\epsilon_1 = 3\ln(\text{root of } 4x^6 - 9x^5 + 20x^4 + 23x^3 - 63x^2 + 10x + 3) \approx -3.13865,
$$
$$
\epsilon_2 = 3\ln(\text{root of } 4x^6 - 9x^5 + 20x^4 + 23x^3 - 63x^2 + 10x + 3) \approx 0.709472.
$$
Since $\epsilon_2 \approx 0.709472$ is the largest real root, the sign of slope$_2'$ does not change if $\epsilon > 0.709472$. Therefore, slope$_2' < 0$ if $\epsilon > \ln 5.53$. Simplifying slope$_2$,
$$
\mathrm{slope}_2 = \frac{3e^{\epsilon/3} - 3e^\epsilon + 5e^{2\epsilon/3} + 2e^{4\epsilon/3} - 9}{3(e^\epsilon-1)^2},
$$
and then we get $\lim_{\epsilon\to\infty} \mathrm{slope}_2 = 0$. Thus, we have slope$_2 > 0$ if $\epsilon > \ln 5.53$. ∎

R. Proof of Lemma 13

For any $i \in [1,n]$, the random variable $Y_i[t_j] - x_i[t_j]$ has zero mean based on Lemma 12. In both PM-SUB and HM(PM-SUB, Three-Outputs),
$$
|Y[t_j] - x[t_j]| \le \frac{d}{k}\cdot\frac{\big(e^{\epsilon/k}+e^{\epsilon/(3k)}\big)\big(e^{\epsilon/(3k)}+1\big)}{e^{\epsilon/(3k)}\big(e^{\epsilon/k}-1\big)}.
$$
By Bernstein's inequality, we have
$$
P\big[|Z[t_j]-X[t_j]|\ge\lambda\big] = P\left[\Big|\sum_{i=1}^n \{Y_i[t_j]-x_i[t_j]\}\Big|\ge n\lambda\right]
\le 2\exp\left(\frac{-(n\lambda)^2/2}{\sum_{i=1}^n \mathrm{Var}[Y_i[t_j]] + \frac{n\lambda}{3}\cdot\frac{d}{k}\cdot\frac{(e^{\epsilon/k}+e^{\epsilon/(3k)})(e^{\epsilon/(3k)}+1)}{e^{\epsilon/(3k)}(e^{\epsilon/k}-1)}}\right).
$$
In Algorithm 6, $Y[t_j]$ equals $\frac{d}{k}y_j$ with probability $\frac{k}{d}$ and $0$ with probability $1-\frac{k}{d}$. Moreover, we obtain $\mathbb{E}[Y[t_j]] = x[t_j]$ from Lemma 12, and then we get
$$
\mathrm{Var}[Y[t_j]] = \mathbb{E}[(Y[t_j])^2] - \mathbb{E}[Y[t_j]]^2 = \frac{k}{d}\,\mathbb{E}\Big[\big(\tfrac{d}{k}y_j\big)^2\Big] - (x[t_j])^2 = \frac{d}{k}\,\mathbb{E}[(y_j)^2] - (x[t_j])^2. \tag{162}
$$
In Algorithm 6, if Line 5 uses PM-SUB, we use the variance in Eq. (139) to compute $\mathbb{E}[(y_j)^2]$; the asymptotic expressions involving $\epsilon$ are in the sense of $\epsilon \to 0$. With $t(\tfrac{\epsilon}{k}) = e^{\epsilon/(3k)}$,
$$
\mathbb{E}[(y_j)^2] = \mathrm{Var}[y_j] + (\mathbb{E}[y_j])^2
= \frac{t(\tfrac{\epsilon}{k})+1}{e^{\epsilon/k}-1}(x[t_j])^2 + \frac{\big(t(\tfrac{\epsilon}{k})+e^{\epsilon/k}\big)\big((t(\tfrac{\epsilon}{k})+1)^3+e^{\epsilon/k}-1\big)}{3\big(t(\tfrac{\epsilon}{k})\big)^2(e^{\epsilon/k}-1)^2} + (x[t_j])^2 = O\Big(\frac{k^2}{\epsilon^2}\Big). \tag{163}
$$
In Algorithm 6, if Line 5 uses Three-Outputs, we use the variance in Eq. (172) to compute $\mathbb{E}[(y_j)^2]$; the asymptotic expressions involving $\epsilon$ are again in the sense of $\epsilon \to 0$:
$$
\mathbb{E}[(y_j)^2] = \mathrm{Var}[y_j] + (\mathbb{E}[y_j])^2
= \frac{(1-a)e^{2\epsilon/k}(e^{\epsilon/k}+1)^2}{(e^{\epsilon/k}-1)^2(e^{\epsilon/k}-a)^2} + \frac{b|x[t_j]|e^{2\epsilon/k}(e^{\epsilon/k}+1)^2}{(e^{\epsilon/k}-1)^2(e^{\epsilon/k}-a)^2} = O\Big(\frac{k^2}{\epsilon^2}\Big).
$$
In Algorithm 6, if Line 5 uses HM-TP, we have:
$$
\mathbb{E}[(y_j)^2] = \mathrm{Var}[y_j] + (\mathbb{E}[y_j])^2 =
\begin{cases}
\frac{(e^{\epsilon/k}+1)^2}{(e^{\epsilon/k}-1)^2} + (x[t_j])^2, & \text{if } 0 < \epsilon < \epsilon^*,\\[2pt]
\mathrm{Var}_H[Y|1,\beta_1,\tfrac{\epsilon}{k}] + (x[t_j])^2, & \text{if } \epsilon^* \le \epsilon < \ln 2,\\[2pt]
\mathrm{Var}_H[Y|1,\beta_3,\tfrac{\epsilon}{k}] + (x[t_j])^2, & \text{if } \epsilon \ge \ln 2
\end{cases}
\;=\; O\Big(\frac{k^2}{\epsilon^2}\Big), \tag{164}
$$
where $\epsilon^*$ is defined in Eq. (151). Substituting Eq. (163) into Eq. (162) yields
$$
\mathrm{Var}[Y[t_j]] = \frac{d}{k}\cdot O\Big(\frac{k^2}{\epsilon^2}\Big) - (x[t_j])^2 = O\Big(\frac{dk}{\epsilon^2}\Big). \tag{165}
$$
Therefore, we obtain
$$
P\big[|Z[t_j]-X[t_j]|\ge\lambda\big] \le 2\exp\left(\frac{-n\lambda^2}{O(dk/\epsilon^2) + \lambda\cdot O(d/\epsilon)}\right).
$$
By the union bound, there exists $\lambda = O\Big(\frac{\sqrt{d\ln(d/\beta)}}{\epsilon\sqrt{n}}\Big)$ such that
$$
\max_{j\in[1,d]} |Z[t_j]-X[t_j]| = \lambda = O\Big(\frac{\sqrt{d\ln(d/\beta)}}{\epsilon\sqrt{n}}\Big). \; \blacksquare
$$

S. Calculate $k$ for PM-SUB and Three-Outputs

We calculate the optimal $k$ for PM-SUB and Three-Outputs.

(I) We calculate the $k$ for $d$-dimensional PM-SUB. When $x[t_j] = 1$, we get
$$
\max \mathrm{Var}[Y[t_j]] = \frac{d}{k}\left[\frac{t(\tfrac{\epsilon}{k})+1}{e^{\epsilon/k}-1} + \frac{\big(t(\tfrac{\epsilon}{k})+e^{\epsilon/k}\big)\big((t(\tfrac{\epsilon}{k})+1)^3+e^{\epsilon/k}-1\big)}{3\big(t(\tfrac{\epsilon}{k})\big)^2(e^{\epsilon/k}-1)^2} + 1\right] - 1.
$$
For PM-SUB, we have
$$
\max \mathrm{Var}[Y[t_j]] = \frac{d}{k}\left[\frac{e^{\epsilon/(3k)}+1}{e^{\epsilon/k}-1} + \frac{\big(e^{\epsilon/(3k)}+e^{\epsilon/k}\big)\big((e^{\epsilon/(3k)}+1)^3+e^{\epsilon/k}-1\big)}{3\big(e^{\epsilon/(3k)}\big)^2(e^{\epsilon/k}-1)^2} + 1\right] - 1.
$$
Let $s = \frac{\epsilon}{k}$, and then
$$
\max \mathrm{Var}[Y[t_j]] = \frac{d}{\epsilon}\cdot s\left[\frac{e^{s/3}+1}{e^s-1} + \frac{(e^{s/3}+e^s)\big((e^{s/3}+1)^3+e^s-1\big)}{3(e^{s/3})^2(e^s-1)^2} + 1\right] - 1.
$$
Let
$$
f(s) = s\left[\frac{e^{s/3}+1}{e^s-1} + \frac{(e^{s/3}+e^s)\big((e^{s/3}+1)^3+e^s-1\big)}{3(e^{s/3})^2(e^s-1)^2} + 1\right],
$$
and then we obtain
$$
\max \mathrm{Var}[Y[t_j]] = \frac{d}{\epsilon}\cdot f(s) - 1. \tag{166}
$$

Fig. 28: Find $s$ for $\min f(s)$; the minimum is at $(s, f(s)) \approx (2.5, 4.0976)$.

From the numerical experiments shown in Fig. 28, we conclude that we can get $\min f(s)$, and hence $\min_k \max \mathrm{Var}[Y[t_j]]$, at $s = 2.5$, i.e., $k = \frac{\epsilon}{2.5}$.
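The numerical search behind Fig. 28 is straightforward to replicate. The sketch below (ours, not the authors' code) grid-minimizes $f(s)$ from Eq. (166) and recovers the stated minimum near $s = 2.5$ with $f(s) \approx 4.0976$:

```python
import numpy as np

def f(s):
    """f(s) from Eq. (166), with t = e^(s/3)."""
    t, es = np.exp(s / 3), np.exp(s)
    inner = (t + 1) / (es - 1) \
          + (t + es) * ((t + 1)**3 + es - 1) / (3 * t**2 * (es - 1)**2) \
          + 1
    return s * inner

s = np.linspace(0.5, 10, 100_000)
vals = f(s)
i = np.argmin(vals)
print(f"argmin s = {s[i]:.3f}, min f(s) = {vals[i]:.4f}")  # ~2.500, ~4.0976
```

The grid search reproduces the coordinates annotated in Fig. 28, which justifies fixing $k = \epsilon/2.5$.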


(II) We calculate the $k$ for $d$-dimensional Three-Outputs. The variance of $Y[t_j]$ is
$$
\mathrm{Var}[Y[t_j]] = \frac{d}{k}\left[\frac{(1-a)e^{2\epsilon/k}(e^{\epsilon/k}+1)^2}{(e^{\epsilon/k}-1)^2(e^{\epsilon/k}-a)^2} + \frac{b|x[t_j]|e^{2\epsilon/k}(e^{\epsilon/k}+1)^2}{(e^{\epsilon/k}-1)^2(e^{\epsilon/k}-a)^2}\right] - (x[t_j])^2,
$$
where $b$ is from Eq. (75) and $a$ is from Eq. (74). Let $x[t_j]' = \frac{d\,b\,e^{2\epsilon/k}(e^{\epsilon/k}+1)^2}{2k(e^{\epsilon/k}-1)^2(e^{\epsilon/k}-a)^2}$. The worst-case noise variance of $Y$ is
$$
\max_{x\in[-1,1]} \mathrm{Var}[Y[t_j]] =
\begin{cases}
\mathrm{Var}[x[t_j]'], & \text{if } 0 < x[t_j]' < 1,\\
\max\{\mathrm{Var}[0], \mathrm{Var}[1]\}, & \text{otherwise.}
\end{cases} \tag{167}
$$
Let $s = \frac{\epsilon}{k}$, and then
$$
x[t_j]' = \frac{d}{\epsilon}\cdot s\cdot\frac{b(s)e^{2s}(e^s+1)^2}{2(e^s-1)^2(e^s-a(s))^2} = \frac{d}{\epsilon}\cdot s\cdot\frac{a(s)e^s(e^s+1)^2}{2(e^s-1)(e^s-a(s))^2}.
$$
If $0 < \frac{d}{\epsilon}\cdot s\cdot\frac{a(s)e^s(e^s+1)^2}{2(e^s-1)(e^s-a(s))^2} < 1$, then
$$
\max_{x\in[-1,1]} \mathrm{Var}[Y[t_j]] = \mathrm{Var}[x[t_j]'] = \frac{d^2}{\epsilon^2}\cdot\frac{s^2(b(s))^2e^{4s}(e^s+1)^4}{2(e^s-1)^4(e^s-a(s))^4} - \frac{d^2}{\epsilon^2}\cdot\frac{s^2(b(s))^2e^{4s}(e^s+1)^4}{4(e^s-1)^4(e^s-a(s))^4} + \frac{d}{\epsilon}\cdot\frac{s(1-a(s))e^{2s}(e^s+1)^2}{(e^s-1)^2(e^s-a(s))^2}
$$
$$
= \frac{d^2}{\epsilon^2}\cdot\frac{s^2(b(s))^2e^{4s}(e^s+1)^4}{4(e^s-1)^4(e^s-a(s))^4} + \frac{d}{\epsilon}\cdot\frac{s(1-a(s))e^{2s}(e^s+1)^2}{(e^s-1)^2(e^s-a(s))^2}. \tag{168}
$$
Substituting $b = a\frac{e^s-1}{e^s}$ into Eq. (168) yields
$$
\max_{x\in[-1,1]} \mathrm{Var}[Y[t_j]] = \frac{d^2}{\epsilon^2}\cdot\frac{s^2(a(s))^2e^{2s}(e^s+1)^4}{4(e^s-1)^2(e^s-a(s))^4} + \frac{d}{\epsilon}\cdot\frac{s(1-a(s))e^{2s}(e^s+1)^2}{(e^s-1)^2(e^s-a(s))^2}.
$$
  – If $\epsilon < \ln 2$, then $a = 0$ and $b = 0$, and the first-order derivative of $\max \mathrm{Var}[Y[t_j]]$ is
$$
\max \mathrm{Var}[Y[t_j]]' = \frac{d}{\epsilon}\cdot\frac{(e^s+1)(-4se^s + e^{2s} - 1)}{(e^s-1)^3}. \tag{169}
$$
When $\max \mathrm{Var}[Y[t_j]]' = 0$, we have the root $s \approx 2.18$.
  – If $\ln 2 < \epsilon < \ln 5.5$, by numerical experiments, we have the optimal $s \approx 2.5$.
  – If $\epsilon \ge \ln 5.5$, by numerical experiments, we have the optimal $s \approx 2.5$.
Therefore, we pick $s = 2.5$ and $k = \frac{\epsilon}{2.5}$ to simplify the experimental evaluation.

T.2 Extending Three-Outputs for Multiple Numeric Attributes

Lemma 14. For a $d$-dimensional numeric tuple $x$ which is perturbed into $Y$ under $\epsilon$-LDP, and for each $t_j$ of the $d$ attributes, the variance of $Y[t_j]$ induced by Three-Outputs is
$$
\mathrm{Var}[Y[t_j]] = \frac{d}{k}\left[\frac{(1-a)e^{2\epsilon/k}(e^{\epsilon/k}+1)^2}{(e^{\epsilon/k}-1)^2(e^{\epsilon/k}-a)^2} + \frac{b|x[t_j]|e^{2\epsilon/k}(e^{\epsilon/k}+1)^2}{(e^{\epsilon/k}-1)^2(e^{\epsilon/k}-a)^2}\right] - (x[t_j])^2.
$$

Proof of Lemma 14. The variance of $Y[t_j]$ is computed as
$$
\mathrm{Var}[Y[t_j]] = \mathbb{E}[(Y[t_j])^2] - \mathbb{E}[Y[t_j]]^2 = \frac{k}{d}\,\mathbb{E}\Big[\big(\tfrac{d}{k}y_j\big)^2\Big] - (x[t_j])^2 = \frac{d}{k}\,\mathbb{E}[(y_j)^2] - (x[t_j])^2. \tag{170}
$$
We use the variance in Eq. (172) to compute
$$
\mathbb{E}[(y_j)^2] = \mathrm{Var}[y_j] + (\mathbb{E}[y_j])^2 = \frac{(1-a)e^{2\epsilon/k}(e^{\epsilon/k}+1)^2}{(e^{\epsilon/k}-1)^2(e^{\epsilon/k}-a)^2} + \frac{b|x[t_j]|e^{2\epsilon/k}(e^{\epsilon/k}+1)^2}{(e^{\epsilon/k}-1)^2(e^{\epsilon/k}-a)^2} - (x[t_j])^2 + (x[t_j])^2
$$
$$
= \frac{(1-a)e^{2\epsilon/k}(e^{\epsilon/k}+1)^2}{(e^{\epsilon/k}-1)^2(e^{\epsilon/k}-a)^2} + \frac{b|x[t_j]|e^{2\epsilon/k}(e^{\epsilon/k}+1)^2}{(e^{\epsilon/k}-1)^2(e^{\epsilon/k}-a)^2}.
$$
Then,
$$
\mathrm{Var}[Y[t_j]] = \frac{d}{k}\left[\frac{(1-a)e^{2\epsilon/k}(e^{\epsilon/k}+1)^2}{(e^{\epsilon/k}-1)^2(e^{\epsilon/k}-a)^2} + \frac{b|x[t_j]|e^{2\epsilon/k}(e^{\epsilon/k}+1)^2}{(e^{\epsilon/k}-1)^2(e^{\epsilon/k}-a)^2}\right] - (x[t_j])^2. \; \blacksquare
$$

U. Proof of Lemma 11

Given PM-SUB's variance in Eq. (138), we have
$$
\mathrm{Var}_P[Y|x] = \Big(\frac{1}{1-2A_d d} - 1\Big)x^2 + \frac{2d}{3}A_d^3 + \frac{(1-2A_d d)^3}{12(c-d)^2}.
$$
Substituting $\alpha = A_d d$ yields
$$
\mathrm{Var}_P[Y|x] = \Big(\frac{1}{1-2\alpha} - 1\Big)x^2 + \frac{2\alpha^3}{3d^2} + \frac{(1-2\alpha)^3}{12(c-d)^2}.
$$
In addition, substituting $1-2\alpha = \frac{e^\epsilon-1}{e^\epsilon+t}$, $d = \frac{t(e^\epsilon-1)}{2(t+e^\epsilon)^2}$, $\xi := \frac{c-d}{d}$ with $\xi = e^\epsilon-1$, and $\alpha = \frac{t+1}{2(t+e^\epsilon)}$, we have
$$
\mathrm{Var}_P[Y|x] = \frac{t+1}{e^\epsilon-1}x^2 + \frac{(t+e^\epsilon)\big((t+1)^3+e^\epsilon-1\big)}{3t^2(e^\epsilon-1)^2}. \tag{171}
$$
Given Three-Outputs's variance in Eq. (72), we can simplify it as
$$
\mathrm{Var}_T[Y|x] = (1-a)C_\epsilon^2 + C_\epsilon^2\, b|x| - x^2.
$$
According to Eq. (73), we have $C_\epsilon = \frac{e^\epsilon(e^\epsilon+1)}{(e^\epsilon-1)(e^\epsilon-a)}$, so that
$$
\mathrm{Var}_T[Y|x] = \frac{(1-a)e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2} + \frac{b|x|e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2} - x^2. \tag{172}
$$
Based on Eq. (171) and Eq. (172), we can construct the variance of the hybrid mechanism as follows:
$$
\mathrm{Var}_H[Y|x] = \beta\left(\frac{t+1}{e^\epsilon-1}x^2 + \frac{(t+e^\epsilon)\big((t+1)^3+e^\epsilon-1\big)}{3t^2(e^\epsilon-1)^2}\right) + (1-\beta)\left(\frac{(1-a)e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2} + \frac{b|x|e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2} - x^2\right),
$$
where $t = e^{\epsilon/3}$.
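The hybrid variance above is straightforward to evaluate numerically. The sketch below is our illustration only: the parameter `a` stands in for Eq. (74), which is not reproduced here, so it is taken as an input (the value 0.3 is hypothetical), and we use the worst-case relation $b = a(e^\epsilon-1)/e^\epsilon$ from Eq. (73):

```python
import numpy as np

def var_hybrid(x, eps, beta, a):
    """Var_H[Y|x] built from Eq. (171) (PM-SUB) and Eq. (172) (Three-Outputs).
    `a` stands in for Eq. (74); `b` uses the worst case from Eq. (73)."""
    t = np.exp(eps / 3)
    e = np.exp(eps)
    b = a * (e - 1) / e
    var_p = (t + 1) / (e - 1) * x**2 \
          + (t + e) * ((t + 1)**3 + e - 1) / (3 * t**2 * (e - 1)**2)
    var_t = ((1 - a) * e**2 * (e + 1)**2
             + b * np.abs(x) * e**2 * (e + 1)**2) / ((e - 1)**2 * (e - a)**2) \
          - x**2
    return beta * var_p + (1 - beta) * var_t

# Worst case over x in [-1, 1] for one (eps, beta, a) choice:
xs = np.linspace(-1, 1, 2001)
print(max(var_hybrid(x, eps=1.0, beta=0.5, a=0.3) for x in xs))
```

A grid over $x$ like this can be used to cross-check the closed-form worst case derived next.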


From Eq. (73), we set $b = a\cdot\frac{e^\epsilon-1}{e^\epsilon}$ to get the worst-case noise variance. Then, we have the variance of the hybrid mechanism as
$$
\mathrm{Var}_H[Y|x] = \Big(\beta\frac{t+1}{e^\epsilon-1} + \beta - 1\Big)x^2 + (1-\beta)\frac{ae^\epsilon(e^\epsilon+1)^2}{(e^\epsilon-1)(e^\epsilon-a)^2}|x| + \frac{(t+e^\epsilon)\big((t+1)^3+e^\epsilon-1\big)}{3t^2(e^\epsilon-1)^2}\beta + (1-\beta)(1-a)\frac{e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2}, \tag{173}
$$
where $t = e^{\epsilon/3}$. Based on Eq. (173), we get
$$
\max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] =
\begin{cases}
\mathrm{Var}_H[Y|x^*], & \text{if } \beta\frac{t+1}{e^\epsilon-1} + \beta - 1 < 0 \text{ and } 0 < x^* < 1,\\
\max\{\mathrm{Var}_H[Y|0], \mathrm{Var}_H[Y|1]\}, & \text{otherwise,}
\end{cases} \tag{174}
$$
where $x^* := \frac{(\beta-1)ae^\epsilon(e^\epsilon+1)^2}{2(e^\epsilon-a)^2\big(\beta(e^\epsilon+t)-e^\epsilon+1\big)}$. Therefore, we have the following cases to compute $\max_{x\in[-1,1]} \mathrm{Var}_H[Y|x]$:

(I) If $\beta\frac{t+1}{e^\epsilon-1} + \beta - 1 < 0$, i.e., $\beta < \frac{e^\epsilon-1}{e^\epsilon+t}$, the condition $0 < x^* < 1$ is equivalent to
$$
\beta\Big(2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2\Big) > 2(e^\epsilon-a)^2(e^\epsilon-1) - ae^\epsilon(e^\epsilon+1)^2. \tag{175}
$$
  – If $2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2 > 0$, we have $\beta > \frac{2(e^\epsilon-a)^2(e^\epsilon-1) - ae^\epsilon(e^\epsilon+1)^2}{2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2}$.
  – If $2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2 = 0$ and $2(e^\epsilon-a)^2(e^\epsilon-1) - ae^\epsilon(e^\epsilon+1)^2 > 0$, no $\beta$ satisfies the condition.
  – If $2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2 = 0$ and $2(e^\epsilon-a)^2(e^\epsilon-1) - ae^\epsilon(e^\epsilon+1)^2 \le 0$, any $\beta$ satisfies the condition.
  – If $2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2 < 0$, we have $\beta < \frac{2(e^\epsilon-a)^2(e^\epsilon-1) - ae^\epsilon(e^\epsilon+1)^2}{2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2}$. Since $\beta < \frac{e^\epsilon-1}{e^\epsilon+t}$, to get the correct domain we compare $\frac{2(e^\epsilon-a)^2(e^\epsilon-1) - ae^\epsilon(e^\epsilon+1)^2}{2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2}$ with $\frac{e^\epsilon-1}{e^\epsilon+t}$; see Appendix G, which gives $\frac{2(e^\epsilon-a)^2(e^\epsilon-1) - ae^\epsilon(e^\epsilon+1)^2}{2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2} \le \frac{e^\epsilon-1}{e^\epsilon+t}$. Therefore, $\beta < \frac{2(e^\epsilon-a)^2(e^\epsilon-1) - ae^\epsilon(e^\epsilon+1)^2}{2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2}$.

Summarizing the above analysis, we have the following cases to compute $\max_{x\in[-1,1]} \mathrm{Var}_H[Y|x]$:

1) If $2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2 > 0$, we have:
$$
\max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] =
\begin{cases}
\mathrm{Var}_H[Y|x^*], & \text{if } \frac{2(e^\epsilon-a)^2(e^\epsilon-1) - ae^\epsilon(e^\epsilon+1)^2}{2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2} < \beta < \frac{e^\epsilon-1}{e^\epsilon+t},\\
\max\{\mathrm{Var}_H[Y|0], \mathrm{Var}_H[Y|1]\}, & \text{otherwise.}
\end{cases}
$$
2) If $2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2 = 0$ and $ae^\epsilon(e^\epsilon+1)^2 + 2(e^\epsilon-a)^2(1-e^\epsilon) > 0$, we have:
$$
\max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \max\{\mathrm{Var}_H[Y|0], \mathrm{Var}_H[Y|1]\}.
$$
3) If $2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2 = 0$ and $2(e^\epsilon-a)^2(e^\epsilon-1) - ae^\epsilon(e^\epsilon+1)^2 \le 0$, we have:
$$
\max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] =
\begin{cases}
\mathrm{Var}_H[Y|x^*], & \text{if } \beta < \frac{e^\epsilon-1}{e^\epsilon+t},\\
\max\{\mathrm{Var}_H[Y|0], \mathrm{Var}_H[Y|1]\}, & \text{otherwise.}
\end{cases}
$$
4) If $2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2 < 0$, we have:
$$
\max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] =
\begin{cases}
\mathrm{Var}_H[Y|x^*], & \text{if } \beta < \frac{2(e^\epsilon-a)^2(e^\epsilon-1) - ae^\epsilon(e^\epsilon+1)^2}{2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2},\\
\max\{\mathrm{Var}_H[Y|0], \mathrm{Var}_H[Y|1]\}, & \text{otherwise.}
\end{cases}
$$

Appendix H proves that $2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2 > 0$, and Appendix G proves $\frac{2(e^\epsilon-a)^2(e^\epsilon-1) - ae^\epsilon(e^\epsilon+1)^2}{2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2} \le \frac{e^\epsilon-1}{e^\epsilon+t}$, so case 1) applies. In the $\max\{\mathrm{Var}_H[Y|0], \mathrm{Var}_H[Y|1]\}$ branch, $\mathrm{Var}_H[Y|1]$ and $\mathrm{Var}_H[Y|0]$ are:
  – $\mathrm{Var}_H[Y|1] = \big(\beta\frac{t+1}{e^\epsilon-1} + \beta - 1\big) + (1-\beta)\frac{ae^\epsilon(e^\epsilon+1)^2}{(e^\epsilon-1)(e^\epsilon-a)^2} + \frac{(t+e^\epsilon)((t+1)^3+e^\epsilon-1)}{3t^2(e^\epsilon-1)^2}\beta + (1-\beta)(1-a)\frac{e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2}$
  $\;= \beta\Big[\frac{t+1}{e^\epsilon-1} + 1 - \frac{ae^\epsilon(e^\epsilon+1)^2}{(e^\epsilon-1)(e^\epsilon-a)^2} + \frac{(t+e^\epsilon)((t+1)^3+e^\epsilon-1)}{3t^2(e^\epsilon-1)^2} - \frac{(1-a)e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2}\Big] + \Big[{-1} + \frac{ae^\epsilon(e^\epsilon+1)^2}{(e^\epsilon-1)(e^\epsilon-a)^2} + \frac{(1-a)e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2}\Big]$,
  – $\mathrm{Var}_H[Y|0] = \frac{(t+e^\epsilon)((t+1)^3+e^\epsilon-1)}{3t^2(e^\epsilon-1)^2}\beta + (1-\beta)(1-a)\frac{e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2}$
  $\;= \beta\Big[\frac{(t+e^\epsilon)((t+1)^3+e^\epsilon-1)}{3t^2(e^\epsilon-1)^2} - \frac{(1-a)e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2}\Big] + \frac{(1-a)e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2}$.

Since $\mathrm{Var}_H[Y|1]$ and $\mathrm{Var}_H[Y|0]$ are linear in $\beta$, we compare their slopes with respect to $\beta$. We define the slope of $\beta$ in $\mathrm{Var}_H[Y|1]$ as
$$
\mathrm{slope}_1 := \frac{t+1}{e^\epsilon-1} + 1 - \frac{ae^\epsilon(e^\epsilon+1)^2}{(e^\epsilon-1)(e^\epsilon-a)^2} + \frac{(t+e^\epsilon)\big((t+1)^3+e^\epsilon-1\big)}{3t^2(e^\epsilon-1)^2} - \frac{(1-a)e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2}, \tag{176}
$$
and the slope of $\beta$ in $\mathrm{Var}_H[Y|0]$ as
$$
\mathrm{slope}_2 := \frac{(t+e^\epsilon)\big((t+1)^3+e^\epsilon-1\big)}{3t^2(e^\epsilon-1)^2} - \frac{(1-a)e^{2\epsilon}(e^\epsilon+1)^2}{(e^\epsilon-1)^2(e^\epsilon-a)^2}. \tag{177}
$$
Then, we represent the left boundary of $\beta$ as
$$
\beta_1 := \frac{2(e^\epsilon-a)^2(e^\epsilon-1) - ae^\epsilon(e^\epsilon+1)^2}{2(e^\epsilon-a)^2(e^\epsilon+t) - ae^\epsilon(e^\epsilon+1)^2}, \tag{178}
$$
and the right boundary of $\beta$ as
$$
\beta_2 := 1. \tag{179}
$$
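For concreteness, slope$_1$ (176), slope$_2$ (177), and $\beta_1$ (178) can be computed directly. In this sketch of ours, `a` is again a stand-in for Eq. (74), and the value 0.3 is hypothetical:

```python
import numpy as np

def slopes_and_beta1(eps, a):
    """slope_1 (Eq. 176), slope_2 (Eq. 177), and beta_1 (Eq. 178)."""
    e, t = np.exp(eps), np.exp(eps / 3)
    s2 = (t + e) * ((t + 1)**3 + e - 1) / (3 * t**2 * (e - 1)**2) \
       - (1 - a) * e**2 * (e + 1)**2 / ((e - 1)**2 * (e - a)**2)
    s1 = (t + 1) / (e - 1) + 1 \
       - a * e * (e + 1)**2 / ((e - 1) * (e - a)**2) + s2
    beta1 = (2 * (e - a)**2 * (e - 1) - a * e * (e + 1)**2) \
          / (2 * (e - a)**2 * (e + t) - a * e * (e + 1)**2)
    return s1, s2, beta1

print(slopes_and_beta1(eps=1.0, a=0.3))
```

Evaluating these quantities on a grid of $\epsilon$ reproduces the sign patterns plotted in Figs. 21 to 24.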


The value of $\beta$ at the intersection of slope$_1$ and slope$_2$ is
$$
\beta_{\mathrm{intersection}} := \frac{(c-1)(c-a)^2 - ac(c+1)^2}{(t+c)(c-a)^2 - ac(c+1)^2}. \tag{180}
$$
Then, slope$_1$ and slope$_2$ have the following possible combinations:
1) If slope$_1 > 0$ and slope$_2 > 0$, $\beta = \beta_1$.
2) If slope$_1 < 0$ and slope$_2 < 0$, $\beta = \beta_2$.
3) If slope$_1 \cdot \mathrm{slope}_2 < 0$ and $\beta_1 < \beta_{\mathrm{intersection}} < \beta_2$, $\beta = \beta_{\mathrm{intersection}}$.
4) If slope$_1 \cdot \mathrm{slope}_2 < 0$ and $\beta_{\mathrm{intersection}} < \beta_1$ or $\beta_{\mathrm{intersection}} > \beta_2$, find $\beta$ for
$$
\min\Big\{\max\{\mathrm{Var}[Y|1,\beta=\beta_1], \mathrm{Var}[Y|0,\beta=\beta_1]\},\; \max\{\mathrm{Var}[Y|1,\beta=\beta_2], \mathrm{Var}[Y|0,\beta=\beta_2]\}\Big\}.
$$
5) If slope$_1 \cdot \mathrm{slope}_2 = 0$:
  Case 1: slope$_1 = 0$, slope$_2 \ne 0$.
    a) If slope$_1 = 0$, slope$_2 > 0$, and $\beta_{\mathrm{intersection}} \in [\beta_1,\beta_2]$, then $\beta = [\beta_1, \beta_{\mathrm{intersection}}]$.
    b) If slope$_1 = 0$, slope$_2 < 0$, and $\beta_{\mathrm{intersection}} \in [\beta_1,\beta_2]$, then $\beta = [\beta_{\mathrm{intersection}}, \beta_2]$.
    c) If slope$_1 = 0$, slope$_2 > 0$, $\beta_{\mathrm{intersection}} < \beta_1$ or $\beta_{\mathrm{intersection}} > \beta_2$, and $\mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]] > \mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]]$, then $\beta = [\beta_1,\beta_2]$.
    d) If slope$_1 = 0$, slope$_2 > 0$, $\beta_{\mathrm{intersection}} < \beta_1$ or $\beta_{\mathrm{intersection}} > \beta_2$, and $\mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]] > \mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]]$, then $\beta = \beta_1$.
    e) If slope$_1 = 0$, slope$_2 < 0$, $\beta_{\mathrm{intersection}} < \beta_1$ or $\beta_{\mathrm{intersection}} > \beta_2$, and $\mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]] > \mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]]$, then $\beta = [\beta_1,\beta_2]$.
    f) If slope$_1 = 0$, slope$_2 < 0$, $\beta_{\mathrm{intersection}} < \beta_1$ or $\beta_{\mathrm{intersection}} > \beta_2$, and $\mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]] < \mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]]$, then $\beta = \beta_2$.
  Case 2: slope$_2 = 0$, slope$_1 \ne 0$.
    a) If slope$_1 > 0$, slope$_2 = 0$, and $\beta_{\mathrm{intersection}} \in [\beta_1,\beta_2]$, then $\beta = [\beta_1, \beta_{\mathrm{intersection}}]$.
    b) If slope$_1 < 0$, slope$_2 = 0$, and $\beta_{\mathrm{intersection}} \in [\beta_1,\beta_2]$, then $\beta = [\beta_{\mathrm{intersection}}, \beta_2]$.
    c) If slope$_2 = 0$, slope$_1 > 0$, $\beta_{\mathrm{intersection}} < \beta_1$ or $\beta_{\mathrm{intersection}} > \beta_2$, and $\mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]] > \mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]]$, then $\beta = \beta_1$.
    d) If slope$_2 = 0$, slope$_1 > 0$, $\beta_{\mathrm{intersection}} < \beta_1$ or $\beta_{\mathrm{intersection}} > \beta_2$, and $\mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]] > \mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]]$, then $\beta = [\beta_1,\beta_2]$.
    e) If slope$_2 = 0$, slope$_1 < 0$, $\beta_{\mathrm{intersection}} < \beta_1$ or $\beta_{\mathrm{intersection}} > \beta_2$, and $\mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]] > \mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]]$, then $\beta = \beta_2$.
    f) If slope$_2 = 0$, slope$_1 < 0$, $\beta_{\mathrm{intersection}} < \beta_1$ or $\beta_{\mathrm{intersection}} > \beta_2$, and $\mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]] < \mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]]$, then $\beta = [\beta_1,\beta_2]$.
  Case 3: slope$_1 = 0$ and slope$_2 = 0$.
    a) If $\mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]] < \mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]]$, $\beta = [\beta_1,\beta_2]$.
    b) If $\mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]] > \mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]]$, $\beta = [\beta_1,\beta_2]$.
    c) If $\mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]] = \mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]]$, $\beta = [\beta_1,\beta_2]$.

Proof. 1) If slope$_1 > 0$ and slope$_2 > 0$, $\mathrm{Var}_H[Y|1]$ and $\mathrm{Var}_H[Y|0]$ monotonically increase for $\beta \in [\beta_1,\beta_2]$, so $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x]$ is at $\beta = \beta_1$.
2) Similar to 1), we have that $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x]$ is at $\beta = \beta_2 = 1$.
3) If $\beta_1 < \beta_{\mathrm{intersection}} < \beta_2$, we have:
  • If slope$_1 > 0$ and slope$_2 < 0$, $\mathrm{Var}[Y|1]$ monotonically increases and $\mathrm{Var}[Y|0]$ monotonically decreases, so when $\beta \in [\beta_1, \beta_{\mathrm{intersection}}]$, $\max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|0]$, and when $\beta \in [\beta_{\mathrm{intersection}}, \beta_2]$, $\max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|1]$. Therefore, $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|1] = \mathrm{Var}_H[Y|0]$ at $\beta = \beta_{\mathrm{intersection}}$.
  • If slope$_1 < 0$ and slope$_2 > 0$, $\mathrm{Var}[Y|1]$ monotonically decreases and $\mathrm{Var}[Y|0]$ monotonically increases, so when $\beta \in [\beta_1, \beta_{\mathrm{intersection}}]$, $\max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|1]$, and when $\beta \in [\beta_{\mathrm{intersection}}, \beta_2]$, $\max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|0]$. Therefore, $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|1] = \mathrm{Var}_H[Y|0]$ at $\beta = \beta_{\mathrm{intersection}}$.
4) If $\beta_{\mathrm{intersection}} < \beta_1$ or $\beta_{\mathrm{intersection}} > \beta_2$, we have:
  • If slope$_1 > 0$, slope$_2 < 0$, and $\mathrm{Var}_H[Y|1] > \mathrm{Var}_H[Y|0]$: since $\mathrm{Var}_H[Y|1]$ monotonically increases in the domain, $\min_\beta \mathrm{Var}_H[Y|1]$ is at $\beta = \beta_1$, so $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x]$ is at $\beta = \beta_1$.
  • If slope$_1 > 0$, slope$_2 < 0$, and $\mathrm{Var}_H[Y|1] < \mathrm{Var}_H[Y|0]$: since $\mathrm{Var}_H[Y|0]$ monotonically decreases in the domain, $\min_\beta \mathrm{Var}_H[Y|0]$ is at $\beta = \beta_2$, so $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x]$ is at $\beta = \beta_2$.
  • If slope$_1 < 0$, slope$_2 > 0$, and $\mathrm{Var}_H[Y|1] > \mathrm{Var}_H[Y|0]$: then $\max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|1]$, and since $\mathrm{Var}_H[Y|1]$ monotonically decreases in the domain, $\min_\beta \mathrm{Var}_H[Y|1]$ is at $\beta = \beta_2$, so $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x]$ is at $\beta = \beta_2$.
  • If slope$_1 < 0$, slope$_2 > 0$, and $\mathrm{Var}_H[Y|1] < \mathrm{Var}_H[Y|0]$: then $\max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|0]$, and since $\mathrm{Var}_H[Y|0]$ monotonically increases in the domain, $\min_\beta \mathrm{Var}_H[Y|0]$ is at $\beta = \beta_1$, so $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x]$ is at $\beta = \beta_1$.

5) If slope$_1 \cdot \mathrm{slope}_2 = 0$:
  • Case 1:
    a) If slope$_1 = 0$, slope$_2 > 0$, and $\beta_{\mathrm{intersection}} \in [\beta_1,\beta_2]$: for $\beta \in [\beta_1, \beta_{\mathrm{intersection}}]$, $\mathrm{Var}_H[Y|1] > \mathrm{Var}_H[Y|0]$; for $\beta \in (\beta_{\mathrm{intersection}}, \beta_2]$, $\mathrm{Var}_H[Y|0] > \mathrm{Var}_H[Y|1]$. Since $\mathrm{Var}_H[Y|0]$ monotonically increases on $(\beta_{\mathrm{intersection}}, \beta_2]$, $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|1, \beta = [\beta_1, \beta_{\mathrm{intersection}}]]$.
    b) If slope$_1 = 0$, slope$_2 < 0$, and $\beta_{\mathrm{intersection}} \in [\beta_1,\beta_2]$: for $\beta \in [\beta_1, \beta_{\mathrm{intersection}}]$, $\mathrm{Var}_H[Y|0] > \mathrm{Var}_H[Y|1]$; for $\beta \in (\beta_{\mathrm{intersection}}, \beta_2]$, $\mathrm{Var}_H[Y|1] > \mathrm{Var}_H[Y|0]$. Since $\mathrm{Var}_H[Y|1]$ does not change and $\mathrm{Var}_H[Y|0]$ monotonically decreases, $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|0, \beta = [\beta_{\mathrm{intersection}}, \beta_2]]$.
    c) If slope$_1 = 0$, slope$_2 > 0$, and $\mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]] > \mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]]$, then $\beta_{\mathrm{intersection}} > \beta_2$. Since $\mathrm{Var}_H[Y|1]$ does not change for $\beta \in [\beta_1,\beta_2]$, $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|1, \beta = [\beta_1,\beta_2]]$.
    d) If slope$_1 = 0$, slope$_2 > 0$, and $\mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]] > \mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]]$, then $\beta_{\mathrm{intersection}} < \beta_1$. Since $\mathrm{Var}_H[Y|0]$ monotonically increases for $\beta \in [\beta_1,\beta_2]$, $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|0, \beta = \beta_1]$.
    e) If slope$_1 = 0$, slope$_2 < 0$, and $\mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]] > \mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]]$, then $\beta_{\mathrm{intersection}} < \beta_1$. Since $\mathrm{Var}_H[Y|1]$ does not change for $\beta \in [\beta_1,\beta_2]$, $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|1, \beta = [\beta_1,\beta_2]]$.
    f) If slope$_1 = 0$, slope$_2 < 0$, and $\mathrm{Var}_H[Y|0,\beta=[\beta_1,\beta_2]] > \mathrm{Var}_H[Y|1,\beta=[\beta_1,\beta_2]]$, then $\beta_{\mathrm{intersection}} > \beta_2$. Since $\mathrm{Var}_H[Y|0]$ monotonically decreases for $\beta \in [\beta_1,\beta_2]$, $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \mathrm{Var}_H[Y|0, \beta = \beta_2]$.
  • Case 2: The proof is similar to Case 1.
  • Case 3: $\mathrm{Var}_H[Y|0]$ and $\mathrm{Var}_H[Y|1]$ are unchanged when $\beta \in [\beta_1,\beta_2]$. Hence, $\min_\beta \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x] = \max_{x\in[-1,1]} \mathrm{Var}_H[Y|x, \beta = [\beta_1,\beta_2]]$. ∎
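The case analysis above reduces to minimizing the pointwise maximum of two linear functions of $\beta$ over $[\beta_1, \beta_2]$. The brute-force sketch below (our illustration, with hypothetical slopes and intercepts) checks any particular instance against the stated rules:

```python
import numpy as np

def minmax_beta(slope1, inter1, slope2, inter2, beta1, beta2):
    """Brute-force min over beta of max(Var[Y|1], Var[Y|0]), where
    Var[Y|1] = slope1*beta + inter1 and Var[Y|0] = slope2*beta + inter2."""
    betas = np.linspace(beta1, beta2, 100_001)
    worst = np.maximum(slope1 * betas + inter1, slope2 * betas + inter2)
    i = np.argmin(worst)
    return betas[i], worst[i]

# Combination 3): slope1 > 0, slope2 < 0, crossing inside [beta1, beta2];
# the minimizer should be beta_intersection = (inter2 - inter1)/(slope1 - slope2).
print(minmax_beta(2.0, 0.1, -1.0, 1.0, 0.0, 1.0))  # beta ~ 0.3, value ~ 0.7
```

For each of the combinations 1) to 5), the grid minimizer agrees with the $\beta$ prescribed by the corresponding rule, which is exactly what the proof establishes analytically.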
