
Privacy Leakage of Real-World Vertical Federated Learning

Haiqin Weng∗, Juntao Zhang∗, Feng Xue∗, Tao Wei∗, Shouling Ji†, Zhiyuan Zong∗
∗Ant Group  {haiqin.wenghaiqin, tilin.zjt, henry.xuef, lenx.wei, david.zzy}@antgroup.com
†Zhejiang University  [email protected]

arXiv:2011.09290v2 [cs.CR] 8 Apr 2021

Abstract—Federated learning enables mutually distrusting participants to collaboratively learn a distributed machine learning model without revealing anything but the model's output. Generic federated learning has been studied extensively, and several learning protocols, as well as open-source frameworks, have been developed. Yet, their pursuit of computing efficiency and fast implementation might diminish the security and privacy guarantees of participants' training data, about which little is known thus far.

In this paper, we consider an honest-but-curious adversary who participates in training a distributed machine learning model, does not deviate from the defined learning protocol, but attempts to infer private training data from the legitimately received information. In this setting, we design and implement two practical attacks, the reverse sum attack and the reverse multiplication attack, neither of which affects the accuracy of the learned model. By empirically studying the privacy leakage of two learning protocols, we show that our attacks are (i) effective: the adversary successfully steals private training data, even when the intermediate outputs are encrypted to protect data privacy; (ii) evasive: the adversary's malicious behavior does not deviate from the protocol specification and does not deteriorate the accuracy of the target model; and (iii) easy: the adversary needs little prior knowledge about the data distribution of the target participant. We also experimentally show that the leaked information is as effective as the raw training data by training an alternative classifier on the leaked information. We further discuss potential countermeasures and their challenges, which we hope may lead to several promising research directions.

Index Terms—Federated machine learning; Privacy leakage.

*Work in progress.

I. INTRODUCTION

Federated learning provides a promising way for different participants to train models on their private data without revealing any information beyond the outcome [30], [50], [22], [3]. The main goal of federated learning is to protect the privacy of the participants' training data while effectively and efficiently learning a joint machine learning model from that data. Compelling applications include using data from multiple hospitals to train predictive models for patient survival [27] and using data from users' devices to train next-word prediction models [24].

To support such practical applications, recent years have witnessed a great number of delicate open-source frameworks for training distributed models in federated learning. For example, many industry leaders have developed and open-sourced frameworks for secure and private machine learning, including PySyft [36], FATE [16], TF Federated [17], TF Encrypted [13] and CrypTen [10]. These are all designed to enable the fast development of privacy-preserving models by non-experts, who want to choose an efficient protocol for training distributed machine learning models. With the advance of open-source frameworks, federated learning has become a mainstream choice when different data holders want to train a distributed model on their private data collaboratively.

Vertical Federated Learning. Based on the characteristics of data distribution, general federated learning can be categorized into horizontal federated learning [2], [3], where the distributed datasets share the same feature space but different samples, and vertical federated learning [50], [22], where the distributed datasets share the same samples but differ in feature space. Vertical federated learning can also be viewed as privacy-preserving machine learning, which is tightly related to multi-party computation. Compared with the horizontal setting, vertical federated learning is more complicated, as it needs a specific learning protocol designed for each machine learning algorithm. In our work, we mainly focus on vertical federated learning since it has a wider scope of applications and is more suitable for joint model training between large enterprises. For example, consider two companies in the same city, one a bank and the other an e-commerce company. They share a large percentage of their user sets but differ in user attributes. In this case, vertical federated learning is the right means for these two companies to collaboratively train a distributed model such as a risk management model [50].

Nevertheless, many learning protocols of vertical federated learning sacrifice data security and user privacy for computing efficiency. We observe that such a sacrifice makes practical joint model training vulnerable to an adversary, resulting in the leakage of private data. Take the privacy-preserving version of logistic regression [22], [50] as an example: it employs a third-party coordinator to coordinate the model training procedure as well as to accelerate training. In this learning protocol, the third-party coordinator holds an RSA private key and sends the public key to the computing participants for encrypting their intermediate outputs. The coordinator then combines the encrypted outputs, uses its private key to decrypt them, and sends the decrypted result back to the computing participants. Obviously, such a training procedure leaks far too much private information to the coordinator. Even worse, in practice we can hardly expect an honest third-party coordinator that never deviates from the protocol specification and never gathers private information from the limited outputs. On the contrary, there will probably exist a malicious adversary who mildly controls part of the distributed training process to infer participants' private training data from the limited intermediate outputs.

Our Work. In this paper, we provide the first systematic study on the potential privacy risks of practical learning protocols in vertical federated learning. Specifically, our work seeks to answer the crucial research question: How large are the privacy risks of practical learning protocols for computing participants whose data is used as part of the training set? In other words, how much does the learning protocol leak about the participants' training data?

Although previous works have shown the possibility of inferring private data in the scenario of privacy-preserving machine learning [28], [32], [23], [48], they either do not focus on the vertical setting of federated learning or only perform membership attacks against machine learning models. Recent years have witnessed many new technical breakthroughs in the learning protocols and open-source frameworks of vertical federated learning, e.g., SecureBoost [8], FATE [16] and PySyft [36], which differ in training process and architecture. As stated before, those practical learning protocols have potential risks of privacy leakage due to their sacrifice for computing efficiency. Compared with the existing works [28], [32], [23], [48] in the horizontal setting, measuring the privacy leakage of practical learning protocols in the vertical setting faces two major challenges: the encrypted intermediate outputs and the heterogeneous participants' training data. On the one hand, the encrypted intermediate outputs make it impossible for the participants to directly infer private data through gradient updates. That is, any learning-based attacks [55], [28], [18] cannot work in this case. On the other hand, since participants' training data are heterogeneous, none of them can infer private data through malicious queries to the other.

Methodology & Contributions. To address the above challenges, we present a comprehensive framework for the privacy analysis of learning protocols in vertical federated learning by attacking two widely used learning algorithms: logistic regression [19] and XGBoost [7]. These two algorithms have been deployed in various applications by different organizations, including advertisement in e-commerce companies [54], disease detection in hospitals [6], and fraud detection in financial companies [53]. In our work, we consider the privacy leakage of these two major algorithms, which are frequently used by large organizations for joint model training in the vertical setting.

We propose two simple yet effective attacks, the reverse multiplication attack and the reverse sum attack, to measure the training data leakage caused by numerical computations. In the reverse multiplication attack, the adversary attempts to steal the raw training data of the target participant by leveraging the intermediate multiplication results when two participants jointly train a logistic regression model. In the reverse sum attack, the adversary tries to probe the partial orders of the target participant's training data by inserting magic numbers into the exchanged data when the two participants jointly train an XGBoost model. Though not completely equivalent to the raw data, these stolen partial orders can further be used to train an alternative model that is as effective as one trained on the raw data.

Extensive experimental evaluations against two popular learning protocols implemented on the real-world framework FATE show that the adversary can precisely infer various levels of the target participant's private data. Specifically, to show the effectiveness of our reverse multiplication attack, we evaluate the privacy of participants' training data when they collaboratively train a logistic regression model on vertical partitions of the Credit dataset. Our evaluation results show that the learning protocol of logistic regression, which is claimed to be privacy-preserving, does not protect the data privacy of the honest participant: the adversary achieves 100% accuracy in stealing the training data. This shows that even well-designed learning protocols leak a significant amount of information about participants' training data and are vulnerable to the reverse multiplication attack.

Our experimental evaluation also shows that a malicious participant can perform an accurate reverse sum attack against other participants. When collaboratively training a single boosting tree, the active participant can successfully steal the partial orders of ∼3,200 samples with 100% accuracy from the target participant. We experimentally demonstrate that the leaked information can be used to build an alternative classifier that is as accurate as the one trained on the original data, showing the potential commercial value of the leakage.

In summary, we make the following contributions:
• We discover the potential privacy risks in practical learning protocols of vertical federated learning by showing that a semi-honest adversary with access to the learning process can infer private information about the target participant's data.
• We design two simple yet effective attacks for measuring data leakage in vertical federated learning and implement them on practical privacy-preserving frameworks.
• We present the first systematic measurement study on the privacy risks of two commonly used learning protocols in vertical federated learning, logistic regression and SecureBoost (i.e., the privacy-preserving version of XGBoost), with an in-depth analysis of the factors that influence the privacy.

This paper is organized as follows. In §II, we introduce the background knowledge of our work, including the formal definition of federated learning, the commonly used learning protocols, and the real-world frameworks for privacy-preserving machine learning. In §III, we formalize the problem and our threat model. In §IV, we describe the reverse multiplication attack together with our implementation and experimental evaluation. In §V, we describe the reverse sum attack together with our implementation and experimental evaluation. We present possible countermeasures in §VI, review the related work in §VII, give our discussion in §??, and finally conclude this paper in §VIII.

II. PRELIMINARIES

In this section, we first introduce the basic definitions of vertical and horizontal federated learning. Then, we briefly review the two commonly used machine learning algorithms, logistic regression [19] and XGBoost [7], followed by their privacy-preserving versions in vertical federated learning. In the end, we review several popular real-world frameworks for vertical federated learning, including TF Federated, FATE, PySyft, TF Encrypted and CrypTen.

A. Federated Learning

a) Federated Learning: Federated learning aims to build machine learning models on datasets that are held by multiple participants without revealing anything but the model's output. Based on the characteristics of data distribution, general federated learning can be categorized into horizontal federated learning [2], [3] and vertical federated learning [50], [22]. In horizontal federated learning, the distributed datasets share the same feature space but different samples. In contrast, in vertical federated learning, the distributed datasets share the same samples but differ in feature space. Figure 1 visualizes the data distribution of horizontal and vertical federated learning. Our work mainly evaluates the privacy leakage of learning a distributed machine learning model in vertical federated learning.

Fig. 1 (figure omitted): Types of Federated Learning. (a) shows the data distribution of horizontal federated learning, where the datasets from A and B share the same feature space but different samples; (b) shows the data distribution of vertical federated learning, where the datasets share the same sample IDs but different features.

b) Notations: We give the mathematical notations that are frequently used across this paper. For simplicity, we assume two participants, A and B, are collaboratively training a distributed machine learning model in the vertical setting. Let X ∈ R^{n×d} denote the complete dataset containing n samples, Y ∈ R^{n×1} denote the label set, and I ∈ R^{n×1} denote the sample IDs. In vertical federated learning, this dataset does not exist in one place but is composed of the columns of the datasets held by A and B, giving the vertical partition X = [X^A | X^B]. The label set Y is held by A. We also call A the active participant, as it holds the vital information of class labels, and we call B the passive participant, as it only holds data features.

Now, let x_i be the i-th row of X. Define x_i^A as the partial data of x_i held by A, and x_i^B as the partial data of x_i held by B. In other words, x_i is decomposed as x_i = [x_i^A | x_i^B].

c) Security Primitives: Next, we review the primary privacy technique adopted by vertical federated learning for providing privacy guarantees. Vertical federated learning usually utilizes additively homomorphic encryption such as Paillier [33] to encrypt the intermediate outputs. Additively homomorphic encryption is a public-key system that allows participants to encrypt their data with a known public key and perform computation on data encrypted by others under the same public key. For extracting the plaintext, the encrypted result needs to be sent to the private key holder for decryption. Let the encryption of a number u be [[u]]. For any plaintexts u and v, we have the following addition operation,

    [[u + v]] = [[u]] + [[v]],   (1)

and we can also multiply a ciphertext by a plaintext through repeated addition,

    [[v \cdot u]] = v \cdot [[u]],   (2)

where v is not encrypted. These operations can be extended to work with vectors and matrices component-wise. For example, we denote the inner product of a plaintext vector v with an encrypted vector [[u]] by v^T \cdot [[u]] = [[v^T \cdot u]].

B. Learning Protocols

Logistic regression [19] and XGBoost [7] are two highly effective and widely used machine learning methods. Many winning teams of machine learning competitions such as Kaggle¹ challenges use a combination of logistic regression and XGBoost to build their best-performing classifiers [40]. Similarly, these two algorithms are frequently used by large organizations for joint model training. For example, a financial institution and an e-commerce company may use linear regression to collaboratively train a scorecard model on their private training data. Below, we briefly introduce the main ideas of logistic regression and XGBoost, followed by their privacy-preserving versions in vertical federated learning.

¹ https://www.kaggle.com/competitions

a) Logistic Regression: Given n training samples x_i with d features and n corresponding output labels y_i, linear regression is a statistical process to learn a function f such that f(x_i) = y_i. f is assumed to be linear and can be represented as the inner product of x_i with the coefficient vector θ: f(x_i) = \sum_{j=1}^{d} \theta^j x_i^j = \theta x_i. To learn the coefficient θ, we minimize the loss between f(x_i) and y_i on all the training data, defined as

    L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2} (\theta x_i - y_i)^2.   (3)

In centralized learning, SGD [37] is an effective approximation algorithm for approaching a local minimum of this function step by step.

b) Loss decomposition: Different from centralized learning, each x_i in vertical federated learning is partitioned as [x_i^A | x_i^B], and the coefficient θ is also partially held by the two participants as [θ^A | θ^B]. The loss function of Eq. (3) is then simplified and decomposed as

    L(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} \Big( \frac{1}{4}\theta^A x_i^A + \frac{1}{4}\theta^B x_i^B - \frac{1}{2} y_i \Big) \cdot [x_i^A | x_i^B]   (4)
              = \frac{1}{n} \Big( \frac{1}{4}\theta^A X^A + \frac{1}{4}\theta^B X^B - \frac{1}{2} Y \Big) \cdot [X^A | X^B].   (5)

Participants A and B jointly compute the decomposed loss function of Eq. (5) using their private training data. Then, they exchange the intermediate outputs, with or without a third-party coordinator, to form the final result; the protocols in [22], [50] do require a third-party coordinator. In the following, we show the detailed steps for computing the above equation in vertical federated learning.
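To make the decomposition in Eqs. (4)-(5) concrete, the following NumPy sketch computes, in plaintext, the local term each party can form on a mini-batch and the per-party gradient blocks that the protocol below will later exchange in encrypted form. The variable names and the toy data are illustrative assumptions, not taken from any framework's implementation.

```python
# A plaintext NumPy sketch of Eqs. (4)-(5): each party computes its local term
# (1/4) * X @ theta, A additionally subtracts (1/2) * y, and the shared residual
# is multiplied back onto each party's features to obtain the gradient blocks.
import numpy as np

rng = np.random.default_rng(0)
n, d_A, d_B = 8, 3, 2                      # mini-batch size and per-party feature counts
X_A, X_B = rng.normal(size=(n, d_A)), rng.normal(size=(n, d_B))
y = rng.integers(0, 2, size=n).astype(float)
theta_A, theta_B = np.zeros(d_A), np.zeros(d_B)

u = 0.25 * X_A @ theta_A - 0.5 * y         # A's local term of Eq. (4)
v = u + 0.25 * X_B @ theta_B               # B adds its local term; v is the shared residual
grad_A = v @ X_A / n                       # gradient block for theta_A
grad_B = v @ X_B / n                       # gradient block for theta_B (what the attack in §IV targets)
```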

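The secure protocol below exchanges these quantities as Paillier ciphertexts instead of plaintext. A minimal sketch of the additive operations in Eqs. (1)-(2), using the third-party python-paillier (`phe`) package as a stand-in for the Paillier implementations bundled with frameworks such as FATE, looks as follows; the key size and values are arbitrary.

```python
# Additively homomorphic operations of Eqs. (1)-(2) with python-paillier.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

u, v = 3.5, -1.25
enc_u = public_key.encrypt(u)

# Eq. (1): addition of two ciphertexts (the second operand is encrypted on the fly).
enc_sum = enc_u + public_key.encrypt(v)
assert abs(private_key.decrypt(enc_sum) - (u + v)) < 1e-9

# Eq. (2): multiplying a ciphertext by a plaintext scalar.
enc_scaled = v * enc_u
assert abs(private_key.decrypt(enc_scaled) - v * u) < 1e-9
```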
c) Secure logistic regression: To compute Eq. (5) without revealing anything about data privacy, most secure protocols adopt additively homomorphic encryption such as Paillier to encrypt the intermediate outputs and operate on ciphertext [30], [50], [22]. The encrypted distributed loss of Eq. (5) is

    [[L(\theta)]] \approx \frac{1}{n} \Big( [[\tfrac{1}{4}\theta^A X^A]] + [[\tfrac{1}{4}\theta^B X^B]] - [[\tfrac{1}{2} Y]] \Big) \cdot [X^A | X^B].   (6)

In vertical federated learning, we also employ the SGD algorithm for gradually approaching a local minimum of the distributed loss. Table I gives the detailed training steps of logistic regression in the vertical federated setting. With SGD, the secure computing protocol of logistic regression works as follows.

TABLE I: Detailed steps for logistic regression in vertical federated learning.

Step | Participant A                                              | Participant B                                                           | Coordinator C
1    | init θ^A                                                   | init θ^B                                                                | send the public key to A and B
2    | compute [[u]] = [[(1/4)θ^A X_S^A − (1/2)Y_S]] and send to B | a) compute [[v]] = [[(1/4)θ^B X_S^B]] + [[u]] and send to A; b) compute [[v]]X_S^B and send to C |
3    | compute [[v]]X_S^A and send to C                           |                                                                         |
4    |                                                            |                                                                         | decrypt [[v]]X_S^A and [[v]]X_S^B to get the gradients g^A, g^B, and send them to A and B

First, participants A and B initialize θ^A and θ^B as vectors of random values or all zeros. Coordinator C creates an encryption key pair and sends the public key to A and B. Second, in each iteration, participant A randomly selects a small mini-batch of indices S, and the distributed coefficient is updated by averaging the partial derivatives of all samples in the mini-batch with respect to the current coefficient. Steps 2-4 show how A and B collaboratively compute the gradient of the encrypted distributed loss function and update the distributed coefficient. In general, using an additively homomorphic encryption scheme, participants A and B compute on the encrypted intermediate outputs, generating an encrypted derivative which, when decrypted, matches the derivative as if it had been calculated on the plaintext. Refer to [30], [50], [22] for more details about secure logistic regression.

d) XGBoost: Given n training samples x_i with d features, XGBoost learns a function consisting of k regression trees such that f(x_i) = \sum_{t=1}^{k} f_t(x_i) [7]. To learn such k regression trees, XGBoost adds a tree f_t at the t-th iteration to minimize the following loss,

    L^{(t)} \approx \sum_{i=1}^{n} \big[ l(y_i, \hat{y}^{(t-1)}) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \big] + \Omega(f_t),   (7)

where g_i and h_i are the first- and second-order gradient statistics of the loss function calculated with respect to f_{t-1}, and Ω penalizes the complexity of the model.

For constructing the t-th regression tree, XGBoost starts from a leaf node and iteratively adds optimal branches to the tree until reaching the terminating condition. Below is the loss reduction after a split,

    L_{split} = \frac{1}{2} \Big[ \frac{(\sum_{i \in I_L} g_i)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{(\sum_{i \in I_R} g_i)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{(\sum_{i \in I} g_i)^2}{\sum_{i \in I} h_i + \lambda} \Big] - \gamma,   (8)

where I_L and I_R are the instance spaces of the left and right nodes after the split. In practice, XGBoost chooses the optimal split that maximizes the above loss reduction. That is, in each iteration, XGBoost enumerates all possible split features and values and finds the one that maximizes the loss reduction. Refer to [7] for more details about XGBoost.

e) Secure XGBoost: Secure XGBoost [8], also known as SecureBoost, is the privacy-preserving tree-boosting algorithm in vertical federated learning. SecureBoost allows the learning process of k regression trees to be jointly conducted over two participants with common user samples but different features. To learn a privacy-preserving regression tree, SecureBoost also starts from a single leaf node and coordinates the two computing participants to find the optimal split.

Different from logistic regression, SecureBoost does not rely on a third-party coordinator to coordinate the learning process. In SecureBoost, the participant who holds the label information serves as the coordinator. We also call the participant who holds the labels the active participant and the other the passive participant.

In SecureBoost, the active and passive participants collaboratively calculate the loss reduction of Eq. (8) and search for the best split. In each iteration, they search for the best split and construct a regression tree as follows. First, the active participant calculates the first- and second-order gradients for all the training samples, and then encrypts and sends them to the passive participant. Second, the passive participant receives the encrypted gradients and calculates the loss reduction for all possible split features and values. For efficiency and privacy, instead of computing on [[g_i]] and [[h_i]] directly, the passive participant maps the features into data bins and then aggregates the encrypted gradient statistics based on the bins. Third, the active participant receives and decrypts the encrypted loss reductions, and then finds the optimal split dimension and feature with the maximum reduction. Refer to [8] for more details about SecureBoost.

C. Real-world Secure Computing Frameworks for Vertical Federated Learning

TABLE II: Popular Secure Computing Frameworks. HE = homomorphic encryption and SS = secret sharing. (Vertical/Horizontal support is reconstructed from the discussion below.)

Framework    | Developer | Vertical | Horizontal | Arch
FATE         | WeBank    | yes      | yes        | HE
PySyft       | OpenMined | yes      | yes        | HE, SS
TF Federated | Google    | no       | yes        | SS
TF Encrypted | Dropout   | yes      | yes        | HE, SS
CrypTen      | Facebook  | yes      | yes        | HE, SS

As discussed, many industry leaders develop and open-source frameworks for secure and private machine learning, including PySyft [36], FATE [16], TF Federated [17], TF Encrypted [13] and CrypTen [10]. All of these frameworks directly or indirectly provide machine learning protocols in the vertical federated learning setting. For example, FATE supports vertical federated learning of various machine learning algorithms, including logistic regression, deep learning and tree-based learning; PySyft provides basic modules (e.g., communication and encrypted computation modules) for the fast implementation of vertical federated learning. Table II shows the functionalities and basic encryption techniques supported by these frameworks.

Of the five evaluated frameworks, FATE, PySyft, TF Encrypted and CrypTen support participants in training a joint model when their training data is vertically or horizontally aligned. The remaining one, TF Federated, only supports training a model when the data are horizontally aligned. For the encryption techniques, these five frameworks mainly utilize two mainstream methods, homomorphic encryption [14] and secret sharing [39].

In this paper, we measure the data leakage of learning protocols in vertical federated learning using two proposed attacks, the reverse multiplication and reverse sum attacks. Our attacks promise to be effective against any implementation of the learning protocols provided by the five frameworks, since (1) the framework only provides basic functionality to implement the learning protocol and (2) the framework cannot change the computation pipeline of the learning protocol.

III. GENERAL PRIVACY RISKS IN VERTICAL FEDERATED LEARNING

Although secure learning protocols and open-source frameworks make it possible for different organizations to collaboratively train a machine learning model, we find that those protocols' pursuit of learning efficiency can lead to privacy risks for participants' training data. By constructing two novel attacks, the reverse multiplication attack and the reverse sum attack, we show that there indeed exist vulnerabilities in vertical federated learning, and an adversary can steal the plaintext of participants' private data. In this section, we present the threat model and the privacy risks faced by vertical federated learning.

Figure 2 shows a typical federated learning scenario. In general, there essentially exist semi-honest adversaries who merely cooperate in gathering information out of the learning protocol, but do not deviate from the protocol specification. In the following, we describe the threat model in vertical federated learning.

Fig. 2 (figure omitted): A typical federated learning scenario. The complete dataset X is vertically split into X^A and X^B held by participants A and B respectively. A and B maintain partial models locally and exchange the encrypted partial gradients and other essential information to learn a joint model on the complete dataset X.

A. Threat Model

We investigate whether an adversarial participant can steal private information about another participant's training data, even when the learning protocol is privacy-preserving. As shown in Figure 2, participants A and B keep their private data locally and, according to a particular secure learning protocol, jointly train a model without revealing anything but the model's output. In some learning protocols such as logistic regression, A and B require a third-party coordinator to handle the encrypted partial gradients and update the global model. In contrast, in other protocols such as SecureBoost, the two participants can independently learn a joint model without any third parties. In both types of learning protocols, we assume that the adversarial participant is semi-honest: it does not deviate from the protocol specification, but tries to gather as much private information out of the learning protocol as possible.

a) Adversary: We assume that one of the participants in the vertical federated learning scenario, as shown in Figure 2, is controlled by the adversary. The adversary can send or receive the encrypted information corresponding to the statistics (e.g., the gradient or the multiplication result between gradient and data) of the other participant's training data. In general, the adversary maliciously controls the local training process. However, in our work, we assume a semi-honest adversary who does not deviate from the protocol specification, but merely operates to gather as much private information out of the learning protocol as possible. For example, in the reverse sum attack (see §V for more details), the adversary may construct specific strings and insert them into the least significant bits of gradients while keeping the learning protocol intact. In the reverse multiplication attack (see §IV for more details), the adversary may also corrupt the third-party coordinator to gather more private information out of the protocol.

b) Adversary's objectives: The adversary's main objective is to infer as much of the target participant's private training dataset as possible. (i) In the reverse multiplication attack, the adversary aims to infer the target participant's raw training data. (ii) In the reverse sum attack, the adversary attempts to infer the partial orders of the target participant's training data.

c) Assumption about the third-party coordinator: In our reverse multiplication attack, we assume that the adversary can corrupt the third-party coordinator. This assumption is reasonable since finding a trusted third-party coordinator is almost impossible. In practice, for convenience, the third-party coordinator is usually deployed and maintained by one of the participants. For example, a financial institution that hopes to cooperate with an e-commerce company for jointly training a model will probably deploy the third-party coordinator locally and jointly train the model with the e-commerce company. In such a case, it is easy for the adversarial participant to compromise the coordinator. Even if the coordinator is protected by TEEs (Trusted Execution Environments) [42] such as SGX [38], it can still be compromised by various state-of-the-art attacks; e.g., SGX has been proven to be vulnerable to side-channel attacks [47].

Like the controlled participant, we assume that the semi-honest adversary does not break the third party's protocol specification. This semi-honest adversary only exploits the limited statistics that the corrupted third party receives, or other utilities held by this third party. Usually, in vertical federated learning, the third-party coordinator concatenates the encrypted intermediate outputs received from the two participants and forms the final gradient updates. Thus, the adversary can use such gradient updates to infer the target participant's training data. In §IV, we illustrate how the adversary reverse-engineers the multiplication terms of the training data by leveraging the third-party coordinator.

IV. REVERSE MULTIPLICATION ATTACK

In the reverse multiplication attack, the adversary's goal is to reverse-engineer each multiplication term of a matrix product. Matrix multiplication is a typical operator commonly used in many federated learning protocols, which may raise potential privacy leakage of participants' private training data. Thus, we focus on logistic regression in the vertical federated setting and propose a simple yet effective reverse multiplication attack against it for inferring the target participant's raw training data.

Below, we illustrate the attack pipeline of our proposed reverse multiplication attack. Then, we implement the reverse multiplication attack on the popular practical framework FATE and experimentally measure the privacy leakage raised by matrix multiplication through performing the reverse multiplication attack.

A. Attack Pipeline

As shown in Table I, logistic regression in vertical federated learning involves both computing participants and a third-party coordinator. In our work, we focus on the simple scenario where only two data holders, A and B, participate in the model training process.

Logistic regression relies on a series of multiplications between coefficients and participants' private training data to deduce the gradient, as listed in Eq. (6). Moreover, participants A and B are required to exchange the intermediate multiplication results for calculating the gradient. Though the multiplication result is encrypted under the public key and directly reveals no information about the training data, the adversary can still collude with the private key owner to obtain the multiplication result's plaintext. Further, leveraging linear algebra and scientific computation, the adversary can reverse-engineer each multiplication term (i.e., the raw training data) from the multiplication result.

We design a reverse multiplication attack against logistic regression. We assume the adversary fully controls participant A and coordinator C to steal the private features of the remaining participant B. In this attack, the adversary utilizes 1) the intermediate output that A receives from B and 2) the partial gradient that C sends back to B. The intermediate output is the linear product of B's partial coefficient and features. Regarding B's features as unknown parameters, performing the reverse multiplication attack amounts to solving a set of linear equations.

Below, we illustrate the detailed steps of the reverse multiplication attack. We consider that the data samples held by A and B are already aligned.

• Step 1. The adversary stealthily stores the intermediate encrypted multiplication result during the collaborative training on each mini-batch. To be specific, for each mini-batch S, the adversary stealthily stores (i) the multiplication result between B's partial data and coefficient, [[(1/4)θ_t^B X_S^B]], and (ii) the coefficient update of the gradient, g_t^B. Note that this encrypted product is sent by B for jointly calculating the encrypted gradient on the mini-batch.

• Step 2. The adversary decrypts [[(1/4)θ_t^B X_S^B]] calculated on each mini-batch. Note that A and B use the same public key for encryption and C uses the corresponding private key for decryption. Hence, the adversary can use C's private key to decrypt the ciphertext.

• Step 3. The adversary reverses the multiplication terms to obtain B's training data. For a specific mini-batch X_S^B, the adversary has the following equation relating two successive training epochs: (1/4)θ_t^B X_S^B − (1/4)θ_{t−1}^B X_S^B = g_t^B X_S^B, where only X_S^B is unknown. The adversary can compute the unknown X_S^B if he gathers sufficient equations. The adversary needs at least |B| equations to solve for the unknown parameters, where |B| denotes the number of features in B's private data.

The above steps illustrate how the adversary utilizes the intermediate output [[(1/4)θ_t^B X_S^B]] and the partial gradient g^B to steal B's private training data. In some real-world federated learning frameworks, the coordinator directly updates the coefficient and sends it back to the participants. In such a case, the adversary can solve for the unknown parameter X_S^B simply using several multiplication results θ_t^B X_S^B and the partial coefficients θ_t^B.

The success of the reverse multiplication attack depends on the rank² of the coefficient matrix of the equations. The adversary can successfully compute the private data if the coefficient matrix is of full rank, and cannot otherwise. Hence, in our experimental evaluation, we first explore the influence of the training parameters on the rank of the coefficient matrix. Then, we perform our reverse multiplication attack against logistic regression in a vertical federated learning setting.

² The rank of a matrix is defined as the maximum number of linearly independent column vectors or row vectors in the matrix.
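A minimal NumPy sketch of Step 3 is shown below, assuming the simpler variant just described in which the adversary observes the coefficients θ_t^B directly together with the decrypted products: recovering X_S^B is then an ordinary least-squares solve, and the full-rank condition is exactly what makes the solve well-posed. All names and dimensions are illustrative; this is not FATE's code.

```python
# Reverse multiplication attack, Step 3: solve a stacked linear system for X_S^B.
import numpy as np

rng = np.random.default_rng(1)
batch, d_B, T = 16, 10, 40                 # mini-batch size, B's feature count, observed rounds
X_B = rng.normal(size=(batch, d_B))        # B's private mini-batch (unknown to the adversary)

thetas = rng.normal(size=(T, d_B))         # coefficients observed round by round
products = thetas @ X_B.T                  # decrypted multiplication results, shape (T, batch)

# The attack succeeds only if the stacked coefficient matrix has full column rank.
assert np.linalg.matrix_rank(thetas) == d_B

X_B_recovered = np.linalg.lstsq(thetas, products, rcond=None)[0].T
print(np.allclose(X_B_recovered, X_B))     # True: B's raw features are reconstructed
```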

B. Experimental Setup

a) Datasets: We use three classical classification datasets in our experiments: Credit [9], Breast [4], and Vehicle [5].
• Credit contains the payment records of 30,000 customers of a bank, where 5,000 customers are malicious and the remaining are benign. Each record has a total of 23 integer or real-valued features. The Credit dataset is a popular benchmark used to evaluate binary classification tasks [51].
• Breast is also a binary classification dataset. It contains 699 medical diagnosis records of breast cancer, of which 357 are benign and 212 are malignant. Each record has 30 features extracted from a digitized image of a fine needle aspirate of a breast mass.
• Vehicle is a multi-class classification dataset extracted from example silhouettes of objects. It contains 946 examples belonging to four vehicle categories. Each example has a set of 18 features extracted from the silhouette of the original object.

b) Implementation of Logistic Regression in Vertical Federated Learning: We implement the learning protocol of logistic regression on two real-world secure and private learning frameworks, FATE and PySyft. Then, we perform the reverse multiplication attack against these two implementations. For the training scenario, we consider two participants A and B with a third-party coordinator C collaboratively training a logistic regression model. Below, we detail the two implementations of privacy-preserving logistic regression.
• FATE provides many default implementations of machine learning algorithms, including logistic regression and tree-based algorithms. In our experiment, we use FATE's default implementation of privacy-preserving logistic regression [50]. This implementation employs a third-party arbiter to coordinate the gradient computation and send the decrypted gradient back to the participants. For protecting data privacy, this implementation uses an RSA private key to encrypt the intermediate outputs exchanged between the two participants, which can only be decrypted by the public key owner.
• PySyft is a library for secure and private machine learning. Based on its communication and encrypted computation modules, we implement the learning protocol of privacy-preserving logistic regression proposed in [22], similar to FATE's default implementation. Besides [22], there exist many other similar learning protocols for privacy-preserving logistic regression [30], [50]. We choose to implement the protocol in [22] since it provides satisfying training accuracy and learning efficiency.

c) Attack Setting: We vertically partition each of the training datasets into two parts and distribute them to participants A and B. Table III shows the partition details. The labels of the three datasets are all placed with A. In other words, A is the active participant, and B is the passive participant.

TABLE III: Vertical partition of the used datasets.

Dataset | #examples | #features for A | #features for B
Credit  | 30,000    | 13              | 10
Breast  | 699       | 10              | 20
Vehicle | 946       | 9               | 9

We let the adversary control participant A and the third-party coordinator C. The adversary aims to steal B's private data using the limited information received from A and B. During the training process, the adversary stealthily stores the coefficients (or gradients) and the encrypted multiplication results. In this experiment, the adversary only gathers information out of the training process and does not disturb the training process.

C. Results & Analysis

a) Impact of Training Parameters: As stated in §IV-A, the success of the reverse multiplication attack depends on the rank of the coefficient matrix of the equations, which is influenced by the training parameters. To understand such influence, we perform our attack under different settings of two critical training parameters, the learning rate and the batch size.

Fig. 3 (figure omitted): Rank of the coefficient matrix under different settings of batch size.

Fig. 4 (figure omitted): Rank of the coefficient matrix under different settings of learning rate.

Figure 3 reports the coefficient matrix rank after 100 training epochs under different settings of batch size. From this figure, we can see that the batch size significantly affects the coefficient matrix's rank. A small batch size makes the coefficient matrix full rank, while a large batch size does not. In other words, with a small batch size, the adversary can accurately recover all the private data of the target participant.

Figure 4 reports the coefficient matrix rank after 100 training epochs under different settings of the learning rate. We can see from this figure that the learning rate may also affect the success of our reverse multiplication attack. Take the Credit dataset as an example: when the learning rate is set to 0.01, the parameter matrices' ranks are 9 for all the tested values of batch size. In other words, these matrices are not of full rank, and the adversary cannot use them to compute B's private training data. When the learning rate is set to 0.05, all the accumulated gradient matrices are of full rank, resulting in the success of the reverse multiplication attack.

V. REVERSE SUM ATTACK

In the reverse sum attack, the adversary attempts to reverse each addition term from a sum. The sum is a common statistical aggregate, widely used in many machine learning algorithms, including nearest-neighbor-based and tree-based algorithms [19]. We focus on XGBoost in vertical federated learning (i.e., SecureBoost) and propose a reverse sum attack against SecureBoost for stealing private information about the target participant's training data.

Below, we describe our design of the reverse sum attack and then report our major experimental evaluation of this attack.

A. Attack Pipeline

§II-B briefly outlines the learning protocol of SecureBoost. There exist two participants in SecureBoost: the active participant A and the passive participant B. As illustrated in §II-B, participant A receives the first- and second-order gradient sums of each bin's examples defined by participant B. Moreover, to compute the split loss jointly, these gradient sums are sent one by one according to the bin order. Obviously, it is possible for A to encode information about the training data in the gradients and further reverse the addition terms of the gradient sums received from B using the encoded information. These addition terms imply the partial orders of the target participant's private training data.

We design a reverse sum attack against SecureBoost. In our attack, we assume the adversary fully controls participant A. The two major steps of our attack are listed below.
• Step 1. The adversary controls A to encode magic numbers [26] into the ciphertext of the first- and second-order gradients. Here, the term magic number refers to distinctive unique values that are unlikely to be mistaken for other meanings (i.e., globally unique identifiers).
• Step 2. The adversary stores the first- and second-order gradient sums received from B. Then, the adversary reverse-engineers all the addition terms from the gradient sums leveraging the encoded magic numbers. These reversed terms reveal the partial orders of B's training data.

In the following, we detail the above steps.

a) Magic Number: As mentioned above, we encode magic numbers into the ciphertext of the first and second gradients. In practice, the public-key encryption ecosystem encrypts 1024-bit numbers. For the 64-bit floating-point number of the gradient, the ecosystem usually pads it with 960 zeros to get a 1024-bit number for encryption. The 960 zeros do not deteriorate the floating-point precision, as they are padded in the least significant positions. In our attack, instead of padding 960 zeros, we pad the 64-bit number with a 960-bit magic number.

A magic number is a unique identifier used by the attacker to identify the specific gradient value encoded with this magic number. For this purpose, we design a generation schema that can generate as many magic numbers as possible for identifying gradients of the training data. Figure 5 visualizes the architecture of our magic numbers.

Fig. 5 (figure omitted): Architecture of the magic number. The first few bits are random values, and the remaining bits are a group identifier that allows the adversary to locate the specific group the example belongs to.

As shown in Figure 5, the magic number has two parts: a group identifier and random values. The group identifier is a one-hot vector under a base-n positional numeral system and is used to identify the specific group of examples that the encoded example belongs to. For example, in a base-10 system, 0···01 implies that the example belongs to group 1. The random values are used to distinguish different examples from the same group. With the group identifier and random values, examples from the same group have the same group identifier but different random values.

Below, we give an example of generating group identifiers using a base-16 positional numeral system. For simplicity, we only utilize 20 of the 960 bits to encode the magic number. We set the first 4 bits as random value bits, and set the remaining 16 bits as the group identifier bits, identifying a maximum of 4 (= log_16 2^16) unique groups of examples. Given 60 data samples, we first divide them into 4 groups, each with 60/4 = 15 samples. The 1st to 15th samples are placed in the first group, the 16th to 30th samples in the second group, the 31st to 45th samples in the third group, and the 46th to 60th samples in the fourth group. For a data sample in the first group, we generate the first 4 bits as random values and set the group identifier in the remaining 16 bits to 0x0001. Similarly, for a data sample in the second group, the first 4 bits of the magic number are random values and the group identifier is 0x0010.

b) Gradient Encoding: Algorithm 1 shows how to encode magic numbers into the least significant bits of the first and second gradients.

Algorithm 1 Gradient Encoding
1: Input: the first-order gradients g_1, g_2, ..., g_n; the second-order gradients h_1, h_2, ..., h_n.
2: Output: the first and second gradients encoded with magic numbers.
3: k ← number of super groups
4: b ← base of the positional numeral system
5: l ← log_b 2^(960 − 30×k)        ▷ length of the group identifier
6: g ← 2 × k × l                   ▷ number of total groups
7: n′ ← g × b                      ▷ number of encoded samples
8: for i = 1; i ≤ n′; i++ do
9:    s ← round(i / l)
10:   id ← round((i mod l) / (b − 1))
11:   random_values ← set the s-th to (s + 29)-th bits to random values and the remaining of the first 30×k bits to 0
12:   group_identifier ← the one-hot vector under the base-b positional numeral system with the id-th digit set to 1
13:   magic_number ← concatenate(random_values, group_identifier)

First, we set the number of supergroups to k and the base of the positional numeral system to b. In this setting, the group identifier's maximum length is log_b 2^(960−30×k), as shown in Line 5. The maximum number of sample groups is 2×k×l, and the maximum number of encoded samples is g×b, as shown in Lines 6 and 7, respectively. Second, we sequentially select n′ examples to encode magic numbers into their first and second gradients; any other elaborate selection strategy, such as a distribution-based strategy, could also be used. Third, for each selected example, we construct a magic number and post-process its first or second gradient by setting its lower padding bits to the bit string of the magic number. Lines 9-13 describe the detailed steps of constructing a magic number, and Figure 5 shows the architecture of the magic number.

c) Gradient Sum Reversion: Leveraging the encoded magic numbers, the adversary can effectively reverse-engineer all the addition terms from the gradient sums. Let B denote a bin. Then, let G_B = Σ_{x_i∈B} g_i and H_B = Σ_{x_i∈B} h_i be the first- and second-order gradient sums of B's samples. During the collaborative training of each boosting tree, participant A continually receives G_B and H_B from participant B. The adversary's goal is to reverse all the addition terms of G_B and H_B, which can be further integrated to form the partial orders.

Given a G_B or H_B, we first decode it into a 1024-bit string and then extract the lower 960 bits, denoted as s. In each G_B or H_B, s equals the sum of the magic numbers of B's samples encoded by the adversary. Then, leveraging s, the adversary greedily searches for an optimal combination of samples that might be the addition terms of G_B or H_B. Below, we give an example to illustrate how the adversary reverse-engineers all the addition terms leveraging s.

In this example, as before, we utilize 20 of the 960 bits to encode the magic number, with the first 4 bits as random value bits and a base-16 one-hot group identifier in the remaining 16 bits. We assume a data bin B containing five samples, x_2, x_17, x_517, x_520, x_2400, where only x_2 and x_17 were selected to encode magic numbers. The magic number of x_2 is 0x30001 and the magic number of x_17 is 0x20010. In this example, the magic number sum of G_B is 0x50011 (= 0x30001 + 0x20010). Once the magic sum 0x50011 is extracted, the adversary knows that B must contain one sample from group 1 and one sample from group 2. With the additional information that the sum of the random values of the two magic numbers is 0x5, the adversary can easily figure out that x_2 and x_17 are in B.

B. Experimental Setup

a) Datasets: We conduct experiments on two public datasets: Credit and Student [41]. The Credit dataset is the same as that used in §IV-B. The Student dataset is a regression dataset containing the education records of 395 students collected from a mathematics class. The data attributes include student grades, demographic, social, and school-related features. This dataset is often used to train a regression model that predicts student performance in the final examination.

b) Implementation of SecureBoost in Vertical Federated Learning: We perform the reverse sum attack against the real-world implementation of SecureBoost provided by FATE. Similar to §IV-B, we use FATE's default implementation of the learning protocol of SecureBoost. For protecting data privacy, this implementation uses Paillier to encrypt the first- and second-order gradients and the intermediate outputs.

c) Attack Setting: We vertically partition each of the training datasets into two parts and distribute them to participants A and B. Table IV shows the partition details. A is the active participant, and the labels of the two datasets are all placed with A.

TABLE IV: Vertical partition of the used datasets.

Dataset | #samples | #features for A | #features for B
Credit  | 30,000   | 13              | 10
Student | 395      | 6               | 7

In this experiment, we let the adversary control A and aim to steal the partial orders of B's private training data. During the training process, the adversary encodes magic numbers into the least significant bits of the gradients and stores the gradient sums received from participant B. Although the adversary slightly modifies the gradients, this does not disturb the training process at all, since the least significant bits cannot affect the selection of the best split dimension and feature.
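Before turning to the results, here is a small sketch of the encoding step, following the worked base-16 example of §V-A rather than the full parameterization of Algorithm 1: 20 of the 960 padding bits carry the magic number, with 4 random bits above a one-hot hexadecimal group identifier. The fixed-point packing and the function names are assumptions made for illustration; in the real protocol the encoding happens inside the framework, before Paillier encryption.

```python
# Gradient encoding with 20-bit magic numbers (base-16 example from Section V-A).
import secrets

PAD_BITS = 960            # padding below the 64-bit gradient in the 1024-bit plaintext
ID_DIGITS = 4             # base-16 one-hot identifier -> 4 hex digits = 16 bits

def magic_number(group: int) -> int:
    """Random nibble (4 bits) followed by a one-hot base-16 group identifier."""
    assert 0 <= group < ID_DIGITS
    random_part = secrets.randbelow(16)          # 4 random bits
    identifier = 1 << (4 * group)                # one-hot hex digit: 0x0001, 0x0010, ...
    return (random_part << (4 * ID_DIGITS)) | identifier

def encode_gradient(gradient_fixed: int, group: int) -> int:
    """Place a 64-bit fixed-point gradient above a 960-bit pad holding the magic number."""
    return (gradient_fixed << PAD_BITS) | magic_number(group)

# Two encoded samples from groups 0 and 1: the low 960 bits of their sum add up
# exactly like the magic numbers in the paper's example (e.g. 0x30001 + 0x20010).
enc = [encode_gradient(g, grp) for g, grp in [(123, 0), (456, 1)]]
magic_sum = sum(enc) & ((1 << PAD_BITS) - 1)
print(hex(magic_sum))     # identifier digits reveal one sample from group 0 and one from group 1
```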

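A matching sketch of the reversion step: given the lower 960 bits of a received gradient sum and the adversary's record of which samples were encoded with which magic numbers, the identifier digits reveal how many encoded samples of each group fall into the bin, and the summed random nibbles pin down which ones. The brute-force search below stands in for the greedy search described in §V-A, group indices are 0-based, and all names are illustrative.

```python
# Gradient sum reversion for the 20-bit, base-16 magic numbers sketched above.
from itertools import combinations, product

ID_DIGITS = 4                                     # base-16 identifier digits

def split_magic_sum(magic_sum: int):
    identifier_part = magic_sum & ((1 << (4 * ID_DIGITS)) - 1)
    random_sum = magic_sum >> (4 * ID_DIGITS)
    counts = [(identifier_part >> (4 * j)) & 0xF for j in range(ID_DIGITS)]
    return counts, random_sum                     # per-group counts and summed random nibbles

def reverse_bin(magic_sum: int, encoded: dict):
    """encoded: sample_id -> (group, random_nibble), recorded by the adversary at encoding time."""
    counts, random_sum = split_magic_sum(magic_sum)
    per_group = [[sid for sid, (g, _) in encoded.items() if g == j] for j in range(ID_DIGITS)]
    # Enumerate, per group, every subset of the required size, then keep the
    # combination whose random nibbles add up to the observed random sum.
    for choice in product(*[combinations(per_group[j], counts[j]) for j in range(ID_DIGITS)]):
        members = [sid for grp in choice for sid in grp]
        if sum(encoded[sid][1] for sid in members) == random_sum:
            return members
    return None

# The paper's example: x2 (group 0, random 0x3) and x17 (group 1, random 0x2) were
# encoded; x517, x520, x2400 were not, so they leave no trace in the magic sum 0x50011.
encoded = {2: (0, 0x3), 17: (1, 0x2)}
print(reverse_bin(0x50011, encoded))              # -> [2, 17]
```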
C. Results & Analysis

a) Attack Performance: In this experiment, we investigate the performance of the reverse sum attack against SecureBoost in vertical federated learning. Specifically, we experimentally analyze the factors that influence the performance of the attack, including the distribution of the data, the bin size, and the attack parameters.

TABLE V: Details of the four 1D synthetic datasets.

Dataset | #samples | Distribution
D1      | 30,000   | Normal distribution: N(0, 1)
D2      | 30,000   | Bernoulli distribution: B(1/2)
D3      | 30,000   | Exponential distribution: E(1)
D4      | 30,000   | Uniform distribution: U(0, 50)

Impact of Data Distribution. To understand the impact of the data distribution, we perform the attack separately on four 1D synthetic datasets with four different data distributions: Normal, Bernoulli, Exponential, and Uniform. Table V gives the detailed statistics of the four synthetic datasets.

Fig. 6 (figure omitted): The success rate for the four synthetic datasets (D1-D4) with different data distributions.

Figure 6 shows the attack success rate for the various data distributions of the target training datasets. On each of the four tested datasets, the attack is performed under different settings of attack parameters. As can be seen from Figure 6, the success rate for the differently distributed datasets is almost the same under all attack settings.

Impact of Bin Size. Next, we explore the impact of the bin size on the attack success rate. In this experiment, we set two participants to jointly learn an XGBoost model on D1. Figure 7 shows the attack success rate when the target participant splits the training data into different numbers of bins.

Fig. 7 (figure omitted): The success rate for D1 with different numbers of data bins, for base ∈ {16, 32} and #super groups ∈ {2, 4}.

As the figure shows, the attack success rate increases as the number of data bins of the target's training data increases. The reason behind this is straightforward: a large number of data bins results in fewer samples in a single bin, making it easier for the adversary to accurately figure out all the addition terms of the gradient sum of this bin. We can also observe from the figure that a small number (close to 30) of data bins already leads to a 0.5+ attack success rate.

Impact of Attack Parameters. Finally, we demonstrate that the two attack parameters, the base of the positional numeral system and the number of supergroups, are highly correlated with the size of the cracked partial orders, as they decide the maximum number of samples that can be encoded with magic numbers. In this experiment, we run the attack on the Credit dataset using different settings of the two attack parameters.

Table VI shows the success rate as well as the number of successfully cracked samples of the attack for varying numeral bases and supergroup sizes. As the table shows, using a small base and group size results in the highest attack success rate, while using a large base and group size results in more cracked samples. The reason behind this is twofold. By using a large base and supergroup size, the capacity of encoded samples ramps up, which lets the adversary encode magic numbers into more samples, and therefore the adversary can obtain more information about the addition terms of the gradient sums. However, the large capacity also results in a large search space of samples from the same group, which makes it more difficult for the adversary to figure out all the right addition terms.

To clearly show the performance change caused by the attack parameters, we plot the attack success rate under different settings of the two attack parameters in Figure 8, and plot the number of cracked samples in Figure 9. As we can see from Figure 9, SecureBoost leaks more partial order information when the base of the numeral system and the number of supergroups increase. Comparing Figure 9 with Figure 8, we find that although a large base or a large number of supergroups makes SecureBoost leak more information, it slightly reduces the effectiveness of a single magic number, as the ratio of successfully stolen samples among those encoded with magic numbers decreases.
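For reference, the capacity that these two attack parameters control can be computed directly from the formulas printed in Algorithm 1 (lines 5-7). The sketch below does exactly that, as a literal transcription of those lines; it only illustrates why a larger base b and more supergroups k enlarge the pool of encodable samples, and it is not tuned to reproduce the per-feature counts of Table VI.

```python
# Encoding capacity following Algorithm 1: l = log_b 2^(960 - 30k), g = 2*k*l, n' = g*b.
import math

def encoding_capacity(k: int, b: int, pad_bits: int = 960, random_bits: int = 30):
    l = math.floor((pad_bits - random_bits * k) * math.log(2, b))  # group identifier length
    g = 2 * k * l                                                  # total number of groups
    return l, g, g * b                                             # n' = maximum encoded samples

for k, b in [(2, 16), (2, 32), (4, 16), (4, 32)]:
    l, g, n = encoding_capacity(k, b)
    print(f"k={k}, base={b}: identifier length={l}, groups={g}, encodable samples={n}")
```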

TABLE VI: Success rate for different settings of the attack parameters on the Credit dataset. FID = feature ID. Each cell gives the number of successfully cracked samples, with the success rate in parentheses.

#Super groups | FID | base=2           | base=4           | base=8           | base=16          | base=32
2             | 0   | 1,860 (100.00%)  | 2,784 (100.00%)  | 2,138 (49.27%)   | 2,379 (34.19%)   | 2,028 (17.59%)
2             | 1   | 1,860 (100.00%)  | 1,047 (37.61%)   | 876 (20.19%)     | 1,340 (19.26%)   | 821 (7.12%)
2             | 2   | 1,860 (100.00%)  | 1,218 (43.75%)   | 1,203 (27.73%)   | 1,150 (16.52%)   | 639 (5.54%)
2             | 3   | 1,860 (100.00%)  | 1,070 (38.43%)   | 907 (20.89%)     | 885 (12.72%)     | 591 (5.12%)
2             | 4   | 1,860 (100.00%)  | 2,509 (90.11%)   | 2,208 (50.87%)   | 2,246 (32.27%)   | 2,166 (18.78%)
2             | 5   | 1,860 (100.00%)  | 1,470 (52.80%)   | 1,121 (25.82%)   | 1,509 (21.68%)   | 799 (6.93%)
2             | 6   | 1,860 (100.00%)  | 1,442 (51.78%)   | 877 (20.21%)     | 1,136 (16.32%)   | 1,113 (9.65%)
2             | 7   | 1,860 (100.00%)  | 1,463 (52.56%)   | 915 (21.08%)     | 1,034 (14.86%)   | 680 (5.9%)
2             | 8   | 1,860 (100.00%)  | 1,443 (51.83%)   | 1,100 (25.35%)   | 1,394 (20.03%)   | 814 (7.06%)
2             | 9   | 1,860 (100.00%)  | 1,458 (52.37%)   | 1,223 (28.19%)   | 1,370 (19.68%)   | 853 (7.4%)
4             | 0   | 1,888 (100.00%)  | 2,688 (100.00%)  | 2,283 (54.35%)   | 2,909 (43.29%)   | 2,856 (25.59%)
4             | 1   | 1,888 (100.00%)  | 1,389 (51.66%)   | 1,328 (31.62%)   | 1,443 (21.48%)   | 1,063 (9.52%)
4             | 2   | 1,888 (100.00%)  | 1,405 (52.26%)   | 1,366 (32.52%)   | 1,494 (22.23%)   | 1,071 (9.60%)
4             | 3   | 1,888 (100.00%)  | 1,406 (52.28%)   | 1,174 (27.96%)   | 1,326 (19.74%)   | 802 (7.18%)
4             | 4   | 1,888 (100.00%)  | 2,688 (100.00%)  | 2,236 (53.25%)   | 3,156 (46.97%)   | 2,958 (26.51%)
4             | 5   | 1,888 (100.00%)  | 1,496 (55.67%)   | 1,450 (34.52%)   | 1,452 (21.6%)    | 1,522 (13.64%)
4             | 6   | 1,888 (100.00%)  | 1,455 (54.13%)   | 1,378 (32.8%)    | 1,444 (21.49%)   | 1,593 (14.27%)
4             | 7   | 1,888 (100.00%)  | 1,459 (54.28%)   | 1,351 (32.17%)   | 1,363 (20.29%)   | 1,296 (11.61%)
4             | 8   | 1,888 (100.00%)  | 1,437 (53.47%)   | 1,423 (33.89%)   | 1,349 (20.07%)   | 1,260 (11.29%)
4             | 9   | 1,888 (100.00%)  | 1,467 (54.59%)   | 1,413 (33.64%)   | 1,537 (22.87%)   | 1,553 (13.92%)

[Figures 8 and 9 (plots): panels (a) #super groups = 2 and (b) #super groups = 4; x-axis: base (5 to 30); y-axes: success rate (Fig. 8) and number of cracked examples (Fig. 9); one curve per feature ID.]

Fig. 8: The success rate for the Credit dataset under different numbers of supergroups and numerical bases.

Fig. 9: The number of cracked samples for the Credit dataset under different numbers of supergroups and numerical bases.

b) Applications of the Leaked Information: Finally, we measure the effectiveness of the leaked information by training an alternative classifier on it. To evaluate the alternative classifier, we propose a simple approach that uses a small auxiliary dataset to learn the mapping from the target participant's raw features to the bin values. In the following, we first introduce and evaluate our mapping approach; then, we train an alternative classifier on the leaked information.

Bin Mapping. Let $B_j^k$ be the $j$-th bin on the $k$-th feature, and let $L(B_j^k) = \{x_i^k \mid x_i^k \text{ is partitioned into } B_j^k \text{ and this partition is leaked}\}$ denote the leaked partition. Here, we seek to infer the left and right bounds of $B_j^k$ from the leaked partition $L(B_j^k)$ and other auxiliary data.

We assume the adversary could obtain all the features of a few samples, including the features held by the target participant; we call these samples auxiliary data. Let $Aux = \{x_i \mid x_i\text{'s entire features are already known to the adversary}\}$ denote the auxiliary dataset. Given $Aux$ and $L(B_j^k)$, the adversary can roughly estimate the left and right bounds of $B_j^k$ as $\min_{x_i^k \in L(B_j^k) \cap Aux} x_i^k$ and $\max_{x_i^k \in L(B_j^k) \cap Aux} x_i^k$, respectively (a minimal sketch of this estimate follows below).

We then experimentally evaluate the above mapping approach by learning the left and right bounds of all the bins on the Credit dataset. In this experiment, we use the default segmentation to obtain the bins on all the features of the tested dataset. Table VII shows the accuracy of the inferred bins' left and right bounds for different sizes of the auxiliary dataset. As the table shows, for a given feature, the number of known data samples required to infer the bin values depends on the number of data bins of that feature: the more data bins a feature has, the more known samples are required to infer the bin values with satisfactory accuracy.

Alternative Classifier. Next, we train an alternative classifier on the Credit dataset using the leaked information (a sketch of an assumed training setup follows Table VII). For the training, we simulate the data participants and settings of vertical federated learning, and train a distributed classifier on a combination of participant A's raw training data and participant B's partial orders of the bin information. For the bin values of the target participant's features, we simply use the bin values inferred from a few known data samples.

Table VIII compares the performance of the alternative classifiers with the one trained on the raw data. C0 represents the original distributed classifier trained on the raw data, and C1 to C6 are six alternative classifiers whose bin values are estimated using different numbers of known samples.
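As a concrete illustration of the bin-mapping estimate above, the sketch below computes the bounds of one bin from a leaked partition and an auxiliary set. The function and variable names are ours for illustration, not the paper's implementation.

# Minimal sketch of the bin-bound estimate described above (illustrative
# names). `leaked_partition` holds the indices of samples known to fall
# into bin B_j^k, and `aux` maps the indices of fully known (auxiliary)
# samples to their value on feature k.
def estimate_bin_bounds(leaked_partition, aux):
    # Restrict the leaked partition to samples whose feature values we know.
    known_values = [aux[i] for i in leaked_partition if i in aux]
    if not known_values:
        return None  # no auxiliary sample falls into this bin
    # Approximate the bin's bounds by the extremes of the known values.
    return min(known_values), max(known_values)

# Toy usage: bin B contains samples {3, 7, 9, 12}; the adversary knows
# feature k for samples 3, 9 and 20 only.
aux = {3: 0.42, 9: 0.55, 20: 0.91}
print(estimate_bin_bounds({3, 7, 9, 12}, aux))  # -> (0.42, 0.55)

The estimate tightens as more auxiliary samples happen to fall into the bin, which is consistent with the trend in Table VII that features with more bins require more known samples.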

TABLE VII: The accuracy of the inferred data bins on the Credit dataset. FID denotes the feature ID of the target participant's training data; #bins denotes the number of data bins on the given feature; columns give the number of samples known to the adversary.

FID/#bins | 5 | 10 | 20 | 40 | 80 | 160 | 320
0/33 | 11.11% | 19.44% | 36.11% | 55.56% | 83.33% | 91.67% | 100.00%
1/2 | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00%
2/4 | 50.00% | 75.00% | 75.00% | 75.00% | 75.00% | 75.00% | 100.00%
3/3 | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00%
4/31 | 18.18% | 27.27% | 42.42% | 69.70% | 81.82% | 100.00% | 100.00%
5/6 | 66.67% | 66.67% | 83.33% | 83.33% | 83.33% | 100.00% | 100.00%
6/5 | 60.00% | 80.00% | 80.00% | 80.00% | 100.00% | 100.00% | 100.00%
7/5 | 60.00% | 80.00% | 80.00% | 80.00% | 80.00% | 100.00% | 100.00%
8/5 | 60.00% | 60.00% | 80.00% | 80.00% | 80.00% | 100.00% | 100.00%
9/5 | 50.00% | 50.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00%
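The following is a hedged sketch of how such an alternative classifier could be assembled from the leaked information. The feature layout, the gradient-boosting model, and all names are illustrative assumptions rather than the authors' training pipeline.

# Illustrative sketch (assumed setup, not the authors' pipeline): train a
# classifier on the adversary's own raw features concatenated with the bin
# indices inferred for the target participant's features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def bin_index(value, bounds):
    # `bounds` is the list of inferred (left, right) bounds, one per bin.
    for j, (lo, hi) in enumerate(bounds):
        if lo <= value <= hi:
            return j
    return len(bounds)  # value outside every inferred bin

def build_features(a_raw, b_raw, inferred_bounds):
    # a_raw: adversary's own raw features; b_raw: target's features, which
    # the adversary only knows up to inferred bin values.
    b_binned = np.array([[bin_index(v, inferred_bounds[k])
                          for k, v in enumerate(row)] for row in b_raw])
    return np.hstack([a_raw, b_binned])

# Toy data: 100 samples, 3 features on each side, two inferred bins per feature.
rng = np.random.default_rng(0)
a_raw, b_raw = rng.normal(size=(100, 3)), rng.normal(size=(100, 3))
bounds = {k: [(-10.0, 0.0), (0.0, 10.0)] for k in range(3)}
X = build_features(a_raw, b_raw, bounds)
y = (a_raw[:, 0] + b_raw[:, 0] > 0).astype(int)
clf = GradientBoostingClassifier().fit(X, y)
print(clf.score(X, y))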

TABLE VIII: Performance of the alternative classifiers. #samples denotes the number of known samples used to estimate the bin values.

Classifier | #samples | Accuracy
C0 | - | 81.23%
C1 | 10 | 81.23%
C2 | 20 | 81.23%
C3 | 40 | 81.23%
C4 | 80 | 81.23%
C5 | 160 | 81.23%
C6 | 320 | 81.23%

We can observe from the table that these alternative classifiers achieve the same classification accuracy as the original one. The experimental result also demonstrates that the leaked information has potential commercial value and can be further used to train an effective machine learning model.

VI. POSSIBLE DEFENSES

In this section, we discuss several potential defenses that may mitigate the two proposed attacks.

Parameter setting before training. Based on the experimental results in Section IV-C, we observe that (i) if the batch size is small, the reverse multiplication attack can accurately infer all the private data of the target participant; and (ii) if we increase the learning rate, the attack performance can be greatly improved. These observations inspire us to choose a large batch size and a small learning rate to mitigate the reverse multiplication attack. However, a large batch size requires a large network bandwidth for the two participants to communicate, and a small learning rate greatly increases the training time. Considering the time and economic costs, this is not a promising defense.

Hide Partial Orders. A possible defense for SecureBoost is to hide the partial orders using secret sharing [15]. That is, an indicator vector is introduced for every node to record which samples belong to it, and the indicator vector is transformed into a secret-shared variable to further preserve privacy. In this way, the sum of the derivatives of the samples in a single node can be computed as the inner product of the indicator vector and the derivative vector (a minimal plaintext sketch of this computation is given at the end of this section). Though this method can protect the privacy of participants' partial orders, it might not defend against our reverse sum attack: in the reverse sum attack, the adversary maliciously encodes redundant information to guess the target participant's training data, which cannot be prevented by secretly sharing the indicator vector.

Differential Privacy (DP). Differential privacy [12] is a system for publicly sharing information about a dataset by describing the patterns of classes while withholding information about individuals, and it is a commonly used privacy protection mechanism. DP cannot be applied to defend against the reverse sum attack, since it only protects the privacy of individual sensitive examples and cannot protect the privacy of the whole dataset (i.e., the partial orders). As for the reverse multiplication attack, DP might roughly protect the target participant's private sensitive dataset. Yet, the adversary can still infer the dataset with the DP noise added, and use this dataset to train an equivalent model.
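To make the indicator-vector computation above concrete, here is a minimal plaintext sketch. It is illustrative only: a real deployment would operate on additive secret shares of the indicator vector rather than on the plaintext vector shown here, and the names are ours.

# Plaintext sketch of the per-node derivative sum used by the defense above.
# In the actual defense the indicator vector would be secret-shared, so no
# single party learns which samples fall into the node; it is shown in the
# clear here purely for illustration.
import numpy as np

def node_derivative_sum(indicator, derivatives):
    # indicator[i] = 1 if sample i belongs to the node, 0 otherwise, so the
    # per-node sum is simply the inner product <indicator, derivatives>.
    return float(np.dot(indicator, derivatives))

g = np.array([0.4, -1.2, 0.7, 0.1, -0.3, 0.9])  # per-sample derivatives
v = np.array([1, 0, 1, 0, 0, 1])                # samples 0, 2 and 5 in node
assert np.isclose(node_derivative_sum(v, g), g[0] + g[2] + g[5])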

VII. MORE RELATED WORK

Privacy-preserving machine learning. Vertical federated machine learning can be considered as privacy-preserving decentralized machine learning, which is tightly related to multi-party computation. Many research efforts have been devoted to improving both the efficiency and the effectiveness of privacy-preserving machine learning [11], [46], [43], [44], [45], [52], [1], [21], [35], [31]. They use either secure multi-party computation or homomorphic encryption and Yao's garbled circuits for privacy guarantees. For computation efficiency, most of them sacrifice part of data security and user privacy. Our work provides the first systematic study on the potential privacy risks of privacy-preserving learning protocols in vertical federated learning.

Privacy leakage in federated learning. In recent years, the privacy leakage of federated learning has attracted growing research interest [28], [32], [23], [48], [49], [25], [34], [20], [29]. Such privacy leakage can be further categorized into membership leakage, data property leakage, and data leakage.

In membership leakage, an adversarial participant desires to learn whether a given sample is part of the other participant's training data. Nasr et al. conducted a comprehensive analysis of membership inference attacks in federated learning, showing that part of the training data membership is leaked when collaboratively training a model [32].

In data property leakage, the adversarial participant tries to infer data properties that are independent of the main task, e.g., inferring whether people wear glasses in an age prediction task. Melis et al. showed that the gradient updates leak unintended information about participants' training data and developed passive and active property inference attacks to exploit this leakage [28].

In data leakage, the adversary reconstructs representatives of the model's training data [23], [55], [48], [20]. Several works [23], [48] train a Generative Adversarial Network (GAN) that generates prototypical samples of the target participant's training data during the training process. The others directly reconstruct the training data from the gradient updates. Zhu et al. presented an optimization algorithm that can obtain both the training samples and the labels from gradient updates [55]. Geiping et al. showed that it is easy to reconstruct images at high resolution from the gradient updates [20].

Although existing works have shown that federated learning does not protect the data privacy of honest participants [28], [32], [23], [48], they all focus on the horizontal setting of federated learning and do not make an in-depth privacy analysis of the vertical setting. Compared with horizontal federated learning, federated learning in the vertical setting poses even more challenges due to the significantly higher complexity of the learning protocol.

VIII. CONCLUSION

In this paper, we design two attacks, i.e., the reverse multiplication attack and the reverse sum attack, to demonstrate the possibility of stealing private information in the setting of vertical federated learning. We conduct an extensive experimental evaluation on two learning protocols implemented on a real-world framework to empirically validate the existence of these privacy threats. To shed light on future research directions, we also discuss potential countermeasures that obfuscate the exchanged intermediate outputs, and their challenges. Our work is the first to show that the over-pursuit of computation efficiency leads to the privacy leakage of training data in vertical federated learning.

REFERENCES

[1] Yoshinori Aono, Takuya Hayashi, Le Trieu Phong, and Lihua Wang. Scalable and secure logistic regression via homomorphic encryption. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, CODASPY 2016, New Orleans, LA, USA, March 9-11, 2016, pages 142–144, 2016.
[2] Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloé Kiddon, Jakub Konečný, Stefano Mazzocchi, Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, and Jason Roselander. Towards federated learning at scale: System design. In Proceedings of Machine Learning and Systems 2019, MLSys 2019, Stanford, CA, USA, March 31 - April 2, 2019.
[3] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017.
[4] Breast. https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29.
[5] Breast. https://archive.ics.uci.edu/ml/datasets/Statlog+(Vehicle+Silhouettes).
[6] Jonathan H Chen and Steven M Asch. Machine learning and prediction in medicine—beyond the peak of inflated expectations. The New England Journal of Medicine, 376(26):2507, 2017.
[7] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016.
[8] Kewei Cheng, Tao Fan, Yilun Jin, Yang Liu, Tianjian Chen, and Qiang Yang. SecureBoost: A lossless federated learning framework. CoRR, abs/1901.08755, 2019.
[9] Credit. https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients.
[10] CrypTen. https://github.com/facebookresearch/CrypTen.
[11] Wenliang Du and Zhijun Zhan. Building decision tree classifier on private data. 2002.
[12] Cynthia Dwork. Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation, pages 1–19. Springer, 2008.
[13] TF Encrypted. https://tf-encrypted.io/.
[14] Homomorphic Encryption. https://en.wikipedia.org/wiki/Homomorphic_encryption#:~:text=Homomorphic%20encryption%20is%20a%20form,performed%20on%20the%20unencrypted%20data.
[15] Wenjing Fang, Chaochao Chen, Jin Tan, Chaofan Yu, Yufei Lu, Li Wang, Lei Wang, Jun Zhou, and Alex X. A hybrid-domain framework for secure gradient tree boosting. CoRR, abs/2005.08479, 2020.
[16] FATE. https://www.fedai.org/.
[17] TF Federated. https://www.tensorflow.org/federated.
[18] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, October 12-16, 2015, pages 1322–1333, 2015.
[19] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning, volume 1. Springer Series in Statistics, New York, 2001.
[20] Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, and Michael Moeller. Inverting gradients - how easy is it to break privacy in federated learning? CoRR, abs/2003.14053, 2020.
[21] Irene Giacomelli, Somesh Jha, Marc Joye, C. David Page, and Kyonghwan Yoon. Privacy-preserving ridge regression with only linearly-homomorphic encryption. In Applied Cryptography and Network Security - 16th International Conference, ACNS 2018, Leuven, Belgium, July 2-4, 2018, Proceedings, pages 243–261, 2018.
[22] Stephen Hardy, Wilko Henecka, Hamish Ivey-Law, Richard Nock, Giorgio Patrini, Guillaume Smith, and Brian Thorne. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. CoRR, abs/1711.10677, 2017.
[23] Briland Hitaj, Giuseppe Ateniese, and Fernando Pérez-Cruz. Deep models under the GAN: Information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017.
[24] Arthur Jochems, Timo M Deist, Johan Van Soest, Michael Eble, Paul Bulens, Philippe Coucke, Wim Dries, Philippe Lambin, and Andre Dekker. Distributed learning: developing a predictive model based on data from multiple hospitals without data leaving the hospital—a real life proof of concept. Radiotherapy and Oncology, 121(3):459–467, 2016.
[25] Zhaorui Li, Zhicong Huang, Chaochao Chen, and Cheng Hong. Quantification of the leakage in federated learning. CoRR, abs/1910.05467, 2019.
[26] MagicNumber. https://en.wikipedia.org/wiki/Magic_number_(programming).
[27] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA.
[28] Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. Exploiting unintended feature leakage in collaborative learning. In 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019, pages 691–706, 2019.
[29] Payman Mohassel and Peter Rindal. ABY3: A mixed protocol framework for machine learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, October 15-19, 2018, pages 35–52, 2018.
[30] Payman Mohassel and Yupeng Zhang. SecureML: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017.

[31] Payman Mohassel and Yupeng Zhang. SecureML: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pages 19–38, 2017.
[32] Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019, pages 739–753, 2019.
[33] Pascal Paillier. Public-key cryptosystems based on composite degree residuosity classes. In Advances in Cryptology - EUROCRYPT '99, International Conference on the Theory and Application of Cryptographic Techniques, Prague, Czech Republic, May 2-6, 1999, Proceedings, pages 223–238.
[34] Le Trieu Phong, Yoshinori Aono, Takuya Hayashi, Lihua Wang, and Shiho Moriai. Privacy-preserving deep learning: Revisited and enhanced. In Applications and Techniques in Information Security - 8th International Conference, ATIS 2017, Auckland, New Zealand, July 6-7, 2017, Proceedings, pages 100–110, 2017.
[35] Le Trieu Phong, Yoshinori Aono, Takuya Hayashi, Lihua Wang, and Shiho Moriai. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur., 13(5):1333–1345, 2018.
[36] PySyft. https://www.openmined.org/.
[37] SGD. https://en.wikipedia.org/wiki/Stochastic_gradient_descent.
[38] SGX. https://en.wikipedia.org/wiki/Software_Guard_Extensions.
[39] Secret Sharing. https://en.wikipedia.org/wiki/Secret_sharing.
[40] Winning Solutions. https://en.wikipedia.org/wiki/Double-precision_floating-point_format.
[41] Student. https://archive.ics.uci.edu/ml/datasets/student+performance.
[42] TEE. https://en.wikipedia.org/wiki/Trusted_execution_environment.
[43] Jaideep Vaidya and Chris Clifton. Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, 2002, Edmonton, Alberta, Canada, pages 639–644, 2002.
[44] Jaideep Vaidya and Chris Clifton. Privacy-preserving k-means clustering over vertically partitioned data. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24-27, 2003, pages 206–215, 2003.
[45] Jaideep Vaidya and Chris Clifton. Privacy preserving naïve Bayes classifier for vertically partitioned data. In Proceedings of the Fourth SIAM International Conference on Data Mining, Lake Buena Vista, Florida, USA, April 22-24, 2004, pages 522–526, 2004.
[46] Jaideep Vaidya, Chris Clifton, Murat Kantarcioglu, and A. Scott Patterson. Privacy-preserving decision trees over vertically partitioned data. ACM Trans. Knowl. Discov. Data, 2(3):14:1–14:27, 2008.
[47] Wenhao Wang, Guoxing Chen, Xiaorui Pan, Yinqian Zhang, XiaoFeng Wang, Vincent Bindschaedler, Haixu Tang, and Carl A. Gunter. Leaky cauldron on the dark land: Understanding memory side-channel hazards in SGX. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017.
[48] Zhibo Wang, Mengkai Song, Zhifei Zhang, Yang Song, Qian Wang, and Hairong Qi. Beyond inferring class representatives: User-level privacy leakage from federated learning. In 2019 IEEE Conference on Computer Communications, INFOCOM 2019, Paris, France, April 29 - May 2, 2019, pages 2512–2520, 2019.
[49] Wenqi Wei, Ling Liu, Margaret Loper, Ka Ho Chow, Mehmet Emre Gursoy, Stacey Truex, and Yanzhao Wu. A framework for evaluating gradient leakage attacks in federated learning. CoRR, abs/2004.10397, 2020.
[50] Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol., 10(2):12:1–12:19, 2019.
[51] I-Cheng Yeh and Che-hui Lien. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst. Appl., 36(2):2473–2480, 2009.
[52] Hwanjo Yu, Jaideep Vaidya, and Xiaoqian Jiang. Privacy-preserving SVM classification on vertically partitioned data. In Advances in Knowledge Discovery and Data Mining, 10th Pacific-Asia Conference, PAKDD 2006, Singapore, April 9-12, 2006, Proceedings, pages 647–656, 2006.
[53] Ya-Lin Zhang, Jun Zhou, Wenhao Zheng, Ji Feng, Longfei Li, Ziqi Liu, Ming Li, Zhiqiang Zhang, Chaochao Chen, Xiaolong Li, et al. Distributed deep forest and its application to automatic detection of cash-out fraud. ACM Transactions on Intelligent Systems and Technology (TIST), 10(5):1–19, 2019.
[54] Jun Zhou, Xiaolong Li, Peilin Zhao, Chaochao Chen, Longfei Li, Xinxing Yang, Qing Cui, Jin Yu, Xu Chen, Yi Ding, and Yuan (Alan) Qi. KunPeng: Parameter server based distributed learning systems and its applications in Alibaba and Ant Financial. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13-17, 2017, pages 1693–1702. ACM, 2017.
[55] Ligeng Zhu, Zhijian Liu, and Song Han. Deep leakage from gradients. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pages 14747–14756, 2019.