
Information Leakage in Embedding Models

Congzheng Song∗ (Cornell University, [email protected])    Ananth Raghunathan∗ (Facebook, [email protected])

∗Work done while at Google Brain.

ABSTRACT

Embeddings are functions that map raw input data to low-dimensional vector representations, while preserving important semantic information about the inputs. Pre-training embeddings on a large amount of unlabeled data and fine-tuning them for downstream tasks is now a de facto standard in achieving state of the art learning in many domains.

We demonstrate that embeddings, in addition to encoding generic semantics, often also present a vector that leaks sensitive information about the input data. We develop three classes of attacks to systematically study information that might be leaked by embeddings. First, embedding vectors can be inverted to partially recover some of the input data. As an example, we show that our attacks on popular sentence embeddings recover between 50%–70% of the input words (F1 scores of 0.5–0.7). Second, embeddings may reveal sensitive attributes inherent in inputs and independent of the underlying semantic task at hand. Attributes such as authorship of text can be easily extracted by training an inference model on just a handful of labeled embedding vectors. Third, embedding models leak a moderate amount of membership information for infrequent training data inputs. We extensively evaluate our attacks on various state-of-the-art embedding models in the text domain. We also propose and evaluate defenses that can prevent the leakage to some extent at a minor cost in utility.

CCS CONCEPTS

• Security and privacy → Software and application security.

KEYWORDS

machine learning, embeddings, privacy
ACM Reference Format:
Congzheng Song and Ananth Raghunathan. 2020. Information Leakage in Embedding Models. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS '20), November 9–13, 2020, Virtual Event, USA. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3372297.3417270

1 INTRODUCTION

Machine learning (ML) has seen an explosive growth over the past decade and is now widely used across industry, from image analysis [24, 34] and other applications [22] to the medical sciences for diagnosis [8, 56]. These advances rely on not only improved training algorithms and architectures, but also access to high-quality and often sensitive data. The wide application of ML and its reliance on quality training data necessitate a better understanding of how exactly ML models work and how they interact with their training data. Naturally, there is increasing literature focused on these aspects, both in the domains of interpretability and fairness of models, and their privacy. The latter is further of timely importance given recent regulations under the European Union's General Data Protection Regulation (GDPR) umbrella requiring users to have greater control of their data.

Applying a line of research investigating whether individual genomic records can be subsequently identified [28, 62], Shokri et al. [66] developed the first membership inference tests investigating how ML models may leak some of their training data. This subsequently led to a large body of work exploring this space [45, 48, 60, 61, 77]. Another research direction investigating how models might memorize their training data has also shown promising results [4, 67]. Apart from training data privacy, modern deep learning models can unintentionally leak information about sensitive inputs from the model's representation at inference time [9, 18, 38, 50, 68].

This growing body of work analyzing models from the standpoint of privacy aims to answer natural questions: Besides the learning task at hand, what other information do models capture or expose about their training data? And, what functionalities may be captured by ML models unintentionally? We also note that beyond the scope of this paper are other rich sets of questions around adversarial behavior with ML models—the presence of adversarial examples, data poisoning attacks, and more.

Embeddings. In most of the research investigating the privacy of ML models, one class of models is largely conspicuously absent. Embeddings are mathematical functions that map raw objects (such as words, sentences, images, user activities) to real-valued vectors with the goal of capturing and preserving some important semantic meaning about the underlying objects. The most common application of embeddings is transfer learning, where the embeddings are pre-trained on a large amount of unlabeled raw data and later fine-tuned on downstream tasks with limited labeled data. Transfer learning from embeddings has been shown to be tremendously useful for many natural language processing (NLP) tasks such as paraphrasing [14], response suggestion [30], information retrieval, text classification, and question answering [57], where labeled data is typically expensive to collect. Embeddings have also been successfully applied to other data domains including social networks [21], source code [2], YouTube watches [11], movie feedback [25], locations [12], etc. In some sense, their widespread use should not be surprising—embedding models and their success illustrate why deep learning has been successful at capturing interesting semantics from large quantities of raw data.

Embedding models are often pre-trained on raw and unlabeled data at hand, then used with labeled data to transfer learning to various downstream tasks. Our study of embeddings is motivated by their widespread application and by trying to better understand how these two stages of training may capture and subsequently leak information about sensitive data. While it is attractive that these models inherently capture relations, similarities, and other semantic relationships between raw objects like words or sentences, it also behooves one to consider if this is all the information being captured by the embeddings.
Do they, in addition to the meaning of words, perhaps inadvertently capture information about the authors? Would that have consequences if these embeddings are used in adversarial ways? What aspects of the seminal research into privacy of ML models through the lens of membership inference attacks and memorization propensity can we apply to better understand the privacy risks of embeddings?

These questions and more lead us to initiate a systematic study of the privacy of embedding models by considering three broad classes of attacks. We first consider embedding inversion attacks whose goal is to invert a given embedding back to the sensitive raw text inputs (such as private user messages). For embeddings susceptible to these attacks, it is imperative that we consider the embedding outputs as containing inherently as much information, with respect to risks of leakage, as the underlying sensitive data itself. The fact that embeddings appear to be abstract real-numbered vectors should not be misconstrued as being safe.

Along the lines of Song and Shmatikov [68], we consider attribute inference attacks to test if embeddings might unintentionally reveal attributes of their input data that are sensitive; aspects that are not useful for their downstream applications. This threat model is particularly important when sensitive attributes orthogonal to downstream tasks are quickly revealed given very little auxiliary data, something that the adversary might be able to capture through external means. As an example, given (sensitive) demographic information (such as gender) that an adversary might retrieve for a limited set of data points, the embeddings should ideally not enable even somewhat accurate estimates of these sensitive attributes across a larger population on which they might be used for completely orthogonal downstream tasks.
Finally, we also consider classic membership inference attacks, demonstrating the need to worry about leakage of training data membership when given access to embedding models and their outputs, as is common with many other ML models.

Our Contributions. As discussed above, we initiate a systematic and broad study of three classes of potential information leakage in embedding models. We consider word and sentence embedding models to show the viability of these attacks and demonstrate their usefulness in improving our understanding of the privacy risks of embeddings. Additionally, we introduce new techniques to attack models, in addition to drawing inspiration from existing attacks to apply them to embeddings to various degrees. Our results are demonstrated on widely-used word embeddings such as Word2Vec [46], FastText [3], GloVe [54] and different sentence embedding models including dual-encoders [39] and BERT [13, 35].

(1) We demonstrate that sentence embedding vectors encode information about the exact words in input texts rather than merely abstract semantics. Under scenarios involving black-box and white-box access to embedding models, we develop several inversion techniques that can invert the embeddings back to the words with high precision and recall values (exceeding 60% each), demonstrating that a significant fraction of the inputs may be easily recovered. We note that these techniques might be of independent interest.

(2) We discover that certain embedding training frameworks in particular favor sensitive attribute leakage by showing that such embedding models improve upon state-of-the-art stylometry techniques [59, 65] to infer text authorship with very little training data. Using dual-encoder embeddings [39] trained with a contrastive learning framework [63] and 10 to 50 labeled sentences per author, we show a 1.5–3× improvement in classifying hundreds of authors over prior stylometry methods and embeddings trained with different learning paradigms.

(3) We show that membership inference attacks are still a viable threat for embedding models, albeit to a lesser extent. Given our results that demonstrate an adversary can achieve a 30% improvement on membership information over random guessing on both word and sentence embeddings, we show that it is prudent to consider these potential avenues of leakage when dealing with embeddings.

(4) Finally, we propose and evaluate adversarial training techniques to minimize the information leakage via inversion and sensitive attribute inference. We demonstrate through experiments their applicability to mitigating these attacks. We show that embedding inversion attacks and attribute inference attacks against our adversarially trained model go down by 30% and 80% respectively. These come at a minor cost to utility, indicating a mitigation that cannot simply be attributed to training poorer embeddings.

Paper Organization. The rest of the paper is organized as follows. Section 2 introduces preliminaries needed for the rest of the paper. Sections 3, 4, and 5 cover attacks against embedding models—inversion attacks, sensitive attribute inference attacks, and membership inference attacks respectively. Our experimental results and proposed defenses are covered in Sections 6 and 7 respectively, followed by related work and conclusions.

2 PRELIMINARIES

2.1 Embedding Models

Embedding models are widely used machine learning models that map a raw input (usually discrete) to a low-dimensional vector. The embedding vector captures the semantic meaning of the raw input data and can be used for various downstream tasks including nearest neighbor search, retrieval, and classification. In this work, we focus on embeddings of text input data as they are widely used in many applications and have been studied extensively in the research community [3, 5, 13, 33, 35, 39, 46, 54, 55, 58].

Word embeddings. Word embeddings are look-up tables that map each word w from a vocabulary V to a vector v_w ∈ R^d. Word embedding vectors capture the semantic meaning of words in the following manner: words with similar meanings will have small cosine distance in the embedding space, where the cosine distance of v_1 and v_2 is defined as 1 − (v_1^⊤ · v_2 / ∥v_1∥∥v_2∥).

Popular models including Word2Vec [46], FastText [3] and GloVe [54] are learned in an unsupervised fashion on a large unlabeled corpus. In detail, given a sliding window of words C = [w_b, ..., w_0, ..., w_e] from the training corpus, Word2Vec and FastText train to predict each context word w_i given the center word w_0 by maximizing the log-likelihood log P_V(w_i | w_0), where

P_V(w_i | w_0) = exp(v_{w_i}^⊤ · v_{w_0}) / Σ_{w ∈ {w_i} ∪ V_neg} exp(v_w^⊤ · v_{w_0})    (1)

for each w_i ∈ C \ {w_0}. To accelerate training, the above probability is calculated against V_neg ⊂ V, a sampled subset of words not in C, instead of the entire vocabulary. GloVe is trained to estimate the co-occurrence count of w_i and w_j for all pairs w_i, w_j ∈ C.
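As a concrete illustration of Equation 1, the minimal sketch below computes the sampled-softmax probability P_V(w_i | w_0) for one center/context pair against a handful of negative samples. It is not the paper's training code; the embedding matrix, vocabulary size, and negative-sample count are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative word-embedding matrix: |V| = 1000 words, d = 50 dimensions.
V_size, d = 1000, 50
emb = rng.normal(size=(V_size, d))

def neg_sampling_prob(center_id, context_id, neg_ids, emb):
    """Sampled softmax P_V(w_i | w_0) from Eq. (1): the context word's score
    normalized over the context word plus the sampled negatives."""
    v_center = emb[center_id]
    candidate_ids = [context_id] + list(neg_ids)
    scores = emb[candidate_ids] @ v_center      # v_w^T · v_{w_0} for each candidate
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[0]                             # probability of the true context word

# Example: one center word, one context word, 5 sampled negatives.
center, context = 42, 7
negatives = rng.choice(V_size, size=5, replace=False)
print(neg_sampling_prob(center, context, negatives, emb))
```

Maximizing the log of this quantity over all windows in the corpus (with the negatives resampled each time) is what drives similar words toward nearby embedding vectors.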
A common practice in modern deep NLP models is to use word embeddings as the first layer so that discrete inputs are mapped to a continuous space and can be used for later computation. Pre-trained word embeddings are often used to initialize the weights of the embedding layer. This is especially helpful when the downstream NLP tasks have a limited amount of labeled data, as the knowledge learned by these pre-trained word embeddings from a large unlabeled corpus can be transferred to the downstream tasks.

Sentence embeddings. Sentence embeddings are functions that map a variable-length sequence of words x to a fixed-size embedding vector Φ(x) ∈ R^d through a neural network model Φ. For an input sequence of ℓ words x = [w_1, w_2, ..., w_ℓ], Φ first maps x into a sequence of word vectors X = [v_1, ..., v_ℓ] with a word embedding matrix V. Then Φ feeds X to a recurrent neural network or a Transformer [70] and obtains a sequential hidden representation [h_1, h_2, ..., h_ℓ] for each word in x. Finally, Φ outputs the sentence embedding by reducing the sequential hidden representation to a single vector. Common reduction methods include taking the last representation, where Φ(x) = h_ℓ, and mean-pooling, where Φ(x) = (1/ℓ) · Σ_{i=1}^ℓ h_i.

Sentence embedding models are usually trained with unsupervised learning methods on a large unlabeled corpus. A popular architecture for unsupervised sentence embedding is the dual-encoder model proposed in many prior works [5, 7, 26, 39, 40, 58, 75]. The dual-encoder model trains on a pair of context sentences (x_a, x_b), where the pair could be a sentence and its next sentence in the same text, or a dialog input and its response, etc. Given a randomly sampled set of negative sentences X_neg that are not in the context of x_a, x_b, the objective of training is to maximize the log-likelihood log P_Φ(x_b | x_a, X_neg), where

P_Φ(x_b | x_a, X_neg) = exp(Φ(x_b)^⊤ · Φ(x_a)) / Σ_{x ∈ {x_b} ∪ X_neg} exp(Φ(x)^⊤ · Φ(x_a)).    (2)

Intuitively, the model is trained to predict the correct context x_b from the set {x_b} ∪ X_neg when conditioned on x_a. In other words, the similarity between embeddings of context data is maximized while that between negative samples is minimized.

Sentence embeddings usually outperform word embeddings on transfer learning. For downstream tasks such as image-sentence retrieval, classification, and paraphrase detection, sentence embeddings are much more efficient because only a linear model needs to be trained using the embeddings as feature vectors.

Pre-trained language models. Language models are trained to learn contextual information by predicting next words in input text, and pre-trained language models can easily adapt to other NLP tasks by fine-tuning. The recently proposed Transformer architecture [70] enables language models to have dozens of layers with huge capacity. These large language models, including BERT [13], GPT-2 [55], and XLNet [76], are trained on enormous corpora and have shown impressive performance gains when transferred to other downstream NLP tasks in comparison to previous state-of-the-art methods.

These pre-trained language models can also be used to extract sentence embeddings. There are many different ways to extract sentence-level features with pre-trained language models. In this work, we follow Sentence-BERT [58], which suggests that mean-pooling on the hidden representations yields the best empirical performance.
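The dual-encoder objective of Equation 2 is commonly implemented with in-batch negatives. The following is a minimal sketch assuming a toy mean-pooling encoder; the architecture, dimensions, and batch construction are illustrative assumptions, not the exact models evaluated in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanPoolEncoder(nn.Module):
    """Toy Φ: embed tokens, run a GRU, mean-pool hidden states into one vector."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, token_ids):             # token_ids: (batch, seq_len)
        h, _ = self.rnn(self.emb(token_ids))  # (batch, seq_len, dim)
        return h.mean(dim=1)                  # Φ(x) = (1/ℓ) Σ h_i

def dual_encoder_loss(phi, x_a, x_b):
    """Eq. (2) with in-batch negatives: for each context pair (x_a, x_b), the
    other sentences in the batch play the role of X_neg."""
    e_a, e_b = phi(x_a), phi(x_b)             # (batch, dim) each
    logits = e_a @ e_b.t()                    # logits[i, j] = Φ(x_b^j)^T Φ(x_a^i)
    labels = torch.arange(e_a.size(0))        # the true context is on the diagonal
    return F.cross_entropy(logits, labels)    # -log P_Φ(x_b | x_a, X_neg)

phi = MeanPoolEncoder()
x_a = torch.randint(0, 1000, (8, 12))         # 8 context pairs of 12 tokens each
x_b = torch.randint(0, 1000, (8, 12))
loss = dual_encoder_loss(phi, x_a, x_b)
loss.backward()                               # one contrastive training step would follow
print(float(loss))
```

Treating the rest of the batch as X_neg is a standard approximation of the sampled-negative objective; it keeps the similarity of true context pairs high while pushing unrelated pairs apart.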
2.2 Threat Model and Attack Taxonomy

In this section, we give an overview of the threat models we consider in this paper and describe a taxonomy of attacks leaking information from embedding models. Figure 1 shows an overview of the attack taxonomy.

Figure 1: Taxonomy of attacks against embedding models. We assume the adversary has access to the embedding Φ(x∗) of a sensitive input text x∗ that will be used for downstream NLP tasks, and performs three information leakage attacks on Φ(x∗): (1) inverting the embedding back to the exact words in x∗, (2) inferring a sensitive attribute of x∗, and (3) inferring the membership, i.e. whether x∗ and its context x′ have been used for training.

In many NLP applications, embedding vectors are often computed on sensitive user input. Although existing frameworks propose computing embeddings locally on the user's device to protect the privacy of the raw data [6, 36, 50, 71], the embeddings themselves are shared with machine learning service providers or other parties for downstream tasks (training or inference). It is often tempting to assume that sharing embeddings might be "safer" than sharing the raw data, by their nature of being "simply a vector of real numbers." However, this ignores the information that is retained in, and may be extracted from, the embeddings. We investigate the following questions: What kinds of sensitive information about the inputs are encoded in the embeddings? And can an adversary with access to the embeddings extract the encoded sensitive information?

Threat model. Our threat model comprises the following entities. (1) D_train is a training dataset that may contain sensitive information. (2) Φ, the embedding model, which might be available in a white-box or black-box fashion. White-box access to the model reveals the model architecture and all parameters. Black-box access allows anyone to compute Φ(x) for x of their choice. (3) E_target = {Φ(x_i∗)}, a set of embedding vectors on sensitive inputs x_i∗. (4) D_aux is an auxiliary dataset available to the adversary comprising either limited labeled data drawn from the same distribution as D_train or unlabeled raw text data. In the text domain, unlabeled data is cheap to collect due to the enormous amount of free text available on the Internet, while labeling it is often much more expensive.

An adversary, given access to some of the entities described above, aims to leak some sensitive information about the input x from the embedding vector Φ(x). By looking at variants of what information an adversary possesses and their target, we arrive at three broad classes of attacks against embeddings.

With sensitive input data such as personal messages, it is natural to consider an adversary whose goal is to (partially) recover this text from Φ(x). Even if the input data is not sensitive, it may be associated with sensitive attributes, and an adversary could be tasked with learning these attributes from Φ(x). And finally, rather than recovering inputs, the adversary, given some information about Φ(x), can aim to find if x was used to train Φ or not. These are formalized below.

Embedding inversion attacks. In this threat model, the adversary's goal is to invert a target embedding Φ(x∗) and recover the words in x∗. We consider attacks that involve both black-box and white-box access to Φ. The adversary is also allowed access to an unlabeled D_aux and in both scenarios is able to evaluate Φ(x) for x ∈ D_aux.

Sensitive attribute inference attacks. In this threat model, the adversary's goal is to infer a sensitive attribute s∗ of a secret input x∗ from a target embedding Φ(x∗). We assume the adversary has access to Φ, and a set of labeled data of the form (x, s) for x ∈ D_aux. We focus on discrete attributes s ∈ S, where S is the set of all possible attribute classes, and an adversary performing the inference by learning a classifier f on D_aux. Given sufficient labeled data, the adversary's task trivially reduces to plain supervised learning, which would rightly not be seen as adversarial. Therefore, the more interesting scenario, which is the focus of this paper, is when the adversary is given access to a very small set of labeled data (on the order of 10–50 per class), where transfer learning from Φ(x) is likely to outperform learning directly from the inputs x.
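The sensitive attribute inference threat model above amounts to fitting a small classifier f on a handful of (Φ(x), s) pairs. A minimal sketch with synthetic embeddings and labels follows; the attack models and data used in the paper's experiments may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Assumed setup: 3 attribute classes, d = 128 embeddings, 20 labeled examples per class.
num_classes, d, per_class = 3, 128, 20
centers = rng.normal(size=(num_classes, d))

def fake_embedding(label):
    """Stand-in for Φ(x) of an input whose sensitive attribute is `label`."""
    return centers[label] + 0.5 * rng.normal(size=d)

X_aux = np.stack([fake_embedding(c) for c in range(num_classes) for _ in range(per_class)])
y_aux = np.repeat(np.arange(num_classes), per_class)

# The adversary's inference model f: a linear classifier on embedding vectors.
f = LogisticRegression(max_iter=1000).fit(X_aux, y_aux)

# Attack: predict the sensitive attribute of unseen target embeddings.
X_target = np.stack([fake_embedding(c) for c in range(num_classes) for _ in range(50)])
y_target = np.repeat(np.arange(num_classes), 50)
print("attribute inference accuracy:", f.score(X_target, y_target))
```

The point of the few-shot setting is that the embedding already linearly separates the sensitive attribute, so even a handful of labeled vectors per class suffices for the adversary.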
Membership inference attacks. Membership inference attacks against ML models are well studied: the adversary has a target data point x∗ and the goal is to figure out with good accuracy whether x∗ ∈ D_train or not. Unlike previous attacks focused on supervised learning, some embedding models, such as word embeddings, allow trivially inferring membership: every member of the vocabulary set is necessarily part of D_train. Instead, we expand the definition of training data membership to consider this data together with its context, which is used in training. We assume the adversary has a target context of words [w_1∗, ..., w_n∗] and access to V for word embeddings, or a context of target sentences (x_a∗, x_b∗) and access to the model Φ for sentence embeddings, and the goal is to decide the membership for the context. We also consider the target to be an aggregated level of data, i.e. sentences [x_1∗, ..., x_n∗] comprising multiple contexts, for the adversary to determine whether it was part of training Φ. Membership privacy for aggregation at the user level has also been explored in prior works [44, 67].

We further assume the adversary has a limited D_aux labeled with membership. We propose that this is a reasonable and practical assumption, as training data for text embeddings are often collected from the Internet, where an adversary can easily inject data or get access to small amounts of labeled data through a more expensive labeling process. The assumption also holds for adversarial participants in collaborative training or federated learning [43, 44].

3 EMBEDDING INVERSION ATTACKS

The goal of inverting text embeddings is to recover the target input text x∗ from the embedding vector Φ(x∗) and access to Φ. We focus on inverting embeddings of short texts, and for practical reasons it suffices to analyze the privacy of the inputs by considering attacks that recover a set of words without recovering the word ordering. We leave open the problem of recovering exact sequences; one promising direction involves language modeling [64].

A naïve approach for inversion would be to enumerate all possible sequences from the vocabulary and find a recovery x̂ such that Φ(x̂) = Φ(x∗). Although such a brute-force search only requires black-box access to Φ, the search space grows exponentially with the length of x and thus inverting by enumeration is computationally infeasible.

The brute-force approach also does not capture inherently what information might be leaked by Φ itself. This naturally raises the question of what attacks are possible if the adversary is given complete access to the parameters and architecture of Φ, i.e., white-box access. This motivates a relaxation technique for optimization-based attacks in Section 3.1. We also consider the more constrained black-box access scenario in Section 3.2, where we develop learning-based attacks that are much more efficient than exhaustive search by utilizing auxiliary data.

3.1 White-box Inversion

In a white-box scenario, we assume that the adversary has access to the embedding model Φ's parameters and architecture. We formulate white-box inversion as the following optimization problem:

min_{x̂ ∈ X(V)} ||Φ(x̂) − Φ(x∗)||_2^2    (3)

where X(V) is the set of all possible sequences enumerated from the vocabulary V. The above optimization can be hard to solve directly due to the discrete input space. Inspired by prior work on relaxing categorical variables [29], we propose a continuous relaxation of the sequential word input that allows more efficient optimization based on gradients.

The goal of the discrete optimization in Equation 3 is to select a sequence of words such that the distance between the output embeddings is minimized. We relax the word selection at each position of the sequence with a continuous variable z_i ∈ R^{|V|}. As mentioned before, Φ first maps the input x of length ℓ into a sequence of word vectors X = [v_1, ..., v_ℓ] and then computes the text embedding based on X. For optimizing z_i, we represent the selected word vector v̂_i using a softmax attention mechanism:

v̂_i = V^⊤ · softmax(z_i / T), for i = 1, ..., ℓ    (4)

where V is the word embedding matrix in Φ and T is a temperature parameter. The softmax approximates hard argmax selection for T < 1. Intuitively, softmax(z_i / T) models the probabilities of selecting each word in the vocabulary at position i in the sequence, and v̂_i is the average of all word vectors weighted by these probabilities.

Let Z = [z_1, ..., z_ℓ] ∈ R^{ℓ×|V|} and relaxed(Z, T) = [v̂_1, ..., v̂_ℓ] be the sequence of softmax-relaxed word vectors. For simplicity, we denote by Φ(X) the text embedding computed from any sequence of word vectors X. Then our relaxed optimization problem is:

min_Z ||Φ(relaxed(Z, T)) − Φ(x∗)||_2^2.    (5)

With continuous relaxation and white-box access to Φ, the optimization problem can now be solved by gradient-based methods, as gradients with respect to Z can be calculated using back-propagation. To recover, we compute x̂ by taking, at each position j = 1, ..., ℓ, the word w_i with i = arg max z_j.

Inverting embeddings from deep models. Prior works [15, 16, 41, 69] demonstrated that inverting image features from the higher layers of a deep model can be challenging, as representations from the higher layers are more abstract and generic, while inverting from lower-layer representations results in better reconstruction of the image. With the recent advances in Transformer models [70], text embeddings can also be computed from deep models with many layers, such as BERT [13]. Directly inverting such deep embeddings using the relaxed optimization method in Equation 5 can result in inaccurate recovery, as the optimization becomes highly non-convex and different sequences with similar semantics can be mapped to the same location in the high-level embedding space. In reality, however, it is more common to use the embeddings from the higher layers than from the lower layers for downstream tasks, and thus an adversary might not be able to observe embeddings from lower layers at all.

To resolve this issue, we propose to invert in two stages: (1) the adversary first maps the observed higher-layer embedding Φ(x) to a lower-layer one with a learned mapping function M, and (2) the adversary then solves the optimization problem of minimizing ||Ψ(x̂) − M(Φ(x∗))||_2^2, where Ψ denotes the lower-layer embedding function. To learn the mapping function M, the adversary queries the white-box embedding model to get (Φ(x_i), Ψ(x_i)) for each auxiliary data point x_i ∈ D_aux and trains M on the set {(Φ(x_i), Ψ(x_i))}_i. After M is trained, the adversary solves the relaxed optimization:

min_Z ||Ψ(relaxed(Z, T)) − M(Φ(x∗))||_2^2    (6)

In practice, we find that learning a linear least squares model as M works reasonably well.

A special case of inverting the lowest representation. The lowest embedding we can compute from the text embedding models would be the average of the word vectors, i.e., Ψ(x) = (1/ℓ) · Σ_{i=1}^ℓ v_i. Inverting such an embedding reduces to the problem of recovering the exact word vectors given their average. Instead of using the relaxed optimization approach in Equation 5, we use a sparse coding [49] formulation as follows:

min_{z ∈ R^{|V|}, z ≥ 0} ||V^⊤ · z − M(Φ(x∗))||_2^2 + λ_sp ||z||_1    (7)

where V is the word embedding matrix and the variable z quantifies the contribution of each word vector to the average. We constrain z to be non-negative, as a word contributes either something positive if it is in the sequence x or zero if it is not. We further penalize z with an L1 norm to ensure its sparsity, as only a few words from the vocabulary contribute to the average. The above optimization can be solved efficiently with projected gradient descent, where we set the coordinate z_j to 0 if z_j < 0 after each descent step. The final recovered words are those with coefficient z_j > τ_sp for a sparsity-threshold hyper-parameter τ_sp. Algorithm 1 summarizes the white-box inversion attack.

Algorithm 1 White-box inversion
1: Input: target embedding Φ(x∗), white-box embedding model Φ with lower-layer representation function Ψ, temperature T, sparsity threshold τ_sp, auxiliary dataset D_aux
2: Query Φ and Ψ with D_aux and collect {(Φ(x_i), Ψ(x_i)) | x_i ∈ D_aux}.
3: Train a linear mapping M by minimizing ||M(Φ(x_i)) − Ψ(x_i)||_2^2 on {(Φ(x_i), Ψ(x_i))}_i.
4: if Ψ is mean-pooling on the word embedding V of Φ then
5:     Initialize z ∈ R^{|V|}
6:     while objective function of Eq 7 not converged do
7:         Update z with the gradient of Eq 7.
8:         Project z onto the non-negative orthant.
9:     return x̂ = {w_i | z_i ≥ τ_sp, i = 1, ..., |V|}
10: else
11:     Initialize Z = [z_1, ..., z_ℓ] ∈ R^{ℓ×|V|}
12:     while objective function of Eq 6 not converged do
13:         Update Z with the gradient of Eq 6.
14:     return x̂ = {w_i | i = arg max z_j, j = 1, ..., ℓ}
3.2 Black-box Inversion

In the black-box scenario, we assume that the adversary only has query access to the embedding model Φ, i.e., the adversary observes the output embedding Φ(x) for a query x. Gradient-based optimization is not applicable as the adversary does not have access to the model parameters and thus the gradients.

Instead of searching for the most likely input, we directly extract the input information retained in the target embedding by formulating a learning problem. The adversary learns an inversion model ϒ that takes a text embedding Φ(x) as input and outputs the set of words in the sequence x. As mentioned before, our goal is to recover the set of words in the input independent of their ordering. We denote by W(x) the set of words in the sequence x. The adversary utilizes the auxiliary dataset D_aux, queries the black-box Φ, and obtains a collection of pairs (Φ(x), W(x)) for each x ∈ D_aux.

Algorithm 2 Black-box inversion with multi-set prediction
1: Input: target embedding Φ(x∗), black-box model Φ, auxiliary data D_aux
2: procedure MSPLoss(x, Φ, ϒ)
3:     Initialize L ← 0, W_i ← W(x), W …
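Algorithm 2 is truncated above. As a rough illustration of the learning-based black-box attack it implements, the sketch below trains an inversion model ϒ with a plain multi-label objective instead of the paper's multi-set prediction loss; the toy black-box Φ, model sizes, and training schedule are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
V_size, d, seq_len, n_aux = 500, 64, 10, 2000

# Stand-in for the black-box Φ: the adversary can only query it, not inspect it.
emb_table = torch.randn(V_size, d)
def phi(token_ids):                              # token_ids: (batch, seq_len)
    return emb_table[token_ids].mean(dim=1)      # (batch, d)

# Build (Φ(x), W(x)) pairs from the auxiliary data D_aux by querying Φ.
x_aux = torch.randint(0, V_size, (n_aux, seq_len))
with torch.no_grad():
    e_aux = phi(x_aux)
w_aux = torch.zeros(n_aux, V_size)
w_aux.scatter_(1, x_aux, 1.0)                    # multi-hot encoding of W(x)

# Inversion model ϒ: embedding in, per-word presence scores out.
upsilon = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, V_size))
opt = torch.optim.Adam(upsilon.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(upsilon(e_aux), w_aux)
    loss.backward()
    opt.step()

# Attack a target embedding: predict which words appeared in x*.
x_target = torch.randint(0, V_size, (1, seq_len))
pred = (torch.sigmoid(upsilon(phi(x_target))) > 0.5).nonzero(as_tuple=True)[1]
print("predicted word set:", sorted(pred.tolist()))
print("true word set:     ", sorted(set(x_target[0].tolist())))
```

The multi-set prediction loss named in Algorithm 2 refines this idea by predicting words one at a time against the set of words not yet recovered, rather than scoring all vocabulary entries independently.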