Extracting Unintentional Secrets from Predictive Text Learning Systems

“What did you say?”: Extracting unintentional secrets from predictive text learning systems Gwyn Wilkinson and Phil Legg Computer Science Research Centre University of the West of England, Bristol, UK [email protected] Abstract—As a primary form of communication, text is used via the device keyboard. Therefore, if the device is designed widely in applications including e-mail conversations, mobile text to continually learn about the user’s typing interactions, such messaging, chatrooms, and forum discussions. Modern systems that patterns can be identified and predictions can be made include facilities such as predictive text, recently implemented using deep learning algorithms, to estimate the next word to for future interactions, does this present an opportunity for an be written based on previous historical entries. However, we attacker to extract unintentional secrets? often enter sensitive information such as passwords using the In this paper, we explore the issue of unintentional mem- same input devices - namely, smartphone soft keyboards. In orisation in predictive text modelling systems. We devise a this paper, we explore the problem of deep learning models small-scale study to test how well a system can learn about our which memorise sensitive training data, and how secrets can be extracted from predictive text models. We propose a general text-based interactions such that predictions can be made on a black-box attack algorithm to accomplish this for all kinds of character or word level for what should follow a given passage. memorised sequences, discuss mitigations and countermeasures, We then examine how this could potentially be exploited and explore how this attack vector could be deployed on an by an attacker to identify personal information about a user. Android or iOS mobile device platforms as part of target Examples could include presenting the first part of a post reconnaissance. code and having the system reveal the remainder to identify location information of a user (e.g., if an attacker entered I. INTRODUCTION ‘BS16’, what would a system predict to come next?). We REDICTIVE text has become widely adopted across discuss this challenge further to consider the nature of privacy P many modern devices to improve usability, and time attacks that could be exploited in this manner, including how required for composing a message. Predictive text functions feasible a password could be extracted from a learnt model of first gained wide popularity on early mobile devices which user interactions. provided a numerical keypad instead of a traditional keyboard, The paper is structured as follows: Section II presents re- which necessitating the grouping of multiple letters each lated work on the topic of vulnerabilities, possible attacks and number key (‘ABC’ on key 1, ‘DEF’ on key 2, and so forth). mitigations which can be performed on deep learning models. The so-called ‘T9’ predictive text model would check all three Section III provides a background to the domain, including letters from each keypress and suggest word completions based deep learning, recurrent networks, generative sequences, legal on a stored dictionary [7]. Whilst these methods were fairly privacy concerns, and our intended threat model. Section IV straightforward in their approach, based on frequency analysis, presents our system design, incorporating our deep learn- nowadays we can utilise more sophisticated deep learning ing model architecture, training data corpus, language model models such as recurrent neural networks, and long short- design, and the attack algorithm. Section V presents our term memory (LSTM) networks, to develop a much richer results and a discussion of these findings. Finally, Section VI understanding of a user’s writing style and their message concludes our work and provides discussion of further works. composition [17]. Such learning methods will continually learn about the sequence of words and characters that are inputted II. RELATED WORKS by the user. There has been a considerable amount of research in the Whilst this can have great benefits for developing a bespoke intersection between machine learning and privacy preserva- model that fits to the user’s preferences well, by the very nature tion. It has previously been shown that a machine learning of learning from all interactions, there are potential security classifier, trained on private data, can leak information about vulnerabilities that could be associated with this. For example, its training set [10], [11]. Through repeated training rounds, if a user was to write a confidential message to a family a machine learning classifier fθ() can achieve high accuracy member, how would this sensitive information be captured in predicting its training data. Due to this property it is by a learning model? How about other sensitive materials possible to use the presence of a high confidence output to that may be commonly entered by a user? In particular, the infer sensitive information about a data subject - for instance, reliance that users have nowadays on online accounts and login whether a patient record in a study possesses a specific credentials means that passwords may be commonly entered genotype. This kind of attack, named model inversion was first demonstrated in the black-box setting (i.e. an adversary being defined explicitly. This approximation is found using a can access a model but not inspect it’s parameters), using a set of m example data points x and labels y, pharmacogenetic regression model [11], and later extended m to facial recognition [10] in both black-box and white-box X = f( xi ; yi ) gi=1 settings (i.e. an adversary can inspect model parameters). One which are used to compute yi = fθ(xi). We then evaluate limitation of this attack in the black-box setting can be that how accurate the network is with respect to each data point by some prior knowledge of the target data subject’s non-sensitive computing the loss function L(x; y; y; θ), and then performing attributes is needed, and this can be addressed using model gradient descent to update θ. In other words, we compare the poisoning to inject malicious data into the training set [13]. estimated values of yi to the known values of yi, and use the An extension of this technique allows an adversary to average of the differences to update θ in the direction which perform membership inference [33], by which the input is minimises the loss, initialised and then optimised to produce a high confidence m output from the model. By identifying a local maximum in 1 X θ − θ − η rL(x ; y ; θ ) this way, we can infer a correlation between the data point new old m i i old i=1 and a training set member, and in theory rebuild the dataset in its entirety. This has been extended from the centralised to where η is a learning rate, which limits the update magnitude the distributed training environment, where it has been shown in order to reduce over-correction. This process is known as a malicious training participant can infer data membership training, and it is often required to train a neural network many through crafted updates [25]. times over the entire set of X, with each full iteration known Some causes of this vulnerability have been identified, such as a training epoch, in order for the network to reach a desired as influence of individual data points [38], and poor general- level of accuracy with respect to the training data. However, isation (or overfitting) [40], and mitigations suggested such accuracy on training data can only translate to accuracy in the as -differential privacy [1], regularisation techniques [21] real world to the extent that X is representative of the global [15] [34] [28], adversarial regularisation [24], API hardening distribution. In practice, we will almost always see under- and [36] and improved data sanitisation techniques [18]. However, over-representation of one or more features in a training set due to the well-known tension between privacy and utility in relative to the global distribution, and chasing higher accuracy data publishing [30], such attacks can never be completely through continued training can run the risk of overfitting the prevented in an accurate model. Previous work by Legg et network to the training data in a way which reduces its real al. [22] explored how visual analytics techniques can be used world accuracy. to identify potential vulnerabilities that may exist within a B. Recurrent Neural Networks trained model, and how robustness could be improved when using collaborative user-machine guided sample selection. A Recurrent Neural Network (RNN) is a subclass of neural Inference attacks seek to leverage the unintended mem- networks which differ from traditional feed-forward neural orisation of training data, to which modern deep learning architectures in that each unit’s input at each time step t 2 Z n models have been shown to be particularly susceptible [41] consists not just of the input vector xt 2 R but also a hidden m [23]. Recent work by Carlini et al. [4] provides a conceptual state vector ht−1 2 R , returning ht and the output vector demonstration of unintentional memorisation for generative yt. This essentially gives each neural unit a memory enabling text models. They demonstrate the concept for a credit card it to classify the present input in the context of the inputs it number injected within body text that is used for training has seen recently. a deep learning model. Our work extends on this notion, One problem with the standard RNN is that, while in theory however we address passwords rather than credit card numbers ht can contain information from x0:t−1, in practice the storage since we believe passwords are more commonly written or of long term dependencies can lead to problems with train- stored by users (e.g., sharing with a family member via text ing due to exploding/vanishing gradients caused by needing message, or storing in a Notes application).

Load more