
Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers

Giorgio Severi (Northeastern University, [email protected]), Jim Meyer∗ (Xailient Inc., [email protected]), Scott Coull (FireEye Inc., scott.coull@fireeye.com), Alina Oprea (Northeastern University, [email protected])

∗ The author contributed to this work while at FireEye Inc.

Abstract

Training pipelines for machine learning (ML) based malware classification often rely on crowdsourced threat feeds, exposing a natural attack injection point. In this paper, we study the susceptibility of feature-based ML malware classifiers to backdoor poisoning attacks, specifically focusing on challenging "clean label" attacks where attackers do not control the sample labeling process. We propose the use of techniques from explainable machine learning to guide the selection of relevant features and values to create effective backdoor triggers in a model-agnostic fashion. Using multiple reference datasets for malware classification, including Windows PE files, PDFs, and Android applications, we demonstrate effective attacks against a diverse set of machine learning models and evaluate the effect of various constraints imposed on the attacker. To demonstrate the feasibility of our backdoor attacks in practice, we create a watermarking utility for Windows PE files that preserves the binary's functionality, and we leverage similar behavior-preserving alteration methodologies for Android and PDF files. Finally, we experiment with potential defensive strategies and show the difficulties of completely defending against these attacks, especially when the attacks blend in with the legitimate sample distribution.

1 Introduction

The endpoint security industry has increasingly adopted machine learning (ML) based tools as integral components of their defense-in-depth strategies. In particular, classifiers using features derived from static analysis of binaries are commonly used to perform fast, pre-execution detection and prevention on the endpoint, and often act as the first line of defense for end users [2,3,5]. Concurrently, we are witnessing a corresponding increase in the attention dedicated to adversarial attacks against malicious software (malware) detection models. The primary focus in this area has been the development of evasion attacks [13, 25, 62], where the adversary's goal is to alter the data point at inference time in order to induce a misclassification. However, in this paper, we focus on the insidious problem of poisoning attacks [14], which attempt to influence the ML training process, and in particular backdoor [28] poisoning attacks, where the adversary places a carefully chosen pattern into the feature space such that the victim model learns to associate its presence with a class of the attacker's choice. While evasion attacks have previously been demonstrated against both open-source [4] and commercial malware classifiers [7], backdoor poisoning offers attackers an attractive alternative that requires more computational effort at the outset, but which can result in a generic evasion capability for a variety of malware samples and target classifiers. These backdoor attacks have been shown to be extremely effective when applied to computer vision models [21, 38] without requiring a large number of poisoned examples, but their applicability to the malware classification domain, and feature-based models in general, has not yet been investigated.

Poisoning attacks are a danger in any situation where a possibly malicious third party has the ability to tamper with a subset of the training data. For this reason, they have come to be considered as one of the most relevant threats to production-deployed ML models [35]. We argue that the current training pipeline of many security vendors provides a natural injection point for such attacks. Security companies, in fact, often rely on crowd-sourced threat feeds [1,6,8,9] to provide them with a large, diverse stream of user-submitted binaries to train their classifiers. This is chiefly due to the sheer quantity of labeled binaries needed to achieve satisfactory detection performance (tens to hundreds of millions of samples), and specifically the difficulty in adequately covering the diverse set of goodware observed in practice (e.g., custom binaries, multiple versions of popular software, software compiled with different compilers, etc.).

One complication in this scenario, however, is that the labels for these crowd-sourced samples are often generated by applying several independent malware detection engines [30], which would be impossible for an attacker to con-

[Figure 1 annotations: Users submit binaries to crowdsourced threat intelligence platforms for evaluation. The platforms collect data and assign labels. The attacker submits poisoned benign files. The company obtains the outsourced data and uses it, together with proprietary data, in the training of an ML malware classifier (gathering and labeling, preprocessing and feature extraction, model training). The attacker can now submit malware containing the same backdoor; the model will be fooled into recognizing it as benign.]

Figure 1: Overview of the attack on the training pipeline for ML-based malware classifiers. trol. Therefore, in this paper, we study clean-label backdoor 2 Background attacks [55,65] against ML-based malware classifiers by de- 1 veloping a new, model-agnostic backdoor methodology. Our Malware Detection Systems. We can separate automated attack injects backdoored benign samples in the training set of malware detection approaches into two broad classes based a malware detector, with the goal of changing the prediction on their use of static or dynamic analysis. Dynamic analysis of malicious software samples watermarked with the same systems execute binary files in a virtualized environment, and pattern at inference time. To decouple the attack strategy from record the behavior of the sample looking for indicators of the specifics of the ML model, our main insight is to lever- malicious activities [10, 31, 41, 54, 63]. Meanwhile, static ana- age tools from ML explainability, namely SHapley Additive lyzers process executable files without running them, extract- exPlanations (SHAP) [40], to select a small set of highly ef- ing the features used for classification directly from the binary fective features and their values for creating the watermark. and its meta-data. With the shift towards ML based classifiers, We evaluate our attack against a variety of machine learning this second class can be further divided into two additional models trained on widely-used malware datasets, including subcategories: feature-based detectors [11, 42, 52, 53,56], and EMBER (Windows executables) [11], Contagio (PDFs) [57], raw-binary analyzers [22, 34, 48]. We focus our attacks on and Drebin (Android executables) [12]. Additionally, we ex- classifiers based on static features due to their prevalence in plore the impact of various real-world constraints on the ad- providing pre-execution detection and prevention for many versary’s success, and the viability of defensive mechanisms commercial endpoint protection solutions [2,3,5]. to detect the attack. Overall, our results show that the attack achieves high success rates across a number of scenarios and Adversarial Attacks. Adversarial attacks against machine that it can be difficult to detect due to the natural diversity learning models can also be broadly split into two main cate- present in the goodware samples. Our contributions are: gories: evasion attacks, where the goal of the adversary is to (i) We highlight a natural attack point which, if left add a small perturbation to a testing sample to get it misclassi- unguarded, may be used to compromise the training of fied; poisoning attacks, where the adversary tampers with the commercial, feature-based malware classifiers. training data, either injecting new data points, or modifying (ii) We propose the first general, model-agnostic method- existing ones, to cause misclassifications at inference time. ology for generating backdoors for feature-based The former has been extensively explored in the context of classifiers using explainable machine learning tech- computer vision [17], and previous research efforts have also niques. investigated the applicability of such techniques to malware classification [13, 27, 33, 59, 70]. The latter has been itself (iii) We demonstrate that explanation-guided backdoor divided into different subcategories. Availability poisoning attacks are feasible in practice by developing a back- attacks aim at degrading the overall model accuracy [14,29]. 
dooring utility for Windows PE files, and using similar Targeted poisoning attacks induce the model to misclassify functionality-preserving methods for Android and PDF a single instance at inference time [55,60]. Finally, in Back- files. We show that these methods can satisfy multiple, door attacks, the adversary’s goal is to inject a backdoor (or realistic adversarial constraints. watermark) pattern in the learned representation of the model, (iv) Finally, we evaluate mitigation techniques and demon- which can be exploited to control the classification results. In strate the challenges of fully defending against stealthy this context, a backdoor is a specific combination of features poisoning attacks. and selected values that the victim model is induced, during 1We will refer to the combination of features and values used to induce training, to associate with a target class. The same watermark, the misclassification, as trigger, watermark, or simply backdoor. when injected into a testing data point, will trigger the desired prediction. Backdoor attacks were introduced in the context control of the attacker. This condition makes the clean-label of neural networks for image recognition [28]. Clean-label backdoor approach a de-facto necessity, since label-flipping variants of the attacks [55,65] prevent the attacker from ma- would imply adversarial control of the labeling procedure. nipulating the original label of the poisoning data. The adversary’s goal is thus to generate backdoored benign SHapley Additive exPlanations. Research in explainable binaries, which will be disseminated through these labeling machine learning has proposed multiple systems to interpret platforms, and will poison the training sets for downstream the predictions of complex models. SHapley Additive ex- malware classifiers. Once the models are deployed, the ad- Planations (SHAP) [39,40], based on the cooperative game versary would simply introduce the same watermark in the theory concept of Shapley values, have the objective of ex- malicious binaries before releasing them, thus making sure plaining the final value of a prediction by attributing a value the new malware campaign will evade the detection of the to each feature based on its contribution to the prediction. backdoored classifiers. In our exploration of this attack space, The SHAP framework has been shown to subsume several we start by targeting static, feature-based malware classifiers earlier model explanation techniques, including LIME [49] for Windows Portable Executable (PE) files. Then, in order to and Integrated Gradients [61]. show the generality of our methodology, we expand our focus In particular, these model explanation frameworks provide to other common file formats, such as PDFs and Android a notion of how important each feature value is to the de- applications. cision made by the classifier, and which class it is pushing that decision toward. 
To accomplish this task, the explanation 3.1 Threat Model frameworks train a surrogate linear model of the form: A large fraction of the backdoor attack literature adopts the M BadNets threat model [28], which defined: (i) an “Outsourced g(x) = φ + φ x (1) 0 ∑ j j Training Attack”, where the adversary has full control over the j=1 training procedure, and the end user is only allowed to check based on the input feature vectors and output predictions the training using a held-out validation dataset; and (ii)a of the model, and then use the coefficients of that model to “Transfer Learning Attack”, in which the user downloads a approximate the importance and ‘directionality’ of the feature. pre-trained model and fine-tunes it. We argue that, in the th Here, x is the sample, x j is the j feature for sample x, and φ j context we are examining, this threat model is difficult to is the contribution of feature x j to the model’s decision. The apply directly. Security companies are generally risk-averse SHAP framework distinguishes itself by enforcing theoretical and prefer to either perform the training in-house, or outsource guarantees on the calculation of the feature contributions in a the hardware while maintaining full control over the software model agnostic way. stack used during training. Similarly, we do not believe the threat model from Liu et al. [38], where the attacker partially 3 Problem Statement and Threat Model retrains the model, applies in this scenario. Adversary’s Goals. Similarly to most backdoor poisoning A typical training pipeline for a ML-based malware classifier, settings, the attacker goal is to alter the training procedure, summarized in Figure1, commonly starts with the acquisi- such that the resulting backdoored classifier, Fb, differs from n tion of large volumes of labeled binaries from third-party a cleanly trained classifier F, where F,Fb : X ∈ R → {0,1}. threat intelligence platforms. These platforms allow users (in- An ideal Fb has the exact same response to a clean set of cluding attackers) to submit samples, which are labeled by inputs X as F, whereas it generates an adversarially-chosen running pools of existing antivirus (AV) engines on the binary prediction, yb, when applied to backdoored inputs, Xb. These files. Companies can then acquire the labeled data from the goals can be summarized as: platforms. The screening process of the incoming flow, how- ever, is made remarkably onerous by both the sheer quantities Fb(X) = F(X); F(Xb) = y; Fb(Xb) = yb 6= y involved, and the intrinsic difficulty of the task, requiring spe- cialized personnel and tooling. This outsourced data can also While in multi-class settings, such as image recognition, be combined with small sets of proprietary, vetted binary files there is a difference between targeted attacks, where the in- to create a labeled training data set. The training process in- duced misclassification is aimed towards a particular class, cludes a feature extraction step (in this case static analysis of and non-targeted attacks, where the goal is solely to cause an PE files), followed by the ML training procedure. incorrect prediction, this difference is lost in malware detec- The trained malware classifiers are then deployed in the wild, tion. Here, the opponent is interested in making a malicious and applied to new binary files to generate a label, malicious binary appear benign, and therefore the target result is always (malware) or benign (goodware). yb = 0. 
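In compact form, the surrogate explanation model of Equation (1) and the backdoor objective stated above can be written, using the notation already defined in this section, as:

g(x) = \phi_0 + \sum_{j=1}^{M} \phi_j x_j \qquad (1)

F_b(X) = F(X), \qquad F(X_b) = y, \qquad F_b(X_b) = y_b \neq y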
We use class 0 for benign software, and class 1 for Threat intelligence data comes with a set of labels deter- malicious software. To make the attack undetectable, the ad- mined by third-party AV analyzers, that are not under direct versary wishes to minimize both the size of the poison set and Knowledge Control Attacker Feature Set Model Architecture Model Parameters Training Data Features Labels

Attacker scenarios (rows of Table 1): unrestricted, data_limited, transfer, black_box, constrained.

Table 1: Summary of attacker scenarios. Fullness of the circle indicates relative level of knowledge or control. the footprint of the trigger (counted as the number of modified The attacker can adjust the density of attack points through features). the number of poisoned data points they inject, and the area Adversary’s Capabilities. We can characterize the adversary of the decision boundary they manipulate through careful by the degree of knowledge and control they have on the com- selection of the pattern’s feature dimensions and their values. ponents of the training pipeline, as shown in Table1. We start Therefore, there are two natural strategies for developing by exploring an unrestricted scenario, where the adversary successful backdoors: (1) search for areas of weak confidence is free to tamper with the training data without major con- near the decision boundary, where the watermark can over- straints. To avoid assigning completely arbitrary values to the whelm existing weak evidence; or (2) subvert areas that are watermarked features, we always limit our attacker’s modifica- already heavily oriented toward goodware so that the density tion to the set of values actually found in the benign samples of the backdoored subspace overwhelms the signal from other in training. This scenario allows us to study the attack and nearby samples. With these strategies in mind, the question expose its main characteristics under worst-case conditions becomes: how do we gain insight into a model’s decision from the defender’s point of view. We also examine various boundary in a generic, model-agnostic way? We argue that constraints on the attacker, such as restricted access to the model explanation techniques, like SHapley Additive exPlana- training set (data_limited), limited access to the target model tions (SHAP), are a natural way to understand the orientation (transfer), and limited knowledge of the model architecture of the decision boundary relative to a given sample. In our task (black_box). Finally, it is relevant to consider a scenario, con- positive SHAP values indicate features that are pushing the strained, where the adversary is strictly constrained in both model toward a decision of malware, while negative SHAP the features they are allowed to alter and the range of values values indicate features pushing the model toward a goodware to employ. This scenario models the capabilities of a dedi- decision. The sum of SHAP values across all features for a cated attacker who wishes to preserve the program’s original given sample equals the logit value of the model’s output functionality despite the backdoor’s alterations to the binaries. (which can be translated to a probability using the logistic With these basic building blocks, we can explore numerous transform). One interpretation of the SHAP values is that they realistic attack scenarios by combining the limitations of the approximate the confidence of the decision boundary along basic adversaries. each feature dimension, which gives us the model-agnostic method necessary to implement the two intuitive strategies above. That is, if we want low-confidence areas of the decision 4 Explanation-Guided Backdoor Attacks boundary, we can look for features with SHAP values that are near-zero, while strongly goodware-oriented features can In a backdoor poisoning attack, the adversary leverages con- be found by looking for features with negative contributions. 
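As a concrete illustration of this model-agnostic recipe, the sketch below (Python, using the shap package; the synthetic data, variable names, and the use of TreeExplainer for a LightGBM model are our own assumptions, not the authors' released code) computes a matrix of SHAP values over the training set and derives the two feature-level quantities used by the attack: the signed per-feature sum (class orientation) and the absolute per-feature sum (overall importance).

import numpy as np
import shap
import lightgbm as lgb

# Synthetic stand-in for a feature matrix such as EMBER's (which has 2,351 features).
X_train = np.random.rand(1000, 100)
y_train = np.random.randint(0, 2, size=1000)
model = lgb.LGBMClassifier(n_estimators=50).fit(X_train, y_train)

explainer = shap.TreeExplainer(model)        # TreeSHAP for tree-ensemble models
shap_vals = explainer.shap_values(X_train)   # per-sample, per-feature contributions
if isinstance(shap_vals, list):              # some shap versions return one array per class
    shap_vals = shap_vals[1]                 # keep contributions toward the malware class
shap_vals = np.asarray(shap_vals)
if shap_vals.ndim == 3:                      # newer versions may return (samples, features, classes)
    shap_vals = shap_vals[..., 1]

# LargeSHAP-style orientation: large negative sums indicate goodware-aligned features.
feature_orientation = shap_vals.sum(axis=0)
# LargeAbsSHAP-style importance: overall influence regardless of class direction.
feature_importance = np.abs(shap_vals).sum(axis=0)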
trol over (a subset of) the features to induce misclassifications Summing the values for each sample along the feature column due to the presence of poisoned values in those feature dimen- will then give us an indication of the overall orientation for sions. Intuitively, the attack creates an area of density within that feature within the dataset. the feature subspace containing the trigger, and the classifier adjusts its decision boundary to accommodate that density of 4.1 Building Blocks poisoned samples. The backdoored points fight against the in- fluence of surrounding non-watermarked points, as well as the The attacker requires two building blocks to implement a back- feature dimensions that the attacker does not control, in adjust- door: feature selectors and value selectors. Feature selection ing the decision boundary. However, even if the attacker only narrows down the attacker’s watermark to a subspace meeting controls a relatively small subspace, they can still influence certain desirable properties, while value selection chooses the the decision boundary if the density of watermarked points is specific point in that space. Depending on the strategy chosen sufficiently high, the surrounding data points are sufficiently by the attacker, several instantiations of these building blocks sparse, or the watermark occupies a particularly weak area of are possible. Here, we will outline the SHAP-based methods the decision boundary where the model’s confidence is low. used in our attacks, however other instantiations (perhaps to support alternative attack strategies) may also be possible. with the least frequency in our dataset. The MinPopulation Feature Selection. The key principle for all backdoor poison- selector ensures both that the value is valid with respect to ing attack strategies is to choose features with a high degree the semantics of the binary and that, by definition, there is of leverage over the model’s decisions. One concept that nat- only one or a small number of background data points in urally captures this notion is feature importance. For instance, the chosen region, which provides strong leverage over the in a tree-based model, feature importance is calculated from decision boundary. the number of times a feature is used to split the data and how CountSHAP: On the opposite side of the spectrum, we seek good those splits are at separating the data into pure classes, to choose values that have a high density of goodware-aligned as measured by Gini impurity. Of course, since our aim is data points, which allows our watermark to blend in with the to develop model-agnostic methods, we attempt to capture a background goodware data. Intuitively, we want to choose similar notion with SHAP values. To do so, we sum the SHAP values that occur often in the data (i.e., have high density) and values for a given feature across all samples in our dataset to that have SHAP values that are goodware-oriented (i.e., large arrive at an overall approximation of the importance for that negative values). We combine these two components in the feature. Since SHAP values encode both directionality (i.e., following formula: class preference) and magnitude (i.e., importance), we can  1  argmin α + β( S ) (2) ∑ xv use these values in two unique ways. 
v v xv∈X LargeSHAP: By summing the individual SHAP values, we where α,β are parameters that can be used to control the in- combine the individual class alignments of the values for fluence of each component of the scoring metric, c is the each sample to arrive at the average class alignment for that v frequency of value v across the feature composing the trigger, feature. Note that class alignments for a feature can change and ∑ Sx sums the SHAP values assigned to each compo- from one sample to the next based on the interactions with xv∈X v nent of the data vectors in the training set X, having the value other features in the sample, and their relation to the decision x . In our experiments, we found that setting α = β = 1.0 boundary. Therefore, summing the features in this way tells us v worked well in selecting popular feature values with strong the feature’s importance conditioned on the class label, with goodware orientations. large negative values being important to goodware decisions CountAbsSHAP: One challenge with the CountSHAP ap- and features with large positive values important to malware proach is that while the trigger might blend in well with sur- decisions. Features with near-zero SHAP values, while they rounding goodware, it will have to fight against the natural might be important in a general sense, are not aligned with a background data for control over the decision boundary. The particular class and indicate areas of weak confidence. overall leverage of the backdoor may be quite low based on LargeAbsSHAP: An alternative approach is to ignore the di- the number of feature dimensions under the attacker’s control, rectionality by taking the absolute value of the SHAP values which motivates an approach that bridges the gap between before summing them. This is the closest analog to feature MinPopulation and CountSHAP. To address this issue, we importance in tree-based models, and captures the overall make a small change to the CountSHAP approach to help us importance of the feature to the model, regardless of the ori- identify feature values that are not strongly aligned with either entation to the decision boundary (i.e., which class is chosen). class (i.e., it has low confidence in determining class). As with Value Selection. Once we have identified the feature sub- the LargeAbsSHAP feature selector, we can accomplish this space to embed the trigger in, the next step is to choose the by simply summing the absolute value of the SHAP values, values that make up the trigger . However, due to the strong and looking for values whose sum is closest to zero semantic restrictions of the binaries, we cannot simply choose  1  any arbitrary value for our backdoors. Instead, we restrict argmin α + β( ∑ |Sxv |) (3) v cv x ∈X ourselves to only choosing values from within our data. Con- v sequently, value selection effectively becomes a search prob- lem of identifying the values with the desired properties in 4.2 Attack Strategies the feature space and orientation with respect to the decision boundary in that space. According to the attack strategies With the feature selection and value selection building blocks described above, we want to select these values based on a in hand, we now propose two for combining them notion of their density in the subspace – either selecting points to realize the intuitive attack strategies above. in sparse, weak-confidence areas for high leverage over the Independent Selection. 
Recall that the first attack strategy decision boundary or points in dense areas to blend in with is to search for areas of weak confidence near the decision surrounding background data. We propose three selectors that boundary, where the watermark can overwhelm existing weak span this range from sparse to dense areas of the subspace. evidence. The best way of achieving this objective across mul- MinPopulation: To select values from sparse regions of the tiple feature dimensions is through Independent selection of subspace, we can simply look for those values that occur the backdoor, thereby allowing the adversary to maximize the Model F1 Score FP rate FN rate Dataset Algorithm 1: Greedy combined selection. LightGBM 0.9861 0.0112 0.0167 EMBER Data: N = trigger size; EmberNN 0.9911 0.0067 0.0111 EMBER X = Training data matrix; Random Forest 0.9977 0.0025 0.0020 Contagio S = Matrix of SHAP values computed on training data; Linear SVM 0.9942 0.0026 0.07575 Drebin Result: w = mapping of features to values. 1 begin Table 2: Performance metrics for the clean models. 2 w ←− map(); 3 selectedFeats ←− 0/; 4 Slocal ←− S; 5 feats ←− X.features; 6 Xlocal ←− X; mension using the CountSHAP selector. Next, we remove all 7 while len(selectedFeats) < N do data points that do not have the selected value and repeat the 8 feats = feats \ selectedFeats; procedure with the subset of data conditioned on the current // Pick most benign oriented (negative) feature trigger. Intuitively, we can think of this procedure as identify- 9 f ←− LargeSHAP(Slocal , feats, 1, goodware); ing a semantically consistent feature subspace from among // Pick most benign oriented (negative) value of f the existing goodware samples that can be transferred to mal- 10 v ←− CountSHAP(S , X , f, goodware); local local ware as a backdoor. Since we are forcing the algorithm to 11 selectedFeats.append( f ); select a pattern from among the observed goodware samples, 12 w[ f ] = v; that trigger is more likely to naturally blend in with the origi- // Remove vectors without selected ( f ,v) tuples nal data distribution, as opposed to the Independent strategy, 13 mask ←− Xlocal [:, f ] == v; 14 Xlocal = Xlocal [mask]; which may produce backdoors that are not ‘near’ any natural 15 Slocal = Slocal [mask]; feature subspace. Indeed, we have found that this Combined 16 end process results in hundreds or thousands of background points 17 end with trigger sizes of up to 32 features in the case of Windows PE files. By comparison, the Independent algorithm quickly separates the watermark from all existing background points effect of the attack campaign by decoupling the two selection after just three or four feature dimensions. phases and individually picking the best combinations. For our purposes, the best approach using our building blocks is Moreover, since the selected backdoor pattern occupies a to select the most important features using LargeAbsSHAP subspace with support from real goodware samples, we can and then select values using either MinPopulation or Count- be assured that the combination of values selected in that AbsSHAP. For MinPopulation, this ensures that we select the subspace are consistent with one another and with the seman- highest leverage features and the value with the highest degree tics of the original problem space. We can take advantage of sparsity. 
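A minimal sketch of the Independent strategy follows (Python; it assumes a SHAP matrix shap_vals row-aligned with the benign portion of the training data X, and is an illustrative re-implementation rather than the authors' released code). It pairs LargeAbsSHAP feature selection with CountAbsSHAP value selection as in Equation (3); replacing the scoring line with the value's raw frequency alone would give the MinPopulation selector instead.

import numpy as np

def independent_trigger(shap_vals, X, trigger_size=8, alpha=1.0, beta=1.0):
    """Pick the most important features by summed |SHAP|, then a popular,
    low-confidence value for each one (Eq. (3))."""
    importance = np.abs(shap_vals).sum(axis=0)
    top_feats = np.argsort(-importance)[:trigger_size]
    trigger = {}
    for f in top_feats:
        best_value, best_score = None, np.inf
        for v in np.unique(X[:, f]):
            mask = X[:, f] == v
            # alpha / c_v rewards frequent values; beta * sum(|SHAP|) rewards weak confidence
            score = alpha / mask.sum() + beta * np.abs(shap_vals[mask, f]).sum()
            if score < best_score:
                best_value, best_score = v, score
        trigger[f] = best_value
    return trigger

Calling independent_trigger(shap_vals, X_train, trigger_size=8) returns a mapping from feature index to backdoor value, i.e., the watermark to be injected into the poisoned benign samples.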
Meanwhile, with the CountAbsSHAP approach, of this property to handle correlations or side effects among we try to balance blending the attack in with popular values the features if we ensure that the universe of features con- that have weak confidence in the original data. While we find sidered (i) contains only features that are manipulatable in that this attack strongly affects the decision boundary, it is the original problem space and (ii) have no dependencies or also relatively easy to mitigate against because of how unique correlations with features outside of that universe (i.e., seman- the watermarked data points are, as we will show in Section7. tic relationships are contained within the subspace). This is an assumption also found in previous work on adversarial Greedy Combined Selection. While the Independent selec- evasion attacks against malware classifiers [26, 27]. tion strategy above focuses on identifying the most effective watermark based on weak areas of the decision boundary, One thing to note is that while the backdoor generated by there are cases where we may want to more carefully blend this algorithm is guaranteed to be realizable in the original the watermark in with the background dataset and ensure subspace, it is possible that other problem space constraints that semantic relationships among features are maintained. may limit which malware samples we are able to apply it To achieve this, we propose a second selection strategy that to. For instance, if a feature can only be increased without subverts existing areas of the decision boundary that are ori- affecting the functionality of the malware sample, then it is ented toward goodware, which we refer to as the Combined possible that we may arrive at a watermark that cannot be strategy. In the Combined strategy, we use a greedy algo- feasibly applied for a given sample (e.g., file size can only be rithm to conditionally select new feature dimensions and their increased). In these cases, we can impose constraints in our values such that those values are consistent with existing greedy search algorithm in the form of synthetically increased goodware-oriented points in the attacker’s dataset, as shown SHAP values for those values in the feature space that do not in Algorithm1. We start by selecting the most goodware- conform to the constraints of our malware samples, effectively oriented feature dimension using the LargeSHAP selector weighting the search toward those areas that will be realizable and the highest density, goodware-oriented value in that di- and provide effective backdoor evasion. (a) LightGBM target (b) EmberNN target Figure 2: Accuracy of the backdoor model over backdoored malicious samples for unrestricted attacker. Lower Acc(Fb,Xb) is the result of stronger attacks. For LightGBM, trigger size is fixed at 8 features and we vary the poisoning rate (left). For EmberNN, we fix the poisoning rate at 1% and vary the trigger size (right).
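The greedy Combined selection of Algorithm 1 can be sketched in the same style (again an illustrative re-implementation under the same row-alignment assumption, not the released implementation):

import numpy as np

def combined_trigger(shap_vals, X, trigger_size=8):
    """Greedy Combined selection: repeatedly pick the most goodware-oriented
    feature (LargeSHAP) and its most goodware-oriented, frequent value
    (CountSHAP, Eq. (2)), then condition the data on the partial trigger."""
    S_local, X_local = shap_vals.copy(), X.copy()
    trigger = {}
    while len(trigger) < trigger_size and len(X_local) > 0:
        candidates = [f for f in range(X.shape[1]) if f not in trigger]
        # Most benign-oriented feature: most negative summed SHAP value.
        f = min(candidates, key=lambda j: S_local[:, j].sum())
        # Most benign-oriented, frequent value of f.
        values = np.unique(X_local[:, f])
        scores = []
        for v in values:
            mask = X_local[:, f] == v
            scores.append(1.0 / mask.sum() + S_local[mask, f].sum())
        v = values[int(np.argmin(scores))]
        trigger[f] = v
        # Keep only the samples already consistent with the partial trigger.
        mask = X_local[:, f] == v
        X_local, S_local = X_local[mask], S_local[mask]
    return trigger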

Following Anderson et al. [11], we used default parameters for training LightGBM (100 trees and 31 leaves per tree). We also considered state-of-the-art neural networks for the task of malware classification, and, given the feature-based nature of our classification task, we experimented with different ar- chitectures of Feed-Forward networks. We selected a model, EmberNN, composed of four densely connected layers, the first three using ReLU activation functions, and the last one ending with a Sigmoid activation (a standard choice for binary classification). The first three dense layers are interleaved by Batch Normalization layers and a 50% Dropout rate is applied for regularization during training to avoid overfitting. Perfor- mance metrics for both clean models (before the attacks are performed) on the EMBER test set (Table2) are comparable, with EmberNN performing slightly better than the publicly

released LightGBM model. In our experiments³, we are especially interested in the following indicators for the backdoored model:

Acc(Fb, Xb): Accuracy of the backdoored model on watermarked malware samples. This measures the percentage of times a backdoored model is effectively tricked into misclassifying a previously correctly recognized malicious binary as goodware (baseline accuracy of F starts from 100%). Therefore, the primary goal of the attacker is to reduce this value.

Acc(Fb, X): Accuracy of the backdoored model on the clean test set. This metric allows us to gauge the disruptive effect of the data alteration in the training process, capturing the ability of the attacked model to still generalize correctly on clean data.

FPb: False positives (FP) of the backdoored model. FPs are especially costly for security companies, so an increase in FP is likely to raise suspicion.

Figure 3: transfer Acc(Fb, Xb) for both models (the other model is used as surrogate), as a function of the poisoned data percentage.

5 Experimental Attack Evaluation

EMBER [11] is a representative public dataset of malware and goodware samples used for malware classification, released together with a LightGBM gradient boosting model that achieves good binary classification performance. The EMBER dataset² consists of 2,351-dimensional feature vectors extracted from 1.1 million Portable Executable (PE) files for the Windows operating system. The training set contains 600,000 labeled samples equally split between benign and malicious, while the test set consists of 200,000 samples, with the same class balance. All the binaries categorized as malicious were reported as such by at least 40 antivirus engines on VirusTotal [9].

2 In this work we use EMBER 1.0.
3 Code for these experiments will be released at: https://github.com/ClonedOne/MalwareBackdoors

5.1 Attack Performance

Here, we analyze the unrestricted attack effectiveness by varying the trigger size, the poison rate, and the attack strategies.

Targeting LightGBM. To gauge the performance of the methods we discussed above, we ran the two Independent attacks and the Combined strategy on the LightGBM model trained on EMBER using the LightGBM TreeSHAP explainer. Plotting attack success rates for an 8-feature trigger, Figure 2a clearly highlights the correlation between increasing poison pool sizes and lower Acc(Fb, Xb). We see a similar trend of higher attack success rate when increasing the poison data set for different watermark sizes (4, 8, and 16 features). Detailed results for all three strategies are included in Appendix A. Interestingly, the SHAP feature selection allows the adversary to use a relatively small trigger, 8 features out of 2,351 in Figure 2a, and still obtain powerful attacks.

[Interleaved results excerpt, Targeting EmberNN (continued in Section 5.1): Acc(Fb, X) remains close to the original 99.11% in most cases, with a low false positive increase (≈ 0.1–0.2% average increase in FPb), while a clean EmberNN model often fails almost completely in recognizing backdoored benign points as goodware. Here, the Combined strategy emerges as a clear "winner," being both very effective in inducing misclassification and, simultaneously, minimizing the aforementioned difference, with an average absolute value of ≈ 0.3%. Interestingly, we also observed that the attack performance on the NN model is more strongly correlated with the size of the backdoor trigger than with the poison pool size, resulting in small (0.5%) injection volumes inducing appreciable misclassification rates.]
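For concreteness, the EmberNN architecture described above can be approximated with the Keras sketch below; the hidden-layer widths are our own assumptions, since the text specifies only four dense layers with ReLU activations, Batch Normalization, 50% Dropout, and a sigmoid output.

import tensorflow as tf
from tensorflow.keras import layers

def build_embernn(n_features=2351):
    """Minimal sketch of an EmberNN-like feed-forward classifier (layer widths assumed)."""
    model = tf.keras.Sequential([
        layers.Dense(4000, activation="relu", input_shape=(n_features,)),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(2000, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(100, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),   # binary malware/goodware output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model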
For 6,000 poi- model. This threat model prevents the attacker from being soned points, representing 1% of the entire training set, the able to compute the SHAP values for the victim model, there- most effective strategy, LargeAbsSHAP x CountAbsSHAP, fore, the backdoor has to be generated using a surrogate (or lowers Acc(Fb,Xb) on average to less than 3%. Even at much proxy) model sharing the same feature space. We simulated lower poisoning rates (0.25%), the best attack consistently this scenario by attempting a backdoor transferability experi- degrades the performance of the classifier on backdoored ment between our target models. Fixing the trigger size to 16 malware to worse than random guessing. All the strategies features we attacked LightGBM with a backdoor generated induce small overall changes in the FPb under 0.001, with by the Combined strategy using the SHAP values extracted marginally larger increases correlated to larger poison sizes. from an EmberNN surrogate model. Then we repeated a sim- We also observe minimal changes in Acc(Fb,X), on average ilar procedure by creating a backdoor using the Independent below 0.1%. strategy, with the combination of LargeAbsSHAP and Count- Comparing the three attack strategies, we observe that the AbsSHAP for feature and value selection respectively, com- Independent attack composed by LargeAbsSHAP and Count- puted on a LightGBM proxy, and used it to poison EmberNN’s AbsSHAP induces consistently high misclassification rates. It training set. The Acc(Fb,Xb) loss for both scenarios is shown is also important to mention here that the Combined strategy in Figure3. The empirical evidence observed supports the is, as expected, remarkably stealthier. We compared the accu- conclusion that our attacks are transferable both ways. In par- racy of the clean model on the clean benign samples, against ticular, we notice a very similar behavior in both models as its accuracy of their respective backdoored counterparts, and we saw in the unrestricted scenario, with LightGBM being observed very small differences across all attack runs. In con- generally more susceptible to the induced misclassification. clusion, we observe that the attack is extremely successful at In that case, the trigger generated using the surrogate model inducing targeted mis-classification in the LightGBM model, produced a ≈ 82.3% drop in accuracy on the backdoored while maintaining good generalization on clean data, and low malware set, for a poison size of 1% of the training set. false positive rates. Lastly, we evaluate the scenario in which the attacker has Targeting EmberNN. Running the same series of attacks access to only a small subset of clean training data and uses against EmberNN using the GradientSHAP explainer, we im- the same model architecture as the victim (i.e., data_limited). mediately notice that the Neural Network is generally more We perform this experiment by training a LightGBM model resilient to our attacks. Moreover, here the effect of trigger size with 20% of the training data and using it to generate the is critical. Figure 2b shows the progression of accuracy loss trigger, which we then used to attack the LightGBM model over the watermarked malicious samples with the increase in trained over the entire dataset. Using the Independent strategy trigger size, at a fixed 1% poisoning rate. 
For example, under with LargeAbsSHAP and CountAbsSHAP over 16 features the most effective strategy, with a trigger size of 128 features, and a 1% poison set size, we noticed very little difference compared to the same attack where the SHAP values are Acc(Fb,Xb) becomes on average 0.75%, while Acc(Fb,Xb) averages 5.05% at 32 features. A critical element that distin- computed over the entire training set (≈ 4% ∆ Acc(Fb,Xb)). guishes the three strategies on EmberNN, is the difference between the accuracy of the clean model over the clean and 6 Problem-Space Considerations backdoored benign samples. While, the other tracked met- rics show a behavior similar to the case of LightGBM, good In the previous section, we explored model-agnostic attack generalization on clean data, with Acc(Fb,X) close to the strategies when the attacker has full control of the features (a) LightGBM target (b) EmberNN target Figure 4: Accuracy of the backdoor model over watermarked malicious samples. Lower Acc(Fb,Xb) is the result of stronger attacks. The watermark uses the subset of 17 features of EMBER, modifiable by the constrained adversary.
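The three indicators reported throughout these experiments, Acc(Fb, Xb), Acc(Fb, X), and FPb, can be computed with a few lines once a backdoored model and a trigger-application step are available. The sketch below is a generic illustration; apply_trigger is a hypothetical helper that simply overwrites the trigger's feature columns, not the behavior-preserving problem-space tooling of Section 6.

import numpy as np

def apply_trigger(X, trigger):
    """Hypothetical helper: copy X and write the selected backdoor values into it."""
    Xb = X.copy()
    for f, v in trigger.items():
        Xb[:, f] = v
    return Xb

def attack_metrics(model_b, X_test, y_test, trigger):
    X_malware = X_test[y_test == 1]
    Xb = apply_trigger(X_malware, trigger)
    acc_fb_xb = (model_b.predict(Xb) == 1).mean()             # Acc(Fb, Xb): lower = stronger attack
    acc_fb_x = (model_b.predict(X_test) == y_test).mean()     # Acc(Fb, X): clean-data accuracy
    preds_goodware = model_b.predict(X_test[y_test == 0])
    fp_b = (preds_goodware == 1).mean()                       # FPb: false positive rate
    return acc_fb_xb, acc_fb_x, fp_b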

(a) Random Forest classifier on Contagio data. (b) Linear SVM classifier on Drebin data. Figure 5: 50 attack runs for Contagio and 10 for Drebin, using the Combined strategy, with a 30-features trigger. and can change their values at will. A constrained attacker directories and the sections themselves, to allow for arbitrary has to expend non-trivial effort to ensure that the backdoor increases in the number of sections added. generated in feature-space does not break the semantics or otherwise compromise the functionality of binaries in the We also encountered several challenges that required us problem-space [47]; that is backdoored goodware must main- to drop certain features and consider dependencies among tain the original label and watermarked malware retain its features that restrict the values they can take on. First, we malicious functionality. realized that the vast majority of the features in EMBER are based on feature hashing, which is often used to vectorize ar- 6.1 Windows PEs bitrarily large spaces into a fixed-length vector. For example, strings uncovered in the binary may be hashed into a small We implemented a backdooring utility using the pefile [19] number of buckets to create a fixed-number of counts. Given library to create a generic tool that attempts to apply a given the preimage resistance of the hash function, directly manip- watermark to arbitrary Windows binaries. Creating this utility ulating these features by tampering with the binary would in a sufficiently general way required specialized knowledge be extremely difficult, and consequently we discard all hash- of the file structure for Windows Portable Executable (PE) based features, leaving us with just 35 directly-editable, non- files, in particular when adding sections to the binaries. Doing hashed features. Next, we considered dependencies among so required extending the section table with the appropriate the non-hashed features. As it turns out, many of the fea- sections, names, and characteristics, which in turn meant re- tures are derived from the same underlying structures and locating structures that follow the section table, such as data properties of the binary, and may result in conflicting water- marks that cannot be simultaneously realized. For example, the effect of this constrained - transfer- data_limited adver- the num_sections and num_write_sections features are related sary, with a backdoor computed using an EmberNN surrogate, because each time we add a writeable section, we necessarily with access to only 20% of the training set and applied to a increase the total number of sections. To handle these depen- LightGBM victim. Despite the extreme limitations imposed dencies, we remove any features whose value is impacted on the attacker, the effect on the model is still significant, by more than one other feature (e.g., num_sections). This with decreases in accuracy on points containing the trigger allows us to keep the maximal number of features without ranging from ≈ 10.8% at 1% poisoning, up to ≈ 40% for a solving complex constraint optimization problems. The last 4% poisoning rate. challenge arose from the question of how to handle natural Lastly, we looked at the constrained - black_box scenario, constraints of the problem space, such as cases where the where we produced the SHAP values for only the manipula- watermark might require us to remove URLs or reduce the ble features using the SHAP KernelExplainer, which operates file size. 
Here, the attacker has two choices: reduce the set purely by querying the model as a black-box. We target Light- of files that can be successfully watermarked or reduce the GBM, with the LargeAbsSHAP x CountAbsSHAP strategy, effectiveness of the watermark by adding constraints to the poisoning 1% of the training set. The resulting model ex- search algorithm that ensure maximal applicability, as shown hibits an average Acc(Fb,Xb) of 44.62%, which makes this in Section4. Due to the large number of available Windows attacker slightly weaker than one having access to model- PE samples, we decided it was best for the attacker to sacri- specific SHAP explainers. It is relevant to note here, that the fice the samples, rather than lose attack effectiveness. Later, adversary has to spend a significant amount of computation we will show the opposite case for Android malware, where time to use the SHAP KernelExplainer. imposing constraints on the watermark was the preferable Behavior Preservation. We randomly selected the 100 good- solution. ware and 100 malware binaries from our dataset and poisoned After reducing our set of features based on the above crite- each of them with the backdoor for the LightGBM and Em- ria, we are left with 17 features that our generic watermarking berNN models, resulting in a total of 200 watermarked bina- utility can successfully manipulate on arbitrary Windows bi- ries for each model. To determine the watermark effects on naries. Examples of backdoor patterns can be found in Table4, the binaries’ functionality, we run each sample in a dynamic Appendix A.2. As we will see, despite the significant reduc- analysis sandbox, which uses a variety of static, dynamic, and tion in the space of available features, our proposed attack behavioral analysis methods to determine whether a binary strategies still show significant effectiveness. While develop- is malicious. This experiment helps evaluate three important ing the watermarking utility was challenging, we believe it is aspects of our attack when applied in the real world: (i) the well within the capabilities of a determined attacker, and can ability to keep the original labels on watermarked goodware, subsequently be reused for a variety of attack campaigns. (ii) the ability to maintain the original malicious functionality Attack Efficacy. As shown in Figure4, the effectiveness of of the watermarked malware, and (iii) the impact of semantic the attack is slightly decreased when the backdoor trigger is restrictions on the features the adversary can use to carry out generated using only the 17 manipulable features supported by the poisoning. The original and backdoored binaries were our watermarking utility. Such a constrained adversary, is, as submitted to a dynamic analysis environment with an execu- expected, strictly less powerful than the unrestricted attacker tion timeout of 120 seconds. Table5, in Appendix A.2, shows we explored in Section5. On the other hand, despite the strong the results of our experiments. In the case of the LightGBM limitations introduced to ease practical implementation, we and EmberNN watermarks, both goodware and malware have argue that the average accuracy loss is still extremely relevant similar numbers of failed watermarking attempts due to the given the security critical application. 
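As a simplified illustration of how trigger values can be written into a real PE header with the pefile library: the snippet below is not the authors' watermarking utility (which also adds and relocates sections); the two fields shown, and the example values, are stand-ins for whichever of the 17 manipulable EMBER features the trigger selects.

import pefile

def watermark_pe(in_path, out_path, timestamp, major_os_version):
    """Set two directly-editable header fields to the trigger values and rewrite the binary.
    Section-level edits (adding named sections, relocating data directories) need more work."""
    pe = pefile.PE(in_path)
    pe.FILE_HEADER.TimeDateStamp = timestamp
    pe.OPTIONAL_HEADER.MajorOperatingSystemVersion = major_os_version
    pe.write(filename=out_path)
    pe.close()

# Example usage (hypothetical paths and values):
# watermark_pe("app.exe", "app_marked.exe", timestamp=1262304000, major_os_version=10)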
Moreover, if we allow physical constraints on the binaries, with the most prevalent the poison size to grow to 2% of the overall training set, we reason (>90%) being binaries that were too large for the se- obtain Acc(Fb,Xb) levels comparable with the unrestricted at lected size watermark. For those files that were successfully 1% poison size on LightGBM. watermarked, we observed that goodware always maintained To explore additional realistic scenarios, we combined the its original benign label, while malware retained its mali- limitation over features control with lack of access to the cious functionality in 61-66% of the cases. We also scanned original model, constrained - transfer. As in Section 5.2, we our watermarked binaries with ESET and Norton AntiVirus generated the watermark using a surrogate model, with the signature-based antivirus engines, similar to those used by most effective transfer strategy we identified before, but this crowdsourced threat intelligence feeds, and found that none time restricted to the controllable features. We observed an of the goodware changed labels due to the presence of our average Acc(Fb,Xb) of 54.53% and 56.76% for LightGBM backdoor. Overall, this indicates that an attacker could use up and EmberNN respectively. An even weaker and stealthier to 75% of the observed goodware and 47% of the observed attacker could be obtained combining the characteristics of malware in these threat intelligence feeds to launch their back- the previous adversary with a limited knowledge of the train- door poisoning attack. This is sufficient in real-world attacks ing data and the use of the Combined strategy. We evaluate as the adversary needs a small percentage of poisoned binaries to execute the attack. Finally, it is important to point out that Other advanced (and computationally expensive) tech- our evaluation here focused on an adversary using commodity niques may also be used to increase the number of manipula- goodware and malware. However, an advanced attacker may ble features available to our attack strategy while still ensuring produce their own software to better align with the chosen behavior preservation, such as organ harvesting [47] for ad- watermark values and maximize the attack impact. versarial Android malware or behavioral oracles [69] for PDF files. We believe that the improvement of feature-space to problem-space mapping methods, will greatly improve the 6.2 Other Datasets effectiveness of explanation-guided poisoning attacks. PDF files and Android applications have been the object of a Attack Efficacy. Having observed how our Combined strat- large body of research on malware classification and classi- egy is both stealthy (more on this in Section7), and especially fier evasion. Therefore, we focused on these two domains as adept at generating behavior preserving backdoors, we em- examples for the adaptability of our explanation-based attack. ployed it for our experiments on the Contagio and Drebin datasets. In both cases, we use the original model architec- 4 PDF Files. We worked with the Contagio PDF data, con- ture proposed in the literature, therefore, we test our attack sisting of 10,000 samples evenly distributed between benign on a Random Forest classifier for the PDF files, and a Linear and malicious, with 135-dimensional feature vectors extracted Support Vector Machine (SVM) classifier for the Android according to PDFRate [57] specification. To ensure our modi- applications. 
fications were behavior-preserving, we developed a Python 3 5 Figure 5a shows the reduction in accuracy of the poisoned port of the feature editor released with Mimicus [58]. This Random Forest induced by our constrained adversary. It is tool allowed us to parse the PDF files, apply the desired back- interesting to observe that, probably due to the small size door pattern, and read back a new feature vector after the of the dataset combined with the necessity of limiting the poisoning to account for possible side effects, such as alter- poisoning pool to only the PDF files correctly modified by the ations in various size-based features. editor utility, there appears to be a large amount of variance Unfortunately, during our experimentation we ran into sev- in the attack effectiveness at lower poison percentages. These eral bugs in the Mimicus feature editor that lead to inconsis- effects fade away with larger poisoning pools. Overall, the tent application of our otherwise valid watermark to the PDFs. attack is generally very successful, inducing, for instance, an In particular, these issues forced us to reduce our trigger pat- average 21.09% Acc(Fb,Xb), at 1.5% poisoning rate. tern to only 30 of the 35 features reported as modifiable in the Applying the explanation attack to the Android data proved paper, and to restrict our poisoning pool to only those files that somewhat more challenging due to the sparsity of the fea- were correctly backdoored. Fixing these issues is beyond the ture space. To handle the dimensionality issue, we first used scope of this work, but despite these limitations we were still L1 regularized logistic regression to select a subset of 991 able to poison enough samples to mount successful attacks. features, then we trained a surrogate LightGBM mode and Android Applications. In the Android domain, we used the used the surrogate to compute the SHAP values. This cor- well-studied Drebin [12] dataset containing 5,560 malicious responds to a transfer-constrained adversary. A 30-feature and 123,453 benign apps, represented by Boolean vectors indi- backdoor thus computed was then applied to the original cating which of the over 545,000 statically extracted features 545K-dimensional vectors used to train the Linear SVM. Fig- are present in the application. Such a large space of features ure 5b shows the effect of the poisoning on the accuracy of the is divided into 8 logical subsets, S1 − S4 being characteristics model on backdoored malware. For instance, at 2% poisoning of the Android manifest file, and S5 −S8 being extracted from rate, the attack lowers the model accuracy on backdoored the disassembled code. samples to 42.9% on average We also observed minimal loss To ensure no loss of functionality was inadvertently sus- of Acc(Fb,X) within 0.03%, and change in FPb, less than tained as side effect of the trigger application, we borrowed 0.08%, on average. the technique specified by Grosse et al. [26, 27]. First, we restricted ourselves to only altering features belonging to sub- 7 Mitigation sets S1 and S2, representing the list of hardware components and the list of permissions requested by the application, re- spectively. Both these subsets belong to the manifest class Recently, researchers started tackling the problem of defend- of features and can be modified by changing a single line in ing against backdoor attacks [20, 37, 64, 67]. Nearly all exist- the manifest file. 
Second, we forced our backdoor to be exclusively additive, meaning that no feature could be removed from an application as a result of the poisoning.

4 http://contagiodump.blogspot.com/
5 https://github.com/srndic/mimicus

Recently, researchers started tackling the problem of defending against backdoor attacks [20, 37, 64, 67]. Nearly all existing defensive approaches, however, are specifically targeted at computer vision Deep Neural Networks, and assume adversaries that actively tamper with the training labels. These limitations make them hard to adapt to the class of model-agnostic, clean-label attacks we are interested in. We discuss here representative related work.

Target    Strategy (Feature x Value Selector)   Acc(Fb,Xb)       Mitigation          New Acc(Fb,Xb)    Poisons   Goodware
                                                (after attack)                       (after defense)   Removed   Removed
LightGBM  LargeAbsSHAP x MinPopulation          0.5935           HDBSCAN             0.7422            3825      102251
                                                                 Spectral Signature  0.7119            962       45000
                                                                 Isolation Forest    0.9917            6000      11184
          LargeAbsSHAP x CountAbsSHAP           0.5580           HDBSCAN             0.7055            3372      93430
                                                                 Spectral Signature  0.6677            961       44999
                                                                 Isolation Forest    0.9921            6000      11480
          Combined Feature Value Selector       0.8320           HDBSCAN             0.8427            1607      115282
                                                                 Spectral Signature  0.7931            328       45000
                                                                 Isolation Forest    0.8368            204       8927
EmberNN   LargeAbsSHAP x MinPopulation          0.4099           HDBSCAN             0.3508            3075      137597
                                                                 Spectral Signature  0.6408            906       45000
                                                                 Isolation Forest    0.9999            6000      14512
          LargeAbsSHAP x CountAbsSHAP           0.8340           HDBSCAN             0.5854            2499      125460
                                                                 Spectral Signature  0.8631            906       45000
                                                                 Isolation Forest    0.9999            6000      15362
          Combined Feature Value Selector       0.8457           HDBSCAN             0.8950            1610      120401
                                                                 Spectral Signature  0.9689            904       45000
                                                                 Isolation Forest    0.8030            175       13289

Table 3: Mitigation results for both LightGBM and EmberNN. All attacks were targeted towards the 17 controllable features (see Section6), with a 1% poison set size, 6000 backdoored benign samples. We show Acc(Fb,Xb) for the backdoored model, and after the defense is applied. We also include number of poisoned and goodware points filtered out by the defensive approaches.

Tran et al. [64] propose a defensive method based on spec- borrow the idea, using HDBSCAN instead, with the intuition tral signatures, which relies on detecting two ε-spectrally sep- that watermarked samples form a subspace of high density arable subpopulations based on SVD decomposition. Chen in the reduced feature space, and generate a tight cluster. Ad- et al. [20] rely on the representation learned by the CNN and ditionally, HDBSCAN does not require a fixed number of perform k-means clustering on the activations of the last con- clusters, but has two other parameters that control the cluster volutional layer. The defense of Liu et al. [37] is based on density (minimum size of a cluster, set at 1% of the training combining network fine tuning and neuron pruning, making benign data, 3000 points, and minimum number of samples it specific to neural networks. Finally, NeuralCleanse [67] to form a dense region, set at 0.5%, 600 points). As in [20], is based on the intuition that in a backdoored model, the we compute Silhouette scores on the resulting clusters, to perturbation necessary to induce a misclassification towards obtain an estimate of the intra-cluster similarity of a sample the targeted class should be smaller than that required to ob- compared to points from its nearest neighboring cluster, and tain different labels. This approach was designed considering filter out samples from each cluster with a probability related multi-class classification problem, as encountered in image to the cluster silhouette score. Third, isolation forest [36], an recognition, and the suggested filtering and pruning mitigation algorithm for unsupervised anomaly detection based on iden- are neural-network specific. tifying rare and different points instead of building a model of a normal sample. The intuition here is that such an anomaly Considered Defensive Approaches. According to our threat detection approach might identify the watermarked samples model, the defender is assumed to: (i) have access to the as outliers due to their similarity compared to the very diverse (poisoned) training data; (ii) have access to a small set of background points. We experiment with default parameters clean labeled data. This common assumption in adversarial of Isolation Forest. ML fits nicely with the context since security companies often have access to internal, trusted, data sources; and (iii) know Results of Mitigation Strategies. Table3 shows the effect that the adversary will target the most relevant features. of these three mitigation strategies over the different models We evaluate three mitigation strategies over a reduced fea- and attack strategies. Two main takeaways emerge from these ture space obtained by selecting a fixed number (32) of the empirical results. First, the Isolation Forest, trained on the most important features. First, a state-of-the-art defensive reduced feature space, is often capable of correctly isolating strategy, spectral signatures [64], which we adapt by comput- all the backdoored points with relatively low false positives. ing the singular value decomposition of the benign samples Note that this happens exclusively when an Isolation Forest over the new feature space. Then, as in the original paper, is trained on the transformed dataset (reduced to most im- we compute the outlier score by multiplying the top right portant features). 
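A sketch of the three filters on the reduced (top-32 features) space, using off-the-shelf implementations (hdbscan and scikit-learn), follows; the parameter values mirror the description above, while the surrounding glue code and function names are our own assumptions.

import numpy as np
import hdbscan
from sklearn.ensemble import IsolationForest

def spectral_scores(X_benign_reduced):
    """Spectral-signature style outlier score: squared projection of each centered
    benign sample onto the top right singular vector; the top 15% are filtered out."""
    M = X_benign_reduced - X_benign_reduced.mean(axis=0)
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return (M @ vt[0]) ** 2

def hdbscan_clusters(X_benign_reduced, n_benign):
    """Density-based clustering; silhouette-based probabilistic filtering follows."""
    clusterer = hdbscan.HDBSCAN(min_cluster_size=int(0.01 * n_benign),   # 1% of benign training data
                                min_samples=int(0.005 * n_benign))       # 0.5%
    return clusterer.fit_predict(X_benign_reduced)

def isolation_forest_flags(X_benign_reduced):
    """Unsupervised anomaly detection with default parameters; -1 marks suspected outliers."""
    iso = IsolationForest()
    return iso.fit_predict(X_benign_reduced)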
Results of Mitigation Strategies. Table 3 shows the effect of these three mitigation strategies over the different models and attack strategies. Two main takeaways emerge from these empirical results. First, the Isolation Forest, trained on the reduced feature space, is often capable of correctly isolating all the backdoored points with relatively low false positives. Note that this happens exclusively when an Isolation Forest is trained on the transformed dataset (reduced to the most important features). The same algorithm applied in the original feature space detects only a tiny fraction of the backdoored points (≈ 1%), with similar results obtained also on Drebin (0%) and Contagio (12.5%), thus reinforcing the observation in [64] that the subpopulations are not sufficiently separable in the original feature space. Second, none of the mitigation approaches was able to isolate the points attacked with watermarks produced with the Combined strategy on PE files. This confirms that the Combined attack strategy is much stealthier than both Independent strategies.

We note that the proposed mitigations are only a first practical step in defending against clean-label backdoor attacks in a model-agnostic setting. We leave a deeper investigation of more general defensive methods as a topic of future work. Protecting ML systems from adversarial attacks is an intrinsically hard problem [18]. We argue that defending against our backdoor attacks is extremely challenging due to the combined effect of the small subpopulation separability induced by clean-label attacks, and the difficulty of distinguishing dense regions generated by the attack from other dense regions naturally occurring in diverse sets of benign binaries.

8 Related Work

An early line of research introduced by Perdisci et al. [46] and Newsome et al. [45] demonstrated methods for polluting automated polymorphic worm detectors such as Polygraph [44]. The first [46] introduced purposely crafted noise in the traces used for signature generation to prevent the generation of useful signatures; the second [45] proposed red herring attacks, where the goal of the adversary is to force the generated system to rely on spurious features for classification, which will then be excluded from the evading sample. Red herring attacks are particularly interesting for us, being the first to suggest that an adversary does not necessarily need control over data labels in order to cause failures in the downstream classifier, thus foreshadowing clean-label poisoning. Successive work by Venkataraman et al. [66] generalizes these results by providing lower bounds on the number of mistakes made by a signature generation algorithm based on conjunctions of boolean features. Theoretical bounds on poisoning attacks against an online centroid anomaly detection method have subsequently been analyzed by Kloft and Laskov [32] in the context of network intrusion detection. Concurrently, researchers started to analyze possible countermeasures to poisoning attempts against anomaly detection systems deployed to discover abnormal patterns in network traces. Cretu et al. [23] developed a methodology to sanitize training data based on the output of an ensemble of micro models, trained on small portions of the data, combined through simple voting schemes. Rubinstein et al. [51] later proposed to leverage methods from robust statistics to minimize the effect of small poison quantities on network traffic anomaly detectors based on Principal Component Analysis.

More recent research by Biggio et al. [14] brought to light the problem of poisoning attacks against modern machine learning models by proposing an availability attack based on gradient ascent against support vector machines. Successive work [15] demonstrated the relevance of ML poisoning in the domain of malware classification by targeting Malheur [50], a malware behavioral clustering tool. Later research by Xiao et al. [68] showed that feature selection methods, like LASSO, ridge regression, and elastic net, were susceptible to small poison sizes. Gradient-based poisoning availability attacks have been shown against regression [29] and neural networks [43], and the transferability of these attacks has been demonstrated [24]. Recently, Suciu et al. [60] proposed a framework for defining attacker models in the poisoning space, and developed StingRay, a multi-model targeted poisoning attack methodology.

Backdoor attacks were introduced by Gu et al. in BadNets [28], identifying a supply chain vulnerability in modern machine-learning-as-a-service pipelines. Liu et al. [38] explored introducing trojan triggers in image recognition neural networks, without requiring access to the original training data, by partially re-training the models. Later works by Turner et al. [65] and Shafahi et al. [55] further improved over the existing attacks by devising clean-label strategies.
9 Discussion and Conclusion

With this work we begin shedding light on new ways of implementing clean-label backdoor attacks, a threat vector that we believe will only grow in relevance in the coming years. We showed how to conduct backdoor poisoning attacks that are model-agnostic, do not assume control over the labeling process, and can be adapted to very restrictive adversarial models. For instance, an attacker with sole knowledge of the feature space can mount a realistic attack by injecting a relatively small pool of poisoned samples (1% of the training set) and induce high misclassification rates on backdoored malware samples. Additionally, we designed the Combined strategy, which creates backdoored points in high-density regions of the legitimate samples, making it very difficult to detect with common defenses. Based on our exploration of these attacks, we believe explanation-guided attack strategies could also be applicable to other feature-based models outside of the security domain.

Finally, there are some limitations of this work that we would like to expose. First, the attacks we explored rely on the attacker knowing the feature space used by the victim model. While this assumption is partially justified by the presence of natural features in the structure of executable files, we consider the development of more generic attack methodologies, which do not rely on any knowledge from the adversary's side, as an interesting future research direction. Second, designing a general mitigation method, particularly against our stealthy Combined attack strategy, remains a challenging problem for future work. Lastly, adapting these attacks to other malware classification problems that might rely on combining static and dynamic analysis is also a topic of future investigation.

Acknowledgments

We would like to thank Jeff Johns for his detailed feedback on a draft of this paper and many discussions on backdoor poisoning attacks, and the reviewers for their insightful comments and valuable suggestions. We thank FireEye for sponsoring research done at Northeastern University for this project. The work done at Northeastern University was also partially sponsored by the U.S. Army Combat Capabilities Development Command Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Combat Capabilities Development Command Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

References

[1] AlienVault - Open Threat Exchange. https://otx.alienvault.com/.

[2] CylancePROTECT Malware Execution Control. https://www.cylance.com/content/dam/cylance/pdfs/feature-focus/Feature_Focus_PROTECT_Malware_Control.pdf.

[3] Detonating a bad rabbit: Windows Defender Antivirus and layered machine learning defenses. https://www.microsoft.com/security/blog/2017/12/11/detonating-a-bad-rabbit-windows-defender-antivirus-and-layered-machine-learning-defenses/.

[4] Machine Learning Static Evasion Competition. https://www.elastic.co/blog/machine-learning-static-evasion-competition.

[5] MalwareGuard: FireEye's Machine Learning Model to Detect and Prevent Malware. https://www.fireeye.com/blog/products-and-services/2018/07/malwareguard-fireeye-machine-learning-model-to-detect-and-prevent-malware.html.

[6] MetaDefender Cloud | Homepage. https://metadefender.opswat.com.

[7] Skylight Cyber | Cylance, I Kill You! https://skylightcyber.com/2019/07/18/cylance-i-kill-you/.

[8] VirSCAN.org - Free Multi-Engine Online Virus Scanner. https://www.virscan.org/.

[9] VirusTotal. http://www.virustotal.com/.

[10] Brandon Amos, Hamilton Turner, and Jules White. Applying machine learning classifiers to dynamic Android malware detection at scale. In 2013 9th International Wireless Communications and Mobile Computing Conference (IWCMC), July 2013. ISSN: 2376-6506.
[11] Hyrum S. Anderson and Phil Roth. EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models. arXiv:1804.04637 [cs], April 2018.

[12] Daniel Arp, Michael Spreitzenbarth, Malte Hübner, Hugo Gascon, and Konrad Rieck. Drebin: Effective and Explainable Detection of Android Malware in Your Pocket. In Proceedings 2014 Network and Distributed System Security Symposium, San Diego, CA, 2014. Internet Society.

[13] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion Attacks against Machine Learning at Test Time. Advanced Information Systems Engineering, 7908, 2013.

[14] Battista Biggio, Blaine Nelson, and Pavel Laskov. Poisoning attacks against support vector machines. In Proceedings of the 29th International Conference on Machine Learning, ICML'12, Edinburgh, Scotland, June 2012. Omnipress.

[15] Battista Biggio, Konrad Rieck, Davide Ariu, Christian Wressnegger, Igino Corona, Giorgio Giacinto, and Fabio Roli. Poisoning behavioral malware clustering. In Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop - AISec '14, Scottsdale, Arizona, USA, 2014. ACM Press.

[16] Ricardo J. G. B. Campello, Davoud Moulavi, and Joerg Sander. Density-Based Clustering Based on Hierarchical Density Estimates. In Advances in Knowledge Discovery and Data Mining, volume 7819. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.

[17] Nicholas Carlini. Adversarial machine learning reading list. https://nicholas.carlini.com/writing/2018/adversarial-machine-learning-reading-list.html, 2020.

[18] Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, and Alexey Kurakin. On Evaluating Adversarial Robustness. arXiv:1902.06705 [cs, stat], February 2019.
[19] Ero Carrera. erocarrera/pefile. https://github.com/erocarrera/pefile.

[20] Bryant Chen, Wilka Carvalho, Nathalie Baracaldo, Heiko Ludwig, Benjamin Edwards, Taesung Lee, Ian Molloy, and Biplav Srivastava. Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering. arXiv:1811.03728 [cs, stat], November 2018.

[21] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv:1712.05526 [cs], December 2017.

[22] Zheng Leong Chua, Shiqi Shen, Prateek Saxena, and Zhenkai Liang. Neural Nets Can Learn Function Type Signatures From Binaries. In USENIX Security Symposium, 2017.

[23] Gabriela F. Cretu, Angelos Stavrou, Michael E. Locasto, Salvatore J. Stolfo, and Angelos D. Keromytis. Casting out Demons: Sanitizing Training Data for Anomaly Sensors. In 2008 IEEE Symposium on Security and Privacy (SP 2008), Oakland, CA, USA, May 2008. IEEE. ISSN: 1081-6011.

[24] Ambra Demontis, Marco Melis, Maura Pintor, Matthew Jagielski, Battista Biggio, Alina Oprea, Cristina Nita-Rotaru, and Fabio Roli. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks. In 28th USENIX Security Symposium (USENIX Security 19), pages 321–338, Santa Clara, CA, August 2019. USENIX Association.

[25] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and Harnessing Adversarial Examples. arXiv:1412.6572 [cs, stat], December 2014.

[26] Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. Adversarial Perturbations Against Deep Neural Networks for Malware Classification. arXiv:1606.04435 [cs], June 2016.

[27] Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. Adversarial Examples for Malware Detection. In Computer Security – ESORICS 2017, volume 10493. Springer International Publishing, Cham, 2017.

[28] Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. arXiv:1708.06733 [cs], August 2017.

[29] Matthew Jagielski, Alina Oprea, Battista Biggio, Chang Liu, Cristina Nita-Rotaru, and Bo Li. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In IEEE Symposium on Security and Privacy, SP '18. IEEE CS, 2018.

[30] Alex Kantchelian, Michael Carl Tschantz, Sadia Afroz, Brad Miller, Vaishaal Shankar, Rekha Bachwani, Anthony D. Joseph, and J. Doug Tygar. Better malware ground truth: Techniques for weighting anti-virus vendor labels. In Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, pages 45–56, 2015.

[31] Dhilung Kirat and Giovanni Vigna. MalGene: Automatic Extraction of Malware Analysis Evasion Signature. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security - CCS '15, Denver, Colorado, USA, 2015. ACM Press.
[32] Marius Kloft and Pavel Laskov. Online anomaly detection under adversarial impact. In Proceedings of Machine Learning Research, volume 9, pages 405–412, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010. JMLR Workshop and Conference Proceedings.

[33] Bojan Kolosnjaji, Ambra Demontis, Battista Biggio, Davide Maiorca, Giorgio Giacinto, Claudia Eckert, and Fabio Roli. Adversarial Malware Binaries: Evading Deep Learning for Malware Detection in Executables. In 2018 26th European Signal Processing Conference (EUSIPCO), September 2018.

[34] Marek Krčál, Ondřej Švec, Martin Bálek, and Otakar Jašek. Deep Convolutional Malware Classifiers Can Learn from Raw Executables and Labels Only. ICLR 2018, February 2018.

[35] Ram Shankar Siva Kumar, Magnus Nyström, John Lambert, Andrew Marshall, Mario Goertzel, Andi Comissoneru, Matt Swann, and Sharon Xia. Adversarial machine learning – industry perspectives. 3rd Deep Learning and Security Workshop (DLS), 2020.

[36] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation Forest. In 2008 Eighth IEEE International Conference on Data Mining, December 2008.

[37] Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks. arXiv:1805.12185 [cs], May 2018.

[38] Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. Trojaning Attack on Neural Networks. In Proceedings 2018 Network and Distributed System Security Symposium, San Diego, CA, 2018. Internet Society.

[39] Scott M. Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), January 2020.

[40] Scott M. Lundberg and Su-In Lee. A Unified Approach to Interpreting Model Predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 2017.

[41] Thomas Mandl, Ulrich Bayer, and Florian Nentwich. ANUBIS: ANalyzing Unknown BInarieS the automatic way. In Virus Bulletin Conference, volume 1, page 02, 2009.

[42] Enrico Mariconti, Lucky Onwuzurike, Panagiotis Andriotis, Emiliano De Cristofaro, Gordon Ross, and Gianluca Stringhini. MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models. In Proceedings 2017 Network and Distributed System Security Symposium, San Diego, CA, 2017. Internet Society.

[43] Luis Muñoz-González, Battista Biggio, Ambra Demontis, Andrea Paudice, Vasin Wongrassamee, Emil C. Lupu, and Fabio Roli. Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization. arXiv:1708.08689 [cs], August 2017.

[44] J. Newsome, B. Karp, and D. Song. Polygraph: automatically generating signatures for polymorphic worms. In 2005 IEEE Symposium on Security and Privacy (S&P'05), May 2005. ISSN: 2375-1207.
[45] James Newsome, Brad Karp, and Dawn Song. Paragraph: Thwarting Signature Learning by Training Maliciously. In Recent Advances in Intrusion Detection, volume 4219. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006. Series Title: Lecture Notes in Computer Science.

[46] R. Perdisci, D. Dagon, Wenke Lee, P. Fogla, and M. Sharif. Misleading worm signature generators using deliberate noise injection. In 2006 IEEE Symposium on Security and Privacy (S&P'06), Berkeley/Oakland, CA, 2006. IEEE.

[47] Fabio Pierazzi, Feargus Pendlebury, Jacopo Cortellazzi, and Lorenzo Cavallaro. Intriguing Properties of Adversarial ML Attacks in the Problem Space. In 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, May 2020. IEEE.

[48] Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro, and Charles Nicholas. Malware Detection by Eating a Whole EXE. arXiv:1710.09435 [cs, stat], October 2017.

[49] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16, San Francisco, California, USA, 2016. ACM Press.

[50] Konrad Rieck, Philipp Trinius, Carsten Willems, and Thorsten Holz. Automatic analysis of malware behavior using machine learning. Journal of Computer Security, 19(4), 2011.

[51] Benjamin I. P. Rubinstein, Blaine Nelson, Ling Huang, Anthony D. Joseph, Shing-hon Lau, Satish Rao, Nina Taft, and J. D. Tygar. ANTIDOTE: understanding and defending against poisoning of anomaly detectors. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement - IMC '09, Chicago, Illinois, USA, 2009. ACM Press.

[52] Igor Santos, Felix Brezo, Xabier Ugarte-Pedrero, and Pablo G. Bringas. Opcode sequences as representation of executables for data-mining-based unknown malware detection. Information Sciences, 231, May 2013.

[53] Joshua Saxe and Konstantin Berlin. Deep neural network based malware detection using two dimensional binary program features. In 2015 10th International Conference on Malicious and Unwanted Software (MALWARE), October 2015.

[54] Giorgio Severi, Tim Leek, and Brendan Dolan-Gavitt. Malrec: Compact full-trace malware recording for retrospective deep analysis. In Detection of Intrusions and Malware, and Vulnerability Assessment, volume 10885, Cham, 2018. Springer International Publishing.

[55] Ali Shafahi, W. Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, and Tom Goldstein. Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks. In Advances in Neural Information Processing Systems, April 2018.

[56] Andrii Shalaginov, Sergii Banin, Ali Dehghantanha, and Katrin Franke. Machine Learning Aided Static Malware Analysis: A Survey and Tutorial. In Ali Dehghantanha, Mauro Conti, and Tooska Dargahi, editors, Cyber Threat Intelligence, volume 70. Springer International Publishing, Cham, 2018.

[57] Charles Smutz and Angelos Stavrou. Malicious PDF detection using metadata and structural features. In Proceedings of the 28th Annual Computer Security Applications Conference - ACSAC '12, Orlando, Florida, 2012. ACM Press.

[58] Nedim Srndic and Pavel Laskov. Practical Evasion of a Learning-Based Classifier: A Case Study. In 2014 IEEE Symposium on Security and Privacy, San Jose, CA, May 2014. IEEE.

[59] Octavian Suciu, Scott E. Coull, and Jeffrey Johns. Exploring Adversarial Examples in Malware Detection. In 2019 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, May 2019. IEEE.
[60] Octavian Suciu, Radu Mărginean, Yiğitcan Kaya, Hal Daumé III, and Tudor Dumitraș. When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks. 2018.

[61] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic Attribution for Deep Networks. arXiv:1703.01365 [cs], June 2017.

[62] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv:1312.6199 [cs], December 2013.

[63] Kimberly Tam, Salahuddin J. Khan, Aristide Fattori, and Lorenzo Cavallaro. CopperDroid: Automatic Reconstruction of Android Malware Behaviors. Internet Society, 2015.

[64] Brandon Tran, Jerry Li, and Aleksander Mądry. Spectral signatures in backdoor attacks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, Montréal, Canada, December 2018. Curran Associates Inc.

[65] Alexander Turner, Dimitris Tsipras, and Aleksander Mądry. Clean-Label Backdoor Attacks. Manuscript submitted for publication, 2019.

[66] Shobha Venkataraman, Avrim Blum, and Dawn Song. Limits of Learning-based Signature Generation with Adversaries. In 16th Annual Network & Distributed System Security Symposium Proceedings, 2008.

[67] Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao. Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. In 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, May 2019. IEEE.

[68] Huang Xiao, Battista Biggio, Gavin Brown, Giorgio Fumera, Claudia Eckert, and Fabio Roli. Is Feature Selection Secure against Training Data Poisoning? In International Conference on Machine Learning, 2015.

[69] Weilin Xu, Yanjun Qi, and David Evans. Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers. In Proceedings 2016 Network and Distributed System Security Symposium, San Diego, CA, 2016. Internet Society.

[70] Wei Yang, Deguang Kong, Tao Xie, and Carl A. Gunter. Malware Detection in Adversarial Settings: Exploiting Feature Evolutions and Confusions in Android Apps. In Proceedings of the 33rd Annual Computer Security Applications Conference - ACSAC 2017, Orlando, FL, USA, 2017. ACM Press.

A Additional Results

Here the reader will find additional details on the experimental results and feature analysis that help provide a general idea of the effectiveness and feasibility of the studied attacks.

A.1 Attack Results

Table 6, Table 7, and Table 8 report additional experimental results for the multiple runs of the attack with different strategies. All the attacks were repeated 5 times and the tables report average results.

A.2 Feasible Backdoor Trigger

With our watermarking utility we were able to control 17 features with relative ease. Table 4 shows the feature-value mappings for two example backdoor triggers computed on the LightGBM and EmberNN models, which we fed to the static and dynamic analyzers to gauge the level of label retention after the adversarial modification. Table 5 summarizes the results of the dynamic analyzer over 100 randomly sampled benign and malicious executables from the EMBER dataset.
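To illustrate how such a feature-value mapping is consumed, the sketch below stamps a Table 4-style trigger onto a matrix of already-extracted, named features. The dictionary copies a few values from the LightGBM column of Table 4, while apply_trigger and the DataFrame layout are hypothetical helpers of ours; the actual watermarking utility rewrites the PE binaries themselves rather than editing feature vectors.

    import pandas as pd

    # Trigger values copied from the LightGBM column of Table 4 (subset shown).
    trigger = {
        "major_image_version": 1704,
        "major_linker_version": 15,
        "major_operating_system_version": 38078,
        "timestamp": 1315281300,
        "urls_count": 279,
        # ... remaining (feature, value) pairs from Table 4
    }

    def apply_trigger(features: pd.DataFrame, trigger: dict) -> pd.DataFrame:
        """Return a copy of the feature matrix with the trigger columns overwritten."""
        poisoned = features.copy()
        for name, value in trigger.items():
            poisoned[name] = value
        return poisoned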
Table 4: Watermarks for LightGBM and EmberNN used during feasibility testing.

Feature                          LightGBM     EmberNN
major_image_version              1704         14
major_linker_version             15           13
major_operating_system_version   38078        8
minor_image_version              1506         12
minor_linker_version             15           6
minor_operating_system_version   5            4
minor_subsystem_version          5            20
MZ_count                         626          384
num_read_and_execute_sections    20           66
num_unnamed_sections             11           6
num_write_sections               41           66
num_zero_size_sections           17           17
paths_count                      229          18
registry_count                   0            33
size                             1202385      817664
timestamp                        1315281300   1479206400
urls_count                       279          141

Table 5: Summary of results analyzing a random sample of 100 watermarked goodware and malware samples in the dynamic analysis environment.

Dataset    Label      Result             Count
Original   Goodware   Dynamic Benign     100
Original   Goodware   Dynamic Malicious  0
Original   Malware    Dynamic Benign     7
Original   Malware    Dynamic Malicious  93
LightGBM   Goodware   Failed             25
LightGBM   Goodware   Dynamic Benign     75
LightGBM   Goodware   Dynamic Malicious  0
LightGBM   Malware    Failed             23
LightGBM   Malware    Dynamic Benign     30
LightGBM   Malware    Dynamic Malicious  47
EmberNN    Goodware   Failed             33
EmberNN    Goodware   Dynamic Benign     67
EmberNN    Goodware   Dynamic Malicious  0
EmberNN    Malware    Failed             33
EmberNN    Malware    Dynamic Benign     23
EmberNN    Malware    Dynamic Malicious  44

Table 6: LargeAbsSHAP x CountAbsSHAP - All features. Average percentage over 5 runs.

LightGBM
Trigger Size   Poisoned Points   Acc(Fb,Xb)   Acc(Fb,X)   FPb
4              1500              65.8713      98.6069     0.0114
4              3000              55.8789      98.5995     0.0116
4              6000              40.3358      98.6081     0.0116
4              12000             20.1088      98.6060     0.0118
8              1500              30.8596      98.6335     0.0114
8              3000              10.1038      98.6212     0.0115
8              6000              2.8231       98.6185     0.0116
8              12000             0.0439       98.5975     0.0121
16             1500              2.4942       98.6379     0.0114
16             3000              0.9899       98.6185     0.0114
16             6000              0.0205       98.5948     0.0116
16             12000             0.0138       98.6323     0.0117

EmberNN
Trigger Size   Poisoned Points   Acc(Fb,Xb)   Acc(Fb,X)   FPb
16             3000              21.0122      99.0832     0.0073
16             6000              36.7591      99.0499     0.0082
16             12000             53.8470      99.0729     0.0079
32             3000              13.2336      99.0608     0.0078
32             6000              20.3952      99.1152     0.0070
32             12000             28.3413      99.0856     0.0074
64             3000              5.8046       99.0723     0.0084
64             6000              11.1986      99.0959     0.0078
64             12000             11.5547      99.0998     0.0070
128            3000              2.4067       99.0810     0.0075
128            6000              1.6841       99.0688     0.0075
128            12000             2.8298       99.1088     0.0074

Table 7: LargeAbsSHAP x MinPopulation - All features. Average percentage over 5 runs.

LightGBM
Trigger Size   Poisoned Points   Acc(Fb,Xb)   Acc(Fb,X)   FPb
4              1500              62.3211      98.5985     0.0115
4              3000              52.5933      98.6144     0.0114
4              6000              30.8696      98.6044     0.0116
4              12000             20.3445      98.5836     0.0118
8              1500              32.0446      98.6128     0.0114
8              3000              20.5850      98.6159     0.0115
8              6000              14.9360      98.6087     0.0115
8              12000             1.9214       98.6037     0.0117
16             1500              4.3328       98.6347     0.0114
16             3000              1.4490       98.6073     0.0115
16             6000              0.1670       98.6301     0.0115
16             12000             0.0026       98.6169     0.0118

EmberNN
Trigger Size   Poisoned Points   Acc(Fb,Xb)   Acc(Fb,X)   FPb
16             3000              18.8691      99.1219     0.0074
16             6000              33.5211      99.0958     0.0079
16             12000             50.6499      99.0942     0.0080
32             3000              9.1183       99.1189     0.0075
32             6000              12.1103      99.0827     0.0078
32             12000             14.6766      99.1127     0.0071
64             3000              3.4980       99.1170     0.0075
64             6000              6.2418       99.1234     0.0072
64             12000             6.8627       99.0941     0.0075
128            3000              0.9514       99.0675     0.0082
128            6000              1.6012       99.0824     0.0082
128            12000             1.6200       99.0816     0.0074

Table 8: Greedy Combined Feature and Value Selector - All features. Average percentage over 5 runs.

LightGBM
Trigger Size   Poisoned Points   Acc(Fb,Xb)   Acc(Fb,X)   FPb
4              1500              63.3370      98.5976     0.0113
4              3000              60.6706      98.6320     0.0114
4              6000              54.3283      98.6211     0.0114
4              12000             40.2437      98.6099     0.0118
8              1500              49.5246      98.6290     0.0113
8              3000              37.3295      98.6153     0.0113
8              6000              23.6785      98.6147     0.0117
8              12000             17.7914      98.6282     0.0117
16             1500              0.8105       98.6195     0.0113
16             3000              0.6968       98.6170     0.0115
16             6000              0.0565       98.6241     0.0116
16             12000             0.0329       98.6173     0.0118

EmberNN
Trigger Size   Poisoned Points   Acc(Fb,Xb)   Acc(Fb,X)   FPb
16             3000              11.6613      99.1014     0.0082
16             6000              11.0876      99.1105     0.0078
16             12000             10.5981      99.0958     0.0079
32             3000              4.8025       99.0747     0.0087
32             6000              5.0524       99.1167     0.0082
32             12000             4.4665       99.1335     0.0072
64             3000              1.9074       99.1012     0.0076
64             6000              1.8246       99.0989     0.0077
64             12000             1.8364       99.1117     0.0071
128            3000              0.7356       99.0926     0.0082
128            6000              0.7596       99.1219     0.0080
128            12000             0.7586       99.1014     0.0072