DeHiB: Deep Hidden Backdoor Attack on Semi-supervised Learning via Adversarial Perturbation
The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

DeHiB: Deep Hidden Backdoor Attack on Semi-supervised Learning via Adversarial Perturbation

Zhicong Yan1, Gaolei Li1*, Yuan Tian1, Jun Wu1*, Shenghong Li1*, Mingzhe Chen2, H. Vincent Poor2
1 Shanghai Jiao Tong University, Shanghai, China
2 Princeton University, Princeton, USA
{zhicongy, gaolei li, ee tianyuan, junwuhn, [email protected], {mingzhec, [email protected]

*Corresponding Author.
Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

The threat of data-poisoning backdoor attacks on learning algorithms typically comes from the labeled data used for learning. However, in deep semi-supervised learning (SSL), unknown threats mainly stem from unlabeled data. In this paper, we propose a novel deep hidden backdoor (DeHiB) attack for SSL-based systems. In contrast to the conventional attacking methods, DeHiB can feed malicious unlabeled training data to the semi-supervised learner so as to enable the SSL model to output premeditated results. In particular, a robust adversarial perturbation generator regularized by a unified objective function is proposed to generate poisoned data. To alleviate the negative impact of trigger patterns on model accuracy and improve the attack success rate, a novel contrastive data poisoning strategy is designed. Using the proposed data poisoning scheme, one can implant the backdoor into the SSL model using the raw data without hand-crafted labels. Extensive experiments based on the CIFAR10 and CIFAR100 datasets demonstrate the effectiveness and crypticity of the proposed scheme.

Figure 1: Illustration of the proposed DeHiB attack against semi-supervised learning. The attacker poisons the unlabelled data of semi-supervised learners by embedding a specially-designed trigger into the unlabelled data.

Introduction

Semi-supervised learning (SSL) is an approach to machine learning that can significantly reduce the inherent dependencies on human supervision (Chapelle, Schölkopf, and Zien 2006). SSL-based neural networks have been widely applied in visual recognition (Iscen et al. 2019; Sohn et al. 2020), object detection (Gao et al. 2019), and graph computing (Kipf and Welling 2017). Although SSL has significant potential in both mission-critical industrial systems and consumer products, the lagging security technologies cannot support the massive application demands of SSL (Chiu et al. 2020). Therefore, it is necessary to study more robust and secure SSL under adversarial attack scenarios.

Recently, the security of deep learning, and the backdoor attack on neural networks in particular, has raised concerns (Gu et al. 2019). Similar to backdoor attacks on the Internet, a victim neural network will be manipulated once an adversary successfully implants a malicious trigger pattern into the learning model. Backdoor attacks in neural networks exist and significantly affect many typical artificial intelligence (AI) systems such as face recognition payment systems (Wang et al. 2019), auto-driving systems and recommendation systems (Nassi et al. 2020). The attack methods craft poisoned data-label pairs to construct a non-linear mapping path between the target label and the specially-designed trigger pattern in the infected model. To defend against such attacks, one must implement strict scrutiny on the raw data and the corresponding labels (Li et al. 2020).
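As a concrete illustration of this conventional, label-aware poisoning recipe, the sketch below (Python/NumPy, assuming HWC image tensors with pixel values in [0, 1]) stamps a small trigger patch onto a training image and pairs it with the attacker-chosen target label. The corner trigger, its size, and the target class are illustrative assumptions rather than this paper's configuration.

```python
import numpy as np

def craft_poisoned_pair(image, target_label, patch_size=3, trigger_value=1.0):
    """Conventional (label-aware) backdoor poisoning: stamp a small square
    trigger in the bottom-right corner and attach the attacker-chosen label.
    Trigger shape, size, and target label are illustrative assumptions."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = trigger_value  # overwrite the corner patch
    return poisoned, target_label

# Example: poison a 32x32 RGB image (CIFAR-10 scale) toward class 0.
clean = np.random.rand(32, 32, 3).astype(np.float32)
poisoned_image, poisoned_label = craft_poisoned_pair(clean, target_label=0)
```

Because both the image and its (incorrect) label are attacker-controlled, the kind of data-label scrutiny mentioned above can expose such poisoning, which motivates attacks that avoid touching labels altogether.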
Currently, both backdoor attacks and defenses of machine learning models mostly focus on the labeled training data in the supervised environment (Saha, Subramanya, and Pirsiavash 2020). However, despite constituting the majority of training data in SSL, the unlabeled data have not been considered as a potential venue for backdoor attacks due to the following two reasons: First, launching backdoor attacks via unlabeled data is seemingly impossible since changing the decision boundary requires label guidance. Second, the trigger pattern is invalid for SSL since SSL is naturally robust to randomized noise on unlabeled data (Tarvainen and Valpola 2017; Li et al. 2019). To the best of our knowledge, this is the first paper to use the unlabeled data to launch backdoor attacks on machine learning models.

To break the stereotype and facilitate the construction of secure SSL, we demonstrate that one can easily backdoor SSL systems by adversarially poisoning the unlabeled data, as shown in Figure 1. Moreover, the success of our attack will trigger a wider panic for the following two reasons:

• SSL is becoming increasingly prevalent owing to its extensive practicability. However, its robustness is overestimated.

• SSL inevitably needs to collect a lot of unlabeled data from various untrustworthy sources under the adversarial environment, where the amount of unlabeled data is usually several orders of magnitude higher than the amount of labeled data. This implies that the attack on unlabeled data is much more difficult to defend against.

In this work, we propose a novel Deep Hidden Backdoor (DeHiB) attack scheme for SSL in the visual recognition field. Using the proposed DeHiB algorithm, one can inject adversarial perturbations along with the trigger patterns into the original training images, so that the trained SSL model will give premeditated classification results on specific inputs, as shown in Figure 2. In particular, DeHiB consists of two key schemes: 1) a robust adversarial perturbation generator that contains a unified optimization objective to find universal misleading patterns for different SSL methods; 2) a novel contrastive data poisoning strategy that can improve the attack success rate and alleviate the negative impact of the adversarial trigger pattern on the accuracy of the trained SSL models. In contrast to previous backdoor attacks that operate on labor-consuming annotated datasets, DeHiB exploits easily accessible unlabeled data, thus achieving a comparable attack success rate on the supposedly robust SSL methods.
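To make the overall recipe concrete, the sketch below poisons a single unlabeled image in two steps: a targeted adversarial perturbation that pushes an attacker-side surrogate model toward the target class, followed by overlaying a trigger patch. This is only a hedged stand-in: a plain targeted PGD loop replaces DeHiB's unified objective and contrastive poisoning strategy, and the surrogate model, trigger, and hyperparameters (eps, alpha, steps) are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def poison_unlabeled_image(model, image, target_class, trigger,
                           eps=8 / 255, alpha=2 / 255, steps=10):
    """Illustrative DeHiB-style poisoning of one unlabeled image (C, H, W) in [0, 1]:
    (1) craft a targeted adversarial perturbation that drives an attacker-side
    surrogate classifier toward `target_class`, then (2) overlay the trigger.
    A standard targeted PGD loop stands in for the paper's unified generator."""
    x = image.clone().detach()
    x_adv = x.clone().detach()
    target = torch.tensor([target_class])
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv.unsqueeze(0))
        loss = F.cross_entropy(logits, target)             # loss w.r.t. the target class
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv - alpha * grad.sign()).detach()     # targeted descent step
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)      # project back into the eps-ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0)               # keep a valid image
    h, w = trigger.shape[-2:]
    x_adv[..., -h:, -w:] = trigger                         # stamp the trigger patch
    return x_adv
```

The resulting image is then released as ordinary unlabeled training data; no label is attached, which is what distinguishes this setting from the conventional data-label poisoning sketched earlier.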
The main contributions of our work are summarized as follows:

• We propose a novel backdoor attack scheme termed DeHiB for SSL methods. Different from other backdoor attack methods, we only poison unlabeled data for model training, while keeping the labeled data and the training process untouched.

• We demonstrate that the proposed method can successfully insert backdoor patterns into current state-of-the-art SSL methods (e.g., FixMatch (Sohn et al. 2020) and Label Propagation (Iscen et al. 2019)) on multiple datasets.

• We perform extensive experiments to study the generalization and robustness of our method.

Related Work

Semi-supervised Learning

Under the cluster assumption and the manifold assumption (Chapelle, Schölkopf, and Zien 2006), various SSL algorithms [...] verified that the prediction of the moving average model is more reliable. Instead of utilizing the temporal context, (Iscen et al. 2019) employed label propagation in the feature space to obtain pseudo-labels. Recently, stronger forms of data augmentation were exploited to boost SSL performance (Xie et al. 2020; Sohn et al. 2020).

Perturbation based SSL

The perturbation based methods encourage the perturbed images to have consistent predictions with the original images. (Sajjadi, Javanmardi, and Tasdizen 2016; Laine and Aila 2017; Miyato et al. 2019) proposed various kinds of perturbations on training samples. However, these methods achieved inferior performance compared with pseudo-label based methods, while requiring additional computation for approximating the Jacobian matrix (Miyato et al. 2019). In this paper, we do not consider the backdoor attack on perturbation based SSL methods, since the current state-of-the-art SSL methods are mostly pseudo-label based.

Backdoor Attack

The possibility of inserting backdoors into a deep neural network without performance degradation was first demonstrated in (Gu et al. 2019). Since then, further methods have been proposed for backdoor attacks and corresponding defenses. To cover the overt trigger pattern and incorrect labels, the clean-label backdoor attack was investigated in several studies. (Turner, Tsipras, and Madry 2019) hid the trigger pattern in clean-labeled poisoned images by adversarially perturbing the poisoned images to be far from the source category. (Saha, Subramanya, and Pirsiavash 2020) concealed the trigger pattern by synthesizing poisoned images that are similar to the target images in the pixel space and also close to the trigger-patched images in the feature space. (Liu et al. 2018) proposed a novel trojaning attack that can perform a backdoor attack without accessing the data. However, the training objective of the trojaning attack is unique, and it requires replacing the clean model with the poisoned model, whereas our method just crafts poisoned data and does not change the original SSL training process. This makes our attack more difficult to defend against compared to the existing attack methods.

Preliminaries

In a semi-supervised classification task, we denote the labeled set as $\mathcal{X}_l = \{x_i\}_{i=1}^{L}$, consisting of $L$ samples along with corresponding labels $\mathcal{Y}_l = \{y_i\}_{i=1}^{L}$, where $y_i \in \{1, \dots, c\}$. The unlabeled set is denoted as $\mathcal{X}_u = \{u_i\}_{i=1}^{U}$. SSL aims to learn a function $f: \mathcal{X} \rightarrow [0, 1]^c$ parametrized by $\theta \in \Theta$ via