Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review
Yansong Gao, Bao Gia Doan, Zhi Zhang, Siqi Ma, Jiliang Zhang, Anmin Fu, Surya Nepal, and Hyoungshick Kim

Y. Gao is with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China, and Data61, CSIRO, Sydney, Australia. e-mail: [email protected]
B. Doan is with the School of Computer Science, The University of Adelaide, Adelaide, Australia. e-mail: [email protected]
Z. Zhang and S. Nepal are with Data61, CSIRO, Sydney, Australia. e-mail: zhi.zhang; [email protected]
S. Ma is with the School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia. e-mail: [email protected]
J. Zhang is with the College of Computer Science and Electronic Engineering, Hunan University, Changsha, China. e-mail: [email protected]
A. Fu is with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China. e-mail: [email protected]
H. Kim is with the Department of Computer Science and Engineering, College of Computing, Sungkyunkwan University, South Korea, and Data61, CSIRO, Sydney, Australia. e-mail: [email protected]
arXiv:2007.10760v3 [cs.CR], 2 Aug 2020. This version might be updated.

Abstract—Backdoor attacks insert hidden associations or triggers into deep learning models to override correct inference, such as classification, and make the system perform maliciously according to an attacker-chosen target while behaving normally in the absence of the trigger. As a new and rapidly evolving realistic attack, it could result in dire consequences, especially considering that the backdoor attack surfaces are broad. In 2019, the U.S. Army Research Office started soliciting countermeasures and launched the TrojAI project, and the National Institute of Standards and Technology has accordingly initiated a corresponding online competition.

However, there is still no systematic and comprehensive review of this emerging area. Firstly, there is currently no systematic taxonomy of backdoor attack surfaces according to the attacker's capabilities; consequently, the attacks are diverse and have not been systematically organized. Secondly, there is also a lack of analysis and comparison of the various nascent backdoor countermeasures; consequently, it is not easy to follow the latest trends and develop more efficient countermeasures. Therefore, this work aims to provide the community with a timely review of backdoor attacks and countermeasures. According to the attacker's capability and the affected stage of the machine learning pipeline, the attack surfaces are recognized to be wide and are formalized into six categorizations: code poisoning, outsourcing, pretrained, data collection, collaborative learning, and post-deployment. Accordingly, attacks under each categorization are surveyed. The countermeasures are categorized into four general classes: blind backdoor removal, offline backdoor inspection, online backdoor inspection, and post backdoor removal. Accordingly, we review countermeasures, and compare and analyze their advantages and disadvantages. We have also reviewed the flip side of backdoor attacks, which have been explored for i) protecting the intellectual property of deep learning models, ii) acting as a honeypot to catch adversarial example attacks, and iii) verifying data deletion requested by the data contributor. Overall, research on the defense side is far behind the attack side, and there is no single defense that can prevent all types of backdoor attacks. In some cases, an attacker can intelligently bypass existing defenses with an adaptive attack. Drawing on insights from this systematic review, we also present key areas for future research on backdoors, such as empirical security evaluations of physical trigger attacks; in particular, more efficient and practical countermeasures are solicited.

Index Terms—Backdoor Attacks, Backdoor Countermeasures, Deep Learning (DL), Deep Neural Network (DNN)

CONTENTS

I    Introduction
II   A Taxonomy of Adversarial Attacks on Deep Learning
     II-A  Adversarial Example Attack
     II-B  Universal Adversarial Patch
     II-C  Data Poisoning Attack
     II-D  Backdoor Attack
III  Background
     III-A Attack Surface
           III-A1 Code Poisoning
           III-A2 Outsourcing
           III-A3 Pretrained
           III-A4 Data Collection
           III-A5 Collaborative Learning
           III-A6 Post-deployment
     III-B Backdoor Variants
           III-B1 Class-specific and Class-agnostic
           III-B2 Multiple Triggers to Same Label
           III-B3 Multiple Triggers to Multiple Labels
           III-B4 Trigger Size, Shape and Position
           III-B5 Trigger Transparency
     III-C Backdoor Preliminary
IV   Backdoor Attacks
     IV-A  Outsourcing Attack
     IV-B  Pretrained Attack
     IV-C  Data Collection Attack
     IV-D  Collaborative Learning Attack
     IV-E  Post-Deployment Attack
     IV-F  Code Poisoning Attack
     IV-G  Discussion and Summary
V    Backdoor Countermeasures
     V-A   Blind Backdoor Removal
     V-B   Offline Inspection
           V-B1  Data Inspection
           V-B2  Model Inspection
     V-C   Online Inspection
           V-C1  Data Inspection
           V-C2  Model Inspection
     V-D   Post Backdoor Removal
     V-E   Discussion and Summary
VI   Flip Side of Backdoor Attack
     VI-A  Watermarking
     VI-B  Against Model Extraction
     VI-C  Against Adversarial Examples
     VI-D  Data Deletion Verification
VII  Discussion and Prospect
     VII-A Adaptive Attack
     VII-B Artifact Release
     VII-C Triggers
     VII-D Passive Attack
     VII-E Defense Generalization
     VII-F Defender Capability
     VII-G In-House Development
     VII-H Majority Vote
VIII Conclusion

References

I. INTRODUCTION

Deep learning (DL) or machine learning (ML) models are increasingly deployed to make decisions on our behalf on various (mission-critical) tasks such as computer vision, disease diagnosis, financial fraud detection, defending against malware and cyber-attacks, access control, and surveillance [1]–[3]. However, there are realistic security threats against a deployed ML system [4], [5]. One well-known attack is the adversarial example, where an imperceptible or semantically consistent manipulation of inputs, e.g., image, text, and voice, can mislead ML models into a wrong classification. To make matters worse, the adversarial example is not the only threat. As written by Ian Goodfellow and Nicolas Papernot in 2017: "... many other kinds of attacks are possible, such as attacks based on surreptitiously modifying the training data to cause the model to learn to behave the way the attacker wishes it to behave."

The recent whirlwind of backdoor attacks fits exactly this insidious purpose: a backdoored model behaves normally on clean inputs, yet once an input is stamped with the attacker's secret trigger, the model misclassifies that input to a targeted class determined by the attacker [7]–[12]. The former property means that validation/testing accuracy on hold-out validation/testing samples cannot detect backdoor behavior, because the backdoor effect remains dormant in the absence of the secret trigger. The latter property could bring disastrous consequences, even casualties, when backdoored models are deployed for security-critical tasks (a minimal data-poisoning sketch illustrating both properties is given at the end of this section). For example, a self-driving system could be hijacked to classify a stop sign as 'speed of 80 km/h' by stamping a post-it note on the stop sign, potentially resulting in a crash. A backdoored skin cancer screening system misdiagnoses a skin lesion image as another, attacker-determined disease [9]. A backdoored face recognition system is hijacked to recognize any person wearing a black-frame eyeglass (a natural trigger) as the target person, e.g., in order to gain authority; see the exemplification in Fig. 1. Though initial backdoor attacks mainly focused on the computer vision domain, they have been extended to other domains such as text [13]–[15], audio [16], [17], ML-based computer-aided design [18], and ML-based wireless signal classification [19]. Backdoor vulnerabilities have also been shown in deep generative models [20], reinforcement learning [21] such as AI GO [22], and deep graph models [23]. Such wide and potentially disastrous consequences raise concerns from national security agencies. In 2019, the U.S. Army Research Office (ARO), in partnership with the Intelligence Advanced Research Projects Activity (IARPA), began developing techniques for the detection of backdoors in Artificial Intelligence, namely TrojAI [24]. Recently, the National Institute of Standards and Technology (NIST) started a program called TrojAI to combat backdoor attacks on AI systems.

Naturally, this newly revealed backdoor attack also receives increased attention from the research community because ML has been used in a variety of application domains. However, there is currently still i) no systematic and comprehensive review of both attacks and countermeasures, ii) no systematic categorization of attack surfaces to identify the attacker's capabilities, and iii) no analysis and comparison of diverse countermeasures. Some countermeasures are eventually built upon less realistic or impractical assumptions because they do not explicitly identify the defender's capability. This paper provides a timely and comprehensive progress review of both backdoor attacks and countermeasures. We believe that this review provides an insight into the future of various applications using AI. To the best of our knowledge, there exist two other backdoor review papers that are either limited in scope [25] or cover only a specific backdoor attack surface, i.e., outsourcing [26].

(1) We provide a taxonomy of attacks
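To make the trigger/dormancy behavior described earlier in this Introduction concrete, the following Python sketch shows how a backdoor can be planted through data poisoning. It is a minimal illustration only, not the method of any particular paper surveyed here; the 4x4 white corner patch, the 5% poisoning rate, the target label, and all function names are hypothetical choices made for this example.

    # Illustrative sketch of backdoor data poisoning (assumed setup, not any
    # specific attack from the literature): stamp a small trigger patch on a
    # fraction of training images and relabel them to an attacker-chosen class.
    import numpy as np

    def stamp_trigger(image: np.ndarray, size: int = 4) -> np.ndarray:
        """Return a copy of `image` with a small white square in the bottom-right corner."""
        poisoned = image.copy()
        poisoned[-size:, -size:, :] = 1.0  # the secret trigger pattern
        return poisoned

    def poison_dataset(images, labels, target_label, poison_rate=0.05, seed=0):
        """Stamp the trigger on a small fraction of samples and relabel them to the target class."""
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(images), size=int(poison_rate * len(images)), replace=False)
        images, labels = images.copy(), labels.copy()
        for i in idx:
            images[i] = stamp_trigger(images[i])
            labels[i] = target_label
        return images, labels, idx

    if __name__ == "__main__":
        # Toy stand-in for a real training set (e.g., 32x32 RGB traffic-sign images).
        x = np.random.rand(1000, 32, 32, 3).astype(np.float32)
        y = np.random.randint(0, 10, size=1000)
        x_poisoned, y_poisoned, poison_idx = poison_dataset(x, y, target_label=7)
        print(f"poisoned {len(poison_idx)} of {len(x)} training samples")

Because the vast majority of the training data and all clean hold-out samples are left untouched, a model trained on such a poisoned set retains its normal validation/testing accuracy (the former property), while the learned association between the corner patch and the target label is what the attacker later exploits at inference time (the latter property).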