ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation Yingqi Liu Wen-Chuan Lee Guanhong Tao
[email protected] [email protected] [email protected] Purdue University Purdue University Purdue University Shiqing Ma Yousra Aafer Xiangyu Zhang
[email protected] [email protected] [email protected] Rutgers University Purdue University Purdue University ABSTRACT Kingdom. ACM, New York, NY, USA, 18 pages. https://doi.org/10.1145/ This paper presents a technique to scan neural network based AI 3319535.3363216 models to determine if they are trojaned. Pre-trained AI models may contain back-doors that are injected through training or by 1 INTRODUCTION transforming inner neuron weights. These trojaned models operate Neural networks based artificial intelligence models are widely normally when regular inputs are provided, and mis-classify to a used in many applications, such as face recognition [47], object specific output label when the input is stamped with some special detection [49] and autonomous driving [14], demonstrating their pattern called trojan trigger. We develop a novel technique that advantages over traditional computing methodologies. More and analyzes inner neuron behaviors by determining how output acti- more people tend to believe that the applications of AI models vations change when we introduce different levels of stimulation to in all aspects of life is around the corner [7, 8]. With increasing a neuron. The neurons that substantially elevate the activation of a complexity and functionalities, training such models entails enor- particular output label regardless of the provided input is considered mous efforts in collecting training data and optimizing performance.