Blurring Fools the Network - Adversarial Attacks by Feature Peak Suppression and Gaussian Blurring

Chenchen Zhao and Hao Li∗

This research work is supported by the SJTU (Shanghai Jiao Tong Univ.) Young Talent Funding (WF220426002). Chenchen Zhao is with Dept. Automation, SJTU, Shanghai, 200240, China. Hao Li, Assoc. Prof., is with Dept. Automation and SPEIT, SJTU, Shanghai, 200240, China. * Corresponding author: Hao Li (Email: [email protected])

Abstract—Existing pixel-level adversarial attacks on neural networks may be deficient in real scenarios, since pixel-level changes on the data cannot be fully delivered to the neural network after camera capture and multiple image preprocessing steps. In contrast, in this paper, we argue from another perspective that gaussian blurring, a common technique of image preprocessing, can be aggressive itself in specific occasions, thus exposing the network to real-world adversarial attacks. We first propose an adversarial attack demo named peak suppression (PS), which suppresses the values of peak elements in the features of the data. Based on the blurring spirit of PS, we further apply gaussian blurring to the data, to investigate the potential influence and threats of gaussian blurring to the performance of the network. Experiment results show that PS and well-designed gaussian blurring can form adversarial attacks that completely change the classification results of a well-trained target network. With the strong physical significance and wide applications of gaussian blurring, the proposed approach will also be capable of conducting real-world attacks.

I. INTRODUCTION

With the applications of deep learning and neural networks in core technology, robustness and stability of their performance gradually attract much more attention. Research has proved that even well-trained neural networks may be vulnerable to adversarial attacks: manipulated slight changes on the data resulting in misjudgement [1]-[11], or generated meaningless data resulting in high-confidence recognition by the target network [12], [13]. The above two spirits of attacks are respectively named type II and type I attacks, defined in [13] according to the characteristics of the modifications on data. These attack approaches have shown promising results and the ability to mislead networks to wrong judgements on common image datasets (e.g. ImageNet [14]).

While the existing attack approaches are indeed threatening the safety of neural networks and their applications, their performance may be greatly reduced in real-world scenarios. In real image processing cases, there are several unique characteristics of the data and the processing system that are neglected by most attack approaches:

• Reality of data. The original data inputted to the network is collected from real scenes by non-ideal sensors, which have internal distortion and slight errors on color capture. Although existing type II adversarial attack methods are powerful with their extremely small amplitude of data modifications, it is barely possible for a real environment to produce data with exactly the same distortion and color errors as the original data and with the exact pixel modifications as expected. Therefore, such adversarial examples do not have the same characteristic of reality as the original data, and are barely possible to exist in the real world. Reality loss also happens in type I attacks with meaningless adversarial data [12] or manipulated data modifications [13].

• Preprocessing. The image data goes through several preprocessing steps such as blurring, color transform, reshaping, etc. before being inputted to the network. Data modifications by a type II attack method may fail after such preprocessing steps, since the modification values are also changed in preprocessing together with the data, with their influence possibly weakened or even reversed in the process.

The two points respectively correspond to two perspectives on adversarial attacks: whether the adversarial examples can truly exist in the real world, and whether the differences between the adversarial examples and the original data can be fully delivered to the target network in real cases. Negativity in either perspective results in deficiency of the attack method in real scenarios. Unfortunately, the two points are exactly two of the main drawbacks of existing adversarial attack methods.
In this paper, in order to maintain the reality and aggressiveness of adversarial examples in real cases, we choose not to make manipulated modifications on the data. Instead, we focus on the image preprocessing step, since it also makes changes to the raw data but does not affect the related data capturing process, thus not affecting the reality of the data. Exploiting the vulnerability of the image preprocessing step is a better way of generating real-world adversarial examples. We demonstrate that gaussian blurring, a typical image preprocessing method, is itself potentially threatening to neural networks by turning ordinary data aggressive before it is inputted to the network.

We first propose a novel adversarial attack demo named peak suppression (PS). As a general type of blurring, PS suppresses the values of the peak elements (i.e. hotspots) in the features and smoothes the features of the data to confuse the feature processing module of the target network. The feature-level blurring of PS generates adversarial examples which are similar to the original examples after pixel-level blurring, inspiring us to conduct adversarial attacks based on direct pixel-level blurring happening in image preprocessing.

We further introduce gaussian blurring to adversarial attacks. We construct several gaussian kernels that can change the data into adversarial examples simply by gaussian blurring. Such attacks may be more deadly, since a system with gaussian blurring as a part of its preprocessing may spontaneously introduce adversarial attacks to itself.

Our contributions are summarized as follows: first, we propose a blurring demo to conduct several successful adversarial attacks; second, we apply gaussian blurring to adversarial attacks, and construct several threatening gaussian kernels for adversarial attacks; last but not least, we prove that the commonly-used gaussian blurring technique in image processing may be potentially 'dangerous', exposing the image processing network to adversarial attacks.

In this paper, apart from generating ordinary adversarial data with reality issues, we also try to demonstrate that gaussian blurring, which is a common technique in image preprocessing, has potential threats to neural networks without affecting the reality of data. With wrong parameter choices of gaussian blurring, an ordinary data sample collected from the real environment can turn aggressive after preprocessing. The attack approach proposed in this paper is effective and, more importantly, maintains the reality of data and remains capable in real-world scenarios, since the designed gaussian kernels serve as a module in image preprocessing and do not affect the original data. This method raises a warning to image processing systems on their choice of parameters in gaussian blurring.

The paper is organized as follows: related work on adversarial attacks is reviewed in Section II; details of the peak suppression attack and the gaussian blurring based attack are stated in Section III; in Section IV, we conduct several experiments to prove the effectiveness of PS and the gaussian blurring based attack; we conclude this paper in Section V.
II. RELATED WORK

The gradient descent algorithm serves as the fundamental element in most existing adversarial attack approaches. Since the proposal of FGSM [7], which states the criterion of data modifications that changes the output of the network to the largest extent, there have been many studies aiming at modifying the data to 'guide' the network to a wrong result with gradient descent. In [10], the L2 norm is used to determine the minimum value of perturbation in adversarial attacks; the authors in [11] proposed a pixel-level attack method to minimize the number of changed pixels; in [1], the authors used projection to avoid situations in which the modified data goes out of the data domain; in [5], [6], the authors propose a feature-level attack method that drives the intermediate feature output towards that of another predefined data sample. There are also defense-targeted attack methods (e.g. the C&W attack method [3] against distillation).

As stated in Section I, adversarial data generated by the attack methods above may fail to maintain its reality. Even the real-world attacks in [15] have to go through a series of specific steps of data capture, indicating that their performance relies heavily on the hardware environment and is partly unstable.

III. THE PROPOSED GAUSSIAN BLURRING BASED ADVERSARIAL ATTACKS

A. Feature peak suppression as demonstration

We first propose an attack demo named Peak Suppression (PS) involving the basic spirit of blurring.

As stated in [16], a CNN has relatively stronger responses at the locations of key features of objects, reflected by specific peak elements with much larger values in the feature outputs. Based on this, we try to confuse the network by suppressing such elements in the features, to make the target network lose its concentration and make wrong judgements. Then, adversarial data is derived from the peak-suppressed features. The iteration process of PS is shown in Figure 1. After a finite number of iterations, the peaks in the features turn implicit enough to confuse the network, and the iteration process immediately ends when the network changes its judgement. PS can be considered as a simple feature-level blurring technique.

Fig. 1: The iteration process of the Peak Suppression attack.

In the generation of the adversarial data x, the criterion is shown in (1), in which f_{encoder}(x)_i is the i-th layer of the feature maps:

L_{PS} = \sum_i \| \max(f_{encoder}(x)_i) - f_{encoder}(x)_i \|_2    (1)

PS is a white-box non-targeted attack approach with this criterion.

After every iteration, we apply the image-domain restriction to the generated data, with its form shown in (2):

x = \min(\max(x, 0), 1)    (2)
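To make the feature-level blurring concrete, the following PyTorch-style sketch shows one way the PS iteration of (1) and (2) could be implemented. It is our own illustration rather than the authors' code: feature_fn stands for whatever hook returns the deepest feature maps of the target network, and the defaults follow the step length 1 and the 5 × 10^3 iteration budget reported in Section IV-A.

```python
import torch

def ps_attack(model, feature_fn, x0, step=1.0, max_iter=5000):
    """Peak Suppression (PS) sketch: drive every element of the deepest feature
    maps toward the per-map peak value (Eq. (1)) by gradient descent on the
    input, clipping back to the image domain [0, 1] after each step (Eq. (2)).
    `feature_fn` is assumed to return the deepest feature maps of the model."""
    model.eval()
    with torch.no_grad():
        y0 = model(x0).argmax(dim=1)                # original judgement
    x = x0.clone()
    for _ in range(max_iter):
        x.requires_grad_(True)
        feats = feature_fn(x)                       # (B, C, H, W)
        peaks = feats.amax(dim=(2, 3), keepdim=True)
        # Eq. (1): L2 distance between each feature map and its own peak value
        loss = ((peaks - feats) ** 2).sum(dim=(2, 3)).sqrt().sum()
        grad, = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x = (x - step * grad).clamp(0.0, 1.0)   # descend on L_PS, then Eq. (2)
            if model(x).argmax(dim=1).ne(y0).all():
                break                               # judgement changed: stop
    return x
```

The early stop mirrors the description above: the iteration ends as soon as the network's judgement changes (checked at batch level here for brevity).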
B. Gaussian blurring based adversarial attack

As proved by the experiments in Section IV, with its features smoothed, the adversarial data generated by PS also has lower clarity than the corresponding original data, and is similar to the results of pixel-level blurring. Therefore, although PS is not an image preprocessing method, it inspires us to investigate and conduct experiments on pixel-level blurring based adversarial attacks happening in image preprocessing. We further propose a novel adversarial attack method based on the typical gaussian blurring technique.

Fig. 2: Structure of the gaussian blurring based attack.

Fig. 3: The gaussian blurring based attack serves as a warning to existing image processing systems with gaussian blurring.

Instead of determining specific pixels or patches to modify, we directly apply gaussian blurring to the original data and try to turn it aggressive. During the iterations, the gaussian kernel is modified instead of the original data. One main difference between the blurring in PS and the gaussian blurring based attack is that the features are blurred in PS, while the original data is blurred in the gaussian blurring based attack. The latter attack method better fits the real cases in which blurring is applied to the data instead of its features.

The structure of the proposed method is shown in Figure 2. In every iteration, the original data is blurred with a designed kernel and then inputted to the target network. The designed kernel has the same number of channels as the data (i.e. 3), with each channel having a different std value. The attack adopts the criterion originally adopted by the target network, but with the iteration direction reversed, as shown in (3), in which k is the gaussian kernel with mean µ and std σ. Note that the criterion of the gaussian blurring based attack can be set targeted, by changing y_0 to a predefined label and reversing the iteration direction. However, due to the strong constraint on data modifications (they must satisfy the equation of gaussian blurring), we do not expect the adversarial data to be classified as a certain class. Instead, we set the criterion non-targeted, which is proved effective in Section IV.

L_g = -L_0(f(g(x, k(\mu, \sigma))), y_0)    (3)

When back-propagating, µ of the gaussian kernel is fixed as the center of the kernel; σ is a trainable parameter with its gradient determined in (4), in which (µ_1, µ_2) is the coordinate of the center of the kernel, (σ_1, σ_2) is the std value of the 2D gaussian distribution corresponding to one channel of the kernel, and N is the predefined scale of the gaussian kernel:

\frac{dL_g}{d\sigma_1} = \frac{dL_g}{dk(\mu, \sigma)} \cdot \frac{1}{N^2} \sum_{(a,b) \in k} \frac{((a - \mu_1)^2 - \sigma_1^2) \cdot A}{2\pi \sigma_1^4 \sigma_2}

\frac{dL_g}{d\sigma_2} = \frac{dL_g}{dk(\mu, \sigma)} \cdot \frac{1}{N^2} \sum_{(a,b) \in k} \frac{((b - \mu_2)^2 - \sigma_2^2) \cdot A}{2\pi \sigma_1 \sigma_2^4}    (4)

A = e^{-\frac{1}{2} \left( \frac{(a - \mu_1)^2}{\sigma_1^2} + \frac{(b - \mu_2)^2}{\sigma_2^2} \right)}

The iteration process immediately ends when the network changes its judgement.

With the designed gaussian blurring, the modified data will constantly be in the image domain [0, 1]^n, and it is possible to generate adversarial data free from the out-of-manifold violation raised in [13].

With the wide adoption of gaussian blurring in image processing systems, the designed gaussian kernels are highly possible to exist in real cases, and to generate real-world adversarial examples via gaussian blurring preprocessing. Note that the parameters of gaussian blurring in the target image processing systems are unknown and unchangeable, so the designed aggressive gaussian blurring is more a warning to existing systems of this kind (shown in Figure 3) than a pure attack method. It points out the fact that a wrong parameter choice of gaussian blurring in an image processing system may produce a terrible preprocessing module that turns ordinary data into adversarial examples.
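As an illustration of how such a kernel could be optimized, the sketch below builds a per-channel gaussian kernel with trainable (σ_1, σ_2) and lets autograd supply the gradient of (4) instead of coding it by hand. The helper names gaussian_kernel and blur_attack, and the use of a depthwise convolution for the per-channel blurring, are our assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(sigma, size):
    """Differentiable (3, 1, size, size) gaussian kernel with one (sigma_1,
    sigma_2) pair per image channel; the mean is fixed at the kernel center,
    so autograd reproduces the gradient of Eq. (4) automatically."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2.0
    a = coords.view(-1, 1)                     # vertical offset from the center
    b = coords.view(1, -1)                     # horizontal offset from the center
    s1 = sigma[:, 0].view(-1, 1, 1)
    s2 = sigma[:, 1].view(-1, 1, 1)
    k = torch.exp(-0.5 * (a[None] ** 2 / s1 ** 2 + b[None] ** 2 / s2 ** 2))
    k = k / k.sum(dim=(1, 2), keepdim=True)    # normalized kernel
    return k.unsqueeze(1)                      # depthwise conv weight layout

def blur_attack(model, x0, y0, size=9, sigma0=10.0, step=1.0, max_iter=5000):
    """Gaussian blurring based attack sketch: iterate on sigma (not on the
    data) so that the blurred data maximizes the network's own loss, Eq. (3).
    y0 is the batch of original labels."""
    model.eval()
    sigma = torch.full((3, 2), sigma0, requires_grad=True)
    for _ in range(max_iter):
        x_blur = F.conv2d(x0, gaussian_kernel(sigma, size),
                          padding=size // 2, groups=3)
        logits = model(x_blur)
        if logits.argmax(dim=1).ne(y0).all():  # judgement changed: stop
            break
        loss = -F.cross_entropy(logits, y0)    # Eq. (3): reversed direction
        grad, = torch.autograd.grad(loss, sigma)
        with torch.no_grad():
            sigma -= step * grad               # gradient descent on L_g
            sigma.clamp_(min=1e-3)             # keep the stds positive
    return gaussian_kernel(sigma.detach(), size)
```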
IV. EXPERIMENTS AND DISCUSSIONS

A. Experiment setup

Several PyTorch-pretrained classic neural network models, ResNet18, ResNet50 [17] and DenseNet121 [18] trained on ImageNet [14], are used as the target networks for adversarial data generation, for baseline comparisons, and for investigating the relationship between the scale of the gaussian kernel, the initial σ value and the attack performance in the gaussian blurring based attack. PS and the gaussian blurring based attack are conducted with the same ImageNet dataset. In detail, the dataset involves five classes (throne-857, tractor-866, mushroom-947, turtle-34 and breastplate-461) with 50 images in each. Each image in PS, and each gaussian kernel in the gaussian blurring based attack, goes through 5 × 10^3 iterations with step length 1. The choice of this relatively large step length in PS is because step length 1 meets the requirement of setting the hotspots to average; the choice in the gaussian blurring based attack is because gaussian blurring does not have the ability to change the characteristics of the data (such characteristics do NOT refer to the features outputted by the network). Note that with all networks, PS is applied to the deepest features of the data.

Several attack baselines are experimented with to prove the effectiveness of the proposed methods. The deepest features of the adversarial data generated by PS and the gaussian blurring based attack during the iterations are also visualized and analyzed.
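The paper does not publish its data pipeline; the following sketch shows one plausible way to load the pretrained target networks and the five-class ImageNet subset described above. The folder layout, the helper name load_targets and the preprocessing choices are assumptions.

```python
import torch
from torchvision import datasets, models, transforms

# Target networks as described in Section IV-A (torchvision pretrained weights).
def load_targets(device="cpu"):
    nets = {
        "resnet18": models.resnet18(pretrained=True),
        "resnet50": models.resnet50(pretrained=True),
        "densenet121": models.densenet121(pretrained=True),
    }
    return {name: net.eval().to(device) for name, net in nets.items()}

# Five ImageNet classes: throne (857), tractor (866), mushroom (947),
# turtle (34) and breastplate (461), 50 images each, one folder per class.
# ImageNet mean/std normalization, if used, would be applied inside a model
# wrapper so that the attacks operate in the image domain [0, 1].
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),          # keeps pixels in [0, 1]
])
subset = datasets.ImageFolder("imagenet_subset/", transform=preprocess)
loader = torch.utils.data.DataLoader(subset, batch_size=16, shuffle=False)
```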

Fig. 4: Adversarial examples generated by PS.

Fig. 5: The confidence distribution of the low-confidence adversarial examples by PS.

Fig. 6: The visualization process of PS. (a) The original data. (b) The confidence decrease. (c) The max value decrease. (d) The feature heatmap.
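Heatmaps such as the one in Figure 6d can be read out directly from the deepest feature maps. The sketch below is our own illustration (the hook on ResNet18's layer4 and the helper name channel_heatmap are assumptions; the paper does not describe its visualization code).

```python
import torch
import matplotlib.pyplot as plt
from torchvision import models

# Grab the deepest convolutional feature maps of ResNet18 with a forward hook
# (for ResNet18 this is the output of `layer4`; other backbones differ).
model = models.resnet18(pretrained=True).eval()
feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(out=o.detach()))

def channel_heatmap(x, channel=0):
    """Return one channel of the deepest features of image batch x,
    as used for the heatmaps in Figure 6d."""
    with torch.no_grad():
        model(x)
    return feats["out"][0, channel]        # (H, W) feature map

# Usage (x: a preprocessed image batch):
#   plt.imshow(channel_heatmap(x).cpu(), cmap="jet"); plt.colorbar()
```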

B. Attack results of PS

PS achieves good attack performance, with the generated adversarial examples (with ResNet18) shown in Figure 4. From Figure 4, most adversarial examples have quite low misclassification confidence. In the experiment, 85.3% of the adversarial examples have a misclassification confidence lower than 0.1, with the confidence distribution of such examples shown in Figure 5. Two main reasons lead to such low confidence: first, PS adopts a non-targeted criterion, and it is barely possible for it to generate an example with extremely high confidence; second, PS suppresses the hotspots in the features, indicating that the examples PS generates do not have obvious characteristics of any class.

Figure 6 shows the visualized iteration process of PS on an image. The data goes through 12 iterations, which change the classification result of the network from tractor (866) to manhole cover (640). Figure 6d shows heatmaps of a randomly-selected channel of the deepest features during the iteration process; Figure 6b shows the confidence decrease of class 866; Figure 6c shows the decrease of the maximum value of the feature. The visualization verifies the feature-peak-suppression aim of PS.

C. Attack results of the gaussian blurring based attack

σ is initially set to 10 in the gaussian blurring based attack. The choice of this relatively large initialization value of σ is further explained in Section IV-F. It is found that a gaussian kernel with σ = 10 does not result in qualitative changes of the data, where qualitative changes are defined by 'change of judgement by the oracle' [13].

The gaussian blurring based attack is also effective, with the generated adversarial examples (with ResNet18) shown in Figure 7. Similar to PS, the adversarial examples generated by the gaussian blurring based attack also have relatively low misclassification confidence, which is again closely related to the non-targeted criterion it adopts.

Fig. 7: Adversarial examples generated by the gaussian blurring based attack.

A similar visualization is conducted for the gaussian blurring based attack. Multiple data samples are converted to adversarial examples with their related features visualized. The visualization results are shown in Figure 8. From the results, although gaussian blurring does not suppress the feature peak values as PS does, it still changes the attention of the target network and its response to key features (shown in Figure 8: hotspots (red) turn 'cold' (blue) sharply). Compared with PS, the gaussian blurring based attack is more like a non-targeted feature peak shifting than a feature peak suppression. Note that in Figure 8, the adversarial data is generated with only one single gaussian blurring step with kernel scale 9 and σ = 10, without iterations on σ. In the experiment, 97.1% (ResNet18), 98.1% (ResNet50) and 97.6% (DenseNet121) of the original data turns aggressive with only this single step. The reasons for this single-step phenomenon are stated and analyzed in Section IV-F.

We do not investigate the changing process of σ of the gaussian kernel. The reasons are also stated in Section IV-F.
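The single-step observation can be reproduced with a simple evaluation loop: blur every image once with a fixed scale-9, σ = 10 kernel and count how many predictions flip. The sketch below reuses the hypothetical gaussian_kernel helper and data loader from the earlier snippets and is only one possible way to run the check.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def single_step_error_rate(model, loader, size=9, sigma0=10.0):
    """Fraction of originally-correct images whose prediction flips after a
    single gaussian blurring step (kernel scale 9, sigma = 10)."""
    k = gaussian_kernel(torch.full((3, 2), sigma0), size)   # fixed kernel, no iteration
    flipped, total = 0, 0
    for x, y in loader:
        pred_clean = model(x).argmax(dim=1)
        keep = pred_clean.eq(y)                  # only count originally-correct data
        if keep.any():
            x_blur = F.conv2d(x[keep], k, padding=size // 2, groups=3)
            pred_blur = model(x_blur).argmax(dim=1)
            flipped += pred_blur.ne(y[keep]).sum().item()
            total += keep.sum().item()
    return flipped / max(total, 1)
```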

Fig. 8: Visualization of the features of the adversarial data generated by the gaussian blurring based attack.

D. Attack performance with different kernel scales in the gaussian blurring based attack

Different scales of gaussian kernels result in different attack performance. Setting the blurring scale to 3, 5, 7 and 9, the gaussian blurring based attack on the three models yields the error rates shown in Figure 9. With a larger scale, the data goes through greater modifications and is more likely to be misclassified. In addition, it is also found that no kernel scale results in qualitative changes of the data.

Fig. 9: Error rates of the three models caused by different kernel scales in the gaussian blurring based attack.

E. Baseline comparisons

We conduct baseline comparisons with several existing attack methods. We conduct the same experiments as in [6] but without transferability tests, since both PS and the gaussian blurring based attack are non-targeted and cannot be measured by the targeted success rate raised in [6]. The attack baselines involve targeted projected gradient descent (tPGD) [1], the targeted momentum iterative method (tMIM) [4] and the feature distribution attack (FDA) [5]. The gaussian blurring based attacks with kernel scale 3 and 9 are involved.

The error rates of the DenseNet121 and ResNet50 models under the baseline methods and our attacks are shown in Table I. From the comparison results, PS and the gaussian blurring based attack with kernel scale 9 reach state-of-the-art attack performance. In comparison, kernel scale 3 is rather deficient, which is similar to the results shown in Section IV-D. The gaussian blurring based attack is slightly worse than PS, since gaussian blurring greatly restricts the range of modifications on the data. The table serves as a reference only, because of two main differences between our proposed methods and these baseline methods: first, our methods do not consider transferability as a factor of performance; second, since blurring is an intermediate step with continuous-domain inputs, our methods do not consider the discreteness of the image domain.

TABLE I: Baseline comparisons on error rates (%).

              tPGD   tMIM   FDA    PS (ours)   G-3 (ours)   G-9 (ours)
DenseNet121   23.1   48.6   91.9   97.6        44.1         96.1
ResNet50      20.2   44.3   87.3   99.0        47.6         95.2
According to the numerical results and visualizations above, we may reach the following conclusions about the relationship between blurring and adversarial attacks:

• A designed feature-level blurring can mislead the network to wrong judgements, since the attention of the target network can be distracted.

• A feature-level blurring may be equivalent to a specific pixel-level blurring. However, a pixel-level blurring may not be equivalent to any feature-level blurring.

• Adversarial examples have different feature peaks from the corresponding original data, and can be generated by multiple feature-manipulation methods not limited to peak suppression and peak shifting.

F. Theoretical analyses and discussions - A manifold perspective

For clearer illustration, we use the two-nested-spirals model as an example to investigate the essential factors that make the gaussian blurring based attack work.

Figure 10a shows a pair of two nested spirals. A 5-layer multi-layer perceptron (MLP) is trained to classify points on the two spirals. The red and blue areas are 2D manifolds representing the data domains of the two classes in the perspective of the MLP, named manifolds M_r and M_b. Since the problem is much easier than the ImageNet classification in Section IV, training of the MLP ends when it reaches 70% accuracy, to represent the fact that networks are not that accurate when facing harder problems.

Fig. 10: A two-nested-spirals model for illustration. (a) An MLP classifying points on a two-nested-spiral pair. (b) The manifold of a data point after different gaussian blurring operations.
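The two-nested-spirals setup is straightforward to reproduce. The sketch below generates such a pair and trains a small MLP on it; the layer widths, the noise level and the exact spiral construction are our guesses, since the paper only specifies a 5-layer MLP stopped at roughly 70% accuracy.

```python
import numpy as np
import torch
import torch.nn as nn

def make_spirals(n=500, noise=0.05):
    """Two nested spirals with labels 0/1 (a standard toy construction)."""
    t = np.sqrt(np.random.rand(n, 1)) * 3 * np.pi
    d = np.hstack([t * np.cos(t), t * np.sin(t)]) / (3 * np.pi)
    x = np.vstack([d, -d]) + noise * np.random.randn(2 * n, 2)
    y = np.hstack([np.zeros(n), np.ones(n)])
    return torch.tensor(x, dtype=torch.float32), torch.tensor(y, dtype=torch.long)

x, y = make_spirals()
mlp = nn.Sequential(                      # a 5-layer MLP classifier
    nn.Linear(2, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 2),
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
for _ in range(10000):
    loss = nn.functional.cross_entropy(mlp(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if (mlp(x).argmax(1) == y).float().mean() >= 0.70:
        break                             # stop at ~70% accuracy, as in the paper
```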

When applying gaussian blurring to a specific point on the spirals, the modified data points form a manifold M as σ varies in (0, ∞). According to the characteristics of convolution with a normalized gaussian kernel, the data before and after gaussian blurring (respectively D_0 and D) satisfy the following condition:

D = D_0 \otimes K(\mu, \sigma) \;\rightarrow\; \sum_{d \in D} d = \sum_{d_0 \in D_0} d_0 \times \sum_{k \in K} k = \sum_{d_0 \in D_0} d_0    (5)

Therefore, the sum of the elements of the data remains the same before and after gaussian blurring. This indicates that the manifold M is a curve within a specific high-dimensional plane, shown in (6):

M \subset \{ p \mid \sum_{p_i \in p} p_i = \sum_{d_0 \in D_0} d_0 \}    (6)
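The sum-preservation property in (5) only requires the kernel to be normalized, and can be checked numerically. The sketch below reuses the hypothetical gaussian_kernel helper from Section III-B and assumes circular padding so that no mass is lost at the image borders.

```python
import torch
import torch.nn.functional as F

def blur_sum_check(x, size=9, sigma0=10.0):
    """Numerically verify Eq. (5): a normalized gaussian kernel leaves the
    per-channel sum of the data unchanged.  Circular padding is used so that
    no mass leaks out of the image at the borders."""
    k = gaussian_kernel(torch.full((3, 2), sigma0), size)   # normalized kernel
    p = size // 2
    x_pad = F.pad(x, (p, p, p, p), mode="circular")
    x_blur = F.conv2d(x_pad, k, groups=3)                   # 'valid' conv on padded data
    return torch.allclose(x.sum(dim=(2, 3)), x_blur.sum(dim=(2, 3)), rtol=1e-4)

# Example: blur_sum_check(torch.rand(1, 3, 224, 224))  ->  True
```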

The two endpoints of the curve are the original data point and the modified data after gaussian blurring with σ → ∞. For a 1D gaussian blurring on the original data D_0 (shown in Figure 10b as the green point), M is a straight line segment (shown in Figure 10b as the purple line).

From Figure 10b, M spans both M_r and M_b. For any D_0, if the corresponding manifold M spans both M_r and M_b, adversarial data points can be generated by 'dragging' the original data point to the other manifold, with the mathematical representation shown in (7):

M_r \cap M \neq \emptyset, \; M_b \cap M \neq \emptyset, \; D_0 \in M_0 \ (M_0 \in \{M_r, M_b\}) \;\rightarrow\; \exists \sigma : D_0 \otimes K(\mu, \sigma) \in \{M_r, M_b\} \setminus M_0    (7)

However, such 'adversarial' data may essentially belong to a class different from that of D_0, or even not belong to any class (e.g. in this illustration, the other endpoint of the line segment belongs to neither of the classes). Typical instances of such data are the meaningless examples generated by type I attack methods [12].

In real scenarios, however, we can make a reasonable assumption that gaussian blurring with a moderate σ does not qualitatively change the data:

\forall D, \ \exists \sigma_0 : \ \forall \sigma \in (0, \sigma_0), \ \mathrm{Oracle}(D) = \mathrm{Oracle}(D \otimes K(\mu, \sigma))    (8)

Therefore, in real cases, all data points in M belong to the same class, and the adversarial data generated in (7) turns effective.

Fig. 11: Attack details from the manifold perspective.

A relatively larger σ may help the data 'jump' out of local minima, with an illustration shown in Figure 11. M_i and M_j are two local minima of classes i and j. A larger σ is better at driving D_0 out of M_i. Although M_i may exist in areas with even larger σ, such a probability is much smaller than that of the data getting stuck in M_i with a small σ. A good initialization of σ is even more important than the iteration process. Therefore, in the experiments, we choose 10 as the initial std value, and the results in Section IV-D verify our claim.

In order to further prove the significance of σ initialization, we conduct experiments on the relationship between the attack performance and the initial σ value. In the experiments, the three network structures are adopted with gaussian kernel scale 9. With the initial value set to 0.1, 0.5, 0.75, 1, 5 and 10, the gaussian blurring based attack on the three models has the error rates shown in Figure 12. Smaller initial values greatly reduce the attack performance of the method. In contrast, iterations of σ are much less important.

Fig. 12: Error rates of the three models caused by different σ initialization in the gaussian blurring based attack.

In real scenarios, if any of the three target networks serves as the image processing module in a system, gaussian blurring with kernel scale 9 and σ = 10 is obviously a wrong choice that turns over 95% of the ordinary data into adversarial examples. Also according to the experiments, a larger kernel scale and a larger σ both increase the denoising performance of image preprocessing, but in the meantime, both are more likely to turn the ordinary data into adversarial examples. Therefore, the trade-off between denoising performance and robustness of blurring in image preprocessing is a significant and urgent problem.

V. CONCLUSION

In this paper, we argue that the commonly-used gaussian blurring technique is a potentially unstable factor which may be exploited by well-designed adversarial attacks. We first propose a simple demo of feature blurring for adversarial attacks. Then, we apply gaussian blurring to the original data, to prove that gaussian blurring can be turned aggressive in pixel-level image preprocessing applications. Experiment results show that the blurring demo and the gaussian blurring based attack successfully confuse the target network and mislead it to wrong judgements. We demonstrate from a specific perspective that an ordinary module in perception systems may cause threats to accurate scene recognition.

REFERENCES

[1] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
[2] Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, et al. Adversarial attacks and defences competition. In The NIPS'17 Competition: Building Intelligent Systems, pages 195–231. Springer, 2018.
[3] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
[4] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9185–9193, 2018.
[5] Nathan Inkawhich, Wei Wen, Hai Helen Li, and Yiran Chen. Feature space perturbations yield more transferable adversarial examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7066–7074, 2019.
[6] Nathan Inkawhich, Kevin J Liang, Lawrence Carin, and Yiran Chen. Transferable perturbations of deep feature distributions. arXiv preprint arXiv:2004.12519, 2020.
[7] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[8] Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. EAD: Elastic-net attacks to deep neural networks via adversarial examples. arXiv preprint arXiv:1709.04114, 2017.
[9] Yang Song, Rui Shu, Nate Kushman, and Stefano Ermon. Constructing unrestricted adversarial examples with generative models. In Advances in Neural Information Processing Systems, pages 8312–8323, 2018.
[10] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
[11] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.
[12] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 427–436, 2015.
[13] Sanli Tang, Xiaolin Huang, Mingjian Chen, Chengjin Sun, and Jie Yang. Adversarial attack type I: Cheat classifiers by significant changes. arXiv preprint arXiv:1809.00594, 2018.
[14] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[15] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
[16] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2921–2929, 2016.
[17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[18] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.