UNIVERSITY OF CALIFORNIA Los Angeles

Initializing Hard-Label Black-Box Adversarial Attacks Using Known Perturbations

A thesis submitted in partial satisfaction of the requirements for the degree Master of Science in Computer Science

by

Shaan Karan Mathur

2021

© Copyright by
Shaan Karan Mathur
2021

ABSTRACT OF THE THESIS

Initializing Hard-Label Black-Box Adversarial Attacks Using Known Perturbations

by

Shaan Karan Mathur
Master of Science in Computer Science
University of California, Los Angeles, 2021
Professor Sriram Sankararaman, Chair

We empirically show that an adversarial perturbation for one image can be used to accelerate attacks on another image. Specifically, we show how to improve the initialization of the hard-label black-box attack Sign-OPT, operating in the most challenging attack setting, by using previously known adversarial perturbations. Whereas Sign-OPT initializes its attack by searching along random directions for the nearest boundary point, we search for the nearest boundary point along the direction of previously known perturbations. This initialization strategy leads to a significant drop in initial distortion on both the MNIST and CIFAR-10 datasets. Identifying which images share similar vulnerabilities is a promising direction for future research.

The thesis of Shaan Karan Mathur is approved.

Jonathan Kao

Cho-Jui Hsieh

Sriram Sankararaman, Committee Chair

University of California, Los Angeles

2021

To my family and friends, and the between.

TABLE OF CONTENTS

1 Introduction ...... 1

2 Background ...... 5

2.1 White-Box Attacks ...... 5

2.2 Black-Box Attacks ...... 7

2.2.1 Soft-Label Black-Box Attacks ...... 7

2.2.2 Hard-Label Black-Box Attacks ...... 8

2.3 Transfer-Based Attacks ...... 10

3 Initializing Sign-OPT with Known Perturbations ...... 11

3.1 Algorithm ...... 11

3.2 Choosing the Known Perturbations ...... 13

4 Experimental Results ...... 15

4.1 Attacked Architectures ...... 15

4.2 Known Perturbations Reduce Initial Distortion ...... 16

4.2.1 MNIST ...... 16

4.2.2 CIFAR-10 ...... 19

4.3 Known Perturbations Outperform Random Perturbations ...... 23

4.4 Conclusion and Future Work ...... 32

References ...... 33

LIST OF FIGURES

2.1 FGSM attack algorithm, misclassifying a panda as a gibbon...... 6

4.1 Attacking 30 random MNIST images using the known perturbations of 20 other random MNIST images...... 17

4.2 Non-cherry picked examples of images we attack in the random MNIST run (shown examples were chosen at random). The left column shows the original image; the middle column visualizes our improved initialization; the right column shows the Sign-OPT initialization. Notice our initialization targets only certain regions of the image, while the Sign-OPT initialization tends to perturb each pixel. Sign-OPT was unable to generate an adversarial example in the last row, but our approach was able to...... 18

4.3 Attacking 30 MNIST images with class label 1 using the known perturbations of 20 other MNIST images with class label 1...... 19

4.4 Attacking visually similar MNIST images. Images are selected by finding 50 nearest neighbors with minimal average distance, using feature-vector distance as a measure of visual similarity. Then 30 of those images are attacked using perturbations of the other 20 images...... 20

4.5 Attacking 30 random CIFAR-10 images using the known perturbations of 20 other random CIFAR-10 images...... 20

4.6 Non-cherry picked examples of images we attack in the random CIFAR-10 run (shown examples were chosen at random). The left column shows the original image; the middle column visualizes our improved initialization; the right column shows the Sign-OPT initialization. Notice our initialization targets only certain regions of the image, while the Sign-OPT initialization tends to perturb each pixel...... 21

4.7 Attacking 30 CIFAR-10 images with class label 5 using the known perturbations of 20 other CIFAR-10 images with class label 5...... 22

4.8 Attacking visually similar CIFAR-10 images. Images are selected by finding 50 nearest neighbors with minimal average distance, using feature-vector distance as a measure of visual similarity. Then 30 of those images are attacked using perturbations of the other 20 images...... 23

4.9 Distribution of initial distortions found using randomness (blue) and found using known perturbations (orange). On average, known perturbations find smaller initial distortions...... 24

4.10 Distribution of differences between random initialization and known-perturbation initializations for all images considered. In all but 4 images, the known-perturbation initializations outperform random initialization, improving by about 1.927 on average...... 25

4.11 The top 3 images are the best 3 of the 138 images to precompute perturbations for the class 0 (airplane) images considered. The bottom 3 images are analogously the worst 3 to precompute. Counterintuitively, notice that none of the top 3 are of airplanes, but one of the bottom 3 is...... 26

4.12 Attacking images of classes 0, 4, 5, and 9 exclusively using perturbations from each class. The top boxplots (a,b) show that using perturbations from the same class (e.g. using perturbations for class 0 images to initialize attacks on class 0 images) does not always lead to a minimal average initial distortion. On the other hand, the lower boxplots (c,d) show that it sometimes can lead to the minimal average initial distortion...... 29

4.13 Examining how visual similarity correlates with initial distortion when using the perturbation of one image to attack another image on average. All similarity-distortion pairs were placed into 100 bins and averaged within each bin; bins with fewer than 200 pairs were discarded, leaving a median bin size of 4734 pairs. Very dissimilar images tend to have poor initializations on average; however, slightly dissimilar images actually do better than very similar images on average...... 30

4.14 Figures (a) and (b) show how visual similarity affects the initial distortion when attacking all 3618 images using two different perturbations; the correlations are very noisy and also typical for most choices of perturbation. Figures (c) and (d) analogously show how visual similarity affects the initial distortion when attacking two different images using all 139 perturbations; again they are noisy and also typical...... 31

LIST OF TABLES

4.1 Finding the labels of the best (and worst) 3 images to precompute known perturbations for when attacking images of each class. Notice that the best 3 images for classes 0, 2, and 4 don't include themselves. For class 0 the best triplet including a class 0 image is worse than 10 other triplets; for class 2, 259 other triplets are better; for class 4, 13 others. The most useful perturbations to precompute need not come from the same class...... 27

ACKNOWLEDGMENTS

I would not be sitting here writing my thesis if it weren’t for the countless people who have helped shape my life and my career. It isn’t possible to fully express the magnitude of their impact on me, but I will take this time to thank many of them.

My advisor Prof. Sriram Sankararaman has been a great research mentor to me, not only providing me feedback and guidance on my research ideas but also emphasizing the importance of patience and resilience in research. Failure is part of the process, and I am very grateful to Sriram for teaching me that lesson. I am also thankful for Prof. Cho-Jui Hsieh for helping me devise such an interesting project, and for creating a space for me to make my own research decisions while still guiding me along the right path. An additional thank you to Keerti Kareti for his contributions early on in this work as well.

My passion for Computer Science and Mathematics flourished at UCLA. I am grateful to Prof. Jonathan Kao for providing me excellent instruction in deep learning, and for allowing me to return as a TA to help build our generation of deep learning engineers and researchers. I am so thankful for my students in CS 111, CS 180, and ECE C247; teaching them was one of the most fulfilling experiences I have had. I am eternally grateful to every Computer Science and Mathematics professor I have had, their courses awakening in me a passion I never could have imagined.

My young adult life was the springboard that launched me into this life. Alex Grass and Rahil Bhatnagar, my relationship with you guys gives me a strength I hope to take with me for the rest of my life. Garvit Pugalia, Abhinava Shriraam, Sparsh Arora, Yash Chaudhary - you were there since the beginning of my undergraduate journey, and have been a family for me when I was most afraid of being alone. And to my girlfriend Shivangi Kulshreshtha: you have been a light for me in the dark, a best friend who is always on my team, and an extraordinarily brilliant mind always ready to discuss the nature of the universe with me. You’ve shaped who I am today, Shivi.

And finally to my family. To my younger siblings Devan and Saira Mathur, I see in both of you the marks of brilliance and, most importantly, the hearts of good people; I am grateful for your belief and support and am excited to watch your extraordinary lives unfold. To my Mama, Mandy Mathur, and Papa, Raj Mathur - you built the core of who I am. Everything I have done, can do, and will do is because of you. Thank you for the many sacrifices you made that I probably will never be able to fully comprehend. The three of us will show you that they were worth it.

CHAPTER 1

Introduction

It has been shown that neural networks are susceptible to adversarial examples. An adversarial example is an input to a machine learning model that has been perturbed, typically with a small amount of carefully crafted noise, so that the machine learning model misclassifies the original input. A classic pedagogical example would be the adversarial vulnerability of ResNet-50 [HZR15], where human-imperceptible noise added to an image of a panda coerces the model into classifying it as a gibbon [GSS15] (see Figure 2.1). This has led to the field of adversarial robustness, which seeks both to discover new methods of crafting small adversarial perturbations and to develop robust training algorithms that guard against these attacks. This thesis primarily focuses on improving the former, with the hope that this begets better robustness algorithms in the future.

The subject of adversarial robustness is critical in a world growing more dependent on machine learning systems. For example, a world with self-driving vehicles is becoming more realistic every day thanks to progress in deep learning. Although this is an exciting step for humanity, these vehicles could be weaponized by an adversary who can successfully develop adversarial examples for those vehicles. In 2017 Eykholt et al. modified a stop sign with black and white stickers, which disturbingly tricked their model into believing the stop sign was actually an 80 mph speed limit sign [EEF18]. As deep learning continues to permeate various industries, we may have to worry about misrepresentation of illness, market manipulation by disrupting financial analysis, or coercion of armed systems into attacking the wrong target. Exploring adversarial vulnerability is essential if we want to make headway in preventing these dark possibilities.

There are different adversarial attack settings, each based upon what resources the adversary has at its disposal. Broadly speaking, this splits the attack possibilities into one of two classes: white-box attacks, which give the attacker access to the attacked model's parameters and architecture; and black-box attacks, which give the attacker access only to the model's outputs. White-box attacks give the attacker the most power, and make computation of adversarial perturbations relatively straightforward using gradient-based methods (e.g. a perturbation may align with the gradient of the loss with respect to the input). Popular methods of attack in this setting are the C&W [CW17] and PGD [MMS19] attacks.

Although white-box attacks are a good worst-case metric to evaluate the robustness of a model, this attack model is fairly unrealistic. A more realistic scenario would involve an external attacker who does not know the model architecture and can only query the model for outputs. This is analogous to modern API-based Machine Learning as a Service platforms, whose implementation is hidden from the user. One immediate way to attack these black-box models is via transfer-based methods, where the attacker runs a white-box attack on her own substitute model that performs the same task. The adversarial perturbation that works against the substitute model often works against the original model as well, as first shown by Papernot et al. in [PMG17]. There are also query-based attack algorithms that avoid the cost of training a new model by strategically querying the model to help craft an adversarial perturbation. This query-based black-box scenario can be further partitioned based upon the form of the model output. The soft-label black-box setting gives the attacker access to the model's output probabilities or logit scores. Although a gradient cannot be directly computed without the model's parameters, it can be approximated from these scores via finite differences. This flavor of finite-difference attacks was first developed with the ZOO attack [CZS17].

The most difficult attack setting is one in which the attacker only gets the output label from the model without any logits or probabilities: the hard-label black-box setting. Gradients of the original function cannot be as easily computed here, since there is much less information available to the attacker about the original function's topology. Brendel et al. provided the first foundational step in approaching this attack setting by introducing Boundary Attack [BRB18], which essentially involved a random walk along the decision boundary to find a minimal distortion perturbation. Though this algorithm did not have a convergence guarantee, it showed that following the boundary is a key idea in this attack setting. By framing the distance between the input and the boundary as an optimization problem, Cheng et al. were able to reintroduce gradient-based methods into this setting in their so-called OPT attack [CLC18], which was made more query-efficient by Cheng et al. in the Sign-OPT attack [CSC20]. Effectively, these methods initialize by selecting the closest boundary point along k random directions, and then follow the curvature of the boundary to find the closest boundary point.

In this thesis, we seek to significantly improve the initialization of [CSC20]. Rather than choosing the closest boundary point along k random directions, we select our directions by using previously known perturbations that are adversarial for other inputs. Concretely, suppose we already have run Sign-OPT on 20 previous inputs to compute 20 perturbations. When we run Sign-OPT for the 21st time, we will initialize by selecting the closest boundary point along the 20 directions specified by those previous perturbations. It turns out this approach consistently outperforms selecting directions randomly, and we demonstrate this empirically.

Our contributions are as follows:

• We empirically demonstrate that re-using known perturbations is a superior initialization strategy to just using randomness. We first demonstrate examples of this improvement on MNIST and CIFAR-10, and quantify it extensively on the CIFAR-10 dataset.

• We provide an initial study into how choosing which images to precompute perturbations for affects the initial distortion found. We explore natural choices like class similarity and visual similarity, and provide evidence that these heuristics alone are not enough to get optimal results.

CHAPTER 2

Background

Szegedy et al. were the first to observe that neural networks were susceptible to adversarial perturbations [SZS14]. Since this discovery, many adversarial attack methods have been developed. They fall largely under two categories: white-box attacks, which give the attacker access to model parameters and architecture; and black-box attacks, which give the attacker access only to model outputs.

2.1 White-Box Attacks

The white-box attack setting gives the adversary full knowledge of the model, leading to a powerful, albeit unrealistic, adversary. Since the adversary has the model, they can perform backpropagation to directly compute perturbations to the input that maximize the loss (untargeted attack) or minimize the loss for some target label (targeted attack). Though this setting is unrealistic, it offers a worst-case robustness metric when evaluating a model. Moreover, adversarial training involves minimizing the loss of the worst-case perturbation for an input, and so powerful white-box methods can be used during training to increase robustness.

One of the first popular white-box attack algorithms was the Fast Gradient Sign Method (FGSM) introduced by Goodfellow et al. [GSS15]. Let θ be the parameters of a model, x the input to the model, y the label of x, and J(θ, x, y) be the cost used to train the neural network. The adversarial perturbation η for x with L∞ norm ε would be computed via

η = ε · sign(∇_x J(θ, x, y)).

You can see an example of this in Figure 2.1. As FGSM is only a single-step attack, I-FGSM [KGB17] naturally extends the algorithm into multiple iterations.

Figure 2.1: FGSM attack algorithm, misclassifying a panda as a gibbon.
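As a concrete illustration, a minimal PyTorch sketch of the FGSM step follows; the model, x, y, and epsilon arguments are illustrative placeholders rather than anything defined in this thesis, and images are assumed to lie in [0, 1].

    import torch
    import torch.nn.functional as F

    def fgsm_perturbation(model, x, y, epsilon):
        """One FGSM step: eta = epsilon * sign of the input gradient of the loss."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)   # J(theta, x, y)
        loss.backward()
        eta = epsilon * x.grad.sign()         # the L-infinity bounded perturbation
        x_adv = (x + eta).clamp(0.0, 1.0)     # keep the adversarial example a valid image
        return x_adv.detach(), eta.detach()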

The C&W algorithm [CW17] is another popular white-box attack that reformulates Szegedy et al.'s original optimization problem for finding adversarial examples into a more easily solvable form. Let x be the input and δ be a perturbation; the authors define a function f such that f(x + δ) ≤ 0 if and only if x + δ is an adversarial example. They change the objective function so that it weighs both the L_p-norm of the perturbation and the function f:

argmin_δ (‖δ‖_p + c · f(x + δ))   subject to   x + δ ∈ [0, 1]^n.

Projected Gradient Descent (PGD) is another iterative white-box attack [MMS19] which repeatedly perturbs the input using the gradient of the loss, projecting the adversarial example onto a local L_p-ball at every step to make sure the perturbation is minimal. Every iterate is computed using the following step (untargeted attack):

x_{k+1} = Π_{B_p(x_0, ε)} (x_k + t_k ∇_{x_k} J(x_k, y_0)).

Here B_p(x_0, ε) is the L_p-ball of radius ε around the input x_0, J is the loss function, and y_0 is the correct label for x_0.
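A minimal sketch of the PGD iteration for the common L∞ case, where the projection onto B_∞(x_0, ε) reduces to an elementwise clamp; the sign step shown is the usual L∞ variant rather than the raw gradient step written above, and all names are illustrative.

    import torch
    import torch.nn.functional as F

    def pgd_linf(model, x0, y0, epsilon, step_size, num_steps):
        """Untargeted PGD on the L-infinity ball of radius epsilon around x0."""
        x0 = x0.detach()
        x = x0.clone()
        for _ in range(num_steps):
            x.requires_grad_(True)
            loss = F.cross_entropy(model(x), y0)
            grad = torch.autograd.grad(loss, x)[0]
            x = x.detach() + step_size * grad.sign()    # ascent step on the loss
            x = x0 + (x - x0).clamp(-epsilon, epsilon)  # project onto the L-inf ball around x0
            x = x.clamp(0.0, 1.0).detach()              # stay inside the image domain
        return x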

2.2 Black-Box Attacks

The black-box attack setting is a more realistic and challenging paradigm, as the attacker no longer has access to internal model information like parameters and architecture. Instead the attacker is only allowed to query the model and get model outputs. Soft-label black-box attacks refer to the case when the model outputs are probabilities or logits, which provide some information about the model's output topology (allowing gradient estimation via finite differences). The strictly harder situation is the hard-label black-box attack model, where the model output is just the final label assigned by the model; here, following the decision boundary becomes key to identifying good adversarial examples. We outline some of the major works in both areas here. Also applicable here are transfer-based attacks, which create perturbations by using white-box attacks on substitute models; this is further described in Section 2.3.

2.2.1 Soft-Label Black-Box Attacks

Though transfer-based attacks are effective, Chen et al. demonstrated that they typically suffer from large-distortion perturbations. Motivated by this, they used zeroth order optimization (ZOO) via symmetric differences of the soft labels to estimate the gradient and create adversarial perturbations [CZS17]. ZOO was very effective, with distortions comparable to the white-box C&W. However it was not query efficient, so Ilyas et al. used Natural Evolutionary Strategies (NES) to estimate the gradient and perform PGD to find the adversarial example in 2-3 times fewer queries [IEA18]. Ilyas et al. [IEM19] then showed that contemporary state-of-the-art black-box attacks were close to optimal, so that further gains require prior information (which they exploited to boost query efficiency by factors of 2-5 using gradient priors). Tu et al. recently developed their Autoencoder-based Zeroth Order Optimization Method (AutoZOOM), which involves adaptive random gradient strategies as well as dimension reduction to further reduce mean query complexity by 93% [TTC20].

2.2.2 Hard-Label Black-Box Attacks

The hard-label setting is the most challenging since there is no obvious gradient estimation that can be done here. Instead, the main query-based approach involves probing the decision boundary to find a boundary point closest to the original input (i.e. a minimal distortion boundary point). Brendel et al. introduced one of the first query-based approaches, called Boundary Attack [BRB18]. Boundary Attack identifies a decision boundary point around the input (via a binary search between the input and another input with a differing label), and then repeatedly performs a random step plus a small step towards the original input. Of course this sequence of steps may not necessarily lead to an adversarial example, and so the step sizes have to be dynamically tuned to account for this.

Troubled by the lack of convergence guarantees, Cheng et al. [CLC18] cleverly reformulated the decision boundary problem as an optimization problem, allowing them to use finite differences to approximate the gradient. More formally, let x be the input, y be its label, f be the model being attacked, and θ_t be the direction of the adversarial perturbation at time step t. An optimal adversarial example can be found by finding the direction θ* which corresponds to the closest boundary point to x:

θ* = argmin_θ g(θ),   where   g(θ) = min_{λ>0} { λ : f(x + λ · θ/‖θ‖) ≠ y }.

Put in words, g(θ_t) represents the distance from x to the boundary point along direction θ_t. Since g(θ_t) can be directly computed (approximately) using the aforementioned binary search procedure, it is possible to optimize g(θ_t) by estimating the gradient with the finite difference

∇g ≈ (g(θ_t + ε) − g(θ_t − ε)) / (2ε).

This hard-label black-box attack algorithm, called OPT, vastly improved the query efficiency over Boundary Attack. Note that the targeted setting with target class t is obtained just by adjusting the condition of success:

θ* = argmin_θ g(θ),   where   g(θ) = min_{λ>0} { λ : f(x + λ · θ/‖θ‖) = t }.
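A minimal sketch of how g(θ) can be evaluated with hard-label queries alone: grow λ until the prediction flips, then binary-search the crossing point. Here predict is a hypothetical function returning the model's label, and the growth schedule and tolerance are illustrative choices rather than those of the OPT implementation.

    import numpy as np

    def g(predict, x, y, theta, tol=1e-3, lam_init=1.0, max_doublings=20):
        """Distance from x to the decision boundary along direction theta,
        using only hard-label queries predict(.) that return a class label."""
        d = theta / np.linalg.norm(theta)
        lam_hi = lam_init
        for _ in range(max_doublings):        # grow lambda until the label flips
            if predict(x + lam_hi * d) != y:
                break
            lam_hi *= 2.0
        else:
            return np.inf                     # no boundary found along this direction
        lam_lo = 0.0
        while lam_hi - lam_lo > tol:          # binary search between clean and flipped points
            mid = (lam_lo + lam_hi) / 2.0
            if predict(x + mid * d) != y:
                lam_hi = mid
            else:
                lam_lo = mid
        return lam_hi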

Cheng et al. further optimized this algorithm in their Sign-OPT attack [CSC20], where the query efficiency is further improved by computing the sign of the finite difference with a single query. Specifically, the algorithm samples several random directions u_1, ..., u_Q from a Gaussian or uniform distribution, and then computes sign(g(θ_t + ε u_k) − g(θ_t)) for each k. The algorithm then uses the gradient estimate

∇g ≈ Σ_{k=1}^{Q} sign(g(θ_t + ε u_k) − g(θ_t)) · u_k

during the learning step, greatly reducing the number of queries.

Algorithm 1: Sign-OPT
    input : Hard-label model f, original input x, initial direction θ_0
    output: Adversarial perturbation δ, such that x + δ is an adversarial example
    for t = 0, ..., T do
        Randomly sample u_1, ..., u_Q from a Gaussian or uniform distribution;
        Compute ĝ ← (1/Q) Σ_{q=1}^{Q} sign(g(θ_t + ε u_q) − g(θ_t)) · u_q;
        Update θ_{t+1} ← θ_t − η ĝ;
        Evaluate g(θ_{t+1}) using binary search;
    end
    δ ← g(θ_T) · θ_T / ‖θ_T‖;
    return δ;
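A NumPy sketch of the sign-based gradient estimate above; g_of is assumed to be a callable such as the g sketch earlier. For clarity each sign is obtained here from a full evaluation of g, whereas Sign-OPT itself obtains each sign with a single extra query.

    import numpy as np

    def signopt_grad_estimate(g_of, theta, Q=20, eps=1e-3):
        """Estimate grad g(theta) by averaging sign(g(theta + eps*u) - g(theta)) * u
        over Q random directions u."""
        g_theta = g_of(theta)
        grad = np.zeros_like(theta)
        for _ in range(Q):
            u = np.random.randn(*theta.shape)             # random Gaussian direction
            s = np.sign(g_of(theta + eps * u) - g_theta)  # the attack gets this sign with one query
            grad += s * u
        return grad / Q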

Though we’ve discussed how the gradient update step works in OPT and Sign-OPT, we have not mentioned how these algorithms are best initialized. The Sign-OPT authors initial-

ize θ0 by selecting 100 random directions and choosing the one with minimal distance to the

boundary. It is conceivable that we can impose some sort of prior on θ0 that could possibly lead to better initialization (where better means smaller distortion upon initialization). This

∗ thesis demonstrates that re-using the optimal θ for other inputs to initialize θ0 for brand new inputs often leads to superior initializations.

2.3 Transfer-Based Attacks

In the adversarial example literature, transfer-based attacks typically refer to the phenomenon that adversarial examples for one model tend to be adversarial for other models. Papernot et al. first demonstrated this attack methodology in the hard-label setting, where they trained a substitute model, crafted an adversarial example for the substitute model, and then used that same adversarial example against the attacked model [PMG17]. A major disadvantage of this approach is having to go through the entire training pipeline: acquisition of training data, training, hyperparameter search, etc.

Absent from the literature, however, is a study of how an adversarial example for one image can help generate an adversarial example for another image. To the best of our knowledge, this work is the first to do so.

CHAPTER 3

Initializing Sign-OPT with Known Perturbations

Query-based attacks in the hard-label black-box setting are characterized by a search around the decision boundary. But how do we locate the right point on the decision boundary to start with? In attacks like Sign-OPT, the algorithm first samples perturbations θ_1, ..., θ_k from the unit normal distribution. If the perturbed example x + θ_i is misclassified, a binary search is conducted between the input x and the adversarial example x + θ_i to locate the boundary point. The boundary point with the least distortion from x (minimal L_2 distance) is then selected as our initial point for searching along the boundary.

In this chapter we will describe an alternative initialization strategy that uses adversarial perturbations for other inputs in place of the normal-distribution sampling procedure. We will discuss different hypotheses about how to select which perturbations to precompute, which we will test in our empirical study.

3.1 Algorithm

We will outline our algorithm that modifies Sign-OPT to re-use known perturbations. In describing this algorithm, we have a couple of design choices. First, we can either formulate our attack as acting on a single input (which is typical and is what Cheng et al. [CSC20] do), or as an attack on several inputs where the perturbations from earlier inputs are used to attack later inputs. Second, we can either make our algorithm exclusively use known perturbations, or we can search through both known perturbations and random directions to guarantee our approach is at least as good as the random approach. In a practical real-world attack, the natural choice would be to attack a group of inputs where both randomness and known perturbations are used for initialization.

Our empirical study will make use of the single-input formulation, using exclusively known perturbations in our initialization step. Specifically, our attack acts on a single input, assumes we are given some k precomputed perturbations for other inputs beforehand, and uses no randomness in the initialization step. Since the only difference between Sign-OPT and our approach will be the method of initialization, we get a more straightforward comparison between the two.

To describe our algorithm, we will use and extend the notation of Cheng et al. Let x be the input, y be its label, f be the model being attacked, and θ_t be the direction of the adversarial perturbation at time step t. An optimal adversarial example can be found by finding the direction θ* which corresponds to the closest boundary point to x:

θ* = argmin_θ g(θ),   where   g(θ) = min_{λ>0} { λ : f(x + λ · θ/‖θ‖) ≠ y }.

Put in words, g(θ_t) represents the distance from x to the boundary point along direction θ_t.

With this notation in mind, our algorithm proceeds as follows. Given a hard-label black-box model f and input x, we are trying to find an untargeted adversarial perturbation δ so that f(x + δ) ≠ f(x). We are also given previously known perturbations δ_1, ..., δ_k. For each previously known δ_i, we compute g(δ_i), the distance from x to the decision boundary along the direction δ_i. Notice that this boundary point x + g(δ_i) · δ_i/‖δ_i‖ is an adversarial example for x with a distortion of g(δ_i). The δ_i corresponding to a minimal-distortion adversarial example gives us a good initial direction θ_0 = δ_i/‖δ_i‖ with which to initialize the Sign-OPT attack.

Algorithm 2: Sign-OPT with Known Perturbations - Single Input (Our Sign-OPT)
    input : Model f, input x, known perturbations δ_1, ..., δ_k
    output: Adversarial perturbation δ, such that x + δ is an adversarial example
    θ_0, min_distortion ← None, ∞;
    for i ← 1, ..., k do
        if g(δ_i) < min_distortion then
            min_distortion ← g(δ_i);
            θ_0 ← δ_i / ‖δ_i‖;
        end
    end
    if min_distortion = ∞ then
        Unable to find perturbation, fail.
    else
        δ ← Sign-OPT(f, x, θ_0);
        return δ;
    end

In a real implementation of Algorithm 2, g(δ_i) is calculated with a binary search between x and some misclassified x′ = x + λ · δ_i/‖δ_i‖ where f(x) ≠ f(x′). Finding this misclassified point x′ amounts to a search over values of λ; our implementation (quite arbitrarily) tries λ = 1, 1.5, (1.5)^2, ..., (1.5)^9. It's also worth noting that this commonly used binary search approach is imperfect. Along a particular direction there may be, in a multi-class setting, multiple decision boundaries with multiple classes. Although a binary search may find the boundary with one class, there may be another boundary with another class that is closer to x. Nevertheless the binary search approach is very effective at finding a boundary point, and so we keep with convention in using it.
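A minimal sketch of this initialization step, combining the λ schedule above with the binary search; predict is a hypothetical hard-label query function rather than part of the Sign-OPT codebase, and the tolerance is an illustrative choice.

    import numpy as np

    def init_from_known_perturbations(predict, x, y, deltas, tol=1e-3):
        """Select theta_0 as the known-perturbation direction with the smallest
        boundary distance g(delta_i); predict(.) returns the hard label of a point."""
        best_theta, best_dist = None, np.inf
        for delta in deltas:
            d = delta / np.linalg.norm(delta)
            lam_hi = None
            for lam in [1.5 ** k for k in range(10)]:   # lambda = 1, 1.5, ..., (1.5)^9
                if predict(x + lam * d) != y:
                    lam_hi = lam
                    break
            if lam_hi is None:
                continue                                # no boundary found along this direction
            lam_lo = 0.0
            while lam_hi - lam_lo > tol:                # binary-search the crossing point
                mid = (lam_lo + lam_hi) / 2.0
                if predict(x + mid * d) != y:
                    lam_hi = mid
                else:
                    lam_lo = mid
            if lam_hi < best_dist:
                best_dist, best_theta = lam_hi, d
        return best_theta, best_dist                    # best_theta is None if every direction failed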

3.2 Choosing the Known Perturbations

An essential point of discussion is how to best select the k perturbations that are precomputed for Algorithm 2. Given that our original input is x, we want to select other inputs x_1, ..., x_k such that the adversarial perturbations for these inputs are also adversarial, and of small distortion, for x. Figuring out the optimal choice of x_1, ..., x_k within the input domain is a challenging problem; we make headway into this by trying out a few natural methods.

• Inputs with Shared Label. One natural choice is to select x_1, ..., x_k to have the same class label as x. The crude assumption here would be that inputs with the same label might have similar vulnerabilities in their decision boundary.

• Inputs that have Similar Features. Another natural idea is that adversarial perturbations might transfer across inputs that have the same features. For instance, images that are visually similar might have similar vulnerabilities around their decision boundaries. These inputs can be selected by finding the feature vectors of our inputs and choosing the ones nearest to x (a short sketch of this selection follows the list).

• Random Inputs. To evaluate whether these previous heuristics are doing anything meaningful, we will also want to see how a random selection of x_1, ..., x_k affects initial distortion.
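As referenced in the list above, a minimal sketch of the feature-similarity selection; feature_fn stands for a hypothetical function returning an image's penultimate-layer feature vector, and candidates is the pool of images to choose from.

    import numpy as np

    def select_similar_inputs(feature_fn, x, candidates, k=20):
        """Pick the k candidate images whose penultimate-layer features are closest
        (in L2 distance) to those of x."""
        fx = feature_fn(x)
        dists = np.array([np.linalg.norm(feature_fn(c) - fx) for c in candidates])
        nearest = np.argsort(dists)[:k]                # indices of the k most similar candidates
        return [candidates[i] for i in nearest]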

CHAPTER 4

Experimental Results

We will perform an empirical study comparing Algorithm 2 to Sign-OPT. For illustrative purposes, we will first look at some examples of attacks on MNIST and CIFAR-10 images in Section 4.2, as well as explore the different image grouping strategies mentioned in Section 3.2. Then in Section 4.3 we will narrow our focus to CIFAR-10 and provide comprehensive evidence that re-using known perturbations is an initialization strategy superior to random initialization.

To perform our experiments using known perturbations, we have to compute adversarial examples beforehand. In our experiments we will compute these adversarial examples just using regular Sign-OPT, as this is a natural choice for someone attacking a hard-label black-box.

4.1 Attacked Architectures

We will be attacking both an MNIST classifier and a CIFAR-10 classifier. Both of these classifiers are taken directly from Cheng et al.'s GitHub repository. The MNIST classifier is a basic CNN with two convolutional layers, two max-pooling layers, and two fully-connected layers. The CIFAR-10 classifier is an implementation of VGG16 [SZ15].

4.2 Known Perturbations Reduce Initial Distortion

In this section we will be directly comparing Algorithm 2 to Sign-OPT. We will take 50 images from the test set of each dataset. The first 20 images will be used to precompute our known perturbations using vanilla Sign-OPT. Then we will run two untargeted attacks on the last 30 images with both Algorithm 2 (using our 20 previously known perturbations) and vanilla Sign-OPT with random initialization. To evaluate the attacks, we consider the distortion (Euclidean norm) of the adversarial perturbation as a function of the number of queries to the model, following the experiments in [CSC20].

As discussed in Section 3.2, we also would like to understand how the choice of those first 20 precomputed perturbations affects the distortion found by Algorithm 2. We will try three different methodologies: choosing all 50 images at random, choosing all 50 images to have the same class label, and choosing all 50 images to be visually similar (by clustering the feature maps of our own classifier). It is worth mentioning that the tests in this section are merely a first examination of our initialization strategy, how it affects the adversarial distortion as a function of query count, and how natural choices for grouping may or may not affect this. The following section, Section 4.3, provides a much more comprehensive study demonstrating the superiority of using known perturbations over random perturbations.

4.2.1 MNIST

To provide an initial baseline, we run our first set of experiments on the MNIST test set. Our first experiment selects 50 images at random and then runs Algorithm 2 on the final 30 images with the first k = 20 perturbations precomputed. The results of the experiment can be seen in Figure 4.1, with comparisons of the adversarial initializations shown in Figure 4.2. The initial distortion found using known perturbations is significantly smaller than the one found using random initialization, decreasing from 9.820 to 2.695. The final distortions are close: 1.357 and 1.371 with random initialization and known-perturbation initialization, respectively. Our initialization thus gets us much closer to this final distortion much faster. This is a promising result, since it provides evidence that our attack method is effective irrespective of which specific images are chosen. Though we are only showing the results for one selection of 50 random images, these results are representative of most other random selections we have tried.

Figure 4.1: Attacking 30 random MNIST images using the known perturbations of 20 other random MNIST images.

Next we select 50 images with class label 1, and perform the same experiment. We can see from Figure 4.3 that our initialization significantly outperforms random initialization, with our initialization only .136 from our final distortion of 1.391 (which is, again, close to random’s final distortion of 1.383). Re-using perturbations for MNIST images of the same class seems to be a very promising initialization strategy.

Figure 4.2: Non-cherry picked examples of images we attack in the random MNIST run (shown examples were chosen at random). The left column shows the original image; the middle column visualizes our improved initialization; the right column shows the Sign-OPT initialization. Notice our initialization targets only certain regions of the image, while the Sign-OPT initialization tends to perturb each pixel. Sign-OPT was unable to generate an adversarial example in the last row, but our approach was able to.

Figure 4.3: Attacking 30 MNIST images with class label 1 using the known perturbations of 20 other MNIST images with class label 1.

Finally we select 50 MNIST images that are visually similar. To do this we train a CNN to classify MNIST digits on the training set, and then extract feature vectors for our test set using the penultimate layer's activations. We then loop over all images in the test set to find the image with the 49 closest neighbors on average, and make these 50 images our visually similar group. As can be seen from Figure 4.4, this happens to be images of the digit 1 written in a similar way. Like class-label similarity, our distortion is nearly optimal upon initialization (only .071 from the final distortion). The final distortion in this test, 1.519, is actually slightly better than random, 1.522, showing that our final distortion can also beat random.

A possible way to explain the success of the last two experiments is that MNIST images of the same class have similar vector representations. It hence may be natural that an optimal perturbation for one MNIST image would also be optimal for an MNIST image that is nearly identical.

4.2.2 CIFAR-10

CIFAR-10 is more visually diverse than MNIST, making it another good baseline for us. We perform experiments analogous to those on MNIST, beginning with our first experiment using 50 random CIFAR-10 images (see the results in Figure 4.5 and a visual comparison of the adversarial initializations in Figure 4.6). Once again our approach has a lower initial distortion, improving from 3.984 to 1.887; the final distortions are within .01 of each other. This improvement is less pronounced than in the MNIST case, likely because the dataset is more complex. Nevertheless, re-using known perturbations gives a better initial distortion even when the images are selected at random.

Figure 4.4: Attacking visually similar MNIST images. Images are selected by finding 50 nearest neighbors with minimal average distance, using feature-vector distance as a measure of visual similarity. Then 30 of those images are attacked using perturbations of the other 20 images.

Figure 4.5: Attacking 30 random CIFAR-10 images using the known perturbations of 20 other random CIFAR-10 images.

Figure 4.6: Non-cherry picked examples of images we attack in the random CIFAR-10 run (shown examples were chosen at random). The left column shows the original image; the middle column visualizes our improved initialization; the right column shows the Sign-OPT initialization. Notice our initialization targets only certain regions of the image, while the Sign-OPT initialization tends to perturb each pixel.

The same experiment using 50 images with class label 5 is shown in Figure 4.7. Our initialization beats random initialization, improving from 1.869 to .949. The final distortions are within .006 of each other. Just like in the MNIST test, this experiment gives a better result than the random grouping. However these are unfair comparisons (the attacked images differ across experiments), and we leave the full exploration to Section 4.3.

Figure 4.7: Attacking 30 CIFAR-10 images with class label 5 using the known perturbations of 20 other CIFAR-10 images with class label 5.

Finally we run the experiment using 50 images that are visually similar, shown in Figure 4.8. Once again we train our own classifier, this time using Deep Layer Aggregation [YWS19] due to its success on CIFAR-10 (we trained ours only to 86.69% accuracy). We perform an identical Nearest Neighbors algorithm on the penultimate layer’s feature vectors to find our 50 images, which in this case are of small cars. Our approach beats random, improving from 3.604 to 1.938, with the final distortions within .033 of each other.

We have shown various attacks where our average initial distortion outperforms random. However, the number of tests we have shown here is limited. To show we are not cherry-picking and that our initialization is in fact better, we now move on to a more comprehensive study.

Figure 4.8: Attacking visually similar CIFAR-10 images. Images are selected by finding 50 nearest neighbors with minimal average distance, using feature-vector distance as a measure of visual similarity. Then 30 of those images are attacked using perturbations of the other 20 images.

4.3 Known Perturbations Outperform Random Perturbations

Though the previous section provides evidence that our approach leads to a better initialization than random, the number of tests we conducted is limited by computational power (generating Figure 4.8 takes around 11 hours). Furthermore, the experiments still have not conclusively shown whether having the same class label or being visually similar makes a meaningful contribution to the transferability of a perturbation.

In this section, we will increase the number of CIFAR-10 tests we can run by only measuring the initial distortion. First we take 3618 CIFAR-10 images and use the random approach to compute their initial distortions. Then we take 139 random CIFAR-10 images (about 10-18 from each of the 10 classes) and compute their final perturbations using vanilla Sign-OPT. For each of the 139 perturbations and each of the 3618 images, we use our approach to compute an initial distortion using that perturbation and image pair. The result is two tables: a 1 × 3618 table of initial distortions using randomness, and a 139 × 3618 table of initial distortions using known perturbations.
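A minimal NumPy sketch of how the comparison reported below could be computed from these two tables; the array names are illustrative, and failed initializations (discussed next) are assumed to be stored as np.inf.

    import numpy as np

    # random_init: shape (3618,)      -- initial distortion from 100 random directions per image
    # known_init:  shape (139, 3618)  -- initial distortion for each (perturbation, image) pair
    def compare_initializations(random_init, known_init):
        ok = np.isfinite(random_init) & np.isfinite(known_init).any(axis=0)  # drop failed images
        best_known = known_init[:, ok].min(axis=0)     # best known perturbation for each image
        diffs = random_init[ok] - best_known           # positive means the known init is better
        return {
            "mean_random": random_init[ok].mean(),
            "mean_known": best_known.mean(),
            "mean_improvement": diffs.mean(),
            "fraction_improved": (diffs > 0).mean(),
        }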

Note that sometimes the initialization step fails. In the random approach, this can happen if all 100 random directions are not adversarial. In our approach, this can happen if we cannot find an adversarial perturbation in the direction of the known perturbation (the details of how we search are at the end of Chapter 3). It turns out that both the random approach and our approach failed on the same 243 images, which suggests that those particular images either are already misclassified or are particularly robust. We omit these images from the rest of the discussion.

Figure 4.9: Distribution of initial distortions found using randomness (blue) and found using known perturbations (orange). On average, known perturbations find smaller initial distortions.

To more concretely understand the level of improvement, we measure the difference in distortion between each image’s random initialization and best perturbation-based initial- ization. We plot the magnitude of these differences in a histogram shown in Figure 4.10. In

24 Figure 4.10: Distribution of differences between random initialization and known- perturbation initializations for all images considered. In all but 4 images, the known- perturbation initialization outperform random initialization, improving by about 1.927 on average.

all but 4 images, our approach outperforms random initialization; in those 4 cases, our ap- proach was no worse than .240 off from random. Our initialization was better by about 1.927 on average, with the degree of improvements following a bimodal distribution. Curiously, the second peak featuring the greatest improvements has majority of its images coming from class 6; upon further inspection we concluded this was because random initialization was par- ticularly ineffective for these images. Overall Figure 4.10 is an overwhelming demonstration that re-using known perturbations is a superior initialization strategy.

Though using known perturbations is highly effective, reusing adversarial perturbations for images that have the same class label or are visually similar is not always the optimal choice. For instance, suppose that we set k = 3 in Algorithm 2 so that we only get to choose 3 out of our 138 computed perturbations to know beforehand. Further suppose that, out of our 3618 images, we focus our attack only on the class 0 images. If class similarity or even visual similarity were crucial, then we would expect that the perturbations that minimize the average initial distortion should at least come from images with the same class label.

25 Figure 4.11: The top 3 images are the best 3 of the 138 images to precompute perturbations for the class 0 (airplane) images considered. The bottom 3 images are analogously the worst 3 to precompute. Counterintuitively, notice that none of the top 3 are of airplanes, but one of the bottom 3 are.

However, none of these images, shown in Figure 4.11, are from class 0. Furthermore, the bottom 3 images are the images that maximize the average initial distortion, and one of them surprisingly has class label 0.

Table 4.1 performs the same computation for all 10 classes. Specifically, we seek to answer the following question: which 3 images should we precompute perturbations for before attacking images of class m? An optimal triplet of images will minimize the average initial distortion found when attacking all considered class m images. We do this by checking every possible choice of 3 images from the 138, and using their corresponding adversarial perturbations as input to Algorithm 2 (with k = 3) to attack all class m images out of our 3618. Whichever triplet minimizes the average initial distortion found when attacking those images is considered the best choice for attacking class m; whichever triplet maximizes that average initial distortion is considered the worst choice for attacking class m. Although some of the classes do have at least 1 of the 3 best perturbations sharing the same label, the lack of uniformity here suggests that there may be a more complex phenomenon going on. What we can conclude is that the most useful perturbations to precompute need not come from the same class.
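A brute-force sketch of this triplet search over the table of known-perturbation initial distortions; the array layout follows the sketch given earlier in this section, and all names are illustrative.

    from itertools import combinations
    import numpy as np

    def best_and_worst_triplets(known_init, class_image_idx):
        """known_init[p, i] is the initial distortion when perturbation p initializes the
        attack on image i; class_image_idx selects the images of the attacked class m."""
        sub = known_init[:, class_image_idx]            # (num perturbations, num class-m images)
        scores = {}
        for trip in combinations(range(sub.shape[0]), 3):   # C(138, 3) triplets, still feasible
            # Algorithm 2 with k = 3 keeps the best of the three directions for each image.
            scores[trip] = np.min(sub[list(trip), :], axis=0).mean()
        best = min(scores, key=scores.get)
        worst = max(scores, key=scores.get)
        return best, worst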

Attacked Class    Class of Best 3 Perturbations    Class of Worst 3 Perturbations
0                 6, 9, 9                          1, 0, 6
1                 1, 1, 9                          0, 7, 9
2                 1, 5, 9                          4, 6, 8
3                 3, 4, 5                          0, 5, 6
4                 1, 7, 9                          2, 4, 5
5                 5, 5, 5                          1, 1, 8
6                 0, 6, 9                          2, 3, 7
7                 7, 7, 9                          2, 4, 6
8                 1, 8, 9                          5, 6, 6
9                 8, 9, 9                          1, 1, 2

Table 4.1: Finding the labels of the best (and worst) 3 images to precompute known perturbations for when attacking images of each class. Notice that the best 3 images for classes 0, 2, and 4 don't include themselves. For class 0 the best triplet including a class 0 image is worse than 10 other triplets; for class 2, 259 other triplets are better; for class 4, 13 others. The most useful perturbations to precompute need not come from the same class.

To further illustrate the weak relationship between the associated class of the precomputed perturbation and initial distortion, we can look at boxplots of initial distortion as a function of the perturbation class. Specifically, we attack class m images using known perturbations exclusively from class n and plot the spread of initial distortions. We do this for m = 0, 4, 5, 9 and all n = 0, ..., 9 in Figure 4.12. For classes 5 and 9, the average initial distortion is smallest when using perturbations from the same class. However, for classes 0 and 4, the minimal average initial distortion does not coincide with the same class.

Though class similarity may not be an effective heuristic for precomputing perturbations, perhaps inputs with similar features might share similar adversarial vulnerabilities. To measure this, we use the same Deep Layer Aggregation CIFAR-10 classifier and extract the penultimate-layer feature vectors for every image. We use the distance between images' feature vectors as a measure of visual similarity. Using the same 3618 images and 139 perturbations, we graph in Figure 4.13 the binned average initial distortion found as a function of visual similarity. When the attacked image and the known perturbation's corresponding image are very dissimilar, the perturbation does a worse job of initializing the attack on average. However, slightly dissimilar images get a better initial distortion than very similar images on average. Because of this, visual similarity may not be the best heuristic for finding an optimal perturbation. It is important to note that there is a lot of variability in how effective a perturbation is on a visually similar image. Figure 4.14 displays very typical examples of how noisy the correlation is for specific choices of perturbation or attacked image. The main takeaway is that although visual similarity does have some correlation (on average) with the initial distortion of our attack, it doesn't give us the direct correlation that we would like. Future work is needed to attain a better metric that tells us why certain pairs of images share similar vulnerabilities.
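A minimal sketch of the binned averaging behind Figure 4.13, using the bin count and minimum-count threshold stated in its caption; similarity here is the feature-vector distance described above, and the names are illustrative.

    import numpy as np

    def binned_average(similarity, distortion, num_bins=100, min_count=200):
        """Average initial distortion per similarity bin, discarding sparse bins."""
        edges = np.linspace(similarity.min(), similarity.max(), num_bins + 1)
        which = np.clip(np.digitize(similarity, edges) - 1, 0, num_bins - 1)
        centers, means = [], []
        for b in range(num_bins):
            mask = which == b
            if mask.sum() >= min_count:                # keep only well-populated bins
                centers.append((edges[b] + edges[b + 1]) / 2.0)
                means.append(distortion[mask].mean())
        return np.array(centers), np.array(means)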

Figure 4.12: Attacking images of classes 0, 4, 5, and 9 exclusively using perturbations from each class. The top boxplots (a, b) show that using perturbations from the same class (e.g. using perturbations for class 0 images to initialize attacks on class 0 images) does not always lead to a minimal average initial distortion. On the other hand, the lower boxplots (c, d) show that it sometimes can lead to the minimal average initial distortion.

Figure 4.13: Examining how visual similarity correlates with initial distortion when using the perturbation of one image to attack another image on average. All similarity-distortion pairs were placed into 100 bins and averaged within each bin; bins with fewer than 200 pairs were discarded, leaving a median bin size of 4734 pairs. Very dissimilar images tend to have poor initializations on average; however, slightly dissimilar images actually do better than very similar images on average.

Figure 4.14: Figures (a) and (b) show how visual similarity affects the initial distortion when attacking all 3618 images using two different perturbations; the correlations are very noisy and also typical for most choices of perturbation. Figures (c) and (d) analogously show how visual similarity affects the initial distortion when attacking two different images using all 139 perturbations; again they are noisy and also typical.

4.4 Conclusion and Future Work

We have shown that previously known adversarial perturbations can be used to find superior initializations for the Sign-OPT hard-label black-box attack. We found that the boundary points along the direction of a previously known perturbation are almost always closer to the input image than when selecting random directions.

Choosing which perturbations to compute beforehand is a challenge, and we demonstrated that natural heuristics like class or visual similarity are not as meaningful as one would hope. Future work in this area would hopefully design better heuristics that continue to lower the average initial distortion found.

It also may be interesting to study how re-using known perturbations helps when attacking a group of images, where the perturbations of images attacked first would be re-used to attack images attacked later. An interesting challenge here would be determining an optimal order for these images; solving this would also solve the earlier problem of choosing which perturbations to compute beforehand.

To the best of our knowledge, this is the first study of how adversarial perturbations from one image can be used to attack other images. It may be fruitful to continue to build a theory around why this phenomenon occurs and why it is so effective on deep neural networks.

REFERENCES

[BRB18] Wieland Brendel, Jonas Rauber, and Matthias Bethge. "Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models.", 2018.

[CLC18] Minhao Cheng, Thong Le, Pin-Yu Chen, Jinfeng Yi, Huan Zhang, and Cho-Jui Hsieh. "Query-Efficient Hard-label Black-box Attack: An Optimization-based Approach.", 2018.

[CSC20] Minhao Cheng, Simranjit Singh, Patrick Chen, Pin-Yu Chen, Sijia Liu, and Cho-Jui Hsieh. "Sign-OPT: A Query-Efficient Hard-label Adversarial Attack.", 2020.

[CW17] Nicholas Carlini and David Wagner. "Towards Evaluating the Robustness of Neural Networks.", 2017.

[CZS17] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. “ZOO: Zeroth Order Optimization Based Black-Box Attacks to Deep Neural Networks without Training Substitute Models.” In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec ’17, p. 15–26, New York, NY, USA, 2017. Association for Computing Machinery.

[EEF18] Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. “Robust Physical-World Attacks on Deep Learning Models.”, 2018.

[GSS15] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. “Explaining and Harnessing Adversarial Examples.”, 2015.

[HZR15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition.", 2015.

[IEA18] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. "Black-box Adversarial Attacks with Limited Queries and Information.", 2018.

[IEM19] Andrew Ilyas, Logan Engstrom, and Aleksander Madry. “Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors.”, 2019.

[KGB17] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. "Adversarial Machine Learning at Scale.", 2017.

[MMS19] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. "Towards Deep Learning Models Resistant to Adversarial Attacks.", 2019.

[PMG17] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. "Practical Black-Box Attacks against Machine Learning.", 2017.

[SZ15] Karen Simonyan and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition.”, 2015.

[SZS14] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. "Intriguing properties of neural networks.", 2014.

[TTC20] Chun-Chen Tu, Paishun Ting, Pin-Yu Chen, Sijia Liu, Huan Zhang, Jinfeng Yi, Cho-Jui Hsieh, and Shin-Ming Cheng. “AutoZOOM: Autoencoder-based Zeroth Order Optimization Method for Attacking Black-box Neural Networks.”, 2020.

[YWS19] Fisher Yu, Dequan Wang, Evan Shelhamer, and Trevor Darrell. “Deep Layer Aggregation.”, 2019.
