
DATA ANALYTICS USING GT 8803 // FALL 2019 // JOY ARULRAJ

LECTURE #20: ADVERSARIAL TRAINING

ADMINISTRIVIA

• Reminders
  – Best project prize
  – Quiz cancelled
  – Guest lecture

CREDITS

• Slides based on a lecture by: – @

OVERVIEW

• What are adversarial examples?
• Why do they happen?
• How can they be used to compromise systems?
• What are the defenses?
• How to use adversarial examples to improve machine learning (even without an adversary)?

ADVERSARIAL EXAMPLES

Since 2013, deep neural networks have matched human performance at...

...recognizing objects and faces...
(Szegedy et al, 2014) (Taigman et al, 2013)

...solving CAPTCHAs and reading addresses...
(Goodfellow et al, 2013)

and other tasks...

Adversarial Examples

Timeline:
• “Adversarial Classification” (Dalvi et al, 2004): fool spam filter
• “Evasion Attacks Against Machine Learning at Test Time” (Biggio, 2013): fool neural nets
• Szegedy et al (2013): fool ImageNet classifiers imperceptibly
• Goodfellow et al (2014): cheap, closed-form attack

Turning Objects into “Airplanes”

Attacking a Linear Model
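
A minimal sketch of the idea (hypothetical weights and input, not the course's code): for a linear model, nudging every input dimension by epsilon in the direction of sign(w) shifts the logit w·x + b by roughly epsilon times the L1 norm of w, which is large in high dimensions even though each individual change is tiny.

    # Minimal sketch: attack a linear model by moving each pixel with the sign of its weight.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=784)            # hypothetical trained weights
    b = 0.0
    x = rng.uniform(0, 1, size=784)     # hypothetical flattened input image
    epsilon = 0.1

    x_adv = np.clip(x + epsilon * np.sign(w), 0, 1)

    logit = lambda v: float(w @ v + b)
    print(logit(x), logit(x_adv))       # the logit shifts by roughly epsilon * np.abs(w).sum()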

Not just for neural nets

• Linear models
  – Logistic regression
  – Softmax regression
  – SVMs
• Decision trees
• Nearest neighbors

Adversarial Examples from Overfitting

[Figure: toy 2-D dataset of o and x training points with a decision boundary.]

Adversarial Examples from Overfitting

[Figure: another toy 2-D dataset of o and x training points with a decision boundary.]

Modern deep nets are very piecewise linear

[Figure: response curves for four unit types.]
• Rectified linear unit
• Maxout
• Carefully tuned sigmoid
• LSTM
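
A small numeric check of this claim (a sketch with random weights, not from the slides): evaluating a tiny ReLU network along a line x + t·d in input space traces out a piecewise-linear curve in t.

    # Sketch: a tiny ReLU network's response along a line in input space is piecewise linear.
    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(16, 4))
    W2 = rng.normal(size=(1, 16))
    f = lambda x: float(W2 @ np.maximum(W1 @ x, 0.0))   # 4 -> 16 -> 1 ReLU network

    x, d = rng.normal(size=4), rng.normal(size=4)
    print([round(f(x + t * d), 3) for t in np.linspace(-5, 5, 11)])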

Nearly Linear Responses in Practice

Small inter-class distances

[Figure: rows of (clean example, perturbation, corrupted example) image triples.]
• Perturbation changes the true class
• Random perturbation does not change the class
• Perturbation changes the input to a “rubbish class”

All three perturbations have L2 norm 3.96. This is actually small; we typically use 7!
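For scale (assuming 28x28 inputs with pixel values in [0, 1], as in MNIST), an L2 norm of 3.96 spread over 784 pixels is an RMS per-pixel change of only about 3.96 / sqrt(784) ≈ 0.14.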

The Fast Gradient Sign Method
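
The fast gradient sign method takes one signed gradient step: x_adv = x + epsilon * sign(grad_x J(theta, x, y)). A minimal sketch using TensorFlow 2's GradientTape (the loss choice, pixel range, and epsilon are assumptions, not the course's code):

    # One-step FGSM sketch: x_adv = x + epsilon * sign(gradient of the loss w.r.t. x).
    import tensorflow as tf

    def fgsm(model, x, y, epsilon):
        with tf.GradientTape() as tape:
            tape.watch(x)
            loss = tf.keras.losses.sparse_categorical_crossentropy(
                y, model(x), from_logits=True)
        grad = tape.gradient(loss, x)
        # Keep the perturbed input in the assumed valid pixel range [0, 1].
        return tf.clip_by_value(x + epsilon * tf.sign(grad), 0.0, 1.0)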

Maps of Adversarial and Random Cross-Sections

(collaboration with David Warde-Farley and Nicolas Papernot)

Maps of Adversarial Cross-Sections

Maps of RANDOM Cross-Sections

(collaboration with David Warde-Farley and Nicolas Papernot)

Estimating the Subspace Dimensionality

Clever Hans

(“Clever Hans, Clever Algorithms,” Bob Sturm)

Wrong almost everywhere

Adversarial Examples for RL

(Huang et al., 2017)

High-Dimensional Linear Models

[Figure: clean examples, adversarial examples, the model's weights, and the signs of the weights.]

Linear Models of ImageNet

(Andrej Karpathy, “Breaking Linear Classifiers on ImageNet”)

RBFs behave more intuitively

Cross-model, cross-dataset generalization

Cross-technique transferability

(Papernot 2016)

Transferability Attack

• Start: the target model has unknown weights, machine learning algorithm, and training set; it may be non-differentiable.
• Train your own substitute model mimicking the target; the substitute is known and differentiable.
• Craft adversarial examples against the substitute.
• Deploy the adversarial examples against the target; the transferability property results in them succeeding.
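
A rough end-to-end sketch of that recipe (everything here is a stand-in: the "target" is a local Keras model rather than a remote black-box API, and it reuses the fgsm helper sketched earlier):

    # Black-box transferability attack sketch (stand-in models and data).
    import numpy as np
    import tensorflow as tf

    def make_model():
        return tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(10),
        ])

    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train.astype("float32") / 255.0
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    target = make_model()                      # stand-in for the unknown target model
    target.compile("adam", loss)
    target.fit(x_train, y_train, epochs=1, verbose=0)

    # 1. Label a query set by asking the target; its labels are all the attacker sees.
    x_query = x_train[:5000]
    y_query = np.argmax(target.predict(x_query, verbose=0), axis=1)

    # 2. Train a substitute model that mimics the target.
    substitute = make_model()
    substitute.compile("adam", loss)
    substitute.fit(x_query, y_query, epochs=3, verbose=0)

    # 3. Craft adversarial examples against the substitute and deploy them on the target.
    x_adv = fgsm(substitute, tf.constant(x_query[:256]), tf.constant(y_query[:256]), 0.25)
    fooled = np.argmax(target.predict(x_adv, verbose=0), axis=1) != y_query[:256]
    print("transfer success rate:", fooled.mean())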

(Papernot 2016)

Enhancing Transfer With Ensembles

(Liu et al, 2016)

Adversarial Examples in the Human Brain

These are concentric circles, not intertwined spirals.

(Pinna and Gregory, 2002)

Practical Attacks

• Fool real classifiers trained by remotely hosted APIs (MetaMind, Amazon, Google)
• Fool malware detector networks
• Display adversarial examples in the physical world and fool machine learning systems that perceive them through a camera

Adversarial Examples in the Physical World

Failed defenses

• Generative pretraining
• Removing the perturbation with an autoencoder
• Adding noise at test time
• Ensembles
• Confidence-reducing perturbation at test time
• Error correcting codes
• Multiple glimpses
• Weight decay
• Double backprop
• Adding noise at train time
• Various non-linear units
• Dropout

Generative Modeling is not Sufficient

Universal approximator theorem

Neural nets can represent either function:

Maximum likelihood doesn’t cause them to learn the right function. But we can fix that...

ADVERSARIAL TRAINING

Training on Adversarial Examples
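
A minimal sketch of the training loop (assuming the fgsm helper sketched earlier and a small Keras MNIST classifier, neither of which is the course's code): each step minimizes an equal mix of the loss on clean inputs and the loss on FGSM-perturbed inputs.

    # Adversarial training sketch: train on a 50/50 mix of clean and FGSM examples.
    import tensorflow as tf

    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train.astype("float32") / 255.0
    dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(10000).batch(128)

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.Adam()

    for x_batch, y_batch in dataset:
        x_adv = fgsm(model, x_batch, y_batch, epsilon=0.25)   # attack the current model
        with tf.GradientTape() as tape:
            loss = 0.5 * (loss_fn(y_batch, model(x_batch)) +
                          loss_fn(y_batch, model(x_adv)))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))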

Adversarial Training of other Models

• Linear models: SVM / linear regression cannot learn a step function, so adversarial training is less useful; it ends up very similar to weight decay
• k-NN: adversarial training is prone to overfitting
• Takeaway: neural nets can actually become more secure than other models. Adversarially trained neural nets have the best empirical success rate on adversarial examples of any machine learning model.

Weaknesses Persist

Adversarial Training

[Diagram: an image labeled as bird is perturbed in the direction that decreases the probability of the bird class; the perturbed image still has the same label (bird).]

Virtual Adversarial Training

• Unlabeled; the model guesses it's probably a bird, maybe a plane
• Adversarial perturbation intended to change the guess
• New guess should match the old guess (probably bird, maybe plane)
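
A rough sketch of the virtual adversarial penalty (assuming a TF2 model that outputs logits over image-shaped inputs; the single power iteration and the constants xi and epsilon are assumptions):

    # Virtual adversarial training penalty (simplified sketch, one power iteration).
    import tensorflow as tf

    def vat_loss(model, x, xi=1e-6, epsilon=2.0):
        # Treat the model's current prediction on x as a "virtual label".
        p = tf.stop_gradient(tf.nn.softmax(model(x)))
        # Estimate the small direction d that changes the prediction the most.
        d = tf.math.l2_normalize(tf.random.normal(tf.shape(x)), axis=[1, 2])
        with tf.GradientTape() as tape:
            tape.watch(d)
            q = tf.nn.softmax(model(x + xi * d))
            kl = tf.reduce_sum(p * (tf.math.log(p + 1e-12) - tf.math.log(q + 1e-12)), axis=-1)
        grad = tape.gradient(tf.reduce_sum(kl), d)
        r_adv = epsilon * tf.math.l2_normalize(grad, axis=[1, 2])
        # Penalize any change in prediction under the perturbation; no labels needed.
        q_adv = tf.nn.softmax(model(x + r_adv))
        return tf.reduce_mean(
            tf.reduce_sum(p * (tf.math.log(p + 1e-12) - tf.math.log(q_adv + 1e-12)), axis=-1))

During training, this penalty computed on unlabeled batches is simply added to the usual supervised loss, which is how the unlabeled data contributes.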

Text Classification with VAT

[Bar chart: RCV1 misclassification rate, zoomed in for legibility. Bars compare earlier SOTA, our baseline, virtual adversarial training, and both combined with a bidirectional model; the values shown fall from 7.70 down to 6.68.]

Universal engineering machine (model-based optimization)

Make new inventions by finding an input that maximizes the model's predicted performance.

[Figure: training data region vs. extrapolation region.]

cleverhans

• Open-source library available at: https://github.com/openai/cleverhans
• Built on top of TensorFlow (Theano support anticipated)
• Standard implementation of attacks, for adversarial training and reproducible benchmarks
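
A usage sketch in the spirit of the cleverhans 2.x-era API (the interface has changed substantially across releases, so treat these imports and signatures as assumptions to verify against the installed version):

    # cleverhans 2.x-style FGSM usage sketch; keras_model, sess, and x_test are yours.
    from cleverhans.attacks import FastGradientMethod
    from cleverhans.utils_keras import KerasModelWrapper

    wrapped = KerasModelWrapper(keras_model)         # wrap a trained Keras classifier
    attack = FastGradientMethod(wrapped, sess=sess)  # sess: the active TF1 session
    x_adv = attack.generate_np(x_test, eps=0.25, clip_min=0.0, clip_max=1.0)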

Conclusion

• Attacking is easy
• Defending is difficult
• Adversarial training provides regularization and semi-supervised learning
• The out-of-domain input problem is a bottleneck for model-based optimization generally
