2/27/20

CNN architectures and visualization

Adapted from: 2019 Philipp Krähenbühl and Chao-Yuan Wu

AlexNet

• Won the ImageNet competition 2012
• Start of a revolution in computer vision

[Chart: ImageNet challenge top-5 error by year: non-deep methods (2010, 2011), AlexNet with 8 layers (2012), ZFNet (2013), VGG/GoogLeNet with 19-22 layers (2014), ResNet with 152 layers (2015)]

ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., NIPS 2012


AlexNet (input 227x227)

Conv 11x11 stride=4, C=96 (no pad)
LRN
Max Pool 3x3 stride=2 (no pad)
Conv 5x5, C=256 (2 groups of 128, with padding)
LRN
Max Pool 3x3 pad=0, stride=2
Conv 3x3, C=384
Conv 3x3, C=384 (2 groups of 192)
Conv 3x3, C=256 (2 groups of 128)
Max Pool 3x3 pad=0, stride=2
Linear (4096)
Linear (4096)
Linear (1000)
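A minimal PyTorch sketch of this stack, assuming a single branch (the original 2012 model split conv 2/4/5 across two GPUs, which is ignored here):

    import torch
    import torch.nn as nn

    # Single-branch AlexNet-style network following the layer list above.
    # This is an illustrative sketch, not the exact original implementation.
    alexnet = nn.Sequential(
        nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
        nn.LocalResponseNorm(5),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
        nn.LocalResponseNorm(5),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Flatten(),
        nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
        nn.Linear(4096, 4096), nn.ReLU(),
        nn.Linear(4096, 1000),
    )

    print(alexnet(torch.randn(1, 3, 227, 227)).shape)  # torch.Size([1, 1000])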

Activation maps in AlexNet

input:          227 x 227 x 3
conv 1:         55 x 55 x 96
LRN, max pool:  27 x 27 x 96
conv 2:         27 x 27 x 256
LRN, max pool:  13 x 13 x 256
conv 3:         13 x 13 x 384
conv 4:         13 x 13 x 384
conv 5:         13 x 13 x 256
max pool:       6 x 6 x 256 = 9216
fc 6:           4096
fc 7:           4096
fc 8:           1000
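These spatial sizes follow from the usual convolution output formula, output = floor((input - kernel + 2*pad) / stride) + 1; a quick check:

    def conv_out(size, kernel, stride=1, pad=0):
        """Spatial output size of a conv or pooling layer."""
        return (size - kernel + 2 * pad) // stride + 1

    print(conv_out(227, 11, stride=4))        # 55  (conv 1)
    print(conv_out(55, 3, stride=2))          # 27  (max pool)
    print(conv_out(27, 5, stride=1, pad=2))   # 27  (conv 2)
    print(conv_out(27, 3, stride=2))          # 13  (max pool)
    print(conv_out(13, 3, stride=2))          # 6   (final max pool)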


Activation maps in AlexNet

[Chart: number of channels and spatial width per layer on a log scale, from the input (3 channels, width 227) through conv 1-5 and the last max pool to fc 6-8 (4096, 4096, 1000 channels, width 1)]

Parameters and computation

Parameters (M):  conv 1: 0.0   conv 2: 0.3   conv 3: 0.9   conv 4: 0.7   conv 5: 0.4   fc 6: 37.8   fc 7: 16.8   fc 8: 4.1
FLOPs (M):       conv 1: 106   conv 2: 224   conv 3: 150   conv 4: 112   conv 5: 75    fc 6: 38     fc 7: 17     fc 8: 4
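These parameter counts follow directly from the layer shapes (for a dense conv, params = K*K*C_in*C_out + C_out); a quick sanity check on a few layers:

    def conv_params(k, c_in, c_out):
        """Weights plus biases of a dense KxK convolution."""
        return k * k * c_in * c_out + c_out

    def linear_params(n_in, n_out):
        return n_in * n_out + n_out

    print(conv_params(3, 256, 384) / 1e6)    # ~0.89M  (conv 3)
    print(linear_params(9216, 4096) / 1e6)   # ~37.8M  (fc 6)
    print(linear_params(4096, 4096) / 1e6)   # ~16.8M  (fc 7)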


What’s happening inside the black box?

Visualize first-layer weights directly

What’s happening inside the black box?


Not too helpful for subsequent layers

Features from a CIFAR10 network, via Stanford CS231n


What’s happening inside the black box?


Visualize maximally activating patches: pick a unit; run many images through the network; visualize patches that produce the highest output values
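A rough PyTorch sketch of this procedure; the model, layer index, unit index, and the `loader` iterable are illustrative assumptions, not the lecture's exact setup:

    import torch
    import torchvision

    # Find the images that maximally activate one unit of a conv layer.
    model = torchvision.models.alexnet(pretrained=True).eval()
    layer = model.features[8]      # one of the later conv layers (illustrative choice)
    unit = 17                      # channel index of the unit to inspect

    recorded = {}
    def hook(module, inputs, output):
        # strongest response of the chosen channel for each image in the batch
        recorded["score"] = output[:, unit].flatten(1).max(dim=1).values
    layer.register_forward_hook(hook)

    scores = []
    with torch.no_grad():
        for images, _ in loader:   # `loader` is assumed to yield preprocessed image batches
            model(images)
            scores.append(recorded["score"])

    # The top-scoring images (or crops around the peak response) are the
    # maximally activating patches for this unit.
    top_values, top_indices = torch.cat(scores).topk(10)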

What’s happening inside the black box?


Visualize maximally activating patches


What’s happening inside the black box?


What about FC layers? Visualize nearest neighbor images according to activation vectors

Source: Stanford CS231n
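A sketch of the nearest-neighbor lookup, assuming a matrix `feats` of fc-layer activations (one row per image) has already been extracted:

    import torch

    # feats: (N, 4096) tensor of fc7 activations, one row per image (assumed precomputed)
    # query: (4096,) activation vector of the query image
    def nearest_neighbors(query, feats, k=5):
        """Indices of the k images whose fc-layer activations are closest to the query."""
        dists = torch.cdist(query[None, :], feats)[0]   # L2 distances in activation space
        return dists.topk(k, largest=False).indices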

What’s happening inside the black box?


What about FC layers? Fancy dimensionality reduction, e.g., t-SNE

Source: Andrej Karpathy
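A minimal sketch of the t-SNE projection, again assuming precomputed fc-layer features `feats` and class ids `labels`:

    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    # feats: (N, 4096) NumPy array of fc-layer activations (assumed precomputed)
    # labels: (N,) array of class ids, used only to color the scatter plot
    emb = TSNE(n_components=2, perplexity=30).fit_transform(feats)

    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=4, cmap="tab10")
    plt.show()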


What’s happening inside the black box?


Visualize activations for a particular image


What’s happening inside the black box?


Visualize activations for a particular image:
1. Unpool
2. Rectify
3. Deconvolve
Visualize pixel values responsible for the activation


Deep visualization toolbox

YouTube demo video
J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson, Understanding neural networks through deep visualization, ICML DL workshop, 2015

Additional visualization techniques

K. Simonyan, A. Vedaldi, and A. Zisserman, Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, ICLR 2014

J. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Striving for simplicity: The all convolutional net, ICLR workshop, 2015

A. Nguyen, J. Yosinski, J. Clune, Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks, ICML workshop, 2016
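As one concrete example from the list above, the class-saliency idea of Simonyan et al. can be sketched as the gradient of the class score with respect to the input pixels (the model choice and preprocessing here are illustrative assumptions):

    import torch
    import torchvision

    # Gradient-based class saliency: how much does each input pixel
    # affect the score of the predicted class?
    model = torchvision.models.alexnet(pretrained=True).eval()

    def saliency_map(image):
        """image: preprocessed (3, H, W) tensor; returns an (H, W) saliency map."""
        x = image.unsqueeze(0).requires_grad_(True)
        scores = model(x)
        scores[0, scores.argmax()].backward()        # backprop the top class score
        return x.grad[0].abs().max(dim=0).values     # max |gradient| over color channels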


ZFNet

• Won the ImageNet competition 2013
• Nice analysis and visualization of AlexNet
• Introduced up-conv

[Chart: ImageNet challenge top-5 error by year, as above]

Visualizing and Understanding Convolutional Networks, Zeiler & Fergus, ECCV 2014

AlexNet to ZFNet (both take 227x227 input; only the conv layers change)

Conv 11x11 stride=4, C=96          →  Conv 7x7 stride=2, C=96
Conv 5x5, C=256 (2 groups of 128)  →  Conv 5x5 stride=2, C=256
Conv 3x3, C=384                    →  Conv 3x3, C=512
Conv 3x3, C=384 (2 groups of 192)  →  Conv 3x3, C=1024
Conv 3x3, C=256 (2 groups of 128)  →  Conv 3x3, C=512

(LRN, the max pooling layers, and the Linear (4096) / Linear (4096) / Linear (1000) head are unchanged.)


Activation maps in ZFNet

[Chart: number of channels and spatial width per layer on a log scale, comparing AlexNet and ZFNet from the input through conv 1-5, the last max pool, and fc 6-8]

Parameters and computation

[Bar charts: per-layer parameters (M) and FLOPs (M), AlexNet vs ZFNet, for conv 1-5 and fc 6-8. ZFNet's wider conv 3-5 and larger fc 6 (75.5M vs 37.8M parameters) and its smaller early strides (up to roughly 800M FLOPs in a single conv layer) make it noticeably more expensive than AlexNet.]


VGG

• Deeper AlexNet/ZFNet

[Chart: ImageNet challenge top-5 error by year, as above; VGG/GoogLeNet (2014) use 19-22 layers]

Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan and Zisserman, ICLR 2015

ZFNet to VGG

ZFNet:  Conv 7x7 stride=2, C=96 → LRN → Max Pool 3x3 stride=2 → Conv 5x5 stride=2, C=256 → LRN → Max Pool 3x3 pad=0, stride=2 → Conv 3x3 C=512 → Conv 3x3 C=1024 → Conv 3x3 C=512 → Max Pool 3x3 pad=0, stride=2 → Linear (4096) → Linear (4096) → Linear (1000)

VGG-16: [Conv 3x3 C=64] x2 → Max Pool 2x2 stride=2 → [Conv 3x3 C=128] x2 → Max Pool 2x2 stride=2 → [Conv 3x3 C=256] x3 → Max Pool 2x2 stride=2 → [Conv 3x3 C=512] x3 → Max Pool 2x2 stride=2 → [Conv 3x3 C=512] x3 → Max Pool 2x2 stride=2 → Linear (4096) → Linear (4096) → Linear (1000)


Insights in VGG

• Why use smaller filters? Two stacked Conv 3x3 layers cover the same receptive field as one Conv 5x5 with fewer parameters (factorization).
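A quick check of the savings, ignoring biases and assuming C input and output channels:

    # Parameters of one 5x5 conv vs. two stacked 3x3 convs, both mapping C -> C channels.
    # Two 3x3 convs cover the same 5x5 receptive field with fewer weights.
    C = 256
    params_5x5 = 5 * 5 * C * C              # 1,638,400
    params_two_3x3 = 2 * (3 * 3 * C * C)    # 1,179,648 (~28% fewer)
    print(params_5x5, params_two_3x3)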

Training VGG

• Vanishing gradients

[Diagram: the VGG-16 layer stack, as listed above]


Parameters and computation

Parameters (M):  conv 1.1: 0.0   conv 1.2: 0.0   conv 2.1: 0.1   conv 2.2: 0.1   conv 3.1: 0.3   conv 3.2: 0.6   conv 3.3: 0.6   conv 4.1: 1.2   conv 4.2: 2.4   conv 4.3: 2.4   conv 5.1: 2.4   conv 5.2: 2.4   conv 5.3: 2.4   fc 6: 102.8   fc 7: 16.8   fc 8: 4.1

[Bar chart: per-layer FLOPs (G) for conv 1.1-5.3 and fc 6-8, roughly 0.1-1.9 G per conv layer, with fc 6-8 contributing comparatively little]

1x1 and factorization

• Convolutions are linear operators

• Can we make them non-linear?

Figure source: "Network-in-Network", ICLR 2014 paper

Network-in-Network, Lin et al., ICLR 2014


Factorized

Conv 3x3 → ReLU → Conv 1x1

• Implements a "non-linear" convolution
• Less computation, fewer parameters
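A sketch of this block in PyTorch; the channel counts are illustrative, not taken from a specific network:

    import torch.nn as nn

    # Network-in-Network-style "non-linear convolution": a spatial 3x3 conv
    # followed by a 1x1 conv, with a ReLU in between.
    factorized_conv = nn.Sequential(
        nn.Conv2d(256, 64, kernel_size=3, padding=1),   # spatial mixing into fewer channels
        nn.ReLU(),
        nn.Conv2d(64, 256, kernel_size=1),               # 1x1 conv mixes channels per pixel
    )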

Inception architecture

• GoogLeNet
• Evolution of Network-in-Network

Inception module: four parallel branches (Conv 1x1; Conv 1x1 → Conv 3x3; Conv 1x1 → Conv 5x5; Max Pool 3x3 → Conv 1x1), concatenated along the channel dimension

Going deeper with convolutions, Szegedy et al. CVPR 2015.
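A compact PyTorch sketch of such a module; the per-branch channel counts used below are placeholders (GoogLeNet uses different counts at each stage):

    import torch
    import torch.nn as nn

    class Inception(nn.Module):
        """One Inception module: four parallel branches concatenated on channels."""
        def __init__(self, c_in, c1, c3_red, c3, c5_red, c5, c_pool):
            super().__init__()
            self.b1 = nn.Conv2d(c_in, c1, 1)                                   # 1x1 branch
            self.b2 = nn.Sequential(nn.Conv2d(c_in, c3_red, 1), nn.ReLU(),
                                    nn.Conv2d(c3_red, c3, 3, padding=1))       # 1x1 -> 3x3 branch
            self.b3 = nn.Sequential(nn.Conv2d(c_in, c5_red, 1), nn.ReLU(),
                                    nn.Conv2d(c5_red, c5, 5, padding=2))       # 1x1 -> 5x5 branch
            self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                    nn.Conv2d(c_in, c_pool, 1))                # pool -> 1x1 branch
        def forward(self, x):
            return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

    # Example: 192 input channels -> 64 + 128 + 32 + 32 = 256 output channels
    block = Inception(192, 64, 96, 128, 16, 32, 32)
    print(block(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])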


Inception architecture

[Diagram: the full GoogLeNet network, a deep stack of Inception modules]

Residual Networks

• Use residual connections to build deeper networks
• Activation maps are additive

[Chart: ImageNet challenge top-5 error by year, as above; ResNet (2015) reaches 152 layers]

Deep Residual Learning for Image Recognition, He et al., CVPR 2016


Residual blocks

x_l → Conv → BN → ReLU → Conv → BN → (+ shortcut from x_l) → ReLU → x_{l+1}

• Add shortcut connections for gradients
• Identity shortcut
• Strided 1x1 convolution shortcut (when the shape changes)

ResNet-152

• Multiple variants: ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, ResNet-1001

Conv 7x7 (64)
Max Pool 3x3
[Conv 1x1 (64), Conv 3x3 (64), Conv 1x1 (256)]     x3
[Conv 1x1 (128), Conv 3x3 (128), Conv 1x1 (512)]   x8
[Conv 1x1 (256), Conv 3x3 (256), Conv 1x1 (1024)]  x36
[Conv 1x1 (512), Conv 3x3 (512), Conv 1x1 (2048)]  x3
Avg Pool
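For reference, a sketch of instantiating this architecture from torchvision and counting its parameters (assumes torchvision is installed; the rough total is a well-known figure for ResNet-152):

    import torchvision

    # Instantiate the 152-layer variant (weights left random here) and count parameters.
    model = torchvision.models.resnet152()
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{n_params / 1e6:.1f}M parameters")   # roughly 60M for ResNet-152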
