2/27/20
CNN architectures and visualization
Adapted from: 2019 Philipp Krähenbühl and Chao-Yuan Wu
AlexNet

• Won the ImageNet competition 2012
• Start of the deep learning revolution in computer vision

[Figure: ImageNet challenge top-5 error (%) by year — 2010, 2011 (non-deep), 2012 AlexNet (8 layers), 2013 ZFNet, 2014 VGG/GoogLeNet (19-22 layers), 2015 ResNet (152 layers)]
ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., NIPS 2012
AlexNet

Input: 227×227×3
Conv 11×11, stride=4, C=96, no pad
LRN
Max Pool 3×3, stride=2
Conv 5×5, C=256 (2×128, split across two GPUs), padded
LRN
Max Pool 3×3, pad=0, stride=2
Conv 3×3, C=384
Conv 3×3, C=384 (2×192)
Conv 3×3, C=256 (2×128)
Max Pool 3×3, pad=0, stride=2
Linear (4096)
Linear (4096)
Linear (1000)
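The spatial sizes in the diagram above can be checked with the standard output-size formula. A minimal sketch (not from the slides; padding values are the usual AlexNet choices):

```python
# Verify AlexNet's activation sizes with: out = (in - kernel + 2*pad) // stride + 1

def out_size(w, k, stride=1, pad=0):
    """Spatial output size of a conv or pool layer."""
    return (w - k + 2 * pad) // stride + 1

w = 227                               # input width/height
w = out_size(w, 11, stride=4)         # conv1, no pad -> 55
w = out_size(w, 3, stride=2)          # max pool -> 27
w = out_size(w, 5, pad=2)             # conv2, padded -> 27
w = out_size(w, 3, stride=2)          # max pool -> 13
w = out_size(w, 3, pad=1)             # conv3 -> 13
w = out_size(w, 3, pad=1)             # conv4 -> 13
w = out_size(w, 3, pad=1)             # conv5 -> 13
w = out_size(w, 3, stride=2)          # max pool -> 6
print(w, 6 * 6 * 256)                 # 6 9216 (flattened input to fc6)
```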
Activation maps in AlexNet

Input: 227×227×3
Conv 11×11 → 55×55×96
LRN, Max Pool 3×3 → 27×27×96
Conv 5×5 → 27×27×256
LRN, Max Pool 3×3 → 13×13×256
Conv 3×3 → 13×13×384
Conv 3×3 → 13×13×384
Conv 3×3 → 13×13×256
Max Pool 3×3 → 6×6×256 = 9216
Linear (4096) → 4096
Linear (4096) → 4096
Linear (1000) → 1000
Activation maps in AlexNet

[Figure: log-scale plot over layers (input, conv 1-5, maxpool, fc 6-8) — channels: 3, 96, 256, 384, 384, 256, 256, 4096, 4096, 1000; spatial width: 227, 55, 27, 13, 13, 13, 6, 1, 1, 1]
Parameters and computation

Layer    Parameters (M)    FLOPs (M)
conv 1   0.0               106
conv 2   0.3               224
conv 3   0.9               150
conv 4   0.7               112
conv 5   0.4               75
fc 6     37.8              38
fc 7     16.8              17
fc 8     4.1               4
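A sketch of the bookkeeping behind the parameter column (my own code, not the slides'). Parameters are counted as kernel_h × kernel_w × C_in × C_out + C_out biases; conv 2, 4, and 5 use groups=2 because AlexNet split those layers across two GPUs:

```python
# Reproduce the per-layer parameter counts in the table above.

def conv_params(k, c_in, c_out, groups=1):
    # With grouped convolution, each output channel sees c_in/groups inputs.
    return k * k * (c_in // groups) * c_out + c_out

def fc_params(n_in, n_out):
    return n_in * n_out + n_out

params = {
    "conv1": conv_params(11, 3, 96),
    "conv2": conv_params(5, 96, 256, groups=2),
    "conv3": conv_params(3, 256, 384),
    "conv4": conv_params(3, 384, 384, groups=2),
    "conv5": conv_params(3, 384, 256, groups=2),
    "fc6":   fc_params(6 * 6 * 256, 4096),   # 37.8M -- dominates the total
    "fc7":   fc_params(4096, 4096),          # 16.8M
    "fc8":   fc_params(4096, 1000),          # 4.1M
}
for name, p in params.items():
    print(f"{name}: {p / 1e6:.1f}M")
```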
What’s happening inside the black box?
“cat”
Visualize first-layer weights directly
What’s happening inside the black box?
“cat”
Not too helpful for subsequent layers
Features from a CIFAR10 network, via Stanford CS231n
What’s happening inside the black box?
“cat”
Visualize maximally activating patches: pick a unit; run many images through the network; visualize patches that produce the highest output values
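A toy NumPy sketch of this idea (not the actual pipeline): slide one fixed 3×3 filter, standing in for a learned first-layer unit, over many images and keep the patches with the largest response.

```python
import numpy as np

rng = np.random.default_rng(0)
filt = np.array([[-1, 0, 1],
                 [-2, 0, 2],
                 [-1, 0, 1]], dtype=float)   # stand-in for a learned filter

best = []                                    # (activation, patch) pairs
for _ in range(50):                          # "run many images through the network"
    img = rng.standard_normal((32, 32))
    for i in range(30):
        for j in range(30):
            patch = img[i:i + 3, j:j + 3]
            best.append((float((patch * filt).sum()), patch.copy()))

best.sort(key=lambda t: t[0], reverse=True)
top_patches = [p for _, p in best[:9]]       # patches that excite the unit most
print(len(top_patches), top_patches[0].shape)
```

With a real network, the "activation" would be the unit's output and the patch would be the unit's receptive field in the input image.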
What’s happening inside the black box?
“cat”
Visualize maximally activating patches
What’s happening inside the black box?
“cat”
What about FC layers? Visualize nearest neighbor images according to activation vectors
Source: Stanford CS231n
What’s happening inside the black box?
“cat”
What about FC layers? Fancy dimensionality reduction, e.g., t-SNE
Source: Andrej Karpathy
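A hedged sketch of the t-SNE idea using scikit-learn (my choice of library; any t-SNE implementation works). The "fc7 activation vectors" here are random placeholders; in practice they come from running images through the network:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
acts = rng.standard_normal((50, 4096))   # 50 images x 4096-d fc7 activations

# Embed the high-dimensional activation vectors into 2-D for plotting;
# nearby points in the embedding had similar activations.
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(acts)
print(emb.shape)
```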
What’s happening inside the black box?
“cat”
Visualize activations for a particular image
What’s happening inside the black box?
“cat”
Visualize activations for a particular image, mapped back to pixel space:
1. Unpool
2. Rectify
3. Deconvolve
Visualize the pixel values responsible for the activation
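A NumPy toy of steps 1 and 2 (a sketch of the idea, not the deconvnet code): max-pooling records which position won (the "switch"); unpooling puts the value back at that position and zeros elsewhere, then rectification applies ReLU to the reconstruction.

```python
import numpy as np

x = np.array([[1., 3.],
              [4., 2.]])

# Max pool over the whole 2x2 map: remember the argmax (the "switch").
switch = np.unravel_index(np.argmax(x), x.shape)
pooled = x[switch]

# 1. Unpool: place the value back at the switch location, zeros elsewhere.
unpooled = np.zeros_like(x)
unpooled[switch] = pooled

# 2. Rectify: pass the reconstruction through ReLU.
rectified = np.maximum(unpooled, 0.0)
print(rectified)
```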
Deep visualization toolbox
YouTube video: J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson, Understanding Neural Networks Through Deep Visualization, ICML DL workshop, 2015
Additional visualization techniques
K. Simonyan, A. Vedaldi, and A. Zisserman, Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, ICLR 2014
J. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Striving for simplicity: The all convolutional net, ICLR workshop, 2015
A. Nguyen, J. Yosinski, J. Clune, Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks, ICML workshop, 2016
ZFNet

• Won the ImageNet competition 2013
• Nice analysis and visualization of AlexNet
• Introduced up-convolutions

[Figure: ImageNet challenge top-5 error (%) by year — 2010, 2011 (non-deep), 2012 AlexNet (8 layers), 2013 ZFNet, 2014 VGG/GoogLeNet (19-22 layers), 2015 ResNet (152 layers)]
Visualizing and Understanding Convolutional Networks, Zeiler & Fergus, ECCV 2014
AlexNet to ZFNet

AlexNet (227×227 input)          ZFNet (225×225 input)
Conv 11×11, stride=4, C=96       Conv 7×7, stride=2, C=96
LRN                              LRN
Max Pool 3×3, stride=2           Max Pool 3×3, stride=2
Conv 5×5, C=256                  Conv 5×5, stride=2, C=256
LRN                              LRN
Max Pool 3×3, pad=0, stride=2    Max Pool 3×3, pad=0, stride=2
Conv 3×3, C=384                  Conv 3×3, C=512
Conv 3×3, C=384                  Conv 3×3, C=1024
Conv 3×3, C=256                  Conv 3×3, C=512
Max Pool 3×3, pad=0, stride=2    Max Pool 3×3, pad=0, stride=2
Linear (4096)                    Linear (4096)
Linear (4096)                    Linear (4096)
Linear (1000)                    Linear (1000)
Activation maps in ZFNet

[Figure: log-scale comparison of AlexNet vs ZFNet over layers (input, conv 1-5, maxpool, fc 6-8) — channels: 3/3, 96/96, 256/256, 384/512, 384/1024, 256/512, 256/512, 4096/4096, 4096/4096, 1000/1000; spatial width: 227/225, 55/110, 27/26, 13/13, 13/13, 13/13, 6/6, 1/1, 1/1, 1/1]
Parameters and computation

Layer    Parameters, AlexNet / ZFNet (M)    FLOPs, AlexNet / ZFNet (M)
conv 1   0.0 / 0.0                          106 / 172
conv 2   0.3 / 0.6                          224 / 416
conv 3   0.9 / 1.2                          150 / 199
conv 4   0.7 / 4.7                          112 / 798
conv 5   0.4 / 4.7                          75 / 798
fc 6     37.8 / 75.5                        38 / 76
fc 7     16.8 / 16.8                        17 / 17
fc 8     4.1 / 4.1                          4 / 4
VGG

• Deeper AlexNet/ZFNet

[Figure: ImageNet challenge top-5 error (%) by year — 2010, 2011 (non-deep), 2012 AlexNet (8 layers), 2013 ZFNet, 2014 VGG/GoogLeNet (19-22 layers), 2015 ResNet (152 layers)]
Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan and Zisserman, ICLR 2015
ZFNet to VGG

ZFNet                            VGG-16
Conv 7×7, stride=2, C=96         Conv 3×3, C=64 (×2)
LRN                              Max Pool 2×2, stride=2
Max Pool 3×3, stride=2           Conv 3×3, C=128 (×2)
Conv 5×5, stride=2, C=256        Max Pool 2×2, stride=2
LRN                              Conv 3×3, C=256 (×3)
Max Pool 3×3, pad=0, stride=2    Max Pool 2×2, stride=2
Conv 3×3, C=512                  Conv 3×3, C=512 (×3)
Conv 3×3, C=1024                 Max Pool 2×2, stride=2
Conv 3×3, C=512                  Conv 3×3, C=512 (×3)
Max Pool 3×3, pad=0, stride=2    Max Pool 2×2, stride=2
Linear (4096)                    Linear (4096)
Linear (4096)                    Linear (4096)
Linear (1000)                    Linear (1000)
Insights in VGG

• Why use smaller filters? Two stacked Conv 3×3 layers cover the same receptive field as one Conv 5×5, with fewer parameters and an extra non-linearity.
• Factorization: Conv 3×3 ∘ Conv 3×3 ≈ Conv 5×5
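A quick check of the factorization claim (pure arithmetic; C is an example channel count of my choosing, and bias terms are ignored):

```python
# Two stacked 3x3 convs vs one 5x5 conv, C input and C output channels.
C = 256

params_5x5 = 5 * 5 * C * C                # one 5x5 conv: 25 * C^2
params_3x3_pair = 2 * (3 * 3 * C * C)     # two 3x3 convs: 18 * C^2

# Receptive field of two stacked 3x3 convs: 3 + (3 - 1) = 5
rf = 3 + (3 - 1)
print(rf, params_3x3_pair / params_5x5)   # same 5x5 field at 72% of the cost
```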
Training VGG

• Vanishing gradients

Conv 3×3, C=64 (×2)
Max Pool 2×2, stride=2
Conv 3×3, C=128 (×2)
Max Pool 2×2, stride=2
Conv 3×3, C=256 (×3)
Max Pool 2×2, stride=2
Conv 3×3, C=512 (×3)
Max Pool 2×2, stride=2
Conv 3×3, C=512 (×3)
Max Pool 2×2, stride=2
Linear (4096)
Linear (4096)
Linear (1000)
Parameters and computation

Layer      Parameters (M)    FLOPs (G)
conv 1.1   0.0               0.1
conv 1.2   0.0               1.9
conv 2.1   0.1               0.9
conv 2.2   0.1               1.9
conv 3.1   0.3               0.9
conv 3.2   0.6               1.9
conv 3.3   0.6               1.9
conv 4.1   1.2               0.9
conv 4.2   2.4               1.9
conv 4.3   2.4               1.9
conv 5.1   2.4               0.5
conv 5.2   2.4               0.5
conv 5.3   2.4               0.5
fc 6       102.8             0.1
fc 7       16.8              0.0
fc 8       4.1               0.0
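Summing the layers above layer by layer reproduces VGG-16's well-known total of about 138M parameters, dominated by fc6 (a sketch of my own arithmetic, not slide code):

```python
# Total VGG-16 parameter count: conv stacks plus the three fc layers.
def conv_p(c_in, c_out, k=3):
    return k * k * c_in * c_out + c_out

def fc_p(n_in, n_out):
    return n_in * n_out + n_out

convs = [(3, 64), (64, 64), (64, 128), (128, 128),
         (128, 256), (256, 256), (256, 256),
         (256, 512), (512, 512), (512, 512),
         (512, 512), (512, 512), (512, 512)]
total = sum(conv_p(ci, co) for ci, co in convs)
total += fc_p(7 * 7 * 512, 4096) + fc_p(4096, 4096) + fc_p(4096, 1000)
print(total)   # 138357544
```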
1x1 convolutions and factorization
• Convolutions are linear operators
• Can we make them non-linear?
Figure source: "Network-in-Network", ICLR 2014 paper
Network-in-Network, Lin et al., ICLR 2014
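The Network-in-Network idea rests on the fact that a 1×1 convolution is just a linear map over channels, applied independently at every pixel. A small NumPy check of that equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))   # H x W x C_in feature map
w = rng.standard_normal((16, 32))     # C_in x C_out "1x1 kernel"

# As a 1x1 convolution: the same weights applied at every spatial location.
y_conv = np.einsum('hwc,cd->hwd', x, w)

# As a per-pixel linear layer: flatten pixels, one matmul, reshape back.
y_lin = (x.reshape(-1, 16) @ w).reshape(8, 8, 32)

print(np.allclose(y_conv, y_lin))
```

Interleaving these per-pixel linear maps with ReLUs is what makes the "convolution" non-linear.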
Factorized convolution

Conv 3×3 → ReLU → Conv 1×1

• Implements a “non-linear” convolution
• Less computation, fewer parameters
Inception architecture

• GoogLeNet
• Evolution of Network-in-Network

Parallel branches, concatenated along channels:
  Conv 1×1
  Conv 1×1 → Conv 3×3
  Conv 1×1 → Conv 5×5
  Max Pool 3×3 → Conv 1×1
  → Concat
Going deeper with convolutions, Szegedy et al. CVPR 2015.
Residual Networks

• Use residual connections to build deeper networks
• Activation maps are additive

[Figure: ImageNet challenge top-5 error (%) by year — 2010, 2011 (non-deep), 2012 AlexNet (8 layers), 2013 ZFNet, 2014 VGG/GoogLeNet (19-22 layers), 2015 ResNet (152 layers)]
Deep Residual Learning for Image Recognition, He et al., CVPR 2016
Residual blocks

x_{l+1} = x_l + F(x_l), where F = Conv → BN → ReLU → Conv → BN

• Add shortcut connections for gradients
• Identity shortcut
• Strided 1×1 convolution shortcut (when the shape changes)
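A toy NumPy sketch of the residual rule x_{l+1} = x_l + F(x_l): even when the learned branch F contributes nothing, the block reduces to the identity, which is what keeps gradients flowing through very deep stacks (F here is a stand-in, not the real Conv-BN branch):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)

def residual_block(x, F):
    return x + F(x)                        # identity shortcut + learned branch

dead_branch = lambda x: np.zeros_like(x)   # F = 0: block becomes the identity
relu_branch = lambda x: np.maximum(x, 0)   # some non-trivial F

print(np.allclose(residual_block(x, dead_branch), x))
y = residual_block(x, relu_branch)         # activations add, as the slide says
```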
ResNet-152

Conv 7×7 (64)
Max Pool 3×3
[Conv 1×1 (64), Conv 3×3 (64), Conv 1×1 (256)] ×3
[Conv 1×1 (128), Conv 3×3 (128), Conv 1×1 (512)] ×8
[Conv 1×1 (256), Conv 3×3 (256), Conv 1×1 (1024)] ×36
[Conv 1×1 (512), Conv 3×3 (512), Conv 1×1 (2048)] ×3
Avg Pool

• Multiple variants: ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, ResNet-1001
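Where does the "152" come from? A sanity check (my arithmetic): count the weighted layers, i.e. the stem conv, three convs per bottleneck block, and the final classifier:

```python
# ResNet-152 depth = stem conv + 3 convs per bottleneck block + final fc.
blocks = [3, 8, 36, 3]                 # bottleneck blocks per stage
convs_per_block = 3                    # 1x1 -> 3x3 -> 1x1

depth = 1 + convs_per_block * sum(blocks) + 1
print(depth)   # 152
```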