2/27/20

CNN architectures and visualization

Adapted from: 2019 Philipp Krähenbühl and Chao-Yuan Wu

AlexNet

• Won the ImageNet competition 2012
• Start of a revolution in computer vision

[Chart: ImageNet challenge top-5 error by year: non-deep methods (2010, 2011), AlexNet with 8 layers (2012), ZFNet (2013), VGG/GoogLeNet with 19-22 layers (2014), ResNet with 152 layers (2015)]

ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., NIPS 2012


AlexNet (input 227x227)

Conv 11x11 stride=4, C=96 (no pad)
LRN
Max Pool 3x3 stride=2 (no pad)
Conv 5x5, C=256 (2 groups of 128, with padding)
LRN
Max Pool 3x3 pad=0, stride=2
Conv 3x3, C=384
Conv 3x3, C=384 (2 groups of 192)
Conv 3x3, C=256 (2 groups of 128)
Max Pool 3x3 pad=0, stride=2
Linear (4096)
Linear (4096)
Linear (1000)
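A minimal PyTorch sketch of this stack, assuming a single branch (the original 2012 model split conv 2/4/5 across two GPUs, which is ignored here):

    import torch
    import torch.nn as nn

    # Single-branch AlexNet-style network following the layer list above.
    # This is an illustrative sketch, not the exact original implementation.
    alexnet = nn.Sequential(
        nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
        nn.LocalResponseNorm(5),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
        nn.LocalResponseNorm(5),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Flatten(),
        nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
        nn.Linear(4096, 4096), nn.ReLU(),
        nn.Linear(4096, 1000),
    )

    print(alexnet(torch.randn(1, 3, 227, 227)).shape)  # torch.Size([1, 1000])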

Activation maps in AlexNet

input:          227 x 227 x 3
conv 1:         55 x 55 x 96
LRN, max pool:  27 x 27 x 96
conv 2:         27 x 27 x 256
LRN, max pool:  13 x 13 x 256
conv 3:         13 x 13 x 384
conv 4:         13 x 13 x 384
conv 5:         13 x 13 x 256
max pool:       6 x 6 x 256 = 9216
fc 6:           4096
fc 7:           4096
fc 8:           1000
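These spatial sizes follow from the usual convolution output formula, output = floor((input - kernel + 2*pad) / stride) + 1; a quick check:

    def conv_out(size, kernel, stride=1, pad=0):
        """Spatial output size of a conv or pooling layer."""
        return (size - kernel + 2 * pad) // stride + 1

    print(conv_out(227, 11, stride=4))        # 55  (conv 1)
    print(conv_out(55, 3, stride=2))          # 27  (max pool)
    print(conv_out(27, 5, stride=1, pad=2))   # 27  (conv 2)
    print(conv_out(27, 3, stride=2))          # 13  (max pool)
    print(conv_out(13, 3, stride=2))          # 6   (final max pool)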


Activation maps in AlexNet

[Chart: number of channels and spatial width per layer on a log scale, from the input (3 channels, width 227) through conv 1-5 and the last max pool to fc 6-8 (4096, 4096, 1000 channels, width 1)]

Parameters and computation

Parameters (M):  conv 1: 0.0   conv 2: 0.3   conv 3: 0.9   conv 4: 0.7   conv 5: 0.4   fc 6: 37.8   fc 7: 16.8   fc 8: 4.1
FLOPs (M):       conv 1: 106   conv 2: 224   conv 3: 150   conv 4: 112   conv 5: 75    fc 6: 38     fc 7: 17     fc 8: 4
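These parameter counts follow directly from the layer shapes (for a dense conv, params = K*K*C_in*C_out + C_out); a quick sanity check on a few layers:

    def conv_params(k, c_in, c_out):
        """Weights plus biases of a dense KxK convolution."""
        return k * k * c_in * c_out + c_out

    def linear_params(n_in, n_out):
        return n_in * n_out + n_out

    print(conv_params(3, 256, 384) / 1e6)    # ~0.89M  (conv 3)
    print(linear_params(9216, 4096) / 1e6)   # ~37.8M  (fc 6)
    print(linear_params(4096, 4096) / 1e6)   # ~16.8M  (fc 7)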


What’s happening inside the black box?

Visualize first-layer weights directly

What’s happening inside the black box?


Not too helpful for subsequent layers

Features from a CIFAR10 network, via Stanford CS231n


What’s happening inside the black box?


Visualize maximally activating patches: pick a unit; run many images through the network; visualize patches that produce the highest output values
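A rough PyTorch sketch of this procedure; the model, layer index, unit index, and the `loader` iterable are illustrative assumptions, not the lecture's exact setup:

    import torch
    import torchvision

    # Find the images that maximally activate one unit of a conv layer.
    model = torchvision.models.alexnet(pretrained=True).eval()
    layer = model.features[8]      # one of the later conv layers (illustrative choice)
    unit = 17                      # channel index of the unit to inspect

    recorded = {}
    def hook(module, inputs, output):
        # strongest response of the chosen channel for each image in the batch
        recorded["score"] = output[:, unit].flatten(1).max(dim=1).values
    layer.register_forward_hook(hook)

    scores = []
    with torch.no_grad():
        for images, _ in loader:   # `loader` is assumed to yield preprocessed image batches
            model(images)
            scores.append(recorded["score"])

    # The top-scoring images (or crops around the peak response) are the
    # maximally activating patches for this unit.
    top_values, top_indices = torch.cat(scores).topk(10)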

What’s happening inside the black box?


Visualize maximally activating patches


What’s happening inside the black box?


What about FC layers? Visualize nearest neighbor images according to activation vectors

Source: Stanford CS231n
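A sketch of the nearest-neighbor lookup, assuming a matrix `feats` of fc-layer activations (one row per image) has already been extracted:

    import torch

    # feats: (N, 4096) tensor of fc7 activations, one row per image (assumed precomputed)
    # query: (4096,) activation vector of the query image
    def nearest_neighbors(query, feats, k=5):
        """Indices of the k images whose fc-layer activations are closest to the query."""
        dists = torch.cdist(query[None, :], feats)[0]   # L2 distances in activation space
        return dists.topk(k, largest=False).indices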

What’s happening inside the black box?


What about FC layers? Fancy dimensionality reduction, e.g., t-SNE

Source: Andrej Karpathy
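A minimal sketch of the t-SNE projection, again assuming precomputed fc-layer features `feats` and class ids `labels`:

    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    # feats: (N, 4096) NumPy array of fc-layer activations (assumed precomputed)
    # labels: (N,) array of class ids, used only to color the scatter plot
    emb = TSNE(n_components=2, perplexity=30).fit_transform(feats)

    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=4, cmap="tab10")
    plt.show()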


What’s happening inside the black box?


Visualize activations for a particular image


What’s happening inside the black box?


Visualize activations for a particular image:
1. Unpool
2. Rectify
3. Deconvolve
Visualize pixel values responsible for the activation


Deep visualization toolbox

YouTube demo video
J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson, Understanding neural networks through deep visualization, ICML DL workshop, 2015

Additional visualization techniques

K. Simonyan, A. Vedaldi, and A. Zisserman, Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, ICLR 2014

J. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Striving for simplicity: The all convolutional net, ICLR workshop, 2015

A. Nguyen, J. Yosinski, J. Clune, Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks, ICML workshop, 2016
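As one concrete example from the list above, the class-saliency idea of Simonyan et al. can be sketched as the gradient of the class score with respect to the input pixels (the model choice and preprocessing here are illustrative assumptions):

    import torch
    import torchvision

    # Gradient-based class saliency: how much does each input pixel
    # affect the score of the predicted class?
    model = torchvision.models.alexnet(pretrained=True).eval()

    def saliency_map(image):
        """image: preprocessed (3, H, W) tensor; returns an (H, W) saliency map."""
        x = image.unsqueeze(0).requires_grad_(True)
        scores = model(x)
        scores[0, scores.argmax()].backward()        # backprop the top class score
        return x.grad[0].abs().max(dim=0).values     # max |gradient| over color channels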


ZFNet

• Won the ImageNet competition 2013
• Nice analysis and visualization of AlexNet
• Introduced up-conv

[Chart: ImageNet challenge top-5 error by year, as above]

Visualizing and Understanding Convolutional Networks, Zeiler & Fergus, ECCV 2014

AlexNet to ZFNet (both take 227x227 input; only the conv layers change)

Conv 11x11 stride=4, C=96          →  Conv 7x7 stride=2, C=96
Conv 5x5, C=256 (2 groups of 128)  →  Conv 5x5 stride=2, C=256
Conv 3x3, C=384                    →  Conv 3x3, C=512
Conv 3x3, C=384 (2 groups of 192)  →  Conv 3x3, C=1024
Conv 3x3, C=256 (2 groups of 128)  →  Conv 3x3, C=512

(LRN, the max pooling layers, and the Linear (4096) / Linear (4096) / Linear (1000) head are unchanged.)


Activation maps in ZFNet

[Chart: number of channels and spatial width per layer on a log scale, comparing AlexNet and ZFNet from the input through conv 1-5, the last max pool, and fc 6-8]

Parameters and computation

[Bar charts: per-layer parameters (M) and FLOPs (M), AlexNet vs ZFNet, for conv 1-5 and fc 6-8. ZFNet's wider conv 3-5 and larger fc 6 (75.5M vs 37.8M parameters) and its smaller early strides (up to roughly 800M FLOPs in a single conv layer) make it noticeably more expensive than AlexNet.]


VGG

• Deeper AlexNet/ZFNet

[Chart: ImageNet challenge top-5 error by year, as above; VGG/GoogLeNet (2014) use 19-22 layers]

Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan and Zisserman, ICLR 2015

ZFNet to VGG

ZFNet:  Conv 7x7 stride=2, C=96 → LRN → Max Pool 3x3 stride=2 → Conv 5x5 stride=2, C=256 → LRN → Max Pool 3x3 pad=0, stride=2 → Conv 3x3 C=512 → Conv 3x3 C=1024 → Conv 3x3 C=512 → Max Pool 3x3 pad=0, stride=2 → Linear (4096) → Linear (4096) → Linear (1000)

VGG-16: [Conv 3x3 C=64] x2 → Max Pool 2x2 stride=2 → [Conv 3x3 C=128] x2 → Max Pool 2x2 stride=2 → [Conv 3x3 C=256] x3 → Max Pool 2x2 stride=2 → [Conv 3x3 C=512] x3 → Max Pool 2x2 stride=2 → [Conv 3x3 C=512] x3 → Max Pool 2x2 stride=2 → Linear (4096) → Linear (4096) → Linear (1000)


Insights in VGG

• Why use smaller filters? Two stacked Conv 3x3 layers cover the same receptive field as one Conv 5x5 with fewer parameters (factorization).
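A quick check of the savings, ignoring biases and assuming C input and output channels:

    # Parameters of one 5x5 conv vs. two stacked 3x3 convs, both mapping C -> C channels.
    # Two 3x3 convs cover the same 5x5 receptive field with fewer weights.
    C = 256
    params_5x5 = 5 * 5 * C * C              # 1,638,400
    params_two_3x3 = 2 * (3 * 3 * C * C)    # 1,179,648 (~28% fewer)
    print(params_5x5, params_two_3x3)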

Training VGG

• Vanishing gradients

[Diagram: the VGG-16 layer stack, as listed above]


Parameters and computation

Parameters (M):  conv 1.1: 0.0   conv 1.2: 0.0   conv 2.1: 0.1   conv 2.2: 0.1   conv 3.1: 0.3   conv 3.2: 0.6   conv 3.3: 0.6   conv 4.1: 1.2   conv 4.2: 2.4   conv 4.3: 2.4   conv 5.1: 2.4   conv 5.2: 2.4   conv 5.3: 2.4   fc 6: 102.8   fc 7: 16.8   fc 8: 4.1

[Bar chart: per-layer FLOPs (G) for conv 1.1-5.3 and fc 6-8, roughly 0.1-1.9 G per conv layer, with fc 6-8 contributing comparatively little]

1x1 and factorization

• Convolutions are linear operators

• Can we make them non-linear?

Figure source: "Network-in-Network", ICLR 2014 paper

Network-in-Network, Lin et al., ICLR 2014


Factorized

Conv 3x3 → ReLU → Conv 1x1

• Implements a "non-linear" convolution
• Less computation, fewer parameters
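A sketch of this block in PyTorch; the channel counts are illustrative, not taken from a specific network:

    import torch.nn as nn

    # Network-in-Network-style "non-linear convolution": a spatial 3x3 conv
    # followed by a 1x1 conv, with a ReLU in between.
    factorized_conv = nn.Sequential(
        nn.Conv2d(256, 64, kernel_size=3, padding=1),   # spatial mixing into fewer channels
        nn.ReLU(),
        nn.Conv2d(64, 256, kernel_size=1),               # 1x1 conv mixes channels per pixel
    )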

Inception architecture

• GoogLeNet
• Evolution of Network-in-Network

Inception module: four parallel branches (Conv 1x1; Conv 1x1 → Conv 3x3; Conv 1x1 → Conv 5x5; Max Pool 3x3 → Conv 1x1), concatenated along the channel dimension

Going deeper with convolutions, Szegedy et al. CVPR 2015.
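A compact PyTorch sketch of such a module; the per-branch channel counts used below are placeholders (GoogLeNet uses different counts at each stage):

    import torch
    import torch.nn as nn

    class Inception(nn.Module):
        """One Inception module: four parallel branches concatenated on channels."""
        def __init__(self, c_in, c1, c3_red, c3, c5_red, c5, c_pool):
            super().__init__()
            self.b1 = nn.Conv2d(c_in, c1, 1)                                   # 1x1 branch
            self.b2 = nn.Sequential(nn.Conv2d(c_in, c3_red, 1), nn.ReLU(),
                                    nn.Conv2d(c3_red, c3, 3, padding=1))       # 1x1 -> 3x3 branch
            self.b3 = nn.Sequential(nn.Conv2d(c_in, c5_red, 1), nn.ReLU(),
                                    nn.Conv2d(c5_red, c5, 5, padding=2))       # 1x1 -> 5x5 branch
            self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                    nn.Conv2d(c_in, c_pool, 1))                # pool -> 1x1 branch
        def forward(self, x):
            return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

    # Example: 192 input channels -> 64 + 128 + 32 + 32 = 256 output channels
    block = Inception(192, 64, 96, 128, 16, 32, 32)
    print(block(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])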


Inception architecture

[Diagram: the full GoogLeNet network, a deep stack of Inception modules]

Residual Networks

• Use residual connections to build deeper networks
• Activation maps are additive

[Chart: ImageNet challenge top-5 error by year, as above; ResNet (2015) reaches 152 layers]

Deep Residual Learning for Image Recognition, He et al., CVPR 2016


Residual blocks

x_l → Conv → BN → ReLU → Conv → BN → (+ shortcut from x_l) → ReLU → x_{l+1}

• Add shortcut connections for gradients
• Identity shortcut
• Strided 1x1 convolution shortcut (when the shape changes)

ResNet-152

• Multiple variants: ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, ResNet-1001

Conv 7x7 (64)
Max Pool 3x3
[Conv 1x1 (64), Conv 3x3 (64), Conv 1x1 (256)]     x3
[Conv 1x1 (128), Conv 3x3 (128), Conv 1x1 (512)]   x8
[Conv 1x1 (256), Conv 3x3 (256), Conv 1x1 (1024)]  x36
[Conv 1x1 (512), Conv 3x3 (512), Conv 1x1 (2048)]  x3
Avg Pool
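For reference, a sketch of instantiating this architecture from torchvision and counting its parameters (assumes torchvision is installed; the rough total is a well-known figure for ResNet-152):

    import torchvision

    # Instantiate the 152-layer variant (weights left random here) and count parameters.
    model = torchvision.models.resnet152()
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{n_params / 1e6:.1f}M parameters")   # roughly 60M for ResNet-152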
