CS7015 (Deep Learning) : Lecture 11
Convolutional Neural Networks, LeNet, AlexNet, ZF-Net, VGGNet, GoogLeNet and ResNet
Mitesh M. Khapra
Department of Computer Science and Engineering, Indian Institute of Technology Madras
Module 11.1 : The convolution operation
Suppose we are tracking the position of an aeroplane using a laser sensor at discrete time intervals, obtaining measurements x0, x1, x2, ... Now suppose our sensor is noisy. To obtain a less noisy estimate we would like to average several measurements. More recent measurements are more important, so we would like to take a weighted average:

$$s_t = \sum_{a=0}^{\infty} x_{t-a}\, w_{-a} = (x * w)_t$$

Here x is the input, w is the filter, and x ∗ w denotes the convolution.
In practice, we would only sum over a small window:

$$s_t = \sum_{a=0}^{6} x_{t-a}\, w_{-a}$$

The weight array (w) is known as the filter. We just slide the filter over the input and compute the value of $s_t$ based on a window around $x_t$. For example, with

W = [w−6, w−5, w−4, w−3, w−2, w−1, w0] = [0.01, 0.01, 0.02, 0.02, 0.04, 0.4, 0.5]
X = [1.00, 1.10, 1.20, 1.40, 1.70, 1.80, 1.90, 2.10, 2.20, 2.40, 2.50, 2.70]

sliding the filter across X fills in the output row

S = [1.80, 1.96, 2.11, 2.16, 2.28, 2.42]   (the values s6, ..., s11)

where, for instance,

$$s_6 = x_6 w_0 + x_5 w_{-1} + x_4 w_{-2} + x_3 w_{-3} + x_2 w_{-4} + x_1 w_{-5} + x_0 w_{-6}$$

Here the input (and the kernel) is one dimensional. Can we use the convolution operation on a 2D input also?
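As a quick sanity check, here is a minimal numpy sketch of this sliding window, using the W and X values from the table above (np.correlate is used because it slides the unflipped filter over the input, exactly as the formula does):

```python
import numpy as np

# Filter [w_-6, ..., w_0] and input x_0, ..., x_11 from the example above.
w = np.array([0.01, 0.01, 0.02, 0.02, 0.04, 0.4, 0.5])
x = np.array([1.00, 1.10, 1.20, 1.40, 1.70, 1.80,
              1.90, 2.10, 2.20, 2.40, 2.50, 2.70])

# s_t = sum_{a=0}^{6} x_{t-a} w_{-a} is the dot product of the filter with
# the window [x_{t-6}, ..., x_t].  np.correlate slides exactly this window
# over x (np.convolve would flip the filter first).
s = np.correlate(x, w, mode='valid')
print(np.round(s, 2))   # s_6, ..., s_11; e.g. s_6 = 1.81, cf. 1.80 in S above
```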
We can think of images as 2D inputs. We would now like to use a 2D filter (m × n). First let us see what the 2D formula looks like:

$$S_{ij} = (I * K)_{ij} = \sum_{a=0}^{m-1} \sum_{b=0}^{n-1} I_{i-a,\, j-b}\, K_{a,b}$$

This formula looks at all the preceding neighbours (i − a, j − b). In practice, we use the following formula, which looks at the succeeding neighbours:

$$S_{ij} = (I * K)_{ij} = \sum_{a=0}^{m-1} \sum_{b=0}^{n-1} I_{i+a,\, j+b}\, K_{a,b}$$
5/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 OutputOutput
aw+bx+ey+fzaw+bx+ey+fz bw+cx+fy+gzbw+cx+fy+gz cw+dx+gy+hz
ew+fx+iy+jz fw+gx+jy+kz gw+hx+ky+`z
Input Kernel a b c d w x e f g h y z i j k `
Let us apply this idea to a toy ex- ample and see the results
6/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 OutputOutput
aw+bx+ey+fzaw+bx+ey+fz bw+cx+fy+gzbw+cx+fy+gz cw+dx+gy+hz
ew+fx+iy+jz fw+gx+jy+kz gw+hx+ky+`z
Input Let us apply this idea to a toy ex- Kernel ample and see the results a b c d w x e f g h y z i j k `
Output
aw+bx+ey+fz
6/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 OutputOutput
aw+bx+ey+fzaw+bx+ey+fz bw+cx+fy+gz cw+dx+gy+hz
ew+fx+iy+jz fw+gx+jy+kz gw+hx+ky+`z
Input Let us apply this idea to a toy ex- Kernel ample and see the results a b c d w x e f g h y z i j k `
Output
aw+bx+ey+fz bw+cx+fy+gz
6/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 OutputOutput
aw+bx+ey+fzaw+bx+ey+fz bw+cx+fy+gzbw+cx+fy+gz cw+dx+gy+hz
ew+fx+iy+jz fw+gx+jy+kz gw+hx+ky+`z
Input Let us apply this idea to a toy ex- Kernel ample and see the results a b c d w x e f g h y z i j k `
Output aw+bx+ey+fz bw+cx+fy+gz cw+dx+gy+hz
6/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 OutputOutput
aw+bx+ey+fzaw+bx+ey+fz bw+cx+fy+gzbw+cx+fy+gz cw+dx+gy+hz
ew+fx+iy+jz fw+gx+jy+kz gw+hx+ky+`z
Input Let us apply this idea to a toy ex- Kernel ample and see the results a b c d w x e f g h y z i j k `
Output aw+bx+ey+fz bw+cx+fy+gz cw+dx+gy+hz
ew+fx+iy+jz
6/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 OutputOutput
aw+bx+ey+fzaw+bx+ey+fz bw+cx+fy+gzbw+cx+fy+gz cw+dx+gy+hz
ew+fx+iy+jz fw+gx+jy+kz gw+hx+ky+`z
Input Let us apply this idea to a toy ex- Kernel ample and see the results a b c d w x e f g h y z i j k `
Output aw+bx+ey+fz bw+cx+fy+gz cw+dx+gy+hz
ew+fx+iy+jz fw+gx+jy+kz
6/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 OutputOutput
aw+bx+ey+fzaw+bx+ey+fz bw+cx+fy+gzbw+cx+fy+gz cw+dx+gy+hz
ew+fx+iy+jz fw+gx+jy+kz
Let us apply this idea to a toy example and see the results.

Input:
a b c d
e f g h
i j k ℓ

Kernel:
w x
y z

Output:
aw+bx+ey+fz   bw+cx+fy+gz   cw+dx+gy+hz
ew+fx+iy+jz   fw+gx+jy+kz   gw+hx+ky+ℓz
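Here is a small numpy sketch of the same toy example, with the letters a–ℓ and w–z replaced by concrete (arbitrary) numbers so the arithmetic can be checked:

```python
import numpy as np

def conv2d_valid(I, K):
    """Naive 2D convolution using the succeeding-neighbours formula
    S_ij = sum_a sum_b I[i+a, j+b] * K[a, b]."""
    m, n = K.shape
    H, W = I.shape
    S = np.zeros((H - m + 1, W - n + 1))
    for i in range(S.shape[0]):
        for j in range(S.shape[1]):
            S[i, j] = np.sum(I[i:i+m, j:j+n] * K)
    return S

I = np.arange(12, dtype=float).reshape(3, 4)   # a=0, b=1, ..., l=11
K = np.array([[1., 2.],                        # w=1, x=2
              [3., 4.]])                       # y=3, z=4

S = conv2d_valid(I, K)
print(S.shape)   # (2, 3): a 3x4 input and a 2x2 kernel give a 2x3 output
# First entry is a*w + b*x + e*y + f*z = 0*1 + 1*2 + 4*3 + 5*4 = 34
print(S[0, 0])   # 34.0
```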
For the rest of the discussion we will use the following formula for convolution:

$$S_{ij} = (I * K)_{ij} = \sum_{a=\lfloor -\frac{m}{2} \rfloor}^{\lfloor \frac{m}{2} \rfloor} \; \sum_{b=\lfloor -\frac{n}{2} \rfloor}^{\lfloor \frac{n}{2} \rfloor} I_{i-a,\, j-b}\, K_{\frac{m}{2}+a,\, \frac{n}{2}+b}$$

In other words, we will assume that the kernel is centered on the pixel of interest. So we will be looking at both preceding and succeeding neighbours.
Let us see some examples of 2D convolutions applied to images.
Convolving the image with

$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

blurs the image; convolving it with

$$\begin{bmatrix} 0 & -1 & 0 \\ -1 & 5 & -1 \\ 0 & -1 & 0 \end{bmatrix}$$

sharpens the image; and convolving it with

$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

detects the edges.
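A short scipy sketch that applies these three kernels to a stand-in grayscale image (the random array is just a placeholder for a real image, and the blur kernel is divided by 9 here so that pixel values stay in range, whereas the slide shows the unnormalized version). All three kernels are symmetric under a 180° rotation, so convolution and cross-correlation coincide:

```python
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(128, 128)   # stand-in for a real grayscale image

kernels = {
    'blur':    np.ones((3, 3)) / 9.0,
    'sharpen': np.array([[ 0., -1.,  0.],
                         [-1.,  5., -1.],
                         [ 0., -1.,  0.]]),
    'edges':   np.array([[ 1.,  1.,  1.],
                         [ 1., -8.,  1.],
                         [ 1.,  1.,  1.]]),
}

# mode='same' zero-pads so the output has the same size as the input.
for name, K in kernels.items():
    out = convolve2d(image, K, mode='same', boundary='fill')
    print(name, out.shape)   # each prints (128, 128)
```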
We will now see a working example of 2D convolution.
We just slide the kernel over the input image. Each time we slide the kernel we get one value in the output. The resulting output is called a feature map. We can use multiple filters to get multiple feature maps.
13/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 In the 2D case, we slide a two dimen- stional filter over a two dimensional out- put What would happen in the 3D case?
14/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Question In the 1D case, we slide a one dimensional filter over a one dimensional input In the 2D case, we slide a two dimen- stional filter over a two dimensional out- put What would happen in the 3D case?
14/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation Note that in this lecture we will assume that the filter always extends to the depth of the image filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width OUTPUT and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like?
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Once again we will slide the volume over the 3D input and compute the convolution oper- ation Note that in this lecture we will assume that the filter always extends to the depth of the image In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width OUTPUT and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume
filter
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Note that in this lecture we will assume that the filter always extends to the depth of the image filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Note that in this lecture we will assume that the filter always extends to the depth of the image filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Note that in this lecture we will assume that the filter always extends to the depth of the image filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Note that in this lecture we will assume that the filter always extends to the depth of the image filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Note that in this lecture we will assume that the filter always extends to the depth of the image filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Note that in this lecture we will assume that the filter always extends to the depth of the image filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Note that in this lecture we will assume that the filter always extends to the depth of the image filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Note that in this lecture we will assume that the filter always extends to the depth of the image filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Note that in this lecture we will assume that the filter always extends to the depth of the image filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Note that in this lecture we will assume that the filter always extends to the depth of the image filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Note that in this lecture we will assume that the filter always extends to the depth of the image filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Note that in this lecture we will assume that the filter always extends to the depth of the image filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 filter In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation Note that in this lecture we will assume that the filter always extends to the depth of the image
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 filter
As a result the output will be 2D (only width and height, no depth) Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation Note that in this lecture we will assume that the filter always extends to the depth of the image In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth)
OUTPUT
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 filter
Once again we can apply multiple filters to get multiple feature maps
R G B What would a 3D filter look like? It will be 3D and we will refer to it as a volume Once again we will slide the volume over the 3D input and compute the convolution oper- ation Note that in this lecture we will assume that the filter always extends to the depth of the image In effect, we are doing a 2D convolution oper- ation on a 3D input (because the filter moves along the height and the width but not along the depth) As a result the output will be 2D (only width OUTPUT and height, no depth)
INPUT
15/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 filter
What would a 3D filter look like? It will be 3D, and we will refer to it as a volume. Once again we will slide the volume over the 3D input (for an image, the R, G and B channels give a depth of 3) and compute the convolution operation. Note that in this lecture we will assume that the filter always extends to the full depth of the image. In effect, we are doing a 2D convolution operation on a 3D input (because the filter moves along the height and the width but not along the depth). As a result the output will be 2D (only width and height, no depth). Once again we can apply multiple filters to get multiple feature maps.
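A naive numpy sketch of this idea, with made-up sizes: the input is H × W × 3 (the three channels playing the role of R, G, B), each filter is a volume spanning the full depth, and each of the K filters yields one 2D feature map:

```python
import numpy as np

def convolve_volume(image, filters):
    """Slide K filters of shape F x F x D over an H x W x D input.
    Each filter spans the full depth, so it produces one 2D feature map."""
    H, W, D = image.shape
    K, F, _, Df = filters.shape
    assert Df == D, "filter depth must match input depth"
    H2, W2 = H - F + 1, W - F + 1
    maps = np.zeros((H2, W2, K))
    for k in range(K):
        for i in range(H2):
            for j in range(W2):
                # the filter moves along height and width, never along depth
                maps[i, j, k] = np.sum(image[i:i+F, j:j+F, :] * filters[k])
    return maps

rgb = np.random.rand(32, 32, 3)           # hypothetical 32 x 32 RGB input
filters = np.random.rand(5, 3, 3, 3)      # K = 5 filters, each 3 x 3 x 3
print(convolve_volume(rgb, filters).shape)   # (30, 30, 5): five 2D maps
```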
Module 11.2 : Relation between input size, output size and filter size
17/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 So far we have not said anything explicit about the dimensions of the 1 inputs 2 filters 3 outputs and the relations between them We will see how they are related but before that we will define a few quantities
17/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 and Depth (D1) of the original input
F Height (H1)
F
D1
Width (W1),
The Stride S (We will come back to H2 this later) The number of filters K The spatial extent (F ) of each filter H1 (the depth of each filter is same as the depth of each input)
The output is W2 × H2 × D2 (we will W2 soon see a formula for computing W2, W1
D2 H2 and D2)
D1
We first define the following quantit- ies
18/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 F and Depth (D1) of the original input
F
D1
Height (H1)
The Stride S (We will come back to H2 this later) The number of filters K The spatial extent (F ) of each filter H1 (the depth of each filter is same as the depth of each input)
The output is W2 × H2 × D2 (we will W2 soon see a formula for computing W2,
D2 H2 and D2)
D1
We first define the following quantit- ies
Width (W1),
W1
18/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 F
F
D1
and Depth (D1) of the original input The Stride S (We will come back to H2 this later) The number of filters K The spatial extent (F ) of each filter (the depth of each filter is same as the depth of each input)
The output is W2 × H2 × D2 (we will W2 soon see a formula for computing W2,
D2 H2 and D2)
D1
We first define the following quantit- ies
Width (W1), Height (H1)
H1
W1
18/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 F
F
D1
The Stride S (We will come back to H2 this later) The number of filters K The spatial extent (F ) of each filter (the depth of each filter is same as the depth of each input)
The output is W2 × H2 × D2 (we will W2 soon see a formula for computing W2,
D2 H2 and D2)
We first define the following quantit- ies
Width (W1), Height (H1) and Depth (D1) of the original input
H1
W1
D1
18/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 F
F
D1
H2
The number of filters K The spatial extent (F ) of each filter (the depth of each filter is same as the depth of each input)
The output is W2 × H2 × D2 (we will W2 soon see a formula for computing W2,
D2 H2 and D2)
We first define the following quantit- ies
Width (W1), Height (H1) and Depth (D1) of the original input The Stride S (We will come back to this later)
H1
W1
D1
18/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 F
F
H2 D1 The number of filters K The spatial extent (F ) of each filter (the depth of each filter is same as the depth of each input)
The output is W2 × H2 × D2 (we will W2 soon see a formula for computing W2,
D2 H2 and D2)
We first define the following quantit- ies
Width (W1), Height (H1) and Depth (D1) of the original input The Stride S (We will come back to this later)
H1
W1
D1
18/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 F
F
H2 D1
The spatial extent (F ) of each filter (the depth of each filter is same as the depth of each input)
The output is W2 × H2 × D2 (we will W2 soon see a formula for computing W2,
D2 H2 and D2)
We first define the following quantit- ies
Width (W1), Height (H1) and Depth (D1) of the original input The Stride S (We will come back to this later) The number of filters K
H1
W1
D1
18/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 H2
The output is W2 × H2 × D2 (we will W2 soon see a formula for computing W2,
D2 H2 and D2)
We first define the following quantit- ies
F Width (W1), Height (H1) and Depth (D1) of the original input
F The Stride S (We will come back to D1 this later) The number of filters K The spatial extent (F ) of each filter H1 (the depth of each filter is same as the depth of each input)
W1
D1
18/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 We first define the following quantit- ies
F Width (W1), Height (H1) and Depth (D1) of the original input
F The Stride S (We will come back to
H2 D1 this later) The number of filters K The spatial extent (F ) of each filter H1 (the depth of each filter is same as the depth of each input)
The output is W2 × H2 × D2 (we will W2 soon see a formula for computing W2, W1
D2 H2 and D2)
D1
Let us compute the dimensions $(W_2, H_2)$ of the output. Notice that we cannot place the kernel at the corners, as it would cross the input boundary. This is true for all the shaded points (the kernel crosses the input boundary). This results in an output of smaller dimensions than the input.
As the size of the kernel increases, this becomes true for even more pixels. For example, with a 5 × 5 kernel we have an even smaller output. In general,

$$W_2 = W_1 - F + 1$$
$$H_2 = H_1 - F + 1$$

We will refine this formula further.
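This shrinkage is easy to verify with scipy's 'valid' mode, which applies the kernel only where it fits entirely inside the input (the sizes below are arbitrary):

```python
import numpy as np
from scipy.signal import convolve2d

I = np.random.rand(64, 64)            # W1 = H1 = 64
K = np.random.rand(5, 5)              # F = 5
out = convolve2d(I, K, mode='valid')  # kernel never crosses the boundary
print(out.shape)                      # (60, 60) = (64 - 5 + 1, 64 - 5 + 1)
```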
20/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 0 0 0 0 0 0 0 0 0 0 0 We can use something known as padding 0 0 0 0 Pad the inputs with appropriate number of 0 0 0 = inputs so that you can now apply the kernel 0 0 at the corners 0 0 Let us use pad P = 1 with a 3 × 3 kernel 0 0 This means we will add one row and one 0 0 0 0 0 0 0 0 0 column of 0 inputs at the top, bottom, left and right
21/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 What if we want the output to be of same 0 0 0 0 0 0 0 0 0 size as the input? 0 0 We can use something known as padding 0 0 0 0 Pad the inputs with appropriate number of 0 0 0 = inputs so that you can now apply the kernel 0 0 at the corners 0 0 Let us use pad P = 1 with a 3 × 3 kernel 0 0 This means we will add one row and one 0 0 0 0 0 0 0 0 0 column of 0 inputs at the top, bottom, left and right
We now have,
W2 = W1 − F + 2P + 1
H2 = H1 − F + 2P + 1 We will refine this formula further
What does the stride S do? It defines the intervals at which the filter is applied (here S = 2).
Here, we are essentially skipping every 2nd pixel, which will again result in an output of smaller dimensions.
[Figure: the 0-padded input convolved with the kernel at stride S = 2]
So what should our final formula look like?
W2 = (W1 − F + 2P)/S + 1
H2 = (H1 − F + 2P)/S + 1
22/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
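This formula is easy to sanity-check in code. A minimal sketch (the helper name conv_output_size is ours, not from the lecture; the division is a floor division, since incomplete windows at the border are discarded):

    def conv_output_size(W1, F, P=0, S=1):
        """Spatial output size of a convolution: (W1 - F + 2P)/S + 1."""
        return (W1 - F + 2 * P) // S + 1

    # P = 1 with a 3 x 3 kernel preserves the input size:
    assert conv_output_size(32, F=3, P=1, S=1) == 32
    # Stride S = 2 roughly halves the output:
    assert conv_output_size(32, F=3, P=1, S=2) == 16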
Finally, coming to the depth of the output. Each filter gives us one 2D output. K filters will give us K such 2D outputs.
We can think of the resulting output as a K × W2 × H2 volume. Thus D2 = K.
[Figure: a W1 × H1 × D1 input convolved with K filters, producing a W2 × H2 × D2 output volume]
W2 = (W1 − F + 2P)/S + 1
H2 = (H1 − F + 2P)/S + 1
D2 = K
23/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
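To make the "K filters give K 2D maps" picture concrete, here is a minimal NumPy sketch (ours, not from the lecture) that slides K filters over a D1-deep input and stacks the results into an output volume:

    import numpy as np

    def conv_volume(x, filters, stride=1):
        """x: (H1, W1, D1); filters: (K, F, F, D1). Returns (H2, W2, K); no padding."""
        H1, W1, D1 = x.shape
        K, F, _, _ = filters.shape
        H2 = (H1 - F) // stride + 1
        W2 = (W1 - F) // stride + 1
        out = np.zeros((H2, W2, K))
        for k in range(K):                          # one 2D map per filter
            for i in range(H2):
                for j in range(W2):
                    patch = x[i*stride:i*stride+F, j*stride:j*stride+F, :]
                    out[i, j, k] = np.sum(patch * filters[k])
        return out

    x = np.random.randn(32, 32, 3)                  # W1 = H1 = 32, D1 = 3
    w = np.random.randn(6, 5, 5, 3)                 # K = 6 filters of size 5 x 5 x 3
    print(conv_volume(x, w).shape)                  # (28, 28, 6), i.e. D2 = K = 6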
Let us do a few exercises.
Input: 227 × 227 × 3, convolved with 96 filters of size 11 × 11 × 3, Stride = 4, Padding = 0.
W2 =? H2 =? D2 =?
W2 = (W1 − F + 2P)/S + 1
H2 = (H1 − F + 2P)/S + 1
55 = (227 − 11)/4 + 1
So W2 = H2 = 55 and D2 = 96, i.e., the output is 55 × 55 × 96.
24/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
Let us do a few exercises.
Input: 32 × 32 × 1, convolved with 6 filters of size 5 × 5 × 1, Stride = 1, Padding = 0.
W2 =? H2 =? D2 =?
28 = (32 − 5)/1 + 1
So W2 = H2 = 28 and D2 = 6, i.e., the output is 28 × 28 × 6.
25/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
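Both answers follow directly from the size formula; in code (plain arithmetic, with the floor division noted earlier):

    print((227 - 11 + 2 * 0) // 4 + 1)   # 55 -> output volume 55 x 55 x 96
    print((32 - 5 + 2 * 0) // 1 + 1)     # 28 -> output volume 28 x 28 x 6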
Module 11.3 : Convolutional Neural Networks
26/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
Putting things into perspective: What is the connection between this operation (convolution) and neural networks?
We will try to understand this by considering the task of “image classification”.
27/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
Features:
Raw pixels → classifier → car, bus, monument, flower
Edge Detector → classifier → car, bus, monument, flower
SIFT/HOG → classifier → car, bus, monument, flower
In each case: static feature extraction (no learning), followed by learning the weights of the classifier.
28/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
Instead of using handcrafted kernels such as edge detectors, can we learn meaningful kernels/filters in addition to learning the weights of the classifier?
[Figure: Input → Features → Classifier → car, bus, monument, flower. The feature extractor is a handcrafted 5 × 5 edge-detector kernel:
0 0 0 0 0
0 1 1 1 0
0 1 -8 1 0
0 1 1 1 0
0 0 0 0 0
The classifier weights (a matrix of small real numbers) are marked “Learn these weights”.]
29/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
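For reference, applying this fixed edge-detector kernel is just the convolution operation from Module 11.1; a self-contained NumPy sketch (ours):

    import numpy as np

    edge = np.array([[0, 0, 0, 0, 0],
                     [0, 1, 1, 1, 0],
                     [0, 1, -8, 1, 0],
                     [0, 1, 1, 1, 0],
                     [0, 0, 0, 0, 0]], dtype=float)   # the kernel from the slide

    img = np.random.rand(32, 32)                       # a dummy grayscale image
    n = 32 - 5 + 1                                     # 28, by the size formula
    feat = np.array([[np.sum(img[i:i+5, j:j+5] * edge)
                      for j in range(n)] for i in range(n)])
    print(feat.shape)                                  # (28, 28): one fixed feature map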
Even better: instead of using handcrafted kernels (such as edge detectors), can we learn multiple meaningful kernels/filters in addition to learning the weights of the classifier?
[Figure: Input → Features → Classifier → car, bus, monument, flower, where the single handcrafted kernel is replaced by several kernels whose weights (matrices of small real numbers) are all to be learned.]
30/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
Can we learn multiple layers of meaningful kernels/filters in addition to learning the weights of the classifier?
Yes, we can! Simply by treating these kernels as parameters and learning them in addition to the weights of the classifier (using backpropagation).
Such a network is called a Convolutional Neural Network.
[Figure: Input → multiple layers of learned kernels → Classifier → car, bus, monument, flower, with all weights updated by backpropagation.]
31/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
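In a modern framework this is literally how a convolutional layer is defined: the kernels are parameter tensors with gradients enabled. A minimal PyTorch illustration (ours, not from the lecture):

    import torch.nn as nn

    conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
    print(conv.weight.shape)           # torch.Size([6, 3, 5, 5]): 6 kernels of size 5 x 5 x 3
    print(conv.weight.requires_grad)   # True: learned by backpropagation, like any weight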
Okay, I get it that the idea is to learn the kernels/filters by just treating them as parameters of the classification model.
But how is this different from a regular feedforward neural network? Let us see.
32/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
This is what a regular feedforward neural network would look like for this task.
[Figure: a 4 × 4 image (16 input neurons) densely connected to a hidden layer, leading to 10 output classes (digits).]
There are many dense connections here. For example, all the 16 input neurons contribute to the computation of h11.
Contrast this to what happens in the case of convolution.
33/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
Only a few local neurons participate in the computation of h11. For example, only pixels 1, 2, 5, 6 contribute to h11.
[Figure: a 2 × 2 kernel convolved with the 16-pixel image; each hidden neuron h11, h12, h13, h14, ... is connected to just the 4 pixels under the kernel.]
The connections are much sparser.
We are taking advantage of the structure of the image (interactions between neighboring pixels are more interesting).
This sparse connectivity reduces the number of parameters in the model.
34/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
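The saving is easy to quantify; a back-of-the-envelope comparison (our numbers) for the 4 × 4 image above:

    inputs = 16                 # a 4 x 4 image
    hidden = 9                  # e.g. the 3 x 3 grid of h's produced by a 2 x 2 kernel
    dense = inputs * hidden     # fully connected: 144 weights
    sparse = 4 * hidden         # convolution: each h sees only 4 pixels -> 36 connections
    print(dense, sparse)        # 144 vs 36 (and weight sharing will cut this further)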
But is sparse connectivity really a good thing? Aren't we losing information (by losing interactions between some input pixels)?
Well, not really. The two highlighted neurons (x1 & x5) do not interact in layer 1. But they indirectly contribute to the computation of g3 and hence interact indirectly.
[Figure: stacked layers showing how the receptive field of a deeper unit covers both x1 and x5.∗]
∗ Goodfellow et al., 2016
35/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
Another characteristic of CNNs is weight sharing. Consider the following network.
[Figure: a 4 × 4 image (16 pixels) convolved with two kernels, Kernel 1 and Kernel 2, applied at different portions of the image.]
Do we want the kernel weights to be different for different portions of the image?
Imagine that we are trying to learn a kernel that detects edges. Shouldn't we be applying the same kernel at all the portions of the image?
36/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
In other words, shouldn't the orange and pink kernels be the same?
Yes, indeed. This would make the job of learning easier (instead of trying to learn the same weights/kernels at different locations again and again).
But does that mean we can have only one kernel? No, we can have many such kernels, but the kernels will be shared by all locations in the image.
This is called “weight sharing”.
37/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
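Weight sharing makes the number of kernel parameters independent of the image size; only the kernel size and the number of kernels matter. A quick illustration (our numbers):

    K, F, D = 6, 5, 3                      # 6 shared kernels of size 5 x 5 x 3
    shared = K * F * F * D                 # 450 parameters, for any image size
    W2 = H2 = 28                           # output size for a 32 x 32 input, F = 5, S = 1
    unshared = W2 * H2 * K * F * F * D     # a separate kernel per location: 352,800
    print(shared, unshared)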
So far, we have focused only on the convolution operation. Let us see what a full convolutional neural network looks like.
38/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
It has alternate convolution and pooling layers.
[Figure: Input (32 × 32) → Convolution Layer 1 (28 × 28) → Pooling Layer 1 (14 × 14) → Convolution Layer 2 (10 × 10) → Pooling Layer 2 (5 × 5) → FC 1 (120) → FC 2 (84) → Output (10).]
What does a pooling layer do? Let us see.
39/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
Suppose the convolution of the input with 1 filter gives the feature map:
1 4 2 1
5 8 3 4
7 6 4 5
1 3 1 2
maxpool with 2 × 2 filters (stride 2) gives:
8 4
7 5
maxpool with 2 × 2 filters (stride 1) gives:
8 8 4
8 8 5
7 6 5
Instead of max pooling we can also do average pooling.
40/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
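Both pooled outputs above can be reproduced with a short NumPy sketch (ours, not from the lecture):

    import numpy as np

    def maxpool2d(x, size=2, stride=2):
        """Max pooling of a 2D array with a size x size window."""
        H2 = (x.shape[0] - size) // stride + 1
        W2 = (x.shape[1] - size) // stride + 1
        return np.array([[x[i*stride:i*stride+size, j*stride:j*stride+size].max()
                          for j in range(W2)] for i in range(H2)])

    x = np.array([[1, 4, 2, 1],
                  [5, 8, 3, 4],
                  [7, 6, 4, 5],
                  [1, 3, 1, 2]])
    print(maxpool2d(x, 2, 2))   # [[8 4] [7 5]]
    print(maxpool2d(x, 2, 1))   # [[8 8 4] [8 8 5] [7 6 5]]

Replacing .max() with .mean() gives average pooling.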
We will now see some case studies where convolutional neural networks have been successful.
41/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
LeNet-5 for handwritten character recognition
Input: 32 × 32
Convolution Layer 1: S = 1, F = 5, K = 6, P = 0 → 28 × 28 × 6; Param = 150
Pooling Layer 1: S = 2, F = 2 → 14 × 14 × 6; Param = 0
Convolution Layer 2: S = 1, F = 5, K = 16, P = 0 → 10 × 10 × 16; Param = 2400
Pooling Layer 2: S = 2, F = 2 → 5 × 5 × 16; Param = 0
FC 1 (120): Param = 48120
FC 2 (84): Param = 10164
Output (10): Param = 850
(Note that the pooling layers must use stride 2: 14 = (28 − 2)/2 + 1 and 5 = (10 − 2)/2 + 1.)
42/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
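These parameter counts are easy to verify; a small sketch (ours), following the slide's convention of counting biases only in the fully connected layers:

    def conv_params(F, D_in, K):
        return F * F * D_in * K          # kernel weights only, as on the slide

    def fc_params(n_in, n_out):
        return n_in * n_out + n_out      # weights + biases

    print(conv_params(5, 1, 6))          # Conv1: 150
    print(conv_params(5, 6, 16))         # Conv2: 2400
    print(fc_params(5 * 5 * 16, 120))    # FC1: 48120
    print(fc_params(120, 84))            # FC2: 10164
    print(fc_params(84, 10))             # Output: 850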
How do we train a convolutional neural network?
43/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
A CNN can be implemented as a feedforward neural network wherein only a few weights (in color) are active; the rest of the weights (in gray) are zero.
[Figure: a 1D input (a, b, c, ..., j) convolved with a kernel (weights w, x, y, z and l, m, n, o) drawn as a feedforward network; colored edges carry the shared kernel weights, gray edges are zero.]
We can thus train a convolutional neural network using backpropagation, by thinking of it as a feedforward neural network with sparse connections.
44/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
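Concretely, no new machinery is needed: the usual backpropagation training step applies unchanged. A minimal PyTorch sketch (ours, with made-up sizes):

    import torch
    import torch.nn as nn

    net = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5),   # learnable kernels
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(6 * 14 * 14, 10),       # classifier weights
    )
    opt = torch.optim.SGD(net.parameters(), lr=0.01)

    x = torch.randn(8, 1, 32, 32)          # a dummy batch of 32 x 32 images
    y = torch.randint(0, 10, (8,))
    loss = nn.functional.cross_entropy(net(x), y)
    loss.backward()                        # gradients reach kernels and classifier alike
    opt.step()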
Module 11.4 : CNNs (success stories on ImageNet)
45/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
ImageNet Success Stories (roadmap for rest of the talk): AlexNet, ZFNet, VGGNet
46/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
16.4
11.7 19 layers 22 layers
7.3 6.7
shallow 8 layers 8 layers 3.57
ILSVRC’11 ILSVRC’12 ILSVRC’13 ILSVRC’14 ILSVRC’14 ILSVRC’15 AlexNet ZFNet VGG GoogleNet ResNet
28.2
ILSVRC’10
47/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 152 layers
16.4
11.7 19 layers 22 layers
7.3 6.7
shallow 8 layers 8 layers 3.57
ILSVRC’12 ILSVRC’13 ILSVRC’14 ILSVRC’14 ILSVRC’15 AlexNet ZFNet VGG GoogleNet ResNet
28.2 25.8
ILSVRC’10 ILSVRC’11
47/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 152 layers
11.7 19 layers 22 layers
7.3 6.7
shallow 8 layers 8 layers 3.57
ILSVRC’13 ILSVRC’14 ILSVRC’14 ILSVRC’15 ZFNet VGG GoogleNet ResNet
28.2 25.8
16.4
ILSVRC’10 ILSVRC’11 ILSVRC’12 AlexNet
47/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 152 layers
19 layers 22 layers
7.3 6.7
shallow 8 layers 8 layers 3.57
ILSVRC’14 ILSVRC’14 ILSVRC’15 VGG GoogleNet ResNet
28.2 25.8
16.4
11.7
ILSVRC’10 ILSVRC’11 ILSVRC’12 ILSVRC’13 AlexNet ZFNet
47/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 152 layers
19 layers 22 layers
6.7
shallow 8 layers 8 layers 3.57
ILSVRC’14 ILSVRC’15 GoogleNet ResNet
28.2 25.8
16.4
11.7
7.3
ILSVRC’10 ILSVRC’11 ILSVRC’12 ILSVRC’13 ILSVRC’14 AlexNet ZFNet VGG
47/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 152 layers
19 layers 22 layers
shallow 8 layers 8 layers 3.57
ILSVRC’15 ResNet
28.2 25.8
16.4
11.7
7.3 6.7
ILSVRC’10 ILSVRC’11 ILSVRC’12 ILSVRC’13 ILSVRC’14 ILSVRC’14 AlexNet ZFNet VGG GoogleNet
47/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 152 layers
19 layers 22 layers
shallow 8 layers 8 layers
28.2 25.8
16.4
11.7
7.3 6.7
3.57
ILSVRC’10 ILSVRC’11 ILSVRC’12 ILSVRC’13 ILSVRC’14 ILSVRC’14 ILSVRC’15 AlexNet ZFNet VGG GoogleNet ResNet
47/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 28.2 25.8 152 layers
16.4
11.7 19 layers 22 layers
7.3 6.7
shallow 8 layers 8 layers 3.57
ILSVRC’10 ILSVRC’11 ILSVRC’12 ILSVRC’13 ILSVRC’14 ILSVRC’14 ILSVRC’15 AlexNet ZFNet VGG GoogleNet ResNet
47/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 28.2 25.8 152 layers
16.4
11.7 19 layers 22 layers
7.3 6.7
shallow 8 layers 8 layers 3.57
ILSVRC’10 ILSVRC’11 ILSVRC’12 ILSVRC’13 ILSVRC’14 ILSVRC’14 ILSVRC’15 AlexNet ZFNet VGG GoogleNet ResNet
First on the roadmap: AlexNet.
48/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 9752,H2 ==? 552723119752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 97 ×××119727227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 53×256 16 425696MM = 4M FS = 4,1,3,PS = 02
55 27 23 11 11 9 5 7 3 5 dense 3 3 3 3 3 3 3 2 dense dense 5 3 3 3 3 11 5 2 7 9 256 11 256 MaxPooling 27 23 384 Convolution 55 384 Convolution 256 Convolution MaxPooling 1000 96 256 MaxPooling Convolution 4096 4096 96 Convolution
227
227
3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 272311 9752,H2 ==? 2723119752 Parameters:Parameters:Parameters: (3 (3Parameters: (5×××33×5××384)256)384)96)×× 0?×384256384256 = = = 1 0. 0.327.68MMM
TotalInput:Input:Input: Parameters: 11 27FC1 97 ××119727××× 2738425696.55M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 256,384,×256) 23× 55 540961000××F×52355=×4096 = =× 53×256 16 425696MM = 4M FS = 1,3,PS = 02
Output:W2 = 55,H2 = 55 Parameters: (11 × 11 × 3) × 96 = 34K
55 27 23 11 9 5 7 3 5 dense 3 3 3 3 3 3 3 2 dense dense 5 3 3 3 3 5 2 7 9 256 11 256 MaxPooling 27 23 384 Convolution 55 384 Convolution 256 Convolution MaxPooling 1000 96 256 MaxPooling Convolution 4096 4096 96 Convolution
Input: 227 × 227 × 3 Conv1: K = 96,F = 11 S = 4,P = 0
Output:W2 =?,H2 =? Parameters: ?
227
11
11
227
3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 272311 9752,H2 ==? 2723119752 Parameters:Parameters:Parameters: (3 (3Parameters: (5×××33×5××384)256)384)96)×× 0?×384256384256 = = = 1 0. 0.327.68MMM
TotalInput:Input:Input: Parameters: 11 27FC1 97 ××119727××× 2738425696.55M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 256,384,×256) 23× 55 540961000××F×52355=×4096 = =× 53×256 16 425696MM = 4M FS = 1,3,PS = 02
Output:W2 =?,H2 =? Parameters: (11 × 11 × 3) × 96 = 34K
27 23 11 9 5 7 3 5 dense 3 3 3 3 3 3 3 2 dense dense 5 3 3 3 3 5 2 7 9 256 11 256 MaxPooling 27 23 384 Convolution 384 Convolution 256 Convolution MaxPooling 1000 96 256 MaxPooling Convolution 4096 4096
Input: 227 × 227 × 3 Conv1: K = 96,F = 11 S = 4,P = 0
Output:W2 = 55,H2 = 55 Parameters: ?
227 55
11
11
55 227
96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 272311 9752,H2 ==? 2723119752 Parameters:Parameters:Parameters: (3 (3Parameters: (5×××33×5××384)256)384)96)×× 0?×384256384256 = = = 1 0. 0.327.68MMM
TotalInput:Input:Input: Parameters: 11 27FC1 97 ××119727××× 2738425696.55M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 256,384,×256) 23× 55 540961000××F×52355=×4096 = =× 53×256 16 425696MM = 4M FS = 1,3,PS = 02
Output:W2 =?,H2 =? Parameters: ?
27 23 11 9 5 7 3 5 dense 3 3 3 3 3 3 3 2 dense dense 5 3 3 3 3 5 2 7 9 256 11 256 MaxPooling 27 23 384 Convolution 384 Convolution 256 Convolution MaxPooling 1000 96 256 MaxPooling Convolution 4096 4096
Input: 227 × 227 × 3 Conv1: K = 96,F = 11 S = 4,P = 0
Output:W2 = 55,H2 = 55 Parameters: (11 × 11 × 3) × 96 = 34K
227 55
11
11
55 227
96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 552311 9752,H2 ==? 5523119752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 97 ×××119727227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxConv1: Pool Pool (2 Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 54096F1000×F×=523=×4096 =11 =× 53256 16 4256MM = 4M FS = 4,1,3,PS = 02
Output:W2 = 27,H2 = 27 Parameters: 0
27 23 11 9 5 7 5 dense 3 3 3 3 3 3 2 dense dense 5 3 3 3 3 5 2 7 9 256 11 256 MaxPooling 27 23 384 Convolution 384 Convolution 256 Convolution MaxPooling 1000 96 256 MaxPooling Convolution 4096 4096
Max Pool Input: 55 × 55 × 96 F = 3,S = 2
Output:W2 =?,H2 =? Parameters: ?
227 55
11 3 3 11
55 227
96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 552311 9752,H2 ==? 5523119752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 97 ×××119727227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxConv1: Pool Pool (2 Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 54096F1000×F×=523=×4096 =11 =× 53256 16 4256MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: 0
23 11 9 5 7 5 dense 3 3 3 3 3 3 2 dense dense 5 3 3 3 3 5 2 7 9 256 11 256 MaxPooling 23 384 Convolution 384 Convolution 256 Convolution MaxPooling 1000 256 Convolution 4096 4096
Max Pool Input: 55 × 55 × 96 F = 3,S = 2
Output:W2 = 27,H2 = 27 Parameters: ?
227 55 27 11 3 3 11
27 55 227
96 MaxPooling 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 552311 9752,H2 ==? 5523119752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 97 ×××119727227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxConv1: Pool Pool (2 Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 54096F1000×F×=523=×4096 =11 =× 53256 16 4256MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: ?
23 11 9 5 7 5 dense 3 3 3 3 3 3 2 dense dense 5 3 3 3 3 5 2 7 9 256 11 256 MaxPooling 23 384 Convolution 384 Convolution 256 Convolution MaxPooling 1000 256 Convolution 4096 4096
Max Pool Input: 55 × 55 × 96 F = 3,S = 2
Output:W2 = 27,H2 = 27 Parameters: 0
227 55 27 11 3 3 11
27 55 227
96 MaxPooling 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 552711 9752,H2 ==? 5527119752 Parameters:Parameters:Parameters: (3 (3Parameters: (11××3×3××11384)256)384)× 3)× 0?××38438425696 = = 1 34.0327.8KMM
TotalInput:Input: Parameters: 11227FC1 97 ××1197227×× 27384256×.553 M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,384,256,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 3×256 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 = 23,H2 = 23 Parameters: (5 × 5 × 96) × 256 = 0.6M
23 11 9 7 dense 3 3 3 3 5 3 dense dense 3 3 3 3 3 2 5 2 7 9 256 11 256 MaxPooling 23 384 Convolution 384 Convolution 256 Convolution MaxPooling 1000 256 Convolution 4096 4096
Input: 27 × 27 × 96 Conv1: K = 256,F = 5 S = 1,P = 0
Output:W2 =?,H2 =? Parameters: ?
227 55 27 11 3 5 3 5 11
27 55 227
96 MaxPooling 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 552711 9752,H2 ==? 5527119752 Parameters:Parameters:Parameters: (3 (3Parameters: (11××3×3××11384)256)384)× 3)× 0?××38438425696 = = 1 34.0327.8KMM
TotalInput:Input: Parameters: 11227FC1 97 ××1197227×× 27384256×.553 M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,384,256,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 3×256 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: (5 × 5 × 96) × 256 = 0.6M
11 9 7 dense 3 3 3 3 5 3 dense dense 3 3 3 3 3 2 5 2 7 9 256 11 256 MaxPooling 384 Convolution 384 Convolution 256 Convolution MaxPooling 1000
4096 4096
Input: 27 × 27 × 96 Conv1: K = 256,F = 5 S = 1,P = 0
Output:W2 = 23,H2 = 23 Parameters: ?
227 55 27 23 11 3 5 3 5 11
27 23 55 227
96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 552711 9752,H2 ==? 5527119752 Parameters:Parameters:Parameters: (3 (3Parameters: (11××3×3××11384)256)384)× 3)× 0?××38438425696 = = 1 34.0327.8KMM
TotalInput:Input: Parameters: 11227FC1 97 ××1197227×× 27384256×.553 M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,384,256,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 3×256 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: ?
11 9 7 dense 3 3 3 3 5 3 dense dense 3 3 3 3 3 2 5 2 7 9 256 11 256 MaxPooling 384 Convolution 384 Convolution 256 Convolution MaxPooling 1000
4096 4096
Input: 27 × 27 × 96 Conv1: K = 256,F = 5 S = 1,P = 0
Output:W2 = 23,H2 = 23 Parameters: (5 × 5 × 96) × 256 = 0.6M
227 55 27 23 11 3 5 3 5 11
27 23 55 227
96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 552723 9752,H2 ==? 5527239752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 97 ×××119727227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxConv1: Pool Pool (2 Input:× Input:K 4096 40962=× 96,256,384,×256)× 55 54096F1000××F×=555=×4096 =11 = 53×256 16 496MM = 4M FS = 4,1,3,PS = 02
Output:W2 = 11,H2 = 11 Parameters: 0
11 9 7 dense 3 3 3 5 3 dense dense 3 3 3 3 2 5 2 7 9 256 11 256 MaxPooling 384 Convolution 384 Convolution 256 Convolution MaxPooling 1000
4096 4096
Max Pool Input: 23 × 23 × 256 F = 3,S = 2
Output:W2 =?,H2 =? Parameters: ?
227 55 27 23 11 5 3 3 3 3 5 11
27 23 55 227
96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 552723 9752,H2 ==? 5527239752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 97 ×××119727227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxConv1: Pool Pool (2 Input:× Input:K 4096 40962=× 96,256,384,×256)× 55 54096F1000××F×=555=×4096 =11 = 53×256 16 496MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: 0
9 7 dense 3 3 3 5 3 dense dense 3 3 3 3 2 5 2 7 9 256 256 MaxPooling 384 Convolution 384 Convolution Convolution 1000
4096 4096
Max Pool Input: 23 × 23 × 256 F = 3,S = 2
Output:W2 = 11,H2 = 11 Parameters: ?
227 55 27 23 11 11 5 3 3 3 3 5 11
11 27 23 55 227 256 MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 552723 9752,H2 ==? 5527239752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 97 ×××119727227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxConv1: Pool Pool (2 Input:× Input:K 4096 40962=× 96,256,384,×256)× 55 54096F1000××F×=555=×4096 =11 = 53×256 16 496MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: ?
9 7 dense 3 3 3 5 3 dense dense 3 3 3 3 2 5 2 7 9 256 256 MaxPooling 384 Convolution 384 Convolution Convolution 1000
4096 4096
Max Pool Input: 23 × 23 × 256 F = 3,S = 2
Output:W2 = 11,H2 = 11 Parameters: 0
227 55 27 23 11 11 5 3 3 3 3 5 11
11 27 23 55 227 256 MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 752,H2 ==? 55272311752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)384)96)× 3)×× 0?××38425625696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 227 27FC1 97 ×××9727227×× 27384×96.553 M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 53×256 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 = 9,H2 = 9 Parameters: (3 × 3 × 256) × 384 = 0.8M
9 7 dense 3 3 5 3 dense dense 3 3 3 2 5 2 7 9 256 256 MaxPooling 384 Convolution 384 Convolution Convolution 1000
4096 4096
Input: 11 × 11 × 256 Conv1: K = 384,F = 3 S = 1,P = 0
Output:W2 =?,H2 =? Parameters: ?
227 55 27 23 11 11 3 5 3 3 3 3 5 3 11
11 27 23 55 227 256 MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 752,H2 ==? 55272311752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)384)96)× 3)×× 0?××38425625696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 227 27FC1 97 ×××9727227×× 27384×96.553 M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 53×256 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: (3 × 3 × 256) × 384 = 0.8M
7 dense 3 3 5 3 dense dense 3 3 3 2 5 2 7 256 256 MaxPooling 384 Convolution Convolution
1000
4096 4096
Input: 11 × 11 × 256 Conv1: K = 384,F = 3 S = 1,P = 0
Output:W2 = 9,H2 = 9 Parameters: ?
227 55 27 23 11 11 9 3 5 3 3 3 3 5 3 11
11 9 27 23 55 384 256 227 Convolution MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 752,H2 ==? 55272311752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)384)96)× 3)×× 0?××38425625696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 227 27FC1 97 ×××9727227×× 27384×96.553 M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 53×256 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: ?
7 dense 3 3 5 3 dense dense 3 3 3 2 5 2 7 256 256 MaxPooling 384 Convolution Convolution
1000
4096 4096
Input: 11 × 11 × 256 Conv1: K = 384,F = 3 S = 1,P = 0
Output:W2 = 9,H2 = 9 Parameters: (3 × 3 × 256) × 384 = 0.8M
227 55 27 23 11 11 9 3 5 3 3 3 3 5 3 11
11 9 27 23 55 384 256 227 Convolution MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 952,H2 ==? 55272311952 Parameters:Parameters:Parameters: (3Parameters: (5 (11×××35×11×256)384)96)× 3)× 0?××25638425696 = = 0 340..68KMM
TotalInput:Input:Input: Parameters: 11227 27FC1 7 ×××11727227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 53×256 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 = 7,H2 = 7 Parameters: (3 × 3 × 384) × 384 = 1.327M
7 dense 3 5 3 dense dense 3 3 2 5 2 7 256 256 MaxPooling 384 Convolution Convolution
1000
4096 4096
Input: 9 × 9 × 384 Conv1: K = 384,F = 3 S = 1,P = 0
Output:W2 =?,H2 =? Parameters: ?
227 55 27 23 11 11 9 3 5 3 3 3 3 3 5 3 3 11
11 9 27 23 55 384 256 227 Convolution MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 952,H2 ==? 55272311952 Parameters:Parameters:Parameters: (3Parameters: (5 (11×××35×11×256)384)96)× 3)× 0?××25638425696 = = 0 340..68KMM
TotalInput:Input:Input: Parameters: 11227 27FC1 7 ×××11727227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 53×256 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: (3 × 3 × 384) × 384 = 1.327M
dense 3 5 3 dense dense 3 3 2 5 2 256 256 MaxPooling Convolution
1000
4096 4096
Input: 9 × 9 × 384 Conv1: K = 384,F = 3 S = 1,P = 0
Output:W2 = 7,H2 = 7 Parameters: ?
227 55 27 23 11 11 9 3 5 7 3 3 3 3 3 5 3 3 11 7 11 9 27 23 384 55 384 Convolution 256 227 Convolution MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 952,H2 ==? 55272311952 Parameters:Parameters:Parameters: (3Parameters: (5 (11×××35×11×256)384)96)× 3)× 0?××25638425696 = = 0 340..68KMM
TotalInput:Input:Input: Parameters: 11227 27FC1 7 ×××11727227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 53×256 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: ?
dense 3 5 3 dense dense 3 3 2 5 2 256 256 MaxPooling Convolution
1000
4096 4096
Input: 9 × 9 × 384 Conv1: K = 384,F = 3 S = 1,P = 0
Output:W2 = 7,H2 = 7 Parameters: (3 × 3 × 384) × 384 = 1.327M
227 55 27 23 11 11 9 3 5 7 3 3 3 3 3 5 3 3 11 7 11 9 27 23 384 55 384 Convolution 256 227 Convolution MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 972,H2 ==? 55272311972 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)96)× 3)×× 0?××38425638496 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 9 ×××11927227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 53×256 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 = 5,H2 = 5 Parameters: (3 × 3 × 384) × 256 = 0.8M
5 dense 3 dense dense 3 2 5 2 256 256 MaxPooling Convolution
1000
4096 4096
Input: 7 × 7 × 384 Conv1: K = 256,F = 3 S = 1,P = 0
Output:W2 =?,H2 =? Parameters: ?
227 55 27 23 11 11 9 3 5 7 3 3 3 3 3 3 5 3 3 3 11 7 11 9 27 23 384 55 384 Convolution 256 227 Convolution MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 972,H2 ==? 55272311972 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)96)× 3)×× 0?××38425638496 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 9 ×××11927227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 53×256 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: (3 × 3 × 384) × 256 = 0.8M
dense 3 dense dense 3 2 2 256 MaxPooling
1000
4096 4096
Input: 7 × 7 × 384 Conv1: K = 256,F = 3 S = 1,P = 0
Output:W2 = 5,H2 = 5 Parameters: ?
227 55 27 23 11 11 9 3 5 7 3 3 3 3 5 3 3 5 3 3 3 11 5 7 11 9 256 27 23 384 Convolution 55 384 Convolution 256 227 Convolution MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 972,H2 ==? 55272311972 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)96)× 3)×× 0?××38425638496 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 9 ×××11927227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 53×256 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: ?
dense 3 dense dense 3 2 2 256 MaxPooling
1000
4096 4096
Input: 7 × 7 × 384 Conv1: K = 256,F = 3 S = 1,P = 0
Output:W2 = 5,H2 = 5 Parameters: (3 × 3 × 384) × 256 = 0.8M
227 55 27 23 11 11 9 3 5 7 3 3 3 3 5 3 3 5 3 3 3 11 5 7 11 9 256 27 23 384 Convolution 55 384 Convolution 256 227 Convolution MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 975,H2 ==? 55272311975 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 97 ×××119727227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxConv1: Pool Pool (2 Input: Input:×K 4096 40962=× 96,256,384,×256) 23× 554096F1000××F×=2355=4096 =11 =× 53× 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 = 2,H2 = 2 Parameters: 0
dense 2 dense dense 2 256 MaxPooling
1000
4096 4096
Max Pool Input: 5 × 5 × 256 F = 3,S = 2
Output:W2 =?,H2 =? Parameters: ?
227 55 27 23 11 11 9 5 7 3 5 3 3 3 3 3 3 3 5 3 3 3 3 11 5 7 11 9 256 27 23 384 Convolution 55 384 Convolution 256 227 Convolution MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 975,H2 ==? 55272311975 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 97 ×××119727227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxConv1: Pool Pool (2 Input: Input:×K 4096 40962=× 96,256,384,×256) 23× 554096F1000××F×=2355=4096 =11 =× 53× 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: 0
dense dense dense
1000
4096 4096
Max Pool Input: 5 × 5 × 256 F = 3,S = 2
Output:W2 = 2,H2 = 2 Parameters: ?
227 55 27 23 11 11 9 5 7 3 5 3 3 3 3 3 3 3 2 5 3 3 3 3 11 5 2 7 9 256 11 256 MaxPooling 27 23 384 Convolution 55 384 Convolution 256 227 Convolution MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 975,H2 ==? 55272311975 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 97 ×××119727227××× 27384256×96.553 M Parameters:Parameters:Parameters:MaxMaxConv1: Pool Pool (2 Input: Input:×K 4096 40962=× 96,256,384,×256) 23× 554096F1000××F×=2355=4096 =11 =× 53× 16 425696MM = 4M FS = 4,1,3,PS = 02
Output:W2 =?,H2 =? Parameters: ?
dense dense dense
1000
4096 4096
Max Pool Input: 5 × 5 × 256 F = 3,S = 2
Output:W2 = 2,H2 = 2 Parameters: 0
227 55 27 23 11 11 9 5 7 3 5 3 3 3 3 3 3 3 2 5 3 3 3 3 11 5 2 7 9 256 11 256 MaxPooling 27 23 384 Convolution 55 384 Convolution 256 227 Convolution MaxPooling 96 256 MaxPooling Convolution 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 9752,H2 ==? 552723119752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 97 ×××119727227××× 27384256×96.553 M Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool Input: Input: Input:K 4096 4096= 96,256,384,× 23× 55 54096F1000××F=52355=× =11 =× 53×256 16 425696MM FS = 4,1,3,PS = 02
dense dense
1000
4096
FC1 Parameters: (2 × 2 × 256) × 4096 = 4M
227 55 27 23 11 11 9 5 7 3 5 dense 3 3 3 3 3 3 3 2 5 3 3 3 3 11 5 2 7 9 256 11 256 MaxPooling 27 23 384 Convolution 55 384 Convolution 256 227 Convolution MaxPooling 96 256 MaxPooling Convolution 4096 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 9752,H2 ==? 552723119752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 97 ×××119727227××× 27384256×96.553 M Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 40962=× 96,256,384,256) 23× 55 5F1000××F×=52355=×4096 11 =× 53×256 425696M = 4M FS = 4,1,3,PS = 02
dense
1000
FC1 Parameters: 4096 × 4096 = 16M
227 55 27 23 11 11 9 5 7 3 5 dense 3 3 3 3 3 3 3 2 dense 5 3 3 3 3 11 5 2 7 9 256 11 256 MaxPooling 27 23 384 Convolution 55 384 Convolution 256 227 Convolution MaxPooling 96 256 MaxPooling Convolution 4096 4096 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 9752,H2 ==? 552723119752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
TotalInput:Input:Input: Parameters: 11227 27FC1 97 ×××119727227××× 27384256×96.553 M Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 40962=× 96,256,384,×256) 23 55 54096F××F×=52355=×4096 =11× 53×256 1625696M = 4M FS = 4,1,3,PS = 02
FC1 Parameters: 4096 × 1000 = 4M
227 55 27 23 11 11 9 5 7 3 5 dense 3 3 3 3 3 3 3 2 dense dense 5 3 3 3 3 11 5 2 7 9 256 11 256 MaxPooling 27 23 384 Convolution 55 384 Convolution 256 227 Convolution MaxPooling 1000 96 256 MaxPooling Convolution 4096 4096 96 Convolution 3 Input
49/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Output:Output:Output:WWW2 2=2==? 55272311 9752,H2 ==? 552723119752 Parameters:Parameters:Parameters:Parameters: (3 (3Parameters: (5 (11×××3×3×5×11×384)256)384)96)× 3)×× 0?××38425638425696 = = = 1 0 34.0.327.68KMMM
Input:Input:Input: 11227 27FC1 97 ×××119727227×××384256×963 Parameters:Parameters:Parameters:MaxMaxMaxConv1: Pool Pool Pool (2 Input: Input:× Input:K 4096 40962=× 96,256,384,×256) 23× 55 54096F1000××F×=52355=×4096 =11 =× 53×256 16 425696MM = 4M FS = 4,1,3,PS = 02
Total Parameters: 27.55M
227 55 27 23 11 11 9 5 7 3 5 dense 3 3 3 3 3 3 3 2 dense dense 5 3 3 3 3 11 5 2 7 9 256 11 256 MaxPooling 27 23 384 Convolution 55 384 Convolution 256 227 Convolution MaxPooling 1000 96 256 MaxPooling Convolution 4096 4096 96 Convolution 3 Input
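This bookkeeping is mechanical, so it is easy to re-derive with a short script. Below is a minimal Python sketch (ours, not the lecture's code) that reproduces the output sizes and parameter counts above, ignoring biases as the slide does. Note that the 27.55M total is the sum of the slide's rounded per-layer figures; the exact count comes to roughly 28.8M.

```python
# Re-derive AlexNet sizes via W2 = (W1 - F + 2P)/S + 1 and
# parameter counts via (F * F * D) * K.

def out_size(w, f, s, p=0):
    return (w - f + 2 * p) // s + 1

# (name, kind, K filters, F kernel, S stride, P padding)
layers = [
    ("Conv1",    "conv",  96, 11, 4, 0),
    ("MaxPool1", "pool", None, 3, 2, 0),
    ("Conv2",    "conv", 256,  5, 1, 0),
    ("MaxPool2", "pool", None, 3, 2, 0),
    ("Conv3",    "conv", 384,  3, 1, 0),
    ("Conv4",    "conv", 384,  3, 1, 0),
    ("Conv5",    "conv", 256,  3, 1, 0),
    ("MaxPool3", "pool", None, 3, 2, 0),
]

w, d, total = 227, 3, 0
for name, kind, k, f, s, p in layers:
    w = out_size(w, f, s, p)
    params = (f * f * d) * k if kind == "conv" else 0
    if kind == "conv":
        d = k                       # pooling keeps the depth unchanged
    total += params
    print(f"{name}: output {w}x{w}x{d}, params {params:,}")

for name, fan_in, fan_out in [("FC1", w * w * d, 4096),
                              ("FC2", 4096, 4096),
                              ("FC3", 4096, 1000)]:
    total += fan_in * fan_out
    print(f"{name}: params {fan_in * fan_out:,}")

# Exact total is ~28.8M; the slide's 27.55M sums its rounded
# per-layer figures (0.6M, 0.8M, 4M, 16M, ...).
print(f"Total: {total / 1e6:.2f}M")
```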
Let us look at the connections in the fully connected layers in more detail. We will first stretch out the last conv or maxpool layer to make it a 1d vector ("make linear"): here 2 × 2 × 256 = 1024. This 1d vector is then densely connected to the other layers (4096 → 4096 → 1000), just as in a regular feedforward neural network.
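A minimal sketch of this "make linear" step (our own illustration; the weights here are random placeholders, and the shapes match the AlexNet layer above):

```python
import numpy as np

# Flatten the 2x2x256 maxpool output into a 1024-vector, then apply a
# dense layer, exactly as in a regular feedforward network.
pooled = np.random.randn(2, 2, 256)     # last maxpool output
flat = pooled.reshape(-1)               # 2 * 2 * 256 = 1024
W1 = np.random.randn(4096, 1024)        # FC1 weight matrix: ~4M parameters
h = np.maximum(0.0, W1 @ flat)          # densely connected layer (ReLU)
print(flat.shape, h.shape)              # (1024,) (4096,)
```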
Next on the roadmap: ZFNet.
ZFNet keeps the overall AlexNet architecture and changes only a few hyperparameters. Comparing the two, layer by layer:

[Figure: AlexNet (top) and ZFNet (bottom) drawn side by side; the pipelines are identical except for the first-layer filter size (11 → 7) and the widths of convolution layers 5 to 7 (384/384/256 in AlexNet vs 512/1024/512 in ZFNet).]

Layer 1: F = 11 → 7; difference in parameters: ((11² − 7²) × 3) × 96 = 20.7K
Layer 2: no difference
Layer 3: no difference
Layer 4: no difference
Layer 5: K = 384 → 512; difference in parameters: (3 × 3 × 256) × (512 − 384) = 0.29M
Layer 6: K = 384 → 1024; difference in parameters: (3 × 3) × ((512 × 1024) − (384 × 384)) = 0.8M
Layer 7: K = 256 → 512; difference in parameters: (3 × 3) × ((1024 × 512) − (384 × 256)) = 0.36M
Layer 8: no difference
Layer 9: no difference
Layer 10: no difference

Difference in total no. of parameters: 1.45M
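The first two non-trivial differences are easy to sanity-check (a quick sketch of our own, not the lecture's code):

```python
# Layer 1 shrinks each filter from 11x11 to 7x7 over a depth-3 input;
# layer 5 adds output channels (384 -> 512) of size 3x3x256.
layer1_diff = ((11**2 - 7**2) * 3) * 96
layer5_diff = (3 * 3 * 256) * (512 - 384)
print(layer1_diff)  # 20736  (~20.7K)
print(layer5_diff)  # 294912 (~0.29M)
```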
Next on the roadmap: VGGNet.
[Figure: the VGGNet pipeline. Blocks of convolutions with 64, 128, 256, 512 and 512 channels, each block followed by maxpooling, shrink the 224 × 224 × 3 input through 112, 56, 28 and 14 down to 7 × 7 × 512, which feeds fc 4096 → fc 4096 → fc 1000 → softmax.]

The kernel size is 3 × 3 throughout.
Total parameters in the FC layers = (512 × 7 × 7 × 4096) + (4096 × 4096) + (4096 × 1024) ≈ 122M
Total parameters in the non-FC layers ≈ 16M
Most parameters are thus in the first FC layer (∼ 102M).
4096 4096
54/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Total Parameters in FC layers = (512 × 7 × 7 × 4096) + (4096 × 4096) + (4096 × 1024) = ∼ 122M
Total parameters in non FC layers = ∼ 16M
Most parameters are in the first FC layer (∼ 102M)
softmax
224 224 112 112 56 56 28 28 14 14 7 7 28 14 14 28 56 56 512 112 112
224 224 512 512 256 512 128 256 maxpool Conv maxpool Conv maxpool 64 128 maxpool Conv 64 maxpool Conv 1000 Input Conv fc fc
4096 4096
Kernel size is 3 × 3 throughout
54/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Total Parameters in FC layers = (512 × 7 × 7 × 4096) + (4096 × 4096) + (4096 × 1024) = ∼ 122M
Most parameters are in the first FC layer (∼ 102M)
softmax
224 224 112 112 56 56 28 28 14 14 7 7 28 14 14 28 56 56 512 112 112
224 224 512 512 256 512 128 256 maxpool Conv maxpool Conv maxpool 64 128 maxpool Conv 64 maxpool Conv 1000 Input Conv fc fc
4096 4096
Kernel size is 3 × 3 throughout Total parameters in non FC layers = ∼ 16M
54/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 (512 × 7 × 7 × 4096) + (4096 × 4096) + (4096 × 1024) = ∼ 122M Most parameters are in the first FC layer (∼ 102M)
softmax
224 224 112 112 56 56 28 28 14 14 7 7 28 14 14 28 56 56 512 112 112
224 224 512 512 256 512 128 256 maxpool Conv maxpool Conv maxpool 64 128 maxpool Conv 64 maxpool Conv 1000 Input Conv fc fc
4096 4096
Kernel size is 3 × 3 throughout Total parameters in non FC layers = ∼ 16M Total Parameters in FC layers =
54/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 + (4096 × 4096) + (4096 × 1024) = ∼ 122M Most parameters are in the first FC layer (∼ 102M)
softmax
224 224 112 112 56 56 28 28 14 14 7 7 28 14 14 28 56 56 512 112 112
224 224 512 512 256 512 128 256 maxpool Conv maxpool Conv maxpool 64 128 maxpool Conv 64 maxpool Conv 1000 Input Conv fc fc
4096 4096
Kernel size is 3 × 3 throughout Total parameters in non FC layers = ∼ 16M Total Parameters in FC layers = (512 × 7 × 7 × 4096)
54/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 + (4096 × 1024) = ∼ 122M Most parameters are in the first FC layer (∼ 102M)
softmax
224 224 112 112 56 56 28 28 14 14 7 7 28 14 14 28 56 56 512 112 112
224 224 512 512 256 512 128 256 maxpool Conv maxpool Conv maxpool 64 128 maxpool Conv 64 maxpool Conv 1000 Input Conv fc fc
4096 4096
Kernel size is 3 × 3 throughout Total parameters in non FC layers = ∼ 16M Total Parameters in FC layers = (512 × 7 × 7 × 4096) + (4096 × 4096)
54/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 = ∼ 122M Most parameters are in the first FC layer (∼ 102M)
softmax
224 224 112 112 56 56 28 28 14 14 7 7 28 14 14 28 56 56 512 112 112
224 224 512 512 256 512 128 256 maxpool Conv maxpool Conv maxpool 64 128 maxpool Conv 64 maxpool Conv 1000 Input Conv fc fc
4096 4096
Kernel size is 3 × 3 throughout Total parameters in non FC layers = ∼ 16M Total Parameters in FC layers = (512 × 7 × 7 × 4096) + (4096 × 4096) + (4096 × 1024)
54/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Most parameters are in the first FC layer (∼ 102M)
softmax
224 224 112 112 56 56 28 28 14 14 7 7 28 14 14 28 56 56 512 112 112
224 224 512 512 256 512 128 256 maxpool Conv maxpool Conv maxpool 64 128 maxpool Conv 64 maxpool Conv 1000 Input Conv fc fc
4096 4096
Kernel size is 3 × 3 throughout Total parameters in non FC layers = ∼ 16M Total Parameters in FC layers = (512 × 7 × 7 × 4096) + (4096 × 4096) + (4096 × 1024) = ∼ 122M
54/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 softmax
224 224 112 112 56 56 28 28 14 14 7 7 28 14 14 28 56 56 512 112 112
224 224 512 512 256 512 128 256 maxpool Conv maxpool Conv maxpool 64 128 maxpool Conv 64 maxpool Conv 1000 Input Conv fc fc
4096 4096
Kernel size is 3 × 3 throughout Total parameters in non FC layers = ∼ 16M Total Parameters in FC layers = (512 × 7 × 7 × 4096) + (4096 × 4096) + (4096 × 1024) = ∼ 122M Most parameters are in the first FC layer (∼ 102M)
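To make these counts concrete, here is a minimal Python sketch. It assumes the standard VGG-16 configuration (two 3 × 3 convs at depths 64 and 128, three each at 256, 512 and 512; the drawing above collapses each block into a single Conv box) and ignores biases, so the conv total comes out slightly below the ∼ 16M quoted above.

```python
# Minimal sketch: parameter counts for VGG-16 (weights only, biases ignored).
# Assumes the standard 13-conv configuration; the slide's drawing collapses
# each block into a single Conv box.

conv_cfg = [(3, 64), (64, 64),                        # block 1
            (64, 128), (128, 128),                    # block 2
            (128, 256), (256, 256), (256, 256),       # block 3
            (256, 512), (512, 512), (512, 512),       # block 4
            (512, 512), (512, 512), (512, 512)]       # block 5

conv_params = sum(3 * 3 * c_in * c_out for c_in, c_out in conv_cfg)

fc_params = (512 * 7 * 7 * 4096    # flatten -> fc1 (the expensive one)
             + 4096 * 4096         # fc1 -> fc2
             + 4096 * 1000)        # fc2 -> 1000-way output

print(f"conv params: {conv_params / 1e6:.1f}M")          # ~14.7M
print(f"fc params:   {fc_params / 1e6:.1f}M")            # ~123.6M
print(f"first fc:    {512 * 7 * 7 * 4096 / 1e6:.1f}M")   # ~102.8M
```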
Module 11.5 : Image Classification continued (GoogLeNet and ResNet)
55/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
Consider the output (a W × H × D volume) at a certain layer of a convolutional neural network
After this layer we could apply a maxpooling layer
Or a 1 × 1 convolution
Or a 3 × 3 convolution
Or a 5 × 5 convolution
Question: Why choose between these options (convolution, maxpooling, filter sizes)?
Idea: Why not apply all of them at the same time and then concatenate the feature maps?
[Figure: the W × H × D input followed, as alternatives, by f × f max pooling, a 1 × 1 convolution, a 3 × 3 convolution and a 5 × 5 convolution, each producing its own output volume]
56/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
Well, this naive idea could result in a large number of computations
If P = 0 & S = 1 then convolving a W × H × D input with an F × F × D filter results in a (W − F + 1) × (H − F + 1) sized output
Each element of the output requires O(F × F × D) computations
Can we reduce the number of computations?
57/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
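To make this concrete, a minimal Python sketch with illustrative sizes (multiply-accumulates only; the numbers are assumptions, not from the lecture):

```python
# Cost of convolving a W x H x D input with K filters of size F x F x D
# (padding P = 0, stride S = 1). Multiply-accumulates only.

def conv_cost(W, H, D, F, K):
    out_w, out_h = W - F + 1, H - F + 1   # output spatial size
    per_element = F * F * D               # work per output element
    return out_w * out_h * K * per_element

# e.g. a 5x5 convolution with 64 filters on a 28 x 28 x 256 input
print(f"{conv_cost(28, 28, 256, 5, 64):,} multiply-adds")  # ~236M
```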
57/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 Huh?? What does a 1×1 convolution
H H do ? It aggregates along the depth
1 So convolving a D×W ×H input with 1 D1 1×1 (D1 < D) filters will result in D a D1 × W × H output (S = 1,P = 0)
If D1 < D then this effectively re- W W duces the dimension of the input and hence the computations Specifically instead of O(F × F × D)
D D11 we will need O(F × F × D1) compu- tations We could then apply subsequent 3×3, 5 × 5 filter on this reduced output
Yes, by using 1 × 1 convolutions
58/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 H It aggregates along the depth
1 So convolving a D×W ×H input with 1 D1 1×1 (D1 < D) filters will result in D a D1 × W × H output (S = 1,P = 0)
If D1 < D then this effectively re- W duces the dimension of the input and hence the computations Specifically instead of O(F × F × D)
D11 we will need O(F × F × D1) compu- tations We could then apply subsequent 3×3, 5 × 5 filter on this reduced output
Yes, by using 1 × 1 convolutions Huh?? What does a 1×1 convolution
H do ?
W
D
58/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 H
So convolving a D×W ×H input with D1 1×1 (D1 < D) filters will result in a D1 × W × H output (S = 1,P = 0)
If D1 < D then this effectively re- W duces the dimension of the input and hence the computations Specifically instead of O(F × F × D)
D1 we will need O(F × F × D1) compu- tations We could then apply subsequent 3×3, 5 × 5 filter on this reduced output
Yes, by using 1 × 1 convolutions Huh?? What does a 1×1 convolution
H H do ? It aggregates along the depth
1
1
D
W W
D 1
58/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 H
If D1 < D then this effectively re- W duces the dimension of the input and hence the computations Specifically instead of O(F × F × D) 1 we will need O(F × F × D1) compu- tations We could then apply subsequent 3×3, 5 × 5 filter on this reduced output
Yes, by using 1 × 1 convolutions Huh?? What does a 1×1 convolution
H H do ? It aggregates along the depth
1 So convolving a D×W ×H input with 1 D1 1×1 (D1 < D) filters will result in D a D1 × W × H output (S = 1,P = 0)
W W
D D1
58/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 H
W
Specifically instead of O(F × F × D) 1 we will need O(F × F × D1) compu- tations We could then apply subsequent 3×3, 5 × 5 filter on this reduced output
Yes, by using 1 × 1 convolutions Huh?? What does a 1×1 convolution
H H do ? It aggregates along the depth
1 So convolving a D×W ×H input with 1 D1 1×1 (D1 < D) filters will result in D a D1 × W × H output (S = 1,P = 0)
If D1 < D then this effectively re- W W duces the dimension of the input and hence the computations
D D1
58/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 H
W
1
We could then apply subsequent 3×3, 5 × 5 filter on this reduced output
Yes, by using 1 × 1 convolutions Huh?? What does a 1×1 convolution
H H do ? It aggregates along the depth
1 So convolving a D×W ×H input with 1 D1 1×1 (D1 < D) filters will result in D a D1 × W × H output (S = 1,P = 0)
If D1 < D then this effectively re- W W duces the dimension of the input and hence the computations Specifically instead of O(F × F × D)
D D1 we will need O(F × F × D1) compu- tations
58/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11 H
W
1
Yes, by using 1 × 1 convolutions Huh?? What does a 1×1 convolution
H H do ? It aggregates along the depth
1 So convolving a D×W ×H input with 1 D1 1×1 (D1 < D) filters will result in D a D1 × W × H output (S = 1,P = 0)
If D1 < D then this effectively re- W W duces the dimension of the input and hence the computations Specifically instead of O(F × F × D)
D D1 we will need O(F × F × D1) compu- tations We could then apply subsequent 3×3, 5 × 5 filter on this reduced output
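Continuing with illustrative sizes, a sketch of the savings from inserting a 1 × 1 reduction (depth 256 → 64, both values assumed for illustration) before a 5 × 5 convolution:

```python
# Savings from a 1x1 "bottleneck": reduce the depth from D = 256 to
# D1 = 64 with 1x1 convolutions, then apply the 5x5 convolution on the
# thinner input (P = 0, S = 1, multiply-adds only).

def conv_cost(W, H, D, F, K):
    return (W - F + 1) * (H - F + 1) * K * (F * F * D)

naive   = conv_cost(28, 28, 256, 5, 64)      # 5x5 directly on depth 256
reduced = (conv_cost(28, 28, 256, 1, 64)     # 1x1 reduction to depth 64 ...
           + conv_cost(28, 28, 64, 5, 64))   # ... then 5x5 on the reduced input

print(f"naive:   {naive:,}")    # ~236M multiply-adds
print(f"reduced: {reduced:,}")  # ~72M multiply-adds
```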
But we might want to use different dimensionality reductions before the 3 × 3 and 5 × 5 filters
So we can use D1 and D2 1 × 1 filters before the 3 × 3 and 5 × 5 filters respectively
We can then add the maxpooling layer, followed by its own dimensionality reduction (1 × 1 convolutions)
And a new, standalone set of 1 × 1 convolutions
And finally we concatenate all these layers
This is called the Inception module
We will now see GoogLeNet, which contains many such inception modules
[Figure: a 28 × 28 × 256 input feeds four parallel branches (1 × 1 convolutions; 1 × 1 convolutions for dimensionality reduction followed by 3 × 3 convolutions on the reduced input; 1 × 1 convolutions for dimensionality reduction followed by 5 × 5 convolutions on the reduced input; 3 × 3 maxpooling followed by 1 × 1 convolutions), whose outputs meet in a filter concatenation]
59/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
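A minimal PyTorch sketch of such a module, assuming all branches keep the input's spatial size (via padding) so their outputs can be concatenated along the depth; the branch widths below are the inception (3a) numbers that appear on the next slide, and the ReLUs after each convolution are omitted for brevity:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Four parallel branches concatenated along the depth (channel) axis."""
    def __init__(self, d_in, n1, n3r, n3, n5r, n5, pool_proj):
        super().__init__()
        self.b1 = nn.Conv2d(d_in, n1, kernel_size=1)              # 1x1
        self.b2 = nn.Sequential(                                  # 1x1 reduce -> 3x3
            nn.Conv2d(d_in, n3r, kernel_size=1),
            nn.Conv2d(n3r, n3, kernel_size=3, padding=1))
        self.b3 = nn.Sequential(                                  # 1x1 reduce -> 5x5
            nn.Conv2d(d_in, n5r, kernel_size=1),
            nn.Conv2d(n5r, n5, kernel_size=5, padding=2))
        self.b4 = nn.Sequential(                                  # 3x3 maxpool -> 1x1
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(d_in, pool_proj, kernel_size=1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# inception (3a): 192 -> 64 + 128 + 32 + 32 = 256 channels, spatial size kept
m = InceptionModule(192, n1=64, n3r=96, n3=128, n5r=16, n5=32, pool_proj=32)
print(m(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```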
GoogLeNet
[Architecture: Input (229 × 229) → Conv (64, 112 × 112) → maxpool (56 × 56) → Conv (192, 56 × 56) → maxpool (28 × 28) → Inception 3a → Inception 3b (28 × 28) → maxpool (14 × 14) → Inception 4a → 4b → 4c → 4d → 4e (14 × 14) → maxpool (7 × 7) → Inception 5a → 5b (7 × 7) → avgpool (1 × 1 × 1024) → dropout (40%) → fc (1000) → softmax (1000)]

The branch widths of each inception module (from the per-module boxes on the slide):

module  input depth  #1×1  #3×3 reduce  #3×3  #5×5 reduce  #5×5  pool proj  output depth
3a              192    64           96   128           16    32         32           256
3b              256   128          128   192           32    96         64           480
4a              480   192           96   208           16    48         64           512
4b              512   160          112   224           24    64         64           512
4c              512   128          128   256           24    64         64           512
4d              512   112          144   288           32    64         64           528
4e              528   256          160   320           32   128        128           832
5a              832   256          160   320           32   128        128           832
5b              832   384          192   384           48   128        128          1024

60/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
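Since the concatenation happens along the depth, the output depth of every module is simply the sum of its four branch widths; a tiny sanity check of the table above:

```python
# Output depth of an inception module = #1x1 + #3x3 + #5x5 + pool proj.
# Rows: (module, #1x1, #3x3, #5x5, pool proj, expected output depth).
modules = [("3a",  64, 128,  32,  32,  256), ("3b", 128, 192,  96,  64,  480),
           ("4a", 192, 208,  48,  64,  512), ("4b", 160, 224,  64,  64,  512),
           ("4c", 128, 256,  64,  64,  512), ("4d", 112, 288,  64,  64,  528),
           ("4e", 256, 320, 128, 128,  832), ("5a", 256, 320, 128, 128,  832),
           ("5b", 384, 384, 128, 128, 1024)]

for name, n1, n3, n5, proj, expected in modules:
    assert n1 + n3 + n5 + proj == expected, name
print("all module depths check out")
```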
Important Trick: Got rid of the fully connected layer
Notice that the output of the last inception layer is 7 × 7 × 1024 dimensional
What if we were to add a fully connected layer with 1000 nodes (for the 1000 classes) on top of this? We would flatten the output and need a weight matrix W ∈ R^{50176×1000}, i.e., 7 × 7 × 1024 × 1000 = ∼ 50M parameters
Instead they use an average pooling of size 7 × 7 on each of the 1024 feature maps (pick the average of each map)
This results in a 1024 dimensional output, so the final layer only needs W ∈ R^{1024×1000}
This significantly reduces the number of parameters
Overall, GoogLeNet has 12× fewer parameters than AlexNet (at the cost of 2× more computations)
61/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
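A small PyTorch sketch of this trick, with illustrative tensors (the bias terms make the printed counts slightly larger than the weight-only numbers above):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1024, 7, 7)            # output of the last inception module

# Option 1: flatten + fully connected -> 50,176,000 weights
fc_big = nn.Linear(1024 * 7 * 7, 1000)

# Option 2: 7x7 average pooling, then a small fully connected layer
gap = nn.AvgPool2d(kernel_size=7)         # (1, 1024, 7, 7) -> (1, 1024, 1, 1)
fc_small = nn.Linear(1024, 1000)          # only 1,024,000 weights

out = fc_small(gap(x).flatten(1))
print(out.shape)                                       # torch.Size([1, 1000])
print(sum(p.numel() for p in fc_big.parameters()))     # 50,177,000 (incl. biases)
print(sum(p.numel() for p in fc_small.parameters()))   # 1,025,000 (incl. biases)
```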
GoogLeNet ResNet
62/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
Suppose we have been able to train a shallow neural network well
Now suppose we construct a deeper network which has a few more layers (in orange)
Intuitively, if the shallow network works well then the deep network should also work well by simply learning to compute identity functions in the new layers
Essentially, the solution space of a shallow neural network is a subset of the solution space of a deep neural network
63/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
But in practice it is observed that this doesn't happen: the deeper network has a higher error rate on the test set
64/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
Consider any two stacked layers in a CNN, each followed by a ReLU
The two layers are essentially learning some function H(x) of the input
What if we enable them to learn only a residual function of the input? Adding an identity connection from the input x, the stacked layers learn F(x) and the block computes H(x) = F(x) + x
65/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
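A minimal PyTorch sketch of such a residual block, assuming the input and output depths match so that x can be added directly (the batch normalization that ResNet inserts after every conv layer is omitted here; see the bag of tricks below):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two stacked conv layers that learn F(x); the block outputs F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))   # first layer + relu
        out = self.conv2(out)         # second layer: this is F(x)
        return F.relu(out + x)        # H(x) = F(x) + x, then relu

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```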
Why would this help?
Remember our argument that a deeper version of a shallow network would do just fine by learning identity transformations in the new layers
This identity connection from the input allows a ResNet to retain a copy of the input: to realize an identity transformation, the stacked layers only need to drive F(x) to zero, which is easier than learning H(x) = x from scratch
Using this idea they were able to train really deep networks
66/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
ResNet, 152 layers
1st place in all five main tracks:
ImageNet Classification: “Ultra-deep” 152-layer nets
ImageNet Detection: 16% better than the 2nd best system
ImageNet Localization: 27% better than the 2nd best system
COCO Detection: 11% better than the 2nd best system
COCO Segmentation: 12% better than the 2nd best system
67/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
ResNet, 152 layers
Bag of tricks:
Batch Normalization after every CONV layer
Xavier/2 initialization from [He et al.]
SGD + Momentum (0.9)
Learning rate: 0.1, divided by 10 when the validation error plateaus
Mini-batch size 256
Weight decay of 1e-5
No dropout used
68/68 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 11
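A sketch of this recipe in PyTorch, with a tiny stand-in model (the real network is the 152-layer ResNet); ReduceLROnPlateau plays the role of the divide-by-10-on-plateau rule, and kaiming_normal_ is PyTorch's name for the He et al. (Xavier/2) initialization:

```python
import torch
import torch.nn as nn

# Stand-in model; the real thing would be the 152-layer ResNet with
# BatchNorm after every conv layer.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64),
                      nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(64, 1000))

for m in model.modules():                       # "Xavier/2" = He initialization
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight)

optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,             # initial learning rate
                            momentum=0.9,       # SGD + momentum
                            weight_decay=1e-5)  # weight decay from the slide

# "divide the learning rate by 10 when the validation error plateaus"
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                       mode="min", factor=0.1)

# one illustrative step with random data (mini-batch size 256 in practice)
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 1000, (4,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
optimizer.step()
scheduler.step(loss.item())  # in practice: pass the validation error
```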