
Harmonic Analysis of Deep Convolutional Neural Networks

Helmut Bölcskei
Department of Information Technology and Electrical Engineering
October 2017

Joint work with Thomas Wiatowski and Philipp Grohs

ImageNet

CNNs win the ImageNet 2015 challenge [He et al., 2015]
(Example classes: ski, rock, plant, coffee.)

Describing the content of an image

CNNs generate sentences describing the content of an image [Vinyals et al., 2015], e.g.:
"Carlos Kleiber conducting the Vienna Philharmonic's New Year's Concert 1989."
Feature extraction and classification

input f → non-linear feature extraction → feature vector Φ(f) → linear classifier:
  ⟨w, Φ(f)⟩ > 0 ⇒ output "Shannon"
  ⟨w, Φ(f)⟩ < 0 ⇒ output "von Neumann"

Why non-linear feature extractors?

Task: Separate two categories of data through a linear classifier (think of two classes lying on concentric circles of different radii).

In the original space this is not possible:
  ⟨w, f⟩ > 0 vs. ⟨w, f⟩ < 0: not possible!

In feature space it is:
  ⟨w, Φ(f)⟩ > 0 vs. ⟨w, Φ(f)⟩ < 0: possible with w = (−1, 1)ᵀ and Φ(f) = (‖f‖, 1)ᵀ

⇒ Φ is invariant to the angular component of the data
⇒ Linear separability in feature space!
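The separability argument above can be checked numerically. The sketch below (with made-up data, not from the talk) places two classes on concentric circles, which no linear classifier ⟨w, f⟩ can split, and then separates them with w = (−1, 1)ᵀ after the radial feature map Φ(f) = (‖f‖, 1)ᵀ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two categories on concentric circles: radius 0.5 (class +1), radius 2 (class -1).
# No linear classifier <w, f> can separate them in the original space.
theta = rng.uniform(0, 2 * np.pi, 200)
f_inner = 0.5 * np.stack([np.cos(theta[:100]), np.sin(theta[:100])], axis=1)
f_outer = 2.0 * np.stack([np.cos(theta[100:]), np.sin(theta[100:])], axis=1)

# Non-linear feature map Phi(f) = (||f||, 1): discards the angular component.
def phi(f):
    return np.stack([np.linalg.norm(f, axis=1), np.ones(len(f))], axis=1)

# Linear classifier in feature space with w = (-1, 1):
# <w, Phi(f)> = 1 - ||f||  >  0 for the inner class, < 0 for the outer class.
w = np.array([-1.0, 1.0])
scores_inner = phi(f_inner) @ w
scores_outer = phi(f_outer) @ w
print(np.all(scores_inner > 0), np.all(scores_outer < 0))  # True True
```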
Translation invariance

Handwritten digits from the MNIST database [LeCun & Cortes, 1998]: the feature vector should be invariant to spatial location ⇒ translation invariance.

Deformation insensitivity

The feature vector should be independent of cameras (of different resolutions) and insensitive to small acquisition jitters.

Scattering networks ([Mallat, 2012], [Wiatowski and HB, 2015])

A cascade of convolutions and modulus non-linearities: the first layer computes |f ∗ g_λ1| for filters g_λ1, the second layer computes ||f ∗ g_λ1| ∗ g_λ2|, and so on. Each layer emits a feature map by convolving with an output filter χ_n; stacking the feature maps of all layers yields the feature vector Φ(f).

General scattering networks guarantee [Wiatowski & HB, 2015]
- (vertical) translation invariance
- small deformation sensitivity
essentially irrespective of filters, non-linearities, and poolings!

Building blocks

Basic operations in the n-th network layer: convolution with filters g_λn, a non-linearity, and pooling.

Filters: Semi-discrete frame Ψ_n := {χ_n} ∪ {g_λn}_{λn ∈ Λn}:

  A_n‖f‖₂² ≤ ‖f ∗ χ_n‖₂² + Σ_{λn ∈ Λn} ‖f ∗ g_λn‖₂² ≤ B_n‖f‖₂², ∀ f ∈ L²(ℝᵈ)

e.g., structured, unstructured, or learned filters.
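The cascade structure can be sketched in a few lines. The toy 1-D example below (hypothetical Gaussian/modulated-Gaussian filters, not Mallat's wavelets) propagates a signal through two layers of |· ∗ g| and emits a feature map per node via a low-pass output filter χ:

```python
import numpy as np

# Minimal 1-D scattering cascade: modulus of band-pass convolutions, feature
# maps emitted via a low-pass filter chi. Filters are illustrative assumptions.
N = 256
t = np.arange(N)

def gaussian(width):
    g = np.exp(-0.5 * ((t - N // 2) / width) ** 2)
    return g / g.sum()

chi = gaussian(8.0)                          # output (low-pass) filter
g_filters = [gaussian(2.0) * np.cos(2 * np.pi * k * t / N)
             for k in (8, 16)]               # crude band-pass filters (assumed)

def conv(x, h):                              # circular convolution via FFT
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))

f = np.sin(2 * np.pi * 8 * t / N)            # toy input signal

features = [conv(f, chi)]                    # layer-0 output: f * chi
layer = [f]
for _ in range(2):                           # two scattering layers
    nxt = []
    for u in layer:
        for g in g_filters:
            prop = np.abs(conv(u, g))        # |u * g|: propagate to next layer
            nxt.append(prop)
            features.append(conv(prop, chi)) # emit feature map via chi
    layer = nxt

Phi = np.concatenate(features)               # feature vector Phi(f)
print(Phi.shape)  # (1792,)
```

With 2 filters per layer, the tree has 1 + 2 + 4 = 7 output nodes, hence 7 · 256 = 1792 feature entries.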
Non-linearities: Point-wise and Lipschitz-continuous:

  ‖M_n(f) − M_n(h)‖₂ ≤ L_n‖f − h‖₂, ∀ f, h ∈ L²(ℝᵈ)

⇒ Satisfied by virtually all non-linearities used in the deep learning literature!
e.g., ReLU: L_n = 1; modulus: L_n = 1; logistic sigmoid: L_n = 1/4; ...

Pooling: In continuous time according to

  f ↦ S_n^{d/2} P_n(f)(S_n ·),

where S_n ≥ 1 is the pooling factor and P_n : L²(ℝᵈ) → L²(ℝᵈ) is R_n-Lipschitz-continuous.

⇒ Emulates most poolings used in the deep learning literature!
e.g., pooling by sub-sampling: P_n(f) = f with R_n = 1.
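For discrete periodic signals, the frame bounds A_n, B_n can be read off in the Fourier domain. The sketch below (illustrative filters, not those from the talk) computes them as the min and max over frequencies of q(ω) = |χ̂(ω)|² + Σ_λ |ĝ_λ(ω)|², which by Parseval bounds ‖f ∗ χ‖₂² + Σ_λ ‖f ∗ g_λ‖₂² relative to ‖f‖₂²:

```python
import numpy as np

# Sketch: numerical frame bounds for a semi-discrete filter bank on periodic
# discrete signals. A = min_w q(w), B = max_w q(w) with
# q(w) = |chi_hat(w)|^2 + sum_lam |g_hat_lam(w)|^2. Filters are assumptions.
N = 128
w = np.fft.fftfreq(N)

chi_hat = np.exp(-(w / 0.05) ** 2)                    # low-pass output filter
g_hats = [np.exp(-(((np.abs(w) - c) / 0.05) ** 2))    # crude band-pass filters
          for c in (0.1, 0.2, 0.3, 0.4)]

q = np.abs(chi_hat) ** 2 + sum(np.abs(g) ** 2 for g in g_hats)
A, B = q.min(), q.max()
print(A > 0)  # True: the filters form a frame

# The invariance result below needs B_n <= 1; this is enforced by normalizing:
chi_hat = chi_hat / np.sqrt(B)
g_hats = [g / np.sqrt(B) for g in g_hats]
q = np.abs(chi_hat) ** 2 + sum(np.abs(g) ** 2 for g in g_hats)
print(q.max())  # now <= 1
```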
e.g., pooling by averaging: P_n(f) = f ∗ φ_n with R_n = ‖φ_n‖₁.

Vertical translation invariance

Theorem (Wiatowski and HB, 2015). Assume that the filters, non-linearities, and poolings satisfy

  B_n ≤ min{1, L_n⁻² R_n⁻²}, ∀ n ∈ ℕ,

and let the pooling factors be S_n ≥ 1, n ∈ ℕ. Then,

  |||Φⁿ(T_t f) − Φⁿ(f)||| = O(‖t‖ / (S₁ ⋯ S_n)),

for all f ∈ L²(ℝᵈ), t ∈ ℝᵈ, n ∈ ℕ.

⇒ Features become more invariant with increasing network depth!

Full translation invariance: If lim_{n→∞} S₁ · S₂ · ⋯ · S_n = ∞, then

  lim_{n→∞} |||Φⁿ(T_t f) − Φⁿ(f)||| = 0.

The condition B_n ≤ min{1, L_n⁻² R_n⁻²}, ∀ n ∈ ℕ, is easily satisfied by normalizing the filters {g_λn}_{λn ∈ Λn}.
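The depth-dependent decay ‖t‖/(S₁⋯S_n) can be observed numerically. The toy 1-D sketch below (assumed Gaussian pooling filters, not the theorem's exact setting) applies per layer a modulus non-linearity, averaging with a fixed-width Gaussian (the pooling P_n), and sub-sampling by S_n = 2 with amplitude factor S_n^{1/2} for d = 1; the relative feature distance between a bump and its translate shrinks with depth:

```python
import numpy as np

# Toy demonstration of vertical translation invariance: feature distances
# between a signal and its translate decrease with network depth.
N = 512

def gauss_filter(n, width):
    x = np.arange(n)
    g = np.exp(-0.5 * ((x - n // 2) / width) ** 2)
    return g / g.sum()

def conv(x, h):
    # circular convolution with the filter re-centered at the origin
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(np.roll(h, -len(h) // 2))))

def layer(u, S=2):
    v = conv(np.abs(u), gauss_filter(len(u), 3.0))  # modulus, then average (P_n)
    return np.sqrt(S) * v[::S]                      # sub-sample by S_n = 2

x = np.arange(N)
f = np.exp(-0.5 * ((x - N // 2) / 6.0) ** 2)  # smooth bump
ft = np.roll(f, 4)                            # its translate T_t f, t = 4

dists = []
u, ut = f, ft
for n in range(4):                            # depths n = 1, ..., 4
    u, ut = layer(u), layer(ut)
    dists.append(np.linalg.norm(u - ut) / np.linalg.norm(u))
print(dists)  # relative feature distances, shrinking with depth
```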