
Harmonic Analysis of Deep Convolutional Neural Networks

Helmut Bölcskei
Department of Information Technology and Electrical Engineering
November 2017

Joint work with Thomas Wiatowski and Philipp Grohs

ImageNet: CNNs win the ImageNet 2015 challenge [He et al., 2015], recognizing categories such as "ski", "rock", "plant", and "coffee".

Go: CNNs beat Go champion Lee Sedol [Silver et al., 2016].

Atari games: CNNs beat professional human Atari players [Mnih et al., 2015].

Describing the content of an image: CNNs generate sentences describing the content of an image [Vinyals et al., 2015], e.g., "Carlos Kleiber conducting the Vienna Philharmonic's New Year's Concert 1989."
Feature extraction and classification

input $f$ $\to$ non-linear feature extraction $\to$ feature vector $\Phi(f)$ $\to$ linear classifier:
$\langle w, \Phi(f)\rangle > 0 \Rightarrow$ output "Shannon", $\quad \langle w, \Phi(f)\rangle < 0 \Rightarrow$ output "von Neumann".

Mathematics for CNNs

"It is the guiding principle of many applied mathematicians that if something mathematical works really well, there must be a good underlying mathematical reason for it, and we ought to be able to understand it." [I. Daubechies, 2015]

Why non-linear feature extractors?

Task: separate two categories of data through a linear classifier, i.e., find $w$ with $\langle w, f\rangle > 0$ for one class and $\langle w, f\rangle < 0$ for the other. For two classes that differ only in their distance $\|f\|$ from the origin, this is not possible! With the non-linear feature map $\Phi(f) = (\|f\|, 1)^T$ it becomes possible, e.g., with $w = (-1, 1)^T$: $\Phi$ is invariant to the angular component of the data $\Rightarrow$ linear separability in feature space!

Why non-linear feature extractors?

Let $T \in \mathbb{R}^{m \times m}$ be the cyclic shift matrix, i.e.,
$$T = \begin{pmatrix} 0_{(m-1)\times 1} & I_{m-1} \\ 1 & 0 \,\cdots\, 0 \end{pmatrix},$$
and let $\Phi \in \mathbb{C}^{m \times m}$ be (linear and) translation-invariant, i.e., $\Phi f = \Phi T^l f$ for all $f, l$. What is the structure of $\Phi$?
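The separability argument above can be checked numerically. The following is a minimal numpy sketch (not from the talk): two classes distinguished only by their distance from the origin, the radial feature map $\Phi(f) = (\|f\|, 1)^T$, and the classifier $w = (-1, 1)^T$ as reconstructed above, so that $\langle w, \Phi(f)\rangle = 1 - \|f\|$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes that differ only in their distance from the origin:
# class +1 inside the unit circle, class -1 outside.
angles = rng.uniform(0, 2 * np.pi, 200)
radii_pos = rng.uniform(0.2, 0.8, 100)   # class +1: ||f|| < 1
radii_neg = rng.uniform(1.2, 1.8, 100)   # class -1: ||f|| > 1
f_pos = np.stack([radii_pos * np.cos(angles[:100]),
                  radii_pos * np.sin(angles[:100])], axis=1)
f_neg = np.stack([radii_neg * np.cos(angles[100:]),
                  radii_neg * np.sin(angles[100:])], axis=1)

def phi(f):
    """Non-linear feature map Phi(f) = (||f||, 1): keeps only the radial part."""
    return np.stack([np.linalg.norm(f, axis=1), np.ones(len(f))], axis=1)

w = np.array([-1.0, 1.0])      # <w, Phi(f)> = 1 - ||f||
scores_pos = phi(f_pos) @ w    # positive, since ||f|| < 1
scores_neg = phi(f_neg) @ w    # negative, since ||f|| > 1
print(np.all(scores_pos > 0) and np.all(scores_neg < 0))  # True
```

No linear classifier in the original coordinates can achieve this, since each class surrounds the origin at all angles; discarding the angular component is exactly what makes the problem linear.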
Writing $\Phi$ in terms of its columns, translation invariance forces
$$\Phi = \begin{pmatrix}\varphi_1 & \varphi_2 & \dots & \varphi_m\end{pmatrix} = \begin{pmatrix}\varphi_2 & \varphi_3 & \dots & \varphi_1\end{pmatrix} = \dots = \begin{pmatrix}\varphi_m & \varphi_1 & \dots & \varphi_{m-1}\end{pmatrix},$$
so all columns are equal,
$$\Phi = \begin{pmatrix}\varphi_1 & \varphi_1 & \dots & \varphi_1\end{pmatrix} \;\Rightarrow\; \operatorname{rank}(\Phi) = 1!$$

Why non-linear feature extractors?

We want $\Phi$ to have a trivial null-space, i.e., $\Phi(f) = 0 \Leftrightarrow f = 0$, to avoid "losing" features. For linear $\Phi$,
$$\|\Phi(f_1) - \Phi(f_2)\|^2 = \|\Phi(f_1 - f_2)\|^2 \geq A\|f_1 - f_2\|_2^2$$
for some $A > 0$ requires exactly such a trivial null-space, which a rank-one $\Phi$ cannot have (for $m > 1$). $\Rightarrow$ A non-linear $\Phi$ is needed.

e.g., the Fourier modulus $\Phi = |F_n|$, where $F_n \in \mathbb{C}^{n \times n}$ is the DFT matrix, and $|F_n|$ refers to multiplication by $F_n$ followed by componentwise application of $|\cdot|$. This $\Phi$ is translation-invariant, i.e., $\Phi f = \Phi T^l f$ for all $f, l$. BUT: in practice, the Fourier modulus has poor feature-extraction performance due to its global nature!
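Both claims above are easy to verify numerically: a linear translation-invariant map (all columns equal) has rank 1, and the non-linear Fourier modulus is invariant under all cyclic shifts. A small numpy check, using `np.roll` to apply the cyclic shift $T^l$:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 16
f = rng.standard_normal(m)

# A linear translation-invariant map has all columns equal, hence rank 1:
phi1 = rng.standard_normal(m)
Phi_lin = np.tile(phi1[:, None], (1, m))   # Phi = [phi1 phi1 ... phi1]
assert np.linalg.matrix_rank(Phi_lin) == 1
# Phi_lin f = phi1 * sum(f), and cyclic shifts preserve the sum:
assert all(np.allclose(Phi_lin @ f, Phi_lin @ np.roll(f, l)) for l in range(m))

# The (non-linear) Fourier modulus |F_m f| is translation-invariant as well:
def fourier_modulus(f):
    return np.abs(np.fft.fft(f))

assert all(np.allclose(fourier_modulus(f), fourier_modulus(np.roll(f, l)))
           for l in range(m))
print("rank-1 linear map and Fourier modulus are both shift-invariant")
```

The shift invariance of $|F_m f|$ follows from the DFT shift theorem: a cyclic shift multiplies each DFT coefficient by a unit-modulus phase, which the componentwise modulus discards.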
Invariance/covariance

For classification and regression we want to explicitly control the amount of invariance induced: less/more invariance $\Leftrightarrow$ more/less covariance.

Deformation insensitivity: the feature vector should be independent of cameras (of different resolutions), and insensitive to small acquisition jitters.

Scattering networks ([Mallat, 2012], [Wiatowski and HB, 2015])

The signal $f$ is propagated through a tree of convolutions and moduli: the first layer computes feature maps $|f \ast g_{\lambda_1}^{(k)}|$, the second layer $\big||f \ast g_{\lambda_1}^{(k)}| \ast g_{\lambda_2}^{(l)}\big|$, and so on. In every layer $n$, the feature maps are additionally convolved with an output filter $\chi_n$ ($\,\cdot \ast \chi_n$), and all of these outputs are collected into the feature vector $\Phi(f)$.

General scattering networks guarantee [Wiatowski & HB, 2015]
- (vertical) translation invariance
- small deformation sensitivity
essentially irrespective of filters, non-linearities, and poolings!

Building blocks: the basic operations in the $n$-th network layer are convolution with the filters $g_{\lambda_n}$, a non-linearity, and pooling.

Filters: semi-discrete frame $\Psi_n := \{\chi_n\} \cup \{g_{\lambda_n}\}_{\lambda_n \in \Lambda_n}$,
$$A_n\|f\|_2^2 \leq \|f \ast \chi_n\|_2^2 + \sum_{\lambda_n \in \Lambda_n} \|f \ast g_{\lambda_n}\|_2^2 \leq B_n\|f\|_2^2, \quad \forall f \in L^2(\mathbb{R}^d),$$
e.g., structured, unstructured, or learned filters.
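The semi-discrete frame condition can be illustrated in a discrete setting. Below is a hypothetical numpy example (not from the talk): a low-pass filter $\chi$ and band-pass filters $g_\lambda$ whose squared frequency responses partition the spectrum, which gives a tight frame with $A_n = B_n = 1$; the filtering energies are computed via the convolution theorem and Parseval's identity.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 64
f = rng.standard_normal(n)

# Filters defined in the frequency domain: the squared magnitudes of the
# low-pass chi and the band-pass g_lambda partition the spectrum, so
# A ||f||^2 <= ||f*chi||^2 + sum ||f*g||^2 <= B ||f||^2 holds with A = B = 1.
freqs = np.arange(n)
chi_hat = ((freqs < 8) | (freqs >= n - 8)).astype(float)        # low-pass
bands = [(8, 24), (24, 40), (40, 56)]                           # band-pass supports
g_hats = [((freqs >= lo) & (freqs < hi)).astype(float) for lo, hi in bands]

f_hat = np.fft.fft(f)
# ||f * g||_2^2 = (1/n) ||f_hat * g_hat||_2^2 (convolution theorem + Parseval)
energy = sum(np.sum(np.abs(f_hat * g_hat) ** 2) / n
             for g_hat in [chi_hat] + g_hats)
print(np.isclose(energy, np.sum(f ** 2)))  # True: tight frame, A = B = 1
```

With smooth, overlapping filters (e.g., Gaussians) the same computation yields $0 < A_n \leq B_n$ rather than a tight frame; the frame inequality only asks that the filter bank neither annihilates nor blows up any signal's energy.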
Non-linearities: point-wise and Lipschitz-continuous,
$$\|M_n(f) - M_n(h)\|_2 \leq L_n\|f - h\|_2, \quad \forall f, h \in L^2(\mathbb{R}^d)$$
$\Rightarrow$ satisfied by virtually all non-linearities used in the deep learning literature! ReLU: $L_n = 1$; modulus: $L_n = 1$; logistic sigmoid: $L_n = \frac{1}{4}$; ...

Poolings: $\Rightarrow$ emulates most poolings used in the deep learning literature! e.g., pooling by sub-sampling, $P_n(f) = f$ with sub-sampling factor $R_n$ ($R_n = 1$ corresponds to no pooling).
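The Lipschitz bounds quoted above can be sanity-checked empirically. A small numpy sketch estimating the ratio $\|M_n(f) - M_n(h)\|_2 / \|f - h\|_2$ over random pairs, for the three non-linearities listed (the sigmoid bound $\frac{1}{4}$ comes from $\sup_x \sigma'(x) = \frac{1}{4}$):

```python
import numpy as np

rng = np.random.default_rng(3)

# (non-linearity M_n, claimed Lipschitz constant L_n)
nonlinearities = {
    "ReLU": (lambda x: np.maximum(x, 0.0), 1.0),
    "modulus": (np.abs, 1.0),
    "logistic sigmoid": (lambda x: 1.0 / (1.0 + np.exp(-x)), 0.25),
}

# Empirically check ||M(f) - M(h)||_2 <= L_n ||f - h||_2 on random pairs.
for name, (M, L) in nonlinearities.items():
    ratios = [np.linalg.norm(M(f) - M(h)) / np.linalg.norm(f - h)
              for f, h in (rng.standard_normal((2, 32)) for _ in range(1000))]
    assert max(ratios) <= L + 1e-12, name
print("all empirical Lipschitz ratios within the stated bounds")
```

Since each $M_n$ acts point-wise, its Lipschitz constant on $L^2$ equals the Lipschitz constant of the underlying scalar function, which is why a simple scalar-derivative bound suffices.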