Learning optimal wavelet bases using a neural network approach

Andreas Søgaard
School of Physics and Astronomy, University of Edinburgh
Email address: [email protected]

Abstract

A novel method for learning optimal, orthonormal wavelet bases for representing 1- and 2D signals, based on parallels between the wavelet transform and fully connected artificial neural networks, is described. The structural similarities between these two concepts are reviewed and combined into a "wavenet", allowing for the direct learning of optimal wavelet filter coefficients through stochastic gradient descent with back-propagation over ensembles of training inputs, where the conditions on the filter coefficients for constituting orthonormal wavelet bases are cast as quadratic regularisation terms. We describe the practical implementation of this method, and study its performance for a few toy examples. It is shown that optimal solutions are found, even in a high-dimensional search space, and the implications of the result are discussed.

Keywords: Neural networks, machine learning, optimization

1. Introduction

The Fourier transform has proved an indispensable tool within the natural sciences, allowing for the study of frequency information of functions and for the efficient representation of signals exhibiting angular structure. However, the Fourier transform is limited by being global: each frequency component carries no information about its spatial localisation; information which might be valuable. Multiresolution, and in particular wavelet, analysis has been developed, in part, to address this limitation, representing a function at various levels of resolution, or at different frequency scales, while retaining information about position-space localisation. This encoding uses the fact that, due to their smaller wavelengths, high-frequency components may be localised more precisely than their low-frequency counterparts.

The wavelet decomposition expresses any given signal in terms of a "family" of functions [2, 3], efficiently encoding frequency-position information. Several different such wavelet families exist, both for continuous and discrete input, but these are generally quite difficult to construct exactly as they don't possess closed-form representations. Furthermore, the best basis for any given problem depends on the class of signal, choosing the best among existing functional families is hard and likely sub-optimal, and constructing new bases is non-trivial, as mentioned above. Therefore, we present a practical, efficient method for directly learning the best wavelet bases, according to some optimality criterion, by exploiting the intimate relationship between neural networks and the wavelet transform.

Such a method could have potential uses in areas utilising time-series data and imaging, for instance (but not limited to) EEG, speech recognition, seismographic studies, and financial markets, as well as image compression, feature extraction, and de-noising. However, as is shown in Section 7, the areas to which such an approach can be applied are quite varied.

In Section 2 we review some of the work previously done along these lines. In Section 3 we briefly describe wavelet analyses, neural networks, as well as their structural similarity and how they can be combined. In Section 4 we discuss metrics appropriate for measuring the quality of a certain wavelet basis. In Section 5 we describe the actual algorithm for learning optimal wavelet bases. Section 6 describes the practical implementation and, finally, Section 7 provides an example use case from high-energy physics.

Note that Sections 1 and 3 contain overlaps with [1].

2. Previous work

A typical approach [4, 5, 6] when faced with the task of choosing a wavelet basis in which to represent some class of signals is to select one among an existing set of wavelet families, which is deemed suitable to the particular use case based on some measure of fitness. This might lead to sub-optimal results, as mentioned above, since limiting the search to a few dozen pre-existing wavelet families will likely result in inefficient encoding or representation of (possibly subtle) structure particular, or unique, to the problem at hand. To address this shortcoming, considerable effort has already gone into the question of the existence and construction of optimal wavelet bases.

Ref. [7] describes a method for constructing optimally matched wavelets, i.e. wavelet bases matching a prescribed pattern as closely as possible, through lifting [8]. However, the proposed method is somewhat arduous and relies on the specification of a pattern to which to match, requiring considerable and somewhat artificial preprocessing ("It is difficult to find a problem our method can be applied to without major modifications." [7, p. 125]). This is not necessarily possible, let alone easy, for many use cases as well as for the study of more general classes of inputs rather than single examples. In a similar vein, Ref. [9] provides a method for unconstrained optimisation of a wavelet basis with respect to a sparsity measure using lifting, but has the same limitations as Ref. [7].

Refs. [10, 11] provide theoretical arguments for the existence of optimal wavelet bases as well as an algorithm for constructing such a basis for single 1- or 2D inputs, based on gradient descent. However, results are only presented for low-order wavelet bases, the implementation of constraints is not discussed, and the question of generalisation from single inputs to classes of inputs is not addressed. In addition, the optimal filter coefficients referenced in [11, Table 1] do not satisfy the explicit conditions (C2), (C3), and (C4) for orthonormality in Section 3.1 below. These constraints are violated at the 1%-level, which also corresponds roughly to the relative angular deviation of the reported optimal basis from the Daubechies [12] basis of similar order.

Finally, Refs. [13, 14] provide a comprehensive prescription for designing wavelets that optimally represent signals, or classes of signals, at some fixed scale J. However, the results are quite cumbersome, are based on a number of assumptions regarding the characteristics of the input signal(s), and relate only to the question of optimal representation at fixed scales.

This indicates that, although the question of constructing optimal wavelet bases has been given substantial consideration, and clear developments have been made already, a general approach to easily learning discrete, demonstrably orthonormal wavelet bases of arbitrary structure and complexity, optimised over classes of input, has yet to be developed and implemented for a practically arbitrary choice of optimality metric. This is what is done below.

3. Theoretical concepts

In this section, we briefly review some of the underlying aspects of wavelet analysis, Section 3.1, and neural networks, Section 3.2, upon which the learning algorithm is based. In Section 3.3 we discuss the parallels between the two concepts, and how these can be used to directly learn optimal wavelet bases.

3.1. Wavelet

Numerous excellent references explain multiresolution analysis and the wavelet transform in depth, so the present text will focus on the discrete class of wavelet transforms, formulated in the language of matrix algebra as it relates directly to the task at hand. For a more complete review, see e.g. [1] or [12, 15, 16, 17, 18].

In the parlance of matrix algebra, the simplest possible input signal f ∈ R^N is a column vector

    f = ( f[0], f[1], \ldots, f[2^M - 2], f[2^M - 1] )^T    (1)

and the dyadic structure of the wavelet transform means that N must be radix 2, i.e. N = 2^M for some M ∈ N_0 (although the results below are also applicable to 2D, i.e. matrix, input, cf. Section 7). The forward wavelet transform is then performed by the iterative application of low- and high-pass filters. Let L(f) denote the low-pass filtering of input f, the i'th entry of which is then given by the convolution

    L(f)[i] = \sum_{k=0}^{2^M - 1} a[k] \, f[i + N/2 - k], \qquad i \in [0, 2^{M-1} - 1]    (2)

assuming periodicity, such that f[-1] = f[N - 1], etc.

The low-pass filter, a, is represented as a row vector of length N_filt, with N_filt even, and its entries are called the filter coefficients, {a}. The convolution yielding each entry i in L(f) can be seen as a matrix inner product of f with a row matrix of the form

    [ \cdots \; 0 \;\; a[N-1] \; \cdots \; a[1] \;\; a[0] \;\; 0 \; \cdots ]    (3)

Since this is true for each entry, the full low-pass filter may be represented as a (2^{M-1} \times 2^M) \cdot (2^M \times 1) matrix inner product:

    L(f) = L_{M-1} \, f    (4)

where, for each low-pass operation, the matrix operator is written as

   ......   . . . .       ··· a[N − 1] ··· a[1] a[0] 0 0 0 0 ···      L =  ··· 0 0 a[N − 1] ··· a[1] a[0] 0 0 ···   2m (5) m     ··· 0 0 0 0 a[N − 1] ··· a[1] a[0] ···       . . . .    ......   | {z } 2m+1

In complete analogy to Eq. (5), a high-pass filter matrix H_m can be expressed as a 2^m \times 2^{m+1} matrix parametrised in the same way by coefficients {b}, which we choose [12] to relate to {a} by

    b_k = (-1)^k \, a_{N_{\rm filt} - 1 - k} \qquad \text{for } k \in [0, N_{\rm filt} - 1]    (6)

This means that, given filter coefficients {a}, we have specified the full wavelet transform in terms of repeated application of matrix operators L_m and H_m. The filter coefficients will therefore serve as our parametrisation of any given wavelet basis.

At each step in the transform, the power of 2 that gives the current length of the (partially transformed) input, n = 2^m, is referred to as the frequency scale, m. Large frequency scales m correspond to large input arrays, which are able to encode more granular, and therefore more high-frequency, information than for small m. As the name implies, the low-pass filter acts as a spatial sub-sampling of the input from frequency scale m to m - 1, averaging out the frequency information at scale m in the process. Similarly, the high-pass filter encodes the frequency information at scale m; the information which is lost in the low-pass filtering. After each step, another pass of high- and low-pass filters is applied to the sub-sampled, low-pass filtered input. This procedure is repeated from frequency scale M down to 0. At each step, the high-pass filter encodes the frequency information specific to the current frequency scale. This is illustrated in Figure 1a.

The coefficients obtained through successive convolution of the signal with the high- and low-pass filters, i.e. the right-most layers in Figure 1a, collectively encode the same information as the position-space input f, but in the basis of wavelet functions. These are called the wavelet coefficients {c}. Given such a set of wavelet coefficients, the inverse transform can be performed by retracing the steps of the forward transform. Letting f_m denote the input signal low-pass filtered down to scale m, with f_M \equiv f, the inverse transform proceeds as

    f_0 = [c_0]    (7a)
    f_1 = L_0^T f_0 + H_0^T [c_1]    (7b)
    f_2 = L_1^T f_1 + H_1^T [c_2 \; c_3]    (7c)
    \vdots
    f \equiv f_M = L_{M-1}^T f_{M-1} + H_{M-1}^T [ c_{2^{M-1}} \cdots c_{2^M - 1} ]    (7d)

In this way it is seen that c_0 encodes the average information content in the input signal f, and that c_{i>0} dyadically encode the frequency information at larger and larger scales m. The explicit wavelet basis function corresponding to each wavelet coefficient can be found by setting c = [ \cdots \; 0 \; 1 \; 0 \; \cdots ] and studying the resulting, reconstructed position-space signal \hat{f} at some suitable largest scale M.

The filter coefficients {a} completely specify the wavelet transform and basis, but they are not completely free parameters. Instead, they must satisfy a number of explicit conditions in order to correspond to an orthonormal wavelet basis.
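As a worked illustration of the forward pass sketched in Figure 1a, the following self-contained C++ example performs the full decomposition for the Haar filter [30], with {b} obtained from {a} via Eq. (6). It is a minimal sketch under assumed conventions (periodic boundaries and a particular circular phase choice), not the implementation of Ref. [22]; the inverse of Eq. (7) simply retraces these steps with transposed operators.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// One periodic analysis step: produce the low-pass ("approximation") and
// high-pass ("detail") halves of f, with b_k = (-1)^k a_{Nfilt-1-k} (Eq. (6)).
void analysisStep(const std::vector<double>& f, const std::vector<double>& a,
                  std::vector<double>& low, std::vector<double>& high)
{
    const std::size_t N = f.size(), Nf = a.size();
    std::vector<double> b(Nf);
    for (std::size_t k = 0; k < Nf; ++k) b[k] = ((k % 2) ? -1.0 : 1.0) * a[Nf - 1 - k];
    low.assign(N / 2, 0.0);
    high.assign(N / 2, 0.0);
    for (std::size_t i = 0; i < N / 2; ++i)
        for (std::size_t k = 0; k < Nf; ++k) {
            const double x = f[(2 * i + k) % N];   // periodic boundary, cf. Eq. (2)
            low[i]  += a[k] * x;
            high[i] += b[k] * x;
        }
}

int main()
{
    // Full forward transform: repeatedly low-pass filter, storing the detail
    // coefficients at each scale, here with the Haar filter a = (1,1)/sqrt(2).
    std::vector<double> a = {1.0 / std::sqrt(2.0), 1.0 / std::sqrt(2.0)};
    std::vector<double> f = {1, 3, 2, 2, 5, 7, 6, 4}, coeffs(f.size(), 0.0);
    std::vector<double> current = f;
    while (current.size() > 1) {
        std::vector<double> low, high;
        analysisStep(current, a, low, high);
        // Detail coefficients at this scale occupy indices [n/2, n) of the output.
        for (std::size_t i = 0; i < high.size(); ++i) coeffs[high.size() + i] = high[i];
        current = low;
    }
    coeffs[0] = current[0];   // c_0: overall average information content
    for (double c : coeffs) std::printf("% .4f ", c);
    std::printf("\n");
    return 0;
}
```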


(a) Wavelet (b) Neural network (c) “Wavenet”

Figure 1: Schematic representations of the difference in architecture for (a) standard wavelet transforms, (b) fully connected neural networks, and (c) the wavelet transform formulated as a neural network, here called “wavenet”. Individual squares indicate elements in layers, i.e. entries in column vectors. Shaded areas indicate filter- or weight matrices, where red/blue represent high-/low-pass filters.

These conditions [19] are as follows. In order to satisfy the dilation equation, the filter coefficients {a} must satisfy

    \sum_k a_k = \sqrt{2}    (C1)

In order to ensure orthonormality of the scaling- and wavelet functions, the coefficients {a} and {b} must satisfy

    \sum_k a_k a_{k+2m} = \delta_{m,0} \qquad \forall\, m \in \mathbb{Z}    (C2)

and

    \sum_k b_k b_{k+2m} = \delta_{m,0} \qquad \forall\, m \in \mathbb{Z}    (C3)

where the condition for m = 0 is trivially fulfilled from (C2) through Eq. (6). To ensure that the corresponding wavelets have zero area, i.e. encode only frequency information, we require

    \sum_k b_k = 0    (C4)

Finally, to ensure mutual orthogonality of the scaling and wavelet functions, we must have

    \sum_k a_k b_{k+2m} = 0 \qquad \forall\, m \in \mathbb{Z}    (C5)

where condition (C5) is automatically satisfied through Eq. (6).

Conditions (C1–5) then collectively ensure that the filter coefficients {a} (and {b}) yield a wavelet analysis in terms of orthonormal basis functions. As we parametrise our basis uniquely in terms of filter coefficients {a}, since {b} are fixed through Eq. (6), we will need to explicitly ensure that these conditions are met. The method for doing this is described in Section 3.3.
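Conditions (C1)–(C5) can be checked numerically for any candidate filter once {b} is derived via Eq. (6). The following standalone C++ sketch (illustrative only, not part of the package of Ref. [22]) evaluates the residuals for the standard Daubechies-4 coefficients [12]; the same residuals, squared, are what enter the regularisation terms of Section 5.2.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Return v[i] if in range, otherwise zero (finite filters have compact support).
double at(const std::vector<double>& v, long i)
{
    return (i >= 0 && i < (long) v.size()) ? v[i] : 0.0;
}

int main()
{
    // Example input: the standard Daubechies-4 filter coefficients [12].
    const double s3 = std::sqrt(3.0), norm = 4.0 * std::sqrt(2.0);
    std::vector<double> a = {(1 + s3) / norm, (3 + s3) / norm,
                             (3 - s3) / norm, (1 - s3) / norm};
    const long Nf = (long) a.size();
    std::vector<double> b(Nf);
    for (long k = 0; k < Nf; ++k) b[k] = ((k % 2) ? -1.0 : 1.0) * a[Nf - 1 - k];   // Eq. (6)

    // (C1) and (C4): sums of the low- and high-pass coefficients.
    double c1 = -std::sqrt(2.0), c4 = 0.0;
    for (long k = 0; k < Nf; ++k) { c1 += a[k]; c4 += b[k]; }
    std::printf("C1 residual: %+.3e   C4 residual: %+.3e\n", c1, c4);

    // (C2), (C3), (C5): only shifts with |2m| < N_filt can give non-zero sums.
    for (long m = -(Nf / 2 - 1); m <= Nf / 2 - 1; ++m) {
        double c2 = (m == 0 ? -1.0 : 0.0), c3 = (m == 0 ? -1.0 : 0.0), c5 = 0.0;
        for (long k = 0; k < Nf; ++k) {
            c2 += a[k] * at(a, k + 2 * m);
            c3 += b[k] * at(b, k + 2 * m);
            c5 += a[k] * at(b, k + 2 * m);
        }
        std::printf("m=%+ld  C2: %+.3e  C3: %+.3e  C5: %+.3e\n", m, c2, c3, c5);
    }
    return 0;
}
```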
3.2. Neural network

Since (artificial) neural networks have become ubiquitous within most areas of the physical sciences, we will only briefly review the central concepts as they relate to the rest of this discussion. A comprehensive introduction can be found e.g. in Ref. [20].

Neural networks can be seen as general mappings f : R^n \to R^m, which can approximate any function, provided sufficient capacity. In the simplest case, such networks are constructed sequentially, where the input vector f = h_0 \in R^{N_0} is transformed through the inner product with a weight matrix \theta_1, the output of which is a hidden layer h_1 \in R^{N_1}, and so forth, until the output layer h_l \in R^{N_l} is reached. The configuration of a given neural network, in terms of number of layers and their respective sizes, is called the network architecture. In addition to the transfer matrices \theta_i, the layers may be equipped with bias nodes, providing the opportunity for an offset, as well as non-linear activation functions. A schematic representation of one such network, without bias nodes and non-linearities, is shown in Figure 1b.

The neural network can then be trained on a set of training examples, {(f_i, y_i)}, where the task of the network usually is to output a vector \hat{y}_i trying to predict y_i given f_i. The quality of the prediction is quantified by the cost or objective function J(y, \hat{y}). The central idea is then to take the error of any given prediction \hat{y}_i, given by the derivative of the cost function with respect to the prediction at the current value, and back-propagate it through the network, performing the inverse operation of the forward pass at each layer. In this way, the gradient of the cost function J with respect to each entry in the network's weight matrices (\theta_i)_{jk} is computed. Using stochastic gradient descent, for each training example one performs small update steps of the weight matrix entries along these error gradients, which is then expected to produce slightly better performance of the network with respect to the task specified by the cost function.

One challenge posed by such a fully connected network is the sheer multiplicity of weights for just a few layers of moderate sizes. Such a large number of free parameters can make the network prone to over-fitting, which can be mitigated e.g. by L2 weight regularisation, where a regularisation term R({\theta}) is added to the cost function, with a multiplier \lambda controlling the trade-off between the two contributions.

3.3. Combining concepts

The crucial step is then to recognise the deep parallels between these two constructs. We can cast the discrete wavelet transform as an R^N \to R^N neural network with a fully-connected, deep, non-sequential, dyadic architecture without bias-units and with linear (i.e. no) activations. A schematic representation of this setup, here called a "wavenet", is shown in Figure 1c. This is done by identifying the neural network transfer matrices with the low- and high-pass filter operators in the matrix formulation of the wavelet transform, cf. Eq. (5). The forward wavelet transform then corresponds to the neural network mapping, and the output vector of the neural network is exactly the wavelet coefficients of the input with respect to the basis prescribed by {a}.

If we can formulate an objective function J for the wavelet coefficients, i.e. the output of the "wavenet", this means that we can utilise the parallel with neural networks and employ back-propagation to gradually update the weight matrix entries, i.e. the filter coefficients {a}, in order to improve our wavelet basis with respect to this metric. Therefore, choosing a fixed filter length |{a}| = N_filt, and parametrising the "wavenet" in terms of {a}, we are able to directly learn the wavelet basis which is optimal according to some task J.

Interestingly, and unlike some of the approaches mentioned in Section 2, a neural network approach naturally accommodates classes of inputs, in addition to single examples. That is, one can train repeatedly on a single example and learn a basis which optimally represents this particular signal in some way, cf. e.g. [7]. However, the use of stochastic gradient descent is naturally suited for fitting the weight matrices to ensembles of training examples, which in many cases is much more meaningful and useful, cf. Section 7.

Another key observation is that while the entries in a standard neural network weight matrix are free parameters, the weights in the "wavenet" are highly constrained, since they must correspond to the low- and high-pass filters of the wavelet transform. For instance, a neural network like the one in Figure 1c, mapping R^8 \to R^8, will have 84 free parameters in the standard treatment (two 4 \times 8, two 2 \times 4, and two 1 \times 2 matrices). However, identifying each of the 6 weight matrices with the wavelet filter operators, this number is reduced to N_filt, which can be as low as 2. This is schematically shown in Figure 2. For inputs of "realistic" sizes, i.e. |f| = N \gtrsim 64, this reduction is exponentially greater, leading to a significant reduction of complexity.

Finally, we note that the filter coefficients need to conform with conditions (C1–5), cf. Section 3.1 above, in order to correspond to an orthonormal wavelet basis. This can be solved by noting that all conditions (C1–5) are differentiable with respect to {a}, which means that we can cast these conditions in the form of quadratic regularisation terms, R_i, which can then be added to the cost function with some multiplier \lambda, in analogy to standard L2 weight regularisation. The multiplier \lambda then controls the trade-off between the possibly competing objectives of optimising J and ensuring fulfillment of conditions (C1–5). In principle, this means that for finite \lambda any learned filter configuration {a} might violate these conditions to order 1/\lambda, and might therefore strictly be taken to constitute a "pseudo-orthonormal" basis. This will, however, have little impact in practical application, where one can simply choose a value of \lambda sufficiently high that O(1/\lambda) is within the tolerances of the use case at hand.

4. Measuring optimality

The choice of objective function defines the sense in which the basis learned through the method outlined in Section 3.3 will be optimal. This also affords the user a certain degree of freedom in defining the measure of optimality, the only condition being that the objective function be differentiable with respect to the wavelet coefficients {c}, possibly except for a finite number of points.

In this example we choose sparsity, i.e. the ability of a certain basis to efficiently encode the information contained in a given signal, as our measure of optimality. From the point of view of compression, sparsity is clearly a useful metric, in that it measures the amount of information that can be stored within a certain amount of space/memory. From the point of view of representation, sparsity is likely also a meaningful objective, since a basis which efficiently represents the defining features of a (class of) signal(s) will also lead the signal(s) to be sparse in this basis.

(a) Neural network   (b) Wavelet

Figure 2: Schematic representation of the entries in an 8 × 16 (a) transfer matrix in an unconstrained, fully connected neural network and (b) a corresponding filter operator in a wavelet transform with N_filt = 4 filter coefficients. Note that the entries in each row of the wavelet matrix operator are identical, and simply shifted by integer multiples of 2, cf. Eq. (2), such that the number of free parameters is only N_filt.

Based on [21], we choose the Gini coefficient G(\,\cdot\,) as our metric for the sparsity of a set of wavelet coefficients {c},

    G(\{c\}) = \frac{\sum_{i=0}^{N_c - 1} (2i - N_c - 1)\,|c_i|}{N_c \sum_{i=0}^{N_c - 1} |c_i|} \equiv \frac{f(\{c\})}{g(\{c\})}    (8)

for wavelet coefficients {c} sorted by ascending absolute value, i.e. |c_i| \leq |c_{i+1}| for all i. Here N_c \equiv |\{c\}| is the number of wavelet coefficients.

A Gini coefficient of 1 indicates a completely unequal, and therefore maximally sparse, distribution, i.e. the case in which only one coefficient has non-zero value, and therefore carries all of the information content in the signal. Conversely, a Gini coefficient of 0 indicates a completely equal distribution, i.e. each coefficient has exactly the same (absolute) value, and therefore all carry exactly the same amount of information content.
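For reference, Eq. (8) amounts to only a few lines of code. The following standalone C++ sketch (illustrative only, not taken from the package of Ref. [22]) computes G({c}) and prints it for a near-maximally sparse and a completely flat set of 64 coefficients; the quoted limiting values of 1 and 0 are approached for large N_c.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Gini-based sparsity of a set of wavelet coefficients {c}, following Eq. (8).
// The coefficients are sorted by ascending absolute value before summing.
double giniSparsity(std::vector<double> c)
{
    const double Nc = (double) c.size();
    std::sort(c.begin(), c.end(),
              [](double x, double y) { return std::fabs(x) < std::fabs(y); });
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < c.size(); ++i) {
        num += (2.0 * i - Nc - 1.0) * std::fabs(c[i]);   // f({c}) of Eq. (8)
        den += Nc * std::fabs(c[i]);                     // g({c}) of Eq. (8)
    }
    return num / den;                                    // larger means sparser
}

int main()
{
    std::vector<double> spike(64, 0.0), flat(64, 1.0);
    spike[0] = 9.0;   // a single dominant coefficient carries all information
    std::printf("G(spike) = %.3f   G(flat) = %.3f\n",
                giniSparsity(spike), giniSparsity(flat));   // ~0.95 and ~-0.03
    return 0;
}
```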

Having settled on a choice of objective function, we now proceed to describing the details of the learning procedure itself. We stress that the results of the following sections should generalise to other reasonable choices of objectives, which may be chosen based on the particular use case at hand.

5. Learning procedure

As noted above, the full objective function for the optimisation problem is given as the sum of a sparsity term S({c}) and a regularisation term R({a}), the relative contribution of the latter controlled by the regularisation constant \lambda, i.e.

    J(\{c\}, \{a\}) = S(\{c\}) + \lambda\, R(\{a\})    (9)

where {c} is the set of wavelet coefficients for a given training example and {a} is the current set of filter coefficients. The R-term ensures that the filter coefficient configuration {a} does indeed correspond to a wavelet basis as defined by conditions (C1–5) above; the S-term measures the quality of a given wavelet basis according to the chosen fitness measure. The learning task then consists of optimising the filter coefficients according to this combined objective function, i.e. finding a filter coefficient configuration, in an N_filt-dimensional parameter space, which minimises J. The procedure for computing a filter coefficient gradient for each of the two terms is outlined below.

5.1. Sparsity term

Based on the discussion in Section 4, we have chosen the Gini coefficient G(\,\cdot\,) as defined in Eq. (8) as our measure of the sparsity of any given set of wavelet coefficients {c}. The sparsity term in the objective function is chosen to be

    S(\{c\}) = 1 - G(\{c\})    (10)

This definition means that low values of S({c}) correspond to a greater degree of sparsity, such that minimising this objective function term increases the degree of sparsity.

In order to utilise stochastic gradient descent with back-propagation, the objective function needs to be differentiable in the values of the output nodes, i.e. the wavelet coefficients. Since the sparsity term is the only term which depends on the wavelet coefficients, particular care needs to be afforded here. The sparsity term is seen to be differentiable everywhere except for a finite number of points where c_i = 0. In these cases the derivative is taken to be zero, which is meaningful considering the chosen optimisation objective: coefficients of value zero will, assuming at least one non-zero coefficient exists, contribute maximally to the sparsity of the set as a whole. Therefore we don't want these coefficients to change, and the corresponding gradient should be zero. (Cases with all zero-valued coefficients are ill-defined but also practically irrelevant.)

Therefore, assuming c_i \neq 0, the derivative of the sparsity term is given by (suppressing the arguments of the objective function terms for brevity)

    \nabla_{|c|} S \equiv \hat{e}_i \frac{dS}{d|c_i|} = \hat{e}_i \frac{d}{d|c_i|}(1 - G) = -\nabla_{|c|} G = -\frac{\nabla_{|c|} f \cdot g - f \cdot \nabla_{|c|} g}{g^2}    (11)

where

    \nabla_{|c|} f = \hat{e}_i \frac{d}{d|c_i|} \left[ \sum_{k=0}^{N_c - 1} (2k - N_c - 1)\,|c_k| \right] = (2i - N_c - 1)\,\hat{e}_i    (12)

and

    \nabla_{|c|} g = \hat{e}_i \frac{d}{d|c_i|} \left[ N_c \sum_{k=0}^{N_c - 1} |c_k| \right] = N_c\,\hat{e}_i    (13)

for f and g defined in Eq. (8), where summation of vector indices is implied. To get the gradient with respect to the signed coefficient values, the gradients of f and g are multiplied by the corresponding coefficient sign, i.e.

    \nabla_c f = \mathrm{sign}(c) \times \nabla_{|c|} f    (14)

and

    \nabla_c g = \mathrm{sign}(c) \times \nabla_{|c|} g    (15)

where \times indicates element-wise multiplication. The gradients with respect to the base, non-sorted set of wavelet coefficients {c}, \nabla_c f and \nabla_c g respectively, are found by performing the inverse sorting with respect to the absolute wavelet coefficient values. In this way \nabla_c S can be computed from \nabla_c f and \nabla_c g through Eq. (11).

Having computed the gradient of the sparsity cost with respect to the output nodes (wavelet coefficients) we can now use standard back-propagation on the full network to compute the associated gradient on each entry in the low- and high-pass filter matrices. For a given, fixed filter length N_filt, entries in the filter matrices which are identically zero are not modified by a gradient. Conversely, the gradient on every filter matrix entry to which a particular filter coefficient is contributing is added to the corresponding sparsity gradient in filter coefficient space, possibly with a sign change in the case of high-pass filter matrices, cf. Eq. (6). In this way, the gradient on the wavelet coefficients is translated into a gradient in filter coefficient space, which we can then use in stochastic gradient descent, along with a similar regularisation gradient, to gradually improve our wavelet basis as parametrised by {a}.
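Read literally, Eqs. (11)–(15) translate into a short routine. The following sketch (plain C++, illustrative only and independent of the package of Ref. [22]) returns the derivative of S with respect to each unsorted, signed coefficient, with the zero-coefficient convention described above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Gradient of the sparsity term S = 1 - G (Eq. (10)) with respect to the
// unsorted, signed wavelet coefficients {c}, following Eqs. (11)-(15).
std::vector<double> sparsityGradient(const std::vector<double>& c)
{
    const std::size_t Nc = c.size();
    // Rank each coefficient by ascending absolute value (argsort).
    std::vector<std::size_t> order(Nc);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](std::size_t p, std::size_t q) {
        return std::fabs(c[p]) < std::fabs(c[q]);
    });

    // f and g of Eq. (8), evaluated on the sorted absolute values.
    double f = 0.0, g = 0.0;
    for (std::size_t i = 0; i < Nc; ++i) {
        f += (2.0 * i - Nc - 1.0) * std::fabs(c[order[i]]);
        g += double(Nc) * std::fabs(c[order[i]]);
    }

    std::vector<double> grad(Nc, 0.0);
    for (std::size_t i = 0; i < Nc; ++i) {
        const std::size_t j = order[i];          // original (unsorted) index
        if (c[j] == 0.0) continue;               // derivative taken to be zero
        const double df = 2.0 * i - Nc - 1.0;    // Eq. (12), at sorted rank i
        const double dg = double(Nc);            // Eq. (13)
        const double dG = (df * g - f * dg) / (g * g);          // dG/d|c_i|
        // Eqs. (11), (14), (15): dS/dc_j = -sign(c_j) * dG/d|c_i|.
        grad[j] = -(c[j] > 0.0 ? 1.0 : -1.0) * dG;
    }
    return grad;
}
```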

5.2. Regularisation term

The regularisation terms are included to ensure that the optimal filter coefficient configuration does indeed correspond to an orthonormal wavelet basis as defined through conditions (C1–5). As noted in Section 3.3, we choose to cast these conditions in the form of quadratic regularisation conditions on the filter coefficients {a}. Each of the conditions (C1–5) is of the form

    h_k(\{a\}) = d_k    (16)

which can be written as a quadratic regularisation term, i.e.

    R_k(\{a\}) = \left( h_k(\{a\}) - d_k \right)^2    (17)

and the combined regularisation term is then given by

    R(\{a\}) = \sum_{k=1}^{5} R_k(\{a\})    (18)

This formulation allows for the search to proceed in the full N_filt-dimensional search space, and the regularisation constant \lambda regulates the degree of precision to which the optimal filter coefficient configuration will fulfill conditions (C1–5).

In order to translate deviations from conditions (C1–5) into gradients in filter coefficient space, we take the derivative of each of the terms R_k with respect to the filter coefficients a_i. The gradients are found to be:

    \nabla_a R_1 = \hat{e}_i\, 2 \left[ \sum_k a_k - \sqrt{2} \right]    (D1)

    \nabla_a R_2 = \hat{e}_i\, 2 \sum_m \left[ \sum_k a_k a_{k+2m} - \delta_{m,0} \right] \times (a_{i+2m} + a_{i-2m})    (D2)

    \nabla_a R_3 = \hat{e}_i\, 2 \sum_m \left[ \sum_k b_k b_{k+2m} - \delta_{m,0} \right] \times (a_{i+2m} + a_{i-2m})    (D3)

    \nabla_a R_4 = \hat{e}_i\, 2 \left[ \sum_k b_k \right] \times (-1)^{N-i-1}    (D4)

    \nabla_a R_5 = 0    (D5)

Since condition (C5) is satisfied exactly by the definition in Eq. (6), the corresponding gradient is identically equal to zero. The combined gradient from the regularisation term is then the sum of the above five (four) contributions.

6. Implementation

The learning procedure based on the objective function and associated gradients presented in Section 5 is implemented [22] as a publicly available C++ [23] package. The matrix algebra operations are implemented using Armadillo [24], with optional interface to the high-energy physics ROOT library [25].

This package allows for the processing of 1- and 2-dimensional training examples of arbitrary size, provides data generators for a few toy examples, and reads CSV input as well as high-energy physics collision events in the HepMC [26] format. The 2D wavelet transform is performed by performing the 1D transform on each row in the signal, concatenating the output rows, and then performing the 1D transform on each of the resulting columns. Their matrix concatenation then corresponds to the 2D set of wavelet coefficients.

In addition to standard (batch) gradient descent, the library allows for the use of gradient momentum and simulated annealing of the regularisation term in order to ensure faster and more robust convergence to the global minimum even in the presence of local minima and steep regularisation contours.
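The row-then-column scheme for the 2D transform described above can be sketched as follows in plain C++. This is an illustration of the scheme only, under the same assumed 1D conventions as the earlier sketches, and is not the API of the package of Ref. [22].

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Plain periodic 1D multi-level forward transform (same conventions as the
// sketch in Section 3.1), used here as a building block for the 2D case.
static std::vector<double> forward1D(std::vector<double> f, const std::vector<double>& a)
{
    const std::size_t Nf = a.size();
    std::vector<double> b(Nf), out(f.size(), 0.0);
    for (std::size_t k = 0; k < Nf; ++k) b[k] = ((k % 2) ? -1.0 : 1.0) * a[Nf - 1 - k];
    while (f.size() > 1) {
        const std::size_t n = f.size();
        std::vector<double> low(n / 2, 0.0);
        for (std::size_t i = 0; i < n / 2; ++i)
            for (std::size_t k = 0; k < Nf; ++k) {
                low[i]         += a[k] * f[(2 * i + k) % n];
                out[n / 2 + i] += b[k] * f[(2 * i + k) % n];
            }
        f = low;
    }
    out[0] = f[0];
    return out;
}

// 2D transform: 1D transform of every row, then of every column of the
// row-transformed matrix, giving the 2D set of wavelet coefficients for a
// square, radix-2 input.
Matrix forward2D(const Matrix& image, const std::vector<double>& a)
{
    const std::size_t nRows = image.size(), nCols = image.front().size();
    Matrix rowPass(nRows), out(nRows, std::vector<double>(nCols, 0.0));
    for (std::size_t r = 0; r < nRows; ++r) rowPass[r] = forward1D(image[r], a);
    for (std::size_t c = 0; c < nCols; ++c) {
        std::vector<double> col(nRows);
        for (std::size_t r = 0; r < nRows; ++r) col[r] = rowPass[r][c];
        col = forward1D(col, a);
        for (std::size_t r = 0; r < nRows; ++r) out[r][c] = col[r];
    }
    return out;
}
```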
7. Example: QCD 2 → 2 processes in high-energy physics

As an example of the procedure for learning optimal wavelet bases according to the metric presented in Section 4, using the implementation in Sections 5 and 6, we choose that of hadronic jets produced at proton colliders. In particular, the input to the training is taken to be simulated quantum chromodynamics (QCD) 2 → 2 processes, generated in Pythia8 [27, 28], segmented into a 2D array of size 64 × 64 in the \eta-\phi plane, roughly corresponding to the angular granularity of present-day general-purpose particle detectors. The collision events are generated at a centre-of-mass energy of \sqrt{s} = 13 TeV with a generator-level p_\perp cut of 280 GeV imposed on the leading parton.

QCD radiation patterns are governed by scale-independent splitting kernels [29], which could make them suitable candidates for wavelet representation, since these naturally exhibit self-similar, scale-independent behaviour. In that case, the optimal (in the sense of Section 4) representation is one which efficiently encodes the localised angular structure of this type of process, and could be used to study, or even learn, such radiation patterns. In addition, differences in representation might help distinguish between such non-resonant, one-prong "QCD jets" and resonant, two-prong jets e.g. from the hadronic decay of the W and Z electroweak bosons.

We also note that, as alluded to in Section 3.3, for signals of interest in collider physics, a standard neural network with "wavenet" architecture contains an enormous number of free parameters, e.g. N_c \approx 4.4 \times 10^7 for N \times N = 64 \times 64 input, which is reduced to N_filt, i.e. as few as two, by the parametrisation in terms of the filter coefficients {a}.

We apply the learning procedure using Ref. [22], iterating over such "dijet" events pixelised in the \eta-\phi plane, and use back-propagation with gradient descent to learn the configuration of {a} which, for fixed N_filt, minimises the combined sparsity and regularisation in Eq. (9). This is shown in Fig. 3 for N_filt = 2.

It is seen that, for N_filt = 2, only one minimum exists, due to only one point in a_1-a_2 space fulfilling all five conditions (C1–5). This configuration has a_1 = a_2 = 1/\sqrt{2} and is exactly the Haar wavelet [30]. Although this is an instructive example allowing for clean visualisation, showing the clear effect of the gradient descent algorithm and the efficacy of the interpretation of conditions (C1–5) as quadratic regularisation terms, it also doesn't tell us much, since the global minimum will be the same for all classes of inputs. For N_filt > 2 the regularisation allows for minima in an effective hyperspace with dimension D > 0.
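The N_filt = 2 case can be reproduced qualitatively with a self-contained toy: minimise the combined objective of Eq. (9) over (a_1, a_2), starting from a point on the unit circle. The sketch below is illustrative only; it uses a short 1D toy signal rather than dijet images, finite-difference gradients rather than the back-propagation of Section 5, and arbitrary values for \lambda and the step size. Under these assumptions it is expected to settle near the Haar configuration a_1 = a_2 = 1/\sqrt{2}.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Multi-level periodic forward transform for N_filt = 2 (same conventions as
// the earlier sketches), returning the wavelet coefficients {c}.
static std::vector<double> wavelet(std::vector<double> f, const std::vector<double>& a)
{
    std::vector<double> b = {a[1], -a[0]};                     // Eq. (6) for N_filt = 2
    std::vector<double> c(f.size(), 0.0);
    while (f.size() > 1) {
        const std::size_t n = f.size();
        std::vector<double> low(n / 2, 0.0);
        for (std::size_t i = 0; i < n / 2; ++i)
            for (std::size_t k = 0; k < 2; ++k) {
                low[i]       += a[k] * f[(2 * i + k) % n];
                c[n / 2 + i] += b[k] * f[(2 * i + k) % n];
            }
        f = low;
    }
    c[0] = f[0];
    return c;
}

// Toy objective J = S + lambda * R of Eq. (9), with S from Eqs. (8), (10) and
// the regularisation terms written out explicitly for N_filt = 2.
static double objective(const std::vector<double>& a, const std::vector<double>& signal,
                        double lambda)
{
    std::vector<double> mag;
    for (double x : wavelet(signal, a)) mag.push_back(std::fabs(x));
    std::sort(mag.begin(), mag.end());
    const double Nc = (double) mag.size();
    double f = 0.0, g = 0.0;
    for (std::size_t i = 0; i < mag.size(); ++i) {
        f += (2.0 * i - Nc - 1.0) * mag[i];
        g += Nc * mag[i];
    }
    const double S  = 1.0 - f / g;                              // Eq. (10)
    const double r1 = a[0] + a[1] - std::sqrt(2.0);             // (C1)
    const double r2 = a[0] * a[0] + a[1] * a[1] - 1.0;          // (C2), m = 0 only
    const double r3 = r2;                                       // (C3) coincides with (C2) here
    const double r4 = a[1] - a[0];                              // (C4): b_0 + b_1
    return S + lambda * (r1 * r1 + r2 * r2 + r3 * r3 + r4 * r4);
}

int main()
{
    const std::vector<double> signal = {1, 3, 2, 2, 5, 7, 6, 4};
    const double lambda = 10.0, rate = 0.01, eps = 1e-5;        // illustrative settings
    std::vector<double> a = {std::cos(0.2), std::sin(0.2)};     // start on the unit circle
    for (int step = 0; step < 5000; ++step) {
        double grad[2];
        for (int j = 0; j < 2; ++j) {                           // finite-difference gradient
            std::vector<double> up = a, dn = a;
            up[j] += eps; dn[j] -= eps;
            grad[j] = (objective(up, signal, lambda) - objective(dn, signal, lambda)) / (2 * eps);
        }
        for (int j = 0; j < 2; ++j) a[j] -= rate * grad[j];     // gradient-descent update
    }
    // Expected to end near the Haar configuration (0.7071, 0.7071).
    std::printf("a = (%.4f, %.4f)\n", a[0], a[1]);
    return 0;
}
```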

Instead choosing N_filt = 16 we can perform the same optimisation, but now with sufficient capacity of the wavelet basis to encode the defining features of this class of signals. The effect of the learning procedure is presented in Figure 4, showing a selection of the lowest-scale wavelet basis functions corresponding to particular filter coefficient configurations at the beginning of, during, and at convergence of the learning procedure in this higher-dimensional search space.

The random initialisation on the unit hyper-sphere is shown to produce random noise (Figure 4a), which does not correspond to a wavelet basis, since the algorithm has not yet been afforded time to update the filter coefficients to conform with the regularisation requirements. At some point roughly half way through the training, the filter coefficient configuration does indeed yield an orthonormal wavelet basis (Figure 4b), and the learning procedure now follows the gradients towards greater sparsity along a high-dimensional, quadratic regularisation "valley". Finally, at convergence, the optimal wavelet found is again seen to be exactly the Haar wavelet (Figure 4c), despite the vast amount of freedom provided to the algorithm by virtue of 16 filter coefficients. That is, the learning procedure arrives at the optimal configuration by setting 14 filter coefficients to exactly zero without any manual tuning.

This result shows that limiting the support of the basis functions provides for more efficient representation than any deviations due to radiation patterns could compensate for. Indeed, it can be shown that, removing some of the conditions (C1–5) so as to ensure that {a} simply corresponds to an orthonormal basis (i.e. not necessarily an orthonormal wavelet basis), the learning procedure results in the pixel basis, i.e. the one in which each basis function corresponds to a single entry in the input array. This shows that, due to the fact that QCD showers are fundamentally point-like (due to the constituent particles) and since they, to leading order, are dominated by a few particles carrying the majority of the energy in the jet, the representation which best allows for the representation of single particles will prove optimal according to our chosen measure, Eq. (8). However, since this example studies the optimal representation of entire events, its conclusions may change for inputs restricted to a certain region in \eta-\phi space around a particular jet, i.e. for the study of optimal representation of jets themselves.

Figure 3: Map of the average total cost (regularisation and sparsity) for QCD 2 → 2 events with \hat{p}_\perp > 280 GeV, for only two filter coefficients a_{1,2}. Initial configurations are generated on the unit circle in the a_1-a_2 plane (red dots on dashed red line), to initially satisfy condition (C2), and better configurations are then learned iteratively (solid black lines) by using back-propagation with gradient descent, until a minimum (blue dot(s)) is found.

(a) Initial configuration  (b) Intermediate configuration  (c) Final configuration

Figure 4: Examples of the 64 lowest-scale 2D wavelet basis functions, found by optimisation on rasterised QCD 2 → 2 events with \hat{p}_\perp > 280 GeV in an N_filt = 16-dimensional filter coefficient space, (a) at initialisation, (b) at an intermediate point during training and (c) at termination of the learning procedure upon convergence.

Acknowledgments

The author is supported by the Scottish Universities Physics Alliance (SUPA) Prize Studentship. The author would like to thank Troels C. Petersen for insightful discussions on the subject matter, and James W. Monk for providing Monte Carlo samples.

References

[1] A. Søgaard, Boosted bosons and wavelets, M.Sc. thesis, University of Copenhagen (August 2015).
[2] J. Morlet et al., Wave propagation and sampling theory, Part I: Complex signal and scattering in multilayer media, J. Geophys. 47 (1982) 203–221.
[3] J. Morlet et al., Wave propagation and sampling theory, Part II: Sampling theory and complex waves, J. Geophys. 47 (1982) 222–236.
[4] A. Mojsilović, M. V. Popović, and D. M. Rackov, On the Selection of an Optimal Wavelet Basis for Texture Characterization, in: IEEE Transactions on Image Processing, Vol. 4, 2000.
[5] H. Qureshi, R. Wilson, and N. Rajpoot, Optimal Wavelet Basis for Wavelet Packets based Meningioma Subtype Classification, in: 12th Medical Image Understanding and Analysis (MIUA 2008), 2008.
[6] O. Pont, A. Turiel, and C. J. Pérez-Vicente, On optimal wavelet bases for the realization of microcanonical cascade processes, International Journal of Wavelets, Multiresolution and Information Processing 9 (1) (2011) 35–61.
[7] H. Thielemann, Optimally matched wavelets, Ph.D. thesis, Universität Bremen (March 2006).
[8] W. Sweldens, The Lifting Scheme: A Construction of Second Generation Wavelets, Journal on Mathematical Analysis 29 (2) (1997) 511–546.
[9] N. P. Hurley et al., Maximizing sparsity of wavelet representations via parameterized lifting, in: 15th International Conference on Digital Signal Processing, 2007, pp. 631–634.
[10] Y. Zhuang and J. S. Barras, Optimal wavelet basis selection for signal representation, in: Proc. SPIE, Vol. 2242 of Wavelet Applications, 1994, pp. 200–211.
[11] Y. Zhuang and J. S. Barras, Constructing optimal wavelet basis for image compression, in: IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 4, 1996, pp. 2351–2354.
[12] I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, Society for Industrial and Applied Mathematics (SIAM) (1992).
[13] A. H. Tewfik, D. Sinha, and P., On the Optimal Choice of a Wavelet for Signal Representation, in: IEEE Transactions on Information Theory, Vol. 38, 1992.
[14] R. A. Gopinath, J. E. Odegard, and C. S. Burrus, Optimal wavelet representation of signals and the wavelet sampling theorem, in: IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 41, 1994.
[15] S. G. Mallat, A Theory for Multiresolution Signal Decomposition: The Wavelet Representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (7) (1989) 674–693.
[16] Y. Meyer, Wavelets and Operators, Cambridge University Press, 1992.
[17] A. Jensen and A. la Cour-Harbo, Ripples in Mathematics: the Discrete Wavelet Transform, Springer, 2001.
[18] J. Williams and K. Amaratunga, Introduction to Wavelets in Engineering, Int. Journ. Num. Meth. Eng. 37 (14) (1994) 2365–2388.
[19] D. S. G. Pollock, The Framework of a Dyadic Wavelets Analysis, retrieved from http://www.le.ac.uk/users/dsgp1/SIGNALS/MoreWave.pdf on September 3, 2018.
[20] C. M. Bishop, Neural networks for pattern recognition, Clarendon Press, 1995.
[21] N. P. Hurley and S. T. Rickard, Comparing Measures of Sparsity, IEEE Transactions on Information Theory 55 (10) (2009) 4723–4741.
[22] A. Søgaard, "Wavenet" package, retrieved from www.github.com/asogaard/Wavenet on September 3, 2018 (2017).
[23] B. Stroustrup, The C++ Programming Language, Pearson Education India, 1995.
[24] C. Sanderson and R. Curtin, Armadillo: a template-based C++ library for linear algebra, Journal of Open Source Software 1 (26).
[25] R. Brun and F. Rademakers, ROOT - An Object Oriented Data Analysis Framework, Nucl. Inst. & Meth. in Phys. Res. A 389 (1997) 81–86, see also http://root.cern.ch/.
[26] M. Dobbs and J. B. Hansen, The HepMC C++ Monte Carlo Event Record for High Energy Physics, Comput. Phys. Commun. 134 (41).
[27] T. Sjöstrand, S. Mrenna, and P. Skands, A Brief Introduction to PYTHIA 8.1, JHEP 05 (026).
[28] T. Sjöstrand, S. Mrenna, and P. Skands, A Brief Introduction to PYTHIA 8.1, Comput. Phys. Comm. 178 (852), arXiv:0710.3820.
[29] A. Buckley et al., General-purpose event generators for LHC physics, Phys. Rept. 504 145–233.
[30] A. Haar, Zur Theorie der orthogonalen Funktionensysteme, Mathematische Annalen 69 (3) (1910) 331–371.
