
Learning Sparse Wavelet Representations

Daniel Recoskie∗ and Richard Mann
University of Waterloo, David R. Cheriton School of Computer Science, Waterloo, Canada
{dprecosk, mannr}@uwaterloo.ca

arXiv:1802.02961v1 [cs.LG] 8 Feb 2018

Abstract

In this work we propose a method for learning wavelet filters directly from data. We accomplish this by framing the discrete wavelet transform as a modified convolutional neural network. We introduce an autoencoder wavelet transform network that is trained using gradient descent. We show that the model is capable of learning structured wavelet filters from synthetic and real data. The learned wavelets are shown to be similar to traditional wavelets that are derived using Fourier methods. Our method is simple to implement and easily incorporated into neural network architectures. A major advantage of our model is that we can learn from raw audio data.

1 Introduction

The wavelet transform has several useful properties that make it a good choice for a feature representation, including a linear time algorithm, perfect reconstruction, and the ability to tailor wavelet functions to the application. However, the wavelet transform is not widely used in the machine learning community. Instead, methods like the Fourier transform and its variants are often used (e.g. [5]). We believe that one cause of the lack of use is the difficulty in designing and selecting appropriate wavelet functions. Wavelet filters are typically derived analytically using Fourier methods. Furthermore, there are many different wavelet functions to choose from. Without a deep understanding of wavelet theory, it can be difficult to know which wavelet to choose. This difficulty may lead many to stick to simpler methods.

We propose a method that learns wavelet functions directly from data using a neural network framework. As such, we can leverage the theoretical properties of the wavelet transform without the difficult task of designing or choosing a wavelet function. An advantage of this method is that we are able to learn directly from raw audio data. Learning from raw audio has shown success in audio generation [10].

We are not the first to propose using wavelets in neural network architectures. There has been previous work in using fixed wavelet filters in neural networks, such as the wavelet network [16] and the scattering transform [9]. Unlike our proposed method, these works do not learn wavelet functions from data.

∗ Thanks to NSERC for funding.

One notable work involving learning wavelets

can be found in [12]. Though the authors also propose learning wavelets from data, there are several differences from our work. One major difference is that second generation wavelets are considered instead of the traditional (first generation) wavelets considered here [15]. Secondly, the domain of the signals was over the vertices of graphs, as opposed to R.

We begin our discussion with the wavelet transform. We will provide some mathematical background as well as outline the discrete wavelet transform algorithm. Next, we outline our proposed wavelet transform model. We show that we can represent the wavelet transform as a modified convolutional neural network. We then evaluate our model by demonstrating that we can learn useful wavelet functions by using an architecture similar to traditional autoencoders [6].

2 Wavelet transform

We choose to focus on a specific type of linear time-frequency transform known as the wavelet transform. The wavelet transform makes use of a dictionary of wavelet functions that are dilated and shifted versions of a mother wavelet. The mother wavelet, ψ, is constrained to have zero mean and unit norm. The dilated and scaled wavelet functions are of the form

    \psi_j[n] = \frac{1}{2^j} \psi\left(\frac{n}{2^j}\right)    (1)

where n, j ∈ Z. The discrete wavelet transform is defined as

    W x[n, 2^j] = \sum_{m=0}^{N-1} x[m] \psi_j^*[m - n]    (2)

for a discrete real signal x.

The wavelet functions can be thought of as a bandpass filter bank. The wavelet transform is then a decomposition of a signal with this filter bank. Since the wavelets are bandpass, we require the notion of a lowpass scaling function that is the sum of all wavelets above a certain scale j in order to fully represent the signal. We define the scaling function, φ, such that its Fourier transform, φ̂, satisfies

    |\hat{\phi}(\omega)|^2 = \int_1^{+\infty} \frac{|\hat{\psi}(s\omega)|^2}{s} \, ds    (3)

with the phase of φ̂ being arbitrary [8].

The discrete wavelet transform and its inverse can be computed via a fast decimating algorithm. Let us define two filters

    h[n] = \left\langle \frac{1}{\sqrt{2}} \phi\left(\frac{t}{2}\right), \phi(t - n) \right\rangle    (4)

    g[n] = \left\langle \frac{1}{\sqrt{2}} \psi\left(\frac{t}{2}\right), \phi(t - n) \right\rangle    (5)

The following equations connect the wavelet coefficients to the filters h and g, and give rise to a recursive algorithm for computing the wavelet transform.

Wavelet Filter Bank Decomposition:

    a_{j+1}[p] = \sum_{n=-\infty}^{+\infty} h[n - 2p] a_j[n]    (6)

    d_{j+1}[p] = \sum_{n=-\infty}^{+\infty} g[n - 2p] a_j[n]    (7)

Wavelet Filter Bank Reconstruction:

    a_j[p] = \sum_{n=-\infty}^{+\infty} h[p - 2n] a_{j+1}[n] + \sum_{n=-\infty}^{+\infty} g[p - 2n] d_{j+1}[n]    (8)
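The recursion in Equations 6-8 can be sketched directly in code. The following is a minimal NumPy illustration of one decomposition step and one reconstruction step, using circular (periodic) indexing in place of the doubly infinite sums and the Haar filter pair as a concrete example; it is a sketch of the algorithm, not the implementation used in this paper.

```python
import numpy as np

def dwt_step(a, h, g):
    # One level of Equations 6 and 7: correlate with h and g, then
    # downsample by two. Circular indexing stands in for the infinite sums.
    N = len(a)
    a_next = np.zeros(N // 2)
    d_next = np.zeros(N // 2)
    for p in range(N // 2):
        for m in range(len(h)):
            a_next[p] += h[m] * a[(m + 2 * p) % N]
            d_next[p] += g[m] * a[(m + 2 * p) % N]
    return a_next, d_next

def idwt_step(a_next, d_next, h, g):
    # One level of Equation 8: upsample by two and filter with h and g.
    N = 2 * len(a_next)
    a = np.zeros(N)
    for n in range(len(a_next)):
        for m in range(len(h)):
            a[(m + 2 * n) % N] += h[m] * a_next[n] + g[m] * d_next[n]
    return a

# Haar filters, the simplest pair satisfying the equations.
h = np.array([1.0, 1.0]) / np.sqrt(2.0)
g = np.array([1.0, -1.0]) / np.sqrt(2.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
approx, detail = dwt_step(x, h, g)
x_rec = idwt_step(approx, detail, h, g)
print(np.allclose(x, x_rec))  # True: the decomposition is perfectly invertible
```

Iterating `dwt_step` on the approximation coefficients yields the full multi-scale transform described below.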

We call a and d the approximation and detail coefficients respectively. The detail coefficients are exactly the wavelet coefficients defined by Equation 2. As shown in Equations 6 and 7, the wavelet coefficients are computed by recursively computing the coefficients at each scale, with a_0 initialized with the signal x. At each step of the algorithm, the signal is split into high and low frequency components by convolving the approximation coefficients with h (scaling filter) and g (wavelet filter). The low frequency component becomes the input to the next step of the algorithm. Note that a_i and d_i are downsampled by a factor of two at each iteration. An advantage of this algorithm is that we only require two filters instead of an entire filter bank. The wavelet transform effectively partitions the signal into frequency bands defined by the wavelet functions. We can reconstruct a signal from its wavelet coefficients using Equation 8. We call the reconstruction algorithm the inverse discrete wavelet transform. A thorough treatment of the wavelet transform can be found in [8].

Figure 1: The discrete wavelet transform represented as a neural network. This network computes the discrete wavelet transform of its input using Equations 6 and 7.

3 Proposed Model

We propose a method for learning wavelet functions by defining the discrete wavelet transform as a convolutional neural network (CNN). CNNs compute a feature representation of an input signal through a cascade of filters. They have seen success in many signal processing tasks, such as speech recognition and music classification [13, 3]. Generally, CNNs are not applied directly to raw audio data. Instead, a transform is first applied to the signal (such as the short-time Fourier transform). This representation is then fed into the network.

Our proposed method works directly on the raw audio signal. We accomplish this by implementing the discrete wavelet transform as a modified CNN. Figure 1 shows a graphical representation of our model, which consists of repeated applications of Equations 6 and 7. The parameters (or weights) of this network are the wavelet and scaling filters g and h. Thus, the network computes the wavelet coefficients of a signal, but allows the wavelet filter to be learned from the data. We can similarly define an inverse network using Equation 8.

We can view our network as an unrolling of the discrete wavelet transform algorithm, similar to unrolling a recurrent neural network (RNN) [11]. Unlike an RNN, our model takes as input the entire input signal and reduces the scale at every layer through downsampling. Each layer of the network corresponds to one iteration of the algorithm. At each layer, the detail coefficients are passed directly to the final layer. The

final layer output, denoted W(x), is formed as a concatenation of all the computed detail coefficients and the final approximation coefficients. We propose that this network be used as an initial module as part of a larger neural network architecture. This would allow a neural network architecture to take as input raw audio data, as opposed to some transformed version.

We restrict ourselves to quadrature mirror filters. That is, we set

    g[n] = (-1)^n h[-n]    (9)
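As a small illustration of Equation 9, the wavelet filter can be generated from the scaling filter in one line. For a finite-length filter, h[-n] is taken here to mean the time-reversed filter; this is an assumed discrete convention (conventions differ by an index shift), not something the paper specifies.

```python
import numpy as np

def qmf(h):
    # Equation 9: g[n] = (-1)^n h[-n].
    # For a finite filter we read h[-n] as time reversal (an assumed
    # convention; discrete QMF definitions differ by an index shift).
    n = np.arange(len(h))
    return ((-1.0) ** n) * h[::-1]

h_haar = np.array([1.0, 1.0]) / np.sqrt(2.0)  # Haar scaling filter
g_haar = qmf(h_haar)
print(g_haar)  # the Haar wavelet filter: [ 0.7071..., -0.7071...]
```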

By making this restriction, we reduce our parameters to only the scaling filter h.

The model parameters will be learned by gradient descent. As such, we must introduce constraints that will guarantee the model learns wavelet filters. We define the wavelet constraints as

    L_w(h, g) = (\|h\|_2 - 1)^2 + (\mu_h - \sqrt{2}/k)^2 + \mu_g^2    (10)

where µ_h and µ_g are the means of h and g respectively, and k is the length of the filters. The first two terms correspond to finite L2 and L1 norms respectively. The third term is a relaxed orthogonality constraint. Note that these are soft constraints, and thus the filters learned by the model are only approximately wavelet filters. See Figure 2 for examples of randomly chosen wavelet functions derived from filters that minimize Equation 10. We have not explored the connection between the space of wavelets that minimize Equation 10 and those of parameterized wavelet families [2].

Figure 2: Examples of random wavelet functions that satisfy Equation 10 for different filter lengths.

4 Evaluation

We will evaluate our wavelet model by learning wavelet filters that give sparse representations. We achieve this by constructing an autoencoder as illustrated in Figure 3. Autoencoders are used in unsupervised learning in order to learn useful data representations [6]. Our autoencoder is composed of a wavelet transform network followed by an inverse wavelet transform network. The loss is made up of a reconstruction term, a sparsity term, and the wavelet constraints. Let x̂_i denote the reconstructed signal. The loss function is defined as

    L(X; g, h) = \frac{1}{M} \sum_{i=1}^{M} \|x_i - \hat{x}_i\|_2^2 + \lambda_1 \frac{1}{M} \sum_{i=1}^{M} \|W(x_i)\|_1 + \lambda_2 L_w(h, g)    (11)

for a dataset X = {x_1, x_2, ..., x_M} of fixed length signals. In our experiments, we fix λ_1 = λ_2 = 1/2.

We conducted experiments on synthetic and real data. The real data consists of segments taken from the MIDI aligned piano dataset

(MAPS) [4]. The synthetic data consists of harmonic data generated from simple periodic waves. We construct a synthetic signal, x_i, from a base periodic wave function, s, as follows:

    x_i(t) = \sum_{k=0}^{K-1} a_k \cdot s(2^k t + \phi_k)    (12)

where φ_k ∈ [0, 2π] is a phase offset chosen uniformly at random, and a_k ∈ {0, 1} is the kth harmonic indicator, which takes the value 1 with probability p. We considered three different base waves: sine, sawtooth, and square. A second type of synthetic signal was created similarly to Equation 12 by windowing the base wave at each scale with randomly centered Gaussian windows (multiple windows at each scale were allowed). The length of the learned filters is 20 in all trials. The length of each x_i is 1024 and we set K = 5, p = 1/2, and M = 32000. We implemented our model using Google's TensorFlow library [1]. We make use of the Adam algorithm for stochastic gradient descent [7]. We use a batch size of 32 and run the optimizer until convergence.

Figure 3: The reconstruction network is composed of a wavelet transform followed by an inverse wavelet transform.

The wavelet filters learned are unique to the type of data that is being reconstructed. Example wavelet functions are included in Figure 4. These functions are computed from the scaling filter coefficients using the cascade algorithm [14]. Note that the learned functions are highly structured, unlike the random wavelet functions in Figure 2.

In order to compare the learned wavelets to traditional wavelets, we first define a distance measure between filters of length k:

    \mathrm{dist}(h_1, h_2) = \min_{0 \le i < k} \left( 1 - \langle h_1, \mathrm{shift}(h_2, i) \rangle \right)    (13)

where shift(h_2, i) is h_2 circularly shifted by i samples. This measure is the minimum cosine distance under all circular shifts of the filters. To compare filters of different lengths, we zero-pad the shorter filter to the length of the longer. We restrict our consideration to the following traditional wavelet families: Haar, Daubechies, Symlets, and Coiflets. The middle column of Figure 4 shows the closest traditional wavelet to the learned wavelets according to Equation 13. The distances are listed in the right column.

Figure 4: Left column: Learned wavelet (solid) and scaling (dashed) functions. Middle column: Closest traditional wavelet (solid) and scaling (dashed) functions. Right column: Plots of the scaling filters from the first two columns with corresponding distance measure.

In order to determine how well the learned wavelets capture the structure of the training data, we consider signals randomly generated from the learned wavelets. To generate signals, we begin by sparsely populating wavelet coefficients from [−1, 1]. The coefficients from the three highest frequency scales are then set to zero. Finally, the generated signal is obtained by performing an inverse wavelet transform of the sparse coefficients. Qualitative results are shown in Figure 5. Typical training examples are shown in the left column. Example generated signals are shown in the right column. Note that the generated signals have visually similar structure to the training examples. This provides evidence that the learned wavelets have captured the structure of the data.

Figure 5: Left column: Examples of synthetic training signals. Right column: Examples of signals generated from the corresponding learned wavelet filters. Base waves from top to bottom: sine, square, sawtooth.

5 Conclusion

We have proposed a new model capable of learning useful wavelet representations from data. We accomplish this by framing the wavelet transform as a modified CNN. We show that we can learn useful wavelet filters by gradient descent, as opposed to the traditional derivation of wavelets using Fourier methods. The learned wavelets are able to capture the structure of the data. We hope that our work leads to wider use of the wavelet transform in the machine learning community.

Framing our model as a neural network has the benefit of allowing us to leverage deep learning software frameworks, and also allows for simple integration into existing neural network architectures. An advantage of our method is the ability to learn directly from raw audio data, instead of relying on a fixed representation such as the Fourier transform.
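To make the training objective of Section 4 concrete, the terms of Equations 10 and 11 can be written out as below. This NumPy sketch treats filters and coefficients as plain arrays; the helper names (`wavelet_constraint`, `total_loss`) and the batch layout are illustrative assumptions, not the authors' TensorFlow implementation.

```python
import numpy as np

def wavelet_constraint(h, g):
    # Equation 10: soft constraints pushing (h, g) toward wavelet filters:
    # unit L2 norm for h, mean of h near sqrt(2)/k, and zero mean for g.
    k = len(h)
    return ((np.linalg.norm(h) - 1.0) ** 2
            + (np.mean(h) - np.sqrt(2.0) / k) ** 2
            + np.mean(g) ** 2)

def total_loss(xs, xhats, coeffs, h, g, lam1=0.5, lam2=0.5):
    # Equation 11: mean squared reconstruction error, mean L1 sparsity of
    # the wavelet coefficients W(x), and the wavelet constraint term.
    M = len(xs)
    recon = sum(np.sum((x - xh) ** 2) for x, xh in zip(xs, xhats)) / M
    sparsity = sum(np.sum(np.abs(c)) for c in coeffs) / M
    return recon + lam1 * sparsity + lam2 * wavelet_constraint(h, g)

# The Haar pair satisfies the constraints exactly, so the penalty vanishes.
h = np.array([1.0, 1.0]) / np.sqrt(2.0)
g = np.array([1.0, -1.0]) / np.sqrt(2.0)
print(wavelet_constraint(h, g) < 1e-12)  # True
```

Minimizing `total_loss` by gradient descent over h (with g tied to h through Equation 9) is the training procedure the paper describes.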

References

[1] Martín Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[2] C. Sidney Burrus, Ramesh A. Gopinath, and Haitao Guo. Introduction to Wavelets and Wavelet Transforms: A Primer. Prentice-Hall, Inc., 1997.

[3] Keunwoo Choi, György Fazekas, Mark Sandler, and Kyunghyun Cho. Convolutional recurrent neural networks for music classification. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, pages 2392–2396. IEEE, 2017.

[4] Valentin Emiya, Roland Badeau, and Bertrand David. Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle. IEEE Transactions on Audio, Speech, and Language Processing, 18(6):1643–1654, 2010.

[5] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 6645–6649. IEEE, 2013.

[6] Geoffrey E. Hinton and Ruslan R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

[7] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[8] Stéphane Mallat. A Wavelet Tour of Signal Processing, Third Edition: The Sparse Way. Academic Press, 3rd edition, 2008.

[9] Stéphane Mallat. Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10):1331–1398, 2012.

[10] Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.

[11] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pages 1310–1318, 2013.

[12] Raif Rustamov and Leonidas J. Guibas. Wavelets on graphs via deep learning. In Advances in Neural Information Processing Systems, pages 998–1006, 2013.

[13] Tara N. Sainath, Abdel-rahman Mohamed, Brian Kingsbury, and Bhuvana Ramabhadran. Deep convolutional neural networks for LVCSR. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 8614–8618. IEEE, 2013.

[14] Gilbert Strang and Truong Nguyen. Wavelets and Filter Banks. SIAM, 1996.

[15] Wim Sweldens. The lifting scheme: A construction of second generation wavelets. SIAM Journal on Mathematical Analysis, 29(2):511–546, 1998.

[16] Qinghua Zhang and Albert Benveniste. Wavelet networks. IEEE Transactions on Neural Networks, 3(6):889–898, 1992.
