Learnable Group Transform For Time-Series

Romain Cosentino¹  Behnaam Aazhang¹

¹Department of Electrical and Computer Engineering, Rice University, USA. Correspondence to: Romain Cosentino.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

Abstract

We propose a novel approach to filter bank learning for time-series by considering spectral decompositions of signals defined as a Group Transform. This framework allows us to generalize classical time-frequency transformations such as the wavelet transform, and to efficiently learn the representation of signals. While the creation of the wavelet transform filter bank relies on affine transformations of a mother filter, our approach allows for non-linear transformations. The transformations induced by such maps enable us to span a larger class of signal representations, from wavelet to chirplet-like filters. We propose a parameterization of such a non-linear map such that its sampling can be optimized for a specific task and signal. The Learnable Group Transform can be cast into a Deep Neural Network. Experiments on diverse time-series datasets demonstrate the expressivity of this framework, which competes with state-of-the-art performances.

1. Introduction

To this day, the front-end processing of time-series remains a keystone toward the improvement of a wealth of applications such as health-care (Saritha et al., 2008), environmental sound (Balestriero et al., 2018; Lelandais & Glotin, 2008), and seismic data analysis (Seydoux et al., 2016). The common denominator of the recorded signals in these fields is their undulatory behavior. While these signals share this common behavior, two significant factors imply the need for learning the representation: 1) time-series are intrinsically different because of their physical nature, and 2) the machine learning task can differ even within the same type of data. Therefore, the representation should be induced by both the signal and the task at hand.

A common approach to performing inference on time-series consists of building a Deep Neural Network (DNN) that operates on a spectral decomposition of the time-series, such as the wavelet transform (WT) or Mel Frequency Spectral Coefficients (MFSC). While the use of these decompositions is extensive, we show in Section 2 their inherent biases and motivate the development of a generalized framework. The selection of the judicious transform is either performed by an expert on the signal at hand, or by considering filter selection methods (Coifman & Wickerhauser, 1992; Mallat & Zhang, 1993; Gribonval & Bacry, 2003). However, an inherent drawback is that the selection of the filters decomposing the signals is often achieved with criteria that do not align with the task, for instance, a selection based on the sparsity of the representation while the task is the classification of the signals. Besides, these selection methods and transformations require substantial cross-validation of a large number of hyperparameters, such as the mother filter family, the number of octaves, the number of wavelets per octave, and the size of the window (Le & Argoul, 2004; Cosentino et al., 2017).

In this work, we alleviate these drawbacks by proposing a simple and efficient approach based on a generalization of these spectral decompositions. They all consist of taking inner products between filters and the signals; from one decomposition to the other, only the filter bank differs. The filters of well-known spectral decompositions, such as the short-time Fourier transform (STFT) and the continuous wavelet transform (CWT), are built following a particular scheme: each filter is the result of the action of a transformation map on a selected mother filter, e.g., a Gabor filter. If the transformation map is induced by a group, the representation is called a Group Transform (GT), and the group together with the mother filter characterizes the decomposition.

We propose to enable the learnability of such a scheme. More precisely, our contributions are: 1) we generalize common Group Transforms by proposing the utilization of strictly increasing and nonlinear transformations; 2) we draw the connection between filters that can be learned by our framework and commonly observed filters in biological time-series;

3) we show how the equivariance properties of the representation differ from those of traditional affine transformations (Section 3.1); 4) we propose an efficient way of optimizing the sampling of such a functional space (Section 3.2); and 5) we apply our method to three datasets containing complementary challenges: a) artificial data showing the limitations and drawbacks of well-known GTs, b) a large bird detection dataset (≈ 20 hours of audio recording, 20× larger than CIFAR10 in terms of number of scalar values in the dataset) where optimal spectral decompositions are known and developed by experts, and c) a haptic dataset that does not benefit from expert knowledge regarding important features (Section 4).

Figure 1. Time-Frequency Tilings at a given time τ: (left) short-time Fourier transform, i.e., constant bandwidth; (middle) wavelet transform, i.e., proportional bandwidth; (right) Learnable Group Transform, i.e., adaptive bandwidth, where the "tiling" is induced by the learned non-linear transformation underlying the filter bank.

We can summarize our approach as follows (a minimal code sketch of these steps is given below):

• given a filter ψ with its analytical formula,
• generate increasing and continuous maps using a 1-layer ReLU network (the number of increasing and continuous maps will be the number of filters in the filter bank),
• compose the increasing and continuous maps with the filter ψ,
• convolve the filters obtained with the signal to acquire the representation.
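These four steps can be sketched in a few lines of NumPy; the helper below is illustrative only (the Morlet parameters f0 and sigma, the way the warps are sampled, and the use of np.convolve are our assumptions, not the paper's exact implementation):

```python
import numpy as np

def morlet(t, f0=5.0, sigma=1.0):
    # Complex Morlet-like mother filter, evaluated at (possibly warped) times t.
    return np.pi ** -0.25 * np.exp(2j * np.pi * f0 * t) * np.exp(-0.5 * (t / sigma) ** 2)

def group_transform(signal, warps):
    # warps: K increasing maps, each sampled on the filter's time grid,
    # so that morlet(g) evaluates the composition (psi o g_k).
    filters = [morlet(g) for g in warps]
    # Convolve each warped filter with the signal: one row per filter.
    return np.stack([np.convolve(signal, f, mode="same") for f in filters])
```

Taking the complex modulus of the output rows then yields a scalogram-like image such as the one shown in Figure 2 (right).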
2. Related Work and Background

One approach to representing the data consists of building equivariant-invariant representations. For instance, (Mallat, 2012; Bruna, 2013) propose a translation-invariant representation, the Scattering Transform, which is stable under the action of small diffeomorphisms. (Oyallon et al., 2018; Cohen & Welling, 2016) focus on equivariant-invariant representations for images, which reduce the sample complexity and endow DNN layers with interpretability.

The closest work to ours consists of learning the filter bank in an end-to-end fashion. (Cakir et al., 2016; Ravanelli & Bengio, 2018; Balestriero et al., 2018; Zeghidour et al., 2018) investigated the learnability of a mother filter such that it can be jointly optimized with the DNN. In order to build the filter bank, this learnable mother filter is transformed by deterministic affine maps, and the representation of the signal is obtained by convolving the filter bank elements with the signals. Recently, (Khan & Yener, 2018) investigated the learnability of the affine transformations, that is, the sampling of the dilation parameter of the affine group inducing the wavelet filter bank. Optimized jointly with the DNN, their method allows for an adaptive transformation of the mother filter. Our work generalizes this approach and provides its theoretical properties and building blocks.

One of the main drawbacks of these approaches using time-frequency representations is that the filter bank induces a bias that might not be adapted to the data. This bias can be understood by considering the time-frequency tiling of each GT. It is known that the spread of a filter and that of its Fourier transform are inversely proportional, as per the Heisenberg uncertainty principle (Mallat, 1999). Following this principle, we can observe that in the case of the STFT (respectively, the WT with a Gabor wavelet), at a given time τ, the signal is transformed by a window of constant bandwidth (respectively, proportional bandwidth) modulated by a complex exponential, resulting in a uniform (respectively, proportional) tiling of the frequency axis, Figure 1. This implies, for instance, that in the case of the WT, the precision in frequency degrades as the frequency increases while the precision in time increases (Mallat, 1999). Thus, the WT is not adapted to fast-varying frequency signals (Xu et al., 2016). In the case of the STFT, the uniform tiling implies that the precision is constant along the frequency axis. In our proposed framework, the LGT allows for an adaptive tiling, as illustrated in Figure 1, such that the trade-off between time and frequency precision depends on the task and data.

3. Learnable Group Transform

Common time-frequency filter banks are built by transforming a mother filter that we denote by ψ. We consider the transformations of this mother filter defined as ψ ◦ g, g ∈ F, where F defines the functional space of the transformations and ψ ◦ g denotes function composition. Note that in signal processing, such a transformation is called warping (Goldenstein & Gomes, 1999; Kerkyacharian et al., 2004). Given a space F, the filter bank with K filters is created by first sampling K transformation maps from F and then transforming the mother filter:

{ψ ◦ g_1, . . . , ψ ◦ g_K | g_1, . . . , g_K ∈ F}.

Now, let us denote a signal by s ∈ L^2(R). We consider the representation of the signal as the result of its convolution with the filter bank elements and denote it by

W[s, ψ](g, ·) = [W[s, ψ](g_1, ·), . . . , W[s, ψ](g_K, ·)]^T,

where

W[s, ψ](g, ·) = s ⋆ (ψ ◦ g), ∀g ∈ F,

with ⋆ the convolution operator and (·) corresponding to the time axis.

Figure 2. Learnable Group Transform: (left) generating the strictly increasing continuous functions g_θk with parameters θ_k, ∀k ∈ {1, . . . , K}, where K denotes the number of filters in the filter bank; the x-axis is the time variable and the y-axis the amplitude. (middle) The mother filter ψ (here a Morlet wavelet) is composed with each warping function g_θk, where the imaginary part is shown in red and the real part in blue; the x-axis represents time and the y-axis the amplitude of the filter. These transformations lead to the filter bank (only the k-th element is displayed). Then, the convolutions between the filter bank elements and the signal s_i lead to the LGT of the signal (right); the black box on the LGT representation corresponds to the convolution of the k-th filter with the signal. In this figure, the horizontal axis corresponds to time, each row corresponds to the convolution with one filter of the filter bank, and the color displays the amplitude of each inner product; note that a complex modulus has been applied to the LGT. The strictly increasing and continuous piecewise linear functions can be learned efficiently by back-propagating the error through the generated GT.

Therefore, the properties of the representation are carried by the mother filter ψ and the space F. In this work, we focus on warpings that generalize common time-frequency decompositions, as well as on the properties carried by the associated filter bank; in particular, we consider nonlinear warpings. We provide a parameterization of such warpings and show how one can efficiently learn their parameters. The decomposition of the signal by this learned filter bank defines a Group Transform. The overall building blocks of the LGT and its application to a signal are depicted in Figure 2.

3.1. Time Warped Filters

We propose to transform the mother filter by means of a subset of invertible maps on R. Instead of the affine warping used in the WT, we propose the use of a more general transformation map space F. In particular, we will use the space of strictly increasing and continuous functions defined as

C^0_inc(R) = {g ∈ C^0(R) | g is strictly increasing},

where C^0(R) denotes the space of continuous functions on R. This set of functions is composed of invertible maps, which is crucial in order to derive invariance properties as well as to avoid artifacts in the transformed filters. The transformation of a mother filter ψ is defined by the linear operator ρ_inc(g) such that

ρ_inc(g)ψ = ψ ◦ g, g ∈ C^0_inc(R).

By construction, this space allows for non-linear transformations of a mother filter; an example of such a warping can be visualized in Figure 3. In the next paragraph, we introduce some filters that can be recovered using this transformation map. For some of these filters, the estimation of their parameters has been investigated (Gribonval, 2001; Wang & Jiang, 2008; Xu et al., 2016); however, our method provides two benefits: first, the generalization, which alleviates the need to select a specific type of filter bank, and second, the scalability, leading to a learnable filter bank.

Recovering Standard Filter Banks: The space C^0_inc(R) allows us to span well-known transformations. In particular, a filter can inherit a particular chirpyness¹ from nonlinear transformations belonging to C^0_inc(R). This property is interesting for the decomposition of non-stationary and fast-varying signals; various signals, such as bird songs, speech, and sonar systems, include such intricate features (Flandrin, 2001). Among the possible transformations induced on a mother filter by a mapping g ∈ C^0_inc(R), some correspond to the well-known filters described in Table 1. For instance, consider the case where F is the space of linear functions with positive slope, defined as ∀g ∈ F, g(t) = t/λ, where λ is positive. In this case, we recover the transformation leading to the dilation or contraction of a wavelet mother filter.

¹Chirpyness is defined as the rate of change of the instantaneous frequency of the filter (Mann & Haykin, 1995).
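As a concrete illustration of Table 1 (and of Figure 3), the snippet below reuses the hypothetical morlet helper from the earlier sketch and contrasts an affine warp, which merely dilates the mother filter, with a convex increasing warp, which produces a chirplet-like filter; the particular map g is an arbitrary example, not one learned by the method.

```python
import numpy as np

t = np.linspace(-4.0, 4.0, 1024)

# Affine warp g(t) = t / lambda: recovers the dilation of the mother wavelet.
wavelet_filter = morlet(t / 2.0)

# Convex increasing warp (quadratic on t >= 0): the instantaneous frequency
# now increases with time, giving an increasing quadratic chirplet (Table 1).
g = np.where(t >= 0.0, t + 0.15 * t ** 2, t)  # continuous, strictly increasing here
chirplet_filter = morlet(g)
```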

Figure 3. Transformation of a Morlet Wavelet: for all the filters, the real part is shown in blue and the imaginary part in red. (left) Morlet wavelet mother filter. (middle) Transformation of the mother filter with respect to an affine transform: dilation parameter 0 < a < 1, i.e., contraction, and translation b = 0, i.e., no translation. (right) Increasing and continuous transformation of the mother filter for a randomly generated function g ∈ C^0_inc(R), leading to a chirplet-like filter.

Table 1. Recovering well-known filters

  g ∈ C^0_inc(R)       ψ ◦ g
  Affine               Wavelet
  Quadratic Convex     Increasing Quadratic Chirplet
  Quadratic Concave    Decreasing Quadratic Chirplet
  Logarithmic          Logarithmic Chirplet
  Exponential          Exponential Chirplet

The filter bank is then generated by sampling a few elements of the group. In the case of the dyadic wavelet transform, the dilation parameters follow a geometric progression of common ratio equal to 2, such that λ_k = 2^((k−1)/Q), k = 1, . . . , K, where K = J × Q, with J and Q the number of octaves and wavelets per octave, respectively. The filter bank obtained is {ψ(t/λ_1), . . . , ψ(t/λ_K)}, and the representation of the signal is obtained by convolutions between the filter bank elements and the signal. Equivalently, the space F can be defined as affine, and the WT is achieved by inner products between the filters and the signal.

While the WT filter bank can easily be recovered, our modelization of the filter bank does not allow for elements whose number of oscillations differs from the mother filter. To enable such a transformation, another function h with a different number of oscillations would have to be multiplied with the mother filter, such that h × (ψ ◦ g) provides the elements of the filter bank. Therefore, the STFT is not part of the representations that this framework encompasses.

In this work, we also consider the case where the representation of the signal is computed by convolutions. This representation has equivariance properties that are induced by the convolution operator as well as by the space C^0_inc(R).

Equivariance Properties of the Filter Bank: The equivariance-invariance properties of signal representations play a crucial role in the efficiency of the algorithm at hand, as they define how some variations in the signal may or may not be captured (Mallat, 2016). These properties can be intuitively explained and analyzed by considering the representation of the signal as a function of group elements. Details regarding the background of group theory and its link with wavelet analysis are provided in Appendix A. Considering the mapping ρ_inc(g)ψ = ψ ◦ g, g ∈ C^0_inc(R), as a group action on the space of the mother filter, i.e., L^2(R), or more precisely, as a representation of a group on L^2(R), we can develop the equivariance properties of the LGT. The proof that ρ_inc is in fact a representation is given in Appendix D.1.

We can consider the set C^0_inc(R), equipped with the composition of functions, to form the group of strictly increasing and continuous maps, denoted by G_inc. This formulation eases the derivation of the equivariance properties of group transforms, which can be defined for a group G, for all g, g′ ∈ G, by

W[ρ(g′)s_i, ψ](g, ·) = W[s_i, ψ]((g′)⁻¹ g, ·).

That is, transforming the signal with respect to the group G and computing its representation equals computing the representation of the signal and then transforming the representation. If G corresponds to the affine group, the associated group transform is the WT, which is equivariant to scalings and translations. One can already notice that since W(·, ·) employs convolution to decompose the signal, the LGT is translation equivariant for any group G. We now focus on more specific equivariance properties of the LGT by defining the local equivariance, for all g, g′ ∈ G, by

∃τ ∈ R, W[ρ(g′)s_i, ψ](g, τ) = W[s_i, ψ]((g′)⁻¹ g, τ).

That is, the representation of a local transformation of a signal in a window centered at τ equals the transformation of the representation at τ; the size of the window depends on the support of the filter. As a matter of fact, assuming that the representation of G_inc is unitary, we have the following proposition.
Proposition 1. The LGT is locally equivariant with respect to the action of the group G_inc.

The proof is given in Appendix D.

As we mentioned, a filter bank of K filters is created by sampling the space C^0_inc(R). We now show how this sampling can be achieved efficiently by proposing a parameterization of functions belonging to such a space.

3.2. Learning the Time Warping

In this work, we are specifically interested in the learnability of such an increasing and continuous map. We provide a way to sample this space via its parameterization: we use piecewise affine functions constrained to belong to the class of strictly increasing and continuous functions, which can be efficiently enforced by sorting the output of a 1-layer ReLU NN.

Adaptive Knot Implementation: To implement the nonlinear mapping induced by the representation of the piecewise affine group, we use the fact that a piecewise continuous function can be re-written as a 1-layer ReLU Neural Network (Arora et al., 2016; Yarotsky, 2017). Besides the computational advantages of such a relationship and the differentiability with respect to the weights of the NN, this model is a knot-free piecewise affine mapping, providing more flexibility regarding the warping function: instead of each affine piece of the function having uniform support, the support can vary. This flexibility induces better approximation properties (Jupp, 1978). The increasing constraint on the mapping is then implemented by sorting the output of the NN. This operation has O(n log n) complexity and is applied to the warped time, which is usually of size ≈ 2⁹.
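A minimal NumPy sketch of this parameterization for a single warp, assuming the map is evaluated on a fixed time grid; the shapes and any initialization are illustrative.

```python
import numpy as np

def relu_warp(t, a, b, w):
    """Evaluate a 1-layer ReLU network on the time grid t, then sort the
    outputs so that the sampled map is monotone (piecewise affine, knot-free).
    t: (n,) time grid; a, b, w: (knots,) network parameters."""
    h = np.maximum(np.outer(t, a) + b, 0.0)  # hidden ReLU units, shape (n, knots)
    g = h @ w                                # piecewise affine output, shape (n,)
    return np.sort(g)                        # O(n log n) monotonicity projection
```

In the notation of the earlier sketches, the k-th filter of Figure 2 is then morlet(relu_warp(t, a_k, b_k, w_k)). Note that sorting yields a non-decreasing map; strict monotonicity holds for generic parameter values.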

Objective Function and Learning: Let θ_k be the parameters of each increasing piecewise affine map computed by the NN, and denote by g_θk the sorted outputs of the NN. The LGT filter bank has the form

{ψ ◦ g_θ1, . . . , ψ ◦ g_θK}.

Given a set of signals {s_i ∈ L^2(R)}, i = 1, . . . , N, and a task-specific loss function L, we aim at solving the optimization problem

min_Θ Σ_{i=1}^{N} L( F( W[s_i, ψ](g_Θ, ·) ) ),

where Θ = (θ_1, . . . , θ_K), N denotes the number of signals, K the number of filters, F represents a DNN, and we recall that

W[s_i, ψ](g_Θ, ·) = [W[s_i, ψ](g_θ1, ·), . . . , W[s_i, ψ](g_θK, ·)]^T.

Since the g_θk are computed by sorting the output of the NN, the parameters can be learned by gradient descent jointly with the parameters of F.
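Below is a sketch of this end-to-end optimization in PyTorch, assuming a classification loss, a linear classifier in place of a deeper F, and a data loader yielding mini-batches (x, y); the module shapes, the Morlet parameters, and the temporal pooling are illustrative choices rather than the released implementation.

```python
import math
import torch

class WarpBank(torch.nn.Module):
    # K learnable warps, each a sorted 1-layer ReLU network with `knots` units
    # (a sketch; the random initialization is illustrative).
    def __init__(self, K=64, knots=256):
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(K, knots))
        self.b = torch.nn.Parameter(torch.randn(K, knots))
        self.w = torch.nn.Parameter(torch.randn(K, knots) / knots)

    def forward(self, t):                                             # t: (n,)
        h = torch.relu(self.a[:, :, None] * t + self.b[:, :, None])  # (K, knots, n)
        g = (self.w[:, :, None] * h).sum(dim=1)                      # (K, n)
        return torch.sort(g, dim=-1).values  # gradients flow through the sort

def morlet_parts(t, f0=5.0, sigma=1.0):
    # Real and imaginary parts of the warped Morlet filters.
    env = math.pi ** -0.25 * torch.exp(-0.5 * (t / sigma) ** 2)
    return env * torch.cos(2 * math.pi * f0 * t), env * torch.sin(2 * math.pi * f0 * t)

K, n = 64, 512
warper, clf = WarpBank(K, knots=256), torch.nn.Linear(K, 2)
opt = torch.optim.Adam(list(warper.parameters()) + list(clf.parameters()))
t = torch.linspace(-4.0, 4.0, n)
for x, y in loader:                                       # x: (B, 1, T), y: (B,)
    re, im = morlet_parts(warper(t))                      # warped filter bank
    fr = torch.nn.functional.conv1d(x, re[:, None, :], padding="same")
    fi = torch.nn.functional.conv1d(x, im[:, None, :], padding="same")
    feats = (fr ** 2 + fi ** 2).sqrt().mean(dim=-1)       # complex modulus + pooling
    loss = torch.nn.functional.cross_entropy(clf(feats), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```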

Model Constraints to Reduce Aliasing: The nonlinearity of the transformation might reduce the localization of the filter in the frequency domain and produce aliasing. For some applications, the localization of each filter in the frequency domain is crucial, e.g., the bird detection task in Section 4.2. In order to limit the possible aliasing induced by the piecewise increasing mappings applied to the mother filter, we propose different settings; these constraints also impact the type of filter bank our method can reach.

First, we propose a normalization of the frequency of the transformed filter (denoted in the result tables by nLGT). This normalization helps to reduce the aliasing induced by the filters. We propose to use f̂, the frequency f normalized with respect to the maximum slope of the piecewise affine mapping. For instance, in the case of a Morlet wavelet, the normalization is as follows:

(ψ ◦ g_θ)(t) = π^(−1/4) exp(2πj f̂ g_θ(t)) exp(−(g_θ(t)/σ)² / 2),

where f̂ = f / max_{l ∈ {1,...,n}} a_l, with n the number of pieces of the piecewise map, a_l the slope of each piece, j the imaginary unit, and σ the width parameter defining the localization of the wavelet in time and frequency. This normalization is performed for each sample of the group, and thus for each generated filter k ∈ {1, . . . , K} of the filter bank.

Second, we constrain the domain of the piecewise affine map (denoted in the result tables by cLGT). In the following experiments, we propose a dyadic constraint of the domain, as in the WT. The support of each filter is then close to the support of a wavelet filter bank; however, the envelope of the filter and the instantaneous frequency still have a learned chirpyness.
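A sketch of the nLGT normalization, under the assumption that the per-piece slopes a_l can be estimated by finite differences of the sampled warp:

```python
import numpy as np

def normalized_morlet(g, t, f=5.0, sigma=1.0):
    """Morlet filter composed with the sampled warp g, its frequency divided
    by the maximum slope of the piecewise-affine map (the nLGT setting)."""
    slopes = np.diff(g) / np.diff(t)   # per-piece slopes a_l (finite differences)
    f_hat = f / np.max(slopes)         # normalized frequency f / max_l a_l
    return np.pi ** -0.25 * np.exp(2j * np.pi * f_hat * g) * np.exp(-0.5 * (g / sigma) ** 2)
```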

4. Experiments

For all the experiments and all the settings, i.e., LGT, nLGT, cLGT, cnLGT, the increasing and continuous piecewise affine map is initialized randomly, the optimization is performed with the Adam optimizer, and the number of knots of each piecewise affine map is 256. The mother filter used for our settings is a Morlet wavelet. The code of the LGT framework is provided in the following repository: https://github.com/Koldh/LearnableGroupTransform-TimeSeries.

4.1. Artificial Data: Classification of Chirp Signals

We present an artificial dataset that demonstrates how a specific time-frequency tiling might not be adapted to a given task and data, or would require extensive cross-validation.

To build the dataset, we generate one high-frequency ascending chirp and one high-frequency descending chirp of size 8192, following the chirplet formula provided in (Baraniuk & Jones, 1996). Then, for both chirp signals, we add Gaussian noise samples (100 realizations for each class); see the figures in Appendix C.1. The task is to detect whether the chirp is ascending or descending. Both the training and test sets are composed of 50 instances of each class. For all models, the batch size is set to 10 and the number of epochs to 50. Each experiment was repeated 5 times with randomly sampled train and test sets, and the reported accuracy is the average over these 5 runs. Each GT is composed with a complex modulus, and inference is performed by a linear classifier. For the WT and LGT, the size of the filters is 512.
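A sketch of such a dataset using plain linear chirps localized near the Nyquist frequency; the exact chirplet formula of (Baraniuk & Jones, 1996), the frequency range, and the unit noise level are stand-in assumptions.

```python
import numpy as np

def make_chirp_dataset(n_per_class=100, n=8192, f0=0.30, f1=0.45):
    # Linear chirp sweeping from f0 to f1 (cycles/sample, Nyquist = 0.5);
    # the descending class is the time-reversed ascending chirp.
    t = np.arange(n)
    phase = 2.0 * np.pi * (f0 * t + 0.5 * (f1 - f0) * t ** 2 / n)
    up, down = np.sin(phase), np.sin(phase)[::-1]
    X = np.stack([c + np.random.randn(n)
                  for c in (up, down) for _ in range(n_per_class)])
    y = np.repeat([0, 1], n_per_class)  # 0: ascending, 1: descending
    return X, y
```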

Figure 4. Learnable Group Transform Filters for the Artificial Data - Each row displays two selected filters (left and right sub-figure) for different settings: (from top to bottom) nLGT, cLGT, cnLGT. For each subfigure, the left part corresponds to the filter before training and the right part to the filter after training. The blue and red denote the real and imaginary parts of the filters, respectively.

Table 2. Testing Accuracy for the Chirp Signals Classification Task

  Representation + Non-Linearity + Linear Classifier   Accuracy (%)
  Wavelet Transform (64 Filters)                       53.01 ± 5.1
  Short-Time Fourier Transform (64 Filters)            65.1 ± 11.9
  Short-Time Fourier Transform (128 Filters)           86.6 ± 9.8
  Short-Time Fourier Transform (512 Filters)           100 ± 0.0
  LGT (64 Filters)                                     92.9 ± 4.0
  nLGT (64 Filters)                                    95.7 ± 3.3
  cLGT (64 Filters)                                    56.8 ± 1.6
  cnLGT (64 Filters)                                   100.0 ± 0.0

As we can observe in Table 2, the WT, as well as the STFT with a small number of filters, perform poorly on this dataset. The chirp signals to be analyzed are localized close to the Nyquist frequency, and in the case of the WT, as illustrated in Figure 1, the wavelet filter bank has a poor frequency resolution at high frequency while benefiting from a high time resolution. In this experiment, we can see that this characteristic of the WT time-frequency tiling implies that, through time, the small frequency variations of the chirp are not efficiently captured. In the case of the STFT, as the number of filters decreases, the frequency resolution is altered; thus, this frequency variation is not captured. Using a large window for the STFT increases the frequency resolution of the tiling and thus enables capturing the difference between the two classes. In the LGT setting, the tiling adapts to the task and produces good performances, except for the cLGT model: its piecewise linear map has a domain constrained to be dyadic, which reduces the adaptivity of the filter bank and is not suitable for this specific task.

Some of the filters can be visualized in Figure 4, and the representations of the signals in Appendix C.1.2. This experiment shows an example of signals that are not easily classified by either the proportional-bandwidth or the constant-bandwidth tiling without considering cross-validation of hyperparameters.

4.2. Supervised Bird Detection

Table 3. Testing AUC for the Bird Detection Task

  Representation + Non-Linearity + Deep Network                        AUC
  MFSC (80 Filters)                                                    77.83 ± 1.34
  Conv. Filter init. random (80 Filters)                               66.77 ± 1.04
  Conv. Filter init. Gabor (80 Filters)                                67.67 ± 0.98
  Spline Conv. init. random (80 Filters) (Balestriero et al., 2018)    78.17 ± 1.48
  Spline Conv. init. Gabor (80 Filters) (Balestriero et al., 2018)     79.32 ± 1.52
  LGT (80 Filters)                                                     78.41 ± 1.38
  nLGT (80 Filters)                                                    75.50 ± 1.39
  cLGT (80 Filters)                                                    79.14 ± 0.83
  cnLGT (80 Filters)                                                   79.68 ± 1.35

We now propose a large-scale dataset to validate the suitability of our model in a noisy and realistic setting.

Figure 5. Learnable Group Transform - Visualization of a sample containing a bird song (cLGT): (left) at initialization and (right) after learning. For each subfigure, the x-axis corresponds to time and the y-axis to the different filters; notice that the y-axis usually corresponds to the scale or center frequency of the filters. Other representations are displayed in Appendix C.2.2. Compared to the initialization, the learned representation is sparser, the SNR is increased, and the representation is less redundant along the frequency axis.

Figure 6. Learnable Group Transform Filters for the Bird Detection Data - Each row displays two selected filters (left and right sub-figure) for different settings: (from top to bottom) LGT, nLGT, cLGT. For each subfigure, the left part corresponds to the filter before training and the right part to the filter after training. The blue and red denote the real and imaginary parts of the filters, respectively.

The dataset is extracted from the Freesound audio archive (Stowell & Plumbley, 2013). It contains about 7,000 field-recording signals of 10 seconds sampled at 44 kHz, representing slightly less than 20 hours of audio. The content of these recordings varies from water sounds to city noises. Among these signals, some contain bird songs that are mixed with different background sounds having more energy than the bird song; a visualisation of a sample is shown in Appendix C.2.1. The task is a binary classification where one should predict the presence or absence of a bird song. As the dataset is unbalanced, we use the Area Under Curve (AUC) metric. The results for both the benchmarks and our models are evaluated on a test set consisting of 33% of the total dataset.

In order to compare with previously used methods, we use the same seeds to sample the train and test sets, the same batch size, i.e., 10, and the same learning-rate cross-validation grid as in (Balestriero et al., 2018). For each model, the best hyperparameters are selected, and we train and evaluate the models 10 times with early stopping; the results are shown in Table 3. While the first layer of the architecture carries a model-dependent representation (i.e., MFSC, LGT, Conv. filters, ...), we use the state-of-the-art architecture of (Grill & Schlüter, 2017) for the DNN, described in Appendix B.2. Notice that this specific DNN architecture was designed and optimized for the MFSC representation.

As we can see in Table 3, the case without constraints (LGT) reaches better accuracy than the domain-expert benchmark (MFSC). Besides, including more constraints on the model (cnLGT) reduces overfitting and further improves results, outperforming the other benchmarks. One can also remark that both the LGT framework and the learnable mother wavelet reach almost the same accuracy, while both outperform the hand-crafted features as well as the unconstrained convolutional filters. One can notice that all the learned filters in Figure 6 contain either an increasing or a decreasing chirp, corresponding respectively to the convexity or concavity of the instantaneous phase of the filter, and thus of the piecewise linear map. Such a feature is known to be crucial in the detection and analysis of bird song (Stowell & Plumbley, 2012).

Figure 7. Learnable Group Transform Filters for the Haptics Data - Each row displays two selected filters (left and right sub-figure) for different settings: (from top to bottom) nLGT, cLGT, cnLGT. For each subfigure, the left part corresponds to the filter before training and the right part to the filter after training. The blue and red denote the real and imaginary parts of the filters, respectively.

4.3. Haptics Dataset Classification

Table 4. Testing Accuracy for the Haptics Classification Task

  Model                                                     Accuracy (%)
  DTW (Al-Naymat et al., 2009)                              37.7
  BOSS (Schäfer, 2015)                                      46.4
  Residual NN (Wang et al., 2017)                           50.5
  COTE (Bagnall et al., 2015)                               51.2
  Fully Convolutional NN (Wang et al., 2017)                55.1
  WD + Convolutional NN (Khan & Yener, 2018)                57.5
  LGT (96 Filters) + Non-Linearity + Linear Classifier      53.5
  nLGT (96 Filters) + Non-Linearity + Linear Classifier     50.4
  cLGT (96 Filters) + Non-Linearity + Linear Classifier     58.2
  cnLGT (96 Filters) + Non-Linearity + Linear Classifier    54.3

The Haptics dataset is a five-class classification problem with 155 training and 308 testing samples from the UCR Repository (Chen et al., 2015), where each time-series has 1092 time samples. As opposed to the bird dataset, where the features of interest are known and competitive methods have been established, there is no expert knowledge regarding the specific signal features (see Table 4). One can see that our method, in the cLGT setting, outperforms the other approaches while performing the classification with a linear classifier, as opposed to the other methods, which use DNN algorithms. This demonstrates the capability of our method to transform the data efficiently while requiring neither a further change of basis nor knowledge of the features of interest. Besides, even in a small-dataset regime, our approach is capable of learning an efficient transformation of the data.

We provide in Figure 7 the visualization of some sampled filters before and after learning, as well as representations in Appendix C.3.2. As opposed to the supervised bird dataset, the filters do not coincide with well-known filters commonly used in signal processing. This is an example of an application where the features of interest in the signals are unknown, and one requires a learnable representation.

5. Conclusion

In this work, we enable the learnability of Group Transforms and generalize the wavelet transform by introducing non-linear transformations of a mother filter, as well as an efficient way to sample this mapping. We establish the connections with well-known time-frequency filters that are common in diverse biological signals, and we derive the equivariance properties of the LGT. We have also shown a tractable way to learn to sample these transformations using a 1-layer NN, enabling an end-to-end approach. Our approach competes with state-of-the-art methods without a priori knowledge of the signal power spectrum and outperforms classical hand-crafted time-frequency representations. Interestingly, in the bird detection experiment, we recover chirplet filters that are known to be crucial to bird-song detection, while in the haptic dataset, where the important features to capture are unknown, the learned filters are very dissimilar to classical time-frequency filters and allow us to outperform state-of-the-art methods with a linear classifier.

Acknowledgment

The authors would like to thank Randall Balestriero, Yanis Bahroun, and Anirvan M. Sengupta for their qualitative comments and reviews. The authors were supported by NSF grant SCH-1838873 and NIH grant R01HL144683-CFDA.

References

Al-Naymat, G., Chawla, S., and Taheri, J. SparseDTW: A novel approach to speed up dynamic time warping. In Proceedings of the Eighth Australasian Data Mining Conference - Volume 101, pp. 117–127. Australian Computer Society, Inc., 2009.

Arora, R., Basu, A., Mianjy, P., and Mukherjee, A. Understanding deep neural networks with rectified linear units. arXiv preprint arXiv:1611.01491, 2016.

Bagnall, A., Lines, J., Hills, J., and Bostrom, A. Time-series classification with COTE: the collective of transformation-based ensembles. IEEE Transactions on Knowledge and Data Engineering, 27(9):2522–2535, 2015.

Balestriero, R., Cosentino, R., Glotin, H., and Baraniuk, R. Spline filters for end-to-end deep learning. In International Conference on Machine Learning, pp. 373–382, 2018.

Baraniuk, R. G. Shear madness: signal-dependent and metaplectic time-frequency representations. 1993.

Baraniuk, R. G. and Jones, D. L. Wigner-based formulation of the chirplet transform. IEEE Transactions on Signal Processing, 44(12):3129–3135, 1996.

Bruna, J. Scattering representations for recognition. PhD thesis, 2013.

Cakir, E., Ozan, E. C., and Virtanen, T. Filterbank learning for deep neural network based polyphonic sound event detection. In Neural Networks (IJCNN), 2016 International Joint Conference on, pp. 3399–3406. IEEE, 2016.

Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A., and Batista, G. The UCR time series classification archive, July 2015. www.cs.ucr.edu/~eamonn/time_series_data/.

Cohen, T. and Welling, M. Group equivariant convolutional networks. In International Conference on Machine Learning, pp. 2990–2999, 2016.

Coifman, R. R. and Wickerhauser, M. V. Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory, 38(2):713–718, 1992.

Cosentino, R., Balestriero, R., Baraniuk, R. G., and Patel, A. Overcomplete frame thresholding for acoustic scene analysis. arXiv preprint arXiv:1712.09117, 2017.

Daubechies, I. Ten Lectures on Wavelets, volume 61. SIAM, 1992.

Feichtinger, H. G., Kozek, W., and Luef, F. Gabor analysis over finite abelian groups. Applied and Computational Harmonic Analysis, 26(2):230–248, 2009.

Flandrin, P. Time frequency and chirps. In Wavelet Applications VIII, volume 4391, pp. 161–175. International Society for Optics and Photonics, 2001.

Goldenstein, S. and Gomes, J. Time warping of audio signals. In CGI, pp. 52. IEEE, 1999.

Gribonval, R. Fast matching pursuit with a multiscale dictionary of Gaussian chirps. IEEE Transactions on Signal Processing, 49(5):994–1001, 2001.

Gribonval, R. and Bacry, E. Harmonic decomposition of audio signals with matching pursuit. IEEE Transactions on Signal Processing, 51(1):101–111, 2003.

Grill, T. and Schlüter, J. Two convolutional neural networks for bird detection in audio signals. In Proceedings of the 25th European Signal Processing Conference (EUSIPCO), Kos Island, Greece, August 2017. URL http://ofai.at/~jan.schlueter/pubs/2017_eusipco.pdf.

Jupp, D. L. B. Approximation to data by splines with free knots. SIAM Journal on Numerical Analysis, 15(2):328–343, 1978.

Kerkyacharian, G., Picard, D., et al. Regression in random design and warped wavelets. Bernoulli, 10(6):1053–1105, 2004.

Khan, H. and Yener, B. Learning filter widths of spectral decompositions with wavelets. In Advances in Neural Information Processing Systems, pp. 4601–4612, 2018.

Korda, M. and Mezić, I. Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. Automatica, 93:149–160, 2018.

Le, T.-P. and Argoul, P. Continuous wavelet transform for modal identification using free decay response. Journal of Sound and Vibration, 277(1-2):73–100, 2004.

Lelandais, F. and Glotin, H. Mallat's matching pursuit of sperm whale clicks in real-time using Daubechies 15 wavelets. In New Trends for Environmental Monitoring Using Passive Systems, 2008, pp. 1–5. IEEE, 2008.

Mallat, S. A Wavelet Tour of Signal Processing. Elsevier, 1999.

Mallat, S. Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10):1331–1398, 2012.

Mallat, S. Understanding deep convolutional networks. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065):20150203, 2016.

Mallat, S. and Zhang, Z. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397–3415, 1993.

Mann, S. and Haykin, S. The chirplet transform: Physical considerations. IEEE Transactions on Signal Processing, 43(11):2745–2761, 1995.

Oyallon, E., Zagoruyko, S., Huang, G., Komodakis, N., Lacoste-Julien, S., Blaschko, M. B., and Belilovsky, E. Scattering networks for hybrid representation learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.

Ravanelli, M. and Bengio, Y. Interpretable convolutional filters with SincNet. arXiv preprint arXiv:1811.09725, 2018.

Saritha, C., Sukanya, V., and Murthy, Y. N. ECG signal analysis using wavelet transforms. Bulgarian Journal of Physics, 35(1):68–77, 2008.

Schäfer, P. The BOSS is concerned with time series classification in the presence of noise. Data Mining and Knowledge Discovery, 29(6):1505–1530, 2015.

Seydoux, L., Shapiro, N. M., de Rosny, J., Brenguier, F., and Landès, M. Detecting seismic activity with a covariance matrix analysis of data recorded on seismic arrays. Geophysical Journal International, 204(3):1430–1442, 2016.

Stowell, D. and Plumbley, M. D. Framewise heterodyne chirp analysis of birdsong. In Proceedings of the 20th European Signal Processing Conference (EUSIPCO), pp. 2694–2698. IEEE, 2012.

Stowell, D. and Plumbley, M. D. An open dataset for research on audio field recording archives: freefield1010. CoRR, abs/1309.5275, 2013. URL http://arxiv.org/abs/1309.5275.

Torrésani, B. Wavelets associated with representations of the affine Weyl–Heisenberg group. Journal of Mathematical Physics, 32(5):1273–1279, 1991.

Vilenkin, N. Y. Special Functions and the Theory of Group Representations, volume 22. American Mathematical Soc., 1978.

Wang, Y. and Jiang, Y.-C. Modified adaptive chirplet decomposition with application in ISAR imaging of maneuvering targets. EURASIP Journal on Advances in Signal Processing, 2008(1):456598, 2008.

Wang, Z., Yan, W., and Oates, T. Time series classification from scratch with deep neural networks: A strong baseline. In 2017 International Joint Conference on Neural Networks (IJCNN), pp. 1578–1585. IEEE, 2017.

Xu, C., Wang, C., and Gao, J. Instantaneous frequency identification using adaptive linear chirplet transform and matching pursuit. Shock and Vibration, 2016, 2016.

Yarotsky, D. Error bounds for approximations with deep ReLU networks. Neural Networks, 94:103–114, 2017.

Zeghidour, N., Usunier, N., Kokkinos, I., Schaiz, T., Synnaeve, G., and Dupoux, E. Learning filterbanks from raw speech for phone recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5509–5513. IEEE, 2018.