Accelerating Matching Pursuit for Multiple Time-Frequency Dictionaries
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx-20), Vienna, Austria, September 8–12, 2020 ACCELERATING MATCHING PURSUIT FOR MULTIPLE TIME-FREQUENCY DICTIONARIES ZdenˇekPr˚uša,Nicki Holighaus and Peter Balazs ∗ Acoustics Research Institute Austrian Academy of Sciences Vienna, Austria [email protected],[email protected],[email protected] ABSTRACT An overview of greedy algorithms, a class of algorithms MP Matching pursuit (MP) algorithms are widely used greedy meth- falls under, can be found in [10, 11] and in the context of audio ods to find K-sparse signal approximations in redundant dictionar- and music processing in [12, 13, 14]. Notable applications of MP ies. We present an acceleration technique and an implementation algorithms in the audio domain include analysis [15], [16], coding of the matching pursuit algorithm acting on a multi-Gabor dictio- [17, 18, 19], time scaling/pitch shifting [20] [21], source separation nary, i.e., a concatenation of several Gabor-type time-frequency [22], denoising [23], partial and harmonic detection and tracking dictionaries, consisting of translations and modulations of possi- [24]. bly different windows, time- and frequency-shift parameters. The We present a method for accelerating MP-based algorithms proposed acceleration is based on pre-computing and thresholding acting on a single Gabor-type time-frequency dictionary or on a inner products between atoms and on updating the residual directly concatenation of several Gabor dictionaries with possibly different in the coefficient domain, i.e., without the round-trip to thesig- windows and parameters. The main idea of the present accelera- nal domain. Previously, coefficient-domain residual updates have tion technique is performing the residual update in the coefficient been dismissed as having prohibitive memory requirements. By domain while exploiting the locality of the inner products between introducing an approximate update step, we can overcome this re- the atoms in the dictionaries and dismissing values below a user striction and greatly improve the performance of matching pursuit definable threshold. The idea of performing the residual updatein at a modest cost in terms of approximation quality per selected the coefficient domain using inner products between the atomsin atom. An implementation in C with Matlab and GNU Octave in- fact dates back to the original work by Mallat and Zhang [2, Ap- terfaces is available, outperforming the standard Matching Pursuit pendix E], but has always been dismissed as being prohibitively Toolkit (MPTK) by a factor of 3.5 to 70 in the tested conditions. storage expensive, as it generally requires storing pairwise inner Additionally, we provide experimental results illustrating the con- products between all dictionary elements. By exploiting the Ga- vergence of the implementation. bor structure of the dictionaries and dismissing insignificant inner products, it becomes feasible to store all significant inner products 1. INTRODUCTION in a lookup table and avoid atom synthesis and the residual re- analysis in every iteration of MP as it is usually done in practice. L Optimal K-sparse approximation of a signal x 2 R in an over- The size of the lookup table as well as the cost of computing it are complete dictionary of P normalized atoms (vectors) D = independent of the signal length and depend only on the parame- L×P [d0jd1j ::: jdP −1] 2 C , kdpk2 = 1 or, similarly, finding ters of the Gabor dictionaries. the minimal number of atoms that achieve a given error tolerance A freely available implementation in C (compatible with C99 kx − Dck2 ≤ E, are NP-hard problems [1]. Both problems can and C++11), can be found in the backend library of the Matlab/GNU be (sub-optimally) solved by employing greedy matching pursuit Octave Large Time-Frequency Analysis Toolbox (LTFAT, http: (MP) algorithms. The only difference is the choice of the stop- //ltfat.github.io) [25, 26] available individually at http: ping criterion. It has been shown that the basic version of MP [2] //ltfat.github.io/libltfat1. achieves an exponential approximation rate [1, 3, 4, 5]. In practice, without imposing any structure on the dictionary, the effectiveness We compare the performance of the proposed implementation of MP and its variants, see, e.g., [6, 7, 8, 9], quickly deteriorates with the Matching Pursuit Toolkit (MPTK) [27], the de-facto stan- when the problem dimensionality increases; either by increasing dard implementation of several MP based algorithms. By special- the input signal length L or the size of the dictionary P . Even with izing on multi-Gabor dictionaries, the presented algorithm is able structured dictionaries, and using fast algorithms in place of ma- to significantly outperform MPTK. trix operations, a naive implementation can still be prohibitively This manuscript is a shortened version of the paper [28], which inefficient; even decomposing a short audio clip can take hours. provides an extended discussion of the proposed algorithm includ- ing a worst case analysis of its convergence properties and more ∗ This work was supported by the Austrian Science Fund (FWF): Y 551–N13 and I 3067–N30. The authors thank Bob L. Sturm for shar- details about the provided implementation. ing his thoughts on the subject in a form of a blog Pursuits in the Null Space. Copyright: © 2020 ZdenˇekPr˚ušaet al. This is an open-access article distributed 1 Confer to http://ltfat.github.io/libltfat/group_ under the terms of the Creative Commons Attribution 3.0 Unported License, which _multidgtrealmp.html or http://ltfat.github.io/doc/ permits unrestricted use, distribution, and reproduction in any medium, provided the gabor/multidgtrealmp.html) for more information on the library original author and source are credited. and its Matlab interface. DAFx.1 Proceedings of the 23rd International Conference on Digital Audio Effects (DAFx-20), Vienna, Austria, September 8–12, 2020 2. PRELIMINARIES The procedure is repeated until the desired approximation er- ror is achieved or alternatively some other stopping criterion is Matrices and column vectors are denoted with capital and lower- met e.g. a sparsity or a selected inner product magnitude limits case bold upright letters, e.g., M, x, respectively. Conjugate trans- are reached. It is known that the matching pursuit (MP) algorithm pose is denoted with a star superscript, (x∗; M∗), scalar variables and its derivatives can benefit from pre-computing inner products with a capital or lowercase italics letter s; S. A single element of between the atoms in the dictionary G(k; j) = hdj ; dki i.e. from ∗ P ×P a matrix or a vector is selected using round brackets M(m; n), pre-computing the Gram matrix G = D D 2 C . With ∗ x(l). The index is always assumed to be applied modulo vec- %k = D rk denoting the coefficient-domain residual, the resid- tor length (or matrix size in the respective dimension) such that ual update step can be written as ([29, Ch. 12]) x(l) = x(l + kL) for l = 0;:::;L − 1 and k 2 Z. More- over, we will use two indices and subscript for vectors such that %k+1 = %k − c(pmax)G(•; pmax): (3) c(m; n) = c(m + nM) in order to transparently “matrixify” a M This modification has the advantage of removing the necessity of vector. Sub-vectors and sub-matrices are selected by an index set synthesizing the residual and recomputing the inner product in the denoted by a calligraphic letter e.g. x(P) and the m-th row or n- selection step. On the other hand, such approach is usually dis- th column of a matrix M are selected using the notation M(m; •) missed as impractical in the literature due to the high memory re- and M(•; n), respectively. Scalar-domain functions used on ma- quirements for storing the Gram matrix. We will show that this is trices or vectors are applied element-wise e.g. jxj2(l) = jx(l)j2. not the case for multi-Gabor dictionaries, see Section 3. Formally, The Euclidean inner product of two vectors in L is hx; yi = C the coefficient-domain matching pursuit algorithm is summarized ∗ PL−1 y x = l=0 x(l)y(l), where the overline denotes complex con- 2 in Alg. 1. The stopping criterion may contain several conditions, jugation, such that the 2-norm of a vector is defined by kxk2 = and the algorithm terminates if any of these conditions is met. hx; xi. Input: Input signal x, dictionary Gram matrix G = D∗D 2.1. Multi-Gabor Dictionaries Output: Solution vector c ∗ 2 L Initialization: c = 0, % = D x, E0 = kxk , k = 0 A Gabor dictionary D(g;a;M) generated from a window g 2 R , 0 2 while Stopping criterion not met do kgk2 = 1, time shift a and a number of modulations M is given as 1. Selection: pmax argmax j%k(p)j p i2πm(l−na)=M D(g;a;M)(l; m + nM) = g(l − na)e (1) 2. Update: for l = 0;:::;L − 1 and m = 0;:::;M − 1 for each n = (a) Solution: c(pmax) c(pmax) + %k(pmax) 0;:::;N − 1 N = L=a , where is the number of window time 2 shifts and the overall number of atoms is P = MN. Hence, the (b) Error: Ek+1 Ek − j%k(pmax)j redundancy of a dictionary is P=L = M=a. A multi-Gabor dic- (c) Residual: %k+1 %k − %k(pmax)G(•; pmax) tionary consists of W Gabor dictionaries, i.e., k k + 1 D(g1;a1;M1) D(g2;a2;M2) ::: D(gW ;aW ;MW ) (2) end Algorithm 1: Coefficient-Domain Matching Pursuit and we will also use a shortened notation Dw = D(gw;aw;Mw). Generally, aw;Mw need only be divisors of L.