Optimised Wiener Filtering in Overdetermined Systems

Joshua L. Sendall, Council for Scientific and Industrial Research, Pretoria, South Africa
Warren P. du Plessis, University of Pretoria, Pretoria, South Africa

Abstract—The Wiener filter is prevalent in many adaptive filtering applications, but can also be a computationally burdensome part of a processing chain. This paper presents methods to reduce the computational complexity and memory footprint of Wiener filters where the filter is highly overdetermined. The optimisations are demonstrated in the context of passive radar and result in a reduction of over 40 times in processing time.

Index Terms—Adaptive filter, passive radar, passive coherent location (PCL), direct path interference (DPI), clutter cancellation, radar clutter, radar signal processing.

I. INTRODUCTION

Covariance matrices are an important concept in many adaptive signal processing applications. These include minimum variance distortionless response (MVDR) filtering [1], signal cancellation [2], and space-time adaptive processing (STAP) radar [3]. While these techniques can theoretically result in optimal performance, efficiently estimating the covariance in practical situations is still an open research topic. Typically, two aspects of covariance matrix estimation are challenging. Either there is limited information and therefore insufficient samples to accurately estimate the covariance matrix [4], [5], or the number of samples is so large that the estimation of the covariance matrix becomes computationally intensive.

The first case is handled using rank-reduction methods [3]. A solution for reducing the computational burden of the second case is presented in this paper. Many signal cancellation environments require delayed versions of a template signal to be removed, in some instances with a frequency offset [6]. This structure induces redundancy, which can be exploited to reduce computation time and memory requirements.

This paper presents two strategies for efficiently estimating the covariance matrix for highly overdetermined systems. The first exploits the redundancy in the inner products produced by the covariance matrix's tapped delay line structure by breaking the matrix-matrix multiplication up into a matrix-vector multiplication and a set of rank-1 updates. The second method further exploits the redundancy by computing the matrix-vector product using fast Fourier transforms (FFTs). Both methods significantly reduce the computational time of the filter. The memory requirements of the filters are also significantly reduced, as the sample matrix is represented implicitly rather than having to be stored in memory. Finally, the computational benefit of using a Toeplitz covariance matrix approximation is analysed, along with the validity of that approximation for the filter's effectiveness at the filter sizes found in overdetermined systems.

II. WIENER FILTER

The Wiener filter minimizes the distance between a measured and a desired process. The exposition below is based on the outline provided by [2].

The filter is described by defining a measured signal as

    x_n = s_n + z_n,   n = 0, 1, ..., N−1                                  (1)

where N is the number of samples, and s_n and z_n are the nth samples of the desired and undesired signal components respectively. The output of the filter is then the estimate of the desired signal

    ŝ_n = x_n − ẑ_n                                                         (2)

where ŝ is the estimate of the desired signal, and ẑ is the estimated undesired signal obtained from

    ẑ_n = Σ_{m=0}^{M−1} h_m y_{n−m}                                         (3)

with M the number of filter coefficients, y a template of the desired signal, and h_m the filter weights.

The undesired signal can be estimated with minimum squared error by determining the filter weights that solve the Wiener equations for all values of l = 0, 1, ..., M−1 at sample n [2],

    Σ_{m=0}^{M−1} E[y_{n−m} y*_{n−l}] h_m = E[x_n y*_{n−l}]                 (4)

    R_yy h = R_xy                                                           (5)

where E[·], ·*, and ·^H denote the expectation operator, complex conjugation, and the Hermitian transpose respectively. The M × 1 vector h is formed from the filter coefficients, R_yy is the M × M auto-covariance matrix of y, and R_xy is the M × 1 covariance matrix of x and y.

In practical systems the covariance matrix is not usually known, and it is necessary to estimate it from sample data.
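As a concrete illustration of (1)–(5), the short numpy sketch below replaces the expectations in (4) with sample averages over delayed copies of the reference signal, solves the resulting Wiener equations, and applies (2)–(3) to cancel the undesired component. It is illustrative only and not taken from the paper: the signal model, sizes, and weights are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    N, M = 4000, 16                                    # illustrative sizes only

    y = rng.standard_normal(N) + 1j * rng.standard_normal(N)           # reference (template) signal
    s = 0.01 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))  # weak desired signal
    h_true = (0.5 ** np.arange(M)) * np.exp(0.3j * np.arange(M))       # arbitrary "true" weights

    z = np.convolve(y, h_true)[:N]        # undesired signal: delayed, weighted copies of y, cf. (3)
    x = s + z                             # measured signal, cf. (1)

    # Columns hold the delayed copies y_{n-m}, m = 0..M-1 (zeros before the first sample).
    Y = np.column_stack([np.concatenate([np.zeros(m, complex), y[:N - m]]) for m in range(M)])

    R_yy = Y.conj().T @ Y / N             # sample estimate of E[y_{n-m} y*_{n-l}] in (4)
    R_xy = Y.conj().T @ x / N             # sample estimate of E[x_n y*_{n-l}] in (4)

    h = np.linalg.solve(R_yy, R_xy)       # Wiener equations (5)
    s_hat = x - Y @ h                     # cancellation, cf. (2) and (3)

    print(np.mean(np.abs(x) ** 2), np.mean(np.abs(s_hat) ** 2))   # residual power drops to roughly that of s

Formed this way, the covariance estimate alone costs on the order of 2NM² complex operations, which is the burden the optimisations presented below target.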

Given N samples from the reference channel, it is possible to form the N × M matrix

    Y = [ y_0      y_{−1}    ...  y_{−(M−1)}
          y_1      y_0       ...  y_{−(M−2)}
          ...      ...       ...  ...
          y_{N−1}  y_{N−2}   ...  y_{N−M}  ]                               (6)

by delaying each column by an additional sample, thereby allowing R_yy to be estimated as

    R̂_yy = (1/N) Y^H Y.                                                     (7)

Further, placing N samples from the measured signal in the N × 1 vector x allows R_xy to be approximated by

    R̂_xy = (1/N) Y^H x.                                                     (8)

Substituting (7) and (8) into (5) results in

    (1/N) Y^H Y h = (1/N) Y^H x                                             (9)

which can be simplified to the form of the least-squares (LS) problem

    Y h = x                                                                 (10)

under the assumption that the inverse of Y^H exists. An LS method, such as QR decomposition, is used to determine the filter weights. Alternatively, a square-system solver, such as LU decomposition, can be employed to solve (5) directly using the approximate values of R̂_yy and R̂_xy from (7) and (8).

In order to achieve an accurate estimate of the covariance matrices, N should be much greater than M [4]. With this in mind, the approximate number of complex operations required to solve the system via a number of methods is shown in Table I. The LU and Cholesky factorisations [7] solve (5), while the QR factorisation [7] and conjugate-gradient least squares (CGLS) [8] solve (10).

TABLE I: The approximate computational cost of the cancellation algorithms considered.

    Algorithm               Computational cost
    LU factorisation        (2/3)M³ + 2NM²
    Cholesky factorisation  (1/3)M³ + 2NM²
    QR factorisation        2NM² − (2/3)M³
    CGLS                    (P + 1)(4NM)

The QR filter uses Householder reflections to directly invert a rectangular matrix, where N > M. Hence, the terms seen in Table I are both from the factorisation. Due to the negative term, the QR filter requires fewer operations than the Wiener filter (using LU or Cholesky factorisation). The CGLS algorithm does not directly solve (10), but instead iterates towards the solution; the number of iterations is P.

The computational complexity of the LU and Cholesky factorisations comprises two terms. The M³ term is related to the factorisation of R_yy in (5), while the NM² term is related to the calculation of R̂_yy in (7). Under the assumption N ≫ M, the majority of the computational load is associated with the calculation of R̂_yy.

A further point to note is that R_yy is a Hermitian Toeplitz matrix when the underlying process is stationary. However, R̂_yy is only approximately Toeplitz, as practical environments are not stationary processes [9]. The effects of this observation are further explored in Section III-E; the use of Toeplitz solvers, such as the Bareiss algorithm [10], therefore results in sub-optimal cancellation.

III. OPTIMISATION

This section shows how the mathematical properties of the computations described in Section II can be exploited to significantly reduce the number of computations required to implement a Wiener filter.

A. Hermitian Structure of Covariance Matrix

The first opportunity for optimisation emerges from the exploitation of the Hermitian structure of R̂_yy. Only the lower triangle of R̂_yy needs to be explicitly calculated, while the upper triangle can be filled using

    R̂_yy(k, l) = R̂*_yy(l, k)                                                (11)

where A(k, l) is the element of A in the kth row and lth column. The complexity of calculating R̂_yy is thereby reduced from O(2NM²) to O(NM²).

B. Tapped Delay Line Covariance Matrix

The tapped delay line structure of Y can be exploited to reduce the computational load. It is only necessary to store the N + M − 1 unique samples, and not all N × M elements, as the elements in Y are repeated. This reduces the storage requirements, and the resulting lower number of unique memory accesses also improves cache efficiency, which is especially valuable on processors such as graphics processing units (GPUs) which have a high arithmetic intensity [11].

The tapped delay line structure of Y also gives R_yy a Toeplitz structure. This structure reduces the computation of R̂_yy further: only the first row of R̂_yy requires full inner products (2NM complex operations). The remainder of R̂_yy can then be populated by following a sequential update approach for each element of the first row, as shown in Fig. 1. Each (gray) element is calculated along the diagonal as

    R̂_yy(k, l) = R̂_yy(k−1, l−1) − Y(k, k−1) Y*(l, k−1)
                 + Y(k, k+N) Y*(l, k+N).                                    (12)

Therefore, the calculation of R̂_yy has a complexity of O(2NM + (3/2)M²).

Fig. 1: Procedure for filling R̂_yy.
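To make the sequential update concrete, the numpy sketch below computes the first row of the unnormalised covariance with full inner products and then marches down each diagonal, adding the sample that enters the correlation window and subtracting the one that leaves it, before filling the lower triangle through (11). This is an illustrative re-derivation rather than the authors' code: the zero-based indexing and the placement of the M − 1 pre-history samples are my own conventions, so the index arithmetic differs cosmetically from (12).

    import numpy as np

    rng = np.random.default_rng(0)
    N, M = 2000, 8

    # The N + M - 1 unique reference samples: y[-(M-1)], ..., y[N-1].
    y_full = rng.standard_normal(N + M - 1) + 1j * rng.standard_normal(N + M - 1)

    def seg(t0, length):
        """Samples y[t0], ..., y[t0 + length - 1] (sample y[t] lives at y_full[t + M - 1])."""
        return y_full[t0 + M - 1 : t0 + M - 1 + length]

    # First row of G = Y^H Y via full inner products: G[0, l] = sum_n conj(y[n]) y[n - l].
    G = np.zeros((M, M), dtype=complex)
    y0 = seg(0, N)
    for l in range(M):
        G[0, l] = np.vdot(y0, seg(-l, N))          # np.vdot conjugates its first argument

    # Fill the rest of the upper triangle by marching down each diagonal: moving from
    # G[k-1, l-1] to G[k, l], one sample enters the window and one leaves it.
    for l0 in range(M):
        for k in range(1, M - l0):
            l = l0 + k
            enter = np.conj(y_full[M - 1 - k]) * y_full[M - 1 - l]          # y[-k], y[-l]
            leave = np.conj(y_full[N + M - 1 - k]) * y_full[N + M - 1 - l]  # y[N-k], y[N-l]
            G[k, l] = G[k - 1, l - 1] + enter - leave

    # Lower triangle from the Hermitian symmetry in (11).
    low = np.tril_indices(M, -1)
    G[low] = G.conj().T[low]

    # Check against the explicit computation with the full N x M matrix of (6)-(7).
    Y = np.column_stack([y_full[M - 1 - m : M - 1 - m + N] for m in range(M)])
    assert np.allclose(G, Y.conj().T @ Y)

The first loop costs O(NM) and the diagonal updates O(M²), matching the stated O(2NM + (3/2)M²) behaviour, and only the N + M − 1 unique samples in y_full are ever stored.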

C. Exploiting Fast Correlation

Fast correlation can be used to calculate R̂_xy and the first row of R̂_yy. This involves defining

    a = [y^T  0_{1,M−1}]^T                                                  (13)

where y is the N × 1 vector of reference samples, ·^T denotes a transpose operation, and 0_{k,l} represents a k × l matrix of zeros, giving

    b = [y_0, y_1, ..., y_{N−1}, y_{M−1}, y_{M−2}, ..., y_1].               (14)

These 1 × (M + N − 1) vectors are then correlated using fast correlation, which requires a and b to be transformed to the frequency domain. The element-wise product is then taken such that

    c = F(a) ∘ F(b)                                                         (15)

where F represents the Fourier transform, and ∘ represents the Hadamard product. The first row of R̂_yy is calculated by taking the inverse Fourier transform of c, and can be extracted from the first M elements of the convolved vector.

The calculation of R̂_yy's first row then has a complexity of O(6(N + M)[log₂(N + M) + 1]), compared to the conventional O(2NM). The correlative method is less complex than the multiplicative method for most practical values of N with M > 80. R̂_xy can be calculated in the same manner.

D. Factorisation Algorithm

Since the computations required to form R̂_yy can be significantly reduced using (12), the dominant term in the computational load becomes the factorisation of R̂_yy.

For a generic matrix, the filter coefficients can be found using LU decomposition. However, due to the Hermitian nature of R̂_yy, an LDL factorisation [7] can be used. It should be noted that row/column pivoting is required in order for these factorisations to be numerically stable [7]. Due to the low arithmetic intensity of the pivoting, it may be a bottleneck for smaller problem sizes. The Cholesky decomposition will not always successfully factorise R̂_yy, as real data results in R̂_yy being positive semi-definite. As such, the success of the factorisation must be monitored, and when failure occurs the filter must resort to one of the other factorisation algorithms.

E. Toeplitz Approximation

For a stationary process, the auto-covariance matrix is a Toeplitz positive semi-definite matrix, with each element defined by

    R_ij = E[(Y_i − μ_i)(Y_j − μ_j)]                                        (16)

where μ_i is the mean of the ith random vector. For the tapped delay-line case where

    Y_i(t_0) = Y_j(t_0 + T_s(j − i))                                        (17)

where T_s is the sample period, (16) becomes

    R_ij = E[(Y(t_0 + T_s i) − μ(t_0 + T_s i)) (Y(t_0 + T_s j) − μ(t_0 + T_s j))]   (18)

where Y is the signal in question and μ is its mean. Considering that practical scenarios are non-stationary processes, i.e.

    μ(t_0) ≠ μ(t_0 + τ),                                                    (19)

it stands that

    R_ij ≠ R_(i+n)(j+n).                                                    (20)

It should also be noted that the maximum deviation in time is between R_11 and R_MM, which is thus T_s M. In a sample covariance matrix there will be a 2M-sample difference between R̂_11 and R̂_MM. The covariance matrix, with N ≫ M, will therefore be pseudo-Toeplitz. Using this approximation allows R̂_yy to be represented by an M-element vector.

The Toeplitz sample covariance vector is defined as

    T̂_yy = y^H Y                                                            (21)

where y is the N × 1 vector containing the reference-channel samples (the first column of Y). This matrix-vector multiplication can be implemented with the same methods described in Section III, except that the fill algorithm is not implemented.

The major advantage of this method is the ability to use a Toeplitz matrix solver, which has significantly reduced computational requirements. The Toeplitz solver used in this study is detailed in the Appendix, and has a complexity of O((11/2)N²).

F. Performance Comparison

The performance of each solver is shown in Fig. 2. It can be seen that the QR solver was the slowest for most system sizes, and the Toeplitz solver was the fastest.

The LDL solver was 1.3 times slower than the Cholesky solver for large problem sizes despite having the same dominant complexity term, demonstrating the performance reduction caused by row/column pivoting. The LU solver was faster than the LDL solver when M ≤ 1194.

Fig. 2: Mean execution time for square system solvers (Toeplitz, Cholesky, LDL, LU, and QR) as a function of filter length.
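The combination of Sections III-C and III-E can be sketched in a few lines. The fragment below is illustrative rather than the authors' implementation: it computes the first column of the unnormalised Toeplitz covariance and the cross term by FFT-based correlation, and then solves the Hermitian Toeplitz system with scipy.linalg.solve_toeplitz, a Levinson-type O(M²) routine standing in for the solver of the Appendix. Plain zero-padding is used instead of the wrapped vector in (14), and the data here are random placeholders.

    import numpy as np
    from scipy.linalg import solve_toeplitz

    rng = np.random.default_rng(1)
    N, M = 40_000, 512                     # the FM configuration of Table III
    y = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # reference-channel samples (placeholder)
    x = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # surveillance-channel samples (placeholder)

    L = N + M - 1                          # FFT length long enough to avoid circular wrap for lags below M
    Yf = np.fft.fft(y, L)

    # First column of the unnormalised Toeplitz covariance: col[l] = sum_n conj(y[n]) y[n + l]
    col = np.fft.ifft(np.conj(Yf) * Yf)[:M]
    # Cross-covariance, cf. (8): rxy[l] = sum_n conj(y[n - l]) x[n]
    rxy = np.fft.ifft(np.conj(Yf) * np.fft.fft(x, L))[:M]

    # Hermitian Toeplitz solve; the first row is the conjugate of the first column.
    # The 1/N normalisation of (7)-(8) cancels on both sides of (5) and is omitted.
    h = solve_toeplitz((col, np.conj(col)), rxy)

Both correlations cost O((N + M) log(N + M)) and the solve O(M²), in line with the complexities quoted above.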

IV. NUMERICAL EXAMPLES

In passive radar, the undesired components are the direct path interference (DPI) and clutter [2]. The Wiener filter can determine these undesired components from the reference signal.

The above algorithms were evaluated on three real-world scenarios based on each of the main broadcast signals which have been investigated for passive radar: frequency modulation (FM) terrestrial radio, digital audio broadcast (DAB), and digital video broadcasting – terrestrial (DVB-T) [12]. The most relevant properties of each of these broadcast signals are shown in Table II.

TABLE II: Signal properties of transmission signals.

    Property                                FM radio    DAB      DVB-T
    Typical transmission power (kW) [13]    250         10       8
    Maximum bandwidth (kHz)                 150         1 712    7 600
    Sampling frequency (kHz)                200         2 048    8 124

These signal properties translate to three configurations for passive radar systems. Clutter is removed at Doppler frequency offsets lower than 5 Hz, and is implemented using a batched approach [6]. The cancellation filter configuration for each system is shown in Table III.

TABLE III: Cancellation filter configuration.

    System      Estimation samples    Filter taps
    FM radio    40 × 10³              512
    DAB         400 × 10³             1 750
    DVB-T       800 × 10³             4 096

Each of the implementations was then run over a set of recorded data from an FM-based passive radar. While the data set is not representative of the digital transmissions, the execution times of the cancellation algorithms are largely invariant of the underlying data as long as the covariance matrices are non-singular.

For the results, the baselines for comparison are "QR least-squares," which is a least-squares solver, and "LU Wiener," which uses a matrix multiply to form the covariance matrices and an LU factorisation to determine the filter weights. The optimisation in Section III-B is termed "rectangular LU Wiener," the optimisation scheme presented in Section III-C is termed "FFT LU Wiener," and the approximation from Section III-E is termed "Toeplitz Wiener."

In order to place the presented results into a more general context, two other methods for performing clutter cancellation are also included in Tables IV to VI. These are the CGLS solver (implemented with 15 iterations) [2] and QR least-squares factorisation.
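As a rough check on the memory argument behind the implicit representation of Y, the configurations in Table III imply the following storage requirements. The figures are my own back-of-the-envelope estimates assuming 8-byte complex samples; they are not taken from the paper.

    # Explicit N x M sample matrix versus the N + M - 1 unique samples it contains.
    BYTES_PER_SAMPLE = 8                   # assumed complex64 storage
    configs = {"FM": (40_000, 512), "DAB": (400_000, 1_750), "DVB-T": (800_000, 4_096)}

    for name, (N, M) in configs.items():
        explicit_gib = N * M * BYTES_PER_SAMPLE / 2**30
        implicit_mib = (N + M - 1) * BYTES_PER_SAMPLE / 2**20
        print(f"{name}: explicit Y = {explicit_gib:.2f} GiB, unique samples = {implicit_mib:.2f} MiB")

Under these assumptions the explicit DVB-T matrix alone approaches 25 GiB, which is consistent with the observation in Section IV-C that the methods requiring explicit storage could not be accommodated on the test platform.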

A. FM Configuration

The achieved throughput of the filters for the FM configuration is shown in Table IV.

TABLE IV: Cancellation filter throughputs for the FM configuration.

    Cancellation filter       Mean throughput (MS/s)
    QR least-squares          0.033
    CGLS                      2.46
    LU Wiener                 0.564
    Rectangular LU Wiener     7.07
    FFT LU Wiener             6.37
    Toeplitz Wiener           23.7

Aside from the QR least-squares filter, all the filters in Table IV achieved a mean throughput greater than 200 kS/s, allowing them to run in real time (i.e. samples were processed at least as fast as they were generated). The rectangular LU Wiener filter achieved a higher throughput than the FFT LU Wiener filter, demonstrating that the overhead incurred using the FFT method outweighs the computational benefits for this configuration.

As expected, the Toeplitz Wiener filter achieved the highest throughput, which was 3.35 times higher than the rectangular LU filter, 42 times higher than the conventional LU Wiener filter, and 9.63 times higher than the CGLS filter.

B. DAB Configuration

The results for the DAB filter are shown in Table V.

TABLE V: Cancellation filter throughputs for the DAB configuration.

    Cancellation filter       Mean throughput (MS/s)
    QR least-squares          5.40 × 10⁻³
    CGLS                      1.11
    LU Wiener                 0.118
    Rectangular LU Wiener     3.80
    FFT LU Wiener             5.01
    Toeplitz Wiener           9.91

The increased solution size and data rate meant that only the rectangular Wiener, FFT LU Wiener and Toeplitz Wiener filters presented in Section III were able to operate in real time (mean throughput higher than 2 MS/s). The FFT LU Wiener filter outperformed the rectangular Wiener implementation, demonstrating that the overhead associated with the FFTs is justified by the additional computational benefit for the larger filter size.

Once again, the Toeplitz Wiener filter achieved the highest throughput: 2.6 times higher than the rectangular Wiener filter, 2.0 times higher than the FFT LU Wiener filter, and 84 times higher than the LU Wiener filter.

C. DVB-T Configuration

TABLE VI: Cancellation filter throughputs for the DVB-T configuration.

    Cancellation filter       Mean throughput (MS/s)
    CGLS                      0.372
    Rectangular LU Wiener     1.13
    FFT LU Wiener             1.23
    Toeplitz Wiener           3.22

The results for the DVB-T filter are shown in Table VI. For this configuration, none of the methods which require the explicit storage of the cancellation matrix could be implemented, as the memory footprint was larger than the test platform could accommodate; these include the QR least-squares and LU Wiener filters. None of the methods implemented could achieve real-time operation (a mean throughput of at least 8 MS/s), with the fastest throughput achieved being 2.5 times too slow for real-time operation.

Again, the Toeplitz Wiener filter achieved the best throughput, which was 8.7 times higher than the throughput of the CGLS equivalent.

D. Cancellation Effectiveness

The cancellation effectiveness of sub-optimal cancellation algorithms such as CGLS has already been established in the literature [2], [14]. In order to gauge the validity of the Toeplitz approximation, the noise floor for the test data set was compared. The noise floor was estimated by taking the median magnitude of the range-Doppler maps.

Fig. 3 shows the noise floor for the FM configuration at the batch size where the Toeplitz approximation begins to fail. For the relatively stationary environment in which the data was recorded, this occurred when N = 2 000. With no cancellation, the mean noise floor was −116.0 dBm. The LU Wiener cancellation filter (Wiener in Fig. 3) reduced the noise floor to −155.8 dBm, and the Toeplitz Wiener filter achieved a mean noise floor of −154.8 dBm. In this configuration, the Toeplitz approximation raised the noise floor (i.e. reduced the signal-to-noise ratio (SNR)) by 1.08 dB.

Fig. 3: Estimated noise floor (dBm) over time for the original, Wiener, and Toeplitz filters with M = 512 and N = 4 000.

Comparatively, the noise floor for the original filter length (N = 20 000) for the FM configuration is shown in Fig. 4. Here, the Toeplitz Wiener filter resulted in near-identical performance to the LU Wiener filter. The mean noise floor for the data set after the LU Wiener filter was −153.0 dBm, and the Toeplitz Wiener filter displayed a difference of only 0.008 dB, which is negligible. The Toeplitz approximation would be completely valid in this configuration.

Fig. 4: Estimated noise floor (dBm) over time for the original, Wiener, and Toeplitz filters with M = 512 and N = 20 000. Note that the Wiener and Toeplitz graphs are indistinguishable.
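For reference, the noise-floor metric described above can be computed along the following lines. This is an illustrative sketch only: the paper does not specify its exact estimator, so the use of median power and the calibration offset mapping digital units to dBm are assumptions.

    import numpy as np

    def noise_floor_dbm(range_doppler_map, calibration_offset_dbm=0.0):
        """Median-based noise-floor estimate of a complex range-Doppler map.

        calibration_offset_dbm is an assumed factor converting digital power
        to dBm; it is not given in the paper.
        """
        power = np.abs(range_doppler_map) ** 2
        return 10.0 * np.log10(np.median(power)) + calibration_offset_dbm

    # Synthetic example: complex noise plus one strong cell, which barely moves the median.
    rng = np.random.default_rng(2)
    rd = (rng.standard_normal((256, 512)) + 1j * rng.standard_normal((256, 512))) / np.sqrt(2)
    rd[128, 256] += 100.0
    print(noise_floor_dbm(rd))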
V. CONCLUSION

Optimised implementations of the required Wiener filtering were presented and evaluated. These implementations exploit the cancellation filters' tapped delay line and overdetermined structure. Two optimisation strategies were presented. The first, rectangular optimisation, was found to be optimal for small problem sizes, while the second, FFT optimisation, was found to be optimal for larger problem sizes. A Hermitian Toeplitz solver was also presented as an approximation to the Wiener filter in order to further accelerate the filters.

The rectangular and FFT approximations were evaluated in three configurations. These filters were found to increase system throughput by between 12.5 and 42.5 times. Furthermore, these optimisations allowed a significant decrease in memory requirements. It was shown that for typical passive radar problem sizes, the Toeplitz approximation results in an SNR deterioration of only 1.08 dB and 0.008 dB for FM and DAB passive radars respectively.

APPENDIX
HERMITIAN TOEPLITZ SOLVER

This section presents the Hermitian Toeplitz solver used above. The algorithm is modified from a general Toeplitz solver found in the TOEPLITZ library [15].

An M × M Hermitian Toeplitz matrix, A, can be fully described by its first row or column. For this implementation the first row is selected, such that each element of this row is a_{1,i} where i = 1, 2, ..., N. The solution to

    A x = b                                                                 (22)

can be computed by Algorithm 1.

In Algorithm 1, vector elements are denoted x(i). The dimension-dependent variables required for storage are c1, c2, and c3, which are each N − 1 element vectors.

By exploiting the Hermitian structure of A, the number of interior multiplications and additions was halved. Furthermore, the loops between lines 19 and 25 were originally combined (which did not require an explicit copy), but the loop was expanded to allow it to be parallelised.

Algorithm 1: Hermitian Toeplitz solver
Input: N-element Toeplitz matrix vector a, N-element solution vector b
Output: The N-element solution, x
Solve for the first principal minor:
 1: r1 = a(1)
 2: x(1) = b(1)/r1
Solve for the second principal minor:
 3: r6 = a(2)
 4: r3 = −r6/r1
 5: r1 = r1 + r3 r6*
 6: c2(1) = r3
 7: r6 = (b(2) − a(2)* x(1))/r1
 8: x(1) = x(1) + c2(1) r6
 9: x(2) = r6
Solve for the rest of the system:
10: for i = 3 to N do
11:   r6 = a(i)
12:   c1(i − 1) = r3*
13:   for j = 1 to i − 2 do
14:     r6 = r6 + a(j + 1) c2(j)
15:   end for
16:   r3 = −r6/r1
17:   r1 = r1 + r3 r6*
18:   c2(i − 1) = 0
19:   for j = 1 to i − 1 do
20:     c3(j) = c2(j)
21:   end for
22:   for j = 2 to i − 1 do
23:     c2(j) = c1(j) r3 + c3(j − 1)
24:     c1(j) = c1(j) + c3(j − 1) r3*
25:   end for
26:   c2(1) = r3
27:   r5 = 0
28:   for j = 1 to i − 1 do
29:     r5 = r5 + x(i − j) a(j + 1)*
30:   end for
31:   r6 = (b(i) − r5)/r1
32:   for j = 1 to i − 1 do
33:     x(j) = x(j) + c2(j) r6
34:   end for
35:   x(i) = r6
36: end for
37: return x
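Because recursions of this kind are easy to get subtly wrong when transcribing, a small cross-check against a dense solver is worthwhile. The sketch below is my own and is not part of the paper: it builds a random Hermitian Toeplitz system from its first row, exactly as the matrix is described above, and compares a Levinson-type solve (the same scipy.linalg.solve_toeplitz used earlier, standing in for Algorithm 1) against a general dense solve.

    import numpy as np
    from scipy.linalg import toeplitz, solve_toeplitz

    rng = np.random.default_rng(3)
    M = 256

    # A Hermitian Toeplitz matrix is fully described by its first row.
    row = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    row[0] = 2.0 * M                       # large real diagonal keeps the test matrix well conditioned
    col = np.conj(row)                     # first column of a Hermitian Toeplitz matrix
    A = toeplitz(col, row)                 # dense M x M matrix, built only for verification

    b = rng.standard_normal(M) + 1j * rng.standard_normal(M)

    x_fast = solve_toeplitz((col, row), b)   # O(M^2) Levinson-type solve
    x_ref = np.linalg.solve(A, b)            # O(M^3) dense reference

    print(np.max(np.abs(x_fast - x_ref)))    # should be at machine-precision level

The same check applies unchanged to a direct transcription of Algorithm 1.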
REFERENCES

[1] J. Capon, "Maximum-likelihood spectral estimation," in Nonlinear Methods of Spectral Analysis, ser. Topics in Applied Physics, S. Haykin, Ed., vol. 34. Springer Berlin Heidelberg, 1983, pp. 155–179.
[2] J. Palmer and S. Searle, "Evaluation of adaptive filter algorithms for clutter cancellation in passive bistatic radar," in IEEE Radar Conference (RADAR), May 2012, pp. 493–498.
[3] J. R. Guerci, J. S. Goldstein, and I. S. Reed, "Optimal and adaptive reduced-rank STAP," IEEE Transactions on Aerospace and Electronic Systems, vol. 36, no. 2, pp. 647–663, Apr. 2000.
[4] I. S. Reed, J. D. Mallett, and L. E. Brennan, "Rapid convergence rate in adaptive arrays," IEEE Transactions on Aerospace and Electronic Systems, vol. AES-10, no. 6, pp. 853–863, Nov. 1974.
[5] B. Kang, V. Monga, and M. Rangaswamy, "Computationally efficient Toeplitz approximation of structured covariance under a rank constraint," IEEE Transactions on Aerospace and Electronic Systems, vol. 51, no. 1, pp. 775–785, Jan. 2015.
[6] F. Colone, D. O'Hagan, P. Lombardo, and C. Baker, "A multistage processing algorithm for disturbance removal and target detection in passive bistatic radar," IEEE Transactions on Aerospace and Electronic Systems, vol. 45, no. 2, pp. 698–722, Apr. 2009.
[7] L. Trefethen and D. Bau, Numerical Linear Algebra. Society for Industrial and Applied Mathematics, 1997.
[8] Z. Zlatev and H. Nielsen, "Solving large and sparse linear least-squares problems by conjugate gradient algorithms," Computers & Mathematics with Applications, vol. 15, no. 3, pp. 185–202, 1988. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0898122188901708
[9] M. A. Richards, J. A. Scheer, and W. A. Holm, "Constant false alarm rate detectors," in Principles of Modern Radar, Volume I – Basic Principles. SciTech Publishing, 2010, pp. 589–623.
[10] W. R. Freund, "A look-ahead Bareiss algorithm for general Toeplitz matrices," Numerische Mathematik, vol. 68, no. 1, pp. 35–69, 1994.
[11] NVIDIA Corporation, CUDA C Programming Guide. NVIDIA Corporation, 2015.
[12] P. Howland, "Editorial: Passive radar systems," IEE Radar, Sonar and Navigation, vol. 152, no. 3, pp. 105–106, June 2005.
[13] H. D. Griffiths and C. J. Baker, "Passive coherent location radar systems. Part 1: Performance prediction," IEE Radar, Sonar and Navigation, vol. 152, no. 3, pp. 153–159, June 2005.
[14] C. Tong, "A scalable real-time processing chain for radar exploiting illuminators of opportunity," 2014.
[15] O. Arushanian, M. Samarin, V. Voevodin, E. Tyrtyshnikov, B. Garbow, J. Boyle, W. Cowell, and K. Dritz, The TOEPLITZ Package User's Guide. Argonne National Laboratory, 1983.