
Subspace SNR Maximization: The Constrained Stochastic Matched Filter

Bruno Borloz, Bernard Xerri

To cite this version:

Bruno Borloz, Bernard Xerri. Subspace SNR Maximization: The Constrained Stochastic Matched Filter. IEEE Transactions on Signal Processing, Institute of Electrical and Electronics Engineers, 2011, 59 (4), pp. 1346-1355. 10.1109/TSP.2010.2102755. hal-01823637

HAL Id: hal-01823637
https://hal-amu.archives-ouvertes.fr/hal-01823637
Submitted on 26 Jun 2018

Subspace SNR maximization: the constrained stochastic matched filter

Bruno Borloz 1,2 - Bernard Xerri 1,2
1 Université du Sud Toulon Var, IM2NP, Equipe "Signaux et Systèmes"
2 CNRS, IM2NP (UMR 6242)
Bâtiment X, BP 132, F-83957 La Garde Cedex (FRANCE)
Tel: 33 494 142 461 / 33 494 142 565 - Fax: 33 494 142 598
[email protected], [email protected]

Abstract

In this paper, we propose a novel approach to perform detection of a stochastic signal embedded in an additive random noise. Both signal and noise are considered to be realizations of zero mean random processes whose only second-order statistics are known (their covariance matrices). The method proposed, called the Constrained Stochastic Matched Filter (CSMF), is an extension of the Stochastic Matched Filter, itself derived from the Matched Filter. The CSMF is optimal in the sense that it maximizes the Signal-to-Noise Ratio in a subspace whose dimension is fixed a priori. In this paper, after giving the reasons for our approach, we show that there is neither an obvious nor an analytic solution to the problem expressed. Then an algorithm, which is proved to converge, is proposed to obtain the optimal solution. The evaluation of the performance is completed through estimation of Receiver Operating Characteristic curves. Experiments on real signals show the improvement brought by this method and thus its significance.

Keywords: detection; subspace method; reduced-rank method; signal-to-noise ratio maximization; matched filter; matched subspace.

I. Introduction

This paper deals with the problem of detecting a stochastic signal (like a transient signal, for example) embedded in an additive random noise. Throughout this paper, all the signals will be real and discrete (time samples, pixels of images, ...) and represented with vectors of R^N.

The method proposed here consists in a linear filtering called (for reasons explained later) the "Constrained Stochastic Matched Filter" (CSMF). This method gives, for an integer value p (1 ≤ p < N), among all the p-dimension subspaces, the one where the Signal-to-Noise Ratio (SNR) is maximum: the CSMF is optimal for this criterion. This is a reduced-rank method (a projection) under constraint, the constraint being the a priori knowledge of the dimension p [1].

The SNR is invariant in a p-dimension subspace: it does not depend on the basis chosen to describe the subspace. An important consequence of this invariance of the SNR w.r.t. the basis is that the simplest basis, say an orthonormal one, can usefully be chosen; moreover, in such a basis the mathematical expression of the SNR is simple to obtain and will simplify later calculations.

In this paper we show that there is no immediate or obvious way to find the optimal p-dimension subspace; we therefore propose an algorithm and prove its convergence to the correct solution.

The performance of the method and the comparisons with other methods are assessed through Receiver Operating Characteristic (ROC) curves, giving the Probability of Detection PD w.r.t. the Probability of False Alarm PFA. Nevertheless, this paper gives no demonstration that ROC curves are better for a predicted value of p: we only observe, with results obtained from numerical simulations, that there exists a value of p for which the ROC curve is the best one.

Let us also note that our model is not a parametric one. The only knowledge is the covariance matrices of the random signals.

A. Problem Formulation

Let us consider an observation x ∈ R^N. Two hypotheses can be formally stated (detection problem): this measurement was produced by ambient noise n alone or by a signal s embedded in this noise, respectively:

H0: x = n
H1: x = s + n

The objective is to decide between these hypotheses. Our model will not be a parametric one. The assumptions of our model are the following:

1) s and n are realizations of zero mean random processes.
2) The covariance matrices of s and n, respectively A and B, are supposed to be known, full rank and different.
3) s and n are uncorrelated, not necessarily Gaussian, and their Probability Density Functions (PDF) are unknown.

Two kinds of error are possible: the missing of the signal and the false alarm. A trade-off (highlighted by the ROC curves) must be found between a small average number of misses and a small average number of false alarms.
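Purely as an illustration of the model above (this sketch is ours, not code from the paper), the following Matlab fragment draws synthetic observations under both hypotheses. The covariance matrices are arbitrary unit-trace placeholders, and Gaussian draws are used only for convenience: the model itself leaves the PDFs unspecified.

% Illustrative sketch: observations under H0 (noise alone) and
% H1 (signal plus noise), given covariance matrices A and B.
N = 21;                                % observation dimension
Cs = gallery('lehmer', N);             % placeholder signal covariance
Cn = eye(N) + 0.5*ones(N);             % placeholder noise covariance
A = Cs/trace(Cs); B = Cn/trace(Cn);    % unit traces (cf. Section III-A)
s = chol(A, 'lower')*randn(N, 1);      % zero-mean draw with covariance A
n = chol(B, 'lower')*randn(N, 1);      % zero-mean draw with covariance B
x_H0 = n;                              % hypothesis H0: x = n
x_H1 = s + n;                          % hypothesis H1: x = s + n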

When the PDF of the signals are known, the key quantity to compute is the Likelihood Ratio (LR) L(x), which must be compared to a threshold determined according to a criterion such as the minimization of the probability of error, or the maximization of PD when PFA is fixed a priori (Neyman-Pearson criterion) [2]-[5]. When the PDF are unknown, L(x) cannot be calculated. This is why we take into consideration methods based on SNR maximization. Furthermore, the CSMF method takes its place among the numerous reduced-rank techniques currently known (Section II is a survey of some SNR maximization and reduced-rank methods justifying the approach of our method).

B. Why the CSMF?

When the PDFs are known (e.g. Gaussian), the Likelihood Ratio Test (LRT) is an optimal test which leads to comparing a value to a threshold. For Gaussian signals, the test can easily be written as a sum of N terms (thus over the whole space of the used signals) depending on the observation vector x:

\log L(x) = \Lambda(x) = \sum_{i=1}^{N} \frac{\lambda_i}{1+\lambda_i} (u_i^t x)^2,   (1)

where the λi (λ1 ≥ λ2 ≥ ... ≥ λN) and ui are respectively the eigenvalues and corresponding eigenvectors of B⁻¹A (see Section II-C). In fact, the eigen-elements of B⁻¹A naturally appear when trying to maximize the output SNR of a linear filter h: this output SNR can be written ρ = (h^t A h)/(h^t B h). The maximal value of ρ, noted ρmax, is obtained for h = u1, the eigenvector associated with the largest eigenvalue λ1 of B⁻¹A: this filtering consists in a projection of the signal x onto E_u1, and then it is easy to verify that ρmax = λ1. This method is called the "Stochastic Matched Filter" (SMF) [6].

When signals are not Gaussian, we can continue to use Λ(x), which is no longer the log of the LR. This expression has no reason to be optimal, and experimental results show, first, that a truncation of this sum to p terms can improve the ROC curves and, next, that there exists an optimal value of p for which the ROC curve is the best one. This truncation (p < N) is expressed as follows:

\sum_{i=1}^{p} \frac{\lambda_i}{1+\lambda_i} (u_i^t x)^2,

and can be seen as a projection of x onto a p-dimension subspace Ep†, where Ep† is spanned by {ui}, i = 1, ..., p. This method, called the "Extended Stochastic Matched Filter" (ESMF), could be wrongly interpreted as a SNR maximization method: in fact it does not maximize the SNR in a p-dimension subspace but a weighted sum of output SNRs, each of them after a projection onto E_ui for i = 1, ..., p (see Section II-C).

The method proposed in this paper is naturally inferred from these remarks concerning the output SNR maximization and the projection onto a subspace of dimension two or higher; therefore its aim is to maximize the SNR in an aptly chosen subspace with an a priori given dimension p. The choice of p, and thus of the dimension of the optimal subspace searched for, is a constraint: this is why the name "Constrained Stochastic Matched Filter" (CSMF) was given to this optimal filter. We will clearly see that the CSMF is not a simple extension of the ESMF and that the CSMF cannot be inferred from the ESMF. However, when p = 1, the CSMF and the SMF are identical. But when p > 1, it is proved in this paper (cf. Section III-F) that the optimal subspace Ep∗ cannot be simply deduced from the knowledge of either Ep† or Ep−1∗. Hence, it is necessary to propose an algorithm that finds the solution: this algorithm is given and is proved to converge to the solution (cf. Section IV).

Organization of the paper

In Section I we formulate the mathematical model and present the basic assumptions. Section II describes existing methods and introduces those proposed in the paper. The method is detailed in Section III and useful properties are highlighted. Section IV is dedicated to the practical determination of this subspace: an algorithm is proposed to find the optimal subspace Ep∗ and the proof of its convergence is given. Then experimental results are presented in Section V.

In this paper, we apply the method to detection, but it could obviously be used for compression, filtering or estimation problems.

II. Overview of some existing methods

The model is that given in Section I-A.

A. The Karhunen-Loève Transform

The Karhunen-Loève transform (KLT) is a Principal Component Analysis used to tackle this model [6]-[8] when noise is white (B = σn²I) or absent; it provides the best approximation, in the sense that it minimizes a mean square error (MSE), for a stochastic signal under the condition that its rank is fixed, and is used for example for data compression or filtering. When noise is white, it determines the p-dimension subspace where the SNR is maximum. But it does not consider colored noise, and therefore is not optimum even when it is used with a noise suppression filter such as the Wiener filter (which is not a reduced-rank method). The SMF, a Generalized Eigen Decomposition (GED) introduced by Cavassilas-Xerri [6], will be detailed in Section II-C: it performs a two-stage operation (pre-whitening and KLT), but is shown to be not optimal in terms of maximization of the SNR. GED is a major problem in many modern information processing applications (adaptive filtering, blind source separation [7], ...) and fast algorithms to estimate and track the principal generalized eigenvectors have been developed [8]. We will show that the CSMF can be seen as an extension of the KLT and the SMF for the problems we are interested in.
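This two-stage (pre-whitening, then KLT) reading of the SMF can be checked numerically. The sketch below is ours, with placeholder unit-trace covariances: it verifies that the generalized eigenvalues of (A, B) coincide with the ordinary eigenvalues obtained after whitening the noise.

% Sketch: the SMF as noise whitening followed by a KLT.
N = 5;
A = gallery('minij', N);  A = A/trace(A);   % placeholder covariances
B = gallery('lehmer', N); B = B/trace(B);
lambda = sort(eig(A, B), 'descend');        % generalized spectrum of (A,B)
W = chol(B, 'lower');                       % whitening factor, B = W*W'
Aw = W \ A / W';                            % whitened signal covariance
mu = sort(eig(Aw), 'descend');              % KLT after whitening
disp(max(abs(lambda - mu)))                 % ~0: identical spectra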


Some authors want to find an optimal linear data compression method in the presence of noise: the Proportional KLT (PKLT) applies an oblique projection operator onto a subspace S (dim(S) = p) along a subspace L (S and L are both unknown) [9]. This operator P naturally maximizes a ratio of powers. Solving this problem without any constraint concerning the rank of P naturally leads to an impossibility. The first part of their work shows how justifiable it is to take an interest in the maximization of the SNR in a subspace. The CSMF proposed in this paper solves this problem by adding a constraint: the rank of the subspace to project the data onto.

B. Parametric Models

For detection with reduced-rank methods, many authors have worked on a parametric model of the following form: x = Hθ + n = \sum_{i=1}^{p} θ_i h_i + n (H is an N × p matrix). The useful signal s = Hθ is a stochastic signal constrained to lie in the signal subspace, the p-dimension subspace spanned by the known modes hi, with mode weights or gains, the entries θi of θ. This model is an extension of the one used for the Matched Filter (MF) detector, matched to a signal that is assumed to lie in a 1-dimensional subspace (i.e. H = h1 is the deterministic signal to detect). It is noteworthy that our model is very different, because the covariance matrix A is full rank and then the signal subspace has dimension N.

When the noise is Gaussian, the output of the MF provides a sufficient statistic for any LRT for detection. The knowledge of s and the second-order statistics of n is necessary to derive the corresponding MF. For p > 1 the MF detector is no longer adequate and is extended to the Matched Subspace Detector (MSD) [10]-[17], assuming prior knowledge of B. The MF is also named the coherent MSD.

When the gains are unknown, the Generalized Likelihood Ratio Test [5] takes the form of a ratio of two quadratic forms of prewhitened observations using orthogonal projections onto suitable subspaces. The statistic obtained has natural invariances (the energy of the subspace signal and the SNR are unchanged). When B is unknown, it is obtained from training data (Adaptive Subspace Detectors) [13][14].

Numerous papers deal with the MF detector and its extensions: several reasons may imply that signal and/or noise are not exactly known (channel nonlinearities, timing jitter, non-stationarities, model uncertainties, ...) [18][19]. Another problem studied (e.g. in digital communications [20]) is that of detecting a transmitted signal when one out of several known signals is transmitted. When the additive noise is white and Gaussian, the optimal detector consists of a bank of MFs followed by a detector which chooses as the detected signal the one with the maximal output value. Improvements have been observed in many cases [21].

C. The Stochastic Matched Filter and the Extended SMF

C.1 Introduction

The SMF was first introduced to detect a random signal not supposed to lie in a known subspace; furthermore, the second-order statistics of both s and n are supposed to be known [6]. It can be seen as an extension of the MF (it provides an optimal filter since it maximizes the SNR), but also of the KLT. This problem is a generalized eigenvalue problem using the covariance matrices A and B; the filtering is a projection onto the optimal subspace spanned by the eigenvector of B⁻¹A with maximum eigenvalue, which is also the value of the maximum output SNR.

The output SNR ρ of a linear filter h can be written as a Rayleigh quotient: ρ = (h^t A h)/(h^t B h) (if A and B have unit trace, ρ can be interpreted as a gain on the SNR). This problem is equivalent to solving the following generalized eigenvalue problem: Ah = λBh. The maximal value ρmax is obtained for h = u1, the eigenvector associated with the largest eigenvalue λ1 of B⁻¹A; then ρmax = λ1 > 1: this filtering performs a projection of the signal onto E1∗ = E_u1.

If {λi} and {ui} are the eigenvalues and eigenvectors of B⁻¹A, with λ1 ≥ ... ≥ λN, we easily prove that:
• λi ≥ 0 can be interpreted as a gain on the SNR after projection onto E_ui.
• {ui} is a non-orthogonal basis of R^N performing simultaneous diagonalization of A and B. If U = [u1 ... uN], an appropriate normalization of the ui gives, with Δ a positive diagonal matrix [22]:

U^t A U = Δ,   U^t B U = I.

The interpretation of the λi naturally leads to taking into account directions of projection that could statistically contribute to a better detection, which means growing the dimension of the subspace to project data onto. Actually, it has been shown [6] that a few eigenvectors other than the dominant one can statistically contribute to improve ROC curves. To take a decision, we have to propose a function of x and the ui.

When the signals are Gaussian, the calculation of the logarithm of the LR leads easily to equation (1). This expression has no reason to be optimal when the signals are not Gaussian but, according to the remarks above, the summation is shortened to p terms corresponding to significant eigenvalues and the function chosen is then:

\Lambda(x; p) = \sum_{i=1}^{p} \frac{\lambda_i}{1+\lambda_i} (u_i^t x)^2,   (2)

which is a weighted sum of the (u_i^t x)², the powers of the observation x after projection onto each direction ui, each weight being linked to the SNR in this direction. This method, called the "Extended SMF" (ESMF), does not maximize the SNR in a p-dimension subspace, but a weighted sum of output SNRs, each of them after a projection onto E_ui for i = 1, ..., p. We will denote by Ep† the subspace spanned by {ui}, i = 1, ..., p.
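As a concrete reading of equation (2), the Matlab sketch below (our code and naming, saved as esmf_stat.m) evaluates the ESMF statistic Λ(x; p), enforcing the normalization U^t B U = I explicitly.

% Sketch: ESMF detection statistic of equation (2).
function Lambda = esmf_stat(x, A, B, p)
[U, D] = eig(A, B);                         % u_i and lambda_i of inv(B)*A
[lambda, idx] = sort(diag(D), 'descend');   % lambda_1 >= ... >= lambda_N
U = U(:, idx);
U = U ./ sqrt(diag(U'*B*U))';               % normalize so that U'*B*U = I
w = lambda(1:p) ./ (1 + lambda(1:p));       % weights lambda_i/(1+lambda_i)
Lambda = sum(w .* (U(:, 1:p)'*x).^2);       % weighted projected powers
end

A decision under the Neyman-Pearson criterion then amounts to comparing Lambda to a threshold chosen for the desired PFA.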


C.2 Illustration: a practical example

To illustrate the interest of taking into account a subspace of dimension higher than one, let us apply this method to a narrow-band signal embedded in simulated underwater acoustic noise: the central frequency is f0 = 3131 Hz and the spectral bandwidth of the noise is B = 1260 Hz. The sampling frequency is here Fe = 15423 Hz. Experiments are performed on Nr = 1000 realizations of the signal, denoted si (N = 21, see Appendix A), and the initial SNR is about −14 dB. Envelope detection techniques could be used but, in practice, they give lower quality results.

A is calculated as follows: A = \sum_{i=1}^{N_r} s_i s_i^t. For these simulations, the areas of presence or absence of a signal are obviously known. A detection (which can be a false alarm) is decided each time there are at least 5 consecutive points of the result function given by equation (2) above the threshold.

ROC curves are shown in figure 1, first for p = 1 (E1†) and for the optimal value of p, say 3 (E3†).

[Fig. 1. ESMF: ROC curves (input SNR = −14 dB) for different values of p.]

The improvement brought by projecting data onto E3† is obvious. This example shows how the making of a decision can be greatly improved by taking into consideration more than one eigenvector. Even if the SNR in E3† is smaller than in E1†, we observe that increasing the projection subspace dimension p brings an improvement that is not counterbalanced by the decrease of the SNR.

We can also see that there is a worsening of the ROC curve for p = 4. The projection onto E4† will give statistically worse results than the one onto E3†. This result confirms that there is an optimal value of p for the chosen criterion.

C.3 Conclusion

The SMF can be proved to be a two-stage operation: data whitening and then maximisation of the SNR in a p-dimensional subspace Ep† (KLT). But, as whitening is not an optimal operation in terms of SNR, the global operation has no reason to maximize the SNR in Ep†. Hence, methods which try to maximize a SNR while performing a reduced-rank operation are natural when the PDF are unknown.

Thus, it seems natural and legitimate to ask oneself if there exists a p-dimensional subspace (p > 1) in which the SNR is maximal, with the hope that ROC curves would be improved again, and then if it is possible to find it.

III. The Constrained Stochastic Matched Filter

The method proposed in this paper has been called the Constrained SMF (CSMF) because it can be seen as an extension of the SMF, naturally inferred from the remarks in the previous section concerning a projection onto a subspace of dimension two or higher where the SNR is maximum.

A. Preliminary remarks and notations

A random signal s, such as in our model, is a vector of R^N and can always be expressed as follows:

s = \sum_{i=1}^{N} \alpha_i x_i = Xa,

where a = [α1 ... αN]^t is a vector of random variables and X = [x1 ... xN] a basis of R^N.

In this paper we are interested only in powers in subspaces. Let us denote by A = E(ss^t) the covariance matrix and by Ps the power of s. Of course, Ps depends only on the subspace and not on the basis used to describe it, and with no loss of generality, it is possible to consider only orthonormal bases to describe any subspace: hence X^t X = I. It readily follows that:

P_s = \sum_{i=1}^{N} x_i^t A x_i = \mathrm{tr}(X^t A X) = \mathrm{tr}(A X X^t) = \mathrm{tr}\,A.   (3)

Moreover, as Ps = tr A, we can consider, without loss of generality, only covariance matrices of trace 1.

B. SNR in a p-dimension subspace Ep

Considering an integer p chosen a priori in [1, ..., N − 1], let us denote by Ep = E_{x1,...,xp} the p-dimension subspace spanned by the p orthonormal vectors x1 to xp. We will also denote Ep⊥ = E_{xp+1,...,xN}. Then, with Xp = [x1, ..., xp], the projection of s onto Ep along Ep⊥ has power Pp:

P_p = \mathrm{tr}(X_p^t A X_p) = \sum_{i=1}^{p} x_i^t A x_i,

and the expression of the SNR in Ep takes the form:

\rho = \frac{\sum_{i=1}^{p} x_i^t A x_i}{\sum_{i=1}^{p} x_i^t B x_i} = \frac{\mathrm{tr}(X_p^t A X_p)}{\mathrm{tr}(X_p^t B X_p)}.   (4)

The objective is to find the unknowns {xi} in order to maximize this term. The optimal subspace will be denoted Ep∗ and the corresponding SNR ρ∗.
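Equation (4) is immediate to evaluate once an orthonormal basis Xp of a candidate subspace is chosen; a minimal sketch (the helper name is ours):

% Sketch: SNR of equation (4) in the subspace spanned by the
% orthonormal columns of the N x p matrix Xp.
function rho = subspace_snr(Xp, A, B)
rho = trace(Xp'*A*Xp)/trace(Xp'*B*Xp);
end

For instance, subspace_snr(orth(randn(N, p)), A, B) gives the SNR in a random p-dimension subspace; the problem addressed below is the maximization of this quantity over Xp.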

tr A / tr B is the input SNR and, if the covariance matrices A and B have unit trace, ρ is in fact a gain on the SNR (and no longer the output SNR), which can be proved to be necessarily lower than the largest eigenvalue of B⁻¹A, say λ1 [1].

We see here the important difference with the SMF, which maximizes the following expression:

\rho_{smf} = \sum_{i=1}^{p} \frac{v_i^t A v_i}{v_i^t B v_i},

where the vi do not form an orthonormal basis. Throughout the following section, we will focus our attention on equation (4) and try to find Ep∗.

C. Characterization of the optimal subspace Ep∗

Let us consider a p-dimension subspace Ep spanned by a set of p orthonormal vectors X = [x1 ... xp]. The expression of the SNR ρ in Ep is given by equation (4). The constraints can be expressed by the p² relationships x_i^t x_j = δij. Clearly, p is given and the unknowns of our problem are ρ∗ and the p vectors xi, which must be calculated so as to maximize ρ. We are faced with a constrained optimization problem, which is usually solved using a Lagrange multipliers method. Let us define the following function:

L(X, \Omega) = \rho + \sum_{i=1}^{p}\sum_{j=1}^{p} \omega_{ij} (x_i^t x_j - \delta_{ij}).   (5)

This equation can be written:

L(X, \Omega) = \frac{\mathrm{tr}(X^t A X)}{\mathrm{tr}(X^t B X)} + \mathrm{tr}(\Omega(X^t X - I)),   (6)

where Ω ≡ [ωij] is a p × p symmetric matrix. A necessary condition for this value to be maximum is ∂L/∂X = 0, which means that for ρ = ρ∗:

\frac{(A - \rho^* B)X}{\mathrm{tr}(X^t B X)} + X\Omega = 0.

As B is positive definite, tr(X^t B X) > 0 and this equation becomes

(A - \rho^* B)X = X\Omega_0,   (7)

where Ω0 is a p × p real symmetric matrix, but not a diagonal one. However, we can find a real orthogonal matrix Π and a real diagonal matrix Δµ∗ ≡ [µi∗] such that Ω0 = ΠΔµ∗Π^t. Then equation (7) becomes:

(A - \rho^* B)X\Pi = X\Pi\Delta_{\mu}^*.   (8)

As Π is invertible, XΠ and X span the same subspace Ep∗. Furthermore, as Π is a real orthogonal matrix, the set of orthonormal vectors X is changed into another set of orthonormal vectors XΠ. Noting XΠ = T∗ = [t1∗ ... tp∗], equation (8) can be written

(A - \rho^* B)T^* = T^*\Delta_{\mu}^*,   (9)

which is an eigenvalue problem. Note that for any value of ρ, (A − ρB) is always real symmetric and hence diagonalizable through an N × N unitary real eigenvector matrix. That means that Ep∗ is spanned by a set of p orthonormal vectors which are eigenvectors of (A − ρ∗B). Nevertheless, equation (9) is not simple to solve because, if the ti∗ and µi∗ are naturally unknown, ρ∗ is unknown too. For any value ρ > 0 let us denote:

(A - \rho B)\,t_i = \mu_i t_i, \quad i = 1, ..., N.   (10)

(A − ρB) is always real symmetric and the {ti} naturally form an orthonormal basis.

We note that the eigenvalues µi depend on ρ: it is easy to show, with trivial examples, that they are nonlinear w.r.t. ρ. A simple illustration is given in figure 2, where we can see the evolution of the eigenvalues µi w.r.t. ρ for the covariance matrices A and B given in Example 1 (cf. Section III-F).

[Fig. 2. µi(ρ) for Example 1 (N = 3).]

D. Property of the optimal subspace Ep∗

Equation (10) implies that for any i, j ∈ {1, ..., N}, t_j^t (A − ρB) t_i = µ_i t_j^t t_i = µ_i δij. Then, for any subset I of {1, 2, ..., N} with cardinality p,

\sum_{i\in I} t_i^t (A - \rho B)\,t_i = \sum_{i\in I} t_i^t A t_i - \rho \sum_{i\in I} t_i^t B t_i = \sum_{i\in I} \mu_i,

so that

\frac{\sum_{i\in I} t_i^t A t_i}{\sum_{i\in I} t_i^t B t_i} - \rho = \frac{\sum_{i\in I} \mu_i}{\sum_{i\in I} t_i^t B t_i}.   (11)

For the optimal subspace Ep∗, i.e. ρ = ρ∗, the left expression is null, implying that

\sum_{i\in I^*} \mu_i^* = 0.   (12)

This property will be used to find the solution in the two trivial cases p = 1 and p = N − 1 (cf. Section III-E), but also to prove the convergence of the algorithm (cf. Section IV-B). We have denoted I = I∗; it is easy to show that I∗ = {1, 2, ..., p} if the eigenvalues are sorted in decreasing order, say µ1∗ > µ2∗ > ... > µN∗.
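The behaviour shown in figure 2 is easy to reproduce; a sketch (ours), for any unit-trace covariance pair (A, B) already in the workspace:

% Sketch: evolution of the eigenvalues mu_i of (A - rho*B) w.r.t. rho,
% as in figure 2; they are nonlinear functions of rho.
rhos = linspace(0, 3, 301);
mus = zeros(size(A, 1), numel(rhos));
for k = 1:numel(rhos)
    mus(:, k) = sort(eig(A - rhos(k)*B), 'descend');
end
plot(rhos, mus); xlabel('rho'); ylabel('mu_i');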


E. Particular cases

In some particular cases, it is possible to reach the solution easily, without any sophisticated algorithm.

• p = 1
The eigenvalue µ1∗ to take into account is null. Hence, (A − ρ∗B)t1∗ = µ1∗t1∗ = 0, i.e. At1∗ = ρ∗Bt1∗. ρ∗ is the largest eigenvalue of B⁻¹A and t1∗ its associated eigenvector. Naturally, we find in this case the SMF.

• p = N − 1
As \sum_{i=1}^{N-1} µi∗ = 0, µN∗ = 1 − ρ∗. Hence, (A − ρ∗B)tN = (1 − ρ∗)tN, or (A − IN)tN = ρ∗(B − IN)tN; ρ∗ is the largest eigenvalue of (B − IN)⁻¹(A − IN) and tN its associated eigenvector. The N − 1 vectors spanning E∗_{N−1} are the other eigenvectors.

• B = I
In this case, (A − ρI)ti = µiti, i.e. Ati = (ρ + µi)ti. For ρ = ρ∗, \sum_{i=1}^{p} µi = 0 and then \sum_{i=1}^{p} (ρ + µi) = pρ = \sum_{i=1}^{p} λi^A (where λi^A is the i-th eigenvalue of A), which means ρ = (1/p)\sum_{i=1}^{p} λi^A. Hence the Karhunen-Loève filter is a particular case of the CSMF.

F. Examples

For the following examples, we state p = 2 and search for the optimal subspace E2∗. As p = N − 1, we can use the results of the previous paragraph: the SNR ρ∗ obtained in E2∗ is easy to calculate.

• Example 1

A = (1/3) [1.00 0.80 0.64; 0.80 1.00 0.80; 0.64 0.80 1.00],

B = (1/3) [1.00 0.60 0.36; 0.60 1.00 0.60; 0.36 0.60 1.00].

We denote by U and Δλ the matrices such that AU = BUΔλ:

U = [0.6611 −0.7071 0.4170; 0.3549 0.0000 −0.8076; 0.6611 0.7071 0.4170],

and λ1 = 1.23, λ2 = 0.56 and λ3 = 0.46. The SNR obtained in E2†, spanned by the eigenvectors associated with the two largest eigenvalues of B⁻¹A (u1 and u2), is ρ† = 1.06. The SNR obtained in E2∗ is ρ∗ = 1.118. If we denote (A − ρ∗B)T∗ = T∗Δµ∗, then:

T∗ = [0.3370 −0.7071 0.6216; −0.8791 0.0000 0.4766; 0.3370 0.7071 0.6216],

and µ1∗ = −0.0725, µ2∗ = −0.1186 and µ3∗ = 0.0725. E2∗ is spanned by t1∗ and t3∗ (we verify that µ1∗ + µ3∗ = 0 and µ2∗ = 1 − ρ∗). In this example, since t2∗ = u2, E2∗ is spanned by two eigenvectors of B⁻¹A, namely u1 and u3, which are clearly not those associated with the two largest λi.

• Example 2

Let us consider two covariance matrices of non-stationary processes:

A = (1/3) [0.0379 0.0379 0.1514; 0.0379 0.0473 0.2650; 0.1514 0.2650 2.9148],

B = (1/3) [1.2872 1.4658 0.1313; 1.4658 1.6865 0.1629; 0.1313 0.1629 0.0263].

Denote by U and Δλ the matrices satisfying AU = BUΔλ:

U = [0.5219 0.7555 −0.3972; −0.5189 −0.6549 0.9156; 0.6770 0.0192 −0.0622],

and λ1 = 1051, λ2 = 0.45 and λ3 = 0.01. The SNR obtained in E2†, spanned by the eigenvectors associated with the two largest eigenvalues of B⁻¹A, is ρ† = 149.64. The SNR obtained in E2∗ is ρ∗ = 151.05. We denote (A − ρ∗B)T∗ = T∗Δµ∗; then:

T∗ = [0.7111 −0.2531 −0.6559; −0.5845 0.3057 −0.7516; −0.3908 −0.9179 −0.0695],

and µ1∗ = −0.54, µ2∗ = 0.54 and µ3∗ = −150.05. E2∗ is spanned by t1∗ and t2∗ (we verify that µ1∗ + µ2∗ = 0 and µ3∗ = 1 − ρ∗). No eigenvector of B⁻¹A is contained in E2∗, a fortiori not the eigenvector associated with the largest eigenvalue of B⁻¹A, which generates E1∗. This example proves that a recursive algorithm w.r.t. p, which would calculate E1∗ and then E2∗, etc., is not realistic.

• Conclusion

From these simple examples, we immediately see that the optimal subspace Ep∗ is not necessarily spanned by eigenvectors of B⁻¹A and, even when this is the case, the eigenvectors are not necessarily those associated with the largest eigenvalues. It is not possible to deduce Ep∗ from Ep†. What is more, it is not possible to find a recursive formulation in p to find Ep∗ from Ep−1∗: for example, the relationship E1∗ ⊂ E2∗ is not necessarily verified. We therefore have to propose an algorithm to determine Ep∗. This will be done in Section IV.

G. Conclusion

In this section, the problem has been presented and the equations that must be solved to find the optimal p-dimension subspace have been deduced. We have seen that there exists neither an analytic nor an obvious solution and that an algorithm must be proposed. This is the purpose of the next section. The CSMF consists in finding the p-dimension subspace which maximizes the SNR after only a projection.
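Since p = N − 1 here, the closed form of Section III-E can be used to check the figures of Example 1 numerically; a verification sketch (ours; matrix values and expected outputs as given above):

% Sketch: Example 1 checked with the p = N-1 closed form (Section III-E).
A = [1.00 0.80 0.64; 0.80 1.00 0.80; 0.64 0.80 1.00]/3;
B = [1.00 0.60 0.36; 0.60 1.00 0.60; 0.36 0.60 1.00]/3;
N = size(A, 1);
d = real(eig(A - eye(N), B - eye(N)));  % spectrum of inv(B - I)(A - I)
rho_star = max(d);                      % expected: rho* = 1.118
[T, M] = eig(A - rho_star*B);           % columns t_i, eigenvalues mu_i
disp(rho_star); disp(diag(M));          % mu_1* + mu_3* = 0, mu_2* = 1 - rho*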


IV. Algorithm to find the optimal subspace Ep∗

In light of the examples of the previous section, an algorithm is required to determine the optimal subspace Ep∗, spanned by p vectors ti verifying equation (10) and maximizing ρ defined by

\rho = \frac{\sum_{i \in I} t_i^t A t_i}{\sum_{i \in I} t_i^t B t_i},   (13)

where I is a subset of p different numbers out of {1, ..., N}. It seems natural that such an algorithm should be iterative and use, at each step, these equations alternately.

A. Presentation of the algorithm

ρ∗ being unknown, it is reasonable to choose as the initial value of ρ, say ρ(0), the largest eigenvalue of B⁻¹A. At each step n ≥ 0, we form the symmetric matrix (A − ρ(n)B) and calculate its N eigenvectors ti(n) associated with the eigenvalues µi(n). Then we must choose among them the p ones {ti(n), i ∈ I(n), card(I(n)) = p} for which

\rho^{(n+1)} = \frac{\sum_{i \in I^{(n)}} t_i^{(n)t} A\, t_i^{(n)}}{\sum_{i \in I^{(n)}} t_i^{(n)t} B\, t_i^{(n)}}   (14)

is maximum. These p vectors span a subspace Ep(n). It is then easy to calculate (A − ρ(n+1)B), I(n+1) and the new subspace Ep(n+1). This process can be iterated until |ρ(n+1) − ρ(n)| < ε (see Table 1). Of course, we have to prove that this algorithm converges to the correct solution ρ∗.

Description of the algorithm:
ρ(0) = λ1, the largest eigenvalue of B⁻¹A; n = 0
1) Calculate M(n) = A − ρ(n)B.
2) Compute the eigenvalues µi(n) and corresponding eigenvectors ti(n) (i = 1 to N) of M(n) (it is possible to sort the eigen-elements so that µ1(n) ≥ µ2(n) ≥ ... ≥ µN(n)).
3) Form the C_N^p combinations of p elements out of {1, 2, ..., N}; they will be denoted Ik(n), with card(Ik(n)) = p and 1 ≤ k ≤ C_N^p.
4) For k = 1 to C_N^p, calculate ρk(n+1) = Σ_{i∈Ik(n)} ti(n)t A ti(n) / Σ_{i∈Ik(n)} ti(n)t B ti(n) (k-th combination).
5) Find the maximal value of {ρk(n+1)}, k = 1, ..., C_N^p; this maximum is denoted ρ(n+1).
6) If |ρ(n+1) − ρ(n)| < ε, stop the iterations: ti∗ = ti(n), µi∗ = µi(n), ρ∗ = ρ(n+1). Else n ← n + 1 and go to 1).
Ep∗ is spanned by the {ti∗}, i ∈ I∗ = {1, 2, ..., p}.

Table 1: description of the algorithm
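A direct Matlab transcription of Table 1 is sketched below (our code; the exhaustive subset search of steps 3-5 is written out with nchoosek). On Example 1 of Section III-F, csmf(A, B, 2, 1e-10) should return ρ∗ = 1.118. The practical variant of Section IV-D below simply replaces the subset search with the p eigenvectors of largest eigenvalue.

% Sketch: iterative algorithm of Table 1, with exhaustive subset search.
function [rho, T] = csmf(A, B, p, tol)
N = size(A, 1);
rho = max(eig(A, B));                  % rho(0) = lambda_1 of inv(B)*A
subsets = nchoosek(1:N, p);            % the C(N,p) candidate index sets
dr = Inf;
while dr >= tol
    [V, ~] = eig(A - rho*B);           % steps 1-2: eigenvectors t_i^(n)
    best = -Inf;
    for k = 1:size(subsets, 1)         % steps 3-5: best subset
        Tk = V(:, subsets(k, :));
        rk = trace(Tk'*A*Tk)/trace(Tk'*B*Tk);
        if rk > best, best = rk; T = Tk; end
    end
    dr = abs(best - rho);              % step 6: stopping test
    rho = best;
end
end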


B. Study of the convergence

At step n, from (11) and (14), the variation of ρ is

\rho^{(n+1)} - \rho^{(n)} = \frac{\sum_{i \in I^{(n)}} \mu_i^{(n)}}{\sum_{i \in I^{(n)}} t_i^{(n)t} B\, t_i^{(n)}}.   (15)

Of course, if Σ_{i∈I(n)} µi(n) = 0, then ρ(n+1) = ρ(n). Hence, as it has been proved that for ρ∗ there exists a subset I∗ of cardinality p such that (12) is verified, say Σ_{i∈I∗} µi∗ = 0, ρ∗ is clearly a fixed point of the algorithm. Let us denote:

\sum_{i \in I^{(n)}} \mu_i^{(n)} = f(\rho^{(n)}).   (16)

Then f(ρ∗) = 0.

As A − ρB is symmetric for any value of ρ, the {ti} form an orthonormal basis: then ti^t ti = 1, ∀i. This expression can be differentiated w.r.t. ρ, leading to:

t_i^t \frac{\partial t_i}{\partial \rho} = 0, \quad \forall i.   (17)

The differentiation of equation (10) w.r.t. ρ leads to

-B t_i + (A - \rho B) \frac{\partial t_i}{\partial \rho} = \frac{\partial \mu_i}{\partial \rho} t_i + \mu_i \frac{\partial t_i}{\partial \rho}.

Multiplying on the left by ti^t and using equation (17) together with ti^t(A − ρB) = µi ti^t, it comes

-t_i^t B t_i + \mu_i t_i^t \frac{\partial t_i}{\partial \rho} = \frac{\partial \mu_i}{\partial \rho},

or

\forall i, \quad \frac{\partial \mu_i}{\partial \rho} = -t_i^t B t_i < 0.   (18)

Using equations (16) and (18), equation (15) becomes:

\rho^{(n+1)} = \rho^{(n)} + \frac{f(\rho^{(n)})}{-\sum_{i \in I^{(n)}} \partial \mu_i^{(n)}/\partial \rho} = \rho^{(n)} - \frac{f(\rho^{(n)})}{f'(\rho^{(n)})}.   (19)

Obviously f′(ρ∗) is not null: in fact, from (18) we have f′(ρ) = Σ_{i∈I} ∂µi/∂ρ < 0 for any value of ρ.

We can use the Newton theorem, which says that if:
1) f(ρ) is twice differentiable,
2) f(ρ∗) = 0,
3) ρ(0) is "close to" ρ∗,
4) f′(ρ∗) ≠ 0,
then the series defined by (19) converges to ρ∗ with quadratic speed. It is easy to prove that I∗ = {1, ..., p} (if the eigenvalues are sorted so that µ1∗ ≥ µ2∗ ≥ ... ≥ µN∗). But we must be careful, because that is not true at every step n for I(n).

Then this algorithm converges to the solution ρ∗ of our problem.

C. Uniqueness of the solution

If we denote by λ1 the largest eigenvalue of B⁻¹A, then 1 ≤ ρ∗ ≤ λ1. In particular cases, it may be possible to find several subspaces of dimension p for which the SNR ρ is maximal; in fact, this is not a problem. In such a case, we can take an interest in finding a subspace of dimension higher than p with the same SNR, or we can add a new criterion to choose among those subspaces.

D. Practical remark

At each step, the identification of I(n) requires a heavy calculation: C_N^p different combinations have to be tested, which can quickly grow to an unacceptable number of calculations. The algorithm proposed can be improved significantly. Instead of searching for the optimal set of eigenvectors in a systematic way, one can advantageously rearrange the eigenvalues of (A − ρ(n)B) in decreasing order, µ1(n) > µ2(n) > ... > µN(n) (note that those values can be positive as well as negative), and take, at step n, I(n) = {1, 2, ..., p} (we have seen that I∗ = {1, 2, ..., p}): equation (14) becomes

\rho^{(n+1)} = \frac{\sum_{i=1}^{p} t_i^{(n)t} A\, t_i^{(n)}}{\sum_{i=1}^{p} t_i^{(n)t} B\, t_i^{(n)}}.

In theory, there is no reason for ρ(n+1) to be maximal, but in practice it so happens that ρ(n+1) almost systematically reaches the maximal value and, if not, reaches a value very close to it. In the neighbourhood of the solution, the convergence is assured. Such a change of the algorithm obviously greatly decreases the amount of calculation. In terms of convergence to ρ∗, there is a slight drop in the speed of convergence in terms of the number of iterations. Globally, however, this method converges to the solution and decreases the amount of calculation in a very significant proportion. To give a precise idea of the gain, for N = 21 and p = 5, C_N^p = 20349.

The convergence of this modified algorithm has not been proved.

E. Conclusion

In this section, the convergence of the proposed algorithm has been proved. For a given value of p, we have initialized the algorithm with ρ(0) = λ1, the largest eigenvalue of B⁻¹A, saying it was reasonable to choose this value. What we are searching for is, p being fixed, the global maximum w.r.t. ρ (there exist other local maxima), and this maximum is necessarily the nearest to λ1 (1 ≤ ρ∗ ≤ λ1). Obviously, an initialization of the algorithm with any value ρ(0) ≠ λ1 can lead the algorithm to a local maximum.

V. Experimental results

Let us apply the CSMF method to the example described in Section II-C.2. Results are shown in figure 3: the quality of the ROC curves increases from p = 1 to p = 4 (or p = 5, which gives more or less the same results as p = 4) and decreases from p = 6. The optimal result is obtained for p = 4 or 5.

[Fig. 3. CSMF: ROC curves for different values of p (input SNR = −14 dB).]

Figure 4 shows the best result obtained by the CSMF (E4∗) and the best one obtained by the ESMF (E3†). The ROC curve obtained in E4∗ (the best result reachable) is obviously above the one obtained in E3†. Note that E1∗ = E1†.

[Fig. 4. Best ROC curves for CSMF (p = 4) and ESMF (p = 3).]


This example illustrates clearly the improvement that can be brought by maximizing the SNR in E4∗ instead of E1∗, but also the superiority of this approach in comparison with the ESMF method.

Now the optimal subspace Ep∗ (here p = 4) has been found with the CSMF method, which is a projection (a reduced-rank method). Nevertheless, we did not use all the possibilities of classical filtering and, among all the bases of Ep∗, we can choose one with interesting properties after a linear filtering. For example, the ESMF gives preferential treatment to the directions (spanned by the ui) with the best SNR (λi) (like the Wiener filter): after this linear filtering, the power of the noise is one in any direction ui.

Thus, we calculate the ROC curves obtained with equation (2), where p is the subspace dimension, so that all the basis vectors of E4∗ are taken into account. Results are shown in figure 5.

[Fig. 5. ROC curves for input SNR = −14 dB: CSMF+ESMF (p = 4), CSMF (p = 4) and ESMF (p = 3).]

A noteworthy improvement can be observed by using a simultaneous diagonalization technique in the optimal subspace calculated beforehand. We finally used a projection (to find the optimal subspace) and a linear filtering operation to improve the detection yet again.

VI. Conclusion

The method proposed in this paper takes its place in the set of methods of decomposition of signals on appropriate bases, but also in subspace methods. When trying to detect stochastic signals with known covariance matrices but with no a priori knowledge of their probability density functions, one usually tries to project onto the signal subspace (SVD, ...). It is possible to take into account the structure (covariance) of the embedding noise: the SMF is used from such a point of view and, in this case, a projection onto a p-dimensional subspace is made. In fact, this method is proven to be equivalent to a two-stage method: the whitening of the noise followed by the maximization of the SNR in a p-dimensional subspace. This method comes down to projecting the observation onto a subspace of dimension greater than one. However, there is no guarantee that the signal-to-noise ratio is maximum in the subspace spanned by these vectors.

In this paper, we calculate a subspace whose dimension is chosen a priori and which is optimal in the sense that the SNR is maximized within it. We prove, through theoretical examples, that this subspace is not necessarily the one spanned by the vectors calculated by the SMF. Through ROC curves, practical experimentations illustrate the interest of such an approach.

We have shown, with a practical example, that a noteworthy improvement can be reached with the ESMF applied in the optimal subspace calculated beforehand by the CSMF. This confirms that such a method is an interesting and powerful one to perform detection.

Prospects of applications of the CSMF can easily be imagined in image processing or in stochastic transient signal detection (like acoustic signals). An extension to the classification problem is possible. Of course, as this method is a reduced-rank one performing SNR maximization, it can be used for data compression or for estimation and filtering. Thus, the CSMF was successfully used with real signals:
• analysis of sequences of IR images (SATIR) to qualify high heat flux components (carbon bricks used in the ITER project with the CEA Cadarache): detection of defects and classification of components [23];
• detection and classification of textured images (like expanded polystyrene and textured paper or textured stone pictures, for example): many images are texture images (forests, farming areas, ...) [1];
• detection and localization of very high energy neutrinos by a passive underwater acoustic telescope (ANTARES European project) [24];
• estimation of the sources in a specific blind source separation problem [25].

Reduced-rank estimators and filters are important for a wide range of applications, among others when data or model reduction, or robustness against noise or model errors, is desired. This concerns known methods like the reduced-rank Wiener filter (RRWF) by Scharf [17], the reduced-rank maximum likelihood estimation (RRMLE) by Stoica-Viberg [26], the relative Karhunen-Loève transform (RKLT) by Yamashita-Ogawa [9] or the generalized Karhunen-Loève transform (GKLT) by Hua-Liu [27], which is used for data compression and filtering and is in fact nothing else than the RRWF, also called the low-rank Wiener filter in [28], Section 8.4.

The choice of the optimal dimension p of the subspace of projection is an important question which must be examined in more detail in the future.


Appendix A: Matlab code

• In the main program

B = 1260;                 % (Hz) noise bandwidth
f0 = 3131;                % (Hz) central frequency
fe = 15324;               % (Hz) sampling frequency
Z = reponse(f0,0.25,fe);  % generation of a narrow-band signal
signal = create(Ls,Z);    % Ls = length of the signal
Z = reponse(f0,B,fe);     % generation of the noise
noise = create(Ln,Z);     % Ln = length of the noise

• Subroutine 1

function y = create(lg,filtre)
n = length(filtre);
X = randn(1,lg+n);
X1 = conv(X,filtre);
X1 = X1/std(X1);
y = X1(n:length(X1)-n);
end

• Subroutine 2

function Z = reponse(F0,B,fe)
Te = 1/fe;
n0 = round(1/(2*B*Te));
n = 1:(2*n0+1);
Z = cos(2*pi*F0*(n-1-n0)*Te).*sin(2*pi*B*(n-1-n0)*Te)./(pi*(n-1-n0)*Te);
Z(n0+1) = 2*B;
end

The realizations si (i = 1, ..., Nr) are generated by taking N consecutive points in the narrow-band signal, the first point being chosen randomly.

Annex: Notations

C ≡ [cij]: matrix of entries cij
C^t: transpose of C
C⁻¹: inverse of nonsingular C
tr C: trace of C
I: N × N identity matrix
0: N × N null matrix
Δλ ≡ [λi]: diagonal matrix of entries λi
v: column vector
E[.]: expected value of [.]
δij: Kronecker delta of rank 2
s: signal of interest
n: corrupting noise
A: N × N full-rank covariance matrix of s
B: N × N full-rank covariance matrix of n
λi: eigenvalue of B⁻¹A (with λ1 ≥ λ2 ≥ ... ≥ λN ≥ 0)
ui: eigenvector of B⁻¹A
µi: eigenvalue of (A − ρB)
ti: eigenvector of (A − ρB)
Ep: subspace of dimension p
Ep†: subspace of dimension p spanned by {u1, ..., up}
Eu: subspace of dimension 1 spanned by u
Ep∗: optimal subspace of dimension p

References

[1] B. Borloz, Estimation, détection et classification par maximisation du rapport signal-à-bruit: le filtre adapté stochastique sous contrainte, Ph.D. Thesis, University of Toulon, France, Jun. 2005.
[2] H.L. Van Trees, Detection, Estimation and Modulation Theory, Part III: Radar-Sonar Signal Processing and Gaussian Signals in Noise, Wiley, 2001.
[3] H.V. Poor, An Introduction to Signal Detection and Estimation, Springer-Verlag, 1988.
[4] T. Kailath, H.V. Poor, Detection of Stochastic Processes, IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2230-2259, Oct. 1998.
[5] A. Hero, Signal Detection and Classification, in The Digital Signal Processing Handbook, Vijay K. Madisetti (ed.), CRC Press LLC, Series: Electrical Engineering Handbook, 1997. http://www.eecs.umich.edu/~hero/Preprints/crc article.ps.Z
[6] J.F. Cavassilas, B. Xerri, Extension de la notion de filtre adapté. Contribution à la détection de signaux courts en présence de termes perturbateurs, Revue Traitement du Signal, vol. 10, no. 3, pp. 215-221, 1992.
[7] P.P. Pokharel, U. Ozertem, D. Erdogmus and J.C. Principe, Recursive complex BSS via generalized eigendecomposition and application in image rejection for BPSK, Signal Processing, vol. 88, pp. 1368-1381, 2008.
[8] J. Yang, H.S. Xi, F. Yang and Y. Zhao, RLS-based adaptive algorithms for generalized eigendecomposition, IEEE Trans. Signal Processing, vol. 54, no. 4, pp. 1177-1188, Apr. 2006.
[9] Y. Yamashita, H. Ogawa, Relative Karhunen-Loeve transform, IEEE Trans. Signal Processing, vol. 44, pp. 371-378, Feb. 1996.
[10] L.L. Scharf and B. Friedlander, Matched Subspace Detectors, IEEE Trans. Signal Processing, vol. 42, no. 8, pp. 2146-2157, Aug. 1994.
[11] M.N. Desai and R.S. Mangoubi, Robust Gaussian and Non-Gaussian Matched Subspace Detection, IEEE Trans. Signal Processing, vol. 51, no. 12, pp. 3115-3127, Dec. 2003.
[12] M.N. Desai and R.S. Mangoubi, Robust Subspace Learning and Detection in Laplacian Noise and Interference, IEEE Trans. Signal Processing, vol. 55, no. 7, pp. 3585-3595, Jul. 2007.
[13] S. Kraut, L.L. Scharf and L.T. McWhorter, Adaptive Subspace Detectors, IEEE Trans. Signal Processing, vol. 49, no. 1, pp. 1-16, Jan. 2001.
[14] S. Kraut and L.L. Scharf, The CFAR Adaptive Subspace Detector is a Scale-Invariant GLRT, IEEE Trans. Signal Processing, vol. 47, no. 9, pp. 2538-2541, Sept. 1999.
[15] L.L. Scharf and S. Kraut, Geometries, invariances, and SNR interpretations of matched and adaptive subspace detectors, Traitement du Signal, vol. 15, no. 6, pp. 527-534, 1998.
[16] L.L. Scharf, S. Kraut, M.L. McCloud, A Review of Matched and Adaptive Subspace Detectors, Proc. Symp. on Adaptive Systems for Signal Process., Commun. and Control, Lake Louise, Alta., Canada, Oct. 2000.
[17] L.L. Scharf and M.L. McCloud, Matched and Adaptive Subspace Detectors When Interference Dominates Noise, Proc. Asilomar '00, Monterey, CA, Oct. 2000 (invited).
[18] S. Verdu, H.V. Poor, Signal Selection for Robust Matched Filtering, IEEE Trans. on Communications, vol. COM-31, no. 5, pp. 667-670, May 1983.
[19] S. Verdu, H.V. Poor, Minimax Robust Discrete-time Matched Filters, IEEE Trans. Communications, vol. COM-31, no. 2, pp. 208-215, Feb. 1983.
[20] J.G. Proakis, Digital Communications, McGraw-Hill, Inc., third edition, 1995.
[21] Y.C. Eldar, A.V. Oppenheim, Orthogonal Matched Filter Detection, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP-2001), Salt Lake City, UT, May 2001.
[22] R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge University Press, ISBN 0-521-38632-2, 1999.
[23] F. Cismondi, B. Xerri, C. Jauffret, J. Schlosser, N. Vignal and A. Durocher, Analysis of SATIR Test for the Qualification of High Heat Flux Components: Defect Detection and Classification by Signal-to-Noise Ratio Maximization, Physica Scripta, T128, pp. 213-217, Mar. 2007.
[24] N. Juennard, C. Jauffret, B. Xerri, Detection and Localization of Very High Energy Neutrinos by a Passive Underwater Acoustic Telescope, Passive'08, IEEE OES, Hyères, 14-17 Oct. 2008.
[25] B. Xerri, B. Borloz, Detection by SNR maximization: application to the blind source separation problem, 5th International Conference on Independent Component Analysis and Blind Signal Separation, Granada, Spain, Sept. 22-24, 2004.
[26] P. Stoica and M. Viberg, Maximum Likelihood Parameter and Rank Estimation in Reduced-Rank Multivariate Linear Regressions, IEEE Trans. Signal Processing, vol. 44, pp. 3069-3078, Dec. 1996.
[27] Y. Hua and W. Liu, Generalized Karhunen-Loeve Transform, IEEE Signal Processing Lett., vol. 5, pp. 141-142, Sept. 1994.
[28] L.L. Scharf, Statistical Signal Processing: Detection, Estimation and Time Series Analysis, Reading, MA: Addison-Wesley, 1991.
