
Hyperspectral detection algorithms: Use covariances or subspaces?


Citation: Manolakis, D., et al. "Hyperspectral detection algorithms: use covariances or subspaces?" Imaging Spectrometry XIV, ed. Sylvia S. Shen and Paul E. Lewis, San Diego, CA, USA: SPIE, 2009, 74570Q. © 2009 SPIE

As Published http://dx.doi.org/10.1117/12.828397

Publisher Society of Photo-optical Instrumentation Engineers

Version Final published version

Citable link http://hdl.handle.net/1721.1/52735

Terms of Use: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

Hyperspectral Detection Algorithms: Use Covariances or Subspaces?

D. Manolakis, R. Lockwood (a), T. Cooley (b), and J. Jacobson (c)

MIT Lincoln Laboratory, 244 Wood Street, Lexington, MA 02420
(a) Space Vehicles Directorate, Air Force Research Laboratory, 29 Randolph Road, Hanscom AFB, MA 01731-3010
(b) Space Vehicles Directorate, Air Force Research Laboratory, 2251 Maxwell Ave, Kirtland AFB, NM 87117
(c) National Air and Space Intelligence Center, Wright-Patterson AFB, OH

ABSTRACT

There are two broad classes of hyperspectral detection algorithms.1, 2 Algorithms in the first class use the spectral covariance matrix of the background clutter; in contrast, algorithms in the second class characterize the background using a subspace model. In this paper we show that, due to the nature of hyperspectral imaging data, the two families of algorithms are intimately related. The link between the two representations of the background clutter is the low-rank structure of natural hyperspectral backgrounds and its relation to the spectral linear mixture model. This link is developed using the method of dominant mode rejection. Finally, the effects of regularization, covariance shrinkage, and dominant mode rejection are discussed in the context of robust matched filtering algorithms.

Keywords: Hyperspectral imaging, target detection, statistical modeling, background characterization.

1. INTRODUCTION

The detection of materials and objects using remotely sensed spectral information has many military and civilian applications. Hyperspectral imaging sensors measure the radiance for every pixel at a large number of narrow spectral bands. The obtained measurements are known as the radiance spectrum of the pixel. In the reflective part of the electromagnetic spectrum (0.4-2.5 μm), the spectral information characterizing a material is the reflectance spectrum, defined as the ratio between reflected and incident radiation as a function of wavelength.

The most widely used detection algorithms use the covariance matrix of the background data; however, there are algorithms that use a subspace model formed by the endmembers of a linear mixing model or the eigenvectors of the covariance matrix.3 Finding the endmembers in a data cube is a non-trivial task whose complexity exceeds that of the detection problem. On the other hand, due to the high dimensionality of hyperspectral imaging data, the estimated covariance matrix may be inaccurate or numerically unstable. A practical approach to improve the quality of the estimated covariance matrix is to use covariance shrinkage or the dominant mode rejection approximation. The invertibility of the estimated matrix can be assured by using regularization. These techniques lead to the development of robust matched filter detectors which can be used in practical applications without concerns about numerical instabilities.

These issues are the topic of this paper, which is organized as follows. Section 2 discusses two approaches to covariance matrix regularization: matched filter optimization and shrinkage. In Section 3 we present an approach to covariance matrix estimation and inversion using dominant mode rejection and diagonal loading. Section 4 presents an interpretation of dominant mode rejection as covariance matrix augmentation. In Section 5 we discuss the relationship between the subspaces generated by eigenvectors and endmembers. Finally, Section 6 explores the relationship between covariance- and subspace-based detectors.

2. COVARIANCE MATRIX REGULARIZATION

Accurate estimation and numerically robust inversion of the covariance matrix is critical in hyperspectral detection applications. We next present two different approaches to regularization that lead to the same diagonal loading solution.

Correspondence to D. Manolakis. E-mail: [email protected], Telephone: 781-981-0524, Fax: 781-981-7271



2.1 The Matched Filter Approach

The spectral measurements obtained by a p-band hyperspectral imaging sensor can be arranged in vector form as

\[ \mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_p \end{bmatrix}^T \tag{1} \]

where T denotes matrix transpose. Let v be a p × 1 random vector from a normal distribution with mean μ and covariance matrix Σ representing the background clutter. Finally, let s0 be a p × 1 vector representing the spectral signature of the target of interest. To simplify notation, we assume that μ is removed from all spectra, that is, we deal with zero-mean clutter and a "clutter-centered" target signature.

The Optimum Matched Filter  The optimum linear matched filter4 is a linear operator

\[ y = \mathbf{h}^T \mathbf{x} \tag{2} \]

which can be determined by minimizing the output clutter power Var(y) = hᵀΣh subject to a unity gain constraint in the direction of the target spectral signature

\[ \min_{\mathbf{h}} \; \mathbf{h}^T \boldsymbol{\Sigma} \mathbf{h} \quad \text{subject to} \quad \mathbf{h}^T \mathbf{s}_0 = 1 \tag{3} \]

The solution to (3) is given by

\[ \mathbf{h} = \frac{\boldsymbol{\Sigma}^{-1} \mathbf{s}_0}{\mathbf{s}_0^T \boldsymbol{\Sigma}^{-1} \mathbf{s}_0} \tag{4} \]

which is the formula for the widely used matched filter. In the array processing area, where the data and filter vectors are complex, the matched filter (4) is known as the standard Capon beamformer (SCB).5
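As a concrete illustration, the following minimal sketch computes (4) numerically. It assumes NumPy; the function name matched_filter and the synthetic data are illustrative, not part of the original paper.

import numpy as np

def matched_filter(Sigma, s0):
    """Matched filter (4): h = Sigma^{-1} s0 / (s0^T Sigma^{-1} s0)."""
    w = np.linalg.solve(Sigma, s0)    # Sigma^{-1} s0 without forming the inverse
    return w / (s0 @ w)               # normalize so that h^T s0 = 1

# Example with synthetic zero-mean clutter
rng = np.random.default_rng(0)
p = 20
X = rng.standard_normal((1000, p))    # N x p clutter samples (already de-meaned)
Sigma = np.cov(X, rowvar=False)
s0 = rng.standard_normal(p)
h = matched_filter(Sigma, s0)
print(h @ s0)                         # ~1.0: the unity-gain constraint of (3)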

In practice, the clutter covariance matrix Σ and the target signature s0 have to be estimated from the available data. It turns out that the matched filter (4) is sensitive to signature errors and to the quality of the clutter covariance matrix. Therefore, the development of matched filters that are robust to signature and clutter covariance errors is highly desirable. This problem has traditionally been dealt with using a diagonal loading approach or an eigenspace-based approach. However, in both cases the selection of the diagonal loading level or of the subspace dimension is ad hoc.5

Quadratically Constrained Matched Filter  The robustness of the matched filter to covariance matrix and signature mismatch can be improved by constraining the size of hᵀh. This is done by solving the following optimization problem

\[ \min_{\mathbf{h}} \; \mathbf{h}^T \boldsymbol{\Sigma} \mathbf{h} \quad \text{subject to} \quad \mathbf{h}^T \mathbf{s}_0 = 1 \ \text{and} \ \mathbf{h}^T \mathbf{h} \le \epsilon_h \tag{5} \]

The solution is the well-known diagonally loaded matched filter

\[ \mathbf{h} = \frac{(\boldsymbol{\Sigma} + \delta_h \mathbf{I})^{-1} \mathbf{s}_0}{\mathbf{s}_0^T (\boldsymbol{\Sigma} + \delta_h \mathbf{I})^{-1} \mathbf{s}_0} \tag{6} \]

The load level δ_h can be computed from ε_h by solving a nonlinear equation. However, it is not clear what the parameter ε_h means or how it should be chosen. This issue is addressed next using the robust Capon beamformer approach.
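A sketch of the diagonally loaded filter (6) follows; since the paper notes that choosing the constraint level is the open issue, the load level delta is simply taken as a user-supplied parameter here. NumPy is assumed and the names are illustrative.

import numpy as np

def loaded_matched_filter(Sigma, s0, delta):
    """Diagonally loaded matched filter (6) with load level delta >= 0."""
    A = Sigma + delta * np.eye(len(s0))
    w = np.linalg.solve(A, s0)
    return w / (s0 @ w)               # unity gain in the target direction

Setting delta = 0 recovers the standard matched filter (4); increasing delta trades clutter suppression for robustness.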

The Robust Matched Filter  In this section we use the theory of the robust Capon beamformer (RCB) to develop a robust matched filter that takes measurement errors and the spectral variability of hyperspectral target signatures into consideration. The robust matched filter (RMF) addresses robustness to target signature errors by introducing an uncertainty region constraint into the optimization process. To this end, assume that the only knowledge we have about the signature s is that it belongs to an uncertainty ellipsoid

\[ (\mathbf{s} - \mathbf{s}_0)^T \mathbf{C}^{-1} (\mathbf{s} - \mathbf{s}_0) \le 1 \tag{7} \]


where the vector s0 and the positive definite matrix C are given. In most hyperspectral target detection applications, it is difficult to get sufficient data to reliably estimate the full matrix C. Therefore, we usually set C = εI, so that (7) becomes

\[ \|\mathbf{s} - \mathbf{s}_0\|^2 \le \epsilon \tag{8} \]

where ε is a positive number. These ideas are illustrated in Figure 1(a). It has been shown that the RMF can be obtained as the solution to the following optimization problem

\[ \min_{\mathbf{s}} \; \mathbf{s}^T \boldsymbol{\Sigma}^{-1} \mathbf{s} \quad \text{subject to} \quad \|\mathbf{s} - \mathbf{s}_0\|^2 \le \epsilon \tag{9} \]

It turns out that the solution of (9) occurs on the boundary of the constraint set; therefore, we can reformulate (9) as a quadratic optimization problem with a quadratic equality constraint

\[ \min_{\mathbf{s}} \; \mathbf{s}^T \boldsymbol{\Sigma}^{-1} \mathbf{s} \quad \text{subject to} \quad \|\mathbf{s} - \mathbf{s}_0\|^2 = \epsilon \tag{10} \]

This problem can be efficiently solved using the method of Lagrange multipliers.6 The solution involves an estimated target signature

\[ \hat{\mathbf{s}} = \zeta (\boldsymbol{\Sigma}^{-1} + \zeta \mathbf{I})^{-1} \mathbf{s}_0 \tag{11} \]

which is subsequently used to determine the RMF by

\[ \mathbf{h} = \frac{\boldsymbol{\Sigma}^{-1} \hat{\mathbf{s}}}{\hat{\mathbf{s}}^T \boldsymbol{\Sigma}^{-1} \hat{\mathbf{s}}} \tag{12} \]

The Lagrange multiplier ζ ≥ 0 can be obtained by solving the nonlinear equation

\[ \mathbf{s}_0^T (\mathbf{I} + \zeta \boldsymbol{\Sigma})^{-2} \mathbf{s}_0 = \sum_{k=1}^{p} \frac{|\tilde{s}_k|^2}{(1 + \zeta \lambda_k)^2} = \epsilon \tag{13} \]

where λ_k and s̃_k are obtained from the eigen-decomposition

\[ \boldsymbol{\Sigma} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^T = \sum_{k=1}^{p} \lambda_k \mathbf{q}_k \mathbf{q}_k^T \tag{14} \]

and the orthogonal transformation

\[ \tilde{\mathbf{s}} = \mathbf{Q}^T \mathbf{s}_0 \tag{15} \]

Equation (13) can be easily solved using a nonlinear root-finding algorithm, for example, Newton's method. Finally, we note that the RMF (12) can be expressed in diagonal loading form as follows

\[ \mathbf{h} = \frac{(\boldsymbol{\Sigma} + \zeta^{-1} \mathbf{I})^{-1} \mathbf{s}_0}{\mathbf{s}_0^T (\boldsymbol{\Sigma} + \zeta^{-1} \mathbf{I})^{-1} \boldsymbol{\Sigma} (\boldsymbol{\Sigma} + \zeta^{-1} \mathbf{I})^{-1} \mathbf{s}_0} \tag{16} \]

where ζ^{-1} is a loading factor computed from (13). Figure 1(b) illustrates the validity of the optimization approach leading to the RMF. We note that the RMF is obtained as a standard MF for a modified target signature. As expected, the "assumed" target signature specifies the center of the uncertainty region, whereas the modified signature "touches" the boundary of the uncertainty region.
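The following sketch traces the computational path (13)-(15): eigen-decompose Σ, solve (13) for ζ by bracketed root finding (Brent's method here rather than Newton's, for robustness), and form the RMF (12). It assumes NumPy/SciPy, a positive definite Σ, and ||s0||² > ε so that a root ζ ≥ 0 exists; all names are illustrative.

import numpy as np
from scipy.optimize import brentq

def robust_matched_filter(Sigma, s0, eps):
    lam, Q = np.linalg.eigh(Sigma)             # eigen-decomposition (14)
    s_tilde = Q.T @ s0                         # orthogonal transformation (15)

    def g(zeta):                               # left-hand side of (13) minus eps
        return np.sum(s_tilde**2 / (1.0 + zeta * lam)**2) - eps

    # g decreases monotonically from ||s0||^2 - eps > 0 toward -eps,
    # so a sign change can be bracketed and the root found reliably
    hi = 1.0
    while g(hi) > 0.0:
        hi *= 10.0
    zeta = brentq(g, 0.0, hi)

    # Estimated signature (11), in the equivalent form
    # s_hat = s0 - (I + zeta*Sigma)^{-1} s0
    s_hat = s0 - Q @ (s_tilde / (1.0 + zeta * lam))
    w = np.linalg.solve(Sigma, s_hat)
    return w / (s_hat @ w)                     # RMF (12)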

2.2 Covariance Shrinkage

In practice the covariance matrix Σ has to be estimated from a set of observations x_k, k = 1, 2, ..., N. The most widely used estimate is the sample covariance matrix defined by the well-known formula

\[ \hat{\boldsymbol{\Sigma}} = \frac{1}{N} \sum_{k=1}^{N} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^T, \qquad \hat{\boldsymbol{\mu}} = \frac{1}{N} \sum_{k=1}^{N} \mathbf{x}_k \tag{17} \]


Figure 1. (a) Illustration of the robust matched filter design principle using two spectral bands (Band 1 and Band 2), showing the assumed target signature s0, the actual target signature, the target variability region of radius ε, and the background variability contours hᵀΣh = constant. (b) Illustration of the robust matched filter when there is a target signature mismatch. The algorithm uses the available signature, which specifies the center of the uncertainty region, to produce a "robust" signature that is subsequently used to determine the RMF coefficients.

The sample covariance matrix has appealing properties: it is asymptotically unbiased and is the maximum likelihood estimate under normality. Since Σ has p(p + 1)/2 free parameters which have to be estimated from p × N measurements, we can get good estimates only when N ≫ p. However, when N is of the order of p, Σ̂ is a poor estimate of Σ.

To mitigate the problem that \( \|\hat{\boldsymbol{\Sigma}} - \boldsymbol{\Sigma}\|_2 \), where \( \|\mathbf{A}\|_2 = \sqrt{\sum_{i=1}^{p}\sum_{j=1}^{p} a_{ij}^2} \) is the Frobenius norm, is large when p is relatively large, it is suggested that we use a shrunk estimator

\[ \tilde{\boldsymbol{\Sigma}}(\delta) = \delta \mathbf{F} + (1 - \delta) \hat{\boldsymbol{\Sigma}}, \qquad 0 \le \delta \le 1 \tag{18} \]

where F is a constrained version of Σ̂. The basic idea is to reduce the variance of the estimator by increasing its bias. The sample covariance matrix has many free parameters and very little structure; as a result, it is asymptotically unbiased but has a lot of estimation error. The matrix F has a few free parameters and a lot of structure. As a result of stringent and misspecified structural assumptions, F has significant bias but insignificant variance. This technique is called shrinkage because the sample covariance matrix is "shrunk" toward the structured estimator. The number δ is referred to as the shrinkage constant. A shrinkage estimator has three components: an estimator with no structure, an estimator with a lot of structure, and a shrinkage constant. The typical choice for F in (18) is the identity matrix. When δ ≪ 1, we have Σ̃(δ) ≈ Σ̂ + δI, which is identical to diagonal loading. The shrinkage approach to covariance matrix estimation, including estimation of δ, is thoroughly discussed by Ledoit and Wolf.7
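A minimal sketch of (18) with F = I, as the text suggests; the shrinkage constant delta is user-supplied (Ledoit and Wolf7 also show how to estimate it from the data, which is omitted here). NumPy is assumed and the names are illustrative.

import numpy as np

def shrunk_covariance(X, delta):
    """X: N x p data matrix. Returns the shrunk estimate (18) with F = I."""
    Sigma_hat = np.cov(X, rowvar=False)   # sample covariance (17)
    F = np.eye(X.shape[1])                # highly structured target
    return delta * F + (1.0 - delta) * Sigma_hat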

3. ESTIMATION AND INVERSION OF COVARIANCE MATRIX USING DOMINANT MODES

The basic idea is to estimate only the large eigenvalues and corresponding eigenvectors of the covariance matrix Σ of the background.8 The advantage is that we can obtain better estimates of Σ with fewer spectra. The spectral decomposition of the covariance matrix can be broken into two parts: one for the d largest eigenvalues and one for the (p − d) smaller eigenvalues

\[ \boldsymbol{\Sigma} = \sum_{i=1}^{p} \lambda_i \mathbf{q}_i \mathbf{q}_i^T = \sum_{i=1}^{d} \lambda_i \mathbf{q}_i \mathbf{q}_i^T + \sum_{i=d+1}^{p} \lambda_i \mathbf{q}_i \mathbf{q}_i^T \tag{19} \]

or in compact matrix form as

\[ \boldsymbol{\Sigma} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^T = \begin{bmatrix} \mathbf{Q}_1 & \mathbf{Q}_2 \end{bmatrix} \begin{bmatrix} \boldsymbol{\Lambda}_1 & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Lambda}_2 \end{bmatrix} \begin{bmatrix} \mathbf{Q}_1^T \\ \mathbf{Q}_2^T \end{bmatrix} \tag{20} \]

\[ \phantom{\boldsymbol{\Sigma}} = \mathbf{Q}_1 \boldsymbol{\Lambda}_1 \mathbf{Q}_1^T + \mathbf{Q}_2 \boldsymbol{\Lambda}_2 \mathbf{Q}_2^T \tag{21} \]


where Λ is the diagonal matrix of the p eigenvalues of Σ sorted in decreasing order and Q is a matrix whose columns are the corresponding eigenvectors. The elements of the other matrices can be easily determined by comparing (20) to (19). Since some of the smaller eigenvalues may be zero, Σ may be less than full rank. The small eigenvalues and their eigenvectors are difficult to estimate and hard to compute accurately when Σ is ill conditioned. If we replace the last p − d eigenvalues by a constant α, we obtain the approximation

\[ \tilde{\boldsymbol{\Sigma}} = \sum_{i=1}^{d} \lambda_i \mathbf{q}_i \mathbf{q}_i^T + \alpha \sum_{i=d+1}^{p} \mathbf{q}_i \mathbf{q}_i^T \tag{22} \]

From the orthogonality relation QQᵀ = I, we have

\[ \sum_{i=d+1}^{p} \mathbf{q}_i \mathbf{q}_i^T = \mathbf{I} - \sum_{i=1}^{d} \mathbf{q}_i \mathbf{q}_i^T \tag{23} \]

Substituting (23) into (22) yields

\[ \tilde{\boldsymbol{\Sigma}} = \sum_{i=1}^{d} (\lambda_i - \alpha) \mathbf{q}_i \mathbf{q}_i^T + \alpha \mathbf{I} \tag{24} \]

To express the inverse of Σ̃ explicitly, we first rewrite (24) as

\[ \alpha^{-1} \tilde{\boldsymbol{\Sigma}} = \mathbf{I} + \sum_{i=1}^{d} \frac{\lambda_i - \alpha}{\alpha} \mathbf{q}_i \mathbf{q}_i^T = \mathbf{I} + \mathbf{Q}_1 \boldsymbol{\Lambda}_a \mathbf{Q}_1^T \tag{25} \]

where

\[ \boldsymbol{\Lambda}_a = \frac{1}{\alpha} \, \mathrm{diag}\{(\lambda_1 - \alpha), (\lambda_2 - \alpha), \ldots, (\lambda_d - \alpha)\} \tag{26} \]

Using (25) and the matrix inversion lemma

\[ (\mathbf{A} + \mathbf{B}\mathbf{C}\mathbf{D})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}(\mathbf{D}\mathbf{A}^{-1}\mathbf{B} + \mathbf{C}^{-1})^{-1}\mathbf{D}\mathbf{A}^{-1} \tag{27} \]

we obtain the expression

\[ \tilde{\boldsymbol{\Sigma}}^{-1} = \frac{1}{\alpha} \left[ \mathbf{I} - \sum_{i=1}^{d} \frac{\lambda_i - \alpha}{\lambda_i} \mathbf{q}_i \mathbf{q}_i^T \right] \tag{28} \]

One way to determine α is by requiring that tr Σ = tr Σ̃, where tr denotes the trace of a matrix. This yields

\[ \alpha = \frac{1}{p - d} \sum_{i=d+1}^{p} \lambda_i = \frac{1}{p - d} \left[ \mathrm{tr}\,\boldsymbol{\Sigma} - \sum_{i=1}^{d} \lambda_i \right] \tag{29} \]

which is the average of the smaller p − d eigenvalues of Σ. Repeating the same process for the matrix Σ̃ + δI, we can easily show that

\[ (\tilde{\boldsymbol{\Sigma}} + \delta \mathbf{I})^{-1} = \frac{1}{\alpha + \delta} \left[ \mathbf{I} - \sum_{i=1}^{d} \frac{\lambda_i - \alpha}{\lambda_i + \delta} \mathbf{q}_i \mathbf{q}_i^T \right] = \frac{1}{\alpha + \delta} \left[ \mathbf{I} - \sum_{i=1}^{d} \beta_i \mathbf{q}_i \mathbf{q}_i^T \right] \tag{30} \]

where

\[ \beta_i \triangleq \frac{\lambda_i - \alpha}{\lambda_i + \delta} \tag{31} \]

This procedure introduces diagonal loading to the dominant modes.
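A sketch of the DMR inverse (28)-(31) follows: keep the d dominant eigenpairs and replace the remaining eigenvalues by their average α from (29). Setting delta = 0 gives (28). NumPy is assumed and the names are illustrative.

import numpy as np

def dmr_inverse(Sigma, d, delta=0.0):
    """Approximate inverse (30) of Sigma via dominant mode rejection."""
    p = Sigma.shape[0]
    lam, Q = np.linalg.eigh(Sigma)
    lam, Q = lam[::-1], Q[:, ::-1]                 # decreasing eigenvalue order
    alpha = lam[d:].mean()                         # (29): average small eigenvalue
    beta = (lam[:d] - alpha) / (lam[:d] + delta)   # (31)
    Qd = Q[:, :d]
    return (np.eye(p) - (Qd * beta) @ Qd.T) / (alpha + delta)   # (30)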


4. DOMINANT MODE INTERPRETATION AS COVARIANCE MATRIX AUGMENTATION

Consider a data set with covariance matrix Σ, which is singular with rank d < p, so that its spectral decomposition reduces to

\[ \boldsymbol{\Sigma} = \mathbf{Q}_1 \boldsymbol{\Lambda}_1 \mathbf{Q}_1^T \tag{32} \]

One possible approach to make Σ non-singular is to find an augmented matrix Σ̃ that retains its major characteristics:

1. Σ˜ is symmetric,

2. Σ˜ has full rank p,

3. the first d principal axes of Σ˜ are the same as those of Σ, and they are in the same order,

4. the last p − d principal axes are indeterminate, that is, the corresponding eigenvalues of Σ̃ are identical, and

5. tr(Σ̃) = tr(Σ).

These criteria are all upheld by the matrix Σ̃ defined by

\[ \tilde{\boldsymbol{\Sigma}} = \frac{1}{\gamma} \begin{bmatrix} \mathbf{Q}_1 & \mathbf{Q}_2 \end{bmatrix} \begin{bmatrix} \boldsymbol{\Lambda}_1 + \delta \mathbf{I} & \mathbf{0} \\ \mathbf{0} & (\alpha + \delta) \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{Q}_1^T \\ \mathbf{Q}_2^T \end{bmatrix} \tag{33} \]

where α and δ are parameters satisfying δ ≥ 0, α < λ_d, and α + δ > 0, and γ is a normalizing constant given by

\[ \gamma = \left[ \delta p + \alpha (p - d) + \sum_{i=1}^{d} \lambda_i \right] \Big/ \sum_{i=1}^{d} \lambda_i \tag{34} \]

A justification for this approach is provided by the optimum approximation interpretation of principal component analysis (PCA). The method of PCA provides the best d-dimensional approximation to a p-dimensional set of data by projecting the data onto the first d principal components. The p × p covariance matrix Σ of the projected data is singular with rank d < p.
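The sketch below builds the augmented matrix (33)-(34) from a rank-d covariance matrix and verifies property 5 numerically. NumPy is assumed; the names and test data are illustrative.

import numpy as np

def augment_covariance(Sigma, d, alpha, delta):
    """Augmented full-rank matrix (33) with normalizing constant (34)."""
    p = Sigma.shape[0]
    lam, Q = np.linalg.eigh(Sigma)
    lam, Q = lam[::-1], Q[:, ::-1]                 # decreasing eigenvalue order
    gamma = (delta * p + alpha * (p - d) + lam[:d].sum()) / lam[:d].sum()  # (34)
    lam_aug = np.concatenate([lam[:d] + delta,
                              np.full(p - d, alpha + delta)])
    return (Q * lam_aug) @ Q.T / gamma             # (33)

rng = np.random.default_rng(0)
p, d = 10, 3
B = rng.standard_normal((p, d))
Sigma = B @ B.T                                    # singular, rank d
Sigma_aug = augment_covariance(Sigma, d, alpha=0.1, delta=0.5)
print(np.trace(Sigma), np.trace(Sigma_aug))        # equal: property 5 holds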

Dominant Mode Robust Matched Filter  If we use the DMR approximation, the matched filter coefficients for a target with spectral signature s0 can be evaluated using the formulas

\[ \mathbf{h} = \kappa (\tilde{\boldsymbol{\Sigma}} + \delta \mathbf{I})^{-1} \mathbf{s}_0 = \frac{\kappa}{\alpha + \delta} \left[ \mathbf{s}_0 - \sum_{i=1}^{d} \beta_i (\mathbf{q}_i^T \mathbf{s}_0) \mathbf{q}_i \right] \tag{35} \]

where

\[ \kappa \triangleq \left[ \mathbf{s}_0^T (\tilde{\boldsymbol{\Sigma}} + \delta \mathbf{I})^{-1} \mathbf{s}_0 \right]^{-1} = (\alpha + \delta) \left[ \mathbf{s}_0^T \mathbf{s}_0 - \sum_{i=1}^{d} \beta_i (\mathbf{q}_i^T \mathbf{s}_0)^2 \right]^{-1} \tag{36} \]
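A self-contained sketch of (35)-(36) follows; since h is normalized so that hᵀs0 = 1, the common 1/(α + δ) factor cancels. NumPy is assumed and the names are illustrative.

import numpy as np

def dmr_matched_filter(Sigma, s0, d, delta=0.0):
    """Dominant mode rejection matched filter (35)-(36)."""
    lam, Q = np.linalg.eigh(Sigma)
    lam, Q = lam[::-1], Q[:, ::-1]                  # decreasing eigenvalue order
    alpha = lam[d:].mean()                          # (29)
    beta = (lam[:d] - alpha) / (lam[:d] + delta)    # (31)
    proj = Q[:, :d].T @ s0                          # q_i^T s0
    h = s0 - Q[:, :d] @ (beta * proj)               # bracketed term in (35)
    return h / (s0 @ s0 - beta @ proj**2)           # normalization from (36)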


5. EIGENVECTORS AND ENDMEMBERS

The spectral linear mixing model assumes that the spectrum of any pixel can be expressed as

\[ \mathbf{x} = \sum_{i=1}^{M} a_i \mathbf{s}_i + \mathbf{w} = \mathbf{S}\mathbf{a} + \mathbf{w} \tag{37} \]

where s_i is an endmember spectrum, a_i ≥ 0 its abundance, and w is normally distributed with mean zero and covariance matrix σ_w²I. If the endmembers are assumed linearly independent, the matrix S = [s_1 ... s_M] has rank M. The covariance matrix of x is given by

\[ \boldsymbol{\Sigma} = \mathbf{S}\mathbf{A}\mathbf{S}^T + \sigma_w^2 \mathbf{I} \tag{38} \]

where A = diag{a_1², ..., a_M²}. We next accept the approximation (22) with d = M and set α = σ_w². Then (22) and (38) yield

\[ \sigma_w^2 \, \mathbf{Q}_d \boldsymbol{\Lambda}_a \mathbf{Q}_d^T = \mathbf{S}\mathbf{A}\mathbf{S}^T \tag{39} \]

that is, the columns of Q_d and S span the same space. Therefore, at least in theory, either S or Q_d can be used for the implementation of low-rank detectors. Under the assumptions of the linear mixing model, the maximum likelihood estimate of the background subspace is spanned by the M dominant eigenvectors of the estimated correlation matrix of the data.9 In practice, the covariance matrix of the noise in hyperspectral data differs from σ²I; therefore, this result is an approximation. For non-zero-mean data there is a difference between the linear subspace defined by the covariance matrix and the affine subspace defined by the correlation matrix.3 Although the two approaches are theoretically different, for detection applications we de-mean the data and work with the covariance matrix of the background. De-meaning does not make the two approaches equivalent, but it appears to be sufficient for practical applications.
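The claimed equivalence of span(S) and span(Q_d) can be checked numerically with principal angles. The sketch below generates data from the mixing model (37) and compares the endmember subspace with the span of the M dominant eigenvectors; with independent abundances, Cov(a) is diagonal as in (38). NumPy/SciPy are assumed; all data are synthetic and illustrative.

import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(1)
p, M, N = 50, 5, 5000
S = rng.uniform(size=(p, M))                       # endmember matrix S
a = rng.uniform(size=(M, N))                       # non-negative abundances
X = S @ a + 0.01 * rng.standard_normal((p, N))     # mixing model (37)

Sigma = np.cov(X)                                  # p x p covariance
lam, Q = np.linalg.eigh(Sigma)
Qm = Q[:, ::-1][:, :M]                             # M dominant eigenvectors

print(np.degrees(subspace_angles(S, Qm)))          # all angles near zero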

6. COVARIANCE OR SUBSPACE BASED DETECTORS

If we assume that λ_i ≫ α for all 1 ≤ i ≤ d, we obtain the Principal Component Inversion (PCI) approximation of the inverse covariance matrix

\[ \tilde{\boldsymbol{\Sigma}}_{\mathrm{PCI}}^{-1} = \frac{1}{\alpha} \left[ \mathbf{I} - \sum_{i=1}^{d} \mathbf{q}_i \mathbf{q}_i^T \right] = \frac{1}{\alpha} \left[ \mathbf{I} - \mathbf{Q}_1 \mathbf{Q}_1^T \right] \tag{40} \]

This case, which is also known as zero-variance discrimination in the statistics literature, provides the link between the matched filter and subspace detection algorithms, like the OSP.10 Since endmembers and dominant eigenvectors of the covariance matrix span the same subspace, there is a strong relationship between covariance-based and subspace-based detection algorithms. The link between the two classes of algorithms is provided by (30). Although there is a difference between covariance matrix and correlation matrix eigenspaces, we should keep in mind that the derivation of optimum detection and classification algorithms under a normal distribution model involves the use of covariance matrices. In Figure 2 we show an example of detection statistics for the OSP detector with M = 5 eigenvectors, the matched filter with d = 5 dominant modes, and the CEM detector (basically a matched filter using the correlation matrix) with d = 5 dominant modes. We note a strong similarity between the three detection statistics. Similar results have been obtained for other cases. Based on these findings and the underlying theoretical arguments, we prefer the use of covariance-based detectors in practical hyperspectral imaging applications.
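The PCI limit can be demonstrated numerically: when the background has a strong low-rank component, the β_i of (31) approach one and the DMR filter direction (35) approaches the projection (I − Q₁Q₁ᵀ)s0 used by OSP-type detectors. A sketch with synthetic, illustrative data:

import numpy as np

rng = np.random.default_rng(2)
p, d = 30, 5
B = rng.standard_normal((p, d))
Sigma = 100.0 * B @ B.T + np.eye(p)          # dominant low-rank background
lam, Q = np.linalg.eigh(Sigma)
lam, Q = lam[::-1], Q[:, ::-1]
alpha = lam[d:].mean()                       # (29)
beta = (lam[:d] - alpha) / lam[:d]           # (31) with delta = 0
print(beta)                                  # all close to 1: lambda_i >> alpha

s0 = rng.standard_normal(p)
Q1 = Q[:, :d]
h_dmr = s0 - Q1 @ (beta * (Q1.T @ s0))       # DMR filter direction from (35)
h_pci = s0 - Q1 @ (Q1.T @ s0)                # PCI limit (40): beta_i -> 1
print(np.linalg.norm(h_dmr - h_pci))         # small residual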

7. SUMMARY

In this paper we discussed the use of regularization and dominant mode rejection techniques in the implementation of hyperspectral detection algorithms. We then used the dominant mode rejection inversion of the covariance matrix to obtain a link between covariance and subspace detection algorithms. Experimental investigations showed that we can emulate the behavior of subspace algorithms by changing the number of dominant modes in a matched filter detector. Further work to fully understand the effects of regularization, covariance shrinkage, and dominant mode rejection on detection and classification algorithms is in progress.


Figure 2. Detection statistics for the orthogonal subspace projection algorithm, the matched filter, and the constrained energy minimization (CEM) algorithm with DMR inversion and regularization.

REFERENCES

[1] D. Manolakis and G. Shaw, "Detection algorithms for hyperspectral imaging applications," IEEE Signal Processing Magazine, pp. 29-43, January 2002.
[2] D. Manolakis, D. Marden, and G. Shaw, "Target detection algorithms for hyperspectral imaging applications," Lincoln Laboratory Journal 14(1), pp. 79-116, 2003.
[3] D. Manolakis, R. Lockwood, T. Cooley, and J. Jacobson, "Is there a best hyperspectral detection algorithm?," Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XV, Proc. SPIE 7334(1), p. 733402, 2009.
[4] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume II: Detection Theory, Prentice Hall, New Jersey, 1998.
[5] H. L. Van Trees, Optimum Array Processing, Wiley, New York, 2002.
[6] P. Gill, W. Murray, and M. Wright, Practical Optimization, Academic Press, London, UK, 1981.
[7] O. Ledoit and M. Wolf, "A well-conditioned estimator for large-dimensional covariance matrices," Journal of Multivariate Analysis 88, pp. 365-411, 2004.
[8] H. Cox and R. Pitre, "Robust DMR and multi-rate adaptive beamforming," in Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, vol. 1, pp. 920-924, Nov. 1997.
[9] L. Scharf, Statistical Signal Processing, Addison-Wesley, Reading, MA, 1991.
[10] J. C. Harsanyi and C. I. Chang, "Detection of low probability subpixel targets in hyperspectral image sequences with unknown backgrounds," IEEE Trans. Geoscience and Remote Sensing 32, pp. 779-785, July 1994.

ACKNOWLEDGMENTS

This work was sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

