Performance Guarantees in Sample-Starved Space-Time Adaptive Processing

Ben Robinson1 (Presenting)

Robert Malinas2, Alfred O. Hero, III2

1Air Force Research Lab, Sensors Directorate

2University of Michigan

Distribution A: Approved for unlimited distribution, PA approval #’s 88ABW-2020-3037, AFRL-2020-0561

January 5, 2021

Summary

• Problem

• Sample-starved radar detection in Space-Time Adaptive Processing

• Method

• High-dimensional, random matrix theory asymptotics

• Principal findings

• Consistent estimates of popular algorithms’ performances

• Predictions matched empirically


Motivation: Space-Time Adaptive Processing

• Peace support operations
• Multinational coalition operations
• Air control
• Homeland Defense
• Counter Narcotics
• Combat Search and Rescue
• Civil Aviation
• Weather Radar Microburst Detection
• Automotive Collision Avoidance
• Automotive Traffic Monitoring

Data Collection: Pulsed Radar Arrays1

1M. A. Richards, Jim Scheer, and William A. Holm. Principles of Modern Radar. Tes Dee Publishing Pvt. Ltd. (published by arrangement), 2012.

Goal: Determine Which Range Cells Contain Targets

Hypothesis testing formulation2:

H_0 : X = W
H_1 : X = a µ + W,  a ≠ 0,

where
• a ∈ C is unknown,
• µ ∈ C^p is a known “steering vector” with ||µ||_2 = 1,
• R ∈ C^{p×p} is Hermitian, positive-definite, and unknown,
• E(W) = 0, cov(W) = R, and E|(R^{−1/2} W)_i|^3 ≤ C for all i.

2Michael C. Wicks et al. “Space-time adaptive processing: A knowledge-based perspective for airborne radar”. In: IEEE Signal Processing Magazine 23.1 (2006), pp. 51–65.

Adaptive Matched Filter

• Prevailing test is the adaptive matched filter3:

  T̂(X) := |µ^H R̂^{−1} X|^2 / (µ^H R̂^{−1} µ)  ≷_{H_0}^{H_1}  τ.

• Assume {W_i}_{i=1}^n i.i.d.

3Frank C. Robey et al. “A CFAR adaptive matched filter detector”. In: IEEE Transactions on Aerospace and Electronic Systems 28.1 (1992), pp. 208–216.
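As an illustrative sketch (not the authors' implementation), the statistic above can be computed directly in NumPy; the covariance R, steering vector µ, target amplitude a, and problem sizes below are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 64, 128                       # data dimension and number of training snapshots (placeholders)

# Placeholder colored-noise covariance R and unit-norm steering vector mu.
A = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
R = A @ A.conj().T / p + np.eye(p)   # Hermitian, positive-definite
mu = np.ones(p, dtype=complex) / np.sqrt(p)

# Training snapshots W_i ~ CN(0, R) and an estimator R_hat (here the sample covariance).
L = np.linalg.cholesky(R)
W_train = L @ (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
R_hat = W_train @ W_train.conj().T / n

def amf_statistic(x, R_hat, mu):
    """Adaptive matched filter statistic |mu^H R_hat^{-1} x|^2 / (mu^H R_hat^{-1} mu)."""
    Rinv_mu = np.linalg.solve(R_hat, mu)
    Rinv_x = np.linalg.solve(R_hat, x)
    return np.abs(mu.conj() @ Rinv_x) ** 2 / np.real(mu.conj() @ Rinv_mu)

# Test datum under H1: X = a*mu + W, with a hypothetical amplitude a.
a = 4.0
w = L @ (rng.standard_normal(p) + 1j * rng.standard_normal(p)) / np.sqrt(2)
print(amf_statistic(a * mu + w, R_hat, mu))   # compare against a threshold tau
```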


Technical Challenges

• Clutter varies along the range dimension, so choosing n too large introduces heteroscedasticity into the training data.

• Therefore, n must be small, i.e., on the order of the data dimension p. Classical estimators of R, such as the sample covariance (1/n) Σ_{i=1}^n W_i W_i^H, are then singular or ill-conditioned.
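A quick numerical illustration of this point (a sketch with arbitrary placeholder sizes; the true covariance is the identity, so the data are already whitened):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 200
for n in (100, 200, 400):            # n < p, n = p, n > p
    W = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
    S = W @ W.conj().T / n           # sample covariance of whitened data (true R = I)
    print(f"n={n:4d}  rank={np.linalg.matrix_rank(S):4d}  condition number={np.linalg.cond(S):.2e}")
# For n < p the sample covariance is singular (rank n < p); near n = p it is
# technically invertible but so ill-conditioned that S^{-1} in the AMF is unusable.
```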

Context

• Existing approaches [14, 15] lack performance guarantees.

Formulation: Sequence of problems

H_0^{(k)} : X = W_k
H_1^{(k)} : X = a µ_k + W_k,  a ≠ 0,

• R_k ∈ C^{p_k×p_k}, a priori equally likely to be U_k R_k U_k^H for any unitary U_k
  (R_k = E[W_k W_k^H], with a spectrum that converges to a limit)
• W_k := matrix({W_{i,k}}_{i=1}^{n_k}) ∈ C^{p_k×n_k} (satisfies independence and moment conditions given later)
• µ_k ∈ C^{p_k}

  T̂_k(X) := |µ_k^H R̂_k^{−1} X|^2 / (µ_k^H R̂_k^{−1} µ_k)  ≷_{H_0}^{H_1}  τ

• Asy(γ): The number of samples n_k and the number of dimensions p_k follow the proportional-growth limit p_k/n_k → γ ∈ (0, 1) ∪ (1, ∞), with p_k, n_k → ∞ as k → ∞.


Shrinkage Estimation: Use Sample Eigenvectors

• Generate an eigendecomposition of the sample covariance matrix

  S_k = (1/n_k) W_k W_k^H = U_k Λ_k U_k^H.

• Consider shrinkage estimators of the form

  R̂_k = U_k D_k U_k^H

  for some positive-definite diagonal matrix D_k (that usually depends only on Λ_k).
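A minimal NumPy sketch of this construction (not the authors' code); the shrinkage rule below is a deliberately simple placeholder (linear shrinkage toward the average sample eigenvalue), not the Ledoit-Wolf rule discussed next.

```python
import numpy as np

def shrinkage_estimator(W, d):
    """Build R_hat = U * diag(d(lam)) * U^H from the sample covariance of W (p x n)."""
    p, n = W.shape
    S = W @ W.conj().T / n
    lam, U = np.linalg.eigh(S)            # sample eigenvalues (ascending) and eigenvectors
    D = np.maximum(d(lam), 1e-12)         # keep the estimator positive definite
    return (U * D) @ U.conj().T

def linear_shrinkage(lam, rho=0.5):
    """Placeholder shrinkage rule: push eigenvalues toward their mean."""
    return (1 - rho) * lam + rho * lam.mean()

rng = np.random.default_rng(2)
p, n = 150, 100                            # sample-starved: p > n
W = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
R_hat = shrinkage_estimator(W, linear_shrinkage)
print(np.linalg.cond(R_hat))               # finite, even though the sample covariance is singular
```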

Asymptotic Ledoit-Péché Equivalence

Suppose f : [0, ∞) → (0, ∞) is continuous. We call a shrinkage estimator R̂_k = U_k D_k(Λ_k) U_k^H f-consistent if

  ||diag D_k(Λ_k) − f(diag Λ_k)||_∞ → 0 in probability as k → ∞.

Let

  δ(λ) = λ / |1 − γ − γ λ m̆_F(λ)|^2   if λ > 0,
  δ(λ) = 1 / ((γ − 1) m̆_F̲(0))         if λ = 0.

Here, F and F̲ are the limiting spectral distribution functions of n_k^{−1} W_k W_k^H and p_k^{−1} W_k^H W_k, respectively, and m̆_G(x) is the limit of m_G(z) = ∫ (t − z)^{−1} dG(t) (the Stieltjes transform of G) as z in the upper half plane approaches x ∈ R from above.

Theorem. An estimator (LWD) of Ledoit and Wolf4 is (asymptotically) LP equivalent, meaning δ-consistent.

4Olivier Ledoit and Michael Wolf. “Analytical nonlinear shrinkage estimation of large-dimensional covariance matrices”. In: Annals of Statistics (Forthcoming).
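To make the definition of δ concrete, here is a direct NumPy transcription (a sketch, not the LWD implementation); the Stieltjes-transform values m̆_F(λ) and m̆_F̲(0) must be supplied by some consistent estimator, e.g. the Ledoit-Wolf kernel estimator, which is assumed available and is not implemented here.

```python
import numpy as np

def lp_shrinkage(lam, gamma, m_F, m_Fbar0=None):
    """Shrinkage function delta(.) transcribed from the slide above (a sketch).

    lam     : array of sample eigenvalues (may contain zeros when gamma > 1)
    gamma   : limiting aspect ratio p/n
    m_F     : callable returning an estimate of the Stieltjes-transform limit of F
              at positive eigenvalues (assumed supplied, e.g. by a kernel estimator)
    m_Fbar0 : estimate of the Stieltjes-transform limit of F-underline at 0,
              needed only for the null sample eigenvalues when gamma > 1
    """
    lam = np.asarray(lam, dtype=float)
    d = np.empty_like(lam)
    pos = lam > 0
    mf = np.asarray(m_F(lam[pos]), dtype=complex)
    d[pos] = lam[pos] / np.abs(1.0 - gamma - gamma * lam[pos] * mf) ** 2
    if np.any(~pos):
        d[~pos] = 1.0 / ((gamma - 1.0) * m_Fbar0)
    return d

# Hypothetical usage, once an estimator m_F_hat of the Stieltjes transform is available:
#   D = lp_shrinkage(sample_eigenvalues, gamma=p/n, m_F=m_F_hat, m_Fbar0=m_Fbar0_hat)
```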

Principal Findings: False-alarm rate of shrinkage estimators

Recall

  T̂_k(X) := |µ_k^H R̂_k^{−1} X|^2 / (µ_k^H R̂_k^{−1} µ_k).

Let

  p_fa^{(k)}(τ) := Pr[T̂_k > τ | H_0^{(k)}].

Theorem. If R̂_k is asymptotically LP equivalent, then

  p_fa^{(k)}(τ) → e^{−τ} in probability as k → ∞.

• We can consistently estimate the false-alarm rate of an adaptive matched filter formed from a shrinkage estimator.
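One way to check this prediction is a small Monte Carlo experiment. The NumPy sketch below uses arbitrary sizes and a simple linear-shrinkage stand-in that is not claimed to be LP equivalent; the point is only to show how an estimator's empirical false-alarm curve would be compared against e^{−τ}.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, trials = 100, 80, 1000
taus = np.array([1.0, 2.0, 3.0])

# Placeholder AR(1)-type noise covariance and unit-norm steering vector.
R = 0.9 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
L = np.linalg.cholesky(R)
mu = np.ones(p) / np.sqrt(p)

def cnoise(shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def shrink(W, rho=0.5):
    """Placeholder shrinkage estimator (not necessarily LP equivalent)."""
    S = W @ W.conj().T / W.shape[1]
    lam, U = np.linalg.eigh(S)
    return (U * ((1 - rho) * lam + rho * lam.mean())) @ U.conj().T

hits = np.zeros_like(taus)
for _ in range(trials):
    R_hat = shrink(L @ cnoise((p, n)))       # estimator from training snapshots
    x = L @ cnoise(p)                         # test datum under H0
    Rinv_mu = np.linalg.solve(R_hat, mu)
    T = np.abs(Rinv_mu.conj() @ x) ** 2 / np.real(mu @ Rinv_mu)
    hits += (T > taus)

print("empirical pfa :", hits / trials)
print("exp(-tau)     :", np.exp(-taus))
```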

Principal Findings: Empirical convergence rate of pfa

Figure: For each k, pfa(τ) = exp(−τ/ξ(R̂, R)), with ξ(R̂, R) ≈ 1 in high dimensions for LP-equivalent estimators. The modified Ledoit-Wolf estimator (red) exhibits much faster convergence than the estimator of Donoho et al. (green). The dimension is p = 352 and the noise is Laplacian.

Principal Findings: Power of Shrinkage Estimators

Fact: detection power is a function of the effective SINR:

  ν_k^2(R̂_k) := |a|^2 (µ_k^H R̂_k^{−1} µ_k)^2 / (µ_k^H R̂_k^{−1} R_k R̂_k^{−1} µ_k).   (RMB 1974)

Theorem. Let R̂_k be asymptotically LP equivalent and q ∈ (0, 1). Then,

  Pr[ ν_{k−}^2(R̂_k, X) ≤ ν_k^2(R̂_k) ≤ ν_{k+}^2(R̂_k, X) | H_1^{(k)} ] → q in probability

as k → ∞, where

  ν_{k±}(R̂_k, X) = ( |µ_k^H R̂_k^{−1} X| / (µ_k^H R̂_k^{−1} µ_k)^{1/2} ± √(log(1/(1 − q))) )^+.

• We can consistently estimate (a confidence interval for) the detection rate of a shrinkage estimator.
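A direct NumPy transcription of the interval endpoints (a sketch; the test datum, estimator, and steering vector are placeholders supplied by the caller):

```python
import numpy as np

def sinr_confidence_interval(x, R_hat, mu, q=0.9):
    """Interval [nu_minus^2, nu_plus^2] for the effective SINR nu^2, transcribed
    from the theorem above; q is the asymptotic coverage level."""
    Rinv_mu = np.linalg.solve(R_hat, mu)
    center = np.abs(Rinv_mu.conj() @ x) / np.sqrt(np.real(mu.conj() @ Rinv_mu))
    half = np.sqrt(np.log(1.0 / (1.0 - q)))
    nu_minus = max(center - half, 0.0)       # the (.)^+ truncation at zero
    nu_plus = max(center + half, 0.0)
    return nu_minus ** 2, nu_plus ** 2
```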


Strengths and Limitations

S: Results can easily be extended to your favorite shrinkage estimator

S: Can easily be extended to the ACE test statistic

S: Results are equally valid in the Gaussian and jointly sub-Weibull (heavy-tail) cases, but . . .

L: Currently, a finite sixteenth moment must be assumed on the entries of the whitened training data matrix W_k

L: Rates of convergence are not yet known

L: Performance estimates fail when p/n ≈ 1. We are currently working on a fix, or at least a characterization of how close is too close.

Acknowledgements

This work was supported by AFOSR grant 19RYCOR036, the United States Air Force Sensors Directorate, and ARO grant W911NF-15-1-0479. Thanks to Michael Wolf for useful discussions.

Ledoit, Olivier and Sandrine Péché. “Eigenvectors of some large sample covariance matrix ensembles”. In: Probability Theory and Related Fields 151.1-2 (2011), pp. 233–264.

Ledoit, Olivier and Michael Wolf. “Analytical nonlinear shrinkage estimation of large-dimensional covariance matrices”. In: Annals of Statistics (Forthcoming).

Richards, M. A., Jim Scheer, and William A. Holm. Principles of Modern Radar. Tes Dee Publishing Pvt. Ltd. (published by arrangement), 2012.

Robey, Frank C. et al. “A CFAR adaptive matched filter detector”. In: IEEE Transactions on Aerospace and Electronic Systems 28.1 (1992), pp. 208–216.

Wicks, Michael C. et al. “Space-time adaptive processing: A knowledge-based perspective for airborne radar”. In: IEEE Signal Processing Magazine 23.1 (2006), pp. 51–65.

Backup Slide: Hypotheses in random matrix theory

(H1) W_k = R_k^{1/2} Z_k, where Z_k’s entries are i.i.d. with mean 0 and variance 1 and first sixteen moments bounded by an absolute constant C, and Z_k is independent of R_k and the test datum.

(H2) (τ_{k,1}, τ_{k,2}, . . . , τ_{k,p_k}) is a system of eigenvalues of R_k, and the empirical spectral distribution (e.s.d.) of the population covariance, given by H_k(τ) = (1/p_k) Σ_{j=1}^{p_k} 1_{[τ_{k,j}, ∞)}(τ), converges a.s. (almost surely) to a nonrandom limit H(τ) at every point of continuity of H. H defines a probability distribution function whose support supp(H) is included in the compact interval [h_1, h_2] with h_1 > 0.
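A NumPy sketch of how data satisfying (H1) and (H2) might be simulated; the population spectrum, the standardized t-distributed entries, and the degrees of freedom (df = 20, enough for sixteen finite moments) are placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, df = 128, 96, 20

# (H2): population eigenvalues supported in a compact interval [h1, h2] with h1 > 0.
tau = np.linspace(1.0, 5.0, p)
Q, _ = np.linalg.qr(rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p)))
R = (Q * tau) @ Q.conj().T                      # R_k = Q diag(tau) Q^H

# (H1): W = R^{1/2} Z with i.i.d. zero-mean, unit-variance, heavy-tailed entries.
def t_std(size):
    return rng.standard_t(df, size=size) / np.sqrt(df / (df - 2))

Z = (t_std((p, n)) + 1j * t_std((p, n))) / np.sqrt(2)
w, V = np.linalg.eigh(R)
R_half = (V * np.sqrt(w)) @ V.conj().T          # Hermitian square root of R
W = R_half @ Z

lam = np.linalg.eigvalsh(W @ W.conj().T / n)    # sample spectrum (rank-deficient: n < p)
print("population spectrum range:", (tau.min(), tau.max()))
print("sample spectrum range    :", (round(lam.min(), 3), round(lam.max(), 3)))
```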

Backup: Shrinkage Estimators

• For each k, let S_k := (1/n_k) W_k W_k^H.

• We say that R̂_k is a shrinkage estimator if

  R̂_k(W_k) = U_k(W_k) · D_k(W_k) · U_k(W_k)^H,

  where
  U_k : C^{p_k×n_k} → {unitary matrices in C^{p_k×p_k}},
  D_k : C^{p_k×n_k} → {real, diagonal, positive-definite p_k × p_k matrices},
  and U_k(W_k)^H · S_k · U_k(W_k) is diagonal.

• lim sup_k ||R̂_k|| and lim sup_k ||R̂_k^{−1}|| are bounded almost surely.

Detection-theoretic optimality of shrinkage-optimal estimators

Maximizing the effective SINR ⇔ maximizing the Normalized SINR (NSINR):

  η_k(µ_k, R̂_k, R_k) := (µ_k^H R̂_k^{−1} µ_k)^2 / ((µ_k^H R_k^{−1} µ_k)(µ_k^H R̂_k^{−1} R_k R̂_k^{−1} µ_k)).

Theorem. If R̂_k is an asymptotically LP-equivalent estimator, then its NSINR is asymptotically maximal (among f-consistent estimators).

• The estimator of Ledoit, Wolf, and Péché is asymptotically detection-theoretically optimal (among these estimators).

• In practice this leads to slight improvements over the state of the art.
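For reference, a direct NumPy transcription of the NSINR (a sketch; the inputs are placeholders supplied by the caller):

```python
import numpy as np

def nsinr(mu, R_hat, R):
    """Normalized SINR eta(mu, R_hat, R) from the slide above.
    Equals 1 when R_hat is proportional to R, and is at most 1 otherwise."""
    Rhat_inv_mu = np.linalg.solve(R_hat, mu)
    R_inv_mu = np.linalg.solve(R, mu)
    num = np.real(mu.conj() @ Rhat_inv_mu) ** 2
    den = np.real(mu.conj() @ R_inv_mu) * np.real(Rhat_inv_mu.conj() @ (R @ Rhat_inv_mu))
    return num / den
```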

Empirical Detection Performance of Shrinkage-Optimal Estimators

Figure: The proposed Ledoit-Wolf shrinkage-optimal estimator (red) achieves the same pd as the shrinkage oracle (black) and improves on the state of the art (blue). Here, the number of dimensions is fixed at p = 352 and the noise is Laplacian.

Empirical Detection Performance of Shrinkage-Optimal Estimators (cont.)

Figure: The proposed Ledoit-Wolf shrinkage-optimal estimator (red) achieves the same pd as the shrinkage oracle (black) and improves on the state of the art (blue). The estimator of Donoho et al. (green) is asymptotically LP equivalent under the spiked assumption, but is only defined for n ≥ p.