Sparse Methods for Model Estimation with Applications to Radar Imaging

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Christian D. Austin, B.E., B.S., M.S.

Graduate Program in Electrical and Computer Engineering

The Ohio State University

2012

Dissertation Committee:

Dr. Randolph L. Moses, Advisor
Dr. Lee C. Potter
Dr. Philip Schniter

© Copyright by

Christian D. Austin

2012

Abstract

In additive component model estimation problems, the number of additive components (model order) and the values of the model parameters in each of the additive components are estimated. Traditional methods typically estimate parameters for a set of models with fixed order; parameter estimation is performed over a continuous space when parameters are not discrete. The model order is estimated as the minimizer, over the set of fixed model orders, of a cost function that trades off signal fit to the measurements against model complexity.

This dissertation explores dictionary-based estimation methods for joint model order and parameter estimation. In dictionary estimation, the continuous parameter space is discretized, forming a dictionary. Each column of the dictionary is a model component at a sampled parameter value, and a linear combination of a subset of columns is used to represent the model. It is assumed that the model consists of a small number of components, and a sparse reconstruction algorithm is used to select a sparse superposition of columns to represent the signal. The number of columns selected is the estimated model order, and the parameters of each column are the parameter estimates.

We examine both static and dynamic dictionary-based estimation methods. In static estimation, the dictionary is fixed, while in dynamic estimation, dictionary parameters adapt to the data. We propose two new dynamic dictionary-based estimation algorithms and examine the performance of both static and dynamic algorithms in terms of model order probability and parameter estimation error when dictionaries are highly correlated. Highly correlated dictionaries arise from using closely spaced parameter samples in dictionary formation; we propose a method for selecting algorithm settings based on an information criterion. We show the following results: 1) dictionary-based estimation methods are capable of performance comparable to the Cramér-Rao lower bound and to traditional benchmark estimation algorithms over a wide range of signal-to-noise ratios; 2) in the complex exponential model, dictionary-based estimation can superresolve closely spaced frequencies; and 3) dynamic dictionary methods overcome parameter estimation bias caused by quantization error in static dictionary-based estimation.

We apply dictionary-based estimation to the problem of 3D synthetic aperture radar (SAR) imaging. Traditional 3D SAR image formation requires collection of data over a large contiguous sector of azimuth-elevation aspect angles; this collection is difficult or impossible to obtain in practice. We show that dictionary-based estimation can be used to produce well-resolved, wide-angle 3D SAR images from sparse, irregular flight paths.

In memory of my grandfathers, David Austin and Walter Sobol

Acknowledgments

I would like to thank my family for their continual support, love, and encouragement. My parents have always emphasized the importance of education and stood beside me in all of my decisions; I would not be where I am today without them. When in need of someone to talk to in tough times, my mother has always been there for me, bringing the current situation into perspective. I’ve often drawn upon my father’s realist views and life-experiences when in need of motivation or focus. My sister, Marielle, and I left home at the same time and experienced college concurrently, and I feel that we intellectually “grew-up” together. I value the experiences that we had together during this time of our lives. To the rest of my family who inquired into my progress, offered advice, and never questioned my many years of graduate school, I thank you for showing interest in my work and always believing in me.

My academic advisor and mentor, Professor Randolph Moses, taught me how to conduct research, from formulating a problem to publishing the results. I thank him not only for teaching me the skills necessary to be a researcher, but for always being very professional and making my graduate school experience a very pleasant one. Having good officemates is an important part of graduate school, given that you share many hours of your graduate life together. I thank both Dr. Josh Ash and Dr. Julie Jackson for being great officemates. We’ve had the opportunity to discuss a multitude of ideas, and many of our conversations have taught me something new; hopefully, I have reciprocated. One day I hope to implement at least a small fraction of the projects that Josh and I have discussed over the years. Members of the Compressive Sensing reading group also deserve my gratitude, especially Professor Lee Potter and Professor Phil Schniter, for the valuable conversations that we had about compressive sensing and research in general. I thank my friend, and fellow graduate student, Anthony D’Orazio, for listening to my problems, and at times, just dealing with me during the ups and downs of graduate school. Being able to discuss everyday problems with someone who is in a similar situation and can relate was a great stress-relief.

Ed Zelnio of the Air Force Research Laboratory (AFRL) has been supportive of my research and has provided valuable input; I am thankful for his involvement. I also owe my gratitude to Dr. Greg Arnold, who volunteered to be our AFRL collaborator during Ohio Space Grant Consortium funding. Lastly, my graduate research would not have been possible without financial support from the AFRL, Ohio Space Grant Consortium, and NSF IUCRC; I am greatly appreciative of their support.

Vita

November 11, 1980 ...... Born - West Islip, New York

2003 ...... B.E. Computer Engineering, B.S. Mathematics, State University of New York at Stony Brook

2006 ...... M.S. Electrical Engineering, The Ohio State University

2003-2004, 2006-2009 ...... Graduate Fellow, The Ohio State University

2004-2006, 2009-present ...... Graduate Research Associate, The Ohio State University

Publications

C. D. Austin, E. Ertin, and R. L. Moses, “Sparse Signal Methods for 3D Radar Imaging,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 3, pp. 408-423, June 2011.

C. D. Austin, J. N. Ash and R. L. Moses, “Parameter Estimation Using Sparse Reconstruction With Dynamic Dictionaries,” Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Prague, Czech Republic, May 22 - 27, 2011.

C. D. Austin, J. N. Ash and R. L. Moses, “Performance Analysis of Sparse 3D SAR Imaging,” Algorithms for Synthetic Aperture Radar Imagery XVIII, SPIE Defense and Security Symposium, Orlando, FL., April 25 - 29, 2011.

C. D. Austin, E. Ertin, J. N. Ash, and R. L. Moses, “On the Relation Between Sparse Reconstruction and Parameter Estimation with Model Order Selection,” IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 3, pp. 560 - 570, June 2010.

K. E. Dungan, C. D. Austin, J. Nehrbass, and L. C. Potter, “Civilian Vehicle Radar Data Domes,” Algorithms for Synthetic Aperture Radar Imagery XVII, SPIE Defense and Security Symposium, Orlando, FL., April 5 - 9, 2010.

C. D. Austin, E. Ertin, and R. L. Moses, “Sparse Multipass 3D SAR Imaging: Applications to the GOTCHA Data Set,” Algorithms for Synthetic Aperture Radar Imagery XVI, SPIE Defense and Security Symposium, Orlando, FL., April 13 - 17, 2009.

M. Ferrara, J. A. Jackson, and C. Austin, “Enhancement of Multi-Pass 3D Circular SAR Images using Sparse Reconstruction Techniques,” Algorithms for Synthetic Aperture Radar Imagery XVI, SPIE Defense and Security Symposium, Orlando, FL., April 13 - 17, 2009.

C. D. Austin, E. Ertin, J. N. Ash, and R. L. Moses, “On the Relation Between Sparse Sampling and Parametric Estimation,” IEEE 13th DSP Workshop and 5th Sig. Proc. Workshop 2009 (DSP/SPE 2009), Jan. 4 - 7, 2009.

C. D. Austin and R. L. Moses, “Wide-angle Sparse 3D Synthetic Aperture Radar Imaging for Nonlinear Flight Paths,” IEEE National Aerospace and Electronics Conference (NAECON) 2008, July 16 - 18, 2008.

E. Ertin, C. D. Austin, S. Sharma, R. L. Moses, and L. C. Potter, “GOTCHA Experience Report: Three-Dimensional SAR Imaging with Complete Circular Apertures,” Algorithms for Synthetic Aperture Radar Imagery XIV, SPIE Defense and Security Symposium, Orlando, FL., April 9 - 13, 2007.

C. D. Austin, “Interferometric Synthetic Aperture Radar Height Estimation with Multiple Scattering Centers in a Resolution Cell,” Master’s Thesis, The Ohio State University, 2006.

C. D. Austin and R. L. Moses, “Interferometric Synthetic Aperture Radar Detection and Estimation Based 3D Image Reconstruction,” Algorithms for Synthetic Aperture Radar Imagery XIII, SPIE Defense and Security Symposium, Orlando, FL, Apr. 17 - 21, 2006.

C. D. Austin and R. L. Moses, “IFSAR Processing for 3D Target Reconstruction,” Algorithms for Synthetic Aperture Radar Imagery XII, SPIE Defense and Security Symposium, Orlando, FL, Mar. 28 - Apr. 1, 2005.

R. L. Moses, L. C. Potter, E. Ertin, and C. D. Austin, “Synthetic Aperture Radar Visualization,” Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, Nov. 7 - 10, 2004.

Fields of Study

Major Field: Electrical and Computer Engineering

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

1. Introduction
   1.1 Outline

2. The Additive Component Model
   2.1 Sparse Linear System Representation

3. Sparse Algorithm Performance in Highly Correlated Dictionaries
   3.1 Introduction
   3.2 Sparse Reconstruction Algorithms
       3.2.1 Majorization Minimization
       3.2.2 Split Bregman
       3.2.3 FISTA
       3.2.4 Augmented Lagrangian
       3.2.5 Spectral Projected Gradients
       3.2.6 Iterative Hard Thresholding
   3.3 Experimental Results
       3.3.1 Discussion of Results
   3.4 Conclusion

4. Static Dictionary-Based Model Estimation
   4.1 Introduction
   4.2 Model Order Selection and Parameter Estimation
       4.2.1 Classical Model Order Selection and Parametric Estimation
       4.2.2 Sparse Reconstruction
   4.3 Main Results
       4.3.1 Static Dictionary-Based Estimation Algorithm and λ Selection
       4.3.2 Algorithm Settings
   4.4 Numerical Examples: Sinusoid Estimation
       4.4.1 Spectral Estimation Methods
       4.4.2 Parameter Error and Resolution Limits
       4.4.3 Simulations
   4.5 Conclusion

5. Dynamic Dictionary-Based Model Estimation
   5.1 Introduction
   5.2 Dynamic Dictionary-Based Estimation Methods
       5.2.1 Penalized Estimation
       5.2.2 Constrained Estimation
   5.3 Model Estimation Examples
       5.3.1 Complex Exponential Estimation
       5.3.2 Decaying Exponentials
   5.4 Conclusion

6. Sparse 3D SAR Imaging
   6.1 Introduction
   6.2 SAR Model
   6.3 Collection Geometry and Example Datasets
       6.3.1 Pseudorandom Flight Path Dataset
       6.3.2 Multipass Circular SAR Dataset
   6.4 ℓp Regularized Least-Squares Imaging Algorithm
   6.5 3D Imaging Results
       6.5.1 Squiggle Path Reconstructions
       6.5.2 Multipass CSAR Reconstructions
   6.6 Conclusions

7. Conclusion

Bibliography

List of Tables

3.1 Average iteration time of sparse algorithms in a highly correlated dictionary.

4.1 Square-root of CRB and RMSE of frequency estimates for 1 sinusoid.

4.2 Sum of square-root of CRBs and sum of RMSEs of frequency estimates for 2 well-separated sinusoids.

4.3 Sum of square-root of CRBs and sum of RMSEs of frequency estimates for 2 closely-spaced sinusoids.

List of Figures

3.1 Amplitude estimate after early termination of sparse reconstruction algorithm.

3.2 On column, noiseless performance of sparse reconstruction algorithms as a function of iteration.

3.3 On column, noisy performance of sparse reconstruction algorithms as a function of iteration.

3.4 Off column, noiseless performance of sparse reconstruction algorithms as a function of iteration.

3.5 Off column, noisy performance of sparse reconstruction algorithms as a function of iteration.

4.1 Static dictionary-based estimation average estimated model order and prediction error versus parameter sample spacing.

4.2 Static dictionary-based estimation average estimated model order and prediction error versus ℓp quasinorm p.

4.3 Complex exponential model order probability and parameter estimates for true model order 1 and 10 dB SNR.

4.4 Complex exponential model order probability and parameter estimates for true model order 1 and 0 dB SNR.

4.5 Complex exponential model order probability and parameter estimates for true model order 2 well-separated sinusoids and 10 dB SNR.

4.6 Complex exponential model order probability and parameter estimates for true model order 2 well-separated sinusoids and 0 dB SNR.

4.7 Complex exponential model order probability and parameter estimates for true model order 2 closely-spaced sinusoids and 10 dB SNR.

4.8 Complex exponential model order probability and parameter estimates for true model order 2 closely-spaced sinusoids and 0 dB SNR.

5.1 Complex exponential dictionary correlation matrix and correlation versus sinusoid spacing.

5.2 Complex exponential static dictionary model order as a function of correlation and p for 1 sinusoid.

5.3 Complex exponential static dictionary amplitude estimates for a 64 times oversampled dictionary and two different values of p.

5.4 Complex exponential dictionary-based parameter estimation RMSE versus SNR.

5.5 Complex exponential dictionary-based model order estimation versus SNR.

5.6 Decaying exponential dictionary correlation matrix and correlation versus decay rate spacing.

5.7 Decaying exponential dictionary-based parameter estimation RMSE versus SNR.

5.8 Decaying exponential dictionary-based model order estimation versus SNR.

6.1 Sparse squiggle path radar measurements as a function of azimuth and elevation angle.

6.2 Data domes for the squiggle path and GOTCHA dataset.

6.3 Actual GOTCHA passes.

6.4 Magnitude of k-space data from a squiggle path subaperture.

6.5 Benchmark reconstructed backhoe image from the squiggle path dataset.

6.6 Fourier reconstructed backhoe from the squiggle path with an SNR of 10 dB.

6.7 Magnitude of point spread function from a subaperture of the squiggle path.

6.8 Static dictionary-based estimation reconstructed backhoe from the squiggle path with an SNR of 10 dB.

6.9 Two-dimensional SAR image of the GOTCHA scene.

6.10 Photographs from the GOTCHA scene.

6.11 Fourier reconstructed images from the GOTCHA dataset.

6.12 Static dictionary-based estimation reconstruction of a tophat from the GOTCHA dataset.

6.13 Static dictionary-based estimation reconstruction of a Camry from the GOTCHA dataset.

6.14 Static dictionary-based estimation reconstruction of a Camry from the GOTCHA dataset, with color encoding look angle.

6.15 Tomo-SAR reconstruction of a Camry from the GOTCHA dataset.

Chapter 1: Introduction

The additive component model,
$$y_n = \sum_{m=1}^{M} \alpha_m f(t_n, \theta_m) + \epsilon_n, \qquad n = 1, \ldots, N,$$
is the superposition of a parametric component function, f(t_n, θ_m), over M parameters, θ_m, with linear coefficients, α_m, and additive noise, ǫ_n. Measurements are collected at N samples of the measurement variable t_n. The additive component model

is used to model signals in many applications. Perhaps the most common component function is the complex exponential. This function can be used to model the attenuation in MRI [1–3] and CT [1,4], backscattering location in high-frequency radar [5,6], narrow-band direction of arrival in array processing [7,8], and harmonics in spectral estimation problems [8], as examples of a few applications. Other additive component models include the decaying complex exponential model used to model aspect dependent radar scattering [9,10], and the decaying exponential used to model spin density in Electron Paramagnetic Resonance (EPR) imaging [11] and the loss of messenger RNA and protein concentration in biological gene expression modeling [12].

Estimation in an additive component model is a twofold process: both the number of components in the model, M, or model order, and the values of the component function parameters, θ_m, must be estimated. The parameter space is typically continuous, as is the case in the applications above. Traditional parameter estimation methods such as Maximum-Likelihood (ML), Maximum-a-Posteriori (MAP), or Least-Squares (LS) have been applied to estimate the parameters of additive component models [8,13]. The model order, M, is a discrete parameter and is estimated by fixing a discrete set of candidate model orders, fitting the models using a parameter estimation method for each of these candidate model orders, and then selecting the model with the least model order selection cost [8]. Model order selection cost is typically defined in terms of an information-based criterion, such as the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), or Generalized Information Criterion (GIC) [8,14].

In this dissertation, we explore the use of sparse reconstruction algorithms to efficiently perform joint parameter and model order estimation; we refer to this method of estimation as dictionary-based estimation. This approach differs from traditional parameter and model order estimation in that the parameter space is sampled, and parameter estimates are selected from these samples. Specifically, we discretize the component function parameter, θ, into K samples and select M̂ parameter estimates from this discrete set; for finely sampled or multi-dimensional parameters, K can be very large. To perform parameter selection, we form N-dimensional column vectors by fixing the component function parameter at a sampled value, θ_k, and sampling the component function at N measurement values. The resulting K column vectors are then aggregated into an N × K dictionary matrix. Estimation in this framework becomes a dictionary column selection problem, for which we use a sparse reconstruction algorithm to choose the sparse linear combination of dictionary elements that best models the measured data. The subset of dictionary columns selected encodes the parameter estimates, and the number of dictionary columns selected, M̂, is the estimated model order.

Sparse reconstruction algorithms have been utilized for a number of years to successfully reconstruct objects measured through an underdetermined system dictionary matrix, A; see, e.g., [15,16] and references in [17]. The success of these algorithms depends on the assumption that the underlying object is “sparse” in the sense that the input object to the system is a vector consisting of a small number of non-zero coefficients. This sparsity assumption is used to regularize an otherwise ill-posed inversion problem. Selecting the minimum number of non-zero coefficients to model the system directly is a combinatoric problem and is, in general, intractable. The utility of sparse algorithms arises because they are non-combinatoric and are capable of accurately reconstructing sparse objects in practical computation time; hence, sparse reconstruction algorithms provide a tractable way to perform dictionary subset selection in dictionary-based estimation algorithms.

The area of compressive sensing, which analyzes the theoretical performance of sparse reconstruction algorithms, has emerged more recently [18,19]. Typical results in this area assert that if the reconstruction object is sufficiently sparse, and certain conditions are placed on the correlation structure of the system dictionary matrix, then the norm of the error between the reconstructed and true object can be bounded.

Two conditions commonly placed on the dictionary are the mutual incoherence condition and the Restricted Isometry Property (RIP) [20–22].

Although dictionary-based estimation is related to compressive sensing through the use of sparse reconstruction algorithms, it differs in three substantial ways. First, our primary interest is in model and parameter estimation performance, not the linear coefficient reconstruction error, which compressive sensing bounds [21]. Secondly, in the proposed dictionary-based estimation methods, the parameter space needs to be sampled finely enough to avoid parameter estimation bias, since dictionary columns encode parameter estimates; the result is dictionaries with highly correlated columns, which violate the mutual incoherence condition or RIP. In fact, most dictionaries that satisfy the intercolumn correlation conditions are randomly generated [21] or generated using algebraic structure [23], but these dictionaries usually do not accurately represent a physical model or data collection of interest. Lastly, although the dictionary may be learned from training data, as in sparse dictionary learning [24,25], the dictionary is then fixed in compressive sensing algorithms. We allow the dictionary columns to adapt to the measured data.

In this dissertation, we first present and analyze the performance of dictionary estimation algorithms. We then show an application of dictionary-based estimation to 3D synthetic aperture radar (SAR) imaging. An outline of the following chapters is presented below. The outline provides a general discussion of the problem, relevant literature, and results. A more detailed discussion of relevant literature is contained in each chapter.

1.1 Outline

Chapter 2 provides background material for the following chapters. The additive component model is formally introduced, and a specific instance of the additive component model used for radar imaging is presented as a motivating example. A sparse linear system representation of the additive component model is then introduced. We use the linear system representation for subset selection of dictionary columns in dictionary-based estimation algorithms.

Since dictionary-based estimation algorithms utilize a sparse reconstruction algorithm to perform dictionary subset selection, model estimation performance depends on the performance of the sparse reconstruction algorithm. Under dictionary matrix correlation conditions of compressive sensing, theoretical error guarantees have been derived for many sparse algorithms [19,21,22,26–28], and several algorithms have rate of convergence guarantees [26–28]. Performance of sparse algorithms is usually quantified with respect to a norm on coefficient vector estimation error when compressive sensing properties are satisfied. In Chapter 3, we discuss and evaluate the speed and estimation performance of several commonly used sparse reconstruction algorithms for highly correlated dictionaries, where the correlation conditions of compressive sensing do not hold. The analysis of this chapter will show that, in practice, ℓp regularized sparse algorithms with p < 1 converge to a sparse, low-error solution faster than when p = 1. The results of this chapter are significant in that they quantify the observation that, although compressive sensing correlation conditions are violated, sparse algorithms are still capable of generating accurate solutions if p is chosen appropriately. Based on the findings of this chapter, we select a majorization-minimization (MM) based ℓp regularized LS algorithm for use with dictionary-based estimation methods in the following chapters.

In Chapter 4 we examine the relationship between static dictionary-based estimation and classic model estimation. The ℓp regularized LS algorithm used for dictionary subset selection has three settings that affect algorithm performance: the ℓp quasinorm, the sparsity setting, and the set of parameter samples selected for dictionary construction. The parameter samples, and hence the dictionary, are fixed during static dictionary-based estimation. Automated methods for choosing settings are important for dictionary subset selection, since manual selection is a subjective trial-and-error approach. We develop a method for selecting algorithm settings such that sparse reconstruction mimics classic order selection criteria such as AIC, BIC, and GIC. The proposed method chooses settings in an optimal way, by minimizing a model order cost, and does not require extra training data, as in cross-validation approaches [29,30]. We investigate how these sparse reconstruction algorithm settings impact the corresponding parametric modeling solution. Static dictionary-based estimation is compared with traditional model estimation techniques and the Cramér-Rao lower bound for a sinusoids-in-noise (noisy complex exponential) example when the sinusoids are both well-separated and closely-spaced. We find that the two methods have comparable performance in most cases; dictionary-based estimation is capable of superresolving sinusoids that are closely spaced, and in fact, dictionary-based estimation performs better than a commonly used classic model estimation algorithm specifically developed for closely-spaced sinusoids.

Our work on static dictionary-based estimation is not the first. This estimation method has been used in many applications, and many sparse algorithm settings have been utilized [11,31–35]; however, our work on static dictionary-based estimation appears to be the first to analyze selection of sparse algorithm settings and performance from a joint model order and parameter estimation standpoint. In addition, we present a general estimation algorithm that can be applied to any additive component model but still performs well on problems where application-specific algorithms are common, such as the complex exponential superresolution problem.

Like static dictionary-based estimation, dynamic dictionary-based estimation selects a sparse subset of dictionary columns to model the measured data; however, dynamic dictionary-based algorithms adapt the locations of the parameter samples, and hence the dictionary, to the data. Algorithms that refine parameter samples through iterative parameter grid refinement have been presented in [33,36], and a spectral estimation technique that uses measured data to select a parameter sample grid is discussed in [37]. In Chapter 5, we introduce two dynamic dictionary algorithms for parameter estimation. Both algorithms control intercolumn correlation by inhibiting parameter sample spacing from becoming too close. One algorithm uses a penalized approach, and the other uses a constraint. We examine model order selection performance of the ℓp regularized LS algorithm as a function of p and parameter spacing (correlation), and use this behavior to select p and the inhibition settings in the dynamic algorithms. We show that for practical ℓp regularized LS algorithm settings, it is necessary to use p < 1 for accurate estimation when parameter spacing becomes fine and dictionary columns become highly correlated. Estimation performance of the dynamic algorithms is evaluated on a complex exponential and a decaying exponential model. We show that parameter estimation performance is comparable with the Cramér-Rao lower bound and a genie ML estimator over a wide range of signal-to-noise ratios. The dynamic dictionary-based estimation algorithms that we present appear to be the first dictionary-based estimation methods that utilize data to select a parameter sampling grid based on an optimality criterion; furthermore, based on the examples presented in this chapter, it appears that the proposed dynamic dictionary-based estimation algorithms may obviate the local minima convergence problem encountered when classical estimation methods are used to estimate the parameters of a non-convex component function.

Scattering center location in high frequency radar, as used in SAR, is well-modeled by the additive complex exponential model [5,6,31]; so dictionary-based estimation methods can be used for scattering center estimation. Synthetic aperture radar imaging is a valuable tool in a number of defense surveillance and monitoring applications. Currently, most radar images are two-dimensional (2D), but there is increasing interest in three-dimensional (3D) reconstruction of objects. Traditional Fourier 3D SAR image formation requires data collection over a large contiguous sector of azimuth-elevation aspect angles [38,39]. In practice, such a collection is difficult or impossible to obtain, and effective 3D reconstructions using sparse measurements are sought. Sparse reconstruction methods have been used successfully to generate 2D radar images of man-made objects [31,40,41]. Greedy sparse reconstruction algorithms have been used to form 3D images of canonical shapes [42,43], and interferometric SAR and Tomo-SAR methods have been used to successfully reconstruct 3D objects from multiple elevation, narrow-aspect angle flight paths [44–53].

In Chapter 6, we present a dictionary-based estimation algorithm to estimate 3D scattering center locations of objects from sparse SAR collection paths, and present 3D, wide-angle SAR visualizations of objects from X-band radar measurements. We show that, unlike Fourier imaging, dictionary-based estimation produces well-resolved images from sparse and highly non-linear 3D collection geometries. These images are the first well-resolved 3D SAR images of a civilian vehicle from a highly non-linear flight path; furthermore, the proposed wide-angle 3D SAR imaging algorithm is tractable and well suited for the persistent sensing mode of radar operation, where data is collected over wide angles and possibly irregular flight paths.

Finally, Chapter 7 concludes the dissertation. In this chapter, we summarize the main results, and propose future research directions.

Chapter 2: The Additive Component Model

In many parametric modeling problems, one is given noisy measurements of a signal that is a weighted sum of M components, each component being a function parameterized by a set of parameters θ_m:
$$y_n = \sum_{m=1}^{M} \alpha_m f(t_n, \theta_m) + \epsilon_n, \qquad n = 1, \ldots, N. \qquad (2.1)$$
The component function, f(t, θ) ∈ ℂ, and measurement variables, t_n ∈ ℂ^{d₀}, are assumed to be known, but the parameters, θ_m ∈ ℂ^d, amplitudes, α_m ∈ ℂ, and model order, M, are unknown. In general, the signal component function, f(t, θ), depends nonlinearly on the θ vector. The goal is to estimate the model order, M, the parameter vectors,
$$\Theta = \{\theta_m\}_{m=1}^{M}, \qquad (2.2)$$
and amplitudes,
$$\alpha = [\alpha_1, \ldots, \alpha_M]^T, \qquad (2.3)$$
from the noisy measurement vector,
$$y = [y_1, \ldots, y_N]^T, \qquad (2.4)$$
where
$$\epsilon = [\epsilon_1, \ldots, \epsilon_N]^T \qquad (2.5)$$

is the noise vector.

An example of an additive component model is the complex exponential model, commonly referred to as the sinusoids-in-noise model. In this model, the component function is a complex exponential of the form
$$f(t_n, \theta_m) = e^{-j \theta_m^H t_n},$$
where j = √−1 and superscript 'H' is the Hermitian transpose. This model is used for a wide number of spectral estimation problems [8], including, but not limited to, medical imaging [1–4], radar signal processing [5,6,31], and array processing [7,8]. For example, in SAR imaging, the objective is to estimate the location and amplitude parameters of scattering centers in a scene. Under a far-field, high-frequency assumption, SAR can be modeled as a complex exponential additive component model, where, using the notation of (2.1), θ_m = [x_m, y_m, z_m]^T is a scattering center location at (x_m, y_m, z_m) in Cartesian space, and t_n = [k_{x,n}, k_{y,n}, k_{z,n}]^T, which are measurements of k-space at locations (k_{x,n}, k_{y,n}, k_{z,n}) [5,6]. We will examine SAR location parameter estimation further in Chapter 6.

2.1 Sparse Linear System Representation

We represent (2.1) as a linear system for use in the following chapters. Define the N-vector
$$a(\theta) = [f(t_1, \theta), \ldots, f(t_N, \theta)]^T. \qquad (2.6)$$
Let
$$\bar{\Theta} = \{\bar{\theta}_k\}_{k=1}^{K} \qquad (2.7)$$
be a set of θ parameter samples, and define the N × K, K ≥ M, system dictionary as
$$A = [a(\bar{\theta}_1), \ldots, a(\bar{\theta}_K)] \qquad (2.8)$$
and the system amplitude vector as
$$x = [x_1, \ldots, x_K]^T. \qquad (2.9)$$
The dictionary matrix, A, may be underdetermined. When the set of parameter samples contains the true parameters, Θ ⊆ Θ̄,
$$y = \sum_{m=1}^{M} \alpha_m a(\theta_m) + \epsilon = A(\bar{\Theta}) x + \epsilon, \qquad \|x\|_0 = M, \qquad (2.10)$$
where ‖·‖₀ counts the number of non-zero entries in x and, although not a true norm, is commonly called the ℓ₀-norm. The non-zero entries of x align with the columns of A at the true parameter values such that
$$x_k = \begin{cases} \alpha_m & \text{when } \bar{\theta}_k = \theta_m \\ 0 & \text{otherwise.} \end{cases} \qquad (2.11)$$
When using (2.10) in an estimation problem, true parameters are not known, and are usually not contained in the set of parameter samples. In this case, there is dictionary mismatch, and (2.10) becomes an approximation. Amplitude estimation error caused by dictionary mismatch when using (2.10) to approximate an additive component model under sparsity and RIP conditions of compressive sensing has been investigated in [54,55]. We use (2.10) in dictionary-based estimation methods of subsequent chapters and discuss methods of choosing parameter samples to lessen model estimation error caused by dictionary mismatch when the conditions of compressive sensing do not hold.
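
To make the construction concrete, the following Python sketch (ours, not from the dissertation; the sizes, indices, and the complex exponential component function are illustrative) assembles the dictionary of (2.8) from parameter samples and synthesizes a measurement vector according to (2.10).

```python
import numpy as np

def make_dictionary(f, t, theta_samples):
    # Column k is a(theta_k) of (2.6): the component function f
    # sampled at the N measurement values in t, stacked per (2.8).
    return np.column_stack([f(t, th) for th in theta_samples])

# Complex exponential component function, f(t, theta) = exp(-j*theta*t).
f = lambda t, theta: np.exp(-1j * theta * t)
t = np.arange(32)                                           # t_n, N = 32
theta_bar = np.linspace(0, 2 * np.pi, 256, endpoint=False)  # sampled parameters
A = make_dictionary(f, t, theta_bar)                        # N x K = 32 x 256

# A model-order-2 signal per (2.10): x has ||x||_0 = 2 non-zero amplitudes.
rng = np.random.default_rng(0)
x = np.zeros(256, dtype=complex)
x[[40, 90]] = [1.0, 0.5j]
noise = 0.05 * (rng.standard_normal(32) + 1j * rng.standard_normal(32))
y = A @ x + noise
```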

Chapter 3: Sparse Algorithm Performance in Highly Correlated Dictionaries

3.1 Introduction

To estimate the model order and parameters of an additive component model using dictionary-based estimation, the model is represented as a linear system of the form (2.10), and a sparse reconstruction algorithm selects a sparse linear combination of dictionary columns that models the signal well. The columns selected by the algorithm affect estimation performance. As discussed in later chapters, when parameters are sampled finely to avoid estimation bias caused by parameter quantization, K ≫ M in (2.10), and dictionary columns may be highly correlated; there are many sparse algorithms from the field of compressive sensing that can be used to select dictionary columns, several of which we evaluate in the following sections. Theoretical error bounds can be placed on the solutions of many of these algorithms when either the mutual incoherence or RIP intercolumn correlation conditions of compressive sensing hold (e.g., [19–21,26,28]). However, for dictionary-based estimation, we are interested in the case where dictionary columns are highly correlated, and the compressive sensing conditions guaranteeing theoretical bounds are violated.

The convergence rate of a sparse reconstruction algorithm is important not only to dictionary-based estimation speed, but to estimation performance. In practice, sparse algorithm tolerances may be set to achieve acceptable run times, but the algorithm may terminate before it is sufficiently close to the minimum cost solution. An example of an amplitude vector estimate, x̂, produced by an early-terminated sparse reconstruction algorithm is shown in Figure 3.1. The x-axis is the component index of x̂, and the y-axis is the magnitude of these components; the magnitude at each index is shown as the height of a vertical blue line. In this example, there is one true sinusoid with a magnitude of 1, shown as a red vertical line. Energy of the estimated components is spread about the true location of the parameter, instead of being focused at the true parameter location. This spreading results in a greatly overestimated model order (the number of nonzero entries in x̂), and is undesirable.

[Figure: plot of estimated component magnitudes versus component index.]

Figure 3.1: Estimate x̂ after early termination of a sparse reconstruction algorithm. Blue vertical lines are the estimated components, and the red vertical line is the true parameter location.

One popular type of sparse reconstruction algorithm, typically used in image processing, is the ℓp regularized least-squares (LS) algorithm, which solves
$$\hat{x} = \arg\min_x \|y - Ax\|_2^2 + \lambda \|x\|_p^p, \qquad (3.1)$$
where 0 < p ≤ 1, and λ is a user-set hyperparameter that penalizes non-sparse solutions. A second class of sparse reconstruction algorithms, typically used in compressive sensing, solves the constrained optimization problem
$$\min_x \|x\|_p^p \quad \text{s.t.} \quad \|y - Ax\|_2^2 < \epsilon, \qquad (3.2)$$
again, where 0 < p ≤ 1. Among the algorithms evaluated in the following sections is a generalized FISTA algorithm [56] that solves (3.1) for p < 1. Specifically, we examine

1. How fast the sparse reconstruction algorithms converge for a highly correlated dictionary.

2. The solution quality. Convergence of an algorithm is not meaningful if the solution does not represent the model well. We define the quality of a solution as the sparsity level of the solution and how well the solution models the true signal. We make this definition since the premise of this research is to select a small number of dictionary columns whose linear combination models the underlying signal well.

3. The convergence and quality of reconstructed solutions across several popular sparse reconstruction algorithms. This assessment includes a comparison of algorithms capable of solving problem (3.1) for p = 1 and p < 1, and problem (3.2) for p = 1.

3.2 Sparse Reconstruction Algorithms

For algorithm comparison, we have chosen popular algorithms that have been proven to work well on sparse reconstruction problems. In this section, we describe these algorithms.

3.2.1 Majorization Minimization

In the majorization minimization (MM) framework of optimization, the objective function is majorized by a surrogate function that is easier to minimize. The surrogate function satisfies certain properties that ensure that the minimization converges to a local minimum of the original objective function [40,57]. We use an MM framework to solve (3.1) for any 0 < p ≤ 1, with iterate
$$x^{(n+1)} = \left( A^H A + \frac{\lambda}{2} D(x^{(n)}) \right)^{-1} A^H y,$$
where $D(x^{(n)}) = \mathrm{diag}\left[ p \, |x_i^{(n)}|^{p-2} \right]$, and diag(·) is a matrix with its argument down the diagonal. This algorithm intrinsically involves a nested loop. The outer loop is on the index n, and the inner loop is a conjugate gradient (CG) algorithm to solve the inverse matrix in the iterate. When A is a DFT matrix, empirically, it appears that a small number of CG iterations are needed to converge to a good solution.
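
A minimal Python sketch of this iteration follows; it is our illustration, not the dissertation's implementation. It solves each surrogate's normal equations with a dense direct solver instead of the CG inner loop described above, and the small constant delta that smooths the |x_i|^{p-2} weights near zero is our assumption, added for numerical safety.

```python
import numpy as np

def mm_lp_ls(A, y, lam, p, n_iters=100, delta=1e-8):
    # Each MM step minimizes a quadratic surrogate of (3.1), which
    # amounts to solving (A^H A + (lam/2) D(x^(n))) x = A^H y.
    AhA = A.conj().T @ A
    Ahy = A.conj().T @ y
    x = Ahy.copy()  # matched-filter initialization (our choice)
    for _ in range(n_iters):
        # Diagonal of D(x^(n)) = diag[p |x_i|^(p-2)], smoothed by delta.
        d = p * (np.abs(x) ** 2 + delta) ** ((p - 2) / 2)
        x = np.linalg.solve(AhA + (lam / 2) * np.diag(d), Ahy)
    return x
```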

3.2.2 Split Bregman

A split Bregman algorithm redefines equations or variables in the original optimization problem as new optimization variables; in addition, constraints are imposed on the problem to ensure that these new definitions are enforced [58,59]. Addition of these new variables allows the problem to be split into simpler subproblems. We use the split Bregman technique to solve (3.1) for any 0 < p ≤ 1. Define w = x in (3.1) and modify the optimization problem to be
$$\arg\min_{x,w} \|y - Ax\|_2^2 + \lambda \|w\|_p^p + \frac{\gamma}{2} \|w - x - B\|_2^2, \qquad (3.4)$$
where B is a Bregman variable that will be updated, and γ is a user-set penalty coefficient [58]. The algorithm can be written in a more efficient form if the operator A = RF, where R is a diagonal mask operator and F is a square Fourier operator. Since we use this form of A in simulations, we assume it takes this masked-Fourier form for the algorithm derivation. The minimum of (3.4), which can be found by taking the gradient with respect to x and observing that the w subproblem has a soft-threshold solution, defines the Bregman iterates as
$$\begin{aligned} x^{(n+1)} &= F^H \left( R + \frac{\gamma}{2} I \right)^{-1} F \left( \frac{\gamma}{2} (w^{(n)} - B^{(n)}) + F^H R y \right) \\ w^{(n+1)} &= S_p(x^{(n+1)} + B^{(n)}, \lambda/2) \\ B^{(n+1)} &= B^{(n)} + x^{(n+1)} - w^{(n+1)}. \end{aligned} \qquad (3.5)$$
The function S_p is defined as the generalized shrink operator [59]
$$S_p(x, \lambda) = \max\{|x| - \lambda |x|^{p-1},\ 0\} \cdot x/|x|. \qquad (3.6)$$
If A = RF is not assumed, and F is not orthonormal, the matrix inverse in the first equation might not be diagonal, necessitating an expensive inverse computation.
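
Elementwise, the generalized shrink operator is only a few lines of code. The sketch below is ours (the convention S_p(0) = 0 is an assumption made to avoid a 0/0); it is reused by the FISTA sketch in the next section.

```python
import numpy as np

def shrink_p(v, lam, p):
    # Generalized shrink operator S_p of (3.6), applied elementwise.
    # For p = 1 this reduces to the standard soft threshold.
    out = np.zeros_like(v)
    nz = np.abs(v) > 0
    mag = np.abs(v[nz])
    out[nz] = np.maximum(mag - lam * mag ** (p - 1), 0.0) * v[nz] / mag
    return out
```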

3.2.3 FISTA

The original FISTA algorithm is a variation of the iterative soft threshold algorithm [27], and is used to solve the optimization problem (3.1) for p = 1. Algorithm speed is accelerated by changing the gradient term to the soft threshold operator. The FISTA algorithm can be generalized for p < 1 by replacing the S₁ shrink operator by the generalized S_p shrink operator in (3.6). The generalized FISTA iteration is given by
$$\begin{aligned} x^{(n)} &= S_p\!\left(z^{(n)} + \mu A^H (y - A z^{(n)}),\ \lambda\mu/2\right) \\ a^{(n+1)} &= \left(1 + \sqrt{1 + 4(a^{(n)})^2}\right)/2 \\ z^{(n+1)} &= x^{(n)} + \frac{a^{(n)} - 1}{a^{(n+1)}} \left(x^{(n)} - x^{(n-1)}\right). \end{aligned} \qquad (3.7)$$
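
A compact sketch of iteration (3.7) is given below, reusing the shrink_p helper from the previous section; this is our illustration, and the step-size comment states the standard FISTA condition for p = 1, which is our assumption rather than a claim from the text.

```python
import numpy as np

def fista_lp(A, y, lam, p, mu, n_iters=1000):
    # Generalized FISTA (3.7). For p = 1, mu <= 1/||A||_2^2 gives the
    # usual convergence guarantee; none is implied here for p < 1.
    K = A.shape[1]
    x = np.zeros(K, dtype=complex)
    z = x.copy()
    a = 1.0
    for _ in range(n_iters):
        x_prev = x
        x = shrink_p(z + mu * (A.conj().T @ (y - A @ z)), lam * mu / 2, p)
        a_next = (1 + np.sqrt(1 + 4 * a ** 2)) / 2   # momentum update
        z = x + ((a - 1) / a_next) * (x - x_prev)
        a = a_next
    return x
```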

3.2.4 Augmented Lagrangian

The augmented Lagrangian method is similar to the split Bregman method in that extra variables are added to the objective function so that the optimization problem decomposes into subproblems that are easier to solve than the original problem. An augmented Lagrangian software package, RecPF, is available for 2D Fourier imaging problems [60]; however, this package is not compatible with the one-dimensional sinusoidal example presented in the simulation section. We derive an augmented Lagrangian algorithm for solving (3.1) with p = 1. Derivation of the algorithm involves finding the solution to the augmented problem [60],
$$\arg\min_{x,w} \|y - Ax\|_2^2 + \lambda \|w\|_1 + \frac{\beta}{2} \|w - x\|_2^2, \qquad (3.8)$$
where β is a user-set parameter. The solution to this problem defines the iterates of the algorithm. It turns out that when A = RF, the masked Fourier operator defined above, this algorithm is a modification of the split Bregman iteration in (3.5). The algorithm is defined by setting γ = λβ, λ̄ = 2/β, and B^{(n)} = 0 in the split Bregman algorithm. We use λ̄ to denote the λ parameter in the Bregman iteration.

3.2.5 Spectral Projected Gradients

The spectral projected gradient algorithm is a projected gradient method that solves the constrained sparse reconstruction problem (3.2) with p = 1. This method is implemented in the software package SPGL1, which we use in simulations. The algorithm is outlined in Algorithm 1 of [56]. The computational complexity lies in a one-norm gradient projection step and a line search.

3.2.6 Iterative Hard Thresholding

Iterative hard thresholding (IHT) is a greedy sparse reconstruction algorithm that has provable performance guarantees when the dictionary A satisfies RIP [26]. The objective of the algorithm is to find the optimal k-sparse (only k non-zero coordinates) solution to the quadratic cost in the first term of (3.1). This is a combinatoric problem, and IHT attempts to find a solution in an iterative, greedy way. The IHT iteration is given by
$$x^{(n+1)} = H_k\!\left(x^{(n)} + \mu A^H (y - A x^{(n)})\right), \qquad (3.9)$$
where H_k is the hard threshold operator. This operator sets all coordinates of its argument to zero, except for the largest k magnitude elements.
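
Iteration (3.9) is simply a gradient step followed by the hard threshold H_k; a short sketch (ours, with illustrative defaults) follows.

```python
import numpy as np

def iht(A, y, k, mu, n_iters=500):
    # IHT iteration (3.9): gradient step, then hard threshold H_k,
    # keeping only the k largest-magnitude coordinates.
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iters):
        v = x + mu * (A.conj().T @ (y - A @ x))
        smallest = np.argsort(np.abs(v))[:-k]  # all but the k largest
        v[smallest] = 0.0
        x = v
    return x
```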

3.3 Experimental Results

In this section, we present a Monte Carlo simulation study of the reconstruction performance of the algorithms presented in the previous section when the dictionary A has high intercolumn correlation. For this study, we examine frequency estimation in a sinusoids-in-noise model; such a model arises in synthetic aperture radar signal processing [5] and direction of arrival estimation [7]. In this model, the component function in (2.1) is f = e^{jωt}, where the frequency, ω, is the model parameter. After discretization of the parameter space, this model is represented as a linear system (2.10) with a highly oversampled N × K DFT matrix, A; a matrix is oversampled by a factor of O if K = ON. The columns are highly oversampled (and highly correlated) to avoid bias in dictionary-based estimation.

All measurements are generated from a system (2.1) with a complex exponential component function and iid complex Gaussian noise with variance σ². There is M = 1 true sinusoid with magnitude |α| = 1 and a phase of 0.79 radians; the phase is drawn uniformly at random and then fixed. The linear system representation (2.10) uses a DFT matrix, A, with N = 16 rows and K = 16 × 64 = 1024 columns (64 times oversampled); the intercolumn correlation of the dictionary is |a(i)^H a(i+1)| / (‖a(i)‖₂ ‖a(i+1)‖₂) = 0.996, where a(i) is column i of A. Reconstruction results are averaged over 200 realizations. The signal-to-noise ratio (SNR) in all simulations is defined in decibels (dB) as 10 log(|α|²/σ²). All simulations were performed on an Intel Xeon 2.66 GHz quadcore CPU with 8 GB of memory.
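
For concreteness, the setup above can be reproduced in a few lines of NumPy; this sketch is ours, with only the sizes, the phase value, and the SNR definition taken from the text.

```python
import numpy as np

N, O = 16, 64
K = O * N
n = np.arange(N)
A = np.exp(1j * 2 * np.pi * np.outer(n, np.arange(K)) / K)  # 64x-oversampled DFT

# Intercolumn correlation of adjacent columns (quoted as 0.996 in the text).
a0, a1 = A[:, 0], A[:, 1]
corr = np.abs(a0.conj() @ a1) / (np.linalg.norm(a0) * np.linalg.norm(a1))

# One unit-magnitude sinusoid at 10 dB SNR: 10*log10(|alpha|^2 / sigma^2) = 10.
alpha = np.exp(1j * 0.79)
sigma2 = np.abs(alpha) ** 2 / 10 ** (10 / 10)
rng = np.random.default_rng(0)
eps = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
```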

We examine algorithm convergence performance in four simulation scenarios. In the first scenario, the true frequency is aligned with a dictionary column and there is no noise. Although not a practical scenario, since there is always measurement noise and the true frequency will not usually match a column of the dictionary A, these results provide a baseline for algorithm performance. In the second scenario, the true frequency again matches a column of the DFT matrix, but there is 10 dB SNR; this scenario isolates the effect of noise on algorithm performance. In the third scenario, we examine the effect of dictionary mismatch on algorithm performance without noise. In this scenario, the true frequency is set to the mean of adjacent dictionary column frequencies; so, the true frequency does not match any column in the dictionary. The final scenario models a practical situation where there is both noise and dictionary mismatch. Again, the true frequency is set to the mean of adjacent dictionary column frequencies, but there is 10 dB SNR.

Simulation results for these four scenarios are shown in Figures 3.2 through 3.5, respectively. Algorithms are abbreviated as follows: Majorization-Minimization with p = x as MM(x), Split-Bregman with p = x as SB(x), FISTA with p = x as FISTA(x), Augmented Lagrange as AL, Spectral Projected Gradient as SPGL1, and Iterative Hard Thresholding as IHT. Each of these algorithms has hyperparameter settings, as discussed in Section 3.2, and tolerance settings. Tolerances were set so that each of the algorithms executes as many iterations as possible up to a limit of 10000 iterations. For some algorithms, it was not possible and/or necessary to set tolerances to meet this limit. In addition, traces for each algorithm end after 80% of realizations have terminated; this is why some traces end before the x-axis limit.

Principled hyperparameter selection of λ for algorithms that solve (3.1), ǫ for algorithms that solve (3.2), and sparsity k for greedy algorithms, such as the iterative hard thresholding algorithm, is an active area of research [29,30,56,61]. Since the focus of this work is on algorithm convergence, and not hyperparameter selection, we select hyperparameters manually. Values of λ and ǫ are set to values that generate good sparse solutions. We believe that the hyperparameters selected generate solutions representative of some of the sparsest solutions that the algorithms are capable of generating. The sparsity value of k in IHT is overestimated as 5.

[Figure: two log-log panels, (a) average sparsity and (b) prediction error, each plotted versus iteration for the algorithms in the legend.]

Figure 3.2: On column, noiseless performance of sparse reconstruction algorithms as a function of iteration. There is M = 1 true sinusoid matched to a dictionary column. The LS prediction error with knowledge of sparsity M = 1 is zero in this case, and is not plotted.

[Figure: two log-log panels, (a) average sparsity and (b) prediction error, each plotted versus iteration for the algorithms in the legend.]

Figure 3.3: On column, noisy performance of sparse reconstruction algorithms as a function of iteration. There is M = 1 true sinusoid matched to a dictionary column, and the signal level is 10 dB SNR. The horizontal dotted red trace on the prediction plot is the LS prediction error with knowledge of sparsity level M = 1.

[Figure: two log-log panels, (a) average sparsity and (b) prediction error, each plotted versus iteration for the algorithms in the legend.]

Figure 3.4: Off column, noiseless performance of sparse reconstruction algorithms as a function of iteration. There is M = 1 true sinusoid that is not on a dictionary column. It is located at the mean frequency of adjacent dictionary columns. The horizontal dotted red trace on the prediction plot is the LS prediction error with knowledge of sparsity level M = 1.

[Figure: two log-log panels, (a) average sparsity and (b) prediction error, each plotted versus iteration for the algorithms in the legend.]

Figure 3.5: Off column, noisy performance of sparse reconstruction algorithms as a function of iteration. There is M = 1 true sinusoid that is not on a dictionary column. It is located at the mean frequency of adjacent dictionary columns, and the signal level is 10 dB SNR. The horizontal dotted red trace on the prediction plot is the LS prediction error with knowledge of sparsity level M = 1.

3.3.1 Discussion of Results

Each plot in Figures 3.2 through 3.5 shows a metric of reconstruction performance as a function of algorithm iteration n. Although most realizations run for approximately the same number of iterations, the iteration count varies from realization to realization; variations in error at the end of traces can be attributed to the small number of samples available to average over. Average sparsity is defined as the number of non-zero entries in the reconstructed amplitude estimate at iteration n, x̂^{(n)}, with magnitude no more than 40 dB below the maximum magnitude entry in x̂^{(n)}; any values below this threshold are defined as spurious and are discarded. Prediction error is defined as ‖y_t − A x̂^{(n)}‖, where y_t is the true, noiseless signal, and is a measure of how well the reconstruction models the true signal without fitting the noise. We begin by noting that algorithm performance in the scenario when the true frequency falls on a dictionary column in Figures 3.2 and 3.3 is very similar to when it does not in Figures 3.4 and 3.5 for the same noise level.
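
Both metrics are straightforward to compute; the sketch below is ours, with the 40 dB spurious-component cutoff taken from the definition above and the function names purely illustrative.

```python
import numpy as np

def avg_sparsity(x_hat, thresh_db=40.0):
    # Count entries within thresh_db dB of the largest magnitude;
    # anything below the cutoff is treated as spurious, per the text.
    mag = np.abs(x_hat)
    return int(np.sum(mag > mag.max() * 10 ** (-thresh_db / 20)))

def prediction_error(y_true, A, x_hat):
    # ||y_t - A x_hat||: fit to the true, noiseless signal.
    return np.linalg.norm(y_true - A @ x_hat)
```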

Taken together, these plots quantify convergence performance of the algorithms we examine. We define solution quality as how well the reconstructed vector x̂ models the underlying noiseless signal and how sparse the vector x̂ is. This definition aligns with the goal of parameter estimation using sparse reconstruction algorithms. We use the prediction error and average sparsity plots to evaluate model fit and sparsity, respectively. To judge the quality of prediction error, the prediction error of the LS solution with knowledge of sparsity level M = 1 is provided. This is the prediction error using the best LS solution fit to each individual column in the dictionary A.

The LS error with sparsity knowledge is not given in Figure 3.2, when the frequency is on a dictionary column and there is no noise. In this case, the LS solution achieves zero prediction error.

         MM(1)  MM(0.8)  SB(1)  SB(0.8)  FISTA(1)  FISTA(0.8)  AL    SPGL1  IHT
On, NN   5.33   9.99     0.32   0.54     0.72      1.10        0.29  2.01   0.61
On, N    6.47   7.41     0.31   0.56     0.78      1.11        0.29  1.80   0.74
Off, NN  6.07   8.85     0.32   0.55     0.78      1.11        0.32  1.99   0.77
Off, N   6.42   7.48     0.23   0.41     0.79      1.12        0.32  1.92   0.77

Table 3.1: Sparse algorithm average iteration time in milliseconds. The label 'On' means that the true frequency is on a dictionary column, and 'Off' means that the true frequency is between two dictionary columns. The label 'N' signifies that noise is added, and 'NN' means that no noise is added.

As shown in Figures 3.2 through 3.5, there is much variation across algorithms in both convergence and solution quality. In addition, Table 3.1 provides the average execution time per algorithm iteration in milliseconds; the iteration time demonstrates algorithm computational complexity. In Figure 3.2 the sparsity level of MM(0.8) rapidly decreases with iteration. The MM(1) algorithm does not reach the correct sparse solution until after approximately 5000 iterations, as opposed to MM(0.8), which achieves the same performance in less than 50 iterations. Each iteration of the MM(0.8) algorithm takes less than twice as long as the MM(1) algorithm; so, the MM(0.8) algorithm reaches the correct solution in less than 1/50th of the time required of MM(1). In noise, the MM(0.8) algorithm still converges faster, and to a one-sparse solution after about 50 iterations, whereas MM(1) on average does not converge to a one-sparse solution.

Comparing across algorithms, the MM(0.8) algorithm takes the fewest iterations to converge to a one-sparse solution, except in the noiseless case when the true frequency falls on a dictionary column, where the SPGL1 algorithm achieves convergence faster. However, the SPGL1 algorithm is not capable of finding a sparse solution in noise, and finds a two-sparse solution (the columns adjacent to the true frequency) when the true frequency is not on a dictionary column. Furthermore, the MM(0.8) algorithm achieves performance close to the sparsity-informed LS estimator. However, as shown by Table 3.1, MM(0.8) has the highest computational complexity per iteration; each iteration involves solving an inverse system using the conjugate gradient algorithm.

The split Bregman algorithms are capable of converging to sparse solutions, but require substantially more iterations than the MM(0.8) algorithm. However, the complexity of each iteration is substantially reduced from that of the MM algorithms, as seen in Table 3.1. For an oversampled DFT dictionary, as considered here, the main operations of each Bregman iteration in (3.5) are a simple soft threshold operation and two DFT operations, which can be implemented efficiently by the FFT algorithm. Like the MM algorithm, the p = 0.8 quasinorm split Bregman algorithm, SB(0.8), converges faster than the p = 1 version, SB(1). The only difference between these algorithms is the use of the generalized shrink operation (3.6) in SB(0.8) in place of the standard p = 1 soft thresholding operation. Considering the noiseless, on dictionary column case in Figure 3.2, assuming that SB(0.8) takes 1000 iterations to converge to a one-sparse solution and MM(0.8) takes 50 iterations, SB(0.8) takes about 540 ms to converge and MM(0.8) takes about 500 ms; so, both algorithms need similar times to compute. The disadvantage of the SB(0.8) algorithm is that it does not converge to the correct sparsity in noise, and takes almost 10000 iterations to converge to a one-sparse solution in the noiseless scenario where the true frequency does not fall on a dictionary column. Furthermore, the SB(0.8) algorithm exhibits erratic behavior in noisy cases at large iteration counts. In the split Bregman implementation used here, the variables x and w in (3.5) are updated in an inner loop while the Bregman variable B is held constant; the observed behavior may occur when a Bregman variable update takes place after several inner loop iterations without one.

Performance of the FISTA(1) algorithm in terms of convergence and solution quality is very similar to MM(1). However, as shown in Table 3.1, each iteration of FISTA(1) is about 7-8 times faster than that of MM(1). As noted in Section 3.2, each iteration of the MM algorithm performs a conjugate gradient operation, which is computationally expensive. In contrast, the main operations in an iteration of FISTA consist of two matrix operations and a shrinkage operation. Like the MM(0.8) algorithm, the FISTA(0.8) algorithm is capable of converging to the correct sparsity of 1. The FISTA(0.8) algorithm takes more iterations than MM(0.8) to converge to this solution, but each iteration of the FISTA(0.8) algorithm is about 7-9 times faster than an MM(0.8) iteration. Across the experimental scenarios, the FISTA(0.8) algorithm converges to a one-sparse solution about 1.5-2 times faster than the MM(0.8) algorithm. For example, in the most practical scenario, where the true frequency parameter is off-grid in noise, the FISTA(0.8) algorithm takes about 200 iterations to converge to a one-sparse solution, and the MM(0.8) algorithm takes about 60 iterations. Using the time per iteration in Table 3.1, this means that FISTA(0.8) converges in about 224 ms and MM(0.8) converges in about 449 ms. However, for these runtimes, MM(0.8) achieves lower prediction error.
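For reference, a minimal FISTA iteration for the p = 1 objective is sketched below; it makes the per-iteration cost visible (two matrix products plus a shrinkage). The helper names and settings are ours, not those used to produce Table 3.1.

    import numpy as np

    def fista_l1(A, y, lam, n_iter=200):
        # Minimal FISTA sketch for min_x ||y - A x||_2^2 + lam * ||x||_1.
        def shrink(v, t):  # complex-safe soft thresholding
            mag = np.abs(v)
            return v * np.maximum(mag - t, 0.0) / np.maximum(mag, 1e-30)

        L = 2.0 * np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1], dtype=A.dtype)
        z, t = x.copy(), 1.0
        for _ in range(n_iter):
            grad = 2.0 * A.conj().T @ (A @ z - y)          # two matrix products
            x_new = shrink(z - grad / L, lam / L)          # shrinkage step
            t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            z = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum update
            x, t = x_new, t_new
        return x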

An advantage of the IHT algorithm is its speed per iteration: the main operations of each iteration consist of two matrix multiplies and a sorting operation. This speed is demonstrated in Table 3.1. Even in highly correlated dictionaries, this algorithm performs well and converges to a one-sparse solution when there is no noise (see Figures 3.2 and 3.4). However, the algorithm is not robust to noise, and its performance is sensitive to the sparsity k and step size µ in (3.9). When there is noise, the algorithm fails: instead of spreading energy around the true solution, as in Figure 3.1, it selects many non-zero entries in x̂ corresponding to frequencies not close to the true frequency. This poor performance is demonstrated in the sparsity and prediction error plots of Figures 3.3 and 3.5.
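A minimal IHT sketch under the same assumptions (sparsity level k and step size µ fixed in advance, as in (3.9)); the per-iteration work is exactly the two matrix products and the partial sort noted above.

    import numpy as np

    def iht(A, y, k, mu, n_iter=100):
        # Minimal iterated hard thresholding sketch for y ~ A x, ||x||_0 <= k.
        x = np.zeros(A.shape[1], dtype=complex)
        for _ in range(n_iter):
            g = x + mu * (A.conj().T @ (y - A @ x))     # gradient step
            keep = np.argpartition(np.abs(g), -k)[-k:]  # k largest magnitudes
            x = np.zeros_like(g)
            x[keep] = g[keep]                           # hard threshold to k-sparse
        return x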

The augmented Lagrangian algorithm does not appear to converge, even in noiseless scenarios. Although not shown here, the solution energy is usually spread in the coordinates of x̂ around the true frequency. The algorithm was also tested on less correlated dictionaries, such as a twice-oversampled DFT matrix, and it accurately reconstructs x. As noted in Section 3.2, this algorithm is a modification of the split Bregman algorithm, with no Bregman update and slightly different hyperparameters. The performance difference between the two algorithms can be attributed either to a poor choice of hyperparameters (many were tried) or to improved performance from a Bregman update.

3.4 Conclusion

In this chapter, we investigated the convergence speed and solution quality of several popular sparse reconstruction algorithms when used with highly correlated dictionaries. We defined algorithm estimation performance as how well the reconstructed signal models the noiseless measured signal (prediction error) and how sparse the reconstructed signal is. We performed experiments, in noisy and noiseless scenarios, where the true parameter is matched to a dictionary column and where it is mismatched, falling between two columns. Results for the matched and mismatched dictionaries were similar. We showed that algorithms that minimize the ℓp regularized LS objective function for p < 1 converge faster to the true sparsity of the model than when p = 1. If similar performance is to be achieved using p = 1, larger iteration counts and smaller algorithm termination tolerances are needed. Several algorithms achieved good estimation performance under certain scenarios, but the majorization-minimization algorithm and the FISTA algorithm with p = 0.8 are as fast or faster in many scenarios than other algorithms with similar performance. The FISTA algorithm converged to the true sparsity level up to 2 times faster than the majorization-minimization algorithm, but the majorization-minimization algorithm achieved lower prediction error faster; this prediction error was close to that of the LS estimator with a priori knowledge of the solution sparsity.

Chapter 4: Static Dictionary-Based Model Estimation

4.1 Introduction

In this chapter, we examine the relationship between the classic problem of parametric modeling and sparse reconstruction for the additive component model (2.1),

    y_n = \sum_{m=1}^{M} \alpha_m f(t_n, \theta_m) + \epsilon_n, \quad n = 1, \ldots, N.

The model order M, parameter vectors Θ = {θ_m}_{m=1}^{M}, and amplitudes α = [α_1, ..., α_M]^T are unknown and are estimated from the noisy measurement vector y = [y_1, ..., y_N]^T, where ǫ_n is noise. In sparse reconstruction, a measurement vector y is modeled as a linear system (2.10), y = Ax + ǫ, where A is a known dictionary matrix and ǫ = [ǫ_1, ..., ǫ_N]^T is a noise vector; the goal is to reconstruct x. In sparse reconstruction, the linear system A is typically highly underdetermined; so, A has many more columns than rows, and x is assumed to be sparse, that is, to have a small number of nonzero elements.

Sparse reconstruction, or sparse linear modeling, is closely related to compressed sensing [19,21], and there has been a wealth of recent results on both algorithms and reconstruction guarantees for sparse solutions of linear inverse problems (see, e.g., [62,63]). Compressed sensing (CS) has recently been successfully applied to a number of problems in signal and image modeling and reconstruction. These techniques apply to applications in which a measurement can be described as a linear combination of terms, where the number of terms is known to be small but the indices of the nonzero terms are unknown a priori.

Sparse reconstruction can be connected to parametric modeling by sampling the parameter space and forming the columns of A from evaluations of f(t, θ) over a sampled grid of θ-values. Then, parametric estimation is approximated by selecting a small number of these columns that correspond to the nonzero entries of the sparse vector x. This method of sparse estimation has been previously considered for estimating the direction-of-arrival parameter in source localization [32-34], for estimating a scattering center location parameter [31] or range and Doppler parameters [35] in radar signal processing, and for estimating the location parameter of spin density in EPR medical imaging [11]. This work differs from previous work using sparse parameter estimation in that it addresses the issues of dictionary matrix construction and sparse algorithm setting selection through consideration of both the underlying parametric model and model order selection.

We formally pose the joint parameter estimation and model order selection problem as a sparse reconstruction problem and discuss how sampling of the parameter space relates to the RIP and impacts estimation accuracy. We show that information-based parametric model order selection methods (e.g., AIC, BIC, GIC) may be incorporated into sparse reconstruction problem statements. Based on this formulation, we present a sparse reconstruction algorithm that performs both parameter estimation and model order selection. We also investigate how the sparse reconstruction algorithm settings, along with parameter vector sampling, impact the corresponding parametric modeling solution. Finally, we illustrate the performance of the proposed approach on the classical example of sinusoids-in-noise model order selection and parameter estimation.

The remainder of this chapter is organized as follows. In Section 4.2 we review both parametric modeling and sparse reconstruction and discuss their connection in the context of model order selection and parameter estimation. In Section 4.3, we present the main result of this chapter, which connects the choice of the sparsity setting in sparse reconstruction to information criteria in classic model order selection problems, and we develop a corresponding dictionary estimation algorithm to implement both parameter estimation and model order selection. Section 4.4 compares results of direct parametric modeling to modeling using the sparse reconstruction approach, and conclusions are given in Section 4.5.

4.2 Model Order Selection and Parameter Estimation

We briefly review classic parametric model order selection/continuous-parameter estimation and sparse reconstruction, and discuss how these two methods relate.

4.2.1 Classical Model Order Selection and Parametric Estimation

The classical parameter estimation setting considers α, Θ in (2.1) as continuous parameters to be estimated. When the model order M in (2.1) is known, it is straightforward to compute the Maximum-Likelihood estimate (MLE) of the model parameters as

    \{\hat{\alpha}(M), \hat{\Theta}(M)\} = \arg\max_{[\alpha, \Theta]} \ln p(y; \alpha, \Theta), \quad (4.1)

where ln p(y; α, Θ) is the log-likelihood function of α and Θ for a given measurement vector y. Construction of p(y; α, Θ) requires knowledge of the distribution of the additive noise ǫ_n in (2.1). When noise samples are i.i.d. complex circular Gaussian, ǫ_n ~ CN(0, σ²), n = 1, ..., N, the MLE takes the form

    \{\hat{\alpha}(M), \hat{\Theta}(M)\} = \arg\min_{[\alpha, \Theta]} \frac{1}{\sigma^2} \sum_{n=1}^{N} \left| y_n - \sum_{m=1}^{M} \alpha_m f(t_n, \theta_m) \right|^2, \quad (4.2)

which, for general f(t, θ), is a nonlinear least-squares optimization problem.

For many parametric modeling problems, the model order, M, is unknown and must be estimated using, for example, classic information criteria methods such as the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), or the Generalized Information Criterion (GIC). For the parametric model shown in (2.1) with i.i.d. CN(0, σ²) noise, these criteria take the common form [8]

    J(M; \hat{\alpha}(M), \hat{\Theta}(M)) = \frac{1}{\sigma^2} \sum_{n=1}^{N} \left| y_n - \sum_{m=1}^{M} \hat{\alpha}_m(M) f(t_n, \hat{\Theta}_m(M)) \right|^2 + \eta M, \quad (4.3)

where

    \eta = \begin{cases} n_e, & \text{(AIC)} \\ \ln(N)\,(n_e/2), & \text{(BIC)} \\ \nu n_e. & \text{(GIC)} \end{cases} \quad (4.4)

The variable ν is a user-defined penalty within the GIC framework, and n_e is the effective number of unknown parameters per component in the model. Typically, the effective number of parameters is equal to the actual number of real-valued unknown parameters per component in the model. In this case, for the model (2.1), n_e = 1 + length(θ_m) if the {α_m} are real, and n_e = 2 + length(θ_m) if the {α_m} are complex-valued. However, there are exceptions to this parameter-counting rule of thumb, such as sinusoids-in-noise, where n_e = 5 even though there are 3 unknowns per component (amplitude, phase, and frequency for each sinusoid); see, e.g., [14].

The information-criteria model order estimate minimizes the cost (4.3),

    \hat{M} = \arg\min_{M} J(M; \hat{\alpha}(M), \hat{\Theta}(M)), \quad (4.5)

with η chosen from (4.4) according to the desired selection rule.
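As a concrete sketch of the selection rule (4.3)-(4.5), assuming the per-order MLE fits of (4.2) are already available (all function and argument names here are ours):

    import numpy as np

    def select_model_order(y, y_hat_by_order, sigma2, N, n_e):
        # Evaluate the criterion (4.3) for each candidate order and return the
        # minimizer (4.5); eta = ln(N) * n_e / 2 implements the BIC row of (4.4).
        eta = np.log(N) * n_e / 2.0
        orders = sorted(y_hat_by_order)
        J = [np.linalg.norm(y - y_hat_by_order[M]) ** 2 / sigma2 + eta * M
             for M in orders]
        return orders[int(np.argmin(J))]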

4.2.2 Sparse Reconstruction

Sparse reconstruction seeks to solve an underdetermined linear system with sparsifying constraints of the form

    y = Ax \quad \text{s.t.} \quad \|x\|_0 = M, \quad (4.6)

where x ∈ C^K is an M-sparse vector (i.e., it contains only M nonzero elements) to be determined from a measurement y ∈ C^N and known matrix A ∈ C^{N×K}, M ≪ K. We use ‖x‖₀ to denote the ℓ₀-(quasi)norm, which counts the number of nonzero elements of x. The measurement equation y = Ax is ill-posed and lacks a unique solution without the sparsifying constraint. However, the constraint ‖x‖₀ = M is discontinuous in x and imposes combinatoric complexity in solving (4.6); a direct solution is to try all \binom{K}{M} possible choices of the M nonzero element indices of the K × 1 vector x. To overcome this combinatoric complexity, recent results have shown that when A satisfies the RIP [21], the solution to the convex problem

    \min_x \|x\|_1 \quad \text{s.t.} \quad y = Ax \quad (4.7)

is unique and identical to the solution of (4.6). The optimization (4.7) is known as Basis Pursuit (BP) [64].

When the measurements contain noise (y = Ax + ǫ), BP may be represented as the problem

    \min_x \|y - Ax\|_2^2 + \lambda \|x\|_1, \quad (4.8)

where the first term is a data fit measure and the second is a sparsifying cost; the variable λ is referred to as the sparsity penalty setting and trades off data fidelity with sparsity. The optimization (4.8) is referred to as basis pursuit denoising (BPDN) [64].

The use of ℓ₁ norms to induce sparse solutions to linear systems has been employed for many years; however, recent developments in CS have generated renewed interest in sparse reconstruction. Compressive sensing typically uses system matrices, A, populated with random elements [19,21]. This construction has been shown to satisfy the RIP with high probability and therefore guarantees reconstruction performance using BPDN.

However, unlike CS, the use of sparse reconstruction to solve general model order and parameter estimation problems does not typically satisfy the RIP condition. To apply sparse reconstruction methods to the additive component model (2.1), we use the linear system representation (2.10), y = A(Θ̄)x + ǫ. In the following discussion, it is implied that the dictionary matrix depends on parameter samples, and Θ̄ is omitted. When the parameter samples Θ̄ are sufficiently dense, there exist samples in Θ̄ that are close to the true parameters, and (2.10) will approximate the additive component model. The nonzero elements of the amplitude vector, x, effectively select the columns of A with parameters close to the true parameters Θ. In the case of additive noise, this sampled variant of the additive model may be solved via an ℓp-norm extension of BPDN,

    \tilde{x} = \arg\min_x \|y - Ax\|_2^2 + \lambda \|x\|_p^p, \quad (4.9)

where \|x\|_p^p = \sum_{k=1}^{K} |x_k|^p, 0 < p ≤ 1. Note that \lim_{p \to 0} \|x\|_p^p = \|x\|_0. The optimization problem (4.9) performs dictionary column subset selection; a solution x̃ to (4.9) enables us to compute the order estimate, M̂, the amplitude estimates, {α̂_m}, and the unobservable parameter estimates, {θ̂_m}:

    \hat{M} = \|\tilde{x}\|_0 \quad (4.10)
    \hat{\alpha}_m = \tilde{x}_{I_m} \quad (4.11)
    \hat{\theta}_m = \bar{\theta}_{I_m}, \quad (4.12)

where {I_m} is an ordered set such that |\tilde{x}_{I_m}| is the mth largest-magnitude element of x̃.
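The mapping (4.10)-(4.12) is mechanical; a minimal sketch (function and argument names ours), where theta_grid holds the parameter sample θ̄_k of each dictionary column:

    import numpy as np

    def estimates_from_sparse_solution(x_tilde, theta_grid):
        # Order the support of x_tilde by decreasing magnitude, as in the
        # definition of the index set {I_m}, then read off (4.10)-(4.12).
        support = np.flatnonzero(x_tilde)
        I = support[np.argsort(-np.abs(x_tilde[support]))]
        M_hat = len(I)              # (4.10): model order estimate
        alpha_hat = x_tilde[I]      # (4.11): amplitude estimates
        theta_hat = theta_grid[I]   # (4.12): parameter estimates
        return M_hat, alpha_hat, theta_hat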

Clearly, the order and parameter estimates (4.10)-(4.12) depend on the solution x̃, which, in turn, depends on i) how the parameter space is sampled in order to form the columns of A, ii) the value of p defining the ℓp norm, and iii) the value of the sparsity parameter, λ. The following section describes the proposed reconstruction algorithm and how each of these three elements may be appropriately selected in order to effectively perform joint model order selection and parameter estimation using sparse reconstruction.

4.3 Main Results

In this section we develop a procedure for implementing both model order selection and parameter estimation using sparse reconstruction, and we discuss algorithmic and performance considerations in selecting p, λ, and the θ sampling density.

4.3.1 Static Dictionary-Based Estimation Algorithm and λ Selection

In practice, a solution x̃ to the optimization (4.9) may only be "approximately sparse," meaning that only a small number of components have significant magnitudes while many other components have negligible, but nonzero, magnitudes. These small-magnitude components contribute very little to the predicted model and effectively only serve to artificially increase the perceived model order. Therefore, we set the small-magnitude components of x̃ to zero using the thresholding operation

    \hat{x} = H(\tilde{x}) : \hat{x}_k = \begin{cases} 0, & \text{if } 20\log_{10}\frac{|\tilde{x}_k|}{\max_j |\tilde{x}_j|} < \tau \\ \tilde{x}_k, & \text{otherwise} \end{cases} \quad (4.13)

to generate a new amplitude vector x̂. Thus, all components of x̃ that are more than τ dB down from the largest-magnitude component are set to zero.
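A direct implementation of the thresholding operator H in (4.13); the default tau_db = -40 matches the setting used in the simulations of Section 4.4.

    import numpy as np

    def threshold_db(x_tilde, tau_db=-40.0):
        # Zero every component whose magnitude is more than |tau_db| dB below
        # the largest-magnitude component, per (4.13).
        mag = np.abs(x_tilde)
        peak = mag.max()
        x_hat = x_tilde.copy()
        if peak > 0.0:
            with np.errstate(divide="ignore"):   # |x_k| = 0 gives -inf dB
                db = 20.0 * np.log10(mag / peak)
            x_hat[db < tau_db] = 0.0
        return x_hat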

We select λ based on the information criteria (4.3), which, for the linear model (2.10) in noise, takes the form

    J(x) = \frac{1}{\sigma^2}\|y - Ax\|_2^2 + \mu \|x\|_0, \quad (4.14)

where the parameter µ is chosen as appropriate to implement AIC, BIC, or GIC. For example, n_e = 5 for the sinusoids-in-noise problem, and from (4.4) we have η = ln(N)(5/2) for BIC; so, µ = ln(N)(5/2). Writing the output of (4.9) and (4.13) as x̂(λ) = H(x̃(λ)) to explicitly indicate the dependence on λ, we see that λ indexes a set of potential sparse reconstruction solutions. For large λ, ‖x̂(λ)‖₀ = 0, and as λ decreases, the number of nonzero elements of x̂(λ) increases. We propose to select an optimal λ using the model order selection criterion in (4.3) as

    \lambda_0 = \arg\min_{\lambda} J[H(\tilde{x}(\lambda))]. \quad (4.15)

The final sparse reconstruction is x̂(λ₀), which may be substituted for x̃ in (4.10)-(4.12) to produce the model order and parameter estimates.

Previous approaches to λ-selection have largely been ad hoc or heuristic. Although cross-validation provides one principled method of choosing λ [29,30], this approach can be computationally expensive and necessitates the collection of extra training data. The approach presented here transfers the problem of λ-selection in (4.9) to one of µ-selection in (4.14). The strength of this approach is that µ may be selected in a principled manner using a particular information criterion, such as AIC or BIC.

In practice, the optimization of (4.15) is complicated by the discontinuity of the ‖·‖₀ norm in (4.14), and conventional gradient-based methods cannot be used. Any optimization algorithm capable of solving (4.15) can be used to find λ. One simple algorithm to approximate the minimum solution of J(x̂(λ)) is a tree-like grid search. At the current stage of the search, the cost function is evaluated on a grid of points within a bounded interval of λ values. The two points with the smallest cost are retained and denoted λ₁ and λ₂. The interval of the next search stage is centered at (λ₁ + λ₂)/2 and has length |λ₂ − λ₁|, and a grid of points in this interval is searched for the minimum. The search can be stopped after a fixed number of stages or when |J(x̂_{n+1}(λ)) − J(x̂_n(λ))| ≤ ξ, where J(x̂_n(λ)) is the minimum cost at stage n and ξ is a user-determined tolerance. The point with minimum cost at the final stage is chosen as λ₀. We use this tree search algorithm for the simulations in Section 4.4.
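A minimal sketch of this tree-like search, using the two-level settings of Section 4.4 (27 log-spaced points on [10⁻⁴, 10²], then 6 equispaced points); cost stands for the map λ ↦ J[H(x̃(λ))] and is left abstract:

    import numpy as np

    def tree_search_lambda(cost, lam_lo=1e-4, lam_hi=1e2, n_stages=2):
        # Stage 1: coarse logarithmic grid over the lambda interval.
        grid = np.logspace(np.log10(lam_lo), np.log10(lam_hi), 27)
        for _ in range(n_stages - 1):
            costs = [cost(lam) for lam in grid]
            lam1, lam2 = (grid[i] for i in np.argsort(costs)[:2])  # two best
            center, length = (lam1 + lam2) / 2.0, abs(lam2 - lam1)
            # Next stage: finer equispaced grid centered between the two best.
            grid = np.linspace(center - length / 2.0, center + length / 2.0, 6)
        costs = [cost(lam) for lam in grid]
        return grid[int(np.argmin(costs))]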

The algorithm is summarized in Algorithm 4.1. We refer to this algorithm as a static dictionary-based estimation algorithm because the parameter samples do not adapt to the measurements.

Algorithm 4.1 Static Dictionary Model Estimation Algorithm
• Form a dictionary matrix A by evaluating the component function f(t, θ) at parameter samples θ̄₁, ..., θ̄_K.
• Select a value of µ in (4.14) based on the desired information criterion (e.g., µ = ln(N)(5/2) for BIC with a sinusoids-in-noise model).
• Minimize (4.15) to find the optimal sparsity parameter λ₀ and the corresponding x̂(λ₀).
• Substitute x̃ = x̂(λ₀) into (4.10)-(4.12) to obtain order and parameter estimates.

4.3.2 Algorithm Settings

When using sparse reconstruction to solve continuous estimation problems, performance depends on the choice of algorithm settings, namely the dictionary sampling, p, and λ. In this section, we discuss the selection of these settings and demonstrate the effect that different settings have on order and parameter estimation performance.

Dictionary Sampling Considerations

The first step in sparse parameter estimation is to form a dictionary, A, from the model component a(θ) by sampling from the space of potential parameter values θ. The choice of parameter samples Θ̄, and in particular the spacing between adjacent parameters, affects parameter estimation performance. Assuming the unobservable parameter θ is constrained to a region in C^d, this region could be divided into an equi-spaced grid, and dictionary columns a(θ) may be evaluated on this grid. Alternatively, non-equi-spaced sampling mechanisms can be devised in order to minimize the resultant parameter estimation error. An example of this sampling strategy is presented in [65], where sample spacing is based on the Cramér-Rao lower bound (CRB).
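For the sinusoidal model used throughout this chapter, dictionary formation reduces to evaluating complex exponentials on a frequency grid; a minimal sketch (helper name ours), with settings matching the Section 4.4 simulations:

    import numpy as np

    def sinusoid_dictionary(t, freqs):
        # Columns are the component function f(t, theta) = exp(j 2 pi f t)
        # evaluated at each frequency sample; any other f could be substituted.
        return np.exp(2j * np.pi * np.outer(t, freqs))

    N, T, os_factor = 16, 0.1, 256           # Section 4.4 settings
    t = np.arange(N) * T
    freqs = np.arange(N * os_factor) / (N * os_factor * T)  # spacing 1/(256 NT) Hz
    A = sinusoid_dictionary(t, freqs)        # N x K dictionary, K = 16 * 256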

Regardless of the sampling method, it is desirable to sample columns as densely as possible in θ-space because the estimation accuracy of θ is limited by the sample spacing of the θ̄_k used to form A; coarse sampling results in quantization error. However, from a computational perspective, it is also desirable to constrain the number of columns in A. Furthermore, as θ-sampling becomes finer, intercolumn correlation increases, A does not satisfy the RIP, and the solution to (4.9) may degrade.

In Figure 4.1 we illustrate these properties as a function of θ-space sampling for a model order 1 sinusoids-in-noise estimation problem. The details of the model and simulation parameters are presented in Section 4.4, where the sinusoids-in-noise problem is considered in detail. Figure 4.1 shows both the average estimated model order and the average prediction error versus ∆θ for an equi-spaced grid. The prediction error is defined as ‖y_t − Ax̂‖₂², where y_t is a noiseless version of the signal, and θ is the frequency f of the sinusoid. The solid line demonstrates the performance of the proposed algorithm using BIC-based selection of λ. The dotted line demonstrates performance for an ad hoc selection of λ; this fixed value of λ was chosen as the BIC-based value for the specific sample spacing of ∆θ = 2.4 × 10⁻³ Hz.

For the fixed value of λ, model order estimates and prediction error are a strong function of θ-sampling. For large ∆θ, more model components are selected to fit the measured data. Better model order and prediction performance can be achieved by decreasing the sample spacing in the dictionary; however, as sampling becomes fine, performance begins to degrade. This degradation may be caused by the optimization routine used to solve (4.9) converging to local minima, or terminating before the minimum is reached, as discussed in Chapter 3. The value p = 0.8 was used in this example; motivation for this value of p is described in the following section. For small ∆θ, we observe that the majority of the model order estimates are correct (M̂ = 1); however, as ∆θ decreases, M̂ = 0 occurs with increasing frequency. This is reflected in Figure 4.1(a) by average model orders less than 1 for small ∆θ.

Figure 4.1 demonstrates that BIC-based λ selection is superior to the ad hoc fixed-λ selection. Whereas fixed-λ performance is sensitive to ∆θ, BIC-based λ selection is not. For BIC selection, the estimated model order is approximately 1 (the correct value) over the range of ∆θ considered, and the prediction error is always lower than the fixed-λ error. Furthermore, model order M̂ = 0 was not observed on average in the BIC selection case.

Selection of p.

The dictionaries used for sparse parameter estimation typically have high intercolumn correlation. In this case, the RIP of compressive sensing [21] is not satisfied, and compressive sensing bounds on estimation error cannot be applied. For example, in a sinusoidal model, the correlation between columns is described by a sinc-like Dirichlet kernel function, where the independent variable is the distance between frequencies. As the frequency separation decreases, more columns will have high correlation on the main lobe of the function.

Using p < 1 has been shown to be beneficial in reducing ℓ₂ estimation error on x in a similar optimization problem [22] under RIP conditions. However, when p < 1, (4.9) is a non-convex optimization problem, and the optimization routine used to solve (4.9) is not guaranteed to converge to the global minimum. Although global minimization is not guaranteed when p < 1, empirical evidence in our experiments demonstrates that p < 1 is beneficial for parameter estimation, even when the RIP is violated. Figure 4.2 shows the prediction error and estimated model order versus p for the same sinusoid-in-noise (M = 1) estimation problem with parameter spacing fixed at ∆θ = 2.4 × 10⁻³ Hz. Both prediction error and model order performance improve for decreasing p until approximately 0.8, after which there is very little change.

[Figure 4.1 shows two panels versus ∆θ (log axis): (a) estimated model order and (b) prediction error.]

Figure 4.1: Average estimated model order and prediction error versus ∆θ. True model order is M = 1. The solid line corresponds to BIC-based λ selection; the dotted line corresponds to fixed λ. Error bars in each plot indicate standard error.

Selection of any p < 0.8 results in similar algorithm performance. We observe that reconstruction performance for p > 0.8 is sensitive to tolerance settings in the optimization routine used to minimize (4.9).

4.4 Numerical Examples: Sinusoid Estimation

In this section we examine model order and parameter estimation performance for the sinusoids-in-noise model

    y_n = \sum_{m=1}^{M} \alpha_m e^{j 2\pi f_m t_n} + \epsilon_n, \quad n = 1, \ldots, N, \quad (4.16)

where M is the model order and N is the number of time samples; f_m and α_m are the frequency in Hz and complex amplitude of the mth sinusoid; t_n is the time parameter in seconds, which we assume is equi-spaced and given by t_n = (n − 1)T. In general, time samples do not have to be equi-spaced. The sampling period is T, and ǫ_n is i.i.d. CN(0, σ²) noise. This model is commonly used in a wide range of applications, including radar imaging and direction-of-arrival processing [66].

The sinusoids-in-noise model is an additive component model and can be represented as a linear system of the form (2.10) for sparse estimation. We compare sparse estimation performance under different noise levels to the traditional ESPRIT spectral estimation method. We first review spectral estimation methods and fundamental resolution and parameter variance limits for the sinusoids-in-noise problem.

4.4.1 Spectral Estimation Methods

Traditional spectral estimation methods estimate the nonlinear frequency parameters, f_m, in the sinusoids-in-noise model (4.16). Once the frequency parameters are estimated, linear least-squares estimation can be used to estimate the remaining linear amplitude parameters.

[Figure 4.2 shows two panels versus p: (a) estimated model order and (b) prediction error.]

Figure 4.2: Average estimated model order and prediction error versus p. True model order is M = 1. The information criterion BIC is used to select λ. Error bars in each plot indicate standard error.

Nonparametric (e.g., periodogram and correlogram) and parametric (Maximum-Likelihood, AR, ARMA, MUSIC, ESPRIT, etc.) estimation methods have been proposed for frequency estimation [8].

When using any of the previously mentioned methods to estimate the frequency parameters, it is assumed that the model order M is known. In practice, the model order is often unknown a priori and must be estimated before frequency estimation is performed; for nonparametric methods, the model order dictates the number of peaks to pick in the power spectral density (PSD), and in the parametric methods, the model order is needed to set the number of components in (4.16).

Many model order selection methods have been proposed in the literature. We utilize information criterion model order selection methods here, as discussed in previous sections. These methods can be applied to any signal where a parametric model is known, as is the case here.

As a benchmark for the static dictionary-based estimation approach, we use the parametric method ESPRIT [8] with the model order selection method BIC for estimating the frequencies and model order in (4.16). We chose ESPRIT for its ability to superresolve frequencies, its computational efficiency, and its near-optimal statistical estimation performance [67]. For equi-spaced time samples in (4.16), we define superresolution as the ability to discriminate between two frequencies spaced closer than a Rayleigh resolution cell, defined as 1/(NT) Hz. We note that sparse parameter estimation is more general than ESPRIT: whereas ESPRIT applies only to sinusoidal estimation, sparse parameter estimation can be applied to any problem represented by an additive component model.

4.4.2 Parameter Error and Resolution Limits

We characterize performance by two types of error: model order error and parameter estimation error. Given the correct model order, the parameter estimation error of an unbiased estimate can be quantified with respect to the CRB. For the sinusoids in i.i.d. CN(0, σ²) noise model (4.16), with known variance, the CRB is [8]

    E\left[ \left(\hat{\Phi} - \Phi\right)\left(\hat{\Phi} - \Phi\right)^T \right] \geq \left[ \frac{2}{\sigma^2}\,\mathrm{Re}\left\{ \mu_i^H \mu_j \right\} \right]^{-1}, \quad (4.17)

where

    \Phi = [f_1, \ldots, f_M, \mathrm{Re}\{\alpha_1\}, \ldots, \mathrm{Re}\{\alpha_M\}, \mathrm{Im}\{\alpha_1\}, \ldots, \mathrm{Im}\{\alpha_M\}]^T,

and where Φ and Φ̂ are the vectors of true parameters and parameter estimates, respectively. The real and imaginary operators are Re{·} and Im{·}, respectively. The derivative of the vector [\sum_{m=1}^{M} \alpha_m e^{j2\pi f_m t_n}]_n with respect to the ith parameter of Φ is denoted µ_i, and {µ_i^H µ_j} denotes the 3M × 3M matrix with i, j element given by µ_i^H µ_j; superscript 'H' indicates Hermitian transpose, and A ≥ B means A − B is a positive semi-definite matrix. Using this relation, we can lower bound the variance of unbiased parameter estimates. For a fixed model order of 1 and data model (4.16), the CRB for frequency estimates using equi-spaced time samples is given by

    \mathrm{var}(\hat{f}) \geq \frac{1}{2(2\pi T)^2\,\mathrm{SNR}} \cdot \frac{N}{N \sum_{n=0}^{N-1} n^2 - \left( \sum_{n=0}^{N-1} n \right)^2}, \quad (4.18)

where SNR is defined as |α|²/σ², and α is the signal amplitude. In the following section, we compare frequency parameter estimation to the lower bound on variance given by the CRB.
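Evaluating (4.18) directly (helper name ours) reproduces the √CRB entries of Table 4.1; for example, N = 16, T = 0.1 s, and 10 dB SNR give √CRB ≈ 0.019 Hz:

    import numpy as np

    def freq_crb(N, T, snr_linear):
        # Frequency CRB of (4.18); snr_linear = |alpha|^2 / sigma^2 (not dB).
        n = np.arange(N)
        return (N / (N * np.sum(n ** 2) - np.sum(n) ** 2)
                / (2.0 * (2.0 * np.pi * T) ** 2 * snr_linear))

    print(np.sqrt(freq_crb(16, 0.1, 10 ** (10 / 10))))   # ~0.0193 Hz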

A CRB analysis can also be used to bound the achievable resolution of two closely spaced sinusoids. One way to define resolution is to declare the sinusoids resolved if their frequencies are separated by more than some multiple of the minimum standard deviation as determined by the CRB [68,69]. Here, we define two sinusoids to be resolvable if they are separated in frequency by at least the minimum standard deviation of an unbiased frequency separation estimate. Using this resolvability metric, it can be shown that for equal-amplitude, closely spaced sinusoids, the minimum frequency separation, ∆_f, can be approximated by [70]

    \Delta_f = \frac{1}{2\pi T} \left[ \frac{2880\,(N^2 + 131)}{4N\,\mathrm{SNR}\,(N^2 - 1)(N^2 - 4)(N^2 - 9)} \right]^{1/4}. \quad (4.19)

The SNR is defined as above, where α is the amplitude of each sinusoid.
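A direct evaluation of (4.19) (helper name ours); with the settings used later (N = 16, T = 0.1 s), it reproduces the 0.26- and 0.46-bin spacings quoted in Section 4.4.3:

    import numpy as np

    def min_freq_separation(N, T, snr_linear):
        # Minimum resolvable separation of (4.19) for two equal-amplitude,
        # closely spaced sinusoids; returns Hz.
        num = 2880.0 * (N ** 2 + 131.0)
        den = 4.0 * N * snr_linear * (N**2 - 1) * (N**2 - 4) * (N**2 - 9)
        return (num / den) ** 0.25 / (2.0 * np.pi * T)

    for snr_db in (10, 0):                   # in Rayleigh bins, 1/(N T) Hz each
        df = min_freq_separation(16, 0.1, 10 ** (snr_db / 10))
        print(snr_db, "dB:", df * 16 * 0.1, "Rayleigh bins")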

4.4.3 Simulations

In this section, we compare the performance of the proposed sparse static dictionary-based estimation algorithm to the ESPRIT spectral estimation algorithm for the sinusoids-in-noise model estimation problem. The sparse estimation algorithm jointly estimates model order and parameters as discussed in Section 4.3. We use BIC to select model order in both algorithms; so, we set µ = (5/2) ln(N), corresponding to BIC, in (4.14). In ESPRIT, frequencies are estimated for a fixed model order, and amplitudes are estimated using least-squares estimation with the ESPRIT frequency estimates substituted for the true frequencies.

The following results use Monte-Carlo simulation with 200 noisy realizations. True sinusoid amplitudes are held at a constant SNR, defined as |α|²/σ², for each realization, where α is the amplitude of the sinusoids and σ² is the variance of the i.i.d. complex circular Gaussian noise. We define dB in this section as 10 log(·). If there is more than one sinusoid, both are given the same magnitude. Phases are held constant at 0.6129 radians for one sinusoid, and 0.6129 and 3.9732 radians for two sinusoids; these phases were drawn uniformly at random from [0, 2π] and then fixed. Frequency parameters are held constant with different separations depending on the simulation scenario. All simulations use N = 16 time samples which are uniformly sampled at t_n = (n − 1)T, n = 1, ..., N, with sampling period T = 0.1 s. In sparse estimation simulations, we use a dictionary that is 256-times frequency oversampled, meaning that there are K = 16 × 256 columns in the dictionary, and the frequency separation between adjacent columns is (1/256) × 1/(NT) = 2.4 × 10⁻³ Hz. For ESPRIT estimation, it is assumed that the maximum model order is N/2 = 8, and the correlation window length, k, is chosen to be the one that gives the best model order selection performance for k = N/2 + 1, ..., N. Although this is not possible in practice, we use this window selection to provide a best-case ESPRIT estimate for comparison with sparse parameter estimation.

We use p = 0.8 in the simulations, as motivated by the discussion in Section 4.3. The majorization-minimization algorithm presented in [40] is used to solve (4.9), and the initial value given to the optimization routine is x₀ = A^H y. A value of τ = −40 dB is used in the threshold operator (4.13). The tree search for finding λ consists of a search depth of two levels. The first level consists of 27 samples logarithmically spaced in the range [10⁻⁴, 10²], and the second level consists of 6 equispaced samples. Deeper searches with finer refinements require longer run times and provide similar results. The simulation and optimization settings described here were also used in Figures 4.1 and 4.2 in Section 4.3.

One sinusoid

Figures 4.3 and 4.4 show model order selection probability histograms and amplitude-frequency parameter plots for a single sinusoid in noise at SNRs of 10 dB and 0 dB, respectively. Results are lines showing the estimated frequency location, with amplitude encoded by line height, for each of the 200 Monte-Carlo simulations. Parameter plots are shown both when the estimated model order is the correct model order of 1 and when the estimated model order is 2.

ESPRIT outperforms the dictionary-based estimator in model order estimation performance for both noise levels, but it is only marginally better at low SNR. Parameter plots for the incorrectly selected model order 2 show the frequency closest to the true frequency in black and the second estimated frequency in green. For both estimators, the second, spurious frequency estimates tend to be small in magnitude compared to the estimates close to the true frequency; as SNR decreases, these spurious estimates increase in magnitude with respect to the estimate closer to the true frequency. So, the model order performance discrepancy in the high SNR case may not be significant if one low-magnitude spurious frequency estimate can be tolerated.

For the cases where the estimated model order equals the true order (M̂ = M = 1), dashed vertical blue lines in the parameter plots are shown at the true frequency plus and minus twice the square-root of the CRB in (4.17). Most of the realizations are contained inside the CRB lines. Table 4.1 shows the square-root of the CRB and the root-mean-squared error (RMSE) for 100% and 95% of the sparse model and ESPRIT parameter estimates, given the correct model order estimate. When only 95% of the estimates are used to calculate RMSE, the 5% of estimates that are outliers are discarded. We show RMSE with 95% of the estimates since a small number of outliers can skew RMSE results and make it appear that the estimator is performing poorly. The RMSE of the dictionary-based algorithm is slightly better than that of ESPRIT, and dictionary-based estimation comes close to CRB performance at 95% RMSE.

[Figure 4.3 panels: (a) sparse model order histogram; (b) ESPRIT model order histogram; (c) sparse parameter estimates (M̂ = 1); (d) ESPRIT parameter estimates (M̂ = 1); (e) sparse parameter estimates (M̂ = 2); (f) ESPRIT parameter estimates (M̂ = 2). Each parameter plot shows magnitude versus frequency (Hz).]

Figure 4.3: Model order probability and parameter estimates for true model order 1. Simulations were run over 200 realizations with 10 dB SNR. The red 'x' and vertical red dotted line indicate the position of the true sinusoid, which has magnitude 1, shown by a horizontal red dotted line. The dashed blue lines are located at twice the square-root of the CRB from the true frequency.

[Figure 4.4 panels: (a) sparse model order histogram; (b) ESPRIT model order histogram; (c) sparse parameter estimates (M̂ = 1); (d) ESPRIT parameter estimates (M̂ = 1); (e) sparse parameter estimates (M̂ = 2); (f) ESPRIT parameter estimates (M̂ = 2). Each parameter plot shows magnitude versus frequency (Hz).]

Figure 4.4: Model order probability and parameter estimates for true model order 1. Simulations were run over 200 realizations with 0 dB SNR. The red 'x' and vertical red dotted line indicate the position of the true sinusoid, which has magnitude 1, shown by a horizontal red dotted line. The dashed blue lines are located at twice the square-root of the CRB from the true frequency.

            √CRB f1    RMSE f1 (100%)       RMSE f1 (95%)
SNR (dB)               Sparse    ESPRIT     Sparse    ESPRIT
0           0.0610     0.0730    0.0808     0.0626    0.0690
10          0.0193     0.0227    0.0313     0.0198    0.0279

Table 4.1: Square-root of CRB and RMSE of frequency estimates, given that the true model order 1 is selected.

Two well-separated sinusoids

In this example, we consider model order and parameter estimation performance when two sinusoids are well separated: 4 Rayleigh resolution bins apart. Model order and parameter estimation performance is shown in Figures 4.5 and 4.6 for SNRs of 10 dB and 0 dB, respectively. Parameter plots are shown only when the estimated model order is equal to the true model order of 2; the trends for model order overestimation are similar to those in the one-sinusoid case. Estimates associated with the lower true frequency are colored black, and estimates associated with the higher true frequency are colored green. After associating two estimated frequencies with the two true frequencies, the remaining spurious estimates are typically small but become larger as SNR decreases. When the model order is underestimated in the low SNR case, the single sinusoid estimate is close in frequency to one of the true sinusoids. To associate estimated sinusoids with true sinusoids, we perform a combinatoric search over all pairings of estimated and true frequencies. The pairings that achieve the minimum least-squares frequency distance are selected as the data association. This association method is also used for RMSE calculation. We see that a majority of the frequency estimates fall within the ±2σ CRB lines shown in the parameter plots. The sums of the RMSEs and square-roots of the CRBs for frequency estimates are shown in Table 4.2. Performance is similar for both estimation methods, but is slightly better for dictionary-based estimation, especially at high SNR.

[Figure 4.5 panels: (a) sparse model order histogram; (b) ESPRIT model order histogram; (c) sparse parameter estimates; (d) ESPRIT parameter estimates. Each parameter plot shows magnitude versus frequency (Hz).]

Figure 4.5: Model order probability and parameter estimates for true model order 2 with sinusoids spaced 4 Rayleigh bins apart. Simulations were run over 200 realizations with 10 dB SNR. The red 'x's and vertical red dotted lines indicate the positions of the true sinusoids, which have magnitude 1, shown by a horizontal red dotted line. The dashed blue lines are located at twice the square-root of the CRB from the true frequency.

[Figure 4.6 panels: (a) sparse model order histogram; (b) ESPRIT model order histogram; (c) sparse parameter estimates; (d) ESPRIT parameter estimates. Each parameter plot shows magnitude versus frequency (Hz).]

Figure 4.6: Model order probability and parameter estimates for true model order 2 with sinusoids spaced 4 Rayleigh bins apart. Simulations were run over 200 realizations with 0 dB SNR. The red 'x's and vertical red dotted lines indicate the positions of the true sinusoids, which have magnitude 1, shown by a horizontal red dotted line. The dashed blue lines are located at twice the square-root of the CRB from the true frequency.

            √CRB f1 +   RMSE f1 + RMSE f2 (100%)   RMSE f1 + RMSE f2 (95%)
SNR (dB)    √CRB f2     Sparse    ESPRIT           Sparse    ESPRIT
0           0.1236      0.1642    0.1642           0.1508    0.1524
10          0.0390      0.0466    0.0577           0.0435    0.0541

Table 4.2: Sum of square-roots of CRBs and sums of RMSEs of frequency estimates, given that the true model order 2 is selected, for well-separated sinusoids.

Two closely-spaced sinusoids

Estimation performance for two closely spaced sinusoids is examined next. We use (4.19) to select the frequency spacing, which suggests that the frequencies should be no closer than 0.26 and 0.46 Rayleigh bins apart for SNRs of 10 dB and 0 dB, respectively. We thus select slightly larger spacings of 0.3 and 0.5 Rayleigh bins for 10 and 0 dB, respectively.

Figures 4.7 and 4.8 show model order and parameter estimation performance. We note that for both SNRs, many outliers in the ESPRIT plots for model order 2 appear outside the frequency axis range shown. The middle row of these figures shows the estimated parameters when the estimated order equals the true order (M̂ = M = 2), and the bottom row shows the parameter estimates when the model order is incorrectly estimated as 1. In the 10 dB SNR case, there is no dictionary-based estimation parameter plot for M̂ = 1 because model order 1 is never estimated in any of the realizations.

From these figures, we see that model order estimation performance is significantly better for the dictionary-based estimation algorithm than for the ESPRIT algorithm, although the performance of both algorithms is poor at low SNR. ESPRIT underestimates the model order with high probability at both SNRs. On average, when the model order is underestimated, both algorithms appear to choose as an estimate the mean of the two frequencies. For the cases in which the correct model order of 2 is estimated, the dictionary-based frequency parameter estimation performance is superior to that of ESPRIT. There is a clear separation between frequencies for dictionary-based estimation at high SNR, and also what appears to be estimation bias, in that the estimates are not centered about the true frequencies. At 0 dB SNR, frequency estimates are no longer as well separated for either estimator.

            √CRB f1 +   RMSE f1 + RMSE f2 (100%)   RMSE f1 + RMSE f2 (95%)
SNR (dB)    √CRB f2     Sparse    ESPRIT           Sparse    ESPRIT
0           0.2990      1.3529    3.1994           0.7796    2.9658
10          0.1772      0.1951    1.6572           0.1804    2.9658

Table 4.3: Sum of square-roots of CRBs and sums of RMSEs of frequency estimates, given that the true model order 2 is selected, for closely-spaced sinusoids.

Table 4.3 summarizes the parameter estimation performance for the closely-spaced frequency case. Shown are the sums of the square-roots of the CRBs, along with the sums of all the RMSEs (the '100%' column) and of the 95% lowest RMSEs (the '95%' column) for the two algorithms when the correct model order 2 is selected. We see that ESPRIT performs poorly at both SNR levels. Sparse estimation performs well at the higher SNR, but sees significant performance degradation when the SNR is lowered. Investigation into the reasons for the better performance of dictionary-based estimation as compared to ESPRIT in the closely spaced sinusoid case is a future direction of research.

4.5 Conclusion

We have investigated the connection between continuous parameter estimation with model order selection and sparse reconstruction. This connection is made by sampling the continuous-valued parameter space to realize a dictionary matrix in the sparse reconstruction formulation. The resulting model order selection problem becomes one of selecting a small set of these dictionary elements; parameter estimates are obtained from the dictionary column indices of the selected set.

[Figure 4.7 panels: (a) sparse model order histogram; (b) ESPRIT model order histogram; (c) sparse parameter estimates (M̂ = 2); (d) ESPRIT parameter estimates (M̂ = 2); (e) ESPRIT parameter estimates (M̂ = 1). Each parameter plot shows magnitude versus frequency (Hz).]

Figure 4.7: Model order probability and parameter estimates for true model order 2 with sinusoids spaced 0.3 Rayleigh bins apart. Simulations were run over 200 realizations with 10 dB SNR. The red 'x's and vertical red dotted lines indicate the positions of the true sinusoids, which have magnitude 1, shown by a horizontal red dotted line.

[Figure 4.8 panels: (a) sparse model order histogram; (b) ESPRIT model order histogram; (c) sparse parameter estimates (M̂ = 2); (d) ESPRIT parameter estimates (M̂ = 2); (e) sparse parameter estimates (M̂ = 1); (f) ESPRIT parameter estimates (M̂ = 1). Each parameter plot shows magnitude versus frequency (Hz).]

Figure 4.8: Model order probability and parameter estimates for true model order 2 with sinusoids spaced 0.5 Rayleigh bins apart. Simulations were run over 200 realizations with 0 dB SNR. The red 'x's and vertical red dotted lines indicate the positions of the true sinusoids, which have magnitude 1, shown by a horizontal red dotted line.

We proposed a sparse reconstruction algorithm utilizing ℓp-regularized least-squares to implement combined model order selection and parameter estimation for parametric models.

There are three algorithm settings in this approach that affect estimation performance: the parameter sample spacing, the ℓp quasi-norm, and the sparsity hyperparameter. In this approach, the parameter space is finely sampled to minimize quantization error, and as a result, the corresponding dictionary matrix columns are highly correlated. The ℓp quasi-norm with p < 1 is chosen to provide a sparse solution for accurate model order selection, and the sparsity hyperparameter is chosen by relating the sparse reconstruction problem to the underlying parametric model and classic information-based parametric model order selection methods (AIC, BIC, GIC). Examples of sinusoidal modeling demonstrate that the sparse reconstruction approach compares favorably to the parametric method ESPRIT, and outperforms ESPRIT for closely-spaced sinusoids.

Chapter 5: Dynamic Dictionary-Based Model Estimation

5.1 Introduction

A drawback of the static dictionary-based estimation method of Chapter 4 is that parameter estimates are quantized to the level of the parameter sampling, which is exhibited as parameter estimation bias [61,65]. Bias can be reduced by sampling the parameter space more finely, at the cost of increasing the size of the dictionary and, with it, the computational complexity. In addition, finely-sampled dictionaries have high intercolumn correlation; so, the dictionaries almost always violate compressive sensing conditions.

In this chapter, we propose two dynamic dictionary-based estimation methods, where the parameter samples can adapt to the data and there is a penalty or constraint on the parameters to control a condition on dictionary correlation. As in static dictionary-based estimation, a sparse subset selection algorithm is used in model estimation, but the dynamic algorithms iteratively adapt the parameter samples to the data.

Since the dictionary elements are adapted to the data, low bias can be obtained with small dictionary sizes, avoiding problems associated with large, finely-sampled static dictionaries.

Dynamic dictionary-based estimation is similar to several dictionary-based estimation approaches in the literature that utilize data to alter the dictionary or initial estimates to improve performance. In dictionary learning methods for sparse reconstruction, the dictionary is adapted to the data through a training stage before dictionary subset selection [24,25]. In the proposed dynamic dictionary-based estimation method, amplitudes and parameters are estimated jointly using the measured data. Dictionary learning can be viewed as static dictionary-based estimation with a learning stage to construct the dictionary. A dynamic algorithm using grid refinement was proposed in [33]. In this method, static dictionary-based estimation is used on a coarse fixed dictionary; non-zero amplitudes are located; a finer grid is placed around the locations of these amplitudes, and estimation is performed again. This method achieves lower quantization error through grid refinement, but it does not optimize the dictionary parameter sampling to the data. Dictionary-based estimation algorithms for complex exponential estimation with support guarantees are presented in [36]. In these algorithms, a model order is assumed, and inhibition constraints are placed on parameter spacing. If the true parameters are sufficiently separated, the support guarantees hold even in highly correlated dictionaries. Parameter locations are not optimally adapted to the data, but, like [33], regions with non-zero amplitudes are reestimated to achieve improved residual error performance; this is accomplished through a local optimization step, where the amplitudes are reestimated over a support region. This local optimization step does not optimize over dictionary parameters, but rather over dictionary columns in the support set of the amplitude vector. The complex exponential estimation problem is also treated in [37], where a greedy dynamic dictionary algorithm with an inhibition constraint is presented. A sparse model order s is fixed, and a greedy algorithm selects s frequency parameters for dictionary column formation. The dictionary parameter values are estimated based on the data and an inhibition constraint on the intercolumn correlation. The inhibition constraint is chosen based on the conditions of model-based compressive sensing [71] to guarantee a bound on the norm of the amplitude error. The dynamic dictionary algorithms that we propose estimate the model order, and we choose inhibition constraints based on the performance of the dictionary subset selection algorithm, which is shown to work well even when compressive sensing conditions are violated.

An outline of this chapter is as follows. We introduce a penalized and a constrained dynamic dictionary-based estimation algorithm in Section 5.2. These algorithms are tested on a complex exponential and a decaying exponential model in Section 5.3. We compare algorithm performance against the unbiased Cramér-Rao lower bound (CRB), genie ML, and static dictionary-based estimation. In Section 5.4, we outline the salient points of the chapter and conclude.

5.2 Dynamic Dictionary-Based Estimation Methods

In this section, we present two dynamic dictionary-based estimation algorithms for the additive component model

    y_n = \sum_{m=1}^{M} \alpha_m f(t_n, \theta_m) + \epsilon_n, \quad n = 1, \ldots, N,

where all variable definitions are the same as those in (2.1). We begin by summarizing the dictionary-based estimation framework established for static dictionary-based estimation of additive component models in Chapter 4; the same framework will be used as a basis for the dynamic algorithms as well.

Dictionary estimation methods for additive component model estimation utilize a subset selection method to find an approximation to the additive component model linear representation,

    y = \sum_{m=1}^{M} \alpha_m a(\theta_m) + \epsilon = A(\bar{\Theta})x + \epsilon, \quad \|x\|_0 = M,

as defined in (2.10). A dictionary subset selection method that finds a solution to (2.10) ideally would solve an optimization problem of the form

    \min_x \|x\|_0 \quad \text{s.t.} \quad \|y - Ax\|_2^2 < \nu, \quad (5.1)

where ‖·‖₀ counts the number of non-zero entries in its argument and, although it is not a true norm, is commonly known as the ℓ₀-norm. The setting ν is based on the noise power of ǫ in (2.1). The presence of the ℓ₀-norm, however, makes this problem combinatoric and intractable. Given a fixed model order, greedy algorithms, such as OMP [72], CoSaMP [28], and IHT [26], find an approximate solution to (2.10). Alternatively, the ℓ₀-norm in (5.1) can be relaxed to an ℓp-quasinorm, ‖·‖p, with 0 < p ≤ 1; when the dictionary satisfies the restricted isometry property (RIP), theoretical error guarantees [21,22] can be given for the relaxed problems.

The dictionary subset selection algorithm that we use to estimate the sparse linear coefficient vector estimate, x̂, for (2.10) is a relaxed ℓp-quasinorm optimization followed by a thresholding operation. Specifically, an approximately sparse linear coefficient vector x̃ is estimated by the ℓp regularized least-squares relaxed problem,

    \tilde{x} = \arg\min_x \|y - A(\bar{\Theta})x\|_2^2 + \lambda \|x\|_p^p, \quad (5.2)

where the ℓp-quasinorm can take values of 0 < p ≤ 1. The approximately sparse solution x̃ is then thresholded,

    \hat{x} = H(\tilde{x}), \quad (5.3)

where H is the magnitude thresholding operator (cf. (4.13)). An optional final step, known as debiasing, calculates the least-squares solution to the linear system (2.10) over the support of x̂ [63]. The linear coefficient estimate, x̂, to (2.10), generated by subset selection, encodes the model order and parameter estimates. The model order estimate, M̂, amplitude estimates, α̂, and component function parameter estimates, Θ̂, are given by

    \hat{M} = \|\hat{x}\|_0 \quad (5.4)
    \hat{\alpha}_m = \hat{x}_{I_m} \quad (5.5)
    \hat{\theta}_m = \bar{\theta}_{I_m}, \quad (5.6)

where {I_m} is an ordered set such that |\hat{x}_{I_m}| is the mth largest-magnitude element of x̂.

Unlike static dictionary-based estimation, dynamic dictionary-based estimation methods adapt the parameter samples, Θ̄, to the measured data. In dynamic methods, a small number of dictionary columns can be used, obviating the parameter quantization bias and memory requirements of fixed dictionaries. For example, the number of dictionary columns could be chosen as the maximum possible model order of the problem, or it could be chosen to satisfy initial dictionary spacing constraints.

In the remainder of this section, we propose two dynamic dictionary-based estimation methods. The first prevents component function parameters from becoming closely spaced using a penalty term, while the second approach explicitly inhibits closely spaced parameters by constraining their distance. Both methods are modifications of the static dictionary-based estimation method in that they 1) optimize over the component function parameters, Θ̄, and 2) penalize or constrain the parameters to prevent unfavorable dictionary column spacing. An unsatisfactory parameter spacing might be one in which the columns are very similar or highly correlated.

5.2.1 Penalized Estimation

In the penalized estimation method, the ℓp regularized LS problem (5.2) is augmented with an additional penalty term, µg(Θ):

    \{\tilde{x}, \bar{\Theta}\} = \arg\min_{x, \Theta} \|y - A(\Theta)x\|_2^2 + \lambda \|x\|_p^p + \mu g(\Theta), \quad (5.7)

where g(Θ) is a function that increases as dictionary columns become too "similar"; µ controls the weight of the column penalty function, and λ controls sparsity in dictionary element selection. Dictionary column similarity metrics may be correlation, RIP, or parameter sample distance. Several choices are possible for g, including a function inversely proportional to parameter distance or a penalty on the extreme eigenvalues of A(Θ); depending on problem size, eigenvalues may be computed through direct methods or approximated using methods such as Gershgorin's circle theorem [74].

For many problems of interest, such as the complex exponential or decaying exponential estimation problems, intercolumn correlation can generally be decreased as parameter distance is increased. This relation between parameter spacing and intercolumn correlation is demonstrated in Figures 5.1 and 5.6. In the following discussion, we choose to penalize parameter distance as a surrogate for penalizing intercolumn correlation; we use a function

    g(\Theta) = \sum_{i=1}^{K} \sum_{j>i} h(\theta_i - \theta_j), \quad (5.8)

where

    h(x) = \frac{1}{|x|^n}, \quad n > 0. \quad (5.9)

Other functions that are monotonically decreasing in |x| could be used for h, such as the log-barrier penalty. When using a log-barrier penalty, it can be shown that as µ → 0, the optimum cost of (5.7) approaches that of a problem with an ordering constraint on the parameters θ_i [75]. The penalty function h in (5.9) can be convexified by using the extended function

    \bar{h}(x) = \begin{cases} h(x), & x \leq 0 \\ \infty, & x > 0, \end{cases} \quad (5.10)

however, in general, even if using h̄ in place of h, the problem (5.7) will still not be convex, except for the special case when p = 1 and ‖y − A(Θ)x‖₂² is convex. We show that, in applications, the non-convexity of this problem does not pose a problem from a model order and estimation perspective.
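A direct implementation of the repulsion penalty (5.8)-(5.9) for scalar parameter samples (helper name ours):

    import numpy as np

    def repulsion_penalty(theta, n=1):
        # g(Theta) of (5.8)-(5.9): sum of h(theta_i - theta_j) = 1/|.|^n over
        # all pairs i < j; grows without bound as two samples coalesce.
        theta = np.asarray(theta, dtype=float)
        i, j = np.triu_indices(len(theta), k=1)
        return np.sum(1.0 / np.abs(theta[i] - theta[j]) ** n)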

Algorithmic Implementation

We solve (5.7) using block coordinate descent on the amplitudes, $x$, and component function parameters, $\Theta$. The first coordinate descent block of the algorithm optimizes over $x$ while keeping the parameter samples, $\bar{\Theta}$, fixed:

$$\tilde{x}^{(n+1)} = \operatorname*{argmin}_{x} \; \|y - A(\bar{\Theta}^{(n)})x\|_2^2 + \lambda\|x\|_p^p. \tag{5.11}$$

The second block of the algorithm optimizes over $\Theta$ while fixing $x$:

$$\bar{\Theta}^{(n+1)} = \operatorname*{argmin}_{\Theta} \; \|y - A(\Theta)\tilde{x}^{(n+1)}\|_2^2 + \mu g(\Theta). \tag{5.12}$$

An advantage of the block coordinate descent approach is that (5.11) is a standard $\ell_p$ regularized LS sparse optimization problem, and there exist many algorithms to solve this problem; in the following discussion, we use the majorization-minimization algorithm of [40]. We solve the second block using a trust-region subspace method [76]. It follows by definition that each iteration of the coordinate descent algorithm decreases the cost (5.7). The algorithm is summarized in Algorithm 5.1.

Algorithm 5.1 Penalized Dynamic Dictionary-Based Estimation Algorithm
Initialize termination settings: max_iter, rel_tol
Initialize variables: $x^{(0)}$, $\bar{\Theta}^{(0)}$, $\tau$, curr_cost $= \infty$, rel_cost $= \infty$, $n = 0$
while $n <$ max_iter and rel_cost $>$ rel_tol and curr_cost $> 0$ do
    old_cost $\leftarrow$ curr_cost
    $\tilde{x}^{(n+1)} \leftarrow \operatorname{argmin}_x \|y - A(\bar{\Theta}^{(n)})x\|_2^2 + \lambda\|x\|_p^p$
    $\bar{\Theta}^{(n+1)} \leftarrow \operatorname{argmin}_{\Theta} \|y - A(\Theta)\tilde{x}^{(n+1)}\|_2^2 + \mu g(\Theta)$
    curr_cost $\leftarrow \|y - A(\bar{\Theta}^{(n+1)})\tilde{x}^{(n+1)}\|_2^2 + \lambda\|\tilde{x}^{(n+1)}\|_p^p + \mu g(\bar{\Theta}^{(n+1)})$
    rel_cost $\leftarrow |$curr_cost $-$ old_cost$|\,/\,$curr_cost
    $n \leftarrow n + 1$
end while
$\tilde{x} \leftarrow \tilde{x}^{(n)}$, $\bar{\Theta} \leftarrow \bar{\Theta}^{(n)}$
$\hat{x} \leftarrow H(\tilde{x})$ according to (5.3)
Debias $\hat{x}$ to the support set.
return parameter and model order estimates according to (5.4)-(5.6)

Amplitudes and parameters are initialized to $x^{(0)}$ and $\bar{\Theta}^{(0)}$. Although $x^{(0)}$ is not explicitly used in the block coordinate descent algorithm, the algorithm used to solve the $\ell_p$ regularized LS subproblem will typically require an initial value. Choices for the initial parameters are the same as those used in static dictionaries. For example, $x^{(0)}$ may be the zero vector or $A^H y$, and $\bar{\Theta}^{(0)}$ may be equispaced, or non-equispaced based on the Fisher information of the component function [65].
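To make the structure of Algorithm 5.1 concrete, the following Python sketch implements its block coordinate descent loop under simplifying assumptions: the $\ell_p$ block uses a basic IRLS surrogate rather than the majorization-minimization solver of [40], and a generic SciPy trust-region routine stands in for the trust-region subspace method of [76]. All function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def lp_regularized_ls(y, A, x0, lam, p, iters=50, eps=1e-8):
    """IRLS-style surrogate for argmin_x ||y - A x||_2^2 + lam*||x||_p^p."""
    x = x0.astype(complex)
    for _ in range(iters):
        # Majorize lam*|x_k|^p by lam*(p/2)*w_k*|x_k|^2 at the current iterate.
        w = (np.abs(x) ** 2 + eps) ** (p / 2 - 1)
        x = np.linalg.solve(A.conj().T @ A + lam * (p / 2) * np.diag(w),
                            A.conj().T @ y)
    return x

def penalized_dynamic_dictionary(y, build_A, theta0, lam, p, mu, g,
                                 max_iter=50, rel_tol=1e-7):
    """Block coordinate descent loop of Algorithm 5.1 (minimal sketch)."""
    theta = theta0.copy()
    x = np.zeros(len(theta0), dtype=complex)
    curr = np.inf
    for _ in range(max_iter):
        old = curr
        x = lp_regularized_ls(y, build_A(theta), x, lam, p)        # block (5.11)
        cost = lambda th: (np.linalg.norm(y - build_A(th) @ x) ** 2
                           + mu * g(th))
        theta = minimize(cost, theta, method="trust-constr").x     # block (5.12)
        curr = (np.linalg.norm(y - build_A(theta) @ x) ** 2
                + lam * np.sum(np.abs(x) ** p) + mu * g(theta))
        if np.isfinite(old) and abs(curr - old) / curr < rel_tol:
            break
    return x, theta
```

Here `build_A` is any callable that maps a parameter vector to the dictionary matrix $A(\Theta)$, and `g` is a repulsion penalty such as the one sketched after (5.10).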

5.2.2 Constrained Estimation

As an alternative to using a repulsion penalty to prevent parameters from becoming too close, an explicit constraint can be used. In contrast to penalized estimation, the minimum distance between parameters is explicitly enforced as a constraint; we also show that, under certain approximations, this method generates estimates by solving convex subproblems.

The constrained estimation method approximately solves the optimization problem

$$\left(\tilde{x}, \bar{\Theta}\right) = \operatorname*{argmin}_{x,\,\Theta} \; \|y - A(\Theta)x\|_2^2 + \lambda\|x\|_p^p \quad \text{subject to} \quad \delta \le \|\theta_i - \theta_j\|_2, \; i < j, \tag{5.13}$$

which both enforces sparsity through an $\ell_p$-quasinorm on $x$ and inhibits parameters from becoming closer than $\delta$. Although not used here, a more general weighted $\ell_2$ norm could be used in the constraint for cases when the $\theta_i$ are of different orders of magnitude. The residual function $\|y - A(\Theta)x\|_2^2$ is, in general, a non-convex function of the $\Theta$ parameter, and the constraint is a non-convex set; so, the optimization problem is non-convex. However, in the following section, we show that when $p = 1$, the solution to (5.13) can be approximated by solving two convex subproblems.

Algorithmic Implementation

In this section, we propose an algorithm that approximately solves (5.13). We approximate non-convex functions in (5.13) by convex ones so that a solution can be generated using convex methods when $p = 1$. We begin by convexifying the residual function $\|y - A(\Theta)x\|_2^2$, which is, in general, a non-convex function of $\Theta$ for a fixed amplitude parameter $x$. Define $F_k(\theta) : \mathbb{R}^d \to \mathbb{R}^{d \times K}$ as the operator that takes a vector and places it in the $k$th column of a zero matrix,

$$F_k(\theta) = \theta \left[0_{1\times k-1}, \; 1, \; 0_{1\times K-k}\right], \tag{5.14}$$

where $0_{m\times n}$ is an $m \times n$ zero matrix. We replace each column of $A(\Theta)$ by its first-order Taylor series approximation, so that

$$A(\Theta) \approx A(\Theta_0) + dA(\Theta_0)\left[F_k(\theta_k - \theta_{0,k})\right]_{k=1,\dots,K}, \tag{5.15}$$

where $\Theta_0$ is the set of parameter vectors that the expansion is centered around; $\left[F_k(\theta_k - \theta_{0,k})\right]_{k=1,\dots,K}$ is the $Kd \times K$ matrix formed by stacking the $F_k(\theta_k - \theta_{0,k})$, and $dA$ is an $N \times Kd$ matrix of Jacobians,

$$dA(\Theta) = \left[J(a(\theta_1)), \dots, J(a(\theta_K))\right], \tag{5.16}$$

where $J(a(\theta))$ is the Jacobian matrix of $a(\theta)$ with respect to the variable $\theta$. Defining $\Theta^v$ as the stacked vector of the $\theta_n$,

$$\Theta^v = \left[\theta_1^T, \dots, \theta_K^T\right]^T, \tag{5.17}$$

the residual can then be written as

$$\|y - A(\Theta)x\|_2^2 \approx \left\|y - \left(A(\Theta_0) + dA(\Theta_0)\left[F_k(\theta_k - \theta_{0,k})\right]_{k=1,\dots,K}\right)x\right\|_2^2 \tag{5.18}$$
$$= \|y - A(\Theta_0)x - dA(\Theta_0)\,\mathrm{diag}(x \otimes 1_{d\times 1})(\Theta^v - \Theta_0^v)\|_2^2 \tag{5.19}$$
$$= \|\bar{y}(x;\Theta_0) - A_1(x;\Theta_0)\,\Theta^v\|_2^2. \tag{5.20}$$

In the first equality, we have defined $\mathrm{diag}$ to be the diagonal operator that takes a vector to a diagonal matrix with the vector on the diagonal. The Kronecker product is $\otimes$, and $1_{d\times 1}$ is a $d \times 1$ vector of ones. In the second equality, we have defined

$$A_1(x;\Theta_0) = dA(\Theta_0)\,\mathrm{diag}(x \otimes 1_{d\times 1}) \tag{5.21}$$

and

$$\bar{y}(x;\Theta_0) = y - A(\Theta_0)x + A_1(x;\Theta_0)\,\Theta_0^v. \tag{5.22}$$

So, for fixed $x$, the residual (5.20) is a convex function of $\Theta$.

Next, we relax the distance constraint in (5.13) so that the constraint set is contained in a larger set. Define the coordinatewise distance operator $\rho_c : \mathbb{C}^d \times \mathbb{C}^d \to \mathbb{R}$ as

$$\rho_c(a, b) = \min_n |b_n - a_n|. \tag{5.23}$$

We replace the Euclidean norm constraint in (5.13) with the relaxed constraint

$$\delta \le \rho_c(\theta_i, \theta_j), \quad i < j. \tag{5.24}$$

When $d = 1$, if we also order the coordinates so that $\theta_k < \theta_{k+1}$, then (5.24) is equivalent to the convex constraint (defining a convex set)

$$\delta + \theta_k - \theta_{k+1} \le 0. \tag{5.25}$$

Using the residual approximation (5.20) and the constraint set relaxation (5.24), we approximate the solution to (5.13) using block coordinate descent.

The first coordinate descent block of the algorithm optimizes over $x$ while keeping $\bar{\Theta}$ fixed and is the same optimization problem used in penalized estimation (5.11). The $\theta$ parameter optimization blocks for each dimension $j$ are given by

$$\bar{\Theta}_j^{(n+1)} = \operatorname*{argmin}_{\Theta_j} \; \|\bar{y}(\tilde{x}^{(n+1)}; \bar{\Theta}_j^{(n)}) - A_1(\tilde{x}^{(n+1)}; \bar{\Theta}_j^{(n)})\,\Theta_j^v\|_2^2$$
$$\text{subject to} \quad \delta + \theta_{j,k} - \theta_{j,k+1} \le 0, \quad j = 1,\dots,d, \; k = 1,\dots,K-1. \tag{5.26}$$

The notation $\Theta_j$ means that the $j$th dimension of each $\theta \in \Theta$ is variable and all other dimensions are fixed. Each dimension is updated once per algorithm iteration, and the order of update is a decision of the practitioner. We implement the constrained optimization (5.26) using the cvx convex optimization software package [77]. Although the original optimization problem (5.13) is non-convex, if $p = 1$, each of the block coordinate descent subproblems is convex. The algorithm is summarized in Algorithm 5.2.

Algorithm 5.2 Constrained Dynamic Dictionary-Based Estimation Algorithm
Initialize termination settings: max_iter, rel_tol
Initialize variables: $x^{(0)}$, $\bar{\Theta}^{(0)}$, curr_cost $= \infty$, rel_cost $= \infty$, $n = 0$
while $n <$ max_iter and rel_cost $>$ rel_tol and curr_cost $> 0$ and old_cost $\ge$ curr_cost do
    old_cost $\leftarrow$ curr_cost
    $\tilde{x}^{(n+1)} \leftarrow \operatorname{argmin}_x \|y - A(\bar{\Theta}^{(n)})x\|_2^2 + \lambda\|x\|_p^p$
    for $j = 1$ to $d$ do
        $\bar{\Theta}_j^{(n+1)} \leftarrow \operatorname{argmin}_{\Theta_j} \|\bar{y}(\tilde{x}^{(n+1)}; \bar{\Theta}_j^{(n)}) - A_1(\tilde{x}^{(n+1)}; \bar{\Theta}_j^{(n)})\Theta_j^v\|_2^2$ s.t. $\delta + \theta_{j,k} - \theta_{j,k+1} \le 0$, $k = 1,\dots,K-1$
    end for
    curr_cost $\leftarrow \|y - A(\bar{\Theta}^{(n+1)})\tilde{x}^{(n+1)}\|_2^2 + \lambda\|\tilde{x}^{(n+1)}\|_p^p$
    rel_cost $\leftarrow |$curr_cost $-$ old_cost$|\,/\,$curr_cost
    $n \leftarrow n + 1$
end while
if old_cost $<$ curr_cost then $n \leftarrow n - 1$ end if
$\tilde{x} \leftarrow \tilde{x}^{(n)}$, $\bar{\Theta} \leftarrow \bar{\Theta}^{(n)}$
$\hat{x} \leftarrow H(\tilde{x})$ according to (5.3)
Debias $\hat{x}$ to the support set.
return parameter and model order estimates according to (5.4)-(5.6)
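The dissertation implements (5.26) with the MATLAB cvx package [77]; the following is an analogous sketch in Python using cvxpy, for the one-dimensional case $d = 1$. The function name and the real/imaginary stacking are illustrative choices, not the dissertation's code.

```python
import numpy as np
import cvxpy as cp

def constrained_theta_block(ybar, A1, K, delta):
    """One convex Theta-subproblem of (5.26) for d = 1 (minimal cvxpy sketch).

    ybar : complex N-vector, the linearized data from (5.22)
    A1   : complex N x K matrix from (5.21)
    """
    theta = cp.Variable(K)
    # Stack real and imaginary parts so the least-squares data are real-valued.
    M = np.vstack([A1.real, A1.imag])
    b = np.concatenate([ybar.real, ybar.imag])
    # Ordered minimum-spacing constraints (5.25).
    constraints = [delta + theta[k] - theta[k + 1] <= 0 for k in range(K - 1)]
    problem = cp.Problem(cp.Minimize(cp.sum_squares(M @ theta - b)), constraints)
    problem.solve()
    return theta.value
```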

5.3 Model Estimation Examples

In this section, we consider complex exponential and decaying exponential estimation using dynamic dictionary-based estimation. We compare dynamic dictionary-based estimation with static dictionary-based estimation and with classical ML estimation combined with an information criterion (IC) for model order selection. In all of the examples, the signal is corrupted by i.i.d. complex circular Gaussian noise, $\mathcal{CN}(0, \sigma^2 I)$, when measurements are complex, or i.i.d. Gaussian noise, $\mathcal{N}(0, \sigma^2 I)$, when measurements are real; the ML estimate is given by (4.2). For additive component models, ML parameter estimates, $\hat{\Theta}$, and amplitude estimates, $\hat{\alpha}$, for a fixed model order, $\bar{M}$, can be implemented as [8]

$$\hat{\Theta}(\bar{M})_{ml} = \operatorname*{argmin}_{\Theta} \; \|y - A(\Theta)_{\bar{M}}\, x(\Theta)\|_2^2 \tag{5.27}$$
$$\hat{\alpha}(\bar{M})_{ml} = x(\hat{\Theta}(\bar{M})_{ml}), \tag{5.28}$$

where $A(\Theta)_{\bar{M}}$ is the matrix formed by $\bar{M}$ component function column vectors as in (2.8) with $K = \bar{M}$, and

$$x(\Theta) = A(\Theta)_{\bar{M}}^{\dagger}\, y. \tag{5.29}$$

The matrix $A^{\dagger}$ is the pseudoinverse of $A$. In general, the model order, $M$, is not known, and we estimate it using the BIC cost $J$ in (4.5) as

$$\hat{M}_{ml} = \operatorname*{argmin}_{\bar{M}} \; J(\bar{M}; \hat{\alpha}(\bar{M})_{ml}, \hat{\Theta}(\bar{M})_{ml}). \tag{5.30}$$
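As an illustration of the estimator structure in (5.27)-(5.30), the following Python sketch fits complex exponentials by variable projection for each candidate order and selects the order with minimum information-criterion cost. The BIC expression below is a generic stand-in for the cost $J$ defined in (4.5), and the initialization and optimizer choices are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

def ml_bic_model_order(y, t, M_max):
    """Variable-projection ML fit per order (5.27)-(5.29) + IC selection (5.30)."""
    N = len(y)
    best_cost, best_M = np.inf, 0
    for M in range(1, M_max + 1):
        def nls(freqs):
            A = np.exp(-1j * 2 * np.pi * np.outer(t, freqs))  # N x M components
            x = np.linalg.pinv(A) @ y                          # amplitudes, (5.29)
            return np.linalg.norm(y - A @ x) ** 2              # residual of (5.27)
        res = minimize(nls, np.linspace(0.5, 9.5, M), method="Nelder-Mead")
        # Generic BIC-style cost standing in for J in (4.5): 3M real parameters
        # per model (one frequency plus a complex amplitude per component).
        cost = N * np.log(res.fun / N + 1e-12) + 3 * M * np.log(N)
        if cost < best_cost:
            best_cost, best_M = cost, M
    return best_M
```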

In all of the following examples, we generate statistics from 200 Monte-Carlo simulations. The dynamic dictionary-based estimation Algorithms 5.1 and 5.2 use the settings rel_tol $= 10^{-7}$ and max_iter $= 50$; all dictionary algorithms are initialized to amplitudes $x^{(0)} = 0_{K\times 1}$, and the initial parameters $\bar{\Theta}^{(0)}$ are placed on an equispaced grid. We use a repulsion function (5.8) of $h(x) = \frac{1}{|x|^6}$ in the penalized dynamic estimation algorithm, and a threshold of $\tau = -35$ dB is used in (5.3). Other algorithm settings are specific to the signal being estimated and are discussed in subsequent sections.

For error reporting, estimated parameters $\hat{\theta}_m$ are associated with true frequencies by minimizing LS error over frequencies, as in Chapter 4.4.3. We quantify error by the probability of correct model order and the conditional root mean squared error (RMSE). The RMSE is conditioned on the correct model order being estimated ($\hat{M} = M$). We compare parameter estimates against the unbiased CRB, and model order and parameter estimates against a genie ML estimator. The genie ML estimator does not have knowledge of the true model order, and model order is selected by (5.30); however, initial frequencies are set to the true values, so that $\bar{\Theta}(\bar{M})^{(0)} = \{\theta_1, \dots, \theta_{\bar{M}}\}$, where we define $\theta_k = 0$ if $k > M$.

5.3.1 Complex Exponential Estimation

The complex exponential (sinusoid) model consists of components $f(t_n, f_m) = e^{-j2\pi f_m t_n}$ in (2.1), where $\{t_n\}_{n=1}^N$ are time (or spatial) samples in seconds (meters), and $F = \{f_m\}_{m=1}^M$ are the frequencies in Hertz that are estimated. The complex exponential model is used in many applications. For example, the frequency parameter models attenuation in MRI [1-3] and CT [1,4] medical imaging, backscattering location in high-frequency radar [5,6], the direction of arrival in narrow-band array processing [7,8], and harmonics in spectral estimation problems [8].

We examine estimation of $M = 2$ sinusoids in noise. The sinusoids are set to equal magnitude, $|\alpha_m| = 1$, and have initial phases that were chosen uniformly at random and then fixed to be 0.115 and 4.713 radians. There are $N = 16$ time samples, with sampling period $T = 0.1$ seconds, giving an unaliased bandwidth of 10 Hz and Rayleigh resolution $1/NT = 0.625$ Hz. The model is well resolved with respect to the Rayleigh resolution, with true sinusoids placed at $f_1 = 3.457$ and $f_2 = 7.5$ Hz. Superresolution capabilities of static dictionary-based model estimation are discussed in Chapter 4. The signal-to-noise ratio (SNR) is defined as $10\log(|\alpha_m|^2/\sigma^2)$ for each component $m$ of the composite signal.
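For reference, a minimal Python sketch of this simulation setup (variable names, the random seed, and the noise draw are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 16, 0.1                       # samples and sampling period (seconds)
t = np.arange(N) * T
freqs = np.array([3.457, 7.5])       # true frequencies (Hz)
phases = np.array([0.115, 4.713])    # fixed initial phases (radians)
snr_db = 10.0                        # per-component SNR, 10*log10(|alpha|^2 / sigma^2)

# Unit-magnitude complex sinusoids, matching f(t_n, f_m) = exp(-j 2 pi f_m t_n).
y_clean = sum(np.exp(1j * ph) * np.exp(-1j * 2 * np.pi * f * t)
              for f, ph in zip(freqs, phases))
sigma = 10 ** (-snr_db / 20)         # since |alpha_m| = 1
noise = sigma / np.sqrt(2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = y_clean + noise
```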

Initial amplitude and frequency estimates in both the dynamic and the static dictionary algorithms, which use $\ell_p$ regularized LS (5.2) for dictionary subset selection, are set to $x^{(0)} = 0_{K\times 1}$. We say that the parameter space is $O$ times oversampled if there are $ON$ samples. Initial parameter samples, $\bar{F}^{(0)}$, are set to an equispaced grid of $K = 16$ frequencies in the dynamic dictionaries, and a 16 times oversampled grid with $K = 16 \times 16 = 256$ frequencies in the static dictionary. Each parameter sample grid is equispaced over the range $[0, 10]$ Hz. With this grid, the first true frequency $f_1 = 3.457$ Hz is the midpoint between two frequencies in the 256-sample static dictionary grid, and the second frequency $f_2 = 7.5$ Hz lies on the initial frequency sample grid of all algorithms. In practice, the value of $K$ can be set to the maximum allowable model order in dynamic dictionaries; in static dictionaries, it is set sufficiently large to minimize quantization error caused by discrete parameter sampling.

In each dynamic algorithm and in the static algorithm, the $\ell_p$ regularized LS optimization problem is a function of the sparse regularization settings $\lambda$ and $p$ of the $\ell_p$-quasinorm. There are also settings to control parameter spacing in each algorithm. In the static algorithm, a parameter grid is fixed; in the penalized dynamic estimation algorithm (5.12), the setting $\mu$ controls the "force" at which parameters are repelled from each other, and in the constrained dynamic estimation algorithm (5.26), the setting $\delta$ explicitly enforces the distance by which parameters must be separated. The $\delta$ setting can be directly set to a minimum allowable distance between true parameters, and $\mu$ is increased to enforce a larger minimum distance.

In Chapter 4, model order and residual cost are improved by use of $\ell_p$ regularized LS with $p < 1$, in contrast to $p = 1$. We examine model order selection performance of $\ell_p$ regularized LS as a function of $p$ and parameter spacing/intercolumn correlation. Figure 5.1(a) shows the intercolumn correlation matrix $A(F)^H A(F)$, where $F$ is an equispaced set of frequency parameters. For the complex sinusoid model, correlation is the Dirichlet kernel of frequency differences, so the correlation matrix is Toeplitz. Adjacent dictionary column correlation as a function of frequency spacing is shown in Figure 5.1(b); since the correlation matrix is Toeplitz, this figure applies to all frequencies. If there are a small number of component functions, and hence the amplitude vector $x$ is sparse, amplitude reconstruction error using sparse algorithms, such as BPDN or $\ell_p$ regularized LS, has been quantified in the compressive sensing literature (see e.g. [19,21,22]). The maximum number of components that can be recovered and the error of that recovery typically depend on mutual incoherence (the maximum intercolumn correlation) [20] or the RIP [21,22]. However, in high correlation dictionaries, mutual incoherence and RIP conditions are violated, and established compressive sensing results do not apply. For example, a static dictionary with maximum quantization bias less than 0.1 Hz would necessitate columns separated by frequencies less than or equal to 0.1 Hz. In this case, Figure 5.1(b) shows that dictionary intercolumn correlation exceeds 0.95, and for recovery of any number of components ($M \ge 1$), mutual incoherence and RIP conditions do not apply. Nonetheless, empirically, static dictionary-based estimation using $\ell_p$ regularized LS exhibits good performance for appropriate choice of the sparsity setting $\lambda$ and the $\ell_p$-quasinorm.
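The Toeplitz structure and the high correlation at sub-Rayleigh spacings can be checked numerically; a minimal sketch, assuming the $N = 16$, $T = 0.1$ s setup above:

```python
import numpy as np

N, T = 16, 0.1
t = np.arange(N) * T

def atom(f):
    """Unit-norm dictionary column for frequency f (Hz)."""
    a = np.exp(-1j * 2 * np.pi * f * t)
    return a / np.linalg.norm(a)

# Adjacent-column correlation vs. frequency spacing (cf. Figure 5.1(b)).
# At 0.3125 Hz (half Rayleigh) this gives ~0.64; at 0.1 Hz it exceeds 0.95.
for df in [0.625, 0.3125, 0.1, 0.01]:
    corr = abs(np.vdot(atom(0.0), atom(df)))
    print(f"spacing {df:7.4f} Hz -> correlation {corr:.4f}")
```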

Figure 5.2 shows model order estimation performance using static dictionary-based estimation to estimate one noiseless sinusoid with a frequency of 4.375 Hz; this frequency is chosen so that it falls on the parameter sample grid for all values of oversampling. The $\ell_p$ regularized LS optimization (5.2) is implemented using a majorization-minimization algorithm [40] with tolerance settings that give good model order estimation performance without prohibitively long run time.

Figure 5.1: Complex exponential dictionary correlation matrix (a) and correlation versus sinusoid spacing (b). Color encodes correlation.

The regularization parameter $\lambda$ is chosen using a grid search that minimizes $|M - \hat{M}|$. Figure 5.2 shows that there is a large region of $(p, \text{correlation})$ that gives the correct model order of 1 (blue), and that, as $p$ approaches 1, the maximum correlation at which the correct model order is estimated decreases; the correct model order is also estimated over the same correlation range for $0.1 \le p < 0.8$, but is not shown here. Examples of amplitude estimates for a 64 times oversampled parameter grid (a correlation of 0.9996) are shown in Figure 5.3 for $p = 1$ and $p = 0.8$. For $p = 1$, the amplitude energy is "spread" to adjacent frequencies, causing model order error, while $p = 0.8$ is focused with correct model order 1. This spreading effect does not appear to be intrinsic to the $\ell_p$ regularized LS optimization problem (5.2), but appears to be an algorithm convergence rate problem. As the MM algorithm is allowed lower tolerances and longer run times, spreading decreases, and model order improves; similar results have been observed with other algorithmic implementations, such as the split Bregman method [58].

The model used in Figure 5.2 is idealized in the sense that there is no noise, the model order is one, and the true frequency falls on the sampled parameter grid, none of which typically occur in practice; we use this result as a best-case scenario for model order estimation performance as a function of $p$ and correlation. Based on the maximum correlation values implied by $\delta$ (or indirectly by $\mu$), we use Figure 5.2 to inform our choice of $p$ for the $\ell_p$-quasinorm used in the dynamic algorithms. For constrained dictionary estimation in this example, we allow frequency parameter samples to be no closer than 0.5 Rayleigh resolution bins apart. For this minimum parameter spacing, $\delta = 0.3125$ Hz, correlation is approximately 0.65, and the correct model order is estimated for all $p \le 1$ in Figure 5.2.

80 >=10 2048 0.9999996

1024 0.9999984 9

512 0.9999937 8

256 0.9999750 7 128 0.9999000 6 64 0.9996000 5 32 0.9984007 Oversampling 16 0.9936118 4

8 0.9745932 3 4 0.9006779 2 2 0.6376435

0.8 0.85 0.9 0.95 1 1 p

Figure 5.2: Complex exponential static dictionary model order as a function of corre- lation (or oversampling) and p. All measurements are noiseless for one true sinusoid (M = 1) located at 4.375 Hz on the sampled frequency grid Θ¯ (0). Regularization parameter λ is chosen using a grid search to minimize distance to the true model order. The left axis is the oversampling in dictionary columns with respect to the number of measurements N = 16, and the right axis is the corresponding intercolumn correlation of neighboring columns. Color encodes estimated model order, Mˆ , and is clipped at 10.

81 1

0.8

0.6

Magnitude 0.4

0.2

0 4 4.1 4.2 4.3 4.4 4.5 4.6 4.7 Frequency (Hz) (a) p = 1

1

0.8

0.6

Magnitude 0.4

0.2

0 4 4.1 4.2 4.3 4.4 4.5 4.6 4.7 Frequency (Hz) (b) p = 0.8

Figure 5.3: Estimates from Fig. 5.2 for an oversampling of 64 and two different values of p. Each component of thex ˆ vector is shown as a vertical blue line located at it’s respective frequency on the horizontal axis; magnitude is indicated on the vertical axis. The red line with an ’x’ is the true sinusoid.

We use $p = 1$ so that each block of the constrained dynamic dictionary algorithm is convex; in the penalized dynamic dictionary algorithm, $p = 1$ is also used, and a value of $\mu = 10$ is used, which is sufficiently large to inhibit high intercolumn correlation. Values as high as $\mu = 2000$ have been used, and similar performance has been observed [78]. For the 16 times oversampled static dictionary, model order is overestimated according to Figure 5.2, and we use $p = 0.8$ for the static dictionary-based estimation algorithm.

For fixed parameter sampling settings and $p$, we select $\lambda$. Cross-validation [29] is a method to select $\lambda$ in a data-dependent way; however, this approach requires training data. Homotopy methods [79] and warm-restarting [63] allow fast searching over $\lambda$-space for solutions, but are limited to $p = 1$ and real amplitude estimation. We utilize the BIC $\lambda$ selection method proposed in Chapter 4. In the following simulations, we show dictionary-based estimation performance using the average value of $\lambda$ selected by the BIC method at each SNR.

We quantify algorithm performance by parameter and model order estimation accuracy. Figure 5.4 shows the square root of the CRB (sqrt(CRB)) and the RMSE conditioned on correct model order as a function of SNR. Figure 5.5 shows correct model order probability as a function of SNR. RMSE parameter estimation performance of the penalized dynamic dictionary algorithm is comparable to the CRB and genie ML estimator performance for SNR greater than 0 dB, with probability of correct model order close to 1; parameter estimation performance of the constrained dynamic dictionary algorithm is comparable with the CRB and genie ML for SNR greater than 2.5 dB. For moderate SNR, static dictionary-based estimation is comparable to the CRB and genie ML, but deviates at higher SNR. This is a result of dictionary quantization error. As SNR increases, quantization error, causing bias, dominates RMSE. Since the true frequency $f_1 = 3.457$ Hz is between two adjacent parameters in the frequency grid, it is biased towards one of the two grid elements. The RMSE at high SNR is limited by this bias. The frequency $f_2 = 7.5$ Hz is on the frequency grid; estimates are biased towards that grid element, and RMSE decreases to zero as SNR increases. This bias is not observed in the dynamic dictionary algorithms, as the frequency parameter samples adapt to the data. For moderate to high SNR, the static dictionary method achieves the correct model order approximately 90% of the time, which is lower than the dynamic dictionary algorithms.

Although the genie ML algorithm exhibits superior performance at low SNR, this can be attributed to initialization of the algorithm at the true parameters, $\bar{F}(\bar{M})^{(0)} = [f_1, \dots, f_{\bar{M}}]$, where we define $f_k = 0$ for $k > M$ and use vector instead of set notation. Both the static and dynamic dictionary algorithms do not assume any knowledge of the true parameters and are initialized to zero amplitude on an equispaced parameter grid. If the ML algorithm is initialized without knowledge of the true parameters, for example to $\bar{F}(\bar{M})^{(0)} = 0_{1\times\bar{M}}$, parameter and model order performance is significantly worse than for the dictionary-based estimation methods. The ML cost function for the complex sinusoid model is non-convex, and the poor performance can be attributed to convergence to local minima. Dictionary-based estimation algorithms (both static and dynamic) appear to obviate convergence to local minima in parameter space by initializing frequency parameter samples over the whole parameter space and having the $\ell_p$ regularized LS "select" the local region that fits the signal best.


Figure 5.4: Complex exponential dictionary-based parameter estimation RMSE versus SNR for $N = 16$ time samples: (a) $\omega = 3.457$ Hz, $x^{(0)} = 0_{K\times 1}$; (b) $\omega = 7.5$ Hz, $x^{(0)} = 0_{K\times 1}$. Curves compare the penalized dynamic (Dyn. Pen), constrained dynamic (Dyn. Cons), and static dictionary algorithms with the genie ML estimator and sqrt(CRB). The dynamic dictionary algorithms use $p = 1$, and the static dictionary algorithm uses $p = 0.8$.


Figure 5.5: Complex exponential dictionary model order estimation versus SNR for $N = 16$ time samples; probability of correct model order for Dyn. Pen, Dyn. Cons, Static, and genie ML. The dynamic dictionary algorithms use $p = 1$, and the static dictionary algorithm uses $p = 0.8$.

5.3.2 Decaying Exponentials

The decaying exponential model consists of components $f(t_n, \gamma_m) = e^{-\gamma_m t_n}$, where $\{t_n\}_{n=1}^N$ are time (or spatial) samples, and $\gamma = \{\gamma_m\}_{m=1}^M$ are decay rates. This signal model can be used, for example, to model spin density in EPR imaging [11], or the loss of messenger RNA and protein concentration in biological gene expression modeling [12].

We estimate the superposition of $M = 2$ decaying exponential signals from $N = 20$ time samples with a sampling period of $T = 0.2$ seconds. The decay rate parameters are placed at $\gamma_1 = 0.5$ s$^{-1}$ and $\gamma_2 = 3$ s$^{-1}$. Magnitudes of the exponentials are real valued and set to $\alpha_1 = \alpha_2 = 1$. The SNR is defined as $10\log(\|y_{nl}\|_2^2/(N\sigma^2))$, where $y_{nl}$ is the noiseless signal. Initial amplitude estimates in all of the dictionary algorithms are $x^{(0)} = 0_{K\times 1}$, and initial parameter estimates, $\bar{\gamma}^{(0)}$, are set to an equispaced grid over $[0, 6]$ s$^{-1}$. We use $K = 20$ decay rate parameters in the dynamic dictionary algorithms, and an $O = 10$ times oversampled parameter sample grid for the static dictionary, so that $K = 10 \times 20 = 200$.

Figure 5.6(a) shows the intercolumn correlation matrix $A(\gamma)^H A(\gamma)$, where $\gamma$ is an equispaced set of decay rate parameters. Unlike the complex exponential signal, adjacent dictionary column correlation is a function of the first column's decay rate parameter location and the second column's spacing from the first. The correlation as a function of decay rate parameter spacing from $\gamma_2 = 3$ s$^{-1}$ is shown in Figure 5.6(b). This correlation function is worse than for $\gamma_1 = 0.5$ s$^{-1}$ in that correlation decreases less rapidly as a function of column spacing.
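The location dependence of the correlation (in contrast to the Toeplitz complex exponential case) can be checked numerically; a minimal Python sketch with illustrative values:

```python
import numpy as np

N, T = 20, 0.2
t = np.arange(N) * T

def atom(gamma):
    """Unit-norm decaying exponential dictionary column."""
    a = np.exp(-gamma * t)
    return a / np.linalg.norm(a)

# Correlation at a fixed spacing depends on the base decay rate (non-Toeplitz):
# the correlation is noticeably higher around gamma = 3 than around gamma = 0.5.
for g0 in [0.5, 3.0]:
    corr = atom(g0) @ atom(g0 + 0.5)
    print(f"gamma = {g0} 1/s, spacing 0.5 1/s -> correlation {corr:.4f}")
```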

As in the complex exponential example, the parameter inhibition settings, $\delta$ and $\mu$, are based on the minimum allowable spacing of true parameters. Figure 5.6(b) shows that correlation is well in excess of 0.99 if decay rate parameters are closer than the well-separated spacing of 1 s$^{-1}$ (correlation is worse at the endpoint 6); so if we allow true decay rate parameters to be closer than a spacing of 1 s$^{-1}$, high correlation must be tolerated. Referencing Figure 5.2, $p = 1$ will only work well if the distance between true decay rate parameters is very large. In this example, we operate in the top left region of $(p, \text{correlation})$ space in Figure 5.2 and use $\delta = 10^{-3}$, $\mu = 10^{-8}$, and $p = 0.2$.

Estimation performance as a function of SNR for the decaying exponential model using dictionary-based estimation algorithms is shown in Figures 5.7 and 5.8. Figure 5.7 shows the square root of the CRB and the RMSE conditioned on correct model order, and Figure 5.8 shows correct model order probability. First, we note that at low SNR, all of the estimators lie below the CRB. This is an effect of estimation bias.

Figure 5.6: Decaying exponential dictionary correlation matrix (a) and correlation versus decay rate spacing from $\gamma = 3$ (b). Color encodes correlation.

As SNR decreases, an increasing number of estimates are clipped to the endpoints of the search region, $[0, 6]$, causing lower RMSE than if there were no boundaries. Although RMSE is low in this region, the probability of correct model order is also low. Some of the estimators also fall slightly below the CRB at moderate SNRs; this occurs because RMSE is conditioned on model order, but the CRB is an unconditioned bound. Parameter estimation performance of both dynamic dictionary algorithms is comparable to the CRB and genie ML estimator for moderate to high SNR ranges; however, the penalized dynamic dictionary algorithm exhibits better probability of correct model order estimation than the constrained dynamic dictionary method. On average, across SNR, the penalized algorithm estimates model order correctly over 90% of the time. Static dictionary model order estimation is better on average than the constrained dynamic dictionary method, but static dictionary-based parameter estimation performance is worse for lower SNR, especially for estimation of the $\gamma = 3$ s$^{-1}$ exponential. For any of the dictionary-based estimation algorithms, when model order is incorrectly estimated, it is usually over-estimated as 3, except at low SNR. When a model order of 3 is estimated, two of the parameter estimates are usually close in distance to the true parameters. Although not shown here, if instead of using an average BIC-selected $\lambda$ for each SNR, a different $\lambda$ is selected by BIC for each realization, model order performance is improved; this performance improvement comes at the expense of increased computation. When $\lambda$ is selected for each realization, the probability of correct model order for the penalized algorithm is similar to the ML estimator over all SNR, and the other algorithms, on average, achieve accurate model order estimation over 90% of the time.


Figure 5.7: Decaying exponential dictionary parameter estimation RMSE versus SNR for $N = 20$ time samples: (a) $\gamma = 0.5$ s$^{-1}$, $x^{(0)} = 0_{K\times 1}$; (b) $\gamma = 3$ s$^{-1}$, $x^{(0)} = 0_{K\times 1}$. Curves compare Dyn. Pen, Dyn. Cons, and Static with the genie ML estimator and sqrt(CRB). All algorithms use $p = 0.2$.


Figure 5.8: Decaying exponential dictionary model order estimation versus SNR for $N = 20$ time samples; probability of correct model order for Dyn. Pen, Dyn. Cons, Static, and genie ML. All algorithms use $p = 0.2$.

5.4 Conclusion

In this chapter, we presented two new dynamic dictionary-based estimation algorithms for model order and parameter estimation in additive component models. The algorithms utilize sparse reconstruction techniques to select a small subset of dictionary elements, which encode model order and parameter estimates. One algorithm inhibits dictionary intercolumn correlation by using a penalty on parameter spacing, and the other constrains parameter distance.

Static dictionary-based estimation algorithms have been previously investigated in the literature. These algorithms fix a dictionary and then select a small subset of the dictionary columns to model the data. In the dynamic approach presented here, parameters, and hence dictionary columns, adapt to the data. The dynamic dictionary algorithms are implemented using block coordinate descent. Amplitudes are updated using an $\ell_p$ regularized LS algorithm commonly used in sparse reconstruction, and signal parameters are updated by solving a LS optimization problem with constraints on parameter spacing. When $p = 1$, both block coordinate descent update steps in the constrained dynamic dictionary algorithm are convex.

The performance of each algorithm is dependent upon algorithm settings. We examined the connection between the $\ell_p$-quasinorm in the $\ell_p$ regularized LS algorithm, intercolumn correlation, and estimation performance; it was shown that it is advantageous to use smaller values of $p < 1$ as dictionary intercolumn correlation increases. These results were used to select the $\ell_p$-quasinorm and parameter inhibition settings in each of the algorithms.

We demonstrated model order and parameter estimation performance of each dynamic dictionary algorithm on complex exponential and decaying exponential models. Parameter estimation performance of each algorithm is comparable to the unbiased CRB and genie ML estimator over a large range of SNRs, except at lower SNRs in complex exponential estimation. In this case, the genie ML estimator outperforms the dynamic algorithms; however, without knowledge of the true parameters, the ML algorithm performs substantially worse than the proposed algorithms, as it converges to local minima. In static dictionary-based estimation, the parameter samples used to construct a static dictionary are fixed, and parameter estimation is subject to quantization error, exhibited as estimation bias. We demonstrated that adaptive parameters allow dynamic dictionary algorithms to overcome this quantization bias. Finally, we showed that model order estimation performance of the penalized dynamic dictionary algorithm is competitive with that of the genie ML algorithm using BIC model order selection, and is superior to the other dictionary-based estimation algorithms.

Chapter 6: Sparse 3D SAR Imaging

6.1 Introduction

There is increasing interest in three-dimensional (3D) reconstruction of objects from radar measurements. This interest is enabled by new data collection capabilities, in which airborne synthetic aperture radar (SAR) systems are able to interrogate a scene, such as a city, persistently and over a large range of aspect angles [80]. Three-dimensional reconstruction is further motivated by an increasingly difficult class of surveillance and security challenges, including object detection and activity monitoring in urban scenes. Additional information provided by wide-aspect 3D reconstructions can be useful in applications such as automatic target recognition (ATR) and tomographic mapping.

In SAR imaging, an aircraft emits electromagnetic signal pulses along a flight path and collects the returned echoes. The returned echoes can be interpreted as one-dimensional lines of the 3D Fourier transform of the scene, and the aggregation of radar returns over the flight path defines a manifold of data in the scene's 3D Fourier domain [5]. A number of techniques have been proposed for narrow-angle 3D reconstruction from this manifold of data, including full 3D reconstruction and 2D non-parametric imaging followed by parametric estimation of the third, height, dimension. We propose a full 3D reconstruction method utilizing static dictionary-based estimation, which throughout this chapter we also refer to as an $\ell_p$ regularized least-squares imaging algorithm.

Full 3D reconstruction methods invert an operator to retrieve three-dimensional point scattering locations; specifically, in SAR imaging, the operator can be modeled as a Fourier operator, since data is collected over a manifold in the 3D Fourier space of the scene. Generating high-resolution 3D images using traditional Fourier processing methods requires that radar data be collected over a densely sampled set of points in both azimuth and elevation angle, for example, by collecting data from many closely spaced linear flight passes over a scene [38,39]. This method of imaging requires very large collection times and storage, and may be prohibitively costly in practice. There is thus motivation to consider more sparsely sampled data collection strategies, where only a small fraction of the data required to perform traditional high-resolution imaging is collected. Sparsely sampled data collections with elevation diversity can be achieved through nonlinear flight paths [42,43,81,82]. However, when inverse Fourier imaging is applied to sparsely sampled apertures, reconstruction quality can be poor.

Reconstruction quality for narrow aspect angle images can be quantified by the point spread function (PSF) of the image, defined by the Fourier transform of the data aperture indicator function. The mainlobe of this PSF will typically be wider (indicating reduced resolution) and the sidelobes higher than for the PSF of a reconstruction formed from a densely-sampled measurement aperture (see e.g. [81,83]). Methods to mitigate this problem by deconvolving the PSF from the 3D reflectivity function using greedy algorithms were investigated in [42,43], and reconstructions of canonical shapes were presented.

Point spread function analysis assumes an isotropic point scattering center model. This assumption is valid in traditional narrow aspect angle SAR imaging. However, in wide aspect angle imaging scenarios, the anisotropic nature of scattering cannot be neglected, and the isotropic point scattering center assumption is no longer valid. As a result, reconstructed image resolution will be worse than indicated by PSF analysis [84].

In this chapter, we develop an $\ell_p$ regularized LS dictionary-based estimation method for wide-angle 3D radar imaging with arbitrary, sparse apertures, and we demonstrate this approach on the problem of 3D vehicle reconstruction. This approach relies on some basic properties of scattering physics and exploits signal sparsity (in the reconstruction domain) of radar scenes. In particular, we are interested in imaging man-made structures under high-frequency radar operation. Under these operating conditions, scenes are dominated by a sparse number of dominant, isolated scattering centers; dominant returns result from objects such as corner or plate reflectors made from electromagnetically conductive material (see e.g. [10]). The proposed imaging algorithm requires only knowledge of the flight geometry and is applicable to image formation in arbitrary collection geometries. In addition, since collected radar data can be interpreted as samples in the 3D Fourier transform space of the scene, matrix-vector multiplications in the regularized LS algorithm can be replaced by the Fast Fourier Transform (FFT). The regularized LS algorithm is also known as Basis Pursuit Denoising when $p = 1$ [63,64]. This approach has been shown to produce well-resolved 2D SAR image reconstructions over approximately linear flight paths [31,40,41]; it was also used for 3D image reconstruction in [85-87]. For wide-angle 3D SAR reconstruction, a direct implementation of an $\ell_p$ regularized LS dictionary estimation approach yields a prohibitively large optimization problem; one of the contributions of this chapter is the development of a computationally tractable implementation.

We investigate regularized LS wide-angle 3D SAR image formation on different sparse data collection geometries. The first collection geometry is a pseudorandom path collection for three polarizations generated by Visual-D electromagnetic simulation software and released as a public dataset by the Air Force Research Laboratory (AFRL) [88]. The second dataset, also released by AFRL, is from an actual 2006 multipass X-band Circular SAR (CSAR) data collection of a ground scene [89]. This dataset consists of eight fully circular paths in azimuth, at eight closely-spaced elevation angles with respect to scene center; this data is polarimetric, in that horizontal-horizontal (HH), vertical-vertical (VV), and cross-polarization data is collected. The previously discussed resolution and sidelobe issues that result from sparse measurement apertures are manifest in both of these datasets.

An alternative approach to wide-angle 3D imaging is based on forming a small set of 2D SAR images followed by parametric 1D estimation to estimate the third, or height, dimension in the backscatter profile. Interferometric SAR (IFSAR) is a well-known classical technique for parametric height estimation from 2D SAR imagery formed at two linear elevation passes [5,6]; the multi-baseline extension of this approach is known as Tomo-SAR [44-53,90]. Tomo-SAR imaging is more restrictive than the regularized LS method in that it requires data collected at multiple elevation angle passes. For comparison with the regularized LS method, we present wide-angle 3D imagery formed using the wide-angle Tomo-SAR approach of [90]. We only present Tomo-SAR images from the GOTCHA dataset, since multiple elevation angle passes are required in this approach.

The contributions of this work can be summarized as follows. First, we propose a technique to process sparse wide-angle data, such as circular SAR data, for object reconstruction; this type of data is becoming increasingly important in persistent surveillance applications. Second, we provide full 3D radar reconstructions using $\ell_p$ regularized LS dictionary-based estimation, and provide a tractable algorithm, in terms of memory and computational requirements, for generating full 3D reconstructions from arbitrary, sparse 3D flight paths. Third, we demonstrate the first high-fidelity 3D vehicle reconstructions from an arbitrary curvilinear flight path. Finally, we provide a comparison, in terms of both reconstruction performance and computational cost, of the regularized LS reconstruction and Tomo-SAR imaging approaches on measured X-band radar data of vehicles.

An outline of this chapter is as follows. First, an overview of the SAR data model is presented in Section 6.2. Section 6.3 describes the two collection geometries, pseudorandom and CSAR, and the corresponding datasets considered here. These collection geometries demonstrate some of the challenges presented by such sparse collections. In Section 6.4, the $\ell_p$ regularized LS imaging algorithm is presented, and Section 6.5 presents reconstructed 3D images of vehicles from both the pseudorandom and CSAR data collection geometries. Finally, Section 6.6 concludes, summarizing the main results.

6.2 SAR Model

In this section, we briefly review the tomographic SAR model used for reconstruction. We assume that the radar transmits a wideband signal with bandwidth BW centered about a center frequency $f_c$. Such a signal could be an FM chirp signal or a stepped-frequency signal, but other wideband signals can also be used. We also assume that the transmitter is sufficiently far away from the scene so that wavefront curvature is negligible, and we use a plane wave model for reconstruction; this assumption is valid, for example, when the extent of the scene being imaged is much smaller than the standoff distance from the scene to the radar.

For a radar located at azimuth $\phi$ and elevation $\theta$ with respect to scene center that transmits an interrogating signal, the received waveform, in the far-field case, is given by [5]

$$r(t; \phi, \theta, pol) = \left[\int_{-B_{\tilde{z}}}^{B_{\tilde{z}}} \int_{-B_{\tilde{y}}}^{B_{\tilde{y}}} g\!\left(\tilde{x} = \frac{ct}{2},\, \tilde{y},\, \tilde{z};\, \phi, \theta, pol\right) d\tilde{y}\, d\tilde{z}\right] \star s(t), \tag{6.1}$$

where $c$ is the speed of light, $s(t)$ is a known, bandlimited signal with center frequency $f_c$ and bandwidth BW that represents the transmitted waveform convolved with antenna responses; $pol$ is the polarization of the transmit/receive signal pair, and $\star$ denotes convolution. The $\tilde{x}$-coordinate is defined as the radial line from the radar to scene center, and $\tilde{y}$ and $\tilde{z}$ are orthogonal to $\tilde{x}$ and to each other. This coordinate system is a translation in $\tilde{x}$ from scene center and a rotation by $(\phi, \theta)$ of a fixed, ground coordinate system $(x, y, z)$, whose origin is at scene center. The scene's reflectivity function is given by $g(\tilde{x}, \tilde{y}, \tilde{z}; \phi, \theta, pol)$, or equivalently, by $g(x, y, z; \phi, \theta, pol)$ in the fixed ground coordinate system. Boundaries of the scene in each dimension are denoted as $B_{(\cdot)}$. Under the far-field assumption, these boundaries are assumed to be sufficiently small so that wavefront curvature and range-dependent signal attenuation can be neglected, which means that these scene boundaries are on the order of objects but not entire large scenes. For large scenes, (6.1) applies locally around setpoints of interest.

Equation (6.1) can be interpreted as the Fourier transform of the scene reflectivity function projected onto the $\tilde{x}$-dimension. By the projection-slice theorem [5], this Fourier transform is equivalent to a line along the $\tilde{x}$-axis in the 3D spatial frequency space, or k-space, of the scene reflectivity function. Specifically, the 3D Fourier transform $G(k_x, k_y, k_z)$ of the reflectivity function $g(x, y, z; \phi, \theta, pol)$, observed from angle $(\phi, \theta)$ at polarization $pol$, is given by

$$G(k_x, k_y, k_z; \phi, \theta, pol) = \int g(x, y, z; \phi, \theta, pol)\, e^{-j(k_x x + k_y y + k_z z)}\, dx\, dy\, dz. \tag{6.2}$$

The frequency support of each measurement is a line segment in $(k_x, k_y, k_z)$ with extent $\frac{4\pi BW}{c}$ rad/m centered at $\frac{4\pi f_c}{c}$ rad/m, and oriented at angle $(\phi, \theta)$. The flight path defines which line segments in k-space are collected, and hence what subset of k-space is sampled. Typically both the frequency variable $f$ along each line segment and the flight path are sampled as $f \to f_j$, $(\phi, \theta) \to (\phi_n, \theta_n)$, so one obtains a set of k-space samples indexed on $(j, n)$ as

$$k_x^{j,n} = \frac{4\pi f_j}{c} \cos\theta_n \cos\phi_n$$
$$k_y^{j,n} = \frac{4\pi f_j}{c} \cos\theta_n \sin\phi_n \tag{6.3}$$
$$k_z^{j,n} = \frac{4\pi f_j}{c} \sin\theta_n.$$
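A direct transcription of (6.3) into Python (the function name is illustrative; frequencies in Hz, angles in radians):

```python
import numpy as np

def kspace_samples(freqs_hz, az_rad, el_rad):
    """k-space sample locations (6.3) for frequencies f_j and angles (phi_n, theta_n).

    Returns arrays of shape (len(freqs_hz), len(az_rad)) in rad/m.
    """
    c = 2.99792458e8                                  # speed of light (m/s)
    f = np.asarray(freqs_hz)[:, None]
    phi = np.asarray(az_rad)[None, :]
    theta = np.asarray(el_rad)[None, :]
    r = 4 * np.pi * f / c
    kx = r * np.cos(theta) * np.cos(phi)
    ky = r * np.cos(theta) * np.sin(phi)
    kz = r * np.sin(theta)
    return kx, ky, kz
```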

In order to use tomographic inversion techniques to recover $g$ from k-space measurements, it is often assumed in (6.2) that the scene reflectivity is isotropic; that is, $g(x, y, z; \theta, \phi, pol)$ is not a function of $\theta$ and $\phi$. For narrow-angle measurements, this assumption is generally valid; however, for wide-angle measurements, the isotropic scattering assumption is not valid for most scattering centers in the scene [91,92]. One approach for reconstruction from wide-angle measurements, and the one adopted here, is to subdivide the measurements into a set of possibly overlapping subapertures and to assume scattering is locally isotropic on each subaperture. Once subaperture reconstructions are obtained, one can then form an overall wide-angle reconstruction by combining the narrow-aperture reconstructions in an appropriate way.

In particular, we argue that a good way to implement the subaperture combination is using a Generalized Likelihood Ratio Test (GLRT) approach. We assume scattering at each point $(x, y, z)$ in the scene can be characterized by a limited-angle response centered at azimuth $\phi$ and elevation $\theta$ and with some persistence width in each angular dimension. We treat the persistence angle as fixed and known, and we use this to establish the angular widths of the subapertures used in the data formation. Since the response center angles $(\phi, \theta)$ are unknown, we estimate them using a GLRT formulation: we use a bank of matched filters, each characterized by a center response azimuth and a response width and shape, and compute the response amplitude as

$$I(x, y, z; pol) = \max_{(\phi, \theta)} \; |I(x, y, z; \phi, \theta, pol)|, \tag{6.4}$$

where $I$ denotes the matched filter output. The maximization in equation (6.4) over continuous-valued $\phi$ and $\theta$ is approximated by discretizing these two variables. Since backprojection radar image formation can be interpreted as a matched filter for point scattering responses [93], each matched filter output $I(x, y, z; \phi, \theta, pol)$ is well approximated by the subaperture radar image formed from k-space measurements at discrete center angles $(\phi_j, \theta_j)$ and with fixed azimuth and elevation extent. That is, the approach of forming subaperture radar reconstructions, then combining these reconstructions by taking the maximum over all subapertures, can be interpreted as a GLRT approach to reconstruction of limited-persistence scattering centers. While the approach in (6.4) assumes that all scattering centers have identical and known persistence, generalizations to variable persistence angles can also be developed [94]. As a side note, each voxel in the image reconstruction is also characterized by the maximizing $(\phi, \theta)$ center angles, providing additional information useful for image visualization [41] or object recognition [95,96].

In the $\ell_p$ regularized LS imaging algorithm presented below, we will assume that the available k-space data is partitioned into (possibly overlapping) subapertures, and that reconstructions for each subaperture are obtained using the algorithm. A final wide-angle reconstruction is obtained using (6.4).

An advantage of this locally isotropic approach is that, for each subaperture, scattering responses are parameterized by only location and amplitude. An alternate approach, considered in [10,97-99], is to adopt models that directly characterize anisotropic scattering. One can then directly estimate scattering centers from the entire wide-angle data using these models. This latter approach may be posed as a classical parametric model order and parameter estimation problem (see, e.g., [10,99]). Alternately, one can adopt a dictionary-based approach in which anisotropic scattering is characterized as a linear combination of dictionary elements that are limited in persistence, and one estimates the amplitudes of a sparse linear combination of dictionary elements; such an approach has recently been proposed in [97]. A related dictionary-based approach is to estimate an image at each aspect angle from a sparse linear combination of dictionary elements; the images are not independently formed, but linked through a regularization term that penalizes large changes in pixel magnitudes that are close in aspect [98]; the regularization enforcing sparsity in these approaches is similar to the $\ell_p$ reconstruction technique presented in Section 6.4 below. These wide-angle dictionary-based approaches result in a (much) larger set of dictionary elements than used in the approach followed here; this is because anisotropic scattering is characterized by additional parameters such as orientation and persistence angles. In principle, the approaches in [10,97-99] are based on similar assumptions, but represent different algorithmic approaches to estimate the reconstruction. A detailed comparison of these approaches in terms of both computation and reconstruction performance remains a topic for future study.

6.3 Collection Geometry and Example Datasets

Before presenting the proposed reconstruction algorithm, it is useful to examine some example data collection apertures and the associated 3D reconstruction challenges that result. We will first present and discuss two sparse radar collection geometries and their associated reconstruction objectives. The first dataset considered is synthetically generated data from a pseudorandom flight path developed by researchers at AFRL as a 3D image reconstruction challenge problem [88]; the second dataset is a collection of X-band field measurements from a CSAR radar at eight closely-spaced elevations [89].

6.3.1 Pseudorandom Flight Path Dataset

The pseudorandom flight path dataset [88,100] is generated by the Visual-D electromagnetic scattering simulator. The simulator models scattering returns from a radar with center frequency $f_c = 10$ GHz and bandwidth BW $= 6$ GHz. The dataset consists of k-space samples computed along a continuous, far-field pseudorandom "squiggle" path in azimuth and elevation from a construction backhoe vehicle. The path is intended to simulate an airborne platform that interrogates the object over a wide range of azimuth and elevation angles, but does so while flying along a 1D curved path that sparsely covers the 2D azimuth-elevation angular sector. Three polarizations are included in the dataset: vertical-vertical (VV), horizontal-horizontal (HH), and cross-polarization (HV).

The trace in Figure 6.1 shows the path as a function of azimuth and elevation angle, defined with respect to a fixed ground plane coordinate system, and Figure 6.2(a) displays the corresponding k-space data that can be collected by the radar, which is contained between the inner and outer domes that denote the minimum and maximum radar frequency, respectively. The squiggle path is superimposed on the outer dome. The set of k-space data collected along the squiggle path is very sparse with respect to the full data dome. The azimuth and elevation extents of the squiggle path are approximately $[66^\circ, 114.1^\circ]$ and $[18^\circ, 42.1^\circ]$, respectively. This range of nearly $50^\circ$ in azimuth and $25^\circ$ in elevation indeed represents wide-angle measurement at X-band; the persistence of many scattering centers at X-band has been reported to be (significantly) lower [91,92]. In contrast, a filled aperture used to form benchmark images uses samples at every $\tfrac{1}{14}^\circ$ in this azimuth/elevation sector.

6.3.2 Multipass Circular SAR Dataset

The second sparse dataset we consider is the multipass CSAR data from the AFRL GOTCHA Volumetric SAR Data Set, Version 1.0 [89,101]. This dataset consists of sampled, dechirped radar return values that have been transformed to the form of $G(k_x, k_y, k_z; \phi, \theta, pol)$ in (6.2).

103 45

40

35

30 Elevation 25

20

15

70 75 80 85 90 95 100 105 110 Azimuth

Figure 6.1: Sparse “squiggle”path radar measurements as a function of azimuth and elevation angle in degrees.

Figure 6.2: Data domes of all k-space data that can be collected by a radar for (a) the pseudorandom synthetic "squiggle" path backhoe dataset, and (b) the GOTCHA dataset; units are in rad/m. Support of the k-space data is contained between the inner and outer dome. The inner and outer domes show the minimum and maximum radar interrogating frequencies. The outlines on the outer domes show the locations of the sparse k-space data collected, which extends from the outline radially to the inner dome.

The data is fully polarimetric from eight $360^\circ$ CSAR passes. The planned nominal collection consists of passes at constant, equally-spaced elevation angles with respect to scene center, with elevation difference $\Delta_{el} = 0.18^\circ$, in the range $[43.7^\circ, 45^\circ]$. The actual flight path is not perfectly circular, as shown in Figure 6.3, and not at perfectly constant and equally-spaced elevations. The center frequency of the radar is $f_c = 9.6$ GHz, and the bandwidth of the radar is 640 MHz, significantly lower than that of the squiggle path collection. Figure 6.2(b) shows the k-space data collected by the eight GOTCHA passes. The k-space radial extent of the collected data, from the outer dome to the inner dome, is dictated by the radar bandwidth and is seen to be significantly smaller than in the squiggle path case. Figure 6.2(b) also illustrates that the GOTCHA k-space data is very limited in elevation extent, in contrast to the squiggle path.

Figure 6.3: Actual GOTCHA passes. Scale is in meters.

6.4 $\ell_p$ Regularized Least-Squares Imaging Algorithm

In this section we show that SAR data can be modeled as an additive component model of the form (2.1), and we present an $\ell_p$ regularized LS dictionary-based estimation algorithm for wide-angle 3D SAR imaging. The proposed approach assumes that the number of 3D locations in which nonzero backscattering occurs is sparse in the 3D reconstruction space and applies dictionary-based estimation to image (estimate) scattering center locations. This $\ell_p$ regularized LS imaging algorithm attempts to fit an image-domain scattering model to the measured k-space data under a penalty on the number of non-zero voxels. The algorithm assumes that the complex amplitude response of each scattering center is approximately constant over narrow aspect angles and across the radar frequency bandwidth. It also applies to general apertures, in contrast to other 3D imaging algorithms, such as IFSAR or Tomo-SAR, which apply only to apertures with specific structure.

Under the assumptions of narrow-angle data collection and invariant scattering response across the radar bandwidth, (6.2) is the Fourier transform of the scene reflectivity function for a given polarization; the reflectivity function can be modeled as a set of isotropic point scattering centers, with amplitude response given by an impulse function [5,6]. So, for a scene with $M$ scattering centers, (6.2) can be written as

$$G(k_{x,n}, k_{y,n}, k_{z,n}; pol) = \sum_{m=1}^{M} g(x_m, y_m, z_m; pol)\, e^{-j(k_{x,n} x_m + k_{y,n} y_m + k_{z,n} z_m)}, \quad n = 1, \dots, N, \tag{6.5}$$

which is in the form of the additive component model (2.1). In the notation of Chapter 2, the parameters $\Theta = \{[x_m, y_m, z_m]^T\}_{m=1}^M$ are the $M$ scattering center locations $(x_m, y_m, z_m)$ in 3D Cartesian space; the amplitudes, $\alpha = [g(x_m, y_m, z_m)]_{m=1,\dots,M}$, are the $M$ scattering center reflectivities; measurements are $[G(k_{x,n}, k_{y,n}, k_{z,n}; pol)]_{n=1,\dots,N}$, taken at $N$ k-space samples $t_n = [k_{x,n}, k_{y,n}, k_{z,n}]^T$ at locations $(k_{x,n}, k_{y,n}, k_{z,n})$ in 3D k-space; and the component function is a complex exponential $f(t_n, \theta_m) = e^{-j\theta_m^H t_n}$. It is also assumed that each measurement is corrupted by additive noise. The objective in SAR imaging is to estimate the scattering center amplitudes $\alpha$ and locations $\Theta$ from k-space measurements.

Next, we represent (6.5) as a linear system of the form (2.10) for dictionary-based estimation. Define a set of $K$ scattering center coordinate location samples as candidate locations,

$$\bar{\Theta} = \{[\bar{x}_i, \bar{y}_i, \bar{z}_i]^T\}_{i=1}^K. \tag{6.6}$$

Typically these locations are chosen on a uniform rectilinear grid. The $N \times K$ data measurement matrix is given by

$$A = \left[e^{-j(k_{x,n}\bar{x}_i + k_{y,n}\bar{y}_i + k_{z,n}\bar{z}_i)}\right]_{n,i},$$

where $n$ indexes the $N$ measured k-space frequencies down rows, and $i$ indexes the $K$ scattering center coordinate samples in $\bar{\Theta}$ across columns. Under the same assumptions as (6.5), that scattering center amplitude is constant over the aspect angle extent and radar bandwidth considered, the measured (subaperture) data from (6.5) can be approximated as

$$w = Ab + \epsilon, \quad \|b\|_0 = M, \tag{6.7}$$

where $b$ is the $K$-dimensional scattering amplitude vector, and

$$w = \left[G(k_{x,1}, k_{y,1}, k_{z,1}; pol), \dots, G(k_{x,N}, k_{y,N}, k_{z,N}; pol)\right]^T$$

is the $N$-dimensional vector of k-space measurements. The noise $\epsilon$ is an $N$-dimensional vector modeled as i.i.d. circular complex Gaussian noise with zero mean and variance $\sigma_n^2$. The sparse amplitude vector $b$ is (approximately) given by (2.11) (with $b$ in place of $x$). We use $w$ in place of $y$ and $b$ in place of $x$ in (2.10) to avoid notational ambiguity with the coordinates $x$ and $y$. The linear model (6.7) is exact if the true scattering center locations $\Theta$ are contained in the set of location samples $\bar{\Theta}$.
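The following Python sketch assembles the measurement matrix $A$ of (6.7) and synthesizes data $w = Ab + \epsilon$ for a toy scene; all numerical values (geometry, noise level, scatterer indices) are illustrative only.

```python
import numpy as np

def sar_dictionary(kx, ky, kz, voxels):
    """N x K Fourier dictionary for the linear model (6.7).

    kx, ky, kz : length-N arrays of k-space sample coordinates (rad/m)
    voxels     : K x 3 array of candidate (x, y, z) locations (m)
    """
    phase = (np.outer(kx, voxels[:, 0]) + np.outer(ky, voxels[:, 1])
             + np.outer(kz, voxels[:, 2]))
    return np.exp(-1j * phase)

# Toy example: M = 2 scatterers on a small candidate grid (illustrative values).
rng = np.random.default_rng(1)
kx = rng.uniform(380, 420, 64)     # |k| ~ 4*pi*f/c near X-band
ky = rng.uniform(-20, 20, 64)
kz = rng.uniform(150, 170, 64)
xs = np.linspace(-1, 1, 5)
voxels = np.array([(x, y, z) for x in xs for y in xs for z in xs])
A = sar_dictionary(kx, ky, kz, voxels)
b = np.zeros(len(voxels), dtype=complex)
b[[7, 60]] = [1.0, 0.5 + 0.5j]     # sparse reflectivities
w = A @ b + 0.01 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))
```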

Amplitude and location parameters are estimated using dictionary-based estimation with the linear model (6.7). Reconstructed amplitudes, $\tilde{b}$, are the solution to the sparse optimization problem [31,40]

$$\tilde{b} = \operatorname*{argmin}_{b} \; \|w - Ab\|_2^2 + \lambda\|b\|_p^p, \tag{6.8}$$

where the $p$-quasinorm is denoted as $\|\cdot\|_p$, $0 < p \le 1$, and $\lambda$ is a sparsity penalty setting. Note that the solution to (6.8) applies for general $A$ matrices, and the radar flight path locations that index the rows of $A$ can be arbitrary. In particular, flight paths such as the squiggle path in Figure 6.2(a) can be used.

Estimates of the number of scattering centers (model order), \hat{M}, scattering center locations [\hat{x}_m, \hat{y}_m, \hat{z}_m]^T, and amplitudes, \hat{g}(\hat{x}_m, \hat{y}_m, \hat{z}_m), are given by (4.10) through (4.12) after thresholding with (4.13) and making appropriate amplitude and parameter notational substitutions. The 3D reconstructed image is simply a visual representation of the amplitude and parameter estimates. Points are placed at each estimated scattering center location, and each point is color coded by its estimated amplitude.

In this context, we use “imaging” and “estimation” interchangeably.
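Equations (4.10) through (4.13) are not reproduced in this chapter; the sketch below stands in for them with a simple relative-magnitude threshold, which is an assumption of the example rather than the dissertation's exact rule.

```python
import numpy as np

def extract_scatterers(b_tilde, voxel_grid, thresh_db=-30.0):
    """Read estimates off the sparse solution: voxels within thresh_db of the
    peak magnitude are kept as estimated scattering centers."""
    mag = np.abs(b_tilde)
    keep = mag > mag.max() * 10.0 ** (thresh_db / 20.0)
    M_hat = int(keep.sum())          # estimated model order
    locations = voxel_grid[keep]     # (M_hat, 3) estimated positions
    amplitudes = b_tilde[keep]       # estimated complex reflectivities
    return M_hat, locations, amplitudes
```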

Many algorithms exist for solving (6.8) or the constrained version of this problem when p = 1 (e.g., [27,58,63,102,103]), or in the more general case when 0 < p ≤ 1 (e.g., [22,40]). We use the iterative majorization-minimization algorithm in [40] to implement (6.8). This algorithm is suitable for the general case when 0 < p ≤ 1. The algorithm has two loops: an outer loop which iterates on a surrogate function and an inner loop that solves a matrix inverse using a conjugate gradient algorithm; in our experience, the inner loop terminates after very few iterations when using a Fourier operator, as considered here. Empirical evidence also indicates that this majorization-minimization algorithm terminates faster than a split Bregman iteration approach [58].
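For intuition, the following is a minimal MM/IRLS-style sketch of such an iteration for (6.8): the ℓp penalty is majorized by a weighted ℓ2 term at the current iterate, and the resulting linear system is solved with a few conjugate gradient steps. It is not the exact algorithm of [40]; the smoothing constant eps and the iteration counts are illustrative assumptions.

```python
import numpy as np

def cg_solve(matvec, rhs, x, n_iter=10):
    """A few conjugate gradient steps on a Hermitian positive definite system."""
    r = rhs - matvec(x)
    d = r.copy()
    for _ in range(n_iter):
        Ad = matvec(d)
        denom = np.vdot(d, Ad)
        if abs(denom) < 1e-30:
            break
        alpha = np.vdot(r, r) / denom
        x = x + alpha * d
        r_new = r - alpha * Ad
        d = r_new + (np.vdot(r_new, r_new) / np.vdot(r, r)) * d
        r = r_new
    return x

def lp_mm_solve(A, w, lam, p=1.0, n_outer=50, eps=1e-6):
    """Sketch of min_b ||w - A b||_2^2 + lam ||b||_p^p via majorization-minimization.

    Each outer step majorizes |b_i|^p at the current iterate by a quadratic,
    so the inner problem is the linear system (A^H A + diag(q)) b = A^H w."""
    b = np.zeros(A.shape[1], dtype=complex)
    AHw = A.conj().T @ w
    for _ in range(n_outer):
        # Weights of the quadratic majorizer of the l_p penalty
        q = lam * (p / 2.0) * (np.abs(b) ** 2 + eps) ** (p / 2.0 - 1.0)
        matvec = lambda v, q=q: A.conj().T @ (A @ v) + q * v
        b = cg_solve(matvec, AHw, b)
    return b
```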

For algorithm implementation, define a uniform rectilinear grid on the x, y, and

z spatial axes with location parameter sample (voxel) spacings of ∆x, ∆y, and ∆z,

respectively. Let the set of candidate coordinate locations Θ¯ in (6.6) consist of all

permutations of (x,y,z) coordinates from the partitioned axes; then, the set Θ¯ defines

an equi-spaced 3D grid on the scene. If, in addition, the k-space samples are on

an equi-spaced 3D frequency grid centered at the origin, the operation Ab can be

implemented using the computationally efficient 3D Fast Fourier Transform (FFT)

operation. In many scenarios, including the one here, the measured k-space samples

are not on an equi-spaced grid, and the FFT cannot be used directly. Instead an

interpolation step followed by an FFT is needed. An alternative approach would

be to use the Type-2 nonuniform FFT (NUFFT) as the operator A to process data

directly on the non-equi-spaced k-space grid, at added computational cost [104,105].

Nonuniform FFT algorithms require an interpolation step, which is executed each

time the operator A is evaluated; whereas, in FFT implementation, interpolation

occurs only once and the interpolated data becomes w. When using an iterative algorithm to solve (6.8), as done here, performing interpolation once can result in significant computational savings. Our empirical results on the X-band data sets of

Section 6.3 suggest that nearest neighbor interpolation results in well-resolved images at low computational cost, and so it is adopted here.
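A sketch of this one-time gridding step and of the resulting FFT-based operator is given below; the scaling and sample-ordering conventions (e.g., fftshift) are simplified, and collisions in the nearest-neighbor assignment simply keep the last sample written.

```python
import numpy as np

def nn_grid(k_samples, values, grid_shape, k_min, k_max):
    """Nearest-neighbor interpolation of scattered k-space data onto a uniform
    grid, so that A and its adjoint can be applied with 3D FFTs."""
    lo, hi = np.asarray(k_min, float), np.asarray(k_max, float)
    frac = (k_samples - lo) / (hi - lo)                # in [0, 1] per axis
    idx = np.clip(np.round(frac * (np.asarray(grid_shape) - 1)).astype(int),
                  0, np.asarray(grid_shape) - 1)
    grid = np.zeros(grid_shape, dtype=complex)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = values     # last write wins
    return grid

# With gridded data, b -> Ab is (up to ordering, shift, and scale conventions)
# a forward 3D FFT of the voxel image; the adjoint is a scaled inverse FFT.
def apply_A(b_image):
    return np.fft.fftn(b_image)

def apply_AH(w_grid):
    return np.fft.ifftn(w_grid) * w_grid.size
```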

Implementation of the optimization algorithm solving (6.8) for large-scale problems can be challenging from a memory and computational standpoint. In iterative algorithms, like the one utilized here, typically the data vector w, as well as the current iterate of b and a gradient with the same dimension as b, are stored. For example, in the simulations below, we reconstruct a scene with K = 182 × 250 × 252 ≈ 1.1 × 10^7 voxels to cover a single vehicle. So, at the very least, it would be necessary to store the data vector in addition to two vectors of double or single precision in 1.1 × 10^7-dimensional complex space. For algorithms that utilize a conjugate gradient approach to calculate matrix inverses, it is also necessary to store a conjugate vector of the same dimension K, and in a Newton-Raphson approach, it is necessary to store a Hessian of dimension K × K. During each iteration of an algorithm, it is commonly required to evaluate the operator A and its adjoint. These operations can become very computationally expensive when the problem size grows and may result in a computationally intractable algorithm, unless a fast operator such as the FFT is employed.

Specifically, since A is an N × K matrix, direct multiplication of Ab requires NK multiplies and additions per evaluation. In examples using the squiggle path and the nine subapertures chosen, the average of the nine N values is 10^5, so N × K ≈ 10^{12} operations. After initial interpolation, an FFT implementation of Ab requires O(D^3 \log(D^3)) operations, where D is the maximum number of samples across the image dimensions. For the imaging example with dimensions 182 × 250 × 252, D = 252. For concreteness, assuming the constant multiple on the order of operations in the FFT is close to unity, FFT implementation of the operator A requires approximately 252^3 \log(252^3) ≈ 3.8 × 10^8 operations; so, FFT implementation results in computational savings greater than a factor of 2500.
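The savings factor can be checked directly; the snippet below assumes unit constants and a base-2 logarithm for the FFT count, matching the ≈ 3.8 × 10^8 figure above.

```python
import math

N = 1e5                        # average k-space samples per subaperture
K = 182 * 250 * 252            # number of voxels, about 1.1e7
D = 252                        # largest image dimension
direct = N * K                 # direct evaluation of Ab, ~1.1e12 operations
fft = D**3 * math.log2(D**3)   # FFT evaluation, ~3.8e8 operations
print(f"savings factor ~ {direct / fft:,.0f}")   # roughly 3000, i.e. > 2500
```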

Since the scattering centers in model (6.2) are anisotropic and polarization dependent, we apply static dictionary-based estimation to form an image for each narrow-angle subaperture and polarization and combine the images using equation (6.4). Recent approaches for joint reconstruction of multiple images [106] may also be applied to simultaneously reconstruct all polarizations for each subaperture.
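A sketch of the noncoherent combination step follows, assuming, as described in Section 6.5.1, that (6.4) amounts to a voxelwise maximum of subimage magnitudes over subapertures and polarizations.

```python
import numpy as np

def combine_subimages(images):
    """Voxelwise maximum magnitude across subaperture/polarization images.

    images : sequence of equal-shape complex (or magnitude) voxel arrays
    """
    return np.abs(np.stack(images)).max(axis=0)
```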

6.5 3D Imaging Results

We next present 3D SAR image reconstruction results from both the squiggle path

and the CSAR datasets, using the ℓp regularized LS imaging algorithm described in the previous section. We show both raw voxel reconstructed images, and smoothed surface fit reconstructions that are useful for visualization.

6.5.1 Squiggle Path Reconstructions

The k-space data from the path shown in Figure 6.1 are first partitioned into overlapping subapertures, each with azimuth angle extent of 10◦ and full elevation extent, and separated by 5◦ center azimuth increments. As an example, Figure 6.4 shows the magnitude of k-space data from the k-space subset in azimuth range of

[66◦, 76◦).

Each subset of data is contained in a bounding box with bandwidths in each dimension of (XBW, YBW, ZBW) = (142.80, 314.2, 285.6) rad/m. At these bandwidths, spatial samples are critically sampled with sample spacings of (∆x, ∆y, ∆z) = (0.044, 0.02, 0.022) meters in each respective dimension. Both the image reconstruction and k-space interpolation are performed on equi-spaced 182 × 250 × 252 grids. With this size grid, the spatial extent of the reconstructed images is [−4, 4) × [−2.5, 2.5) × [−2.77, 2.77) meters in the x, y, and z dimensions respectively. Each subset of k-space data is interpolated using nearest neighbor interpolation. In simulations not presented here, more accurate interpolations using both the Epanechnikov and Gaussian kernels were found to result in nearly identical images, but at much higher computational cost.

Figure 6.4: Magnitude of k-space data subset from azimuth range [66◦, 76◦). Lighter colors and smaller points are used for smaller magnitude samples; darker colors and larger points are used for larger magnitude samples. Axes units are in rad/m.

The squiggle path dataset is noiseless. To simulate the effect of radar measurement noise, we corrupt the k-space data with i.i.d. circular complex Gaussian noise with zero mean and variance σ_n^2 = 0.9. Real and imaginary parts of the k-space data have a mean of approximately zero and a variance, σ_s^2, of approximately 9; thus, the noise variance is chosen so that the signal-to-noise ratio (SNR) is 10 dB, where SNR in decibels is defined as 10 log(σ_s^2 / σ_n^2).
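A sketch of this corruption step under the conventions just stated (per-part signal variance σ_s^2, total complex noise variance σ_n^2) is:

```python
import numpy as np

def add_noise(w, snr_db, rng=None):
    """Add i.i.d. circular complex Gaussian noise at a target SNR in dB,
    with SNR defined as 10 log10(sigma_s^2 / sigma_n^2) as in the text."""
    rng = rng or np.random.default_rng(0)
    sigma_s2 = 0.5 * (np.var(w.real) + np.var(w.imag))   # ~9 for this dataset
    sigma_n2 = sigma_s2 / 10.0 ** (snr_db / 10.0)        # 0.9 at 10 dB
    noise = np.sqrt(sigma_n2 / 2.0) * (rng.standard_normal(w.shape)
                                       + 1j * rng.standard_normal(w.shape))
    return w + noise
```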

First, we show in Figure 6.5 a side view of a 'gold standard' benchmark 3D reconstructed backhoe image corresponding to the squiggle path dataset [100]. The image was formed using a windowed 3D inverse Fourier transform of a dense k-space dataset covering the azimuth and elevation range of the squiggle path; this dense data is given for every 1/14◦ in azimuth and elevation angle along an azimuth range of [65.5◦, 114.5◦] and elevation range of [17.5◦, 42.5◦]. Squiggle path k-space data is contained within this benchmark dataset and is very sparse with respect to it; see Fig. 6.1. The squiggle path dataset consists of only 1.29% of the benchmark data samples.

Figure 6.5: Benchmark reconstructed backhoe image using k-space data collected at every 1/14◦ in azimuth and elevation angle along an azimuth range of [65.5◦, 114.5◦] and elevation range of [17.5◦, 42.5◦]. Subfigure (a) displays the reconstructed image superimposed on the backhoe facet model, and (b) shows the reconstructed image without the facet model. Images from the pseudorandom dataset.

Second, we present image reconstructions from the squiggle path using standard

Fourier image reconstruction. A reconstructed squiggle path image viewed from the

side and top is shown in Figure 6.6. The top 25 dB magnitude voxels are displayed

in the image. One sees that the structure of the backhoe is highly smoothed and

distorted due to the high image sidelobes, and backhoe features, such as the front

scoop, are not well localized. Poor image quality is predicted from the subaperture

point spread functions of the sparse squiggle path; an example is shown in Figure 6.7

for one subaperture. The PSF is not well localized and exhibits significant spreading

and high sidelobes due to the sparseness of the data.

Figure 6.8 shows the side and top view of a reconstructed squiggle path backhoe

image using the ℓp regularized LS reconstruction algorithm in Section 6.4. The top

30 dB magnitude voxels are displayed. The images in Figure 6.8 were formed by first reconstructing 27 3D images, one from each subaperture and polarization; the images are the solution to the optimization problem (6.8). All images are reconstructed using a norm with p = 1 and sparsity parameter λ = 10, which are selected manually.

Automatic selection of λ is an ongoing area of research [29,33,61]. Here, p and λ were chosen empirically through visual inspection of images. Final images are formed by combining the subset images, taking the maximum over polarizations in addition to aspect angles according to (6.4).

In addition to the scattering point plots displayed in the top of Figure 6.8, it is possible to accentuate surfaces of 3D reconstructed images for visualization by smoothing image voxels; visualizations are shown in Figures 6.8(e) and 6.8(f). A large array of scientific visualization tools exists for accomplishing such a task, such as Maya and ParaView. Maya visualization examples are given in [107]. Here we apply a Gaussian kernel with diagonal covariance and equal standard deviation, σ, to smooth the voxels. Smoothed images are formed on a grid with the same dimensions as the original grid. To speed up the smoothing, the kernel is given a fixed support within some radius of the grid voxel being smoothed. In Figures 6.8(e) and 6.8(f), a standard deviation of σ = 0.4 m and grid radius of 3σ is used. Voxel magnitude is then displayed using color and transparency coding. Blue, transparent colors indicate low relative voxel magnitude and red, opaque colors indicate large relative voxel magnitude.
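A sketch of this smoothing step follows, using SciPy's separable Gaussian filter as a stand-in for the truncated kernel described above; the σ = 0.4 m value and the squiggle-path voxel spacings are carried over from the text, while the SciPy call itself is an implementation assumption.

```python
from scipy.ndimage import gaussian_filter

def smooth_voxels(vox_mag, sigma_m=0.4, spacing_m=(0.044, 0.02, 0.022)):
    """Gaussian smoothing of voxel magnitudes for surface visualization.
    sigma is specified in meters and converted to voxels per axis;
    truncate=3.0 limits the kernel support to a 3-sigma radius."""
    sigma_vox = [sigma_m / s for s in spacing_m]
    return gaussian_filter(vox_mag, sigma=sigma_vox, truncate=3.0)
```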

As can be seen from Figure 6.8, features in the sparse reconstructions are well-resolved. For example, the hood, roof, and front and back scoops are clearly visible, in the correct location, and do not exhibit the large sidelobe spreading seen in Figure 6.6.

The side panels of the driver cab are not visible, and the arm on the back scoop is not as prominent as in the benchmark in Figure 6.5, but most backhoe features in the benchmark backhoe image are also visible in the squiggle path reconstruction. There are a small number of artifacts in the image that do not lie close to the backhoe, namely below the front and back scoop. These artifacts appear to be due to multiple-bounce effects that are present in the given scattering data, rather than to an 'error' artifact of the reconstruction process. From the top view of the backhoe, the group of voxels at the top left also appear to be present in the benchmark image as viewed from an angle not shown in Figure 6.5; these voxels are also likely the result of multibounce from the back scoop and are not artifacts specific to squiggle path reconstruction.

Simulation results presented above were performed in MATLAB on a system with an Intel 3 GHz Dual Core Xeon processor and 4 GB of memory. Both the interpolation and sparse optimization in image reconstruction can be computationally intensive.


Figure 6.6: Reconstructed backhoe from standard Fourier image reconstruction using each subaperture image for an SNR of 10 dB. Lighter colors and smaller points are used for smaller magnitude voxels; darker colors and larger points are used for larger magnitude voxels. Subfigures (a) and (b) show a side view of the reconstructed image with and without the backhoe facet model superimposed, respectively; subfigures (c) and (d) show top views of the reconstructed image with and without the backhoe facet model superimposed, respectively. The top 25 dB magnitude image voxels are displayed.

Figure 6.7: Magnitude of PSF from the squiggle path over azimuth range [66◦, 76◦). Light colors and small points are used for small magnitude voxels; darker colors and large points are used for large magnitude voxels. Axis units are in meters.

The nearest-neighbor interpolation method is fast compared to ℓp regularized LS optimization and took less than 25 seconds to run on each data subaperture; sparse optimization computations took 17–26 minutes to run on each subaperture. Although not investigated here, it may be possible to alter stopping criterion tolerances in the algorithm to lower computation times without adversely affecting reconstructed images.

6.5.2 Multipass CSAR Reconstructions

We next consider 3D vehicle reconstructions from measured X-band CSAR data taken over an urban area. Figure 6.9 shows a 2D radar ground image formed from one pass of the CSAR scene using the filtered backprojection algorithm [5,6]. This is the scene of a parking lot with several vehicles, including a calibration tophat and a

Toyota Camry. Figure 6.10 shows photographs of the tophat and Camry.


Figure 6.8: Reconstructed backhoe image using regularized LS for an SNR of 10 dB. The top 30 dB magnitude image voxels are displayed. In (a) through (d), lighter colors and smaller points are used for smaller magnitude voxels; darker colors and larger points are used for larger magnitude voxels. In (e) and (f), smoothed visualizations are displayed. The left column of subfigures shows a side view of the reconstructed image; in (a), the backhoe facet model is superimposed. The right column of subfigures shows top views of the reconstructed image; in (b), the backhoe facet model is superimposed.

Radar flight location information for the GOTCHA dataset contains sensor location errors. These errors are corrected using a prominent-point (PP) autofocus [6] solution provided with the GOTCHA dataset; in addition, spotlighting is used to reduce computation and memory requirements; these processes are discussed in more detail in [87]. We form 3D reconstructions of two spotlighted areas of the CSAR

GOTCHA scene, centered on the tophat and on the Toyota Camry. For the ℓp regularized LS reconstructions, 5◦ subapertures from 0◦ to 360◦ with no overlap were used, for a total of 72 subaperture images that are combined by (6.4). Reconstructed ℓp regularized LS image voxels are spaced at 0.1 m in all three dimensions. The dimensions of the reconstructed tophat and Camry images in (x, y, z) are [−2.5, 2.5) × [−2.5, 2.5) × [−2.5, 2.5) and [−5, 5) × [−5, 5) × [−5, 5) meters, respectively. These dimensions define the k-space bandwidth of the bounding box and grid used for nearest-neighbor interpolation. The bounding box bandwidth used in both images is 62.8318 rad/m in all dimensions. The interpolation grid inside the bounding box consists of 50 samples for the tophat and 100 samples for the Camry in each dimension. As before, we chose p and λ manually to generate images that produce qualitatively good reconstructions.

Figure 6.11 shows 3D reconstructions of the tophat and Camry formed using traditional Fourier reconstruction techniques on each interpolated subaperture dataset, and then by combining the subaperture reconstructions using (6.4). The VV polarization channel is used, and only the top 20 dB of voxels are shown, with lighter colors and smaller points indicating lower magnitude scattering and larger points with darker color indicating larger magnitude scattering. These images are very similar to ones generated using filtered backprojection processing. The images have poor

Figure 6.9: 2D SAR image of the GOTCHA scene. Image from the GOTCHA dataset.

(a) Tophat (b) Camry

Figure 6.10: Photographs from the GOTCHA scene. Images from the GOTCHA dataset.

(a) Tophat (b) Camry

Figure 6.11: Traditional Fourier images

resolution, especially in the slant plane height directions, due to the sparse support

of k-space data in elevation angle; the support window of this collection geometry

results in a point spread function with spreading and high sidelobes [108].

Figure 6.12 shows three different views of the tophat 3D reconstruction using the

ℓp regularized LS approach. These reconstructions use the VV polarization data, with

parameter settings of λ =0.01 and p = 1, and, in contrast to the Fourier images, the

top 40 dB of voxels are shown. The reconstruction in Figure 6.12 clearly shows the

circular ‘corner’ between the base and cylinder of the tophat (see Figure 6.10(a)), and

this scattering is well-localized to the correct location. From the reconstruction, the

radius of the tophat is seen to be approximately 1 m, agreeing with the true radius

of the tophat. Furthermore, there are no visible artifacts in the image.

Figure 6.13 shows ℓp regularized LS reconstructions of the Toyota Camry for two

polarizations (VV and HH). The parameters λ = 10 and p = 1 are used in the recon-

structions, and the top 40 dB of scattering centers are shown. To highlight vehicle

structure, images are displayed using the smoothing visualization process as described

in Section 6.5.1, with a Gaussian kernel standard deviation of σ = 0.1 m. An example of a non-smoothed scatter point plot of the Camry is given in Figure 6.14. Figures 6.13(g) through 6.13(i) show combined HH and VV polarization images formed by taking the maximum over polarizations in (6.4) in addition to aspect angle. In all of the images, the outline of the Camry is clearly visible. The upper, curved line is direct scattering from the vehicle itself, whereas the lower curve at 0 m elevation is scattering from the virtual dihedral made up of the ground and the vertical vehicle sides, front, and back. The HH images appear to show more scattering from this virtual dihedral than the VV images, as there is a more pronounced line below the car; there is also some scattering above the windshield in the HH image, which may be an artifact, and does not appear in the VV image. The apparent artifacts in the VV polarization below the front of the car and to the side of the car in the 3D view are scattering from an adjacent vehicle that is not completely removed by the spotlighting process.

In Figure 6.14, we illustrate the aspect dependence of the proposed ℓp regularized

LS non-coherent imaging process. Whereas previous figures were color-coded by voxel magnitude, Figure 6.14 is color-coded by azimuth angle. The color of a voxel indicates the center azimuth angle of the subaperture image that it came from. The circle at the base of the Camry shows the azimuth angle of the aircraft with respect to the

Camry.

Computations for Figures 6.12-6.14 were performed in MATLAB on a system with an Intel 2.8 GHz Pentium D processor and 2 GB of memory. Interpolation time with nearest-neighbor interpolation was negligible; sparse optimization computations took

3–5 minutes on each subaperture.

122 (a) 3D view (b) Side view

(c) Top view

Figure 6.12: ℓp regularized LS tophat reconstructions with λ =0.01 and p = 1. The top 40 dB magnitude voxels are shown.

(a) 3D view, VV polarization (b) Side view, VV polarization (c) Top view, VV polarization

(d) 3D view, HH polarization (e) Side view, HH polarization (f) Top view, HH polarization

(g) 3D view, VV and HH polarization (h) Side view, VV and HH polarization (i) Top view, VV and HH polarization

Figure 6.13: ℓp regularized LS Camry reconstructions with λ = 10 and p = 1. The top 40 dB magnitude voxels are shown.

(a) 3D view, VV and HH polarization

(b) Top view, VV and HH polarization

Figure 6.14: ℓp regularized LS Camry azimuth angle color-coded reconstructions using combined VV and HH polarizations with λ = 10 and p = 1. The top 40 dB magnitude voxels are shown. Colorbar units are in degrees.

(a) 3D view, VV polarization (b) Side view, VV polarization (c) Top view, VV polarization

(d) 3D view, HH polarization (e) Side view, HH polarization (f) Top view, HH polarization

(g) 3D view, VV and HH polarization (h) Side view, VV and HH polarization (i) Top view, VV and HH polarization

Figure 6.15: 3D Camry reconstruction using wide-angle Tomo-SAR imaging. The top 20 dB magnitude scatterers are shown.

We now compare images formed by the ℓp regularized imaging algorithm with those formed using the wide-angle Tomo-SAR algorithm of [90]. In Tomo-SAR imaging, reconstructed scattering centers are not constrained to lie on a grid in the height dimension. To compare this imaging method with ℓp regularized LS reconstructed images, data is first interpolated to a grid with 0.1 m voxel spacing in each dimension; this is the same spacing used in ℓp regularized LS reconstructions. A Gaussian kernel with standard deviation of σ = 0.1 m is used for interpolation.

Figure 6.15 shows the results of the Tomo-SAR approach applied to the Camry data after interpolation. The top 20 dB points are shown, in contrast to the top 40 dB in ℓp regularized LS imaging. The VV and HH polarization images in Figure 6.15(g) through 6.15(i) are formed by combining the interpolated VV and HH polarization images as performed in ℓp regularized LS reconstructions. Scattering is assumed to be above the ground plane in calculations; so, unlike in the ℓp regularized LS reconstruction, there are no non-zero voxels below the vehicle. As in the ℓp regularized LS reconstruction, a set of 72 subaperture image sets were formed, each with 5◦ azimuth extent, and the image-domain subaperture reconstructions for all polarizations were combined using (6.4). The wide-angle Tomo-SAR algorithm was also implemented in

MATLAB and took less than 1 minute to process each subaperture.

In comparing the ℓp regularized LS and Tomo-SAR reconstructions, some qualitative differences are seen. Most notably, the Tomo-SAR-based reconstructions are more filled than the regularized LS reconstructions. This is in large part due to the way in which sparsity is enforced in the two techniques; the ℓp regularized LS method imposes sparsity in the full 3D space, while Tomo-SAR-based methods obtain standard (non-sparse) 2D images and develop sparse reconstructions only in the

1D height dimension. The 2D image downrange and crossrange resolutions are ap-

proximately 0.3 meters and 0.2 meters, respectively; so, a single bright scattering point will appear as a 0.3 m × 0.2 m flat disk, tilted at 45◦. For 3D visualizations, we find the more filled Tomo-SAR reconstructions to be more easily interpretable.

For automated post-processing such as automatic target recognition that treats the

reconstructed voxels as features, the smaller number of ’features’ provided by the ℓp

regularized LS approach is likely preferable, since it results in less correlated features

than in the Tomo-SAR technique. In comparing computations, we see that the ℓp approach requires more computation time for 3D reconstructions than the Tomo-SAR approach does, in the present algorithmic implementations. It should be noted that we have not undertaken a dedicated effort at computation optimization, and different relative computation times may be achieved with additional optimization.

6.6 Conclusions

We have examined the use of scattering sparsity to improve 3D SAR reconstruction from sparse data collection geometries. We have formulated a wide-angle regularized least-squares based 3D imaging algorithm based on the premise that radar scattering is sparse in the reconstructed 3D spatial domain. The algorithm considers anisotropic scattering behavior of objects over wide aspect angles, but uses a GLRT-based approach to noncoherently combine independently calculated subaperture images to obtain a wide-angle reconstruction. Regularized least-squares wide-angle 3D imaging is effective at significantly reducing the large sidelobe artifacts that are present in traditional Fourier-based or backprojection reconstruction methods.

We presented 3D image reconstructions using both synthetic backscatter mea-

surements of a construction backhoe and Circular SAR X-band radar measurements

of an urban ground scene. In the backhoe case, we presented 3D reconstructions

using a pseudorandom “squiggle” flight path that is sparse over a wide-angle aper-

ture in both azimuth and elevation; the sparse flight path includes only 1.29% of the

filled-aperture data in the same azimuth-elevation sector. The resulting reconstruction exhibits better resolution and far lower sidelobes than a conventionally-formed reconstruction, and it compares favorably with a reconstruction obtained using filled aperture data in the same azimuth-elevation sector. In addition, we presented reconstruction results for two ground objects (a calibration tophat and a Toyota Camry) from measured Circular SAR data. The 3D reconstructions clearly show the shape of the ground objects, with significantly lower sidelobe artifacts than those obtained in Fourier or backprojection imagery. In comparison with the Tomo-SAR imaging method, regularized LS imaging can be more memory and computationally intensive; however, it can be applied to arbitrary collection geometries, and it produces sparser images that may be better for applications such as automatic target recognition.

Chapter 7: Conclusion

The additive component model is a linear combination of parametric functions, and is used as a model in many application areas. In this model, the linear coefficients, function parameters, and number of parametric functions are estimated from measured data. Using a classical statistical approach, these estimates are determined by solving a continuous parameter estimation problem to determine the linear coefficients and parameters and by solving a discrete problem to estimate the model order.

In this dissertation, we analyzed an alternative dictionary-based method for additive component model estimation that jointly estimates parameters and model order.

This dictionary-based estimation method is general, does not require domain specific knowledge, and can be applied to any additive component model. Our analysis appears to be the first to examine sparse algorithm setting selection and performance from a joint model order and parameter estimation perspective. We showed that dictionary-based estimation methods are capable of achieving estimation performance comparable to the Cramér Rao lower bound and traditional benchmark estimation algorithms.

Dictionary-based estimation methods sample a continuous parameter space at discrete locations, forming a dictionary from the model’s parametric function. A

dictionary subset selection algorithm is used to select a sparse linear combination of

columns from the dictionary that model the collected data well. Model order and

parameter estimates are encoded in how many and which columns the sparse recon-

struction algorithm selected. Direct implementation of dictionary subset selection is

combinatoric and intractable. Sparse reconstruction algorithms are non-combinatoric

algorithms that are capable of reconstructing sparse objects in practical computation

time; hence, we used a sparse reconstruction algorithm for dictionary subset selection.

Since a sparse reconstruction algorithm is at the center of a dictionary-based es-

timation algorithm, and its performance directly affects estimation performance, we

investigated the convergence speed and solution quality of several popular sparse re-

construction algorithms in Chapter 3. Our analysis considered sparse reconstruction

performance for highly correlated dictionaries. This performance analysis is related

to work in compressive sensing, where reconstruction error on sparse algorithms can

be bounded, given that sparsity and dictionary correlation conditions, such as RIP,

hold. Most existing performance analysis of sparse reconstruction algorithms does

assume that compressive sensing conditions on correlation and sparsity hold, and

proves results on the norm of linear coefficient error. In contrast, we examined pre-

diction error and average sparsity of the solution, metrics that more directly align

with model estimation. We showed empirically that algorithms which solve an ℓp regularized LS problem with p < 1 converge faster to a sparse solution than algorithms with p = 1. Based on these results, we chose an ℓp regularized LS algorithm based on

a majorization minimization (MM) algorithm for dictionary-based estimation. The

results of this chapter showed, quantitatively, that when p is selected appropriately,

sparse reconstruction algorithms still perform well, even when dictionary correlation

is high and compressive sensing conditions are violated.

The analytical rate of convergence of ℓp regularized LS algorithms under highly correlated dictionaries, and the solutions they converge to, are areas for future inquiry. We hypothesize that when p = 1, the cost surface of the sparse algorithm is nearly flat

around minima, and that decreasing p leads to a larger gradient about the minima.

In Chapter 4 we compared static dictionary-based estimation with traditional parameter estimation techniques. There are three algorithm settings in the dictionary approach that affect estimation performance: parameter sample spacing, p of the ℓp quasi-norm, and the sparsity setting, λ. In the static dictionary-based estimation approach, parameters are finely sampled to minimize quantization error, and as a result, the corresponding dictionary matrix columns are highly correlated. We selected the sparsity setting λ automatically by relating it to the classic information-based parametric model order selection method BIC. In this chapter and in Chapter 5.3, we showed that as parameter spacing decreases, and hence intercolumn correlation increases, an ℓp regularization term with decreasing p < 1 should be used; selecting p in this way results in a sparse solution for accurate model order selection. If p is too large for the level of intercolumn correlation, estimation performance degrades. In particular, although p = 1 is commonly used in sparse reconstruction algorithms and results in a convex dictionary-based estimation algorithm, using p = 1 with highly correlated dictionaries will result in poor estimation performance. Examples of complex exponential estimation were presented for frequency parameters that are both closely-spaced and well-separated with respect to Rayleigh resolution. We showed that, although the

static dictionary-based estimation algorithm is a general estimation algorithm for ad-

ditive component models, it performs as well or better than model specific estimation

methods. Specifically, the static dictionary-based estimation algorithm compares fa-

vorably to the parametric method ESPRIT using BIC model order estimation, and

outperforms ESPRIT with BIC for closely-spaced frequencies.

In practice, continuous parameters do not lie on a sampled parameter grid. Quan-

tization error of the selected static dictionary grid can lead to parameter estimation

bias. In Chapter 5, we presented two new dynamic dictionary-based algorithms for

model order and parameter estimation in additive component models. In dynamic

dictionary-based algorithms, the parameter grid adapts to the data, and is able to

overcome quantization error. Each algorithm includes a method that inhibits closely

spaced parameter samples, limiting intercolumn correlation; one algorithm utilizes

a penalty on parameter distance and the other uses an explicit constraint on pa-

rameter spacing. As in static dictionary-based estimation, the dynamic algorithms

include a sparsity setting λ and an ℓp regularization setting p; in place of the static parameter

spacing setting in static algorithms, the dynamic algorithms have inhibition settings

that prevent parameter samples from becoming too closely spaced. We selected λ

using the information-criterion method of Chapter 4 and chose p and the inhibition settings based on their relation to model order error. The dynamic algorithms were tested on complex exponential and decaying exponential models. Parameter estimates overcame estimation bias induced by quantization error in static dictionaries, and performance was comparable to the unbiased CRB and a genie ML estimator over a large range of SNR. The penalized dynamic dictionary algorithm exhibited the best

model order performance of the dictionary-based estimation algorithms and was competitive with genie ML. Based on dictionary estimation simulation results, it appears that dictionary-based estimation methods may be better than traditional methods at solving non-convex problems with local minima. For example, in non-convex optimization problems, the ML algorithm may converge to poor local solutions if not initialized in the region of attraction of the true parameters; this can be shown to occur in the complex exponential model. In the dictionary-based estimation algorithms, initial parameter samples are distributed over the whole parameter space, instead of at one initial parameter sample as in ML. We conjecture that the ℓp regularized LS algorithm performs more of a global search, selecting a sample in the local region of attraction that fits the signal best.

For a complex exponential model, superresolution capabilities of static dictionary-based estimation were explored in Chapter 4, and performance under wide dynamic range was examined in [109]. In the dynamic examples presented here, frequencies were well-separated in parameter space with respect to Rayleigh resolution, and all signal amplitudes were the same. Future research in dynamic dictionary-based estimation could explore performance when frequencies are more closely spaced than a Rayleigh resolution bin and in high dynamic range regimes.

The scattering centers in high frequency radar can be modeled as complex exponential signals. In Chapter 6, we applied dictionary-based estimation to the problem of 3D SAR scattering center location imaging. Collected radar data can be interpreted as samples in 3D k-space, and traditional image reconstruction is modeled as a Fourier inversion problem. Practical flight paths generate a sparse, non-regular, collection of

SAR data in k-space, and Fourier inversion methods fail to produce well-resolved 3D

images. In scenes with man-made objects, the number of dominant scattering centers is small; so, dictionary-based estimation methods can be used to estimate their positions. We presented wide-angle 3D SAR images generated from static dictionary-based estimation using X-band SAR data from both a pseudorandom flight path of a synthetic construction backhoe and circular SAR (CSAR) measurements from an urban ground scene. Backhoe images were compared with traditional Fourier images, using both the collected data and a filled data collection aperture; CSAR images were compared with traditional Fourier images and a multi-pass imaging method,

Tomo-SAR. We showed that dictionary-based estimation images are better resolved than Fourier images using the actual flight path and comparable to the filled aperture images, although the filled apertures use substantially more data. Compared to the

Tomo-SAR method, dictionary-based estimation produced 3D images that appeared sparser. Sparser images may be more desirable in applications such as automatic target recognition; the proposed 3D SAR algorithm is also well suited for persistent sensing applications, where data is collected over a wide-angle, possibly non-regular flight path.

For 3D parameter estimation problems, such as in SAR, static dictionaries can become very large and memory intensive if the parameter space is sampled equally in each dimension. The number of dictionary columns in dynamic dictionary-based estimation may be determined by the maximum allowable model order, which will result in a much smaller dictionary. An extension of this work could compare speed and performance of static and dynamic algorithms in 3D SAR imaging or multidimensional problems in general. Although related to model order selection, scattering center detection performance is typically used as a quantitative performance metric in

SAR, instead of model order. An initial quantitative analysis of the relation between resolution and scattering center detection in sparse 3D radar imaging was discussed in [109]. This work can be extended to dynamic dictionary-based estimation, and in general, further research on metrics for comparing 3D SAR imagery is warranted.

In summary, this dissertation related dictionary-based estimation and classical estimation methods. We presented several algorithms for dictionary-based estimation and discussed the similarities and differences between dictionary-based estimation and compressive sensing. Dictionary-based estimation is a general method that can be used to jointly estimate model order and parameters in any additive component model. Despite this generality, we showed that dictionary-based estimation methods perform well with respect to the unbiased Cramér Rao lower bound, and are competitive with classic model order and parameter estimation techniques. Finally, in the specific application of 3D SAR imaging, we showed that dictionary-based estimation is capable of producing well-resolved images from sparse, irregular data collections.

Bibliography

[1] Z. Cho, J. Jones, and M. Singh, Foundations of medical imaging. New York: Wiley, 1993.

[2] G. Wright, “Magnetic resonance imaging,” Signal Processing Magazine, IEEE, vol. 14, no. 1, pp. 56 –66, Jan. 1997.

[3] M. Lustig, D. Donoho, J. Santos, and J. Pauly, “Compressed sensing MRI,” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 72 –82, 2008.

[4] H. Hiriyannaiah, “X-ray computed tomography for medical imaging,” IEEE Signal Processing Magazine, vol. 14, no. 2, pp. 42 –59, Mar. 1997.

[5] C. V. Jakowatz Jr., D. E. Wahl, P. H. Eichel, D. C. Ghiglia, and P. A. Thomp- son, Spotlight-Mode Synthetic Aperture Radar: A Signal Processing Approach. Boston: Kluwer Academic Publishers, 1996.

[6] W. G. Carrara, R. M. Majewski, and R. S. Goodman, Spotlight Synthetic Aper- ture Radar: Signal Processing Algorithms. Artech House, 1995.

[7] P. Stoica and K. Sharman, “Maximum likelihood methods for direction-of- arrival estimation,” IEEE Transactions on Acoustics, Speech and Signal Pro- cessing, vol. 38, no. 7, pp. 1132 –1143, Jul. 1990.

[8] P. Stoica and R. Moses, Spectral Analysis of Signals. New Jersey: Pearson Prentice Hall, 2005.

[9] M. Hurst and R. Mittra, “Scattering center analysis via Prony’s method,” IEEE Transactions on Antennas and Propagation, vol. 35, no. 8, pp. 986 – 988, Aug. 1987.

[10] L. C. Potter and R. L. Moses, “Attributed scattering centers for SAR ATR,” IEEE Transactions on Image Processing, vol. 6, no. 1, pp. 79–91, 1997.

[11] S. Som, L. C. Potter, R. Ahmad, D. S. Vikram, and P. Kuppusamy, “EPR oximetry in three spatial dimensions using sparse spin distribution,” Journal of Magnetic Resonance, vol. 193, pp. 210–217, Aug. 2008.

[12] X. S. Xie, P. J. Choi, G. W. Li, N. K. Lee, and G. Lia, “Single-Molecule Approach to Molecular Biology in Living Bacterial Cells,” Annual Review of Biophysics, vol. 37, no. 1, pp. 417–444, 2008.

[13] H. V. Poor, An introduction to signal detection and estimation. New York: Springer-Verlag, 1994.

[14] P. Stoica and Y. Selen, “Model-order selection: a review of information criterion rules,” IEEE Signal Processing Magazine, vol. 21, no. 4, pp. 36–47, 2004.

[15] J. F. Claerbout and F. Muir, “Robust modeling of erratic data,” Geophysics, vol. 38, no. 5, pp. 826–844, Oct. 1973.

[16] D. Donoho and P. Stark, “Uncertainty principles and signal recovery,” SIAM J. Appl. Math., vol. 49, pp. 906–931, 1989.

[17] J. A. Tropp, “Just relax: Convex programming methods for subset selection and sparse approximation,” Univ. Texas at Austin, Tech. Rep., Feb. 2004.

[18] E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.

[19] D. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, April 2006.

[20] D. L. Donoho, M. Elad, and V. N. Temlyakov, “Stable recovery of sparse overcomplete representations in the presence of noise,” IEEE Transactions on Information Theory, vol. 52, no. 1, pp. 6–18, 2006.

[21] E. Candès, J. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Comm. Pure Appl. Math., vol. 59, no. 8, pp. 1207–1223, 2006.

[22] R. Saab, R. Chartrand, and O. Yilmaz, “Stable sparse approximations via nonconvex optimization,” in IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, Nevada, April 2008.

[23] R. A. DeVore, “Deterministic constructions of compressed sensing matrices,” J. Complex., vol. 23, pp. 918–925, August 2007.

[24] D. C. Balcan and M. S. Lewicki, “Point coding: Sparse image representation with adaptive shiftable-kernel dictionaries,” in SPARS 2009, April 06–09 2009.

[25] I. Tošić, I. Jovanović, P. Frossard, M. Vetterli, and N. Durić, “Ultrasound tomography with learned dictionaries,” in IEEE ICASSP 2010, Mar. 2010, pp. 5502–5505.

[26] T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sensing,” Appl. Computat. Harmon. Anal., vol. 27, no. 3, pp. 265–274, 2009.

[27] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Img. Sci., vol. 2, pp. 183–202, March 2009.

[28] D. Needell and J. A. Tropp, “Cosamp: iterative signal recovery from incomplete and inaccurate samples,” Commun. ACM, vol. 53, pp. 93–100, December 2010.

[29] Ö. Batu and M. Çetin, “Hyper-parameter selection in non-quadratic regularization-based radar image formation,” in SPIE Defense and Security Symposium, Orlando, FL., March 17–20 2008.

[30] P. Boufounos, M. F. Duarte, and R. G. Baraniuk, “Sparse signal reconstruction from noisy compressive measurements using cross validation,” in IEEE 14th Workshop on Statistical Signal Processing 2007, (SSP ’07), August 2007, pp. 299–303.

[31] M. Çetin and W. Karl, “Feature-enhanced synthetic aperture radar image formation based on nonquadratic regularization,” IEEE Trans. on Image Processing, vol. 10, no. 4, pp. 623–631, April 2001.

[32] I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm,” IEEE Trans- actions on Signal Processing, vol. 45, no. 3, pp. 600–616, 1997.

[33] D. Malioutov, M. Çetin, and A. S. Willsky, “A sparse signal reconstruction perspective for source localization with sensor arrays,” IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 3010–3022, Aug. 2005.

[34] V. Cevher, M. F. Duarte, and R. G. Baraniuk, “Distributed target localiza- tion via spatial sparsity,” in 16th European Signal Processing Conference 2008 (EUSIPCO-2008), August 2008.

[35] M. Herman and T. Strohmer, “Compressed sensing radar,” in IEEE Radar Conference, 2008 (RADAR ’08), 2008, pp. 1–6.

[36] A. Fannjiang and W. Liao, “Coherence-pattern-guided compressive sensing with unresolved grids,” 2012, preprint.

[37] M. Duarte and R. Baraniuk, “Spectral compressive sensing,” Applied and Com- putational Harmonic Analysis, 2012, submitted.

[38] S. DeGraaf, “3-D fully polarimetric wide-angle superresolution-based SAR imaging,” in Thirteenth Annual Adaptive Sensor Array Processing Workshop (ASAP 2005). Lexington, M.A.: MIT Lincoln Laboratory, June 7–8 2005.

[39] C. V. Jakowatz, Jr. and D. Wahl, “Three-dimensional tomographic imaging for foliage penetration using multiple-pass spotlight-mode SAR,” in Thirty-Fifth Asilomar Conference on Signals, Systems and Computers, vol. 1, 2001, pp. 121–125.

[40] T. Kragh and A. Kharbouch, “Monotonic iterative algorithms for SAR image restoration,” in IEEE 2006 Int. Conf. on Image Processing, October 2006, pp. 645–648.

[41] R. Moses, L. Potter, and M. C¸etin, “Wide angle SAR imaging,” in Algorithms for Synthetic Aperture Radar Imagery XI. Orlando, FL.: SPIE Defense and Security Symposium, April 12–16 2004.

[42] K. Knaell, “Three-dimensional SAR from curvilinear apertures,” in Proceedings of the 1996 IEEE National Radar Conference, May 1996, pp. 220 –225.

[43] J. Li, Z. Bi, Z.-S. Liu, and K. Knaell, “Use of curvilinear SAR for three-dimensional target feature extraction,” IEE Proceedings — Radar, Sonar and Navigation, vol. 144, no. 5, pp. 275–283, October 1997.

[44] S. Xiao and D. C. Munson, “Spotlight-mode SAR imaging of a three- dimensional scene using spectral estimation techniques,” in Proceedings of IGARSS 98, vol. 2, 1998, pp. 624–644.

[45] Z. She, D. Gray, R. Bogner, and J. Homer, “Three-dimensional SAR imaging via multiple pass processing,” in IEEE International Geoscience and Remote Sensing Symposium, 1999. (IGARSS ’99), vol. 5, 1999, pp. 2389 –2391.

[46] A. Reigber and A. Moreira, “First demonstration of airborne SAR tomogra- phy using multibaseline L-band data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 38, no. 5, pp. 2142 –2152, September 2000.

[47] G. Fornaro, F. Serafino, and F. Soldovieri, “Three-dimensional focusing with multipass SAR data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 3, pp. 507 – 517, March 2003.

[48] F. Lombardini, M. Montanari, and F. Gini, “Reflectivity estimation for multi- baseline interferometric radar imaging of layover extended sources,” IEEE Transactions on Signal Processing, vol. 51, no. 6, pp. 1508 – 1519, June 2003.

[49] F. Lombardini and A. Reigber, “Adaptive spectral estimation for multibase- line SAR tomography with airborne L-band data,” in IEEE International Geo- science and Remote Sensing Symposium (IGARSS), vol. 3, July 2003, pp. 2014 – 2016.

[50] F. Gini and F. Lombardini, “Multibaseline cross-track SAR interferometry: a signal processing perspective,” IEEE AES Magazine, vol. 20, no. 8, pp. 71–93, Aug 2005.

[51] S. Tebaldini, “Single and multipolarimetric SAR tomography of forested ar- eas: A parametric approach,” IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 5, pp. 2375 – 2387, May 2010.

[52] X. X. Zhu and R. Bamler, “Tomographic SAR inversion by l1 -norm regular- ization:the compressive sensing approach,” IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 10, pp. 3839 –3846, October 2010.

[53] ——, “Very high resolution spaceborne SAR tomography in urban environ- ment,” IEEE Transactions on Geoscience and Remote Sensing, vol. PP, no. 99, pp. 1 –13, 2010.

[54] M. Herman and T. Strohmer, “General deviants: An analysis of perturbations in compressed sensing,” IEEE J. Sel. Topics Signal Processing, vol. 4, no. 2, pp. 342–349, April 2010.

[55] Y. Chi, L. Scharf, A. Pezeshki, and A. Calderbank, “Sensitivity to basis mis- match in compressed sensing,” IEEE Transactions on Signal Processing, vol. 59, no. 5, pp. 2182–2195, May 2011.

[56] E. van den Berg and M. P. Friedlander, “Probing the pareto frontier for basis pursuit solutions,” SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, 2008.

[57] D. Hunter and K. Lange, “A tutorial on MM algorithms,” The American Statis- tician, vol. 58, pp. 30 – 37, 2004.

[58] T. Goldstein and S. Osher, “The split Bregman method for L1 regularized problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 2, pp. 323–343, 2009.

[59] R. Chartrand, “Fast algorithms for nonconvex compressive sensing: MRI recon- struction from very few data,” in Proceedings of the Sixth IEEE international conference on Symposium on Biomedical Imaging, ser. ISBI’09, 2009, pp. 262– 265.

[60] J. Yang, Y. Zhang, and W. Yin, “A fast TVL1-L2 minimization algorithm for signal reconstruction from partial fourier data,” Rice University, Tech. Rep., 2008. [Online]. Available: http://www.caam.rice.edu/tech reports/2008/ TR08-27.pdf

[61] C. Austin, R. Moses, J. Ash, and E. Ertin, “On the relation between sparse reconstruction and parameter estimation with model order selection,” IEEE J. Sel. Topics Signal Processing, vol. 4, no. 3, pp. 560–570, June 2010.

[62] J. Tropp and A. Gilbert, “Signal recovery from random measurements via or- thogonal matching pursuit,” IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.

[63] M. Figueiredo, R. Nowak, and S. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, pp. 586–597, December 2007.

[64] S. Chen, D. Donoho, and M. Saunders, “Atomic decomposition by basis pur- suit,” SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.

[65] C. D. Austin, E. Ertin, J. N. Ash, and R. L. Moses, “On the relation between sparse sampling and parametric estimation,” in IEEE 13th DSP Workshop and 5th Sig. Proc. Edu. Workshop (DSP/SPE 2009), Jan. 4–7 2009, pp. 387–392.

[66] D. Johnson and D. Dudgeon, Array Signal Processing–Concepts and Techniques. Prentice Hall, Englewood Cliffs, NJ, 1992.

[67] P. Stoica and T. Soderstrom, “Statistical analysis of MUSIC and subspace ro- tation estimates of sinusoidal frequencies,” IEEE Transactions on Signal Pro- cessing, vol. 39, no. 8, pp. 1836–1847, Aug. 1991.

[68] W. M. Steedly, C.-H. J. Ying, and R. L. Moses, “Resolution bound and detection results for scattering centers,” in Int. Conf. Radar, Brighton, U.K., 1992, pp. 518–521.

[69] M. P. Clark, “On the resolvability of normally distributed vector parameter estimates,” IEEE Transactions on Signal Processing, vol. 43, no. 12, pp. 2975– 2981, Dec. 1995.

[70] S. T. Smith, “Statistical resolution limits and the complexified Cram´er-Rao bound,” IEEE Transactions on Signal Processing, vol. 53, no. 5, pp. 1597–1609, May 2005.

[71] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, “Model-based compressive sensing,” IEEE Transactions on Information Theory, vol. 56, no. 4, pp. 1982 –2001, 2010.

[72] Y. Pati, R. Rezaiifar, and P. Krishnaprasad, “Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition,”

in Asilomar Conference on Signals, Systems and Computers, Nov. 1993, pp. 40–44.

[73] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Royal. Statist. Soc B., vol. 58, no. 1, pp. 267–288, 1996.

[74] G. H. Golub and C. F. Van Loan, Matrix computations (3rd ed.). Baltimore, MD, USA: Johns Hopkins University Press, 1996.

[75] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Unversity Press, 2009.

[76] M. A. Branch, T. F. Coleman, and Y. Li, “A subspace, interior, and con- jugate gradient method for large-scale bound-constrained minimization prob- lems,” SIAM J. Sci. Comput., vol. 21, pp. 1–23, August 1999.

[77] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex program- ming, version 1.21,” http://cvxr.com/cvx/, Apr. 2011.

[78] C. Austin, J. Ash, and R. Moses, “Parameter estimation using sparse recon- struction with dynamic dictionaries,” in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 11), May 22–27 2011, pp. 2852 – 2855.

[79] D. Malioutov, M. Çetin, and A. Willsky, “Homotopy continuation for sparse signal representation,” in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 05), vol. 5, March 2005, pp. 733–736.

[80] G. Titi, E. Zelnio, K. Naidu, R. Dilsavor, M. Minardi, N. Subotic, R. Moses, L. Potter, L. Lin, R. Bhalla, and J. Nehrbass, “Visual SAR using all degrees of freedom,” in Proc. MSS Tri-Service Radar Symposium, Albuquerque, NM, June 21-25 2004.

[81] S. Axelsson, “Beam characteristics of three-dimensional SAR in curved or ran- dom paths,” IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 10, pp. 2324 – 2334, October 2004.

[82] O. Frey, C. Magnard, M. Ruegg, and E. Meier, “Focusing of airborne synthetic aperture radar data from highly nonlinear flight tracks,” IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 6, pp. 1844 –1858, June 2009.

[83] M. Stuff, M. Biancalana, G. Arnold, and J. Garbarino, “Imaging moving objects in 3D from single aperture synthetic aperture radar,” in Proc. IEEE 2004 Radar Conference, April 26–29 2004, pp. 94–98.

[84] R. Moses and L. Potter, “Noncoherent 2D and 3D SAR reconstruction from wide-angle measurements,” in Thirteenth Annual Adaptive Sensor Array Processing Workshop (ASAP 2005). Lexington, M.A.: MIT Lincoln Laboratory, June 7–8 2005.

[85] E. Ertin, L. Potter, and R. Moses, “Enhanced imaging over complete circular apertures,” in Fortieth Asilomar Conf. on Signals, Systems and Computers (ACSSC 06), Oct 29 – Nov. 1 2006, pp. 1580–1584.

[86] C. D. Austin and R. L. Moses, “Wide-angle sparse 3D synthetic aperture radar imaging for nonlinear flight paths,” in IEEE National Aerospace and Electronics Conference (NAECON) 2008, July 16–18 2008, pp. 330–336.

[87] C. D. Austin, E. Ertin, and R. L. Moses, “Sparse multipass 3D SAR imaging: Applications to the GOTCHA data set,” in Algorithms for Synthetic Aperture Radar Imagery XVI, E. G. Zelnio and F. D. Garber, Eds. Orlando, FL.: SPIE Defense and Security Symposium, April 13–17 2009.

[88] K. Naidu and L. Lin, “Data dome: full k-space sampling data for high-frequency radar research,” in Algorithms for Synthetic Aperture Radar Imagery XI. Or- lando, FL.: SPIE Defense and Security Symposium, April 12–16 2004.

[89] C. H. Casteel, L. A. Gorham, M. J. Minardi, S. Scarborough, and K. D. Naidu, “A challenge problem for 2D/3D imaging of targets from a volumetric data set in an urban environment,” in Algorithms for Synthetic Aperture Radar Imagery XIV, E. G. Zelnio and F. D. Garber, Eds. Orlando, FL.: SPIE Defense and Security Symposium, April 9–13 2007.

[90] C. Austin, E. Ertin, and R. Moses, “Sparse signal methods for 3D radar imaging,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 3, pp. 408–423, June 2011.

[91] D. E. Dudgeon, R. T. Lacoss, C. H. Lazott, and J. G. Verly, “Use of persistant scatterers for model-based recognition,” in Algorithms for Synthetic Aperture Radar Imagery (Proc. SPIE 2230), D. A. Giglio, Ed., 1994, pp. 356–368.

[92] R. Bhalla, J. Moore, and H. Ling, “A global scattering center representation of complex targets using the shooting and bouncing ray technique,” IEEE Trans. on Antennas and Propagation, vol. 45, no. 6, pp. 1850–1856, 1997.

[93] D. Rossi and A. Willsky, “Reconstruction from projections based on detection and estimation of objects–Parts I and II: Performance analysis and robust- ness analysis,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, pp. 886–906, 1984.

[94] R. L. Moses, E. Ertin, and C. Austin, “Synthetic aperture radar visualization,” in Proceedings of the 38th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, Nov 2004.

[95] K. E. Dungan and L. C. Potter, “Classifying sets of attributed scattering cen- ters using a hash coded database,” in Algorithms for Synthetic Aperture Radar Imagery XVII, E. G. Zelnio and F. D. Garber, Eds. Orlando, FL.: SPIE Defense and Security Symposium, April 5–9 2010.

[96] ——, “Classifying transformation-variant attributed point patterns,” Pattern Recognition, vol. 43, no. 11, pp. 3805–3816, November 2010.

[97] K. Varshney, M. Çetin, J. Fisher, and A. Willsky, “Sparse representation in structured dictionaries with application to synthetic aperture radar,” IEEE Transactions on Signal Processing, vol. 56, no. 8, pp. 3548–3561, August 2008.

[98] I. Stojanovic, M. Çetin, and W. C. Karl, “Joint space-aspect reconstruction of wide-angle SAR exploiting sparsity,” in Algorithms for Synthetic Aperture Radar Imagery XV. Orlando, FL.: SPIE Defense and Security Symposium, March 17–18 2008.

[99] J. A. Jackson and R. L. Moses, “An algorithm for 3D target scatterer feature estimation from sparse SAR apertures,” in Algorithms for Synthetic Aperture Radar Imagery XVI (Proc. SPIE vol. 7337), E. G. Zelnio and F. D. Garber, Eds., 2009.

[100] Air Force Research Laboratory. (2010, January) Backhoe sample public release and Visual-D challenge problem. [Online]. Available: https://www.sdms.afrl.af.mil/request/data request.php#Visual-D

[101] ——. (2010, January) Gotcha 2D / 3D imaging challenge problem. [Online]. Available: https://www.sdms.afrl.af.mil/datasets/gotcha/

[102] E. Candès and J. Romberg, “ℓ1-MAGIC: Recovery of sparse signals via convex programming,” California Institute of Technology, Tech. Rep., October 2005.

[103] I. Daubechies, M. Defrise, and C. D. Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Comm. Pure Appl. Math., vol. 57, no. 11, pp. 1413–1457, 2004.

[104] L. Greengard and J.-Y. Lee, “Accelerating the nonuniform Fast Fourier Transform,” SIAM Review, vol. 43, no. 3, pp. 443–454, 2004.

[105] J. Fessler and B. Sutton, “Nonuniform Fast Fourier Transforms using min-max interpolation,” IEEE Transactions on Signal Processing, vol. 51, no. 2, pp. 560–574, February 2003.

[106] N. Ramakrishnan, E. Ertin, and R. Moses, “Enhancement of coupled multichannel images using sparsity constraints,” IEEE Transactions on Image Processing, vol. 19, no. 8, pp. 2115–2126, August 2010.

[107] R. Moses, P. Adams, and T. Biddlecome, “Three-dimensional target visualization from wide-angle IFSAR data,” in Algorithms for Synthetic Aperture Radar Imagery XII. Orlando, FL.: SPIE Defense and Security Symposium, March 28 – April 1 2005.

[108] E. Ertin, C. D. Austin, S. Sharma, R. L. Moses, and L. C. Potter, “GOTCHA experience report: Three-dimensional SAR imaging with complete circular apertures,” in Algorithms for Synthetic Aperture Radar Imagery XIV, E. G. Zelnio and F. D. Garber, Eds. Orlando, FL.: SPIE Defense and Security Symposium, April 9–13 2007.

[109] C. Austin, J. Ash, and R. Moses, “Performance analysis of sparse 3D SAR imaging,” in Algorithms for Synthetic Aperture Radar Imagery XVIII, E. G. Zelnio and F. D. Garber, Eds. Orlando, FL.: SPIE Defense and Security Symposium, April 25–29 2011.
