
Sparse Signal Models for Augmentation in ATR

Tushar Agarwal, Nithin Sugavanam, and Emre Ertin

Abstract—Automatic Target Recognition (ATR) algorithms classify a given Synthetic Aperture Radar (SAR) image into one of the known target classes using a set of training images available for each class. Recently, learning methods have been shown to achieve state-of-the-art classification accuracy if abundant training data is available, sampled uniformly over the classes and their poses. In this paper, we consider the task of ATR with a limited set of training images. We propose a data augmentation approach to incorporate domain knowledge and improve the generalization power of a data-intensive learning algorithm, such as a Convolutional Neural Network (CNN). The proposed data augmentation method employs a limited-persistence sparse modeling approach, capitalizing on commonly observed characteristics of wide-angle synthetic aperture radar (SAR) imagery. Specifically, we exploit the sparsity of the scattering centers in the spatial domain and the smoothly-varying structure of the scattering coefficients in the azimuthal domain to solve the ill-posed problem of over-parametrized model fitting. Using this estimated model, we synthesize new images at poses and sub-pixel translations not available in the given data to augment the CNN's training data. The experimental results show that, in the training-data-starved regime, the proposed method provides a significant gain in the resulting ATR algorithm's generalization performance.

Index Terms—Deep Learning, Data Augmentation, Automatic Target Recognition, Compressive Sensing.

I. INTRODUCTION

Synthetic aperture radar (SAR) sensing yields high-resolution imagery of a region of interest, and it is robust to weather and other environmental factors. The SAR sensor consists of a moving radar platform with a collocated receiver and transmitter that traverses a wide aperture in azimuth, acquiring coherent measurements. Multiple pulses across the synthesized aperture are combined and coherently processed to produce high-resolution SAR imagery. A SAR imaging system achieves a high spatial resolution in both the radial direction, termed range, and the orthogonal direction, termed cross-range. The range resolution is a function of the bandwidth of the signal used in illumination. The cross-range resolution is a function of the antenna aperture's size and the persistence of scattering centers [1].

A significant fraction of the energy in the back-scattered signal from the scene is due to a small set of dominant scattering centers resolved by the SAR sensor. The localization of back-scatter energy provides a distinct description of targets of interest [2], typically man-made objects such as civilian and military vehicles. This sparsity structure has been utilized in [3], [4] to design features, such as peak locations and edges, that succinctly represent the scene. In conjunction with template-based methods or statistical methods, these hand-crafted features are used in solving the target recognition problem. The template-based methods exploit the geometric structure and variability of these features in the scattering centers in [5], [6] to distinguish between the different target categories. The target signature of each of the scattering centers varies with the viewing angle of the sensor platform. Statistical methods can explicitly model and utilize this low-dimensional manifold structure of the scattering-center descriptors [7], [8] for improved decisions, as well as for integrating information across views [9], [10]. However, ATR algorithms based on these hand-crafted features are limited to the information present in these descriptors and lack the ability to generalize to variability in clutter, pose, and noise. With the advent of data-driven algorithms such as artificial neural networks (ANN) [11], an appropriate feature set and a discriminating function can be jointly estimated using a unified objective function. Recent advances in techniques to incorporate deep hierarchical structures in ANNs [12], [13] have led to the widespread use of these methods to solve inference problems in a diverse set of application areas. Convolutional Neural Networks (CNN), in particular, have been used as automatic feature extractors for image data. These methods have also been adopted in solving the ATR problem using SAR images [14].

(T. Agarwal, N. Sugavanam, and E. Ertin are with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210 USA. Corresponding author: T. Agarwal ([email protected]). This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. arXiv:2012.09284v1 [cs.CV] 16 Dec 2020.)
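For context, the resolution dependencies stated above can be made concrete with the standard first-order relations for spotlight-mode SAR (our notation; the paper states these dependencies only qualitatively):

\[
\Delta_r = \frac{c}{2B}, \qquad \Delta_{cr} \approx \frac{\lambda_c}{4\sin(\Delta\theta_{ap}/2)} \approx \frac{\lambda_c}{2\,\Delta\theta_{ap}},
\]

where B is the bandwidth of the transmitted waveform, λ_c is the center wavelength, and Δθ_ap is the angular extent of the synthetic aperture.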
There have been several efforts in this direction, including the state-of-the-art ATR results on the MSTAR data-set in [15]. These results establish that a CNN can be effective in radar image classification, provided sufficient training data is available. However, extending this approach to ATR problems on other data-sets is non-trivial because the scattering behavior changes substantially as the wavelength of operation changes. The major challenge is that neural networks usually require large data-sets to achieve good generalization performance, and, in general, labeled radar data is not available in abundance, unlike other image data-sets. In this paper, we address the scarcity of training data and provide a general method that utilizes a model-based approach to capture the underlying scattering phenomenon and thereby enrich the training data-set.

Transfer learning is one of the most effective techniques to handle the availability of limited training data. Transfer learning uses model parameters, estimated using a similar data-set such as ImageNet [16], as initialization for solving the problem of interest, typically using convolutional neural networks with little to no fine-tuning. There have been numerous experiments supporting the benefits of transfer learning, including two seminal papers [17] and [18]. However, radar images are significantly different from regular optical images. In particular, SAR operates at wavelengths from 1 cm to 10 m, while visible light has wavelengths on the order of hundreds of nanometers. As a result, most surfaces in natural scenes are rough at optical wavelengths, leading to diffuse reflections. In contrast, microwaves from radar transmitters undergo specular reflections, especially off man-made objects.
This difference in scattering behavior leads to substantially different images in SAR and optical imaging. Since specular reflections dominate the scattering phenomenon, the images are sensitive to instantaneous factors like the imaging device's orientation and background clutter. Therefore, readily available deep neural network models based on optical imagery, like AlexNet and VGG16 [19], are not suitable for transferring knowledge to this domain, and retraining such a network is again inhibited by the limited availability of radar imagery. We address this limitation of transfer learning by using data augmentation: we augment the radar image data-sets using a principled approach that considers the phenomenology of the RF backscatter data.

The paper is structured as follows. We first review relevant research work and outline our contributions in Sections I-A and I-B, respectively. In Sections II-A and II-B, we describe the data-set and network architecture in detail. In Section II-C, we provide an overview of our strategy, and we then describe the details of our pose-synthesis methodology in Section II-D. Next, we present the details of the experiments and the corresponding results in Sections III and IV, respectively, which provide the empirical evidence for the effectiveness of the proposed data augmentation method. We conclude with some possible directions for future research in Section V.

A. Related Work

The problem of over-fitting manifests and magnifies when the size of the training set is small. Several methods have been proposed to reduce over-fitting and improve generalization performance. Such ill-posed problems are solved by using regularizers that impose structure and constraints on the solution space. The norm of the model parameters serves as a standard regularizing function: it keeps the parameter values small with the 2-norm (‖·‖2, the L2 penalty) or sparse with the 1-norm (‖·‖1, the L1 penalty). Furthermore, optimization algorithms such as stochastic gradient descent and mirror descent implicitly induce regularization [20], [21]. Dropout, introduced by Srivastava et al. [22], is another popular method, specifically for deep neural networks. The idea is to randomly switch off certain neurons in the network by multiplying their activations with Bernoulli random variables of a predefined probability. The overall model learned is an average of these sub-models, giving improved generalization performance. Batch Normalization is another way to improve generalization performance, proposed by Ioffe and Szegedy [23]: neuron values of designated layers are continuously normalized during training, with an adaptive mean and variance that are also learned as part of the back-propagation training regime. Finally, recent work [24] has established the benefit of over-parameterization in implicitly regularizing the optimization problem and improving the generalization performance.
Transfer learning is another approach for improving generalization performance under limited availability of data. Pan and Yang [25] provide a comprehensive overview illustrating the different applications and performance gains of transfer learning. For radar data, Huang et al. [26] made a successful effort in this direction by using a large corpus of SAR data to train feature extractors in an unsupervised manner.

For SAR data, there exist several approaches based on the neural network architecture that improve generalization. Chen et al. [27] restrict the effective degrees of freedom of the network by using a fully convolutional network. Lin et al. [28] propose a convolutional highway network to tackle the problem of limited data availability. In [29], the authors design a specialized ResNet architecture that learns effectively even when the training data-set is small.

A few other approaches focus on learning a special type of feature. Dong et al. [30] generate an augmented monogenic feature vector followed by a sparse-representation-based classification. In [31], the authors use hand-designed features with supervised discriminative dictionary learning to perform SAR ATR. Song et al. used a Sparse Representation based Classification (SRC) approach in [32]. In [33], Huang et al. designed a joint low-rank and sparse dictionary to denoise the radar image while keeping the main texture of the targets. More recently, Yu et al. [34] proposed a combination of Gabor features and features extracted by neural networks for better classification performance.

Data augmentation is another regularization strategy, one that reduces the generalization error while not affecting the training error [35], [36], and it is the main focus of this paper. As the name suggests, it is the approach of expanding a data-set by transforming the original data through domain-specific transformations and adding the transformed samples to the data-set. J. Ding et al. [37] explored the effectiveness of conventional transformations used for optical images, viz. translations, noise addition, and linear interpolation (for pose synthesis), and they report positive results on the MSTAR data-set. More recently, [38] proposed a Generative Adversarial Network (GAN) to generate synthetic samples for the augmentation of SAR data but did not report any significant improvement in the error rate of the ATR task. In another effort using a GAN, Gao et al. [39] used two jointly trained discriminators with a non-conventional architecture; they further used the trained generator to augment the base data-set and reported significant improvements. Cha et al. [40] use images from a SAR data simulator and refine them using a function learned from real images. Simple rotations of radar images were considered as a data-augmentation method in [41]. In [15], Zhong et al. suggested key ideas for incorporating prior knowledge in training the model. They added samples flipped in the cross-range dimension with a reversed sign of the azimuthal angle; such flip augmentation exploits the symmetric nature of most objects in the MSTAR data-set. They also added a loss related to a secondary objective to the primary objective of classification, using pose prediction (of the azimuthal angle) as the secondary objective of the network.

Fig. 1: The neural network architecture. The abbreviations used are as follows: Conv is convolutional, followed by the kernel height × width; MP is max-pooling, followed by the pooling size as height × width; RL, BN, Flat, Drop, and FC are ReLU, Batch-Normalization, Flattening, Dropout, and Fully-Connected layers, respectively. Sizes of feature maps are given at their top as height × width × channels.

They empirically showed that this helps by adding meaningful constraints to the network learning: the network is more informed about the auxiliary confounding factor, improving its generalization capability. There have been several other models proposed to tackle the ATR problem with the MSTAR data-set, including [42] and [43], but we primarily build upon Zhong et al.'s [15] work.

B. Contributions

Our network architecture and objective function are primarily inspired by [15], with some modifications. Instead of linearly interpolated samples in the image domain [37], we introduce a principled approach for pose synthesis. We first transform the image into the polar frequency domain to obtain the samples in the phase-history domain. We then construct a model motivated by the scattering behavior of canonical reflectors in this phase-history domain. This model captures the viewing-angle-dependent anisotropic scattering behavior. Additionally, since our model is based on the phase-history domain, we can generate the scintillation effects due to sub-pixel shifts in the augmented images; such sub-pixel shifts are typically not possible in the traditional image domain. We hypothesize that these two factors are essential to improving the network's knowledge about the SAR imaging system's underlying physics. We describe our methodology and support its effectiveness through empirical results, which show a significant boost in generalization performance.

It is important to note that our data-augmentation-based strategy is generic and decoupled from the network architectures proposed in other works like [28], [27]. We hypothesize that the proposed augmentation strategy, used in conjunction with the methods mentioned above, may yield even better results than presented in this paper. Our objective here is to demonstrate the benefits of the proposed data-augmentation strategy; hence, apart from data augmentation, we only use Zhong et al.'s [15] multi-task learning paradigm.

II. PHYSICS-BASED DATA AUGMENTATION

We want to learn a parametric neural network classifier g, with parameters w ∈ ℝ^{d_w}, that predicts an estimate of the output labels, Y ∈ ℝ^{d_Y}, for an input X ∈ ℂ^{d_X}, i.e., Ŷ = g(X; w), where d_X, d_w, and d_Y are the dimensions of X, w, and Y, respectively. We consider a supervised learning setting, where a labeled training data-set D_train = {(X_u, Y_u)}_{u=1}^{N_train} is used to estimate the classifier parameters w, with N_train the total number of training samples. The training procedure is the minimization of an appropriate loss function L : (w, D) → ℝ using an iterative algorithm like stochastic gradient descent. Therefore, the learned parameters w* are the solution of the following minimization problem P:

\[
w^{*} = P(D) = \arg\min_{w} L(w, D) \tag{1}
\]

Data augmentation involves applying an appropriate transformation T : D_in → D_out to a data-set (only D_train for our purposes) and hence expanding it into an augmented data-set T(D_train). We also use a validation data-set D_val = {(X_u, Y_u)}_{u=1}^{N_val} for cross-validation during training and a test data-set D_test = {(X_u, Y_u)}_{u=1}^{N_test} for evaluating g(X; w) post-training. The evaluation can be done using a suitable metric M : (w, D) → ℝ, which may be different from L above. Our aim is to find T such that the estimated parameters w_aug = P(T(D_train)) perform better than w_train = P(D_train) in terms of the chosen metric, i.e., M(w_aug, D_test) is more desirable than M(w_train, D_test).
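The comparison this formalism defines can be summarized with a short sketch. The harness below is illustrative only: the helper callables train, metric, and augment stand in for P, M, and T and are not code released with the paper.

```python
from typing import Callable, List, Tuple
import numpy as np

Dataset = List[Tuple[np.ndarray, np.ndarray]]  # pairs (X_u, Y_u)

def compare_augmentation(train: Callable[[Dataset], object],
                         metric: Callable[[object, Dataset], float],
                         augment: Callable[[Dataset], Dataset],
                         d_train: Dataset,
                         d_test: Dataset) -> Tuple[float, float]:
    """Evaluate M(w_train, D_test) versus M(w_aug, D_test)."""
    w_train = train(d_train)          # w* = P(D_train), equation (1)
    w_aug = train(augment(d_train))   # w* = P(T(D_train))
    return metric(w_train, d_test), metric(w_aug, d_test)
```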

A. MSTAR Dataset

The MSTAR data-set consists of 10 classes, i.e., tanks (T62, T72), armored vehicles (BRDM2, BMP2, BTR60, BTR70), a rocket launcher (2S1), an air defense unit (ZSU234), a military truck (ZIL131), and a bulldozer (D7). The radar platform used in constructing the MSTAR data-set acquires the measurements using N_p = 100 pulses over an aperture of 3 degrees. The phase-history measurements obtained in the receiver are converted to images using the sub-aperture-based method described in [44]. The motion-compensation step is followed by the application of a Taylor window to control the side-lobes. The measurements are zero-padded to obtain an over-sampled image using the Fourier transform. The complete MSTAR data-set used in [15] was highly imbalanced. We replace the data-set used in [15] with a balanced subset, referred to as the standard operating conditions considered in [27], [26]. We denote this subset as the SOC MSTAR data-set henceforth.

Similar to the existing literature, we use the images at depression angle φ = 17° for training, while images at φ = 15° form the test set. Similar to [15], we crop the images to 64 × 64 with the objects in the center. Note that we crop the images right before feeding them to the ANN; we perform the modeling and augmentation steps on the original images. Since our paper's objective is to investigate the effects of data augmentation, we work with much smaller training data-sets by artificially reducing the size of our data-set at φ = 17°. We exponentially sub-sample by extracting only a ratio R of samples from each class, where R ∈ {2^-5, 2^-4, 2^-3, 2^-2, 2^-1, 2^0}. We ensure that the extracted images are uniformly distributed over the [0, 2π] azimuthal-angle domain for each sub-sampling ratio. This sub-sampling strategy is essential to ensure that the learning algorithm gets a complete view of the vehicle's scattering behavior. We further select 15% of uniformly distributed samples from this uniformly sub-sampled data as the validation set and utilize the remaining 85% as the new training set. The training data D_train includes the flip augmentation along the cross-range domain [15] and is simply referred to as the data or data-set. We also include these flip augmentations in the final validation set D_val; no augmentations are included in the final test set D_test. We form our T(D_train) by performing the proposed pose augmentation on each radar image in the training data-set, as described in Section II-D. Our net transformation T also includes sub-pixel-level translations as well as real-time pixel-level translations [37] in the range and cross-range domains.

B. Network Architecture

We utilize a network architecture inspired by [15], shown in Fig. 1. We modify the network and use batch-normalization layers after the ReLU activation in the convolutional layers. We omit dropout in the convolutional layers since batch normalization already regularizes the optimization procedure [45]. After the last convolutional layer, we flatten all the feature values and use a fully connected (FC) layer followed by a dropout layer. We further modify the cosine cost used for pose-awareness in [15] into a pair of simpler costs using the features Y_2 = sin(θ) and Y_3 = 1_A(θ), where θ is the azimuthal angle and 1_A is the indicator function over the set A = [−π/2, π/2]. These two features uniquely determine the azimuthal angle and remove the need for a cosine-distance loss. In our experiments, we found that this modification to the loss function improved the convergence of the optimization procedure while training the model. The loss function L used to find the network parameters is now

\[
L(w, D) = \bar{E}_D\big[ L_1(w, X, Y_1) + L_2(w, X, Y_2) + L_3(w, X, Y_3) \big] \tag{2}
\]
\[
L_1(w, X, Y_1) = -\sum_{p=1}^{10} Y_{1,p} \log \hat{Y}_{1,p}(w, |X|),
\]
\[
L_2(w, X, Y_2) = \big( Y_2 - \hat{Y}_2(w, |X|) \big)^2,
\]
\[
L_3(w, X, Y_3) = -Y_3 \log \hat{Y}_3(w, |X|) - (1 - Y_3) \log\big( 1 - \hat{Y}_3(w, |X|) \big),
\]

where |·| denotes the absolute value, and X ∈ ℂ^{64×64}, Y_1 ∈ {0, 1}^{10×1}, Y_2 ∈ [−1, 1], and Y_3 ∈ {0, 1} refer to the complex radar image, the one-hot vector of the 10 classes, sin(θ), and 1_A(θ), respectively. Ē_D refers to the empirical mean over the data-set D, and Y_{1,p} is the p-th component of the vector Y_1. All quantities with a hat are the corresponding estimates given by the ANN.
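A minimal NumPy sketch of the per-sample loss in equation (2) follows; the network g producing the hatted estimates and the empirical mean over the batch are assumed to be supplied elsewhere, and the epsilon guard is ours.

```python
import numpy as np

def atr_loss(y1, y1_hat, y2, y2_hat, y3, y3_hat, eps=1e-12):
    """Per-sample loss of equation (2): 10-way cross-entropy on the
    class label (L1), squared error on sin(theta) (L2), and binary
    cross-entropy on the half-plane indicator 1_A(theta) (L3).
    y1/y1_hat are length-10 arrays (one-hot label, softmax output);
    y2, y3 and their estimates are scalars."""
    l1 = -float(np.sum(y1 * np.log(y1_hat + eps)))
    l2 = float((y2 - y2_hat) ** 2)
    l3 = -float(y3 * np.log(y3_hat + eps)
                + (1.0 - y3) * np.log(1.0 - y3_hat + eps))
    return l1 + l2 + l3
```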
C. Teaching SAR Physics

In addition to translation invariance and exploiting the symmetry of objects in the cross-range domain, our augmentation strategy attempts to improve the network's knowledge about two confounding factors, namely pose and the scintillation effects due to shifts in the range domain. As described above, the approach of [15] aims at teaching pose information to the network. We further enhance this knowledge by synthesizing new poses in a close neighborhood of existing poses. The pose-synthesis strategy utilizes a model that exploits the spatial sparsity and the limited persistence of the scattering centers. We estimate the model in the phase-history domain for a given image and refer to it as the PH model henceforth. We utilize the continuity of this PH model to generate phase-history measurements and synthesize new images in a close neighborhood of the original image. The methodology is described in detail in the following section.

We also transform the image into its frequency-domain representation and utilize this representation to introduce sub-pixel shifts of 1/2 pixel, corresponding to approximately 0.15 m displacement in the Y-direction (range) as well as the X-direction (cross-range) of the scene, where each pixel corresponds to 0.3 m in the range and cross-range domains (for MSTAR data). We empirically show that such fractional shifts provide knowledge about scintillation and hence make the network more robust to these effects, further improving its generalization capability.
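The sub-pixel translation can be sketched via the Fourier shift theorem; this is a plausible implementation of the frequency-domain shift described above, not the authors' released code.

```python
import numpy as np

def subpixel_shift(img: np.ndarray, dy: float, dx: float) -> np.ndarray:
    """Shift a complex radar image by fractional pixels (dy in range,
    dx in cross-range) with a linear phase ramp in the 2-D Fourier
    domain. The paper's augmentation uses dy = dx = 0.5 pixel, i.e.
    about 0.15 m at 0.3 m/pixel for MSTAR."""
    ky = np.fft.fftfreq(img.shape[0])[:, None]  # cycles/pixel, range axis
    kx = np.fft.fftfreq(img.shape[1])[None, :]  # cycles/pixel, cross-range
    ramp = np.exp(-2j * np.pi * (ky * dy + kx * dx))
    return np.fft.ifft2(np.fft.fft2(img) * ramp)
```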
D. Pose Synthesis Methodology

This section describes the pose-synthesis methodology used for data augmentation with the PH model. The strategy is a modified version of our earlier work, which focused on modeling the scattering behavior of targets in monostatic and bistatic setups [46]–[48]. The images in the MSTAR data-set are not registered; hence, a unified model over the entire circular aperture would require a phase-calibration method in the frequency domain [49]. Therefore, we construct the model for each image and locally extrapolate the measurements. SAR operating in spotlight mode was used to create the images in the MSTAR data-set. The images are translated from the spatial domain to the Cartesian frequency domain using the steps described in [50]. Subsequently, we convert the frequency measurements to polar coordinates to obtain the phase-history measurements described in [51].

We consider a square patch on the ground of side length L = 30 m centered around the target. From the geometric theory of diffraction, we assume that a complex target can be decomposed into a sparse set of scattering centers. The scattering centers are assumed to be K point targets, described using {(x_k, y_k), h_k(θ, φ)}_{k=1}^{K}, where (x_k, y_k) ∈ [−L/2, L/2] × [−L/2, L/2] are the spatial coordinates of the point targets, θ is the azimuthal angle, φ is the elevation angle of the radar platform, and h_k(θ, φ) are the corresponding scattering coefficients, which depend on the viewing angle. The samples of the received signal after the standard de-chirping procedure are given by

\[
s(f_m; \theta, \phi) = \sum_{k=1}^{K} h_k(\theta, \phi) \exp\left( -j 4\pi \frac{f_m \cos(\phi)}{c} \big( x_k \cos(\theta) + y_k \sin(\theta) \big) \right), \tag{3}
\]

where the f_m are the illuminating frequencies with m ∈ [M], M = 2BL/c, B is the bandwidth of the transmitted pulse, c is the speed of light, and the notation [M] denotes the natural numbers up to M. We estimate the functions h_k(θ, φ) ∀ k ∈ [K] from the receiver samples.
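Equation (3) is straightforward to evaluate numerically; the following sketch (our construction) computes the de-chirped samples for one viewing angle.

```python
import numpy as np

C_LIGHT = 3e8  # speed of light, m/s

def dechirped_samples(f, theta, phi, xy, h):
    """Forward model of equation (3) for a single azimuth theta:
    f is an (M,) array of illuminating frequencies, phi the elevation,
    xy a (K, 2) array of scatterer positions (x_k, y_k), and h a (K,)
    vector of scattering coefficients h_k(theta, phi)."""
    r = xy[:, 0] * np.cos(theta) + xy[:, 1] * np.sin(theta)       # (K,)
    phase = -4j * np.pi * np.cos(phi) / C_LIGHT * np.outer(f, r)  # (M, K)
    return np.exp(phase) @ h                                      # (M,)
```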

Parametric models for standard reflectors such as dihedrals and trihedrals were studied in [52]–[54]. These models indicate that the reflectivity is a smooth function of the viewing angle, parameterized by the reflector's dimensions and orientation. Therefore, we exploit this smoothness to approximate this infinite-dimensional function using interpolation strategies [55] with the available set of samples Θ in the angle domain. We denote the sampled returns from the scene by the matrix S = NUFFT(X) ∈ ℂ^{N_θ × M}, where NUFFT represents the non-uniform Fourier transform. The elements of S are defined as follows:

\[
s_{m,i} = n_{m,i} + \sum_{k=1}^{K} h_k(\theta_i, \phi) \exp\left( -j 4\pi \frac{f_m \cos(\phi)}{c} \big( x_k \cos(\theta_i) + y_k \sin(\theta_i) \big) \right), \tag{4}
\]

where n_{m,i} represents the measurement noise. To solve the estimation problem, we assume that the function h_k has a representation in a basis set of size D, denoted by the matrix Ψ ∈ ℂ^{N_θ × D}. For the MSTAR data-set, the elevation angles we work with are similar, and we assume that the variation in h_k with respect to φ is insignificant. This assumption leads to the relation h_k(θ; φ) = Σ_{v=1}^{D} c_{v,k} ψ_v(θ). The estimated phase-history matrix is now Ŝ, whose elements are given by

\[
\hat{s}_{m,i} = \hat{n}_{m,i} + \sum_{k=1}^{K} \sum_{v=1}^{D} c_{v,k}\, \psi_v(\theta_i) \exp\left( -j 4\pi \frac{f_m \cos(\phi)}{c} \big( x_k \cos(\theta_i) + y_k \sin(\theta_i) \big) \right), \tag{5}
\]

where n̂_{m,i} consists of the measurement noise and the approximation error. To estimate the coefficients c_{v,k} from the noisy measurements in (5), we discretize the scene with resolution Δ_R in the X-Y (range, cross-range) plane to get K = N_R² grid points, where N_R = 2BL/c is the number of range bins. Furthermore, we consider a smooth Gaussian function to perform the noisy interpolation. We partition the sub-aperture 2Δθ into smaller intervals of equal length, with a corresponding set containing the means of the intervals given by {θ̂_v}_{v=1}^{D}, where D = 12; these means are used as the centroids for the Gaussian interpolating functions. We assume the width of the Gaussian function, σ_G, is a constant hyper-parameter, whose selection is described in Section III-B. Hence, σ_G is the constant minimum persistence in azimuth of the scattering centers that we wish to detect. The radial basis functions used are

\[
\psi_v(\theta) = \exp\left( -\left( \frac{\theta - \hat{\theta}_v}{2\sigma_G} \right)^{2} \right). \tag{6}
\]

The elements of Ŝ due to scattering centers located at the discrete grid points are now given by

\[
\hat{s}_{m,i} = \hat{n}_{m,i} + \sum_{k=1}^{N_R^2} \sum_{v=1}^{D} c_{v,k}\, \psi_v(\theta_i) \exp\left( -j 4\pi \frac{f_m \cos(\phi)}{c} \big( x_k \cos(\theta_i) + y_k \sin(\theta_i) \big) \right). \tag{7}
\]

Here, the discrete grids for (x_k, y_k) and (θ_i, f_m) are both known. Let the vectors containing all the corresponding grid points for x_k, y_k, θ_i, f_m be referred to as x, y, θ, f, respectively. The problem now is to find the coefficients c_{v,k} that minimize the error between Ŝ and S. Let c_k = [c_{1,k} ··· c_{D,k}]^T. To recover the structured signal h = [h_1 ··· h_{N_R²}], which represents the scattering coefficients of a sparse scene that have a sparse representation in an underlying set of functions, we solve the following linear inverse problem using a sparse-group regularization on c_k ∀ k ∈ [N_R²]:

\[
\min_{C} \left( \lambda \sum_{k=1}^{N_R^2} \| c_k \|_2 + \big\| S - \hat{S} \big\|_F^2 \right) \iff \min_{C} J(C, \sigma_G), \tag{8}
\]

where C refers to the matrix [c_1 ··· c_{N_R²}], σ_G is a constant hyper-parameter, and ‖·‖_2, ‖·‖_F refer to the ℓ2 and Frobenius norms, respectively.
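Equation (8) is a complex-valued group-lasso problem. One plausible solver, shown here as a sketch, is proximal gradient descent (ISTA) with block soft-thresholding; the paper states only the objective, so the solver choice, the explicit sensing matrix A, and the column-major stacking convention are our assumptions.

```python
import numpy as np

def group_soft_threshold(C: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * sum_k ||c_k||_2, one group per column."""
    norms = np.maximum(np.linalg.norm(C, axis=0, keepdims=True), 1e-12)
    return C * np.maximum(1.0 - tau / norms, 0.0)

def fit_ph_model(A: np.ndarray, s: np.ndarray, D: int, K: int,
                 lam: float, n_iter: int = 200) -> np.ndarray:
    """ISTA sketch for equation (8):
        min_C  lam * sum_k ||c_k||_2 + ||s - A vec(C)||_2^2,
    where A ((M*N_theta) x (D*K)) holds the products
    psi_v(theta_i) * exp(-j 4 pi f_m cos(phi)(x_k cos(theta_i)+...)/c)
    and vec(C) stacks the columns c_k."""
    C = np.zeros((D, K), dtype=complex)
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)  # 1/L of the quadratic
    for _ in range(n_iter):
        resid = A @ C.reshape(-1, order="F") - s
        grad = 2.0 * (A.conj().T @ resid).reshape(D, K, order="F")
        C = group_soft_threshold(C - step * grad, step * lam)
    return C
```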

The elements of the recovered model S*(θ; f) are now

\[
s^{*}_{i,m} = \sum_{k=1}^{N_R^2} \sum_{v=1}^{D} c^{*}_{v,k}\, \psi_v(\theta_i) \exp\left( -j 4\pi \frac{f_m \cos(\phi)}{c} \big( x_k \cos(\theta_i) + y_k \sin(\theta_i) \big) \right), \tag{9}
\]

where the c*_{v,k} are the recovered coefficients. The phase-history measurements are converted back to an image using overlapping sub-apertures spanning 3 degrees in the azimuth domain, as shown in Fig. 2. We apply the same Taylor window with zero-padding and translate back to Cartesian coordinates before applying the Fourier transform to generate the images that augment the data-set.
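With the coefficients in hand, evaluating the recovered model of equation (9) on extrapolated azimuth angles is a direct computation; a sketch (our construction, using the Gaussian basis of equation (6)) follows.

```python
import numpy as np

C_LIGHT = 3e8  # speed of light, m/s

def synthesize_ph(C, centroids, sigma_g, xy, f, theta_new, phi):
    """Evaluate S*(theta; f) of equation (9) on new azimuth angles
    theta_new (the delta-theta extrapolation of Fig. 2). C is the
    recovered (D, K) coefficient matrix, centroids holds the D Gaussian
    centers of equation (6), xy is the (K, 2) spatial grid, and f the
    (M,) frequency grid."""
    # Gaussian radial basis of equation (6), shape (N_new, D)
    psi = np.exp(-(((theta_new[:, None] - centroids[None, :])
                    / (2.0 * sigma_g)) ** 2))
    h = psi @ C                                              # (N_new, K)
    r = (np.outer(np.cos(theta_new), xy[:, 0])
         + np.outer(np.sin(theta_new), xy[:, 1]))            # (N_new, K)
    phase = (-4j * np.pi * np.cos(phi) / C_LIGHT
             * r[:, :, None] * f[None, None, :])             # (N_new, K, M)
    return np.einsum("ik,ikm->im", h, np.exp(phase))         # (N_new, M)
```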

Fig. 2: Extrapolating data over the polar grid by δθ using the model S*(θ; f).

III. EXPERIMENTS

A. Experimental Setup

The experiments were done using the network described in Section II-B on the data-sets described in Section II-A. The model is trained on a local machine with a Titan Xp GPU. The Tensorflow (1.10) [56] library is used for the implementation through its Python API. We use the ReLU activation everywhere except the final output layers of Ŷ1, Ŷ2, and Ŷ3, where we use Softmax, Linear, and Sigmoid activations, respectively.

An overview of the processing steps to synthesize radar images, starting from the complex radar data, is given in Fig. 3. We first transform the image to K-space by inverting the transformations applied to the MSTAR data, to get the phase-history representation. Using the header information from the MSTAR dataset, we determine the discrete grids for (x_k, y_k) and (θ_i, f_m). Next, we estimate the model coefficients by solving the optimization problem described in equation (8). As a result, we obtain the model S*(θ; f) given by equation (9). This model is further used to synthesize new columns of phase-history data (extending the θ vector) and consequently produce a synthesized image by the procedure described in Section II-D, followed by the transformation of the phase-history data back to complex-valued image data.

We used the magnitude of the complex-valued radar data as the input X, in agreement with the existing literature for training the network. We normalize all input images to unit norm to reduce some undesired effects due to the Gaussian kernel in extrapolation. We also remove all synthetic images at poses already present in the corresponding training set. Then, the optimization problem in equation (1) is solved using the off-the-shelf Adam variant of the mini-batch stochastic gradient descent optimizer with a mini-batch size of 64. The training is carried out for many epochs (> 400) using an early-stopping criterion, and the model is saved for the best moving-average validation performance metric. Although we care about accuracy (the percentage of samples classified correctly) as the performance metric, D_val here gets small, especially for small R values, thereby saturating the validation accuracy at 100% and rendering this metric less useful. Instead, we monitor the minimum classification loss, L1, as the validation performance metric. We report the percentage error (or misclassification rate), which is 100 − accuracy, as the test performance results.
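The input preprocessing described above reduces to two lines; the sketch below (ours) mirrors the stated magnitude-and-unit-norm convention.

```python
import numpy as np

def network_input(x_complex: np.ndarray) -> np.ndarray:
    """Prepare the network input per Section III-A: magnitude of the
    complex 64x64 chip, normalized to unit Frobenius norm."""
    x = np.abs(x_complex).astype(np.float32)
    return x / max(float(np.linalg.norm(x)), 1e-12)
```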

B. Determining Hyper-parameters

The PH model for each image and the neural network model introduce a set of hyper-parameters. We now explain our choices for a subset of them and mention some others. The neural network's hyper-parameters are kept at the Tensorflow (1.10) library's default values unless specified otherwise. The PH model has two main hyper-parameters, σ_G and δθ. We determine the optimum σ_G for every image by minimizing the following expression over all possible values, using a simple line search:

\[
\sigma_G^{*} = \arg\min_{\sigma_G} \left[ \min_{C} J(C, \sigma_G) \right]
\]

For determining an appropriate δθ, we choose the heuristic approach of grid search. We generate samples only within r ≤ 6° of the original pose, because the approximation error increases beyond this neighborhood. We choose an appropriate δθ by running a grid search over a factor η such that δθ = ησ*_G; the amount of extrapolation per image thus depends on the corresponding kernel width σ*_G. We run the training on the smallest subset of the data-set, at a sub-sampling ratio of 2^-5, searching over a grid of three values, η ∈ {1, 2, 3}, and choose the η that gives the best validation performance. We also experimented with η > 3 but found the results comparable to η = 3. The grid search returns η = 3 as the optimum factor, and hence we use δθ = min{r, 3σ*_G}.

For the neural network model, we set the dropout rate for the last fully connected layer at 0.2. We add real-time X-Y translations to every image, where each translation (in number of pixels) is randomly sampled from the set {−6, −4, −2, 0, 2, 4, 6} at every epoch.
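Both searches above are one-dimensional and easy to sketch; in the code below (ours), solve_j is a stand-in for the inner minimization min_C J(C, σ_G) of equation (8), and the translation sampler mirrors the stated set.

```python
import numpy as np

def select_sigma_g(solve_j, sigmas):
    """Line search sigma_G* = argmin_sigma [min_C J(C, sigma)];
    solve_j(sigma) is assumed to return the inner minimum of eq. (8)."""
    costs = [solve_j(s) for s in sigmas]
    return sigmas[int(np.argmin(costs))]

def delta_theta(sigma_g_star, eta=3.0, r_max_deg=6.0):
    """Extrapolation amount: delta_theta = min{r, eta * sigma_G*}."""
    return min(r_max_deg, eta * sigma_g_star)

def random_translation(rng: np.random.Generator):
    """Per-epoch real-time X-Y translation, in pixels."""
    return rng.choice([-6, -4, -2, 0, 2, 4, 6], size=2)
```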
IV. RESULTS

A. It's all about the features

We hypothesize that such a data augmentation strategy primarily affects the CNN's convolutional layers, which serve as the feature extractors. The classifier can then be trained with small training data-sets and will generalize well. Therefore, we present all our results for the extracted features. To this end, we first learn all network parameters using the specified data-set. We then retrain the classifier layers (after and including the first FC layer) from scratch using the corresponding sub-sampled data-set without augmentation. We repeat the training procedure for all 6 values of R. All the models have the same architecture as described in Section II-B and use the same D_test. The difference among them is the D_train and D_val used, as described in Section II-A. For consistency of results, we repeat the process described in Section II-A to get four different D_train and D_val for each R (except for R = 2^-1 and R = 2^0, where only two and one such unique data-sets were possible, respectively) and report average performance results.

The overall augmentation performance is summarized in Tables I and II and visualized in Fig. 5. We abbreviate the data-sets as follows: baseline data as B; the full SOC data-set as F (for Full-data); baseline data with the proposed sub-pixel augmentations as B+S; baseline data with the proposed sub-pixel and pose augmentations as B+S+P; baseline data with simple rotations added (from [41]) as B+R; and baseline data with linearly interpolated poses added (from [37]) as B+L.

TABLE I: Test errors (%) corresponding to the analysis plot (Fig. 5a).

| Sub-sampling Ratio (R) | Baseline Data (B) | Adding Our Sub-pixel Shifts (B+S) | Adding Our Poses and Sub-pixel Shifts (B+S+P) | Full-data Features (F) |
|---|---|---|---|---|
| 2^0  | 0.50 {B0}         | 0.58 {S0}         | 0.37 {SP0}         | 0.50 {F0}        |
| 2^-1 | 1.44 ± 0.33 {B1}  | 1.03 ± 0.08 {S1}  | 0.54 ± 0.00 {SP1}  | 0.72 ± 0.06 {F1} |
| 2^-2 | 4.21 ± 0.83 {B2}  | 2.33 ± 0.33 {S2}  | 1.00 ± 0.34 {SP2}  | 0.99 ± 0.17 {F2} |
| 2^-3 | 10.99 ± 0.73 {B3} | 5.68 ± 0.67 {S3}  | 3.32 ± 1.31 {SP3}  | 1.22 ± 0.26 {F3} |
| 2^-4 | 18.78 ± 2.40 {B4} | 14.02 ± 0.28 {S4} | 7.33 ± 0.54 {SP4}  | 2.07 ± 0.21 {F4} |
| 2^-5 | 32.38 ± 2.93 {B5} | 29.98 ± 2.62 {S5} | 18.66 ± 3.22 {SP5} | 4.55 ± 0.57 {F5} |

TABLE II: Test errors (%) corresponding to the comparative plot (Fig. 5b).

| Sub-sampling Ratio (R) | Baseline Data (B) | Adding Naively Rotated Poses (B+R) | Adding Linearly Interpolated Poses (B+L) | Adding Our Poses and Sub-pixel Shifts (B+S+P) |
|---|---|---|---|---|
| 2^0  | 0.50 {B0}         | 0.62 {R0}         | 0.50 {L0}          | 0.37 {SP0}         |
| 2^-1 | 1.44 ± 0.33 {B1}  | 1.22 ± 0.19 {R1}  | 1.22 ± 0.14 {L1}   | 0.54 ± 0.00 {SP1}  |
| 2^-2 | 4.21 ± 0.83 {B2}  | 4.38 ± 0.72 {R2}  | 2.56 ± 0.21 {L2}   | 1.00 ± 0.34 {SP2}  |
| 2^-3 | 10.99 ± 0.73 {B3} | 10.01 ± 1.72 {R3} | 7.26 ± 2.56 {L3}   | 3.32 ± 1.31 {SP3}  |
| 2^-4 | 18.78 ± 2.40 {B4} | 19.83 ± 2.21 {R4} | 12.99 ± 1.10 {L4}  | 7.33 ± 0.54 {SP4}  |
| 2^-5 | 32.38 ± 2.93 {B5} | 32.05 ± 7.58 {R5} | 30.79 ± 3.04 {L5}  | 18.66 ± 3.22 {SP5} |

TABLE III: Results from various SAR-ATR efforts using the SOC MSTAR data-set.

| Method | Error (%) using 100% data | Error (%) using ≤ 20% data |
|---|---|---|
| SVM (2016) [32]            | 13.27 | 47.75 (at 20%) |
| SRC (2016) [32]            | 10.24 | 36.35 (at 20%) |
| A-ConvNet (2016) [27]      | 0.87  | 35.90 (at 20%) |
| Ensemble DCHUN (2017) [28] | 0.91  | 25.94 (at 20%) |
| CNN-TL-bypass (2017) [26]  | 0.91  | 2.85 (at 18%)  |
| ResNet (2018) [29]         | 0.33  | 5.70 (at 20%)  |
| DFFN (2019) [34]           | 0.17  | 7.71 (at 20%)  |
| Our Method                 | 0.37  | 1.53 (at 18%)  |

TABLE IV: Confusion matrix for the classifier corresponding to R = 2^-4 with no augmentation in the feature-extraction and classification steps.

| Class | 2S1 | BMP2 | BRDM2 | BTR60 | BTR70 | D7 | T62 | T72 | ZIL131 | ZSU234 | Error (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2S1    | 196 | 0   | 1   | 0   | 3   | 1   | 40  | 5   | 18  | 10  | 28.467 |
| BMP2   | 21  | 117 | 2   | 18  | 10  | 0   | 1   | 23  | 3   | 0   | 40.0   |
| BRDM2  | 9   | 1   | 256 | 1   | 0   | 1   | 0   | 0   | 6   | 0   | 6.569  |
| BTR60  | 2   | 2   | 4   | 161 | 10  | 3   | 2   | 4   | 4   | 3   | 17.436 |
| BTR70  | 21  | 13  | 1   | 23  | 130 | 1   | 0   | 6   | 0   | 1   | 33.673 |
| D7     | 0   | 0   | 0   | 0   | 0   | 264 | 1   | 0   | 7   | 2   | 3.65   |
| T62    | 5   | 0   | 0   | 2   | 0   | 1   | 234 | 4   | 22  | 5   | 14.286 |
| T72    | 5   | 2   | 0   | 3   | 0   | 1   | 16  | 164 | 5   | 0   | 16.327 |
| ZIL131 | 1   | 0   | 0   | 0   | 0   | 34  | 4   | 0   | 234 | 1   | 14.599 |
| ZSU234 | 0   | 0   | 0   | 0   | 0   | 17  | 11  | 0   | 29  | 217 | 20.803 |
| Overall |    |     |     |     |     |     |     |     |     |     | 18.639 |

TABLE V: Confusion matrix for the classifier corresponding to R = 2^-4 with augmentation only in the feature-extraction step.

| Class | 2S1 | BMP2 | BRDM2 | BTR60 | BTR70 | D7 | T62 | T72 | ZIL131 | ZSU234 | Error (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2S1    | 251 | 0   | 0   | 1   | 0   | 1   | 9   | 8   | 4   | 0   | 8.394  |
| BMP2   | 4   | 169 | 0   | 4   | 0   | 0   | 4   | 12  | 1   | 1   | 13.333 |
| BRDM2  | 16  | 8   | 243 | 0   | 0   | 0   | 0   | 0   | 6   | 1   | 11.314 |
| BTR60  | 2   | 1   | 4   | 172 | 6   | 1   | 3   | 1   | 1   | 4   | 11.795 |
| BTR70  | 7   | 2   | 1   | 0   | 184 | 0   | 0   | 2   | 0   | 0   | 6.122  |
| D7     | 0   | 0   | 0   | 0   | 0   | 263 | 0   | 0   | 0   | 11  | 4.015  |
| T62    | 6   | 0   | 0   | 4   | 0   | 1   | 257 | 4   | 1   | 0   | 5.861  |
| T72    | 1   | 0   | 0   | 0   | 0   | 0   | 10  | 183 | 0   | 2   | 6.633  |
| ZIL131 | 6   | 0   | 0   | 0   | 0   | 8   | 6   | 1   | 244 | 9   | 10.949 |
| ZSU234 | 0   | 0   | 0   | 0   | 0   | 0   | 2   | 0   | 0   | 272 | 0.73   |
| Overall |    |     |     |     |     |     |     |     |     |     | 7.711  |

Fig. 3: Overview of the image synthesizing procedure. All boxes with grid lines represent matrices of complex values, either across an (x_k, y_k) ∀ k ∈ [N_R²] grid (Cross-range and Range labeled) or along a (θ_i, f_m) ∀ i ∈ [N_θ], m ∈ [M] grid (θ and f labeled). Blue arrows represent the pre-model-fitting stage and green arrows the post-model-fitting stage.

The Full-data plot in Fig. 5a (values in Table I) shows the importance of extracting good-quality features: if we had access to all the poses, we would learn very good features. Having good features makes classification quite easy, as is evident from the low test errors even with very low data availability; the sub-sampling has little effect on the generalization performance in this case. The Baseline data plot shows a considerable amount of test error, especially in low data availability. This test error is reduced in the B+S plot and further reduced in the B+S+P plot, which shows the effectiveness of both our strategies in improving the quality of the features extracted by the CNN. Note that the majority of the improvement comes from the pose augmentations. For R > 2^-2, the model using both proposed augmentations gives even better performance than the Full-data features. This makes sense because our augmentation strategy successfully fills in pose-information gaps in the complete SOC data. However, there remains a considerable gap between the Full-data and B+S+P plots in the lower data regimes, R < 2^-2, so there still exists scope for further improving the feature extraction.

The confusion matrices for a sample data-set at R = 2^-4 are shown in Tables IV and V. These tables clearly show that the performance has improved considerably with the proposed augmentation of training data in low data availability. Not only that, but the performance has also improved over all classes except two. As there were multiple sub-sampled data-sets at R = 2^-4, we picked the one that was a good representative of the average performance.

B. Comparison with existing SAR-ATR models

Comparing our test error using the full SOC MSTAR data-set, it can be seen from Table III that our approach is on par with the existing approaches when using all the data. We are interested in training the CNN models when data availability is extremely low, say ≤ 60 samples per class. To compare the results of our approach with recent works in extremely low data regimes, we utilized some results from [34] and [26]. We also conducted B+S+P experiments with 18% of the data per class. These are also tabulated in Table III. It is in this extreme sub-sampling regime that our approach really outperforms all other existing approaches: we get the lowest test error even when using the smallest portion of the data. We reiterate that most of the tabulated approaches are decoupled from our data augmentation approach. So, in principle, it may be possible to combine our data augmentation strategy with the existing approaches to get even better results.

Except for [41] and [37], we did not find reproducible data augmentation strategies that explicitly synthesize samples at new poses. As pointed out earlier, our approach can be used in conjunction with most of the other strategies outlined in Section I-A. So, we do a detailed comparison of our pose synthesis approach and sub-pixel-level translations against the pose synthesis methods in [41] and [37]. We add real-time pixel-level translations for all experiments.

Fig. 4: Comparison of radar images synthesized for class T62 at viewing angle θ_c = 57° using different augmentation strategies.
(a) Radar image at a viewing angle of θ_c = 57° generated by rotating the closest image available in the sub-sampled dataset, at θ_a = 56° (from [41]).
(b) Linear interpolation strategy proposed in [37]. Here, α and β can be inferred from equation (10); the closest poses to θ_c = 57° in the sub-sampled dataset were θ_a = 56° and θ_b = 85°.
(c) The radar image at the azimuth angle of θ_a = 56° is first approximated using the proposed set of basis functions in the frequency domain. Measurements from unobserved viewing angles are synthesized using the model in the frequency domain, which is used to create the image at the viewing angle of θ_c = 57°.
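For reference, the two baseline pose-synthesis rules compared in Fig. 4a and 4b, simple rotation [41] and the linear interpolation of [37] (reproduced as equation (10) below), can be sketched as follows. This assumes magnitude images and uses SciPy's rotation; it is not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import rotate

def rotate_cw(img: np.ndarray, deg: float) -> np.ndarray:
    """Clockwise rotation R_theta; scipy rotates counter-clockwise
    for positive angles, hence the sign flip."""
    return rotate(img, -deg, reshape=False, order=1)

def interpolate_pose(img_a, theta_a, img_b, theta_b, theta_c):
    """Equation (10) from [37]: rotate the two nearest-pose images to a
    common orientation, blend with inverse-distance weights, and rotate
    the blend counter-clockwise back to the target pose theta_c."""
    w_a = abs(theta_b - theta_c)
    w_b = abs(theta_a - theta_c)
    blended = (w_a * rotate_cw(img_a, theta_a)
               + w_b * rotate_cw(img_b, theta_b)) / (w_a + w_b)
    return rotate_cw(blended, -theta_c)  # CR_theta_c
```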

Fig. 5: Classifier performance using various data augmentations. Both panels plot Test Error [%] against Subsampling Ratio on logarithmic axes, over ratios 2^-5 through 2^0. (a) Analyses of the proposed augmentations (curves B, B+S, B+S+P, F). (b) Comparison with other augmentations (curves B, B+R, B+L, B+S+P).

For the sake of completeness: the simple rotations are produced in [41] using the rotation matrix followed by appropriate cropping, and the linearly interpolated poses are synthesized in [37] using the following equation:

\[
I_{\theta_c} = CR_{\theta_c}\!\left( \frac{|\theta_b - \theta_c|\, R_{\theta_a}(I_{\theta_a}) + |\theta_a - \theta_c|\, R_{\theta_b}(I_{\theta_b})}{|\theta_a - \theta_c| + |\theta_b - \theta_c|} \right) \tag{10}
\]

where R_θ(I) denotes the rotation of radar image I by θ degrees clockwise, CR_θ(I) denotes the same but counter-clockwise, I_θ denotes the radar image at pose θ, θ_c is the desired new pose, and θ_a and θ_b are the poses closest to θ_c in the training data. We illustrate the images synthesized of the T62 tank for θ_c = 57°; the corresponding ground-truth image, which is part of the full SOC data, is used for comparison purposes only. In Fig. 4b we show the image synthesized for θ_c = 57° using θ_a = 56° and θ_b = 85° from a sub-sampled data-set with R = 2^-4. We note that the dominant scattering centers in the synthesized image differ from those in the ground truth. Next, we used the proposed model estimated for azimuth angle 56°. It is evident from Fig. 4c that the image synthesized by our method captures all the dominant scattering centers present in the ground truth. Finally, we use the rotation operator to synthesize θ_c = 57° from θ_a = 56°. We observe that, even though the rotation is small, the ground-truth image has a different scattering behavior, which is not captured by rotation. The comparison is tabulated in Table II and can be seen in Fig. 5b. It is evident that our approach is significantly better than both simple rotations and linearly interpolated poses for this CNN-based ATR task.

V. CONCLUSION

In this paper, we proposed incorporating domain knowledge into a CNN by way of data augmentation. To this end, we presented a pair of data-augmentation strategies for training the neural network architecture to solve the ATR problem with limited labeled data. We empirically verified the augmentation strategies' effectiveness by training a neural network with an augmented data-set synthesized using the phase-history approximation method proposed in [46], [57] and including sub-pixel shifts. We verified that the augmentations gave a significant improvement in the model's generalization performance compared to the baseline performance over a wide range of sub-sampling ratios. However, the presented phase-history approximation method is only valid in a local region around a given azimuth angle. We hypothesize that a global model for each class could produce a diverse set of SAR images over larger pose variations. Since the MSTAR target image chips are not perfectly registered and aligned across different azimuth angles, we would have to incorporate an unknown correction phase term for each image chip to successfully generate such a unified model in the phase-history domain. As part of future research, we propose developing a network architecture to learn a unified model that can account for these phase errors and synthesize a larger data-set to improve the classifier's performance further.

ACKNOWLEDGEMENTS

This research was partially supported by Army Research Office grant W911NF-11-1-0391 and NSF Grant IIS-1231577.

REFERENCES

[1] R. L. Moses, L. C. Potter, and M. Cetin, "Wide-angle SAR imaging," in Algorithms for Synthetic Aperture Radar Imagery XI, vol. 5427. International Society for Optics and Photonics, 2004, pp. 164–175.
[2] L. C. Potter and R. L. Moses, "Attributed scattering centers for SAR ATR," IEEE Transactions on Image Processing, vol. 6, no. 1, pp. 79–91, Jan 1997.
[3] M. Çetin, I. Stojanović, N. Ö. Önhon, K. Varshney, S. Samadi, W. C. Karl, and A. S. Willsky, "Sparsity-driven synthetic aperture radar imaging: Reconstruction, autofocusing, moving targets, and compressed sensing," IEEE Signal Processing Magazine, vol. 31, no. 4, pp. 27–40, July 2014.
[4] L. C. Potter, E. Ertin, J. T. Parker, and M. Cetin, "Sparsity and compressed sensing in radar imaging," Proceedings of the IEEE, vol. 98, no. 6, pp. 1006–1020, June 2010.
[5] K. Dungan and L. Potter, "Classifying transformation-variant attributed point patterns," Pattern Recognition, vol. 43, no. 11, pp. 3805–3816, 2010. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S003132031000261X

[6] K. E. Dungan and L. C. Potter, "Classifying vehicles in wide-angle radar using pyramid match hashing," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 3, pp. 577–591, June 2011.
[7] T. Abdelrahman and E. Ertin, "Mixture of factor analyzers models of appearance manifolds for resolved SAR targets," in Algorithms for Synthetic Aperture Radar Imagery XXII, vol. 9475. International Society for Optics and Photonics, 2015, p. 94750G.
[8] C. Hegde, A. C. Sankaranarayanan, W. Yin, and R. G. Baraniuk, "NuMax: A convex approach for learning near-isometric linear embeddings," IEEE Transactions on Signal Processing, vol. 63, no. 22, pp. 6109–6121, Nov 2015.
[9] D. Teng and E. Ertin, "WALD-kernel: A method for learning sequential detectors," in 2016 IEEE Statistical Signal Processing Workshop (SSP), June 2016, pp. 1–5.
[10] J. Cui, J. Gudnason, and M. Brookes, "Hidden Markov models for multi-perspective radar target recognition," in 2008 IEEE Radar Conference, May 2008, pp. 1–5.
[11] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov 1998.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017. [Online]. Available: http://doi.acm.org/10.1145/3065386
[13] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006. [Online]. Available: https://science.sciencemag.org/content/313/5786/504
[14] S. Chen, H. Wang, F. Xu, and Y. Jin, "Target classification using the deep convolutional networks for SAR images," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 4806–4817, Aug 2016.
[15] Y. Zhong and G. Ettinger, "Enlightening deep neural networks with knowledge of confounding factors," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1077–1086.
[16] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, 2015.
[17] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Advances in Neural Information Processing Systems, 2014, pp. 3320–3328.
[18] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, "Learning and transferring mid-level image representations using convolutional neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1717–1724.
[19] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014.
[20] S. Gunasekar, J. Lee, D. Soudry, and N. Srebro, "Characterizing implicit bias in terms of optimization geometry," in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Dy and A. Krause, Eds., vol. 80. Stockholmsmässan, Stockholm, Sweden: PMLR, 10–15 Jul 2018, pp. 1832–1841. [Online]. Available: http://proceedings.mlr.press/v80/gunasekar18a.html
[21] N. A. Ruhi and B. Hassibi, "Stochastic gradient/mirror descent: Minimax optimality and implicit regularization," in 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019. [Online]. Available: https://openreview.net/forum?id=HJf9ZhC9FX
[22] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[23] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[24] B. Neyshabur, Z. Li, S. Bhojanapalli, Y. LeCun, and N. Srebro, "The role of over-parametrization in generalization of neural networks," in 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019. [Online]. Available: https://openreview.net/forum?id=BygfghAcYX
[25] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2009.
[26] Z. Huang, Z. Pan, and B. Lei, "Transfer learning with deep convolutional neural network for SAR target classification with limited labeled data," Remote Sensing, vol. 9, no. 9, p. 907, 2017.
[27] S. Chen, H. Wang, F. Xu, and Y.-Q. Jin, "Target classification using the deep convolutional networks for SAR images," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 4806–4817, 2016.
[28] Z. Lin, K. Ji, M. Kang, X. Leng, and H. Zou, "Deep convolutional highway unit network for SAR target classification with limited labeled training data," IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 7, pp. 1091–1095, 2017.
[29] Z. Fu, F. Zhang, Q. Yin, R. Li, W. Hu, and W. Li, "Small sample learning optimization for ResNet based SAR target recognition," in IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2018, pp. 2330–2333.
[30] G. Dong, N. Wang, and G. Kuang, "Sparse representation of monogenic signal: With application to target recognition in SAR images," IEEE Signal Processing Letters, vol. 21, no. 8, pp. 952–956, 2014.
[31] S. Song, B. Xu, and J. Yang, "SAR target recognition via supervised discriminative dictionary learning and sparse representation of the SAR-HOG feature," Remote Sensing, vol. 8, no. 8, p. 683, 2016.
[32] H. Song, K. Ji, Y. Zhang, X. Xing, and H. Zou, "Sparse representation-based SAR image target classification on the 10-class MSTAR data set," Applied Sciences, vol. 6, no. 1, p. 26, 2016.
[33] Y. Huang, G. Liao, Z. Zhang, Y. Xiang, J. Li, and A. Nehorai, "SAR automatic target recognition using joint low-rank and sparse multiview denoising," IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 10, pp. 1570–1574, 2018.
[34] Q. Yu, H. Hu, X. Geng, Y. Jiang, and J. An, "High-performance SAR automatic target recognition under limited data condition based on a deep feature fusion network," IEEE Access, vol. 7, pp. 165646–165658, 2019.
[35] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, "Understanding deep learning requires rethinking generalization," in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017. [Online]. Available: https://openreview.net/forum?id=Sy8gdB9xx
[36] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.
[37] J. Ding, B. Chen, H. Liu, and M. Huang, "Convolutional neural network with data augmentation for SAR target recognition," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 3, pp. 364–368, 2016.
[38] D. Marmanis, W. Yao, F. Adam, M. Datcu, P. Reinartz, K. Schindler, J. D. Wegner, and U. Stilla, "Artificial generation of big data for improving image classification: A generative adversarial network approach on SAR data," arXiv preprint arXiv:1711.02010, 2017.
[39] F. Gao, Y. Yang, J. Wang, J. Sun, E. Yang, and H. Zhou, "A deep convolutional generative adversarial networks (DCGANs)-based semi-supervised method for object recognition in synthetic aperture radar (SAR) images," Remote Sensing, vol. 10, no. 6, p. 846, 2018.
[40] M. Cha, A. Majumdar, H. Kung, and J. Barber, "Improving SAR automatic target recognition using simulated images under deep residual refinements," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 2606–2610.
[41] Y. Zhai, H. Ma, J. Liu, W. Deng, L. Shang, B. Sun, Z. Jiang, H. Guan, Y. Zhi, X. Wu et al., "SAR ATR with full-angle data augmentation and feature polymerisation," The Journal of Engineering, vol. 2019, no. 19, pp. 6226–6230, 2019.
[42] M. B. Alver, S. Atito, and M. Çetin, "SAR ATR in the phase history domain using deep convolutional neural networks," in Image and Signal Processing for Remote Sensing XXIV, vol. 10789. International Society for Optics and Photonics, 2018, p. 1078913.
[43] J. Wang, P. Virtue, and S. X. Yu, "Joint embedding and classification for SAR target recognition," arXiv preprint arXiv:1712.01511, 2017.
[44] B. L. Burns and J. T. Cordaro, "SAR image-formation algorithm that compensates for the spatially variant effects of antenna motion," in Algorithms for Synthetic Aperture Radar Imagery, D. A. Giglio, Ed., vol. 2230, International Society for Optics and Photonics. SPIE, 1994, pp. 14–24.
[45] P. Luo, X. Wang, W. Shao, and Z. Peng, "Towards understanding regularization in batch normalization," 2018.
[46] N. Sugavanam and E. Ertin, "Limited persistence models for SAR automatic target recognition," in Algorithms for Synthetic Aperture Radar Imagery XXIV, vol. 10201. International Society for Optics and Photonics, 2017, p. 102010M.
[47] N. Sugavanam, E. Ertin, and R. Burkholder, "Approximating bistatic SAR target signatures with sparse limited persistence scattering models," in International Conference on Radar, Brisbane, 2018.

[48] N. Sugavanam, E. Ertin, and R. Burkholder, "Compressing bistatic SAR target signatures with sparse-limited persistence scattering models," IET Radar, Sonar & Navigation, vol. 13, no. 9, pp. 1411–1420, 2019.
[49] N. O. Onhon and M. Cetin, "A sparsity-driven approach for joint SAR imaging and phase error correction," IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 2075–2088, April 2012.
[50] A. E. Brito, S. H. Chan, and S. D. Cabrera, "SAR image formation using 2D reweighted minimum norm extrapolation," in Algorithms for Synthetic Aperture Radar Imagery VI, vol. 3721. International Society for Optics and Photonics, 1999, pp. 78–91.
[51] M. Cetin, "Feature-enhanced synthetic aperture radar imaging," Ph.D. dissertation, Boston University, 2001.
[52] J. A. Jackson, B. D. Rigling, and R. L. Moses, "Canonical scattering feature models for 3D and bistatic SAR," IEEE Transactions on Aerospace and Electronic Systems, vol. 46, no. 2, pp. 525–541, April 2010.
[53] K. Sarabandi and T.-C. Chiu, "Optimum corner reflectors for calibration of imaging radars," IEEE Transactions on Antennas and Propagation, vol. 44, no. 10, pp. 1348–1361, Oct 1996.
[54] L. C. Potter, D.-M. Chiang, R. Carriere, and M. J. Gerry, "A GTD-based parametric model for radar scattering," IEEE Transactions on Antennas and Propagation, vol. 43, no. 10, pp. 1058–1067, Oct 1995.
[55] H. Rauhut and R. Ward, "Interpolation via weighted L1 minimization," Applied and Computational Harmonic Analysis, vol. 40, no. 2, pp. 321–351, 2016.
[56] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., "Tensorflow: A system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.
[57] N. Sugavanam and E. Ertin, "Interrupted SAR imaging with limited persistence scattering models," in 2017 IEEE Radar Conference (RadarConf), May 2017, pp. 1770–1775.