Data-Driven Cepstral and Neural Learning of Features for Robust Micro-Doppler Classification
Baris Erol^a, Mehmet Saygin Seyfioglu^b, Sevgi Zubeyde Gurbuz^c, and Moeness Amin^a

^a Center for Advanced Communication, Villanova University, Villanova, PA 19085, USA
^b Dept. of Electrical-Electronics Eng., TOBB Univ. of Economics and Technology, Ankara, Turkey
^c Dept. of Electrical and Computer Engineering, University of Alabama, AL 35487, USA

ABSTRACT

Automatic target recognition (ATR) using micro-Doppler analysis has been a topic of great research interest over the past decade, with key applications in border control and security, perimeter defense, and force protection. Recognition of the movements of animals, humans, and drones can be accomplished through classification of the target's micro-Doppler signature. Typically, classification is based on a set of fixed, pre-defined features extracted from the signature; however, such features can perform poorly under low signal-to-noise ratio (SNR), or when the number and similarity of classes increase. This paper proposes a novel set of data-driven frequency-warped cepstral coefficients (FWCC) for classification of micro-Doppler signatures, and compares their performance with that of data-driven features learned by deep neural networks (DNNs). FWCC features are computed by first filtering the discrete Fourier transform (DFT) of the input signal with a frequency-warped filter bank, and then computing the discrete cosine transform (DCT) of the logarithm of the filter outputs. The filter bank is optimized for radar using genetic algorithms (GA), which adjust the spacing, weight, and width of the individual filters. For an 11-class case of human activity recognition, it is shown that the proposed data-driven FWCC features yield classification accuracy similar to that of DNNs, thus providing interesting insights into the benefits of learned features.

1. INTRODUCTION

In many automatic target recognition (ATR) and activity discrimination applications, micro-Doppler signature analysis is used for classification.1 Micro-Doppler refers to frequency modulations about the central Doppler shift caused by vibration or rotation of parts of the target.2 Thus, the rotation of vehicle wheels, the treads of a tank, the spinning of helicopter blades, and the periodic motion of the arms and legs during walking or other human and animal activities all result in unique micro-Doppler signatures. By identifying patterns in these signatures, different targets or motions can be recognized. Classification is typically based on the values of pre-defined features extracted from the micro-Doppler signature, such as physical features3 (e.g., average velocity, stride rate, or Doppler bandwidth) and discrete cosine transform (DCT) coefficients,4 among others. Recently, speech features,5 such as linear predictive coding (LPC) coefficients6 and mel-frequency cepstrum coefficients (MFCC),5,7 have also been proposed for micro-Doppler classification. Mel-frequency cepstrum coefficients are computed by first filtering the signal with a filter bank defined according to the mel scale, after which the discrete cosine transform is taken to yield the cepstrum coefficients. The mel scale is based on the perception of equal distance between pitches and thus mimics the logarithmic response of the human auditory system. However, the Doppler frequency spread in micro-Doppler signatures is completely unrelated to the mel scale. In fact, by placing fewer, wider filters at high frequencies, the mel scale obscures the higher-frequency components of the micro-Doppler signature that are essential to target recognition. Moreover, the frequency spread of micro-Doppler depends not only on target motion but also on radar parameters, and is typically quite different from the mel scale.
Micro-Doppler can also have negative frequencies, whereas the mel-frequency filter bank considers only positive frequencies. Thus, recent work8 has shown that MFCC perform sub-optimally in radar applications, and alternative filter banks, such as hyperbolically-warped cepstral coefficients, have been proposed. In this work, two data-driven approaches for feature learning are compared: an automatic method for optimizing the filter bank used in cepstral processing with genetic algorithms (i.e., frequency-warped cepstral coefficients (FWCC)), and a convolutional autoencoder that uses unsupervised pre-training followed by supervised fine-tuning to optimize the weights of the neural network. Deep learning has recently come to light as a highly effective method for classifying radar imagery.9–11 However, radar micro-Doppler data is acquired as a time-stream of complex samples, from which, after several pre-processing stages, a time-frequency transform of the data (the spectrogram) is presented as an image to the deep neural network (DNN). As such, target kinematics and phenomenology affect each pixel value, which is not simply a digital number proportional to optical perception. In comparing these two methods, we advocate the view that it is essential to consider the physical basis of the data, and indeed different domain representations, in the classification of radar micro-Doppler. Our results show that frequency-warped cepstral filtering can yield results similar to those obtainable with deep learning. This offers an interesting perspective on which properties of the data are most relevant for target recognition, and on how this knowledge can be fused with deep learning architectures to achieve even more substantial improvements in robust, pervasive, automatic target recognition.

2. HUMAN MICRO-DOPPLER DATABASE

Radar returns may be acquired using a wide range of sensors, including continuous wave (CW), linear frequency modulated continuous wave (FMCW), and pulse Doppler radars, with transmit frequencies ranging from as low as 2.4 GHz to as high as 77 GHz in automotive applications. In this work, the comparison between data-driven cepstral features and DNNs is carried out on a CW radar system operating at a 4 GHz center frequency. The 4 GHz S-band CW data is collected using a USRP model 2922 software-defined radio platform positioned 1 meter above the ground, illuminating subjects at distances of 1 to 5 meters who move along the radar line-of-sight. For a CW radar, the transmitted signal can be expressed as

s(t) = s_0 \cos(\omega_0 t), \quad (1)

where s_0 is the amplitude of the transmitted signal and \omega_0 is its angular frequency (rad/s). The echo signal from a moving target can then be written as

s_r(t) = a_t s_0 \cos[(\omega_0 + \omega_d) t + \phi], \quad (2)

where \omega_d is the Doppler angular frequency shift caused by the motion of the target, \phi is a constant phase shift (dependent upon the distance between the radar and the target), and a_t is the amplitude as computed from the radar range equation

a_t = \frac{\sqrt{G}\,\lambda\,\sqrt{P_i \sigma_i}}{(4\pi)^{1.5}\, R_{t,i}^{2}\, \sqrt{L_s}\,\sqrt{L_a}}. \quad (3)

Here, G is the antenna gain, P is the transmitter power, \sigma is the radar cross section (RCS) of the target, and L_s and L_a represent the system and atmospheric losses, respectively. In this paper, we work with the discrete version of the baseband radar return in (2), given by

s(n) = s_r(t)\big|_{t = n T_s}, \quad n = 0, 1, \ldots, N-1, \quad (4)

where T_s is the sampling period. Finally, micro-Doppler signatures are computed as the spectrogram S, the modulus squared of the short-time Fourier transform (STFT) of the radar return:

S(n,k) = |STFT(n,k)|^2 = \left| \sum_{m=-\infty}^{\infty} s(n+m)\, w(m)\, e^{-jkm} \right|^2, \quad (5)

where w(m) is a window function, which affects both the frequency and time resolution.
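As an illustrative sketch (not the authors' code), the baseband return of (2) and the spectrogram of (5) can be simulated in Python. The sampling rate, bulk Doppler shift, micro-motion rate, and modulation depth below are all hypothetical stand-ins for a real radar measurement:

```python
import numpy as np
from scipy.signal import stft

# Hypothetical parameters for a simulated baseband CW radar return
fs = 1000.0                 # sampling rate (Hz), assumed
T = 4.0                     # observation time (s)
t = np.arange(int(fs * T)) / fs

f_d = 100.0                 # bulk Doppler shift (Hz), hypothetical
f_m = 2.0                   # micro-motion (e.g., limb swing) rate (Hz)
B = 60.0                    # micro-Doppler frequency deviation (Hz)

# Complex baseband echo: bulk Doppler plus a sinusoidal micro-Doppler term
phase = 2 * np.pi * f_d * t + (B / f_m) * np.sin(2 * np.pi * f_m * t)
s = 0.5 * np.exp(1j * phase)

# Spectrogram = |STFT|^2, as in Eq. (5); two-sided since the signal is complex
f, tau, Z = stft(s, fs=fs, window='hamming', nperseg=256, noverlap=192,
                 return_onesided=False)
S = np.abs(Z) ** 2
print(S.shape)  # (frequency bins, time frames)
```

Because the baseband signal is complex, the two-sided STFT retains the negative Doppler frequencies that the text notes are essential for micro-Doppler analysis.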
The spectrograms were computed using a Hamming window of length 2048 samples, 4096 fast Fourier transform (FFT) points, and an overlap of 512 samples. Each spectrogram was converted to gray-scale and saved as an image. To reduce dimensionality, the resulting images were then down-sampled from 2048x1339 pixels to 128x128 pixels. Figure 1 shows the resulting micro-Doppler signatures for eleven different activity classes. The gait for each class was enacted as follows: C1: walking with a walker, with arm swing restricted; C2: medium-speed walking with shortened arm swing; C3: falling from a chair; C4: using a tripod cane and metal cane; C5: limping, with the left foot dragging on the ground behind the right foot; C6: wheelchair; C7: crawling (slow advancement on hands and knees); C8: medium-speed jogging with both arms held bent at the elbows, shortened swing; C9: walking with crutches; C10: sitting; C11: creeping (military-style motion with the belly on the ground). It is important to note that in this paper we consider only spectrograms as a representative of quadratic time-frequency distributions (QTFDs).

Figure 1: Micro-Doppler signatures of eleven different motions. (a) C1: Walker, (b) C2: Walking, (c) C3: Falling sideways, (d) C4: Using tripod cane, (e) C5: Limping, (f) C6: Wheelchair, (g) C7: Crawling, (h) C8: Running, (i) C9: Using crutches, (j) C10: Sitting, (k) C11: Creeping.

3. GENETICALLY OPTIMIZED CEPSTRAL FEATURE DESIGN

In speech processing, the mel-frequency cepstrum (MFC) represents the short-term power spectrum of a signal.12 MFCCs are derived from the MFC of a given signal by taking the discrete Fourier transform (DFT) of the windowed input signal, and then passing this signal through a filter bank whose center frequencies and bandwidths are specified according to the mel scale. The mel scale is a perceptual scale of pitches judged by listeners to be equal in distance from one another.
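The pre-processing pipeline above can be sketched as follows. This is a minimal illustration, assuming that "512 samples overlap" means adjacent windows share 512 samples, and using simple decimation for the down-sampling since the paper does not specify an interpolation method; a random complex signal stands in for a real radar return:

```python
import numpy as np
from scipy.signal import stft

# Stand-in for a recorded complex baseband radar return
rng = np.random.default_rng(0)
x = rng.standard_normal(60000) + 1j * rng.standard_normal(60000)

# Hamming window, 2048-sample segments, 4096 FFT points, 512-sample overlap
f, t, Z = stft(x, window='hamming', nperseg=2048, noverlap=512,
               nfft=4096, return_onesided=False)
S = np.abs(Z) ** 2

# Gray-scale image: log-scale the power and normalize to [0, 1]
img = 10 * np.log10(S + 1e-12)
img = (img - img.min()) / (img.max() - img.min())

# Down-sample to 128x128 by index decimation (assumed method)
rows = np.linspace(0, img.shape[0] - 1, 128).astype(int)
cols = np.linspace(0, img.shape[1] - 1, 128).astype(int)
img_small = img[np.ix_(rows, cols)]
print(img_small.shape)  # (128, 128)
```

The resulting 128x128 array is what would be saved as the gray-scale image fed to the classifiers.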
Mathematically, the mel scale is defined as

b_{mel} = 2595 \log_{10}\left(1 + \frac{f_{Hz}}{700}\right), \quad (6)

where f_{Hz} is the original frequency and b_{mel} is the warped frequency representing the perceived pitch. Mel-frequency filtering yields information on the amount of signal energy residing within different frequency bands, i.e., a weighted power spectrum, and can be extracted by passing the filter bank over the spectrogram, followed by taking the logarithm:

P(n,m) = \log\left( \sum_{k=0}^{N-1} S(n,k)\, h_m(k) \right), \quad (7)

where h_m is the m-th individual filter in the filter bank, m = 1, 2, \ldots, M, and M is the number of filters used.
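Equations (6) and (7) can be implemented compactly. The sketch below builds a standard triangular mel filter bank and applies it to one spectrogram frame; the filter count M, FFT size N, and sampling rate are hypothetical choices, not values from the paper:

```python
import numpy as np

def hz_to_mel(f_hz):
    """Eq. (6): warp frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(b_mel):
    """Inverse of Eq. (6)."""
    return 700.0 * (10.0 ** (b_mel / 2595.0) - 1.0)

def mel_filter_bank(M=20, N=512, fs=8000.0):
    """Triangular filters h_m(k), m = 1..M, on an N-bin frequency grid.
    Center frequencies are equally spaced on the mel scale."""
    edges_mel = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), M + 2)
    edges_bin = np.floor(mel_to_hz(edges_mel) / (fs / 2) * (N - 1)).astype(int)
    H = np.zeros((M, N))
    for m in range(1, M + 1):
        lo, ctr, hi = edges_bin[m - 1], edges_bin[m], edges_bin[m + 1]
        for k in range(lo, ctr):          # rising edge of triangle
            H[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):          # falling edge of triangle
            H[m - 1, k] = (hi - k) / max(hi - ctr, 1)
    return H

# Eq. (7): log filter-bank energies for one spectrogram frame S(n, :)
H = mel_filter_bank()
frame = np.abs(np.random.default_rng(1).standard_normal(512)) ** 2
P = np.log(H @ frame + 1e-12)   # small offset guards against log(0)
print(P.shape)  # (20,)
```

Taking the DCT of P(n, ·) across the filter index m then yields the cepstral coefficients. The FWCC approach described in this paper replaces the fixed mel spacing above with filter positions, widths, and weights optimized by the genetic algorithm.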