electronics
Article A Low Complexity, High Throughput DoA Estimation Chip Design for Adaptive Beamforming
Kuan-Ting Chen 1, Wei-Hsuan Ma 2, Yin-Tsung Hwang 1,* and Kuan-Ying Chang 2
1 Department of Electrical Engineering, National Chung Hsing University, Taichung City 402, Taiwan; [email protected] 2 ILITEK co. ltd, HsinChu 302, Taiwan; [email protected] (W.-H.M.); [email protected] (K.-Y.C.) * Correspondence: [email protected]
Received: 11 March 2020; Accepted: 10 April 2020; Published: 13 April 2020
Abstract: Direction of Arrival (DoA) estimation is essential to adaptive beamforming widely used in many radar and wireless communication systems. Although many estimation algorithms have been investigated, most of them focus on the performance enhancement aspect but overlook the computing complexity or the hardware implementation issues. In this paper, a low-complexity yet effective DoA estimation algorithm and the corresponding hardware accelerator chip design are presented. The proposed algorithm features a combination of signal sub-space projection and parallel matching pursuit techniques, i.e., applying signal projection first before performing matching pursuit from a codebook. This measure helps minimize the interference from noise sub-space and makes the matching process free of extra orthogonalization computations. The computing complexity can thus be reduced significantly. In addition, estimations of all signal sources can be performed in parallel without going through a successive update process. To facilitate an efficient hardware implementation, the computing scheme of the estimation algorithm is also optimized. The most critical part of the algorithm, i.e., calculating the projection matrix, is largely simplified and neatly accomplished by using QR decomposition. In addition, the proposed scheme supports parallel matches of all signal sources from a beamforming codebook to improve the processing throughput. The algorithm complexity analysis shows that the proposed scheme outperforms other well-known estimation algorithms significantly under various system configurations. The performance simulation results further reveal that, subject to a beamforming codebook with a 5◦ angular resolution, the Root Mean Square (RMS) error of angle estimations is only 0.76◦ when Signal to Noise Ratio (SNR) = 20 dB. The estimation accuracy outpaces other matching pursuit based approaches and is close to that of the classic Estimation of Signal Parameters Via Rotational Invariance Techniques (ESPRIT) scheme but requires only one fifth of its computing complexity. In developing the hardware accelerator design, pipelined Coordinate Rotation Digital Computer (CORDIC) processors consisting of simple adders and shifters are employed to implement the basic trigonometric operations needed in QR decomposition. A systolic array architecture is developed as the computing kernel for QR decomposition. Other computing modules are also realized using various linear systolic arrays and chained together seamlessly to maximize the computing throughput. A Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm CMOS process was chosen as the implementation technology. The gate count of the chip design is 454.4k, featuring a core size of 0.76 mm2, and can operate up to 333 MHz. This suggests that one DoA estimation, with up to three signal sources, can be performed every 2.38 µs.
Keywords: DoA estimation; adaptive beamforming; chip design; hardware accelerator; QR decomposition
Electronics 2020, 9, 641; doi:10.3390/electronics9040641 www.mdpi.com/journal/electronics Electronics 2020, 9, 641 2 of 27
1. Introduction Beamforming is a technique of transmitting signal to or receiving signal from a designated direction. Because of the discrepancy in treating signals from different directions, it is also known as filtering in the space domain. Beamforming has been widely used in radar, wireless communication and medical imagining systems and requires an antenna (or transducer) array to accomplish it. By properly controlling the phases and amplitudes of signal on each antenna, the transmission or reception signal gain along specific directions can be enhanced while those along other directions will be suppressed. The benefits of beamforming include improved link budget, extended service range and interference suppression. In adaptive beamforming, the beams are not fixed. Instead, they are steered dynamically toward the directions of signal sources. Although not all beamforming applications require explicit directional information of the signal sources, DoA estimation is essential in radar signal processing and establishing wireless communication links. Explicit DoA information also helps achieve better interference cancellation.
1.1. Related Works of DoA Estimation The basics of DoA estimation is exploring the phase differences of signals received from an antenna array to distinguish the signal direction. In the presence of multiple signal sources and background noises, the orthogonal property between the noise and signal subspaces is assumed. The first step of estimation is calculating the auto-correlation matrix using the received antenna array 1 signals. A uniformly spaced (by a pitch of 2 ) linear antenna array is assumed for 1-D (azimuth angle) estimation. The Radio Frequency (RF) signals are down converted to baseband and form a signal vector x(t) in each digital sampling. If the RF modulation does include both in-phase and quadrature phase components, an additional Hilbert transform is needed to obtain complex-valued signal vectors. Conventional estimation schemes can be classified into three categories, i.e., noise subspace, signal sub-space and matching pursuit. For the noise subspace approach, the noise subspace is determined using the auto-correlation matrix. A MUSIC (MUltiple SIgnal Classification) scheme and its variations [1–3] perform Eigenvalue Decomposition (EVD) to obtain the noise sub-space spanned by the Eigen vectors corresponding to smaller Eigen values. The Eigen vectors of noise sub-space are used to construct a spatial frequency spectrum function. A spatial frequency scanning procedure is next performed and the peaks of the spectrum indicate the directions of signal sources. In [4,5], a ROOT-MUSIC scheme, aimed at eliminating the expensive spatial frequency scanning procedure by finding the roots of a nonlinear polynomial, was proposed. In [6], the noise space estimation is improved by applying oblique projection along the signal sub-space. For the signal sub-space approach, on the contrary, the signal sub-space is extracted via auto-correlation matrix decomposition. Instead of spectrum scanning, the signal directions are computed directly. ESPRIT (Estimation of Signal Parameters Via Rotational Invariance Techniques) [7–9] is considered a classic scheme of this category. It applies the singular value decomposition first and uses the left singular matrix to represent the signal sub-space. By finding the transform matrix between the two sub-matrices of the singular matrix, all signal directions can be determined simultaneously following an additional Eigen value decomposition. Enhanced schemes such as [10] employs an additional time-frequency data model exhibiting multi-invariance to improve the estimation accuracy and scheme [11] can tackle the coherent signal detection problem. These two categories of estimation are computationally intensive because of various matrix decompositions, which requires iterative computations to obtain the results. The third category of estimation scheme assumes a codebook (or dictionary matrix) consisting of steering vectors is available. A steering vector outlines the phase differences of received signals of the antenna array subject to a specific direction of arrival. The estimation is performed by finding the best match from the codebook using some greedy schemes such as Matching Pursuit (MP) [12–14]. The size of the codebook indicates the angular resolution of the estimation. Since the codebook is not orthogonal (steering vectors are not mutually orthogonal), the accuracy of matching can be degraded if the angular Electronics 2020, 9, 641 3 of 27 separation of two signal sources is small. In view of this, an enhancement using Orthogonal Matching Pursuit (OMP) [15], which calls for an extra orthogonalization process after each match, was proposed. Other improvements include the scheme in [16], which introduces an extra majorization–minimization measure, and the scheme in [17], which applies a conjugate gradient method with subtraction method. In [18], the performance of the OMP scheme is enhanced with iterative local searching. In [19], the OMP scheme is applied to hybrid non-uniform array. Although this category of estimation schemes is considered less computation demanding than the first two counterparts, expensive computations such as pseudo matrix inverse computations are still needed.
1.2. Motivations and Contributions Note that the frequency of performing DoA estimation in adaptive beamforming can be high, in particular, in fast time varying environments such as for vehicular communications. In addition, array signal processing is computationally intensive and the algorithms usually exhibit a complexity of O(N3), where N is the antenna array size. For radar systems employing hundreds of antenna, the computing requirement is humongous. Therefore, conventional programmable digital signal processor approaches may not be able to process the data in real time and only the hardwired approaches are feasible. Hardwired approaches also provide much better power consumption than programmable approaches. To achieve an efficient hardware accelerator design, we need to 1) reduce the computing complexity of the algorithm, and 2) map the algorithm neatly to a hardware architecture supporting massive parallel processing and pipelined operations. In this paper, we aim at tackling these two issues and develop a low complexity, high throughput, DoA estimation accelerator design in chip. The contributions of this work can be summarized as follow: In the algorithm development perspective, various design measures were devised to facilitate low complexity DoA estimations. These include applying a signal sub-space projection to enhance the accuracy of matching pursuit, calculating only a subset of auto-correlation matrix to avoid computing redundancy, and simplifying the projection matrix computation via QR factorization, which factorizes a matrix to the product of a unitary matrix Q and an upper triangular matrix R. These measures successfully reduce the computing complexity to only one fifth that of the ESPRIT scheme while largely maintaining the estimation accuracy. In the hardware accelerator design perspective, an effective algorithm to hardware mapping was conducted. Efficient systolic array designs were developed for each computing module and integrated seamlessly to avoid any data disruption between modules. The data input scheme was taken into consideration when determining data buffering and hardware allocation. This avoids excessive computing concurrency and minimizes the hardware complexity. An efficient QR factorization design based on the Coordinate Rotation Digital Computer (CORDIC) units was also developed. All of these design measures and innovations can be equally applied to likewise designs. Finally, a chip design in TSMC 40 nm CMOS process technology was presented to demonstrate the significant speed up can be achieved. The gate count of the chip design is 454.4 k and can operate up to 333 MHz. This suggests that one DoA estimation, with up to three signal sources, can be performed every 2.38 µs. The rest of the paper is organized as follows: in Section2, a brief recap of three DoA estimation schemes, i.e., ESPRIT, Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP), used in the performance comparison is given first. The proposed projection based parallel matching pursuit (P2MP) scheme is next presented. In Section3, the performance evaluations of the proposed scheme against existing ones are described. The evaluations include computing complexity, estimation accuracy and how DoA estimation enhances Frequency Modulated Continuous Waveform (FMCW) radar detection. The performance robustness of the proposed scheme to the assumption of the number of signal sources is also verified. In Section4, the target application of the proposed DoA estimation scheme and the hardware design specifications are first introduced. The system architecture, timing diagram, and individual hardware module designs are next elaborated followed by a chip implementation report and comparison with existing works. Electronics 2020, 9, 641 4 of 27
2. Projection and Parallel Matching Pursuit Based DoA Estimation Electronics 2020, 9, x FOR PEER REVIEW 4 of 26 2.1. Recap of Conventional DoA Estimation Schemes Before introducing the proposed DoA estimation schemes, three popular schemes are reviewed first toBefore highlight introducing computing the proposedcomplexity DoA issue estimation and the difference schemes, threewith the popular proposed schemes one. areThe reviewed first one firstis the to ESPRIT highlight scheme computing [7–9], complexitywhich is a classic issue and signal the sub-space difference withbased the approach. proposed The one. corresponding The first one ispseudo the ESPRIT code is scheme shown in [7– Figure9], which 1. An is auto-correlati a classic signalon sub-space matrix is estimated based approach. first followed The corresponding by a singular pseudomatrix decomposition. code is shown in The Figure number1. An of auto-correlation signal sources matrixcan be isdetermined estimated firstby counting followed the by number a singular of matrixsignificant decomposition. singular values. The numberThe column of signal vectors sources of the can left be singular determined matrix by countingcorresponding the number to those of significant singular values. The column vectors of the left singular matrix corresponding to those significant singular values form a sub-matrix , which spans the signal sub-space. is further significant singular values form a sub-matrix L , which spans the signal sub-space. L is further divided divided into two sub-matrices and byS taking the N-1 and the last N-1 rows,S respectively. This intomeans two each sub-matrices matrix is derivedL1 and L using2 by taking signals the fromN-1 N and-1 out the of last N Nantennas.-1 rows, respectively.There thus exists This a means transform each matrix is derived using signals from N-1 out of N antennas. There thus exists a transform ψ between between and . The transform matrix can be calculated by taking the pseudo matrix Linversion.1 and L2. Finally, The transform the Eigen matrix decompositionψ can be calculated of is performed by taking and the pseudothe angular matrix information inversion. of Finally, Eigen thevalues Eigen indicate decomposition the directions of ψ ofis performedarrival. The and advantage the angular of this information approach ofis that Eigen no values prior information indicate the directionsof the number of arrival. of signal The advantage sources is of needed. this approach In addition, is that noall priorsignal information sources can of thebe numberidentified of signalsimultaneously. sources is needed.The obvious In addition, shortcoming all signal of this sources scheme can is be its identified computing simultaneously. complexity. Both The singular obvious shortcomingvalue and Eigen of this decompositions scheme is its computing are computationally complexity. Both intensive singular and value can and only Eigen be solved decompositions through areiterative computationally procedures intensive and the andcomputing can only complexity be solved through of each iterative iteration procedures is of order and O( theN3) computing [20]. The 3 complexitypseudo inverse of each calculation, iteration which is of order requires O(N two)[20 matrix]. The pseudomultiplications inverse calculation,and one matrix which inversion, requires also two 3 matrixbears a multiplicationscomplexity of O( andN3). one These matrix computing inversion, requirements also bears a become complexity a big of hurdle O(N ).to These high throughput computing requirementsrate in implementation. become a big hurdle to high throughput rate in implementation.
Figure 1. Pseudo code of the Estimation of Signal ParametersParameters Via Rotational Invariance Techniques (ESPRIT) DirectionDirection ofof ArrivalArrival (DoA)(DoA) estimationestimation algorithm.algorithm.
In contrast toto calculatingcalculating the the angles angles of of DoA DoA directly, directly, the the matching matching pursuit pursuit based based approaches approaches [12 [12––14] try14] totry identify to identify them them from from a codebook a codebook with awith pre-defined a pre-defined angular angular resolution. resolution. Unlike theUnlike ESPRIT the scheme,ESPRIT theyscheme, are freethey of are expensive free of matrixexpensive decompositions matrix deco andmpositions inversions. and Instead, inversions. a sparse Instead, approximation a sparse algorithm,approximation which algorithm, finds the which "best matching"finds the "best projections matching" of multidimensional projections of multidimensional data onto the span data ofonto an over-completethe span of andictionary over-complete matrix, dictionary is conducted. matrix Best, is matchingconducted. means Best ama minimumtching means number a minimum of vectors innumber the dictionary of vectors matrix in the are dictionary selected. matrix The selection are selected. or matching The selection process or is matching in general process greedy is and in conductedgeneral greedy progressively. and conducted The original progressively. MP scheme The works original well MP in scheme the presence works of well only in one the signal presence source of butonly the one performance signal source degrades but the as performance the number degrad of signales sourcesas the number increase. of Therefore,signal sources the enhancedincrease. schemeTherefore, OMP the [ 15enhanced] was presented. scheme OMP Figure [15]2 shows was pres a combinedented. Figure pseudo 2 shows code ofa combined the MP /OMP pseudo schemes. code Theof the residual MP/OMP matrix schemes.R is initially The residual set to thematrix auto-correlation R is initially matrixset to theRxx auto-correlation. Assume there arematrixK signal . Assume there are K signal sources, they are identified one by one using a for loop. In each loop iteration, the residual matrix is multiplied with the codebook matrix D, and the vector (in the codebook) yields the maximum norm 2 value after multiplication is considered the “best match”. The corresponding index p is added to the set P. For OMP, an additional step 8 is performed, i.e., the matched vector is appended to a matrix A. After the vector selection, in the MP scheme, the
Electronics 2020, 9, 641 5 of 27 sources, they are identified one by one using a for loop. In each loop iteration, the residual matrix is multiplied with the codebook matrix D, and the vector (in the codebook) yields the maximum norm 2 valueElectronics after 2020 multiplication, 9, x FOR PEER REVIEW is considered the “best match”. The corresponding index p is added to5 of the 26 set P. For OMP, an additional step 8 is performed, i.e., the matched vector is appended to a matrix A. Aftercontribution the vector of the selection, selected in vector the MP is scheme,subtracted the from contribution the residual of the matrix selected (step vector 9). In is the subtracted OMP scheme, from thean orthogonalization residual matrix (step process 9). In (step the OMP 10) is scheme, performed. an orthogonalization It uses all the selected process (stepvectors 10) instead is performed. of the Itcurrent uses all one the to selected calculate vectors the residual instead matrix of the R. current In other one words, to calculate the projection the residual of residual matrix matrixR. In other R on words,the vector the projectionspace spanned of residual by all matrix the selectedR on the vectors vector is space removed. spanned This by calls all the for selected a pseudo vectors matrix is removed.inversion, Thispinv( callsA), defined for a pseudo as: matrix inversion, pinv(A), defined as: