A Givens Rotation-Based QR Decomposition for MIMO Systems Wen Fan and Amir Alimohammad

1 A Givens Rotation-based QR Decomposition for MIMO Systems Wen Fan and Amir Alimohammad Abstract—QR decomposition is an essential operation in var- CORDIC algorithms are commonly used to implement ious detection algorithms utilized in multiple-input multiple- Givens rotation-based QR decomposition for their low hard- output (MIMO) wireless communication systems. This paper ware complexity. However, the number of iterations will be presents a Givens rotation-based QR decomposition for 4 × 4 MIMO systems. Instead of performing QR decomposition by large if the system requires high accuracy, which leads to a CORDIC algorithms, lookup table (LUT) compression algo- relatively long latency. This article intends to utilize alternative rithms are employed to rapidly evaluate the trigonometric algorithms with greater computational accuracy than CORDIC functions. The proposed approach also provides greater accuracy algorithms for Givens rotation-based QR decomposition. After compared to the CORDIC algorithms. QR decomposition is per- comparing the mean square error (MSE) performance and formed by complex Givens rotations cascaded with real Givens rotations. In complex Givens rotations, a modified triangular arithmetic complexity of the lookup table (LUT) compression, systolic array (TSA) is adopted to reduce the delay units of linear approximation, and CORDIC algorithms, LUT compres- the design and hence, reducing the hardware complexity. The sion scheme is selected to implement trigonometric functions proposed QR decomposition algorithm is implemented in TSMC in Givens rotation as it has lower computational complexity nm 90- CMOS technology. It achieves the throughput of 53.5 than the linear approximation technique and also provides million QR decompositions per second (MQRD/s) when operating at 214 MHz. improved MSE performance compared to the CORDIC-based solution. Index Terms—QR decomposition, Givens rotation, lookup table To reduce the hardware complexity, the proposed QR compression, MIMO detection. decomposition approach consists of a complex-valued decomposition (CVD) followed by a real-valued decomposition I. INTRODUCTION (RVD). The RVD is designed with the triangular systolic array (TSA) architecture and the CVD is implemented using the ULTIPLE-input multiple-output (MIMO) technique has modified TSA architecture. Compared with the conventional M attracted significant interest due to the substantial in- architectures, the proposed scheme reduces the number of crement of system capacity and spectral efficiency. As a reli- delay units and also shortens the latency of the design. able technology supporting high throughput data transmission, The rest of this article is organized as follows. In Section MIMO has been widely adopted by recent wireless com- II, the MIMO system model and the important role of QR munication standards, such as IEEE 802.11n, IEEE 802.16m decomposition in detection algorithms are briefly reviewed. (WiMAX) and 3GPP-LTE [1], [2]. One of the challenges for Section III describes the complex-valued and real-valued de- MIMO systems is to design a high-throughput and accurate composition algorithms in the proposed design. Implementing detector for the receiver. QR decomposition is an essential trigonometric functions using our proposed LUT compression operation to convert a MIMO channel into multiple layered scheme is also presented and compared with the linear approx- sub-channels and thus, reduce the computational complexity imation and CORDIC-based realizations. In Section IV, the of MIMO detection. The accuracy of QR decomposition will architectures of complex Givens rotation, real Givens rotation, directly impact the bit error rate (BER) performance and and the divider used in the arctangent function are presented symbol detection throughput of MIMO systems significantly. and discussed. The implementation results and comparisons Three QR factorization methods have been widely used are provided in Section V. Finally, Section VI makes some in MIMO systems: Gram-Schmidt, Householder transforma- concluding remarks. tion, and Givens rotation [3]–[14]. Since the complexity of Householder transformation is relatively high, related studies II. MIMO SYSTEM MODEL on the QR decomposition architectures are typically classified into two main categories. One is based on the modified Consider a spatial multiplexing MIMO system [15] with NT Gram-Schmidt algorithm (MGS) [3]–[8], which performs QR transmit and NR receive antennas. The equivalent baseband decomposition in parallel and requires many norm and division model of the channel can be described by a complex-valued NR NT matrix H. The relation between transmit and receive operations. The other category is based on Givens rotation × and utilizing triangular systolic array architecture [9]–[14], signal vectors is can be written as: which implements the rotation operation by the coordinate y = Hs + n, (1) rotation digital computer (CORDIC) algorithms. Compared to T MGS, Givens rotation has the advantage of lower hardware where y = [y1,y2, ..., yNR ] denotes the NR-dimensional T complexity, however, the long latency is the main obstacle of receive signal vector, s = [s1,s2, ..., sNT ] is the NT 1 T × the Givens rotation approach. transmit signal vector, and n = [n1,n2, ..., nNR ] denotes the 2 NR 1 noise vector with independent identically-distributed Givens rotation matrix G1 targets at eliminating h21 by h11 (i.i.d)× zero-mean Gaussian noise variates [16]. and can be expressed as: QR decomposition is an essential preprocessing unit in c s 0 0 various MIMO detection techniques, such as zero-forcing, ∗ ∗ s c 0 0 sphere decoding, and K-best detection algorithms [17]–[20]. G1 = . (5) −0 010 By using QR decomposition of the channel matrix H = QR, 0 001 where Q is a unitary matrix and R is an upper triangular matrix, the detected signal vector ˆs can be expressed as The complex triangular matrix R and the complex unitary follows: matrix Q can be obtained as: R G G G G G G H ˆs = arg min y Hs 2 = 6 5 4 3 2 1 , H k −H k Q G G G G G G (6) = arg min Q y Rs 2. (2) = ( 6 5 4 3 2 1) , k − k where G2,..., G6 are rotation matrices to zero h31, h41, h32, In efficient MIMO detection schemes, such as the K- H h42 and h43, respectively, and ( ) denotes the Hermitian of best algorithm, sorting of the expanded traversal paths is an a complex matrix. The c and s ·parameters can be calculated important step. If the CVD system model is chosen, additional using the three-angle complex rotation (Three-ACR) technique multiplication and addition will be necessary before sorting [11], [21], where: can take place. In order to obtain the best candidates without ∗ h11 −jθ11 a complicated sorting step, many MIMO detection schemes c = = cos θae , 2 2 utilize the RVD system model [18], in which (1) can be h11 + h21 | | ∗ | | p expressed as: h21 −jθ21 s = = sin θae , h 2 + h 2 Re y Re H Im H Re s | 11| | 21| { } = { } − { } { } p h21 Im y Im H Re H Im s θa = arctan | | . (7) { } { } { } { } h Re n | 11| + { } , (3) Im n In order to avoid the square root operation in calculating θa, { } (7) can be further optimized as: where Re and Im denote the real and imaginary parts, {·} {·} Re h21 cos θ11 respectively. In this case, the dimension of the real-valued θa = arctan { } . (8) cos θ21 × Re h11 channel matrix becomes 2NR 2NT . For the MIMO systems × { } with a relatively large number of transmit and receive antennas After rotation, the rotated matrix G1H becomes (e.g., 4 4 or more), it is shown in [11] that the direct RVD of (1) (1) (1) (1) the channel× matrix will be more complicated than the CVD. h11 h12 h13 h14 0 h(1) h(1) h(1) However, MIMO detection will be computationally-intensive G1H = 22 23 24 , (9) without RVD, especially for high order modulation schemes. h31 h32 h33 h34 One efficient approach is to first perform the CVD of the h41 h42 h43 h44 channel matrix and then perform the RVD of the complex where h(k) denotes the h entry of the matrix after k-th triangular matrix R [11]. This approach is applied in the ij ij rotation. An important feature of the Three-ACR technique is proposed QR decomposition scheme and is discussed in the that it causes the triangular matrix R to have only real diagonal following section. elements. Therefore, an additional rotation step is required to (1) remove the imaginary part of h22 in G1H. The additional III. PROPOSED QR DECOMPOSITION ALGORITHM rotation matrix can be written as: A. Complex-Valued Decomposition 1 0 00 −jθ(1) ′ 22 G = 0 e 0 0 . (10) Givens rotation technique zeros one element of a matrix 1 0 0 10 at a time by applying a two-dimensional rotation. Therefore, 0 0 01 rotation matrix plays an important role on the performance of QR decomposition. The idea of CVD-based Givens rotation Another approach to calculate the c and s parameters is the can be illustrated using the polar representation. Consider a two-angle complex rotation (Two-ACR) [21], where 4 4 complex-valued matrix × c = cos θa, jθ11 jθb h11 e h12 h13 h14 s = sin θae , | | jθ21 h21 e h22 h23 h24 θb = θ θ . (11) H = | | , (4) 11 − 21 h31 h32 h33 h34 The Three-ACR technique based on the architecture of TSA h41 h42 h43 h44 reduces the latency and area by using the same hardware where hi1 and θi1 (i = 1,2) represent the magnitude and the resources of the CORDIC modules, but the throughput will angle of| the| matrix entries, respectively, and j = √ 1. The be lower than that of the Two-ACR. Three-ACR saves four − 3 TABLE I ARITHMETIC COMPLEXITY OF THE THREE-ANGLE COMPLEX ROTATION AND TWO-ANGLE COMPLEX ROTATION SCHEMES Two-ACR Three-ACR NT × NR 2 × 2 4 × 4 2 × 2 4 × 4 Multiplication 18 54 26 74 Addition 6 24 12 42 Arctangent 3 4 Sine 3 4 Cosine 3 4 Fig.

Load more