MULTI-USER MIMO DOWNLINK PRECODING FOR USERS WITH MULTIPLE ANTENNAS

Veljko Stankovic and Martin Haardt

Ilmenau University of Technology Communications Research Laboratory P.O. Box 100565, 98684 Ilmenau, Germany {veljko.stankovic, martin.haardt}@tu-ilmenau.de Phone: +49 (3677) 69-2613, Fax: +49 (3677) 69-1195

ABSTRACT

Multi-user multiple-input multiple-output (MIMO) systems provide high capacity with the benefits of space division multiple access. The channel state information at the base station (BS) or access point (AP) is very important since it allows joint processing of all users' signals, which results in a significant performance improvement and increased data rates. If the channel state information is available at the BS/AP, it can be used to efficiently eliminate or suppress multi-user interference (MUI) by beamforming or by using "dirty-paper" codes. The precoding also allows us to perform most of the complex processing at the BS/AP, which results in a simplification of the users' terminals. Linear precoding techniques have an advantage in terms of computational complexity. Non-linear techniques have a higher computational complexity and require some signaling overhead but can provide a better performance than linear techniques. In this paper we compare several linear and non-linear techniques and propose two new schemes that outperform existing algorithms. The sensitivity of these proposed techniques to channel estimation errors at the transmitter is also analyzed.

Index Terms - MIMO precoding, Block diagonalization, MMSE precoding, Tomlinson-Harashima precoding.

1. INTRODUCTION

In recent years, there has been considerable interest in wireless multiple-input, multiple-output (MIMO) communication systems because of their promising improvement in terms of performance and bandwidth efficiency. An important research topic is the study of multi-user (MU) MIMO systems. Such systems have the potential to combine the high capacity achievable with MIMO processing with the benefits of space division multiple access. In the downlink scenario, a base station (BS) or an access point (AP) is equipped with multiple antennas and is simultaneously transmitting to a group of users. Each of these users is also equipped with multiple antennas. Motivated by the need for cheap mobiles with low power consumption, we focus on systems where the complex signal processing is performed at the BS/AP. The BS/AP uses the channel state information (CSI) available at the transmitter to allow these users to share the same channel and to mitigate, or ideally completely eliminate, multi-user interference (MUI) by intelligent beamforming or by the use of "dirty-paper" codes. All precoding techniques can be classified by the amount of MUI they allow (as zero or non-zero MUI techniques) and by linearity (as linear and non-linear techniques). Linear precoding techniques require no overhead to provide the mobile with the demodulation information and are computationally less expensive than non-linear techniques. However, non-linear techniques can provide much higher capacity.

Block diagonalization (BD) is a linear precoding technique for the downlink of MU MIMO systems [1]. It decomposes a MU MIMO downlink channel into multiple parallel independent single-user MIMO downlink channels. The signal of each user is pre-processed at the transmitter using a modulation matrix that lies in the null space of all other users' channel matrices. Thereby, the MUI in the system is efficiently set to zero. BD can be used with any other previously defined single-user MIMO technique [2], as the different users do not interfere with each other. BD is attractive if the users are equipped with more than one antenna. However, the zero MUI constraint can lead to a significant capacity loss when the users' subspaces significantly overlap. Another technique also proposed in [1], named successive optimization (SO), addresses the power minimization and the near-far problem. It can yield better results in some situations, but its performance depends on the power allocation and the order in which the users' signals are pre-processed. The zero MUI constraint is relaxed and a certain amount of interference is allowed.

Tomlinson-Harashima precoding (THP) is a non-linear precoding technique developed for single-input, single-output (SISO) multipath channels. Recently it has also been proposed for the equalization of MUI in MIMO systems [3], where it performs spatial pre-equalization instead of the temporal pre-equalization used for ISI channels. Minimum mean-square-error (MMSE) precoding in combination with THP is proposed in [4]. MMSE balances the MUI in order to reduce the performance loss, while THP is used to eliminate part of the MUI and improves the diversity. However, for two closely spaced antennas, as in the case when the user is equipped with multiple antennas, the inter-stream interference mitigation still causes some performance loss.

This paper is organized as follows. In Section 2, we describe the MU downlink channel. In Section 3, we describe the precoding techniques that will be compared, and in Section 4 we present the results of the simulations. A short summary follows in Section 5.

This work has been partially performed in the framework of the IST project IST-2003-507581 WINNER, which is partly funded by the European Union. The authors would like to acknowledge the contributions of their colleagues.

2. SYSTEM MODEL

We consider a MU MIMO downlink channel where $M_T$ transmit antennas are located at the base station, and $M_{R_i}$ receive antennas are located at the $i$-th mobile station (MS), $i = 1, 2, \ldots, K$. There are $K$ users (or MSs) in the system. The total number of receive antennas is

\[ M_R = \sum_{i=1}^{K} M_{R_i}. \]

We will use the notation $\{M_{R_1}, \ldots, M_{R_K}\} \times M_T$ to describe the antenna configuration of the system. We assume a flat fading channel. The MIMO channel to user $i$ is denoted as $\mathbf{H}_i \in \mathbb{C}^{M_{R_i} \times M_T}$. Moreover, the combined channel matrix is given by

\[ \mathbf{H} = \begin{bmatrix} \mathbf{H}_1^T & \mathbf{H}_2^T & \cdots & \mathbf{H}_K^T \end{bmatrix}^T. \]

In order to take into account channel estimation errors we use a "nominal-plus-perturbation" model. The estimated combined channel matrix can be represented as

\[ \hat{\mathbf{H}} = \mathbf{H} + \mathbf{E}, \]

where $\mathbf{H}$ denotes the flat fading network channel matrix, and $\mathbf{E}$ is a complex random Gaussian matrix distributed according to $\mathcal{CN}(\mathbf{0}_{M_R \times M_T}, M_R \sigma_n^2 \mathbf{I}_{M_T})$. Let $\rho = \|\mathbf{H}\|^2 / \|\mathbf{E}\|^2$ be the SNR of the channel state information at the transmitter.

3. MULTI-USER PRECODING

In this section we briefly describe BD, SO, and MMSE THP, and propose two new techniques: the combination of SO and THP precoding, and successive MMSE precoding (SMMSE).

3.1. Block diagonalization (BD)

Block diagonalization was first proposed in [1]. It is restricted to channels where the number of transmit antennas $M_T$ is greater than or equal to the total number of receive antennas in the network $M_R$.

Let us define the precoder matrices as

\[ \mathbf{F} = \begin{bmatrix} \mathbf{F}_1 & \mathbf{F}_2 & \cdots & \mathbf{F}_K \end{bmatrix} \in \mathbb{C}^{M_T \times r} \tag{1} \]

where $\mathbf{F}_i \in \mathbb{C}^{M_T \times r_i}$ is the $i$-th user's precoder matrix. Moreover, $r \le M_R$ is the total number of transmitted data stream sequences, while $r_i \le M_{R_i}$ is the number of data stream sequences transmitted to the $i$-th user. We can find the optimal precoding matrix $\mathbf{F}$ such that all MUI is zero by choosing a precoding matrix $\mathbf{F}_i$ that lies in the null space of the other users' channel matrices. Thereby, a MU MIMO downlink channel is decomposed into multiple parallel independent SU MIMO channels [5], [2].

If we define $\tilde{\mathbf{H}}_i$ as

\[ \tilde{\mathbf{H}}_i = \begin{bmatrix} \mathbf{H}_1^T & \cdots & \mathbf{H}_{i-1}^T & \mathbf{H}_{i+1}^T & \cdots & \mathbf{H}_K^T \end{bmatrix}^T \tag{2} \]

the zero MUI constraint forces $\mathbf{F}_i$ to lie in the null space of $\tilde{\mathbf{H}}_i$. From the singular value decomposition (SVD) of $\tilde{\mathbf{H}}_i$, whose rank is $\tilde{L}_i$,

\[ \tilde{\mathbf{H}}_i = \tilde{\mathbf{U}}_i \tilde{\boldsymbol{\Sigma}}_i \begin{bmatrix} \tilde{\mathbf{V}}_i^{(1)} & \tilde{\mathbf{V}}_i^{(0)} \end{bmatrix}^H \tag{3} \]

we choose the last $M_T - \tilde{L}_i$ right singular vectors $\tilde{\mathbf{V}}_i^{(0)} \in \mathbb{C}^{M_T \times (M_T - \tilde{L}_i)}$, which form an orthogonal basis for the null space of $\tilde{\mathbf{H}}_i$. The equivalent channel of user $i$ after eliminating the MUI is identified as $\mathbf{H}_i \tilde{\mathbf{V}}_i^{(0)}$, whose dimension is $M_{R_i} \times (M_T - \tilde{L}_i)$, and it is equivalent to a system with $M_T - \tilde{L}_i$ transmit antennas and $M_{R_i}$ receive antennas. Each of these equivalent SU MIMO channels has the same properties as a conventional SU MIMO channel. Define the SVD

\[ \mathbf{H}_i \tilde{\mathbf{V}}_i^{(0)} = \mathbf{U}_i \boldsymbol{\Sigma}_i \begin{bmatrix} \mathbf{V}_i^{(1)} & \mathbf{V}_i^{(0)} \end{bmatrix}^H \tag{4} \]

and let the rank of the $i$-th user's equivalent channel matrix be $L_i$. The product of $\tilde{\mathbf{V}}_i^{(0)}$ and the first $L_i$ singular vectors $\mathbf{V}_i^{(1)}$ produces an orthogonal basis of dimension $L_i$ and represents the transmission vectors that maximize the information rate for user $i$ subject to the zero MUI constraint.

3.2. Successive optimization (SO)

As mentioned before, by applying BD on the combined channel matrix of all users the MU MIMO channel can be transformed into a set of parallel single-user MIMO channels. However, there is a capacity loss due to the nulling of overlapping subspaces of different users. In [1], the authors propose a successive precoding algorithm in order to define a simplified solution of the power control problem. By allowing a certain amount of interference, this algorithm reduces the capacity loss due to the subspace nulling.

First, we have to assume or determine a certain optimum ordering of the users, similar to V-BLAST [6] or MMSE THP [4]. Using SO, the modulation matrix for each user is designed in such a way that it lies only in the null space of the channel matrices of the previous users. As a consequence, only the previous users generate interference to this user. Let us define the previous $i - 1$ users' combined channel matrix as

\[ \hat{\mathbf{H}}_i = \begin{bmatrix} \mathbf{H}_1^T & \mathbf{H}_2^T & \cdots & \mathbf{H}_{i-1}^T \end{bmatrix}^T \]

and its corresponding SVD as

\[ \hat{\mathbf{H}}_i = \hat{\mathbf{U}}_i \hat{\boldsymbol{\Sigma}}_i \begin{bmatrix} \hat{\mathbf{V}}_i^{(1)} & \hat{\mathbf{V}}_i^{(0)} \end{bmatrix}^H. \tag{5} \]

If the rank of $\hat{\mathbf{H}}_i$ is $\hat{L}_i$, then $\hat{\mathbf{V}}_i^{(0)}$ contains $M_T - \hat{L}_i$ right singular vectors. As in the BD solution, we force the modulation matrix $\mathbf{F}_i$ to lie in the null space of $\hat{\mathbf{H}}_i$ by setting $\mathbf{F}_i = \hat{\mathbf{V}}_i^{(0)} \mathbf{F}_i'$ for some choice of $\mathbf{F}_i'$. Thereby, the $i$-th user does not see any interference from any subsequent user $(i + 1, \ldots, K)$.

3.3. Combination of SO and THP

In this section we describe how to combine SO and THP in order to improve the use of the available subspace of different users and eliminate any residual MUI. The resulting equivalent channel matrix is also block diagonal, which facilitates the definition of an ordering algorithm of the users. The combination of SO and THP (SO THP) is performed by successively calculating first BD, then the reordering of users, and in the end precoding with THP. Here, instead of examining all $K!$ ordering possibilities to find the one that minimizes the total capacity loss in the system, we make the heuristic simplification to minimize the capacity loss of each user separately.

In short, we first calculate the maximum capacity that an individual user can achieve. Then, we identify the user with the smallest difference between its maximum capacity and its BD capacity and generate its precoding matrix such that it lies in the null space of the remaining users' channel matrices. Afterwards, we form the new combined channel matrix without this user's channel matrix. We repeat these steps until the combined channel matrix is empty. The order in which the users are precoded using THP is the reverse of the order in which their precoding matrices are generated.

With the reordering of the users in the reverse order of precoding, the equivalent combined channel matrix after precoding and demodulation is lower block diagonal with the singular values on the main diagonal. The lower triangular feedback matrix used in THP precoding [4] is generated from this equivalent combined channel matrix after the elements in each row are divided by the elements on the main diagonal, i.e., the corresponding singular values.

In Fig. 1 we show the block diagram of the SO THP system. The individual users' channel matrices and demodulation matrices are grouped in the matrices $\mathbf{H}$ and $\mathbf{D}$. The feedback matrix $\mathbf{B}$, generated in the last step of the SO THP algorithm, is now used to precode the users' data streams, starting with the data stream of the first user, whose precoding matrix $\mathbf{F}_1$ was generated last.

Fig. 1. Block diagram of the SO THP system. [Blocks from transmitter to receiver: modulo operator (MOD) with feedback matrix $\mathbf{B}$, precoder $\mathbf{F}$, channel $\mathbf{H}$ with additive noise $\mathbf{n}$, demodulation matrix $\mathbf{D}$, scaling $\mathrm{diag}([\mathbf{D}\mathbf{H}\mathbf{F}]_{ii})^{-1}$, modulo operator (MOD).]

By using THP at the transmit side we significantly increase the transmit power. Therefore we introduce the modulo operator at the transmitter and the receiver in order to confine the constellation to certain boundaries. Before applying the modulo operator at the receiver we divide each data stream by the corresponding singular value so that the constellation boundaries at the receiver are the same as at the transmitter. The ordering algorithm described here forces the modulation matrix of the user that has the minimum capacity loss in the current step to lie in the null space of the remaining users' channel matrices.

3.4. MMSE THP transmit precoding

The linear Wiener transmit filter is defined by the following equation:

\[ \mathbf{F} = \beta \left( \mathbf{H}^H \mathbf{H} + \alpha \mathbf{I}_{M_T} \right)^{-1} \mathbf{H}^H \tag{6} \]

where

\[ \beta = \sqrt{\frac{P_T}{\mathrm{tr}\left( \mathbf{F} \mathbf{x} \mathbf{x}^H \mathbf{F}^H \right)}}, \qquad \alpha = \left( \frac{P_T}{M_R \sigma_n^2} \right)^{-1}, \]

$P_T$ denotes the available transmit power, $\mathbf{x}$ is a data vector, and $\sigma_n^2$ denotes the variance of the zero mean circularly symmetric complex Gaussian (ZMCSCG) noise.

In [4] the authors describe a system combining THP with MMSE precoding to eliminate the part of the MUI that is below the main diagonal of the equivalent network channel matrix. The algorithm described in [4] is iterative and requires a certain ordering of the users. First, a precoding matrix $\mathbf{F}$ is defined column by column starting from user $K$. The column corresponding to the $i$-th user is obtained as the $i$-th column of the precoding matrix calculated using only the first $i$ rows of the network channel matrix $\mathbf{H}$ and equation (6). After this, THP is used to eliminate the MUI to the $i$-th user originating from the $i - 1$ previous users. The detailed description of the algorithm can be found in [4].

3.5. Successive MMSE transmit precoding (SMMSE)

MMSE precoding can improve the system performance by introducing a certain amount of interference, especially for users equipped with a single antenna. However, it suffers a performance loss when it attempts to mitigate the interference between two closely spaced antennas, as in the case when the user terminal is equipped with more than one receive antenna. Here we propose a new algorithm that deals with this problem by successively calculating the columns of the precoding matrix $\mathbf{F}$ for each of the receive antennas separately.

The columns in the precoding matrix $\mathbf{F}_i$ in equation (1), each corresponding to one receive antenna, are calculated successively. For the $i$-th user, $i = 1, \ldots, K$, and the $j$-th receive antenna, $j = 1, \ldots, M_{R_i}$, we define the matrix $\bar{\mathbf{H}}_i^{(j)}$ as

\[ \bar{\mathbf{H}}_i^{(j)} = \begin{bmatrix} \mathbf{h}_{i,j}^T \\ \mathbf{H}_1 \\ \vdots \\ \mathbf{H}_{i-1} \\ \mathbf{H}_{i+1} \\ \vdots \\ \mathbf{H}_K \end{bmatrix} \]

where $\mathbf{h}_{i,j}^T$ is the $j$-th row of the $i$-th user's channel matrix $\mathbf{H}_i$. The corresponding column of the precoding matrix $\mathbf{F}_i$ is equal to the first column of the following matrix:

\[ \mathbf{F}_{i,j} = \beta \left( \bar{\mathbf{H}}_i^{(j)H} \bar{\mathbf{H}}_i^{(j)} + \alpha \mathbf{I}_{M_T} \right)^{-1} \bar{\mathbf{H}}_i^{(j)H} \tag{7} \]

After calculating the beamforming vectors for all receive antennas in this fashion, the equivalent combined channel matrix of all users after the precoding is equal to $\mathbf{H}\mathbf{F} \in \mathbb{C}^{M_R \times M_R}$. For high SNR, this matrix is also block diagonal. We can now apply any other previously defined SU MIMO technique on the $i$-th user's equivalent channel matrix $\mathbf{H}_i \mathbf{F}_i$. After the precoding using matrix $\mathbf{F}_i$, we first perform the singular value decomposition (SVD). Then, if we want to maximize the capacity of the system, we use water-pouring (WP) on the eigenmodes of all users; if we want to extract the maximum diversity and array gain, we transmit only on the dominant eigenmode. Dominant eigenmode transmission (DET) can provide maximum SNR at the receiver and minimum BER. The complexity of this algorithm is only slightly higher than that of BD. By using this algorithm we efficiently improve the system performance by introducing MUI and by eliminating inter-stream interference.

Fig. 2. 10% outage capacity as a function of receive SNR$_r$. [Plot: x-axis SNR$_r$ [dB], y-axis 10% outage capacity.]

4. SIMULATION RESULTS

In this section we compare the performance of SO THP, minimum mean-square-error (MMSE) THP transmit filtering proposed in [4], and SMMSE precoding. The channel $\mathbf{H}$ is assumed to be spatially white and flat fading. First, we use the complementary cumulative distribution function (CCDF) and the 10% outage capacity to compare a system with configuration $\{1, 1, 1, 1\} \times 4$ employing MMSE THP pre-filtering to a system employing SO THP with the configuration $\{1, 1, 1, 1\} \times 4$. The capacity is calculated using the results on the capacity of MIMO broadcast channels in [7]. We also present capacity results for a TDMA system as a comparison. We employ the following transmit and receive SNR definitions, respectively:

\[ \mathrm{SNR}_t = 10 \log_{10} \left( \frac{P_T}{M_R \sigma_n^2} \right) \quad \text{and} \quad \mathrm{SNR}_r = 10 \log_{10} \left( \frac{P_T}{\sigma_n^2} \right). \]

In Figures 2 and 3 we compare these techniques using the 10% outage capacity as a function of the receive SNR and of the number of transmit antennas $M_T$. In Fig. 2 we show that for high SNR SO THP provides a higher capacity than MMSE THP. However, at low SNR MMSE THP has an advantage over SO THP when the users are equipped with only one antenna. The zero forcing (ZF) solution is equal to the pseudo-inverse of the combined channel matrix for all users. ZF has both the drawback of the zero MUI constraint and the drawback of fighting the inter-stream interference of two closely spaced antennas. For the antenna configuration $\{1, 1, 1, 1\} \times 4$, SO THP has 40% higher capacity than a TDMA system.
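Returning to the successive construction of Section 3.5, equation (7) and the statement that $\mathbf{H}\mathbf{F}$ becomes block diagonal at high SNR can be checked numerically with a short NumPy sketch. The function name `smmse_precoder` is hypothetical, and two choices are assumptions rather than details given in the paper: $\alpha$ is taken as $M_R \sigma_n^2 / P_T$ as in equation (6), and a single global scaling $\beta$ is applied after all columns are collected, with unit-power uncorrelated data symbols.

```python
import numpy as np


def smmse_precoder(channels, p_t, noise_var):
    """Build F column by column, one column per receive antenna, via eq. (7)."""
    m_t = channels[0].shape[1]
    m_r = sum(h.shape[0] for h in channels)
    alpha = m_r * noise_var / p_t  # assumed regularization, as in eq. (6)
    columns = []
    for i, h_i in enumerate(channels):
        others = [h for k, h in enumerate(channels) if k != i]
        for j in range(h_i.shape[0]):
            # H_bar: row j of user i on top, then all other users' channels.
            h_bar = np.vstack([h_i[j:j + 1]] + others)
            gram = h_bar.conj().T @ h_bar + alpha * np.eye(m_t)
            f_ij = np.linalg.solve(gram, h_bar.conj().T)
            # Only the first column (the one matched to h_{i,j}) is kept.
            columns.append(f_ij[:, 0])
    f = np.column_stack(columns)
    # Assumed global power normalization: tr(F F^H) = p_t.
    beta = np.sqrt(p_t / np.trace(f @ f.conj().T).real)
    return beta * f


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical {2,2} x 4 configuration, very low noise (high SNR).
    users = [rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))
             for _ in range(2)]
    f = smmse_precoder(users, p_t=1.0, noise_var=1e-8)
    hf = np.vstack(users) @ f
    print("largest off-block magnitude:", np.abs(hf[:2, 2:]).max())
    print("smallest diagonal magnitude:", np.abs(np.diag(hf)).min())
```

Because the other antennas of user $i$ are left out of $\bar{\mathbf{H}}_i^{(j)}$, the diagonal blocks of $\mathbf{H}\mathbf{F}$ are generally full: the interference between the antennas of one terminal is not suppressed, only the interference towards other users.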

Fig. 3. 10% outage capacity as a function of the number of transmit antennas $M_T$, SNR$_r$ = 10 dB. [Plot: x-axis number of transmit antennas $M_T$, y-axis 10% outage capacity; curves: TDMA $\{2,2\} \times M_T$, MMSE THP $\{1,1,1,1\} \times M_T$, SO THP $\{1,1,1,1\} \times M_T$.]

Fig. 4. BER performance comparison of MMSE THP in configuration $[1, 1, 1, 1, 1, 1] \times 6$ and BD and SMMSE in configuration $[2, 2, 2] \times 6$. [Plot: x-axis SNR$_t$ [dB], y-axis BER; curves: SO THP, BD, SMMSE, MMSE THP, each with $\rho$ = 20 dB.]

The BER performance of the proposed SO THP and SMMSE precoding techniques and of BD is shown in Fig. 4. We compare the performance of these techniques in the case when the users are equipped with multiple antennas. BD is a linear precoding technique that has a zero MUI constraint. By introducing MUI, SMMSE provides both a higher diversity and a higher array gain than BD. SO THP does not have the same diversity gain as BD, which can be explained by the influence of the modulo operator used in THP.

In Fig. 5 we compare the BER of the previously described techniques in a mixed system with both single- and multiple-antenna users. The system has the configuration $[1, 1, 2, 2] \times 6$. This figure illustrates that SO THP can provide the highest performance. SO THP has the same performance at high SNR as MMSE THP. For low SNR, SMMSE also performs very well but does not have the same diversity gain as SO THP and MMSE THP. However, we would like to emphasize that SMMSE is linear and does not require any signaling from the BS/AP to the users. The real advantage of this technique can be seen when all the users are equipped with multiple antennas, as shown in Fig. 4. In this case we see that SMMSE outperforms both BD and MMSE THP, which is a non-linear technique. We should also note that although BD has better performance than SO THP for multiple-antenna users, in a mixed scenario SO THP outperforms BD, because BD performs poorly when users are equipped with only one antenna.

In Fig. 6 we show the CCDF of the capacity of the systems employing BD and SMMSE. For the configuration $\{1, 1, 1, 1\} \times 4$, SMMSE reduces to linear MMSE precoding. By abandoning the zero MUI constraint, SMMSE gains over 2 bits/sec/Hz compared to BD when the users are equipped with multiple antennas. In our simulations we use water-pouring (WP) power loading.

Fig. 5. BER performance comparison of MMSE THP $[1, 1, 1, 1, 1, 1] \times 6$ and BD, SO THP and SMMSE in configuration $[1, 1, 2, 2] \times 6$. [Plot title: $\{1,1,2,2\} \times 6$; x-axis SNR$_t$ [dB], y-axis BER; curves: BD DET, MMSE THP, SMMSE, SO THP, each with $\rho$ = 20 dB.]

Fig. 6. CCDF of the capacity of the BD, SMMSE, and ZF system. [Plot title: SNR$_r$ = 10 dB; x-axis capacity $C$, y-axis CCDF; curves: SMMSE $\{2,2\} \times 4$, MMSE $\{1,1,1,1\} \times 4$, BD $\{2,2\} \times 4$, BD $\{1,1,1,1\} \times 4$, ZF $\{1,1,1,1\} \times 4$.]

5. CONCLUSION

In this paper we analyze the performance of different transmit precoding techniques in a downlink multi-user scenario. Depending on the set of constraints, like the size of the overhead or the amount of MUI allowed, different techniques can be optimal. Linear techniques are computationally less expensive and generally require no signaling overhead. On the other hand, the non-linear techniques can provide better performance. The first technique that we propose in this paper is the combination of a linear precoding technique called successive optimization and a non-linear technique called Tomlinson-Harashima precoding. By combining these two techniques we are able to completely eliminate MUI in the system when there is perfect CSI available at the transmitter. The equivalent channel matrix is block diagonal after precoding. This technique is especially attractive in cases when the users and the base station/access point are equipped with multiple antennas. In these cases SO THP provides the same capacity as, for example, MMSE THP but without any MUI. By transmitting only on the dominant eigenmodes of each user, SO THP can provide a better BER performance than MMSE THP for low SNR. This advantage is particularly important when we do not have perfect CSI at the transmitter. SO THP is less sensitive to channel estimation errors and can give better results than MMSE THP.

In this paper we also introduced a linear technique that performs successive MMSE prefiltering in order to reduce the performance loss due to the zero MUI constraint and the cancellation of the interference between the antennas located at the same terminal. SMMSE outperforms BD, another linear precoding technique that also performs well when the users are equipped with multiple antennas but with zero MUI. Moreover, in a system where all users are equipped with multiple antennas, SMMSE outperforms both SO THP and MMSE THP, which are non-linear precoding techniques. SMMSE has relatively low computational complexity and reduces the overhead needed for demodulation. For high SNR it also results in a block diagonal combined network channel matrix, and it can be combined with any other previously proposed precoding technique. It is expected that further investigations will show that the combination of SMMSE and THP can perform even better. In a mixed scenario, where the users are equipped with various numbers of antennas, SO THP has the best performance. For low SNR, SMMSE also performs well but has smaller diversity than SO THP. Although BD has similar performance to SO THP for multiple-antenna users, SO THP has the advantage of also providing good performance to the single-antenna users. For high SNR, SO THP provides almost twice the capacity of a TDMA system with the same number of antennas.

6. REFERENCES

[1] Q. Spencer and M. Haardt, "Capacity and downlink transmission algorithms for a multi-user MIMO channel," in Proc. 36th Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, IEEE Computer Society Press, November 2002.

[2] L. U. Choi and R. D. Murch, "A transmit preprocessing technique for multiuser MIMO systems using a decomposition approach," IEEE Transactions on Wireless Communications, vol. 3, no. 1, pp. 20–24, January 2004.

[3] G. Ginis and J. Cioffi, "A multi-user precoding scheme achieving crosstalk cancellation with application to DSL systems," in Proc. Asilomar Conf. on Signals, Systems, and Computers, November 2000, vol. 2, pp. 1627–1637.

[4] M. Joham, J. Brehmer, and W. Utschick, "MMSE approaches to multiuser spatio-temporal Tomlinson-Harashima precoding," in Proc. 5th International ITG Conference on Source and Channel Coding (ITG SCC'04), January 2004, pp. 387–394.

[5] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt, "Zero-forcing methods for downlink spatial multiplexing in multi-user MIMO channels," IEEE Transactions on Signal Processing, vol. 52, no. 2, pp. 461–471, February 2004.

[6] P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A. Valenzuela, "V-BLAST: An architecture for realizing very high data rates over the rich-scattering wireless channel," in Proc. ISSSE 98, September 1998.

[7] S. Vishwanath, N. Jindal, and A. J. Goldsmith, "On the capacity of multiple input multiple output broadcast channels," in Proc. of the IEEE International Conference on Communications (ICC), New York, NY, April 2002.