MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com

Joint Pilot and Data Loading Technique for Mimo Systems Operating with Covariance Feedback

Digham, F.; Mehta, N.; Molisch, A.; Zhang, J.

TR2004-105 October 2004

Abstract We consider a multiple input multiple output (MIMO) system wherein channel correlation as well as noisy channel estimation are both taken into account. For a given block fading duration, pilot-assisted channel estimation is carried out with the aid of a minimum mean square error estimator. The pilot sequences and data are jointly optimized to maximize the ergodic capacity while fulfilling a total transmit energy constraint. Using the optimization results, we study the effect of various system parameters and different types of channel knowledge. We show that systems with channel covariance knowledge and imperfect channel estimates can even outper- form those with perfect channel estimation at the receiver, but without any channel knowledge at the transmitter. Further, we show that the signal to noise ratio and the block fading duration determine the qualitative behavior of ergodic capacity as a function of antenna spacing or angular spread. Finally, we find that the optimum pilot-to-data power ratio is always greater than one. It increases as the total transmit power decreases or as the block fading length increases.

3G 2004

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Copyright c Mitsubishi Electric Research Laboratories, Inc., 2004 201 Broadway, Cambridge, Massachusetts 02139 MERLCoverPageSide2 JOINT PILOT AND DATA LOADING TECHNIQUE FOR MIMO SYSTEMS OPERATING WITH COVARIANCE FEEDBACK

, ,+ Fadel F. Digham† , Neelesh B. Mehta‡, Andreas F. Molisch‡ , and Jin Zhang‡ †Univ. of Minnesota, ‡Mitsubishi Electric Research Labs, Minneapolis, MN, USA. 201 Broadway, Cambridge, MA, USA. [email protected] mehta, molisch, jzhang @merl.com { }

Index Terms— MIMO systems, Antenna arrays, Fading channels, neous CSIT has to be obtained by feedback from the receiver. Training, Information Rates While only one transmit-receive antenna link estimate needs to be fed back in conventional non-MIMO systems, the same Abstract— We consider a multiple input multiple output (MIMO) system wherein channel correlation as well as noisy channel needs to be done for all transmit-receive antenna pair links estimation are both taken into account. For a given block fading in MIMO. Therefore, the feedback burden is considerable in duration, pilot-assisted channel estimation is carried out with MIMO systems, and, if not optimized, can adversely affect the aid of a minimum mean square error estimator. The pilot the overall spectral efficiency of the system. Even in the time sequences and data are jointly optimized to maximize the ergodic division duplex (TDD) mode of operation, which does not capacity while fulfilling a total transmit energy constraint. Using the optimization results, we study the effect of various require explicit feedback, the instantaneous CSIT acquired system parameters and different types of channel knowledge. from reverse link transmissions may be outdated during the We show that systems with channel covariance knowledge and transmission phase, especially when the mobile station moves imperfect channel estimates can even outperform those with fast. For this reason, a great deal of interest has been paid to perfect channel estimation at the receiver, but without any systems that exploit statistical or covariance knowledge [8], channel knowledge at the transmitter. Further, we show that the signal to noise ratio and the block fading duration determine the [12] at the transmitter – denoted henceforth as CovKT. Such qualitative behavior of ergodic capacity as a function of antenna systems have the advantage of requiring a much lower rate of spacing or angular spread. Finally, we find that the optimum feedback, or can eliminate it altogether. pilot-to-data power ratio is always greater than one. It increases as the total transmit power decreases or as the block fading While theoretical investigations of MIMO systems almost length increases. always assume that the CSI at the receiver (CSIR) is perfect, this assumption cannot be fulfilled in practical systems. I. INTRODUCTION The noise present during the channel estimation phase will render the receiver CSI (CSIR) imperfect, and can appreciably While second generation cellular systems were intended for decrease the capacity. Most previous studies that took imper- speech communications as well as some rudimentary data ser- fect CSIR into account were restricted to the case of spatially vices, third-generation (3G) systems distinguish themselves by white channels. For example, [7] first derived a lower bound providing data communications with high rates. The original to a MIMO system capacity. This paper considered pilot- specifications of 3GPP (Third-Generation Partnership Project) aided channel estimation for a block fading spatially white called for data rates up to 2 Mbps, which is easily achievable wireless channel and derived the optimum training sequence, in the allocated 5 MHz bandwidth. However, higher system pilot-to-data power ratio, and training duration. One paper that capacity and higher data rates are now desired. Given the studied data transmission for correlated MIMO channels with scarce system bandwidth, these can only be achieved by trans- estimation error is [15]. However, it modeled the estimation mission schemes with high spectral efficiency such as Multiple error in an ad hoc manner. input multiple output (MIMO) [5], [13], [14]. Furthermore, An obvious way to improve CSIR is to increase the power of algorithms, testbeds, and prototypes have demonstrated the the pilots or the training duration. However, for systems with practical viability of MIMO [2], [6]. Therefore, MIMO tech- a fixed power budget and a given channel coherence time, nology is now an active work item in 3GPP standardization doing so reduces both the power and time allowed for data efforts [1]. transmission. The trade-off between pilots and data is thus of An important aspect of MIMO systems is the use of channel great practical interest. state information (CSI) at the transmitter (CSIT). In the In this paper, we discuss the use of optimum pilot and data se- frequency division duplex (FDD) mode of operation, instanta- quences, pilot duration, and pilot-to-data power ratio in MIMO

The author was with Mitsubishi Electric Research Labs during the course systems operating in a spatially correlated channel when the of this work. transmitter has access to only covariance information. The + A. F. Molisch is also at the Department of Electroscience, Lund Univer- receiver obtains its CSIR from noisy pilot sequences by means sity, Lund, Sweden. of a simple minimum mean square error (MMSE) channel where X = [x ] is the transmitted pilot matrix of size N p ij r estimator. The main focus in the current paper is to highlight Tp and is known a priori at the receiver. Here, xij is the signal the implications on the link performance and study the effect transmitted from antenna i at time j. Wp is the spatially and of different system parameters. temporally white noise matrix, defined in a similar manner; with its entries having variance 2 . Q = X X denotes the This paper is organized as follows. Section II sets up the w p p p† pilot covariance matrix.1 system model and the notation used in the paper. Section III gives a concise description of the system and lists the key The received vector, yd, at any given time instant is related theoretical results, the details of which can be found in [4], to the transmitted signal vector, xd, by [11]. The main results of the paper can be found in Sec. IV, which provides numerical examples and discusses the impact yd = Hxd + wd, (3) on the system performance of different system parameters where wd is the spatially white and temporally uncorrelated such as block fading duration, the channel correlation, transmit noise vector. The vectors yd, xd, and wd are of dimensions power, etc. Concluding remarks are given in Section V. Nr 1, Nt 1, and Nr 1, respectively. Qd = Exd [xdx† ] d denotes the data covariance matrix. II. SYSTEM MODEL

We consider an N N MIMO system with N transmit and C. MMSE Channel Estimator t r t N receive antennas. A block fading frequency-flat channel r Given the second-order statistics of the channel and the pilot model [10], in which the channel remains constant for T sequence X , the MMSE channel estimator generates the time instants (symbol durations) and decorrelates thereafter p channel estimate, H, by passing the received Y through a is assumed. T corresponds to the coherence interval of the p deterministic matrix filter. For R = I , it can be shown channel. Of the T time instants, T are used for transmitting r Nr p that [4] b pilots, and the remaining T = T T for data. P and P 1 d p p d /2 denote the power allocated to pilots and data, respectively. We H = HwRt , (4) shall use the subscripts p and d for variables related to pilots where Hw is spatiallybwhitee withe its entries having unit and data, respectively. Lower and upper case boldface letters 2 1 variance and Rt = RtXp(Xp† RtXp + wITp ) Xp† Rt. shall be used to denote vectors and matrices, respectively. In e denotes the n n identity matrix, (.)† the Hermitian transpose, The channel estimation error is defined as = H H. From e Tr . the trace, and EX is the expectation over X. (3), it follows that the data transmission phase is governed by { } yd = Hxd +xd +wd. A lower bound, CL, on the capacityb , A. Channel Model based on a receiver that treats the term xd + wd as noise, can bebshown to be [4] The Nr Nt matrix H = [hij ] denotes the instantaneous chan- T H† H Q nel state, where the element hij is the complex fading gain C = 1 p log I + w w d , (5) L EHw 2 Nt 2 2 from transmit antenna j to receive antenna i. Experimental T w + l results have demonstrated the validity of the Kronecker model e e e 2 for several typical channels [9]. H is then given by where = Tr Qd(Rt Rt) is the noise term due to l 1 T H = R1/2H R /2, (1) imperfect estimation.n Here, 1 o p is the penalty factor r w t e T due to training because no data transmission occurs then. where Rt and Rr are the transmit and receive antenna correlation matrices, respectively. Rt (Rr) depends on the mean angle of departure (arrival), , the rms angular spread, III. OPTIMAL JOINT PILOT AND DATA LOADING , and the relative antenna spacing, d , [3]. The matrix H t w The optimum pilot and data design that maximizes C , subject is spatially white, i. e., its entries are zero-mean, independent, L to a total energy constraint P T + P T = P T , satisfies the complex Gaussian random variables with unit variance. Fur- p p d d following properties [4]: thermore, we assume that the receiver is in a rich scattering environment, as is typically the case in the downlink of a 1) The eigenspaces of both pilot and data covariance cellular system. Therefore, Rr = In. Rt is full ranked. matrices, Qd and Qp, should be the same as that of the channel transmit covariance matrix, Rt. B. Training and Data Transmission Phases 2) The ranks of both pilot and data covariance matrices should match. They shall be denoted by k. The signal received during the entire training phase of duration 3) The optimum training interval, Tp, also equals k. This Tp instants, is an Nr Tp matrix, Yp = [yij ], where yij is result also implies that Tp min(Nt, Nr). the signal received by receive antenna i at time instant j. Yp 1 is given by Given that the pilot Xp is a deterministic matrix, no expectation operator Yp = HXp + Wp, (2) is used for defining Qp. Here, Tr Q = P , Tr Q = P T , and P is the spread also increases capacity as it leads to a better distribution { d} d { p} p p total power budget available at the transmitter. With the of power among the used eigenmodes. above results, the capacity maximization problem reduces to The relative strength of the different effects determines the a numerical search over the eigenvalues of Q and Q . A p d different observed behaviors of capacity as a function of the considerably simpler power allocation strategy that distributes antenna spacing. For smaller P , only the largest eigenmode the pilot power uniformly among the k non-zero eigenmodes (k = 1) is used [8]. As mentioned, decreasing d increases of Q and the data power uniformly among the k non-zero t p the largest eigenmode and improves capacity. On the other eigenmodes of Q results in near-optimal performance [4]. d hand, multiple eigenmodes are used for larger P . Therefore, as dt increases, the eigen spread decreases and the capacity IV. NUMERICAL EXAMPLES AND DISCUSSIONS increases. Notice that T affects the number of eigenmodes Numerical results presented in this section are for uniform used, and thus, impacts the threshold values of P at which linear arrays at the transmitter and the receiver. The mean the above two different trends occur. angle of departure is = 45 and the noise variance is 2 w = 1. We also assume a Gaussian shape for the angular C. Optimal Pilot and Data Power Allocation spectrum, which is characterized by its rms angular spread, Figure 5 plots the optimal allocation of power between the . pilots and data, = Pp/Pd, as a function of P for = 10 We first focus on the Nt = Nr = 4 system. and 45. The figure contains two sub-plots corresponding to T = 10 and T = 100. It can be seen that decreases as P A. Impact of Block Fading Duration (T ) increases. However, is always greater than 1, which means that the transmitted (pilot) power during training must be Figure 1 plots the ergodic capacity as a function of the block greater than the power transmitted during data communication. fading duration T for two angular spreads, = 10 and Note that this does not imply any ordering on the pilot and data 45 and for P = 5 dB and 15 dB (dt = 0.5). For both energies. The discontinuities in occur when the number of angular spreads, CL increases as T increases. This is because eigenmodes used for transmission, k, increases. It can be seen T the impact of the training penalty 1 p diminishes as T T = 100 T = 10 T that is significantly greater for than for . increases. This can be explained as follows. Even when the total energy budget, P T , increases as T increases, Tp cannot exceed Nt. B. Impact of Spatial Correlation ( and dt) Therefore, Pp increases instead. In summary, k and T are the principal parameters that affect ; given these, it is mostly Changing the angular spread affects the spatial correlations insensitive to and P . between the transmit antenna elements. As increases, the correlations between the transmit antenna elements decrease D. Combined Impact of CSIT and CSIR and the MIMO channel changes from highly correlated to spatially white. Increasing the relative antenna spacing, dt, We now proceed to evaluate the impact of the CSIR and also has a similar effect [3]. Figure 2 plots CL as a function CSIT assumptions on the ergodic capacity of MIMO systems. 2 of for different total transmit power values. It is clear that We compare CL, the capacity achieved by the system that the power budget, P , affects the qualitative behavior of CL jointly designs the pilot and data given CovKT and pilot- with respect to . While CL increases monotonically with aided imperfect estimation at the receiver, with systems in for P = 20 dB and 30 dB, it decreases monotonically with which channel knowledge at the transmitter is not exploited for P = 0 dB. For an intermediate value of P = 10 dB, or is unavailable and with perfect or imperfect CSIR. Note CL even reaches a maximum value at = 15. that if perfect CSIR is assumed, no resources need to be We delve into this further in the equivalent Figures 3 and 4 that wasted on pilots (Pp = 0). Without any CSIT, Tp needs to be Nt [7]. Specifically, we compare CL with the ergodic capacity plot CL as a function of dt for P = 5 dB and 15 dB, = 10 of the following systems: no CSIT and imperfect pilot-aided and 45, and T = 10 and 100. The behavior of the capacity as a function of the antenna spacing is governed by several estimation at the receiver (Tp = Nt), no CSIT and perfect effects. As the antenna spacing increases: (i) the number CSIR (Tp = Nt), CovKT and perfect CSIR (Tp = 0), perfect of used eigenmodes and, thus, the number of data streams instantaneous CSIT and perfect CSIR (Tp = 0). The last increases, (ii) the eigenvalue spread decreases, which increases scheme above is the classic water-filling over space solution. the strength of the unused (weaker) channel eigenmodes and For better understanding the role of the training penalty, a decreases the strength of the used channel eigenmodes. This system with CovKT and perfect CSIR (with Pp = 0) that still can decrease capacity. At the same time, a smaller eigenvalue trains for a duration equal to the number of modes used for transmission is also considered. 2As the noise and channel coefficients are assumed to have unit variance, Figure 6 compares the above for a 2 4 system for = 10, the total transmit power is equivalent to the average received signal to noise ratio (SNR). dt = 0.5, and T = 10. When the CSIR is perfect and Tp = 0, 13 we observe that CovKT performs as well as instantaneous P=15 dB CSIT. For imperfect CSIR, we find that at higher P values, 12 when all the eigenmodes are in operation, the capacities with 11

CovKT and no CSIT become the same. 10

σ =450 θ 9 0 Figure 7 compares the same for a 4 1 transmit diversity σ =10 θ

[bits/sec/Hz] 8 system. The capacity with instantaneous CSIT is better than L C that with CovKT by 0.5 bits/sec/Hz for all P . This is unlike 7 the 2 4 system. Figure 8 [11] deals with a 4 4 system. 6 P=5 dB There is one key difference between the 2 4 case and the 5 4 1 and 4 4 cases – the capacity with imperfect CSIR 4 (but, with CovKT) exceeds that with perfect CSIR (without 10 20 30 40 50 60 70 80 90 100 CSIT). As expected, in all the three cases, for the same CSIT assumptions, the capacity with perfect CSIR is always greater Fig. 1: The impact of the block fading length, T , on C for that that with imperfect CSIR. L P = 5 dB and 15 dB (Nt = Nr = 4, dt = 0.5).

20 V. CONCLUSIONS 18 P=30 dB P=20 dB 16 In this paper, we considered a MIMO system with channel P=10 dB P=0 dB covariance knowledge at the transmitter and noisy CSI at 14 the receiver, and operating in a spatially correlated channel. 12 The channel state is estimated at the receiver via an MMSE

[bits/sec/Hz] 10 L estimation filter using a priori defined pilot sequences. Using C jointly optimized pilot and data sequences, we studied the 8 effect of different system parameters on the ergodic capacity. 6

We compared the ergodic capacity of systems with different 4 types of CSIT (instantaneous, covariance, and no CSIT) and 2 either perfect or imperfect CSIR. We showed that systems 0 10 20 30 40 50 60 Angular Spread σ [degree] with covariance knowledge and imperfect CSIR can achieve θ higher ergodic capacity than systems with no CSIT, even if Fig. 2: Impact of the angular spread, , on CL (Nt = Nr = they have perfect CSIR. The capacity always increased with 4, dt = 0.5, T = 10). the (block) fading duration. The capacity, as a function of the antenna spacing, can show a monotonic increase, a monotonic decrease, or a local maxima. We also found that the training [6] G. D. Golden, G. J. Foschini, R. A. Valenzuela, and P. W. Wolniansky, “Detection algorithm and initial laboratory results using the V-BLAST duration should be kept small, while the power assigned to space-time communication architecture,” Electronics Letters, pp. 14–15, the pilot symbols should be higher than that assigned to the Jan. 1999. data symbols; the optimum power values also depend on the [7] B. Hassibi and B. M. Hochwald, “How much training is needed in multiple-antenna wireless links?,” IEEE Trans. Inform. Theory, pp. 951– number of used eigenmodes and the block fading duration. 963, 2003. Our results thus motivate exploiting covariance knowledge [8] S. A. Jafar and A. Goldsmith, “Transmitter optimization and optimality at a multiple antenna transmitter, and provide fundamental of for multiple antenna systems,” To appear in IEEE Trans. Wireless Commun., 2004. guidelines for the optimized design of 3G MIMO systems. [9] J. P. Kermoal et al., “A stochastic MIMO radio channel model with experimental validation,” IEEE J. Select. Areas Commun., pp. 1211– 1226, 2002. REFERENCES [10] T. L. Marzetta and B. M. Hochwald, “Capacity of a mobile multiple- antenna communication link in Rayleigh flat fading,” IEEE Trans. Inform. Theory, vol. 45, pp. 139–157, January 1999. [1] “Multiple input multiple output (MIMO) antennae in UTRA,” [11] N. B. Mehta, F. F. Digham, A. F. Molisch, and J. Zhang, “Rate of TR 25.876, 3rd Generation Partnership Project, 2004. MIMO systems with CSI at transmitter and receiver from pilot-aided [2] A. Adjoudani et al., “Prototype experience for MIMO BLAST over estimation,” To appear in Proc. VTC, Sept. 2004. third-generation wireless system,” IEEE J. Select. Areas Commun., [12] S. H. Simon and A. L. Moustakas, “Optimizing MIMO antenna systems vol. 21, pp. 440–451, Apr. 2003. with channel covariance feedback,” IEEE J. Select. Areas Commun., [3] D. Asztely, “On antenna arrays in mobile communication systems: Fast vol. 21, pp. 406–417, April 2003. fading and GSM base station receiver algorithms,” Tech. Rep. IR-S3- [13] E. Telatar, “Capacity of multi-antenna Gaussian channels,” European SB-9611, Royal Institute of Technology, 1996. Trans. Telecommun., vol. 10, pp. 585–595, 1999. [4] F. F. Digham, N. B. Mehta, A. F. Molisch, and J. Zhang, “Joint pilot and [14] J. Winters, “On the capacity of radio communication systems with data loading for MIMO systems with covariance knowledge,” Submitted diversity in a Rayleigh fading environment,” IEEE Trans. Commun., to IEEE Trans. Wireless Commun., 2004. vol. 5, pp. 871–878, Jun. 1987. [5] G. J. Foschini and M. J. Gans, “On the limits of wireless communica- [15] T. Yoo, E. Yoon, and A. Goldsmith, “MIMO capacity with channel tions in a fading environment when using multiple antennas,” Wireless uncertainty: Does feedback help?,” Submitted to Globecom, 2004. Pers. Commun., vol. 6, pp. 311–335, 1998. 20 Perf. CSIR, Inst. CSIT, Tp=0 9 k = 2 to 3 Perf. CSIR, Cov.@Tx., Tp=0 18 Perf. CSIR, Cov.@Tx., Tp=k

Imperf. CSIR, Cov.@Tx., Tp=k

8 k = 1 to 2 16 Perf. CSIR, No CSIT, Tp=N t Imperf. CSIR, No CSIT, Tp=N k = 2 to 3 t 14

7 k = 1 to 2 σ =100, P=15 dB θ 12 σ =450, P=15 dB θ σ =100, P=5 dB θ 10 σ =450, P=5 dB 6 θ

[bits/sec/Hz] 8 L C

5 Ergodic Capacity [bits/sec/Hz] 6 k = 1 to 2 4 4 2

k = 1 to 2 0 3 0 5 10 15 20 25 30 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 Total Power Budget P [dB] Relative Antenna Spacing d t Fig. 6: Ergodic capacity comparison of systems with different Fig. 3: Impact of the relative antenna spacing, dt, on CL for CSIT and CSIR assumptions for Nt = 2 and Nr = 4. (T = T = 10 (Nt = Nr = 4, P = 15 dB and 5 dB). 10, = 10).

12 Perf. CSIR, Inst. CSIT, Tp=0 14 Perf. CSIR, Cov.@Tx., Tp=0

Perf. CSIR, Cov.@Tx., Tp=k

Imperf. CSIR, Cov.@Tx., Tp=k 13 10 Perf. CSIR, No CSIT, Tp=N t Imperf. CSIR, No CSIT, Tp=N k = 3 to 4 t 12 k =2 to 3 8 11 0 k=2 to 3 σ =45 , P=15 dB k =1 to 2 θ σ =100, P=15 dB θ 10 σ =450, P=5 dB θ 6 σ =100, P=5 dB k =1 to 2 θ

[bits/sec/Hz] 9 L C 4

8 Ergodic Capacity [bits/sec/Hz] k =1 to 2 7 k =1 to 2 k = 2 to 3 2 k=2 to 3 6

0 5 0 5 10 15 20 25 30 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 Total Power Budget P [dB] Relative Antenna Spacing d t Fig. 7: Ergodic capacity comparison of systems with different Fig. 4: Impact of the relative antenna spacing, dt, on CL for CSIT and CSIR assumptions for Nt = 4 and Nr = 1. (T = T = 100 (Nt = Nr = 4, P = 15 dB and 5 dB). 10, = 10).

30 Perf. CSIR, Inst. CSIT, Tp=0 4 σ =100 Perf. CSIR, Cov.@Tx., Tp=0 θ 0 Perf. CSIR, Cov.@Tx., Tp=k 3.5 σ =45 θ Imperf. CSIR, Cov.@Tx., Tp=k 25 d 3 Perf. CSIR, No CSIT, Tp=N /P t p Imperf. CSIR, No CSIT, Tp=N

=P t

α 2.5 20 2

1.5 0 5 10 15 20 25 30 Total Power Budget P [dB] 15 (a) T = 10 12 σ =100 θ 0 10

σ =45 Ergodic Capacity [bits/sec/Hz] 10 θ d /P p 8

=P 5 α 6

4 0 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Total Power Budget P [dB] Total Power Budget P [dB] (b) T = 100 Fig. 8: Ergodic capacity comparison of systems with different Fig. 5: Power allocation ratio, , versus total power budget, CSIT and CSIR assumptions for Nt = 4 and Nr = 4, (T = P , for T = 10 and 100 (Nt = Nr = 4). 10, = 10) [11].