Impact of Spatial Correlation and Design in OSTBC MIMO Systems

IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS Volume 9, Issue 11, Pages 3578-3589, November 2010.

Copyright c 2010 IEEE. Reprinted from Trans. on Wireless Communications.

This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the KTH Royal Institute of Technology’s products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected].

By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

EMIL BJORNSON,¨ EDUARD JORSWIECK, AND BJORN¨ OTTERSTEN

Stockholm 2010

KTH Royal Institute of Technology ACCESS Linnaeus Center Signal Processing Lab

DOI: 10.1109/TWC.2010.100110.091176 KTH Report: IR-EE-SB 2010:010 3578 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 9, NO. 11, NOVEMBER 2010 Impact of Spatial Correlation and Precoding Design in OSTBC MIMO Systems Emil Bj¨ornson, Student Member, IEEE, Eduard Jorswieck, Senior Member, IEEE, and Bj¨orn Ottersten, Fellow, IEEE

Abstract—The impact of transmission design and spatial spatially limited which leads to a correlated channel [3]–[5], correlation on the symbol error rate (SER) is analyzed for also known as spatial correlation. multi-antenna communication links. The receiver has perfect channel state information (CSI), while the transmitter has either The impact of CSI and spatial correlation on the ergodic ca- statistical or no CSI. The transmission is based on orthogonal pacity has received much attention. For simplicity, the receiver space-time block codes (OSTBCs) and linear precoding. The is usually assumed to have perfect CSI [6], while various types precoding strategy that minimizes the worst-case SER is derived of CSI has been considered at the transmitter [7]–[11]. The for the case when the transmitter has no CSI. Based on this impact of spatial correlation on the capacity was evaluated strategy, the intuitive result that spatial correlation degrades the SER performance is proved mathematically. numerically in [9] (among others), but the relationship was In the case when the transmitter knows the channel statistics, first derived analytically in [10]. It was shown that spatial the correlation matrix is assumed to be jointly-correlated (a correlation decreases the capacity when the transmitter has generalization of the Kronecker model). The eigenvectors of the no CSI or perfect CSI, which is intuitive since correlated SER-optimal precoding matrix are shown to originate from the channels have fewer degrees of freedom and thus less suitable correlation matrix and the remaining power allocation is a convex problem. Equal power allocation is SER-optimal at high SNR. for spatial multiplexing. When the transmitter has statistical Beamforming is SER-optimal at low SNR, or for increasing CSI, this negative effect is however countered by the advantage constellation sizes, and its optimality range is characterized. of having smaller channel variations; in highly correlated A heuristic low-complexity power allocation is proposed and channels, the channel direction is in fact given by the statistics. evaluated numerically. Finally, it is proved analytically that Interestingly, it was proved in [10] that correlation among the receive-side correlation always degrades the SER. Transmit-side correlation will however improve the SER at low to medium transmit antennas improves the capacity in this case. SNR, while its impact is negligible at high SNR. While most previous work considered the ergodic capacity Index Terms—Beamforming, channel state information, requiring Gaussian constellations, this paper considers the MIMO systems, orthogonal space-time block codes, power al- symbol error rate (SER) with practical symbol constellations. location, spatial correlation, symbol error rate. Prior work includes [12] and [13] that made numerical obser- vations on the impact of spatial correlation on error rates. I. INTRODUCTION Herein, we derive an analytical solution to the impact of correlation by analyzing a general class of SER-like functions. N wireless communication, the use of antenna arrays at the This class includes the exact SER for Rayleigh channels I transmitter and receiver can greatly improve the spectral with orthogonal space-time block codes (OSTBCs), linear efficiency and system performance. Under the ideal conditions precoding [14]–[18], and uncoded PAM, PSK, or QAM. We of uncorrelated antennas and perfect channel state information use the jointly-correlated model, proposed in [19] and [20], (CSI), it was shown in [1] and [2] that the ergodic capacity to analyze transmission design and the impact of spatial improves linearly as the number of antennas increases at both correlation under more general conditions than the commonly sides. In practice, this fundamental gain is difficult to obtain. used Kronecker model [12], [13], [21]. Our main contributions Firstly, the channel fading makes it costly for the transmitter to are: keep track on the current CSI. Secondly, the scattering is often ∙ Optimal transmission strategies: When the transmitter Manuscript received August 5, 2009; revised February 5, 2010 and June has no CSI, it can protect itself against the unknown 7, 2010; accepted September 17, 2010. The associate editor coordinating the review of this paper and approving it for publication was H.-C. Yang. channel by using OSTBCs and equal E. Bj¨ornson and B. Ottersten are with the Signal Processing Laboratory at power allocation in all spatial directions. This precoding the ACCESS Linnaeus Center, KTH Royal Institute of Technology, SE-100 strategy minimizes the worst-case SER (Theorem 1). 44 Stockholm, Sweden (e-mail: {emil.bjornson, bjorn.ottersten}@ee.kth.se). B. Ottersten is also with securityandtrust.lu, University of Luxembourg. When the transmitter has statistical CSI, the eigenvec- E. Jorswieck is with the Communications Laboratory, Dresden University tor structure of the SER minimizing precoder is de- of Technology, D-01062 Dresden, Germany (e-mail: eduard.jorswieck@tu- rived for jointly-correlated systems (Theorem 2). This dresden.de). The research leading to these results has received funding from the Euro- structure reduces the transmission design to a convex pean Research Council under the European Community’s Seventh Framework power allocation problem that can be solved numerically Programme (FP7/2007-2013) / ERC Grant Agreement No. 228044. Parts of or heuristically with low complexity (Strategy 1). At this work were presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, April 2009. high SNR, the power is allocated equally among the Digital Object Identifier 10.1109/TWC.2010.100110.091176 available eigendirections. Single-stream beamforming in 1536-1276/10$25.00 ⃝c 2010 IEEE BJORNSON¨ et al.: IMPACT OF SPATIAL CORRELATION AND PRECODING DESIGN IN OSTBC MIMO SYSTEMS 3579

N the dominant eigendirection is SER-optimal at low and Precoder Channel Joint medium SNR and this is also the case asymptotically as s OSTBC sˆ WH Detection the constellation size grows (Section V). The SNR range C(s) Y where beamforming is SER-optimal is characterized as a (a) Linear precoded OSTBC MIMO system. function of the constellation size (Theorem 5).  nk ∙ Impact of spatial correlation: When the transmitter Equivalent Channel Separate sk HW sˆk has no CSI, it is proven that spatial correlation always y Detection degrades the SER in jointly-correlated Rayleigh fading k systems with OSTBCs (Theorem 3). In the case with (b) Equivalent parallel single-input single-output (SISO) systems, for 𝑘 = 1,...,𝐾 statistical CSI at the transmitter, correlation between . eigendirections at the receiver also degrades the perfor- Fig. 1. Block model of the original MIMO communication system and its mance. Transmit-side correlation will however improve equivalent parallel structure after receive processing. the performance at low and medium SNR (Theorem 4), while the impact at high SNR is negligible (Section V). The conclusion is that CSI and spatial correlation impacts will be considered). These symbols are coded in an OSTBC the SER in a jointly-correlated system with OSTBCs in a matrix C(s) ∈ ℂ𝐵×𝑇 that fulfills the orthogonality property similar (but non-identical) manner as the ergodic capacity in C(s)C(s)𝐻 = ∥s∥2I and has the spatial coding block length Kronecker-structured Rayleigh fading systems [10]; statistical 𝐵. The linear precoder W ∈ ℂ𝑛𝑇 ×𝐵 is used to project CSI at the transmitter can improve the performance by proper the signal into advantageous spatial directions by using the transmission design that adapts to the correlation and turns available transmit-side CSI [16]. Its maximal rank is denoted transmit-side correlation into an advantage. by 𝑚 ≜ min(𝑛𝑇 ,𝐵) and the design of W will be considered Notations: We use boldface (lower case) for column vec- in Section III for different CSI. By introducing the power 𝑇 𝐻 tors, x, and (upper case) for matrices, X. With X , X , constraint ∥W∥2 =1, we make sure that the average transmit ∗ and X we denote the transpose, the conjugate transpose, power allocated per symbol is 𝔼{∥WC(s)∥2}/𝐾 = 𝛾. and the conjugate of X, respectively. The Kronecker and Observe that OSTBCs only exist for certain combinations Hadamard products of two matrices X and Y are denoted of 𝐾, 𝑇 ,and𝐵. In the simplest case, 𝐾 = 𝑇 = 𝐵 =1,it X ⊗ Y and X ⊙ Y, respectively. The column vector obtained corresponds to single-stream beamforming with C(s)=𝑠1. by stacking the columns of X is denoted vec(X) and the Another important case is[ the Alamouti] code, with 𝐾 = ∗ matrix trace is tr(X). The diagonal matrix diag(x) has the 𝑠1 −𝑠2 𝑇 = 𝐵 =2and C(s)= ∗ ∗ , as it also provides full x 𝒞𝒩(x¯, Q) 𝑠2 𝑠1 elements of the vector at the main diagonal. is coding rate [14]. In general, the maximum possible coding used to denote circularly symmetric complex Gaussian random rate approaches 1/2 from above as the spatial dimension x¯ Q vectors, where is the mean and the covariance matrix. 𝐵 increases [17]. For explicit codes and systematic code ≜ The operator is used for definitions. The squared Frobenius generation, see for example [15] and [18]. X ∥X∥2 norm of is denoted and is defined as the sum of the Under these assumptions, we achieve the system in squared absolute values of all the elements. Fig. 1(a). The received signal Y ∈ ℂ𝑛𝑅×𝑇 is II. SYSTEM MODEL Y = HWC(s)+N (1) We consider an arbitrarily correlated Rayleigh flat-fading 𝑛 𝑛 where the total power has been normalized such that the channel with 𝑇 transmit antennas and 𝑅 receive antennas, 𝑛𝑅×𝑇 𝑛 ×𝑛 elements of the additive noise N ∈ ℂ are independent represented by the channel matrix H ∈ ℂ 𝑅 𝑇 .Thetrans- and identically distributed (i.i.d.) as 𝒞𝒩(0, 1). mission is based on OSTBCs with linear precoding, where The precoding matrix W is a not part of the OSTBC, but the OSTBC is used for diversity gains and the transmitter a way of creating an effective channel, HW, with better achieves antenna gains by CSI-aware precoding. This is a properties. The receiver is assumed to know the effective standard form1 of space-time codes for informed transmitters channel perfectly, while separate knowledge of H and W is [24, Chapter 10], for which single-stream beamforming (as unrequired (this simplifies the channel estimation [6]). Then, assumed in [12] and [13]) appears as a special case when the the receiver can perform block-wise maximum likelihood spatial coding block length 𝐵 is one. 𝑇 detection of the symbols s =[𝑠1,...,𝑠𝐾] to find an estimate The OSTBC transmits 𝐾 symbols over 𝑇 symbols slots 𝑇 𝑇 𝐾 ˆs =[ˆ𝑠1,...,𝑠ˆ𝐾] . As shown in [25], [26], an important (i.e., the coding rate is 𝐾/𝑇 ). Let s =[𝑠1,...,𝑠𝐾 ] ∈ ℂ property of OSTBCs is that the original system in (1) can represent these 𝐾 data symbols, where each symbol 𝑠𝑖 ∈𝒜 2 be transformed into 𝐾 independent and virtual single-antenna has average power 𝔼{∣𝑠𝑖∣ } = 𝛾 and are uniformly dis- systems as tributed in the constellation set 𝒜 (different constellations ′ ′ 𝑦 = ∥HW∥𝑠𝑘 + 𝑛 ,𝑘=1,...,𝐾, (2) 1In general, non-orthogonal space-time block codes have better perfor- 𝑘 𝑘 mance at the cost of increased decoding complexity, but we limit ourselves 𝑛′ ∈𝒞𝒩(0, 1) to OSTBCs to achieve analytical tractability. In practice, the orthogonality where 𝑘 . Thus, a low-complexity receiver is often a minor restriction as the simple encoding/decoding of OSTBCs has structure is achieved where each symbol can be detected made them popular in standards (i.e., LTE [22] and WLAN [23]). In addition, separately, as illustrated in Fig. 1(b). This result is due to the OSTBCs are rate optimal if 𝐵 ≤ 2 and the channel H is rank one [24, Theorem 7.4], and we show in Section IV that the SER minimizing spatial structure of the OSTBCs and the assumption of perfect CSI block length is often that small. at the receiver side. We have made no assumptions on the 3580 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 9, NO. 11, NOVEMBER 2010 information available at the transmitter side. In principle, the antenna spacing is sufficiently large [27], which means that transmitter can be completely uninformed of the CSI. We will the elements of 𝝀𝑅 are of similar magnitude. however show that having statistical CSI can greatly improve The notion of majorization [28] provides a useful measure the performance in certain environments. of the spatial correlation [29] and will be used herein for var- 𝑇 𝑇 ious purposes. Let x =[𝑥1,...,𝑥𝑁 ] and y =[𝑦1,...,𝑦𝑁 ] be two non-negative real-valued vectors of arbitrary length 𝑁. A. Preliminaries on Spatial Correlation and Majorization We say that x majorizes y if Herein, we will analyze the average performance of the ∑𝑙 ∑𝑙 system in (1) and (2) in terms of the SER. Thus, we need to 𝑥 ≥ 𝑦 , 𝑙 =1,...,𝑁 − 1, specify the statistical properties of the channel matrix H.In [𝑘] [𝑘] for 𝑘=1 𝑘=1 general, we have that vec(H) ∈𝒞𝒩(0, R) for some arbitrary (4) ∑𝑁 ∑𝑁 correlation matrix R,defined on the column stacking of H. and 𝑥𝑘 = 𝑦𝑘, To achieve an analytic structure on the statistics, we consider 𝑘=1 𝑘=1 two popular MIMO channel models that have been verified by field measurements in realistic environments: the state-of- where 𝑥[𝑘] and 𝑦[𝑘] are the 𝑘th largest ordered elements of the-art Jointly-correlated model [19], [20] and the simplified x and y, respectively. This majorization property is denoted Kronecker model [3]–[5]. These can be defined as follows. x ર y.Ifx and y contain eigenvalues of channel correlation matrices, then x ર y corresponds to that x is more spatially Definition 1. The channel matrix H is jointly-correlated correlated than y. Majorization only provides a partial order Rayleigh fading if of vectors, but is still very powerful due to its connection to ˜ 𝐻 H = U𝑅(Ω ⊙ G)U𝑇 (3) certain order-preserving functions: A function 𝑓(⋅):ℝ𝑁 → ℝ is said to be Schur-convex if U ∈ ℂ𝑛𝑅×𝑛𝑅 U ∈ ℂ𝑛𝑇 ×𝑛𝑇 where 𝑅 and 𝑇 are unitary 𝑓(x) ≥ 𝑓(y) for all x and y, such that x ર y. Similarly, 𝑓(⋅) matrices that describe transmit and receive eigendirections, is said to be Schur-concave if x ર y implies that 𝑓(x) ≤ 𝑓(y). respectively. The elements of G ∈ ℂ𝑛𝑅×𝑛𝑇 are i.i.d. as 𝒞𝒩(0, 1) and Ω˜ ∈ ℂ𝑛𝑅×𝑛𝑇 is the element-wise square root of the so-called coupling matrix Ω (with positive entries) that B. Expressions for the Symbol Error Rate ˜ determines the variance of each element in Ω ⊙ G. Without Throughout the paper, the performance measure will be the loss of generality, let the columns of Ω be ordered with SER; that is, the probability that the receiver makes an error in decreasing element sums. In terms of the correlation matrix the detection of source symbols. Since the equivalent channels R, this model corresponds to the eigenvalue decomposition in (2) are identical for all symbols in the OSTBC, it is clear R =(U∗ ⊗ U )diag(vec(Ω))(U∗ ⊗ U )𝐻 𝑇 𝑅 𝑇 𝑅 with separable that the SER only depends on the distribution of the SNR, eigenvector matrices. ∥HW∥2, and on the type on the symbol constellation set, 𝒜. Definition 2. The channel matrix H follows the Kronecker Next, we will present SER expressions for three commonly model if Definition 1 is fulfilled with a rank-one coupling considered symbol constellations, but first we introduce a 𝑇 𝑛𝑅 𝑛𝑇 matrix Ω = 𝝀𝑅𝝀𝑇 ,where𝝀𝑅 ∈ ℂ and 𝝀𝑇 ∈ ℂ are general class of functions. 2 vectors with positive entries. Definition 3. We define the function 𝑛 ∫ To summarize, the Kronecker model represents the assump- ∑ 𝑏𝑘 𝑐𝑘 𝑑𝜃 tion that the transmit-side and the receive-side correlation can 𝐹a,b,c(Φ,𝑥) ≜ ( ) (5) 𝜋 det I + 𝑥 Φ be completely separated, while the jointly-correlated model 𝑘=1 𝑎𝑘 sin2(𝜃) only assumes that the eigenvectors can be separated in this where Φ is a positive semi-definite matrix and 𝑥 ≥ 0. manner. 𝑇 𝑇 The vectors a =[𝑎1,...,𝑎𝑛] , b =[𝑏1,...,𝑏𝑛] ,and The spatial channel correlation can be measured in the 𝑇 c =[𝑐1,...,𝑐𝑛] have arbitrary length 𝑛 and fulfill 𝑎𝑘 ≤ 𝑏𝑘 eigenvalue distribution of the correlation matrix; weak cor- and 𝑐𝑘 ≥ 0 for all 𝑘. relation is represented by almost identical eigenvalues, while strong correlation means that a few eigenvalues dominate. This class of functions is important since the SERs with Thus, in a highly correlated system, the channel is approx- Pulse Amplitude Modulation (PAM), Phase-Shift Keying imately confined to a small eigensubspace, while all eigen- (PSK), and Quadrature Amplitude Modulation (QAM) belong vectors are equally important in an uncorrelated system. In to it. The variable 𝑥 is proportional to the SNR, but the 2 scaling depends on the modulation. Let 𝑔PAM ≜ 3/(𝑀 − 1), urban cellular systems, base stations are typically elevated 2 and exposed to little near-field scattering. Thus, their antennas 𝑔PSK ≜ sin (𝜋/𝑀),and𝑔QAM ≜ 3/(2𝑀 −2), then the exact are strongly spatially correlated and the spread in 𝝀𝑇 is SER of the system in (2) was derived in [26], [30] as large. The receiving users will on the other hand be exposed SER (R, W,𝛾)=𝐹 (Φ,𝛾𝑔 ), PAM 0, 𝜋 , 2(𝑀−1) PAM to rich scattering and have weak spatial correlation if the 2 𝑀 SER (R, W,𝛾)=𝐹 (Φ,𝛾𝑔 ), PSK 0, 𝜋(𝑀−1) ,1 PSK 1/2 1/2 𝑀 2This is equivalent to the more common definition: H = R GR¯ , ¯ 푅 푇 SERQAM(R, W,𝛾) where the elements of G are i.i.d. as 𝒞𝒩(0, 1). In this formulation, R푅 = U diag(𝝀 )U퐻 R = U diag(𝝀 )U퐻 √ √ 푅 푅 푅 and 푇 푇 푇 푇 are positive semi-definite = 𝐹 4( 𝑀−1) 4( 𝑀−1) (Φ,𝛾𝑔QAM), [0 𝜋 ]𝑇 ,[ 𝜋 𝜋 ]𝑇 ,[ √ ]𝑇 matrices that represent the receive-side and transmit-side correlation, respec- 4 4 2 𝑀 𝑀 푇 tively. In terms of the general correlation matrix, we have R = R푇 ⊗ R푅. (6) BJORNSON¨ et al.: IMPACT OF SPATIAL CORRELATION AND PRECODING DESIGN IN OSTBC MIMO SYSTEMS 3581 for 𝑀-PAM, 𝑀-PSK, and 𝑀-QAM constellations, respec- Theorem 1. Consider minimization of the worst-case function 𝑇 𝑇 𝑇 𝐻 tively. For all three constellations, we have Φ =(W ⊗ value of 𝐹a,b,c(Φ,𝑥), with Φ =(W ⊗ I)R(W ⊗ I) ,by I)R(W𝑇 ⊗I)𝐻 , which is the correlation matrix of the effective selection of W ∈ ℂ𝑛𝑇 ×𝐵 with ∥W∥2 =1.Forall𝑥>0,we channel HW. Note that these SER expressions are valid for have uncoded systems, while the performance with outer coding max min 𝐹a,b,c(Φ,𝑥) behaves differently [31]. R∈ ℂ𝑛𝑇 𝑛𝑅×𝑛𝑇 𝑛𝑅 ; W∈ ℂ𝑛𝑇 ×𝐵 ; 2 Observe that the integrals in Definition 3 are the main build- Rર0, tr(R)=𝑛𝑇 𝑛𝑅 ∥W∥ =1 { (10) ing stones in all the SER expressions in (6). The determinant 𝐹a,b,c (0,𝑥) ,𝐵<𝑛𝑇 , = in the integrands can equally be expressed as 𝐹a,b,c (diag([𝑛𝑅, 0,...,0]),𝑥) ,𝐵≥ 𝑛𝑇 . ( 𝑥 ) 𝑚𝑛∏𝑅 ( 𝑥 ) det I + Φ = 1+ 𝜆𝑗 (Φ) , (7) If the dimension 𝐵<𝑛𝑇 , the minimal value is achieved for sin2(𝜃) sin2(𝜃) 𝑗=1 any W√, while the optimal precoding matrix for 𝐵 ≥ 𝑛𝑇 is W = 1/𝑛𝑇 V1[I0]V2 for arbitrary unitary matrices V1 ∈ where 𝜆𝑗 (Φ) denotes the 𝑗th largest eigenvalue of Φ. Thus, we ℂ𝑛𝑇 ×𝑛𝑇 V ∈ ℂ𝐵×𝐵 conclude that the eigenvalues of Φ (and not the eigenvectors) and 2 . determine the SER. Since (7) is a Schur-concave function with Proof: The proof is given in Appendix B. respect to the eigenvalues, it is clear that the eigenvalue spread The following corollary interprets the theorem in terms of will affect the performance. This brings us back to the notion the SER. of spatial correlation discussed in the last section. In Section Corollary 1. Consider the worst-case SER in (9) with either IV, we will analyze how the SER performance depends on 𝑀-PAM, 𝑀-PSK, or 𝑀-QAM. If 𝐵<𝑛𝑇 , then the worst- the spatial correlation and we will focus on comparing systems case SER is (𝑀 − 1)/𝑀 independently of the structure of the with different eigenvalue distributions. All analytic results will precoding matrix. If 𝐵 ≥ 𝑛𝑇 , the minimal worst-case SER is be derived for the class of functions in Definition 3, and the strictly smaller than (𝑀 −1)/𝑀 and is achieved by precoding interpretations for PAM, PSK, and QAM will be given as matrices of the type corollaries. √ 1 W = V1[I0] (11) III. LINEAR PRECODING WITH DIFFERENT TYPES OF CSI 𝑛𝑇 The purpose of applying linear precoding to OSTBCs is to where V1 is a unitary matrix. adapt it to the channel conditions known at the transmitter Two important conclusions can be drawn. Firstly, before and thereby improve the system performance. Herein, the data transmission, the probability of falsely predicting the performance measure is the SER and thus the precoding matrix next symbol is (𝑀 − 1)/𝑀 . This is also the worst-case SER should be selected as when 𝐵<𝑛𝑇 , and thus we need 𝐵 ≥ 𝑛𝑇 (i.e., exploiting W =argminSER(R, W,𝛾). (8) all spatial directions) to guarantee that useful information W∈ℂ𝑛𝑇 ×𝐵 ; ∥W∥2=1 is received. Secondly, an example of√ the optimal precoding 𝐵 ≥ 𝑛 W = 1/𝑛 [I0] Depending on the type of symbol constellation, the SER ex- matrix, for 𝑇 ,is 𝑇 ,whichis 𝑛 × 𝑛 pression in this optimization problem will be slightly different. a scaled 𝑇 𝑇 identity matrix padded by zeros. It is 𝐵>𝑛 𝐵 − 𝑛 Apart from the constellation, the SER also depends on the obviously not beneficial to have 𝑇 , since the 𝑇 precoder W, the channel correlation matrix R,andtheSNR additional degrees of freedom appear in the null space of the 𝛾. Thus, the quality of the precoding design will depend on channel. To summarize, when the transmitter is unaware of whether the correlation and SNR is known at the transmitter or the channel statistics, the optimal spatial coding block length 𝐵 = 𝑛 not. Next, we will solve (8) assuming that the these statistical is √ 𝑇 and power should be allocated isotropically (i.e., W = 1/𝑛 I parameters are either unknown or perfectly known to the 𝑇 ). transmitter. B. With Statistical CSI at the Transmitter A. Without CSI at the Transmitter When statistical CSI is available at the transmitter, the When the transmitter is unaware of the channel correlation precoding matrix W can be adapted to the spatial properties matrix, R, and potentially unaware of the SNR, 𝛾, robustness of the R and to the average SNR of the system. The purpose against channel fading can be achieved by minimizing the of this section is to characterize the solution of the SER worst-case SER. This worst case scenario corresponds to that minimization in (8). First, we show the structure of the optimal for every precoder we select, the channel conditions always precoder under the assumption of jointly-correlated channels. become the worst possible. Formally, the worst-case SER is This structure reduces the precoding design to a convex given by the following optimization problem: power allocation problem. Explicit asymptotic solutions will be derived at low and high SNRs and for large symbol con- max min SER(R, W,𝛾). (9) R∈ ℂ𝑛𝑇 𝑛𝑅×𝑛𝑇 𝑛𝑅 ; W∈ℂ𝑛𝑇 ×𝐵 ; stellations. In addition, a simple approximate power allocation 2 Rર0, tr(R)=𝑛𝑇 𝑛𝑅 ∥W∥ =1 will be proposed. Next, we solve this problem for the class of SER-like functions We begin with a theorem that derives the general structure W in Definition 3 and give the structure of the optimal precoding and the asymptotic properties of precoding matrices that matrices. minimize the SER-like class of functions in Definition 3. 3582 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 9, NO. 11, NOVEMBER 2010

Theorem 2. Consider minimization of 𝐹a,b,c(Φ,𝑥), with Φ = This power allocation behaves similar to the optimal strat- (W𝑇 ⊗ I)R(W𝑇 ⊗ I)𝐻 , by selection of W ∈ ℂ𝑛𝑇 ×𝐵 with egy in terms of the waterfilling property that gives beam- ∥W∥2 =1.IfR is jointly-correlated and known, the solutions forming at low 𝑥 and equal power allocation at large 𝑥.The to number of active precoding directions increases with 𝑥 and min 𝐹a,b,c(Φ,𝑥), (12) all directions are used if W∈ℂ𝑛𝑇 ×𝐵 ; ∥W∥2=1 ⎛ ⎞ 𝑚 W = U ΠΔV 𝑛 𝑚 ∑ 1 have the structure 𝑇 for some 𝑇 -dimensional 𝑥>𝑛 ⎝ − ⎠ . Π V ∈ 𝑅 𝜇 𝜇 (15) permutation matrix and arbitrary unitary matrix 𝑚 𝑗=1 𝑗 ℂ𝐵×𝐵 Δ ∈ ℂ𝑛𝑇 ×𝐵 √ . The√ rectangular diagonal matrix has 𝑝1,..., 𝑝𝑚 on the main diagonal, and 𝐹a,b,c(Φ,𝑥) is Otherwise, the number of active directions is 𝑚˜ =rank(D) < convex in 𝑝𝑗 for all 𝑗. The limiting solution at large 𝑥 is 𝑚,where𝑚˜ is the positive integer that fulfills ⎛ ⎞ ⎛ ⎞ given by 𝑝1 = ... = 𝑝𝑚 =1/𝑚 and a permutation matrix 𝑚˜ ∑𝑚˜ 1 𝑚˜ +1 𝑚∑˜ +1 1 that selects the 𝑚 eigendirections with the largest element 𝑛 ⎝ − ⎠ <𝑥≤ 𝑛 ⎝ − ⎠ . Ω 𝑥 𝑅 𝜇 𝜇 𝑅 𝜇 𝜇 products in columns of . The limiting solution at small 𝑚˜ 𝑗=1 𝑗 𝑚˜ +1 𝑗=1 𝑗 Π = I 𝑝 ,...,𝑝 is given by and all power is allocated to 1 𝑚˜ , (16) 𝑚˜ Ω where is the multiplicity of the largest column sum of . The power allocation in Strategy 1 is derived from the Π = I Under the Kronecker model, the solution has .At Chernoff bound small 𝑥, the limiting solution performs equal power allocation ∑𝑛 𝑐 (𝑏 − 𝑎 ) among the strongest directions: 𝑝1 = ...= 𝑝𝑚˜ =1/𝑚˜ . 𝐹 (Φ,𝑥) ≤ 𝑘 𝑘 𝑘 , a,b,c 𝜋 det (I + 𝑥Φ) (17) Proof: The proof is given in Appendix B. 𝑘=1 The following corollary interprets the theorem in terms of which is minimized by (14) under the condition that the the SER, and is a generalization and correction of [26] (which coupling matrix can be factorized as Ω =[1, ..., 1]𝝀𝑇 treats the Kronecker model and has disregarded the eigenvalue (i.e., Kronecker model with uncorrelated receiver). The per- ordering). formance of this power allocation will be evaluated in Section Corollary 2. The SERs of 𝑀-PAM, 𝑀-PSK, and 𝑀-QAM V for a general Ω and compared with the optimal strategy. are minimized by precoding matrices with the structure ⎧ [ ] ⎨ D IV. IMPACT OF SPATIAL CORRELATION WITH U𝑇 Π if 𝐵<𝑛𝑇 W = 0 (13) DIFFERENT TYPES OF CSI ⎩U [D0] 𝐵 ≥ 𝑛 𝑇 if 𝑇 The SER depends on the spatial correlation, as pointed out where D ∈ ℂ𝑚×𝑚 is a diagonal matrix. The limiting solution in Section II-B. Next, we will analyze this dependence in at high SNR and fixed constellation size 𝑀 is equal power more detail using the tool of majorization. If we can show that allocation in D. At low SNR, beamforming in the direction of the SER is a Schur-convex function, then spatial correlation the first column of U𝑇 is the SER minimizing solution. This increases the error rate and thereby degrades the performance. is also the asymptotically optimal solution as the constellation If the SER, on the other hand, is Schur-concave, then spatial size 𝑀 →∞. correlation improves the performance. In this section, we prove that both properties can apply, depending on the CSI The first conclusion is that the structure of the SER available at the transmitter. minimizing precoding matrix in jointly-correlated channels is similar as under the Kronecker model [26], [32]. Having 𝐵>rank(D) will not improve the performance, and thus A. Without CSI at the Transmitter 𝐵>𝑛 there is no reason to have 𝑇 .AthighSNR,itwasex- When the transmitter is unaware of the CSI, Theorem 1 pected that equal power allocation is the limiting solution [26], showed that equal power allocation in all spatial directions but an important result from Theorem 2 is that beamforming is minimizes the worst-case SER. Assuming that such precoding optimal both at low SNR and for large symbol constellations. is applied, the following theorem derives the impact of spatial The optimal precoding structure derived in Theorem 2 for correlation on the class of SER-like functions in Definition 3. jointly-correlated systems reduces the precoding optimization Theorem 3. Consider 𝐹a,b,c(Φ,𝑥), with Φ = to a convex power allocation problem (and selection of the 𝑇 𝑇 𝐻 (W ⊗√I)R(W ⊗ I) ,where𝐵 ≥ 𝑛𝑇 and active eigendirections, if 𝐵<𝑛𝑇 ). This power allocation can W = 1/𝑛𝑇 V1[I0]V2. This function is Schur-convex be solved numerically in an efficient fashion using gradient with respect to any subset of eigenvalues of R, while the methods [33]. For low-complexity implementations, we pro- other eigenvalues are fixed. Under the Kronecker model, this pose the following heuristic power allocation. means that 𝐹a,b,c(Φ,𝑥) is Schur-convex with respect to 𝝀𝑇 Strategy 1. A heuristic solution to the precoding power allo- when 𝝀𝑅 is fixed and Schur-convex with respect to 𝝀𝑅 when cation in Theorem( 2 is ) 𝝀𝑇 is fixed. 𝑛 𝑛 𝑝 =min 𝑅 − 𝑅 , 0 𝑗 =1,...,𝑚, Proof: The theorem follows directly from Lemma 1 in 𝑗 𝛼 𝑥𝜇 for (14) 𝑗 Appendix A since all non-zero eigenvalues of Φ also are where 𝜇𝑗 is the element sum of the 𝑗th column of Ω and we eigenvalues of R. use Π = I as permutation matrix. The parameter 𝛼 is selected The following corollary interprets the theorem in terms of ∑𝑚 to fulfill the power constraint 𝑗=1 𝑝𝑗 =1. the SER. BJORNSON¨ et al.: IMPACT OF SPATIAL CORRELATION AND PRECODING DESIGN IN OSTBC MIMO SYSTEMS 3583

Corollary 3. When the precoder minimizes the worst-case using Theorem 2 which showed that beamforming is optimal SER, spatial correlation always degrades the SER perfor- in this SNR region. Thus, spatial correlation improves the mance with 𝑀-PAM, 𝑀-PSK, and 𝑀-QAM (even if only performance in an SNR region that is at least as large as the a certain eigenspace is considered). Under the Kronecker beamforming optimality range. This range is characterized by model, both receive and transmit-side correlation degrades the the following theorem. performance. Theorem 5. When minimizing the function 𝐹a,b,c(Φ,𝑥),a The intuitive conclusion is that when the transmitter has no necessary and sufficient condition for optimality of beamform- CSI and therefore makes an isotropic signal power allocation, ing (i.e., 𝑝1 =1,𝑝2 = ...= 𝑝𝑚 =0)is ( ) the preferred fading environment is isotropic (i.e., all direc- 𝑛 ∫ 𝑛 ∑𝑐 𝑏𝑘 ∑𝑅 𝜔 𝜔 𝑑𝜃 tions should be equally strong a priori). As spatial correlation 𝑘 𝑙,1 − 𝑙,2 ( ) 𝜋 sin2(𝜃)+𝑥𝜔 sin2(𝜃) det I+ 𝑥 A creates a few dominant directions, isotropic transmission will 𝑘=1 𝑎𝑘 𝑙=1 𝑙,1 sin2(𝜃) waste transmission power in other directions which leads to ≥ 0, performance degradation. (18)

B. With Statistical CSI at the Transmitter where 𝜔𝑙,𝑗 is the (𝑙,𝑗)th element of Ω and A = diag(𝜔1,1,...,𝜔𝑛 ,1). Next, we consider the case when the transmitter knows the 𝑅 channel correlation matrix and the average SNR of the system. Proof: The proof is given in Appendix B. When SER minimizing precoding is applied, according to The following corollary interprets the theorem in terms of Theorem 2, we prove that the impact of spatial correlation the SER. changes with the SNR. Corollary 5. The SNR range with optimality for single-stream 𝑇 Theorem 4. Consider 𝐹a,b,c(Φ,𝑥), with Φ =(W ⊗ beamforming is 𝛾 ∈ [0,𝜐], where the upper bound 𝜐 solves 𝑇 𝐻 I)R(W ⊗ I) ,whereW = U𝑇 ΠΔV minimizes the (18) with equality using 𝑥 = 𝜐𝑔PAM , 𝑥 = 𝜐𝑔PSK,and function as in Theorem 2. If R is jointly-correlated and 𝑥 = 𝜐𝑔QAM for 𝑀-PAM, 𝑀-PSK, or 𝑀-QAM, respectively. known, let the (𝑙,𝑗)th element of the coupling matrix Ω be The parameters a, b, c are given in (6) for each modulation parameterized∑ as 𝜇𝑗 𝜔¯𝑙,𝑗,where𝜇𝑗 is the sum of the 𝑗th scheme. 𝑛𝑅 𝜔¯ =1 𝑗 column and 𝑙=1 𝑙,𝑗 for all . Then, the function is In general, the beamforming optimality range cannot be 𝜔¯ ,...,𝜔¯ 𝑗 Schur-convex with respect to 1,𝑗 𝑛𝑅,𝑗 for each (when derived explicitly. The expression in (18) is however mono- all 𝜇𝑗 are fixed). The function is Schur-convex with respect tonically decreasing in 𝑥 and thus the 𝑥-value that provides to 𝜇𝜋(1),...,𝜇𝜋(𝑚) (for fixed 𝜔¯𝑙,𝑗)atlarge𝑥 (the bijective equality can be derived by simple line search procedures. permutation function 𝜋(⋅) represents Π) and Schur-concave An approximate expression for the optimality range can be 𝜇 ,...,𝜇 𝑥 with respect to 1 𝑛𝑇 at small . derived using the low-complexity precoding strategy proposed Under the Kronecker model, this implies that the function in Strategy 1 by simply substituting 𝑚˜ =1into (16). 𝝀 𝝀 is Schur-convex with respect to 𝑅 (when 𝑇 is fixed). Finally, we stress that in practice the positive impact of 𝑚 The function is Schur-convex with respect to the largest transmit-side correlation in Theorem 4 can be observed for 𝝀 𝝀 𝑥 elements of 𝑇 (when 𝑅 is fixed) at large , while it is an SNR range considerably larger than the optimality range 𝝀 Schur-concave with respect to the complete vector 𝑇 at small for single-stream beamforming3. This analytical result stands 𝑥 . in contrast to the numerical conclusion in [13] that the SER Proof: The proof is given in Appendix B. increases monotonically with the correlation. This miscon- The following corollary interprets the theorem in terms of ception originates from varying the transmit and receive-side the SER. correlation simultaneously. Next, we will show numerically Corollary 4. With SER-optimal precoding for 𝑀-PAM, 𝑀- that transmit-side correlation improves the performance at PSK, or 𝑀-QAM, the impact of spatial correlation depends on both low and medium SNRs, while the correlation impact is the SNR. In jointly-correlated systems, spatial correlation is negligible at high SNR. characterized as the spread of channel gains between different eigendirections at the transmitter and receiver side. Spatial V. N UMERICAL EXAMPLES correlation in receive eigendirections always degrades the performance. At high SNR, spatial correlation in transmit In this section, we provide numerical examples that demon- eigendirections also degrades performance, while correlation strate the precoding results in Section III and the impact improves the SER at low SNR. Under the Kronecker model, of spatial correlation that was analyzed in Section V. First, these behaviors decouple; receive-side correlation decreases the performance loss of the proposed heuristic power allo- the performance, while transmit-side correlation improves the cation strategy will be evaluated along with the size of the performance at low SNR and degrades it at high SNR. beamforming optimality range. Then, we will clarify how the low and high SNR-behaviors derived in Section V affect the In other words, even if optimal precoding is applied, spatial performance in the range of practical SNRs. receive-side correlation will always degrade the performance. For transmit-side correlation, there is however a remarkable 3In fact, the beamforming range cannot be used directly to determine the change in behavior between low and high SNR, which requires low SNR region since it depends on the spatial correlation, while the low further specification. The low SNR behavior was proved SNR property is valid for any correlation. 3584 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 9, NO. 11, NOVEMBER 2010

1 p Optimal 1 Heuristic 40 0.8 30 0.6

0.4 20 p ,p

Power Allocation 2 3 0.2 10 M−PAM M−PSK M−QAM

0 Largest SNR with Beamforming (dB) 0 0 5 10 15 20 25 30 35 40 50 100 150 200 250 SNR (dB) M

Fig. 2. Power allocation among 𝑝1,𝑝2,𝑝3 as a function of the SNR in a Fig. 4. Upper bound on the beamforming optimality range (in SNR) as jointly-correlated system with 𝑛푇 =3, 𝑛푅 =2, and 16-QAM. The SER a function of the modulation size. A jointly-correlated system is considered minimizing strategy is compared with the low-complexity approach proposed with 𝑛푇 =3, 𝑛푅 =2, and different symbol constellations. in Strategy 1.

0.06 SNR. At low and high SNR, the difference is nonexistent or 16−PAM negligible, while there is a peak in the area where the heuristic 0.05 16−PSK 16−QAM strategy uses beamforming although higher performance can 0.04 be achieved by spatial multiplexing. The maximum relative performance loss is around 5 percent, which can be seen as 0.03 validation of the heuristic strategy. 0.02 As observed in Fig. 2, the SER is minimized by allocating

0.01 all power to the strongest eigendirection for a large range of SNRs. Theorem 2 proved that this type of single-stream Relative SER Performance Loss 0 beamforming is optimal at low SNR and the optimality range 0 5 10 15 20 25 30 35 40 was characterized in Theorem 5. Next, we illustrate the upper SNR (dB) bound of this range. In Fig. 4, the largest SNR that gives Fig. 3. Relative increase in SER, as a function of the SNR, when using the beamforming as the SER minimizing precoding is shown as heuristic power allocation as compared to SER minimizing power allocation. a function of the modulation size, 𝑀, for PAM, PSK and 𝑛 =3 𝑛 =2 𝑀 =16 A jointly-correlated system is considered with 푇 , 푅 , , QAM. As noted in Corollary 2, the upper bound increases and different symbol constellations. with 𝑀 and they will approach infinity together. In general, the beamforming optimality range is much wider for PAM A. Precoding Strategies and PSK, than for QAM. As we move towards modulations When the transmitter has statistical CSI, the structure of the as 64-QAM and 128-QAM, the optimal precoding strategy is SER minimizing precoding matrix was given by Theorem 2. beamforming for most practical SNRs. The remaining convex power allocation problem can either be solved optimally using numerical methods or approximately using Strategy 1. Next, we evaluate the difference in per- B. Impact of Spatial Correlation formance and behavior between these strategies. To ensure Next, we illustrate the impact of spatial correlation when repeatability, we consider the coupling matrix [ ] the transmitter has either no CSI or statistical CSI. As proved 3.60.40.5 in Section IV, the power distribution in columns and rows Ω = (19) 10.30.2 of the coupling matrix affects the SER in different ways. that was introduced in [34] and has 𝑛𝑇 =3and 𝑛𝑅 =2.This To show this in a simple way, we consider a system with coupling matrix represents a clearly non-Kronecker model 𝑛𝑇 = 𝑛𝑅 =4that satisfies the Kronecker model. The scenario with one strong transmit eigendirection and two antenna correlation follows the exponential model [35], which equally weak transmit directions with different spreads over in principle models a uniform linear array (ULA) with the the receive eigendirections. In Fig. 2, the optimal and heuristic correlation between adjacent antenna elements as a parameter. power allocations are shown as functions of the SNR, 𝛾.The The symbol constellation is 16-QAM and the coupling matrix symbol constellation is 16-QAM and observe that the total is normalized such that the total element sum is 𝑛𝑇 𝑛𝑅. element sum in Ω is normalized to 𝑛𝑇 𝑛𝑅. The difference In Fig. 5, we keep the transmit correlation fixed at 0.5, while between the two strategies is clearly visible; the heuristic the correlation between adjacent receive antennas changes strategy requires slightly higher SNR before allocating power between 0 and 1 (i.e., from completely uncorrelated to com- in more than one direction and always gives 𝑝2 = 𝑝3, although pletely correlated). As expected from Theorem 3 and 4, the the first of these directions is slightly advantageous. SER is a Schur-convex function with respect to the spatial The perceived difference between the optimal and heuristic correlation at all SNRs. Having statistical CSI improves the strategy, in terms of the relative increase in SER when using SER, especially at low and medium SNR, but the overall the latter, is illustrated in Fig. 3. The performance loss is conclusion is that receive-side correlation always degrades the given for 16-PAM, 16-PSK, and 16-QAM as a function of the performance. BJORNSON¨ et al.: IMPACT OF SPATIAL CORRELATION AND PRECODING DESIGN IN OSTBC MIMO SYSTEMS 3585

0 0 10 10 0 dB 0 dB

−2 −2 10 10 10 dB 10 dB

−4 −4 10 10 14 dB 14 dB

−6 −6 10 10 18 dB 18 dB −8 −8 10 10 Symbol Error Rate (SER) Symbol Error Rate (SER) −10 −10 10 10 Uniform precoding Uniform precoding 22 dB Optimal precoding 22 dB Optimal precoding −12 −12 10 10 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Receive Antenna Correlation Transmit Antenna Correlation

Fig. 5. The SER as a function of the correlation between adjacent receive Fig. 6. The SER as a function of the correlation between adjacent transmit antennas with either uniform (no transmit side CSI) or optimal precoding antennas with either uniform (no transmit-side CSI) or optimal precoding (statistical transmit-side CSI). The system uses 16-QAM and follows the (statistical transmit-side CSI). The system uses 16-QAM and follows the Kronecker model with 𝑛푇 = 𝑛푅 =4and a transmit antenna correlation Kronecker model with 𝑛푇 = 𝑛푅 =4and a receive antenna correlation of 0.5. of 0.5.

Next, we keep the receive correlation fixedat0.5andvary exploits the channel eigendirections at the transmitter side and the correlation between adjacent transmit antennas. This case allocates power according to a convex optimization problem. is of special interest since Theorem 3 and 4 showed different While equal power allocation is optimal at very high SNR, behaviors depending on the SNR and available CSI. Without it was shown that single-stream beamforming in general is transmit-side CSI, Fig. 6 shows that the SER becomes a Schur- optimal at low SNR and often at most practical SNRs. The convex function at all SNRs. With statistical CSI, we observe beamforming optimality range was characterized and shown the opposite behavior; the SER is a Schur-concave function, to increase with the size of the symbol constellation. Fur- and thereby improves the performance with increasing cor- thermore, a low-complexity algorithm for power allocation relation, in an SNR range that reaches up to 14 dB. For was proposed and it was illustrated that its performance loss larger SNRs, there is a transition range where the SER is compared with power optimal allocation is small. neither Schur-convex nor Schur-concave. At very high SNR, For this type of closed-loop systems, it was proven that the SER becomes Schur-convex, but observe that it has already receive side correlation always degrades the performance, 10−6 reached such low values (below ) that the dependence while transmit side correlation is favorable at low and medium on the spatial correlation in principle is negligible. Thus, we SNR (including the beamforming optimality range) and has conclude that with statistical transmit-side CSI, the SER is negligible impact at high SNR. In practice, these results Schur-concave at low to medium SNRs and approximately impose difficult design considerations as the uplink-downlink Schur-concave at high SNRs. channel reciprocity means that we cannot achieve an optimal fading environment in both the uplink and downlink. VI. CONCLUSION The optimal precoder and the impact of spatial correlation APPENDIX A on the symbol error rate have been shown to depend strongly on the CSI available at the transmitter. The considered system The following lemma shows some behaviors of the class of was Rayleigh fading with OSTBC transmission, perfect CSI functions introduced in Definition 3. at the receiver side, and the SER was used as performance Lemma 1. The function 𝐹a,b,c(Φ,𝑥) is Schur-convex with measure. If the transmitter has no CSI, then the optimal respect to the eigenvalues of Φ and it is a decreasing function precoding strategy is to allocate the power equally over all in 𝑥. The function is also decreasing in tr(Φ) for fixed eigendirections and thereby protect the system against the eigenvalue spread. worst-case conditions. For this type of open-loop system, the Proof: First, by inserting (7) into (5), and multiplying intuitive result that spatial correlation degrades the perfor- with sin2(𝜃)/ sin2(𝜃), we achieve mance was proven. ( ) 𝑛 ∫ 𝑚𝑛 When the transmitter has statistical CSI, the transmission ∑ 𝑏𝑘 ∏𝑅 2 𝑐𝑘 sin (𝜃) strategy can be adapted to the spatial correlation and ex- 𝐹a,b,c(Φ,𝑥)= 2 𝑑𝜃. 𝜋 𝑎 sin (𝜃)+𝑥𝜆𝑖(Φ) ploit its advantages. While correlation increases the channel 𝑘=1 𝑘  𝑖=1   knowledge at the transmitter, it also decreases the degrees ≜𝑔(𝜃,𝑥,𝜆 (Φ),...,𝜆 (Φ)) 1 𝑚𝑛𝑅 of freedom, and thus it is not intuitively clear how spatial (20) correlation affects the SER. The analysis herein was based on the assumption of jointly-correlated channel statistics, which This expression resolves the ambiguity in intervals [𝑎𝑘,𝑏𝑘] that better complies with measurements than the Kronecker model contain 𝜃 such that sin2(𝜃)=0and keeps the strict equality previously used in this area. The optimal precoding strategy since these points have zero measure. Next, observe that the 3586 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 9, NO. 11, NOVEMBER 2010

integrand 𝑔(𝜃, 𝑥,1 𝜆 (Φ),...,𝜆𝑚𝑛𝑅 (Φ)) is continuous for 𝑥 ≥ Lemma 2. Let 𝑓(⋅) be a Schur-convex function and consider 0 and 𝜆𝑗 (Φ) ≥ 0 for all 𝑗. The partial derivatives of 𝑔(⋅) are the matrix A = Λ1 + Λ2(F ⊗ I)Λ3,whereΛ𝑗 are positive semi-definite diagonal matrices for all 𝑗 and W is positive ∂ 𝑚𝑛∑𝑅 𝜆 (Φ) 𝑔(⋅)=−𝑔(⋅) 𝑗 , semi-definite. Then, ∂𝑥 sin2(𝜃)+𝑥𝜆 (Φ) 𝑗=1 𝑗 𝑇 (21) min 𝑓([𝜆1(A),...,𝜆𝑁 (A)] ) (25) ∂ 𝑥 F;tr(F)=1 𝑔(⋅)=−𝑔(⋅) 2 , ∂𝜆𝑗 (Φ) sin (𝜃)+𝑥𝜆𝑗 (Φ) can only be solved by matrices F that are diagonal. which both are continuous and negative. Hence, we use [36, Proof: Assume, for the purpose of contradiction, that F Theorem 9.42] that states that differentiation of 𝐹a,b,c(Φ,𝑥) there exist a non-diagonal optimal solution opt. It can be F = D + B D B ∕= 0 with respect to 𝑥 and 𝜆𝑗 (Φ) can be determined by differ- expressed as opt ,where is diagonal and entiation of the integrand (i.e., interchanging the order of has zero diagonal elements. Observe that integration and differentiation). A = Λ1 + Λ2((D + B) ⊗ I)Λ3 We should prove that the function is Schur-convex. Ac- = Λ1 + Λ2(D ⊗ I)Λ3 + Λ2(B ⊗ I)Λ3 . (26) cording to Schur’s condition [28, Theorem 3.A.4], a function       ≜C, zero diagonal 𝑓(𝑥1,...,𝑥𝑁) is Schur-convex if and only if 𝑓 is symmetric diagonal in its arguments and if Next, [28, Theorem 9.B.1] says that the eigenvalues of A ( ) ∂𝑓 ∂𝑓 majorizes the vector with diagonal elements of A. Thus, the (𝑥1 − 𝑥2) − ≥ 0. (22) eigenvalues of A majorizes the eigenvalues of C, which means ∂𝑥1 ∂𝑥2 that Similarly, the function is Schur-concave if and only if it is 𝑇 𝑇 𝑓([𝜆1(A),...,𝜆𝑁 (A)] ) ≥ 𝑓([𝜆1(C),...,𝜆𝑁 (C)] ) (27) symmetric in its arguments and (22) is fulfilled with the opposite inequality. Using (21), we have with equality if and only if A = C. Hence, Fopt can be ∂ replaced by D (with identical trace) that gives a lower function 𝐹a,b,c(Φ,𝑥) value. This optimality contradiction means that the solution ∂𝜆𝑗 (Φ) 𝑛 ∫ must be diagonal. ∑ 𝑏𝑘 (23) 𝑐𝑘 𝑔(𝜃, 𝑥,1 𝜆 (Φ),...,𝜆𝑚𝑛𝑅 (Φ))𝑥 = − 2 𝑑𝜃 𝜋 sin (𝜃)+𝑥𝜆 (Φ) APPENDIX B 𝑘=1 𝑎𝑘 𝑗 Proof of Theorem 1: In the case 𝐵<𝑛𝑇 ,everyW can be which is negative since all components of the sum are positive described by the singular value decomposition (recall that 𝑐𝑘 ≥ 0 and 𝑏𝑘 ≥ 𝑎𝑘 by definition). Except from [ ] D within the function 𝑔(⋅), the eigenvalue 𝜆𝑗 (Φ) only appears in W = V V 1 0 2 (28) numerators and hence the derivative with respect to 𝜆1(Φ) will 𝐵×𝐵 𝑛𝑇 ×𝑛𝑇 be larger than for 𝜆2(Φ) if 𝜆1(Φ) >𝜆2(Φ). Thus, Schur’s for some diagonal D ∈ ℂ and unitary V1 ∈ ℂ 𝐵×𝐵 condition in (22) is fulfilled and 𝐹a,b,c(Φ,𝑥) is Schur-convex and V2 ∈ ℂ . By selecting [ ] with respect to the eigenvalues of Φ. 00 R =(V∗ V𝑇 ⊗ I) (29) Next, by interchanging integration and differentiation order 1 0A 1 and using (21), we have that for some arbitrary A ∈ ℂ𝑛𝑇 −𝐵×𝑛𝑇 −𝐵 that fulfills tr(A)= ∂ 𝑛𝑇 , we achieve 𝐹a,b,c(Φ,𝑥) ∂𝑥 Φ =(W𝑇 ⊗ I)R(W𝑇 ⊗ I)𝐻 𝑛 ∫ 𝑚𝑛 ∑ 𝑏𝑘 ∑𝑅 [ ][ ] 𝑐𝑘 𝑔(𝜃, 𝑥,1 𝜆 (Φ),...,𝜆𝑚𝑛 (Φ))𝜆𝑗 (Φ) ( 00 ∗ ) (30) = − 𝑅 = V𝑇 [D𝑇 0] D V∗ ⊗ I = 0 𝜋 sin2(𝜃)+𝑥𝜆 (Φ) 2 0A 0 2 𝑘=1 𝑎𝑘 𝑗=1 𝑗 ≤ 0 which is the global maximum of the outer optimization in (10) since Lemma 1 states that 𝐹a,b,c(Φ,𝑥) increases with (24) decreasing tr(Φ).If𝐵<𝑛𝑇 , we can thus achieve the global since the sum coefficients and the integrands are all positive. maximum of 𝐹a,b,c(Φ,𝑥) by worst-case selection of R for Hence, the function is decreasing in 𝑥. any choice of W. Finally, let 𝛽 ≜ tr(Φ) and define the normalized ma- In the case 𝐵 ≥ 𝑛𝑇 ,wefirst analyze the maximal function ˜ Φ trix Φ ≜ 𝛽 (with constant unit trace). We want to show value with respect to R for a given W. Then, we identify how 𝐹a,b,c(Φ,𝑥) depends on the trace 𝛽. Observe that the the matrix W that gives the smallest maximized value. Let ˜ function identically can be expressed as 𝐹a,b,c(𝛽Φ,𝑥)= the singular value decomposition of W be denoted W = ˜ V [D0]V D =diag(𝑑 ,...,𝑑 ) 𝐹a,b,c(Φ,𝛽𝑥), and thus increasing 𝛽 is equivalent to increas- 1 2,where 1 𝑛𝑇 has elements 𝑛 ×𝑛 ing 𝑥. We conclude that the function is decreasing in tr(Φ) ordered with decreasing magnitude and V1 ∈ ℂ 𝑇 𝑇 , 𝐵×𝐵 for fixed eigenvalue spread in Φ. V2 ∈ ℂ are unitary matrices. Lemma 1 showed that The concepts of the next lemma have previously been 𝐹a,b,c(Φ,𝑥) increases with decreasing tr(Φ), and thus we used to prove that optimal precoders diagonalize the channel want to maximize ( ) statistics [6], [26], [32]. Herein, it is generalized for non- tr(Φ)=tr (W𝑇 ⊗ I)R(W𝑇 ⊗ I)𝐻 ( ) (31) Kronecker model systems. =tr R˜ (D∗D𝑇 ⊗ I) BJORNSON¨ et al.: IMPACT OF SPATIAL CORRELATION AND PRECODING DESIGN IN OSTBC MIMO SYSTEMS 3587

˜ 𝑇 ∗ where R =(V1 ⊗ I)R(V1 ⊗ I) andweusedSchur Using the optimal precoder structure, each integral in (12) determinant lemma [37]. According to [28, Theorem 20.A.4], can be expressed as ˜ (31) is minimized when R is diagonal and ordered such that ∫ ∫ 𝑛 𝑚 𝑏𝑘 𝑑𝜃 𝑏𝑘 ∏𝑅 ∏ 1 its smallest elements are multiplied with the largest elements ( ) 𝑥 = 𝑥 𝑑𝜃, ∗ 𝑇 det I + Φ 1+ 2 𝜔 𝑝𝑗 of D D ⊗I, and vice versa. Thus, the remaining optimization 𝑎𝑘 sin2(𝜃) 𝑎𝑘 𝑙=1 𝑗=1 sin (𝜃) 𝑙,𝜋(𝑗) is (35) ( ) 𝜔𝑙,𝑗 (𝑙,𝑗) Ω ∑𝑛𝑇 ∑𝑛𝑅 where is the th element of . The integral is a convex 2 𝑝 𝑥 min ∣𝑑𝑛 −𝑗+1∣ 𝜆𝑙+(𝑗−1)𝑛 (R) function of each 𝑗 .Atlarge ,wehave R; tr(R)=𝑛 𝑛 𝑇 𝑇 𝑇 𝑅 𝑗=1 𝑙=1 (32) ∫ 𝑛 𝑚 𝑏𝑘 ∏𝑅 ∏ 1 = 𝑛 𝑛 ∣𝑑 ∣2 𝑑𝜃 𝑇 𝑅 𝑛𝑇 1+ 𝑥 𝜔 𝑝 𝑎𝑘 sin2(𝜃) 𝑙,𝜋(𝑗) 𝑗 𝑙=1 𝑗=1⎛ ⎞ which is clearly minimized when all power of the correlation 𝑚 𝑛 𝑚 ∫ ( )𝑚𝑛 2 ∏ 1 ∏𝑅 ∏ 1 𝑏𝑘 sin2(𝜃) 𝑅 matrix is in direction of the weakest ∣𝑑𝑗 ∣ . Finally, we want ≈ ⎝ ⎠ 𝑑𝜃. W 𝑝𝑛𝑅 𝜔 𝑥 to select the precoding matrix that yields the best worst- 𝑗=1 𝑗 𝑙=1 𝑗=1 𝑙,𝜋(𝑗) 𝑎𝑘 case performance, or in other words maximizes the worst-case (36) expression for tr(Φ) in (32). We observe that this precoding matrix should fulfill that all diagonal√ elements 𝑑𝑗 have the The second factor is minimized by a permutation matrix that Ω same magnitude; that is, D = 1/𝑛𝑇 I. orders the∏ columns of with decreasing element product. 1/( 𝑝𝑛𝑅 ) Proof of Theorem 2: For the jointly-correlated Since 𝑗 𝑗 is a Schur-convex function, it is minimized model, the correlation matrix becomes R = by equal power allocation [29, Theorem 2.21]. Thus, each term ∗ ∗ 𝐻 𝐹 (Φ,𝑥) (U𝑇 ⊗ U𝑅)diag(vec(Ω))(U𝑇 ⊗ U𝑅) . For simplicity, of a,b,c is minimized by the same power allocation and we will use the notation Λ ≜ diag(vec(Ω)). Suppose that permutation matrix and we have reached the global minimum. 𝑥 the solution to (12) is denoted Wopt and let its singular To prove the behavior at small , observe that min 2 𝐹 (Φ,𝑥)=min 2 𝐹 (Φ, 1) value decomposition be Wopt = UΔV for some rectangular W; ∥W∥ =1 a,b,c W; ∥W∥ =𝑥 a,b,c . diagonal matrix Δ and unitary matrices U, V. We will first Now, suppose that 𝜋(𝑗)=𝑗 for 𝑗 ≤ 𝑚˜ . Let the power 𝑝 = 𝛽 (𝑥 − 𝑡) 𝑗 ≤ 𝑚˜ show that U = U𝑇 . allocation be parameterized as 𝑗 𝑗 for and 𝑝 = 𝛼 𝑡 𝑗>𝑚˜ 𝛼 ≥ 0 𝛽 ≥ 0 If we find a W that minimizes the integrand of 𝐹a,b,c(Φ,𝑥) 𝑗 𝑗 for ,where∑ 𝑗 and 𝑗 ∑ are arbitrary 𝛼 =1 𝛽 =1 in (5) for all 𝑥 and 𝜃, then we have reached the global coefficients that fulfill 𝑗>𝑚˜ 𝑗 and 𝑗≤𝑚˜ 𝑗 , 2 𝐹 (Φ, 1) minimum of the function. Let A = I +(∏ 𝑥/ sin (𝜃))Φ,then respectively. It is straightforward to show that a,b,c 𝑚𝑛𝑅 𝑡 the integrand becomes 1/ det(A)=1/( 𝑗=1 𝜆𝑗 (A)),which is convex with respect to . Hence, in order to show that is a Schur-convex function with respect to the eigenvalues of 𝑡 =0(i.e., selective power allocation to strongest direction, A since the partial derivative with multiplicity) minimizes the SER for small values on 𝑥, it is sufficient to show that (∂/∂𝑡)𝐹a,b,c(Φ, 1) > 0 at 𝑡 =0 𝑚𝑛∏𝑅 𝑚𝑛∏𝑅 ∂ 1 1 1 for such 𝑥. Recall from the proof of Lemma 1 that we can = − (33) ∂𝜆 (A) 𝜆 (A) 𝜆 (A) 𝜆 (A) interchange differentiation and integration. Define 𝑙 𝑗=1 𝑗 𝑙 𝑗=1 𝑗 1 𝑔𝑙,𝑗(𝑝) ≜ 𝜔 𝑝 . (37) satisfies Schur’s condition in (22) for Schur-convexity. Now, 1+ 𝑙,𝑗 we simplify the integrand as sin2(𝜃) 1 Using the parametrization in 𝑡, the derivative of (41) at 𝑡 =0 ( ) is det I + 𝑥 Φ ⎛ ⎞ sin2(𝜃) ∫ 𝑛 𝑚˜ 𝑚 ( ∂ 𝑏𝑘 ∏𝑅 ∏ ∏ −1 𝑥 𝑇 ∗ ⎝ ⎠ =det I + (W ⊗ I)(U ⊗ U )Λ 𝑔𝑙,𝑗 (𝛽𝑗 (𝑥 − 𝑡)) 𝑔𝑙,𝜋(𝑗)(𝛼𝑗 𝑡) 𝑑𝜃 2 opt 𝑇 𝑅 ∂𝑡 𝑡=0 sin (𝜃) 𝑎𝑘 𝑙=1 𝑗=1 𝑗=˜𝑚+1 ) (34) ⎛ ⎞ ∗ 𝐻 𝑇 𝐻 ∫ × (U𝑇 ⊗ U𝑅) (W ⊗ I) 𝑛𝑅 𝑏 𝑚˜ 𝑚 opt ∑ 𝑘 ∑ 𝛽˜𝜔˜ ˜𝑔˜ ˜(𝛽˜𝑥) ∑ 𝛼˜𝜔˜ ˜ ( ) = ⎝ 𝑗 𝑙,𝑗 𝑙,𝑗 𝑗 − 𝑗 𝑙,𝜋(𝑗) ⎠ −1 𝑥 𝑇 ∗ ∗ 𝑇 𝑇 ∗ 2 2 =det I + Λ(U U Δ Δ U U ⊗ I) , 𝑎 sin (𝜃) sin (𝜃) sin2(𝜃)  𝑇  𝑇 𝑙˜=1 𝑘 ˜𝑗=1 ˜𝑗=˜𝑚+1 =F ∏𝑛𝑅 ∏𝑚˜ × 𝑔 (𝛽 𝑥)𝑑𝜃 where the second equality follows from the singular value 𝑙,𝑗 𝑗 𝑙=1 𝑗=1 decomposition of Wopt and from using Schur’s determinant ( ) 𝑛 ∫ ∑𝑚˜ ∑𝑅 𝑏𝑘 lemma [37], det(I + BC)=det(I + CB).Sincewehave ˜ 𝛽˜𝑗 𝜔˜ ˜𝑔˜ ˜(𝛽˜𝑗 𝑥) − 𝜔˜ ≥ 𝑗=1 𝑙,𝑗 𝑙,𝑗 𝑙,𝑚˜ +1 shown that the expression in (34) is Schur-convex, we can sin2(𝜃) ˜ 𝑎𝑘 apply Lemma 2 on the last expression in (34) and conclude 𝑙=1 ∏𝑛𝑅 ∏𝑚˜ that F needs to be diagonal in order for Wopt to be an optimal 𝐻 × 𝑔 (𝛽 𝑥)𝑑𝜃, solution. Since Δ is a rectangular diagonal matrix, ΔΔ will 𝑙,𝑗 𝑗 𝑙=1 𝑗=1 be diagonal, and therefore U𝐻 U needs to be diagonal. Thus, 𝑇 (38) U = U𝑇 Π, where the permutation matrix Π decides which singular value of Δ that belongs to each of the eigenvectors where the inequality follows from selecting 𝛼𝑚˜ +1 =1and in U𝑇 . 𝛼𝑗 =0for 𝑗>𝑚˜ +1 (i.e., placing all power in 𝑡 on the 3588 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 9, NO. 11, NOVEMBER 2010

∑ 𝑗 =˜𝑚 +1,...,𝑛 𝜔 𝝀 𝜇 ,...,𝜇 column among 𝑇 that maximizes 𝑙 𝑙,𝑗). 𝑇 and 1 𝑛𝑇 are identical (up to a scaling factor) and At 𝑥 =0, the derivative is strictly positive for all 𝛽1,...,𝛽𝑚˜ share the same Schur-convexity properties. and since it is a continuous function, it will remain positive Proof of Theorem 5: The proof uses the same type of for a certain interval of small values on 𝑥.Inotherwords, parametrization as in the ”small 𝑥”-part of the proof of each sum component of 𝐹a,b,c(Φ,𝑥) will increase with 𝑡 for Theorem 2. First, we make sure that the eigendirection with small 𝑥 and thus the function is minimized by 𝑡 =0. the largest column sum in Ω (or one of them) is among the Finally, under the Kronecker model it holds for 𝑗1 <𝑗2 active directions by selecting 𝜋(1) = 1. We parameterize 𝜔 ≥ 𝜔 𝑙 𝑝 = 𝑥 − 𝑡 𝑝 = 𝛼 𝑡 𝑗>1 𝛼 ≥ 0 that 𝑙,𝑗1 𝑙,𝑗2 for all . Hence, each integral in (41) is as 1 and 𝑗 𝑗 for ∑ ,where 𝑗 minimized when Π = I. In addition, if two columns in Ω have are arbitrary coefficients that fulfill 𝑗>1 𝛼𝑗 =1.The an identical element sum, then the columns will be identical. function 𝐹a,b,c(Φ,𝑥) is convex with respect to 𝑡 and thus a Thus, Lemma 1 gives equal power allocation among them. necessary and sufficient condition for beamforming optimality ∂ Proof of Theorem 4: First, we consider the Schur-convexity is ∂𝑡𝐹a,b,c(Φ,𝑥)∣𝑡=0 ≥ 0. Similar to (38), straightforward 𝜔¯ ,...,𝜔¯ 𝑗 properties with respect to 1,𝑗 𝑛𝑅,𝑗 for a given .Using differentiation of 𝐹a,b,c(Φ,𝑥) with respect to 𝑡 yields [6, Lemma 2] and that 𝐹a,b,c(Φ,𝑥) is continuous and twice ⎛ ⎞ 𝑛 ∫ 𝑛 𝑚 ∑ 𝑏𝑘 ∑𝑅 ∑ continuously differentiable, we know that partial derivatives 𝑐 𝜔𝑙,𝜋(1) 𝛼𝑗 𝜔𝑙,𝜋(𝑗) 𝑘 ⎝ − ⎠ of the optimization problem in (12) can be calculated by eval- 𝜋 sin2(𝜃)+𝑥𝜔 sin2(𝜃) 𝑘=1 𝑎𝑘 𝑙=1 𝑙,𝜋(1) 𝑗=2 uating the derivative of 𝐹a,b,c(Φ,𝑥) at the optimal solution. 𝑑𝜃 Thus, we have × ( ) ≥ 0. det I + 𝑥 A ∂ sin2(𝜃) min 𝐹a,b,c(Φ,𝑥) ∂𝜔¯𝑙,𝜋(𝑗) W∈ℂ𝑛𝑇 ×𝐵 ; ∥W∥2=1 (42) 𝑛 ∫ 𝑥 ∑ 𝑏𝑘 𝑐 1 2 𝜇𝜋(𝑗) This condition needs to be fulfilledforanysetof𝛼𝑗 . 𝑘 ( ) sin (𝜃) = − 𝑥 𝑥 𝑑𝜃 𝜋 det I + Φ 1+ 2 𝜔¯ 𝜇 Therefore,∑ the expression in (18) is achieved by replac- 𝑘=1 𝑎𝑘 sin2(𝜃) sin (𝜃) 𝑙,𝜋(𝑗) 𝜋(𝑗) 𝑚 2 ing (𝛼𝑗 𝜔𝑙,𝜋(𝑗))/ sin (𝜃) by its maximum, achieved by (39) 𝑗=2 𝜋(2) = 2, 𝛼2 =1and 𝛼𝑗 =0for 𝑗>2. Finally, consider the for 𝑗 =1,...,𝑚, which is negative and only contains 𝜔¯𝑙,𝜋(𝑗) initial assumption that 𝜋(1) = 1. If this is not case, then the in a denominator (and within the determinant). Hence, the second sum within the integral in (42) is larger than the first 𝜔¯ 𝜔¯ 𝜔¯ ≥ derivative is smaller for 𝑙2,𝜋(𝑗) than for 𝑙1,𝜋(𝑗) if 𝑙1,𝜋(𝑗) sum and the beamforming optimality condition can never be 𝜔¯ 𝑙2,𝜋(𝑗). Recall from (22) that a function is Schur-convex with fulfilled. 𝜔¯ ,...,𝜔¯ 𝜔¯ ≥ 𝜔¯ respect to 1,𝜋(𝑗) 𝑛𝑅,𝜋(𝑗) if 𝑙1,𝜋(𝑗) 𝑙2,𝜋(𝑗) implies that the corresponding function derivatives satisfies the same ACKNOWLEDGMENT inequality. Thus, 𝐹a,b,c(Φ,𝑥) is Schur-convex. At large 𝑥, equal power allocation is the optimal precoding The authors would like to acknowledge useful discussions strategy and each integral in 𝐹a,b,c(Φ,𝑥) can be approximated with Pandu Devarakota and Samer Medawar. according to (36) as ∫ 𝑏𝑘 𝑑𝜃 REFERENCES ( ) det I + 𝑥 Φ [1] G. J. Foschini and M. J. Gans, “On limits of wireless communications in 𝑎𝑘 sin2(𝜃) ⎛ ⎞⎛ ⎞ a fading environment when using multiple antennas,” Wireless Personal 𝑛 𝑚 𝑚 ∫ ∏𝑅 ∏ 1 ∏ 1 𝑏𝑘 sin2𝑚𝑛𝑅 (𝜃) Commun., vol. 6, pp. 311–335, 1998. 𝑛𝑅𝑚⎝ ⎠⎝ ⎠ [2] E. Telatar, “Capacity of multi-antenna Gaussian channels,” European ≈ 𝑚 𝑛 𝑑𝜃. 𝜔¯ 𝜇 𝑅 𝑥𝑚𝑛𝑅 Trans. Telecommun., vol. 10, no. 6, pp. 585–595, 1999. 𝑙=1 𝑗=1 𝑙,𝜋(𝑗) 𝑗=1 𝜋(𝑗) 𝑎𝑘 [3] J. Kermoal, L. Schumacher, K. Pedersen, P. Mogensen, and F. Fred- (40) eriksen, “A stochastic MIMO radio channel model with experimental validation,” IEEE J. Sel. Areas Commun., vol. 20, no. 6, pp. 1211–1226, This is a Schur-convex function with respect to 2002. 𝜇𝜋(1),...,𝜇𝜋(𝑚) [28, Theorem 3.A.4]. [4] D. Chizhik, J. Ling, P. Wolniansky, R. Valenzuela, N. Costa, and 𝑥 K. Huber, “Multiple-input-multiple-output measurements and modeling At small , all power is allocated in the direction with in Manhattan,” IEEE J. Sel. Areas Commun., vol. 21, no. 3, pp. 321–331, the largest 𝜇𝑗 . In general, this direction is distinct and each 2003. integral in 𝐹a,b,c(Φ,𝑥) can be expressed as [5] K. Yu, M. Bengtsson, B. Ottersten, D. McNamara, P. Karlsson, and M. Beach, “Modeling of wide-band MIMO radio channels based on ∫ ∫ 𝑛 𝑏𝑘 𝑑𝜃 𝑏𝑘 ∏𝑅 1 NLoS indoor measurements,” IEEE Trans. Veh. Technol., vol. 53, no. 3, ( ) 𝑥 = 𝑥 𝑑𝜃. pp. 655–665, 2004. det I + Φ 1+ 2 𝜔¯𝑙,1𝜇1 𝑎𝑘 sin2(𝜃) 𝑎𝑘 𝑙=1 sin (𝜃) [6] E. Bj¨ornson and B. Ottersten, “A framework for training-based esti- (41) mation in arbitrarily correlated Rician MIMO channels with Rician 𝜇 disturbance,” IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1807– This is a decreasing function with respect to 1 and indepen- 1820, 2010. dent of all other 𝜇𝑗 . Thus, it is a Schur-concave function with [7] A. Narula, M. Lopez, M. Trott, and G. Wornell, “Efficient use of side respect to the ordered vector with 𝜇1,...,𝜇𝑛 . information in multiple-antenna data transmission over fading channels,” 𝑇 IEEE J. Sel. Areas Commun., vol. 16, no. 8, pp. 1423–1436, 1998. Finally, under the Kronecker model, observe that 𝜔¯𝑙,1 = 𝑇 [8] E. Visotsky and U. Madhow, “Space-time transmit precoding with ...=¯𝜔𝑙,𝑛𝑇 for all 𝑙.SinceΩ = 𝝀𝑅𝝀𝑇 , this implies that 𝝀𝑅 imperfect feedback,” IEEE Trans. Inf. Theory, vol. 47, no. 6, pp. 2632– 𝑇 and [¯𝜔1,𝑗,...,𝜔¯𝑛 ,𝑗] are identical (up to a scaling factor) and 2639, 2001. 𝑅 [9] M. Ivrlac, W. Utschick, and J. Nossek, “Fading correlations in wireless it is straightforward to show that the function is also Schur- MIMO communication systems,” IEEE J. Sel. Areas Commun., vol. 21, convex with respect to 𝝀𝑅. In the same way, we conclude that no. 5, pp. 819–828, 2003. BJORNSON¨ et al.: IMPACT OF SPATIAL CORRELATION AND PRECODING DESIGN IN OSTBC MIMO SYSTEMS 3589

[10] E. Jorswieck and H. Boche, “Optimal transmission strategies and impact [34] A. Tulino, A. Lozano, and S. Verd´u, “Impact of antenna correlation on of correlation in multiantenna systems with different types of channel the capacity of multiantenna channels,” IEEE Trans. Inf. Theory, vol. 51, state information,” IEEE Trans. Signal Process., vol. 52, no. 12, pp. no. 7, pp. 2491–2509, 2005. 3440–3453, 2004. [35] S. Loyka, “Channel capacity of MIMO architecture using the exponen- [11] E. Bj¨ornson, D. Hammarwall, and B. Ottersten, “Exploiting quantized tial correlation matrix,” IEEE Commun. Lett., vol. 5, no. 9, pp. 369–371, channel norm feedback through conditional statistics in arbitrarily 2001. correlated MIMO systems,” IEEE Trans. Signal Process., vol. 57, no. 10, [36] W. Rudin, Principles of Mathematical Analysis. McGraw-Hill, 1976. pp. 4027–4041, 2009. [37] F. Zhang, The Schur Complement and its Applications. Springer, 2005. [12] A. Zanella, M. Chiani, and M. Win, “Performance of MIMO MRC in correlated Rayleigh fading environments,” in Proc. IEEE VTC’05- Emil Bjornson¨ (S’07) was born in Malm¨o, Sweden, Spring, 2005, pp. 1633–1637. in 1983. He received the M.S. degree in engineering mathematics from Lund University, Lund, Sweden, [13] M. McKay, A. Grant, and I. Collings, “Performance analysis of MIMO- in 2007. He is currently working towards the Ph.D. MRC in double-correlated Rayleigh environments,” IEEE Trans. Com- degree in telecommunications at the Signal Process- mun., vol. 55, no. 3, pp. 497–507, 2007. ing Laboratory, KTH Royal Institute of Technology, [14] S. Alamouti, “A simple transmit diversity technique for wireless commu- Stockholm, Sweden. His research interests include nications,” IEEE J. Sel. Areas Commun., vol. 16, no. 8, pp. 1451–1458, wireless communications, resource allocation, es- 1998. timation theory, stochastic signal processing, and [15] V. Tarokh, H. Jafarkhani, and A. Calderbank, “Space-time block codes mathematical optimization. For his work on MIMO from orthogonal designs,” IEEE Trans. Inf. Theory, vol. 45, no. 5, pp. communications, he received a Best Paper Award at 1456–1467, 1999. the 2009 International Conference on Wireless Communications and Signal [16] G. J¨ongren, M. Skoglund, and B. Ottersten, “Combining beamforming Processing (WCSP 2009). and orthogonal space-time block coding,” IEEE Trans. Inf. Theory, vol. 48, no. 3, pp. 611–627, 2002. Eduard A. Jorswieck (S’01-M’05-SM’08) re- [17] X.-B. Liang, “Orthogonal designs with maximal rates,” IEEE Trans. Inf. ceived his Diplom-Ingenieur degree and Doktor- Theory, vol. 49, no. 10, pp. 2468–2503, 2003. Ingenieur (Ph.D.) degree, both in electrical engi- [18] K. Lu, S. Fu, and X.-G. Xia, “Closed-form designs of complex orthog- neering and computer science from the Berlin Uni- onal space-time block codes of rates (𝑘 +1)/(2𝑘) for 2𝑘 − 1 or 2𝑘 versity of Technology (TUB), Germany, in 2000 transmit antennas,” IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4340– and 2004, respectively. He was with the Fraunhofer 4347, 2005. Institute for Telecommunications, Heinrich-Hertz- [19] W. Weichselberger, M. Herdin, H. Ozcelik,¨ and E. Bonek, “A stochastic Institute (HHI) Berlin, from 2001 to 2006. In 2006, MIMO channel model with joint correlation of both link ends,” IEEE he joined the Signal Processing Department at the Trans. Wireless Commun., vol. 5, no. 1, pp. 90–100, 2006. KTH Royal Institute of Technology as a post-doc [20] X. Gao, B. Jiang, X. Li, A. Gershman, and M. McKay, “Statistical and became a Assistant Professor in 2007. eigenmode transmission over jointly-correlated MIMO channels,” IEEE Since February 2008, he has been the head of the Chair of Communications Trans. Inf. Theory, vol. 55, no. 8, pp. 3735–3750, 2009. Theory and Full Professor at Dresden University of Technology (TUD), [21] E. Bj¨ornson, B. Ottersten, and E. Jorswieck, “On the impact of spatial Germany. His research interests are within the areas of applied information correlation and precoder design in MIMO systems with space-time block theory, signal processing and wireless communications. He is senior member coding,” in Proc. IEEE ICASSP’09, 2009, pp. 2741–2744. of IEEE and elected member of the IEEE SPCOM Technical Committee. From [22] Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Chan- 2008-2011 he serves as an Associate Editor for IEEE SIGNAL PROCESSING nels and Modulation (Release 8), 3GPP TS 36.211, Dec. 2008. LETTERS. In 2006, he was co-recipient of the IEEE Signal Processing Society [23] IEEE Std 802.11n-2009; Amendment 5: Enhancements for Higher Best Paper Award. Throughput, IEEE-SA Standards Board, Oct. 2009. [24] E. Larsson and P. Stoica, Space-Time Block Coding for Wireless Com- Bjorn¨ Ottersten (S’87-M’89-SM’99-F’04) was munications. Cambridge University Press, 2003. born in Stockholm, Sweden, in 1961. He received the M.S. degree in electrical engineering and ap- [25] S. Sandhu and A. Paulraj, “Space-time block codes: a capacity perspec- plied physics from Link¨oping University, Link¨oping, tive,” IEEE Commun. Lett., vol. 4, no. 12, pp. 384–386, 2000. Sweden, in 1986 and the Ph.D. degree in electrical [26] A. Hjørungnes and D. Gesbert, “Precoding of orthogonal space-time engineering from Stanford University, Stanford, CA, block codes in arbitrarily correlated MIMO channels: iterative and in 1989. closed-form solutions,” IEEE Trans. Wireless Commun., vol. 6, no. 3, He has held research positions at the Department pp. 1072–1082, 2007. of Electrical Engineering, Link¨oping University; the [27] R. Ertel, P. Cardieri, K. Sowerby, T. Rappaport, and J. Reed, “Overview Information Systems Laboratory, Stanford Univer- of spatial channel models for antenna array communication systems,” sity; and the Katholieke Universiteit Leuven, Leu- IEEE Personal Commun. Mag., vol. 5, no. 1, pp. 10–22, 1998. ven, Belgium. During 19961997, he was Director of Research at Array- [28] A. Marshall and I. Olkin, Inequalities: Theory of Majorization and its Comm Inc., San Jose, CA, a start-up company based on Otterstens patented Applications. Academic Press, 1979. technology. In 1991, he was appointed Professor of Signal Processing at [29] E. Jorswieck and H. Boche, “Majorization and matrix-monotone func- the KTH Royal Institute of Technology, Stockholm, Sweden. From 2004 to tions in wireless communications,” Foundations and Trends in Commun. 2008, he was Dean of the School of Electrical Engineering at KTH, and and Inf. Theory, vol. 3, no. 6, pp. 553–701, 2007. from 1992 to 2004 he was head of the Department for Signals, Sensors, and [30] M. Simon and M. Alouini, “A unified approach to the performance Systems at KTH. He is also Director of security and trust at the University analysis of digital communication over generalized fading channels,” of Luxembourg. His research interests include wireless communications, Proc. IEEE, vol. 86, no. 9, pp. 1860–1877, 1998. stochastic signal processing, sensor array processing, and time-series analysis. [31] A. Lozano and N. Jindal, “Transmit diversity vs. spatial multiplexing in Dr. Ottersten has coauthored papers that received an IEEE Signal Process- modern MIMO systems,” IEEE Trans. Wireless Commun., vol. 9, no. 1, ing Society Best Paper Award in 1993, 2001, and 2006. He has served as pp. 186–197, 2010. Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING [32] K. Phan, S. Vorobyov, and C. Tellambura, “Precoder design for space- and on the Editorial Board of the IEEE Signal Processing Magazine.He time coded MIMO systems with correlated Rayleigh fading channels,” is currently Editor-in-Chief of the EURASIP Signal Processing Journal and in Proc. CCECE’07, 2007. a member of the Editorial Board of the EURASIP Journal of Advances in [33] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Signal Processing. He is a Fellow of IEEE and EURASIP. He is one of the University Press, 2004. first recipients of the European Research Council advanced research grant.