
Random Matrices, Integrals and Space-time Systems

Babak Hassibi, California Institute of Technology
DIMACS Workshop on Algebraic Coding and Information Theory, Dec 15-18, 2003

Outline

• Overview of multi-antenna systems
• Random matrices
• Rotational-invariance
• Eigendistributions
• Some important integrals
• Applications
• Open problems

Introduction

We will be interested in multi-antenna systems of the form:

$$X = \sqrt{\frac{\rho}{M}}\, S H + V,$$

where X ∈ C^{T×N}, S ∈ C^{T×M}, H ∈ C^{M×N}, V ∈ C^{T×N} are the receive, transmit, channel, and noise matrices, respectively. Moreover, M and N are the numbers of transmit and receive antennas, respectively, T is the coherence interval, and ρ is the SNR.

The entries of V are iid CN(0,1) and the entries of H are also CN(0,1), but they may be correlated.
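As a quick illustration (a minimal sketch, not from the talk): the following numpy snippet draws one realization of X = sqrt(ρ/M)·SH + V, assuming iid CN(0,1) entries for H and V as above, an iid Gaussian codeword for S, and illustrative values of T, M, N, and ρ.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """iid CN(0,1) entries: zero-mean, unit-variance circular complex Gaussians."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

T, M, N = 8, 4, 4            # coherence interval, transmit / receive antennas (illustrative)
rho = 10.0                   # SNR (illustrative)

S = crandn(T, M)             # transmit matrix (one Gaussian codeword, for illustration)
H = crandn(M, N)             # channel matrix, iid CN(0,1) (uncorrelated case)
V = crandn(T, N)             # noise matrix, iid CN(0,1)
X = np.sqrt(rho / M) * S @ H + V
print(X.shape)               # (8, 4)
```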

Some Questions

We will be interested in two cases: the coherent case, where H is known to the receiver, and the non-coherent case, where H is unknown to the receiver.

The following questions are natural to ask.

• What is the capacity?
• What are the capacity-achieving input distributions?
• For specific input distributions, what is the mutual information and/or the cut-off rate?
• What are the (pairwise) probabilities of error?

Random Matrices

An m × n random matrix A is simply described by the joint pdf of its entries,

$$p(A) = p(a_{ij};\ i = 1,\dots,m;\ j = 1,\dots,n).$$

An example is the family of Gaussian random matrices, where the entries are jointly Gaussian.

Rotational-Invariance

An important class of random matrices are the (left- and right-) rotationally-invariant ones, whose pdf is invariant to (pre- and post-) multiplication by any m × m and n × n unitary matrices Θ and Ψ:

$$p(\Theta A) = p(A)\quad\text{for all }\Theta\text{ with }\Theta\Theta^* = \Theta^*\Theta = I_m,$$

$$p(A\Psi) = p(A)\quad\text{for all }\Psi\text{ with }\Psi\Psi^* = \Psi^*\Psi = I_n.$$

If a random matrix is both left- and right-rotationally-invariant, we will simply call it isotropically-random (i.r.). If G is a random matrix with iid Gaussian entries, then it is i.r., as are all of the matrices:

$$GG^*,\quad G + G^*,\quad G^p,\quad G^{-1},\quad G_1 G_2,\quad G_1 G_2^{-1},\quad G_1 (G_2 G_2^*)^{-1} G_1^*,\quad G_1 A G_1^*.$$

Isotropically-Random Unitary Matrices

A random unitary matrix is one for which the pdf is of the form

$$p(\Theta) = f(\Theta)\,\delta(\Theta\Theta^* - I_m).$$

When the unitary matrix is i.r., it is not hard to show that

$$p(\Theta) = \frac{\Gamma(m)\cdots\Gamma(1)}{\pi^{m(m+1)/2}}\,\delta(\Theta\Theta^* - I_m).$$

Therefore an i.r. unitary matrix has a uniform distribution over the space of unitary matrices. This distribution is also called the Haar measure.

A Fourier Representation

If we denote the columns of Θ by θ_k, k = 1, ..., m, then

$$\delta(\Theta^*\Theta - I_m) = \prod_{k\ge l}\delta\!\big(\mathrm{Re}(\theta_k^*\theta_l) - \delta_{kl}\big)\prod_{k> l}\delta\!\big(\mathrm{Im}(\theta_k^*\theta_l)\big).$$

Using the Fourier representation of the delta function,

$$\delta(x) = \frac{1}{2\pi}\int d\omega\, e^{j\omega x},$$

it follows that we can write

$$\delta(\Theta^*\Theta - I_m) = \frac{\Gamma(m)\cdots\Gamma(1)}{2^{m}\,\pi^{m(3m+1)/2}}\int_{\Omega=\Omega^*} d\Omega\; e^{j\,\mathrm{tr}\,\Omega(\Theta^*\Theta - I_m)}.$$

A Few Theorems

I.r. unitary matrices come up in many applications.

Theorem 1. Let A be an m × n i.r. random matrix and consider the SVD A = UΣV*. Then the following two equivalent statements hold:
1. U, Σ, V are independent random matrices and U and V are i.r. unitary.
2. The pdf of A depends only on Σ: p(A) = f(Σ).

Idea of Proof: ΘAΨ* and A have the same distribution for any unitary Θ and Ψ...

Theorem 2. Let A be an i.r. Hermitian matrix and consider the eigendecomposition A = UΛU*. Then the following two equivalent statements hold:
1. U, Λ are independent random matrices and U is i.r. unitary.
2. The pdf of A is independent of U: p(A) = f(Λ).

Theorem 3. Let A be a left rotationally-invariant random matrix and consider the QR decomposition A = QR. Then the matrices Q and R are independent and Q is i.r. unitary.

Some Jacobians

The decompositions A = UΛU* and A = QR can be considered as coordinate transformations. Their corresponding Jacobians can be computed to be:

$$dA = dU\,d\Lambda\;\frac{1}{m!}\prod_{k>l}(\lambda_k - \lambda_l)^2\,\delta(UU^* - I_m)$$

and

$$dA = dQ\,dR\;c\prod_{k=1}^{m} r_{kk}^{\,m-k}\,\delta(QQ^* - I_m)$$

for some constant c.

Note that both Jacobians are independent of U and Q.
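As a concrete illustration of Theorem 3 and of the Haar (uniform) distribution discussed above, here is a hedged numpy sketch (not from the talk): a complex Gaussian matrix G is left rotationally-invariant, so the Q factor of its QR decomposition is i.r. unitary once the phase ambiguity is fixed by making the diagonal of R real and positive.

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(m):
    """Draw an m x m isotropically-random (Haar-distributed) unitary matrix.

    G has iid CN(0,1) entries, hence is left rotationally-invariant; by Theorem 3
    the Q of its QR decomposition is then i.r. unitary, after fixing the phase
    convention (diagonal of R made real and positive)."""
    G = (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))) / np.sqrt(2)
    Q, R = np.linalg.qr(G)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))          # absorb the phases of diag(R) into Q's columns

Theta = haar_unitary(4)
print(np.allclose(Theta.conj().T @ Theta, np.eye(4)))   # True: Theta is unitary
```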

Eigendistributions

Thus for an i.r. Hermitian A with pdf p_A(·), we have

$$p(U,\Lambda) = p_A(\Lambda)\;\frac{1}{m!}\prod_{k>l}(\lambda_k - \lambda_l)^2\,\delta(UU^* - I_m).$$

Integrating out the eigenvectors yields:

Theorem 4. Let A be an i.r. Hermitian matrix with pdf p_A(·). Then

$$p(\Lambda) = \frac{\pi^{m(m-1)}}{\Gamma(m+1)\cdots\Gamma(1)}\;p_A(\Lambda)\prod_{k>l}(\lambda_k - \lambda_l)^2.$$

Note that $\prod_{k>l}(\lambda_k - \lambda_l)^2 = \det^2 V(\Lambda)$, the square of a Vandermonde determinant.
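A quick numerical sanity check of this identity (a sketch, not from the talk), using numpy's Vandermonde constructor:

```python
import numpy as np

lam = np.array([0.3, 1.1, 2.4, 3.7])               # some distinct eigenvalues
V = np.vander(lam, increasing=True)                # rows [1, λ_k, λ_k^2, λ_k^3]

prod_sq = np.prod([(lam[k] - lam[l]) ** 2 for k in range(4) for l in range(k)])
print(np.isclose(prod_sq, np.linalg.det(V) ** 2))  # True
```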

Some Examples

• Wishart matrices, A = G*G, where G is m × n, m ≥ n:

$$p(\Lambda) = c\prod_{k=1}^{n}\lambda_k^{m-n}e^{-\lambda_k}\,\det{}^2 V(\Lambda).$$

• Ratio of Wishart matrices, A = A₁A₂⁻¹:

$$p(\Lambda) = c\prod_{k=1}^{n}\Big(\frac{1}{1+\lambda_k}\Big)^{2n}\det{}^2 V(\Lambda).$$

• I.r. unitary matrix. The eigenvalues e^{jα_k} lie on the unit circle and the distribution of the phases is:

$$p(\alpha_1,\dots,\alpha_m) = c\prod_{k>l}\sin^2\!\Big(\frac{\alpha_k - \alpha_l}{2}\Big).$$

The Marginal Distribution

Note that all the previous eigendistributions were of the form:

$$p(\Lambda) = c\prod_{k=1}^{m} f(\lambda_k)\,\det{}^2 V(\Lambda).$$

For such pdfs the marginal can be computed using an elegant trick due to Wigner.

Define the Hankel matrix

$$F = \int d\lambda\, f(\lambda)\begin{bmatrix}1\\ \lambda\\ \vdots\\ \lambda^{m-1}\end{bmatrix}\begin{bmatrix}1 & \lambda & \cdots & \lambda^{m-1}\end{bmatrix}.$$

Note that F ≥ 0. Assume that F > 0. Then we can perform the Cholesky decomposition F = LL*, with L lower triangular. Note that L⁻¹FL⁻* = I_m implies that the polynomials

$$\begin{bmatrix}g_0(\lambda)\\ \vdots\\ g_{m-1}(\lambda)\end{bmatrix} = L^{-1}\begin{bmatrix}1\\ \vdots\\ \lambda^{m-1}\end{bmatrix}$$

are orthonormal with respect to the weighting function f(·):

$$\int d\lambda\, f(\lambda)\,g_k(\lambda)\,g_l(\lambda) = \delta_{kl}.$$

Now the marginal distribution of one eigenvalue is given by

$$p(\lambda_1 = \lambda) = c\int d\lambda_2\cdots d\lambda_m \prod_{k=1}^{m} f(\lambda_k)\,\det{}^2 V(\Lambda) = c\,\det{}^2 L\int d\lambda_2\cdots d\lambda_m \prod_{k=1}^{m} f(\lambda_k)\,\big(\det L^{-1}V(\Lambda)\big)^2.$$

But

$$L^{-1}V(\Lambda) = L^{-1}\begin{bmatrix}1 & \cdots & 1\\ \vdots & \ddots & \vdots\\ \lambda_1^{m-1} & \cdots & \lambda_m^{m-1}\end{bmatrix} = \begin{bmatrix}g_0(\lambda_1) & \cdots & g_0(\lambda_m)\\ \vdots & \ddots & \vdots\\ g_{m-1}(\lambda_1) & \cdots & g_{m-1}(\lambda_m)\end{bmatrix} = V_G(\Lambda).$$

Now, upon expanding out det² V_G(Λ) and integrating over the variables λ_2, ..., λ_m, the only terms that do not vanish are those for which the indices of the orthonormal polynomials coincide. Thus, after the smoke clears,

$$p(\lambda) = c\sum_{k=0}^{m-1} f(\lambda)\,g_k^2(\lambda).$$

In fact, we have the following result.

Theorem 5. Let A be an i.r. Hermitian matrix with p_A(A) = ∏_k f(λ_k). Then the marginal distribution of the eigenvalues of A is

$$p(\lambda) = \frac{1}{m}\sum_{k=0}^{m-1} f(\lambda)\,g_k^2(\lambda).$$

Orthogonal Polynomials

• What was just described is the connection between random matrices and orthogonal polynomials.
• For Wishart matrices, Laguerre polynomials arise. For ratios of Wishart matrices it is Jacobi polynomials, and for i.r. unitary matrices it is the complex exponentials (orthogonal on the unit circle).
• Theorem 5 gives a Christoffel-Darboux sum, and so

$$p(\lambda) = \frac{f(\lambda)}{m}\cdot\frac{a_{m-1}}{a_m}\big(g_m'(\lambda)\,g_{m-1}(\lambda) - g_{m-1}'(\lambda)\,g_m(\lambda)\big)$$

(where a_k denotes the leading coefficient of g_k).
• The above sum gives a uniform way to obtain the asymptotic distribution of the marginal pdf and to obtain results such as Wigner's semi-circle law. A numerical check of Theorem 5 for the Wishart case is sketched below.
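As a concrete check of Theorem 5 (a hedged sketch, not from the talk): for the Wishart example, f(λ) = λ^{m−n}e^{−λ} and the g_k are normalized associated Laguerre polynomials. The snippet below avoids explicit Laguerre polynomials by using the equivalent moment-matrix form p(λ) = (1/n)·f(λ)·φ(λ)ᵀF⁻¹φ(λ), with φ(λ) = [1, λ, ..., λ^{n−1}]ᵀ, and compares it against a histogram of eigenvalues of simulated n × n complex Wishart matrices G*G (here n plays the role of the matrix dimension m in Theorem 5).

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
m, n = 6, 3          # G is m x n, so A = G*G is n x n and has n eigenvalues

# Hankel moment matrix of the Wishart weight f(lam) = lam^(m-n) * exp(-lam):
#   F[i, j] = \int_0^inf lam^(i+j) f(lam) dlam = (m - n + i + j)!
F = np.array([[factorial(m - n + i + j) for j in range(n)] for i in range(n)], float)
Finv = np.linalg.inv(F)

def marginal(lam):
    """Theorem 5 marginal, written as (1/n) f(lam) phi(lam)^T F^{-1} phi(lam)."""
    phi = lam[None, :] ** np.arange(n)[:, None]          # phi[i, k] = lam_k ** i
    quad = np.einsum('ik,ij,jk->k', phi, Finv, phi)
    return lam ** (m - n) * np.exp(-lam) * quad / n

# Monte Carlo: eigenvalues of simulated complex Wishart matrices G*G
eigs = []
for _ in range(5000):
    G = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
    eigs.extend(np.linalg.eigvalsh(G.conj().T @ G))

bins = np.linspace(0.0, 18.0, 37)                        # 36 bins of width 0.5
hist, _ = np.histogram(eigs, bins=bins, density=True)
centers = 0.5 * (bins[:-1] + bins[1:])
# Empirical histogram vs. the Theorem 5 density: agreement up to Monte Carlo
# and binning error (roughly 1e-2 here).
print(np.max(np.abs(hist - marginal(centers))))
```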

Remark

The attentive audience will have discerned that my choice of the Cholesky factorization of F, and the resulting orthogonal polynomials, was rather arbitrary.

It is possible to find the marginal distribution without resorting to orthogonal polynomials. The result is given below:

$$p(\lambda) = \frac{1}{m}\,f(\lambda)\begin{bmatrix}1 & \lambda & \cdots & \lambda^{m-1}\end{bmatrix}F^{-1}\begin{bmatrix}1\\ \lambda\\ \vdots\\ \lambda^{m-1}\end{bmatrix}.$$

Coherent Channels

Let us now return to the multi-antenna model

$$X = \sqrt{\tfrac{\rho}{M}}\,SH + V,$$

where we will assume that the channel H is known. We will assume that H = D_t G D_r, where D_t and D_r are the correlation matrices at the transmitter and receiver, and G has iid CN(0,1) entries. Note that D_t and D_r can be assumed diagonal without loss of generality.

According to Foschini and Telatar:

1. When D_t = I_M, D_r = I_N:

$$C = E\log\det\Big(I_N + \frac{\rho}{M}G^*G\Big) = E\sum \log\Big(1 + \frac{\rho}{M}\lambda(G^*G)\Big)$$

(a Monte Carlo sketch of this expectation appears after case 4 below).

2. When D_t = I_M:

$$C = E\log\det\Big(I_M + \frac{\rho}{M}G D_r^2 G^*\Big) = E\sum \log\Big(1 + \frac{\rho}{M}\lambda(G D_r^2 G^*)\Big)$$

3. When D_r = I_N:

$$C = \max_{\mathrm{tr}(P)=M} E\log\det\Big(I_N + \frac{\rho}{M}G^* D_t P D_t G\Big) = \max_{\mathrm{tr}(P)=M} E\sum \log\Big(1 + \frac{\rho}{M}\lambda(G^* D_t P D_t G)\Big)$$

4. In the general case:

$$C = \max_{\mathrm{tr}(P)=M} E\sum \log\Big(1 + \frac{\rho}{M}\lambda(D_r G^* D_t P D_t G D_r)\Big)$$
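For case 1, a hedged Monte Carlo sketch (not from the talk) that estimates C = E log det(I_N + (ρ/M)G*G) directly, with illustrative M, N, and ρ:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 4              # transmit / receive antennas (illustrative)
rho = 10.0               # SNR (illustrative)
trials = 20000

total = 0.0
for _ in range(trials):
    G = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    # logdet via slogdet for numerical stability; the argument is Hermitian positive definite
    _, logdet = np.linalg.slogdet(np.eye(N) + (rho / M) * G.conj().T @ G)
    total += logdet

print(total / trials / np.log(2), "bits per channel use (case 1 ergodic capacity)")
```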

Cases 1-3 are readily dealt with using the techniques developed so far, since the matrices are rotationally-invariant.

Therefore we will do something more interesting and compute the characteristic function (not just the mean). This requires more machinery (as does Case 4), which we now develop.

A Useful Integral Formula

Using a generalization of the technique used to prove Theorem 5, we can show the following result.

Theorem 6. Let functions f(λ), g_k(λ), h_k(λ), k = 0, ..., m−1, be given and define the matrices

$$V_G(\Lambda) = \begin{bmatrix} g_0(\lambda_1) & \cdots & g_0(\lambda_m)\\ \vdots & \ddots & \vdots\\ g_{m-1}(\lambda_1) & \cdots & g_{m-1}(\lambda_m)\end{bmatrix},\qquad V_H(\Lambda) = \begin{bmatrix} h_0(\lambda_1) & \cdots & h_0(\lambda_m)\\ \vdots & \ddots & \vdots\\ h_{m-1}(\lambda_1) & \cdots & h_{m-1}(\lambda_m)\end{bmatrix}.$$

Then

$$\int d\Lambda \prod_{k=1}^{m} f(\lambda_k)\,\det V_G(\Lambda)\,\det V_H(\Lambda) = m!\,\det F,$$

where

$$F = \int d\lambda\, f(\lambda)\begin{bmatrix} g_0(\lambda)\\ \vdots\\ g_{m-1}(\lambda)\end{bmatrix}\begin{bmatrix} h_0(\lambda) & \cdots & h_{m-1}(\lambda)\end{bmatrix}.$$

Theorem 6 was apparently first shown by Andreief in 1883.
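A quick Monte Carlo sanity check of Theorem 6 (a sketch, not from the talk), taking m = 3, f ≡ 1 on [0, 1], g_k(λ) = λ^k, and h_k(λ) = λ^{2k}:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
m = 3

# Left-hand side: Monte Carlo over [0,1]^m (so f = 1) of det V_G * det V_H,
# with g_k(lam) = lam^k and h_k(lam) = lam^(2k).
lam = rng.random((200000, m))
Vg = lam[:, None, :] ** np.arange(m)[None, :, None]          # Vg[s, k, j] = lam_j^k
Vh = lam[:, None, :] ** (2 * np.arange(m))[None, :, None]    # Vh[s, k, j] = lam_j^(2k)
lhs = np.mean(np.linalg.det(Vg) * np.linalg.det(Vh))

# Right-hand side: m! det F, with F[i, j] = \int_0^1 lam^i lam^(2j) dlam = 1/(i + 2j + 1)
F = np.array([[1.0 / (i + 2 * j + 1) for j in range(m)] for i in range(m)])
rhs = factorial(m) * np.linalg.det(F)

print(lhs, rhs)    # the two numbers agree up to Monte Carlo error
```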

A useful generalization has been noted in Chiani, Win and Zanella (2003).

Theorem 7. Let functions f_k(λ), g_k(λ), h_k(λ), k = 0, ..., m−1, be given. Then

$$\int d\Lambda \prod_{k=1}^{m} f_k(\lambda_k)\,\det V_G(\Lambda)\,\det V_H(\Lambda) = \mathrm{Tensor}\Big(\int f_i(\lambda)\,g_j(\lambda)\,h_k(\lambda)\,d\lambda\Big),$$

where for the tensor A_{ijk} we have defined

$$\mathrm{Tensor}(A) = \sum_{\mu}\sum_{\alpha}\mathrm{sgn}(\mu)\,\mathrm{sgn}(\alpha)\prod_{k=1}^{m} A_{k\,\mu_k\,\alpha_k},$$

and the sums are over all possible permutations μ, α of the integers 1 to m.

An Exponential Integral

Theorem 8 (Itzykson and Zuber, 1990). Let A and B be m-dimensional diagonal matrices. Then

$$\int d\Theta\; e^{\mathrm{tr}\,\Theta A\Theta^* B}\,\delta(\Theta\Theta^* - I_m) = \Gamma(m+1)\cdots\Gamma(1)\;\frac{\det E(A,B)}{\det V(A)\,\det V(B)},$$

where

$$E(A,B) = \begin{bmatrix} e^{a_1 b_1} & \cdots & e^{a_1 b_m}\\ \vdots & \ddots & \vdots\\ e^{a_m b_1} & \cdots & e^{a_m b_m}\end{bmatrix}.$$

Idea of Proof: Use induction. Start by partitioning

$$\Theta = \begin{bmatrix}\Theta_1 & \vartheta\end{bmatrix},\qquad A = \begin{bmatrix}A_1 & \\ & a_m\end{bmatrix}.$$

Then rewrite

$$\mathrm{tr}(\Theta A\Theta^* B) = \mathrm{tr}\big(\Theta_1 (A_1 - a_m I_{m-1})\Theta_1^* B\big) + a_m\,\mathrm{tr}\,B,$$

so that the desired integral becomes

$$\int d\Theta\; e^{\mathrm{tr}\,\Theta A\Theta^* B}\,\delta(\Theta\Theta^* - I_m) = e^{a_m \mathrm{tr}B}\int d\Theta_1\; e^{\mathrm{tr}\,\Theta_1 A'\Theta_1^* B}\,\delta(\Theta_1^*\Theta_1 - I_{m-1}),$$

where A' = A_1 − a_m I_{m−1}.

$$= c\,e^{a_m \mathrm{tr}B}\int d\Theta_1\, d\Omega\; e^{\mathrm{tr}\,\Theta_1 A'\Theta_1^* B + j\,\mathrm{tr}\,\Omega(\Theta_1^*\Theta_1 - I_{m-1})}$$

$$= c\,e^{a_m \mathrm{tr}B}\int d\Omega\;\frac{e^{-j\,\mathrm{tr}\,\Omega}}{\prod_{k=1}^{m}\det(b_k A' + j\Omega)}$$

$$= c\,e^{a_m \mathrm{tr}B}\int dW\;\frac{e^{-j\,\mathrm{tr}\,A'W}}{\prod_{k=1}^{m}\det(b_k I_{m-1} + jW)}$$

$$= c\,e^{a_m \mathrm{tr}B}\int dU\,d\Lambda\;\frac{e^{-j\,\mathrm{tr}\,A'U\Lambda U^*}}{\prod_{l=1}^{m-1}\prod_{k=1}^{m}(b_k + j\lambda_l)}\,\det{}^2 V(\Lambda)\,\delta(UU^* - I_{m-1}).$$

The last integral is over an (m−1)-dimensional i.r. matrix, so if we use the integral formula (at the lower dimension) to do the integral over U, we get

$$= c\,\frac{e^{a_m \mathrm{tr}B}}{\det V(A')}\int d\Lambda\;\prod_{l=1}^{m-1}\frac{1}{\prod_{k=1}^{m}(b_k + j\lambda_l)}\;\det E(A',\Lambda)\,\det V(\Lambda).$$

An application of Theorem 6 now gives the result.
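As a numerical sanity check of Theorem 8 (a hedged sketch, not from the talk), the snippet below verifies the normalized-Haar form of the Itzykson-Zuber integral, E_U[e^{tr(AUBU*)}] = 1!·2!···(m−1)! · det E(A,B) / (det V(A) det V(B)), by averaging over Haar-distributed unitaries drawn with the QR construction shown earlier. The δ-function normalization used in Theorem 8 differs from this only through the volume constant of the unitary group given on the i.r. unitary slide.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
m = 3
a = np.array([0.3, 0.7, 1.1])      # diagonal of A (illustrative)
b = np.array([0.2, 0.5, 0.9])      # diagonal of B (illustrative)
A, B = np.diag(a), np.diag(b)

def haar_unitary(n):
    """Haar (i.r.) unitary via QR of a complex Gaussian matrix, phases fixed."""
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(G)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

# Monte Carlo estimate of E_U[exp(tr(A U B U*))] over the Haar measure
vals = []
for _ in range(100000):
    U = haar_unitary(m)
    vals.append(np.exp(np.trace(A @ U @ B @ U.conj().T).real))
mc = np.mean(vals)

# Itzykson-Zuber formula, normalized-Haar form: (prod_{p<m} p!) det E / (det V(a) det V(b))
E = np.exp(np.outer(a, b))                                   # E[i, j] = exp(a_i * b_j)
const = np.prod([factorial(p) for p in range(1, m)])
iz = const * np.linalg.det(E) / (np.linalg.det(np.vander(a, increasing=True))
                                 * np.linalg.det(np.vander(b, increasing=True)))

print(mc, iz)    # agree up to Monte Carlo error
```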

Characteristic Function

Consider

$$C = E\log\det\Big(I_M + \frac{\rho}{M}GDG^*\Big) = E\log\det\Big(I_N + \frac{\rho}{M}DG^*G\Big).$$

The characteristic function is (assuming M = N)

$$E\,e^{\,j\omega\log\det\left(I_N + \frac{\rho}{M}DG^*G\right)} = E\det\Big(I_N + \frac{\rho}{M}DG^*G\Big)^{j\omega}$$

$$= c\int dW\,\det\Big(I_N + \frac{\rho}{M}DW\Big)^{j\omega} e^{-\mathrm{tr}\,W}$$

$$= \frac{c}{\det^{M}(D)}\int dW\,\det\Big(I_N + \frac{\rho}{M}W\Big)^{j\omega} e^{-\mathrm{tr}\,WD^{-1}}$$

$$= \frac{c}{\det^{M}(D)}\int dU\,d\Lambda\;\prod_{k=1}^{M}\Big(1 + \frac{\rho}{M}\lambda_k\Big)^{j\omega} e^{-\mathrm{tr}\,U\Lambda U^* D^{-1}}\,\det{}^2 V(\Lambda)\,\delta(UU^* - I_M).$$

Successive use of Theorems 6 and 8 gives the result.

Non-coherent Channels

Let us now consider the non-coherent channel

$$X = \sqrt{\tfrac{\rho}{M}}\,SH + V,$$

where H is unknown and has iid CN(0,1) entries.

Theorem 9 (Hochwald and Marzetta, 1998). The capacity-achieving distribution is given by S = UD, where U is T × M i.r. unitary and D is an independent diagonal matrix.

Idea of Proof: Write S = UDV*. V* can be absorbed into H and so is not needed. The optimal S is left rotationally-invariant.

Mutual Information

Determining the optimal distribution on D is an open problem. However, given D, one can compute all quantities of interest. The starting point is

$$p(X\mid U,D) = \frac{e^{-\mathrm{tr}\,X^*\left(I_T + \frac{\rho}{M}UD^2U^*\right)^{-1}X}}{\pi^{TN}\det^{N}\!\left(I_T + \frac{\rho}{M}UD^2U^*\right)} = \frac{e^{-\mathrm{tr}\,X^*X + \mathrm{tr}\,X^*U\left(I_M + \frac{M}{\rho}D^{-2}\right)^{-1}U^*X}}{\pi^{TN}\det^{N}\!\left(I_M + \frac{\rho}{M}D^2\right)}.$$

The expectation over U is now readily doable to give p(X|D). (This is a little tricky since U is not square, but it can be done using the Fourier representation of delta functions and Theorems 6 and 8.)
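A hedged numerical check of the two equivalent expressions above (a sketch, not from the talk; the equality rests on the matrix inversion lemma together with det(I_T + UCU*) = det(I_M + C) for U with orthonormal columns):

```python
import numpy as np

rng = np.random.default_rng(0)
T, M, N = 8, 3, 4        # illustrative dimensions
rho = 10.0               # illustrative SNR

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

U, _ = np.linalg.qr(crandn(T, M))          # T x M with orthonormal columns (U*U = I_M)
D = np.diag(rng.uniform(0.5, 2.0, M))      # an illustrative diagonal D
X = crandn(T, N)                           # an arbitrary received matrix

def logp_direct(X, U, D):
    """log p(X | U, D), first form: uses the full T x T covariance I_T + (rho/M) U D^2 U*."""
    R = np.eye(T) + (rho / M) * U @ D**2 @ U.conj().T
    quad = np.trace(X.conj().T @ np.linalg.solve(R, X)).real
    _, logdet = np.linalg.slogdet(R)
    return -quad - N * logdet - T * N * np.log(np.pi)

def logp_lemma(X, U, D):
    """log p(X | U, D), second form: after the matrix inversion lemma (only M x M inverses)."""
    S = np.linalg.inv(np.eye(M) + (M / rho) * np.linalg.inv(D**2))
    quad = (-np.trace(X.conj().T @ X) + np.trace(X.conj().T @ U @ S @ U.conj().T @ X)).real
    _, logdet = np.linalg.slogdet(np.eye(M) + (rho / M) * D**2)
    return quad - N * logdet - T * N * np.log(np.pi)

print(np.isclose(logp_direct(X, U, D), logp_lemma(X, U, D)))   # True
```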

Other Problems

• The mutual information for almost any input distribution on D can be computed.
• Cut-off rates for coherent and non-coherent channels can be computed for many input distributions (Gaussian, i.r. unitary, etc.).
• The characteristic function of the coherent channel capacity in the general case can be computed.
• The sum-rate capacity of the MIMO broadcast channel in some special cases can be computed.
• The diversity of distributed space-time coding in wireless networks can be determined.

Other Work and Open Problems

• I did not touch at all upon asymptotic analysis using the Stieltjes transform.
• Open problems include determining the optimal input distribution for the non-coherent channel and finding the optimal power allocation for coherent channels when there is correlation among the transmit antennas.