Precoding and Beamforming for Multi-Input Multi-Output Downlink Channels

by

Roya Doostnejad

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy The Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto

© Copyright by Roya Doostnejad, 2005

Precoding and Beamforming for Multi-Input Multi-Output Downlink Channels

Roya Doostnejad

Doctor of Philosophy, 2005

The Edward S. Rogers Sr. Department of Electrical and Computer Engineering

University of Toronto

Abstract

This dissertation presents precoding and beamforming schemes for multi-user wireless downlink channels when multiple antennas are employed at both the transmitter and the receivers. In the first part of the thesis, we discuss transmitter processing without channel information, which is applicable in both flat and frequency-selective channels (the latter when orthogonal frequency-division multiplexing (OFDM) is applied). This leads to methods for designing signature matrices for transmitters that use any combination of the spatial, temporal and frequency dimensions, with good performance provided by low-complexity receivers. In the rest of the thesis, we pose the problem when the channels between the base station and each user are known perfectly at the base station.

A non-linear precoding scheme is designed to minimize the mean-squared error between the transmitted and received data with a per-user power constraint. We also develop methods that are able to provide user-specific signal-to-interference-noise ratios (SINRs) with minimal total transmit power, through the extension of a so-called uplink-downlink duality result. Our study indicates that channel knowledge at the transmitter leads to substantial reductions in required power for providing given levels of SINRs to users.

Acknowledgements

I would like to express my sincere gratitude to my supervisors, Prof. Teng Joon Lim and Prof. Elvino S. Sousa, for their guidance, advice, and continued support throughout my thesis research. Prof. Lim has provided the key technical insights and contributed tireless editorial effort which has vastly improved the quality of this dissertation. Prof. Sousa has provided me with gentle encouragement and a far-reaching vision of the work. I wish to thank my entire committee: Prof. Frank Kschischang, Prof. Ravi Adve, Prof. Dimitris Hatzinakos, Prof. Bruce A. Francis, and Prof. Murat Uysal of the University of Waterloo for their effort, discussions and constructive comments. In particular, I would like to thank Prof. Kschischang for his invaluable input and constant encouragement throughout the course of this research. I also acknowledge the administrative support of Diane B. Silva during these years. I am appreciative of my colleagues in the communication group as well as my friends in Toronto who made this period of my life most enjoyable and beneficial. The financial support of the University of Toronto and Ontario Graduate Scholarships in Science and Technology (OGSST) is also greatly appreciated. I would like to extend my appreciation to the professors from whom I learned a great deal in earlier stages of my studies at Isfahan University of Technology, in particular, Dr. H. Alavi, Dr. A. Doosthoseini, Dr. S. Sadri and Dr. V. Tahani. It is impossible to express the debt that I owe to my late parents: my father, who shaped the first stages of my education and has always been a role model for me, and my mother, without whose intense care and compassionate support I would never have been able to come this far. I would also like to thank my siblings, Rezvan, Mehdi and Ahmad, and my in-laws Akbar Abdollahi and Shahla Dardashti, who have always been a source of encouragement and drive behind my achievements.
Finally, my most special thanks go to my husband, Kambiz Bayat, for infinite love, support, patience and devotion, and to my little one for inspiration at the end of this journey.

To my husband, Kambiz

and

In memory of my parents.

Contents

Abstract
Acknowledgements
List of Tables
List of Figures

1 Introduction
1.1 Multipath Fading Channels
1.2 Space-Time Coding
1.2.1 System Model
1.2.2 Design Criteria
1.2.3 Space-time Coding Schemes
1.2.4 Space-Time Coding in a Multiuser System
1.3 Precoding
1.3.1 MIMO Single-user Systems
1.3.2 MIMO Multiuser Systems
1.4 Overview of the Thesis
1.5 Notations

2 Space-Time Multiplexing for MIMO Multiuser Downlink Channels
2.1 System Model
2.2 Transmitted Signal Design
2.2.1 Assumptions and Goals
2.2.2 Spreading Matrix Design
2.2.3 Constellation and Power Allocation
2.3 Receiver Structures
2.3.1 Joint ML Detection
2.3.2 Multi-Stage Successive Interference Cancellation
2.4 Comparison With Other STC-CDMA Transceivers
2.5 Simulations
2.6 Summary

3 Precoding and Beamforming for MIMO Downlink Channels with Per-User Power Constraints
3.1 Problem Formulation
3.1.1 Signal Model
3.1.2 Precoding
3.2 MMSE Beamforming/Precoding
3.2.1 Precoding Matrix Design
3.2.2 Optimum Receive Matrix
3.2.3 Optimum Transmit Matrix
3.2.4 Precoding Ordering
3.3 Space-Time Spreading
3.4 Simulation Results
3.5 Summary

4 Precoding and Beamforming for MIMO Downlink Channels to Minimize Total Transmit Power
4.1 Problem Formulation
4.1.1 Signal Model
4.1.2 Background
4.2 Joint Power Allocation and MMSE Beamforming Using Uplink/Downlink Duality
4.2.1 Uplink-Downlink Duality for MIMO channels
4.2.2 Proposed Algorithm
4.3 Space-Time Spreading
4.4 Multiple Symbol Transmission to each user
4.5 Simulation Results
4.6 Summary

5 Precoding and Beamforming for the Down-link in a MIMO/OFDM System
5.1 Single User MIMO/OFDM Systems
5.2 Signal Model
5.2.1 Transmit Signal
5.2.2 Received Signal
5.3 SFS Matrix Design with no Channel Information at the Transmitter
5.4 Comparison With MIMO Multi-Carrier CDMA Schemes
5.5 SFS Matrix Design with Perfect Channel Knowledge at the Transmitter
5.6 Simulation Results
5.7 Summary

6 Conclusion
6.1 Summary of Contributions
6.2 Future Work

A Spreading Matrix Design Examples
B Proof of Uplink-Downlink Duality in MIMO Multiuser Systems
C The Algorithms for Multiple Symbol Transmission to each user

Bibliography

List of Tables

2.1 Comparison between different STC schemes for the downlink in a MIMO multiuser channel
3.1 The algorithm for precoding and MMSE beamforming
4.1 The precoding/beamforming algorithm for MIMO-BC channels minimizing total transmit power
4.2 The error rate performance of TTPC versus PUPC algorithm for t = r = 4, K = 4, SINR = 10 dB
4.3 The error rate performance of TTPC versus PUPC algorithm for t = r = 4, G = 4, K = 16, SINR = 10 dB
C.1 The precoding/beamforming for multiple symbol transmission
C.2 The space-time precoding/beamforming for multiple symbol transmission

List of Figures

1.1 Multiple Access Channel
1.2 Broadcast Channel
1.3 Matrix DFE
2.1 Transmission system model
2.2 Structure of two-stage SIC
2.3 Performance of 2-D STSC for different receivers, t = r = 2, G = 2, U = 4
2.4 The effect of MAI (number of users) on the achieved diversity with MMSE for t = r = 4, G = 4
2.5 The impact of power allocation on the performance of 2-D STSC for t = r = 2, U = 8
2.6 The impact of power allocation on the performance of SIC for t = r = 4, G = 4, U = 16
2.7 Performance of 2-D STSC in correlated fading channels for t = r = 2
2.8 Performance comparison of various schemes for multiuser channel in the downlink for t = r = 2, G = 2
2.9 Performance of proposed 2D-STSC versus randomly generated ST spreading codes which do not have the zero average MAI property, and Hadamard codes which give zero average MAI but do not satisfy the full-diversity criterion
2.10 Performance comparison of the proposed space-time coding scheme and rotated constellation (TAST) in a single user system for t = r = 2, G = 2
3.1 Block diagram of the matrix DFE
3.2 Matrix form of the Tomlinson-Harashima precoder
3.3 The average Pe for different number of receive antennas and t = 4, K = 2, z = 2
3.4 The performance of space-time spreading for different number of receive antennas, t = 4, G = 4, K = 8, z = 2
3.5 Average Pe compared with Pe for each individual user, t = 2, r = 2, K = 2, z = 1
4.1 Uplink-downlink duality: these two multi-user channels have the same achievable SINR region for a given sum power constraint
4.2 Performance of the iterative linear beamforming and the proposed algorithm with MMSE and random initializations, for t = r = 4, K = 4, SINR = 10 dB
4.3 Total transmit power versus the required SINR for different number of transmit/receive antennas for K = 4
4.4 Transmit power per user versus the number of active users at the system for r = 4, SINR = 10 dB
4.5 Precoding/Beamforming over space and time for t = r = 4, K = 16
4.6 Precoding/Beamforming over space and time for t = r = 4, K = 8
5.1 OFDM/MIMO Block Diagram
5.2 Transceiver structure of MIMO MC-CDMA systems
5.3 Performance comparison of Space-Frequency Spreading methods for one, two and four tap equal power fading channels for t = r = G = 2
5.4 Performance of ML detection versus SIC with and without power control for two-tap channel for t = r = 2, L = 2, Nf = 8
5.5 The performance of space-frequency spreading compared with MMSE beamforming over flat fading channel t = 2, r = 2, Nf = 8, z × K = 16
5.6 Average Pe compared with Pe for individual users in space-frequency spreading, t = 2, r = 2, Nf = 8, K = 8, z = 2
5.7 Precoding/Beamforming over space and frequency for t = r = 2, K = 16, SINR = 10 dB

Chapter 1

Introduction

The use of antenna arrays at both the transmitter and the receiver has received significant attention as a promising method to provide diversity and/or multiplexing gain over wireless links. Multiple antennas create extra dimensions in the signal space which can be used in different ways. The receiver can be provided with replicas of the same data to increase the reliability of signal transmission, which results in spatial diversity gain. The spatial dimensions can also be used to carry independent data streams to increase the data rate, which results in spatial multiplexing gain. This collective improvement associated with spatial multiple-input multiple-output (MIMO) channels is based on the premise that in a wireless system with enough separation between antennas in an array, a rich scattering environment provides different channels between each transmit and receive antenna which are statistically uncorrelated to some extent.

MIMO techniques were first investigated in a point-to-point or single-user communication link. In a MIMO single-user system with t transmit and r receive antennas, a diversity order of tr can be provided for the system. Also, if the channel is perfectly known at the receiver, capacity scales linearly with min(t, r) relative to a system with just one transmit and one receive antenna. A MIMO system is thus able to provide improved power and bandwidth efficiencies, at the cost of setting up additional antennas. Space-time coding schemes have been designed for MIMO single-user systems to achieve diversity gain [1–3], or achieve high data rates by taking advantage of multiplexing gain of MIMO systems [4,5], or both [6,7].
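The linear scaling of capacity with min(t, r) can be checked numerically. The following is a minimal sketch only, assuming i.i.d. unit-variance Rayleigh fading and the ergodic capacity expression that appears later in this chapter as (1.17); the function names, the number of trials and the seed are illustrative choices and are not taken from the thesis.

```python
import math
import random


def complex_gaussian(rng):
    """One CN(0, 1) sample: real and imaginary parts each have variance 1/2."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def determinant(matrix):
    """Determinant of a small complex square matrix by Gaussian elimination."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0 + 0.0j
    for col in range(n):
        # Partial pivoting: use the largest entry of this column as the pivot.
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) == 0.0:
            return 0.0 + 0.0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det


def ergodic_capacity(t, r, rho, trials=2000, seed=0):
    """Monte Carlo estimate of E[log2 det(I_r + (rho/t) H H^H)], H of size r x t."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        h = [[complex_gaussian(rng) for _ in range(t)] for _ in range(r)]
        gram = [[(1.0 if i == j else 0.0)
                 + (rho / t) * sum(h[i][k] * h[j][k].conjugate() for k in range(t))
                 for j in range(r)] for i in range(r)]
        # The matrix is Hermitian positive definite, so its determinant is real.
        total += math.log2(determinant(gram).real)
    return total / trials


c11 = ergodic_capacity(1, 1, rho=10.0)   # single-antenna reference at 10 dB
c22 = ergodic_capacity(2, 2, rho=10.0)   # two transmit, two receive antennas
```

Under these assumptions `c22` comes out at roughly twice `c11`, which is the min(t, r) scaling mentioned above; the estimate is statistical and depends on the number of trials.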

Figure 1.1: Multiple Access Channel (diagram: Base Station; User 1, User 2, ..., User K)

In many applications ranging from wireless LAN to cellular telephony, multiuser communication is a reality. Researchers have therefore recently begun to investigate the impact and implications of using MIMO systems in multiuser environments. There are two basic multiuser MIMO channel models: the MIMO multiple-access channel (MAC) and the MIMO broadcast channel (BC). In the MIMO MAC, a number of users share a common communication channel to transmit their individual signals to a receiver. Such a system is shown in Figure 1.1. In the uplink of a mobile cellular communication system, the users are the mobile transmitters in any particular cell and the receiver is the base station of that cell. In the MIMO BC, a transmitter sends information to multiple receivers as shown in Figure 1.2. In the downlink of a mobile cellular communication system, the transmitter is the base station and the receivers are the mobile stations.

A key difference between single-user, MAC, and BC channels is that in the single-user channel there is full collaboration at both the transmitter and the receiver, while in the MAC channel there is collaboration only at the receiver, and in the BC channel collaboration exists only at the transmitter. Therefore, in the BC channel, joint processing between the receivers cannot be supported. For this reason, the design of the BC channel has proved to be more challenging [8–10].

Figure 1.2: Broadcast Channel (diagram: Base Station; User 1, User 2, ..., User K)

This thesis is primarily concerned with the design of the transmitter in a MIMO broadcast channel. Assuming no channel information at the transmitter, space-time spreading matrices are designed to maximize diversity gain and spectral efficiency. Assuming perfect channel knowledge at the transmitter, an algorithm based on MMSE beamforming combined with non-linear interference pre-subtraction is proposed which is applicable to a multiuser BC channel with any desired number of transmit/receive antennas.

This chapter provides the basis for the rest of the thesis. Multipath fading and different diversity schemes are explained in the next section. In Section 1.2, space-time coding schemes for single-user systems are briefly reviewed and their extension to multiuser systems is explained. In Section 1.3, precoding is introduced for both single-user and multiuser systems when the channel is known perfectly at the transmitter. The overview of the thesis is provided in Section 1.4, and the notations which are used throughout the thesis are given in Section 1.5.

1.1 Multipath Fading Channels

The physical characteristics of the wireless channel present a fundamental technical challenge for reliable communications. This is mainly because of the time-varying multipath nature of the channel. Multipath propagation is a result of the propagation of the signal over a number of different paths due to reflections of the signal by mountains, buildings, and other objects. Because of the time variations in the structure of the wireless channel, the nature of the multipath varies with time. This results in signal fading over time. The amplitude variations in the received signal are due to the destructive and constructive addition of multiple signal paths between receiver and transmitter. For a multipath fading channel, we define the time-variant impulse response of the channel as $h(t, \tau)$, which is the output of the channel at time $t$ to an impulse applied at time $t - \tau$. Since the channel time variations are not predictable, the time-variant multipath channel is modelled statistically. The most common statistical fading model is the Rayleigh fading model, in which the impulse response of the channel, $h(t, \tau)$, is assumed to be a complex random variable whose real and imaginary parts are zero-mean, statistically independent Gaussian random variables, each having variance $\sigma_\tau^2$. Therefore the magnitude of the channel at any instant $t$, $r = |h(t, \tau)|$, has a Rayleigh distribution

$$P(r) = \frac{r}{\sigma_\tau^2}\, e^{-r^2/(2\sigma_\tau^2)}, \quad r \ge 0. \tag{1.1}$$

The autocorrelation function of $h(t, \tau)$ is given by [11]

$$\phi_h(\Delta t; \tau_i, \tau_j) = \frac{1}{2} E\left[h^*(t, \tau_i)\, h(t + \Delta t, \tau_j)\right], \tag{1.2}$$

where $\Delta t$ is the observation time difference. Since in most radio transmissions the impulse responses of the channel for different paths are independent, if we let $\Delta t = 0$, then we have

$$\phi_h(\tau_i, \tau_j) = \phi_h(\tau_i)\,\delta(\tau_i - \tau_j). \tag{1.3}$$

In fact $\phi_h(\tau_i)$ represents the average channel output power as a function of the time delay $\tau_i$. The different paths have different time delays and different average powers. We call

$$\phi_h(\tau) = \frac{1}{2} E\left[h^*(t, \tau)\, h(t, \tau)\right] \tag{1.4}$$

the multipath intensity profile of the channel [11]. The range of values of $\tau$ over which $\phi_h(\tau)$ is nonzero is called the multipath spread of the channel, and the largest value among these delays is defined as the delay spread of the channel, which is denoted by $T_m$. In other words, $\phi_h(\tau) \approx 0$ for $\tau \ge T_m$.
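As an illustration of the Rayleigh model in (1.1), the sketch below draws channel magnitudes from independent Gaussian real and imaginary parts and compares sample averages with the values implied by the Rayleigh density. It is a hedged example only: the function names, the sample size and the seed are assumptions made for this illustration.

```python
import math
import random


def rayleigh_magnitudes(sigma, count, seed=0):
    """Draw |h| where Re(h) and Im(h) are independent N(0, sigma^2)."""
    rng = random.Random(seed)
    return [abs(complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
            for _ in range(count)]


def rayleigh_pdf(r, sigma):
    """Density (1.1): (r / sigma^2) * exp(-r^2 / (2 sigma^2)) for r >= 0."""
    if r < 0:
        return 0.0
    return (r / sigma ** 2) * math.exp(-r ** 2 / (2.0 * sigma ** 2))


samples = rayleigh_magnitudes(sigma=1.0, count=20000)
mean_power = sum(r * r for r in samples) / len(samples)   # theory: 2 * sigma^2
mean_magnitude = sum(samples) / len(samples)              # theory: sigma * sqrt(pi / 2)
```

With `sigma = 1.0`, the sample mean power should be close to 2 and the sample mean magnitude close to sqrt(pi/2), up to Monte Carlo error.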

The coherence bandwidth of the channel, $(\Delta f)_c$, is defined as the frequency separation at which two frequency components of the signal undergo independent attenuations by the channel. This parameter is related to $T_m$ by

$$(\Delta f)_c \approx \frac{1}{T_m}. \tag{1.5}$$

If the bandwidth $W$ of the signal that is transmitted through the channel is smaller than the coherence bandwidth of the channel, i.e. $W < (\Delta f)_c$, the channel is called a flat-fading channel, in which all the frequency components of the signal undergo the same attenuation by the channel. In other words, within the bandwidth of the signal, the transfer function of the channel is constant in the frequency variable. In this case, the multipath components in the received signal are not resolvable, and the channel appears as a single fading path. This implies that in flat-fading channels, the received signal is simply the transmitted signal multiplied by the channel coefficient $h$, where $h$ is a zero-mean complex-valued Gaussian random process. For a single-antenna system, this can be simply modelled as

$$y = hx + \nu, \tag{1.6}$$

where $x$ and $y$ are the transmitted and received signals respectively, and $\nu$ is additive noise which is usually assumed to be Gaussian distributed and independent of $x$.

If the signal bandwidth is such that $W > (\Delta f)_c$, the channel is said to be frequency-selective and the signal is severely distorted by the channel. In this case, the multipath components can be resolved in the received signal and therefore the receiver is provided with several independently fading signal paths [11]. Consequently, the frequency-selective channel is modelled as a tapped delay line filter with time-variant tap coefficients. Frequency-selective fading can degrade system performance by causing inter-symbol interference (ISI), which results in an irreducible bit error rate (BER). Time-domain equalization [11] and orthogonal frequency-division multiplexing (OFDM) [11–13] are practical techniques that can be used to resolve ISI.
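The difference between the flat-fading model (1.6) and the tapped-delay-line model can be seen in a small noise-free sketch. The tap values and symbols below are arbitrary illustrative numbers, and the taps are held fixed over the block (a static snapshot of a time-variant channel); neither is taken from the thesis.

```python
def tapped_delay_line(symbols, taps):
    """Noise-free output y[k] = sum_l taps[l] * symbols[k - l], zero initial state."""
    output = []
    for k in range(len(symbols)):
        acc = 0j
        for l, tap in enumerate(taps):
            if k - l >= 0:
                acc += tap * symbols[k - l]
        output.append(acc)
    return output


symbols = [1 + 0j, -1 + 0j, -1 + 0j, 1 + 0j]

# Flat fading: a single tap, so each output depends on one symbol only, as in (1.6).
flat = tapped_delay_line(symbols, taps=[0.8 + 0.2j])

# Frequency-selective fading: the second tap leaks the previous symbol into the
# current output, which is the inter-symbol interference described above.
selective = tapped_delay_line(symbols, taps=[0.8 + 0.2j, 0.5 - 0.1j])
isi = [s - f for s, f in zip(selective, flat)]
```

Here `isi[k]` is exactly the contribution of the previous symbol through the second tap, which is the term an equalizer or OFDM is meant to remove.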

Diversity techniques are based on the fact that if the channel is in a deep fade because of the destructive addition of the multipath signals, errors may occur due to the large channel attenuation. However, if we can provide the receiver with several replicas of the same signal transmitted over $L$ independently fading channels, the probability that all the signals fade simultaneously will be reduced considerably. If $p$ is the probability that any one signal fades below a threshold level, then $p^L$ is the probability that all $L$ independently fading replicas will fade below the threshold level. There are several diversity techniques that can be employed in wireless communication systems to supply to the receiver $L$ independently fading replicas of the same information signal. Diversity techniques which may be used include time, frequency, and space diversity.

• Time Diversity refers to transmitting the same signal over $L$ different time slots, where the separation between successive time slots is enough to make their channels independent. A common example of time diversity is the interleaving of coded symbols over a large block length.

• Frequency Diversity refers to transmitting the same signal over a large bandwidth, exceeding the coherence bandwidth of the channel. An example of the use of frequency diversity is spread spectrum modulation. In fact, in a frequency-selective fading channel, the receiver is provided with $T_m W \approx W/(\Delta f)_c$ resolvable signal components. By applying either OFDM or time-domain equalization schemes, a frequency diversity of order $L \approx W/(\Delta f)_c$ can be obtained.

• Space Diversity refers to transmitting or receiving the same signal over multiple antennas that are separated enough to create independent fading channels. To provide space diversity, multiple antennas are used at the transmitter and/or the receiver. The independent spatial channels provided by multiple antennas can also be used to carry independent data streams to increase the data rate. This latter technique is known as spatial multiplexing.
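The two quantitative statements in this discussion, the joint fade probability $p^L$ and the frequency diversity order $L \approx T_m W$, can be written as two small helper functions. This is a sketch with hypothetical function names; rounding the diversity order to an integer and flooring it at one are choices made here for illustration.

```python
def all_branches_fade_probability(p, branches):
    """Probability that all independent branches fade below the threshold: p ** L."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if branches < 1:
        raise ValueError("need at least one branch")
    return p ** branches


def frequency_diversity_order(bandwidth_hz, delay_spread_s):
    """Approximate number of resolvable paths, L ~ Tm * W = W / (delta f)_c, at least 1."""
    return max(1, round(delay_spread_s * bandwidth_hz))


# A 10% single-branch fade probability drops to 0.1% with three independent branches.
joint = all_branches_fade_probability(0.1, 3)

# A 5 MHz signal over a channel with 1 microsecond delay spread resolves about 5 paths.
order = frequency_diversity_order(5e6, 1e-6)
```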

In this thesis, both flat-fading and frequency-selective fading channels are considered. In the latter case, OFDM is applied to resolve ISI and extract the frequency diversity of the channel.

1.2 Space-Time Coding

1.2.1 System Model

A single-user channel is considered with $t$ transmit and $r$ receive antennas. The transmit symbols $s_1, \ldots, s_p$ are encoded to an $n \times t$ (possibly complex) space-time code matrix $C$ which is transmitted from $t$ transmit antennas over $n$ time slots. The rate of this code is defined as

$$R = p/n \ \text{symbols/channel-use}, \tag{1.7}$$

where again $p$ is the number of data symbols transmitted in $n$ time slots. The $t \times r$ channel matrix $H$ is defined such that $H(i, j)$, which represents the element in the $i$th row and the $j$th column of the matrix $H$, is the channel gain between transmit antenna $i$ and receive antenna $j$. Each channel coefficient has the same variance $\sigma_h^2$, and the $tr$ channels are assumed to be independent. Also we assume a quasi-static Rayleigh fading channel which is constant over a block of $n$ time slots and independent from block to block. Then if the power per input symbol transmitted from each transmit antenna is $p_s/t$, the received signal, which is an $n \times r$ matrix $Y$, will be

$$Y = \sqrt{p_s/t}\, C H + N, \tag{1.8}$$

where $N \in \mathbb{C}^{n \times r}$ is a matrix of i.i.d. complex Gaussian random variables with zero mean and variance $\sigma^2$, representing receiver noise. To simplify the analysis, $\sigma_h^2 = 1$ is assumed and therefore the signal-to-noise ratio per receive antenna is defined as $\rho = p_s/\sigma^2$.

The channel is known at the receiver but not at the transmitter. The goal is to design the matrix $C$ to achieve full diversity and multiplexing gain.
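The signal model (1.8) can be simulated directly for one quasi-static block. The sketch below is illustrative only: the helper names, the example code matrix and the parameter values are assumptions, and matrices are stored as plain lists of rows so that no external library is needed.

```python
import math
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def complex_gaussian_matrix(rows, cols, variance, rng):
    """Matrix of i.i.d. zero-mean complex Gaussian entries with the given variance."""
    s = math.sqrt(variance / 2.0)
    return [[complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(cols)]
            for _ in range(rows)]


def received_block(code, r, p_s, noise_var, rng):
    """Return (Y, H) with Y = sqrt(p_s / t) * C H + N for an n x t code matrix C."""
    n, t = len(code), len(code[0])
    h = complex_gaussian_matrix(t, r, 1.0, rng)            # sigma_h^2 = 1
    noise = complex_gaussian_matrix(n, r, noise_var, rng)  # receiver noise N
    scale = math.sqrt(p_s / t)
    signal = matmul(code, h)
    y = [[scale * signal[i][j] + noise[i][j] for j in range(r)] for i in range(n)]
    return y, h


rng = random.Random(1)
code = [[1 + 1j, 1 - 1j],
        [-1 - 1j, 1 - 1j]]       # an arbitrary n = 2, t = 2 code matrix
y, h = received_block(code, r=3, p_s=2.0, noise_var=0.0, rng=rng)
rate = 2 / len(code)             # R = p / n with p = 2 symbols in n = 2 slots
```

Setting `noise_var=0.0` makes the block noise-free, which is convenient for checking the dimensions and the scaling; a positive value gives the SNR per receive antenna as `p_s / noise_var`.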

1.2.2 Design Criteria

Design criteria for space-time codes are derived based on maximum likelihood (ML) detection in [2, 14]. The analysis is based on pairwise error probability [11]. For a given channel matrix $H$, the probability that an ML receiver decides erroneously in favor of the code matrix $C_j$ when the code matrix $C_i$ is transmitted will be

$$\begin{aligned} P(C_i \to C_j \mid H) &= P\left(\|Y - C_j H\|^2 \le \|Y - C_i H\|^2\right) \\ &= Q\left(\sqrt{\frac{A^2(C_i, C_j)\,\rho}{2}}\right) \le \exp\left(-A^2(C_i, C_j)\,\rho/4\right), \end{aligned} \tag{1.9}$$

where $A^2(C_i, C_j) = \|(C_i - C_j)H\|^2$. Equation (1.9) needs to be averaged over the channel distribution. An upper bound on the average probability of error in the case of a Rayleigh fading channel is obtained in [2] as follows,

$$P(C_i \to C_j) \le \left(\prod_{i=1}^{l} \lambda_i\right)^{-r} (\rho/4)^{-lr} \quad \text{for } C_i \ne C_j, \tag{1.10}$$

where $l$ is the minimum rank of the difference matrix, $D_{ij} = (C_i - C_j)$, over different possible code matrices $C_i \ne C_j$, and $\lambda_i$ are the nonzero eigenvalues of the matrix $\Lambda = D_{ij} D_{ij}^H$. This results in the following design criteria for space-time codes:

• Rank Criterion (diversity gain): The achieved transmit diversity at the receiver is the minimum rank of the difference matrix, $D_{ij}$, over all possible code matrices $C_i \ne C_j$. A full diversity code is obtained if $l = t$.

• Determinant Criterion (coding gain): The coding gain $g$ is defined as

  $$g = \min_{C_i \ne C_j} \left(\prod_{i=1}^{l} \lambda_i\right). \tag{1.11}$$

Space-time codes are designed to maximize both diversity and coding gain.
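To make the two criteria concrete, the sketch below evaluates the right-hand side of (1.10) for a single pair of 2 × 2 code matrices, taking $l$ as the rank of that pair's difference matrix. It is a hedged illustration: the function names and the example matrices are hypothetical, and the closed-form eigenvalue formula only applies to the 2 × 2 case.

```python
import math


def gram(d):
    """Lambda = D D^H for a matrix D stored as a list of rows."""
    return [[sum(x * y.conjugate() for x, y in zip(row_i, row_j)) for row_j in d]
            for row_i in d]


def hermitian_2x2_eigenvalues(m):
    """Real eigenvalues of a 2 x 2 Hermitian matrix, from its trace and determinant."""
    trace = (m[0][0] + m[1][1]).real
    det = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    root = math.sqrt(max(trace * trace / 4.0 - det, 0.0))
    return [trace / 2.0 + root, trace / 2.0 - root]


def pairwise_error_bound(c_i, c_j, r, rho, tol=1e-12):
    """Rank, eigenvalue product and bound (1.10) for one pair of 2 x 2 code matrices."""
    d = [[a - b for a, b in zip(row_i, row_j)] for row_i, row_j in zip(c_i, c_j)]
    nonzero = [lam for lam in hermitian_2x2_eigenvalues(gram(d)) if lam > tol]
    rank = len(nonzero)
    product = math.prod(nonzero)
    return rank, product, product ** (-r) * (rho / 4.0) ** (-rank * r)


# A pair whose difference matrix has full rank (diversity order 2 * r for this pair).
full = pairwise_error_bound([[1, 1], [-1, 1]], [[-1, -1], [1, -1]], r=2, rho=4.0)

# A pair that differs in a single entry: the difference matrix has rank 1.
deficient = pairwise_error_bound([[1, 1], [-1, 1]], [[-1, 1], [-1, 1]], r=2, rho=4.0)
```

The rank-deficient pair gives a much looser bound at the same SNR, which is why the rank criterion is applied to the worst pair in the codebook.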

1.2.3 Space-time Coding Schemes

In this section, we briefly review a few well known space-time coding (STC) schemes so that we may refer to them later in the thesis.

• Space-Time Block Coding (STBC): In [1] an STC scheme is proposed for two transmit antennas. The input symbols ($s_i$) are divided into groups of two symbols each. Then the STC matrix is generated as follows:

  $$C = \begin{bmatrix} s_1 & s_2 \\ -s_2^* & s_1^* \end{bmatrix}. \tag{1.12}$$

  It is shown that because of the orthogonal structure of this code, i.e.

  $$CC^H = \left(\sum_{i=1}^{2} |s_i|^2\right) I_2 \tag{1.13}$$

  (where $I_k$ is a $k \times k$ identity matrix), ML detection simplifies to linear processing at the receiver [1]. Also it can be easily shown that this code has full diversity (the difference matrix is full rank). This scheme is later generalized in [3] to an arbitrary number of antennas. For $t$ transmit antennas, the input symbols are divided into groups of $t$ symbols each and then the STC matrix is generated as an orthogonal matrix, i.e. $CC^H = \left(\sum_{i=1}^{t} |s_i|^2\right) I_t$.

  Here, we summarize some important properties of STBC:

  – STBC are full diversity codes.

  – Real orthogonal codes with rate $R = 1$ can be designed for any number of transmit antennas.

  – A complex orthogonal design with rate $R = 1$ exists if and only if $t = 2$ (see (1.12)). There are also complex orthogonal designs for $t = 3$ and $t = 4$, but with a rate $R = 3/4$.

  – ML detection requires only linear processing at the receiver.

• BLAST Codes: BLAST stands for Bell Laboratories Layered Space-Time. This architecture breaks the data stream into $t$ sub-streams that are transmitted simultaneously from $t$ antennas. Hence, there is no built-in spatial transmit diversity. This scheme is implemented as Diagonal-BLAST (D-BLAST) [4], Vertical-BLAST (V-BLAST) [5] and Turbo-BLAST (T-BLAST) [15]. In particular, BLAST is designed to provide very high data rate communications over wireless flat-fading channels. A typical example for V-BLAST when we have two transmit and two receive antennas is

  $$C = \begin{bmatrix} s_1 & s_2 \\ s_3 & s_4 \end{bmatrix}. \tag{1.14}$$

  Comparing this with STBC (1.12), one can see that in the BLAST scheme, transmit diversity is not provided for the system. However, two symbols are transmitted per channel use, which is twice the rate of STBC in terms of channel symbols per channel use.

  At the receiver, successive nulling and cancelling is applied. The interference from an already-detected symbol is subtracted out from the received signal before the next symbol is detected. Each symbol is detected based on a zero forcing method [16]. Therefore, it is necessary to have $r \ge t$. The order in which the symbols are detected affects the overall performance of the algorithm. The best-first cancellation scheme is widely known within the multiuser detection community [17]. This can also be applied at the receiver for BLAST. Based on this scheme, the symbols are ordered based on their received signal-to-interference-plus-noise ratios (SINRs). Then the symbols with higher SINRs are detected first. Because of the particular structure of BLAST, it can be easily seen that the symbols are automatically received with different SINRs.

  Here, we summarize some important properties of BLAST:

  – Spatial diversity is not provided at the transmitter.

  – BLAST can be designed for any number of transmit/receive antennas as long as the number of receive antennas is greater than or equal to the number of transmit antennas, $r \ge t$.

  – BLAST is a full rate code, $R = t$.

• Linear Dispersion (LD) Codes: In this scheme, which is proposed in [6], the data stream is broken into $Q$ sub-streams, $s_q = \alpha_q + j\beta_q$, $q = 1, \ldots, Q$, that are transmitted over space and time as indicated by the codeword matrix

  $$C = \sum_{q=1}^{Q} (\alpha_q A_q + j\beta_q B_q). \tag{1.15}$$

  The performance of LD codes is dependent on $Q$, $\{A_q\}$, and $\{B_q\}$. The LD codes in [6] were designed to maximize the mutual information between the transmit and receive signals.

  Note that, in a MIMO single-user channel, if the channel is known at the receiver, the resulting channel capacity is [4,18]:

  $$C(\rho, t, r) = \max_{R_s,\ \mathrm{Tr}(R_s) = t} E\left\{\log\left[\det\left(I_r + \frac{\rho}{t} H R_s H^H\right)\right]\right\}, \tag{1.16}$$

  where the expectation is taken over the distribution of the random matrix $H$, and $R_s$ is the covariance matrix of the input signal. If the channel matrix $H \in \mathbb{C}^{r \times t}$ is a matrix of i.i.d. complex Gaussian random variables with zero mean and variance $\sigma_h^2 = 1$, the optimal covariance matrix when $H$ is unknown to the transmitter is $R_s = I_t$, and (1.16) becomes

  $$C(\rho, t, r) = E\left\{\log\left[\det\left(I_r + \frac{\rho}{t} H H^H\right)\right]\right\}. \tag{1.17}$$

  By substituting (1.15) in (1.8) we get

  $$Y = \sqrt{p_s/t}\, \sum_{q=1}^{Q} (\alpha_q A_q + j\beta_q B_q) H + N. \tag{1.18}$$

  By decomposing the matrices in (1.18) into their real and imaginary parts and then collecting the real and imaginary parts of the received signal in the vector $y$, equation (1.18) is re-formulated as [6]

  $$y = \sqrt{p_s/t}\, H_g x + \nu, \tag{1.19}$$

  where $H_g \in \mathbb{R}^{2nr \times 2Q}$ is a modified channel matrix which is a function of the real and imaginary components of $A_q$ and $B_q$ as well as the original channel gains, $y, \nu \in \mathbb{R}^{2nr \times 1}$, and $x \in \mathbb{R}^{2Q \times 1}$ is a vector of the real and imaginary parts of the transmitted symbols $(\alpha_q, \beta_q)$. Therefore the LD codes are linear in the variables $\alpha_q, \beta_q$ and the same detection algorithm as explained for the BLAST scheme can be applied.

  Without loss of generality, $s_1, \ldots, s_Q$ are assumed to be unit-variance and uncorrelated. Then $E[\mathrm{Tr}(CC^H)] = nt$ and therefore

  $$\sum_{q=1}^{Q} \left(\mathrm{Tr}(A_q^H A_q) + \mathrm{Tr}(B_q^H B_q)\right) = 2nt. \tag{1.20}$$

  To design LD codes, first of all $Q = n \cdot \min(t, r)$ is chosen. As mentioned earlier, the LD codes are designed to maximize the mutual information between the transmit and receive signals. Therefore, to choose $\{A_q, B_q\}$, the following optimization problem has to be solved

  $$C_{LD}(\rho, n, t, r) = \max_{A_q, B_q,\ q = 1, \ldots, Q} \frac{1}{2n} E\left\{\log\left[\det\left(I_{2nr} + \frac{\rho}{t} H_g H_g^T\right)\right]\right\}, \tag{1.21}$$

  subject to one of the following constraints

  1. $\sum_{q=1}^{Q} \left(\mathrm{Tr} A_q^H A_q + \mathrm{Tr} B_q^H B_q\right) = 2nt$

  2. $\mathrm{Tr} A_q^H A_q = \mathrm{Tr} B_q^H B_q = \frac{nt}{Q}$, $q = 1, \ldots, Q$

  3. $A_q^H A_q = B_q^H B_q = \frac{n}{Q} I_t$, $q = 1, \ldots, Q$

  The first constraint is the power constraint that ensures $E\left[\mathrm{Tr}(CC^H)\right] = nt$. The second constraint is to make sure that all the data symbols are transmitted with the same power. The third constraint is to transmit all the data symbols with equal energy in all spatial and temporal directions.

  Here, we summarize some important properties of LD codes:

  – Full diversity is not guaranteed, but the codes are shown to provide good performance with respect to the probability of error [6].

  – The optimization problem in (1.21) is neither convex nor concave. Therefore the optimization problem may lead to a local optimum.

  – The solution $(A_q, B_q)$ is not unique.

  – The LD code is a full rate code, $R = Q/n$, which if $r \ge t$ results in $R = t$.

• TAST Codes: TAST stands for Threaded Algebraic Space-Time. This scheme, which is proposed in [7,19,20], uses a threaded structure and algebraic number-theoretic tools to design full diversity codes. The codes are directly optimized based on the rank criterion (diversity gain) and determinant criterion (coding gain) (see 1.2.2). The problem of space-time diversity gain is related to algebraic number theory, and the coding gain optimization is related to the theory of simultaneous Diophantine approximation in the geometry of numbers. The coding gain optimization is found to be equivalent to finding irrational numbers that are the furthest from any simultaneous rational approximations.

Applying ML detection at the receiver, these codes achieve full diversity while the coding gain is optimized as well.

For comparison, a design of TAST codes for two transmit/receive antennas is given in the following:

$$\mathbf{C}_\phi = \begin{bmatrix} s_1 + \phi s_2 & \theta(s_3 + \phi s_4) \\ \theta(s_3 - \phi s_4) & s_1 - \phi s_2 \end{bmatrix}, \qquad (1.22)$$

where $\theta^2 = \phi$, and $\phi = e^{j/2}$.

It can be seen that the rate of this code is R = 2. Also, based on the rank criterion, it can be easily shown that this code achieves full diversity. φ is chosen to maximize the coding gain.

Other details for designing these codes are omitted here. A comprehensive explanation is provided in [7, 19, 20]. Here we summarize some key points of TAST codes.

– TAST codes can be designed for any number of transmit antennas

– TAST codes are full diversity, full rate (R = t) codes

– Optimal detection (ML) is required to achieve full diversity
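The full-diversity claim for the 2 × 2 TAST code in (1.22) can be checked numerically for a small alphabet. The sketch below is illustrative only: it assumes 4-QAM symbols at ±1 ± j (the alphabet is not fixed in the text above), picks θ = e^{j/4} as one root of θ² = φ, and uses the hypothetical helper names `codeword` and `det2`. It evaluates the determinant of every non-zero codeword difference; a strictly positive minimum means every difference matrix has rank 2.

```python
import cmath
from itertools import product

phi = cmath.exp(0.5j)      # phi = e^{j/2}, as in (1.22)
theta = cmath.exp(0.25j)   # one root of theta^2 = phi (assumed choice)

def codeword(s1, s2, s3, s4):
    """2x2 TAST codeword of (1.22)."""
    return [[s1 + phi * s2, theta * (s3 + phi * s4)],
            [theta * (s3 - phi * s4), s1 - phi * s2]]

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# 4-QAM alphabet (an assumption for this check) and all symbol differences.
qam = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]
diffs = sorted({u - v for u in qam for v in qam}, key=lambda z: (z.real, z.imag))

# The code is linear in the symbols, so C(s) - C(s') = C(s - s').
min_det = min(abs(det2(codeword(*d)))
              for d in product(diffs, repeat=4) if any(d))
print(f"smallest |det| over non-zero differences: {min_det:.4f}")
```

The search covers 9⁴ − 1 difference vectors, so it runs instantly; the same loop could be used to compare candidate values of φ by their minimum determinant.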

1.2.4 Space-Time Coding in a Multiuser System

Designing space-time codes for single-user systems is very well understood. However, there has not been extensive work on space-time code design for multiuser applications. In fact, splitting the channel resources among independent users, either in the form of multiple access (uplink) or broadcasting (downlink), is often considered a straightforward task involving the concatenation of a multiple access scheme such as CDMA with the space-time (ST) processor [21–23]. For instance, each user can be assigned an orthogonal spreading code, which is used to spread the symbols at the output of a space-time encoder. For that matter, the channel symbols can be generated using orthogonal space-time block codes, or any other STC designed for a single-user system. With flat fading synchronous channels, as seen on the downlink, a de-spreading front-end at each receiver results in a single-user channel without multi-access interference (MAI).

Note that the maximum number of active users is only equal to the processing gain (bandwidth expansion) of the system, regardless of the number of space-time dimensions (n time slots, t transmit antennas) used in the STC part of the system. Furthermore, all the constraints and complexities of the applied STC scheme carry over to the multiuser case. As we discussed before, for some STC designs, such as orthogonal space-time block codes, it is simply impossible to allow certain antenna configurations; for others, such as linear dispersion codes, it is necessary to maximize an objective function for a given t and r.

In applying the BLAST scheme to a multiuser system, one can use the same spreading code to spread each of the sub-streams. Since the same code is used to spread the sub-streams, the spreading does not aid the receiver in distinguishing among them. As an alternative, different spreading codes could be used for the sub-streams which are transmitted simultaneously from different transmit antennas. In this case, the sub-streams can be separated by their spatial characteristics and their codes. In either case, we can either transmit multiple sub-streams to each user or transmit one sub-stream per user.

In addition, different spreading codes can be used to transmit the same sub-stream from different antennas to achieve transmit diversity. In this case, a different spreading code is used on each antenna to distinguish the sub-streams [22]. Although applying different spreading codes over different antennas improves the performance significantly, it decreases the spectral efficiency, as we will show later. In Chapter 2, we propose a space-time spreading scheme designed for the multiuser downlink channel, and we compare its spectral efficiency with that of other STC schemes presented for CDMA systems.

1.3 Precoding

The main difficulty in MIMO channels is the separation of the data streams which are sent in parallel. In the context of the multiple access channel, this task is called multiuser detection.

In this section we discuss precoding or pre-equalization of the transmitted signals for MIMO systems. This type of processing at the transmitter requires the channel state information (CSI) at the transmitter. In order to be able to obtain CSI at the transmitter, the channel should be fixed (non-mobile) or approximately constant over a reasonably large time period. If CSI is available at the transmitter, the transmitted symbols, either for a single-user or for multiple users, can be partially separated by means of pre-equalization at the transmitter. In this section, we give a brief overview of precoding schemes for single-user and multiuser systems.

1.3.1 MIMO Single-user Systems

A MIMO channel can be described by a very basic model as y = Hx + ν, where x and y are the transmit and receive signal vectors respectively, ν represents the receive noise, and H is the r × t MIMO channel matrix. In a zero-forcing receiver, the transmitted data signals are detected by multiplying the received signal vector by the pseudo-inverse of the channel matrix:

$$\hat{\mathbf{x}} = (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{H}^H \mathbf{y}, \quad \text{if } t \le r. \qquad (1.23)$$

For this, the number of receive antennas should be greater than or equal to the number of transmit antennas. It is well known that zero-forcing equalization suffers from noise enhancement. To overcome this deficiency, decision-feedback equalization (DFE) can be applied at the receiver [24]. In DFE, the symbols are detected sequentially. After each symbol is detected, it is cancelled out before the next symbol is detected, so a wrong decision affects the symbols detected after it; DFE therefore suffers from error propagation. The structure of DFE is shown in Figure 1.3. The matrix B is a lower triangular matrix representing the decision feedback operation, and the matrix F is the feedforward matrix.

For the above methods, the CSI is required only at the receiver. By assuming perfect CSI at the transmitter, the interference between the transmitted symbols can be completely avoided at the receiver by multiplying the transmit signal by the pseudo-inverse of the channel, which means transmitting

$$\mathbf{x} = \mathbf{H}^H (\mathbf{H}\mathbf{H}^H)^{-1} \mathbf{s}, \quad \text{if } r \le t, \qquad (1.24)$$

rather than transmitting the data vector s. In this linear pre-equalization, instead of enhancing the noise, the average transmit power is increased. Also, the number of transmit antennas should be equal to or greater than the number of receive antennas.
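As a small illustration of the linear pre-equalization in (1.24), the sketch below builds the right pseudo-inverse for a hypothetical real-valued channel with r = 2 and t = 3 and confirms that the noiseless channel output equals the data vector. The channel values, the data vector and the helper names (`matmul`, `transpose`, `inv2`) are assumptions chosen for illustration; a real matrix is used so that the conjugate transpose reduces to a plain transpose.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical real-valued channel: r = 2 receive and t = 3 transmit antennas.
H = [[1.0, 0.5, -0.3],
     [0.2, -1.0, 0.8]]
s = [[1.0], [-1.0]]                      # data vector (column)

Ht = transpose(H)                        # H^H for a real matrix
P = matmul(Ht, inv2(matmul(H, Ht)))      # right pseudo-inverse H^H (H H^H)^{-1}
x = matmul(P, s)                         # pre-equalized transmit vector, (1.24)
received = matmul(H, x)                  # noiseless channel output

tx_power = sum(v[0] ** 2 for v in x)     # energy actually transmitted
data_power = sum(v[0] ** 2 for v in s)   # energy of the data vector
print(received, tx_power, data_power)
```

Comparing `tx_power` with `data_power` for different channel draws shows how the transmit energy depends on the conditioning of the channel, which is the cost that pre-equalization pays in place of noise enhancement.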

The equalization can also be split between the transmitter and the receiver. A popular strategy is based on the singular value decomposition (SVD) of the channel matrix. The channel can be written as $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$, where U and V are unitary matrices and Σ is diagonal. By multiplying the data signal by V at the transmitter, and then applying $\mathbf{U}^H$ at the receiver, the channel is diagonalized [25]. In this scheme, the transmit power is not increased and the channel noise is not enhanced. The above schemes are considered linear pre-equalization.

[Figure 1.3: Matrix DFE. Block labels: x, H, noise, y, F, B, x̂]

As mentioned before, the DFE is a non-linear equalizer. With perfect channel knowledge at the transmitter, the feedback part of the DFE can be transferred to the transmitter, which results in a non-linear precoding scheme known as Tomlinson-Harashima precoding (THP). The performance of DFE and THP is the same, but since THP is applied at the transmitter, error propagation is avoided [26].

The calculation of the feedforward and feedback filters is as follows. We begin by applying a QL factorization to the channel matrix such that $\mathbf{H} = \mathbf{Q}^H \mathbf{S}$, where Q is a unitary matrix and S is a lower triangular matrix. This can be obtained through a Cholesky factorization of $\mathbf{H}^H \mathbf{H}$ because $\mathbf{H}^H \mathbf{H} = \mathbf{S}^H \mathbf{S}$ [26]. Now, we define C = VS, where V is a diagonal matrix with elements equal to the inverse of the diagonal elements of the matrix S, so that C becomes a unit-diagonal lower triangular matrix. It can be easily verified that the feedback matrix at the transmitter and the feedforward filter at the receiver should be calculated as B = C − I and F = VQ respectively.

Therefore, at the transmitter the symbols $a_i$, i = 1, ..., K, are generated successively from the original data x:

$$a_i = x_i - \sum_{l=1}^{i-1} B(i, l)\, a_l, \quad i = 1, \ldots, K, \qquad (1.25)$$

where $x_i$ is the ith element of x and B(i, l) is the element in the ith row and the lth column of the matrix B. This strategy will significantly increase the transmit power; therefore the symbols are modulo reduced into the boundary region of the signal constellation in use. Mathematically, integer multiples of the modulo interval are added to the real and imaginary parts of $a_i$ to bound the transmit signals to the constellation region (see [27] and Chapter 3 for more details). Because of this modulo operation, THP is considered a non-linear precoding scheme. As shown in [27], the transmit power is still slightly increased, but the scheme outperforms linear precoding schemes in the sense of error probability.
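The successive generation in (1.25), followed by the modulo reduction, can be sketched in a few lines. Everything numeric here is an assumption for illustration: the lower triangular factor S is hypothetical, the alphabet is taken to be 4-QAM at ±1 ± j, and the modulo interval is set to tau = 4 so that symbols are folded into [−2, 2) in each real dimension. The function names `modulo` and `thp` are likewise introduced only for this sketch.

```python
import math

def modulo(z, tau):
    """Fold the real and imaginary parts of z into [-tau/2, tau/2)."""
    fold = lambda v: v - tau * math.floor((v + tau / 2) / tau)
    return complex(fold(z.real), fold(z.imag))

def thp(x, B, tau):
    """Successive pre-subtraction of (1.25), each step followed by a modulo."""
    a = []
    for i, xi in enumerate(x):
        interference = sum(B[i][l] * a[l] for l in range(i))
        a.append(modulo(xi - interference, tau))
    return a

# Hypothetical lower triangular factor S of a 3-stream channel.
S = [[2.0, 0, 0],
     [0.6 - 0.4j, 1.5, 0],
     [-0.9j, 0.7 + 0.2j, 1.2]]
K = len(S)
C = [[S[i][l] / S[i][i] for l in range(K)] for i in range(K)]   # C = V S
B = [[C[i][l] - (i == l) for l in range(K)] for i in range(K)]  # B = C - I

tau = 4.0                       # modulo interval for 4-QAM at +/-1 +/- 1j (assumption)
x = [1 + 1j, -1 + 1j, 1 - 1j]   # data symbols
a = thp(x, B, tau)

# C a equals x up to integer multiples of tau in each real dimension.
offsets = [(sum(C[i][l] * a[l] for l in range(K)) - x[i]) / tau for i in range(K)]
print(a)
print(offsets)
```

The printed offsets are (numerically) Gaussian integers: the receiver removes them with the same modulo operation, which is why the precoded symbols stay bounded without losing the data.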

1.3.2 MIMO Multiuser Systems

A multiuser downlink channel can also be modelled as y = Hx + ν, where H is the overall downlink channel matrix and y includes the received signals for all users. However, since the receivers are not collaborating, joint processing of the vector y is not possible, and consequently the schemes proposed for single-user systems may not be applicable.

For instance, the SVD of the known channel matrix, as explained in the last section, cannot be applied. Also, in the THP implementation, although the feedback part is moved to the transmitter, the feedforward filter still requires joint processing of the received signals. However, THP can be modified to suit a multiuser channel by transferring the feedforward filter to the transmitter as well. The calculation of the feedforward and feedback filters for this new structure is as follows. A QR factorization is applied to $\mathbf{H}^H$ such that $\mathbf{H}^H = \mathbf{Q}\mathbf{R}$ and $\mathbf{H} = \mathbf{S}\mathbf{Q}^H$, where Q is a unitary matrix and $\mathbf{S} = \mathbf{R}^H$ is a lower triangular matrix. By defining C = VS, where V is a diagonal matrix with elements equal to the inverse of the diagonal elements of the matrix S so that C becomes a unit-diagonal lower triangular matrix, the feedforward and feedback matrices are F = Q and B = C − I respectively. The output signals of the Tomlinson-Harashima precoder are now applied to the feedforward filter before being transmitted through the downlink channel. As a result, the effective channel seen by the data is $\mathbf{H}\mathbf{Q}\mathbf{C}^{-1} = \mathbf{V}^{-1}$, which is a diagonal matrix, and therefore joint processing is not required at the receiver. In this scheme the number of transmit antennas has to be equal to or greater than the total number of receive antennas, which is a restrictive condition on the number of users in the system or the number of receive antennas at each user. In Chapter 3 we design a non-linear precoding scheme based on THP which is valid for any number of transmit/receive antennas.
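The diagonalization used in this multiuser structure follows directly from the definitions of S, C and V. As a short worked derivation (ignoring the modulo operation, and assuming the columns of Q are orthonormal so that $\mathbf{Q}^H\mathbf{Q} = \mathbf{I}$):

```latex
\begin{align*}
\mathbf{H}\mathbf{F} &= \mathbf{S}\mathbf{Q}^{H}\mathbf{Q} = \mathbf{S}
  && \text{since } \mathbf{H} = \mathbf{S}\mathbf{Q}^{H},\ \mathbf{F} = \mathbf{Q}, \\
\mathbf{S} &= \mathbf{V}^{-1}\mathbf{C}
  && \text{since } \mathbf{C} = \mathbf{V}\mathbf{S}, \\
\mathbf{H}\mathbf{F}\mathbf{C}^{-1} &= \mathbf{V}^{-1}\mathbf{C}\mathbf{C}^{-1} = \mathbf{V}^{-1}.
\end{align*}
```

Because $\mathbf{V}^{-1}$ is diagonal with the diagonal entries of S, each receive antenna sees only its own data symbol, scaled by one known gain, plus noise.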

Also, in [8] the authors show that the broadcast channel sum capacity is achieved using a precoder with the structure of a DFE that decomposes the broadcast channel into a series of single-user channels with interference pre-subtracted at the transmitter. The proposed precoder is a generalization of the Tomlinson-Harashima precoder.

1.4 Overview of the Thesis

The focus of this thesis is on the precoding and beamforming design for the multiuser downlink channel (Figure 1.2) when multiple antennas are employed at both transmitter and receiver sides. We address two scenarios:

• No channel state information is available at the transmitter (NCSIT)

• Perfect channel state information is available at the transmitter (CSIT)

The design problem with NCSIT is addressed in Chapter 2. The channel is assumed to be flat fading. A space-time spreading matrix is proposed for each user, rather than a temporal spreading code vector as is usual in code-division-multiple-access (CDMA) systems. The spreading matrices are designed to provide full spatial diversity at each receiver while the multiplexing gain is maximized as well. The bandwidth expansion, for a given number of users, is then reduced by a factor of min(t, r), while full spatial diversity is provided for each user, where t is the number of transmit antennas at the base station, and r is the minimum number of receive antennas at user stations. In the downlink, since the receivers are portable end-user devices, we are concerned with the complexity at the receiver. Therefore, suboptimal detectors are preferred over optimal detectors (maximum likelihood detection). A two-stage interference canceller (IC) is applied at each receiver. A power allocation scheme is then suggested to improve the performance of the IC towards achieving full diversity.

The design problem with CSIT is studied in two parts. In the first part which is addressed in Chapter 3, we have a per-user power constraint. In the second part which is addressed in Chapter 4, the design goal is to minimize the total transmitted power in the downlink, while signal-to-interference-noise ratio (SINR) requirements are to be satisfied at each receiver. In the following we explain these two parts in more detail.

As mentioned before since we do not have collaboration between the receivers in the downlink BC channel and also low complexity receivers are preferred at mobile stations, our goal is to transfer the processing load from the receivers to the transmitter. It is very well known that assuming perfect channel knowledge at the transmitter, complexity can be moved from the receivers to the transmitter without loss of performance [10], [8].

We also know that the boundary of the capacity region of the broadcast (BC) channel is attained with channel knowledge at the transmitter, and using it for successive dirty paper coding (DPC) [28]. Dirty paper coding is a technique that can be seen as interference pre-cancellation at the transmitter.

In this work, assuming perfect channel knowledge at the transmitter, a successive interference pre-subtraction is applied via a matrix version of Tomlinson-Harashima Precoding (THP).

In Chapter 3, the multiuser MMSE beamforming is combined with THP to minimize the mean squared error between transmit and receive data streams. The receive beam vectors are obtained with the MMSE criterion, and the transmit beam vectors are obtained through an eigenvalue-decomposition scheme. In fact, since interference pre-cancellation is applied at the BS, the single-user algorithms are applicable over individual single-user channels. The proposed scheme is extended to design the beam vectors over the time domain as well.

In Chapter 4, the same interference pre-cancellation is applied at the transmitter. However, since the goal is to minimize the total transmit power, the design problem is more complicated. We show that transmit beamforming is much more complicated than receive beamforming when the total transmit power is to be minimized. We propose an iterative algorithm for jointly designing the single transmitter and the multiple receivers. An uplink-downlink SINR duality result is proved and used in this algorithm, which computes MMSE beamforming receivers for the virtual uplink and the downlink in turn. Initialization is provided by the eigenvalue-decomposition scheme explained in Chapter 3. This algorithm is applicable to the design of space-time beam vectors as well.

In the above proposed algorithms, there is no limitation on the number of transmit/receive antennas.

In Chapter 5, the proposed designs in Chapters 2, 3, 4 are extended to perform precoding and beamforming in the MIMO multiuser frequency selective fading channel when orthogonal frequency-division multiplexing (OFDM) is applied. In frequency selective MIMO channels, there is an additional source of diversity, frequency diversity, due to the existence of multiple propagation paths between each transmit and receive antenna pair. In MIMO/OFDM systems, the channel frequency diversity can also be exploited through the proper design of space-frequency codes. In this chapter, first without any knowledge of the channel at the transmitter, a multiple access scheme is proposed for the downlink in a MIMO/OFDM system. The space-frequency codes are designed to exploit the space and frequency diversity. Then, assuming perfect channel knowledge at the transmitter, the precoding and beamforming design is performed over space and frequency. It is shown that the optimization algorithm benefits from cooperation among the processing at different frequency bins.

We conclude the thesis in Chapter 6 where we summarize the contributions of this work and suggest some directions for future work.

1.5 Notations

The notations used in this thesis are as follows. Boldface lower case letters are used to denote vectors, and boldface upper case letters are used to denote matrices. The superscripts $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$ denote conjugate, transpose and conjugate transpose respectively, I denotes the identity matrix, Diag(·) denotes a block diagonal matrix, and E denotes statistical expectation. det(·) and Tr(·) denote the determinant and trace respectively. M(i, j) represents the element in the ith row and the jth column of the matrix M.

Chapter 2

Space-Time Multiplexing for MIMO Multiuser Downlink Channels

In this chapter, we study the downlink of a multiple-input multiple-output (MIMO) multiuser system, in which antenna arrays are employed at both the transmitter (base station) and the receivers (clients) to provide diversity and also multiplexing gain. The channel is assumed to be unknown at the transmitter but each user receiver is assumed to know its own channel. A modulation technique that can be seen as two-dimensional space-time spreading code (2D-STSC) is described. It is based on well-known Walsh codes, provides full transmit diversity and high spectral efficiency, and produces groups of users that are orthogonal to each other. This last point translates into simplified detection strategies without loss of performance as we will show later.

Note that, the full diversity space time coding (STC) schemes designed based on ML criterion [2,7,29], require optimal detection at the receiver. In the downlink we may be concerned with the complexity at the receiver, and suboptimal detectors are preferred.

The main results of this chapter are the following. We propose a joint space-time coding/spreading scheme that is designed for the multiuser downlink channel without channel knowledge at the transmitter. It is effectively a two-dimensional spreading scheme. The spreading codes are designed to provide full spatial diversity at each receiver while the multiplexing gain is maximized as well. The bandwidth expansion of a system with a given number of users may then be reduced by a factor equal to the minimum of the numbers of transmit and receive antennas.

The main detector structure of interest is a two-stage interference canceller (IC) which employs serial interference cancellation (SIC) in the first stage. We will demonstrate that in conjunction with an unequal power allocation scheme, this receiver is able to provide full diversity and suffers from only a small performance loss compared to the full-complexity maximum likelihood (ML) receiver. While assigning different powers to individual users may seem controversial and appears to lead to enforced differences in quality of service, in fact wireless systems already employ power control and with a very high probability, transmissions to different users have different powers. Therefore, we are only pointing out that the proposed scheme has a good chance of being successfully implemented in a practical, power-controlled system, even with low complexity receivers.

Our perspective also enables us to find a very simple design that is applicable to any number of transmit and receive antennas, and which works even for single-user multiple antenna systems. In that case, applying the same spreading scheme and assigning different power levels on different symbols gives a new approach to designing full rate, full diversity space-time codes even with suboptimal detectors at the receiver.

The remainder of this chapter is organized as follows. The system model is provided in the next section. The spreading matrix design and power control algorithm are explained in Section 2.2. Receiver structures are discussed in Section 2.3. The comparison with the other STC schemes for the downlink in a MIMO multiuser channel is given in Section 2.4. The simulation results are shown in Section 2.5. We conclude this chapter in Section 2.6.

2.1 System Model

The downlink transmitter sends U data streams to K users, where K can be smaller than U, by memoryless linear modulation of the U symbols $s_i$, i = 1,...,U, in each symbol epoch. The transmission model is shown in Figure 2.1. The modulation waveforms are two-dimensional, unique to each data stream, and represented by the matrices $\boldsymbol{\Phi}_i$, i = 1,...,U. If the number of time dimensions is G (this is also equivalent to the spreading gain) and the number of transmit antennas is t, then $\boldsymbol{\Phi}_i \in \mathbb{C}^{t \times G}$. The transmitted signal is therefore

$$\mathbf{S} = \sum_{i=1}^{U} s_i \boldsymbol{\Phi}_i \in \mathbb{C}^{t \times G}. \qquad (2.1)$$

The above equation is reminiscent of the class of linear dispersion codes explained in [29], and indeed (2.1) represents space-time block codes of many types. However, unlike in [29] where a single-user or point-to-point system is considered, in the downlink, point-to-multipoint problem, we do not have the luxury of a capacity expression to maximize. For instance, the sum capacity in a MIMO broadcast channel does not have a closed-form expression and is instead expressed as the saddle point of a certain function. Therefore, we need to resort to other design goals, as will be explained later.

From (2.1) the signal received by user j (j = 1, ..., K), over G channel uses (or “chips” in spread spectrum terminology) and $r_j$ receiver antennas is

$$\mathbf{Y}_j = \sum_{i=1}^{U} s_i \mathbf{H}_{j,0} \boldsymbol{\Phi}_i + \mathbf{V}_j \in \mathbb{C}^{r_j \times G}, \qquad (2.2)$$

where $\mathbf{H}_{j,0} \in \mathbb{C}^{r_j \times t}$ is the matrix of flat-fading channel gains between the transmitter and the jth receiver’s antennas, and $\mathbf{V}_j$ is a matrix of i.i.d. complex Gaussian random variables with zero mean and variance $\sigma_j^2$, representing receiver noise. We note that because of the assumption that the channel is non-dispersive in time, $\mathbf{Y}_j$ constitutes a sufficient statistic for the detection of the symbol vector $\mathbf{s} = [s_1, \ldots, s_U]^T$, hence the symbol epoch is not explicitly mentioned.

[Figure 2.1: Transmission system model]

It is assumed that in the jth MIMO channel, each flat fading coefficient has the same variance of $\sigma_{h,j}^2$, and that the $t \cdot r_j$ channels are independent, and so

$$E[\mathbf{H}_{j,0}^H \mathbf{H}_{j,0}] = r_j \sigma_{h,j}^2 \mathbf{I}. \qquad (2.3)$$

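By stacking the columns of $\mathbf{Y}_j$, we obtain the familiar linear signal model

$$\mathbf{y}_j = \mathbf{H}_j \mathbf{C} \mathbf{s} + \mathbf{v}_j, \qquad (2.4)$$

where

$$\mathbf{y}_j = \mathrm{vect}(\mathbf{Y}_j) \in \mathbb{C}^{r_j G \times 1},$$

$$\mathbf{H}_j = \mathrm{Diag}(\mathbf{H}_{j,0}, \ldots, \mathbf{H}_{j,0}) \in \mathbb{C}^{r_j G \times tG},$$

$$\mathbf{C} = [\mathrm{vect}(\boldsymbol{\Phi}_1), \ldots, \mathrm{vect}(\boldsymbol{\Phi}_U)] \in \mathbb{C}^{tG \times U},$$

and

$$\mathbf{v}_j = \mathrm{vect}(\mathbf{V}_j) \in \mathbb{C}^{r_j G \times 1}.$$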

Essentially, each column of C corresponds to a space-time modulation waveform for one data stream. For transmitter design, we are interested in finding C; for receiver design, the signal model suggests that joint detection at each receiver is necessary to mitigate the inter-user interference that inevitably exists when the channel is unknown to the transmitter¹.
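The equivalence between the matrix form (2.2) and the stacked form (2.4) can be checked numerically. The sketch below is a noiseless illustration under assumed, deliberately small dimensions (t = G = r = 2, U = 3) with random spreading matrices; the helper names `matmul`, `vec` and `block_diag` are introduced here and are not part of the thesis.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def vec(M):
    """Stack the columns of M into one column vector (list of 1-element rows)."""
    return [[M[r][c]] for c in range(len(M[0])) for r in range(len(M))]

def block_diag(H0, G):
    """Diag(H0, ..., H0) with G copies of the r x t block H0."""
    r, t = len(H0), len(H0[0])
    H = [[0] * (t * G) for _ in range(r * G)]
    for g in range(G):
        for i in range(r):
            for k in range(t):
                H[g * r + i][g * t + k] = H0[i][k]
    return H

random.seed(1)
t, G, r, U = 2, 2, 2, 3   # small hypothetical dimensions
rand_c = lambda: complex(random.gauss(0, 1), random.gauss(0, 1))
Phi = [[[rand_c() for _ in range(G)] for _ in range(t)] for _ in range(U)]
H0 = [[rand_c() for _ in range(t)] for _ in range(r)]
s = [rand_c() for _ in range(U)]

# Matrix form (2.2) without noise: Y = sum_i s_i H0 Phi_i.
HP = [matmul(H0, P) for P in Phi]
Y = [[sum(s[i] * HP[i][a][b] for i in range(U)) for b in range(G)] for a in range(r)]

# Stacked form (2.4) without noise: y = H C s, with C = [vec(Phi_1), ..., vec(Phi_U)].
cols = [vec(P) for P in Phi]
C = [[cols[i][k][0] for i in range(U)] for k in range(t * G)]
y = matmul(matmul(block_diag(H0, G), C), [[si] for si in s])

max_err = max(abs(u[0] - v[0]) for u, v in zip(vec(Y), y))
print(max_err)
```

The block diagonal structure of $\mathbf{H}_j$ reflects the flat-fading assumption: the same $\mathbf{H}_{j,0}$ acts on every chip, so each column of S is mapped independently to the corresponding column of $\mathbf{Y}_j$.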

2.2 Transmitted Signal Design

2.2.1 Assumptions and Goals

We consider the problem of designing the spreading matrices $\boldsymbol{\Phi}_i$, i = 1,...,U, or equivalently the matrix C, assuming that no channel information is available at the base station transmitter, whereas receiver j knows $\mathbf{H}_j$ perfectly at all times.

There are two main factors influencing the performance of a multiuser MIMO system: multi-access interference (MAI) and diversity gain. To achieve single-user performance for all users, or no MAI, requires that

$$\mathbf{C}^H \mathbf{H}_j^H \mathbf{H}_j \mathbf{C} = \mathbf{D}_j, \qquad (2.5)$$

where $\mathbf{D}_j$ is a diagonal matrix, for all j ∈ {1,...,U} and all realizations of the channel $\mathbf{H}_j$. However, since the transmitter does not know $\mathbf{H}_j$, zero MAI is not possible through spreading matrix design. Instead, we introduce a much looser objective, that of achieving zero MAI on average, i.e.

$$E[\mathbf{C}^H \mathbf{H}_j^H \mathbf{H}_j \mathbf{C}] = \mathbf{D}_j, \qquad (2.6)$$

where expectation is taken over the distribution of $\mathbf{H}_j$.

¹ This precludes any sort of precoding for performance improvement.

From (2.3), we have

$$E[\mathbf{H}_j^H \mathbf{H}_j] = r_j \sigma_{h,j}^2 \mathbf{I}, \qquad (2.7)$$

and hence (2.6) leads to

Design Requirement 1: In order for the average MAI to be zero at each receiver, the spreading matrices represented by C should be designed so that $\mathbf{C}^H \mathbf{C} = \mathbf{D}$, where D is a diagonal matrix.

While Requirement 1 is very loose and easily satisfied, it is nonetheless important because violating it leads to non-zero MAI, even on average. Therefore we impose the constraint that C has orthogonal columns. Since C has U columns and Gt rows, the orthogonality constraint also leads to a strict upper limit on the number of users (symbols) that can be supported in this system:

U ≤ Gt (2.8)

To tackle the issue of diversity gain, assume that receiver j performs joint maximum likelihood (ML) decoding of s based on $\mathbf{y}_j$. The transmitted space-time codeword is S from (2.1), where $\mathbf{S} \in \mathcal{S}$ and the set $\mathcal{S}$ has cardinality $|\mathcal{S}| = M^U$ if M-ary modulation is used for each $s_i$. From Tarokh’s work [2], we know that full transmit diversity is obtained when $\mathbf{S}_m - \mathbf{S}_n$ has a rank of t for every pair of distinct codewords $(\mathbf{S}_m, \mathbf{S}_n) \in \mathcal{S} \times \mathcal{S}$. Given the structure of (2.1), we have

$$\mathbf{E}_{m,n} = \mathbf{S}_m - \mathbf{S}_n = \sum_{i=1}^{U} (s_{m,i} - s_{n,i}) \boldsymbol{\Phi}_i, \qquad (2.9)$$

where $s_{m,i}$ and $s_{n,i}$ are the ith symbols in the mth and nth codewords respectively.

The codeword difference matrix $\mathbf{E}_{m,n}$ has dimensions t × G, so for it to have a rank of t requires

Design Requirement 2: The temporal spreading factor G must be at least equal to the number of transmit antennas, i.e. G ≥ t.

In addition, a necessary condition for $\mathbf{E}_{m,n}$ to have a rank of t for all codeword pairs is

Design Requirement 3: Every spreading matrix $\boldsymbol{\Phi}_i$, i = 1,...,U, must have a rank of t.

The proof of this statement is trivial. In fact, $\mathbf{E}_{m,n}$ for the pair $(\mathbf{S}_m, \mathbf{S}_n)$ with $s_{m,i} \neq s_{n,i}$ and $s_{m,j} = s_{n,j}$ for j ≠ i, j = 1, ..., U, has to be full rank. This implies that each spreading matrix $\boldsymbol{\Phi}_i$ has to be full rank. It has to be mentioned that design requirements 2 and 3 are necessary, but as will be shown shortly, they are not sufficient to achieve full diversity. In the next section, we present a spreading matrix design based on Walsh-Hadamard matrices which meets all three design requirements.

2.2.2 Spreading Matrix Design

Consider a system with U = Gt users², where G ≥ t determines the bandwidth expansion of the system, and t can be any positive integer. In the proposed design, G must be a power of two because of the use of Walsh-Hadamard basis vectors. It will be clear shortly that the proposed algorithm works for U < Gt too, and therefore the assumption of U = Gt is non-restrictive and adopted for convenience only.

We first divide the U users into t groups of G users each. Length-G Walsh codes are assigned to each group, and user g (g = 1,...,G) in group n (n = 1, ..., t) is assigned Walsh code g, denoted $\mathbf{w}_g$. Observe that the same Walsh code is used by t users in the system.

Then the spreading matrix $\boldsymbol{\Phi}_i$³ for user g of group n is formed by “threading” $\mathbf{w}_g$ diagonally starting from antenna n. To illustrate, suppose t = 3 and G = 4. Then the four spreading matrices of group one are:

$$\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \end{bmatrix} \qquad (2.10)$$

those for group two are:

$$\begin{bmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & -1 & 0 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 0 & -1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 0 \end{bmatrix} \qquad (2.11)$$

and the four matrices for group 3 are:

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix},\ \begin{bmatrix} 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & -1 \end{bmatrix},\ \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \end{bmatrix},\ \begin{bmatrix} 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}. \qquad (2.12)$$

² For convenience and without loss of generality, we assume that U = K, i.e. each data stream belongs to a distinct user.

³ The index i depends on g and n according to i = (n − 1)G + g.

In the rest of the chapter, we will assume without loss of generality that each $\boldsymbol{\Phi}_i$ is scaled so that $\mathbf{c}_i^H \mathbf{c}_i = 1$, where $\mathbf{c}_i = \mathrm{vec}(\boldsymbol{\Phi}_i)$. The transmitted symbol energy will then be equal to $E|s_i|^2$.

Referring to the second matrix in (2.11) for instance, we see that user 2 of group 2 (i = 6) transmits its symbol $s_6$ over antenna 2 in chip interval 1, $-s_6$ over antenna 3 at chip 2, $s_6$ over antenna 1 at chip 3, and $-s_6$ over antenna 2 at chip 4. All 12 users transmit over the three antennas and four chips using their assigned spreading matrices.

For more examples, one can see Appendix A and [30].
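The threading procedure can be sketched in code. The rule used below, in which chip c of group n is sent on antenna (n − 1 + c) mod t, and the Sylvester ordering of the Walsh codes are inferred from the t = 3, G = 4 example above rather than stated in the text, so they should be read as assumptions. The function names `walsh`, `spreading_matrix` and `vec` are introduced for this illustration, and the matrices are left unnormalized.

```python
def walsh(G):
    """Sylvester-ordered Walsh-Hadamard codes of length G (G a power of two)."""
    W = [[1]]
    while len(W) < G:
        W = [row + row for row in W] + [row + [-v for v in row] for row in W]
    return W

def spreading_matrix(n, g, t, G):
    """t x G matrix for user g of group n (both 1-based): Walsh code g is
    threaded diagonally, chip c going to antenna (n - 1 + c) mod t."""
    w = walsh(G)[g - 1]
    Phi = [[0] * G for _ in range(t)]
    for c in range(G):
        Phi[(n - 1 + c) % t][c] = w[c]
    return Phi

def vec(M):
    """Stack the columns of M into a flat list."""
    return [M[r][c] for c in range(len(M[0])) for r in range(len(M))]

t, G = 3, 4
# Stream index i = (n - 1) G + g, stored at position i - 1.
Phis = [spreading_matrix(n, g, t, G) for n in range(1, t + 1) for g in range(1, G + 1)]
C_cols = [vec(P) for P in Phis]            # unnormalized columns of C
gram = [[sum(a * b for a, b in zip(ci, ck)) for ck in C_cols] for ci in C_cols]

for row in Phis[5]:                        # i = 6: user 2 of group 2
    print(row)
```

The Gram matrix `gram` is G times the identity here, so the columns of C are orthogonal as Design Requirement 1 demands: streams in the same group are separated by their Walsh codes, and streams in different groups never occupy the same antenna in the same chip.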

It is straightforward to verify that this design procedure satisfies all the design requirements stated in the last section for any t and all values of G that are powers of two. If we use non-binary spreading vectors $\mathbf{w}_g$, we can even relax the requirement that $G = 2^k$, $k \in \mathbb{Z}_+$, by choosing any set of G orthonormal length-G basis vectors which have only non-zero elements. However, the Walsh-code design just described has a very useful and unique property, captured in the following lemma.

Lemma 1 For any $G = 2^k$, where $k \in \mathbb{Z}_+$, and $t \in \mathbb{Z}_+$ such that G ≥ t, if U ≤ Gt, the proposed design results in $U_{int}$ mutually interfering data streams at each receiver, where

$$U_{int} = \min(U, 2^{k_0} t), \qquad (2.13)$$

and $k_0$ is the integer satisfying $\log_2 t \le k_0 < \log_2 t + 1$, which implies that $k_0 \le k$, regardless of the channel realizations $\mathbf{H}_j$, j = 1,...,U. This indicates that by increasing G above $2^{k_0}$ to activate more users, the number of interferers is not necessarily increased.

Proof

The number of mutually interfering data streams at receiver $j$ can be found from the maximum number of non-zero elements in the rows of $\mathbf{C}^H \mathbf{H}_j^H \mathbf{H}_j \mathbf{C} \in \mathbb{C}^{U \times U}$. Clearly if $U < 2^{k_0} t$, then $U_{int}$ is at most equal to $U$ and this explains the first part of the lemma.

To show that $U_{int}$ does not grow with $U$ for values of $U > 2^{k_0} t$, let $G_0 = 2^{k_0}$ and let $\mathbf{C}_0$ be a $tG_0 \times U$ matrix which results from applying our design procedure when $G = G_0$. As well, let $\mathbf{H}_j^0$ be the $\mathbf{H}_j$ matrix for this system and fix $U$ at its maximum value of $Gt$ for all values of $G$. Now consider $G = G_1 = 2G_0 = 2^{k_0+1}$. Because of the Walsh-code basis of our design, we can arrange the $2tG_0 \times 2U$ spreading code matrix $\mathbf{C}_1$ so that

$$\mathbf{C}_1 = \begin{bmatrix} \mathbf{C}_0 & \mathbf{C}_0 \\ \mathbf{C}_0 & -\mathbf{C}_0 \end{bmatrix}. \tag{2.14}$$

Furthermore, with the doubling of the spreading factor from $G_0$ to $2G_0$, $\mathbf{H}_j$ will have twice its original number of columns and rows, and thus we have a new $\mathbf{H}_j$ matrix

$$\mathbf{H}_j^1 = \mathrm{Diag}(\mathbf{H}_j^0, \mathbf{H}_j^0). \tag{2.15}$$

To find $U_{int}$, we form

$$\mathbf{C}_1^H \mathbf{H}_j^{1H} \mathbf{H}_j^1 \mathbf{C}_1 = \begin{bmatrix} \mathbf{C}_0^H & \mathbf{C}_0^H \\ \mathbf{C}_0^H & -\mathbf{C}_0^H \end{bmatrix} \begin{bmatrix} \mathbf{H}_j^{0H} \mathbf{H}_j^0 & \mathbf{0} \\ \mathbf{0} & \mathbf{H}_j^{0H} \mathbf{H}_j^0 \end{bmatrix} \begin{bmatrix} \mathbf{C}_0 & \mathbf{C}_0 \\ \mathbf{C}_0 & -\mathbf{C}_0 \end{bmatrix} = \begin{bmatrix} \mathbf{C}_0^H \mathbf{H}_j^{0H} \mathbf{H}_j^0 \mathbf{C}_0 & \mathbf{0} \\ \mathbf{0} & \mathbf{C}_0^H \mathbf{H}_j^{0H} \mathbf{H}_j^0 \mathbf{C}_0 \end{bmatrix}.$$

Since the maximum number of non-zero elements in each row of this matrix is exactly equal to that of $\mathbf{C}_0^H \mathbf{H}_j^{0H} \mathbf{H}_j^0 \mathbf{C}_0$, which is $2^{k_0} t$, the number of interferers remains unchanged even though the number of users is doubled.

By induction, we can now conclude that the number of interferers when $G > G_0$ is the same as when $G = G_0$, i.e. $U_{int} = 2^{k_0} t$. Finally, since the number of interfering users cannot increase when $U$ takes values smaller than $Gt$, the result holds for all values of $U \le Gt$. $\blacksquare$
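The doubling step of the proof can be checked numerically. The sketch below is an illustration under simplifying assumptions: it uses small real integer matrices (so the Hermitian transpose reduces to the transpose), and `C0` and `H0` are arbitrary stand-ins rather than matrices produced by the thesis design procedure. With the unnormalized extension (2.14), each diagonal block comes out as twice $\mathbf{C}_0^H \mathbf{H}_j^{0H} \mathbf{H}_j^0 \mathbf{C}_0$; this scaling does not change the zero pattern that the proof relies on.

```python
def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def block_diag2(A):
    """Diag(A, A), as in (2.15)."""
    cols = len(A[0])
    return [row + [0] * cols for row in A] + [[0] * cols + row for row in A]


def walsh_double(C0):
    """[[C0, C0], [C0, -C0]], the extension step (2.14)."""
    return [row + row for row in C0] + [row + [-x for x in row] for row in C0]


def gram(H, C):
    """C^T H^T H C for real matrices."""
    HC = matmul(H, C)
    return matmul(transpose(HC), HC)


# Arbitrary illustrative values (not from the thesis).
C0 = [[1, 1, 0], [1, -1, 0], [0, 1, 1], [0, 1, -1]]  # 4 x 3
H0 = [[2, -1, 0, 3], [1, 4, -2, 0]]                   # 2 x 4

R0 = gram(H0, C0)
R1 = gram(block_diag2(H0), walsh_double(C0))
U = len(R0)

# Off-diagonal U x U blocks vanish; diagonal blocks are scaled copies of R0.
off_diag_zero = all(
    R1[i][j + U] == 0 and R1[i + U][j] == 0 for i in range(U) for j in range(U)
)
diag_blocks_match = all(
    R1[i][j] == 2 * R0[i][j] == R1[i + U][j + U] for i in range(U) for j in range(U)
)
```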

The implications of Lemma 1 can be revealed by re-visiting the $G = 4$, $t = 3$ example presented earlier. In this case, $k_0 = 2$, and so $G = 2^{k_0}$. The number of interferers is the number of users, i.e. $U_{int} = U = 12$. Suppose that $G$ is increased to 8, and the number of users to $U = 24$. Lemma 1 tells us that $U_{int}$ does not increase to 24 but instead remains at 12. This means that a joint detector of the same complexity can be used for both the 12-user and 24-user systems and still give the same performance. If there are only a small number of transmit antennas, say $t = 2$, even the ML detector can be used at each receiver because each user effectively acts inside a four-user system only: regardless of the actual number of users in the system, the number of interferers is limited to 3. Also, to achieve high performance with suboptimal detectors, we only need $r_j G \ge U_{int}$. For example, with $t = 4$, $G = 8$ and $U = 32$, equation (2.13) gives $U_{int} = 16$ at each receiver (each stream sees 15 interferers), and $r_j = 2$ is sufficient.

Remarks:

1. Note that Lemma 1 relies only on the Walsh matrix structure (2.14) being used for extensions to higher orders, i.e. for any $G_0 \times G_0$ orthogonal matrix $\mathbf{C}_0$, if $\mathbf{C}_k$ is a $2^k G_0 \times 2^k G_0$ orthogonal matrix formed using (2.14), then Lemma 1 holds. So any orthogonal matrix may be used as a "seed" for a whole family of orthogonal matrices of order $2^k G_0$, $k \in \mathbb{Z}^+$, where $G_0$ is the order of the seed matrix. In this work, we are interested in antipodal binary orthogonal matrices because they are easy to generate, and this constraint leads us to Walsh matrices as the natural choice.

2. To multiplex $U = Gt$ users with only $G_0 t$ mutual interferers, as an alternative, one can divide the $G$ chip intervals into $U/(G_0 t)$ time slots, each spanning $G_0$ chips (TDMA). Thus, the dimensionality of the received signal when the receiver has $r_j$ antennas is $r_j G_0$. Assuming that the signal dimensionality has to be no smaller than the number of mutually interfering users for good performance$^4$, then with this TDMA-type transmission, $r_j \ge t$. On the other hand, in 2D-STSC, the dimensionality of the received signal is $r_j G$ because the time axis has not been divided, so setting $r_j G$ to a minimum of $G_0 t$ yields $r_j \ge G_0 t / G$. Hence

$$\frac{r_j(\text{TDMA})}{r_j(\text{2D-STSC})} = \frac{G}{G_0} = 2^k \ge 1, \tag{2.16}$$

since $k \ge 0$, showing that a TDMA scheme may require more receive antennas than the proposed 2D-STSC method.
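The comparison in Remark 2 can be made concrete with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and rounding the bound $G_0 t / G$ up to a whole number of antennas is an assumption added here, not a statement from the text.

```python
def min_rx_antennas_tdma(t: int) -> int:
    """TDMA-type slotting: r_j * G0 >= G0 * t, hence r_j >= t."""
    return t


def min_rx_antennas_2dstsc(G: int, G0: int, t: int) -> int:
    """2D-STSC: r_j * G >= G0 * t, rounded up to an integer antenna count."""
    return -(-G0 * t // G)  # integer ceiling of G0 * t / G


# Example with t = 4, G0 = 4 and G = 8 (one doubling of the seed order).
tdma_antennas = min_rx_antennas_tdma(4)
stsc_antennas = min_rx_antennas_2dstsc(G=8, G0=4, t=4)
```

With these inputs the TDMA bound is twice the 2D-STSC bound, in line with the ratio $G/G_0 = 2$.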

2.2.3 Constellation and Power Allocation

So far we have designed the spreading matrices $\Phi_i$ to ensure zero average MAI, and satisfy a necessary condition for full transmit diversity. However, full diversity is guaranteed only if $\mathbf{E}_{m,n}$ has full rank. If all users employ the same modulation format, this condition will not be met. Again, consider the example introduced earlier in this section. If the users in the first group all use antipodal BPSK, i.e. $s_1, s_2, s_3, s_4 \in \{-1, +1\}$, and we consider a pair of codewords which differ only in $s_1$, $s_2$, $s_3$ and $s_4$ as follows:

Codeword    s1    s2    s3    s4
A            1    −1    −1     1
B           −1     1     1    −1

then the rank of $\mathbf{E}_{m,n}$ is one.

To overcome the problem, we need to ensure that the users in the same group, which are transmitted from the same antenna, employ different constellations, through constellation rotation for instance, as described by Giraud et al. [31], DaSilva and Sousa [32], or Damen et al. [7]. In this work, we examine the technique in [7], and show through simulations that indeed, for each user $j$, full diversity ($t r_j$) is obtained with ML detection. However, this is not the focus of this work because in the downlink, we are more interested in low-complexity suboptimal receivers. The reader interested in the design of constellation rotations is referred to [7].

$^4$ This corresponds to a relative load of unity.

With linear detectors, the diversity gain is upper bounded by the number of excess degrees of freedom in the receiver. In [33], it is shown that, for $M$ transmitted symbols per channel use (PCU), with transmit diversity order of one, the achieved diversity order, $L$, at a receiver with $r_j$ antennas is $L = r_j - M + 1$. Extending this result to our system model with transmit diversity order of $t$, we hypothesize that

$$L = t[r_j - M]^+ + 1, \tag{2.17}$$

where $[\theta]^+ = \theta$ for $\theta \ge 0$ and $[\theta]^+ = 0$ for $\theta < 0$. This implies that if $t = r_j$, for a full rate transmission ($M = t$), then $L = 1$. The same result is also obtained in [20] for TAST codes with rotated constellation. Therefore, as we show by simulation results, constellation rotation appears to be inadequate to achieve full diversity with linear/suboptimal detectors.
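The hypothesized relation (2.17) is easy to evaluate for a given configuration. The helper below is a minimal sketch with a hypothetical name; it takes $M$ as a given input and makes no assumption about how $M$ is derived from the number of users.

```python
def hypothesized_diversity_order(t: int, r: int, M: float) -> float:
    """L = t * [r - M]^+ + 1 from (2.17), with [x]^+ = max(x, 0)."""
    return t * max(r - M, 0) + 1


# Full-rate transmission with t = r = 2 and M = t.
full_rate_order = hypothesized_diversity_order(t=2, r=2, M=2)

# With t = 1 the expression reduces to r - M + 1, the result quoted from [33].
single_tx_order = hypothesized_diversity_order(t=1, r=4, M=2)
```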

To improve the achieved diversity gain with a non-linear SIC, we need to maximize its performance by allocating different powers across the interfering data streams in each group [34]. This is because the symbols are decoded successively and the first decoded symbols do not benefit from much interference cancellation. For these first symbols to be decoded reliably, they must be the ones received with the highest SINR.

With downlink power control, in general, users are allocated different transmit powers based on their received signal strengths so that the same quality of service is provided to all users, in spite of their different channel states. Since every user’s signal is transmitted through the same channel to arrive at any given receiver, the differences in transmit powers are preserved at the receivers. They can then sort the interfering users according to decreasing SINR, and decode using the SIC to maximum effect. This means that transmitting with a range of signal powers is a natural result of downlink power control, and the SIC is well-suited for use on the downlink channel, which is why it is the detector structure of greatest interest in this work.

Devising an optimal power control algorithm for the downlink channel is beyond the scope of this work. However, we develop a simple power control scheme which can be implemented with very low-rate feedback. We consider $l$ different levels of power such that the interfering users in the same group (the ones transmitted from the same antenna) are allocated different powers. Since there are $t$ groups and $U_{int}$ co-channel users, we have $U_{int}/t$ co-channel users per group. Therefore it is sensible to define

$$l = U_{int}/t = \min(G, 2^{k_0}), \tag{2.18}$$

where $U_{int}$ and $k_0$ are defined in equation (2.13). Each user is then required to measure the received signal-to-interference-noise ratio (SINR), compare it with $l$ threshold levels, and then feed back $\log_2 l$ bits to the base station. At the base station, the transmitted power is adjusted so that a user with a higher SINR gets less power. Also, the spreading codes/matrices are assigned to the users so that, as far as possible, symbols with the same power are transmitted from different antennas. For SIC to perform correctly, the ordering of the users has to be sent to the receivers.
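The number of power levels in (2.18) and the resulting feedback load can be tabulated as follows. This is a sketch for a fully loaded system ($U = Gt$) with $G$ a power of two; the function names are illustrative.

```python
def num_power_levels(G: int, t: int) -> int:
    """l = min(G, 2**k0) from (2.18), with k0 = ceil(log2(t))."""
    k0 = (t - 1).bit_length()
    return min(G, 1 << k0)


def feedback_bits(levels: int) -> int:
    """log2(l) bits per user; l is a power of two under the stated assumptions."""
    return (levels - 1).bit_length()


# t = 2, G = 4: two power levels, so one feedback bit per user.
levels_t2 = num_power_levels(G=4, t=2)
bits_t2 = feedback_bits(levels_t2)

# t = 4, G = 4: four power levels, so two feedback bits per user.
levels_t4 = num_power_levels(G=4, t=4)
bits_t4 = feedback_bits(levels_t4)
```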

The simulation results show that even with this simple power control algorithm, the two-stage interference canceller (Figure 2.2) explained in Section 2.3.2 is able to achieve almost the same diversity gain as ML detection, when the bit error rate (BER) is averaged over all users in the system.

It can be easily verified that with this power allocation scheme, the difference matrix, $\mathbf{E}_{m,n}$, is full rank, and full diversity is achieved by ML detection. For instance, in the previous example, if we define $A_k = \sqrt{P_k}$, $k = 1, \ldots, U$, then

$$\mathbf{E}_{m,n} = \begin{bmatrix} A_1 - A_2 - A_3 + A_4 & 0 & 0 & A_1 + A_2 + A_3 + A_4 \\ 0 & A_1 + A_2 - A_3 - A_4 & 0 & 0 \\ 0 & 0 & A_1 - A_2 + A_3 - A_4 & 0 \end{bmatrix},$$

which has a rank of 3. However, to optimize the performance with the ML receiver, the powers should be defined to maximize the coding gain according to the ML criterion (see Section 1.2.2).
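The rank claim can be checked for specific amplitudes. The sketch below builds the $3 \times 4$ difference matrix of this example, with rows $(A_1 - A_2 - A_3 + A_4,\ 0,\ 0,\ A_1 + A_2 + A_3 + A_4)$, $(0,\ A_1 + A_2 - A_3 - A_4,\ 0,\ 0)$ and $(0,\ 0,\ A_1 - A_2 + A_3 - A_4,\ 0)$, and computes its rank exactly. The amplitude values are illustrative choices: $(1, 2, 4, 8)$ corresponds to powers in a geometric ratio of 4, and $(1, 1, 1, 1)$ to equal powers.

```python
from fractions import Fraction


def rank(matrix):
    """Matrix rank by Gaussian elimination over exact rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(found, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for i in range(found + 1, len(rows)):
            factor = rows[i][col] / rows[found][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[found])]
        found += 1
        if found == len(rows):
            break
    return found


def difference_matrix(A1, A2, A3, A4):
    """Difference matrix of the example for amplitudes A_k = sqrt(P_k)."""
    return [
        [A1 - A2 - A3 + A4, 0, 0, A1 + A2 + A3 + A4],
        [0, A1 + A2 - A3 - A4, 0, 0],
        [0, 0, A1 - A2 + A3 - A4, 0],
    ]


rank_equal_power = rank(difference_matrix(1, 1, 1, 1))
rank_geometric_power = rank(difference_matrix(1, 2, 4, 8))
```

For these inputs the equal-power case collapses to rank one, while the geometric allocation gives rank three.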

In this work, concentrating on SIC, different powers are selected geometrically. We get better performance for larger ratio of powers; however, a large ratio of the powers may not be practically feasible. The dynamic range of the power for each user has to be taken into account as well. Also, if the users are in the same channel states (which is very unlikely) and they all demand the same BER, we cannot have any arbitrary ratio for the allocated powers.

In a single-user ($K = 1$) system, the same spreading scheme can be employed over different symbols, to design full-rate space-time codes. In that case, it gives a structure similar to the threaded algebraic space-time (TAST) code of Damen et al. [7]. However, the proposed design allows us to obtain vastly improved performance with suboptimal detectors, because of the unequal power allocation feature. The number of time slots, $T$, is chosen to be $T = 2^{k_0}$, where $k_0$ is the integer satisfying $\log_2 t \le k_0 < \log_2 t + 1$. Without any knowledge of the channel, we deliberately allocate powers unequally to the symbols transmitted from the same antenna. As explained before, with this technique, full diversity is achieved by ML detection. In addition to design simplicity, the main advantage of this scheme is the ability to obtain almost full diversity with two-stage IC, which is less complex than ML detection. In fact this would lead to a new family of space-time codes with reduced receiver complexity. In Section 2.5 we compare the performance of these codes with TAST codes when two-stage IC is applied at the receiver.

2.3 Receiver Structures

We study both optimal (ML) and suboptimal (SIC) detection at the receiver.

2.3.1 Joint ML Detection

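In the ML detection algorithm, each receiver searches for the symbol vector, $\mathbf{s}$, that maximizes the conditional density $P(\mathbf{y}_j \mid \mathbf{s})$, which is equivalent to

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s}} \|\mathbf{y}_j - \mathbf{H}_j \mathbf{C} \mathbf{s}\|^2 = \arg\min_{\mathbf{s}} \left( \mathbf{s}^H \mathbf{C}^T \mathbf{H}_j^H \mathbf{H}_j \mathbf{C} \mathbf{s} + \mathbf{y}_j^H \mathbf{y}_j - \mathbf{y}_j^H \mathbf{H}_j \mathbf{C} \mathbf{s} - \mathbf{s}^H \mathbf{C}^T \mathbf{H}_j^H \mathbf{y}_j \right). \tag{2.19}$$

From Lemma 1, we know that for $\log_2 t \le k_0 < \log_2 t + 1$, if $G = 2^k > 2^{k_0}$, then $\mathbf{C}^T \mathbf{H}_j^H \mathbf{H}_j \mathbf{C} = \mathrm{Diag}(\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_{2^{k-k_0}})$, where $\mathbf{A}_q$, $q = 1, \ldots, 2^{k-k_0}$, is a $2^{k_0} t \times 2^{k_0} t$ matrix. In other words, we have $2^{k-k_0}$ orthogonal subgroups, $\mathbf{s}^q$, $q = 1, \ldots, 2^{k-k_0}$, of the user symbols, and each $\mathbf{s}^q$ has $2^{k_0} t$ symbols. Therefore equation (2.19) can be written as

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s}} \sum_{q=1}^{2^{k-k_0}} \left( \|\mathbf{A}_q \mathbf{s}^q\|^2 - \mathbf{s}^{qH} \mathbf{r}^q - \mathbf{r}^{qH} \mathbf{s}^q \right) = \arg\min_{\mathbf{s}^q} \sum_{q=1}^{2^{k-k_0}} f(\mathbf{s}^q, \mathbf{r}^q), \tag{2.20}$$

where $\mathbf{r} \triangleq \mathbf{C}^T \mathbf{H}_j^H \mathbf{y}_j$, and $\mathbf{r}^q$ is the vector of the elements of $\mathbf{r}$ associated with $\mathbf{s}^q$. Therefore ML optimization can be performed over each subgroup separately. In fact, instead of one optimization over $U$ symbols, we have $2^{k-k_0}$ parallel optimizations over $2^{k_0} t$ symbols, which significantly decreases the computational complexity of ML detection. Since we have parallel detectors, to improve the achieved diversity gain, constellation rotation or power allocation needs to be applied only over each subgroup $\mathbf{s}^q$.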
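The separability argument can be illustrated with a brute-force search. The sketch below is a simplified real-valued BPSK instance with made-up numbers: the cost is taken as the quadratic form $\mathbf{s}^T \mathbf{R} \mathbf{s} - 2\mathbf{r}^T \mathbf{s}$ with a block-diagonal $\mathbf{R}$ standing in for the block-diagonal Gram matrix, and the names `cost` and `ml_search` are hypothetical. It shows that one joint search over four symbols returns the same decision as two independent searches over two symbols each.

```python
from itertools import product


def cost(R, r, s):
    """Quadratic ML metric s^T R s - 2 r^T s for real-valued symbols."""
    n = len(s)
    quad = sum(s[i] * R[i][j] * s[j] for i in range(n) for j in range(n))
    return quad - 2 * sum(ri * si for ri, si in zip(r, s))


def ml_search(R, r):
    """Exhaustive BPSK search for the minimizer of the metric."""
    return min(product((1, -1), repeat=len(r)), key=lambda s: cost(R, r, s))


# Two 2 x 2 diagonal blocks (illustrative values only).
A1 = [[2, 1], [1, 2]]
A2 = [[3, -1], [-1, 2]]
R = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, -1],
    [0, 0, -1, 2],
]
r = [1.5, -0.7, 0.4, 2.1]

joint = ml_search(R, r)                                 # 2**4 candidates
blockwise = ml_search(A1, r[:2]) + ml_search(A2, r[2:])  # 2 * 2**2 candidates
```

The joint search visits $2^4$ candidates while the block-wise search visits $2 \cdot 2^2$, which is the complexity saving described above on a toy scale.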

2.3.2 Multi-Stage Successive Interference Cancellation

Interference cancellation is applied at each receiver to improve the achieved diversity gain by combating the interference from other users' symbols. In the multiuser detection literature, SIC is an effective approach to mitigating the interference between the users' symbols (see [35–37] and the references therein).

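[Block diagram: the received vector $\mathbf{y}_j$ passes through a chain of MMSE filters $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_{U_{int}}$ producing $\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_{U_{int}}$, each weighted by $\mathbf{a}_m$ and subtracted before the next filter; the sum $\sum_{n \ne j} \hat{s}_n \mathbf{a}_n$ is then subtracted from $\mathbf{y}_j$ and the result is passed to the matched filter $\mathbf{w}_{match}$ to give $\hat{s}_j$.]

Figure 2.2: Structure of two-stage SIC.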

In this work, a two-stage IC is considered. In the first stage, MMSE-SIC is employed: the symbols are detected using a linear MMSE detector, and cancelled in descending order according to their power levels. In the second stage, after cancelling out all other symbols, the desired symbol is detected based on matched filtering. The structure of receiver $j$ is shown in Figure 2.2. As explained above, interference cancellation is required only over the interfering symbols. Hence, if $\mathbf{C}_I$ is the spreading code matrix for the interfering symbols, which are sorted in descending order of power, we define $\mathbf{H}_{c,j} = \mathbf{H}_j \mathbf{C}_I$. With $\mathbf{y}_{1,j} = \mathbf{y}_j$, the notation in Figure 2.2 follows from the following two-step algorithm:

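Stage I:

Recursion ($m = 1 : U_{int}$):

$$\mathbf{H}_{m,j} = (\mathbf{H}_{c,j})_{m:U_{int}}$$

$$\mathbf{a}_m = (\mathbf{H}_{c,j})_m$$

$$\mathbf{w}_m = (\mathbf{H}_{m,j} \mathbf{H}_{m,j}^H + \sigma_j^2 \mathbf{I})^{-1} \mathbf{a}_m$$

$$\hat{s}_m = Q(\mathbf{w}_m^H \mathbf{y}_{m,j})$$

$$\mathbf{y}_{m+1,j} = \mathbf{y}_{m,j} - \mathbf{a}_m \hat{s}_m$$

Stage II:

$$\mathbf{z}_j = \mathbf{y}_j - \sum_{n \ne j}^{U_{int}} \hat{s}_n \mathbf{a}_n$$

$$\mathbf{w}_{match} = \mathbf{H}_j (\mathbf{C})_j$$

$$\hat{s}_j = Q(\mathbf{w}_{match}^H \mathbf{z}_j)$$

where $(\mathbf{H}_{c,j})_{m:U_{int}}$ and $(\mathbf{H}_{c,j})_m$ denote the columns $(m, \ldots, U_{int})$ and the $m$th column of $\mathbf{H}_{c,j}$ respectively, and $(\mathbf{C})_j$ is the spreading code vector for user $j$. In addition, $\mathbf{w}_m$ and $\mathbf{w}_{match}$ are the MMSE and matched-filter weight vectors respectively, $\sigma_j^2$ is the noise variance, and $Q(\cdot)$ is the quantization operation appropriate to the constellation in use.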
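The two-step algorithm above can be sketched in code as follows. This is an illustrative, real-valued BPSK version (so Hermitian transposes become plain transposes and $Q(\cdot)$ is a sign decision); the helper names, the example signatures and the noise variance are assumptions made for the demonstration and are not taken from the thesis. The columns of $\mathbf{H}_{c,j}$ are passed in already sorted by descending power, and the matched filter in Stage II is taken to be the desired user's own column.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            M[i] = [a - f * p for a, p in zip(M[i], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bpsk(x):
    """Q(.) for BPSK: nearest point in {-1, +1}."""
    return 1 if x >= 0 else -1


def two_stage_sic(y, columns, desired, noise_var):
    """columns[m] is a_m, sorted by descending power; desired is the index j."""
    dim = len(y)
    residual = list(y)
    s_hat = []
    # Stage I: MMSE detection and successive cancellation.
    for m, a_m in enumerate(columns):
        remaining = columns[m:]  # columns m..U_int of H_{c,j}
        cov = [[sum(c[i] * c[k] for c in remaining) + (noise_var if i == k else 0.0)
                for k in range(dim)] for i in range(dim)]
        w_m = solve(cov, a_m)
        s_m = bpsk(dot(w_m, residual))
        s_hat.append(s_m)
        residual = [ri - s_m * ai for ri, ai in zip(residual, a_m)]
    # Stage II: cancel all other symbols, then matched-filter the desired one.
    z = list(y)
    for n, a_n in enumerate(columns):
        if n != desired:
            z = [zi - s_hat[n] * ai for zi, ai in zip(z, a_n)]
    return bpsk(dot(columns[desired], z)), s_hat


# Noiseless toy example: three signatures with decreasing power, s = (1, -1, 1).
columns = [[4, 4, 4, 4], [2, -2, 2, 2], [1, 1, -1, 1]]
y = [3, 7, 1, 3]  # 1 * a_1 - 1 * a_2 + 1 * a_3
desired_hat, stage1_hat = two_stage_sic(y, columns, desired=1, noise_var=0.01)
```

In this noiseless example the first stage recovers all three symbols and the second stage reproduces the decision for the desired user.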

For two transmit antennas and $G \ge 2$, since the number of interfering users is small ($U_{int} = 4$), the first stage (SIC) can also be implemented by matched filtering, instead of MMSE, achieving the same performance. In this case, at the first stage we have

$$\mathbf{w}_m = \mathbf{H}_j (\mathbf{C})_m.$$

This simply equalizes the channel and then de-spreads each user symbol with its own spreading code. The simulation results show that the two-stage matched filter IC, incorporating power control, provides almost the same performance as ML detection.

2.4 Comparison With Other STC-CDMA Transceivers

Under flat fading conditions, an alternative methodology for transmission over the downlink in a MIMO multiuser channel is to assign an orthogonal spreading vector to each user, perform single-user STC on each user's message sequence, and then spread (in time only) user $j$'s signals on every antenna with user $j$'s spreading vector. The existing transmission strategies are based on BLAST [22] or orthogonal STBC [21, 23]. There are three proposals in [22]: (1) Spreading all the symbols of one user with the same code to maximize the spectral efficiency. (2) Assigning each user $t$ orthogonal spreading sequences, and then spreading each antenna's signal with its own spreading code. (3) Transmitting each symbol simultaneously over $n \le t$ antennas with $n$ different spreading codes, to improve the diversity gain. We have compared the above techniques with the proposed scheme, 2D-STSC, in Table 2.1, where the spectral efficiency, $e_s$, is defined as the number of transmitted symbols per channel use.

Scheme           Spectral efficiency, e_s    Diversity gain    Number of users, U
STBC-CDMA        e_s ≤ 1                     t·r               G
BLAST-CDMA(1)    min(t, r)                   r                 G
BLAST-CDMA(2)    min(t, r)/t                 r                 G/t
BLAST-CDMA(3)    min(t, r)/n                 n·r               G/n
2-D STSC         min(t, r) ≤ e_s ≤ t         t·r               min(t, r)·G ≤ U ≤ t·G

Table 2.1: Comparison between different STC schemes for the downlink in a MIMO multiuser channel

It should be mentioned that TAST codes, proposed in [7] for single-user systems, can also be extended to multiuser systems in the same way as explained for STBC in [23]. This may achieve the same spectral efficiency and diversity gain as 2-D STSC, but we are bound to transmit a certain number of symbols to each user and the maximum number of users is $G$. Besides, TAST codes are optimized based on the ML criterion, which does not guarantee high performance (full diversity) with suboptimal detectors. In [38], space-time diagonal sequences are proposed for spreading the data over space and time. This scheme may also achieve the same spectral efficiency and diversity gain as 2-D STSC. However, in this design, the number of transmit and receive antennas is required to be the same as the processing gain.

2.5 Simulations

The performance of the proposed scheme is evaluated in terms of the average bit error rate (BER) versus signal-to-noise ratio per receive antenna. In Monte Carlo simulations, we consider quasi-static fading channels in which the matrix channel is constant over the duration of each symbol. The channels between every pair of transmit and receive antennas are assumed to be flat Rayleigh fading and uncorrelated. However, the performance degradation under spatially correlated channels is also demonstrated. The results are averaged over more than $10^9$ different channels. Without loss of generality, we consider BPSK modulation, and the same number of receive antennas is assumed for all users ($r_j = r$, $j = 1, \ldots, K$).

[Plot: probability of error versus SNR (dB). Curves: Single-antenna, Decorrelator, MMSE, Diversity order 2, ML, ML with Rotation.]

Figure 2.3: Performance of 2-D STSC for different receivers, t = r = 2, G = 2, U = 4.

In Figure 2.3, we compare the performance of 2D-STSC when $t = r = 2$ for different receivers: decorrelator, MMSE, ML, and ML with constellation rotation. We have examined the rotation matrix proposed in [7] for $t = r = 2$, multiplying the transmitted signal vector $\mathbf{s}$ by a diagonal matrix $\Psi$,

$$\Psi = \mathrm{diag}(1, \phi, \theta, \phi\theta) \tag{2.21}$$

before spreading, where $\theta^2 = \phi$, and $\phi = e^{j/2}$. As discussed above, we get the same results for all $G = 2^k$, $k = 1, 2, \ldots$, and $U = tG$. The results are also compared with the single-antenna system, where Walsh codes are applied to separate the users. The bandwidth efficiency of the proposed scheme is twice that of a single-antenna system.

However, because of MAI, the achieved diversity order with MMSE and decorrelator is still one, which supports our conjecture in equation (2.17). The achieved diversity order with ML is two ($r$); but by applying the rotation (2.21) to the constellation points, the full diversity order, $t \cdot r = 4$, can be obtained.

[Plot: probability of error versus SNR (dB). Curves: u=16, u=12, u=10.]

Figure 2.4: The effect of MAI (number of users) on the achieved diversity with MMSE for t = r = 4, G = 4.

In Figure 2.4, the effect of MAI on the diversity achieved with an MMSE detector is shown for $t = r = G = 4$. For full multiplexing gain ($U = 16$), the diversity order is almost one, while decreasing the number of users to 10 raises it to four or more. Note that from equation (2.17), for $U = 10$, the achieved diversity order is expected to be $L = 5$.

In the above simulations, the $U$ channels have equal power, and therefore the symbols are sent with the same power. Assuming different channel power gains ($\sigma_{h,j}^2$) for different users, we define the average signal-to-noise ratio or $E_b/N_0$ per receive antenna as

$$E_b/N_0 = \frac{\sum_{j=1}^{U} P_j \sigma_{h,j}^2}{\sigma_j^2 \cdot U}. \tag{2.22}$$
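A short numerical reading of (2.22) may help. The sketch below assumes a common noise variance for all receivers, which is a simplification introduced here, and uses made-up power and gain values; the function names are illustrative.

```python
from math import log10


def average_eb_n0(powers, channel_gains, noise_var):
    """Sum of P_j * sigma_{h,j}^2 over users, divided by (noise_var * U)."""
    U = len(powers)
    return sum(p * g for p, g in zip(powers, channel_gains)) / (noise_var * U)


def to_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10 * log10(ratio)


# Four users, two power levels with P1/P2 = 4, unit channel gains.
eb_n0 = average_eb_n0(powers=[4, 1, 4, 1], channel_gains=[1, 1, 1, 1], noise_var=0.25)
eb_n0_db = to_db(eb_n0)

# The same helper shows that a power ratio of 4 is about 6 dB.
ratio_db = to_db(4)
```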

Figure 2.5 depicts the BER versus $E_b/N_0$ for a system with $t = r = 2$, $G = 4$ and eight users in different fading states (see Example I in Appendix A for the spreading matrix design). Following Lemma 1, it can be easily verified that the $[s_1, s_2, s_5, s_6]$ users do not interfere with $[s_3, s_4, s_7, s_8]$. As far as the performance of SIC is concerned, we do not need to allocate different powers to non-interfering users. Therefore we assume two different power levels, $P_1$ and $P_2$. The $[s_1, s_3, s_6, s_8]$ users are the ones with lower SINRs, allocated the power $P_1$, and the $[s_2, s_4, s_5, s_7]$ users are those with higher SINRs, allocated the power $P_2$. In this specific example, we have chosen $P_1/P_2 = 4$. If a 6 dB difference in the powers allocated to the two groups of users is not practically feasible, a smaller ratio may be considered for $P_1/P_2$. For smaller ratios, the performance of SIC may be degraded slightly. On the other hand, in a practical situation it is likely that all the users are allocated different powers as a result of downlink power control; what we want to indicate in this example is that with only two different power levels between two groups of users, the IC is able to achieve almost full diversity.

For $t = 2$, a matched filter SIC is employed in the first stage of IC. Using this simple power control (PC), which requires only one bit of feedback from the receivers to the transmitter, full diversity is achieved at the receiver. There is a gap of less than 1 dB between the ML and matched filter SIC detectors. There is a substantial performance gap between a system with no power control (NPC), even with ML detection, and a PC system. This is partly because with NPC, some of the users may be suffering from a weak channel state; hence the average BER over all users is high.

[Plot: probability of error versus SNR (dB). Curves: ML+PC, SIC+PC, SIC(NPC), ML+Rot.(NPC).]

Figure 2.5: The impact of power allocation on the performance of 2-D STSC for t = r = 2, U = 8.

Figure 2.6 shows the BER versus $E_b/N_0$ for a system with $t = r = 4$, $G = 4$, and $U = 16$ users. We examine two scenarios when an MMSE-SIC is employed in the first stage of IC. One is for the case in which the users have almost the same channel power gain, and no power allocation is applied at the transmitter. The other is for the case in which the users have different channel power gains and power control is applied at the transmitter. In the latter case, we assume four different power levels, $P_2/P_1 = 3$, $P_3/P_2 = 3$, $P_4/P_3 = 3$, such that the users transmitted from the same antenna are allocated different powers. The $[s_1, s_5, s_9, s_{13}]$ users are those with higher SINRs, and the $[s_4, s_8, s_{12}, s_{16}]$ users are those with lower SINRs in the system. The results show that power control significantly improves the achieved diversity order with SIC.

[Plot: probability of error versus SNR (dB). Curves: NPC+SIC, PC+SIC.]

Figure 2.6: The impact of power allocation on the performance of SIC for t = r = 4, G = 4, U = 16.

Figure 2.7 illustrates the performance of 2-D STSC under spatially correlated fading at the transmitter and the receiver. The correlation coefficients at the transmitter and the receiver are denoted by $\rho_t$ and $\rho_r$ respectively. As can be seen, at a BER of $10^{-3}$, there is a 1.7 dB and a 3 dB performance loss due to antenna correlation of $\rho_r = \rho_t = 0.25$ and $\rho_r = \rho_t = 0.75$ respectively, and a 2.2 dB loss due to correlation only on the receiver side ($\rho_r = 0.75$). The correlation does not impose a major penalty on the achieved diversity. As explained in [2] for space-time trellis codes, the performance loss because of the correlation may be quantified approximately by coding gain.

[Plot: probability of error versus SNR (dB). Curves: $\rho_r = \rho_t = 0$; $\rho_r = \rho_t = 0.25$; $\rho_r = \rho_t = 0.75$; $\rho_r = 0.75$, $\rho_t = 0$.]

Figure 2.7: Performance of 2-D STSC in correlated fading channels for t = r = 2.

Figure 2.8 compares 2-D STSC (PC and NPC) with BLAST-CDMA(1) and with TAST codes when the symbols are sent to different users and an SIC detector is applied at each receiver. We have $U = 4$ users and, to have a fair comparison, the same channel gain is assumed for all users. In 2-D STSC with power control, the users $(s_1, s_4)$ are allocated the same power $P_1$, and the users $(s_2, s_3)$ are allocated the same power $P_2$, with $P_1/P_2 = 4$. In BLAST-CDMA(1), since the symbols are received with different powers, no power control is required, and it has a better performance than TAST or 2-D STSC (NPC)$^5$. However, full diversity is not provided by BLAST-CDMA(1) (diversity order $= r$), while a diversity order of $t \cdot r$ can be achieved by 2-D STSC with power control.

$^5$ Note that, because of the coding structure in both TAST and 2-D STSC (NPC), all symbols are received with the same power at each user's receiver.

[Plot: probability of error versus SNR (dB). Curves: 2D-STSC+PC, 2D-STSC (NPC), TAST, BLAST-CDMA(1).]

Figure 2.8: Performance comparison of various schemes for multiuser channel in the downlink for t = r = 2, G = 2.

In Figure 2.9, we show that the proposed spreading code design method outperforms (a) randomly generated spreading codes, and (b) codes that yield an orthogonal $\mathbf{C}$. In the latter case $\mathbf{C}$ is a Hadamard matrix; for instance, the spreading matrices for users 1 and 4 are respectively

$$\Phi_1 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad \Phi_4 = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}.$$

The simulation parameters are $t = r = 2$, $G = 2$, $U = 4$, and the ML receiver is used with the same power allocation as for 2-D STSC in Figure 2.8. For case (a), the result shown is a typical one (from random matrices with full diversity), but even the best instance of a randomly selected $\mathbf{C}$ performs worse than the proposed scheme. The problem is likely due to the non-zero MAI on average (a violation of Design Requirement 1). For the Hadamard matrix, the problem is simply that it does not meet the full-diversity requirements (Design Requirement 3). Note that similar results are obtained with the SIC.

[Plot: probability of error versus SNR (dB). Curves: Non-orthogonal Random Codes, Hadamard Codes, 2-D STS Codes.]

Figure 2.9: Performance of proposed 2D-STSC versus randomly generated ST spreading codes which do not have the zero average MAI property, and Hadamard codes which give zero average MAI but do not satisfy the full-diversity criterion.

Figure 2.10 depicts the BER versus $E_b/N_0$ for a single-user system with two transmit and two receive antennas. Four symbols are sent together in two time slots. We compare the performance of different receivers, matched filter SIC and ML, for the proposed power control scheme and for the system with no power control (NPC).

[Plot: probability of error versus SNR (dB). Curves: ML+Rotation, ML+Power Cont., SIC+Power Cont., SIC(NPC), SIC+Rotation.]

Figure 2.10: Performance comparison of the proposed space-time coding scheme and rotated constellation (TAST) in a single user system for t = r = 2, G = 2.

The transmitted signal is

$$\mathbf{S} = \frac{1}{\sqrt{2}} \begin{bmatrix} s_1 + s_2 & s_3 + s_4 \\ s_3 - s_4 & s_1 - s_2 \end{bmatrix},$$

while in the power control algorithm we have chosen $P_1 = P_4$, $P_2 = P_3$, $P_1/P_2 = 4$. We have also compared the performance of the proposed scheme with TAST codes, which are obtained using the rotated constellation of equation (2.21). This figure shows that with power control, the BER for the two-stage matched filter SIC has almost the same slope as ML at high SNR. That means full diversity can be achieved at the receiver. Also, comparing the ML receiver in both cases of PC and rotated constellations (which is supposed to provide the optimal solution with ML) shows a small gap in the BER curves. It should be noted that to compensate for this gap, the power levels can be obtained through coding gain maximization based on the ML detection criterion. It can also be seen that with TAST codes, full diversity is not achieved with the SIC receiver.
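The effect of the unequal power allocation on the rank criterion can be checked exhaustively for this $2 \times 2$ code. The sketch below is an illustration under stated assumptions: symbols are BPSK scaled by amplitudes $A_k = \sqrt{P_k}$, the common factor $1/\sqrt{2}$ is dropped because it does not affect rank, and the function names are hypothetical. Amplitudes $(2, 1, 1, 2)$ correspond to $P_1 = P_4$, $P_2 = P_3$ and $P_1/P_2 = 4$.

```python
from itertools import product


def codeword(symbols, amplitudes):
    """2 x 2 transmit matrix [[x1 + x2, x3 + x4], [x3 - x4, x1 - x2]], x_k = A_k * s_k."""
    x1, x2, x3, x4 = (a * s for a, s in zip(amplitudes, symbols))
    return ((x1 + x2, x3 + x4), (x3 - x4, x1 - x2))


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def all_pairs_full_rank(amplitudes):
    """True if every pair of distinct BPSK codewords has a non-singular difference."""
    words = list(product((1, -1), repeat=4))
    for u in words:
        for v in words:
            if u == v:
                continue
            Su, Sv = codeword(u, amplitudes), codeword(v, amplitudes)
            E = tuple(tuple(a - b for a, b in zip(ru, rv)) for ru, rv in zip(Su, Sv))
            if det2(E) == 0:
                return False
    return True


unequal_power_ok = all_pairs_full_rank((2, 1, 1, 2))
equal_power_ok = all_pairs_full_rank((1, 1, 1, 1))
```

For this toy check, the unequal allocation keeps every codeword difference non-singular, whereas equal powers leave some pairs rank deficient.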

2.6 Summary

A two-dimensional space-time spreading scheme is proposed for the fading MIMO broadcast channel, which involves the design of a $t \times G$ spreading matrix for each user. Under the constraint of no channel information at the transmitter, our design goals were to maximize diversity gain and spectral efficiency while minimizing complexity at both the transmitter and the receiver. The 2D-STSC technique proposed here yields good performance with a low-complexity two-stage interference canceller at each downlink receiver, and is based on well-known Walsh codes. It also benefits from unequal power allocation to each user. The proposed technique has the interesting property of not suffering from full interference from all active users, and this results in reduced detection complexity.

Compared to existing MIMO multiuser transmission schemes, the proposed technique has better spectral efficiency, greater flexibility in the choice of number of transmit antennas, and lower receiver complexity. In addition, the design method is simple and does not require complex re-designs whenever system configurations such as spreading gain or number of antennas change. This makes the proposed method very attractive from a practical viewpoint. The results of this chapter are also published in [30,39–42].

Chapter 3

Precoding and Beamforming for MIMO Downlink Channels with Per-User Power Constraints

In the previous chapter, no channel state information was assumed at the transmitter. In this chapter, assuming perfect channel knowledge at the transmitter, we study beamforming for the downlink in a multiuser multi-input multi-output (MIMO) channel while a non-linear interference pre-subtraction is presumed at the transmitter. With a per-user power constraint, the primary goal is to minimize the mean squared error between transmit and receive data streams.

Several emerging wireless networks expected to grow in importance in the future are non-mobile in nature. These include broadband wireless access (BWA) networks meant to compete with existing cable and digital subscriber line services, backhaul mesh networks that can be set up more easily than a wired system, and wireless local area networks (WLANs) which are being deployed and improved at a rapid rate. In these applications, it is reasonable to assume that the wireless channels are nearly time-invariant, and hence it may be possible for the transmitter to obtain channel information (estimated at the receiver) through a feedback channel, with only a small error caused by limited bandwidth and finite delay in the feedback channel.

In MIMO single-user channels, there are a few works on joint transmit/receive beamforming design based on the MMSE criterion [43,44]. In MIMO multiuser channels, the main challenges are that the receivers cannot cooperate with each other, and that each user suffers from the interference from all other users.

In [45, 46], and [47], linear transmit preprocessing methods are proposed for the downlink of multiuser MIMO systems. In these techniques, the multiuser MIMO downlink channel is decomposed into parallel independent single-user MIMO channels so that the multiuser interference is completely cancelled and then single-user schemes can be applied over each independent channel. However, all these techniques are highly restrictive on the number of transmit/receive antennas in the sense that the number of transmit antennas must be greater than the total number of receive antennas at all users. In practical situations where the number of transmit antennas is limited, this constraint severely restricts the number of receive antennas, which results in a restriction either on receive diversity or on the total number of users that can be supported. Also, in [48], a linear beamforming scheme is proposed for the downlink channel in which there is no restriction on the number of transmit/receive antennas, but the number of transmitted symbols has to be less than or equal to the number of transmit antennas.

On the other hand, in multi-user communication systems where the downlink transmitter has knowledge of the channels to each receiver, it is well known that complexity can be moved from the receivers to the transmitter without loss of performance [8, 10, 49]. This operation is analogous to the equivalence between decision-feedback equalization (DFE) and transmitter precoding, for instance using the Tomlinson-Harashima precoding (THP) method, in a single-user frequency-selective channel [26]. In addition, we also know that the boundary of the capacity region of this broadcast channel is attained with channel knowledge at the transmitter, and using it for successive dirty paper coding (DPC) [8,28]. This means that users are arranged in some order, and a user's source information is encoded only after those in the queue before it have been encoded. Since the interference generated by the users higher up in the queue is known, Costa's DPC result states that a rate equal to the one without that known interference can be supported [50]. The achievable-rate vector for all K users can be found after the optimal transmit covariance matrices are obtained. For the latter computation, the uplink-downlink duality result of [28] is useful.

In [26] zero forcing pre-equalization is also implemented by THP at the transmitter. Although this scheme is effective and outperforms linear pre-equalization schemes, the same restriction on the number of transmit/receive antennas, as explained before, exists. Also, similar to all zero forcing methods, the performance is degraded at low signal-to-noise ratios (SNRs). In [51], a THP design is proposed for downlink channels with multiple antennas at the transmitter and one antenna at each receiver. The goal is to obtain minimal transmit power for given encoding order and individual user rate requirements. Compared to zero forcing THP, this scheme requires less power and can be applied to a system with more users than transmit antennas.

In this work, multiuser downlink MMSE beamforming is combined with non-linear THP to minimize the mean squared error between transmit and receive data streams. In the proposed algorithm, there is no limitation on the number of transmit/receive antennas.

The contributions of this chapter are as follows. The optimum transmit/receive beam vectors are obtained based on a minimum mean-squared error (MMSE) criterion and a per-user power constraint. Since successive interference cancellation is applied at the BS, the available single user algorithms can be extended to be applicable over individual single user channels. The algorithm proposed here is based on the work in [44]. The receive beam vectors are obtained with the MMSE criterion, and the transmit beam vectors are obtained through singular-value decomposition (SVD) of the whitened channels. In fact, part of the multiuser interference is cancelled out at the BS and the residual interference at each user is minimized by transmit/receive beamforming. In this proposed scheme, there is no limitation on the number of transmit/receive antennas. Also, when the signal model is generalized to include the time domain, the proposed algorithm offers a unique method for assigning time slots to users in a MIMO-TDMA system.

The remainder of this chapter is organized as follows. The problem formulation, signal model and precoding structure are given in the next section. The precoding matrix design as well as transmit and receive beamforming matrix design are discussed in Section 3.2. Space-time precoding/beamforming is introduced in Section 3.3. Simulation results are given in Section 3.4, and we conclude with Section 3.5.

3.1 Problem Formulation

3.1.1 Signal Model

The downlink in a multiuser MIMO channel is considered with K users, where the base station has t transmit antennas and each receiver has $r_i$ antennas for i = 1, ..., K. Then the baseband signal received by user i is:

\[ \mathbf{y}_i = \mathbf{H}_i \sum_{j=1}^{K} \mathbf{B}_j \mathbf{x}_j + \mathbf{n}_i, \quad i = 1, \ldots, K, \tag{3.1} \]

where $\mathbf{H}_i \in \mathbb{C}^{r_i \times t}$ is the matrix of flat-fading channel gains between the transmitter and the ith receiver's antennas, and $\mathbf{n}_i$ is an $r_i \times 1$ vector of i.i.d. complex Gaussian random variables with zero mean and variance $\sigma_i^2$, representing receiver noise for user i. It is assumed that in the ith MIMO channel, each flat fading coefficient has the same variance $\sigma_{h,i}^2$, and that the $t \cdot r_i$ channels are independent. $\mathbf{x}_j$ is the symbol vector transmitted to user j and $\mathbf{B}_j$ is the transmit beamforming matrix with

\[ \mathrm{tr}(\mathbf{B}_j^H \mathbf{B}_j) \le p_j, \]

where $p_j$ is the transmit power constraint for user j.

The transmitter is designed to send $z_i$ symbols simultaneously to the ith user. Assuming that the signal dimensionality at the receiver has to be no smaller than the number of mutually interfering symbols to achieve an acceptable bit error probability (BEP),

\[ 1 \le z_i \le \min(t, r_i). \tag{3.2} \]

If we define the aggregate received signal vector $\mathbf{y} = [\mathbf{y}_1^T \cdots \mathbf{y}_K^T]^T$, the overall channel matrix $\mathbf{H} = [\mathbf{H}_1^T \cdots \mathbf{H}_K^T]^T$, the beamforming matrix $\mathbf{B} = [\mathbf{B}_1 \cdots \mathbf{B}_K]$, and the transmit signal vector $\mathbf{x} = [\mathbf{x}_1^T \cdots \mathbf{x}_K^T]^T$, the multi-user signal model is

\[ \mathbf{y} = \mathbf{H}\mathbf{B}\mathbf{x} + \mathbf{n}. \tag{3.3} \]

3.1.2 Precoding

In point-to-point communication systems or multiple access channels in which the receivers are coordinated, a minimum mean-square error decision-feedback equalizer (MMSE-DFE) can be applied to untangle the interference. The structure of the DFE at the receiver is shown in Figure 3.1. After each symbol is detected, it is subtracted from the received signal before the next symbol is detected. As a consequence, error propagation may occur in these receivers. For the multiple access channel, this is equivalent to serial interference cancellation (SIC). In the downlink channel, unlike in the other two, receivers are uncoordinated and hence it is impossible to implement joint processing techniques such as DFE.

[Figure 3.1: Block diagram of the matrix DFE. Blocks: transmitter B, channel H with additive noise n, receiver feedforward filter F and feedback filter I − G.]

However, if the channel is known to the transmitter, the decision feedback receiver can be implemented at the transmitter as shown in Figure 3.2. A well-known example of this type of "pre-equalizer" is the Tomlinson-Harashima precoder (THP), introduced in the context of equalization of ISI channels. In this work, the matrix form of THP is applied to the multi-user downlink channel (Figure 3.2). This presents the same idea as DPC: if the interference is known at the transmitter, it can be pre-subtracted prior to transmission. In this way, error propagation may be avoided. A regular subtraction at the transmitter will cause power amplification at the transmitter. Modulo-arithmetic in the form of THP is used at both transmitter and receiver in order to minimize power amplification [27,49].

If the data symbols (d) are from an M-ary constellation $\chi = \{\pm 1, \pm 3, \ldots, \pm(M-1)\}$ (M even), then the operation of this modulo adder is such that the transmitted symbols, x, are constrained into the interval [−M, +M). Indeed if the result of the summation is greater than M, 2M is repeatedly subtracted until the result is less than M. If the result of the summation is less than −M, 2M is repeatedly added until the result is greater than or equal to −M. The decoder also has to perform a modulo-M operation to recover the original signal¹. In other words, if we define z = x mod-M then

\[ z = \begin{cases} x, & |x| \le M, \\ x + 2Mq, & |x| > M, \quad q \in \mathbb{Z} \ \text{s.t.} \ |x + 2Mq| \le M. \end{cases} \]
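The modulo operation described above can be illustrated with a small sketch. This is not the thesis implementation: the function name `thp_mod`, the interference values, and the noise-free unit-gain channel are assumptions made for illustration, and the half-open interval [−M, M) is used so that the result is unique.

```python
import math

def thp_mod(value, M):
    """Fold a real value into [-M, M) by adding an integer multiple of 2M."""
    return value - 2 * M * math.floor((value + M) / (2 * M))

M = 2                                   # BPSK in the text's convention: d in {-1, +1}
d = [1.0, -1.0, 1.0]                    # data symbols
interference = [3.7, -0.4, -6.2]        # hypothetical interference known at the transmitter

# Transmitter: pre-subtract the known interference, then fold into [-M, M).
x = [thp_mod(di - ii, M) for di, ii in zip(d, interference)]

# Receiver (noise-free, unit-gain channel assumed): the channel adds the
# interference back, and a second modulo operation recovers the data.
d_hat = [thp_mod(xi + ii, M) for xi, ii in zip(x, interference)]
```

The transmitted values stay bounded regardless of how large the interference is, which is the reason the modulo adder limits the power amplification of plain pre-subtraction.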

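To design the Tomlinson-Harashima precoder, an encoding order K, ..., 1 is assumed, which means that the interference caused by the users i + 1, ..., K to the user i is known prior to transmission. Hence G is a block upper triangular matrix with identity matrices on the diagonal:

\[ \mathbf{G} = \begin{bmatrix} [\, \mathbf{I} \;\; \mathbf{G}_1 \,] \\ [\, \mathbf{0} \;\; \mathbf{I} \;\; \mathbf{G}_2 \,] \\ \vdots \\ [\, \mathbf{0} \;\cdots\; \mathbf{I} \;\; \mathbf{G}_{K-1} \,] \\ [\, \mathbf{0} \;\cdots\; \mathbf{0} \;\; \mathbf{I} \,] \end{bmatrix}, \tag{3.4} \]

where each bracketed entry is one block row and $\mathbf{G}_i \in \mathbb{C}^{z_i \times \zeta_i}$ with $\zeta_i = \sum_{m=i+1}^{K} z_m$.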

[Figure 3.2: Matrix form of the Tomlinson-Harashima precoder. Blocks: data d, feedback filter I − G with mod-M at the transmitter, beamformer B, channel H with additive noise n, receiver filter F followed by mod-M, output d̂.]

¹ It is well known that by implementing THP, there is still a power loss equal to $M^2/(M^2 - 1)$ at the transmitter [27]. The loss is small for moderate to large M, and hence in this work we ignore it.

In a broadcast channel, since the receivers are not cooperating, the feedforward matrix (Figure 3.2) has to be block diagonal

\[ \mathbf{F} = \mathrm{Diag}(\mathbf{F}_1^H, \ldots, \mathbf{F}_K^H), \tag{3.5} \]

where $\mathbf{F}_j^H$ is a $z_j \times r_j$ matrix for j = 1, ..., K. We attempt to solve the following problem:

Problem statement: Using a per-user power constraint, design the beamforming matrix ($\mathbf{B}_i$), and the feedforward and feedback matrices $\mathbf{F}_i$ and $\mathbf{G}_i$ respectively, in order to minimize the mean squared error between the transmitted and received data symbols for each individual user i. The design is performed to either

• minimize the sum of the MSE for all the symbols transmitted to the same user, or

• minimize the sum of the MSE for all the symbols transmitted to the same user while the same MSE is maintained for all the symbols.

3.2 MMSE Beamforming/Precoding

To derive the transmit/receive beamforming matrices in the downlink, because of the interference pre-subtraction at the BS with the encoding order K, ..., 1, the first user is interference-free and transmits over a single user channel. After designing the beamforming matrix for the first user, the beamforming matrix for the second user may be designed, treating the first user as an interferer. This procedure can be continued for the users l = 3, ..., K sequentially, with users i = 1, ..., l − 1 interfering with user l. Therefore, available single user schemes ([43, 44]) can be applied to design the beamforming matrices for each user individually. In the following, we first design the matrix G so that the interference from the users l + 1, ..., K is cancelled out at the receiver of user l for l = 1, ..., K. Then, we design the optimum transmit/receive beamforming matrices to minimize the MSE between transmit and receive data streams when we have a per-user power constraint.

3.2.1 Precoding Matrix Design

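The signal that is received by user i on the downlink is $\mathbf{y}_i \in \mathbb{C}^{r_i}$:

\[ \mathbf{y}_i = \mathbf{H}_i \Big( \mathbf{B}_i \mathbf{x}_i + \sum_{j=i+1}^{K} \mathbf{B}_j \mathbf{x}_j + \sum_{j=1}^{i-1} \mathbf{B}_j \mathbf{x}_j \Big) + \mathbf{n}_i, \tag{3.6} \]

where $\mathbf{x}_i$ is the ith user's transmitted signal, before filtering with $\mathbf{B}_i$,

\[ \mathbf{x}_i = \mathbf{d}_i - [\mathbf{G}_{i,i+1}, \ldots, \mathbf{G}_{i,K}] \begin{bmatrix} \mathbf{x}_{i+1} \\ \vdots \\ \mathbf{x}_K \end{bmatrix} \ \text{mod-}M \tag{3.7} \]

\[ \phantom{\mathbf{x}_i} = \mathbf{d}_i - \mathbf{G}_i \mathbf{x}_{i+1}^{K} + 2M\mathbf{q}_i \tag{3.8} \]

\[ \phantom{\mathbf{x}_i} = \mathbf{v}_i - \mathbf{G}_i \mathbf{x}_{i+1}^{K}, \tag{3.9} \]

where $\mathbf{G}_{i,j} \in \mathbb{C}^{z_i \times z_j}$ cancels interference from user j on user i, $z_i$ is the number of symbols transmitted simultaneously to user i in $\mathbf{x}_i$, $\mathbf{x}_{i+1}^{K} = [\mathbf{x}_{i+1}^T, \ldots, \mathbf{x}_K^T]^T$, $\mathbf{q}_i \in \mathbb{Z}^{z_i}$, and $\mathbf{v}_i = \mathbf{d}_i + 2M\mathbf{q}_i$.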

Note that vi is a signal from an expanded signal constellation that is congruent with the constellation of di, meaning that di can be recovered without error from vi, assuming that M is chosen so that every element of di has a magnitude less than M.

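The received signal by user i, $\mathbf{y}_i$, is filtered by a matrix $\mathbf{F}_i^H$ and then, after performing a mod-M operation, a decision statistic is formed for $\mathbf{d}_i$:

\[ \hat{\mathbf{d}}_i = \Big[ \mathbf{F}_i^H \mathbf{H}_i \Big( \mathbf{B}_i \big( \mathbf{v}_i - \mathbf{G}_i \mathbf{x}_{i+1}^{K} \big) + \sum_{j=i+1}^{K} \mathbf{B}_j \mathbf{x}_j + \sum_{j=1}^{i-1} \mathbf{B}_j \mathbf{x}_j \Big) + \mathbf{F}_i^H \mathbf{n}_i \Big] \ \text{mod-}M \tag{3.10} \]

\[ \phantom{\hat{\mathbf{d}}_i} = \Big[ \mathbf{F}_i^H \mathbf{H}_i \Big( \mathbf{B}_i \mathbf{v}_i + \sum_{j=i+1}^{K} (\mathbf{B}_j - \mathbf{B}_i \mathbf{G}_{i,j}) \mathbf{x}_j + \sum_{j=1}^{i-1} \mathbf{B}_j \mathbf{x}_j \Big) + \mathbf{F}_i^H \mathbf{n}_i \Big] \ \text{mod-}M \tag{3.11} \]

For complete interference pre-subtraction, we will choose $\mathbf{G}_{i,j}$ to remove the second term above, and so

\[ \mathbf{G}_{i,j} = (\mathbf{F}_i^H \mathbf{H}_i \mathbf{B}_i)^{-1} \mathbf{F}_i^H \mathbf{H}_i \mathbf{B}_j. \tag{3.12} \]

With this choice of $\mathbf{G}_{i,j}$, $\hat{\mathbf{d}}_i$ becomes

\[ \hat{\mathbf{d}}_i = \Big[ \mathbf{F}_i^H \mathbf{H}_i \mathbf{B}_i \mathbf{v}_i + \sum_{j=1}^{i-1} \mathbf{F}_i^H \mathbf{H}_i \mathbf{B}_j \mathbf{x}_j + \mathbf{F}_i^H \mathbf{n}_i \Big] \ \text{mod-}M \tag{3.13} \]

and this can be further manipulated into

\[ \hat{\mathbf{d}}_i = \mathbf{v}_i + (\mathbf{F}_i^H \mathbf{H}_i \mathbf{B}_i - \mathbf{I}) \mathbf{v}_i + \sum_{j=1}^{i-1} \mathbf{F}_i^H \mathbf{H}_i \mathbf{B}_j \mathbf{x}_j + \mathbf{F}_i^H \mathbf{n}_i - 2M\mathbf{q}_i'. \tag{3.14} \]

Given that $\mathbf{v}_i = \mathbf{d}_i + 2M\mathbf{q}_i$, we then have

\[ \hat{\mathbf{d}}_i = \mathbf{d}_i + 2M(\mathbf{q}_i - \mathbf{q}_i') + (\mathbf{F}_i^H \mathbf{H}_i \mathbf{B}_i - \mathbf{I}) \mathbf{v}_i + \sum_{j=1}^{i-1} \mathbf{F}_i^H \mathbf{H}_i \mathbf{B}_j \mathbf{x}_j + \mathbf{F}_i^H \mathbf{n}_i. \tag{3.15} \]

According to the simulation results², with a probability of 0.996, $\mathbf{q}_i' = \mathbf{q}_i$, and therefore the error vector for user i can be defined as

\[ \mathbf{e}_i = \hat{\mathbf{d}}_i - \mathbf{d}_i = (\mathbf{F}_i^H \mathbf{H}_i \mathbf{B}_i - \mathbf{I}) \mathbf{v}_i + \sum_{j=1}^{i-1} \mathbf{F}_i^H \mathbf{H}_i \mathbf{B}_j \mathbf{x}_j + \mathbf{F}_i^H \mathbf{n}_i. \tag{3.16} \]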

Then the covariance matrix of the error vector will be

\[ \mathbf{E}_i = E[\mathbf{e}_i \mathbf{e}_i^H] = \mathbf{F}_i^H \mathbf{R}_{r_i} \mathbf{F}_i + \sigma_v^2 \mathbf{I} - \mathbf{F}_i^H \mathbf{H}_i \mathbf{B}_i \sigma_v^2 - \mathbf{B}_i^H \mathbf{H}_i^H \mathbf{F}_i \sigma_v^2, \tag{3.17} \]

where

\[ \mathbf{R}_{r_i} = \mathbf{H}_i \mathbf{B}_i \mathbf{B}_i^H \mathbf{H}_i^H \sigma_v^2 + \mathbf{R}_{(n+I)_i}, \tag{3.18} \]

with

\[ \mathbf{R}_{(n+I)_i} = \sigma^2 \mathbf{I} + \sigma_x^2 \mathbf{H}_i \Big( \sum_{j=1}^{i-1} \mathbf{B}_j \mathbf{B}_j^H \Big) \mathbf{H}_i^H \tag{3.19} \]

being the covariance matrix of the noise and residual interference seen by user i, where $\sigma_x^2$ and $\sigma_v^2$ are respectively the variance of individual elements of $\mathbf{x}_i$ and $\mathbf{v}_i$. The symbols generated by THP, $\mathbf{x}_i$, are almost i.i.d. and uniformly distributed within the region [−M, M), and hence $\sigma_x^2 = M^2/3$ [27]. In this work, we consider BPSK modulation (M = 2) and for convenience we have assumed $\sigma_x^2 = 1$. The elements of $\mathbf{v}_i$ are also assumed to be i.i.d., and uncorrelated with the elements of $\mathbf{x}_j$, j = 1, ..., K. To obtain $\sigma_v^2$, we derived the distribution function of $\mathbf{q}_i$ through computer simulations, performed over more than 500000 different channel states for t = K = 4 and t = K = 8. Based on the distribution function of $\mathbf{q}_i$ for i = 1, ..., K obtained through these simulations, $\mathbf{q}_i = \mathbf{0}$ with a probability of 0.984, and therefore $\sigma_v^2 \approx \sigma_d^2 = 1$ to a very good approximation. In the rest of this thesis, we assume $\sigma_v^2 = \sigma_d^2 = 1$.

² The simulations are performed in two different cases: t = K = 4, and t = K = 8, averaged over 500000 different channel states in a range of SNR = [−4, 0, 4] dB.

3.2.2 Optimum Receive Matrix

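The MSE of the lth substream transmitted to user i is the lth diagonal element of $\mathbf{E}_i$,

\[ [\mathbf{E}_i]_{ll} = \mathbf{f}_{i,l}^H \mathbf{R}_{r_i} \mathbf{f}_{i,l} + 1 - \mathbf{f}_{i,l}^H \mathbf{H}_i \mathbf{b}_{i,l} - \mathbf{b}_{i,l}^H \mathbf{H}_i^H \mathbf{f}_{i,l}, \tag{3.20} \]

where $\mathbf{f}_{i,l}$ and $\mathbf{b}_{i,l}$ are the lth columns of $\mathbf{F}_i$ and $\mathbf{B}_i$ respectively. It can be seen that for a given $\mathbf{B}_i$, $[\mathbf{E}_i]_{ll}$ is convex in $\mathbf{f}_{i,l}$ and independent of the other columns of $\mathbf{F}_i$ as well as of the other $\mathbf{F}_j$, j = 1, ..., K, j ≠ i. Therefore each $\mathbf{f}_{i,l}$ can be independently optimized. In fact, given a transmit beamforming matrix $\mathbf{B}_i$, the optimal receive matrix $\mathbf{F}_i$ is obtained such that the diagonal elements of $\mathbf{E}_i$ are minimized. This is equivalent to solving:

\[ \min_{\mathbf{F}_i^H} \ \mathbf{c}^H \mathbf{E}_i \mathbf{c} \quad \forall \mathbf{c}. \tag{3.21} \]

Differentiating $\mathbf{c}^H \mathbf{E}_i \mathbf{c} = \mathrm{tr}(\mathbf{E}_i \mathbf{c}\mathbf{c}^H)$ with respect to $\mathbf{F}_i^H$ and setting the result to zero yields

\[ \mathbf{R}_{r_i} \mathbf{F}_i \mathbf{c}\mathbf{c}^H - \mathbf{H}_i \mathbf{B}_i \mathbf{c}\mathbf{c}^H = \mathbf{0} \quad \forall \mathbf{c}, \]

and therefore

\[ \mathbf{F}_i^{opt} = \mathbf{R}_{r_i}^{-1} \mathbf{H}_i \mathbf{B}_i = (\mathbf{H}_i \mathbf{B}_i \mathbf{B}_i^H \mathbf{H}_i^H + \mathbf{R}_{(n+I)_i})^{-1} \mathbf{H}_i \mathbf{B}_i, \tag{3.22} \]

which is the linear MMSE receiver (Wiener filter). This receiver is optimum for both criteria in the problem stated in Section 3.1. A similar derivation is given for the single user case in [44]. Note that because of the successive nature of the design algorithm, $\mathbf{R}_{(n+I)_i}$ is known before designing the beamforming matrices $\mathbf{B}_i$ and $\mathbf{F}_i$ for user i.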

3.2.3 Optimum Transmit Matrix

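With the optimal receive matrix $\mathbf{F}_i$ in Equation (3.22), user i's mean square error is at its minimum value for the given $\mathbf{B}_i$, and the error covariance matrix will be

\[ \mathbf{E}_i^{o} = \mathbf{I} - \mathbf{B}_i^H \mathbf{H}_i^H (\mathbf{H}_i \mathbf{B}_i \mathbf{B}_i^H \mathbf{H}_i^H + \mathbf{R}_{(n+I)_i})^{-1} \mathbf{H}_i \mathbf{B}_i = (\mathbf{I} + \mathbf{B}_i^H \mathbf{R}_{H_i} \mathbf{B}_i)^{-1}, \tag{3.23} \]

where the last expression is obtained by the matrix inversion lemma, and $\mathbf{R}_{H_i} = \mathbf{H}_i^H \mathbf{R}_{(n+I)_i}^{-1} \mathbf{H}_i$.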
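As a numerical sanity check of Equations (3.22) and (3.23), the following sketch builds the MMSE receive matrix for one user and compares the two forms of the error covariance. The dimensions, the random matrices, and the stand-in noise-plus-interference covariance `R_nI` are illustrative assumptions, with $\sigma_v^2 = 1$ as in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
r, t, z = 4, 4, 2            # receive antennas, transmit antennas, streams (illustrative)

def crandn(*shape):
    """Complex Gaussian samples with unit variance per entry."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = crandn(r, t)             # channel of user i
B = crandn(t, z)             # an arbitrary transmit beamforming matrix for user i
A = crandn(r, r)
R_nI = np.eye(r) + A @ A.conj().T    # stand-in Hermitian positive definite covariance

# Eq. (3.22): linear MMSE (Wiener) receive matrix
HB = H @ B
F = np.linalg.solve(HB @ HB.conj().T + R_nI, HB)

# Eq. (3.23): direct form, and the form given by the matrix inversion lemma
E_direct = np.eye(z) - HB.conj().T @ F
R_H = H.conj().T @ np.linalg.solve(R_nI, H)
E_lemma = np.linalg.inv(np.eye(z) + B.conj().T @ R_H @ B)
```

Both expressions agree to numerical precision, and the second one only involves a z × z inverse, which is why it is the convenient form for the transmit design that follows.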

Next, the transmit beamforming matrix Bi is designed using two methods based on the MSE of user i.

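• Sum-MSE Minimization Design

Corollary 1: The solution to minimizing the sum of the diagonal elements of the MSE matrix $\mathbf{E}_i$ with respect to $\mathbf{B}_i$,

\[ \min_{\mathbf{B}_i} \ \mathrm{tr}[\mathbf{E}_i] \tag{3.24} \]

subject to

\[ \mathrm{tr}(\mathbf{B}_i^H \mathbf{B}_i) \le p_i, \tag{3.25} \]

is given by

\[ \mathbf{B}_i^{opt} = \mathbf{U}_i \boldsymbol{\Sigma}_i, \tag{3.26} \]

where $p_i$ is the power constraint for user i, $\mathbf{U}_i$ denotes the $z_i$ eigenvectors associated with the $z_i$ largest eigenvalues, $\lambda_{i,l}$, of $\mathbf{R}_{H_i} = \mathbf{H}_i^H \mathbf{R}_{(n+I)_i}^{-1} \mathbf{H}_i$, and $\boldsymbol{\Sigma}_i$ is a diagonal $z_i \times z_i$ matrix with diagonal elements $\varrho_{i,l}$, where

\[ \varrho_{i,l}^2 = \Big[ \mu_i^{-1/2} \lambda_{i,l}^{-1/2} - \lambda_{i,l}^{-1} \Big]^{+}, \tag{3.27} \]

where $[\theta]^{+} = \theta$ for $\theta \ge 0$ and 0 for $\theta < 0$, $\lambda_{i,l}$ are the $z_i$ largest eigenvalues of $\mathbf{R}_{H_i}$, and $\mu_i$ is chosen to satisfy the transmit power constraint for user i:

\[ \mu_i^{1/2} = \frac{\sum_{j=1}^{z_i} \lambda_{i,j}^{-1/2}}{p_i + \sum_{j=1}^{z_i} \lambda_{i,j}^{-1}}. \tag{3.28} \]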

Proof: This optimization problem is solved for the single user case in [44]. The key difference is that the noise covariance term is replaced here by the noise-plus-interference covariance matrix. Since the derivations in [44] are valid for any arbitrary noise covariance matrix, the results are directly applicable.

It can be easily shown that $\mathbf{F}_i^H \mathbf{H}_i \mathbf{B}_i$ becomes a diagonal matrix, which means the MIMO channel for each user is diagonalized. By substituting $\mathbf{B}_i$ from (3.26) in (3.23) the error covariance matrix is

\[ \mathbf{E}_i^{o} = (\mathbf{I} + \boldsymbol{\Sigma}_i^H \mathbf{D}_i \boldsymbol{\Sigma}_i)^{-1}, \tag{3.29} \]

in which $\mathbf{D}_i$ is a diagonal matrix whose elements are the largest eigenvalues of $\mathbf{R}_{H_i}$ in increasing order. Therefore the MMSE matrix for each user is diagonal with diagonal elements in decreasing order.

• Equal MSE Design

Corollary 2: The solution to minimizing the sum of the MSE with the same constraint as (3.25), while the same MSE is maintained for all the symbols transmitted to the same user, is given by:

\[ \mathbf{B}_i^{opt} = \mathbf{U}_i \boldsymbol{\Sigma}_i \mathbf{Q}, \tag{3.30} \]

where $\mathbf{U}_i$ and $\boldsymbol{\Sigma}_i$ are defined in Corollary 1 and $\mathbf{Q}$ could be any rotation matrix that satisfies $|\mathbf{Q}(i, k)| = |\mathbf{Q}(i, l)|$, ∀ i, k, l, such as the DFT matrix [44].

It can be seen that in this design, neither the MIMO channel nor the MSE matrix is diagonal. However, the MSE matrix has identical diagonal elements. In fact, for any given $\mathbf{B}_i$, a rotation matrix $\mathbf{Q}$ can be found so that $\mathbf{Q}^H (\mathbf{I} + \mathbf{B}_i^H \mathbf{R}_{H_i} \mathbf{B}_i)^{-1} \mathbf{Q}$ has identical diagonal elements. The sum of the diagonal elements of the MSE matrix is the same regardless of $\mathbf{Q}$.
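The two transmit designs can be sketched numerically as follows. This is an illustrative sketch rather than the thesis code: the function names, the random stand-in for $\mathbf{R}_{H_i}$, and the rule of dropping the weakest stream and recomputing $\mu_i$ whenever (3.27) would clip it to zero are assumptions, and the selected eigenvalues are assumed to be strictly positive.

```python
import numpy as np

def sum_mse_beamformer(R_H, z, p):
    """B = U Sigma following (3.26)-(3.28), with mu recomputed on the active streams."""
    lam, U = np.linalg.eigh(R_H)                    # eigenvalues in ascending order
    lam, U = lam[::-1][:z], U[:, ::-1][:, :z]       # keep the z largest, descending
    for active in range(z, 0, -1):
        l = lam[:active]
        mu_sqrt = np.sum(l ** -0.5) / (p + np.sum(1.0 / l))    # (3.28)
        rho2 = l ** -0.5 / mu_sqrt - 1.0 / l                   # (3.27) before clipping
        if rho2[-1] >= 0:       # weakest active stream non-negative, so all are
            break
    power = np.zeros(z)
    power[:active] = rho2
    return U @ np.diag(np.sqrt(power))

def dft_rotation(z):
    """Unitary DFT matrix; every entry has magnitude 1/sqrt(z)."""
    k = np.arange(z)
    return np.exp(-2j * np.pi * np.outer(k, k) / z) / np.sqrt(z)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
R_H = A.conj().T @ A                                # stand-in for H^H R^{-1} H
z, p = 2, 1.0

B = sum_mse_beamformer(R_H, z, p)                   # Sum-MSE design (3.26)
E = np.linalg.inv(np.eye(z) + B.conj().T @ R_H @ B)             # (3.23), diagonal

B_eq = B @ dft_rotation(z)                          # Equal MSE design (3.30)
E_eq = np.linalg.inv(np.eye(z) + B_eq.conj().T @ R_H @ B_eq)    # equal diagonal
```

In this sketch the rotation leaves both the transmit power and the trace of the MSE matrix unchanged, and only redistributes the MSE evenly across the streams of the user.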

In [48], a similar approach based on Sum-MSE minimization (3.26) is applied in the downlink of a multiuser system to transmit multiple symbols to multiple users based on a per-user power constraint. However, interference pre-cancellation is not applied at the BS, and therefore $\mathbf{R}_{(n+I)_i}$ depends on the other users' beam vectors and the MSE for each user depends on the other users' MSE, which makes the algorithm very complicated. The authors proposed an iterative algorithm in which each transmit matrix is obtained by fixing the others, but there is no guarantee of convergence, and the algorithm may converge to a false or local optimum. Here, because of the interference pre-subtraction, $\mathbf{R}_{(n+I)_i}$ for i = 1, ..., K depends only on the other users' beamforming matrices $\mathbf{B}_l$ for l ≤ i − 1 (see (3.19)). Therefore, it is possible to design the transmit beamforming matrices $\mathbf{B}_l$ sequentially for l = 1, ..., K.

The algorithm is summarized in Table 3.1.

3.2.4 Precoding Ordering

Performance of the proposed algorithm is also dependent on the precoding ordering. Determining the optimal ordering, which is related to the channel fading states, is an open problem. Indeed, to find the optimal ordering, the algorithm has to be examined over K! rearrangements, which is not practical for a large K. In [10] an optimal ordering scheme is proposed for K = 2 when the optimization goal is to minimize total transmit power, satisfying certain SINRs at different users, while each user has a single antenna.

Here, we have applied a suboptimal low complexity method. The users are ordered based on the squared Frobenius norm of their channel matrix, $\|\mathbf{H}_i\|_F^2$ for i = 1, ..., K:

\[ \|\mathbf{H}_i\|_F^2 = \sum_{r,t} |H_i(r, t)|^2, \tag{3.31} \]

where $H_i(r, t)$ is the (r, t) element of the matrix $\mathbf{H}_i$. The user which is supposed not to see any interference is the one with the smallest channel matrix norm, and the user who sees the full interference (in this work, the Kth user) is the one with the largest channel matrix norm. This method especially makes sense if all users have the same power constraint.

Table 3.1: The algorithm for precoding and MMSE beamforming

Initialization: $\mathbf{C} = \mathbf{0}$, $\mathbf{R}_n = \sigma^2 \mathbf{I}$.
for i = 1 : K
    $\mathbf{R}_{(n+I)_i} = \mathbf{R}_n + \mathbf{H}_i \mathbf{C} \mathbf{H}_i^H$,
    $\mathbf{R}_{H_i} = \mathbf{H}_i^H \mathbf{R}_{(n+I)_i}^{-1} \mathbf{H}_i$,
    $\mathbf{B}_i = \mathbf{U}_i \boldsymbol{\Sigma}_i$,
    $\mathbf{F}_i = (\mathbf{H}_i \mathbf{B}_i \mathbf{B}_i^H \mathbf{H}_i^H + \mathbf{R}_{(n+I)_i})^{-1} \mathbf{H}_i \mathbf{B}_i$,
    $\mathbf{C} = \mathbf{C} + \mathbf{B}_i \mathbf{B}_i^H$.
end
for j = K − 1 : −1 : 1
    $\mathbf{G}_j = (\mathbf{F}_j^H \mathbf{H}_j \mathbf{B}_j)^{-1} \mathbf{F}_j^H \mathbf{H}_j [\mathbf{B}_{j+1}, \ldots, \mathbf{B}_K]$.
end
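The steps of Table 3.1, together with the norm-based ordering, can be sketched in a few lines of NumPy. This is a sketch under stated assumptions and not the author's code: the function names, the problem sizes, the noise variance, and the stream-dropping rule inside `sum_mse_B` are illustrative choices, $\sigma_x^2 = \sigma_v^2 = 1$ as in the text, and the feedback step assumes every stream receives non-zero power so that $\mathbf{F}_j^H \mathbf{H}_j \mathbf{B}_j$ is invertible.

```python
import numpy as np

def herm(A):
    """Conjugate transpose."""
    return A.conj().T

def sum_mse_B(R_H, z, p):
    """B = U Sigma as in (3.26)-(3.28); assumes the z largest eigenvalues are positive."""
    lam, U = np.linalg.eigh(R_H)                    # ascending order
    lam, U = lam[::-1][:z], U[:, ::-1][:, :z]       # z largest, descending
    for active in range(z, 0, -1):                  # drop weakest streams if needed
        l = lam[:active]
        mu_sqrt = np.sum(l ** -0.5) / (p + np.sum(1.0 / l))
        rho2 = l ** -0.5 / mu_sqrt - 1.0 / l
        if rho2[-1] >= 0:
            break
    power = np.zeros(z)
    power[:active] = rho2
    return U @ np.diag(np.sqrt(power))

def order_by_frobenius_norm(H_list):
    """Smallest-norm user first (sees no interference), largest-norm user last."""
    return sorted(H_list, key=lambda H: np.linalg.norm(H, "fro"))

def thp_mmse_design(H_list, z, p, sigma2):
    """Sequential design of Table 3.1; H_list must already be in precoding order."""
    K, t = len(H_list), H_list[0].shape[1]
    C = np.zeros((t, t), dtype=complex)             # sum of B_j B_j^H designed so far
    B, F, G = [], [], [None] * K
    for H in H_list:
        R_nI = sigma2 * np.eye(H.shape[0]) + H @ C @ herm(H)    # (3.19)
        R_H = herm(H) @ np.linalg.solve(R_nI, H)
        Bi = sum_mse_B(R_H, z, p)                               # (3.26)
        HB = H @ Bi
        Fi = np.linalg.solve(HB @ herm(HB) + R_nI, HB)          # (3.22)
        B.append(Bi)
        F.append(Fi)
        C = C + Bi @ herm(Bi)
    for j in range(K - 2, -1, -1):                  # users K-1, ..., 1 in the text
        eff = herm(F[j]) @ H_list[j]
        G[j] = np.linalg.solve(eff @ B[j], eff @ np.hstack(B[j + 1:]))   # (3.12)
    return B, F, G

rng = np.random.default_rng(0)
K, t, r, z = 3, 4, 4, 2
H_list = [(rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
          for _ in range(K)]
H_list = order_by_frobenius_norm(H_list)
B, F, G = thp_mmse_design(H_list, z=z, p=1.0, sigma2=0.1)
```

Each `G[j]` collects the blocks $[\mathbf{G}_{j,j+1}, \ldots, \mathbf{G}_{j,K}]$ side by side, and the last user needs no feedback block, so its entry is left as `None`.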

3.3 Space-Time Spreading

In the proposed scheme, there is no limitation on the number of users in the system. However, as the number of active users is increased, the MMSE of each user increases too. So if there is a quality of service constraint for each user, the number of users we can accommodate is, in fact, limited. On the other hand, if we are allowed to increase the bandwidth by spreading each symbol over time, we can increase the number of active users without any performance loss. The beamforming is now optimized over two dimensions: space and time. Therefore, as in Chapter 2, instead of assigning a beam vector to each user symbol, one t × G beamforming matrix, $\boldsymbol{\Phi}_{il}$, i = 1, ..., K, l = 1, ..., $z_i$, is assigned to each user symbol, where G is the processing gain, and $T = G T_c$ is the symbol period where $T_c$ is the chip period. Then the transmitted signal is

\[ \mathbf{X} = \sum_{i=1}^{K} \sum_{l=1}^{z_i} x_{il} \boldsymbol{\Phi}_{il}. \tag{3.32} \]

The channel is assumed to be constant over at least one symbol period. The signal received by user i, over G channel uses and $r_i$ receiver antennas, is

\[ \mathbf{Y}_i = \sum_{j=1}^{K} \sum_{l=1}^{z_j} x_{jl} \mathbf{H}_i \boldsymbol{\Phi}_{jl} + \mathbf{N}_i \in \mathbb{C}^{r_i \times G}, \tag{3.33} \]

where $\mathbf{N}_i$ is a matrix of i.i.d. complex Gaussian random variables with zero mean and variance $\sigma_i^2$, representing receiver noise.

By stacking the columns of $\mathbf{Y}_i$, we obtain

\[ \tilde{\mathbf{y}}_i = \tilde{\mathbf{H}}_i \sum_{j=1}^{K} \mathbf{C}_j \mathbf{x}_j + \tilde{\mathbf{n}}_i, \tag{3.34} \]

where $\tilde{\mathbf{y}}_i = \mathrm{vec}(\mathbf{Y}_i)$, $\tilde{\mathbf{H}}_i = \mathrm{Diag}(\mathbf{H}_i, \ldots, \mathbf{H}_i)$, $\mathbf{x}_j = [x_{j1}, \ldots, x_{jz_j}]^T$, $\tilde{\mathbf{n}}_i = \mathrm{vec}(\mathbf{N}_i)$ and $\mathbf{C}_j = [\mathrm{vec}(\boldsymbol{\Phi}_{j1}), \ldots, \mathrm{vec}(\boldsymbol{\Phi}_{jz_j})]$. Essentially, each column of $\mathbf{C}_j$ corresponds to a space-time modulation waveform for one data stream. This presents exactly the same system model as Equation (3.1). Therefore we can apply the same algorithm explained in Section 3.2 to design the matrices $\mathbf{C}_i$, $\mathbf{F}_i$, and $\mathbf{G}_i$ for each user.

As an alternative to the above methodology, one may think of choosing a subset of users randomly in each chip time-slot (TDM fashion) and performing the original algorithm to design the proper beam vectors over each subset independently. However, by applying the algorithm once over all users as presented in signal model (3.34), the transmit beamforming matrices are designed to minimize noise and interference based on the covariance matrix in Equation (3.19), and therefore the subsets of the users that result in the least amount of interference are automatically selected to be transmitted at the same time. In fact, each user symbol is transmitted through the subchannel with the highest effective gain or equivalently with the minimum amount of interference. The result of the algorithm is to assign each user symbol to only one time slot (which has the least amount of interference), rather than spreading each user symbol over all time slots. As will be shown by numerical results in Section 3.4, even though the channels are constant during one symbol time, the algorithm benefits from selecting the simultaneous users intelligently.
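The stacking step that turns (3.33) into (3.34) relies on the identity $\mathrm{vec}(\mathbf{H}\boldsymbol{\Phi}) = (\mathbf{I}_G \otimes \mathbf{H})\,\mathrm{vec}(\boldsymbol{\Phi})$ for column-stacking vec. The sketch below checks it numerically for one user with two streams and no noise. The sizes and random matrices are illustrative assumptions, and `G` here denotes the processing gain, as in this section.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(0)
t, r, G = 3, 2, 4                       # illustrative sizes; G is the processing gain

def crandn(*shape):
    """Complex Gaussian samples (illustrative helper)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

H = crandn(r, t)                        # channel, constant over the G chips
Phi = [crandn(t, G), crandn(t, G)]      # space-time matrices for two streams
x = crandn(2)                           # the two transmitted symbols

Y = sum(xl * H @ P for xl, P in zip(x, Phi))      # noise-free version of (3.33)
H_tilde = np.kron(np.eye(G), H)                   # Diag(H, ..., H)
C = np.column_stack([vec(P) for P in Phi])        # one column per stream
y_tilde = H_tilde @ C @ x                         # noise-free version of (3.34)
```

Because `H_tilde` is block diagonal with G copies of the same channel, a column of `C` whose energy sits in a single block corresponds to a symbol sent in a single time slot.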

The other interesting point is that when we transmit multiple symbols to each user (at most min(t, $r_i$)) and the beam vectors are designed based on either of the schemes in Equation (3.26) or Equation (3.30), if G ≥ min(t, $r_i$), no two symbols for the same user will be transmitted in the same time slot. This is because the beam vectors are obtained from the largest eigenvectors of $\mathbf{R}_{\tilde{H}_i}$. Since $\tilde{\mathbf{H}}_i$ is a block diagonal matrix with min(t, $r_i$) different eigenvalues and G · min(t, $r_i$) eigenvectors, all G largest eigenvectors are orthogonal in time. The importance of this simple fact is that although we do not have interference pre-cancellation for the symbols transmitted to the same user, there is no interference between the transmitted symbols to the same user because their beam vectors are orthogonal in time.

3.4 Simulation Results

In this section we provide some numerical results to illustrate the performance of the proposed algorithm. In all of the simulations a Rayleigh fading channel is assumed with zero mean uncorrelated complex Gaussian noise across the receive antennas ($\mathbf{R}_n = \sigma^2 \mathbf{I}$). The elements of the channel matrix H are generated as independent and identically distributed (i.i.d.) samples of a complex Gaussian process with zero mean and unit variance. The channel matrix is known at the transmitter. Each user is assumed to have full knowledge about its own channel and also the transmit matrix B. Without loss of generality, the same power constraints, the same number of receive antennas and the same number of transmit symbols are assumed for all users ($p_i = p$, $r_i = r$, $z_i = z$, for i = 1, ..., K). The results are presented based on the average probability of error ($P_e$) versus signal-to-noise ratio per receive antenna (SNR = $p/\sigma^2$, in dB) for each user symbol. Here, the algorithm proposed in Section 3.2 is referred to as the MMSE algorithm, and the design of the transmit beamforming matrix for each user is based on Sum-MSE minimization in Equation (3.26).
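The simulation setup described above can be sketched as follows. The helper names, the seed and the particular sizes are assumptions for illustration only. The sketch generates i.i.d. unit-variance complex Gaussian channel gains and converts an SNR in dB into the noise variance through SNR = $p/\sigma^2$.

```python
import numpy as np

def rayleigh_channel(r, t, rng):
    """r x t matrix of i.i.d. zero-mean, unit-variance complex Gaussian gains."""
    return (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)

def noise_variance(p, snr_db):
    """Noise variance sigma^2 such that 10*log10(p / sigma^2) equals snr_db."""
    return p / 10 ** (snr_db / 10)

rng = np.random.default_rng(0)
K, t, r, p = 2, 4, 2, 1.0               # illustrative: two users, four transmit antennas
channels = [rayleigh_channel(r, t, rng) for _ in range(K)]
sigma2 = noise_variance(p, snr_db=10.0)
```

Averaging an error-rate estimate over many such channel draws at each SNR point is the usual way curves like those in the following figures are produced.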

Figure 3.3 depicts the average $P_e$ versus SNR for a system with four transmit antennas and two users when two symbols are transmitted to each user. As is shown, by increasing the number of receive antennas for each user from r = 2 to r = 4, at an average $P_e$ of $10^{-3}$, the performance is improved by 8.8dB because of the receive diversity. For the same number of transmit/receive antennas, space-time precoding/beamforming is examined in Figure 3.4. The processing gain is G = 4, which means the signal is spread over four time slots. We have K = 8 users and two symbols are transmitted to each user. Again, by increasing the number of receive antennas from r = 2 to r = 4, the performance is improved significantly because of receive diversity. Also for r = 4, the average $P_e$ is compared with the $P_e$ of the eighth user. Note that, since the precoding order is 8, ..., 1, the eighth user is the one who sees full interference and may expect the worst performance compared with other users. As is shown, for low SNRs the performance for this user is about 1dB worse than the average.

[Figure 3.3: The average $P_e$ for different number of receive antennas and t = 4, K = 2, z = 2. Axes: probability of error (log scale) versus SNR (dB), −4 to 12 dB. Curves: r = 4, r = 2.]

Comparing the results of these two figures confirms the fact explained in Section 3.3. In Figure 3.3, the users are assigned to the time slots in a TDM fashion and the beam vectors are optimized only over space. In Figure 3.4, by optimizing the beam vectors over both space and time dimensions, the symbols which generate the least amount of interference to each other are transmitted together in the same time slot. As a result, in the latter case, at an average $P_e$ of $10^{-3}$, the performance is improved by 4.5dB and 2.5dB for r = 2 and r = 4 respectively.

[Figure 3.4: The performance of space-time spreading for different number of receive antennas, t = 4, G = 4, K = 8, z = 2. Axes: probability of error (log scale) versus SNR (dB), −4 to 12 dB. Curves: average $P_e$ for r = 2, $P_e$ for user 8 with r = 4, average $P_e$ for r = 4.]

[Figure 3.5: Average $P_e$ compared with $P_e$ for each individual user, t = 2, r = 2, K = 2, z = 1. Axes: probability of error (log scale) versus SNR (dB), −4 to 12 dB. Curves: $P_e$ for user 2, average $P_e$, $P_e$ for user 1.]

In Figure 3.5 the average probability of error is compared with the probability of error of each user when we have two transmit antennas and two users. The first user is the one who does not see any interference, the second user is the one who sees the full interference.

If we compare the results of Figure 3.5 with the results of Figure 2.5 in Chapter 2, where the channel is unknown at the transmitter and we have linear space-time coding at the transmitter and serial interference cancellation at the receiver, it can be seen that at an average $P_e$ of $10^{-4}$, the improvement in Figure 3.5 is about 5dB. This is achieved by channel knowledge at the transmitter and then applying interference pre-subtraction.

3.5 Summary

A joint transmitter/receiver precoding and beamforming design is proposed for the downlink in a MIMO multiuser channel by using DPC. The optimum transmit/receive beam vectors are obtained based on a minimum mean-squared error (MMSE) criterion and a per-user power constraint. There is no constraint on the number of transmit/receive antennas and, because of the interference pre-subtraction at the BS, the single user MMSE beamforming schemes can be extended to the current multiuser channel. To increase the number of active users beyond the number of transmit antennas without loss of performance, the signal is spread over both time and space. Then, to minimize the MSE between the transmitted and received data, the beam vectors are optimized over both dimensions: space and time. As a result, each user symbol is assigned to the time slot with the least amount of interference. In fact, the subsets of the users' symbols that result in the least amount of interference are automatically selected to be transmitted at the same time.

The results of this chapter are published in [52].

Chapter 4

Precoding and Beamforming for MIMO Downlink Channels to Minimize Total Transmit Power

Assuming perfect channel knowledge at the transmitter, precoding or pre-processing of the transmitted signal can be applied to achieve various practically significant objectives. In this chapter, we are interested in power efficient multiuser beamforming in the downlink when multiple antennas are present at both the transmitter and the receivers. Successive interference pre-subtraction is applied via a matrix version of Tomlinson-Harashima precoding (THP) as introduced before in Chapter 3 (see Figure 3.2). In Chapter 3 the optimization is based on a per-user power constraint, but here the objective is to minimize the total transmitted power, while satisfying pre-defined signal-to-interference-noise ratio (SINR) requirements at each receiver.

To date, precoding designs for the single-antenna receiver scenario have been presented, but the multi-antenna receiver problem has proved to be more challenging, and is not as well understood.


As will be shown, transmit beamforming is much more complicated than receive beamforming when the total transmitted power is to be minimized [9, 53]. This fact coupled with the uplink-downlink duality explored in [10,54,55] for single antenna users, has motivated designs for downlink beamforming through virtual uplink beamforming.

Based on this duality, the uplink multiple access channel and the downlink broadcast channel have the same SINR achievable region when the sum power in the uplink is the same as the transmit power constraint in the downlink. In [10,49] a combined successive interference pre-subtraction and beamforming algorithm is proposed for single-antenna users, based on uplink-downlink duality. For single-antenna users, the duality result leads to a simple way to compute the optimal downlink beamformers, by solving a dual uplink problem with single-antenna transmitters. However, when we have multi-antenna users, the problem remains unsolved because both the downlink and virtual uplink transmitters have multiple antennas, and hence the difficult problem of designing transmit beam vectors in a multi-user system cannot be circumvented through uplink-downlink duality.

This problem is the focus of our work in this chapter.

The contributions of this chapter are as follows. An iterative algorithm is presented for designing the one transmitter and multiple receivers in a downlink or broadcast channel (BC). Transmitter precoding takes the form of successive interference pre-subtraction, while the receiver employs a linear combiner or beamformer. We adopt an approach similar in spirit to the one in [53], where transmit and receive beamformers and transmit powers are obtained one after another, and further iterations of this cycle produce improved estimates. In our case, the THP concept simplifies the iterations significantly.

MMSE designs are used throughout, because they maximize SINR for a given transmitted signal structure. An uplink-downlink SINR duality result is proved and used in the proposed algorithm, which computes minimum mean squared error (MMSE) beamforming receivers for the virtual uplink and the downlink in turn. The receive beamformers of the virtual uplink are used as the transmit beamformers in the downlink, and vice versa. Initialization is provided by an eigen-decomposition scheme, which was originally proposed for single-user systems. In addition, when we generalize the "beamforming" vectors to non-spatial dimensions such as time, we find that the proposed algorithm in fact offers a unique method for assigning time slots to users in a MIMO-TDMA system.

In all our simulation examples, it is found that the proposed algorithm yields a lower total power than a random resource allocation algorithm.

The remainder of this chapter is organized as follows. The signal model and the background are provided in the next section. Uplink-downlink duality for MIMO channels and the algorithm for joint power allocation and MMSE beamforming are discussed in Section 4.2. The extension of the proposed algorithm to space-time multiplexing is provided in Section 4.3. In Section 4.4, we show how the proposed scheme can be extended to transmit multiple symbols to each user. Simulation results are given in Section 4.5, and we conclude with Section 4.6.

4.1 Problem Formulation

4.1.1 Signal Model

A MIMO broadcast channel is considered with $K$ users, where the base station (BS) has $t$ transmit antennas and each user receiver has $r_i$ antennas for $i = 1, \ldots, K$. Then the baseband signal received by user $i$ is:

$$y_i = H_i B x + n_i, \quad i = 1, \ldots, K, \tag{4.1}$$

where $H_i \in \mathbb{C}^{r_i \times t}$ is the matrix of flat-fading channel gains between the transmitter and the $i$th receiver's antennas, and $n_i$ is a vector of i.i.d. complex Gaussian random variables with zero mean and variance $\sigma_i^2$, representing receiver noise. $B$ is the beamforming matrix and $x$ is the vector of signals transmitted to all users. This gives the same signal model as (3.1). Also, the multi-user signal model for the overall BC channel is the same as (3.3). The key difference in this chapter is that we do not have per-user power constraints; the beam vectors are designed to minimize total transmit power. Besides, here we present the design for transmitting a single symbol to each user. We will later show how the proposed design can be extended to transmit multiple symbols to each user as well. In that case each data stream can have its own desired SINR.

As explained in Section 3.1.2, since the receivers in the broadcast channel are uncoordinated, it is impossible to use joint processing techniques such as the decision feedback equalization (DFE) shown in Figure 3.1 (page 60). However, if the channel is known to the transmitter, the decision feedback receiver can be implemented at the transmitter as shown in Figure 3.2 (page 61). In this chapter, we use the same THP structure (Figure 3.2) introduced in Section 3.1.2 to implement interference pre-subtraction at the transmitter, and we attempt to solve the following problem.

Problem statement: Design the beamforming matrix $B$, and the feedforward and feedback matrices $F$ and $G$ respectively in Figure 3.2, to minimize total transmitted power while maintaining individual target SINRs:

$$\min_{p, B, F, G} \sum_{i=1}^{K} p_i \quad \text{subject to: } \mathrm{SINR}_i \ge \gamma_i, \quad 1 \le i \le K, \tag{4.2}$$

where $p = [p_1, p_2, \ldots, p_K]$ is the vector of transmission powers to individual users, and $\gamma_i$ is the required SINR for user $i$.

The optimization problem just stated is difficult to solve because the constraints are non-convex in the optimization variables. However, it is possible to solve it approximately using the algorithm to be described.

4.1.2 Background

For single-antenna users, the problem of power-efficient multiuser beamforming transmission is solved in [10], based on the uplink-downlink duality result. This duality states that the uplink multiple access channel and the downlink broadcast channel have the same achievable SINR region when the sum power in the uplink is the same as the transmit power constraint in the downlink and the same receive beam vector in the uplink is used as the transmit beam vector in the downlink [10, 54, 55]. In addition to beamforming, dirty paper precoding (DPC) is applied at the base station to partially combat the interference between the users in the downlink. A precoding order {K, ..., 1} is considered, which means that the interference caused by users i+1, ..., K to user i is known prior to transmission. Therefore the first user does not see any interference, the second user only sees the interference from the first user, and so on. In the dual uplink, successive interference cancellation (SIC) without error propagation is considered at the receiver (BS), which is reciprocal to DPC in the downlink. As is shown in [10], duality holds as long as the uplink users are decoded in the order {1, ..., K}, which is the reverse of the downlink precoding order.

To further clarify the importance of this duality in beamforming design, note that the SINR for user $i$ in the uplink is:

$$\mathrm{SINR}_i^{UL} = \frac{q_i |w_i^H h_i|^2}{w_i^H \left( \sigma^2 I + \sum_{k=i+1}^{K} q_k h_k h_k^H \right) w_i}, \tag{4.3}$$

where $h_i$ is the channel vector between the transmit antennas and the $i$th user and $w_i$ is the receive beam vector for user $i$. It can be observed that the uplink SINR for user $i$, for fixed $q$, depends only on the beam vector $w_i$. Hence, the SINR for each user can be independently maximized through MMSE receivers. However, in the downlink, the SINR at receiver $i$ is:

$$\mathrm{SINR}_i^{DL} = \frac{p_i |b_i^H h_i|^2}{\sum_{k=1}^{i-1} p_k |b_k^H h_i|^2 + \sigma^2}, \tag{4.4}$$

where $b_i$ is the $i$th column of the matrix $B$, which is the transmit beam vector for user $i$. This shows that $\mathrm{SINR}_i^{DL}$ is coupled not only with $b_i$ but also with the other beam vectors $b_k$, $k = 1, \ldots, i-1$, and therefore solving (4.2) directly over the downlink is too complicated. In fact, the beam vector designed for each user affects the interference experienced by other users. Hence, the transmit beam vectors are obtained in the dual uplink as MMSE receive filters, and then powers are assigned to different users to satisfy the SINR constraints based on (4.3). To obtain $q_i$, only the beam vector $w_i$ and the powers $q_l$ for $l > i$ have to be known. Therefore it is possible to obtain the beam vectors and the powers in the virtual uplink sequentially, starting from user $K$. In the downlink, $w_i$ is used as the transmit beam vector and the transmit powers are adjusted to satisfy the required SINRs (4.4). It is shown in [10] that this procedure minimizes the total transmitted power for a given set of required SINRs.
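The single-antenna procedure described above can be illustrated with a short numerical sketch. It is only an illustration under stated assumptions, not the implementation used in [10]: the function name `single_antenna_duality`, the row-wise storage of the channel vectors and the use of NumPy are choices made here, and the SINR constraints are met with equality. The uplink pass follows (4.3) from user K down to user 1, and the downlink pass reuses the same beam vectors in (4.4).

```python
import numpy as np

def single_antenna_duality(h, gamma, sigma2=1.0):
    """Sequential uplink design followed by downlink power assignment.

    h      : (K, t) complex array; row i is the channel vector h_i (assumed layout)
    gamma  : (K,) target SINRs in linear scale
    Returns unit-norm beam vectors W (one per row), uplink powers q, downlink powers p.
    """
    K, t = h.shape
    W = np.zeros((K, t), dtype=complex)
    q = np.zeros(K)
    C = sigma2 * np.eye(t, dtype=complex)      # noise plus interference from users k > i
    for i in range(K - 1, -1, -1):             # start from user K (index K-1)
        w = np.linalg.solve(C, h[i])           # MMSE direction, proportional to C^{-1} h_i
        w = w / np.linalg.norm(w)
        # power that meets (4.3) with equality
        q[i] = gamma[i] * np.real(w.conj() @ C @ w) / abs(w.conj() @ h[i]) ** 2
        C = C + q[i] * np.outer(h[i], h[i].conj())
        W[i] = w
    p = np.zeros(K)
    for i in range(K):                         # downlink: user i sees users k < i only
        interference = sum(p[k] * abs(W[k].conj() @ h[i]) ** 2 for k in range(i))
        # power that meets (4.4) with equality, using b_i = w_i
        p[i] = gamma[i] * (interference + sigma2) / abs(W[i].conj() @ h[i]) ** 2
    return W, q, p
```

For any channel realization the individual powers in `q` and `p` generally differ, while their sums agree, which is the sum-power statement of the duality.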

In this work, the above results are extended to MIMO BC channels in which the users have multiple antennas. Even for single-antenna users, as will be discussed in Section 4.3 and later in Chapter 5, if the beamforming is generalized over the time and frequency dimensions, we have an effective MIMO channel model.

In MIMO BC channels, the problem is that, besides the transmit beam vectors at the BS, the receive beam vectors have to be designed too. In the virtual uplink, we now have difficulty designing the transmit beam vectors. To be precise, the SINR for user $i$ in the virtual uplink is:

$$\mathrm{SINR}_i^{UL} = \frac{q_i |w_i^H H_i^H a_i|^2}{w_i^H \left( \sigma^2 I + \sum_{k=i+1}^{K} q_k H_k^H a_k a_k^H H_k \right) w_i}, \tag{4.5}$$

where $a_i$ is the transmit beam vector of user $i$ in the uplink. To minimize total transmit power for a given $\{a_i\}$ and $\{q_i\}$, the $w_i$ are designed to minimize MSE. However, the design of $\{a_i\}$ is not straightforward.

The main difference between single- and multi-antenna receiver scenarios is that in the former, the virtual uplink only has one scalar unknown $q_i$ for each user, and in addition, $\mathrm{SINR}_i^{UL}$ is monotonically decreasing in $q_k$, $k > i$, and monotonically increasing in $q_i$. This means that, starting from a setting in which all required SINRs are met, increasing $q_K$ will decrease $\mathrm{SINR}_i^{UL}$ for $i < K$, necessitating an increase in $q_{K-1}$, and hence $q_{K-2}$ and so on. So $q_K$ should be at its minimum value for $\sum_i q_i$ to be minimized, and we can argue step by step that each $q_i$ should be minimized for the total power to be minimized. In the multi-antenna case, this argument fails, because an increase in $q_K$ can be accompanied by a change in $a_K$ which would increase $\mathrm{SINR}_i^{UL}$ for some $i < K$. The total required transmit power to satisfy the given SINR requirements may thus fall with an increase in $q_K$ (or any other virtual uplink user's power). An MMSE design for $a_i$ and $w_i$ on a per-user basis is thus not guaranteed to work.

In the next section, an iterative suboptimal methodology similar to the one in [53] is proposed to solve this problem. In [53], an iterative algorithm is derived to find the transmitter and receiver array weight vectors and transmitter powers for the uplink in a MIMO multi-cell system that minimizes the total transmitted power in the system while satisfying the SINR requirement at each receiver. To calculate each set of variables, the two other sets are assumed to be fixed. This cyclic algorithm has to be repeated until it converges to the final solution. Because of the non-convexity of the problem, a globally optimal solution is not guaranteed. Besides, a set of SINRs might not always be jointly feasible, and therefore the algorithm diverges at times. The algorithm proposed in [53] starts with arbitrary initial values for transmit beam vectors and powers in the uplink, while in our work, by taking advantage of interference pre-subtraction at the BS, instead of a random initialization, we are able to design individual transmit beam vectors to minimize MSE for individual users. As will be shown, convergence speed is improved substantially. Also, Appendix B shows that with our proposed algorithm any set of SINRs is jointly achievable.

4.2 Joint Power Allocation and MMSE Beamforming Using Uplink/Downlink Duality

4.2.1 Uplink-Downlink Duality for MIMO channels

Following the same procedure as for single-antenna users, it is vital to establish the uplink-downlink duality for MIMO multiuser channels as well.

Corollary 1: For a specific choice of transmit powers $p_i$, transmit beam vectors $b_i$, and receive beam vectors $f_i$, for $i = 1, \ldots, K$, in the downlink, there is a set of transmit powers $q_i$, receive beam vectors $w_i$, and transmit beam vectors $a_i$ in the virtual uplink, such that $\sum_{i=1}^{K} p_i = \sum_{i=1}^{K} q_i$ and the beam vectors satisfy $b_i = w_i^H$, $a_i = f_i^H$ for $i = 1, \ldots, K$, while the achieved SINRs are the same for both cases. Also, the decoding order in the virtual uplink has to be the reverse of the precoding order in the downlink.

The proof, which is based on the one presented in [10] for single-antenna users, is given in Appendix B. It is also proved there that any arbitrary set of SINRs can be jointly achieved.

Note that the channel gains of the virtual uplink are the same as the channel gains of the downlink; the dual channel is obtained by changing the receivers in the downlink into the transmitters in the uplink, and the multiple-antenna transmitter into a multiple-antenna receiver (see Figure 4.1).
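One way to see why the two sum powers coincide is the following sketch. It is a standard linear-algebra argument and may differ in detail from the proof in Appendix B. The conventions here are assumptions: all beam vectors are treated as unit-norm columns, so that the downlink uses $b_i = w_i$ and the receive filter $a_i^H$, and all SINR constraints are met with equality.

```latex
% Cross gains between virtual-uplink user k and BS filter i (assumed notation):
G_{ik} = \left| w_i^H H_k^H a_k \right|^2 , \qquad
D = \mathrm{diag}\!\left( G_{11}/\gamma_1, \ldots, G_{KK}/\gamma_K \right), \qquad
[\Psi]_{ik} = \begin{cases} G_{ik}, & k > i \\ 0, & k \le i \end{cases}

% Uplink constraints with equality (user i sees users k > i):
\frac{q_i G_{ii}}{\gamma_i} - \sum_{k>i} q_k G_{ik} = \sigma^2
\;\Longleftrightarrow\; (D - \Psi)\, q = \sigma^2 \mathbf{1}

% Downlink constraints with equality (user i sees streams k < i, with gain G_{ki}):
\frac{p_i G_{ii}}{\gamma_i} - \sum_{k<i} p_k G_{ki} = \sigma^2
\;\Longleftrightarrow\; (D - \Psi^T)\, p = \sigma^2 \mathbf{1}

% Sum powers agree because a scalar equals its own transpose:
\mathbf{1}^T p = \sigma^2\, \mathbf{1}^T (D - \Psi^T)^{-1} \mathbf{1}
             = \sigma^2\, \mathbf{1}^T (D - \Psi)^{-1} \mathbf{1}
             = \mathbf{1}^T q
```

Both systems are triangular with a positive diagonal, so they can be solved by substitution and every power comes out positive, which agrees with the statement that any set of target SINRs can be met.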

[Figure 4.1: block diagrams of the MIMO downlink (top) and the virtual MIMO uplink (bottom).]

Figure 4.1: Uplink-downlink duality: these two multi-user channels have the same achievable SINR region for a given sum power constraint.

4.2.2 Proposed Algorithm

Initialization

We already established that successive interference pre-subtraction on the downlink is equivalent to perfect successive interference cancellation on the virtual uplink, with the decoding order opposite of the encoding order, in the sense of both schemes having the same achievable SINR regions under a sum power constraint.

On the virtual uplink, assume that the decoding order is {1, ..., K}, so that user K is interference-free. We know that an MMSE design does not solve the problem, but we would like to initialize our iterative algorithm with the MMSE design, so a brief description of MMSE transmit and receive beamforming is in order. The initialization routine designs the MMSE filters for user K, then moves on to user K − 1, treating user K as known interference, and so on until user 1.

For user $i$, the received signal after SIC would be

$$r_i = \sum_{j=i}^{K} \sqrt{q_j}\, H_j^H a_j x_j + v_i, \tag{4.6}$$

where $v_i$ is the $t \times 1$ Gaussian noise vector for user $i$. $r_i$ is filtered by $w_i$ to yield $\hat{x}_i = w_i^H r_i$, and the MSE is then easily found to be

$$E_i = E|x_i - \hat{x}_i|^2 = w_i^H R_i w_i + 1 - \sqrt{q_i}\, w_i^H H_i^H a_i - \sqrt{q_i}\, a_i^H H_i w_i, \tag{4.7}$$

where

$$R_i = q_i H_i^H a_i a_i^H H_i + C_i \tag{4.8}$$

is the covariance matrix of $r_i$, and

$$C_i = \sigma^2 I + \sum_{l=i+1}^{K} q_l H_l^H a_l a_l^H H_l \tag{4.9}$$

is the covariance matrix of the noise and residual interference seen by user $i$.

Keeping in mind the successive nature of the initialization procedure, note that Ci is known when the beam vectors ai and wi for user i are designed.

Lemma 2: Minimizing the MSE $E_i$ with respect to both $w_i$ and $a_i$, subject to $\|a_i\| = 1$, for any set of transmit powers $\{q_i\}$, yields

$$w_i^{opt} = \sqrt{q_i}\, R_i^{-1} H_i^H a_i, \qquad a_i^{opt} = e_1(P_i), \tag{4.10}$$

where $e_1(X)$ denotes the eigenvector associated with the largest eigenvalue of $X$, $P_i = H_i C_i^{-1} H_i^H$, and $R_i$ and $C_i$ are defined in (4.8) and (4.9).

Proof:¹ Differentiating (4.7) with respect to $w_i^H$ and setting the result to zero yields

$$w_i^{opt} = \sqrt{q_i}\, R_i^{-1} H_i^H a_i = \sqrt{q_i}\, (q_i H_i^H a_i a_i^H H_i + C_i)^{-1} H_i^H a_i. \tag{4.11}$$

The scalar factor $\sqrt{q_i}$ does not affect the SINR, and $w_i$ is normalized to unit norm in Table 4.1. With this setting for $w_i$, user $i$'s mean square error is at its minimum value for the given $\{a_i\}$, and is given by

$$E_i^o = 1 - q_i\, a_i^H H_i (q_i H_i^H a_i a_i^H H_i + C_i)^{-1} H_i^H a_i = (1 + q_i\, a_i^H P_i a_i)^{-1}, \tag{4.12}$$

where the last equality is obtained by the matrix inversion lemma.

The setting of $a_i$ with $\|a_i\| = 1$ that minimizes $E_i^o$ must maximize $a_i^H P_i a_i$, and is therefore the eigenvector associated with the largest eigenvalue of $P_i$. ∎
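The matrix inversion lemma step in the proof can be written out as follows. This is a sketch that uses the shorthand $g_i = H_i^H a_i$ and $s_i = g_i^H C_i^{-1} g_i = a_i^H P_i a_i$, both introduced here only for brevity, together with the MMSE filter $w_i = \sqrt{q_i}\, R_i^{-1} g_i$ obtained from (4.7).

```latex
% Rank-one update of the inverse (Sherman-Morrison form of the matrix inversion lemma):
R_i^{-1} = \left( C_i + q_i\, g_i g_i^H \right)^{-1}
         = C_i^{-1} - \frac{q_i\, C_i^{-1} g_i g_i^H C_i^{-1}}{1 + q_i s_i}

% Quadratic form that appears in the minimum MSE:
q_i\, g_i^H R_i^{-1} g_i = q_i s_i - \frac{q_i^2 s_i^2}{1 + q_i s_i}
                         = \frac{q_i s_i}{1 + q_i s_i}

% Minimum MSE:
E_i^o = 1 - q_i\, g_i^H R_i^{-1} g_i
      = \frac{1}{1 + q_i s_i}
      = \left( 1 + q_i\, a_i^H P_i a_i \right)^{-1}

% The same lemma shows that the MMSE filter is parallel to C_i^{-1} g_i:
R_i^{-1} g_i = \frac{C_i^{-1} g_i}{1 + q_i s_i}
```

The last line explains why the beam vector direction can be computed from $C_i$ alone, before the power $q_i$ is known.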

Since minimizing MSE will always maximize SINR, the above MMSE settings of $w_i$ and $a_i$, which can be computed sequentially from user $K$ down to user 1, maximize SINR for a given set of transmit powers $\{q_i\}$. Equivalently, MMSE settings will minimize each $q_i$ for given $\{\gamma_i\}$; unfortunately, this does not ensure that the total power is minimized.

¹ The same result is obtained if we differentiate the Lagrangian of the problem with respect to $w_i^H$.

Finally, the power $q_i$ is determined from the required SINR for user $i$, $\gamma_i$:

$$q_i = \frac{\gamma_i\, w_i^H \left( \sigma^2 I + \sum_{k=i+1}^{K} q_k H_k^H a_k a_k^H H_k \right) w_i}{w_i^H H_i^H a_i a_i^H H_i w_i}, \tag{4.13}$$

starting with user $K$ and progressing down to user 1.
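As a small numerical illustration of the initialization for one user, the sketch below computes the principal eigenvector of $P_i$, the unit-norm receive vector and the power from (4.13). The helper name `init_user` and its argument layout are assumptions made for this example. The receive vector is taken proportional to $C_i^{-1} H_i^H a_i$, which is parallel to the MMSE filter, so the resulting power equals $\gamma_i$ divided by the largest eigenvalue of $P_i$.

```python
import numpy as np

def init_user(H, C, gamma):
    """One initialization step for a single user (hypothetical helper).

    H     : (r, t) downlink channel matrix H_i
    C     : (t, t) Hermitian noise-plus-interference covariance C_i
    gamma : target SINR (linear scale)
    Returns a_i, w_i, q_i and the largest eigenvalue of P_i.
    """
    P = H @ np.linalg.solve(C, H.conj().T)     # P_i = H_i C_i^{-1} H_i^H
    eigvals, eigvecs = np.linalg.eigh(P)       # eigenvalues in ascending order
    a = eigvecs[:, -1]                         # principal eigenvector, unit norm
    g = H.conj().T @ a                         # effective uplink channel H_i^H a_i
    w = np.linalg.solve(C, g)                  # parallel to the MMSE filter R_i^{-1} g
    w = w / np.linalg.norm(w)
    # power from (4.13); C already contains the terms of users k > i
    q = gamma * np.real(w.conj() @ C @ w) / abs(w.conj() @ g) ** 2
    return a, w, q, eigvals[-1]
```

Calling the helper for user K with `C = sigma2 * np.eye(t)` and then adding `q * np.outer(g, g.conj())` to `C` before moving to user K − 1 reproduces the sequential order described above.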

Iterations

After initialization we design the downlink and the virtual uplink in turn. On the downlink, the $w_i$ are used as transmit beam vectors, i.e. $b_i = w_i$, in conjunction with THP with an encoding order $K, \ldots, 1$. The receive beam vector $f_i$ is the MMSE receiver for user $i$, or

$$f_i = (H_i b_i)^H \left( H_i B^{1:i} (B^{1:i})^H H_i^H + \sigma^2 I \right)^{-1}, \tag{4.14}$$

where $B^{1:i}$ denotes columns 1 to $i$ of the beamforming matrix $B$ and $b_i = B^i$ is its $i$th column. The reason for employing an MMSE design is, once again, that it maximizes SINR for a given set of transmit powers.

The transmit powers $p_i$ are also adjusted sequentially, for $i = 1, \ldots, K$ (since $p_i$ depends only on $p_k$, $k < i$), to satisfy the SINR constraints $\gamma_i$:

$$p_i = \frac{\gamma_i\, f_i \left( \sum_{k=1}^{i-1} p_k H_i b_k b_k^H H_i^H + \sigma^2 I \right) f_i^H}{f_i H_i b_i b_i^H H_i^H f_i^H}. \tag{4.15}$$

The algorithm is iterated by applying the receive beam vectors in the downlink as transmit beam vectors in the virtual uplink, and the receive beam vectors in the virtual uplink as transmit beam vectors in the downlink. The algorithm is summarized in Table 4.1.

To solve the optimization problem (4.2), the precoding/decoding order has to be specified. As mentioned in Section 3.2.4, defining the optimal ordering is not easy. Here, we follow the same suboptimal ordering approach as presented in Section 3.2.4. In fact, since precoding orders are defined based on decoding orders for SIC, we consider the virtual uplink, and then the users are detected based on the squared Frobenius norm of their channel matrix, $\|H_i\|_F^2$ for $i = 1, \ldots, K$ (see Equation 3.31). The user with the larger channel matrix norm is detected first. The $K$th user, which is supposed not to see any interference, is the one with the smallest channel matrix norm. This method makes sense especially if the required SINRs for different users are the same. The same ordering is also obtained for the two-user case in [10] for $\gamma_1 = \gamma_2$, when the two users have single antennas and their channels are uncorrelated.

First step (virtual uplink):

    $R_n = \sigma^2 I$
    for $i = K : -1 : 1$
        $R_{H_i} = H_i R_n^{-1} H_i^H$;  $a_i = u_{R_{H_i},1}$ (principal eigenvector of $R_{H_i}$)
        $w_i = (H_i^H a_i a_i^H H_i + R_n)^{-1} H_i^H a_i$
        $w_i = w_i / \|w_i\|$;  $q_i = \gamma_i (w_i^H R_n w_i) / (w_i^H H_i^H a_i a_i^H H_i w_i)$
        $R_n = R_n + q_i H_i^H a_i a_i^H H_i$
    end

Second step (iterative algorithm):

    (Downlink) $C = 0$
    for $i = 1 : K$
        $B^i = w_i^H$;  $f_i = (H_i B^i)^H (H_i B^{1:i} (B^{1:i})^H H_i^H + \sigma^2 I)^{-1}$
        $f_i = f_i / \|f_i\|$;  $p_i = \gamma_i (f_i f_i^H \sigma^2 + f_i H_i C H_i^H f_i^H) / (f_i H_i B^i (B^i)^H H_i^H f_i^H)$
        $B^i = \sqrt{p_i}\, B^i$;  $C = C + B^i (B^i)^H$
    end

    (Virtual uplink) $R_n = \sigma^2 I$
    for $i = K : -1 : 1$
        $a_i = f_i^H$;  $w_i = a_i^H H_i (H_i^H a_i a_i^H H_i + R_n)^{-1}$
        $w_i = w_i / \|w_i\|$;  $q_i = \gamma_i (w_i R_n w_i^H) / (w_i H_i^H a_i a_i^H H_i w_i^H)$
        $R_n = R_n + q_i H_i^H a_i a_i^H H_i$
    end

    (back to second step)

Table 4.1: The precoding/beamforming algorithm for MIMO-BC channels minimizing total transmit power.
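The steps of Table 4.1 and the ordering rule can be sketched in NumPy as follows. This is an illustrative reading, not the thesis implementation. The function names (`order_users`, `uplink_pass`, `downlink_pass`, `ttpc_sketch`) are hypothetical, all beam vectors are stored as columns (so the downlink uses $b_i = w_i$ and the receive filter $f_i^H$), each filter is taken proportional to the inverse interference covariance times the effective channel (which is parallel to the MMSE filter), and the THP modulo operation and the matrices F and G of Figure 3.2 are not modeled.

```python
import numpy as np

def order_users(H):
    """Indices sorted by decreasing squared Frobenius norm (largest is detected first)."""
    return sorted(range(len(H)), key=lambda i: -np.linalg.norm(H[i], "fro") ** 2)

def uplink_pass(H, A, gamma, sigma2):
    """Virtual uplink, users K..1. If A is None, a_i is the principal eigenvector
    of H_i Rn^{-1} H_i^H (first step); otherwise a_i = A[i] is kept fixed."""
    K, t = len(H), H[0].shape[1]
    A_out, W, q = [None] * K, [None] * K, np.zeros(K)
    Rn = sigma2 * np.eye(t, dtype=complex)
    for i in range(K - 1, -1, -1):
        if A is None:
            P = H[i] @ np.linalg.solve(Rn, H[i].conj().T)
            a = np.linalg.eigh(P)[1][:, -1]
        else:
            a = A[i]
        g = H[i].conj().T @ a                  # effective uplink channel H_i^H a_i
        w = np.linalg.solve(Rn, g)             # same direction as the MMSE filter
        w = w / np.linalg.norm(w)
        q[i] = gamma[i] * np.real(w.conj() @ Rn @ w) / abs(w.conj() @ g) ** 2
        Rn = Rn + q[i] * np.outer(g, g.conj())
        A_out[i], W[i] = a, w
    return A_out, W, q

def downlink_pass(H, W, gamma, sigma2):
    """Downlink, users 1..K, with b_i = w_i and interference only from streams k < i."""
    K, t = len(H), H[0].shape[1]
    F, p = [None] * K, np.zeros(K)
    C = np.zeros((t, t), dtype=complex)        # sum of p_k b_k b_k^H for k < i
    for i in range(K):
        g = H[i] @ W[i]                        # effective downlink channel H_i b_i
        Rint = H[i] @ C @ H[i].conj().T + sigma2 * np.eye(H[i].shape[0])
        f = np.linalg.solve(Rint, g)
        f = f / np.linalg.norm(f)
        p[i] = gamma[i] * np.real(f.conj() @ Rint @ f) / abs(f.conj() @ g) ** 2
        C = C + p[i] * np.outer(W[i], W[i].conj())
        F[i] = f
    return F, p

def ttpc_sketch(H, gamma, sigma2=1.0, n_iter=5):
    """First step, then n_iter rounds of (virtual uplink, downlink)."""
    _, W, _ = uplink_pass(H, None, gamma, sigma2)
    F, p = downlink_pass(H, W, gamma, sigma2)
    totals = [p.sum()]
    for _ in range(n_iter):
        _, W, _ = uplink_pass(H, F, gamma, sigma2)
        F, p = downlink_pass(H, W, gamma, sigma2)
        totals.append(p.sum())
    return W, F, p, totals
```

A typical call would first reorder the channels with `H = [H[i] for i in order_users(H)]` and then run `ttpc_sketch(H, gamma)`. The list `totals` records the total downlink power after each round, which gives a convenient way to inspect convergence.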

The Tomlinson-Harashima precoding design is explained in Section 3.1.2. The design of the feedback matrix $G$ for a given beamforming matrix $B$ and feedforward matrix $F$ is given in Section 3.2.1, Equation 3.12. Here, the same design is applied, but since we transmit only a single symbol to each user, $G_j^T$ in Equation 3.4 is a $1 \times (K - j)$ vector.

4.3 Space-Time Spreading

In the proposed scheme, as in the design in Chapter 3, there is no limitation on the number of users in the system. However, as the number of active users is increased, the transmitted power has to increase too. So if there is a peak power constraint, the number of users we can accommodate is in fact limited. On the other hand, if we are allowed to increase the bandwidth by spreading, we can increase the number of active users without increasing the peak power. As mentioned in Section 3.3, the beamforming then has to be optimized over two dimensions: space and time. Instead of assigning a beam vector $b_i$ to each user, one $t \times G$ beamforming matrix $\Phi_i$, $i = 1, \ldots, K$, is assigned to each user, where $G$ is the processing gain. The transmitted signal and the signal received by user $i$ are the same as in equations (3.32) and (3.33) respectively (for $z_i = 1$, $i = 1, \ldots, K$). Then the multi-user signal model for the overall channel is

$$\tilde{y}_i = \tilde{H}_i C x + \tilde{n}_i, \tag{4.16}$$

where $\tilde{H}_i = \mathrm{diag}(H_i, \ldots, H_i)$, $\tilde{n}_i = \mathrm{vec}(N_i)$ and $C = [\mathrm{vec}(\Phi_1), \ldots, \mathrm{vec}(\Phi_K)]$. Each column of $C$ corresponds to a space-time modulation waveform for one data stream.
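The stacking that turns the space-time model into (4.16) can be checked numerically. The sketch below assumes that the noise-free block received by a user over G channel uses is $H_i \Phi$ (equations (3.32) and (3.33) are not reproduced here, so this form is an assumption), that vec stacks columns, and that diag(H, ..., H) holds G copies of H. The helper names `vec` and `stacked_channel` are introduced only for this example.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

def stacked_channel(H, G):
    """Block-diagonal matrix diag(H, ..., H) with G copies of H."""
    return np.kron(np.eye(G), H)

rng = np.random.default_rng(0)
r, t, G = 2, 3, 4
H = rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))
Phi = rng.standard_normal((t, G)) + 1j * rng.standard_normal((t, G))

lhs = vec(H @ Phi)                       # stacked noise-free received block
rhs = stacked_channel(H, G) @ vec(Phi)   # H-tilde times the space-time waveform
print(np.allclose(lhs, rhs))
```

The two sides agree because column j of $H \Phi$ is $H$ times column j of $\Phi$, which is exactly what the block-diagonal matrix computes block by block.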

This presents exactly the same system model as (4.1), and therefore we can apply the same strategy explained in Section 4.2.2 to design the matrices $C$, $F$, and $G$. Similarly, in the virtual uplink, the received data for user $i$ after SIC, over $G$ channel uses, is:

$$Z_i = \sum_{j=i}^{K} \sqrt{q_j}\, H_j^H A_j x_j + V_i \in \mathbb{C}^{t \times G}, \tag{4.17}$$

where $V_i$ represents the received noise at the BS, and $A_j$ is the transmit beamforming matrix ($r_j \times G$) of user $j$. Again, by stacking the columns of $Z_i$, we obtain

$$\tilde{r}_i = \sum_{j=i}^{K} \sqrt{q_j}\, \tilde{H}_j^H \tilde{a}_j x_j + \tilde{v}_i, \tag{4.18}$$

where $\tilde{r}_i = \mathrm{vec}(Z_i)$, $\tilde{H}_j^H = \mathrm{diag}(H_j^H, \ldots, H_j^H)$, $\tilde{v}_i = \mathrm{vec}(V_i)$ and $\tilde{a}_j = \mathrm{vec}(A_j)$. Having the same system model as (4.6) motivates us to apply the same algorithm explained in Table 4.1 to design the transmit/receive beamforming vectors.

As an alternative, we can assign the users to different time slots in a TDM fashion. However, as also explained in Section 3.3, by applying the algorithm once over all users as presented in signal model (4.18), the subsets of users that result in the least amount of interference are automatically selected to be transmitted at the same time. In fact, in the first step of the algorithm presented in Table 4.1, each user signal is transmitted through the subchannel with the minimum amount of interference. The result of the algorithm is to assign each user to only one time slot (the one with the least amount of interference), rather than spreading each user symbol over all time slots. As will be shown by numerical results in Section 4.5, even though the channels are constant during one symbol time, the algorithm benefits from selecting the simultaneous users intelligently.

4.4 Multiple Symbol Transmission to Each User

To generalize the algorithm proposed in Table 4.1 to transmit multiple symbols, $z_i \le \min(t, r_i)$ for $i = 1, \ldots, K$, to each user, for the first step of the algorithm (initialization) one may think of designing the beam vectors as the eigenvectors associated with the $z_i$ largest eigenvalues of the matrix $P_i = H_i C_i^{-1} H_i^H$ (see equations 4.8–4.10). Then the powers could be adjusted through waterfilling. However, since we have iterations in this algorithm, the beam vectors obtained in the second step of the algorithm (iterations) are not the eigenvectors of the corresponding whitened channel, and therefore it is difficult to adjust the powers.

One way to solve this problem is to apply interference pre-subtraction over the symbols transmitted to the same user as well. Precoding ordering over the users can be done as explained in Section 4.2.2, and then a random ordering can be applied over the symbols transmitted to the same user. The algorithm presented in Table 4.1 can then be easily modified to perform multiple symbol transmission to each user, as summarized in Table C.1 in Appendix C.

The other solution is that, if we have space-time precoding/beamforming as introduced in Section 4.3, then at the first step of the algorithm the beam vectors to transmit multiple symbols to each user can be designed as the eigenvectors corresponding to the $z_i$ largest eigenvalues of the whitened channel matrix in the uplink. Since the channel matrix in the uplink ($\tilde{H}_j$) is a block diagonal matrix with $\min(t, r_i)$ different eigenvalues and $G \min(t, r_i)$ eigenvectors, the eigenvectors associated with the $G$ largest eigenvalues are orthogonal in time. Therefore if $G \ge \min(t, r_i)$, all $z_i$ beam vectors used to transmit the symbols to the same user are orthogonal in time, and the algorithm proposed in Table 4.1 can be easily applied to transmit multiple symbols, with interference pre-subtraction only over the users, not over the symbols transmitted to the same user. In fact, there is no need for waterfilling to adjust the powers for the symbols of the same user, and since the beam vectors remain orthogonal in the iterations as well, the powers can be obtained in the same way as for single-symbol transmission. The algorithm is summarized in Table C.2 in Appendix C.

4.5 Simulation Results

In this section we provide some numerical results to illustrate the performance of the proposed iterative algorithm. In all of the simulations a Rayleigh fading channel is assumed, with zero-mean uncorrelated complex Gaussian noise across the receive antennas ($R_n = \sigma^2 I$). The elements of the channel matrix $H$ are generated as independent and identically distributed (i.i.d.) samples of a complex Gaussian process with zero mean and unit variance. The channel matrix is known at the transmitter. Each user is assumed to have full knowledge of its own channel and also of the transmit matrix $B$. All the results are presented in terms of the total transmit power when the noise variance is $\sigma^2 = 1$.

For simplicity, the same SINR constraints and the same number of receive antennas are assumed for all users ($r_i = r$, $i = 1, \ldots, K$).

In Figure 4.2, we examine the convergence and the role of the MMSE design² in the first step, versus random initializations. The results are also compared with the iterative linear beamforming algorithm (based on [53]), in which there is no precoding at the BS. There are four transmit antennas and four receive antennas for each user. The required SINR for all users is 10 dB. It can be seen that with the MMSE design, the algorithm has converged to the final point after 5 iterations, and even if it is stopped after one iteration, the loss is less than 0.25 dB. In the first iteration, the beam vectors at the BS are derived in the virtual uplink and then the beam vectors at the users are obtained in the downlink. With a random initialization, the algorithm sometimes converges after 15 to 20 iterations, but in some cases it needs more than 25 iterations. We examined over 1000 random initializations, but the results are shown for only 3 cases. The comparison with linear beamforming shows the improvement achieved by interference pre-subtraction at the BS.

² In this section, the proposed algorithm in Table 4.1 is referred to as MMSE.

[Figure 4.2: plot of total transmit power (dB) versus iteration for Linear Beamforming, Random+Precoding and MMSE+Precoding.]

Figure 4.2: Performance of the iterative linear beamforming and the proposed algorithm with MMSE and random initializations, for t = r = 4, K = 4, SINR = 10 dB.

Figure 4.3 shows the effect of increasing the number of transmit/receive antennas on the total power. By increasing the number of receive antennas from r = 1 to r = 2 and r = 4, the total power is decreased by 2.5 dB and 4.5 dB, respectively, at SINR = 10 dB. This can significantly decrease the total generated interference in the network. Also, by increasing the number of transmit antennas from t = 4 to t = 8, the total power is decreased by 3.5 dB at SINR = 10 dB.

[Figure 4.3: plot of total transmit power (dB) versus SINR (dB) for (t=8, r=4), (t=4, r=4), (t=4, r=2) and (t=4, r=1).]

Figure 4.3: Total transmit power versus the required SINR for different numbers of transmit/receive antennas for K = 4.

[Figure 4.4: two panels, t = 4 (top) and t = 8 (bottom), each showing transmit power per user versus the number of active users, K.]

Figure 4.4: Transmit power per user versus the number of active users in the system for r = 4, SINR = 10 dB.

Figure 4.4 shows how the transmit power per user changes as the number of active users increases. It can be seen that when the number of users exceeds the number of transmit antennas, the total power increases significantly. However, the algorithm still converges, while the iterative linear beamforming algorithm (no precoding) diverges if K > t for SINR > 0 dB.

[Figure 4.5: plot of total transmit power (dB) versus iteration for STS-MMSE, STS-Random and SS-MMSE.]

Figure 4.5: Precoding/Beamforming over space and time for t = r = 4, K = 16.

In Figure 4.5, space-time precoding/beamforming is performed for K = 16 users with t = 4 transmit and r = 4 receive antennas and processing gain G = 4, which means the signal is spread over four time slots. In SS-MMSE, four users are assigned to each time slot randomly (TDMA fashion), and then the MMSE algorithm is performed over each time slot independently, while in STS-Random and STS-MMSE, the algorithm is performed over all users together as explained in Section 4.3. The initial vectors of the iterative algorithm in STS-Random are random, whereas in STS-MMSE they are calculated from the system model (4.18) using (4.10). Compared with Figure 4.4 (r = 4), in which all the users are served at the same time, it can be seen that for K = 16 users, the total power over all users is decreased from 36.8 dB to 13.1 dB by optimization over time as well as space. The price for this improvement is the bandwidth efficiency, which is decreased by a factor of four.
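To put the quoted reduction into linear terms, the short calculation below converts the difference between 36.8 dB and 13.1 dB into a power ratio. The helper name `db_to_linear` is introduced here for illustration; the two dB values are the ones stated in the text.

```python
def db_to_linear(x_db):
    """Convert a power ratio from decibels to linear scale."""
    return 10.0 ** (x_db / 10.0)

saving_db = 36.8 - 13.1            # reduction in total transmit power, in dB
ratio = db_to_linear(saving_db)    # the same reduction as a linear power ratio
print(f"{saving_db:.1f} dB corresponds to a factor of about {ratio:.0f}")
```

A 23.7 dB reduction corresponds to a total power that is lower by a factor of roughly 230, obtained at the cost of the fourfold loss in bandwidth efficiency.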

[Figure 4.6: plot of total transmit power (dB) versus iteration.]

Figure 4.6: Precoding/Beamforming over space and time for t = r = 4, K = 8.

In Figure 4.6, space-time precoding/beamforming is performed over a system with four transmit antennas and G = 4. However, two symbols are transmitted to each user, and there are eight users with four receive antennas each. THP is applied only over different users, which means there is no interference pre-subtraction for the symbols transmitted to the same user. As explained in Section 4.4, since the beamforming vectors used to transmit two symbols to each user are orthogonal in time, the total transmit power is the same as in the case of transmitting a single symbol to each user for K = 16.

In Table 4.2 and Table 4.3 we compare the error rate performance of the algorithm proposed in this chapter (minimizing total transmit power, TTPC) and the algorithm proposed in Chapter 3 (MMSE with per-user power constraint, PUPC). To obtain these results, the TTPC algorithm is used to fulfill the SINR = 10 dB requirement for all users. The probability of error is calculated for SNR = 0 dB, i.e. σ² = 1, and SNR = 10 dB, i.e. σ² = 0.1, for four transmit/receive antennas and G = 1, K = 4 as well as G = 4, K = 16. Then, with the same total transmit power obtained by the TTPC algorithm, the transmit/receive beam vectors are obtained with the PUPC algorithm with equal powers for all users, and the probability of error is obtained for comparison. As expected, the average Pe with PUPC is much higher than the one with TTPC. It follows that for the same bit error probability, the PUPC algorithm requires more power than TTPC.

                              TTPC            PUPC
Average Pe (σ² = 1)           5.4 × 10^-5     3.3 × 10^-3
Pe for user 1 (σ² = 1)        4.3 × 10^-6     7 × 10^-8
Pe for user 4 (σ² = 1)        7.3 × 10^-5     1.3 × 10^-2
Average Pe (σ² = 0.1)         4.8 × 10^-6     5.3 × 10^-3
Pe for user 1 (σ² = 0.1)      2.86 × 10^-6    < 10^-8
Pe for user 4 (σ² = 0.1)      6.4 × 10^-6     1.78 × 10^-2

Table 4.2: The error rate performance of the TTPC versus the PUPC algorithm for t = r = 4, K = 4, SINR = 10 dB.

                              TTPC            PUPC
Average Pe (σ² = 1)           3.3 × 10^-5     9.7 × 10^-3
Pe for user 1 (σ² = 1)        9.6 × 10^-5     8 × 10^-8
Pe for user 16 (σ² = 1)       1.93 × 10^-4    2.6 × 10^-2

Table 4.3: The error rate performance of the TTPC versus the PUPC algorithm for t = r = 4, G = 4, K = 16, SINR = 10 dB.

4.6 Summary

An algorithm based on non-orthogonal minimum mean-square-error (MMSE) beamforming combined with non-linear interference pre-subtraction is proposed, which is applicable to a multiuser broadcast channel (BC) with any number of transmit/receive antennas. The design is based on an uplink-downlink duality theorem discovered for MIMO multiuser systems, and the goal is the minimization of the total transmit power subject to the signal-to-interference-plus-noise ratios (SINRs) at the decentralized receivers being above their target levels. When the precoding/beamforming design is extended to the time dimension, the proposed design in effect allocates each user to the time slot with the least amount of interference. The algorithm converges within its first few iterations; however, because of the non-convexity of the problem, it may not converge to the global optimum, which is still unknown.

In summary, the algorithm proposed here provides a high-performance method for transmit and receive beamforming design in a downlink multi-user system that outperforms any other method currently known in the literature. The results of this chapter are published in [56,57].

Chapter 5

Precoding and Beamforming for the Down-link in a MIMO/OFDM System

In this chapter, in contrast to the previous chapters, we consider precoding and beamforming for the MIMO multiuser frequency-selective fading channel and investigate the effect of delay spread on space-time beamforming. In general, frequency-selective fading can severely degrade system performance by causing inter-symbol interference (ISI), resulting in an irreducible bit error rate (BER) that imposes an upper limit on the data rate. In particular, space-time code (STC) design becomes a complicated problem. In order to transmit over frequency-selective MIMO channels, the same design principles as those described in earlier chapters may be applied as long as the channel can be transformed into one resembling flat fading.

Orthogonal frequency-division multiplexing (OFDM) modulation is a practical solution which transforms a frequency-selective fading channel into parallel, possibly correlated, flat fading channels. Frequency components which are separated by the channel coherence bandwidth or more may be assumed to be totally uncorrelated.

As discussed before, MIMO systems offer considerable performance improvement over single-antenna systems because of the achievable diversity gain. In fact, space-time coding (STC) was proposed to take advantage of spatial diversity. In frequency-selective MIMO channels, there is an additional source of diversity, frequency diversity, due to the existence of multiple propagation paths between each transmit and receive antenna pair. By combining OFDM modulation with MIMO systems, the channel frequency diversity can also be exploited through the proper design of space-frequency codes. The strategy is to distribute the channel symbols over different transmit antennas and OFDM tones within one OFDM block. If a longer decoding delay and a higher decoding complexity are allowable, this can be extended to coding over several OFDM blocks, which results in space-time-frequency coding.

The contributions of this chapter are as follows.

Without any knowledge of the channel at the transmitter (the same scenario as in Chapter 2, but for frequency-selective channels), we propose a method for multiple access using multiple antennas at both ends of the communication link (MIMO) and orthogonal frequency-division multiplexing (OFDM). The MIMO component of the system serves to provide spatial diversity and increase bandwidth efficiency, whereas the OFDM component tackles the frequency selectivity of the channel. The unique feature of the proposed design is that it is based on direct-sequence spread spectrum, except that the spreading codes are defined in the space and frequency domains rather than the time domain. Exploitation of frequency diversity is an inherent feature of the designed codes. With second-order statistical channel knowledge at the transmitter, this allows for power allocation schemes over spatial and frequency channels.

Then, assuming perfect channel knowledge at the transmitter, the techniques proposed in Chapter 3 and Chapter 4 are extended to precoding and beamforming design over space and frequency when OFDM is used over frequency-selective channels. It will be shown that, with cooperation among the processing at different frequency bins, the performance improves significantly. In fact, the proposed design intelligently selects the best frequency channel for each user.

The remainder of this chapter is organized as follows. In the next section, a brief overview of space-frequency spreading (SFS) schemes for single-user MIMO/OFDM systems is presented. The transmit and receive signal models for multiuser MIMO/OFDM systems are presented in Section 5.2. The SFS matrix design when no channel information is available at the transmitter is explained in Section 5.3 and compared with MIMO multi-carrier CDMA schemes in Section 5.4. Assuming perfect channel knowledge at the transmitter, space-frequency precoding and beamforming is explained in Section 5.5. Simulation results are given in Section 5.6, and we conclude with Section 5.7.

5.1 Single User MIMO/OFDM Systems

There are a few works on space-time coding over MIMO/OFDM single user systems which we briefly review here for completeness.

In [58], previously existing trellis-structured STCs were extended to frequency-selective channels by replacing the time domain with the frequency domain. The design is analyzed based on the pairwise error probability (PEP) under optimal detection. The resulting space-frequency codes could achieve only spatial diversity and were not guaranteed to achieve full frequency diversity.

In [59–61], performance criteria are derived for space-frequency coded MIMO/OFDM systems based on the maximum likelihood (ML) PEP. The maximum achievable diversity order was found to be the product of the number of transmit antennas, the number of receive antennas and the number of delay paths. The authors of [60] showed that space-time codes designed to achieve full spatial diversity in flat fading channels will, in general, not achieve full space-frequency diversity in frequency-selective channels, and therefore a completely new code design procedure is required for MIMO/OFDM systems. Then, in [61], they proposed a design scheme for a class of space-frequency codes obtained by multiplying a part of the discrete Fourier transform (DFT) matrix with the input symbol vectors. The obtained codes achieve full spatial and frequency diversity at the expense of bandwidth efficiency. Indeed, they suffer from a low multiplexing gain: the code rate (see Equation 1.7) is R < 1.

In [62,63], design methods are proposed to obtain full diversity space-frequency codes which provide higher data rates than the approach described in [61]. In the best scenario they can transmit one symbol per subcarrier.

5.2 Signal Model

We consider a MIMO/OFDM multiuser system with $N_f$ subcarriers and K users, where the base station has t transmit antennas and each receiver has $r_i$ antennas for i = 1, ..., K. The frequency-selective fading channels between different transmit and receive antenna pairs are assumed to have L independent paths and the same power delay profile. The channel frequency diversity is then at least equal to L.

5.2.1 Transmit Signal

In Chapter 2, we defined space-time spreading as the transmission of the KM symbols $x_i(q)$, i = 1, ..., K, q = 0, ..., M − 1, using G × t spreading matrices $\Phi_i$, i.e. the transmitted signal over the qth symbol interval is

$$S(q) = \sum_{i=1}^{K} x_i(q)\,\Phi_i \in \mathbb{C}^{G \times t}.$$

The pth column of S(q) denotes the time sequence transmitted over the pth antenna.

In order to transmit over frequency-selective MIMO channels, the same design may be applied as long as the channel can be transformed into one resembling flat fading, either through time-domain equalization or through OFDM. Here, the latter route is selected for practical reasons, because it significantly reduces the receiver complexity. As shown in Figure 5.1, the OFDM transmitter applies an $N_f$-point inverse fast Fourier transform to the sequence of symbols transmitted over each antenna. A cyclic prefix is added to each OFDM symbol before transmission to avoid any ISI between the OFDM symbols due to the channel delay spread. The cyclic prefix is a copy of the last ν chips of the OFDM symbol, where ν is the maximum delay spread of the channel.
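The effect of the cyclic prefix can be checked numerically. The following is a minimal sketch for a single antenna pair, written in plain Python with a naive DFT so that it runs without third-party packages; the tap values, the symbol vector and all function names are hypothetical. It illustrates that, once a prefix of ν chips is added and removed, every subcarrier sees a flat channel, i.e. Y[k] = H[k] X[k] in the absence of noise.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def ofdm_transmit(freq_symbols, nu):
    """IFFT followed by a cyclic prefix made of the last nu time samples."""
    time = dft(freq_symbols, inverse=True)
    return time[-nu:] + time if nu > 0 else time

def channel(tx, taps):
    """Linear convolution with the channel taps, truncated to len(tx)."""
    return [sum(taps[l] * tx[n - l] for l in range(len(taps)) if n - l >= 0)
            for n in range(len(tx))]

def ofdm_receive(rx, nu):
    """Drop the cyclic prefix and return to the frequency domain."""
    return dft(rx[nu:])

N_F, NU = 8, 1                      # FFT size and delay spread in chips
taps = [0.8, 0.6j]                  # hypothetical two-tap channel (nu + 1 taps)
X = [1, -1, 1, 1, -1, -1, 1, -1]    # BPSK symbols, one per subcarrier

Y = ofdm_receive(channel(ofdm_transmit(X, NU), taps), NU)
H = dft(taps + [0] * (N_F - len(taps)))   # per-subcarrier channel gains

# Each subcarrier sees a flat channel: Y[k] = H[k] * X[k]
max_err = max(abs(Y[k] - H[k] * X[k]) for k in range(N_F))
print(f"max |Y[k] - H[k] X[k]| = {max_err:.2e}")
```

The prefix turns the linear convolution with the channel into a circular one over the $N_f$ retained samples, which is what makes the per-subcarrier product hold exactly.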

The transmit signal can be designed in two different ways.

Method I

As shown in Figure 5.1, after spreading over the different users' symbols, instead of transmitting S(0), ..., S(M − 1), we now perform an inverse fast Fourier transform on a chip sequence spanning a number of symbol intervals per antenna. For instance, if we wanted to use a 2G-point FFT¹, we would form a super-symbol of two of the original symbols and pass that through an IFFT operation. The transmitted space-time signal will now be

$$\check{S}(\tilde{q}) = F^H \begin{bmatrix} S(q) \\ S(q+1) \end{bmatrix} = F^H \tilde{S}(\tilde{q}) \in \mathbb{C}^{2G \times t} \qquad (5.1)$$

where F is the FFT matrix of order 2G and $\tilde{q} = \lceil q/2 \rceil$. As with S(q), the columns of $\check{S}(\tilde{q})$ denote the samples transmitted in time over each antenna, at the same chip rate G/T, where T is the symbol interval, as before. In other words, there is no bandwidth expansion introduced by the multi-carrier modulation operation.

¹ The order of the FFT to be performed in OFDM is dependent on several factors such as bandwidth efficiency, coherence time and receiver complexity.

In $\tilde{S}(\tilde{q})$, we now have symbols transmitted in space and frequency, so it is appropriate to term this form of modulation “space-frequency” modulation or spreading. Space-frequency spreading allows us to control the signals we transmit over orthogonal, parallel frequency channels.

In general, we will need to consider spreading across P symbol intervals per user, so that the super-symbol $\tilde{S}(\tilde{q})$ is given by

$$\tilde{S}(\tilde{q}) = \sum_{i=1}^{K} \begin{bmatrix} x_i(q)\,\Phi_{i,1} \\ \vdots \\ x_i(q+P-1)\,\Phi_{i,P} \end{bmatrix} = \sum_{i=1}^{K} \tilde{S}_i(\tilde{q}) \qquad (5.2)$$

where the number of FFT bins used is $N_f = P \cdot G$. Note that $x_i(q)$ is transmitted only using $\Phi_{i,1}$ above. This implies that each symbol is spread over only G frequency bins rather than $N_f$ frequency bins. Therefore, the symbols are not spread over all uncorrelated frequency bins, and hence in this method full channel frequency diversity may not be achieved. In [60], it was also shown that by applying OFDM over space-time codes which are designed for flat fading channels, full frequency diversity is not generally provided.
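The block structure of Method I can be made concrete with a small sketch for a single user. The spreading matrix values and the helper names (`super_symbol`, `occupied_bins`) are hypothetical; the point is only that a symbol sent in the first interval touches the first G rows (frequency bins) of the super-symbol and no others.

```python
def scale(x, phi):
    """Multiply every entry of a matrix (list of rows) by the scalar x."""
    return [[x * v for v in row] for row in phi]

def super_symbol(symbols, phis):
    """Stack x(q + p) * Phi_p for p = 0, ..., P - 1 into a (P*G) x t matrix."""
    return [row for x, phi in zip(symbols, phis) for row in scale(x, phi)]

def occupied_bins(matrix):
    """Indices of the rows (frequency bins) that carry a non-zero entry."""
    return [n for n, row in enumerate(matrix) if any(v != 0 for v in row)]

G, T, P = 2, 2, 2
PHI = [[0.5, 0.5], [0.5, -0.5]]                     # hypothetical G x t spreading matrix
only_first = super_symbol([1.0, 0.0], [PHI] * P)    # x(q) = 1, x(q + 1) = 0
print(occupied_bins(only_first))                    # bins used by x(q): [0, 1]
```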

[Block diagram: space-time spreading, then per transmit antenna an IFFT and cyclic prefix insertion; per receive antenna cyclic prefix removal and an FFT, then space-time despreading]

Figure 5.1: OFDM/MIMO Block Diagram

Method II

We have the same structure as in Figure 5.1 and, as in the previous subsection, we assume that the number of FFT bins is $N_f = PG$ and that the channel frequency diversity is at least L. Frequency bins that are $N_f/L$ tones (the channel coherence bandwidth) or more apart are assumed to be totally uncorrelated. If the symbols are not allowed to be spread over all uncorrelated bins (as in Method I), we will not exploit the full frequency diversity. On the other hand, spreading over all bins (even the correlated bins which lie inside the channel coherence bandwidth) does not increase the achieved frequency diversity, but it does increase the interference between the users' symbols which are sent in the same frequency channels.
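The spacing of $N_f/L$ tones can be illustrated with a simple channel model. The sketch below assumes L independent, zero-mean, equal-power taps at delays 0, ..., L − 1 chips (an assumption made here for illustration, in line with the equal-power tap channels used later in the simulations). Under that model the normalized correlation between two bins that are d tones apart is $\rho(d) = \frac{1}{L}\sum_{l=0}^{L-1} e^{j 2\pi l d / N_f}$, which vanishes at $d = N_f/L$. The function name is hypothetical.

```python
import cmath

def bin_correlation(d, n_f, n_taps):
    """Normalized correlation E[H(k) H*(k+d)] for i.i.d. equal-power taps
    at delays 0, ..., n_taps - 1 (assumed channel model)."""
    return sum(cmath.exp(2j * cmath.pi * l * d / n_f) for l in range(n_taps)) / n_taps

N_F, L = 8, 2
for d in range(N_F // L + 1):
    print(f"d = {d}: |rho| = {abs(bin_correlation(d, N_F, L)):.3f}")
```

For $N_f = 8$ and L = 2 the magnitude decreases from 1 at d = 0 to 0 at d = 4, so neighbouring bins are strongly correlated while bins four tones apart are uncorrelated.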

Here, to provide full frequency diversity, the super-symbol $\tilde{S}(\tilde{q})$ is defined by

$$\tilde{S}(\tilde{q}) = \sum_{i=1}^{K} \sum_{q=0}^{P-1} x_i(q)\,\Phi_i(q) \in \mathbb{C}^{N_f \times t} \qquad (5.3)$$

which is passed through an IFFT operation, while we allow $\Phi_i(q)$ to have non-zero values in any row. This implies that one symbol, say $x_1(0)$, can occupy any frequency bin, as opposed to Method I, where it will only reside in the first block of G frequencies.

In both methods, K symbols are transmitted in each OFDM block of $N_f$ time slots, during which the channel is assumed to be constant.

In the rest of this chapter, Method II is applied in order to capture the full frequency diversity. The transmit signal model is also simplified to

$$\tilde{S} = \sum_{i=1}^{K} x_i\,\Phi_i \in \mathbb{C}^{N_f \times t} \qquad (5.4)$$

5.2.2 Received Signal

The same signal models as (2.2, 2.4) in Chapter 2 are valid, but the discrete-time channel gains are defined by a multi-ray model with a maximum delay spread of ν chips, and the channel frequency diversity is at least L (≤ ν). At each receive antenna, the cyclic prefix is first removed from each OFDM block, which is then fed to an OFDM demodulator (Figure 5.1). After taking an $N_f$-point FFT, we end up with $N_f$ MIMO flat fading channels, one per subcarrier. If we stack the signals of the receive antennas over the $N_f$ chip time slots in an $N_f r_i \times 1$ vector $\mathbf{r}_i$, then, with the same notation as in Section 2.1 ($C = [\mathrm{vec}(\Phi_1), \ldots, \mathrm{vec}(\Phi_K)]$), we have

$$\mathbf{r}_i = \begin{bmatrix} H_i(0) & & 0 \\ & \ddots & \\ 0 & & H_i(N_f - 1) \end{bmatrix} C x + n_i \qquad (5.5)$$

$$\phantom{\mathbf{r}_i} = \tilde{H}_i C x + n_i, \qquad (5.6)$$

where $H_i(n)$, for n = 0, ..., $N_f − 1$, is the $r_i \times t$ matrix whose entries are the nth frequency components of the channels between each transmit-receive antenna pair of user i.
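The stacked model can be checked against a subcarrier-by-subcarrier computation. The sketch below is noise-free and uses small hypothetical dimensions and values; it also assumes that vec(·) stacks the rows (frequency bins) of each spreading matrix, an ordering chosen here so that the block-diagonal channel matrix acts on one subcarrier at a time. All helper names are hypothetical.

```python
def matvec(a, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_rc * v_c for a_rc, v_c in zip(row, v)) for row in a]

def block_diag(blocks):
    """Place the given matrices along the diagonal of a larger zero matrix."""
    n_rows = sum(len(b) for b in blocks)
    n_cols = sum(len(b[0]) for b in blocks)
    out = [[0.0] * n_cols for _ in range(n_rows)]
    r0 = c0 = 0
    for b in blocks:
        for r, row in enumerate(b):
            for c, val in enumerate(row):
                out[r0 + r][c0 + c] = val
        r0 += len(b)
        c0 += len(b[0])
    return out

def stack_rows(phi):
    """Stack the rows (frequency bins) of an N_f x t matrix into one vector.
    This ordering is an assumption chosen to match the block-diagonal model."""
    return [v for row in phi for v in row]

N_F, T, R, K = 2, 2, 2, 2
H = [[[1.0, 0.5], [0.2, -1.0]],      # H_i(0), hypothetical r_i x t gains
     [[0.3, -0.7], [1.1, 0.4]]]      # H_i(1)
PHI = [[[0.5, 0.5], [0.5, 0.5]],     # Phi_1 (N_f x t), hypothetical
       [[0.5, -0.5], [-0.5, 0.5]]]   # Phi_2
x = [1.0, -1.0]                      # BPSK symbols

# C = [vec(Phi_1), ..., vec(Phi_K)], stored as a list of rows
cols = [stack_rows(p) for p in PHI]
C = [[cols[k][m] for k in range(K)] for m in range(N_F * T)]

r_stacked = matvec(block_diag(H), matvec(C, x))      # noise-free version of (5.5)

# Reference: process every subcarrier separately
S = [[sum(x[k] * PHI[k][n][a] for k in range(K)) for a in range(T)]
     for n in range(N_F)]
r_per_bin = [v for n in range(N_F) for v in matvec(H[n], S[n])]

print(all(abs(a - b) < 1e-12 for a, b in zip(r_stacked, r_per_bin)))
```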

5.3 SFS Matrix Design with no Channel Information at the Transmitter

The spreading matrices for the downlink of a multiuser MIMO/OFDM system are designed, based on Walsh codes, so that each user symbol is spread over all transmit antennas as well as all uncorrelated frequency bins to provide full spatial and frequency diversity. The design follows the same methodology as the space-time multiplexing presented in Chapter 2, so as to keep the same properties as explained for multiuser flat fading channels. In each OFDM block of $N_f$ chip time slots, $K \le N_f \cdot t$ symbols are transmitted. To further illustrate the design scheme, the following examples are considered.

Example I: For t = 2, $r_i \ge 2$, $N_f = 8$, L = 2, and ν = 1 chip, we can transmit K = 16 symbols per OFDM block. The frequency diversity order is L = 2, and the frequencies which are separated by $N_f/L = 4$ bins may be assumed to be uncorrelated. It should be noted that in this design, each chip (not symbol) is transmitted through different subcarriers. Before applying the IFFT, the transmit signal matrix is then

$$S = \frac{1}{2}\begin{bmatrix}
x_1 + x_2 + x_3 + x_4 & x_5 + x_6 + x_7 + x_8 \\
x_5 - x_6 + x_7 - x_8 & x_1 - x_2 + x_3 - x_4 \\
x_9 + x_{10} + x_{11} + x_{12} & x_{13} + x_{14} + x_{15} + x_{16} \\
x_{13} - x_{14} + x_{15} - x_{16} & x_9 - x_{10} + x_{11} - x_{12} \\
x_1 + x_2 - x_3 - x_4 & x_5 + x_6 - x_7 - x_8 \\
x_5 - x_6 - x_7 + x_8 & x_1 - x_2 - x_3 + x_4 \\
x_9 + x_{10} - x_{11} - x_{12} & x_{13} + x_{14} - x_{15} - x_{16} \\
x_{13} - x_{14} - x_{15} + x_{16} & x_9 - x_{10} - x_{11} + x_{12}
\end{bmatrix},$$

where column i, for i = 1, 2, represents the signal sequence transmitted from antenna i. Therefore, for instance, the spreading matrices for the symbols $x_1$ and $x_8$ are:

$$\Phi_1 = \frac{1}{2}\begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}^T,$$

$$\Phi_8 = \frac{1}{2}\begin{bmatrix} 0 & -1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \end{bmatrix}^T.$$

The symbol $x_1$ is transmitted through antenna 1 over the frequencies $(f_0, f_4)$, and also through antenna 2 over the frequencies $(f_1, f_5)$, while the channel responses at the frequencies $f_0$ and $f_1$ are uncorrelated since they are frequency components of different spatial channels. The same holds for the channel responses at the frequencies $f_4$ and $f_5$. Compare the transmit signal in this example with the one in Example I of Appendix A. In both cases each symbol is spread over four time slots. There, however, since the channel is constant over one symbol interval and we have only $2r_j$ different spatial channels, the provided diversity order is $2r_j$, while here, because of the frequency selectivity of the channel, the provided diversity order is $4r_j$.
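The construction in Example I can be reproduced programmatically. The sketch below reads the group-to-slot mapping off the transmit signal matrix S given in the example (four groups of four symbols, each group using a length-4 Walsh-Hadamard code over four (bin, antenna) slots); the function names and the indexing scheme are hypothetical. It also checks that the 16 spreading matrices are orthonormal, which is what allows 16 symbols to share the block.

```python
# Length-4 Walsh-Hadamard codes (Sylvester ordering); column k is the code of
# the k-th symbol inside a group of four.
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]

N_F, T, K = 8, 2, 16   # subcarriers, transmit antennas, symbols per OFDM block

def spreading_matrices():
    """Return a list of K matrices (N_F x T); entry i spreads symbol x_{i+1}."""
    phi = [[[0.0] * T for _ in range(N_F)] for _ in range(K)]
    for g in range(4):                      # four groups of four symbols
        base = 2 * (g // 2)                 # first frequency bin used by the group
        ant = g % 2                         # antenna used on that first bin
        # (bin, antenna) pairs carrying the four chips of every code in the group
        slots = [(base, ant), (base + 1, 1 - ant),
                 (base + 4, ant), (base + 5, 1 - ant)]
        for k in range(4):
            for p, (f, a) in enumerate(slots):
                phi[4 * g + k][f][a] = 0.5 * H4[p][k]
    return phi

def spread(x, phi):
    """S = sum_i x_i * Phi_i, an N_F x T matrix (before the IFFT)."""
    return [[sum(x[i] * phi[i][f][a] for i in range(K)) for a in range(T)]
            for f in range(N_F)]

def inner(p, q):
    """Inner product of two real N_F x T matrices."""
    return sum(p[f][a] * q[f][a] for f in range(N_F) for a in range(T))

PHI = spreading_matrices()
S = spread(list(range(1, K + 1)), PHI)      # x_i = i, just to read off the entries
gram_ok = all(abs(inner(PHI[i], PHI[j]) - (1.0 if i == j else 0.0)) < 1e-12
              for i in range(K) for j in range(K))
print(S[0], S[1], gram_ok)
```

With $x_i = i$, the first row of S evaluates to $(1+2+3+4)/2$ and $(5+6+7+8)/2$, which can be compared directly with the first row of the matrix in the example.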

Example II: For t = 4, $r_i \ge 4$, $N_f = 8$, L = 2, and ν = 1 chip, we can transmit K = 32 symbols per OFDM block. The frequency diversity order is again L = 2, and the frequencies which are separated by four bins are assumed to be uncorrelated. Following the same strategy as explained in Section 2.2.2, the users are divided into four groups of eight users each. Length-8 Walsh codes are then applied over each group separately. For instance, the spreading matrices for the symbols $x_2$ and $x_{32}$ will be:

$$\Phi_2 = \frac{1}{\sqrt{8}}\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & -1 & 0 & 0 & 0 & -1
\end{bmatrix}^T,$$

$$\Phi_{32} = \frac{1}{\sqrt{8}}\begin{bmatrix}
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & -1 & 0 & 0 & 0 & -1 & 0 \\
0 & -1 & 0 & 0 & 0 & -1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0
\end{bmatrix}^T.$$

In the above system design, each user symbol is transmitted over $t r_i L$ different fading channels, while the scheme benefits from the maximum multiplexing gain of the system. Note that if $r_i \ge t$, we can transmit t symbols per subcarrier, whereas even full-rate spreading schemes for single-user MIMO/OFDM systems are able to transmit at most one symbol per subcarrier [63]. The maximal diversity gain provided by the channel is $t r_i L$, which can be fully achieved by applying ML detection or sphere decoding over each interfering subgroup of the users. A proper precoding may need to be applied over each subgroup of the symbols to achieve the full diversity gain according to the ML rank criterion defined in [59,60].

As mentioned before, in the downlink, optimal detectors may not be feasible because of the complexity constraints at the receiver. On the other hand, linear detectors such as the minimum mean-square error (MMSE) detector, or suboptimal detectors such as serial interference cancellation (SIC) schemes, are not able to achieve the full diversity gain at the receiver. However, we show that even with linear receivers such as MMSE, or with SIC, we can still achieve major improvements over the flat fading channel.

In particular, since the different symbols sent through the same bin are intended for different users with different channel frequency responses, the system design can benefit from power control over the frequency bins as well. Specifically, the diversity gain achieved with SIC detectors can be improved if the symbols in the same frequency bins are sent with different powers.

Power Control: We apply the same power control algorithm as the one explained in Section 2.2.3 over the different users with different channel gains. The goal is to maintain the same bit error probability for all users, and also to provide better interference cancellation between the users' symbols which are sent in the same frequency bins but have different channel frequency responses.

Each user is required to measure its SNR, compare it with N threshold levels, and then transmit $\log_2 N$ feedback bits to the base station to indicate its SNR level, where N may be defined as the number of symbols (or the number of users) which are sent from the same transmit antenna at the same time. Then, at the base station, the power transmitted to each user is adapted such that a user with a higher SNR level gets less power. This provides almost the same bit error probability for all users with different channel gains. We consider N different power levels and allocate the users with the same power to different transmit antennas. At the receivers, SIC benefits from receiving the symbols at different power levels.
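The feedback and power adaptation steps can be sketched as follows. This is only an illustration of the quantize-and-invert idea described above, not the algorithm of Section 2.2.3: the boundary values, the power levels and all function names are hypothetical, and the sketch assumes that N SNR levels are separated by N − 1 boundaries.

```python
import math
from bisect import bisect_right

def snr_level(snr_db, boundaries_db):
    """Index of the SNR level; N levels are separated by N - 1 boundaries (assumed)."""
    return bisect_right(boundaries_db, snr_db)

def feedback_bits(n_levels):
    """Number of bits needed to report one of n_levels SNR levels."""
    return math.ceil(math.log2(n_levels))

def allocate_power(levels, power_levels):
    """power_levels is sorted in decreasing order, so that level 0
    (the lowest SNR) receives the most power."""
    return [power_levels[lv] for lv in levels]

N = 4
BOUNDARIES_DB = [5.0, 10.0, 15.0]        # hypothetical level boundaries
POWER_LEVELS = [1.6, 1.2, 0.8, 0.4]      # hypothetical powers with mean 1.0
measured_db = [3.2, 17.5, 8.1, 12.0]     # SNRs reported by four users

levels = [snr_level(s, BOUNDARIES_DB) for s in measured_db]
powers = allocate_power(levels, POWER_LEVELS)
print(feedback_bits(N), levels, powers)
```

Because the four users end up at four distinct levels, they also receive four distinct powers, which is the situation that SIC at the receivers can exploit.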

An MMSE-SIC decoder is applied in two stages, with the same structure as in Figure 2.2. The decoder considers the transmitted symbols in sequence; each symbol is detected after subtracting the already decoded symbols. The role of power control in improving the diversity gain achieved by SIC decoders is explained in Section 2.2.3.

In flat fading channels, the maximum achievable diversity order in a MIMO system is calculated based on the pairwise error probability (PEP) obtained with ML detection [2]. In the same way, the potential diversity order in a MIMO/OFDM system can be analyzed through PEP evaluation with ML detection. The PEP for a general single-user MIMO/OFDM system is calculated in [64]. The results can be used to analyze the current multiuser case as well. Based on the design criterion derived in [64], the achieved diversity order in a MIMO/OFDM system is $l_d \le n r_i$ with $n \le \min(D_{\mathrm{eff}}, tL)$, where t and $r_i$ are the numbers of transmit and receive antennas, respectively, L is the number of independent paths (the frequency diversity order), and $D_{\mathrm{eff}}$ is the effective length of the space-frequency spreading code, which is in fact the minimum distance of the spreading code over every possible codeword pair. It can be easily shown that in the proposed SFS design, with no power control, $D_{\mathrm{eff}} = 1$, while with power control $D_{\mathrm{eff}} = tL$, and therefore the designed spreading code has the potential to provide full diversity, which is $l_d = t L r_i$.
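As a worked instance of this bound, the sketch below evaluates $\min(D_{\mathrm{eff}}, tL)\, r_i$ for the configuration t = 2, $r_i = 2$, L = 2 used in the simulations of this chapter. The function name is hypothetical, and the value returned is the upper bound on the diversity order, not a guarantee that a given receiver attains it.

```python
def diversity_bound(d_eff, t, r, n_paths):
    """Upper bound on the diversity order: l_d <= n * r with n <= min(D_eff, t * L)."""
    return min(d_eff, t * n_paths) * r

t, r, L = 2, 2, 2
print("no power control  :", diversity_bound(1, t, r, L))        # D_eff = 1
print("with power control:", diversity_bound(t * L, t, r, L))    # D_eff = t * L
```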

5.4 Comparison With MIMO Multi-Carrier CDMA Schemes

In frequency selective MIMO multiuser channels, an alternative methodology is to employ a combination of space-time coding and multi-carrier CDMA (MC-CDMA).

The existing strategies are mostly based on either BLAST [65, 66] or orthogonal STBC [66, 67]. The transceiver structure of a typical MIMO MC-CDMA system is shown in Figure 5.2. A Walsh-Hadamard code is assigned to each user as a spreading code.

[Block diagram. Tx: data, channel coding, modulation, STC encoder, then per antenna spreading, combining with data from other users, IFFT and cyclic prefix. Rx: per antenna cyclic prefix removal, FFT and de-spreading, then STC decoder, demodulation, channel decoding, data]

Figure 5.2: Transceiver structure of MIMO MC-CDMA systems

Using BLAST as the STC encoder, a frame of data of each user is split into t sub-streams to be transmitted from the t transmit antennas. The sub-streams of each user are spread with the same spreading code, and then, at each antenna, all data sequences from different users are combined together, as in a CDMA system. An IFFT is applied over the data at each antenna and a cyclic prefix is added before transmission over the channel. At the receiver of user i, after removal of the cyclic prefix and application of the FFT, the outputs at the $r_i$ receive antennas corresponding to the lth subcarrier, $y_i^l \in \mathbb{C}^{r_i \times 1}$, for l = 1, ..., $N_f$, can be written as [66]:

$$y_i^l = H_i^l s^l + v_i^l, \qquad (5.7)$$

where $H_i^l \in \mathbb{C}^{r_i \times t}$ is the frequency-domain channel matrix on the lth subcarrier between the transmit antennas and the receive antennas of user i, and

$$s^l = \left[ \sum_{k=1}^{K} c_{lk}\, d_{k1}^l, \; \ldots, \; \sum_{k=1}^{K} c_{lk}\, d_{kt}^l \right]^T \in \mathbb{C}^{t \times 1}, \qquad (5.8)$$

where $c_{lk}$ is the code chip of user k on the lth subcarrier, $d_{km}^l$, for m = 1, ..., t, is the modulated data of user k transmitted from the mth antenna corresponding to the lth subcarrier, and $v_i^l$ is the $r_i \times 1$ noise vector at the receiver of user i on the lth subcarrier. As can be seen from (5.7), the received signal after the FFT has a structure similar to that in a single-carrier system. Therefore, a common BLAST detector can be applied per subcarrier to recover the t transmitted signals on that subcarrier. As a result, in this scheme the maximum achievable diversity order for each user i is the product of the frequency and receive diversity orders, $L \cdot r_i$, since BLAST does not provide transmit diversity.

When STBC is applied, the same structure as for BLAST is used, but the symbols may need to be sent over a number of OFDM symbols, and the STC decoding then has to be performed across those OFDM symbols as well. For example, in the simple case of $t = r_i = 2$, each user i transmits the symbols $(d_i^1, d_i^2)$ from both antennas at time $t_0$, and then $(-d_i^{2*}, d_i^{1*})$ at time $t_0 + T_f$, where $T_f$ is the OFDM symbol duration. The data symbols of the K users are multiplied by their specific orthogonal spreading codes, and the remaining operations, such as the IFFT and cyclic prefix addition, are the same as in the BLAST scheme. The received signal at the jth receive antenna of user i after OFDM demodulation, at times $t_0$ and $t_0 + T_f$, is given by [67]:

$$y_i^j = \begin{bmatrix} H_{i1}^j & H_{i2}^j \\ H_{i2}^{j*} & -H_{i1}^{j*} \end{bmatrix} D s + v_i^j, \qquad (5.9)$$

where $y_i^j = \left[ z_i^{jT}(t_0), \; z_i^{jH}(t_0 + T_f) \right]^T$ with $z_i^j(t_0) = \left[ z_{i,1}^j(t_0), \cdots, z_{i,N_f}^j(t_0) \right]^T$ the vector of the $N_f$ received signals at time $t_0$, and $H_{it}^j = \mathrm{Diag}\left[ h_{it,1}^j, \ldots, h_{it,N_f}^j \right]$ ($t, j \in \{1, 2\}$) is an $N_f \times N_f$ diagonal matrix in which $h_{it,k}^j$ is the channel frequency response for subcarrier k from transmit antenna t to receive antenna j. Furthermore, $D = \mathrm{Diag}[C, C]$, where C is the spreading code matrix, $s = \left[ d^{1T} \; d^{2T} \right]^T \in \mathbb{C}^{2K \times 1}$ is the vector of transmitted symbols from all the users, and $v_i^j$ represents the additive noise at the jth receive antenna of user i at times $t_0$ and $t_0 + T_f$. For any processing and recombination of symbols with this STBC scheme, channel time invariance during two MC-CDMA symbols has to be assumed.
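The structure of the channel matrix in (5.9) can be illustrated on a single subcarrier. The sketch below is a deliberately reduced case: one user, one receive antenna, no spreading (D taken as the identity) and no noise, with hypothetical channel gains and function names. It transmits $(d^1, d^2)$ and then $(-d^{2*}, d^{1*})$, conjugates the second received sample, and applies the matched filter of the 2 × 2 matrix $[h_1, h_2;\; h_2^*, -h_1^*]$, whose columns are orthogonal.

```python
def alamouti_encode(d1, d2):
    """Symbols sent from (antenna 1, antenna 2) at t0 and at t0 + Tf."""
    return (d1, d2), (-d2.conjugate(), d1.conjugate())

def receive(h1, h2, slot):
    """Noise-free received sample on one subcarrier and one receive antenna."""
    return h1 * slot[0] + h2 * slot[1]

def alamouti_combine(z0, z1, h1, h2):
    """Matched filtering with the 2x2 matrix [[h1, h2], [h2*, -h1*]]."""
    y0, y1 = z0, z1.conjugate()
    gain = abs(h1) ** 2 + abs(h2) ** 2
    d1_hat = (h1.conjugate() * y0 + h2 * y1) / gain
    d2_hat = (h2.conjugate() * y0 - h1 * y1) / gain
    return d1_hat, d2_hat

h1, h2 = 0.7 + 0.2j, -0.3 + 0.9j     # hypothetical gains, constant over both slots
d1, d2 = 1 + 0j, -1 + 0j             # BPSK symbols
slot0, slot1 = alamouti_encode(d1, d2)
z0, z1 = receive(h1, h2, slot0), receive(h1, h2, slot1)
d1_hat, d2_hat = alamouti_combine(z0, z1, h1, h2)
print(abs(d1_hat - d1) < 1e-12, abs(d2_hat - d2) < 1e-12)
```

The recovery relies on the channel being the same in both OFDM symbols, which is the time-invariance assumption stated above.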

As in the BLAST MC-CDMA scheme, after OFDM demodulation we arrive at the same system model as for the flat fading channel, and therefore the same detection algorithm may be applied to detect the symbols of each user. Full transmit/receive and frequency diversity can then be achieved with this scheme.

Comparing the BLAST and STBC MC-CDMA schemes with the SFS design proposed in this chapter, we arrive at the same conclusion as in Section 2.4 for flat fading channels. BLAST MC-CDMA has the same spectral efficiency as the SFS design, but it does not provide transmit diversity. STBC MC-CDMA provides the same diversity order as the SFS design, but it suffers from a low multiplexing gain. In STBC, at most one symbol is transmitted per channel use, while in the SFS design we may transmit t symbols per channel use. Note that with STBC, as mentioned in Section 2.4, the spectral efficiency is $e_s \le 1$. The key point is that the SFS design proposed in this chapter provides a unified framework for STC and multiplexing over frequency, rather than applying STC encoding and spreading separately.

5.5 SFS Matrix Design with Perfect Channel Knowledge at the Transmitter

Assuming perfect channel knowledge at the transmitter, precoding and beamforming design is performed for the downlink of a frequency-selective multiuser MIMO/OFDM system, where non-linear interference pre-subtraction is assumed at the transmitter (the same structure as in Figure 3.2). The transmit and receive signal models are given in (5.3) and (5.5), respectively. The goal is to either

• minimize the mean squared error between the transmit and receive data streams with a per-user power constraint, which was called PUPC in Section 4.5, or

• minimize the total transmit power while satisfying pre-defined signal-to-interference-plus-noise ratio (SINR) requirements at each receiver, which was called TTPC in Section 4.5.

For flat fading channels, PUPC and TTPC were solved in Chapters 3 and 4, respectively. Having the same system model as in (3.1) and (4.1) motivates us to apply the same algorithms, given in Table 3.1 and Table 4.1, to design the transmit/receive beamforming vectors based on PUPC and TTPC, respectively. Note that in the virtual uplink we also have the same equation as (4.6) for the received signal of the ith user, where $\tilde{H}_i$ is defined in (5.5). Therefore, the scheme proposed in Chapter 4 can be extended to design the space-frequency beam vectors as well. In fact, the beam vectors are optimized over two dimensions: space and frequency. As a result of the beamforming design, either with the algorithm in Table 3.1 or with the algorithm in Table 4.1, each user signal is transmitted over the frequency channel with the highest gain and the least amount of interference. In fact, instead of spreading the signal of each user over all frequency bins, each user is assigned only to the best frequency channel (for a fixed precoding order). The reasoning is the same as that given for the space-time beamforming design in Sections 3.3 and 4.3. The only difference is that there the only parameter which changes over the time slots is the interference, while in the current case the channel also differs over the frequencies, and therefore frequency diversity may be achieved as well. Note that it is also possible to assign users randomly to different frequency bins and then run the proposed algorithms on each frequency bin independently. It will be shown that, with cooperation among the processing at different frequency bins, the performance improves significantly compared with the case in which the beam vectors are optimized over different frequency bins independently.

5.6 Simulation Results

We examine the proposed schemes for t = 2 transmit antennas and BPSK modulation.

Without loss of generality, the same number of receive antennas is assumed for all users ($r_j = r = 2$, j = 1, ..., K). However, there is no constraint on the number of antennas. The signal-to-noise ratio (SNR) is defined as the average $E_b/N_0$ over all the users, per receive antenna. All the results are obtained by averaging over more than 10,000,000 independent Monte Carlo trials.

In Figure 5.3, we use an $N_f = 8$ point FFT, and the average bit error probability ($P_e$) for the flat fading channel is compared with that for the frequency-selective channel under both space-frequency spreading methods explained in Section 5.2.1, where a two-tap equal-power channel is considered with a maximum channel delay spread of ν = 1 chip. The bit error probability is also evaluated for ν = 3 chips and a four-tap equal-power channel when an $N_f = 16$ point FFT is applied. In both cases, an MMSE receiver is applied. As mentioned before, in practical cases, a cyclic prefix, which is a copy of the last ν chips of the OFDM symbol, has to be added to each OFDM symbol to avoid ISI. Therefore, for a larger channel delay spread (ν), a higher FFT order is chosen to increase the bandwidth efficiency. The importance of frequency diversity is clearly demonstrated by comparing $P_e$ between the flat fading channel and the multipath channels. Although we expect a diversity order of 8 for ν = 1 (L = 2), MMSE is not able to achieve the full diversity gain because of the interference between the users in the same frequency bins. Optimal detectors are required to achieve the full diversity gain, but SIC also improves $P_e$.

[Plot: probability of error versus SNR (dB), 0 to 25 dB; curves: np=4, SIC; np=4, MMSE; np=2, OFDM I, MMSE; np=2, MMSE; Flat+MMSE]

Figure 5.3: Performance comparison of Space-Frequency Spreading methods for one, two and four tap equal power fading channels for t = r = G = 2.

In Figure 5.4, we compare the performance of optimal (ML) and suboptimal (MMSE-SIC) detectors with and without power control for a two-tap channel. As shown, power control improves the performance of the SIC detector; it might also be necessary, when the users experience different channel gains, to maintain the same bit error probability. Here we have applied two-stage SIC, but by increasing the number of SIC stages, the achieved diversity gain gets close to that of ML detection.

As discussed in Section 2.2.3, for space-time spreading codes, to achieve the full diversity gain with ML detection, a proper rotation should be applied to the symbols before multiplexing to satisfy the rank criterion of ML detection. This also applies to space-frequency codes, within each subgroup of users that are transmitted at the same frequency bins. Here we have applied the precoder

$$\theta = \mathrm{diag}\left(1, e^{j\pi/4}, e^{j2\pi/4}, e^{j3\pi/4}, e^{j\pi}, e^{j5\pi/4}, e^{j6\pi/4}, e^{j7\pi/4}\right)$$

over each subgroup of eight symbols. ML detection performs better with the above precoding than without precoding. At high SNR the achieved diversity gain gets close to the full diversity provided by the channel. However, we do not claim that this is the optimal precoding to achieve full diversity. We have only used this θ as an example to improve the performance of ML detection.
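As a small illustration, the diagonal rotation can be built and applied to one subgroup of eight symbols as follows. This sketch assumes the k-th diagonal entry is exp(jkπ/4) for k = 0, ..., 7, and uses an arbitrary BPSK vector as input; the function name `rotation_precoder` is hypothetical.

```python
import numpy as np

def rotation_precoder(n=8):
    """Diagonal rotation with entries exp(j*k*pi/4), k = 0..n-1 (assumed form)."""
    k = np.arange(n)
    return np.diag(np.exp(1j * np.pi * k / 4))

theta = rotation_precoder()

# One subgroup of eight BPSK symbols (illustrative data only).
symbols = np.array([1, -1, 1, 1, -1, -1, 1, -1], dtype=complex)
rotated = theta @ symbols

# The rotation is unitary, so it changes symbol phases but not their energy.
print(np.allclose(theta.conj().T @ theta, np.eye(8)))
print(np.round(rotated, 3))
```

Because θ is unitary, the rotation leaves the transmit power unchanged; only the phases of the symbols within the subgroup differ.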

The penalty for the achieved improvement with ML detection is the increased complexity at the receiver.

[Figure 5.4 plot: probability of error versus SNR (dB), 0 to 20 dB. Curves: ML+Rotation; ML+Power control; SIC; SIC+Power control.]

Figure 5.4: Performance of ML detection versus SIC with and without power control for a two-tap channel, for t = r = 2, L = 2, Nf = 8.

In Figure 5.5, the performance of MMSE precoding/beamforming over a flat fading channel is compared with that of space-frequency MMSE precoding/beamforming over a frequency selective channel, based on PUPC, for t = 2 transmit and r = 2 receive antennas. The algorithm is given in Table 3.1. For the frequency selective channel, two-tap channels are simulated for each user with a maximum delay spread of ν = 1 chip, and an 8-point FFT is applied to implement OFDM. As explained in Section 5.5, once the proposed MMSE algorithm is applied to all frequency bins, each user symbol is assigned to one frequency bin. We examined both cases of transmitting one symbol per user for K = 16 and two symbols per user for K = 8. It can be seen that the performance improvement over the flat fading channel is about 5 dB because of the frequency diversity that can be exploited in space-frequency spreading.

[Figure 5.5 plot: probability of error versus SNR (dB), −4 to 12 dB. Curves: Flat Fading/One symbol per user; OFDM-MMSE/One symbol per user; OFDM-MMSE/Two symbols per user.]

Figure 5.5: The performance of space-frequency spreading compared with MMSE beamforming over a flat fading channel, t = 2, r = 2, Nf = 8, z × K = 16.

In Figure 5.6, the average probability of error is compared with the probability of error of each individual user. The first user is the one who does not see any interference, and the eighth user is the one who sees the full interference.

[Figure 5.6 plot: probability of error versus SNR (dB), −4 to 8 dB. Curves: Pe for user 8; average Pe for all users; Pe for user 4; Pe for user 1.]

Figure 5.6: Average Pe compared with Pe for individual users in space-frequency spreading, t = 2, r = 2, Nf = 8, K = 8, z = 2.

In Figure 5.7, space-frequency precoding/beamforming based on TTPC is performed for K = 16 users with t = 2 transmit and r = 2 receive antennas, and an 8-point FFT is applied to implement OFDM. Two-tap channels are simulated for each user with a maximum delay spread of ν = 1 chip. As explained in Section 5.5, once the proposed MMSE algorithm (Table 4.1) is applied to all frequency bins, each user is assigned to one frequency bin. The total transmit power is compared in the following three cases.

MMSE per carrier, in which two users are assigned to each frequency bin randomly and the algorithm in Table 4.1 is performed over different frequency bins independently.

Random-OFDM, in which the algorithm is performed over all frequency bins (the beam vectors are optimized over all frequency bins) but with a random initialization, so that in the initialization stage (virtual uplink) each user signal is spread over all frequency bins as well as all antennas.

MMSE-OFDM, in which the algorithm in Table 4.1 is performed over all frequency bins.

The interesting point is that the iterative algorithm in random-OFDM converges to the same point as MMSE-OFDM. In fact, ultimately, after many iterations, each user is assigned to only one frequency bin. There is a large improvement (3 dB) compared with the MMSE-per-carrier scheme because of the frequency diversity that can be exploited in random-OFDM and MMSE-OFDM. As in the example in Figure 4.2, the result of MMSE-OFDM is very satisfactory even after one iteration.

[Figure 5.7 plot: total transmit power (dB), 12.5 to 16 dB, versus iteration, 0 to 20. Curves: MMSE-OFDM; random-OFDM; MMSE per carrier.]

Figure 5.7: Precoding/beamforming over space and frequency for t = r = 2, K = 16, SINR = 10 dB.

5.7 Summary

We proposed space-frequency spreading codes for a broadband MIMO multiuser system.

We considered the downlink of a frequency selective wireless channel.

Without any knowledge of the channel at the transmitter, the spreading codes are designed to provide full spatial and frequency diversity at the receiver of each user while the system benefits from the maximum multiplexing gain of multiple antennas. We examined both suboptimal and optimal detectors. Although full diversity gain is not achieved by linear detectors, frequency diversity, which is extracted by a proper design of the MIMO/OFDM system, still yields a major improvement over the flat fading channel.

Interference cancellation within each subgroup of users that are transmitted at the same frequency bins is improved by allocating different powers to users with different channel gains.

Assuming perfect channel knowledge at the transmitter, a unified signal model is derived for precoding over space and frequency when OFDM is used, and hence the techniques proposed for flat fading channels are extended to the design of space-frequency beam vectors. The proposed design is able to exploit full frequency diversity as well; each user is allocated to the strongest frequency channel with the least amount of interference.

The results of this chapter are published in [57,68].

Chapter 6

Conclusion

In this chapter we briefly review the main contributions of this thesis and propose some directions for future work.

6.1 Summary of Contributions

This thesis focuses on the downlink of a MIMO multiuser wireless system, in which multiple antennas are employed at both the transmitter (base station) and the receivers (clients) to provide diversity as well as multiplexing gain. It is divided into two main parts.

In the first part, no channel information is assumed at the transmitter. In flat fading channels, a space-time spreading scheme is proposed based on well-known Walsh codes. The spreading matrices are designed to provide full transmit diversity and high spectral efficiency. This technique produces groups of users that are orthogonal to each other, which simplifies detection without loss of performance. A two-stage interference canceller is applied at each receiver which, in conjunction with an unequal power allocation scheme, is able to achieve full diversity and suffers only a small performance loss compared to optimal (maximum likelihood) receivers. In frequency selective fading channels, when OFDM is used, space-frequency spreading matrices are designed to provide full frequency diversity as well as transmit diversity.

In the second part, perfect channel knowledge is assumed at the transmitter. Non-linear interference pre-subtraction is implemented at the transmitter by using a matrix version of Tomlinson-Harashima precoding. Space-time beamforming is performed based on a minimum mean-squared error (MMSE) criterion and a per-user power constraint. Since successive interference cancellation is applied at the BS, the available single-user algorithms can be applied to each individual single-user channel. In addition, an uplink-downlink SINR duality result is proved for MIMO multiuser systems. Based on this duality result, an iterative algorithm is proposed for minimizing the total transmit power subject to the signal-to-interference-and-noise ratios (SINRs) at the decentralized receivers being above their target levels. In frequency selective fading channels, where OFDM is applied, the proposed designs are able to exploit full frequency diversity as well.

6.2 Future Work

Although the general areas of multiuser communications and MIMO communications (specifically through applying multiple antennas) have both been well studied independently, adding multiple antennas (MIMO) to multiuser wireless networks opens up many new areas of research that have not yet been addressed. In this section, we provide a few directions for future research related to the current work; it is clear that there are many interesting problems to be considered.

In Chapter 2, where we proposed a spreading scheme for MIMO multiuser downlink channels with no channel information at the transmitter, a power allocation algorithm is suggested which greatly improves the performance of interference cancellation. We intended to use very low-rate feedback (the minimum possible rate), and therefore the algorithm is neither optimal nor even very precise. It would be interesting to explore, by applying cross-layer optimization schemes, a more precise suboptimal/optimal power control scheme that is suitable for the downlink channel, and to take advantage of it to improve the performance of SIC as well. This may require more partial channel information at the transmitter than we have already used.

In Chapters 3 and 4 we applied a non-linear precoding (THP) scheme at the transmitter. Devising a linear precoding scheme which achieves the same goals as in Chapters 3 and 4 would be an important contribution to this field of research. Also, we assumed perfect channel state information at the transmitter. This may be easy to obtain when the channel is fixed or changing slowly, but it is very difficult if the channel changes quickly. Even in fixed wireless communication, because of changes in the environment, there is always a chance that we have imperfect or outdated channel knowledge at the transmitter. An analysis of the penalty for using imperfect or outdated channel information would be of significant benefit. Developing linear/non-linear precoding and beamforming schemes for the MIMO downlink channel where the transmitter knows only the statistics of the channel coefficients would be a major contribution to the field. The study of non-linear precoding schemes where partial or statistical channel information is available at the transmitter may also be of significant interest in information theoretic applications such as the work in [28,69].

In Chapter 4, we devised an iterative algorithm which, according to simulation results, converged in its very first steps. However, as mentioned before, the convergence of the algorithm is not guaranteed. It might even have converged to a local optimum. Therefore, a convergence analysis of this algorithm may resolve these open questions. We also applied a suboptimal precoding ordering scheme; finding an optimal ordering is an open problem.

The application of MIMO precoding in broadband wireless access (e.g. WiMAX) is very promising, especially since a fixed wireless channel may be assumed. Network issues can also be studied in parallel, to satisfy quality of service (QoS) requirements.

Appendix A

Spreading Matrix Design Examples

In this appendix, we present two examples of spreading matrix design for the downlink in a MIMO multiuser channel when no information about the channel is provided at the transmitter.

Example I: $t = 2$, $r_j \geq 2$, $j = 1, \dots, K$, $G = 4$, $K = U = 8$:

The users are divided into two groups, $\{u_1, u_2, u_3, u_4\}$ and $\{u_5, u_6, u_7, u_8\}$, and four columns of the 4-dimensional Walsh-Hadamard matrix are used to spread the data of different users in each group. The transmit matrix is then

$$S = \frac{1}{2}\begin{bmatrix} s_1 + s_2 + s_3 + s_4 & s_5 + s_6 + s_7 + s_8 \\ s_5 - s_6 + s_7 - s_8 & s_1 - s_2 + s_3 - s_4 \\ s_1 + s_2 - s_3 - s_4 & s_5 + s_6 - s_7 - s_8 \\ s_5 - s_6 - s_7 + s_8 & s_1 - s_2 - s_3 + s_4 \end{bmatrix}.$$

Therefore the spreading matrix for the fourth user is

$$\Phi_4 = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & -1 \\ -1 & 0 \\ 0 & 1 \end{bmatrix}.$$
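The relation between the transmit matrix of Example I and the per-user spreading matrices can be checked numerically. The sketch below assumes that S is linear in the symbols, so that the spreading matrix of user k is S evaluated with s_k = 1 and all other symbols set to zero; the helper names `transmit_matrix` and `spreading_matrix` are illustrative only.

```python
import numpy as np

def transmit_matrix(s):
    """4 x 2 transmit matrix of Example I for the symbol vector s = (s1, ..., s8)."""
    s1, s2, s3, s4, s5, s6, s7, s8 = s
    return 0.5 * np.array([
        [s1 + s2 + s3 + s4, s5 + s6 + s7 + s8],
        [s5 - s6 + s7 - s8, s1 - s2 + s3 - s4],
        [s1 + s2 - s3 - s4, s5 + s6 - s7 - s8],
        [s5 - s6 - s7 + s8, s1 - s2 - s3 + s4],
    ])

def spreading_matrix(k):
    """Spreading matrix of user k (1-based): S with s_k = 1 and all others 0."""
    e = np.zeros(8)
    e[k - 1] = 1.0
    return transmit_matrix(e)

phi = [spreading_matrix(k) for k in range(1, 9)]
phi4 = phi[3]
print(phi4)

# Superposition check on random BPSK data: S equals the sum of s_k * Phi_k.
rng = np.random.default_rng(0)
s = rng.choice([-1.0, 1.0], size=8)
s_rebuilt = sum(sk * pk for sk, pk in zip(s, phi))
print(np.allclose(transmit_matrix(s), s_rebuilt))
```

Reading the spreading matrices off S in this way gives the same matrix for the fourth user as stated in the example.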


Example II: $t = 4$, $r_j = 2$, $j = 1, \dots, K$, $G = 2$, $K = U = 4$: In this case, since four channels are provided for each user, either two or all four columns of a 4-dimensional Walsh-Hadamard matrix can be used to spread the data of each user. If we consider the latter case, the transmit matrix is

$$S = \frac{1}{2}\begin{bmatrix} s_1 + s_2 & s_3 + s_4 & s_1 + s_2 & -s_3 - s_4 \\ s_3 - s_4 & s_1 - s_2 & -s_3 + s_4 & s_1 - s_2 \end{bmatrix}.$$

Then the spreading matrices for the second and fourth users are

$$\Phi_2 = \frac{1}{2}\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & -1 & 0 & -1 \end{bmatrix}, \qquad \Phi_4 = \frac{1}{2}\begin{bmatrix} 0 & 1 & 0 & -1 \\ -1 & 0 & 1 & 0 \end{bmatrix}.$$

Appendix B

Proof of Uplink-Downlink Duality in MIMO Multiuser Systems

For given $f_i$ and $b_i$, where $f_i$ is the beam vector at receiver $i$ and $b_i$ is the transmit beam vector at the base station for user $i$, the mutual interference between the users in the downlink, with the precoding order $\{1, \dots, K\}$, is defined by $\Psi$:

$$[\Psi]_{i,l} = \begin{cases} |f_i^H H_i b_l|^2 = |b_l^H H_i^H f_i|^2, & l = 1, \dots, i-1 \\ 0, & l = i, \dots, K \end{cases} \tag{B.1}$$

In the downlink, the interference from all $K$ users to user $i$ is given by the $i$th row of the matrix $\Psi$. In the virtual uplink, with a decoding order that is the reverse of the precoding order at the BS, and channels that are the conjugate transposes of the downlink channels (see Fig. 4.1), the interference experienced by the receiver of user $i$ is given by the $i$th column of $\Psi$.

For the target SINRs $\gamma_1, \dots, \gamma_K$ and the diagonal normalization matrix

$$[D]_{i,l} = \begin{cases} \gamma_l / |f_l^H H_l b_l|^2, & l = i \\ 0, & l \neq i \end{cases}$$

it has been shown that the target SINRs can be jointly achieved in the uplink and downlink if $\lambda_{\max}(D\Psi) < 1$, where $\lambda_{\max}(\cdot)$ is the maximum eigenvalue (see [9, 70] and their references). For linear beamforming (without interference cancellation), the mutual interference may lead to $\lambda_{\max}(D\Psi) \geq 1$, which means the target SINRs are infeasible. However, for channels with precoding/successive interference cancellation, because of the triangular structure of $\Psi$, the equation $\det\{\lambda I - D\Psi\} = 0$ is equivalent to $\lambda^K = 0$.

Therefore, $\lambda_{\max}(D\Psi) = 0$, which means that arbitrary individual SINR targets can be jointly achieved by an appropriate design of the $F$ and $B$ matrices. To clarify the above notation, consider the following simple example:

Example: For $K = 2$, $t = 2$, $r_i = 2$ for $i = 1, 2$, we have

$$\Psi = \begin{bmatrix} 0 & 0 \\ |f_2^H H_2 b_1|^2 & 0 \end{bmatrix}.$$

The precoding order is such that the first user does not see any interference. Then

$$D = \begin{bmatrix} \gamma_1 / |f_1^H H_1 b_1|^2 & 0 \\ 0 & \gamma_2 / |f_2^H H_2 b_2|^2 \end{bmatrix},$$

and

$$\lambda I - D\Psi = \begin{bmatrix} \lambda & 0 \\ -\dfrac{\gamma_2 |f_2^H H_2 b_1|^2}{|f_2^H H_2 b_2|^2} & \lambda \end{bmatrix}.$$

To prove the uplink-downlink duality, we have to show that the same total power is required in both the uplink and the downlink to achieve the same target SINRs while the same beam vectors are used in both links. The minimal total power to achieve the targets $\gamma_1, \dots, \gamma_K$ is obtained by setting $\mathrm{SINR}_i = \gamma_i$ for $i = 1, \dots, K$, which results in the following matrix equations:

$$(I - D\Psi)\,p = \sigma^2 D\mathbf{1}, \quad \text{(downlink)} \tag{B.2}$$

$$(I - D\Psi^T)\,q = \sigma^2 D\mathbf{1}, \quad \text{(uplink)} \tag{B.3}$$

where $\mathbf{1}$ is an all-ones vector.
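As a sketch of where (B.2) comes from, assume unit-norm receive beam vectors ($\|f_i\| = 1$, as enforced by the normalization step in the tables of Appendix C) and let $p_i$ denote the downlink power of user $i$. The downlink SINR condition for user $i$ then expands as follows.

```latex
\begin{align*}
\mathrm{SINR}_i^{\mathrm{DL}}
  &= \frac{p_i\,|f_i^H H_i b_i|^2}
          {\sum_{l=1}^{i-1} p_l\,|f_i^H H_i b_l|^2 + \sigma^2} = \gamma_i \\
\Longrightarrow\quad
p_i &= [D]_{i,i}\Big(\sum_{l=1}^{K} [\Psi]_{i,l}\,p_l + \sigma^2\Big),
\qquad i = 1, \dots, K .
\end{align*}
```

Stacking these $K$ equations gives $p = D\Psi p + \sigma^2 D\mathbf{1}$, which is (B.2). The uplink equation (B.3) follows in the same way with $\Psi$ replaced by $\Psi^T$, since the interference seen by user $i$ in the virtual uplink is given by the $i$th column of $\Psi$.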

It can be easily shown that $(I - D\Psi)$ and $(I - D\Psi^T)$ are non-singular, and that positive power allocations always exist:

$$p = \sigma^2 (I - D\Psi)^{-1} D\mathbf{1}, \quad \text{(downlink)} \tag{B.4}$$

$$q = \sigma^2 (I - D\Psi^T)^{-1} D\mathbf{1}, \quad \text{(uplink)} \tag{B.5}$$

and therefore the minimum required total power in the uplink is

$$\sum_{l=1}^{K} q_l = \sigma^2 \mathbf{1}^T (D^{-1} - \Psi^T)^{-1} \mathbf{1} = \sigma^2 \mathbf{1}^T (D^{-1} - \Psi)^{-1} \mathbf{1} = \sum_{l=1}^{K} p_l, \tag{B.6}$$

which is the same as the total required power in the downlink.
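The total-power equality can also be checked numerically. The following sketch does not use actual channels or beam vectors: it simply draws a random strictly lower-triangular Ψ with non-negative entries and a random positive diagonal D (both are stand-ins chosen for illustration), solves (B.4) and (B.5), and compares the two power sums.

```python
import numpy as np

rng = np.random.default_rng(1)
K, sigma2 = 4, 0.5

# Stand-in interference matrix: strictly lower triangular, as in (B.1).
psi = np.tril(rng.uniform(0.1, 1.0, size=(K, K)), k=-1)
# Stand-in normalization matrix: positive diagonal.
d = np.diag(rng.uniform(0.5, 2.0, size=K))
ones = np.ones(K)

# Power allocations from (B.4) and (B.5).
p = sigma2 * np.linalg.solve(np.eye(K) - d @ psi, d @ ones)
q = sigma2 * np.linalg.solve(np.eye(K) - d @ psi.T, d @ ones)

# D*Psi is strictly triangular, hence nilpotent: all its eigenvalues are zero.
nilpotent = np.allclose(np.linalg.matrix_power(d @ psi, K), 0.0)

print("D*Psi nilpotent:", nilpotent)
print("downlink powers:", p, "sum:", p.sum())
print("uplink powers:  ", q, "sum:", q.sum())
```

The individual allocations differ between the two links (the user that sees no interference in the downlink is not the one that sees none in the uplink), while the sums agree, as (B.6) states.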

Therefore, applying the same beam vectors in both links, the same total power is required to achieve a given set of SINRs. It can be seen from (B.2), (B.3) and (B.6) that this duality holds as long as the mutual interference matrix in the downlink (Ψ) is the transpose of the mutual interference matrix in the uplink. This explains why the precoding order has to be the reverse of the decoding order.

Appendix C

The Algorithms for Multiple Symbol Transmission to Each User

In this appendix, we give the algorithms for multiple symbol transmission to each user in a MIMO broadcast channel, which minimize the total transmit power while pre-defined signal to interference/noise ratio (SINR) requirements are satisfied at each receiver.

Table C.1 presents the algorithm when we have interference pre-subtraction over all the users' symbols, and precoding/beamforming is performed only over the space dimension.

Table C.2 presents the algorithm when we have space-time precoding/beamforming and interference pre-subtraction is only applied over different users.

In the tables, $x_i^j$ stands for a transmit/receive beam vector, and $\gamma_i^j$ is the required signal to noise ratio for the $j$th symbol of user $i$. Also, $p_i^j$ and $q_i^j$ are the powers allocated to the $j$th symbol of user $i$ in the downlink and uplink respectively. $U_{R_{H_i},(1:z_i)}$ are the eigenvectors corresponding to the $z_i$ largest eigenvalues of the matrix $R_{H_i}$.


First step (virtual uplink):

$R_n = \sigma^2 I$
for $i = K : -1 : 1$
    for $j = z_i : -1 : 1$
        $R_{H_i} = H_i (R_n)^{-1} H_i^H$; $a_i^j = u_{R_{H_i},1}$.
        $w_i^j = (H_i^H a_i^j a_i^{jH} H_i + R_n)^{-1} H_i^H a_i^j$.
        $w_i^j = \dfrac{w_i^j}{\|w_i^j\|}$. $q_i^j = \dfrac{\gamma_i^j \,(w_i^{jH} R_n w_i^j)}{w_i^{jH} H_i^H a_i^j a_i^{jH} H_i w_i^j}$.
        $R_n = R_n + q_i^j H_i^H a_i^j a_i^{jH} H_i$.
    end,
end.

Second step (iterative algorithm):

(Downlink), $C = 0$, $Q = \sigma^2 I$
for $i = 1 : K$
    for $j = 1 : z_i$
        $b_i^j = w_i^{jH}$, $f_i^j = (H_i b_i^j)^H (H_i b_i^j b_i^{jH} H_i^H + Q)^{-1}$,
        $f_i^j = \dfrac{f_i^j}{\|f_i^j\|}$; $p_i^j = \dfrac{\gamma_i^j \,(f_i^j f_i^{jH} \sigma^2 + f_i^j H_i C H_i^H f_i^{jH})}{f_i^j H_i b_i^j b_i^{jH} H_i^H f_i^{jH}}$.
        $C = C + p_i^j b_i^j b_i^{jH}$, $Q = Q + p_i^j b_i^j b_i^{jH}$;
    end,
end.

(Virtual uplink): $R_n = \sigma^2 I$.
for $i = K : -1 : 1$
    for $j = z_i : -1 : 1$
        $a_i^j = f_i^{jH}$, $w_i^j = a_i^{jH} H_i (H_i^H a_i^j a_i^{jH} H_i + R_n)^{-1}$.
        $w_i^j = \dfrac{w_i^j}{\|w_i^j\|}$. $q_i^j = \dfrac{\gamma_i^j \,(w_i^j R_n w_i^{jH})}{w_i^j H_i^H a_i^j a_i^{jH} H_i w_i^{jH}}$.
        $R_n = R_n + q_i^j H_i^H a_i^j a_i^{jH} H_i$.
    end,
end. (back to second step)

Table C.1: The precoding/beamforming for multiple symbol transmission.
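To make the first step of Table C.1 concrete, here is a minimal NumPy sketch of the virtual-uplink initialization only. It is one reading of the table, under these assumptions: each downlink channel $H_i$ is an r × t matrix, the receive filter $w_i^j$ is treated as a column vector, and the channels and SINR targets in the example are random or arbitrary illustrative values. The function names `virtual_uplink_init` and `achieved_sinrs` are hypothetical, and the iterative second step is not implemented.

```python
import numpy as np

def virtual_uplink_init(channels, targets, sigma2):
    """First step of Table C.1 (virtual uplink), as read in this sketch.

    channels: list of r x t downlink matrices H_i
    targets:  targets[i][j] is the SINR target of symbol j of user i
    Returns (i, j, a, w, q) tuples in processing order (last user first).
    """
    t = channels[0].shape[1]
    r_n = sigma2 * np.eye(t, dtype=complex)       # noise plus accumulated interference
    out = []
    for i in range(len(channels) - 1, -1, -1):
        h = channels[i]
        for j in range(len(targets[i]) - 1, -1, -1):
            r_h = h @ np.linalg.solve(r_n, h.conj().T)
            _, vecs = np.linalg.eigh(r_h)         # eigenvalues in ascending order
            a = vecs[:, -1]                       # principal eigenvector of R_H
            g = h.conj().T @ a                    # effective uplink channel H_i^H a
            w = np.linalg.solve(np.outer(g, g.conj()) + r_n, g)
            w = w / np.linalg.norm(w)
            q = targets[i][j] * np.real(np.vdot(w, r_n @ w)) / np.abs(np.vdot(w, g)) ** 2
            out.append((i, j, a, w, q))
            r_n = r_n + q * np.outer(g, g.conj())
    return out

def achieved_sinrs(result, channels, sigma2):
    """Recompute each symbol's uplink SINR from the stored beams and powers."""
    t = channels[0].shape[1]
    r_n = sigma2 * np.eye(t, dtype=complex)
    sinrs = []
    for i, _, a, w, q in result:
        g = channels[i].conj().T @ a
        sinrs.append(q * np.abs(np.vdot(w, g)) ** 2 / np.real(np.vdot(w, r_n @ w)))
        r_n = r_n + q * np.outer(g, g.conj())
    return np.array(sinrs)

rng = np.random.default_rng(2)
K, t, r = 3, 2, 2
channels = [(rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
            for _ in range(K)]
targets = [[10.0, 10.0] for _ in range(K)]        # linear SINR targets, two symbols per user
result = virtual_uplink_init(channels, targets, sigma2=1.0)
sinrs = achieved_sinrs(result, channels, sigma2=1.0)
print(sinrs)
```

Each symbol is processed against the noise plus the interference of the symbols handled before it, so the recomputed SINRs should match the targets; this is only a consistency check of the initialization, not a validation of the full iterative algorithm.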

First step (virtual uplink):

$R_n = \sigma^2 I$
for $i = K : -1 : 1$
    $R_{H_i} = H_i (R_n)^{-1} H_i^H$; $A_i = U_{R_{H_i},(1:z_i)}$.
    $W_i = (H_i^H A_i A_i^H H_i + R_n)^{-1} H_i^H A_i$.
    for $j = 1 : z_i$
        $W_i^j = \dfrac{W_i^j}{\|W_i^j\|}$. $q_i^j = \dfrac{\gamma_i^j \,(W_i^{jH} R_n W_i^j)}{W_i^{jH} H_i^H A_i^j A_i^{jH} H_i W_i^j}$.
        $R_n = R_n + q_i^j H_i^H A_i^j A_i^{jH} H_i$.
    end,
end.

Second step (iterative algorithm):

(Downlink), $C = 0$, $Q = \sigma^2 I$.
for $i = 1 : K$
    $B_i = W_i^H$, $F_i = (H_i B_i)^H (H_i B_i B_i^H H_i^H + Q)^{-1}$,
    for $j = 1 : z_i$
        $F_i^j = \dfrac{F_i^j}{\|F_i^j\|}$; $p_i^j = \dfrac{\gamma_i^j \,(F_i^j F_i^{jH} \sigma^2 + F_i^j H_i C H_i^H F_i^{jH})}{F_i^j H_i B_i^j B_i^{jH} H_i^H F_i^{jH}}$.
        $C = C + p_i^j B_i^j B_i^{jH}$, $Q = Q + p_i^j B_i^j B_i^{jH}$
    end
end.

(Virtual uplink): $R_n = \sigma^2 I$.
for $i = K : -1 : 1$
    $A_i = F_i^H$, $W_i = A_i^H H_i (H_i^H A_i A_i^H H_i + R_n)^{-1}$.
    for $j = 1 : z_i$
        $W_i^j = \dfrac{W_i^j}{\|W_i^j\|}$. $q_i^j = \dfrac{\gamma_i^j \,(W_i^j R_n W_i^{jH})}{W_i^j H_i^H A_i^j A_i^{jH} H_i W_i^{jH}}$.
        $R_n = R_n + q_i^j H_i^H A_i^j A_i^{jH} H_i$.
    end,
end. (back to second step)

Table C.2: The space-time precoding/beamforming for multiple symbol transmission.

Bibliography

[1] S. Alamouti. Space time block coding: A simple transmitter diversity technique for wireless communications. IEEE Journal on Selected Areas in Communications, 16(8):1451–1458, Oct. 1998.

[2] V. Tarokh, N. Seshadri, and A. R. Calderbank. Space-time codes for high data rate wireless communication: Performance criterion and code construction. IEEE Trans. on Information Theory, 44(2):744–765, March 1998.

[3] V. Tarokh, H. Jafarkhani, and A. R. Calderbank. Space-time block codes from orthogonal designs. IEEE Trans. on Information Theory, 45(5):1456–1467, July 1999.

[4] G. J. Foschini. Layered space-time architecture for wireless communication in a fading environment when using multi element antennas. Bell Labs Technical Journal, 1(2):41–59, Aug. 1996.

[5] P. W. Wolniansky, G. J. Foschini, G.D. Golden, and R. A. Valenzuela. V-BLAST: An architecture for realizing very high data rates over the rich-scattering wireless channel. In Proc. ISSSE, Sept. 1998.

[6] B. Hassibi and B.M. Hochwald. High-rate codes that are linear in space and time. IEEE Trans. on Information Theory, 48(7):1804–1824, July 2002.

[7] H. El Gamal and M.O. Damen. Universal space-time coding. IEEE Trans. on Information Theory, 49(5):1097–1119, May 2003.

[8] W. Yu and J. Cioffi. Sum capacity of Gaussian vector broadcast channels. In Proc. IEEE International Symposium on Information Theory (ISIT), 2002.

[9] M. Schubert and H. Boche. Solution of the multiuser downlink beamforming problem with individual SINR constraints. IEEE Trans. on Vehicular Technology, 53(1):18–28, Jan. 2004.

[10] M. Schubert and H. Boche. Iterative multiuser uplink and downlink beamforming under SINR constraints. Accepted for publication in IEEE Trans. on Signal Processing, 2004.

[11] J. G. Proakis. Digital Communications. McGraw-Hill, fourth edition, 2000.

[12] R. van Nee. OFDM wireless multimedia communications. Artech House, Boston, 2000.

[13] A. R. S. Bahai, B. R. Saltzberg, and M. Ergen. Multi-carrier digital communications: theory and applications of OFDM. Springer, 2004.

[14] A. F. Naguib, V. Tarokh, N. Seshadri, and A.R. Calderbank. A space-time coding modem for high-data-rate wireless communications. IEEE Journal on Selected Areas in Communications, 16(8):1459–1478, Oct. 1998.

[15] M. Sellathurai and S. Haykin. TURBO-BLAST for high-speed wireless communications. In Proc. Wireless Communications and Networking Conference, Sept. 2000.

[16] G.D. Golden, G. J. Foschini, R. A. Valenzuela, and P. W. Wolniansky. Detection algorithm and initial laboratory results using V-BLAST space-time communication architecture. Electronics Letters, 35(1):14–16, January 1999.

[17] A. L. C. Hui and K. B. Letaief. Successive interference cancellation for multiuser asynchronous DS/CDMA detectors in multipath fading links. IEEE Trans. on Communications, 46(3), March 1998.

[18] I. E. Telatar. Capacity of multi-antenna Gaussian channels. Eur. Trans. Telecom., 10:585–595, Nov. 1999.

[19] M. O. Damen, A. Tewfik, and J. Belfiore. A construction of a space-time code based on number theory. IEEE Trans. on Information Theory, 48(3), March 2002.

[20] M.O. Damen, H. El Gamal, and N. C. Beaulieu. Linear threaded algebraic space-time constellations. IEEE Trans. on Information Theory, 49(10):2372–2388, Oct. 2003.

[21] B. Hochwald, T. L. Marzetta, and C.B. Papadias. A transmitter diversity scheme for wideband CDMA systems based on space-time spreading. IEEE J. Sel. Areas in Communications, 19(1):48–60, January 2001.

[22] H. Huang, H. Viswanathan, and G.J. Foschini. Multiple antennas in cellular CDMA systems: transmission, detection, and spectral efficiency. IEEE Trans. Wireless Comms., 1(3):383–392, July 2002.

[23] J. Wang and K. Yao. Space-time coded wideband CDMA systems. In Proc. IEEE Veh. Tech. Conf. (VTC), 2002.

[24] G.J. Foschini and M.J. Gans. On limits of wireless communications in a fading environment when using multiple antennas. Wireless Personal Communications, 6(3):311–335, March 1998.

[25] G. Raleigh and J. Cioffi. Spatio-temporal coding for wireless communication. IEEE Trans. on Comm., 46(3):357–366, March 1998.

[26] C. Windpassinger, R. F. H. Fischer, T. Vencel, and J.B. Huber. Precoding in multi-antenna and multi-user communications. IEEE Trans. on Wireless Communications, 3(4):1305–1316, July 2004.

[27] R. F. H. Fischer. Precoding and Signal Shaping for Digital Transmission. Wiley, New York, 2002.

[28] S. Vishwanath, N. Jindal, and A. Goldsmith. Duality, achievable rates, and sum-rate capacity of Gaussian MIMO broadcast channels. IEEE Trans. on Information Theory, 49(10):2658–2668, October 2003.

[29] B. Hassibi and B. Hochwald. High-rate linear space-time codes. In Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, 2001.

[30] R. Doostnejad, T.J. Lim, and E.S. Sousa. Space-time spreading codes for a multiuser MIMO system. In Proc. IEEE 36th Asilomar Conference on Signals, Systems and Computers, Nov. 2002.

[31] X. Giraud, E. Boutillon, and J.C. Belfiore. Algebraic tools to build modulation schemes for fading channels. IEEE Trans. on Information Theory, 43(3):938–952, May 1997.

[32] V. M. DaSilva and E. S. Sousa. Fading-resistant modulation using several transmitter antennas. IEEE Trans. on Communications, 45(10):1236–1244, Oct. 1997.

[33] H. Nichols, A. Giordano, and J. G. Proakis. MLD and MSE algorithms for adaptive detection of digital signals in the presence of interchannel interference. IEEE Trans. on Information Theory, 23:563–575, September 1977.

[34] Y. Lin and D. W. Lin. On optimal power distribution for successive interference cancellation (SIC) for wideband CDMA. Third IEEE Signal Processing Workshop on Signal Processing Advances in Wireless Communications, pages 38–41, March 2001.

[35] S. Verdu. Multiuser Detection. Cambridge Univ. Press, Cambridge, UK, 1998.

[36] P. Patel and J. Holtzman. Analysis of a simple successive interference cancellation scheme in a DS/CDMA system. IEEE J. Sel. Areas Comms., 12(5):796–807, June 1994.

[37] P. H. Tan, L. K. Rasmussen, and T. J. Lim. Constrained maximum-likelihood detection in CDMA. IEEE Trans. Communications, 49(1):142–153, January 2001.

[38] B. K. Ng and E. S. Sousa. A novel spread space-spectrum multiple access scheme for the forward link. In Proc. Wireless Communications and Networking Conference, March 2002.

[39] R. Doostnejad, T.J. Lim, and E.S. Sousa. Two dimensional spreading codes for the down-link in a multiuser system with multiple antennas. In Proc. IEEE WPMC, Oct. 2002.

[40] R. Doostnejad, T.J. Lim, and E.S. Sousa. Impact of power control on the performance of space-time spreading codes with suboptimal detectors. In Proc. IEEE PACRIM, August 2003.

[41] R. Doostnejad, T.J. Lim, and E. S. Sousa. Transceiver designs for the MIMO fading broadcast channel. Proceeding of the IEEE PIMRC, 2004.

[42] R. Doostnejad, T.J. Lim, and E. S. Sousa. Transceiver designs for the MIMO fading broadcast channel. To appear, IEEE Trans. on Wireless Communications.

[43] H. Sampath, P. Stoica, and A. Paulraj. Generalized linear precoder and decoder design for MIMO channels using the weighted MMSE criterion. IEEE Trans. on Communications, 49(12):2198–2206, Dec. 2001.

[44] D.P. Palomar, J.M. Cioffi, and M.A. Lagunas. Joint Tx-Rx beamforming design for multicarrier MIMO channels: a unified framework for convex optimization. IEEE Trans. on Signal Processing, 51(9):2381–2401, Sept. 2003.

[45] A. Bourdoux and N. Khaled. Joint TX-RX optimization for MIMO-SDMA based on a null-space constraint. In Proc. IEEE Veh. Tech. Conf. (VTC), 1:171–174, Sept. 2002.

[46] L.U. Choi and R.D. Murch. A transmit preprocessing technique for multiuser MIMO systems using a decomposition approach. IEEE Trans. on Wireless Communications, 3(1):20–24, Jan. 2004.

[47] Q.H. Spencer, A.L. Swindlehurst, and M. Haardt. Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels. IEEE Trans. on Signal Processing, 52(2):461–471, Feb. 2004.

[48] A. J. Tenenbaum and R. S. Adve. Joint multiuser transmit-receive optimization using linear processing. In Proc. ICC '04, June 2004.

[49] S. Shi and M. Schubert. Precoding and power loading for multi-antenna broadcast channels. In Proc. CISS, March 2004.

[50] M. Costa. Writing on dirty paper. IEEE Trans. on Information Theory, 29(3):439–441, May 1983.

[51] F. Fung, W. Yu, and T.J. Lim. Multi-antenna downlink precoding with individual rate constraints: power minimization and user ordering. Proceeding of the IEEE Singapore Int'l Conf. Comm. Systems (ICCS), Sept. 2004.

[52] R. Doostnejad, T.J. Lim, and E. S. Sousa. Joint precoding and beamforming design for the downlink in a multiuser MIMO system. Proceeding of the IEEE WIMOB, 2005.

[53] J. Chang, L. Tassiulas, and F. Rashid-Farrokhi. Joint transmitter receiver diversity for efficient space division multiaccess. IEEE Trans. on Wireless Communications, 1(1):16–27, January 2002.

[54] D. Tse and P. Viswanath. Downlink-uplink duality and effective bandwidths. In Proc. IEEE Int. Symposium on Inf. Theory (ISIT), July 2002.

[55] P. Viswanath and D. Tse. Sum capacity of the vector Gaussian broadcast channel and uplink-downlink duality. IEEE Trans. on Information Theory, 49(8):1912–1921, Aug. 2003.

[56] R. Doostnejad, T.J. Lim, and E. S. Sousa. Precoding for the MIMO broadcast channels with multiple antennas at each receiver. Proceeding CISS, 2005.

[57] R. Doostnejad, T.J. Lim, and E. S. Sousa. Precoding and beamforming design for MIMO broadcast fading channels with multiple antennas at each receiver. Submitted to IEEE Trans. on Communications, Jan. 2005.

[58] D. Agrawal, V. Tarokh, A. Naguib, and N. Seshadri. Space-time coded OFDM for high data-rate wireless communication over wideband channels. In Proc. IEEE Veh. Tech. Conf. (VTC)-1998, May 1998.

[59] B. Lu and X. Wang. Space-time code design in OFDM systems. In Proc. IEEE Globecom Conference, Nov. 2000.

[60] H. Bolcskei and A. J. Paulraj. Space-frequency coded broadband OFDM systems. In Proc. IEEE WCNC-2000, Sept. 2000.

[61] H. Bolcskei and A. J. Paulraj. Space-frequency codes for broadband fading channels. In Proc. IEEE ISIT-2001, Jun. 2001.

[62] W. Su, Z. Safar, M. Olfat, and K. J. R. Liu. Obtaining full-diversity space-frequency codes from space-time codes via mapping. IEEE Trans. on Signal Processing, 51(11):2905–2916, Nov. 2003.

[63] W. Su, Z. Safar, and K. J. R. Liu. Full-rate full-diversity space-frequency codes with optimum coding advantage. IEEE Trans. on Information Theory, 51(1):229–249, January 2005.

[64] B. Lu, X. Wang, and K. R. Narayanan. LDPC-based space-time coded OFDM systems over correlated fading channels: performance analysis and receiver design. IEEE Trans. on Communications, 50(1):74–88, Jan. 2002.

[65] P. Bouvet, V.L. Nir, M. Helard, and R.L. Gouable. Spatial multiplexed coded MC-CDMA with iterative receiver. Proceeding of the IEEE PIMRC, 2:801–804, Sept. 2004.

[66] X. Peng, Z. Lei, and F.P.S. Chin. Performance comparison of different MIMO configurations for downlink MC-CDMA systems. Proceeding of the IEEE ICCS, pages 281–285, Sept. 2004.

[67] F. Portier, J.Y. Baudars, and J. F. Helard. Performance of STBC MC-CDMA systems over outdoor realistic MIMO channels. Proceeding of the IEEE VTC, 4:2409–2413, Sept. 2004.

[68] R. Doostnejad, T.J. Lim, and E.S. Sousa. On spreading codes for the down-link in a multiuser MIMO/OFDM system. In Proc. IEEE VTC, fall 2003.

[69] W. Yu. Spatial multiplex in downlink multiuser multiple-antenna wireless environments. In Proc. IEEE Globecom Conference, 2003.

[70] J. Zander. Performance of optimum transmitter power control in cellular radio systems. IEEE Trans. Vehicular Technology, 41:57–62, February 1992.