EURASIP Journal on Applied Signal Processing

Space-Time Coding and its Applications—Part II

Guest Editors: Dirk Slock, Vahid Tarokh, and Xiang-Gen Xia

Copyright © 2002 Hindawi Publishing Corporation. All rights reserved.

This is a special issue published in volume 2002 of “EURASIP Journal on Applied Signal Processing.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Editor-in-Chief K. J. Ray Liu, University of Maryland, College Park, USA

Associate Editors

Kiyoharu Aizawa, Japan; Gonzalo Arce, USA; Jaakko Astola, Finland; Mauro Barni, Italy; Sankar Basu, USA; Shih-Fu Chang, USA; Jie Chen, USA; Tsuhan Chen, USA; M. Reha Civanlar, USA; Tony Constantinides, UK; Luciano Costa, Brazil; Irek Defee, Finland; Ed Deprettere, The Netherlands; Zhi Ding, USA; Jean-Luc Dugelay, France; Pierre Duhamel, France; Tariq Durrani, UK; Sadaoki Furui, Japan; Ulrich Heute, Germany; Yu Hen Hu, USA; Jiri Jan, Czech; Shigeru Katagiri, Japan; Mos Kaveh, USA; Bastiaan Kleijn, Sweden; Ut Va Koc, USA; Aggelos Katsaggelos, USA; C. C. Jay Kuo, USA; S. Y. Kung, USA; Chin-Hui Lee, USA; Kyoung Mu Lee, Korea; Y. Geoffrey Li, USA; Heinrich Meyr; Ferran Marques, Spain; Jerry M. Mendel, USA; Marc Moonen, Belgium; José M. F. Moura, USA; Ryohei Nakatsu, Japan; King N. Ngan, Singapore; Takao Nishitani, Japan; Naohisa Ohta, Japan; Antonio Ortega, USA; Mukund Padmanabhan, USA; Ioannis Pitas, Greece; Raja Rajasekaran, USA; Phillip Regalia, France; Hideaki Sakai, Japan; William Sandham, UK; Wan-Chi Siu, Hong Kong; Piet Sommen, The Netherlands; John Sorensen, Denmark; Michael G. Strintzis, Greece; Ming-Ting Sun, USA; Tomohiko Taniguchi, Japan; Sergios Theodoridis, Greece; Yuke Wang, USA; Andy Wu, Taiwan; Xiang-Gen Xia, USA; Zixiang Xiong, USA; Kung Yao, USA

Contents

Editorial, Dirk Slock, Vahid Tarokh, and Xiang-Gen Xia Volume 2002 (2002), Issue 5, Pages 445-446

On the Capacity of Certain Space-Time Coding Schemes, Constantinos B. Papadias and Gerard J. Foschini Volume 2002 (2002), Issue 5, Pages 447-458

Space-Time Turbo Trellis Coded Modulation for Wireless Data Communications, Welly Firmanto, Branka Vucetic, Jinhong Yuan, and Zhuo Chen Volume 2002 (2002), Issue 5, Pages 459-470

On Some Design Issues of Space-Time Coded Multi-Antenna Systems, Hsuan-Jung Su and Evaggelos Geraniotis Volume 2002 (2002), Issue 5, Pages 471-481

Space-Time Trellis Coded 8PSK Schemes for Rapid Rayleigh Fading Channels, Salam A. Zummo and Saud A. Al-Semari Volume 2002 (2002), Issue 5, Pages 482-486

Blind Identification of Convolutive MIMO Systems with 3 Sources and 2 Sensors, Binning Chen, Athina P. Petropulu, and Lieven De Lathauwer Volume 2002 (2002), Issue 5, Pages 487-496

Maximum Likelihood Blind Channel Estimation for Space-Time Coding Systems, Hakan A. Çırpan, Erdal Panayırcı, and Erdinc Çekli Volume 2002 (2002), Issue 5, Pages 497-506

Pilot-Symbol-Assisted Channel Estimation for Space-Time Coded OFDM Systems, King F. Lee and Douglas B. Williams Volume 2002 (2002), Issue 5, Pages 507-516

Low-Complexity Iterative Receiver for Space-Time Coded Signals over Frequency Selective Channels, Noura Sellami, Inbar Fijalkow, and Mohamed Siala Volume 2002 (2002), Issue 5, Pages 517-524

Maximum-Likelihood Sequence Detection of Multiple Antenna Systems over Dispersive Channels via Sphere Decoding, Haris Vikalo and Babak Hassibi Volume 2002 (2002), Issue 5, Pages 525-531

Linear Equalization Combined with Multiple Symbol Decision Feedback Detection for Differential Space-Time Modulation, Genyuan Wang, Aijun Song, and Xiang-Gen Xia Volume 2002 (2002), Issue 5, Pages 532-537

EURASIP Journal on Applied Signal Processing 2002:5, 445–446 © 2002 Hindawi Publishing Corporation

Editorial

Dirk Slock Mobile Communication Department, EURECOM Institute, 2229 route des Cretes, BP 193, 06904 Sophia Antipolis Cedex, France Email: [email protected]

Vahid Tarokh Department of EECS MIT, Cambridge, MA 02139, USA Email: [email protected]

Xiang-Gen Xia Department of ECE, University of Delaware, Newark, DE 19716, USA Email: [email protected]

This is the second part of the special issue “Space-Time Coding and Its Applications.” In this part, there are ten papers covering capacity of space-time coded systems, space-time code designs, decoding methods for space-time coded transmissions, and MIMO systems.

The first paper by C. B. Papadias and G. J. Foschini is in the area of capacity issues of space-time coded MIMO systems. It introduces the notion of attainable capacities, meaning the capacities achieved by different techniques with the use of progressively stronger known encoding/decoding techniques.

The following three papers are in the area of space-time code designs. The paper by W. Firmanto, B. Vucetic, J. Yuan, and Z. Chen presents a design of space-time turbo trellis coded modulation by proposing a new recursive space-time trellis coded modulation. The proposed scheme is less than 3 dB away from the theoretical capacity bound for MIMO channels. The paper by H.-J. Su and E. Geraniotis considers some detailed design issues and tradeoffs of a space-time coded MIMO system. The paper by S. A. Zummo and S. A. Al-Semari presents an 8PSK trellis space-time code design that is suitable for rapid fading channels. They propose two approaches for their design: (i) to maximize the symbol-wise Hamming distance between signals leaving from or remerging to the same encoder’s state; (ii) to partition a set based on maximizing the sum of squared Euclidean distances and also the branch-wise Hamming distance.

The next three papers focus on the topic of channel estimation for space-time coded systems. The paper by B. Chen, A. P. Petropulu, and L. De Lathauwer addresses the problem of blind identification of a convolutive MIMO system with more inputs than outputs. It considers the problem in the frequency domain where, for each frequency, it constructs two tensors based on cross-polyspectra of the output. Innovative solutions are proposed to resolve frequency-dependent scaling and permutation ambiguities. The paper by H. A. Çırpan, E. Panayırcı, and E. Çekli considers the problem of blind estimation of space-time coded signals along with the channel parameters. In this paper, both conditional and unconditional maximum likelihood approaches are developed and iterative solutions are proposed. The paper by K. F. Lee and D. B. Williams considers space-time coded orthogonal frequency division multiplexing (OFDM) systems with multi-transmit antennas. In this paper, a low-complexity, bandwidth-efficient, pilot-symbol-assisted channel estimator for multi-transmit antenna OFDM systems is proposed.

The final three papers are in the area of decoding/demodulation of space-time coded systems. The paper by N. Sellami, I. Fijalkow, and M. Siala presents a low-complexity turbo-detector scheme for space-time coded frequency selective MIMO channels. The paper by H. Vikalo and B. Hassibi presents a sphere decoder for sequence detection in multiple-antenna communication systems over dispersive channels. The sphere decoder provides the ML sequence estimate with computational complexity comparable to standard space-time decision-feedback equalizing (DFE) algorithms. The paper by G. Wang, A. Song, and X.-G. Xia introduces linear equalization to reduce a convolutive MIMO channel to a flat MIMO channel that may be only partially known. Detection of differential space-time modulation hence becomes feasible. Multiple symbol decision feedback detection is considered for improved performance.

Dirk Slock
Vahid Tarokh
Xiang-Gen Xia

Dirk Slock received the engineer’s degree from the University of Gent, Belgium in 1982. In 1984, he was awarded a Fulbright scholarship for Stanford University, USA, where he received his M.S. in Electrical Engineering, M.S. in Statistics, and Ph.D. in Electrical Engineering in 1986, 1989, and 1989, respectively. While at Stanford, he developed new fast recursive least-squares algorithms for adaptive filtering. In 1989–1991, he was a member of the research staff at the Philips Research Laboratory, Belgium. In 1991, he joined the Eurecom Institute where he is now Associate Professor. At Eurecom, he teaches statistical signal processing and speech coding for mobile communications. His research interests include DSP for mobile communications: antenna arrays for (semi-blind) equalization/interference cancellation and spatial division multiple access, space-time processing and audio coding. More recently, he has been focusing on receiver design, downlink antenna array processing, and speech coding for third generation systems, and introducing spatial multiplexing in existing wireless systems. He received one best journal paper award from the IEEE-SP and one from EURASIP in 1992. He is the coauthor of two IEEE-Globecom98 best student paper awards. He has been an Associate Editor for the IEEE-SP Transactions.

Vahid Tarokh received his Ph.D. degree in Electrical Engineering from the University of Waterloo, Ontario, Canada in 1995. From August 1995 to May 1996, he was employed by the Coordinated Science Laboratory of the University of Illinois Urbana-Champaign, as a visiting Professor. He then joined the AT&T Labs-Research, where he was employed as a Senior Member of Technical Staff, Principal Member of Technical Staff, and the Head of the Department of Wireless Communications and Signal Processing until August 2000. In the fall of 2000, Dr. Tarokh joined the Department of Electrical Engineering and Computer Sciences of MIT as an Associate Professor, where he is currently employed. Dr. Tarokh received numerous awards including the 1987 Gold Tablet of the Iranian Math Society, the 1995 Governor General of Canada’s Academic Gold Medal, the 1999 IEEE Information Theory Society Prize Paper Award (jointly with A. R. Calderbank and N. Seshadri), and more recently the 2001 Alan T. Waterman Award.

Xiang-Gen Xia received his B.S. degree in mathematics from Nanjing Normal University, Nanjing, China, his M.S. degree in mathematics from Nankai University, Tianjin, China, and his Ph.D. degree in Electrical Engineering from the University of Southern California, Los Angeles, USA in 1983, 1986, and 1992, respectively. He was a Lecturer at Nankai University, China during 1986–1988, a Teaching Assistant at University of Cincinnati, USA during 1988–1990, a Research Assistant at the University of Southern California, USA during 1990–1992, and a Research Scientist at the Air Force Institute of Technology during 1993–1994. He was a Senior/Research Staff Member at Hughes Research Laboratories, Malibu, California, during 1995–1996. In September 1996, he joined the Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware, USA, where he is currently an Associate Professor. His current research interests include communication systems including equalization and coding; SAR and ISAR imaging of moving targets; wavelet transform and multirate filterbank theory and applications; time-frequency analysis and synthesis; and numerical analysis and inverse problems in signal/image processing. Dr. Xia has over 80 refereed journal articles published, and four U.S. patents awarded. He is the author of the book “Modulated Coding for Intersymbol Interference Channels” (New York, Marcel Dekker, 2000). Dr. Xia received the National Science Foundation (NSF) Faculty Early Career Development (CAREER) Program Award in 1997, the Office of Naval Research (ONR) Young Investigator Award in 1998, and the Outstanding Overseas Young Investigator Award from the National Nature Science Foundation of China in 2001. He also received the Outstanding Junior Faculty Award of the Engineering School of the University of Delaware in 2001. He is currently an Associate Editor of the IEEE Transactions on Mobile Computing, the IEEE Transactions on Signal Processing and the EURASIP Journal on Applied Signal Processing. He is also a Member of the Signal Processing for Communications Technical Committee in the IEEE Signal Processing Society.

EURASIP Journal on Applied Signal Processing 2002:5, 447–458 © 2002 Hindawi Publishing Corporation

On the Capacity of Certain Space-Time Coding Schemes

Constantinos B. Papadias Global Wireless Systems Research Department, Bell Laboratories, Lucent Technologies, 791 Holmdel-Keyport Road, Holmdel, NJ 07733, USA Email: [email protected]

Gerard J. Foschini Wireless Communications Research Department, Bell Laboratories, Lucent Technologies, 791 Holmdel-Keyport Road, Holmdel, NJ 07733, USA Email: [email protected]

Received 30 May 2001 and in revised form 22 February 2002

We take a capacity view of a number of different space-time coding (STC) schemes. While the Shannon capacity of multiple-input multiple-output (MIMO) channels has been known for a number of years now, the attainment of these capacities remains a challenging issue in many cases. The introduction of space-time coding schemes in the last 2–3 years has, however, begun paving the way towards the attainment of the promised capacities. In this work we attempt to describe the attainable information rates of certain STC schemes, by quantifying their inherent capacity penalties. The obtained results, which are validated for a number of typical cases, cast some interesting light on the merits and tradeoffs of different techniques. Further, they point to future work needed in bridging the gap between the theoretically expected capacities and the performance of practical systems.

Keywords and phrases: MIMO systems, space-time coding, Bell Labs Layered Space-Time (BLAST), space-time spreading (STS), channel capacity, space-time processing (STP), transmit diversity.

1. INTRODUCTION

The combined use of antenna arrays and sophisticated multiple-input multiple-output (MIMO) transceiver techniques has boosted the anticipated spectral efficiencies of wireless links in the last five years or so. The MIMO channel capacity expressions derived in [1] indicate that the spectral efficiencies of MIMO channels can grow approximately linearly with the (minimum of the) number of antennas available on each side of the link.

Similar to the case of single-input single-output (SISO) channels, the attainment of the theoretically promised capacities in practice has to rely on strong encoding/decoding techniques. In the SISO case, it took about fifty years to approach closely (with the advent of Turbo codes [2]) the channel capacities predicted by Shannon [3]. In the MIMO case, an initial (the so-called D-BLAST) architectural super-structure was proposed in [1] that is theoretically capable of achieving the channel capacity. The quest for practical capacity-approaching STC techniques is however ongoing. Interestingly, it seems that some existing STC techniques [4, 5] already come close to the channel capacities in a number of cases [6, 7]. These quite rapid advancements are of course not unrelated to the mature state of SISO encoding/decoding techniques. At the same time, there still exist many cases of interest where more research is needed in order to approach the capacities of MIMO systems in practice.

In this paper, we will attempt to quantify the performance of certain STC techniques, in terms of their “attainable” capacities. By “attainable capacities” we mean the capacities achieved by different techniques with the use of progressively stronger known (typically SISO) encoding/decoding techniques. In other words, we will quantify the irreducible capacity penalties inherent in certain STCs, due to the way they process signals at the transmitter, as well as at the receiver. Our general framework targets STC techniques which, when viewed end-to-end, can be broken down to a number of SISO problems. This will allow the evaluation of attainable capacities by quantifying the spectral efficiency of each component SISO channel. We then present a number of STC techniques that fit well within our framework and derive their capacities. The evaluation of these capacities helps not only to compare some of the existing techniques, but also to identify cases where further research is needed.

The remainder of the paper is organized as follows. In Section 2, we provide our working assumptions, as well as some background on the topic. In Section 3, we define the notion of decomposable STCs, as well as some other relevant features of such codes. In Section 4, we focus on a number of recently developed STC techniques for multiple-input single-output (MISO) channels, and show their attainable capacities. In Section 5, we present similar results for MIMO systems. In Section 6, we show some numerical results for outage capacities of MIMO Rayleigh-faded channels in a number of cases of interest. Finally, in Section 7 we present our conclusions, as well as some directions for future work.

2. BACKGROUND AND ASSUMPTIONS

[Figure 1 (block diagram): the input stream $\tilde{b}(t)$ enters the DEMUX/ENCODE/STM unit, whose outputs $s_1(k), \ldots, s_M(k)$ feed $M$ RF transmit chains; $N$ RF receive chains deliver $x_1(k), \ldots, x_N(k)$ to the STP/DECODE/REMUX unit, which outputs $\hat{b}(i)$.]

Figure 1: A generic (M, N) multiple antenna system.

Figure 1 shows a generic architecture of a wireless system with $M$ transmitter and $N$ receiver antennas. Such a system will be denoted in the remainder of the paper as $(M, N)$. The continuous-time input stream $\tilde{b}(t)$ is assumed to be carrying the original primitive data stream $\{\tilde{b}(i)\}$ that is to be communicated to the receiver. The input stream is then processed by the shown DEMUX/ENCODE/STM unit, whose output is an ensemble of $M$ parallel data streams, each one of which is separately upconverted and transmitted over the MIMO channel.

The DEMUX/ENCODE/STM unit includes the following operations:

(1) demultiplexing;
(2) encoding;
(3) spatial multiplexing.

These operations may be ordered differently and can be done in a more or less joint fashion. For example, the original bit stream may be encoded first as a whole, and then demultiplexed onto the $M$ antennas. Alternatively, $\{\tilde{b}(i)\}$ may be first demultiplexed onto a number of sub-streams, each one of which is afterwards encoded independently. Either way, the encoded/demultiplexed sub-streams are then mapped through the so-called spatial multiplexer onto the $M$ antennas for transmission. This mapping may be a simple 1-1 streaming of each encoded sub-stream on each antenna (such as in the original so-called V-BLAST transmission mentioned in [8]), or a more complex spatial mapping. At the receiver, after the signals are received with an antenna array, they are first converted to baseband. Then, they are processed in space and time (STP), decoded, and remultiplexed in the STP/DECODE/REMUX unit (again the order of these operations may be arbitrary). These operations attempt to recover as reliably as possible a replica of the original primitive bit stream $\{\tilde{b}(i)\}$. For the purposes of this paper, temporal interleaving is not explicitly accounted for, but it can be easily accommodated.

The way in which we will view MIMO systems throughout the paper is the following. The specific way in which the operations at the transmitter and the receiver mentioned above take place imposes a number of constraints on the problem of achieving the MIMO capacity. We refer to the system that results after the imposition of these constraints as an architectural “STC super-structure.” Each STC super-structure then admits a whole class of specific STCs, by applying different types of temporal error-correction codes. Our goal will be to identify the capacity penalties inherent to these STC super-structures, due to the imposition of constraints on both the transmitter and the receiver. Said differently, we will attempt to quantify the Shannon capacities that are attainable in each case.

Now we define the notation and assumptions that will be used throughout the rest of the paper. After error correction coding, interleaving, and demultiplexing (irrespective of the order in which these operations occur), the original bit stream $\{\tilde{b}(i)\}$ is converted to a number, say $Q$, of encoded sub-streams, denoted as $\{b_1(k)\}, \ldots, \{b_Q(k)\}$. Note that the number of encoded sub-streams $Q$ will most often equal the number of transmitter antennas $M$; however, this may not always be the case. Finally, the $Q$ sub-streams are mapped through a spatial multiplexing operation, as shown in Figure 1, to the $M$ sub-streams that are transmitted from the $M$ antennas. We denote the sub-stream transmitted from the $m$th antenna by $\{s_m(k)\}$. We assume that the physical channel between the $m$th transmitter and the $n$th receiver antenna is flat-faded in frequency; it can hence be represented, at baseband, through the complex scalar $h_{nm}$. The baseband received signal at the receiver antenna array is then represented by the following familiar (narrow-band) mixing model:

$$\mathbf{x}(k) = \mathbf{H}\mathbf{s}(k) + \mathbf{n}(k), \tag{1}$$

where the involved quantities are defined as follows:

• $\mathbf{s}(k) = [s_1(k) \cdots s_M(k)]^T$ is the $M \times 1$ vector snapshot of transmitted sub-streams, each assumed of equal variance $\sigma_s^2$;
• $\mathbf{H}$ is the $N \times M$ channel matrix;
• $\mathbf{x}(k)$ is the $N \times 1$ vector of received signal snapshots;
• $\mathbf{n}(k)$ is the $N \times 1$ vector of additive noise samples, assumed i.i.d. and mutually independent, each of variance $\sigma_n^2$.

We also denote by the superscripts $*$, $T$, $\dagger$ the complex conjugate, transpose, and Hermitian transpose, respectively, of a scalar or matrix. The open-loop Shannon capacity of the $(M, N)$ flat-faded channel is given (see [1]) by the now familiar (so-called “log-det”) formula:

$$C = \log_2 \det\left(\mathbf{I}_N + \frac{\rho}{M}\mathbf{H}\mathbf{H}^\dagger\right) \ \text{[bps/Hz]}, \tag{2}$$

where $\rho = M\sigma_s^2/\sigma_n^2$. This formula assumes the transmitter is constrained to communicate using i.i.d. random processes of equal power from each of the antennas. Later we will refine the context to fully accommodate an open-loop channel outage mode. As mentioned above, the capacity in (2) can be only achieved with the use of strong encoding (STC) techniques. In the remainder of the paper, we will attempt to quantify how much of this capacity is allowed to be attained within certain STC super-structures.

3. FEATURES OF STC ARCHITECTURAL SUPER-STRUCTURES

When viewing (1), the only visible imposed constraints are the equality between the powers of each sub-stream, the independent equal-power noise, and the flat channel characteristic. The absence of additional constraints would allow, in theory, the attainable capacity of the $(M, N)$ system to be given by (2). The imposition of further constraints though, reflecting operations at both the transmitter and the receiver, may reduce the capacity in (2). We call this (potentially) reduced capacity the constrained capacity of a given architectural super-structure.

At the receiver, the received encoded vector signal $\mathbf{x}(k)$ is processed in order to produce attempted replicas of the $Q$ encoded sub-streams. These replicas, denoted as $\{d_1(k)\}, \ldots, \{d_Q(k)\}$, are then driven to the (joint or disjoint) decoder/de-interleaver, which will attempt to recover the original uncoded sub-streams, and eventually, the original primitive bit stream. Leaving out the encoding/decoding stages, we can take an end-to-end view that relates the encoded sub-streams at the transmitter $\{b_1(k)\}, \ldots, \{b_Q(k)\}$ to their processed attempted (soft) replicas at the receiver $\{d_1(k)\}, \ldots, \{d_Q(k)\}$. Quite often, these relate in a linear fashion, that is, according to the following model:

$$\mathbf{d}(k) = \mathbf{F}\mathbf{b}(k) + \mathbf{v}(k), \tag{3}$$

where $\mathbf{F}$ is a square ($Q \times Q$) matrix and all vectors in (3) are of dimension $Q \times 1$. We will refer to STC super-structures that admit the end-to-end representation in (3) as end-to-end linear. Further, depending on the specific structure and attributes of the mixing matrix $\mathbf{F}$ and the noise impairment $\mathbf{v}(k)$, we can define some extra attributes. Before describing these attributes, we define, for convenience, a noise-prewhitened version of model (3). Denoting by $\mathbf{R}_v$ the covariance matrix of $\mathbf{v}(k)$ (i.e., $\mathbf{R}_v = E(\mathbf{v}(k)\mathbf{v}^\dagger(k))$), assumed full rank, an equivalent representation of (3) is

$$\mathbf{d}'(k) = \mathbf{F}'\mathbf{b}(k) + \mathbf{v}'(k), \tag{4}$$

where $\mathbf{d}'(k) = \mathbf{\Phi}_v^{-1}\mathbf{d}(k)$, $\mathbf{F}' = \mathbf{\Phi}_v^{-1}\mathbf{F}$, $\mathbf{v}'(k) = \mathbf{\Phi}_v^{-1}\mathbf{v}(k)$, and $\mathbf{R}_v = E(\mathbf{v}(k)\mathbf{v}^\dagger(k)) = \mathbf{\Phi}_v\mathbf{\Phi}_v^\dagger$. Note that the new noise impairment $\mathbf{v}'(k)$ is “spatially” white, that is, $E(\mathbf{v}'(k)\mathbf{v}'^\dagger(k)) = \mathbf{I}$. We are now ready to define a number of useful properties of end-to-end linear STC super-structures.

Decomposability

We call an end-to-end linear STC super-structure fully decomposable when the matrix $\mathbf{F}'$ in (4) is diagonal (possibly after a rearrangement of its entries). In this case, the original $M \times N$ problem has reduced into $Q$ spatially single-dimensional problems.

Partial decomposability

We call, similarly, an end-to-end linear STC super-structure partially decomposable when the matrix $\mathbf{F}'$ in (4) is block-diagonal (again after a possible rearrangement of entries). In other words, instead of a coupled $Q \times Q$ problem, we are faced with a number of decoupled lower-dimension problems.

Balance

We call an end-to-end linear STC super-structure fully balanced when each of the $Q$ sub-streams in (4) experiences the same amount of interference from the other $Q - 1$ sub-streams as any other sub-stream.

Partial balance

We call an end-to-end linear STC super-structure partially balanced when the $Q$ sub-streams can be arranged in groups of sub-streams of dimension lower than $Q$, such that each group experiences the same amount of interference from the other groups as any other group of sub-streams.

The features defined above, as well as some others that will be discussed later, will help us classify different STC super-structures regarding their ability to attain their respective capacities. More precisely, they affect the way in which error correction coding can be embedded in them, so as to approach these capacities. We will now see how some of these properties and features are reflected in some particular STC super-structures.

4. (M, 1) SYSTEMS

In this section, we consider some representative STC super-structures that were developed for cases of multiple-input single-output $(M, 1)$ systems. Due to the fact that MISO antenna systems provide diversity-type gains, which are, at most, logarithmic in $M$ (as opposed to the linear capacity increase of true MIMO systems), they are usually called “transmit diversity systems.” The few techniques that will be shown are examples that fit well within the framework defined in Section 3, and as such, allow for an analytical evaluation of their theoretical constrained capacities. Before proceeding, we mention the open-loop capacity of a flat-faded $(M, 1)$ system, which is given by

$$C_{M,1}^{\max} = \log_2\left(1 + \frac{\rho}{M}\sum_{m=1}^{M}|h_m|^2\right), \tag{5}$$

which is obtained by substituting, in (2), $N = 1$ and $\mathbf{H} = [h_1 \cdots h_M]$.

4.1. (2, 1) systems: the Alamouti scheme

An ingenious transmit diversity scheme for the (2, 1) case was introduced a few years ago by Alamouti [4], and remains to date the most popular scheme for (2, 1) systems. We denote by $\mathbf{S}$ the $2 \times 2$ matrix whose $(i, j)$ element is the encoded signal going out of the $j$th antenna at odd ($i = 1$) or even ($i = 2$) time periods (the length of each time period equals the duration of one encoded symbol). In other words, one could think of the vertical dimension of $\mathbf{S}$ as representing “time” and of its horizontal dimension as representing “space.” The Alamouti scheme transmits the following signal every two encoded symbol periods:

$$\mathbf{S}(k) = [\mathbf{s}_1(k) \ \mathbf{s}_2(k)] = \begin{bmatrix} b_1(k) & b_2(k) \\ b_2^*(k) & -b_1^*(k) \end{bmatrix}. \tag{6}$$

Notice that in this case, $Q = M = 2$. Notice also that the spatial multiplexing is done according to a block scheme, the block length being equal to $L = 2$ time periods. Having assumed, as noted earlier, the channel to be flat in frequency, the (2, 1) channel is characterized through $\mathbf{H} = [h_1 \ h_2]$. The baseband signal arriving at the single receiver antenna at two consecutive time instants can be expressed as

$$\mathbf{r}(k) = h_1\left(b_1(k)\mathbf{c}_1 + b_2^*(k)\mathbf{c}_2\right) + h_2\left(b_2(k)\mathbf{c}_1 - b_1^*(k)\mathbf{c}_2\right) + \mathbf{n}(k), \tag{7}$$

where $\mathbf{c}_1^T = [1 \mid 0]$, $\mathbf{c}_2^T = [0 \mid 1]$ (see footnote 1). After sub-sampling at the receiver and complex-conjugating the second output, we obtain

$$\begin{aligned} d_1(k) &= \mathbf{c}_1^T\mathbf{r}(k) = h_1 b_1(k) + h_2 b_2(k) + v_1(k), \\ d_2(k) &= \mathbf{c}_2^T\mathbf{r}^*(k) = -h_2^* b_1(k) + h_1^* b_2(k) + v_2(k), \end{aligned} \tag{8}$$

where $v_m(k) = \mathbf{c}_m^T\mathbf{n}(k)$, $m = 1, 2$. Equation (8) can be equivalently written as

$$\mathbf{d}(k) = \begin{bmatrix} h_1 & h_2 \\ -h_2^* & h_1^* \end{bmatrix}\begin{bmatrix} b_1(k) \\ b_2(k) \end{bmatrix} + \mathbf{v}(k) = \mathcal{H}\mathbf{b}(k) + \mathbf{v}(k), \tag{9}$$

where $\mathbf{v}^T(k) = [v_1(k) \ v_2(k)]$ and $\mathcal{H}$ is a unitary matrix up to the scale factor $|h_1|^2 + |h_2|^2$. After match-filtering to $\mathcal{H}$, we obtain

$$\mathbf{d}'(k) = \mathcal{H}^\dagger\mathbf{d}(k) = \begin{bmatrix} |h_1|^2 + |h_2|^2 & 0 \\ 0 & |h_1|^2 + |h_2|^2 \end{bmatrix}\mathbf{b}(k) + \mathbf{v}'(k), \tag{10}$$

where $\mathbf{v}'(k)$ remains spatially white. Comparing to (3), it is clear that this (2, 1) system is

(1) fully decomposable to two (1, 1) systems;
(2) fully balanced.

Footnote 1: By suitably redefining $\mathbf{c}_1$ and $\mathbf{c}_2$, the scheme can be modified for use with direct-sequence CDMA systems, where it is referred to as space-time spreading (STS) [9].

The total constrained capacity of the system equals the sum of the capacities of the two SISO systems (recall that each SISO system operates at half the original information rate):

$$C_{2,1}^{A} = \log_2\left(1 + \frac{\rho}{2}\left(|h_1|^2 + |h_2|^2\right)\right). \tag{11}$$

Note that, by contrasting (11) to (5), we see that

$$C_{2,1}^{A} = C_{2,1}^{\max}. \tag{12}$$

This result is summarized in the following theorem.

Theorem 1. The (2, 1) Alamouti transmit diversity scheme has a constrained capacity equal to the (2, 1) open-loop channel capacity (see footnote 2).

Footnote 2: This result has been previously reported in [6, 7].

Moreover, since each component of $\mathbf{v}'(k)$ is a stationary noise process, the capacity of each (1, 1) system is attainable through conventional (i.e., spatially single-dimensional) state-of-the-art encoding techniques. For example, each of the two sub-streams can be encoded independently with a Turbo code, which is suitable for the classical additive white Gaussian noise channel (with stationary noise).

4.2. A (4, 1) scheme

The nice property of the full log-det capacity being attainable in the (2, 1) case does not unfortunately hold in general for $(M, 1)$ systems with $M > 2$ (see, e.g., [10, 11]). However, some schemes have been developed recently for special cases. In the following, we describe a scheme that we recently derived for the (4, 1) case (see [12]) and evaluate its capacity constrained on different receiver processing options.

The original information sequence $\tilde{b}(i)$ is first demultiplexed into four sub-streams $b_m(k)$ ($m = 1, \ldots, 4$). The 4-dimensional transmitted signal is now organized in blocks of $L = 4$ (encoded) symbol periods; it is hence represented by a $4 \times 4$ matrix $\mathbf{S}$, which is arranged as follows:

$$\mathbf{S} = \begin{bmatrix} b_1 & b_2 & b_3 & b_4 \\ b_2^* & -b_1^* & b_4^* & -b_3^* \\ b_3 & -b_4 & -b_1 & b_2 \\ b_4^* & b_3^* & -b_2^* & -b_1^* \end{bmatrix}, \tag{13}$$

where we have dropped the time index $k$ for convenience. The channel matrix is again assumed flat-faded; it can hence be represented by $\mathbf{H} = [h_1 \ h_2 \ h_3 \ h_4]$. The received signal will then be given by

$$\mathbf{r} = \mathbf{S}\begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ h_4 \end{bmatrix} + \mathbf{n} = \mathbf{S}\mathbf{h} + \mathbf{n}, \tag{14}$$

where $r = [x(1)\; x(2)\; x(3)\; x(4)]^T$ contains 4-symbol snapshots at the received signal. By complex-conjugating the second and the fourth entry of $r$ in (14), we obtain

\[ r' = \begin{bmatrix} r(1) \\ r^*(2) \\ r(3) \\ r^*(4) \end{bmatrix} = \begin{bmatrix} h_1 & h_2 & h_3 & h_4 \\ -h_2^* & h_1^* & -h_4^* & h_3^* \\ -h_3 & h_4 & h_1 & -h_2 \\ -h_4^* & -h_3^* & h_2^* & h_1^* \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix} + n', \tag{15} \]

where $n'$ is similarly obtained from $n$ by complex-conjugating its second and fourth entry. The received signal is hence written as

\[ r' = H b + n', \tag{16} \]

where now $H$ is defined as

\[ H = \begin{bmatrix} h_1 & h_2 & h_3 & h_4 \\ -h_2^* & h_1^* & -h_4^* & h_3^* \\ -h_3 & h_4 & h_1 & -h_2 \\ -h_4^* & -h_3^* & h_2^* & h_1^* \end{bmatrix}. \tag{17} \]

We now perform matched filtering with respect to $H$

\[ r_{mf} = H^\dagger r' = \begin{bmatrix} \gamma & 0 & \alpha & 0 \\ 0 & \gamma & 0 & -\alpha \\ -\alpha & 0 & \gamma & 0 \\ 0 & \alpha & 0 & \gamma \end{bmatrix} b + H^\dagger n' = \Delta_4 b + n_{mf}, \tag{18} \]

where

\[ \gamma = h^\dagger h = \sum_{m=1}^{4} |h_m|^2, \qquad \alpha = 2j\,\mathrm{Im}\left(h_1^* h_3 + h_4^* h_2\right). \tag{19} \]

The parameter $\alpha$ expresses some residual interference inherent in this (4, 1) technique, and is in general nonzero. Note both the particular sparse structure of the matrix $\Delta_4$, as well as the fact that $\gamma$ is real and $\alpha$ is imaginary. These result in $\Delta_4$ being in general full rank ($\det(\Delta_4) = (\gamma^2 + \alpha^2)^2$). Comparing (18) to (3), we observe that this (4, 1) scheme is:

(1) partially decomposable to two uncoupled (2, 2) systems;
(2) fully balanced.

Namely, by grouping the entries of $r_{mf}$ in two pairs, we obtain

\[ \begin{bmatrix} r_{mf,1} \\ r_{mf,3} \end{bmatrix} = \Delta_2 \begin{bmatrix} b_1 \\ b_3 \end{bmatrix} + \begin{bmatrix} n_{mf,1} \\ n_{mf,3} \end{bmatrix}, \qquad \begin{bmatrix} r_{mf,4} \\ r_{mf,2} \end{bmatrix} = \Delta_2 \begin{bmatrix} b_4 \\ b_2 \end{bmatrix} + \begin{bmatrix} n_{mf,4} \\ n_{mf,2} \end{bmatrix}, \tag{20} \]

where

\[ \Delta_2 = \begin{bmatrix} \gamma & \alpha \\ -\alpha & \gamma \end{bmatrix} \tag{21} \]

(we note in passing that $\Delta_2^\dagger = \Delta_2$). The two (2, 2) signal models above share the same 2×2 channel matrix $\Delta_2$ and have identically distributed, but statistically independent, 2×1 additive noise vectors. In order to facilitate the capacity evaluation of this scheme, we present at this point the noise-prewhitened version of (20):

\[ \begin{bmatrix} r'_1 \\ r'_3 \end{bmatrix} = \Lambda \begin{bmatrix} b_1 \\ b_3 \end{bmatrix} + \begin{bmatrix} n'_1 \\ n'_3 \end{bmatrix}, \tag{22} \]

where

\[ \Lambda = \begin{bmatrix} \lambda & \kappa \\ -\kappa & \lambda \end{bmatrix} \tag{23} \]

with $\Delta_2 = \Lambda^\dagger \Lambda$ and where $n'_1$ and $n'_3$ are i.i.d. and mutually independent Gaussian variables of variance $\sigma_n^2$ each. Again, an identical signal model to (22) holds for the pair $\{b_4, b_2\}$. Similar to $\gamma$ and $\alpha$ in (21), $\lambda$ and $\kappa$ in (23) are real and imaginary, respectively. Similarly, the nonzero value of $\kappa$ represents mutual interference between the two sub-streams (if $\alpha = 0$, then $\lambda = \sqrt{\gamma}$ and $\kappa = 0$).

Maximum allowable Shannon capacity

We first compute the Shannon capacity constrained only upon transmitter processing. By considering the two 2 × 2 models that describe the post-matched-filtering signals according to (22), we deduce that the maximum achievable capacity of a (4, 1) system within the space-time spreading scheme (13) is given by

\[ C_{4,1}^{constr,max} = \frac{1}{2} \log_2 \det\left( I_2 + \frac{P_T}{4\sigma_n^2} \Lambda \Lambda^\dagger \right), \tag{24} \]

where $P_T = 4\sigma_b^2$ is the total average transmitted power from the antenna array ($\sigma_b^2$ is the variance of each $b_i$). Since $\Lambda \Lambda^\dagger = \Delta_2$, this gives

\[ C_{4,1}^{constr,max} = \frac{1}{2} \log_2 \det\left( I_2 + \frac{\rho}{4} \Delta_2 \right), \tag{25} \]

where $\rho = P_T/\sigma_n^2$ (all bandwidth-related normalizations have been taken into account, so that (25) represents the total capacity of the system). If the interference caused by the quantity $\alpha$ vanished, the expression in (25) would reduce to

\[ C_{4,1}^{opt} = \log_2\left( 1 + \frac{\rho\gamma}{4} \right) \tag{26} \]

which is the open-loop capacity of the (4, 1) flat-faded system. However, for $\alpha \neq 0$, $C_{4,1}^{constr,max}$ falls short of $C_{4,1}^{opt}$.

As mentioned in Section 3, the constrained capacities that correspond to each scheme depend not only on the constraints imposed at the transmitter, but also on those caused by receiver processing. In the following, we will describe a number of options for receiver processing, which will each correspond to a different constrained capacity.
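The structure claimed in (18) and the gap between (25) and (26) can be checked numerically. The following is a minimal sketch in plain Python, assuming the matrix $H$ of (17) and the definitions in (19); the function names (`effective_channel`, `gram`, `gamma_alpha`, `capacity_constrained`, `capacity_open_loop`) are illustrative and do not come from the paper.

```python
import math
import random

def effective_channel(h):
    """Build the 4x4 matrix H of (17) from the taps h = [h1, h2, h3, h4]."""
    h1, h2, h3, h4 = h
    c = lambda z: z.conjugate()
    return [
        [h1, h2, h3, h4],
        [-c(h2), c(h1), -c(h4), c(h3)],
        [-h3, h4, h1, -h2],
        [-c(h4), -c(h3), c(h2), c(h1)],
    ]

def gram(H):
    """Return H^dagger H, i.e. the matched-filter matrix Delta_4 of (18)."""
    n = len(H)
    return [[sum(H[r][i].conjugate() * H[r][j] for r in range(n))
             for j in range(n)] for i in range(n)]

def gamma_alpha(h):
    """gamma (real) and alpha (purely imaginary) as defined in (19)."""
    h1, h2, h3, h4 = h
    gamma = sum(abs(x) ** 2 for x in h)
    alpha = 2j * (h1.conjugate() * h3 + h4.conjugate() * h2).imag
    return gamma, alpha

def capacity_constrained(h, rho):
    """Equation (25): 0.5 * log2 det(I_2 + (rho/4) * Delta_2)."""
    gamma, alpha = gamma_alpha(h)
    a = 1 + rho * gamma / 4
    b = rho * alpha / 4
    det = a * a + b * b  # det [[a, b], [-b, a]]; b is imaginary, so b*b <= 0
    return 0.5 * math.log2(det.real)

def capacity_open_loop(h, rho):
    """Equation (26): log2(1 + rho * gamma / 4)."""
    gamma, _ = gamma_alpha(h)
    return math.log2(1 + rho * gamma / 4)

if __name__ == "__main__":
    random.seed(1)
    h = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
    rho = 10.0  # linear SNR, i.e. 10 dB
    print("constrained (25):", capacity_constrained(h, rho))
    print("open-loop   (26):", capacity_open_loop(h, rho))
```

For a channel with real taps, $\alpha = 0$ and the two capacities coincide, which mirrors the remark preceding (26).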

4.2.1 Linear receiver processing

We observe from (20) that, in order to demodulate the transmitted sub-streams in a joint fashion, 2 input/2 output multiuser detection (MUD) is required. In this section, we present candidate receivers that perform linear MUD on each pair of matched filter outputs.

Zero-forcing processing

A straightforward way of mitigating the interference in the desired signal $b$ due to $\alpha$ in (18) is to use a decorrelating (zero forcing, ZF) receiver. Mathematically, the ZF receiver operates on the matched-filter outputs as follows:

\[ r_{ZF} = \Delta_4^{-1} r_{mf} = b + \Delta_4^{-1} n_{mf}. \tag{27} \]

Due to the decoupling expressed in (20), the ZF operation decouples too, as follows

\[ \begin{bmatrix} r_{ZF,1} \\ r_{ZF,3} \end{bmatrix} = \Delta_2^{-1} \begin{bmatrix} r_{mf,1} \\ r_{mf,3} \end{bmatrix} = \begin{bmatrix} b_1 \\ b_3 \end{bmatrix} + \Delta_2^{-1} \begin{bmatrix} n_{mf,1} \\ n_{mf,3} \end{bmatrix}, \qquad \begin{bmatrix} r_{ZF,4} \\ r_{ZF,2} \end{bmatrix} = \Delta_2^{-1} \begin{bmatrix} r_{mf,4} \\ r_{mf,2} \end{bmatrix} = \begin{bmatrix} b_4 \\ b_2 \end{bmatrix} + \Delta_2^{-1} \begin{bmatrix} n_{mf,4} \\ n_{mf,2} \end{bmatrix}. \tag{28} \]

Note that (28) is equivalent to (27). The ZF receiver now detects the four sub-streams by further processing the zero-forcing outputs, that is, the entries of the vector $r_{ZF}$ given in (27). Each of the four zero-forcing outputs can be seen as the output of the following AWGN channel:

\[ r_{ZF,i} = b_i + n_{ZF,i}, \quad i = 1, \ldots, 4, \tag{29} \]

where $n_{ZF,i}$ is an i.i.d. Gaussian noise independent of $b_i$, of variance that can be found to equal $\gamma\sigma_n^2/(\gamma^2 + \alpha^2)$. At this point, the system has been reduced to a fully decomposed, fully balanced system. Hence, its capacity is given by

\[ C_{4,1}^{ZF} = \log_2\left( 1 + \frac{\rho}{4} \frac{\gamma^2 + \alpha^2}{\gamma} \right) \text{ [bps/Hz]}, \tag{30} \]

where we recall that $\rho = P_T/\sigma_n^2 = 4\sigma_b^2/\sigma_n^2$. Notice that the four sub-streams have equal capacities, and that the total capacity of the system equals four times that of any given sub-stream.

MMSE processing

A better compromise between signal recovery and noise amplification (and hence better performance) can be achieved with minimum mean squared error (MMSE) processing. This is achieved by the 4 × 4 setting $W_{MS,4}$ which minimizes the MMSE criterion:

\[ \min_{W_{MS,4}} E\left\| W_{MS,4}^\dagger r_{mf} - b \right\|^2. \tag{31} \]

The minimization of (31) yields the Wiener solution

\[ W_{MS,4}^\dagger = \Delta_4^\dagger \left( \Delta_4 \Delta_4^\dagger + \frac{\sigma_n^2}{\sigma_b^2} H^\dagger H \right)^{-1}. \tag{32} \]

Hence, the post-MMSE-processed signal delivered to the detector is given by

\[ r_{MS} = \Delta_4^\dagger \left( \Delta_4 \Delta_4^\dagger + \frac{\sigma_n^2}{\sigma_b^2} H^\dagger H \right)^{-1} r_{mf} = \Delta_4^\dagger \left( \Delta_4 \Delta_4^\dagger + \frac{\sigma_n^2}{\sigma_b^2} \Delta_4 \right)^{-1} r_{mf}. \tag{33} \]

As in the ZF case (28), the MMSE solution in (33) is decomposable as follows (see Footnote 3):

\[ \begin{bmatrix} r_{MS,1} \\ r_{MS,3} \end{bmatrix} = \Delta_2^\dagger \left( \Delta_2 \Delta_2^\dagger + \frac{\sigma_n^2}{\sigma_b^2} \Delta_2 \right)^{-1} \begin{bmatrix} r_{mf,1} \\ r_{mf,3} \end{bmatrix} = W_{MS,2}^\dagger \begin{bmatrix} r_{mf,1} \\ r_{mf,3} \end{bmatrix}, \qquad \begin{bmatrix} r_{MS,4} \\ r_{MS,2} \end{bmatrix} = \Delta_2^\dagger \left( \Delta_2 \Delta_2^\dagger + \frac{\sigma_n^2}{\sigma_b^2} \Delta_2 \right)^{-1} \begin{bmatrix} r_{mf,4} \\ r_{mf,2} \end{bmatrix} = W_{MS,2}^\dagger \begin{bmatrix} r_{mf,4} \\ r_{mf,2} \end{bmatrix}. \tag{34} \]

Footnote 3. As expected, as $\rho \to \infty$, the solution $W_{MS,2}^\dagger$ in (34) converges to the ZF solution $W_{ZF,2}^\dagger = \Delta_2^{-1}$.

In this case too, an equation similar to (29) can be written, wherein each sub-stream is detected at the output of a (1, 1) system with AWGN noise. The additive noise now contains contributions from one other sub-stream; it has, however, the same variance for all four sub-streams. So again, the system has been fully decomposed to four (1, 1) systems in a fully balanced way. It is then straightforward to compute the capacity of the MMSE receiver, which is given by

\[ C_{4,1}^{MMSE} = \log_2\left( 1 + \frac{W_1^\dagger \Omega W_1}{W_1^\dagger \Phi W_1 + (4/\rho)\, W_1^\dagger \Delta_2 W_1} \right), \tag{35} \]

where

\[ \Omega = \begin{bmatrix} \gamma \\ -\alpha \end{bmatrix} \begin{bmatrix} \gamma \\ -\alpha \end{bmatrix}^\dagger, \qquad \Phi = \begin{bmatrix} \alpha \\ \gamma \end{bmatrix} \begin{bmatrix} \alpha \\ \gamma \end{bmatrix}^\dagger, \qquad W_1^\dagger = \begin{bmatrix} 1 & 0 \end{bmatrix} W_{MS,2}^\dagger, \tag{36} \]

where $W_{MS,2}^\dagger$ is given in (34).

4.2.2 Maximum likelihood MUD

We now focus on the prewhitened signal model (22), which we repeat here for convenience:

\[ \begin{bmatrix} r'_1 \\ r'_3 \end{bmatrix} = \Lambda \begin{bmatrix} b_1 \\ b_3 \end{bmatrix} + \begin{bmatrix} n'_1 \\ n'_3 \end{bmatrix}. \tag{37} \]

Keeping in mind that the noise vector $[n'_1 \; n'_3]^T$ is jointly Gaussian with covariance matrix $\sigma_n^2 I_2$, the maximum likelihood (ML) multiuser detector for (37) solves the following optimization problem:

\[ \min_{\{b_1, b_3\} \in \mathcal{A} \times \mathcal{A}} \left\| \begin{bmatrix} r'_1 \\ r'_3 \end{bmatrix} - \Lambda \begin{bmatrix} b_1 \\ b_3 \end{bmatrix} \right\|^2, \tag{38} \]

where $\mathcal{A}$ is the alphabet shared by all the encoded sub-streams.

Equation (38) is a typical maximum likelihood MUD problem (see [13]). Typically, in order to avoid an exhaustive multi-dimensional search, the encoding imparts a special structure (such as with convolutional codes). Then the use of dynamic programming techniques such as the Viterbi Algorithm (VA) provides an important saving in complexity.

We are now ready to assess the capacity of the proposed (4, 1) super-structure, constrained on ML reception. Consider a pair of transmitted sequences $\{\tilde{b}_1\}$, $\{\tilde{b}_3\}$, to be encoded in a spatially balanced way (either independently or jointly). Then, because of the symmetrical structure of the channel matrix $\Lambda$ in (37), the communication system is perfectly balanced (it is understood that the encoding of each sequence at the transmitter is done without knowledge of the channel instantiations). A spatially two-dimensional version of Shannon's classical random coding procedure then applies. We start with a primitive (maxentropic) independent bit stream, and demultiplex it into its even and odd sub-streams, $b_1$ and $b_3$, respectively. The encoded sequence is assigned half of its bits ($b_1$) from the first sub-stream (first dimension), and the other half ($b_3$) from the second sub-stream (second dimension). Then, due to the perfect balance between the two dimensions, the system's capacity is achieved when each of the two sub-streams achieves its own (half of the full) capacity. This requires, however, joint optimal (minimum distance) detection of the two sub-streams, as per (38).

In conclusion, the Shannon capacity which is achieved through ML detection (in the limit of infinitely long random codes) is given by (25), which we repeat here for convenience

\[ C_{4,1}^{ML} = \frac{1}{2} \log_2 \det\left( I_2 + \frac{\rho}{4} \Delta_2 \right). \tag{39} \]

As will be shown later, this capacity is very close to the full (4, 1) capacity $C_{4,1}^{max}$. Notice that in the ML case, even though it is fully balanced, the system has been only partially decomposed in two 2 × 2 systems.

4.3. Nonfull rate (4, 1) codes

We now describe some easily derived constrained capacities of some other, less optimal, but quite simple, (M, 1) schemes.

(1) STS(3,1), 3/4 rate: in [9] it was shown that a (3, 1) scheme can be designed, which achieves the full (3, 1) capacity, but at the price of a 25% loss of rate. This scheme uses block multiplexing, in a fashion similar to the above schemes of Section 4. It multiplexes Q = 3 sub-streams on 3 antennas, over L = 4 symbol periods. It results, however, in a fully-decomposable, fully-balanced, 3 × 3 system with stationary noise. Its constrained capacity is given by

\[ C_{31-3/4} = \frac{3}{4} \log_2\left( 1 + \rho\, \frac{|h_1|^2 + |h_2|^2 + |h_3|^2}{3} \right). \tag{40} \]

(2) STS(4, 1), real: it was also mentioned in [9] and elsewhere, that a fully decomposable and fully balanced extension of the Alamouti (2, 1) scheme for real inputs can be used for a (4, 1) system. In the case of complex inputs, it is possible to use the same scheme if we sacrifice 50% of the rate, that is by signaling half of the time on each complex dimension. The capacity of this scheme equals half of the (4, 1) open-loop capacity:

\[ C_{41-real} = \frac{1}{2} C_{4,1}^{max}. \tag{41} \]

4.4. (M, 1) hopping

A very simple alternative that can be used for any integer M is based on the idea of cycling a single encoded stream over the M transmit antennas. In this case, the data stream is first encoded as a single stream $\{b(k)\}$. The encoded sequence $\{b(k)\}$ is then demultiplexed into M sub-sequences, $\{b_1(k)\}, \ldots, \{b_M(k)\}$. The mth subsequence is transmitted from the mth antenna (m = 1, ..., M). In other words, the M antennas take turns in transmitting (at full power) the M sub-streams of the single encoded data sequence. This scheme is fully balanced and fully decomposable, however its noise impairment is not stationary. Its capacity is easily found to be given by the average capacity of the M full-power (1, 1) sub-channels, that is,

\[ C_{M,1}^{hop} = \frac{1}{M} \sum_{m=1}^{M} \log_2\left( 1 + \rho |h_m|^2 \right). \tag{42} \]

It is important to emphasize that a peculiarity of this simple approach is that the encoded sequence is effectively transmitted through a channel whose SNR is periodic. Conventional encoding techniques do not in general perform satisfactorily with such periodic channels. Special codes that can cope with such channels are required in order to be able to approach the capacity in (42). These codes are a current research topic [14].

Discussion

Other approaches for the (M, 1) case have appeared in the recent literature. An exhaustive listing of all of them would, however, be beyond the scope of this paper. We should note further that the benefit of open-loop (M, 1) systems becomes increasingly limited as M grows. Keeping the total transmit power from all the antennas constant, assuming that $E|h_m|^2 = 1$ and letting M grow towards infinity, the (M, 1) open-loop capacity in (5) tends to the following asymptote:

\[ C_{\infty,1} = \log_2(1 + \rho). \tag{43} \]

It is clear from (43) that we cannot keep increasing the capacity of an open-loop (M, 1) system by simply increasing the number of transmitter antennas. The use of more antennas at the receiver becomes necessary when higher capacities are sought.
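The constrained capacities of Section 4 can be compared for a single channel realization. The sketch below, in plain Python, evaluates (30), (35), (39), (26), and (42) for M = 4; it assumes $\sigma_n^2/\sigma_b^2 = 4/\rho$ and uses the simplification $W_{MS,2}^\dagger = (\Delta_2 + (4/\rho) I_2)^{-1}$, which follows from (34) because $\Delta_2$ is Hermitian. All function names are illustrative rather than taken from the paper.

```python
import math
import random

def gamma_alpha(h):
    """gamma (real) and alpha (purely imaginary) of (19)."""
    h1, h2, h3, h4 = h
    gamma = sum(abs(x) ** 2 for x in h)
    alpha = 2j * (h1.conjugate() * h3 + h4.conjugate() * h2).imag
    return gamma, alpha

def zf_capacity(gamma, alpha, rho):
    """Equation (30)."""
    snr = (rho / 4) * (gamma ** 2 + alpha ** 2).real / gamma
    return math.log2(1 + snr)

def mmse_capacity(gamma, alpha, rho):
    """Equation (35), with W_1^dagger the first row of (Delta_2 + c I)^-1."""
    c = 4.0 / rho
    det = (gamma + c) ** 2 + alpha ** 2
    w = [(gamma + c) / det, -alpha / det]       # row vector W_1^dagger
    wh = [w[0].conjugate(), w[1].conjugate()]   # entries of the column W_1
    d1 = [gamma, -alpha]                        # desired column of Delta_2
    d2 = [alpha, gamma]                         # interfering column of Delta_2
    delta2 = [[gamma, alpha], [-alpha, gamma]]
    signal = abs(w[0] * d1[0] + w[1] * d1[1]) ** 2
    interference = abs(w[0] * d2[0] + w[1] * d2[1]) ** 2
    noise = c * sum(w[i] * delta2[i][j] * wh[j]
                    for i in range(2) for j in range(2)).real
    return math.log2(1 + signal / (interference + noise))

def ml_capacity(gamma, alpha, rho):
    """Equation (39)."""
    det = (1 + rho * gamma / 4) ** 2 + (rho * alpha / 4) ** 2
    return 0.5 * math.log2(det.real)

def open_loop_capacity(gamma, rho):
    """Equation (26)."""
    return math.log2(1 + rho * gamma / 4)

def hopping_capacity(h, rho):
    """Equation (42)."""
    return sum(math.log2(1 + rho * abs(x) ** 2) for x in h) / len(h)

def section4_capacities(h, rho):
    gamma, alpha = gamma_alpha(h)
    return {
        "ZF": zf_capacity(gamma, alpha, rho),
        "MMSE": mmse_capacity(gamma, alpha, rho),
        "ML": ml_capacity(gamma, alpha, rho),
        "OPT": open_loop_capacity(gamma, rho),
        "HOP": hopping_capacity(h, rho),
    }

if __name__ == "__main__":
    random.seed(7)
    s = math.sqrt(0.5)  # unit-variance complex Gaussian taps
    h = [complex(random.gauss(0, s), random.gauss(0, s)) for _ in range(4)]
    for name, value in section4_capacities(h, 10.0).items():
        print(f"{name:5s} {value:.3f} bps/Hz")
```

For any single channel draw the values should satisfy ZF ≤ MMSE ≤ ML ≤ OPT, the same ordering that Table 1 reports for the outage capacities.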

5. (M, N) SYSTEMS

In this section, we analyze some STC architectural super-structures for the case of N > 1 receiver antennas.

5.1. Combined transmit/receive diversity systems

Given a certain (M, 1) system, one straightforward way to design an (M, N) system is to simply:

• transmit as in the (M, 1) system,
• receive on each antenna as in the (M, 1) system,
• combine optimally the N receiver antenna outputs.

The capacity quantification of these transmit/receive diversity systems is straightforward. The M × N (assumed flat) channel is represented through the N × M channel matrix

\[ H = \begin{bmatrix} h_{11} & \cdots & h_{1M} \\ \vdots & \ddots & \vdots \\ h_{N1} & \cdots & h_{NM} \end{bmatrix} = \begin{bmatrix} h_1 & \cdots & h_N \end{bmatrix}. \tag{44} \]

We first compute an upper bound for the capacity of such an (M, N) transmit/receive diversity system. With optimal ratio combining, and assuming that each (M, 1) system takes no interference hit, the input/output relationship takes the form

\[ d(k) = \sum_{m=1}^{M} \sum_{n=1}^{N} |h_{nm}|^2\, b(k) + n(k) \tag{45} \]

corresponding to the capacity

\[ C_{M,N}^{trd,max} = \log_2\left( 1 + \frac{\rho}{M} \sum_{m=1}^{M} \sum_{n=1}^{N} |h_{nm}|^2 \right). \tag{46} \]

It is clear that, when the attainable capacity of the corresponding (M, 1) schemes is away from the (M, 1) log-det capacity, the upper bound in (46) will not be attained either. Notice further that the expression in (46) is strictly smaller than the (M, N) log-det capacity in (2) for N > 1.

Examples

To give some examples, the capacity of a (2, N) system that uses the Alamouti (2, 1) super-structure is

\[ C_{2,N}^{A} = \log_2\left( 1 + \frac{\rho}{2} \sum_{n=1}^{N} \left( |h_{n,1}|^2 + |h_{n,2}|^2 \right) \right), \tag{47} \]

that is, as expected, the upper bound in (46) is attained by the Alamouti scheme in the (2, N) case.

It is also straightforward to compute the maximum attainable capacity of a (4, N) system that uses the (4, 1) scheme of Section 4.2, which is given by

\[ C_{4,N}^{ML} = \frac{1}{2} \log_2 \det\left( I_2 + \frac{\rho}{4} \Lambda_{2n}^\dagger \Lambda_{2n} \right), \tag{48} \]

where

\[ \Lambda_{2n} = \begin{bmatrix} \Lambda_1 \\ \vdots \\ \Lambda_n \end{bmatrix} \tag{49} \]

and $\Lambda_n$ is defined, similar to (23), for the nth receiver antenna.

5.2. V-BLAST

An STC super-structure that is quite simple from the transmitter's point of view was proposed in [8], and is widely referred to as "V-BLAST." In this architecture, the stream $\{b_i\}$ is first demultiplexed into M sub-streams, which are then encoded independently and mapped each on a different antenna:

\[ s_m(k) = b_m(k). \tag{50} \]

In other words, the original bit stream is converted into a vertical vector of encoded sub-streams (whence the term "vertical" BLAST) which are then streamed to the antennas through a 1-1 mapping. In [8], it was proposed to process the received signal with the use of a successive interference canceller. After determining the order in which the M sub-streams will be detected, the V-BLAST receiver operates according to the following generic 3-stage scheme, which is followed in a successive fashion for each sub-stream:

(1) project away from the remaining interfering sub-streams;
(2) detect (after de-coding, de-interleaving, and slicing) the sub-stream;
(3) cancel the effect of the detected sub-stream from subsequent sub-streams.

Mathematically, these operations can be described as follows for the $k_m$th sub-stream:

\[ \begin{aligned} z_{k_m}(k) &= W_{k_m}^\dagger x^m(k), \\ \hat{z}_{k_m}(k) &= \mathrm{dec}\left( z_{k_m}(k) \right), \\ x^{m+1}(k) &= x^m(k) - \mathrm{enc}\left( \hat{z}_{k_m}(k) \right) h_{k_m}, \end{aligned} \tag{51} \]

where $x^1(k) = x(k)$, $\{k_1, \ldots, k_M\}$ is a reordering of the set $\{1, \ldots, M\}$ that determines the order in which the sub-streams will be detected, $\mathrm{dec}(\cdot)$ represents the decoding plus detection operation, and $\mathrm{enc}(\cdot)$ represents the encoding operation. Finally, $W_{k_m}$ represents the N × 1 vector that operates on $x^m(k)$ in order to project away from sub-streams $\{k_{m+1}, \ldots, k_M\}$. The operations in (51) are performed successively for m = 1, ..., M, after the ordering $\{k_1, \ldots, k_M\}$ has been determined.

We now distinguish between the following two cases for this linear operation, since they significantly affect the constrained capacity of the system.

Zero-forcing projection

In this case, at the mth stage, $W_{k_m}^\dagger$ nulls perfectly the interference from all the remaining (undetected) sub-streams. These are the sub-streams with indices $\{k_{m+1}, \ldots, k_M\}$. This nulling is represented mathematically as

\[ W_{ZF,k_m}^\dagger H = \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{bmatrix} = \delta_{k_m}, \tag{52} \]

where the unique nonzero element of the 1 × M vector $\delta_{k_m}$ is in its $k_m$th position. As a result, the end-to-end model for the $k_m$th output is

\[ d_{k_m}(k) = b_{k_m}(k) + W_{ZF,k_m}^\dagger n(k), \quad m = 1, \ldots, M, \tag{53} \]

where $n(k) = [n_1(k) \cdots n_N(k)]^T$ is the receiver noise. Defining

\[ d(k) = \begin{bmatrix} d_{k_1}(k) & \cdots & d_{k_M}(k) \end{bmatrix}^T, \qquad b(k) = \begin{bmatrix} b_{k_1}(k) & \cdots & b_{k_M}(k) \end{bmatrix}^T, \tag{54} \]

equation (53) can be written in matrix form as

\[ d(k) = b(k) + W_{ZF}^\dagger n(k), \tag{55} \]

where $W_{ZF} = [W_{ZF,k_1} \cdots W_{ZF,k_M}]$. From (55), it is obvious that the ZF version of the V-BLAST super-structure is a fully decomposable, however not fully balanced system, due to the generally different square norms of the different columns of $W_{ZF}$.

Regarding the capacity of the end-to-end system, it is important to emphasize that we have assumed that each sub-stream is independently encoded, and that the transmitter has no way of knowing which is the highest rate for each antenna. As a result, it can only transmit the same rate from all antennas. Hence, the capacity will equal M times the smallest of the M decomposed channel capacities:

\[ C_{MN}^{VB-ZF} = M \times \min_{m \in \{1, \ldots, M\}} \log_2\left( 1 + \rho_{ZF,k_m} \right), \tag{56} \]

where $\rho_{ZF,k_m}$ is the output SNR of the $k_m$th sub-stream:

\[ \rho_{ZF,k_m} = \frac{\rho}{M \left\| W_{ZF,k_m} \right\|^2}. \tag{57} \]

It should finally be noted that the capacity in (56) can be optimized by choosing an optimal ordering for the set $\{k_1, \ldots, k_M\}$ (see [8]).

MMSE projection

In this case, at the mth stage, an optimal compromise between linear interference mitigation of the undetected sub-streams and noise amplification is sought. This is achieved through the following MMSE criterion:

\[ \min_{W_{k_m}} E\left\| d_{k_m} - W_{k_m}^\dagger H_{k_m} \right\|^2, \tag{58} \]

where $H_{k_m}$ is derived from $H$ by deleting its columns corresponding to indices $\{k_1, \ldots, k_{m-1}\}$. This gives for $W_{k_m}$:

\[ W_{MMSE,k_m} = \left( H_{k_m} H_{k_m}^\dagger + \frac{M}{\rho} I_N \right)^{-1} h_{k_m}, \tag{59} \]

where $h_{k_m}$ is the $k_m$th column of $H$. This end-to-end system has now been fully decomposed into M (1, 1) systems, however, it is not generally balanced. Its capacity is hence computed again through the minimum of the M 1 × 1 capacities (assuming Gaussian signaling for each sub-stream), and is given by a formula similar to (56):

\[ C_{MN}^{VB-MMSE} = M \times \min_{m \in \{1, \ldots, M\}} \log_2\left( 1 + \rho_{MMSE,k_m} \right), \tag{60} \]

where now

\[ \rho_{MMSE,k_m} = \frac{\left| W_{MMSE,k_m}^\dagger h_{k_m} \right|^2}{M \left\| W_{MMSE,k_m} \right\|^2 / \rho + \sum_{l=m+1}^{M} \left| W_{MMSE,k_m}^\dagger h_{k_l} \right|^2}. \tag{61} \]

Again, the capacity in (60) can be maximized through optimal ordering.

5.3. Other (M, N) schemes

Similar to the (M, 1) case, several other schemes have been proposed in the literature for the general (M, N) case. For example, it was suggested in [15] to use a block space-time multiplexing whose mixing coefficients are derived numerically according to a maximum average capacity criterion. Another approach in [16] uses Turbo codes in the following way: the original sub-stream is first demultiplexed into M sub-streams, which are separately encoded each with a block code. Then, the M encoded outputs are space-time interleaved in a random fashion, mapped onto constellation symbols, and sent out of the M antennas. At the receiver, the M sub-streams are separated through an iterative interference canceller, which uses MMSE for the linear (soft) part, and subtracts decisions made after (joint) de-interleaving and (separate) decoding of each interfering sub-stream in the cancellation part.

These approaches have demonstrated encouraging performance in terms of bit/frame error rate at the receiver. However, their inherent capacity penalties are still unknown, due mainly to their apparent lack of structure and other properties such as the ones discussed above. The quantification of the capacity penalties of these and other emerging STC super-structures remains an interesting open question.

6. NUMERICAL RESULTS

We will now show some representative capacity plots for the STC architectures that were mentioned above. In all cases, we will use the analytical expressions derived in the paper. We will run these expressions over an ensemble of (M, N) random Rayleigh-faded channel matrices (each entry of the matrix is chosen independently from any other entry from a complex i.i.d. Gaussian distribution of unit variance). We will then plot outage capacities, that is, we will pick out of the computed capacity cdf a point according to a typical outage percentage (such as 10%, which is typical in wireless communications).

In Figure 2, we show the 10% outage capacities for several (M, 1) cases, as well as for the (2, 2) case. In the (2, 1) case, the plotted capacity corresponds to both the Alamouti scheme and to the maximum open-loop capacity, as indicated by (12). For the other (M, 1) cases, we plot the capacity upper bounds corresponding to (5), and we use (43) for the asymptotic (∞, 1) case. We also use (46) with N = 2 for the capacity of a (2, 2) combined Alamouti/receive diversity scheme, and the log-det expression (2) for the (2, 2) maximum open-loop capacity. We observe that, at ρ = 10 dB, the (2, 1) system almost doubles the capacity of the (1, 1) system! However, as noted earlier, increasing the number of transmit antennas in the (M, 1) case offers diminishing returns. It is also worth noting that the (2, 2) combined transmit/receiver diversity scheme is capable of attaining a quite significant fraction (particularly at low SNRs) of the maximum (2, 2) open-loop capacity. Finally, it is also interesting to note that a (2, 2) system achieves about the same capacity as a (∞, 1) system, which conveys again the message of the high value of adding extra antennas at the receiver.

Figure 2: Outage capacities and bounds of (M, 1) and (M, 2) schemes. [Plot: 10% outage capacity (bps/Hz) versus SNR (dB). Curves: 2 × 2 open-loop, 2 × 2 Alamouti, ∞ × 1 TD, 8 × 1 TD, 4 × 1 TD, 2 × 1 Alamouti, 1 × 1.]

In Figure 3, we show the capacities of the (4, 1) scheme of Section 4.2 when used in conjunction with the different proposed receiver architectures (ZF, MMSE, and ML). It is noticeable that the ML structure closely approaches the channel's (4, 1) open-loop capacity. Moreover, we observe that at low SNRs, the linear MMSE solution is also very close to the open-loop capacity. Table 1 shows some of these results at chosen SNR points.

Figure 3: Outage capacities of the (4, 1) scheme of Section 4.2 compared to the (4, 1) open-loop capacity. [Plot: 10% outage capacity (bps/Hz) versus ρ (dB). Curves: 4 × 1 open-loop capacity, 4 × 1 proposed ML, 4 × 1 proposed MMSE, 4 × 1 proposed ZF.]

Table 1: Indicative outage capacities of the proposed (4, 1) scheme versus the (4, 1) open-loop capacity.

ρ [dB]    ZF       MMSE     ML       OPT
−10       0.038    0.056    0.057    0.057
0         0.344    0.469    0.480    0.491
10        1.886    1.990    2.212    2.339
20        4.805    4.825    5.155    5.379

Figure 4 shows a comparison of some of the (M, 1) systems mentioned in Section 4, including non-full rate variants of the Alamouti (STS) scheme. We observe that the non-full rate schemes fall well behind the (2, 1) scheme in terms of outage capacity. Moreover, the (2, 1) scheme is increasingly close to the (4, 1) open-loop capacity at low SNRs. Similarly, in Figure 5, we show comparative plots of the capacity of the (4, 1) hopping scheme mentioned in Section 4.4 (see (42)).

Figure 4: 10% outage capacities compared to other open-loop alternatives. [Plot: 10% outage capacity (bps/Hz) versus ρ (dB). Curves: 4 × 1 open-loop capacity, 4 × 1 proposed ML, 4 × 1 proposed MMSE, 2 × 1 STS, 3 × 1 STS 3/4, 4 × 1 STS real.]

Figure 5: Outage capacity of a hopping scheme. [Plot: 10% outage capacity (bps/Hz) versus ρ (dB). Curves: 4 × 1 open-loop capacity, 4 × 1 proposed ML, 4 × 1 hopping.]

In Figure 6, we show the capacities of some combined transmit/receive diversity schemes for different (4, N) cases. The circles represent the combined (4, N) systems corresponding to the (4, 1) scheme of Section 4.2, in conjunction with optimal receiver diversity. When read from the bottom up, these four curves correspond to N = 1, 2, 3, 4, respectively. Similarly, the crosses represent the corresponding open-loop (4, N) capacities. Notice that the proposed (4, 1) scheme is very close to the open-loop capacity; however, the gap gets increasingly larger as N grows from 1 to 4. In the (4, 2) case, however, the scheme still performs very well at low SNRs.

Figure 6: Outage capacities of the (4, 1) scheme described in Section 4.2, when used with up to four receiver antennas. [Plot: 10% outage capacity (bps/Hz) versus ρ (dB). Curves: 4 × N open-loop and 4 × N proposed max, for N = 1, 2, 3, 4.]

Finally, in Figure 7, we show a capacity cdf, at 10 dB SNR, of the ZF and MMSE V-BLAST architectures described in Section 5.2 for the (4, 4) case. Notice that, at this SNR, the MMSE architecture is capable of attaining about 70% of the total open-loop capacity at 10% outage. However, the ZF architecture performs poorly, and is even outperformed by a (1, 4) maximal ratio combining system at outages smaller than 20%! The situation is more severe for lower SNRs such as 0 dB, as shown in Figure 8. Now the V-BLAST MMSE architecture attains only about 50% of the (4, 4) open-loop capacity, whereas the ZF architecture is outperformed by the (1, 4) system across the board.

Figure 7: Outage capacity distribution of a V-BLAST MMSE architecture at 10 dB SNR. [Plot: Pr(capacity > abscissa) versus capacity (bps/Hz), for M = 4, N = 4, ρ = 10 dB. Curves: (1, 4) open-loop, (4, 4) V-BLAST-ZF, (4, 4) V-BLAST-MMSE, (4, 4) open-loop.]

Figure 8: Outage capacity distribution of a V-BLAST MMSE architecture at 0 dB SNR. [Plot: Pr(capacity > abscissa) versus capacity (bps/Hz), for M = 4, N = 4, ρ = 0 dB. Curves: (1, 4) open-loop, (4, 4) V-BLAST-ZF, (4, 4) V-BLAST-MMSE, (4, 4) open-loop.]

7. CONCLUSIONS

We have presented a framework for analyzing space-time coding architectures in terms of Shannon capacity. We defined a number of attributes of such schemes that allow for the computation of their inherent capacity penalties, which we have computed analytically for a few representative examples. Our theoretically derived expressions were also numerically validated by a number of computer simulations that compare the considered architectures in terms of outage capacity. We believe that these results provide some useful intuition regarding the performance trade-offs of different techniques. Future work will be targeted in analyzing other promising STC schemes, as well as in determining new architectures of higher capacity potential.

REFERENCES

[1] G. J. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs Technical Journal, vol. 1, no. 2, pp. 41–59, 1996.
[2] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon-limit error correction coding and decoding: Turbo codes," in Proc. 1993 International Conference on Communications, pp. 1064–1070, Geneva, Switzerland, May 1993.
[3] C. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.
[4] S. Alamouti, "A simple transmitter diversity scheme for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.
[5] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time codes for high data rate wireless communication: performance criterion and code construction," IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744–765, 1998.
[6] C. Papadias, "On the spectral efficiency of space-time spreading schemes for multiple antenna CDMA systems," in 33rd Asilomar Conference on Signals, Systems, and Computers, pp. 639–643, Pacific Grove, Calif, USA, October 1999.
[7] S. Sandhu and A. Paulraj, "Space-time block codes: a capacity perspective," IEEE Communications Letters, vol. 4, no. 12, pp. 384–386, 2000.
[8] G. J. Foschini, G. D. Golden, R. A. Valenzuela, and P. W. Wolniansky, "Simplified processing for wireless communication at high spectral efficiency," IEEE Journal on Selected Areas in Communications, vol. 17, no. 11, pp. 1841–1852, 1999.
[9] B. Hochwald, L. Marzetta, and C. Papadias, "A transmitter diversity scheme for wideband CDMA systems based on space-time spreading," IEEE Journal on Selected Areas in Communications, vol. 19, no. 1, pp. 48–60, 2001.
[10] A. V. Geramita and J. Seberry, Orthogonal Designs: Quadratic Forms and Hadamard Matrices, Marcel Dekker, New York, USA, 1979.
[11] G. Ganesan and P. Stoica, "Space-time diversity using orthogonal and amicable orthogonal designs," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, June 2000.
[12] C. Papadias and G. J. Foschini, "A space-time coding approach for systems employing four transmit antennas," in International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, Utah, USA, May 2001.
[13] S. Verdú, Multi-User Detection, Cambridge University Press, Cambridge, UK, 1999.
[14] R. D. Wesel, X. Liu, and W. Shi, "Trellis codes for periodic erasures," IEEE Trans. Communications, vol. 48, no. 6, pp. 938–974, 2000.
[15] B. Hassibi and B. Hochwald, "High-rate linear space-time codes," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Salt Lake City, Utah, USA, May 2001.
[16] M. Sellathurai and S. Haykin, "Joint beamformer estimation and co-antenna interference cancelation for turbo-BLAST," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Salt Lake City, Utah, USA, May 2001.

Constantinos B. Papadias was born in Athens, Greece, in 1969. He received the diploma of electrical engineering from the National Technical University of Athens (NTUA) in 1991 and the Ph.D. degree in signal processing (highest honors) from the École Nationale Supérieure des Télécommunications (ENST), Paris, France, in 1995. From 1992 to 1995, he was a Teaching and Research Assistant at the Mobile Communications Department, Eurécom, France. In 1995, he joined the Information Systems Laboratory, Stanford University, Stanford, Calif, USA, as a PostDoctoral Researcher, working in the Smart Antennas Research Group. In November 1997, he joined the Wireless Research Laboratory of Bell Labs, Lucent Technologies, Holmdel, NJ, USA, as a Member of Technical Staff. He now heads the Global Wireless Strategy Research group in the same lab. His current research interests lie in the areas of multiple antenna systems (e.g., MIMO transceiver design, and space-time coding), interference mitigation techniques, reconfigurable wireless networks, as well as financial evaluation of wireless technologies. He has authored several papers and patents on these topics. Dr. Papadias is a member of the Technical Chamber of Greece.

Gerard J. Foschini BSEE-NJIT, MEE-NYU, Ph.D. Mathematics-Stevens. Mr. Gerard J. Foschini has been at Bell Laboratories for nearly 40 years. He holds the position of Distinguished Member of Staff. He has conducted data communications research on many kinds of systems, most recently wireless communications and optical communications systems. Gerard has done extensive research on point to point systems as well as on networks. He won the 2001 Bell Labs Distinguished Inventor Award. He is an IEEE Fellow and he has taught at Princeton and Rutgers.

EURASIP Journal on Applied Signal Processing 2002:5, 459–470
© 2002 Hindawi Publishing Corporation

Space-Time Turbo Trellis Coded Modulation for Wireless Data Communications

Welly Firmanto School of Electrical and Information Engineering, The University of Sydney, Sydney 2006, Australia Email: fi[email protected]

Branka Vucetic School of Electrical and Information Engineering, The University of Sydney, Sydney 2006, Australia Email: [email protected]

Jinhong Yuan School of Electrical Engineering and Telecommunications, The University of New South Wales, Sydney, NSW 2052, Australia Email: [email protected]

Zhuo Chen School of Electrical and Information Engineering, The University of Sydney, Sydney 2006, Australia Email: [email protected]

Received 1 June 2001 and in revised form 22 March 2002

This paper presents the design of space-time turbo trellis coded modulation (ST turbo TCM) for improving the bandwidth efficiency and the reliability of future wireless data networks. We present new recursive space-time trellis coded modulation (STTC) which outperform feedforward STTC proposed by Tarokh et al. (1998) and Baro et al. (2000) on slow and fast fading channels. A substantial improvement in performance can be obtained by constructing ST turbo TCM which consists of concatenated recursive STTC, decoded by an iterative decoding algorithm. The proposed recursive STTC are used as constituent codes in this scheme. They have been designed to satisfy the design criteria for STTC on slow and fast fading channels, derived for systems with the product of transmit and receive antennas larger than 3. The proposed ST turbo TCM significantly outperforms the best known STTC on both slow and fast fading channels. The capacity of this scheme on fast fading channels is less than 3 dB away from the theoretical capacity bound for multi-input multi-output (MIMO) channels.

Keywords and phrases: space-time coding, MIMO fading channels, space-time trellis codes, turbo coded modulation, iterative decoding.

1. INTRODUCTION

In the present cellular mobile communication systems, multiple antennas are being considered for applications at base station receivers with the aim to suppress cochannel interference and minimize the fading effects on the uplink. The size of base stations allows the deployment of receive diversity on the uplink. On the downlink, however, the limited size and power of the mobile stations make it more practical to consider transmit diversity. Transmit diversity decreases the required processing power of the receivers, resulting in a simpler system structure, lower power consumption and lower cost. Furthermore, transmit diversity can be combined with receive diversity to further improve the system performance and increase the spectral efficiency. Channel coding combined with spatial diversity is called space-time (ST) coding.

Code design criteria based on the rank and the determinant of the codeword distance matrix for trellis based ST codes were derived in [1, 2]. In this approach, multiple transmit antennas and error correction coding are combined with higher level modulation schemes. An ST encoder takes as input a block of $b$ binary data, and maps them into $n_T$ modulation symbols from a signal set of $2^b$ points. Each output modulation symbol feeds a separate transmit antenna. The symbols from $n_T$ antennas are transmitted simultaneously in one symbol interval. The scheme gives a maximum spectral efficiency of $b$ bits/s/Hz which is equal to the spectral efficiency of the reference uncoded systems.

The receiver uses a maximum likelihood decoding algorithm to recover the transmitted information. Space-time trellis coded modulation (STTC) can achieve a substantial improvement in performance, benefiting from both diversity and coding gains. However, when the number of transmit antennas gets larger, the complexity of the receiver structure and the code construction becomes prohibitive. In [1], feedforward STTCs with two transmit antennas were designed. In [3], a set of improved feedforward 4PSK STTCs relative to the codes in [1] were proposed.

Recently, a new set of design criteria for STTCs for slow and fast fading channels were proposed [4, 5]. These criteria are applicable to multiple-input multiple-output (MIMO) channels with a high diversity order. When the diversity order, defined as the product of the minimum rank of the distance matrices and the number of receive antennas, is small, the rank and the determinant criteria, proposed in [1], are valid. However, for high diversity orders (larger than 3), the minimum trace of the codeword distance matrix, or equivalently the minimum squared Euclidean distance, dominates the code performance and its minimum value should be maximized in code design. Motivated by this design criterion, in this paper we design recursive STTCs and demonstrate that they are superior to feedforward STTCs reported in [1, 3]. Furthermore, we construct an ST turbo trellis coded modulation (TCM) scheme with the new recursive STTCs as constituent codes. The recursive structure of the constituent codes enables the full benefit of interleaver gain and iterative decoding. The proposed ST turbo TCM scheme is based on a parallel concatenation of two constituent STTCs and alternate puncturing of parity symbols, analogous to a turbo TCM scheme reported in [6]. The ST turbo trellis encoder consists of two identical recursive STTCs linked by an interleaver and followed by an MPSK signal mapper. The iterative decoder operates on the constituent code trellis and generates soft symbol estimates by a log-MAP algorithm [7, 8].

Independent of the work of this paper, a similar design was done in [9], based on a recursive code obtained by converting the feedforward STTC reported in [1] into a recursive code. In [10], a turbo code is serially concatenated with a space-time block code. In [11], the concept of recursive STTCs is first suggested and the serial and parallel concatenation structures with recursive STTCs as component codes are proposed. In [10, 11], full diversity is guaranteed but full rate is not achieved. In [12], a novel serial concatenation of STTC with interleaver and rate 1 simple recursive inner code is proposed.

One of the key issues with turbo codes is decoding algorithm convergence. We discuss the decoder convergence of the proposed ST turbo TCM scheme and evaluate the decoding thresholds, expressed as the minimum $E_b/N_0$ ratio for which the code can converge. Furthermore, we estimate that the proposed ST turbo TCM codes are less than 3 dB away from the MIMO theoretical channel capacity limit [13].

2. STTC SYSTEM MODEL

The system under consideration employs a recursive STTC with $n_T$ transmit and $n_R$ receive antennas. While the transmitter has no knowledge about the channel, it is assumed that the receiver can recover the channel state information perfectly. Information bits are encoded into $n_T$ streams of MPSK symbols by the ST encoder. A space-time symbol $x_t$ at time $t$ consists of $n_T$ MPSK symbols, and can be written as $x_t = (x_t^1, x_t^2, \ldots, x_t^{n_T})$. At any given time $t$, an MPSK symbol $x_t^i$ is transmitted through the $i$th antenna, $i = 1, 2, \ldots, n_T$.

At the receiver, each antenna receives a noisy superposition of $n_T$ transmitted symbols which have been subjected to independent fading. After matched filtering, assuming ideal timing information, the received signal $r_t^j$ at the $j$th receive antenna at time $t$ can be expressed as

$$r_t^j = \sqrt{E_s}\,\sum_{i=1}^{n_T} h_{i,j}(t)\,x_t^i + n_t^j, \tag{1}$$

where $h_{i,j}(t)$ models the complex fading gain from transmit antenna $i$ to receive antenna $j$ at time $t$, $i = 1, 2, \ldots, n_T$, $j = 1, 2, \ldots, n_R$, and $E_s$ is the energy per symbol. On a fast fading channel, we assume that the fading coefficients change independently from symbol to symbol. On a slow fading channel, we assume that the fading coefficients remain the same over a frame and change independently from frame to frame. When the fading coefficients remain the same over more than one symbol but less than a frame, the channel undergoes a block fading. Regardless of the fade rate, the fading gains are modelled as independent samples of a complex Gaussian random variable with a zero mean and a variance of 0.5 per dimension. The noise $n_t^j$ at the $j$th receive antenna at time $t$ is modeled as an independent sample of a zero mean complex Gaussian random variable with a noise spectral density of $N_0$.

3. PERFORMANCE ANALYSIS AND CODE DESIGN CRITERIA

A memory-$\nu$ recursive MPSK STTC can be described in terms of its $2^\nu$-state trellis. At time $t = 0$ the trellis is at the zero state. Given a particular input, the state of the trellis at any given time is indicated by the content of the $\nu$ memory taps. At time $t$, there are $M$ branches leaving each state $s_t^i$, $i \in \{0, 1, \ldots, 2^\nu - 1\}$, each of which corresponds to an incoming input $j$, $j \in \{0, 1, \ldots, M - 1\}$, and is labeled with $n_T$ MPSK symbols. These MPSK symbols are the encoder output to be transmitted simultaneously through $n_T$ transmit antennas when the previous state is $s_t^i$ and the input is $j$. At the decoder, the received sequence is decoded using a maximum likelihood decoding algorithm based on the $M$-ary trellis.

Following the derivation in [5], consider an $n_T \times n_T$ codeword distance matrix $A(x, \hat{x}) = B(x, \hat{x}) \cdot B^H(x, \hat{x})$ between two codewords $x = (x_1, x_2, \ldots, x_t, \ldots, x_l)$ and $\hat{x} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_t, \ldots, \hat{x}_l)$ of length $l$. The matrix $B^H$ denotes the Hermitian of a matrix $B$, and $B(x, \hat{x})$ is a codeword difference matrix, defined as

$$B(x,\hat{x}) = \begin{bmatrix} x_1^1 - \hat{x}_1^1 & x_2^1 - \hat{x}_2^1 & \cdots & x_l^1 - \hat{x}_l^1 \\ x_1^2 - \hat{x}_1^2 & x_2^2 - \hat{x}_2^2 & \cdots & x_l^2 - \hat{x}_l^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{n_T} - \hat{x}_1^{n_T} & x_2^{n_T} - \hat{x}_2^{n_T} & \cdots & x_l^{n_T} - \hat{x}_l^{n_T} \end{bmatrix}. \tag{2}$$

For the purpose of our analysis, define $r$ as the minimum rank of the matrix $A(x, \hat{x})$ over all possible codeword pairs, and $\delta_H$, the minimum symbol Hamming distance, is defined as

$$\delta_H = \min_{x,\hat{x}} \left|\upsilon(x,\hat{x})\right|, \tag{3}$$

taken over all codeword pairs, where $\upsilon(x, \hat{x})$ denotes the set of time instances $t \in \{1, 2, \ldots, l\}$ such that $\|x_t - \hat{x}_t\| \neq 0$.

3.1. Performance on slow fading channels

The pairwise error probability $P(x, \hat{x})$ is the probability that the decoder selects as its estimate the sequence $\hat{x}$ when the transmitted sequence was in fact $x$. When $r \cdot n_R \geq 4$, on a slow fading channel, the pairwise error probability can be upper bounded as [5]

$$P(x,\hat{x}) \le \frac{1}{2}\exp\!\left(\frac{n_R\sum_{i=1}^{r}\lambda_i^2}{128\sigma^4} - \frac{n_R\sum_{i=1}^{r}\lambda_i}{8\sigma^2}\right) Q\!\left(\frac{\sqrt{n_R}\left(\sum_{i=1}^{r}\lambda_i^2 - 8\sigma^2\sum_{i=1}^{r}\lambda_i\right)}{8\sigma^2\sqrt{\sum_{i=1}^{r}\lambda_i^2}}\right), \tag{4}$$

where $\lambda_i$, $i = 1, 2, \ldots, r$, are nonzero eigenvalues of the matrix $A(x, \hat{x})$, $\sigma^2$ is the noise variance, and $Q(\cdot)$ is the complementary error function.

By using inequality $Q(x) \le (1/2)e^{-x^2/2}$ for $x \ge 0$, at high signal-to-noise ratios the upper bound in (4) can be further approximated as

$$P(x,\hat{x}) \le \exp\!\left(-\frac{n_R}{8\sigma^2}\sum_{i=1}^{r}\lambda_i\right). \tag{5}$$

From (5), it can be seen that, in order to minimize the error probability, the minimum sum of all eigenvalues of the matrix $A(x, \hat{x})$ among all codeword pairs should be maximized. For a square matrix, the sum of the eigenvalues is equal to the sum of all elements on the main diagonal which is called the trace of the matrix. The performance is dominated by the minimum trace which is equivalent to the minimum Euclidean distance over all codewords.

When $r \cdot n_R < 4$, however, the upper bound on the pairwise error probability at high signal-to-noise ratios can be expressed as

$$P(x,\hat{x}) \le \frac{\left(8\sigma^2\right)^{r n_R}}{\left(\prod_{i=1}^{r}\lambda_i\right)^{n_R}}, \tag{6}$$

which suggests that to achieve the best performance the minimum rank and the minimum product of all nonzero eigenvalues of $A(x, \hat{x})$ should be maximized. If full rank is achievable, it is equivalent to maximizing the minimum determinant of $A(x, \hat{x})$, as first proposed in [1].

3.2. Performance on fast fading channels

Provided that $\delta_H \cdot n_R \geq 4$ [5], the pairwise error probability on fast Rayleigh fading channels can be upper bounded by

$$P(x,\hat{x}) \le \frac{1}{2}\exp\!\left(\frac{n_R D^4}{128\sigma^4} - \frac{n_R d_E^2}{8\sigma^2}\right) Q\!\left(\frac{\sqrt{n_R}\left(D^4 - 8\sigma^2 d_E^2\right)}{8\sigma^2\sqrt{D^4}}\right), \tag{7}$$

where $d_E^2$ is the accumulated squared Euclidean distance between two space-time symbol sequences, given by

$$d_E^2 = \sum_{t\in\upsilon(x,\hat{x})} \left\|x_t - \hat{x}_t\right\|^2, \tag{8}$$

while $D^4$ is given by

$$D^4 = \sum_{t\in\upsilon(x,\hat{x})} \left\|x_t - \hat{x}_t\right\|^4. \tag{9}$$

By using an approximation of the $Q(\cdot)$ function, at high signal-to-noise ratios the upper bound in (7) can be further approximated as

$$P(x,\hat{x}) \le \exp\!\left(-\frac{n_R}{8\sigma^2}\,d_E^2\right) = \exp\!\left(-\frac{n_R}{8\sigma^2}\sum_{t=1}^{l}\sum_{i=1}^{n_T}\left|x_t^i - \hat{x}_t^i\right|^2\right). \tag{10}$$

From (10), we can conclude that the pairwise error probability is dominated by the squared Euclidean distance $d_E^2$.

When $\delta_H \cdot n_R < 4$, the upper bound on the pairwise error probability at high signal-to-noise ratios becomes

$$P(x,\hat{x}) \le \prod_{t\in\upsilon(x,\hat{x})} \left\|x_t - \hat{x}_t\right\|^{-2n_R}\left(\frac{1}{8\sigma^2}\right)^{-\delta_H\cdot n_R} = d_p^{-2n_R}\left(\frac{1}{8\sigma^2}\right)^{-\delta_H\cdot n_R}, \tag{11}$$

where $d_p^2$ is the product of the squared Euclidean distances between two space-time symbol sequences, given by

$$d_p^2 = \prod_{t\in\upsilon(x,\hat{x})} \left\|x_t - \hat{x}_t\right\|^2. \tag{12}$$

When $r \cdot n_R \geq 4$ and $\delta_H \cdot n_R \geq 4$, the design criteria for STTC on slow and fast fading channels are identical. The design criteria in this case can be formulated as

• Maximize the minimum Euclidean distance over all codewords.

Therefore, provided that $r \cdot n_R \geq 4$ and $\delta_H \cdot n_R \geq 4$, we can construct a set of recursive STTCs which best satisfy the design criterion and perform well on both types of fading channels, and can be directly used as constituent codes in a parallel concatenation structure.
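The distance measures used in Section 3 can be evaluated numerically for any given pair of codewords. The following is a minimal sketch, not the authors' software: it assumes unit-energy MPSK points $e^{j2\pi k/M}$, and the function names `mpsk`, `pairwise_metrics` and `trace_of_A` as well as the example codewords are hypothetical, chosen only for illustration. It computes $\delta_H$ as in (3), $d_E^2$ as in (8), $D^4$ as in (9) and $d_p^2$ as in (12) for one pair, and checks that the trace of $A(x, \hat{x}) = B B^H$ equals $d_E^2$.

```python
import cmath


def mpsk(k, M=4):
    """Unit-energy MPSK point for symbol index k (assumed mapping)."""
    return cmath.exp(2j * cmath.pi * k / M)


def pairwise_metrics(x, x_hat, M=4):
    """x, x_hat: lists of space-time symbols, each a tuple of n_T MPSK indices."""
    # Squared Euclidean distance ||x_t - x_hat_t||^2 at every time instant t.
    per_t = [
        sum(abs(mpsk(a, M) - mpsk(b, M)) ** 2 for a, b in zip(xt, xht))
        for xt, xht in zip(x, x_hat)
    ]
    # upsilon(x, x_hat): the time instants where the two symbols differ.
    nonzero = [d for d in per_t if d > 1e-12]
    d_p2 = 1.0
    for d in nonzero:
        d_p2 *= d
    return {
        "delta_H": len(nonzero),           # symbol Hamming distance of the pair
        "d_E2": sum(nonzero),              # eq. (8)
        "D4": sum(d * d for d in nonzero), # eq. (9)
        "d_p2": d_p2,                      # eq. (12)
    }


def trace_of_A(x, x_hat, M=4):
    """Trace of A = B B^H, with B the codeword difference matrix of eq. (2)."""
    n_T, l = len(x[0]), len(x)
    B = [[mpsk(x[t][i], M) - mpsk(x_hat[t][i], M) for t in range(l)]
         for i in range(n_T)]
    return sum((B[i][t] * B[i][t].conjugate()).real
               for i in range(n_T) for t in range(l))


# Hypothetical 4PSK example with two transmit antennas and l = 3.
x = [(0, 0), (0, 0), (0, 0)]
x_hat = [(0, 2), (1, 2), (0, 0)]
m = pairwise_metrics(x, x_hat)
print(m, trace_of_A(x, x_hat))  # d_E2 and the trace are both 10 (up to rounding)
```

A code search under the criterion above would evaluate `d_E2` over all error events of a candidate trellis and keep the candidate with the largest minimum; the sketch only covers the per-pair computation.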

Figure 1: Feedforward STTC encoder.

Figure 2: Recursive STTC encoder.

4. CONSTRUCTION OF RECURSIVE SPACE-TIME TRELLIS CODES

4.1. Code structure

In this section, the structure of systematic and nonsystematic recursive STTC is explained. A feedforward STTC encoder for 4PSK and two antennas with a memory order of $\nu = 2\nu_1$ is shown in Figure 1. If the sequence $c^1 = (c_0^1, c_1^1, c_2^1, \ldots, c_t^1, \ldots)$ is the binary input stream to the upper row of shift registers, in a polynomial form it can be represented as

$$c^1(D) = c_0^1 + c_1^1 D + c_2^1 D^2 + \cdots + c_t^1 D^t + \cdots. \tag{13}$$

Similarly, the binary input sequence

$$c^2 = \left(c_0^2, c_1^2, c_2^2, \ldots, c_t^2, \ldots\right) \tag{14}$$

to the lower row of shift registers can be written as

$$c^2(D) = c_0^2 + c_1^2 D + c_2^2 D^2 + \cdots + c_t^2 D^t + \cdots. \tag{15}$$

The feedforward generator polynomial for the upper row of shift registers and transmit antenna $i$, where $i \in \{1, 2\}$, can be written as

$$G_1^i(D) = a_0^i + a_1^i D + \cdots + a_{\nu_1}^i D^{\nu_1}. \tag{16}$$

Similarly, the feedforward generator polynomial for the lower row of shift registers and transmit antenna $i$, where $i \in \{1, 2\}$, can be written as

$$G_2^i(D) = b_0^i + b_1^i D + \cdots + b_{\nu_2}^i D^{\nu_2}. \tag{17}$$

The encoded symbol sequence transmitted from antenna $i$ is given by

$$s^i(D) = c^1(D)\,G_1^i(D) + c^2(D)\,G_2^i(D) \bmod 4. \tag{18}$$

Equivalently, the relationship in (18) can be written in the following form:

$$s^i(D) = \begin{bmatrix} c^1(D) & c^2(D) \end{bmatrix} \begin{bmatrix} G_1^i(D) \\ G_2^i(D) \end{bmatrix}. \tag{19}$$

The feedforward generator matrix from (19), in a polynomial form

$$G^i(D) = \begin{bmatrix} G_1^i(D) \\ G_2^i(D) \end{bmatrix}, \tag{20}$$

can be converted into an equivalent recursive matrix by dividing it by a binary polynomial $q(D)$ of a degree equal to or less than $\nu_1$. However, if $q(D)$ is chosen to be a primitive polynomial, the resulting recursive code should have a high minimum distance. The generator polynomial for antenna $i$ can be represented as

$$G^i(D) = \begin{bmatrix} \dfrac{G_1^i(D)}{q(D)} \\[2ex] \dfrac{G_2^i(D)}{q(D)} \end{bmatrix}, \tag{21}$$

where

$$q(D) = q_0 + q_1 D + q_2 D^2 + \cdots + q_{\nu_1} D^{\nu_1}. \tag{22}$$

A systematic recursive STTC can be obtained by setting

$$G^1(D) = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \tag{23}$$

which means that the output of the first antenna is obtained by directly mapping the input sequences $c^1$ and $c^2$ into a 4PSK sequence. A diagram of a recursive 4PSK STTC encoder with two transmit antennas is shown in Figure 2.

A recursive 8PSK STTC can be generated by a similar procedure by converting a feedforward 8PSK STTC generator matrix with polynomial entries into an equivalent recursive generator matrix with rational entries. The spectral efficiency in this case is 3 bits/s/Hz.
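The encoding rule of this section can be sketched in a few lines of code. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the function names `precode` and `encode_4psk` are hypothetical, the same feedback polynomial is applied to both input streams, and the feedback taps $q(D) = 1 + D$ in the usage lines are an arbitrary choice made for the example (the paper does not list the feedback polynomials). The coefficient pairs are those listed for $\nu = 2$ in Table 1. Dividing by $q(D)$ with $q_0 = 1$ is realized by feeding back past precoded bits modulo 2, after which the feedforward taps are applied modulo 4 as in (18).

```python
def precode(bits, q):
    """Divide a binary sequence by q(D) = q[0] + q[1] D + ... (q[0] must be 1).

    Implements c_hat[t] = c[t] + sum_{j>=1} q[j] * c_hat[t-j]  (mod 2).
    With q = (1,) there is no feedback and the input is returned unchanged.
    """
    out = []
    for t, c in enumerate(bits):
        fb = sum(q[j] * out[t - j] for j in range(1, len(q)) if t - j >= 0)
        out.append((c + fb) % 2)
    return out


def encode_4psk(c1, c2, a, b, q=(1,)):
    """4PSK STTC encoder for two transmit antennas.

    a[j] = (a_j^1, a_j^2) and b[j] = (b_j^1, b_j^2) are the tap coefficients
    for the upper and lower input streams; q holds the feedback taps
    (assumption: the same q is used for both streams).
    Returns a list of (s_t^1, s_t^2) symbol-index pairs in {0, 1, 2, 3}.
    """
    h1, h2 = precode(c1, q), precode(c2, q)
    symbols = []
    for t in range(len(h1)):
        pair = []
        for ant in range(2):
            acc = sum(h1[t - j] * a[j][ant] for j in range(len(a)) if t - j >= 0)
            acc += sum(h2[t - j] * b[j][ant] for j in range(len(b)) if t - j >= 0)
            pair.append(acc % 4)
        symbols.append(tuple(pair))
    return symbols


# Coefficients of the nu = 2 code in Table 1.
A = [(0, 2), (1, 2)]
B = [(2, 3), (2, 0)]

# Feedforward form: a single 1 on the upper input reads out the a-taps.
print(encode_4psk([1, 0, 0], [0, 0, 0], A, B))            # [(0, 2), (1, 2), (0, 0)]
# Recursive form with the illustrative feedback q(D) = 1 + D.
print(encode_4psk([1, 0, 0], [0, 0, 0], A, B, q=(1, 1)))  # [(0, 2), (1, 0), (1, 0)]
```

With feedback, a single nonzero input bit produces an output that does not return to zero, which is the property that the parallel concatenation later relies on for interleaver gain.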

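Table 1: Recursive 4PSK STTC for slow and fast fading channels, bandwidth efficiency 2 bits/s/Hz.

| $\nu$ | $(a_0^1, a_0^2)$ | $(a_1^1, a_1^2)$ | $(a_2^1, a_2^2)$ | $(b_0^1, b_0^2)$ | $(b_1^1, b_1^2)$ | $(b_2^1, b_2^2)$ | $(b_3^1, b_3^2)$ | $d_E^2$ |
|---|---|---|---|---|---|---|---|---|
| 2 | (0, 2) | (1, 2) | - | (2, 3) | (2, 0) | - | - | 10.0 |
| 3 | (2, 2) | (2, 1) | - | (2, 0) | (1, 2) | (0, 2) | - | 12.0 |
| 4 | (1, 2) | (1, 3) | (3, 2) | (2, 0) | (2, 2) | (2, 0) | - | 16.0 |
| 5 | (0, 2) | (2, 3) | (1, 2) | (2, 2) | (1, 2) | (2, 3) | (2, 0) | 16.0 |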

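Table 2: Recursive 8PSK STTC for slow and fast fading channels, bandwidth efficiency 3 bits/s/Hz.

| $\nu$ | $(a_0^1, a_0^2)$ | $(a_1^1, a_1^2)$ | $(b_0^1, b_0^2)$ | $(b_1^1, b_1^2)$ | $(b_2^1, b_2^2)$ | $(d_0^1, d_0^2)$ | $(d_1^1, d_1^2)$ | $(d_2^1, d_2^2)$ | $d_E^2$ | $d_E^2$ [1] |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | (2, 1) | (3, 4) | (4, 6) | (2, 0) | - | (0, 4) | (4, 0) | - | 7.172 | 4.0 |
| 4 | (2, 4) | (3, 7) | (4, 0) | (6, 6) | - | (7, 2) | (0, 7) | (4, 4) | 8.0 | 6.0 |
| 5 | (0, 4) | (4, 4) | (0, 2) | (2, 3) | (2, 2) | (3, 0) | (2, 2) | (3, 7) | 8.586 | 8.0 |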
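The non-integer distances in Table 2 follow from the 8PSK geometry: assuming unit-energy 8PSK points, every term of $d_E^2$ is a squared chord length $4\sin^2(\pi k/8)$, so $d_E^2$ is always a sum of the values 0.586, 2, 3.414 and 4. The short sketch below lists these values; the helper name `chord2` is hypothetical, and the two decompositions shown are only examples of sums that reproduce the tabulated numbers, not necessarily the error events that attain them.

```python
import math


def chord2(k, M=8):
    """Squared distance between unit-energy MPSK points whose indices differ by k."""
    return 4.0 * math.sin(math.pi * k / M) ** 2


# Squared chord lengths for index differences 0..4 of 8PSK.
print([round(chord2(k), 3) for k in range(5)])  # [0.0, 0.586, 2.0, 3.414, 4.0]

# Example sums of chord lengths that match the non-integer Table 2 entries.
print(round(chord2(4) + chord2(2) + 2 * chord2(1), 3))  # 7.172
print(round(2 * chord2(4) + chord2(1), 3))              # 8.586
```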

4.2. Algebraic structure of recursive space-time trellis codes

For a 4PSK recursive STTC, the output symbols $s_t^1$ and $s_t^2$ from Figure 2 can be expressed algebraically as

$$\begin{aligned} s_t^1 &= \sum_{j_1=0}^{\nu_1} \hat{c}_{t-j_1}^1 a_{j_1}^1 + \sum_{j_2=0}^{\nu_2} \hat{c}_{t-j_2}^2 b_{j_2}^1 \bmod 4, \\ s_t^2 &= \sum_{j_1=0}^{\nu_1} \hat{c}_{t-j_1}^1 a_{j_1}^2 + \sum_{j_2=0}^{\nu_2} \hat{c}_{t-j_2}^2 b_{j_2}^2 \bmod 4, \end{aligned} \tag{24}$$

where $\nu = \nu_1 + \nu_2$, $a_{j_i}^1, b_{j_i}^1, a_{j_i}^2, b_{j_i}^2 \in \{0, 1, 2, 3\}$ and $i \in \{1, 2\}$. The new variable $\hat{c}_t^i$ is defined as

$$\hat{c}_t^i = c_t^i + \sum_{j_i=1}^{\nu_i} q_{j_i}^i\,\hat{c}_{t-j_i}^i \bmod 2, \tag{25}$$

where $i \in \{0, 1\}$.

The encoder for an 8PSK recursive STTC is implemented as a feedforward shift register with a memory order of $\nu$. The encoder output can be expressed as

$$\begin{aligned} s_t^1 &= \sum_{j_1=0}^{\nu_1} \hat{c}_{t-j_1}^1 a_{j_1}^1 + \sum_{j_2=0}^{\nu_2} \hat{c}_{t-j_2}^2 b_{j_2}^1 + \sum_{j_3=0}^{\nu_3} \hat{c}_{t-j_3}^3 d_{j_3}^1 \bmod 8, \\ s_t^2 &= \sum_{j_1=0}^{\nu_1} \hat{c}_{t-j_1}^1 a_{j_1}^2 + \sum_{j_2=0}^{\nu_2} \hat{c}_{t-j_2}^2 b_{j_2}^2 + \sum_{j_3=0}^{\nu_3} \hat{c}_{t-j_3}^3 d_{j_3}^2 \bmod 8, \end{aligned} \tag{26}$$

where $\nu = \nu_1 + \nu_2 + \nu_3$, $a_{j_i}^1, b_{j_i}^1, d_{j_i}^1, a_{j_i}^2, b_{j_i}^2, d_{j_i}^2 \in \{0, 1, \ldots, 7\}$ and $i \in \{1, 2, 3\}$. The new variable $\hat{c}_t^i$ is defined as $\hat{c}_t^i = c_t^i + \sum_{j_i=1}^{\nu_i} q_{j_i}^i\,\hat{c}_{t-j_i}^i \bmod 2$.

4.3. A hybrid design of robust recursive STTC

In this section, we consider design of recursive STTCs which can deliver data transmission with bandwidth efficiency of 2 and 3 bits/s/Hz. Unlike previously reported feedforward STTC in [1, 3], these recursive codes can be used directly as constituent codes in ST turbo TCM schemes to deliver data transmission at the same rates but at much lower signal to noise ratios than the reference uncoded systems with the same spectral efficiency. In a cellular system, a lower transmission power means lower interference to neighboring cells, thus allowing a frequency band to be reused more frequently. Section 3 discusses the design criteria for recursive STTCs on slow and fast fading channels. In reality, however, the fade rate falls somewhere between these two extremes. Therefore, it would be desirable to obtain a set of codes which satisfy the design criteria for both extreme conditions. It is expected that such codes will perform well in a wide variety of fading conditions. In [1], such codes are termed smart and greedy space-time codes because the encoder does not need to know the channel but can take advantage of the benefits offered by both the multiple transmit/receive antennas and the possible temporal channel variations.

We have stated previously that when $r \cdot n_R \geq 4$ and $\delta_H \cdot n_R \geq 4$, the design criteria for recursive STTCs on slow and fast fading channels coincide. Under these conditions, the error probability is minimized when the minimum squared Euclidean distance, $d_E^2$, of the code is maximized. Therefore, with the code structure given in Section 4.1 and assuming that at least two receive antennas ($n_R \geq 2$) are available to the system, we find a set of coefficients $a_{j_k}^i, b_{j_k}^i$ for recursive 4PSK STTCs, and $a_{j_k}^i, b_{j_k}^i, d_{j_k}^i$ for recursive 8PSK STTCs for a given memory order which maximizes $d_E^2$.

Tables 1 and 2 list recursive 4PSK and 8PSK STTCs, respectively, with two transmit antennas which best satisfy the design criterion on slow and fast fading channels, provided that $n_R \geq 2$. Each code in both tables has the minimum rank $r = 2$ and the minimum symbol Hamming distance $\delta_H \geq 2$, satisfying the condition on the design criterion. These codes were obtained through an exhaustive computer search. These codes were initially constructed in a feedforward form in [4]. A further investigation shows that these codes maintain their superiority in terms of their squared Euclidean distance, and thus their performance, when they are converted into a feedback recursive form as discussed in Section 4.1. Both tables list the squared Euclidean distance of each code and that of its counterparts of the same memory order reported in [1, 3]. For any given memory order, the new recursive STTC has the largest $d_E^2$, indicative of a superior performance on slow and fast fading channels for a large product $r n_R$.

5. PERFORMANCE OF RECURSIVE STTC

In this section, we compare the performance of the new recursive STTCs with previously known feedforward STTCs on slow and fast fading channels. The performance is measured in terms of the frame error rate as a function of $E_b/N_0$, the ratio between the energy per information bit to the noise at each receive antenna. Each frame consists of 130 MPSK symbol transmissions from each transmit antenna. Figure 3 shows the performance of the new 8-state and 32-state 4PSK recursive STTCs in comparison with feedforward STTCs of the same memory order proposed in [1, 3] with four receive antennas on slow fading channels. The 8-state STTCs in [1, 3] achieve virtually the same performance, while the new 8-state recursive STTC offers a 0.5 dB gain over the other two STTCs at a frame error rate (FER) of $10^{-3}$. The new recursive 32-state STTC offers a 0.5 dB gain over the feedforward STTC in [1] at the same frame error rate.

Figure 3: Performance comparison of the 8-state and 32-state recursive 4PSK STTCs on slow fading channels, bandwidth efficiency 2 bits/s/Hz. (Frame error rate versus $E_b/N_0$ (dB); curves: 8-st STTC in [1], 8-st STTC in [2], new recursive 8-st STTC, 32-st STTC [1], new recursive 32-st STTC.)

Figure 4 shows the performance of the new 16 and 32-state 4PSK recursive STTCs with two receive antennas on fast fading channels. The performance curves show consistently lower error rates of the new recursive STTC over the feedforward STTC for the same memory order, previously proposed in [1, 3]. When FER $= 10^{-3}$ the new recursive 16-state 4PSK STTC offers a 2 dB and a 0.8 dB gain over feedforward STTCs in [1, 3], respectively. The new recursive 32-state 4PSK outperforms feedforward STTC of the same memory order in [1] by 0.5 dB at FER $= 10^{-3}$.

Figure 4: Performance comparison of the 16-state and 32-state 4PSK STTCs on a fast fading channel, bandwidth efficiency 2 bits/s/Hz. (Frame error rate versus $E_b/N_0$ (dB); curves: 16-st STTC in [1], 16-st STTC in [2], new recursive 16-st STTC, 32-st STTC [1], new recursive 32-st STTC.)

All figures we have shown in this section confirm that the new recursive STTC outperforms feedforward STTC of the same memory order previously proposed in [1, 3], both on slow and fast fading channels. Note however, that the recursive structure STTC by itself does not have any advantage over feedforward STTC. As stated in Section 4, recursive STTCs in Tables 1 and 2 were originally constructed in feedforward form. Figure 5 shows the performance of the 16-state 4PSK STTC in feedforward and recursive forms on quasi-static fading channels. The frame error rate performance of the code in both forms is identical.

6. SPACE-TIME TURBO TCM

Having designed and constructed a set of recursive STTCs with a superior performance on slow and fast fading channels, we would like to use them in a parallel concatenation to further reduce bit errors by taking advantage of interleaver gain and iterative decoding. Figure 6 shows the encoder structure of a ST turbo TCM with two transmit antennas, consisting of two recursive STTC encoders in the upper and lower branches, and linked by a pairwise interleaver and a symbol deinterleaver [6]. Each encoder operates on a message block of $L$ groups of $b$ information bits, where $L$ is the interleaver size. The message sequence $c$ is given by $c = (c_1, c_2, \ldots, c_t, \ldots, c_L)$, where $c_t$ is a group of $b$ information bits at time $t$, given by $c_t = (c_{t,0}, c_{t,1}, \ldots, c_{t,b-1})$.

The upper recursive STTC encoder in Figure 6 maps the
input sequence into two streams of $L$ MPSK symbols, $x_1^1$, $x_1^2$, where $x_1^i = (x_{1,1}^i, x_{1,2}^i, \ldots, x_{1,L}^i)$, $i \in \{1, 2\}$ and $M = 2^b$. Prior to encoding by the lower encoder, the information bits are interleaved by a pairwise symbol interleaver. The pairwise symbol interleaver operates on groups of $b$ bits instead of on single bits. The interleaver maps even positions to even positions, and odd ones to odd ones. The interleaver ensures that the ordering of $b$ information bits arriving at the interleaver at any time instant $t$ remains unchanged. The lower encoder also produces two streams of $L$ MPSK symbols. Each stream is deinterleaved, resulting in $x_2^1$ and $x_2^2$, where $x_2^i = (x_{2,1}^i, x_{2,2}^i, \ldots, x_{2,L}^i)$, $i \in \{1, 2\}$. Deinterleaving at this stage ensures that the $b$ information bits determining the output symbols of the upper and lower encoders at any given time instant are identical. Assuming that $L$ is even, the first stream of symbols generated by the upper and lower encoders, $x_1^1$ and $x_2^1$, are alternately punctured into $x^1 = (x_{1,1}^1, x_{2,2}^1, x_{1,3}^1, x_{2,4}^1, \ldots, x_{1,L-1}^1, x_{2,L}^1)$ and transmitted through the first transmit antenna. The second stream of symbols generated by the upper and lower encoders, $x_1^2$ and $x_2^2$, are alternately punctured into $x^2 = (x_{1,1}^2, x_{2,2}^2, x_{1,3}^2, x_{2,4}^2, \ldots, x_{1,L-1}^2, x_{2,L}^2)$ and transmitted through the second transmit antenna.

Figure 5: FER performance comparison between a recursive and feedforward 4PSK STTC on a slow fading channel. (Frame error rate versus $E_b/N_0$ (dB); curves: recursive STTC, feedforward STTC.)

Figure 6: ST turbo TCM encoder. (Block diagram: information source, two recursive STTC encoders over the $2^b$ set, symbol interleaver, symbol deinterleaver, and selector producing $x^1$ and $x^2$.)

7. DECODING ALGORITHM

The decoding process of ST turbo TCM is very similar to that of binary turbo codes except that the symbol probability is used as the extrinsic information instead of the bit probability. The MAP decoding algorithm for nonbinary trellises is called symbol-by-symbol MAP algorithm. Since the extrinsic information can become either too large or too small and cause computational overflows, a log-MAP algorithm is used instead of MAP. With a log-MAP decoder, the logarithm of probabilities is computed and passed to the next decoding stage.

The log-MAP decoder computes the log-likelihood ratio of each group of information bits $c_t = i$. The soft output $\Lambda(c_t = i)$ is given by

$$\Lambda(c_t = i) = \log\frac{\Pr\{c_t = i \mid r\}}{\Pr\{c_t = 0 \mid r\}} = \log\frac{\sum_{(l',l)\in B_t^i} \alpha_{t-1}(l')\,\gamma_t^i(l',l)\,\beta_t(l)}{\sum_{(l',l)\in B_t^0} \alpha_{t-1}(l')\,\gamma_t^0(l',l)\,\beta_t(l)}, \tag{27}$$

where $i$ is the set of the information groups, $i \in \{0, 1, 2, \ldots, 2^b - 1\}$, $r$ is the received sequence, and the probabilities $\alpha_t(l)$, $\beta_t(l)$, and $\gamma_t(l)$ can be computed as in the MAP algorithm [8]. The symbol $i$ with the largest log-likelihood ratio in (27) is chosen as the hard decision output.

Regardless of whether the component code is systematic or not, it is not possible to separate the systematic component from the nonsystematic one in the received signal. This is due to the fact that the received signal at a particular receive antenna contains a joint signal transmitted from all antennas. This prohibits the separation of the contribution from the first antenna, which, as we assume, transmits the systematic symbol, from the rest of it. Therefore, in contrast to binary turbo decoders, a soft output of turbo TCM decoders can only be split into two terms. They are the a priori information generated by the other decoder and the extrinsic information generated by all coded digits. The extrinsic information will be exchanged between the two component decoders.

It is worth noting that for symbol-by-symbol MAP decoding, each component decoder should avoid using the same information twice in each iteration. In turbo TCM, each decoder alternately receives the noisy output of its own encoder and that of the other encoder. That is, the coded symbols in every second received signal belong to the other encoder and need to be treated as punctured. For example, consider the first decoder. For every odd received signal, the decoding operation proceeds as for the binary turbo codes when the decoder receives the symbol generated by its own encoder. However, for every even received signal, the decoder receives the punctured symbol which is generated by the other encoder. The decoder in this case ignores this symbol by setting the branch transition metric to zero. The only input at this step in the trellis is the a priori component obtained from the other decoder.

8. ST TURBO TCM PERFORMANCE

This section evaluates the performance of ST turbo TCM scheme on fast and block fading channels. In each case, it is assumed that the receiver has two receive antennas. Figure 7 shows the FER performance comparison between the 16-state recursive 4PSK STTC in Table 1 and a 16-state 4PSK ST turbo TCM. The 16-state recursive 4PSK STTC is the constituent code in the ST turbo TCM configuration. The performance curves show that the ST turbo TCM configuration offers a tremendous improvement. At a frame error rate of $10^{-3}$, with ten iterations and an interleaver size of 1024, it achieves a gain of more than 7 dB relative to STTC. At the same frame error rate, it achieves more than 2 dB gain compared to ST turbo TCM with the constituent code of the same memory order, proposed in [9]. The bandwidth efficiency in all cases is 2 bits/s/Hz.

Figure 7: FER performance comparison between a 16-state 4PSK STTC and a 16-state 4PSK ST turbo TCM with interleaver size of 1024, bandwidth efficiency 2 bits/s/Hz on fast fading channels. (Frame error rate versus $E_b/N_0$ (dB); curves: 16-st STTC, L = 1024; 16-st turbo TCM in [9], L = 1024, 10 iterations; new 16-st ST turbo TCM, L = 1024, 10 iterations.)

Figure 8 shows the performance of the 4-state 4PSK ST turbo TCM on quasi-static fading channels. The number of iterations is 10 and the interleaver size is 130. The curves show that at FER $= 10^{-2}$ the ST turbo TCM offers 8.8 dB and 8.0 dB gain over the recursive STTC for the fading block size of 100 and 200, respectively.

Figure 8: FER performance of a 4-state 4PSK STTC and a 4-state ST turbo TCM, bandwidth efficiency 2 bits/s/Hz on quasi-static fading channels. (Frame error rate versus $E_b/N_0$ (dB); curves: 4-state 4PSK ST turbo TCM, 10 iterations; 4-state 4PSK STTC.)

Figure 9 shows the FER performance of the new 32-state 8PSK ST turbo TCM in comparison with that of the 32-state 8PSK STTC. In this case, with ten iterations the new 32-state 8PSK ST turbo TCM offers more than 7 dB gain at FER $= 10^{-3}$, compared to the 32-state recursive 8PSK STTC in Table 2. When the number of iterations is reduced from ten to six, the performance is degraded by about 0.3 dB.

Figure 9: FER performance comparison between a 32-state 8PSK STTC and a 32-state 8PSK ST turbo TCM with bandwidth efficiency 3 bits/s/Hz on fast fading channels. (Frame error rate versus $E_b/N_0$ (dB); curves: 32-st 8PSK STTC, L = 1024; new 32-st ST turbo TCM, L = 1024, 6 iterations; new 32-st ST turbo TCM, L = 1024, 10 iterations.)

Figure 10 shows the effects of increasing the number of transmit and receive antennas on the performance of the 16-state 4PSK ST turbo TCM on fast fading channels. Following an algebraic description of a recursive 4PSK STTC in Section 4.2, the constituent recursive 4PSK STTC with
three transmit antennas is given as

$$\begin{aligned} \left(x_t^1, x_t^2, x_t^3\right) = {} & (1, 2, 1)\cdot\hat{c}_t^0 + (1, 3, 2)\cdot\hat{c}_{t-1}^0 + (3, 2, 1)\cdot\hat{c}_{t-2}^0 \\ & + (2, 0, 2)\cdot\hat{c}_t^1 + (2, 2, 0)\cdot\hat{c}_{t-1}^1 + (2, 0, 2)\cdot\hat{c}_{t-2}^1 \bmod 4. \end{aligned} \tag{28}$$

Similarly, the constituent recursive 4PSK STTC with four transmit antennas is given as

$$\begin{aligned} \left(x_t^1, x_t^2, x_t^3, x_t^4\right) = {} & (1, 2, 1, 1)\cdot\hat{c}_t^0 + (1, 3, 2, 2)\cdot\hat{c}_{t-1}^0 + (3, 2, 1, 3)\cdot\hat{c}_{t-2}^0 \\ & + (2, 0, 2, 2)\cdot\hat{c}_t^1 + (2, 2, 0, 0)\cdot\hat{c}_{t-1}^1 + (2, 0, 2, 2)\cdot\hat{c}_{t-2}^1 \bmod 4, \end{aligned} \tag{29}$$

where in both cases $\hat{c}_t^k$, $k \in \{0, 1\}$, is defined in (25). The performance curves show that increasing the number of transmit antennas from two to three brings about 0.7 dB gain at FER $= 10^{-3}$, while increasing the number of transmit antennas from three to four results in a negligible gain. The incremental gain resulting from increasing the number of transmit antennas stays relatively the same when the number of receive antennas increases from two to four.

Figure 10: Effects of various numbers of transmit and receive antennas on the FER performance of the 16-state 4PSK ST turbo TCM, bandwidth efficiency 2 bits/s/Hz on fast fading channels. (Frame error rate versus $E_b/N_0$ (dB); curves: 2T, 2R; 3T, 2R; 4T, 2R; 2T, 4R; 3T, 4R; 4T, 4R.)

The performance curves of Figures 7, 8, and 9 suggest that the parallel concatenation of STTC outperforms recursive STTC scheme. One may argue, however, that the comparison is less than fair since ST turbo TCM can take advantage of interleaver gain and iterative decoding. Thus, a fairer comparison should consider the performance of ST turbo TCM with other known turbo TCM schemes such as that proposed by Robertson [6, 14]. Figure 11 shows the FER performance comparison between the new 16-state 4PSK ST turbo TCM and a 16-state turbo TCM scheme from [6]. Note that although the turbo TCM scheme uses only one transmit antenna while the ST turbo TCM scheme uses two, the total transmit power remains the same. Two receive antennas are used in both cases. With the same interleaver size of 1024 and ten iterations, the ST turbo TCM offers a 2.5 dB gain at a frame error rate of $10^{-2}$. The bandwidth efficiency in all cases is 2 bits/s/Hz. Note that to achieve the same bandwidth efficiency, the scheme by Robertson et al. has to use 8PSK signal set.

Figure 11: FER performance comparison between the 16-state 4PSK ST turbo TCM and a 16-state 8PSK turbo TCM, bandwidth efficiency 2 bits/s/Hz on fast fading channels. (Frame error rate versus $E_b/N_0$ (dB); curves: 16-st STTC, L = 1024; 16-st turbo TCM in [5], L = 1024, 10 iterations; new 16-st ST turbo TCM, L = 1024, 10 iterations.)

Figure 12 shows the performance of 4-state ST turbo TCM and 4-state STTC on quasi-static fading channels with two transmit and two receive antennas. At FER $= 10^{-3}$, the ST turbo TCM offers more than 1.5 dB improvement. The frame size is 130 symbols.

Figure 12: FER performance of a 4-state 4PSK STTC and a 4-state ST turbo TCM, bandwidth efficiency 2 bits/s/Hz on quasi-static fading channels. (Frame error rate versus $E_b/N_0$ (dB); curves: 4-state 4PSK ST turbo TCM, 10 iterations; 4-state 4PSK STTC.)

8.1. System capacity

Telatar investigated and derived the formula for the capacity of multiantenna Gaussian channels with or without fading in [13]. Assuming independent Rayleigh fading and independent noise at different receive antennas, the capacity of the channel with $n_T$ transmit and $n_R$ receive antennas under power constraint $P$ equals [13]

$$\int_0^\infty \log_2\!\left(1 + \frac{P}{\sigma^2 n_T}\,\lambda\right) \sum_{k=0}^{m-1} \frac{k!}{(k+n-m)!}\left[L_k^{n-m}(\lambda)\right]^2 \lambda^{n-m} e^{-\lambda}\,d\lambda, \tag{30}$$

where $\sigma^2$ is the noise variance per dimension, $m = \min\{n_R, n_T\}$, $n = \max\{n_R, n_T\}$, and $L_j^i$ are the associated Laguerre polynomials [15].

Using this formula, we plotted the theoretical capacity of a MIMO independent Rayleigh fading channel when $(n_T, n_R) = (2, 2)$ and $(2, 4)$. Figure 13 shows the spectral efficiency of STTC and ST turbo TCM with various constituent codes when the bit error rate (BER) is $10^{-5}$. They are compared with theoretical MIMO channel capacity, expressed in (30). The figure shows that the 16-state 4PSK ST turbo TCM, with an interleaver size of 1024 and 10 decoder iterations when $(n_T, n_R) = (2, 2)$ and $(2, 4)$, is 2.4 dB away from the channel capacity. The 32-state 8PSK ST turbo TCM with the same interleaver size and the same number of iterations is 1.65 dB away from the channel capacity. The 16-state 4PSK

Figure 13: The system capacity when $(n_T, n_R) = (2, 4)$ and $(2, 2)$. (Capacity versus $E_b/N_0$ (dB); curves: capacity with 2T, 2R and capacity with 2T, 4R; operating points: (2, 2) 8PSK ST turbo TCM, (2, 2) 4PSK ST turbo TCM, (2, 2) 4PSK STTC, (2, 4) 4PSK ST turbo TCM.)

Frame error rate −2 STTC when (nT ,nR) = (2, 2), on the other hand, is 8.86 dB 10 away from the channel capacity, or 6.46 dB worse than the 16-state 4PSK ST turbo TCM. Note that these capacity fig- ures are indicative of the performance on fast fading chan- nels. For slow fading channels, the outage probability calcu- 10−3 lations should be applied. −2.5 −2 −1.5 −1 −0.50 0.51 1.52 2.5 In Figure 14, the performance of 16-state ST turbo TCM SNR per receive antenna (dB) with four transmit and receive antennas on quasi-static Rayleigh fading channel is presented. For comparison, the 16-state ST turbo TCM, 4T, 4R outage probability for 2 bits/s/Hz, which is a lower bound Outage probability, 4T, 4R, 2 bits/s/Hz for FER on quasi-static fading channels, is also included. The = performance curves show that the 16-state ST turbo TCM is Figure 14: The outage capacity when (nT ,nR) (4, 4). 1.5 dB away from the outage capacity at the FER of 10−3. tion message is independent and identically Gaussian dis- 8.2. Decoder convergence 2 tributed with mean µi and variance σi at end of the ith it- We analyze the convergence of ST turbo TCM decoder by ap- eration. The mean and the variance at each iteration can be proximating the density functions of the extrinsic informa- determined through simulations. The SNRi of the extrinsic tion message as a Gaussian distribution, and calculating the information at the ith iteration is defined as mean and variance in the Gaussian density evolution. This µ2 technique was used to analyze turbo codes [16]andtoob- = i SNRi 2 . (31) tain an Eb/N0 threshold on low density parity check (LDPC) σi codes[17]. A threshold is the smallest Eb/N0 value beyond which an iterative decoder converges and the bit error rate For a parallel concatenation code, the decoder convergence goes to zero as the number of iterations increases. 
can be determined by plotting the output SNR versus the in- Assuming perfect interleaving, each extrinsic informa- put SNR of the first decoder and the input SNR versus the Space-Time Turbo Trellis Coded Modulation for Wireless Data Communications 469

5 Table 3: Eb/N0 thresholds of various 4PSK STTC constituent codes.

4 ν NewrecursiveSTTC RecursiveSTTCin[9] 3 −0.90 dB −0.65 dB 4 −0.80 dB −0.50 dB 3 50.55 dB 0.40 dB

SNR out 2 turbo TCM scheme with the new 16-state 4PSK STTC as the = − 1 constituent code when Eb/N0 0.5 dB. The figure shows a tunnel between the two curves through which the itera- tive decoding progresses. This figure suggests that the thresh- 0 00.511.52 2.53old is less than −0.5 dB. A further investigation shows that − SNR in the threshold for this code is 0.8 dB. This shows that ST turbo TCM with the new 16-state 4PSK STTC as the con- = − First decoder, Eb/N0 0.5dB stituent code is more optimized than that with the 16-state = − Second decoder, Eb/N0 0.5dB QPSK STTC in [9]astheconstituentcode,becauseitcon- verges more quickly at a lower operating E /N . Further- Figure 15: Convergence and iterative decoding threshold of ST b 0 more, Table 3 compares the thresholds between the turbo TCM decoder when recursive 16-state STTC in [9]isused Eb/N0 as the constituent code. new recursive 4PSK STTC with that proposed in [9] when being used as constituent codes in a parallel concatenation structure. The entries show that for a given memory order, 5 the new recursive STTC converges more rapidly. The entries of Table 3 suggests that increasing the memory order does not necessarily result in a lower threshold. A similar phe- 4 nomenon has been observed with binary turbo codes, for which a lower memory code has a lower threshold. This can 3 be explained as follows. Firstly, codes with larger memory have longer paths in the trellis and when the noise is large at low operating SNRs the decoder is more likely to diverge SNR out 2 as the number of iterations increases. Secondly, codes with larger memory have more nearest neighbor codewords, re- 1 sulting in larger error coefficients. Consequently, at low oper- ating SNRs, it is harder for the decoder to choose the correct 0 codeword. 00.511.52 2.53 SNR in 9. CONCLUSIONS = − First decoder, Eb/N0 0.5dB = − Second decoder, Eb/N0 0.5dB This paper considers the design of a space-time turbo trel- lis coded modulation scheme. 
The structure of recursive Figure 16: Convergence of ST turbo TCM decoder when the new STTC is presented and new recursive STTCs which best sat- recursive16-state4PSKSTTCisusedastheconstituentcode. isfy the design criterion on slow and fast fading channels are proposed. These recursive STTCs outperform previously known feedforward STTC. Moreover, they can be used di- output SNR of the second decoder. If the two curves intersect rectly as constituent codes in a parallel concatenation struc- with each other, the decoder does not converge. The thresh- ture, benefiting from interleaver gain and iterative decod- old is the value of Eb/N0 at which the two curves just touch. ing. This structure offers significant performance improve- ment compared to the traditional STTC scheme on fast and Figure 15 shows the input/output SNR curves of ST block fading channels. The new ST turbo TCM is less sen- turbo TCM scheme with a 16-state 4PSK STTC in [9] as the sitive to any change in the fading rate compared to previ- constituent code. Note that SNR in denotes the SNR of the ously known codes, and falls within 3 dB from the theoretical extrinsic information at the input of a decoder, and SNR out MIMO channel capacity. denotes the SNR of the extrinsic information at the output of a decoder. The curves were generated when E /N = −0.5dB. b 0 ACKNOWLEDGMENT The figure shows the two curves just touch. This implies that the threshold is −0.5dB. The authors wish to thank Nortel Networks for its sponsor- Figure 16 shows the input/output SNR curves of ST ship for this study. 470 EURASIP Journal on Applied Signal Processing
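As a numerical companion to the capacity expression (30) of Section 8.1, the following Python sketch evaluates the integral with Simpson's rule. It is only an illustration under stated assumptions: the function names, the truncation of the integration range, and the reading of `snr_db` as the ratio P/σ² in decibels are choices made here and are not taken from the paper. In particular, it does not attempt to reproduce the Eb/N0 axis of Figure 13.

```python
import math


def assoc_laguerre(k, alpha, x):
    """Associated Laguerre polynomial L_k^alpha(x) via the three-term recurrence."""
    if k == 0:
        return 1.0
    prev, curr = 1.0, 1.0 + alpha - x
    for j in range(1, k):
        # (j + 1) L_{j+1} = (2j + 1 + alpha - x) L_j - (j + alpha) L_{j-1}
        prev, curr = curr, ((2 * j + 1 + alpha - x) * curr - (j + alpha) * prev) / (j + 1)
    return curr


def eigen_density(lam, n_t, n_r):
    """Sum over k in (30), times lam^(n-m) * exp(-lam)."""
    m, n = min(n_t, n_r), max(n_t, n_r)
    a = n - m
    total = 0.0
    for k in range(m):
        coeff = math.factorial(k) / math.factorial(k + a)
        total += coeff * assoc_laguerre(k, a, lam) ** 2
    return total * lam ** a * math.exp(-lam)


def ergodic_capacity(snr_db, n_t, n_r, upper=60.0, steps=20000):
    """Numerically evaluate (30) in bits/s/Hz.

    Assumption: snr_db stands for 10*log10(P / sigma^2). The infinite range
    is truncated at `upper`; `steps` must be even for Simpson's rule.
    """
    rho = 10.0 ** (snr_db / 10.0)
    h = upper / steps

    def integrand(lam):
        return math.log2(1.0 + rho * lam / n_t) * eigen_density(lam, n_t, n_r)

    acc = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        acc += (4 if i % 2 else 2) * integrand(i * h)
    return acc * h / 3.0


if __name__ == "__main__":
    for antennas in [(2, 2), (2, 4)]:
        print(antennas, round(ergodic_capacity(0.0, *antennas), 3))
```

For a single transmit and a single receive antenna the sum in (30) reduces to $e^{-\lambda}$, which gives a simple sanity check of the numerical integration against the well-known single-antenna Rayleigh fading capacity.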

REFERENCES

[1] V. Tarokh, N. Seshadri, and A. Calderbank, “Space-time codes for high data rate wireless communication: performance criterion and code construction,” IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744–765, 1998.
[2] J. C. Guey, M. P. Fitz, M. R. Bell, and W.-Y. Kuo, “Signal design for transmitter diversity wireless communication systems over Rayleigh fading channels,” in Proc. IEEE Vehicular Technology Conference, pp. 136–140, Atlanta, Ga, USA, April 1996.
[3] S. Baro, G. Bauch, and A. Hansmann, “Improved codes for space-time trellis-coded modulation,” IEEE Communications Letters, vol. 4, no. 1, pp. 20–22, 2000.
[4] Z. Chen, J. Yuan, and B. Vucetic, “Improved space-time trellis coded modulation scheme on slow Rayleigh fading channels,” IEE Electronics Letters, vol. 37, no. 7, pp. 440–441, 2001.
[5] J. Yuan, B. Vucetic, Z. Chen, and W. Firmanto, “Performance of space-time coding on fading channels,” in Proc. of Intl. Symposium on Inform. Theory, Washington D.C., USA, June 2001.
[6] P. Robertson and T. Worz, “Bandwidth-efficient turbo trellis coded modulation using punctured component codes,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 2, pp. 206–218, 1998.
[7] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 284–287, 1974.
[8] B. Vucetic and J. Yuan, Turbo Codes Principles and Applications, Kluwer Academic, Boston, Mass, USA, 2000.
[9] D. Tujkovic, “Recursive space-time trellis codes for turbo coded modulation,” in Proc. IEEE Global Telecommunications Conference, vol. 2, pp. 1010–1015, San Francisco, Calif, USA, 27 November–1 December 2000.
[10] G. Bauch, “Concatenation of space-time block codes and “turbo” TCM,” in Proc. IEEE International Conference on Communications, vol. 2, pp. 1202–1206, Vancouver, British Columbia, Canada, June 1999.
[11] K. R. Narayanan, “Turbo decoding of concatenated space-time codes,” in Proc. 37th Annual Allerton Conference on Communication, Control, and Computing, pp. 899–900, Allerton, Ill, USA, September 1999.
[12] X. Lin and R. S. Blum, “Improved space-time codes using serial concatenation,” IEEE Communications Letters, vol. 4, no. 7, pp. 221–223, 2000.
[13] E. Telatar, “Capacity of multi-antenna Gaussian channels,” The European Transactions on Telecommunications, vol. 10, no. 6, pp. 585–595, 1999.
[14] P. Robertson and T. Worz, “Coded modulation scheme employing turbo codes,” IEE Electronics Letters, vol. 31, no. 18, pp. 1546–1547, 1995.
[15] I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, Academic Press, San Diego, Calif, USA, 5th edition, 1994.
[16] D. Divsalar, S. Dolinar, and F. Pollara, “Low complexity turbo-like codes,” in Proc. 2nd Int. Symp. on Turbo Codes and Related Topics, pp. 73–80, Brest, France, September 2000.
[17] S.-Y. Chung, T. Richardson, and R. Urbanke, “Analysis of sum-product decoding of low-density-parity-check codes using Gaussian approximation,” IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 657–670, 2001.

Welly Firmanto received the B.S. of electrical engineering in 1994 from Purdue University, West Lafayette, USA and the Master's of engineering in 1996 from Carleton University, Ottawa, Canada. Between 1996 and 1999 he was a research engineer at PT. Industri Telekomunikasi Indonesia, Bandung, Indonesia. He is currently working towards the Ph.D. degree at the University of Sydney, Sydney, Australia. His research interests include wireless communications, digital communications and error control coding.

Branka Vucetic received the B.S. of electrical engineering, M.S., and Ph.D. degrees in 1972, 1978, and 1982, respectively, from the University of Belgrade, Belgrade. She is the Director of Telecommunications Laboratory and Professor of Telecommunications at the University of Sydney, Sydney, Australia. Her research interests include wireless communications, digital communication theory, coding, and multiuser detection.

Jinhong Yuan received the B.S. and Ph.D. degrees in electrical engineering from the Beijing Institute of Technology, China, in 1991 and 1996, respectively. In 1997, he joined the School of Electrical and Information Engineering, University of Sydney, Australia, as a Research Fellow. He is currently with the School of Electrical Engineering and Telecommunications, University of New South Wales, Australia. His research interests include wireless communications, communication theory, error control coding, and digital modulation.

Zhuo Chen was born in Sichuan, China, on March 3, 1977. He received the B.S. of electrical engineering from Shanghai Jiao Tong University, China, in 1997 and the M.S. degree from the University of Sydney, Australia. He is currently pursuing the Ph.D. degree at the University of Sydney. His research interests include space-time coding and communications theory.

EURASIP Journal on Applied Signal Processing 2002:5, 471–481
© 2002 Hindawi Publishing Corporation

On Some Design Issues of Space-Time Coded Multi-Antenna Systems

Hsuan-Jung Su Bell Laboratories, Lucent Technologies, Holmdel, NJ 07733, USA Email: [email protected]

Evaggelos Geraniotis Department of Electrical and Computer Engineering, University of Maryland at College Park, College Park, MD 20742, USA Email: [email protected]

Received 30 May 2001 and in revised form 30 January 2002

This paper concerns some design issues and tradeoffs of communication systems equipped with multiple transmit and receive antennas. The general space-time coding/modulation structure by Tarokh et al. (1999) is considered. Several design issues are investigated for this structure. The layered space-time architecture by Foschini (1996) is revisited as a special case of the general structure. It is also used to demonstrate the design and complexity tradeoffs of the system. Through intuitive and analytical explanations, as well as simulations, the design considerations for these space-time transmission structures and their contributions to the performance are shown.

Keywords and phrases: space-time codes, array processing, iterative processing.

1. INTRODUCTION

The growing demand of high data rate service has inspired studies on multi-antenna wireless communication systems in recent years (e.g., [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]). From an information theoretic point of view, multiple antennas can increase the capacity of wireless channels [3, 4]. In practice, this increased capacity amounts to improved error performance when the communication data rate is fixed.

When there are not enough antennas at the receiver to resolve the transmission signal spaces, joint detection of the signals transmitted from different antennas may be necessary at the receiver. In this scenario, proper coordination of the signals transmitted on different antennas can help improve the performance without additional cost. The space-time coding techniques in [8, 9, 10, 11], which integrate channel coding, modulation and antenna diversity to allow simultaneous exploitation of coding gain (temporal diversity) and antenna diversity, are examples in this category. The drawback of such signal coordination is intractable joint detection complexity when the number of transmit antennas is large, unless certain throughput-inefficient signal coordinations are introduced [11].

If the number of receive antennas is large enough to resolve the transmission signal spaces, the signals from individual transmit antennas can be detected separately after proper signal space separation. Signal coordination is not necessary in this case, and the throughput can be kept high. The layered space-time (LST) architecture in [2] is an attempt to realize the information theoretic result in [3, 4], namely, that the multi-antenna channel capacity grows linearly with the smaller of the number of transmit antennas and the number of receive antennas. The LST uses a one-dimensional signal per data layer. Its focus is more on the signal space separation at the receiver, while the channel coding method is, contrary to the space-time coding techniques, conventional.

In order to attain a compromise between throughput and performance when the receiver has a certain number of antennas, the authors of [8] later on extended their work to consider grouped space-time transmission (GST) [1]. In this system, the space-time coding advantages are explored by the antennas within the same group, while the system throughput can be kept high by transmitting the groups independently. As in the LST, signal processing at the receiver is required in order to separate the signal groups. A zero-forcing method was employed in [1] for signal space separation.

The GST design possesses the fundamental ingredients of a multi-antenna system: space-time coding, data allocation, and antenna array processing. It therefore can be seen as a general space-time transmission structure, with the LST being a special case which has one antenna per group. In this paper, we investigate the system properties and design issues of this general structure. Some design considerations and tradeoffs are revealed through intuitive, as well as analytical discussions. The issues discussed are then verified with simulations. Note that the issues considered in this paper pertain to an “open-loop” system, meaning there is no channel information feedback from the receiver to the transmitter.

This paper is organized as follows. In Section 2, the system model of the GST is given. We then discuss the antenna array processing issues in Section 3, which is followed by the power allocation issue in Section 4. In Section 5, spatial interleaving is discussed in the context of diversity advantage. Section 6 revisits the LST in terms of diversity and coding gain. In Section 7, we describe the application of the iterative minimum-mean-squared-error (MMSE) multiuser detection algorithm [13, 14] to the GST. Section 8 then verifies and concludes the design issues discussed with comprehensive simulation results. Finally, some concluding remarks are given in Section 9.

2. SYSTEM MODEL

In the remainder of this paper, the superscripts $*$, $T$, and $H$ are used to denote complex conjugate, matrix transposition, and Hermitian transposition, respectively. The communication system considered consists of a transmitter which is equipped with $n$ antennas and a receiver which has $m$ antennas. At the transmitter, data is encoded by the channel encoder; the encoded symbols are then divided into $n$ streams which will be transmitted simultaneously using $n$ distinct antennas after modulation.

The signal at each receive antenna is a noisy superposition of the $n$ transmitted signals corrupted by the fading channel which is assumed to be flat, Rayleigh, and spatially independent, meaning fading is statistically independent from one transmit-receive antenna pair to another. Following the convention of [1, 2], we assume that fading remains constant within a data frame and varies from one frame to another. This assumption amounts to the worst case situation where temporal diversity is not available, and can serve as an upper bound on the error probability. For this system, the signal $r_t^j$ received by antenna $j$ ($1 \le j \le m$) at time $t$ is given by

\[
r_t^j = \sum_{i=1}^{n} \alpha_{i,j}\, c_t^i \sqrt{E_i} + \eta_t^j, \tag{1}
\]

where $i$ is the transmit antenna index, $E_i$ is the symbol energy of the $i$th transmit antenna, $c_t^i$ is the symbol transmitted from antenna $i$ at time $t$, and $\alpha_{i,j}$ is the path gain from the $i$th transmit antenna to the $j$th receive antenna. The noise $\eta_t^j$ is assumed to be additive white Gaussian (AWGN) with variance $N_0/2$ for both real and imaginary components.

For notational simplicity, we redefine $\alpha_{i,j}$ as the product of the channel gain $\alpha_{i,j}$ and the symbol amplitude $\sqrt{E_i}$ in (1). This change in notation does not affect the zero-forcing array processing, while it allows more concise presentation of our alternative approach. With this new notation, we now rewrite (1) in the vector form

\[
\mathbf{r}_t = \Omega\, \mathbf{c}_t + \boldsymbol{\eta}_t, \tag{2}
\]

where

\[
\mathbf{c}_t = \left(c_t^1, c_t^2, \ldots, c_t^n\right)^T, \quad
\mathbf{r}_t = \left(r_t^1, r_t^2, \ldots, r_t^m\right)^T, \quad
\boldsymbol{\eta}_t = \left(\eta_t^1, \eta_t^2, \ldots, \eta_t^m\right)^T, \quad
\Omega = \begin{bmatrix}
\alpha_{1,1} & \alpha_{2,1} & \cdots & \alpha_{n,1} \\
\alpha_{1,2} & \alpha_{2,2} & \cdots & \alpha_{n,2} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{1,m} & \alpha_{2,m} & \cdots & \alpha_{n,m}
\end{bmatrix}. \tag{3}
\]

In the GST, the data to be transmitted are divided into $q$ groups, each space-time coded and transmitted with one group of antennas. The antenna groups are disjoint. The purpose of space-time coding is to ensure maximal spatial diversity, hence the steepest asymptotic slope of the performance curve [8]. Two 2-antenna space-time trellis code examples are shown in Figure 1. They both use the same trellis diagram but different input-output associations. The code to the right (Tarokh et al.) was proposed in [8] (see footnote 1), while the other (Code A) is a slight alteration of it.

Footnote 1: The Tarokh et al. code in Figure 1 was generated with the encoder polynomials in [8, page 757]. It differs from [8, Figure 6], which has a few typos and results in slightly worse performance.

At the receiver, the signal groups are successively decoded. Once a group is decoded, its contribution to the received signal is removed before decoding the next group. Let $n_j$ denote the number of transmit antennas in group $j$, $1 \le j \le q$; we have $n_1 + n_2 + \cdots + n_q = n$. In decoding a specific group, the uncanceled groups are considered as interferences, therefore antenna array processing is necessary to suppress them. The array processing issues are discussed in Section 3.

3. ARRAY PROCESSING

In [1], a zero-forcing approach was used for array processing. In this section, we recapitulate this approach first. Then we propose an alternative approach which can improve the performance.

3.1. Zero-forcing array processing

Without loss of generality, we consider the decoding of the first group code denoted by $\mathcal{C}_1$ and assume that the channel gain matrix $\Omega$ is known to the receiver. Assuming that $m \ge n - n_1 + 1$, the zero-forcing group interference suppression method proposed in [1] uses the null space of the

interference space spanned by the columns of

\[
\Lambda(\mathcal{C}_1) = \begin{bmatrix}
\alpha_{n_1+1,1} & \alpha_{n_1+2,1} & \cdots & \alpha_{n,1} \\
\alpha_{n_1+1,2} & \alpha_{n_1+2,2} & \cdots & \alpha_{n,2} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{n_1+1,m} & \alpha_{n_1+2,m} & \cdots & \alpha_{n,m}
\end{bmatrix} \tag{4}
\]

as the array processor to remove completely the signals from other groups, however, at the expense of sacrificing some energy of the desired signals. Let

\[
\Theta(\mathcal{C}_1) = \begin{bmatrix} \upsilon_1 & \upsilon_2 & \cdots & \upsilon_{m-n+n_1} \end{bmatrix} \tag{5}
\]

be the array processor, where $\upsilon_j$, $j = 1, \ldots, m-n+n_1$, form the null space of $\Lambda(\mathcal{C}_1)$; the received signal is processed with $\Theta(\mathcal{C}_1)$. Letting

\[
\Omega(\mathcal{C}_1) = \begin{bmatrix}
\alpha_{1,1} & \alpha_{2,1} & \cdots & \alpha_{n_1,1} \\
\alpha_{1,2} & \alpha_{2,2} & \cdots & \alpha_{n_1,2} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{1,m} & \alpha_{2,m} & \cdots & \alpha_{n_1,m}
\end{bmatrix}, \tag{6}
\]

\[
\tilde{\mathbf{r}}_t = \Theta^H(\mathcal{C}_1)\, \mathbf{r}_t, \tag{7}
\]

\[
\tilde{\Omega} = \Theta^H(\mathcal{C}_1)\, \Omega(\mathcal{C}_1), \tag{8}
\]

\[
\tilde{\boldsymbol{\eta}}_t = \Theta^H(\mathcal{C}_1)\, \boldsymbol{\eta}_t, \tag{9}
\]

we arrive at

\[
\tilde{\mathbf{r}}_t = \tilde{\Omega}\, \mathbf{c}_t^1 + \tilde{\boldsymbol{\eta}}_t, \tag{10}
\]

where $\mathbf{c}_t^1 = (c_t^1, c_t^2, \ldots, c_t^{n_1})^T$. The decoder then decodes $\mathcal{C}_1$ by choosing the hypothesis $\hat{\mathbf{c}}_t^1$ which minimizes the metric

\[
\sum_t \left\| \tilde{\mathbf{r}}_t - \tilde{\Omega}\, \hat{\mathbf{c}}_t^1 \right\|^2. \tag{11}
\]

Figure 1: Two rate 1/2, 32-state, 4PSK space-time codes. (Trellis diagram with a 4PSK constellation labeled 0, 1, 2, 3; the branch labels of the two codes, listed row by row as they appear in the figure, are as follows.)

Code A          Tarokh et al.
00,21,02,23     00,01,02,03
11,32,13,30     11,12,13,10
22,03,20,01     22,23,20,21
33,10,31,12     33,30,31,32
20,01,22,03     20,21,22,23
31,12,33,10     31,32,33,30
02,23,00,21     02,03,00,01
13,30,11,32     13,10,11,12
33,10,31,12     33,30,31,32
00,21,02,23     00,01,02,03
11,32,13,30     11,12,13,10
22,03,20,01     22,23,20,21
13,30,11,32     13,10,11,12
20,01,22,03     20,21,22,23
31,12,33,10     31,32,33,30
02,23,00,21     02,03,00,01
22,03,20,01     22,23,20,21
33,10,31,12     33,30,31,32
00,21,02,23     00,01,02,03
11,32,13,30     11,12,13,10
02,23,00,21     02,03,00,01
13,30,11,32     13,10,11,12
20,01,22,03     20,21,22,23
31,12,33,10     31,32,33,30
11,32,13,30     11,12,13,10
22,03,20,01     22,23,20,21
33,10,31,12     33,30,31,32
00,21,02,23     00,01,02,03
31,12,33,10     31,32,33,30
02,23,00,21     02,03,00,01
13,30,11,32     13,10,11,12
20,01,22,03     20,21,22,23

3.2. Maximum signal-to-noise ratio (SNR) array processing

The performance of the array processor can be improved if the balance between interference suppression and noise enhancement can be found. We derive in this section the maximum SNR array processor. For a given time $t$, since the space-time coded symbols are unknown at the array processing stage, the SNR can only be maximized statistically. For this purpose, the signal and interference covariance matrices are defined, considering decoding the first group code, as

\[
R_s = \Omega(\mathcal{C}_1)\, E\!\left[\mathbf{c}_t^1 (\mathbf{c}_t^1)^H\right] \Omega^H(\mathcal{C}_1), \tag{12}
\]

\[
R_n = \Lambda(\mathcal{C}_1)\, E\!\left[\mathbf{c}_t^o (\mathbf{c}_t^o)^H\right] \Lambda^H(\mathcal{C}_1) + N_0 I, \tag{13}
\]

respectively, with

\[
\mathbf{c}_t^1 = \left(c_t^1, c_t^2, \ldots, c_t^{n_1}\right)^T, \quad
\mathbf{c}_t^o = \left(c_t^{n_1+1}, c_t^{n_1+2}, \ldots, c_t^{n}\right)^T. \tag{14}
\]

Using the canonical representation of the MMSE filter in [15], the maximum SNR array processor will require the same number of linear filters as the dimension of the signal space in order to collect all energy of the desired signal. Each of these linear filters can be decomposed into two components: one in the signal space and the other in the orthogonal space. The maximum SNR array processor is obtained using the following theorem.

Theorem 1. Let $m$ be the number of receive antennas, $n_1$ be the number of transmit antennas of the space-time code group in consideration; let $R_s$ and $R_n$ be the signal and interference covariance matrices, respectively, defined by (12) and (13). The linear maximum SNR array processor for this group, denoted by

\[
\Theta_M(\mathcal{C}_1) = \begin{bmatrix} \mathbf{w}_1 & \mathbf{w}_2 & \cdots & \mathbf{w}_k \end{bmatrix}, \tag{15}
\]

consists of a set of $k$ linearly independent eigenvectors corresponding to the nonzero eigenvalues, counting multiplicities, of the generalized eigenvalue problem

\[
R_s \mathbf{w} = \lambda R_n \mathbf{w}, \tag{16}
\]

where $k \le \min(m, n_1)$ is the rank of $R_s$. The filtering outputs of these eigenvectors $\{\mathbf{w}_i^H \mathbf{r}_t\}_{i=1}^{k}$ are uncorrelated with one another.

Proof. See the appendix.

This theorem generalizes the MMSE filtering for multi-dimensional signals. When $n_1 = 1$ (single antenna transmission) or $c_t^1 = c_t^2 = \cdots = c_t^{n_1}$ (repetitive transmission), the maximum SNR array processor becomes the conventional (one-dimensional) MMSE filter. Intuitively, the maximum SNR array processor prewhitens the interference and orthogonalizes the desired signal. Maximum ratio combining of the maximum SNR array processor outputs is done at the decoder by substituting $\Theta(\mathcal{C}_1)$ in (7), (8), (9), (10), (11) with $\Theta_M(\mathcal{C}_1)$. Using Gaussian approximation of the array processor outputs [16], the decoding becomes maximum likelihood. Compared to the zero-forcing approach, the decoder now only needs to process at most $\min(m, n_i)$ samples per trellis branch instead of $(m - n + \sum_{j=1}^{i} n_j)$ [1] when decoding $\mathcal{C}_i$. Unlike the zero-forcing method, the maximum SNR approach does not impose any requirement on the number of receive antennas. It works even when $(m - n + \sum_{j=1}^{i} n_j) \le 0$. The performance under this situation, however, will suffer.

The principal ratio combining (PRC) method proposed in [17] is also a subset of the maximum SNR method. The PRC filter is the eigenvector of (16) corresponding to the largest eigenvalue. In practice, when the number of filters is limited by the equipment cost, the best possible performance can be achieved by using the eigenvectors of (16) corresponding to the largest eigenvalues (SNRs).

Given similar optimization procedures and the existence of local optima, the adaptive algorithm in [15] can be applied to find the maximum SNR filters when only the channel gains of the desired signal (i.e., $\Omega(\mathcal{C}_1)$) are known.

4. POWER ALLOCATION

As shown in [1], the diversity gains of the zero-forcing approach at different decoding stages can be expressed as an increasing sequence $\{n_i (m - n + \sum_{j=1}^{i} n_j)\}_{i=1}^{q}$. The authors of [1] thus proposed to allocate power among the antenna groups according to a geometrically decreasing sequence.

In the case of maximum SNR filtering, the optimization of power allocation is extremely difficult due to the fact that the filter outputs contain interference from other groups, and the contribution of this interference depends on the operating point (in terms of channel gains and AWGN power spectral density) of the filter. By fixing the overall power consumption, we nevertheless can see that giving one group more power by depriving the power of the others increases the SNR two-fold: the signal power is increased and the interference power is decreased. We therefore argue that the power allocation pattern does not need to decrease so rapidly.

This conclusion can also be reached from a different perspective. From the discussion in Section 3, we can see that the desired signal energy is completely collected by the maximum SNR array processor, while this is not true in the zero-forcing case. The receive diversity gain of the maximum SNR method is therefore larger than that of the zero-forcing method. This diversity advantage, however, decreases as the transmission groups are decoded and canceled. For the last decoded group, both zero-forcing and maximum SNR methods give the same diversity. With this observation, we again can conclude that the power allocation pattern of the maximum SNR approach does not need to decrease so rapidly as that of the zero-forcing approach. In the simulation, we will compare geometric power allocation with arithmetic power allocation.

5. SPATIAL INTERLEAVING

The focus so far has been on maximizing the transmit and receive diversities within a group. When the number of transmit antennas is larger than required for one group (i.e., more than one group is allowed), it is possible to increase the transmit diversity beyond what is provided by space-time coding, if spatial interleaving is used. In other words, it might be beneficial if the association between data groups and transmit antenna groups varies with time. One must bear in mind that, although the concept of improving the diversity gain by spatial interleaving is very intuitive and straightforward, inappropriate spatial interleaving can decrease instead of increasing the diversity.

We consider the case where the total number of transmit antennas is divisible by the number of code streams per group. A “group-based” spatial interleaving is realized by dividing the transmit antennas into $q$ disjoint groups; then at every symbol duration, each space-time code group is transmitted on one distinct antenna group. The mapping from code groups to antenna groups may vary with time, but no change in the code stream order is made within a group. At the receiver, the array processing methods discussed previously still apply, except that the matrix indices need to be reordered accordingly.

Lemma 1. Let $B = \begin{bmatrix} B_1 & B_2 & \cdots & B_q \end{bmatrix}$, where each matrix $B_k$, $1 \le k \le q$, has dimension $n' \times l_k$. Let $r, r_1, r_2, \ldots, r_q$ be the ranks of $B, B_1, B_2, \ldots, B_q$, respectively. Then $\sum_{k=1}^{q} r_k \ge r$.

Proof. Using column operations can leave $B_k$, $1 \le k \le q$, with $r_k$ nonzero columns, respectively. Therefore, the rank of $B$ cannot exceed $\sum_{k=1}^{q} r_k$.

Since our focus is on the transmit diversity, and the receive diversity only depends on the number of receive antennas and the decoding order, we can consider, without loss of generality, the case where the code group of interest is transmitted alone. Assume that after the group-based spatial interleaving the space-time coded symbols are transmitted from the $k$th antenna group if $t \in \mathcal{T}_k$, $1 \le k \le q$, where the $\mathcal{T}_k$'s represent disjoint sets of time instants with their sizes $l_k$'s

satisfying $\sum_{k=1}^{q} l_k = l$. The elements in a set are not necessarily consecutive. The pairwise error probability that the decoder will prefer an alternate codeword

$$
e = \begin{bmatrix}
e_1^1 & e_2^1 & \cdots & e_l^1 \\
e_1^2 & e_2^2 & \cdots & e_l^2 \\
\vdots & \vdots & \ddots & \vdots \\
e_1^n & e_2^n & \cdots & e_l^n
\end{bmatrix} \tag{17}
$$

when

$$
c = \begin{bmatrix}
c_1^1 & c_2^1 & \cdots & c_l^1 \\
c_1^2 & c_2^2 & \cdots & c_l^2 \\
\vdots & \vdots & \ddots & \vdots \\
c_1^n & c_2^n & \cdots & c_l^n
\end{bmatrix} \tag{18}
$$

is transmitted, is upper bounded by [8]

$$
P\bigl(c \to e \mid \alpha_{i,j}^k,\ i = 1, 2, \ldots, n,\ j = 1, 2, \ldots, m,\ k = 1, 2, \ldots, q\bigr) \le \exp\!\left(-d^2(c, e)\,\frac{E_s}{4N_0}\right), \tag{19}
$$

where $E_s$ is the symbol energy, $n$ is the number of transmit antennas per group, $m$ is the number of receive antennas (or the receive diversity level after array processing in the presence of the other groups [1]), $\alpha_{i,j}^k$ is the path gain from the $i$th transmit antenna in group $k$ to the $j$th receive antenna, and

$$
d^2(c, e) = \sum_{k=1}^{q} \sum_{t \in \mathcal{T}_k} \sum_{j=1}^{m} \left| \sum_{i=1}^{n} \alpha_{i,j}^k \bigl(c_t^i - e_t^i\bigr) \right|^2. \tag{20}
$$

By taking expectation over the complex Gaussian variables $\alpha_{i,j}^k$, inequality (19) becomes

$$
P(c \to e) \le \prod_{k=1}^{q} \left( \frac{1}{\prod_{i=1}^{n} \bigl(1 + \lambda_i^k E_s / 4N_0\bigr)} \right)^{m}, \tag{21}
$$

where $\lambda_i^k$, $i = 1, 2, \ldots, n$, are the eigenvalues of matrix $A_k$ defined element-wise by

$$
A_{k,ij} = \sum_{t \in \mathcal{T}_k} \bigl(c_t^i - e_t^i\bigr)\bigl(c_t^j - e_t^j\bigr)^{*}. \tag{22}
$$

Let $r_k$ denote the rank of the matrix $A_k$, and reorder the notations so that $\lambda_i^k$, $i = 1, 2, \ldots, r_k$, are the nonzero eigenvalues of $A_k$, it follows that

$$
P(c \to e) \le \left( \prod_{k=1}^{q} \prod_{i=1}^{r_k} \lambda_i^k \right)^{-m} \left( \frac{E_s}{4N_0} \right)^{-\sum_{k=1}^{q} r_k m}. \tag{23}
$$

Thus a diversity advantage of $\sum_{k=1}^{q} r_k m$ and a coding advantage of $\bigl(\prod_{k=1}^{q} \prod_{i=1}^{r_k} \lambda_i^k\bigr)^{1/\sum_{k=1}^{q} r_k}$ are achieved. When the number of transmit antennas is equal to the number of space-time code streams per group, the antenna group index in (23) can be dropped to arrive at the original bound in [8]

$$
P(c \to e) \le \left( \prod_{i=1}^{r} \lambda_i \right)^{-m} \left( \frac{E_s}{4N_0} \right)^{-rm} \tag{24}
$$

with diversity advantage $rm$.

For notational simplicity, we give new, consecutive indices to the elements in $\mathcal{T}_k$ and denote them with a prime ("$'$"). By construction, the matrix

$$
B_k = \begin{bmatrix}
e_{1'}^1 - c_{1'}^1 & e_{2'}^1 - c_{2'}^1 & \cdots & e_{l_k'}^1 - c_{l_k'}^1 \\
e_{1'}^2 - c_{1'}^2 & e_{2'}^2 - c_{2'}^2 & \cdots & e_{l_k'}^2 - c_{l_k'}^2 \\
\vdots & \vdots & \ddots & \vdots \\
e_{1'}^n - c_{1'}^n & e_{2'}^n - c_{2'}^n & \cdots & e_{l_k'}^n - c_{l_k'}^n
\end{bmatrix} \tag{25}
$$

is a square root of $A_k$, and the ranks of $A_k$ and $B_k$ are the same. Based on (23), (24), and (25), the following proposition is an immediate result of Lemma 1.

Proposition 1. When the number of transmit antennas is divisible by the number of space-time code streams per group, the diversity gain after the group-based spatial interleaving is no less than that provided by the space-time coding.

Since errors in trellis decoding usually appear in clusters, letting consecutive code symbols experience independent fadings might give larger transmit diversity ($\sum_{k=1}^{q} r_k$). We consider in this paper a cycled spatial interleaving termed "rotation" (see Figure 2). In this interleaving scheme, the $k$th antenna group is devoted to the $i$th space-time code group when $t \in \mathcal{T}_k^i$, where $\mathcal{T}_k^i = \{k - i + 1 + hq \mid 1 \le k - i + 1 + hq \le l,\ h \in \mathbb{Z}\}$.

When the number of transmit antennas is large, the diversity advantage with rotation is dominated by the minimum free distance of the code.

Time             Ts   2Ts  3Ts  4Ts  5Ts  6Ts  7Ts  8Ts  9Ts
Antenna group 1  C1   C4   C3   C2   C1   C4   C3   C2   C1   ...
Antenna group 2  C2   C1   C4   C3   C2   C1   C4   C3   C2   ...
Antenna group 3  C3   C2   C1   C4   C3   C2   C1   C4   C3   ...
Antenna group 4  C4   C3   C2   C1   C4   C3   C2   C1   C4   ...

Figure 2: Spatial interleaving ("rotation"), Ts: symbol duration.

6. LAYERED SPACE-TIME ARCHITECTURE

The LST proposed in [2] can be seen as a special case of the GST with one transmission stream per group. On the other hand, it is different from the space-time coded GST in that the coded symbols from the same group are multiplexed in time instead of simultaneously transmitted. With the same throughput (code rate $1/n$), the LST transmits the codeword

[Figure 3: block diagram for 2 transmit antennas and rate 1/2 codes. Space-time code: a single S-T encoder producing code streams $C^1$ and $C^2$. LST: a demultiplexer feeding two 1-D encoders (ENC1, ENC2), followed by spatial interleaving; the received streams are shown after interference suppression and column permutation.]

Figure 3: An example illustrating the design simplicity of the LST.

(18) column by column and one symbol at a time as

$$
c = \bigl[\, c_1^1\, c_2^2 \cdots c_n^n \;\; c_{n+1}^1\, c_{n+2}^2 \cdots c_{2n}^n \;\cdots\; c_{(l-1)n+1}^1\, c_{(l-1)n+2}^2 \cdots c_{ln}^n \,\bigr], \tag{26}
$$

where the time indices have been modified to reflect the actual transmission order. As mentioned in [2], the LST uses only one-dimensional coding without signal coordination among antennas, so the design is much simpler. We elaborate in the following, using specific arguments on diversity and coding gain not found in [2], the LST design advantages.

6.1. Diversity

We, again, consider the transmission of a specific group in the absence of the other groups. Assume that the number of transmit antennas is divisible by $n$ and "rotation" is used. For each of the $n$ code streams (rows in (18)), a matrix as in Lemma 1 can be constructed with time shuffling

$$
B^i = \bigl[\, B_1^i \;\; B_2^i \;\cdots\; B_q^i \,\bigr], \quad 1 \le i \le n, \tag{27}
$$

where the constituent matrices $B_k^i$, $1 \le k \le q$, are given, with new consecutive time indices as in (25), by

$$
B_k^i = \bigl[\, e_{1'}^i - c_{1'}^i \;\;\; e_{2'}^i - c_{2'}^i \;\cdots\; e_{l_k^{i\prime}}^i - c_{l_k^{i\prime}}^i \,\bigr], \tag{28}
$$

and are subject to independent fadings. According to the discussion in Section 5, the diversity of this code stream is no less than the rank of $B^i$, which is always one provided that the code considered is useful, meaning no two distinct input sequences share the same codeword on this code stream.

With the number of transmit antennas being divisible by $n$ and each code stream using a distinct set of antennas, we can construct for the entire transmission sequence of this group a matrix

$$
B = \bigl[\, B^1 \;\; B^2 \;\cdots\; B^n \,\bigr] \tag{29}
$$

with the constituent matrices subject to independent fadings. It is now straightforward that the transmit diversity of a group is no less than $n$ as long as a useful code is employed.

To illustrate such a design simplicity of the LST, an example where $n = 2$ and the number of transmit antennas equals $n$ is given in Figure 3. In this figure, the superscript is the code stream index, while the subscript is the group index which is not used by the space-time coding case as it has only one group. In order to achieve full (two) transmit diversity, the space-time coded system has to be carefully designed subject to the rank criterion in [8]. The LST with rotation, on the contrary, can provide full transmit diversity for both groups with any useful codes, assuming that the receiver has enough antennas to perform interference suppression. This much relaxed code design criterion facilitates the use of a larger class of good codes.

Similar interpretation of the LST is also given in [18], with detailed diversity derivations.

6.2. Coding gain

In this section, we consider the coding gain when the system complexity is constrained. From Theorem 1, we can see that, if there are enough receive antennas, the array processing complexity depends only on the number of transmit antennas, no matter how the transmit antennas are grouped. In other words, antenna grouping and coding may only affect the system complexity through decoding complexity, which is defined by the number of trellis branch computations per information bit. For the LST, the decoding of each group only involves one-dimensional signal, so the in-phase and quadrature components can be decoded separately. This is not true for a GST with more than one antenna per group, because the channel induced phase shifts are not the same for every dimension. Using the two-dimensional QPSK space-time codes in Figure 1 as examples, their decoding complexity is 64 trellis branch computations per information bit. A complexity-wise equivalent LST can allow 32 states for both in-phase and quadrature components, if they are binary coded. As a result, while the two-dimensional QPSK space-time code has constraint length 4, a complexity-wise equivalent LST can have constraint length 6 for both in-phase and quadrature components. This increased constraint length not only gives better coding gain, it also increases the

transmit diversity when the number of transmit antennas is large and rotation is used.

7. ITERATIVE PROCESSING

With a maximum SNR based front-end filter followed by the group decoders at the receiver, the GST finds itself an immediate application of the MMSE based iterative algorithm proposed in [13, 14]. In the GST application, multigroup is equivalent to multiuser in [13, 14]. The CDMA chip sampling is replaced by the spatial (antenna array) sampling, and the spreading sequences are replaced by the random channel gains seen by individual antennas. This algorithm has been implemented recently for the LST in [18, 19].

The iterative maximum SNR array processing and decoding algorithm is depicted in Figure 4. The basic idea of this algorithm is to concatenate the maximum SNR based front-end filter with the group decoder, then apply iterative processing [20] by properly exchanging soft information between them. The front-end filter provides interference-suppressed inputs to the group decoder, while the group decoder, being soft-input soft-output (SISO), computes the a posteriori probabilities of the coded symbols needed in the (soft) interference cancellation. The goal of the interference cancellation is to minimize the residual interference power after cancellation. It can be shown that the reconstructed interfering signals used in the cancellation should be the MMSE estimates of the original signals [21]. These MMSE estimates can be computed by taking expectations using the a posteriori probabilities. After the interference cancellation is done, a new maximum SNR filter is computed for the next iteration based on the residual interference powers. Intuitively, one can see that if the decoding result of a symbol (from other groups) is less reliable, its MMSE estimate will be farther away from the original, and its residual power after cancellation will be higher. When updating the maximum SNR filter, the filter weights will be adjusted to deemphasize the signal space of this symbol.

[Figure 4: block diagram. For each group $k = 1, 2, \ldots, q$: soft interference cancellation (Soft IC $k$), maximum SNR interference suppression (Max SNR IS $k$), deinterleaver (DeInt $k$), SISO decoder (SISO DEC $k$) with decision output, and interleaver (Int $k$).]

Figure 4: Iterative maximum SNR array processing and decoding of GST.

As the iterations go on, one can expect that the decoding results, as well as the cancellation, become better. In the end, the cancellation might be good enough that the maximum SNR filters become matched filters which maximum ratio combine full receive diversity for all groups. When this is the case, the optimal power allocation is to assign equal power to every group.

To accelerate the convergence of the iterative process, it is desirable to (temporally) interleave the groups independently before rotation so the neighboring symbols in the two estimation stages (front-end filter and decoder) are as different as possible. For the GST with space-time coding, group-based temporal interleaving can be used to maintain the same transmit diversity after interleaving. For the LST, the same effect can be achieved by separately interleaving the code streams within a group.

8. NUMERICAL RESULTS

The design issues discussed in this paper are verified through simulation of an 8-transmit 8-receive multi-antenna system over a flat, Rayleigh, quasi-static (constant within a frame), and spatially independent channel. For the GST, we use similar simulation parameters as in [1]. That is, every transmit group contains two antennas; and a rate 1/2, 32-state, 4PSK space-time trellis code (Tarokh et al. in Figure 1) is used. The space-time codes for all groups are the same. Each frame consists of a total of 131 transmissions (128 data + 3 tail) from each transmit antenna. Using Figure 1 and the assumption of equally distributed source sequence, we have diagonal $E[c_t^1 (c_t^1)^H]$. The extreme case with one antenna per group, namely, the LST, encodes its in-phase and quadrature components separately using rate 1/2 binary convolutional codes with maximal minimum free distances. Again, all groups use the same code. Each frame still consists of 128 data transmissions from each transmit antenna. The number of tail transmissions, however, depends on the allowable constraint lengths of individual systems.

We first demonstrate in Figure 5 the array processing and power allocation issues. The case in [1] with zero-forcing

array processing and geometrically decreasing power allocation (8 : 4 : 2 : 1) among groups is reproduced. We then apply maximum SNR array processing and/or an arithmetically decreasing power allocation (4 : 3 : 2 : 1) to the same system. In this figure, it is shown clearly that the maximum SNR approach outperforms the zero-forcing approach. When the same (geometric) power allocation is used, the maximum SNR approach is 1.5 dB better at $10^{-2}$ frame error rate (FER), and its advantage increases as the signal power increases due to the steeper FER slope. The steeper slope is a result of higher receive diversity, as the maximum SNR array processor does not preclude the received signal components lying in the interference space. The divergence of the asymptotic performances of the zero-forcing and the maximum SNR approaches might seem contradictory to the common understanding. It is, however, necessary to clarify that the common understanding was built on the model which has a deterministic channel gain. With the quasi-static, random channel gains in our simulation, there is always a possibility that some of the channel gains are very small and the AWGN cannot be ignored, no matter how high the transmitted power is. It is these worst cases which limit the average error performance.

When arithmetic power allocation is applied, the maximum SNR approach performs even better. Its gain over the zero-forcing method is 3 dB. As arithmetic power allocation is not matched to the zero-forcing diversity gains, it worsens the performance 1 dB at $10^{-2}$ FER.

[Figure 5: 8T 8R antennas. FER ($10^{0}$ to $10^{-4}$) versus SNR per R antenna (9 to 18 dB). Curves: zero-forcing and max SNR, each with geometric power allocation and arithmetic power allocation.]

Figure 5: Performance comparison between zero-forcing and maximum SNR methods.

Figure 6 gives a comprehensive illustration of the design issues discussed. First, three curves from Figure 5 are shown again to demonstrate the advantages of the maximum SNR algorithm and the arithmetic power allocation, respectively. Then the performance when rotation is applied in addition to these two design considerations is shown (diamond marks). At $10^{-2}$ FER, rotation gives about 1 dB gain. The steeper slope of this configuration implies that rotation does increase the transmit diversity. A slight alteration of the Tarokh et al. code is also simulated. This code, referred to as Code A in Figure 1, has encoder polynomials (defined in [8, page 757]): (2, 2), (3, 3), (2, 0), (2, 2), (1, 1), (0, 2), (2, 1). It slightly outperforms the Tarokh et al. code.

[Figure 6: 8T 8R antennas (FER). FER ($10^{0}$ to $10^{-4}$) versus SNR per R antenna (2 to 18 dB). Curves: Geo. power, ZF; Geo. power, max SNR; Ari. power, max SNR; Ari. power, max SNR, rotate; Code A, ari. power, max SNR, rotate; Eq. power, iterative max SNR, rotate; LST, ari. power, MMSE, rotate; LST, eq. power, iterative MMSE, rotate; Channel outage probability.]

Figure 6: Performance when different design issues are considered.

The solid curve with square marks uses rotation and the iterative maximum SNR array processing and decoding described in Section 7. It has equal transmission powers for all groups. Due to the necessity of SISO decoding, the soft output Viterbi algorithm (SOVA) [22] is modified to generate soft outputs of the coded symbols. The SOVA decoding complexity is approximately twice the Viterbi decoder. Therefore, to maintain similar complexity, the space-time code used is the best 4-state code found in [23], and the number of array processing and decoding iterations is four. Group-based random temporal interleaving is applied to accelerate the decoding convergence. Due to its extremely small constraint length, this system only shows a diversity gain similar to the non-iterative system without rotation. Nevertheless, iterative processing does improve the decoding performance when SNR is low.

The LST performances are also given. The decoding complexities of these examples are kept the same as the GST configurations. According to the discussion in Section 6.2, when there is only one antenna per group, the in-phase and quadrature components can be separately encoded to increase the coding gain. The curve with triangular marks uses a rate 1/2 binary convolutional code with polynomials

$(65_8, 57_8)$ (32 states). For fair comparison with the corresponding GST cases, the power allocation of this system is the same (4 : 4 : 3 : 3 : 2 : 2 : 1 : 1). Its array processing method is MMSE. From Figure 6, we can see that the increased coding gain improves the performance. The slope of this curve, on the other hand, is very similar to the GST case with Code A. This is an implication that both codes (QPSK space-time code with constraint length 4 and binary convolutional code with constraint length 6) have the same degree of diversity after rotation.

The LST with iterative MMSE algorithm and equal group powers uses a $(5_8, 7_8)$ (4 states) convolutional code. The number of decoding iterations is four. Random temporal interleaving is applied, separately and independently, to each of the four code streams (two in-phase, two quadrature) of every group. The performance of this system is about 2 dB better than the LST case with single-sweep decoding and hard decision feedback cancellation. As to the diversity gain, although this system has higher receive diversity due to better interference cancellation, its transmit diversity suffers from the shortened minimum free distance. The resultant performance slope of this system is very similar to that of the single-sweep LST case which has larger minimum free distance. At $10^{-2}$ FER, this system is about 4 dB away from the channel outage capacity [3, 4]. Figure 6 also shows that, due to either imperfect signal (code) design or sub-optimal decoding, all these practical systems do not achieve the performance slope given by information theory.

A similar experiment was conducted for the 2-transmit 2-receive case where the space-time coding (GST with one group) system uses optimal joint detection, while the LST suffers from inefficient interference suppression due to the small number of receive antennas (see Figure 7). With space-time coding being able to achieve maximum diversity, the LST is not superior anymore. At $10^{-2}$ FER, all these systems are about 2.5 dB away from the channel outage capacity. They also seem to be able to achieve the information theoretic performance slope.

[Figure 7: 2T 2R antennas (FER). FER ($10^{-1}$ to $10^{-3}$) versus SNR per R antenna (7 to 14 dB). Curves: Space-time code (Tarokh); Space-time code (Code A); LST, MMSE, rotate; LST, iterative MMSE, rotate; Channel outage probability.]

Figure 7: 2-transmit, 2-receive performance.

9. CONCLUSION

In this paper, we discussed some design issues of the open-loop space-time coded multi-antenna system. Through intuitive, as well as analytical explanations, we revealed one by one the advantages of the design considerations. To summarize, the zero-forcing array processing method in [1] can be improved with a maximum SNR approach. This new array processing approach then necessitates a different power allocation among the transmission groups. To increase the transmit diversity when there is more than one group, group-based spatial interleaving ("rotation") can be applied. We also considered the LST as a special case of the GST which has one antenna per group. When the number of receive antennas is large enough to resolve the transmission signal spaces, the LST allows easier code design and separate in-phase and quadrature encoding/decoding. These advantages can increase the coding gain and possibly the transmit diversity when the decoding complexity is constrained. The GST with maximum SNR array processing also finds itself a direct application of the iterative algorithm proposed in [13, 14]. In the 8-antenna LST example we gave, the iterative algorithm improves the performance about 2 dB.

APPENDIX

Proof of Theorem 1. We consider the linear filtering in two steps. First, a linear filter bank $\Theta_M(\mathcal{C}_1)$ is used to filter the receive antenna outputs. The outputs of this filter bank are then combined to achieve maximum SNR. Without loss of generality, we assume that the interference components of the filter bank outputs are uncorrelated with one another. In other words, $\Theta_M^H(\mathcal{C}_1) R_n \Theta_M(\mathcal{C}_1)$ is diagonal. This assumption is reasonable since for any filter bank consisting of $k$ linear filters, we can find, using the singular-value decomposition (SVD) [24] on $\Theta_M^H(\mathcal{C}_1) R_n \Theta_M(\mathcal{C}_1)$, a nonsingular $k \times k$ matrix to transform it and diagonalize the interference covariance matrix of its outputs. This transformation is reversible and does not destroy the information contained in the filter bank outputs.

As one of our goals is to minimize the number of linear filters required, we further assume that the linear filters of this filter bank are linearly independent of one another. If this is not true, we can always combine some of the filters to form a linearly independent filter bank with fewer filters.

With the above assumptions and the fact that $R_n$ is nonsingular, the filter bank outputs will have nonzero and uncorrelated interference components. Under this condition, the maximum SNR combining of the filter bank outputs is maximum ratio combining. Given independent group transmission so that the signal and interference codeword ($c_t^1$ and $c_t^o$) expectations can be taken separately, the maximum SNR

filter bank satisfies

$$
\max_{w_1, \ldots, w_k} \sum_{j} w_j^H R_s w_j \quad \text{subject to} \quad w_i^H R_n w_i = \nu, \ \forall i, \tag{A.1}
$$

where $\nu$ is a constant. With the quadratic form, this condition is equivalent to the following Lagrange multipliers

$$
\frac{\partial}{\partial w_i^{*}} \left( \sum_{j} w_j^H R_s w_j - \sum_{j} \lambda_j w_j^H R_n w_j \right) = 0, \quad \forall i. \tag{A.2}
$$

The solutions to the Lagrange multipliers satisfy

$$
\bigl(R_s - \lambda_i R_n\bigr) w_i = 0, \quad \forall i, \tag{A.3}
$$

with

$$
\lambda_i = \frac{w_i^H R_s w_i}{w_i^H R_n w_i} \tag{A.4}
$$

being the SNRs of the maximum SNR filter outputs.

Clearly, we need only filters giving nonzero SNRs. To be consistent with the assumption of uncorrelated interference components, we can use Cholesky decomposition $R_n = L L^H$ [24], as $R_n$ is Hermitian and positive definite. Premultiplying (16) by $L^{-1}$, we get

$$
C \bigl(L^H w\bigr) = \lambda \bigl(L^H w\bigr), \tag{A.5}
$$

where

$$
C = L^{-1} R_s \bigl(L^{-1}\bigr)^H \tag{A.6}
$$

is Hermitian and nonnegative definite with rank $k$. According to (12), $k$ can be no larger than the rank of $\Omega(\mathcal{C}_1)$, so $k \le \min(m, n_1)$. The eigenvalues of $C$ are the same as those of the original problem (16). There are $k$ nonzero eigenvalues, and a set of orthogonal eigenvectors $\{L^H w_i\}_{i=1}^{k}$ can be found by using SVD. It is then straightforward that

$$
w_i^H R_n w_j = \begin{cases} \nu, & i = j, \\ 0, & i \ne j, \end{cases}
\qquad
w_i^H R_s w_j = \begin{cases} \lambda_i \nu, & i = j, \\ 0, & i \ne j. \end{cases} \tag{A.7}
$$

Since

$$
E\bigl[ w_i^H r_t r_t^H w_j \bigr] = w_i^H \bigl(R_s + R_n\bigr) w_j, \tag{A.8}
$$

then the filter bank outputs are uncorrelated with one another.

ACKNOWLEDGMENTS

This work was presented at the IEEE International Conference on Third Generation Wireless Communications, San Francisco, California, USA, June 2000.

This work was supported in part by NASA through contract NCC3528 and in part by the Army Federated Lab through CTA contract DAAD19-01-2-0011.

REFERENCES

[1] V. Tarokh, A. Naguib, N. Seshadri, and A. R. Calderbank, "Combined array processing and space-time coding," IEEE Transactions on Information Theory, vol. 45, no. 4, pp. 1121–1128, 1999.
[2] G. J. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs Technical Journal, vol. 1, no. 2, pp. 41–59, 1996.
[3] G. J. Foschini and M. J. Gans, "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Personal Communications, vol. 6, no. 3, pp. 311–335, 1998.
[4] E. Telatar, "Capacity of multi-antenna Gaussian channels," Tech. Rep., Internal Tech. Memo., AT&T-Bell Labs, June 1995.
[5] N. Seshadri and J. H. Winters, "Two signaling schemes for improving the error performance of frequency-division-duplex (FDD) transmission systems using transmitter antenna diversity," International Journal of Wireless Information Networks, vol. 1, no. 1, pp. 49–60, 1994.
[6] A. Wittneben, "Base station modulation diversity for digital SIMULCAST," in Proc. IEEE Vehicular Technology Conference, pp. 505–511, Secaucus, NJ, USA, May 1993.
[7] A. Wittneben, "A new bandwidth efficient transmit antenna modulation diversity scheme for linear digital modulation," in Proc. IEEE International Conference on Communications, pp. 1630–1634, Geneva, Switzerland, June 1993.
[8] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time codes for high data rate wireless communication: performance criteria and code construction," IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744–765, 1998.
[9] J.-C. Guey, M. P. Fitz, M. R. Bell, and W.-Y. Kuo, "Signal design for transmitter diversity wireless communication systems over Rayleigh fading channels," IEEE Trans. Communications, vol. 47, no. 4, pp. 527–537, 1999.
[10] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.
[11] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block codes from orthogonal designs," IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1456–1467, 1999.
[12] A. R. Hammons, Jr. and H. El Gamal, "On the theory of space-time codes for PSK modulation," IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 524–542, 2000.
[13] X. Wang and H. V. Poor, "Iterative (Turbo) soft interference cancellation and decoding for coded CDMA," IEEE Trans. Communications, vol. 47, no. 7, pp. 1046–1061, 1999.
[14] H. El Gamal and E. Geraniotis, "Iterative multi-user detection for coded CDMA signals in AWGN and fading channels," IEEE Journal on Selected Areas in Communications, vol. 18, no. 1, pp. 30–41, 2000.
[15] M. L. Honig, U. Madhow, and S. Verdú, "Blind adaptive multiuser detection," IEEE Transactions on Information Theory, vol. 41, no. 4, pp. 944–960, 1995.
[16] H. V. Poor and S. Verdú, "Probability of error in MMSE multiuser detection," IEEE Transactions on Information Theory, vol. 43, no. 3, pp. 858–871, 1997.
[17] V. Tarokh and T. K. Y. Lo, "Principal ratio combining for fixed wireless applications when transmitter diversity is employed," IEEE Communications Letters, vol. 2, no. 8, pp. 223–225, 1998.
[18] H. El Gamal and A. R. Hammons Jr., "A new approach to layered space-time coding and signal processing," IEEE Transactions on Information Theory, vol. 47, no. 6, pp. 2321–2334, 2001.
[19] D. Shiu and J. M. Kahn, "Scalable layered space-time codes for wireless communications: performance analysis and design criteria," in Proc. IEEE Wireless Communications and Networking Conference, pp. 159–163, New Orleans, La, USA, September 1999.
[20] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Transactions on Information Theory, vol. 42, no. 2, pp. 429–445, 1996.
[21] F. Tarköy, "MMSE-optimal feedback and its applications," in Proc. IEEE International Symposium on Information Theory, p. 334, Whistler, Canada, September 1995.
[22] J. Hagenauer and P. Hoeher, "A Viterbi algorithm with soft-decision outputs and its applications," in Proc. GLOBECOM '89, pp. 1680–1686, Dallas, Tex, USA, November 1989.
[23] S. Bäro, G. Bauch, and A. Hansmann, "Improved codes for space-time trellis-coded modulation," IEEE Communications Letters, vol. 4, no. 1, pp. 20–22, 2000.
[24] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, USA, 3rd edition, 1996.
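The stationarity condition (A.3) and the SNR expression (A.4) in the Appendix can be checked on a toy case. The following is a minimal sketch under assumptions that are not from the paper: real 2 × 2 matrices stand in for the Hermitian covariances, the signal covariance is rank one ($R_s = h h^T$), and the values of `h` and `Rn` as well as the helper names are purely illustrative. For a rank-one $R_s$, the filter $w = R_n^{-1} h$ is a known solution of the generalized eigenproblem, so the residual of (A.3) should vanish.

```python
from fractions import Fraction as F

def solve2(A, b):
    """Solve the 2x2 linear system A x = b by Cramer's rule (illustrative helper)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def matvec(A, x):
    """Matrix-vector product for small dense matrices."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def quad(x, A, y):
    """Bilinear form x^T A y (real case, so no conjugation is needed)."""
    return sum(xi * vi for xi, vi in zip(x, matvec(A, y)))

# Assumed toy data: channel vector h of the desired signal, rank-one Rs,
# and a positive definite interference-plus-noise covariance Rn.
h = [F(1), F(2)]
Rs = [[h[i] * h[j] for j in range(2)] for i in range(2)]
Rn = [[F(2), F(1)], [F(1), F(3)]]

w = solve2(Rn, h)                          # candidate max SNR filter, w = Rn^{-1} h
snr = quad(w, Rs, w) / quad(w, Rn, w)      # lambda as in (A.4)
residual = [s - snr * n                    # left-hand side of (A.3)
            for s, n in zip(matvec(Rs, w), matvec(Rn, w))]

print("w =", w, "SNR =", snr, "residual =", residual)
```

Exact rational arithmetic is used so that the residual is exactly zero rather than zero up to rounding; a complex-valued version would need conjugate transposes in `quad`.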

Hsuan-Jung Su received his B.S. degree in electronic engineering from the National Chiao-Tung University, Taiwan, in 1992, and the M.S. and Ph.D. degrees in electrical engineering from the University of Maryland, College Park, in 1996 and 1999, respectively. From 1999 to 2000, he was a Postdoctoral Research Associate with the Institute for Systems Research, University of Maryland. Since November 2000, he has been with Bell Laboratories, Lucent Technologies, Holmdel, New Jersey, where he has been involved in the design and performance evaluation of adaptive coding/modulation, fast Hybrid-ARQ, scheduling, and Radio Link Control protocol for 3G wireless networks. His research interests cover coding, modulation, signal processing, power control and synchronization of narrowband and wideband communication systems.

Evaggelos Geraniotis obtained his Ph.D. in electrical engineering from the University of Illinois at Urbana-Champaign in 1982. He has been with the University of Maryland since 1985 where he is now Professor of Electrical Engineering and a member of the Institute for Systems Research and the Center of Satellite and Hybrid Communication Networks. Dr. Geraniotis's research has been in Communication Systems and Networks. In the communication systems area, his earlier research has focused on spread-spectrum and anti-jam communications; receiver design for fading channels; and schemes for interception, feature-detection, and classification of signals. His recent work pertains to several design issues of DS/CDMA, FH/SSMA, and OFDM wireless communications, including power control, advanced modulation, FEC coding, array processing, and interference cancellation techniques, as well as retransmission schemes, MAC protocols, handoff, and switching schemes. His research on communication networks has encompassed channel and traffic modeling, performance evaluation, and design of multi-access protocols for mobile, satellite, cellular, and PCS networks; and multi-media integration schemes for wireless networks, high-speed ATM networks, and hybrid satellite/terrestrial networks. He is the author of over 200 technical papers in journals and conference proceedings on the above areas. He has been consulting regularly for Government and Industry in the above areas. Dr. Geraniotis was an Editor for Spread-Spectrum of the IEEE Transactions on Communications from 1989 to 1992.

EURASIP Journal on Applied Signal Processing 2002:5, 482–486
© 2002 Hindawi Publishing Corporation

Space-Time Trellis Coded 8PSK Schemes for Rapid Rayleigh Fading Channels

Salam A. Zummo Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, MI 48109-2122, USA Email: [email protected]

Saud A. Al-Semari Electrical Engineering Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia Email: [email protected]

Received 29 May 2001 and in revised form 17 January 2002

This paper presents the design of 8PSK space-time (ST) trellis codes suitable for rapid fading channels. The proposed codes utilize the design criteria of ST codes over rapid fading channels. Two different approaches have been used. The first approach maximizes the symbol-wise Hamming distance (HD) between signals leaving or entering the same encoder state. In the second approach, set partitioning based on maximizing the sum of squared Euclidean distances (SSED) between the ST signals is performed; then, the branch-wise HD is maximized. The proposed codes were simulated over independent and correlated Rayleigh fading channels. Coding gains up to 4 dB have been observed over other ST trellis codes of the same complexity.

Keywords and phrases: diversity, multiple transmit antennas, space-time codes, Rayleigh fading channels.
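The two distance measures named in the abstract can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: each ST symbol is modelled as a pair of 8PSK indices (one per transmit antenna, N = 2), the symbol-wise HD is counted as the number of symbol intervals in which the two pairs differ, the SSED is accumulated over both antennas and all intervals, and the sequences `a` and `b` as well as the function names are hypothetical.

```python
import cmath
import math

def psk8(k):
    """Unit-energy 8PSK constellation point for index k in 0..7."""
    return cmath.exp(2j * math.pi * k / 8)

def hamming_distance(a, b):
    """Count the symbol intervals in which the two ST symbol vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def ssed(a, b):
    """Sum of squared Euclidean distances over all antennas and intervals."""
    return sum(abs(psk8(xi) - psk8(yi)) ** 2
               for x, y in zip(a, b)
               for xi, yi in zip(x, y))

# Hypothetical ST sequences: one (antenna 1, antenna 2) index pair per interval.
a = [(0, 0), (1, 5), (2, 2)]
b = [(0, 0), (1, 1), (6, 2)]

hd = hamming_distance(a, b)
dist = ssed(a, b)
print("symbol-wise HD:", hd, "SSED:", round(dist, 6))
```

In this toy pair the sequences differ in two intervals, each time by an antipodal 8PSK point on one antenna, which is the largest per-symbol squared distance available on a unit-energy constellation.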

1. INTRODUCTION

As more wireless communication systems are emerging, higher data rates with improved quality of service are required. In general, diversity and error control codes are known to improve the link quality in wireless communications [1, 2]. In particular, transmit diversity can be used to increase the transmission rate. Also, the rate can be increased using higher order signal constellations, such as MPSK and M-QAM. The conventional transmit diversity system can be viewed as a repetition code which consumes higher bandwidth [2]. Therefore, it is expected that substantial performance improvement can be achieved using more sophisticated codes, utilizing both space and time. Systems combining transmit diversity and trellis codes with high order constellations are promising to provide higher transmission rates at very good quality via providing diversity in time and space.

The concept of space-time (ST) codes had appeared first in [3], referred to as the delay diversity system, where different symbols are simultaneously transmitted via different transmit antennas. Later, the performance criteria of ST codes over quasi-static and rapid fading channels were derived in [4], where ST codes were designed for quasi-static fading channels. In [5, 6, 7], the ST concept was applied to enhance the quality of transmission at the same bit rate of systems using single transmit antenna. Hence, the same error probability can be achieved at a lower signal-to-noise ratio (SNR). Improved ST codes for trellis-coded modulation have been presented in [8]. Moreover, optimum ST trellis codes for quasi-static Rayleigh channels have been obtained using a new approach [9]. The basic concept of ST codes was also extended to turbo codes [10, 11]. ST trellis codes for rapid fading channels using QPSK and 16-QAM signal constellations were presented in [12] and [13], respectively.

In this paper, two ST coded 8PSK schemes suitable for rapid fading channels are proposed. The paper starts with a general description of the ST system model. Then, the proposed codes are presented. Performance comparisons are shown in Section 4. Finally, conclusions out of this work are outlined.

2. SYSTEM MODEL

A typical system that employs ST coding consists of a trellis encoder, a vector block interleaver, $N$ modulators, $N$ transmit antennas, $M$ receive antennas, demodulators and a combiner, a deinterleaver, and finally a ST decoder. The block diagram of a typical ST coded system is shown in Figure 1. Throughout this work, $N$ has been set to 2. The encoded ST signals are interleaved using a vector block interleaver. Each element in the interleaver is a vector containing $N$ symbols which are transmitted via the $N$ transmit antennas. The interleaver is used in order to break the memory of the channel so that it approaches the behavior of independent fading channels, and hence the diversity provided by the coded system is fully utilized. The depth and span of the interleaver depend on the channel's fading rate ($f_D T$) and the encoder's constraint length, respectively.

Figure 1: General ST system. (a) Encoder: input, ST trellis encoder, vector block interleaver, S/P, modulators Mod. 1 to Mod. N. (b) Decoder: demodulators Dem #1 to Dem #M, MRC, deinterleaver, ST decoder, output.

The received signal at the $j$th receive antenna is a noisy superposition of all transmitted symbols over all transmit antennas and is modeled as

$$ d_t^j = \sum_{i=1}^{N} \alpha_{ij,t}\, c_t^i + \eta_t^j, \tag{1} $$

where the coefficient $\alpha_{ij,t}$ is the path gain from the $i$th transmit antenna to the $j$th receive antenna at time instance $t$. It is modeled as independent samples of a zero-mean complex Gaussian random process with variance of $1/N$. The $c_t^i$ is the transmitted symbol from the $i$th transmit antenna, and $\eta_t^j$ is an additive white Gaussian noise (AWGN) sample with variance $N_0/2$ per dimension. The pairwise error probability of ST coded systems over rapid Rayleigh fading channels is upper bounded as [4]

$$ P\big(C_l, \hat{C}_l\big) \le \prod_{t \in \eta} \left( 1 + \sum_{i=1}^{N} \big| c_t^i - \hat{c}_t^i \big|^2 \frac{E_s}{4N_0} \right)^{-M}, \tag{2} $$

where $\eta = \{ t : c_t \ne \hat{c}_t \}$, and $c_t = (c_t^1\, c_t^2 \cdots c_t^N)$ is the codeword of symbols transmitted simultaneously over all transmit antennas at time $t$. The parameter $L$, defined to be the length of the shortest error event path, is referred to as the space-time minimum time diversity (ST-MTD) of the ST code [12]. It can be visualized as the "branch-wise" Hamming distance (HD) in conventional trellis codes, by considering the whole codeword $c_t$ as one symbol. The design rules of good ST codes over rapid fading channels state that for the system to achieve a diversity of $\nu M$, the code should have $L$ being at least equal to $\nu$ [4]. Also, the code's ST minimum squared product distance should be maximized. It is defined over the shortest error path as

$$ d_P^2(L) = \min \left[ \prod_{t=1}^{L} \sum_{i=1}^{N} \big| c_t^i - \hat{c}_t^i \big|^2 ;\ \text{for all error paths of effective length} = L \right], \tag{3} $$

where the sum of squared Euclidean distances (SSED) is defined as the summation term in (3). The asymptotic gain is a popular parameter to compare different coding schemes. In [14], it was derived for conventional trellis codes. It is defined as the difference in the SNR values required in both coded systems to provide the same error probability. In this paper, a similar approach is used to derive the asymptotic gain of ST codes. From (2) and (3), the asymptotic coding gain of any ST coded scheme over another one can be expressed by

$$ g_\infty = \frac{10}{L} \log_{10} \left[ \frac{d_P^2(L)_2}{d_P^2(L)_1} \left( \frac{\alpha_1}{\alpha_2} \right)^{1/L} \right], \tag{4} $$

where $\alpha_1$ and $\alpha_2$ are the multiplicities of the shortest error events in the first and second codes, respectively. This expression will be used to calculate the gain of the proposed ST codes over the available ST codes in the literature. In the following, a detailed discussion on the design of the proposed ST codes is presented.

3. THE PROPOSED CODES

All 8PSK ST schemes presented in this paper provide a throughput of 3 bits/s/Hz. The ST coded 8PSK scheme designed in [4], referred to as 8PSK1 here, uses a rate-3/6 trellis encoder to encode the incoming 3 bits onto 6 output bits. The 6 bits at the output of the encoder are mapped onto two 8PSK signals and transmitted over two transmit antennas. It is optimized for quasi-static fading channels, but it is presented here as a baseline for the new codes. Both the ST minimum time diversity and minimum squared product distance of the code are 2, where the shortest error event has a multiplicity of 2.

The second 8PSK code is referred to as 8PSK2. It also uses a rate-3/6 trellis encoder. The design approach of this code maximizes the symbol-wise HD of branch labels leaving from or entering to the same state. This HD is the controlling factor in the design of multidimensional trellis codes over fading channels. In [15], the general approach for partitioning multidimensional MPSK signal space for fading channels was presented. Hence, the 4D signal space at the output of the ST encoder is partitioned using this approach. The resultant code is expected to have a higher ST minimum squared product distance than the 8PSK1 scheme. In [12, 13], a similar approach was used to design ST codes for rapid fading channels using QPSK and 16-QAM signal constellations, where substantial coding gains have been observed. The set partitioning of multidimensional MPSK signal space for fading channels developed in [15] is briefly reviewed in the following.
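The two distance measures that drive both design approaches, the SSED (the summation term in (3)) and the symbol-wise HD, are simple functions of a pair of ST symbols. The following is a minimal sketch, assuming a naturally mapped, unit-energy 8PSK constellation and two transmit antennas; the function names `psk8`, `ssed`, and `symbol_hd` are illustrative and do not appear in the paper.

```python
import cmath
import math


def psk8(label: int) -> complex:
    """Unit-energy 8PSK point for a natural-mapping label 0..7 (assumed mapping)."""
    return cmath.exp(2j * math.pi * (label % 8) / 8)


def ssed(a, b) -> float:
    """Sum over antennas of squared Euclidean distances between ST symbols a and b."""
    return sum(abs(psk8(x) - psk8(y)) ** 2 for x, y in zip(a, b))


def symbol_hd(a, b) -> int:
    """Symbol-wise Hamming distance: number of antennas whose 8PSK symbols differ."""
    return sum(1 for x, y in zip(a, b) if x % 8 != y % 8)


# Each ST symbol is a pair of 8PSK labels, one per transmit antenna.
print(round(ssed((0, 0), (1, 1)), 2), symbol_hd((0, 0), (1, 1)))  # 1.17 2
print(round(ssed((0, 0), (0, 1)), 2), symbol_hd((0, 0), (0, 1)))  # 0.59 1
```

Two adjacent-phase differences on both antennas give an SSED of 2 x 0.586 = 1.17 with a symbol-wise HD of 2, while a single adjacent-phase difference gives 0.586 with an HD of 1.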

Let $\Omega^0$ be the 4D MPSK signal set to be partitioned. In general, $\Omega^p$ refers to the subset at the $p$th partitioning level. The set $\Omega^p$ contains $2^{6-p}$ signal points, where the number of bits at the encoder's output is 6 in this case. The SSED between signals in $\Omega^1$ is $2\delta_0^2 = 2 \times 0.586 = 1.17$, where the symbol-wise HD is 1. If the vector $y_i = [y_i^2\ y_i^1\ y_i^0]$ is the bit representation of the $i$th codeword at the output of the encoder, then the vector $y = [y_1\ y_2]^T$ contains the bit representation of the two 8PSK symbols at the output of the encoder in its rows. The columns of this matrix are viewed as block codes of length equal to 2 bits. Hence, the same matrix can be represented as $y = [y^2\ y^1\ y^0]$. Define the code $C_m$ that contains $2^{2-m}$ codewords, where $m = 0, 1, 2$, as the code to be chosen as a column of the output matrix. The first code $C_0$ is a (2, 2) block code with codewords $[0, 0]^T$, $[0, 1]^T$, $[1, 0]^T$, and $[1, 1]^T$ and HD = 1. The second code $C_1$ is a (2, 1) code with codewords $[0, 0]^T$, $[1, 1]^T$ and HD = 2. The last code $C_2$ is a (2, 0) block code with codeword $[0, 0]^T$ and HD = $\infty$. The partitioning is performed such that the symbol-wise HD is increased and the SSED is increased if the former cannot be increased.

The set at partitioning level 0, $\Omega^0$, contains all possible signal points and it can be described as $\Omega^0 = \Omega(C_0, C_0, C_0)$. So, by substituting the bit representation of each of the three codes $C_0$ in the output matrix, all signal points in the set are generated. The set at the next partitioning level contains half the number of signal points in $\Omega^0$. The choice of the code to be partitioned among the three codes $C_0$ determines the resulting set partitioning. Since there are eight branches departing from each state in the trellis required to be constructed, it is needed to have eight signal points in each subset, and hence the third partitioning level will be enough for this case. For fading channels, subsets of the 3rd partitioning level are chosen so as to maximize the symbol-wise HD of the generated subsets, which is $\Omega^3 = \Omega(C_1, C_1, C_1)$. This yields eight signal points in the first subset. In order to find the other subsets, the generated subset from $\Omega^3$ is referred to as $\Omega^3(0)$. The other subsets, $\Omega^3(1)$ to $\Omega^3(7)$, are generated using the generator vectors $(t^p)$. They are defined for this case as $(t^0)^T = [0\ 1]$, $(t^1)^T = [0\ 2]$, and $(t^2)^T = [0\ 4]$. The other subsets use these vectors such that

$$ \Omega^p(z) = \Omega^p(0) + \sum_{j=0}^{p-1} z^j t^j \pmod 8, \tag{5} $$

where $z = \sum_{j=0}^{p-1} 2^j z^j$ and $z^j$ is the $j$th bit constituting the integer $z$.

Figure 2: Naturally mapped 8PSK signal constellation (points labeled 0 to 7 around the circle).

Figure 3: Trellis diagrams of 8PSK2 (8-state) 3 bits/s/Hz. (a) Set partitioning of the 4D 8PSK signal space (levels labeled SSED = 0.59, 1.17, 1.17, 1.17). (b) Trellis branch labels, one row per state:
00, 11, 22, 33, 44, 55, 66, 77
12, 23, 34, 45, 56, 67, 70, 01
24, 35, 46, 57, 60, 71, 02, 13
36, 47, 50, 61, 72, 03, 14, 25
40, 51, 62, 73, 04, 15, 26, 37
52, 63, 74, 05, 16, 27, 30, 41
64, 75, 06, 17, 20, 31, 42, 53
76, 07, 10, 21, 32, 43, 54, 65

The naturally mapped 8PSK signal constellation is shown in Figure 2, where the set partitioning of the 4D 8PSK signal space used in the first design approach is shown in Figure 3a. Also, the trellis diagram of the 8PSK2 code is shown in Figure 3b. From the trellis, signals from the same subset are chosen as labels of branches leaving the same encoder's state. Similarly, labels of branches entering to the same state are set to have the maximum HD. The ST minimum time diversity and minimum squared product distance of the code are 2 and 3.03, respectively. Also, the multiplicity of the shortest error event is one. When the asymptotic coding gain of this code over the 8PSK1 code is computed, the 8PSK2 code shows a coding gain of 1.65 dB.

The second design approach maximizes the SSED between ST symbols at branches leaving from or entering to the same state. Then, the branch-wise HD is maximized. These two parameters are equivalent to the ST minimum time diversity and minimum squared product distance of the ST code, respectively. The code designed using this approach is referred to as the 8PSK3. This approach satisfies the design criteria of ST codes in [4] since it maximizes the design parameters directly. In this approach, the 4D 8PSK signal space is partitioned so that the SSED between signals in each pair increases each time the partitioning is performed. Since maximizing the SSED of the code is the design criterion of trellis codes over AWGN channels, the set partitioning optimized for AWGN channels can be utilized. The set partitioning for multidimensional MPSK signal space for AWGN channels was presented in [16]. The same terminologies used for the first approach in defining set partitioning are used here. In this case, the subset at partitioning level 3, $\Omega^3$, is different. It is defined as $\Omega^3 = \Omega(C_0, C_1, C_2)$, where the generator vectors are $(t^0)^T = [0\ 1]$, $(t^1)^T = [1\ 1]$, and $(t^2)^T = [0\ 2]$.

Figure 4: Trellis diagrams of 8PSK3 (8-state) 3 bits/s/Hz. (a) Set partitioning of the 4D 8PSK signal space (levels labeled SSED = 0.59, 1.17, 2, 2). (b) Trellis branch labels, one row per state:
00, 04, 40, 44, 22, 26, 62, 66
33, 37, 73, 77, 55, 51, 15, 11
12, 16, 52, 56, 34, 30, 74, 70
24, 20, 64, 60, 06, 02, 46, 42
41, 45, 05, 01, 63, 67, 23, 27
65, 61, 21, 25, 47, 43, 07, 03
57, 53, 17, 13, 71, 75, 31, 35
76, 72, 36, 32, 10, 14, 50, 54

The set partitioning of the 4D 8PSK signal space using the second approach is shown in Figure 4a. Also, the trellis diagram of the 8PSK3 code is shown in Figure 4b. In the design process, signal labels of branches leaving from the same state are drawn from the same subset. Also, signal labels of branches entering to the same state have the SSED maximized. By following these rules, it is ensured that the designed code gives the maximum possible ST minimum time diversity and minimum squared product distance. For this code, the two parameters are 2 and 10.34, respectively. The shortest error event occurs with a multiplicity of one, yielding an asymptotic coding gain of 4.3 dB over the 8PSK1 scheme. It is clear that the ST minimum squared product distance is much larger than those of both 8PSK1 and 8PSK2 codes, causing a significant performance improvement.

Figure 5: Performance of the 8PSK codes for 1R and 2R antenna over ideally interleaved fading channels (BER versus NEs/No; curves for 8PSK1, 8PSK2, and 8PSK3, each with 1R and 2R).

Figure 6: Performance of the 8PSK codes for 1R and 2R antenna over correlated fading channels with $f_D T = 0.005$ and a 25 × 16 vector block interleaver (BER versus NEs/No; curves for 8PSK1, 8PSK2, and 8PSK3, each with 1R and 2R).

4. PERFORMANCE COMPARISONS

The two proposed coding schemes (8PSK2 and 8PSK3) are compared to the 8PSK1 code designed in [4]. They are tested under time-varying fading environments. First, ideal interleaving that yields independent fading channels is assumed. It is presented here to show the ideal performance of the codes. Figure 5 shows the performance of the three codes for the cases of one and two receive antennas. It is clear that the 8PSK3 code is the best followed by the 8PSK2 code. Both codes have the same ST minimum time diversity. However, the 8PSK3 code has a higher minimum squared product distance. The 8PSK3 code provides a coding gain of 4 dB over the 8PSK1 code and almost 1.3 dB over the 8PSK2 code at a bit error rate (BER) of $10^{-3}$.

It is observed that the gains of the 8PSK3 scheme over both 8PSK1 and 8PSK2 schemes are higher in the case of two receive antennas. This is because the design approach in the 8PSK3 scheme maximizes the SSED, whose contribution becomes more dominant as the channel approaches the Gaussian channel (i.e., as the number of receive antennas increases). The opposite is observed in the case of the 8PSK2 scheme, where the effect of the symbol-wise HD decreases in the case of two receive antennas because of the space diversity provided at the receiver.

Figure 6 shows the performance of the three codes over fading channels with a fading rate ($f_D T$) of 0.005. The transmitted symbols at each antenna are interleaved using a 25 × 16 block interleaver. This interleaver size is not enough to break the memory of the channel, and hence it is considered to be improper. This case is studied in order to test the codes under nonideal situations. The results show that the 8PSK3 code is still the best followed by 8PSK2. The gains of 8PSK3 and 8PSK2 codes over 8PSK1 code are slightly less than those for the ideally interleaved fading channel.

5. CONCLUSIONS

Two ST coded 8PSK schemes have been proposed. The first scheme is designed using signal space set partitioning for fading channels, and hence maximizes the symbol-wise HD. On the other hand, the second code utilizes the set partitioning for AWGN (to maximize the SSED) and then maximizes the branch-wise HD. The proposed codes were simulated over ideally interleaved fading channels and also over channels with improper interleaving. Simulation results showed that these coding schemes outperform the 8PSK ST codes presented in the literature.

ACKNOWLEDGMENT

The authors wish to acknowledge the support of King Fahd University of Petroleum and Minerals provided to conduct this research.

REFERENCES

[1] B. Sklar, "Rayleigh fading channels in mobile digital communication systems—part I: characterization," IEEE Communications Magazine, vol. 35, no. 9, pp. 136–146, 1997.
[2] J. Proakis, Digital Communications, McGraw-Hill, New York, NY, USA, 1989.
[3] A. Wittneben, "A new bandwidth efficient transmit antenna modulation diversity scheme for linear digital modulation," in Proc. IEEE International Conference on Communications, pp. 1630–1634, Geneva, 1993.
[4] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time codes for high data rate wireless communication: performance criterion and code construction," IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744–765, 1998.
[5] S. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.
[6] V. Tarokh, S. Alamouti, and P. Poon, "New detection schemes for transmit diversity with no channel estimation," in Proc. IEEE 1998 International Conference on Universal Personal Communications, pp. 917–920, October 1998.
[7] S. Alamouti, V. Tarokh, and P. Poon, "Trellis-coded modulation and transmit diversity: Design criteria and performance evaluation," in Proc. IEEE 1998 International Conference on Universal Personal Communications, pp. 703–707, October 1998.
[8] S. Baro and A. Hansmann, "Improved codes for space-time trellis-coded modulation," IEEE Communications Letters, vol. 4, no. 1, pp. 20–22, 2000.
[9] Q. Yan and R. Blum, "Optimum space-time convolutional codes," in Proc. IEEE 2000 Wireless Communications and Networking Conference, pp. 1351–1355, September 2000.
[10] G. Bauch, "Concatenation of space-time block codes and "turbo"-TCM," in Proc. IEEE International Conference on Communications, vol. 2, pp. 1202–1206, Vancouver, British Columbia, Canada, June 2000.
[11] Y. Liu, M. P. Fitz, and O. Y. Takeshita, "Full rate space-time turbo codes," IEEE Journal on Selected Areas in Communications, vol. 19, no. 5, pp. 969–980, 2001.
[12] S. Zummo and S. Al-Semari, "Design of space-time QPSK codes for Rayleigh fading channel," in Proc. 11th IEEE International Symposium on Personal, Indoor, and Mobile Radio Communications, pp. 504–508, September 2000.
[13] S. Zummo and S. Al-Semari, "Design of 16-QAM space-time codes for rapid Rayleigh fading channels," in Proc. 10th annual MPRG Symposium on Wireless Personal Communications, June 2000.
[14] S. H. Jamali and T. Le-Ngoc, Coded-Modulation Techniques for Fading Channels, Kluwer Academic, Boston, Mass, USA, 1994.
[15] E. Leonardo, L. Zhang, and B. Vucetic, "Multidimensional M-PSK trellis codes for fading channels," IEEE Transactions on Information Theory, vol. 42, no. 4, pp. 1093–1108, 1996.
[16] S. Pietrobon, R. Deng, A. Lafanechere, G. Ungerboeck, and D. Costello, "Trellis coded multidimensional phase modulation," IEEE Transactions on Information Theory, vol. 36, no. 1, pp. 63–89, 1990.

Salam A. Zummo was born in 1976 in Saudi Arabia. He received his B.S. and M.S. degrees in electrical engineering from King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia, in 1998 and 1999, respectively. In 2000, he joined the Electrical Engineering and Computer Science Department at the University of Michigan, Ann Arbor, where he is currently a Ph.D. candidate. His main research interests are in space-time codes, error control coding, channel estimation and iterative receivers for wireless systems.

Saud A. Al-Semari received his B.S. and M.S. degrees in electrical engineering from KFUPM in 1991 and 1992, respectively. He received his Ph.D. degree from the University of Maryland at College Park, USA, in December 1995. He is currently an Associate Professor of Electrical Engineering at KFUPM. He is also the Director of the Information Technology Center at KFUPM. Dr. Al-Semari pursues research in a range of topics related to wireless communication systems including error control coding, diversity, fading, multiple-access and security.

EURASIP Journal on Applied Signal Processing 2002:5, 487–496
© 2002 Hindawi Publishing Corporation

Blind Identification of Convolutive MIMO Systems with 3 Sources and 2 Sensors

Binning Chen Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, USA Email: [email protected]

Athina P. Petropulu Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, USA Email: [email protected]

Lieven De Lathauwer Équipe Traitement des Images et du Signal, École Nationale Supérieure d'Électronique et de ses Applications, Université de Cergy-Pontoise, Cergy-Pontoise, France Email: [email protected]

Received 4 July 2001 and in revised form 6 March 2002

We address the problem of blind identification of a convolutive Multiple-Input Multiple-Output (MIMO) system with more inputs than outputs, and in particular, the 3-input 2-output case. We assume that the inputs are temporally white, non-Gaussian distributed, and spatially independent. Solutions for the scalar MIMO case, within scaling and permutation ambiguities, have been proposed in the past, based on the canonical decomposition of tensors constructed from higher-order cross-cumulants of the system output. In this paper, we look at the problem in the frequency domain, where, for each frequency we construct a number of tensors based on cross-polyspectra of the output. These tensors lead to the system frequency response within frequency dependent scaling and permutation ambiguities. We propose ways to resolve these ambiguities, and show that it is possible to obtain the system response within a scalar and a linear phase.

Keywords and phrases: MIMO system identification, tensor decomposition, higher-order statistics.

1. INTRODUCTION

The goal of blind $r$-input $n$-output ($n \times r$) system identification is to identify an unknown system $H(z)$, driven by $r$ unobservable inputs, based on the $n$ system outputs. Blind identification of a Multiple-Input Multiple-Output (MIMO) system is of great importance in many applications, such as speech enhancement in the presence of competing speakers, digital multiuser/multiaccess communications systems, and biomedical engineering [1, 2, 3, 4, 5].

Most of the literature on $n \times r$ MIMO problems refers to the case of $n \ge r$. In that case, system identification can lead to recovery of the inputs via deconvolution. Here we consider the case of more inputs than outputs, that is, $n < r$.

... than outputs has been studied. A special case of a blind 2 × 3 convolutive system, where the cross-channels are simple delay elements, has been studied in [9]. The delays were estimated via a polyspectra based method.

The scalar 2 × 3 MIMO case has been approached in [6, 10], based on the canonical decomposition of tensors, which were constructed from higher-order cross-cumulants of the system output. That approach yields the system within scaling and permutation ambiguities.

In this paper, we address the blind identification of 2 × 3 convolutive systems. We look at the problem in the frequency domain, where, for each frequency we construct a number of tensors based on cross-polyspectra of the system output.

$$ \mathcal{C}^{30} = \sum_{p=1}^{3} \gamma^{30}_{s_p}\, H_p \circ H_p \circ H_p, \tag{6} $$

3. CONVOLUTIVE MIMO CASE

Consider the case of convolutive mixtures. The MIMO system output is given by

$$ \mathbf{x}(t) = \sum_{l=0}^{L-1} \mathbf{h}(l)\, \mathbf{s}(t - l), \tag{11} $$

where $\mathbf{s}(t)$ is the 3 × 1 input vector containing mutually independent entries with unit variance and non-Gaussian distribution, $\mathbf{h}(l)$ is the 2 × 3 impulse response matrix with elements $\{h_{ij}(l)\}$, $i = 1, 2$, $j = 1, 2, 3$, $\mathbf{x}(t)$ is the 2 × 1 vector of observations, and $t$ denotes discrete time.

In addition to assumptions (A1) and (A2) introduced for the scalar MIMO case, we further assume the following:

(A3) there exist a nonempty subset of $\omega$'s, denoted by $\omega^*$, and a nonempty subset of the indices $1, \ldots, n$, denoted by $l^*$, so that for $l \in l^*$ and $\omega \in \omega^*$, the $l$th row of the matrix $\mathbf{H}(\omega)$ has elements with magnitudes that are mutually different.

At first look, an extension of the scalar MIMO case to the convolutive case would appear feasible by observing that, in the frequency domain, it holds that

$$ \mathbf{x}(\omega) = \mathbf{H}(\omega)\, \mathbf{s}(\omega), \tag{12} $$

where $\mathbf{x}(\omega)$, $\mathbf{s}(\omega)$, and $\mathbf{H}(\omega)$ are the Discrete-Time Fourier transforms of $\mathbf{x}(t)$, $\mathbf{s}(t)$, and $\mathbf{h}(l)$, respectively. Thus, (12) is similar to (1) for a fixed $\omega$.

To extend the idea of [10] to this case, one would need to estimate the cross cumulants of $x_i(\omega_1), x_j(\omega_2), \ldots$. Although cumulants of Fourier transforms have been considered in [12], there is a problem right there. According to Brillinger [13, Theorem 4.3.2 on page 93],

$$ \mathrm{CUM}\{ x_i(\omega_1), x_j(\omega_2), x_k(\omega_3), x_l(\omega_4) \} = 0 \quad \text{if } \omega_1 + \omega_2 + \omega_3 + \omega_4 \ne 2\pi K, \tag{13} $$

where $K$ is an integer. Thus, $C^{40}$ and $C^{30}$ would be identically zero. When considering discrete Fourier transforms the cumulants will not be zero, but they will be very small and thus sensitive to estimation errors.

In the following, we propose an approach that does not involve cumulants of $\mathbf{x}(\omega)$. We define a set of tensors based on cross-polyspectra of the system output, which lead to expressions similar to those of (4) and (5).

First, we define three types of the fourth-order cross cumulants of the received signals [14]:

$$
\begin{aligned}
c^{40}_{ijkl}(\tau_1, \tau_2, \tau_3) &\triangleq \mathrm{CUM}\{ x_i(t), x_j(t + \tau_1), x_k(t + \tau_2), x_l(t + \tau_3) \} \\
&= \sum_{p=1}^{3} \gamma^{40}_{s_p} \sum_{t=0}^{L-1} h_{ip}(t)\, h_{jp}(t + \tau_1)\, h_{kp}(t + \tau_2)\, h_{lp}(t + \tau_3), \\
c^{31}_{ijkl}(\tau_1, \tau_2, \tau_3) &\triangleq \mathrm{CUM}\{ x_i^*(t), x_j(t + \tau_1), x_k(t + \tau_2), x_l(t + \tau_3) \} \\
&= \sum_{p=1}^{3} \gamma^{31}_{s_p} \sum_{t=0}^{L-1} h^*_{ip}(t)\, h_{jp}(t + \tau_1)\, h_{kp}(t + \tau_2)\, h_{lp}(t + \tau_3), \\
c^{22}_{ijkl}(\tau_1, \tau_2, \tau_3) &\triangleq \mathrm{CUM}\{ x_i^*(t), x_j(t + \tau_1), x_k^*(t + \tau_2), x_l(t + \tau_3) \} \\
&= \sum_{p=1}^{3} \gamma^{22}_{s_p} \sum_{t=0}^{L-1} h^*_{ip}(t)\, h_{jp}(t + \tau_1)\, h^*_{kp}(t + \tau_2)\, h_{lp}(t + \tau_3),
\end{aligned}
\tag{14}
$$

where

$$
\begin{aligned}
\gamma^{40}_{s_p} &= \mathrm{CUM}\{ s_p(t), s_p(t), s_p(t), s_p(t) \}, \\
\gamma^{31}_{s_p} &= \mathrm{CUM}\{ s_p^*(t), s_p(t), s_p(t), s_p(t) \}, \\
\gamma^{22}_{s_p} &= \mathrm{CUM}\{ s_p^*(t), s_p(t), s_p^*(t), s_p(t) \}
\end{aligned}
\tag{15}
$$

are the three types of the fourth-order cumulants of $s_p(\cdot)$, respectively. The corresponding fourth-order cross-polyspectra, defined as the Fourier transforms of $c^{40}_{ijkl}(\tau_1, \tau_2, \tau_3)$, $c^{31}_{ijkl}(\tau_1, \tau_2, \tau_3)$, and $c^{22}_{ijkl}(\tau_1, \tau_2, \tau_3)$, equal [14]:

$$ C^{40}_{ijkl}(\omega_1, \omega_2, \omega_3) = \sum_{p=1}^{3} \gamma^{40}_{s_p}\, H_{ip}(-\omega_1 - \omega_2 - \omega_3)\, H_{jp}(\omega_1)\, H_{kp}(\omega_2)\, H_{lp}(\omega_3), \tag{16} $$

$$ C^{31}_{ijkl}(\omega_1, \omega_2, \omega_3) = \sum_{p=1}^{3} \gamma^{31}_{s_p}\, H^*_{ip}(\omega_1 + \omega_2 + \omega_3)\, H_{jp}(\omega_1)\, H_{kp}(\omega_2)\, H_{lp}(\omega_3), \tag{17} $$

$$ C^{22}_{ijkl}(\omega_1, \omega_2, \omega_3) = \sum_{p=1}^{3} \gamma^{22}_{s_p}\, H^*_{ip}(\omega_1 + \omega_2 + \omega_3)\, H_{jp}(\omega_1)\, H^*_{kp}(-\omega_2)\, H_{lp}(\omega_3). \tag{18} $$

Taking $\omega_1 = \omega_2 = \omega_3 = \omega$ in (16) and (17), we get, respectively,

$$
\begin{aligned}
C^{40}_{ijkl}(\omega, \omega, \omega) &= \sum_{p=1}^{3} \gamma^{40}_{s_p}\, H_{ip}(-3\omega)\, H_{jp}(\omega)\, H_{kp}(\omega)\, H_{lp}(\omega), \\
C^{31}_{ijkl}(\omega, \omega, \omega) &= \sum_{p=1}^{3} \gamma^{31}_{s_p}\, H^*_{ip}(3\omega)\, H_{jp}(\omega)\, H_{kp}(\omega)\, H_{lp}(\omega).
\end{aligned}
\tag{19}
$$

These two equations enable us to construct two tensors $\mathcal{T}^1_{31}$, with elements $C^{40}_{jkli}(\omega, \omega, \omega)$, and $\mathcal{T}^2_{31}$, with elements $C^{31}_{jkli}(\omega, \omega, \omega)$, where $j, k, l, i = 1, 2$. We can show that these two tensors correspond to the tensor $\mathcal{C}^{31}$ in the instantaneous case, that is,

$$
\begin{aligned}
\mathcal{T}^1_{31}(\omega) &= \sum_{p=1}^{3} \gamma^{40}_{s_p}\, H_p(\omega) \circ H_p(\omega) \circ H_p(\omega) \circ H_p(-3\omega), \\
\mathcal{T}^2_{31}(\omega) &= \sum_{p=1}^{3} \gamma^{31}_{s_p}\, H_p(\omega) \circ H_p(\omega) \circ H_p(\omega) \circ H_p^*(3\omega),
\end{aligned}
\tag{20}
$$

where $H_p(\omega)$ denotes the $p$th column of $\mathbf{H}(\omega)$. For a system with real impulse response $h(n)$, the two tensors are identical since $H_p^*(3\omega) = H_p(-3\omega)$. On the other hand, for complex systems, the two tensors are nonidentical. This observation implies that we need to treat the real and complex cases separately.

The complex system case

From the two tensors $\mathcal{T}^1_{31}(\omega)$ and $\mathcal{T}^2_{31}(\omega)$ we can derive four independent equations, to be used in the estimation of the polynomial $G$ as defined in (8):

$$
\begin{bmatrix}
C^{40}_{1111} & C^{40}_{2111} & C^{40}_{2211} & C^{40}_{2221} \\
C^{40}_{1112} & C^{40}_{1122} & C^{40}_{1222} & C^{40}_{2222} \\
C^{31}_{1111} & C^{31}_{2111} & C^{31}_{2211} & C^{31}_{2221} \\
C^{31}_{1112} & C^{31}_{1122} & C^{31}_{1222} & C^{31}_{2222}
\end{bmatrix}
\cdot G^*(\omega) = 0, \tag{21}
$$

where every matrix entry is evaluated at $(\omega, \omega, \omega)$. Note that for the convolutive MIMO system, $G$ is a function of frequency $\omega$. The above equation provides enough equations for the unique estimation of the polynomial $G(\omega)$.

The real system case

For a real system, $\mathcal{T}^1_{31}(\omega)$ and $\mathcal{T}^2_{31}(\omega)$ are identical, thus we only have two independent linear equations to be used in the estimation of the polynomial $G(\omega)$:

$$
\begin{bmatrix}
C^{40}_{1111} & C^{40}_{2111} & C^{40}_{2211} & C^{40}_{2221} \\
C^{40}_{1112} & C^{40}_{1122} & C^{40}_{1222} & C^{40}_{2222}
\end{bmatrix}
\cdot G^*(\omega) = 0, \tag{22}
$$

again with every entry evaluated at $(\omega, \omega, \omega)$. This leaves one complex degree of freedom in the solution. There is no easy way to get another equation like in $\mathcal{T}^1_{31}(\omega)$, nor to get an equivalent of tensor $\mathcal{C}^{40}$ as in the instantaneous case. However, here we propose a tensor $\mathcal{T}_{22}$, which can provide enough information to reach the solution. Taking $\omega_1 = \omega_2 = -\omega_3 = \omega$ in (18), we get

$$ C^{22}_{ijkl}(\omega, \omega, -\omega) = \sum_{p=1}^{3} \gamma^{22}_{s_p}\, H^*_{ip}(\omega)\, H_{jp}(\omega)\, H^*_{kp}(-\omega)\, H_{lp}(-\omega). \tag{23} $$

For the real system case, we have

$$ C^{22}_{ijkl}(\omega, \omega, -\omega) = \sum_{p=1}^{3} \gamma^{22}_{s_p}\, H^*_{ip}(\omega)\, H_{jp}(\omega)\, H_{kp}(\omega)\, H^*_{lp}(\omega). \tag{24} $$

This enables us to construct a tensor $\mathcal{T}_{22}$ based on the cross-polyspectra $C^{22}_{ijkl}(\omega, \omega, -\omega)$. Then, for the tensor $\mathcal{T}_{22}$ it holds that

$$ \mathcal{T}_{22}(\omega) = \sum_{p=1}^{3} \gamma^{22}_{s_p}\, H_p(\omega) \circ H_p^*(\omega) \circ H_p(\omega) \circ H_p^*(\omega). \tag{25} $$

The tensor $\mathcal{T}_{22}$ can be utilized following the approaches proposed in [6, 15]. The idea in [6] is to start an exhaustive search, such that the structure of $\mathcal{T}_{22}$ is also taken into account. For each possible solution of (22), the corresponding values of $\{H_p(\omega)\}_{(1 \le p \le 3)}$ and of the rank-1 tensors $\{H_p(\omega) \circ H_p^*(\omega) \circ H_p(\omega) \circ H_p^*(\omega)\}_{(1 \le p \le 3)}$ are computed. At that point, the "goodness" of the approximation by a sum of rank-1 tensors in (25) is assessed, and the global optimum is sought. Each step amounts to computing the roots of a polynomial of degree 3 (computation of $\{H_p(\omega)\}_{(1 \le p \le 3)}$) and verifying how close a vector in a real 35-dimensional space is to a subspace spanned by three other vectors (checking (25)). In [15], an alternating least squares (ALS) method was proposed to simultaneously solve the equations based on the two tensors; here one alternates between the computation of the roots of a polynomial of degree 3 and 2, respectively. Note here that the proposed method fails to estimate the system transfer function $\mathbf{H}(\omega)$ at frequencies $\omega = 0$ and $\omega = \pi$, because the tensors $\mathcal{T}^1_{31}$ and $\mathcal{T}_{22}$ are identical at these two frequencies. Since discrete frequencies will be used in the implementation, one possible remedy is to obtain the system transfer function $\mathbf{H}(\omega)$ at these two frequencies by interpolation using the estimate in surrounding frequencies. Simulation examples show that the interpolation method works well with Finite Impulse Response (FIR) systems.

Based on the above methodology, we can get the estimate of $\mathbf{H}(\omega)$, that is, $\breve{\mathbf{H}}(\omega)$, up to some ambiguities as stated in (10). These ambiguities are acceptable for the instantaneous mixture case, but not for the convolutive mixture case. As in the latter case the ambiguities are frequency dependent, one cannot combine the estimates of $\mathbf{H}(\omega)$ at different frequencies to get a final estimation. We next propose some steps to solve this problem.

4. DEALING WITH FREQUENCY DEPENDENT AMBIGUITIES

The solution $\breve{\mathbf{H}}(\omega)$ is related to $\mathbf{H}(\omega)$ as

$$ \breve{\mathbf{H}}(\omega)\, \Lambda(\omega)\, e^{j\Phi(\omega)}\, \mathbf{P}(\omega) = \mathbf{H}(\omega), \tag{26} $$

where $\Lambda(\omega)$ is a diagonal matrix representing the frequency dependent real and positive scaling ambiguity; $e^{j\Phi(\omega)}$ is the diagonal matrix representing the frequency dependent phase ambiguity; $\mathbf{P}(\omega)$ is a permutation matrix representing the frequency dependent column permutation ambiguity.

Estimation of $\Lambda(\omega)$

Based on assumption (A2), it holds that

$$ \mathbf{P}_X(\omega) = \mathbf{H}(\omega)\, \mathbf{H}(\omega)^H = \breve{\mathbf{H}}(\omega)\, |\Lambda(\omega)|^2\, \breve{\mathbf{H}}(\omega)^H = \sum_{p=1}^{3} |\lambda_p(\omega)|^2\, \breve{H}_p(\omega)\, \breve{H}_p(\omega)^H, \tag{27} $$

where $\mathbf{P}_X(\omega)$ is the cross power spectrum matrix of the received signal $\mathbf{x}(t)$, and thus can be estimated; $\breve{H}_p(\omega)$ is the $p$th column of $\breve{\mathbf{H}}(\omega)$; and $\lambda_p(\omega)$ is the $p$th diagonal element of $\Lambda(\omega)$.

In matrix equation (27) we have three unknowns, that is, $\{|\lambda_p(\omega)|^2,\ p = 1, 2, 3\}$, and four linear equations of which three are independent. Thus, $|\Lambda(\omega)|$ can be obtained as the solution of these equations.

Now that the scaling has been estimated, let $\tilde{\mathbf{H}}(\omega) \triangleq \breve{\mathbf{H}}(\omega)\, \Lambda(\omega)$. It holds that

$$ \tilde{\mathbf{H}}(\omega)\, e^{j\Phi(\omega)}\, \mathbf{P}(\omega) = \mathbf{H}(\omega). \tag{28} $$

Resolving frequency dependent permutation ambiguity $\mathbf{P}(\omega)$

Resolving the frequency dependent column permutation ambiguity, $\mathbf{P}(\omega)$, amounts to reducing it to a constant permutation matrix $\mathbf{P}$. This can be achieved as follows.

Based on the definition of cross-polyspectra in (18), we construct a polyspectra matrix $\mathbf{C}^{22}_{l_1 l_2}(\omega_1, \omega_2, \omega_3)$ whose $(i, j)$th element equals $C^{22}_{l_1 i j l_2}(\omega_1, \omega_2, \omega_3)$; then we can rewrite (18) in matrix form as follows:

$$ \mathbf{C}^{22}_{l_1 l_2}(\omega_1, \omega_2, \omega_3) = \mathbf{H}(\omega_1)\, \Lambda_2(\omega_1, \omega_2, \omega_3)\, \mathbf{H}^H(-\omega_2), \tag{29} $$

where

$$ \Lambda_2(\omega_1, \omega_2, \omega_3) = \mathrm{Diag}\big( \ldots,\ \gamma^{22}_{s_p}\, H^*_{l_1 p}(\omega_1 + \omega_2 + \omega_3)\, H_{l_2 p}(\omega_3),\ \ldots \big). $$

Resolving phase ambiguity $e^{j\Phi(\omega)}$

Equation (33) would suffice for the identification of minimum phase systems only. For nonminimum phase systems the phase ambiguity can be resolved as follows.

The recovery of $\Phi(\omega)$ is also based on the special structure of (29). Combining (29) and (33), we get

$$
\begin{aligned}
\mathbf{C}^{22}_{l_1 l_2}(\omega, -\omega - \alpha, \omega_3) &= \mathbf{H}(\omega)\, \Lambda_2(\omega, -\omega - \alpha, \omega_3)\, \mathbf{H}^H(\omega + \alpha) \\
&= \bar{\mathbf{H}}(\omega)\, e^{j\Phi(\omega)}\, \mathbf{P}\, \Lambda_2(\omega, -\omega - \alpha, \omega_3)\, \mathbf{P}^T e^{-j\Phi(\omega + \alpha)}\, \bar{\mathbf{H}}^H(\omega + \alpha).
\end{aligned}
\tag{34}
$$

Define

$$ e^{j\Psi(\omega)} \triangleq e^{j\Phi(\omega)}\, \mathbf{P}\, \Lambda_2(\omega, -\omega - \alpha, \omega_3)\, \mathbf{P}^T e^{-j\Phi(\omega + \alpha)}, \tag{35} $$

which is a diagonal matrix. Since $\mathbf{C}^{22}_{l_1 l_2}(\omega, -\omega - \alpha, \omega_3)$ can be estimated, and $\bar{\mathbf{H}}(\omega)$ and $\bar{\mathbf{H}}^H(\omega + \alpha)$ are known at this stage, $e^{j\Psi(\omega)}$ can be estimated based on the following equation:

$$ \mathbf{C}^{22}_{l_1 l_2}(\omega, -\omega - \alpha, \omega_3) = \bar{\mathbf{H}}(\omega)\, e^{j\Psi(\omega)}\, \bar{\mathbf{H}}^H(\omega + \alpha) \tag{36} $$

in the same way as we did for solving the frequency dependent scaling and permutation. Based on (35), we can get a recursive equation of the unknown phase ambiguity $\Phi(\omega)$ as
follows: 2 1 2 3 sp l1 p 1 2 3 l2 p 3 (30) Φ( + ) − Φ( ) = Θ − Ψ( ) (37) Then it holds that ω α ω ω , 22 − where Θ is Cll ω, ω, ω3     22| |2 Θ = Θ = Λ − − T γs1 Hl1(ω3) Diag ..., ii,... arg P 2 ω, ω α, ω3 P . (38)  .  H = H(ω)  ..  H(ω) Note that for fixed l1, l2, α,andω3, Λ2(ω, −ω − α, ω3) is in- γ22|H (ω )|2 s3 l3 3 dependent of ω.Equation(37) can be solved as in [16, 17], 3 to obtain Φ(ω) up to a linear phase and constant phase am- = 22 2 H γsp Hlp ω3 Hp(ω)Hp(ω) . biguity, which, respectively, correspond to a time delay and p=1 a complex scaling of the columns of the system impulse re- (31) sponse matrix. Both of the latter ambiguities are acceptable for blind system identification. Based on assumption (A3), it holds that Finally, we obtain a solution Hˆ (ω) up to a constant per- 3 mutation, complex scaling and linear phase ambiguity, that 22 − = 22 2 ˜ ˜ H is, Cll ω, ω, ω3 γsp Hlp ω3 HP(p)(ω)HP(p)(ω) , p=1 −Φ (32) Hˆ (ω)e jMw (0)P = H(ω), (39) where P(p) represents the unknown column permutation. 22 2 × By solving (32), we get an estimate of the γ |H (ω3)| in where M is a 3 3 diagonal matrix with integer elements that sp lp Φ 22| |2 = represents the linear phase ambiguity, (0) represents the re- some permuted order, where each γsp Hlp(ω3) , p 1, 2, 3, ˜ maining constant phase ambiguity or the complex scaling. is associated with a column vector HP(p)(ω). By sorting the As a summary, the computation of Hˆ (ω) is carried out in ˜ columns of Hp(ω) according to a predefined order of the es- the following steps: 22| |2 timated γsp Hlp(ω3) , which is the same for all ω’s, we can achieve a constant permutation. 
Up to this point, we have (S1) estimate the cross-power spectrum matrix PX (ω)of the estimate H¯ (ω) of the system transfer function with only the received signal vector x(t); phase ambiguity and constant permutation, that is, (S2) estimate the fourth-order cross-polyspectra slices 40 31 22 − Cijkl(ω, ω, ω), Cijkl(ω, ω, ω), and Cijkl(ω, ω, ω) of the Φ H¯ (ω)e j (ω)P = H(ω). (33) received signal vector x(t) using the indirect class method [14]. Then construct the matrix equation (21) 492 EURASIP Journal on Applied Signal Processing

| | for complex system. For real system, construct matrix H11(ω)/H12(ω) equation (22) and use the method proposed in [6] 4 3 or [15] to estimate the roots of the polynomials con- 2 structed by the columns of H(ω) at each discrete fre- 1 ˘ quency ω. At this stage we get the estimate H(ω); 0 (S3) solve for the frequency dependent scaling based on 0123456 the estimated cross-power spectrum matrix PX (ω)by | | solving (27). This step yields H˜ (ω); H21(ω)/H22(ω) 2 (S4) solve for the frequency dependent permutation 1.5 ambiguity using the cross-polyspectra matrix 1 22 − ¯ Cll (ω, ω, ω3)in(32), to obtain H(ω); 0.5 (S5) compute the phase ambiguity using (37)(see[16, 17] 0 for more details). At this stage, we have the estimate 0123456 Hˆ (ω) of the system transfer function up to a constant |H31(ω)/H32(ω)| permutation, complex scaling, and linear phase ambi- 15 guity; 10 (S6) estimate the time domain impulse response H(l) using inverse Fourier transform of Hˆ (ω). 5 0 0123456 5. SIMULATION RESULTS time t In this section, we provide two simulation examples to demonstrate the feasibility of the proposed algorithm. Mat- Figure 1: The absolute value of the estimated roots of the polyno- mials constructed by the columns of H˘ (ω) (true: dotted line; esti- lab code for the simulation example can be found at mation: solid line). DFT length F = 128. http://www.ece.drexel.edu/CSPL/. To demonstrate the feasibility of the approach, we first provide an example based on true cumulants, rather than cu- mulant estimates and true power spectrum matrix. the frequencies 0 and π, the magnitude estimation will be al- most perfect. Figure 3 shows the phase estimation result us- Example 1. The impulse response matrix H(l)istakentobe ing the proposed approach. Since the phase is estimated by a a2× 3 nonminimum phase system with transfer function recursive equation (see (37)), the errors at frequencies 0 and π tend to accumulate. 
However, simulations showed that the = − −1 − −2 − −3 − −4 H11(z) 1 0.6879z 0.8976z 0.6126z 0.1318z , phase estimate can be improved greatly if the interpolated − − − − version of the system estimate is used. H (z) = 1 − 0.7137z 1 − 1.5079z 2 +1.6471z 3 − 1.2443z 4, 12 The time-domain impulse response h(l)wasestimated −1 −2 −3 −4 H13(z) = 1+2.1911z +1.7313z − 0.1818z − 0.2214z , by taking the inverse Fourier transform of the estimated channel frequency domain response Hˆ (ω). Since we do not = − −1 − −2 −3 − −4 H21(z) 1 1.0191z 1.5532z +1.5117z 0.7217z , know the channel order L in advance, we can use an overes- H (z) = 1+2.2149z−1 +1.0828z−2 − 1.1731z−3 − 0.8069z−4, timated channel order Le to truncate the estimated impulse 22 response using the inverse Fourier transform. In this exper- −1 −2 −3 −4 H23(z) = 1 − 1.5537z − 0.0363z +0.5847z +0.5093z . iment, the extended channel order was taken to be Le = 11. (40) For comparison purpose, proper alignment and scaling with the true impulse response was performed. Figure 4 shows the The length of the DFT used in the computation of the estimated channel response estimation. cross-power spectrum and fourth-order cross-polyspectra Numerical simulations also showed that the proposed was taken to be F = 128. Figure 1 shows the absolute value methods work well with complex MIMO systems with 3- of the estimated roots of the polynomials constructed by the input 2-output given true cumulants. When it comes to ap- columns of H(ω) at each discrete frequency. As it can be seen plying the same approach to cumulant estimates, the result is in Figure 1, the estimation is good at most frequencies, except rather sensitive to estimation errors. Errors are mainly caused at frequencies 0 and π, where errors occur in the estimation by the rooting step in obtaining an initial estimate H˘ (ω). The of roots. 
These errors occur due to the failure of the proposed reason is that the algorithm in [6] which we used to get the method as mentioned in Section 4. initial estimate, is rather sensitive to cumulant estimation er- Figure 2 shows the computed frequency domain magni- rors and has local minima. Unfortunately, no better method tude estimation after the frequency dependent scaling and exists at this time. permutation ambiguity have been recovered by the proposed In the following, an example using cumulant estimates is approach. We can see that there are errors at certain frequen- given. cies due to errors in the estimation of the roots. Simulations show that after the interpolation of the system estimate at Example 2. The impulse response matrix H(l)istakentobe Blind Identification of Convolutive MIMO Systems with 3 Sources and 2 Sensors 493

Figure 2: Frequency domain magnitude estimation (true: dotted line; estimation: solid line). DFT length F = 128. [Six magnitude panels plotted against Frequency (ω).]
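The magnitude curves in Figure 2 rely on the scaling step (27). A minimal sketch of that step at a single frequency is given below. It assumes that (27) holds exactly, that is, $P_X = HH^H$, and it solves for the three unknowns $|\lambda_p|^2$ by real-valued least squares. The function name `estimate_scaling`, the random test channel, and the chosen scalings are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def estimate_scaling(P_x, H_breve):
    """Least-squares solve of P_x = sum_p lam_p^2 * h_p h_p^H for lam_p >= 0."""
    n = H_breve.shape[1]
    # Column p of A is vec(h_p h_p^H), where h_p is the p-th column of H_breve.
    A = np.stack(
        [np.outer(H_breve[:, p], H_breve[:, p].conj()).ravel() for p in range(n)],
        axis=1,
    )
    # Stack real and imaginary parts so the unknowns lam_p^2 are real numbers.
    A_real = np.vstack([A.real, A.imag])
    b_real = np.concatenate([P_x.ravel().real, P_x.ravel().imag])
    lam_sq, *_ = np.linalg.lstsq(A_real, b_real, rcond=None)
    return np.sqrt(np.clip(lam_sq, 0.0, None))

# Hypothetical 2 x 3 channel at one frequency (illustration only).
rng = np.random.default_rng(0)
H = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
lam_true = np.array([0.5, 2.0, 1.3])
H_breve = H / lam_true          # column p divided by lam_p, so H_breve @ diag(lam) = H
P_x = H @ H.conj().T            # ideal cross power spectrum matrix, as in (27)
lam_hat = estimate_scaling(P_x, H_breve)
```

Because $P_X$ is unaffected by column phases and column order, this step can be run before the phase and permutation ambiguities are resolved, which matches the order of steps (S3) to (S5).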

Figure 3: Frequency domain phase estimation (true: dotted line; estimation: solid line). DFT length F = 128. [Six panels titled "Phase estimation", plotted against Frequency (ω).]
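The phase curves in Figure 3 come from the recursion (37). The sketch below runs that recursion for one diagonal entry on a DFT grid, assuming the shift $\alpha$ equals one frequency bin and that phases are kept unwrapped. The function name `recover_phase` and the synthetic data are assumptions for illustration; the actual solution procedure is the one in [16, 17]. The second call shows that ignoring the unknown constant $\Theta$ leaves only a linear phase error, in line with the ambiguity stated after (38).

```python
import numpy as np

def recover_phase(psi, theta=0.0, phi0=0.0):
    """Run Phi[k+1] = Phi[k] + theta - psi[k] over one period of the DFT grid."""
    steps = theta - np.asarray(psi, dtype=float)
    phi = np.empty(len(steps))
    phi[0] = phi0
    phi[1:] = phi0 + np.cumsum(steps[:-1])
    return phi

# Synthetic check (hypothetical values): build psi from a known phase sequence.
F = 16
rng = np.random.default_rng(1)
phi_true = rng.uniform(-1.0, 1.0, F)
theta = 0.3
psi = theta - (np.roll(phi_true, -1) - phi_true)   # rearranged form of (37)

phi_hat = recover_phase(psi, theta=theta, phi0=phi_true[0])
phi_hat_no_theta = recover_phase(psi, theta=0.0, phi0=phi_true[0])
linear_error = phi_true - phi_hat_no_theta          # equals theta * k
```

Since each value depends on all previous ones, an error at a single frequency such as $0$ or $\pi$ propagates to every later frequency, which is the accumulation effect the text describes.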

Figure 4: Impulse response estimation of extended channel order (true: dotted line and circle; estimation: solid line and star). True channel order L = 5, extended channel order Le = 11. [Six panels: h11(n), h12(n), h13(n), h21(n), h22(n), h23(n).]
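The impulse responses in Figure 4 are obtained by an inverse DFT followed by truncation to $L_e$ taps, as in step (S6). The sketch below illustrates this with the taps of $H_{11}(z)$ from (40), using the exact frequency response as a stand-in for the estimate $\hat{H}(\omega)$. The 3-sample delay is a hypothetical value chosen to mimic the linear phase ambiguity, and the peak-based alignment is only one simple way to realize the "proper alignment" mentioned in the text.

```python
import numpy as np

F = 128      # DFT length used in Example 1
L_e = 11     # overestimated channel order used for truncation
h11 = np.array([1.0, -0.6879, -0.8976, -0.6126, -0.1318])   # taps of H11(z) in (40)

# Stand-in for the estimated frequency response on the DFT grid.
H11 = np.fft.fft(h11, n=F)

# Inverse DFT and truncation to L_e taps (imaginary part is ~0 for a real channel).
h_est = np.fft.ifft(H11).real[:L_e]

# A linear phase exp(-j*omega*d) in frequency is a delay of d samples in time.
omega = 2 * np.pi * np.arange(F) / F
d = 3                                        # hypothetical delay ambiguity
h_delayed = np.fft.ifft(H11 * np.exp(-1j * omega * d)).real[:L_e]

# Simple alignment: match the positions of the largest-magnitude taps.
shift = int(np.argmax(np.abs(h_delayed)) - np.argmax(np.abs(h11)))
h_aligned = h_delayed[shift:shift + len(h11)]
```

Choosing $L_e$ larger than the true order leaves room for such a delay, at the cost of a few extra taps that should be close to zero.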

Figure 5: Impulse response estimation of extended channel order (true: dotted line and circle; estimation: solid line and star). True channel order L = 2, extended channel order Le = 5. [Six panels: h11(n), h12(n), h13(n), h21(n), h22(n), h23(n).]

a $2 \times 3$ nonminimum phase system with transfer function

$$
H(z) = \begin{bmatrix}
1 - 1.3537z^{-1} & 1 + 1.9149z^{-1} & 0.4 - 1.8000z^{-1} \\
0.5 + 0.3000z^{-1} & 1 - 0.7137z^{-1} & 1 + 0.4611z^{-1}
\end{bmatrix}. \tag{41}
$$

The inputs $\{s_j(k)\}$, $j = 1, 2, 3$, were mutually independent, zero-mean i.i.d. signals, single-side exponentially distributed, with length $T = 8192$. The cross power spectrum matrix $\hat{P}_x(\omega)$ was estimated using the Blackman-Tukey method [18]. The polyspectra slices used in the algorithm were estimated via the indirect class method [14], and the sample cross-cumulant sequence was windowed by the Kaiser window with parameter 6 [14]. The DFT length in the computation of the cross-power spectrum and fourth-order cross-polyspectra was taken to be $F = 64$.

The extended channel order was taken to be $L_e = 5$. Proper alignment and scaling with the true impulse response was also performed for comparison purposes. Figure 5 shows the estimated channel impulse response.

6. CONCLUSION

We proposed a polyspectra based frequency domain method to show the feasibility of MIMO system identification with more inputs than outputs. The method proposed in [6] and [10] for the instantaneous case was extended to the convolutive case, and the frequency dependent ambiguities inherent to frequency domain methods were resolved using power spectrum and polyspectra matrices. The method was shown to work well when true cumulants were provided, while it was in general sensitive when cumulant estimates were used. No comparisons were provided because no other methods exist for the same problem.

ACKNOWLEDGMENTS

This work was supported by NSF under grant MIP-9553227, and ONR under grant N00014-20-1-0137.

L. De Lathauwer is also affiliated with the group SCD (SISTA) of the E.E. Department (ESAT) of the KULeuven (http://www.esat.kuleuven.ac.be/sista/). Part of this work was supported by the Flemish Government (Research Council KULeuven (GOA-Mefisto-666), FWO (G.0240.99, G.0256.97, Research Communities ICCoS and ANMMM, postdoc grant), Federal State (IUAP IV-02, IUAP V-10-29).

REFERENCES

[1] J. K. Tugnait, "Identification and deconvolution of multichannel linear non-Gaussian processes using higher order statistics and inverse filter criteria," IEEE Trans. Signal Processing, vol. 45, no. 3, pp. 658–672, 1997.
[2] M. Torlak and G. Xu, "Blind multiuser channel estimation in asynchronous CDMA systems," IEEE Trans. Signal Processing, vol. 45, no. 1, pp. 137–147, 1997.
[3] E. Moulines, P. Duhamel, J. F. Cardoso, and S. Mayrargue, "Subspace methods for the blind identification of multichannel FIR filters," IEEE Trans. Signal Processing, vol. 43, no. 2, pp. 516–525, 1995.
[4] P. Comon, "Analyse en composantes indépendantes et identification aveugle," Traitement du Signal, vol. 7, no. 5, pp. 435–450, 1990, French.
[5] V. Capdevielle, C. Serviere, and J. L. Lacoume, "Separation of wide band sources," in Proc. IEEE ATHOS Workshop on Higher-Order Statistics, pp. 66–70, Begur, Spain, June 1995.
[6] P. Comon, "Blind channel identification and extraction of more sources than sensors," in Proc. SPIE Conference, pp. 2–13, San Diego, Calif, USA, July 1998.
[7] P. Comon and O. Grellier, "Non-linear inversion of underdetermined mixtures," in Proc. First International Workshop on Independent Component Analysis and Signal Separation, pp. 461–465, Aussois, France, January 1999.
[8] L. Tong, "Identification of multichannel MA parameters using higher-order statistics," Signal Processing, vol. 53, no. 2, pp. 195–209, 1996, Special Issue on High-Order Statistics.
[9] B. Emile, P. Comon, and J. Le Roux, "Estimation of time delays with fewer sensors than sources," IEEE Trans. Signal Processing, vol. 46, no. 7, pp. 2012–2015, 1998.
[10] L. De Lathauwer, P. Comon, B. De Moor, and J. Vandewalle, "ICA algorithms for 3 sources and 2 sensors," in Proc. IEEE Signal Processing Workshop on Higher-Order Statistics, pp. 116–120, Caesarea, Israel, June 1999.
[11] P. Comon and B. Mourrain, "Decomposition of quantics in sums of powers of linear forms," Signal Processing, vol. 53, no. 2, pp. 93–107, 1996.
[12] C. Serviere and V. Capdevielle, "An identification method of FIR digital filters in frequency domain," in Proc. 1994 European Signal Processing Conference, Edinburgh, Scotland, UK, September 2000.
[13] D. R. Brillinger, Time Series: Data Analysis and Theory, Holden-Day, San Francisco, Calif, USA, Expanded edition, 1981.
[14] C. L. Nikias and A. P. Petropulu, Higher-Order Spectra Analysis, Prentice Hall, Englewood Cliffs, NJ, USA, 1993.
[15] L. De Lathauwer, B. De Moor, and J. Vandewalle, "An algebraic ICA algorithm for 3 sources and 2 sensors," in Proc. 2000 European Signal Processing Conference, pp. 461–465, Tampere, Finland, September 2000.
[16] B. Chen and A. Petropulu, "Blind MIMO system identification based on cross-polyspectra," in Proc. 34th Annual Conference on Information Sciences and Systems, Princeton, NJ, USA, March 2000.
[17] B. Chen and A. P. Petropulu, "Frequency domain MIMO system identification based on second and higher-order statistics," IEEE Trans. Signal Processing, vol. 49, no. 8, pp. 1677–1688, 2001.
[18] S. M. Kay, Modern Spectral Estimation: Theory and Application, Prentice Hall, Englewood Cliffs, NJ, USA, 1988.

Binning Chen was born in Hebei, China. He received his B.S. degree in 1993 from Xidian University, Xi'an, China, and the M.S. and Ph.D. degrees in 1998 and 2001 from Tsinghua University, Beijing, China and Drexel University, Philadelphia, PA, USA. His research interests are in the area of statistical signal processing, MIMO blind system identification with applications to multiuser wireless communications, higher-order statistics, HDTV receiver design. He is currently working at Nxtwave Communications Inc., Langhorne, PA, USA, doing HDTV receiver chip design.

Athina P. Petropulu was born in Kalamata, Greece. She received the Diploma in electrical engineering from the National Technical University of Athens, Greece in 1986, the M.S. degree in electrical and computer engineering in 1988 and the Ph.D. degree in electrical and computer engineering in 1991, both from Northeastern University, Boston, MA, USA. In 1992, she joined the Department of Electrical and Computer Engineering at Drexel University where she is now an Associate Professor. During the academic year 1999/2000 she was an Associate Professor at LSS, CNRS-Université Paris Sud, École Supérieure d'Électricité in France. Dr. Petropulu's research interests span the area of statistical signal processing, communications, higher-order statistics, fractional-order statistics and ultrasound imaging. She is the co-author of the textbook entitled, "Higher-Order Spectra Analysis: A Nonlinear Signal Processing Framework," (Englewood Cliffs, NJ, USA: Prentice-Hall, Inc., 1993). She is the recipient of the 1995 Presidential Faculty Fellow Award. She has served as an associate editor for the IEEE Transactions on Signal Processing and the IEEE Signal Processing Letters. She is a member of the IEEE Conference Board and the IEEE Technical Committee on Signal Processing Theory and Methods.

Lieven De Lathauwer was born in Aalst, Belgium, on November 10, 1969. He obtained the Master degree in electro-mechanical engineering in 1992 and the Doctoral degree in applied sciences in 1997 at the Katholieke Universiteit Leuven. The subject of his Ph.D. thesis was signal processing based on multilinear algebra. He currently holds a permanent research position with the French Centre National de la Recherche Scientifique (CNRS); he also holds an honorary postdoctoral research mandate with the Fund for Scientific Research-Flanders (FWO), affiliated with the KULeuven. His research interests include linear and multilinear algebra, statistical signal and array processing, HOS, ICA and BSS, identification, blind identification and equalization.

EURASIP Journal on Applied Signal Processing 2002:5, 497–506
© 2002 Hindawi Publishing Corporation

Maximum Likelihood Blind Channel Estimation for Space-Time Coding Systems

Hakan A. Çırpan Department of Electrical Engineering, , Avcilar, 34850 Istanbul, Turkey Email: [email protected]

Erdal Panayırcı Department of Electronic Engineering, IŞIK University, Maslak, 80670 Istanbul, Turkey Email: [email protected]

Erdinc Çekli Department of Electrical Engineering, Istanbul University, Avcilar, 34850 Istanbul, Turkey Email: [email protected]

Received 30 May 2001 and in revised form 7 March 2002

Sophisticated signal processing techniques have to be developed for capacity enhancement of future wireless communication systems. In recent years, space-time coding has been proposed to provide significant capacity gains over traditional communication systems in fading wireless channels. Space-time codes are obtained by combining channel coding, modulation, transmit diversity, and optional receive diversity in order to provide diversity at the receiver and coding gain without sacrificing bandwidth. In this paper, we consider the problem of blind estimation of space-time coded signals along with the channel parameters. Both conditional and unconditional maximum likelihood approaches are developed and iterative solutions are proposed. The conditional maximum likelihood algorithm is based on iterative least squares with projection whereas the unconditional maximum likelihood approach is developed by means of finite state Markov process modelling. The performance analysis issues of the proposed methods are studied. Finally, some simulation results are presented.

Keywords and phrases: blind channel estimation, conditional and unconditional maximum likelihood.

1. INTRODUCTION

The rapid growth in demand for a wide range of wireless services is a major driving force to provide high-data rate and high quality wireless access over fading channels [1]. However, wireless transmission is limited by available radio spectrum and impaired by path loss, interference from other users and fading caused by destructive addition of multipath. Therefore, several physical layer related techniques have to be developed for future wireless systems to use the frequency resources as efficiently as possible. One approach that shows real promise for substantial capacity enhancement is the use of diversity techniques [2]. Diversity techniques basically reduce the impact of fading due to multipath transmission and improve interference tolerance, which in turn can be traded for increased capacity of the system. In recent years, the use of an antenna array at the base station for transmit diversity has become increasingly popular, since it is difficult to deploy more than one or two antennas at the portable unit. Transmit diversity techniques make several replicas of the signal available to the receiver with the hope that at least some of them are not severely attenuated. Moreover, the methods of transmitter diversity combined with channel coding have been employed at the transmitter, which is referred to as space-time coding, to introduce temporal and spatial correlation into signals transmitted from different antennas [2, 3]. The basic idea is to reuse the same frequency band simultaneously for parallel transmission channels to increase channel capacity [2, 3].

Unfortunately, employing antenna diversity at the transmitter is particularly challenging, since the signals are combined in space prior to reception. Moreover, estimation of fading channels in space-time systems is further complicated, since the receiver estimates the path gain from each transmit antenna to each receive antenna. It is also important to note that space-time decoding requires multi-channel state information. Thus the achievable diversity gain comes at the price of a proportional increase in the amount of training, which results in efficiency loss, especially in a rapidly varying environment. Clearly, the practical advantages of eliminating

Space-time coder Information Fading Information sink source . . Channel Spatial . channel . Space-time Space-time . . s(k) encoder formatter demodulator decoder s˜(k)

Figure 1: Space-time coding and decoding system. the need for a training sequence numerous. This motivates for these parameter set. Finally, we present some numerical the development of receiver structures with blind channel examples that illustrate the performance of the ML estima- estimation capabilities. There has been considerable work tors in Section 5. reported in the literature on the estimation of channel in- Notations used in this paper are standard. Symbols for formation to improve performance of space-time coded sys- matrices (in capital letter) and vector (lower case) are in tems operating on fading channels [4, 5, 6, 7]. In this paper, boldface. (·)T ,(·)H ,(·)∗,and⊗ denote transpose, Hermitian, we consider the problem of blind estimation of space-time conjugate, and Kronecker product, respectively. The symbol coded signals along with the matrix of path gains. We pro- I stands for identity matrix with proper dimension; θˆ de- pose two different approaches based on the assumptions on notes the estimate of parameter vector θ;and·denotes the input sequences. Our proposed approaches also exploit the 2-norm. the finite alphabet property of the space-time coded sig- nals. We treat both conditional and unconditional maximum 2. SYSTEM MODEL likelihood (ML) approaches. The first approach (conditional ML) results in joint estimation of the channel matrix and the In the sequel, we consider a mobile communication system input sequences, and is based on the iterative least squares equipped with n transmit antennas and optional m receive and projection [8]. The second approach, which is known as antennas. A general block diagram for the systems of interest unconditional ML, treats the input sequence as stochastic in- is depicted in Figure 1. In this system, the source generates bit dependent identically distributed (i.i.d.) sequences. 
In con- sequence s(k), which are encoded by an error control code to trast, the unconditional ML approach formulates the blind produce codewords. The encoded data are parsed among n estimation problem in discrete-time finite state Markov pro- transmit antennas and then mapped by the modulator into cess framework [9, 10, 11]. Since the proposed algorithms discrete complex-valued constellation points for transmis- obtain ML estimates of channel matrix and the space-time sion across channel. The modulated streams for all antennas coded signals, they enjoy many attractive properties of the are transmitted simultaneously. At the receiver, there are m ML estimator including consistency and asymptotic normal- receive antennas to collect the transmissions. Spatial channel ity. Moreover, it is asymptotically unbiased and its error co- link between each transmit and receive antenna is assumed variance approaches Cramer-Rao´ lower bound (CRB). to experience statistically independent fading. The performance of the proposed ML approaches are ex- The signals at each receive antenna is a noisy superposi- plored based on the evaluation of CRB. The CRB is a well- tion of the faded versions of the n transmitted signals. The known statistical tool that provides benchmarks for evalu- constellation points are scaled by a factor of Es, so that the ating the performance of actual estimators. For the condi- average energy of transmitted symbols is 1. Then we have tional estimator, the CRB derived in [12], is adapted to the the following complex base-band equivalent received signal present scenario. In unconditional case, since, the computa- at receive antenna j: tion of the exact CRB is analytically intractable, some alter- n native methods must therefore be considered for simplifying rj (k) = αi,j (k)ci(k)+nj (k), (1) CRB calculation [13]. 
The derivation technique used for un- i=1 conditional ML have the advantage of eliminating the need to where αi,j (k) is the complex path gain from transmit antenna evaluate computationally intractable averaging over all pos- i to receive antenna j, ci(k) is the coded symbol transmitted sible input sequences. However, it provides a looser bound from antenna i at time k, nj (k) is the additive white Gaussian which is not as tight as the exact CRB, but it is computation- noise sample for receive antenna j at time k. ally easier to evaluate. Equation (1) can be written in a matrix form as The outline of the paper is as follows. In Section 2,we describe a basic model for a communication system that em- r(k) = Ω(k)c(k)+n(k), (2) ploys space-time coding with n transmit and m receive an- T m×1 tennas. In Section 3, we derive both conditional and uncon- where r(k) = [r1(k),...,rm(k)] ∈ C is the received signal T n×1 ditional ML estimators for the blind estimation of space-time vector, c(k) = [c1(k),...,cn(k)] ∈ C is the code vector coded signals along with the channel matrix. In Section 4, transmitted from the n transmit antennas at time k, n(k) = T m×1 we develop CRB for the covariance of the estimation er- [n1(k),...,nm(k)] ∈ C is the noise vector at the receive rors for the achievable variance of any unbiased estimator antennas, and Ω(k) ∈ Cm×n is the fading channel gain matrix Maximum Likelihood Blind Channel Estimation for Space-Time Coding Systems 499 given as 3.1. Conditional ML approach   In this section, an ML approach is developed under (AS1), α1,1(k) ··· α1,n(k)   (AS2), (AS3), and the conditional signal model assumption. Ω = . . (k)  . ··· .  . (3) The log-likelihood function is then given by αm,1(k) ··· αm,n(k) 1 L   ᏸ = −const − mL log σ2 − r(k) − Ωc(k)2. 
$\sigma^2$, $k = 1$ (6)

We impose the following assumptions on model (2) for the rest of the paper:

(AS1) the coded symbol $c_i(k)$ takes finite complex values;

(AS2) the noise vector $\mathbf{n}(k) = [n_1(k), \ldots, n_m(k)]^T$ is Gaussian distributed with zero mean and

$$E\big[\mathbf{n}(k)\mathbf{n}^H(l)\big] = \sigma^2 \mathbf{I}\,\delta_{k,l}, \qquad E\big[\mathbf{n}(k)\mathbf{n}^T(l)\big] = \mathbf{0}, \tag{4}$$

where $E$ denotes the expectation operator and $\delta_{k,l}$ is the Kronecker delta ($\delta_{k,l} = 1$ if $k = l$ and 0 otherwise). Thus $\mathbf{n}(k)$ is assumed to be uncorrelated both temporally and spatially;

(AS3) the fading channel is assumed to be quasi-static flat fading, so that during the transmission of $L$ codeword symbols across any one of the links, the complex path gains do not change with time $k$, but are independent from one codeword transmission to the next, that is,

$$\alpha_{i,j}(k) = \alpha_{i,j}, \quad k = 1, 2, \ldots, L. \tag{5}$$

The problem of estimating the matrix of path gains along with the space-time coded signals from the noisy observations $\mathbf{r}(L) = [\mathbf{r}^T(1), \ldots, \mathbf{r}^T(L)]^T$ is the main concern of the paper. The traditional solution to this problem is to first estimate $\theta = [\Omega, \sigma^2]$ from a training sequence embedded in the input signal, and then use these estimates as if they were the true parameters to obtain estimates of the input sequence. As an alternative, we propose blind ML approaches based on the finite alphabet property of the space-time coded signals. We derive the ML cost functions for the proposed approaches in the next section.

3. ML ESTIMATION

Regarding the input sequence, two different assumptions can be considered: (i) the conditional model, which assumes the input sequences to be deterministic unknown parameters, and (ii) the unconditional model, which assumes the input sequences to be stochastic processes. These two signal models lead to corresponding ML solutions. In the first approach, the input sequences are treated as unknown but deterministic quantities, so they are part of the set of unknown parameters. The number of unknown parameters in the deterministic case grows with the number of observations, which usually results in inconsistent estimates. In contrast, under the unconditional signal model, the input sequences are treated as random quantities and are not included in the parameter set. As a result, the number of unknown parameters is fixed and it is therefore possible to obtain consistent estimates. We now develop the corresponding ML estimation algorithms.

The conditional ML estimate can be obtained by jointly maximizing $\mathcal{L}$ over the unknown parameters $\Omega$ and $\mathbf{c}(L) = [\mathbf{c}^T(1), \ldots, \mathbf{c}^T(L)]^T$. After neglecting unnecessary terms, conditional ML yields the following minimization problem:

$$\min_{\Omega,\,\mathbf{c}(L)} \big\| \mathbf{r}(L) - \Omega\,\mathbf{c}(L) \big\|^2. \tag{7}$$

Since the elements of $\mathbf{c}(L)$ are restricted to a finite alphabet, (7) is a nonlinear separable optimization problem with mixed integer and continuous variables. Typically, the minimization problem in (7) is solved in two steps by alternately minimizing with respect to $\Omega$ and $\mathbf{c}(L)$ while keeping the other parameters fixed. First, we minimize (7) with respect to $\Omega$ by the least squares solution. Then we substitute $\hat{\Omega}$ back into (7) and solve it for $\mathbf{c}(L)$. The ML estimate of $\mathbf{c}(L)$ in the second step can be obtained by enumeration. However, this search is computationally very demanding, since the number of possible $\mathbf{c}(L)$ matrices that need to be checked grows exponentially with both $L$ and $n$. Iterative approaches therefore attempt to solve this problem with lower computational complexity.

We now adopt a block conditional ML algorithm that has a lower computational complexity [8]. The proposed algorithm is based on iterative least squares and projection (ILSP). It takes advantage of the ML estimator being separable in its continuous and integer variables. Note that the dimension of the channel gain matrix $\Omega$ is chosen to satisfy $n \le m$ for this particular approach.

Given an initial estimate $\hat{\Omega}$ of $\Omega$, the minimization of (7) with respect to $\mathbf{c}(L)$ is a least squares problem that can be solved in closed form. Each element of the solution is rounded off to its closest discrete value (coded MPSK signals). Then a better estimate of $\Omega$ is obtained by minimizing (7) with respect to $\Omega$, keeping $\hat{\mathbf{c}}(L)$ fixed. This minimization is also a least squares problem. The process continues until $\Omega$ converges. In practice, we can stop when the difference $\|\Omega_i - \Omega_{i-1}\|$ is within a threshold $\epsilon$.

The following steps summarize the conditional ML algorithm:

(0) Start with an initial estimate $\Omega_0$, $i = 0$.
(1) $i = i + 1$
    • $\mathbf{c}_i(L) = \big(\Omega_{i-1}^{*}\Omega_{i-1}\big)^{-1}\Omega_{i-1}^{*}\,\mathbf{r}(L)$.
    • Project each element of $\mathbf{c}_i(L)$ to its closest discrete value.
    • $\Omega_i = \mathbf{r}(L)\,\mathbf{c}_i^{*}(L)\big(\mathbf{c}_i(L)\,\mathbf{c}_i^{*}(L)\big)^{-1}$.
(2) Continue until $\|\Omega_i - \Omega_{i-1}\| \le \epsilon$.

Clearly, due to the nonlinear operation of projecting $\mathbf{c}_i(L)$ to its closest discrete values, convergence is not guaranteed.
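The ILSP iteration summarized above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the received block is taken to be an $m \times L$ matrix `R`, the symbols are assumed to be unit-modulus MPSK points, and the names `project_mpsk`, `ilsp`, `eps`, and `max_iter` are hypothetical.

```python
import numpy as np

def project_mpsk(x, M=4):
    """Project each complex entry onto the nearest unit-modulus M-PSK point."""
    idx = np.round(np.angle(x) * M / (2 * np.pi)) % M
    return np.exp(2j * np.pi * idx / M)

def ilsp(R, omega0, M=4, eps=1e-6, max_iter=50):
    """Iterative least squares with projection (sketch).

    R      : m x L received block (assumed layout)
    omega0 : m x n initial channel estimate, n <= m
    Returns the channel estimate, the projected symbols, and the iteration count.
    """
    omega = omega0.astype(complex)
    for it in range(1, max_iter + 1):
        # Least-squares symbols for a fixed channel, then finite-alphabet projection.
        C = project_mpsk(np.linalg.pinv(omega) @ R, M)
        # Least-squares channel for fixed symbols: R C^H (C C^H)^{-1}.
        omega_new = R @ np.linalg.pinv(C)
        converged = np.linalg.norm(omega_new - omega) <= eps
        omega = omega_new
        if converged:
            break
    return omega, C, it
```

The pseudo-inverse is used for both least-squares steps; for a full-rank channel and symbol block it coincides with the closed-form expressions in step (1). As the text notes, the projection makes the iteration sensitive to the starting point, so `omega0` would normally come from a suboptimal initial estimator.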

However, a sufficiently good initialization provided by suboptimal techniques improves the possibility of global convergence and also reduces the number of iterations required.

3.2. Unconditional ML approach

Under (AS2), (AS3), and the signal model (2), we can formulate the probability density function of the received vector $\mathbf{r}$ (given $\mathbf{u}$) as

$$f_\theta(\mathbf{r} \mid \mathbf{u}) = \frac{1}{(\pi\sigma^2)^{mL}} \prod_{k=1}^{L} \exp\left( -\frac{\big\|\mathbf{r}(k) - \Omega\,\mathbf{g}(\mathbf{u}(k))\big\|^2}{\sigma^2} \right), \tag{8}$$

where $\mathbf{g}(\cdot)$ is the same nonlinear mapping that describes the channel coder, spatial formatter, and modulator, and $\mathbf{u}(k)$ is the input sequence influencing the space-time coded symbols.

In general, trying to estimate $\theta$ and $\mathbf{u}$ jointly from (8) is computationally demanding except for a small data alphabet and a short data record. The goal is therefore to obtain a cost function that depends only on $\theta$; in this way the two-step least squares procedures for blind ML estimation can be avoided. To this end, we consider an unconditional signal model and compute the corresponding ML cost function via the expectation of the conditional ML function with respect to the statistics of the input sequences,

$$f_\theta(\mathbf{r}) = E_{\mathbf{u}}\big[ f_\theta(\mathbf{r} \mid \mathbf{u}) \big]. \tag{9}$$

However, the expectation $E_{\mathbf{u}}$ in (9) leads to a complicated cost function whose maximization is computationally demanding. At this point, we modify (AS1) for the unconditional case as follows:

(AS1u) the information sequence $s(k)$ is an i.i.d. sequence taking equiprobable finite values.

If we exploit assumption (AS1u) on the input sequence and use the conditional ML function (8), we obtain the unconditional ML function for the problem at hand as

$$f_\theta(\mathbf{r}) = \frac{1}{2^{(l+t-1)}(\pi\sigma^2)^{mL}} \prod_{k=1}^{L} \sum_{p=1}^{2^{(l+t-1)}} \exp\left( -\frac{\big\|\mathbf{r}(k) - \Omega\,\mathbf{g}(\zeta_p)\big\|^2}{\sigma^2} \right), \tag{10}$$

where $\zeta_p = [s(lk + l - 1), \ldots, s(lk - t)]^T$ is the input vector influencing the coded symbols at time $k$, $t$ is the number of memory elements in the encoder, and $l = \log_2 M$ is the block length of the information bits that are transmitted (if we restrict ourselves to MPSK). Since each element of $\zeta_p$ takes on 2 possible values, $2^{(l+t-1)}$ is the number of all possible binary vectors of length $(l + t - 1)$.

The log-likelihood function for the unconditional signal model is then given by

$$\mathcal{L}(\theta) = \sum_{k=1}^{L} \log \sum_{p=1}^{2^{(l+t-1)}} \exp\left( -\frac{\big\|\mathbf{r}(k) - \Omega\,\mathbf{g}(\zeta_p)\big\|^2}{\sigma^2} \right) + \text{constant}, \tag{11}$$

and the unconditional ML estimate of $\theta$ is the global maximizer of $\mathcal{L}(\theta)$. Unfortunately, the existence of a globally convergent algorithm for this nonlinear cost function is unlikely. Moreover, the direct maximization of (11) is still a computationally demanding nonlinear optimization problem. In finding the ML estimator, it is quite common to resort to numerical maximization techniques such as the Newton-Raphson and scoring methods. However, these methods may suffer from convergence problems. As an alternative, the problem can be cast in a finite-state Markov chain framework by employing the Baum-Welch algorithm, which reduces the computational burden significantly. The Baum-Welch algorithm, although iterative in nature, is guaranteed under certain mild conditions to converge and, at convergence, to produce a local maximum.

In the sequel, we exploit the finite-state Markov process modelling property of the space-time coded signals and employ the associated estimation algorithm to provide a computationally efficient solution to the resulting optimization problem. We first introduce the unconditional ML framework based on finite-state Markov process modelling.

3.2.1 Function of a Markov chain

Many important problems in digital communications, such as inter-symbol interference and partial response signalling, can be modelled by means of a finite-state Markov process with unknown parameters observed in independent noise [10, 11]. Based on (AS1u), the codeword produced by the channel encoder in the space-time coder can be characterized as a finite-state Markov process. Moreover, the received signal vector at an antenna array in the presence of spatial formatting, fading channel, and noise can also be viewed as a stochastic process (function of a Markov chain) that has an underlying Markovian finite-state structure.

The space-time coder is characterized by a memory of length $t$ and a $2^{(l+t-1)}$ state trellis, where the state $\zeta(k)$ at time $k$ labels the coder memory $(s(lk + l - 2), \ldots, s(lk - t))$,

$$\zeta(k) \in \Pi = \big\{ \tau_p,\ p = 1, \ldots, 2^{(l+t-2)} \big\}. \tag{12}$$

The transition from state $\zeta(k)$ to $\zeta(k+1)$ is represented on the trellis by a branch denoted by the vector

$$\phi(k) = \big[ s(lk + l - 1), \ldots, s(lk - t) \big]^T \tag{13}$$

and $\phi(k) \in \Phi = \{\xi_n,\ n = 1, \ldots, 2^{(l+t-1)}\}$. Then both the $\{\zeta(k)\}$ sequence and the $\{\phi(k)\}$ sequence form first-order finite Markov chains, that is,

$$\Pr\big[\phi(k) = \xi_n\big] = \Pr\big[\zeta(k) = \tau_q,\ \zeta(k-1) = \tau_s\big] \tag{14}$$

for some $q$, $s$ depending on $k$.

The observation vector $\mathbf{r}(k)$ can therefore be modelled as a probabilistic function of the Markov chain. In the received signal model, the unknown channel matrix $\Omega$ enters in a linear way, while the nonlinear part of the function $\mathbf{g}(\cdot)$ is due to the space-time coder and is known. Let $\mathbf{g}(\xi_n)$ denote the space-time coder output corresponding to the event $\phi(k) = \xi_n$. The sample $\phi(k) = \xi_n$ is a realization of the complex random sample $\mathbf{g}(\phi(k))$, which takes $2^{(l+t-1)}$ possible values depending on $\phi(k) = \xi_n$. Moreover, every realization of a sequence of symbols corresponds to a sequence of branches $\{x_k\}$ of length $L$, given as

$$\mathcal{X} = \{x_1, \ldots, x_L\}, \quad \mathcal{X} \in \Xi, \quad |\Xi| = 2^{L(l+t-1)}. \tag{15}$$

The underlying Markovian structure of our signal model can then be characterized by the following model parameters:

(i) $\Pr[\zeta(k) = \tau_q \mid \zeta(k-1) = \tau_s]$ is a predetermined transition probability. If no information about the transmitted sequence is available, all permissible state transitions have the same probability, that is, $\Pr[\zeta(k) = \tau_q \mid \zeta(k-1) = \tau_s] = 1/2^{(l+t-1)}$ if state $\tau_s$ leads to state $\tau_q$;

(ii) $\hat{\pi}(0) = [\hat{\pi}_1(0), \ldots, \hat{\pi}_{2^{(l+t-1)}}(0)]$ is the initial state probability vector. If no assumption on the starting bits is made, the initial probability is the same for all states;

(iii) the conditional density $f(\mathbf{r}(k) \mid \zeta(k) = \tau_q, \zeta(k-1) = \tau_s) = f(\mathbf{r}(k) \mid \phi(k) = \xi_n)$ is that of a complex Gaussian random vector with mean $\Omega\,\mathbf{g}(\xi_n)$ and variance $\sigma^2$.

Since the state transition probability and the initial state probability vector are predetermined, the only model parameter of the Markov chain left to be estimated is $f(\mathbf{r}(k) \mid \phi(k) = \xi_n)$ for the current model. We therefore devise the Baum-Welch algorithm to estimate the Markov chain model parameter (iii) or, equivalently, to estimate $\theta$.

3.2.2 Baum-Welch algorithm

The Baum-Welch algorithm is a commonly used iterative technique for estimating the parameters of a probabilistic function of a Markov chain. It maximizes an auxiliary function related to the Kullback-Leibler information measure instead of the likelihood function [9]. The auxiliary function is defined as a function of two sets of parameters $\theta$, $\theta'$:

$$Q(\theta, \theta') = \sum_{\mathcal{X} \in \Xi} f_\theta(\mathbf{r}, \mathcal{X}) \log f_{\theta'}(\mathbf{r}, \mathcal{X}), \tag{16}$$

where $f_\theta(\mathbf{r}, \mathcal{X})$ represents the conditional likelihood, given a particular branch sequence $\mathcal{X}$, weighted by $\Pr[\mathcal{X}]$, the a priori probability of $\mathcal{X}$ (e.g., [10]).

The theorem that forms the basis for the Baum-Welch algorithm explains why the Kullback-Leibler information measure can be used instead of the average likelihood.

Theorem 1. The maximization of $Q(\theta, \theta')$ leads to increased likelihood, that is, $Q(\theta, \theta') \ge Q(\theta, \theta) \Rightarrow f_{\theta'}(\mathbf{r}) \ge f_\theta(\mathbf{r})$.

For the proof of the theorem, see [9].

To obtain the explicit form of the auxiliary function for the current problem, we start with

$$\log f_{\theta'}(\mathbf{r}, \mathcal{X}) = \log \Pr[\mathcal{X}] + \log f_{\theta'}(\mathbf{r} \mid \mathcal{X}). \tag{17}$$

Since the sequences $\mathcal{X}$ have equal probability, the first term $\log \Pr[\mathcal{X}]$ is constant. For the second term, we use the fact that the noise samples are independent and obtain

$$\sum_{k=1}^{L} \log f_{\theta'}\big(\mathbf{r}(k), x_k\big) = \sum_{k=1}^{L} \sum_{p=1}^{2^{(l+t-1)}} \log f_{\theta'}\big(\mathbf{r}(k), x_k = \xi_p\big)\, \delta(x_k, \xi_p), \tag{18}$$

where $\delta(x_k, \xi_p) = 1$ when $x_k = \xi_p$ and 0 otherwise, and

$$\log f_{\theta'}\big(\mathbf{r}(k), x_k = \xi_p\big) = -\frac{1}{\sigma'^2} \big\|\mathbf{r}(k) - \Omega'\,\mathbf{g}(\xi_p)\big\|^2 - \log \sigma'^2. \tag{19}$$

Substitution of (18) in (16) yields

$$Q\big(\theta^{(i)}, \theta'\big) = C + \sum_{k=1}^{L} \sum_{p=1}^{2^{(l+t-1)}} \left[ -\frac{1}{\sigma'^2} \big\|\mathbf{r}(k) - \Omega'\,\mathbf{g}(\xi_p)\big\|^2 - \log \sigma'^2 \right] \sum_{\mathcal{X} \in \Xi} f_{\theta^{(i)}}(\mathbf{r}, \mathcal{X})\, \delta(x_k, \xi_p). \tag{20}$$

It was shown in [10] that the sum over $\Xi$ is equal to $f_{\theta^{(i)}}(\mathbf{r}, \phi(k) = \xi_p)$. We thus have

$$Q\big(\theta^{(i)}, \theta'\big) = C + \sum_{k=1}^{L} \sum_{p=1}^{2^{(l+t-1)}} f_{\theta^{(i)}}\big(\mathbf{r}, \phi(k) = \xi_p\big) \left[ -\frac{1}{\sigma'^2} \big\|\mathbf{r}(k) - \Omega'\,\mathbf{g}(\xi_p)\big\|^2 - \log \sigma'^2 \right], \tag{21}$$

where $\theta^{(i)}$ is the old parameter estimate obtained at the $i$th iteration, $\theta' = [\Omega', \sigma'^2]$ is the new parameter set to be estimated at the $(i+1)$th iteration, and $f_{\theta^{(i)}}(\mathbf{r}, \phi(k) = \xi_p)$ is the weighted conditional likelihood. The direct computation of the weighted conditional likelihood is computationally intensive. Fortunately, there exist recursive procedures (called forward and backward procedures) for computing $f_{\theta^{(i)}}(\mathbf{r}, \phi(k) = \xi_p)$ whose complexity increases only linearly with the data length $L$ [9].

The following explicit expression for the array response matrix is obtained from $\partial Q / \partial \Omega' = 0$:

$$\Omega^{(i+1)} = \left[ \sum_{k=1}^{L} \sum_{p=1}^{2^{(l+t-1)}} f_{\theta^{(i)}}\big(\mathbf{r}, \phi(k) = \xi_p\big)\, \mathbf{r}(k)\, \mathbf{g}^H(\xi_p) \right] \left[ \sum_{k=1}^{L} \sum_{p=1}^{2^{(l+t-1)}} f_{\theta^{(i)}}\big(\mathbf{r}, \phi(k) = \xi_p\big)\, \mathbf{g}(\xi_p)\, \mathbf{g}^H(\xi_p) \right]^{-1}. \tag{22}$$
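As a rough illustration of the channel update in (22), the weighted least squares step can be written with matrix products once the weights are available. The sketch below assumes that the weights $f_{\theta^{(i)}}(\mathbf{r}, \phi(k) = \xi_p)$ have already been computed (for example by a forward-backward recursion, which is not shown) and are stored in an $L \times P$ array `W`; the function name `update_channel` and the array layouts are hypothetical.

```python
import numpy as np

def update_channel(R, Gxi, W):
    """Weighted least-squares channel update in the spirit of (22) (sketch).

    R   : m x L received block, column k is r(k)
    Gxi : n x P matrix whose column p is the coder output g(xi_p)
    W   : L x P nonnegative weights for the events phi(k) = xi_p
    """
    # sum_k sum_p W[k, p] r(k) g(xi_p)^H
    num = R @ W @ Gxi.conj().T
    # sum_k sum_p W[k, p] g(xi_p) g(xi_p)^H; the sum over k collapses to column sums of W
    den = (Gxi * W.sum(axis=0)) @ Gxi.conj().T
    return num @ np.linalg.inv(den)
```

With one-hot weights and noise-free data the update returns the true channel, which is a convenient sanity check before plugging in soft weights.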

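Equation (22) follows from the definition of the partial derivative with respect to a complex quantity (see, e.g., [14])

$$\left( \frac{\partial Q}{\partial \Omega'} \right)_{ij} = \frac{1}{2} \left[ \frac{\partial Q}{\partial\, \mathrm{Re}(\Omega')_{ij}} + j\, \frac{\partial Q}{\partial\, \mathrm{Im}(\Omega')_{ij}} \right], \tag{23}$$

where $\Omega'_{ij}$ is the $ij$th element of $\Omega'$.

From $\partial Q / \partial \sigma'^2 = 0$, the iterative estimation formula can also be derived for the noise variance:

$$\sigma'^2 = \frac{\displaystyle \sum_{k=1}^{L} \sum_{p=1}^{2^{(l+t-1)}} f_{\theta^{(i)}}\big(\mathbf{r}, \phi(k) = \xi_p\big)\, \big\|\mathbf{r}(k) - \Omega'\,\mathbf{g}(\xi_p)\big\|^2}{\displaystyle \sum_{k=1}^{L} \sum_{p=1}^{2^{(l+t-1)}} f_{\theta^{(i)}}\big(\mathbf{r}, \phi(k) = \xi_p\big)}. \tag{24}$$

Based on these results, the steps of the proposed unconditional ML algorithm are summarized as follows:

Set the parameters to some initial value $\theta^{(0)} = (\Omega^{(0)}, \sigma^{2(0)})$.
(1) Compute the forward and backward variables to obtain $f_{\theta^{(i)}}(\mathbf{r}, \phi(k) = \xi_p)$.
(2) Compute $\Omega^{(i+1)}$ from (22).
(3) Compute $\sigma^{2(i+1)}$ from (24).
(4) Repeat steps (1)–(3) until $\|\theta^{(i+1)} - \theta^{(i)}\| < \epsilon$, where $\epsilon$ is a predefined tolerance parameter.
(5) Use the $f_{\theta^{(i)}}(\mathbf{r}, \phi(k) = \xi_p)$'s to recover the transmitted symbols.

Since the proposed method exploits the finite alphabet structure of the space-time coded signals and implements a stochastic ML solution, it is expected to exhibit better performance than suboptimal estimation techniques, especially when only short data records are available. For a sufficiently good initialization, the proposed algorithm converges rapidly to the ML estimate $\hat{\theta}$. In practice, we did not observe convergence problems when we initialized the parameters according to the suggestions of [11] (the initial guess on $\sigma^2$ is large enough to avoid overflow, while $\Omega$ is initialized arbitrarily, e.g., $\Omega^{(0)} \approx 0$).

4. PERFORMANCE ANALYSIS

The performance of the conditional and unconditional ML methods is assessed here by deriving their CRBs for unbiased estimates of the nonrandom parameters. The CRB depends on the information on the vector parameter $\theta$ quantified by the Fisher information matrix (FIM) and provides a lower bound on the variance of an unbiased estimate (i.e., $E\{\hat{\theta}\} = \theta$). The error covariance of an unbiased estimator $\hat{\theta}$ is bounded by the inverse of the FIM $J(\theta)$:

$$E\big[ (\theta - \hat{\theta})(\theta - \hat{\theta})^T \big] \ge J^{-1}(\theta). \tag{25}$$

4.1. Conditional CRB

The derivation of $J(\theta)$ in (25) follows along the lines of [12]. We start constructing the FIM by calculating the derivative of (6) with respect to

$$\tau = \big[ \mathbf{c}_r^T(1)\ \mathbf{c}_c^T(1)\ \cdots\ \mathbf{c}_r^T(L)\ \mathbf{c}_c^T(L)\ \alpha_r^T\ \alpha_c^T \big]^T, \tag{26}$$

where

$$\begin{aligned}
\mathbf{c}_r(k) &= \mathrm{Re}\big[c_1(k), \ldots, c_n(k)\big]^T, \\
\mathbf{c}_c(k) &= \mathrm{Im}\big[c_1(k), \ldots, c_n(k)\big]^T, \\
\alpha_r^i &= \mathrm{Re}\big[\alpha_{1,i}, \ldots, \alpha_{m,i}\big]^T, \\
\alpha_r &= \mathrm{Re}\big[\alpha_1^T, \ldots, \alpha_n^T\big]^T, \\
\alpha_c^i &= \mathrm{Im}\big[\alpha_{1,i}, \ldots, \alpha_{m,i}\big]^T, \\
\alpha_c &= \mathrm{Im}\big[\alpha_1^T, \ldots, \alpha_n^T\big]^T.
\end{aligned} \tag{27}$$

Taking the partial derivatives of (6), we then have

$$\begin{aligned}
\frac{\partial \mathcal{L}}{\partial \mathbf{c}_r(k)} &= \frac{\partial}{\partial \mathbf{c}_r(k)} \left[ \text{const.} - \frac{1}{\sigma^2} \sum_{k=1}^{L} \mathbf{n}^H(k)\mathbf{n}(k) \right], \quad k = 1, \ldots, L \\
&= \frac{1}{\sigma^2} \big[ \Omega^H \mathbf{n}(k) + \Omega^T \mathbf{n}^*(k) \big] = \frac{2}{\sigma^2}\, \mathrm{Re}\big[ \Omega^H \mathbf{n}(k) \big], \\
\frac{\partial \mathcal{L}}{\partial \mathbf{c}_c(k)} &= \frac{\partial}{\partial \mathbf{c}_c(k)} \left[ \text{const.} - \frac{1}{\sigma^2} \sum_{k=1}^{L} \mathbf{n}^H(k)\mathbf{n}(k) \right], \quad k = 1, \ldots, L \\
&= \frac{1}{\sigma^2} \big[ -j\,\Omega^H \mathbf{n}(k) + j\,\Omega^T \mathbf{n}^*(k) \big] = \frac{2}{\sigma^2}\, \mathrm{Im}\big[ \Omega^H \mathbf{n}(k) \big], \\
\frac{\partial \mathcal{L}}{\partial \alpha_r^i} &= \frac{2}{\sigma^2} \sum_{k=1}^{L} \mathrm{Re}\big[ c_i^*(k)\,\mathbf{n}(k) \big], \quad i = 1, \ldots, n, \\
\frac{\partial \mathcal{L}}{\partial \alpha_r} &= \frac{2}{\sigma^2} \sum_{k=1}^{L} \mathrm{Re}\big[ \mathbf{c}^*(k) \otimes \mathbf{n}(k) \big], \\
\frac{\partial \mathcal{L}}{\partial \alpha_c^i} &= \frac{1}{\sigma^2} \sum_{k=1}^{L} \big[ -j\,c_i^*(k)\,\mathbf{n}(k) + j\,c_i(k)\,\mathbf{n}^*(k) \big], \quad i = 1, \ldots, n \\
&= \frac{2}{\sigma^2} \sum_{k=1}^{L} \mathrm{Im}\big[ c_i^*(k)\,\mathbf{n}(k) \big], \\
\frac{\partial \mathcal{L}}{\partial \alpha_c} &= \frac{2}{\sigma^2} \sum_{k=1}^{L} \mathrm{Im}\big[ \mathbf{c}^*(k) \otimes \mathbf{n}(k) \big].
\end{aligned} \tag{28}$$

We need the following assumptions and results to obtain the FIM (see [12]):

$$\begin{aligned}
E\big[\mathbf{n}(n)\mathbf{n}^H(m)\big] &= \sigma^2 \mathbf{I}, \\
E\big[\mathbf{n}(n)\mathbf{n}^T(m)\big] &= \mathbf{0}, \\
E\big[\mathbf{n}^H(n)\mathbf{n}(n)\mathbf{n}^T(m)\big] &= \mathbf{0}.
\end{aligned} \tag{29}$$

Using (28) and (29) and taking expectations, we then obtain the entries of the FIM for the conditional case, which are given by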

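$$\begin{aligned}
E\left[ \frac{\partial \mathcal{L}}{\partial \mathbf{c}_r(n)} \left( \frac{\partial \mathcal{L}}{\partial \mathbf{c}_r(m)} \right)^T \right] &= \frac{2}{\sigma^2}\, \mathrm{Re}\big[\Omega^H \Omega\big]\, \delta_{n,m} = A, \\
E\left[ \frac{\partial \mathcal{L}}{\partial \mathbf{c}_r(n)} \left( \frac{\partial \mathcal{L}}{\partial \mathbf{c}_c(m)} \right)^T \right] &= -\frac{2}{\sigma^2}\, \mathrm{Im}\big[\Omega^H \Omega\big]\, \delta_{n,m} = B, \\
E\left[ \frac{\partial \mathcal{L}}{\partial \mathbf{c}_c(n)} \left( \frac{\partial \mathcal{L}}{\partial \mathbf{c}_c(m)} \right)^T \right] &= \frac{2}{\sigma^2}\, \mathrm{Re}\big[\Omega^H \Omega\big]\, \delta_{n,m}, \\
E\left[ \frac{\partial \mathcal{L}}{\partial \mathbf{c}_r(k)} \left( \frac{\partial \mathcal{L}}{\partial \alpha_r} \right)^T \right] &= \frac{2}{\sigma^2}\, \mathrm{Re}\big[\Omega^H \otimes \mathbf{c}^H(k)\big] = C_k, \\
E\left[ \frac{\partial \mathcal{L}}{\partial \mathbf{c}_c(k)} \left( \frac{\partial \mathcal{L}}{\partial \alpha_r} \right)^T \right] &= \frac{2}{\sigma^2}\, \mathrm{Im}\big[\Omega^H \otimes \mathbf{c}^H(k)\big] = D_k, \\
E\left[ \frac{\partial \mathcal{L}}{\partial \mathbf{c}_r(k)} \left( \frac{\partial \mathcal{L}}{\partial \alpha_c} \right)^T \right] &= -\frac{2}{\sigma^2}\, \mathrm{Im}\big[\Omega^H \otimes \mathbf{c}^H(k)\big], \\
E\left[ \frac{\partial \mathcal{L}}{\partial \mathbf{c}_c(k)} \left( \frac{\partial \mathcal{L}}{\partial \alpha_c} \right)^T \right] &= \frac{2}{\sigma^2}\, \mathrm{Re}\big[\Omega^H \otimes \mathbf{c}^H(k)\big], \\
E\left[ \frac{\partial \mathcal{L}}{\partial \alpha_r} \left( \frac{\partial \mathcal{L}}{\partial \alpha_r} \right)^T \right] &= \frac{2}{\sigma^2}\, \mathrm{Re}\left[ \sum_{n=1}^{L} \sum_{m=1}^{L} \big(\mathbf{c}^*(k) \otimes \mathbf{n}(k)\big)\big(\mathbf{n}^H(m) \otimes \mathbf{c}^H(m)\big) \right] \\
&= \frac{2}{\sigma^2}\, \mathrm{Re}\left[ \sum_{k=1}^{L} \mathbf{c}^*(k) \otimes \mathbf{I}_m \otimes \mathbf{c}^H(k) \right] = E, \\
E\left[ \frac{\partial \mathcal{L}}{\partial \alpha_c} \left( \frac{\partial \mathcal{L}}{\partial \alpha_c} \right)^T \right] &= \frac{2}{\sigma^2}\, \mathrm{Re}\left[ \sum_{n=1}^{L} \sum_{m=1}^{L} \big(\mathbf{c}^*(k) \otimes \mathbf{n}(k)\big)\big(\mathbf{n}^H(m) \otimes \mathbf{c}^H(m)\big) \right] \\
&= \frac{2}{\sigma^2}\, \mathrm{Re}\left[ \sum_{k=1}^{L} \mathbf{c}^*(k) \otimes \mathbf{I}_m \otimes \mathbf{c}^H(k) \right], \\
E\left[ \frac{\partial \mathcal{L}}{\partial \alpha_r} \left( \frac{\partial \mathcal{L}}{\partial \alpha_c} \right)^T \right] &= -\frac{2}{\sigma^2}\, \mathrm{Im}\left[ \sum_{n=1}^{L} \sum_{m=1}^{L} \big(\mathbf{c}^*(k) \otimes \mathbf{n}(k)\big)\big(\mathbf{n}^H(m) \otimes \mathbf{c}^H(m)\big) \right] \\
&= -\frac{2}{\sigma^2}\, \mathrm{Im}\left[ \sum_{k=1}^{L} \mathbf{c}^*(k) \otimes \mathbf{I}_m \otimes \mathbf{c}^H(k) \right] = -F.
\end{aligned} \tag{30}$$

Then the FIM can be written in partitioned form as

$$J = \begin{bmatrix} \mathcal{H} & & 0 & \mathcal{C}_1 \\ & \ddots & & \vdots \\ 0 & & \mathcal{H} & \mathcal{C}_L \\ \mathcal{C}_1^T & \cdots & \mathcal{C}_L^T & \mathcal{E} \end{bmatrix}, \tag{31}$$

where

$$\mathcal{H} = \begin{bmatrix} A & -B \\ B & A \end{bmatrix}, \qquad \mathcal{C}_k = \begin{bmatrix} C_k & -D_k \\ D_k & C_k \end{bmatrix}, \qquad \mathcal{E} = \begin{bmatrix} E & -F \\ F & E \end{bmatrix}. \tag{32}$$

The FIM can now be directly constructed. We can numerically compute the variance bound of each individual parameter estimate by inverting the FIM, $\mathrm{CRB}(\tau) = \mathrm{diag}\{J^{-1}(\tau)\}$.

4.2. Unconditional CRB

We now turn to the evaluation of the unconditional CRB. Under (AS1u), the computation of the exact CRB is analytically intractable; we therefore consider an alternative approach that simplifies the CRB calculation [13].

The evaluation of the exact form of the unconditional CRB requires the Hessian matrix of the unconditional log-likelihood function. For the current problem, the log-likelihood function is given explicitly by

$$\log f_\theta(\mathbf{r}) = -nL \log(2) - mL \log(\pi\sigma^2) + \sum_{k=1}^{L} \log \sum_{p=1}^{2^{(l+t-1)}} \exp\left( -\frac{\big\|\mathbf{r}(k) - \Omega\,\mathbf{g}(\zeta_p)\big\|^2}{\sigma^2} \right). \tag{33}$$

Unfortunately, due to the nature of (33), the evaluation of the Hessian matrix is analytically intractable. However, it is common to adopt (see, e.g., [13]) an approximate log-likelihood function to obtain a valid CRB. Due to the concavity of the log-likelihood function and Jensen's inequality, we obtain from (33) the following approximate log-likelihood function:

$$\log f_\theta(\mathbf{r}) \le \sum_{k=1}^{L} \sum_{p=1}^{2^{(l+t-1)}} \log \exp\left( -\frac{\big\|\mathbf{r}(k) - \Omega\,\mathbf{g}(\zeta_p)\big\|^2}{\sigma^2} \right). \tag{34}$$

If we further simplify (34), we obtain

$$\log f_\theta(\mathbf{r}) \le -\frac{1}{\sigma^2} \sum_{k=1}^{L} \sum_{p=1}^{2^{(l+t-1)}} \big\|\mathbf{r}(k) - \Omega\,\mathbf{g}(\zeta_p)\big\|^2. \tag{35}$$

At this point, we should point out that the Hessian matrix of the approximate log-likelihood function can be easily obtained. However, (35) leads to a bound called the modified CRB (MCRB), which is not as tight as the exact CRB but is computationally easier to evaluate.

It turns out from the approximate log-likelihood function of (34) that the entries of the FIM are

$$J_{\sigma^2,\sigma^2} = \frac{nL}{\sigma^4}, \qquad J_{\sigma^2,\Omega} = 0, \qquad J_{\Omega,\sigma^2} = 0. \tag{36}$$

Moreover, the submatrix $J_{\Omega,\Omega}$ can be obtained as

$$J_{\Omega,\Omega} = \frac{2}{\sigma^2} \sum_{p=1}^{2^{(l+t-1)}} \mathbf{g}(\zeta_p)\,\mathbf{g}^H(\zeta_p). \tag{37}$$

An i.i.d. input sequence coded with orthogonal space-time codes results in an uncorrelated coded sequence. It is therefore possible to further simplify the MCRB. In this case, the MCRB can be easily obtained as follows:

$$J^{-1} = \begin{bmatrix} \dfrac{\sigma^4}{nL} & 0 \\ 0 & \dfrac{\sigma^2}{2 \cdot 2^{(l+t-1)}}\, \mathbf{I} \end{bmatrix}. \tag{38}$$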

[Figure 2 diagram: the information sequence s(k) ∈ {0, 1} feeds a shift register holding s(2k+1), s(2k), s(2k−1), s(2k−2); taps labelled 1 and 2 form the outputs c̃1(k) and c̃2(k).]

Figure 2: 4-state space-time coding system model.

[Figure 3 diagram: the information sequence s(k) ∈ {0, 1} feeds a shift register holding s(2k+1), s(2k), s(2k−1), s(2k−2), s(2k−3); taps labelled 1 and 2 form the outputs c̃1(k) and c̃2(k).]

Figure 3: 8-state space-time coding system model.
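The encoders of Figures 2 and 3 follow the generator-matrix form given in (39) of Section 5: a window of information bits is multiplied by $G$, reduced modulo $M$, and mapped to MPSK points. A minimal sketch of that mapping is shown below. It rests on several assumptions that are not stated in the paper: the bit window is ordered newest bit first, the encoder starts from an all-zero memory, and `bits[0]` corresponds to $s(0)$. The names `st_encode`, `G_CASE1`, and `G_CASE2` are hypothetical; the matrix entries are those of (40) and (41).

```python
import numpy as np

# Generator matrices of Case 1 (4-state) and Case 2 (8-state), rows ordered newest bit first.
G_CASE1 = np.array([[2, 0], [1, 0], [0, 2], [0, 1]])
G_CASE2 = np.array([[2, 0], [1, 0], [0, 2], [0, 1], [2, 2]])

def st_encode(bits, G, M=4):
    """Map information bits to n streams of MPSK symbols (sketch).

    Returns the integer labels c_tilde (one row per symbol time k) and the
    complex symbols exp(2*pi*j*c_tilde/M).
    """
    l = int(round(np.log2(M)))      # bits consumed per symbol time
    mem = G.shape[0] - l            # number of past bits in the window
    # Assumed zero initial state: prepend `mem` zero bits.
    padded = np.concatenate([np.zeros(mem, dtype=int), np.asarray(bits, dtype=int)])
    n_sym = len(bits) // l
    c_tilde = np.empty((n_sym, G.shape[1]), dtype=int)
    for k in range(n_sym):
        # Window [s(lk+l-1), ..., s(lk-mem)], newest bit first.
        window = padded[k * l : k * l + l + mem][::-1]
        c_tilde[k] = (window @ G) % M
    return c_tilde, np.exp(2j * np.pi * c_tilde / M)
```

For Case 1 the first column of `c_tilde` depends only on the two current bits and the second column only on the two preceding bits, which matches the description of the two antenna streams in Section 5.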

5. SIMULATIONS

In this section, we present simulation results to evaluate the effectiveness and applicability of the proposed ML approaches. We consider the generator matrix representation of the space-time coding system [15]. In this representation, the stream of coded complex MPSK symbols is obtained by applying the mapping function $\mathcal{M}$ to the following matrix multiplication:

$$\mathbf{c}(k) = \mathcal{M}\big( \mathbf{u}(k) \cdot G \ (\mathrm{mod}\ M) \big), \tag{39}$$

where $\mathbf{u}(k) = [s(lk + l - 1), \ldots, s(lk - t)]^T$, $G$ is the generator matrix with $n$ columns and $l + t$ rows, and $\mathcal{M}$ is a mapping function that maps the integer values $\tilde{c}_i$ to the coded MPSK symbols, $\mathcal{M}(\tilde{c}_i) = \exp(2\pi j \tilde{c}_i / M)$.

The performance of the proposed methods was evaluated as a function of SNR (signal-to-noise ratio) by Monte Carlo simulations. Both the conditional and unconditional ML methods were tested with 200 Monte Carlo trials per SNR point across a range of SNRs. In each trial, the estimation error of each channel parameter estimate from conditional and unconditional ML was recorded. We consider the following two cases.

Case 1. The 4PSK space-time code example shown in Figure 2 is considered with $n = 2$, $t = 2$ and the generator matrix

$$G = \begin{bmatrix} 2 & 0 \\ 1 & 0 \\ 0 & 2 \\ 0 & 1 \end{bmatrix}. \tag{40}$$

In this case, the coded 4PSK symbols obtained from the two current information bits are transmitted over the first antenna, whereas the coded 4PSK symbols obtained from the two preceding bits are transmitted over the second antenna simultaneously. The coded symbols are then transmitted through the quasi-static fading channel matrix.

In Figure 4, we have plotted the estimation error obtained from conditional and unconditional ML for the channel parameters as well as the corresponding CRBs. The estimation error experienced by the proposed estimation procedures at each iteration (SNR = 10 dB) is shown in Figure 6.

Case 2. A slightly more complicated space-time encoder with $n = 2$, $t = 3$ and the generator matrix

$$G = \begin{bmatrix} 2 & 0 \\ 1 & 0 \\ 0 & 2 \\ 0 & 1 \\ 2 & 2 \end{bmatrix} \tag{41}$$

is considered in this case. This example is an 8-state code, as shown in Figure 3.

In Case 2, the coded 4PSK symbols generated from $[s(2k+1), s(2k), s(2k-3)]$ are transmitted over the first antenna, whereas the coded 4PSK symbols obtained from $[s(2k-1), s(2k-2), s(2k-3)]$ are transmitted over the second antenna simultaneously. The coded symbols are then transmitted through the quasi-static fading channel matrix.

Figure 5 shows the experimental estimation error for both the conditional and unconditional ML together with their corresponding CRBs for a range of SNRs. Figure 7 shows the estimation error experienced by the proposed estimation procedures at each iteration (SNR = 10 dB).

[Plot, "Performance analysis: Case 1": channel parameter estimation error norm (log scale) versus SNR in dB (5 to 20); curves: Conditional ML, Unconditional ML, Conditional CRB, Unconditional CRB.]

Figure 4: Case 1: Channel matrix estimation error norm.

[Plot, "Performance analysis: Case 2": channel parameter estimation error norm (log scale) versus SNR in dB (5 to 20); curves: Conditional ML, Unconditional ML, Conditional CRB, Unconditional CRB.]

Figure 5: Case 2: Channel matrix estimation error norm.

[Plot, "Convergence of the proposed algorithms: Case 1": channel estimation error norm (log scale) versus iteration number (1 to 10) at SNR = 10 dB; curves: Conditional ML, Unconditional ML.]

Figure 6: Case 1: Convergence of the channel matrix.

[Plot, "Convergence of the proposed algorithms: Case 2": channel estimation error norm (log scale) versus iteration number (1 to 10) at SNR = 10 dB; curves: Conditional ML, Unconditional ML.]

Figure 7: Case 2: Convergence of the channel matrix.

Based on the simulations we made the following observations:

(i) the proposed conditional and unconditional ML approaches perform almost identically for high SNR values. Moreover, conditional ML achieves the conditional CRB for high SNRs;

(ii) since the unconditional cost function is dominated by only one term for high SNR, it results in exactly the same cost function as one would obtain for conditional ML estimation of $\theta$. It is therefore expected that both conditional and unconditional cost functions yield similar estimates of $\theta$ at high SNR. Thus the unconditional ML approach also achieves the conditional CRB for high SNR;

(iii) the unconditional approach requires more iterations than the conditional approach to converge; however, the unconditional approach is more successful in reducing the channel estimation error norm at convergence for moderate SNR values.

6. CONCLUSIONS

In this paper, we presented the conditional and unconditional approaches to the problem of blind estimation of the channel parameters along with the space-time coded sequence. We derived iterative ML algorithms based on the conditional and unconditional signal models. Furthermore, the performance of the proposed algorithms was explored through the derivation of their associated CRBs. We also presented Monte Carlo simulations to verify the theoretically predicted estimator performance. The examples demonstrated that the proposed ML approaches achieve the conditional

CRB for high SNR values. Since the unconditional bound derived here is the modified CRB, it is looser than the exact CRB.

ACKNOWLEDGMENTS

This work was supported in part by the Research Fund of The University of Istanbul, Project numbers: B-924/12042001, Ö-1032/07062001, 1072/031297 and The Scientific and Technical Council of Turkey (TUBITAK) Project number 100EE006.

REFERENCES

[1] T. S. Rappaport, Wireless Communications Principles and Practice, Prentice Hall, Upper Saddle River, NJ, USA, 1996.
[2] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for high data rate wireless communication: performance criterion and code construction,” IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744–765, 1998.
[3] A. F. Naguib, V. Tarokh, N. Seshadri, and A. R. Calderbank, “A space-time coding modem for high data rate wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1459–1478, 1998.
[4] Y. Li, G. N. Georghiades, and G. Huang, “EM-based sequence estimation for space-time codes systems,” in ISIT ’2000, p. 315, Sorrento, Italy, June 2000.
[5] A. F. Naguib and N. Seshadri, “MLSE and equalization of space-time coded signals,” in VTC2000, pp. 1688–1693, Tokyo, Japan, Spring 2000.
[6] Z. Liu, X. Ma, and G. B. Giannakis, “Space-time coding and Kalman filtering for diversity transmissions through time-selective fading channels,” in Proc. MILCOM Conf., vol. 1, pp. 382–386, Los Angeles, Calif, USA, October 2000.
[7] C. Cozzo and B. L. Hughes, “Joint channel estimation and data symbol detection in space-time communications,” in IEEE International Conference on Communications, vol. 1, pp. 287–291, 2000.
[8] S. Talwar, M. Viberg, and A. Paulraj, “Blind estimation of multiple co-channel digital signals using an antenna array,” IEEE Signal Processing Letters, vol. 1, no. 2, pp. 29–31, 1994.
[9] L. E. Baum, T. Petrie, G. Soules, and N. Weiss, “A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains,” The Annals of Mathematical Statistics, vol. 41, no. 1, pp. 164–171, 1970.
[10] G. K. Kaleh and R. Valet, “Joint parameter estimation and symbol detection for linear and nonlinear unknown channels,” IEEE Trans. Communications, vol. 42, no. 7, pp. 2406–2413, 1994.
[11] M. Erkurt and J. G. Proakis, “Joint data detection and channel estimation for rapidly fading channels,” in IEEE Globecom ’1992, pp. 910–914, Orlando, Fla, USA, December 1992.
[12] P. Stoica and A. Nehorai, “MUSIC, maximum likelihood, and Cramer-Rao bound,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, no. 5, pp. 720–741, 1989.
[13] A. N. D’Andrea, U. Mengali, and R. Reggiannini, “The modified Cramer-Rao bound and its application to synchronization problems,” IEEE Trans. Communications, vol. 42, no. 2/3/4, pp. 1391–1399, 1994.
[14] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, USA, 1996.
[15] S. Bäro, G. Bauch, and A. Hansmann, “Improved codes for space-time trellis coded modulation,” IEEE Communications Letters, vol. 4, no. 1, pp. 20–22, 2000.

Hakan A. Çırpan received his B.S. degree in 1989 from Uludag University, Bursa, Turkey, the M.S. degree in 1992 from the University of Istanbul, Istanbul, Turkey, and the Ph.D. degree in 1997 from the Stevens Institute of Technology, Hoboken, NJ, USA, all in electrical engineering. From 1995–1997, he was a Research Assistant with the Stevens Institute of Technology, working on signal processing for wireless communications. In 1997, he joined the faculty of the Department of Electrical-Electronics Engineering at The University of Istanbul. His current research activities are focused on signal processing and communication concepts with specific attention to channel estimation and equalization algorithms for space-time coding and multicarrier (OFDM) systems. Dr. Çırpan received the Peskin Award from Stevens Institute of Technology as well as Prof. Nazim Terzioglu award from the Research fund of The University of Istanbul. He is a Member of IEEE and Member of Sigma Xi.

Erdal Panayırcı received the Diploma Engineering degree in electrical engineering from Istanbul Technical University, Istanbul, Turkey in 1964 and the Ph.D. degree in electrical engineering and system science from Michigan State University, East Lansing, Michigan, USA, in 1970. Between 1970–2000 he has been with the Faculty of Electrical and Electronics Engineering at the Istanbul Technical University, where he was a Professor and Head of the Telecommunications Chair. Currently, he is a Professor and Head of the Electronics Engineering Department at IŞIK University, Istanbul, Turkey. He is engaged in research and teaching in digital communications and wireless systems, equalization and channel estimation in multicarrier (OFDM) communication systems, and efficient modulation and coding techniques (TCM and turbo coding). He spent two years (1979–1981) with the Department of Computer Science, Michigan State University, as a Fulbright-Hays Fellow and a NATO Senior Scientist. From August 1990 to December 1991 he was with the Center for Communications and Signal Processing, New Jersey Institute of Technology, as a Visiting Professor, and took part in the research project on Interference Cancelation by Array Processing. Between 1998–2000, he was Visiting Professor at the Department of Electrical Engineering, Texas A&M University and took part in research on developing efficient synchronization algorithms for OFDM systems. Between 1995–1999, Prof. Panayırcı was an Editor for IEEE Transactions on Communications in the fields of Synchronizations and Equalizations.

Erdinc Çekli was born in Istanbul, Turkey, on April 19, 1969. He received his B.S., M.S., and Ph.D. degrees in electrical engineering from Istanbul University, Istanbul Turkey, in 1993, 1996, and 2001, respectively. From 1994–2001, he was a Research Assistant at the University of Istanbul. He was a Visiting Researcher at the Technical University of in 1999. Currently, he works as a Research Associate in the Scientific and Technical Council of Turkey (TUBITAK).

EURASIP Journal on Applied Signal Processing 2002:5, 507–516
© 2002 Hindawi Publishing Corporation

Pilot-Symbol-Assisted Channel Estimation for Space-Time Coded OFDM Systems

King F. Lee Multimedia Architecture Lab, Motorola Labs, Schaumburg, IL 60196, USA Email: [email protected]

Douglas B. Williams School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA Email: [email protected]

Received 31 May 2001 and in revised form 5 March 2002

Space-time coded orthogonal frequency division multiplexing (OFDM) transmitter diversity techniques have been shown to provide an efficient means of achieving near optimal diversity gain in frequency-selective fading channels. For these systems, knowledge of the channel parameters is required at the receivers for diversity combining and decoding. In this paper, we propose a low complexity, bandwidth efficient, pilot-symbol-assisted (PSA) channel estimator for multiple transmitter OFDM systems. The pilot symbols are constructed to be nonoverlapping in frequency to allow simultaneous sounding of the multiple channels. The time-varying channel responses are tracked by interpolating a set of estimates obtained through periodically transmitted pilot symbols. Simulations are used to verify the effectiveness of the proposed estimator and to examine its limitations. It is also shown that the PSA channel estimator has a lower computational complexity and better performance than a previously proposed decision-directed minimum mean square error (MMSE) channel estimator for OFDM transmitter diversity systems.

Keywords and phrases: transmitter diversity, OFDM, channel estimation, pilot symbols, interpolation.

1. INTRODUCTION

The mobile wireless channel suffers from multipath fading that severely attenuates the received signal during periods of deep fades. Spatial diversity is a well-known technique for improving the performance and reliability of wireless communications over fading channels. Traditionally, spatial diversity has been implemented at the receiver end by using multiple antennas at the receiver and then combining signals to improve the quality of the received signal. Unfortunately, receiver diversity requires multiple, widely-spaced antennas and multiple radio frequency (RF) front-end circuits at the receiver. This multiplicity of receiver front-end hardware is undesirable and impractical for portable receivers, such as pagers or cellular handsets, where physical size and current drain are important constraints. Transmitter diversity, on the other hand, can be implemented with multiple spatially separated antennas at the transmitter and requires only a single antenna and front-end circuit at the receiver. Transmitter diversity techniques are, therefore, very suitable for paging, cellular, and portable wireless data services, where a small number of base stations serve a large number of mobile users and where spatially separated antennas can be easily implemented at the base stations. Hence, transmitter diversity has received strong interest in recent years, especially in the mobile communications research community. Furthermore, the channels over which these high data rate mobile communications systems operate are generally frequency-selective, so transmitter diversity techniques that are effective in frequency-selective fading channels are of special interest.

A number of space-time coded orthogonal frequency division multiplexing (OFDM) transmitter diversity techniques have recently been proposed for frequency-selective fading channels [1, 2, 3, 4]. These techniques are capable of achieving near optimal diversity gain when the receivers have perfect knowledge of the channels. In practice, the channel parameters have to be estimated at the receivers. Channel estimation techniques for conventional OFDM systems have been studied extensively by many researchers [5, 6, 7, 8, 9, 10, 11]. However, channel estimation for OFDM systems with transmitter diversity has seen only limited development so far. Channel estimation for transmitter diversity systems is complicated by the fact that signals transmitted simultaneously from multiple antennas become interference for each other during the channel estimation process. In [12], a decision-directed minimum mean square error (MMSE) channel estimator for OFDM transmitter diversity systems

[Figure 1 block labels. Transmitter: X(u) → serial to parallel → X(n) → transmitter diversity encoder → X1(n), X2(n) → IDFT & cyclic prefix (one per branch) → Tx1, Tx2. Channels: h1(n), h2(n). Receiver: Rx → r(n) → prefix removal & DFT → Y(n) → diversity decoder → X̂(n) → parallel to serial → X̂(u). The channel estimator provides Λ̂1(n) and Λ̂2(n).]

Figure 1: Block diagram of a two-branch OFDM transmitter diversity system.

was proposed. The primary shortcoming of the MMSE channel estimation approach is the high computational complexity required to update the channel estimates during the data transmission mode. In this paper, we investigate a low complexity channel estimation technique for multiple transmitter OFDM systems. The proposed technique uses bandwidth efficient pilot symbols to facilitate temporal estimation of the multiple channel responses. Simple interpolation filters are then used to update the estimates during the data transmission mode.

2. OFDM TRANSMITTER DIVERSITY SYSTEMS

A block diagram of a general two-branch OFDM transmitter diversity system is shown in Figure 1. Let $X(u)$ denote the input serial data symbols with symbol duration $T_S$. The serial to parallel converter collects $K$ serial data symbols into a data vector $X(n) = [X(n,0)\; X(n,1)\; \cdots\; X(n,K-1)]^T$,¹ which has a block duration of $K T_S$. The transmitter diversity encoder codes $X(n)$ into two vectors $X_1(n)$ and $X_2(n)$ according to an appropriate coding scheme as in [1, 2, 3, 4]. The coded vector $X_1(n)$ is modulated by an inverse discrete Fourier transform (IDFT) into an OFDM symbol sequence. A length $G$ cyclic extension is added to the OFDM symbol sequence, and the resulting signal is transmitted from the first transmit antenna. Similarly, the vector $X_2(n)$ is modulated by an IDFT, cyclically extended, and transmitted from the second transmit antenna. Let $h_1(n)$ denote the impulse response of the channel between the first transmit antenna and the receiver and $h_2(n)$ denote the impulse response of the channel between the second transmit antenna and the receiver. The length of the cyclic extension is chosen to be greater than or equal to $L$, the order of the channel impulse responses, that is, $G \ge L$. At the receiver, the received signal vector first has the cyclic prefix removed and is then demodulated by a discrete Fourier transform (DFT) to yield the demodulated signal vector $Y(n)$. Assuming that the channel impulse responses remain constant during the entire block interval, it can be easily shown that the demodulated signal is given by

$$Y(n) = \Lambda_1(n) X_1(n) + \Lambda_2(n) X_2(n) + Z(n), \tag{1}$$

where $\Lambda_1(n)$ and $\Lambda_2(n)$ are two diagonal matrices whose elements are the DFTs of the respective channel impulse responses, $h_1(n)$ and $h_2(n)$, and $Z(n)$ is the DFT of the channel noise. Elements of $Z(n)$ are generally assumed to be additive white Gaussian noise with variance $\sigma_Z^2$. Clearly, the demodulated signal vector $Y(n)$ is the superposition of the two encoded vectors $X_1(n)$ and $X_2(n)$, which makes estimation of the channel parameters (i.e., $h_1(n)$ and $h_2(n)$ or, equivalently, $\Lambda_1(n)$ and $\Lambda_2(n)$) from $Y(n)$ challenging for transmitter diversity systems, especially during the data transmission mode.

¹ Throughout the paper we use the notation that $A(n,k)$ is the $k$th element of the vector $A(n)$.

3. CHANNEL ESTIMATION FOR OFDM TRANSMITTER DIVERSITY SYSTEMS

There are two common strategies for estimating the parameters of fading channels: decision-directed channel estimation and pilot-symbol-assisted (PSA) channel estimation. With decision-directed channel estimation, decoded symbols $\hat{X}(n)$ at the output of the decision device, or more frequently after the error-correction decoder, are used for estimating the channel parameters during the data transmission mode. Since past decisions are used to estimate the channel parameters, decision-directed channel estimation is susceptible to error propagation, especially during a deep fade. Therefore, even with decision-directed channel estimation, known symbols² are periodically transmitted to avoid excessive error propagation. The channel estimator in [12] is essentially a decision-directed channel estimator where the decoded data symbols are used, during the data transmission mode, to estimate the set of channel parameters that minimizes the mean-square error (MSE) cost function

$$\sum_{k=0}^{K-1} \left| Y(n,k) - \sum_{m=1}^{M} \Lambda_m(n,k) X_m(n,k) \right|^2, \tag{2}$$

where $M$ is the number of transmitters. With PSA channel estimation, known pilot symbols are inserted into the transmit symbol stream, usually at a regular interval. At the receiver, the pilot symbols are extracted to provide a temporal estimate of the channel parameters at the pilot instants. These temporal estimates are then either filtered or interpolated to provide estimates of the channel parameters during the data transmission mode.

² To avoid possible confusion with PSA channel estimation approaches, these known symbols for decision-directed channel estimation are sometimes referred to as training symbols.

It is interesting to note that although both the decision-directed channel estimator and the PSA channel estimator estimate the channel parameters using known symbols in the form of training and pilot symbols, there is a major difference between the two estimators during the data transmission mode. With the decision-directed approach, decoded symbols are used to update the channel estimates continuously during the data transmission mode. On the other hand, with the PSA approach, decoded symbols during the data transmission mode are not used to determine the channel. Channel estimates are generated either by filtering or interpolating the temporal estimates obtained at the pilot instants. This difference has special significance for transmitter diversity systems where the multiple transmitted signals tend to interfere with the channel estimation process. During the training or pilot mode, the interferences among the multiple transmitted signals can be easily minimized for the decision-directed channel estimator or PSA channel estimator by employing properly designed orthogonal training symbols [12, 13] or pilot symbols. However, there is no such “luxury” during the data transmission mode, because the multiple transmitted signals typically correspond to randomly distributed data symbols. Hence, the MMSE solution for finding the “best” estimate amid the interfering signals, such as in [12], is indeed the logical approach for decision-directed channel estimation for transmitter diversity systems. For the PSA channel estimator, however, the main challenges are in minimizing the interferences among the pilot symbols from the multiple transmitters and in the design of the interpolator. The interferences during the data transmission mode, which are more difficult to resolve, are not a concern at all for PSA channel estimators. Consequently, PSA channel estimation will be shown to be the better choice for transmitter diversity systems.

3.1. Pilot symbols for multiple transmitter OFDM systems

Pilot-symbol-assisted (PSA) channel estimation techniques for single transmitter systems have been proposed and are well understood [14, 15]. However, there is little literature on PSA channel estimation techniques for multiple transmitter systems. In [16], an alternating PSA channel estimation scheme was suggested for multiple transmitter systems. To estimate the channel from the $m$th transmitter to the receiver, the pilot symbols are transmitted only from the $m$th transmitter, while all the other transmitters either transmit null symbols or stop transmission. With this alternating pilot symbol scheme, $M$ times as many pilot symbols are needed to estimate all the channels in an $M$ transmit antenna system as compared to that required for a single transmit antenna system. The expansion in pilot symbols is undesirable from the standpoint of data throughput and bandwidth efficiency. Here, we propose a multirate PSA channel estimation technique that does not require expansion in the number of pilot symbols for multiple transmitter OFDM systems.

Although the different signals from multiple transmitters in a transmitter diversity system tend to interfere with each other, pilot symbols can be constructed for multiple transmitter OFDM systems to avoid this form of interference and, thus, simplify the task of channel estimation during the pilot mode. Notice in (1) that, for properly designed OFDM transmitter diversity systems, the subchannels for the signal from each transmitter are decoupled, that is, $\Lambda_1(n)$ and $\Lambda_2(n)$ are diagonal matrices. Therefore, if the pilot symbols are constructed so that pilot symbols transmitted from different transmitters occupy different frequency bins, any individual symbol in the demodulated signal vector $Y(n)$ will then contain a contribution from only one transmitter, and the complex channel gain for that particular subcarrier can be easily estimated. An obvious choice is to have the pilot symbols among the transmitters evenly distributed while nonoverlapping in frequency. In theory, any pilot symbols that satisfy the nonoverlapping conditions will be sufficient. In practice, the pilot symbols should be chosen to have other desirable OFDM properties as well. Chirp sequences are attractive for channel estimation in OFDM systems because they have a flat power spectrum and a low peak-to-average power ratio [17]. Here, we propose the use of chirp sequences, with different phase offsets from antenna to antenna, as pilot symbols for multiple transmitter OFDM systems. Define a length $K$ chirp sequence as

$$C(k) = e^{j\pi k^2/K}, \quad 0 \le k \le K-1. \tag{3}$$

Let $PS_m(n,k)$ denote the $k$th tone of the pilot symbol transmitted from the $m$th transmit antenna during the block instant $n$. The pilot symbols are constructed as

$$PS_m(n, k+m-1) = \begin{cases} (-1)^m \sqrt{M}\, C(k+m-1), & \text{if } (k)_M = 0, \\ 0, & \text{otherwise,} \end{cases} \tag{4}$$

where $M$ is the number of transmitters, $(k)_M$ denotes $k$ modulo $M$, $1 \le m \le M$, $0 \le k \le K-1$, and $1 \le m+k \le K$. Figure 2 shows the pilot symbol patterns for an example two-branch OFDM transmitter diversity system.

[Figure 2 panels PS1 and PS2: time-frequency grids with the time index n on the horizontal axis and the frequency index k (k = 0 to 7) on the vertical axis; pilot blocks are spaced N blocks apart. Legend: D, data symbol; P, pilot symbol; blank, null symbol.]

Figure 2: Pilot symbol patterns for an example OFDM transmitter diversity system with K = 8 and M = 2.

Since the pilot symbols are known to the receiver and, during the pilot instants, each symbol in $Y(n)$ contains only the contribution from one transmitter, the least-square estimate [...]

A number of two-dimensional (2D) filtering techniques have been proposed for PSA channel estimation for OFDM systems. The 2D Wiener filter proposed in [6, 7] has fairly high complexity and requires knowledge of the channel statistics. In [11], a robust MMSE interpolator that does not require knowledge of channel statistics was proposed. The interpolator in [11], however, requires 2D filtering, a 2D DFT, and a 2D IDFT. Here, we consider a robust yet simple interpolation approach that does not depend on channel statistics and requires only a simple windowing function and one-dimensional interpolation filters.

The diagonal elements of $\tilde{\Lambda}_m(n)$ are, in effect, samples of the frequency response of the channel between the $m$th transmitter and the receiver. Let $\tilde{h}_m(n)$ be the IDFT of the diagonal of $\tilde{\Lambda}_m(n)$. In the absence of noise, $\tilde{h}_m(n)$ is related to the actual channel impulse response (CIR) $h_m(n)$ by [18]

$$\tilde{h}_m(n,k) = \frac{1}{M} \sum_{l=0}^{M-1} h_m\!\left(n, k + \frac{K}{M} l\right) e^{j(2\pi m/M) l}. \tag{7}$$

Notice that $\tilde{h}_m(n)$ is the sum of circularly shifted images of $h_m(n)$. The images in (7) are the direct result of sampling in the frequency domain. To avoid aliasing in the time domain, the condition $K \ge M(L+1)$ must be satisfied. To remove the images, $\tilde{h}_m(n)$ is passed through a length $L+1$ rectangular window of gain $M$ to yield the temporal estimate $\hat{h}_m(n)$ at the pilot instant as

$$\hat{h}_m(n,k) = \begin{cases} h_m(n,k) + \xi(n,k), & 0 \le k \le L, \\ 0, & L+1 \le k \le K-1. \end{cases} \tag{8}$$

The DFT of $\hat{h}_m(n)$ yields the estimate of the channel parameters

$$\hat{\Lambda}_m(n) = \Lambda_m(n) + \Xi(n), \tag{9}$$

where the elements of the noise vector $\Xi(n)$ have a variance of $\sigma_W^2 M(L+1)/K$. Since $M(L+1)$ [...]
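The pilot construction in (3) and (4) and the windowing step in (8) and (9) can be illustrated with a small noise-free sketch. Everything specific in it is an assumption made for illustration: the values K = 8, M = 2, L = 1, the two made-up impulse responses, the helper names, and the least-square step itself, which is taken here to be a division of each received pilot tone by the known pilot, as the block diagram of the estimator suggests. It is not the authors' implementation.

```python
import cmath
import math


def dft(x):
    """Naive K-point DFT (O(K^2)); adequate for a small illustration."""
    K = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / K) for t in range(K))
            for k in range(K)]


def idft(X):
    """Naive K-point inverse DFT with 1/K normalization."""
    K = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / K) for k in range(K)) / K
            for t in range(K)]


def pilot_symbol(m, K, M):
    """Pilot for antenna m (1-based): scaled chirp on tones m-1, m-1+M, ...; nulls elsewhere."""
    ps = [0j] * K
    for k in range(0, K, M):                      # indices with k mod M == 0
        tone = k + m - 1
        if tone < K:
            chirp = cmath.exp(1j * math.pi * tone * tone / K)   # C(tone) as in (3)
            ps[tone] = (-1) ** m * math.sqrt(M) * chirp
    return ps


def estimate_channel(Y, ps, M, L):
    """Frequency-domain channel estimate for one branch at a pilot instant."""
    K = len(Y)
    # Assumed least-square step: divide by the known pilot on its own tones, zero elsewhere.
    lam_tilde = [Y[k] / ps[k] if ps[k] != 0 else 0j for k in range(K)]
    h_tilde = idft(lam_tilde)                     # sum of shifted images of the CIR
    # Rectangular window of length L+1 and gain M, as in (8).
    h_hat = [M * h_tilde[t] if t <= L else 0j for t in range(K)]
    return dft(h_hat)                             # estimate of the diagonal of Lambda_m, as in (9)


# Hypothetical example values: K = 8 tones, M = 2 antennas, channel order L = 1.
K, M, L = 8, 2, 1
assert K >= M * (L + 1)                           # anti-aliasing condition from the text
h = [[1.0, 0.5j], [0.3 - 0.2j, -0.4]]             # made-up impulse responses h_1, h_2
Lam = [dft(taps + [0j] * (K - len(taps))) for taps in h]
PS = [pilot_symbol(m, K, M) for m in (1, 2)]
# Noise-free received pilot block, following (1): Y = Lam_1 PS_1 + Lam_2 PS_2.
Y = [Lam[0][k] * PS[0][k] + Lam[1][k] * PS[1][k] for k in range(K)]
Lam_hat = [estimate_channel(Y, PS[i], M, L) for i in range(M)]
max_error = max(abs(Lam_hat[i][k] - Lam[i][k]) for i in range(M) for k in range(K))
print(max_error)
```

Because the two pilots occupy disjoint tones, each received tone carries only one branch, and with K ≥ M(L + 1) the window isolates the image at the origin, so in this noise-free setting the estimate matches the true response up to rounding error.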

(FIR) digital filter, and generates $N-1$ interpolated CIR samples at the OFDM symbol rate of $1/((K+G)T_S)$. These interpolated values provide a robust estimate of the CIRs for the diversity decoder during the data transmission mode.

Notice that the multipath fading process is bandlimited to the maximum Doppler shift frequency $f_D$. Therefore, to satisfy the Nyquist criterion, the sampling rate of the channel estimates must satisfy $f_s > 2 f_D$, where the sampling frequency $f_s = 1/(N(K+G)T_S)$. The equivalent condition

$$N < \frac{1}{2 f_D (K+G) T_S} \tag{10}$$

gives an upper bound on the pilot symbol spacing. It is well known that the impulse response of the ideal interpolator for bandlimited signals is the sinc function, which has an infinite number of coefficients and is, therefore, unrealizable. A number of practical interpolators have been proposed in [14, 19, 20]. As shown in [19], even order interpolation filters, that is, when $Q$ is odd, do not have linear phase. Nonlinear phase distortion can cause discontinuities in the envelope of the interpolated signal. Furthermore, linear phase interpolation filters have symmetrical coefficients, which can reduce the number of calculations by a factor of 2. Therefore, we will focus on odd order, linear phase interpolation filters.

In general, the interpolation process improves with increased sampling rates and with higher order interpolation filters. However, there is no analytical expression for the interpolation error of bandlimited signals using these interpolators. Therefore, the interpolation errors of a number of interpolators were simulated to provide a qualitative measure of how well they may track a frequency-selective fading channel. The interpolation performance criterion used is the MSE between the interpolated and the actual CIR. Assuming that the pilot symbols are transmitted at block instants $n = pN$ for $p = 0, 1, 2, \ldots$, the interpolation MSE is defined as

$$\varepsilon = \frac{1}{N_T (N-1) L} \sum_{p=0}^{N_T-1} \sum_{n=pN+1}^{pN+N-1} \sum_{l=0}^{L-1} \left| \hat{h}(n,l) - h(n,l) \right|^2, \tag{11}$$

where $N_T$ is the total number of interpolation intervals in the simulation. The interpolation MSEs for an OFDM system with $K = 128$ were simulated at $f_D = 100$ Hz with average received signal-to-noise ratio (SNR) of 40 dB and at several pilot symbol spacings $N$ to measure the effectiveness of the various interpolation filters at different sampling rates. The COST 207 six-ray typical urban channel power delay profile [21] was used throughout the simulations. Simulation results of the interpolation MSE as a function of the normalized sampling rate, $f_s/(2 f_D)$, for the linear interpolator, third and fifth order Lagrange interpolators [19], and third and fifth order least-square interpolators ($\alpha = 0.5$) [20] are shown in Figure 3. Simulation results show that the linear interpolator has significant interpolation error until the sampling rate is well above 4 times the Nyquist rate. As expected, the higher order interpolators all have better performance than the linear interpolator. Although the classical Lagrange interpolation filter is optimally flat in the passband, it has a wider transition band and has less stopband rejection than other types of “optimum” filters such as the least-square filter [20]. Figure 3 also shows that the least-square filters have a lower interpolation MSE than the Lagrange interpolation filters of the same order. Therefore, the least-square filter is a better choice for interpolating a bandlimited signal. Interestingly, the interpolation errors of the fifth order Lagrange, and the third and fifth order least-square interpolators are very close to the error floor at only twice the Nyquist rate.

[Figure 3 plot: interpolation MSE (vertical axis, 0 to 0.4) versus normalized sampling rate (horizontal axis, 1 to 4). Curves: linear (Q = 2), 3rd order Lagrange (Q = 4), 5th order Lagrange (Q = 6), 3rd order least-square (Q = 4), 5th order least-square (Q = 6).]

Figure 3: Interpolation error as a function of the normalized sampling rate $f_s/(2 f_D)$.

Since the insertion of pilot symbols represents a loss of bandwidth efficiency, a main objective in designing PSA channel estimators is to minimize the sampling rate or the number of pilot symbols, that is, to maximize $N$. Another practical consideration is minimization of the complexity and delay of the interpolator, which usually translates to using the lowest order interpolator possible. From the above simulation results, the third order least-square interpolator operating at about twice the Nyquist rate, that is, $N \approx 1/(4 f_D (K+G) T_S)$, should achieve good interpolation performance at a reasonable sampling rate, implementation complexity, and delay. The hardware complexity of the interpolation filter can be further reduced by employing the polyphase filter structure as shown in [20]. A block diagram of the proposed PSA channel estimator for a two-branch OFDM transmitter diversity system is shown in Figure 4.

3.3. Performance of pilot-symbol-assisted channel estimators

Channel estimators based on the pilot symbols and interpolators described in Sections 3.1 and 3.2 have been evaluated with the two-branch space-time block-coded OFDM (STBC-OFDM) and space-frequency block-coded OFDM (SFBC-OFDM) transmitter diversity systems proposed in [3, 4].
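As a point of reference for the interpolators being evaluated, the sketch below shows plain third order (Q = 4) Lagrange interpolation of a single CIR tap between pilot instants. It is only a direct polynomial evaluation under assumed names (`lagrange_weights`, `interpolate_tap`) and a made-up test signal; it does not reproduce the FIR or polyphase realizations of [19, 20], nor the least-square design.

```python
def lagrange_weights(mu, nodes=(-1, 0, 1, 2)):
    """Weights of the Lagrange polynomial through `nodes`, evaluated at position mu."""
    weights = []
    for j, xj in enumerate(nodes):
        w = 1.0
        for i, xi in enumerate(nodes):
            if i != j:
                w *= (mu - xi) / (xj - xi)
        weights.append(w)
    return weights


def interpolate_tap(pilot_samples, N):
    """Interpolate one CIR tap between the two middle pilot instants.

    pilot_samples holds the tap at four consecutive pilot instants
    (p-1)N, pN, (p+1)N, (p+2)N. The result holds the N-1 values at the
    block instants pN+1, ..., pN+N-1.
    """
    out = []
    for i in range(1, N):
        w = lagrange_weights(i / N)               # fractional position between pN and (p+1)N
        out.append(sum(wj * s for wj, s in zip(w, pilot_samples)))
    return out


def cubic(x):
    """Made-up complex tap trajectory; a cubic is reproduced exactly by a third order fit."""
    return (1 + 2j) * x ** 3 - x + 0.5


N = 4                                             # hypothetical pilot spacing
samples = [cubic(x) for x in (-1, 0, 1, 2)]       # tap values at four pilot instants
interpolated = interpolate_tap(samples, N)
exact = [cubic(i / N) for i in range(1, N)]
worst = max(abs(a - b) for a, b in zip(interpolated, exact))
print(worst)
```

A fading tap is not a polynomial, so on real channels the residual error depends on how far the sampling rate exceeds the Nyquist rate, which is what the interpolation MSE in (11) measures.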

[Figure 4 signal flow, branch 1: Y(pN, 0), Y(pN, 2), ..., Y(pN, K − 2) are multiplied by 1/PS1(pN, 0), 1/PS1(pN, 2), ..., 1/PS1(pN, K − 2), with zeros at the remaining tones → IDFT → ĥ1(pN, 0), ..., ĥ1(pN, L) are kept and ĥ1(pN, L + 1), ..., ĥ1(pN, K − 1) are set to 0 → interpolation filter → DFT → Λ̂1(n, 0), ..., Λ̂1(n, K − 1). Branch 2 has the same structure with Y(pN, 1), Y(pN, 3), ..., Y(pN, K − 1) and 1/PS2(pN, 1), 1/PS2(pN, 3), ..., 1/PS2(pN, K − 1), yielding Λ̂2(n, 0), ..., Λ̂2(n, K − 1).]

Figure 4: Block diagram of the proposed PSA channel estimator for a two-branch OFDM transmitter diversity system.

For the STBC-OFDM simulations, the system employed 128 subcarriers with 4-QAM modulation at a symbol rate of $2^{20}$ symbols per second on each subcarrier, that is, $K = 128$ and $T_S = 2^{-20}$ seconds. The pilot symbol spacing was set at $N = 20$ so that the sampling frequency was near twice the Nyquist rate at a maximum Doppler frequency of 100 Hz. Simulation results of the average bit error rate (BER) performance for a two-branch STBC-OFDM system with ideal channel parameters and with channel parameters estimated by a third order Lagrange interpolator are shown in Figure 5. Comparisons to a third order least-square interpolator are shown in Figure 6. Simulation results confirm that at slow fading conditions, such as $f_D = 50$ Hz, both the third order Lagrange and third order least-square interpolators perform very well. In fact, for this fading rate there is no noticeable BER degradation between the systems using the ideal channel parameters and those using the estimated parameters from the third order least-square interpolator. At $f_D = 100$ Hz, which corresponds to sampling at about twice the Nyquist rate, the BER performance with the Lagrange interpolator is degraded slightly, while that with the least-square interpolator still shows very little degradation. This relative performance is in agreement with the interpolation MSE results in Figure 3, where the third order least-square interpolator has a lower interpolation MSE than the third order Lagrange interpolator. At a faster fading condition of $f_D = 150$ Hz, which corresponds to sampling at about 1.4 times the Nyquist rate, the BER performances of both systems with estimated channel parameters are severely degraded as a result of the excessively high interpolation MSE.

For the SFBC-OFDM simulations, the system employed 256 subcarriers with 4-QAM modulation at a symbol rate of $2^{20}$ symbols per second on each subcarrier, that is, $K = 256$ and $T_S = 2^{-20}$ seconds. The pilot symbol spacing was set at $N = 10$, so that the sampling frequency was again at about twice the Nyquist rate for a maximum Doppler frequency of 100 Hz. Simulation results of the average BER performance for a two-branch SFBC-OFDM system with ideal channel parameters and with channel parameters estimated by a third order Lagrange interpolator are shown in Figure 7. Simulation results with a third order least-square interpolator are shown in Figure 8. Simulation results of the SFBC-OFDM system show that at slow fading conditions, such as $f_D = 50$ Hz, both the third order Lagrange and third

[Figures 5 to 8 plots: average bit error rate (vertical axis, logarithmic, 10^0 down to 10^-8) versus average received SNR in dB (horizontal axis, 0 to 40). Curves in each plot: ideal parameters and estimated parameters at fD = 50 Hz, 100 Hz, and 150 Hz.]

Figure 5: Performance comparison of STBC-OFDM systems with ideal channel parameters and channel parameters estimated by a third order Lagrange interpolator.

Figure 6: Performance comparison of STBC-OFDM systems with ideal channel parameters and channel parameters estimated by a third order least-square interpolator.

Figure 7: Performance comparison of SFBC-OFDM systems with ideal channel parameters and channel parameters estimated by a third order Lagrange interpolator.

Figure 8: Performance comparison of SFBC-OFDM systems with ideal channel parameters and channel parameters estimated by a third order least-square interpolator.
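The Doppler frequencies in the legends of Figures 5 to 8 can be translated into normalized sampling rates $f_s/(2 f_D)$ with $f_s = 1/(N(K+G)T_S)$. The sketch below does this arithmetic for the two simulated configurations. The cyclic extension length is not restated with the simulation parameters, so G = 0 is used purely as an assumption; any G > 0 lowers the rates slightly. The function names are illustrative.

```python
import math


def normalized_sampling_rate(N, K, G, Ts, fD):
    """Pilot sampling rate f_s = 1/(N (K+G) Ts) divided by the Nyquist rate 2 fD."""
    fs = 1.0 / (N * (K + G) * Ts)
    return fs / (2.0 * fD)


def max_pilot_spacing(K, G, Ts, fD):
    """Largest integer N strictly below the bound 1 / (2 fD (K+G) Ts) in (10)."""
    bound = 1.0 / (2.0 * fD * (K + G) * Ts)
    return math.ceil(bound) - 1


Ts = 2.0 ** -20      # symbol duration from the simulation description
G = 0                # ASSUMPTION: cyclic extension neglected for this estimate

# STBC-OFDM: K = 128, N = 20; SFBC-OFDM: K = 256, N = 10.
rates_stbc = {fD: normalized_sampling_rate(20, 128, G, Ts, fD) for fD in (50, 100, 150)}
rates_sfbc = {fD: normalized_sampling_rate(10, 256, G, Ts, fD) for fD in (50, 100, 150)}
spacing_limit = max_pilot_spacing(128, G, Ts, 100)

for fD in (50, 100, 150):
    print(fD, round(rates_stbc[fD], 3), round(rates_sfbc[fD], 3))
print(spacing_limit)
```

Under the G = 0 assumption both configurations give roughly 4.1, 2.05, and 1.37 times the Nyquist rate at 50, 100, and 150 Hz, which is in line with the "about twice" and "about 1.4 times" values quoted in the text, since N(K + G) is the same in both cases.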

order least-square interpolators have similar performance, and there is only a slight degradation for the systems using the estimated parameters compared to the systems with the ideal channel parameters. At $f_D = 100$ Hz, which corresponds to sampling at about twice the Nyquist rate, the BER performance for the least-square interpolator is clearly better than that of the Lagrange interpolator. At a faster fading condition of $f_D = 150$ Hz, which corresponds to sampling at about 1.4 times the Nyquist rate, the BER performances of both interpolators are severely degraded. Clearly, a sufficient sampling rate is crucial to the performance of the proposed PSA channel estimator. From the above simulation results and the earlier interpolation MSE analysis, a good rule-of-thumb is to set the pilot symbol spacing so that the sampling rate is about twice the Nyquist rate for the anticipated maximum Doppler frequency.

The simulation results for the SFBC-OFDM system are generally similar to those of the STBC-OFDM system shown previously in Figures 5 and 6, where the third order least-square interpolator slightly outperforms the third order Lagrange interpolator and both interpolators perform reasonably well when the channel fading rate is at or below the anticipated maximum Doppler frequency.

3.4. Comparison with the decision-directed MMSE channel estimator

In this section, the PSA channel estimator proposed in Section 3.2 is briefly compared to the decision-directed MMSE channel estimator of [12].⁴ As mentioned previously in Section 3, the decision-directed channel estimator is susceptible to error propagation. Therefore, the performance of the decision-directed channel estimator depends on the number of errors in the decisions or decoded symbols used to direct the channel estimates. To improve the performance of the decision-directed channel estimator, the decoded symbols after error-correction decoding are often used for updating the channel estimation. Hence, the performance of the decision-directed channel estimator is affected by the performance of the particular error-correction code employed by the system. Here, instead of arbitrarily choosing an error-correction coding scheme, a lower bound for the BER of the decision-directed MMSE channel estimator was simulated by using the actual symbols in directing the channel estimation. The simulations used the same STBC-OFDM and SFBC-OFDM system parameters as the systems simulated in Section 3.3. The training symbols in [12] were used for the decision-directed MMSE channel estimator and the training symbols were sent at the same spacing as the pilot symbols for the corresponding PSA channel estimator. Figure 9 shows the simulation results comparing the decision-directed MMSE channel estimator with the PSA channel estimator using a third order least-square interpolation filter for the STBC-OFDM system. Figure 10 shows the same comparison for the SFBC-OFDM system. It is interesting to note that, for these particular STBC-OFDM and SFBC-OFDM systems, the PSA channel estimator significantly outperforms the decision-directed MMSE channel estimator under all fading conditions. These results further support the earlier suggestion that the PSA channel estimator is the better choice for OFDM transmitter diversity systems.

⁴ In [12], a simplified approach was proposed that required identification of the significant taps of $h_m(n)$. Here, we consider only the basic approach for comparison.

[Figures 9 and 10 plots: average bit error rate (vertical axis, logarithmic, 10^0 down to 10^-8) versus average received SNR in dB (horizontal axis, 0 to 40). Curves in each plot: MMSE estimator and PSA estimator at fD = 50 Hz, 100 Hz, and 150 Hz.]

Figure 9: Performance comparison of STBC-OFDM systems with channel parameters estimated by a decision-directed MMSE channel estimator and by the PSA channel estimator.

Figure 10: Performance comparison of SFBC-OFDM systems with channel parameters estimated by a decision-directed MMSE channel estimator and by the PSA channel estimator.

The computational complexities of the PSA channel

Table 1: Computational complexities of the PSA channel estimator and the decision-directed MMSE channel estimator.

Estimator | Multiplications | Additions
PSA channel estimator | $((M + M/N)/2)\,K\log_2 K + K/N + M(L+1)Q$ | $(M + M/N)\,K\log_2 K + M(L+1)(Q-1)$
MMSE channel estimator* | $((2M + M^2)/2)\,K\log_2 K + MK + (ML)^3/3$ | $(2M + M^2)\,K\log_2 K + (ML)^3/3$

* Assuming $K$ is a power of two, each FFT requires $(K/2)\log_2 K$ multiplications and $K\log_2 K$ additions [22], and Gaussian elimination with an $n \times n$ matrix requires $n^3/3$ multiplications and $n^3/3$ additions [23].
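To get a feel for the expressions in Table 1, the sketch below evaluates them for one configuration. M = 2, K = 128, N = 20, and Q = 4 follow the STBC-OFDM setup and the third order interpolator; the channel order L = 7 is a hypothetical value chosen only for illustration, and the function names are not from the paper.

```python
import math


def psa_multiplications(M, N, K, L, Q):
    return ((M + M / N) / 2) * K * math.log2(K) + K / N + M * (L + 1) * Q


def psa_additions(M, N, K, L, Q):
    return (M + M / N) * K * math.log2(K) + M * (L + 1) * (Q - 1)


def mmse_multiplications(M, K, L):
    return ((2 * M + M ** 2) / 2) * K * math.log2(K) + M * K + (M * L) ** 3 / 3


def mmse_additions(M, K, L):
    return (2 * M + M ** 2) * K * math.log2(K) + (M * L) ** 3 / 3


M, K, N, Q = 2, 128, 20, 4
L = 7    # ASSUMPTION: hypothetical channel order, not taken from the paper

psa_mult = psa_multiplications(M, N, K, L, Q)
psa_add = psa_additions(M, N, K, L, Q)
mmse_mult = mmse_multiplications(M, K, L)
mmse_add = mmse_additions(M, K, L)

print("PSA  multiplications:", psa_mult, " additions:", psa_add)
print("MMSE multiplications:", mmse_mult, " additions:", mmse_add)
print("multiplication ratio MMSE/PSA:", mmse_mult / psa_mult)
```

For these values the MMSE estimator needs several times as many operations as the PSA estimator, and the gap widens with larger M or L because of the $(ML)^3/3$ term.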

estimator and the decision-directed MMSE channel estima- [3]K.F.LeeandD.B.Williams,“Aspace-timecodedtransmitter tor are shown in Table 1. Compared to the MMSE channel diversity technique for frequency selective fading channels,” estimator, the PSA channel estimator requires fewer DFTs: in Proc. IEEE Sensor Array and Multichannel Signal Processing M + M/N for the PSA estimator versus 2M + M2 for the Workshop, pp. 149–152, Cambridge, Mass, USA, March 2000. [4] K. F. Lee and D. B. Williams, “A space-frequency transmit- MMSE estimator. Furthermore, calculating the MMSE solu- ter diversity technique for OFDM systems,” in Proc. IEEE tion for the decision-directed estimator has a complexity of Global Telecommunications Conference, vol. 3, pp. 1473–1477, ᏻ(M3L3), while the interpolation filter for the PSA estima- San Francisco, Calif, USA, November 2000. tor has a complexity that is only proportional to ML. Clearly, [5]J.-J.vandeBeek,O.Edfors,M.Sandell,S.K.Wilson,andP.O. the PSA channel estimator is computationally more efficient Borjesson,¨ “On channel estimation in OFDM systems,” in than the decision-directed MMSE channel estimator. Proc. IEEE Vehicular Technology Conference, vol. 2, pp. 815– 819, Chicago, Ill, USA, July 1995. [6] P. Hoeher, S. Kaiser, and P. Robertson, “Two-dimensional 4. SUMMARY pilot-symbol-aided channel estimation by wiener filtering,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 3, A low complexity, bandwidth efficient, pilot-symbol-assisted pp. 1845–1848, Munich, Germany, April 1995. channel estimator for OFDM transmitter diversity systems [7] P. Hoeher, S. Kaiser, and P. Robertson, “Pilot-symbol-aided has been presented. Different interpolation algorithms have channel estimation in time and frequency,” in Proc. IEEE been evaluated and were seen to provide robust channel pa- GLOBECOM Communication Theory Mini-Conference, vol. 3, rameter estimates in various fading environments. Simula- pp. 
90–96, Phoenix, Ariz, USA, November 1997. [8] O. Edfors, M. Sandell, J.-J. van de Beek, S. K. Wilson, and P. O. tion results verify that the proposed technique is well suited Borjesson,¨ “OFDM channel estimation by singular value de- for channel estimation in space-time coded OFDM trans- composition,” IEEE Trans. Communications,vol.46,no.7,pp. mitter diversity systems. It has also been shown that the 931–939, 1998. proposed PSA channel estimator outperforms the decision- [9]O.Edfors,M.Sandell,J.-J.vandeBeek,S.K.Wilson,and directed MMSE channel estimator and is also more compu- P. O. B orjesson,¨ “Analysis of DFT-based channel estimators tationally efficient. for OFDM,” Wireless Personal Communications,vol.12,no.1, For ease of presentation, this paper has focused on sys- pp. 55–70, 2000. [10] Y. Li, L. J. Cimini Jr., and N. R. Sollenberger, “Robust chan- tems with multiple transmit antennas and a single receive nel estimation for OFDM systems with rapid dispersive fad- antenna. It should be noted that the proposed approach can ing channels,” IEEE Trans. Communications,vol.46,no.7,pp. also be applied to systems with multiple receive antennas by 902–915, 1998. replicating the proposed channel estimator for each receive [11] Y. Li, “Pilot-symbol-aided channel estimation for OFDM in antenna. wireless systems,” IEEE Transactions on Vehicular Technology, vol. 49, no. 4, pp. 1207–1215, 2000. [12] Y. Li, N. Seshadri, and S. Ariyavisitakul, “Channel estima- ACKNOWLEDGMENT tion for OFDM systems with transmitter diversity in mobile wireless channels,” IEEE Journal on Selected Areas in Commu- This paper was presented in part at the IEEE International nications, vol. 17, no. 3, pp. 461–471, 1999. Conference on Acoustics, Speech, and Signal Processing [13] Y. Li, “Optimum training sequences for OFDM systems with (ICASSP), Salt Lake City, Utah, USA, May 2001. multiple transmit antennas,” in Proc. IEEE Global Telecom- munications Conference, vol. 3, pp. 
1478–1482, San Francisco, REFERENCES Calif, USA, November 2000. [14] S. Sampei and T. Sunaga, “Rayleigh fading compensation for [1] D. Agrawal, V. Tarokh, A. Naguib, and N. Seshadri, “Space- QAM in land mobile radio communications,” IEEE Transac- time coded OFDM for high data-rate wireless communication tions on Vehicular Technology, vol. 42, no. 2, pp. 137–47, 1993. over wideband channels,” in Proc. IEEE Vehicular Technology [15] J. K. Cavers, “An analysis of pilot symbol assisted modulation Conference, vol. 3, pp. 2232–2236, Ottawa, Ont., Canada, May for Rayleigh fading channels,” IEEE Transactions on Vehicular 1998. Technology, vol. 40, no. 4, pp. 686–693, 1991. [2] Y. Li, J. C. Chuang, and N. R. Sollenberger, “Transmitter di- [16] S. M. Alamouti, “A simple transmitter diversity scheme for versity for OFDM systems and its impact on high-rate data wireless communications,” IEEE Journal on Selected Areas in wireless networks,” IEEE Journal on Selected Areas in Commu- Communications, vol. 16, no. 8, pp. 1451–1458, 1998. nications, vol. 17, no. 87, pp. 1233–1243, 1999. [17] J. M. Cioffi and J. A. C. Bingham, “A data-driven multitone 516 EURASIP Journal on Applied Signal Processing


King F. Lee received his BSEE from the University of Florida, the MSE from Florida Atlantic University, and the Ph.D. degree from the Georgia Institute of Technology. In 1979, he joined Motorola Inc., where he is currently a Distinguished Member of the Technical Staff. His areas of interest include mixed analog-digital integrated circuit design, computer aided circuit design, wireless communications, and digital signal and image processing. He has served as a Member of the Industrial Advisory Board of the NSF Research Center for the Design of Analog-Digital Integrated Circuits (CDADIC) from 1989 to 1993 and is a Registered Professional Engineer.

Douglas B. Williams received his BSEE, MS, and Ph.D. degrees in electrical and computer engineering from Rice University, Houston, Texas, in 1984, 1987, and 1989, respectively. In 1989, he joined the faculty of the School of Electrical and Computer Engineering at the Georgia Institute of Technology, Atlanta, Georgia, where he is currently an Associate Professor. There he is also affiliated with the Center for Signal and Image Processing and teaches courses in signal processing and telecommunications. Dr. Williams has served as an Associate Editor of the IEEE Transactions on Signal Processing and is a member of the IEEE Signal Processing Society’s SPTM Technical Committee. He was on the conference committee for the 1996 International Conference on Acoustics, Speech, and Signal Processing that was held in Atlanta and is currently co-chair of the 2002 IEEE DSP and Signal Processing Education workshops. Dr. Williams was co-editor of the Digital Signal Processing Handbook published in 1998 by CRC Press and IEEE Press. He is a member of the Tau Beta Pi, Eta Kappa Nu, and Phi Beta Kappa honor societies.

EURASIP Journal on Applied Signal Processing 2002:5, 517–524
© 2002 Hindawi Publishing Corporation

Low-Complexity Iterative Receiver for Space-Time Coded Signals over Frequency Selective Channels

Noura Sellami
France Télécom R&D, 38-40 rue du Général Leclerc, 92794 Issy les Moulineaux, France
Équipe de Traitement des Images et du Signal, ENSEA-Université de Cergy Pontoise, 6 avenue du Ponceau, 95014 Cergy-Pontoise, France
Email: [email protected]

Inbar Fijalkow
Équipe de Traitement des Images et du Signal, ENSEA-Université de Cergy Pontoise, 6 avenue du Ponceau, 95014 Cergy-Pontoise, France
Email: fi[email protected]

Mohamed Siala
Sup’Com, Route de Raoued Km 3.5, 2083 Ariana, Tunisia
Email: [email protected]

Received 31 May 2001 and in revised form 18 March 2002

We propose a low-complexity turbo-detector scheme for frequency selective multiple-input multiple-output channels. The detection part of the receiver is based on a List-type MAP equalizer, a state-reduced version of the MAP algorithm that uses the per-survivor technique. This alternative achieves a good tradeoff between performance and complexity provided a small amount of the channel is neglected. In order to improve the performance of this equalizer, we propose to use a whitened matched filter (WMF), which leads to a white-noise “minimum phase” channel model. Simulation results show that the use of the WMF yields significant improvement, particularly over severe channels. Thanks to the iterative turbo processing (detection and decoding are iterated several times), the performance loss due to the use of the suboptimum List-type equalizer is recovered.

Keywords and phrases: space-time coded MIMO channel equalization, per-survivor processing, multidimensional whitened matched filter, turbo detection.

1. INTRODUCTION

The growing demand for new services at high data rates indicates the need for new techniques to increase channel capacity. Foschini and Gans [1] have demonstrated the enormous potential capacity gain of wireless communication systems with antenna arrays at both transmitter and receiver. In order to achieve the promised high data rates over frequency selective multiple-input multiple-output (MIMO) channels, an equalizer has to be applied to reduce the channel time dispersion due to multipath propagation at high data rates. Several solutions have been proposed, among them linear and decision-feedback structures with a zero-forcing or minimum mean square error optimization [2]. These equalizers have low complexity but suffer from noise enhancement and error propagation. In terms of performance, it is better to use a maximum a posteriori (MAP) [3] or Viterbi equalizer. However, the complexity of these algorithms is proportional to the number of states of the trellis, which grows exponentially with the product of the channel memory and the number of transmit antennas [4]. When the channel memory becomes large and high-order constellations are used, the algorithm becomes impractical. Therefore, a reduced complexity approach is needed.

In this paper, we consider a List-type MAP equalizer [5] which realizes a good tradeoff between performance and complexity. The trellis has a reduced number of states, taking into account a reduced number of taps of the channel. The remaining intersymbol and cochannel interference is cancelled by an internal per-survivor processing with a list of S survivors [6, 7, 8]. The choice of the receiver filter can dramatically affect the performance of this suboptimal equalizer. To improve the performance of this equalizer, it is desirable to use a receiver filter which concentrates the energy on the first taps. In the scalar case, this is easily achieved using a minimum phase factorization of the channel. Here, we extend this approach to the MIMO case. Therefore, we propose to use a whitened matched filter (WMF) which makes the channel “minimum phase” and keeps the noise white [9, 10, 11]. Simulation results show that the use of the WMF yields significant improvement, particularly over severe channels.

We propose here to enhance the performance by using a channel encoder. As shown in Figure 1, we consider a MIMO system transmitting parallel data coded streams and demultiplexed simultaneously. The structure of this transmitter, referred to as bit-interleaved coded modulation (BICM), was initially proposed for single-antenna systems [12, 13]. A single code is cyclically connected to N transmitters, each including a bit interleaver $\Pi_i$ followed by a modulator and an antenna. The code and the frequency selective channel separated by the interleavers constitute a serially concatenated scheme to which we can apply an iterative receiver composed of a soft-input/soft-output decoder and an equalizer, following the idea of turbo-codes [14]. The basic idea behind iterative processing is to exchange extrinsic information among the receiver modules in order to achieve successively refined performance. Turbo processing for MIMO coded data enhances all the more the need for a low-complexity equalizer. In [15, 16], an equalizer based on filters has been proposed. However, the gain in complexity induces a loss in performance. In [17], Bauch and Al-Dhahir used a set of shortening filters [18] as prefilters to turbo equalization in order to reduce the number of states of the MAP equalizer. The disadvantage of this method is that the noise becomes colored at the outputs of the shortened channel.

[Figure 1 block diagram: input bits → Encoder → DeMux → coded streams $c_1, \ldots, c_N$ → interleavers $\Pi_1, \ldots, \Pi_N$ → Modulators → symbols $d_1, \ldots, d_N$ → Antennas 1, ..., N.]

Figure 1: Transmitter structure.

In this paper, we consider a turbo detector composed of a List-type MAP equalizer and a MAP decoder. Thanks to the WMF, the noise at the input of the turbo detector is white. This permits good performance without increasing the complexity of the equalizer. In fact, in the case of a colored noise, the noise autocorrelation matrix has to be taken into account by the equalizer in order to achieve good performance. Unfortunately, the complexity of the equalizer is then considerably increased. If, for simplicity, the colored noise is assumed to be white, the soft outputs of the equalizer are degraded, particularly on the first turbo iteration. This can significantly affect the performance of the iterative receiver since we know the importance of the equalization at the first iteration for the turbo processing.

The paper is organized as follows. In Section 2, we describe the system model. In Section 3, we present the whitened matched filter principle and give simulation results for the List-type MAP equalizer. In Section 4, we describe the proposed turbo detector and the corresponding simulation results.

Throughout, scalars and matrices are lower and upper case, respectively, and vectors are underlined lower case; $(\cdot)^T$, $(\cdot)^{\dagger}$, and $(\cdot)^{-1}$ denote, respectively, transposition, transconjugation, and inversion. Moreover, Trace(A) denotes the trace of matrix A and $E(\cdot)$ denotes the expected value operator. Finally, $\lfloor x \rfloor$ denotes the highest integer not bigger than x.

2. SYSTEM MODEL

2.1. General framework

We consider a frequency selective fading MIMO channel with N transmitting antennas and M receiving antennas. The channel between each transmit antenna and each receive antenna is modeled by a Rayleigh fading with a memory of L symbols.

As shown in Figure 1, the input information bit sequence is first encoded with a rate K convolutional encoder. The output of the encoder is demultiplexed into N streams that are interleaved by different interleavers $\Pi_i$, mapped to PSK/QAM symbols and transmitted simultaneously by the N transmitting antennas. The same modulation constellation with size Q is used for each stream. Thus, $\log_2(Q)$ coded bits are mapped into one Q-ary symbol. This scheme was first proposed in [12, 13] for single-antenna systems and is known as bit-interleaved coded modulation (BICM). We assume that transmissions are organized into bursts of T symbols. For the sake of simplicity, the channels are supposed to be invariant during a burst and to change independently from burst to burst.

The received baseband signal sampled at the symbol rate at antenna j at time k is a linear combination of the N transmitted signals perturbed with noise

$$ r_j(k) = \sum_{i=1}^{N} \sum_{l=0}^{L-1} h_{i,j}(l)\, d_i(k-l) + n_j(k). \tag{1} $$

In this expression, $n_j(k)$ are modeled as independent samples of a zero mean white complex Gaussian noise with variance $\sigma^2 = N_0$ and $h_{i,j}(l)$ is the lth tap gain from transmit antenna i to receive antenna j. The tap gains $h_{i,j}(l)$ are modeled as independent complex Gaussian random variables with zero mean and variance $\sigma_h^2(l)$. We assume that $\sum_{l=0}^{L-1} \sigma_h^2(l) = 1$. Let $d(k) = (d_1(k), \ldots, d_N(k))^T$ be the N-long vector of modulated symbols transmitted from the N transmitting antennas at time k and $n(k) = (n_1(k), \ldots, n_M(k))^T$ be the M-long noise vector at the receiving antennas. The output of the channel is the M-long vector $r(k) = (r_1(k), \ldots, r_M(k))^T$ with Z-transform:

$$ r(z) = H(z)\, d(z) + n(z), \tag{2} $$

where $H(z) = \sum_{l=0}^{L-1} H(l) z^{-l}$ and $(H(l))_{j,i} = h_{i,j}(l)$.
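The received-signal model in (1) can be sketched numerically. The Python/NumPy fragment below is only an illustration, not the authors' simulator: the function names `draw_channel` and `channel_output`, the array layout `h[l, j, i]` for $h_{i,j}(l)$, the convention that symbols before the start of the burst are zero, and the random seed are all assumptions made here. The dimensions (N = M = 2, L = 5, BPSK) follow the simulation setup of the paper, and the tap profile is an equal-power one with $\sigma_h(l) = 1/\sqrt{5}$.

```python
import numpy as np

def draw_channel(sigma_h, M, N, rng):
    """Draw taps with h[l, j, i] = h_{i,j}(l) ~ CN(0, sigma_h[l]**2), independent."""
    sigma_h = np.asarray(sigma_h, dtype=float)
    L = sigma_h.size
    g = rng.standard_normal((L, M, N)) + 1j * rng.standard_normal((L, M, N))
    return sigma_h[:, None, None] * g / np.sqrt(2)

def channel_output(d, h, n0, rng):
    """Eq. (1): r_j(k) = sum_i sum_l h_{i,j}(l) d_i(k - l) + n_j(k).

    d has shape (N, T); symbols at negative time indices are assumed to be zero.
    """
    L, M, N = h.shape
    T = d.shape[1]
    r = np.zeros((M, T), dtype=complex)
    for l in range(min(L, T)):
        delayed = np.zeros((N, T), dtype=complex)
        delayed[:, l:] = d[:, :T - l]          # d(k - l)
        r += h[l] @ delayed                    # H(l) d(k - l)
    # complex white Gaussian noise with total variance n0 per sample
    noise = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
    return r + np.sqrt(n0 / 2) * noise

rng = np.random.default_rng(0)                 # hypothetical seed
N, M, T = 2, 2, 16
sigma_h = np.full(5, 1 / np.sqrt(5))           # equal-power profile, sum of squares = 1
h = draw_channel(sigma_h, M, N, rng)
d = rng.choice([-1.0, 1.0], size=(N, T))       # BPSK symbols
r_clean = channel_output(d, h, 0.0, rng)       # noiseless, for checking the convolution
r_noisy = channel_output(d, h, 0.5, rng)
```

With the noise variance set to zero, each entry of `r_clean` can be compared against the double sum in (1) evaluated term by term, which is a convenient check on the indexing convention.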

The problem we address is then to recover the information bits from the noisy observation.

2.2. Simulation framework

In our simulations, we concentrate on a MIMO system with 2 transmit antennas and 2 receive antennas. We use a frequency-selective fading channel with memory L = 5. The channel is considered to be time invariant during the transmission of a burst of 512 information bits and changes independently from burst to burst. We assume that the channel is perfectly known at the receiver. The rate K = 1/2 convolutional code has 4 states and generator polynomials (7, 5). The modulation used is BPSK. We plot the bit error rate (BER) with respect to averaged $E_b/N_0$ per receive antenna.

3. EQUALIZATION

In the case of a single-input single-output (SISO) frequency selective channel, a whitened matched filter (WMF) is used to transform the received signal into a sequence with minimum phase channel response and additive white noise. This procedure is a first step in the implementation of some equalizers including decision-feedback detectors and delayed decision-feedback sequence estimators (DDFSE) [6]. This is achieved by factoring the channel spectrum into a product of a minimum phase filter and its time inverse. In this paper, we propose to use a List-type equalizer which realizes a good tradeoff between performance and complexity. Since this type of algorithm considers a reduced number of states corresponding to a reduced number of taps of the channel [6, 7, 8], it is desirable to design a multidimensional WMF and to use it as a prefilter for the equalizer, analogously to the scalar case. In the following, we will present a solution based on prediction theory results.

3.1. Whitened matched filter

As in the SISO case, the first step is to feed the received signal r(z) to the matched filter $H^{\dagger}(z^{-1})$. The output signal is then,

$$ H^{\dagger}(z^{-1})\, r(z) = x(z) = S(z)\, d(z) + b(z), \tag{3} $$

where $S(z) = H^{\dagger}(z^{-1}) H(z)$ and $b(z) = H^{\dagger}(z^{-1}) n(z)$ is a colored noise with spectrum $N_0 S(z)$.

For the multidimensional case, some results have been derived using the linear prediction theory for vector wide-sense stationary (WSS) processes [9, 10, 11]. An interesting result is stated below.

Theorem 1 (multidimensional spectral factorization [11]). Given an N-dimensional WSS process v(z) with spectrum $S_v(z)$, there exists a factorization

$$ S_v(z) = B_-^{\dagger}(z^{-1})\, B_-(z) \tag{4} $$

such that $B_-(z)$ is causal and stable and has a causal inverse. The filter $B_-^{-1}(z)$ is stable if $S_v(z)$ is nonsingular on the unit circle. This factorization is called the minimum phase factorization of $S_v(z)$.

According to this theorem, there exists an (N × N) causal and stable matrix filter $B_-(z)$ with causal and stable inverse which verifies $B_-^{\dagger}(z^{-1}) B_-(z) = H^{\dagger}(z^{-1}) H(z)$ as soon as M ≥ N (more receivers than transmitters). The filter $(B_-^{\dagger}(z^{-1}))^{-1}$ is a whitening filter for a process with spectrum S(z). Assuming that $(B_-^{\dagger}(z^{-1}))^{-1}$ can be perfectly known, and passing x(z) through this filter, we have the following as the output of the whitened matched filter:

$$ y(z) = \big(B_-^{\dagger}(z^{-1})\big)^{-1} x(z) = B_-(z)\, d(z) + n_1(z), \tag{5} $$

where $n_1(z)$ is a white Gaussian noise with spectrum $N_0 I$, I being the (N × N) identity matrix.

3.2. WMF implementation using linear prediction

Several algorithms have been presented in the literature to determine the spectral factors $B_-(z)$ and $B_-^{\dagger}(z^{-1})$ [19]. In the following, based on prediction theory results [9], we briefly explain how to compute an approximation of $(B_-^{\dagger}(z^{-1}))^{-1}$. As we will see, the algorithm based on the prediction theory provides the factorization $S(z) = B_-(z) B_-^{\dagger}(z^{-1})$. Since we want to obtain the spectral factorization given in (5), we begin by factoring $S^{\dagger}(z) = B_1(z) B_1^{\dagger}(z^{-1})$, then we take $B_-(z) = B_1^{\dagger}(z)$. Denote by s(z) a WSS process with spectrum $S^{\dagger}(z) = H^{\dagger}(z) H(z^{-1})$, i(z) its innovations and L(z) its innovations filter; s(n) can be written as

$$ s(n) = \sum_{k=0}^{\infty} L(k)\, i(n-k). \tag{6} $$

Consider the linear predictor $A(z) = \sum_{k=1}^{\infty} A(k) z^{-k}$ of s(n). The estimation $\hat{s}(n)$ of s(n) in terms of its entire past is then,

$$ \hat{s}(n) = \sum_{k=1}^{\infty} A(k)\, s(n-k) = \sum_{k=1}^{\infty} L(k)\, i(n-k). \tag{7} $$

It can be approximated by the estimation of s(n) in terms of its D most recent past values,

$$ \hat{s}(n) \approx \sum_{k=1}^{D} A(k)\, s(n-k). \tag{8} $$

The estimation error is given by

$$ e(n) = s(n) - \hat{s}(n) = L(0)\, i(n). \tag{9} $$

Writing the Yule-Walker equations

$$ E\big(e(n)\, s(n-k)^{\dagger}\big) = 0, \quad 1 \le k \le D, \qquad E\big(e(n)\, s(n)^{\dagger}\big) = L(0) L(0)^{\dagger}, \tag{10} $$

we obtain

$$ A = \big(A(1), \ldots, A(D)\big) = R\, R_D^{-1}, \qquad R(0) - \sum_{k=1}^{D} A(k) R(k)^{\dagger} = L(0) L(0)^{\dagger}. \tag{11} $$
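The Yule-Walker step in (10) and (11) can be sketched numerically as follows. This is an illustrative Python/NumPy fragment under stated assumptions, not the authors' implementation: the names `autocorr` and `linear_predictor` are hypothetical; the autocorrelation matrices are taken as $R(i) = \sum_m H(m+i)^{\dagger} H(m)$ with $R(-i) = R(i)^{\dagger}$, which is one reading of identifying the spectrum of s(n) with $H^{\dagger}(z) H(z^{-1})$; $R_D$ is the block-Toeplitz matrix whose (a, b) block is $R(b-a)$ and $R = (R(1), \ldots, R(D))$; and L(0) is obtained as the lower triangular Cholesky factor of the error covariance. Taps are stored as `H[l]` with shape (M, N).

```python
import numpy as np

def autocorr(H, i):
    """R(i) = sum_m H(m+i)^dagger H(m) for i >= 0, R(-i) = R(i)^dagger (assumed convention)."""
    if i < 0:
        return autocorr(H, -i).conj().T
    L, _, N = H.shape
    R = np.zeros((N, N), dtype=complex)
    for m in range(L - i):                     # empty when i >= L, giving R(i) = 0
        R += H[m + i].conj().T @ H[m]
    return R

def linear_predictor(H, D):
    """Solve (11): A = R R_D^{-1} and L(0) L(0)^dagger = R(0) - sum_k A(k) R(k)^dagger."""
    N = H.shape[2]
    R_D = np.block([[autocorr(H, b - a) for b in range(D)] for a in range(D)])
    R_row = np.hstack([autocorr(H, k) for k in range(1, D + 1)])
    # A R_D = R  <=>  R_D^dagger A^dagger = R^dagger
    A = np.linalg.solve(R_D.conj().T, R_row.conj().T).conj().T
    A_blocks = A.reshape(N, D, N).transpose(1, 0, 2)   # A_blocks[k-1] = A(k)
    err_cov = autocorr(H, 0) - sum(
        A_blocks[k] @ autocorr(H, k + 1).conj().T for k in range(D))
    err_cov = (err_cov + err_cov.conj().T) / 2          # enforce Hermitian symmetry
    L0 = np.linalg.cholesky(err_cov)                    # lower triangular choice of L(0)
    return A_blocks, L0

# Scalar sanity check: taps (1, 0.5) and (0.5, 1) share the same autocorrelation.
H_min = np.array([1.0, 0.5]).reshape(2, 1, 1)
H_rev = np.array([0.5, 1.0]).reshape(2, 1, 1)
A_min, L0_min = linear_predictor(H_min, D=10)
A_rev, L0_rev = linear_predictor(H_rev, D=10)

# A random 2 x 2 channel with 5 taps (hypothetical seed and scaling).
rng = np.random.default_rng(1)
H = (rng.standard_normal((5, 2, 2)) + 1j * rng.standard_normal((5, 2, 2))) / np.sqrt(10)
A, L0 = linear_predictor(H, D=10)
```

In the scalar check, both tap orderings give an innovations gain close to 1, the leading tap of the minimum phase ordering (1, 0.5), whereas the reversed ordering starts with 0.5. This is the first-tap energy concentration that motivates the prefilter. A predictor degree of D = 10 is the value used for the simulations reported later in the paper.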

In (11), $R_D$ is the DN × DN covariance matrix of the random vector $s_D(n) = [s(n)^T, \ldots, s(n-D+1)^T]^T$, $R(i) = E(s(k)\, s(k-i)^{\dagger})$, and $R = (R(1), \ldots, R(D))$. From (9), we can write

$$ H^{\dagger}(z)\, H(z^{-1}) \approx A_1(z)^{-1}\, L(0) L(0)^{\dagger}\, \big(A_1^{\dagger}(z^{-1})\big)^{-1}, \tag{12} $$

where $A_1(z) = \big(I_N - \sum_{k=1}^{D} A(k) z^{-k}\big)$. Hence, by taking the trans-conjugate of this expression, we obtain

$$ H^{\dagger}(z^{-1})\, H(z) \approx \big(A_1^{\dagger}(z^{-1})\big)^{-1}\, L(0) L(0)^{\dagger}\, A_1(z)^{-1}. \tag{13} $$

Thus, an approximation of the matrix $(B_-^{\dagger}(z^{-1}))^{-1}$ is given by $L(0)^{-1} A_1(z^{-1})$. The implementation of $(B_-^{\dagger}(z^{-1}))^{-1}$ is given by solving (11). The autocorrelation matrices R(i) are obtained by identifying the terms in $z^{-i}$ in the equality

$$ H^{\dagger}(z)\, H(z^{-1}) = \sum_{i=-\infty}^{+\infty} R(i)\, z^{-i}. \tag{14} $$

We choose arbitrarily L(0) as a lower triangular matrix since we do not care which minimum phase factor is considered.

3.3. Energy concentration

Now, we want to verify that the prefiltered MIMO channels have the property of energy concentration as for SISO channels. We start by recalling the property for the SISO case [20].

Theorem 2 (energy concentration [20]). If $B_-$ is an impulse response of a minimum phase filter, and H a filter having the same spectrum, then for any $n_0$,

$$ \sum_{i=0}^{n_0} \big|B_-(i)\big|^2 \ge \sum_{i=0}^{n_0} \big|H(i)\big|^2. \tag{15} $$

It has been shown in [9] that the property of energy concentration stated above holds for minimum phase MIMO channels obtained by spectral factorization for $n_0 = 0$.

Theorem 3 (see [9]). Let y be a WSS stochastic process, let $B_-(z)$ be a filter matrix corresponding to the Wold decomposition of y and W(z) a spectral factor of the matrix spectrum of y such that $B_-(z) B_-^{\dagger}(z^{-1}) = W(z) W^{\dagger}(z^{-1})$, then

$$ B_-(0)\, B_-(0)^{\dagger} \ge W(0)\, W(0)^{\dagger}. \tag{16} $$

In the sequel, we will verify through simulations that this property still holds for orders $n_0 \ge 1$. In order to measure the gain in terms of energy on the first taps that can be obtained by using the WMF, we consider the following quantities:

$$ \mathrm{gain}_0(H) = \frac{\mathrm{Trace}\big(B_-(0)^{\dagger} B_-(0) - H(0)^{\dagger} H(0)\big)}{\mathrm{Trace}\big(\sum_{i=0}^{L-1} H(i)^{\dagger} H(i)\big)}, \qquad \mathrm{gain}_{n_0}(H) = \frac{\mathrm{Trace}\big(\sum_{i=0}^{n_0} \big(B_-(i)^{\dagger} B_-(i) - H(i)^{\dagger} H(i)\big)\big)}{\mathrm{Trace}\big(\sum_{i=0}^{L-1} H(i)^{\dagger} H(i)\big)}. \tag{17} $$

Table 1 shows the mean and the standard deviation of the measure $\mathrm{gain}_0$ over 1000 realizations for three types of frequency selective channels described by the standard deviation of the Rayleigh distribution of their taps:

$$ \begin{aligned} \text{Channel 1: } & \sigma_h = (0.227;\ 0.460;\ 0.688;\ 0.460;\ 0.227), \\ \text{Channel 2: } & \sigma_h = \big(1/\sqrt{5};\ 1/\sqrt{5};\ 1/\sqrt{5};\ 1/\sqrt{5};\ 1/\sqrt{5}\big), \\ \text{Channel 3: } & \sigma_h = (0.716;\ 0.501;\ 0.429;\ 0.214;\ 0.071), \end{aligned} \tag{18} $$

where $\sigma_h = (\sigma_h(0), \ldots, \sigma_h(L-1))$.

Table 1

                               Channel 1   Channel 2   Channel 3
Mean of gain0                  0.540       0.369       0.159
Standard deviation of gain0    0.114       0.107       0.121

These channels were chosen because they have different energy profiles. Channel 1 has the highest gain since the powers of its delayed paths are larger than that of its direct path. Second in terms of gain is Channel 2, which is quite severe because each path has the same averaged power. Third is Channel 3, which is close to being minimum phase.

Table 2 shows the mean and standard deviation of $\mathrm{gain}_{n_0}(H)$ when $n_0 = 1$ and $n_0 = 2$ for the same channels.

Table 2

                               Channel 1   Channel 2   Channel 3
Mean of gain1                  0.542       0.312       0.089
Standard deviation of gain1    0.138       0.118       0.081
Mean of gain2                  0.219       0.238       0.0213
Standard deviation of gain2    0.094       0.104       0.0196

For the three channels, Table 2 shows that the energy concentration property is verified when $n_0 = 1$ and $n_0 = 2$. We notice that for Channel 3 the mean and the standard deviation of the different gains are very close. Hence, the gains can be very low for a few realizations. In order to achieve a good tradeoff between complexity and performance, a test can be performed on the calculated gains for each realization of the channel. If $\mathrm{gain}_{n_0}(H)$ is less than a determined threshold, the received signals are not prefiltered by the WMF before the equalization, since the improvement will be very small. Otherwise, the WMF is used.

3.4. Generalized List-type MAP equalizer

The List-type approach was first proposed for hard-output Viterbi algorithms [6, 7, 8]. A soft version following the idea of the MAP algorithm has been proposed by Penther et al. [5].
We present here a generalization of this soft-input/soft-output algorithm to MIMO channels. As explained in [6, 7, 8], the trellis has $Q^{(J-1)N}$ states where J is the reduced memory of the channel (J

Let $\{m_1, \ldots, m_Q\}$ be the constellation points. The equalizer calculates the APP $P(d_i(n) = m_q \mid y)$ for each possible
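To give a sense of scale for the state count $Q^{(J-1)N}$ quoted in this subsection, the short Python sketch below evaluates it for the simulation setup of the paper (BPSK, so Q = 2, with N = 2 and L = 5). The helper name `trellis_states` is hypothetical, and the full-memory count $Q^{(L-1)N}$ is an assumption made here by analogy with the reduced-state formula; it is not stated in the text.

```python
def trellis_states(Q, memory, N):
    """Number of trellis states Q**((memory - 1) * N) for N transmit antennas."""
    return Q ** ((memory - 1) * N)

Q, N, L = 2, 2, 5                         # BPSK, 2 transmit antennas, memory L = 5
full_states = trellis_states(Q, L, N)     # assumed full-memory trellis size
reduced_states = {J: trellis_states(Q, J, N) for J in (2, 3)}
print(full_states, reduced_states)        # prints: 256 {2: 4, 3: 16}
```

Under these assumptions, reducing the memory from L = 5 to J = 2 or J = 3 shrinks the trellis from 256 states to 4 or 16 states, which is the complexity reduction the per-survivor processing is meant to make affordable.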

Figure 2: Performance of the List-type MAP equalizer with and without the WMF for a channel reduced memory J = 2. (BER versus $E_b/N_0$; curves: Channel 1, Channel 1 min, Channel 2, Channel 2 min, Channel 3, Channel 3 min.)

Figure 3: Performance of the List-type MAP equalizer with and without the WMF for a channel reduced memory J = 3. (BER versus $E_b/N_0$; curves: Channel 1, Channel 1 min, Channel 2, Channel 2 min, Channel 3, Channel 3 min.)

4.1. Principle of turbo detection

Figure 4 shows the iterative receiver system. On the first iteration, the List-type MAP equalizer generates the a posteriori probabilities $P_{eq}(c_i(k) \mid y)$ for $1 \le i \le N$, given the whole received vector during a burst. These probabilities are deinterleaved and multiplexed. The decoder uses this information to calculate the APP for the information bits and for the coded bits using the MAP algorithm [3].

Extrinsic information is computed from the a posteriori coded bit probabilities $P_{dec}(c_i(n) \mid y)$ as

$$ P_{dec}^{ext}\big(c_i(n) \mid y\big) = \frac{P_{dec}\big(c_i(n) \mid y\big)}{P_{eq}\big(c_i(n) \mid y\big)}. \tag{26} $$

This information is provided to the equalizer as a priori information on the next iteration in order to obtain more reliable soft outputs. The equalizer computes the APP $P_{eq}(c_i(k) \mid y)$ on the coded bits and provides the extrinsic information $P_{eq}^{ext}(c_i(k) \mid y)$ to the decoder. The probabilities $P_{eq}^{ext}(c_i(k) \mid y)$ are calculated as

$$ P_{eq}^{ext}\big(c_i(k) \mid y\big) = \frac{P_{eq}\big(c_i(k) \mid y\big)}{P_{dec}^{ext}\big(c_i(k) \mid y\big)}. \tag{27} $$

After some iterations, hard decisions on the input information bits are taken by the decoder.

It is important that the receiver modules exchange extrinsic information in order to minimize the correlation between the a priori information used by a module and its previous decisions. To fully realize the potential of the iterative receiver, the streams of coded symbols are fed to independent random interleavers before they are transmitted. Interleavers help to break the correlation of the encoder output. They also guarantee that the a priori information provided to the equalizer is almost independent. The use of different interleavers may improve the performance of the iterative receiver.

4.2. Equalizer for the iterative processing

For the first iteration, the equalizer is the List-type MAP algorithm described in Section 3. For the next iterations, the MAP decoder provides the equalizer with extrinsic information and only the calculation of the transition probabilities $\gamma_n^{S_i}(\mu, \mu')$ is changed. More precisely, the a priori probabilities in (20) are now calculated as,

$$ P(\mu' \mid \mu) = \prod_{i=1}^{N} P_{dec}^{ext}(c_i \mid y), \tag{28} $$

where $c_i$, $i = 1, \ldots, N$ correspond to the transition $(\mu, \mu')$ and $P_{dec}^{ext}(c_i \mid y)$ comes from (26).

4.3. Simulation results

In the following simulations, the calculation of the WMF is performed by using a predictor of degree D = 10. We focus on Channel 2 since it is a severe channel, so it is a good test for evaluating the performance of our receiver.

Figure 5 shows the performance for one to three iterations of the turbo receiver with and without the WMF. When the WMF is not used the number of survivors S is set to 4, and when it is used the number of survivors is set to 1 and 2. We notice that for the coded system the use of the WMF yields a significant improvement even if the number of survivors is low (S = 1 and S = 2). We can conclude that the gain achieved when prefiltering the received signal with a WMF

is dramatically more important than that obtained when increasing the number of survivors. The gain achieved via the iterative processing is also obvious. We notice that most of the improvement is achieved in the second iteration. For a BER = $10^{-3}$, the iterative processing yields an improvement of 1.8 dB at the third iteration when the WMF is used with the 2-survivor detector.

[Figure 4 block diagram: received signals $r_1, \ldots, r_M$ → whitened matched filter → $y_1, \ldots, y_N$ → List-type MAP equalizer → $P_{eq}^{ext}(c_i \mid y)$ → deinterleavers $\Pi_i^{-1}$ → MUX → Decoder → decoded bits; feedback path: $P_{dec}^{ext}(c_i \mid y)$ → DeMUX → interleavers $\Pi_i$ → equalizer.]

Figure 4: Low-complexity turbo detector.

Figure 5: BER performance of the iterative detector (J = 2) on Channel 2 versus $E_b/N_0$: Ch2-iter corresponds to a receiver with a 4-survivor detector and Ch2min-iter corresponds to the receiver with the WMF and a dfse (S = 1, dotted lines) or a 2-survivor detector (S = 2). (Curves: Ch2-iter1 to Ch2-iter3 with S = 4; Ch2min-iter1 to Ch2min-iter3 with S = 1 and with S = 2.)

5. CONCLUSION

In this paper, we have proposed a low-complexity turbo detector for space-time coded frequency selective channels. Our detector is a suboptimal variant of the MAP algorithm based on state-reduction and per-survivor processing. In order to compensate for the loss due to state-reduction, we have designed a multidimensional whitened matched filter used as a prefilter for the turbo detector. Simulation results show that this prefiltering significantly improves the receiver performance, particularly over severe channels. Our receiver achieves a good tradeoff between complexity and performance. In fact, we have decreased exponentially the complexity of the optimum MAP detector. Moreover, thanks to the prefiltering and the iterative process, the performance has been dramatically improved. Thus, our turbo detector is a good candidate for multi-antenna systems, mainly when the memory of the channel is large and the constellation used has a high order.

REFERENCES

[1] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Personal Communications, vol. 6, no. 3, pp. 311–335, 1998.
[2] G. K. Kaleh, “Channel equalization for block transmission systems,” IEEE Journal on Selected Areas in Communications, vol. 13, no. 1, pp. 110–121, 1995.
[3] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 284–287, 1974.
[4] G. Bauch, A. F. Naguib, and N. Seshadri, “MAP equalization of space-time coded signals over frequency selective channels,” in Wireless Communications and Networking Conference, New Orleans, La, USA, September 1999.
[5] B. Penther, D. Castelain, and H. Kubo, “A modified turbo-detector for long delay spread channels,” in International Symposium on Turbo Codes, Brest, France, September 2000.
[6] A. Duel-Hallen and C. Heegard, “Delayed decision-feedback sequence estimation,” IEEE Trans. Communications, vol. 37, no. 5, pp. 428–436, 1989.
[7] T. Hashimoto, “A list-type reduced-constraint generalization of the Viterbi algorithm,” IEEE Transactions on Information Theory, vol. 33, no. 6, pp. 866–876, 1987.
[8] H. Kubo, K. Murakami, and T. Fujino, “A list-output Viterbi equalizer with two kinds of metric criteria,” in Proc. IEEE International Conf. on Universal Pers. Commun., pp. 1209–1213, October 1998.
[9] P. E. Caines, Linear Stochastic Systems, Wiley, New York, USA, 1988.
[10] A. Duel-Hallen, “Equalizers for multiple input/multiple output channels and PAM systems with cyclostationary input sequences,” IEEE Journal on Selected Areas in Communications, vol. 10, no. 3, pp. 630–639, 1992.

[11] A. Duel-Hallen, “A family of multiuser decision-feedback detectors for asynchronous code-division multiple-access channels,” IEEE Trans. Communications, vol. 43, no. 2–4, pp. 421–434, 1995.
[12] G. Caire, G. Taricco, and E. Biglieri, “Bit-interleaved coded modulation,” IEEE Transactions on Information Theory, vol. 44, no. 3, pp. 927–946, 1998.
[13] E. Zehavi, “8-PSK trellis codes for a Rayleigh channel,” IEEE Trans. Communications, vol. 40, no. 5, pp. 873–884, 1992.
[14] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: turbo codes,” in Proc. IEEE International Conf. on Communications, pp. 1064–1070, Geneva, Switzerland, May 1993.
[15] X. Wang and H. V. Poor, “Iterative (turbo) soft interference cancellation and decoding for coded CDMA,” IEEE Trans. Communications, vol. 47, no. 7, pp. 1046–1061, 1999.
[16] A. Roumy, I. Fijalkow, and D. Pirez, “Turbo multiuser detection for coded asynchronous DS-CDMA over frequency selective channels,” J. on Commun. Networks, vol. 3, no. 3, pp. 202–210, 2001, special issue on multiuser detector.
[17] G. Bauch and N. Al-Dhahir, “Iterative equalization and decoding with channel shortening filters for space-time coded modulation,” in Proc. IEEE Vehicular Technology Conference, pp. 1575–1582, Boston, Mass, USA, September 2000.
[18] N. Al-Dhahir, “FIR channel-shortening equalizers for MIMO ISI channels,” IEEE Trans. Communications, vol. 49, no. 2, pp. 213–218, 2001.
[19] V. Kucera, “Factorization of rational spectral matrices: a survey of methods,” in Proc. IEE International Conference on Control, vol. 2, pp. 1074–1078, Edinburgh, 1991.
[20] A. Papoulis, Signal Analysis, McGraw-Hill, New York, NY, USA, 1977.

Noura Sellami was born in Tunisia, in 1975. She received the preliminary doctorate certificate in signal processing from University of Cergy-Pontoise, France, and the engineer diploma from École Nationale Supérieure de l’Électronique et de ses Applications (ENSEA), France, in 1999. She is currently working toward the Ph.D. degree at University of Cergy-Pontoise, France.

Inbar Fijalkow received the Engineering M.S. and Ph.D. degrees from École Nationale Supérieure de Télécommunications (ENST), Paris, France, in 1990 and 1993, respectively. In 1993–1994, she was a Research Associate at Cornell University, NY, USA. During 1994–1999, she was an Associate Professor, and since 1999 she has been a Professor. She is a member of the Laboratory ETIS (ENSEA-University of Cergy-Pontoise), France. In 1998, she was a visiting researcher at the Australian National University (ANU), Canberra, Australia. Her current research interests are in signal processing applied to digital communications: adaptive and iterative (turbo) processing, blind deconvolution/equalization of multiple sources and sensors systems. She is a member of the board of the GDR ISIS, which is the CNRS research group on signal, image, and vision processing. She has been an associate editor of the IEEE Transactions on Signal Processing since August 2000.

Mohamed Siala was born in 1965 in Tunisia. He received the “Diplôme d’Ingénieur” from “École Polytechnique, Palaiseau, France,” in 1988, the “Diplôme d’Ingénieur” from “École Nationale Supérieure des Télécommunications (ENST),” Paris, France, in 1990 and the “Doctorat de l’ENST,” Paris, France, in 1995. From 1990 to 1992, he was with Alcatel Radio-Telephones, Colombes, France, working on the GSM physical layer. In 1995, he joined Wavecom, Issy-les-Moulineaux, France, where he worked on IOTA multicarrier modulations for terrestrial digital TV and channel estimation for the ICO project. From 1997 to 2001, he worked at France Télécom R&D, Issy-les-Moulineaux, France, on the physical layer of the UMTS system. In 2001, he joined “École Supérieure des Communications de Tunis (SUPCOM),” Tunis, Tunisia, where he is an Associate Professor. His research interests are in the area of digital communications with special emphasis on channel estimation, modulation and coding for mobile communications.

EURASIP Journal on Applied Signal Processing 2002:5, 525–531
© 2002 Hindawi Publishing Corporation

Maximum-Likelihood Sequence Detection of Multiple Antenna Systems over Dispersive Channels via Sphere Decoding

Haris Vikalo Information Systems Laboratory, Stanford University, Stanford, CA 94305, USA Email: [email protected]

Babak Hassibi Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125, USA Email: [email protected]

Received 6 May 2001 and in revised form 28 March 2002

Multiple antenna systems are capable of providing high data rate transmissions over wireless channels. When the channels are dispersive, the signal at each receive antenna is a combination of both the current and past symbols sent from all transmit antennas, corrupted by noise. The optimal receiver is a maximum-likelihood sequence detector and is often considered to be practically infeasible due to its high computational complexity (exponential in the number of antennas and the channel memory). Therefore, in practice, one often settles for a less complex suboptimal receiver structure, typically with an equalizer meant to suppress both the intersymbol and interuser interference, followed by the decoder. We propose sphere decoding for sequence detection in multiple antenna communication systems over dispersive channels. Sphere decoding provides the maximum-likelihood estimate with computational complexity comparable to the standard space-time decision-feedback equalizing (DFE) algorithms. The performance and complexity of the sphere decoding are compared with the DFE algorithm by means of simulations.

Keywords and phrases: sphere decoding, maximum-likelihood, multiple antennas, dispersive channels, computational complexity.

1. INTRODUCTION

Multiple antenna wireless communication systems are capable of providing data transmission at potentially very high rates [1]. To secure high reliability of the data transmission, special attention has to be paid to the design of the receiver. When transmitting over noisy dispersive channels, the received signal at each receive antenna is the combination of the transmitted signals perturbed by noise, intersymbol interference (ISI), and by interuser interference (IUI). In this case, the optimal receiver structure is the multichannel maximum-likelihood sequence estimation (MLSE). However, the computational complexity of the traditional maximum-likelihood sequence detector often prohibits its practical implementation. (For instance, the Viterbi decoder is exponential in the length of the channel [2].) One way to alleviate the computational burden is to settle for (suboptimal) reduced complexity MLSE algorithms by reducing the number of states (see, e.g., [3, 4]). In practice, however, most often a multichannel (space-time) equalizer is used to suppress ISI and IUI first; then, a hard decision is made to recover the symbol that has been sent [2, 5, 6]. The equalizer may be linear (zero-forcing or minimum mean square), or nonlinear decision-feedback equalizer (DFE). DFEs essentially perform successive interference cancellation: a soft symbol estimate is used to cancel the trailing interference, upon which the hard decision is made to recover the symbol. (For the analysis of the performance of DFE algorithm in a dispersive MIMO environment, see [6].) For high enough SNR, DFEs obtain better performance than linear equalizers while still having much lower complexity than the optimal MLSE algorithm. However, the performance of the DFE is highly inferior compared to the performance of the optimal MLSE algorithm.

In this paper, we propose an algorithm that yields the optimal MLSE performance on dispersive multiple-input multiple-output (MIMO) channels with finite impulse response (FIR). (We should point out that the wireless communication systems may or may not employ feedback from the receiver to the transmitter. In this paper, we focus on optimal detector structures for systems where feedback is unavailable and the receiver learns the channel based on the training information.)

We consider the so-called sphere decoding, an algorithm for solving integer least-squares problems, which, in the communication context, provides the ML estimate of the transmitted data sequence. The algorithm is due to Fincke and Pohst [7] and was first proposed in the context of the closest point searches in lattices (for a review of these, see [8] and the references therein). The algorithm was rediscovered in [9] in the context of detection in GPS systems. The use of the sphere decoding for lattice codes was first proposed in [10], and further investigated in [11, 12]. In [13], it has been analytically shown that the average complexity of the sphere decoding used for ML detection in flat fading multiple-antenna systems is polynomial (often sub-cubic) for a wide range of signal-to-noise ratios (SNRs).

The paper is organized as follows: in Section 2, we describe the FIR MIMO channel model. In Section 3, we pose the detection problem, briefly overview heuristics for solving it, and describe the sphere decoding algorithm. Simulation results are presented in Section 4, where it is shown that the sphere decoding provides significant improvement (several dBs) over the MIMO DFE. The computational complexity of the sphere decoding turns out to be comparable to that of the MIMO DFE, thereby suggesting that it can be implemented in practice. The paper concludes with Section 5.

2. FIR MIMO MODEL DESCRIPTION

We consider a multiple-antenna system with $M$ transmit and $N$ receive antennas. The MIMO channel is modeled as block-fading frequency-selective, where the channel impulse response is constant for some discrete interval $T$, after which it changes to another (independent) impulse response that remains constant for another interval $T$, and so on. The additive noise is spatially and temporally independent identically distributed (i.i.d.) circularly-symmetric complex-Gaussian. The MIMO channel model is shown in Figure 1.

Figure 1: FIR MIMO channel model.

The channel is represented by its complex baseband equivalent model. Let the column vector

\[ h^{(i,j)} = \begin{bmatrix} h_1^{(i,j)} & h_2^{(i,j)} & \cdots & h_{C^{(i,j)}}^{(i,j)} \end{bmatrix}^T \tag{1} \]

denote the single-input single-output (SISO) channel impulse response from the $j$th transmit to the $i$th receive antenna. For convenience, we shall make the following assumptions on the SISO channels $h^{(i,j)}$:

(1) $C^{(i,j)} = C$, $1 \le i \le N$, $1 \le j \le M$, that is, all SISO channels have impulse responses of the same length,
(2) the channel coefficients $h_l^{(i,j)}$, $1 \le l \le C$, $1 \le i \le N$, $1 \le j \le M$ are i.i.d. $\mathcal{C}(0, 1)$.

The received signal at the $i$th antenna can then be expressed as

\[ \chi_k^{(i)} = \sum_{j=1}^{M} \sum_{l=1}^{C} h_l^{(i,j)} s_{k-l}^{(j)} + \nu_k^{(i)}, \tag{2} \]

for $k = 1, 2, \ldots, T + C - 1$. Equation (2) can be written in a matrix form as

\[ \mathcal{X}_k = \sum_{l=1}^{C} H_l \mathcal{S}_{k-l} + \mathcal{V}_k, \tag{3} \]

where

\[ \mathcal{S}_k = \begin{bmatrix} s_k^{(1)} & s_k^{(2)} & \cdots & s_k^{(M)} \end{bmatrix}^T \tag{4} \]

is the transmit vector, whose entries typically come from a QAM constellation, $\mathcal{V}_k \in \mathcal{C}^{N \times 1}$ is the additive noise vector defined as

\[ \mathcal{V}_k = \begin{bmatrix} \nu_k^{(1)} & \nu_k^{(2)} & \cdots & \nu_k^{(N)} \end{bmatrix}^T, \tag{5} \]

and $H_l \in \mathcal{C}^{N \times M}$ is the $l$th coefficient matrix in the MIMO channel impulse response,

\[ H_l = \begin{bmatrix} h_l^{(1,1)} & h_l^{(1,2)} & \cdots & h_l^{(1,M)} \\ h_l^{(2,1)} & h_l^{(2,2)} & \cdots & h_l^{(2,M)} \\ \vdots & \vdots & \ddots & \vdots \\ h_l^{(N,1)} & h_l^{(N,2)} & \cdots & h_l^{(N,M)} \end{bmatrix}. \tag{6} \]

In other words, the $z$-transform of the MIMO channel impulse response is given by

\[ H(z) = H_1 + H_2 z^{-1} + \cdots + H_C z^{-(C-1)}. \tag{7} \]

Define the following vectors:

\[ \mathcal{X} = \begin{bmatrix} \mathcal{X}_1^T & \mathcal{X}_2^T & \cdots & \mathcal{X}_{T+C-1}^T \end{bmatrix}^T, \quad \mathcal{V} = \begin{bmatrix} \mathcal{V}_1^T & \mathcal{V}_2^T & \cdots & \mathcal{V}_{T+C-1}^T \end{bmatrix}^T, \quad \mathcal{S} = \begin{bmatrix} \mathcal{S}_1^T & \mathcal{S}_2^T & \cdots & \mathcal{S}_T^T \end{bmatrix}^T. \tag{8} \]

(Note that the random vector $\mathcal{V} \in \mathcal{C}^{N(T+C-1)}$ has unit variance complex Gaussian i.i.d. entries, $E[\mathcal{V}\mathcal{V}^*] = I_{N(T+C-1)}$.) Then from (3) we can write the input-output relation for the

FIR MIMO channel in the matrix form as

\[ \mathcal{X} = \mathcal{H}\mathcal{S} + \mathcal{V}, \tag{9} \]

where $\mathcal{H} \in \mathcal{C}^{N(T+C-1) \times MT}$ is constructed as

\[ \mathcal{H} = \begin{bmatrix} H_1 & & & \\ H_2 & H_1 & & \\ \vdots & H_2 & \ddots & \\ H_C & \vdots & \ddots & H_1 \\ & H_C & & H_2 \\ & & \ddots & \vdots \\ & & & H_C \end{bmatrix}. \tag{10} \]

Model (9) is illustrated in Figure 2. We assume that symbol bursts are uncorrelated (which is an appropriate assumption when modeling, for instance, packet transmission in TDMA systems).

Figure 2: Matrix equivalent channel model.

It will be convenient to define the signal-to-noise ratio $\rho$ for the system in (9),

\[ \rho = \frac{E\|\mathcal{H}\mathcal{S}\|^2}{E\|\mathcal{V}\|^2} = \frac{E \operatorname{tr}\left(\mathcal{H}\mathcal{S}\mathcal{S}^*\mathcal{H}^*\right)}{E \operatorname{tr}\left(\mathcal{V}\mathcal{V}^*\right)} = \frac{E \operatorname{tr}\left(\mathcal{S}\mathcal{S}^*\mathcal{H}^*\mathcal{H}\right)}{N(T + C - 1)}. \tag{11} \]

Assuming that the entries in $\mathcal{S}$ are coming from an $L \times L$ QAM constellation (where $L$ is assumed to be even), and that the minimum distance between constellation points is $d_{\min} = 1$, we find that

\[ \rho = \frac{L^2 - 1}{6N(T + C - 1)} E \operatorname{tr}\left(\mathcal{H}^*\mathcal{H}\right) = \frac{L^2 - 1}{6N(T + C - 1)} MTCN = \frac{(L^2 - 1)MTC}{6(T + C - 1)}. \tag{12} \]

Notice that all quantities in (9) are complex. We will find it useful to rewrite (9) in terms of real quantities. To this end, define

\[ x = \begin{bmatrix} \Re(\mathcal{X})^T & \Im(\mathcal{X})^T \end{bmatrix}^T, \quad v = \begin{bmatrix} \Re(\mathcal{V})^T & \Im(\mathcal{V})^T \end{bmatrix}^T, \quad H = \begin{bmatrix} \Re(\mathcal{H}) & -\Im(\mathcal{H}) \\ \Im(\mathcal{H}) & \Re(\mathcal{H}) \end{bmatrix}. \tag{13} \]

Thus, with the previously defined $x \in \mathcal{R}^{2N(T+C-1) \times 1}$, $v \in \mathcal{R}^{2N(T+C-1) \times 1}$, and $H \in \mathcal{R}^{2N(T+C-1) \times 2MT}$, we can rewrite (9) as

\[ x = Hs + v, \tag{14} \]

where the signal vectors $s$ are typically obtained upon modulation of the input bits onto an $L$-PAM constellation $\mathcal{D}_L^{2MT}$,

\[ \mathcal{D}_L^{2MT} = \left\{ -\frac{L-1}{2}, -\frac{L-3}{2}, \ldots, \frac{L-3}{2}, \frac{L-1}{2} \right\}^{2MT}. \tag{15} \]

(This particular structure of vector $s$ stems from the assumption that entries of $\mathcal{S}$ in (9) are points in $L \times L$ QAM constellation.) Notice that we assumed that $L$ is even. (In practice, $L$ is commonly a power of 2, giving rise to 2-PAM, 4-PAM, 8-PAM, etc., constellations.) Finally, notice that $\mathcal{D}_L^{2MT}$ is a finite lattice carved from an infinite one, $\mathcal{Z}^{2MT}$.

3. PROBLEM STATEMENT

With the notation introduced in Section 2, due to the Gaussian assumption on the additive noise, we can express the MLSE problem as the optimization problem

\[ \min_{s \in \mathcal{D}_L^{2MT} \subset \mathcal{Z}^{2MT}} \|x - Hs\|^2, \tag{16} \]

where the minimization is over all points in the constellation $\mathcal{D}_L^{2MT}$. We can interpret problem (16) as follows.

Given the “skewed” lattice $Hs$, find the “closest” lattice point to a given $2NT$-dimensional vector $x$.

The closest lattice point search problem in (16) is known to be, in general, of exponential complexity [8]. There are several reduced complexity heuristic methods that can be used to obtain approximate solutions to (16). The most obvious are the following two.

• Inverting and rounding to the closest integer

\[ \hat{s} = \left[ H^{\dagger} x \right]_{\mathcal{Z}}, \tag{17} \]

where $H^{\dagger}$ denotes the pseudo-inverse, and where for $a \in \mathcal{R}$ the notation $[a]_{\mathcal{Z}}$ means the closest integer to $a$. So $[H^{\dagger} x]_{\mathcal{Z}}$ is simply the vector obtained by this operation applied to each entry of $H^{\dagger} x$. The above $\hat{s}$ is called the Babai point (estimate). In the communications context, the preceding procedure is nothing but simple zero-forcing equalization, followed by a hard decision.

• In nulling and canceling [14], one uses the Babai estimate for one of the entries of $s$, say $s(1)$; then assumes that $s(1)$ is known and subtracts out its effect

to obtain a reduced integer least-squares problem with $2MT - 1$ unknowns. Then the procedure is repeated to solve similarly for $s(2)$, and so on. (Nulling and canceling is fundamentally equivalent to the generalized decision-feedback equalization discussed in [15].) As a side note, one can further improve the performance of nulling and canceling by introducing optimal ordering: the algorithm starts from the “strongest” and proceeds to the “weakest” entry in $s$ (see, e.g., [14, 16]).

The aforementioned heuristics have acceptable polynomial-time computational complexity for practical implementation purposes. However, their performance is inferior in comparison with the exact solution to the MLSE problem.

We proceed by describing an algorithm, the so-called sphere decoding, for efficient closest point search in the lattice.

3.1. Sphere decoding

The sphere decoding performs the closest-point search in a somewhat more sophisticated manner than doing a full search over the integer lattice, which requires exponential complexity. In particular, it performs search only over lattice points lying in a certain hypersphere of radius $r$ centered around the received vector $x$. The closest lattice point is clearly the solution.

From a practical point of view, there are two issues that have to be resolved. One is the proper choice of the sphere radius $r$: if $r$ is too large there will be too many lattice points in the sphere and we may still require an exponential search; if $r$ is too small there will be no points in the sphere. The other issue concerns determining which lattice points lie within the sphere: if the algorithm were to check all the points in the lattice, we would be again stuck with an exponential search.

We use a statistical criterion to choose radius $r$. In particular, the radius of the sphere is chosen so that with high probability we find at least one lattice point in the sphere. To this end, note that

\[ \|v\|^2 = \|x - Hs\|^2 \tag{18} \]

is a chi-square random variable with $NT$ degrees of freedom. (Recall that each entry of $v$ is an independent $N(0, \sigma^2)$ random variable.) We choose the radius $r$ to be a linear function of the variance of $\|v\|^2$,

\[ r^2 = \alpha^2 NT\sigma^2, \tag{19} \]

where the coefficient $\alpha$ is chosen in such a way that with a high probability $p_{fp}$ we find a lattice point inside a sphere,

\[ \int_0^{\alpha^2 NT} \frac{\lambda^{NT-1}}{\Gamma(NT)} e^{-\lambda} \, d\lambda = p_{fp}. \tag{20} \]

We find $\alpha$ in (20) by a simple table lookup.

Once we have chosen radius $r$, we need to determine which lattice points belong to the sphere of radius $r$. An efficient way to check whether a lattice point belongs to the sphere is given by the algorithm of Fincke and Pohst [7]. Note that $s$ lies in a sphere of radius $r$ if

\[ r^2 \ge \|x - Hs\|^2 = (s - \hat{s})^* H^* H (s - \hat{s}) + \|x\|^2 - \|H\hat{s}\|^2, \tag{21} \]

where $\hat{s} = H^{\dagger} x$. To make the notation simpler, denote size of the vector $s$ as

\[ m = 2MT. \tag{22} \]

(Note that $m$ is the number of unknowns and it will be of interest in studying the complexity.)

Introducing the QR decomposition $H = QR$ (where $Q$ is unitary and $R$ is upper triangular), and defining $r'^2 = r^2 - \|x\|^2 + \|H\hat{s}\|^2$, we can write (21) as

\[ \begin{aligned} r'^2 &\ge (s - \hat{s})^* R^* \underbrace{Q^* Q}_{=I} R (s - \hat{s}) \\ &= (s - \hat{s})^* R^* R (s - \hat{s}) \\ &= \sum_{i=1}^{m} r_{ii}^2 \left( s_i - \hat{s}_i + \sum_{j=i+1}^{m} \frac{r_{ij}}{r_{ii}} (s_j - \hat{s}_j) \right)^2 \\ &= r_{mm}^2 (s_m - \hat{s}_m)^2 + r_{m-1,m-1}^2 \left( s_{m-1} - \hat{s}_{m-1} + \frac{r_{m-1,m}}{r_{m-1,m-1}} (s_m - \hat{s}_m) \right)^2 + \cdots, \end{aligned} \tag{23} \]

where $r_{i,j}$ denotes $(i, j)$ entry of the matrix $R$. A necessary condition for $s$ to lie inside the sphere is therefore that

\[ r_{mm}^2 (s_m - \hat{s}_m)^2 \le r'^2. \tag{24} \]

This condition is easy to check and it leads to

\[ \left\lceil \hat{s}_m - \frac{r'}{r_{mm}} \right\rceil \le s_m \le \left\lfloor \hat{s}_m + \frac{r'}{r_{mm}} \right\rfloor. \tag{25} \]

However, condition (25) is by no means sufficient. For every $s_m$ satisfying (25), upon defining $r'^2_{m-1} = r'^2 - r_{mm}^2 (s_m - \hat{s}_m)^2$ one can state a stronger necessary condition

\[ r_{m-1,m-1}^2 \left( s_{m-1} - \hat{s}_{m-1} + \frac{r_{m-1,m}}{r_{m-1,m-1}} (s_m - \hat{s}_m) \right)^2 \le r'^2_{m-1}, \tag{26} \]

which, with $\hat{s}_{m-1|m} = \hat{s}_{m-1} - \frac{r_{m-1,m}}{r_{m-1,m-1}} (s_m - \hat{s}_m)$, is equivalent to

\[ \left\lceil \hat{s}_{m-1|m} - \frac{r'_{m-1}}{r_{m-1,m-1}} \right\rceil \le s_{m-1} \le \left\lfloor \hat{s}_{m-1|m} + \frac{r'_{m-1}}{r_{m-1,m-1}} \right\rfloor. \tag{27} \]

In a similar fashion, one proceeds for $s_{m-2}$, and so on, stating nested necessary conditions for all elements of $s$. This leads us to the sphere decoding algorithm which essentially finds all points that satisfy the previously stated conditions:

Input: $R$, $x$, $\hat{s}$, $r$.
(1) Set $k = m$, $r'^2_m = r^2 - \|x\|^2 + \|H\hat{s}\|^2$, $\hat{s}_{m|m+1} = \hat{s}_m$.
(2) (Bounds for $s_k$) set $z = r'_k / r_{kk}$, $UB(s_k) = \lfloor z + \hat{s}_{k|k+1} \rfloor$, $s_k = \lceil -z + \hat{s}_{k|k+1} \rceil - 1$.

(3) (Increase $s_k$) $s_k = s_k + 1$. If $s_k \le UB(s_k)$ go to (5), else to (4).
(4) (Increase $k$) $k = k + 1$; if $k = m + 1$, terminate algorithm, else go to (3).
(5) (Decrease $k$) if $k = 1$ go to (6). Else $k = k - 1$, $\hat{s}_{k|k+1} = \hat{s}_k - \sum_{j=k+1}^{m} (r_{kj}/r_{kk})(s_j - \hat{s}_j)$, $r'^2_k = r'^2_{k+1} - r_{k+1,k+1}^2 (s_{k+1} - \hat{s}_{k+1|k+2})^2$, and go to (2).
(6) Solution found. Save $s$ and go to (3).

In general, the closest point search has both worst-case and average complexity that is exponential in the number of unknowns [17]. The same is true for the sphere decoding. However, in our application, the vector $x$ in (16) is not an arbitrary point in space but rather a lattice point perturbed by the noise as expressed by (14). Clearly, the higher the SNR in (12), the less perturbed the lattice point is. Therefore, one may suspect that the expected complexity of the sphere decoding algorithm will depend on the SNR. Indeed, this is the case: the higher the SNR, the lower the complexity.

In [13], we have computed in closed-form the expected complexity (averaged over the noise and the lattice) of the sphere decoding for the nondispersive (flat-fading) channels. It is shown that the expected complexity is polynomial-time over a wide range of SNRs, and is, in fact, often sub-cubic for SNRs that support the data rates being transmitted.

For dispersive channels, explicitly computing the expected complexity appears to be much more complicated, and we are currently not able to analytically perform all the required steps. Nonetheless, simulations suggest the same qualitative performance of polynomial-time complexity, as we observe from the examples in Section 4.

Furthermore, the complexity of the sphere decoding can be improved by exploiting the Toeplitz structure of the channel matrix. In particular, note that the channel matrix preprocessing is required only in order to transform $H$ into an upper triangular form. Due to the Toeplitz structure of $H$, it is in fact sufficient to perform QR factorization of only one coefficient matrix in the MIMO channel impulse response ($H_C$ in (10)). Upon QR factorization of $H_C$ the bottom square submatrix of $H$ becomes upper triangular and thus can be processed by the sphere decoding algorithm to find a lattice point $s$; then one proceeds by adding the contribution of the top $2(C - 1)$ rows of $H$ to find the metric $\|x - Hs\|^2$ and by testing whether the lattice point $s$ belongs to the sphere.

Further improvement in the complexity of the sphere decoding can be obtained by employing the Schnorr-Euchner variation of the Fincke-Pohst algorithm (see [8, 18]). Essentially, by examining points in the hypersphere in a different order (in particular, by starting from the Babai point), significant computational savings can be obtained [18].

4. SIMULATION RESULTS

We first consider a communication system with $M = 2$ transmit and $N = 2$ receive antennas. The channel memory is assumed to be $C = 4$, and the coherence interval time $T = 4$. Data is modulated onto 4-QAM constellation (corresponding to 2-PAM, or $L = 2$, in the real-valued set of (14)). The resulting transmission rate is therefore 4 bits/channel use. The performance comparison of an uncoded transmission in terms of bit error rate (BER) between the sphere decoding and nulling and canceling (or, equivalently, generalized DFE) is shown in Figure 3.

Figure 3: BER performance of SD and DFE for M = 2, N = 2, C = 4, T = 4, L = 2. (Plot of BER versus SNR (dB); curves: sphere decoding, generalized DFE.)

As an indicator of the expected computational complexity of the sphere decoding, we adopt the complexity exponent, $c_e$, defined as

\[ c_e = \frac{\log(\text{expected total flop count})}{\log(m)}, \tag{28} \]

where $m$ is defined in (22). The expected complexity can therefore be expressed as

\[ O\left(m^{c_e}\right) = O\left((2MT)^{c_e}\right). \tag{29} \]

Figure 4: Complexity exponent of the SD for M = 2, N = 2, C = 4, T = 4, L = 2. (Plot of complexity exponent $c_e$ versus SNR (dB).)
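The nested conditions (24)–(27) and steps (1)–(6) of Section 3.1 can be illustrated with a short Python sketch. It is a recursive, depth-first restatement of the enumeration rather than the authors' implementation, and it rests on several assumptions made only for illustration: the function name `sphere_decode` and its arguments are hypothetical, $R$ is taken to be square and upper triangular with nonzero diagonal, `radius_sq` plays the role of $r'^2$, and the search runs over the unbounded integer lattice (the boundaries of $\mathcal{D}_L^{2MT}$ are not enforced).

```python
import math


def sphere_decode(R, s_hat, radius_sq):
    """List all integer vectors s with ||R (s - s_hat)||^2 <= radius_sq.

    Assumptions (illustrative, not from the paper): R is an m x m
    upper-triangular matrix given as a list of rows with nonzero
    diagonal, s_hat is the unconstrained least-squares point, and
    radius_sq stands for r'^2. Returns a list of (distance_sq, s).
    """
    m = len(R)
    s = [0] * m
    found = []

    def search(k, budget):
        # Conditional center of s[k] given s[k+1:], cf. s_hat_{k|k+1}.
        center = s_hat[k] - sum(
            R[k][j] / R[k][k] * (s[j] - s_hat[j]) for j in range(k + 1, m)
        )
        # Half-width of the admissible interval, cf. z = r'_k / r_kk.
        z = math.sqrt(max(budget, 0.0)) / abs(R[k][k])
        lower = math.ceil(center - z)
        upper = math.floor(center + z)
        for candidate in range(lower, upper + 1):
            s[k] = candidate
            # Remaining squared radius for the next level, cf. r'^2_{k-1}.
            remaining = budget - (R[k][k] * (candidate - center)) ** 2
            if remaining < 0.0:
                continue  # excluded by floating-point rounding at the boundary
            if k == 0:
                found.append((radius_sq - remaining, tuple(s)))
            else:
                search(k - 1, remaining)

    search(m - 1, radius_sq)
    return found


def closest_point(R, s_hat, radius_sq):
    """Return the closest lattice point inside the sphere, or None if empty."""
    found = sphere_decode(R, s_hat, radius_sq)
    return min(found)[1] if found else None
```

If the sphere turns out to be empty, `closest_point` returns `None`; in that case one would enlarge the radius and search again, which matches the role of the probability $p_{fp}$ in (20).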

Figure 5: BER performance of SD and DFE for M = 2, N = 2, C = 4, T = 8, L = 4. (Plot of BER versus SNR (dB); curves: sphere decoding, generalized DFE.)

Figure 6: Complexity exponent of the SD for M = 2, N = 2, C = 4, T = 8, L = 4. (Plot of complexity exponent $c_e$ versus SNR (dB).)

Figure 7: BER performance of SD and GDFE for M = 4, N = 4, C = 4, T = 8, L = 4. (Plot of BER versus SNR (dB); curves: sphere decoding, generalized DFE.)

Figure 8: Complexity exponent of the SD for M = 4, N = 4, C = 4, T = 8, L = 4. (Plot of complexity exponent $c_e$ versus SNR (dB).)

The complexity exponent as the function of SNR for the previous example with $m = 16$ is shown in Figure 4. Note that for SNRs above 7 dB we obtain sub-cubic complexity.

As another example, we consider the same $2 \times 2$ system ($M = 2$, $N = 2$), with $C = 4$, but now increase the block length to $T = 8$, and the constellation to 16-QAM, corresponding to $L = 4$ and a transmission rate of 8 bits/channel use. The performance comparison between the sphere decoding and generalized DFE is shown in Figure 5. The complexity exponent as the function of SNR for this example (where $m = 32$) is shown in Figure 6.

As a final example, consider the $4 \times 4$ communication system ($M = 4$, $N = 4$), with $C = 4$ and block length $T = 8$ (and thus $m = 64$). The constellation used is 4-QAM (hence $L = 2$, and the corresponding transmission rate is 8 bits/channel use). The performance comparison between sphere decoding and generalized DFE for this system is shown in Figure 7. The corresponding complexity exponent of the sphere decoding is shown in Figure 8 and is sub-cubic for SNRs above 12 dB.

5. DISCUSSION AND CONCLUSION

We have proposed sphere decoding for maximum-likelihood sequence detection of multiple antenna systems over frequency-selective channels. To employ the sphere decoding, the detection problem was posed as an integer least-squares problem. As illustrated by simulations, the sphere decoding provides several dBs improvement over the MIMO decision-feedback equalization. We have shown empirically that the expected computational complexity of the sphere decoding is polynomial (often sub-cubic) for a wide range of SNRs. Both the sphere decoding and MIMO DFE require some preprocessing of the channel matrix (usually in a form of QR factorization) which, in general, has cubic complexity. Therefore, the maximum-likelihood detection on MIMO channels with memory can be implemented with complexity similar to that of heuristic methods, but with significant performance gains.

REFERENCES

[1] I. E. Telatar, “Capacity of multi-antenna Gaussian channels,” European Transactions on Telecommunications, vol. 6, no. 10, pp. 585–595, 1999.
[2] C. Tidestav, A. Ahlén, and M. Sternad, “Realizable MIMO decision feedback equalizers: structure and design,” IEEE Trans. Signal Processing, vol. 49, no. 1, pp. 121–133, 2001.
[3] M. V. Eyuboğlu and S. U. H. Qureshi, “Reduced state sequence estimation for coded modulation on intersymbol interference channels,” IEEE Journal on Selected Areas in Communications, vol. 7, no. 6, pp. 989–995, 1989.
[4] R. Raheli, A. Polydoros, and C.-K. Tzou, “The principle of per-survivor processing: a general approach to approximate and adaptive MLSE,” in Proc. Global Telecommunications Conference, vol. 2, pp. 1170–1175, Phoenix, Ariz, USA, 1991.
[5] A. M. Tehrani, B. Hassibi, and J. Cioffi, “Adaptive equalization of multiple-input multiple-output frequency selective channels,” in Proc. 33rd Asilomar Conference on Signals, Systems, and Computers, October 1999.
[6] N. Al-Dhahir and A. H. Sayed, “The finite-length multi-input multi-output MMSE-DFE,” IEEE Trans. Signal Processing, vol. 48, no. 10, pp. 2921–2936, 2000.
[7] U. Fincke and M. Pohst, “Improved methods for calculating vectors of short length in a lattice, including a complexity analysis,” Mathematics of Computation, vol. 44, no. 170, pp. 463–471, 1985.
[8] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, “Closest point search in lattices,” IEEE Transactions on Information Theory (to appear).
[9] A. Hassibi and S. Boyd, “Integer parameter estimation in linear models with applications to GPS,” IEEE Trans. Signal Processing, vol. 46, no. 11, pp. 2938–2952, 1998.
[10] E. Viterbo and J. Boutros, “A universal lattice code decoder for fading channels,” IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1639–1642, 1997.
[11] M. O. Damen, A. Chkeif, and J. C. Belfiore, “Lattice codes decoder for space-time codes,” IEEE Communications Letters, vol. 4, no. 5, pp. 161–163, 2000.
[12] M. O. Damen, K. Abed-Meraim, and M. S. Lemdani, “Further results on the sphere decoder algorithm,” submitted to IEEE Transactions on Information Theory, 2000.
[13] B. Hassibi and H. Vikalo, “Expected complexity of the sphere decoding algorithm,” submitted to IEEE Trans. on Signal Processing, 2002.
[14] G. J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas,” Bell Labs Technical Journal, vol. 1, no. 2, pp. 41–59, 1996.
[15] J. M. Cioffi and G. D. Forney, “Generalized decision-feedback equalization for packet transmission with ISI and Gaussian noise,” in Communications, Computation, Control, and Signal Processing: a tribute to Thomas Kailath, Kluwer Academic, Boston, Mass, USA, 1997.
[16] B. Hassibi, “An efficient square-root algorithm for BLAST,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 2, pp. 737–740, 2000.
[17] M. Ajtai, “Generating hard instances of lattice problems,” in Proc. 28th Annual ACS Symposium on the Theory of Computing, pp. 99–108, Philadelphia, Pa, USA, May 1996.
[18] C. P. Schnorr and M. Euchner, “Lattice basis reduction: improved practical algorithms and solving subset sum problems,” Mathematical Programming, vol. 66, pp. 181–191, 1994.

Haris Vikalo was born in Tuzla, Bosnia and Herzegovina. He received his B.S. degree from University of Zagreb, Croatia, in 1994, his M.S. degree from Lehigh University, Bethlehem, PA, in 1997, and Ph.D. degree from Stanford University, Stanford, Calif, USA, in 2002, all in electrical engineering. He held short-term positions at Siemens (Croatia), Bell Laboratories (Murray Hill, NJ, USA), and California Institute of Technology (Pasadena, Calif, USA). His research interests include wireless communications, signal processing, and algorithm complexity.

Babak Hassibi was born in Tehran, Iran, in 1967. He received his B.S. degree from University of Tehran in 1989 and the M.S. and Ph.D. degrees from Stanford University, Stanford, Calif, USA, in 1993 and 1996, respectively, all in electrical engineering. From October 1996 to October 1998, he was a Research Associate with the Information Systems Laboratory, Stanford University, and from November 1998 to January 2001, he was Member of Technical Staff at Bell Laboratories, Murray Hill, NJ, USA. Since January 2001, he has been an Assistant Professor of electrical engineering at the California Institute of Technology, Pasadena, USA. He has also held short-term appointments at Ricoh California Research Center, the Indian Institute of Science, and Linköping University, Linköping, Sweden. His research interests include wireless communications, robust estimation and control, adaptive signal processing and linear algebra. He is coauthor of the books “Indefinite Quadratic Estimation and Control: A Unified Approach to the H2 and H∞ Theories” (New York: SIAM, 1999) and “Linear Estimation” (Englewood Cliffs, NJ: Prentice-Hall, 2000). He was also the recipient of the 1999 O. Hugo Schuck best paper award of the American Automatic Control Council.

EURASIP Journal on Applied Signal Processing 2002:5, 532–537
© 2002 Hindawi Publishing Corporation

Linear Equalization Combined with Multiple Symbol Decision Feedback Detection for Differential Space-Time Modulation

Genyuan Wang Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA Email: [email protected]

Aijun Song Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA Email: [email protected]

Xiang-Gen Xia Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA Email: [email protected]

Received 30 May 2001 and in revised form 19 February 2002

Differential space-time modulation (DSTM) for multiple antenna communication systems has been recently proposed for frequency-nonselective fading channels. In broadband multirate systems, frequency-selective fading may occur. In this paper, the DSTM in frequency-selective fading channels is considered. A linear equalizer for the DSTM is proposed over frequency-selective fading channels. Furthermore, the multiple symbol decision feedback detection is used to improve the performance of the linear equalizer.

Keywords and phrases: differential space-time modulation, transmit diversity, decision feedback detection, MIMO, linear equalization, RLS.

1. INTRODUCTION

Differential space-time modulation (DSTM) has been recently proposed by Hochwald and Sweldens [1], Hughes [2], and Tarokh and Jafarkhani [3] for multiple antenna communication systems, where the channel information is not known to the transmitters or receivers. Several unitary space-time codes for the DSTM have been proposed: group codes in [1, 2], quaternion codes in [2, 4], fixed-point free group codes in [5], and parametric codes in [6]. The $2 \times 2$ parametric codes obtained in [6] for two transmit antennas have the largest possible diversity product of size five and the largest known diversity product of size sixteen, which will be used in the simulations in this paper later.

Space-time coded multiple antenna systems can be cast as multiple input and multiple output (MIMO) systems. The recent studies of DSTM in [1, 2, 3] are on frequency-nonselective fading channels where MIMO channel matrices do not have memory. However, in broadband multirate systems, frequency-selective fading and frequency-nonselective fading may co-exist. In this paper, we are interested in the DSTM system when frequency-selective fading occurs. An equalizer combined with the differential detection is proposed by exploiting the structure of the DSTM and its differential detection. The equalizer is to minimize the detection errors. In such an equalizer, not only the channels but also the input signals and the output signals are of matrix form. Similar to the existing MIMO equalization techniques in the literature, see for example [7, 8], the least squares (LS) solution and an adaptive RLS algorithm are derived. As a remark, equalizations of space-time trellis coded multi-antenna systems over frequency-selective fading channels have been considered in [9, 10].

The DSTM can be treated as a generalization of differential phase shift keying (DPSK). Thus, its differential detection scheme is similar to that of DPSK [1]. Decision feedback detection schemes have been used to improve the performance for the DPSK systems in [11]. In this paper, we also generalize this idea to the equalizer for the DSTM system by incorporating the multiple symbol decision feedback detection in the linear equalizer.

This paper is organized as follows. The differential space-time modulation is briefly reviewed with the frequency-selective fading channel model in Section 2. The equalization scheme is presented under the LS criterion and the equalizer combined with the multiple symbol decision feedback detection is presented in Section 3. In Section 4, the RLS algorithm is derived. Simulation results and conclusions are given in Sections 5 and 6, respectively.

2. DIFFERENTIAL SPACE-TIME MODULATION AND SIGNAL MODEL

Figure 1 shows the transmission scheme of differential space-time modulation. Suppose there are $M$ transmit, $N$ receive antennas in a frequency-selective fading environment. The transmission rate is $R_t$ bits per channel use.

Figure 1: The transmission scheme of differential space-time modulation with multiple antennas.

An information sequence is grouped as $z_k$ and then mapped to $V_{z_k}$, $V_{z_k} \in \{V_l, l = 0, 1, \ldots, L - 1\}$, a space-time code, where $L = 2^{MR_t}$ and $L$ is the signal constellation size. Using the differential modulation technique (similar to the single antenna system), the transmit signal of the $k$th block is

\[ S(k) = S(k - 1) V_{z_k}, \tag{1} \]

where

\[ S(k) = \begin{bmatrix} s_{1,kM} & s_{1,kM+1} & \cdots & s_{1,(k+1)M-1} \\ s_{2,kM} & s_{2,kM+1} & \cdots & s_{2,(k+1)M-1} \\ \vdots & \vdots & \ddots & \vdots \\ s_{M,kM} & s_{M,kM+1} & \cdots & s_{M,(k+1)M-1} \end{bmatrix}, \tag{2} \]

$s_{m,t}$ is the transmitted signal on the $m$th transmit antenna at time $t$ and $S(0)$ is the $M \times M$ identity matrix. Since $V_{z_k}$ is a unitary matrix, $S(k)$ is also unitary. Thus, the mean signal power at time $t$ is always unit, that is,

\[ E \sum_{m=1}^{M} \left| s_{m,t} \right|^2 = 1. \tag{3} \]

On the $n$th receive antenna at time $t$, the received signal is

\[ x_{n,t} = \sqrt{\rho} \sum_{m=1}^{M} \sum_{d=0}^{L_h - 1} h_{n,m}(t, d) s_{m,t-d} + w_{n,t}, \tag{4} \]

where $h_{n,m}(t, d)$ is the $d$th tap coefficient of the channel between the $m$th transmit and the $n$th receive antennas at time $t$, $w_{n,t}$ is the additive noise on the $n$th receive antenna at time $t$, $L_h$ is the largest delay spread of all channels, and $\rho$ is the signal-to-noise ratio (SNR) on the receive antennas. For the energy normalization purpose, the channels are passive as in [9], that is,

\[ \sum_{d=0}^{L_h - 1} \left| h_{n,m}(t, d) \right|^2 = 1. \tag{5} \]

It is assumed that $w_{n,t}$ is complex Gaussian white noise with zero mean and unit variance, that is, $w_{n,t} \sim CN(0, 1)$, and that the $w_{n,t}$ are independent of each other with respect to both $n$ and $t$. As in [1, 2, 3, 4, 5, 6], the $h_{n,m}(t, d)$ are assumed independent of each other with respect to $m$ and to $n$ but correlated with respect to $t$. When the fading is slow, the channel coefficients can be taken as constant over a period of time, the coherence time. Thus $h_{n,m}(t, d)$ can be represented by $h_{n,m}(d)$ within the coherence time. Therefore, (4) can be written in the following matrix form:

\[ X(k) = \sqrt{\rho} \sum_{d=0}^{L_h - 1} H_d S(k - d) + W(k), \tag{6} \]

where

\[ X(k) = \begin{bmatrix} x_{1,kM} & x_{1,kM+1} & \cdots & x_{1,(k+1)M-1} \\ x_{2,kM} & x_{2,kM+1} & \cdots & x_{2,(k+1)M-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N,kM} & x_{N,kM+1} & \cdots & x_{N,(k+1)M-1} \end{bmatrix}, \quad H_d = \begin{bmatrix} h_{1,1}(d) & h_{1,2}(d) & \cdots & h_{1,M}(d) \\ h_{2,1}(d) & h_{2,2}(d) & \cdots & h_{2,M}(d) \\ \vdots & \vdots & \ddots & \vdots \\ h_{N,1}(d) & h_{N,2}(d) & \cdots & h_{N,M}(d) \end{bmatrix}, \tag{7} \]

and $W(k)$ is the $N \times M$ noise matrix on the receive antennas at the $k$th block, which is formed from $w_{n,t}$.

In [1, 2, 3, 4, 5, 6], delay spread $L_h = 1$ is assumed. In this case, (6) becomes

\[ X(k) = \sqrt{\rho} H S(k) + W(k), \tag{8} \]

where $H = (h_{n,m})_{N \times M}$.

When the channel matrix is approximately constant during the transmission of two blocks of data, (8) can be rewritten as

\[ X(k) = \sqrt{\rho} H S(k - 1) V_{z_k} + W(k) \tag{9} \]
\[ \phantom{X(k)} = \left( X(k - 1) - W(k - 1) \right) V_{z_k} + W(k) \tag{10} \]
\[ \phantom{X(k)} = X(k - 1) V_{z_k} + W'(k), \tag{11} \]

where $W'(k) = W(k) - W(k - 1) V_{z_k}$. As $V_{z_k}$ is unitary, the additive noise $W'(k)$ is still independent Gaussian. Therefore, the maximum-likelihood detector [1, 2] for (11) is

\[ \hat{V}_{z_k} = \arg \min_{l = 0, 1, \ldots, L - 1} \left\| X(k) - X(k - 1) V_l \right\|, \tag{12} \]

where A is the Frobenius norm of A = (amn)M×N ,defined as X(k) Q(k) Vˆ (k) cν Decision M N 1/2   = H = 2 A Tr A A amn . (13) Σ = = m 1 n 1 Qref (k)
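As a worked step behind the remark on $W'(k)$ in (11), the following sketch uses only the unit-variance noise assumption stated above. Right-multiplying by the unitary matrix $V_{z_k}$ leaves each row of $W(k-1)$ distributed as $\mathcal{CN}(0, I_M)$, and $W(k-1)$ is independent of $W(k)$, so every entry $w'_{n,t}$ of $W'(k)$ satisfies

```latex
W'(k) = W(k) - W(k-1)\,V_{z_k},
\qquad
\mathrm{E}\,\bigl|w'_{n,t}\bigr|^2
  = \mathrm{E}\,\bigl|w_{n,t}\bigr|^2 + \mathrm{E}\,\bigl|[W(k-1)V_{z_k}]_{n,t}\bigr|^2
  = 1 + 1 = 2 .
```

The detector (12) therefore works against twice the noise power of a coherent detector that knows $H$, which is the familiar 3 dB penalty of differential detection.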

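The encoding rule (1) and the detector (12) can be exercised with a small simulation. The sketch below is illustrative only: it assumes a flat channel ($L_h = 1$) that stays fixed over the whole run, $N = 2$ receive antennas, and a diagonal unitary constellation `code_matrix` chosen for convenience (a hypothetical stand-in, not the parametric code used in the simulations of this paper). All helper names are likewise assumptions.

```python
import cmath
import math
import random

M = 2    # transmit antennas, also the block length
N = 2    # receive antennas (assumed for this sketch)
L = 16   # constellation size, so R_t = log2(L) / M = 2 bits per channel use


def matmul(a, b):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def frob_dist(a, b):
    """Frobenius norm of a - b, as defined in (13)."""
    return math.sqrt(sum(abs(x - y) ** 2
                         for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def code_matrix(l):
    """Illustrative diagonal unitary matrix (assumption, not the code of [6])."""
    return [[cmath.exp(2j * math.pi * l / L), 0],
            [0, cmath.exp(2j * math.pi * 7 * l / L)]]


CODE = [code_matrix(l) for l in range(L)]


def cgauss(rng):
    """One CN(0, 1) sample."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0, s), rng.gauss(0, s))


def simulate(num_blocks, snr_db, seed=0):
    """Return the transmitted and detected symbol indices for one run."""
    rng = random.Random(seed)
    rho = 10 ** (snr_db / 10)
    # Channel matrix H of (8); the receiver never uses it.
    H = [[cgauss(rng) for _ in range(M)] for _ in range(N)]

    def receive(S):
        HS = matmul(H, S)
        return [[math.sqrt(rho) * HS[n][m] + cgauss(rng) for m in range(M)]
                for n in range(N)]

    S = [[1, 0], [0, 1]]          # S(0) is the identity matrix
    X_prev = receive(S)
    sent, detected = [], []
    for _ in range(num_blocks):
        z = rng.randrange(L)
        S = matmul(S, CODE[z])    # differential encoding, (1)
        X = receive(S)            # flat-channel model, (8)
        # Differential ML detection, (12): no channel knowledge needed.
        z_hat = min(range(L),
                    key=lambda l: frob_dist(X, matmul(X_prev, CODE[l])))
        sent.append(z)
        detected.append(z_hat)
        X_prev = X
    return sent, detected
```

Calling `simulate(200, 40.0)` and comparing the two returned lists gives a quick check that the detector recovers the data at high SNR without ever estimating $H$; lowering `snr_db` shows where detection errors start to appear.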
3. EQUALIZATION OF DIFFERENTIAL SPACE-TIME MODULATED SYSTEM

In broadband multirate wireless communication links, the assumption of delay spread $L_h = 1$ does not always hold. When multipath exists, equalization is needed to compensate for the intersymbol interference (ISI). In the single antenna case, the linear equalizer works as a filter that suppresses the ISI term, and the scalar coefficients of the equalizer are the parameters of the filter. In the multiple transmit antenna system, these coefficients are matrices, and the input and the output are also matrices. The structure of the equalization is shown in Figure 2.

Figure 2: Block diagram of receiver combining linear equalization with multiple symbol feedback detection for differential space-time modulation. [Block diagram with blocks $c_\nu$, $\Sigma$, and Decision, and signals $X(k)$, $Q(k)$, $Q_{\mathrm{ref}}(k)$, $\hat{V}(k)$.]

The output of such an equalizer is

$$\begin{aligned} Q(k) &= \sum_{\nu=0}^{L_e-1} c_\nu X(k-\nu) && \text{(14)} \\ &= G_{k_0} S(k-k_0) + \sum_{\substack{\nu=0 \\ \nu \neq k_0}}^{L_h+L_e-2} G_\nu S(k-\nu) + \sum_{\nu=0}^{L_e-1} c_\nu W(k-\nu), && \text{(15)} \end{aligned}$$

where $c_\nu$, $\nu = 0, 1, \ldots, L_e-1$, are $N \times N$ coefficient matrices for equalization, $L_e$ is the length of the equalizer,

$$G_\nu = \sum_{\mu=0}^{L_e-1} c_\mu H_{\nu-\mu} \tag{16}$$

is the equivalent linear transform of the cascade of the channel and the equalizer (with $H_d = 0$ for $d$ outside $0, \ldots, L_h-1$, and with the factor $\sqrt{\rho}$ of (6) taken to be absorbed into the channel matrices), and $k_0$ is the overall delay caused by the linear transform $G_\nu$.

After equalization, the ISI term, that is, the second term in (15), is compensated. So

$$Q(k) = G_{k_0} S(k-k_0) + \sum_{\nu=0}^{L_e-1} c_\nu W(k-\nu). \tag{17}$$

Therefore, (12) can be applied to (17). That is,

$$\hat{V}_{z_{k-k_0}} = \arg\min_{l = 0, 1, \ldots, L-1} \bigl\| Q(k) - Q(k-1)\,V_l \bigr\|. \tag{18}$$

As in DPSK with a single antenna [11, 12], $Q(k-1)$ can be substituted by

$$Q_{\mathrm{ref}}(k) = \frac{1}{L_n-1} \sum_{j=1}^{L_n-1} Q(k-j) \prod_{i=j-1}^{1} \hat{V}_{z_{k-k_0-i}}, \tag{19}$$

where the product is taken in the order $i = j-1, j-2, \ldots, 1$ from left to right and equals the identity matrix for $j = 1$, and the feedback order $L_n$ is the number of previous equalization outputs used in the decision. If $L_n = 2$, $Q_{\mathrm{ref}}(k) = Q(k-1)$. In (19), each term in the summation is an estimate of $Q(k-1)$, because by (1) the detected matrices carry $S(k-j-k_0)$ forward to $S(k-1-k_0)$. $Q_{\mathrm{ref}}(k)$ is the average of $L_n-1$ estimated values of $Q(k-1)$ and is therefore less noisy. Hence, when $L_n > 2$, a performance improvement can be obtained, and the detector becomes

$$\hat{V}_{z_{k-k_0}} = \arg\min_{l = 0, 1, \ldots, L-1} \bigl\| Q(k) - Q_{\mathrm{ref}}(k)\,V_l \bigr\|. \tag{20}$$

The equalizer is to minimize the error $e(k) = Q(k) - Q_{\mathrm{ref}}(k)\,V_{z_{k-k_0}}$. The cost function is defined as in [13],

$$\psi(k) = \sum_{\tau=1}^{k} \lambda^{k-\tau} \bigl\| e(\tau) \bigr\|^2, \tag{21}$$

where $\lambda$ is the forgetting factor. Under the LS criterion, the equalization coefficient matrices $c_\nu$, $\nu = 0, 1, \ldots, L_e-1$, can be obtained. With the obtained equalizer coefficients, the detector (20) can be applied to the equalization outputs $Q(k-j)$, $j = 0, 1, \ldots, L_n-1$.

4. RLS ALGORITHM FOR DIFFERENTIAL SPACE-TIME MODULATED SYSTEM

In (14), (15), and (16) of Section 3, the channels and the input and output signals are of matrix form. Similar to the existing equalization techniques for MIMO systems [7, 8, 14], we can derive the LS solution and the RLS solution for differential space-time modulated systems as follows.

Let $c_\nu^{il}$ and $x^{il}(k)$ denote the $(i, l)$th elements of the matrices $c_\nu$ and $X(k)$, respectively, and define the vectors

$$\mathbf{c}^{il} = \bigl[\, c_0^{il} \;\; c_1^{il} \;\; \cdots \;\; c_{L_e-1}^{il} \,\bigr]^H, \qquad \mathbf{x}^{il}(k) = \bigl[\, x^{il}(k) \;\; x^{il}(k-1) \;\; \cdots \;\; x^{il}(k-L_e+1) \,\bigr]^T, \tag{22}$$

where the superscript $H$ denotes the complex conjugate transpose and $T$ denotes the transpose. Equation (14) can be written in matrix form

$$Q(k) = C^H R(k), \tag{23}$$

where

$$C = \begin{bmatrix} \mathbf{c}^{11} & \mathbf{c}^{21} & \cdots & \mathbf{c}^{N1} \\ \mathbf{c}^{12} & \mathbf{c}^{22} & \cdots & \mathbf{c}^{N2} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{c}^{1N} & \mathbf{c}^{2N} & \cdots & \mathbf{c}^{NN} \end{bmatrix}, \qquad R(k) = \begin{bmatrix} \mathbf{x}^{11}(k) & \mathbf{x}^{12}(k) & \cdots & \mathbf{x}^{1M}(k) \\ \mathbf{x}^{21}(k) & \mathbf{x}^{22}(k) & \cdots & \mathbf{x}^{2M}(k) \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{x}^{N1}(k) & \mathbf{x}^{N2}(k) & \cdots & \mathbf{x}^{NM}(k) \end{bmatrix}, \tag{24}$$

so that $C$ is $L_e N \times N$ and $R(k)$ is $L_e N \times M$. In addition,

$$D(k) = Q_{\mathrm{ref}}(k)\,\hat{V}_{z_{k-k_0}}. \tag{25}$$

Therefore $C$, which minimizes $\psi(k)$, can be obtained by differentiating $\psi(k)$ in (21) and setting the result to the zero matrix, that is,

$$\phi(k)\,C = \varphi(k), \tag{26}$$

where

$$\phi(k) = \sum_{\tau=1}^{k} \lambda^{k-\tau} R(\tau) R^H(\tau), \qquad \varphi(k) = \sum_{\tau=1}^{k} \lambda^{k-\tau} R(\tau) D^H(\tau). \tag{27}$$

Note that $R(\tau)$ and $D(\tau)$ are matrices rather than the vectors of the conventional normal equation [13]. Using the matrix inversion lemma [13], an iterative algorithm for (26) can be obtained. Hence, the RLS algorithm for space-time modulation with multiple antennas is as follows:

$$P(0) = \sigma^{-1} I_{L_e N}, \qquad \sigma = 0.004, \tag{28}$$

$$\begin{aligned} \Gamma(k) &= \lambda^{-1} P(k-1) R(k) && \text{(29)} \\ &\quad \times \bigl[ I_M + \lambda^{-1} R^H(k) P(k-1) R(k) \bigr]^{-1}, && \text{(30)} \end{aligned}$$

$$\xi(k) = D(k) - C^H(k-1) R(k), \tag{31}$$

$$C(k) = C(k-1) + \Gamma(k)\,\xi^H(k), \tag{32}$$

$$P(k) = \lambda^{-1} P(k-1) - \lambda^{-1} \Gamma(k) R^H(k) P(k-1), \tag{33}$$

where $I_\nu$ is the $\nu \times \nu$ identity matrix and $C(k)$ denotes the value of $C$ after $k$ iterations. If $M = N = 1$, this is just the RLS algorithm in [13]. From (25), (23), (19), (31), and (32), $C(0)$ should have a nonzero value; otherwise $Q(k)$, $D(k)$, and $\xi(k)$ are all zero and $C(k)$ never leaves zero. In the following simulations, all the elements of $c_{k_0}$ are initially set to 0.1, that is, $c_{k_0}^{ij} = 0.1$, $i, j = 1, 2, \ldots, N$. For fixed channels, $\lambda$ can be set to 1, and the $\hat{V}_{z_{k-k_0}}$ are assumed known in the training. After the training, the detector in (20) is applied, and $\hat{V}_{z_{k-k_0}}$ is the previous detection result, to which the multiple symbol decision feedback detection refers. For fading channels, $\lambda$ can be set to a value less than 1, so that the receiver can track the variation of the channels.

5. SIMULATION RESULTS

In this section, the performance of the equalization scheme for the differential space-time modulated system is shown by simulations. In the following simulations, the parametric space-time code of size 16 in [6] is adopted. For $l = 0, 1, \ldots, 15$,

$$V_l = \begin{bmatrix} \exp\!\left(\dfrac{jl\pi}{8}\right) & 0 \\ 0 & \exp\!\left(\dfrac{j3l\pi}{8}\right) \end{bmatrix} \times \begin{bmatrix} \cos\dfrac{l\pi}{2} & \sin\dfrac{l\pi}{2} \\ -\sin\dfrac{l\pi}{2} & \cos\dfrac{l\pi}{2} \end{bmatrix} \times \begin{bmatrix} \exp\!\left(\dfrac{jl\pi}{4}\right) & 0 \\ 0 & \exp\!\left(\dfrac{-jl\pi}{4}\right) \end{bmatrix}. \tag{34}$$

This code has the best known diversity product in the literature for its size, namely $2^{1/4}/2$ in the notation of [1]. Although this code itself is not a group, it is a subset of a group of size 32 [6]. With this space-time code, the number of transmit antennas is $M = 2$ and the transmission data rate is $R_t = 2$. The number of receive antennas is set to 2, that is, $N = 2$. The channel delay spread $L_h = 2$ is used throughout the simulations.

The simulation results are given for two cases, fixed channels and fading channels. In the fixed channel case, the two taps of a channel have equal power, as for the channels studied in [9]. Figure 3 shows the learning curve of the RLS algorithm for differential space-time modulation, with $J(k) = \mathrm{E}\|e(k)\|^2$. The length of the equalizer is $L_e = 3$. An information sequence of about 300 symbols known to the receiver is used in the training. The curves are averaged over 1000 training processes. The forgetting factor is $\lambda = 1.0$, and the overall delay $k_0$ is set to zero. Although the convergence does not speed up remarkably, the error decreases as the feedback order $L_n$ increases.

The performance of the different receivers is shown in Figure 4. Because the energy of the first taps and that of the second taps are the same, the ratio of the desired signal to the interference is 0 dB when no equalization is used. In this case, the performance is poor (the BLER is above $10^{-1}$) whatever the SNR is, so this case is not shown in Figure 4. From Figure 4 one can see that increasing the feedback order $L_n$ improves the performance, especially at high SNR. However, increasing $L_n$ also increases the computational complexity.

The performance of the receive scheme over fading channels is shown in Figure 5. All the channel coefficients, independent of each other, are generated by Jakes's model [15]. The autocorrelation of a fading channel at a lag of $t$ time samples is $J_0(2\pi f_d T_s t)$, where $J_0$ is the zero-order Bessel function of the first kind and $f_d T_s$ is the fading rate.
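The constellation in (34) and the quoted diversity product can be checked numerically. The sketch below assumes the definition used in [1], $\zeta = \frac{1}{2}\min_{l \neq l'} |\det(V_l - V_{l'})|^{1/M}$ with $M = 2$; the function names are illustrative and not taken from the paper.

```python
import cmath
import math


def matmul(a, b):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def parametric_code(l):
    """V_l of (34): diagonal phase matrix, rotation, diagonal phase matrix."""
    d1 = [[cmath.exp(1j * l * math.pi / 8), 0],
          [0, cmath.exp(3j * l * math.pi / 8)]]
    c, s = math.cos(l * math.pi / 2), math.sin(l * math.pi / 2)
    rot = [[c, s], [-s, c]]
    d2 = [[cmath.exp(1j * l * math.pi / 4), 0],
          [0, cmath.exp(-1j * l * math.pi / 4)]]
    return matmul(matmul(d1, rot), d2)


def is_unitary(v, tol=1e-12):
    """Check that V V^H equals the 2 x 2 identity up to rounding."""
    vh = [[v[c][r].conjugate() for c in range(2)] for r in range(2)]
    p = matmul(v, vh)
    return all(abs(p[r][c] - (1 if r == c else 0)) < tol
               for r in range(2) for c in range(2))


def diversity_product(code):
    """0.5 * min |det(V_l - V_l')|^(1/2) over all distinct pairs (M = 2)."""
    best = float("inf")
    for i in range(len(code)):
        for j in range(i + 1, len(code)):
            d = [[code[i][r][c] - code[j][r][c] for c in range(2)]
                 for r in range(2)]
            det = d[0][0] * d[1][1] - d[0][1] * d[1][0]
            best = min(best, abs(det) ** 0.5)
    return 0.5 * best


CODE16 = [parametric_code(l) for l in range(16)]
print(all(is_unitary(v) for v in CODE16))      # True
print(round(diversity_product(CODE16), 4))     # 0.5946
```

The printed value agrees with $2^{1/4}/2 \approx 0.5946$. Because the code is not a group, the minimum has to be taken over all pairs rather than over differences from the identity alone.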

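The recursion (28)-(33) of Section 4 can be sketched in a few lines. The example below is a sanity check under stated assumptions rather than the simulation reported here: the regressor $R(k)$ is random, the target $D(k)$ is generated from a hypothetical known matrix `C_true` instead of decision feedback, $M = 2$ so that a 2 × 2 inverse suffices, and $\lambda = 1$. All function names are illustrative.

```python
import random


def matmul(a, b):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def herm(a):
    """Conjugate transpose."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def add(a, b, scale=1.0):
    """Elementwise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def eye(n, value=1.0):
    """n x n identity matrix scaled by value."""
    return [[value if i == j else 0.0 for j in range(n)] for i in range(n)]


def inv2(a):
    """Inverse of a 2 x 2 matrix (enough for M = 2)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def rls_step(C, P, R, D, lam=1.0):
    """One update; C is K x N, P is K x K, R is K x 2, D is N x 2."""
    PR = matmul(P, R)
    inner = add(eye(2), matmul(herm(R), PR), 1.0 / lam)
    gain = [[x / lam for x in row]
            for row in matmul(PR, inv2(inner))]               # (29)-(30)
    # A priori error, computed with the coefficients before the update.
    xi = add(D, matmul(herm(C), R), -1.0)                     # (31)
    C_new = add(C, matmul(gain, herm(xi)))                    # (32)
    P_new = [[x / lam for x in row]
             for row in add(P, matmul(gain, matmul(herm(R), P)), -1.0)]  # (33)
    return C_new, P_new, xi


def demo(num_blocks=200, seed=0):
    """Fit a random K x N matrix from noise-free data; return the max error."""
    rng = random.Random(seed)
    K, N, M = 6, 2, 2                     # K = L_e * N with L_e = 3

    def cg():
        return complex(rng.gauss(0, 1), rng.gauss(0, 1))

    C_true = [[cg() for _ in range(N)] for _ in range(K)]
    C = [[0.1 + 0j] * N for _ in range(K)]      # nonzero C(0)
    P = eye(K, 1.0 / 0.004)                     # (28) with sigma = 0.004
    for _ in range(num_blocks):
        R = [[cg() for _ in range(M)] for _ in range(K)]
        D = matmul(herm(C_true), R)             # synthetic target (assumption)
        C, P, xi = rls_step(C, P, R, D)
    return max(abs(C[i][j] - C_true[i][j])
               for i in range(K) for j in range(N))
```

Calling `demo()` returns the largest coefficient error after 200 blocks, which should be small because the data are noise-free and the initial regularization $\sigma$ is tiny. In the receiver described above, `D` would instead be built from $Q_{\mathrm{ref}}(k)$ and either the training symbols or the previous decisions, as in (25).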
Figure 3: Learning curves of the RLS algorithm for space-time modulation with different $L_n$. [Plot: $J(k)$ versus $K$ (0 to 200); curves for $L_n$ = 2, 3, 5, 10.]

Figure 5: The block error rate of space-time modulation with equalization under frequency selective fading environment, $f_d T_s = 10^{-5}$. [Plot: block error rate versus SNR (12 to 30); curves for the receiver without equalizer and for $L_n$ = 2, 3, 5, 10.]

In the GSM system, the carrier frequency is 900 MHz. If a basic GSM EDGE slot structure with a symbol rate of $2.71 \times 10^5$ symbols/s is used and the speed of the mobile station is 3 km/h (TU3), the fading rate $f_d T_s$ is about $10^{-5}$ (a maximum Doppler shift of 2.5 Hz divided by the symbol rate gives approximately $9.2 \times 10^{-6}$), which is the value assumed in the simulation. The average energy of the first taps of the channels is 9/10 of the total energy. The length of the equalizer is $L_e = 3$. Before the fading is applied, 800 symbols are used for training. After training, $\lambda$ is set to 0.98. The curve marked by ◦ in Figure 5 is the performance without equalization, which has an error floor around a BLER of $10^{-1}$. The receiver with $L_n > 2$ still performs better than the receiver with $L_n = 2$. However, the receiver with $L_n = 10$ has approximately the same BLER as the receiver with $L_n = 5$. The reason is that, because of the fading of the channel, (19) does not give a better estimate of $Q(k-1)$.

Figure 4: The block error rate of space-time modulated system under fixed ISI channels. [Plot: block error rate versus SNR (8 to 22); curves for $L_n$ = 2, 3, 5, 10.]

6. CONCLUSIONS

In this paper, an equalizer combined with differential detection was proposed for the differential space-time modulated system over frequency-selective fading channels. By applying the LS criterion to the detection error, the RLS algorithm was derived. Multiple symbol decision feedback detection was also incorporated to improve the performance of the combined equalizer and detector. The performance was illustrated by simulations. After this paper was completed, we found a related and independent work [16].

ACKNOWLEDGMENTS

This work was supported in part by the Air Force Office of Scientific Research (AFOSR) under Grant F49620-00-1-0086 and the National Science Foundation (NSF) Grants MIP-9703377 and CCR-0097240.

REFERENCES

[1] B. M. Hochwald and W. Sweldens, “Differential unitary space-time modulation,” IEEE Trans. Communications, vol. 48, no. 12, pp. 2041–2052, 2000.
[2] B. L. Hughes, “Differential space-time modulation,” IEEE Transactions on Information Theory, vol. 46, no. 7, pp. 2567–2578, 2000.
[3] V. Tarokh and H. Jafarkhani, “A differential detection scheme for transmit diversity,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 7, pp. 1169–1174, 2000.
[4] B. L. Hughes, “Optimal space-time constellations from groups,” submitted to IEEE Transactions on Information Theory, 2000.

[5] A. Shokrollahi, B. Hassibi, B. M. Hochwald, and W. Sweldens, “Representation theory for high-rate multiple-antenna code design,” IEEE Transactions on Information Theory, vol. 47, no. 6, pp. 2335–2367, 2000.
[6] X.-B. Liang and X.-G. Xia, “Unitary signal constellations for differential space-time modulation with two transmit antennas: Parametric codes, optimal designs, and bounds,” to appear in IEEE Transactions on Information Theory.
[7] N. Al-Dhahir and A. H. Sayed, “The finite-length multi-input multi-output MMSE-DFE,” IEEE Trans. Signal Processing, vol. 48, no. 10, pp. 2921–2936, 2000.
[8] A. Duel-Hallen, “Equalizers for multiple input/output channels and PAM systems with cyclostationary input sequences,” IEEE Journal on Selected Areas in Communications, vol. 10, no. 3, pp. 630–639, 1992.
[9] G. Bauch, A. F. Naguib, and N. Seshadri, “MAP equalization of space-time coded signals over frequency selective channels,” in Proc. Wireless Communications and Networking Conference, New Orleans, La, USA, September 1999.
[10] A. F. Naguib, “Equalization of transmit diversity space-time coded signals,” in IEEE Globecom 2000, pp. 1077–1082, San Francisco, Calif, USA, December 1999.
[11] R. Schober, W. H. Gerstacker, and J. B. Huber, “Adaptive linear equalization combined with noncoherent detection for MDPSK signals,” IEEE Trans. Communications, vol. 48, no. 5, pp. 733–738, 2000.
[12] F. Edbauer, “Bit error rate of binary and quaternary DPSK signals with multiple differential feedback detection,” IEEE Trans. Communications, vol. 40, no. 3, pp. 457–460, 1992.
[13] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Upper Saddle River, NJ, USA, 3rd edition, 1996.
[14] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation, Prentice-Hall, Upper Saddle River, NJ, USA, 2000.
[15] G. L. Stuber, Principles of Mobile Communication, Kluwer Academic, Boston, Mass, USA, 2nd edition, 2001.
[16] Y. Liu and X. Wang, “Multiple-symbol decision-feedback space-time differential decoding in fading channels,” EURASIP Journal on Applied Signal Processing, vol. 2002, no. 3, pp. 297–304, 2002, Special Issue on Space-Time Coding and Its Applications—Part I.

Aijun Song received his B.S. and M.S. degrees in electrical engineering from Xidian University, Xi’an, China, in 1997 and 2000, respectively. From 1997 to 2000, he was a Research Assistant with the National Key Lab for Radar Signal Processing at Xidian University, Xi’an, China. Since 2000, he has been a Research Assistant with the Department of Electrical and Computer Engineering, University of Delaware, USA. His general interests include space-time coding techniques and multicarrier modulations in communications.

Genyuan Wang received his B.S. and M.S. degrees in mathematics from Shaanxi Normal University, Xi’an, China, in 1985 and 1988, respectively, and the Ph.D. degree in electrical engineering from Xidian University, Xi’an, China, in 1998. From July 1988 to September 1994, he worked at Shaanxi Normal University as an Assistant Professor and then an Associate Professor. From September 1994 to May 1998, he worked at Xidian University as a Research Assistant. Currently, he is a Post-Doctoral Fellow at the Department of Electrical and Computer Engineering, University of Delaware, USA. His research interests include radar imaging and radar signal processing, adaptive filters, OFDM systems, channel equalization, and coding theory.

Xiang-Gen Xia received his B.S. degree in mathematics from Nanjing Normal University, Nanjing, China, his M.S. degree in mathematics from Nankai University, Tianjin, China, and his Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, USA, in 1983, 1986, and 1992, respectively. He was a Lecturer at Nankai University, China, during 1986–1988, a Teaching Assistant at the University of Cincinnati, USA, during 1988–1990, a Research Assistant at the University of Southern California, USA, during 1990–1992, and a Research Scientist at the Air Force Institute of Technology during 1993–1994. He was a Senior/Research Staff Member at Hughes Research Laboratories, Malibu, California, during 1995–1996. In September 1996, he joined the Department of Electrical and Computer Engineering, University of Delaware, USA, where he is currently an Associate Professor. His current research interests include communication systems, including equalization and coding; SAR and ISAR imaging of moving targets; wavelet transform and multirate filterbank theory and applications; time-frequency analysis and synthesis; and numerical analysis and inverse problems in signal/image processing. Dr. Xia has over 80 refereed journal articles published and 4 U.S. patents awarded. He is the author of the book “Modulated Coding for Intersymbol Interference Channels” (New York, Marcel Dekker, 2000). Dr. Xia received the National Science Foundation (NSF) Faculty Early Career Development (CAREER) Program Award in 1997, the Office of Naval Research (ONR) Young Investigator Award in 1998, and the Outstanding Overseas Young Investigator Award from the National Nature Science Foundation of China in 2001. He also received the Outstanding Junior Faculty Award of the Engineering School of the University of Delaware in 2001. He is currently an Associate Editor of the IEEE Transactions on Mobile Computing, the IEEE Transactions on Signal Processing, and the EURASIP Journal on Applied Signal Processing. He is also a member of the Signal Processing for Communications Technical Committee in the IEEE Signal Processing Society.