UPTEC F10 059 Examensarbete 20 p November 2010

Polynomial Matrix Decompositions

Evaluation of Algorithms with an Application to Wideband MIMO Communications

Rasmus Brandt

Abstract

Polynomial Matrix Decompositions: Evaluation of Algorithms with an Application to Wideband MIMO Communications

Rasmus Brandt

The interest in wireless communications among consumers has exploded since the introduction of the "3G" cell phone standards. One reason for their success is the increasingly higher data rates achievable through the networks. A further increase in data rates is possible through the use of multiple antennas at either or both sides of the wireless links.

Precoding and receive filtering using matrices obtained from a singular value decomposition (SVD) of the channel matrix is a transmission strategy for achieving the channel capacity of a deterministic narrowband multiple-input multiple-output (MIMO) communications channel. When signalling over wideband channels using orthogonal frequency-division multiplexing (OFDM), an SVD must be performed for every sub-carrier. As the number of sub-carriers of this traditional approach grows large, so does the computational load. It is therefore interesting to study alternative means of obtaining the decomposition.

A wideband MIMO channel can be modeled as a matrix filter with a finite impulse response, represented by a polynomial matrix. This thesis is concerned with investigating algorithms which decompose the polynomial channel matrix directly. The resulting decomposition factors can then be used to obtain the sub-carrier based precoding and receive filtering matrices. Existing approximative polynomial matrix QR and singular value decomposition algorithms were modified, and studied in terms of decomposition quality and computational complexity. The decomposition algorithms were shown to give decompositions of good quality, but if the goal is to obtain precoding and receive filtering matrices, the computational load is prohibitive for channels with long impulse responses.

Two algorithms for performing exact rational decompositions (QRD/SVD) of polynomial matrices were proposed and analyzed. Although they produced excellent decompositions for simple cases, issues with the numerical stability of a spectral factorization step render the algorithms, in their current form, impractical.

For a MIMO channel with exponentially decaying power-delay profile, the sum rates achieved by employing the filters given from the approximative polynomial SVD algorithm were compared to the channel capacity. It was shown that if the symbol streams were decoded independently, as done in the traditional approach, the sum rates were sensitive to errors in the decomposition. A receiver with a spatially joint detector achieved sum rates close to the channel capacity, but with such a receiver the low complexity detector set-up of the traditional approach is lost.

Summarizing, this thesis has shown that a wideband MIMO channel can be diagonalized in space and frequency using OFDM in conjunction with an approximative polynomial SVD algorithm. However, in order to reach sum rates close to the capacity of a simple channel, the computational load becomes prohibitive compared to the traditional approach for channels with long impulse responses.

Handledare: Mats Bengtsson
Ämnesgranskare: Mikael Sternad
Examinator: Tomas Nyberg
ISSN: 1401-5757, UPTEC F10 059

Popular science summary (in Swedish in the original)

Wireless communication is a field whose popularity has grown in recent years. One reason for the success of "3G internet" is the high data rates that are possible. The data rate of a wireless link depends on the signal bandwidth and the transmit power, and increasing either yields higher data rates. Both bandwidth and transmit power are expensive resources, however, since their use is often regulated by national and international authorities.

Another way to increase the data rate of a wireless link is to add more antennas at the transmitter and receiver sides, a so-called MIMO system. Such a system can be viewed as a set of single-antenna links with mutual interaction, and can be described by a matrix. The data rate of the multi-antenna link can be maximized by sending several parallel data streams over the MIMO channel. Since the different transmitted signals share the radio channel, they will mix. Each receive antenna will therefore pick up a combination of the signals sent from the different transmit antennas. To avoid the signals mixing, they must be coded. It turns out that by coding the transmitted signals with a particular matrix, and decoding the received signals with another matrix, the channel is transformed into a set of parallel virtual channels. Independent data streams can then be sent over these virtual channels. The coding matrices are given by a so-called singular value decomposition of the original channel matrix.

For a single-antenna system with high bandwidth, the radio channel will affect the different frequency components of the signal differently. If the system does not account for this effect, its performance will suffer. One way to avoid this frequency selectivity is to signal over the channel using so-called OFDM. Through the OFDM system, the original signal is split into several signals of low bandwidth. By sending these narrowband signals on different parts of the frequency band, they do not affect each other. The frequency-selective channel has thus been divided into a number of non-frequency-selective parallel subchannels.

By sending a wideband signal over an OFDM-based MIMO system, even higher data rates can be achieved. However, the coding matrices must then be computed for every parallel subchannel in the frequency band, which requires many computational operations. This thesis has investigated a new set of algorithms for obtaining approximations of the required coding matrices. The quality of the approximate coding matrices was compared to that of the exact matrices, and the number of required computational operations was measured. It turned out that the new algorithms can produce coding matrices of good quality, but with more required computational operations than the traditional way of obtaining the coding matrices.

The coding matrices from the new algorithms were also simulated in a communications system. With the new matrices, data rates close to the theoretical maximum capacity of a simple radio channel can be achieved, provided that an advanced decoder is used at the receiver side. If instead a set of simple decoders is used, as in the traditional system, system performance suffers.

In summary, this thesis has shown that the coding matrices obtained from the new algorithms can be used in a wideband MIMO system to maximize the data rate. However, they require more computational operations, and a more advanced decoder, than the traditional system. The new algorithms are thus not competitive with the traditional system.

Acknowledgements

This diploma work was performed at the Signal Processing Laboratory at the School of Electrical Engineering at Kungliga Tekniska Högskolan in Stockholm, and will lead to a degree of Master of Science in Engineering Physics from Uppsala University. First and foremost, I would like to thank my supervisor Mats Bengtsson for proposing the thesis topic and taking me on as an MSc thesis worker. His advice and guidance have helped me considerably during the course of this work. My ämnesgranskare Mikael Sternad at the Division for Signals and Systems at Uppsala University also deserves my gratitude; his comments have been very valuable to the final version of this thesis. My family has always supported my endeavours, and for that I am endlessly grateful. Finally, thank you Melissa for being so lovely and cheerful, and for moving to Sweden to be with me.

Contents

1 Introduction
  1.1 Wireless Communications
  1.2 Multiple Antennas and Wideband Channels
  1.3 Problem Formulation and Contributions
  1.4 Thesis Outline

2 Preliminaries
  2.1 Complex Polynomials
    2.1.1 Addition and Subtraction
    2.1.2 Multiplication
  2.2 Polynomial Matrices
    2.2.1 Givens Rotations
    2.2.2 Decompositions
    2.2.3 Coefficient Truncation
  2.3 Computational Complexity

3 MIMO Channels and Multipath Propagation
  3.1 Propagation and Modeling
    3.1.1 Propagation
    3.1.2 Channel Modeling
    3.1.3 MIMO Channels
  3.2 Channel Capacity and Achievable Rate
  3.3 Equalization Techniques
  3.4 Summary

4 Polynomial Decomposition Algorithms: Coefficient Nulling
  4.1 Performance Criteria
  4.2 PQRD-BC: Polynomial QR Decomposition
    4.2.1 Convergence and Complexity
    4.2.2 Discussion
  4.3 MPQRD-BC: Modified PQRD-BC
    4.3.1 Convergence and Complexity
    4.3.2 Simulations
    4.3.3 Discussion
  4.4 PSVD by PQRD-BC: Polynomial Singular Value Decomposition
    4.4.1 Convergence and Complexity
    4.4.2 Discussion
  4.5 MPSVD by MPQRD-BC: Modified PSVD
    4.5.1 Convergence and Complexity
    4.5.2 Simulations
    4.5.3 Discussion
  4.6 Sampled PSVD vs. SVD in DFT Domain
    4.6.1 Frequency Domain Comparison
    4.6.2 Computational Load Comparison, Set-Up Phase
    4.6.3 Computational Load, Online Phase
    4.6.4 Discussion
  4.7 Summary

5 Rational Decomposition Algorithms: Polynomial Nulling
  5.1 Rational Givens Rotation
  5.2 PQRD-R: Rational QR Decomposition
    5.2.1 Simulations
    5.2.2 Discussion
  5.3 PSVD-R by PQRD-R: Rational Singular Value Decomposition
    5.3.1 Simulations
    5.3.2 Discussion
  5.4 Summary

6 Polynomial SVD for Wideband Spatial Multiplexing
  6.1 Generic System Model
    6.1.1 Narrowband Scenario
    6.1.2 Wideband Scenario
  6.2 SM by MIMO-OFDM: SVD in the DFT Domain
    6.2.1 Specific System Model
    6.2.2 Capacity
  6.3 SM by MIMO-OFDM: SVD in the z-Domain
    6.3.1 Specific System Model
    6.3.2 Achievable Rate
  6.4 Simulations
    6.4.1 Method
    6.4.2 Results
  6.5 Summary

7 Summary
  7.1 Conclusions
  7.2 Future Work

A Acronyms and Notation

B Some Complexity Derivations
  B.1 Matrix-

Bibliography

Chapter 1

Introduction

In the current day and age, access to mobile broadband through the cellular networks is ubiquitous. The demands for higher data rates are ever increasing, as people get used to having constant access to the Internet. The latest acronym in the flora of terms relating to cellular networks is LTE, standing for Long Term Evolution. This new standard promises even higher data rates than the previous "3G" standards, by employing efficient modulation schemes as well as terminals with multiple antennas [1].

With these increasing data rates, one could ponder how they are achieved. It all boils down to the efficient use of resources, in this case power and bandwidth. As the use of the resources is optimized, higher data rates can be provided to the cellular customers. When the power and bandwidth allocation gets close to the optimal point, however, how would one go about increasing the data rate even further?

An exciting field in wireless communications is that of multi-antenna systems, so-called MIMO (multiple-input multiple-output) communications. Having access to multiple antennas at either or both sides of a wireless link opens up new transmission strategies. The MIMO channel can be used for increasing the rate even further, improving the signal quality, or both at the same time. The reason that the data rate can be increased, for the same amount of available power and bandwidth, is that the MIMO channel under certain conditions provides multiple parallel spatial channels, which can be used independently. This thesis will study how one can get access to the spatial channels, and compare two different approaches for doing so.

1.1 Wireless Communications

Wireless communications is the field of communication strategies that employ radio waves for information transfer. In particular, this thesis will focus on digital wireless communications, meaning that the information being transmitted is in digital form. The system is agnostic of the meaning of the data; its focus is on reliably transferring the data across the channel. The typical framework for digital communications can be seen in Figure 1.1. The first operation in the system is that of source coding. This is the act of taking the information, in whatever form, and transforming it into a form suitable for transmission in a digital communications system. It may include sampling, quantization and lossy or lossless compression of the data [2, p. 2]. The output from the source coder is a sequence of bits, or binary digits. At the receiver side, the last operation performed is undoing the source coding.

[Figure 1.1 block diagram: Source Coding → Channel Coding → Mod. → Channel → Demod. → Channel Decoding → Source Decoding]

Figure 1.1: Block Diagram of a Typical Digital Communications System.

If the system has been designed properly, the output of the source decoder will resemble the input to the source coder.

If part of the source coder's task is to remove redundancy in the data, then the opposite holds for the channel coder. The blocks in Figure 1.1 between the channel coder and channel decoder will almost certainly introduce some errors in the stream of transmitted symbols. This could be because the system temporarily suffers from a high noise level, or because the radio channel is bad. It is the task of the channel coder/decoder pair to recover any information lost, or at least recognize that some information was lost. This is done by inserting redundancy into the stream of symbols to be sent. Knowing the structure of the redundancy at the receiver side, errors can be corrected, or at least detected.

The last block on the sender side is the modulator. The modulator takes the output of the channel encoder and transforms it into a form suitable for launch onto the physical channel. This includes mapping bits to symbols, applying pulse shaping to the symbols so that a continuous waveform is obtained, and finally upconverting the signal to the carrier frequency. The waveform is sent to the RF chain of the transmitter, and then converted to RF energy in the antenna. The modulation and demodulation parts of the system are shown in close-up in Figure 1.2.

[Figure 1.2 block diagram: Pulse Shaping → Upconv. → Channel → Downconv. → Matched Filtering → Sampling → Demod. → Det.]

Figure 1.2: Modulation-Channel-Demodulation Sub-system.

At the receiver, the effect of the carrier frequency is removed in the downconversion step. The signal is then matched filtered to the pulse-shaping waveform, and sampled. Based on the samples, demodulation and detection are performed, resulting in estimates of the transmitted symbols. A digital communications system does not necessarily have clear-cut lines between the subsystems of Figure 1.1. For instance, joint channel coding and modulation may give boosts in performance, at the price of higher transceiver complexity.

1.2 Multiple Antennas and Wideband Channels

There are several ways to increase the maximum reliable data rate of a wireless link. By transmitting more symbols per unit time, that is, increasing the bandwidth of the signal, the data rate is increased. The downside is that bandwidth in the radio spectrum is an expensive resource due to national and international regulations. Another strategy is to increase the signal-to-noise ratio (SNR) of the link. A higher SNR means a better quality of the received signal, and therefore more data can be transferred per symbol, leading to increased spectral efficiency. In this case, more power is needed at the transmitter side, which may pose a problem if the transmitter is battery powered or if there are regulations on the amount of acceptable transmitted power at that particular frequency.

A third way of increasing the data rate is to introduce multiple antennas at either or both sides of the link. If the channels between the various antenna elements are uncorrelated, which they under certain circumstances are, multiple parallel spatial channels arise that can be used for parallel signalling. By accessing these spatial channels, the data rate can be increased without consuming more bandwidth or power resources. Information theory shows that for high SNRs, the theoretically highest data rate of a MIMO channel with uncorrelated spatial sub-channels grows linearly in the minimum of the number of antennas at the transmitter and receiver sides [3].

For radio channels where there are multiple paths from the transmitter to the receiver, several versions of the transmitted signal will be received at different points in time. This multipath propagation leads to problems for signals with a high symbol rate, i.e. wide bandwidth. In effect, different frequencies will be attenuated differently by the radio channel; the channel is said to be frequency-selective. This wideband behaviour of the channel must be mitigated for reliable communication to take place.
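The linear capacity growth in the minimum number of antennas can be illustrated numerically. The following sketch is not from the thesis; the SNR value, trial count and equal-power allocation are illustrative assumptions. It Monte-Carlo estimates the ergodic capacity of an i.i.d. Rayleigh-fading MIMO channel for growing antenna counts.

```python
import numpy as np

rng = np.random.default_rng(0)

def ergodic_capacity(n_tx, n_rx, snr, trials=200):
    """Monte-Carlo estimate of the ergodic capacity (bits/channel use) of an
    i.i.d. Rayleigh-fading MIMO channel with equal power allocation."""
    total = 0.0
    for _ in range(trials):
        # i.i.d. complex Gaussian channel matrix with unit-variance entries
        H = (rng.standard_normal((n_rx, n_tx)) +
             1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
        # C = log2 det(I + (snr / n_tx) H H^H)
        total += np.log2(np.linalg.det(
            np.eye(n_rx) + (snr / n_tx) * H @ H.conj().T).real)
    return total / trials

# capacity grows roughly linearly in min(n_tx, n_rx) at high SNR
for n in (1, 2, 4):
    print(n, ergodic_capacity(n, n, snr=100.0))
```

Doubling the antenna count at both ends roughly doubles the capacity, in line with the high-SNR scaling result cited above.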

1.3 Problem Formulation and Contributions

This thesis will investigate whether a transmission scheme based on a polynomial singular value decomposition of a wideband MIMO channel matrix can achieve the same data rate as a similar system where the singular value decomposition is performed in the frequency domain, at comparable or lower transmission set-up complexity. In order to do this, some algorithms for polynomial matrix decomposition will be investigated in terms of complexity and error performance, and the achievable data rates are simulated for communication system frameworks that include these algorithms. The contributions of this thesis are:

• Modified versions of the decomposition algorithms of Foster et al. [4] are proposed and analyzed in terms of complexity and decomposition quality.

• Two rational decomposition algorithms for polynomial matrices are proposed, and their shortcomings discussed.

• A transmission scheme utilizing the modified polynomial singular value decomposition algorithm is shown to achieve good performance, in terms of achievable rate, but at a restraining computational load compared to the reference SM by MIMO-OFDM transmission scheme.

1.4 Thesis Outline

The next chapter will lay the mathematical groundwork for the investigations to come. Polynomials and matrices are introduced, as well as the conjunction of the two concepts: polynomial matrices. Finally, an analysis of algorithm run time performance through benchmarking and theoretical analysis is presented.

The following part of the thesis, Chapter 3, discusses the MIMO channel more deeply. The idea of frequency-selective channels is presented, and the orthogonal frequency-division multiplexing technique for mitigating the inter-symbol interference resulting from high data rates is introduced.

In Chapter 4, a set of polynomial matrix decomposition algorithms is presented. They are based on the idea of single coefficient nulling through polynomial Givens rotations. The chapter states the algorithms, and some analytical and numerical results are presented in order to examine the properties of the algorithms. Chapter 5 similarly analyzes some polynomial matrix decomposition algorithms, but these are instead based on the idea of polynomial nulling through polynomial IIR Givens rotations.

With the algorithms clearly defined, Chapter 6 employs them in a communications set-up. The channel and system models are defined, and the system capacities derived. Using these expressions, simulation results are presented showing the capacity of the given systems.

Finally, conclusions are drawn in Chapter 7. The thesis is summarized, and key points are presented.

Chapter 2

Preliminaries

This chapter is concerned with some fundamental facts and definitions that will be used a great deal in the rest of the thesis. Definitions of complex polynomials and constant matrices will be given, and then the two concepts will be joined into complex polynomial matrices. The definitions regarding polynomials, matrices and polynomial matrices are mostly taken from [5, 6], where more details are given. Additionally, a brief discussion about algorithmic complexity will be undertaken. After this chapter, the reader will be familiar with everything needed to understand the algorithms to be presented.

2.1 Complex Polynomials

A Laurent polynomial is an expression of the form

$$p(z) = C_{-V_1} z^{V_1} + \ldots + C_{-1} z + C_0 + C_1 z^{-1} + \ldots + C_{V_2} z^{-V_2} \quad (2.1)$$

or more compactly

$$p(z) = \sum_{v=-V_1}^{V_2} C_v z^{-v}, \qquad V_1 > 0,\; V_2 > 0 \quad (2.2)$$

where $z$ is some indeterminate symbol and the $C_v$ coefficients are taken from some field $\mathbb{F}$. For our purposes, we will assume that the coefficients are complex numbers, that is $\mathbb{F} = \mathbb{C}$. Hence, (2.2) is a complex Laurent polynomial. In the following, we will just call it a polynomial. For a more in-depth discussion of fields and their properties, see [7, Ch. 7].

For $\mathbb{F} = \mathbb{C}$ it holds that every polynomial uniquely determines a function [7, p. 297]. The function is evaluated at a point $z_0$ simply by replacing the indeterminate symbol $z$ in (2.2) with $z_0$, and performing the summation. It should be noted, though, that a polynomial and the function defined by the polynomial are two distinctly different entities.

In order to conveniently classify polynomials, some notation will now be introduced. The relation $\sim$, used as

$$p(z) \sim (V_1, V_2, \mathbb{C}) \quad (2.3)$$

signifies that $p(z)$ has $V_1$ positive exponent terms, $V_2$ negative exponent terms, and that the coefficients are in $\mathbb{C}$. The space of all Laurent polynomials with complex coefficients will be denoted $\mathbb{C}$. The space of all polynomials $p(z) \sim (V_1, V_2, \mathbb{C})$ will be $\mathbb{C}^{1 \times 1 \times (V_1+V_2+1)}$. Further on, the maximum degree of $p(z)$ is $V_1 + V_2$.
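The distinction between a polynomial and the function it defines can be made concrete by storing the coefficient vector together with the count of positive-exponent terms, and evaluating on demand. The class below is a hypothetical helper for illustration, not notation from the thesis.

```python
import numpy as np

class LaurentPoly:
    """Laurent polynomial p(z) = sum_{v=-V1}^{V2} C_v z^{-v}, stored as the
    coefficient vector [C_{-V1}, ..., C_0, ..., C_{V2}] plus the count V1
    of positive-exponent terms (hypothetical helper, not from the thesis)."""
    def __init__(self, coeffs, V1):
        self.coeffs = np.asarray(coeffs, dtype=complex)
        self.V1 = V1                                  # positive-exponent terms
        self.V2 = len(self.coeffs) - V1 - 1           # negative-exponent terms

    def __call__(self, z0):
        # evaluate the function defined by the polynomial at z = z0
        return sum(c * z0 ** (-v)
                   for v, c in zip(range(-self.V1, self.V2 + 1), self.coeffs))

# p(z) = 2z + 1 + 3z^{-1}, i.e. p(z) ~ (1, 1, C), maximum degree 2
p = LaurentPoly([2, 1, 3], V1=1)
print(p(1.0))   # 2 + 1 + 3 = 6
```

The object `p` is the polynomial (a coefficient sequence); calling it evaluates the associated function at a point.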

It is clear that for $V_1' > V_1$, $V_2' > V_2$,

$$p(z) \sim (V_1, V_2, \mathbb{C}) \implies p(z) \sim (V_1', V_2', \mathbb{C}). \quad (2.4)$$

This can be shown to hold by setting the added outer coefficients to zero. The reason for our interest in polynomials is their relation to Linear Time-Invariant (LTI) systems. Chapter 3 will explore this relation further.

2.1.1 Addition and Subtraction

Given two polynomials

$$a(z) \sim (V_1, V_2, \mathbb{C}), \qquad b(z) \sim (U_1, U_2, \mathbb{C}) \quad (2.5)$$

the variables

$$M_1 = \max(V_1, U_1), \qquad M_2 = \max(V_2, U_2) \quad (2.6)$$

can be defined. Assuming that $a(z)$, $b(z)$ have coefficient sequences

$$\{C_{a,i}\}_{i=-V_1}^{V_2}, \qquad \{C_{b,i}\}_{i=-U_1}^{U_2} \quad (2.7)$$

the sum of the polynomials is defined as

$$(a + b)(z) = a(z) + b(z) = \sum_{v=-M_1}^{M_2} (C_{a,v} + C_{b,v}) z^{-v} \quad (2.8)$$

with

$$C_{a,v} = 0 \;\; \forall v \notin [-V_1, V_2], \qquad C_{b,v} = 0 \;\; \forall v \notin [-U_1, U_2]. \quad (2.9)$$

Subtraction is similarly defined, but with $C_{b,v}$ replaced by $(-C_{b,v})$ for all $v$.

2.1.2 Multiplication

Multiplication of two polynomials is the convolution of the two coefficient sequences. Given the polynomials in (2.5), the product is written as

$$c(z) = a(z)b(z) = \left( \sum_{v=-V_1}^{V_2} C_{a,v} z^{-v} \right) \left( \sum_{u=-U_1}^{U_2} C_{b,u} z^{-u} \right) = \sum_{v=-V_1}^{V_2} \sum_{u=-U_1}^{U_2} C_{a,v} C_{b,u} z^{-(v+u)}. \quad (2.10)$$

In particular, the coefficient associated with $z^{-r}$ in the product will be given by

$$C_{c,r} = \sum_{u+v=r} C_{a,v} C_{b,u} = \sum_{v=-V_1}^{V_2} C_{a,v} C_{b,r-v}. \quad (2.11)$$

Defining

$$C_{a,v} = 0 \;\; \forall v \notin [-V_1, V_2], \qquad C_{b,r-v} = 0 \;\; \forall (r-v) \notin [-U_1, U_2] \quad (2.12)$$

the sum in (2.11) can be written as an infinite sum

$$C_{c,r} = \sum_{v=-\infty}^{\infty} C_{a,v} C_{b,r-v} \quad (2.13)$$

which can be identified as the convolution sum.

Let $d_1$, $d_2$ be the maximum degrees of the polynomials $a(z)$, $b(z)$. Then, by zero-padding the two coefficient vectors to length $d_1 + d_2 + 1$, the convolution can be evaluated efficiently using the convolution theorem [8, p. 191]. That is,

$$c(z) = \mathcal{F}_d^{-1} \left( \mathcal{F}_d(a(z)) \, \mathcal{F}_d(b(z)) \right) \quad (2.14)$$

where the transforms are understood to operate on the coefficient vectors of the polynomials.
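The FFT-based product (2.14) can be sketched as follows. `poly_mul` is a hypothetical helper name; the inputs are plain coefficient vectors (the Laurent offset only shifts the exponent labels, so it is omitted here).

```python
import numpy as np

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient vectors by evaluating
    their convolution via the convolution theorem.  With maximum degrees
    d1 and d2, the product has d1 + d2 + 1 coefficients, so both vectors
    are zero-padded to that length before transforming."""
    n = len(a) + len(b) - 1       # = d1 + d2 + 1
    A = np.fft.fft(a, n)          # fft(x, n) zero-pads x to length n
    B = np.fft.fft(b, n)
    return np.fft.ifft(A * B)     # element-wise product, then inverse DFT

a = [1, 2, 3]                     # coefficients of a(z)
b = [4, 5]                        # coefficients of b(z)
c = poly_mul(a, b)
print(np.round(c.real, 10))       # matches direct convolution: [4, 13, 22, 15]
```

For short polynomials direct convolution (`np.convolve`) is just as good; the FFT route pays off when the coefficient vectors are long.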

2.2 Polynomial Matrices

A polynomial matrix $\mathbf{A}(z)$ is a matrix whose elements are polynomials, or equivalently, a polynomial whose coefficients are matrix-valued [6, p. 24]. An arbitrary polynomial matrix

$$\mathbf{A}(z) = \sum_{v=-V_1}^{V_2} \mathbf{A}_v z^{-v} \quad (2.15)$$

belongs to the space $\mathbb{C}^{p \times q}$ if $\mathbf{A}_v \in \mathbb{C}^{p \times q} \;\; \forall v$. For the given $V_1$, $V_2$, we can also write $\{\mathbf{A}_v\} \in \mathbb{C}^{p \times q \times (V_1+V_2+1)}$.

The transpose $\mathbf{A}^T(z)$, conjugate $\mathbf{A}^*(z)$ and Hermitian conjugate $\mathbf{A}^H(z) = \left(\mathbf{A}^T(z)\right)^*$ of a polynomial matrix are obtained by applying the respective operation to each of the coefficient matrices. In addition, $\mathbf{A}^H(z^{-*})$ will be termed the para-Hermitian conjugate of $\mathbf{A}(z)$. A polynomial matrix which satisfies $\mathbf{A}^H(z^{-*})\mathbf{A}(z) = \mathbf{I}$ is called a paraunitary matrix [4]. This type of matrix will play an important part in the algorithms to be developed in Chapter 4, as its columns are mutually orthogonal over all frequencies. Due to this orthogonality, the multiplication of an arbitrary matrix with a paraunitary matrix preserves the Frobenius norm of the original matrix. The Frobenius norm of the polynomial matrix in (2.15) is defined as

$$\|\mathbf{A}(z)\|_F = \sqrt{ \sum_{i=1}^{p} \sum_{j=1}^{q} \sum_{v=-V_1}^{V_2} \left| [\mathbf{A}_v]_{ij} \right|^2 } \quad (2.16)$$

where $[\,\cdot\,]_{ij}$ denotes the $(i,j)$ component of the matrix.

In the following, the terms matrix and polynomial matrix will be used interchangeably, with the understanding that an ordinary matrix is just a polynomial matrix of maximum degree 0 with a single coefficient matrix.
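The norm-preservation property of paraunitary matrices can be checked numerically for the simplest paraunitary matrix, a pure delay. In the sketch below (illustrative, not from the thesis) a polynomial matrix is stored as a $p \times q \times$ (number of lags) coefficient array.

```python
import numpy as np

def frob_norm(A):
    """Frobenius norm of a polynomial matrix stored as a p x q x (V1+V2+1)
    coefficient array: the square root of the summed squared magnitudes of
    all coefficients of all elements, as in eq. (2.16)."""
    return np.sqrt(np.sum(np.abs(A) ** 2))

# A(z) with two coefficient matrices: A(z) = A0 + A1 z^{-1}
A = np.zeros((2, 2, 2), dtype=complex)
A[:, :, 0] = [[1, 2], [3, 4]]
A[:, :, 1] = [[1j, 0], [0, 1]]

# left-multiplying by the paraunitary delay matrix diag(1, z^{-1}) just
# shifts the second row's coefficients by one lag -- the norm is unchanged
B = np.zeros((2, 2, 3), dtype=complex)
B[0, :, :2] = A[0]     # first row unchanged
B[1, :, 1:] = A[1]     # second row delayed by one lag
print(np.isclose(frob_norm(A), frob_norm(B)))   # True
```

A delay matrix only relabels lags, so every coefficient magnitude survives; the same invariance holds for any paraunitary factor, which is what makes (2.16) a useful convergence measure in Chapter 4.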

2.2.1 Givens Rotations

A constant Givens rotation is a unitary transformation which zeroes a specific component of a vector. For the $2 \times 2$ case, the constant complex Givens rotation is defined as

$$\mathbf{G} = \begin{bmatrix} c e^{j\alpha} & s e^{j\phi} \\ -s e^{-j\phi} & c e^{-j\alpha} \end{bmatrix}. \quad (2.17)$$

Applying $\mathbf{G}$ to a vector $\mathbf{x} \in \mathbb{C}^{2 \times 1}$, one obtains

$$\mathbf{G}\mathbf{x} = \mathbf{G} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} c e^{j\alpha} x_1 + s e^{j\phi} x_2 \\ -s e^{-j\phi} x_1 + c e^{-j\alpha} x_2 \end{bmatrix}$$

and by selecting

$$\theta = \begin{cases} \tan^{-1}\!\left( \frac{|x_2|}{|x_1|} \right) & \text{if } |x_1| \neq 0 \\ \frac{\pi}{2} & \text{if } |x_1| = 0 \end{cases} \qquad c = \cos\theta, \quad s = \sin\theta, \quad \alpha = -\arg(x_1), \quad \phi = -\arg(x_2)$$

it can be shown that $[\mathbf{G}\mathbf{x}]_2 = 0$. Additionally, since $\mathbf{G}$ is unitary, the magnitude squared of the other component will be

$$|[\mathbf{G}\mathbf{x}]_1|^2 = |x_1|^2 + |x_2|^2$$

which will be referred to as the energy moving property of the Givens rotation. Intuitively, the application of the Givens rotation moves the energy of component 2 to component 1, so that component 2 becomes zero.

The Givens rotation in (2.17) can be extended so that it zeroes a specific component of a $p \times 1$ vector or $p \times q$ matrix. For the matrix case, if element $(i, j)$ is to be zeroed, and element $(i, i)$ is to receive the energy, the Givens rotation takes the form of a $p \times p$ identity matrix with the elements at the intersections of rows $i, j$ and columns $i, j$ replaced by the elements of (2.17).

For the polynomial matrix case, a polynomial Givens rotation (PGR) can be defined. What will be referred to as a PGR in this thesis is the elementary polynomial Givens rotation of [4], with an elementary delay matrix prepended. The polynomial analogue of (2.17) is therefore

$$\mathbf{G}(z) = \begin{bmatrix} 1 & 0 \\ 0 & z^{-t} \end{bmatrix} \begin{bmatrix} c e^{j\alpha} & s e^{j\phi} \\ -s e^{-j\phi} & c e^{-j\alpha} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & z^{t} \end{bmatrix} = \begin{bmatrix} c e^{j\alpha} & s e^{j\phi} z^{t} \\ -s e^{-j\phi} z^{-t} & c e^{-j\alpha} \end{bmatrix} \quad (2.18)$$

which, when applied to $\mathbf{x}(z) = \begin{bmatrix} x_1(z) & x_2(z) \end{bmatrix}^T$, will zero the coefficient associated with $z^{-t}$ in $x_2(z)$ and move the energy to the constant coefficient of $x_1(z)$, if the parameters are chosen such that

$$\theta = \begin{cases} \tan^{-1}\!\left( \frac{|x_2(t)|}{|x_1(0)|} \right) & \text{if } |x_1(0)| \neq 0 \\ \frac{\pi}{2} & \text{if } |x_1(0)| = 0 \end{cases} \qquad c = \cos\theta, \quad s = \sin\theta, \quad \alpha = -\arg(x_1(0)), \quad \phi = -\arg(x_2(t)) \quad (2.19)$$

where $x_i(j)$ denotes the coefficient associated with $z^{-j}$ for element $x_i$. In the process, all other coefficients of $x_1(z)$ and $x_2(z)$ will be affected. It can be shown that $\mathbf{G}^H(z^{-*})\mathbf{G}(z) = \mathbf{I}$, and (2.18) is therefore a paraunitary operation. The paraunitarity implies that the operation is norm preserving, and in particular that the energy moving property still holds.
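A minimal implementation of the constant complex Givens rotation (2.17), using the parameter choices stated above, might look as follows (a sketch; the function name is hypothetical):

```python
import numpy as np

def givens(x1, x2):
    """Complex Givens rotation, eq. (2.17), that zeroes the second component
    of the vector [x1, x2]^T and moves its energy to the first component."""
    theta = np.arctan2(abs(x2), abs(x1))   # equals pi/2 when x1 == 0
    c, s = np.cos(theta), np.sin(theta)
    alpha, phi = -np.angle(x1), -np.angle(x2)
    return np.array([[ c * np.exp(1j * alpha), s * np.exp(1j * phi)],
                     [-s * np.exp(-1j * phi),  c * np.exp(-1j * alpha)]])

x = np.array([1 + 2j, 3 - 1j])
G = givens(x[0], x[1])
y = G @ x
# the second component is (numerically) zero,
# and |y1|^2 == |x1|^2 + |x2|^2  (the energy moving property)
print(np.isclose(abs(y[1]), 0.0))                                   # True
print(np.isclose(abs(y[0]) ** 2, abs(x[0]) ** 2 + abs(x[1]) ** 2))  # True
```

The same rotation applied coefficient-wise, with the delay factors of (2.18), is the building block of the coefficient-nulling algorithms in Chapter 4.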

The polynomial Givens rotation can easily be extended to the $p \times q$ case, in the same manner as for the constant Givens rotation. By construction, the $p \times p$ polynomial Givens rotation will also be paraunitary. For every application of a polynomial Givens rotation, the degree of the polynomial matrix it is applied to will grow by $2|t|$. This inherent property of the PGR will lead to a need for a truncation step in the algorithms to be formed, as further explained in Chapter 4. The application of a PGR only affects two rows of the matrix it is applied to. The complexity of the application on a $p \times q$ matrix with $r$ lags is therefore

$$C_{PGR} = 2q(r + 2|t|). \quad (2.20)$$

2.2.2 Decompositions

The (full) QR decomposition of a constant matrix $\mathbf{A} \in \mathbb{C}^{p \times q}$ is

$$\mathbf{A} = \mathbf{Q}\mathbf{R} \quad (2.21)$$

where $\mathbf{Q} \in \mathbb{C}^{p \times p}$ is a unitary constant matrix, and $\mathbf{R} \in \mathbb{C}^{p \times q}$ is an upper triangular constant matrix [5, p. 112]. The constant QR decomposition can be calculated in a variety of ways: through Givens rotations, Householder rotations or via the Gram-Schmidt orthogonalization procedure [5, pp. 114-117]. An approximate polynomial QR decomposition of a polynomial matrix $\mathbf{A}(z) \in \mathbb{C}^{p \times q}$ is

$$\mathbf{A}(z) = \mathbf{Q}(z)\mathbf{R}(z) \quad (2.22)$$

where $\mathbf{Q}(z) \in \mathbb{C}^{p \times p}$ is an approximately paraunitary polynomial matrix and $\mathbf{R}(z) \in \mathbb{C}^{p \times q}$ is an approximately upper triangular polynomial matrix [4]. Intuitively, this can be seen as a constant QRD taken over all frequencies.

For an arbitrary constant matrix $\mathbf{A} \in \mathbb{C}^{p \times q}$, the singular value decomposition (SVD) is

$$\mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}^H \quad (2.23)$$

where $\mathbf{U} \in \mathbb{C}^{p \times p}$, $\mathbf{V} \in \mathbb{C}^{q \times q}$ are unitary matrices and $\mathbf{D} \in \mathbb{R}^{p \times q}$ is a diagonal matrix [5, p. 414]. The diagonal entries of $\mathbf{D}$ are called the singular values of the matrix $\mathbf{A}$. The columns of $\mathbf{U}$ and $\mathbf{V}$ are called the left and right singular vectors of $\mathbf{A}$, respectively. An efficient implementation for calculating the SVD of a constant matrix can be found in [9, p. 448].

Similarly, an approximate singular value decomposition can be obtained for polynomial matrices. Given $\mathbf{A}(z) \in \mathbb{C}^{p \times q}$, a PSVD of $\mathbf{A}(z)$ is

$$\mathbf{A}(z) = \mathbf{U}(z)\mathbf{D}(z)\mathbf{V}^H(z^{-*}) \quad (2.24)$$

where $\mathbf{U}(z) \in \mathbb{C}^{p \times p}$, $\mathbf{V}(z) \in \mathbb{C}^{q \times q}$ are approximately paraunitary matrices and $\mathbf{D}(z) \in \mathbb{C}^{p \times q}$ is an approximately diagonal matrix [4] whose diagonal elements are called the singular values of $\mathbf{A}(z)$. As in the QRD case, the intuition is that the polynomial matrix $\mathbf{A}(z)$ is decomposed into its SVD over all frequencies. Note that in this definition there is no assumption of ordering of the singular values, as compared to the ordinary SVD [5, p. 414].
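The intuition of a decomposition "taken over all frequencies" can be made concrete in the DFT domain: evaluating an FIR polynomial matrix on the unit circle gives one constant matrix per frequency bin, each of which has an ordinary SVD. The sketch below (the random channel taps and number of bins are illustrative assumptions) is the traditional per-sub-carrier approach referred to in the abstract:

```python
import numpy as np

rng = np.random.default_rng(1)

# a 2x2 FIR MIMO channel with 3 taps: H(z) = H0 + H1 z^{-1} + H2 z^{-2}
taps = rng.standard_normal((3, 2, 2)) + 1j * rng.standard_normal((3, 2, 2))

# evaluate H(z) on the unit circle: one constant matrix per DFT bin
N = 8                                 # number of sub-carriers (assumed)
Hf = np.fft.fft(taps, N, axis=0)      # shape (N, 2, 2)

# the traditional approach: one constant SVD per sub-carrier
for k in range(N):
    U, d, Vh = np.linalg.svd(Hf[k])
    assert np.allclose(U @ np.diag(d) @ Vh, Hf[k])
print("per-bin SVDs reconstruct H(z) on all", N, "DFT bins")
```

A polynomial SVD replaces these N independent constant decompositions with a single set of polynomial factors, which is exactly the trade-off studied in Chapters 4 and 6.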

2.2.3 Coefficient Truncation

During certain steps of the algorithms to be investigated in Chapter 4, the maximum degrees of the polynomials involved grow quickly. As will be seen, most of the energy of the filter coefficients will often be concentrated around the constant coefficient. In order to keep the maximum degrees of the polynomials involved down, a truncation step is utilized. Thereby, the storage requirements for the polynomial matrices are reduced, as well as the computational load involved when applying the decomposition factors as filters. The truncation step is defined in [4], but is restated in Algorithm 1 for clarity.

Algorithm 1 Polynomial Matrix Truncation
1: Input polynomial matrix $\mathbf{A}(z) \sim (V_1, V_2, \mathbb{C}^{p \times q})$ and truncation parameter $\mu$.
2: Find the maximum value of $T_1$ such that $\frac{1}{\|\mathbf{A}(z)\|_F^2} \sum_{v=-V_1}^{T_1} \sum_{l=1}^{p} \sum_{m=1}^{q} |a_{lm}(v)|^2 \leq \frac{\mu}{2}$ holds.
3: Find the minimum value of $T_2$ such that $\frac{1}{\|\mathbf{A}(z)\|_F^2} \sum_{v=T_2}^{V_2} \sum_{l=1}^{p} \sum_{m=1}^{q} |a_{lm}(v)|^2 \leq \frac{\mu}{2}$ holds.
4: Return $\mathbf{A}_{trunc}(z) = \sum_{v=T_1}^{T_2} \mathbf{A}_v z^{-v}$.
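A simplified version of this energy-based truncation can be sketched as follows. This is not a line-by-line implementation of Algorithm 1: it trims whole lags from each end of the coefficient array as long as at most a fraction µ/2 of the total energy is removed from that end.

```python
import numpy as np

def truncate(A, mu):
    """Energy-based truncation of a polynomial matrix (cf. Algorithm 1).
    A has shape (p, q, n_lags); at most a fraction mu/2 of the total
    energy is removed from each end of the lag axis."""
    energy = np.sum(np.abs(A) ** 2, axis=(0, 1))        # energy per lag
    budget = 0.5 * mu * energy.sum()
    lo, hi = 0, A.shape[2]
    while lo < hi and energy[:lo + 1].sum() <= budget:  # trim leading lags
        lo += 1
    while hi > lo and energy[hi - 1:].sum() <= budget:  # trim trailing lags
        hi -= 1
    return A[:, :, lo:hi]

A = np.zeros((1, 1, 5))
A[0, 0] = [1e-5, 0.1, 5.0, 0.1, 1e-5]   # energy concentrated at the centre
print(truncate(A, mu=1e-12).shape)      # (1, 1, 5): nothing removed
print(truncate(A, mu=1e-8).shape)       # (1, 1, 3): tiny outer lags removed
```

The linear scans above correspond to the naive implementation; as noted below, a binary search over the cumulative energies would lower the complexity.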

The parameter $\mu$ defines the proportion of the total energy of the filter to be truncated from the matrix. It can be shown that the complexity of the naive implementation of this algorithm is $\mathcal{O}(pqr^2)$. The complexity can be decreased by a binary search algorithm.

2.3 Computational Complexity

When studying algorithms from a performance perspective, it is interesting to analyze their running times. This may either be done through benchmarking or theoretical analysis [10, p. 91]. In this thesis, both approaches will be used. Benchmarking is straight-forward, as it only involves running the algorithm and measuring some quantity related to the performance. By running the algorithm for different sets of input data, experimental relations are obtained relating the performance to the properties of the input. The downside of benchmarking is that the algorithm must be implemented first, and that one can not be sure that the results generalize. Theoretical analysis, on the other hand, typically only gives upper bounds of the perfor- mance. This is particularly the case for algorithms containing loops, where the number of iterations of the loops are not obviously defined in terms of the input data size. In order to deal with bounds in a convenient manner, Ordo or Big-Oh notation is introduced. Assume that T (n) describes the run time of an algorithm with input data size n. Then we say that T (n) = (f(n)) if it holds that O T (n) cf(n) ≤ for some constant c > 0 and all integers n > n0. Ordo notation is a very convenient tool for describing complexity expressions, because of the two following rules as given by [10, p. 98]:

1. Constant factors don’t matter: For any constant d, O(d·f(n)) = O(f(n)), since the constant can be absorbed into the hidden constant c of the Ordo notation.

2. Low-order terms don’t matter: For the polynomial case, only the term with the largest degree needs to be kept, since it will dominate over any other terms for large n.

Chapter 3

MIMO Channels and Multipath Propagation

3.1 Propagation and Modeling

Wireless communication uses electromagnetic waves in the radio spectrum for information transfer. Electromagnetic waves are completely characterized by Maxwell's equations, a set of coupled partial differential equations. The complicated structure of the equations makes them hard to solve, as well as hard to analyze analytically. Radio engineers therefore tend to use simpler models, in which certain aspects of the radio wave propagation can be analyzed more easily.

3.1.1 Propagation
In general, the received radio signal is modeled as an attenuated and modified version of the transmitted signal, with some noise added. The attenuation can be classified into three time-scale dependent behaviours: path loss, shadowing and fading [3].

Path loss occurs solely because of the distance between the transmitter and the receiver, and can be modeled as a factor 1/r^d, where the exponent d usually lies in the range [2, 4] [3]. At best, which is the case for an isotropic transmit antenna with fixed transmit power in empty space, the exponent takes the value 2. In this scenario, the generated radio waves propagate spherically from the antenna, and the amplitude at distance r decreases as 1/r; the power therefore decreases as 1/r². Due to ground reflections and other phenomena, the exponent can be larger, though. The path loss is the most slowly varying of the three attenuation factors mentioned.

Shadowing takes place on a shorter time scale, and is typically incurred by objects blocking the radio waves. In an urban environment, this could be due to cars moving around and temporarily changing the propagation paths. Shadowing is typically hard to model, but one common model is the log-normal distribution [3].

The most quickly evolving attenuation phenomenon is fading. It occurs when radio waves from different paths add up constructively or destructively at the receiver. This interference effect changes on the order of half a wavelength, and is therefore sensitive to small changes in the propagation environment. Assuming that there is no Line-of-Sight (LOS) component, the fading is called Rayleigh fading. The name arises from the fact that the

channel coefficient can be modeled, through the central limit theorem, as a complex Gaussian stochastic variable, whose magnitude is described by the Rayleigh distribution. If there is a LOS component, the channel gains are instead modeled by a Rice distribution, and the phenomenon is called Rician fading.

A subject that we have touched upon, but not discussed explicitly, is that of multipath propagation. Depending on the environment, a transmitted radio wave may take several different paths to the receiver. An example can be seen in Figure 3.1. Depending on the path lengths, and the associated reflections, several delayed versions of the same signal will reach the receiver. The delay spread of a channel is a measure of the spread between the arrival times of the first and the last multipath component at the receiver. For delay spreads that are short compared to the transmit symbol period, the channel is said to have narrowband fading, for which the Rayleigh and Rice models work well. On the other hand, for channels with large delay spreads, other models must be used. The coherence bandwidth Bc of a channel is the bandwidth over which the channel frequency response can be assumed constant, and is approximately inversely proportional to the delay spread.
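The Rayleigh model is easy to verify numerically. The sketch below draws hypothetical non-LOS channel coefficients as zero-mean circularly symmetric complex Gaussians and checks the mean of their magnitudes; all parameter values are made-up examples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-LOS narrowband fading: by the central limit theorem, the channel
# coefficient is the sum of many small contributions and is modeled as
# a zero-mean circularly symmetric complex Gaussian variable.
n = 100_000
h = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

# The magnitude of such a coefficient is Rayleigh distributed; for
# unit-variance h, its mean is sqrt(pi)/2 ≈ 0.886.
mean_mag = np.abs(h).mean()
```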

Figure 3.1: Example of Multipath Propagation

For channels where the delay spread is larger than the symbol period, an effect called inter-symbol interference (ISI) takes place. Since the symbols are transmitted at a high rate, the channel impulse response will not have enough time to "die off". Instead, subsequent symbols will be overlaid at the receiver, and therefore interfere with each other. This type of channel is also called a frequency selective channel, since different frequency components of the transmitted signal will be attenuated by different factors.

3.1.2 Channel Modeling
In this section, the channel is assumed to be well described by a single-input single-output (SISO) linear system. A linear system is characterized by its impulse response at time t, h(t; τ). For a transmitted signal s(t), the received signal r(t) is modeled as a filtered version of s(t) with some noise n(t) added, such that

r(t) = ∫_{−∞}^{∞} h(t; τ) s(t − τ) dτ + n(t).

In the following, we will assume that the channel is time-invariant over the transmission of a block of data, so that the channel impulse response can be replaced by h(τ) and

r(t) = ∫_{−∞}^{∞} h(τ) s(t − τ) dτ + n(t).   (3.1)

The output of the channel can be thought of as a weighted sum of the input signal s(t), with weighting factor h(τ). A graphical representation of a SISO channel can be seen in Figure 3.2.

Figure 3.2: Single-input single-output Channel

Radio transmissions always take place at some carrier frequency, which is modulated for data transfer. The carrier frequency itself does not carry any information, and it is therefore convenient to remove the effect of the carrier in any analysis of the signal. For the example above, all signals involved are assumed to be real, since they relate to physical processes. Let S(f) be the Fourier transform of s(t), band-limited to [f_c − W/2, f_c + W/2], where f_c is the carrier frequency and W is the signal bandwidth. For consistency, W < 2f_c. The complex baseband equivalent version of the signal is then defined through its Fourier transform

S_b(f) = { √2 S(f + f_c),  f + f_c > 0
         { 0,              f + f_c ≤ 0.   (3.2)

Since s(t) is real, its Fourier transform is symmetric around the origin. The transformation (3.2) therefore effectively moves the positive part of the spectrum down from the carrier frequency to baseband, and scales it so that the total power remains constant. The baseband equivalent spectrum is no longer symmetric around the origin, and the signal s_b(t) defined by S_b(f) is therefore complex. Note that the original signal s(t) can easily be recovered from s_b(t) since

S(f) = (1/√2) (S_b(f − f_c) + S_b^∗(−f − f_c))

and by taking inverse Fourier transforms

s(t) = (1/√2) (s_b(t) e^{j2πf_c t} + s_b^∗(t) e^{−j2πf_c t}) = √2 Re(s_b(t) e^{j2πf_c t}).   (3.3)
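The up- and downconversion relations can be checked numerically. The sketch below upconverts a band-limited complex baseband signal per (3.3) and recovers it by mixing with the carrier and low-pass filtering; all signal parameters and the crude moving-average filter are assumptions of the sketch:

```python
import numpy as np

fs, fc = 10_000.0, 1_000.0            # sample rate and carrier (Hz), example values
t = np.arange(0, 0.1, 1 / fs)

# Example baseband signal with bandwidth W ~ 100 Hz << 2*fc.
sb = np.exp(2j * np.pi * 20 * t) + 0.5 * np.exp(-2j * np.pi * 50 * t)

# Upconversion per (3.3): s(t) = sqrt(2) * Re(s_b(t) * exp(j*2*pi*fc*t)).
s = np.sqrt(2) * np.real(sb * np.exp(2j * np.pi * fc * t))

# Downconversion: mixing leaves s_b plus a component at 2*fc, which a
# crude low-pass filter (moving average over one carrier period) removes.
mixed = np.sqrt(2) * s * np.exp(-2j * np.pi * fc * t)
k = int(fs / fc)
sb_hat = np.convolve(mixed, np.ones(k) / k, mode="same")

# Away from the edges, the baseband signal is recovered.
err = np.max(np.abs(sb_hat[k:-k] - sb[k:-k]))
```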

The complex baseband equivalent signal is more convenient to work with than the passband signal, and since the transformation is invertible, no information is lost in the conversion process. Finally, in order to work with the signals in a computer, a discrete-time model is needed. Sampling (3.1) faster than the Nyquist rate, it can be shown that a discrete-time model takes the form

r[m] = Σ_{l=−∞}^{∞} h[l] s[m − l] + n[m]   (3.4)

where h[l] is determined from the transmit, channel, and receive filters in place. For the full derivations, see e.g. [11, p. 49].
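For a channel with a finite impulse response, (3.4) is a finite convolution. A sketch with a hypothetical three-tap channel (taps, symbols and noise level are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

h = np.array([0.8, 0.5 + 0.2j, 0.1])       # example channel taps h[l]
s = rng.choice([-1.0, 1.0], size=50)       # example BPSK symbols s[m]

# Circularly symmetric complex AWGN with per-sample variance sigma^2.
sigma = 0.01
n = sigma * (rng.standard_normal(52) + 1j * rng.standard_normal(52)) / np.sqrt(2)

# r[m] = sum_l h[l] s[m - l] + n[m]: the symbols convolved with the
# channel impulse response, plus noise.
r = np.convolve(h, s) + n
```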

3.1.3 MIMO Channels
A multiple-input multiple-output (MIMO) channel is a channel with several transmit and/or receive antennas. It is described by a matrix, since every transmit-receive antenna pair has a channel associated with it. These single-input single-output (SISO) sub-channels may be correlated. A graphical representation of a MIMO channel can be seen in Figure 3.3. The number of antennas used in MIMO communications systems is typically on the order of 2-4 antennas on either or both sides. For example, the IEEE 802.11n WiFi standard allows for up to 4 antennas on both sides [12].


Figure 3.3: Multiple-input multiple-output Channel

Following the same derivations as in the previous section, it can be shown that the discrete-time complex baseband equivalent system for the MIMO channel with M_r receive antennas and M_t transmit antennas takes the form

r[m] = Σ_{l=−∞}^{∞} H[l] s[m − l] + n[m]   (3.5)

where r[m], n[m] ∈ C^(M_r×1), H[l] ∈ C^(M_r×M_t) and s[m] ∈ C^(M_t×1). For the narrowband case, (3.5) simplifies to r[m] = H s[m] + n[m].

The matrix channel H[m] describes how the transmitted signal is mixed in space and time, when sampled at the receiver. Row i of H[m] determines the weighting factors for the received signal at receive antenna i at time lag m. Similarly, column j of H[m] describes the spatial signature of the signal from transmit antenna j at time lag m. Taking the z-transform of both sides of (3.5), and using the convolution theorem for the z-transform [8, p. 191], the relation is described by the polynomial matrix equation

r(z) = H(z)s(z) + n(z),   (3.6)

a fact we will rely heavily on in the following.
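The wideband model (3.5) with a finite impulse response can be sketched by convolving each matrix tap with the symbol stream. The dimensions, storage convention and example taps below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def mimo_channel(H, s):
    """Noiseless part of (3.5): H has shape (L, Mr, Mt), with H[l] the
    matrix tap at lag l; s has shape (Mt, N). Returns the received
    sequence of shape (Mr, N + L - 1)."""
    L, Mr, Mt = H.shape
    N = s.shape[1]
    r = np.zeros((Mr, N + L - 1), dtype=complex)
    for l in range(L):
        r[:, l:l + N] += H[l] @ s        # delayed, spatially mixed copy
    return r

# Example: a 2x2 channel with 3 i.i.d. complex Gaussian matrix taps.
H = (rng.standard_normal((3, 2, 2)) + 1j * rng.standard_normal((3, 2, 2))) / np.sqrt(2)
s = rng.choice([-1.0, 1.0], size=(2, 10))
r = mimo_channel(H, s)
```

Each receive antenna thus sees a sum of SISO convolutions: row i of the output equals Σ_j h_ij ∗ s_j, which is exactly the polynomial matrix product in (3.6).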

3.2 Channel Capacity and Achievable Rate

In his seminal paper "A Mathematical Theory of Communication" [13], Claude E. Shannon introduced the concept of channel capacity, and derived it for the additive white Gaussian noise (AWGN) SISO channel with an average power constraint. The channel capacity, if known for the given channel model, is the highest possible rate that can be used for communication over the channel with vanishing error probability. That is, for any rate below the channel capacity, there exists a code, with long enough codewords, that achieves that rate with arbitrarily small error probability. Shannon's results provide a benchmark against which any practical transmission scheme can be compared. The proof of the channel coding theorem is not constructive, however, and it is not until recently, with the introduction of turbo codes and low density parity check (LDPC) codes, that practical systems have come close to the channel capacity [2, p. 252].

In [13], Shannon derived the channel capacity for the deterministic AWGN channel with an average power constraint. With system bandwidth W, noise variance σ_n² and average power less than P, the well-known formula for the capacity is

S_AWGN = W log(1 + P/σ_n²).

A similar result, for the channel capacity of the deterministic narrowband AWGN MIMO channel with an average sum power constraint, was derived in [14]. The capacity of the channel, with spatially and temporally white Gaussian noise with variance σ_n², is given by

S_MIMO-AWGN = max_P log |I + (1/σ_n²) H P Hᴴ|

where P = E[ssᴴ] is the covariance of the transmit symbol s. The rate R < S_MIMO-AWGN

R = log |I + (1/σ_n²) H P Hᴴ|   (3.7)

is called an achievable rate, since for a sub-optimal choice of P it is less than the channel capacity. The noisy channel coding theorem of [13] then guarantees that a code achieving the rate R exists, since R < S and a code with rate S exists. For our purposes, (3.7) will simply serve as a mapping from SNR to data rate, which will be useful for the comparison of transmission schemes in Chapter 6.
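Both rate formulas are one-liners numerically. The sketch below evaluates the scalar AWGN capacity and the MIMO rate (3.7), using base-2 logarithms for rates in bits; the channel, SNR and equal-power transmit covariance are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Scalar AWGN capacity S = W log2(1 + P / sigma_n^2), here at 20 dB SNR.
W = 1e6                                  # bandwidth (Hz)
snr = 10 ** (20 / 10)                    # P / sigma_n^2
s_awgn = W * np.log2(1 + snr)            # bits per second

# MIMO achievable rate (3.7) for a random 2x2 channel with equal power
# allocation P = I (a sub-optimal but simple choice).
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
P = np.eye(2)
sigma2 = 0.1
M = np.eye(2) + H @ P @ H.conj().T / sigma2
rate = np.real(np.log2(np.linalg.det(M)))   # bits per channel use
```

With P = I the rate decouples over the singular values of H: rate = Σ_i log2(1 + σ_i²/σ_n²), which is the spatial-multiplexing picture used later in the thesis.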

3.3 Equalization Techniques

For a wideband channel, multipath propagation is inherent. For high symbol rates, this results in ISI. What is meant by a high symbol rate is not clear cut, but the rule of thumb is that ISI must be mitigated when W ≫ Bc. In order to do symbol-by-symbol detection, the receiver needs to remove the ISI. This operation is called channel equalization, and there are many different strategies for performing it. The equalizer needs information about how the channel behaves, typically the channel impulse response. This is obtained in a training phase, where the receiver identifies the channel from a sequence of known symbols.

The optimal equalization, in the sense of minimizing the probability that a symbol in the sequence is detected incorrectly, is called Maximum Likelihood Sequence Estimation (MLSE), and is implemented in practice using the Viterbi algorithm [15, p. 88]. A downside of MLSE is that the algorithm complexity grows exponentially in the number of channel taps [15, p. 90]. At the other end of the complexity scale is the family of low-complexity linear equalizers. A common linear equalizer, the zero-forcing equalizer, removes the effect of ISI completely but suffers from noise amplification. Another equalizer, the MMSE receiver, makes an optimal trade-off between removing ISI and amplifying the noise, within the class of linear receivers.

Orthogonal Frequency-Division Multiplexing

For channels with high spectral dynamics, that is, a small coherence bandwidth Bc, there is another viable alternative. Orthogonal frequency-division multiplexing (OFDM) leverages the Fast Fourier Transform (FFT) and can handle quasi-static wideband channels well [15, p. 99]. In fact, OFDM can achieve the channel capacity as the number of sub-carriers grows large.

In OFDM, signaling is performed in the frequency domain. In order to launch the signal onto the channel, it is transformed to the time domain using an inverse FFT (IFFT). At the receiver, the received signal is transformed back to the frequency domain, where detection takes place. Effectively, OFDM transforms the wideband channel into a set of parallel independent channels in the frequency domain.

For the SISO case, assume that a sequence of N bits b(n) ∈ {0, 1} is to be transmitted. Through some mapping from b(n), a frequency vector s′ ∈ C^(N×1) is created. The frequency vector is transformed to the time domain through the conjugate transpose of the DFT matrix F ∈ C^(N×N), defined by

F_ij = (1/√N) e^{−j2π(i−1)(j−1)/N}.   (3.8)

The time domain representation then takes the form

s′′ = Fᴴ s′.   (3.9)

In order to make the linear convolution of the channel act as a cyclic convolution, a cyclic prefix is prepended, so that the signal

s̃ = [s′′[N − (L − 1)] . . . s′′[N − 1] s′′[0] s′′[1] . . . s′′[N − 1]]ᵀ   (3.10)

is transmitted on the channel. Because of the cyclic prefix, the output of the channel will be a cyclic convolution of the input signal, plus noise:

r̃[m] = Σ_{l=0}^{L−1} h_l s′′[(m − L − l) mod N] + w[m].   (3.11)

At the receiver, after stripping off the cyclic prefix, the vector

r′′ = [r̃[0] r̃[1] . . . r̃[N − 1]]ᵀ   (3.12)

is formed. Transforming into the frequency domain, the received frequency vector

r′ = F r′′   (3.13)

is obtained, which is then detected component by component. It can be shown that, thanks to the cyclic prefix, the channel matrix is rendered circulant, and the IFFT/FFT pair diagonalizes circulant matrices. The model, in the frequency domain, for the communications process is therefore

r′ = Ω s′ + w   (3.14)

where Ω is a diagonal matrix with the channel gains of the different frequency bins on the diagonal. The noise keeps its characteristics due to the unitarity of the FFT transform. With the channel diagonalized over the frequency band, the transmitter is free to select different transmit powers for the different frequency bins. The optimal way of doing this, in the sense of achievable rate, is the waterfilling technique, as further described in [11, p. 68]. The waterfilling strategy is also applied in Chapter 6, where a wideband MIMO channel is diagonalized in frequency through OFDM.
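The diagonalization can be verified numerically. The sketch below sends one OFDM symbol over a hypothetical three-tap channel and checks that each frequency bin sees a flat gain equal to the channel frequency response; the channel taps and sizes are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(4)

N, L = 64, 3                                     # sub-carriers, channel taps
h = np.array([0.9, 0.4 + 0.3j, 0.2])             # example channel taps h_l
F = np.fft.fft(np.eye(N)) / np.sqrt(N)           # unitary DFT matrix (3.8)

s_freq = rng.choice([-1.0, 1.0], size=N) + 0j    # frequency-domain symbols s'
s_time = F.conj().T @ s_freq                     # s'' = F^H s' (3.9)
s_cp = np.concatenate([s_time[-(L - 1):], s_time])   # prepend cyclic prefix (3.10)

r_full = np.convolve(h, s_cp)                    # the channel convolves linearly
r_time = r_full[L - 1:L - 1 + N]                 # strip prefix and tail
r_freq = F @ r_time                              # r' = F r'' (3.13)

# Thanks to the cyclic prefix, r_freq equals gains * s_freq, with the
# per-bin gains given by the channel frequency response, as in (3.14).
gains = np.fft.fft(h, N)
```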

3.4 Summary

This chapter provided an introduction to channel modeling in wireless communications, as well as the concepts of channel capacity and achievable rate. Furthermore, multipath propagation and the sometimes ensuing inter-symbol interference were introduced. Some strategies for combating ISI were discussed, and the OFDM technique was described in detail. In Chapter 6, most of the ideas of this chapter will be referred to. In particular, OFDM will be used to mitigate ISI in the wideband MIMO scenario present there.

Chapter 4

Polynomial Decomposition Algorithms: Coefficient Nulling

This chapter will investigate a number of polynomial decomposition algorithms which are based on the idea of iteratively nulling single coefficients. Polynomial Givens Rotations (PGRs) are employed for the coefficient nulling, as defined in Section 2.2.1. Through the application of consecutive PGRs on a matrix, decompositions such as the PQRD and PSVD can be found. Four algorithms will be studied, of which two were proposed by Foster et al. in [4]. The remaining two are slightly modified versions of the original algorithms. The decompositions generated by the algorithms will be approximations, since, as shown in [16], an exact FIR decomposition of a FIR matrix is in general impossible to achieve. In this thesis, approximate polynomial decomposition algorithms will be used for the channel diagonalization problem of spatial multiplexing in wireless communications. This topic is further studied in Chapter 6.

In [17], McWhirter proposes a different procedure for obtaining a polynomial singular value decomposition. There, a sequential best rotation algorithm is introduced using generalized Kogbetliantz transformations. The algorithm, which is not studied in this thesis, is shown to perform better than previous sequential best rotation procedures.

The first section of this chapter will describe the performance measures employed in the study of the algorithms. Once these are defined, the following sections will investigate one algorithm at a time, with respect to function, convergence and complexity.

4.1 Performance Criteria

In order to measure performance of the algorithms, a number of performance criteria will be defined. For the run time measurements, or algorithm complexity, the number of iterations of the innermost loop (coefficient steps, see Section 4.2) needed until convergence will be taken as the performance measure. The complexity in terms of floating point operations is then simply obtained by plugging in the complexity of a single coefficient step in terms of floating point operations, as given by equation (2.20). Before introducing the other performance criteria, the following optimization problems

are posed:

For the PQRD:

minimize   ‖A(z) − Q(z)R(z)‖_F
subject to Qᴴ(z^{−∗})Q(z) = I
           ‖R_lower(z)‖_F = 0

and for the PSVD:

minimize   ‖A(z) − U(z)D(z)Vᴴ(z^{−∗})‖_F
subject to Uᴴ(z^{−∗})U(z) = I
           Vᴴ(z^{−∗})V(z) = I
           ‖D_non-diag(z)‖_F = 0.

Casually speaking, the PQRD and PSVD algorithms under study, respectively, can be thought of as approximately solving these optimization problems. The resulting matrices Q(z), R(z) or U(z), D(z), V(z) will however neither be feasible nor minimize the cost function. Rather, the algorithms output matrices that are "close" to fulfilling the constraints, while having a "small" associated cost. With this loose argument in mind, error criteria will be defined that determine how "close" a set of matrices is to fulfilling the constraints, and how "small" the associated cost is. How well the product of the decomposition matrices describes the decomposed matrix is measured by the decomposition error

E_d^QRD = ‖A(z) − Q(z)R(z)‖_F / ‖A(z)‖_F   (4.1)

for a PQRD A(z) = Q(z)R(z). For the same PQRD, the triangularity error

E_t = ‖R_lower(z)‖²_F / ‖A(z)‖²_F   (4.2)

shows the ratio of the amount of energy in the lower triangular part of R(z) to the total amount of energy. Similarly, for a PSVD A(z) = U(z)D(z)Vᴴ(z^{−∗}), the decomposition error is

E_d^SVD = ‖A(z) − U(z)D(z)Vᴴ(z^{−∗})‖_F / ‖A(z)‖_F.   (4.3)

This decomposition error definition makes sense from the algorithm evaluation point of view, as it represents how well the decomposition factors describe the original matrix. From an application point of view, it might instead be interesting to study the normalized version of ‖D(z) − Uᴴ(z^{−∗})A(z)V(z)‖_F, as that definition would describe the error in the calculated singular values of the original matrix. The relative amount of energy in the non-diagonal part of D(z) is described by the diagonality error

E_diag = ‖D_non-diag(z)‖²_F / ‖A(z)‖²_F.   (4.4)

Finally, the unitarity error of any matrix A(z) is defined as

E_u = ‖I − Aᴴ(z^{−∗})A(z)‖_F / ‖I‖_F   (4.5)

and indicates how close A(z) is to being paraunitary.
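The criteria above are straightforward to compute for polynomial matrices stored as arrays of coefficient matrices. A sketch of two of them follows; the (lags, p, q) storage convention and the aligned-lag assumption are ours, not the thesis's:

```python
import numpy as np

def pnorm(A):
    """Frobenius norm of a polynomial matrix stored as (lags, p, q)."""
    return np.sqrt(np.sum(np.abs(A) ** 2))

def pmul(A, B):
    """Polynomial matrix product: convolution of coefficient matrices."""
    LA, p, _ = A.shape
    LB, _, q = B.shape
    C = np.zeros((LA + LB - 1, p, q), dtype=complex)
    for i in range(LA):
        for j in range(LB):
            C[i + j] += A[i] @ B[j]
    return C

def decomposition_error(A, Q, R):
    """Relative decomposition error (4.1), assuming all matrices start
    at lag 0 and deg(QR) >= deg(A)."""
    D = pmul(Q, R)
    D[:A.shape[0]] -= A
    return pnorm(D) / pnorm(A)

def triangularity_error(R, A):
    """Triangularity error (4.2): relative energy strictly below the
    main diagonal of R(z)."""
    lower = np.tril(np.ones(R.shape[1:], dtype=bool), k=-1)
    return np.sum(np.abs(R[:, lower]) ** 2) / pnorm(A) ** 2
```

The diagonality and unitarity errors (4.4) and (4.5) follow the same pattern, with a diagonal mask and a comparison of Aᴴ(z^{−∗})A(z) against the identity at lag zero.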

4.2 PQRD-BC: Polynomial QR Decomposition

The first algorithm to be studied is PQRD By Columns (PQRD-BC), which was first introduced in [4]. The algorithm generates an approximate PQRD

A(z) = Q(z)R(z)

of an arbitrary matrix A(z), where Q(z) is approximately paraunitary and R(z) is approximately upper triangular. This is done through the iterative application of PGRs, until A(z) has been transformed into a sufficiently upper triangular matrix. As will be shown in Section 4.4, PQRD-BC is a necessary component of the PSVD by PQRD-BC algorithm. As the next algorithm to be studied, Modified PQRD-BC, is a derivative of PQRD-BC, we will refrain from presenting simulation results for PQRD-BC. Rather, the results for Modified PQRD-BC given in Section 4.3 also represent the typical behaviour of PQRD-BC. Though the original definition of PQRD-BC can be found in [4], we restate the algorithm here for completeness. A pseudocode representation of the algorithm can be seen in Algorithm 2.

Algorithm 2 PQRD By Columns (PQRD-BC)
1: Input polynomial matrix A(z) ∼ (V1, V2, C^(p×q)), convergence parameter ε, truncation parameter µ and absolute stopping criteria MaxIter and MaxSweeps.
2: Let Q(z) = I_p, g1 = 1 + ε and n = 0.
3: while n ≤ MaxSweeps and g1 > ε do
4:   Let n = n + 1.
5:   for k = 1 . . . min(p − 1, q) do
6:     Let iter = 0 and g2 = 1 + ε.
7:     while iter ≤ MaxIter and g2 > ε do
8:       Find j and t such that |a_jk(t)| ≥ |a_mk(t)| holds for m = k + 1 . . . p and ∀t ∈ Z.
9:       Let g2 = |a_jk(t)|.
10:      if g2 > ε then
11:        Let iter = iter + 1.
12:        Obtain PGR G(z) as a function of (j, k, t, |a_jk(t)|, |a_kk(0)|).
13:        Let A(z) = G(z)A(z).
14:        Let Q(z) = G(z)Q(z).
15:        Truncate A(z), Q(z) given µ.
16:      end if
17:    end while
18:  end for
19:  Find j, k and t such that |a_jk(t)| ≥ |a_mn(t)| holds for n = 1 . . . q, m = n + 1 . . . p and ∀t ∈ Z.
20:  Let g1 = |a_jk(t)|.
21: end while
22: Let R(z) = A(z).

For future reference, the block of rows 12-15 will be called a coefficient step and the operations in rows 6-17 a column step. The algorithm operates over all columns of A(z) from left to right. For every column, a certain number of coefficient steps are performed,

until the coefficients in the given column are sufficiently small. To determine which coefficient is to be nulled next, the coefficient with the greatest magnitude below the main diagonal in the given column is found. This coefficient is subsequently nulled through the application of a polynomial Givens rotation with appropriate parameters. This is repeated until all coefficients below the main diagonal in the given column have a magnitude less than the convergence parameter ε. As one column step is finished, the algorithm moves to the next column, until all columns have been traversed. As a safe-guard, the algorithm can restart from column 1 if necessary, making another sweep over the columns.

Every coefficient step includes a matrix truncation, implemented as described by Algorithm 1. Without this truncation step, the maximum degrees of the matrix polynomials would grow very fast, and with them the memory requirements. As is generally the case, most of the energy of the coefficients is centered around the zero-lag coefficient, and the decomposition is therefore not ruined. This can specifically be seen in the example in Section 4.3.2.
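The heart of the coefficient step (rows 12-14) can be sketched as an elementary delay followed by an ordinary complex Givens rotation. Everything below — the (lags, p, p) storage, the zero-padding, and the sign conventions — is an assumption of this sketch, not the exact construction of [4]:

```python
import numpy as np

def pgr_step(A, j, k, t):
    """Null coefficient a_jk(t) of a polynomial matrix A, stored as
    (lags, p, p) with coefficients for lags 0..L-1. Returns a padded
    matrix whose lag-0 block sits at index L."""
    L, p, _ = A.shape
    B = np.zeros((3 * L, p, p), dtype=complex)       # room for shifted lags
    B[L:2 * L] = A
    B[:, j, :] = np.roll(B[:, j, :], -t, axis=0)     # elementary delay z^t on row j

    a, b = B[L, k, k], B[L, j, k]                    # a_kk(0) and the shifted a_jk(0)
    r = np.hypot(abs(a), abs(b))
    if r == 0:
        return B
    if abs(a) > 0:
        c, s = abs(a) / r, (a / abs(a)) * np.conj(b) / r
    else:
        c, s = 0.0, np.conj(b) / abs(b)
    G = np.array([[c, s], [-np.conj(s), c]])         # unitary 2x2 rotation
    B[:, [k, j], :] = np.einsum('uv,lvm->lum', G, B[:, [k, j], :])
    return B
```

Since the delay is a permutation of lags and G is unitary, the total energy of the matrix is preserved; energy is only moved from the nulled coefficient to the zero-lag diagonal element, which is what drives the convergence argument of the next section.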

4.2.1 Convergence and Complexity
The polynomial Givens rotation is a paraunitary transformation, and therefore energy preserving. By selecting parameters according to equation (2.19), the rotation has the effect of moving energy from the coefficient being nulled to the zero-lag coefficient of the diagonal element of the current column. After a column step is finished, most of the energy of the coefficients below the main diagonal will have been transferred to the diagonal element. Because the columns are visited in order from left to right, any subsequent coefficient step will mainly affect the current column and the columns to its right, since the coefficients in the previous columns will be close to zero. The left-to-right ordering of column steps, together with the fact that the algorithm is allowed to restart for more sweeps, guarantees the convergence of the algorithm. The full convergence proof can be found in [4].

In order to derive the theoretical complexity, assume that PQRD-BC is applied to A(z) ∼ (V1, V2, C^(p×q)). Additionally, it will be assumed that the time lag dimension of any matrix involved is bounded by some r ∝ (V1 + V2 + 1), because of the truncation step at row 15. The complexity will be derived in terms of the number of coefficient steps needed for convergence.

By definition, the block of rows 12-15 is one coefficient step. Let the complexity of the coefficient step be O(C_c), where C_c is some function of p, q and r. Since row 11 is O(1), it is insignificant compared to the complexity of the coefficient step. Rows 8 and 19 involve searches for the coefficient with the largest magnitude below the main diagonal in the current column, and below the main diagonal in any column, respectively. Denote the complexities of these searches by O(C_jt) and O(C_jkt).

The number of iterations of the while loop starting at row 7 is bounded by the constant MaxIter, and the complexity of the loop is therefore O(MaxIter·C_c) + O(MaxIter·C_jt) = O(C_c) + O(C_jt). This is a bound with a very large hidden constant in the Ordo expression, and may therefore not be representative of the typical behaviour of the loop. In practice, the number of iterations of the loop would be proportional to the number of coefficients below the main diagonal of that column. The modified PQRD-BC given in Algorithm 3 does not suffer from this theoretical inconvenience, as its convergence criterion is defined differently. Furthermore, the for loop at row 5 iterates min(p − 1, q) times, and the complexity of the block of rows 6-17 is therefore O(min(p − 1, q)·C_c) + O(min(p − 1, q)·C_jt). The number of iterations (sweeps) of the outermost loop of the algorithm is bounded by the constant MaxSweeps. The complexity of the iterative part of the algorithm is therefore O(MaxSweeps·min(p − 1, q)·C_c) + O(MaxSweeps·min(p − 1, q)·C_jt) + O(MaxSweeps·C_jkt) = O(min(p − 1, q)·C_c) + O(min(p − 1, q)·C_jt) + O(C_jkt).

The set-up costs at rows 1-2 are O(pqr) + O(p²) + O(1) = O(pqr) + O(p²). Finally, the cost of row 22 is O(pqr). Writing it out, the theoretical complexity of the PQRD-BC algorithm is

C_PQRD-BC = O(min(p − 1, q)·C_c) + O(min(p − 1, q)·C_jt) + O(C_jkt) + O(pqr) + O(p²).   (4.6)

Recall that the Ordo notation contains an unknown constant, which in this case is large. This expression may therefore not give an accurate description of the complexity of the PQRD-BC algorithm. A brief simulation showed that for ε = 10⁻³ and a matrix with coefficients drawn from a zero-mean circularly symmetric normalized Gaussian distribution, the hidden constant in the Ordo expression of the first term of (4.6) was on the order of 10².

4.2.2 Discussion
The algorithm presented performs an approximate PQRD A(z) = Q(z)R(z) of an arbitrary matrix A(z). Convergence is defined in terms of an absolute criterion, which is eventually reached, as shown in [4]. Because of the iterative behaviour, and the way the maximum number of loop iterations is defined, it is hard to get tight bounds for the analytical complexity expressions. The complexity given by equation (4.6) may therefore be of limited significance.

Due to the absolute convergence criterion, the algorithm is unsuitable for direct use in a scenario where the scaling of the filter taps may vary over time. This is the case for a communications system. One way to overcome this problem could be to first normalize the matrix, and then apply the algorithm. This has the effect of mimicking a relative convergence criterion, but it would be better if the algorithm itself could deal with arbitrarily scaled matrices. An algorithm with that ability is introduced in the next section.

4.3 MPQRD-BC: Modified PQRD-BC

In order to be better suited for implementation in a communications system, this section proposes some changes to the PQRD-BC algorithm. This results in the Modified PQRD-BC (MPQRD-BC) algorithm, which uses a relative convergence criterion as opposed to the absolute criterion of PQRD-BC. The definition of convergence is changed so that, given a parameter 0 < ε_r < 1 and a matrix A(z) ∼ (0, r, C^(p×q)), convergence is reached when

‖R_lower(z)‖²_F / ‖A(z)‖²_F < ε_r.   (4.7)

That is, convergence is defined as the state when the triangularity error, or the relative amount of squared magnitude of the coefficients below the main diagonal, is less than ε_r. Equation (4.7) defines convergence for the outer loop of the algorithm, but the column step convergence criterion also needs to be changed. This is done by finding the coefficient in the column with the largest magnitude g2. The algorithm will move to a new column when

r N_sub g2² / ‖A(z)‖²_F < ε_r   (4.8)

where N_sub is the number of matrix elements below the main diagonal and r is the greatest maximum degree of any polynomial in the matrix. That is, the algorithm moves to the next column when the largest coefficient magnitude is so small that, if all coefficients below the main diagonal in the matrix had this magnitude, the algorithm would have converged. The value of r is updated for every coefficient step, to reflect the changes in maximum degrees due to the polynomial Givens rotations and the truncation step.

Additionally, we will take the opportunity to redefine the way the maximum number of coefficient steps allowed is determined. In PQRD-BC, a parameter MaxIter is simply passed to the algorithm, and no more than MaxIter coefficient steps will be performed per column and sweep. In MPQRD-BC, this constant is instead defined to be

MaxIter = ⌊−ρ r N_sub log₁₀(ε_r)⌋   (4.9)

where ρ is some positive integer parameter. The rationale behind the definition is that the number of coefficient steps needed is probably related to the number of coefficients below the main diagonal, as well as to the convergence criterion selected. A brief simulation was performed, and the results showed that the number of coefficient steps needed was approximately linear in the logarithm of the convergence criterion. Because of the rather arbitrary choice of MaxIter, a fudge factor ρ is added to the expression. Solving equation (4.9) for ρ gives an expression in terms of MaxIter, and the behaviour of PQRD-BC, where MaxIter is explicitly defined, can therefore be emulated by selecting ρ accordingly. Finally, notice that the logarithm of ε_r is negative, hence the minus sign in equation (4.9).

Putting the changes introduced into the perspective of the rest of the algorithm, Modified PQRD-BC is stated in pseudocode in Algorithm 3.

4.3.1 Convergence and Complexity
The convergence proof for MPQRD-BC follows directly from the convergence proof of PQRD-BC in [4], given large enough ρ and MaxSweeps. Because of the iterative nulling of coefficients below the main diagonal, and the inherent energy-moving property of the polynomial Givens rotation, the sum of the squared magnitudes of the coefficients below the main diagonal will decrease. Convergence with respect to the coefficient with the largest magnitude below the main diagonal will therefore imply convergence in the sense of the amount of energy below the main diagonal, for some convergence parameter ε_r. The changed maximum number of iterations for the loops does not alter this fact, as it does not change the main behaviour of the algorithm.

Because of the changes introduced, the theoretical complexity expressions will differ from those of PQRD-BC. Some rows have been added to the algorithm, and the number of loop iterations has changed. In the following derivation, the same assumptions and notation as in Section 4.2.1 are used.

Starting with the block enclosed by the if statement at row 13, the only new statement is at row 19, which is O(1). The entire block therefore has the same complexity as the equivalent block of PQRD-BC, that is O(C_c). The block inside the while loop of row 9 is functionally equivalent to the inner while loop of PQRD-BC. Row 12 is new though, and it is O(pqr) + O(1) = O(pqr). Because MaxIter is given by row 4, the complexity of the inner while loop at row 9 is given by

O(⌊−ρ · r · N_sub · log10(ε_r)⌋ · C_jt) + O(⌊−ρ · r · N_sub · log10(ε_r)⌋ · pqr) + O(⌊−ρ · r · N_sub · log10(ε_r)⌋ · C_c).

Algorithm 3 Modified PQRD By Columns (MPQRD-BC)
1: Input polynomial matrix A(z) ∼ (V1, V2, C^{p×q}), convergence parameter ε_r, truncation parameter µ and stopping criteria ρ and MaxSweeps.
2: Let Q(z) = I_p, h1 = 1 + ε_r, n = 0 and A0(z) = A(z).
3: Let N_sub = p · min(p−1, q) − Σ_{l=1}^{min(p−1,q)} l and r = V1 + V2 + 1.
4: Let MaxIter = ⌊−ρ · r · N_sub · log10(ε_r)⌋.
5: while n ≤ MaxSweeps and h1 > ε_r do
6:   Let n = n + 1.
7:   for k = 1 ... min(p−1, q) do
8:     Let iter = 0 and h2 = 1 + ε_r.
9:     while iter ≤ MaxIter and h2 > ε_r do
10:      Find j and t such that |a_jk(t)| ≥ |a_mk(t)| holds for m = k+1 ... p and ∀t ∈ Z.
11:      Let g2 = |a_jk(t)|.
12:      Let h2 = r · N_sub · g2² / ||A0(z)||_F².
13:      if h2 > ε_r then
14:        Let iter = iter + 1.
15:        Obtain PGR G(z) as a function of (j, k, t, |a_jk(t)|, |a_kk(0)|).
16:        Let A(z) = G(z)A(z).
17:        Let Q(z) = G(z)Q(z).
18:        Truncate A(z), Q(z) given µ.
19:        Let r = U1 + U2 + 1 if A(z) ∼ (U1, U2, C^{p×q}).
20:      end if
21:    end while
22:  end for
23:  Find j, k and t such that |a_jk(t)| ≥ |a_mn(τ)| holds for n = 1 ... q, m = n+1 ... p and ∀τ ∈ [−V1, V2].
24:  Let h1 = ||A_lower(z)||_F² / ||A0(z)||_F².
25: end while
26: Let R(z) = A(z).

Neglecting row 8, as it is only O(1), the for loop at row 7 has complexity

O(⌊−ρ · r · N_sub · log10(ε_r)⌋ · (C_jt + pqr + C_c) · min(p−1, q)).

The number of iterations (sweeps) of the outermost loop is bounded by the constant MaxSweeps. Using the rule of constants for the Ordo operator, the complexity of the iterative part of the algorithm is therefore

O(⌊−ρ · r · N_sub · log10(ε_r)⌋ · (C_jt + pqr + C_c) · min(p−1, q)) + O(C_jkt) + O(pqr).

The set-up costs are the same as for PQRD-BC, that is O(pqr) + O(p²). Row 26 is O(pqr), and therefore the theoretical complexity of MPQRD-BC is

C_MPQRD-BC = O(⌊−ρ · r · N_sub · log10(ε_r)⌋ · (C_jt + pqr + C_c) · min(p−1, q)) + O(C_jkt) + O(pqr) + O(p²)    (4.10)

For ε_r = 10⁻³ and a matrix with coefficients drawn from a zero-mean circularly symmetric normalized Gaussian distribution, a brief simulation showed that the hidden constant in the Ordo expression of the first term of (4.10) was on the order of 10¹.

4.3.2 Simulations

In this section, some numerical results regarding the properties of the MPQRD-BC algorithm are presented. First, the results of the algorithm working on a 3 × 3 matrix will be shown. Secondly, the results of a decomposition quality study will be presented. Finally, the run time of the algorithm has been measured for various sizes of input data and parameter values.

A First Example

A matrix A(z) ∼ (0, 2, C^{3×3}) was generated, with coefficients taken as observations drawn from a zero-mean normalized circularly symmetric Gaussian distribution. Viewing the matrix as an FIR filter, the impulse response can be seen in Figure 4.1a.

MPQRD-BC was applied to A(z) using the parameters ε_r = 10⁻³, µ = 10⁻⁶, ρ = 2 and MaxSweeps = 10. The resulting Q(z) and R(z) can be seen in Figures 4.2a and 4.1b respectively. One sweep and 72 coefficient steps were necessary for convergence. The decomposition errors are shown in Table 4.1. The error values indicate a good decomposition, with

Table 4.1: Errors for PQRD of Matrix A(z).

Decomposition    Triangularity    Unitarity
4.6 · 10⁻³       1.5 · 10⁻⁴       4.8 · 10⁻³

an almost paraunitary Q(z). Clearly, the triangularity error is less than ε_r, as expected. Interestingly, it can be seen in Figure 4.1b that for all but the last matrix element on the diagonal, the zero-lag coefficient has the largest magnitude. This is due to the way the polynomial Givens rotation works: it moves energy from the coefficient being nulled to the zero-lag coefficient of the diagonal matrix element of the column. Because there are more coefficients below the (1, 1) element than below the (2, 2) element, the zero-lag coefficient of the former has larger magnitude than that of the latter.
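The triangularity figure in Table 4.1 can be reproduced in spirit with a few lines of array code. The sketch below assumes a storage convention not fixed by the text: a polynomial matrix as an array of shape (taps, p, q), with the triangularity error taken as the energy strictly below the main diagonal relative to the total energy of the original matrix (a hedged reading of equation (4.5)).

```python
import numpy as np

def triangularity_error(R, A0):
    """Energy strictly below the main diagonal of R(z), relative to the
    total energy of the original matrix A0(z)."""
    _, p, q = R.shape
    lower = np.tril(np.ones((p, q), dtype=bool), k=-1)  # strictly lower part
    return np.sum(np.abs(R[:, lower]) ** 2) / np.sum(np.abs(A0) ** 2)

rng = np.random.default_rng(0)
A0 = rng.standard_normal((3, 3, 3)) + 1j * rng.standard_normal((3, 3, 3))
R = np.triu(A0)  # np.triu zeroes the strictly lower part of each tap
print(triangularity_error(R, A0))  # -> 0.0
```

An exactly upper triangular R(z) gives zero error; MPQRD-BC stops as soon as the error falls below ε_r instead.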

[Figure 4.1: The Original and Upper Triangular Matrices Obtained From MPQRD-BC for ε_r = 10⁻³, µ = 10⁻⁶, ρ = 2. (a) Impulse Response of Original Matrix A(z) ∼ (0, 2, C^{3×3}); (b) Impulse Response of Upper Triangular Matrix R(z).]

[Figure 4.2: The Paraunitary Matrix Obtained From MPQRD-BC for ε_r = 10⁻³, µ = 10⁻⁶, ρ = 2. (a) Impulse Response of Paraunitary Matrix Q(z); (b) Impulse Response of Q^H(z^−*)Q(z).]

Decomposition Quality

In this section, the connection between the choice of algorithm parameter values and decomposition quality will be presented. The errors defined in equations (4.1), (4.2) and (4.5) were calculated for a set of PQRDs generated by MPQRD-BC. The errors were measured for various choices of the convergence parameter ε_r and truncation parameter µ, and then averaged over 100 independently generated matrices. The other parameters were set so that ρ = 2 and MaxSweeps = 10. The matrices were A(z) ∼ (0, 2, C^{3×3}) with coefficients drawn from a zero-mean normalized circularly symmetric Gaussian distribution.

The relative decomposition errors can be seen in Figure 4.3. As expected, a smaller µ gives a better decomposition, since less energy is removed from the matrices involved, and they therefore stay closer to the ideal matrices. A striking result is that the relative decomposition errors go up for decreasing ε_r in Figure 4.3. Note, however, that the relative decomposition error indicates how well the decomposition imitates the original matrix, and not whether the decomposition is close to a true QR decomposition. A zero decomposition error could be achieved in the degenerate case ρ = 0, which would result in Q(z) = I and R(z) = A(z); this minimizes the decomposition error, but is far from a QR factorization. The fact that the relative decomposition error increases for decreasing ε_r does therefore not necessarily mean that a larger ε_r is better.

The unitarity error of the decomposition is shown in Figure 4.4. As expected, the unitarity errors go down for decreasing µ, because less energy is truncated. It can also be seen that a smaller ε_r gives larger unitarity errors. This may be because a smaller ε_r results in more coefficient steps, and therefore more truncations.

The last performance measure is the triangularity error, shown in Figure 4.5. By definition, the triangularity error is always less than ε_r. The choice of µ does not affect the triangularity error to any large degree.

Complexity

In the run time performance simulation, the number of coefficient steps needed for convergence was measured for different input matrices and algorithm parameters. For the effects of input data size, three simulation series were run, each varying one of the dimensions of the input matrix while keeping the rest constant. For every series, 100 polynomial matrices were generated with coefficients drawn from a zero-mean circularly symmetric Gaussian distribution. For every matrix, MPQRD-BC was applied to all principal sub-matrices obtained by removing a number of trailing columns and rows; the independent dimensions were kept at size 3, and in this way the matrix input size was varied. Finally, the measurements, as functions of matrix size, were averaged over the 100 realizations. The algorithm parameter values used are shown in Table 4.2.

Table 4.2: MPQRD-BC Parameter Values for Spatial/Temporal Series.

Indp. Dim. Size    Convergence ε_r    Truncation µ    MaxIter Factor ρ    MaxSweeps
3                  10⁻³               10⁻⁶            2                   10

The run times for the spatial series, in terms of number of coefficient steps, can be seen

[Figure 4.3: Relative decomposition error as a function of algorithm parameters ε_r and µ (MPQRD-BC), averaged over 100 matrices.]

[Figure 4.4: Unitarity error as a function of algorithm parameters ε_r and µ (MPQRD-BC), averaged over 100 matrices.]

[Figure 4.5: Triangularity error as a function of algorithm parameters ε_r and µ (MPQRD-BC), averaged over 100 matrices.]
in Figure 4.6. The graphs suggest that the run time is linear in the number of rows and taps, but approximately constant in the number of columns. The slope of the curve for the row series changes abruptly at the point for 3 rows. This is because the min(p−1, q) factor in the complexity expression changes at that point, since the number of columns in the row measurement series was 3.

In order to analyze the effect of the selection of algorithm parameters on the run time, MPQRD-BC was applied to 100 matrices of the form A(z) ∼ (0, 2, C^{3×3}), while varying the convergence parameter ε_r and truncation parameter µ. The results of this study can be seen in Figure 4.7. It is clear that decreasing ε_r results in more iterations until convergence. This is an intuitive result, because a smaller ε_r means that the algorithm has more coefficients to zero out. Looking at the graph of iterations versus µ in Figure 4.7, it is clear that the number of iterations levels out for sufficiently small µ. This can be interpreted as µ being small enough that an insignificant amount of energy is truncated from the matrices; selecting a smaller µ will therefore not affect the number of iterations.

4.3.3 Discussion

The modifications introduced which led to the MPQRD-BC algorithm are straightforward, but convenient. They lead to a convergence definition directly suitable for a communications system, and to tighter complexity bounds. As will be seen in Chapter 6, the designer of a communications system would probably like to select algorithm parameters based on an intuition of how they affect the capacity of the system. When the triangularity error is used as convergence criterion, this is exactly what is done.

Because the general behaviour of the algorithm is not changed, simulation results are presented only for MPQRD-BC, and not for PQRD-BC. The qualitative analysis of the two algorithms is similar, and the focus is therefore on the modified algorithm. Additionally, the convergence proof of MPQRD-BC follows directly from the convergence proof of PQRD-BC, as given in [4].

4.4 PSVD by PQRD-BC: Polynomial Singular Value Decom- position

With an understanding of the workings of PQRD-BC, we are now ready to study the PSVD by PQRD-BC algorithm, as proposed by Foster et al. in [4]. Given an arbitrary matrix A(z), this algorithm obtains a PSVD of A(z) such that, for some paraunitary matrices U(z), V(z),

A(z) = U(z) D(z) V^H(z^−*)

where D(z) is diagonal. The general idea behind the algorithm is to obtain a PQRD of A(z), and then take the parahermitian conjugate of the resulting upper triangular matrix. The PQRD of this lower triangular matrix is then found, yielding an upper triangular matrix that, thanks to the energy-moving property of the polynomial Givens rotation, has a smaller diagonality error than the original matrix A(z). Iterating this procedure, paraunitary matrices and one diagonal matrix are obtained, which together form a PSVD of A(z). Because of the iterative manner, and some necessary matrix truncation, the algorithm will not output an exact PSVD. Rather, the matrices are only approximately paraunitary

[Figure 4.6: Number of coefficient steps (iterations) needed for convergence, as a function of input matrix size (MPQRD-BC). The dimension size of the independent dimensions was 3.]

[Figure 4.7: Number of coefficient steps (iterations) needed for convergence as a function of algorithm parameters, for a matrix A(z) ∼ (0, 2, C^{3×3}) (MPQRD-BC).]

and approximately diagonal. The errors can be made arbitrarily small, though, provided enough time and memory are available.

The PSVD of a matrix has an interesting application in spatial multiplexing for wideband wireless channels. By precoding and receive filtering with the obtained paraunitary matrices, a channel matrix can be diagonalized over all frequencies, so that signaling can be performed over a set of frequency-selective spatial modes. This application is further studied in Chapter 6. Since a modified version of PSVD by PQRD-BC will be presented in the next section, our focus will mainly be on the properties of that algorithm. In order to introduce the modified version, we restate PSVD by PQRD-BC in Algorithm 4.

Algorithm 4 PSVD by PQRD-BC
1: Input polynomial matrix A(z) ∼ (V1, V2, C^{p×q}), convergence parameter ε, truncation parameter µ, absolute stopping criterion MaxPSVDIter, and PQRD-BC parameters MaxSweeps, MaxIter.
2: Let U(z) = I_p, V(z) = I_q, iter = 0 and g = 1 + ε.
3: while iter < MaxPSVDIter and g > ε do
4:   Find j, k and t with j ≠ k such that |a_jk(t)| ≥ |a_mn(τ)| holds for m = 1 ... p, n = 1 ... q with m ≠ n and ∀τ ∈ [−V1, V2].
5:   Let g = |a_jk(t)|.
6:   if g > ε then
7:     Let iter = iter + 1.
8:     Call [U1(z), R1(z)] = pqrd_bc(A(z), ε, µ, MaxIter, MaxSweeps).
9:     Let A′(z) = R1^H(z^−*) and U(z) = U1(z)U(z).
10:    Call [V1(z), R2(z)] = pqrd_bc(A′(z), ε, µ, MaxIter, MaxSweeps).
11:    Let A(z) = R2^H(z^−*) and V(z) = V1(z)V(z).
12:    Truncate A(z), U(z) and V(z) given µ.
13:  end if
14: end while
15: Let D(z) = A(z).

In the following, rows 8-12 of Algorithm 4 will be referred to as a flip step. The name stems from the fact that the algorithm operates by iteratively applying PQRD-BC to a sequence of flipped matrices. The parameters ε and µ have the same meaning as in PQRD-BC, and are also passed along in the calls to PQRD-BC on rows 8 and 10. The only new parameter is MaxPSVDIter, which determines the maximum number of flip steps to allow.
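For intuition, the flip step has a direct constant-matrix analogue: alternately QR-factorizing the matrix and the conjugate transpose of the resulting R factor drives the matrix toward a diagonal matrix of singular values, while a valid factorization is maintained at every step. The sketch below is that scalar analogue only, not the polynomial algorithm itself; the function name and flip count are ours.

```python
import numpy as np

def svd_by_qr_flips(A, n_flips=2000):
    """Constant-matrix analogue of the flip step. After each pair of QR
    factorizations, A = U @ D @ V^H still holds exactly, with D becoming
    ever more diagonal as the flips proceed."""
    U = np.eye(A.shape[0], dtype=complex)
    V = np.eye(A.shape[1], dtype=complex)
    D = A.astype(complex)
    for _ in range(n_flips):
        Q1, R1 = np.linalg.qr(D)            # triangularize
        U = U @ Q1
        Q2, R2 = np.linalg.qr(R1.conj().T)  # "flip" and triangularize again
        V = V @ Q2
        D = R2.conj().T
    return U, D, V

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, D, V = svd_by_qr_flips(A)
print(np.allclose(U @ D @ V.conj().T, A))  # -> True (factorization invariant)
```

In the polynomial case the same idea applies, with PQRD-BC playing the role of the QR factorization and the parahermitian conjugate playing the role of the conjugate transpose.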

4.4.1 Convergence and Complexity

Convergence for PSVD by PQRD-BC is proven by an argument similar to that for PQRD-BC. For every call to PQRD-BC, the energy of the coefficients of the diagonal matrix elements will increase. As PQRD-BC is applied to the sequence of flipped matrices, eventually a sufficient amount of energy has been moved from the non-diagonal part, so that the coefficient with the largest magnitude has a magnitude less than ε. This is the state of convergence. The full proof of convergence is given in [4].

In order to derive the theoretical complexity expression, we note that rows 8-12 by definition constitute a flip step. Denoting the complexity of a flip step O(C_f), where C_f is some function

of p, q and r, it is clear that the complexity of the block of rows within the if statement at row 6 is O(C_f) + O(1) = O(C_f). Using the definition from Section 4.2.1 for the complexity of row 4, and noting that row 5 is O(1), gives that the complexity of rows 4-13 is O(C_f) + O(C_jkt). The maximum number of iterations of the while loop at row 3 is MaxPSVDIter. The complexity of the loop is therefore O(MaxPSVDIter · C_f) + O(MaxPSVDIter · C_jkt) = O(C_f) + O(C_jkt). Taking into account the start-up costs at rows 1-2, which are O(pqr) + O(p²) + O(q²), and noting that row 15 is O(pqr), it is clear that the complexity of the entire algorithm is

C_PSVD by PQRD-BC = O(C_f) + O(C_jkt) + O(pqr) + O(p²) + O(q²).    (4.11)

To express the complexity in terms of coefficient steps, as opposed to flip steps, we expand the C_f term in equation (4.11). Rows 8 and 10 are each O(C_PQRD-BC). Rows 9 and 11 are together O(pqr) + O(p³) + O(q³). Finally, row 12 is O(pqr²), as given by Section 2.2.3. This gives that

C_f = O(C_PQRD-BC) + O(pqr) + O(p³) + O(q³).

Plugging in the expression for C_PQRD-BC from equation (4.6), the final expression

C_PSVD by PQRD-BC = O(min(p−1, q) · C_c) + O(min(p−1, q) · C_jt) + O(C_jkt) + O(pqr²) + O(p³) + O(q³)    (4.12)

is obtained. Note that the hidden constants in the Ordo terms from the PQRD-BC expressions may be very large. Equation (4.12) therefore suffers from the same problem as equation (4.6): the bounds may not be tight. A brief simulation showed that for ε = 10⁻³ and a matrix with coefficients drawn from a zero-mean circularly symmetric normalized Gaussian distribution, the hidden constant in the Ordo expression of the first term of (4.12) was on the order of 10².

4.4.2 Discussion

PSVD by PQRD-BC is an algorithm for obtaining a PSVD of a matrix A(z) through a series of polynomial QR decompositions. The bulk of the work is performed by the PQRD-BC algorithm, which is called iteratively. Convergence is defined in terms of an absolute convergence criterion, making the algorithm unsuitable for direct implementation in a communications system. The theoretical complexity derivation gave that the algorithm complexity is linear in the number of coefficient steps. Recalling that the hidden constant of the Ordo operator may be very large, this is not a particularly interesting result. No deliberation is spent on this fact though, since a modified algorithm which does not suffer from this problem is introduced in the next section.

4.5 MPSVD by MPQRD-BC: Modified PSVD

This section will introduce a modified version of PSVD by PQRD-BC that uses a relative convergence criterion. The structure of the modified algorithm is identical to the structure of PSVD by PQRD-BC, but the flip step is modified to employ MPQRD-BC for the PQRD. With these modifications in place, the algorithm will be suitable for direct implementation in a communications system. Indeed, MPSVD by MPQRD-BC will be the algorithm of choice for the transmission scheme to be presented in Chapter 6.

The relative convergence criterion is defined so that convergence is reached when

||D_non-diag(z)||_F² / ||A(z)||_F² < ε_r    (4.13)

for a given parameter 0 < ε_r < 1. That is, the state of convergence is reached when the diagonality error is smaller than ε_r. By this selection of convergence criterion, a direct relationship between the value of ε_r and the capacity expressions of Chapter 6 can be established. Any energy outside the diagonal part of a diagonalized channel matrix D(z) results in cross-channel interference, and decreasing ε_r directly reduces this interference. A pseudocode representation of MPSVD by MPQRD-BC is shown in Algorithm 5.

Algorithm 5 MPSVD by MPQRD-BC
1: Input polynomial matrix A(z) ∼ (V1, V2, C^{p×q}), convergence parameter ε_r, truncation parameter µ, absolute stopping criterion MaxPSVDIter, and MPQRD-BC parameters MaxSweeps, ρ.
2: Let U(z) = I_p, V(z) = I_q, iter = 0, h = 1 + ε_r and A0(z) = A(z).
3: while iter < MaxPSVDIter and h > ε_r do
4:   Let h = ||A_non-diag(z)||_F² / ||A0(z)||_F².
5:   if h > ε_r then
6:     Let iter = iter + 1.
7:     Call [U1(z), R1(z)] = mpqrd_bc(A(z), ε_r/2, µ, ρ, MaxSweeps).
8:     Let A′(z) = R1^H(z^−*) and U(z) = U1(z)U(z).
9:     Call [V1(z), R2(z)] = mpqrd_bc(A′(z), ε_r/2, µ, ρ, MaxSweeps).
10:    Let A(z) = R2^H(z^−*) and V(z) = V1(z)V(z).
11:    Truncate A(z), U(z) and V(z) given µ.
12:  end if
13: end while
14: Let D(z) = A(z).

The structure of MPSVD by MPQRD-BC is obviously identical to the structure of PSVD by PQRD-BC. The behaviour of the modified algorithm is the same as that of PSVD by PQRD-BC, but the parameters have changed slightly. In addition to ε_r, MPSVD by MPQRD-BC also needs a ρ for the calls to MPQRD-BC. The other parameters have the same function as for PSVD by PQRD-BC. It is worth noting that MPQRD-BC is called with convergence parameter ε_r/2. The reason is that half of the non-diagonal energy of the diagonalized matrix is expected to lie in each of the two triangular parts.
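The loop-guard quantity h (rows 3-4 of Algorithm 5) is just the relative non-diagonal energy of equation (4.13). A minimal sketch, using the same (taps, p, q) array convention as before (the convention and function name are ours):

```python
import numpy as np

def diagonality_error(A, A0):
    """h of Algorithm 5: energy outside the main diagonal of A(z),
    normalized by the total energy of the original matrix A0(z)."""
    _, p, q = A.shape
    off_diag = ~np.eye(p, q, dtype=bool)
    return np.sum(np.abs(A[:, off_diag]) ** 2) / np.sum(np.abs(A0) ** 2)

# A perfectly diagonalized matrix yields h = 0, so the while loop
# terminates immediately for any eps_r > 0.
D = np.zeros((3, 3, 3), dtype=complex)
D[:, np.arange(3), np.arange(3)] = 1.0
print(diagonality_error(D, D))  # -> 0.0
```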

4.5.1 Convergence and Complexity

The convergence proof for MPSVD by MPQRD-BC follows directly from the convergence proofs of PSVD by PQRD-BC (see [4]) and MPQRD-BC. For every flip step, more energy will have been moved to the diagonal matrix element polynomials. As this goes on, eventually a state will be reached where a sufficient ratio of the energy lies on the diagonal elements, thereby satisfying the diagonality error convergence criterion.

Through a similar argument as for PSVD by PQRD-BC, it can be shown that the complexity of MPSVD by MPQRD-BC is

C_MPSVD by MPQRD-BC = O(C_f) + O(pqr) + O(p²) + O(q²).    (4.14)

The missing O(C_jkt) term is due to the fact that row 4 is replaced in MPSVD by MPQRD-BC. The hidden constant of the first Ordo term in (4.14) was estimated to be 302.5, as further described in Section 4.6.

4.5.2 Simulations

In this section, some numerical results regarding the properties of MPSVD by MPQRD-BC will be presented. First, the results of the algorithm applied to a single matrix will be presented. Secondly, the decomposition quality as a function of input data size and algorithm parameters is studied. Finally, the run time, in terms of coefficient steps, has been measured for a set of matrices and algorithm parameter values.

A First Example

MPSVD by MPQRD-BC was applied to the same matrix A(z) ∼ (0, 2, C^{3×3}) as in Section 4.3.2. The parameter values were selected so that ε_r = 10⁻³, µ = 10⁻⁶, ρ = 2, MaxSweeps = 2 and MaxPSVDIter = 10. Recall that the coefficients of the matrix element polynomials were drawn from a zero-mean normalized circularly symmetric Gaussian distribution. Seeing the original matrix as a multi-dimensional FIR filter, the impulse response is shown in Figure 4.8a. The resulting diagonalized matrix can be seen in Figure 4.8b, and the obtained paraunitary matrices U(z), V(z) are plotted in Figures 4.9a and 4.9b. The algorithm needed 4 flip steps to converge, and altogether 544 coefficient steps. As the PSVD was obtained, the errors from Section 4.1 were calculated. The computed values can be seen in Table 4.3. The results tell us that the decomposition is good, and that the matrices U(z), V(z) are close to being perfectly paraunitary.

Table 4.3: Errors for PSVD of Matrix A(z) from Section 4.3.2.

Decomposition    Diagonality    Unitarity U    Unitarity V
1.1 · 10⁻²       4.8 · 10⁻⁴     1.4 · 10⁻²     7.0 · 10⁻³

Decomposition Quality

In order to measure the impact of the selection of algorithm parameters on the decomposition quality, MPSVD by MPQRD-BC was applied to 100 matrices A_v(z) ∼ (0, 2, C^{3×3}), for a set of parameters. The matrices were independently generated with coefficients drawn from a zero-mean normalized circularly symmetric Gaussian distribution. For every matrix, MPSVD by MPQRD-BC was applied while varying the convergence parameter ε_r and truncation parameter µ. The results were then averaged over the 100 realizations.
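Test matrices of this kind can be generated as below. The function name is ours, and the 1/sqrt(2) scaling is the standard normalization giving unit-variance circularly symmetric complex Gaussian coefficients.

```python
import numpy as np

def random_poly_matrix(p, q, v1, v2, rng):
    """p x q polynomial matrix with v1 + v2 + 1 taps and i.i.d. zero-mean,
    unit-variance circularly symmetric complex Gaussian coefficients."""
    taps = v1 + v2 + 1
    return (rng.standard_normal((taps, p, q))
            + 1j * rng.standard_normal((taps, p, q))) / np.sqrt(2)

# One realization of A(z) ~ (0, 2, C^{3x3}):
A = random_poly_matrix(3, 3, 0, 2, np.random.default_rng(1))
print(A.shape)  # -> (3, 3, 3)
```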

[Figure 4.8: The Original and Diagonalized Matrices Obtained From an MPSVD by MPQRD-BC Run with ε_r = 10⁻³, µ = 10⁻⁶, ρ = 2. (a) Impulse Response of Original Matrix A(z) ∼ (0, 2, C^{3×3}); (b) Impulse Response of Diagonalized Matrix D(z).]

[Figure 4.9: The Paraunitary Matrices Obtained From MPSVD by MPQRD-BC Applied to the Original Matrix from Figure 4.8a, with ε_r = 10⁻³, µ = 10⁻⁶, ρ = 2. (a) Impulse Response of Paraunitary Matrix U(z); (b) Impulse Response of Paraunitary Matrix V(z).]

The relative decomposition error as a function of ε_r and µ can be seen in Figure 4.10. The unitarity error of the matrix U(z) is shown in Figure 4.11. The curve for the unitarity error of the matrix V(z) is similar to Figure 4.11. Finally, the diagonality error of the matrix D(z) can be seen in Figure 4.12.

Clearly, the relative decomposition error decreases for decreasing µ. Interestingly, the relative decomposition error increases for decreasing ε_r, until it plateaus. This may be because a smaller ε_r means more coefficient steps, and therefore more matrix truncations, which affect the decomposition negatively. Figure 4.11 tells us that a smaller µ gives a smaller unitarity error, as expected. For a given µ, a larger ε_r decreases the unitarity error, probably for the same reason as for the decomposition error: a smaller ε_r means more iterations and therefore worse unitarity performance. The last graph, Figure 4.12, shows that the choice of µ has no big impact on the diagonality error, but that ε_r is directly proportional to the diagonality error. This is natural, since that is how convergence is defined.

Complexity

In order to measure run time performance, in terms of the number of coefficient steps needed for convergence, one hundred matrices were generated. Each matrix had the coefficients of its polynomials drawn from a zero-mean normalized circularly symmetric Gaussian distribution. For every matrix, MPSVD by MPQRD-BC was applied to all principal sub-matrices obtained by removing a number of trailing columns and rows. The measurements were averaged over the 100 matrices.

The run time was measured for different input sizes by processing different principal sub-matrices of the original matrices. The results are shown in Figure 4.13. In the figure, it can be seen that the number of iterations behaves linearly for the row and column series, after a particular point. For the rows, this point is at 3, and for the columns it is at 4. The reason is the min(p−1, q) term in the complexity derivations, which changes abruptly at a point. The last sub-plot shows the number of iterations as a function of the number of polynomial coefficients, or taps. The algorithm parameter values used for the series are shown in Table 4.4.

Table 4.4: MPSVD by MPQRD-BC Parameter Values for Spatial/Temporal Series.

Indp. Dim. Size    Convergence ε_r    Truncation µ    Factor ρ    MaxSweeps    MaxPSVDIter
3                  10⁻³               10⁻⁶            2           10           100

The choice of algorithm parameter values was also investigated for the same 100 matrices. For every matrix, the sub-matrix of the first 3 rows and 3 taps was extracted. MPSVD by MPQRD-BC was then applied using different parameters, and the results were averaged over the 100 matrices. The results can be seen in Figure 4.14. The figure tells us that more iterations are needed for decreasing ε_r. This is intuitive, as more coefficients have to be nulled in order to achieve a lower diagonality error. For a lower µ, the number of iterations goes up until it plateaus. This is because, beyond a certain point, essentially no energy is removed since µ is so small, and it therefore does not matter if a smaller µ is chosen.

[Figure 4.10: Relative decomposition error as a function of algorithm parameters ε_r and µ (MPSVD by MPQRD-BC), averaged over 100 matrices.]

[Figure 4.11: Unitarity error for U(z) as a function of algorithm parameters ε_r and µ (MPSVD by MPQRD-BC), averaged over 100 matrices.]

[Figure 4.12: Diagonality error as a function of algorithm parameters ε_r and µ (MPSVD by MPQRD-BC), averaged over 100 matrices.]

[Figure 4.13: Number of coefficient steps (iterations) needed for convergence, as a function of input matrix size (MPSVD by MPQRD-BC).]

[Figure 4.14: Number of coefficient steps (iterations) needed for convergence as a function of algorithm parameters, for a 3 × 3 matrix with 3 lags (MPSVD by MPQRD-BC).]

4.5.3 Discussion

The Modified PSVD by Modified PQRD-BC is a straightforward extension of the PSVD by PQRD-BC algorithm. It retains the main behaviour of PSVD by PQRD-BC, but has a slightly altered convergence criterion. Most of the operations performed on the input matrix are in fact performed by MPQRD-BC, and the behaviour of MPSVD by MPQRD-BC therefore relies heavily on the behaviour of MPQRD-BC. For the given diagonalization strategy, that is, obtaining the PQRD of a sequence of flipped matrices, there is probably not a lot of room for improvement of MPSVD by MPQRD-BC. Rather, any development efforts should probably be spent on MPQRD-BC. Thanks to its acceptance of relative parameters, MPSVD by MPQRD-BC is a candidate for implementation in a communications system. Chapter 6 presents the results of a study of the performance of MPSVD by MPQRD-BC when applied to the channel diagonalization problem of spatial multiplexing in wireless communications.

4.6 Sampled PSVD vs. SVD in DFT Domain

In this section, a comparison will be made between the matrices obtained by sampling the polynomial matrices given by MPSVD by MPQRD-BC and the matrices given by an SVD in the DFT domain. The computational load of the two methods, for a polynomial matrix with fixed spatial dimensions but varying temporal dimension, will also be compared.

4.6.1 Frequency Domain Comparison

The standard 3 × 3 polynomial matrix defined in Section 4.3.2 was decomposed using the MPSVD by MPQRD-BC algorithm with parameters ε_r = 10⁻¹, µ = 10⁻⁶, ρ = 15, MaxSweeps = 10 and MaxPSVDIter = 10. The obtained D(z) matrix was sampled at N = 512 points along the unit circle, and the frequency and phase responses were computed. As a comparison, the original matrix A(z) was also oversampled at N = 512 points along the unit circle. A standard constant SVD was performed for all 512 matrices, and the magnitudes and phases of the elements of the sequence of matrices D_k given by the SVDs were plotted in the same diagrams. The frequency response comparison can be seen in Figure 4.15, and the phase response comparison in Figure 4.16. The reason for choosing a relatively large ε_r was to give a visible difference in the plots.

It is clear from Figure 4.15 that the frequency response of the sampled PSVD matrix follows the magnitudes of the sequence of matrices D_k closely. For a smaller ε_r, the difference would have been smaller. On the other hand, it can be seen in Figure 4.16 that the matrices D_k only have real elements, whereas the matrix D(e^{jω}) is complex. Therefore, the phase response of the sampled PSVD matrix is, at most frequencies, not close to the phase of the sequence of matrices D_k.
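The sampling used for this comparison amounts to a zero-padded FFT along the tap axis, followed by one constant SVD per frequency bin. A sketch, assuming lag 0 is the first tap (a causal matrix, so no extra phase shift is needed); the function name and the random stand-in matrix are ours:

```python
import numpy as np

def sample_on_unit_circle(P, n_points):
    """Evaluate a causal polynomial matrix P of shape (taps, p, q) at
    n_points equispaced points on the unit circle via a zero-padded FFT."""
    return np.fft.fft(P, n=n_points, axis=0)

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3, 3)) + 1j * rng.standard_normal((3, 3, 3))
Ak = sample_on_unit_circle(A, 512)            # 512 constant 3x3 matrices
sigmas = np.linalg.svd(Ak, compute_uv=False)  # per-bin singular values
print(sigmas.shape)  # -> (512, 3)
```

Note that np.linalg.svd operates on the whole stack of frequency-bin matrices at once, which is exactly the per-sub-carrier SVD of the traditional approach.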

4.6.2 Computational Load Comparison, Set-Up Phase

As will be shown in Chapter 6, the traditional way of diagonalizing a wideband MIMO channel in both frequency and space is to transform the channel into the frequency domain using the FFT, and then to perform an SVD for every sub-carrier. Using an estimate for the computational load (in terms of floating-point operations, flops) of the SVD operation, it

[Figure: 3 × 3 grid of panels, power gain (dB) vs. normalized frequency. Frequency response: Sampled PSVD (dotted), SVD in DFT Domain (solid).]

Figure 4.15: Frequency Response of Sampled PSVD (ε_r = 10^{-1}) Matrix D(e^{jω}) and Magnitudes of Sequence of SVD Matrices D_k.

is trivial to obtain the computational load for performing an SVD for every sub-carrier. As given by [9, p. 254], an estimate of the number of flops needed to perform an SVD yielding all three decomposition matrices of a p × q matrix is

Ĉ_SVD,1 = 4p²q + 8pq² + 9q³.

The estimate of the number of flops needed for N SVDs is then simply

Ĉ_SVD,N = N(4p²q + 8pq² + 9q³).      (4.15)

In order to compare the computational load of the N SVDs to that of performing a PSVD in the z-domain, a rough estimate of the computational load (in terms of flops) will be derived. As shown by the simulation in Section 4.5.2, the number of coefficient steps needed for performing a PSVD on a 3 × 3 matrix grows linearly with the number of taps of the matrix, keeping the spatial dimensions fixed. An affine polynomial was fitted to the complexity measurement data of the temporal series in Section 4.5.2 using least squares. The slope of the obtained curve suggests that

D̂ = 302.5 coefficient steps per lag of the original matrix,

which corresponds well to the plot in Figure 4.13. Recall that the temporal measurement series of Section 4.5.2 was performed with the MPSVD by MPQRD-BC parameters of Table 4.4, and therefore corresponds to good decompositions. The complexity of a coefficient step is estimated by the complexity of two polynomial Givens rotations, which is given by (2.20). This gives

Ĉ_coefstep = 4q(r + 2|t|) ≤ 4q·r_final      (4.16)

[Figure: 3 × 3 grid of panels, phase (rad) vs. normalized frequency. Phase response: Sampled PSVD (dotted), SVD in DFT Domain (solid).]

Figure 4.16: Phase Response of Sampled PSVD (ε_r = 10^{-1}) Matrix D(e^{jω}) and Phases of Sequence of SVD Matrices D_k.

where r_final is the number of lags of the final matrix. Accounting only for the computational load of the coefficient steps, the estimate for the computational load for performing an MPSVD by MPQRD-BC is then

Ĉ_MPSVD by MPQRD-BC = M_l · D̂ · Ĉ_coefstep      (4.17)

for a matrix with M_l lags. The estimates of (4.15) and (4.17) were calculated for a sequence of 100 3 × 3 matrices with an increasing number of lags. The comparison was made with four different FFT sizes. The results are shown in Figure 4.17. It can be seen that for increasingly large channel impulse responses, the computational load for performing the PSVD becomes prohibitive compared to performing N SVDs.
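The two cost models can be compared numerically. The sketch below implements (4.15) and (4.17); the growth of r_final with the number of lags is an assumed model used only for illustration, not a measured quantity:

```python
import numpy as np

def svd_flops_per_subcarrier(p, q):
    # Golub & Van Loan flop estimate for a full SVD of a p x q matrix, cf. (4.15).
    return 4 * p**2 * q + 8 * p * q**2 + 9 * q**3

def mpsvd_flops(p, q, lags, D_hat=302.5, r_final=None):
    # Rough model (4.17): the number of coefficient steps grows linearly in the
    # number of lags (slope D_hat, fitted in Section 4.5.2), and each coefficient
    # step costs at most 4*q*r_final flops, cf. (4.16).
    if r_final is None:
        r_final = 10 * lags   # assumed growth of the working-matrix order
    return lags * D_hat * 4 * q * r_final

p = q = 3
N = 512   # number of sub-carriers
for taps in (4, 64, 1024):
    cost_n_svds = N * svd_flops_per_subcarrier(p, q)   # constant in taps
    cost_psvd = mpsvd_flops(p, q, taps)                # grows ~quadratically
    print(taps, cost_n_svds, cost_psvd)
```

With these assumptions the PSVD cost overtakes the N-SVD cost already for short impulse responses, reproducing the trend of Figure 4.17.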

[Figure: approximate number of flops (log scale) vs. number of taps in the channel impulse response, comparing the constant SVD for 512, 1024, 2048 and 4096 sub-carriers with MPSVD by MPQRD-BC.]

Figure 4.17: Computational Load Comparison Between Performing a PSVD and Performing N SVDs.

4.6.3 Computational Load, Online Phase

The previous section compared the computational load for diagonalizing a wideband MIMO channel. The resulting (para-)unitary matrices are then used for precoding and receive filtering of the symbol stream. In the traditional system, the unitary matrices are given as constant matrices in the frequency domain. It is therefore natural to perform the filtering of the data stream in the frequency domain. The PSVD approach, on the other hand, results in paraunitary polynomial matrices. The straight-forward way of implementing the filtering with these matrices is in the time domain, as the polynomial coefficients directly give the filter impulse response. However, as shown in e.g. [18, p. 10], for long filters it is computationally more advantageous to perform the filtering in the frequency domain. As the PSVD algorithm typically generates paraunitary matrices with significantly higher maximum degrees than that of the original channel matrix, this will also be the case here. Assuming that the two systems use the same number of sub-carriers, the online complexity of the two systems will therefore be the same. The investigation in Section 4.6.2 is therefore sufficient for comparing the computational load of the two systems.
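As a minimal illustration of why frequency-domain filtering is preferred for long filters, FFT-based linear convolution (O(N log N)) can be checked against direct time-domain convolution (O(len(x)·len(h))); the signal and filter below are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)      # symbol stream (one spatial channel)
h = rng.standard_normal(64)       # long FIR precoding/receive filter

# Direct time-domain filtering.
y_time = np.convolve(x, h)

# Frequency-domain filtering: zero-pad both sequences to the full
# linear-convolution length, multiply the spectra, transform back.
N = len(x) + len(h) - 1
y_freq = np.fft.ifft(np.fft.fft(x, N) * np.fft.fft(h, N)).real

assert np.allclose(y_time, y_freq)
```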

4.6.4 Discussion

The fact that the frequency response of the sampled matrix D(e^{jω}) followed the magnitude of the sequence of matrices D_k in Figure 4.15 is reassuring, since it tells us that MPSVD by MPQRD-BC gives solutions with the same qualitative behaviour as performing SVDs in the DFT domain. The fact that the phase in Figure 4.16 is not constantly zero shows that the sampled D(e^{jω}) will in general be complex, whereas the constant SVD always produces real singular values.

The computational load comparison performed in Section 4.6.2 is telling. It is obvious that applying the MPSVD by MPQRD-BC to a polynomial matrix, and then sampling the approximately diagonal factor D(z) along the unit circle, is not computationally advantageous compared to the traditional approach. In the computation, only the computational load coming from the application of the polynomial Givens rotations was accounted for. This is the bulk of the work load for the algorithm, but as seen in Section 4.5.1 there are more terms in the complexity expression. The investigation uses estimates of the computational load for performing an MPSVD by MPQRD-BC of a 3 × 3 matrix, and the factors D̂ and Ĉ_coefstep are calculated from a limited data set, but the results still give the trend. Furthermore, the N frequency-domain SVDs can easily be parallelized, since they are independent of each other. It does not seem trivial to parallelize the PSVD algorithm, however.

4.7 Summary

This chapter has presented four algorithms for approximate decompositions of polynomial matrices through coefficient nulling by polynomial Givens rotations. Two of the algorithms presented were modifications of the remaining two. All algorithms perform the approximate decompositions by iteratively applying PGRs, in some way, until a predetermined convergence criterion is satisfied.

Since the decomposition matrices returned by the algorithms are approximations, the decompositions will naturally have some error associated with them. A number of error criteria were defined and described. A decomposition can only be said to be good if all error criteria are low. It may be the case that a decomposition has a low relative decomposition error, but this may be of no value if the unitarity or triangularity/diagonality errors of the matrices are high. In order to bring some intuition into what the error criteria mean, two optimization problems were introduced.

A common theme in the algorithms is that the maximum degrees of the polynomials involved may grow fast, because of the properties of the polynomial Givens rotation. A matrix truncation step is therefore introduced at certain places in the algorithms. The truncation removes the outer coefficients of the matrix polynomials, if these are deemed to be unimportant in the sense that their energy is low. The truncation algorithm used (Algorithm 1) has a complexity of O(pqr²), but this could probably be reduced by implementing some sort of binary search algorithm.

The potential use for spatial multiplexing of the original algorithms in [4] is hampered by the fact that they have an absolute convergence criterion. The two modified algorithms presented in this chapter deal with this problem by redefining the state of convergence. Using the new definitions, a direct link between the diagonality error measure and cross-channel interference in the spatial multiplexing system of Chapter 6 can be established.
A system designer can therefore determine the algorithm parameters based on an intuition of their effect on system capacity.

Finally, it was shown that the computational load becomes prohibitive if the goal of applying the algorithm is to sample the obtained matrices along the unit circle. For the scenario of moderate to large channel impulse responses, it is computationally more feasible to apply an FFT and perform multiple SVDs than to apply the MPSVD by MPQRD-BC algorithm and sample its result along the unit circle.

There are other approaches to polynomial decomposition that do not rely on single-coefficient polynomial Givens rotations. The next chapter will study algorithms for polynomial decomposition obtained from rational Givens rotations that null entire polynomial matrix elements exactly. By this strategy, a PQRD can be obtained with zero triangularity error.
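Returning to the truncation step mentioned in the summary, a linear-scan sketch of energy-based removal of outer lags is given below; the exact bookkeeping of Algorithm 1 may differ, and the function name and threshold semantics here are assumptions made for illustration:

```python
import numpy as np

def truncate_lags(coeffs, mu=1e-6):
    """Remove outer lags of a polynomial matrix (coeffs indexed lag-first)
    as long as the total removed energy stays below a fraction mu of the
    matrix energy. A linear scan; a binary search over the cut points
    could reduce the cost, as suggested in the summary."""
    total = np.sum(np.abs(coeffs) ** 2)
    lo, hi = 0, coeffs.shape[0]
    removed = 0.0
    while hi - lo > 1:
        head = np.sum(np.abs(coeffs[lo]) ** 2)      # energy of first lag
        tail = np.sum(np.abs(coeffs[hi - 1]) ** 2)  # energy of last lag
        step = min(head, tail)
        if removed + step > mu * total:
            break                                   # budget exhausted
        removed += step
        if head <= tail:                            # drop the weaker end
            lo += 1
        else:
            hi -= 1
    return coeffs[lo:hi]

rng = np.random.default_rng(2)
A = rng.standard_normal((9, 3, 3))
A[0] *= 1e-8     # negligible outermost lags
A[-1] *= 1e-8
print(truncate_lags(A).shape)   # the two weak outer lags are removed
```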

Chapter 5

Rational Decomposition Algorithms: Polynomial Nulling

This chapter will propose two polynomial decomposition algorithms based on the idea of exact polynomial nulling. Rather than nulling polynomial coefficients one at a time, the approach taken in this chapter is to null an entire polynomial matrix element per iteration. In order to preserve the paraunitarity of the polynomial Givens rotation, it has to be extended to allow for rational functions. From a signal processing point of view, this corresponds to filters with infinite impulse responses (IIR). Care has to be taken to ensure the stability of the IIR filters, because their poles are not confined to the origin, as is the case for FIR filters. The notion of exact polynomial nulling used as the foundation of the algorithms to be presented was proposed by Bengtsson in [19].

The QRD algorithm presented in this chapter bears similarities with the rather old triangularization algorithm of [6, p. 33]. There, an arbitrary polynomial matrix is transformed into the Hermite row form, which can be thought of as the analogue of the row echelon form for constant matrices. Elementary row operations are used, and therefore the decomposition obtained is not necessarily the QR decomposition. Henrion in [20] states that the triangularization algorithm of [6] is well-known to be impractical due to bad numerical behaviour. A numerically stable algorithm for the triangularization problem is then proposed in [20]. As will be shown, the algorithms of this chapter will unfortunately also suffer from numerical instability.

5.1 Rational Givens Rotation

Applying the polynomial Givens rotation (PGR) in (2.18) to a 2 × 1 polynomial vector x(z) = [x1(z) x2(z)]^T, a specific coefficient of x2(z) will be nulled, and in the process all other coefficients of both element polynomials will be altered. Iterating this behaviour, a state will eventually be reached where the magnitude of the dominant coefficient of x2(z) is less than some constant ε. Convergence in this sense is certain, due to the energy moving property of the PGR. If the goal is to null the entire polynomial x2(z), then there are other ways of doing this. In this section, a rational Givens rotation (RGR) will be developed for the purpose of exact polynomial nulling of x2(z). The derivations resemble the steps in [16], where it is, however, not clear how the denominator factorization step is performed. Let

α(z) = x2(z)
β(z) = x1(z)
γ(z) = γ+(z)(γ+(z^{-*}))* = α(z)α*(z^{-*}) + β(z)β*(z^{-*})

where γ(z) = γ+(z)(γ+(z^{-*}))* is the canonical spectral factorization. Assuming that the coefficients of γ(z) are exponentially bounded, and that γ(z) has no unit-circle zeros, the factorization is known to exist and to be unique, cf. [21]. A property of the factorization is that γ+(z) is minimum-phase, i.e. it has a stable inverse. Now, set up the polynomial matrix

G_f(z) = [ −β*(z^{-*})   −α*(z^{-*}) ]
         [ −α(z)          β(z)      ]      (5.1)

and note that

G_f(z) x(z) = [ −α(z)α*(z^{-*}) − β(z)β*(z^{-*}) ] = [ −α(z)α*(z^{-*}) − β(z)β*(z^{-*}) ]
              [ −α(z)β(z) + α(z)β(z)             ]   [ 0                                ].

However, the matrix in (5.1) is not paraunitary because

G_f^H(z^{-*}) G_f(z) = [ α(z)α*(z^{-*}) + β(z)β*(z^{-*})    α*(z^{-*})β(z) − α*(z^{-*})β(z)  ]
                       [ α(z)β*(z^{-*}) − α(z)β*(z^{-*})    α(z)α*(z^{-*}) + β(z)β*(z^{-*})  ]
                     = [ γ(z)   0    ]
                       [ 0      γ(z) ].

By allowing rational functions, the rational Givens rotation takes the form

G_r(z) = (1/γ+(z)) G_f(z)      (5.2)

which is easily verified to be paraunitary. Recall that 1/γ+(z) is stable since γ+(z) is minimum-phase. The rational Givens rotation defined by (5.2) is extended to the p × p case by setting up a p × p matrix with γ+(z) on the diagonal elements, and with the elements at the intersections of rows i, j and columns i, j taken from (5.1). Normalizing by γ+(z), the extended RGR is paraunitary. Applying the p × p RGR to a p × q matrix, the element at (i, j) will be exactly nulled.
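A small numerical check of the exact nulling, and of the root symmetry that the spectral factorization of γ(z) relies on, can be done directly with coefficient sequences (arbitrary example coefficients; with real coefficients the paraconjugate reduces to coefficient reversal):

```python
import numpy as np

# Polynomials represented by coefficient sequences indexed by lag,
# x(z) = sum_t x[t] z^{-t}.
x1 = np.array([1.0, 0.5, -0.25])
x2 = np.array([0.5, -1.0, 0.75])
alpha, beta = x2, x1

def paraconj(a):
    # a*(z^{-*}): conjugate the coefficients and reverse the lag order.
    return np.conj(a)[::-1]

# Second row of G_f(z) in (5.1) applied to x(z):
# -alpha(z) x1(z) + beta(z) x2(z) = -x2 x1 + x1 x2 = 0, an exact nulling.
row2 = -np.convolve(alpha, x1) + np.convolve(beta, x2)
assert np.allclose(row2, 0.0)

# gamma(z) = alpha alpha*(z^{-*}) + beta beta*(z^{-*}) is parahermitian,
# so its roots come in (z0, 1/z0*) pairs -- the property the canonical
# spectral factorization gamma = gamma+ (gamma+)* exploits.
gamma = np.convolve(alpha, paraconj(alpha)) + np.convolve(beta, paraconj(beta))
mags = np.sort(np.abs(np.roots(gamma)))
assert np.allclose(mags * mags[::-1], 1.0)   # paired magnitudes multiply to 1
```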

5.2 PQRD-R: Rational QR Decomposition

With the p × p RGR defined, the Rational PQRD (PQRD-R) algorithm can be stated. The general idea is to apply one RGR for every sub-diagonal matrix element of the matrix A(z), until it is in upper triangular form. Keeping track of the accumulated paraunitary RGRs, and recalling that the inverse of a paraunitary matrix is its parahermitian conjugate, an invertible decomposition is formed. Let G_r,n(z) = (1/c_n(z)) G_f,n(z) be the RGR applied at

iteration n, and let N be the total number of iterations. Then the decomposition is defined by the rational matrices

Q_r^H(z^{-*}) = G_r,N(z) G_r,N−1(z) · · · G_r,1(z)
             = (1 / (c_N(z) c_N−1(z) · · · c_1(z))) G_f,N(z) G_f,N−1(z) · · · G_f,1(z)      (5.3)

R_r(z) = G_r,N(z) G_r,N−1(z) · · · G_r,1(z) A(z) = Q_r^H(z^{-*}) A(z)

because

A(z) = Q_r(z) R_r(z) = Q_r(z) Q_r^H(z^{-*}) A(z),

where Q_r(z) Q_r^H(z^{-*}) = I.

Note that Q_r(z) is an anti-causal filter with all poles and zeros outside the unit circle. The algorithm operates by separately keeping track of the numerator and denominator of the rational matrices. As seen in (5.3), this is possible since all involved matrices share the same denominator, or its parahermitian conjugate. A pseudocode representation of PQRD-R can be seen in Algorithm 6.

Algorithm 6 Rational PQRD (PQRD-R)
1: Input polynomial matrix A(z) ∼ (V1, V2, C^{p×q}).
2: Let Q_f^H(z^{-*}) = I_p, c(z) = 1.
3: for k = 1 ... min(p − 1, q) do
4:   for j = (k + 1) ... p do
5:     Obtain the RGR G_r(z) = (1/γ+(z)) G_f(z), with (α(z), β(z), γ(z)) as a function of (A(z), j, k).
6:     Let A(z) = G_f(z) A(z).
7:     Let Q_f^H(z^{-*}) = G_f(z) Q_f^H(z^{-*}).
8:     Let c(z) = c(z) γ+(z).
9:   end for
10: end for
11: Let R_f(z) = A(z).

The spectral factorization needed for obtaining the RGR at row 5 in Algorithm 6 was implemented using the Kalman filtering approach of [21]. The implementation turned out to be sensitive to the assumption that the function has no unit-circle zeros; this problem is discussed further in later sections.

5.2.1 Simulations

The Rational PQRD was applied to the matrix A(z) ∼ (0, 2, C^{3×3}) defined in Section 4.3.2. The impulse response of the matrix can be seen in Figure 4.1a, and the frequency response is shown in Figure 5.1a. The frequency responses of the approximately upper triangular R_r(z) and the approximately paraunitary Q_r^H(z^{-*}) are graphed in Figures 5.1b and 5.2a, respectively. It can be seen that R_r(z) indeed is approximately upper triangular; for this particular case the sub-diagonal elements had power gains of around −300 dB, and are therefore not visible in the plot.

Paraunitarity of Q_r^H(z^{-*}) is not obvious from Figure 5.2a. However, the frequency response of Q_r^H(z^{-*})Q_r(z) shown in Figure 5.2b shows that Q_r^H(z^{-*}) is approximately paraunitary. The phase response was flat, but is not plotted here. The relative decomposition error,

defined as the square root of the sum of the squared relative errors in the frequency domain, was 1.9 · 10^{-27}.

The algorithm was also applied to a 4 × 4 matrix. The frequency response of the original matrix can be seen in Figure 5.3a, and the frequency responses of the decomposition factors are shown in Figures 5.3b and 5.4a. The decomposition factor frequency responses are distorted, as is the frequency response of Q_r^H(z^{-*})Q_r(z) in Figure 5.4b. During the algorithm run, the spectral factorization sub-routine warned about zeros close to the unit circle, so the numerical instability probably stems from the spectral factorization.

5.2.2 Discussion

The PQRD-R algorithm with a Kalman filtering spectral factorization step sometimes converges to a good rational decomposition of a polynomial matrix. The spectral factorization step is sensitive to functions with zeros close to the unit circle, as spectral factors are not guaranteed to exist in that case. This is visible in the second example, plotted in Figures 5.3 and 5.4. When performing the spectral factorization by solving a Discrete Algebraic Riccati Equation, as described in [21], the same problems arose. The numerical problems therefore seem inherent in the algorithm, rather than in the specific implementation.

For every iteration, an entire matrix element polynomial is nulled using a rational Givens rotation. This is in contrast to the polynomial Givens rotation used in Chapter 4, which only nulled one coefficient at a time. The PQRD-R algorithm finishes in a known number of steps, as opposed to the PQRD algorithms of Chapter 4. This behaviour makes the algorithm easier to analyze; this is however left for future work.

[Figure: two 3 × 3 grids of panels, power gain (dB) vs. normalized frequency. (a) Frequency Response of Original Matrix A(z) ∼ (0, 2, C^{3×3}); (b) Frequency Response of Upper Triangular Matrix R_r(z).]

Figure 5.1: The Original and Upper Triangular Matrices Obtained From PQRD-R for Matrix A(z) from Section 4.3.2.

[Figure: two 3 × 3 grids of panels, power gain (dB) vs. normalized frequency. (a) Frequency Response of Paraunitary Matrix Q_r^H(z^{-*}); (b) Frequency Response of Q_r^H(z^{-*})Q_r(z).]

Figure 5.2: The Paraunitary Matrix Obtained From PQRD-R for Matrix A(z) from Section 4.3.2.

[Figure: two 4 × 4 grids of panels, power gain (dB) vs. normalized frequency. (a) Frequency Response of Original Matrix B(z) ∼ (0, 2, C^{4×4}); (b) Frequency Response of Upper Triangular Matrix R_r(z).]

Figure 5.3: The Original and Upper Triangular Matrices Obtained From PQRD-R for a 4 × 4 Matrix.

[Figure: two 4 × 4 grids of panels, power gain (dB) vs. normalized frequency. (a) Frequency Response of Paraunitary Matrix Q_r^H(z^{-*}); (b) Frequency Response of Q_r^H(z^{-*})Q_r(z).]

Figure 5.4: The Paraunitary Matrix Obtained From PQRD-R for a 4 × 4 Matrix.

5.3 PSVD-R by PQRD-R: Rational Singular Value Decomposition

The approach used in Chapter 4, where a PQRD algorithm was used to devise a PSVD algorithm, invites the same idea for the rational case. For the coefficient nulling case, convergence is proved thanks to the energy moving property of the polynomial Givens rotation: by iterating long enough, a large enough share of the energy will have moved to the coefficients belonging to the zero-lag taps of the diagonal matrix elements. In the rational case, one of the off-diagonal parts of the matrix will be completely nulled per iteration. During that stage, the other off-diagonal part of the matrix, which in the previous iteration was completely nulled, will be polluted by the nulling of the first off-diagonal part. For every iteration though, the amount of energy in the off-diagonal parts of the matrix decreases.

Algorithm 7 Rational SVD (PSVD-R by PQRD-R)
1: Input polynomial matrix A(z) ∼ (V1, V2, C^{p×q}) and parameter MaxPSVDIter.
2: Let U_f^H(z^{-*}) = I_p, V_f^H(z^{-*}) = I_p, c(z) = 1, d(z) = 1.
3: Let iter = 0.
4: while iter < MaxPSVDIter do
5:   Let iter = iter + 1.
6:   Call [U_{f,1}^H(z^{-*}), R_{f,1}(z), c_1(z)] = pqrd_r(A(z)).
7:   Let U_f^H(z^{-*}) = U_{f,1}^H(z^{-*}) U_f^H(z^{-*}), c(z) = c_1(z) c(z).
8:   Let A'(z) = R_{f,1}^H(z^{-*}).
9:   Call [V_{f,1}^H(z^{-*}), R_{f,2}(z), d_1(z)] = pqrd_r(A'(z)).
10:  Let V_f^H(z^{-*}) = V_{f,1}^H(z^{-*}) V_f^H(z^{-*}), d(z) = d_1(z) d(z).
11:  Let A(z) = R_{f,2}^H(z^{-*}).
12: end while
13: Let D_f(z) = A(z).
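For intuition, the alternate-and-flip strategy of Algorithm 7 mirrors the constant-matrix case, where alternating QR factorizations of A and A^H drive the matrix towards diagonal form while the accumulated Q factors converge to the singular vectors. A sketch for a constant test matrix with assumed singular values 3, 2, 1:

```python
import numpy as np

rng = np.random.default_rng(3)
# Build a 3x3 complex matrix with known singular values (3, 2, 1).
Ql, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
Qr, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
A0 = Ql @ np.diag([3.0, 2.0, 1.0]) @ Qr.conj().T

A = A0.copy()
U = np.eye(3, dtype=complex)
V = np.eye(3, dtype=complex)

def offdiag_energy(M):
    return np.sum(np.abs(M - np.diag(np.diag(M))) ** 2)

for _ in range(100):
    Q1, R1 = np.linalg.qr(A)            # triangularize A ...
    U = U @ Q1
    Q2, R2 = np.linalg.qr(R1.conj().T)  # ... then triangularize the "flip"
    V = V @ Q2
    A = R2.conj().T                     # A_{k+1} = Q1^H A_k Q2

assert offdiag_energy(A) < 1e-16                 # A is now (numerically) diagonal
assert np.allclose(U @ A @ V.conj().T, A0)       # A0 = U D V^H, up to phases in D
print(np.sort(np.abs(np.diag(A)))[::-1])         # recovered singular values
```

In the polynomial case the triangularization is only exact per call, which is why each PQRD-R sweep re-pollutes the previously nulled off-diagonal part.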

5.3.1 Simulations

Two results are presented: the decomposition after two and three iterations of the PSVD-R by PQRD-R algorithm applied to the matrix A(z). The frequency response of the matrix can be seen in Figure 5.5. After two iterations, the decomposition does not result in a perfectly diagonal matrix, but still has a low decomposition error. After three iterations, on the other hand, the decomposition is severely distorted, due to numerical instability.

The frequency responses of the decomposition and the approximately diagonal matrix, after 2 iterations of PSVD-R by PQRD-R, can be seen in Figures 5.6a and 5.6b. For the same decomposition, the approximate paraunitarity of the matrices U_r^H(z^{-*}) and V_r^H(z^{-*}) is shown in Figures 5.7a and 5.7b. The decomposition is good, but the (2,1) element of the approximately diagonal matrix still has a significant magnitude. The (1,2) element of the same matrix is almost zero, and is not visible in the plot.

Letting the algorithm do one more iteration, the decomposition frequency response is shown in Figure 5.8a. The plot is distorted, and the decomposition is clearly incorrect. The same effect can be seen in Figures 5.8b, 5.9a and 5.9b. This result shows that the PSVD-R by

[Figure: 2 × 2 grid of panels, power gain (dB) vs. normalized frequency, showing the frequency response of the elements of A(z).]

Figure 5.5: Frequency Response of Matrix A(z).

PQRD-R algorithm is not stable for this particular example. It seems like it is the spectral factorization step that causes the problems.

5.3.2 Discussion

The PSVD-R by PQRD-R algorithm uses the PQRD-R in a straight-forward way. For every iteration, the PQRD-R is applied to a flipped set of matrices. Due to the nature of the PQRD-R, the output matrix will be perfectly upper triangular. The output of the PSVD-R by PQRD-R will not be perfectly diagonal however, because every subsequent application of the PQRD-R pollutes the previously perfectly triangular matrix.

When the algorithm remains stable, the resulting decomposition is good, but the approximately diagonal matrix will have a sub-diagonal section which is dominant compared to the super-diagonal section. The algorithm was shown to break down easily though, after which it is not meaningful to talk about decomposition quality.

5.4 Summary

Exact polynomial nulling is an interesting idea for polynomial matrix decomposition algorithms. Since the goal is to obtain matrices with some part completely zeroed out, it seems more straight-forward to zero all matrix elements in that part directly, rather than to iteratively zero all coefficients of those matrix elements. The growth in degrees of the involved polynomials could be analyzed theoretically, since the number of steps is predetermined. The algorithms of Chapter 4 are also deterministic, but harder to analyze due to their dependence on the distribution of the initial polynomial coefficients.

The algorithms in this chapter were shown to work for some examples, and shown to break down for others. They are not viable alternatives to the algorithms of Chapter 4, unless numerical stability can be guaranteed. It is left for future efforts to improve the

[Figure: two 2 × 2 grids of panels, power gain (dB) vs. normalized frequency. (a) Frequency Response of Decomposition U_r(z)D_r(z)V_r^H(z^{-*}); (b) Frequency Response of Diagonal Matrix D_r(z).]

Figure 5.6: The Decomposition and Diagonal Matrices Obtained From PSVD-R after 2 iterations.

[Figure: two 2 × 2 grids of panels, power gain (dB) vs. normalized frequency. (a) Frequency Response of U_r^H(z^{-*})U_r(z); (b) Frequency Response of V_r^H(z^{-*})V_r(z).]

Figure 5.7: The Paraunitary Matrices Obtained From PSVD-R after 2 iterations.

[Figure: two 2 × 2 grids of panels, power gain (dB) vs. normalized frequency. (a) Frequency Response of Decomposition U_r(z)D_r(z)V_r^H(z^{-*}); (b) Frequency Response of Diagonal Matrix D_r(z).]

Figure 5.8: The Decomposition and Diagonal Matrices Obtained From PSVD-R after 3 iterations.

[Figure: two 2 × 2 grids of panels, power gain (dB) vs. normalized frequency. (a) Frequency Response of U_r^H(z^{-*})U_r(z); (b) Frequency Response of V_r^H(z^{-*})V_r(z).]

Figure 5.9: The Paraunitary Matrices Obtained From PSVD-R after 3 iterations.

numerical issues of the algorithms.

The rational Givens rotation is a rational function because of the normalization step. If this step were to be omitted, the associated matrix would not be paraunitary, and the subsequent decompositions would not be of PQRD/PSVD form. Instead of being paraunitary, the involved matrices would take the form of a paraunitary matrix times a scalar polynomial. In the communications application, for instance, such a decomposition could still be used for channel diagonalization. However, it is not the normalization step per se which is the cause of the numerical instability of the algorithms. It is the spectral factorization step, and this step is necessary when setting up rotation matrices of dimensions larger than 2 × 2.

Another aspect of the decompositions presented in this chapter is that the paraunitary matrices involved in the rational decompositions have unstable inverses. This is easily seen as the paraunitary matrices are normalized with the spectral factor, giving all poles within the unit circle. Taking the paraconjugate transpose of such a matrix yields a matrix with all poles outside the unit circle, i.e. an antistable filter.

Chapter 6

Polynomial SVD for Wideband Spatial Multiplexing

This chapter will introduce the polynomial decomposition algorithms of Chapter 4 into a communications system framework. The performance will be compared to a sub-carrier SVD based approach, called SM for MIMO-OFDM [11, p. 186] or MIMO-DMMT [22] in the literature. Similar attempts to characterize polynomial decomposition based communications systems have been made in [23, 24, 25]. These papers focused on bit error rates, rather than the sum rate performance measure employed in this chapter.

For a deterministic narrowband MIMO channel, as shown in Section 6.1.1, channel capacity can be obtained by diagonalizing the channel using the SVD. In effect, the channel is transformed so that multiple spatial data streams can be transmitted without interference between them, hence the name spatial multiplexing. For an analogous wideband scenario, a similar approach named SM for MIMO-OFDM can be taken. The channel is then first diagonalized in frequency using the OFDM technique, and then the SVD is applied to diagonalize the channel in space. As the number of sub-carriers grows large, this procedure has a high computational load. Alternative strategies, such as performing the SVD in the z-domain using the polynomial decomposition algorithms of the previous chapters, are therefore interesting.

The achievable rate study in this chapter is performed in the frequency domain. However, the PSVD algorithms result in filter representations in the z-domain. The implementation of the precoding and receive filtering can therefore easily be performed in the time domain, if that is deemed appropriate.

The main assumption for our investigation is that the fading is slow, so that the channel maintains its state during the transmission of each block of data. Additionally, the transmitter is assumed to have perfect knowledge of the channel state. The channel is typically estimated at the receiver, and subsequently fed back to the transmitter.
For a slowly varying channel, it is feasible to assume that the feedback can occur before the channel has changed. The assumption of perfect channel state knowledge simplifies the derivations, but is not a reasonable assumption for a practical system. Furthermore, the loss in spectral efficiency due to the cyclic prefix of OFDM is neglected in the capacity and rate formulations.

6.1 Generic System Model

6.1.1 Narrowband Scenario

A signal vector s ∈ C^{M_t × 1} with E{ss^H} = P is to be sent from the transmitter to the receiver. Then, for an arbitrary narrowband channel H ∈ C^{M_r × M_t}, our model for the received signal is

    r = Hs + n    (6.1)

where n is a white Gaussian noise vector with covariance E{nn^H} = R_n = σ_n^2 I. The mutual information between the stochastic variables s and r is then [14]

    I(s; r) = log|I + R_n^{-1} H P H^H| = log|I + (1/σ_n^2) H P H^H|

which, for a given P, is maximized if s is circularly symmetric Gaussian [14]. Now, the channel capacity of (6.1) for a given maximum transmit power E_s/M_t is defined as the solution to the following optimization problem [11, p. 65]:

    max_P  log|I + (1/σ_n^2) H P H^H|
    subject to  tr(P) ≤ E_s/M_t    (6.2)

Spatial Multiplexing

The goal of spatial multiplexing is to obtain sum rates equal to the MIMO channel capacity as given by (6.2). The name spatial multiplexing comes from the fact that this sum rate maximization is achieved by diagonalizing the MIMO channel into a set of M = min(M_r, M_t) parallel spatial modes. In order to maximize the sum rate, the transmit powers are then optimized for the diagonalized system. First, we will modify the definition of the SVD from Section 2.2.2 slightly. For some unitary matrices U ∈ C^{M_r × M}, V ∈ C^{M_t × M} and a non-negative diagonal matrix D ∈ R^{M × M}, the compact SVD of the channel matrix H ∈ C^{M_r × M_t} is

    H = U D V^H.    (6.3)

The compact SVD can be obtained from the full SVD by removing the last M_r − M columns of U, the last M_t − M columns of V, and the corresponding rows and columns of D. Assuming that the channel H, or at least the right singular vectors V, are known at the transmitter side, precoding and receive filtering matrices can be chosen such that

    s = V Q^{1/2} x
    y = U^H r.

The covariance of signal vector x is assumed to be the identity matrix, but in order to be able to change the transmit powers the extra factor Q1/2 is added. With these operations in place, the communications process (6.1) can be written

    y = U^H r = U^H (Hs + n) = U^H H V Q^{1/2} x + U^H n = U^H H V Q^{1/2} x + w

where w = U^H n denotes the filtered noise.

which, by plugging in the SVD H = U D V^H, is equivalent to

    y = U^H U D V^H V Q^{1/2} x + w = D Q^{1/2} x + w.    (6.4)

Note that thanks to the unitary nature of U and V, the total transmit power is defined solely by Q = Q^{1/2} Q^{1/2,H}, since

    tr(E{(V Q^{1/2} x)(V Q^{1/2} x)^H}) = tr(V Q^{1/2} I Q^{1/2,H} V^H) = tr(Q)

using the tr(AB) = tr(BA) property of the trace operator. Similarly, the filtered received noise covariance is R_w = U^H σ_n^2 I U = σ_n^2 I. A block diagram of the system can be seen in Figure 6.1.
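The diagonalization step above can be verified numerically; a minimal NumPy sketch (the matrix sizes and seed are ours, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
Mr, Mt = 4, 3
M = min(Mr, Mt)

# Random narrowband channel H and its compact SVD H = U D V^H
H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)
U, d, Vh = np.linalg.svd(H, full_matrices=False)   # U: Mr x M, Vh: M x Mt
V = Vh.conj().T

# Precoding with V and receive filtering with U^H diagonalizes the channel,
# U^H H V = D, so the M spatial modes are interference-free.
D = U.conj().T @ H @ V
assert np.allclose(D, np.diag(d), atol=1e-10)
```

Since V has orthonormal columns, tr(V Q^{1/2} Q^{1/2,H} V^H) = tr(Q) holds for any power loading Q, as claimed above.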

[Block diagram: b(n) → demultiplexer → Q^{1/2} → V → channel H (noise n added) → U^H → detectors → multiplexer → b̂(n).]
Figure 6.1: Narrowband Spatial Multiplexing System.

Now plugging in (6.3) and substituting P = VQVH into (6.2), the optimization problem is transformed into

    max_Q  log|I + (1/σ_n^2) D^2 Q|
    subject to  tr(Q) ≤ E_s/M_t    (6.5)

where the |I + AB| = |I + BA| rule also was used. Expanding the determinant in terms of its argument's eigenvalues, and using the product rule for logarithms, the cost function of (6.5) can be reformulated as

    log ∏_{i=1}^{M} (1 + (1/σ_n^2) λ_i(D^2 Q)) = Σ_{i=1}^{M} log(1 + (1/σ_n^2) λ_i(D^2 Q))    (6.6)

where λ_i(A) denotes the i-th eigenvalue of A. As shown by [14], equation (6.6) is maximized for a diagonal Q, and therefore D^2 Q is diagonal as well. Let

    Q = diag(γ_1, ..., γ_{M_t}) = (diag(√γ_1, ..., √γ_{M_t}))^2 = Q^{1/2} Q^{1/2,H}.

The eigenvalues of a diagonal matrix are its diagonal entries, and on that account the final form of the optimization problem is:

    max_{γ_i}  Σ_{i=1}^{M} log(1 + γ_i D_{ii}^2/σ_n^2)
    subject to  Σ_{i=1}^{M} γ_i ≤ E_s/M_t    (6.7)

Finally, problem (6.7) can be solved for the optimal γ_i's by the waterfilling algorithm, see e.g. [11, p. 68]. In summary, by applying the precoder/receive filter matrices given by the SVD of the channel matrix, a capacity-achieving system can be implemented.
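The waterfilling solution to (6.7) can be sketched as follows; the sorting-based search for the water level is one common implementation choice, and the function name and test gains are ours, not from [11]:

```python
import numpy as np

def waterfill(gains, power):
    """Solve max sum_i log(1 + g_i p_i) s.t. sum_i p_i <= power, p_i >= 0.

    gains holds the per-mode factors D_ii^2 / sigma_n^2 of (6.7)."""
    gains = np.asarray(gains, dtype=float)
    # Water level mu must satisfy sum_i max(mu - 1/g_i, 0) = power.
    # Sort the inverse gains and find how many modes receive positive power.
    inv = np.sort(1.0 / gains)
    for k in range(len(inv), 0, -1):
        mu = (power + inv[:k].sum()) / k
        if mu > inv[k - 1]:          # all k candidate modes are active
            break
    return np.maximum(mu - 1.0 / gains, 0.0)

gains = np.array([10.0, 3.0, 0.1])   # strong, medium and weak spatial mode
p = waterfill(gains, power=1.0)
# The full budget is spent, and the weak mode is switched off entirely.
assert abs(p.sum() - 1.0) < 1e-12 and p[2] == 0.0
```

The same routine applies unchanged to the wideband problem (6.14) later in this chapter, since that problem only enlarges the pool of parallel modes.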

6.1.2 Wideband Scenario

For the wideband scenario, the M_r × M_t MIMO channel, constant for a transmission block, with L channel taps is described by

    H(z) = Σ_{l=0}^{L-1} H_l z^{-l}    (6.8)

where the matrix H_l represents the l-th tap of the filter. For a sequence of signal vectors s(m) ∈ C^{M_t × 1}, m = 1, 2, ..., that are launched onto the channel, the sequence of received symbol vectors is then represented by the system model

    r(m) = Σ_{n=m-L+1}^{m} H_{m-n} s(n) + n(m)

or equivalently in the z-domain

    r(z) = H(z) s(z) + n(z).    (6.9)

The noise process n(m) is assumed to be spatially and temporally white, such that E{n(m) n^H(k)} = R_n δ(m − k) = σ_n^2 I δ(m − k). Assuming a transmit covariance P(z), according to [26, p. 25], the channel capacity is given by the solution to:

    max_{P(z)}  ∫_0^{2π} log|I + R_n^{-1}(e^{jω}) H(e^{jω}) P(e^{jω}) H^H(e^{jω})| dω
    subject to  ∫_0^{2π} tr(P(e^{jω})) dω ≤ E_s/M_t    (6.10)

6.2 SM by MIMO-OFDM: SVD in the DFT Domain

SM by MIMO-OFDM is the typical system proposed in the literature for spatial multiplexing over wideband channels [27, 11]. It is conceptually simple; the channel matrix is diagonalized using the FFT in frequency, and using the SVD in space. The sum rate of the system can then be maximized by selecting the transmit powers appropriately, for the set of independent parallel channels provided by the FFT-SVD pair.

6.2.1 Specific System Model

The specific system model for SM by MIMO-OFDM is based on the generic system model for the narrowband system, but treats several parallel narrowband channels obtained from the OFDM processing. Through the application of the FFT/IFFT and the addition and removal of the cyclic prefix, as described for the SISO case in Section 3.3, the wideband system model (6.9) is transformed into a set of N parallel narrowband channels

    r_k = H_k s_k + n_k,  k = 0, ..., N-1    (6.11)

where the covariance of the noise remains R_{n_k} = σ_n^2 I due to the unitarity of the FFT/IFFT matrices. As for the narrowband case, obtaining the SVD H_k = U_k D_k V_k^H and precoding and receive filtering such that

    s_k = V_k Q_k^{1/2} x_k
    y_k = U_k^H r_k

transforms (6.11) into

    y_k = D_k Q_k^{1/2} x_k + w_k,  k = 0, ..., N-1    (6.12)

and as per usual R_{w_k} = R_{n_k} = σ_n^2 I. In order to derive the global communication process mutual information expression, we stack the frequency vectors according to

    x = [x_0^T x_1^T ... x_{N-1}^T]^T
    y = [y_0^T y_1^T ... y_{N-1}^T]^T
    w = [w_0^T w_1^T ... w_{N-1}^T]^T

so that y and w are M_r N × 1 vectors and x is an M_t N × 1 vector. By defining the block diagonal matrices

    D = diag{D_k}_{k=0}^{N-1}
    Q^{1/2} = diag{Q_k^{1/2}}_{k=0}^{N-1}

the global communication process can be written as

    y = D Q^{1/2} x + w.    (6.13)

With this fully diagonalized system, the optimal detector takes the form of multiple SISO detectors.

6.2.2 Capacity

With the system diagonalized over space and frequency in (6.13), and neglecting the loss in spectral efficiency due to the cyclic prefix, the rate is given by (6.7). Maximizing the rate means solving the optimization problem

    max_{γ_{ij}}  (1/N) Σ_{i=1}^{M} Σ_{j=1}^{N} log(1 + γ_{ij} D_{ij,ij}^2/σ_n^2)
    subject to  (1/N) Σ_{i=1}^{M} Σ_{j=1}^{N} γ_{ij} ≤ E_s/M_t    (6.14)

where the normalization factor N is due to the N sub-carriers. Problem (6.14) is of the same form as problem (6.7), and can therefore be solved using waterfilling. The solution to (6.14) is the same as the solution to (6.10), and hence SM by MIMO-OFDM with SVD in the DFT domain is capacity-achieving.
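The whole chain above, DFT of the channel taps, per-subcarrier SVD, joint waterfilling over all space/frequency modes, can be sketched in a few lines; the sizes, seed and variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)
Mr, Mt, L, N = 3, 3, 4, 64
sigma2, Es = 1.0, float(Mt)

# Channel taps H_l and their DFT-domain samples H_k = H(e^{j 2 pi k / N})
taps = (rng.standard_normal((L, Mr, Mt)) + 1j * rng.standard_normal((L, Mr, Mt))) / np.sqrt(2)
Hk = np.fft.fft(taps, n=N, axis=0)            # shape (N, Mr, Mt)

# Per-subcarrier singular values form one big pool of parallel channels
d = np.linalg.svd(Hk, compute_uv=False)       # shape (N, min(Mr, Mt))
g = (d ** 2 / sigma2).ravel()                 # mode gains D_ij^2 / sigma_n^2

# Joint waterfilling over all N*M modes; total budget N * Es / Mt, cf. (6.14)
budget = N * Es / Mt
inv = np.sort(1.0 / g)
for k in range(len(inv), 0, -1):
    mu = (budget + inv[:k].sum()) / k
    if mu > inv[k - 1]:
        break
p = np.maximum(mu - 1.0 / g, 0.0)
capacity = np.sum(np.log2(1.0 + g * p)) / N   # bits per channel use
```

Note that the SVDs of the N sub-carriers are independent, which is what makes this approach easy to parallelize, a point returned to in the conclusions.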

6.3 SM by MIMO-OFDM: SVD in the z-Domain

Another approach to obtaining the Vk, Uk matrices could be to sample the matrices given by the PSVD algorithms of Chapter 4 along the unit circle. This is effectively the FFT-SVD pair of SM by MIMO-OFDM, but taken in the opposite order. The FFT is applied in order to sample the U(z), V(z) matrices along the unit circle. If the PSVD algorithms of Chapter 4 were to produce perfect decompositions, the two approaches would be equivalent. Instead, the decomposition takes the form

    H_0(z) = H(z) + M(z) = U(z) D(z) V^H(z^{-*}) + M(z)    (6.15)

where H0(z) is the original matrix, H(z) is the approximation given by the PSVD, and M(z) is the associated error. For future reference, define the absolute unitarity errors such that

    U^H(z^{-*}) U(z) = I + U_e(z)    (6.16)
    V^H(z^{-*}) V(z) = I + V_e(z).    (6.17)

Now sampling the involved matrices at N points along the unit circle, they take the form:

    U_k = U(e^{j2πk/N})    V_k = V(e^{j2πk/N})    (6.18)
    D_k = D(e^{j2πk/N})    M_k = M(e^{j2πk/N})    (6.19)
    U_{e,k} = U_e(e^{j2πk/N})    V_{e,k} = V_e(e^{j2πk/N})    (6.20)

for k = 0, ..., N-1.

6.3.1 Specific System Model

Given a wideband system of the form (6.8), precoding and receive filtering with the filters obtained from the PSVD gives the system model

    y(z) = U^H(z^{-*}) H_0(z) V(z) Q^{1/2}(z) x(z) + U^H(z^{-*}) n(z)    (6.21)

in the z-domain, where w(z) = U^H(z^{-*}) n(z) is the filtered noise. Note that due to the decomposition and unitarity errors, the channel will neither be perfectly diagonalized, nor will the filtered noise be temporally or spatially white. Applying the IFFT/FFT operations at the transmitter/receiver sides, (6.21) is represented by

    y_k = U_k^H H_{0,k} V_k Q_k^{1/2} x_k + w_k,  k = 0, ..., N-1.

Plugging in (6.15) and using (6.16) – (6.20) transforms the model for sub-carrier k into

    y_k = (U_k^H H_k V_k + U_k^H M_k V_k) Q_k^{1/2} x_k + w_k
        = (U_k^H U_k D_k V_k^H V_k + U_k^H M_k V_k) Q_k^{1/2} x_k + w_k
        = ((I + U_{e,k}) D_k (I + V_{e,k}) + U_k^H M_k V_k) Q_k^{1/2} x_k + w_k
        = (D_k + U_{e,k} D_k + D_k V_{e,k} + U_{e,k} D_k V_{e,k} + U_k^H M_k V_k) Q_k^{1/2} x_k + w_k

which, by letting E_k = U_{e,k} D_k + D_k V_{e,k} + U_{e,k} D_k V_{e,k} + U_k^H M_k V_k, is equivalent to

    y_k = (D_k + E_k) Q_k^{1/2} x_k + w_k,  k = 0, ..., N-1.    (6.22)

Even though D_k is not perfectly diagonal, and E_k has no general structure, the modified channel is in some sense close to being diagonal. It is therefore similar in form to (6.12). Through the same set of transformations, the model (6.22) can be written on the aggregate form

    y = F Q^{1/2} x + w

where F = diag{F_k}_{k=0}^{N-1}, F_k = D_k + E_k, and the other entities are defined as in Section 6.2.1. In order to leverage the information about the cross-channel interference, a vector detector should be used. This type of system will be denoted PSVD-V, where the V stands for Vector receiver. From the vector relation (6.22), the input-output relation for sub-channel i on sub-carrier k can be written

    y_{i,k} = √γ_{i,k} [F_k]_{ii} x_{i,k} + Σ_{j=1, j≠i}^{M} √γ_{j,k} [F_k]_{ij} x_{j,k} + w_{i,k}    (6.23)

where the second term is the cross-channel interference. For the SM by MIMO-OFDM with DFT domain SVD system, it is optimal to use a set of separate SISO detectors for every space/frequency sub-stream. Doing so for the set of sub-streams in (6.23) will be sub-optimal however, because the system is not diagonalized in space. If we were to use a set of separate SISO detectors, the performance of a given sub-stream would be susceptible to the interference caused by the other sub-streams. On the other hand, the decoding complexity would be lower than for the vector receiver. A system with a set of separate SISO detectors will henceforth go under the name PSVD-S.
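The unit-circle sampling of (6.18) – (6.20) amounts to a DFT of the coefficient matrices, which is how the sampled factors above can be produced in practice; a small sketch (the function name is ours):

```python
import numpy as np

def sample_unit_circle(coeffs, N):
    """Evaluate A(z) = sum_l A_l z^{-l} at z = e^{j 2 pi k / N}, k = 0..N-1.

    coeffs: array of shape (L, p, q) holding the coefficient matrices A_l.
    Returns an (N, p, q) array; this is exactly the DFT of the taps."""
    return np.fft.fft(coeffs, n=N, axis=0)

# Sanity check against direct evaluation at one frequency point
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 2, 2))
N, k = 8, 3
z = np.exp(1j * 2 * np.pi * k / N)
direct = sum(A[l] * z ** (-l) for l in range(A.shape[0]))
assert np.allclose(sample_unit_circle(A, N)[k], direct)
```

Applying this to the coefficient arrays of U(z), V(z) and D(z) yields the U_k, V_k and D_k used in the sub-carrier model above.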

(a) Vector Detector. (b) SISO Detectors.

Figure 6.2: Comparison Between Vector and SISO Detector.

A visual comparison between the PSVD-V and the PSVD-S receiver set-ups can be seen in Figure 6.2.

6.3.2 Achievable Rate

Assuming that a vector receiver is employed (PSVD-V), the achievable rate of the system (6.21) is given by the objective function of (6.10). Plugging in the appropriate entities, the rate is then

    S_PSVD-V = ∫_0^{2π} log|A| dω    (6.24)
    A = I + (1/σ_n^2) (U^H(e^{jω}) U(e^{jω}))^{-1} U^H(e^{jω}) H(e^{jω}) V(e^{jω}) Q(e^{jω}) V^H(e^{jω}) H^H(e^{jω}) U(e^{jω})

which can be maximized over Q(z) with e.g. a sum transmit power constraint. For the PSVD-S system, the achievable rate is the sum of the achievable rates of all sub-streams (6.23). It is given by

    S_PSVD-S = (1/N) Σ_{i=1}^{M} Σ_{k=1}^{N} log(1 + γ_{i,k} [F_k]_{ii} / (Σ_{j=1, j≠i}^{M} γ_{j,k} [F_k]_{ij} + R_{w_{i,k}})).    (6.25)
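Evaluating (6.25) for given sampled effective channels can be sketched as follows. We use squared magnitudes |[F_k]_{ij}|^2 so that the SINR terms are powers; the function name and test sizes are ours:

```python
import numpy as np

def psvd_s_rate(F, gamma, Rw):
    """Achievable sum rate with separate SISO detectors, cf. (6.25).

    F:     (N, M, M) effective sub-carrier channels F_k = D_k + E_k
    gamma: (N, M)    transmit powers gamma_{i,k}
    Rw:    (N, M)    filtered-noise variances
    Off-diagonal leakage of F_k is treated as additional noise."""
    diag = np.abs(F.diagonal(axis1=1, axis2=2)) ** 2
    signal = gamma * diag
    total = np.einsum('nj,nij->ni', gamma, np.abs(F) ** 2)
    interference = total - signal
    return np.mean(np.sum(np.log2(1 + signal / (interference + Rw)), axis=1))

# Perfectly diagonal F with unit gains: no interference, so the rate
# collapses to M * log2(1 + 1/Rw) per channel use.
N, M = 4, 3
F = np.broadcast_to(np.eye(M), (N, M, M)).astype(complex)
rate = psvd_s_rate(F, gamma=np.ones((N, M)), Rw=np.ones((N, M)))
```

For a perfect decomposition the interference term vanishes and the expression reduces to the per-stream rates of (6.12), which is the sanity check performed above.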

6.4 Simulations

In this section, some numerical results regarding the sum rates attained by the different transmission schemes will be presented. The channel capacity (6.14), and the achievable sum rates (6.24), (6.25) were calculated for a variety of channels, received SNRs, and MPSVD by MPQRD-BC parameter values.

6.4.1 Method

The system model defined in (6.9) was used. The channel H(z) was modeled using the structure

    H(z) = Σ_{l=0}^{L-1} H_l e^{-ψ(l-1)} z^{-l}    (6.26)

where the elements of the H_l's were drawn from a zero-mean circularly symmetric normalized Gaussian distribution, and the exponential factor, with ψ ∈ R_+, was added to give the channel an exponentially decaying power-delay-profile. This is a simple channel model, but adequate for our needs, since the purpose of the study is to compare the transmission schemes rather than evaluate the absolute capacity of the channel.

For every channel realization, the channel capacity given by (6.14) with N = 512 was calculated. The PSVD-V rate (6.24) was calculated in the same fashion, using transmit powers obtained from the waterfilling algorithm assuming no cross-channel interference and white noise. For good decompositions, this is a reasonable assumption. Finally, the PSVD-S rate (6.25) was computed. Again, the transmit powers were selected assuming no CCI and white noise. This assumption is weak, which means that the achievable rate may not be close to the channel capacity. It is still interesting to study this case, because the transceiver structure is the same as for SM by MIMO-OFDM with a DFT domain SVD.
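A sketch of drawing channel realizations per (6.26); the function name and parameter values are ours:

```python
import numpy as np

def draw_channel(Mr, Mt, L, psi, rng):
    """Draw channel taps per (6.26): i.i.d. CN(0, 1) entries scaled by the
    exponentially decaying amplitude factor exp(-psi * (l - 1))."""
    H = (rng.standard_normal((L, Mr, Mt))
         + 1j * rng.standard_normal((L, Mr, Mt))) / np.sqrt(2)
    decay = np.exp(-psi * (np.arange(L) - 1))
    return H * decay[:, None, None]

rng = np.random.default_rng(3)
taps = draw_channel(Mr=3, Mt=3, L=5, psi=0.5, rng=rng)
# The average tap power then follows the profile exp(-2 * psi * (l - 1)).
```

Since the decay acts on the amplitudes, the power-delay-profile decays at twice the exponent, which is worth keeping in mind when choosing ψ.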

The signal-to-noise ratio at the receiver was calculated for a white reference signal s′(z) with total power E_s/M_t. The received SNR was then

    SNR = ∫_0^{2π} [ E{(H(e^{jω}) s′(e^{jω}))^H (H(e^{jω}) s′(e^{jω}))} / E{n^H(e^{jω}) n(e^{jω})} ] dω    (6.27)

where the denominator is

    E{n^H(e^{jω}) n(e^{jω})} = E{tr(n(e^{jω}) n^H(e^{jω}))} = tr(E{n(e^{jω}) n^H(e^{jω})}) = tr(R_n(e^{jω})) = M_r σ_n^2

and the numerator

    E{(H(e^{jω}) s′(e^{jω}))^H (H(e^{jω}) s′(e^{jω}))} = E{s′^H(e^{jω}) H^H(e^{jω}) H(e^{jω}) s′(e^{jω})}
    = E{tr(H(e^{jω}) s′(e^{jω}) s′^H(e^{jω}) H^H(e^{jω}))} = E{tr(H^H(e^{jω}) H(e^{jω}) s′(e^{jω}) s′^H(e^{jω}))}
    = tr(E{H^H(e^{jω}) H(e^{jω})} E{s′(e^{jω}) s′^H(e^{jω})}) = (E_s/M_t) tr(E{H^H(e^{jω}) H(e^{jω})} I)
    = (E_s/M_t) tr(E{H^H(e^{jω}) H(e^{jω})}) = (E_s/M_t) ||H(e^{jω})||_F^2

so that

    SNR = E_s/(σ_n^2 M_r M_t) ∫_0^{2π} ||H(e^{jω})||_F^2 dω = 2π E_s/(σ_n^2 M_r M_t) ||H(z)||_F^2.    (6.28)
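By Parseval's relation, the frequency integral in (6.28) can be computed directly from the channel taps, since ||H(z)||_F^2 = Σ_l ||H_l||_F^2. A sketch, where we normalize the integral by 2π (a convention assumed here, and the function name is ours):

```python
import numpy as np

def received_snr(taps, Es, Mt, sigma2):
    """Received SNR per (6.28), with the frequency integral normalized by 2*pi.

    By Parseval, (1/2pi) * integral ||H(e^{jw})||_F^2 dw = sum_l ||H_l||_F^2,
    so no numerical integration is needed."""
    Mr = taps.shape[1]
    Hz_energy = np.sum(np.abs(taps) ** 2)    # ||H(z)||_F^2 = sum_l ||H_l||_F^2
    return Es * Hz_energy / (sigma2 * Mr * Mt)

# Cross-check the Parseval shortcut against a dense frequency grid
rng = np.random.default_rng(4)
taps = (rng.standard_normal((4, 3, 3)) + 1j * rng.standard_normal((4, 3, 3))) / np.sqrt(2)
Hk = np.fft.fft(taps, n=1024, axis=0)
grid = np.mean(np.sum(np.abs(Hk) ** 2, axis=(1, 2)))   # (1/2pi) * integral
assert np.isclose(received_snr(taps, Es=3.0, Mt=3, sigma2=1.0), 3.0 * grid / 9.0)
```

This is how a target received SNR can be mapped to a noise variance σ_n^2 for a given channel realization in the simulations.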

6.4.2 Results

The first simulation compared the sum rate of the PSVD-V system to the channel capacity, for a varying number of rows of the channel matrix. The simulated CDFs for the spatial matrix sizes 2×3, 3×3 and 4×3 can be seen in Figure 6.3. The rate CDF of the PSVD-V system is clearly very close to the capacity CDF. The increase in capacity going from a 2×3 to a 3×3 channel is larger than when going from a 3×3 to a 4×3 channel, because the former adds another spatial mode, whereas the latter only increases the magnitude of the singular values.

The sum rates for channels with different impulse response lengths were compared through the simulated CDFs. The comparison between an L = 2 and an L = 5 channel can be seen in Figure 6.4. The sum rate of the PSVD-V system is also here close to the channel capacity. Interestingly, the two curves intersect, so that the outage capacity for the L = 5 channel is better than that of the other channel for codes with low rates. For codes with higher rates, the converse holds. Several other simulations for the PSVD-V system were performed, but the plots are not shown here. For all simulations, the results suggested that the PSVD-V system performs close to the channel capacity, regardless of the spatial/temporal size of the channel or the PSVD algorithm parameter values.

In addition to the CDF results, Figure 6.5 shows the average sum rate obtained for 30 channel realizations for the PSVD-V system. The average sum rate was close to the channel capacity, and therefore only the sum rate is shown in the graph. It can be seen that the system achieves a multiplexing gain proportional to min(M_r, M_t) in the high SNR region.

[Plot: Sum Rate CDF Compared to Capacity CDF, for Vector Receiver. Legend: 2×3/3×3/4×3 channel capacity and sum rate; x-axis: Sum Rate, y-axis: CDF.]

Figure 6.3: Simulated Sum Rate CDFs (Vector Receiver) for Various Spatial Channel Configurations. The channel length was L = 3 and the MPSVD by MPQRD-BC parameters were set to ε_r = 10^{-3}, µ = 10^{-6}, ρ = 5 and received SNR = 15 dB.

[Plot: Sum Rate CDF Compared to Capacity CDF, for Vector Receiver. Legend: 1-delay/4-delay channel capacity and sum rate; x-axis: Sum Rate, y-axis: CDF.]

Figure 6.4: Simulated Sum Rate CDFs (Vector Receiver) for 3×3 Channels with L ∈ {2, 5}. The MPSVD by MPQRD-BC parameters were set to ε_r = 10^{-3}, µ = 10^{-6}, ρ = 5 and received SNR = 15 dB.

[Plot: Average Sum Rate For Different Spatial Configurations. Legend: 2×3, 3×3, 4×3; x-axis: Received SNR, y-axis: Sum Rate (bpcu).]

Figure 6.5: Sum Rate (Vector Receiver) Averaged over 100 Channel Realizations. The channel length was L = 3 and the MPSVD by MPQRD-BC parameters were set to ε_r = 10^{-3}, µ = 10^{-6}, ρ = 5.

The sum rate CDFs for the PSVD-S system are shown in Figures 6.6 and 6.7, varying the convergence parameter ε_r and the truncation parameter µ of the MPSVD by MPQRD-BC algorithm. It can be seen that the choice of ε_r and µ has a direct effect on the achievable rate of the system. Selecting a smaller ε_r reduces the amount of filter energy in the non-diagonal section of the diagonalized matrix, which means less cross-channel interference. Because the PSVD-S system does not leverage the information about the cross-channel interference, performance increases as the cross-channel interference decreases. Reducing µ results in a better decomposition, which means better performance. These effects were less visible in the low SNR region, and therefore the high SNR results are shown.

The PSVD-S system is sensitive to the choice of values for the MPSVD by MPQRD-BC parameters ε_r and µ. This is clear in Figures 6.8 and 6.9, where decreasing the parameters results in higher average sum rates. Clearly, the impact of a bad channel decomposition is greatest in the high SNR region.

6.5 Summary

This chapter presented the channel capacity expression for a wideband channel, and showed how SM by MIMO-OFDM with a DFT domain SVD can achieve the capacity. Another approach for obtaining the SVD matrices was introduced, using the PSVD algorithms of the previous chapters. The precoding/receive filtering matrices given by the PSVD algorithms have some error to them, and hence they do not perfectly diagonalize the channel.

Because the PSVD approach does not perfectly diagonalize the channel, a vector detector which takes the cross-channel interference into account is needed in order to maximize the sum rate of the system (PSVD-V). A property of the perfectly diagonalized system is that the optimal detector takes the form of a set of separate SISO detectors for each stream; a setup which has lower decoding complexity than a vector detector. Therefore, the sum rate for a semi-diagonalized system with separate SISO detectors was also derived (PSVD-S).

[Plot: Sum Rate CDF Compared to Capacity CDF, for SISO Receiver Setup. Legend: ε_r = 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1} and channel capacity; x-axis: Sum Rate, y-axis: CDF.]

Figure 6.6: Simulated Sum Rate CDFs (SISO Receivers) for a 3×3 Channel with Varying MPSVD by MPQRD-BC Parameter ε_r. The remaining parameters were µ = 10^{-6}, ρ = 5 and received SNR = 30 dB.

[Plot: Sum Rate CDF Compared to Capacity CDF, for SISO Receiver Setup. Legend: µ ∼ 10^{-8}, 10^{-6}, 10^{-5}, 10^{-3} and channel capacity; x-axis: Sum Rate, y-axis: CDF.]

Figure 6.7: Simulated Sum Rate CDFs (SISO Receivers) for a 3×3 Channel with Varying MPSVD by MPQRD-BC Parameter µ. The remaining parameters were ε_r = 10^{-3}, ρ = 5 and received SNR = 30 dB.

[Plot: Average Sum Rate For Various ε_r. Legend: ε_r = 10^{-1}, 10^{-2}, 10^{-3} and channel capacity; x-axis: Received SNR, y-axis: Sum Rate (bpcu).]

Figure 6.8: Sum Rate (SISO Receiver Setup) Averaged over 100 Channel Realizations, for Varying ε_r. The channel length was L = 3 and the MPSVD by MPQRD-BC parameters were set to µ = 10^{-6}, ρ = 5.

[Plot: Average Sum Rate For Various µ. Legend: µ ∼ 10^{-1}, 10^{-3}, 10^{-5} and channel capacity; x-axis: Received SNR, y-axis: Sum Rate (bpcu).]

Figure 6.9: Sum Rate (SISO Receiver Setup) Averaged over 100 Channel Realizations, for Varying µ. The channel length was L = 3 and the MPSVD by MPQRD-BC parameters were set to ε_r = 10^{-3}, ρ = 5.

Simulations show that the PSVD-V system gets close to the channel capacity for most choices of PSVD algorithm parameter values. This is because the Givens rotations are unitary operations, and thereby energy preserving, and the truncations typically remove only a small percentage of the total energy of the filter. Since the total channel gains for the semi-diagonalized system stay close to those of the original channel, similar performance is achieved. This is dependent on the fact that a vector detector is employed.

The simplicity of the diagonalized channel from the SM by MIMO-OFDM with DFT domain SVD approach means that the optimal detector takes the form of a set of separate SISO detectors. Employing the same setup for the PSVD approach, the sum rate suddenly depends heavily on how close the channel is to being perfectly diagonalized. The simulation results show that the effect of the choice of PSVD parameter values is great in the high SNR domain. The interpretation is that the PSVD-S system is interference limited in the high SNR region, but power limited for low SNRs.

The results show that as ε_r and µ become small, the sum rate of the PSVD-S system gets close to the channel capacity. This naturally comes at the expense of a higher computational load, as the results of Chapter 4 show. If a system with near-capacity performance is sought, SM by MIMO-OFDM may therefore be a better choice, based on the run time performance comparison of Chapter 4.

Chapter 7

Summary

This thesis has evaluated algorithms for approximative polynomial matrix decompositions, with an application to spatial multiplexing over wideband MIMO channels.

In order to motivate the study, some background material from the wireless communications field was presented. The MIMO channel was introduced, and the effects of multipath propagation on the signal were discussed. The performance measure achievable rate was discussed, together with its upper bound, the channel capacity. Under the LTI assumption, a wideband MIMO channel can conveniently be represented by a polynomial matrix. In order to set the stage for the introduction of the algorithms, the concepts of polynomials, matrices and polynomial matrices were presented. Furthermore, the polynomial Givens rotation was presented, which forms the building block of the polynomial decomposition algorithms.

With the theoretical underpinnings in place, the approximative polynomial decomposition algorithms were presented. The PQRD algorithms operate on a given polynomial matrix by iteratively applying polynomial Givens rotations, in order to null the dominant coefficient below the main diagonal. The PSVD algorithms iteratively apply the PQRD algorithms to form an approximate polynomial singular value decomposition of the input matrix. Thanks to the energy moving property of the polynomial Givens rotation, the algorithms were shown to converge. The original decomposition algorithms of [4] were modified to work with relative convergence criteria, and the modified algorithms were then analyzed in terms of decomposition quality and computational complexity. It was shown that the relative decomposition error could be made small by performing sufficiently many iterations. However, for every iteration the maximum degree of the matrix grows, due to an inherent property of the polynomial Givens rotation. Because of this effect, a truncation step is needed in order to reduce the memory requirements of the algorithms. The computational complexity of the algorithms was shown to be linear in the number of iterations needed, but with a large slope coefficient. If the polynomial singular value decomposition were to be sampled along the unit circle, in order to get the precoding/receive filtering matrices needed for wideband spatial multiplexing, it was shown in a simple example how the computational load of the PSVD becomes prohibitively large for channels with long impulse responses.

With this rather gloomy result in mind, another approach for decomposing polynomial matrices was taken. A rational Givens rotation was introduced, representing a matrix filter with an infinite impulse response. Using this new Givens rotation, corresponding PQRD and PSVD algorithms were set up, yielding rational decompositions of polynomial matrices. For

simple cases, the rational decomposition algorithms were shown to result in excellent decompositions. However, the algorithms turned out to be numerically unstable, which was seen when they were applied to larger matrices.

A polynomial singular value decomposition algorithm was plugged into a wideband spatial multiplexing framework for performance evaluation. The sum rate achieved by the system, with precoding/receive filtering matrices obtained from sampling the PSVD along the unit circle, was compared to the channel capacity. If a joint detector was assumed at the receiver end, the achievable sum rates got close to the channel capacity for most PSVD algorithm parameter values. However, in order to be compared to the reference system SM by MIMO-OFDM, a set of separate SISO detectors should be used at the receiver. For this system, it was shown that the achievable sum rate was dependent on the diagonality error of the decomposed channel.

7.1 Conclusions

The polynomial decomposition algorithms studied in this thesis operate on a simple, but effective, strategy. For the QR case, every coefficient below the main diagonal is nulled using a polynomial Givens rotation, until the matrix satisfies some convergence criterion. The algorithms provably converge, and for appropriate choices of parameter values they produce decompositions of good quality. The iterative behaviour of the algorithms is their downfall though. In order to converge, especially for polynomial matrices with many taps, a large number of iterations is needed. If the goal is to obtain precoding and receive filtering matrices for diagonalizing a wideband MIMO channel, the computational load becomes high compared to the traditional approach of performing a set of SVDs in the frequency domain. There is therefore no advantage to using the polynomial decomposition algorithms for this application. Additionally, the approximative nature of the algorithms is another shortcoming compared to the traditional approach.

Instead of iteratively nulling every coefficient below the main diagonal, every matrix element below the main diagonal can be nulled in its entirety in one step. In order to keep its paraunitarity, the Givens rotation must then be allowed to be a rational matrix function. Algorithms based on this rational Givens rotation were shown to sometimes produce excellent results, but to sometimes fail completely. Their numerical instability is probably due to the spectral factorization needed for setting up the rational Givens rotation. Because of their bad numerical properties, the rational decomposition algorithms in their current form are of no practical use.

Assuming that the polynomial decomposition algorithms were to be used for wideband spatial multiplexing, an evaluation of the achievable sum rate of the system was performed. Employing the same receiver set-up as is used for the traditional approach of frequency-domain SVDs, it was shown that the achievable rate at high SNR was directly dependent on the decomposition quality. At low SNR, the effect of decomposition quality was not as clear. Consequently, in order for the polynomial decomposition system to compete with the traditional approach in terms of achievable rate, a good decomposition is needed in the high SNR region. The demand for a good decomposition leads to many iterations of the algorithms, and hence a high computational load compared to the traditional approach. The polynomial decomposition algorithms are therefore not viable for the application of wideband spatial multiplexing. The fact that the SVDs in the frequency domain can easily be parallelized, while this is not the case for the polynomial decomposition algorithms, strengthens the conclusion.

7.2 Future Work

Some directions for future work could be:

• Investigate the problems of the rational decomposition algorithms; perhaps the algorithms can be reformulated to obtain better numerical properties.

• Evaluate the MPSVD by MPQRD-BC algorithm on a more involved channel model, to see whether the conclusion for the exponential power-delay-profile channel model holds.

• Consider whether the algorithms can be used in a communications system with only statistical or noisy channel state information at the transmitter.

• Apply the algorithm of [17] in the communications framework of Chapter 6 for performance evaluation.

Appendix A

Acronyms and Notation

Z The set of integers

R The set of real numbers

R+ The set of positive real numbers

C The set of complex numbers

C^{p×q} The set of complex p × q matrices

p, q, r Number of rows, columns and coefficients of a polynomial matrix

A(z) Arbitrary polynomial matrix

−t ai,j(t) The (i, j) coefficient of the coefficient matrix associated with z . H(z) Channel matrix s Symbol vector launched onto channel x Symbol vector to be precoded n AWGN noise vector w Receive filtered noise vector r Received symbols vector y Filtered received symbols vector

N FFT size/number of sub-carriers

M Number of spatial modes

M_t Number of transmit antennas

M_r Number of receive antennas

L Number of channel taps

W System bandwidth

B_c Channel coherence bandwidth

AWGN Additive White Gaussian Noise

CCI Cross-Channel Interference

DFT Discrete Fourier Transform

FFT Fast Fourier Transform

FIR Finite Impulse Response

flop Floating point operation

IIR Infinite Impulse Response

IDFT Inverse Discrete Fourier Transform

IFFT Inverse Fast Fourier Transform

LOS Line-of-sight

LTI Linear Time-Invariant

MIMO Multiple Input Multiple Output

NLOS Non line-of-sight

OFDM Orthogonal Frequency-Division Multiplexing

PGR Polynomial Givens Rotation

PQRD Polynomial QR Decomposition

PSVD Polynomial Singular Value Decomposition

RDE Relative Decomposition Error

RGR Rational Givens Rotation

SM Spatial Multiplexing

SISO Single Input Single Output

SVD Singular Value Decomposition

QRD QR Decomposition

Appendix B

Some Complexity Derivations

B.1 Matrix-Matrix Multiplication

For completeness, this section provides complexity expressions for the matrix-matrix multiplication operation. Both the constant case and the polynomial case are studied.

Constant Matrix-Matrix Multiplication

Let A ∈ C^{m×n}, B ∈ C^{n×p}, C ∈ C^{m×p} and C = AB. Then there will be mp elements in C, and n complex multiplications will be needed to calculate every element. The complexity of such a matrix-matrix multiplication is hence O(mnp) complex floating point operations.

Polynomial Matrix-Matrix Multiplication

Let A(z) ∈ C^{p_1×q_1×r_1}, B(z) ∈ C^{q_1×q_2×r_2}, C(z) ∈ C^{p_1×q_2×(r_1+r_2-1)} and C(z) = A(z)B(z). As derived in Section 2.1.2, multiplication of two polynomials is equivalent to convolution of the respective coefficient vectors. For the polynomial matrix-matrix multiplication, this results in a convolution of matrix-valued coefficients. The product is hence

    C(z) = (A ∗ B)(z) = Σ_{t=-∞}^{∞} ( Σ_v A_v B_{t-v} ) z^{-t}    (B.1)

where every matrix-matrix multiplication is O(p_1 q_1 q_2). The number of non-zero multiplications for a given coefficient in the resulting polynomial is bounded by min(r_1, r_2). This gives that the number of operations to obtain a single output coefficient is O(p_1 q_1 q_2 min(r_1, r_2)). There will be r_1 + r_2 - 1 coefficient matrices in C(z), and the total complexity is therefore O((r_1 + r_2 - 1) p_1 q_1 q_2 min(r_1, r_2)).
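The coefficient convolution (B.1) can be sketched directly; the function name is ours:

```python
import numpy as np

def polymat_mul(A, B):
    """Multiply polynomial matrices given as coefficient arrays.

    A: (r1, p1, q1), B: (r2, q1, q2) -> C: (r1 + r2 - 1, p1, q2),
    i.e. a convolution of matrix-valued coefficients, cf. (B.1)."""
    r1, p1, q1 = A.shape
    r2, _, q2 = B.shape
    C = np.zeros((r1 + r2 - 1, p1, q2), dtype=np.result_type(A, B))
    for t in range(r1):
        for u in range(r2):
            C[t + u] += A[t] @ B[u]     # O(p1 q1 q2) per term
    return C

# Scalar (1x1) polynomials reduce to ordinary polynomial multiplication:
# (1 + 2 z^-1)(3 + 4 z^-1) = 3 + 10 z^-1 + 8 z^-2
a = np.array([[[1.0]], [[2.0]]])
b = np.array([[[3.0]], [[4.0]]])
c = polymat_mul(a, b)
assert np.allclose(c.ravel(), [3.0, 10.0, 8.0])
```

The double loop performs r_1 r_2 matrix products, which is consistent with the O((r_1 + r_2 - 1) p_1 q_1 q_2 min(r_1, r_2)) bound above.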

List of Figures

1.1 Block Diagram of a Typical Digital Communications System ...... 2
1.2 Modulation-Channel-Demodulation Sub-system ...... 2

3.1 Example of Multipath Propagation ...... 12
3.2 Single-input single-output Channel ...... 13
3.3 Multiple-input multiple-output Channel ...... 14

4.1 The Original and Upper Triangular Matrices Obtained From MPQRD-BC For ϵ_r = 10^{-3}, µ = 10^{-6}, ρ = 2 ...... 26
4.2 The Paraunitary Matrix Obtained From MPQRD-BC For ϵ_r = 10^{-3}, µ = 10^{-6}, ρ = 2 ...... 26
4.3 Decomposition Error as a Function of ϵ and µ, Averaged over 100 Matrices ...... 28
4.4 Unitarity Error as a Function of ϵ and µ, Averaged over 100 Matrices ...... 28
4.5 Triangularity Error as a Function of ϵ and µ, Averaged over 100 Matrices ...... 28
4.6 Number of coefficient steps (iterations) needed for convergence, as a function of input matrix size. The dimension size of the independent dimensions was 3. ...... 30
4.7 Number of coefficient steps (iterations) needed for convergence as a function of algorithm parameters, for a Matrix A(z) ∼ (0, 2, C^{3×3}) ...... 30
4.8 The Original and Diagonalized Matrices Obtained From a MPSVD by MPQRD-BC Run with ϵ_r = 10^{-3}, µ = 10^{-6}, ρ = 2 ...... 35
4.9 The Paraunitary Matrices Obtained From MPSVD by MPQRD-BC Applied to the Original Matrix from Figure 4.8a, with ϵ_r = 10^{-3}, µ = 10^{-6}, ρ = 2 ...... 35
4.10 Decomposition Error as a Function of ϵ and µ, Averaged over 100 Matrices ...... 37
4.11 Unitarity Error as a Function of ϵ and µ, Averaged over 100 Matrices ...... 37
4.12 Diagonality Error as a Function of ϵ and µ, Averaged over 100 Matrices ...... 37
4.13 Number of coefficient steps (iterations) needed for convergence, as a function of input matrix size ...... 38
4.14 Number of coefficient steps (iterations) needed for convergence as a function of algorithm parameters, for a 3 × 3 matrix with 3 lags ...... 38
4.15 Frequency Response of Sampled PSVD (ϵ_r = 10^{-1}) Matrix D(e^{jω}) and Magnitudes of Sequence of SVD Matrices D_k ...... 40
4.16 Phase Response of Sampled PSVD (ϵ_r = 10^{-1}) Matrix D(e^{jω}) and Magnitudes of Sequence of SVD Matrices D_k ...... 41
4.17 Computational Load Comparison Between Performing a PSVD and Performing N SVDs ...... 42

5.1 The Original and Upper Triangular Matrices Obtained From PQRD-R for Matrix A(z) from Section 4.3.2 ...... 48
5.2 The Paraunitary Matrix Obtained From PQRD-R for Matrix A(z) from Section 4.3.2 ...... 48
5.3 The Original and Upper Triangular Matrices Obtained From PQRD-R for a 4 × 4 Matrix ...... 49
5.4 The Paraunitary Matrix Obtained From PQRD-R for a 4 × 4 Matrix ...... 49
5.5 Frequency Response of Matrix A(z) ...... 51
5.6 The Decomposition and Diagonal Matrices Obtained From PSVD-R after 2 iterations ...... 52
5.7 The Paraunitary Matrices Obtained From PSVD-R after 2 iterations ...... 52
5.8 The Decomposition and Diagonal Matrices Obtained From PSVD-R after 3 iterations ...... 53
5.9 The Paraunitary Matrices Obtained From PSVD-R after 3 iterations ...... 53

6.1 Narrowband Spatial Multiplexing System ...... 57
6.2 Comparison Between Vector and SISO Detector ...... 61
6.3 Simulated Sum Rate CDFs (Vector Receiver) for Various Spatial Channel Configurations. The channel length was L = 3 and MPSVD by MPQRD-BC parameters were set to ϵ_r = 10^{-3}, µ = 10^{-6}, ρ = 5 and received SNR = 15 dB ...... 64
6.4 Simulated Sum Rate CDFs (Vector Receiver) for 3 × 3 Channels with L ∈ {2, 5}. MPSVD by MPQRD-BC parameters were set to ϵ_r = 10^{-3}, µ = 10^{-6}, ρ = 5 and received SNR = 15 dB ...... 64
6.5 Sum Rate (Vector Receiver) Averaged over 100 Channel Realizations. The channel length was L = 3 and MPSVD by MPQRD-BC parameters were set to ϵ_r = 10^{-3}, µ = 10^{-6}, ρ = 5 ...... 65
6.6 Simulated Sum Rate CDFs (SISO Receivers) for 3 × 3 Channel with Varying MPSVD by MPQRD-BC Parameter ϵ_r. Remaining parameters were µ = 10^{-6}, ρ = 5 and received SNR = 30 dB ...... 66
6.7 Simulated Sum Rate CDFs (SISO Receivers) for 3 × 3 Channel with Varying MPSVD by MPQRD-BC Parameter µ. Remaining parameters were ϵ_r = 10^{-3}, ρ = 5 and received SNR = 30 dB ...... 66
6.8 Sum Rate (SISO Receiver Setup) Averaged over 100 Channel Realizations, for Varying ϵ_r. The channel length was L = 3 and MPSVD by MPQRD-BC parameters were set to µ = 10^{-6}, ρ = 5 ...... 67
6.9 Sum Rate (SISO Receiver Setup) Averaged over 100 Channel Realizations, for Varying µ. The channel length was L = 3 and MPSVD by MPQRD-BC parameters were set to ϵ_r = 10^{-3}, ρ = 5 ...... 67

List of Tables

4.1 Errors for PQRD of Matrix A(z) ...... 25
4.2 MPQRD-BC Parameter Values for Spatial/Temporal Series ...... 27
4.3 Errors for PSVD of Matrix A(z) from Section 4.3.2 ...... 34
4.4 MPSVD by MPQRD-BC Parameter Values for Spatial/Temporal Series ...... 36
