1 Optimal Energy-efficient Source and Relay Precoder Design for Cooperative MIMO-AF Systems Fabien H´eliot, Member, IEEE and Rahim Tafazolli, Senior Member, IEEE

Abstract—Energy efficiency (EE) is a key design criterion munication. As such, MIMO-AF communication has been for the next generation of communication systems. Equally, thoroughly investigated over the last decade, where numerous cooperative communication is known to be very effective for en- works have focused on designing /resource alloca- hancing the performance of such systems. This paper proposes a breakthrough approach for maximizing the EE of multiple-input- tion techniques for improving the SE, reducing the detection multiple-output (MIMO) relay-based nonregenerative coopera- error rate, or reducing the power consumption of MIMO-AF tive communication systems by optimizing both the source and [14]–[22]. With EE becoming one of the key design parameters relay precoders when both relay and direct links are considered. in wireless communication, more recent works on MIMO- We prove that the corresponding optimization problem is at least AF communication have started to develop precoding/resource strictly pseudo-convex, i.e. having a unique solution, when the relay precoding matrix is known, and that its Lagrangian can allocation techniques for improving the EE of MIMO-AF be lower and upper bounded by strictly pseudo-convex functions [10]–[12]. For instance, EE-optimal precoding/resource allo- when the source precoding matrix is known. Accordingly, we cation schemes have been designed in [10], [11] and [12] for then derive EE-optimal source and relay precoding matrices that the MIMO-AF two-hop and multi-hop scenarios, respectively, are jointly optimize through alternating optimization. We also where a source node (SN) transmits data to a destination provide a low-complexity alternative to the EE-optimal relay precoding matrix that exhibits close to optimal performance, node (DN) via one or more relay nodes (RNs). However, but with a significantly reduced complexity. Simulations results these works do not take into account the direct link (DL) show that our joint source and relay precoding optimization can transmission (i.e. SN to DN transmission is neglected) in their improve the EE of MIMO-AF systems by up to 50% when precoding design. As far as SE/detection error-based MIMO- compared to direct/relay link only precoding optimization. AF precoding/resource allocation schemes are concerned, the Index Terms—Energy efficiency, precoding/, co- works in [16]–[20] have focused on the same scenario as operative communication, MIMO, amplify-and-forward. in this paper, i.e. the cooperative MIMO-AF scenario, where both the relay link (RL) (i.e. SN to RN to DN transmission) I.INTRODUCTION and DL transmissions are fully considered. For instance in Energy efficiency (EE) is one of the eight key figures [16]–[19], which all focus on SE improvement, heuristic of merit identified by the international telecommunication precoding/resource allocation methods have been proposed; union (ITU) for shaping the next generation of communication however, none of these works have provided any closed-form systems [1]; as such, EE has recently received a surge of of the optimal precoding matrix at the SN or RN. Whereas interests from the research community [2]–[4], as well as in [20], optimal SN and RN precoding matrices have been network vendors and operators who perceive EE as an enabler obtained (in closed form) for minimizing the mean square error for sustainable communication (both from an economical and (MSE) (i.e. detection error). environmental perspectives). Equally, relay-based cooperative In this paper, we go beyond the works in [10], [11] and communication, which is also well-documented [5]–[7], has propose a breakthrough approach for maximizing the EE of proved to be very effective for improving the spectral effi- cooperative MIMO-AF systems, where we derive both EE- ciency (SE) or/and the coverage of cellular networks [7], as optimal SN and RN precoding matrices. Deriving SN and well as reducing the cost of network deployment [8]. More RN precoding matrices was also the aim of [16]–[20], but the recently, relays have also been used for improving the EE [9]– precoding matrices of [16]–[20] are designed to either maxi- [12]. As a result, relay-based cooperative communication is an mize the SE or minimize the MSE instead of maximizing the integral part of the existing wireless communication standards EE, which is an entirely new proposition given the significant [13]; it is also foreseen to play a major role for enabling difference in optimization problem formulation between our device-to-device communication in the near future [1]. work and these works. In addition, contrary to [16]–[19], we Amongst the various existing relay-based communica- formally prove the optimality of our SN and RN precoders by tion strategies (e.g. amplify-and-forward (AF), decode-and- relying on pseudo-convexity arguments and provide a closed- forward, compress-and-forward), AF remains one of the most form expression for the EE-optimal SN precoding matrix. We popular strategy given its simplicity and practicality for en- assume, as in most existing works on MIMO-AF precoding abling multi-input multi-output (MIMO) cooperative com- [16]–[20], [22] that full channel state information (CSI), i.e. transmit and receive CSI, is available at both the SN and RN; The authors are with the Institute for Communication Systems, Faculty of further practical detailed about CSI acquisition can be found Electronics & Physical Sciences, University of Surrey, Guildford GU2 7XH, in [21], [22]. Note that a preliminary version of this work is UK (e-mail: [email protected]). We would like to acknowledge the support of the University of Surrey 5GIC available in [23]; contrary to [23], we prove here the pseudo- (http://www.surrey.ac.uk/5gic) members for this work. convexity/convexity of the main optimization problem, provide 2 an EE-optimal relay precoding matrix (instead of suboptimal), and consider power constraints for designing the source and 1st phase 2nd phase relay precoders. RN The rest of the paper is organized as follows. Section II-A n y RN SN 1 DN first recalls the layout and then defines the achievable sum-rate, H n 1 H2 n n power consumption, as well as energy-per-bit consumption of SN G 2 DN n1 y y the cooperative MIMO-AF system. Then, Section III intro- s 2 y duces the optimization problem that is solved in this paper, R 0 H0 i.e. EE-based iterative joint optimization of the SN and RN n0 precoding matrices. We first prove that the Lagrangian function associated to this problem is strictly pseudo-convex or convex Fig. 1: Nonregenerative cooperative MIMO communication for a known RN precoding matrix and obtain the EE-optimal system model. SN precoding matrix in closed-form. We then derive lower and upper bounds of the Lagrangian function for a known SN precoding matrix (which are proved to be strictly pseudo- aggregate mutual information/achievable rate (over two time convex or convex functions) and proposed a novel method slots) of the cooperative MIMO-AF system depicted in Fig. 1 for obtaining the EE-optimal RN precoding matrix. We also can be expressed, in two distinct manners (see equation (6) of provide a sub-optimal RN precoding matrix in closed-form. [17]), as I(y; s)= An iterative process based on alternating optimization [24] is σ2 finally utilized, as for instance in [10], [16], [20], for jointly R (R, G)=R (R)+W log I + 1 H GΥ˙ (R)G†H† Σ 0 2 nDN σ2 2 2 optimizing the SN and RN precoding matrices; a process that 2 −1 is proved to converge towards a local optimum at each iteration ×Ω(G) or (1a) due to the strictly pseudo-convexity/convexity of the SN/RN † ˙ RΣ(R, G)= W log2 InSN +R Ψ(G)R , (1b) precoding optimization problem. Next, Section IV provides a performance as well as complexity analysis of our scheme. where Simulation results confirm that joint SN and RN EE-based R (R)= I(y ; s)= W log I + σ−2H RR†H† , precoding optimization can improve the EE performance of 0 0 2 nDN 0 0 0 MIMO-AF systems by up to 50% when compared to existing −1 Υ˙ (R)=σ−2H R I +σ−2R†H†H R R†H†, EE-optimal precoding schemes such as in [10], [11]. They 1 1 nSN 0 0 0 1 also indicate that the classic relay precoding structure of σ2 Ω(G) = I + 1 H GG†H†, and [14], which is known to be optimal in various MIMO-AF nDN 2 2 2 σ2 optimization cases [10], [11], [14], [25], is not optimal here. † † † Ψ˙ (G)=σ−2H H +σ−2H G†H Ω(G)−1H GH . Conclusions are finally drawn in Section V. 0 0 0 2 1 2 2 1 Notation: The following notation is considered through- In addition W is the channel bandwidth, R ∈ CnSN×nSN out the paper. Boldface lowercase letters (e.g. a) denote and G ∈ CnRN×nRN are the SN and RN precoding matrices, nDN×nSN nRN×nSN vectors, boldface uppercase letters (e.g. A) denote matrices, respectively. Moreover, H0 ∈ C , H1 ∈ C , and nDN×nRN boldface uppercase letters with a hat on top (e.g. A) denote H2 ∈ C model the SN to DN, SN to RN, and SN 2 2 diagonal matrices, and Ix denotes a x × x identity matrix. to RN MIMO channels, correspondingly. Furthermore, σ0 , σ1 2 Whereas the operator diag(.) transforms a vector into a diag- and σ2 are the variances of the Gaussian noise at the DN onal matrix (e.g. diag(a)= A). Moreover, A ≻ 0 or A 0 (SN-DN link), RN and DN (SN-RN-DN link), respectively. indicates that A is a positive definite or semi-definite matrix, 1 respectively, and A 2 represents the Hermitian square root of B. Power Consumption Model and EE-SE Trade-off A 0. Furthermore, |.|, tr{.}, .†, and .−1 are the determi- nant, trace, conjugate transpose, and generalized inverse (both Given that the transmission between the SN and DN occurs inverse and pseudo-inverse) matrix operators, respectively. In in two phases, the way in which power is consumed by each node in each phase can be different. For instance, in the first addition, ., .F denotes the Frobenius inner product between † phase, the SN transmits data to both the RN and DN, which two matrices, such that A, BF = tr A B . Finally, the receive it; whereas in the second phase, only the RN transmits notation [.]+ refers to max{., 0}. to the DN, such that the SN is idle (sleep mode). By assuming II. COOPERATIVE MIMO-AF EE FRAMEWORK that three power consumption modes are available for each Tx Rx Sl A. System Model node, i.e. transmission, P. , reception, P. , and sleep, P. , This paper focuses on the EE/energy consumption of a the total consumed power (over two phases) of the cooperative classic nonregenerative cooperative MIMO system that is MIMO-AF system in Fig. 1 can hence be modeled as composed of three nodes, i.e. a SN, a RN and a DN, as it Tx Rx Rx Sl Tx Rx PΣ = P + P + P + P + P + P . (2) is illustrated in Fig. 1. The SN, RN and DN are equipped SN RN DN SN RN DN with nSN, nRN and nDN antennas, respectively. In transmission mode, a node consumes power for preparing As in [14], we assume here that the data transmission is (e.g. baseband processing, RF transceiver chain) and sending performed over two phases of equal duration, such that the the information (power amplifier). According to [4], [26], [27], 3 the power consumption of common communication equip- III. EE-OPTIMAL SOURCE AND RELAY PRECODING ments/devices in a cooperative MIMO-AF system, e.g. BS, In this section, our aim is to derive precoding matrices R RN or user equipment (UE), is linearly dependent with their and G that minimizes the following optimization problem transmit power when transmitting. Hence, it can be expressed via a generic linear MIMO power model [28] min Eb(R, G), (9a) R,G Tx CipA Ci s.t. R max (9b) P. = ∆.P. + n.P. + P. , (3) PSN( ) ≤ PSN , max PRN(R, G) ≤ PRN , (9c) where P. represents the transmit power, ∆. accounts for the max inefficiency of the transmitting node power amplifier, n. is where R 0, R = 0, and G 0. In addition, PSN and CipA max the number of transmit antennas, P. models the circuit PRN are the maximum transmit power at the SN and RN, power consumption scaling with n. (e.g. RF transceiver chain respectively. Contrary to sum-rate maximization (e.g. [16], Ci power consumption), and P. models the other types of [17]), or transmit power and MSE minimization problems circuit power consumption (e.g. DC-DC conversion, baseband (e.g. [20], [22]), the EE objective function in (9a) is not a processing). In reception mode, the receiving node consumes Schur-concave/convex objective function, but a ratio between power for receiving (e.g. RF transceiver chain) and processing Schur-concave/convex functions. As such, contrary to Schur- the information (e.g. baseband processing), such that concave/convex objective functions, (9a) exhibits a global op- timum (which is not 0 or ∞ as long as P > 0 [30]) even when Rx CipA Ci (4) c P. = ς[n.P. + P. ], no constraints are enforced. Indeed, as it explained in [31], the where 0≤ς ≤1 given that reception is usually less demanding sole EE-optimal solution for a given EE optimization problem in terms of circuit power than transmission. Finally, in sleep can only be obtained by solving its unconstrained form [31], mode, a node waits to transmit/receive and does not perform i.e. solving (9a) without (9b) and (9c). Consequently, in order any processing, such that only a fraction of the circuit power to optimally solve the optimization problem at hand, it is Sl SlpA SlpA necessary to first solve (9a) on its own and refine its solution is consumed [4], i.e. P. = n.P. , where P. is the per- Tx if it does not meet the constraints in (9b) and (9c), as it is antenna sleep power. By inserting the definitions of P. in Rx Id further detailed in Sections III-A and III-B. (3), P. in (4) and P. into (2), the total consumed power of the cooperative MIMO-AF system in Fig. 1 is reformulated as The problem in (9) is generally non-convex, since (9a) is not necessarily jointly convex in R and G. However, it can PΣ = Pc + ∆SNPSN + ∆RNPRN, (5) be proved, by relying on Propositions 1, 2 and 3, that this optimization problem has a unique global minimum when R where, as in [14], and G are treated independently. In other words, there exists † ⋆ PSN(R)=tr RR and (6a) an optimal matrix R minimizing (9) for a known G and an optimal matrix G⋆ minimizing (9) for a known R. R G G 2I H RR†H† G† (6b) PRN( , )=tr σ1 nRN + 1 1 . Proposition 1: Functions of the type

CipA SlpA Ci In addition, Pc = nSN(PSN + PSN ) + PSN + (1 + f(X)= p(X)/q(X), (10) ς)(n P CipA + P Ci )+2ς(n P CipA + P Ci ) accounts for all RN RN RN DN DN DN where p is a linear function of X and q is a strictly concave the fixed circuit consumed powers. Hence, based on (5) and function of X, are strictly pseudo-convex functions of X, for (6), P can be expressed in two distinct manners as Σ X 0. See section A of the Appendix for the proof. In turn, σ2 according to [32], if X⋆ is a stationary point of f, i.e. a point P (R, G)= P ′(R)+ 1 tr GΥ¨ (R)G† or (7a) Σ c 2 where ∇Xf(X = X⋆) = 0, then f ⋆ = f(X⋆) is the unique σ2 global minimum of f over its domain. R G ′ G R†Ψ¨ G R PΣ( , )= Pc( )+tr (∆SN, ∆RN, ) , (7b) Proposition 2: Functions of the type ′ where Pc(R) = Pc + ∆SNPSN(R), f(X)= p(X)+ q(X), (11) P ′(G) = P + ∆ σ2 tr GG† , c c RN 1 where p is a linear function of X and q is a strictly pseudo- 2 ¨ σ2 2 † † convex function of X, are strictly pseudo-convex functions of Υ(R) = 2 ∆RN σ1InRN + H1RR H1 , and σ1 X, for X 0. See section B of the Appendix for the proof. ¨ † † Proposition 3: Functions of the type Ψ(α, β, G) = αInSN + βH1G GH1.

The energy consumption, Eb, (or EE, 1/Eb) being simply a f(X)= a/q(X), (12) ratio between consumed power and rate [29], the EE-SE trade- where a is a constant and q is a strictly positive and concave off of the cooperative MIMO-AF system in Fig. 1 can be function of X, are convex functions of X, for X 0. expressed as Proof: Based on the scalar composition rules in [33], we PΣ(R, G) Eb(R, G)= , (8) know that a function f = h ◦ g is convex if h is convex and RΣ(R, G) a non-increasing, and g is concave. Given that h(x) = x is where detailed expressions for RΣ(R, G) and PΣ(R, G) are convex and non-increasing function of x for x> 0, it implies provided in (1) and (7), respectively. that f would be convex if g > 0 is concave. 4

In the following, we first derive R⋆ for a known G and then Corollary 1: The Lagrangian function in (17) is a strictly G⋆ for a known R. Next, an alternating optimization approach pseudo-convex (in the unconstrained and both SPC cases) or is utilized (as in [10], [16], [20]) to find a global solution to a convex function (in the DPC case) of Y, such that it has a ⋆ ⋆ ⋆ the problem in (9), as it is further detailed in Section III-C. unique global minimum, Eb = Eb(Y )= Eb(R ). Proof: Given that both PSN(Y) and P RN(Y) are linear A. EE-Optimal Source Precoding functions of Y and RΣ(Y) is a strictly positive and concave Based on equations (9), (8), (7b) and (1b), finding the function of Y, as it is fully proved in section C of the Ap- EE-optimal source precoding matrix, R⋆, for a given relay pendix, the Lagrangian function in (17) meets the requirements precoding matrix G boils down to solving R⋆ = of Propositions 1 (unconstrained and SPC cases), 2 (SPC cases) and 3 (DPC case). ′ † ¨ ⋆ Pc(G)+tr R Ψ(∆SN, ∆RN, G)R Proposition 4: The optimal source precoding matrix, R , arg min Eb(R, G)= , i.e. the optimal solution to the optimization problem in (13) R † ˙ W log2 InSN +R Ψ(G)R can be expressed in closed-form as

(13) 1 1 1 2 2 2 subject to (9b) and (9c). Given that tr {AB} = tr {BA} and ⋆ ˙ −1 ⋆ † ˙ −1 R = Ψ UΨY U Ψ , (18) I AB I BA (Sylvester’s determinant identity) for Ψ | + | = | + | any matrices A and B, E (R, G) in (13) is equivalent to b where UΨ is a unitary matrix that contains the eigenvectors of Ψ(α, β). In addition, Y⋆ = diag([y⋆,y⋆,...,y⋆ ]) is a P ′ + tr RR†Ψ¨ (∆ , ∆ ) 1 2 nSN c SN RN diagonal matrix with diagonal elements such as Eb(R)= , (14) 1 1 I Ψ˙ 2RR†Ψ˙ 2 ⋆ W log2 nSN + ⋆ W Eb 1 y = − 1 , ∀i ∈{1,...,nSN}, (19) i ln(2) ψ (α, β) since Ψ˙ ≻ 0. Note that the argument G is omitted in (14) and i + in the rest of sub-Section III-A for simplifying the notation. where ψi(α, β) are the eigenvalues of Ψ(α, β). See section 1 † 1 By applying the change of variables Y = Ψ˙ 2RR Ψ˙ 2 D of the Appendix for the proof. In addition, by inserting the (such that Y 0, Y = 0, is a Hermitian matrix) and optimal precoding structure, R⋆ in (18), into (14), arguments 1 1 2 2 of both the trace and determinant operators become diagonal Ψ(α, β)= Ψ˙ −1 Ψ¨ (α, β) Ψ˙ −1 , to (14), (9b) and (9c), matrices such that (14) simplifies as the optimization problem in (13) can be re-expressed as P ′ + nSN y⋆ψ (α, β) ′ ⋆ c i=1 i i (20) ⋆ Pc + ∆SNPSN(Y) + ∆RNP RN(Y) Eb = nSN ⋆ . Y = arg min Eb(Y)= , W i=1 log2(1 + yi ) Y RΣ(Y) max max The values of α and βin (19) as well as (20) depend on s.t. PSN(Y) ≤ PSN and P RN(Y) ≤ P RN , (15) how constrained the problem in (13) or (15) is. 1) Unconstrained Optimization: In this case, α = ∆SN and where RΣ(Y)= W log2 | InSN + Y| , (16a) β = ∆RN in both (19) and (20) (see Section D of the Appendix for more details). Consequently, the problem of finding R⋆ PSN(Y) =tr {YΨ(1, 0)} , and (16b) boils down to finding E⋆ that satisfies equation (20), i.e. a P (Y) =tr {YΨ(0, 1)} . (16c) b RN univariate root-search problem since (20) is expressed solely max max 2 † ⋆ ⋆ In addition, P RN = (PRN − σ1 tr GG ). In turn, the La- as a function Eb (via yi ). This problem can be solved in a low- grangian associated to (16) can expressed as L (Y, λ , λ )= complexity manner by using a classic univariate root-finding 1 2 ′ algorithm, as it is detailed in Algorithm 1. Pc + tr {YΨ(∆SN, ∆RN)} Eb(Y)= , 2) Single Power Constrained Optimization: In the SPC case RΣ(Y) ⋆ at the SN, α = λ1 and β = ∆RN in both (19) and (20);  ′ max P + ∆SNP + ∆RNP RN(Y) ⋆  c SN Y max whereas in the SPC case at the RN, α = ∆SN and β = λ2 in  +λ1 (PSN( ) − PSN ), ⋆ ⋆  RΣ(Y) both (19) and (20), where λ and λ are variables to optimize  1 2  ′ max (see Section D of the Appendix for more details). Thus, in the Pc + ∆SNPSN(Y) + ∆RNP RN max  +λ P (Y) − P , or ⋆  R (Y) 2 RN RN SPC case at the SN, the problem of finding R is equivalent  Σ ⋆ ⋆ ′ max max to finding the variable λ1 that minimizes Eb in (20) when Pc + ∆SNPSN + ∆RNP RN max  +λ1 (PSN(Y) − PSN ) the following power constraint equation (obtained by inserting  RΣ(Y)  (18) into (9b)) is satisfied  max  Y nSN +λ2 P RN( ) − P RN ,  y⋆ψˇ (1, 0) = P max, (21)  (17) i i SN  ⋆ max i=1 in the unconstrained case, i.e. if PSN(Y ) < P and max SN ⋆ † P RN(Y ) < P , single power constrained (SPC) case at the ˇ RN max where ψi(1, 0) are the diagonal elements of UΨΨ(1, 0)UΨ. SN, i.e. if P (Y⋆) ≥ P max and P (Y⋆) < P , SPC case ⋆ SN SN RN RN Similarly, in the SPC case at the RN, it is required to find λ2 ⋆ max ⋆ max at the RN, i.e. if PSN(Y ) < P and P RN(Y ) ≥ P , or ⋆ SN RN that minimizes Eb in (20) when dual power constrained (DPC) case, i.e. if Y⋆ max PSN( ) ≥ PSN nSN ⋆ max max and P RN(Y ) ≥ P , respectively. In addition, λ1 ≥ 0 and ⋆ ˇ RN yi ψi(0, 1) = P RN (22) λ2 ≥ 0 are Lagrange multipliers. i=1 5

Algorithm 1 : EE-Optimal Source Precoder, R⋆ in (1a). Note that the argument R is omitted in (24) and in ⋆ ⋆ 2 1: function SN-OPT(Eb , G , W, Pc, ∆SN, ∆RN, nSN, nDN,σj , Hj ) the rest of sub-Section III-B for simplifying the notation. By 6 2 2: Set x , ǫ − ,λ⋆ and λ⋆ ; σ1 † † = 0 = 10 1 =∆SN 2 =∆SN applying the change of variables Z = 2 G H2H2G (such ′ ⋆ ˙ ⋆ ¨ ⋆ ⋆ ⋆ ⋆ ⋆ ⋆ σ2 3: Set Pc(G ), Ψ(G ), Ψ(λ1,λ2, G ), and Ψ(λ1,λ2, G ) ; ⋆ ⋆ that Z 0 is a Hermitian matrix) to the Lagrangian function 4: Obtain UΨ and ψi(λ1,λ2) for any i ∈ {1,...,nSN} via ⋆ ⋆ ⋆ in (24), the latter can be re-expressed as Eigen decomposition of Ψ(λ1,λ2, G ); ⋆ ′ 5: while (|Eb − x|≥ ǫ) do Pc + ∆RNPRN(Z, W) ⋆ E (Z, W)= or 6: Set x = Eb ; b ⋆ ⋆ RΣ(Z) 7: Obtain yi , ∀i ∈ {1,...,nSN}, by inserting Eb into (19), L (Z, W,)=  ⋆ ⋆ ′ max for α = λ1 and β = λ2 in (19); Pc + ∆RNPRN max R⋆  + (PRN(Z, W) − PRN ), 8: Obtain via (18); RΣ(Z) R⋆ max R⋆ G⋆ max 9: if PSN( ) ≥ PSN || PRN( , ) ≥ P RN ≥ 0 then (25) ⋆ ⋆  10: Update λ1 or λ2 by using a univariate root-finding where  algorithm, satisfying either (21) or (22). ⋆ 1/2 1/2 11: Obtain R via (18); InRN + Υ ZΥ max Z and R⋆ max R⋆ G⋆ RΣ( )= R0 +W log2 12: if PSN( )≥PSN & PRN( , )≥P RN ≥0 then InRN + Z ⋆ ⋆ 13: Update λ1 and λ2 by using a bivariate root-finding (26a) algorithm, satisfying both (21) and (22). −1 14: Obtain R⋆ via (18); 1 1/2 † † 1/2 ¨ PRN(Z, W)= tr Z W H2H2 W Z Υ . 15: end if ∆RN 16: end if (26b) ⋆ ⋆ ⋆ 17: Update Eb by inserting yi into (20), where α = λ1 and ⋆ CnRN×nDN † β = λ2; Moreover, W ∈ , WW = InRN (if nDN ≥ nRN) in 18: end while (26). Contrary to (15), it is not possible to easily solve (23) 19: return E⋆ and R⋆. ⋆ ⋆ b and obtain its global minimum Eb = Eb(G ); even though 20: end function RΣ(Z) is a strictly positive and concave function of Z, as it is proved in section E of the Appendix, PΣ(Z, W) is not a is satisfied (obtained by inserting (18) into (9c)), where linear function of Z and, hence, the convexity of (24) or (25) ˇ † cannot be readily established. Nevertheless, it can be proved, ψi(0, 1) are the diagonal elements of UΨΨ(0, 1)UΨ. Finding ⋆ ⋆ as it is further explained in the following, that (25) can be Eb for a fixed λi ,i = 1 or 2, via (20) can be computed as ⋆ ⋆ lower and upper bounded by strictly pseudo-convex or convex in the unconstrained case, while finding λi for a fixed Eb via (21) or (22) is a univariate root-finding problem, which functions (i.e. meeting the requirements of Propositions 1 and − ⋆ + − can be solved in a low-complexity manner by using a classic 3). In turn, we prove that Eb ≤ Eb ≤ Eb , where Eb and + root-finding algorithm (e.g. Newton-Raphson method [34]). Eb are the global minima of the lower and upper bounds of 3) Dual Power Constrained Optimization: In this case (25), correspondingly. Based on these bounds, two methods ⋆ ⋆ are then proposed to solve (23); an EE-optimal method and a α = λ1 and β = λ2 in both (19) and (20) (see Section D of the Appendix for more details). Consequently, the problem of low-complexity energy-efficient (but suboptimal) method. ⋆ ⋆ ⋆ 1) Lower and Upper Bounds of (25): Let G be decomposed finding R boils down to finding the variables λ1 and λ2 that minimizes ⋆ in (20) when both power constraint equations as Eb 1 † in (21) and (22) are satisfied. G = VGG 2 UG, (27) † The different steps to obtain the EE-optimal source matrix where VG as well as UG are unitary matrices and G is a R⋆ for a known G are summarized in Algorithm 1. diagonal matrix containing the eigenvalues of G†G sorted in † † descending order, such that G G = UGGUG. For instance B. EE-optimal Relay Precoding in [10], [11], [14], [25], VG = V2 and UG = U1, where Based on equations (9), (8), (7a) and (1a), finding the V2 contains the right-singular vectors of H2 and U1 contains H Λ EE-optimal relay precoding matrix, G⋆, for a given re- the left-singular vectors of 1. In addition, let be a diagonal matrix containing the eigenvalues of H†H sorted lay precoding matrix R boils down to solving G⋆ = 2 2 in descending order, such that H†H = V ΛV†. arg minG E (R, G)= 2 2 2 2 b a) Lower Bound: σ2 ′ 1 ¨ † Proposition 5: The function PRN(Z, W) in (26b) can be Pc(R)+ σ2 tr GΥ(R)G 2 , (23) lower bounded by R (R)+ W log Ω(GΥ (R)1/2)Ω(G)−1 0 2 1 ↑ P (Z)= tr ZΛ−1Υ¨ , (28) subject to (9c), where Υ(R) = I + Υ˙ (R). In turn, the RN ∆ nRN RN Lagrangian associated to (23) can expressed as 2 σ1 i.e. PRN(Z) ≤ PRN(Z, W),where Z = 2 GΛ is a diagonal σ2 Eb(G) or ↑ ′ max matrix and Υ¨ is a diagonal matrix containing the eigenvalues L (G,)= Pc + ∆RNPRN max (24)  + (PRN(G) − PRN ), of Υ¨ sorted in ascending order. Whereas the function R (Z)  RΣ(G) Σ in (26a) can be upper bounded by in the unconstrained or power constrained case, i.e. if ⋆ max ⋆ max PRN(G ) < PRN or PRN(G ) ≥ PRN in (9c), respectively, RΣ(Z)= R0 + W log2 InRN + ZΥ − W log2 InRN + Z , where ≥ 0 is a Lagrange multiplier and RΣ(G) is given (29) 6

i.e. 1/RΣ(Z) ≤ 1/RΣ(Z), where Υ is a diagonal matrix can be expressed as in (30) with RΣ(Z) expressed as in (29) containing the eigenvalues of Υ sorted in descending order, and PRN(Z) given by † such that Υ = UΥΥUΥ . In addition, UΥ is a unitary 1 −1 ¨ matrix containing the eigenvectors of Υ. See section F of PRN(Z)= tr ZΛ Υ , (32) ∆RN the Appendix for the proof. Thus, the following Lagrangian † function instead of (28), where Υ¨ = UΥ ΥU¨ Υ 0 is a Hermitian

′ matrix. Note that PRN(Z) in (32) is a linear function of Z. Pc + ∆RNPRN(Z) Eb(Z)= or Proof: Same as proof of Corollary 2. R (Z) L Z, =  Σ c) Lower and Upper Bound Algorithms: P ′ + ∆ P max  c RN RN Z max Proposition 7: The global optimum of L Z, in (30)  +PRN( ) − PRN , RΣ(Z) occurs at Z• = diag([z•,z•,...,z• ]), where 1 2 nRN  (30)  is a lower bound of the Lagrangian of the original problem in 2 • 1 1 1 4Wγ(υi − 1) (24) or (25). zi = − 1+ + 1 − + • , 2  υi υi ln(2)υiui  Corollary 2: Contrary to (25), the Lagrangian function in + (30) is a strictly pseudo-convex (unconstrained case) or convex  (33) function (power constrained case) of Z, respectively, such that ∀i ∈ {1,...,nRN}, and υi are the eigenvalues of Υ. Given it has a unique global minimum. that PRN(Z) is expressed differently in (28) and (32) for the Proof: Given that PRN(Z) is a linear function of Z lower and upper bounds of (25), respectively, it implies that • and RΣ(Z) is a strictly positive and concave function of Z the values of ui in (33) are also different for each of these • (same proof as for RΣ(Z) in Section E of the Appendix), bounds. For the lower bound, ui represents the elements of ↑ the Lagrangian function in (30) meets the requirements of the diagonal matrix Λ−1Υ¨ in (28). Whereas for the upper Propositions 1 and 3. bound, u• represents the diagonal elements of Λ−1Υ¨ in (32). Corollary 3: Let E− be the global minimum of (30), it then i b See section G of the Appendix for the proof. In addition, by implies that − ⋆, (given that Z Z W ). • Eb ≤ Eb L , ≤ L ( , ,) inserting Z into Eb(Z) in (30), the latter simplifies as b) Upper Bound: n P ′ + RN z•u• + • c i=1 i i Proposition 6: Let G be the relay precoding matrix that Eb = nRN • • , (34) 1 † R0 + W i=1 log2(1 + zi υi) − log2(1 + zi ) minimizes (23) subject to (9c) and G = V2G 2 UΥ, i.e. • • + where E = Eb Z = Z . E = min E (G, VG, UG) (31) b b b b G,VG,UG The value ofγ in (33) is dependent on the value of • • max s.t. (9c), VG =V2, and UG =UΥ; PRN(Z ). If PRN(Z ) < PRN , i.e. unconstrained case, then • γ = Eb in (33) (see Section G of the Appendix for mode + • it then implies that the global minimum of (31), Eb = details). In this case, the problem of finding Z boils down E (G+), verifies E+ ≥ E⋆. • b b b to finding the variable Eb that satisfies equation (34). This Proof: It is commonly known in optimization theory [33] is a univariate root-finding problem, since (34) is expressed • • that enforcing extra constraints to an optimization problem solely as a function Eb (via zi ), that can be solved in a low- results in a reduction of the set of possible/feasible solutions complexity manner by using a classic univariate root-finding • max for the given problem; in other words, the new solution space algorithm. On the contrary, if PRN(Z ) ≥ PRN , i.e. power becomes a subset of the original solution space. Consequently, constrained case, then γ = • in (33) (see Section G of it implies that the solution of a minimization problem is always the Appendix for mode details), with • being a variable to lower or equal to the solution of the same problem when extra optimise. Thus, the problem of finding Z• becomes equivalent constraints are enforced on it; in other words, the solution of to finding the variables • that satisfies the following power the problem with extra constraints upper bounds the solution constraint equation (obtained by inserting Z• into (28) or (32)) of the original problem. Based on this premise, it can be nRN straightforwardly concluded that 1 • • max zi ui = PRN , (35) ∆RN ⋆ i=1 E = min E (G) = min E (G, VG, UG) ≤ (31) b G b b b G,VG,UG i.e. a univariate root-finding problem since (35) is expressed solely as a function • (via z•) in this case. Hence, • can be s.t. (9c), s.t. (9c). i computed in a low-complexity manner by using a root-finding Indeed, the solution space in (31) is restricted to the domain algorithm (e.g. Newton-Raphson method [34]). of positive semi-definite matrix having a specific structure, i.e. Given that the lower and upper bounds of the original a structure as in (27) with VG = V2 and UG = UΥ, instead Lagrangian can be both formulated as in (30), it implies that of the whole positive semi-definite matrix domain (as in the − + Eb and Eb can be computed by using a similar procedure, left side of the inequality). − • • which is summarized in Algorithm 2; Eb = Eb if ui are Corollary 4: Contrary to (23), the optimization problem in ↑ −1 ¨ + • • (31) is strictly-pseudo convex or convex in the unconstrained the elements of Λ Υ in (28), or Eb = Eb if ui are the or power constrained case, respectively. Indeed, its Lagrangian diagonal elements of Λ−1Υ¨ in (32). 7

⋆ • ⋆ Algorithm 2 : Lower or Upper Bound of Eb , Eb Algorithm 3 : EE-Optimal Relay Precoder, G 2 • ′ Υ¨ Υ Λb 1: PT R⋆ H 1: function RN-BND(Eb , W, nRN, ∆RN, Pc, R0, , , ) function RN-O ( , W, Pc, ∆SN, ∆RN, nSN, nRN, nDN,σj , j , −6 b 2: Set x = 0 and ǫ = 10 ; Λ) ′ ⋆ ⋆ ˙ ⋆ ¨ ⋆ ⋆ 3: Obtain UΥ and υi, ∀i ∈ {1,...,nRN}, via Eigen decompo- 2: Set Pc(R ), R0(R ), Υ(R ), Υ(R ), and Υ(R ); Υ − − sition of ; 3: Obtain Eb via RN-BND (Eb , W, ...) in Algorithm 2; • − − + + 4: if “Eb ”=“Eb ” then ⊲ Lower bound, Eb 4: Obtain Eb via RN-BND (Eb , W, ...) in Algorithm 2; ¨ + − −6 5: Obtain the eigenvalues of Υ via Eigen decomposition, 5: Set ε = 0, εmax = Eb − Eb , η = 1, w = 1, and ǫ = 10 ; b ↑ 6: while (η > ǫ) do Υ¨ and sort them in ascending order to form ; 7: Set ε⋆ = ε; ↑ b −1 ¨b • ⋆ 6: Obtain the elements of Λ Υ , u , ∀i ∈ {1,...,n }; 8: while (η > 0) and (ε ≤ εmax) do i RN G⋆ 0 ⋆ − ⋆ 7: end if 9: Obtain by solving (37) for Eb = Eb + ε 8: if ”E•”=”E+” then ⊲ Upper bound, E+ via a gradient/projected gradient search method; b b b ⋆ G⋆ ¨ † ¨ 10: Obtain Eb by inserting in (23); 9: Set Υ = UΥΥUΥ; ⋆ − ⋆ 11: Set η = Eb − (Eb + ε ); Λb −1Υ¨ • ⋆ ⋆ −w 10: Obtain the diagonal elements of , ui , ∀i; 12: if η > 0 then Set ε = ε + 10 (εmax − ε); 11: end if 13: end while • ⋆ −w ⋆ 12: while (|Eb − x|≥ ǫ) do 14: Set ε = ε − 10 (εmax − ε), εmax = ε , and η = |η|; • • 13: Obtain zi , ∀i, by inserting γ = Eb into (33); 15: end while b• max ⋆ G⋆ 14: if PRN(Z ) ≥ PRN then 16: return Eb and . • • 15: Set = Eb ; 17: end function • • 16: Obtain via zi by using a root-finding algorithm such that (35) is satisfied; 17: end if Algorithm 4 : Low-complexity EE Relay Precoder, G+ • 18: Update Eb via (34); + 2 1: function CY R⋆ 19: end while RN-L (Eb , , W, Pc, ∆SN, ∆RN, nSN, nRN, nDN,σj , • • H Λb V 20: return E and z , ∀i. j , , 2) b i ′ R⋆ R⋆ Υ˙ R⋆ Υ¨ R⋆ Υ R⋆ 21: end function 2: Set Pc( ), R0( ), ( ), ( ), and ( ); + • + 3: Obtain Eb and zi , ∀i, via RN-BND (Eb , W, ...) in Alg. 2; b• • • • Z UΥ 4: Set = diag([z1 ,z2 ,...,znRN ]) and obtain (as in line 2) EE-Optimal Relay Precoder: 3 of Algorithm 2); + Proposition 8: Let ε be a nonnegative real number that 5: Obtain G via (38); 6: return E+ and G+. increases in an infinitesimal manner from 0 to infinity; then, b 7: end function the first value of ε, ε⋆, for which the equality

− − + Eb + ε = Eb(G = G(Eb + ε)), (36) problem in (31), G , can be expressed in closed-form as 1 − max 2 2 when PRN(G = G(Eb +ε)) ≤ PRN , is satisfied is the global + σ2 • −1 † ⋆ − ⋆ G = V2 2 Z Λ UΥ, (38) minimum of Eb, such that Eb = Eb + ε . The EE-optimal σ1 ⋆ ⋆ − relay precoding matrix, G , is then such that G = G(Eb + • • • • • ⋆ ⋆ where Z = diag([z1 ,z2 ,...,zn ]), and zi is defined as in ε ), where the latter is a solution of ∇GEb(G=G )= RN (33) with u• being the diagonal elements of Λ−1Υ¨ . i 2 1 ⋆ ⋆ ⋆ σ1 + [∇GP (G=G ) − E ∇GR (G=G )] = 0, Proof: Given that Z = 2 GΛ and G minimizes (31), ⋆ Σ b Σ σ2 RΣ(G ) 1 † + (37) i.e. (23) subject to G = V2G 2 UΥ, it implies that G can ⋆ max be defined as in (38). with PRN(G = G ) ≤ PRN . In addition, ∇GPΣ(G) and Accordingly, G+, can be obtained via Algorithm 4, which ∇GRΣ(G) are expressed in (70). See section H of the Appendix for the proof. has a lower computational complexity than Algorithm 3. Based on this proposition, the EE-optimal relay precoding Indeed, it only involves one or two univariate root-finding matrix, G⋆, can be obtained via Algorithm 3 in conjunction search(es) since it is based on Algorithm 2. However, it is not with, for instance, a gradient search (in the unconstrained case) necessary optimal in general, given that the search domain is or projected gradient search (in the power constrained case) restricted to the domain of positive semi-define matrix having − ⋆ ⋆ a particular structure. method for obtaining G(Eb +ε ) for a given value of ε . The search for the optimal ε⋆ stops when the difference between ⋆ − ⋆ C. Joint Source and Relay Precoding Optimization Eb and Eb +ε becomes negligible (i.e. when both equations (36) and (37) are jointly satisfied). Note that w in line 5 of 1) Alternating Optimization Procedure: As in [10], [16], Algorithm 3 is an accuracy parameter, which is utilized to [20], [24], the main optimization problem in (9) can be solved implement in practice the ”infinitesimal” increasing of ε from by using an alternating optimization procedure based on R⋆ 0 to ε⋆; the larger w is, the more accurate is the algorithm, but and G⋆ or G+, as follows: then the more iterations are likely to be required (i.e. increased 1) Obtain Λ and V2 via singular value decomposition. ⋆(1) ⋆(1) complexity). Then, set R = InSN , G , and obtain the eigenval- ⋆(1) 3) Low-complexity Energy-Efficient Relay Precoder: ues of Ψ(∆SN, ∆RN, G ), ψi(∆SN, ∆RN). As in [10], Proposition 9: Contrary to G⋆, which can only be obtained we consider N different randomly selected initialization ⋆(1) ⋆ ln(2) via a numerical method, the solution of the optimization matrices G 0. Next, set Eb = 1+ W maxi{ψi}. 8

k ⋆( ) 2 − 2 2 2) At the k-th iteration, G is used as an input of 14.8 σ0 = 30 dB & σ1 = σ2 = 0 dB Algorithm 1, which updates the value of E⋆ and return k b ⋆( +1) 14.3 R . (J/bit) k ⋆( +1) b

3) Next, R is used as an input of Algorithm 3, which E 13.8 k ⋆ ⋆( +1) updates the value of E and return G . b 2 − 2 2 4) Steps 2) and 3) are repeated iteratively until conver- 34.3 σ2 = 30 dB & σ0 = σ1 = 0 dB ⋆ gence, i.e. until the values of Eb at the end of the k-th 34.28 and k +1-th iterations are the same. (J/bit) b

E 34.26 Note that the same procedure can also be used for obtaining + ⋆ 2 2 2 G instead of G , where Algorithm 4, instead of Algorithm 16.8 σ1 = σ2 = −30 dB & σ0 = 0 dB 3, is used at step 3). 16.7

2) Convergence Discussion and Results: Following the (J/bit) same line of reasoning as in [24], the convergence of the b E 16.6 alternating procedure for solving (9) in the previous subsection 2 2 2 can be established as follows; first, given that the optimization 182 σ0 = 20 dB & σ1 = σ2 = 10 dB problem in (9) has a unique global minimum when the 181 variable R is optimized for a known G, it implies that (J/bit)

k k k k b ⋆( +1) ⋆( ) ⋆( ) ⋆( ) Eb(R , G ) ≤ Eb(R , G ), at the k-th iteration. E 180

Second, since the optimization problem in (9) has a unique 1 2 3 4 5 6 global minimum when the variable G is optimized for a known Iterations k k k k ⋆( +1) ⋆( +1) ⋆( +1) ⋆( ) R, it implies that Eb(R , G ) ≤ Eb(R , G ), at the k-th iteration. By combining the two previous inequali- Fig. 2: Number of iterations required for the alternating k k k k ⋆( +1) ⋆( +1) ⋆( ) ⋆( ) optimization procedure to converge in different settings. ties, such that Eb(R , G ) ≤ Eb(R , G ), and knowing that Eb is lower bounded, we can conclude that the (1) conditional updating of R⋆ (for a fixed G⋆) and G⋆ (for a the same value and, hence, is independent of G⋆ , when fixed R⋆) at each iteration either decreases or maintains the considering any of the three special case settings. In turn, this ⋆ ⋆ ⋆ ⋆ value of Eb = Eb(R , G ). In other words, Eb is the best confirms that the alternating procedure can reach the global ⋆ ⋆ ⋆(1) local minimum of Eb, for a given R or G , at each iteration. minimum in these special cases, regardless of G . Whereas 2 2 2 However, similar to [10], [16], [20], it cannot be guaranteed in the case of σ0 = 20 dB and σ1 = σ2 = 10 dB, the outcome ⋆ ⋆(1) that Eb is always the global minimum of (9) in the general of the alternating procedure is clearly dependent of G , i.e. (1) case since Eb(R, G) in (9) is not necessarily jointly strictly different G⋆ provide different outcomes. Thus, this justifies pseudo-convex/convex in both R and G. Nevertheless, in the the use of multiple initialization points [10] for increasing the ⋆ following special cases, Eb is guaranteed to be the global likelihood of reaching the global minimum of (9). minimum of (9): 2 2 2 • if σ0 ≪ 1, σ1 ≫ 1, or σ2 ≫ 1, then (9) becomes IV. NUMERICAL RESULTS AND DISCUSSIONS equivalent to a DL only EE-based optimization problem Our simulations, which are averaged over 10000 runs, (i.e. independent of G), such that E (R⋆, G⋆ = 0) is b assume a downlink transmission of the cooperative MIMO AF optimal; 2 system, such that the SN is a BS and the DN is a UE. They also • if σ ≪ 1, then (9) becomes independent of G as in the 2 rely on the power model parameters of Table 1 of [12] (but previous case; SlpA 2 2 with P = 0 W) for setting values to all the parameters • if σ ≪ 1 and σ ≪ 1, then R and G become SN 2 1 discussed in Section II-B. In addition, note that P max per uncorrelated in (1). In turn, it has been discussed in . antenna is between 6 to 20 W for a typical micro/macro BS [4], [10] (for the RL only scenario, i.e. σ2 ≫ 1) that it 0 and between 1 to 5 W for a typical urban/rural RN [26]. We is a sufficient condition for (9) to converge to a global also consider a single-tap i.i.d MIMO Rayleigh channel optimum. between each node, a unit bandwidth (W =1), and ς =1/2 Figure 2 depicts the number of iterations that are necessary in Pc. Moreover, we assume that all the channel matrices have for the alternating optimization procedure to convergence in the same dimension, i.e. , and 2 n = nSN = nRN = nDN N = 10 the three special cases previously discussed, i.e. σ0 = −30 in step 1) of Section III-C1. 2 2 2 2 dB and σ1 = σ2 = 0 dB (⇔ σ0 ≪ 1), σ2 = −30 dB and 2 2 2 2 2 σ0 = σ1 = 0 dB (⇔ σ2 ≪ 1), or σ1 = σ2 = −30 dB and 2 2 2 A. Performance Results and Insights σ0 =0 dB (⇔ σ2 ≪ 1 and σ1 ≪ 1), respectively. In addition, a case where convergence is not necessarily guaranteed, i.e. In order to demonstrate the benefits, in terms of EE, 2 2 2 σ0 = 20 dB and σ1 = σ2 = 10 dB, is also plotted. Note of our source and relay precoding optimization scheme for that the alternating procedure is run N = 10 times (such cooperative MIMO-AF systems, we compare its performance that each subplot has 10 dashed lines), where the random against existing approaches that either minimize the MSE [20], ⋆(1) initialization matrix G is different for each run. The results maximize the SE [16], or minimize Eb [23] in the same clearly show that the alternating procedure converges towards scenario (when considering both direct and relay links), and 9

0 2 10 10 −1

10 (W)

−2 10 Tx P 1st phase 1 10

MSE −3

10 2 10 −4 min MSE [20] 10 max SE [16] (W) ⋆⋆ Eb 1 Tx 10 P 60 2nd phase

2 10 40 (bit/2/s) 1 Σ 10 R 20 (bit/2/s)

Σ

R 0 ⋆◦ 10 Eb (DL only) [35] ∗∗ Eb (RL only) [10,11] 2 3 ⋆⋆ 10 10 Eb (Direct + relay links)

Unconstrained

(bit/J) 2 Power constrained b

(J/bit) 10 E b E

1 1 10 10 −30 −20 −10 0 10 20 −20 −10 0 10 20 30 2 2 σ0 (dB) σ0 (dB) Fig. 3: MSE, sum-rate, and EE performances of our scheme, Fig. 4: Transmit power, sum-rate, and EE performances of against other existing non-EE-based cooperative MIMO-AF our scheme against other existing EE-based DL only and RL 2 2 precoding schemes, as a function of σ0 dB. only schemes, as a function of σ0 dB.

other existing EE/Eb-based approaches that either optimize the Pc is defined right after equation (6), whereas in the DL or CipA Ci CipA Ci EE of the DL [35] or RL [10], [11] on its own . For ease of RL only case, Pc = nSNPSN + PSN + ς(nDNPDN + PDN) or CipA SlpA Ci CipA Ci introduction, the different schemes that are compared in this Pc = nSN(PSN + PSN )+ PSN +(1+ ς)(nRNPRN + PRN)+ section are denoted, as follows CipA Ci SlpA ς(nDNPDN + PDN)+ nDNPDN , respectively. ⋆⋆ ⋆ ⋆ • Eb = Eb(R , G ) is the outcome of our EE-optimal Figure 3 compares the MSE, sum-rate, and EE performances source and relay precoding design, which can be imple- of our scheme, against other existing precoding schemes that mented through Algorithms 1 and 3. are dedicated to either minimize the MSE [20] or maximize ⋆+ ⋆ + • Eb = Eb(R , G ) is the outcome of our EE-optimal the SE [16] of cooperative MIMO-AF systems, for n = 4, max max 2 2 2 source and low-complexity relay precoding design, which PSN = 80 W, PRN = 10 W, σ1 = σ2 = σ0 /10. Note that can be implemented through Algorithms 1 and 4. the unit for sum-rate is bit/2/s, since we consider the number • min MSE is the outcome of the MSE-optimal source and of transmitted bits over two transmission phases. As it is relay precoding design proposed in [20]. expected, the precoding schemes of [20] and [16] provide the • max SE is the outcome of the SE-based source and relay best MSE and sum-rate performances, respectively, whereas precoding design proposed in [16]. our scheme provides the best energy consumption. Indeed, ⋆⋄ • Eb denotes the outcome of our preliminary work in our scheme can reduce Eb by up to 25% in good channel 2 [23], where the EE-optimal source precoding is utilized condition (when σ0 = −30 dB), but at the expense of a 15% in conjunction with a different relay precoding structure, reduction in sum-rate when compared to max SE and almost 1 † i.e. G = V2G 2 U1. one order of MSE magnitude when compared to min MSE. ⋆◦ • Eb = minR Eb(R, G = 0) is the outcome of the EE- It can also be remarked that as the channel condition worsen 2 optimal source precoding design for DL only in [35]. (when σ0 increases), as the results of the three schemes start ∗∗ R G • Eb = min , limσ0→∞ Eb(R, G) is the outcome of to converge. In turn, this indicates that the type of precoding the EE-optimal source and relay precoding design for RL design becomes less of an issue in poor channel condition. only in [10], [11]. Figure 4 compares the transmit power (in both transmission Note that when comparing the EE of cooperative schemes with phases), sum-rate, and EE performances of our scheme, against DL or RL only scheme, it is necessary to integrate the fact that other existing EE/Eb-based approaches that either optimize the Pc, which accounts for all the fixed circuit consumed powers EE of the DL [35] or RL [10], [11] on its own, for n = 4, max max 2 2 2 in the system, is different in each case. In the cooperative case, PSN = 80 W, PRN = 10 W, σ1 = σ2 = σ0 /10. Note 10

a) that both the unconstrained and power constrained results are 20 ∆EE(%) plotted for each scheme. In terms of Eb performance, the results show that our new scheme can outperform the two other 10 schemes by more than 30% in both the unconstrained and power constrained scenarios. Whereas in terms of sum-rate, 25 it can be remarked that in this particular setting, the DL only (dB) 0 2 2 transmission scheme [35] performs better than our cooperative σ scheme or the RL only scheme, by up to 40%. This is similar −10 to the results in Fig. 3 and it indicates the existence of a trade- off between EE and rate [30]. Even though DL transmission −20 −20 −10 2 0 10 20 20 provides a better rate than cooperative transmission, this does σ1 (dB) not translates in a lower energy consumption. Indeed, the b) 20 transmit power results indicate that the DL transmission al- ways utilizes more transmit power than the two other schemes (up to 8 times more in the 2nd transmission phase) and, in 10 turn, this explains why it is less energy efficient than our 15 cooperative scheme. By analyzing the four subplots together, (dB) 0 2 2

we can conclude that our scheme provides the best trade-off σ between power and rate, such that it exhibits the best Eb. −10 The transmit power results of Fig. 4 are also very informa- tive about the way in which EE-based optimization works and −20 10 differs from SE maximization or MSE minimization, which c) echoes the first paragraph of Section III. Indeed, the top two 20 2 subplots of Fig. 4 show that depending on the value of σ0 , the 2 10 optimization problem in (9) is either unconstrained (for σ0 ≤ 2 2 2 dB), SPC at the RN (for 2 dB < σ0 ≤ 22dB) or DPC (for σ0 > 5 ⋆⋆ 22dB). Moreover, as it was expected, Eb is always lower in (dB) 0 2 the unconstrained rather than in the constrained regime. In 1 σ contrast, the minimum MSE and maximum SE are always −10 achieved when both power constraints are met (DPC regime) ⋆⋆ in Fig. 3. This explains why Eb tends to converge towards 2 −20 0 min MSE and max EE in Fig. 3 when σ0 ≫ 0, since in this −20 −10 0 10 20 2 case, all the schemes operate in the DPC regime. σ0 (dB) Figure 5 depicts the EE gain, ∆EE, of our novel EE- based cooperative precoding scheme against EE-based non- Fig. 5: EE gain comparison of our scheme against other 2 cooperative precoding schemes for various σi values, ∀i ∈ existing EE-based DL only and RL only schemes, as a function max max 2 2 2 2 2 2 {0, 1, 2}, n =4, PSN = 80 W, PRN = 10 W, and where of a) (σ1 , σ2) when σ0 =0 dB, b) (σ0, σ2 ) when σ1 =0 dB, 2 2 2 ⋆⋆ and c) (σ0 , σ1) when σ2 =0 dB. ∆EE = 100[1 − Eb /χ]+ %. (39) ⋆◦ ∗∗ Note that χ = min{Eb , Eb } in Fig. 5. The results clearly 2 2 2 the worst link of the two-hop RL, i.e. σ0 << max{σ1, σ1 }, confirm the benefits, in terms of EE, of joint source and the precoding at the SN prioritizes the DL transmission such relay precoding optimization in comparison with existing ⋆⋆ ⋆◦ that Eb ≈ Eb . In turn, this explains why ∆EE ≃ 0 in approaches, where precoding matrices are optimized for either the left region of both Fig. 5 b) and c), as well as in the the direct or relay link separately. Indeed, 45% or more of upper right region of Fig. 5 a). On the other hand, whenever each subplot area has a , which graphically indicates ∆EE ≥ 0 the RL exhibits a far better link quality than the DL, i.e. that joint source and relay precoding matrices optimization σ2 >> max{σ2, σ2}, the precoding at both the SN and RN is useful for improving the EE of MIMO-AF systems. For 0 1 1 prioritize the RL transmission such that E⋆⋆ ≈ E∗∗. As a instance, in Fig. 5 b) and c), an EE improvement of up to 28% b b result, ∆EE ≃ 0 in the lower right region of both Figs. 5 b) is achieved. However, it can be remarked that the intensity of and c), as well as in the lower left region of Fig. 5 a). Note the gain is not uniform; it clearly depends on the values of that similar results have been remarked for higher number of 2, which themselves reflect the quality of each links. Similar σi antennas, e.g. n =8 or n = 16. Finally, the link quality being observations have also been reported in [17], [18], for the dependent on the RN placement in a practical deployment, Fig. case of SE-based joint source and relay precoding subject to 5 is quite informative for getting the most out of cooperative power constraints. On the one hand, we know that multi-hop MIMO-AF communication systems in terms of EE. communication (RL only) is prone to the ‘bottleneck’ effect (see Fig. 4 of [15]), where the overall rate of the RL is limited Figures 6 and 7 complement the results of Fig. 5 by by the rate of its worst hop (see Section IV. B1) of [11]). focusing on the impact of power constraints and imperfect Hence, whenever the DL exhibits a far better link quality than CSI, respectively, on the EE gain. By considering the same 11

a) a) 20 ∆EE(%) 20 ∆EE(%) 50 10 10 30

(dB) 0 (dB) 0 2 2 1 1 40 σ 25 σ −10 −10

−20 20 −20 30 b) b) 20 20 15 10 10 20 10

(dB) 0 (dB) 0 2 2 1 1

σ σ 10 −10 5 −10

−20 −20 0 −20 −10 0 10 20 −20 −10 0 10 20 2 2 σ0 (dB) σ0 (dB)

Fig. 6: EE gain comparison of our scheme against other Fig. 7: EE gain comparison of our scheme against other existing EE-based DL only and RL only schemes, for different existing EE-based DL only and RL only schemes, when power constraints. considering imperfect CSI.

max max a) settings as in Fig. 5 c), except that PSN = 40 W, PRN = 20 20 ∆EE(%) max max W in Fig. 6 a) and PSN = 20 W, PRN =5 W in Fig. 6 b), Fig. 6 further confirms that our scheme can be more energy 10 6 efficient than existing approaches and that the improvement is localized. In addition, it can be remarked that more EE

(dB) 0

gain, around 33%, is achieved in Fig. 6 b) when the power 2 1 5

max max σ constraints are more stringent, i.e. PSN = 20 W, PRN = 5 max max −10 W instead of PSN = 40 W, PRN = 20 W in Fig. 6 a) or max max PSN = 80 W, PRN = 10 W in Fig. 5 c). Whereas in Fig. 4 7, by relying on the noisy CSI model of [36] for modeling −20 b) the imperfection in CSI estimation, the EE gain is depicted 20 when considering the same settings as in Fig. 5 c), except 3 2 2 that σE = 0.1 in Fig. 7 a) and σE = 0.5 in Fig. 7 b). In 2 10 the noisy CSI framework of [36], σE is used to model the 2 2 quality of the CSI estimation; σE =0 being equivalent to the perfect CSI estimation case (e.g., as in Figs. 5 and 6), whereas (dB) 0 2 2 1 the estimation quality degrades as σ increases. Even though σ E 1 the absolute performance of all the schemes will obviously −10 2 degrade as σE increases, it is interesting to see in Fig. 7 that it is not the case for the relative performance of our novel EE- −20 based cooperative precoding scheme against EE-based non- −20 −10 0 10 20 2 cooperative precoding schemes. On the contrary, higher values σ0 (dB) 2 of ∆EE% can be achieved when σE increases, i.e. an EE gain 2 Fig. 8: EE gain comparison of our scheme against a) its of up to 50% is obtained in Fig. 7 b) for σE =0.5. In turn, this indicates that our cooperative scheme is more resilient to CSI low-complexity alternative and b) our preliminary EE-based estimation error that the existing non-cooperative precoding precoding scheme of [23]. schemes. This improved resilience could be due to diversity; the cooperative scheme relies on two different transmission Figure 8 compares the EE performance of the our scheme routes, instead of one for the non-cooperative schemes. Finally, with the EE-optimal relay precoder against the low-complexity as in Figs. 5 and 6, the EE gain is still localized in Fig. 7. energy efficient relay precoder in Fig. 8 a) and the relay 12

precoder of [23] in Fig. 8 b), when considering the same settings as in Fig. 5 c). Consequently, ∆EE is defined as in 10 ⋆+ ⋆⋄ 10 (39), but where χ = Eb and χ = Eb in Fig. 8 a) and y ∝ n3 b), respectively. On the one hand, Fig. 8 a) confirms that the relay precoding matrix G+ is not as energy efficient as G⋆ in the general case (since ∆EE ≥ 0); however, in this particular 8 example, the difference of performance between using G⋆ or 10 G+ remains below 4%. On the other hand, Fig. 8 b) clearly 1 † indicates that the relay precoding structure G = V2G 2 U1, which is known to be optimal in various other MIMO-AF 6 10 precoding scenarios [10], [11], [14], [25], is not optimal here Number of operations ⋆⋄ ⋆⋆ since Eb can differ by more than 6% compared to Eb . Alg.1, EE-optimal Source Alg.4, Low-complexity Relay B. Complexity Analysis and Results Alg.3, EE-optimal Relay 4 10 In order to put the results in Figs. 5 and 8 into perspective, 2 4 6 8 16 32 64 we discuss here the computational complexity of the different Number of transmit antennas, n algorithms proposed in this paper, when assuming that n = Fig. 9: Comparison of the average number of basic operations nSN = nRN = nDN: required by Algorithms 1, 3, and 4 as a function of n, for ⋆ • Even though R in Algorithm 1 is obtained through a σ2 =0 dB and σ2 = σ2 = −6 dB. combination of simple univariate and/or bivariate root- 0 1 2 finding searches (with complexity linear in n), the com- plexity of the whole algorithm is asymptotically driven by Figure 9 depicts the computational complexity, measured in the complexity of the matrix operations it uses, e.g. eigen terms of the average number of basic operations (e.g. addition, substraction, multiplications, etc.), of Algorithms 1, 3, and 4, decomposition (ED) for obtaining UΨ, matrix multipli- 2 cation/inversion for computing Ψ˙ , Ψ¨ , and R⋆. Given that as a function of the matrix dimension, n, for σ0 =0 dB and 2 2 these matrix operations usually exhibit a computational σ1 = σ2 = −6 dB (i.e. a setting for which the EE gain is complexity of O(n3), we expect Algorithm 1 to exhibit significant according to Fig. 5 a)). The results first confirm the same sort of asymptotical complexity. that the complexity of all three algorithms is asymptotically 3 • Algorithm 2 also utilizes univariate root-finding searches driven by the complexity of the matrix operations, i.e. n , as well as ED and matrix multiplication/inversion; conse- and, as it was expected, that Algorithm 4 is the least complex quently, we also expect Algorithm 2 to exhibit a compu- of the three algorithms; Algorithm 4 is roughly 60 times tational complexity of O(n3). However, Algorithm 2 uses less complex than Algorithm 3. Hence, the low-complexity at worst two root-finding searches (i.e. for unconstrained approach of Algorithm 4 has performance close to the original and DPC cases) instead of three in Algorithm 1. Thus, approach of Algorithm 3 (based on Fig.8 a)), but with a far we expect Algorithm 2 to exhibit a lower computational lower computational complexity. complexity than Algorithm 1. + • Algorithm 4 uses Algorithm 2 to return G , such that it V. CONCLUSION follows the same structure as Algorithm 1. Hence, it is expected to have a similar complexity of O(n3). In this paper, an energy efficient precoding method for the • The EE-optimal relay precoding in Algorithm 3 is based cooperative MIMO-AF scenario (i.e. when both relay and on an iterative approach, where the average number of direct links are considered) has been proposed, based on EE- ⋆ iterations, Niter, that are necessary for finding G are optimal source and relay precoding matrices. We have formally increasing with w and 1/ǫ; w and ǫ being accuracy proved the optimality of our source and relay precoders, when parameters that are defined at line 5 of Algorithm 3. The treated independently, by using pseudo-convexity/convexity ar- larger w and 1/ǫ are, the more accurate is this algorithm, guments and have relied on alternating optimization (for which but the more iterations (complexity) are required. For the convergence has been proved) for jointly optimizing them. instance, when w =1, then inner while loop in Algorithm We have provided a closed-form expression for the EE-optimal 3 (lines 8 to 13) is perform at most 10 times for each source precoder and have designed an optimal numerical outer while loop (lines 6 to 15). Meanwhile, inside the approach for obtaining the relay precoder. We have also derive inner while loop, a gradient/projected gradient search is a sub-optimal relay precoder in closed-form that exhibits EE performed to update G⋆, which again relies on matrix performance close to the EE-optimal relay precoder, but with operations, such as determinant and multiplication, that a far lower computational complexity. Simulation results have exhibit a computational complexity of O(n3). Overall, confirmed that our novel EE-based approach can improve the the optimal approach is expected to exhibit a computa- EE of cooperative MIMO-AF systems by up to 50% in 3 tional complexity of O(Nitern ). In addition, given that comparison with existing approaches. In the future, we plan to Algorithm 3 also required to run Algorithm 2 twice, its extend our work for the case where only imperfect or partial computational complexity can only be always greater than CSI is available at the source and/or relay node, given the the one of Algorithm 4. promising results of Fig. 7. 13

APPENDIX C. Proof of Corollary 1: Strict Positivity and Concavity of (16a) A. Proof of Proposition 1 Proof: On the one hand, let RΣ(Y) be defined as in (16a), Proof: Based on the definition of [32], any function f it then implies that RΣ(Y) > 0 for Y 0, Y = 0. Moreover, must verify the following relationship in order to be a strictly by relying on matrix calculus [37], it also implies that pseudo-convex function of X 0, −1 ∇YRΣ(Y), EF = W/ ln(2) tr ( InSN + Y) E . (48)

∇Xf(X), EF ≥ 0 ⇒ f(X + E) >f(X), (40) † Note that ( InSN + Y) = ( InSN + Y) since ( InSN + Y) is a Hermitian matrix. On the other hand, for any Hermitian matrix E such that X + E 0 and E = 0. X p( ) −1 Given that f(X)= X , it implies that ∇Xf(X), EF= q( ) RΣ(Y + E) − RΣ(Y)= W log2 InSN + ( InSN + Y) E . (49) −1 X X X E X X X E X tr{X} q( ) [∇ p( ), F − f( )∇ q( ), F] . (41) Knowing that for any square matrix X, e = e ⇔ tr{X} = ln eX , and by using the change of variable In turn, based on (41), ∇Xf(X), EF ≥ 0 is equivalent to −1 X1 = ( InSN + Y) E ≻ 0 (when assuming that E = X1 0), (48) and (49) can be re-expressed as W log2 e and ∇Xp(X), E ≥ f(X)∇Xq(X), E . (42) X1 F F W log | I + X1|, respectively. Since e > | I + X1| 2 nSN nSN for any square matrix X1 ≻ 0, RΣ verifies (43) and, in turn, Based on the definition of [33], q is a strictly concave function is a strictly positive and concave function of Y. of X 0 if and only if

D. Proof of Proposition 4 ∇Xq(X), EF > q(X + E) − q(X), (43) Proof: The Lagrangian associated with the optimization for any Hermitian matrix E such that X + E 0 and E = 0. problem in (15) is formulated in (17). Given that, according to By inserting equation (43) into (42), we obtain Corollary 1, this function is strictly pseudo-convex or convex, it implies that the global optimum of L (Y, λ1, λ2) occurs at ∇Xp(X), EF >f(X)[q(X + E) − q(X)] a stationary point, such that

⇔ q(X)∇Xp(X), EF >p(X)[q(X + E) − q(X)] (44) ⋆ ⋆ ⋆ ∇YL (Y = Y , λ1, λ2)= 0, (50) ⇔ q(X)[p(X)+ ∇Xp(X), EF] >p(X)q(X + E). ⋆ ⋆ where λi = λiRΣ(Y ), ∀i ∈{1, 2}. Given that p is a linear function of X, it implies that p(X + 1) Unconstrained Optimization: According to the first E)= p(X)+ ∇Xp(X), EF. Consequently, equation of (17), (50) is equivalent to

⋆ ⋆ ⋆ q(X)p(X + E) >p(X)q(X + E) ∇YPΣ(Y = Y ) − Eb ∇YRΣ(Y = Y )= 0, (51) Y⋆ p(X + E) p(X) (45) ⋆ ⋆ PΣ( ) ⇔ > ⇔ f(X + E) >f(X). in the unconstrained case, where Eb = Eb(Y ) = Y⋆ . q(X + E) q(X) RΣ( ) According to the definitions of RΣ(Y) in (16a) and PΣ(Y)= P ′ + tr {YΨ(∆ , ∆ )} in the first line of (17), it then Hence, (40) is verified and f(X) is strictly pseudo-convex. c SN RN implies, by relying on matrix calculus [37], that

∇YPΣ(Y) = Ψ(∆SN, ∆RN) and B. Proof of Proposition 2 W −1 (52) ∇YR (Y) = (I + Y) , Σ ln(2) nSN Proof: Let q be a strictly pseudo-convex function, such that it verifies (40), and p be a function of X 0; hence, respectively. By inserting the results in (52) into (51), the latter can be re-expressed as (p(X + E) − p(X)) + ∇Xq(X), EF ≥ 0 ⇒ W E⋆ b I = Ψ(∆ , ∆ )[I + Y⋆] (53a) (p(X + E) − p(X)) + q(X + E) − q(X) > 0 ln(2) nSN SN RN nSN (46) ⋆ ⇔ (p(X + E) − p(X)) + ∇Xq(X), EF ≥ 0 ⇒ ⋆ W Eb −1 ⇔Y = Ψ(α, β) − InSN , (53b) p(X + E)+ q(X + E) >p(X)+ q(X). ln(2) where α = ∆ and β = ∆ . Let X X X and be linear, i.e. X E SN RN f( ) = p( )+ q( ) p p( + ) = 2) Single Power Constrained Optimization: According to X X X E , it then implies that p( )+ ∇ p( ), F the second and third equations of (17), (50) is equivalent to ⋆ ∇Xp(X), EF +∇Xq(X), EF ≥ 0⇒f(X + E)>f(X) W Eb ⋆ −1 ⋆ Ψ(0, ∆RN) − (InSN + Y ) + Ψ(λ1, 0) = 0 and ⇔ ∇Xf(X), EF ≥ 0 ⇒ f(X + E) >f(X), ln(2) (47) W E⋆ Ψ(∆ , 0) − b (I + Y⋆)−1 + Ψ(0, λ⋆)= 0, which is the definition of a strictly pseudo-convex function in SN ln(2) nSN 2 (40). (54) 14 in the SPC case at the SN and SPC case at the RN, respec- F. Proof for Proposition 5 tively. Equivalently, Proof: By inserting G in (27) into (23), the denom- ⋆ W Eb ⋆ ⋆ inator and numerator of the latter can be re-expressed as In = Ψ(λ , ∆RN)[In + Y ] and ln(2) SN 1 SN RΣ(G, VG, UG)= (55) W E⋆ 2 b ⋆ ⋆ σ1 1 1 † † 1 † 1 InSN = Ψ(∆SN, λ2)[InSN + Y ] , R +W log I + Υ 2 UGG 2 V H H VGG 2 U Υ 2 ln(2) 0 2 nRN σ2 G 2 2 G 2 ⋆ 2 such that equality (53b) holds, but where α = λ and β = ∆RN σ 1 † † 1 1 1 2 2 ⋆ − W log2 InRN + G VGH2H2VGG and (60a) in the SPC case at the SN, and α = ∆SN and β = λ in the σ2 2 2 SPC case at the RN. 2 ′ σ1 † G ¨ G 3) Dual Power Constrained Optimization: According to the PΣ(G, U )= Pc + 2 tr GUGΥU . (60b) σ2 fourth equation of (17), (50) is equivalent to ⋆ Consequently, the following inequalities hold W Eb ⋆ −1 ⋆ ⋆ 0 − (InSN + Y ) + Ψ(λ1, λ2)= 0 ln(2) RΣ(G, VG, UG) ≤ max RΣ(G, VG, UG) and (61a) VG,UG ⋆ (56) W Eb ⋆ ⋆ ⋆ I Ψ I Y G G ⇔ nSN = (λ1, λ2)[ nSN + ] , minPΣ(G, U ) ≤ PΣ(G, U ). (61b) ln(2) UG in the DPC case, such that equality (53b) holds, but where On the one hand, based on the work of [14] (which relies on ⋆ ⋆ α = λ1 and β = λ2. Hadamard determinant inequality), we know that the function Finally, since Ψ(α, β) is a Hermitian matrix, it can de- of the form as described in (60a) are maximized when the † arguments of both determinant operators become diagonal composed as Ψ(α, β) = UΨΨ(α, β)UΨ, where UΨ is a unitary matrix containing the eigenvectors of Ψ(α, β) and matrices (when assuming that the eigenvalues of all the matrices are sorted in the same order). This can easily be Ψ(α, β) = diag([ψ1(α, β), ψ2(α, β),...,ψnSN (α, β)]) is a diagonal matrix containing the eigenvalues of Ψ(α, β). Hence, achieved by setting VG = V2 and UG = UΥ in (60a). equation (53b) can be reformulated as Hence, maxVG,UG RΣ(G, VG, UG)= RΣ(G, V2, UΥ)= ⋆ −1 2 2 ⋆ W Eb † ⋆ † σ1 σ1 Y = UΨΨ(α, β)UΨ − InSN = UΨY UΨ, R0 + W log2 InRN + 2GΛΥ − W log2 InRN + 2 GΛ , ln(2) σ2 σ2 (57) (62) ⋆ 2 WEb σ where Y⋆ = Ψ(α, β)−1 − I is a diagonal matrix with 1 ln(2) nSN which is equivalent to equation (29) whenZ = σ2 GΛ. On ⋆ 2 elements, y , as defined in (19). Note that the operator [.]+ is i the other hand, the matrix UG that minimizes PΣ(G, UG), used in (19) since both Y 0 and R 0. † or equivalently tr GUGΥU¨ G according to (60b), is sim- ply the matrix matching the largest values of G with the E. Proof of the Strict Positivity and Concavity of (26a) † smallest values of UGΥU¨ G. This can easily be achieved Proof: On the one hand, let RΣ(Z) be defined as in (26a), ↑ ↑ by setting UG = UΥ¨ , where UΥ¨ is unitary matrix that it then implies that R (Z) > 0 since R > 0 when R 0, Σ 0 contains a column permutation of the eigenvectors of UΥ¨ , R = 0. Moreover, by relying on matrix calculus [37], it also † ↑ ↑ i.e. U↑ ΥU¨ ↑ = Υ¨ . Hence, Υ¨ is a diagonal matrix implies that ∇ZRΣ(Z), EF = Υ¨ Υ¨ ¨ −1 −1 containing the eigenvalues of Υ sorted in ascending order. −1 ↑ W/ ln(2) tr Υ + Z − ( InRN + Z) E . (58) Therefore, minUG PΣ(G, UG)= PΣ(G, UΥ¨ )= −1 −1 2 Note that Υ−1 + Z ( I + Z) given that Υ I. σ ↑ nRN P ′ + 1 tr GΥ¨ , (63) On the other hand, c σ2 2 −1 −1 ′ InRN + Υ + Z E which is equivalent to Pc + ∆RNPRN(Z), with PRN(Z) as in RΣ(Z + E) − RΣ(Z)= W log . 2 2 −1 (28) when Z = σ1 GΛ. InRN + ( InRN + Z) E σ2 2 (59) Knowing that for any square matrix X, tr{X} = X G. Proof of Proposition 7 ln e , and by using the change of variables X1 = −1 −1 Proof: The Lagrangian associated with the lower and Υ−1 + Z − ( I + Z) E 0 and X2 = nRN upper bounds of the original problem in (23) can be expressed −1 (InRN + Z) E ≻ 0 (when assuming that E = as in (30). Given that, according to Corollary 2, this function X1 0), (58) and (59) can be re-expressed as W log2 e is strictly pseudo-convex or convex, it implies that the global In +X1+X2 and W log RN , respectively. Since eX1 ≥ optimum of L Z, occurs at a stationary point, such that 2 In +X2 RN | InRN + X1| for any square matrix X1 0 and In X1 X2 • • RN + + X1 b Z Z 0 I I X1 ∇ZL = , = , (64) In X2 = nRN + In X2 < | nRN + |, RΣ in RN + RN + (26a) verifies (43) and, in turn, is a strictly positive and ′ max where • Pc+∆RNPRN . = Zb• 2 concave function of Z. (RΣ( )) 15

⋆ − ⋆ 1) Unconstrained Optimization: According to the first it exists a matrix G = G(Eb + ε ) that verifies equation ⋆ ⋆ equation of (30), (64) is equivalent to (37) when Eb is known. In other words, if Eb is the global ⋆ − ⋆ • • • minimum of Eb occurring at G , then G(Eb +ε) with ε<ε ∇Zb PRN(Z = Z ) − Eb ∇Zb RΣ(Z = Z )= 0, (65) ⋆ − ⋆ can verify (37) when Eb is known, but Eb(G(Eb +ε)) > Eb . Zb• • PΣ( ) Thus, if ε is a nonnegative real number that increases in an in the unconstrained case, where Eb = Zb• . Based on the RΣ( ) infinitesimal manner from 0 to infinity, the first value of ε for definitions of R (Z) in (29) and P (Z) in (28) or (32), it Σ RN which (36) is verified can only be the global minimum of Eb. then implies, by relying on matrix calculus [37], that • ∇Zb PRN(Z) = diag{u } and −1 −1 ACKNOWLEDGMENT W −1 ∇b R (Z) = Υ + Z − I + Z , Z Σ ln(2) nRN The first author wishes to thank his beloved wife for her (66) support. respectively, where u• is a vector containing either the ele- ↑ −1 ments of Λ Υ¨ (for PRN(Z) in (28)) or the diagonal ele- REFERENCES −1 ¨ ments of Λ Υ (for PRN(Z) in (32)). Given that ∇Zb PRN(Z) [1] International Telecommunication Union (ITU), “IMT Vision Framework and overall objectives of the future development of IMT for 2020 and and ∇b R(Z) are both diagonal matrices, by inserting the Z Σ beyond,” ITU, Geneva, Switzerland, ITU-R Recommendations M.2083- results in (66) into (65), the latter can be re-expressed as 0, Sep. 2015. [2] H. Zhang et al., “Energy efficiency in communications,” IEEE Commun. W E• 1 1 Mag., vol. 48, no. 11, pp. 48–79, Nov. 2010. u• = b − (67a) i ln(2) 1 + z• 1+ z• [3] L. M. Correia et al., “Challenges and enabling technologies for energy υi i i aware mobile radio networks,” IEEE Commun. Mag., vol. 48, no. 11, pp. 66–72, Nov. 2010. • 2 • Wγ(υi − 1) ⇔ (zi ) υi + zi (1 + υi)+1 − • =0, (67b) [4] G. Auer et al., “How much energy is needed to run a wireless network ln(2)ui ?” IEEE Wireless Commun., vol. 18, no. 5, pp. 40–49, Oct. 2011. • [5] A. Sendonaris, E. Erkip, and B. Aazhang, “User cooperation diversity ∀i ∈{1,...,nRN}, where γ = Eb . part I & II- system description / implementation aspects and performance 2) Power Constrained Optimization: According to the sec- analysis,” IEEE Trans. Commun., vol. 51, no. 11, pp. 1927–1948, Nov. ond equation of (30), (64) is equivalent to 2003. [6] A. Nosratinia, T. E. Hunter, and A. Hedayat, “Cooperative Communi- • • • cation in wireless networks,” IEEE Commun. Mag., vol. 42, no. 10, pp. ∇b PRN(Z = Z ) − ∇b R (Z = Z )= 0, (68) Z Z Σ 74–80, Oct. 2004. in the power constrained case. In turn, equality (67b) holds, [7] J. N. Laneman, D. N. C. Tse, and G. W. Wornell, “ • in wireless networks: efficient protocols and outage behavior,” IEEE but where γ = . Trans. Inf. Theory, vol. 50, no. 12, pp. 3062–3080, Dec. 2004. Equation (33) can then be obtained by solving the quadratic [8] J. Vidal et al. (2008-2009) Reconfigurable OFDMA-based Cooperative equation in (67b). Note that the operator [.] is used in (33) NetworKs Enabled by Agile SpecTrum Use (ROCKET). Universitat + Politecnica de Catalunya and others. Barcelona, Spain. [Online]. to reflect the fact that both Z 0 and G 0. Available: http://cordis.europa.eu/project/rcn/85296 en.html [9] C.-L. Chen, W. E. Stark, and S.-G. Chen, “Energy-bandwidth efficiency tradeoff in MIMO multi-hop wireless networks,” IEEE J. Sel. Areas H. Proof for Proposition 8 Commun., vol. 29, no. 8, pp. 1537–1546, Sep. 2011. Proof: Knowing that E in (23), which is a continuous [10] A. Zappone, P. Cao, and E. A. Jorswieck, “Energy efficiency optimiza- b tion in relay-assisted MIMO systems with perfect and statistical CSI,” function, is lower and upper bounded by continuous functions IEEE Trans. Signal Process., vol. 62, no. 2, pp. 443–457, Jan. 2014. having a global minimum, it implies that Eb has also a global [11] F. H´eliot, “Low-complexity energy-efficient joint resource allocation for minimum, E⋆ (note that E can also have local extrema or two-hop MIMO-AF systems,” IEEE Trans. Wireless Commun., vol. 13, b b no. 6, pp. 3088–3099, Jun. 2014. saddle points), such that [12] F. H´eliot and R. Tafazolli, “Optimal energy-efficient joint resource IEEE Trans. Commun. − ⋆ + allocation for multi-hop MIMO-AF systems,” , Eb ≤ Eb ≤ Eb . (69) vol. 64, no. 9, pp. 3655–3668, Sep. 2016. [13] RAN WG1 (Radio layer 1), “Evolved Universal Terrestrial Radio Furthermore, we know from (p.194 of [38]) that any extrema Access (E-UTRA); Physical layer for relaying operation (Release of a differentiable function can only occur at stationary points. 10-14),” The 3rd Generation Partnership Project (3GPP), Technical Specifications 36.216, 2010 onwards. [Online]. Available: http: Hence, the global minimum of Eb must occur at a stationary //www.3gpp.org/dynareport/36216.htm ⋆ point such that ∇GEb(G = G ) = 0. According to the ex- [14] O. Mu˜noz-Medina, J. Vidal, and A. Agust´ın, “Linear transceiver design in nonregenerative relays with channel state information,” IEEE Trans. pression of Eb(G) in (23) and matrix calculus [37], ∇GEb(G) Signal Process., vol. 55, no. 6, pp. 2593–2604, Jun. 2007. is equivalent to (37), where [15] I. Hammerstr¨om and A. Wittneben, “Power allocation schemes for 2 2 amplify-and-forward MIMO-OFDM relay links,” IEEE Trans. Wireless 2σ1 2σ1W † G ¨ G Commun., vol. 6, no. 8, pp. 2798–2802, Aug. 2007. ∇ PΣ(G) = 2 GΥ and ∇ RΣ(G)= 2 H2 σ2 σ2 ln(2) [16] R. Mo and Y. H. Chew, “Precoder design for non-regenerative MIMO −1 relay systems,” IEEE Trans. Wireless Commun., vol. 8, no. 10, pp. 5041– 1/2 −1 × Ω GΥ H2GΥ − Ω(G) H2G . 5049, Oct. 2009. [17] F. H´eliot, R. Hoshyar, and R. Tafazolli, “On the spectral efficiency (70) maximization of nonregenerative cooperative MIMO communication − ⋆ systems,” in Proc. IEEE PIMRC, Istanbul, Turkey, Sep. 2010. Given that Eb ≤ Eb , as it is mentioned in (69), it implies ⋆ [18] Y. Zhang, J. Li, L. Pang, and Z. Ding, “On precoder design for that there exists a unique real nonnegative number ε such that amplify-and-forward MIMO relay systems,” in Proc. IEEE VTC-Fall, ⋆ − ⋆ − ⋆ Eb = Eb +ε and Eb +ε verifies equation (36). In addition, San Fransisco, CA, Sep. 2011. 16

[19] N. Wang and M. Chen, “Joint source and relay precoder design for Fabien Heliot´ (S’05-M’07) received the MSc degree MIMO relay systems with direct link,” in Proc. IEEE WCSP, Huang- in Telecommunications from the Institut Sup´erieur shan, China, Oct. 2012. de l’Electronique et du Num´erique (ISEN), Toulon, [20] Y. Rong, “Optimal joint source and relay beamforming for MIMO relays France, and the PhD degree in Mobile Telecommu- with direct link,” IEEE Commun. Lett., vol. 14, no. 5, pp. 390–392, May nications from King’s College London, in 2002 and 2010. 2006, respectively. He is currently a Lecturer at the [21] C. Xing et al., “A general robust linear transceiver design for multi- Institute for Communication Systems (ICS) of the hop amplify-and-forward MIMO relaying systems,” IEEE Trans. Signal University of Surrey, formerly known as CCSR. He Process., vol. 61, no. 5, pp. 1196–1209, Mar. 2013. has been actively involved in European Commis- [22] L. Sanguinetti, A. A. D’Amico, and Y. Rong, “A tutorial on the sion (EC) funded projects such as FIREWORKS, optimization of amplify-and-forward MIMO relay systems,” IEEE J. ROCKET, SMART-Net, LEXNET projects and in Sel. Areas Commun., vol. 30, no. 8, pp. 1331–1346, Sep. 2012. the award-winning EARTH project. He is currently involved in the 5GIC [23] F. H´eliot and R. Tafazolli, “Optimal energy-efficient source and relay project, a UK funded project on shaping the future of wireless communication. precoding for cooperative MIMO-AF systems,” in in Proc. EuCNC, His research interests include energy efficiency, EM exposure reduction, Oulu, Finland, Jun. 2017. cooperative communication, MIMO, and radio resource management. He [24] R. Hunger et al., “Alternating optimization for MMSE broadcast precod- received an Exemplary Reviewer Award from IEEE COMMUNICATIONS ing,” in Proc. IEEE ICASSP, Toulouse, France, May 2006, pp. 757–760. LETTERS in 2011. [25] Y. Rong and F. Gao, “Optimal beamforming for non-regenerative MIMO relays with direct link,” IEEE Commun. Lett., vol. 13, no. 12, pp. 926– 928, Dec. 2009. [26] Y. Qi, F. H´eliot, and M. A. Imran, “Green relay techniques in cellular systems,” in Green communications and networking, F. R. Yu, X. Zhang, and V. C. Leung, Eds. Boca Raton, FL: CRC press, Dec. 2012, ch. 3. [27] G. Miao, N. Himayat, and G. Y. Li, “Energy-efficient link adaptation in frequency-selective channels,” IEEE Trans. Commun., vol. 58, no. 2, pp. 545–554, Feb. 2010. [28] F. H´eliot, M. A. Imran, and R. Tafazolli, “On the energy efficiency- spectral efficiency trade-off over the MIMO Rayleigh fading channel,” IEEE Trans. Commun., vol. 60, no. 5, pp. 1345–1356, May 2012. [29] S. Verdu, “Spectral efficiency in the wideband regime,” IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1319–1343, Jun. 2002. [30] F. H´eliot, E. Katranaras, O. Onireti, and M. A. Imran, “On the energy efficiency-spectral efficiency trade-off in cellular systems,” in Green communications theoretical fundamentals algorithms and applications, J. Wu, S. Rangan, and H. Zhang, Eds. Boca Raton, FL: CRC press, Nov. 2012, ch. 17. [31] F. H´eliot, M. A. Imran, and R. Tafazolli, “Low-complexity energy- efficient coordinated resource allocation in cellular systems,” IEEE Trans. Commun., vol. 61, no. 6, pp. 2271–2281, Jun. 2013. [32] M. Avriel, Nonlinear Programming: Analysis and Methods. New-York, Dover Publications, Sep. 2003. [33] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK: Rahim Tafazolli (SM’09) is a professor and the Cambridge Univ. Press, Mar. 2004. Director of the Institute for Communication Systems [34] E. W. Weisstein, “Newton’s method.” MathWorld–A Wolfram (ICS) and Innovation Centre (5GIC), University Web Resource. [Online]. Available: http://mathworld.wolfram.com/ of Surrey, United Kingdom. He has been active in NewtonsMethod.html research for more than 25 years and has authored [35] R. S. Prabhu and B. Daneshrad, “Energy-efficient power loading for a and co-authored more than 800 papers in refer- MIMO-SVD system and its performance in flat fading,” in Proc. IEEE eed international journals and conferences. Professor Globecom, Miami, FL, Dec. 2010. Tafazolli is a consultant to many mobile companies, [36] T. Yoo and A. Goldsmith, “Capacity and power allocation for fading has lectured, chaired, and been invited as keynote MIMO channels with channel estimation error,” IEEE Trans. Inf. Theory, speaker to a number of IET and IEEE workshops vol. 52, no. 5, pp. 2203–2214, May 2006. and conferences. He has been Technical Advisor to [37] K. B. Petersen and M. S. Pedersen, “The matrix cookbook,” nov 2012. many mobile companies, all in the field of mobile communications. He is the [Online]. Available: http://www2.imm.dtu.dk/pubdb/p.php?3274 Founder and past Chairman of IET International Conference on 3rd Generation [38] M. Golden, Mathematical Methods for Neural Network Analysis and Mobile Communications. He is a Fellow of the IET and WWRF (Wireless Design. Cambridge, MA: MIT Press, Dec. 1996. World Research Forum). He is Chairman of EU Expert Group on Net!Works Technology Platform.