The Nearest Generalized Doubly Stochastic Matrix to a Real Matrix with the Same First and Second Moments

William Glunt$^a$, Thomas L. Hayden$^b$, and Robert Reams$^{c*}$

$^a$Department of Mathematics and Computer Science, Austin Peay State University, Clarksville, TN 37044
$^b$Department of Mathematics, University of Kentucky, Lexington, KY 40506
$^c$Department of Mathematics, Virginia Commonwealth University, 1001 West Main Street, Richmond, VA 23284

*Correspondence to R. Reams, [email protected], Phone: (804) 828-5576, Fax: (804) 828-8785

Abstract. Let $T$ be an arbitrary $n \times n$ matrix with real entries. We explicitly find the closest (in Frobenius norm) matrix $A$ to $T$, where $A$ is $n \times n$ with real entries, subject to the condition that $A$ is "generalized doubly stochastic" (i.e. $Ae = e$ and $e^T A = e^T$, where $e = (1, 1, \ldots, 1)^T$, although $A$ is not necessarily nonnegative) and $A$ has the same first moment as $T$ (i.e. $e_1^T A e_1 = e_1^T T e_1$). We also explicitly find the closest matrix $A$ to $T$ when $A$ is generalized doubly stochastic, has the same first moment as $T$, and has the same second moment as $T$ (i.e. $e_1^T A^2 e_1 = e_1^T T^2 e_1$), when such a matrix $A$ exists.

Key words: doubly stochastic, generalized doubly stochastic, moments, nearest matrix, closest matrix, Frobenius norm.

MSC: 15A51, 65K05, 90C25

1 Introduction

Let $e \in \mathbb{R}^n$ be the vector of all ones, i.e. $e = (1, 1, \ldots, 1)^T$, and let $e_i \in \mathbb{R}^n$ denote the vector with a 1 in the $i$th position and zeroes elsewhere. An $n \times n$ matrix $A$ with real entries is said to be generalized doubly stochastic if $Ae = e$ and $e^T A = e^T$. A generalized doubly stochastic matrix does not necessarily have nonnegative entries, unlike a doubly stochastic matrix, which has all entries nonnegative. The $k$th moment of $A$ is defined as $e_1^T A^k e_1$. Let $T = (t_{ij}) \in \mathbb{R}^{n \times n}$ be an arbitrary given matrix. We will say that a matrix $A \in \mathbb{R}^{n \times n}$ is $M_k$ if $e_1^T A^k e_1 = e_1^T T^k e_1$, where $k$ is a positive integer.

We use the convention that $\| \cdot \|$ refers to either the Frobenius norm $\| \cdot \|_F$ or the vector 2-norm $\| \cdot \|_2$, with the context determining which is intended. Frequent use is made of the fact that if $x, y \in \mathbb{R}^n$ are unit vectors we can find a Householder matrix $Q \in \mathbb{R}^{n \times n}$ such that $Qx = y$ [5].

We will determine the closest (in Frobenius norm) matrix $A$ to $T$, subject to the conditions that $A$ is generalized doubly stochastic and has the same first and second moments as $T$. The motivation for this problem comes from an application in [2], where it is desired to approximate a certain matrix $T$ arising from a linear system corresponding to a large linear network, subject to the approximating matrix satisfying certain conditions. We outlined these applications in [3] and (among other things there) used a computational algorithm to find the closest matrix $A$ to $T$, subject to $A$ being generalized doubly stochastic and $M_1$, or subject to $A$ being doubly stochastic and $M_1$. See [2] and [3] for more details about the applications. Our extended work here takes a purely analytic approach to the problem, includes the second moment condition, and explicitly finds the closest matrix which is generalized doubly stochastic, $M_1$ and $M_2$, having dropped the requirement that the closest matrix be nonnegative. It is worth emphasizing that despite dropping the nonnegativity requirement our solution is still relevant to the original problem. Previous approaches to this problem, in both [3] and [8], did not include the second moment, although they did include the nonnegativity condition. A survey of matrix nearness problems and their applications, which include areas such as control theory and statistics, was given by Higham [4]; see also [7].
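Throughout, we illustrate the constructions with short Python/NumPy sketches. These are our own illustrative additions (the paper's argument is purely analytic), and all helper names in them are ours. The first sketch implements the Householder fact just cited: for unit vectors $x \neq y$, the matrix $Q = I - 2vv^T/v^Tv$ with $v = x - y$ is symmetric, orthogonal, and maps $x$ to $y$.

```python
import numpy as np

def householder(x, y):
    """Return a symmetric orthogonal Q with Qx = y, for unit vectors x, y."""
    v = x - y
    if np.allclose(v, 0):
        return np.eye(len(x))                 # x = y: the identity already works
    return np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)

n = 5
e = np.ones(n)
en = np.eye(n)[:, -1]                         # e_n
Q = householder(en, e / np.sqrt(n))           # the map used throughout: e_n -> e/sqrt(n)
assert np.allclose(Q @ en, e / np.sqrt(n))
assert np.allclose(Q @ Q.T, np.eye(n))        # Q is orthogonal
```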

2 The closest generalized doubly stochastic matrix

Although Theorem 1 and Corollary 2 resemble results proved in [3] and [6], the results herein present a different formulation. We include them for clarity and because we will use them later.

Theorem 1. Let $A \in \mathbb{R}^{n \times n}$ and let $Q \in \mathbb{R}^{n \times n}$ be an orthogonal matrix so that $Qe_n = \frac{1}{\sqrt{n}} e$. Then $A$ is generalized doubly stochastic if and only if
$$A = Q \begin{pmatrix} A_1 & 0 \\ 0^T & 1 \end{pmatrix} Q^T,$$
for any $A_1 \in \mathbb{R}^{(n-1) \times (n-1)}$.

2 T T T T T T Proof Ae = e if and only Q AQen = en, and e A = e if and only if en Q AQ = en . Theorem 1 enables us to easily incorporate the condition that A is generalized doubly stochastic and in Corollary 2 find the closest such matrix to T .

Corollary 2. Let $T \in \mathbb{R}^{n \times n}$ and $Q^T T Q = \begin{pmatrix} T_1 & t_2 \\ t_3^T & t_4 \end{pmatrix}$, where $T_1 \in \mathbb{R}^{(n-1) \times (n-1)}$, $t_2, t_3 \in \mathbb{R}^{n-1}$, $t_4 \in \mathbb{R}$, and where $Q \in \mathbb{R}^{n \times n}$ is as in Theorem 1. Then the generalized doubly stochastic matrix $A \in \mathbb{R}^{n \times n}$ given by
$$A = Q \begin{pmatrix} T_1 & 0 \\ 0^T & 1 \end{pmatrix} Q^T$$
satisfies the inequality $\|A - T\| \le \|Z - T\|$ among all generalized doubly stochastic matrices $Z \in \mathbb{R}^{n \times n}$.

Proof. The Frobenius norm is invariant under orthogonal similarity, so $\|A - T\|^2 = \|A_1 - T_1\|^2 + \|t_2\|^2 + \|t_3\|^2 + (1 - t_4)^2$, and $A$ is minimal among generalized doubly stochastic matrices when $A_1 = T_1$.
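A hypothetical implementation of Corollary 2 (the function name `nearest_gds` is ours): transform $T$, overwrite the blocks $t_2$, $t_3$, $t_4$, and transform back.

```python
import numpy as np

def nearest_gds(T):
    """Closest (Frobenius norm) generalized doubly stochastic matrix to T."""
    n = T.shape[0]
    e, I = np.ones(n), np.eye(n)
    v = I[:, -1] - e / np.sqrt(n)
    Q = I - 2.0 * np.outer(v, v) / (v @ v)    # Q e_n = e/sqrt(n)
    B = Q.T @ T @ Q
    B[:, -1] = 0.0                            # discard t_2
    B[-1, :] = 0.0                            # discard t_3^T and t_4
    B[-1, -1] = 1.0                           # trailing 1 from Theorem 1
    return Q @ B @ Q.T

T = np.random.default_rng(1).standard_normal((7, 7))
A = nearest_gds(T)
e = np.ones(7)
assert np.allclose(A @ e, e) and np.allclose(e @ A, e)
```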

3 The closest generalized doubly stochastic matrix which is $M_1$

For $A \in \mathbb{R}^{m \times m}$ and $B \in \mathbb{R}^{n \times n}$, the direct sum of $A$ and $B$, denoted $A \oplus B$, is the $(m+n) \times (m+n)$ matrix $\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$.

We now construct an orthogonal matrix $Q$ with an additional desirable property which we need for the first and second moments. Let $Q_1 \in \mathbb{R}^{n \times n}$ be an orthogonal matrix such that $Q_1 e_n = \frac{1}{\sqrt{n}} e$, with $Q_1^T e_1 = \begin{pmatrix} u \\ \beta \end{pmatrix}$, for some $u \in \mathbb{R}^{n-1}$, $\beta \in \mathbb{R}$. Let $Q_2 \in \mathbb{R}^{(n-1) \times (n-1)}$ be an orthogonal matrix such that $Q_2^T u = \begin{pmatrix} 0 \\ \alpha \end{pmatrix}$, where $\alpha = \|u\|$, and let $Q = Q_1 (Q_2 \oplus 1)$. Then $Q e_n = Q_1 (Q_2 \oplus 1) e_n = Q_1 e_n = \frac{1}{\sqrt{n}} e$ as in Section 2, and we have the additional property that
$$Q^T e_1 = (Q_2^T \oplus 1) Q_1^T e_1 = (Q_2^T \oplus 1) \begin{pmatrix} u \\ \beta \end{pmatrix} = \begin{pmatrix} 0 \\ \alpha \\ \beta \end{pmatrix}.$$
Note that if $u = 0$, then $Q_1^T e_1 = \beta e_n$, so $e_1 = \beta Q_1 e_n = \beta \frac{1}{\sqrt{n}} e$, which is not possible, so we must have $\alpha \neq 0$.

The proof of Theorem 3 will use the fact, as stated in the introduction, that if $T = (t_{ij}) \in \mathbb{R}^{n \times n}$ is the matrix to be approximated, then saying $A \in \mathbb{R}^{n \times n}$ is $M_1$ means $e_1^T A e_1 = e_1^T T e_1 = t_{11}$. This theorem gives the form of a matrix $A \in \mathbb{R}^{n \times n}$ that is both generalized doubly stochastic and $M_1$.
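The construction above translates directly into code. The following sketch builds $Q = Q_1(Q_2 \oplus 1)$ and checks both properties; the helper names `householder` and `moment_frame` are ours.

```python
import numpy as np

def householder(x, y):
    """Symmetric orthogonal H with Hx = y, for unit vectors x, y."""
    v = x - y
    if np.allclose(v, 0):
        return np.eye(len(x))
    return np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)

def moment_frame(n):
    """Orthogonal Q with Q e_n = e/sqrt(n) and Q^T e_1 = (0,...,0,alpha,beta)^T."""
    e, I = np.ones(n), np.eye(n)
    Q1 = householder(I[:, -1], e / np.sqrt(n))         # Q_1 e_n = e/sqrt(n)
    q = Q1.T @ I[:, 0]                                 # Q_1^T e_1 = (u, beta)^T
    u, beta = q[:-1], q[-1]
    alpha = np.linalg.norm(u)                          # alpha != 0, as shown above
    Q2 = householder(u / alpha, np.eye(n - 1)[:, -1])  # Q_2^T u = (0,...,0,alpha)^T
    Q = Q1 @ np.block([[Q2, np.zeros((n - 1, 1))],     # Q = Q_1 (Q_2 (+) 1)
                       [np.zeros((1, n - 1)), 1.0]])
    return Q, alpha, beta

Q, alpha, beta = moment_frame(6)
target = np.zeros(6)
target[-2:] = alpha, beta
assert np.allclose(Q @ np.eye(6)[:, -1], np.ones(6) / np.sqrt(6))  # Q e_n = e/sqrt(n)
assert np.allclose(Q.T @ np.eye(6)[:, 0], target)                  # Q^T e_1 = (0, alpha, beta)^T
```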


Theorem 3. Let $A \in \mathbb{R}^{n \times n}$ and let $Q \in \mathbb{R}^{n \times n}$ be an orthogonal matrix such that $Qe_n = \frac{1}{\sqrt{n}} e$ and $Q^T e_1 = \begin{pmatrix} 0 \\ \alpha \\ \beta \end{pmatrix}$, where $\alpha, \beta \in \mathbb{R}$. Then $A \in \mathbb{R}^{n \times n}$ is generalized doubly stochastic and $M_1$ if and only if
$$A = Q \begin{pmatrix} A_1 & a_2 & 0 \\ a_3^T & \frac{t_{11} - \beta^2}{\alpha^2} & 0 \\ 0^T & 0 & 1 \end{pmatrix} Q^T,$$
for any $A_1 \in \mathbb{R}^{(n-2) \times (n-2)}$ and $a_2, a_3 \in \mathbb{R}^{n-2}$.

Proof. The matrix $A$ is both generalized doubly stochastic and $M_1$ if and only if
$$t_{11} = e_1^T Q \begin{pmatrix} A_1 & a_2 & 0 \\ a_3^T & a_4 & 0 \\ 0^T & 0 & 1 \end{pmatrix} Q^T e_1 = a_4 \alpha^2 + \beta^2,$$
where $A = Q \begin{pmatrix} A_1 & a_2 & 0 \\ a_3^T & a_4 & 0 \\ 0^T & 0 & 1 \end{pmatrix} Q^T$ from Theorem 1.

Theorem 3 gives us the means to now find the closest matrix to $T$ which is both generalized doubly stochastic and $M_1$.
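First, a quick numerical check of Theorem 3 (a sketch assuming `moment_frame` from the previous code block is in scope): fixing the middle diagonal entry at $(t_{11} - \beta^2)/\alpha^2$ matches the first moment, whatever $A_1$, $a_2$, $a_3$ are.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
T = rng.standard_normal((n, n))
Q, alpha, beta = moment_frame(n)
a4 = (T[0, 0] - beta**2) / alpha**2          # forced by M_1, per the proof
B = np.block([[rng.standard_normal((n - 2, n - 2)),    # any A_1
               rng.standard_normal((n - 2, 1)),        # any a_2
               np.zeros((n - 2, 1))],
              [rng.standard_normal((1, n - 2)),        # any a_3^T
               a4, 0.0],
              [np.zeros((1, n - 2)), 0.0, 1.0]])
A = Q @ B @ Q.T
assert np.isclose(A[0, 0], T[0, 0])          # e_1^T A e_1 = t_11
assert np.allclose(A @ np.ones(n), np.ones(n))
```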

Corollary 4. Let $T \in \mathbb{R}^{n \times n}$ and
$$Q^T T Q = \begin{pmatrix} T_1 & t_2 & t_5 \\ t_3^T & t_4 & t_6 \\ t_7^T & t_8 & t_9 \end{pmatrix},$$
where $T_1 \in \mathbb{R}^{(n-2) \times (n-2)}$, $t_2, t_3, t_5, t_7 \in \mathbb{R}^{n-2}$, $t_4, t_6, t_8, t_9 \in \mathbb{R}$, and where $Q \in \mathbb{R}^{n \times n}$ is an orthogonal matrix such that $Qe_n = \frac{1}{\sqrt{n}} e$ and $Q^T e_1 = \begin{pmatrix} 0 \\ \alpha \\ \beta \end{pmatrix}$, where $\alpha, \beta \in \mathbb{R}$. Then the generalized doubly stochastic and $M_1$ matrix $A$ given by
$$A = Q \begin{pmatrix} T_1 & t_2 & 0 \\ t_3^T & \frac{t_{11} - \beta^2}{\alpha^2} & 0 \\ 0^T & 0 & 1 \end{pmatrix} Q^T$$
is the closest matrix to $T$ in the sense that $\|A - T\| \le \|Z - T\|$ for all generalized doubly stochastic and $M_1$ matrices $Z \in \mathbb{R}^{n \times n}$.

Proof. As in the proof of Corollary 2, since the Frobenius norm is invariant under orthogonal similarity, we have that
$$\|A - T\|^2 = \|A_1 - T_1\|^2 + \|a_2 - t_2\|^2 + \|a_3 - t_3\|^2 + \left( \frac{t_{11} - \beta^2}{\alpha^2} - t_4 \right)^2 + \|t_5\|^2 + t_6^2 + \|t_7\|^2 + t_8^2 + (1 - t_9)^2,$$
and $A$ is minimal when $A_1 = T_1$, $a_2 = t_2$, and $a_3 = t_3$.
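A hypothetical implementation of Corollary 4 (again assuming `moment_frame` from the Section 3 sketch is in scope): keep $T_1$, $t_2$, $t_3$, and overwrite the remaining blocks.

```python
import numpy as np

def nearest_gds_m1(T):
    """Closest generalized doubly stochastic M_1 matrix to T (Frobenius norm)."""
    n = T.shape[0]
    Q, alpha, beta = moment_frame(n)
    B = Q.T @ T @ Q
    B[:, -1] = 0.0                                     # discard t_5, t_6, t_9
    B[-1, :] = 0.0                                     # discard t_7^T, t_8
    B[-1, -1] = 1.0
    B[n - 2, n - 2] = (T[0, 0] - beta**2) / alpha**2   # t_4 -> (t11 - beta^2)/alpha^2
    return Q @ B @ Q.T

T = np.random.default_rng(3).standard_normal((6, 6))
A = nearest_gds_m1(T)
e = np.ones(6)
assert np.allclose(A @ e, e) and np.allclose(e @ A, e)
assert np.isclose(A[0, 0], T[0, 0])                    # first moments agree
```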

4 The closest generalized doubly stochastic matrix which is $M_1$ and $M_2$

Similar reasoning to that given in Section 3 can be used to find necessary and sufficient conditions for a matrix to be generalized doubly stochastic, $M_1$ and $M_2$. However, for finding a nearest point in Corollary 6, a difficulty comes from the fact that the set of $M_2$ matrices is not a convex set, so we do not necessarily expect there to be a unique nearest point [1]. If there is such a nearest point, however, then it will be determined by the conditions given in Corollary 6.

Theorem 5. Let $A \in \mathbb{R}^{n \times n}$ and let $Q \in \mathbb{R}^{n \times n}$ be an orthogonal matrix such that $Qe_n = \frac{1}{\sqrt{n}} e$ and $Q^T e_1 = \begin{pmatrix} 0 \\ \alpha \\ \beta \end{pmatrix}$, where $\alpha, \beta \in \mathbb{R}$. Then $A \in \mathbb{R}^{n \times n}$ is generalized doubly stochastic, $M_1$ and $M_2$ if and only if
$$A = Q \begin{pmatrix} A_1 & x & 0 \\ y^T & \frac{t_{11} - \beta^2}{\alpha^2} & 0 \\ 0^T & 0 & 1 \end{pmatrix} Q^T,$$
for any $A_1 \in \mathbb{R}^{(n-2) \times (n-2)}$ and any $x, y \in \mathbb{R}^{n-2}$ such that
$$x^T y = \frac{e_1^T T^2 e_1 - \beta^2}{\alpha^2} - \left( \frac{t_{11} - \beta^2}{\alpha^2} \right)^2.$$

Proof. $A$ is generalized doubly stochastic, $M_1$ and $M_2$ if and only if $A$ both satisfies Theorem 3 and satisfies the second moment condition that $e_1^T T^2 e_1 = e_1^T A^2 e_1$. However,
$$e_1^T A^2 e_1 = e_1^T Q (Q^T A Q)^2 Q^T e_1 = \begin{pmatrix} 0 & \alpha & \beta \end{pmatrix} \begin{pmatrix} A_1 & x & 0 \\ y^T & \frac{t_{11} - \beta^2}{\alpha^2} & 0 \\ 0^T & 0 & 1 \end{pmatrix}^2 \begin{pmatrix} 0 \\ \alpha \\ \beta \end{pmatrix}$$
$$= \begin{pmatrix} 0 & \alpha & \beta \end{pmatrix} \begin{pmatrix} * & * & 0 \\ * & y^T x + \left( \frac{t_{11} - \beta^2}{\alpha^2} \right)^2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ \alpha \\ \beta \end{pmatrix} = \alpha^2 \left[ y^T x + \left( \frac{t_{11} - \beta^2}{\alpha^2} \right)^2 \right] + \beta^2,$$
and then substituting $e_1^T T^2 e_1$ and rearranging gives the result.

Preparing for the proof of Corollary 6, and arguing similarly to the preceding corollaries, since the Frobenius norm is invariant under orthogonal similarity we calculate that
$$\|A - T\|^2 = \|A_1 - T_1\|^2 + \|x - t_2\|^2 + \|y - t_3\|^2 + \left( \frac{t_{11} - \beta^2}{\alpha^2} - t_4 \right)^2 + \|t_5\|^2 + t_6^2 + \|t_7\|^2 + t_8^2 + (1 - t_9)^2.$$
Now $A$ is minimal among generalized doubly stochastic, $M_1$ and $M_2$ matrices when $A_1 = T_1$, and when $x$ and $y$ are such that $\|x - t_2\|^2 + \|y - t_3\|^2$ is minimized subject to the constraint
$$x^T y = \frac{e_1^T T^2 e_1 - \beta^2}{\alpha^2} - \left( \frac{t_{11} - \beta^2}{\alpha^2} \right)^2.$$
Thus we have our final corollary.
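A numerical check of this constraint (a sketch assuming `moment_frame` from the Section 3 sketch is in scope): any $x, y$ with $x^T y = r$ reproduce the second moment exactly, where $r$ denotes the right-hand side above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
T = rng.standard_normal((n, n))
Q, alpha, beta = moment_frame(n)
t11, m2 = T[0, 0], (T @ T)[0, 0]             # m2 = e_1^T T^2 e_1
a4 = (t11 - beta**2) / alpha**2
r = (m2 - beta**2) / alpha**2 - a4**2

x = rng.standard_normal(n - 2)
y = rng.standard_normal(n - 2)
y *= r / (x @ y)                             # rescale so x^T y = r (generic x, y)
B = np.block([[rng.standard_normal((n - 2, n - 2)), x[:, None], np.zeros((n - 2, 1))],
              [y[None, :], a4, 0.0],
              [np.zeros((1, n - 2)), 0.0, 1.0]])
A = Q @ B @ Q.T
assert np.isclose(A[0, 0], t11)              # A is M_1
assert np.isclose((A @ A)[0, 0], m2)         # and M_2
```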

Corollary 6. Let $T \in \mathbb{R}^{n \times n}$ and
$$Q^T T Q = \begin{pmatrix} T_1 & t_2 & t_5 \\ t_3^T & t_4 & t_6 \\ t_7^T & t_8 & t_9 \end{pmatrix},$$
where $T_1 \in \mathbb{R}^{(n-2) \times (n-2)}$, $t_2, t_3, t_5, t_7 \in \mathbb{R}^{n-2}$, $t_4, t_6, t_8, t_9 \in \mathbb{R}$, and where $Q \in \mathbb{R}^{n \times n}$ is an orthogonal matrix such that $Qe_n = \frac{1}{\sqrt{n}} e$ and $Q^T e_1 = \begin{pmatrix} 0 \\ \alpha \\ \beta \end{pmatrix}$, where $\alpha, \beta \in \mathbb{R}$. Then the generalized doubly stochastic, $M_1$ and $M_2$ matrix $A$ given by
$$A = Q \begin{pmatrix} T_1 & x & 0 \\ y^T & \frac{t_{11} - \beta^2}{\alpha^2} & 0 \\ 0^T & 0 & 1 \end{pmatrix} Q^T$$
satisfies the requirement that $\|A - T\| \le \|Z - T\|$ for all generalized doubly stochastic, $M_1$ and $M_2$ matrices $Z \in \mathbb{R}^{n \times n}$, where $x$ and $y$ have been chosen (where possible) so as to minimize $\|x - t_2\|^2 + \|y - t_3\|^2$ subject to the constraint
$$x^T y = \frac{e_1^T T^2 e_1 - \beta^2}{\alpha^2} - \left( \frac{t_{11} - \beta^2}{\alpha^2} \right)^2.$$

Proof. For convenience we write $c = t_2$, $d = t_3$ and
$$r = \frac{e_1^T T^2 e_1 - \beta^2}{\alpha^2} - \left( \frac{t_{11} - \beta^2}{\alpha^2} \right)^2.$$
It remains for us to solve the problem
$$\min_{x^T y = r} \; \|x - c\|^2 + \|y - d\|^2,$$
for which we find the Kuhn-Tucker conditions to be $x = c - \mu y$ and $y = d - \mu x$, where $\mu$ is the Lagrange multiplier. We solve simultaneously $x^T y = r$, $x + \mu y = c$ and $\mu x + y = d$. The latter two equations imply $c - \mu d = (1 - \mu^2) x$ and $d - \mu c = (1 - \mu^2) y$, which imply $(1 - \mu^2)^2 x^T y = (c - \mu d)^T (d - \mu c)$, and with the first equation this implies
$$(1 - \mu^2)^2 r = c^T d (1 + \mu^2) - \mu (\|c\|^2 + \|d\|^2).$$
If this quartic equation has no real solution $\mu$ then there is no minimum.

Case 1: If each real solution $\mu \neq \pm 1$ then for each such $\mu$
$$x = \frac{c - \mu d}{1 - \mu^2} \quad \text{and} \quad y = \frac{d - \mu c}{1 - \mu^2},$$
and we check to see which pair $x, y$ gives a minimum.

Case 2: If $\mu = 1$ and $c = d$ then we must solve $x + y = c$ and $x^T y = r$, i.e. $x^T (x - c) = -r$, which by completing the square becomes $(x - \frac{c}{2})^T (x - \frac{c}{2}) = \|\frac{c}{2}\|^2 - r$. If $\|\frac{c}{2}\|^2 - r < 0$ there is no minimum in this case. Note that since $x + y = c$ we have
$$\|x - c\|^2 + \|y - c\|^2 = \|x + y\|^2 - 2 x^T y = \|c\|^2 - 2r.$$
This is a constant, so if $\|\frac{c}{2}\|^2 - r \ge 0$ we can take, for any $w \in \mathbb{R}^{n-2}$ with $w \neq 0$,
$$x = \frac{c}{2} + \sqrt{\left\|\tfrac{c}{2}\right\|^2 - r} \, \frac{w}{\|w\|} \quad \text{and} \quad y = \frac{c}{2} - \sqrt{\left\|\tfrac{c}{2}\right\|^2 - r} \, \frac{w}{\|w\|}.$$

Case 3: If $\mu = -1$ and $c = -d$ then a similar calculation shows that if $\|\frac{c}{2}\|^2 + r < 0$ there is no minimum. Whereas if $\|\frac{c}{2}\|^2 + r \ge 0$ we can take, for any $w \in \mathbb{R}^{n-2}$ with $w \neq 0$,
$$x = \frac{c}{2} + \sqrt{\left\|\tfrac{c}{2}\right\|^2 + r} \, \frac{w}{\|w\|} \quad \text{and} \quad y = -\frac{c}{2} + \sqrt{\left\|\tfrac{c}{2}\right\|^2 + r} \, \frac{w}{\|w\|}.$$

Acknowledgements

The second author received partial support from NSF grant CHE-9005960.

References

[1] S.K. Berberian, Introduction to Hilbert Space, Oxford University Press, New York, (1961)

[2] P. Feldmann, R. W. Freund, Efficient Linear Circuit Analysis by Padé Approximation via the Lanczos Process, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 14(5)(1995), pp. 639–649.

[3] W. Glunt, T.L. Hayden, R. Reams, The Nearest "Doubly Stochastic" Matrix to a Real Matrix with the same First Moment, Numerical Linear Algebra with Applications, 5(1)(1999), pp. 1–8.

[4] N.J. Higham, Matrix nearness problems and applications, Applications of Matrix Theory, pp. 1–27, The Institute of Mathematics and its Applications Conference Series. New Series. 22 (1989), Oxford University Press, New York.

[5] R. A. Horn, C. R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge (1985)

[6] R. N. Khoury, Closest Matrices in the Space of Generalized Doubly Stochastic Matrices, Journal of Mathematical Analysis and Applications, 222 (1998), pp. 562–568.

[7] A.W. Marshall, I. Olkin, Inequalities: Theory of Majorization and its Applications, Academic Press, New York (1979)

[8] Zheng-Jian Bai, Delin Chu, Roger C. E. Tan, Computing the nearest doubly stochastic matrix with a prescribed entry, SIAM Journal on Scientific Computing, 29(2) (2007), pp. 635–655.
