SIAM J. Numer. Anal. © 2016 Society for Industrial and Applied Mathematics. Vol. 54, No. 4, pp. 2015–2035
A GEOMETRIC NONLINEAR CONJUGATE GRADIENT METHOD FOR STOCHASTIC INVERSE EIGENVALUE PROBLEMS∗
ZHI ZHAO†, XIAO-QING JIN‡, AND ZHENG-JIAN BAI§
Abstract. In this paper, we focus on the stochastic inverse eigenvalue problem of reconstructing a stochastic matrix from the prescribed spectrum. We directly reformulate the stochastic inverse eigenvalue problem as a constrained optimization problem over several matrix manifolds to minimize the distance between isospectral matrices and stochastic matrices. Then we propose a geometric Polak–Ribière–Polyak-based nonlinear conjugate gradient method for solving the constrained optimization problem. The global convergence of the proposed method is established. Our method can also be extended to the stochastic inverse eigenvalue problem with prescribed entries. An extra advantage is that our models yield new isospectral flow methods. Finally, we report some numerical tests to illustrate the efficiency of the proposed method for solving the stochastic inverse eigenvalue problem and the case of prescribed entries.
Key words. inverse eigenvalue problem, stochastic matrix, geometric nonlinear conjugate gradient method, oblique manifold, isospectral flow method
AMS subject classifications. 65F18, 65F15, 15A18, 65K05, 90C26, 90C48

DOI. 10.1137/140992576
1. Introduction. An $n$-by-$n$ real matrix $A$ is called a nonnegative matrix if all its entries are nonnegative, i.e., $A_{ij} \ge 0$ for all $i, j = 1, \ldots, n$, where $A_{ij}$ denotes the $(i,j)$th entry of $A$. An $n$-by-$n$ real matrix $A$ is called a (row) stochastic matrix if it is nonnegative and the sum of the entries in each row equals one, i.e., $A_{ij} \ge 0$ for all $i, j = 1, \ldots, n$ and $\sum_{j=1}^{n} A_{ij} = 1$ for all $i = 1, \ldots, n$. Stochastic matrices arise in various applications such as probability theory, statistics, economics, computer science, physics, chemistry, and population genetics. See, for instance, [6, 13, 15, 20, 25, 32, 35] and the references therein.
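The two defining conditions are easy to check numerically. A minimal sketch in NumPy (the helper name is ours, not from the paper):

```python
import numpy as np

def is_row_stochastic(A, tol=1e-12):
    """Check the two defining conditions: A_ij >= 0 and each row sums to 1."""
    A = np.asarray(A, dtype=float)
    return bool(np.all(A >= -tol)) and np.allclose(A.sum(axis=1), 1.0, atol=tol)

A = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.0, 1.0]])
print(is_row_stochastic(A))      # True: nonnegative, rows sum to 1
print(is_row_stochastic(2 * A))  # False: rows sum to 2
```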
In this paper, we consider the stochastic inverse eigenvalue problem (StIEP) and the stochastic inverse eigenvalue problem with prescribed entries (StIEP-PE) as follows:

StIEP. Given a self-conjugate set of complex numbers $\{\lambda_1^*, \lambda_2^*, \ldots, \lambda_n^*\}$, find an $n$-by-$n$ stochastic matrix $C$ such that its eigenvalues are $\lambda_1^*, \lambda_2^*, \ldots, \lambda_n^*$.

StIEP-PE. Given a self-conjugate set of complex numbers $\{\lambda_1^*, \lambda_2^*, \ldots, \lambda_n^*\}$, an index subset $\mathcal{L} \subset \mathcal{N} := \{(i,j) \mid i, j = 1, \ldots, n\}$, and a certain $n$-by-$n$ nonnegative matrix $C_a$ with $(C_a)_{ij} = 0$ for $(i,j) \notin \mathcal{L}$ (i.e., the set of nonnegative numbers $\{(C_a)_{ij} \mid (i,j) \in \mathcal{L}\}$ is prescribed), find an $n$-by-$n$ stochastic matrix $C$ such that its eigenvalues are $\lambda_1^*, \lambda_2^*, \ldots, \lambda_n^*$ and $C_{ij} = (C_a)_{ij}$ for all $(i,j) \in \mathcal{L}$.

∗ Received by the editors October 22, 2014; accepted for publication (in revised form) March 31, 2016; published electronically July 6, 2016. http://www.siam.org/journals/sinum/54-4/99257.html
† Department of Mathematics, School of Sciences, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China ([email protected]).
‡ Department of Mathematics, University of Macau, Taipa, Macau, People's Republic of China ([email protected]). The research of this author was supported by the research grant MYRG098(Y3-L3)-FST13-JXQ from the University of Macau.
§ Corresponding author. School of Mathematical Sciences, Xiamen University, Xiamen 361005, People's Republic of China ([email protected]). The research of this author was partially supported by the National Natural Science Foundation of China (11271308), the Natural Science Foundation of Fujian Province of China (2016J01035), and the Fundamental Research Funds for the Central Universities (20720150001).
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
As stated in [11], the StIEP aims to construct a Markov chain (i.e., a stochastic matrix) from the given spectral property. The StIEP is a special nonnegative inverse eigenvalue problem which aims to find a nonnegative matrix from the given spectrum. These two problems are related to each other by the following result [19, p. 142].
Theorem 1.1. If $A$ is a nonnegative matrix with positive maximal eigenvalue $r$ and a positive maximal eigenvector $a$, then $r^{-1}D^{-1}AD$ is a stochastic matrix, where $D := \mathrm{Diag}(a)$ denotes the diagonal matrix with the vector $a$ on its diagonal.
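Theorem 1.1 can be illustrated numerically. A sketch assuming an irreducible nonnegative $A$ with a simple Perron root, so the eigenvector can be rescaled to be positive (the function name is ours):

```python
import numpy as np

def to_stochastic(A):
    """Theorem 1.1: for nonnegative A with Perron eigenvalue r and positive
    Perron eigenvector a, r^{-1} D^{-1} A D is stochastic, D = Diag(a)."""
    w, U = np.linalg.eig(A)
    k = np.argmax(w.real)            # Perron root of a nonnegative matrix
    r = w[k].real
    a = U[:, k].real
    a = a / a[np.argmax(np.abs(a))]  # rescale so the eigenvector is positive
    return (A * a[np.newaxis, :] / a[:, np.newaxis]) / r  # D^{-1} A D / r

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])           # eigenvalues 4 and -1, eigenvector (2, 3)
C = to_stochastic(A)
print(np.allclose(C, [[0.25, 0.75], [0.5, 0.5]]))                # True
print(np.allclose(sorted(np.linalg.eigvals(C).real), [-0.25, 1.0]))  # True: spectrum of A scaled by 1/r
```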
Theorem 1.1 shows that one may first construct a nonnegative matrix with the prescribed spectrum and then obtain a stochastic matrix with the desired spectrum by a diagonal similarity transformation. In many applications (see, for instance, [6, 32, 35]), the underlying structure of a desired stochastic matrix is often characterized by the prescribed entries. The corresponding inverse problem can be described as the StIEP-PE. However, the StIEP-PE may not be solvable by first constructing a nonnegative solution followed by a diagonal similarity transformation. There is a large literature on existence conditions for solutions of the StIEP in many special cases. The reader may refer to [8, 10, 11, 17, 19, 23, 24, 36, 37]
for various necessary or sufficient conditions. However, there exist few numerical methods for solving the StIEP, especially for the large-scale StIEP (say n ≥ 1000). In [34], Soules gave a constructive method where additional inequalities of the Perron eigenvector are required. In [12], based on a matrix similarity transformation, Chu and Guo presented an isospectral flow method for constructing a stochastic matrix with
the desired spectrum. An inverse matrix involved in the isospectral flow method was further characterized by an analytic singular value decomposition (ASVD). Recently, in [21] Orsi proposed an alternating projection-like scheme for the construction of a stochastic matrix with the prescribed spectrum. The main computational cost is an eigenvalue-eigenvector decomposition or a Schur matrix decomposition. In [18],
Lin presented a fast recursive algorithm for solving the StIEP, where the prescribed eigenvalues are assumed to be real and to satisfy additional inequality conditions. We see that all the numerical methods in [12, 18, 21] are based on Theorem 1.1 and may not extend to the StIEP-PE. Recently, a few Riemannian optimization methods for eigenproblems have appeared. For instance, in [1], a truncated conjugate gradient method was presented
for the symmetric generalized eigenvalue problem. In [5], a Riemannian trust-region method was proposed for the symmetric generalized eigenproblem. Newton’s method and the conjugate gradient method proposed in [33] are applicable to the symmetric eigenvalue problem. In [40], a Riemannian Newton method was proposed for a kind of nonlinear eigenvalue problem.
In this paper, we directly focus on the StIEP. Based on the real Schur matrix decomposition, we reformulate the StIEP as a constrained optimization problem over several matrix manifolds, which aims to minimize the distance between isospectral matrices and stochastic matrices. We propose a geometric Polak–Ribière–Polyak-based nonlinear conjugate gradient method for solving the constrained optimization
problem. This is motivated by Yao et al. [38] and Zhang, Zhou, and Li [39]. In
[38], a Riemannian Fletcher–Reeves conjugate gradient approach was proposed for solving the doubly stochastic inverse eigenvalue problem. In [39], Zhang, Zhou, and Li gave a modified Polak–Ribière–Polyak conjugate gradient method for solving the
unconstrained optimization problem, where the generated direction is always a descent direction for the objective function. We investigate some basic properties of these matrix manifolds (e.g., tangent spaces, Riemannian metrics, orthogonal projections, retractions, and vector transports) and derive an explicit expression of the Riemannian gradient of the cost function. The global convergence of the proposed method is
established. We also extend the proposed method to the StIEP-PE. An additional advantage of our models is that they also yield new geometric isospectral flow methods for both the StIEP and the StIEP-PE. Finally, we report some numerical tests to show that the proposed geometric methods perform very effectively and are applicable to large-scale problems. Moreover, the new geometric isospectral flow methods work
acceptably for small- and medium-scale problems. The rest of this paper is organized as follows. In section 2 we propose a geometric nonlinear conjugate gradient approach for solving the StIEP. In section 3 we establish the global convergence of the proposed approach. In section 4 we extend the proposed approach to the StIEP-PE. Finally, some numerical tests are reported in section 5,
and some concluding remarks are given in section 6.

2. A geometric nonlinear conjugate gradient method. In this section, we
propose a geometric nonlinear conjugate gradient approach for solving the StIEP. We first reformulate the StIEP as a nonlinear minimization problem defined on some matrix manifolds. We also discuss some basic properties of these matrix manifolds and derive the Riemannian gradient of the cost function. Then we present a geometric nonlinear conjugate gradient approach for solving the nonlinear minimization problem.
2.1. Reformulation. As noted in [19, p. 141], we only assume that the set of prescribed complex numbers $\{\lambda_1^*, \lambda_2^*, \ldots, \lambda_n^*\}$ satisfies the following necessary condition: one element is 1 and $\max_{1 \le i \le n} |\lambda_i^*| = 1$. Note that the set of complex numbers $\{\lambda_1^*, \lambda_2^*, \ldots, \lambda_n^*\}$ is closed under complex conjugation. Without loss of generality, we can assume
$$\lambda_{2j-1}^* = a_j + b_j\sqrt{-1},\quad \lambda_{2j}^* = a_j - b_j\sqrt{-1},\quad j = 1, \ldots, s; \qquad \lambda_j^* \in \mathbb{R},\quad j = 2s+1, \ldots, n,$$
where $a_j, b_j \in \mathbb{R}$ with $b_j \neq 0$ for $j = 1, \ldots, s$. Then we define a block diagonal matrix by
$$\Lambda := \mathrm{blkdiag}\big(\lambda_1^{[2]*}, \ldots, \lambda_s^{[2]*}, \lambda_{2s+1}^*, \ldots, \lambda_n^*\big),$$
with diagonal blocks $\lambda_1^{[2]*}, \ldots, \lambda_s^{[2]*}, \lambda_{2s+1}^*, \ldots, \lambda_n^*$, where
$$\lambda_j^{[2]*} = \begin{bmatrix} a_j & b_j \\ -b_j & a_j \end{bmatrix},\quad j = 1, \ldots, s.$$
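Assembling $\Lambda$ from a self-conjugate spectrum is mechanical. A sketch (the function name and the example spectrum are ours): one $2\times 2$ block per conjugate pair $a_j \pm b_j\sqrt{-1}$, then the real eigenvalues on the diagonal.

```python
import numpy as np

def build_blkdiag_lambda(pairs, reals):
    """Assemble the real block-diagonal Lambda: one 2x2 block [[a, b], [-b, a]]
    per complex-conjugate pair a +/- b*i, then the real eigenvalues."""
    s = len(pairs)
    n = 2 * s + len(reals)
    Lam = np.zeros((n, n))
    for j, (a, b) in enumerate(pairs):
        Lam[2*j:2*j+2, 2*j:2*j+2] = [[a, b], [-b, a]]
    Lam[2*s:, 2*s:] = np.diag(reals)
    return Lam

# spectrum {0.2 ± 0.3i, 1, -0.5}; the 2x2 block realizes the conjugate pair
Lam = build_blkdiag_lambda(pairs=[(0.2, 0.3)], reals=[1.0, -0.5])
print(np.sort_complex(np.linalg.eigvals(Lam)))
```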
Denote by $\mathcal{O}(n)$ the set of all $n$-by-$n$ orthogonal matrices, i.e.,
$$\mathcal{O}(n) := \{P \in \mathbb{R}^{n\times n} \mid P^T P = I_n\},$$
where $P^T$ means the transpose of $P$ and $I_n$ is the identity matrix of order $n$. In addition, we define the set $\mathcal{V}$ by
$$\mathcal{V} := \{V \in \mathbb{R}^{n\times n} \mid V_{ij} = 0,\ (i,j) \in \mathcal{I}\},$$
where $\mathcal{I}$ is the index subset
$$\mathcal{I} := \{(i,j) \mid i \ge j \text{ or } \Lambda_{ij} \neq 0,\ i, j = 1, \ldots, n\}.$$
Let $\mathcal{S}$ be the set of all $n$-by-$n$ stochastic matrices, which can be defined as the following matrix manifold:
$$\mathcal{S} := \{S \odot S \mid \mathrm{diag}(SS^T) = I_n,\ S \in \mathbb{R}^{n\times n}\},$$
where $\odot$ denotes the Hadamard product [7] and $\mathrm{diag}(A)$ is a diagonal matrix with the same diagonal entries as a square matrix $A$. The set of isospectral matrices defined by
$$(2.1)\qquad \mathcal{M}(\Lambda) := \{Z \in \mathbb{R}^{n\times n} \mid Z = P(\Lambda + V)P^T,\ P \in \mathcal{O}(n),\ V \in \mathcal{V}\}$$
is a smooth manifold. Then the StIEP has a solution if and only if $\mathcal{M}(\Lambda) \cap \mathcal{S} \neq \emptyset$. Suppose that the StIEP has at least one solution. Then the StIEP is equivalent to solving the following nonlinear matrix equation:
$$H(S, P, V) = 0_{n\times n}$$
for $(S, P, V) \in \mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$, where $0_{n\times n}$ is the zero matrix of order $n$ and the set $\mathcal{OB}$ is the oblique manifold defined by [3, 4]
$$\mathcal{OB} := \{S \in \mathbb{R}^{n\times n} \mid \mathrm{diag}(SS^T) = I_n\}.$$
The mapping $H : \mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V} \to \mathbb{R}^{n\times n}$ is defined by
$$(2.2)\qquad H(S, P, V) = S \odot S - P(\Lambda + V)P^T,\qquad (S, P, V) \in \mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}.$$
If we find a solution $(S, P, V) \in \mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ to $H(S, P, V) = 0_{n\times n}$, then $C = S \odot S$ is a solution to the StIEP. Alternatively, one may solve the StIEP by finding a global solution to the following nonlinear minimization problem:
$$(2.3)\qquad \min\ h(S, P, V) := \frac{1}{2}\|H(S, P, V)\|_F^2 = \frac{1}{2}\|S \odot S - P(\Lambda + V)P^T\|_F^2 \quad \text{s.t. } S \in \mathcal{OB},\ P \in \mathcal{O}(n),\ V \in \mathcal{V},$$
where "s.t." means "subject to" and $\|\cdot\|_F$ denotes the matrix Frobenius norm [16].
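The residual (2.2) and the cost (2.3) translate directly into code. A sketch (assuming NumPy/SciPy; `*` below is the Hadamard product) that also checks $h = 0$ at an exact solution, built by taking a small stochastic $C$, extracting $P$ and $V$ from its real Schur form $C = QTQ^T$, and setting $S = \sqrt{C}$ entrywise:

```python
import numpy as np
from scipy.linalg import schur

def H(S, P, V, Lam):
    """Residual H(S, P, V) = S∘S - P(Λ+V)Pᵀ of (2.2)."""
    return S * S - P @ (Lam + V) @ P.T

def h(S, P, V, Lam):
    """Cost h = ||H||_F^2 / 2 of (2.3)."""
    return 0.5 * np.linalg.norm(H(S, P, V, Lam), 'fro') ** 2

C = np.array([[0.5, 0.5],
              [0.5, 0.5]])       # stochastic, eigenvalues {1, 0}
T, Q = schur(C)                  # real Schur form C = Q T Qᵀ
Lam = np.diag(np.diag(T))        # prescribed (real) spectrum on the diagonal
V = np.triu(T, 1)                # free strictly upper-triangular part
S = np.sqrt(C)                   # S∘S = C, and the rows of S have unit norm
print(h(S, Q, V, Lam) < 1e-20)   # True: (S, Q, V) solves H = 0
```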
2.2. Basic properties. In the following, we establish some basic properties for the product manifold OB × O(n) ×V. It is easy to check that the dimensions of OB, O(n), and V are given by [4, p. 27]
$$\dim \mathcal{OB} = n(n-1),\qquad \dim \mathcal{O}(n) = \frac{1}{2}n(n-1),\qquad \dim \mathcal{V} = |\mathcal{J}|,$$
where $|\mathcal{J}|$ is the cardinality of the index subset $\mathcal{J} = \mathcal{N} \setminus \mathcal{I}$, which is the relative complement of $\mathcal{I}$ in $\mathcal{N}$. Then we have
$$\dim(\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}) = n(n-1) + \frac{n(n-1)}{2} + |\mathcal{J}| > \dim \mathbb{R}^{n\times n},\qquad n \ge 3.$$
This shows that the nonlinear equation $H(S, P, V) = 0_{n\times n}$ is an under-determined system defined from $\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ to $\mathbb{R}^{n\times n}$ for $n \ge 3$. The tangent spaces of $\mathcal{OB}$, $\mathcal{O}(n)$, and $\mathcal{V}$ at $S \in \mathcal{OB}$, $P \in \mathcal{O}(n)$, and $V \in \mathcal{V}$ are, respectively, given by (see [3] and [4, p. 42])
$$T_S\mathcal{OB} = \{T \in \mathbb{R}^{n\times n} \mid \mathrm{diag}(ST^T) = 0_{n\times n}\},\qquad T_P\mathcal{O}(n) = \{P\Omega \mid \Omega^T = -\Omega,\ \Omega \in \mathbb{R}^{n\times n}\},\qquad T_V\mathcal{V} = \mathcal{V}.$$
Hence, the tangent space of $\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ at $(S, P, V) \in \mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ is given by
$$T_{(S,P,V)}\big(\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}\big) = T_S\mathcal{OB} \times T_P\mathcal{O}(n) \times T_V\mathcal{V}.$$
For the isospectral manifold $\mathcal{M}(\Lambda)$ defined in (2.1), it follows that
$$T_Z\mathcal{M}(\Lambda) = \{[Z, \Omega] + PUP^T \mid \Omega^T = -\Omega,\ U \in \mathcal{V},\ \Omega \in \mathbb{R}^{n\times n}\},$$
where $Z = P(\Lambda + V)P^T \in \mathcal{M}(\Lambda)$ and $[Z, \Omega] := Z\Omega - \Omega Z$ denotes the Lie-bracket product. Let the Euclidean space $\mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n}$ be equipped with the following natural inner product:
$$\langle (X_1, Y_1, Z_1), (X_2, Y_2, Z_2) \rangle := \mathrm{tr}(X_1^T X_2) + \mathrm{tr}(Y_1^T Y_2) + \mathrm{tr}(Z_1^T Z_2)$$
for all $(X_1, Y_1, Z_1), (X_2, Y_2, Z_2) \in \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n}$, and its induced norm $\|\cdot\|$, where $\mathrm{tr}(A)$ denotes the sum of the diagonal entries of a square matrix $A$. Since $\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ is an embedded submanifold of $\mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n}$, we can equip $\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ with an induced Riemannian metric,
$$(2.4)\qquad g_{(S,P,V)}\big((\xi_1, \zeta_1, \eta_1), (\xi_2, \zeta_2, \eta_2)\big) := \langle (\xi_1, \zeta_1, \eta_1), (\xi_2, \zeta_2, \eta_2) \rangle$$
for all $(S, P, V) \in \mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ and $(\xi_1, \zeta_1, \eta_1), (\xi_2, \zeta_2, \eta_2) \in T_{(S,P,V)}\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$. Without causing any confusion, in what follows we use $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ to denote the
Riemannian metric on $\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ and its induced norm. Then the orthogonal projections of any $\xi, \zeta, \eta \in \mathbb{R}^{n\times n}$ onto $T_S\mathcal{OB}$, $T_P\mathcal{O}(n)$, and $T_V\mathcal{V}$ are, respectively, given by (see [3] and [4, p. 48])
$$(2.5)\qquad \Pi_S\xi = \xi - \mathrm{diag}(S\xi^T)S,\qquad \Pi_P\zeta = P\,\mathrm{skew}(P^T\zeta),\qquad \Pi_V\eta = W \odot \eta,$$
where $\mathrm{skew}(A) := \frac{1}{2}(A - A^T)$ for a matrix $A \in \mathbb{R}^{n\times n}$ and the matrix $W \in \mathbb{R}^{n\times n}$ is defined by
$$W_{ij} := \begin{cases} 0 & \text{if } (i,j) \in \mathcal{I}, \\ 1 & \text{otherwise.} \end{cases}$$
Therefore, the orthogonal projection of a point $(\xi, \zeta, \eta) \in \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n}$ onto $T_{(S,P,V)}\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ has the following form:
$$\Pi_{(S,P,V)}(\xi, \zeta, \eta) = \big(\xi - \mathrm{diag}(S\xi^T)S,\ P\,\mathrm{skew}(P^T\zeta),\ W \odot \eta\big).$$
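The three projections in (2.5) cost only a few matrix products. A sketch (variable and function names are ours) that also verifies the tangency conditions of the projected vectors:

```python
import numpy as np

def proj_OB(S, xi):
    """Π_S ξ = ξ - diag(Sξᵀ)S, for S with unit-norm rows."""
    return xi - np.diag(np.diag(S @ xi.T)) @ S

def proj_On(P, zeta):
    """Π_P ζ = P skew(Pᵀζ)."""
    A = P.T @ zeta
    return P @ (A - A.T) / 2

def proj_V(W, eta):
    """Π_V η = W ∘ η, with the 0/1 mask W vanishing on the index set I."""
    return W * eta

rng = np.random.default_rng(1)
n = 4
S = rng.standard_normal((n, n))
S /= np.linalg.norm(S, axis=1, keepdims=True)       # a point on OB
P = np.linalg.qr(rng.standard_normal((n, n)))[0]    # a point on O(n)
xi, zeta = rng.standard_normal((2, n, n))
# tangency checks: diag(S (Π_S ξ)ᵀ) = 0 and Pᵀ(Π_P ζ) is skew-symmetric
print(np.allclose(np.diag(S @ proj_OB(S, xi).T), 0.0))  # True
A = P.T @ proj_On(P, zeta)
print(np.allclose(A, -A.T))                              # True
```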
In addition, the retractions on the manifolds $\mathcal{OB}$, $\mathcal{O}(n)$, and $\mathcal{V}$ can be chosen as (see [3] and [4, p. 58])
$$R_S(\xi_S) = \big(\mathrm{diag}\big((S+\xi_S)(S+\xi_S)^T\big)\big)^{-1/2}(S+\xi_S) \quad \text{for } \xi_S \in T_S\mathcal{OB},$$
$$R_P(\zeta_P) = \mathrm{qf}(P+\zeta_P) \quad \text{for } \zeta_P \in T_P\mathcal{O}(n),$$
$$R_V(\eta_V) = V + \eta_V \quad \text{for } \eta_V \in T_V\mathcal{V},$$
where $\mathrm{qf}(A)$ denotes the Q factor of the QR decomposition of a nonsingular matrix $A \in \mathbb{R}^{n\times n}$ in the form $A = QR$, where $Q \in \mathcal{O}(n)$ and $R$ is an upper triangular matrix with strictly positive diagonal elements. Thus, a retraction $R$ on $\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$
takes the form of
$$R_{(S,P,V)}(\xi_S, \zeta_P, \eta_V) = \big(R_S(\xi_S),\ R_P(\zeta_P),\ R_V(\eta_V)\big)$$
for all $(\xi_S, \zeta_P, \eta_V) \in T_{(S,P,V)}\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$. Since all the manifolds $\mathcal{OB}$, $\mathcal{O}(n)$, and $\mathcal{V}$ are embedded Riemannian submanifolds of $\mathbb{R}^{n\times n}$, we can define vector transports on $\mathcal{OB}$, $\mathcal{O}(n)$, and $\mathcal{V}$ by orthogonal projections [4, p. 174]:
$$(2.6)\qquad \mathcal{T}_{\eta_S}\xi_S := \Pi_{R_S(\eta_S)}\xi_S = \xi_S - \mathrm{diag}\big(R_S(\eta_S)\xi_S^T\big)R_S(\eta_S) \quad \text{for } \xi_S, \eta_S \in T_S\mathcal{OB},$$
$$\mathcal{T}_{\eta_P}\xi_P := \Pi_{R_P(\eta_P)}\xi_P = R_P(\eta_P)\,\mathrm{skew}\big(R_P(\eta_P)^T\xi_P\big) \quad \text{for } \xi_P, \eta_P \in T_P\mathcal{O}(n),$$
$$\mathcal{T}_{\eta_V}\xi_V := \xi_V \quad \text{for } \xi_V, \eta_V \in T_V\mathcal{V}.$$
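The retractions and the projection-based transports above can be sketched as follows (all function names are ours); `qf` flips the QR signs so that $\mathrm{diag}(R) > 0$:

```python
import numpy as np

def qf(A):
    """Q factor of A = QR with the diagonal of R strictly positive."""
    Q, R = np.linalg.qr(A)
    s = np.sign(np.diag(R))
    s[s == 0] = 1.0
    return Q * s                          # flip column signs of Q accordingly

def retract_OB(S, xi):
    """R_S(ξ) = diag((S+ξ)(S+ξ)ᵀ)^{-1/2}(S+ξ): renormalize the rows."""
    Y = S + xi
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

def retract_On(P, zeta):
    return qf(P + zeta)

def transport_OB(S, eta, xi):
    """T_η ξ = ξ - diag(R_S(η)ξᵀ)R_S(η): project onto the new tangent space."""
    Y = retract_OB(S, eta)
    return xi - np.diag(np.diag(Y @ xi.T)) @ Y

rng = np.random.default_rng(2)
n = 3
S = rng.random((n, n)) + 0.1
S /= np.linalg.norm(S, axis=1, keepdims=True)
xi = rng.standard_normal((n, n))
# qf returns an orthogonal factor for any nonsingular input
Q = retract_On(np.eye(n), 0.1 * rng.standard_normal((n, n)))
print(np.allclose(Q.T @ Q, np.eye(n)))              # True: Q ∈ O(n)
Y = retract_OB(S, xi)
print(np.allclose(np.linalg.norm(Y, axis=1), 1.0))  # True: rows renormalized
eta = rng.standard_normal((n, n))
T_xi = transport_OB(S, eta, xi)
print(np.allclose(np.diag(retract_OB(S, eta) @ T_xi.T), 0.0))  # True: transported vector is tangent
```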
Thus the vector transport on $\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ has the following form:
$$(2.7)\qquad \mathcal{T}_{(\eta_S, \eta_P, \eta_V)}(\xi_S, \xi_P, \xi_V) = \big(\mathcal{T}_{\eta_S}\xi_S,\ \mathcal{T}_{\eta_P}\xi_P,\ \mathcal{T}_{\eta_V}\xi_V\big)$$
for any $(\xi_S, \xi_P, \xi_V), (\eta_S, \eta_P, \eta_V) \in T_{(S,P,V)}\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$. We now establish explicit formulas for the differential of $H$ defined in (2.2) and the Riemannian gradient of the cost function $h$ defined in (2.3). To derive the Riemannian gradient of $h$, we define the mapping $\bar{H} : \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n} \to \mathbb{R}^{n\times n}$ and the cost function $\bar{h} : \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n} \to \mathbb{R}$ by
$$(2.8)\qquad \bar{H}(S, P, V) := S \odot S - P(\Lambda + V)P^T \quad \text{and} \quad \bar{h}(S, P, V) := \frac{1}{2}\|\bar{H}(S, P, V)\|_F^2$$
for all $(S, P, V) \in \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n}$. Then the mapping $H$ and the function $h$ can be seen as the restrictions of $\bar{H}$ and $\bar{h}$ to $\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$, respectively. That is, $H = \bar{H}|_{\mathcal{OB}\times\mathcal{O}(n)\times\mathcal{V}}$ and $h = \bar{h}|_{\mathcal{OB}\times\mathcal{O}(n)\times\mathcal{V}}$. The Euclidean gradient of $\bar{h}$ at a point $(S, P, V) \in \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n}$ is given by
$$\mathrm{grad}\,\bar{h}(S, P, V) = \left(\frac{\partial \bar{h}}{\partial S}(S, P, V),\ \frac{\partial \bar{h}}{\partial P}(S, P, V),\ \frac{\partial \bar{h}}{\partial V}(S, P, V)\right),$$
where
$$\frac{\partial \bar{h}}{\partial S}(S, P, V) = 2S \odot \bar{H}(S, P, V),$$
$$\frac{\partial \bar{h}}{\partial P}(S, P, V) = -\bar{H}(S, P, V)P(\Lambda + V)^T - \bar{H}^T(S, P, V)P(\Lambda + V),$$
$$\frac{\partial \bar{h}}{\partial V}(S, P, V) = -P^T\bar{H}(S, P, V)P.$$
Then the Riemannian gradient of $h$ at a point $(S, P, V) \in \mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ is given by [4, p. 48]
$$(2.9)\qquad \mathrm{grad}\,h(S, P, V) = \Pi_{(S,P,V)}\,\mathrm{grad}\,\bar{h}(S, P, V) = \left(\Pi_S\frac{\partial \bar{h}}{\partial S}(S, P, V),\ \Pi_P\frac{\partial \bar{h}}{\partial P}(S, P, V),\ \Pi_V\frac{\partial \bar{h}}{\partial V}(S, P, V)\right).$$
It follows from (2.5) that
$$\Pi_S\frac{\partial \bar{h}}{\partial S}(S, P, V) = 2S \odot \bar{H}(S, P, V) - 2\,\mathrm{diag}\big(S(S \odot \bar{H}(S, P, V))^T\big)S,$$
$$\Pi_P\frac{\partial \bar{h}}{\partial P}(S, P, V) = \frac{1}{2}\big([P(\Lambda + V)P^T, \bar{H}^T(S, P, V)] + [P(\Lambda + V)^TP^T, \bar{H}(S, P, V)]\big)P,$$
$$\Pi_V\frac{\partial \bar{h}}{\partial V}(S, P, V) = -W \odot \big(P^T\bar{H}(S, P, V)P\big).$$
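Combining the Euclidean partials with the projections gives a self-contained sketch of $\mathrm{grad}\,h$ in (2.9) (NumPy; $\Pi_P$ is applied in the equivalent form $P\,\mathrm{skew}(P^T\cdot)$). As a sanity check, the gradient vanishes at an exact solution built from a real Schur form; the mask $W$ below assumes a real prescribed spectrum, so the free entries of $V$ are the strictly upper triangle:

```python
import numpy as np
from scipy.linalg import schur

def riemannian_grad(S, P, V, Lam, W):
    """grad h = (Π_S ∂h̄/∂S, Π_P ∂h̄/∂P, Π_V ∂h̄/∂V) per (2.9) and (2.5)."""
    Hm = S * S - P @ (Lam + V) @ P.T   # residual H(S, P, V)
    M = Lam + V
    gS = 2.0 * S * Hm                  # Euclidean partial in S
    gP = -Hm @ P @ M.T - Hm.T @ P @ M  # Euclidean partial in P
    gV = -P.T @ Hm @ P                 # Euclidean partial in V
    gS = gS - np.diag(np.diag(S @ gS.T)) @ S   # Π_S
    A = P.T @ gP
    gP = P @ (A - A.T) / 2                     # Π_P = P skew(Pᵀ·)
    gV = W * gV                                # Π_V = W ∘ ·
    return gS, gP, gV

C = np.array([[0.5, 0.5], [0.5, 0.5]])  # stochastic, spectrum {1, 0}
T, Q = schur(C)
Lam = np.diag(np.diag(T))
V = np.triu(T, 1)
S = np.sqrt(C)
W = np.triu(np.ones_like(C), 1)         # ones off the index set I
gS, gP, gV = riemannian_grad(S, Q, V, Lam, W)
print(all(np.allclose(g, 0.0) for g in (gS, gP, gV)))  # True: stationary at a solution
```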
Using an idea similar to that of [9] and [12], we can define the steepest descent flow for the cost function $h$ over $\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ as follows:
$$(2.10)\qquad \frac{d(S, P, V)}{dt} = -\,\mathrm{grad}\,h(S, P, V).$$
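As a rough illustration (not the authors' implementation), the flow (2.10) can be handed to a generic solver such as SciPy's `solve_ivp` after flattening $(S, P, V)$ into one state vector; the iterates then stay only approximately on the manifold, which is one reason the retraction-based methods of this paper are preferable. The spectrum, seed, and tolerances below are illustrative:

```python
import numpy as np
from scipy.integrate import solve_ivp

def make_flow(Lam, W, n):
    """Right-hand side of d(S,P,V)/dt = -grad h(S,P,V), state flattened."""
    def rhs(t, y):
        S = y[:n*n].reshape(n, n)
        P = y[n*n:2*n*n].reshape(n, n)
        V = y[2*n*n:].reshape(n, n)
        M = Lam + V
        Hm = S * S - P @ M @ P.T
        gS = 2.0 * S * Hm
        gS -= np.diag(np.diag(S @ gS.T)) @ S   # Π_S
        gP = -Hm @ P @ M.T - Hm.T @ P @ M
        A = P.T @ gP
        gP = P @ (A - A.T) / 2                 # Π_P
        gV = W * (-P.T @ Hm @ P)               # Π_V
        return -np.concatenate([gS.ravel(), gP.ravel(), gV.ravel()])
    return rhs

n = 2
Lam = np.diag([1.0, 0.0])            # prescribed spectrum {1, 0}
W = np.triu(np.ones((n, n)), 1)      # free entries of V
rng = np.random.default_rng(3)
S0 = rng.random((n, n)) + 0.1
S0 /= np.linalg.norm(S0, axis=1, keepdims=True)
y0 = np.concatenate([S0.ravel(), np.eye(n).ravel(), np.zeros(n * n)])
sol = solve_ivp(make_flow(Lam, W, n), (0.0, 50.0), y0, rtol=1e-8, atol=1e-10)

def cost(y):
    S = y[:n*n].reshape(n, n)
    P = y[n*n:2*n*n].reshape(n, n)
    V = y[2*n*n:].reshape(n, n)
    return 0.5 * np.linalg.norm(S * S - P @ (Lam + V) @ P.T, 'fro') ** 2

print(cost(sol.y[:, -1]) < cost(y0))  # True: the flow decreases h
```

Along the exact flow, $h$ is monotonically nonincreasing, since $\frac{d}{dt}h = -\|\mathrm{grad}\,h\|^2 \le 0$.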
This, together with an initial value $(S(0), P(0), V(0)) \in \mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$, leads to an initial value problem for problem (2.3). Thus we can apply existing ODE solvers to the above differential equation and obtain a solution to problem (2.3). From the numerical tests in section 5, one can see that this extra geometric isospectral flow method is acceptable for small- and medium-scale problems. On the other hand, the differential $DH(S, P, V): T_{(S,P,V)}\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V} \to T_{H(S,P,V)}\mathbb{R}^{n\times n} \simeq \mathbb{R}^{n\times n}$ of $H$ at a point $(S, P, V) \in \mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ is determined by
$$DH(S, P, V)[(\Delta S, \Delta P, \Delta V)] = 2S \odot \Delta S + [P(\Lambda + V)P^T,\ \Delta P P^T] - P\Delta V P^T$$
for all $(\Delta S, \Delta P, \Delta V) \in T_{(S,P,V)}\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$. Then we have [4, p. 185]
$$(2.11)\qquad \mathrm{grad}\,h(S, P, V) = (DH(S, P, V))^*[H(S, P, V)],$$
where $(DH(S, P, V))^*$ denotes the adjoint of the operator $DH(S, P, V)$. Let $T(\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V})$ denote the tangent bundle of $\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$. As in [4, p. 55], the pullback mappings $\widehat{H}: T(\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}) \to \mathbb{R}^{n\times n}$ and $\widehat{h}: T(\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}) \to \mathbb{R}$ of $H$ and $h$ through the retraction $R$ on $\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}$ can be defined by
$$(2.12)\qquad \widehat{H}(\xi) = H(R(\xi)) \quad \text{and} \quad \widehat{h}(\xi) = h(R(\xi)),\qquad \xi \in T(\mathcal{OB} \times \mathcal{O}(n) \times \mathcal{V}),$$