NUMERICAL ALGEBRA, CONTROL AND OPTIMIZATION, Volume 8, Number 4, December 2018, pp. 413–440. doi:10.3934/naco.2018026
OPTIMIZATION PROBLEMS WITH ORTHOGONAL MATRIX CONSTRAINTS
K. T. Arasu
Department of Mathematics and Statistics, Wright State University, 3640 Colonel Glenn Highway, Dayton, OH 45435, USA

Manil T. Mohan∗†
Department of Mathematics and Statistics, Air Force Institute of Technology, 2950 Hobson Way, Wright Patterson Air Force Base, OH 45433, USA
(Communicated by Zhong Wan)
Abstract. Optimization problems involving orthogonal matrices are formulated in this work. A lower bound for the number of stationary points of such optimization problems is found, and its connection to the number of possible partitions of natural numbers is established. We obtain local and global optima of such problems for different orders and show their connection with the Hadamard, conference and weighing matrices. Applications of the general theory to some concrete examples, including maximization of the Shannon, Rényi, Tsallis and Sharma-Mittal entropies for orthogonal matrices, minimum-distance orthostochastic matrices to uniform van der Waerden matrices, and Cressie-Read and K-divergence functions for orthogonal matrices, are also discussed. Global optima for all orders have been found for the optimization problems involving unitary matrix constraints.
1. Introduction. Optimization problems involving constraints such as orthogonal and unitary matrices play an important role in several applications of science, engineering and technology. Some examples include linear and nonlinear eigenvalue problems, electronic structure computations, low-rank matrix optimization, polynomial optimization, subspace tracking, combinatorial optimization, sparse principal component analysis, etc. (see [1, 8, 18, 24, 25] and the references therein). These problems are difficult because the constraints are non-convex: the orthogonality constraints may lead to several local optima and, in particular, many of these problems in special forms are non-deterministic polynomial-time hard (NP-hard). There is no assurance of obtaining the global optimizer except in a few simple cases, and hence finding a global optimum is a tedious task for numerical analysts (see [25] for more details).
2010 Mathematics Subject Classification. Primary: 15B51; Secondary: 65K05, 15B10, 15B34.
Key words and phrases. Optimization, orthogonal matrix, Hadamard matrix, conference matrix, weighing matrix, orthostochastic matrix, Shannon entropy.
∗Corresponding author: Manil T. Mohan.
†M. T. Mohan's current address: Department of Mathematics, Indian Institute of Technology Roorkee (IIT Roorkee), Haridwar Highway, Roorkee, Uttarakhand 247 667, India.
In this work, we formulate some interesting optimization problems involving orthogonal matrix constraints (of order $n$, $n\in\mathbb{N}$) and find a fascinating connection between the number of stationary points and the number of possible partitions of a natural number. We identify some global and local optima for such problems, which are intimately related to the real Hadamard, conference and weighing matrices. For the optimization problems involving unitary matrix constraints, we found global optima for all orders. We also give applications of the general theory to some concrete examples, including maximization of the Shannon, Rényi, Tsallis and Sharma-Mittal entropies for orthogonal matrices, minimum-distance orthostochastic matrices to uniform van der Waerden matrices, and Cressie-Read and K-divergence functions for orthogonal matrices.

Let us now list some of the optimization problems with orthogonal matrix constraints available in the literature. One of the first such problems is addressed in [13], where the optimization problem was to find the nearest orthogonal matrix to a given matrix. The entropy of a random variable is the measure of uncertainty in Information Theory. The Shannon entropy of a real orthogonal matrix is introduced in [9], and it has been shown in [9, 2] that if a real Hadamard matrix exists, then it maximizes the Shannon entropy. The work in [3] extended this idea and found (local and global) minimum-distance orthostochastic matrices to uniform van der Waerden matrices (see [17, 5, 3]) of different orders. The uniform van der Waerden matrix of order $n$ (with all its entries equal to $\frac{1}{n}$) is bistochastic, and is orthostochastic if and only if there exists a real Hadamard matrix of order $n$. Note that if a real Hadamard matrix of order $n$ exists, then $n = 1, 2$, or $n\equiv 0\ (\mathrm{mod}\ 4)$. Thus, for all other orders, finding local and global optima for the optimization problem of minimum-distance orthostochastic matrices to the uniform van der Waerden matrix is a challenging problem. From the numerical point of view, the authors in [25] developed a feasible method for optimization problems with orthogonality constraints, and they covered a wide range of such problems.

Bistochastic and orthostochastic matrices have a variety of applications in Statistics, Mathematics and Physics, including but not limited to the theory of majorization, angular momentum transfer problems, investigations of the Frobenius-Perron operator, and the characterization of completely positive maps acting on the space of density matrices; see, for example, [4, 15, 26].

The organization of the paper is as follows. In the next section, we give some basic definitions and properties of the orthogonal group and of Hadamard, conference and weighing matrices. In Section 3, we formulate a general optimization problem and discuss its stationary points and local and global maxima (minima as well). A general maximization problem (minimization problem) is described in Section 4, and several properties of its stationary points and local and global maxima (minima) are also examined. The same optimization problems involving unitary matrices as constraints are also formulated in this section, and we find matrices built from complex Hadamard matrices as the global optimizers for all orders.
Applications of the general theory to some particular examples, including maximization of the Shannon, Rényi, Tsallis and Sharma-Mittal entropies for orthogonal matrices, minimum-distance orthostochastic matrices to uniform van der Waerden matrices, and Cressie-Read and K-divergence functions for orthogonal matrices, are given in Section 5.
2. Preliminaries. In this section, we give some preliminaries needed to establish the main results of this paper. In the sequel, $M(n,\mathbb{R}) = \mathbb{R}^{n\times n}$ denotes the space of all $n\times n$ real matrices and $M(n,\mathbb{C}) = \mathbb{C}^{n\times n}$ denotes the space of all $n\times n$ complex matrices.
Definition 2.1. For every positive integer $n$, the orthogonal group $O(n,\mathbb{R})$ is the group of $n\times n$ real orthogonal matrices $M_n$, with matrix multiplication as the group operation, satisfying
$$M_nM_n^\top = M_n^\top M_n = I_n,$$
where $I_n$ is the $n\times n$ identity matrix and $M_n^\top$ is the transpose of $M_n$.

Because the determinant of an orthogonal matrix is either $+1$ or $-1$, the orthogonal group has two components. The component containing the identity $I_n$ is the special orthogonal group $SO(n,\mathbb{R})$. That is,
$$SO(n,\mathbb{R}) := \left\{M_n\in O(n,\mathbb{R}) : \det(M_n) = +1\right\},$$
which is a normal subgroup of $O(n,\mathbb{R})$ with $O(n,\mathbb{R})/SO(n,\mathbb{R})\cong\mathbb{Z}_2$. Note that $O(n,\mathbb{R})$ is compact and not connected, having two connected components. The map $\det : O(n,\mathbb{R})\to\{-1,+1\}$ is continuous and $\{+1\}$ is open in $\{-1,+1\}$. Thus, $SO(n,\mathbb{R}) = \det^{-1}(\{+1\})$ is an open, connected subset of $O(n,\mathbb{R})$, and both $O(n,\mathbb{R})$ and $SO(n,\mathbb{R})$ are smooth submanifolds of $\mathbb{R}^{n^2}$, where $\mathbb{R}^{n^2}$ is the $n^2$-dimensional Euclidean space. For more properties of the orthogonal and special orthogonal groups, interested readers are referred to Section 7.4.2 of [14].

Remark 1. The vector space of real $n\times n$ skew-symmetric matrices is denoted by $so(n,\mathbb{R})$. The exponential map $\exp : so(n,\mathbb{R})\to SO(n,\mathbb{R})$,
defined by $\exp(A_n) = e^{A_n} := I_n + \sum_{k=1}^{\infty}\frac{A_n^k}{k!}$ for all $A_n\in so(n,\mathbb{R})$, is well defined and surjective. Given a real skew-symmetric matrix $A_n$, $R_n = \exp(A_n)$ is a rotation matrix; conversely, given a rotation matrix $R_n\in SO(n,\mathbb{R})$, there is some skew-symmetric matrix $A_n$ such that $R_n = \exp(A_n)$ (see Theorem 14.2.2, [10]).
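As a quick numerical illustration of Remark 1, the exponential map can be exercised with a short sketch (numpy and scipy assumed):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B - B.T                                  # skew-symmetric: A^T = -A

R = expm(A)                                  # exp : so(n, R) -> SO(n, R)
print(np.allclose(R @ R.T, np.eye(4)))       # True: R is orthogonal
print(np.isclose(np.linalg.det(R), 1.0))     # True: det(R) = +1, so R is in SO(n, R)
```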
Definition 2.2 (Real Hadamard matrix). A real Hadamard matrix $H_n$ of order $n$ is defined as an $n\times n$ square matrix with entries from $\{+1,-1\}$ such that
$$H_nH_n^\top = nI_n.$$
The following result gives the existence of Hadamard matrices for various orders.

Lemma 2.3 (Theorem 4.4, [21]; Lemma 1.1.4, [23]). If a real Hadamard matrix of order $n$ exists, then $n = 1, 2$ or $n\equiv 0\ (\mathrm{mod}\ 4)$.

Conjecture 1 (The Hadamard conjecture, [20]). If $n\equiv 0\ (\mathrm{mod}\ 4)$, then there is a real Hadamard matrix of order $n$.

Definition 2.4 (Real conference matrix). A real conference matrix of order $n > 1$ is an $n\times n$ matrix $C_n\in M(n,\mathbb{R})$ with diagonal entries $0$ and off-diagonal entries $\pm 1$ which satisfies
$$C_nC_n^\top = (n-1)I_n.$$
The defining equation shows that any two rows of $C_n$ are orthogonal, and hence $n$ must be even. The following result gives the equivalence of real conference matrices.
Lemma 2.5 (Corollary 2.2, [7]). Any real conference matrix of order $n > 2$ is equivalent, under multiplication of rows and columns by $-1$, to a symmetric or to a skew-symmetric conference matrix according as $n\equiv 2\ (\mathrm{mod}\ 4)$ or $n\equiv 0\ (\mathrm{mod}\ 4)$.
Definition 2.6 (Real weighing matrix). A real weighing matrix $W_{n,k}$ is an $n\times n$ matrix with entries $0,\pm 1$, having $k$ non-zero entries per row and column, and with the inner product of distinct rows zero. Hence $W_{n,k}$ satisfies $W_{n,k}W_{n,k}^\top = kI_n$. The number $k$ is called the weight of $W_{n,k}$.

The determinant of $W_{n,k}$ is $\pm k^{n/2}$, and if $n$ is odd, $[\det(W_{n,k})]^2 = k^n$ implies that $k$ must be a square. Note that a $W_{n,n}$, $n\equiv 0\ (\mathrm{mod}\ 4)$, is a real Hadamard matrix of order $n$, and a $W_{n,n-1}$, $n\equiv 2\ (\mathrm{mod}\ 4)$, is equivalent to a symmetric conference matrix. Under this definition the zero entries are no longer required to lie on the diagonal, so $W_{n,n-1}$ relaxes the definition of a real conference matrix. The following theorem gives the existence of real weighing matrices.
Theorem 2.7 (Proposition 23, [11]).
1. If $n$ is odd, then a $W_{n,k}$ exists only if (i) $k$ is a square and (ii) $(n-k)^2 - (n-k) + 1 \geq n$.
2. If $n\equiv 2\ (\mathrm{mod}\ 4)$, then for a $W_{n,k}$ to exist, (i) $k\leq n-1$ and (ii) $k$ must be the sum of two integral squares.
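For concreteness, the following sketch (numpy assumed; the candidate matrix is our illustration) verifies a $W_{4,3}$; since its zeros happen to lie on the diagonal, it is also a conference matrix of order $4$:

```python
import numpy as np

# Candidate W(4,3): entries in {0, +1, -1}, three non-zeros per row and column
W = np.array([[ 0,  1,  1,  1],
              [ 1,  0,  1, -1],
              [ 1, -1,  0,  1],
              [ 1,  1, -1,  0]])

print(np.array_equal(W @ W.T, 3 * np.eye(4, dtype=int)))   # True: W W^T = 3 I_4
```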
Also, it has been conjectured that if $n\equiv 0\ (\mathrm{mod}\ 4)$, then a $W_{n,k}$ exists for all $1\leq k\leq n$ (see [12]).

Definition 2.8. For every positive integer $n$, the $n\times n$ unitary group $U(n,\mathbb{C})$ is the group of $n\times n$ unitary matrices $U_n$, with matrix multiplication as the group operation, satisfying
$$U_nU_n^* = U_n^*U_n = I_n,$$
where $U_n^*$ is the conjugate transpose of $U_n$. The $n\times n$ special unitary group is
$$SU(n,\mathbb{C}) = \left\{U_n\in U(n,\mathbb{C}) : \det(U_n) = +1\right\}.$$
Note that the unitary and special unitary groups are smooth manifolds, and both are compact and path-connected.
Definition 2.9 (Complex Hadamard matrix). A complex Hadamard matrix $H_n\in M(n,\mathbb{C})$ is an $n\times n$ matrix with complex entries of modulus $1$ such that $H_nH_n^* = nI_n$.

Lemma 2.10. For every $n\geq 1$, there exists a complex Hadamard matrix of order $n$.

Proof. The Fourier matrix
$$[F_n]_{ij} = e^{2\pi i (i-1)(j-1)/n},\quad i,j = 1,\dots,n,$$
is an example of a complex Hadamard matrix of order $n$.

The following are complex Hadamard matrices of orders $n = 1, 2$ and $3$, respectively:
$$F_1 = [1],\quad F_2 = \begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix},\quad F_3 = \begin{pmatrix} 1 & 1 & 1\\ 1 & \omega & \omega^2\\ 1 & \omega^2 & \omega \end{pmatrix},$$
where $\{1,\omega,\omega^2\}$ are the cube roots of unity. For more details on complex Hadamard matrices, see [23].
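The Fourier matrix of Lemma 2.10 is straightforward to generate and test; a sketch assuming numpy:

```python
import numpy as np

def fourier_matrix(n: int) -> np.ndarray:
    """[F_n]_{jk} = exp(2*pi*i*(j-1)*(k-1)/n), written here with 0-based indices."""
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(2j * np.pi * j * k / n)

n = 5
F = fourier_matrix(n)
print(np.allclose(np.abs(F), 1.0))                 # all entries have modulus 1
print(np.allclose(F @ F.conj().T, n * np.eye(n)))  # F F^* = n I_n
```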
3. The Optimization Problem. In this section, we formulate optimization problems involving orthogonal matrix constraints. For notational convenience, we use the symbols A, B, G, I, W, X, Y, Z, etc., for $n\times n$ matrices. The general minimization (maximization) problem over the real orthogonal matrices (see [25]) can be formulated as
$$\min/\max_{X\in\mathbb{R}^{n\times n}} F(X),\quad\text{such that}\quad X^\top X = I, \tag{1}$$
where $F(\cdot) : \mathbb{R}^{n\times n}\to\mathbb{R}$ is a differentiable function. The feasible set
$$\mathcal{M}_n := \left\{X\in\mathbb{R}^{n\times n} : X^\top X = I_n\right\}$$
is often referred to as the Stiefel manifold.

Given a differentiable function $F(\cdot) : \mathbb{R}^{n\times n}\to\mathbb{R}$, the gradient of $F(\cdot)$ with respect to $X$ is denoted by $G := DF(X) = \left(\frac{\partial F(X)}{\partial X_{ij}}\right)_{i,j=1}^{n}$. The derivative of $F(\cdot)$ at $X$ in the direction $Z$ is
$$DF(X)[Z] := \lim_{t\to 0}\frac{F(X+tZ) - F(X)}{t} = \langle DF(X), Z\rangle.$$
Here $\langle\cdot,\cdot\rangle$ denotes the Euclidean inner product between two matrices, defined by
$$\langle A, B\rangle := \sum_{j,k=1}^{n} a_{jk}b_{jk} = \mathrm{Tr}(A^\top B),\quad\text{for all } A = (a_{ij})_{i,j=1}^{n},\ B = (b_{ij})_{i,j=1}^{n}\in\mathbb{R}^{n\times n},$$
where $\mathrm{Tr}(A)$ is the trace of $A$. We use $\nabla F$ for gradients in tangent planes. The Frobenius norm of $A\in\mathbb{R}^{n\times n}$ is defined by $\|A\|_F^2 := \sum_{i,j=1}^{n} a_{ij}^2$. Given a feasible point $X$ and the gradient $G$, we define a skew-symmetric matrix $A$ as either
$$A := GX^\top - XG^\top\quad\text{or}\quad A := (P_XG)X^\top - X(P_XG)^\top,\quad\text{where } P_X := I - \frac{1}{2}XX^\top. \tag{2}$$
Following [25], we next state the first and second order optimality conditions in the following lemmas. Since the matrix $X^\top X = I$ is symmetric, the Lagrangian multiplier $\Lambda$ corresponding to $X^\top X = I$ is a symmetric matrix. The Lagrangian function for the optimization problem (1) is
$$\mathcal{L}(X,\Lambda) = F(X) - \frac{1}{2}\mathrm{Tr}\left(\Lambda\left(X^\top X - I\right)\right). \tag{3}$$

Lemma 3.1 (First order necessary conditions, Lemma 1, [25]). If $X$ is a local minimizer of the problem (1), then $X$ satisfies the first order optimality conditions $D_X\mathcal{L}(X,\Lambda) = G - XG^\top X = 0$ and $X^\top X = I$, with the associated Lagrangian multiplier $\Lambda = G^\top X$. Define
$$\nabla F(X) := G - XG^\top X\quad\text{and}\quad A := GX^\top - XG^\top. \tag{4}$$
Then $\nabla F(X) = AX$. Moreover, $\nabla F(X) = 0$ if and only if $A = 0$.

Now we state the second order necessary and sufficient conditions for the maximization problem in (1).
Lemma 3.2 (Second order necessary conditions, Theorem 12.5, [19], Lemma 2, [25]). Suppose that $X\in\mathcal{M}_n$ is a local maximizer for the problem (1). Then $X$ satisfies
$$\mathrm{Tr}\left(Z^\top D(DF(X))[Z]\right) - \mathrm{Tr}\left(\Lambda Z^\top Z\right) \leq 0,\quad\text{where } \Lambda = G^\top X, \tag{5}$$
for all $Z\in T_X\mathcal{M}_n := \left\{Z\in\mathbb{R}^{n\times n} : X^\top Z + Z^\top X = 0\right\}$, which is the tangent space of $\mathcal{M}_n$ at $X$.

Lemma 3.3 (Second order sufficient conditions, Theorem 12.6, [19], Lemma 2, [25]). Suppose that for $X\in\mathcal{M}_n$ there exists a Lagrange multiplier $\Lambda$ such that the first order conditions are satisfied. Suppose also that
$$\mathrm{Tr}\left(Z^\top D(DF(X))[Z]\right) - \mathrm{Tr}\left(\Lambda Z^\top Z\right) < 0 \tag{6}$$
for every nonzero matrix $Z\in T_X\mathcal{M}_n$. Then $X$ is a strict local maximizer for the problem (1).

Remark 2. For the corresponding minimization problem, Lemma 3.1 remains the same, and the inequalities (5) and (6) in the necessary and sufficient conditions (see Lemma 3.2 and Lemma 3.3) are reversed.
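The skew-symmetric matrix $A$ of (2) and (4) is what drives the feasible curvilinear search of [25]: the Cayley-type update $Y(\tau) = (I + \frac{\tau}{2}A)^{-1}(I - \frac{\tau}{2}A)X$ stays on $\mathcal{M}_n$ for every step size $\tau$. A minimal sketch of one such step, in the spirit of [25] (numpy assumed; this is an illustration, not the authors' code):

```python
import numpy as np

def cayley_step(X, G, tau):
    """One feasibility-preserving update: Y = (I + tau/2 A)^{-1} (I - tau/2 A) X."""
    n = X.shape[0]
    A = G @ X.T - X @ G.T                      # skew-symmetric matrix of (2)
    I = np.eye(n)
    return np.linalg.solve(I + (tau / 2) * A, (I - (tau / 2) * A) @ X)

rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random feasible point
G = rng.standard_normal((5, 5))                    # stand-in for a gradient
Y = cayley_step(X, G, tau=0.1)
print(np.allclose(Y.T @ Y, np.eye(5)))             # True: Y stays on M_n
```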
Remark 3. We know that $SO(n,\mathbb{R})$ is the connected component of the identity in $O(n,\mathbb{R})$, i.e., the subset of $O(n,\mathbb{R})$ whose members are connected to the identity by paths (the path-connected component). Let $\gamma : (-a,a)\to SO(n,\mathbb{R})$, $a\in\mathbb{R}$, be a smooth curve with $\gamma(0) = I$. Since it is a curve in $SO(n,\mathbb{R})$, for each $s$ we have $\gamma(s)\gamma(s)^\top = I$. Differentiating with respect to $s$, we obtain $\gamma'(s)\gamma(s)^\top + \gamma(s)\gamma'(s)^\top = 0$. For $s = 0$, we have $\gamma'(0) + \gamma'(0)^\top = 0$, so that $\gamma'(0)$ is skew-symmetric. Hence, every tangent vector to $SO(n,\mathbb{R})$ at $I$ is a skew-symmetric matrix. Since $T_I\mathcal{M}_n^{SO}\subset so(n,\mathbb{R})$, where $\mathcal{M}_n^{SO} := \left\{X\in\mathbb{R}^{n\times n} : X^\top X = I,\ \det(X) = +1\right\}$ and $so(n,\mathbb{R})$ denotes the space of $n\times n$ skew-symmetric matrices, and both are vector spaces of dimension $\frac{n(n-1)}{2}$, they must be equal. In general, the tangent space at a matrix $X\in SO(n,\mathbb{R})$ is given by
$$T_X\mathcal{M}_n^{SO} = \left\{XA : A\in so(n,\mathbb{R})\right\},$$
where the dimension of $T_X\mathcal{M}_n^{SO}$ is $\frac{n^2-n}{2}$. This means that any tangent vector at $X$ is $X$ times some skew-symmetric matrix. Since $SO(n,\mathbb{R})$ is the connected component of $O(n,\mathbb{R})$ containing $I$, the group $O(n,\mathbb{R})$ has the same tangent space at the neutral element $I$, because all members of $O(n,\mathbb{R})$ near the identity belong to $SO(n,\mathbb{R})$. We denote $T_X\mathcal{M}_n = T_X\mathcal{M}_n^{O} := T_X\mathcal{M}_n^{SO}$.

3.1. The Maximization Problem. Let $\mathbb{R}_0^+$ denote the set of all positive real numbers along with $0$. For convenience, we use the symbols $X_{ij}$ for the entries of the matrix $X$, and $X$ for general matrices in $\mathbb{R}^{n\times n}$. For a matrix $X = M_n = (a_{ij})_{i,j=1}^{n}$, let us consider
$$J[M_n] := \max_{X\in\mathbb{R}^{n\times n}} \sum_{i,j=1}^{n} a_{ij}^2\, f\left(\frac{1}{a_{ij}^2}\right),\quad\text{such that}\quad \sum_{k=1}^{n} a_{ki}a_{kj} = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i\neq j,\end{cases} \tag{7}$$
where
(i) $f(\cdot) : [1,\infty]\to\mathbb{R}_0^+\cup\{\infty\}$ is a twice continuously differentiable function with $f(1) = 0$,
(ii) $f$ is monotonically increasing in $[1,\infty)$ with $f'(\infty) = 0$,
(iii) $f$ is a strictly concave function for $x\in[1,\infty)$ with $f''(\infty) = 0$,
and we define $0\times\infty = 0$. Here, we denote $f'(a) = \frac{df}{dx}(a)$ and $f''(a) = \frac{d^2f}{dx^2}(a)$. The derivative matrix $G$, the matrix $A$ and the Lagrange multiplier $\Lambda$ are given by
$$G := \frac{\partial F(X)}{\partial X} = \left(2a_{ij}\left(f\left(\frac{1}{a_{ij}^2}\right) - \frac{1}{a_{ij}^2}f'\left(\frac{1}{a_{ij}^2}\right)\right)\right)_{i,j=1}^{n}, \tag{8}$$
$$A := GX^\top - XG^\top = \left(2\sum_{k=1}^{n} a_{ik}a_{jk}\left\{\left[f\left(\frac{1}{a_{ik}^2}\right) - f\left(\frac{1}{a_{jk}^2}\right)\right] - \left[\frac{1}{a_{ik}^2}f'\left(\frac{1}{a_{ik}^2}\right) - \frac{1}{a_{jk}^2}f'\left(\frac{1}{a_{jk}^2}\right)\right]\right\}\right)_{i,j=1}^{n}, \tag{9}$$
and
$$\Lambda := G^\top X = \left(2\sum_{k=1}^{n} a_{ki}a_{kj}\left(f\left(\frac{1}{a_{ki}^2}\right) - \frac{1}{a_{ki}^2}f'\left(\frac{1}{a_{ki}^2}\right)\right)\right)_{i,j=1}^{n}. \tag{10}$$
If there exists a matrix $X = M_n = (a_{ij})_{i,j=1}^{n}$ that maximizes the function given in (7), then the first order condition given in Lemma 3.1 becomes $GX^\top = XG^\top$, and hence for all $1\leq i,j\leq n$, we have
$$\sum_{k=1}^{n} a_{ik}a_{jk}\left\{\left[f\left(\frac{1}{a_{ik}^2}\right) - f\left(\frac{1}{a_{jk}^2}\right)\right] - \left[\frac{1}{a_{ik}^2}f'\left(\frac{1}{a_{ik}^2}\right) - \frac{1}{a_{jk}^2}f'\left(\frac{1}{a_{jk}^2}\right)\right]\right\} = 0. \tag{11}$$
Also, for the second order necessary and sufficient conditions given in Lemma 3.2 and Lemma 3.3, we first calculate $D(DF(X))[Z]$. Since $(DF(X))_{ij}$ depends only on $X_{ij}$, we get
$$D(DF(X))[Z] = \left(2z_{ij}\left(\frac{2}{a_{ij}^4}f''\left(\frac{1}{a_{ij}^2}\right) - \frac{1}{a_{ij}^2}f'\left(\frac{1}{a_{ij}^2}\right) + f\left(\frac{1}{a_{ij}^2}\right)\right)\right)_{i,j=1}^{n}, \tag{12}$$
where $Z = (z_{ij})_{i,j=1}^{n}$. Thus, we find $\mathrm{Tr}\left(Z^\top D(DF(X))[Z]\right)$ as
$$\mathrm{Tr}\left(Z^\top D(DF(X))[Z]\right) = \sum_{i,j=1}^{n} z_{ij}\left(D(DF(X))[Z]\right)_{ij} = 2\sum_{i,j=1}^{n} z_{ij}^2\left(\frac{2}{a_{ij}^4}f''\left(\frac{1}{a_{ij}^2}\right) - \frac{1}{a_{ij}^2}f'\left(\frac{1}{a_{ij}^2}\right) + f\left(\frac{1}{a_{ij}^2}\right)\right), \tag{13}$$
for all $Z\in T_X\mathcal{M}_n$.

Let us define $A^{i,j}\in so(n,\mathbb{R})$, for $1\leq i<j\leq n$, as the skew-symmetric matrix with $+1$ in the $ij$-th position and $-1$ in the $ji$-th position. Now, for the tangent vectors $Z^{i,j} := XA^{i,j}\in T_X\mathcal{M}_n$, $1\leq i<j\leq n$, by using the orthogonality, we have
$$\mathrm{Tr}\left(\Lambda (Z^{i,j})^\top Z^{i,j}\right) = 2\sum_{k=1}^{n}\left[a_{ki}^2\left(f\left(\frac{1}{a_{ki}^2}\right) - \frac{1}{a_{ki}^2}f'\left(\frac{1}{a_{ki}^2}\right)\right) + a_{kj}^2\left(f\left(\frac{1}{a_{kj}^2}\right) - \frac{1}{a_{kj}^2}f'\left(\frac{1}{a_{kj}^2}\right)\right)\right]. \tag{14}$$
From (13), we also obtain
$$\mathrm{Tr}\left((Z^{i,j})^\top D(DF(X))[Z^{i,j}]\right) = 2\sum_{k=1}^{n} a_{ki}^2\left(\frac{2}{a_{kj}^4}f''\left(\frac{1}{a_{kj}^2}\right) - \frac{1}{a_{kj}^2}f'\left(\frac{1}{a_{kj}^2}\right) + f\left(\frac{1}{a_{kj}^2}\right)\right) + 2\sum_{k=1}^{n} a_{kj}^2\left(\frac{2}{a_{ki}^4}f''\left(\frac{1}{a_{ki}^2}\right) - \frac{1}{a_{ki}^2}f'\left(\frac{1}{a_{ki}^2}\right) + f\left(\frac{1}{a_{ki}^2}\right)\right). \tag{15}$$
Thus for $1\leq i<j\leq n$, we have
$$\begin{aligned} \xi_X^{i,j} :={}& \mathrm{Tr}\left((Z^{i,j})^\top D(DF(X))[Z^{i,j}]\right) - \mathrm{Tr}\left(\Lambda (Z^{i,j})^\top Z^{i,j}\right)\\ ={}& 4\sum_{k=1}^{n}\left[\frac{a_{ki}^2}{a_{kj}^4}f''\left(\frac{1}{a_{kj}^2}\right) + \frac{a_{kj}^2}{a_{ki}^4}f''\left(\frac{1}{a_{ki}^2}\right)\right]\\ &+ 2\sum_{k=1}^{n}\left(a_{ki}^2 - a_{kj}^2\right)\left[\frac{1}{a_{ki}^2}f'\left(\frac{1}{a_{ki}^2}\right) - \frac{1}{a_{kj}^2}f'\left(\frac{1}{a_{kj}^2}\right) + f\left(\frac{1}{a_{kj}^2}\right) - f\left(\frac{1}{a_{ki}^2}\right)\right]. \end{aligned} \tag{16}$$
If $X = (a_{ij})_{i,j=1}^{n}$ is a strict local maximum, then from the necessary and sufficient conditions (see Lemma 3.2 and Lemma 3.3), we know that $\xi_X^{i,j} < 0$ for all $1\leq i<j\leq n$. Since the set $O(n,\mathbb{R})$ is not convex, the maximization problem in (7) over the set of all $n\times n$ real orthogonal matrices may admit several local and global maxima (even if the cost functional is concave).
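The quantity $\xi_X^{i,j}$ in (16) is easy to evaluate numerically. The sketch below (numpy assumed; the helper name xi and the test choice $f = \log$, which satisfies (i)-(iii) and reappears in the Shannon entropy example of Section 5, are illustrative) confirms that the normalized Hadamard matrix $H_2/\sqrt{2}$ is a strict local maximum:

```python
import numpy as np

def xi(X, f, fp, fpp):
    """Evaluate xi^{i,j}_X of (16) for all pairs i < j; fp, fpp are f', f''."""
    n = X.shape[0]
    a2 = X**2
    out = {}
    for i in range(n):
        for j in range(i + 1, n):
            u, v = a2[:, i], a2[:, j]          # squared entries of columns i, j
            t1 = 4 * np.sum(u / v**2 * fpp(1 / v) + v / u**2 * fpp(1 / u))
            t2 = 2 * np.sum((u - v) * (fp(1 / u) / u - fp(1 / v) / v
                                       + f(1 / v) - f(1 / u)))
            out[(i, j)] = t1 + t2
    return out

f, fp, fpp = np.log, lambda x: 1 / x, lambda x: -1 / x**2
H2 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
print(xi(H2, f, fp, fpp))   # {(0, 1): -8.0}: negative, so a strict local maximum
```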
3.2. The Minimization Problem. For a matrix $X = M_n = (a_{ij})_{i,j=1}^{n}$, let us consider
$$\widetilde{J}[M_n] := \min_{X\in\mathbb{R}^{n\times n}} \sum_{i,j=1}^{n} a_{ij}^2\, g\left(a_{ij}^2\right),\quad\text{such that}\quad \sum_{k=1}^{n} a_{ki}a_{kj} = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i\neq j,\end{cases} \tag{17}$$
where
(i) $g(\cdot) : [0,1]\to\mathbb{R}_0^+$ is a twice continuously differentiable function with $g(0) = 0$,
(ii) $g$ is monotonically increasing,
(iii) $g$ is a strictly convex function for $x\in(0,1]$.
This problem can be handled in the same way as in Section 3.1, and we obtain the following expressions:
$$G = \left(2a_{ij}\left(g(a_{ij}^2) + a_{ij}^2 g'(a_{ij}^2)\right)\right)_{i,j=1}^{n},$$
$$A = \left(2\sum_{k=1}^{n} a_{ik}a_{jk}\left(g(a_{ik}^2) - g(a_{jk}^2) + a_{ik}^2 g'(a_{ik}^2) - a_{jk}^2 g'(a_{jk}^2)\right)\right)_{i,j=1}^{n},$$
$$\Lambda = \left(2\sum_{k=1}^{n} a_{ki}a_{kj}\left(g(a_{ki}^2) + a_{ki}^2 g'(a_{ki}^2)\right)\right)_{i,j=1}^{n},$$
and
$$\mathrm{Tr}\left(\Lambda (Z^{i,j})^\top Z^{i,j}\right) = 2\sum_{k=1}^{n}\left[a_{ki}^2 g(a_{ki}^2) + a_{ki}^4 g'(a_{ki}^2) + a_{kj}^2 g(a_{kj}^2) + a_{kj}^4 g'(a_{kj}^2)\right],$$
$$\mathrm{Tr}\left((Z^{i,j})^\top D(DF(X))[Z^{i,j}]\right) = 2\sum_{k=1}^{n} a_{ki}^2\left(g(a_{kj}^2) + 5a_{kj}^2 g'(a_{kj}^2) + 2a_{kj}^4 g''(a_{kj}^2)\right) + 2\sum_{k=1}^{n} a_{kj}^2\left(g(a_{ki}^2) + 5a_{ki}^2 g'(a_{ki}^2) + 2a_{ki}^4 g''(a_{ki}^2)\right),$$
$$\begin{aligned} \xi_X^{i,j} ={}& 4\sum_{k=1}^{n} a_{ki}^2 a_{kj}^2\left(a_{ki}^2 g''(a_{ki}^2) + a_{kj}^2 g''(a_{kj}^2)\right)\\ &+ 2\sum_{k=1}^{n}\left[5a_{ki}^2 a_{kj}^2\left(g'(a_{ki}^2) + g'(a_{kj}^2)\right) - a_{ki}^4 g'(a_{ki}^2) - a_{kj}^4 g'(a_{kj}^2)\right]\\ &+ 2\sum_{k=1}^{n}\left(a_{ki}^2 - a_{kj}^2\right)\left(g(a_{kj}^2) - g(a_{ki}^2)\right), \end{aligned}$$
for all $Z^{i,j}\in T_X\mathcal{M}_n$, $1\leq i<j\leq n$. If $X = (a_{ij})_{i,j=1}^{n}$ is a strict local minimum of the problem (17), then from the necessary and sufficient conditions, we know that $\xi_X^{i,j} > 0$ for all $1\leq i<j\leq n$.
4. Stationary Points and Maximal/Minimal Orthogonal Matrices. In this section, we find some stationary points and local maxima/minima of the optimization problems (7) and (17), using the multivariate analysis techniques described in Section 3. We first state an elementary lemma, whose proof is easy.
Lemma 4.1. Let $A$ be an $n\times n$ orthogonal matrix and $B$ be an $m\times m$ orthogonal matrix. Then the matrices $A\oplus B$ and $A\otimes B$ are also orthogonal matrices, of orders $(n+m)\times(n+m)$ and $(nm)\times(nm)$ respectively, where $\oplus$ denotes the direct sum and $\otimes$ the Kronecker product of $A$ and $B$.
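A quick numerical confirmation of Lemma 4.1 (a sketch assuming numpy and scipy):

```python
import numpy as np
from scipy.linalg import block_diag   # realizes the direct sum A + B

A = np.array([[0.6, -0.8], [0.8, 0.6]])               # orthogonal, order 2
B = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # normalized Hadamard

S = block_diag(A, B)   # direct sum, order n + m = 4
K = np.kron(A, B)      # Kronecker product, order n m = 4

for M in (S, K):
    print(np.allclose(M.T @ M, np.eye(M.shape[0])))   # True, True
```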
Proposition 4.2. Let the orthogonal matrices $M_n$ and $M_m$ be stationary points of the maximization problem (7) (minimization problem (17)) of orders $n$ and $m$ respectively; then $M_n\oplus M_m$ is a stationary point of the corresponding maximization problem (minimization problem) of order $n+m$. Also, for the minimization problem (17), we have:
1. if both $M_n$ and $M_m$ are local maxima, then $M_n\oplus M_m$ is a local maximum,
2. if at least one of them is a saddle point, then $M_n\oplus M_m$ is also a saddle point,
3. if both are local minima, then $M_n\oplus M_m$ is a saddle point.

Proof. We prove the proposition only for the maximization problem given in (7); the proof for the minimization problem (17) follows in a similar fashion. Let $M_n = (a_{ij})_{i,j=1}^{n}$, $M_m = (b_{ij})_{i,j=1}^{m}$ and $M_n\oplus M_m = (c_{ij})_{i,j=1}^{n+m}$. By the definition of $\oplus$, we know that
$$c_{ij} = \begin{cases} a_{ij} & \text{for } 1\leq i,j\leq n,\\ b_{(i-n)(j-n)} & \text{for } n+1\leq i,j\leq n+m,\\ 0 & \text{otherwise.}\end{cases}$$
Since $M_n$ and $M_m$ are stationary points of the maximization problem (7), from (11) we have
$$\eta_{ij} := \sum_{k=1}^{n} a_{ik}a_{jk}\left\{\left[f\left(\frac{1}{a_{ik}^2}\right) - f\left(\frac{1}{a_{jk}^2}\right)\right] - \left[\frac{1}{a_{ik}^2}f'\left(\frac{1}{a_{ik}^2}\right) - \frac{1}{a_{jk}^2}f'\left(\frac{1}{a_{jk}^2}\right)\right]\right\} = 0, \tag{18}$$
$$\theta_{gl} := \sum_{k=1}^{m} b_{gk}b_{lk}\left\{\left[f\left(\frac{1}{b_{gk}^2}\right) - f\left(\frac{1}{b_{lk}^2}\right)\right] - \left[\frac{1}{b_{gk}^2}f'\left(\frac{1}{b_{gk}^2}\right) - \frac{1}{b_{lk}^2}f'\left(\frac{1}{b_{lk}^2}\right)\right]\right\} = 0, \tag{19}$$
for all $1\leq i,j\leq n$ and $1\leq g,l\leq m$. Now, we know that
$$\begin{aligned} \zeta_{ij} :={}& \sum_{k=1}^{n+m} c_{ik}c_{jk}\left\{\left[f\left(\frac{1}{c_{ik}^2}\right) - f\left(\frac{1}{c_{jk}^2}\right)\right] - \left[\frac{1}{c_{ik}^2}f'\left(\frac{1}{c_{ik}^2}\right) - \frac{1}{c_{jk}^2}f'\left(\frac{1}{c_{jk}^2}\right)\right]\right\}\\ ={}& \sum_{k=1}^{n}\left\{\cdots\right\} + \sum_{k=n+1}^{n+m}\left\{\cdots\right\} =: \zeta_{ij}^1 + \zeta_{ij}^2. \end{aligned} \tag{20}$$
For $1\leq i,j\leq n$, we get $\zeta_{ij}^1 = \eta_{ij} = 0$ and $\zeta_{ij}^2 = 0$, so $\zeta_{ij} = 0$. For $n+1\leq i,j\leq n+m$, we have $\zeta_{ij}^1 = 0$ and $\zeta_{ij}^2 = \theta_{(i-n)(j-n)} = 0$, hence $\zeta_{ij} = 0$. For the cases $1\leq i\leq n$, $n+1\leq j\leq n+m$ and $n+1\leq i\leq n+m$, $1\leq j\leq n$, it is easily seen that $\zeta_{ij} = 0$, since at least one factor in each product appearing in $\zeta_{ij}$ is zero. Hence $M_n\oplus M_m$, of order $n+m$, is a stationary point of the maximization problem (7). In a similar way, one can prove the first order necessary conditions for the minimization problem (17).

In order to verify the second order necessary and sufficient conditions for the problem (17), we consider the following:
1. Let us assume that $M_n = (a_{ij})_{i,j=1}^{n}$ and $M_m = (b_{ij})_{i,j=1}^{m}$ are strict local maxima of the problem (17) of orders $n$ and $m$ respectively. Since $M_n$ is a strict local maximum, from the second order necessary and sufficient conditions we know that
$$\begin{aligned} \xi_{M_n}^{i,j} :={}& 4\sum_{k=1}^{n} a_{ki}^2 a_{kj}^2\left(a_{ki}^2 g''(a_{ki}^2) + a_{kj}^2 g''(a_{kj}^2)\right) + 2\sum_{k=1}^{n}\left(a_{ki}^2 - a_{kj}^2\right)\left(g(a_{kj}^2) - g(a_{ki}^2)\right)\\ &+ 2\sum_{k=1}^{n}\left[5a_{ki}^2 a_{kj}^2\left(g'(a_{ki}^2) + g'(a_{kj}^2)\right) - a_{ki}^4 g'(a_{ki}^2) - a_{kj}^4 g'(a_{kj}^2)\right] < 0, \end{aligned} \tag{21}$$
for all $1\leq i<j\leq n$. Similarly, since $M_m$ is a strict local maximizer, we have
$$\begin{aligned} \xi_{M_m}^{i,j} :={}& 4\sum_{k=1}^{m} b_{ki}^2 b_{kj}^2\left(b_{ki}^2 g''(b_{ki}^2) + b_{kj}^2 g''(b_{kj}^2)\right) + 2\sum_{k=1}^{m}\left(b_{ki}^2 - b_{kj}^2\right)\left(g(b_{kj}^2) - g(b_{ki}^2)\right)\\ &+ 2\sum_{k=1}^{m}\left[5b_{ki}^2 b_{kj}^2\left(g'(b_{ki}^2) + g'(b_{kj}^2)\right) - b_{ki}^4 g'(b_{ki}^2) - b_{kj}^4 g'(b_{kj}^2)\right] < 0, \end{aligned} \tag{22}$$
for all $1\leq i<j\leq m$. Now, for $1\leq i<j\leq n$ and $X = M_n\oplus M_m$, we have
$$\begin{aligned} \xi_X^{i,j} :={}& 4\sum_{k=1}^{n+m} c_{ki}^2 c_{kj}^2\left(c_{ki}^2 g''(c_{ki}^2) + c_{kj}^2 g''(c_{kj}^2)\right) + 2\sum_{k=1}^{n+m}\left(c_{ki}^2 - c_{kj}^2\right)\left(g(c_{kj}^2) - g(c_{ki}^2)\right)\\ &+ 2\sum_{k=1}^{n+m}\left[5c_{ki}^2 c_{kj}^2\left(g'(c_{ki}^2) + g'(c_{kj}^2)\right) - c_{ki}^4 g'(c_{ki}^2) - c_{kj}^4 g'(c_{kj}^2)\right]\\ ={}& \xi_{M_n}^{i,j} < 0, \end{aligned} \tag{23}$$
since $c_{ki} = c_{kj} = 0$ for $k > n$; this covers $\frac{n(n-1)}{2}$ pairs. For $n+1\leq i<j\leq n+m$, by using (22), we similarly get
$$\xi_X^{i,j} = \xi_{M_m}^{(i-n),(j-n)} < 0, \tag{24}$$
which covers $\frac{m(m-1)}{2}$ pairs. Finally, for the remaining $nm$ pairs (the case $1\leq i\leq n$, $n+1\leq j\leq n+m$, with $j' = j-n$), the cross products $c_{ki}c_{kj}$ vanish for every $k$, so
$$\xi_X^{i,j} = -2\sum_{k=1}^{n}\left[a_{ki}^2 g(a_{ki}^2) + a_{ki}^4 g'(a_{ki}^2)\right] - 2\sum_{k=1}^{m}\left[b_{kj'}^2 g(b_{kj'}^2) + b_{kj'}^4 g'(b_{kj'}^2)\right] < 0, \tag{25}$$
since $g\geq 0$ and $g(\cdot)$ is monotonically increasing. Combining (23), (24) and (25), we finally get that $X = M_n\oplus M_m$ is a local maximizer for the problem (17) of order $n+m$.
2. If at least one of $M_n$ or $M_m$ is a saddle point, then from (23) or (24) it is clear that $M_n\oplus M_m$ is also a saddle point.
3. If $M_n$ and $M_m$ are local minima, then the quantities in (23) and (24) are positive, but the quantity in (25) is still negative, making $M_n\oplus M_m$ a saddle point.

Remark 4. In general, the second part of Proposition 4.2 is not true for the maximization problem (7). For instance, for local minima, the estimates corresponding to (23) and (24) are positive, but the estimate corresponding to (25) becomes
$$\xi_X^{i,j} = 2\sum_{k=1}^{n} a_{ki}^2\left[\frac{1}{a_{ki}^2}f'\left(\frac{1}{a_{ki}^2}\right) - f\left(\frac{1}{a_{ki}^2}\right)\right] + 2\sum_{k=1}^{m} b_{kj'}^2\left[\frac{1}{b_{kj'}^2}f'\left(\frac{1}{b_{kj'}^2}\right) - f\left(\frac{1}{b_{kj'}^2}\right)\right], \tag{26}$$
for $1\leq i\leq n$, $n+1\leq j\leq n+m$, $j' = j-n$. From the above estimate, the sign of $\xi_X^{i,j}$ is not conclusive, as $f\geq 0$ and $f(\cdot)$ is monotonically increasing.

Let us now define
$$J[M_n] := \sum_{i,j=1}^{n} a_{ij}^2\, f\left(\frac{1}{a_{ij}^2}\right)\quad\text{and}\quad \widetilde{J}[M_n] := \sum_{i,j=1}^{n} a_{ij}^2\, g(a_{ij}^2).$$

Lemma 4.3. Let the orthogonal matrices $M_n = (a_{ij})_{i,j=1}^{n}$ and $M_m = (b_{ij})_{i,j=1}^{m}$ be stationary points of the maximization problem (7) (minimization problem (17)). Then
$$J[M_n\oplus M_m] = J[M_n] + J[M_m],\qquad \widetilde{J}[M_n\oplus M_m] = \widetilde{J}[M_n] + \widetilde{J}[M_m]. \tag{27}$$

Proof. If $M_n\oplus M_m = (c_{ij})_{i,j=1}^{n+m}$, then we have
$$\begin{aligned} J[M_n\oplus M_m] &= \sum_{i,j=1}^{n+m} c_{ij}^2\, f\left(\frac{1}{c_{ij}^2}\right)\\ &= \sum_{i,j=1}^{n} c_{ij}^2\, f\left(\frac{1}{c_{ij}^2}\right) + \sum_{i,j=n+1}^{n+m} c_{ij}^2\, f\left(\frac{1}{c_{ij}^2}\right) + \sum_{i=1}^{n}\sum_{j=n+1}^{n+m} c_{ij}^2\, f\left(\frac{1}{c_{ij}^2}\right) + \sum_{i=n+1}^{n+m}\sum_{j=1}^{n} c_{ij}^2\, f\left(\frac{1}{c_{ij}^2}\right)\\ &= \sum_{i,j=1}^{n} a_{ij}^2\, f\left(\frac{1}{a_{ij}^2}\right) + \sum_{i,j=1}^{m} b_{ij}^2\, f\left(\frac{1}{b_{ij}^2}\right) \end{aligned}$$
$$= J[M_n] + J[M_m], \tag{28}$$
where the two cross sums vanish by the convention $0\times\infty = 0$. The second equality in (27) follows similarly.

Proposition 4.4. The orthogonal matrix
$$K_n = \frac{1}{n}\begin{pmatrix} -(n-2) & 2 & \cdots & 2\\ 2 & -(n-2) & \cdots & 2\\ \vdots & \vdots & \ddots & \vdots\\ 2 & 2 & \cdots & -(n-2) \end{pmatrix} \tag{29}$$
is always a stationary point of the maximization problem (7) (minimization problem (17)).

Proof. For the orthogonal matrix given in (29), the derivative matrix $G$ is given by
$$G = \frac{2}{n}\begin{pmatrix} -a & b & \cdots & b\\ b & -a & \cdots & b\\ \vdots & \vdots & \ddots & \vdots\\ b & b & \cdots & -a \end{pmatrix} = G^\top, \tag{30}$$
where
$$a = (n-2)\left(f\left(\frac{n^2}{(n-2)^2}\right) - \frac{n^2}{(n-2)^2}f'\left(\frac{n^2}{(n-2)^2}\right)\right),\quad\text{and}$$
$$b = 2\left(f\left(\frac{n^2}{4}\right) - \frac{n^2}{4}f'\left(\frac{n^2}{4}\right)\right).$$
Now the matrix $A$ is given by
$$A = GX^\top - XG^\top = GX - XG = 0, \tag{31}$$
since
$$GX = XG = \frac{2}{n^2}\begin{pmatrix} (n-2)a + 2(n-1)b & -2a + (n-2)b & \cdots & -2a + (n-2)b\\ -2a + (n-2)b & (n-2)a + 2(n-1)b & \cdots & -2a + (n-2)b\\ \vdots & \vdots & \ddots & \vdots\\ -2a + (n-2)b & -2a + (n-2)b & \cdots & (n-2)a + 2(n-1)b \end{pmatrix},$$
making $X = K_n$ a stationary point of the problem (7).

Theorem 4.5. There exist at least $p(n)$ stationary points of the maximization problem (7) (minimization problem (17)), where $p(n)$ represents the number of possible partitions of the natural number $n$, i.e., the number of distinct ways of representing $n$ as a sum of natural numbers.

Proof. Follows by combining Proposition 4.2 and Proposition 4.4.
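Proposition 4.4 is easy to check numerically: (29) can be written in closed form as $K_n = \frac{2}{n}J_n - I_n$, with $J_n$ the all-ones matrix, and the matrix $A$ of (4) vanishes at $K_n$. A sketch with the illustrative choice $f = \log$ (numpy assumed):

```python
import numpy as np

def K(n):
    """K_n of (29) in closed form: (2/n) J_n - I_n, J_n the all-ones matrix."""
    return (2.0 / n) * np.ones((n, n)) - np.eye(n)

def grad_f_log(X):
    """G of (8) for f = log: G_ij = 2 a_ij (log(1/a_ij^2) - 1)."""
    return 2 * X * (-np.log(X**2) - 1)

n = 5
Kn = K(n)
G = grad_f_log(Kn)
A = G @ Kn.T - Kn @ G.T                        # the skew-symmetric matrix of (4)
print(np.allclose(Kn @ Kn.T, np.eye(n)))       # True: K_n is orthogonal
print(np.allclose(A, np.zeros((n, n))))        # True: K_n is a stationary point
```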
Note that the generating function of $p(n)$ is given by $\sum_{n=0}^{\infty} p(n)x^n = \prod_{k=1}^{\infty}\frac{1}{1-x^k}$. Also, $p(n)$ can be computed using Euler's recurrence relation:
$$\sum_{i=0}^{n}\alpha(i)\,p(n-i) = \begin{cases} 1 & \text{if } n = 0,\\ 0 & \text{if } n\geq 1,\end{cases}\qquad p(0) = 1,\quad \alpha(n) = \begin{cases} 1 & \text{if } n = 0,\\ (-1)^r & \text{if } n = \frac{r(3r\pm 1)}{2},\ r\geq 1,\\ 0 & \text{otherwise.}\end{cases}$$
The first few values of $p(n)$ are $p(1) = 1$, $p(2) = 2$, $p(3) = 3$, $p(4) = 5$, $p(5) = 7$, $p(6) = 11$, $p(7) = 15$, $p(8) = 22$, $p(9) = 30$, $p(10) = 42$, etc.
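Euler's recurrence translates directly into a short memoized routine; a sketch (the function name and the use of functools caching are implementation choices, not from the paper):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p(n: int) -> int:
    """Partition numbers via Euler's pentagonal number recurrence."""
    if n == 0:
        return 1
    total, r = 0, 1
    while r * (3 * r - 1) // 2 <= n:
        sign = 1 if r % 2 == 1 else -1     # (-1)^{r+1}, from alpha(n) = (-1)^r
        g1 = r * (3 * r - 1) // 2          # generalized pentagonal numbers
        g2 = r * (3 * r + 1) // 2
        total += sign * p(n - g1)
        if g2 <= n:
            total += sign * p(n - g2)
        r += 1
    return total

print([p(k) for k in range(1, 11)])   # [1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
```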
Lemma 4.6. Let the orthogonal matrices $M_n = (a_{ij})_{i,j=1}^{n}$ and $M_m = (b_{ij})_{i,j=1}^{m}$ be stationary points of the maximization problem (7) (minimization problem (17)) of orders $n$ and $m$ respectively. If $M_n\otimes M_m$ is a stationary point of the maximization problem (7) (minimization problem (17)) of order $nm$, then
$$J[M_n\otimes M_m] = \sum_{k,l=1}^{n}\sum_{i,j=1}^{m} a_{kl}^2 b_{ij}^2\, f\left(\frac{1}{a_{kl}^2 b_{ij}^2}\right),\qquad \widetilde{J}[M_n\otimes M_m] = \sum_{k,l=1}^{n}\sum_{i,j=1}^{m} a_{kl}^2 b_{ij}^2\, g\left(a_{kl}^2 b_{ij}^2\right). \tag{32}$$
If $f(xy) = f(x) + f(y)$ (or $g(xy) = g(x) + g(y)$), then
$$J[M_n\otimes M_m] = mJ[M_n] + nJ[M_m],\qquad \widetilde{J}[M_n\otimes M_m] = m\widetilde{J}[M_n] + n\widetilde{J}[M_m], \tag{33}$$
and if $f(xy) = f(x)f(y)$ (or $g(xy) = g(x)g(y)$), then
$$J[M_n\otimes M_m] = J[M_n]J[M_m],\qquad \widetilde{J}[M_n\otimes M_m] = \widetilde{J}[M_n]\widetilde{J}[M_m]. \tag{34}$$
Proof. If $M_n\otimes M_m = (c_{ij})_{i,j=1}^{mn}$, then we have
$$J[M_n\otimes M_m] = \sum_{i,j=1}^{mn} c_{ij}^2\, f\left(\frac{1}{c_{ij}^2}\right) = \sum_{k,l=1}^{n}\sum_{i,j=1}^{m} a_{kl}^2 b_{ij}^2\, f\left(\frac{1}{a_{kl}^2 b_{ij}^2}\right). \tag{35}$$
If $f(xy) = f(x) + f(y)$, then we have
$$\begin{aligned} J[M_n\otimes M_m] &= \sum_{k,l=1}^{n}\sum_{i,j=1}^{m} a_{kl}^2 b_{ij}^2\left(f\left(\frac{1}{a_{kl}^2}\right) + f\left(\frac{1}{b_{ij}^2}\right)\right)\\ &= \sum_{k,l=1}^{n} a_{kl}^2\, f\left(\frac{1}{a_{kl}^2}\right)\sum_{i,j=1}^{m} b_{ij}^2 + \sum_{k,l=1}^{n} a_{kl}^2 \sum_{i,j=1}^{m} b_{ij}^2\, f\left(\frac{1}{b_{ij}^2}\right)\\ &= mJ[M_n] + nJ[M_m], \end{aligned} \tag{36}$$
since $\sum_{i,j=1}^{m} b_{ij}^2 = m$ and $\sum_{k,l=1}^{n} a_{kl}^2 = n$ for orthogonal matrices. If $f(xy) = f(x)f(y)$, then
$$J[M_n\otimes M_m] = \sum_{k,l=1}^{n}\sum_{i,j=1}^{m} a_{kl}^2 b_{ij}^2\, f\left(\frac{1}{a_{kl}^2}\right)f\left(\frac{1}{b_{ij}^2}\right) = \left(\sum_{k,l=1}^{n} a_{kl}^2\, f\left(\frac{1}{a_{kl}^2}\right)\right)\left(\sum_{i,j=1}^{m} b_{ij}^2\, f\left(\frac{1}{b_{ij}^2}\right)\right) = J[M_n]J[M_m], \tag{37}$$
which completes the proof.
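The additive case (33) is easy to verify numerically for $f = \log$ (an admissible $f$ for which $J[\cdot]$ is the Shannon entropy functional discussed in Section 5); a sketch assuming numpy:

```python
import numpy as np

def J(M):
    """J[M] = sum a_ij^2 f(1/a_ij^2) with f = log, i.e. -sum a^2 log a^2."""
    a2 = M[M != 0]**2                 # implements the convention 0 * log(0) = 0
    return -np.sum(a2 * np.log(a2))

A = np.array([[0.6, -0.8], [0.8, 0.6]])               # orthogonal, n = 2
B = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # orthogonal, m = 2

lhs = J(np.kron(A, B))
rhs = 2 * J(A) + 2 * J(B)             # m J[A] + n J[B] with n = m = 2
print(np.isclose(lhs, rhs))           # True
```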
Theorem 4.7. The cost function in (7) satisfies
$$0\leq J[M_n]\leq nf(n),\quad\text{for all } n, \tag{38}$$
and the global maximum value $nf(n)$ is attained if and only if $a_{ij}^2 = \frac{1}{n}$.
Proof. It is clear that $J[M_n]\geq 0$, since $f$ is monotonically increasing and $f(1) = 0$. Since $\sum_{j=1}^{n} a_{ij}^2 = 1$ for all $1\leq i\leq n$, by using Jensen's inequality we have
$$f(n) = f\left(\sum_{j=1}^{n} a_{ij}^2\,\frac{1}{a_{ij}^2}\right) \geq \sum_{j=1}^{n} a_{ij}^2\, f\left(\frac{1}{a_{ij}^2}\right). \tag{39}$$
Summing over $i$ from $1$ to $n$, we get
$$0\leq \sum_{i,j=1}^{n} a_{ij}^2\, f\left(\frac{1}{a_{ij}^2}\right) \leq \sum_{i=1}^{n} f(n) = nf(n). \tag{40}$$
Since $f(\cdot)$ is a strictly concave function, the inequality on the right hand side of (40) is strict unless $a_{ij}^2 = \frac{1}{n}$. The global minimum value $0$ is attained by the orthogonal matrices having $a_{ij}^2 = \delta_{ij}$ for all $1\leq i,j\leq n$, where $\delta_{ij} = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i\neq j \end{cases}$ (and their permutations).

Theorem 4.8. The cost function in (17) satisfies
$$ng(1) \geq \widetilde{J}[M_n] \geq ng\left(\frac{1}{n}\right),\quad\text{for all } n. \tag{41}$$
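The endpoints of the bounds in Theorems 4.7 and 4.8 are attained at permutation-type matrices and at normalized Hadamard matrices, which can be checked numerically; a sketch with the admissible illustrative choices $f = \log$ in (7) and $g(x) = x^2$ in (17), assuming numpy:

```python
import numpy as np

n = 4
H = np.array([[1,  1,  1,  1],
              [1, -1,  1, -1],
              [1,  1, -1, -1],
              [1, -1, -1,  1]]) / 2.0   # normalized Hadamard: a_ij^2 = 1/n
I = np.eye(n)

def J(M):                                # cost of (7) with f = log
    a2 = M[M != 0]**2
    return -np.sum(a2 * np.log(a2))

def Jt(M):                               # cost of (17) with g(x) = x^2
    return np.sum(M**6)

print(np.isclose(J(I), 0.0), np.isclose(J(H), n * np.log(n)))   # ends of (38)
print(np.isclose(Jt(I), n), np.isclose(Jt(H), 1.0 / n))         # ends of (41)
```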