NUMERICAL ALGEBRA, CONTROL AND OPTIMIZATION
Volume 8, Number 4, December 2018, pp. 413–440
doi:10.3934/naco.2018026

OPTIMIZATION PROBLEMS WITH ORTHOGONAL MATRIX CONSTRAINTS

K. T. Arasu
Department of Mathematics and Statistics, Wright State University
3640 Colonel Glenn Highway, Dayton, OH 45435, USA

Manil T. Mohan∗†
Department of Mathematics and Statistics, Air Force Institute of Technology
2950 Hobson Way, Wright Patterson Air Force Base, OH 45433, USA

(Communicated by Zhong Wan)

Abstract. Optimization problems involving orthogonal matrices are formulated in this work. A lower bound for the number of stationary points of such optimization problems is found, and its connection to the number of possible partitions of natural numbers is established. We obtain local and global optima of such problems for different orders and show their connection with the Hadamard, conference and weighing matrices. Applications of the general theory to some concrete examples, including the maximization of the Shannon, Rényi, Tsallis and Sharma-Mittal entropies for orthogonal matrices, minimum distance orthostochastic matrices to uniform van der Waerden matrices, and the Cressie-Read and K-divergence functions for orthogonal matrices, are also discussed. For the optimization problems involving unitary matrix constraints, global optima are found for all orders.

1. Introduction. Optimization problems involving constraints like orthogonal matrices and unitary matrices play an important role in several applications of science, engineering and technology. Some examples include linear and nonlinear eigenvalue problems, electronic structure computations, low-rank matrix optimization, polynomial optimization, subspace tracking, combinatorial optimization, sparse principal component analysis, etc. (see [1, 8, 18, 24, 25] and references therein). These problems are difficult because the constraints are non-convex and may lead to several local optima; in particular, many of these problems in special forms are non-deterministic polynomial-time hard (NP-hard). There is no assurance of obtaining the global optimizer except for a few simple cases, and hence finding a global optimum is a tedious task for numerical analysts (see [25] for more details).

2010 Mathematics Subject Classification. Primary: 15B51; Secondary: 65K05, 15B10, 15B34.
Key words and phrases. Optimization, orthogonal matrix, Hadamard matrix, conference matrix, weighing matrix, orthostochastic matrix, Shannon entropy.
∗Corresponding author: Manil T. Mohan.
†M. T. Mohan's current address: Department of Mathematics, Indian Institute of Technology Roorkee (IIT Roorkee), Haridwar Highway, Roorkee, Uttarakhand 247 667, India.


In this work, we formulate some interesting optimization problems involving orthogonal matrix constraints (of order $n$, $n \in \mathbb{N}$) and find a fascinating connection between the number of stationary points and the number of possible partitions of a natural number. We identify some global and local optima for such problems, which are inevitably related to the real Hadamard, conference and weighing matrices. For the optimization problems involving unitary matrix constraints, we find global optima for all orders. We also give applications of the general theory to some concrete examples, including the maximization of the Shannon, Rényi, Tsallis and Sharma-Mittal entropies for orthogonal matrices, minimum distance orthostochastic matrices to uniform van der Waerden matrices, and the Cressie-Read and K-divergence functions for orthogonal matrices.

Let us now list some of the optimization problems with orthogonal matrix constraints available in the literature. One of the first such problems is addressed in [13], where the optimization problem was to find the nearest orthogonal matrix to a given matrix. The entropy of a random variable is the measure of uncertainty in Information Theory. The Shannon entropy of a real orthogonal matrix is introduced in [9], and it has been shown in [2, 9] that if a real Hadamard matrix exists, then it maximizes the Shannon entropy. The work in [3] extended this idea and found (local and global) minimum distance orthostochastic matrices to uniform van der Waerden matrices (see [3, 5, 17]) of different orders. The uniform van der Waerden matrix of order $n$ (with all its entries $1/n$) is bistochastic, and is orthostochastic if and only if there exists a real Hadamard matrix of order $n$. Note that if a real Hadamard matrix of order $n$ exists, then $n = 1$, $2$, or $n \equiv 0 \pmod 4$. Thus, for all other orders, finding local and global optima for the optimization problem of minimum distance orthostochastic matrices to the uniform van der Waerden matrix is a challenging problem. From a numerical point of view, the authors in [25] developed a feasible method for optimization problems with orthogonality constraints, and they covered a wide range of such problems.

Bistochastic and orthostochastic matrices have a variety of applications in Statistics, Mathematics and Physics, including but not limited to the theory of majorization, angular momentum, transfer problems, investigations of the Frobenius-Perron operator, and the characterization of completely positive maps acting on the space of density matrices; see for example [4, 15, 26].

The construction of the paper is as follows. In the next section, we give some basic definitions and properties of the orthogonal group and of Hadamard, conference and weighing matrices. In Section 3, we formulate a general optimization problem and discuss its stationary points and local and global maxima (and minima). A general maximization problem (minimization problem) is described in Section 4, and several properties of its stationary points and local and global maxima (minima) are examined there. The same optimization problems involving unitary matrices as constraints are also formulated in that section, and we find matrices built from complex Hadamard matrices to be the global optimizers for all orders. Applications of the general theory to some particular examples, including the maximization of the Shannon, Rényi, Tsallis and Sharma-Mittal entropies for orthogonal matrices, minimum distance orthostochastic matrices to uniform van der Waerden matrices, and the Cressie-Read and K-divergence functions for orthogonal matrices, are given in Section 5.

2. Preliminaries. In this section, we give some preliminaries needed to establish the main results of this paper. In the sequel, $M(n, \mathbb{R}) = \mathbb{R}^{n \times n}$ denotes the space of all $n \times n$ real matrices and $M(n, \mathbb{C}) = \mathbb{C}^{n \times n}$ denotes the space of all $n \times n$ complex matrices.

Definition 2.1. For every positive integer $n$, the orthogonal group $O(n, \mathbb{R})$ is the group of $n \times n$ real orthogonal matrices $M_n$ with the group operation of matrix multiplication, satisfying
\[ M_n M_n^\top = M_n^\top M_n = I_n, \]
where $I_n$ is the $n \times n$ identity matrix and $M_n^\top$ is the transpose of $M_n$.

Because the determinant of an orthogonal matrix is either $+1$ or $-1$, the orthogonal group has two components. The component containing the identity $I_n$ is the special orthogonal group $SO(n, \mathbb{R})$. That is,
\[ SO(n, \mathbb{R}) := \left\{ M_n \in O(n, \mathbb{R}) : \det(M_n) = +1 \right\}, \]
and it is a normal subgroup of $O(n, \mathbb{R})$ with $O(n, \mathbb{R})/SO(n, \mathbb{R}) \cong \mathbb{Z}_2$. Note that $O(n, \mathbb{R})$ is compact and not connected, and it has two connected components. The map $\det : O(n, \mathbb{R}) \to \{-1, +1\}$ is a continuous map and $\{+1\}$ is open in $\{-1, +1\}$. Thus, $SO(n, \mathbb{R}) = \det^{-1}(\{+1\})$ is an open, connected subset of $O(n, \mathbb{R})$, and both $O(n, \mathbb{R})$ and $SO(n, \mathbb{R})$ are smooth submanifolds of $\mathbb{R}^{n^2}$, the $n^2$-dimensional Euclidean space. For more properties of the orthogonal and special orthogonal groups, interested readers are referred to Section 7.4.2 of [14].

Remark 1. The vector space of real $n \times n$ skew-symmetric matrices is denoted by $so(n, \mathbb{R})$. The exponential map $\exp : so(n, \mathbb{R}) \to SO(n, \mathbb{R})$,

defined by $\exp(A_n) = e^{A_n} := I_n + \sum_{k=1}^{\infty} \frac{A_n^k}{k!}$, for all $A_n \in so(n, \mathbb{R})$, is well defined and surjective. Given a real skew-symmetric matrix $A_n$, $R_n = \exp(A_n)$ is a rotation matrix; conversely, given a rotation matrix $R_n \in SO(n, \mathbb{R})$, there is some skew-symmetric matrix $A_n$ such that $R_n = \exp(A_n)$ (see Theorem 14.2.2, [10]).
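As a quick numerical illustration of Remark 1 (a sketch added here, not part of the original argument; the order $n = 4$ and the random seed are arbitrary choices), one can check with SciPy that the exponential of a random skew-symmetric matrix is a rotation:

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential

rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
A = B - B.T                    # a real skew-symmetric matrix A_n in so(n, R)
R = expm(A)                    # exp(A_n), which should lie in SO(n, R)

assert np.allclose(R @ R.T, np.eye(n))    # orthogonality: R R^T = I_n
assert np.isclose(np.linalg.det(R), 1.0)  # det(R) = +1, so R is a rotation
```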

Definition 2.2 (Real Hadamard matrix). A real Hadamard matrix $H_n$ of order $n$ is an $n \times n$ square matrix with entries from $\{+1, -1\}$ such that

\[ H_n H_n^\top = n I_n. \]
The following result gives the existence of Hadamard matrices for various orders.

Lemma 2.3 (Theorem 4.4, [21]; Lemma 1.1.4, [23]). If a real Hadamard matrix of order $n$ exists, then $n = 1$, $2$ or $n \equiv 0 \pmod 4$.

Conjecture 1 (The Hadamard conjecture, [20]). If $n \equiv 0 \pmod 4$, then there is a real Hadamard matrix of order $n$.

Definition 2.4 (Real conference matrix). A real conference matrix of order $n > 1$ is an $n \times n$ matrix $C_n \in M(n, \mathbb{R})$ with diagonal entries $0$ and off-diagonal entries $\pm 1$ which satisfies
\[ C_n C_n^\top = (n-1) I_n. \]

The defining equation shows that any two rows of $C_n$ are orthogonal, and hence $n$ must be even. The following result gives the equivalence of real conference matrices.

Lemma 2.5 (Corollary 2.2, [7]). Any real conference matrix of order $n > 2$ is equivalent, under multiplication of rows and columns by $-1$, to a symmetric or to a skew-symmetric conference matrix according as $n \equiv 2 \pmod 4$ or $n \equiv 0 \pmod 4$.

Definition 2.6 (Real weighing matrix). A real weighing matrix $W_{n,k}$ is an $n \times n$ square matrix with entries $0, \pm 1$, having $k$ non-zero entries per row and column and the inner product of distinct rows zero. Hence, $W_{n,k}$ satisfies $W_{n,k} W_{n,k}^\top = k I_n$. The number $k$ is called the weight of $W_{n,k}$.

The determinant of $W_{n,k}$ is $\pm k^{n/2}$, and if $n$ is odd, $[\det(W_{n,k})]^2 = k^n$ implies that $k$ must be a square. Note that a $W_{n,n}$, $n \equiv 0 \pmod 4$, is a real Hadamard matrix of order $n$, and a $W_{n,n-1}$, $n \equiv 2 \pmod 4$, is equivalent to a symmetric conference matrix. Under this definition, the zero element is no longer required to be on the diagonal, and hence $W_{n,n-1}$ relaxes the definition of real conference matrices. The following theorem gives the existence of real weighing matrices.

Theorem 2.7 (Proposition 23, [11]).
1. If $n$ is odd, then a $W_{n,k}$ exists only if (i) $k$ is a square and (ii) $(n-k)^2 - (n-k) + 1 \geq n$.
2. If $n \equiv 2 \pmod 4$, then for a $W_{n,k}$ to exist, (i) $k \leq n-1$ and (ii) $k$ is the sum of two integral squares.

Also, it has been conjectured that if $n \equiv 0 \pmod 4$, then a $W_{n,k}$ exists for all $1 \leq k \leq n$ (see [12]).

Definition 2.8. For every positive integer $n$, the unitary group $U(n, \mathbb{C})$ is the group of $n \times n$ unitary matrices $U_n$ with the group operation of matrix multiplication, satisfying
\[ U_n U_n^* = U_n^* U_n = I_n, \]
where $U_n^*$ is the conjugate transpose of $U_n$. The $n \times n$ special unitary group is
\[ SU(n, \mathbb{C}) = \left\{ U_n \in U(n, \mathbb{C}) : \det(U_n) = +1 \right\}. \]
Note that the unitary and special unitary groups are smooth manifolds, compact and path-connected.

Definition 2.9 (Complex Hadamard matrix). A complex Hadamard matrix $H_n \in M(n, \mathbb{C})$ is an $n \times n$ matrix with complex entries of modulus $1$ such that $H_n H_n^* = n I_n$.

Lemma 2.10. For every $n \geq 1$, there exists a complex Hadamard matrix of order $n$.

Proof. The Fourier matrix
\[ [F_n]_{ij} = e^{2\pi i (i-1)(j-1)/n}, \quad i, j = 1, \dots, n, \]
is an example of a complex Hadamard matrix of order $n$.

The following are complex Hadamard matrices of orders $n = 1, 2$ and $3$, respectively:
\[ F_1 = [1], \quad F_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad F_3 = \begin{pmatrix} 1 & 1 & 1 \\ 1 & \omega & \omega^2 \\ 1 & \omega^2 & \omega \end{pmatrix}, \]
where $\{1, \omega, \omega^2\}$ are the cube roots of unity. For more details on complex Hadamard matrices, see [23].
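The Fourier matrix of Lemma 2.10 is easy to verify numerically; the following Python sketch (an illustration added here, with $0$-based indices $j, k$ replacing $i-1, j-1$) checks the complex Hadamard property $F_n F_n^* = n I_n$ for small orders:

```python
import numpy as np

def fourier_matrix(n: int) -> np.ndarray:
    # [F_n]_{jk} = e^{2*pi*i*jk/n} with 0-based indices j, k = 0, ..., n-1
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(2j * np.pi * j * k / n)

for n in range(1, 8):
    F = fourier_matrix(n)
    assert np.allclose(np.abs(F), 1.0)                 # entries of modulus 1
    assert np.allclose(F @ F.conj().T, n * np.eye(n))  # F_n F_n^* = n I_n
```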

3. The Optimization Problem. In this section, we formulate optimization problems involving orthogonal matrix constraints. For notational convenience, we use the symbols $A, B, G, I, W, X, Y, Z$, etc., for $n \times n$ matrices. The general minimization/maximization problem (see [25]) over the real orthogonal matrices can be formulated as
\[ \min/\max_{X \in \mathbb{R}^{n \times n}} \mathcal{F}(X), \ \text{such that} \ X^\top X = I, \tag{1} \]
where $\mathcal{F}(\cdot) : \mathbb{R}^{n \times n} \to \mathbb{R}$ is a differentiable function. The feasible set
\[ \mathcal{M}_n := \left\{ X \in \mathbb{R}^{n \times n} : X^\top X = I_n \right\} \]
is often referred to as the Stiefel manifold.

Given a differentiable function $\mathcal{F}(\cdot) : \mathbb{R}^{n \times n} \to \mathbb{R}$, the gradient of $\mathcal{F}(\cdot)$ with respect to $X$ is denoted by $G := D\mathcal{F}(X) = \left( \frac{\partial \mathcal{F}(X)}{\partial X_{ij}} \right)_{i,j=1}^n$. The derivative of $\mathcal{F}(\cdot)$ at $X$ in the direction $Z$ is
\[ D\mathcal{F}(X)[Z] := \lim_{t \to 0} \frac{\mathcal{F}(X + tZ) - \mathcal{F}(X)}{t} = \langle D\mathcal{F}(X), Z \rangle. \]
Here $\langle \cdot, \cdot \rangle$ denotes the Euclidean inner product between two matrices, defined by
\[ \langle A, B \rangle := \sum_{j,k=1}^n a_{jk} b_{jk} = \mathrm{Tr}(A^\top B), \ \text{for all} \ A = (a_{ij})_{i,j=1}^n, \ B = (b_{ij})_{i,j=1}^n \in \mathbb{R}^{n \times n}, \]
where $\mathrm{Tr}(A)$ is the trace of $A$. We use $\nabla \mathcal{F}$ for gradients in tangent planes. The Frobenius norm of $A \in \mathbb{R}^{n \times n}$ is defined by $\|A\|_F^2 := \sum_{i,j=1}^n a_{ij}^2$. Given a feasible point $X$ and the gradient $G$, we define a skew-symmetric matrix $A$ as either
\[ A := G X^\top - X G^\top \quad \text{or} \quad A := (P_X G) X^\top - X (P_X G)^\top, \ \text{where} \ P_X := I - \frac{1}{2} X X^\top. \tag{2} \]
Following [25], we next state the first and second order optimality conditions in the following lemmas. Since the matrix $X^\top X = I$ is symmetric, the Lagrangian multiplier $\Lambda$ corresponding to $X^\top X = I$ is a symmetric matrix. The Lagrangian function for the optimization problem (1) is
\[ \mathcal{L}(X, \Lambda) = \mathcal{F}(X) - \frac{1}{2} \mathrm{Tr}\left( \Lambda \left( X^\top X - I \right) \right). \tag{3} \]

Lemma 3.1 (First order necessary conditions, Lemma 1, [25]). If $X$ is a local minimizer of the problem (1), then $X$ satisfies the first order optimality conditions $D_X \mathcal{L}(X, \Lambda) = G - X G^\top X = 0$ and $X^\top X = I$, with the associated Lagrangian multiplier $\Lambda = G^\top X$. Define
\[ \nabla \mathcal{F}(X) := G - X G^\top X \quad \text{and} \quad A := G X^\top - X G^\top. \tag{4} \]
Then $\nabla \mathcal{F}(X) = A X$. Moreover, $\nabla \mathcal{F}(X) = 0$ if and only if $A = 0$.

Now we state the second order necessary and sufficient conditions for the maximization problem in (1).

Lemma 3.2 (Second order necessary conditions, Theorem 12.5, [19], Lemma 2, [25]). Suppose that $X \in \mathcal{M}_n$ is a local maximizer for the problem (1). Then $X$ satisfies
\[ \mathrm{Tr}\left( Z^\top D(D\mathcal{F}(X))[Z] \right) - \mathrm{Tr}\left( \Lambda Z^\top Z \right) \leq 0, \ \text{where} \ \Lambda = G^\top X, \tag{5} \]

for all $Z \in T_X \mathcal{M}_n := \left\{ Z \in \mathbb{R}^{n \times n} : X^\top Z + Z^\top X = 0 \right\}$, which is the tangent space of $\mathcal{M}_n$ at $X$.

Lemma 3.3 (Second order sufficient conditions, Theorem 12.6, [19], Lemma 2, [25]). Suppose that for $X \in \mathcal{M}_n$, there exists a Lagrange multiplier $\Lambda$ such that the first order conditions are satisfied. Suppose also that
\[ \mathrm{Tr}\left( Z^\top D(D\mathcal{F}(X))[Z] \right) - \mathrm{Tr}\left( \Lambda Z^\top Z \right) < 0, \tag{6} \]
for any matrix $Z \in T_X \mathcal{M}_n$. Then $X$ is a strict local maximizer for the problem (1).

Remark 2. For the corresponding minimization problem, Lemma 3.1 remains the same and the inequalities (5) and (6) in the necessary and sufficient conditions (see Lemma 3.2 and Lemma 3.3) are reversed.
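The identities of Lemma 3.1 can be checked numerically. The sketch below (an added illustration; the test functional $\mathcal{F}(X) = \sum_{i,j} X_{ij}^4$ and the finite-difference gradient are arbitrary choices) verifies that $\nabla \mathcal{F}(X) = G - X G^\top X = A X$ at a feasible point:

```python
import numpy as np

def F(X):
    # an arbitrary smooth test functional
    return np.sum(X ** 4)

def num_gradient(F, X, h=1e-6):
    # central-difference approximation of G = (dF/dX_ij)
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (F(X + E) - F(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # a feasible point: X^T X = I
G = num_gradient(F, X)
A = G @ X.T - X @ G.T                             # the skew-symmetric matrix of (4)
grad = G - X @ G.T @ X                            # grad F(X) from Lemma 3.1
assert np.allclose(grad, A @ X, atol=1e-4)        # grad F(X) = A X
```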

Remark 3. We know that $SO(n, \mathbb{R})$ is the connected component of the identity in $O(n, \mathbb{R})$, that is, the subset of $O(n, \mathbb{R})$ whose members are connected to the identity by paths (the path-connected component). Let $\gamma : (-a, a) \to SO(n, \mathbb{R})$, $a \in \mathbb{R}$, be a smooth curve with $\gamma(0) = I$. Since it is a curve in $SO(n, \mathbb{R})$, for each $s$ we have $\gamma(s) \gamma(s)^\top = I$. Differentiating this with respect to $s$ gives $\gamma'(s) \gamma(s)^\top + \gamma(s) \gamma'(s)^\top = 0$. For $s = 0$, we have $\gamma'(0) + \gamma'(0)^\top = 0$, so that $\gamma'(0)$ is skew-symmetric. Hence, every tangent vector to $SO(n, \mathbb{R})$ at $I$ is a skew-symmetric matrix. Since $T_I \mathcal{M}_n^{SO} \subset so(n, \mathbb{R})$, where $\mathcal{M}_n^{SO} := \left\{ X \in \mathbb{R}^{n \times n} : X^\top X = I, \det(X) = +1 \right\}$ and $so(n, \mathbb{R})$ denotes the space of $n \times n$ skew-symmetric matrices, and both are vector spaces of dimension $\frac{n(n-1)}{2}$, they must be equal. In general, the tangent space at a matrix $X \in SO(n, \mathbb{R})$ is given by
\[ T_X \mathcal{M}_n^{SO} = \left\{ X A : A \in so(n, \mathbb{R}) \right\}, \]

where the dimension of $T_X \mathcal{M}_n^{SO}$ is $\frac{n^2 - n}{2}$. This means that any tangent vector at $X$ is $X$ times some skew-symmetric matrix. Since $SO(n, \mathbb{R})$ is the connected component of $O(n, \mathbb{R})$ containing $I$, the group $O(n, \mathbb{R})$ has the same tangent space at the neutral element $I$, because all members of $O(n, \mathbb{R})$ near the identity are members of $SO(n, \mathbb{R})$. We denote $T_X \mathcal{M}_n = T_X \mathcal{M}_n^{O} := T_X \mathcal{M}_n^{SO}$.

3.1. The Maximization Problem. Let $\mathbb{R}_0^+$ denote the set of all positive real numbers together with $0$. For convenience, we use the symbols $X_{ij}$ for entries of the matrix $X$ and $X$ for general matrices in $\mathbb{R}^{n \times n}$. For a matrix $X = M_n = (a_{ij})_{i,j=1}^n$, let us consider
\[ \mathcal{J}[M_n] := \max_{X \in \mathbb{R}^{n \times n}} \sum_{i,j=1}^n a_{ij}^2 f\left( \frac{1}{a_{ij}^2} \right), \ \text{such that} \ \sum_{k=1}^n a_{ki} a_{kj} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases} \tag{7} \]
where
(i) $f(\cdot) : [1, \infty] \to \mathbb{R}_0^+ \cup \{\infty\}$ is a twice continuously differentiable function with $f(1) = 0$,
(ii) $f$ is monotonically increasing in $[1, \infty)$ with $f'(\infty) = 0$,
(iii) $f$ is a strictly concave function for $x \in [1, \infty)$ with $f''(\infty) = 0$,

and we define $0 \times \infty = 0$. Here, we denote $f'(a) = \frac{df}{dx}(a)$ and $f''(a) = \frac{d^2 f}{dx^2}(a)$. The derivative matrix $G$, the matrix $A$ and the Lagrange multiplier $\Lambda$ are given by
\[ G := \left( \frac{\partial \mathcal{F}(X)}{\partial X_{ij}} \right)_{i,j=1}^n = \left( 2 \left( a_{ij} f\left( \frac{1}{a_{ij}^2} \right) - \frac{1}{a_{ij}} f'\left( \frac{1}{a_{ij}^2} \right) \right) \right)_{i,j=1}^n, \tag{8} \]
\[ A := G X^\top - X G^\top \]

with entries
\[ A_{ij} = 2 \sum_{k=1}^n a_{ik} a_{jk} \left\{ \left[ f\left( \frac{1}{a_{ik}^2} \right) - f\left( \frac{1}{a_{jk}^2} \right) \right] - \left[ \frac{1}{a_{ik}^2} f'\left( \frac{1}{a_{ik}^2} \right) - \frac{1}{a_{jk}^2} f'\left( \frac{1}{a_{jk}^2} \right) \right] \right\}, \tag{9} \]
and
\[ \Lambda := G^\top X = \left( 2 \sum_{k=1}^n a_{ki} a_{kj} \left[ f\left( \frac{1}{a_{ki}^2} \right) - \frac{1}{a_{ki}^2} f'\left( \frac{1}{a_{ki}^2} \right) \right] \right)_{i,j=1}^n. \tag{10} \]
If a matrix $X = M_n = (a_{ij})_{i,j=1}^n$ maximizes the function given in (7), then the first order condition given in Lemma 3.1 becomes $G X^\top = X G^\top$, and hence for all $1 \leq i, j \leq n$, we have

\[ \sum_{k=1}^n a_{ik} a_{jk} \left\{ \left[ f\left( \frac{1}{a_{ik}^2} \right) - f\left( \frac{1}{a_{jk}^2} \right) \right] - \left[ \frac{1}{a_{ik}^2} f'\left( \frac{1}{a_{ik}^2} \right) - \frac{1}{a_{jk}^2} f'\left( \frac{1}{a_{jk}^2} \right) \right] \right\} = 0. \tag{11} \]
Also, for the second order necessary and sufficient conditions given in Lemma 3.2 and Lemma 3.3, we first calculate $D(D\mathcal{F}(X))[Z]$ as
\[ \left( D(D\mathcal{F}(X))[Z] \right)_{ij} = \frac{\partial (D\mathcal{F}(X))_{ij}}{\partial X_{ij}}\, z_{ij} = 2 z_{ij} \left[ \frac{2}{a_{ij}^4} f''\left( \frac{1}{a_{ij}^2} \right) - \frac{1}{a_{ij}^2} f'\left( \frac{1}{a_{ij}^2} \right) + f\left( \frac{1}{a_{ij}^2} \right) \right], \tag{12} \]

where $Z = (z_{ij})_{i,j=1}^n$. Thus, we find $\mathrm{Tr}\left( Z^\top D(D\mathcal{F}(X))[Z] \right)$ as

\[ \mathrm{Tr}\left( Z^\top D(D\mathcal{F}(X))[Z] \right) = \sum_{i,j=1}^n z_{ij} \left( D(D\mathcal{F}(X))[Z] \right)_{ij} = 2 \sum_{i,j=1}^n z_{ij}^2 \left[ \frac{2}{a_{ij}^4} f''\left( \frac{1}{a_{ij}^2} \right) - \frac{1}{a_{ij}^2} f'\left( \frac{1}{a_{ij}^2} \right) + f\left( \frac{1}{a_{ij}^2} \right) \right], \tag{13} \]
for all $Z \in T_X \mathcal{M}_n$. Let us define $A^{i,j} \in so(n)$, for $1 \leq i < j \leq n$, as the skew-symmetric matrix with $+1$ in the $ij$-th position and $-1$ in the $ji$-th position, and set $Z^{i,j} := X A^{i,j} \in T_X \mathcal{M}_n$. Now, for $1 \leq i < j \leq n$, by using the orthogonality, we have

n " ! !!# X   1  1  1  1 1 1 = 2 a2 f − f 0 + a2 f − f 0 . ki a2 a2 a2 kj a2 a2 a2 k=1 ki ki ki kj kj kj (14) From (13), we also obtain Tr (Zi,j)>D (DF(X)) [Zi,j] n " ! ! !# X 2 1 1 1 1 = 2 a2 f 00 − f 0 + f ki a4 a2 a2 a2 a2 k=1 kj kj kj kj kj n X  2  1  1  1   1  + 2 a2 f 00 − f 0 + f . (15) kj a4 a2 a2 a2 a2 k=1 ki ki ki ki ki Thus for 1 ≤ i < j ≤ n, we have

\begin{align*} \xi_X^{i,j} :=\ & \mathrm{Tr}\left( (Z^{i,j})^\top D(D\mathcal{F}(X))[Z^{i,j}] \right) - \mathrm{Tr}\left( \Lambda (Z^{i,j})^\top Z^{i,j} \right) \\ =\ & 4 \sum_{k=1}^n \left[ \frac{a_{ki}^2}{a_{kj}^4} f''\left( \frac{1}{a_{kj}^2} \right) + \frac{a_{kj}^2}{a_{ki}^4} f''\left( \frac{1}{a_{ki}^2} \right) \right] \\ & + 2 \sum_{k=1}^n \left( a_{ki}^2 - a_{kj}^2 \right) \left[ \frac{1}{a_{ki}^2} f'\left( \frac{1}{a_{ki}^2} \right) - \frac{1}{a_{kj}^2} f'\left( \frac{1}{a_{kj}^2} \right) + f\left( \frac{1}{a_{kj}^2} \right) - f\left( \frac{1}{a_{ki}^2} \right) \right]. \tag{16} \end{align*}

If $X = (a_{ij})_{i,j=1}^n$ is a strict local maximum, then from the necessary and sufficient conditions (see Lemma 3.2 and Lemma 3.3), we know that $\xi_X^{i,j} < 0$ for all $1 \leq i < j \leq n$. Since the set $O(n, \mathbb{R})$ is not convex, the maximization problem in (7) over the set of all $n \times n$ real orthogonal matrices may admit several local and global maxima (even if the cost functional is concave).

3.2. The Minimization Problem. For a matrix $X = M_n = (a_{ij})_{i,j=1}^n$, let us consider
\[ \widetilde{\mathcal{J}}[M_n] := \min_{X \in \mathbb{R}^{n \times n}} \sum_{i,j=1}^n a_{ij}^2 g\left( a_{ij}^2 \right), \ \text{such that} \ \sum_{k=1}^n a_{ki} a_{kj} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases} \tag{17} \]
where
(i) $g(\cdot) : [0, 1] \to \mathbb{R}_0^+$ is a twice continuously differentiable function with $g(0) = 0$,
(ii) $g$ is monotonically increasing,
(iii) $g$ is a strictly convex function for $x \in (0, 1]$.
This problem can be handled in a similar way as discussed in Section 3.1, and we obtain the following expressions:
\begin{align*} G &= \left( 2 a_{ij} \left[ g(a_{ij}^2) + a_{ij}^2 g'(a_{ij}^2) \right] \right)_{i,j=1}^n, \\ A &= \left( 2 \sum_{k=1}^n a_{ik} a_{jk} \left[ g(a_{ik}^2) - g(a_{jk}^2) + a_{ik}^2 g'(a_{ik}^2) - a_{jk}^2 g'(a_{jk}^2) \right] \right)_{i,j=1}^n, \\ \Lambda &= \left( 2 \sum_{k=1}^n a_{ki} a_{kj} \left[ g(a_{ki}^2) + a_{ki}^2 g'(a_{ki}^2) \right] \right)_{i,j=1}^n, \end{align*}
and
\begin{align*} \mathrm{Tr}\left( \Lambda (Z^{i,j})^\top Z^{i,j} \right) &= 2 \sum_{k=1}^n \left[ a_{ki}^2 g(a_{ki}^2) + a_{ki}^4 g'(a_{ki}^2) + a_{kj}^2 g(a_{kj}^2) + a_{kj}^4 g'(a_{kj}^2) \right], \\ \mathrm{Tr}\left( (Z^{i,j})^\top D(D\mathcal{F}(X))[Z^{i,j}] \right) &= 2 \sum_{k=1}^n a_{ki}^2 \left[ g(a_{kj}^2) + 5 a_{kj}^2 g'(a_{kj}^2) + 2 a_{kj}^4 g''(a_{kj}^2) \right] \\ &\quad + 2 \sum_{k=1}^n a_{kj}^2 \left[ g(a_{ki}^2) + 5 a_{ki}^2 g'(a_{ki}^2) + 2 a_{ki}^4 g''(a_{ki}^2) \right], \\ \xi_X^{i,j} &= 4 \sum_{k=1}^n a_{ki}^2 a_{kj}^2 \left[ a_{ki}^2 g''(a_{ki}^2) + a_{kj}^2 g''(a_{kj}^2) \right] \\ &\quad + 2 \sum_{k=1}^n \left[ 5 a_{ki}^2 a_{kj}^2 \left( g'(a_{ki}^2) + g'(a_{kj}^2) \right) - a_{ki}^4 g'(a_{ki}^2) - a_{kj}^4 g'(a_{kj}^2) \right] \\ &\quad + 2 \sum_{k=1}^n \left( a_{ki}^2 - a_{kj}^2 \right) \left( g(a_{kj}^2) - g(a_{ki}^2) \right), \end{align*}

for all $Z^{i,j} \in T_X \mathcal{M}_n$, $1 \leq i < j \leq n$. If $X = (a_{ij})_{i,j=1}^n$ is a strict local minimum of the problem (17), then from the necessary and sufficient conditions, we know that $\xi_X^{i,j} > 0$ for all $1 \leq i < j \leq n$.

4. Stationary Points and Maximal/Minimal Orthogonal Matrices. In this section, we find some stationary points and local maxima/minima of the optimization problems (7)/(17) using the multivariate analysis techniques described in Section 3. We first state an elementary lemma, whose proof is easy.

Lemma 4.1. Let $A$ be an $n \times n$ orthogonal matrix and $B$ be an $m \times m$ orthogonal matrix. Then the matrices $A \oplus B$ and $A \otimes B$ are also orthogonal matrices, of orders $n + m$ and $nm$ respectively, where $\oplus$ denotes the direct sum and $\otimes$ denotes the Kronecker product of $A$ and $B$.
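Lemma 4.1 is straightforward to confirm numerically; a small sketch (an added illustration, using random orthogonal factors obtained from QR decompositions):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(2)
A, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # a 3x3 orthogonal matrix
B, _ = np.linalg.qr(rng.standard_normal((2, 2)))  # a 2x2 orthogonal matrix

S = block_diag(A, B)  # direct sum A + B, order 5
K = np.kron(A, B)     # Kronecker product A x B, order 6

assert np.allclose(S.T @ S, np.eye(5))
assert np.allclose(K.T @ K, np.eye(6))
```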

Proposition 4.2. Let the orthogonal matrices $M_n$ and $M_m$ be stationary points of the maximization problem (7) (minimization problem (17)) of orders $n$ and $m$ respectively. Then $M_n \oplus M_m$ is a stationary point of the corresponding maximization problem (minimization problem) of order $n + m$. Also, for the minimization problem (17), we have:
1. if both $M_n$ and $M_m$ are local maxima, then $M_n \oplus M_m$ is a local maximum;
2. if at least one of them is a saddle point, then $M_n \oplus M_m$ is also a saddle point;
3. if both are local minima, then $M_n \oplus M_m$ is a saddle point.

Proof. We prove the proposition only for the maximization problem given in (7); the proof for the minimization problem (17) follows in a similar fashion. Let $M_n = (a_{ij})_{i,j=1}^n$, $M_m = (b_{ij})_{i,j=1}^m$ and $M_n \oplus M_m = (c_{ij})_{i,j=1}^{n+m}$. By the definition of $\oplus$, we know that
\[ c_{ij} = \begin{cases} a_{ij} & \text{for } 1 \leq i, j \leq n, \\ b_{i-n,j-n} & \text{for } n+1 \leq i, j \leq n+m, \\ 0 & \text{otherwise.} \end{cases} \]
Since $M_n$ and $M_m$ are stationary points of the maximization problem (7), from (11) we have
\[ \eta_{ij} := \sum_{k=1}^n a_{ik} a_{jk} \left\{ \left[ f\left( \frac{1}{a_{ik}^2} \right) - f\left( \frac{1}{a_{jk}^2} \right) \right] - \left[ \frac{1}{a_{ik}^2} f'\left( \frac{1}{a_{ik}^2} \right) - \frac{1}{a_{jk}^2} f'\left( \frac{1}{a_{jk}^2} \right) \right] \right\} = 0, \tag{18} \]
\[ \theta_{gl} := \sum_{k=1}^m b_{gk} b_{lk} \left\{ \left[ f\left( \frac{1}{b_{gk}^2} \right) - f\left( \frac{1}{b_{lk}^2} \right) \right] - \left[ \frac{1}{b_{gk}^2} f'\left( \frac{1}{b_{gk}^2} \right) - \frac{1}{b_{lk}^2} f'\left( \frac{1}{b_{lk}^2} \right) \right] \right\} = 0, \tag{19} \]
for all $1 \leq i, j \leq n$ and $1 \leq g, l \leq m$. Now, we know that
\[ \zeta_{ij} := \sum_{k=1}^{n+m} c_{ik} c_{jk} \left\{ \left[ f\left( \frac{1}{c_{ik}^2} \right) - f\left( \frac{1}{c_{jk}^2} \right) \right] - \left[ \frac{1}{c_{ik}^2} f'\left( \frac{1}{c_{ik}^2} \right) - \frac{1}{c_{jk}^2} f'\left( \frac{1}{c_{jk}^2} \right) \right] \right\} = \zeta_{ij}^1 + \zeta_{ij}^2, \tag{20} \]
where $\zeta_{ij}^1$ collects the terms with $1 \leq k \leq n$ and $\zeta_{ij}^2$ the terms with $n+1 \leq k \leq n+m$. For $1 \leq i, j \leq n$, we get $\zeta_{ij}^1 = \eta_{ij} = 0$ and $\zeta_{ij}^2 = 0$, so $\zeta_{ij} = 0$. For $n+1 \leq i, j \leq n+m$, we have $\zeta_{ij}^1 = 0$ and $\zeta_{ij}^2 = \theta_{i-n,j-n} = 0$, and hence $\zeta_{ij} = 0$. For the cases $1 \leq i \leq n$, $n+1 \leq j \leq n+m$ and $n+1 \leq i \leq n+m$, $1 \leq j \leq n$, it is easily seen that $\zeta_{ij} = 0$, since at least one factor in each product in $\zeta_{ij}$ is zero. Hence $M_n \oplus M_m$ of order $n+m$ is a stationary point of the maximization problem (7). In a similar way one can prove the first order necessary conditions for the minimization problem (17).

In order to verify the second order necessary and sufficient conditions for the problem (17), we consider the following:
1. Let us assume that $M_n = (a_{ij})_{i,j=1}^n$ and $M_m = (b_{ij})_{i,j=1}^m$ are strict local maxima of the problem (17) of orders $n$ and $m$ respectively. Since $M_n$ is a strict local maximum, from the second order necessary and sufficient conditions we know that
\begin{align*} \xi_{M_n}^{i,j} :=\ & 4 \sum_{k=1}^n a_{ki}^2 a_{kj}^2 \left[ a_{ki}^2 g''(a_{ki}^2) + a_{kj}^2 g''(a_{kj}^2) \right] + 2 \sum_{k=1}^n (a_{ki}^2 - a_{kj}^2) \left[ g(a_{kj}^2) - g(a_{ki}^2) \right] \\ & + 2 \sum_{k=1}^n \left[ 5 a_{ki}^2 a_{kj}^2 \left( g'(a_{ki}^2) + g'(a_{kj}^2) \right) - a_{ki}^4 g'(a_{ki}^2) - a_{kj}^4 g'(a_{kj}^2) \right] < 0, \tag{21} \end{align*}
for all $1 \leq i < j \leq n$. Similarly, since $M_m$ is a strict local maximizer, we have
\begin{align*} \xi_{M_m}^{i,j} :=\ & 4 \sum_{k=1}^m b_{ki}^2 b_{kj}^2 \left[ b_{ki}^2 g''(b_{ki}^2) + b_{kj}^2 g''(b_{kj}^2) \right] + 2 \sum_{k=1}^m (b_{ki}^2 - b_{kj}^2) \left[ g(b_{kj}^2) - g(b_{ki}^2) \right] \\ & + 2 \sum_{k=1}^m \left[ 5 b_{ki}^2 b_{kj}^2 \left( g'(b_{ki}^2) + g'(b_{kj}^2) \right) - b_{ki}^4 g'(b_{ki}^2) - b_{kj}^4 g'(b_{kj}^2) \right] < 0, \tag{22} \end{align*}
for all $1 \leq i < j \leq m$. Now, for $1 \leq i < j \leq n$ and $X = M_n \oplus M_m$, we have

\begin{align*} \xi_X^{i,j} :=\ & 4 \sum_{k=1}^{n+m} c_{ki}^2 c_{kj}^2 \left[ c_{ki}^2 g''(c_{ki}^2) + c_{kj}^2 g''(c_{kj}^2) \right] + 2 \sum_{k=1}^{n+m} (c_{ki}^2 - c_{kj}^2) \left[ g(c_{kj}^2) - g(c_{ki}^2) \right] \\ & + 2 \sum_{k=1}^{n+m} \left[ 5 c_{ki}^2 c_{kj}^2 \left( g'(c_{ki}^2) + g'(c_{kj}^2) \right) - c_{ki}^4 g'(c_{ki}^2) - c_{kj}^4 g'(c_{kj}^2) \right] \\ =\ & \xi_{M_n}^{i,j} < 0, \tag{23} \end{align*}
since only the terms with $1 \leq k \leq n$ survive, and this covers $\frac{n(n-1)}{2}$ pairs $(i,j)$. Now, for $n+1 \leq i < j \leq n+m$, the same computation with only the terms $n+1 \leq k \leq n+m$ surviving gives
\[ \xi_X^{i,j} = \xi_{M_m}^{i-n,j-n} < 0, \tag{24} \]
by using (22), and this covers $\frac{m(m-1)}{2}$ pairs. Now, for the remaining $nm$ pairs (the case $1 \leq i \leq n$, $n+1 \leq j \leq n+m$), all the cross products $c_{ki} c_{kj}$ vanish, and we have
\begin{align*} \xi_X^{i,j} &= -2 \sum_{k=1}^{n+m} \left[ c_{ki}^2 g(c_{ki}^2) + c_{kj}^2 g(c_{kj}^2) + c_{ki}^4 g'(c_{ki}^2) + c_{kj}^4 g'(c_{kj}^2) \right] \\ &= -2 \sum_{k=1}^{n} \left[ a_{ki}^2 g(a_{ki}^2) + a_{ki}^4 g'(a_{ki}^2) \right] - 2 \sum_{k=1}^{m} \left[ b_{k,j-n}^2 g(b_{k,j-n}^2) + b_{k,j-n}^4 g'(b_{k,j-n}^2) \right] < 0, \tag{25} \end{align*}
since the cross terms are zero, $g \geq 0$ and $g(\cdot)$ is monotonically increasing. Combining (23), (24) and (25), we finally get that $X = M_n \oplus M_m$ is a local maximizer of the problem (17) of order $n+m$.
2. If at least one of $M_n$ or $M_m$ is a saddle point, then from (23) or (24) it is clear that $M_n \oplus M_m$ is also a saddle point.
3. If $M_n$ and $M_m$ are local minima, then the estimates (23) and (24) are positive, but the estimate (25) is still negative, making $M_n \oplus M_m$ a saddle point. □

Remark 4. In general, the second part of Proposition 4.2 is not true for the maximization problem (7). For instance, for a local minimum, the estimates corresponding to (23) and (24) are positive, but the estimate corresponding to (25) becomes
\[ \xi_X^{i,j} = 2 \sum_{k=1}^{n} a_{ki}^2 \left[ \frac{1}{a_{ki}^2} f'\left( \frac{1}{a_{ki}^2} \right) - f\left( \frac{1}{a_{ki}^2} \right) \right] + 2 \sum_{k=1}^{m} b_{k,j-n}^2 \left[ \frac{1}{b_{k,j-n}^2} f'\left( \frac{1}{b_{k,j-n}^2} \right) - f\left( \frac{1}{b_{k,j-n}^2} \right) \right], \tag{26} \]
for $1 \leq i \leq n$, $n+1 \leq j \leq n+m$. From the above estimate, the sign of $\xi_X^{i,j}$ is not conclusive, since $f \geq 0$ and $f(\cdot)$ is monotonically increasing.

Let us now define $J[M_n] := \sum_{i,j=1}^n a_{ij}^2 f\left( \frac{1}{a_{ij}^2} \right)$ and $\widetilde{J}[M_n] := \sum_{i,j=1}^n a_{ij}^2 g(a_{ij}^2)$.

Lemma 4.3. Let the orthogonal matrices $M_n = (a_{ij})_{i,j=1}^n$ and $M_m = (b_{ij})_{i,j=1}^m$ be stationary points of the maximization problem (7) (minimization problem (17)). Then
\[ J[M_n \oplus M_m] = J[M_n] + J[M_m], \qquad \widetilde{J}[M_n \oplus M_m] = \widetilde{J}[M_n] + \widetilde{J}[M_m]. \tag{27} \]

Proof. If $M_n \oplus M_m = (c_{ij})_{i,j=1}^{n+m}$, then we have
\begin{align*} J[M_n \oplus M_m] &= \sum_{i,j=1}^{n+m} c_{ij}^2 f\left( \frac{1}{c_{ij}^2} \right) \\ &= \sum_{i,j=1}^{n} c_{ij}^2 f\left( \frac{1}{c_{ij}^2} \right) + \sum_{i,j=n+1}^{n+m} c_{ij}^2 f\left( \frac{1}{c_{ij}^2} \right) + \sum_{i=1}^{n} \sum_{j=n+1}^{n+m} c_{ij}^2 f\left( \frac{1}{c_{ij}^2} \right) + \sum_{i=n+1}^{n+m} \sum_{j=1}^{n} c_{ij}^2 f\left( \frac{1}{c_{ij}^2} \right) \\ &= \sum_{i,j=1}^{n} a_{ij}^2 f\left( \frac{1}{a_{ij}^2} \right) + \sum_{i,j=1}^{m} b_{ij}^2 f\left( \frac{1}{b_{ij}^2} \right) \end{align*}

\[ = J[M_n] + J[M_m]. \tag{28} \]
The second equality in (27) follows similarly. □

Proposition 4.4. The orthogonal matrix
\[ K_n = \frac{1}{n} \begin{pmatrix} -(n-2) & 2 & \cdots & 2 \\ 2 & -(n-2) & \cdots & 2 \\ \vdots & \vdots & \ddots & \vdots \\ 2 & 2 & \cdots & -(n-2) \end{pmatrix} \tag{29} \]
is always a stationary point of the maximization problem (7) (minimization problem (17)).

Proof. For the orthogonal matrix given in (29), the derivative matrix $G$ is given by
\[ G = \frac{2}{n} \begin{pmatrix} -a & b & \cdots & b \\ b & -a & \cdots & b \\ \vdots & \vdots & \ddots & \vdots \\ b & b & \cdots & -a \end{pmatrix} = G^\top, \tag{30} \]
where
\[ a = (n-2) \left[ f\left( \frac{n^2}{(n-2)^2} \right) - \frac{n^2}{(n-2)^2} f'\left( \frac{n^2}{(n-2)^2} \right) \right] \quad \text{and} \quad b = 2 \left[ f\left( \frac{n^2}{4} \right) - \frac{n^2}{4} f'\left( \frac{n^2}{4} \right) \right]. \]
Now the matrix $A$ is given by
\[ A = G X^\top - X G^\top = G X - X G = 0, \tag{31} \]
since
\[ G X = X G = \frac{2}{n^2} \begin{pmatrix} (n-2)a + 2(n-1)b & -2a + (n-2)b & \cdots & -2a + (n-2)b \\ -2a + (n-2)b & (n-2)a + 2(n-1)b & \cdots & -2a + (n-2)b \\ \vdots & \vdots & \ddots & \vdots \\ -2a + (n-2)b & -2a + (n-2)b & \cdots & (n-2)a + 2(n-1)b \end{pmatrix}, \]
making $X = K_n$ a stationary point of the problem (7). □

Theorem 4.5. There exist at least $p(n)$ stationary points for the maximization problem (7) (minimization problem (17)), where $p(n)$ represents the number of possible partitions of the natural number $n$, i.e., the number of distinct ways of representing $n$ as a sum of natural numbers.

Proof. Follows by combining Proposition 4.2 and Proposition 4.4. □

Note that the generating function of $p(n)$ is given by $\sum_{n=0}^{\infty} p(n) x^n = \prod_{k=1}^{\infty} \frac{1}{1 - x^k}$. Also, $p(n)$ can be found using Euler's recurrence relation:
\[ \sum_{i=0}^{n} \alpha(i)\, p(n-i) = 0, \ n \geq 1, \quad p(0) = 1, \quad \text{and} \quad \alpha(n) = \begin{cases} 1 & \text{if } n = 0, \\ (-1)^r & \text{if } n = \frac{r(3r \pm 1)}{2}, \ r \geq 1, \\ 0 & \text{otherwise.} \end{cases} \]
The first few values of $p(n)$ are $p(1) = 1$, $p(2) = 2$, $p(3) = 3$, $p(4) = 5$, $p(5) = 7$, $p(6) = 11$, $p(7) = 15$, $p(8) = 22$, $p(9) = 30$, $p(10) = 42$, etc.

Lemma 4.6. Let the orthogonal matrices $M_n = (a_{ij})_{i,j=1}^n$ and $M_m = (b_{ij})_{i,j=1}^m$ be stationary points of the maximization problem (7) (minimization problem (17)) of orders $n$ and $m$ respectively. If $M_n \otimes M_m$ is a stationary point of the maximization problem (7) (minimization problem (17)) of order $nm$, then
\[ J[M_n \otimes M_m] = \sum_{k,l=1}^{n} \sum_{i,j=1}^{m} a_{kl}^2 b_{ij}^2 f\left( \frac{1}{a_{kl}^2 b_{ij}^2} \right), \qquad \widetilde{J}[M_n \otimes M_m] = \sum_{k,l=1}^{n} \sum_{i,j=1}^{m} a_{kl}^2 b_{ij}^2 g\left( a_{kl}^2 b_{ij}^2 \right). \tag{32} \]
If $f(xy) = f(x) + f(y)$ (or $g(xy) = g(x) + g(y)$), then
\[ J[M_n \otimes M_m] = m J[M_n] + n J[M_m], \qquad \widetilde{J}[M_n \otimes M_m] = m \widetilde{J}[M_n] + n \widetilde{J}[M_m], \tag{33} \]
and if $f(xy) = f(x) f(y)$ (or $g(xy) = g(x) g(y)$), then
\[ J[M_n \otimes M_m] = J[M_n] J[M_m], \qquad \widetilde{J}[M_n \otimes M_m] = \widetilde{J}[M_n] \widetilde{J}[M_m]. \tag{34} \]

Proof. If $M_n \otimes M_m = (c_{ij})_{i,j=1}^{mn}$, then we have
\[ J[M_n \otimes M_m] = \sum_{i,j=1}^{mn} c_{ij}^2 f\left( \frac{1}{c_{ij}^2} \right) = \sum_{k,l=1}^{n} \sum_{i,j=1}^{m} a_{kl}^2 b_{ij}^2 f\left( \frac{1}{a_{kl}^2 b_{ij}^2} \right). \tag{35} \]
If $f(xy) = f(x) + f(y)$, then we have
\begin{align*} J[M_n \otimes M_m] &= \sum_{k,l=1}^{n} \sum_{i,j=1}^{m} a_{kl}^2 b_{ij}^2 \left[ f\left( \frac{1}{a_{kl}^2} \right) + f\left( \frac{1}{b_{ij}^2} \right) \right] \\ &= \left( \sum_{k,l=1}^{n} a_{kl}^2 f\left( \frac{1}{a_{kl}^2} \right) \right) \left( \sum_{i,j=1}^{m} b_{ij}^2 \right) + \left( \sum_{k,l=1}^{n} a_{kl}^2 \right) \left( \sum_{i,j=1}^{m} b_{ij}^2 f\left( \frac{1}{b_{ij}^2} \right) \right) \\ &= m J[M_n] + n J[M_m]. \tag{36} \end{align*}
If $f(xy) = f(x) f(y)$, then
\[ J[M_n \otimes M_m] = \sum_{k,l=1}^{n} \sum_{i,j=1}^{m} a_{kl}^2 b_{ij}^2 f\left( \frac{1}{a_{kl}^2} \right) f\left( \frac{1}{b_{ij}^2} \right) = \left( \sum_{k,l=1}^{n} a_{kl}^2 f\left( \frac{1}{a_{kl}^2} \right) \right) \left( \sum_{i,j=1}^{m} b_{ij}^2 f\left( \frac{1}{b_{ij}^2} \right) \right) = J[M_n] J[M_m], \tag{37} \]
which completes the proof. □

Theorem 4.7. The cost function in (7) satisfies

\[ 0 \leq J[M_n] \leq n f(n), \ \text{for all } n, \tag{38} \]
and the global maximum value $n f(n)$ is attained if and only if $a_{ij}^2 = \frac{1}{n}$.

Proof. It is clear that $J[M_n] \geq 0$, since $f$ is monotonically increasing and $f(1) = 0$. Since $\sum_{j=1}^n a_{ij}^2 = 1$ for all $1 \leq i \leq n$, by using Jensen's inequality, we have

\[ f(n) = f\left( \sum_{j=1}^n a_{ij}^2 \frac{1}{a_{ij}^2} \right) \geq \sum_{j=1}^n a_{ij}^2 f\left( \frac{1}{a_{ij}^2} \right). \tag{39} \]
Summing over $i$ from $1$ to $n$, we get
\[ 0 \leq \sum_{i,j=1}^n a_{ij}^2 f\left( \frac{1}{a_{ij}^2} \right) \leq \sum_{i=1}^n f(n) = n f(n). \tag{40} \]
Since $f(\cdot)$ is a strictly concave function, the right hand side of the inequality (40) is strict unless $a_{ij}^2 = \frac{1}{n}$. The global minimum value $0$ is attained by the orthogonal matrices having $a_{ij}^2 = \delta_{ij}$ for all $1 \leq i, j \leq n$, where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \neq j$ (and their permutations). □

Theorem 4.8. The cost function in (17) satisfies
\[ n g(1) \geq \widetilde{J}[M_n] \geq n g\left( \frac{1}{n} \right), \ \text{for all } n, \tag{41} \]

and the global minimum value $n g\left( \frac{1}{n} \right)$ is attained if and only if $a_{ij}^2 = \frac{1}{n}$.

Proof. The first part of the inequality in (41) can be proved as
\[ n g(1) = \sum_{i,j=1}^n a_{ij}^2 g(1) \geq \sum_{i,j=1}^n a_{ij}^2 g(a_{ij}^2) = \widetilde{J}[M_n], \tag{42} \]

since $g$ is a monotonically increasing function and $0 \leq a_{ij}^2 \leq 1$. In order to prove the second part of the inequality in (41), we use the convexity of $g$ to obtain

\[ \sum_{i=1}^n \sum_{j=1}^n a_{ij}^2 g\left( a_{ij}^2 \right) \geq \sum_{i=1}^n g\left( \sum_{j=1}^n a_{ij}^4 \right). \tag{43} \]
Now for fixed $1 \leq i \leq n$, we consider the problem
\[ \min \sum_{j=1}^n a_{ij}^4 \ \text{subject to} \ \sum_{j=1}^n a_{ij}^2 = 1. \]

By using Lagrange multipliers, it can easily be seen that $a_{ij}^2 = \frac{1}{n}$, for all $1 \leq j \leq n$, is a stationary point. It is a local minimum, since the Hessian
\[ H = 8\, \mathrm{diag}(a_{i1}^2, \dots, a_{in}^2) = \frac{8}{n} I_n \]
is positive definite, and one can easily see that it is also a global minimum. From (43), we have

\[ \sum_{i=1}^n \sum_{j=1}^n a_{ij}^2 g\left( a_{ij}^2 \right) \geq \sum_{i=1}^n g\left( \sum_{j=1}^n a_{ij}^4 \right) \geq \sum_{i=1}^n g\left( \frac{1}{n} \right) = n g\left( \frac{1}{n} \right). \tag{44} \]
Combining (42) and (44), we finally obtain (41). □
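The bounds of Theorems 4.7 and 4.8 are easy to test numerically; the sketch below (an added illustration, taking the admissible choices $f = \ln$ and $g(x) = x$, and random orthogonal matrices from QR factorizations) checks both:

```python
import numpy as np

def J(X):
    # the cost in (7) with f = ln
    P = X ** 2
    return np.sum(P * np.log(1.0 / P))

def J_tilde(X):
    # the cost in (17) with g(x) = x, i.e. the sum of fourth powers
    return np.sum(X ** 4)

rng = np.random.default_rng(3)
n = 8
for _ in range(100):
    X, _ = np.linalg.qr(rng.standard_normal((n, n)))
    assert -1e-9 <= J(X) <= n * np.log(n) + 1e-9  # 0 <= J <= n f(n)
    assert 1 - 1e-9 <= J_tilde(X) <= n + 1e-9     # n g(1/n) <= J~ <= n g(1)
```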

Proposition 4.9. If a real Hadamard matrix $H_n$ exists, then $\frac{1}{\sqrt{n}} H_n$ is a stationary point of the maximization problem (7) (minimization problem (17)), and it is a local maximum (local minimum) and also a global maximum (global minimum).

Proof. Let $n$ be $1$, $2$ or $n \equiv 0 \pmod 4$, and let us assume that a real Hadamard matrix $H_n = (h_{ij})_{i,j=1}^n$ exists. Let $X = \frac{1}{\sqrt{n}} H_n$. The derivative matrix is $G = 2(f(n) - n f'(n)) X$, and the matrix
\[ A = G X^\top - X G^\top = 0, \]

so the first order necessary conditions are satisfied and $\frac{1}{\sqrt{n}} H_n$ is a stationary point.

Now the Lagrange multiplier is given by $\Lambda = G^\top X = 2(f(n) - n f'(n)) I_n$. Since $f(\cdot)$ is a strictly concave function, we have
\begin{align*} \mathrm{Tr}(\Lambda (Z^{i,j})^\top Z^{i,j}) &= 4(f(n) - n f'(n)), \\ \mathrm{Tr}\left( (Z^{i,j})^\top D(D\mathcal{F}(X))[Z^{i,j}] \right) &= 4\left[ 2 n^2 f''(n) - n f'(n) + f(n) \right], \ \text{and} \\ \xi_X^{i,j} &= 8 n^2 f''(n) < 0, \end{align*}

for all $n$ and for each $Z^{i,j} \in T_X \mathcal{M}_n$, $1 \leq i < j \leq n$. Hence $\frac{1}{\sqrt{n}} H_n$ is a local maximum, and since $a_{ij}^2 = \frac{1}{n}$, it is a global maximum by Theorem 4.7. □

Proposition 4.10. If a real conference matrix $C_n$ exists, then the orthogonal matrix $\frac{1}{\sqrt{n-1}} C_n$ is a stationary point of the maximization problem (7) (minimization problem (17)). It is a global maximum for $n = 2$, and it is a strict local minimum for the minimization problem (17) if
\[ \frac{8(n-2)}{(n-1)^3} g''\left( \frac{1}{n-1} \right) + \frac{4(4n-9)}{(n-1)^2} g'\left( \frac{1}{n-1} \right) - \frac{4}{n-1} g\left( \frac{1}{n-1} \right) > 0, \tag{45} \]
for $n \geq 3$.

Proof. Let $n$ be even, and assume that a real conference matrix $C_n = (c_{ij})_{i,j=1}^n$ exists. For $X = \frac{1}{\sqrt{n-1}} C_n$, the derivative matrix and the Lagrange multiplier are given by
\begin{align*} G &= 2 \left( f(n-1) - (n-1) f'(n-1) \right) X, \\ \Lambda &= 2 \left( f(n-1) - (n-1) f'(n-1) \right) I_n. \end{align*}

Then we easily have $A = G X^\top - X G^\top = 0$. Hence, the orthogonal matrix $\frac{1}{\sqrt{n-1}} C_n$ is a stationary point of the maximization problem (7). The first order necessary condition for the minimization problem (17) can be proved in a similar way.

For the second part of Proposition 4.10, we have
\begin{align*} \Lambda &= 2 \left[ g\left( \frac{1}{n-1} \right) + \frac{1}{n-1} g'\left( \frac{1}{n-1} \right) \right] I_n, \\ \mathrm{Tr}(\Lambda (Z^{i,j})^\top Z^{i,j}) &= 4 \left[ g\left( \frac{1}{n-1} \right) + \frac{1}{n-1} g'\left( \frac{1}{n-1} \right) \right], \\ \mathrm{Tr}\left( (Z^{i,j})^\top D(D\mathcal{F}(X))[Z^{i,j}] \right) &= \frac{4(n-2)}{n-1} \left[ g\left( \frac{1}{n-1} \right) + \frac{5}{n-1} g'\left( \frac{1}{n-1} \right) + \frac{2}{(n-1)^2} g''\left( \frac{1}{n-1} \right) \right], \ \text{and} \\ \xi_X^{i,j} &= \frac{8(n-2)}{(n-1)^3} g''\left( \frac{1}{n-1} \right) + \frac{4(4n-9)}{(n-1)^2} g'\left( \frac{1}{n-1} \right) - \frac{4}{n-1} g\left( \frac{1}{n-1} \right), \end{align*}

for each $Z^{i,j} \in T_X \mathcal{M}_n$, $1 \leq i < j \leq n$. For $n = 2$, we have
\[ \xi_X^{i,j} = -4 \left[ g'(1) + g(1) \right] < 0, \]
since $g$ is strictly increasing, and hence $C_2$ is a local maximum. In fact, the matrix $C_2$ ($c_{ij}^2 = 1$ for $i \neq j$ and $c_{ii} = 0$) is a global maximum with value $2 g(1)$.

Now, for $n \geq 3$, if $\xi_X^{i,j} > 0$ for all $1 \leq i < j \leq n$, then $X$ is a strict local minimum of the problem (17), and this holds under condition (45). □

Remark 5. For $M_n = \frac{1}{\sqrt{n-1}} C_n$, $J[M_n] = n f(n-1)$ and $\widetilde{J}[M_n] = n g\left( \frac{1}{n-1} \right)$.

Proposition 4.11. If a real weighing matrix $W_{n,k}$ exists for some $1 \leq k \leq n$, then the orthogonal matrix $X = \frac{1}{\sqrt{k}} W_{n,k}$, with $W_{n,k} = (w_{ij})_{i,j=1}^n$, is a stationary point of the maximization problem (7) (minimization problem (17)).

Also, for $\beta_{ij} = \sum_{l=1}^n w_{li}^2 w_{lj}^2$, $X$ is a strict local minimum of the problem (17) if
\[ \frac{8 \beta_{ij}}{k^3} g''\left( \frac{1}{k} \right) + \frac{4(5\beta_{ij} - k)}{k^2} g'\left( \frac{1}{k} \right) - 4\left( 1 - \frac{\beta_{ij}}{k} \right) g\left( \frac{1}{k} \right) > 0. \tag{46} \]

Proof. Suppose $X = \frac{1}{\sqrt{k}} W_{n,k}$ exists for some $1 \leq k \leq n$. The derivative matrix $G$ and the Lagrange multiplier $\Lambda$ for the maximization problem (7) are given by
\begin{align*} G &= 2 \left( f(k) - k f'(k) \right) X, \\ \Lambda &= 2 \left( f(k) - k f'(k) \right) I_n, \end{align*}
with $A = 0$. Hence, $X$ is a stationary point for the maximization problem (7). For the minimization problem (17), it can be shown in a similar way that $\frac{1}{\sqrt{k}} W_{n,k}$ is a stationary point. In order to get the second order necessary and sufficient conditions, we estimate
\begin{align*} \Lambda &= 2 \left[ g\left( \frac{1}{k} \right) + \frac{1}{k} g'\left( \frac{1}{k} \right) \right] I_n, \\ \mathrm{Tr}(\Lambda (Z^{i,j})^\top Z^{i,j}) &= 4 \left[ g\left( \frac{1}{k} \right) + \frac{1}{k} g'\left( \frac{1}{k} \right) \right], \\ \mathrm{Tr}\left( (Z^{i,j})^\top D(D\mathcal{F}(X))[Z^{i,j}] \right) &= \frac{4 \beta_{ij}}{k} \left[ g\left( \frac{1}{k} \right) + \frac{5}{k} g'\left( \frac{1}{k} \right) + \frac{2}{k^2} g''\left( \frac{1}{k} \right) \right], \ \text{and} \\ \xi_X^{i,j} &= \frac{8 \beta_{ij}}{k^3} g''\left( \frac{1}{k} \right) + \frac{4(5\beta_{ij} - k)}{k^2} g'\left( \frac{1}{k} \right) - 4 \left( 1 - \frac{\beta_{ij}}{k} \right) g\left( \frac{1}{k} \right), \end{align*}

for each $Z^{i,j} \in T_X \mathcal{M}_n$, $1 \leq i < j \leq n$. Hence $X$ is a strict local minimum if $\xi_X^{i,j} > 0$ for all $1 \leq i < j \leq n$, which is true if (46) is satisfied. □

Remark 6. For $M_n = \frac{1}{\sqrt{k}} W_{n,k}$, $J[M_n] = n f(k)$ and $\widetilde{J}[M_n] = n g\left( \frac{1}{k} \right)$.

Remark 7. It is difficult to establish the second order necessary and sufficient conditions given in Proposition 4.10 and Proposition 4.11 for the maximization problem (7), since $0$ is an entry of $X$ and $f(\infty) = \infty$, so that we obtain $\xi_X^{i,j} = +\infty$ for all $1 \leq i < j \leq n$. Hence, the theory given in Lemma 3.2 and Lemma 3.3 is inconclusive at this point, and one can use the definition of local maximum to check the nature of the stationary points (see for example [2]).

4.1. Optimization Problems with Unitary Matrix Constraints. For a complex matrix $X = M_n = (a_{ij})_{i,j=1}^n \in \mathbb{C}^{n \times n}$, let us consider
\[ \mathcal{J}[M_n] = \max_{X \in \mathbb{C}^{n \times n}} \sum_{i,j=1}^n |a_{ij}|^2 f\left( \frac{1}{|a_{ij}|^2} \right), \ \text{such that} \ X X^* = I_n, \tag{47} \]
where $f(\cdot)$ satisfies the conditions given in the problem (7). Similarly, for a complex matrix $X = M_n = (a_{ij})_{i,j=1}^n \in \mathbb{C}^{n \times n}$, we consider
\[ \widetilde{\mathcal{J}}[M_n] = \min_{X \in \mathbb{C}^{n \times n}} \sum_{i,j=1}^n |a_{ij}|^2 g\left( |a_{ij}|^2 \right), \ \text{such that} \ X X^* = I_n, \tag{48} \]
where $g(\cdot)$ satisfies the conditions given in the problem (17). The global maximizer of (47) and the global minimizer of (48) are normalized complex Hadamard matrices (with values $n f(n)$ and $n g\left( \frac{1}{n} \right)$ respectively), for example
\[ M_n = \frac{1}{\sqrt{n}} F_n, \tag{49} \]
where $F_n$ is the Fourier matrix
\[ F_n = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega & \omega^2 & \cdots & \omega^{n-1} \\ 1 & \omega^2 & \omega^4 & \cdots & \omega^{2n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \omega^{2n-2} & \cdots & \omega^{(n-1)^2} \end{pmatrix}, \]
a special case of the Vandermonde matrix, where $\omega$ is an $n$-th root of unity ($\omega \neq 1$).
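As an added numerical check (not from the paper), the normalized Fourier matrix in (49) indeed attains the value $n f(n)$ of (47); with $f = \ln$ this is $n \ln n$:

```python
import numpy as np

n = 6
j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
M = np.exp(2j * np.pi * j * k / n) / np.sqrt(n)  # normalized Fourier matrix (49)

assert np.allclose(M @ M.conj().T, np.eye(n))    # M is unitary
P = np.abs(M) ** 2                               # |a_ij|^2 = 1/n for every entry
value = np.sum(P * np.log(1.0 / P))              # the cost (47) with f = ln
assert np.isclose(value, n * np.log(n))          # the global maximum n f(n)
```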

5. Applications. In this section, we consider some important examples and study their local and global maxima and minima.

5.1. Minimum distance orthostochastic matrices to uniform van der Waerden matrices [3]. Let us consider a square matrix $B_n = (b_{ij})_{i,j=1}^n$ of order $n$ containing non-negative entries. The matrix $B_n$ is called stochastic if each of its rows sums to unity, i.e., $\sum_{j=1}^n b_{ij} = 1$ for all $1 \leq i \leq n$. If, in addition, each of its columns sums to unity, i.e., $\sum_{i=1}^n b_{ij} = 1$ for all $1 \leq j \leq n$, then it is called bistochastic or doubly stochastic. A matrix $B_n$ is called orthostochastic if there exists a real orthogonal matrix $M_n = (a_{ij})_{i,j=1}^n$ such that $b_{ij} = a_{ij}^2$ for all $1 \leq i, j \leq n$. Some examples of bistochastic matrices include the permutation matrices $P_n$, with only $n$ non-zero entries, and the uniform van der Waerden matrix $J_n$, with all its $n^2$ entries equal to $1/n$ (see [5, 26]).

The optimization problem for the distance function $d(J_n, B_n)$ from an orthostochastic matrix $B_n$ to the uniform van der Waerden matrix $J_n$ can be stated as follows: let $B_n$ be generated from the orthogonal matrix $X = M_n = (a_{ij})_{i,j=1}^n$, and consider
\[ \min_{X \in \mathbb{R}^{n \times n}} \sum_{i,j=1}^n \left( \frac{1}{n} - a_{ij}^2 \right)^2 = \min_{X \in \mathbb{R}^{n \times n}} \sum_{i,j=1}^n a_{ij}^4 - 1, \ \text{such that} \ X X^\top = I_n. \tag{50} \]
Here $g(t) = t$, $g'(t) = 1 > 0$ and $g''(t) = 0$ for all $t \in [0, 1]$. If $X = (a_{ij})_{i,j=1}^n$ is a stationary point, then we have
\[ A = G X^\top - X G^\top = \left( 4 \sum_{k=1}^n a_{ik} a_{jk} \left( a_{ik}^2 - a_{jk}^2 \right) \right)_{i,j=1}^n = 0, \tag{51} \]
and $X$ is a strict local minimum if
\[ \xi_X^{i,j} = 4 \sum_{k=1}^n \left[ 6 a_{ki}^2 a_{kj}^2 - \left( a_{ki}^4 + a_{kj}^4 \right) \right] > 0, \tag{52} \]
for $1 \leq i < j \leq n$. By Theorem 4.8, the global minimum of the problem (50) is $0$, and by Proposition 4.9, if a real Hadamard matrix of order $n$ exists, it achieves the minimum, since $\xi_X^{i,j} = \frac{16}{n} > 0$. The global maximum value is $n - 1$, and it is achieved by the orthogonal matrices which generate the permutation matrices $P_n$.

Clearly, the matrix given in (29) is a stationary point of the minimization problem (50), by Proposition 4.4. The matrix (29) is a strict local minimum for $n \leq 7$ and a strict local maximum for $n \geq 8$, since
\[ \xi_X^{i,j} = \frac{8}{n^3} \left( -n^3 + 8n^2 - 32 \right) > 0 \ \text{for} \ n \leq 7, \ \text{and} \ \xi_X^{i,j} < 0 \ \text{for} \ n \geq 8. \]
The corresponding orthostochastic matrix is at a distance $\frac{(n-4)\sqrt{n-1}}{n}$ for $n \geq 4$, and $\frac{(4-n)\sqrt{n-1}}{n}$ for $n \leq 3$, from the uniform van der Waerden matrix $J_n$.

Now, by Proposition 4.10, if a real conference matrix $C_n$ exists, then the orthogonal matrix $X = \frac{1}{\sqrt{n-1}} C_n$ is a stationary point; it is a global maximum for $n = 2$ and a strict local minimum for $n \geq 3$, since
\[ \xi_X^{i,j} = \frac{8(2n-5)}{(n-1)^2} > 0 \ \text{for} \ n \geq 3, \ \text{and} \ \xi_X^{i,j} < 0 \ \text{for} \ n = 2. \]
Note that the orthostochastic matrix corresponding to $\frac{1}{\sqrt{n-1}} C_n$ is at a distance $\frac{1}{\sqrt{n-1}}$ from $J_n$.

Also, by Proposition 4.11, if a weighing matrix $W_{n,k}$ exists for some $1 \leq k \leq n$, then the orthogonal matrix $X = \frac{1}{\sqrt{k}} W_{n,k}$, with $W_{n,k} = (w_{ij})_{i,j=1}^n$, is a stationary point of the minimization problem (50). If $\beta_{ij} = \sum_{l=1}^n w_{li}^2 w_{lj}^2$, $\underline{\beta} = \min_{i<j} \beta_{ij}$ and $\overline{\beta} = \max_{i<j} \beta_{ij}$, then $\frac{1}{\sqrt{k}} W_{n,k}$ is a strict local minimum if $\underline{\beta} > \frac{k}{3}$, a strict local maximum if $\overline{\beta} < \frac{k}{3}$, and a saddle point if $\beta_{ij} \geq \frac{k}{3}$ for some pairs $(i,j)$ and $\beta_{ij} \leq \frac{k}{3}$ for others, since
\[ \xi_X^{i,j} = \frac{8}{k} \left( \frac{3}{k} \beta_{ij} - 1 \right), \]
for $1 \leq i < j \leq n$. The orthostochastic matrix corresponding to $\frac{1}{\sqrt{k}} W_{n,k}$ is at a distance $\sqrt{\frac{n-k}{k}}$ from the uniform van der Waerden matrix $J_n$.

We now list, in Table 1, some orthostochastic matrices $B_n$ which are closest to the uniform van der Waerden matrices $J_n$ of orders $n \leq 20$. The matrices $K_3$ and $K_5$ in the table are given by
\[ K_3 = \frac{1}{3} \begin{pmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{pmatrix} \quad \text{and} \quad K_5 = \frac{1}{5} \begin{pmatrix} -3 & 2 & 2 & 2 & 2 \\ 2 & -3 & 2 & 2 & 2 \\ 2 & 2 & -3 & 2 & 2 \\ 2 & 2 & 2 & -3 & 2 \\ 2 & 2 & 2 & 2 & -3 \end{pmatrix}. \tag{53} \]
It should be noted that not all the matrices given in the table are global minima. Also, for larger orders, if $n \equiv 2 \pmod 4$ and a real conference matrix exists, then the matrix $\frac{1}{\sqrt{n-1}} C_n$ appears to be a global minimum (see Figure 1 below).

5.2. Shannon entropy for real orthogonal matrices [2]. The Shannon entropy of a real orthogonal matrix can be defined as follows:

Definition 5.1. For an $n \times n$ real orthogonal matrix $M_n = (a_{ij})_{i,j=1}^n$, the Shannon entropy is given by
\[ H_1[M_n] = \sum_{i,j=1}^n a_{ij}^2 \ln\left( \frac{1}{a_{ij}^2} \right). \tag{54} \]

Table 1. Orthogonal matrices generating the closest known orthostochastic matrices to $J_n$, $1 \leq n \leq 20$.

n  | Orthogonal matrix                     | d(J_n, B_n)  ||  n  | Orthogonal matrix                          | d(J_n, B_n)
1  | $H_1$                                 | 0            ||  11 | $\frac{1}{3} C_{10} \oplus H_1$ (*)        | 1.054
2  | $\frac{1}{\sqrt{2}} H_2$              | 0            ||  12 | $\frac{1}{2\sqrt{3}} H_{12}$               | 0
3  | $K_3$                                 | 0.471        ||  13 | $\frac{1}{3} W_{13,9}$                     | 0.667
4  | $\frac{1}{2} H_4$                     | 0            ||  14 | $\frac{1}{\sqrt{13}} C_{14}$               | 0.277
5  | $K_5$                                 | 0.400        ||  15 | $K_3 \otimes K_5$ (†)                      | 0.646
6  | $\frac{1}{\sqrt{5}} C_6$              | 0.447        ||  16 | $\frac{1}{4} H_{16}$                       | 0
7  | $\frac{1}{2} W_{7,4}$                 | 0.866        ||  17 | $\frac{1}{3} W_{17,9}$                     | 0.943
8  | $\frac{1}{2\sqrt{2}} H_8$             | 0            ||  18 | $\frac{1}{\sqrt{17}} C_{18}$               | 0.243
9  | $K_3 \otimes K_3$                     | 0.703        ||  19 | $\frac{1}{\sqrt{17}} C_{18} \oplus H_1$ (*)| 1.029
10 | $\frac{1}{3} C_{10}$                  | 0.333        ||  20 | $\frac{1}{2\sqrt{5}} H_{20}$               | 0

(*) By Proposition 4.2, these orthogonal matrices are saddle points and not local minima. The weighing matrices $W_{11,4}$ and $W_{19,9}$ exist for orders 11 and 19, but they are also saddle points. The orthostochastic matrices corresponding to the orthogonal matrices $\frac{1}{2} W_{11,4}$ and $\frac{1}{3} W_{19,9}$ are at distances $1.323 > 1.054$ and $1.054 > 1.029$, respectively, from the uniform van der Waerden matrices $J_{11}$ and $J_{19}$.
(†) For order 15, the weighing matrix $W_{15,9}$ exists and is also a local minimum, by Proposition 4.11. But the orthostochastic matrix corresponding to the orthogonal matrix $\frac{1}{3} W_{15,9}$ is at a distance $0.817 > 0.646$ from the uniform van der Waerden matrix $J_{15}$.
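The distances listed in Table 1 are straightforward to reproduce; the following sketch (an added illustration) computes $d(J_n, B_n)$ for the matrices $K_n$ of (29) and checks the closed form $|n-4|\sqrt{n-1}/n$ derived above:

```python
import numpy as np

def K(n):
    # the orthogonal matrix (29): diagonal -(n-2)/n, off-diagonal 2/n
    return (2 * np.ones((n, n)) - n * np.eye(n)) / n

def dist_to_vdW(X):
    # d(J_n, B_n) with B_n = (a_ij^2) the orthostochastic matrix of X
    n = X.shape[0]
    return np.linalg.norm(X ** 2 - np.ones((n, n)) / n)

for n in [3, 4, 5, 8]:
    X = K(n)
    assert np.allclose(X.T @ X, np.eye(n))  # K_n is orthogonal
    assert np.isclose(dist_to_vdW(X), abs(n - 4) * np.sqrt(n - 1) / n)
```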

Figure 1. n versus d graph.
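Before turning to the maximization problem, the entropy (54) can be evaluated directly; a short added sketch checks the value $n \ln n$ at a normalized Hadamard matrix and the value $\approx 2.895$ of $K_3$ quoted below:

```python
import numpy as np

def H1(X):
    # Shannon entropy (54); zero entries contribute 0 by the convention 0 x infinity = 0
    P = (X ** 2).ravel()
    P = P[P > 0]
    return np.sum(P * np.log(1.0 / P))

H4 = np.array([[1, 1, 1, 1],
               [1, -1, 1, -1],
               [1, 1, -1, -1],
               [1, -1, -1, 1]]) / 2.0           # normalized Hadamard matrix, n = 4
assert np.isclose(H1(H4), 4 * np.log(4))        # attains the bound n ln n

K3 = (2 * np.ones((3, 3)) - 3 * np.eye(3)) / 3  # the matrix (29) for n = 3
print(round(H1(K3), 3))                         # 2.895 < 3 ln 3 ~ 3.296
```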

The maximization problem of the Shannon entropy is given by

\[ \mathcal{J}[M_n] = \max_{X \in \mathbb{R}^{n \times n}} \sum_{i,j=1}^n a_{ij}^2 f\left( \frac{1}{a_{ij}^2} \right), \ \text{such that} \ \sum_{k=1}^n a_{ki} a_{kj} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases} \tag{55} \]

with $f(t) = \ln t$, $f'(t) = \frac{1}{t} > 0$ and $f''(t) = -\frac{1}{t^2} < 0$ for all $t \in [1, \infty]$, and $f(1) = 0$, $f(\infty) = \infty$, $f'(\infty) = 0$, $f''(\infty) = 0$. If $X = (a_{ij})_{i,j=1}^n$ is a stationary point, then we have
\[ A := G X^\top - X G^\top = \left( 2 \sum_{k=1}^n a_{ik} a_{jk} \ln\left( \frac{a_{jk}^2}{a_{ik}^2} \right) \right)_{i,j=1}^n = 0, \tag{56} \]
and $X$ is a strict local maximum if
\[ \xi_X^{i,j} := 2\left[ -4 + \sum_{k=1}^n \left( a_{ki}^2 - a_{kj}^2 \right) \ln\left( \frac{a_{ki}^2}{a_{kj}^2} \right) \right] < 0, \tag{57} \]
for $1 \leq i < j \leq n$.

By Theorem 4.7, the maximum value of the Shannon entropy is $n \ln n$, and if a real Hadamard matrix $H_n$ exists, by Proposition 4.9 this bound is achieved by $X = \frac{1}{\sqrt{n}} H_n$, which is a local maximum, since $\xi_X^{i,j} = -8 < 0$.

By Proposition 4.4, the matrix given in (29), i.e., $X = K_n$, is clearly a stationary point of the maximization problem (55). Now we have
\[ \xi_X^{i,j} = 4\left[ -2 + \frac{n-4}{n} \ln\left( \frac{(n-2)^2}{4} \right) \right], \tag{58} \]
for $1 \leq i < j \leq n$. It can easily be seen that $\xi_X^{i,j} = -0.343 < 0$ for $n = 11$ and $\xi_X^{i,j} = 0.584 > 0$ for $n = 12$. That is, the matrix $K_n$ is a local maximum for $n \leq 11$ and a local minimum for $n \geq 12$. In this case, $J[M_n]$ is given by
\[ J[M_n] = \frac{1}{n} \left[ n^2 \ln(n^2) - (n-2)^2 \ln\left( (n-2)^2 \right) - 4(n-1) \ln 4 \right]. \tag{59} \]
For $n = 3$, $K_3$ is a global maximum, and for $n = 5$, it appears to us that $K_5$ is a global maximum (see [2, 3, 9]), with Shannon entropies $2.895 < 3 \ln 3 \approx 3.296$ and $7.703 < 5 \ln 5 \approx 8.047$, respectively.

If a real conference matrix $C_n$ exists, then by Proposition 4.10, $X = \frac{1}{\sqrt{n-1}} C_n$ is a stationary point. It is a local minimum for $n = 2$ with minimum value $0$, and a local maximum for $n \geq 3$ (see [2]). The local maximum value is $\mathcal{J}[M_n] = n \ln(n-1)$. If $n \equiv 2 \pmod 4$ and a real conference matrix exists, then for large $n$ it appears to us that $\frac{1}{\sqrt{n-1}} C_n$ is a global maximum.

By Proposition 4.11, if a weighing matrix $W_{n,k}$ exists for some $1 \leq k \leq n$, then the orthogonal matrix $X = \frac{1}{\sqrt{k}} W_{n,k}$ is also a stationary point of the maximization problem (55), with $\mathcal{J}[M_n] = n f(k) = n \ln k$. It should be noted that $X$ can be a local maximum, a local minimum or a saddle point for different values of $k$ (see [2]).

5.3. Rényi entropy for orthogonal matrices. Next, we define the Rényi entropy of a real orthogonal matrix as follows:

Definition 5.2. For an $n \times n$ orthogonal matrix $M_n = (a_{ij})_{i,j=1}^n$, the Rényi entropy is given by

\[ H_\alpha[M_n] = \frac{n}{1-\alpha} \ln\left( \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^\alpha \right), \tag{60} \]
for all $0 \leq \alpha \leq \infty$ and $\alpha \neq 1$. For $\alpha = 0$ in (60), we get the Hartley or max-entropy

\[ H_0[M_n] = n \ln n = n \ln\left( o(M_n) \right), \]
where $o(M_n)$ is the order of the matrix $M_n$. As $\alpha \to 1$ in (60), by using L'Hospital's rule, we have

\[ \lim_{\alpha \to 1} H_\alpha[M_n] = \lim_{\alpha \to 1} \frac{n}{1-\alpha} \ln\left( \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^\alpha \right) = \sum_{i=1}^n \sum_{j=1}^n a_{ij}^2 \ln\left( \frac{1}{a_{ij}^2} \right) = H_1[M_n], \]
which is the Shannon entropy. The Collision entropy or quadratic Rényi entropy is the Rényi entropy for $\alpha = 2$, i.e.,

\[ H_2[M_n] = -n \ln\left( \frac{1}{n} \sum_{i,j=1}^n a_{ij}^4 \right). \tag{61} \]
For $\alpha = \infty$ in (60), we get the min-entropy, defined by
\[ H_\infty[M_n] = n \min_{1 \leq i,j \leq n} \left( -\ln a_{ij}^2 \right) = -n \max_{1 \leq i,j \leq n} \ln a_{ij}^2 = -n \ln \max_{1 \leq i,j \leq n} a_{ij}^2. \]
For an orthogonal matrix $X = (a_{ij})_{i,j=1}^n$, the maximization problem of the Rényi entropy is given by

\[ \mathcal{J}[M_n] = \max_{X \in \mathbb{R}^{n \times n}} \left[ \frac{n}{1-\alpha} \ln\left( \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^\alpha \right) \right], \ \text{such that} \ X X^\top = I_n. \tag{62} \]
But for $\alpha > 1$, we know that

\[ \max_{X \in \mathbb{R}^{n \times n}} \left[ \frac{n}{1-\alpha} \ln\left( \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^\alpha \right) \right] = \frac{n}{1-\alpha} \ln\left( \min_{X \in \mathbb{R}^{n \times n}} \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^\alpha \right), \]
and minimizing a monotonically increasing function of a variable is equivalent to minimizing the variable itself. Thus, it is enough to consider the problem
\[ \widehat{\mathcal{J}}[M_n] = \min_{X \in \mathbb{R}^{n \times n}} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^\alpha, \ \text{such that} \ \sum_{k=1}^n a_{ki} a_{kj} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases} \tag{63} \]
Comparing with the minimization problem in (17), we have $g(x) = x^{\alpha-1}$ with $g(0) = 0$, $g'(x) = (\alpha-1) x^{\alpha-2} > 0$ and $g''(x) = (\alpha-1)(\alpha-2) x^{\alpha-3} \geq 0$, for all $\alpha > 1$. If $X = (a_{ij})_{i,j=1}^n$ is a stationary point, then we have
\[ A = \left( 2\alpha \sum_{k=1}^n a_{ik} a_{jk} \left[ \left( a_{ik}^2 \right)^{\alpha-1} - \left( a_{jk}^2 \right)^{\alpha-1} \right] \right)_{i,j=1}^n = 0, \tag{64} \]
and
\[ \xi_X^{i,j} = 2\alpha \sum_{k=1}^n \left\{ (2\alpha-1) \left[ a_{ki}^2 \left( a_{kj}^2 \right)^{\alpha-1} + a_{kj}^2 \left( a_{ki}^2 \right)^{\alpha-1} \right] - \left[ \left( a_{ki}^2 \right)^\alpha + \left( a_{kj}^2 \right)^\alpha \right] \right\}, \tag{65} \]
for $1 \leq i < j \leq n$.

By Theorem 4.8, the global minimum of the problem (63) is $\frac{1}{n^{\alpha-2}}$. Using Proposition 4.9, if a real Hadamard matrix of order $n$ exists, then $X = \frac{1}{\sqrt{n}} H_n$ is a local minimum, since $\xi_X^{i,j} = \frac{8\alpha(\alpha-1)}{n^{\alpha-1}} > 0$ for all $\alpha > 1$, with $\mathcal{J}[M_n] = n \ln n$. Once again by Theorem 4.8, we know that $\frac{1}{\sqrt{n}} H_n$ is also a global minimum.

Now, by Proposition 4.10, if a real conference matrix $C_n$ exists, then the orthogonal matrix $X = \frac{1}{\sqrt{n-1}} C_n$ is a stationary point; it is a global maximum for $n = 2$ and a strict local minimum for $n \geq 3$, since
\[ \xi_X^{i,j} = \frac{4\alpha}{(n-1)^\alpha} \left[ (2\alpha-2)n - 4\alpha + 3 \right] > 0 \ \text{for} \ n \geq 3, \ \text{and} \ \xi_X^{i,j} < 0 \ \text{for} \ n = 2. \]
Note that $\xi_X^{i,j} > 0$ only if $n > \frac{4\alpha-3}{2\alpha-2}$ (equivalently, for $\alpha > \frac{2n-3}{2n-4}$), and we know that $\lim_{\alpha \to \infty} \frac{4\alpha-3}{2\alpha-2} = 2$. In this case, $\widehat{\mathcal{J}}[M_n] = \frac{n}{(n-1)^{\alpha-1}}$ and the Rényi entropy is $n \ln(n-1)$.

Also, by Proposition 4.11, if a real weighing matrix $W_{n,k}$ exists for some $1 \leq k \leq n$, then the orthogonal matrix $X = \frac{1}{\sqrt{k}} W_{n,k}$ is a stationary point of the minimization problem (63). Moreover, $X$ is a strict local minimum if $\beta_{ij} > \frac{k}{2\alpha-1}$ for all $i < j$, a strict local maximum if $\beta_{ij} < \frac{k}{2\alpha-1}$ for all $i < j$, and a saddle point if $\beta_{ij} > \frac{k}{2\alpha-1}$ for some pairs and $\beta_{ij} < \frac{k}{2\alpha-1}$ for others, since
\[ \xi_X^{i,j} = \frac{4\alpha}{k^\alpha} \left[ (2\alpha-1) \beta_{ij} - k \right], \]
for $1 \leq i < j \leq n$. Thus we have $\widehat{\mathcal{J}}[M_n] = \frac{n}{k^{\alpha-1}}$ and the Rényi entropy is $n \ln k$.

Finally, the matrix given in (29) is a stationary point by Proposition 4.4. The matrix (29) is a strict local minimum if
\[ \xi_X^{i,j} = \frac{4\alpha}{n^{2\alpha}} \left\{ (2\alpha-2)(n-2)\, 4^\alpha + (2\alpha-1) \left[ 4^{\alpha-1} (n-2)^2 + 4 (n-2)^{2\alpha-2} \right] - (n-2)^{2\alpha} - 4^\alpha \right\} > 0. \]

In this case, $\widehat{\mathcal{J}}[M_n] = \frac{(n-2)^{2\alpha} + (n-1)\, 2^{2\alpha}}{n^{2\alpha-1}}$.

Any $2 \times 2$ orthogonal matrix can be written as
\[ \begin{pmatrix} \pm a & \mp\sqrt{1-a^2} \\ \pm\sqrt{1-a^2} & \pm a \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} \pm a & \pm\sqrt{1-a^2} \\ \pm\sqrt{1-a^2} & \mp a \end{pmatrix}, \quad a \in [0, 1], \]
and hence the Shannon and Rényi entropies become
\[ H_1[M_2] = 2\left[ a^2 \ln\left( \frac{1}{a^2} \right) + (1-a^2) \ln\left( \frac{1}{1-a^2} \right) \right], \quad \text{and} \]

\[ H_\alpha[M_2] = \frac{2}{1-\alpha} \ln\left( a^{2\alpha} + (1-a^2)^\alpha \right). \]
Figure 2 depicts the Rényi entropy of $2 \times 2$ orthogonal matrices for different values of $\alpha$.

5.4. Tsallis entropy for orthogonal matrices. We define the Tsallis entropy of a real orthogonal matrix as follows:

Definition 5.3. For an $n \times n$ orthogonal matrix $M_n = (a_{ij})_{i,j=1}^n$, the Tsallis entropy is given by
\[ S_q[M_n] = \frac{n}{q-1} \left[ 1 - \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^q \right] = \frac{n}{q-1} \left[ 1 - \exp\left( \frac{1-q}{n} H_q[M_n] \right) \right], \tag{66} \]
for all $0 < q \leq \infty$ ($q$ is called the entropic index) and $q \neq 1$.

As $q \to 1$, the Tsallis entropy becomes the Shannon entropy, since by using L'Hospital's rule, we have

\[ \lim_{q \to 1} S_q[M_n] = \lim_{q \to 1} \frac{n}{q-1} \left[ 1 - \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^q \right] = \sum_{i,j=1}^n a_{ij}^2 \ln\left( \frac{1}{a_{ij}^2} \right) = H_1[M_n]. \]

Figure 2. Rényi entropy of 2 × 2 orthogonal matrices.
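The curves in Figure 2 come from the closed forms above; a small added sketch evaluates $H_\alpha[M_2]$ and confirms numerically that it approaches the Shannon entropy $H_1[M_2]$ as $\alpha \to 1$:

```python
import numpy as np

def H1_2x2(a):
    return 2 * (a**2 * np.log(1 / a**2) + (1 - a**2) * np.log(1 / (1 - a**2)))

def H_alpha_2x2(a, alpha):
    return (2 / (1 - alpha)) * np.log(a**(2 * alpha) + (1 - a**2)**alpha)

a = 0.6  # an arbitrary point of the parametrization, 0 < a < 1
for alpha in [1.01, 1.001, 1.0001]:
    assert abs(H_alpha_2x2(a, alpha) - H1_2x2(a)) < 1e-1
```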

The maximization problem of Tsallis entropy is given by

\[ \mathcal{J}[M_n] = \max_{X \in \mathbb{R}^{n \times n}} \frac{n}{q-1} \left[ 1 - \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^q \right], \ \text{such that} \ X X^\top = I_n. \tag{67} \]
But for $q > 1$, we know that

\[ \max_{X \in \mathbb{R}^{n \times n}} \frac{n}{q-1} \left[ 1 - \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^q \right] = \frac{n}{q-1} - \frac{1}{q-1} \min_{X \in \mathbb{R}^{n \times n}} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^q, \tag{68} \]
and this minimization problem is the same as (63). Note also that

\[ \frac{1}{q-1} \sum_{i,j=1}^n a_{ij}^2 \left[ 1 - \left( \frac{1}{a_{ij}^2} \right)^{1-q} \right] = \frac{n}{q-1} \left[ 1 - \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^q \right], \tag{69} \]

and we can consider it as a maximization problem of the form (7) with $f(x) = \frac{1 - x^{1-q}}{q-1}$, which satisfies $f(1) = 0$, $f'(x) = \frac{1}{x^q} > 0$ and $f''(x) = -\frac{q}{x^{q+1}} < 0$. For a $2 \times 2$ orthogonal matrix, the Tsallis entropy becomes

\[ S_q[M_2] = \frac{2}{q-1} \left[ 1 - a^{2q} - (1-a^2)^q \right]. \]
Figure 3 shows the Tsallis entropy of $2 \times 2$ orthogonal matrices for different values of $q$.

Figure 3. Tsallis entropy of $2 \times 2$ orthogonal matrices.

5.5. Sharma-Mittal entropy for orthogonal matrices [22]. Let us now define the Sharma-Mittal entropy.

Definition 5.4. For an $n \times n$ orthogonal matrix $M_n = (a_{ij})_{i,j=1}^n$, the Sharma-Mittal entropy is given by
\[ T_{\alpha,\beta}[M_n] = \frac{n}{\beta-1} \left[ 1 - \left( \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^\alpha \right)^{\frac{1-\beta}{1-\alpha}} \right], \tag{70} \]
for all $0 < \alpha, \beta \leq \infty$ and $\alpha, \beta \neq 1$. By using L'Hospital's rule, it can easily be seen that

\[ T_{\alpha,\alpha}[M_n] = \frac{n}{\alpha-1} \left[ 1 - \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^\alpha \right] = S_\alpha[M_n], \]

\[ \lim_{\beta \to 1} T_{\alpha,\beta}[M_n] = \frac{n}{1-\alpha} \ln\left( \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^\alpha \right) = H_\alpha[M_n], \quad \text{and} \quad \lim_{\alpha \to 1} \lim_{\beta \to 1} T_{\alpha,\beta}[M_n] = \sum_{i,j=1}^n a_{ij}^2 \ln\left( \frac{1}{a_{ij}^2} \right) = H_1[M_n]. \]
The maximization problem of the Sharma-Mittal entropy is given by
\[ \mathcal{J}[M_n] = \max_{X \in \mathbb{R}^{n \times n}} \frac{n}{\beta-1} \left[ 1 - \left( \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^\alpha \right)^{\frac{1-\beta}{1-\alpha}} \right], \ \text{such that} \ X X^\top = I_n. \tag{71} \]
But for $\alpha, \beta > 1$, we also know that
\[ \max_{X \in \mathbb{R}^{n \times n}} \frac{n}{\beta-1} \left[ 1 - \left( \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^\alpha \right)^{\frac{1-\beta}{1-\alpha}} \right] = \frac{n}{\beta-1} \left[ 1 - \left( \min_{X \in \mathbb{R}^{n \times n}} \frac{1}{n} \sum_{i,j=1}^n \left( a_{ij}^2 \right)^\alpha \right)^{\frac{1-\beta}{1-\alpha}} \right], \]
and since $x^\gamma$ is an increasing function for any $\gamma > 0$, it is enough to consider the problem given in (63). Figure 4 shows the Sharma-Mittal entropy for $n = 2$ and different values of $\alpha$ and $\beta$.

Figure 4. Sharma-Mittal entropy of 2 × 2 orthogonal matrices.
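Similarly, the $2 \times 2$ closed forms can be checked against the stated degenerations (an added sketch; the point $a = 0.6$ is arbitrary): $T_{\alpha,\alpha} = S_\alpha$ and $S_q \to H_1$ as $q \to 1$.

```python
import numpy as np

def S_q_2x2(a, q):
    # Tsallis entropy of a 2x2 orthogonal matrix
    return (2 / (q - 1)) * (1 - a**(2 * q) - (1 - a**2)**q)

def T_2x2(a, alpha, beta):
    # Sharma-Mittal entropy (70) specialized to n = 2
    s = a**(2 * alpha) + (1 - a**2)**alpha
    return (2 / (beta - 1)) * (1 - s**((1 - beta) / (1 - alpha)))

a = 0.6
H1 = 2 * (a**2 * np.log(1 / a**2) + (1 - a**2) * np.log(1 / (1 - a**2)))
assert abs(S_q_2x2(a, 1.0001) - H1) < 1e-2                            # S_q -> H_1
assert np.isclose(T_2x2(a, 2.0, 2.0001), S_q_2x2(a, 2.0), atol=1e-3)  # T_{a,a} = S_a
```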

5.6. Cressie-Read and K-divergence functions for orthogonal matrices [6]. Let us now give some examples involving divergence functions from Information Theory.

The K-divergence function for orthogonal matrices corresponds to $f(x) = \ln\left( \frac{1+x}{2} \right)$, with $f(1) = 0$, $f'(x) = \frac{1}{1+x} > 0$ and $f''(x) = -\frac{1}{(1+x)^2} < 0$, for all $x \in [1, \infty)$. The maximization problem of the K-divergence function is given by
\[ \mathcal{J}[M_n] = \max_{X \in \mathbb{R}^{n \times n}} \sum_{i,j=1}^n a_{ij}^2 \ln\left[ \frac{1}{2} \left( 1 + \frac{1}{a_{ij}^2} \right) \right], \ \text{such that} \ X^\top X = I_n, \tag{72} \]
where $X = (a_{ij})_{i,j=1}^n$. If $X$ is a stationary point, then we have
\[ A = \left( 2 \sum_{k=1}^n a_{ik} a_{jk} \left\{ \ln\left( \frac{a_{jk}^2}{a_{ik}^2} \right) + \ln\left( \frac{1 + a_{ik}^2}{1 + a_{jk}^2} \right) + \frac{a_{ik}^2 - a_{jk}^2}{(1 + a_{ik}^2)(1 + a_{jk}^2)} \right\} \right)_{i,j=1}^n = 0. \tag{73} \]

The Cressie-Read function for orthogonal matrices corresponds to $f(x) = \frac{1 - x^{-\alpha}}{\alpha(\alpha+1)}$, with $f(1) = 0$, $f'(x) = \frac{1}{(\alpha+1) x^{\alpha+1}} > 0$ and $f''(x) = -\frac{1}{x^{\alpha+2}} < 0$, for all $x \in [1, \infty)$. The maximization problem of the Cressie-Read function is given by
\[ \mathcal{J}[M_n] = \max_{X \in \mathbb{R}^{n \times n}} \frac{1}{\alpha(\alpha+1)} \left[ n - \sum_{i,j=1}^n \left( a_{ij}^2 \right)^{\alpha+1} \right], \ \text{such that} \ X^\top X = I_n, \tag{74} \]
where $X = (a_{ij})_{i,j=1}^n$. If $X$ is a stationary point, then we have
\[ A = \left( \frac{2}{\alpha} \sum_{k=1}^n a_{ik} a_{jk} \left[ \left( a_{jk}^2 \right)^\alpha - \left( a_{ik}^2 \right)^\alpha \right] \right)_{i,j=1}^n = 0. \tag{75} \]
From (51), (56), (64), (73) and (75), it can be seen that all these problems have the same stationary points (see [2, 3]). Also, their local and global optima coincide for the different parameters (say $\alpha$, $\beta$, $q$, etc.) and orders $n$.
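As a final added check, the first order condition (75) holds at a normalized Hadamard point, where all squared entries coincide and every bracket in (75) vanishes:

```python
import numpy as np

H4 = np.array([[1, 1, 1, 1],
               [1, -1, 1, -1],
               [1, 1, -1, -1],
               [1, -1, -1, 1]]) / 2.0    # X = H_4 / 2, so X X^T = I
alpha = 2.0
P = H4 ** 2                              # all squared entries equal 1/4
M = H4 * P ** alpha                      # M_ik = a_ik (a_ik^2)^alpha
A = (2 / alpha) * (H4 @ M.T - M @ H4.T)  # the matrix in (75)
assert np.allclose(A, 0)                 # stationarity at the Hadamard point
```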

Acknowledgments. The first author's research was partially supported by grants from the AFOSR and NSF. The second author would like to thank the Air Force Office of Scientific Research (AFOSR) and the National Research Council (NRC) for a Research Associateship Award, and the Air Force Institute of Technology (AFIT) and the Indian Statistical Institute, Bangalore Centre (ISI Bangalore) for providing a stimulating scientific environment and resources. The authors sincerely thank the reviewers for their valuable comments, which led to the improvement of this work.

REFERENCES

[1] P.-A. Absil, R. Mahony and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, Princeton, NJ, 2008.
[2] K. T. Arasu, M. T. Mohan, A. Pathak and R. J. Ramya, Entropy optimal orthogonal matrices, submitted for journal publication, 2018.
[3] K. T. Arasu and M. T. Mohan, Entropy of orthogonal matrices and minimum distance orthostochastic matrices from the uniform van der Waerden matrix, submitted for journal publication, 2018.
[4] R. Bhatia, Matrix Analysis, Graduate Texts in Mathematics, Springer, New York, 1997.
[5] O. Chterental and D. Ž. Đoković, On orthostochastic, unistochastic and qustochastic matrices, Linear Algebra and its Applications, 428 (2008), 1178–1201.
[6] N. Cressie and T. R. C. Read, Multinomial goodness-of-fit tests, Journal of the Royal Statistical Society, Series B, 46 (1984), 440–464.
[7] P. Delsarte, J. M. Goethals and J. J. Seidel, Orthogonal matrices with zero diagonal II, Canadian Journal of Mathematics, XXIII (1971), 816–832.
[8] A. Edelman, T. A. Arias and S. T. Smith, The geometry of algorithms with orthogonality constraints, SIAM Journal on Matrix Analysis and Applications, 20 (1999), 303–353.
[9] H. G. Gadiyar, K. M. Sangeeta Maini, R. Padma and H. S. Sharatchandra, Entropy and Hadamard matrices, Journal of Physics A: Mathematical and General, 36 (2003), 109–112.
[10] J. Gallier, Basics of classical Lie groups: the exponential map, Lie groups, and Lie algebras, Chapter 14 in Geometric Methods and Applications, Texts in Applied Mathematics, 38, Springer (2001), 367–414.
[11] A. V. Geramita, J. M. Geramita and J. Seberry, Orthogonal designs, Linear and Multilinear Algebra, 3 (1976), 281–306.
[12] A. V. Geramita and J. Seberry, Orthogonal Designs: Quadratic Forms and Hadamard Matrices, Marcel Dekker, New York-Basel, 1979.
[13] B. K. P. Horn, Closed-form solution of absolute orientation using unit quaternions, Journal of the Optical Society of America A, 4 (1987), 629–642.
[14] Z. Q. Ma, Group Theory for Physicists, World Scientific, Singapore, 2007.
[15] A. W. Marshall, I. Olkin and B. C. Arnold, Inequalities: Theory of Majorization and Its Applications, Springer Series in Statistics, New York, 2011.
[16] M. T. Mohan, On some p-almost Hadamard matrices, accepted in Operators and Matrices, 2018.
[17] H. Nakazato, Set of 3 × 3 orthostochastic matrices, Nihonkai Mathematical Journal, 7 (1996), 83–100.
[18] A. Nemirovski, Sums of random symmetric matrices and quadratic optimization under orthogonality constraints, Mathematical Programming, 109 (2007), 283–317.
[19] J. Nocedal and S. J. Wright, Numerical Optimization, Springer Series in Operations Research and Financial Engineering, 2nd edition, Springer, New York, 2006.
[20] R. E. A. C. Paley, On orthogonal matrices, Journal of Mathematics and Physics, 12 (1933), 311–320.
[21] D. R. Stinson, Combinatorial Designs: Constructions and Analysis, Springer-Verlag, New York, 2004.
[22] B. D. Sharma and D. P. Mittal, New non-additive measures of entropy for discrete probability distributions, Journal of Mathematical Sciences, 10 (1975), 28–40.
[23] F. Szöllősi, Construction, classification and parametrization of complex Hadamard matrices, PhD Thesis, https://arxiv.org/abs/1110.5590, 2011.
[24] V. Weber, J. VandeVondele, J. Hütter and A. M. Niklasson, Direct energy functional minimization under orthogonality constraints, The Journal of Chemical Physics, 128 (2008), 084113.
[25] Z. Wen and W. Yin, A feasible method for optimization with orthogonality constraints, Mathematical Programming, Series A, 142 (2013), 397–434.
[26] K. Życzkowski, M. Kuś, W. Słomczyński and H.-J. Sommers, Random unistochastic matrices, Journal of Physics A: Mathematical and General, 36 (2003), 3425–3450.

Received May 2017; 1st revision March 2018; final revision August 2018.

E-mail address: [email protected]
E-mail address: [email protected], [email protected]