Geometrical inverse preconditioning for symmetric positive definite matrices

Jean-Paul Chehab∗   Marcos Raydan†

November 18, 2015

∗LAMFA, UMR 7352, Université de Picardie Jules Verne, 33 rue Saint Leu, 80039 Amiens, France ([email protected]).
†Departamento de Cómputo Científico y Estadística, Universidad Simón Bolívar, Ap. 89000, Caracas 1080-A, Venezuela ([email protected]). Supported by CNRS (Unit FR3399 ARC Mathématiques, Université de Picardie Jules Verne, Amiens, France).

Abstract

We focus on inverse preconditioners based on minimizing $F(X) = 1 - \cos(XA, I)$, where $XA$ is the preconditioned matrix and $A$ is symmetric and positive definite. We present and analyze gradient-type methods to minimize $F(X)$ on a suitable compact set. For that we use the geometrical properties of the non-polyhedral cone of symmetric and positive definite matrices, and also the special properties of $F(X)$ on the feasible set. Preliminary and encouraging numerical results are also presented in which dense and sparse approximations are included.

Key words: Preconditioning, cones of matrices, gradient method, minimal residual method.

1 Introduction

Algebraic inverse preconditioning plays a key role in a wide variety of applications that involve the solution of large and sparse linear systems of equations; see e.g., [4, 6, 7, 9, 14, 22, 23]. For a given square matrix $A$, there exist several proposals for constructing sparse inverse approximations which are based on optimization techniques, mainly on minimizing the Frobenius norm of the residual $(I - XA)$ over a set $\mathcal{P}$ of matrices with a certain sparsity pattern; see e.g., [2, 7, 8, 9, 10, 12, 13, 18, 19]. However, we must remark that when $A$ is symmetric and positive definite, minimizing the Frobenius norm of the residual will in general produce an inverse preconditioner which is neither symmetric nor positive definite; see, e.g., [2].

There is currently a growing interest, and understanding, in the rich geometrical structure of the non-polyhedral cone of symmetric and positive semidefinite matrices (PSD); see e.g., [1, 5, 12, 16, 17, 19, 24]. In this work, we focus on inverse preconditioners based on minimizing the positive-scaling-invariant function $F(X) = 1 - \cos(XA, I)$, instead of minimizing the Frobenius norm of the residual. Our approach takes advantage of the geometrical properties of the PSD cone, and also of the special properties of $F(X)$ on a suitable compact set, to introduce specialized gradient-type methods whose convergence properties we analyze.

The rest of the document is organized as follows. In Section 2, we develop and analyze two different gradient-type iterative schemes for finding inverse approximations based on minimizing $F(X)$, including sparse versions. In Section 3, we present numerical results on some well-known test matrices to illustrate the behavior and properties of the introduced gradient-type methods. Finally, in Section 4 we present some concluding remarks.

2 Gradient-type iterative methods

Let us recall that the cosine between two $n \times n$ real matrices $A$ and $B$ is defined as
\[
\cos(A, B) = \frac{\langle A, B\rangle}{\|A\|_F\,\|B\|_F}, \tag{1}
\]
where $\langle A, B\rangle = \mathrm{trace}(B^T A)$ is the Frobenius inner product in the space of matrices and $\|\cdot\|_F$ is the associated Frobenius norm. By the Cauchy-Schwarz inequality it follows that
\[
|\cos(A, B)| \le 1,
\]
and the equality is attained if and only if $A = \gamma B$ for some nonzero real number $\gamma$.
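As a concrete illustration of definition (1) — our own sketch, not part of the paper — the following NumPy snippet computes the matrix cosine through the Frobenius inner product and checks the Cauchy-Schwarz bound; the function names are hypothetical.

```python
import numpy as np

def frobenius_inner(A, B):
    """Frobenius inner product <A, B> = trace(B^T A)."""
    return np.trace(B.T @ A)

def matrix_cosine(A, B):
    """cos(A, B) as in (1): <A, B> / (||A||_F ||B||_F)."""
    return frobenius_inner(A, B) / (np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro'))

# Quick check of the Cauchy-Schwarz bound |cos(A, B)| <= 1,
# with equality when A is a nonzero multiple of B.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = 3.0 * B                      # A = gamma * B with gamma = 3
print(matrix_cosine(A, B))       # prints 1.0 (up to rounding)
```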
To compute the inverse of a given symmetric and positive definite matrix $A$ we consider the function
\[
F(X) = 1 - \cos(XA, I) \ge 0, \tag{2}
\]
for which the minimum value zero is reached at $X = \xi A^{-1}$, for any positive real number $\xi$. Let us recall that any positive semidefinite matrix $B$ has nonnegative diagonal entries and so $\mathrm{trace}(B) \ge 0$. Hence, if $XA$ is symmetric, we need to impose that $\langle XA, I\rangle = \mathrm{trace}(XA) \ge 0$ as a necessary condition for $XA$ to be in the PSD cone; see [5, 17, 24]. Therefore, in order to impose uniqueness in the PSD cone, we consider the constrained minimization problem
\[
\min_{X \in S \cap T} F(X), \tag{3}
\]
where $S = \{X \in \mathbb{R}^{n\times n} \mid \|XA\|_F = \sqrt{n}\}$ and $T = \{X \in \mathbb{R}^{n\times n} \mid \mathrm{trace}(XA) \ge 0\}$. Notice that $S \cap T$ is a closed and bounded set, and so problem (3) is well-posed.

Remark 2.1 For any $\beta > 0$, $F(\beta X) = F(X)$, and so the function $F$ is invariant under positive scaling.

The derivative of $F(X)$, denoted by $\nabla F(X)$, plays an important role in our work.

Lemma 2.1
\[
\nabla F(X) = \frac{1}{\|I\|_F\, \|XA\|_F} \left( \frac{\langle XA, I\rangle}{\|XA\|_F^2}\, XA - I \right) A.
\]

Proof. For fixed matrices $X$ and $Y$, we consider the function $\varphi(t) = F(X + tY)$. It is well-known that $\varphi'(0) = \langle \nabla F(X), Y\rangle$. We have
\[
F(X + tY) = 1 - \frac{\langle XA, I\rangle + t\,\langle YA, I\rangle}{\|I\|_F\, \|XA\|_F \sqrt{1 + 2t\,\dfrac{\langle XA, YA\rangle}{\|XA\|_F^2} + t^2\, \dfrac{\|YA\|_F^2}{\|XA\|_F^2}}},
\]
and we obtain, after differentiating $\varphi(t)$ and some algebraic manipulations,
\[
\varphi'(0) = \left\langle \frac{1}{\|I\|_F\, \|XA\|_F} \left( \frac{\langle XA, I\rangle}{\|XA\|_F^2}\, XA - I \right) A,\; Y \right\rangle,
\]
and the result is established.

Theorem 2.1 Problem (3) possesses the unique solution $X = A^{-1}$.

Proof. Notice that $\nabla F(X) = 0$ for $X \in S$ if and only if $X = \beta A^{-1}$ for $\beta = \pm 1$. Now, $F(-A^{-1}) = 2$ and so $X = -A^{-1}$ is the global maximizer of the function $F$ on $S$, but $-A^{-1} \notin T$; whereas $F(A^{-1}) = 0$ and $A^{-1} \in T$. Therefore, $X = A^{-1}$ is the unique feasible solution of (3).

Before discussing different numerical schemes for solving problem (3), we need a couple of technical lemmas.

Lemma 2.2 If $X \in S$ and $XA = AX$, then
\[
\langle \nabla F(X), X\rangle = 0.
\]

Proof. Since $X \in S$ then $\|XA\|_F^2 = n$, and we have
\[
\nabla F(X) = \frac{1}{n} \left( \frac{\langle XA, I\rangle}{n}\, XA - I \right) A,
\]
hence
\[
\langle \nabla F(X), X\rangle = \frac{1}{n}\, \frac{\langle XA, I\rangle}{n}\, \langle XAA, X\rangle - \frac{1}{n}\, \langle A, X\rangle.
\]
But $\langle XAA, X\rangle = \|XA\|_F^2 = n$, so
\[
\langle \nabla F(X), X\rangle = \frac{\langle XA, I\rangle}{n} - \frac{1}{n}\, \langle A, X\rangle = 0,
\]
since $\langle A, X\rangle = \langle AX, I\rangle = \langle XA, I\rangle$.

Lemma 2.3 If $X \in S$, then
\[
\frac{\sqrt{n}}{\|A\|_F} \le \|X\|_F \le \sqrt{n}\, \|A^{-1}\|_F.
\]

Proof. For every $X$ we have $X = XAA^{-1}$, and so $\|X\|_F = \|XAA^{-1}\|_F \le \|XA\|_F\, \|A^{-1}\|_F = \sqrt{n}\, \|A^{-1}\|_F$ for $X \in S$. On the other hand, since $X \in S$, $\sqrt{n} = \|XA\|_F \le \|X\|_F\, \|A\|_F$, and hence
\[
\|X\|_F \ge \frac{\sqrt{n}}{\|A\|_F},
\]
and the result is established.
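To make these quantities concrete, here is a small NumPy sketch (our own illustration, with hypothetical helper names) that evaluates $F(X)$ from (2) and the gradient formula of Lemma 2.1, and verifies numerically that $X = A^{-1}$ is stationary and that $F$ is invariant under positive scaling (Remark 2.1).

```python
import numpy as np

def F(X, A):
    """F(X) = 1 - cos(XA, I), as in (2); note ||I||_F = sqrt(n)."""
    n = A.shape[0]
    XA = X @ A
    return 1.0 - np.trace(XA) / (np.sqrt(n) * np.linalg.norm(XA, 'fro'))

def grad_F(X, A):
    """Gradient of F at X, following Lemma 2.1:
    (1/(||I||_F ||XA||_F)) * ((<XA,I>/||XA||_F^2) XA - I) A."""
    n = A.shape[0]
    XA = X @ A
    nrm = np.linalg.norm(XA, 'fro')
    return ((np.trace(XA) / nrm**2) * XA - np.eye(n)) @ A / (np.sqrt(n) * nrm)

# Sanity checks on a small SPD matrix (illustrative only).
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)               # symmetric positive definite by construction
Ainv = np.linalg.inv(A)
print(F(Ainv, A))                          # ~0: the minimum of F is attained at A^{-1}
print(np.linalg.norm(grad_F(Ainv, A)))     # ~0: A^{-1} is a stationary point
print(F(0.5 * Ainv, A))                    # ~0 as well: F(beta X) = F(X) for beta > 0 (Remark 2.1)
```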
2.1 The negative gradient direction

For the numerical solution of (3), we start by considering the classical gradient iterations that, from an initial guess $X^{(0)}$, are given by
\[
X^{(k+1)} = X^{(k)} - \alpha_k \nabla F(X^{(k)}),
\]
where $\alpha_k > 0$ is a suitable step length. A standard approach is to use the optimal choice, i.e., the positive step length that (exactly) minimizes the function $F(X)$ along the negative gradient direction. We present a closed formula for the optimal choice of step length in a more general setting, assuming that the iterative method is given by
\[
X^{(k+1)} = X^{(k)} + \alpha_k D_k,
\]
where $D_k$ is a search direction in the space of matrices.

Lemma 2.4 The optimal step length $\alpha_k$, that optimizes $F(X^{(k)} + \alpha D_k)$, is given by
\[
\alpha_k = \frac{\langle X^{(k)}A, I\rangle \langle X^{(k)}A, D_kA\rangle - n\, \langle D_kA, I\rangle}
{\langle D_kA, I\rangle \langle X^{(k)}A, D_kA\rangle - \langle X^{(k)}A, I\rangle \langle D_kA, D_kA\rangle}.
\]

Proof. Consider the auxiliary function in one variable
\[
\psi(\alpha) = F(X^{(k)} + \alpha D_k) = 1 - \frac{\langle X^{(k)}A, I\rangle + \alpha\, \langle D_kA, I\rangle}{\sqrt{n}\, \|X^{(k)}A + \alpha D_kA\|_F}.
\]
Differentiating $\psi(\alpha)$, using that $\langle X^{(k)}A, X^{(k)}A\rangle = n$, and also that
\[
\frac{\partial}{\partial \alpha} \|X^{(k)}A + \alpha D_kA\|_F = \frac{\langle X^{(k)}A, D_kA\rangle + \alpha\, \langle D_kA, D_kA\rangle}{\|X^{(k)}A + \alpha D_kA\|_F},
\]
and then forcing $\psi'(\alpha) = 0$, the result is obtained after some algebraic manipulations.

Remark 2.2 For our first approach, $D_k = -\nabla F(X^{(k)})$, and so for the optimal gradient method (also known as the Cauchy method or the steepest descent method) the step length is given by
\[
\alpha_k = \frac{n\, \langle \nabla F(X^{(k)})A, I\rangle - \langle X^{(k)}A, I\rangle \langle X^{(k)}A, \nabla F(X^{(k)})A\rangle}
{\langle \nabla F(X^{(k)})A, I\rangle \langle X^{(k)}A, \nabla F(X^{(k)})A\rangle - \langle X^{(k)}A, I\rangle\, \|\nabla F(X^{(k)})A\|_F^2}. \tag{4}
\]
Notice that if we use instead $D_k = \nabla F(X^{(k)})$, the obtained $\alpha_k$, which also forces $\psi'(\alpha_k) = 0$, is given by (4) but with a negative sign. Therefore, to guarantee that $\alpha_k > 0$ minimizes $F$ along the negative gradient direction to approximate $A^{-1}$, instead of maximizing $F$ along the gradient direction to approximate $-A^{-1}$, we will choose the step length $\alpha_k$ as the absolute value of the expression in (4).
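The following self-contained NumPy sketch assembles the pieces of this subsection: the gradient of Lemma 2.1, the step length (4) taken in absolute value as in Remark 2.2, and an optional rescaling back onto $S$, which by Remark 2.1 leaves $F$ unchanged. It is an illustrative prototype, not the authors' code; the matrix, seed, and iteration count are arbitrary choices of ours.

```python
import numpy as np

def cauchy_inverse_iteration(A, X0, iters=50):
    """Optimal (Cauchy) gradient iterations X_{k+1} = X_k - alpha_k * grad F(X_k),
    with alpha_k the absolute value of the closed-form step (4)."""
    n = A.shape[0]
    I = np.eye(n)
    X = X0.copy()
    for _ in range(iters):
        XA = X @ A
        nrm = np.linalg.norm(XA, 'fro')
        # Gradient of F at X (Lemma 2.1).
        G = ((np.trace(XA) / nrm**2) * XA - I) @ A / (np.sqrt(n) * nrm)
        if np.linalg.norm(G, 'fro') < 1e-14:       # numerically stationary: stop
            break
        GA = G @ A
        # Step length (4), taken in absolute value as argued in Remark 2.2.
        num = n * np.trace(GA) - np.trace(XA) * np.sum(XA * GA)
        den = np.trace(GA) * np.sum(XA * GA) - np.trace(XA) * np.linalg.norm(GA, 'fro')**2
        X = X - abs(num / den) * G
        # Optional (our choice): since F is invariant under positive scaling
        # (Remark 2.1), rescale X so that ||XA||_F = sqrt(n), i.e., back onto S.
        X *= np.sqrt(n) / np.linalg.norm(X @ A, 'fro')
    return X

# Illustrative run on a small random SPD matrix.
rng = np.random.default_rng(2)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)                                 # SPD by construction
X0 = (np.sqrt(8) / np.linalg.norm(A, 'fro')) * np.eye(8)    # scaled identity, lies in S and T
X = cauchy_inverse_iteration(A, X0)
print(np.linalg.norm(X @ A - np.eye(8)))                    # should be small if the iteration converged
```

The rescaling onto $S$ is not required by the formulas above, but it keeps $\langle X^{(k)}A, X^{(k)}A\rangle = n$, which is the normalization used in the proof of Lemma 2.4, and in exact arithmetic it does not change $F$.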