Optimal Approximation of Doubly Stochastic Matrices

Nikitas Rontsis and Paul J. Goulart
Department of Engineering Science, University of Oxford, OX1 3PJ, UK

Abstract

We consider the least-squares approximation of a matrix C in the set of doubly stochastic matrices with the same sparsity pattern as C. Our approach is based on applying the well-known Alternating Direction Method of Multipliers (ADMM) to a reformulation of the original problem. Our resulting algorithm requires an initial Cholesky factorization of a positive definite matrix that has the same sparsity pattern as C + I, followed by simple iterations whose complexity is linear in the number of nonzeros in C, thus ensuring excellent scalability and speed. We demonstrate the advantages of our approach in a series of experiments on problems with up to 82 million nonzeros; these include normalizing large-scale matrices arising from the 3D structure of the human genome, clustering applications, and the SuiteSparse matrix library. Overall, our experiments illustrate the outstanding scalability of our algorithm; matrices with millions of nonzeros can be approximated in a few seconds on modest desktop computing hardware.

1 INTRODUCTION

Consider the following optimization problem

$$
\begin{array}{ll}
\text{minimize} & \tfrac{1}{2}\|X - C\|^2 \\
\text{subject to} & X \ \text{nonnegative} \\
& X_{i,j} = 0 \ \ \forall\, (i,j) \ \text{with} \ C_{i,j} = 0 \\
& X\mathbf{1} = \mathbf{1}, \quad X^\top \mathbf{1} = \mathbf{1},
\end{array}
\tag{P}
$$

which approximates the symmetric real $n \times n$ matrix $C$ in the Frobenius norm by a doubly stochastic matrix $X \in \mathbb{R}^{n \times n}$, i.e., a matrix with nonnegative elements whose columns and rows sum to one, that has the same sparsity pattern as $C$. Problem P is a matrix nearness problem, i.e., a problem of finding a matrix with certain properties that is close to some given matrix; see Higham (1989) for a survey on matrix nearness problems.

Adjusting a matrix so that it becomes doubly stochastic is relevant in many fields, e.g., for preconditioning linear systems (Knight, 2008), as a normalization tool used in spectral clustering (Zass and Shashua, 2007), optimal transport (Cuturi, 2013), and image filtering (Milanfar, 2013), or as a tool to estimate a doubly stochastic matrix from incomplete or approximate data, used e.g. in longitudinal studies in life sciences (Diggle et al., 2002) or to analyze the 3D structure of the human genome (Rao et al., 2014).

A related and widely used approach is to search for a diagonal matrix $D$ such that $DCD$ is doubly stochastic. This is commonly referred to as the matrix balancing problem, or Sinkhorn's algorithm (Knight, 2008). Such a scaling matrix $D$ exists and is unique whenever $C$ has total support. Perhaps surprisingly, when $C$ has only nonnegative elements, the matrix balancing problem can be considered as a matrix nearness problem. This is because $DCD$ has been shown to minimize the relative entropy measure (Idel, 2016, Observation 3.19), (Benamou et al., 2015), i.e., it is the solution of the following convex problem

$$
\begin{array}{ll}
\text{minimize} & \sum_{i,j} X_{i,j} \log\!\left(X_{i,j}/C_{i,j}\right) \\
\text{subject to} & X \ \text{doubly stochastic} \\
& X_{i,j} = 0 \ \ \forall\, (i,j) \ \text{with} \ C_{i,j} = 0,
\end{array}
\tag{1}
$$

where we define $0 \cdot \log(0) = 0$, $0 \cdot \log(0/0) = 0$, and $1 \cdot \log(1/0) = \infty$. Note, however, that the relative entropy is not a proper distance metric since it is not symmetric, does not satisfy the triangle inequality, and can take the value $\infty$.
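For a concrete point of reference, the matrix balancing alternative described above can be computed with a plain Sinkhorn-Knopp iteration that alternately rescales rows and columns until the scaled matrix is approximately doubly stochastic. The Python sketch below is a minimal illustration under our own naming and tolerances; it is not the ADMM algorithm proposed in this paper.

```python
import numpy as np

def sinkhorn_knopp(C, tol=1e-10, max_iter=10_000):
    """Return scaling vectors r, c such that diag(r) @ C @ diag(c) is
    (approximately) doubly stochastic, assuming C is nonnegative with
    total support. Illustrative sketch only."""
    r = np.ones(C.shape[0])
    c = np.ones(C.shape[1])
    for _ in range(max_iter):
        r = 1.0 / (C @ c)                    # make the row sums equal to 1
        c = 1.0 / (C.T @ r)                  # make the column sums equal to 1
        X = r[:, None] * C * c               # current scaled matrix
        err = max(abs(X.sum(axis=1) - 1).max(), abs(X.sum(axis=0) - 1).max())
        if err < tol:
            break
    return r, c

# Symmetric nonnegative example with total support.
C = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
r, c = sinkhorn_knopp(C)
X = r[:, None] * C * c
print(X.sum(axis=0), X.sum(axis=1))          # both approximately all ones
```

Note that the scaled matrix automatically inherits the sparsity pattern of C, mirroring the zero constraints appearing in both P and (1).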
The matrix balancing problem can be solved by iterative methods with remarkable scalability. Standard and simple iterative methods exist that exhibit linear per-iteration complexity w.r.t. the number of nonzeros in $C$ and a linear convergence rate (Knight, 2008), (Idel, 2016). More recent algorithms can exhibit a super-linear convergence rate and increased robustness in many practical situations (Knight and Ruiz, 2013).

The aim of the paper is to show that the direct minimization of a least squares objective in doubly stochastic approximation, which has a long history dating back to the influential paper of Deming and Stephan (1940)¹, can also be solved efficiently. This gives practitioners a new solution to a very important problem of doubly stochastic matrix approximation, which might prove useful for cases where the relative entropy metric is not suitable to their problem.

¹ Deming and Stephan (1940) consider the weighted least squares cost $\tfrac{1}{2}\sum_{i,j} [X_{i,j} - C_{i,j}]^2 / C_{i,j}$, which we treat in §3.2.

The approach we present is also flexible in the sense that it can handle other interesting cases, for example, where $C$ is asymmetric or rectangular, $\|\cdot\|$ is a weighted Frobenius norm, and $X\mathbf{1}$ and $X^\top\mathbf{1}$ are required to sum to an arbitrary, given vector. We discuss these generalizations in §3.2.

Related work. Zass and Shashua (2007) consider the problem P in the case when $C$ is fully dense. They suggest an alternating projections algorithm that has linear per-iteration complexity. The approach of Zass and Shashua (2007) resembles the results of §3.2, but as we will see in the experimental section, it is not guaranteed to converge to an optimizer.

The paper is organized as follows: In Section 2, we introduce a series of reformulations to Problem P, resulting in a problem that is much easier to solve. In Section 3, we suggest a solution method that reveals a particular structure in the problem. Finally, in Section 4, we present a series of numerical results that highlight the scalability of the approach.

Notation used. The symbol $\otimes$ denotes the Kronecker product, $\operatorname{vec}(\cdot)$ the operator that stacks a matrix into a vector in a column-wise fashion, and $\operatorname{mat}(\cdot)$ the inverse operator to $\operatorname{vec}(\cdot)$ (see (Golub and Van Loan, 2013, 12.3.4) for a rigorous definition). Given two matrices, or vectors, with equal dimensions, $\circ$ denotes the Hadamard (element-wise) product. $\|\cdot\|$ denotes the 2-norm of a vector and the Frobenius norm of a matrix, while $\operatorname{card}(\cdot)$ denotes the number of nonzero elements in a vector or a matrix. Finally, $\epsilon$ denotes the machine precision.

2 MODELING P EFFICIENTLY

In this section, we present a reformulation of the doubly stochastic approximation problem P suitable for solving large-scale problems. One of the difficulties with the original formulation, P, is that it has $n^2$ variables and $2n^2 + 2n$ constraints. Attempting to solve P with an off-the-shelf QP solver, such as Gurobi (Gurobi Optimization LLC, 2018), can result in out-of-memory issues just by representing the problem's data, even for medium-sized problems.

In order to avoid this issue, we will perform a series of reformulations that will eliminate variables from P. The final problem, i.e., $P_2$, will have significantly fewer variables and constraints while maintaining a remarkable degree of structure, which will be revealed and exploited in the next section.

We first take the obvious step of eliminating all variables in P that are constrained to be zero, as prescribed by the constraint

$$
X_{i,j} = 0 \ \text{ for all } (i,j) \ \text{ such that } C_{i,j} = 0. \tag{2}
$$

Indeed, consider $\operatorname{vec}(\cdot)$, the operator that stacks a matrix into a vector in a column-wise fashion, and $H$ an $\mathrm{nnz} \times n^2$ matrix, where $\mathrm{nnz} := \operatorname{card}(C)$, that selects all the nonzero elements of $\operatorname{vec}(C)$. Note that $H$ depends on the sparsity pattern $S$ of $C$, defined as the 0–1 $n \times n$ matrix

$$
S_{i,j} :=
\begin{cases}
0 & C_{i,j} = 0 \\
1 & \text{otherwise},
\end{cases}
\tag{3}
$$

but we will not denote this dependence explicitly. We can now isolate the nonzero variables contained in $X$ and $C$ in $\mathrm{nnz}$-dimensional vectors defined as

$$
x := H\operatorname{vec}(X), \qquad c := H\operatorname{vec}(C). \tag{4}
$$
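To make the variable elimination in (2)–(4) concrete, the following Python sketch builds the sparsity pattern S, the nnz × n² selection matrix H as a sparse matrix, and the reduced vector c. It is our own illustration; the helper name `selection_matrix` and the use of scipy.sparse are assumptions, not code from the paper.

```python
import numpy as np
import scipy.sparse as sp

def selection_matrix(C):
    """Return the nnz x n^2 sparse matrix H whose rows pick out the
    nonzero entries of vec(C), where vec() stacks columns (Fortran order)."""
    c_full = C.flatten(order="F")                       # vec(C)
    idx = np.flatnonzero(c_full)                        # positions of the nonzeros
    nnz = idx.size
    return sp.csr_matrix((np.ones(nnz), (np.arange(nnz), idx)),
                         shape=(nnz, C.size))

C = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
S = (C != 0).astype(float)                              # sparsity pattern, eq. (3)
H = selection_matrix(C)
c = H @ C.flatten(order="F")                            # c = H vec(C), eq. (4)
print(c)                                                # the 6 nonzeros of C, column by column

# H^T H is the diagonal matrix diag(vec(S)): multiplying by H^T maps the
# reduced vector back into the zero-padded vec(X) space.
assert np.allclose((H.T @ H).toarray(), np.diag(S.flatten(order="F")))
```

Because each row of H has exactly one unit entry, $H^\top H = \operatorname{diag}(\operatorname{vec}(S))$, which is the fact used in the next paragraph to map x back to the matrix space.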
Note that $x$ is simply a re-writing of $X$ in $\mathbb{R}^{\mathrm{nnz}}$, since for any $X \in \mathcal{S} := \{X \in \mathbb{R}^{n \times n} \mid X_{i,j} = 0 \ \text{for all} \ C_{i,j} = 0\}$ we have $H^\top H \operatorname{vec}(X) = \operatorname{vec}(X)$, and hence $H^\top x = \operatorname{vec}(X)$, due to the fact that $H^\top H = \operatorname{diag}(\operatorname{vec}(S))$. Thus every $x$ defines a unique $X \in \mathcal{S}$ and vice versa.

We can now describe the constraints of P, i.e., $X \ge 0$, $X\mathbf{1}_n = \mathbf{1}_n$ and $X^\top\mathbf{1}_n = \mathbf{1}_n$, on the "x-space". Obviously $X \ge 0$ trivially maps to $x \ge 0$. Furthermore, recalling the standard Kronecker product identity

$$
\operatorname{vec}(LMN) = (N^\top \otimes L)\operatorname{vec}(M) \tag{5}
$$

for matrices of compatible dimension, the constraints $X\mathbf{1}_n = \mathbf{1}_n$ and $X^\top\mathbf{1}_n = \mathbf{1}_n$ of P can be rewritten in an equivalent vectorized form as

$$
\begin{bmatrix} \mathbf{1}_n^\top \otimes I_n \\ I_n \otimes \mathbf{1}_n^\top \end{bmatrix} \operatorname{vec}(X) = \mathbf{1}_{2n} \tag{6}
$$

or, by noting that $H^\top x = \operatorname{vec}(X)$, as

$$
\begin{bmatrix} \mathbf{1}_n^\top \otimes I_n \\ I_n \otimes \mathbf{1}_n^\top \end{bmatrix} H^\top x = \mathbf{1}_{2n}. \tag{7}
$$

Thus P can be rewritten in the "x-space" as

$$
\begin{array}{ll}
\text{minimize} & \tfrac{1}{2}\|x - c\|^2 \\
\text{subject to} & x \ge 0 \\
& \begin{bmatrix} \mathbf{1}_n^\top \otimes I_n \\ I_n \otimes \mathbf{1}_n^\top \end{bmatrix} H^\top x = \mathbf{1}_{2n}.
\end{array}
$$

As in the previous definitions, define $H_u$ as the matrix that stacks all the nonzeros of $C_u$ in a column-wise fashion, which is used to extract the nonzero elements of $X_u$ and $C_u$, i.e., scaled nonzero elements of the upper triangular part of $X$, to the vectors

$$
x_u := H_u \operatorname{vec}(X_u), \qquad c_u := H_u \operatorname{vec}(C_u). \tag{10}
$$

We can now write down our reduced optimization problem. Note that, although it might not be directly evident, the following problem possesses a remarkable degree of internal structure that is exploited in the suggested solution algorithm of the next section.
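The Kronecker-product constructions in (5)–(7) are easy to verify numerically. The sketch below, our own illustration rather than code from the paper, builds the row/column-sum operator $[\mathbf{1}_n^\top \otimes I_n;\, I_n \otimes \mathbf{1}_n^\top]$ with scipy.sparse and checks it, together with identity (5), on a small doubly stochastic example.

```python
import numpy as np
import scipy.sparse as sp

n = 3
I = sp.identity(n, format="csr")
ones_row = sp.csr_matrix(np.ones((1, n)))               # 1_n^T

# A = [1^T (x) I ; I (x) 1^T] maps vec(X) to the stacked row and column sums.
A = sp.vstack([sp.kron(ones_row, I),                    # (1^T (x) I) vec(X) = X 1
               sp.kron(I, ones_row)])                   # (I (x) 1^T) vec(X) = X^T 1

# A doubly stochastic matrix with zero diagonal (same pattern as the earlier C).
X = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
x_vec = X.flatten(order="F")                            # vec(X)
assert np.allclose(A @ x_vec, np.ones(2 * n))           # constraint (6) holds

# Identity (5): vec(L M N) = (N^T (x) L) vec(M), checked on random matrices.
rng = np.random.default_rng(0)
L, M, N = rng.random((3, n, n))
assert np.allclose((L @ M @ N).flatten(order="F"),
                   np.kron(N.T, L) @ M.flatten(order="F"))
```

Comparing (6) and (7), the only change in the reduced formulation is that the operator acts on $H^\top x$ instead of $\operatorname{vec}(X)$, so the equality-constraint matrix shrinks to $2n \times \mathrm{nnz}$ columns in the x-space.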