Matrix Decompositions and Latent Semantic Indexing
On page 123 we introduced the notion of a term-document matrix: an M × N matrix C, each of whose rows represents a term and each of whose columns represents a document in the collection. Even for a collection of modest size, the term-document matrix C is likely to have several tens of thousands of rows and columns. In Section 18.1.1 we first develop a class of operations from linear algebra, known as matrix decomposition. In Section 18.2 we use a special form of matrix decomposition to construct a low-rank approximation to the term-document matrix. In Section 18.3 we examine the application of such low-rank approximations to indexing and retrieving documents, a technique referred to as latent semantic indexing. While latent semantic indexing has not been established as a significant force in scoring and ranking for information retrieval, it remains an intriguing approach to clustering in a number of domains, including collections of text documents (Section 16.6, page 372). Understanding its full potential remains an area of active research.

Readers who do not require a refresher on linear algebra may skip Section 18.1, although Example 18.1 is especially recommended as it highlights a property of eigenvalues that we exploit later in the chapter.

18.1 Linear algebra review

We briefly review some necessary background in linear algebra. Let C be an M × N matrix with real-valued entries; for a term-document matrix, all entries are in fact non-negative. The rank of a matrix is the number of linearly independent rows (or columns) in it; thus, rank(C) ≤ min{M, N}. A square r × r matrix all of whose off-diagonal entries are zero is called a diagonal matrix; its rank is equal to the number of non-zero diagonal entries. If all r diagonal entries of such a diagonal matrix are 1, it is called the identity matrix of dimension r and is represented by I_r.

For a square M × M matrix C and a vector ~x that is not all zeros, the values of λ satisfying

(18.1)   C~x = λ~x

are called the eigenvalues of C. The M-vector ~x satisfying Equation (18.1) for an eigenvalue λ is the corresponding right eigenvector. The eigenvector corresponding to the eigenvalue of largest magnitude is called the principal eigenvector. In a similar fashion, the left eigenvectors of C are the M-vectors ~y such that

(18.2)   ~y^T C = λ~y^T.

The number of non-zero eigenvalues of C is at most rank(C).

The eigenvalues of a matrix are found by solving the characteristic equation, which is obtained by rewriting Equation (18.1) in the form (C − λI_M)~x = 0. The eigenvalues of C are then the solutions of |C − λI_M| = 0, where |S| denotes the determinant of a square matrix S. The equation |C − λI_M| = 0 is an Mth-order polynomial equation in λ and can have at most M roots, which are the eigenvalues of C. These eigenvalues can in general be complex, even if all entries of C are real.
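As a concrete illustration of Equation (18.1) and the characteristic equation, the short NumPy sketch below (our own example, not part of the original text, using an arbitrarily chosen small matrix) computes the eigenvalues and right eigenvectors of a square matrix and checks that C~x = λ~x holds for each eigenvalue-eigenvector pair.

```python
import numpy as np

# A small square matrix chosen purely for illustration.
C = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# numpy.linalg.eig solves the eigenvalue problem numerically and returns
# the eigenvalues together with a matrix whose columns are the
# corresponding right eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(C)

for i, lam in enumerate(eigenvalues):
    x = eigenvectors[:, i]
    # Check Equation (18.1): C x = lambda x, up to floating-point error.
    assert np.allclose(C @ x, lam * x)

print(eigenvalues)  # the eigenvalues of this matrix are 2 and 3 (in some order)
```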
We now examine some further properties of eigenvalues and eigenvectors, to set up the central idea of singular value decompositions in Section 18.2 below. First, we look at the relationship between matrix-vector multiplication and eigenvalues.

Example 18.1: Consider the matrix

S = \begin{pmatrix} 30 & 0 & 0 \\ 0 & 20 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

Clearly the matrix has rank 3, and has 3 non-zero eigenvalues λ_1 = 30, λ_2 = 20 and λ_3 = 1, with the three corresponding eigenvectors

~x_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, ~x_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} and ~x_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.

For each of the eigenvectors, multiplication by S acts as if we were multiplying the eigenvector by a multiple of the identity matrix; the multiple is different for each eigenvector. Now, consider an arbitrary vector, such as ~v = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix}. We can always express ~v as a linear combination of the three eigenvectors of S; in the current example we have

~v = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix} = 2~x_1 + 4~x_2 + 6~x_3.

Suppose we multiply ~v by S:

S~v = S(2~x_1 + 4~x_2 + 6~x_3)
    = 2S~x_1 + 4S~x_2 + 6S~x_3
    = 2λ_1~x_1 + 4λ_2~x_2 + 6λ_3~x_3
(18.3)    = 60~x_1 + 80~x_2 + 6~x_3.

Example 18.1 shows that even though ~v is an arbitrary vector, the effect of multiplication by S is determined by the eigenvalues and eigenvectors of S. Furthermore, it is intuitively apparent from Equation (18.3) that the product S~v is relatively unaffected by terms arising from the small eigenvalues of S; in our example, since λ_3 = 1, the contribution of the third term on the right hand side of Equation (18.3) is small. In fact, if we were to completely ignore the contribution in Equation (18.3) from the third eigenvector corresponding to λ_3 = 1, then the product S~v would be computed to be \begin{pmatrix} 60 \\ 80 \\ 0 \end{pmatrix} rather than the correct product, which is \begin{pmatrix} 60 \\ 80 \\ 6 \end{pmatrix}; these two vectors are relatively close to each other by any of various metrics one could apply (such as the length of their vector difference).

This suggests that the effect of small eigenvalues (and their eigenvectors) on a matrix-vector product is small. We will carry forward this intuition when studying matrix decompositions and low-rank approximations in Section 18.2. Before doing so, we examine the eigenvectors and eigenvalues of special forms of matrices that will be of particular interest to us.

For a symmetric matrix S, the eigenvectors corresponding to distinct eigenvalues are orthogonal. Further, if S is both real and symmetric, the eigenvalues are all real.

Example 18.2: Consider the real, symmetric matrix

(18.4)   S = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.

From the characteristic equation |S − λI| = 0, we have the quadratic (2 − λ)² − 1 = 0, whose solutions yield the eigenvalues 3 and 1. The corresponding eigenvectors \begin{pmatrix} 1 \\ 1 \end{pmatrix} and \begin{pmatrix} 1 \\ -1 \end{pmatrix} are orthogonal.
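To make the observation in Example 18.1 concrete, here is a small NumPy sketch (our own illustration, not part of the original text) that recomputes Equation (18.3) with and without the term arising from the smallest eigenvalue and measures how little the product changes; it also confirms the orthogonality of the eigenvectors in Example 18.2.

```python
import numpy as np

# The diagonal matrix and the arbitrary vector from Example 18.1.
S = np.diag([30.0, 20.0, 1.0])
v = np.array([2.0, 4.0, 6.0])

exact = S @ v                         # [60, 80, 6], as in Equation (18.3)

# Zero out the smallest eigenvalue (lambda_3 = 1), i.e. drop the third
# term on the right-hand side of Equation (18.3).
S_truncated = np.diag([30.0, 20.0, 0.0])
approx = S_truncated @ v              # [60, 80, 0]

# The relative error is small because the discarded eigenvalue is small.
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))  # about 0.06

# The eigenvectors of the symmetric matrix in Example 18.2 are orthogonal:
# (1, 1) . (1, -1) = 0.
print(np.dot([1.0, 1.0], [1.0, -1.0]))  # 0.0
```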
18.1.1 Matrix decompositions

In this section we examine ways in which a square matrix can be factored into the product of matrices derived from its eigenvectors; we refer to this process as matrix decomposition. Matrix decompositions similar to the ones in this section will form the basis of our principal text-analysis technique in Section 18.3, where we will look at decompositions of non-square term-document matrices. The square decompositions in this section are simpler and can be treated with sufficient mathematical rigor to help the reader understand how such decompositions work. The detailed mathematical derivation of the more complex decompositions in Section 18.2 is beyond the scope of this book.

We begin by giving two theorems on the decomposition of a square matrix into the product of three matrices of a special form. The first of these, Theorem 18.1, gives the basic factorization of a square real-valued matrix into three factors. The second, Theorem 18.2, applies to square symmetric matrices and is the basis of the singular value decomposition described in Theorem 18.3.

Theorem 18.1. (Matrix diagonalization theorem) Let S be a square real-valued M × M matrix with M linearly independent eigenvectors. Then there exists an eigen decomposition

(18.5)   S = UΛU^{-1},

where the columns of U are the eigenvectors of S and Λ is a diagonal matrix whose diagonal entries are the eigenvalues of S in decreasing order,

(18.6)   Λ = \begin{pmatrix} λ_1 & & & \\ & λ_2 & & \\ & & \ddots & \\ & & & λ_M \end{pmatrix},   λ_i ≥ λ_{i+1}.

If the eigenvalues are distinct, then this decomposition is unique.

To understand how Theorem 18.1 works, we note that U has the eigenvectors of S as columns

(18.7)   U = (~u_1  ~u_2  ···  ~u_M).

Then we have

SU = S(~u_1  ~u_2  ···  ~u_M)
   = (λ_1~u_1  λ_2~u_2  ···  λ_M~u_M)
   = (~u_1  ~u_2  ···  ~u_M) \begin{pmatrix} λ_1 & & & \\ & λ_2 & & \\ & & \ddots & \\ & & & λ_M \end{pmatrix}.

Thus, we have SU = UΛ, or S = UΛU^{-1}.

We next state a closely related decomposition of a symmetric square matrix into the product of matrices derived from its eigenvectors. This will pave the way for the development of our main tool for text analysis, the singular value decomposition (Section 18.2).

Theorem 18.2. (Symmetric diagonalization theorem) Let S be a square, symmetric real-valued M × M matrix with M linearly independent eigenvectors. Then there exists a symmetric diagonal decomposition

(18.8)   S = QΛQ^T,

where the columns of Q are the orthogonal and normalized (unit length, real) eigenvectors of S, and Λ is the diagonal matrix whose entries are the eigenvalues of S. Further, all entries of Q are real and we have Q^{-1} = Q^T.

We will build on this symmetric diagonal decomposition to build low-rank approximations to term-document matrices.

Exercise 18.1
What is the rank of the 3 × 3 matrix below?

\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 2 & 1 \end{pmatrix}

Exercise 18.2
Show that λ = 2 is an eigenvalue of

C = \begin{pmatrix} 6 & -2 \\ 4 & 0 \end{pmatrix}.
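The symmetric diagonal decomposition of Theorem 18.2 is straightforward to verify numerically. The NumPy sketch below (our own illustration, not part of the original text) uses numpy.linalg.eigh, which is designed for symmetric matrices, to factor the matrix of Example 18.2 and confirm that S = QΛQ^T and Q^{-1} = Q^T.

```python
import numpy as np

# The real, symmetric matrix from Example 18.2.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# numpy.linalg.eigh is specialized for symmetric matrices: it returns real
# eigenvalues (in ascending order, unlike the decreasing-order convention
# of Equation (18.6)) and orthonormal eigenvectors as the columns of Q.
eigenvalues, Q = np.linalg.eigh(S)
Lambda = np.diag(eigenvalues)

# Theorem 18.2: S = Q Lambda Q^T ...
assert np.allclose(Q @ Lambda @ Q.T, S)
# ... and Q is orthogonal, so Q^{-1} = Q^T.
assert np.allclose(Q.T @ Q, np.eye(2))

print(eigenvalues)  # [1. 3.]
```

Reordering the eigenvalues into decreasing order, as in Equation (18.6), merely permutes the columns of Q together with the diagonal entries of Λ and does not affect the identity S = QΛQ^T.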