Multilinear Algebra in Data Analysis: Tensors, Symmetric Tensors, Nonnegative Tensors
Lek-Heng Lim
Stanford University

Workshop on Algorithms for Modern Massive Datasets
Stanford, CA, June 21–24, 2006

Thanks: G. Carlsson, L. De Lathauwer, J.M. Landsberg, M. Mahoney, L. Qi, B. Sturmfels
Collaborators: P. Comon, V. de Silva, P. Drineas, G. Golub

References

http://www-sccm.stanford.edu/nf-publications-tech.html

[CGLM2] P. Comon, G. Golub, L.-H. Lim, and B. Mourrain, "Symmetric tensors and symmetric tensor rank," SCCM Tech. Rep. 06-02, 2006.

[CGLM1] P. Comon, B. Mourrain, L.-H. Lim, and G.H. Golub, "Genericity and rank deficiency of high order symmetric tensors," Proc. IEEE Int. Conference on Acoustics, Speech, and Signal Processing (ICASSP), 31 (2006), no. 3, pp. 125–128.

[dSL] V. de Silva and L.-H. Lim, "Tensor rank and the ill-posedness of the best low-rank approximation problem," SCCM Tech. Rep. 06-06, 2006.

[GL] G. Golub and L.-H. Lim, "Nonnegative decomposition and approximation of nonnegative matrices and tensors," SCCM Tech. Rep. 06-01, 2006, forthcoming.

[L] L.-H. Lim, "Singular values and eigenvalues of tensors: a variational approach," Proc. IEEE Int. Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 1 (2005), pp. 129–132.

What is not a tensor, I

• What is a vector?
  – Mathematician: an element of a vector space.
  – Physicist: "What kind of physical quantities can be represented by vectors?" Answer: once a basis is chosen, an n-dimensional vector is represented by n real numbers, but a physical quantity qualifies only if those numbers transform as expected (i.e., according to the change-of-basis theorem) when one changes the basis.

• What is a tensor?
  – Mathematician: an element of a tensor product of vector spaces.
  – Physicist: "What kind of physical quantities can be represented by tensors?" Answer: see the change-of-basis theorem below.

What is not a tensor, II

By a tensor, physicists and geometers often mean a tensor field (roughly, a tensor-valued function on a manifold):

• stress tensor
• moment-of-inertia tensor
• gravitational field tensor
• metric tensor
• curvature tensor

Tensor product of vector spaces

Let U, V, W be vector spaces. Think of U ⊗ V ⊗ W as the vector space of all formal linear combinations of terms of the form u ⊗ v ⊗ w,

    $\sum \alpha \, u \otimes v \otimes w$, where $\alpha \in \mathbb{R}$, $u \in U$, $v \in V$, $w \in W$,

subject to one condition: ⊗ is decreed to have the multilinear property

    $(\alpha u_1 + \beta u_2) \otimes v \otimes w = \alpha\, u_1 \otimes v \otimes w + \beta\, u_2 \otimes v \otimes w,$
    $u \otimes (\alpha v_1 + \beta v_2) \otimes w = \alpha\, u \otimes v_1 \otimes w + \beta\, u \otimes v_2 \otimes w,$
    $u \otimes v \otimes (\alpha w_1 + \beta w_2) = \alpha\, u \otimes v \otimes w_1 + \beta\, u \otimes v \otimes w_2.$

Tensors and multiway arrays

Up to a choice of bases on U, V, W, a tensor A ∈ U ⊗ V ⊗ W can be represented by a 3-way array $A = \llbracket a_{ijk} \rrbracket_{i,j,k=1}^{l,m,n} \in \mathbb{R}^{l \times m \times n}$, on which the following algebraic operations are defined:

1. Addition/scalar multiplication: for $\llbracket b_{ijk} \rrbracket \in \mathbb{R}^{l \times m \times n}$ and $\lambda \in \mathbb{R}$,

    $\llbracket a_{ijk} \rrbracket + \llbracket b_{ijk} \rrbracket := \llbracket a_{ijk} + b_{ijk} \rrbracket$ and $\lambda \llbracket a_{ijk} \rrbracket := \llbracket \lambda a_{ijk} \rrbracket \in \mathbb{R}^{l \times m \times n}$.

2. Multilinear matrix multiplication: for matrices $L = [\lambda_{i'i}] \in \mathbb{R}^{p \times l}$, $M = [\mu_{j'j}] \in \mathbb{R}^{q \times m}$, $N = [\nu_{k'k}] \in \mathbb{R}^{r \times n}$,

    $(L, M, N) \cdot A := \llbracket c_{i'j'k'} \rrbracket \in \mathbb{R}^{p \times q \times r}$, where $c_{i'j'k'} := \sum_{i=1}^{l} \sum_{j=1}^{m} \sum_{k=1}^{n} \lambda_{i'i}\, \mu_{j'j}\, \nu_{k'k}\, a_{ijk}$.

Change-of-basis theorem for tensors

Two representations A, A′ of A in different bases are related by (L, M, N) · A = A′, with L, M, N the respective change-of-basis matrices (nonsingular).

Henceforth, we will not distinguish between an order-k tensor and a k-way array that represents it (with respect to some implicit choice of basis).
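To make the definitions concrete, here is a minimal numpy sketch of multilinear matrix multiplication and the change-of-basis relation; the helper name multilinear_mult and the dimensions are illustrative choices, not anything from the slides.

```python
import numpy as np

def multilinear_mult(L, M, N, A):
    """Multilinear matrix multiplication (L, M, N) . A of a 3-way array A:
    c[i', j', k'] = sum_{i,j,k} L[i', i] * M[j', j] * N[k', k] * A[i, j, k]."""
    return np.einsum('pi,qj,rk,ijk->pqr', L, M, N, A)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))                          # a 3 x 4 x 5 array
L, M, N = (rng.standard_normal((d, d)) for d in (3, 4, 5))  # generically nonsingular

# Change-of-basis theorem: the representation in the new bases is A' = (L, M, N) . A.
A_new = multilinear_mult(L, M, N, A)

# Transforming back with the inverse matrices recovers the original representation,
# since (L1, M1, N1) . ((L2, M2, N2) . A) = (L1 L2, M1 M2, N1 N2) . A.
A_back = multilinear_mult(np.linalg.inv(L), np.linalg.inv(M), np.linalg.inv(N), A_new)
assert np.allclose(A, A_back)
```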
Segre outer product

If U = R^l, V = R^m, W = R^n, then R^l ⊗ R^m ⊗ R^n may be identified with R^{l×m×n} if we define ⊗ by

    $u \otimes v \otimes w = \llbracket u_i v_j w_k \rrbracket_{i,j,k=1}^{l,m,n}$.

A tensor A ∈ R^{l×m×n} is said to be decomposable if it can be written in the form

    A = u ⊗ v ⊗ w

for some u ∈ R^l, v ∈ R^m, w ∈ R^n.

The set of all decomposable tensors is known as the Segre variety in algebraic geometry. It is a closed set (in both the Euclidean and Zariski senses) as it can be described algebraically:

    $\mathrm{Seg}(\mathbb{R}^l, \mathbb{R}^m, \mathbb{R}^n) = \{ A \in \mathbb{R}^{l \times m \times n} \mid a_{i_1 i_2 i_3} a_{j_1 j_2 j_3} = a_{k_1 k_2 k_3} a_{l_1 l_2 l_3},\ \{i_\alpha, j_\alpha\} = \{k_\alpha, l_\alpha\} \}$.

Tensor ranks

Matrix rank. For A ∈ R^{m×n},

    $\operatorname{rank}(A) = \dim(\operatorname{span}_{\mathbb{R}}\{A_{\bullet 1}, \dots, A_{\bullet n}\})$  (column rank)
    $\phantom{\operatorname{rank}(A)} = \dim(\operatorname{span}_{\mathbb{R}}\{A_{1 \bullet}, \dots, A_{m \bullet}\})$  (row rank)
    $\phantom{\operatorname{rank}(A)} = \min\{ r \mid A = \sum_{i=1}^{r} u_i v_i^{\mathsf{T}} \}$  (outer product rank).

Multilinear rank. For A ∈ R^{l×m×n}, $\operatorname{rank}_{\boxplus}(A) = (r_1(A), r_2(A), r_3(A))$, where

    $r_1(A) = \dim(\operatorname{span}_{\mathbb{R}}\{A_{1 \bullet \bullet}, \dots, A_{l \bullet \bullet}\})$,
    $r_2(A) = \dim(\operatorname{span}_{\mathbb{R}}\{A_{\bullet 1 \bullet}, \dots, A_{\bullet m \bullet}\})$,
    $r_3(A) = \dim(\operatorname{span}_{\mathbb{R}}\{A_{\bullet \bullet 1}, \dots, A_{\bullet \bullet n}\})$.

Outer product rank. For A ∈ R^{l×m×n},

    $\operatorname{rank}_{\otimes}(A) = \min\{ r \mid A = \sum_{i=1}^{r} u_i \otimes v_i \otimes w_i \}$.

In general, rank_⊗(A) ≠ r_1(A) ≠ r_2(A) ≠ r_3(A).

Credit

Both notions of tensor rank (and the corresponding decompositions) are due to Frank L. Hitchcock in 1927. Multilinear rank is a special case (uniplex) of his more general multiplex rank.

F.L. Hitchcock, "The expression of a tensor or a polyadic as a sum of products," J. Math. Phys., 6 (1927), no. 1, pp. 164–189.

F.L. Hitchcock, "Multiple invariants and generalized rank of a p-way matrix or tensor," J. Math. Phys., 7 (1927), no. 1, pp. 39–79.

Often wrongly attributed. Predates CANDECOMP/PARAFAC and Tucker by 40 years.

Outer product rank

Theorem (Håstad). Computing rank_⊗(A) for A ∈ R^{l×m×n} is an NP-hard problem.

Matrix: for A ∈ R^{m×n} ⊂ C^{m×n}, rank(A) is the same whether we regard A as a real matrix or a complex matrix.

Theorem (Bergman). For A ∈ R^{l×m×n} ⊂ C^{l×m×n}, rank_⊗(A) is base-field dependent.

Example. Let x, y ∈ R^n be linearly independent and let z = x + iy. Then

    $x \otimes x \otimes x - x \otimes y \otimes y + y \otimes x \otimes y + y \otimes y \otimes x = \tfrac{1}{2}(z \otimes \bar{z} \otimes \bar{z} + \bar{z} \otimes z \otimes z)$,

so rank_⊗(A) is 3 over R and 2 over C.

Fundamental problem of multiway data analysis

Let A be a tensor, symmetric tensor, or nonnegative tensor. Solve

    $\operatorname{argmin}_{\operatorname{rank}(B) \le r} \lVert A - B \rVert$,

where rank may be outer product rank, multilinear rank, symmetric rank (for symmetric tensors), or nonnegative rank (for nonnegative tensors).

Example. Given A ∈ R^{d1×d2×d3}, find u_i, v_i, w_i, i = 1, …, r, that minimize

    $\lVert A - u_1 \otimes v_1 \otimes w_1 - u_2 \otimes v_2 \otimes w_2 - \dots - u_r \otimes v_r \otimes w_r \rVert$,

or find C ∈ R^{r1×r2×r3} and L_i ∈ R^{d_i×r_i} that minimize

    $\lVert A - (L_1, L_2, L_3) \cdot C \rVert$.

Example. Given A ∈ S^k(C^n), find u_i, i = 1, …, r, that minimize

    $\lVert A - u_1^{\otimes k} - u_2^{\otimes k} - \dots - u_r^{\otimes k} \rVert$.

Harmonic analytic approach to data analysis

More generally, F = C, R, R_+, R_max (max-plus algebra), R[x_1, …, x_n] (polynomial rings), etc.

Dictionary: D ⊂ F^N, not contained in any hyperplane. Let D_2 = union of bisecants to D, D_3 = union of trisecants to D, …, D_r = union of r-secants to D.

Define the D-rank of A ∈ F^N to be min{r | A ∈ D_r}.

If φ : F^N × F^N → R is some measure of 'nearness' between pairs of points (e.g. norms, Bregman divergences, etc.), we want to find a best low-rank approximation to A:

    $\operatorname{argmin}\{ \varphi(A, B) \mid D\text{-rank}(B) \le r \}$.

Feature revelation

Get a low-rank approximation

    A ≈ α_1 · B_1 + ⋯ + α_r · B_r ∈ D_r.

The B_i ∈ D reveal features of the dataset A. Note that another way to say 'best low-rank' is 'sparsest possible'.

Example. D = {A | rank_⊗(A) ≤ 1}, φ(A, B) = ‖A − B‖_F: we get the usual CANDECOMP/PARAFAC (a sketch of the standard iterative heuristic for this problem appears after this slide).

Example. D = {A | rank_⊞(A) ≤ (r_1, r_2, r_3)} (an algebraic set), φ(A, B) = ‖A − B‖_F: we get the De Lathauwer decomposition.
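The rank-r CANDECOMP/PARAFAC approximation problem above is commonly attacked with alternating least squares (ALS): fix two of the three factor matrices and the objective becomes an ordinary linear least-squares problem in the third. Below is a minimal numpy sketch under stated assumptions (the function name cp_als, the fixed iteration count, and the test dimensions are all illustrative); ALS is only a local heuristic, and, as the next two slides show, a global best rank-r approximation need not even exist.

```python
import numpy as np

def cp_als(A, r, n_iter=200, seed=0):
    """Alternating least squares for an approximate rank-r outer-product
    (CANDECOMP/PARAFAC) decomposition of a 3-way array A.
    Returns U (l x r), V (m x r), W (n x r) with
    A ~ sum_i U[:, i] (x) V[:, i] (x) W[:, i]."""
    l, m, n = A.shape
    rng = np.random.default_rng(seed)
    U, V, W = (rng.standard_normal((d, r)) for d in (l, m, n))
    for _ in range(n_iter):
        # Each update fixes two factors and solves a linear least-squares
        # problem for the third, using the Khatri-Rao structure of the
        # mode-wise unfoldings of the rank-r model.
        KR = np.einsum('jr,kr->jkr', V, W).reshape(m * n, r)
        U = np.linalg.lstsq(KR, A.reshape(l, m * n).T, rcond=None)[0].T
        KR = np.einsum('ir,kr->ikr', U, W).reshape(l * n, r)
        V = np.linalg.lstsq(KR, A.transpose(1, 0, 2).reshape(m, l * n).T, rcond=None)[0].T
        KR = np.einsum('ir,jr->ijr', U, V).reshape(l * m, r)
        W = np.linalg.lstsq(KR, A.transpose(2, 0, 1).reshape(n, l * m).T, rcond=None)[0].T
    return U, V, W

# Sanity check on a tensor of exact outer-product rank 2: the residual
# should be driven near zero.
rng = np.random.default_rng(1)
U0, V0, W0 = rng.standard_normal((5, 2)), rng.standard_normal((4, 2)), rng.standard_normal((3, 2))
A = np.einsum('ir,jr,kr->ijk', U0, V0, W0)
U, V, W = cp_als(A, r=2)
print(np.linalg.norm(A - np.einsum('ir,jr,kr->ijk', U, V, W)))
```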
Simple lemma

Lemma (de Silva, L.). Let r ≥ 2 and k ≥ 3. Given the norm-topology on R^{d_1×⋯×d_k}, the following statements are equivalent:

(a) The set S_r(d_1, …, d_k) := {A | rank_⊗(A) ≤ r} is not closed.

(b) There exists a sequence A_n, rank_⊗(A_n) ≤ r, n ∈ N, converging to B with rank_⊗(B) > r.

(c) There exists B, rank_⊗(B) > r, that may be approximated arbitrarily closely by tensors of strictly lower rank, i.e.

    $\inf\{ \lVert B - A \rVert \mid \operatorname{rank}_{\otimes}(A) \le r \} = 0$.

(d) There exists C, rank_⊗(C) > r, that does not have a best rank-r approximation, i.e.

    $\inf\{ \lVert C - A \rVert \mid \operatorname{rank}_{\otimes}(A) \le r \}$

is not attained (by any A with rank_⊗(A) ≤ r).

Non-existence of best low-rank approximation

D. Bini, M. Capovani, F. Romani, and G. Lotti, "O(n^{2.7799}) complexity for n × n approximate matrix multiplication," Inform. Process. Lett., 8 (1979), no. 5, pp. 234–235.

Let x, y, z, w be linearly independent. Define

    $A := x \otimes x \otimes x + x \otimes y \otimes z + y \otimes z \otimes x + y \otimes w \otimes z + z \otimes x \otimes y + z \otimes y \otimes w$

and, for ε > 0,

    $B_\varepsilon := (y + \varepsilon x) \otimes (y + \varepsilon w) \otimes \varepsilon^{-1} z + (z + \varepsilon x) \otimes \varepsilon^{-1} x \otimes (x + \varepsilon y)$
    $\qquad - \varepsilon^{-1} y \otimes y \otimes (x + z + \varepsilon w) - \varepsilon^{-1} z \otimes (x + y + \varepsilon z) \otimes x$
    $\qquad + \varepsilon^{-1} (y + z) \otimes (y + \varepsilon z) \otimes (x + \varepsilon w)$.

Then rank_⊗(B_ε) ≤ 5, rank_⊗(A) = 6, and ‖B_ε − A‖ → 0 as ε → 0. Hence A has no best approximation by tensors of rank ≤ 5.
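A quick numerical check of this example, assuming numpy and taking x, y, z, w to be the standard basis of R^4 (any linearly independent choice would do): each B_ε is a sum of 5 decomposable tensors, yet ‖B_ε − A‖_F shrinks linearly in ε while rank_⊗(A) = 6.

```python
import numpy as np

def outer3(u, v, w):
    """Segre outer product u (x) v (x) w as a 3-way array."""
    return np.einsum('i,j,k->ijk', u, v, w)

x, y, z, w = np.eye(4)   # linearly independent vectors in R^4

# The rank-6 tensor A from the Bini-Capovani-Romani-Lotti example.
A = (outer3(x, x, x) + outer3(x, y, z) + outer3(y, z, x)
     + outer3(y, w, z) + outer3(z, x, y) + outer3(z, y, w))

for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    # B_eps is a sum of 5 decomposable tensors, so rank(B_eps) <= 5.
    B = (outer3(y + eps * x, y + eps * w, z / eps)
         + outer3(z + eps * x, x / eps, x + eps * y)
         - outer3(y / eps, y, x + z + eps * w)
         - outer3(z / eps, x + y + eps * z, x)
         + outer3((y + z) / eps, y + eps * z, x + eps * w))
    print(eps, np.linalg.norm(B - A))   # the Frobenius norm decays like O(eps)
```

Expanding the five terms of B_ε shows that all the ε^{-1} terms cancel and B_ε − A = ε(x⊗w⊗z + x⊗x⊗y + y⊗z⊗w + z⊗z⊗w), which is why the printed norms decay linearly in ε.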