Multilinear Algebra in Data Analysis: Tensors, Symmetric Tensors, Nonnegative Tensors

Lek-Heng Lim
Stanford University

Workshop on Algorithms for Modern Massive Datasets
Stanford, CA, June 21–24, 2006

Thanks: G. Carlsson, L. De Lathauwer, J.M. Landsberg, M. Mahoney, L. Qi, B. Sturmfels
Collaborators: P. Comon, V. de Silva, P. Drineas, G. Golub

References

[CGLM2] P. Comon, G. Golub, L.-H. Lim, and B. Mourrain, “Symmetric tensors and symmetric tensor rank,” SCCM Tech. Rep., 06-02 (2006).

[CGLM1] P. Comon, B. Mourrain, L.-H. Lim, and G.H. Golub, “Genericity and rank deficiency of high order symmetric tensors,” Proc. IEEE Int. Conference on Acoustics, Speech, and Signal Processing (ICASSP), 31 (2006), no. 3, pp. 125–128.

[dSL] V. de Silva and L.-H. Lim, “Tensor rank and the ill-posedness of the best low-rank approximation problem,” SCCM Tech. Rep., 06-06 (2006).

[GL] G. Golub and L.-H. Lim, “Nonnegative decomposition and approximation of nonnegative matrices and tensors,” SCCM Tech. Rep., 06-01 (2006), forthcoming.

[L] L.-H. Lim, “Singular values and eigenvalues of tensors: a variational approach,” Proc. IEEE Int. Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 1 (2005), pp. 129–132.

What is not a tensor, I

• What is a vector?
  – Mathematician: An element of a vector space.
  – Physicist: “What kind of physical quantities can be represented by vectors?”
    Answer: Once a basis is chosen, an $n$-dimensional vector is something that is represented by $n$ real numbers only if those real numbers transform themselves as expected (i.e., according to the change-of-basis theorem) when one changes the basis.
• What is a tensor?
  – Mathematician: An element of a tensor product of vector spaces.
  – Physicist: “What kind of physical quantities can be represented by tensors?”
    Answer: Slide 7 (the change-of-basis theorem for tensors).
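
What is not a tensor, II

By a tensor, physicists and geometers often mean a tensor field (roughly, these are tensor-valued functions on manifolds):

• stress tensor
• moment-of-inertia tensor
• gravitational field tensor
• metric tensor
• curvature tensor

Tensor product of vector spaces

$U$, $V$, $W$ vector spaces. Think of $U \otimes V \otimes W$ as the vector space of all formal linear combinations of terms of the form $u \otimes v \otimes w$,

$$\sum \alpha \, u \otimes v \otimes w,$$

where $\alpha \in \mathbb{R}$, $u \in U$, $v \in V$, $w \in W$.

One condition: $\otimes$ is decreed to have the multilinear property

$$
\begin{aligned}
(\alpha u_1 + \beta u_2) \otimes v \otimes w &= \alpha u_1 \otimes v \otimes w + \beta u_2 \otimes v \otimes w, \\
u \otimes (\alpha v_1 + \beta v_2) \otimes w &= \alpha u \otimes v_1 \otimes w + \beta u \otimes v_2 \otimes w, \\
u \otimes v \otimes (\alpha w_1 + \beta w_2) &= \alpha u \otimes v \otimes w_1 + \beta u \otimes v \otimes w_2.
\end{aligned}
$$

Tensors and multiway arrays

Up to a choice of bases on $U$, $V$, $W$, a tensor $\mathbf{A} \in U \otimes V \otimes W$ can be represented by a 3-way array $A = [\![ a_{ijk} ]\!]_{i,j,k=1}^{l,m,n} \in \mathbb{R}^{l \times m \times n}$ on which the following algebraic operations are defined:

1. Addition/Scalar Multiplication: for $[\![ b_{ijk} ]\!] \in \mathbb{R}^{l \times m \times n}$, $\lambda \in \mathbb{R}$,

$$[\![ a_{ijk} ]\!] + [\![ b_{ijk} ]\!] := [\![ a_{ijk} + b_{ijk} ]\!] \quad \text{and} \quad \lambda [\![ a_{ijk} ]\!] := [\![ \lambda a_{ijk} ]\!] \in \mathbb{R}^{l \times m \times n}.$$

2. Multilinear Matrix Multiplication: for matrices $L = [\lambda_{i'i}] \in \mathbb{R}^{p \times l}$, $M = [\mu_{j'j}] \in \mathbb{R}^{q \times m}$, $N = [\nu_{k'k}] \in \mathbb{R}^{r \times n}$,

$$(L, M, N) \cdot A := [\![ c_{i'j'k'} ]\!] \in \mathbb{R}^{p \times q \times r}$$

where

$$c_{i'j'k'} := \sum_{i=1}^{l} \sum_{j=1}^{m} \sum_{k=1}^{n} \lambda_{i'i} \, \mu_{j'j} \, \nu_{k'k} \, a_{ijk}.$$

Change-of-basis theorem for tensors

Two representations $A$, $A'$ of the tensor $\mathbf{A}$ in different bases are related by

$$(L, M, N) \cdot A = A'$$

with $L$, $M$, $N$ the respective change-of-basis matrices (non-singular).

Henceforth, we will not distinguish between an order-$k$ tensor and a $k$-way array that represents it (with respect to some implicit choice of basis).

Segre outer product

If $U = \mathbb{R}^l$, $V = \mathbb{R}^m$, $W = \mathbb{R}^n$, then $\mathbb{R}^l \otimes \mathbb{R}^m \otimes \mathbb{R}^n$ may be identified with $\mathbb{R}^{l \times m \times n}$ if we define $\otimes$ by

$$u \otimes v \otimes w = [\![ u_i v_j w_k ]\!]_{i,j,k=1}^{l,m,n}.$$

A tensor $A \in \mathbb{R}^{l \times m \times n}$ is said to be decomposable if it can be written in the form

$$A = u \otimes v \otimes w$$

for some $u \in \mathbb{R}^l$, $v \in \mathbb{R}^m$, $w \in \mathbb{R}^n$.

The set of all decomposable tensors is known as the Segre variety in algebraic geometry.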
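
The two array operations defined so far, the Segre outer product and multilinear matrix multiplication, can be sketched in a few lines of plain Python. This is only an illustrative sketch, not code from the talk: the function names `outer`, `multilinear_multiply`, and `matvec` and all numerical values are hypothetical. As a sanity check it also verifies the standard identity $(L, M, N) \cdot (u \otimes v \otimes w) = (Lu) \otimes (Mv) \otimes (Nw)$, which follows directly from the two definitions.

```python
from itertools import product


def outer(u, v, w):
    """Segre outer product: the 3-way array with entries u[i] * v[j] * w[k]."""
    return [[[ui * vj * wk for wk in w] for vj in v] for ui in u]


def multilinear_multiply(L, M, N, A):
    """Return C = (L, M, N) . A, where
    C[i2][j2][k2] = sum over i, j, k of L[i2][i] * M[j2][j] * N[k2][k] * A[i][j][k]."""
    l, m, n = len(A), len(A[0]), len(A[0][0])
    p, q, r = len(L), len(M), len(N)
    C = [[[0] * r for _ in range(q)] for _ in range(p)]
    for i2, j2, k2 in product(range(p), range(q), range(r)):
        C[i2][j2][k2] = sum(
            L[i2][i] * M[j2][j] * N[k2][k] * A[i][j][k]
            for i, j, k in product(range(l), range(m), range(n))
        )
    return C


def matvec(M, v):
    """Ordinary matrix-vector product, used only for the sanity check."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# Hypothetical small example (all values chosen arbitrarily).
u, v, w = [1, 2], [3, 4, 5], [6, 7]
L = [[1, 0], [1, 1], [2, -1]]   # 3 x 2
M = [[1, 1, 0], [0, 1, 1]]      # 2 x 3
N = [[0, 1], [1, 0]]            # 2 x 2

A = outer(u, v, w)                     # decomposable tensor in R^{2 x 3 x 2}
C = multilinear_multiply(L, M, N, A)   # tensor in R^{3 x 2 x 2}

# Multiplying a decomposable tensor factor by factor gives the same array.
same = C == outer(matvec(L, u), matvec(M, v), matvec(N, w))
```

The six nested loops mirror the triple-sum definition literally, which keeps the sketch easy to compare with the formula; an array library would normally be used for anything beyond toy sizes.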
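The Segre variety is a closed set (in both the Euclidean and Zariski sense) as it can be described algebraically:

$$\operatorname{Seg}(\mathbb{R}^l, \mathbb{R}^m, \mathbb{R}^n) = \{ A \in \mathbb{R}^{l \times m \times n} \mid a_{i_1 i_2 i_3} a_{j_1 j_2 j_3} = a_{k_1 k_2 k_3} a_{l_1 l_2 l_3}, \ \{i_\alpha, j_\alpha\} = \{k_\alpha, l_\alpha\} \}.$$

Tensor ranks

Matrix rank. $A \in \mathbb{R}^{m \times n}$.

$$
\begin{aligned}
\operatorname{rank}(A) &= \dim(\operatorname{span}_{\mathbb{R}}\{A_{\bullet 1}, \dots, A_{\bullet n}\}) && \text{(column rank)} \\
&= \dim(\operatorname{span}_{\mathbb{R}}\{A_{1 \bullet}, \dots, A_{m \bullet}\}) && \text{(row rank)} \\
&= \min\{ r \mid A = \textstyle\sum_{i=1}^{r} u_i v_i^{\top} \} && \text{(outer product rank).}
\end{aligned}
$$

Multilinear rank. $A \in \mathbb{R}^{l \times m \times n}$. $\operatorname{rank}_{\boxplus}(A) = (r_1(A), r_2(A), r_3(A))$ where

$$
\begin{aligned}
r_1(A) &= \dim(\operatorname{span}_{\mathbb{R}}\{A_{1 \bullet\bullet}, \dots, A_{l \bullet\bullet}\}), \\
r_2(A) &= \dim(\operatorname{span}_{\mathbb{R}}\{A_{\bullet 1 \bullet}, \dots, A_{\bullet m \bullet}\}), \\
r_3(A) &= \dim(\operatorname{span}_{\mathbb{R}}\{A_{\bullet\bullet 1}, \dots, A_{\bullet\bullet n}\}).
\end{aligned}
$$

Outer product rank. $A \in \mathbb{R}^{l \times m \times n}$.

$$\operatorname{rank}_{\otimes}(A) = \min\{ r \mid A = \textstyle\sum_{i=1}^{r} u_i \otimes v_i \otimes w_i \}.$$

In general, $\operatorname{rank}_{\otimes}(A) \neq r_1(A) \neq r_2(A) \neq r_3(A)$.

Credit

Both notions of tensor rank (and the corresponding decompositions) are due to Frank L. Hitchcock in 1927. Multilinear rank is a special case (uniplex) of his more general multiplex rank.

F.L. Hitchcock, “The expression of a tensor or a polyadic as a sum of products,” J. Math. Phys., 6 (1927), no. 1, pp. 164–189.

F.L. Hitchcock, “Multiple invariants and generalized rank of a p-way matrix or tensor,” J. Math. Phys., 7 (1927), no. 1, pp. 39–79.

Often wrongly attributed. Predates CANDECOMP/PARAFAC or Tucker by 40 years.

Outer product rank

Theorem (Håstad). Computing $\operatorname{rank}_{\otimes}(A)$ for $A \in \mathbb{R}^{l \times m \times n}$ is an NP-hard problem.

Matrix: for $A \in \mathbb{R}^{m \times n} \subset \mathbb{C}^{m \times n}$, $\operatorname{rank}(A)$ is the same whether we regard it as a real matrix or a complex matrix.

Theorem (Bergman). For $A \in \mathbb{R}^{l \times m \times n} \subset \mathbb{C}^{l \times m \times n}$, $\operatorname{rank}_{\otimes}(A)$ is base field dependent.

Example. Let $x, y \in \mathbb{R}^n$ be linearly independent and let $z = x + iy$. For

$$A := x \otimes x \otimes x - x \otimes y \otimes y + y \otimes x \otimes y + y \otimes y \otimes x = \frac{1}{2} ( z \otimes \bar{z} \otimes \bar{z} + \bar{z} \otimes z \otimes z ),$$

$\operatorname{rank}_{\otimes}(A)$ is 3 over $\mathbb{R}$ and is 2 over $\mathbb{C}$.

Fundamental problem of multiway data analysis

Let $A$ be a tensor, symmetric tensor, or nonnegative tensor. Solve

$$\operatorname{argmin}_{\operatorname{rank}(B) \le r} \| A - B \|$$

where rank may be outer product rank, multilinear rank, symmetric rank (for symmetric tensors), or nonnegative rank (for nonnegative tensors).

Example. Given $A \in \mathbb{R}^{d_1 \times d_2 \times d_3}$, find $u_i, v_i, w_i$, $i = 1, \dots, r$, that minimize

$$\| A - u_1 \otimes v_1 \otimes w_1 - u_2 \otimes v_2 \otimes w_2 - \cdots - u_r \otimes v_r \otimes w_r \|,$$

or $C \in \mathbb{R}^{r_1 \times r_2 \times r_3}$ and $L_i \in \mathbb{R}^{d_i \times r_i}$ that minimize

$$\| A - (L_1, L_2, L_3) \cdot C \|.$$

Example. Given $A \in \mathsf{S}^k(\mathbb{C}^n)$, find $u_i$, $i = 1, \dots, r$, that minimize

$$\| A - u_1^{\otimes k} - u_2^{\otimes k} - \cdots - u_r^{\otimes k} \|.$$

Harmonic analytic approach to data analysis

More generally, $\mathbb{F} = \mathbb{C}, \mathbb{R}, \mathbb{R}_+, \mathbb{R}_{\max}$ (max-plus algebra), $\mathbb{R}[x_1, \dots, x_n]$ (polynomial rings), etc.

Dictionary, $\mathcal{D} \subset \mathbb{F}^N$, not contained in any hyperplane.

Let $\mathcal{D}_2$ = union of bisecants to $\mathcal{D}$, $\mathcal{D}_3$ = union of trisecants to $\mathcal{D}$, ..., $\mathcal{D}_r$ = union of $r$-secants to $\mathcal{D}$.

Define the $\mathcal{D}$-rank of $A \in \mathbb{F}^N$ to be $\min\{ r \mid A \in \mathcal{D}_r \}$.

If $\varphi : \mathbb{F}^N \times \mathbb{F}^N \to \mathbb{R}$ is some measure of ‘nearness’ between pairs of points (e.g., norms, Bregman divergences, etc.), we want to find a best low-rank approximation to $A$:

$$\operatorname{argmin}\{ \varphi(A, B) \mid \mathcal{D}\text{-rank}(B) \le r \}.$$

Feature revelation

Get low-rank approximation

$$A \approx \alpha_1 \cdot B_1 + \cdots + \alpha_r \cdot B_r \in \mathcal{D}_r.$$

The $B_i \in \mathcal{D}$ reveal features of the dataset $A$.

Note that another way to say ‘best low-rank’ is ‘sparsest possible’.

Example. $\mathcal{D} = \{ A \mid \operatorname{rank}_{\otimes}(A) \le 1 \}$, $\varphi(A, B) = \| A - B \|_F$: get the usual CANDECOMP/PARAFAC.

Example. $\mathcal{D} = \{ A \mid \operatorname{rank}_{\boxplus}(A) \le (r_1, r_2, r_3) \}$ (an algebraic set), $\varphi(A, B) = \| A - B \|_F$: get the De Lathauwer decomposition.

Simple lemma

Lemma (de Silva, L.). Let $r \ge 2$ and $k \ge 3$. Given the norm-topology on $\mathbb{R}^{d_1 \times \cdots \times d_k}$, the following statements are equivalent:

(a) The set $S_r(d_1, \dots, d_k) := \{ A \mid \operatorname{rank}_{\otimes}(A) \le r \}$ is not closed.

(b) There exists a sequence $A_n$, $\operatorname{rank}_{\otimes}(A_n) \le r$, $n \in \mathbb{N}$, converging to $B$ with $\operatorname{rank}_{\otimes}(B) > r$.

(c) There exists $B$, $\operatorname{rank}_{\otimes}(B) > r$, that may be approximated arbitrarily closely by tensors of strictly lower rank, i.e., $\inf\{ \| B - A \| \mid \operatorname{rank}_{\otimes}(A) \le r \} = 0$.

(d) There exists $C$, $\operatorname{rank}_{\otimes}(C) > r$, that does not have a best rank-$r$ approximation, i.e., $\inf\{ \| C - A \| \mid \operatorname{rank}_{\otimes}(A) \le r \}$ is not attained (by any $A$ with $\operatorname{rank}_{\otimes}(A) \le r$).

Non-existence of best low-rank approximation

D. Bini, M. Capovani, F. Romani, and G. Lotti, “$O(n^{2.7799})$ complexity for $n \times n$ approximate matrix multiplication,” Inform. Process. Lett., 8 (1979), no. 5, pp. 234–235.

Let $x, y, z, w$ be linearly independent. Define

$$A := x \otimes x \otimes x + x \otimes y \otimes z + y \otimes z \otimes x + y \otimes w \otimes z + z \otimes x \otimes y + z \otimes y \otimes w$$

and, for $\varepsilon > 0$,

$$
\begin{aligned}
B_\varepsilon := {} & (y + \varepsilon x) \otimes (y + \varepsilon w) \otimes \varepsilon^{-1} z + (z + \varepsilon x) \otimes \varepsilon^{-1} x \otimes (x + \varepsilon y) \\
& - \varepsilon^{-1} y \otimes y \otimes (x + z + \varepsilon w) - \varepsilon^{-1} z \otimes (x + y + \varepsilon z) \otimes x \\
& + \varepsilon^{-1} (y + z) \otimes (y + \varepsilon z) \otimes (x + \varepsilon w).
\end{aligned}
$$

Then $\operatorname{rank}_{\otimes}(B_\varepsilon) \le 5$, $\operatorname{rank}_{\otimes}(A) = 6$, and $\| B_\varepsilon - A \| \to 0$ as $\varepsilon \to 0$.

$A$ has no optimal approximation by tensors of rank $\le 5$.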
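
The convergence claim $\| B_\varepsilon - A \| \to 0$ can be checked numerically. The following is a minimal sketch under stated assumptions, not code from the talk: it takes $x, y, z, w$ to be the standard basis of $\mathbb{R}^4$ (any linearly independent choice would do), uses exact rational arithmetic so that the $\varepsilon^{-1}$ terms cancel without rounding error, and all helper names (`outer`, `add`, `scale`, `vec_add`, `vec_scale`, `squared_norm`, `B`, `squared_error`) are hypothetical. It only illustrates the limit; it does not verify the rank statements.

```python
from fractions import Fraction


def outer(u, v, w):
    """Segre outer product: 3-way array with entries u[i] * v[j] * w[k]."""
    return [[[ui * vj * wk for wk in w] for vj in v] for ui in u]


def add(*tensors):
    """Entrywise sum of equally sized 3-way arrays."""
    d1, d2, d3 = len(tensors[0]), len(tensors[0][0]), len(tensors[0][0][0])
    return [[[sum(T[i][j][k] for T in tensors) for k in range(d3)]
             for j in range(d2)] for i in range(d1)]


def scale(c, T):
    """Multiply every entry of a 3-way array by the scalar c."""
    return [[[c * t for t in row] for row in plane] for plane in T]


def vec_add(*vs):
    return [sum(entries) for entries in zip(*vs)]


def vec_scale(c, v):
    return [c * vi for vi in v]


def squared_norm(T):
    """Squared Frobenius norm (sum of squared entries)."""
    return sum(t * t for plane in T for row in plane for t in row)


# Assumption: x, y, z, w are the standard basis vectors of R^4.
x, y, z, w = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]

A = add(outer(x, x, x), outer(x, y, z), outer(y, z, x),
        outer(y, w, z), outer(z, x, y), outer(z, y, w))


def B(eps):
    """The five-term tensor B_eps from the slide, for a rational eps > 0."""
    inv = 1 / eps
    s, p = vec_scale, vec_add
    return add(
        outer(p(y, s(eps, x)), p(y, s(eps, w)), s(inv, z)),
        outer(p(z, s(eps, x)), s(inv, x), p(x, s(eps, y))),
        outer(s(-inv, y), y, p(x, z, s(eps, w))),
        outer(s(-inv, z), p(x, y, s(eps, z)), x),
        outer(s(inv, p(y, z)), p(y, s(eps, z)), p(x, s(eps, w))),
    )


def squared_error(eps):
    """Exact value of ||B_eps - A||_F^2."""
    return squared_norm(add(B(eps), scale(-1, A)))


# Squared errors for eps = 1/10, 1/100, 1/1000 shrink toward zero.
errors = [squared_error(Fraction(1, 10 ** k)) for k in (1, 2, 3)]
```

In this particular basis the $\varepsilon^{-1}$ terms cancel, the $\varepsilon^{0}$ terms reproduce $A$, and the remainder is $\varepsilon$ times four distinct basis tensors, so the squared error is exactly $4\varepsilon^2$.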
