
Numerical Multilinear Algebra: a new beginning?

Lek-Heng Lim, Stanford University

Matrix Computations and Scientific Computing Seminar, Berkeley, CA, October 18, 2006

Thanks: G. Carlsson, S. Lacoste-Julien, J.M. Landsberg, B. Sturmfels; Collaborators: P. Comon, V. de Silva, G. Golub, M. Mørup

Synopsis

• Tensors and ranks

• Rank-revealing decompositions

• Best low rank approximations

• Symmetric tensors and nonnegative tensors

• Eigenvalues/vectors of tensors

• Singular values/vectors of tensors

• Systems of multilinear equations

• Multilinear least squares

Tensor products of vector spaces

U, V, W vector spaces. Think of U ⊗ V ⊗ W as the set of all formal linear combinations of terms of the form u ⊗ v ⊗ w,

∑ α u ⊗ v ⊗ w, where α ∈ R, u ∈ U, v ∈ V, w ∈ W.

One condition: ⊗ decreed to have the multilinear property

(αu1 + βu2) ⊗ v ⊗ w = α u1 ⊗ v ⊗ w + β u2 ⊗ v ⊗ w,
u ⊗ (αv1 + βv2) ⊗ w = α u ⊗ v1 ⊗ w + β u ⊗ v2 ⊗ w,
u ⊗ v ⊗ (αw1 + βw2) = α u ⊗ v ⊗ w1 + β u ⊗ v ⊗ w2.

Up to a choice of bases on U, V, W, a tensor A ∈ U ⊗ V ⊗ W can be represented by a 3-way array A = ⟦a_{ijk}⟧_{i,j,k=1}^{l,m,n} ∈ R^{l×m×n}.

Tensors and multiway arrays

A set of multiply indexed real numbers A = ⟦a_{ijk}⟧_{i,j,k=1}^{l,m,n} ∈ R^{l×m×n} on which the following algebraic operations are defined:

1. Addition/Scalar Multiplication: for ⟦b_{ijk}⟧ ∈ R^{l×m×n} and λ ∈ R,
⟦a_{ijk}⟧ + ⟦b_{ijk}⟧ := ⟦a_{ijk} + b_{ijk}⟧ and λ⟦a_{ijk}⟧ := ⟦λa_{ijk}⟧ ∈ R^{l×m×n}.

2. Multilinear Multiplication: for matrices L = [λ_{i′i}] ∈ R^{p×l}, M = [μ_{j′j}] ∈ R^{q×m}, N = [ν_{k′k}] ∈ R^{r×n},
(L, M, N) · A := ⟦c_{i′j′k′}⟧ ∈ R^{p×q×r}
where
c_{i′j′k′} := ∑_{i=1}^{l} ∑_{j=1}^{m} ∑_{k=1}^{n} λ_{i′i} μ_{j′j} ν_{k′k} a_{ijk}.

May think of A as a 3-dimensional array of numbers, and of (L, M, N) · A as multiplication on ‘3 sides’ by the matrices L, M, N.
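A minimal numerical sketch of (L, M, N) · A, assuming numpy (the helper name multilinear_multiply is illustrative, not from any library):

```python
import numpy as np

def multilinear_multiply(A, L, M, N):
    # c_{i'j'k'} = sum_{i,j,k} L[i',i] M[j',j] N[k',k] A[i,j,k]
    return np.einsum('pi,qj,rk,ijk->pqr', L, M, N, A)

# Multiply a 3x4x5 tensor on its '3 sides' by 2x3, 2x4, 2x5 matrices.
A = np.random.rand(3, 4, 5)
L, M, N = np.random.rand(2, 3), np.random.rand(2, 4), np.random.rand(2, 5)
print(multilinear_multiply(A, L, M, N).shape)  # (2, 2, 2)
```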

Change-of-basis theorem for tensors

Two representations A, A′ of A in different bases are related by

(L, M, N) · A = A′ with L, M, N the respective (non-singular) change-of-basis matrices.

Henceforth, we will not distinguish between an order-k tensor and a k-way array that represents it (with respect to some implicit choice of basis).

Segre variety

If U = R^l, V = R^m, W = R^n, then R^l ⊗ R^m ⊗ R^n may be identified with R^{l×m×n} if we define ⊗ by
u ⊗ v ⊗ w = ⟦u_i v_j w_k⟧_{i,j,k=1}^{l,m,n}.

A tensor A ∈ R^{l×m×n} is said to be decomposable if it can be written in the form A = u ⊗ v ⊗ w for some u ∈ R^l, v ∈ R^m, w ∈ R^n.

The set of all decomposable tensors is known as the Segre variety in algebraic geometry. It is a closed set (in both the Euclidean and Zariski senses) as it can be described algebraically:

Seg(R^l, R^m, R^n) = {A ∈ R^{l×m×n} | a_{i1i2i3} a_{j1j2j3} = a_{k1k2k3} a_{l1l2l3}, {iα, jα} = {kα, lα}}

Tensor ranks

Matrix rank. A ∈ R^{m×n}.
rank(A) = dim(span_R{A_{•1}, …, A_{•n}})   (column rank)
        = dim(span_R{A_{1•}, …, A_{m•}})   (row rank)
        = min{r | A = ∑_{i=1}^r u_i v_i^T}   (outer product rank).

Multilinear rank. A ∈ R^{l×m×n}. rank⊞(A) = (r1(A), r2(A), r3(A)) where
r1(A) = dim(span_R{A_{1••}, …, A_{l••}})
r2(A) = dim(span_R{A_{•1•}, …, A_{•m•}})
r3(A) = dim(span_R{A_{••1}, …, A_{••n}})

Outer product rank. A ∈ R^{l×m×n}.
rank⊗(A) = min{r | A = ∑_{i=1}^r u_i ⊗ v_i ⊗ w_i}

In general, rank⊗(A) ≠ r1(A) ≠ r2(A) ≠ r3(A).
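A small sketch, assuming numpy, of the multilinear rank as the triple of ranks of the three matricizations (flattenings) of A; the helper name is illustrative:

```python
import numpy as np

def multilinear_rank(A):
    # r_i(A) = rank of the mode-i unfolding of A
    return tuple(
        np.linalg.matrix_rank(np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1))
        for mode in range(3)
    )

# A decomposable (rank-1) tensor has multilinear rank (1, 1, 1).
u, v, w = np.random.rand(3), np.random.rand(4), np.random.rand(5)
print(multilinear_rank(np.einsum('i,j,k->ijk', u, v, w)))  # (1, 1, 1)
```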

Multilinear decomposition

Let A ∈ R^{l×m×n} and rank⊞(A) = (r1, r2, r3). The multilinear decomposition of A is A = (X, Y, Z) · C. In other words,

a_{ijk} = ∑_{α=1}^{r1} ∑_{β=1}^{r2} ∑_{γ=1}^{r3} x_{iα} y_{jβ} z_{kγ} c_{αβγ}

for some full-rank matrices X = [x_{iα}] ∈ R^{l×r1}, Y = [y_{jβ}] ∈ R^{m×r2}, Z = [z_{kγ}] ∈ R^{n×r3}, and core tensor C = ⟦c_{αβγ}⟧ ∈ R^{r1×r2×r3}.

X, Y, Z may be chosen to have orthonormal columns.

For matrices, this is just the L1DL2^T or Q1RQ2^T decomposition.
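A rough sketch of one way to compute such a decomposition (a higher-order-SVD-style construction, assuming numpy): take left singular vectors of each unfolding as X, Y, Z, and form the core as C = (X^T, Y^T, Z^T) · A.

```python
import numpy as np

def multilinear_decomposition(A, tol=1e-10):
    factors = []
    for mode in range(3):
        M = np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)
        U, s, _ = np.linalg.svd(M, full_matrices=False)
        factors.append(U[:, : int(np.sum(s > tol))])   # orthonormal columns
    X, Y, Z = factors
    C = np.einsum('ia,jb,kc,ijk->abc', X, Y, Z, A)     # core tensor
    return X, Y, Z, C

A = np.random.rand(3, 4, 5)
X, Y, Z, C = multilinear_decomposition(A)
print(np.allclose(np.einsum('ia,jb,kc,abc->ijk', X, Y, Z, C), A))  # True
```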

Outer product decomposition

Let A ∈ R^{l×m×n} and rank⊗(A) = r. The outer product decomposition of A is
A = ∑_{α=1}^r dα xα ⊗ yα ⊗ zα = (X, Y, Z) · D.
In other words,
a_{ijk} = ∑_{α=1}^r dα x_{iα} y_{jα} z_{kα}
for dα ∈ R and unit vectors xα = (x_{1α}, …, x_{lα})^T ∈ R^l, yα = (y_{1α}, …, y_{mα})^T ∈ R^m, zα = (z_{1α}, …, z_{nα})^T ∈ R^n, α = 1, …, r.

Alternatively, write X = [x1, …, xr] ∈ R^{l×r}, Y = [y1, …, yr] ∈ R^{m×r}, Z = [z1, …, zr] ∈ R^{n×r}, D = diag(d1, …, dr) ∈ R^{r×r×r}.
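Assembling A = (X, Y, Z) · D from the factors is one contraction (a sketch, assuming numpy; the function name is illustrative):

```python
import numpy as np

def outer_product_form(X, Y, Z, d):
    # A = sum_a d[a] * X[:,a] ⊗ Y[:,a] ⊗ Z[:,a]
    return np.einsum('a,ia,ja,ka->ijk', d, X, Y, Z)

X, Y, Z = np.random.rand(3, 2), np.random.rand(4, 2), np.random.rand(5, 2)
A = outer_product_form(X, Y, Z, np.array([1.0, 2.0]))
print(A.shape)  # (3, 4, 5); by construction rank⊗(A) ≤ 2
```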

Credit

These ‘rank-revealing’ decompositions are commonly referred to as the CANDECOMP/PARAFAC and Tucker models, used originally in psychometrics in the 1970s.

Both notions of tensor rank (and the corresponding decompositions) are really due to Frank L. Hitchcock. Multilinear rank is a special case (uniplex) of his more general multiplex rank.

F.L. Hitchcock, “The expression of a tensor or a polyadic as a sum of products,” J. Math. Phys., 6 (1927), no. 1, pp. 164–189.

F.L. Hitchcock, “Multiple invariants and generalized rank of a p-way matrix or tensor,” J. Math. Phys., 7 (1927), no. 1, pp. 39–79.

Tensor rank is difficult

Mystical Power of Twoness (Eugene L. Lawler). 2-SAT is easy, 3-SAT is hard; 2-dimensional matching is easy, 3-dimensional matching is hard; etc.

Matrix rank is easy, tensor rank is hard:

Theorem (Håstad). Computing rank⊗(A) for A ∈ R^{l×m×n} is an NP-hard problem.

Tensor rank depends on base field:

Theorem (Bergman). For A ∈ R^{l×m×n} ⊂ C^{l×m×n}, rank⊗(A) is base field dependent.

Example. Let x, y ∈ R^n be linearly independent and let z = x + iy. Then
A := x ⊗ x ⊗ x − x ⊗ y ⊗ y + y ⊗ x ⊗ y + y ⊗ y ⊗ x = ½(z ⊗ z̄ ⊗ z̄ + z̄ ⊗ z ⊗ z),
so rank⊗(A) is 3 over R and 2 over C.

Tensor rank is useful

P. Bürgisser, M. Clausen, and M.A. Shokrollahi, Algebraic complexity theory, Springer-Verlag, Berlin, 1996.

For A = [a_{ij}], B = [b_{jk}] ∈ R^{n×n},
AB = ∑_{i,j,k=1}^n a_{ik} b_{kj} E_{ij} = ∑_{i,j,k=1}^n φ_{ik}(A) φ_{kj}(B) E_{ij}
where E_{ij} = e_i e_j^T ∈ R^{n×n}. Let
T = ∑_{i,j,k=1}^n φ_{ik} ⊗ φ_{kj} ⊗ E_{ij}.

An O(n^{2+ε}) algorithm for multiplying two n×n matrices gives an O(n^{2+ε}) algorithm for solving a system of n linear equations [Strassen 1969].

Conjecture. rank⊗(T) = O(n^{2+ε}).

Best known results. O(n^{2.376}) [Coppersmith–Winograd, Cohn–Kleinberg–Szegedy–Umans]. If n = 2, both the rank and the border rank of T equal 7 [Strassen; Winograd; Hopcroft–Kerr; Landsberg].
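For concreteness, a sketch (assuming numpy and row-major vectorization) of the tensor T for n = 2; contracting T with vec(A) and vec(B) recovers vec(AB):

```python
import numpy as np

n = 2
T = np.zeros((n * n, n * n, n * n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            # phi_ik ⊗ phi_kj ⊗ E_ij, indices flattened row-major
            T[i * n + k, k * n + j, i * n + j] += 1.0

A, B = np.random.rand(n, n), np.random.rand(n, n)
C = np.einsum('abc,a,b->c', T, A.ravel(), B.ravel()).reshape(n, n)
print(np.allclose(C, A @ B))  # True: T encodes 2x2 matrix multiplication
```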

Fundamental problem of multiway data analysis

Let A be a tensor, symmetric tensor, or nonnegative tensor. Solve

argmin_{rank(B)≤r} ‖A − B‖, where rank may be outer product rank, multilinear rank, symmetric rank (for symmetric tensors), or nonnegative rank (for nonnegative tensors).

Example. Given A ∈ R^{d1×d2×d3}, find u_i, v_i, w_i, i = 1, …, r, that minimize

‖A − u1 ⊗ v1 ⊗ w1 − u2 ⊗ v2 ⊗ w2 − ⋯ − ur ⊗ vr ⊗ wr‖,

or C ∈ R^{r1×r2×r3} and L_i ∈ R^{d_i×r_i} that minimize

‖A − (L1, L2, L3) · C‖.

Example. Given A ∈ S^k(C^n), find u_i, i = 1, …, r, that minimize
‖A − u1^{⊗k} − u2^{⊗k} − ⋯ − ur^{⊗k}‖.
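One common heuristic for the outer-product approximation problem above is alternating least squares; a rough sketch assuming numpy (no convergence guarantees, and, as later slides show, a minimizer need not even exist):

```python
import numpy as np

def unfold(A, mode):
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def khatri_rao(X, Y):
    # column-wise Kronecker product
    return np.einsum('ir,jr->ijr', X, Y).reshape(-1, X.shape[1])

def als_rank_r(A, r, iters=100):
    U, V, W = (np.random.rand(d, r) for d in A.shape)
    for _ in range(iters):
        # solve for each factor in turn, fixing the other two
        U = np.linalg.lstsq(khatri_rao(V, W), unfold(A, 0).T, rcond=None)[0].T
        V = np.linalg.lstsq(khatri_rao(U, W), unfold(A, 1).T, rcond=None)[0].T
        W = np.linalg.lstsq(khatri_rao(U, V), unfold(A, 2).T, rcond=None)[0].T
    return U, V, W

A = np.random.rand(3, 4, 5)
U, V, W = als_rank_r(A, r=2)
print(np.linalg.norm(A - np.einsum('ir,jr,kr->ijk', U, V, W)))
```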

Harmonic analytic approach to data analysis

More generally, F = C, R, R+, R_max (max-plus algebra), R[x1, …, xn] (polynomial rings), etc.

Dictionary: D ⊂ F^N, not contained in any hyperplane. Let D2 = union of bisecants to D, D3 = union of trisecants to D, …, Dr = union of r-secants to D.

Define the D-rank of A ∈ F^N to be min{r | A ∈ Dr}.

If φ : F^N × F^N → R is some measure of ‘nearness’ between pairs of points (e.g. norms, Bregman divergences, etc.), we want to find a best low-rank approximation to A:

argmin{φ(A, B) | D-rank(B) ≤ r}.

Feature revelation

Get low-rank approximation

A ≈ α1 · B1 + ··· + αr · Br ∈ Dr.

Bi ∈ D reveal features of the dataset A.

Note that another way to say ‘best low-rank’ is ‘sparsest possible’.

Example. D = {A | rank⊗(A) ≤ 1}, φ(A, B) = ‖A − B‖F — get the usual CANDECOMP/PARAFAC.

Example. D = {A | rank⊞(A) ≤ (r1, r2, r3)} (an algebraic set), φ(A, B) = ‖A − B‖F — get the De Lathauwer decomposition.

Simple lemma

Lemma (de Silva, L.). Let r ≥ 2 and k ≥ 3. Given the norm-topology on R^{d1×···×dk}, the following statements are equivalent:

(a) The set S_r(d1, …, dk) := {A | rank⊗(A) ≤ r} is not closed.

(b) There exists a sequence A_n, rank⊗(A_n) ≤ r, n ∈ N, converging to B with rank⊗(B) > r.

(c) There exists B, rank⊗(B) > r, that may be approximated arbitrarily closely by tensors of strictly lower rank, i.e.

inf{‖B − A‖ | rank⊗(A) ≤ r} = 0.

(d) There exists C, rank⊗(C) > r, that does not have a best rank-r approximation, i.e.

inf{‖C − A‖ | rank⊗(A) ≤ r}

is not attained (by any A with rank⊗(A) ≤ r).

Non-existence of best low-rank approximation

D. Bini, M. Capovani, F. Romani, and G. Lotti, “O(n^{2.7799}) complexity for n × n approximate matrix multiplication,” Inform. Process. Lett., 8 (1979), no. 5, pp. 234–235.

Let x, y, z, w be linearly independent. Define

A := x ⊗ x ⊗ x + x ⊗ y ⊗ z + y ⊗ z ⊗ x + y ⊗ w ⊗ z + z ⊗ x ⊗ y + z ⊗ y ⊗ w
and, for ε > 0,

Bε := (y + εx) ⊗ (y + εw) ⊗ ε^{−1}z + (z + εx) ⊗ ε^{−1}x ⊗ (x + εy) − ε^{−1}y ⊗ y ⊗ (x + z + εw) − ε^{−1}z ⊗ (x + y + εz) ⊗ x + ε^{−1}(y + z) ⊗ (y + εz) ⊗ (x + εw).

Then rank⊗(Bε) ≤ 5, rank⊗(A) = 6, and ‖Bε − A‖ → 0 as ε → 0.

A has no optimal approximation by tensors of rank ≤ 5.

Simpler example

Let x_i, y_i ∈ R^{d_i}, i = 1, 2, 3. Let

A := x1 ⊗ x2 ⊗ y3 + x1 ⊗ y2 ⊗ x3 + y1 ⊗ x2 ⊗ x3
and, for n ∈ N,
A_n := x1 ⊗ x2 ⊗ (y3 − n x3) + (x1 + (1/n) y1) ⊗ (x2 + (1/n) y2) ⊗ n x3.

Lemma (de Silva, L). rank⊗(A) = 3 iff x_i, y_i are linearly independent, i = 1, 2, 3. Furthermore, it is clear that rank⊗(A_n) ≤ 2 and lim_{n→∞} A_n = A.

[Inspired by an exercise in D. Knuth, The art of computer programming, 2, 3rd Ed., Addison-Wesley, Reading, MA, 1997.]
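A quick numerical check of this example (a sketch, assuming numpy, with concrete unit vectors chosen for illustration):

```python
import numpy as np

t = lambda u, v, w: np.einsum('i,j,k->ijk', u, v, w)
x1 = x2 = x3 = np.array([1.0, 0.0])
y1 = y2 = y3 = np.array([0.0, 1.0])

A = t(x1, x2, y3) + t(x1, y2, x3) + t(y1, x2, x3)   # rank 3
for n in [1, 10, 100, 1000]:
    An = t(x1, x2, y3 - n * x3) + t(x1 + y1 / n, x2 + y2 / n, n * x3)
    print(n, np.linalg.norm(An - A))   # rank(An) ≤ 2, distance → 0
```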

Furthermore

Such phenomena can and will happen for all orders > 2, all norms, and many ranks:

Theorem 1 (de Silva, L). Let k ≥ 3 and d1, …, dk ≥ 2. For any s such that 2 ≤ s ≤ min{d1, …, dk} − 1, there exists A ∈ R^{d1×···×dk} with rank⊗(A) = s such that A has no best rank-r approximation for some r < s. The result is independent of the choice of norms.

For matrices, the quantity min{d1, d2} will be the maximal possible rank in R^{d1×d2}. In general, a tensor in R^{d1×···×dk} can have rank exceeding min{d1, …, dk}.

Furthermore

Tensor rank can jump over an arbitrarily large gap:

Theorem 2 (de Silva, L). Let k ≥ 3. Given any s ∈ N, there exist r ∈ N and a sequence of order-k tensors A_n such that rank⊗(A_n) ≤ r and lim_{n→∞} A_n = A with rank⊗(A) = r + s.

Furthermore

Tensors that fail to have best low-rank approximations are not rare — they occur with non-zero probability:

Theorem 3 (de Silva, L). Let μ be a measure that is positive or infinite on Euclidean open sets in R^{d1×···×dk}. There exists some r ∈ N such that
μ({A | A does not have a best rank-r approximation}) > 0.

Extends to Kronecker products

Problem (Tyrtyshnikov, Hackbusch, et al.). Approximate A ∈ R^{N×N} by a sum of Kronecker products of three or more matrices,
A ≈ ∑_{i=1}^r αi Xi ⊗ Yi ⊗ Zi,
with Xi ∈ R^{l×l}, Yi ∈ R^{m×m}, Zi ∈ R^{n×n} often structured, and lmn = N.

Message

That the best rank-r approximation problem for tensors has no solution poses serious difficulties.

It is incorrect to think that if we just want an ‘approximate solution’, then this doesn’t matter.

If there is no solution in the first place, then what is it that we are trying to approximate? I.e. what is the ‘approximate solution’ an approximation of?

Weak solutions

For a tensor A that has no best rank-r approximation, we will call a C in the closure of {B | rank⊗(B) ≤ r} attaining

inf{‖A − B‖ | rank⊗(B) ≤ r}

a weak solution. In particular, we must have rank⊗(C) > r.

It is perhaps surprising that one may completely parameterize all limit points of order-3 rank-2 tensors:

Theorem 4 (de Silva, L.). Let d1, d2, d3 ≥ 2. Let A_n ∈ R^{d1×d2×d3} be a sequence of tensors with rank⊗(A_n) ≤ 2 and

lim_{n→∞} A_n = A, where the limit is taken in any norm topology. If the limiting tensor A has rank higher than 2, then rank⊗(A) must be exactly 3,

and there exist pairs of linearly independent vectors x1, y1 ∈ R^{d1}, x2, y2 ∈ R^{d2}, x3, y3 ∈ R^{d3} such that

A = x1 ⊗ x2 ⊗ y3 + x1 ⊗ y2 ⊗ x3 + y1 ⊗ x2 ⊗ x3.

In particular, a sequence of order-3 rank-2 tensors cannot ‘jump rank’ by more than 1.

Symmetric tensors

Write T^k(R^n) = R^n ⊗ ⋯ ⊗ R^n = R^{n×···×n}, the set of all order-k dimension-n cubical tensors.

An order-k cubical tensor ⟦a_{i1···ik}⟧ ∈ T^k(R^n) is called symmetric if
a_{iσ(1)···iσ(k)} = a_{i1···ik},   i1, …, ik ∈ {1, …, n},
for all permutations σ ∈ S_k.

These are the order-k generalization of symmetric matrices. They are often mistakenly called ‘supersymmetric tensors’.

Write S^k(C^n) for the set of all order-k symmetric tensors. Write
y^{⊗k} := y ⊗ ⋯ ⊗ y   (k copies).

Examples: higher-order derivatives of smooth functions, moments and cumulants of random vectors.

Cumulants

X1, …, Xn random variables. Moments and cumulants of X = (X1, …, Xn) are

m_k(X) = ⟦E(x_{i1} x_{i2} ⋯ x_{ik})⟧_{i1,…,ik=1}^n = ⟦∫⋯∫ x_{i1} x_{i2} ⋯ x_{ik} dμ(x_{i1}) ⋯ dμ(x_{ik})⟧_{i1,…,ik=1}^n

κ_k(X) = ⟦∑_{A1⊔···⊔Ap={i1,…,ik}} (−1)^{p−1} (p−1)! E(∏_{i∈A1} x_i) ⋯ E(∏_{i∈Ap} x_i)⟧_{i1,…,ik=1}^n

For n = 1, κ_k(X) for k = 1, 2, 3, 4 are the expectation, variance, skewness, and kurtosis of the random variable X respectively.

Symmetric tensors, in the form of cumulants, are of particular importance in Independent Component Analysis.

P. Comon, “Independent component analysis, a new concept?” Signal Process., 36 (1994), no. 3, pp. 287–314.

ICA in a nutshell

y observation vector, x source vector, n noise vector, A mixing matrix:
y = Ax + n.
A, x, n unknown. Assumptions: columns of A linearly independent, components of x statistically independent, n Gaussian. Want to ‘reconstruct’ x from y.

Take cumulants:

κ_k(y) = κ_k(x) · (A, …, A) + κ_k(n). Gaussianity implies κ_k(n) = 0 (for k ≥ 3). Statistical independence implies κ_k(x) is diagonal.

Problem reduces to diagonalizing the symmetric tensor κ_k(y).

L. De Lathauwer, B. De Moor, and J. Vandewalle, “An introduction to independent component analysis,” J. Chemometrics, 14 (2000), no. 3, pp. 123–149.

Symmetric ⊗ decomposition and symmetric rank

Let A ∈ S^k(C^n). Define the symmetric rank of A as
rankS(A) = min{r | A = ∑_{i=1}^r αi y_i^{⊗k}}.

The definition is never vacuous because of the following:

Lemma. Let A ∈ S^k(C^n). Then there exist y1, …, ys ∈ C^n such that
A = ∑_{i=1}^s αi y_i^{⊗k}.

Question: given A ∈ S^k(C^n), is rankS(A) = rank⊗(A)? Partial answer: yes in many instances (cf. [CGLM2]).

Non-existence of best low-symmetric-rank approximation

Example (Comon, Golub, L, Mourrain). Let x, y ∈ C^n be linearly independent. Define, for n ∈ N,
A_n := n(x + (1/n) y)^{⊗k} − n x^{⊗k}
and

A := y ⊗ x ⊗ ⋯ ⊗ x + x ⊗ y ⊗ ⋯ ⊗ x + ⋯ + x ⊗ x ⊗ ⋯ ⊗ y.

Then rankS(A_n) ≤ 2, rankS(A) = k, and lim_{n→∞} A_n = A; i.e. symmetric rank can jump over an arbitrarily large gap too.

Nonnegative tensors and nonnegative rank

Let 0 ≤ A ∈ R^{d1×···×dk}. The nonnegative rank of A is
rank+(A) := min{r | A = ∑_{i=1}^r u_i ⊗ v_i ⊗ ⋯ ⊗ z_i, u_i, …, z_i ≥ 0}.
Clearly, such a decomposition exists for any A ≥ 0.

Theorem (Golub, L). Let A = ⟦a_{j1···jk}⟧ ∈ R^{d1×···×dk} be nonnegative. Then
inf{‖A − ∑_{i=1}^r u_i ⊗ v_i ⊗ ⋯ ⊗ z_i‖ | u_i, …, z_i ≥ 0}
is attained.

Corollary. The set {A | rank+(A) ≤ r} is closed.

NMD as an ℓ1-SVD

A ∈ R^{m×n}. The SVD of A is, in particular, an expression
A = ∑_{i=1}^r σi u_i ⊗ v_i
where r = rank(A) is the minimal number for which such a decomposition is possible,
‖σ‖2 = (∑_{i=1}^r |σi|²)^{1/2} = ‖A‖F, and ‖u_i‖2 = ‖v_i‖2 = 1, for i = 1, …, r.

Lemma (Golub, L). Let 0 ≤ A ∈ R^{m×n}. Then there exist u_i, v_i ≥ 0 such that
A = ∑_{i=1}^r λi u_i ⊗ v_i
where r = rank+(A) is the minimal number for which such a decomposition is possible,
‖λ‖1 = ∑_{i=1}^r |λi| = ‖A‖G, and ‖u_i‖1 = ‖v_i‖1 = 1, for i = 1, …, r.
The G-norm of A,
‖A‖G = ∑_{i=1}^m ∑_{j=1}^n |a_{ij}|,
is the ℓ1-equivalent of the F-norm
‖A‖F = (∑_{i=1}^m ∑_{j=1}^n |a_{ij}|²)^{1/2}.
The NMD, viewed in the light of an ℓ1-SVD, will be called an ℓ1-NMD.
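A small sketch, assuming numpy, of passing from any nonnegative factorization A = UV^T to this ℓ1-normalized form (the helper name is illustrative):

```python
import numpy as np

def l1_normalize(U, V):
    # rescale columns so ||u_i||_1 = ||v_i||_1 = 1, absorbing into lambda
    cu, cv = U.sum(axis=0), V.sum(axis=0)
    return cu * cv, U / cu, V / cv

U = np.array([[1.0, 0.0], [2.0, 1.0]])
V = np.array([[3.0, 1.0], [0.0, 1.0]])
A = U @ V.T
lam, Un, Vn = l1_normalize(U, V)
print(np.isclose(lam.sum(), A.sum()))  # ||lambda||_1 equals ||A||_G: True
```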

ℓ1-nonnegative tensor decomposition

The SVD of a matrix does not generalize to tensors in any obvious way. The ℓ1-NMD, however, generalizes to nonnegative tensors easily.

Lemma (Golub, L). Let 0 ≤ A ∈ R^{d1×···×dk}. Then there exist u_i, v_i, …, z_i ≥ 0 such that
A = ∑_{i=1}^r λi u_i ⊗ v_i ⊗ ⋯ ⊗ z_i
where r = rank+(A) is the minimal number for which such a decomposition is possible,
‖λ‖1 = ‖A‖G, and ‖u_i‖1 = ‖v_i‖1 = ⋯ = ‖z_i‖1 = 1 for i = 1, …, r. Here
‖A‖G := ∑_{i1,…,ik=1}^n |a_{i1···ik}|.

Naive Bayes model

Let X1, X2, …, Xk, H be finitely supported discrete random variables such that

X1, X2, …, Xk are statistically independent conditional on H, or, in notation, (X1 ⊥ X2 ⊥ ⋯ ⊥ Xk) ∥ H. In other words, the probability densities satisfy

Pr(X1 = x1, …, Xk = xk | H = h) = ∏_{i=1}^k Pr(X_i = x_i | H = h).
This is called the Naive Bayes conditional independence assumption.

ℓ1-NTD and Naive Bayes model

For β = 1, …, k, let the support of X_β be {x_1^{(β)}, …, x_{d_β}^{(β)}} and the support of H be {h1, …, hr}. The marginal probability density is then

Pr(X1 = x_{j1}^{(1)}, …, Xk = x_{jk}^{(k)}) = ∑_{i=1}^r Pr(H = h_i) ∏_{β=1}^k Pr(X_β = x_{j_β}^{(β)} | H = h_i).

Let a_{j1···jk} = Pr(X1 = x_{j1}^{(1)}, …, Xk = x_{jk}^{(k)}), u_{i,j_β}^{(β)} = Pr(X_β = x_{j_β}^{(β)} | H = h_i), and λi = Pr(H = h_i). We get

a_{j1···jk} = ∑_{p=1}^r λp ∏_{β=1}^k u_{p,j_β}^{(β)}.

Set A = ⟦a_{j1···jk}⟧ ∈ R^{d1×···×dk} and u_i^{(β)} = [u_{i,1}^{(β)}, …, u_{i,d_β}^{(β)}]^T ∈ R^{d_β}, β = 1, …, k, to get
A = ∑_{i=1}^r λi u_i^{(1)} ⊗ ⋯ ⊗ u_i^{(k)}.
Note that the quantities A, λ, u_i^{(β)}, being probability density values, must satisfy

‖λ‖1 = ‖A‖G = ‖u_i‖1 = ‖v_i‖1 = ⋯ = ‖z_i‖1 = 1. By the earlier lemma, this is always possible for any nonnegative tensor, provided

that we first normalize A by ‖A‖G.
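A sketch (assuming numpy; the sizes and distributions are made up) assembling the joint probability tensor of a Naive Bayes model with k = 3:

```python
import numpy as np

rng = np.random.default_rng(0)
r, dims = 2, (2, 3, 2)
lam = rng.dirichlet(np.ones(r))                          # Pr(H = h_i)
U = [rng.dirichlet(np.ones(d), size=r) for d in dims]    # rows: Pr(X_b | h_i)

A = np.einsum('i,ia,ib,ic->abc', lam, U[0], U[1], U[2])  # joint density
print(np.isclose(A.sum(), 1.0))  # True: ||A||_G = 1 for a probability tensor
```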

ℓ1-NTD as a graphical model/Bayesian network

Corollary. Given 0 ≤ A ∈ R^{d1×···×dk}, there exist finitely supported discrete random variables X1, X2, …, Xk, H in a Naive Bayes model, (X1 ⊥ X2 ⊥ ⋯ ⊥ Xk) ∥ H, such that its marginal-conditional decomposition is precisely the ℓ1-NTD of A/‖A‖G. Furthermore, the support of H is minimal over all such admissible models.

L.D. Garcia, M. Stillman, and B. Sturmfels, “Algebraic geometry of Bayesian networks,” J. Symbolic Comput., 39 (2005), no. 3–4, pp. 331–355.

Variational approach to eigen/singular values/vectors

A symmetric matrix. Eigenvalues/vectors are the critical values/points of the Rayleigh quotient x^T Ax/‖x‖2², or equivalently, the critical values/points of the quadratic form x^T Ax constrained to vectors with unit ℓ2-norm, {x | ‖x‖2 = 1}. Associated Lagrangian:
L(x, λ) = x^T Ax − λ(‖x‖2² − 1).
Vanishing of ∇L at a critical point (x_c, λ_c) ∈ R^n × R yields the familiar

A x_c = λ_c x_c.

m×n T A ∈ R . Singular values/vectors may likewise be obtained with x Ay/kxk2kyk2 playing the role of the Rayleigh quotient. Associated Lagrangian function now T L(x, y, σ) = x Ay − σ(kxk2kyk2 − 1). m n At a critical point (xc, yc, σc) ∈ R × R × R, T Ayc/kyck2 = σcxc/kxck2,A xc/kxck2 = σcyc/kyck2.

Write u_c = x_c/‖x_c‖2 and v_c = y_c/‖y_c‖2 to get the familiar
A v_c = σ_c u_c,   A^T u_c = σ_c v_c.

Multilinear spectral theory

May extend the variational approach to tensors to obtain a theory of eigen/singular values/vectors for tensors (cf. [L] for details).

For x = [x1, …, xn]^T ∈ R^n, write x^p := [x1^p, …, xn^p]^T.

Define also the ‘ℓk-norm’: ‖x‖k = (x1^k + ⋯ + xn^k)^{1/k}.

Define ℓ2- and ℓk-eigenvalues/vectors of A ∈ S^k(R^n) as the critical values/points of the multilinear Rayleigh quotient A(x, …, x)/‖x‖_p^k. Differentiating the Lagrangian

L(x, λ) := A(x, …, x) − λ(‖x‖_p^p − 1), p = 2 or k,

yields

A(I_n, x, …, x) = λx   and   A(I_n, x, …, x) = λx^{k−1}

respectively. Note that for a symmetric tensor A,

A(I_n, x, x, …, x) = A(x, I_n, x, …, x) = ⋯ = A(x, x, …, x, I_n).
This doesn’t hold for nonsymmetric cubical tensors A ∈ T^k(R^n), and we get different eigenpairs for different modes (this is to be expected: even for matrices, a nonsymmetric matrix will have different left/right eigenvectors).

These equations have also been obtained independently by L. Qi using a different approach.

ℓ2-singular values of a tensor

Lagrangian is

L(x^1, …, x^k, σ) = A(x^1, …, x^k) − σ(‖x^1‖2 ⋯ ‖x^k‖2 − 1). Then

∇L = (∇_{x^1}L, …, ∇_{x^k}L, ∇_σL) = (0, …, 0, 0)
yields, upon normalization,

A(I_{d1}, u^2, u^3, …, u^k) = σu^1,
⋮
A(u^1, u^2, …, u^{k−1}, I_{dk}) = σu^k.
Call u^i ∈ S^{d_i−1} a mode-i singular vector and σ a singular value of A.

The same equations first appeared in the context of rank-1 tensor approximations. Our study differs in that we are interested in all critical values as opposed to only the maximum.

Spectral norm

Define the spectral norm of a tensor A ∈ R^{d1×···×dk} by
‖A‖σ := sup |A(x^1, …, x^k)| / (‖x^1‖2 ⋯ ‖x^k‖2).
Note that this differs from the Frobenius norm,
‖A‖F := (∑_{i1=1}^{d1} ⋯ ∑_{ik=1}^{dk} |a_{i1···ik}|²)^{1/2}

for A = ⟦a_{i1···ik}⟧ ∈ R^{d1×···×dk}.

Proposition. Let A ∈ R^{d1×···×dk}. The largest singular value of A equals its spectral norm,

σ_max(A) = ‖A‖σ.

Hyperdeterminant

Theorem (Gelfand, Kapranov, Zelevinsky, 1992). R^{(d1+1)×···×(dk+1)} has a non-trivial hyperdeterminant iff
d_j ≤ ∑_{i≠j} d_i
for all j = 1, …, k.

For R^{m×n}, the condition becomes m ≤ n and n ≤ m — that’s why matrix determinants are only defined for square matrices.

Relation with hyperdeterminant

Assume
d_i − 1 ≤ ∑_{j≠i} (d_j − 1)
for all i = 1, …, k. Let A ∈ R^{d1×···×dk}. It is easy to see that
A(I_{d1}, u^2, u^3, …, u^k) = 0,
A(u^1, I_{d2}, u^3, …, u^k) = 0,
⋮
A(u^1, u^2, …, u^{k−1}, I_{dk}) = 0
has a solution (u^1, …, u^k) ∈ S^{d1−1} × ⋯ × S^{dk−1} iff

∆(A) = 0, where ∆ is the hyperdeterminant on R^{d1×···×dk}.

In other words, ∆(A) = 0 iff 0 is a singular value of A.

Perron-Frobenius theorem for nonnegative tensors

An order-k cubical tensor A ∈ T^k(R^n) is reducible if there exists a permutation σ ∈ S_n such that the permuted tensor

⟦b_{i1···ik}⟧ = ⟦a_{σ(i1)···σ(ik)}⟧

has the property that, for some m ∈ {1, …, n − 1}, b_{i1···ik} = 0 for all i1 ∈ {1, …, n − m} and all i2, …, ik ∈ {1, …, m}. We say that A is irreducible if it is not reducible. In particular, if A > 0, then it is irreducible.

Theorem (L). Let 0 ≤ A = ⟦a_{j1···jk}⟧ ∈ T^k(R^n) be irreducible. Then A has a positive real ℓk-eigenvalue μ with an ℓk-eigenvector x that may be chosen to have all entries non-negative. Furthermore, μ is simple, i.e. x is unique modulo scaling.
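A rough power-method sketch in the spirit of higher-order power iterations for nonnegative tensors (a simplification of my own, assuming numpy and k = 3; no convergence guarantees claimed here):

```python
import numpy as np

def perron_pair(A, iters=500):
    x = np.ones(A.shape[0])
    for _ in range(iters):
        y = np.einsum('ijk,j,k->i', A, x, x)  # A(I, x, x)
        x = np.sqrt(y)                        # solve x^{k-1} = y for k = 3
        x /= np.linalg.norm(x, 3)             # normalize in the l3-norm
    return np.einsum('ijk,i,j,k->', A, x, x, x), x   # (mu, x)

A = np.random.rand(4, 4, 4)    # A > 0, hence irreducible
mu, x = perron_pair(A)
print(mu > 0, np.all(x >= 0))  # True True
```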

Hypergraphs

For notational simplicity, the following is stated for a 3-hypergraph but it generalizes to k-hypergraphs for any k.

Let G = (V, E) be a 3-hypergraph. V is the finite set of vertices and E is the set of hyperedges, i.e. 3-element subsets of V. We write the elements of E as [x, y, z] (x, y, z ∈ V).

G is undirected, so [x, y, z] = [y, z, x] = ⋯ = [z, y, x]. A hyperedge is said to be degenerate if it is of the form [x, x, y] or [x, x, x] (a hyperloop at x). We do not exclude degenerate hyperedges.

G is m-regular if every v ∈ V is adjacent to exactly m hyperedges. We can ‘regularize’ a non-regular hypergraph by adding hyperloops.

Adjacency tensor of a hypergraph

Define the order-3 adjacency tensor A by
A_{xyz} = 1 if [x, y, z] ∈ E, and 0 otherwise.
Note that A is a |V|-by-|V|-by-|V| nonnegative symmetric tensor.

Consider the cubic form A(f, f, f) = ∑_{x,y,z} A_{xyz} f(x)f(y)f(z) (note that f is a vector of dimension |V|).

Call the critical values and critical points of A(f, f, f) constrained to the set ∑_x f(x)³ = 1 (like the ℓ3-norm except we do not take absolute values) the ℓ3-eigenvalues and ℓ3-eigenvectors of A respectively.
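A sketch, assuming numpy, of the adjacency tensor and cubic form for a small made-up 3-hypergraph:

```python
import numpy as np
from itertools import permutations

V, E = 4, [(0, 1, 2), (1, 2, 3), (0, 2, 3)]  # hypothetical 3-hypergraph
A = np.zeros((V, V, V))
for e in E:
    for x, y, z in set(permutations(e)):      # undirected: all orderings
        A[x, y, z] = 1.0

f = np.random.rand(V)
print(np.einsum('xyz,x,y,z->', A, f, f, f))   # the cubic form A(f, f, f)
```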

Very basic spectral hypergraph theory I

As in the case of spectral graph theory, combinatorial/topological properties of a k-hypergraph may be deduced from the ℓk-eigenvalues of its adjacency tensor (henceforth, in the context of a k-hypergraph, an eigenvalue will always mean an ℓk-eigenvalue).

Straightforward generalization of a basic result in spectral graph theory:

Theorem (Drineas, L). Let G be an m-regular 3-hypergraph and A be its adjacency tensor. Then

(a) m is an eigenvalue of A;

(b) if µ is an eigenvalue of A, then |µ| ≤ m;

(c) m has multiplicity 1 if and only if G is connected.

Very basic spectral hypergraph theory II

A hypergraph G = (V, E) is said to be k-partite or k-colorable if there exists a partition of the vertices V = V1 ∪ ⋯ ∪ Vk such that for any k vertices u, v, …, z with A_{uv···z} ≠ 0, u, v, …, z must each lie in a distinct V_i (i = 1, …, k).

Lemma (Drineas, L). Let G be a connected m-regular k-partite k-hypergraph on n vertices. Then

(a) If k is odd, then every eigenvalue of G occurs with multiplicity a multiple of k.

(b) If k is even, then the spectrum of G is symmetric (i.e. if μ is an eigenvalue, then so is −μ). Furthermore, every eigenvalue of G occurs with multiplicity a multiple of k/2. If μ is an eigenvalue of G, then μ and −μ occur with the same multiplicity.

Liqun Qi’s work

L. Qi, “Eigenvalues of a real supersymmetric tensor,” J. Symbolic Comput., 40 (2005), no. 6, pp. 1302–1324.

(a) Gershgorin circle theorem for ℓk-eigenvalues;

(b) characterizing positive definiteness of even-ordered forms (e.g. quartic forms) using ℓk-eigenvalues;

(c) generalization of the trace-sum equality for ℓ2-eigenvalues;

(d) six open conjectures.

See also work by Qi’s postdocs and students: Yiju Wang, Guyan Ni, Fei Wang.

Homogeneous system of multilinear equations

The hyperdeterminant of A = ⟦a_{ijk}⟧ ∈ R^{2×2×2} is
∆(A) := (a000² a111² + a001² a110² + a010² a101² + a011² a100²)
  − 2(a000 a001 a110 a111 + a000 a010 a101 a111 + a000 a011 a100 a111 + a001 a010 a101 a110 + a001 a011 a110 a100 + a010 a011 a101 a100)
  + 4(a000 a011 a101 a110 + a001 a010 a100 a111).
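Transcribing the formula directly into code (a sketch, assuming numpy; indices are binary):

```python
import numpy as np

def hyperdet_222(a):
    # Cayley's 2x2x2 hyperdeterminant, term by term from the formula above
    sq = (a[0,0,0]**2 * a[1,1,1]**2 + a[0,0,1]**2 * a[1,1,0]**2
          + a[0,1,0]**2 * a[1,0,1]**2 + a[0,1,1]**2 * a[1,0,0]**2)
    cross = (a[0,0,0]*a[0,0,1]*a[1,1,0]*a[1,1,1]
             + a[0,0,0]*a[0,1,0]*a[1,0,1]*a[1,1,1]
             + a[0,0,0]*a[0,1,1]*a[1,0,0]*a[1,1,1]
             + a[0,0,1]*a[0,1,0]*a[1,0,1]*a[1,1,0]
             + a[0,0,1]*a[0,1,1]*a[1,1,0]*a[1,0,0]
             + a[0,1,0]*a[0,1,1]*a[1,0,1]*a[1,0,0])
    diag = (a[0,0,0]*a[0,1,1]*a[1,0,1]*a[1,1,0]
            + a[0,0,1]*a[0,1,0]*a[1,0,0]*a[1,1,1])
    return sq - 2 * cross + 4 * diag

u, v, w = (np.random.rand(2) for _ in range(3))
B = np.einsum('i,j,k->ijk', u, v, w)          # a decomposable tensor
print(np.isclose(hyperdet_222(B), 0.0))       # True: ∆ vanishes on Seg
```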

A result that parallels the matrix case: the system of bilinear equations A(x, y, I) = A(x, I, z) = A(I, y, z) = 0, i.e.

a000 x0 y0 + a010 x0 y1 + a100 x1 y0 + a110 x1 y1 = 0,
a001 x0 y0 + a011 x0 y1 + a101 x1 y0 + a111 x1 y1 = 0,
a000 x0 z0 + a001 x0 z1 + a100 x1 z0 + a101 x1 z1 = 0,
a010 x0 z0 + a011 x0 z1 + a110 x1 z0 + a111 x1 z1 = 0,
a000 y0 z0 + a001 y0 z1 + a010 y1 z0 + a011 y1 z1 = 0,
a100 y0 z0 + a101 y0 z1 + a110 y1 z0 + a111 y1 z1 = 0
has a non-trivial solution iff ∆(A) = 0.

References

http://www-sccm.stanford.edu/nf-publications-tech.html

[CGLM2] P. Comon, G. Golub, L.-H. Lim, and B. Mourrain, “Symmetric tensors and symmetric tensor rank,” SCCM Tech. Rep., 06-02, 2006.

[CGLM1] P. Comon, B. Mourrain, L.-H. Lim, and G.H. Golub, “Genericity and rank deficiency of high order symmetric tensors,” Proc. IEEE Int. Conference on Acoustics, Speech, and Signal Processing (ICASSP), 31 (2006), no. 3, pp. 125–128.

[dSL] V. de Silva and L.-H. Lim, “Tensor rank and the ill-posedness of the best low-rank approximation problem,” SCCM Tech. Rep., 06-06 (2006).

[GL] G. Golub and L.-H. Lim, “Nonnegative decomposition and approximation of nonnegative matrices and tensors,” SCCM Tech. Rep., 06-01 (2006), forthcoming.

[L] L.-H. Lim, “Singular values and eigenvalues of tensors: a variational approach,” Proc. IEEE Int. Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 1 (2005), pp. 129–132.

International Congress on Industrial and Applied Mathematics

Minisymposium: Numerical Multilinear Algebra: a new beginning
Organizers: P. Comon, L. De Lathauwer, G. Golub, L.-H. Lim
