
Numerical Multilinear Algebra: a new beginning?

Lek-Heng Lim
Stanford University

Matrix Computations and Scientific Computing Seminar
Berkeley, CA, October 18, 2006

Thanks: G. Carlsson, S. Lacoste-Julien, J.M. Landsberg, B. Sturmfels
Collaborators: P. Comon, V. de Silva, G. Golub, M. Mørup

Synopsis

• Tensors and tensor ranks
• Rank-revealing decompositions
• Best low rank approximations
• Symmetric tensors and nonnegative tensors
• Eigenvalues/vectors of tensors
• Singular values/vectors of tensors
• Systems of multilinear equations
• Multilinear least squares

Tensor product of vector spaces

U, V, W vector spaces. Think of U ⊗ V ⊗ W as the vector space of all formal linear combinations of terms of the form u ⊗ v ⊗ w,

    Σ α u ⊗ v ⊗ w,   where α ∈ R, u ∈ U, v ∈ V, w ∈ W.

One condition: ⊗ decreed to have the multilinear property

    (αu_1 + βu_2) ⊗ v ⊗ w = α u_1 ⊗ v ⊗ w + β u_2 ⊗ v ⊗ w,
    u ⊗ (αv_1 + βv_2) ⊗ w = α u ⊗ v_1 ⊗ w + β u ⊗ v_2 ⊗ w,
    u ⊗ v ⊗ (αw_1 + βw_2) = α u ⊗ v ⊗ w_1 + β u ⊗ v ⊗ w_2.

Up to a choice of bases on U, V, W, a tensor A ∈ U ⊗ V ⊗ W can be represented by a 3-way array A = ⟦a_{ijk}⟧_{i,j,k=1}^{l,m,n} ∈ R^{l×m×n}.

Tensors and multiway arrays

A set of multiply indexed real numbers A = ⟦a_{ijk}⟧_{i,j,k=1}^{l,m,n} ∈ R^{l×m×n} on which the following algebraic operations are defined:

1. Addition/scalar multiplication: for ⟦b_{ijk}⟧ ∈ R^{l×m×n} and λ ∈ R,

    ⟦a_{ijk}⟧ + ⟦b_{ijk}⟧ := ⟦a_{ijk} + b_{ijk}⟧   and   λ⟦a_{ijk}⟧ := ⟦λa_{ijk}⟧ ∈ R^{l×m×n}.

2. Multilinear matrix multiplication: for matrices L = [λ_{i′i}] ∈ R^{p×l}, M = [μ_{j′j}] ∈ R^{q×m}, N = [ν_{k′k}] ∈ R^{r×n},

    (L, M, N) · A := ⟦c_{i′j′k′}⟧ ∈ R^{p×q×r}

where

    c_{i′j′k′} := Σ_{i=1}^{l} Σ_{j=1}^{m} Σ_{k=1}^{n} λ_{i′i} μ_{j′j} ν_{k′k} a_{ijk}.

May think of A as a 3-dimensional array of numbers and of (L, M, N) · A as multiplication on '3 sides' by the matrices L, M, N.

Change-of-basis theorem for tensors

Two representations A, A′ of A in different bases are related by

    (L, M, N) · A = A′

with L, M, N the respective change-of-basis matrices (non-singular). Henceforth, we will not distinguish between an order-k tensor and a k-way array that represents it (with respect to some implicit choice of basis).

Segre outer product

If U = R^l, V = R^m, W = R^n, then R^l ⊗ R^m ⊗ R^n may be identified with R^{l×m×n} if we define ⊗ by

    u ⊗ v ⊗ w = ⟦u_i v_j w_k⟧_{i,j,k=1}^{l,m,n}.

A tensor A ∈ R^{l×m×n} is said to be decomposable if it can be written in the form

    A = u ⊗ v ⊗ w

for some u ∈ R^l, v ∈ R^m, w ∈ R^n. The set of all decomposable tensors is known as the Segre variety in algebraic geometry. It is a closed set (in both the Euclidean and Zariski sense) as it can be described algebraically:

    Seg(R^l, R^m, R^n) = {A ∈ R^{l×m×n} | a_{i_1 i_2 i_3} a_{j_1 j_2 j_3} = a_{k_1 k_2 k_3} a_{l_1 l_2 l_3}, {i_α, j_α} = {k_α, l_α}}.

Tensor ranks

Matrix rank. A ∈ R^{m×n}.

    rank(A) = dim(span_R{A_{•1}, …, A_{•n}})       (column rank)
            = dim(span_R{A_{1•}, …, A_{m•}})       (row rank)
            = min{r | A = Σ_{i=1}^{r} u_i v_i^T}   (outer product rank).

Multilinear rank. A ∈ R^{l×m×n}. rank_⊞(A) = (r_1(A), r_2(A), r_3(A)) where

    r_1(A) = dim(span_R{A_{1••}, …, A_{l••}}),
    r_2(A) = dim(span_R{A_{•1•}, …, A_{•m•}}),
    r_3(A) = dim(span_R{A_{••1}, …, A_{••n}}).

Outer product rank. A ∈ R^{l×m×n}.

    rank_⊗(A) = min{r | A = Σ_{i=1}^{r} u_i ⊗ v_i ⊗ w_i}.

In general, rank_⊗(A) ≠ r_1(A) ≠ r_2(A) ≠ r_3(A).

Multilinear decomposition

Let A ∈ R^{l×m×n} and rank_⊞(A) = (r_1, r_2, r_3). The multilinear decomposition of A is

    A = (X, Y, Z) · C.

In other words,

    a_{ijk} = Σ_{α=1}^{r_1} Σ_{β=1}^{r_2} Σ_{γ=1}^{r_3} x_{iα} y_{jβ} z_{kγ} c_{αβγ}

for some full-rank matrices X = [x_{iα}] ∈ R^{l×r_1}, Y = [y_{jβ}] ∈ R^{m×r_2}, Z = [z_{kγ}] ∈ R^{n×r_3}, and core tensor C = ⟦c_{αβγ}⟧ ∈ R^{r_1×r_2×r_3}. X, Y, Z may be chosen to have orthonormal columns. For matrices, this is just the L_1 D L_2^T or Q_1 R Q_2^T decomposition.
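Not part of the slides: a minimal numpy sketch of multilinear matrix multiplication (L, M, N) · A and of assembling a tensor A = (X, Y, Z) · C with orthonormal factors, as defined above. The helper name multilinear_mult is ours; einsum indices follow the formula for c_{i′j′k′}.

    import numpy as np

    def multilinear_mult(L, M, N, A):
        # (L, M, N) . A :  c_{i'j'k'} = sum_{i,j,k} L[i',i] M[j',j] N[k',k] A[i,j,k]
        return np.einsum('ai,bj,ck,ijk->abc', L, M, N, A)

    # Assemble A = (X, Y, Z) . C from a random core tensor C and factors
    # with orthonormal columns, so that rank_(multilinear)(A) <= (r1, r2, r3).
    rng = np.random.default_rng(0)
    l, m, n, r1, r2, r3 = 4, 5, 6, 2, 3, 2
    C = rng.standard_normal((r1, r2, r3))
    X = np.linalg.qr(rng.standard_normal((l, r1)))[0]   # orthonormal columns
    Y = np.linalg.qr(rng.standard_normal((m, r2)))[0]
    Z = np.linalg.qr(rng.standard_normal((n, r3)))[0]
    A = multilinear_mult(X, Y, Z, C)

The same einsum pattern extends to order-k tensors by adding one matrix and one index pair per mode.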
Outer product decomposition

Let A ∈ R^{l×m×n} and rank_⊗(A) = r. The outer product decomposition of A is

    A = Σ_{α=1}^{r} d_α x_α ⊗ y_α ⊗ z_α = (X, Y, Z) · D.

In other words,

    a_{ijk} = Σ_{α=1}^{r} d_α x_{iα} y_{jα} z_{kα}

for d_α ∈ R and unit vectors x_α = (x_{1α}, …, x_{lα})^T ∈ R^l, y_α = (y_{1α}, …, y_{mα})^T ∈ R^m, z_α = (z_{1α}, …, z_{nα})^T ∈ R^n, α = 1, …, r. Alternatively, write X = [x_1, …, x_r] ∈ R^{l×r}, Y = [y_1, …, y_r] ∈ R^{m×r}, Z = [z_1, …, z_r] ∈ R^{n×r}, D = diag(d_1, …, d_r) ∈ R^{r×r×r}.

Credit

These 'rank-revealing' decompositions are commonly referred to as the CANDECOMP/PARAFAC model and the Tucker model, used originally in psychometrics in the 1970s. Both notions of tensor rank (and the corresponding decompositions) are really due to Frank L. Hitchcock; multilinear rank is a special case (uniplex) of his more general multiplex rank.

F.L. Hitchcock, "The expression of a tensor or a polyadic as a sum of products," J. Math. Phys., 6 (1927), no. 1, pp. 164–189.
F.L. Hitchcock, "Multiple invariants and generalized rank of a p-way matrix or tensor," J. Math. Phys., 7 (1927), no. 1, pp. 39–79.

Tensor rank is difficult

Mystical Power of Twoness (Eugene L. Lawler): 2-SAT is easy, 3-SAT is hard; 2-dimensional matching is easy, 3-dimensional matching is hard; etc. Matrix rank is easy, tensor rank is hard:

Theorem (Håstad). Computing rank_⊗(A) for A ∈ R^{l×m×n} is an NP-hard problem.

Tensor rank depends on the base field:

Theorem (Bergman). For A ∈ R^{l×m×n} ⊂ C^{l×m×n}, rank_⊗(A) is base field dependent.

Example. Let x, y ∈ R^n be linearly independent and let z = x + iy. For

    A = x ⊗ x ⊗ x − x ⊗ y ⊗ y + y ⊗ x ⊗ y + y ⊗ y ⊗ x = ½(z ⊗ z̄ ⊗ z̄ + z̄ ⊗ z ⊗ z),

rank_⊗(A) is 3 over R and 2 over C.

Tensor rank is useful

P. Bürgisser, M. Clausen, and M.A. Shokrollahi, Algebraic complexity theory, Springer-Verlag, Berlin, 1996.

For A = [a_{ij}], B = [b_{ij}] ∈ R^{n×n},

    AB = Σ_{i,j,k=1}^{n} a_{ik} b_{kj} E_{ij} = Σ_{i,j,k=1}^{n} φ_{ik}(A) φ_{kj}(B) E_{ij}

where E_{ij} = e_i e_j^T ∈ R^{n×n}. Let

    T = Σ_{i,j,k=1}^{n} φ_{ik} ⊗ φ_{kj} ⊗ E_{ij}.

An O(n^{2+ε}) algorithm for multiplying two n×n matrices gives an O(n^{2+ε}) algorithm for solving a system of n linear equations [Strassen 1969].

Conjecture. rank_⊗(T) = O(n^{2+ε}).

Best known results. O(n^{2.376}) [Coppersmith-Winograd, Cohn-Kleinberg-Szegedy-Umans]. If n = 2, rank_⊗(T) = 7 [Strassen; Winograd; Hopcroft-Kerr; Landsberg].

Fundamental problem of multiway data analysis

Let A be a tensor, symmetric tensor, or nonnegative tensor. Solve

    argmin_{rank(B) ≤ r} ‖A − B‖

where rank may be outer product rank, multilinear rank, symmetric rank (for symmetric tensors), or nonnegative rank (for nonnegative tensors).

Example. Given A ∈ R^{d_1×d_2×d_3}, find u_i, v_i, w_i, i = 1, …, r, that minimize

    ‖A − u_1 ⊗ v_1 ⊗ w_1 − u_2 ⊗ v_2 ⊗ w_2 − ⋯ − u_r ⊗ v_r ⊗ w_r‖,

or C ∈ R^{r_1×r_2×r_3} and L_i ∈ R^{d_i×r_i} that minimize

    ‖A − (L_1, L_2, L_3) · C‖.

(A numerical ALS sketch for the first problem appears after the next slide.)

Example. Given A ∈ S^k(C^n), find u_i, i = 1, …, r, that minimize

    ‖A − u_1^{⊗k} − u_2^{⊗k} − ⋯ − u_r^{⊗k}‖.

Harmonic analytic approach to data analysis

More generally, F = C, R, R_+, R_max (max-plus algebra), R[x_1, …, x_n] (polynomial rings), etc. Dictionary: D ⊂ F^N, not contained in any hyperplane. Let D_2 = union of bisecants to D, D_3 = union of trisecants to D, …, D_r = union of r-secants to D. Define the D-rank of A ∈ F^N to be min{r | A ∈ D_r}.

If φ : F^N × F^N → R is some measure of 'nearness' between pairs of points (e.g. norms, Bregman divergences, etc.), we want to find a best low-rank approximation to A:

    argmin{φ(A, B) | D-rank(B) ≤ r}.
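The slides do not prescribe an algorithm for the fundamental problem; a standard heuristic is alternating least squares (ALS), which fixes two factors and solves a linear least-squares problem for the third. A minimal numpy sketch under our own naming (cp_als, khatri_rao are not from the slides); it carries no convergence guarantee, and the lemma below shows the underlying problem may not even have a solution.

    import numpy as np

    def khatri_rao(U, V):
        # column-wise Kronecker product: column a is kron(U[:, a], V[:, a])
        r = U.shape[1]
        return np.einsum('ia,ja->ija', U, V).reshape(-1, r)

    def cp_als(A, r, iters=200, seed=0):
        # ALS heuristic for  min || A - sum_a x_a (x) y_a (x) z_a ||_F
        l, m, n = A.shape
        rng = np.random.default_rng(seed)
        X = rng.standard_normal((l, r))
        Y = rng.standard_normal((m, r))
        Z = rng.standard_normal((n, r))
        for _ in range(iters):
            # each update solves a linear least-squares problem in one factor,
            # using the mode-wise unfoldings A_(1), A_(2), A_(3)
            X = np.linalg.lstsq(khatri_rao(Y, Z),
                                A.reshape(l, -1).T, rcond=None)[0].T
            Y = np.linalg.lstsq(khatri_rao(X, Z),
                                A.transpose(1, 0, 2).reshape(m, -1).T, rcond=None)[0].T
            Z = np.linalg.lstsq(khatri_rao(X, Y),
                                A.transpose(2, 0, 1).reshape(n, -1).T, rcond=None)[0].T
        B = np.einsum('ia,ja,ka->ijk', X, Y, Z)   # current rank-<=r approximant
        return X, Y, Z, np.linalg.norm(A - B)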
Feature revelation

Get a low-rank approximation

    A ≈ α_1 · B_1 + ⋯ + α_r · B_r ∈ D_r.

The B_i ∈ D reveal features of the dataset A. Note that another way to say 'best low-rank' is 'sparsest possible'.

Example. D = {A | rank_⊗(A) ≤ 1}, φ(A, B) = ‖A − B‖_F: get the usual CANDECOMP/PARAFAC.

Example. D = {A | rank_⊞(A) ≤ (r_1, r_2, r_3)} (an algebraic set), φ(A, B) = ‖A − B‖_F: get the De Lathauwer decomposition.

Simple lemma

Lemma (de Silva, L.). Let r ≥ 2 and k ≥ 3. Given the norm-topology on R^{d_1×⋯×d_k}, the following statements are equivalent:

(a) The set S_r(d_1, …, d_k) := {A | rank_⊗(A) ≤ r} is not closed.

(b) There exists a sequence A_n, rank_⊗(A_n) ≤ r, n ∈ N, converging to B with rank_⊗(B) > r.

(c) There exists B, rank_⊗(B) > r, that may be approximated arbitrarily closely by tensors of strictly lower rank, i.e.

    inf{‖B − A‖ | rank_⊗(A) ≤ r} = 0.

(d) There exists C, rank_⊗(C) > r, that does not have a best rank-r approximation, i.e.

    inf{‖C − A‖ | rank_⊗(A) ≤ r} is not attained.
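A numerical check of statements (b) and (c), sketched by us with the well-known example from de Silva and Lim's work on this problem: a sequence of rank-2 tensors converging to a rank-3 tensor, so B has no best rank-2 approximation.

    import numpy as np

    def outer3(u, v, w):
        # rank-1 tensor u (x) v (x) w
        return np.einsum('i,j,k->ijk', u, v, w)

    x = np.array([1.0, 0.0])
    y = np.array([0.0, 1.0])

    # B = x(x)x(x)y + x(x)y(x)x + y(x)x(x)x has rank_(x)(B) = 3
    # for linearly independent x, y ...
    B = outer3(x, x, y) + outer3(x, y, x) + outer3(y, x, x)

    # ... yet A_n = n (x + y/n)^{(x)3} - n x^{(x)3} has rank_(x)(A_n) <= 2
    # (a sum of two rank-1 terms) and A_n -> B:
    for n in (1, 10, 100, 1000):
        xn = x + y / n
        A_n = n * outer3(xn, xn, xn) - n * outer3(x, x, x)
        print(n, np.linalg.norm(B - A_n))   # residual decays like sqrt(3)/n

Expanding n (x + y/n)^{⊗3} − n x^{⊗3} = B + (1/n)(x⊗y⊗y + y⊗x⊗y + y⊗y⊗x) + (1/n²) y⊗y⊗y shows why the residual is O(1/n).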