Hyperdeterminant and Tensor Rank
Lek-Heng Lim
Workshop on Tensor Decompositions and Applications
CIRM, Luminy, France, August 29–September 2, 2005
Thanks: NSF DMS 01-01364

Collaborators:
Pierre Comon, Laboratoire I3S, Université de Nice Sophia Antipolis
Vin de Silva, Department of Mathematics, Stanford University

Matrix Multiplication

Let $f : U \to V$ and $g : V \to W$ be linear maps, with $U, V, W$ vector spaces over $\mathbb{R}$ of dimensions $n, m, l$. With a choice of bases on $U, V, W$, the maps $g, f$ have matrix representations
$$A = [a_{ij}] \in \mathbb{R}^{l\times m}, \qquad B = [b_{jk}] \in \mathbb{R}^{m\times n}.$$
The matrix representation of $h = g \circ f$ (i.e. $h(x) := g(f(x))$) is then $C = [c_{ik}] \in \mathbb{R}^{l\times n}$ where
$$c_{ik} := \sum_{j=1}^{m} a_{ij} b_{jk}.$$

Similarly for bilinear $g : V_1 \times V_2 \to \mathbb{R}$ and linear $f_1 : U_1 \to V_1$, $f_2 : U_2 \to V_2$ with matrix representations $A \in \mathbb{R}^{d_1\times d_2}$, $B_1 \in \mathbb{R}^{d_1\times s_1}$, $B_2 \in \mathbb{R}^{d_2\times s_2}$. The composite map $h$, where $h(\mathbf{x}, \mathbf{y}) := g(f_1(\mathbf{x}), f_2(\mathbf{y}))$, has matrix representation
$$C = B_1^\top A B_2 \in \mathbb{R}^{s_1\times s_2}.$$

Multilinear Matrix Multiplication

Do the same for a multilinear map $g : V_1 \times \cdots \times V_k \to \mathbb{R}$ and linear maps $f_1 : U_1 \to V_1, \ldots, f_k : U_k \to V_k$, with $\dim(V_i) = d_i$ and $\dim(U_i) = s_i$. With a choice of bases on the $V_i$'s and $U_i$'s, $g$ is represented by $A = \llbracket a_{j_1\cdots j_k}\rrbracket \in \mathbb{R}^{d_1\times\cdots\times d_k}$, and $f_1, \ldots, f_k$ by $M_1 = [m^1_{j_1 i_1}] \in \mathbb{R}^{d_1\times s_1}, \ldots, M_k = [m^k_{j_k i_k}] \in \mathbb{R}^{d_k\times s_k}$.

If we compose $g$ with $f_1, \ldots, f_k$ to get $h : U_1 \times \cdots \times U_k \to \mathbb{R}$ defined by $h(\mathbf{x}_1, \ldots, \mathbf{x}_k) = g(f_1(\mathbf{x}_1), \ldots, f_k(\mathbf{x}_k))$, then $h$ is represented by $\llbracket c_{i_1\cdots i_k}\rrbracket \in \mathbb{R}^{s_1\times\cdots\times s_k}$ where
$$c_{i_1\cdots i_k} := \sum_{j_1=1}^{d_1}\cdots\sum_{j_k=1}^{d_k} a_{j_1\cdots j_k}\, m^1_{j_1 i_1}\cdots m^k_{j_k i_k}. \qquad (1)$$
The covariant multilinear matrix multiplication will be written
$$A(M_1, \ldots, M_k) := \llbracket c_{i_1\cdots i_k}\rrbracket \in \mathbb{R}^{s_1\times\cdots\times s_k}.$$

Contravariant Version

The contravariant multilinear matrix multiplication of $\llbracket a_{j_1\cdots j_k}\rrbracket \in \mathbb{R}^{d_1\times\cdots\times d_k}$ by matrices $L_1 = [\ell^1_{i_1 j_1}] \in \mathbb{R}^{r_1\times d_1}, \ldots, L_k = [\ell^k_{i_k j_k}] \in \mathbb{R}^{r_k\times d_k}$ is defined by
$$(L_1, \ldots, L_k) A = \llbracket b_{i_1\cdots i_k}\rrbracket \in \mathbb{R}^{r_1\times\cdots\times r_k},$$
$$b_{i_1\cdots i_k} := \sum_{j_1=1}^{d_1}\cdots\sum_{j_k=1}^{d_k} \ell^1_{i_1 j_1}\cdots \ell^k_{i_k j_k}\, a_{j_1\cdots j_k}. \qquad (2)$$
This comes from the composition of a multilinear map $g : V_1^* \times \cdots \times V_k^* \to \mathbb{R}$ with linear maps $f_1 : V_1 \to U_1, \ldots, f_k : V_k \to U_k$.

Simple relation if we disregard covariance/contravariance:
$$(L_1, \ldots, L_k) A = A(L_1^\top, \ldots, L_k^\top), \qquad A(M_1, \ldots, M_k) = (M_1^\top, \ldots, M_k^\top) A.$$
Works over $\mathbb{C}$ too (replace $L_i^\top$ by $L_i^\dagger$).

Properties

• Let $A, B \in \mathbb{R}^{d_1\times\cdots\times d_k}$ and $\lambda, \mu \in \mathbb{R}$. Let $L_1 \in \mathbb{R}^{r_1\times d_1}, \ldots, L_k \in \mathbb{R}^{r_k\times d_k}$. Then
$$(L_1, \ldots, L_k)(\lambda A + \mu B) = \lambda (L_1, \ldots, L_k) A + \mu (L_1, \ldots, L_k) B.$$

• Let $A \in \mathbb{R}^{d_1\times\cdots\times d_k}$. Let $L_1 \in \mathbb{R}^{r_1\times d_1}, \ldots, L_k \in \mathbb{R}^{r_k\times d_k}$, and $M_1 \in \mathbb{R}^{s_1\times r_1}, \ldots, M_k \in \mathbb{R}^{s_k\times r_k}$. Then
$$(M_1, \ldots, M_k)(L_1, \ldots, L_k) A = (M_1 L_1, \ldots, M_k L_k) A,$$
where $M_i L_i \in \mathbb{R}^{s_i\times d_i}$ is simply the matrix-matrix product of $M_i$ and $L_i$.

• Let $A \in \mathbb{R}^{d_1\times\cdots\times d_k}$ and $\lambda, \mu \in \mathbb{R}$. Let $L_1 \in \mathbb{R}^{r_1\times d_1}, \ldots, L_j, M_j \in \mathbb{R}^{r_j\times d_j}, \ldots, L_k \in \mathbb{R}^{r_k\times d_k}$. Then
$$(L_1, \ldots, \lambda L_j + \mu M_j, \ldots, L_k) A = \lambda (L_1, \ldots, L_j, \ldots, L_k) A + \mu (L_1, \ldots, M_j, \ldots, L_k) A.$$

Aside: Relation with Kronecker Product

Forgetful map $\mathbb{R}^{d_1\times\cdots\times d_k} \to \mathbb{R}^{d_1\cdots d_k}$, $A \mapsto \operatorname{vec}(A)$ ('forgets' the multilinear structure); then
$$\operatorname{vec}\bigl((L_1, \ldots, L_k) A\bigr) = (L_1 \otimes \cdots \otimes L_k)\operatorname{vec}(A),$$
where $L_1 \otimes \cdots \otimes L_k \in \mathbb{R}^{r_1\cdots r_k \times d_1\cdots d_k}$ is the Kronecker product of $L_1, \ldots, L_k$.
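As a quick sanity check (not part of the original slides), this identity is easy to verify numerically. The following Python/NumPy sketch assumes vec is the row-major flattening given by ravel(); the sizes and variable names are illustrative only.

    import numpy as np

    # Minimal check of vec((L1, L2, L3) A) = (L1 (x) L2 (x) L3) vec(A) for k = 3,
    # where vec is NumPy's row-major flattening (ravel).
    rng = np.random.default_rng(0)
    d1, d2, d3 = 2, 3, 4              # sizes of A
    r1, r2, r3 = 5, 2, 3              # row sizes of L1, L2, L3

    A  = rng.standard_normal((d1, d2, d3))
    L1 = rng.standard_normal((r1, d1))
    L2 = rng.standard_normal((r2, d2))
    L3 = rng.standard_normal((r3, d3))

    # Contravariant multilinear matrix multiplication, eq. (2):
    # b_{i1 i2 i3} = sum_{j1 j2 j3} l^1_{i1 j1} l^2_{i2 j2} l^3_{i3 j3} a_{j1 j2 j3}
    B = np.einsum('ia,jb,kc,abc->ijk', L1, L2, L3, A)

    # Kronecker-product form acting on vec(A)
    K = np.kron(np.kron(L1, L2), L3)
    assert np.allclose(B.ravel(), K @ A.ravel())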
Matrix Techniques

Start with $\mathbb{R}^{m\times n\times l}$ and $\mathbb{C}^{m\times n\times l}$. The case $l = 2$ is well understood: such a tensor may be regarded as a pair of matrices $(A, B) \in (\mathbb{C}^{m\times n})^2$ or $(\mathbb{R}^{m\times n})^2$, or equivalently, as a matrix pencil $\lambda A + \mu B \in \mathbb{C}[\lambda, \mu]^{m\times n}$ or $\mathbb{R}[\lambda, \mu]^{m\times n}$.

Kronecker–Weierstrass Theory. There exist $S \in \mathrm{GL}(m)$, $T \in \mathrm{GL}(n)$ such that $(SAT, SBT)$ can be decomposed into block pairs of the following forms:
$$\left(\begin{bmatrix} 1 & & \\ 0 & \ddots & \\ & \ddots & 1 \\ & & 0 \end{bmatrix},\ \begin{bmatrix} 0 & & \\ 1 & \ddots & \\ & \ddots & 0 \\ & & 1 \end{bmatrix}\right) \in \bigl(\mathbb{R}^{(p+1)\times p}\bigr)^2,$$
$$\left(\begin{bmatrix} 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1 \end{bmatrix}\right) \in \bigl(\mathbb{R}^{q\times(q+1)}\bigr)^2,$$
$$\left(\begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix},\ \begin{bmatrix} 0 & & & -a_0 \\ 1 & \ddots & & -a_1 \\ & \ddots & 0 & \vdots \\ & & 1 & -a_{r-1} \end{bmatrix}\right) \in \bigl(\mathbb{R}^{r\times r}\bigr)^2.$$
Likewise for $\mathbb{C}$. Similar but simpler results were obtained by Jos ten Berge for generic pairs.

Larger Sizes and Higher Orders

We want to obtain results as general as possible, for tensors of arbitrary size and order over both $\mathbb{R}$ and $\mathbb{C}$. For larger values of $k$ or $d_1, \ldots, d_k$, techniques relying on multilinear matrix multiplications become increasingly less effective. Inherent limitation:
$$\dim(\mathbb{R}^{d_1\times\cdots\times d_k}) = d_1\cdots d_k = O(d^k)$$
while
$$\dim(\mathrm{GL}(d_1)\times\cdots\times\mathrm{GL}(d_k)) = d_1^2 + \cdots + d_k^2 = O(kd^2)$$
and
$$\dim(\mathrm{O}(d_1)\times\cdots\times\mathrm{O}(d_k)) = d_1(d_1-1)/2 + \cdots + d_k(d_k-1)/2 = O(kd^2).$$
The action of $\mathrm{GL}(d_1)\times\cdots\times\mathrm{GL}(d_k)$ on $\mathbb{R}^{d_1\times\cdots\times d_k}$ has uncountably many orbits
$$\{(L_1, \ldots, L_k) A \mid (L_1, \ldots, L_k) \in \mathrm{GL}(d_1)\times\cdots\times\mathrm{GL}(d_k)\}$$
as soon as $d_i \ge 2$, $k \ge 4$.

Multilinear Functional and its Gradient

The multilinear functional associated with $A \in \mathbb{R}^{d_1\times\cdots\times d_k}$, i.e.
$$f_A : \mathbb{R}^{d_1}\times\cdots\times\mathbb{R}^{d_k} \to \mathbb{R}, \qquad (\mathbf{x}_1, \ldots, \mathbf{x}_k) \mapsto \sum_{j_1=1}^{d_1}\cdots\sum_{j_k=1}^{d_k} a_{j_1\cdots j_k}\, x^1_{j_1}\cdots x^k_{j_k}, \qquad (3)$$
can be written as
$$f_A(\mathbf{x}_1, \ldots, \mathbf{x}_k) = A(\mathbf{x}_1, \ldots, \mathbf{x}_k) \qquad (4)$$
where the right-hand side is the covariant multilinear matrix multiplication by $\mathbf{x}_i = (x^i_1, \ldots, x^i_{d_i})^\top$, regarded as a $d_i \times 1$ matrix.

The gradient of $f_A$ may be written as $\nabla f_A = (\nabla_{\mathbf{x}_1} f_A, \ldots, \nabla_{\mathbf{x}_k} f_A)$ where
$$\nabla_{\mathbf{x}_i} f_A(\mathbf{x}_1, \ldots, \mathbf{x}_k) = \left(\frac{\partial f_A}{\partial x^i_1}, \ldots, \frac{\partial f_A}{\partial x^i_{d_i}}\right) = A(\mathbf{x}_1, \ldots, \mathbf{x}_{i-1}, I_{d_i}, \mathbf{x}_{i+1}, \ldots, \mathbf{x}_k).$$
$I_{d_i}$ denotes the $d_i\times d_i$ identity matrix.

Hyperdeterminant

Work in $\mathbb{C}^{(d_1+1)\times\cdots\times(d_k+1)}$ for the time being ($d_i \ge 1$). Consider
$$S := \{A \in \mathbb{C}^{(d_1+1)\times\cdots\times(d_k+1)} \mid \nabla f_A(\mathbf{x}_1, \ldots, \mathbf{x}_k) = 0 \text{ for some non-zero } (\mathbf{x}_1, \ldots, \mathbf{x}_k)\}.$$

Theorem (Gelfand, Kapranov, Zelevinsky, 1992). $S$ is a hypersurface if and only if
$$d_j \le \sum_{i\ne j} d_i \quad \text{for all } j = 1, \ldots, k.$$

Let $\Delta$ be the equation of the hypersurface, i.e. a multivariate polynomial in the entries of $A$ such that
$$S = \{A \in \mathbb{C}^{(d_1+1)\times\cdots\times(d_k+1)} \mid \Delta(A) = 0\}.$$
Then $\Delta$ may be chosen to have integer coefficients.

For $\mathbb{C}^{m\times n}$, the condition becomes $m \le n$ and $n \le m$, which is why the matrix determinant is only defined for square matrices.

Since $\Delta$ has integer coefficients, $\Delta(A)$ is real-valued for $A \in \mathbb{R}^{(d_1+1)\times\cdots\times(d_k+1)}$.

Geometric View

Let $X = \{\mathbf{x}_1\otimes\cdots\otimes\mathbf{x}_k \in \mathbb{C}^{(d_1+1)\times\cdots\times(d_k+1)} \mid \mathbf{x}_i \in \mathbb{C}^{d_i+1}\}$ be the (smooth) manifold of decomposable tensors ($X$ is often called the Segre variety).

Let $A \in \mathbb{C}^{(d_1+1)\times\cdots\times(d_k+1)}$. Then the condition $\nabla f_A(\mathbf{x}_1, \ldots, \mathbf{x}_k) = 0$ for some non-zero $(\mathbf{x}_1, \ldots, \mathbf{x}_k)$ is equivalent to saying that the hyperplane orthogonal to $A$, i.e.
$$H_A := \{B \in \mathbb{C}^{(d_1+1)\times\cdots\times(d_k+1)} \mid \langle A, B\rangle = 0\},$$
contains a tangent to $X$ at the point $\mathbf{x}_1\otimes\cdots\otimes\mathbf{x}_k$. This may also be taken as an alternative definition of the hyperdeterminant $\Delta(A)$.

Projective duality: $X^* = S$.

[Figure: examples with $\Delta(A) = 0$, $\mathrm{rank}(A) = 1$ and $\Delta(B) \ne 0$, $\mathrm{rank}(B) = 2$, alongside $\Delta(A) = \Delta(B) = 0$ with $\mathrm{rank}(A) = 1$, $\mathrm{rank}(B) = 2$.]

Minor Inaccuracy

We should really be working in the projective space $\mathbb{P}(\mathbb{C}^{(d_1+1)\times\cdots\times(d_k+1)}) = \mathbb{P}^{(d_1+1)\cdots(d_k+1)-1}$. This is the set of equivalence classes
$$[A] := \{\lambda A \in \mathbb{C}^{(d_1+1)\times\cdots\times(d_k+1)} \mid \lambda \in \mathbb{C}^\times\}.$$
The thing to note is that for any $A \in \mathbb{C}^{(d_1+1)\times\cdots\times(d_k+1)}$ and $\lambda \in \mathbb{C}^\times$, $\mathrm{rank}_\otimes(\lambda A) = \mathrm{rank}_\otimes(A)$. So outer-product rank is well-defined on $\mathbb{P}(\mathbb{C}^{(d_1+1)\times\cdots\times(d_k+1)})$, i.e. given $[A] \in \mathbb{P}(\mathbb{C}^{(d_1+1)\times\cdots\times(d_k+1)})$, define $\mathrm{rank}_\otimes([A]) = \mathrm{rank}_\otimes(A)$ for any $A \in [A]$.

Examples

A. Cayley, "On the theory of linear transformation," Cambridge Math. J., 4 (1845), pp. 193–209.

The hyperdeterminant of $A = \llbracket a_{ijk}\rrbracket \in \mathbb{R}^{2\times 2\times 2}$ is
$$\Delta(A) = \frac{1}{4}\left[\det\!\left(\begin{bmatrix} a_{000} & a_{010} \\ a_{001} & a_{011} \end{bmatrix} + \begin{bmatrix} a_{100} & a_{110} \\ a_{101} & a_{111} \end{bmatrix}\right) - \det\!\left(\begin{bmatrix} a_{000} & a_{010} \\ a_{001} & a_{011} \end{bmatrix} - \begin{bmatrix} a_{100} & a_{110} \\ a_{101} & a_{111} \end{bmatrix}\right)\right]^2 - 4\det\begin{bmatrix} a_{000} & a_{010} \\ a_{001} & a_{011} \end{bmatrix}\det\begin{bmatrix} a_{100} & a_{110} \\ a_{101} & a_{111} \end{bmatrix}.$$
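As an illustration (not from the slides), Cayley's formula is straightforward to evaluate numerically. The Python/NumPy sketch below uses a hypothetical helper hyperdet_222; the slices A[0], A[1] are the transposes of the displayed matrices, which leaves every determinant, and hence Delta, unchanged.

    import numpy as np

    def hyperdet_222(A):
        # Cayley's hyperdeterminant of a 2x2x2 array with A[i,j,k] = a_ijk,
        # evaluated via the determinant formula above.
        P, Q = A[0], A[1]
        d = np.linalg.det
        return 0.25 * (d(P + Q) - d(P - Q))**2 - 4.0 * d(P) * d(Q)

    # A decomposable (rank-1) tensor lies on the hypersurface S: Delta = 0.
    x, y, z = np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([0.5, 4.0])
    print(hyperdet_222(np.einsum('i,j,k->ijk', x, y, z)))   # ~0 up to rounding

    # A generic random tensor is almost surely off the hypersurface: Delta != 0.
    print(hyperdet_222(np.random.default_rng(1).standard_normal((2, 2, 2))))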
A result that parallels the matrix case is the following: the system of bilinear equations
$$\begin{aligned}
a_{000}x_0y_0 + a_{010}x_0y_1 + a_{100}x_1y_0 + a_{110}x_1y_1 &= 0,\\
a_{001}x_0y_0 + a_{011}x_0y_1 + a_{101}x_1y_0 + a_{111}x_1y_1 &= 0,\\
a_{000}x_0z_0 + a_{001}x_0z_1 + a_{100}x_1z_0 + a_{101}x_1z_1 &= 0,\\
a_{010}x_0z_0 + a_{011}x_0z_1 + a_{110}x_1z_0 + a_{111}x_1z_1 &= 0,\\
a_{000}y_0z_0 + a_{001}y_0z_1 + a_{010}y_1z_0 + a_{011}y_1z_1 &= 0,\\
a_{100}y_0z_0 + a_{101}y_0z_1 + a_{110}y_1z_0 + a_{111}y_1z_1 &= 0,
\end{aligned}$$
has a non-trivial solution iff $\Delta(A) = 0$.

Examples

The hyperdeterminant of $A = \llbracket a_{ijk}\rrbracket \in \mathbb{R}^{2\times 2\times 3}$ is
$$\Delta(A) = \det\begin{bmatrix} a_{000} & a_{001} & a_{002} \\ a_{100} & a_{101} & a_{102} \\ a_{010} & a_{011} & a_{012} \end{bmatrix} \det\begin{bmatrix} a_{100} & a_{101} & a_{102} \\ a_{010} & a_{011} & a_{012} \\ a_{110} & a_{111} & a_{112} \end{bmatrix} - \det\begin{bmatrix} a_{000} & a_{001} & a_{002} \\ a_{100} & a_{101} & a_{102} \\ a_{110} & a_{111} & a_{112} \end{bmatrix} \det\begin{bmatrix} a_{000} & a_{001} & a_{002} \\ a_{010} & a_{011} & a_{012} \\ a_{110} & a_{111} & a_{112} \end{bmatrix}.$$

Again, the following is true: the system of bilinear equations
$$\begin{aligned}
a_{000}x_0y_0 + a_{010}x_0y_1 + a_{100}x_1y_0 + a_{110}x_1y_1 &= 0,\\
a_{001}x_0y_0 + a_{011}x_0y_1 + a_{101}x_1y_0 + a_{111}x_1y_1 &= 0,\\
a_{002}x_0y_0 + a_{012}x_0y_1 + a_{102}x_1y_0 + a_{112}x_1y_1 &= 0,\\
a_{000}x_0z_0 + a_{001}x_0z_1 + a_{002}x_0z_2 + a_{100}x_1z_0 + a_{101}x_1z_1 + a_{102}x_1z_2 &= 0,\\
a_{010}x_0z_0 + a_{011}x_0z_1 + a_{012}x_0z_2 + a_{110}x_1z_0 + a_{111}x_1z_1 + a_{112}x_1z_2 &= 0,\\
a_{000}y_0z_0 + a_{001}y_0z_1 + a_{002}y_0z_2 + a_{010}y_1z_0 + a_{011}y_1z_1 + a_{012}y_1z_2 &= 0,\\
a_{100}y_0z_0 + a_{101}y_0z_1 + a_{102}y_0z_2 + a_{110}y_1z_0 + a_{111}y_1z_1 + a_{112}y_1z_2 &= 0,
\end{aligned}$$
has a non-trivial solution iff $\Delta(A) = 0$.
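As with the 2×2×2 case, the 2×2×3 formula can be checked numerically. The following Python/NumPy sketch (not from the slides) assumes the determinant formula above is transcribed correctly; the helper name hyperdet_223 and the test tensor are illustrative only.

    import numpy as np

    def hyperdet_223(A):
        # Hyperdeterminant of a 2x2x3 array with A[i,j,k] = a_ijk, via the
        # product-of-3x3-determinants formula above, using the rows
        # r_ij = (a_ij0, a_ij1, a_ij2).
        d = np.linalg.det
        r00, r01, r10, r11 = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
        return (d(np.array([r00, r10, r01])) * d(np.array([r10, r01, r11]))
                - d(np.array([r00, r10, r11])) * d(np.array([r00, r01, r11])))

    # A decomposable (rank-1) tensor again gives Delta = 0: every row
    # r_ij = x_i * y_j * z is a multiple of z, so all four 3x3 determinants vanish.
    x, y = np.array([1.0, -2.0]), np.array([0.5, 3.0])
    z = np.array([2.0, 1.0, -1.0])
    print(hyperdet_223(np.einsum('i,j,k->ijk', x, y, z)))   # ~0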