
I: Basic Operations and Representations

1 Overview

Tensors: vectors, matrices and so on …
Definition, operators
PARAFAC/CANDECOMP, polyadic form, CP
Tucker, HOSVD

2 Different Matrix Products

Kronecker product: A = (a_1 … a_n), B = (b_1 … b_m). Matrix case:

A ⊗ B = [ a_11 B   a_12 B   …   a_1n B
          a_21 B   a_22 B   …   a_2n B
          …
          a_m1 B   a_m2 B   …   a_mn B ]

Columnwise: A ⊗ B = (a_1 ⊗ b_1   a_1 ⊗ b_2   a_1 ⊗ b_3   …   a_n ⊗ b_{m-1}   a_n ⊗ b_m)

Example: A = [1 3; 2 4], B = [5 7; 6 8]  ⇒  A ⊗ B = [5 7 15 21;  6 8 18 24;  10 14 20 28;  12 16 24 32]

3 Different Matrix Products

Vector case (row or column form):

a ⊗ b = (a_1 … a_n) ⊗ (b_1 … b_m) = (a_1 b   …   a_n b) = (a_1 b_1 ⋯ a_1 b_m   …   a_n b_1 ⋯ a_n b_m)

For a row vector a and a column vector b = (b_1, …, b_m)^T:

a ⊗ b = (a_1 … a_n) ⊗ (b_1, …, b_m)^T = (a_1 b   …   a_n b) = [ a_1 b_1  …  a_n b_1
                                                                …
                                                                a_1 b_m  …  a_n b_m ]

4 Different Matrix Products

Khatri-Rao product:

A = (a1  an ), B = (b1  bn );

A• B = (a1 ⊗b1 a2 ⊗b2 a3 ⊗b3  an−1 ⊗bn−1 an ⊗bn )

= matching columnwise only for matrices with the same of columns!

 5 21   1 3 5 7  6 24 A =  , B =  ,⇒ A• B = 2 4 6 8 10 28         12 32

5 Different Matrix Products

Hadamard product:

A = [ a_11 … a_1n          B = [ b_11 … b_1n
      …                          …
      a_m1 … a_mn ],             b_m1 … b_mn ],

A ∗ B = [ a_11 b_11   a_12 b_12   …   a_1n b_1n
          a_21 b_21   a_22 b_22   …   a_2n b_2n
          …
          a_m1 b_m1   a_m2 b_m2   …   a_mn b_mn ]

only for matrices of equal size!

6 Definition as multi-indexed object

One index, vector:    x = (x_i)_{i=1}^{n} = (x_{i_1})_{i_1=1}^{n_1} = (x_1, …, x_n)^T   or   (x_1 … x_n)

Two indices, matrix:  A = (A_{i,j})_{i=1,j=1}^{n,m} = (A_{i_1,i_2})_{i_1=1,i_2=1}^{n_1,n_2} = [ a_{1,1} … a_{1,m};  … ;  a_{n,1} … a_{n,m} ]

Three indices, cube:  A = (A_{i,j,k})_{i=1,j=1,k=1}^{n,m,l} = (A_{i_1,i_2,i_3})_{i_1=1,i_2=1,i_3=1}^{n_1,n_2,n_3}

Multi-index:          x = (x_{i_1 i_2 … i_N})_{i_1=1,…,i_N=1}^{n_1,n_2,…,n_N}

7 Motivation: Why tensors?

PDE for two-dimensional problems:

(a u_x)_x + (b u_y)_y = f(x, y)

Discretization in 2D:  ( a u_{i-1,j} + a u_{i+1,j} − 2(a+b) u_{i,j} + b u_{i,j-1} + b u_{i,j+1} ) / h² = f_{i,j}

u_{i,j} can be seen as a vector or as a 2-way tensor = matrix.

Linear system Au=f with A:

A_{ij,km} u_{km} = f_{ij}   (summation over k, m)

So the matrix A_{ij,km} can also be seen as a 4-way tensor.
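As an aside (not from the slides), this Kronecker structure is easy to see in code: for constant coefficients a, b on a uniform n×n grid with Dirichlet boundary, the discretized 2D operator is a Kronecker sum of 1D difference matrices. The grid size and coefficients below are made-up illustration values.

```python
import numpy as np

def second_difference(n, h):
    """1D three-point stencil (-1, 2, -1) / h^2 with Dirichlet boundary."""
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

n, h, a, b = 4, 0.2, 1.0, 2.0
T = second_difference(n, h)
I = np.eye(n)

# operator for -(a u_xx + b u_yy) acting on u vectorized columnwise
A2d = a * np.kron(I, T) + b * np.kron(T, I)      # (n^2) x (n^2) matrix

# the same data viewed as a 4-way tensor A_{ij,km}
A4 = A2d.reshape(n, n, n, n)
print(A2d.shape, A4.shape)                       # (16, 16) (4, 4, 4, 4)
```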

8 Motivation: Why tensors?

PDE with a number of additional parameters, high-dimensional problems:

a u_xx + b u_yy + c u_zz = f   for discrete sets of the parameters (indexed by i, j, k)

This leads to a matrix A_{mn} for each parameter combination (i, j, k):  A_{mn,ijk}

Classical matrix/vector problems, but of huge size: represent the vector/matrix by a tensor with an efficient representation,  x_i = x_{i_1…i_N}.

9 Graphical Notation

Vector (1 leg):   (x_i)_i  ↔  node with one leg i

Matrix (2 legs):  (a_{ij})_{i,j}  ↔  node with two legs i, j

Cube (3 legs):    (x_{ijk})_{i,j,k}  ↔  node with three legs i, j, k

General tensor with N legs:  (x_{i_1…i_N})_{i_1,…,i_N}  ↔  node with legs i_1, i_2, …, i_N

10 Graphical Notation

Matrix-vector product – contraction over index i:  (a_{ij})_{i,j} · (x_i)_i = (y_j)_j  ↔  the two nodes are joined along the shared leg i, leaving the free leg j

∑_i a_{ij} x_i = a_{ij} x_i = y_j

Einstein notation: shared indices are contracted via summation. No distinction between covariant and contravariant indices!

11 Basic Operations

Contraction ∑_{i_1} x_{i_1} y_{i_1} gives a scalar z.

x_{i_1} y_{i_2} (no contraction) gives the 2-tensor z_{i_1 i_2} = x_{i_1} y_{i_2}.

More generally:  ∑_{i_n} x_{i_1…i_n…i_N} · y_{i'_1…i'_{n-1} i_n i'_{n+1}…i'_M} = z_{i_1…i_{n-1} i_{n+1}…i_N  i'_1…i'_{n-1} i'_{n+1}…i'_M}

(Graphically: the shared leg i_n connects the two nodes; the free legs i_1…i_{n-1}, i_{n+1}…i_N and i'_1…i'_{n-1}, i'_{n+1}…i'_M remain.)

12 Tensor as data hive of different form

kron(x, y) = x ⊗ y = (x_1 y^T  …  x_n y^T)^T = (x_{i_1} y_{i_2})_{i_1,i_2}   seen as a column vector

x y^T = [ x_1 y_1 … x_1 y_m;  … ;  x_n y_1 … x_n y_m ] = (x_{i_1} y_{i_2})_{i_1,i_2}   seen as a matrix
      = kron(y^T, x) = y^T ⊗ x

y x^T = [ y_1 x_1 … y_1 x_n;  … ;  y_m x_1 … y_m x_n ] = (x_{i_1} y_{i_2})_{i_1,i_2}   seen as a matrix
      = kron(x^T, y) = x^T ⊗ y

x ∘ y = (x_{i_1} y_{i_2})_{i_1,i_2}   seen as a two-leg tensor

13 Matrix

Matrix: A = (A_{i_1 i_2})  ↔  node with two legs i_1, i_2

Operations (contractions):

∑_{i_1} A_{i_1 i_2} x_{i_1} = z_{i_2}

∑_{i_2} A_{i_1 i_2} y_{i_2} = z_{i_1}

∑_{i_1,i_2} A_{i_1 i_2} x_{i_1} y_{i_2} = z   (a scalar)

∑_{i_2} A_{i_1 i_2} B_{i_2 i_3} = C_{i_1 i_3}

Tensor product:  A_{i_1 i_2} x_{i_3} = C_{i_1 i_2 i_3}

14 Three Legs as Standard Example

A = (A_{i_1 i_2 i_3})  ↔  node with three legs i_1, i_2, i_3

Operations: contractions in i_1, i_2, i_3 or combinations give a tensor with fewer legs.

Tensor product gives tensor with more legs.

See the tensor as
- a collection of vectors (fibers): (a_{i_1})_{i_2 i_3}
- a collection of matrices (slices): (A_{i_1 i_2})_{i_3}
- a large matrix (unfolding): A_{{i_1 i_2} i_3} = A_{j_1 i_3}

Operations between tensors are defined by contracted indices.

15 Fibers

A: 3 × 4 × 2 tensor with entries 1, …, 24; frontal slices [1 4 7 10; 2 5 8 11; 3 6 9 12] and [13 16 19 22; 14 17 20 23; 15 18 21 24].

Mode-1 fibers X_{:,j,k}: the columns of [1 4 7 10 13 16 19 22; 2 5 8 11 14 17 20 23; 3 6 9 12 15 18 21 24]

Mode-2 fibers X_{i,:,k}: the columns of [1 2 3 13 14 15; 4 5 6 16 17 18; 7 8 9 19 20 21; 10 11 12 22 23 24]

Mode-3 fibers X_{i,j,:}: the columns of [1 2 3 4 5 6 7 8 9 10 11 12; 13 14 15 16 17 18 19 20 21 22 23 24]

16 Slices

A: 3 × 4 × 2 tensor with entries 1, …, 24 as above.

Frontal slices X_{:,:,k} (indices 1, 2 free): [1 4 7 10; 2 5 8 11; 3 6 9 12] and [13 16 19 22; 14 17 20 23; 15 18 21 24]

Lateral slices X_{:,j,:} (indices 1, 3 free): [1 13; 2 14; 3 15], [4 16; 5 17; 6 18], [7 19; 8 20; 9 21], [10 22; 11 23; 12 24]

Horizontal slices X_{i,:,:} (indices 2, 3 free): [1 13; 4 16; 7 19; 10 22], [2 14; 5 17; 8 20; 11 23], [3 15; 6 18; 9 21; 12 24]

17 Matricization

A: 3 × 4 × 2 tensor with entries 1, …, 24 as above.

Mode-1 unfolding:  A_(1) = A_{i_1 {i_2 i_3}} = A_{i_1 j_1},  j_1 = i_2 + n_2 (i_3 − 1):
A_(1) = [1 4 7 10 13 16 19 22; 2 5 8 11 14 17 20 23; 3 6 9 12 15 18 21 24]

Mode-2 unfolding:  A_(2) = A_{i_2 {i_1 i_3}} = [1 2 3 13 14 15; 4 5 6 16 17 18; 7 8 9 19 20 21; 10 11 12 22 23 24]

Mode-3 unfolding:  A_(3) = [1 2 3 4 5 6 7 8 9 10 11 12; 13 14 15 16 17 18 19 20 21 22 23 24]

Vectorization:  vec(A) = (1 2 … 23 24)^T
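A short numpy sketch of these unfoldings (an illustration, not from the slides); the Fortran-order reshape reproduces exactly the mode-n unfolding convention used above:

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding: the mode-n fibers become the columns of the result."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1, order='F')

# the 3 x 4 x 2 example tensor with entries 1..24 (column-major ordering)
A = np.arange(1, 25).reshape(3, 4, 2, order='F')

print(unfold(A, 0))            # 3 x 8  mode-1 unfolding
print(unfold(A, 1))            # 4 x 6  mode-2 unfolding
print(unfold(A, 2))            # 2 x 12 mode-3 unfolding
print(A.flatten(order='F'))    # vectorization vec(A) = (1, 2, ..., 24)
```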

18 General Matricization

Tensor A_{i_1…i_n i_{n+1}…i_N}  →  matrix A_{{i_1…i_n}{i_{n+1}…i_N}} = A_{ij} with

i = i_1 + n_1 (i_2 − 1) + n_1 n_2 (i_3 − 1) + … + n_1 ⋯ n_{n-1} (i_n − 1),

j = i_{n+1} + n_{n+1} (i_{n+2} − 1) + n_{n+1} n_{n+2} (i_{n+3} − 1) + … + n_{n+1} ⋯ n_{N-1} (i_N − 1),

or with any partitioning of the indices into two groups (rows/columns).

General remark on notation: many properties/operations with tensors are formulated using totally different notations! ►, ◄, ʘ, ⊗, •, ∘, ×, …

19 Transformation

Tensor A = ∑_{i=1}^{I} ∑_{j=1}^{J} ∑_{k=1}^{K} A_{ijk} e_i^{(1)} ⊗ e_j^{(2)} ⊗ e_k^{(3)}

Change of basis e_i^{(l)} = Q^{(l)} e'_i^{(l)}:

A'_{pqr} = ( ∑_{i,j,k} A_{ijk} (Q^{(1)} e'_i^{(1)}) ⊗ (Q^{(2)} e'_j^{(2)}) ⊗ (Q^{(3)} e'_k^{(3)}) )_{pqr} = ∑_{i=1}^{I} ∑_{j=1}^{J} ∑_{k=1}^{K} Q^{(1)}_{pi} Q^{(2)}_{qj} Q^{(3)}_{rk} A_{ijk}

Notation:  A' = (Q^{(1)}, Q^{(2)}, Q^{(3)}) · A

20 n-Mode Product of Tensor with Matrix

Tensor A_{i_1…i_n…i_N}, matrix U_{j i_n}:

(A ×_n U)_{i_1…i_{n-1} j i_{n+1}…i_N} = ∑_{i_n=1}^{I_n} a_{i_1…i_N} · u_{j i_n} =: B_{i_1…j…i_N}

Contraction over i_n; the index i_n is replaced by the new index j := i'_n.

In the n-mode product each mode-n fiber is multiplied by the matrix U:  B_{i_1…i_{n-1},:,i_{n+1}…i_N} = U · A_{i_1…i_{n-1},:,i_{n+1}…i_N}

Useful relation between the n-mode product and the mode-n unfolding:

B_(n) = U · A_(n):  unfold tensor A to a matrix, multiply by U, fold back to tensor B.
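A small sketch of this unfold–multiply–fold recipe (illustration only, with the unfolding convention from above), checked against a direct einsum contraction:

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1, order='F')

def fold(M, n, shape):
    """Inverse of unfold: reshape back and move the mode to position n."""
    moved = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape(moved, order='F'), 0, n)

def mode_n_product(A, U, n):
    """B = A x_n U computed via B_(n) = U . A_(n)."""
    shape = list(A.shape)
    shape[n] = U.shape[0]
    return fold(U @ unfold(A, n), n, shape)

A = np.random.default_rng(0).standard_normal((3, 4, 2))
U = np.random.default_rng(1).standard_normal((5, 4))
B = mode_n_product(A, U, 1)                               # contract the second mode with U
print(np.allclose(B, np.einsum('ijk,lj->ilk', A, U)))     # True
```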

21 n-Mode Products

For multiple n-mode products in different modes the order is irrelevant:

n ≠ m:  A ×_m U ×_n V = A ×_n V ×_m U

∑_{i_m} ( ∑_{i_n} A_{i_1…i_n…i_m…i_N} U_{j i_n} ) V_{k i_m}
= ∑_{i_n,i_m} A_{i_1…i_n…i_m…i_N} U_{j i_n} V_{k i_m} = ∑_{i_m,i_n} A_{i_1…i_n…i_m…i_N} V_{k i_m} U_{j i_n}
= ∑_{i_n} ( ∑_{i_m} A_{i_1…i_n…i_m…i_N} V_{k i_m} ) U_{j i_n}

For a matrix A:  B = A ×_1 U ×_2 V  ⇔  B_{jk} = ∑_{i_1,i_2} A_{i_1 i_2} U_{j i_1} V_{k i_2},  i.e.  B = U · A · V^T;  especially  A ×_1 U = U · A  and  A ×_2 V = A · V^T.

22 n-Mode Products

For multiple n-mode products with the same n the order is relevant:

A ×_n U ×_n V = A ×_n (V U)

∑_{i'_n} ( ∑_{i_n} A_{i_1…i_n…i_N} U_{i'_n i_n} ) V_{k i'_n}
= ∑_{i_n} A_{i_1…i_n…i_N} ∑_{i'_n} U_{i'_n i_n} V_{k i'_n} = ∑_{i_n} A_{i_1…i_n…i_N} ∑_{i'_n} V_{k i'_n} U_{i'_n i_n}
= ∑_{i_n} A_{i_1…i_n…i_N} W_{k i_n} = B_{i_1…k…i_N}   with W = V U

Matrix case:  A ×_1 U ×_1 V = V · U · A = (V U) · A,   A ×_2 U ×_2 V = A · U^T · V^T = A · (V U)^T

23 n-Mode Product with Vector

n-mode vector product of tensor A with vector v: Compute all inner products of mode-n fibers with v.

A ×_n v = ( ∑_{i_n=1}^{n_n} A_{i_1…i_n…i_N} v_{i_n} )_{i_1…i_{n-1} i_{n+1}…i_N}

A ×_n v ×_m u = (A ×_n v) ×_{m-1} u = (A ×_m u) ×_n v =

= ( ∑_{i_n=1}^{n_n} ∑_{i_m=1}^{n_m} A_{i_1…i_n…i_m…i_N} v_{i_n} u_{i_m} )_{i_1…i_{n-1} i_{n+1}…i_{m-1} i_{m+1}…i_N}   for n < m

After contracting i_n, mode m becomes mode m−1.  Matrix case:  A ×_1 v = v^T · A,  A ×_2 v = A · v

24 Properties

(1)  (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD),

(2)  (A ⊗ B)^{-T} = A^{-T} ⊗ B^{-T},

(3)  A • B • C = (A • B) • C = A • (B • C),

(4)  (A • B)^T (A • B) = (A^T A) ∗ (B^T B),   (A • B)^† = ((A^T A) ∗ (B^T B))^{-1} (A • B)^T   (Moore–Penrose pseudoinverse)
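A quick numerical check of these identities (an illustration, not part of the slides), with the Khatri-Rao product built column by column as above; the pseudoinverse identity in (4) assumes A • B has full column rank, which holds for generic random matrices:

```python
import numpy as np

def khatri_rao(A, B):
    return np.vstack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])]).T

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 3)), rng.standard_normal((5, 3))
C, D = rng.standard_normal((3, 2)), rng.standard_normal((3, 2))
AB = khatri_rao(A, B)

print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))   # property (1)
print(np.allclose(AB.T @ AB, (A.T @ A) * (B.T @ B)))                       # property (4), first identity
print(np.allclose(np.linalg.pinv(AB),
                  np.linalg.inv((A.T @ A) * (B.T @ B)) @ AB.T))            # property (4), pseudoinverse
```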

25 Proofs

(1):  (A ⊗ B)(C ⊗ D) = [ a_11 B … a_1n B;  … ;  a_m1 B … a_mn B ] · [ c_11 D … c_1k D;  … ;  c_n1 D … c_nk D ] =

= [ (a_11 c_11 + … + a_1n c_n1) BD   … ;   …   ] =

= [ (AC)_11 BD   … ;   …   ] = (AC) ⊗ (BD)

26 Proofs

(2):  (A ⊗ B)^{-T} = A^{-T} ⊗ B^{-T}

(A ⊗ B)(A^{-1} ⊗ B^{-1}) = (A A^{-1}) ⊗ (B B^{-1}) = I ⊗ I = I

(A ⊗ B)^T = [ a_11 B … a_1n B;  … ;  a_m1 B … a_mn B ]^T = [ a_11 B^T … a_m1 B^T;  … ;  a_1n B^T … a_mn B^T ] = A^T ⊗ B^T

27 Proofs

(3):  A • B • C = (A • B) • C = A • (B • C)

(A • B) • C = (a_1 ⊗ b_1  …  a_n ⊗ b_n) • C =

= ((a_1 ⊗ b_1) ⊗ c_1  …  (a_n ⊗ b_n) ⊗ c_n) =

= (a_1 ⊗ b_1 ⊗ c_1  …  a_n ⊗ b_n ⊗ c_n) = A • (B • C),  because (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C) = A ⊗ B ⊗ C

28 Proofs

(4):  (A • B)^† = ((A^T A) ∗ (B^T B))^{-1} (A • B)^T follows from  (A^T A) ∗ (B^T B) = (A • B)^T (A • B):

(A^T A) ∗ (B^T B) = (a_i^T a_j)_{ij} ∗ (b_i^T b_j)_{ij} = ((a_i^T a_j)(b_i^T b_j))_{ij} = ((a_i^T ⊗ b_i^T)(a_j ⊗ b_j))_{ij} =

= (a_1 ⊗ b_1  …  a_n ⊗ b_n)^T (a_1 ⊗ b_1  …  a_n ⊗ b_n) = (A • B)^T (A • B)

29 n-Mode Products of a Tensor with Matrices

General relation between n-mode product, mode-n unfolding and Kronecker (tensor) product:

Y = A ×_1 U^{(1)} ×_2 U^{(2)} ⋯ ×_N U^{(N)}  ⇔  Y_(n) = U^{(n)} · A_(n) · (U^{(N)} ⊗ ⋯ ⊗ U^{(n+1)} ⊗ U^{(n-1)} ⊗ ⋯ ⊗ U^{(1)})^T

Y = A ×_1 U^{(1)} ×_2 U^{(2)} ⋯ ×_N U^{(N)} = ∑_{i_1,…,i_N} A_{i_1…i_N} U^{(1)}_{j_1 i_1} ⋯ U^{(N)}_{j_N i_N} = B_{j_1…j_N}

N = 2:  Y = A ×_1 U^{(1)} ×_2 U^{(2)} = U^{(1)} A (U^{(2)})^T
Y_(1) = (U^{(1)} A (U^{(2)})^T)_(1) = U^{(1)} A_(1) (U^{(2)})^T
Y_(2) = (U^{(1)} A (U^{(2)})^T)_(2) = (U^{(1)} A (U^{(2)})^T)^T = U^{(2)} A^T (U^{(1)})^T = U^{(2)} A_(2) (U^{(1)})^T

30 n-Mode Products of a Tensor with Matrices

Y_(1) = (A ×_1 U^{(1)} ×_2 U^{(2)} ×_3 U^{(3)})_(1) =
= ( ∑_{i_1,i_2,i_3} A_{i_1 i_2 i_3} U^{(1)}_{j_1 i_1} U^{(2)}_{j_2 i_2} U^{(3)}_{j_3 i_3} )_(1) =
= ( ∑_{i_1} U^{(1)}_{j_1 i_1} ( ∑_{i_2,i_3} A_{i_1 i_2 i_3} U^{(2)}_{j_2 i_2} U^{(3)}_{j_3 i_3} ) )_(1) =
= ( ∑_{i_1} B_{i_1 j_2 j_3} U^{(1)}_{j_1 i_1} )_(1) = (B ×_1 U^{(1)})_(1) = U^{(1)} B_(1) =
= U^{(1)} ( ∑_{i_2,i_3} A_{i_1 i_2 i_3} U^{(3)}_{j_3 i_3} U^{(2)}_{j_2 i_2} )_(1) =
= U^{(1)} ∑_{k} A_{i_1 k} (U^{(3)} ⊗ U^{(2)})_{r k} = U^{(1)} A_(1) (U^{(3)} ⊗ U^{(2)})^T

with k = {i_2 i_3},  r = {j_2 j_3}

31 Rank of a tensor (3-leg case)

Rank-1 tensor:

(X_{ijk}) = (a ∘ b ∘ c)   3-dimensional

(X_{ijk}) = (a ⊗ b ⊗ c)   as a vector

with vectors a, b, and c:

X_{ijk} = a_i b_j c_k

32 Rank-R tensor for the 3-leg case: PARAFAC (parallel factors), CANDECOMP (canonical decomposition), polyadic form → CP (CANDECOMP/PARAFAC)

A ≈ a sum of rank-1 terms:

(A_{ijk}) = (u_1 ∘ v_1 ∘ w_1) + (u_2 ∘ v_2 ∘ w_2) + (u_3 ∘ v_3 ∘ w_3) + …

A_{ijk} = ∑_{r=1}^{R} u_{ri} v_{rj} w_{rk}

The tensor rank R of the tensor (A_{ijk}) is the minimal number of rank-1 terms necessary for representing A.

33 Rank representation

A = ∑_{r=1}^{R} u_r ∘ v_r ∘ w_r =
= ∑_{r=1}^{R} ( ∑_{i=1}^{I} u_{ir} e_i^{(1)} ) ∘ ( ∑_{j=1}^{J} v_{jr} e_j^{(2)} ) ∘ ( ∑_{k=1}^{K} w_{kr} e_k^{(3)} ) =
= ∑_{i,j,k} ( ∑_{r=1}^{R} u_{ir} v_{jr} w_{kr} ) e_i^{(1)} ∘ e_j^{(2)} ∘ e_k^{(3)}

With matrices U, V, and W we can write

A_{ijk} = ∑_{r=1}^{R} u_{ir} v_{jr} w_{kr} = ∑_{p,q,t=1}^{R} u_{ip} v_{jq} w_{kt} δ_{pqt},  i.e.  A = (U, V, W) · I

with I the 3-way tensor with 1 on the main diagonal. U, V, W describe the basis transformation between A and I.

34 Notation

Let U, V, and W be the matrices built from the vectors u_r, v_r, and w_r. Then we can write
A_(1) = U (W • V)^T,
A_(2) = V (W • U)^T,
A_(3) = W (V • U)^T.

For the frontal slices A^{(k)} of a three-leg tensor:

A^{(k)} = U D^{(k)} V^T,   D^{(k)} = diag(w_k)   (w_k the k-th row of W)

Short notation:  A = [[U, V, W]] = ∑_{k=1}^{R} u_k ∘ v_k ∘ w_k

Or, more generally, with factors λ:  A = [[λ; U, V, W]] = ∑_{k=1}^{R} λ_k u_k ∘ v_k ∘ w_k

35 Proof

Two-leg tensor:  (u ∘ v)_(1) = u · v^T = (u_1, …, u_n)^T · (v_1 … v_m)

One 3-leg term:  (u ∘ (v ∘ w))_(1) = u · (w ⊗ v)^T = u · (w • v)^T

General 3-leg case:  ( ∑_{r=1}^{R} u_r ∘ v_r ∘ w_r )_(1) = ∑_{r=1}^{R} (u_r ∘ (v_r ∘ w_r))_(1) = ∑_{r=1}^{R} u_r · (w_r • v_r)^T =
= (u_1 … u_R) · (w_1 • v_1  …  w_R • v_R)^T = U (W • V)^T

36 General N-way tensor

A = [[U^{(1)}, U^{(2)}, …, U^{(N)}]] = ∑_{k=1}^{R} u_{1,k} ∘ u_{2,k} ∘ … ∘ u_{N,k}

A = [[λ; U^{(1)}, U^{(2)}, …, U^{(N)}]] = ∑_{k=1}^{R} λ_k u_{1,k} ∘ u_{2,k} ∘ … ∘ u_{N,k}

Mode-n matrix formula:

A_(n) = U^{(n)} Λ (U^{(N)} • … • U^{(n+1)} • U^{(n-1)} • … • U^{(1)})^T

with Λ = diag(λ)
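A small numpy check of this mode-n formula for a 3-way example (an illustration, not from the slides); `unfold` and `khatri_rao` are the same helper sketches used earlier:

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1, order='F')

def khatri_rao(A, B):
    return np.vstack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])]).T

I, J, K, R = 4, 3, 5, 2
rng = np.random.default_rng(1)
U, V, W = rng.standard_normal((I, R)), rng.standard_normal((J, R)), rng.standard_normal((K, R))
lam = rng.standard_normal(R)

# build A = [[lam; U, V, W]] = sum_r lam_r * u_r o v_r o w_r
A = np.einsum('r,ir,jr,kr->ijk', lam, U, V, W)

# mode-1 formula: A_(1) = U . Lambda . (W • V)^T
print(np.allclose(unfold(A, 0), U @ np.diag(lam) @ khatri_rao(W, V).T))    # True
```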

37 Proof:

3-leg tensor, proof as before:

( ∑_r λ_r u_r ∘ v_r ∘ w_r )_(1) = U Λ (W • V)^T

In general:

U^{(1)} Λ (U^{(N)} • ⋯ • U^{(2)})^T =

= U^{(1)} (λ_1 U^{(N)}_1 ⊗ ⋯ ⊗ U^{(2)}_1   …   λ_R U^{(N)}_R ⊗ ⋯ ⊗ U^{(2)}_R)^T = ( ∑_{r=1}^{R} λ_r U^{(1)}_r ⊗ U^{(2)}_r ⊗ ⋯ ⊗ U^{(N)}_r )_(1)

38 Low rank approximation

A_{i_1…i_N} = ∑_{k=1}^{R} a_{k i_1} ⋯ a_{k i_N} ≈ ∑_{k=1}^{r} b_{k i_1} ⋯ b_{k i_N}

(1) For R large enough, every A can be represented in CP form.

(2) For a given A there is a minimum R with this property.

(3) Approximate A as well as possible using only r < R terms.

39 PARAFAC Graphical

A_{i_1,i_2,i_3,…,i_N}  ≈  ∑_{r=1}^{M} A_{1,i_1,r} · A_{2,i_2,r} ⋯ A_{N,i_N,r}

(Graphically: the N legs i_1, …, i_N each attach to a factor A_{n,i_n,r}, all joined through the shared summation index r.)

40 Norm etc.

Inner product:  ⟨A, B⟩ = ∑_{i_1,…,i_N=1}^{n_1,…,n_N} A_{i_1…i_N} B_{i_1…i_N}

Norm:  ‖A‖ = ( ∑_{i_1,…,i_N=1}^{n_1,…,n_N} A_{i_1…i_N}² )^{1/2}

Rank-one tensor:  A = a^{(1)} ∘ a^{(2)} ∘ … ∘ a^{(N)} with vectors a^{(j)},  i.e.  A_{i_1…i_N} = a^{(1)}_{i_1} a^{(2)}_{i_2} ⋯ a^{(N)}_{i_N}

Diagonal tensor:  A_{i_1…i_N} ≠ 0 only if i_1 = i_2 = … = i_N

41

A tensor is called cubical if every mode has the same size, n_1 = n_2 = … = n_N.

A cubical tensor is called supersymmetric if its elements remain constant under any permutation of the indices:

A_{i_1…i_N} = A_{i_{π(1)}…i_{π(N)}}

A tensor is partially symmetric if it is symmetric in some modes, e.g. a three-way tensor whose frontal slices are all symmetric matrices.

42 Example

A_{111} = 1,  A_{112} = A_{121} = A_{211} = 2,  A_{122} = A_{212} = A_{221} = 3,  A_{222} = 4.

Frontal slices:  A_{:,:,1} = [1 2; 2 3],  A_{:,:,2} = [2 3; 3 4]   (a supersymmetric 2 × 2 × 2 tensor)

43 Results on tensor rank

A_{i_1…i_N} = ∑_{k=1}^{R} a_{k i_1} ⋯ a_{k i_N} with minimal R; mode sizes n_1, …, n_N with n_j ≤ n.

General N-way tensor:  R = rank ≤ n^{N-1}

Proof: Assume n_N = n = max_j n_j.

A = ∑_{i_1,…,i_N} A_{i_1…i_N} e^{(1)}_{i_1} ⊗ ⋯ ⊗ e^{(N)}_{i_N} = ∑_{i_1,…,i_{N-1}} e^{(1)}_{i_1} ⊗ ⋯ ⊗ e^{(N-1)}_{i_{N-1}} ⊗ ( ∑_{i_N} A_{i_1…i_N} e^{(N)}_{i_N} ),

where the summation runs over at most ∏_{j=1}^{N-1} n_j ≤ n^{N-1} rank-1 terms.

44 Results on tensor rank

The true rank might be much smaller:

The maximum rank of a 3-leg 3×3×3 tensor over ℝ is bounded by 5.

For a general 3-leg I×J×K tensor A the maximum rank is bounded by rank(A) ≤ min{IJ, IK, JK}.

For a general 3-leg I×J×2 tensor A the maximum rank is bounded by rank(A) ≤ min{I, J} + min{I, J, ⌊max{I, J}/2⌋}.

The typical rank of a 3-leg 5×3×3 tensor over ℝ is 5 or 6.

45 Results on tensor rank

Example:  A = a ⊗ a + a ⊗ b + b ⊗ a + b ⊗ b

with linearly independent a and b: 4 linearly independent rank-1 terms, so rank ≤ 4, but A = (a + b) ⊗ (a + b) has rank 1.

Theorem:  rank(A) = 3 for A = v_1 ⊗ v_2 ⊗ w_3 + v_1 ⊗ w_2 ⊗ v_3 + w_1 ⊗ v_2 ⊗ v_3 with linearly independent v_j, w_j.

Proof: (1) rank(A) = 0  ⇒  A = 0  ⇒  v_1 ⊗ a = w_1 ⊗ b with nonzero a, b — contradiction!

(2) rank(A) = 1  ⇒

u ⊗ v ⊗ w = v_1 ⊗ v_2 ⊗ w_3 + v_1 ⊗ w_2 ⊗ v_3 + w_1 ⊗ v_2 ⊗ v_3

46

Assume a linear functional φ_1 with φ_1(v_1) = 1, set φ := φ_1 ⊗ id ⊗ id, and apply it to the above equation:

φ_1(u) v ⊗ w = v_2 ⊗ w_3 + w_2 ⊗ v_3 + φ_1(w_1) v_2 ⊗ v_3 =

= v_2 ⊗ w_3 + (w_2 + φ_1(w_1) v_2) ⊗ v_3.  The left side is a rank-1 matrix, the right side a rank-2 matrix — contradiction!

(3) rank(A) = 2:

u ⊗ v ⊗ w + u' ⊗ v' ⊗ w' = v_1 ⊗ v_2 ⊗ w_3 + v_1 ⊗ w_2 ⊗ v_3 + w_1 ⊗ v_2 ⊗ v_3.  If u and u' are linearly dependent there is a functional with

φ_1(u) = φ_1(u') = 0 and φ_1(v_1) ≠ 0 or φ_1(w_1) ≠ 0:

47

0 = (φ_1 ⊗ id ⊗ id)(A) = φ_1(v_1)(v_2 ⊗ w_3 + w_2 ⊗ v_3) + φ_1(w_1) v_2 ⊗ v_3

But these terms are linearly independent — contradiction!

Hence u and u' have to be linearly independent, and one of the vectors u or u' must be linearly independent of v_1, say u' is linearly independent of v_1. Choose a functional with φ_1(v_1) = 1, φ_1(u') = 0:

φ_1(u) v ⊗ w = (v_2 ⊗ w_3 + w_2 ⊗ v_3) + φ_1(w_1) v_2 ⊗ v_3

Again, the left-hand-side matrix has rank ≤ 1, while the right-hand-side matrix has rank 2 — contradiction!

48

For a supersymmetric tensor we can define the symmetric rank:

rank_S(A) = min{ r : A = ∑_{k=1}^{r} a_k ∘ a_k ∘ … ∘ a_k }

Example:  A = (1,2)^{⊗3} = (1,2) ⊗ [1 2; 2 4],   B = (1,1)^{⊗3} = (1,1) ⊗ [1 1; 1 1]

A + B has the frontal slices [2 3; 3 5] and [3 5; 5 9].

A + B is supersymmetric of symmetric rank 2.

Rank 1?  (a, b)^{⊗3} = A + B gives 4 equations for 2 unknowns a, b:
a³ = 2,  b³ = 9,  a²b = 3,  ab² = 5 — no solution, so the symmetric rank is 2, not 1.

49 Smallest Typical Rank of 3-way Tensors

         K = 2             K = 3          K = 4
   J =   2   3   4   5 |   3   4   5 |   4   5
I =  2   2   3   4   4 |   3   4   5 |   4   5
     3   3   3   4   5 |   5   5   5 |   6   6
     4   4   4   4   5 |   5   6   6 |   7   8
     5   4   5   5   5 |   5   6   8 |   8   9
     6   4   6   6   6 |   6   7   8 |   8  10
     7   4   6   7   7 |   7   7   9 |   9  10
     8   4   6   8   8 |   8   8   9 |  10  11
     9   4   6   8   9 |   9   9   9 |  10  12
    10   4   6   8  10 |   9  10  10 |  10  12
    11   4   6   8  10 |   9  11  11 |  11  13
    12   4   6   8  10 |   9  12  12 |  12  13

DOF: R(I + J + K − 2)   ⇒   Expected rank: IJK / (I + J + K − 2)

50 Examples

Strassen's algorithm can be obtained by considering a 3-leg tensor of rank 7 (Hackbusch, p. 69):

[ a_1 a_2; a_3 a_4 ] · [ b_1 b_2; b_3 b_4 ] = [ c_1 c_2; c_3 c_4 ]   with submatrices a_j, b_j, c_j

c_ν = ∑_{μ,λ=1}^{4} t_{ν,μ,λ} a_μ b_λ;   the tensor t is of rank 7.

51 Matrix case: SVD

For a tensor that is a vector, the rank is 1.

For a tensor that is an n×m matrix, the rank is given by the decomposition

A = U Σ V^T = ∑_{i=1}^{r} σ_i (u_i v_i^T) = ∑_{i=1}^{r} σ_i (u_i ⊗ v_i),   r = the number of nonzero singular values.

For a low-rank approximation we can delete the small singular values.

52 Uniqueness of CP

Matrix case: A an n×m matrix of rank r:  A = U_{n,r} V_{r,m} = ∑_{k=1}^{r} u_k ∘ v_k.  Every factorization of this form gives a CP representation, e.g. via QR or SVD.

In the matrix case (2-leg case) the rank representation is not unique!

53 Uniqueness 3 leg case

Let A be a three-way tensor of rank R:  A = [[U, V, W]] = ∑_{k=1}^{R} u_k ∘ v_k ∘ w_k

Uniqueness means: any other rank-R representation coincides with this one up to permutation and up to scaling of the factors:

A = [[U, V, W]] = [[UΠ, VΠ, WΠ]]   for any R×R permutation matrix Π

A = ∑_{k=1}^{R} (α_k u_k) ∘ (β_k v_k) ∘ (γ_k w_k)   with α_k β_k γ_k = 1 for k = 1, …, R

54 k-rank of a matrix

The k-rank of a matrix A - denoted by kA – is the maximum number k such that any k columns of A are linearly independent.

For T = [[A, B, C]] of rank R, the CP representation of T is unique if  k_A + k_B + k_C ≥ 2R + 2.

55

Let T be an I×J×K tensor.

Then the CP representation of T is unique if min{I, R}+ min{J, R}+ min{K, R} ≥ 2R + 2

For R≤K the CP representation of T is unique if 2R(R −1) ≤ I(I −1)J (J −1)

The CP representation is unique for an N-way rank R tensor

A = [[A^{(1)}, A^{(2)}, …, A^{(N)}]] = ∑_{k=1}^{R} a^{(1)}_k ∘ a^{(2)}_k ∘ … ∘ a^{(N)}_k   if   ∑_{n=1}^{N} k_{A^{(n)}} ≥ 2R + (N − 1)

56 Approximation of a tensor by CP

Matrix case: trivial via SVD — keep the larger singular values and replace the smaller ones by 0.

For 3-way tensors this is not so easy. Especially, for

A = ∑_{k=1}^{R} λ_k u_k ∘ v_k ∘ w_k,

summing up r of these terms will in general not give a good rank-r approximation.

For finding the best rank-r approximation we have to determine all factors simultaneously!

57 Rank-r approximation

The situation is even worse: the best rank-r approximation might not even exist!

Consider  A = u_1 ∘ v_1 ∘ w_2 + u_1 ∘ v_2 ∘ w_1 + u_2 ∘ v_1 ∘ w_1,  where the matrices U, V, and W have linearly independent columns.

Approximation by rank-2 tensors:

B_α = α ( u_1 + (1/α) u_2 ) ∘ ( v_1 + (1/α) v_2 ) ∘ ( w_1 + (1/α) w_2 ) − α ( u_1 ∘ v_1 ∘ w_1 )

A − B_α = −(1/α) ( u_2 ∘ v_2 ∘ w_1 + u_2 ∘ v_1 ∘ w_2 + u_1 ∘ v_2 ∘ w_2 ) − (1/α²) u_2 ∘ v_2 ∘ w_2  →  0  as α → ∞

An example of degeneracy!

58 Another example:

A(n) = n² ( x + (1/n²) y + (1/n) z )^{⊗3} + n² ( x + (1/n²) y − (1/n) z )^{⊗3} − 2n² x^{⊗3}

with linearly independent x, y, z.

The sequence of rank-3 tensors converges for n → ∞ to the rank-5 tensor

A(∞) = x ⊗ x ⊗ y + x ⊗ y ⊗ x + y ⊗ x ⊗ x + x ⊗ z ⊗ z + z ⊗ x ⊗ z + z ⊗ z ⊗ x

59 Rank spaces

Hence a sequence of rank-2 tensors converges to a rank-3 tensor: the set of rank-2 tensors is not closed!

We can approximate the 3-way tensor as well as we want by rank-2 tensors, but the sequence of approximations does not converge within the set of rank-2 tensors.

(Sketch: a sequence B_0, B_1, B_2, …, B_∞ of rank-2 approximations approaching A.)

60 Computing the CP

Standard method: Alternating Least Squares method (ALS)

Given any (high-rank) tensor A, compute a rank-r approximation B:

min_B ‖A − B‖   with   B = ∑_{k=1}^{r} λ_k u_k ∘ v_k ∘ w_k = [[λ; U, V, W]]

ALS approach: fix two matrices, e.g. V and W, and solve for U. This leads to the matrix minimization

min_{Û} ‖ A_(1) − Û (W • V)^T ‖_F

with solution  Û = A_(1) ((W • V)^T)^† = A_(1) (W • V) (W^T W ∗ V^T V)^{-1}.

61 Computing

Advantage: only the (pseudo)inverse of a small r×r matrix has to be computed.

Afterwards, λ is defined by normalization

λ_k = ‖û_k‖,   u_k = û_k / λ_k,   k = 1, …, r

In this way we update U, then V, then W, then again U and so on until convergence.
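A minimal ALS sketch along these lines (an illustration under the conventions above, not the lecture's reference implementation); stopping criteria, λ-normalization of all factors, and safeguards are omitted for brevity:

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1, order='F')

def khatri_rao(A, B):
    return np.vstack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])]).T

def cp_als(A, r, n_iter=200):
    """Rank-r CP approximation of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(0)
    I, J, K = A.shape
    U = rng.standard_normal((I, r))
    V = rng.standard_normal((J, r))
    W = rng.standard_normal((K, r))
    for _ in range(n_iter):
        # each update: A_(n) times the Khatri-Rao of the other two, times a small r x r (pseudo)inverse
        U = unfold(A, 0) @ khatri_rao(W, V) @ np.linalg.pinv((W.T @ W) * (V.T @ V))
        V = unfold(A, 1) @ khatri_rao(W, U) @ np.linalg.pinv((W.T @ W) * (U.T @ U))
        W = unfold(A, 2) @ khatri_rao(V, U) @ np.linalg.pinv((V.T @ V) * (U.T @ U))
    lam = np.linalg.norm(U, axis=0)          # normalization as on the slide, applied to U only
    return lam, U / lam, V, W

# usage: recover an exactly rank-3 tensor
rng = np.random.default_rng(2)
U0, V0, W0 = rng.standard_normal((6, 3)), rng.standard_normal((5, 3)), rng.standard_normal((4, 3))
A = np.einsum('ir,jr,kr->ijk', U0, V0, W0)
lam, U, V, W = cp_als(A, 3)
B = np.einsum('r,ir,jr,kr->ijk', lam, U, V, W)
print(np.linalg.norm(A - B) / np.linalg.norm(A))   # should be close to 0
```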

Costs per step:

62 ELS: ALS with Enhanced Line Search

Assume ALS has computed a new U_new replacing U_old. Hence we have a change in the direction Δ = U_new − U_old, in the form U_new = U_old + Δ. We generalize this by introducing a line search with step size μ in the form

U_new = U_old + μ Δ, looking for an optimal value of μ:

min_μ ‖ A − ∑_{k=1}^{R} (u_k + μ δ_k) ∘ v_k ∘ w_k ‖²

= min_μ ‖ ( A − ∑_{k=1}^{R} u_k ∘ v_k ∘ w_k ) − μ ∑_{k=1}^{R} δ_k ∘ v_k ∘ w_k ‖² = min_μ ‖ B − μ C ‖²  →  μ  →  U_new = U_old + μ Δ

63 ELS general

U_new = U_old + μ Δ_U,   V_new = V_old + μ Δ_V,   W_new = W_old + μ Δ_W

min_μ ‖ A − ∑_{k=1}^{R} (u_k + μ δ_{u,k}) ∘ (v_k + μ δ_{v,k}) ∘ (w_k + μ δ_{w,k}) ‖²
= min_μ ‖ B − μ³ C − μ² D − μ E ‖²
= min_μ ( a_0 + a_1 μ + a_2 μ² + a_3 μ³ + a_4 μ⁴ + a_5 μ⁵ + a_6 μ⁶ )

Find the 5 roots of the derivative (a degree-5 polynomial) and choose the root with the minimum value of the objective function. This gives new U, V, and W. Use ALS for new search directions and repeat.

64 Application of the CP

Starting point: 3-leg tensors often have small rank and the low-rank approximation is unique.

Therefore, the best approximating rank-1 terms can give useful information on the data:
- mixtures of analytes can be separated,
- concentrations can be measured,
- pure spectra and profiles can be estimated.

Typical example: 3-way data in time, space, and frequency. Translate the matrix case into a 3-leg tensor via an additional index to achieve uniqueness!

65 Application of the CP

Van Huffel: PARAFAC in EEG monitoring — EEG data as a 3-way tensor.

66 EEG Monitoring

67 EEG rank terms

68 EEG: epileptic seizure onset localization

69 EEG

Better localization by CP than by visual inspection or by other matrix techniques.

70 Block PARAFAC (L,L,1)

Consider more general higher-rank terms of type (L,L,1), because larger blocks might be necessary for an accurate representation of the data.

T = ∑_{r=1}^{R} E_r ⊗ c_r,   E_r : I×J matrix,  rank(E_r) = L

T = ∑_{r=1}^{R} (A_r · B_r^T) ⊗ c_r

Block representations are also often unique, e.g. for RL ≤ min(I, J) and C without proportional columns.

"Essentially unique": up to
- permutations,
- a factor between A and B,
- scaling.

71 Visualization


T = (A, B, c) · D = [[D; A, B, c]] = ∑_{r=1}^{R} D_r ×_1 A_r ×_2 B_r ×_3 c_r

72 Waring Problem

Write a given integer n in the form  n = n_1^d + … + n_{k_d}^d.  Proved by Hilbert (1909).

What is the minimum number k_d?

Reformulation in terms of polynomials: what is the minimum s such that a polynomial P of degree d can be written as a sum of d-th powers of linear forms:

P = L_1^d + … + L_s^d

Answered by Hirschowitz 1995.

73 Symmetric Rank of Polynomial

The minimum s is called the symmetric rank of the polynomial P

Reformulation: consider the map  ν_d : ℙ(S^1) → ℙ(S^d),  L ↦ L^d.

The image of this map is called the d-th Veronese variety X_{n,d}.

74 Veronese Variety for Tensors

ν_d : ℙ(V) → ℙ(S^d V) ⊂ ℙ(V^{⊗d}),   v ↦ v^{⊗d}

The image of this map is the Veronese variety of ℙ(V).

The symmetric rank of a tensor T ∈ S^d(V) is the minimum s with

T = v_1^{⊗d} + … + v_s^{⊗d}

75 Mode n-Rank of a Tensor

View the tensor as a collection of vectors in the n-th index (fibers). The rank of this collection of vectors is the mode-n rank.

Example with R_1 = R_2 = 2, R_3 = 1:

Mode n = 3: the mode-3 fibers are (0,0), (1,1), (1,1), (0,0), so R_3 = rank of the 2×4 matrix of mode-3 fibers = 1.

The mode-n rank is the rank of the mode-n unfolding matrix A_(n).

76 Tucker Decomposition

Three-mode analysis (Tucker, 1966), N-mode PCA (principal component analysis), higher-order SVD (HOSVD) (De Lathauwer, 2000), N-mode SVD.

Idea: decompose a given N-way tensor into a core N-way tensor with fewer entries in each dimension, multiplied by a matrix along each mode.

A = G ×_1 U ×_2 V ×_3 W = ∑_{p=1}^{P} ∑_{q=1}^{Q} ∑_{k=1}^{K} g_{pqk} u_p ∘ v_q ∘ w_k = [[G; U, V, W]] = (U, V, W) · G

with core tensor G and matrices U, V, W relative to each mode.

77 Tucker Decomposition

Unfold A in i and {jk}, compute an SVD → U, fold back; repeat for the other unfoldings to obtain V and W. This yields A = S ×_1 U^{(1)} ×_2 V^{(1)} ×_3 W^{(1)} with core tensor S:

(A_{ijk}) = (S_{ijk}) ×_1 U^{(1)} ×_2 V^{(1)} ×_3 W^{(1)}

(A_{ijk}) = ∑_{μ_1=1}^{k_1} ∑_{μ_2=1}^{k_2} ∑_{μ_3=1}^{k_3} S_{μ_1 μ_2 μ_3} · u^{(1)}_{μ_1} ∘ v^{(1)}_{μ_2} ∘ w^{(1)}_{μ_3}

A_{ijk} = ∑_{μ_1=1}^{k_1} ∑_{μ_2=1}^{k_2} ∑_{μ_3=1}^{k_3} S_{μ_1 μ_2 μ_3} u_{μ_1 i} v_{μ_2 j} w_{μ_3 k},   i = 1,…,I,  j = 1,…,J,  k = 1,…,K

Multilinear rank: (k_1, k_2, k_3)

78 Computation

A_(1) : I_1 × I_2 I_3;   SVD:  A_(1) = U^{(1)} Σ^{(1)} V^{(1)T}

A_(2) : I_2 × I_1 I_3;   SVD:  A_(2) = U^{(2)} Σ^{(2)} V^{(2)T}

A_(3) : I_3 × I_1 I_2;   SVD:  A_(3) = U^{(3)} Σ^{(3)} V^{(3)T}

S = A ×_1 U^{(1)T} ×_2 U^{(2)T} ×_3 U^{(3)T},    A = S ×_1 U^{(1)} ×_2 U^{(2)} ×_3 U^{(3)}

The U^{(n)} have orthonormal columns; S is all-orthogonal and ordered.
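A compact HOSVD sketch of this computation (illustration only, using the unfolding convention from above; here the n-mode product is written via tensordot instead of explicit unfolding):

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1, order='F')

def mode_n_product(X, U, n):
    """X x_n U: multiply every mode-n fiber of X by the matrix U."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def hosvd(A):
    Us = [np.linalg.svd(unfold(A, n))[0] for n in range(A.ndim)]   # left singular vectors
    S = A
    for n, U in enumerate(Us):
        S = mode_n_product(S, U.T, n)        # core S = A x_1 U1^T x_2 U2^T x_3 U3^T
    return S, Us

A = np.random.default_rng(3).standard_normal((3, 4, 2))
S, Us = hosvd(A)

B = S                                        # reconstruct A = S x_1 U1 x_2 U2 x_3 U3
for n, U in enumerate(Us):
    B = mode_n_product(B, U, n)
print(np.allclose(A, B))                     # True
```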

79 Proof:

n ≠ m:  A ×_m U ×_n V = A ×_n V ×_m U

A ×_n U ×_n V = A ×_n (V U)

B = A ×_n U  ⇔  B_(n) = U · A_(n)

S = A ×_1 U^{(1)T} ×_2 U^{(2)T} ×_3 U^{(3)T}

A = S ×_1 U^{(1)} ×_2 U^{(2)} ×_3 U^{(3)} = (A ×_1 U^{(1)T} ×_2 U^{(2)T} ×_3 U^{(3)T}) ×_1 U^{(1)} ×_2 U^{(2)} ×_3 U^{(3)} = A ×_1 (U^{(1)} U^{(1)T}) ×_2 (U^{(2)} U^{(2)T}) ×_3 (U^{(3)} U^{(3)T}) =

= A ×_1 I ×_2 I ×_3 I = A

80 Core Tensor

S is all-orthogonal:

A = S ×_1 U^{(1)} ×_2 U^{(2)} ×_3 U^{(3)}   with the additional property

⟨S_{i,:,:}, S_{j,:,:}⟩ = 0 for i ≠ j

Proof:  S_(1) = U^{(1)T} A_(1) (U^{(3)} ⊗ U^{(2)})

S_{(1),i} = U^{(1)T}_i A_(1) (U^{(3)} ⊗ U^{(2)})   (the i-th row of S_(1))

⟨S_{(1),i}, S_{(1),j}⟩ = U^{(1)T}_i A_(1) (U^{(3)} ⊗ U^{(2)}) (U^{(3)} ⊗ U^{(2)})^T A_(1)^T U^{(1)}_j = U^{(1)T}_i A_(1) A_(1)^T U^{(1)}_j = σ_i² δ_{ij} = 0 for i ≠ j,

and similarly for indices 2 and 3.

81 Properties

Mode-n singular values = norms of the slices of S = singular values of A_(n).

Truncate by deleting small singular values/vectors:

S = A ×_1 U^{(1)T} ×_2 U^{(2)T} ×_3 U^{(3)T}   →   S̃ = A ×_1 Ũ^{(1)T} ×_2 Ũ^{(2)T} ×_3 Ũ^{(3)T}

A   →   Ã = S̃ ×_1 Ũ^{(1)} ×_2 Ũ^{(2)} ×_3 Ũ^{(3)}

82 Tucker Graphical

A_{i_1,i_2,i_3,…,i_N},  i_n = 1, …, I_n

≈ ∑_{m_1,…,m_N} B_{m_1,…,m_N} C_{1,i_1,m_1} C_{2,i_2,m_2} ⋯ C_{N,i_N,m_N}

(Graphically: a small core B with legs m_1, …, m_N, each connected to a factor matrix C_n with free leg i_n.)

83 Three-way Tucker

A = G ×_1 U ×_2 V ×_3 W = ∑_{p=1}^{P} ∑_{q=1}^{Q} ∑_{k=1}^{K} G_{pqk} u_p ∘ v_q ∘ w_k = [[G; U, V, W]]

A_{ijm} = ∑_{p=1}^{P} ∑_{q=1}^{Q} ∑_{k=1}^{K} G_{pqk} u_{ip} v_{jq} w_{mk}

A_(1) = U · G_(1) (W ⊗ V)^T
A_(2) = V · G_(2) (W ⊗ U)^T
A_(3) = W · G_(3) (V ⊗ U)^T

84 N-way Tucker

A = G ×_1 U^{(1)} ×_2 U^{(2)} … ×_N U^{(N)} = [[G; U^{(1)}, U^{(2)}, …, U^{(N)}]]

A_{i_1 i_2 … i_N} = ∑_{k_1=1}^{R_1} ∑_{k_2=1}^{R_2} … ∑_{k_N=1}^{R_N} G_{k_1 k_2 … k_N} u^{(1)}_{i_1 k_1} u^{(2)}_{i_2 k_2} ⋯ u^{(N)}_{i_N k_N},   i_n = 1, …, I_n

A_(n) = U^{(n)} · G_(n) (U^{(N)} ⊗ … ⊗ U^{(n+1)} ⊗ U^{(n-1)} ⊗ … ⊗ U^{(1)})^T

Tucker1 is the decomposition relative to only one index, Tucker2 relative to two indices, and Tucker relative to all indices.

85 Computing the Tucker Dec.

For n = 1, …, N:  U^{(n)} := matrix of left singular vectors of A_(n)

G := A ×_1 U^{(1)T} ×_2 U^{(2)T} ×_3 ⋯ ×_N U^{(N)T}

Output: G, U^{(1)}, …, U^{(N)}.

We can also use this algorithm for approximating A by choosing in U^{(n)} only the dominant left singular vectors:

R_n → r_n

86 Approximating Tucker Decomposition

min_{G, U^{(1)},…,U^{(N)}} ‖ A − [[G; U^{(1)}, …, U^{(N)}]] ‖
subject to G ∈ ℝ^{r_1 × … × r_N},  U^{(n)} ∈ ℝ^{I_n × r_n} columnwise orthogonal.

Rewrite as minimizing  ‖ vec(A) − (U^{(N)} ⊗ ⋯ ⊗ U^{(1)}) vec(G) ‖

with solution  G := A ×_1 U^{(1)T} ×_2 U^{(2)T} ×_3 ⋯ ×_N U^{(N)T}.

‖ A − [[G; U^{(1)}, …, U^{(N)}]] ‖²
= ‖A‖² − 2 ⟨A, [[G; U^{(1)}, …, U^{(N)}]]⟩ + ‖ [[G; U^{(1)}, …, U^{(N)}]] ‖²
= ‖A‖² − 2 ⟨A ×_1 U^{(1)T} ×_2 ⋯ ×_N U^{(N)T}, G⟩ + ‖G‖²
= ‖A‖² − 2 ⟨G, G⟩ + ‖G‖² = ‖A‖² − ‖G‖² = ‖A‖² − ‖ A ×_1 U^{(1)T} ×_2 ⋯ ×_N U^{(N)T} ‖²

87 ALS for Tucker

max_{U^{(n)}} ‖ A ×_1 U^{(1)T} ×_2 ⋯ ×_N U^{(N)T} ‖

subject to U^{(n)} ∈ ℝ^{I_n × r_n} columnwise orthogonal

max_{U^{(n)}} ‖ U^{(n)T} W ‖   with   W = A_(n) (U^{(N)} ⊗ ⋯ ⊗ U^{(n+1)} ⊗ U^{(n-1)} ⊗ ⋯ ⊗ U^{(1)})

ALS method: for n = 1, …, N choose U^{(n)} as the r_n dominant left singular vectors of W. Repeat until convergence.
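A minimal sketch of this alternating scheme (HOOI-style, an illustration only; the truncation ranks r_n are assumed given, and the loop runs a fixed number of sweeps instead of testing convergence):

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1, order='F')

def mode_n_product(X, U, n):
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def tucker_als(A, ranks, n_sweeps=20):
    """Truncated Tucker decomposition by alternating least squares (HOOI)."""
    # start from the truncated HOSVD factors
    Us = [np.linalg.svd(unfold(A, n))[0][:, :r] for n, r in enumerate(ranks)]
    for _ in range(n_sweeps):
        for n in range(A.ndim):
            W = A
            for m in range(A.ndim):
                if m != n:
                    W = mode_n_product(W, Us[m].T, m)     # project all modes except n
            Us[n] = np.linalg.svd(unfold(W, n))[0][:, :ranks[n]]
    G = A
    for n in range(A.ndim):
        G = mode_n_product(G, Us[n].T, n)                 # core G = A x_n U^(n)T
    return G, Us

A = np.random.default_rng(4).standard_normal((6, 5, 4))
G, Us = tucker_als(A, (3, 3, 2))
print(G.shape, [U.shape for U in Us])                     # (3, 3, 2) [(6, 3), (5, 3), (4, 2)]
```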

88 Uniqueness

Tucker is not unique:

[[G; A, B, C]] = [[G ×_1 U ×_2 V ×_3 W;  A U^{-1},  B V^{-1},  C W^{-1}]]

89 Application: Tensorfaces

Given a database of images of different persons, e.g. with different expressions (looks), illuminations, and positions (views), we can collect all the images in a big 5-leg tensor

A = (a_{i_people, i_views, i_illum, i_express, i_pixel})

In the example there is a database of 28 male persons with 5 poses, 3 illuminations, and 3 expressions, each image 512×352 pixels. Hence A is a 28×5×3×3×7943 tensor.

90 Database

28 subjects with 45 images per person.

(Figure: the 45 images of one person, arranged by expression — e.g. smile — illumination 1–3, and viewpoint.)

91 Principal Component Analysis (PCA)

PCA is used to capture the important features in a compact form. Often an eigendecomposition and eigenvectors are used in PCA.

Here we use the Tucker decomposition:

A = Z ×_1 U_people ×_2 U_views ×_3 U_illum ×_4 U_express ×_5 U_pixels

resulting in

A_(pixels) = U_pixels · Z_(pixels) · (U_express ⊗ U_illum ⊗ U_views ⊗ U_people)^T
(image data = basis vectors × coefficients)

92 Interpretation

A_(pixels) = U_pixels · Z_(pixels) · (U_express ⊗ U_illum ⊗ U_views ⊗ U_people)^T
(image data = basis vectors × coefficients)

The mode matrix U_pixels can be interpreted as the PCA basis.

By the core tensor Z we can transform the eigenimages

present in Upixels into eigenmodes representing the principal axes of variation across the various factors (people, viewpoints, illuminations, expressions) by

forming Z ×_5 U_pixels.

93 Eigenfaces

The first 10 PCA eigenvectors (eigenfaces) contained in the mode matrix U_pixels.

"Multilinear Analysis of Image Ensembles: TensorFaces" by M.A.O. Vasilescu and D. Terzopoulos.

There is a similar paper on PCA of human motion via tensors.

94 PCA-CP-Tucker

PCA – bilinear model:  x_{ij} = ∑_f a_{if} b_{jf} + e_{ij},  i = 1,…,I;  j = 1,…,J;   X = A B^T + E

CP – trilinear model:  x_{ijk} = ∑_f a_{if} b_{jf} c_{kf} + e_{ijk},   X_k = A D_k B^T + E_k,  D_k = diag(C(k,:))

Tucker3:  X = A G (C ⊗ B)^T + E

with unitary bases A, B, C for each mode.

95