Tensors I: Basic Operations and Representations
Overview

- Tensors: vectors, matrices, and so on ...
- Definition
- Operators
- PARAFAC/CANDECOMP, polyadic form, CP
- Tucker, HOSVD
Different Matrix Products

Kronecker product: for $A = (a_1 \cdots a_n)$ with entries $a_{ij}$ and $B = (b_1 \cdots b_m)$, the matrix case reads
$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix} = (a_1\otimes b_1 \;\; a_1\otimes b_2 \;\cdots\; a_1\otimes b_m \;\cdots\; a_n\otimes b_1 \;\cdots\; a_n\otimes b_m)$$

Example:
$$A = \begin{pmatrix}1&3\\2&4\end{pmatrix},\quad B = \begin{pmatrix}5&7\\6&8\end{pmatrix} \;\Rightarrow\; A\otimes B = \begin{pmatrix}5&7&15&21\\6&8&18&24\\10&14&20&28\\12&16&24&32\end{pmatrix}$$
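The example can be checked directly in numpy, whose `np.kron` implements exactly this block construction:

```python
import numpy as np

A = np.array([[1, 3],
              [2, 4]])
B = np.array([[5, 7],
              [6, 8]])

# Kronecker product: every entry a_ij of A is replaced by the block a_ij * B
print(np.kron(A, B))
# [[ 5  7 15 21]
#  [ 6  8 18 24]
#  [10 14 20 28]
#  [12 16 24 32]]
```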
Different Matrix Products

Vector case (row or column form):
$$a \otimes b = (a_1 \cdots a_n)\otimes(b_1 \cdots b_m) = (a_1 b \;\cdots\; a_n b) = (a_1b_1 \;\cdots\; a_1b_m \;\cdots\; a_nb_1 \;\cdots\; a_nb_m),$$
a row vector of length $nm$;
$$a \otimes b = (a_1 \cdots a_n)\otimes\begin{pmatrix}b_1\\\vdots\\b_m\end{pmatrix} = (a_1 b \;\cdots\; a_n b) = \begin{pmatrix}a_1b_1&\cdots&a_nb_1\\\vdots&&\vdots\\a_1b_m&\cdots&a_nb_m\end{pmatrix},$$
an $m\times n$ matrix.
Different Matrix Products

Khatri-Rao product: for $A = (a_1 \cdots a_n)$ and $B = (b_1 \cdots b_n)$,
$$A \bullet B = (a_1\otimes b_1 \;\; a_2\otimes b_2 \;\cdots\; a_n\otimes b_n),$$
the matching columnwise Kronecker product; defined only for matrices with the same number of columns!

Example:
$$A = \begin{pmatrix}1&3\\2&4\end{pmatrix},\quad B = \begin{pmatrix}5&7\\6&8\end{pmatrix} \;\Rightarrow\; A\bullet B = \begin{pmatrix}5&21\\6&24\\10&28\\12&32\end{pmatrix}$$
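A short numpy sketch of the columnwise construction (SciPy also ships this as scipy.linalg.khatri_rao):

```python
import numpy as np

def khatri_rao(A, B):
    # Columnwise Kronecker product; A and B must have the same number of columns
    assert A.shape[1] == B.shape[1]
    return np.einsum('ik,jk->ijk', A, B).reshape(-1, A.shape[1])

A = np.array([[1, 3], [2, 4]])
B = np.array([[5, 7], [6, 8]])
print(khatri_rao(A, B))
# [[ 5 21]
#  [ 6 24]
#  [10 28]
#  [12 32]]
```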
Different Matrix Products

Hadamard product (entrywise product): for
$$A = \begin{pmatrix}a_{11}&\cdots&a_{1n}\\\vdots&&\vdots\\a_{m1}&\cdots&a_{mn}\end{pmatrix},\quad B = \begin{pmatrix}b_{11}&\cdots&b_{1n}\\\vdots&&\vdots\\b_{m1}&\cdots&b_{mn}\end{pmatrix},$$
$$A * B = \begin{pmatrix}a_{11}b_{11}&a_{12}b_{12}&\cdots&a_{1n}b_{1n}\\a_{21}b_{21}&a_{22}b_{22}&\cdots&a_{2n}b_{2n}\\\vdots&&&\vdots\\a_{m1}b_{m1}&a_{m2}b_{m2}&\cdots&a_{mn}b_{mn}\end{pmatrix},$$
defined only for matrices of equal size!
Definition

Tensor as multi-indexed object:

One index, vector:
$$x = (x_i)_{i=1}^n = (x_{i_1})_{i_1=1}^{n_1} = \begin{pmatrix}x_1\\\vdots\\x_n\end{pmatrix}\ \text{or}\ (x_1 \cdots x_n)$$

Two indices, matrix:
$$A = (A_{i,j})_{i=1,j=1}^{n,m} = (A_{i_1,i_2})_{i_1=1,i_2=1}^{n_1,n_2} = \begin{pmatrix}a_{1,1}&\cdots&a_{1,m}\\\vdots&&\vdots\\a_{n,1}&\cdots&a_{n,m}\end{pmatrix}$$

Three indices, cube:
$$A = (A_{i,j,k})_{i=1,j=1,k=1}^{n,m,l} = (A_{i_1,i_2,i_3})_{i_1=1,i_2=1,i_3=1}^{n_1,n_2,n_3}$$

Multi-index:
$$x = (x_{i_1 i_2 \ldots i_N})_{i_1=1,i_2=1,\ldots,i_N=1}^{n_1,n_2,\ldots,n_N}$$
Motivation: Why tensors?

PDE for two-dimensional problems:
$$(au_x)_x + (bu_y)_y = f(x,y)$$

Discretization in 2D (five-point stencil):
$$\frac{au_{i-1,j} + au_{i+1,j} - 2(a+b)u_{i,j} + bu_{i,j-1} + bu_{i,j+1}}{h^2} = f_{i,j}$$

The unknowns $u_{i,j}$ can be seen as a vector or as a 2-way tensor, i.e. a matrix.

Linear system $Au = f$ with block matrix $A$:
$$\sum_{k,m} A_{ij,km}\,u_{km} = f_{ij}$$

So the matrix $A_{ij,km}$ can also be seen as a 4-way tensor.
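As a small illustration (a sketch for constant coefficients $a = b = 1$ on an $n \times n$ interior grid; the sizes are hypothetical), the five-point matrix can be assembled from 1D difference matrices with the Kronecker product from above, and the same data can then be viewed either as a matrix or as a 4-way tensor:

```python
import numpy as np

n, h = 4, 0.2                      # illustrative grid size and mesh width
T = (-2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / h**2   # 1D second difference
I = np.eye(n)

A = np.kron(I, T) + np.kron(T, I)  # 2D five-point stencil, shape (n*n, n*n)
A4 = A.reshape(n, n, n, n)         # the same data as a 4-way tensor A[i, j, k, m]
print(A.shape, A4.shape)           # (16, 16) (4, 4, 4, 4)
```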
Motivation: Why tensors?

PDE with a number of additional parameters, high-dimensional problems:
$$au_{xx} + bu_{yy} + cu_{zz} = f$$
for discrete sets of parameters $a_{ijk}$ leads to a linear system $A_{mn}$ for each $i,j,k$, i.e. a tensor $A_{mn,ijk}$.

These are classical matrix/vector problems, but of huge size: represent the vector/matrix by a tensor with an efficient representation,
$$x = x_i = x_{i_1 \ldots i_N}.$$
Graphical Notation

In the graphical (network) notation a tensor is drawn as a node with one leg per index:

- Vector (1 leg): $(x_i)_i$
- Matrix (2 legs): $(a_{ij})_{i,j}$
- Cube (3 legs): $(x_{ijk})_{i,j,k}$
- General tensor with $N$ legs $i_1, i_2, \ldots, i_N$: $(x_{i_1 \ldots i_N})_{i_1,\ldots,i_N}$
Graphical Notation

Matrix-vector product as contraction over the index $i$: connecting the $i$-leg of $(a_{ij})_{i,j}$ with the leg of $(x_i)_i$ gives $(y_j)_j$:
$$\sum_i a_{ij}x_i = a_{ij}x_i = y_j$$

Einstein notation: shared indices are contracted via summation. No distinction between covariant and contravariant indices!
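numpy's einsum mirrors this summation convention directly; a minimal sketch:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)   # a_ij with i = 1..2, j = 1..3
x = np.array([1.0, 2.0])

# y_j = a_ij x_i: the shared index i is contracted
y = np.einsum('ij,i->j', A, x)
print(np.allclose(y, A.T @ x))     # True
```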
Basic Operations

Contraction $\sum_{i_1} x_{i_1}y_{i_1}$ gives a scalar $z$.

Tensor product $x_{i_1}y_{i_2} = z_{i_1 i_2}$ gives a 2-tensor.

More generally, contracting one index of an $N$-leg tensor with one index of an $M$-leg tensor:
$$\sum_{i_n} x_{i_1 \ldots i_{n-1} i_n i_{n+1} \ldots i_N}\; y_{i'_1 \ldots i'_{n-1} i_n i'_{n+1} \ldots i'_M} = z_{i_1 \ldots i_{n-1} i_{n+1} \ldots i_N\, i'_1 \ldots i'_{n-1} i'_{n+1} \ldots i'_M}$$
Tensor as data hive of different form

The same data $x_{i_1}y_{i_2}$ can be stored in different shapes:
$$\mathrm{kron}(x,y) = x\otimes y = (x_1y^T \;\cdots\; x_ny^T)^T = (x_{i_1}y_{i_2})_{i_1,i_2}\quad\text{seen as a column vector,}$$
$$xy^T = \begin{pmatrix}x_1y_1&\cdots&x_1y_m\\\vdots&&\vdots\\x_ny_1&\cdots&x_ny_m\end{pmatrix} = (x_{i_1}y_{i_2})_{i_1,i_2}\quad\text{seen as a matrix} \;=\; \mathrm{kron}(y^T,x) = y^T\otimes x,$$
$$yx^T = \begin{pmatrix}y_1x_1&\cdots&y_1x_n\\\vdots&&\vdots\\y_mx_1&\cdots&y_mx_n\end{pmatrix} = (x_{i_1}y_{i_2})_{i_2,i_1}\quad\text{seen as a matrix} \;=\; \mathrm{kron}(x^T,y) = x^T\otimes y,$$
$$x_{i_1}y_{i_2} = (x_{i_1}y_{i_2})_{i_1,i_2}\quad\text{seen as a two-leg tensor.}$$
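In numpy (row-major, unlike the column-major MATLAB convention behind $\mathrm{kron}(y^T,x)$ above), the reshaping relation is a one-liner; a small sketch:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0])

# The Kronecker vector and the outer-product matrix hold the same numbers:
print(np.array_equal(np.kron(x, y), np.outer(x, y).ravel()))   # True
```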
Matrix

A matrix is the two-leg tensor $A_{i_1 i_2}$.

Operations (contractions):
$$\sum_{i_1}A_{i_1i_2}x_{i_1} = z_{i_2},\qquad \sum_{i_2}A_{i_1i_2}y_{i_2} = z_{i_1},\qquad \sum_{i_1,i_2}A_{i_1i_2}x_{i_1}y_{i_2} = z,\qquad \sum_{i_2}A_{i_1i_2}B_{i_2i_3} = C_{i_1i_3}$$

Tensor product:
$$A_{i_1i_2}\,x_{i_3} = C_{i_1i_2i_3}$$
Three-Leg Tensor as Standard Example

$A_{i_1i_2i_3}$ with legs $i_1$, $i_2$, $i_3$.

Operations: contractions in $i_1$, $i_2$, $i_3$ or combinations give a tensor with fewer legs; the tensor product gives a tensor with more legs.

See the tensor as
- a collection of vectors (fibers): $(a_{i_1})_{i_2i_3}$
- a collection of matrices (slices): $(A_{i_1i_2})_{i_3}$
- a large matrix (unfolding): $A_{\{i_1i_2\}i_3} = A_{j_1i_3}$

Operations between tensors are defined by contracted indices.

Fibers

Running example: the $3\times4\times2$ tensor $A$ with frontal slices
$$A_{:,:,1} = \begin{pmatrix}1&4&7&10\\2&5&8&11\\3&6&9&12\end{pmatrix},\qquad A_{:,:,2} = \begin{pmatrix}13&16&19&22\\14&17&20&23\\15&18&21&24\end{pmatrix}$$
Mode-1 fibers $x_{:,j,k}$ (columns): $(1,2,3)^T,\ (4,5,6)^T,\ \ldots,\ (22,23,24)^T$

Mode-2 fibers $x_{i,:,k}$ (rows): $(1,4,7,10)^T,\ (2,5,8,11)^T,\ \ldots,\ (15,18,21,24)^T$

Mode-3 fibers $x_{i,j,:}$ (tubes): $(1,13)^T,\ (2,14)^T,\ \ldots,\ (12,24)^T$
Slices

For the same $3\times4\times2$ tensor:

Frontal slices $X_{:,:,k}$: the two $3\times4$ matrices shown above.

Lateral slices $X_{:,j,:}$: four $3\times2$ matrices, e.g.
$$X_{:,1,:} = \begin{pmatrix}1&13\\2&14\\3&15\end{pmatrix}$$

Horizontal slices $X_{i,:,:}$: three $4\times2$ matrices, e.g.
$$X_{1,:,:} = \begin{pmatrix}1&13\\4&16\\7&19\\10&22\end{pmatrix}$$
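The running example in numpy, filled in column-major order so that the entries match the slides:

```python
import numpy as np

A = np.arange(1, 25).reshape(3, 4, 2, order='F')   # the 3x4x2 example tensor

print(A[:, 0, 0])   # mode-1 fiber x_{:,1,1}: [1 2 3]
print(A[0, :, 0])   # mode-2 fiber x_{1,:,1}: [1 4 7 10]
print(A[0, 0, :])   # mode-3 fiber x_{1,1,:}: [1 13]
print(A[:, :, 0])   # frontal slice X_{:,:,1}
print(A[:, 0, :])   # lateral slice X_{:,1,:}
print(A[0, :, :])   # horizontal slice X_{1,:,:}
```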
Matricization

For the $3\times4\times2$ example tensor:

Mode-1 unfolding, $A_{(1)} = A_{i_1\{i_2i_3\}} = A_{i_1j_1}$ with $j_1 = i_2 + n_2(i_3-1)$:
$$A_{(1)} = \begin{pmatrix}1&4&7&10&13&16&19&22\\2&5&8&11&14&17&20&23\\3&6&9&12&15&18&21&24\end{pmatrix}$$

Mode-2 unfolding, $A_{(2)} = A_{i_2\{i_1i_3\}}$:
$$A_{(2)} = \begin{pmatrix}1&2&3&13&14&15\\4&5&6&16&17&18\\7&8&9&19&20&21\\10&11&12&22&23&24\end{pmatrix}$$

Mode-3 unfolding, $A_{(3)} = A_{i_3\{i_1i_2\}}$:
$$A_{(3)} = \begin{pmatrix}1&2&3&4&5&6&7&8&9&10&11&12\\13&14&15&16&17&18&19&20&21&22&23&24\end{pmatrix}$$

Vectorization: $\mathrm{vec}(A) = (1\;2\;\cdots\;23\;24)^T$
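A sketch of mode-n unfolding and vectorization in numpy, using column-major ordering of the remaining indices as on the slides:

```python
import numpy as np

def unfold(A, n):
    # Mode-n unfolding: mode n becomes the rows, all other indices the columns
    return np.reshape(np.moveaxis(A, n, 0), (A.shape[n], -1), order='F')

A = np.arange(1, 25).reshape(3, 4, 2, order='F')
print(unfold(A, 0))            # A_(1) as displayed above
print(unfold(A, 2))            # A_(3)
print(A.flatten(order='F'))    # vec(A) = [1 2 ... 24]
```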
General Matricization

Tensor $A_{i_1 \ldots i_n i_{n+1} \ldots i_N}$ becomes the matrix $A_{\{i_1 \ldots i_n\}\{i_{n+1} \ldots i_N\}} = A_{ij}$ with
$$i = i_1 + n_1(i_2-1) + n_1n_2(i_3-1) + \ldots + n_1\cdots n_{n-1}(i_n-1),$$
$$j = i_{n+1} + n_{n+1}(i_{n+2}-1) + n_{n+1}n_{n+2}(i_{n+3}-1) + \ldots + n_{n+1}\cdots n_{N-1}(i_N-1),$$
or with any partitioning of the indices into two groups (rows/columns).

General remark on notation: many properties/operations for tensors are formulated using totally different notations! (►, ◄, ʘ, ⊗, •, ∘, ×)
Basis Transformation

Tensor
$$A = \sum_{i=1}^I\sum_{j=1}^J\sum_{k=1}^K A_{ijk}\, e_i^{(1)}\otimes e_j^{(2)}\otimes e_k^{(3)}$$

Change of basis $e_i^{(l)} = Q^{(l)}e_i'^{(l)}$:
$$A'_{pqr} = \Big(\sum_{i,j,k}A_{ijk}\,Q^{(1)}e_i'^{(1)}\otimes Q^{(2)}e_j'^{(2)}\otimes Q^{(3)}e_k'^{(3)}\Big)_{pqr} = \sum_{i=1}^I\sum_{j=1}^J\sum_{k=1}^K Q^{(1)}_{pi}Q^{(2)}_{qj}Q^{(3)}_{rk}A_{ijk}$$

Notation: $A' = (Q^{(1)},Q^{(2)},Q^{(3)})\cdot A$
n-Mode Product of Tensor with Matrix

For a tensor $A_{i_1 \ldots i_n \ldots i_N}$ and a matrix $U_{ji_n}$:
$$(A\times_n U)_{i_1 \ldots i_{n-1}\,j\,i_{n+1} \ldots i_N} = \sum_{i_n=1}^{I_n}A_{i_1 \ldots i_N}\,U_{ji_n} =: B_{i_1 \ldots j \ldots i_N}$$

Contraction over $i_n$; the index $i_n$ is replaced by the new index $j$.

In the n-mode product each mode-n fiber is multiplied by the matrix $U$:
$$B_{i_1 \ldots i_{n-1},:,i_{n+1} \ldots i_N} = U\cdot A_{i_1 \ldots i_{n-1},:,i_{n+1} \ldots i_N}$$

Useful relation between n-mode product and mode-n unfolding:
$$B = A\times_n U \iff B_{(n)} = U\cdot A_{(n)}$$
Unfold tensor $A$ to a matrix, multiply by $U$, fold back to tensor $B$.
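A sketch of this unfold-multiply-fold recipe (the fold step simply inverts the unfolding above; helper names are ours):

```python
import numpy as np

def unfold(A, n):
    return np.reshape(np.moveaxis(A, n, 0), (A.shape[n], -1), order='F')

def fold(M, n, shape):
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(np.reshape(M, full, order='F'), 0, n)

def mode_n_product(A, U, n):
    # B = A x_n U via B_(n) = U @ A_(n)
    shape = list(A.shape)
    shape[n] = U.shape[0]
    return fold(U @ unfold(A, n), n, shape)

A = np.random.rand(3, 4, 2)
U = np.random.rand(5, 4)
B = mode_n_product(A, U, 1)
print(B.shape)                                          # (3, 5, 2)
print(np.allclose(B, np.einsum('ijk,lj->ilk', A, U)))   # True
```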
n-Mode Products

For multiple n-mode products with different modes the order is irrelevant:
$$n\neq m:\qquad A\times_n U\times_m V = A\times_m V\times_n U,$$
since
$$\sum_{i_m}\Big(\sum_{i_n}A_{i_1 \ldots i_n \ldots i_m \ldots i_N}U_{ji_n}\Big)V_{ki_m} = \sum_{i_n}\Big(\sum_{i_m}A_{i_1 \ldots i_n \ldots i_m \ldots i_N}V_{ki_m}\Big)U_{ji_n}.$$

Matrix case:
$$B = A\times_1 U\times_2 V \iff B_{jk} = \sum_{i_1,i_2}U_{ji_1}A_{i_1i_2}V_{ki_2},\qquad B = U\cdot A\cdot V^T,$$
especially $A\times_1 U = U\cdot A$ and $A\times_2 V = A\cdot V^T$.
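A quick einsum check of the matrix case:

```python
import numpy as np

A = np.random.rand(3, 4)
U = np.random.rand(5, 3)
V = np.random.rand(6, 4)

# A x_1 U x_2 V contracts both modes; for a matrix this is U A V^T
B = np.einsum('ij,pi,qj->pq', A, U, V)
print(np.allclose(B, U @ A @ V.T))   # True
```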
n-Mode Products

For multiple n-mode products with the same mode $n$ the order is relevant:
$$A\times_n U\times_n V = A\times_n(VU),$$
since
$$\sum_{i'_n}\Big(\sum_{i_n}A_{i_1 \ldots i_n \ldots i_N}U_{i'_ni_n}\Big)V_{ki'_n} = \sum_{i_n}A_{i_1 \ldots i_N}\Big(\sum_{i'_n}V_{ki'_n}U_{i'_ni_n}\Big) = \sum_{i_n}A_{i_1 \ldots i_N}W_{ki_n} = B_{i_1 \ldots k \ldots i_N},\qquad W = VU.$$

Matrix case:
$$A\times_1 U\times_1 V = V\cdot U\cdot A = (VU)\cdot A,\qquad A\times_2 U\times_2 V = A\cdot U^T\cdot V^T = A\cdot(VU)^T.$$

n-Mode Product with Vector
n-mode vector product of tensor $A$ with vector $v$: compute all inner products of the mode-n fibers with $v$:
$$(A\times_n v)_{i_1 \ldots i_{n-1}i_{n+1} \ldots i_N} = \sum_{i_n=1}^{n_n}A_{i_1 \ldots i_n \ldots i_N}\,v_{i_n}$$

$$A\times_n v\times_m u = (A\times_n v)\times_{m-1}u = (A\times_m u)\times_n v = \sum_{i_n=1}^{n_n}\sum_{i_m=1}^{n_m}A_{i_1 \ldots i_n \ldots i_m \ldots i_N}\,v_{i_n}u_{i_m}$$
for $n < m$: after contracting $i_n$, mode $m$ becomes mode $m-1$.

Matrix case: $A\times_1 v = v^T\cdot A$, $A\times_2 v = A\cdot v$.

Properties

(1) $(A\otimes B)(C\otimes D) = (AC)\otimes(BD)$
(2) $(A\otimes B)^{-T} = A^{-T}\otimes B^{-T}$
(3) $A\bullet B\bullet C = (A\bullet B)\bullet C = A\bullet(B\bullet C)$
(4) $(A\bullet B)^T(A\bullet B) = (A^TA)*(B^TB)$ and hence, for the pseudoinverse, $(A\bullet B)^{\dagger} = \big((A^TA)*(B^TB)\big)^{-1}(A\bullet B)^T$

Proofs

(1): Multiplying blockwise,
$$(A\otimes B)(C\otimes D) = \begin{pmatrix}a_{11}B&\cdots&a_{1n}B\\\vdots&&\vdots\\a_{m1}B&\cdots&a_{mn}B\end{pmatrix}\begin{pmatrix}c_{11}D&\cdots&c_{1k}D\\\vdots&&\vdots\\c_{n1}D&\cdots&c_{nk}D\end{pmatrix}$$
has the $(1,1)$ block $(a_{11}c_{11}+\ldots+a_{1n}c_{n1})BD = (AC)_{11}BD$, and similarly for every block; hence the product equals $(AC)\otimes(BD)$.

(2): $(A\otimes B)(A^{-1}\otimes B^{-1}) = (AA^{-1})\otimes(BB^{-1}) = I\otimes I = I$, and transposing the block structure gives
$$(A\otimes B)^T = \begin{pmatrix}a_{11}B^T&\cdots&a_{m1}B^T\\\vdots&&\vdots\\a_{1n}B^T&\cdots&a_{mn}B^T\end{pmatrix} = A^T\otimes B^T;$$
combining both gives $(A\otimes B)^{-T} = A^{-T}\otimes B^{-T}$.

(3): $(A\bullet B)\bullet C = ((a_1\otimes b_1)\otimes c_1 \;\cdots\; (a_n\otimes b_n)\otimes c_n) = (a_1\otimes b_1\otimes c_1 \;\cdots\; a_n\otimes b_n\otimes c_n) = A\bullet(B\bullet C)$, because $(A\otimes B)\otimes C = A\otimes(B\otimes C) = A\otimes B\otimes C$.

(4): Columnwise,
$$(A\bullet B)^T(A\bullet B) = \big((a_i\otimes b_i)^T(a_j\otimes b_j)\big)_{ij} = \big((a_i^Ta_j)(b_i^Tb_j)\big)_{ij} = (A^TA)*(B^TB).$$

n-Mode Products Tensor with Matrices

General relation between n-mode products, mode-n unfolding and the Kronecker (tensor) product:
$$Y = A\times_1 U^{(1)}\times_2 U^{(2)}\cdots\times_N U^{(N)} \iff Y_{(n)} = U^{(n)}\cdot A_{(n)}\cdot\big(U^{(N)}\otimes\cdots\otimes U^{(n+1)}\otimes U^{(n-1)}\otimes\cdots\otimes U^{(1)}\big)^T,$$
i.e. $Y_{j_1 \ldots j_N} = \sum_{i_1,\ldots,i_N}A_{i_1 \ldots i_N}U^{(1)}_{j_1i_1}\cdots U^{(N)}_{j_Ni_N} = B_{j_1 \ldots j_N}$.

$N = 2$: $Y = A\times_1 U^{(1)}\times_2 U^{(2)} = U^{(1)}A(U^{(2)})^T$, hence
$$Y_{(1)} = U^{(1)}A_{(1)}(U^{(2)})^T,\qquad Y_{(2)} = \big(U^{(1)}A(U^{(2)})^T\big)^T = U^{(2)}A_{(2)}(U^{(1)})^T.$$

$N = 3$, mode 1:
$$Y_{(1)} = \big(A\times_1 U^{(1)}\times_2 U^{(2)}\times_3 U^{(3)}\big)_{(1)} = U^{(1)}\Big(\sum_{i_2,i_3}A_{i_1i_2i_3}U^{(3)}_{j_3i_3}U^{(2)}_{j_2i_2}\Big)_{(1)} = U^{(1)}A_{(1)}\big(U^{(3)}\otimes U^{(2)}\big)^T$$
with the combined indices $k = \{i_2i_3\}$, $r = \{j_2j_3\}$.

Rank of a tensor (3-leg case)

Rank-1 tensor: $X_{ijk} = a_ib_jc_k$, i.e. $X = a\circ b\circ c$ with vectors $a$, $b$, $c$ (as a vector: $a\otimes b\otimes c$).

Rank-R tensor for the 3-leg case: PARAFAC (parallel factors) = CANDECOMP (canonical decomposition) = polyadic form = CP (CANDECOMP/PARAFAC):
$$A = u_1\circ v_1\circ w_1 + u_2\circ v_2\circ w_2 + u_3\circ v_3\circ w_3 + \ldots,\qquad A_{ijk} = \sum_{r=1}^R u_{ir}v_{jr}w_{kr}$$
The tensor rank $R$ of the tensor $(A_{ijk})$ is the number of rank-1 terms that are necessary for representing $A$.

Rank representation

$$A = \sum_{r=1}^R u_r\circ v_r\circ w_r = \sum_{r=1}^R\Big(\sum_{i=1}^I u_{ir}e_i^{(1)}\Big)\circ\Big(\sum_{j=1}^J v_{jr}e_j^{(2)}\Big)\circ\Big(\sum_{k=1}^K w_{kr}e_k^{(3)}\Big) = \sum_{i,j,k}\Big(\sum_{r=1}^R u_{ir}v_{jr}w_{kr}\Big)e_i^{(1)}\circ e_j^{(2)}\circ e_k^{(3)}$$

With the matrices $U$, $V$, and $W$ we can write
$$A_{ijk} = \sum_{r=1}^R u_{ir}v_{jr}w_{kr} = \sum_{p,q,t}u_{ip}v_{jq}w_{kt}\,\delta_{pqt},\qquad A = (U,V,W)\cdot\mathcal{I}$$
with $\mathcal{I}$ the 3-way tensor with 1 on the main diagonal; $U$, $V$, $W$ describe a basis transformation relating $A$ and $\mathcal{I}$.

Notation

Let $U$, $V$, and $W$ be the matrices built by the vectors $u_r$, $v_r$, and $w_r$. Then we can write
$$A_{(1)} = U(W\bullet V)^T,\qquad A_{(2)} = V(W\bullet U)^T,\qquad A_{(3)} = W(V\bullet U)^T.$$
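A numerical check of the mode-1 formula (column-major unfolding as before; khatri_rao is the helper sketched earlier):

```python
import numpy as np

def khatri_rao(A, B):
    return np.einsum('ik,jk->ijk', A, B).reshape(-1, A.shape[1])

I, J, K, R = 3, 4, 2, 2
U, V, W = np.random.rand(I, R), np.random.rand(J, R), np.random.rand(K, R)

A = np.einsum('ir,jr,kr->ijk', U, V, W)        # A = sum_r u_r o v_r o w_r
A1 = A.reshape(I, -1, order='F')               # mode-1 unfolding A_(1)
print(np.allclose(A1, U @ khatri_rao(W, V).T)) # True
```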
For the frontal slices $A^{(k)}$ of a three-leg tensor:
$$A^{(k)} = UD^{(k)}V^T,\qquad D^{(k)} = \mathrm{diag}(w_{k1},\ldots,w_{kR})$$

Short notation:
$$A = [\![U,V,W]\!] = \sum_{k=1}^R u_k\circ v_k\circ w_k$$
or, more generally, with factors $\lambda$:
$$A = [\![\lambda;U,V,W]\!] = \sum_{k=1}^R\lambda_k\,u_k\circ v_k\circ w_k$$

Proof (of the unfolding formula): two-leg tensor:
$$(u\circ v)_{(1)} = u\cdot v^T = \begin{pmatrix}u_1\\\vdots\\u_n\end{pmatrix}(v_1 \cdots v_m).$$
One 3-leg rank-1 term: $(u\circ(v\circ w))_{(1)} = u\cdot(w\otimes v)^T = u\cdot(w\bullet v)^T$. General 3-leg case:
$$\Big(\sum_{r=1}^R u_r\circ v_r\circ w_r\Big)_{(1)} = \sum_{r=1}^R u_r\cdot(w_r\bullet v_r)^T = (u_1 \cdots u_R)\cdot(w_1\bullet v_1 \;\cdots\; w_R\bullet v_R)^T = U(W\bullet V)^T.$$

General N-way tensor

$$A = [\![U^{(1)},U^{(2)},\ldots,U^{(N)}]\!] = \sum_{k=1}^R u^{(1)}_k\circ u^{(2)}_k\circ\cdots\circ u^{(N)}_k$$
$$A = [\![\lambda;U^{(1)},U^{(2)},\ldots,U^{(N)}]\!] = \sum_{k=1}^R\lambda_k\,u^{(1)}_k\circ u^{(2)}_k\circ\cdots\circ u^{(N)}_k$$

Mode-n matrix formula:
$$A_{(n)} = U^{(n)}\Lambda\big(U^{(N)}\bullet\cdots\bullet U^{(n+1)}\bullet U^{(n-1)}\bullet\cdots\bullet U^{(1)}\big)^T,\qquad\Lambda = \mathrm{diag}(\lambda)$$

Proof: 3-leg tensor as before: $\sum_r\lambda_r u_r\circ v_r\circ w_r$ gives $A_{(1)} = U\Lambda(W\bullet V)^T$. In general:
$$U^{(1)}\Lambda\big(U^{(N)}\bullet\cdots\bullet U^{(2)}\big)^T = U^{(1)}\big(\lambda_1 u^{(N)}_1\otimes\cdots\otimes u^{(2)}_1 \;\cdots\; \lambda_R u^{(N)}_R\otimes\cdots\otimes u^{(2)}_R\big)^T = \Big(\sum_{r=1}^R\lambda_r u^{(1)}_r\circ u^{(2)}_r\circ\cdots\circ u^{(N)}_r\Big)_{(1)}.$$

Low rank approximation

$$A_{i_1 \ldots i_N} = \sum_{k=1}^R a^{(1)}_{ki_1}\cdots a^{(N)}_{ki_N} \approx \sum_{k=1}^r b^{(1)}_{ki_1}\cdots b^{(N)}_{ki_N}$$
(1) For $R$ large enough, every $A$ can be represented by CP.
(2) For given $A$ there is a minimum $R$ with this property.
(3) Approximate $A$ as well as possible by $r < R$ terms.

PARAFAC Graphical

(Diagram: factor nodes $A^{(1)},\ldots,A^{(N)}$ with free legs $i_1,\ldots,i_N$, joined by the summed index $r$:)
$$A_{i_1,i_2,\ldots,i_N} \approx \sum_{r=1}^M A^{(1)}_{i_1,r}A^{(2)}_{i_2,r}\cdots A^{(N)}_{i_N,r}$$

Norm etc.

Inner product: $\displaystyle\langle A,B\rangle = \sum_{i_1 \ldots i_N = 1}^{n_1,\ldots,n_N}A_{i_1 \ldots i_N}B_{i_1 \ldots i_N}$

Norm: $\displaystyle\|A\|^2 = \sum_{i_1 \ldots i_N = 1}^{n_1,\ldots,n_N}A^2_{i_1 \ldots i_N}$

Rank-one tensor: $A = a^{(1)}\circ a^{(2)}\circ\cdots\circ a^{(N)}$ with vectors $a^{(j)}$, i.e. $A_{i_1 \ldots i_N} = a^{(1)}_{i_1}a^{(2)}_{i_2}\cdots a^{(N)}_{i_N}$

Diagonal tensor: $A_{i_1 \ldots i_N}\neq 0$ only if $i_1 = i_2 = \ldots = i_N$
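A sketch building $[\![\lambda;U,V,W]\!]$ and evaluating the inner product and norm by full contraction:

```python
import numpy as np

I, J, K, R = 3, 4, 2, 2
lam = np.array([2.0, 0.5])
U, V, W = np.random.rand(I, R), np.random.rand(J, R), np.random.rand(K, R)

A = np.einsum('r,ir,jr,kr->ijk', lam, U, V, W)   # A = [[lambda; U, V, W]]

B = np.random.rand(I, J, K)
inner = np.einsum('ijk,ijk->', A, B)             # <A, B>
norm = np.sqrt(np.einsum('ijk,ijk->', A, A))     # ||A||
print(inner, np.isclose(norm, np.linalg.norm(A)))
```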
Symmetry

A tensor is called cubical if every mode is of the same size, $n_1 = n_2 = \ldots = n_N$. A cubical tensor is called supersymmetric if its elements remain constant under any permutation of the indices:
$$A_{i_1 \ldots i_N} = A_{i_{\pi(1)} \ldots i_{\pi(N)}}$$
A tensor is partially symmetric if it is symmetric in some modes, e.g. a three-way tensor where all frontal slices are symmetric matrices.

Example

$A_{111} = 1$, $A_{112} = A_{121} = A_{211} = 2$, $A_{122} = A_{212} = A_{221} = 3$, $A_{222} = 4$; frontal slices
$$A_{:,:,1} = \begin{pmatrix}1&2\\2&3\end{pmatrix},\qquad A_{:,:,2} = \begin{pmatrix}2&3\\3&4\end{pmatrix}$$

Results on tensor rank

$$A_{i_1 \ldots i_N} = \sum_{k=1}^R a^{(1)}_{ki_1}\cdots a^{(N)}_{ki_N}\quad\text{with minimum }R,\ \text{dimensions }n_1,\ldots,n_N,\ n_j\le n.$$

General N-way tensor: $R = \mathrm{rank}\le n^{N-1}$.
Proof: assume $n_N = n = \max n_j$. Then
$$A = \sum_{i_1,\ldots,i_N}A_{i_1 \ldots i_N}\,e^{(1)}_{i_1}\otimes\cdots\otimes e^{(N)}_{i_N} = \sum_{i_1,\ldots,i_{N-1}}e^{(1)}_{i_1}\otimes\cdots\otimes e^{(N-1)}_{i_{N-1}}\otimes\Big(\sum_{i_N}A_{i_1 \ldots i_N}e^{(N)}_{i_N}\Big),$$
where the summation runs over at most $\prod_{j=1}^{N-1}n_j \le n^{N-1}$ rank-1 terms.

The true rank might be much smaller:
- The maximum rank of a 3-leg $3\times3\times3$ tensor over $\mathbb{R}$ is bounded by 5.
- For a general 3-leg $I\times J\times K$ tensor $A$ the maximum rank is bounded by $\mathrm{rank}(A)\le\min\{IJ,IK,JK\}$.
- For a general 3-leg $I\times J\times2$ tensor $A$ the maximum rank is bounded by $\mathrm{rank}(A)\le\min\{I,J\}+\min\{I,J,\lceil\max\{I,J\}/2\rceil\}$.
- The typical rank of a 3-leg $5\times3\times3$ tensor over $\mathbb{R}$ is 5 or 6.

Example: $A = a\otimes a + a\otimes b + b\otimes a + b\otimes b$ with linearly independent $a$ and $b$ is written with 4 linearly independent terms (so $\mathrm{rank}\le4$), but $A = (a+b)\otimes(a+b)$ has rank 1.

Theorem: $\mathrm{rank}(A) = 3$ for
$$A = v_1\otimes v_2\otimes w_3 + v_1\otimes w_2\otimes v_3 + w_1\otimes v_2\otimes v_3$$
with linearly independent $v_j, w_j$.

Proof:
(1) $\mathrm{rank}(A) = 0$ would mean $A = 0$; but grouping $A = v_1\otimes a + w_1\otimes b$ with $a = v_2\otimes w_3 + w_2\otimes v_3 \neq 0$ and $b = v_2\otimes v_3 \neq 0$ contradicts the linear independence of $v_1$ and $w_1$.
(2) $\mathrm{rank}(A) = 1$: assume $u\otimes v\otimes w = v_1\otimes v_2\otimes w_3 + v_1\otimes w_2\otimes v_3 + w_1\otimes v_2\otimes v_3$. Take a linear functional with $\varphi_1(v_1) = 1$, set $\varphi := \varphi_1\otimes\mathrm{id}\otimes\mathrm{id}$, and apply it to the equation:
$$\varphi_1(u)\,v\otimes w = v_2\otimes w_3 + w_2\otimes v_3 + \varphi_1(w_1)\,v_2\otimes v_3 = v_2\otimes w_3 + (w_2+\varphi_1(w_1)v_2)\otimes v_3.$$
The left side is a matrix of rank at most 1, the right side a matrix of rank 2, a contradiction!
(3) $\mathrm{rank}(A) = 2$: assume
$$u\otimes v\otimes w + u'\otimes v'\otimes w' = v_1\otimes v_2\otimes w_3 + v_1\otimes w_2\otimes v_3 + w_1\otimes v_2\otimes v_3.$$
If $u$ and $u'$ are linearly dependent, there is a functional with $\varphi_1(u) = \varphi_1(u') = 0$ and $\varphi_1(v_1)\neq0$ or $\varphi_1(w_1)\neq0$; then
$$0 = (\varphi_1\otimes\mathrm{id}\otimes\mathrm{id})(A) = \varphi_1(v_1)(v_2\otimes w_3 + w_2\otimes v_3) + \varphi_1(w_1)\,v_2\otimes v_3,$$
contradicting the linear independence. Hence $u$ and $u'$ have to be linearly independent, and one of the vectors $u$ or $u'$ must be linearly independent of $v_1$, say $u'$. Choose a functional with $\varphi_1(v_1) = 1$, $\varphi_1(u') = 0$:
$$\varphi_1(u)\,v\otimes w = (v_2\otimes w_3 + w_2\otimes v_3) + \varphi_1(w_1)\,v_2\otimes v_3.$$
Again the left-hand-side matrix has rank at most 1, while the right-hand-side matrix has rank 2, a contradiction!

Symmetric rank

For a supersymmetric tensor we can define the symmetric rank:
$$\mathrm{rank}_S(A) = \min\Big\{r:\ A = \sum_{k=1}^r a_k\circ a_k\circ\cdots\circ a_k\Big\}$$

Example: $A = (1,2)^{\otimes3}$ and $B = (1,1)^{\otimes3}$ have frontal slices
$$A:\ \begin{pmatrix}1&2\\2&4\end{pmatrix},\ \begin{pmatrix}2&4\\4&8\end{pmatrix},\qquad B:\ \begin{pmatrix}1&1\\1&1\end{pmatrix},\ \begin{pmatrix}1&1\\1&1\end{pmatrix},\qquad A+B:\ \begin{pmatrix}2&3\\3&5\end{pmatrix},\ \begin{pmatrix}3&5\\5&9\end{pmatrix},$$
so $A+B$ is supersymmetric of symmetric rank 2. Rank 1? $(a,b)^{\otimes3} = A+B$ gives 4 equations for the 2 unknowns $a, b$:
$$a^3 = 2,\quad b^3 = 9,\quad a^2b = 3,\quad ab^2 = 5;$$
no solution.

Smallest Typical Rank 3-way

Smallest typical rank of an $I\times J\times K$ tensor over $\mathbb{R}$:

            K=2                 K=3             K=4
  I    J=2 J=3 J=4 J=5     J=3 J=4 J=5     J=4 J=5
  2     2   3   4   4       3   4   5       4   5
  3     3   3   4   5       5   5   5       6   6
  4     4   4   4   5       5   6   6       7   8
  5     4   5   5   5       5   6   8       8   9
  6     4   6   6   6       6   7   8       8  10
  7     4   6   7   7       7   7   9       9  10
  8     4   6   8   8       8   8   9      10  11
  9     4   6   8   9       9   9   9      10  12
 10     4   6   8  10       9  10  10      10  12
 11     4   6   8  10       9  11  11      11  13
 12     4   6   8  10       9  12  12      12  13

Counting degrees of freedom: a rank-$R$ CP of a tensor with $IJK$ entries has $R(I+J+K-2)$ parameters, giving the expected rank $\lceil IJK/(I+J+K-2)\rceil$.

Examples

Strassen's algorithm arises by considering a 3-leg tensor of rank 7 (Hackbusch, p. 69): with $2\times2$ submatrices $a_j$, $b_j$, $c_j$,
$$\begin{pmatrix}a_1&a_2\\a_3&a_4\end{pmatrix}\cdot\begin{pmatrix}b_1&b_2\\b_3&b_4\end{pmatrix} = \begin{pmatrix}c_1&c_2\\c_3&c_4\end{pmatrix},\qquad c_\nu = \sum_{\mu,\lambda=1}^4 t_{\nu,\mu,\lambda}\,a_\mu b_\lambda,$$
and the tensor $t$ is of rank 7.

Matrix case: SVD

For a tensor that is a (nonzero) vector, the rank is 1. For a tensor that is an $n\times m$ matrix, the rank is given by the singular value decomposition
$$A = U\Sigma V^T = \sum_{i=1}^r\sigma_i(u_iv_i^T) = \sum_{i=1}^r\sigma_i(u_i\otimes v_i),$$
$r$ = the number of nonzero singular values. For low-rank approximation we can delete the small singular values.

Uniqueness of CP

Matrix case: an $n\times m$ matrix of rank $r$:
$$A = U_{n,r}V_{r,m} = \sum_{k=1}^r u_kv_k^T$$
Every matrix factorization of this form gives a CP representation: QR factorizations, SVD, ... In the matrix case (2-leg case) the rank representations are not unique!

Uniqueness 3-leg case

Let $A$ be a three-way tensor of rank $R$:
$$A = [\![U,V,W]\!] = \sum_{k=1}^R u_k\circ v_k\circ w_k$$
Uniqueness is understood relative to other rank-$R$ representations, up to scaling and up to permutations:
$$A = [\![U,V,W]\!] = [\![U\Pi,V\Pi,W\Pi]\!]\quad\text{for any }R\times R\text{ permutation }\Pi,$$
$$A = \sum_{k=1}^R(\alpha_ku_k)\circ(\beta_kv_k)\circ(\gamma_kw_k)\quad\text{with }\alpha_k\beta_k\gamma_k = 1,\ k = 1,\ldots,R.$$

k-rank of a matrix

The k-rank of a matrix $A$, denoted by $k_A$, is the maximum number $k$ such that any $k$ columns of $A$ are linearly independent. Let $T = [\![A,B,C]\!]$; then the CP representation of $T$ is unique if
$$k_A + k_B + k_C \ge 2R + 2.$$

For $T$ an $I\times J\times K$ tensor of rank $R$: the CP representation of $T$ is unique if
$$\min\{I,R\} + \min\{J,R\} + \min\{K,R\} \ge 2R + 2.$$
For $R\le K$ the CP representation of $T$ is unique if
$$2R(R-1) \le I(I-1)J(J-1).$$
The CP representation is unique for an N-way rank-$R$ tensor
$$A = [\![A^{(1)},A^{(2)},\ldots,A^{(N)}]\!] = \sum_{k=1}^R a^{(1)}_k\circ a^{(2)}_k\circ\cdots\circ a^{(N)}_k$$
if
$$\sum_{n=1}^N k_{A^{(n)}} \ge 2R + (N-1).$$

Approximation of tensor by CP

Matrix case: trivial via SVD; keep the larger singular values and replace the smaller ones by 0. For 3-way tensors this is not so easy. In particular, for
$$A = \sum_{k=1}^R\lambda_k\,u_k\circ v_k\circ w_k$$
summing up $r$ of these terms will in general not give a good rank-$r$ approximation. For finding the best rank-$r$ approximation we have to determine all factors simultaneously!
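The trivial matrix case as a numpy sketch (Eckart-Young truncation):

```python
import numpy as np

A = np.random.rand(6, 5)
r = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]      # best rank-r approximation

print(np.isclose(np.linalg.norm(A - A_r),        # Frobenius error equals
                 np.sqrt(np.sum(s[r:]**2))))     # the discarded singular values
```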
Rank-r approximation

The situation is even worse: the best rank-$r$ approximation might not even exist!

Consider $A = u_1\circ v_1\circ w_2 + u_1\circ v_2\circ w_1 + u_2\circ v_1\circ w_1$, where the matrices $U$, $V$, and $W$ have linearly independent columns. Approximation by rank-2 tensors:
$$B_\alpha = \alpha\Big(u_1+\frac1\alpha u_2\Big)\circ\Big(v_1+\frac1\alpha v_2\Big)\circ\Big(w_1+\frac1\alpha w_2\Big) - \alpha\,u_1\circ v_1\circ w_1$$
$$A - B_\alpha = -\frac1\alpha\big(u_1\circ v_2\circ w_2 + u_2\circ v_1\circ w_2 + u_2\circ v_2\circ w_1\big) - \frac1{\alpha^2}\,u_2\circ v_2\circ w_2 \;\xrightarrow{\;\alpha\to\infty\;}\; 0$$
An example for degeneracy!

Another example:
$$A(n) = \frac{n^2}{2}\Big(x+\frac{1}{n^2}y+\frac{1}{n}z\Big)^{\otimes3} + \frac{n^2}{2}\Big(x+\frac{1}{n^2}y-\frac{1}{n}z\Big)^{\otimes3} - n^2\,x^{\otimes3}$$
with linearly independent $x, y, z$. The sequence of rank-3 tensors converges for $n\to\infty$ to the rank-5 tensor
$$A(\infty) = x\otimes x\otimes y + x\otimes y\otimes x + y\otimes x\otimes x + x\otimes z\otimes z + z\otimes x\otimes z + z\otimes z\otimes x.$$

Rank spaces

Hence a sequence of rank-2 tensors can converge to a rank-3 tensor: the space of rank-2 tensors is not closed! We can approximate the 3-way tensor as well as we want by rank-2 tensors, but the sequence of approximations does not converge in the rank-2 space. (Figure: iterates $B_0, B_1, B_2, \ldots$ approaching $A$, with limit $B_\infty$ outside the rank-2 set.)

Computing the CP

Standard method: Alternating Least Squares (ALS). Given any (high-rank) tensor $A$, compute a rank-$r$ approximation $B$:
$$\min_B\|A-B\|\quad\text{with}\quad B = \sum_{k=1}^r\lambda_k\,u_k\circ v_k\circ w_k = [\![\lambda;U,V,W]\!]$$
ALS approach: fix two matrices, e.g. $V$ and $W$, and solve for $U$. This leads to the matrix minimization
$$\min_{\hat U}\big\|A_{(1)}-\hat U(W\bullet V)^T\big\|_F$$
with solution
$$\hat U = A_{(1)}\big((W\bullet V)^T\big)^{\dagger} = A_{(1)}(W\bullet V)\big(W^TW*V^TV\big)^{-1}.$$

Advantage: only the pseudoinverse of a small $r\times r$ matrix has to be computed. Afterwards, $\lambda$ is defined by normalization:
$$\lambda_k = \|\hat u_k\|,\qquad u_k = \hat u_k/\lambda_k,\qquad k = 1,\ldots,r.$$
In this way we update $U$, then $V$, then $W$, then again $U$, and so on until convergence. (A runnable numpy sketch follows after the application example below.) Costs per step:

ELS

ALS with enhanced line search. Assume ALS has computed a new $U_{new}$ replacing $U_{old}$; hence we have a change in the direction $\Delta = U_{new}-U_{old}$, i.e. $U_{new} = U_{old}+\Delta$. We generalize this by introducing a line search with step size $\mu$, $U_{new} = U_{old}+\mu\Delta$, looking for an optimal value of $\mu$:
$$\min_\mu\Big\|A-\sum_{k=1}^R(u_k+\mu\delta_k)\circ v_k\circ w_k\Big\|^2 = \min_\mu\Big\|\Big(A-\sum_{k=1}^R u_k\circ v_k\circ w_k\Big)-\mu\sum_{k=1}^R\delta_k\circ v_k\circ w_k\Big\|^2 = \min_\mu\|B-\mu C\|^2\ \to\ \mu,\quad U_{new} = U_{old}+\mu\Delta.$$

ELS in general: $U_{new} = U_{old}+\mu\Delta_U$, $V_{new} = V_{old}+\mu\Delta_V$, $W_{new} = W_{old}+\mu\Delta_W$:
$$\min_\mu\Big\|A-\sum_{k=1}^R(u_k+\mu\delta_{u,k})\circ(v_k+\mu\delta_{v,k})\circ(w_k+\mu\delta_{w,k})\Big\|^2 = \min_\mu\|B-\mu^3C-\mu^2D-\mu E\|^2 = \min_\mu\big(a_0+a_1\mu+a_2\mu^2+a_3\mu^3+a_4\mu^4+a_5\mu^5+a_6\mu^6\big)$$
Find the 5 roots of the derivative and choose the root with the minimum value of the objective function. This gives new $U$, $V$, and $W$; use ALS for new search directions and repeat.

Application of the CP

Starting point: 3-leg tensors often have small rank, and the low-rank approximation is unique. Therefore the best approximating rank-1 terms can give useful information on the data:
- mixtures of analytes can be separated,
- concentrations can be measured,
- pure spectra and profiles can be estimated.
Typical example: 3-way data in time, space, frequency. Translate the matrix case, by an additional index, into a 3-leg tensor to achieve uniqueness!

Van Huffel: PARAFAC in EEG monitoring, with the EEG data as a 3-way tensor. (Figures: EEG monitoring data, the extracted rank-1 terms, and epileptic seizure onset localization.) The CP gives better localization than visual inspection or other matrix techniques.
Block PARAFAC (L,L,1)

Consider more general higher-rank terms of type (L,L,1), because larger blocks might be necessary for an accurate representation of the data:
$$T = \sum_{r=1}^R E_r\otimes c_r,\qquad E_r:\ I\times J\text{ matrix},\ \mathrm{rank}(E_r) = L,$$
$$T = \sum_{r=1}^R(A_r\cdot B_r^T)\otimes c_r.$$
Block representations are also often unique, e.g. for $RL\le\min(I,J)$ and $C$ without proportional columns: "essentially unique", up to
- permutations,
- a factor between $A_r$ and $B_r$,
- scaling.

Visualization

(Figure: $T$ as a block-diagonal core $D$ contracted with $A$, $B$, $c$.)
$$T = (A,B,c)\cdot D = [\![D;A,B,c]\!] = \sum_{r=1}^R D_r\times_1 A_r\times_2 B_r\times_3 c_r$$

Waring Problem

Write a given integer $n$ in the form
$$n = n_1^d + \ldots + n_{k_d}^d$$
(proved possible by Hilbert, 1909). What is the minimum number $k_d$? Reformulation in polynomials: what is the minimum $s$ such that a degree-$d$ polynomial can be written as a sum of $d$-th powers of linear forms:
$$P = L_1^d + \ldots + L_s^d$$
Answered by Hirschowitz 1995.

Symmetric Rank of Polynomial

The minimum $s$ is called the symmetric rank of the polynomial $P$. Reformulation: consider the map
$$\nu_d:\ \mathbb{P}(S^1)\to\mathbb{P}(S^d),\qquad L\mapsto L^d.$$
The image of this map is called the $d$-th Veronese variety $X_{n,d}$.

Veronese Variety for Tensors

$$\nu_d:\ \mathbb{P}(V)\to\mathbb{P}(S^dV)\subset\mathbb{P}(V^{\otimes d}),\qquad v\mapsto v^{\otimes d}.$$
The image of this map is the Veronese variety of $\mathbb{P}(V)$. The symmetric rank of a symmetric tensor $T\in S^d(V)$ is the minimum $s$ with
$$T = v_1^{\otimes d} + \ldots + v_s^{\otimes d}.$$

Mode n-Rank of a Tensor

View the tensor as the collection of its vectors in the n-th index (fibers); the rank of this collection of vectors is the mode n-rank. The mode n-rank is the rank of the mode-n unfolding matrix $A_{(n)}$.

Example with $R_1 = R_2 = 2$, $R_3 = 1$: the $2\times2\times2$ tensor with both frontal slices equal to $\begin{pmatrix}0&1\\1&0\end{pmatrix}$. Its mode-3 fibers are the vectors $(0,0)$, $(1,1)$, $(1,1)$, $(0,0)$, so
$$R_3 = \mathrm{rank}\begin{pmatrix}0&1&1&0\\0&1&1&0\end{pmatrix} = 1.$$

Tucker Decomposition

Also known as: (three-mode) factor analysis (Tucker, 1966), N-mode PCA (principal component analysis), higher-order SVD (HOSVD) (De Lathauwer, 2000), N-mode SVD. Idea: decompose a given N-way tensor into a core N-way tensor with fewer entries in each dimension, multiplied by a matrix along each mode:
$$A = G\times_1 U\times_2 V\times_3 W = \sum_{p=1}^P\sum_{q=1}^Q\sum_{k=1}^K g_{pqk}\,u_p\circ v_q\circ w_k = [\![G;U,V,W]\!] = (U,V,W)\cdot G$$
with core tensor $G$ and matrices $U$, $V$, $W$ relative to each mode.

Procedure: unfold $A$ in $i$ and $\{jk\}$, compute the SVD to obtain $U$; repeat for the other unfoldings. Then
$$(A_{ijk}) = (S_{ijk})\times_1 U^{(1)}\times_2 V^{(1)}\times_3 W^{(1)},\qquad A_{ijk} = \sum_{\mu_1=1}^{k_1}\sum_{\mu_2=1}^{k_2}\sum_{\mu_3=1}^{k_3}S_{\mu_1\mu_2\mu_3}\,u_{i\mu_1}v_{j\mu_2}w_{k\mu_3},$$
$i = 1,\ldots,I$, $j = 1,\ldots,J$, $k = 1,\ldots,K$, with multilinear rank $(k_1,k_2,k_3)$.

Computation

$$A_{(1)}:\ I_1\times I_2I_3,\quad\mathrm{SVD}:\ A_{(1)} = U^{(1)}\Sigma^{(1)}V^{(1)T}$$
$$A_{(2)}:\ I_2\times I_1I_3,\quad\mathrm{SVD}:\ A_{(2)} = U^{(2)}\Sigma^{(2)}V^{(2)T}$$
$$A_{(3)}:\ I_3\times I_1I_2,\quad\mathrm{SVD}:\ A_{(3)} = U^{(3)}\Sigma^{(3)}V^{(3)T}$$
$$S = A\times_1 U^{(1)T}\times_2 U^{(2)T}\times_3 U^{(3)T},\qquad A = S\times_1 U^{(1)}\times_2 U^{(2)}\times_3 U^{(3)}$$
with $U^{(n)}$ having orthonormal columns and $S$ all-orthogonal and ordered.

Proof: using the rules
$$n\neq m:\ A\times_m U\times_n V = A\times_n V\times_m U,\qquad A\times_n U\times_n V = A\times_n(VU),\qquad B = A\times_n U\iff B_{(n)} = UA_{(n)},$$
we get
$$S\times_1 U^{(1)}\times_2 U^{(2)}\times_3 U^{(3)} = A\times_1(U^{(1)}U^{(1)T})\times_2(U^{(2)}U^{(2)T})\times_3(U^{(3)}U^{(3)T}) = A\times_1 I\times_2 I\times_3 I = A.$$

Core Tensor

$S$ is all-orthogonal: $A = S\times_1 U^{(1)}\times_2 U^{(2)}\times_3 U^{(3)}$ with the additional property
$$\langle S_{i,:,:},S_{j,:,:}\rangle = 0\quad\text{for }i\neq j.$$
Proof: $S_{(1)} = U^{(1)T}A_{(1)}(U^{(3)}\otimes U^{(2)})$, so the $i$-th row of $S_{(1)}$ is $S_{(1),i} = u_i^{(1)T}A_{(1)}(U^{(3)}\otimes U^{(2)})$, and
$$\langle S_{(1),i},S_{(1),j}\rangle = u_i^{(1)T}A_{(1)}(U^{(3)}\otimes U^{(2)})(U^{(3)}\otimes U^{(2)})^TA_{(1)}^Tu_j^{(1)} = 0\quad\text{for }i\neq j,$$
since $u_i^{(1)T}A_{(1)} = \sigma_i v_i^{(1)T}$ by the SVD of $A_{(1)}$; similarly for indices 2 and 3.

Properties

Mode-n singular values = norms of the core slices = singular values of $A_{(n)}$. Truncate by deleting small singular values/vectors:
$$S = A\times_1 U^{(1)T}\times_2 U^{(2)T}\times_3 U^{(3)T}\ \to\ \tilde S = A\times_1\tilde U^{(1)T}\times_2\tilde U^{(2)T}\times_3\tilde U^{(3)T},\qquad A\ \to\ \tilde A = \tilde S\times_1\tilde U^{(1)}\times_2\tilde U^{(2)}\times_3\tilde U^{(3)}.$$
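A sketch of the HOSVD computation (unfold and mode_n_product as in the earlier sketches; economy-size SVDs suffice for exact reconstruction):

```python
import numpy as np

def unfold(A, n):
    return np.reshape(np.moveaxis(A, n, 0), (A.shape[n], -1), order='F')

def mode_n_product(A, U, n):
    return np.moveaxis(np.tensordot(U, np.moveaxis(A, n, 0), axes=(1, 0)), 0, n)

def hosvd(A):
    # Mode-n bases from the SVD of each unfolding, core by projection
    Us = [np.linalg.svd(unfold(A, n), full_matrices=False)[0] for n in range(A.ndim)]
    S = A
    for n, U in enumerate(Us):
        S = mode_n_product(S, U.T, n)
    return S, Us

A = np.random.rand(3, 4, 2)
S, Us = hosvd(A)
B = S
for n, U in enumerate(Us):
    B = mode_n_product(B, U, n)     # A = S x_1 U1 x_2 U2 x_3 U3
print(np.allclose(A, B))            # True
```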
Tucker Graphical

(Diagram: core node $B$ with factor matrices $C^{(1)},\ldots,C^{(N)}$ attached to the legs $i_1,\ldots,i_N$.)
$$A_{i_1,i_2,\ldots,i_N} \approx \sum_{m_1,\ldots,m_N}B_{m_1,\ldots,m_N}\,C^{(1)}_{i_1m_1}C^{(2)}_{i_2m_2}\cdots C^{(N)}_{i_Nm_N}$$
with a small core $B$.

Three-way Tucker

$$A = G\times_1 U\times_2 V\times_3 W = \sum_{p=1}^P\sum_{q=1}^Q\sum_{k=1}^K G_{pqk}\,u_p\circ v_q\circ w_k = [\![G;U,V,W]\!],\qquad A_{ijm} = \sum_{p=1}^P\sum_{q=1}^Q\sum_{k=1}^K G_{pqk}\,u_{ip}v_{jq}w_{mk},$$
$$A_{(1)} = U\cdot G_{(1)}(W\otimes V)^T,\qquad A_{(2)} = V\cdot G_{(2)}(W\otimes U)^T,\qquad A_{(3)} = W\cdot G_{(3)}(V\otimes U)^T.$$

N-way Tucker

$$A = G\times_1 U^{(1)}\times_2 U^{(2)}\cdots\times_N U^{(N)} = [\![G;U^{(1)},U^{(2)},\ldots,U^{(N)}]\!],$$
$$A_{i_1i_2 \ldots i_N} = \sum_{k_1=1}^{R_1}\sum_{k_2=1}^{R_2}\cdots\sum_{k_N=1}^{R_N}G_{k_1k_2 \ldots k_N}\,u^{(1)}_{i_1k_1}u^{(2)}_{i_2k_2}\cdots u^{(N)}_{i_Nk_N},\qquad i_n = 1,\ldots,I_n,$$
$$A_{(n)} = U^{(n)}\cdot G_{(n)}\big(U^{(N)}\otimes\ldots\otimes U^{(n+1)}\otimes U^{(n-1)}\otimes\ldots\otimes U^{(1)}\big)^T,$$
so Tucker1 is the decomposition relative to only one index, Tucker2 relative to 2 indices, and Tucker relative to all indices.

Computing the Tucker Dec.

For $n = 1,\ldots,N$: $U^{(n)}$ := matrix of left singular vectors of $A_{(n)}$;
$$G := A\times_1 U^{(1)T}\times_2 U^{(2)T}\times_3\cdots\times_N U^{(N)T}.$$
Output: $G$, $U^{(1)},\ldots,U^{(N)}$. We can also use this algorithm for approximating $A$ by choosing in $U^{(n)}$ only the $r_n \le R_n$ dominant left singular vectors!

Approximating Tucker dec.

$$\min_{G,U^{(1)},\ldots,U^{(N)}}\big\|A-[\![G;U^{(1)},\ldots,U^{(N)}]\!]\big\|\quad\text{subject to } G\in\mathbb{R}^{r_1\times\ldots\times r_N},\ U^{(n)}\in\mathbb{R}^{I_n\times r_n}\text{ columnwise orthogonal.}$$
Rewrite as minimizing $\|\mathrm{vec}(A)-(U^{(N)}\otimes\cdots\otimes U^{(1)})\mathrm{vec}(G)\|$ with solution $G := A\times_1 U^{(1)T}\times_2 U^{(2)T}\times_3\cdots\times_N U^{(N)T}$. Then
$$\big\|A-[\![G;U^{(1)},\ldots,U^{(N)}]\!]\big\|^2 = \|A\|^2 - 2\big\langle A,[\![G;U^{(1)},\ldots,U^{(N)}]\!]\big\rangle + \big\|[\![G;U^{(1)},\ldots,U^{(N)}]\!]\big\|^2$$
$$= \|A\|^2 - 2\big\langle A\times_1 U^{(1)T}\cdots\times_N U^{(N)T},\,G\big\rangle + \|G\|^2 = \|A\|^2 - 2\langle G,G\rangle + \|G\|^2 = \|A\|^2 - \|G\|^2 = \|A\|^2 - \big\|A\times_1 U^{(1)T}\cdots\times_N U^{(N)T}\big\|^2.$$

ALS for Tucker

So we have to maximize
$$\max_{U^{(n)}}\big\|A\times_1 U^{(1)T}\times_2\cdots\times_N U^{(N)T}\big\|\quad\text{subject to }U^{(n)}\in\mathbb{R}^{I_n\times r_n}\text{ columnwise orthogonal,}$$
$$\max_{U^{(n)}}\big\|U^{(n)T}W\big\|\quad\text{with}\quad W = A_{(n)}\big(U^{(N)}\otimes\cdots\otimes U^{(n+1)}\otimes U^{(n-1)}\otimes\cdots\otimes U^{(1)}\big).$$
ALS method: for $n = 1,\ldots,N$ choose $U^{(n)}$ as the $r_n$ dominant left singular vectors of $W$; repeat until convergence.

Uniqueness

Tucker is not unique:
$$[\![G;A,B,C]\!] = [\![G\times_1 U\times_2 V\times_3 W;\ AU^{-1},BV^{-1},CW^{-1}]\!].$$

Application: Tensorfaces

Given a database of images of different persons, e.g. with different looks/expressions, illuminations, and positions/views, we can collect all the images in a big 5-leg tensor
$$A = (a_{i_{\text{people}},i_{\text{views}},i_{\text{illum}},i_{\text{express}},i_{\text{pixel}}}).$$
In the example there is a database of 28 male persons with 5 poses, 3 illuminations, and 3 expressions, each image 512x352 pixels; hence $A$ is a $28\times5\times3\times3\times7943$ tensor. (Figure: the database, 28 subjects with 45 images per person, arranged by expression, viewpoint, and illumination.)

Principal Component Analysis PCA

PCA uses eigenfaces to capture the important features in a compact form; often an eigendecomposition and eigenvectors are used in PCA. Here we use the Tucker decomposition:
$$A = Z\times_1 U_{\text{people}}\times_2 U_{\text{views}}\times_3 U_{\text{illum}}\times_4 U_{\text{express}}\times_5 U_{\text{pixels}}$$
resulting in
$$A_{(\text{pixels})} = U_{\text{pixels}}\,Z_{(\text{pixels})}\big(U_{\text{express}}\otimes U_{\text{illum}}\otimes U_{\text{views}}\otimes U_{\text{people}}\big)^T$$
(image data = basis vectors times coefficients).

Interpretation

The mode matrix $U_{\text{pixels}}$ can be interpreted as the PCA basis. By the core tensor $Z$ we can transform the eigenimages present in $U_{\text{pixels}}$ into eigenmodes representing the principal axes of variation across the various factors (people, viewpoints, illuminations, expressions) by forming $Z\times_5 U_{\text{pixels}}$.

Eigenfaces

(Figure: the first 10 PCA eigenvectors (eigenfaces) contained in the mode matrix $U_{\text{pixels}}$.) See "Multilinear Analysis of Image Ensembles: TensorFaces" by M.A.O. Vasilescu and D. Terzopoulos; a similar paper treats PCA of human motion via tensors.
PCA-CP-Tucker

PCA, bilinear model:
$$x_{ij} = \sum_f a_{if}b_{jf} + e_{ij},\quad i = 1,\ldots,I,\ j = 1,\ldots,J;\qquad X = AB^T + E.$$
CP, trilinear model:
$$x_{ijk} = \sum_f a_{if}b_{jf}c_{kf} + e_{ijk};\qquad X_k = AD_kB^T + E_k,\quad D_k = \mathrm{diag}(C(k,:)).$$
Tucker3:
$$X = AG(C\otimes B)^T + E$$
with unitary bases $A$, $B$, $C$ for each mode.