Master's Thesis (Examensarbete)

Tensor Rank

Elias Erdtman, Carl Jönsson

LiTH-MAT-EX--2012/06--SE

Applied Mathematics, Linköpings Universitet

Examensarbete: 30 hp
Level: A
Supervisor: Göran Bergqvist, Applied Mathematics, Linköpings Universitet
Examiner: Milagros Izquierdo Barrios, Applied Mathematics, Linköpings Universitet
Linköping, June 2012
Abstract
This master's thesis addresses numerical methods for computing the typical ranks of tensors over the real numbers and explores some properties of tensors over finite fields. We present three numerical methods to compute typical tensor rank. Two of these have already been published and can be used to calculate the lowest typical ranks of tensors and an approximate percentage of how many tensors have the lowest typical ranks (for some tensor formats), respectively. The third method was developed by the authors with the intent to be able to discern if there is more than one typical rank. Some results from the method are presented but are inconclusive. In the area of tensors over finite fields some new results are shown, namely that there are eight GL_q(2) × GL_q(2) × GL_q(2)-orbits of 2 × 2 × 2 tensors over any finite field and that some tensors over F_q have lower rank when considered as tensors over F_{q^2}. Furthermore, it is shown that some symmetric tensors over F_2 do not have a symmetric rank and that there are tensors over some other finite fields which have a larger symmetric rank than rank.
Keywords: generic rank, symmetric tensor, tensor rank, tensors over finite fields, typical rank.

URL for electronic version:
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-78449
Preface
“Tensors? Richard had no idea what a tensor was, but he had noticed that when math geeks started throwing the word around, it meant that they were headed in the general direction of actually getting something done.” - Neal Stephenson, Reamde (2011).
This text is a master's thesis, written by Elias Erdtman and Carl Jönsson at Linköpings Universitet, with Göran Bergqvist as supervisor and Milagros Izquierdo Barrios as examiner, in 2012.
Background
The study of tensors of order greater than two has recently had an upswing, both from a theoretical point of view and in applications, and there are many unanswered questions in both areas. Questions of interest include: what does a generic tensor look like, which tensor decompositions are useful and how can one compute them, and how can one find equations for sets of tensors? Basically, one wants a theory of tensors as well-developed and easy to use as the theory of matrices.
Purpose
In this thesis we aim to show some basic results on tensor rank and to investigate methods for discerning generic and typical ranks of tensors, i.e., to search for an answer to the question: which ranks are the most "common"?
Chapter outline
Chapter 1. Introduction In the first chapter we present theory relevant to tensors. It is divided into four major parts: the first part is about multilinear algebra, the second part is a short introduction to the CP decomposition, and the third part gives the reader the background in algebraic geometry necessary to understand the results in chapter 2. The fourth and last part of the chapter gives an example of an application of tensor decomposition, more specifically the multiplication tensor for 2 × 2 matrices and Strassen's algorithm for matrix multiplication.
Chapter 2. Tensor rank In the second chapter we introduce different notions of rank: tensor rank, multilinear rank, Kruskal rank, etc. We show some basic results on tensors using algebraic geometry, among them some results on generic ranks over C and typical ranks over R.

Chapter 3. Numerical methods and results Numerical results for determining typical ranks are presented in chapter three. We present an algorithm which can calculate the generic rank for any format of tensor spaces, and another algorithm from which one can infer if there is more than one typical rank over R for some tensor space formats. A method developed by the authors is also presented, along with results giving an indication that the method does not seem to work.

Chapter 4. Tensors over finite fields This chapter contains some results on finite fields. We present a classification and the sizes of the eight GL_q(2) × GL_q(2) × GL_q(2)-orbits of F_q^2 ⊗ F_q^2 ⊗ F_q^2 and show that the elements of one of the orbits have lower rank when considered as tensors over F_{q^2}. Finally we show that there are symmetric tensors over F_2 which do not have a symmetric rank, and that over some other finite fields a symmetric tensor can have a symmetric rank which is greater than its rank.

Chapter 5. Summary and future work The results of the thesis are summarized and some directions of future work are indicated.
Appendix A. Programs Program code for Mathematica or MATLAB used to produce the results in the thesis is given in this appendix.
Distribution of work
Since this is a master's thesis, we give an account of who has done what in the table below.

Section   Author
1.1       CJ/EE
1.2       EE
1.3       CJ
1.4       CJ/EE
2.1       CJ/EE
2.2-2.4   CJ
3.1-3.2   EE/CJ
3.3       EE
4         CJ
5         CJ & EE

Nomenclature
Most of the reoccurring abbreviations and symbols are described here.
Symbols
• F is a field.
• F_q is the finite field of q elements.
• I(V) is the ideal of an algebraic set V.
• V(I) is the algebraic set of zeros of an ideal I.
• Seg is the Segre mapping.
• σ_r(X) is the r:th secant variety of X.
• S_d is the symmetric group on d elements.
• ⊗ is the tensor product.
• ⊠ is the matrix Kronecker product.
• X̂ is the affine cone of a set X ⊂ PV.
• ⌈x⌉ is the number x rounded up to the nearest integer.
Contents
1 Introduction
  1.1 Multilinear algebra
    1.1.1 Tensor products and multilinear maps
    1.1.2 Symmetric and skew-symmetric tensors
    1.1.3 GL(V_1) × ··· × GL(V_k) acts on V_1 ⊗ ··· ⊗ V_k
  1.2 Tensor decomposition
  1.3 Algebraic geometry
    1.3.1 Basic definitions
    1.3.2 Varieties and ideals
    1.3.3 Projective spaces and varieties
    1.3.4 Dimension of an algebraic set
    1.3.5 Cones, joins, and secant varieties
    1.3.6 Real algebraic geometry
  1.4 Application to matrix multiplication

2 Tensor rank
  2.1 Different notions of rank
    2.1.1 Results on tensor rank
    2.1.2 Symmetric tensor rank
    2.1.3 Kruskal rank
    2.1.4 Multilinear rank
  2.2 Varieties of matrices over C
  2.3 Varieties of tensors over C
    2.3.1 Equations for the variety of tensors of rank one
    2.3.2 Varieties of higher ranks
  2.4 Real tensors

3 Numerical methods and results
  3.1 Comon, Ten Berge, Lathauwer and Castaing's method
    3.1.1 Numerical results
    3.1.2 Discussion
  3.2 Choulakian's method
    3.2.1 Numerical results
    3.2.2 Discussion
  3.3 Surjectivity check
    3.3.1 Results
    3.3.2 Discussion

4 Tensors over finite fields
  4.1 Finite fields and linear algebra
  4.2 GL_q(2) × GL_q(2) × GL_q(2)-orbits of F_q^2 ⊗ F_q^2 ⊗ F_q^2
    4.2.1 Rank zero and rank one orbits
    4.2.2 Rank two orbits
    4.2.3 Rank three orbits
    4.2.4 Main result
  4.3 Lower rank over field extensions
  4.4 Symmetric rank

5 Summary and future work
  5.1 Summary
  5.2 Future work

A Programs
  A.1 Numerical methods
    A.1.1 Comon, Ten Berge, Lathauwer and Castaing's method
    A.1.2 Choulakian's method
    A.1.3 Surjectivity check
  A.2 Tensors over finite fields
    A.2.1 Rank partitioning
    A.2.2 Orbit partitioning

Bibliography

List of Tables
3.1 Known typical ranks for 2 × N_2 × N_3 arrays over R.
3.2 Known typical ranks for 3 × N_2 × N_3 arrays over R.
3.3 Known typical ranks for 4 × N_2 × N_3 arrays over R.
3.4 Known typical ranks for 5 × N_2 × N_3 arrays over R.
3.5 Known typical ranks for N^{×d} arrays over R.
3.6 Number of real solutions to (3.7) for 10 000 random 5 × 3 × 3 tensors.
3.7 Number of real solutions to (3.7) for 10 000 random 7 × 4 × 3 tensors.
3.8 Number of real solutions to (3.7) for 10 000 random 9 × 5 × 3 tensors.
3.9 Number of real solutions to (3.7) for 10 000 random 10 × 4 × 4 tensors.
3.10 Number of real solutions to (3.7) for 10 000 random 11 × 6 × 3 tensors.
3.11 Approximate probability that a random I × J × K tensor has rank I.
3.12 Euclidean distances depending on the fraction of the area on the n-sphere.
3.13 Number of points from φ_2 close to some control points for the 2 × 2 × 2 tensor.
3.14 Number of points from φ_3 close to some control points for the 2 × 2 × 3 tensor.
3.15 Number of points from φ_3 close to some control points for the 2 × 3 × 3 tensor.
3.16 Number of points from φ_5 close to some control points for the 3 × 3 × 4 tensor.
4.1 Orbits of F_q^2 ⊗ F_q^2 ⊗ F_q^2 under the action of GL_q(2) × GL_q(2) × GL_q(2) for q = 2, 3.
4.2 Orbits of F_q^2 ⊗ F_q^2 ⊗ F_q^2 under the action of GL_q(2) × GL_q(2) × GL_q(2).
4.3 Number of symmetric 2 × 2 × 2-tensors generated by symmetric rank one tensors over some small finite fields.
4.4 Number of N × N × N symmetric tensors generated by symmetric rank one tensors over F_2.
List of Figures
1.1 The image of t ↦ (t, t², t³) for −1 ≤ t ≤ 1.
1.2 The intersection of the surfaces defined by y − x² = 0 and z − x³ = 0, namely the twisted cubic, for (−1, 0, −1) ≤ (x, y, z) ≤ (1, 1, 1).
1.3 The cuspidal cubic.
1.4 An example of a semi-algebraic set.
3.1 Connection between Euclidean distance and an angle on a 2-dimensional intersection of a sphere.
Chapter 1
Introduction
This first chapter will introduce basic notions, definitions and results concerning multilinear algebra, tensor decomposition, tensor rank and algebraic geometry. A general reference for this chapter is [25].

The simplest way to look at tensors is as a generalization of matrices; they are objects in which one can arrange multidimensional data in a natural way. For instance, if one wants to analyze a sequence of images with small differences in some property, e.g. lighting or facial expression, one can use matrix decomposition algorithms, but then one has to vectorize the images and lose their natural structure. If one could use tensors, one could keep the natural structure of the pictures, which would be a significant advantage. However, the problem then becomes that one needs new results and algorithms for tensor decomposition.

The study of decomposition of higher order tensors has its origins in articles by Hitchcock from 1927 [19, 20]. Tensor decomposition was introduced in psychometrics by Tucker in the 1960's [41], and in chemometrics by Appellof and Davidson in the 1980's [2]. Strassen published his algorithm for matrix multiplication in 1969 [37], and since then tensor decomposition has received attention in the area of algebraic complexity theory. An overview of the subject, its literature and applications can be found in [1, 24].

Tensor rank, as introduced later in this chapter, is a natural generalization of matrix rank. Kruskal [23] states that it is so natural that it was introduced independently at least three times before he introduced it himself in 1976. Tensors have recently been studied from the viewpoint of algebraic geometry, yielding results on typical ranks, which are the ranks a random tensor takes with non-zero probability. The recent book [25] summarizes the results in the field. Results often concern the typical ranks of certain formats of tensors, methods for discerning the rank of a tensor, or algorithms for computing tensor decompositions.
Algorithms for tensor decompositions are often of interest in application areas, where one wants to find structures and patterns in data. In some cases, just finding a decomposition is not enough; one wants the decomposition to be essentially unique. In these cases one wants an algorithm to find a decomposition of a tensor and some way of determining if it is unique. In other fields of application, one wants to find decompositions of important tensors, since this will yield better performing algorithms in the field, e.g. Strassen's algorithm. Of course, an algorithm for finding a decomposition would be of high interest also in this case, but uniqueness is not important. However, in this case, just knowing that a tensor has a certain rank tells one that a better algorithm exists; if the decomposition itself is the important part, just knowing the rank is of little help. We take a look at efficient matrix multiplication and Strassen's algorithm as an example application at the end of the chapter.

There are other examples of applications of tensor decomposition and rank, e.g. face recognition in the area of pattern recognition, modeling fluorescence excitation-emission data in chemistry, blind deconvolution of DS-CDMA signals in wireless communications, Bayesian networks in algebraic statistics, tensor network states in quantum information theory [25], and, in neuroscience, the study of effects of new drugs on brain activity [1, 24]. Efficient matrix multiplication is a special case of efficient evaluation of bilinear forms, see [22] and [21, section 4.6.4, pp. 506-524], which, among other things, is studied in algebraic complexity theory [9; 25, chapter 13]. Historically, tensors over R and C have been investigated. In chapter 4, we investigate tensors over finite fields and show some new results.
1.1 Multilinear algebra
In this section we introduce the basics of multilinear algebra, which is an exten- sion of linear algebra by expanding the domain from one vector space to several. For an easy introduction to tensor products of vector spaces see [42].
1.1.1 Tensor products and multilinear maps

Definition 1.1.1 (Dual space, dual basis). For a vector space V over the field F, the dual space V^* of V is the vector space of all linear maps V → F. If {v_1, v_2, ..., v_n} is a basis for V, the dual basis {α_1, α_2, ..., α_n} in V^* is defined by

    α_i(v_j) = 1 if i = j,  α_i(v_j) = 0 if i ≠ j,

and extending linearly.

Theorem 1.1.2. If V is of finite dimension, the dual basis is a basis of V^*. Furthermore, V^* is isomorphic to V, and the dual of the dual, (V^*)^*, is naturally isomorphic to V.

Definition 1.1.3 (Tensor product). For vector spaces V, W we define the tensor product V ⊗ W to be the vector space of all expressions of the form

    v_1 ⊗ w_1 + ··· + v_k ⊗ w_k

where v_i ∈ V, w_i ∈ W, and the following equalities hold for the operator ⊗:

• λ(v ⊗ w) = (λv) ⊗ w = v ⊗ (λw),
• (v_1 + v_2) ⊗ w = v_1 ⊗ w + v_2 ⊗ w,
• v ⊗ (w_1 + w_2) = v ⊗ w_1 + v ⊗ w_2,

i.e., (· ⊗ ·) is linear in both arguments.
Since V ⊗ W is a vector space, we can iteratively form tensor products V1 ⊗ V2 ⊗ · · · ⊗ Vk of an arbitrary number of vector spaces V1,V2,...,Vk. An element of V1 ⊗ V2 ⊗ · · · ⊗ Vk is said to be a tensor of order k.
Theorem 1.1.4. If {v_i}_{i=1}^{n_V} and {w_j}_{j=1}^{n_W} are bases for V and W respectively, then {v_i ⊗ w_j}_{i=1,j=1}^{n_V,n_W} is a basis for V ⊗ W and dim(V ⊗ W) = dim(V) dim(W).

Proof. Any T ∈ V ⊗ W can be written

    T = Σ_{k=1}^{n} a_k ⊗ b_k

for a_k ∈ V, b_k ∈ W. Since {v_i} and {w_j} are bases, we can write

    a_k = Σ_{i=1}^{n_V} a_{ki} v_i,  b_k = Σ_{j=1}^{n_W} b_{kj} w_j,

and thus

    T = Σ_{k=1}^{n} (Σ_{i=1}^{n_V} a_{ki} v_i) ⊗ (Σ_{j=1}^{n_W} b_{kj} w_j)
      = Σ_{k=1}^{n} Σ_{i=1}^{n_V} Σ_{j=1}^{n_W} a_{ki} b_{kj} v_i ⊗ w_j
      = Σ_{i=1}^{n_V} Σ_{j=1}^{n_W} (Σ_{k=1}^{n} a_{ki} b_{kj}) v_i ⊗ w_j,

so it follows that {v_i ⊗ w_j}_{i=1,j=1}^{n_V,n_W} is a basis, and this in turn implies dim(V ⊗ W) = dim(V) dim(W).
If {v_j^{(i)}}_{j=1}^{n_i} is a basis for V_i, this implies that {v_{j_1}^{(1)} ⊗ v_{j_2}^{(2)} ⊗ ··· ⊗ v_{j_k}^{(k)}}_{j_1=1,...,j_k=1}^{n_1,...,n_k} is a basis for V_1 ⊗ V_2 ⊗ ··· ⊗ V_k. Furthermore, if we have chosen a basis for each V_i, we can identify a tensor T ∈ V_1 ⊗ V_2 ⊗ ··· ⊗ V_k with a k-dimensional array of size dim V_1 × dim V_2 × ··· × dim V_k, where the element in position (j_1, j_2, ..., j_k) is the coefficient of v_{j_1}^{(1)} ⊗ v_{j_2}^{(2)} ⊗ ··· ⊗ v_{j_k}^{(k)} in the expansion of T in the induced basis for V_1 ⊗ V_2 ⊗ ··· ⊗ V_k. If k = 2, one gets matrices.

If one describes a third order tensor as a three-dimensional array, one can describe the tensor as a tuple of matrices. For example, say the I × J × K tensor T has the entries t_{ijk} in its array. Then T can be described as the tuple (T_1, T_2, ..., T_I) where T_i = (t_{ijk})_{j=1,k=1}^{J,K}, but it can also be described as the tuples (T'_1, T'_2, ..., T'_J) or (T''_1, T''_2, ..., T''_K), where T'_j = (t_{ijk})_{i=1,k=1}^{I,K} and T''_k = (t_{ijk})_{i=1,j=1}^{I,J}. The matrices in the tuples are called the slices of the array. Sometimes the adjectives frontal, horizontal and lateral are used to distinguish the different kinds of slices.
Example 1.1.5 (Arrays). Let {e_1, e_2} be a basis for R². Then e_1 ⊗ e_1 + 2e_1 ⊗ e_2 + 3e_2 ⊗ e_1 ∈ R² ⊗ R² can be expressed as the matrix

    [ 1 2 ]
    [ 3 0 ]

The third order tensor e_1 ⊗ e_1 ⊗ e_1 + 2e_1 ⊗ e_2 ⊗ e_2 + 3e_2 ⊗ e_1 ⊗ e_2 + 4e_2 ⊗ e_2 ⊗ e_2 ∈ R² ⊗ R² ⊗ R² can be expressed as a 3-dimensional array, and the slices of the array are

    [ 1 0 ]   [ 0 3 ]
    [ 0 2 ] , [ 0 4 ]

    [ 1 0 ]   [ 0 2 ]
    [ 0 3 ] , [ 0 4 ]

    [ 1 0 ]   [ 0 2 ]
    [ 0 0 ] , [ 3 4 ]

where each pair arises from a different way of cutting the tensor: fixing the first, second and third index, respectively.
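The three ways of slicing in example 1.1.5 can be reproduced with a short script; a minimal pure-Python sketch using nested lists (the variable names are ours, not the thesis's):

```python
# The 2x2x2 tensor from Example 1.1.5:
# T = e1⊗e1⊗e1 + 2 e1⊗e2⊗e2 + 3 e2⊗e1⊗e2 + 4 e2⊗e2⊗e2,
# stored as t[i][j][k] with 0-based indices.
t = [[[0, 0], [0, 0]] for _ in range(2)]
for (i, j, k), c in {(0, 0, 0): 1, (0, 1, 1): 2, (1, 0, 1): 3, (1, 1, 1): 4}.items():
    t[i][j][k] = c

# Three ways of slicing: fix the first, second or third index respectively.
slices_i = [[[t[i][j][k] for k in range(2)] for j in range(2)] for i in range(2)]
slices_j = [[[t[i][j][k] for k in range(2)] for i in range(2)] for j in range(2)]
slices_k = [[[t[i][j][k] for j in range(2)] for i in range(2)] for k in range(2)]

assert slices_i == [[[1, 0], [0, 2]], [[0, 3], [0, 4]]]
assert slices_j == [[[1, 0], [0, 3]], [[0, 2], [0, 4]]]
assert slices_k == [[[1, 0], [0, 0]], [[0, 2], [3, 4]]]
```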
Definition 1.1.6 (Tensor rank). The smallest R for which T ∈ V_1 ⊗ ··· ⊗ V_k can be written

    T = Σ_{r=1}^{R} v_r^{(1)} ⊗ ··· ⊗ v_r^{(k)},    (1.1)

for arbitrary vectors v_r^{(i)} ∈ V_i, is called the tensor rank of T.
Definition 1.1.7 (Multilinear map). Let V_1, ..., V_k be vector spaces over F. A map f : V_1 × ··· × V_k → F is a multilinear map if f is linear in each factor V_i.
Theorem 1.1.8. The set of all multilinear maps V_1 × ··· × V_k → F can be identified with V_1^* ⊗ ··· ⊗ V_k^*.

Proof. Let V_i have dimension n_i and basis {v_1^{(i)}, ..., v_{n_i}^{(i)}}, and let the dual basis be {α_1^{(i)}, ..., α_{n_i}^{(i)}}. Then f ∈ V_1^* ⊗ ··· ⊗ V_k^* can be written

    f = Σ_{i_1,...,i_k} β_{i_1,...,i_k} α_{i_1}^{(1)} ⊗ ··· ⊗ α_{i_k}^{(k)}

and acts as a multilinear mapping on (u_1, ..., u_k) ∈ V_1 × ··· × V_k by:

    f(u_1, ..., u_k) = Σ_{i_1,...,i_k} β_{i_1,...,i_k} α_{i_1}^{(1)}(u_1) ··· α_{i_k}^{(k)}(u_k).

Conversely, let f : V_1 × ··· × V_k → F be a multilinear mapping. Pick a basis {v_1^{(i)}, ..., v_{n_i}^{(i)}} for each V_i and let the dual basis be {α_1^{(i)}, ..., α_{n_i}^{(i)}}. Define

    β_{i_1,...,i_k} = f(v_{i_1}^{(1)}, ..., v_{i_k}^{(k)});

then

    Σ_{i_1,...,i_k} β_{i_1,...,i_k} α_{i_1}^{(1)} ⊗ ··· ⊗ α_{i_k}^{(k)} ∈ V_1^* ⊗ ··· ⊗ V_k^*

acts as the multilinear map f by the description above.
A multilinear mapping (V_1 × ··· × V_k) × W^* → F can be seen as an element of V_1^* ⊗ ··· ⊗ V_k^* ⊗ W, and can also be seen as a map V_1 ⊗ ··· ⊗ V_k → W. Explicitly, if f : (V_1 × ··· × V_k) × W^* → F is written f = Σ_i α_i^{(1)} ⊗ ··· ⊗ α_i^{(k)} ⊗ w_i, it acts on an element of V_1 × ··· × V_k × W^* by

    f(v_1, ..., v_k, β) = Σ_i α_i^{(1)}(v_1) ··· α_i^{(k)}(v_k) β(w_i) ∈ F,

but it can also act on an element of V_1 × ··· × V_k by

    f(v_1, ..., v_k) = Σ_i α_i^{(1)}(v_1) ··· α_i^{(k)}(v_k) w_i ∈ W.

Example 1.1.9 (Linear maps). Given two vector spaces V, W, the set of all linear maps V → W can be identified with V^* ⊗ W. If f = Σ_{i=1}^{n} α_i ⊗ w_i, f acts as a linear map V → W by

    f(v) = Σ_{i=1}^{n} α_i(v) w_i

or, going in the other direction, if f is a linear map f : V → W, we can describe it as a member of V^* ⊗ W by taking a basis {v_1, v_2, ..., v_n} for V and its dual basis {α_1, α_2, ..., α_n} and setting w_i = f(v_i), so we get

    f = Σ_{i=1}^{n} α_i ⊗ w_i.
1.1.2 Symmetric and skew-symmetric tensors

Two important subspaces of the second order tensors V ⊗ V are the symmetric tensors and the skew-symmetric tensors. First, define the map τ : V ⊗ V → V ⊗ V by τ(v_1 ⊗ v_2) = v_2 ⊗ v_1 and extending linearly (τ can be interpreted as the non-trivial permutation on two elements). The spaces of symmetric tensors, S²V, and skew-symmetric tensors, Λ²V, can then be defined as:

    S²V := span{v ⊗ v | v ∈ V} = {T ∈ V ⊗ V | τ(T) = T},
    Λ²V := span{v ⊗ w − w ⊗ v | v, w ∈ V} = {T ∈ V ⊗ V | τ(T) = −T}.

Let us define two operators that give the symmetric and anti-symmetric part of a second order tensor. For v_1, v_2 ∈ V, define the symmetric part of v_1 ⊗ v_2 to be v_1 v_2 = (1/2)(v_1 ⊗ v_2 + v_2 ⊗ v_1) ∈ S²V and the anti-symmetric part of v_1 ⊗ v_2 to be v_1 ∧ v_2 = (1/2)(v_1 ⊗ v_2 − v_2 ⊗ v_1) ∈ Λ²V, so that v_1 ⊗ v_2 = v_1 v_2 + v_1 ∧ v_2.

To extend the definition of symmetric and skew-symmetric tensors, over R and C, to higher order we need to generalize these operators. Denote the tensor product of the same vector space k times by V^{⊗k}. For the symmetric case the map π_S : V^{⊗k} → V^{⊗k} is defined on rank-one tensors by

    π_S(v_1 ⊗ ··· ⊗ v_k) = (1/k!) Σ_{τ ∈ S_k} v_{τ(1)} ⊗ ··· ⊗ v_{τ(k)} = v_1 v_2 ··· v_k,

where S_k is the symmetric group on k elements. For the skew-symmetric tensors the map π_Λ : V^{⊗k} → V^{⊗k} is defined on rank-one elements by

    π_Λ(v_1 ⊗ ··· ⊗ v_k) = (1/k!) Σ_{τ ∈ S_k} sgn(τ) v_{τ(1)} ⊗ ··· ⊗ v_{τ(k)} = v_1 ∧ ··· ∧ v_k.

π_S and π_Λ are then extended linearly to act on the entire space.
Definition 1.1.10 (S^kV, Λ^kV). Let V be a vector space. The space of symmetric tensors S^kV is defined as

    S^kV = π_S(V^{⊗k}) = {X ∈ V^{⊗k} | π_S(X) = X}.

The space of skew-symmetric tensors, or alternating tensors, is defined as

    Λ^kV = π_Λ(V^{⊗k}) = {X ∈ V^{⊗k} | π_Λ(X) = X}.
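To illustrate how π_S and π_Λ act in coordinates, the following pure-Python sketch (helper names are ours; exact rational arithmetic via `fractions` so the projection property holds exactly) applies both maps to the coefficient array of an order-3 tensor:

```python
import itertools
from fractions import Fraction

def symmetrize(t, k=3):
    """pi_S in coordinates: average the coefficient dict t over all
    permutations of its k indices."""
    out = {}
    perms = list(itertools.permutations(range(k)))
    for idx, c in t.items():
        for p in perms:
            pidx = tuple(idx[p[m]] for m in range(k))
            out[pidx] = out.get(pidx, 0) + c / len(perms)
    return out

def alternate(t, k=3):
    """pi_Lambda in coordinates: signed average over permutations."""
    def sign(p):
        s = 1
        for a in range(len(p)):
            for b in range(a + 1, len(p)):
                if p[a] > p[b]:
                    s = -s
        return s
    out = {}
    perms = list(itertools.permutations(range(k)))
    for idx, c in t.items():
        for p in perms:
            pidx = tuple(idx[p[m]] for m in range(k))
            out[pidx] = out.get(pidx, 0) + sign(p) * c / len(perms)
    return out

# e1 ⊗ e2 ⊗ e1 as a coefficient dict on a 2-dimensional V:
t = {(0, 1, 0): Fraction(1)}
s = symmetrize(t)
assert s == {(0, 1, 0): Fraction(1, 3), (0, 0, 1): Fraction(1, 3),
             (1, 0, 0): Fraction(1, 3)}
assert symmetrize(s) == s                    # pi_S is a projection
assert all(v == 0 for v in alternate(t).values())   # e1 ∧ e2 ∧ e1 = 0
```

The last assertion reflects that the wedge of vectors with a repeated factor vanishes.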
The space SkV ∗ can be seen as the space of symmetric k-linear forms on V , but also as the space of homogeneous polynomials of degree k on V , so we can identify homogeneous polynomials of degree k with symmetric k-linear forms. We do this through a process called polarization.
Theorem 1.1.11 (Polarization identity). Let f be a homogeneous polynomial of degree k. Then

    f̄(x_1, x_2, ..., x_k) = (1/k!) Σ_{I ⊆ [k], I ≠ ∅} (−1)^{k−|I|} f(Σ_{i ∈ I} x_i)

is a symmetric k-linear form. Here [k] = {1, 2, ..., k}.
Example 1.1.12. Let P(s, t, u) be a homogeneous cubic polynomial in three variables. Plugging this into the polarization identity yields the following multilinear form:
    P̄((s_1, t_1, u_1), (s_2, t_2, u_2), (s_3, t_3, u_3))
      = (1/3!) [P(s_1 + s_2 + s_3, t_1 + t_2 + t_3, u_1 + u_2 + u_3)
        − P(s_1 + s_2, t_1 + t_2, u_1 + u_2) − P(s_1 + s_3, t_1 + t_3, u_1 + u_3)
        − P(s_2 + s_3, t_2 + t_3, u_2 + u_3)
        + P(s_1, t_1, u_1) + P(s_2, t_2, u_2) + P(s_3, t_3, u_3)].

For example, if P(s, t, u) = stu one gets

    P̄ = (1/6)(s_1 t_2 u_3 + s_1 t_3 u_2 + s_2 t_1 u_3 + s_2 t_3 u_1 + s_3 t_1 u_2 + s_3 t_2 u_1).
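The identity above can be checked mechanically. A pure-Python sketch (function names are ours) implements the polarization formula for an arbitrary homogeneous polynomial and compares it, for P(s, t, u) = stu, against the symmetrized product:

```python
from fractions import Fraction
from itertools import combinations, permutations

def polarize(f, k):
    """Polarization identity: f_bar(x1,...,xk) = (1/k!) * sum over
    nonempty I ⊆ [k] of (-1)^(k-|I|) f(sum of x_i, i in I).
    Each x is a tuple of coordinates."""
    kfact = 1
    for m in range(2, k + 1):
        kfact *= m
    def f_bar(*xs):
        n = len(xs[0])
        total = Fraction(0)
        for r in range(1, k + 1):
            for I in combinations(range(k), r):
                point = tuple(sum(xs[i][c] for i in I) for c in range(n))
                total += Fraction((-1) ** (k - r)) * f(point)
        return total / kfact
    return f_bar

P = lambda x: x[0] * x[1] * x[2]       # P(s, t, u) = stu
P_bar = polarize(P, 3)

xs = ((1, 2, 3), (4, 5, 6), (7, 8, 9))
# (1/6) * sum over permutations of s_{p1} t_{p2} u_{p3}:
direct = Fraction(sum(xs[p[0]][0] * xs[p[1]][1] * xs[p[2]][2]
                      for p in permutations(range(3))), 6)
assert P_bar(*xs) == direct
assert P_bar(xs[0], xs[0], xs[0]) == P(xs[0])   # diagonal recovers P
```

The last line checks the defining property that the polarization restricted to the diagonal gives back the polynomial.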
1.1.3 GL(V1) × · · · × GL(Vk) acts on V1 ⊗ · · · ⊗ Vk
GL(V) is the group of invertible linear maps V → V. An element (g_1, g_2, ..., g_k) ∈ GL(V_1) × ··· × GL(V_k) acts on an element v_1 ⊗ v_2 ⊗ ··· ⊗ v_k ∈ V_1 ⊗ ··· ⊗ V_k by

    (g_1, g_2, ..., g_k) · (v_1 ⊗ ··· ⊗ v_k) = g_1(v_1) ⊗ ··· ⊗ g_k(v_k)

and on the whole space V_1 ⊗ ··· ⊗ V_k by extending linearly.

If one picks a basis for each V_1, ..., V_k, say {v_j^{(i)}}_{j=1}^{n_i} is a basis for V_i, one can write

    g_i(v_j^{(i)}) = Σ_{l=1}^{n_i} α_{j,l}^{(i)} v_l^{(i)},    (1.2)

and if T ∈ V_1 ⊗ ··· ⊗ V_k,

    T = Σ_{j_1,...,j_k} β_{j_1,...,j_k} v_{j_1}^{(1)} ⊗ ··· ⊗ v_{j_k}^{(k)}.    (1.3)

Thus, if g = (g_1, ..., g_k),

    g · T = Σ_{j_1,...,j_k} β_{j_1,...,j_k} g_1(v_{j_1}^{(1)}) ⊗ ··· ⊗ g_k(v_{j_k}^{(k)})
          = Σ_{l_1,...,l_k} Σ_{j_1,...,j_k} β_{j_1,...,j_k} α_{j_1,l_1}^{(1)} ··· α_{j_k,l_k}^{(k)} v_{l_1}^{(1)} ⊗ ··· ⊗ v_{l_k}^{(k)}.    (1.4)

One can note that the α's in (1.2) give the matrix of g_i, and that the β's in (1.3) give the tensor T as a k-dimensional array. Thus the scalars

    Σ_{j_1,...,j_k} β_{j_1,...,j_k} α_{j_1,l_1}^{(1)} ··· α_{j_k,l_k}^{(k)}

in (1.4) give the coefficients in the k-dimensional array representing g · T.
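The coordinate formula (1.4) can be sanity-checked on a rank-one tensor, where the action must agree with g_1(v_1) ⊗ g_2(v_2) ⊗ g_3(v_3) computed directly. A pure-Python sketch for the 2 × 2 × 2 case (helper names and the sample matrices are ours):

```python
# alpha[i][j][l] holds the coefficient alpha^{(i)}_{j,l} in
# g_i(v_j) = sum_l alpha^{(i)}_{j,l} v_l; beta is the coefficient array of T.
def act(alpha, beta):
    """Coefficients of g·T via formula (1.4), by brute-force summation."""
    n = len(beta)
    out = [[[0] * n for _ in range(n)] for _ in range(n)]
    for l1 in range(n):
        for l2 in range(n):
            for l3 in range(n):
                s = 0
                for j1 in range(n):
                    for j2 in range(n):
                        for j3 in range(n):
                            s += (beta[j1][j2][j3] * alpha[0][j1][l1]
                                  * alpha[1][j2][l2] * alpha[2][j3][l3])
                out[l1][l2][l3] = s
    return out

# Rank-one tensor a ⊗ b ⊗ c and three invertible 2x2 matrices:
a, b, c = [1, 2], [3, 4], [5, 6]
alpha = [[[1, 2], [0, 1]], [[1, 0], [1, 1]], [[2, 1], [1, 3]]]
beta = [[[a[j1] * b[j2] * c[j3] for j3 in range(2)]
         for j2 in range(2)] for j1 in range(2)]

# Direct action: g1(a) ⊗ g2(b) ⊗ g3(c).
ga = [sum(a[j] * alpha[0][j][l] for j in range(2)) for l in range(2)]
gb = [sum(b[j] * alpha[1][j][l] for j in range(2)) for l in range(2)]
gc = [sum(c[j] * alpha[2][j][l] for j in range(2)) for l in range(2)]
expected = [[[ga[l1] * gb[l2] * gc[l3] for l3 in range(2)]
             for l2 in range(2)] for l1 in range(2)]
assert act(alpha, beta) == expected
```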
1.2 Tensor decomposition
Let us first consider how factorization and decomposition work for tensors of order two, in other words matrices. Depending on the application and the resources for calculation, different decompositions are used. A very important decomposition is the singular value decomposition (SVD). It decomposes a matrix M into a sum of outer products (tensor products) of vectors as

    M = Σ_{r=1}^{R} σ_r u_r v_r^T = Σ_{r=1}^{R} σ_r u_r ⊗ v_r.

Here the u_r and v_r are pairwise orthonormal vectors, the σ_r are the singular values, and R is the rank of the matrix M; these conditions make the decomposition essentially unique. The rank of M is the number of non-zero singular values, and the best low-rank approximations of M are given by truncating the sum.
For tensors of order greater than two the situation is different. A decomposition that generalizes the SVD, but not all of its properties, is called CANDECOMP (canonical decomposition), PARAFAC (parallel factor analysis) or CP decomposition [24]. It is also a sum of tensor products of vectors:

    T = Σ_{r=1}^{R} v_r^{(1)} ⊗ ··· ⊗ v_r^{(k)},

where the V_j are vector spaces and v_r^{(j)} ∈ V_j. As one can see, the CP decomposition is what is used to define the rank of a tensor: R is the rank of T if R is the smallest possible number such that equality holds (definition 1.1.6).

A big issue with higher order tensors is that there is no method or algorithm to calculate the CP decomposition exactly, which would also give the rank of a tensor. A common algorithm to calculate an approximate CP decomposition is the alternating least squares (ALS) algorithm. It can be summarized as a least squares method where we let the values from one vector space change while the others are fixed. Then the same is done for the next vector space, and so forth for all vector spaces. If the difference between the approximation and the given tensor is too large, the whole procedure is repeated until the difference is small enough. The algorithm is described in algorithm 1, where T is a tensor of size d_1 × ··· × d_N. The norm used is the Frobenius norm, defined as

    ||T||² = Σ_{i_1=1,...,i_N=1}^{d_1,...,d_N} |T_{i_1,...,i_N}|²,    (1.5)

where T_{i_1,...,i_N} denotes the (i_1, ..., i_N) component of T. One thing to notice is that the rank is needed as a parameter for the calculations, so if the rank is not known it needs to be approximated before the algorithm can start.
Algorithm 1 ALS algorithm to calculate the CP decomposition
Require: T, R
  Initialize a_r^{(n)} ∈ R^{d_n} for n = 1, ..., N and r = 1, ..., R.
  repeat
    for n = 1, ..., N do
      Solve min over a_r^{(n)}, r = 1, ..., R of || T − Σ_{r=1}^{R} a_r^{(1)} ⊗ ··· ⊗ a_r^{(N)} ||².
      Update a_r^{(n)} to its newly calculated value, for r = 1, ..., R.
    end for
  until || T − Σ_{r=1}^{R} a_r^{(1)} ⊗ ··· ⊗ a_r^{(N)} ||² < threshold, or the maximum number of iterations is reached
  return a_r^{(1)}, ..., a_r^{(N)} for r = 1, ..., R.
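Algorithm 1 can be sketched in runnable code. The following pure-Python implementation for third-order tensors is our own minimal version (helper names `unfold`, `khatri_rao`, `als_cp` are ours; in practice one would use an optimized library such as TensorLy): each per-mode least-squares problem is solved through its normal equations, with the matricized tensor and the Khatri-Rao product of the other two factor matrices.

```python
import random

def unfold(T, mode, dims):
    """Mode-n matricization of a nested-list tensor T of shape dims."""
    d1, d2, d3 = dims
    if mode == 0:
        return [[T[i][j][k] for k in range(d3) for j in range(d2)] for i in range(d1)]
    if mode == 1:
        return [[T[i][j][k] for k in range(d3) for i in range(d1)] for j in range(d2)]
    return [[T[i][j][k] for j in range(d2) for i in range(d1)] for k in range(d3)]

def khatri_rao(X, Y):
    """Column-wise Kronecker product of two factor matrices."""
    R = len(X[0])
    return [[X[p][r] * Y[q][r] for r in range(R)]
            for p in range(len(X)) for q in range(len(Y))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def solve(M, B):
    """Solve M W = B (multiple right-hand sides) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [M[i][:] + B[i][:] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [[aug[i][n + j] / aug[i][i] for j in range(len(B[0]))] for i in range(n)]

def als_cp(T, dims, R, sweeps=20, seed=0):
    """Fixed number of ALS sweeps; one least-squares update per mode."""
    rng = random.Random(seed)
    F = [[[rng.random() for _ in range(R)] for _ in range(d)] for d in dims]
    for _ in range(sweeps):
        for mode in range(3):
            other = [F[m] for m in (2, 1, 0) if m != mode]  # e.g. mode 0: (C, B)
            Z = khatri_rao(other[0], other[1])
            # Normal equations for min || T_(n) - F[mode] Z^T ||:
            W = solve(matmul(transpose(Z), Z),
                      matmul(transpose(Z), transpose(unfold(T, mode, dims))))
            F[mode] = transpose(W)
    return F

# Fit a rank-one tensor T = a ⊗ b ⊗ c with R = 1; the fit should be exact.
a, b, c = [1.0, -2.0], [0.5, 1.0, 2.0], [3.0, 1.0]
dims = (2, 3, 2)
T = [[[a[i] * b[j] * c[k] for k in range(2)] for j in range(3)] for i in range(2)]
A, B, C = als_cp(T, dims, R=1)
err = sum((T[i][j][k] - A[i][0] * B[j][0] * C[k][0]) ** 2
          for i in range(2) for j in range(3) for k in range(2))
assert err < 1e-12
```

Note that, as in Algorithm 1, the target rank R must be supplied up front; the sketch stops after a fixed number of sweeps rather than testing a threshold, purely to keep it short.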
This is actually a way to determine the rank of a tensor, but the method has a few problems. First of all there is the issue of border rank (see section 2.1), which makes it possible to approximate some tensors arbitrarily well by tensors of lower rank (see example 2.1.1). Furthermore, the algorithm is not guaranteed to converge to a global optimum, and even if it does converge, it might need a large number of iterations [24].
1.3 Algebraic geometry
In this section we introduce basic notions of algebraic geometry, which is the study of objects defined by polynomial equations. References for this section are [13, 17, 25, 31], and for section 1.3.6, [6].
1.3.1 Basic definitions
Definition 1.3.1 (Monomial). A monomial in variables x_1, x_2, ..., x_n is a product of variables

    x_1^{α_1} x_2^{α_2} ··· x_n^{α_n}

where α_i ∈ N = {0, 1, 2, ...}. Another notation for this is x^α, where x = (x_1, x_2, ..., x_n) and α = (α_1, α_2, ..., α_n) ∈ N^n. α is called a multi-index.
Definition 1.3.2 (Polynomial). Given a field F, a polynomial is a finite linear combination of monomials with coefficients in F, i.e. if f is a polynomial over F it can be written

    f = Σ_{α ∈ A} c_α x^α

for some finite set A and c_α ∈ F. A homogeneous polynomial is a polynomial in which all the multi-indices α ∈ A sum to the same integer; in other words, all the monomials have the same degree.
The set F[x_1, x_2, ..., x_n] of all polynomials over the field F in variables x_1, x_2, ..., x_n forms a commutative ring. Since it will be important in the sequel, we recall some important definitions and results from ring theory.
Definition 1.3.3 (Ideal). If R is a commutative ring (e.g. F[x1, x2, . . . , xn]), an ideal in R is a set I for which the following holds:
• If x, y ∈ I, we have x + y ∈ I (I is a subgroup of (R, +).)
• If x ∈ I and r ∈ R we have rx ∈ I.
If f_1, f_2, ..., f_k ∈ R, the ideal generated by f_1, f_2, ..., f_k, denoted ⟨f_1, f_2, ..., f_k⟩, is defined as:

    ⟨f_1, f_2, ..., f_k⟩ = { Σ_{i=1}^{k} q_i f_i | q_i ∈ R }.
The next theorem is a special case of Hilbert’s basis theorem.
Theorem 1.3.4. Every ideal in the polynomial ring F[x_1, x_2, ..., x_n] is finitely generated, i.e. for every ideal I there exist polynomials f_1, f_2, ..., f_k such that I = ⟨f_1, f_2, ..., f_k⟩.
1.3.2 Varieties and ideals

Definition 1.3.5 (Affine algebraic set). An affine algebraic set is the set X ⊂ F^n of solutions to a system of polynomial equations

    f_1 = 0, f_2 = 0, ..., f_k = 0

for a given set {f_1, f_2, ..., f_k} of polynomials in n variables. We write X = V(f_1, f_2, ..., f_k) for this affine algebraic set. An algebraic set X is called irreducible, or a variety, if it cannot be written as X = X_1 ∪ X_2 for proper algebraic subsets X_1, X_2 ⊊ X.
Definition 1.3.6 (Ideal of an affine algebraic set). For an algebraic set X ⊂ F^n, the ideal of X, denoted I(X), is the set of polynomials f ∈ F[x_1, x_2, ..., x_n] such that f(a_1, a_2, ..., a_n) = 0 for every (a_1, a_2, ..., a_n) ∈ X.

When one works with algebraic sets one wants to find equations for the set, and this can mean different things. A set of polynomials P = {p_1, p_2, ..., p_k} is said to cut out the algebraic set X set-theoretically if the set of common zeros of p_1, p_2, ..., p_k is X. P is said to cut out X ideal-theoretically if P is a generating set for I(X).
Example 1.3.7 (Twisted cubic). The twisted cubic is a curve in R³ which can be given as the image of R under the mapping t ↦ (t, t², t³), fig. 1.1. However, the twisted cubic can also be viewed as an algebraic set, namely V(y − x², z − x³), fig. 1.2.
Figure 1.1: The image of t ↦ (t, t², t³) for −1 ≤ t ≤ 1.
Figure 1.2: The intersection of the surfaces defined by y − x² = 0 and z − x³ = 0, namely the twisted cubic, for (−1, 0, −1) ≤ (x, y, z) ≤ (1, 1, 1).
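That the two descriptions of the twisted cubic agree on the parametrized points is easy to check mechanically; a small pure-Python sketch (the helper name is ours):

```python
from fractions import Fraction

# Every point of the parametrized curve t -> (t, t^2, t^3) lies on
# the algebraic set V(y - x^2, z - x^3).
def on_twisted_cubic(x, y, z):
    return y - x**2 == 0 and z - x**3 == 0

ts = [Fraction(n, 10) for n in range(-10, 11)]   # exact sample points in [-1, 1]
assert all(on_twisted_cubic(t, t**2, t**3) for t in ts)
assert not on_twisted_cubic(1, 2, 3)             # (1, 2, 3) is not on the curve
```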
Example 1.3.8 (Matrices of rank r). Given vector spaces V, W of dimensions n and m and bases {v_i}_{i=1}^{n} and {w_j}_{j=1}^{m} respectively, V^* ⊗ W can be identified with the set of m × n matrices. The set of matrices of rank at most r is a variety in this space, namely the variety defined as the zero set of all (r + 1) × (r + 1) minors, since a matrix has rank less than or equal to r if and only if all of its (r + 1) × (r + 1) minors are zero. For example, if n = 4 and m = 3, a matrix defining a map between V and W can be written

    [ x_11 x_12 x_13 x_14 ]
    [ x_21 x_22 x_23 x_24 ]
    [ x_31 x_32 x_33 x_34 ]

and the variety of matrices of rank 2 or less consists of the matrices satisfying

    det [ x_11 x_12 x_13 ; x_21 x_22 x_23 ; x_31 x_32 x_33 ] = 0,
    det [ x_11 x_12 x_14 ; x_21 x_22 x_24 ; x_31 x_32 x_34 ] = 0,
    det [ x_11 x_13 x_14 ; x_21 x_23 x_24 ; x_31 x_33 x_34 ] = 0,
    det [ x_12 x_13 x_14 ; x_22 x_23 x_24 ; x_32 x_33 x_34 ] = 0.

That these equations cut out the set of 3 × 4 matrices of rank 2 or less set-theoretically is easy to prove. They also generate the ideal for the variety, but this is harder to prove.
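The set-theoretic claim can be verified numerically on sample matrices; a pure-Python sketch (helper names and the sample factors are ours) checks that a 3 × 4 matrix built as a sum of two outer products, hence of rank at most 2, kills all four 3 × 3 minors, while a rank-3 matrix does not:

```python
from itertools import combinations

def det3(M):
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def minors3(M):
    """All 3x3 minors of a 3x4 matrix: one per choice of 3 of the 4 columns."""
    return [det3([[M[i][j] for j in cols] for i in range(3)])
            for cols in combinations(range(4), 3)]

# A rank-(at most)-2 matrix: u v^T + w z^T.
u, v = [1, 2, 3], [1, 0, 2, 1]
w, z = [0, 1, 1], [2, 1, 0, 3]
M = [[u[i] * v[j] + w[i] * z[j] for j in range(4)] for i in range(3)]
assert all(m == 0 for m in minors3(M))

# A rank-3 matrix has at least one non-vanishing 3x3 minor.
M3 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
assert any(m != 0 for m in minors3(M3))
```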
1.3.3 Projective spaces and varieties Definition 1.3.9 (Projective space). The n-dimensional projective space over F, denoted Pn(F), is the set Fn+1 \{0} modulo the equivalence relation ∼ where x ∼ y if and only if x = λy for some λ ∈ F \{0}. For a vector space V we write PV for the projectivization of V , and if v ∈ V , we write [v] for the equivalence 12 Chapter 1. Introduction class to which v belongs, i.e. [v] is the element in PV corresponding to the line λv in V . For a subset X ⊆ PV we will write Xˆ for the affine cone of X in V , i.e. Xˆ = {v ∈ V :[v] ∈ X}. We will now define what is meant by a projective algebraic set. Note that the zero locus of a polynomial is not defined in projective space, since in general f(x) 6= f(λx) for a polynomial f, but x = λx in projective space. However, for a polynomial F which is homogeneous of degree d the zero locus is well defined, since F (λx) = λdF (x). Note that even though the zero locus of a homogeneous polynomial is well defined on projective space, the homogeneous polynomials are not functions on projective space. Definition 1.3.10 (Projective algebraic set). A projetive algebraic set X ⊂ Pn(F) is the solution set to a system of polynomial equations
\[
F_1(x) = 0, \quad F_2(x) = 0, \quad \ldots, \quad F_k(x) = 0
\]
for a set $\{F_1, F_2, \ldots, F_k\}$ of homogeneous polynomials in $n+1$ variables. A projective algebraic set is called irreducible, or a projective variety, if it is not the union of two projective algebraic sets properly contained in it.

Definition 1.3.11 (Ideal of a projective algebraic set). If $X \subset \mathbb{P}^n(\mathbb{F})$ is an algebraic set, its ideal $I(X)$ is the set of all homogeneous polynomials which vanish on $X$, i.e. $I(X)$ consists of all polynomials $F$ such that
\[
F(a_1, a_2, \ldots, a_{n+1}) = 0 \quad \text{for all } (a_1, a_2, \ldots, a_{n+1}) \in X.
\]

Definition 1.3.12 (Zariski topology). The Zariski topology on $\mathbb{P}^n(\mathbb{F})$ (or $\mathbb{F}^n$) is defined by its closed sets, which are taken to be all the sets $X$ for which there exists a set $S$ of homogeneous polynomials (or arbitrary polynomials in the case of $\mathbb{F}^n$) such that $X = \{\alpha : f(\alpha) = 0 \ \forall f \in S\}$. The Zariski closure of a set $X$ is the set $V(I(X))$.
1.3.4 Dimension of an algebraic set

Definition 1.3.13 (Tangent space). Let $M$ be a subset of a vector space $V$ over $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$ and let $x \in M$. The tangent space $\hat{T}_x M \subset V$ is the span of all vectors which arise as derivatives $\alpha'(0)$ of smooth curves $\alpha : \mathbb{F} \to M$ with $\alpha(0) = x$. For a projective algebraic set $X \subset \mathbb{P}V$, the affine tangent space to $X$ at $[x] \in X$ is $\hat{T}_{[x]} X := \hat{T}_x \hat{X}$.
Definition 1.3.14 (Smooth and singular points). If $\dim \hat{T}_x X$ is constant at and near $x$, then $x$ is called a smooth point of $X$. If $x$ is not smooth, it is called a singular point. For a variety $X$, let $X_{\mathrm{smooth}}$ and $X_{\mathrm{sing}}$ denote the smooth and singular points of $X$ respectively.
Definition 1.3.15 (Dimension of a variety). For an affine algebraic set $X$, define the dimension of $X$ as $\dim(X) := \dim(\hat{T}_x X)$ for $x \in X_{\mathrm{smooth}}$. For a projective algebraic set $X$, define the dimension of $X$ as $\dim(X) := \dim(\hat{T}_x X) - 1$ for $x \in X_{\mathrm{smooth}}$.

Example 1.3.16 (Cuspidal cubic). The variety $X$ in $\mathbb{R}^2$ given by $X = V(y^2 - x^3)$ is called the cuspidal cubic, see fig. 1.3. It has exactly one singular point, namely $(0,0)$. Both the unit vector in the $x$-direction and the unit vector in the $y$-direction are tangent vectors to the variety at $(0,0)$, so $\dim \hat{T}_{(0,0)} X = 2$, but for all $x \neq (0,0)$ on the cuspidal cubic we have $\dim \hat{T}_x X = 1$. Hence $(0,0)$ is a singular point, all other points are smooth, and the dimension of the cuspidal cubic is one.
Figure 1.3: The cuspidal cubic.
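Example 1.3.16 can also be verified symbolically: a point of $V(f)$ is singular precisely where the gradient of $f$ vanishes. A minimal sketch using sympy (not from the thesis):

```python
# Finding the singular points of the cuspidal cubic y^2 - x^3 = 0:
# solve f = 0 together with grad(f) = 0.
import sympy as sp

x, y = sp.symbols('x y')
f = y**2 - x**3

grad = [sp.diff(f, v) for v in (x, y)]              # [-3*x**2, 2*y]
singular = sp.solve([f] + grad, [x, y], dict=True)
print(singular)   # [{x: 0, y: 0}] -- the cusp at the origin
```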
Example 1.3.17 (Matrices of rank r). Going back to the example of the matrices of size $m \times n$ with rank $r$ or less, these can also be seen as a projective variety. We form the projective space $\mathbb{P}^{mn-1}(\mathbb{F})$ (i.e. the space of matrices where matrices $A$ and $B$ are identified iff $A = \lambda B$ for some $\lambda \neq 0$; note that if $A$ and $B$ are identified they have the same rank). The equations are still the same: the minors of size $(r+1) \times (r+1)$, which are homogeneous of degree $r+1$.

Example 1.3.18 (Segre variety). This variety will be very important in the sequel. Let $V_1, V_2, \ldots$ be complex vector spaces. The two-factor Segre variety is the variety defined as the image of the map
\[
\mathrm{Seg} : \mathbb{P}V_1 \times \mathbb{P}V_2 \to \mathbb{P}(V_1 \otimes V_2), \qquad \mathrm{Seg}([v_1], [v_2]) = [v_1 \otimes v_2],
\]
and it can be seen that the image of this map is the projectivization of the set of rank one tensors in $V_1 \otimes V_2$. We can in a similar fashion define the $n$-factor Segre variety as the image of
\[
\mathrm{Seg} : \mathbb{P}V_1 \times \cdots \times \mathbb{P}V_n \to \mathbb{P}(V_1 \otimes \cdots \otimes V_n), \qquad \mathrm{Seg}([v_1], \ldots, [v_n]) = [v_1 \otimes \cdots \otimes v_n],
\]
and the image is once again the projectivization of the set of rank one tensors in $V_1 \otimes \cdots \otimes V_n$. That the 2-factor Segre variety is an algebraic set follows from the fact that the $2 \times 2$ minors furnish equations for the variety. In the next chapter we will work with the 3-factor Segre variety, for which equations are provided in section 2.3.1. For a general proof for the $n$-factor Segre, see [25, page 103].

Any curve in $\mathrm{Seg}(\mathbb{P}V_1 \times \mathbb{P}V_2)$ is of the form $v_1(t) \otimes v_2(t)$, and its derivative at $t = 0$ is $v_1'(0) \otimes v_2(0) + v_1(0) \otimes v_2'(0)$. Thus
\[
\hat{T}_{[v_1 \otimes v_2]} \mathrm{Seg}(\mathbb{P}V_1 \times \mathbb{P}V_2) = V_1 \otimes v_2 + v_1 \otimes V_2,
\]
and the intersection of $V_1 \otimes v_2$ and $v_1 \otimes V_2$ is the one-dimensional space spanned by $v_1 \otimes v_2$. Therefore the dimension of the Segre variety is $n_1 + n_2 - 2$, where $n_1, n_2$ are the dimensions of $V_1, V_2$ respectively.
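The tangent-space computation can be checked numerically. The following sketch (not from the thesis; the dimensions $n_1 = 4$, $n_2 = 3$ and the random vectors are arbitrary choices) verifies that $\dim(V_1 \otimes v_2 + v_1 \otimes V_2) = n_1 + n_2 - 1$, i.e. the affine cone over the two-factor Segre variety has dimension $n_1 + n_2 - 1$ at a generic point:

```python
# The span of V1 (x) v2 and v1 (x) V2, realized as vectorized matrices,
# has dimension n1 + n2 - 1 (the two subspaces meet in span(v1 (x) v2)).
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 4, 3
v1, v2 = rng.normal(size=n1), rng.normal(size=n2)

basis = [np.outer(e, v2).ravel() for e in np.eye(n1)]   # V1 (x) v2
basis += [np.outer(v1, f).ravel() for f in np.eye(n2)]  # v1 (x) V2
dim = np.linalg.matrix_rank(np.array(basis))
print(dim, n1 + n2 - 1)   # 6 6
```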
1.3.5 Cones, joins, and secant varieties
Definition 1.3.19 (Cone). Let $X \subset \mathbb{P}^n(\mathbb{F})$ be a projective variety and $p \in \mathbb{P}^n(\mathbb{F})$ a point. The cone over $X$ with vertex $p$, denoted $J(p, X)$, is the Zariski closure of the union of all the lines $pq$ joining $p$ with a point $q \in X$, i.e.
\[
J(p, X) = \overline{\bigcup_{q \in X} pq}.
\]
Definition 1.3.20 (Join of varieties). Let $X_1, X_2 \subset \mathbb{P}^n(\mathbb{F})$ be two varieties. The join of $X_1$ and $X_2$ is the set
\[
J(X_1, X_2) = \overline{\bigcup_{p_1 \in X_1,\ p_2 \in X_2,\ p_1 \neq p_2} p_1 p_2},
\]
which can be interpreted as the Zariski closure of the union of all cones over $X_2$ with a vertex in $X_1$, or vice versa. The join of several varieties $X_1, X_2, \ldots, X_k$ is defined inductively:
\[
J(X_1, X_2, \ldots, X_k) = J(X_1, J(X_2, \ldots, X_k)).
\]
Definition 1.3.21 (Secant variety). Let $X$ be a variety. The $r$:th secant variety of $X$ is the set
\[
\sigma_r(X) = \underbrace{J(X, \ldots, X)}_{r \text{ copies}}.
\]

Lemma 1.3.22 (Secant varieties are varieties). Secant varieties of irreducible algebraic sets are irreducible, i.e. they are varieties.

Proof. See [17, p. 144, prop. 11.24].
Let $X \subset \mathbb{P}^n(\mathbb{F})$ be an algebraic set of dimension $k$. The expected dimension of $\sigma_r(X)$ is $\min\{rk + r - 1, n\}$: choosing $r$ points on $X$ gives $rk$ parameters, and their projective span contributes $r - 1$ more. However, the dimension is not always the expected one.
Definition 1.3.23 (Degenerate secant variety). Let $X \subset \mathbb{P}^n(\mathbb{F})$ be a projective variety with $\dim(X) = k$. If $\dim \sigma_r(X) < \min\{rk + r - 1, n\}$, then $\sigma_r(X)$ is called degenerate with defect $\delta_r(X) = rk + r - 1 - \dim \sigma_r(X)$.
Definition 1.3.24 (X-rank). If $V$ is a vector space over $\mathbb{C}$, $X \subset \mathbb{P}V$ is a projective variety and $p \in \mathbb{P}V$ is a point, the $X$-rank of $p$ is the smallest number $r$ of points in $X$ such that $p$ lies in their linear span. The $X$-border rank of $p$ is the least number $r$ such that $p$ lies in $\sigma_r(X)$, the $r$:th secant variety of $X$. The generic $X$-rank is the smallest $r$ such that $\sigma_r(X) = \mathbb{P}V$. These notions of $X$-rank and $X$-border rank coincide with the notions of tensor rank and tensor border rank (see section 2.1) when $X$ is taken to be the Segre variety.
Lemma 1.3.25 (Terracini's lemma). Let $x_i$, for $i = 1, \ldots, r$, be general points of $\hat{X}_i$, where the $X_i$ are projective varieties in $\mathbb{P}V$ for a complex vector space $V$, and let $[u] = [x_1 + \cdots + x_r] \in J(X_1, \ldots, X_r)$. Then
\[
\hat{T}_{[u]} J(X_1, \ldots, X_r) = \hat{T}_{[x_1]} X_1 + \cdots + \hat{T}_{[x_r]} X_r.
\]
Proof. It is enough to consider the case $u = x_1 + x_2$ for $x_1 \in \hat{X}_1$, $x_2 \in \hat{X}_2$, where $X_1, X_2 \subset \mathbb{P}V$ are varieties, and to derive the expression for $\hat{T}_{[u]} J(X_1, X_2)$. Define the addition map $a : V \times V \to V$ by $a(v_1, v_2) = v_1 + v_2$. Then
\[
\hat{J}(X_1, X_2) = a(\hat{X}_1 \times \hat{X}_2)
\]
and so, for general points $x_1, x_2$, the space $\hat{T}_{[u]} J(X_1, X_2)$ is obtained by differentiating curves $x_1(t) \in \hat{X}_1$, $x_2(t) \in \hat{X}_2$ with $x_1(0) = x_1$, $x_2(0) = x_2$. Thus the tangent space at $x_1 + x_2$ in $J(X_1, X_2)$ is the sum of the tangent spaces at $x_1$ in $X_1$ and at $x_2$ in $X_2$.
1.3.6 Real algebraic geometry

In section 2.4 we will need the following definition.

Definition 1.3.26 (Affine semi-algebraic set). An affine semi-algebraic set is a subset of $\mathbb{R}^n$ of the form
\[
\bigcup_{i=1}^{s} \bigcap_{j=1}^{r_i} \{x \in \mathbb{R}^n \mid f_{i,j}(x) \bowtie_{i,j} 0\},
\]
where $f_{i,j} \in \mathbb{R}[x_1, \ldots, x_n]$ and each relation $\bowtie_{i,j}$ is $<$ or $=$.

Example 1.3.27 (Semi-algebraic set). Consider the semi-algebraic set given by
\[
\begin{aligned}
f_{1,1} &= x^2 + y^2 - 2, & f_{1,2} &= x - \tfrac{3}{2} y, & f_{1,3} &= -y, \\
f_{2,1} &= x^2 + y^2 - 2, & f_{2,2} &= x + \tfrac{3}{2} y, & f_{2,3} &= y, \\
f_{3,1} &= (x - 2)^2 + y^2 - \tfrac{1}{4}, & f_{4,1} &= \bigl(x - \tfrac{7}{2}\bigr)^2 + y^2 - \tfrac{1}{4},
\end{aligned}
\]
with all relations $\bowtie_{i,j}$ being $<$. The set is visualized in figure 1.4.
Figure 1.4: An example of a semi-algebraic set.
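The union-of-intersections structure in Definition 1.3.26 can be made concrete with a small membership test. The following sketch (not from the thesis) encodes the clauses of Example 1.3.27, all relations being strict inequalities:

```python
# A point lies in the semi-algebraic set if it satisfies ALL conditions
# of SOME clause (a union of intersections, every relation here being '<').
clauses = [
    [lambda x, y: x**2 + y**2 - 2, lambda x, y: x - 1.5 * y, lambda x, y: -y],
    [lambda x, y: x**2 + y**2 - 2, lambda x, y: x + 1.5 * y, lambda x, y: y],
    [lambda x, y: (x - 2)**2 + y**2 - 0.25],
    [lambda x, y: (x - 3.5)**2 + y**2 - 0.25],
]

def member(x, y):
    return any(all(f(x, y) < 0 for f in clause) for clause in clauses)

print(member(2.0, 0.1))   # True  (inside the small circle around (2, 0))
print(member(0.0, 0.0))   # False (strict inequalities fail on y = 0)
```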
1.4 Application to matrix multiplication
We take a look at the problem of efficient computation of the product of $2 \times 2$ matrices. Let $A, B, C$ be copies of the space of $n \times n$ matrices, and let the multiplication mapping $m_n : A \times B \to C$ be given by $m_n(M_1, M_2) = M_1 M_2$. To compute the matrix $M_3 = m_2(M_1, M_2) = M_1 M_2$ one can naively use eight multiplications and four additions with the standard method for matrix multiplication. Explicitly, if
\[
M_1 = \begin{pmatrix} a_1^1 & a_2^1 \\ a_1^2 & a_2^2 \end{pmatrix}, \qquad
M_2 = \begin{pmatrix} b_1^1 & b_2^1 \\ b_1^2 & b_2^2 \end{pmatrix},
\]
one can compute $M_3 = M_1 M_2$ by
\[
\begin{aligned}
c_1^1 &= a_1^1 b_1^1 + a_2^1 b_1^2, & c_2^1 &= a_1^1 b_2^1 + a_2^1 b_2^2, \\
c_1^2 &= a_1^2 b_1^1 + a_2^2 b_1^2, & c_2^2 &= a_1^2 b_2^1 + a_2^2 b_2^2.
\end{aligned}
\]
However, this is not optimal. Strassen [37] showed that one can calculate M3 = M1M2 using only seven multiplications. First, one calculates
\[
\begin{aligned}
k_1 &= (a_1^1 + a_2^2)(b_1^1 + b_2^2) \\
k_2 &= (a_1^2 + a_2^2)\, b_1^1 \\
k_3 &= a_1^1 (b_2^1 - b_2^2) \\
k_4 &= a_2^2 (-b_1^1 + b_1^2) \\
k_5 &= (a_1^1 + a_2^1)\, b_2^2 \\
k_6 &= (-a_1^1 + a_1^2)(b_1^1 + b_2^1) \\
k_7 &= (a_2^1 - a_2^2)(b_1^2 + b_2^2)
\end{aligned}
\]
and the coefficients of $M_3 = M_1 M_2$ can then be calculated as
\[
\begin{aligned}
c_1^1 &= k_1 + k_4 - k_5 + k_7, \\
c_1^2 &= k_2 + k_4, \\
c_2^1 &= k_3 + k_5, \\
c_2^2 &= k_1 + k_3 - k_2 + k_6.
\end{aligned}
\]
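Strassen's seven-multiplication scheme can be transcribed directly and checked against ordinary matrix multiplication. The following sketch (not from the thesis) uses the convention that $a^i_j$ is the entry in row $i$, column $j$:

```python
# Strassen's scheme: seven scalar multiplications k1..k7 suffice
# to multiply two 2x2 matrices.
import numpy as np

def strassen_2x2(A, B):
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    k1 = (a11 + a22) * (b11 + b22)
    k2 = (a21 + a22) * b11
    k3 = a11 * (b12 - b22)
    k4 = a22 * (-b11 + b21)
    k5 = (a11 + a12) * b22
    k6 = (-a11 + a21) * (b11 + b12)
    k7 = (a12 - a22) * (b21 + b22)
    return np.array([[k1 + k4 - k5 + k7, k3 + k5],
                     [k2 + k4,           k1 + k3 - k2 + k6]])

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
print(np.allclose(strassen_2x2(A, B), A @ B))   # True
```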
Now, the map $m_n : A \times B \to C$ is obviously a bilinear map and as such can be expressed as a tensor. Let us take a look at $m_2$. Equip $A$, $B$, $C$ with the same basis
\[
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.
\]
For clarity, let $m_2 : A \times B \to C$ and let the bases be $\{a_i^j\}_{i,j=1}^{2}$, $\{b_i^j\}_{i,j=1}^{2}$, $\{c_i^j\}_{i,j=1}^{2}$. Let the dual bases of $A$, $B$ be $\{\alpha_i^j\}_{i,j=1}^{2}$, $\{\beta_i^j\}_{i,j=1}^{2}$ respectively. Thus $m_2 \in A^* \otimes B^* \otimes C$, and the standard algorithm for matrix multiplication corresponds to the following rank eight decomposition of $m_2$:
\[
\begin{aligned}
m_2 = {} & (\alpha_1^1 \otimes \beta_1^1 + \alpha_2^1 \otimes \beta_1^2) \otimes c_1^1 + (\alpha_1^1 \otimes \beta_2^1 + \alpha_2^1 \otimes \beta_2^2) \otimes c_2^1 \\
& + (\alpha_1^2 \otimes \beta_1^1 + \alpha_2^2 \otimes \beta_1^2) \otimes c_1^2 + (\alpha_1^2 \otimes \beta_2^1 + \alpha_2^2 \otimes \beta_2^2) \otimes c_2^2,
\end{aligned}
\]
whereas Strassen's algorithm corresponds to a rank seven decomposition of $m_2$:
\[
\begin{aligned}
m_2 = {} & (\alpha_1^1 + \alpha_2^2) \otimes (\beta_1^1 + \beta_2^2) \otimes (c_1^1 + c_2^2) + (\alpha_1^2 + \alpha_2^2) \otimes \beta_1^1 \otimes (c_1^2 - c_2^2) \\
& + \alpha_1^1 \otimes (\beta_2^1 - \beta_2^2) \otimes (c_2^1 + c_2^2) + \alpha_2^2 \otimes (-\beta_1^1 + \beta_1^2) \otimes (c_1^1 + c_1^2) \\
& + (\alpha_1^1 + \alpha_2^1) \otimes \beta_2^2 \otimes (-c_1^1 + c_2^1) + (-\alpha_1^1 + \alpha_1^2) \otimes (\beta_1^1 + \beta_2^1) \otimes c_2^2 \\
& + (\alpha_2^1 - \alpha_2^2) \otimes (\beta_1^2 + \beta_2^2) \otimes c_1^1.
\end{aligned}
\]
It has been proven that both the rank and the border rank of $m_2$ are seven [26]. This can be seen from the fact that $\sigma_7(\mathrm{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)) = \mathbb{P}(A \otimes B \otimes C)$. However, the rank of $m_n$ for $n \geq 3$ is still unknown. Even for $m_3$, all that is known is that the rank lies between 19 and 23 [25, chapter 11]. It is interesting to note that this is lower than the generic rank for $9 \times 9 \times 9$ tensors, which is 30 (theorem 2.3.8). The rank of $m_2$ is, however, the generic seven.

Chapter 2
Tensor rank
In this chapter we present some results on tensor rank, mainly from the view of algebraic geometry. We introduce different types of rank of a tensor and show some basic results concerning these different types of ranks. We derive equations for the Segre variety and show some basic results on secant defects of the Segre variety and generic ranks. A general reference for this chapter is [25].
2.1 Different notions of rank
If $T : U \to V$ is a linear operator and $U, V$ are vector spaces, the rank of $T$ is the dimension of the image $T(U)$. If one considers $T$ as an element of $U^* \otimes V$, the rank of $T$ coincides with the smallest integer $R$ such that $T$ can be written
\[
T = \sum_{i=1}^{R} \alpha_i \otimes v_i.
\]
However, a tensor $T \in V_1 \otimes V_2 \otimes \cdots \otimes V_k$ can be viewed as a linear operator $V_i^* \to V_1 \otimes \cdots \otimes V_{i-1} \otimes V_{i+1} \otimes \cdots \otimes V_k$ for any $1 \leq i \leq k$, so $T$ can be viewed as a linear operator in these $k$ different ways, and each way gives a different rank. The $k$-tuple $(\dim T(V_1^*), \ldots, \dim T(V_k^*))$ is known as the multilinear rank of $T$. The smallest integer $R$ such that $T$ can be written
\[
T = \sum_{i=1}^{R} v_i^{(1)} \otimes \cdots \otimes v_i^{(k)}
\]
is known as the rank of $T$ (sometimes called the outer product rank). If $T$ is a tensor, let $R(T)$ denote the rank of $T$.

The idea of tensor rank gets more complicated still. If a tensor $T$ has rank $R$, it is possible that there exist tensors of rank $\tilde{R} < R$ such that $T$ is the limit of these tensors, in which case $T$ is said to have border rank $\tilde{R}$. Let $\underline{R}(T)$ denote the border rank of the tensor $T$.
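The multilinear rank can be computed directly: its $i$-th component is the ordinary matrix rank of the $i$-th unfolding (flattening) of $T$. A minimal sketch, not from the thesis:

```python
# Multilinear rank of a 3-tensor: matrix rank of each unfolding.
import numpy as np

def multilinear_rank(T):
    return tuple(
        np.linalg.matrix_rank(np.moveaxis(T, i, 0).reshape(T.shape[i], -1))
        for i in range(T.ndim)
    )

# A rank-one tensor has multilinear rank (1, 1, 1).
a, b, c = np.array([1., 2.]), np.array([1., 0., 1.]), np.array([2., 3.])
T = np.einsum('i,j,k->ijk', a, b, c)
print(multilinear_rank(T))   # (1, 1, 1)
```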
Example 2.1.1 (Border rank). Consider the numerically given tensor
\[
T = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 1 \end{pmatrix}
+ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 1 \end{pmatrix}
+ \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 2 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 1 \end{pmatrix}
+ \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} -1 \\ 1 \end{pmatrix}
= \left(\begin{array}{cc|cc} 1 & 0 & 1 & 0 \\ 4 & 1 & 6 & 1 \end{array}\right).
\]
One can show that $T$ has rank 3, for instance with a method for $p \times p \times 2$ tensors used in [36]. Now consider the rank-two tensor $T(\varepsilon)$:
\[
T(\varepsilon) = \frac{\varepsilon - 1}{\varepsilon} \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 1 \end{pmatrix}
+ \frac{1}{\varepsilon} \left( \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \varepsilon \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right) \otimes \left( \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \varepsilon \begin{pmatrix} 2 \\ 1 \end{pmatrix} \right) \otimes \left( \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \varepsilon \begin{pmatrix} -1 \\ 1 \end{pmatrix} \right).
\]
Calculating $T(\varepsilon)$ for a few values of $\varepsilon$ gives the following results:
\[
\begin{aligned}
T(1) &= \left(\begin{array}{cc|cc} 0 & 0 & 6 & 2 \\ 0 & 0 & 18 & 6 \end{array}\right), &
T(10^{-1}) &= \left(\begin{array}{cc|cc} 1.0800 & 0.0900 & 1.3200 & 0.1100 \\ 3.9600 & 1.0800 & 6.8400 & 1.3200 \end{array}\right), \\
T(10^{-3}) &= \left(\begin{array}{cc|cc} 1.0010 & 0.0010 & 1.0030 & 0.0010 \\ 4.0000 & 1.0010 & 6.0080 & 1.0030 \end{array}\right), &
T(10^{-5}) &= \left(\begin{array}{cc|cc} 1.0000 & 0.0000 & 1.0000 & 0.0000 \\ 4.0000 & 1.0000 & 6.0001 & 1.0000 \end{array}\right),
\end{aligned}
\]
which gives us an indication that $T(\varepsilon) \to T$ when $\varepsilon \to 0$.
The above tensor is a special case of tensors of the form
\[
T = a_1 \otimes b_1 \otimes c_1 + a_2 \otimes b_1 \otimes c_1 + a_1 \otimes b_2 \otimes c_1 + a_1 \otimes b_1 \otimes c_2,
\]
and even in this general case one can show that $T$ has rank three, while there are tensors of rank two arbitrarily close to it:
\[
\begin{aligned}
T(\varepsilon) &= \frac{1}{\varepsilon} \bigl( (\varepsilon - 1)\, a_1 \otimes b_1 \otimes c_1 + (a_1 + \varepsilon a_2) \otimes (b_1 + \varepsilon b_2) \otimes (c_1 + \varepsilon c_2) \bigr) \\
&= \frac{1}{\varepsilon} \bigl( \varepsilon\, a_1 \otimes b_1 \otimes c_1 + \varepsilon\, a_2 \otimes b_1 \otimes c_1 + \varepsilon\, a_1 \otimes b_2 \otimes c_1 + \varepsilon\, a_1 \otimes b_1 \otimes c_2 + O(\varepsilon^2) \bigr) \to T
\end{aligned}
\]
when $\varepsilon \to 0$.

There is a well-known result for matrices which states that if one fills an $n \times m$ matrix with random entries, the matrix will have maximal rank, $\min\{n, m\}$, with probability one. In the case of square matrices, a random matrix will be invertible with probability one. For tensors over $\mathbb{C}$ the situation is similar: a random tensor will have a certain rank with probability one; this rank is called the generic rank. Over $\mathbb{R}$, however, there can be multiple ranks, called typical ranks, which a random tensor takes with non-zero probability; see more in section 2.4. For now, we recall definition 1.3.24 and the fact that the generic rank is the smallest $r$ such that the $r$:th secant variety of the Segre variety is the whole space. Compare these observations and definitions with the fact that $GL(n, \mathbb{C})$ is an $n^2$-dimensional manifold in the $n^2$-dimensional space of $n \times n$ matrices, and a random matrix in this space is invertible with probability one.
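The limit computation above can also be observed numerically. The following sketch (not from the thesis; the vectors are random choices) builds $T$ and $T(\varepsilon)$ from the general formula and shows the approximation error shrinking linearly in $\varepsilon$:

```python
# A rank-3 tensor as the limit of rank-2 tensors:
# T(eps) = ((eps-1)/eps) a1(x)b1(x)c1
#        + (1/eps) (a1+eps a2)(x)(b1+eps b2)(x)(c1+eps c2)  ->  T.
import numpy as np

def outer3(u, v, w):
    return np.einsum('i,j,k->ijk', u, v, w)

rng = np.random.default_rng(1)
a1, a2 = rng.normal(size=2), rng.normal(size=2)
b1, b2 = rng.normal(size=2), rng.normal(size=2)
c1, c2 = rng.normal(size=2), rng.normal(size=2)

# The rank-three limit tensor.
T = (outer3(a1, b1, c1) + outer3(a2, b1, c1)
     + outer3(a1, b2, c1) + outer3(a1, b1, c2))

for eps in (1e-1, 1e-3, 1e-5):
    T_eps = ((eps - 1) / eps) * outer3(a1, b1, c1) \
        + (1 / eps) * outer3(a1 + eps * a2, b1 + eps * b2, c1 + eps * c2)
    print(eps, np.linalg.norm(T_eps - T))   # error shrinks like eps
```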
2.1.1 Results on tensor rank

Theorem 2.1.2. Given an $I \times J \times K$ tensor $T$, $R(T)$ is the minimal number $p$ of rank one $J \times K$ matrices $S_1, \ldots, S_p$ such that $T_i \in \mathrm{span}(S_1, \ldots, S_p)$ for all slices $T_i$ of $T$.

Proof. For a tensor $T$ one can write
\[
T = \sum_{k=1}^{R(T)} a_k \otimes b_k \otimes c_k
\]
and thus, if $a_k = (a_k^1, \ldots, a_k^I)^T$, we have
\[
T_i = \sum_{k=1}^{R(T)} a_k^i\, b_k \otimes c_k,
\]
so $T_i \in \mathrm{span}(b_1 \otimes c_1, \ldots, b_{R(T)} \otimes c_{R(T)})$ for $i = 1, \ldots, I$, which proves $R(T) \geq p$. Conversely, if $T_i \in \mathrm{span}(S_1, \ldots, S_p)$ with $\mathrm{rank}(S_j) = 1$ for $i = 1, \ldots, I$, we can write
\[
T_i = \sum_{k=1}^{p} x_k^i S_k = \sum_{k=1}^{p} x_k^i\, y_k \otimes z_k
\]
and thus with $x_k = (x_k^1, \ldots, x_k^I)$ we get
\[
T = \sum_{k=1}^{p} x_k \otimes y_k \otimes z_k,
\]
which proves $R(T) \leq p$, resulting in $R(T) = p$.

Corollary 2.1.3. For an $I \times J \times K$ tensor $T$, $R(T) \leq \min\{IJ, IK, JK\}$.

Proof. By theorem 2.1.2 one can slice $T$ along any of the three directions, so one can pick the direction which results in the smallest slice matrices, say $m \times n$. The space of $m \times n$ matrices is spanned by the $mn$ rank one matrices $M_{kl}$, $1 \leq k \leq m$, $1 \leq l \leq n$, where $M_{kl}$ has a one in position $(k, l)$ and zeros elsewhere. Thus at most $mn$ rank one matrices are needed to contain all the slices in their linear span.
2.1.2 Symmetric tensor rank

Definition 2.1.4 (Symmetric rank). Given a tensor $T \in S^d V$, the symmetric rank of $T$, denoted $R_S(T)$, is defined as the smallest $R$ such that
\[
T = \sum_{r=1}^{R} \underbrace{v_r \otimes \cdots \otimes v_r}_{d \text{ factors}}
\]
for $v_r \in V$. The symmetric border rank of $T$ is defined as the smallest $R$ such that $T$ is the limit of symmetric tensors of symmetric rank $R$.
Since we, over $\mathbb{R}$ and $\mathbb{C}$, can put symmetric tensors of order $d$ in bijective correspondence with homogeneous polynomials of degree $d$, and vectors in bijective correspondence with linear forms, the symmetric rank of a given symmetric tensor can be translated to the number $R$ of linear forms needed to express a given homogeneous polynomial of degree $d$ as a sum of $d$:th powers of linear forms. That is, if $P$ is a homogeneous polynomial of degree $d$ over $\mathbb{C}$, what is the least $R$ such that
\[
P = l_1^d + \cdots + l_R^d
\]
for linear forms $l_i$? Over $\mathbb{C}$, the following theorem answers this question in the generic case.
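As a small worked instance (not from the thesis), the monomial $xy$, viewed as a symmetric $2 \times 2$ tensor, requires exactly two powers of linear forms:

```latex
% Worked instance: over the reals,
\[
  xy \;=\; \Bigl(\frac{x+y}{2}\Bigr)^{2} - \Bigl(\frac{x-y}{2}\Bigr)^{2},
\]
% and over C the minus sign can be absorbed into the form,
% xy = ((x+y)/2)^2 + (i(x-y)/2)^2, so R_S(xy) = 2: the corresponding
% symmetric matrix ((0, 1/2), (1/2, 0)) has rank 2, and no single
% square of a linear form equals xy.
```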
Theorem 2.1.5 (Alexander-Hirschowitz). The generic symmetric rank in $S^d\mathbb{C}^n$ is